
Aristotle University of Thessaloniki Department of Informatics


Page 1

Visual Information Retrieval

Constantine Kotropoulos
Aristotle University of Thessaloniki, Department of Informatics

Monday, July 8, 2002

Page 2

Outline

– Fundamentals
– Still image segmentation: comparison of ICM and LVQ techniques
– Shape retrieval based on the Hausdorff distance
– Video summarization: detecting shots, cuts, and fades in video; selection of key frames
– MPEG-7: standard for multimedia applications
– Conclusions

Page 3

Fundamentals

– About
– Toward visual information retrieval
– Data types associated with images or video
– First generation systems
– Second generation systems
– Content-based interactivity
– Representation of visual content
– Similarity models
– Indexing methods
– Performance evaluation

Page 4

About

– Visual information retrieval: to retrieve images or image sequences that are relevant to a query from a database. It extends traditional information retrieval to include visual media.
– Needs: tools and interaction paradigms that permit searching for visual data by referring directly to its content.
  • Visual elements (color, texture, shape, spatial relationships) related to perceptual aspects of image content.
  • Higher-level concepts: clues for retrieving images with similar content from a database.
– Multidisciplinary field: information retrieval; image/video analysis and processing; visual data modeling and representation; pattern recognition; multimedia database organization; computer vision; user behavior modeling; multidimensional indexing; human-computer interaction.

Page 5

Toward visual information retrieval

– Databases allow a large amount of alphanumeric data to be stored in a local repository and accessed by content through appropriate query languages.
– Information retrieval systems provide access to unstructured text documents: search engines working in the textual domain using either keywords or full text.
– The need for visual information retrieval systems became apparent when
  • digital archives were released,
  • distribution of image and video data through large-bandwidth computer networks emerged,
and it becomes more prominent as we progress to the wireless era!

Page 6

Query by image content using the NOKIA 9210 Communicator

www.iva.cs.tut.fi/COST211

Iftikhar et al.

Page 7

Data types associated with images or video

– Content-independent metadata: data related in some way to the image/video but not to its content (e.g., format, author's name, date).
– Content-dependent metadata:
  • low/intermediate-level features: color, texture, shape, spatial relationships, motion, etc.
  • data referring to content semantics (content-descriptive metadata)
– The metadata type has an impact on the internal organization of the retrieval system.

Page 8

First generation systems

– Answer queries such as: find all images of paintings by El Greco; find all Byzantine icons dated from the 13th century; etc.
– Content-independent metadata: alphanumeric strings.
– Representation schemes: relational models, frame models, object-oriented models.
– Content-dependent metadata: annotated keywords or scripts.
– Retrieval: search engines working in the textual domain (SQL, full-text retrieval).
– Examples: PICDMS (1984), PICQUERY (1988), etc.
– Drawbacks:
  • text has difficulty capturing the distinctive properties of visual features
  • text is not appropriate for modeling perceptual similarity
  • annotation is subjective

Page 9

Second generation systems

– Support full retrieval by visual content:
  • conceptual level: keywords
  • perceptual level: objective measurements at the pixel level
  • other sensory data (speech, sound) may help (e.g., in video streams)
– Image processing, pattern recognition, and computer vision are an integral part of the architecture and operation.
– Retrieval systems for: 2-D still images; video; 3-D images and video; the WWW.

Page 10

Retrieval systems for 2-D still images (1)

– Content:
  • perceptual properties: color, texture, shape, and spatial relationships
  • semantic primitives: objects, roles, and scenes
  • impressions, emotions, and meaning associated with the combination of perceptual features
– Basic retrieval paradigm: a set of descriptive features is pre-computed for each image.
– Queries by visual example:
  • the user selects the features and ranges of model parameters, and chooses a similarity measure
  • the system checks the similarity between the visual content of the user's query and the database images
– Objective: to keep the number of misses as low as possible. What about the number of false alarms?
– Interaction: relevance feedback.

Page 11

Retrieval systems for 2-D still images (2)

– Similarity vs. matching:
  • Matching is a binary partition operator: "Does the observed object correspond to a model or not?" Uncertainties are managed during the process.
  • Similarity-based retrieval re-orders the database of images according to how similar they are to a query example: ranking, not classification.
– The user is in the retrieval loop; there is a need for a flexible interface.

Page 12

Retrieval systems for video (1)

– Video conveys information on multiple planes of communication:
  • how the frames are linked together using editing effects (cuts, fades, dissolves, etc.)
  • what is in the frames (characters, story content, etc.)
– Each type of video (commercials, news, movies, sport) has its own peculiar characteristics.
– Basic terminology:
  • Frame: basic unit of information, usually sampled at 1/25 or 1/30 of a second
  • Shot: a set of frames between a camera turn-on and a camera turn-off
  • Clip: a set of frames with some semantic content
  • Episode: a hierarchy of shots
  • Scene: a collection of consecutive shots that share simultaneity in space, time, and action (e.g., a dialog scene)
– Video is accessed through browsing and navigation.

Page 13

Retrieval systems for video (2)

Page 14

Retrieval systems for 3-D images and video / WWW

– 3-D images and video are available in biomedicine, computer-aided design, geographic maps, painting, and the games and entertainment industry (immersive environments). They are expected to flourish in the current decade.
– Retrieval on the WWW:
  • a distributed problem
  • need for standardization (MPEG-7)
  • response time is critical (work in the compressed domain, summarization)

Page 15

Research directions

1. Visual interfaces
2. Standards for content representation
3. Database models
4. Tools for automatic extraction of features from images and video
5. Tools for extraction of semantics
6. Similarity models
7. Effective indexing
8. Web search and retrieval
9. Role of 3-D

Page 16

Content-based interactivity

– Browsing offers a panoramic view of the visual information space.
– Visualization.

www.virage.com

Page 17

QBIC

http://wwwqbic.almaden.ibm.com/

color layout

Page 18

Querying by content (1)

For still images:
– To check whether the concepts expressed in a query match the concepts of database images:
  • "find all holy icons with a nativity", "find all holy icons with Saint George" (object categories)
  • treated with free-text or SQL-based retrieval engines (Google)
– To verify spatial relations between spatial entities:
  • "find all images with a car parked outside a house"
  • topological queries (disjunction, adjacency, containment, overlapping); metric queries (distances, directions, angles)
  • treated with SQL-like spatial query languages

Page 19

Querying by content (2)

– To check the similarity of perceptual features (color, texture, edges, corners, and shapes):
  • exact queries: "find all images of President Bush"
  • range queries: "find all images with colors between green and blue"
  • K-nearest-neighbor queries: "find the ten most similar images to the example"

For video:
– concepts related to video content
– motion, objects, texture, and color features of video: shot extraction, dominant colors, etc.

Page 20

Google

Page 21

Ark of Refugee Heirloom

www.ceti.gr/kivotos

Page 22

Querying by visual example

– Suited to express perceptual aspects of low/intermediate-level features of visual content.
– The user provides a prototype image as a reference example.
– Relevance feedback: the user analyzes the responses of the system and indicates, for each item retrieved, the degree of relevance or the exactness of the ranking; the annotated results are fed back into the system to refine the query.
– Types of querying:
  • iconic (PN): suitable for retrieval based on high-level concepts
  • by painting: employed in color-based retrieval (NETRA)
  • by sketch (PICASSO)
  • by image (NETRA)

Page 23

PICASSO/PN

http://viplab.dsi.unifi.it/PN/

Page 24

NETRA

http://maya.ece.ucsb.edu/Netra/netra.html

Page 25

Representation of visual content

– Representation of the perceptual features of images and video is a fundamental problem in visual information retrieval.
– Image analysis and pattern recognition algorithms provide the means to extract numeric descriptors.
– Computer vision enables object and motion identification.
– Representation of perceptual features: color, texture, shape, structure, spatial relationships, motion.
– Representation of content semantics: semantic primitives, semiotics.

Page 26

Representation of perceptual features: Color (1)

Page 27

Representation of perceptual features: Color (2)

– Human visual system: the cones are responsible for color perception.
– From a psychological point of view, the perception of color is related to several factors, e.g., color attributes (brightness, chromaticity, saturation), surrounding colors, color spatial organization, and the observer's memory/knowledge/experience.
– Geometric color models (RGB, HSV, Lab, etc.).
– Color histogram: describes the low-level color properties.

Page 28

Image retrieval by color similarity (1)

– Color spaces.
– Histograms; moments of the distribution.
– Quantization of the color space.
– Similarity measures: the L1 and L2 norms of the difference between the query histogram H(IQ) and the histogram of a database image H(ID).

Page 29

Image retrieval by color similarity (2)

– histogram intersection
– weighted Euclidean distance
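The measures named on these two slides can be sketched directly for normalized color histograms; this is an illustrative implementation, not code from the talk, and the example histograms are made up:

```python
# Color-histogram similarity measures: L1/L2 norms of the histogram
# difference, histogram intersection, and a weighted Euclidean distance.

def l1_distance(hq, hd):
    """L1 norm of the difference between H(IQ) and H(ID)."""
    return sum(abs(q - d) for q, d in zip(hq, hd))

def l2_distance(hq, hd):
    """L2 (Euclidean) norm of the histogram difference."""
    return sum((q - d) ** 2 for q, d in zip(hq, hd)) ** 0.5

def histogram_intersection(hq, hd):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(q, d) for q, d in zip(hq, hd))

def weighted_euclidean(hq, hd, w):
    """Euclidean distance with per-bin weights w."""
    return sum(wi * (q - d) ** 2 for wi, q, d in zip(w, hq, hd)) ** 0.5

hq = [0.5, 0.3, 0.2]  # query histogram H(IQ), 3 color bins
hd = [0.4, 0.4, 0.2]  # database histogram H(ID)
print(l1_distance(hq, hd), histogram_intersection(hq, hd))
```

With unit weights, the weighted Euclidean distance reduces to the plain L2 norm.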

Page 30

Representation of perceptual features: Texture (1)

– Texture: one level of abstraction above pixels.
– Perceptual texture dimensions: uniformity, density, coarseness, roughness, regularity, linearity, directionality/direction, frequency, phase.

Brodatz album

Page 31

Representation of perceptual features: Texture (2)

– Statistical methods:
  • autocorrelation function (coarseness, periodicity)
  • frequency content [rings, wedges]: coarseness, directionality, isotropic/non-isotropic patterns
  • moments
  • directional histograms and related features
  • run-lengths and related features
  • co-occurrence matrices
– Structural methods (grammars and production rules).
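One of the statistical methods listed above, the co-occurrence matrix, can be sketched in a few lines. This is an illustrative implementation for a single pixel offset with a contrast feature derived from it; the tiny image is made up:

```python
# Gray-level co-occurrence matrix for offset (dr, dc), plus a contrast
# feature: (i-j)^2 weighted by the normalized pair counts.

def cooccurrence(img, dr, dc, levels):
    """Count pixel pairs (img[r][c], img[r+dr][c+dc]) over the image."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r][c]][img[r2][c2]] += 1
    return m

def contrast(m):
    """High for textures with frequent large gray-level jumps."""
    total = sum(sum(row) for row in m)
    n = len(m)
    return sum((i - j) ** 2 * m[i][j] for i in range(n) for j in range(n)) / total

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 3, 3],
       [2, 2, 3, 3]]
glcm = cooccurrence(img, 0, 1, levels=4)  # horizontal neighbor pairs
```

Different offsets (horizontal, vertical, diagonal) capture the directionality dimension of the texture.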

Page 32

Representation of perceptual features: Shape (1)

– Criteria of a good shape representation:
  • each shape possesses a unique representation, invariant to translation, rotation, and scaling
  • similar shapes should have similar representations
– Methods to extract shapes and derive features stem from image processing:
  • chain codes
  • polygonal approximations
  • skeletons
  • boundary descriptors: contour length/diameter, shape numbers
  • Fourier descriptors
  • moments

Page 33

Representation of perceptual features: Shape (2)

Chain codes

Polygonal approximation

(I. Pitas)

Page 34

Representation of perceptual features: Shape (3)

Face segmentation: (a) original color image, (b) skin segmentation, (c) connected components, (d) best-fit ellipses.

Page 35

Representation of perceptual features: Structure / Spatial relationships

– Structure: provides a Gestalt impression of the shapes in the image (set of edges, corners).
  • To distinguish photographs from drawings.
  • To classify scenes: portrait, landscape, indoor.
– Spatial relationships:
  • spatial entities: points, lines, regions, and objects
  • relationships: directional (include a distance/angle measure) and topological (do not include distance, but capture set-theoretical concepts, e.g., disjunction)
  • they are represented symbolically.

Page 36

Representation of perceptual features: Motion

– The main characterizing element in a sequence of frames.
– Related to a change in the relative position of spatial entities or to a camera movement.
– Methods:
  • detection of temporal changes of gray-level primitives (optical flow)
  • extraction of a set of sparse characteristic features of the objects, such as corners or salient points, and their tracking in subsequent frames
– Plays a crucial role in video.

Salient features (Kanade et al.)

Page 37

Representation of content semantics: Semantic primitives

– Identification of objects, roles, actions, and events as abstractions of visual signs.
– Achieved through recognition and interpretation:
  • recognition: select a set of low-level local features and use statistical pattern recognition for object classification
  • interpretation: based on reasoning; domain-dependent, e.g., Photobook (www-white.media.mit.edu)
– Retrieval systems including interpretation: facial database systems to compare facial expressions.

Page 38

Representation of content semantics: Semiotics

– A grammar of color usage formalizes effects: association of color hue, saturation, etc. with psychological behaviors.
– Semiotics identifies two distinct steps for the production of meaning:
  • abstract level, by narrative structures (e.g., camera breaks, colors, editing effects, rhythm, shot angle)
  • concrete level, by discourse structures: how the narrative elements create a story.

Page 39

Similarity models

– Pre-attentive: perceived similarity between stimuli; color/texture/shape; models close to human perception.
– Attentive: interpretation; previous knowledge and a form of reasoning; domain-specific retrieval applications (mugshots); need for models and the definition of similarity criteria.

Page 40

Metric model (1)

– Distance in a metric psychological space.
– Properties of a distance function d: d(x,x) = 0; d(x,y) > 0 for x ≠ y; symmetry d(x,y) = d(y,x); triangle inequality d(x,z) ≤ d(x,y) + d(y,z).
– Commonly used distance functions: Euclidean, city-block, Minkowski.

Page 41

Metric model (2)

– Inadequacies: shape similarity.
– Advantages:
  • similarity judgments of color stimuli
  • consistent with pattern recognition and computer vision
  • suitable for creating indices
– Other similarity models:
  • virtual metric spaces
  • Tversky's model: a function of two types of features, those common to the two stimuli and those that appear exclusively in only one stimulus
  • transformational distances: elastic graph matching
– User subjectivity?

Page 42

Four Eyes approach

– Self-improving database browser and annotator based on user interaction.
– Similarity is presented with groupings.
– The system chooses, in tree hierarchies, those nodes which most efficiently represent the positive examples.
– A set-covering algorithm removes all covered positive examples.
– Iterations.
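The set-covering step above can be sketched with the standard greedy heuristic: repeatedly pick the grouping (tree node) that covers the most not-yet-covered positive examples. The node names and example sets below are illustrative, not from the system:

```python
# Greedy set cover: choose groupings until every positive example
# is covered (or no grouping covers anything that remains).

def greedy_set_cover(positives, groupings):
    """Return the names of the groupings chosen, in selection order."""
    uncovered = set(positives)
    chosen = []
    while uncovered:
        # node whose member set removes the most remaining positives
        name, members = max(groupings.items(),
                            key=lambda kv: len(uncovered & kv[1]))
        if not uncovered & members:
            break  # remaining positives cannot be covered by any node
        chosen.append(name)
        uncovered -= members
    return chosen

groupings = {
    "node_a": {1, 2, 3},
    "node_b": {3, 4},
    "node_c": {4, 5},
}
print(greedy_set_cover({1, 2, 3, 4, 5}, groupings))  # ['node_a', 'node_c']
```

Greedy selection is not always optimal, but it gives the classical logarithmic approximation guarantee for set cover.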

Page 43

Indexing methods (1)

– To avoid sequential scanning.
– Retrieved images are ranked in order of similarity to a query.
– Compound measure of similarity between visual features and text attributes.
– Indexing of string attributes; commonly used indexing techniques:
  • hashing tables and signatures
  • cosine similarity function

Page 44

Indexing methods (2)

– Triangle inequality (Barros et al.): the distance d(i,r) of every database item i to a reference item r is precomputed; the triangle inequality gives the lower bound d(q,i) ≥ |d(q,r) − d(i,r)|.
– When the query item q is presented, d(q,r) is computed.
– Maximum threshold l = d(q,r); r is the most similar item so far.
– Search the items whose distances d(i,r) are closest to d(q,r). If an item i more similar to q is found, it is regarded as the most similar item, and l is updated accordingly.
– Continue until |d(i,r) − d(q,r)| > l.
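A minimal sketch of this triangle-inequality filtering, not the exact algorithm of Barros et al.: with d(i,r) precomputed, |d(q,r) − d(i,r)| lower-bounds d(q,i), so items whose bound exceeds the current best distance need no exact distance computation. The feature vectors and reference item are illustrative:

```python
# Nearest-neighbor search with triangle-inequality pruning against
# a single reference item r.

def d(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_with_pruning(query, items, ref, d_to_ref):
    """Return (index of nearest item, number of exact distances computed)."""
    dqr = d(query, ref)
    best_i, best_d, computed = None, float("inf"), 0
    # visit items in order of |d(i,r) - d(q,r)|, most promising first
    order = sorted(range(len(items)), key=lambda i: abs(d_to_ref[i] - dqr))
    for i in order:
        if abs(d_to_ref[i] - dqr) > best_d:
            break  # lower bound already exceeds the best exact distance
        di = d(query, items[i])
        computed += 1
        if di < best_d:
            best_i, best_d = i, di
    return best_i, computed

items = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
ref = (0.0, 0.0)
d_to_ref = [d(it, ref) for it in items]  # precomputed offline
print(nearest_with_pruning((1.2, 1.0), items, ref, d_to_ref))
```

Here the exact distance is computed for only one of the four items; the pruning never misses the true nearest neighbor because the bound is valid for any metric.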

Page 45

Index structures

– Fixed grids: non-hierarchical index structure that organizes the space into buckets.
– Grid files: fixed grids with buckets of unequal size.
– K-d trees: binary trees; the value of one of the k features is checked at each node.
– R-trees: partition the feature space into multidimensional rectangles.
– SS-trees: weighted Euclidean distance; suitable for clustering; ellipsoidal clusters.
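The simplest structure above, the fixed grid, can be sketched in a few lines: feature vectors are hashed into equal-sized square buckets, and a query inspects only its own bucket and the neighboring ones. The cell size and points are illustrative:

```python
# Fixed-grid index over 2-D feature vectors.

from collections import defaultdict

def build_grid(points, cell):
    """Map each point index into its (cell_x, cell_y) bucket."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)
    return grid

def candidates(grid, q, cell):
    """Indices stored in the query's bucket and the 8 surrounding buckets."""
    cx, cy = int(q[0] // cell), int(q[1] // cell)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(grid[(cx + dx, cy + dy)])
    return sorted(out)

points = [(0.5, 0.5), (1.5, 0.5), (7.0, 7.0)]
grid = build_grid(points, cell=1.0)
print(candidates(grid, (0.9, 0.9), cell=1.0))  # [0, 1]
```

Only the returned candidates need an exact distance computation; the far-away point at (7, 7) is never touched. Grid files and the tree structures above address the weakness of this scheme, namely badly balanced buckets under skewed data.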

Page 46

Performance evaluation

Judgment by evaluator:

                  Relevant                   Not relevant
Retrieved         A (correctly retrieved)    C (falsely retrieved)
Not retrieved     B (missed)                 D (correctly rejected)

recall = A / (A + B)        precision = A / (A + C)
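The two measures follow directly from the four counts in the table; the counts in the example are made up:

```python
# Recall and precision from the contingency table: A correctly retrieved,
# B missed, C falsely retrieved (D, correctly rejected, is not needed).

def recall(a, b):
    """Fraction of relevant items that were retrieved: A / (A + B)."""
    return a / (a + b)

def precision(a, c):
    """Fraction of retrieved items that are relevant: A / (A + C)."""
    return a / (a + c)

# e.g., 8 relevant items retrieved, 2 relevant items missed, 4 false alarms
print(recall(8, 2), precision(8, 4))
```

Keeping misses (B) low raises recall; keeping false alarms (C) low raises precision, which is exactly the trade-off raised on the earlier slide about 2-D still image retrieval.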

Page 47

Wrap-up

– Visual information retrieval is a research topic at the intersection of digital image processing, pattern recognition, and computer vision (fields of our interest/expertise), but also of information retrieval and databases. Related to the semantic web.
– A challenging research topic dealing with many unsolved problems: segmentation; machine similarity vs. human perception; focused searching.

Page 48

Still image segmentation: comparison of ICM and LVQ

– Comparison of:
  • Iterated Conditional Modes (ICM)
  • split-and-merge Learning Vector Quantizer (LVQ)
– Ability to extract meaningful image parts based on the ground truth.
– Evaluation of still image segmentation algorithms.

Page 49

Iterated Conditional Modes (ICM)

The ICM method is based on the maximization of the probability density function of the image model given real image data. The criterion function is

p(x_s = i | y_s, x_q, q ∈ N_8(s)) ∝ (1 / √(2πδ_i²)) exp( −(y_s − m_i)² / (2δ_i²) − Σ_C V_C(x) ),

where x_s is the region assignment and y_s the luminance value of pixel s; m_i and δ_i are the mean value and the standard deviation of the luminance of region i; C is a clique containing pixel s and V_C(x) its potential function; N_8(s) is the 8-neighborhood of pixel s.

Page 50

How ICM works

– The initial segmentation is obtained using the K-means clustering algorithm; cluster-center initialization is based on the image intensity histogram.
– At each iteration, the probability (the value of the criterion function) is calculated for each pixel. Pixels are assigned to the cluster-regions with maximum probability.
– Given the new segmentation, the mean intensity value and the cluster variance are re-estimated. The iterative process stops when no change occurs in the clusters.
– In the obtained segmentation, small regions are merged with their nearest ones. The output image contains the large regions, each assigned its mean luminance value.
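The iteration above can be sketched on a toy gray-level image. This is a minimal illustration, not the talk's implementation: labels start from a nearest-mean split (standing in for the K-means initialization), fixed means and variance replace the re-estimation step, and the clique potential is simplified to a Potts-style penalty beta per disagreeing 8-neighbor:

```python
# Minimal ICM sketch: each pixel is reassigned to the label minimizing
# a Gaussian data term plus beta times the number of 8-neighbors
# holding a different label.

def icm_segment(img, means, beta=0.5, iters=10, var=100.0):
    rows, cols = len(img), len(img[0])
    # initial labeling: nearest cluster mean
    lab = [[min(range(len(means)), key=lambda i: (img[r][c] - means[i]) ** 2)
            for c in range(cols)] for r in range(rows)]
    for _ in range(iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                def cost(i):
                    data = (img[r][c] - means[i]) ** 2 / (2.0 * var)
                    disagree = sum(
                        1 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr or dc) and 0 <= r + dr < rows
                        and 0 <= c + dc < cols and lab[r + dr][c + dc] != i)
                    return data + beta * disagree
                best = min(range(len(means)), key=cost)
                if best != lab[r][c]:
                    lab[r][c], changed = best, True
        if not changed:
            break  # no pixel changed label: converged
    return lab

# toy image: dark region (around 10) and bright region (around 200)
img = [[10, 10, 12, 200], [11, 10, 198, 205], [10, 205, 200, 199]]
labels = icm_segment(img, means=[10.0, 200.0])
```

The beta parameter plays the role of the potential-function value on the next slide: larger beta smooths the segment boundaries more aggressively.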

Page 51

Image features and parameters of the ICM algorithm

– The ICM algorithm is applied on the luminance component of the image; the input is a gray-level image.
– The parameter of the algorithm is the value of the potential function.
– The parameter controls the roughness of the segment boundaries; its value is tuned experimentally.

Page 52

Segmentation results (ICM)

Page 53

Learning Vector Quantizer (1)

– a neural network
– self-organizing
– competitive learning law
– unsupervised
– approximates the data pdf by adjusting the weights of the reference vectors

Page 54

Learning Vector Quantizer (2)

– codebook: reference vectors representing their nearest data patterns
– number of reference vectors:
  • predefined, or
  • determined by split and merge

Page 55

Learning Vector Quantizer (3)

Minimal error for data representation:

E = ∫ ‖x − w_c‖^r f(x) dx

Iterative correction of the reference vectors:

w_c(k+1) = w_c(k) + a(k) [x(k) − w_c(k)]
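The correction rule above can be sketched as a competitive update: for each pattern x(k), only the winning (nearest) reference vector w_c moves toward it by a fraction a(k) of the difference. The 1-D data, fixed learning rate, and initialization are illustrative:

```python
# One pass of unsupervised competitive learning over the data:
# the winner w_c is updated as w_c <- w_c + a * (x - w_c).

def lvq_pass(data, codebook, rate):
    for x in data:
        c = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)
        codebook[c] += rate * (x - codebook[c])  # move the winner toward x
    return codebook

data = [0.9, 1.1, 1.0, 4.9, 5.1, 5.0]  # two clusters around 1.0 and 5.0
codebook = [0.0, 6.0]                  # initial reference vectors
for epoch in range(50):
    lvq_pass(data, codebook, rate=0.1)
print(codebook)  # the reference vectors settle near the cluster means
```

In practice a(k) is decreased over time so the codebook converges; the split-and-merge variant on the next slide additionally adapts the number of reference vectors.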

Page 56

Learning Vector Quantizer (4)

Split-and-merge technique:
– Find the winner reference vector w(k) for pattern x(k).
– If x(k) is not an outlier, proceed as in standard LVQ.
– If x(k) is an outlier, either
  • split the cluster and include x(k) in one of the sub-clusters, or
  • create a new cluster having seed x(k).

Page 57

Learning Vector Quantizer (5)

Page 58

Experimental set-up (1)

– Apply both methods on images provided by BAL (Bridgeman Art Library).
– Explore the ability of the algorithms to extract meaningful image parts based on the qualitative description of the ground truth.

Page 59

Paintings from the Bridgeman Art Library

– sky, mountains, people, water (smpw)
– hammerhead cloud, reflection (cr)
– sky, buildings, trees, people, pavement (sbtpp)
– sky, people, hat (sph)
– sky, trees, water, sails (stws)
– horses, sledges, people, snow, sky (hspss)

Page 60

Experimental set-up (2)

– Let O = {O1, …, OM} denote the set of objects given in the qualitative description of the ground truth, where M is the number of objects.
– Let T = {T1, …, TN} denote the set of uniquely labeled regions obtained in the segmented image, where N is the number of regions.
– Three cases are possible for the outcome of the segmentation as compared to the ground truth.

Page 61

Matching

– Case 1, best match (BM): the region of the segmented image has a one-to-one correspondence with the ground-truth object.
– Case 2, reasonable match (RM): the ground-truth object has a one-to-many correspondence with the regions of the segmented image.
– Case 3, mismatch (MM): there is no correspondence between the ground-truth objects and the regions of the segmented image.

Page 62

Three cases

For the j-th ground-truth object O_j, denoting the case by i and the segmented regions by T, the three cases occur as follows:

i = 1, when O_j corresponds to a single region T_k (best match),
i = 2, when O_j corresponds to several regions T_k, …, T_l (reasonable match),
i = 3, when O_j corresponds to no region (mismatch),

with k = 1, …, N.

Page 63

Decision

The decision about the presence of the ground-truth object O_j in the segmented image according to the cases is:

r_ij = 1, if O_j falls in case i,
r_ij = 0, otherwise.

We put a decision for each object after visual examination of the segmented image according to the definition of the ground truth.
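The per-image BM/RM/MM scores reported later in the talk are the fractions of ground-truth objects falling in each case; a minimal sketch of that tally, with the decisions given as one case label per ground-truth object:

```python
# Fraction of ground-truth objects in each matching case for one image.

def case_fractions(decisions):
    n = len(decisions)
    return {case: sum(1 for d in decisions if d == case) / n
            for case in ("BM", "RM", "MM")}

# e.g., 4 ground-truth objects: three reasonably matched, one missed
print(case_fractions(["RM", "RM", "RM", "MM"]))
```

With these decisions the fractions are BM 0, RM 0.75, MM 0.25, i.e., the pattern of the smpw row in the ICM results table.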

Page 64

Assessment of results (1)

Ground truth: sky, buildings, trees, people, pavement

Page 65

Assessment of results (2)

LVQ

Page 66

Assessment of results (3)

ICM

Page 67

Assessment of results (4)

Ground truth: horses, sledges, people, snow, sky

Page 68

Assessment of results (5)

LVQ

Page 69

Assessment of results (6)

ICM

Page 70

Assessment of results (7)

Number of regions

Image    Ground truth    ICM    LVQ
smpw     5               7      7
cr       4               10     8
sbtpp    5               13     8
sph      3               14     8
stws     4               15     11
hspss    3               5      8

Page 71

Assessment of results (8)

Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch)

         ICM                     LVQ
Image    BM     RM     MM        BM     RM     MM
smpw     0      0.75   0.25      0.5    0.5    0
cr       0      0.75   0.25      0.5    0.5    0
sbtpp    0.4    0.4    0.2       0.2    0.8    0
sph      0      1      0         0      1      0
stws     0.5    0.25   0.25      0.25   0.75   0
hspss    0.33   0      0.66      0      1      0

Page 72

Assessment of results (9)

Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch)

          ICM                     LVQ
          BM     RM     MM        BM     RM     MM
Average   0.20   0.53   0.27      0.24   0.76   0

Page 73

Evaluation of Image Segmentation Algorithms (1)

Cián Shaffrey, Univ. of Cambridge

Page 74

Evaluation of Image Segmentation Algorithms (2)

– Evaluation within the semantic space: it is impossible to ask the average user to provide all possible h.
– Compromise: evaluation in the indexing space allows us to access S without explicitly defining σ.
– Average user: to achieve a consensus on h.
– Ask users to evaluate two proposed arrows π to obtain the average user's response; this implicitly characterizes h and σ.

Page 75

Evaluation of Image Segmentation Algorithms (3)

Unsupervised algorithms:

1. Multiscale Image Segmentation (UCAM-MIS)
2. Blobworld (UC Berkeley-Blobworld)
3. Iterated Conditional Modes (AUTH-ICM)
4. Learning Vector Quantizer (AUTH-LVQ)
5. Double Markov Random Field (TCD-DMRF)
6. Complex Wavelet based Hidden Markov Tree (UCAM-CHMT)

Page 76

Evaluation of Image Segmentation Algorithms (4)

– Hard measurements.
– Soft measurements: the speed of the user's response (time⁻¹), i.e., how strongly the user prefers one scheme over the other.
  • A faster response: the selected scheme provides a better semantic breakdown of the original image.
  • A slower response: reflects the similarity of the two schemes.
– Aims:
  • to determine whether or not agreement exists in users' decisions
  • do the pairwise rankings lead to consistent total orderings?
  • do hard and soft measurements coincide?

Page 77

Evaluation of Image Segmentation Algorithms (5)

Cián Shaffrey, Univ. of Cambridge


Cián Shaffrey, Univ. of Cambridge

Evaluation of Image Segmentation Algorithms (6)


Wrap-up

• ICM: continuous, large-sized regions; appropriate for homogeneous regions
• LVQ: spatially connected, small regions; more detailed segmentation
• Both provide good RM


Image retrieval based on Hausdorff distance

• Hausdorff distance definition
• Advantages
• How to speed up the computations
• Experiments


Hausdorff distance definition

dH+(A,B) = sup {d(x,B) : x ∈ A}

dH−(A,B) = sup {d(y,A) : y ∈ B},

where d(v,W) = inf {d(v,w) : w ∈ W}.

dH(A,B) = max (dH+(A,B), dH−(A,B))
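The definition above maps directly to code. A minimal NumPy sketch for finite 2-D point sets (function names are illustrative, not from the original work):

```python
import numpy as np

def directed_hausdorff(A, B):
    """d_H+(A, B) = sup over x in A of d(x, B), where d(x, B) = inf over w in B of d(x, w)."""
    # All pairwise Euclidean distances between the two point sets (shape m x n).
    diff = A[:, None, :] - B[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # For each point of A, the distance to its nearest point of B; then take the sup.
    return float(dists.min(axis=1).max())

def hausdorff(A, B):
    """d_H(A, B) = max(d_H+(A, B), d_H-(A, B))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

Note the asymmetry of the directed distance, which is exactly what makes dH+ and dH− separately useful for partially obscured objects.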


Hausdorff distance advantages

dH(A, B) = 0 ⇔ A = B (A, B – sets representing graphical objects, object contours, etc.)

Information about parameters of transformation (complex object recognition)

Predictable – simple intuitive interpretation

dH+ and dH− – useful for partially obscured or erroneously segmented objects

Possibility of generalization: replacing the max with quantiles

Possibility of taking into consideration any object transformations


How to speed up the computations for comparing one pair (1)

A. Replacing objects by their contours

The HD between the objects may be large although the HD between their contours is small (e.g., a disk and a ring) → possibility of false alarms,

but

contours of similar objects are always similar (small HD) → no possibility of omitting similar objects


How to speed up the computations for comparing one pair (2)

B. Voronoi diagram or distance transform

C. Early scan termination

D. Pruning some parts of transformation space


How to speed up the computations – number of models considered

Idea:

Matrix of distances for models (every pair)

1. Pruning some models (we know in advance they will not match the query)

2. Database navigation: optimal search order (possibility of early termination)


How to speed up the computations

A. Excluding a model object from the search

(Figure: query and ref – any model object – together with the distance to the closest model found so far; the model closest to the query object may lie only in the colored area.)
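Because the Hausdorff distance is a metric, the triangle inequality gives |d(query, ref) − d(model, ref)| ≤ d(query, model). A hypothetical sketch of the exclusion rule (names are illustrative; `d_models_ref` stands for the precomputed off-line distance matrix column for one reference object):

```python
def prune_models(d_query_ref, d_models_ref, best_so_far):
    """Indices of models that may still beat the current best match.

    A model is kept only if its triangle-inequality lower bound
    |d(query, ref) - d(model, ref)| does not exceed best_so_far;
    all other models cannot be closer to the query and are skipped
    without ever computing their Hausdorff distance.
    """
    return [i for i, d_mr in enumerate(d_models_ref)
            if abs(d_query_ref - d_mr) <= best_so_far]
```

With several reference objects, the surviving index sets are intersected, tightening the pruning further.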


How to speed up the computations

B. Pruning with many reference objects


How to speed up the computations

C. Optimal searching order


How to speed up the computations

D. Introducing other criteria (pre-computation)

Moment invariants:

• M1 = (M20 + M02) / m00²
• M2 = (M20·M02 − M11²) / m00⁴

where, for an image f with pixel values f(i,j) and object centroid (i0, j0),

Mpq = Σi Σj (i − i0)^p (j − j0)^q f(i,j)   (central moments)
mpq = Σi Σj i^p j^q f(i,j)                 (raw moments)

Shape coefficients:

• Blair–Bliss coefficient: W_BB = S / √(2π ∫∫ r² ds), where S is the object area and r is the distance of a point of the object from its centroid
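The two moment invariants follow directly from the definitions above. A sketch for a discrete gray-scale or binary image f (the function name is illustrative):

```python
import numpy as np

def moment_invariants(f):
    """Translation-invariant, size-normalised moments M1, M2 of image f."""
    f = np.asarray(f, dtype=float)
    I, J = np.indices(f.shape)                         # pixel coordinate grids
    m00 = f.sum()                                      # raw moment m00 (area/mass)
    i0, j0 = (I * f).sum() / m00, (J * f).sum() / m00  # object centroid
    M20 = (((I - i0) ** 2) * f).sum()                  # central moments
    M02 = (((J - j0) ** 2) * f).sum()
    M11 = ((I - i0) * (J - j0) * f).sum()
    M1 = (M20 + M02) / m00 ** 2
    M2 = (M20 * M02 - M11 ** 2) / m00 ** 4
    return M1, M2
```

Since the moments are central and normalised by powers of m00, the pair (M1, M2) is unchanged when the object is translated within the image, which is what makes it usable as a precomputed auxiliary criterion.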


Experiments – database

Database: 76 islands, represented as *.bmp images



Experiment 1: map query

Image retrieval. Step 1: interactive segmentation of query object


Experiment 1: map query

Searching order: 8 / 76 model objects were checked

Loading model 1 / 76: "amorgos.bmp" – Hausdorff distance: 0.156709
Loading model 42 / 76: "ithaca.bmp" – Hausdorff distance: 0.143915
Loading model 27 / 76: "ikaria.bmp" – Hausdorff distance: 0.080666
Loading model 31 / 76: "kasos.bmp" – Hausdorff distance: 0.080551
Loading model 20 / 76: "sikinos.bmp" – Hausdorff distance: 0.121180
Loading model 52 / 76: "alonissos.bmp" – Hausdorff distance: 0.153914
Loading model 17 / 76: "rithnos.bmp" – Hausdorff distance: 0.103512
Loading model 61 / 76: "skopelos.bmp" – Hausdorff distance: 0.045430


Experiment 1: map query

Minimum of the Hausdorff distance: the model closest to the query object


Experiment 2: mouse-drawing query

Query: Santorini

• closest: HD = 0.112, MCD = 1.024
• second: HD = 0.143, MCD = 1.771
• furthest: max HD = 0.3072, max MCD = 3.4326

(Original table columns: query | HD criterion | position for min HD | HD+M1+M2+W_BB. Other islands shown: Poros, Elafonisos.)


Wrap-up

Hausdorff distance is better for shape recognition than feature-based criteria.
The large computational cost of image retrieval based on HD can be reduced by:

• decreasing the cost of computation for a pair of objects:
– replacing an object by its contour
– using a Voronoi diagram (distance transform)
• off-line database processing – calculating the matrix of distances between model objects:
– reducing the number of model objects to be compared
– optimal search order
• using features as auxiliary similarity criteria


Video Summarization: Detecting shots, cuts, and fades in video –

Selection of key frames


Outline

• Entropy, joint entropy, and mutual information
• Shot cut detection based on mutual information
• Fade detection based on joint entropy
• Key frame selection
• Comparison with other methods
• Wrap-up


Entropy – Joint Entropy

• Entropy of a random variable (RV) X:
H(X) = −Σx p(x) log₂ p(x)
a measure of the information content or the "uncertainty" about X.

• Joint entropy of RVs X and Y:
H(X,Y) = −Σx Σy p(x,y) log₂ p(x,y)
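These two definitions in a minimal sketch (base-2 logarithm, so both quantities are in bits; zero-probability terms are conventionally dropped):

```python
import numpy as np

def entropy(p):
    """H(X) = -sum p(x) log2 p(x); zero-probability outcomes contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def joint_entropy(pxy):
    """H(X, Y) computed from a joint probability table p(x, y)."""
    return entropy(pxy)   # same formula, applied to the flattened joint table
```

For a fair coin, H = 1 bit; for two independent fair coins, the joint entropy is 2 bits.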


Mutual Information

I(X;Y) = H(X) + H(Y) − H(X,Y)

• It measures the average reduction in uncertainty about X that results from learning the value of Y.
• Equivalently, it measures the amount of information that X conveys about Y.


Algorithm for detecting abrupt cuts (1)

For each pair of successive frames f_t and f_{t+1}, whose gray levels vary from 0 to N−1:

• Calculate three N×N co-occurrence matrices, one for each chromatic component R, G, and B, whose (i,j) element is the joint probability of observing a pixel having the i-th gray level in f_t and the j-th gray level in f_{t+1}.

• Calculate the mutual information of the gray levels for the three components R, G, B independently and sum them:
I_{t,t+1} = I^R_{t,t+1} + I^G_{t,t+1} + I^B_{t,t+1}
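One chromatic component of this step might look as follows; this is a sketch of the computation described above, not the authors' implementation (the per-channel values would then be summed over R, G, and B):

```python
import numpy as np

def mutual_information(f_t, f_t1, levels=256):
    """I between gray levels of co-located pixels in successive frames.

    C is the levels x levels co-occurrence matrix: C[i, j] is the joint
    probability of a pixel having level i in f_t and level j in f_{t+1}.
    """
    C = np.zeros((levels, levels))
    np.add.at(C, (f_t.ravel(), f_t1.ravel()), 1)   # joint histogram
    C /= C.sum()
    px, py = C.sum(axis=1), C.sum(axis=0)          # marginal distributions
    nz = C > 0                                     # skip zero-probability cells
    outer = px[:, None] * py[None, :]
    return float((C[nz] * np.log2(C[nz] / outer[nz])).sum())
```

At a cut the two frames carry little shared content, so the co-occurrence matrix spreads out and the mutual information drops sharply.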


Algorithm for detecting abrupt cuts (2)

– Apply a robust estimator of the mean value to the time series of mutual information values by defining a time window around each time instant t₀.
– An abrupt cut at t₀ is detected if the mutual information I_{t₀,t₀+1} drops significantly below this robust local estimate.
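A plausible sketch of this rule, assuming the median as the robust estimator; the window size and the drop ratio are illustrative values, not those of the original work:

```python
import numpy as np

def detect_cuts(mi, window=5, ratio=0.5):
    """Indices t where the MI value falls far below its local robust level."""
    mi = np.asarray(mi, dtype=float)
    cuts = []
    for t in range(len(mi)):
        lo, hi = max(0, t - window), min(len(mi), t + window + 1)
        # Robust level: median of the surrounding window, excluding mi[t] itself.
        neighbours = np.concatenate([mi[lo:t], mi[t + 1:hi]])
        if len(neighbours) and mi[t] < ratio * np.median(neighbours):
            cuts.append(t)
    return cuts
```

The median-based level is what makes the threshold adaptive: a single deep drop is flagged, while gradual variation within a shot merely shifts the local level.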


Mutual information pattern (1)

• Mutual information pattern from the "star" video sequence that depicts cuts


Ground truth


Performance evaluation

GT denotes the ground truth; Seg denotes the (correct and false) shots segmented by our methods.

Recall corresponds to the probability of detection: Recall = |GT ∩ Seg| / |GT|.

Precision corresponds to the accuracy of the method, accounting for false detections: Precision = |GT ∩ Seg| / |Seg|.

Overlap (for fades)


Test results (1)


Test results (2)


Alternative technique for shot cut detection

– Features that could be used to define a distance measure:
• successive color frame differences
• successive color vector bin-wise H-S histogram differences (invariant to brightness changes)
– Fusion of the two differences
– Shot cut detection by adaptive local thresholding


Comparison of abrupt cut detection methods

• results using mutual information
• results using the combined method


Fades (1)

If G(x,y,t) is a gray-scale sequence, then a chromatic scaling of G(x,y,t) can be modeled as S(x,y,t) = α(t)·G(x,y,t).

Therefore, a fade-out of duration T can be modeled as S(x,y,t) = G(x,y,t)·(1 − t/T), t ∈ [0,T],

and a fade-in as S(x,y,t) = G(x,y,t)·(t/T), t ∈ [0,T].


Fades (2)

• part of a video sequence showing a fade-in
• part of a video sequence showing a fade-out


Mutual information pattern (2)

• Mutual information pattern from the "basketball" video sequence showing cuts and a fade


Algorithms for detecting fades (1)

For each pair of successive frames f_t and f_{t+1}, calculate the joint entropy of the basic chromatic components.

Determine the values of the joint entropy close to zero.

Detect fade-out (fade-in):
• The first (last) zero value defines the end (start) of the fade-out (fade-in).
• Find the start (end) of the fade-out (fade-in).

A fade should have a duration of at least 2 frames.
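A sketch of the fade-out branch of this procedure; the near-zero threshold `eps` is illustrative, and fade-ins would be handled symmetrically by scanning forward from the last near-zero value:

```python
def detect_fade_outs(H, eps=0.05, min_len=2):
    """Locate fade-outs in a joint-entropy series H as (start, end) index pairs.

    A near-zero joint entropy marks the end of a fade-out; the start is found
    by walking back while H keeps decreasing along the fade ramp.
    """
    fades = []
    for t in range(1, len(H)):
        if H[t] <= eps < H[t - 1]:            # first near-zero value: fade-out end
            s = t
            while s > 0 and H[s - 1] > H[s]:  # walk back along the decreasing ramp
                s -= 1
            if t - s + 1 >= min_len:          # a fade lasts at least 2 frames
                fades.append((s, t))
    return fades
```

Finding both borders this way is what gives the accurate fade start/end localisation reported in the wrap-up.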


Joint entropy pattern (1)

(Figure: joint entropy over frames 1765–1805; a fade-out spans frames 1791–1802, followed by a cut around frames 1803–1805.)


Joint entropy pattern (2)

(Figure: joint entropy with detection threshold over frames 4420–4440; two fades and a cut to a dark frame around frames 4425–4426.)


Comparison of fade detection methods (1)

• results using the joint entropy
• results using the average frame value


Comparison of fade detection methods (2)

• results using the joint entropy
• results using the average frame value


Algorithm for key frame selection (1)

• Apply a split-and-merge algorithm based on the series of mutual information of gray levels at successive frames within the shot.
• Choose clusters of large size.
• Select as a potential key frame the first frame of each cluster.
• Test the similarity of the potential key frames using the mutual information.


Key frame selection (1)


star sequence

Key frame selection (2)


frame 1690 frame 1770

Key frame selection (3)


Key frame selection (4)

• Key frames selected from different shots: frame 314, frame 2026, frame 2904, frame 4344
• Two key frames selected from one shot: frame 2607, frame 2637


Wrap-up

New methods for detecting cuts and fades with high precision have been described.

Accurate detection of fade borders (starting and ending point) has been achieved.

Comparisons with other methods demonstrate the accuracy/success of the proposed techniques.

Satisfactory results for key frame selection by performing clustering on the mutual information series have been reported.


MPEG-7: Standard for Multimedia Information Systems

• Introduction
• Applications
• Standard
• Description elements
• Visual structural elements
• Description schemes for still images and video
• Wrap-up


Introduction

Aim: description of audiovisual content – it is not about compression.

MPEG-7 annotates:
– data in
• MPEG-4 object-based representations (interactive representations)
• MPEG-2 and MPEG-1 (frame-based encoding of waveforms)
– analog data (e.g., VHS)
– photo prints
– artistic pictures

Its main elements:
– Descriptors
– Description Schemes
– Description Definition Language


Applications

Provides a generic description of audiovisual and multimedia content for:
– systematic access to audiovisual information sources
– re-usability of descriptions and annotations
– management and linking of content, events, and user interaction

(Jens-Rainer Ohm, HHI)


Standard

MPEG-7 consists of:
• Descriptors (D) with Descriptor Values (DV)
• Description Schemes (DS)
• Description Definition Language (DDL)

(Jens-Rainer Ohm, HHI)


Description elements

Structural (can be extracted automatically):
• signal-based features
• regions and segments

Semantic/Conceptual (mostly manual annotation):
• objects
• scenes
• events

Metadata (manual or non-signal-based annotation):
• acquisition & production
• high-level content description
• intellectual property, usage


Visual structural elements

Examples of low-level visual features: color, texture, shape, motion

Examples of MPEG-7 visual descriptors:
• Color: color histogram, dominant color
• Texture: frequency layout, edge histogram
• Shape: Zernike moments, curvature peaks
• Motion: motion trajectory, parametric motion

Examples of MPEG-7 Visual Description Schemes: still region, moving region, video segment


Description Schemes

Layouts for description schemes:
• hierarchical (tree)
• relational (entity-relationship graph)


Still Region Description Scheme


Video Sequence Description Scheme


Description Definition Language

Based on the Extensible Markup Language (XML Schema)


Wrap-up

MPEG-7: a generic description interface for audiovisual and multimedia content – a key technology.

MPEG-7 can be used for:
• search/filtering and manipulation of audiovisual information
• multimedia browsing and navigation
• data organization, archiving, and authoring
• interpretation and understanding of multimedia content


Conclusions

• Overview of fundamentals for information retrieval
• Focus on segmentation and its assessment
• Shape retrieval based on Hausdorff distance
• Video summarization

Acknowledgments: I. Pitas, E. Pranckeviciene, Z. Chernekova, C. Nikou, and P. Rotter.