
Aristotle University of Thessaloniki Department of Informatics


Page 1

Visual Information Retrieval

Constantine Kotropoulos
Aristotle University of Thessaloniki, Department of Informatics

Monday, July 8, 2002

Page 2

Outline

– Fundamentals
– Still image segmentation: comparison of ICM and LVQ techniques
– Shape retrieval based on the Hausdorff distance
– Video summarization: detecting shots, cuts, and fades in video; selection of key frames
– MPEG-7: standard for multimedia applications
– Conclusions

Page 3

Fundamentals

– About
– Toward visual information retrieval
– Data types associated with images or video
– First generation systems
– Second generation systems
– Content-based interactivity
– Representation of visual content
– Similarity models
– Indexing methods
– Performance evaluation

Page 4

About

– Visual information retrieval: to retrieve images or image sequences that are relevant to a query from a database. It extends traditional information retrieval to include visual media.
– Needs: tools and interaction paradigms that permit searching for visual data by referring directly to its content.
  • Visual elements (color, texture, shape, spatial relationships) related to perceptual aspects of image content.
  • Higher-level concepts: clues for retrieving images with similar content from a database.
– Multidisciplinary field: information retrieval; image/video analysis and processing; visual data modeling and representation; pattern recognition; multimedia database organization; computer vision; user behavior modeling; multidimensional indexing; human-computer interaction.

Page 5

Toward visual information retrieval

– Databases allow a large amount of alphanumeric data to be stored in a local repository and accessed by content through appropriate query languages.
– Information retrieval systems provide access to unstructured text documents: search engines working in the textual domain using either keywords or full text.
– The need for visual information retrieval systems became apparent when
  • digital archives were released,
  • distribution of image and video data through large-bandwidth computer networks emerged,
and it becomes more prominent as we progress to the wireless era!

Page 6

Query by image content using the NOKIA 9210 Communicator

www.iva.cs.tut.fi/COST211

Iftikhar et al.

Page 7

Data types associated with images or video

– Content-independent metadata: data related in some way to the image/video but not to its content (e.g., format, author's name, date).
– Content-dependent metadata:
  • low/intermediate-level features: color, texture, shape, spatial relationships, motion, etc.
  • data referring to content semantics (content-descriptive metadata)
– The metadata type has an impact on the internal organization of the retrieval system.

Page 8

First generation systems

– Answer queries such as: find all images of paintings by El Greco; find all Byzantine icons dated from the 13th century; etc.
– Content-independent metadata: alphanumeric strings.
– Representation schemes: relational models, frame models, object-oriented models.
– Content-dependent metadata: annotated keywords or scripts.
– Retrieval: search engines working in the textual domain (SQL, full-text retrieval).
– Examples: PICDMS (1984), PICQUERY (1988), etc.
– Drawbacks:
  • text has difficulty capturing the distinctive properties of visual features
  • text is not appropriate for modeling perceptual similarity
  • annotation is subjective

Page 9

Second generation systems

– Support full retrieval by visual content:
  • conceptual level: keywords
  • perceptual level: objective measurements at the pixel level
  • other sensory data (speech, sound) may help (e.g., in video streams)
– Image processing, pattern recognition, and computer vision are an integral part of the architecture and operation.
– Retrieval systems for: 2-D still images; video; 3-D images and video; the WWW.

Page 10

Retrieval systems for 2-D still images (1)

– Content:
  • perceptual properties: color, texture, shape, and spatial relationships
  • semantic primitives: objects, roles, and scenes
  • impressions, emotions, and meaning associated with the combination of perceptual features
– Basic retrieval paradigm: a set of descriptive features is pre-computed for each image.
– Queries by visual example:
  • the user selects the features and ranges of model parameters, and chooses a similarity measure
  • the system checks the similarity between the visual content of the user's query and the database images
– Objective: to keep the number of misses as low as possible. What about the number of false alarms?
– Interaction: relevance feedback.

Page 11

Retrieval systems for 2-D still images (2)

– Similarity vs. matching:
  • Matching is a binary partition operator: "Does the observed object correspond to a model or not?" Uncertainties are managed during the process.
  • Similarity-based retrieval re-orders the database of images according to how similar they are to a query example: ranking, not classification.
– The user is in the retrieval loop; there is a need for a flexible interface.

Page 12

Retrieval systems for video (1)

– Video conveys information on multiple planes of communication:
  • how the frames are linked together using editing effects (cuts, fades, dissolves, etc.)
  • what is in the frames (characters, story content, etc.)
– Each type of video (commercials, news, movies, sport) has its own peculiar characteristics.
– Basic terminology:
  • Frame: basic unit of information, usually sampled at 1/25 or 1/30 of a second
  • Shot: a set of frames between a camera turn-on and a camera turn-off
  • Clip: a set of frames with some semantic content
  • Episode: a hierarchy of shots
  • Scene: a collection of consecutive shots that share simultaneity in space, time, and action (e.g., a dialog scene)
– Video is accessed through browsing and navigation.

Page 13

Retrieval systems for video (2)

Page 14

Retrieval systems for 3-D images and video / WWW

– 3-D images and video are available in biomedicine, computer-aided design, geographic maps, painting, and the games and entertainment industry (immersive environments). They are expected to flourish in the current decade.
– Retrieval on the WWW:
  • a distributed problem
  • need for standardization (MPEG-7)
  • response time is critical (work in the compressed domain, summarization)

Page 15

Research directions

1. Visual interfaces
2. Standards for content representation
3. Database models
4. Tools for automatic extraction of features from images and video
5. Tools for extraction of semantics
6. Similarity models
7. Effective indexing
8. Web search and retrieval
9. Role of 3-D

Page 16

Content-based interactivity

– Browsing offers a panoramic view of the visual information space.
– Visualization.

www.virage.com

Page 17

QBIC

http://wwwqbic.almaden.ibm.com/

color layout

Page 18

Querying by content (1)

For still images:
– To check whether the concepts expressed in a query match the concepts of database images:
  • "find all holy icons with a nativity", "find all holy icons with Saint George" (object categories)
  • treated with free-text or SQL-based retrieval engines (Google)
– To verify spatial relations between spatial entities:
  • "find all images with a car parked outside a house"
  • topological queries (disjunction, adjacency, containment, overlapping); metric queries (distances, directions, angles)
  • treated with SQL-like spatial query languages

Page 19

Querying by content (2)

– To check the similarity of perceptual features (color, texture, edges, corners, and shapes):
  • exact queries: "find all images of President Bush"
  • range queries: "find all images with colors between green and blue"
  • K-nearest-neighbor queries: "find the ten most similar images to the example"

For video:
– concepts related to video content
– motion, objects, texture, and color features of video: shot extraction, dominant colors, etc.

Page 20

Google

Page 21

Ark of Refugee Heirloom

www.ceti.gr/kivotos

Page 22

Querying by visual example

– Suited to express perceptual aspects of low/intermediate-level features of visual content.
– The user provides a prototype image as a reference example.
– Relevance feedback: the user analyzes the responses of the system and indicates, for each item retrieved, the degree of relevance or the exactness of the ranking; the annotated results are fed back into the system to refine the query.
– Types of querying:
  • iconic (PN): suitable for retrieval based on high-level concepts
  • by painting: employed in color-based retrieval (NETRA)
  • by sketch (PICASSO)
  • by image (NETRA)

Page 23

PICASSO/PN

http://viplab.dsi.unifi.it/PN/

Page 24

NETRA

http://maya.ece.ucsb.edu/Netra/netra.html

Page 25

Representation of visual content

– Representation of the perceptual features of images and video is a fundamental problem in visual information retrieval.
– Image analysis and pattern recognition algorithms provide the means to extract numeric descriptors.
– Computer vision enables object and motion identification.
– Representation of perceptual features: color, texture, shape, structure, spatial relationships, motion.
– Representation of content semantics: semantic primitives, semiotics.

Page 26

Representation of perceptual features: Color (1)

Page 27

Representation of perceptual features: Color (2)

– Human visual system: the cones are responsible for color perception.
– From a psychological point of view, the perception of color is related to several factors, e.g., color attributes (brightness, chromaticity, saturation), surrounding colors, color spatial organization, and the observer's memory/knowledge/experience.
– Geometric color models (RGB, HSV, Lab, etc.).
– Color histogram: describes the low-level color properties.

Page 28

Image retrieval by color similarity (1)

– Color spaces.
– Histograms; moments of the distribution.
– Quantization of the color space.
– Similarity measures: the L1 and L2 norms of the difference between the query histogram H(IQ) and the histogram of a database image H(ID).

Page 29

Image retrieval by color similarity (2)

– histogram intersection
– weighted Euclidean distance
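The measures named on these two slides can be sketched directly for normalized color histograms; this is an illustrative implementation, not code from the talk, and the example histograms are made up:

```python
# Color-histogram similarity measures: L1/L2 norms of the histogram
# difference, histogram intersection, and a weighted Euclidean distance.

def l1_distance(hq, hd):
    """L1 norm of the difference between H(IQ) and H(ID)."""
    return sum(abs(q - d) for q, d in zip(hq, hd))

def l2_distance(hq, hd):
    """L2 (Euclidean) norm of the histogram difference."""
    return sum((q - d) ** 2 for q, d in zip(hq, hd)) ** 0.5

def histogram_intersection(hq, hd):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(q, d) for q, d in zip(hq, hd))

def weighted_euclidean(hq, hd, w):
    """Euclidean distance with per-bin weights w."""
    return sum(wi * (q - d) ** 2 for wi, q, d in zip(w, hq, hd)) ** 0.5

hq = [0.5, 0.3, 0.2]  # query histogram H(IQ), 3 color bins
hd = [0.4, 0.4, 0.2]  # database histogram H(ID)
print(l1_distance(hq, hd), histogram_intersection(hq, hd))
```

With unit weights, the weighted Euclidean distance reduces to the plain L2 norm.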

Page 30

Representation of perceptual features: Texture (1)

– Texture: one level of abstraction above pixels.
– Perceptual texture dimensions: uniformity, density, coarseness, roughness, regularity, linearity, directionality/direction, frequency, phase.

Brodatz album

Page 31

Representation of perceptual features: Texture (2)

– Statistical methods:
  • autocorrelation function (coarseness, periodicity)
  • frequency content [rings, wedges]: coarseness, directionality, isotropic/non-isotropic patterns
  • moments
  • directional histograms and related features
  • run-lengths and related features
  • co-occurrence matrices
– Structural methods (grammars and production rules).
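One of the statistical methods listed above, the co-occurrence matrix, can be sketched in a few lines. This is an illustrative implementation for a single pixel offset with a contrast feature derived from it; the tiny image is made up:

```python
# Gray-level co-occurrence matrix for offset (dr, dc), plus a contrast
# feature: (i-j)^2 weighted by the normalized pair counts.

def cooccurrence(img, dr, dc, levels):
    """Count pixel pairs (img[r][c], img[r+dr][c+dc]) over the image."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r][c]][img[r2][c2]] += 1
    return m

def contrast(m):
    """High for textures with frequent large gray-level jumps."""
    total = sum(sum(row) for row in m)
    n = len(m)
    return sum((i - j) ** 2 * m[i][j] for i in range(n) for j in range(n)) / total

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 3, 3],
       [2, 2, 3, 3]]
glcm = cooccurrence(img, 0, 1, levels=4)  # horizontal neighbor pairs
```

Different offsets (horizontal, vertical, diagonal) capture the directionality dimension of the texture.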

Page 32

Representation of perceptual features: Shape (1)

– Criteria of a good shape representation:
  • each shape possesses a unique representation, invariant to translation, rotation, and scaling
  • similar shapes should have similar representations
– Methods to extract shapes and derive features stem from image processing:
  • chain codes
  • polygonal approximations
  • skeletons
  • boundary descriptors: contour length/diameter, shape numbers
  • Fourier descriptors
  • moments

Page 33

Representation of perceptual features: Shape (2)

Chain codes

Polygonal approximation

(I. Pitas)

Page 34

Representation of perceptual features: Shape (3)

Face segmentation: (a) original color image, (b) skin segmentation, (c) connected components, (d) best-fit ellipses.

Page 35

Representation of perceptual features: Structure / Spatial relationships

– Structure: provides a Gestalt impression of the shapes in the image (set of edges, corners).
  • To distinguish photographs from drawings.
  • To classify scenes: portrait, landscape, indoor.
– Spatial relationships:
  • spatial entities: points, lines, regions, and objects
  • relationships: directional (include a distance/angle measure) and topological (do not include distance, but capture set-theoretical concepts, e.g., disjunction)
  • they are represented symbolically.

Page 36

Representation of perceptual features: Motion

– The main characterizing element in a sequence of frames.
– Related to a change in the relative position of spatial entities or to a camera movement.
– Methods:
  • detection of temporal changes of gray-level primitives (optical flow)
  • extraction of a set of sparse characteristic features of the objects, such as corners or salient points, and their tracking in subsequent frames
– Plays a crucial role in video.

Salient features (Kanade et al.)

Page 37

Representation of content semantics: Semantic primitives

– Identification of objects, roles, actions, and events as abstractions of visual signs.
– Achieved through recognition and interpretation:
  • recognition: select a set of low-level local features and use statistical pattern recognition for object classification
  • interpretation: based on reasoning; domain-dependent, e.g., Photobook (www-white.media.mit.edu)
– Retrieval systems including interpretation: facial database systems to compare facial expressions.

Page 38

Representation of content semantics: Semiotics

– A grammar of color usage formalizes effects: association of color hue, saturation, etc. with psychological behaviors.
– Semiotics identifies two distinct steps for the production of meaning:
  • abstract level, by narrative structures (e.g., camera breaks, colors, editing effects, rhythm, shot angle)
  • concrete level, by discourse structures: how the narrative elements create a story.

Page 39

Similarity models

– Pre-attentive: perceived similarity between stimuli; color/texture/shape; models close to human perception.
– Attentive: interpretation; previous knowledge and a form of reasoning; domain-specific retrieval applications (mugshots); need for models and the definition of similarity criteria.

Page 40

Metric model (1)

– Distance in a metric psychological space.
– Properties of a distance function d: d(x,x) = 0; d(x,y) > 0 for x ≠ y; symmetry d(x,y) = d(y,x); triangle inequality d(x,z) ≤ d(x,y) + d(y,z).
– Commonly used distance functions: Euclidean, city-block, Minkowski.

Page 41

Metric model (2)

– Inadequacies: shape similarity.
– Advantages:
  • similarity judgments of color stimuli
  • consistent with pattern recognition and computer vision
  • suitable for creating indices
– Other similarity models:
  • virtual metric spaces
  • Tversky's model: a function of two types of features, those common to the two stimuli and those that appear exclusively in only one stimulus
  • transformational distances: elastic graph matching
– User subjectivity?

Page 42

Four Eyes approach

– Self-improving database browser and annotator based on user interaction.
– Similarity is presented with groupings.
– The system chooses, in tree hierarchies, those nodes which most efficiently represent the positive examples.
– A set-covering algorithm removes all covered positive examples.
– Iterations.
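The set-covering step above can be sketched with the standard greedy heuristic: repeatedly pick the grouping (tree node) that covers the most not-yet-covered positive examples. The node names and example sets below are illustrative, not from the system:

```python
# Greedy set cover: choose groupings until every positive example
# is covered (or no grouping covers anything that remains).

def greedy_set_cover(positives, groupings):
    """Return the names of the groupings chosen, in selection order."""
    uncovered = set(positives)
    chosen = []
    while uncovered:
        # node whose member set removes the most remaining positives
        name, members = max(groupings.items(),
                            key=lambda kv: len(uncovered & kv[1]))
        if not uncovered & members:
            break  # remaining positives cannot be covered by any node
        chosen.append(name)
        uncovered -= members
    return chosen

groupings = {
    "node_a": {1, 2, 3},
    "node_b": {3, 4},
    "node_c": {4, 5},
}
print(greedy_set_cover({1, 2, 3, 4, 5}, groupings))  # ['node_a', 'node_c']
```

Greedy selection is not always optimal, but it gives the classical logarithmic approximation guarantee for set cover.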

Page 43

Indexing methods (1)

– To avoid sequential scanning.
– Retrieved images are ranked in order of similarity to a query.
– Compound measure of similarity between visual features and text attributes.
– Indexing of string attributes; commonly used indexing techniques:
  • hashing tables and signatures
  • cosine similarity function

Page 44

Indexing methods (2)

– Triangle inequality (Barros et al.): the distance d(i,r) of every database item i to a reference item r is precomputed; the triangle inequality gives the lower bound d(q,i) ≥ |d(q,r) − d(i,r)|.
– When the query item q is presented, d(q,r) is computed.
– Maximum threshold l = d(q,r); r is the most similar item so far.
– Search the items whose distances d(i,r) are closest to d(q,r). If an item i more similar to q is found, it is regarded as the most similar item, and l is updated accordingly.
– Continue until |d(i,r) − d(q,r)| > l.
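A minimal sketch of this triangle-inequality filtering, not the exact algorithm of Barros et al.: with d(i,r) precomputed, |d(q,r) − d(i,r)| lower-bounds d(q,i), so items whose bound exceeds the current best distance need no exact distance computation. The feature vectors and reference item are illustrative:

```python
# Nearest-neighbor search with triangle-inequality pruning against
# a single reference item r.

def d(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_with_pruning(query, items, ref, d_to_ref):
    """Return (index of nearest item, number of exact distances computed)."""
    dqr = d(query, ref)
    best_i, best_d, computed = None, float("inf"), 0
    # visit items in order of |d(i,r) - d(q,r)|, most promising first
    order = sorted(range(len(items)), key=lambda i: abs(d_to_ref[i] - dqr))
    for i in order:
        if abs(d_to_ref[i] - dqr) > best_d:
            break  # lower bound already exceeds the best exact distance
        di = d(query, items[i])
        computed += 1
        if di < best_d:
            best_i, best_d = i, di
    return best_i, computed

items = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
ref = (0.0, 0.0)
d_to_ref = [d(it, ref) for it in items]  # precomputed offline
print(nearest_with_pruning((1.2, 1.0), items, ref, d_to_ref))
```

Here the exact distance is computed for only one of the four items; the pruning never misses the true nearest neighbor because the bound is valid for any metric.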

Page 45

Index structures

– Fixed grids: non-hierarchical index structure that organizes the space into buckets.
– Grid files: fixed grids with buckets of unequal size.
– K-d trees: binary trees; the value of one of the k features is checked at each node.
– R-trees: partition the feature space into multidimensional rectangles.
– SS-trees: weighted Euclidean distance; suitable for clustering; ellipsoidal clusters.
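The simplest structure above, the fixed grid, can be sketched in a few lines: feature vectors are hashed into equal-sized square buckets, and a query inspects only its own bucket and the neighboring ones. The cell size and points are illustrative:

```python
# Fixed-grid index over 2-D feature vectors.

from collections import defaultdict

def build_grid(points, cell):
    """Map each point index into its (cell_x, cell_y) bucket."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)
    return grid

def candidates(grid, q, cell):
    """Indices stored in the query's bucket and the 8 surrounding buckets."""
    cx, cy = int(q[0] // cell), int(q[1] // cell)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(grid[(cx + dx, cy + dy)])
    return sorted(out)

points = [(0.5, 0.5), (1.5, 0.5), (7.0, 7.0)]
grid = build_grid(points, cell=1.0)
print(candidates(grid, (0.9, 0.9), cell=1.0))  # [0, 1]
```

Only the returned candidates need an exact distance computation; the far-away point at (7, 7) is never touched. Grid files and the tree structures above address the weakness of this scheme, namely badly balanced buckets under skewed data.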

Page 46

Performance evaluation

Judgment by evaluator:

                  Relevant                   Not relevant
Retrieved         A (correctly retrieved)    C (falsely retrieved)
Not retrieved     B (missed)                 D (correctly rejected)

recall = A / (A + B)        precision = A / (A + C)
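The two measures follow directly from the four counts in the table; the counts in the example are made up:

```python
# Recall and precision from the contingency table: A correctly retrieved,
# B missed, C falsely retrieved (D, correctly rejected, is not needed).

def recall(a, b):
    """Fraction of relevant items that were retrieved: A / (A + B)."""
    return a / (a + b)

def precision(a, c):
    """Fraction of retrieved items that are relevant: A / (A + C)."""
    return a / (a + c)

# e.g., 8 relevant items retrieved, 2 relevant items missed, 4 false alarms
print(recall(8, 2), precision(8, 4))
```

Keeping misses (B) low raises recall; keeping false alarms (C) low raises precision, which is exactly the trade-off raised on the earlier slide about 2-D still image retrieval.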

Page 47

Wrap-up

– Visual information retrieval is a research topic at the intersection of digital image processing, pattern recognition, and computer vision (fields of our interest/expertise), but also of information retrieval and databases. Related to the semantic web.
– A challenging research topic dealing with many unsolved problems: segmentation; machine similarity vs. human perception; focused searching.

Page 48

Still image segmentation: comparison of ICM and LVQ

– Comparison of:
  • Iterated Conditional Modes (ICM)
  • split-and-merge Learning Vector Quantizer (LVQ)
– Ability to extract meaningful image parts based on the ground truth.
– Evaluation of still image segmentation algorithms.

Page 49

Iterated Conditional Modes (ICM)

The ICM method is based on the maximization of the probability density function of the image model given real image data. The criterion function is

p(x_s = i | y_s, x_q, q ∈ N_8(s)) ∝ (1 / √(2πδ_i²)) exp( −(y_s − m_i)² / (2δ_i²) − Σ_C V_C(x) ),

where x_s is the region assignment and y_s the luminance value of pixel s; m_i and δ_i are the mean value and the standard deviation of the luminance of region i; C is a clique containing pixel s and V_C(x) its potential function; N_8(s) is the 8-neighborhood of pixel s.

Page 50

How ICM works

– The initial segmentation is obtained using the K-means clustering algorithm; cluster-center initialization is based on the image intensity histogram.
– At each iteration, the probability (the value of the criterion function) is calculated for each pixel. Pixels are assigned to the cluster-regions with maximum probability.
– Given the new segmentation, the mean intensity value and the cluster variance are re-estimated. The iterative process stops when no change occurs in the clusters.
– In the obtained segmentation, small regions are merged with their nearest ones. The output image contains the large regions, each assigned its mean luminance value.
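The iteration above can be sketched on a toy gray-level image. This is a minimal illustration, not the talk's implementation: labels start from a nearest-mean split (standing in for the K-means initialization), fixed means and variance replace the re-estimation step, and the clique potential is simplified to a Potts-style penalty beta per disagreeing 8-neighbor:

```python
# Minimal ICM sketch: each pixel is reassigned to the label minimizing
# a Gaussian data term plus beta times the number of 8-neighbors
# holding a different label.

def icm_segment(img, means, beta=0.5, iters=10, var=100.0):
    rows, cols = len(img), len(img[0])
    # initial labeling: nearest cluster mean
    lab = [[min(range(len(means)), key=lambda i: (img[r][c] - means[i]) ** 2)
            for c in range(cols)] for r in range(rows)]
    for _ in range(iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                def cost(i):
                    data = (img[r][c] - means[i]) ** 2 / (2.0 * var)
                    disagree = sum(
                        1 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                        if (dr or dc) and 0 <= r + dr < rows
                        and 0 <= c + dc < cols and lab[r + dr][c + dc] != i)
                    return data + beta * disagree
                best = min(range(len(means)), key=cost)
                if best != lab[r][c]:
                    lab[r][c], changed = best, True
        if not changed:
            break  # no pixel changed label: converged
    return lab

# toy image: dark region (around 10) and bright region (around 200)
img = [[10, 10, 12, 200], [11, 10, 198, 205], [10, 205, 200, 199]]
labels = icm_segment(img, means=[10.0, 200.0])
```

The beta parameter plays the role of the potential-function value on the next slide: larger beta smooths the segment boundaries more aggressively.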

Page 51

Image features and parameters of the ICM algorithm

– The ICM algorithm is applied on the luminance component of the image; the input is a gray-level image.
– The parameter of the algorithm is the value of the potential function.
– The parameter controls the roughness of the segment boundaries; its value is tuned experimentally.

Page 52

Segmentation results (ICM)

Page 53

Learning Vector Quantizer (1)

– a neural network
– self-organizing
– competitive learning law
– unsupervised
– approximates the data pdf by adjusting the weights of the reference vectors

Page 54

Learning Vector Quantizer (2)

– codebook: reference vectors representing their nearest data patterns
– number of reference vectors:
  • predefined, or
  • determined by split and merge

Page 55

Learning Vector Quantizer (3)

Minimal error for data representation:

E = ∫ ‖x − w_c‖^r f(x) dx

Iterative correction of the reference vectors:

w_c(k+1) = w_c(k) + a(k) [x(k) − w_c(k)]
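The correction rule above can be sketched as a competitive update: for each pattern x(k), only the winning (nearest) reference vector w_c moves toward it by a fraction a(k) of the difference. The 1-D data, fixed learning rate, and initialization are illustrative:

```python
# One pass of unsupervised competitive learning over the data:
# the winner w_c is updated as w_c <- w_c + a * (x - w_c).

def lvq_pass(data, codebook, rate):
    for x in data:
        c = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)
        codebook[c] += rate * (x - codebook[c])  # move the winner toward x
    return codebook

data = [0.9, 1.1, 1.0, 4.9, 5.1, 5.0]  # two clusters around 1.0 and 5.0
codebook = [0.0, 6.0]                  # initial reference vectors
for epoch in range(50):
    lvq_pass(data, codebook, rate=0.1)
print(codebook)  # the reference vectors settle near the cluster means
```

In practice a(k) is decreased over time so the codebook converges; the split-and-merge variant on the next slide additionally adapts the number of reference vectors.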

Page 56

Learning Vector Quantizer (4)

Split-and-merge technique:
– Find the winner reference vector w(k) for pattern x(k).
– If x(k) is not an outlier, proceed as in standard LVQ.
– If x(k) is an outlier, either
  • split the cluster and include x(k) in one of the sub-clusters, or
  • create a new cluster having seed x(k).

Page 57

Learning Vector Quantizer (5)

Page 58

Experimental set-up (1)

– Apply both methods on images provided by BAL (Bridgeman Art Library).
– Explore the ability of the algorithms to extract meaningful image parts based on the qualitative description of the ground truth.

Page 59

Paintings from the Bridgeman Art Library

– sky, mountains, people, water (smpw)
– hammerhead cloud, reflection (cr)
– sky, buildings, trees, people, pavement (sbtpp)
– sky, people, hat (sph)
– sky, trees, water, sails (stws)
– horses, sledges, people, snow, sky (hspss)

Page 60

Experimental set-up (2)

– Let O = {O1, …, OM} denote the set of objects given in the qualitative description of the ground truth, where M is the number of objects.
– Let T = {T1, …, TN} denote the set of uniquely labeled regions obtained in the segmented image, where N is the number of regions.
– Three cases are possible for the outcome of the segmentation as compared to the ground truth.

Page 61

Matching

– Case 1, best match (BM): the region of the segmented image has a one-to-one correspondence with the ground-truth object.
– Case 2, reasonable match (RM): the ground-truth object has a one-to-many correspondence with the regions of the segmented image.
– Case 3, mismatch (MM): there is no correspondence between the ground-truth objects and the regions of the segmented image.

Page 62

Three cases

For the j-th ground-truth object O_j, denoting the case by i and the segmented regions by T, the three cases occur as follows:

i = 1, when O_j corresponds to a single region T_k (best match),
i = 2, when O_j corresponds to several regions T_k, …, T_l (reasonable match),
i = 3, when O_j corresponds to no region (mismatch),

with k = 1, …, N.

Page 63

Decision

The decision about the presence of the ground-truth object O_j in the segmented image according to the cases is:

r_ij = 1, if O_j falls in case i,
r_ij = 0, otherwise.

We put a decision for each object after visual examination of the segmented image according to the definition of the ground truth.
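The per-image BM/RM/MM scores reported later in the talk are the fractions of ground-truth objects falling in each case; a minimal sketch of that tally, with the decisions given as one case label per ground-truth object:

```python
# Fraction of ground-truth objects in each matching case for one image.

def case_fractions(decisions):
    n = len(decisions)
    return {case: sum(1 for d in decisions if d == case) / n
            for case in ("BM", "RM", "MM")}

# e.g., 4 ground-truth objects: three reasonably matched, one missed
print(case_fractions(["RM", "RM", "RM", "MM"]))
```

With these decisions the fractions are BM 0, RM 0.75, MM 0.25, i.e., the pattern of the smpw row in the ICM results table.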

Page 64

Assessment of results (1)

Ground truth: sky, buildings, trees, people, pavement

Page 65

Assessment of results (2)

LVQ

Page 66

Assessment of results (3)

ICM

Page 67

Assessment of results (4)

Ground truth: horses, sledges, people, snow, sky

Page 68

Assessment of results (5)

LVQ

Page 69

Assessment of results (6)

ICM

Page 70

Assessment of results (7)

Number of regions

Image    Ground truth    ICM    LVQ
smpw     5               7      7
cr       4               10     8
sbtpp    5               13     8
sph      3               14     8
stws     4               15     11
hspss    3               5      8

Page 71

Assessment of results (8)

Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch)

         ICM                     LVQ
Image    BM     RM     MM        BM     RM     MM
smpw     0      0.75   0.25      0.5    0.5    0
cr       0      0.75   0.25      0.5    0.5    0
sbtpp    0.4    0.4    0.2       0.2    0.8    0
sph      0      1      0         0      1      0
stws     0.5    0.25   0.25      0.25   0.75   0
hspss    0.33   0      0.66      0      1      0

Page 72

Assessment of results (9)

Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch)

          ICM                     LVQ
          BM     RM     MM        BM     RM     MM
Average   0.20   0.53   0.27      0.24   0.76   0

Page 73

Evaluation of Image Segmentation Algorithms (1)

Cián Shaffrey, Univ. of Cambridge

Page 74

Evaluation of Image Segmentation Algorithms (2)

– Evaluation within the semantic space: it is impossible to ask the average user to provide all possible h.
– Compromise: evaluation in the indexing space allows us to access S without explicitly defining σ.
– Average user: to achieve a consensus on h.
– Ask users to evaluate two proposed arrows π to obtain the average user's response; this implicitly characterizes h and σ.

Page 75

Evaluation of Image Segmentation Algorithms (3)

Unsupervised algorithms:

1. Multiscale Image Segmentation (UCAM-MIS)
2. Blobworld (UC Berkeley-Blobworld)
3. Iterated Conditional Modes (AUTH-ICM)
4. Learning Vector Quantizer (AUTH-LVQ)
5. Double Markov Random Field (TCD-DMRF)
6. Complex Wavelet based Hidden Markov Tree (UCAM-CHMT)

Page 76

Evaluation of Image Segmentation Algorithms (4)

– Hard measurements.
– Soft measurements: the speed of the user's response (time⁻¹), i.e., how strongly the user prefers one scheme over the other.
  • A faster response: the selected scheme provides a better semantic breakdown of the original image.
  • A slower response: reflects the similarity of the two schemes.
– Aims:
  • to determine whether or not agreement exists in users' decisions
  • do the pairwise rankings lead to consistent total orderings?
  • do hard and soft measurements coincide?

Page 77

Evaluation of Image Segmentation Algorithms (5)

Cián Shaffrey, Univ. of Cambridge


Cián Shaffrey, Univ. of Cambridge

Evaluation of Image Segmentation Algorithms (6)


Wrap-up

• ICM: continuous, large-sized regions; appropriate for homogeneous regions
• LVQ: spatially connected, small regions; more detailed segmentation
• Both provide good RM


Image retrieval based on Hausdorff distance

• Hausdorff distance definition
• Advantages
• How to speed up the computations
• Experiments


Hausdorff distance definition

dH+(A,B) = sup {d(x,B) : x ∈ A}

dH−(A,B) = sup {d(y,A) : y ∈ B},

where d(v,W) = inf {d(v,w) : w ∈ W}.

dH(A,B) = max (dH+(A,B), dH−(A,B))
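The definition above maps directly to code. A minimal NumPy sketch for finite 2-D point sets (function names are illustrative, not from the original work):

```python
import numpy as np

def directed_hausdorff(A, B):
    """d_H+(A, B) = sup over x in A of d(x, B), where d(x, B) = inf over w in B of d(x, w)."""
    # All pairwise Euclidean distances between the two point sets (shape m x n).
    diff = A[:, None, :] - B[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # For each point of A, the distance to its nearest point of B; then take the sup.
    return float(dists.min(axis=1).max())

def hausdorff(A, B):
    """d_H(A, B) = max(d_H+(A, B), d_H-(A, B))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

Note the asymmetry of the directed distance, which is exactly what makes dH+ and dH− separately useful for partially obscured objects.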


Hausdorff distance advantages

dH(A, B) = 0 ⇔ A = B (A, B – sets representing graphical objects, object contours, etc.)

Information about parameters of transformation (complex object recognition)

Predictable – simple intuitive interpretation

dH+ and dH− – useful for partially obscured or erroneously segmented objects

Possibility of generalization: replacing the max with quantiles

Possibility of taking into consideration any object transformations


How to speed up the computations for comparing one pair (1)

A. Replacing objects by their contours

The HD between the objects may be large although the HD between their contours is small (e.g., a disk and a ring) → possibility of false alarms,

but

contours of similar objects are always similar (small HD) → no possibility of omitting similar objects


How to speed up the computations for comparing one pair (2)

B. Voronoi diagram or distance transform

C. Early scan termination

D. Pruning some parts of transformation space


How to speed up the computations – number of models considered

Idea:

Matrix of distances for models (every pair)

1. Pruning some models (we know in advance they will not match the query)

2. Database navigation: optimal search order (possibility of early termination)


How to speed up the computations

A. Excluding a model object from the search

(Figure: query and ref – any model object – together with the distance to the closest model found so far; the model closest to the query object may lie only in the colored area.)
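Because the Hausdorff distance is a metric, the triangle inequality gives |d(query, ref) − d(model, ref)| ≤ d(query, model). A hypothetical sketch of the exclusion rule (names are illustrative; `d_models_ref` stands for the precomputed off-line distance matrix column for one reference object):

```python
def prune_models(d_query_ref, d_models_ref, best_so_far):
    """Indices of models that may still beat the current best match.

    A model is kept only if its triangle-inequality lower bound
    |d(query, ref) - d(model, ref)| does not exceed best_so_far;
    all other models cannot be closer to the query and are skipped
    without ever computing their Hausdorff distance.
    """
    return [i for i, d_mr in enumerate(d_models_ref)
            if abs(d_query_ref - d_mr) <= best_so_far]
```

With several reference objects, the surviving index sets are intersected, tightening the pruning further.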


How to speed up the computations

B. Pruning with many reference objects


How to speed up the computations

C. Optimal searching order


How to speed up the computations

D. Introducing other criteria (pre-computation)

Moment invariants:

• M1 = (M20 + M02) / m00²
• M2 = (M20·M02 − M11²) / m00⁴

where, for an image f with pixel values f(i,j) and object centroid (i0, j0),

Mpq = Σi Σj (i − i0)^p (j − j0)^q f(i,j)   (central moments)
mpq = Σi Σj i^p j^q f(i,j)                 (raw moments)

Shape coefficients:

• Blair–Bliss coefficient: W_BB = S / √(2π ∫∫ r² ds), where S is the object area and r is the distance of a point of the object from its centroid
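The two moment invariants follow directly from the definitions above. A sketch for a discrete gray-scale or binary image f (the function name is illustrative):

```python
import numpy as np

def moment_invariants(f):
    """Translation-invariant, size-normalised moments M1, M2 of image f."""
    f = np.asarray(f, dtype=float)
    I, J = np.indices(f.shape)                         # pixel coordinate grids
    m00 = f.sum()                                      # raw moment m00 (area/mass)
    i0, j0 = (I * f).sum() / m00, (J * f).sum() / m00  # object centroid
    M20 = (((I - i0) ** 2) * f).sum()                  # central moments
    M02 = (((J - j0) ** 2) * f).sum()
    M11 = ((I - i0) * (J - j0) * f).sum()
    M1 = (M20 + M02) / m00 ** 2
    M2 = (M20 * M02 - M11 ** 2) / m00 ** 4
    return M1, M2
```

Since the moments are central and normalised by powers of m00, the pair (M1, M2) is unchanged when the object is translated within the image, which is what makes it usable as a precomputed auxiliary criterion.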


Experiments – database

Database: 76 islands, represented as *.bmp images



Experiment 1: map query

Image retrieval. Step 1: interactive segmentation of query object


Experiment 1: map query

Searching order: 8 / 76 model objects were checked

Loading model 1 / 76: "amorgos.bmp" – Hausdorff distance: 0.156709
Loading model 42 / 76: "ithaca.bmp" – Hausdorff distance: 0.143915
Loading model 27 / 76: "ikaria.bmp" – Hausdorff distance: 0.080666
Loading model 31 / 76: "kasos.bmp" – Hausdorff distance: 0.080551
Loading model 20 / 76: "sikinos.bmp" – Hausdorff distance: 0.121180
Loading model 52 / 76: "alonissos.bmp" – Hausdorff distance: 0.153914
Loading model 17 / 76: "rithnos.bmp" – Hausdorff distance: 0.103512
Loading model 61 / 76: "skopelos.bmp" – Hausdorff distance: 0.045430


Experiment 1: map query

Minimum of the Hausdorff distance: the model closest to the query object


Experiment 2: mouse-drawing query

Query: Santorini

• closest: HD = 0.112, MCD = 1.024
• second: HD = 0.143, MCD = 1.771
• furthest: max HD = 0.3072, max MCD = 3.4326

(Original table columns: query | HD criterion | position for min HD | HD+M1+M2+W_BB. Other islands shown: Poros, Elafonisos.)


Wrap-up

Hausdorff distance is better for shape recognition than feature-based criteria.
The large computational cost of image retrieval based on HD can be reduced by:

• decreasing the cost of computation for a pair of objects:
– replacing an object by its contour
– using a Voronoi diagram (distance transform)
• off-line database processing – calculating the matrix of distances between model objects:
– reducing the number of model objects to be compared
– optimal search order
• using features as auxiliary similarity criteria


Video Summarization: Detecting shots, cuts, and fades in video –

Selection of key frames


Outline

• Entropy, joint entropy, and mutual information
• Shot cut detection based on mutual information
• Fade detection based on joint entropy
• Key frame selection
• Comparison with other methods
• Wrap-up


Entropy – Joint Entropy

• Entropy of a random variable (RV) X:
H(X) = −Σx p(x) log₂ p(x)
a measure of the information content or the "uncertainty" about X.

• Joint entropy of RVs X and Y:
H(X,Y) = −Σx Σy p(x,y) log₂ p(x,y)
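These two definitions in a minimal sketch (base-2 logarithm, so both quantities are in bits; zero-probability terms are conventionally dropped):

```python
import numpy as np

def entropy(p):
    """H(X) = -sum p(x) log2 p(x); zero-probability outcomes contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def joint_entropy(pxy):
    """H(X, Y) computed from a joint probability table p(x, y)."""
    return entropy(pxy)   # same formula, applied to the flattened joint table
```

For a fair coin, H = 1 bit; for two independent fair coins, the joint entropy is 2 bits.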


Mutual Information

I(X;Y) = H(X) + H(Y) − H(X,Y)

• It measures the average reduction in uncertainty about X that results from learning the value of Y.
• Equivalently, it measures the amount of information that X conveys about Y.


Algorithm for detecting abrupt cuts (1)

For each pair of successive frames f_t and f_{t+1}, whose gray levels vary from 0 to N−1:

• Calculate three N×N co-occurrence matrices, one for each chromatic component R, G, and B, whose (i,j) element is the joint probability of observing a pixel having the i-th gray level in f_t and the j-th gray level in f_{t+1}.

• Calculate the mutual information of the gray levels for the three components R, G, B independently and sum them:
I_{t,t+1} = I^R_{t,t+1} + I^G_{t,t+1} + I^B_{t,t+1}
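One chromatic component of this step might look as follows; this is a sketch of the computation described above, not the authors' implementation (the per-channel values would then be summed over R, G, and B):

```python
import numpy as np

def mutual_information(f_t, f_t1, levels=256):
    """I between gray levels of co-located pixels in successive frames.

    C is the levels x levels co-occurrence matrix: C[i, j] is the joint
    probability of a pixel having level i in f_t and level j in f_{t+1}.
    """
    C = np.zeros((levels, levels))
    np.add.at(C, (f_t.ravel(), f_t1.ravel()), 1)   # joint histogram
    C /= C.sum()
    px, py = C.sum(axis=1), C.sum(axis=0)          # marginal distributions
    nz = C > 0                                     # skip zero-probability cells
    outer = px[:, None] * py[None, :]
    return float((C[nz] * np.log2(C[nz] / outer[nz])).sum())
```

At a cut the two frames carry little shared content, so the co-occurrence matrix spreads out and the mutual information drops sharply.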


Algorithm for detecting abrupt cuts (2)

– Apply a robust estimator of the mean value to the time series of mutual information values by defining a time window around each time instant t₀.
– An abrupt cut at t₀ is detected if the mutual information I_{t₀,t₀+1} drops significantly below this robust local estimate.
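A plausible sketch of this rule, assuming the median as the robust estimator; the window size and the drop ratio are illustrative values, not those of the original work:

```python
import numpy as np

def detect_cuts(mi, window=5, ratio=0.5):
    """Indices t where the MI value falls far below its local robust level."""
    mi = np.asarray(mi, dtype=float)
    cuts = []
    for t in range(len(mi)):
        lo, hi = max(0, t - window), min(len(mi), t + window + 1)
        # Robust level: median of the surrounding window, excluding mi[t] itself.
        neighbours = np.concatenate([mi[lo:t], mi[t + 1:hi]])
        if len(neighbours) and mi[t] < ratio * np.median(neighbours):
            cuts.append(t)
    return cuts
```

The median-based level is what makes the threshold adaptive: a single deep drop is flagged, while gradual variation within a shot merely shifts the local level.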


Mutual information pattern (1)

• Mutual information pattern from the "star" video sequence that depicts cuts


Ground truth


Performance evaluation

GT denotes the ground truth; Seg denotes the (correct and false) shots segmented by our methods.

Recall corresponds to the probability of detection: Recall = |GT ∩ Seg| / |GT|.

Precision corresponds to the accuracy of the method, accounting for false detections: Precision = |GT ∩ Seg| / |Seg|.

Overlap (for fades)


Test results (1)


Test results (2)


Alternative technique for shot cut detection

– Features that could be used to define a distance measure:
• successive color frame differences
• successive color vector bin-wise H-S histogram differences (invariant to brightness changes)
– Fusion of the two differences
– Shot cut detection by adaptive local thresholding


Comparison of abrupt cut detection methods

• results using mutual information
• results using the combined method


Fades (1)

If G(x,y,t) is a gray-scale sequence, then a chromatic scaling of G(x,y,t) can be modeled as S(x,y,t) = α(t)·G(x,y,t).

Therefore, a fade-out of duration T can be modeled as S(x,y,t) = G(x,y,t)·(1 − t/T), t ∈ [0,T],

and a fade-in as S(x,y,t) = G(x,y,t)·(t/T), t ∈ [0,T].


Fades (2)

• part of a video sequence showing a fade-in
• part of a video sequence showing a fade-out


Mutual information pattern (2)

• Mutual information pattern from the "basketball" video sequence showing cuts and a fade


Algorithms for detecting fades (1)

For each pair of successive frames f_t and f_{t+1}, calculate the joint entropy of the basic chromatic components.

Determine the values of the joint entropy close to zero.

Detect fade-out (fade-in):
• The first (last) zero value defines the end (start) of the fade-out (fade-in).
• Find the start (end) of the fade-out (fade-in).

A fade should have a duration of at least 2 frames.
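A sketch of the fade-out branch of this procedure; the near-zero threshold `eps` is illustrative, and fade-ins would be handled symmetrically by scanning forward from the last near-zero value:

```python
def detect_fade_outs(H, eps=0.05, min_len=2):
    """Locate fade-outs in a joint-entropy series H as (start, end) index pairs.

    A near-zero joint entropy marks the end of a fade-out; the start is found
    by walking back while H keeps decreasing along the fade ramp.
    """
    fades = []
    for t in range(1, len(H)):
        if H[t] <= eps < H[t - 1]:            # first near-zero value: fade-out end
            s = t
            while s > 0 and H[s - 1] > H[s]:  # walk back along the decreasing ramp
                s -= 1
            if t - s + 1 >= min_len:          # a fade lasts at least 2 frames
                fades.append((s, t))
    return fades
```

Finding both borders this way is what gives the accurate fade start/end localisation reported in the wrap-up.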


Joint entropy pattern (1)

(Figure: joint entropy over frames 1765–1805; a fade-out spans frames 1791–1802, followed by a cut around frames 1803–1805.)


Joint entropy pattern (2)

(Figure: joint entropy with detection threshold over frames 4420–4440; two fades and a cut to a dark frame around frames 4425–4426.)


Comparison of fade detection methods (1)

• results using the joint entropy
• results using the average frame value


Comparison of fade detection methods (2)

• results using the joint entropy
• results using the average frame value


Algorithm for key frame selection (1)

• Apply a split-and-merge algorithm based on the series of mutual information of gray levels at successive frames within the shot.
• Choose clusters of large size.
• Select as a potential key frame the first frame of each cluster.
• Test the similarity of the potential key frames using the mutual information.


Key frame selection (1)


star sequence

Key frame selection (2)


frame 1690 frame 1770

Key frame selection (3)


Key frame selection (4)

• Key frames selected from different shots: frame 314, frame 2026, frame 2904, frame 4344
• Two key frames selected from one shot: frame 2607, frame 2637


Wrap-up

New methods for detecting cuts and fades with high precision have been described.

Accurate detection of fade borders (starting and ending point) has been achieved.

Comparisons with other methods demonstrate the accuracy/success of the proposed techniques.

Satisfactory results for key frame selection by performing clustering on the mutual information series have been reported.


MPEG-7: Standard for Multimedia Information Systems

• Introduction
• Applications
• Standard
• Description elements
• Visual structural elements
• Description schemes for still images and video
• Wrap-up


Introduction

Aim: description of audiovisual content – it is not about compression.

MPEG-7 annotates:
– data in
• MPEG-4 object-based representations (interactive representations)
• MPEG-2 and MPEG-1 (frame-based encoding of waveforms)
– analog data (e.g., VHS)
– photo prints
– artistic pictures

Its main elements:
– Descriptors
– Description Schemes
– Description Definition Language


Applications

Provides a generic description of audiovisual and multimedia content for:
– systematic access to audiovisual information sources
– re-usability of descriptions and annotations
– management and linking of content, events, and user interaction

(Jens-Rainer Ohm, HHI)


Standard

MPEG-7 consists of:
• Descriptors (D) with Descriptor Values (DV)
• Description Schemes (DS)
• Description Definition Language (DDL)

(Jens-Rainer Ohm, HHI)


Description elements

Structural (can be extracted automatically):
• signal-based features
• regions and segments

Semantic/Conceptual (mostly manual annotation):
• objects
• scenes
• events

Metadata (manual or non-signal-based annotation):
• acquisition & production
• high-level content description
• intellectual property, usage


Visual structural elements

Examples of low-level visual features: color, texture, shape, motion

Examples of MPEG-7 visual descriptors:
• Color: color histogram, dominant color
• Texture: frequency layout, edge histogram
• Shape: Zernike moments, curvature peaks
• Motion: motion trajectory, parametric motion

Examples of MPEG-7 Visual Description Schemes: still region, moving region, video segment


Description Schemes

Layouts for description schemes:
• hierarchical (tree)
• relational (entity-relationship graph)


Still Region Description Scheme


Video Sequence Description Scheme


Description Definition Language

Based on the Extensible Markup Language (XML Schema)


Wrap-up

MPEG-7: a generic description interface for audiovisual and multimedia content – a key technology.

MPEG-7 can be used for:
• search/filtering and manipulation of audiovisual information
• multimedia browsing and navigation
• data organization, archiving, and authoring
• interpretation and understanding of multimedia content


Conclusions

• Overview of fundamentals for information retrieval
• Focus on segmentation and its assessment
• Shape retrieval based on Hausdorff distance
• Video summarization

Acknowledgments: I. Pitas, E. Pranckeviciene, Z. Chernekova, C. Nikou, and P. Rotter.