1
Content-based Image Retrieval (CBIR)
Searching a large database for images that match a query:
What kinds of databases? What kinds of queries? What constitutes a match? How do we make such searches
efficient?
2
Applications
• Art Collections, e.g. museums, galleries, archives
• Medical Image Databases: CT, MRI, ultrasound, The Visible Human
• Scientific Databases, e.g. Earth sciences, Hubble
• General Image Collections for Licensing: Corbis, Getty Images, photo gallery of Gulhasan
• The World Wide Web
3
What is a query?
• an image you already have
• a rough sketch you draw
• a symbolic description of what you want, e.g. an image of a man and a woman on a beach
4
Some Systems You Can Try
Corbis Stock Photography and Pictures
http://www.corbis.com/
• Corbis sells high-quality images for use in advertising, marketing, illustrating, etc.
• Search is entirely by keywords.
• Human indexers look at each new image and enter keywords.
• A thesaurus constructed from user queries is used.
5
AVAILABLE CBIR SYSTEMS
6
QBIC
IBM’s QBIC (Query by Image Content)
http://wwwqbic.almaden.ibm.com
• The first commercial system.
• Uses, or has used, color percentages, color layout, texture, shape, location, and keywords.
7
Blobworld
UC Berkeley’s Blobworld
http://elib.cs.berkeley.edu/photos/blobworld
• Images are segmented on color plus texture.
• The user selects a region of the query image.
• The system returns images with similar regions.
• Works really well for tigers and zebras.
8
Ditto: See the Web
http://www.ditto.com
• Small company
• Allows you to search for pictures from web pages
National Archives, USA
9
Declaration of Independence
Ottoman Archives ???
10
QUERY BY EXAMPLE
11
12
Image Features / Distance Measures
[Diagram: the user submits a query image; image feature extraction maps both the query and the image database into a feature space, where a distance measure ranks the images and returns the retrieved images.]
13
Features
• Color (histograms, gridded layout, wavelets)
• Texture (Laws, Gabor filters, local binary patterns)
• Shape (first segment the image, then use statistical or structural shape similarity measures)
• Objects and their Relationships
This is the most powerful, but you have to be able to recognize the objects!
14
Color Histograms
15
QBIC’s Histogram Similarity
The QBIC color histogram distance is:
dhist(I,Q) = (h(I) - h(Q)) A (h(I) - h(Q))^T
• h(I) is a K-bin histogram of a database image
• h(Q) is a K-bin histogram of the query image
• A is a K x K similarity matrix
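The quadratic-form distance above can be sketched in plain Python. The 3-bin histograms and similarity matrix here are made-up illustration values, not QBIC's actual ones; the off-diagonal entries of A encode cross-bin color similarity (A = identity reduces to the plain squared Euclidean distance).

```python
def qbic_histogram_distance(h_i, h_q, A):
    """Quadratic-form distance d_hist(I,Q) = (h(I) - h(Q)) A (h(I) - h(Q))^T."""
    d = [a - b for a, b in zip(h_i, h_q)]
    K = len(d)
    return sum(d[j] * A[j][k] * d[k] for j in range(K) for k in range(K))

# Hypothetical 3-bin example.
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
h_db    = [0.6, 0.3, 0.1]   # database-image histogram
h_query = [0.3, 0.6, 0.1]   # query-image histogram
print(qbic_histogram_distance(h_db, h_query, A))
```

Because bins 1 and 2 are marked as similar in A, the distance comes out smaller than the plain Euclidean one.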
16
Similarity Matrix
     R    G    B    Y    C    V
R    1    0    0    .5   0    .5
G    0    1    0    .5   .5   0
B    0    0    1    1    1    1
How similar is blue to cyan?
??
17
Gridded Color
Gridded color distance is the sum of the color distances in each of the corresponding grid squares.
What color distance would you use for a pair of grid squares?
[Diagram: corresponding grid squares, numbered 1-4, in the query image and a database image.]
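The gridded distance is just a sum of per-square distances. A minimal sketch, using an L1 histogram distance as one reasonable answer to the question above (the toy per-square histograms are invented for illustration):

```python
def gridded_color_distance(grid_a, grid_b, cell_distance):
    """Sum a per-square color distance over corresponding grid squares."""
    return sum(cell_distance(a, b) for a, b in zip(grid_a, grid_b))

def l1_histogram_distance(h1, h2):
    # One reasonable per-square choice: L1 distance between the color
    # histograms of the two corresponding grid squares.
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Two images, each represented as a list of per-square color histograms.
grid_a = [[1, 0], [0, 1], [1, 0], [0, 1]]
grid_b = [[1, 0], [1, 0], [1, 0], [0, 1]]
print(gridded_color_distance(grid_a, grid_b, l1_histogram_distance))  # 2
```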
18
Texture Distances
• Pick and Click (the user clicks on a pixel and the system retrieves images that contain a region with texture similar to the region surrounding it).
• Gridded (just like gridded color, but use texture).
• Histogram-based (e.g. compare the LBP histograms).
19
Local Binary Pattern Measure
• For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has value less than or equal to p’s value and 1 otherwise. Neighbors are indexed clockwise around p:

1 2 3
8 p 4
7 6 5

• Example 3 x 3 neighborhood (center p = 50):

100 101 103
 40  50  80
 50  60  90

gives the pattern 1 1 1 1 1 1 0 0.

• Represent the texture in the image (or a region) by the histogram of these numbers.
• Compute the L1 distance between two histograms h1 and h2: D(h1, h2) = Σ_i |h1[i] - h2[i]|.
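The slide's 3x3 example can be checked with a short sketch. The clockwise-from-top-left neighbor ordering is an assumption, chosen because it reproduces the pattern shown on the slide:

```python
def lbp_code(window):
    """8-bit LBP pattern for the centre of a 3x3 window: bit i is 1 when
    neighbor i (clockwise from the top-left) is greater than the centre
    value, and 0 when it is less than or equal."""
    c = window[1][1]
    clockwise = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    return [1 if window[r][col] > c else 0 for r, col in clockwise]

def l1_distance(h1, h2):
    """L1 distance between two LBP histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

window = [[100, 101, 103],
          [ 40,  50,  80],
          [ 50,  60,  90]]
print(lbp_code(window))  # [1, 1, 1, 1, 1, 1, 0, 0] -- the slide's pattern
```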
20
Laws Texture
21
Laws’ texture masks (1)
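The masks on this slide (an image in the original deck) are outer products of the standard five-element Laws vectors; a minimal sketch:

```python
# The five standard 1-D Laws vectors: Level, Edge, Spot, Wave, Ripple.
L5 = [ 1,  4, 6,  4,  1]
E5 = [-1, -2, 0,  2,  1]
S5 = [-1,  0, 2,  0, -1]
W5 = [-1,  2, 0, -2,  1]
R5 = [ 1, -4, 6, -4,  1]

def laws_mask(u, v):
    """A 2-D Laws mask is the outer product of two 1-D vectors, e.g. L5E5."""
    return [[a * b for b in v] for a in u]

L5E5 = laws_mask(L5, E5)
print(L5E5[0])  # [-1, -2, 0, 2, 1]
```

Every mask built from two zero-sum vectors (all except those using L5) itself sums to zero, so it responds to structure rather than average brightness.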
22
Shape Distances
• Shape goes one step further than color and texture.
• It requires identification of regions to compare.
• There have been many shape similarity measures suggested for pattern recognition that can be used to construct shape distance measures.
23
Global Shape Properties:Projection Matching
Row projections: 0 4 1 3 2 0
Column projections: 0 4 3 2 1 0
In projection matching, the horizontal and vertical projections form a histogram.
Feature Vector: (0,4,1,3,2,0,0,4,3,2,1,0)
What are the weaknesses of this method? strengths?
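The feature vector is just the row sums followed by the column sums of the binary shape; the 6x6 shape below is one image consistent with the slide's example vector:

```python
def projection_feature(binary_image):
    """Concatenate row projections and column projections of a binary image."""
    rows = [sum(row) for row in binary_image]
    cols = [sum(col) for col in zip(*binary_image)]
    return rows + cols

# A 6x6 binary shape whose projections match the slide's feature vector.
shape = [[0, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 1, 0],
         [0, 1, 0, 0, 0, 0],
         [0, 1, 1, 1, 0, 0],
         [0, 1, 1, 0, 0, 0],
         [0, 0, 0, 0, 0, 0]]
print(projection_feature(shape))  # [0, 4, 1, 3, 2, 0, 0, 4, 3, 2, 1, 0]
```

Note that many different shapes share these projections, which is exactly the weakness the slide asks about.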
24
Global Shape Properties:Tangent-Angle Histograms
[Figure: an example shape whose boundary tangent angles fall at 0, 30, 45, and 135 degrees, forming the histogram bins.]
Is this feature invariant to starting point?
Boundary Matching:Fourier Descriptors
Given a sequence of boundary points V_k, treat each point as a complex number z_k = x_k + i y_k; the Fourier descriptors are the coefficients of the discrete Fourier transform of this sequence.
25
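The descriptor formula was an image in the original slide; a standard formulation takes low-order DFT coefficients of the complex boundary sequence. A sketch under that assumption (naive DFT, fine for small examples):

```python
import cmath
import math

def fourier_descriptors(points, n_coeffs):
    """Treat each boundary point (x_k, y_k) as z_k = x_k + i*y_k and return
    the first n_coeffs DFT coefficients as the shape descriptor."""
    z = [complex(x, y) for x, y in points]
    N = len(z)
    return [sum(z[k] * cmath.exp(-2j * math.pi * u * k / N)
                for k in range(N)) / N
            for u in range(n_coeffs)]

# Sanity check: a unit circle concentrates all energy in coefficient 1.
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]
fd = fourier_descriptors(circle, 3)
print([round(abs(c), 6) for c in fd])  # [0.0, 1.0, 0.0]
```

Discarding coefficient 0 removes translation, and comparing magnitudes removes rotation and starting-point dependence.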
26
Boundary Matching
• Elastic Matching
The distance between query shape and image shapehas two components:
1. energy required to deform the query shape into one that best matches the image shape
2. a measure of how well the deformed query matches the image
27
Del Bimbo Elastic Shape Matching
query retrieved images
28
Our work on CBIR at METU
1. M. Uysal, F.T. Yarman-Vural, “A Content-Based Fuzzy Image Database Based on The Fuzzy ARTMAP Architecture”, Turkish Journal of Electrical Engineering & Computer Sciences, vol. 13, pp. 333-342, 2005.
2. Ö.Ö. Tola, N. Arıca, F.T. Yarman-Vural, “Shape Recognition with Generalized Beam Angle Statistics”, Lecture Notes in Computer Science, vol. 2, 2004.
3. Ö.C. Özcanli, F.T. Yarman-Vural, “An Image Retrieval System Based on Region Classification”, Lecture Notes in Computer Science, vol. 1, 2004.
4. Y. Xu, P. Duygulu, E. Saber, A.M. Tekalp, F.T. Yarman-Vural, “Object-based Image Labeling Through Learning by Example and Multi-level Segmentation”, Pattern Recognition, vol. 36, pp. 1407-1423, 2003.
5. N. Arıca, F.T. Yarman-Vural, “BAS: A Perceptual Shape Descriptor Based on the Beam Angle Statistics”, Pattern Recognition Letters, vol. 24, pp. 1627-1639, 2003.
6. M. Ozay, F.T. Yarman-Vural, “On the Performance of Stacked Generalization Architecture”, Lecture Notes in Computer Science, ICIAR, pp. 445-451, 2008.
7. E. Akbaş, F.T. Yarman-Vural, “Automatic Image Annotation by Ensemble of Visual Descriptors”, CVPR 2007, pp. 1-8, 2007.
8. İ. Yalabık, F.T. Yarman-Vural, G. Ucoluk, O.T. Sehitoglu, “A Pattern Classification Approach for Boosting with Genetic Algorithms”, ISCIS07, pp. 1-6, 2007.
9. M. Uysal, E. Akbas, F.T. Yarman-Vural, “A Hierarchical Classification System Based on Adaptive Resonance Theory”, ICIP06, pp. 2913-2916, 2006.
10. E. Akbaş, F.T. Yarman-Vural, “Design of Feature Set for Face Recognition Problem”, Lecture Notes in Computer Science, ISCIS 2006, pp. 239-247, 2006.
11. N. Arica, F.T. Yarman-Vural, “Shape Similarity Measurement for Boundary Based Features”, Lecture Notes in Computer Science, vol. 3656, ISBN 3-540-29069-9, pp. 431-438, 2005.
12. M. Uysal, F.T. Yarman-Vural, “ORF-NT: An Object-Based Image Retrieval Framework Using Neighborhood Trees”, Lecture Notes in Computer Science, vol. 3733, ISBN 3-540-29414-7, ISCIS 2005, pp. 595-605, 2005.
13. N. Arıca, F.T. Yarman-Vural, “A Compact Shape Descriptor Based on the Beam Angle Statistics”, International Conference on Image and Video Retrieval (CIVR), 2003.
14. N. Arıca, F.T. Yarman-Vural, “A Perceptual Shape Descriptor”, International Conference on Pattern Recognition (ICPR), 2002.
15. A. Çarkacıoğlu, F.T. Yarman-Vural, “Learning Similarity Space”, Proc. Int. Conf. on Image Processing (ICIP02), pp. 405-408, Rochester, NY, USA, September 2002.
16. N. Arıca, F.T. Yarman-Vural, “A Perceptual Shape Descriptor”, Proc. International Conference on Pattern Recognition (ICPR), Quebec, Canada, 2002.
17. N. Arıca, F.T. Yarman-Vural, “A Shape Descriptor Based on Circular Hidden Markov Model”, Proc. International Conference on Pattern Recognition (ICPR), Barcelona, Spain, 2000.
18. P. Duygulu, E. Saber, M. Tekalp, F.T. Yarman-Vural, “Multi-Level Image Segmentation for Content Based Image Representation”, IEEE-ICASSP-2000, July 2000, Istanbul.
19. P. Duygulu, A. Çarkacıoğlu, F.T. Yarman-Vural, “Multi-Level Object Description: Color or Texture”, First IEEE Balkan Conference on Signal Processing, Communications, Circuits, and Systems, June 1-3, 2000, Istanbul, Turkey.
20. N. Dicle, F.T. Yarman-Vural, “NES: A File Format for Content Based Coding of Images”, ISCIS 15, 2000, Istanbul.
21. N. Arıca, F.T. Yarman-Vural, “A New HMM Topology for Shape Recognition”, IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP'99), pp. 162-168, Antalya, Turkey, 1999.
22. S. Genc, F.T. Yarman-Vural, “Morphing for Motion Estimation”, Int. Conf. on Image Processing Systems and Technologies, and Information Systems, Las Vegas, 1999.
29
Papers in Turkish (Türkçe Bildiriler)
1. M. Özay, F.T. Yarman Vural, “Yığılmış Genelleme Algoritmalarının Performans Analizi”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2008.
2. A. Sayar, F.T. Yarman Vural, “Yarı Denetimli Kümeleme ile Görüntü İçeriğine Açıklayıcı Not Üretme”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2008.
3. Ö. Özcanli, E. Akbaş, F.T. Yarman Vural, “Görüntü Erişiminde Bulanık ARTMAP Ve Adaboost Sınıflandırma Yöntemlerinin Performans Karşılaştırması”, IEEE 13. Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2005.
4. Ö.Ö. Tola, N. Arıca, F.T. Yarman Vural, “Genelleştirilmiş Kerteriz Açısı İstatistikleri ile Şekil Tanıma”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 1, 2004.
5. M. Uysal, Ö. Özcanlı, F.T. Yarman Vural, “Bulanık ARTMAP Mimarisini Kullanan İçerik Bazlı Bir İmge Sorgulama Sistemi”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2004.
6. A. Çarkacıoğlu, F.T. Yarman Vural, “Doku Benzerliğini Tesbit Etmek İçin Yeni Bir Tanımlayıcı”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
7. N. Arıca, F.T. Yarman Vural, “Tıkız Şekil Betimleyicileri”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
8. Ö.C. Özcanlı, P. Duygulu Şahin, F.T. Yarman Vural, “Açıklamalı Görüntü Veritabanları Kullanarak Nesne Tanıma ve Erişimi”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
9. S. Alkan, F.T. Yarman Vural, “Öğretim Üyesi Yetiştirme Programı”, Elektrik Elektronik, Bilgisayar Mühendislikleri Eğitimi I. Ulusal Sempozyumu, 2003.
10. M. Uysal, F.T. Yarman Vural, “En İyi Temsil Eden Öznitelik Kullanılarak İçeriğe Dayalı Endeksleme ve Bulanık Mantığa Dayalı Sorgulama Sistemi”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
11. N. Arıca, F.T. Yarman Vural, “Kerteriz Tabanlı Şekil Tanımlayıcısı”, 10. SIU, vol. 1, pp. 129-134, 2002.
12. Ö.C. Özcanlı, S. Dağlar, F.T. Yarman Vural, “Yerel Benzerlik Örüntüsü Yöntemi ile Görüntü Erişim Sistemi”, 10. SIU, vol. 1, pp. 141-146, 2002.
30
BAS: Beam Angle Statistics
Ömer Önder Tola, Nafiz Arıca, Fatoş Tünay Yarman Vural
Beam Angle Statistics (BAS)
BAS is a shape descriptor for shapes defined as an ordered set of boundary pixels: each pixel is represented by the statistics of the beam angles measured at that pixel, collected into a feature vector.
The beam angle shows how much the shape is bent.
BAS describes the 2D shape by 1D moment functions of the beam angles.
Beam Angle Statistics: Given an ordered set of boundary pixels
[Figure: an example boundary with pixels numbered 1-36.]
Beam Angle Statistics

[Figure slides: around a boundary point p(i), neighbors are indexed 1-18 forward and backward along the boundary; the beam angle C_K(p(i)) is illustrated for K = 1, 2, 3, ..., 17.]

For each K, the beam angle C_K(p(i)) is the angle between the forward beam from p(i) to p(i+K) and the backward beam from p(i) to p(i-K).

Each boundary point is represented by the moments of these angles:

Γ(i) = ( E[C^1(i)], E[C^2(i)], ... )

E[C^m(i)] = Σ_K (C_K(i))^m · P(C_K(i)),   m = 0, 1, 2, ...
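The per-point moments can be sketched directly from the definitions above. Treating every K as equally likely (a uniform P(C_K)) is a simplifying assumption for illustration:

```python
import math

def beam_angle(boundary, i, K):
    """C_K(i): angle at p(i) between the beams to p(i+K) and p(i-K);
    indices wrap around the closed boundary."""
    N = len(boundary)
    px, py = boundary[i]
    fx, fy = boundary[(i + K) % N]
    bx, by = boundary[(i - K) % N]
    ang = abs(math.atan2(fy - py, fx - px) - math.atan2(by - py, bx - px))
    return min(ang, 2 * math.pi - ang)

def bas_moments(boundary, i, n_moments=3):
    """Represent p(i) by the first moments of C_K(i), taking each K in
    1..N/2-1 as equally likely (simplifying assumption for P(C_K))."""
    N = len(boundary)
    angles = [beam_angle(boundary, i, K) for K in range(1, N // 2)]
    return [sum(a ** m for a in angles) / len(angles)
            for m in range(1, n_moments + 1)]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(bas_moments(square, 0))  # first moment is pi/2 at a right-angle corner
```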
METU, Department of Computer Engineering 41
Plots of C_K(i) with fixed K values
k=N/40
k=N/10
k=N/4
k=N/40 : N/4
What is the most appropriate value of K, which discriminates the shapes in a large database and represents the shape information at all scales?
Answer: find a representation which employs the information in C_K(i) for all values of K.
Output of a stochastic process at each point
C(i) is a Random Variable of the stochastic process which generates the beam angles
mth moment of random variable C(i)
Each boundary point i is represented by the moments of C(i)
First three moments of C(i)’s
Correspondence of Visual Parts and Insensitivity to Affine Transformation
Robustness to Polygonal Approximation
Robustness to Noise
Similarity Measurement: Elastic Matching Algorithm

• Application of dynamic programming.
• Minimize the distance between two patterns by allowing deformations of the patterns.
• The cost of matching two items is calculated with the Euclidean metric.
• Robust to distortions; promises to approximate human ways of perceiving similarity.
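The dynamic-programming idea above can be sketched with a simple DTW-style recurrence on 1-D feature sequences (a simplification; the actual BAS matcher works on per-point moment vectors):

```python
def elastic_match(a, b):
    """Dynamic-programming elastic matching of two feature sequences:
    minimal cumulative matching cost when stretching/compressing is
    allowed (Euclidean cost in 1-D)."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # compress a
                                 D[i][j - 1],      # stretch a
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]

print(elastic_match([1, 2, 3], [1, 2, 2, 3]))  # 0.0 -- deformation is free here
print(elastic_match([0, 0], [1, 1]))           # 2.0
```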
TEST RESULT FOR MPEG 7 CE PART A-1 Robustness to Scaling
TEST RESULT FOR MPEG 7 CE PART A-2 Robustness to Rotation
TEST RESULT FOR MPEG 7 CE PART B Similarity-based Retrieval
TEST RESULT FOR MPEG 7 CE
PART C Motion and Non-Rigid Deformations
Comparison: Best Studies on the MPEG-7 CE Shape-1 Data Set

Data Set | Shape Context | Tangent Space | Curvature Scale Space | Zernike Moments | Wavelet | DAG | SBA (length 40) | SBA (length 60)
Part A1  | -     | 88.65 | 89.76 | 92.54 | 88.04 | 85 | 89.32 | 90.87
Part A2  | -     | 100   | 99.37 | 99.60 | 97.46 | 85 | 99.82 | 100
Part B   | 76.51 | 76.45 | 75.44 | 70.22 | 67.76 | 60 | 81.04 | 82.37
Part C   | -     | 92    | 96    | 94.5  | 93    | 83 | 93    | 93.5
Performance Evaluation

(1) Average performance over the three parts: Total Score 1 = 1/3 A + 1/3 B + 1/3 C

              | Tangent Space | Curvature Scale Space | Zernike Moments | Wavelet | DAG | SBA (length 40) | SBA (length 60)
Total Score 1 | 87.59 | 88.67 | 86.93 | 84.50 | 76 | 90.80 | 91.69

(2) Average performance over the number of queries: Total Score 2 = 840/2241 A + 1400/2241 B + 1/2241 C

              | Tangent Space | Curvature Scale Space | Zernike Moments | Wavelet | DAG | SBA (length 40) | SBA (length 60)
Total Score 2 | 83.16 | 82.62 | 79.92 | 77.14 | 69.38 | 86.12 | 87.27
The boundary of a shape is not always available.
Question: can we somehow extract the boundary of a shape directly from the edge-detected image?
GENERALIZE THE BAS FUNCTIONS
Generalized Beam Angle Statistics (GBAS)

Assume the shape is embedded in a set of unordered edge pixels.

Define the generalized beam angles formed by the forward and backward boundary pixel sets that are partitioned by the mean beam vector.

[Figure: edge pixels indexed 1-34 around an example shape, with the mean beam vector drawn at p(i).]

The mean beam vector at p(i) partitions the edge pixels into a forward boundary pixel set G = {g_1, g_2, ..., g_S} and a backward boundary pixel set I = {i_1, i_2, ..., i_R}.

Generalized Beam Angle: C_{k,l}(p(i)) is the angle of the beam formed at p(i) by a forward pixel g_k and a backward pixel i_l.

Generalized Beam Angle Statistics: each point is represented by the moments

Γ(i) = ( E[C^1(i)], E[C^2(i)], ... )

E[C^m(i)] = Σ_{k=1..S} Σ_{l=1..R} (C_{k,l}(i))^m · P(C_{k,l}(i)),   m = 0, 1, 2, ...
HANOLISTIC: a hierarchical automatic image annotation system using holistic approach
Özge Öztimur Karadağ & Fatoş T. Yarman Vural
Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
Automatic Image Annotation
Image Annotation: assigning keywords to digital images.
• Labor intensive
• Time consuming
We need a system that automatically annotates images.
Image Annotation Literature

The annotation problem has become popular since the 1990s. It is related to CBIR:
• CBIR processes visual information.
• Annotation processes both visual and semantic information, relating visual content information to semantic context information.
Problems in Automatic Image Annotation
• Human subjectivity
• Semantic gap
• Availability of datasets
Image Annotation Approaches in the Literature

Segmental approaches:
1. Segment or partition the image into regions
2. Extract features from the regions
3. Quantize features into blobs
4. Model the relation between the image regions and annotation words

Holistic approaches: features are extracted from the whole image.
The Proposed System: HANOLISTIC

• Introduces semantic information as supervision: each word is considered a class label, and an image belongs to one or more classes.
• Holistic approach: multiple visual features are extracted from the whole image, giving multiple feature spaces.
Description of an Image

Content description by the visual features of MPEG-7:
• Color Layout
• Color Structure
• Scalable Color
• Homogeneous Texture
• Edge Histogram

Context description by semantic words: annotation words.
System Architecture of HANOLISTIC
Level-0: consists of level-0 annotators, one annotator for each visual description space.
Meta-level: consists of a meta-annotator.
Level-0 Annotator
refers to the features of the ith image in the jth description space;
refers to the membership value of the lth word for the ith image in the jth description space.
Meta-Level
The results of the level-0 annotators are aggregated.
The result is a vector referring to the final word membership values for the ith image.
Experimental studies
Realization of HANOLISTIC:
• Instance-based realization of Level-0
• Eager realization of Level-0
• Realization of the Meta-level
Performance criteria
Results
Experimental Setup
• Data set: a subset of the Corel Stock Photo Collection, consisting of 5000 images.
• Training set: 4500 images (500 of them for validation)
• Testing set: 500 images
• Each image is annotated with 1-5 words.
Instance-Based Realization of the Level-0 Annotator by Fuzzy-kNN

• Level-0 annotators are realized by fuzzy-kNN.
• For each description space, the k nearest neighbors of the image are determined.
• Word membership values are estimated considering the neighbors’ words and their distances from the image.
• High membership values are assigned to words that appear in the close neighborhood.
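The steps above can be sketched as a distance-weighted vote. This is one plausible reading of the slide, not the paper's exact membership formula, and the neighbor distances and words are invented for illustration:

```python
def fuzzy_knn_memberships(neighbors, k=3):
    """neighbors: (distance, words) pairs for labeled training images.
    Word membership = distance-weighted vote over the k nearest
    neighbors, normalized to [0, 1]."""
    nearest = sorted(neighbors, key=lambda n: n[0])[:k]
    votes, total = {}, 0.0
    for dist, words in nearest:
        w = 1.0 / (dist + 1e-9)          # closer neighbors weigh more
        total += w
        for word in words:
            votes[word] = votes.get(word, 0.0) + w
    return {word: v / total for word, v in votes.items()}

m = fuzzy_knn_memberships([(0.1, ["tiger", "grass"]),
                           (0.2, ["tiger", "sky"]),
                           (0.9, ["building"]),
                           (1.5, ["car"])])
print(sorted(m, key=m.get, reverse=True))  # ['tiger', 'grass', 'sky', 'building']
```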
Eager Realization of Level-0 by ANN

• For a given image I_i, the ANN receives the visual description of the image as input and the semantic annotation words of the image as target.
• Each ANN is trained with backpropagation; a randomly selected set of images is used for validation to determine when to stop training. K-fold cross-validation is applied.
Realization of the Meta-Level by Majority Voting

• The membership values returned by the level-0 annotators are added using the formula, where P_i,j is a vector containing the word membership values returned from the jth level-0 annotator.
• For each word, select the maximum of the five word membership values estimated by the level-0 annotators.
Performance Criteria
Precision
Recall
F-score
Performance of Level-0 Annotators with Fuzzy-kNN
Performance of HANOLISTIC
Comparison of HANOLISTIC with other systems in the literature:
Annotation Examples
Annotation Examples…
Conclusion
• We proposed a hierarchical automatic image annotation system using a holistic approach.
• We tested the system with both an instance-based and an eager method.
• We found that the instance-based methods are more promising in the considered problem domain.
• The power of the proposed system comes from the following main principles:
  - Simplicity
  - Fuzziness
  - Simultaneous processing of content and context information
  - Holistic view of the image through different perspectives
84
Regions and Relationships
• Segment the image into regions
• Find their properties and interrelationships
• Construct a graph representation with nodes for regions and edges for spatial relationships
• Use graph matching to compare images
Like what?
85
Tiger Image as a Graph
[Graph: nodes for the image’s abstract regions (sky, sand, tiger, grass) connected by spatial-relationship edges such as above, adjacent, and inside.]
86
Object Detection: Rowley’s Face Finder
1. Convert to gray scale
2. Normalize for lighting*
3. Histogram equalization
4. Apply K-NN trained on 16K images

What data is fed to the classifier? 32 x 32 windows in a pyramid structure.

* Like the first step in the Laws algorithm, p. 220
87
Fleck and Forsyth’s Flesh Detector
The “Finding Naked People” Paper
• Convert RGB to HSI.
• Use the intensity component to compute a texture map: texture = med2(|I - med1(I)|), where med1 and med2 are median filters of radii 4 and 6.
• If a pixel falls into either of the following ranges, it’s a potential skin pixel:
  texture < 5, 110 < hue < 150, 20 < saturation < 60
  texture < 5, 130 < hue < 170, 30 < saturation < 130
Look for LARGE areas that satisfy this to identify pornography.
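The per-pixel test can be sketched directly from the thresholds quoted on the slide (the texture map computation is omitted; the function just takes a precomputed texture value):

```python
def is_potential_skin(texture, hue, saturation):
    """Fleck-Forsyth skin test with the slide's thresholds: low local
    texture plus (hue, saturation) in one of two ranges."""
    if texture >= 5:
        return False
    return ((110 < hue < 150 and 20 < saturation < 60) or
            (130 < hue < 170 and 30 < saturation < 130))

print(is_potential_skin(2, 120, 40))   # True
print(is_potential_skin(10, 120, 40))  # False: too much local texture
```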
88
Relevance Feedback
In real interactive CBIR systems, the user should be allowed to interact with the system to “refine” the results of a query until he/she is satisfied.
Relevance feedback work has been done by a number of research groups, e.g.
• The Photobook Project (Media Lab, MIT)• The Leiden Portrait Retrieval Project• The MARS Project (Tom Huang’s group at Illinois)
89
Information Retrieval Model*
An IR model consists of:
• a document model
• a query model
• a model for computing similarity between the documents and the queries
Term (keyword) weighting
Relevance Feedback
*from Rui, Huang, and Mehrotra’s work
90
Term weighting
Term weight: assigning different weights to different keywords (terms) according to their relative importance to the document.
Define w_ik to be the weight for term t_k, k = 1, 2, ..., N, in document i.
Document i can then be represented as a weight vector in the term space:
D_i = [w_i1, w_i2, ..., w_iN]
91
Term weighting
The query Q is also a weight vector in the term space:
Q = [w_q1, w_q2, ..., w_qN]
The similarity between D and Q is the cosine of the angle between them:
Sim(D, Q) = (D · Q) / (|D| |Q|)
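The cosine similarity above, in a short sketch (the 3-term weight vectors are hypothetical):

```python
import math

def cosine_similarity(d, q):
    """Sim(D, Q) = (D . Q) / (|D| |Q|) over term-weight vectors."""
    dot = sum(a * b for a, b in zip(d, q))
    norm = (math.sqrt(sum(a * a for a in d)) *
            math.sqrt(sum(b * b for b in q)))
    return dot / norm if norm else 0.0

# Toy 3-term weight vectors pointing in the same direction.
D = [0.8, 0.2, 0.0]
Q = [0.4, 0.1, 0.0]
print(round(cosine_similarity(D, Q), 4))  # 1.0
```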
92
Using Relevance Feedback
The CBIR system should automatically adjust the weights that were given by the user for the relevance of previously retrieved documents.
Most systems use a statistical method for adjusting the weights.
93
The Idea of Gaussian Normalization
• If all the relevant images have similar values for component j, then component j is relevant to the query.
• If all the relevant images have very different values for component j, then component j is not relevant to the query.
• The inverse of the standard deviation of component j over the relevant image sequence is a good measure of its weight: the smaller the variance, the larger the weight.
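The inverse-standard-deviation weighting can be sketched in a few lines (the feature vectors are invented: component 0 is consistent across the relevant images, component 1 is not):

```python
import statistics

def inverse_std_weights(relevant_vectors):
    """Weight for component j = 1 / (std of component j over the images
    the user marked relevant): small spread => large weight."""
    return [1.0 / (statistics.stdev(col) + 1e-9)
            for col in zip(*relevant_vectors)]

w = inverse_std_weights([[1.0, 0.0],
                         [1.1, 10.0],
                         [0.9, 20.0]])
print(w[0] > w[1])  # True: the consistent component gets the larger weight
```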
94
Leiden Portrait System
The Leiden Portrait Retrieval System is an example of the use of relevance feedback.
http://ind156b.wi.leidenuniv.nl:2000/
95
Andy Berman’s FIDS System
• multiple distance measures
• Boolean and linear combinations
• efficient indexing using images as keys
96
Andy Berman’s FIDS System:
Use of key images and the triangle inequality for efficient retrieval.
97
Andy Berman’s FIDS System:
Bare-Bones Triangle Inequality Algorithm
Offline
1. Choose a small set of key images
2. Store distances from database images to keys
Online (given query Q)
1. Compute the distance from Q to each key
2. Obtain lower bounds on distances to database images
3. Threshold or return all images in order of lower bounds
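The offline/online steps above rest on the bound d(Q, I) ≥ |d(Q, K) − d(I, K)| from the triangle inequality. A minimal sketch with made-up distances to two keys:

```python
def lower_bound(query_key_dists, image_key_dists):
    """Best lower bound on d(Q, I) from the keys: the triangle inequality
    gives d(Q, I) >= |d(Q, K) - d(I, K)| for every key K."""
    return max(abs(q - i) for q, i in zip(query_key_dists, image_key_dists))

def candidates(query_key_dists, db_key_dists, threshold):
    """Keep only images whose lower bound does not exceed the threshold;
    the expensive true distance is computed for these survivors only."""
    return [name for name, dists in db_key_dists.items()
            if lower_bound(query_key_dists, dists) <= threshold]

db = {"imgA": [1.0, 5.0], "imgB": [9.0, 2.0]}  # key distances, stored offline
print(candidates([1.5, 4.0], db, 2.0))  # ['imgA']: imgB is pruned
```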
98
Andy Berman’s FIDS System:
99
Andy Berman’s FIDS System:
Bare-Bones Algorithm with Multiple Distance Measures
Offline
1. Choose key images for each measure
2. Store distances from database images to keys for all measures
Online (given query Q)
1. Calculate lower bounds for each measure
2. Combine to form lower bounds for composite measures
3. Continue as in single measure algorithm
100
Andy Berman’s FIDS System:
Triangle Tries
A triangle trie is a tree structure that stores the distances from database images to each of the keys, one key per tree level.
root
├─ 3 (distance to key 1)
│   ├─ 1 (distance to key 2) → images W, Z
│   └─ 9 (distance to key 2) → image X
└─ 4 (distance to key 1)
    └─ 8 (distance to key 2) → image Y
101
Andy Berman’s FIDS System:
Triangle Tries and Two-Stage Pruning
• First Stage: Use a short triangle trie.
• Second Stage: Bare-bones algorithm on the images returned from the triangle-trie stage.
The quality of the output is the same as with the bare-bones algorithm itself, but execution is faster.
102
Andy Berman’s FIDS System:
103
Andy Berman’s FIDS System:
104
Andy Berman’s FIDS System:
Performance on a 200-MHz Pentium Pro

Step 1. Extract features from the query image. (.02s ≤ t ≤ .25s)
Step 2. Calculate distances from the query to the key images. (1µs ≤ t ≤ .8ms)
Step 3. Calculate lower-bound distances. (t ≤ 4ms per 1000 images using 35 keys, which is about 250,000 images per second.)
Step 4. Return the images with the smallest lower-bound distances.
105
Andy Berman’s FIDS System:
106
Weakness of Low-level Features
Can’t capture the high-level concepts
107
Current Research Objective
[Diagram: the query image passes through object-oriented feature extraction; the system classifies database images into a category hierarchy (Animals; Buildings: office buildings, houses; Transportation: boats, vehicles) and returns retrieved images, e.g. images labeled “boat”.]
108
Overall Approach
• Develop object recognizers for common objects
• Use these recognizers to design a new set of both low- and high-level features
• Design a learning system that can use these features to recognize classes of objects
109
Boat Recognition
110
Vehicle Recognition
111
Building Recognition
112
Building Features: Consistent Line Clusters (CLC)
A Consistent Line Cluster is a set of lines that are homogeneous in terms of some line features.
Color-CLC: The lines have the same color feature.
Orientation-CLC: The lines are parallel to each other or converge to a common vanishing point.
Spatially-CLC: The lines are in close proximity to each other.
113
Color-CLC
• Color feature of lines: the color pair (c1, c2).
• Color pair space: RGB (256^3 x 256^3) is too big! Use dominant colors (20 x 20).
• Finding the color pairs: one line may yield several color pairs.
• Constructing the Color-CLC: use clustering.
114
Color-CLC
115
Orientation-CLC
• The lines in an Orientation-CLC are parallel to each other in the 3D world.
• The parallel lines of an object in a 2D image can be: parallel in 2D, or converging to a vanishing point (perspective).
116
Orientation-CLC
117
Spatially-CLC
• Vertical position clustering
• Horizontal position clustering
118
Building Recognition by CLC
Two types of buildings; two criteria:
• Inter-relationship criterion
• Intra-relationship criterion
119
Inter-relationship criterion
(Nc1 > Ti1 or Nc2 > Ti1) and (Nc1 + Nc2) > Ti2
Nc1 = number of intersecting lines in cluster 1
Nc2 = number of intersecting lines in cluster 2
120
Intra-relationship criterion
|So| > Tj1 or w(So) > Tj2
So = set of heavily overlapping lines in a cluster
121
Experimental Evaluation: Object Recognition
• 97 well-patterned buildings (bp): 97/97
• 44 not well-patterned buildings (bnp): 42/44
• 16 not patterned non-buildings (nbnp): 15/16 (one false positive)
• 25 patterned non-buildings (nbp): 0/25
CBIR
122
Experimental Evaluation Well-Patterned Buildings
123
Experimental Evaluation Non-Well-Patterned Buildings
124
Experimental Evaluation Non-Well-Patterned Non-Buildings
125
Experimental Evaluation: Well-Patterned Non-Buildings (false positives)
126
Experimental Evaluation (CBIR)
Data Set     | Total Positive (#) | Total Negative (#) | False Positive (#) | False Negative (#) | Accuracy (%)
Arborgreens  | 0  | 47 | 0 | 0 | 100
Campusinfall | 27 | 21 | 0 | 5 | 89.6
Cannonbeach  | 30 | 18 | 0 | 6 | 87.5
Yellowstone  | 4  | 44 | 4 | 0 | 91.7
127
Experimental Evaluation (CBIR) False positives from Yellowstone