Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction
Chair of Computer Science 9, Data Management and Exploration
SIGMOD 08, June 10th 2008, Vancouver, Canada
Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl
Outline
Introduction
  Similarity Search
  The Earth Mover’s Distance
  Dimensionality Reduction
Dimensionality Reduction for the EMD
  Reduction Matrixes
  Data-independent Reduction
  Data-dependent Reduction
Experimental Results
Conclusion & Outlook
Introduction – Similarity Search
Objective: Find similar objects in database
Applications: Medical images, edutainment, engineering, etc.
Requires:
  Object feature extraction (here: feature histograms)
  Similarity measure (here: Earth Mover’s Distance)
  Efficient retrieval technique for similar objects
Introduction – The Earth Mover’s Distance [1]
Transform the features of one object to match those of the other object
EMD: the minimum total “cost × flow” of such a transformation
[1] Rubner, Tomasi, Perceptual Metrics for Image Database Navigation, Kluwer, 2001.
[Figure: flows between histogram x and histogram y]
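The "minimum cost × flow" idea can be made concrete with a small solver. The sketch below is not taken from the slides: it assumes two histograms with equal integer total mass, treats the EMD as a transportation problem, and solves it with successive cheapest augmenting paths. The function name `emd`, the choice of solver, and the second histogram `y` are illustrative assumptions, and the value is returned without normalization by the total mass. The cost matrix is the 4 × 4 matrix C from the data-independent reduction slide.

```python
def emd(x, y, cost):
    """Return (total cost*flow, flow matrix) of a cheapest transformation of x into y.

    Sketch only: assumes integer histograms of equal total mass and a cost
    matrix with a zero diagonal; the value is not normalized by the mass.
    """
    assert sum(x) == sum(y), "sketch only handles equal total mass"
    n, m = len(x), len(y)
    supply, demand = list(x), list(y)
    flow = [[0] * m for _ in range(n)]
    inf = float("inf")

    while any(s > 0 for s in supply):
        # Bellman-Ford on the residual graph, started from every bin of x
        # that still has mass. Nodes 0..n-1 are bins of x, n..n+m-1 bins of y.
        dist = [0 if supply[i] > 0 else inf for i in range(n)] + [inf] * m
        prev = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    if dist[i] + cost[i][j] < dist[n + j]:  # move mass i -> j
                        dist[n + j] = dist[i] + cost[i][j]
                        prev[n + j] = i
                        changed = True
                    if flow[i][j] > 0 and dist[n + j] - cost[i][j] < dist[i]:
                        dist[i] = dist[n + j] - cost[i][j]  # undo flow i -> j
                        prev[i] = n + j
                        changed = True
            if not changed:
                break

        # Cheapest reachable bin of y that still needs mass.
        t = n + min((j for j in range(m) if demand[j] > 0),
                    key=lambda j: dist[n + j])
        path, v = [], t
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        amount = min(supply[v], demand[t - n])
        for u, w in path:
            if u >= n:  # backward edge: limited by the flow it undoes
                amount = min(amount, flow[w][u - n])
        for u, w in path:
            if u < n:
                flow[u][w - n] += amount
            else:
                flow[w][u - n] -= amount
        supply[v] -= amount
        demand[t - n] -= amount

    total = sum(cost[i][j] * flow[i][j] for i in range(n) for j in range(m))
    return total, flow


C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
x = [2, 4, 3, 6]
y = [5, 3, 4, 3]  # made-up second histogram with the same total mass
total, flow = emd(x, y, C)
```

The returned flow matrix has row sums equal to `x` and column sums equal to `y`. The nested loops make the high cost for large dimensionalities visible, which is what motivates the reduction discussed next.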
Introduction – Dimensionality Reduction
Challenge for Similarity Search: high computational complexity for high dimensionalities
Approach:
  Reduce dimensionality of query & DB
  Filter DB using lower dimensionality
  Refine using original dimensionality
Filter quality criteria:
  Selectivity (few refinements)
  No false dismissals (lower bound property)
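The filter-and-refine scheme can be sketched as a range query. The code below is a minimal illustration under stated assumptions: `range_query`, `l1` and `total` are hypothetical names, and the L1 distance with a sum-based lower bound stands in for the EMD and its reduced counterpart purely to keep the example short. What matters is the structure: an object is discarded only when the lower bound already exceeds the query radius, so no false dismissals can occur.

```python
def range_query(query, db, eps, reduce, d_filter, d_exact):
    """Return (indices of objects within eps of query, number of refinements)."""
    q_red = reduce(query)
    results, refinements = [], 0
    for idx, obj in enumerate(db):
        # Filter step: cheap lower bound in the reduced space.
        if d_filter(q_red, reduce(obj)) > eps:
            continue  # safe to discard: exact distance >= lower bound > eps
        # Refinement step: expensive exact distance in the original space.
        refinements += 1
        if d_exact(query, obj) <= eps:
            results.append(idx)
    return results, refinements


# Toy stand-ins (assumptions, not the EMD): L1 distance as the exact measure,
# and the difference of the total sums as a 1-dimensional lower bound of L1.
def l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def total(v):
    return [sum(v)]


db = [[1, 2, 3], [3, 2, 1], [10, 0, 0], [0, 0, 1]]
results, refinements = range_query([1, 2, 2], db, 2, total, l1, l1)
```

In this toy run the filter discards two of the four objects and two refinements remain; a more selective filter would leave fewer. A real system would precompute the reduced database objects instead of reducing them per query.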
Dimensionality Reduction for the EMD
Both the feature vectors and the cost matrix have to be reduced
General linear dimensionality reduction techniques (PCA, ICA, etc.) fail the quality criteria for the EMD:
  Discarding dimensions destroys the lower bound (LB) property
  Splitting dimensions causes poor selectivity
Aggregating dimensionality reductions can work well:
  Original dimensions are not split up
  Each reduced dimension consists of a set of original dimensions
Reduction Matrixes
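Aggregating dimensionality reductions are characterized by a reduction matrix $R = [r_{ab}] \in \{0,1\}^{d \times d'}$ in which every row contains exactly one 1, so that each original dimension is assigned to exactly one reduced dimension
Example:
$R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$, $x = (2\ 4\ 3\ 6)$, $x' = x \cdot R = (6\ 9)$
The lower-bounding reduced cost matrix $C' = [c'_{a'b'}]$ for a given R is defined as in [2]; no other reduced cost matrix gives a larger lower bound (see paper)
Main question: Which dimensions to aggregate?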
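The reduction of a histogram and of the cost matrix can be sketched in a few lines. The helper names `reduce_vector` and `reduce_cost` are hypothetical. The rule used for C' (the minimum original cost between any pair of dimensions from the two aggregated groups) is an assumption about the definition in [2]; it does reproduce the C' shown on the data-independent reduction slide.

```python
def reduce_vector(x, R):
    """x' = x . R: sum the entries of x that share a reduced dimension."""
    d_red = len(R[0])
    return [sum(x[a] * R[a][r] for a in range(len(x))) for r in range(d_red)]


def reduce_cost(C, R):
    """Reduced cost: cheapest original cost between two groups of dimensions.

    Assumes every reduced dimension has at least one original dimension.
    """
    d, d_red = len(R), len(R[0])
    groups = [[a for a in range(d) if R[a][r] == 1] for r in range(d_red)]
    return [[min(C[a][b] for a in ga for b in gb) for gb in groups]
            for ga in groups]


R = [[1, 0], [1, 0], [0, 1], [0, 1]]
C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
x_red = reduce_vector([2, 4, 3, 6], R)
C_red = reduce_cost(C, R)
```

Taking the minimum is what keeps the reduced distance from overestimating: any flow between two groups is charged at most what it would cost between the original dimensions.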
[2] Ljosa, Bhattacharya, Singh, Indexing Spatially Sensitive Distance Measures using Multi-Resolution Lower Bounds, EDBT2006.
Data-Independent Reduction
Goal: Tight lower bound (large reduced EMD values)
  Large cost between reduced dimensions
  Small loss of cost for each reduced dimension
Matches clustering goal: low intra-cluster dissimilarity / high inter-cluster dissimilarity
kMedoid clustering based on the cost matrix
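Example:
$C = \begin{pmatrix} 0 & 1 & 3 & 4 \\ 1 & 0 & 2 & 3 \\ 3 & 2 & 0 & 1 \\ 4 & 3 & 1 & 0 \end{pmatrix}$, $R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$, $C' = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$
[Figure annotation: lost cost information]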
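A k-medoid clustering of the original dimensions, using the cost matrix as the dissimilarity, can be sketched as follows. This is a plain alternating scheme with a naive initialization (the first k dimensions as medoids); the function name, the initialization and the stopping rule are assumptions, not details from the slides. It also assumes that distinct dimensions have a cost greater than zero, so no cluster becomes empty.

```python
def kmedoid_reduction(cost, k, max_iter=100):
    """Cluster the original dimensions and return the reduction matrix R."""
    d = len(cost)
    medoids = list(range(k))  # naive deterministic initialization (assumption)
    for _ in range(max_iter):
        # Assign each original dimension to its cheapest medoid.
        assign = [min(range(k), key=lambda c: cost[a][medoids[c]])
                  for a in range(d)]
        # Re-pick each medoid as the member with the lowest total cost
        # to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [a for a in range(d) if assign[a] == c]
            new_medoids.append(
                min(members, key=lambda m: sum(cost[m][a] for a in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # One row per original dimension, one column per cluster.
    return [[1 if assign[a] == c else 0 for c in range(k)] for a in range(d)]


C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
R = kmedoid_reduction(C, 2)
```

For the example cost matrix the scheme groups the first two and the last two dimensions, which is the R shown on the slide.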
Data-Dependent Reduction based on flows
Idea: Incorporate knowledge on data for better reduction
In data-independent reduction, only C is used
  Problem: Ensuring a large $c'_{a'b'}$ is pointless if the flow $f'_{a'b'}$ is small
Now: Also include information on the flows F
Data-Dependent Reduction: Algorithm
Add a preprocessing step that analyzes the data:
  Collect information about the flows in the unreduced EMD
  Use this information to improve the initial / intermediate reduction matrix
  Iterate until no improvement is made
Data-Dependent Reduction: Preprocessing
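Calculate the average flow matrix $\bar{F} = [\bar{f}_{ab}]$ for a sample S of the DB
Approximate the flows F' in the reduced EMD with $\tilde{F}' = R^T \bar{F} R$
Maximize the approximate average reduced EMD
Example:
$R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$
Average flows: $\bar{F} = \begin{pmatrix} 2 & 1 & 2 & 3 \\ 0 & 1 & 2 & 1 \\ 3 & 2 & 3 & 1 \\ 1 & 3 & 0 & 1 \end{pmatrix}$
Approximate average reduced flows: $\tilde{F}' = \begin{pmatrix} 4 & 8 \\ 9 & 5 \end{pmatrix}$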
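The preprocessing arithmetic can be checked with a short sketch. The helper names are hypothetical, and reading the "approximate average reduced EMD" as the sum of reduced costs times approximate reduced flows is an assumption, since the formula did not survive on the slide. The reduced cost matrix is the C' from the data-independent reduction slide.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def reduced_flows(F_avg, R):
    """Approximate reduced flows: R^T . F_avg . R."""
    return matmul(matmul(transpose(R), F_avg), R)


def approx_reduced_emd(C_red, F_red):
    """Assumed objective: sum over all reduced pairs of cost times flow."""
    return sum(c * f for c_row, f_row in zip(C_red, F_red)
               for c, f in zip(c_row, f_row))


R = [[1, 0], [1, 0], [0, 1], [0, 1]]
F_avg = [[2, 1, 2, 3], [0, 1, 2, 1], [3, 2, 3, 1], [1, 3, 0, 1]]
C_red = [[0, 2], [2, 0]]  # reduced cost matrix for this R

F_red = reduced_flows(F_avg, R)
score = approx_reduced_emd(C_red, F_red)
```

Flows that stay inside one aggregated group (the diagonal of the reduced flow matrix) are multiplied by a reduced cost of zero, so only flows between groups contribute to the score.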
Data-Dependent Reduction: Optimization
Global optimization of the approximate average reduced EMD requires assessment of all possible reduction matrices
Find a local optimum via reassignment of dimensions:
  FB-All: Choose the best reassignment in each iteration
  FB-Mod: Choose the first profitable reassignment in each iteration
Initial reduction matrices:
  Base: assign all original dimensions to the first reduced dimension
  KMed: reduction matrix from the data-independent reduction
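A local search in the spirit of FB-Mod can be sketched as follows: starting from the Base assignment, it accepts the first reassignment of a single dimension that raises the objective and repeats until none does. This is an illustration under assumptions, not the authors' implementation: the function names, the scan order and the objective (minimum cost between groups times summed average flow between groups) are all hypothetical choices.

```python
def objective(assign, k, cost, flow):
    """Assumed objective: reduced cost times approximate reduced flow, summed."""
    d = len(assign)
    groups = [[a for a in range(d) if assign[a] == g] for g in range(k)]
    total = 0
    for ga in groups:
        for gb in groups:
            if ga and gb:  # empty reduced dimensions contribute nothing
                c_red = min(cost[a][b] for a in ga for b in gb)
                f_red = sum(flow[a][b] for a in ga for b in gb)
                total += c_red * f_red
    return total


def first_improvement_search(cost, flow, k, assign=None):
    """Reassign one dimension at a time, taking the first profitable move."""
    d = len(cost)
    # "Base" start: every original dimension in the first reduced dimension.
    assign = list(assign) if assign is not None else [0] * d
    best = objective(assign, k, cost, flow)
    improved = True
    while improved:
        improved = False
        for a in range(d):
            old = assign[a]
            for g in range(k):
                if g == old:
                    continue
                assign[a] = g
                value = objective(assign, k, cost, flow)
                if value > best:
                    best, improved = value, True
                    break  # keep the move and rescan from the start
                assign[a] = old  # undo an unprofitable move
            if improved:
                break
    return assign, best


C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
F_avg = [[2, 1, 2, 3], [0, 1, 2, 1], [3, 2, 3, 1], [1, 3, 0, 1]]
assign, best = first_improvement_search(C, F_avg, 2)
```

On the example matrices the search separates the first two dimensions from the last two. An FB-All style variant would evaluate every possible reassignment in an iteration and keep the best one, at a higher cost per iteration.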
Experimental Results
Data-independent vs. data-dependent aggregation
[Figure: sample image [2]; aggregation with data-independent reduction (kMedoid) and data-dependent reduction (FB-All-Mod); costliest flows]
Experimental Results
Efficiency vs. reduced dimensionality (Retina DB)
Experimental Results
Efficiency vs. reduced dimensionality (IRMA DB)
Experimental Results
Filter & Refinement times and filter selectivity (IRMA DB)
Conclusion & Outlook
Conclusion
  Earth Mover’s Distance as a similarity measure
    High quality, but computationally expensive in high dimensions
  Dimensionality reduction for the EMD
    Data-independent reduction: Clustering in feature space
    Data-dependent reduction: Analyze flow information
Outlook
  Local reductions
  Different reduction for query and DB
  Index reduced histograms using [3]
[3] Assent, Wichterich, Meisen, Seidl, Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases, ICDE 2008.