Shape-based Similarity Retrieval for Medical Image Databases
Xiaoning Qian, Yale University
A Different Retrieval Strategy
• Image content and semantics:
  • Hard to describe by text; best described graphically
  • Hard for naive users to form meaningful queries using text
  • Different categorizations for different users
  • Rich in geometry
• Content-based similarity retrieval:
  • A more flexible search strategy for image databases: define semantics by image appearance directly
• Image features:
  • Low tolerance for errors in medical image databases
  • Technically difficult to index
Motivation
• NHANES II – National Health and Nutrition Examination Surveys
• 17,000 cervical and lumbar spine images
  • Vertebral shapes correlate with the presence and severity of osteophytes (sharp protuberances)
  • Shape-based similarity retrieval
[Figure: cervical and lumbar spine images]
Motivation
Content-based similarity retrieval
• Feature space
• Feature distance
• Similarity retrievals:
  • Range queries: retrieve $\{u \mid d(q, u) \le r,\; u \in P\}$, where $d$ is the feature distance and $q$ is the query example
  • K-nearest neighbor queries
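The two query types above can be sketched with a brute-force scan, which is the expensive baseline that indexing is meant to avoid. The points, query, and radius below are illustrative stand-ins, not data from the talk:

```python
import numpy as np

def range_query(points, q, r):
    """Range query: return all points u with d(q, u) <= r."""
    d = np.linalg.norm(points - q, axis=1)
    return points[d <= r]

def knn_query(points, q, k):
    """K-nearest-neighbor query: return the k points closest to q."""
    d = np.linalg.norm(points - q, axis=1)
    return points[np.argsort(d)[:k]]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.5, 0.5]])
q = np.array([0.0, 0.0])
print(len(range_query(pts, q, 1.0)))   # 3 points fall within radius 1
print(knn_query(pts, q, 2))            # the two closest points
```

Both scans cost O(n) distance evaluations per query; the indexing trees discussed later exist to avoid exactly this cost.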
Indexing
• Problem of content-based retrieval: assessing similarity is complex, so brute-force similarity retrieval is too expensive in large databases
• Indexing: a data structure and corresponding retrieval algorithm that quickly retrieves all similar feature points
Vertebra Boundaries
• Content-based retrieval based on 2-d shapes in images
• Deformable-template contour detection algorithm using orthogonal curves
• Each boundary is obtained as a sequence of m points
• The correspondence of the m points is known
• When boundaries are given by points, shapes cannot be represented by vectors
Problems in Shape Indexing
• Fact 1: Shape space is a curved manifold.
• Fact 2: Coordinate-based indexing trees usually outperform metric-based trees.
• An optimal embedding algorithm to embed shape space into a Euclidean space
Problems in Shape Indexing
• Fact 3: Shape space and embedding space are high dimensional.
  • Curse of dimensionality
  • Practical shapes are non-uniformly distributed.
• To overcome the curse, we adapt indexing trees to further increase their efficiency.
(Figure from the PhD thesis "Efficiently Indexing High-Dimensional Data Spaces" by Christian Böhm)
Overview
• Content-based retrieval using 2-d shapes in images
• When boundaries are given by points, shapes cannot be represented by vectors
• Shape space is high dimensional
• Shape indexing is difficult because shape space is a high-dimensional curved manifold
2-D Boundary Shape Space
[Diagram: from an image, via segmentation, to a point in the 2-d boundary shape space]
Shape Space and Shape Embedding
Shape space theory
• Object boundaries are represented by a fixed number of sample points (object configurations):
  $z = [x_1 + i y_1, x_2 + i y_2, \ldots, x_m + i y_m]^T \in \mathbb{C}^m$
• Two boundaries have the same shape if they can be mapped onto each other by translation, rotation and scaling:
  $z_1 \sim z_2$ if $z_1 = t z_2 + c \mathbf{1}_m$, $t \in \mathbb{C} \setminus \{0\}$, $c \in \mathbb{C}$
• We can define a shape as an equivalence class:
  $[z] = \{ t z + c \mathbf{1}_m \mid t \in \mathbb{C} \setminus \{0\},\; c \in \mathbb{C} \}$
• The set of all equivalence classes is a shape space, which is a curved manifold.
Shape and pre-shape
[Diagram: configuration space $\mathbb{C}^m \to$ pre-shape space $\to$ shape space $\mathbb{CP}^{m-2}$; each orbit under translation and scaling maps to a point of the pre-shape space, and each orbit under rotation maps to a point of the shape space]
• Pre-shape: center and normalize the configuration,
  $[z_j]_p = (z_j - \bar{z}_j \mathbf{1}_m) / \| z_j - \bar{z}_j \mathbf{1}_m \|$, with $\bar{z}_j = \frac{1}{m} \mathbf{1}_m^T z_j$
• Rotated pre-shape: $[z_j]_s = e^{i\theta_j} [z_j]_p$
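The pre-shape map above is two lines of NumPy on the complex representation of a boundary; the square sampled at its four corners is a hypothetical example configuration:

```python
import numpy as np

def preshape(z):
    """Map a configuration (complex m-vector of boundary points) to its
    pre-shape: remove translation by centering, remove scale by normalizing."""
    zc = z - z.mean()                  # quotient out translation
    return zc / np.linalg.norm(zc)     # quotient out scale

# hypothetical boundary: a unit square sampled at its 4 corners
z = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])
p = preshape(z)
print(np.linalg.norm(p))   # pre-shapes have unit norm
print(abs(p.sum()))        # and are centered: entries sum to ~0
```

Only rotation remains unquotiented, which is exactly what the Procrustes distance minimizes over next.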
Shape space and Procrustes distances
• Shape space of planar objects is a complex projective space with complex dimension m-2
• There are naturally defined distance metrics
  • Partial Procrustes distance:
    $d_p^2(z_1, z_2) = \min_{\theta} \| [z_1]_p - e^{i\theta} [z_2]_p \|^2$
  • Its extension to measure weighted distances between shapes:
    $d_{wp}^2(z_1, z_2) = \min_{\theta} \| W [z_1]_p - e^{i\theta} W [z_2]_p \|^2$
  • It is also a metric
[Diagram: in the pre-shape space, $d_p$ is the distance between orbits under rotation]
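The minimization over the rotation $e^{i\theta}$ has a closed form: for unit pre-shapes $a$ and $b$, $\min_\theta \|a - e^{i\theta} b\|^2 = 2 - 2\,|\langle a, b \rangle|$, with the minimum at $\theta = -\mathrm{angle}(\langle a, b \rangle)$. A sketch of the partial Procrustes distance using that identity (the test configurations are illustrative):

```python
import numpy as np

def preshape(z):
    zc = z - z.mean()
    return zc / np.linalg.norm(zc)

def partial_procrustes(z1, z2):
    """Partial Procrustes distance: min over rotations e^{i*theta} of
    ||[z1]_p - e^{i*theta}[z2]_p||, computed in closed form."""
    a, b = preshape(z1), preshape(z2)
    inner = np.vdot(b, a)              # Hermitian inner product sum a_k * conj(b_k)
    return np.sqrt(max(0.0, 2.0 - 2.0 * abs(inner)))

# invariance check: w is a translated, scaled, rotated copy of z
z = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])
w = 2.0 * np.exp(1j * 0.7) * z + (3 + 4j)
print(partial_procrustes(z, w))        # ~0: same shape
```

Because both arguments are mapped to pre-shapes first, the distance is invariant to translation, scale, and rotation, matching the equivalence classes defined above.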
Shape embedding
• Embed shapes in a Euclidean space for shape indexing
• Optimality: minimizing the difference between the partial Procrustes distance and the Euclidean distance after embedding
Shape embedding
[Diagram: each orbit under rotation in the pre-shape space maps to a point of $\mathbb{CP}^{m-2}$; the embedding maps shapes back into the pre-shape space, near the Procrustean mean]
• Embedding function: $[z_i]_s = e^{i\theta_i} [z_i]_p$, with rotations $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$
• Objective function:
  $\min_{\Theta} \sum_{i,j} \left( \| [z_i]_s - [z_j]_s \|^2 - d_p^2([z_i]_p, [z_j]_p) \right)$
Shape embedding – Optimality
Lemma: The original problem is equivalent to minimizing
  $G(\Theta) = \sum_{i,j} \| [z_i]_s - [z_j]_s \|^2$
Proposition: If $H(\Theta, \mu) = \sum_i \| [z_i]_s - \mu \|^2$ has a minimizer $(\hat{\Theta}, \hat{\mu})$, then $\hat{\Theta} = \{\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n\}$ also minimizes $G(\Theta)$:
  $\min_{\Theta} \sum_{i,j} \left( \| [z_i]_s - [z_j]_s \|^2 - d_p^2([z_i]_p, [z_j]_p) \right) \;\Longleftrightarrow\; \min_{\Theta} \sum_{i,j} \| [z_i]_s - [z_j]_s \|^2 \;\Longleftrightarrow\; \min_{\Theta, \mu} \sum_i \| [z_i]_s - \mu \|^2$
The minimizer $\hat{\mu} = \frac{1}{n} \sum_j e^{i\hat{\theta}_j} [z_j]_p$ is the Procrustean mean size-and-shape.
Shape embedding – algorithm
Algorithm: Embedding
1. Embed the shapes of configurations $z_i$ at the pre-shapes $[z_i]_p$ with initial rotations $\hat{\theta}_i = 0$;
2. Estimate the Procrustean mean: $\hat{\mu} = \frac{1}{n} \sum_j e^{i\hat{\theta}_j} [z_j]_p$;
3. Estimate the embedding functions, that is, the rotations: $\hat{\theta}_i = \mathrm{angle}([z_i]_p^* \hat{\mu})$;
4. Iterate steps 2 and 3 until the estimates converge.
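The alternating steps above can be sketched in NumPy as follows; this is a minimal version where a fixed iteration count stands in for a convergence test, and the rotated-square inputs are illustrative:

```python
import numpy as np

def preshape(z):
    zc = z - z.mean()
    return zc / np.linalg.norm(zc)

def embed(shapes, iters=50):
    """Alternating embedding: estimate the Procrustean mean, then the
    per-shape rotation aligning each pre-shape to the mean, and repeat."""
    P = np.array([preshape(z) for z in shapes])       # one pre-shape per row
    theta = np.zeros(len(P))                          # step 1: zero rotations
    for _ in range(iters):
        mu = (np.exp(1j * theta)[:, None] * P).mean(axis=0)        # step 2
        theta = -np.angle(np.array([np.vdot(mu, p) for p in P]))   # step 3
    return np.exp(1j * theta)[:, None] * P            # embedded shapes [z]_s

# three rotated copies of one square-like boundary
shapes = [np.array([0, 1, 1 + 1j, 1j]) * np.exp(1j * a) for a in (0.0, 0.5, 1.0)]
E = embed(shapes)
# after embedding, rotated copies of one shape collapse to (nearly) one point
print(np.linalg.norm(E[0] - E[1]))
```

Each rotation update solves $\min_\theta \|e^{i\theta} [z_i]_p - \hat{\mu}\|$ in closed form, so every iteration decreases the objective.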
Shape distances in embedding space
• Shape distance after embedding:
  $d_e = \| [z_j]_s - [z_k]_s \| = \| e^{i\theta_j} [z_j]_p - e^{i\theta_k} [z_k]_p \|$
• Its extension to measure partial shape distances:
  $d_{we} = \| W e^{i\theta_j} [z_j]_p - W e^{i\theta_k} [z_k]_p \|$
• Small metric distortion after optimal shape embedding
• Shape proximity measured by different distances
Experiments on metric distortion
• Compare $d_p$ vs. $d_e$, and $d_{wp}$ vs. $d_{we}$
• Fraction of distance difference: $FD = |d_e - d_p| / d_p$
• Relative difference between the two metrics measured over 1000 vertebral shapes:
  • Mean: 4.59 × 10⁻⁴
  • Variance: 5.21 × 10⁻⁷
• Overlap of nearest neighbors
Experiments – weighted distances
• Expert-given severity grades: "0" to "5" (169 graded vertebrae)
• Average grade of nearest neighbors for query grade "0"
• Average grade of nearest neighbors for query grade "3"
Experiments – weighted distances
Conclusions
• Shapes can be embedded back into Euclidean space optimally
• Object shape correlates with the expert-given severity grades
• We can compare both complete and partial shapes by extending the original shape distances to the weighted shape distances, in shape space and in embedding space
Indexing Tree Adaptation
Indexing trees
• Clustering tree
  • Constructed by pair-wise distances – metric spherical covers
  • Retrieval – recursive node tests
• Kd-tree
  • Constructed by spatial positions – rectangular covers
  • Retrieval – recursive node tests
Axiomatic properties of indexing trees
• In indexing trees, each tree node is a subset of feature points (a hypercube or a metric sphere); all tree nodes at the same level cover the feature space
• Containment: each child node is contained (as a subset) in its parent node
• Monotonicity: if a feature-space subset N intersects the query sphere, then any superset of N also intersects the query sphere
Node tests
• Similarity retrieval recursively applies node tests from the root to the leaves
• A node test checks whether the query sphere intersects the tree node's cover
  • If it does not, the sub-tree rooted at the node cannot contain desired points and can be ignored
  • Otherwise, the sub-tree needs to be explored further
• The ability to reject subsets/nodes gives sub-linearity
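The node-test recursion can be sketched with a toy metric tree of spherical covers; the median split and the covering-sphere construction below are illustrative choices, not the construction from the thesis:

```python
import numpy as np

class Node:
    """A tree node covering a metric sphere (center, radius) over its points."""
    def __init__(self, points):
        self.points = points
        self.center = points.mean(axis=0)
        self.radius = np.linalg.norm(points - self.center, axis=1).max()
        self.children = []
        if len(points) > 2:  # naive median split along the first coordinate
            order = points[:, 0].argsort()
            half = len(points) // 2
            self.children = [Node(points[order[:half]]),
                             Node(points[order[half:]])]

def range_search(node, q, r, out):
    """Node test: if the query sphere misses the node's cover, prune the
    whole sub-tree; otherwise recurse (or scan the leaf)."""
    if np.linalg.norm(q - node.center) > r + node.radius:
        return                        # reject: no point inside can qualify
    if not node.children:
        out.extend(p for p in node.points if np.linalg.norm(q - p) <= r)
    else:
        for c in node.children:
            range_search(c, q, r, out)

pts = np.random.default_rng(0).normal(size=(200, 2))
out = []
range_search(Node(pts), np.zeros(2), 0.5, out)
print(len(out))
```

The pruning test is exactly the monotonicity property: a cover that misses the query sphere certifies that every contained subset misses it too.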
Indexing trees – performance measure
• Computational cost – average number of node tests $Q_T$
• A recursive formula:
  $Q_T = \begin{cases} 0, & \text{if } T \text{ is a leaf node} \\ 1 + \sum_{\rho \in C(T)} p_{\rho|T} \, Q_{\rho}, & \text{otherwise} \end{cases}$
  where $T, \rho$ are tree nodes; $C(T)$ is the set of child nodes of $T$; $p_{\rho}$ is the probability of passing node tests; and $p_{\rho|T}$ is the conditional probability of passing node $\rho$'s test given that its ancestors' tests are passed.
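The recursion can be evaluated directly on a toy tree; the encoding (each node is a list of (conditional probability, child) pairs) and the probabilities are made up for illustration:

```python
def expected_tests(children):
    """Average number of node tests Q_T for a (sub)tree, following the
    slide's recursion: a leaf costs 0; otherwise T's own test costs 1,
    plus each child's cost weighted by its conditional pass probability."""
    if not children:
        return 0.0   # leaf node
    return 1.0 + sum(p * expected_tests(sub) for p, sub in children)

# hypothetical tree: a likely-passing internal child and an unlikely leaf child
tree = [(0.9, [(0.5, []), (0.5, [])]),
        (0.1, [])]
print(expected_tests(tree))   # 1 + 0.9 * 1 + 0.1 * 0 = 1.9
```

The recursion makes the cost driver explicit: a child whose conditional pass probability is high contributes nearly its full sub-tree cost on every query.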
Indexing trees – performance measure
• Main factor – the conditional probability $p_{\rho|T}$
• Nodes that have high probabilities of passing node tests are "inefficient"
• We need to find an efficient indexing structure based on this performance measure
• Note: optimizing a binary decision tree is an NP-complete problem (Hyafil and Rivest)
Indexing tree adaptation
• Develop a procedure to improve tree performance by deleting the "inefficient" nodes
• Find a more efficient indexing tree by reducing the average number of node tests $Q_T$ under the operation of "node elimination"
[Figure: node elimination]
Indexing tree adaptation
• "Markovian" property: $p_{c|T} = p_{\rho|T} \, p_{c|\rho}$ for a child $c$ of node $\rho$
• Independence property: eliminating a node will not affect the contribution to the average computation from the sub-trees that do not intersect it
[Figure: node elimination]
Greedy node elimination
Compare the average costs before and after eliminating a node $\rho$ with children $C(\rho)$:
Proposition 1: The average computational cost decreases if and only if
  $p_{\rho|T} > 1 - \frac{1}{|C(\rho)|}$
Greedy node elimination
• Check each node at a level and eliminate the inefficient ones; then move to the next level
• Simple breadth-first tree traversal:
  FOR Level = root+1 : leaf-1
      eliminate inefficient nodes at this level
  END
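A sketch of the greedy pass, assuming Proposition 1's elimination test and Markov rescaling of the promoted children's probabilities; the (probability, children) tree encoding and the numbers are illustrative:

```python
def greedy_eliminate(children):
    """Greedy node elimination, sketched: a child rho with a sub-tree is
    eliminated (its own children promoted to the current level) when
    Proposition 1's condition p_{rho|T} > 1 - 1/|C(rho)| holds; promoted
    probabilities are rescaled by p_{c|T} = p_{rho|T} * p_{c|rho}."""
    new = []
    for p, sub in children:
        if sub and p > 1.0 - 1.0 / len(sub):
            new.extend((p * pc, sc) for pc, sc in sub)   # eliminate rho
        else:
            new.append((p, sub))
    return [(p, greedy_eliminate(sub)) for p, sub in new]

tree = [(0.95, [(0.5, []), (0.5, [])]),   # 0.95 > 1 - 1/2: eliminated
        (0.20, [(0.5, []), (0.5, [])])]   # 0.20 <= 0.5: kept
adapted = greedy_eliminate(tree)
print(len(adapted))   # 3 top-level children after elimination
```

This mirrors the breadth-first loop above: an almost-always-passing node buys no pruning, so its test is pure overhead and its children are tested directly instead.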
Optimal tree adaptation
• Objective function: $T_{\min} = \arg\min_{T' \in \mathcal{T}} Q_{T'}$
• Search space: to find the optimal sub-tree rooted at a node, $\mathcal{T}$ is the set of sub-trees obtained by eliminating nodes from the sub-tree rooted at that node
Optimal tree adaptation
Proposition 2 (Optimality): The adapted tree $T_{\min}$ minimizes the average number of node tests over all trees obtainable by node elimination.
• Proof by contradiction.
Optimal tree adaptation
• Dynamic programming procedure: a tree traversal embedded in another tree traversal
  • Check every node at a level and obtain the optimal sub-tree rooted at its parent; then repeat for the nodes below that level
• Computational complexity:
  • Greedy node elimination: O(N)
  • Optimal tree adaptation: O(N log N)
Experiments
• Construct trees for the given data set:
  • Coordinate-based indexing tree: kd-tree
  • Metric-based indexing tree: clustering tree
• Compare the average computational cost between the original tree and the adapted trees:
  • Scalability
  • Effect of dimension (intrinsic dimension)
Experiments
• Average computational cost vs. database size
Experiments
• Average computational cost vs. dimension
  • D – exterior dimension
  • d – intrinsic dimension
  • cost – average number of node tests

  D      d     cost
  2      2     89.2083
  10     2     88.0517
  20     2     88.6447
  shape  ?     314.5052
  4      4     464.9511
  10     10    3234.954
Conclusions
• Tree adaptation can increase the efficiency of indexing trees by eliminating inefficient nodes
  • We can achieve optimality under the operation of "node elimination"
• The performance of indexing trees depends on the distribution of feature points
• In general, coordinate-based indexing trees (kd-trees) appear to be more efficient than metric-based indexing trees (clustering trees)
Compare Different Shape Indexing Strategies
Shape indexing
• Clustering tree:
  • Constructed using pair-wise distances
  • Overlapping tree nodes
  • Cannot support different distances equally efficiently
• Kd-tree:
  • Constructed using spatial positions (coordinates)
  • Disjoint tree nodes
  • Supports different distances efficiently
Performance Comparison
• Indexing performance
  • Computation: average number of node tests
• Complete shape queries
• Partial shape queries
Comparison of retrievals
• Extended to partial shape retrieval
• Euclidean distance after embedding
• Weighted Euclidean distance after embedding
Conclusions
• Shape indexing after embedding gives better performance
• Shape embedding supports efficient complete as well as partial shape indexing
Cardiac Ultrasound Database
Query
k nearest neighbors
Dolphin Database
Query
k nearest neighbors
Summary and Future Directions
[Diagram: a user issues a query to the shape-based similarity retrieval system; an indexing structure (an index based on shapes) over the image database returns the results]
Current work
• Relevance feedback
• Browsing
• Other features
• Text database searching and indexing
• Combining results
Future work
• Indexing using intrinsic dimension
  • Construct efficient indexing structures based on the proposed performance measure
  • Practical data is not uniform
• Relevance feedback
  • Update the weighting matrix W for the proposed partial shape distances
  • Learn a meaningful adaptive distance metric for multiple features integrated in medical image databases (similarity: Procrustes distance, other image features, patient records, other distances)
  • Unify image semantics and text semantics
  • Is there a way of providing feedback to achieve effective retrieval?
• Browsing
  • What are effective interactive interfaces for image databases (e.g., moving along a path)?
  • How can we construct indexing structures for efficient browsing?
Big Picture
[Diagram: BRIC – images, patient records, clinical decisions, treatment plans, documents, … – and SILS; high-level knowledge and low- or middle-level features are connected by image processing/analysis, learning/mining, and indexing & searching (with an appropriate formulation); an information model, user behaviors, metadata, a user interface, NLP, … feed the IR system]
Acknowledgements
• Prof. Hemant D. Tagare
• Prof. James S. Duncan
• Prof. Larry Staib
• Prof. Robert Fulbright
• Dr. Carl Jaffe
• Mr. Rodney Long
• Dr. Sameer Antani
Supported by R01-LM06911 from NLM
Image Processing and Analysis Group, Yale University
Finding Endocardium
Experiments
• Improvement over the original indexing trees: clustering tree, 10,000 data points
• Average number of node tests (Orig. = original tree, NE = greedy node elimination, Opt. = optimal adaptation):

  Test      Ttest     |  Dim = 2                |  Dim = 10
                      |  Orig.   NE      Opt.   |  Orig.    NE       Opt.
  0.0       0.0       |  75.85   68.14   67.46  |  1968.72  1302.12  1301.53
  0.0       0.05,0.5  |  301.81  272.75  272.02 |  5484.99  3807.02  3807.10
  0.05,0.5  0.0       |  75.85   76.52   70.31  |  1968.72  2435.49  2435.49
  0.05,0.5  0.05,0.5  |  301.81  253.21  241.93 |  5484.99  3736.14  3736.14
Experiments
• Improvement over the original indexing trees: kd-tree, 10,000 data points
• Average number of node tests (Orig. = original tree, NE = greedy node elimination, Opt. = optimal adaptation):

  Test      Ttest     |  Dim = 2                |  Dim = 10
                      |  Orig.   NE      Opt.   |  Orig.    NE       Opt.
  0.0       0.0       |  27.13   26.3    26.24  |  27.14    26.29    26.24
  0.0       0.05,0.5  |  227.69  198.44  199.86 |  1310.91  1225.57  1230.49
  0.05,0.5  0.0       |  27.13   34.78   29.18  |  27.14    142.06   50.42
  0.05,0.5  0.05,0.5  |  227.69  186.97  171.71 |  1310.91  1163.25  1134.10
Optimal tree adaptation
• Optimality: if the sub-tree rooted at a node is optimal, then all of the sub-trees rooted at its child nodes are optimal
• Outer loop:
  FOR Depth = leaf-2 : root
      FOR each node at this depth
          find the minimum-cost tree
      END
  END
Optimal tree adaptation
• Collection of all possible children configurations of a node after node elimination
• Independence property: the average computation $J(X, \cdot)$ for the sub-tree rooted at a node decomposes over the set of sub-trees rooted at the set of child nodes X
Proposition 3: the minimum average cost is obtained by minimizing $J$ over the children configurations of each sub-tree independently.
Optimal tree adaptation
• From the independence property, the minimizing children configuration for the sub-tree rooted at node i is
  $C_{\min}(i) = \arg\min_{C} J(\{i\}, C)$
• Inner loop:
  FOR Depth = leaf-2 : depth of the current node
      FOR each node at this depth
          compare J({i}, ) and J(Cmin(i), );
          find Cmin();
      END
  END
Curved Manifold
[Diagram: points A and B on a curved manifold M]
• Vector space: distance is the straight-line distance, $\|A - B\|_2$
• Curved space: distance must be measured along the manifold