Shape-based Similarity Retrieval for Medical Image Databases
Xiaoning Qian, Yale University
A Different Retrieval Strategy
• Image content and semantics:
  • Hard to describe by text; best described graphically
  • Hard for naive users to form meaningful queries using text
  • Different categorizations for different users
  • Rich in geometry
• Content-based similarity retrieval:
  • A more flexible search strategy for image databases: define semantics by image appearance directly
• Image features:
  • Low tolerance for errors in medical image databases
  • Technically difficult to index
Motivation
• NHANES II – National Health and Nutrition Examination Surveys
• 17,000 cervical and lumbar spine images
  • Vertebral shapes correlate with the presence and severity of osteophytes (sharp protuberances)
  • Shape-based similarity retrieval
[Figure: cervical and lumbar spine images]
Motivation
Content-based similarity retrieval
• Feature space
• Feature distance
• Similarity retrievals:
  • Range queries: retrieve $\{u \mid d(q, u) \le r,\; u \in P\}$, where $d$ is the feature distance and $q$ is the query example
  • K-nearest neighbor queries
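The two query types above can be sketched with a brute-force scan, which is the expensive baseline that indexing is meant to avoid. The points, query, and radius below are illustrative stand-ins, not data from the talk:

```python
import numpy as np

def range_query(points, q, r):
    """Range query: return all points u with d(q, u) <= r."""
    d = np.linalg.norm(points - q, axis=1)
    return points[d <= r]

def knn_query(points, q, k):
    """K-nearest-neighbor query: return the k points closest to q."""
    d = np.linalg.norm(points - q, axis=1)
    return points[np.argsort(d)[:k]]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.5, 0.5]])
q = np.array([0.0, 0.0])
print(len(range_query(pts, q, 1.0)))   # 3 points fall within radius 1
print(knn_query(pts, q, 2))            # the two closest points
```

Both scans cost O(n) distance evaluations per query; the indexing trees discussed later exist to avoid exactly this cost.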
Indexing
• Problem of content-based retrieval: assessing similarity is complex, so brute-force similarity retrieval is too expensive in large databases
• Indexing: a data structure and corresponding retrieval algorithm that quickly retrieves all similar feature points
Vertebra Boundaries
• Content-based retrieval based on 2-d shapes in images
• Deformable-template contour detection algorithm using orthogonal curves
• Each boundary is obtained as a sequence of m points
• The correspondence of the m points is known
• When boundaries are given by points, shapes cannot be represented by vectors
Problems in Shape Indexing
• Fact 1: Shape space is a curved manifold.
• Fact 2: Coordinate-based indexing trees usually outperform metric-based trees.
• An optimal embedding algorithm to embed shape space into a Euclidean space
Problems in Shape Indexing
• Fact 3: Shape space and embedding space are high dimensional.
  • Curse of dimensionality
  • Practical shapes are non-uniformly distributed.
• To overcome the curse, we adapt indexing trees to further increase their efficiency.
(Figure from the PhD thesis "Efficiently Indexing High-Dimensional Data Spaces" by Christian Böhm)
Overview
• Content-based retrieval using 2-d shapes in images
• When boundaries are given by points, shapes cannot be represented by vectors
• Shape space is high dimensional
• Shape indexing is difficult because shape space is a high-dimensional curved manifold
2-D Boundary Shape Space
[Diagram: from an image, via segmentation, to a point in the 2-d boundary shape space]
Shape Space and Shape Embedding
Shape space theory
• Object boundaries are represented by a fixed number of sample points (object configurations):
  $z = [x_1 + i y_1, x_2 + i y_2, \ldots, x_m + i y_m]^T \in \mathbb{C}^m$
• Two boundaries have the same shape if they can be mapped onto each other by translation, rotation and scaling:
  $z_1 \sim z_2$ if $z_1 = t z_2 + c \mathbf{1}_m$, $t \in \mathbb{C} \setminus \{0\}$, $c \in \mathbb{C}$
• We can define a shape as an equivalence class:
  $[z] = \{ t z + c \mathbf{1}_m \mid t \in \mathbb{C} \setminus \{0\},\; c \in \mathbb{C} \}$
• The set of all equivalence classes is a shape space, which is a curved manifold.
Shape and pre-shape
[Diagram: configuration space $\mathbb{C}^m \to$ pre-shape space $\to$ shape space $\mathbb{CP}^{m-2}$; each orbit under translation and scaling maps to a point of the pre-shape space, and each orbit under rotation maps to a point of the shape space]
• Pre-shape: center and normalize the configuration,
  $[z_j]_p = (z_j - \bar{z}_j \mathbf{1}_m) / \| z_j - \bar{z}_j \mathbf{1}_m \|$, with $\bar{z}_j = \frac{1}{m} \mathbf{1}_m^T z_j$
• Rotated pre-shape: $[z_j]_s = e^{i\theta_j} [z_j]_p$
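The pre-shape map above is two lines of NumPy on the complex representation of a boundary; the square sampled at its four corners is a hypothetical example configuration:

```python
import numpy as np

def preshape(z):
    """Map a configuration (complex m-vector of boundary points) to its
    pre-shape: remove translation by centering, remove scale by normalizing."""
    zc = z - z.mean()                  # quotient out translation
    return zc / np.linalg.norm(zc)     # quotient out scale

# hypothetical boundary: a unit square sampled at its 4 corners
z = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])
p = preshape(z)
print(np.linalg.norm(p))   # pre-shapes have unit norm
print(abs(p.sum()))        # and are centered: entries sum to ~0
```

Only rotation remains unquotiented, which is exactly what the Procrustes distance minimizes over next.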
Shape space and Procrustes distances
• Shape space of planar objects is a complex projective space with complex dimension m-2
• There are naturally defined distance metrics
  • Partial Procrustes distance:
    $d_p^2(z_1, z_2) = \min_{\theta} \| [z_1]_p - e^{i\theta} [z_2]_p \|^2$
  • Its extension to measure weighted distances between shapes:
    $d_{wp}^2(z_1, z_2) = \min_{\theta} \| W [z_1]_p - e^{i\theta} W [z_2]_p \|^2$
  • It is also a metric
[Diagram: in the pre-shape space, $d_p$ is the distance between orbits under rotation]
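The minimization over the rotation $e^{i\theta}$ has a closed form: for unit pre-shapes $a$ and $b$, $\min_\theta \|a - e^{i\theta} b\|^2 = 2 - 2\,|\langle a, b \rangle|$, with the minimum at $\theta = -\mathrm{angle}(\langle a, b \rangle)$. A sketch of the partial Procrustes distance using that identity (the test configurations are illustrative):

```python
import numpy as np

def preshape(z):
    zc = z - z.mean()
    return zc / np.linalg.norm(zc)

def partial_procrustes(z1, z2):
    """Partial Procrustes distance: min over rotations e^{i*theta} of
    ||[z1]_p - e^{i*theta}[z2]_p||, computed in closed form."""
    a, b = preshape(z1), preshape(z2)
    inner = np.vdot(b, a)              # Hermitian inner product sum a_k * conj(b_k)
    return np.sqrt(max(0.0, 2.0 - 2.0 * abs(inner)))

# invariance check: w is a translated, scaled, rotated copy of z
z = np.array([0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j])
w = 2.0 * np.exp(1j * 0.7) * z + (3 + 4j)
print(partial_procrustes(z, w))        # ~0: same shape
```

Because both arguments are mapped to pre-shapes first, the distance is invariant to translation, scale, and rotation, matching the equivalence classes defined above.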
Shape embedding
• Embed shapes in a Euclidean space for shape indexing
• Optimality: minimizing the difference between the partial Procrustes distance and the Euclidean distance after embedding
Shape embedding
[Diagram: each orbit under rotation in the pre-shape space maps to a point of $\mathbb{CP}^{m-2}$; the embedding maps shapes back into the pre-shape space, near the Procrustean mean]
• Embedding function: $[z_i]_s = e^{i\theta_i} [z_i]_p$, with rotations $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$
• Objective function:
  $\min_{\Theta} \sum_{i,j} \left( \| [z_i]_s - [z_j]_s \|^2 - d_p^2([z_i]_p, [z_j]_p) \right)$
Shape embedding – Optimality
Lemma: The original problem is equivalent to minimizing
  $G(\Theta) = \sum_{i,j} \| [z_i]_s - [z_j]_s \|^2$
Proposition: If $H(\Theta, \mu) = \sum_i \| [z_i]_s - \mu \|^2$ has a minimizer $(\hat{\Theta}, \hat{\mu})$, then $\hat{\Theta} = \{\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n\}$ also minimizes $G(\Theta)$:
  $\min_{\Theta} \sum_{i,j} \left( \| [z_i]_s - [z_j]_s \|^2 - d_p^2([z_i]_p, [z_j]_p) \right) \;\Longleftrightarrow\; \min_{\Theta} \sum_{i,j} \| [z_i]_s - [z_j]_s \|^2 \;\Longleftrightarrow\; \min_{\Theta, \mu} \sum_i \| [z_i]_s - \mu \|^2$
The minimizer $\hat{\mu} = \frac{1}{n} \sum_j e^{i\hat{\theta}_j} [z_j]_p$ is the Procrustean mean size-and-shape.
Shape embedding – algorithm
Algorithm: Embedding
1. Embed the shapes of configurations $z_i$ at the pre-shapes $[z_i]_p$ with initial rotations $\hat{\theta}_i = 0$;
2. Estimate the Procrustean mean: $\hat{\mu} = \frac{1}{n} \sum_j e^{i\hat{\theta}_j} [z_j]_p$;
3. Estimate the embedding functions, that is, the rotations: $\hat{\theta}_i = \mathrm{angle}([z_i]_p^* \hat{\mu})$;
4. Iterate steps 2 and 3 until the estimates converge.
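The alternating steps above can be sketched in NumPy as follows; this is a minimal version where a fixed iteration count stands in for a convergence test, and the rotated-square inputs are illustrative:

```python
import numpy as np

def preshape(z):
    zc = z - z.mean()
    return zc / np.linalg.norm(zc)

def embed(shapes, iters=50):
    """Alternating embedding: estimate the Procrustean mean, then the
    per-shape rotation aligning each pre-shape to the mean, and repeat."""
    P = np.array([preshape(z) for z in shapes])       # one pre-shape per row
    theta = np.zeros(len(P))                          # step 1: zero rotations
    for _ in range(iters):
        mu = (np.exp(1j * theta)[:, None] * P).mean(axis=0)        # step 2
        theta = -np.angle(np.array([np.vdot(mu, p) for p in P]))   # step 3
    return np.exp(1j * theta)[:, None] * P            # embedded shapes [z]_s

# three rotated copies of one square-like boundary
shapes = [np.array([0, 1, 1 + 1j, 1j]) * np.exp(1j * a) for a in (0.0, 0.5, 1.0)]
E = embed(shapes)
# after embedding, rotated copies of one shape collapse to (nearly) one point
print(np.linalg.norm(E[0] - E[1]))
```

Each rotation update solves $\min_\theta \|e^{i\theta} [z_i]_p - \hat{\mu}\|$ in closed form, so every iteration decreases the objective.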
Shape distances in embedding space
• Shape distance after embedding:
  $d_e = \| [z_j]_s - [z_k]_s \| = \| e^{i\theta_j} [z_j]_p - e^{i\theta_k} [z_k]_p \|$
• Its extension to measure partial shape distances:
  $d_{we} = \| W e^{i\theta_j} [z_j]_p - W e^{i\theta_k} [z_k]_p \|$
• Small metric distortion after optimal shape embedding
• Shape proximity measured by different distances
Experiments on metric distortion
• Compare $d_p$ vs. $d_e$, and $d_{wp}$ vs. $d_{we}$
• Fraction of distance difference: $FD = |d_e - d_p| / d_p$
• Relative difference between the two metrics measured over 1000 vertebral shapes:
  • Mean: 4.59 × 10⁻⁴
  • Variance: 5.21 × 10⁻⁷
• Overlap of nearest neighbors
Experiments – weighted distances
• Expert-given severity grades: "0" to "5" (169 graded vertebrae)
• Average grade of nearest neighbors for query grade "0"
• Average grade of nearest neighbors for query grade "3"
Experiments – weighted distances
Conclusions
• Shapes can be embedded back into Euclidean space optimally
• Object shape correlates with the expert-given severity grades
• We can compare both complete and partial shapes by extending the original shape distances to the weighted shape distances, in shape space and in embedding space
Indexing Tree Adaptation
Indexing trees
• Clustering tree
  • Constructed by pair-wise distances – metric spherical covers
  • Retrieval – recursive node tests
• Kd-tree
  • Constructed by spatial positions – rectangular covers
  • Retrieval – recursive node tests
Axiomatic properties of indexing trees
• In indexing trees, each tree node is a subset of feature points (a hypercube or a metric sphere); all tree nodes at the same level cover the feature space
• Containment: each child node is contained (as a subset) in its parent node
• Monotonicity: if a feature-space subset N intersects the query sphere, then any superset of N also intersects the query sphere
Node tests
• Similarity retrieval recursively applies node tests from the root to the leaves
• A node test checks whether the query sphere intersects the tree node's cover
  • If it does not, the sub-tree rooted at the node cannot contain desired points and can be ignored
  • Otherwise, the sub-tree needs to be explored further
• The ability to reject subsets/nodes gives sub-linearity
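The node-test recursion can be sketched with a toy metric tree of spherical covers; the median split and the covering-sphere construction below are illustrative choices, not the construction from the thesis:

```python
import numpy as np

class Node:
    """A tree node covering a metric sphere (center, radius) over its points."""
    def __init__(self, points):
        self.points = points
        self.center = points.mean(axis=0)
        self.radius = np.linalg.norm(points - self.center, axis=1).max()
        self.children = []
        if len(points) > 2:  # naive median split along the first coordinate
            order = points[:, 0].argsort()
            half = len(points) // 2
            self.children = [Node(points[order[:half]]),
                             Node(points[order[half:]])]

def range_search(node, q, r, out):
    """Node test: if the query sphere misses the node's cover, prune the
    whole sub-tree; otherwise recurse (or scan the leaf)."""
    if np.linalg.norm(q - node.center) > r + node.radius:
        return                        # reject: no point inside can qualify
    if not node.children:
        out.extend(p for p in node.points if np.linalg.norm(q - p) <= r)
    else:
        for c in node.children:
            range_search(c, q, r, out)

pts = np.random.default_rng(0).normal(size=(200, 2))
out = []
range_search(Node(pts), np.zeros(2), 0.5, out)
print(len(out))
```

The pruning test is exactly the monotonicity property: a cover that misses the query sphere certifies that every contained subset misses it too.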
Indexing trees – performance measure
• Computational cost – average number of node tests $Q_T$
• A recursive formula:
  $Q_T = \begin{cases} 0, & \text{if } T \text{ is a leaf node} \\ 1 + \sum_{\rho \in C(T)} p_{\rho|T} \, Q_{\rho}, & \text{otherwise} \end{cases}$
  where $T, \rho$ are tree nodes; $C(T)$ is the set of child nodes of $T$; $p_{\rho}$ is the probability of passing node tests; and $p_{\rho|T}$ is the conditional probability of passing node $\rho$'s test given that its ancestors' tests are passed.
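The recursion can be evaluated directly on a toy tree; the encoding (each node is a list of (conditional probability, child) pairs) and the probabilities are made up for illustration:

```python
def expected_tests(children):
    """Average number of node tests Q_T for a (sub)tree, following the
    slide's recursion: a leaf costs 0; otherwise T's own test costs 1,
    plus each child's cost weighted by its conditional pass probability."""
    if not children:
        return 0.0   # leaf node
    return 1.0 + sum(p * expected_tests(sub) for p, sub in children)

# hypothetical tree: a likely-passing internal child and an unlikely leaf child
tree = [(0.9, [(0.5, []), (0.5, [])]),
        (0.1, [])]
print(expected_tests(tree))   # 1 + 0.9 * 1 + 0.1 * 0 = 1.9
```

The recursion makes the cost driver explicit: a child whose conditional pass probability is high contributes nearly its full sub-tree cost on every query.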
Indexing trees – performance measure
• Main factor – the conditional probability $p_{\rho|T}$
• Nodes that have high probabilities of passing node tests are "inefficient"
• We need to find an efficient indexing structure based on this performance measure
• Note: optimizing a binary decision tree is an NP-complete problem (Hyafil and Rivest)
Indexing tree adaptation
• Develop a procedure to improve tree performance by deleting the "inefficient" nodes
• Find a more efficient indexing tree by reducing the average number of node tests $Q_T$ under the operation of "node elimination"
[Figure: node elimination]
Indexing tree adaptation
• "Markovian" property: $p_{c|T} = p_{\rho|T} \, p_{c|\rho}$ for a child $c$ of node $\rho$
• Independence property: eliminating a node will not affect the contribution to the average computation from the sub-trees that do not intersect it
[Figure: node elimination]
Greedy node elimination
Compare the average costs before and after eliminating a node $\rho$ with children $C(\rho)$:
Proposition 1: The average computational cost decreases if and only if
  $p_{\rho|T} > 1 - \frac{1}{|C(\rho)|}$
Greedy node elimination
• Check each node at a level and eliminate the inefficient ones; then move to the next level
• Simple breadth-first tree traversal:
  FOR Level = root+1 : leaf-1
      eliminate inefficient nodes at this level
  END
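A sketch of the greedy pass, assuming Proposition 1's elimination test and Markov rescaling of the promoted children's probabilities; the (probability, children) tree encoding and the numbers are illustrative:

```python
def greedy_eliminate(children):
    """Greedy node elimination, sketched: a child rho with a sub-tree is
    eliminated (its own children promoted to the current level) when
    Proposition 1's condition p_{rho|T} > 1 - 1/|C(rho)| holds; promoted
    probabilities are rescaled by p_{c|T} = p_{rho|T} * p_{c|rho}."""
    new = []
    for p, sub in children:
        if sub and p > 1.0 - 1.0 / len(sub):
            new.extend((p * pc, sc) for pc, sc in sub)   # eliminate rho
        else:
            new.append((p, sub))
    return [(p, greedy_eliminate(sub)) for p, sub in new]

tree = [(0.95, [(0.5, []), (0.5, [])]),   # 0.95 > 1 - 1/2: eliminated
        (0.20, [(0.5, []), (0.5, [])])]   # 0.20 <= 0.5: kept
adapted = greedy_eliminate(tree)
print(len(adapted))   # 3 top-level children after elimination
```

This mirrors the breadth-first loop above: an almost-always-passing node buys no pruning, so its test is pure overhead and its children are tested directly instead.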
Optimal tree adaptation
• Objective function: $T_{\min} = \arg\min_{T' \in \mathcal{T}} Q_{T'}$
• Search space: to find the optimal sub-tree rooted at a node, $\mathcal{T}$ is the set of sub-trees obtained by eliminating nodes from the sub-tree rooted at that node
Optimal tree adaptation
Proposition 2 (Optimality): The adapted tree $T_{\min}$ minimizes the average number of node tests over all trees obtainable by node elimination.
• Proof by contradiction.
Optimal tree adaptation
• Dynamic programming procedure: a tree traversal embedded in another tree traversal
  • Check every node at a level and obtain the optimal sub-tree rooted at its parent; then repeat for the nodes below that level
• Computational complexity:
  • Greedy node elimination: O(N)
  • Optimal tree adaptation: O(N log N)
Experiments
• Construct trees for the given data set:
  • Coordinate-based indexing tree: kd-tree
  • Metric-based indexing tree: clustering tree
• Compare the average computational cost between the original tree and the adapted trees:
  • Scalability
  • Effect of dimension (intrinsic dimension)
Experiments
• Average computational cost vs. database size
Experiments
• Average computational cost vs. dimension
  • D – exterior dimension
  • d – intrinsic dimension
  • cost – average number of node tests

  D      d     cost
  2      2     89.2083
  10     2     88.0517
  20     2     88.6447
  shape  ?     314.5052
  4      4     464.9511
  10     10    3234.954
Conclusions
• Tree adaptation can increase the efficiency of indexing trees by eliminating inefficient nodes
  • We can achieve optimality under the operation of "node elimination"
• The performance of indexing trees depends on the distribution of feature points
• In general, coordinate-based indexing trees (kd-trees) appear to be more efficient than metric-based indexing trees (clustering trees)
Compare Different Shape Indexing Strategies
Shape indexing
• Clustering tree:
  • Constructed using pair-wise distances
  • Overlapping tree nodes
  • Cannot support different distances equally efficiently
• Kd-tree:
  • Constructed using spatial positions (coordinates)
  • Disjoint tree nodes
  • Supports different distances efficiently
Performance Comparison
• Indexing performance
  • Computation: average number of node tests
• Complete shape queries
• Partial shape queries
Comparison of retrievals
• Extended to partial shape retrieval
• Euclidean distance after embedding
• Weighted Euclidean distance after embedding
Conclusions
• Shape indexing after embedding gives better performance
• Shape embedding supports efficient complete as well as partial shape indexing
Cardiac Ultrasound Database
Query
k nearest neighbors
Dolphin Database
Query
k nearest neighbors
Summary and Future Directions
[Diagram: a user issues a query to the shape-based similarity retrieval system; an indexing structure (an index based on shapes) over the image database returns the results]
Current work
• Relevance feedback
• Browsing
• Other features
• Text database searching and indexing
• Combining results
Future work
• Indexing using intrinsic dimension
  • Construct efficient indexing structures based on the proposed performance measure
  • Practical data is not uniform
• Relevance feedback
  • Update the weighting matrix W for the proposed partial shape distances
  • Learn a meaningful adaptive distance metric for multiple features integrated in medical image databases (similarity: Procrustes distance, other image features, patient records, other distances)
  • Unify image semantics and text semantics
  • Is there a way of providing feedback to achieve effective retrieval?
• Browsing
  • What are effective interactive interfaces for image databases (e.g., moving along a path)?
  • How can we construct indexing structures for efficient browsing?
Big Picture
[Diagram: BRIC – images, patient records, clinical decisions, treatment plans, documents, … – and SILS; high-level knowledge and low- or middle-level features are connected by image processing/analysis, learning/mining, and indexing & searching (with an appropriate formulation); an information model, user behaviors, metadata, a user interface, NLP, … feed the IR system]
Acknowledgements
• Prof. Hemant D. Tagare
• Prof. James S. Duncan
• Prof. Larry Staib
• Prof. Robert Fulbright
• Dr. Carl Jaffe
• Mr. Rodney Long
• Dr. Sameer Antani
Supported by R01-LM06911 from NLM
Image Processing and Analysis Group, Yale University
Finding Endocardium
Experiments
• Improvement over the original indexing trees: clustering tree, 10,000 data points
• Average number of node tests (Orig. = original tree, NE = greedy node elimination, Opt. = optimal adaptation):

  Test      Ttest     |  Dim = 2                |  Dim = 10
                      |  Orig.   NE      Opt.   |  Orig.    NE       Opt.
  0.0       0.0       |  75.85   68.14   67.46  |  1968.72  1302.12  1301.53
  0.0       0.05,0.5  |  301.81  272.75  272.02 |  5484.99  3807.02  3807.10
  0.05,0.5  0.0       |  75.85   76.52   70.31  |  1968.72  2435.49  2435.49
  0.05,0.5  0.05,0.5  |  301.81  253.21  241.93 |  5484.99  3736.14  3736.14
Experiments
• Improvement over the original indexing trees: kd-tree, 10,000 data points
• Average number of node tests (Orig. = original tree, NE = greedy node elimination, Opt. = optimal adaptation):

  Test      Ttest     |  Dim = 2                |  Dim = 10
                      |  Orig.   NE      Opt.   |  Orig.    NE       Opt.
  0.0       0.0       |  27.13   26.3    26.24  |  27.14    26.29    26.24
  0.0       0.05,0.5  |  227.69  198.44  199.86 |  1310.91  1225.57  1230.49
  0.05,0.5  0.0       |  27.13   34.78   29.18  |  27.14    142.06   50.42
  0.05,0.5  0.05,0.5  |  227.69  186.97  171.71 |  1310.91  1163.25  1134.10
Optimal tree adaptation
• Optimality: if the sub-tree rooted at a node is optimal, then all of the sub-trees rooted at its child nodes are optimal
• Outer loop:
  FOR Depth = leaf-2 : root
      FOR each node at this depth
          find the minimum-cost tree
      END
  END
Optimal tree adaptation
• Collection of all possible children configurations of a node after node elimination
• Independence property: the average computation $J(X, \cdot)$ for the sub-tree rooted at a node decomposes over the set of sub-trees rooted at the set of child nodes X
Proposition 3: the minimum average cost is obtained by minimizing $J$ over the children configurations of each sub-tree independently.
Optimal tree adaptation
• From the independence property, the minimizing children configuration for the sub-tree rooted at node i is
  $C_{\min}(i) = \arg\min_{C} J(\{i\}, C)$
• Inner loop:
  FOR Depth = leaf-2 : depth of the current node
      FOR each node at this depth
          compare J({i}, ) and J(Cmin(i), );
          find Cmin();
      END
  END
Curved Manifold
[Diagram: points A and B on a curved manifold M]
• Vector space: distance is the straight-line distance, $\|A - B\|_2$
• Curved space: distance must be measured along the manifold