Image Parsing: Unifying Segmentation and Detection
Z. Tu, X. Chen, A.L. Yuille and S.-C. Zhu
ICCV 2003 (Marr Prize) & IJCV 2005
Sanketh Shetty
Outline
• Why Image Parsing?
• Introduction to Concepts in DDMCMC
• DDMCMC applied to Image Parsing
• Combining Discriminative and Generative Models for Parsing
• Results
• Comments
Image Parsing
Image I
Parse Structure W
Optimize p(W|I)
Properties of Parse Structure
• Dynamic and reconfigurable
  – Variable number of nodes and node types
• Defined by a Markov Chain
  – Data Driven Markov Chain Monte Carlo (earlier work in segmentation, grouping and recognition)
Key Concepts
• Joint model for Segmentation & Recognition
  – Combine different modules to obtain cues
• Fully generative explanation for image generation
  – Uses Generative and Discriminative Models + DDMCMC framework
  – Concurrent Top-Down & Bottom-Up Parsing
Pattern Classes
• 62 characters
• Faces
• Regions
• Key Concepts:
  – Markov Chains
  – Markov Chain Monte Carlo
    • Metropolis-Hastings [Metropolis 1953, Hastings 1970]
    • Reversible Jump [Green 1995]
  – Data Driven Markov Chain Monte Carlo
MCMC: A Quick Tour
Markov Chains
Notes: Slides by Zhu, Dellaert and Tu at ICCV 2005
Markov Chain Monte Carlo
Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm
Proposal Distribution
Invariant Distribution
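The Metropolis-Hastings step outlined above can be sketched as a minimal random-walk sampler (an illustrative Python sketch, not the paper's implementation; the symmetric Gaussian proposal makes the Hastings ratio reduce to p(x')/p(x)):

```python
import math
import random

def metropolis_hastings(log_target, x0, n_steps, step=0.5, rng=random):
    """Random-walk Metropolis sampler for a 1-D target density.

    log_target: unnormalized log density log p(x).
    With a symmetric Gaussian proposal q(x'|x), the acceptance
    probability is min(1, p(x') / p(x)).
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)        # propose x' ~ q(x'|x)
        log_alpha = log_target(x_new) - log_target(x)
        if math.log(rng.random()) < log_alpha:  # accept with prob min(1, ratio)
            x = x_new
        samples.append(x)
    return samples

# Toy check: sample a standard normal; the empirical mean should be near 0.
random.seed(0)
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_steps=20000)
mean = sum(samples) / len(samples)
```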
Reversible Jumps MCMC
• Many competing models to explain data
  – Need to explore this complicated state space
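Jump moves between models of different size can be illustrated with a toy birth/death sampler over a model order k, e.g. the number of regions in a parse (a deliberate simplification: full reversible-jump MCMC also matches continuous parameter dimensions with a Jacobian term, omitted here):

```python
import math
import random

def birth_death_sampler(lam, k_max, n_steps, rng=random):
    """Toy trans-dimensional sampler with target p(k) ∝ lam^k / k!
    (truncated Poisson). Birth (k -> k+1) and death (k -> k-1) moves
    are each proposed with probability 0.5, so the pair satisfies
    detailed balance with acceptance min(1, p(k') / p(k)).
    """
    k = 0
    counts = [0] * (k_max + 1)
    for _ in range(n_steps):
        if rng.random() < 0.5:                     # birth proposal
            if k + 1 <= k_max:
                # p(k+1) / p(k) = lam / (k + 1)
                if rng.random() < min(1.0, lam / (k + 1)):
                    k += 1
        else:                                      # death proposal
            if k - 1 >= 0:
                # p(k-1) / p(k) = k / lam
                if rng.random() < min(1.0, k / lam):
                    k -= 1
        counts[k] += 1
    return counts

# Toy check: with lam = 3 the sampled mean of k should be near 3.
random.seed(1)
counts = birth_death_sampler(lam=3.0, k_max=20, n_steps=50000)
mean_k = sum(k * c for k, c in enumerate(counts)) / sum(counts)
```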
DDMCMC Motivation
DDMCMC Motivation
Generative Model: p(I|W) p(W)
State Space
DDMCMC Motivation
Generative Model: p(I|W) p(W)
State Space
Discriminative Model: q(w_j | I)
Dramatically reduces the search space by focusing sampling on highly probable states.
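The reweighting of bottom-up proposals can be sketched as an independence sampler (an assumed structure, not the paper's exact kernel): candidate states scored by a discriminative q(w_j | I) are proposed in proportion to their scores, and the Hastings ratio corrects for q being only an approximation of p(W | I):

```python
import math
import random

def data_driven_step(state, log_p, candidates, rng=random):
    """One MH step with a data-driven independence proposal.

    candidates: dict mapping each state to its bottom-up score
    q(w | I); must include the current state. The acceptance ratio
    p(w') q(w) / (p(w) q(w')) corrects for the approximate proposal.
    """
    states = list(candidates)
    total = sum(candidates.values())
    # sample a proposal w' with probability proportional to q(w' | I)
    r = rng.random() * total
    proposal = states[-1]
    for s in states:
        r -= candidates[s]
        if r <= 0:
            proposal = s
            break
    log_alpha = (log_p(proposal) - log_p(state)
                 + math.log(candidates[state])
                 - math.log(candidates[proposal]))
    return proposal if math.log(rng.random()) < log_alpha else state

# Toy check: target p ∝ {a: 1, b: 2, c: 4}, crude uniform bottom-up scores.
random.seed(3)
log_p = {"a": 0.0, "b": math.log(2), "c": math.log(4)}.__getitem__
q = {"a": 1.0, "b": 1.0, "c": 1.0}
state, hits = "a", {"a": 0, "b": 0, "c": 0}
for _ in range(30000):
    state = data_driven_step(state, log_p, q)
    hits[state] += 1
freq_c = hits["c"] / 30000        # should approach 4/7 ≈ 0.57
```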
DDMCMC Framework
• Moves:
  – Node Creation
  – Node Deletion
  – Change Node Attributes
Transition Kernel
Satisfies the detailed balance equation
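A standard way to write this condition (general MCMC notation, assumed rather than copied from the paper) is that each sub-kernel K_a leaves the posterior invariant:

```latex
p(W \mid I)\, \mathcal{K}_a(W \rightarrow W') \;=\; p(W' \mid I)\, \mathcal{K}_a(W' \rightarrow W)
```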
Full Transition Kernel
Convergence to p(W|I)
Monotonically at a geometric rate
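Geometric convergence is usually stated in total variation distance (standard form, with the constants C and ρ left abstract; the paper's exact bound is not reproduced here):

```latex
\left\| \mu_0 \mathcal{K}^n - p(\cdot \mid I) \right\|_{TV} \;\le\; C \rho^n, \qquad 0 < \rho < 1
```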
Criteria for Designing Transition Kernels
Image Generation Model
• Regions:
  – Constant Intensity
  – Textures
  – Shading
State of parse graph
62 characters
Faces
3 Regions
Uniform
Designed to penalize high model complexity
Shape Prior
Faces
3 Regions
Shape Prior: Text
Intensity Models
Intensity Model: Faces
Discriminative Cues Used
• Adaboost Trained
  – Face Detector
  – Text Detector
• Adaptive Binarization Cues
• Edge Cues
  – Canny at 3 scales
• Shape Affinity Cues
• Region Affinity Cues
Transition Kernel Design
• Remember
Possible Transitions
1. Birth/Death of a Face Node
2. Birth/Death of a Text Node
3. Boundary Evolution
4. Split/Merge Region
5. Change Node Attributes
Face/Text Transitions
Region Transitions
Change Node Attributes
Basic Control Algorithm
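The control algorithm can be sketched as a loop that mixes the move types listed earlier (an illustrative structure with assumed helper signatures, not the paper's exact schedule): a move type is chosen at random, a candidate parse W' is proposed, and it is accepted with the Metropolis-Hastings probability.

```python
import math
import random

def parse_image(init_state, log_posterior, moves, n_iters, rng=random):
    """Sketch of a DDMCMC-style control loop.

    moves: list of (weight, propose) pairs; propose(state, rng) returns
    (candidate, log_q_forward, log_q_backward), e.g. birth/death of a
    face or text node, region split/merge, or boundary evolution.
    """
    state = init_state
    total_w = sum(w for w, _ in moves)
    for _ in range(n_iters):
        # pick a move type in proportion to its weight
        r = rng.random() * total_w
        for w, propose in moves:
            r -= w
            if r <= 0:
                break
        candidate, log_q_fwd, log_q_bwd = propose(state, rng)
        log_alpha = (log_posterior(candidate) - log_posterior(state)
                     + log_q_bwd - log_q_fwd)
        if math.log(rng.random()) < log_alpha:   # MH accept/reject
            state = candidate
    return state

# Toy check: integer state, target centered at k = 5, one symmetric
# random-walk move (forward and backward proposal log-probs are equal).
random.seed(2)
result = parse_image(
    init_state=0,
    log_posterior=lambda k: -0.5 * (k - 5) ** 2,
    moves=[(1.0, lambda s, rng: (s + rng.choice([-1, 1]), 0.0, 0.0))],
    n_iters=5000,
)
```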
Results
Comments
• Well motivated but very complicated approach to THE HOLY GRAIL problem in vision
  – Good global convergence results for inference, with very minor dependence on the initial W.
  – Extensible to a larger set of primitives and pattern types.
• Many details of the algorithm are missing, and it is hard to understand the motivation for some parameter choices.
• Unclear whether the p(W|I) values for configurations with different class compositions are comparable.
• Derek's comment on Adaboost false positives, and the authors' failure to report their exact improvement.
• No quantitative results or comparisons to other algorithms and approaches.
  – It should be possible to design a simple experiment to measure performance on recognition/detection/localization tasks.
Thank You