Part 4: Combined segmentation and recognition
by Rob Fergus (MIT)
Aim
• Given an image and object category, segment the object
Segmentation should (ideally) be
• shaped like the object, e.g. cow-like
• obtained efficiently in an unsupervised manner
• able to handle self-occlusion
[Figure: Object Category Model + Cow Image → Segmentation → Segmented Cow]
Slide from Kumar ‘05
Feature-detector view
Examples of bottom-up segmentation
• Using Normalized Cuts, Shi & Malik, 1997
Borenstein and Ullman, ECCV 2002
Jigsaw approach: Borenstein and Ullman, 2002
Interleaved Object Categorization and Segmentation
Implicit Shape Model - Leibe and Schiele, 2003
[Figure: Interest Points → Matched Codebook Entries → Probabilistic Voting → Voting Space (continuous) → Backprojection of Maxima → Backprojected Hypotheses → Refined Hypotheses (uniform sampling) → Segmentation]
Leibe and Schiele, 2003, 2005
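The voting and backprojection stages can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it replaces the continuous voting space with a coarse grid of bins, and the tuple layout `(x, y, dx, dy, weight)`, the function name `vote_for_centers` and the `bin_size` value are all assumptions made for the example.

```python
from collections import defaultdict

def vote_for_centers(matches, bin_size=10):
    """Accumulate weighted votes for the object centre.

    matches: list of (x, y, dx, dy, weight) tuples (hypothetical layout).
    Each matched codebook entry at (x, y) votes for a centre at
    (x + dx, y + dy). Returns the winning bin, its score, and the indices
    of the matches that voted for it (the ones to backproject).
    """
    votes = defaultdict(float)
    voters = defaultdict(list)
    for idx, (x, y, dx, dy, weight) in enumerate(matches):
        cell = ((x + dx) // bin_size, (y + dy) // bin_size)
        votes[cell] += weight
        voters[cell].append(idx)
    best = max(votes, key=votes.get)
    return best, votes[best], voters[best]

# Three features agree on a centre near (50, 50); one outlier votes elsewhere.
matches = [
    (10, 10, 40, 40, 0.5),
    (90, 10, -40, 42, 0.5),
    (50, 90, 1, -38, 0.5),
    (200, 200, 5, 5, 0.4),
]
best, score, supporters = vote_for_centers(matches)
```

The indices in `supporters` identify the image features that back the winning hypothesis; in a segmentation setting, the pixels covered by those features are the ones that would be marked as figure.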
Random Fields for segmentation
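I = image pixels (observed)
h = foreground/background labels (hidden), one label per pixel
θ = parameters
$$\underbrace{p(h \mid I, \theta)}_{\text{Posterior}} \propto \underbrace{p(h, I \mid \theta)}_{\text{Joint}} = \underbrace{p(I \mid h, \theta)}_{\text{Likelihood}} \; \underbrace{p(h \mid \theta)}_{\text{Prior}}$$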
1. Generative approach models the joint → Markov random field (MRF)
2. Discriminative approach models the posterior directly → Conditional random field (CRF)
[Figure: image plane of pixels I with hidden labels h ∈ {foreground, background}; neighboring sites i and j carry labels h_i and h_j]
Unary potential: φ_i(I | h_i, θ_i)
Pairwise potential (MRF): ψ_ij(h_i, h_j | θ_ij)
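Generative Markov Random Field
$$p(h, I \mid \theta) = p(I \mid h, \theta)\, p(h \mid \theta) = \frac{1}{Z(\theta)} \underbrace{\prod_i \phi_i(I \mid h_i, \theta_i)}_{\text{Likelihood}} \; \underbrace{\prod_{ij} \psi_{ij}(h_i, h_j \mid \theta_{ij})}_{\text{MRF Prior}}$$
Prior has no dependency on I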
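As a rough illustration of the generative factorization, the sketch below scores labellings of a tiny 1-D "image" with a Gaussian likelihood per pixel and a Potts prior on neighboring labels, then finds the best labelling by brute force. The Gaussian means, `sigma`, `beta` and all function names are assumptions made for this example, not values from the slides; practical systems use graph cuts or message passing rather than enumeration.

```python
import itertools
import math

def unary(pixel, label, means=(0.2, 0.8), sigma=0.2):
    """Likelihood term for one pixel: label 0 = background, 1 = foreground."""
    return math.exp(-((pixel - means[label]) ** 2) / (2 * sigma ** 2))

def pairwise(h_i, h_j, beta=1.0):
    """Potts prior on neighbouring labels; note it never looks at the image."""
    return math.exp(beta if h_i == h_j else 0.0)

def unnormalized_joint(pixels, labels):
    """Product of unary and pairwise potentials (the 1/Z factor is omitted)."""
    score = 1.0
    for pixel, label in zip(pixels, labels):
        score *= unary(pixel, label)
    for h_i, h_j in zip(labels, labels[1:]):
        score *= pairwise(h_i, h_j)
    return score

def map_labelling(pixels):
    """Exhaustive search over all 2^n labellings; only viable for tiny n."""
    candidates = itertools.product((0, 1), repeat=len(pixels))
    return max(candidates, key=lambda labels: unnormalized_joint(pixels, labels))

clean = map_labelling([0.1, 0.2, 0.55, 0.9, 0.8])
noisy = map_labelling([0.1, 0.2, 0.6, 0.15, 0.1])
```

In the second call the middle pixel on its own looks more like foreground, but the prior outweighs it and the whole row is labelled background. Because the prior ignores the image, it would smooth across a genuine object boundary in the same way.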
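Conditional Random Field
Lafferty, McCallum and Pereira 2001
$$p(h \mid I, \theta) = \frac{1}{Z(I, \theta)} \underbrace{\prod_i \phi_i(h_i, I \mid \theta_i)}_{\text{Unary}} \; \underbrace{\prod_{ij} \psi_{ij}(h_i, h_j, I \mid \theta_{ij})}_{\text{Pairwise}}$$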
• Dependency on I allows introduction of pairwise terms that make use of image.
• For example, neighboring labels should be similar only if pixel colors are similar → contrast term
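One common way to write such a contrast term is a penalty for differing labels that decays with the squared color difference between the two pixels. The sketch below assumes scalar intensities in [0, 1]; the function name `contrast_penalty` and the values of `lam` and `sigma` are illustrative assumptions, not parameters from any of the cited papers.

```python
import math

def contrast_penalty(h_i, h_j, I_i, I_j, lam=2.0, sigma=0.1):
    """Cost of giving neighbouring pixels i and j different labels.

    The cost is zero when the labels agree. When they differ, it is large
    if the two intensities are similar and close to zero across a strong
    edge, so label changes are encouraged to align with image edges.
    """
    if h_i == h_j:
        return 0.0
    return lam * math.exp(-((I_i - I_j) ** 2) / (2 * sigma ** 2))

same_color_cut = contrast_penalty(0, 1, 0.5, 0.5)   # cut through a flat region
strong_edge_cut = contrast_penalty(0, 1, 0.2, 0.8)  # cut along a strong edge
```

A generative MRF prior cannot express this term, because it would make the prior on labels depend on the observed pixels.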
Discriminative approach
[Figure: image plane of pixels I; neighboring sites i and j with labels h_i and h_j]
e.g. Kumar and Hebert 2003
[Figure: image plane of pixels I; neighboring sites i and j with labels h_i and h_j]
Figure from Kumar et al., CVPR 2005
OBJCUT
Ω (shape parameter)
Kumar, Torr & Zisserman 2005
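$$p(h \mid I, \Omega, \theta) \propto \underbrace{\prod_i \phi^1_i(I \mid h_i, \theta_i)\, \phi^2_i(h_i \mid \Omega)}_{\text{Unary}} \; \underbrace{\prod_{ij} \psi^1_{ij}(h_i, h_j \mid \theta_{ij})\, \psi^2_{ij}(I \mid h_i, h_j, \theta_{ij})}_{\text{Pairwise}}$$
Terms: φ¹ = color likelihood, φ² = distance from Ω, ψ¹ = label smoothness, ψ² = contrast
• Ω is a shape prior on the labels from a Layered Pictorial Structure (LPS) model
• Segmentation by:
- Match LPS model to image (get a number of samples, each with a different pose)
- Marginalize over the samples using a single graph cut [Boykov & Jolly, 2001]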
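The graph-cut step can be illustrated on a toy problem. The sketch below is not the OBJCUT implementation: it is a generic s-t min-cut (Edmonds-Karp max-flow) over a handful of pixels, with made-up unary costs standing in for the color-likelihood and shape-distance terms and a constant pairwise weight standing in for smoothness and contrast. The function name `min_cut_labels` and all numbers are assumptions for illustration.

```python
from collections import deque

def min_cut_labels(fg_cost, bg_cost, edges):
    """Binary labelling that minimises unary costs plus cut pairwise weights.

    fg_cost[i] / bg_cost[i]: cost of labelling pixel i foreground / background.
    edges: dict {(i, j): w}; w is paid when i and j receive different labels.
    Returns a list of labels (1 = foreground, 0 = background).
    """
    n = len(fg_cost)
    source, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # make sure the residual edge exists

    for i in range(n):
        add_edge(source, i, bg_cost[i])  # cut when i ends up background
        add_edge(i, sink, fg_cost[i])    # cut when i ends up foreground
    for (i, j), w in edges.items():
        add_edge(i, j, w)
        add_edge(j, i, w)

    def reachable_from_source():
        """BFS in the residual graph; stops early once the sink is found."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        # Walk back from the sink to recover the augmenting path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Pixels still connected to the source lie on the foreground side.
    return [1 if i in parent else 0 for i in range(n)]

chain = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1}
smoothed = min_cut_labels([5, 5, 1, 5, 5], [0, 0, 2, 0, 0], chain)
split = min_cut_labels([5, 5, 0, 0, 0], [0, 0, 5, 5, 5], chain)
```

In the first call the middle pixel weakly prefers foreground, but two pairwise cuts would cost more than the unary gain, so the whole chain stays background. In the second call the unary evidence is strong and a single cut separates the two regions.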
OBJCUT: Shape prior Ω - Layered Pictorial Structures (LPS)
• Generative model
• Composition of parts + spatial layout
Layer 2
Layer 1
Parts in Layer 2 can occlude parts in Layer 1
Spatial Layout (Pairwise Configuration)
Kumar, et al. 2004, 2005
OBJCUT: Results
Using LPS Model for Cow
In the absence of a clear boundary between object and background
[Figure: Image | Segmentation]
Levin & Weiss [ECCV 2006]
$$E(h; I) = \sum_i \lvert h_i - h_{F_i, I} \rvert + \sum_{ij} w(i, j)\, \lvert h_i - h_j \rvert$$
First term: consistency with fragments segmentation
Second term: segmentation alignment with image edges
Resulting min-cut segmentation
[Lepetit et al. CVPR 2005]
• Decision forest classifier
• Features are differences of pixel intensities
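A minimal sketch of this kind of classifier follows. Each tree node compares the difference between two pixel intensities in a patch against a threshold, and the forest takes a majority vote over trees. The dictionary-based tree layout, the function names and the example thresholds are assumptions made for illustration; they are not taken from Lepetit et al.

```python
from collections import Counter

def pixel_difference(patch, a, b):
    """Feature: intensity at offset a minus intensity at offset b, as (row, col)."""
    return patch[a[0]][a[1]] - patch[b[0]][b[1]]

def classify(patch, tree):
    """Walk a binary tree of pixel-difference tests until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        value = pixel_difference(patch, node["a"], node["b"])
        node = node["left"] if value < node["threshold"] else node["right"]
    return node

def forest_vote(patch, forest):
    """Majority vote over the leaf labels returned by each tree."""
    votes = Counter(classify(patch, tree) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical 2x2 patch with a dark left column and a bright right column.
patch = [[10, 200],
         [10, 200]]
forest = [
    {"a": (0, 0), "b": (0, 1), "threshold": 0, "left": "edge", "right": "flat"},
    {"a": (1, 1), "b": (1, 0), "threshold": 50, "left": "flat", "right": "edge"},
    {"a": (0, 0), "b": (1, 0), "threshold": 5, "left": "flat", "right": "edge"},
]
label = forest_vote(patch, forest)
```

Pixel differences are cheap to evaluate and unaffected by a constant brightness offset, which is one reason they suit per-pixel classification.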
Classifier
Winn and Shotton 2006
Layout Consistent Random Field
Layout consistency
[Figure: grid of part labels (7,2) to (9,4). Neighboring pixels: next to a pixel labelled (p,q), a pixel "?" is layout consistent if its label is (p,q), (p,q+1), (p-1,q+1) or (p+1,q+1)]
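The layout-consistency test in the figure can be written as a small predicate. This sketch covers only the one neighbor relation recoverable from the slide (the neighbor whose label may advance by one in q); the function name `layout_consistent` is an assumption, and the full model defines analogous rules for the other neighbor directions.

```python
def layout_consistent(label, neighbor_label):
    """Check one neighbour direction of the layout-consistency rule.

    label: part label (p, q) of the current pixel.
    neighbor_label: part label of the adjacent pixel in the direction of
    increasing q. It is consistent if it repeats (p, q) or moves one step
    in q while p changes by at most one.
    """
    p, q = label
    allowed = {(p, q), (p - 1, q + 1), (p, q + 1), (p + 1, q + 1)}
    return neighbor_label in allowed

same_part = layout_consistent((8, 3), (8, 3))
next_part = layout_consistent((8, 3), (9, 4))
wrong_order = layout_consistent((8, 3), (8, 2))
```

Pairs of labels that fail this test are penalized by the pairwise potential, which is what pushes neighboring part labels toward a coherent object layout.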
Winn and Shotton 2006
Layout Consistent Random Field
Layout consistency
Part detector
Winn and Shotton 2006
Stability of part labelling
Part color key
Object-Specific Figure-Ground Segregation
Stella X. Yu and Jianbo Shi, 2002
Image parsing: Tu, Zhu and Yuille 2003
Image parsing: Tu, Zhu and Yuille 2003
Segment out all the cars
….
fused tree model for cars
Unseen image
Training images
Segmented Cars
Segmentation Trees
Overview
Multiscale Seg.
Todorovic and Ahuja, CVPR 2006
Slide from T. Wu
LOCUS model
Deformation field D
Position & size T
Class shape π
Class edge sprite μ_o, σ_o
Edge image e
Image
Object appearance λ1
Background appearance λ0
Mask m
Shared between images
Different for each image
Kannan, Jojic and Frey 2004
Winn and Jojic, 2005
In this section: brief paper reviews
• Jigsaw approach: Borenstein & Ullman, 2001, 2002
• Concurrent recognition and segmentation: Yu and Shi, 2002
• Image parsing: Tu, Zhu & Yuille 2003
• Interleaved segmentation: Leibe & Schiele, 2004, 2005
• OBJCUT: Kumar, Torr, Zisserman 2005
• LOCUS: Winn and Jojic, 2005
• LayoutCRF: Winn and Shotton, 2006
• Levin and Weiss, 2006
• Todorovic and Ahuja, 2006
Summary
• Strength
– Explains every pixel of the image
– Useful for image editing, layering, etc.
• Issues
– Invariance issues
• (especially) scale, view-point variations
– Inference difficulties
Conditional Random Fields for Segmentation
• Segmentation map x
• Image I
Low-level pairwise term High-level local term
Pixel-wise similarity
Object-Specific Figure-Ground Segregation
Some segmentation/detection results
Yu and Shi, 2002
• Multiscale Conditional Random Fields for Image Labeling: Xuming He, Richard S. Zemel, Miguel Á. Carreira-Perpiñán
• Conditional Random Fields for Object Recognition: Ariadna Quattoni, Michael Collins, Trevor Darrell
OBJCUT
Probability of labelling in addition has
• Unary potential which depends on distance from Θ (shape parameter)
[Figure: object category specific MRF over the image plane; pixels D with labels m; neighboring sites x and y with labels m_x and m_y; shape parameter Θ]
Unary potential: Φ_x(m_x | Θ)
Kumar, et al. 2004, 2005
Localization using features
Levin and Weiss 2006
Levin and Weiss, ECCV 2006
Results: horses
Results: horses
Cows: Results
• Segmentations from interest points
Single-frame recognition - No temporal continuity used!
Leibe and Schiele, 2003, 2005
Examples of low-level image segmentation
• Normalized Cuts, Shi & Malik, 1997
Borenstein & Ullman, ECCV 2002
LayoutCRF