Visual Grouping and Recognition
Jitendra Malik
University of California at Berkeley
Collaborators
• Grouping: Jianbo Shi (CMU), Serge Belongie, Thomas Leung (Compaq CRL)
• Ecological Statistics: Charless Fowlkes, David Martin, Xiaofeng Ren
• Recognition: Serge Belongie, Jan Puzicha
From images to objects
Labeled sets: tiger, grass etc
What enables us to parse a scene?
– Low level cues
  • Color/texture
  • Contours
  • Motion
– Mid level cues
  • T-junctions
  • Convexity
– High level cues
  • Familiar object
  • Familiar motion
Grouping factors
But is segmentation a meaningful problem?
• Difficult to define formally, but humans are remarkably consistent…
Human Segmentations (1)
Human Segmentations (2)
Consistency
[Figure: three human segmentations of the same image, labeled A, B, and C]
• A and C are refinements of B
• A and C are mutual refinements
• A, B, and C represent the same percept
• Attention accounts for differences
Perceptual organization forms a tree:
[Figure: segmentation tree rooted at Image, with children BG, L-bird, and R-bird; lower-level labels: grass, bush, far, head, eye, beak, body]
Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
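One way to make the tree criterion concrete, offered as a sketch and not as the measure used in the talk: two segmentations can come from a single tree exactly when every pair of segments, one from each, is either nested or disjoint. The function names and the six-pixel toy labelings below are hypothetical.

```python
from collections import defaultdict

def segments(labels):
    """Group pixel indices by segment label."""
    groups = defaultdict(set)
    for pixel, label in enumerate(labels):
        groups[label].add(pixel)
    return list(groups.values())

def consistent(labels_a, labels_b):
    """True if every pair of segments is nested or disjoint.

    Partial overlap (two segments share pixels but neither contains
    the other) cannot be explained by a single segmentation tree.
    """
    for seg_a in segments(labels_a):
        for seg_b in segments(labels_b):
            overlap = seg_a & seg_b
            if overlap and overlap != seg_a and overlap != seg_b:
                return False
    return True

# Toy image of six pixels (hypothetical labels).
B = ["bg", "bg", "bird", "bird", "bird", "bird"]
A = ["grass", "bush", "bird", "bird", "bird", "bird"]  # refines the background
C = ["bg", "bg", "head", "head", "body", "body"]       # refines the bird
X = ["p", "q", "q", "q", "r", "r"]                     # "q" straddles bg and bird

print(consistent(A, B), consistent(A, C), consistent(B, X))
```

Here A and C each refine B in a different place, so all three fit one tree, while X cuts across the background/bird boundary and is inconsistent with B.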
Ecological Statistics of image segmentation
• Measure the conditional probability distribution of various grouping cues in human segmented images (Brunswik 1950)
• Design algorithm for incorporating multiple cues for image segmentation
Proximity
Similarity of brightness (cf. Coughlan & Yuille, Geman & Jedynak)
Convexity
Region Area
• Compare to Alvarez, Gousseau, Morel
$y = K x^{-\alpha}$, $\alpha = 1.008$
Lengths of curves
[Plot: log(count) versus log(contour length)]
Image Segmentation as Graph Partitioning
Build a weighted graph G=(V,E) from image
V: image pixels
E: connections between pairs of nearby pixels
$W_{ij}$: probability that i & j belong to the same region
Partition graph so that similarity within group is large and similarity between groups is small -- Normalized Cuts [Shi&Malik 97]
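A minimal sketch of the graph construction for a tiny grayscale image. The affinity used here (a Gaussian in brightness difference times a Gaussian in pixel distance, cut off beyond a small radius) and the parameters `radius`, `sigma_i`, and `sigma_x` are illustrative assumptions, not the cue model from the talk.

```python
import numpy as np

def build_affinity(image, radius=1.5, sigma_i=0.1, sigma_x=2.0):
    """Return a dense (n, n) affinity matrix W for a small grayscale image.

    W[i, j] is large when pixels i and j are close together and similar in
    brightness, and zero when they are more than `radius` pixels apart.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    values = image.ravel().astype(float)

    # Pairwise squared spatial distance and squared brightness difference.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    f2 = (values[:, None] - values[None, :]) ** 2

    W = np.exp(-f2 / sigma_i**2) * np.exp(-d2 / sigma_x**2)
    W[d2 > radius**2] = 0.0   # E: only nearby pixels are connected
    np.fill_diagonal(W, 0.0)  # no self-edges
    return W

# Toy image: dark left half, bright right half (pixels indexed row by row).
image = np.zeros((4, 6))
image[:, 3:] = 1.0
W = build_affinity(image)
print(W.shape, W[0, 1], W[2, 3])
```

Pixels 0 and 1 lie in the same dark half and get a strong edge; pixels 2 and 3 straddle the brightness boundary and get a weight close to zero. A dense matrix is only workable for toy sizes; a real image would need a sparse representation.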
Normalized Cut, A measure of dissimilarity
• Minimum cut is not appropriate since it favors cutting small pieces.
• Normalized Cut, Ncut:

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$$
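To see why the normalization matters, the sketch below evaluates both criteria on a hypothetical 7-node graph: two triangles joined by one weak edge, plus a stray node attached by an even weaker edge. The graph and the helper names are made up for illustration.

```python
import numpy as np

def cut(W, a, b):
    """Total weight of edges going from node set a to node set b."""
    return W[np.ix_(a, b)].sum()

def ncut(W, a):
    """Normalized cut between node set a and its complement."""
    n = W.shape[0]
    a = np.asarray(a)
    b = np.setdiff1d(np.arange(n), a)
    v = np.arange(n)
    c = cut(W, a, b)
    # assoc(A, V) is the total connection from A to every node in the graph.
    return c / cut(W, a, v) + c / cut(W, b, v)

# Hypothetical graph: triangles {0,1,2} and {3,4,5}, weak bridge 2-3,
# and stray node 6 hanging off node 5.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
         (2, 3): 0.3, (5, 6): 0.2}
W = np.zeros((7, 7))
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w

stray, cluster = [6], [0, 1, 2]
print("cut :", cut(W, stray, [0, 1, 2, 3, 4, 5]), cut(W, cluster, [3, 4, 5, 6]))
print("Ncut:", ncut(W, stray), ncut(W, cluster))
```

Plain cut prefers to slice off the stray node (0.2 against 0.3), whereas Ncut strongly prefers the balanced split between the two triangles (about 0.09 against about 1.02).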
Normalized Cut As Generalized Eigenvalue problem
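• With $x_i = 1$ for $i \in A$ and $x_i = -1$ for $i \in B$:

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$$

$$= \frac{1}{4}\left[\frac{(\mathbf{1}+x)^T (D-W)(\mathbf{1}+x)}{k\,\mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1}-x)^T (D-W)(\mathbf{1}-x)}{(1-k)\,\mathbf{1}^T D \mathbf{1}}\right]; \quad k = \frac{\sum_{x_i > 0} D(i,i)}{\sum_i D(i,i)}$$

$$= \ldots$$

• After simplification, we get

$$\mathrm{Ncut}(A,B) = \frac{y^T (D-W)\, y}{y^T D\, y}, \quad \text{with } y_i \in \{1, -b\},\ y^T D \mathbf{1} = 0.$$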
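If $y$ is relaxed to take real values, the minimizer of this ratio is the eigenvector with the second smallest eigenvalue of the generalized problem $(D-W)y = \lambda D y$, where $D(i,i) = \sum_j W(i,j)$. The sketch below solves it with NumPy through the equivalent symmetric form; splitting the eigenvector at zero is a simplifying assumption, and the 6-node graph is hypothetical.

```python
import numpy as np

def ncut_bipartition(W):
    """Split a graph in two using the second generalized eigenvector.

    Assumes W is symmetric and every node has positive degree.
    """
    d = W.sum(axis=1)                      # D(i, i) = sum_j W(i, j)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # (D - W) y = lambda D y  is equivalent to
    # D^{-1/2} (D - W) D^{-1/2} z = lambda z,  with  z = D^{1/2} y.
    L_sym = (np.diag(d) - W) * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    y = vecs[:, 1] * d_inv_sqrt            # second smallest eigenvalue
    return y > 0, vals[1]

# Hypothetical graph: two triangles joined by one weak edge.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
         (2, 3): 0.1}
W = np.zeros((6, 6))
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w

labels, lam2 = ncut_bipartition(W)
print(labels, lam2)
```

The sign of an eigenvector is arbitrary, so which triangle is labeled True can differ between runs or libraries; only the grouping is meaningful. The eigenvalue is a lower bound on the Ncut value of any discrete partition.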
Cue-Integration for Image Segmentation
[Malik, Belongie, Shi, Leung 1999]
On image segmentation..
• Humans are quite consistent, so model the goal as emulating their behavior.
• Ecological statistics of grouping cues can be learned from image data.
• We now have a generic image segmentation algorithm (code available) which can be applied for MPEG-4/7 compression and object recognition.
Framework for Recognition
(1) Segmentation: Pixels → Segments
(2) Association: Segments → Regions
(3) Matching: Regions → Prototypes
Over-segmentation necessary; Under-segmentation fatal
Enumerate: the number of size-k regions in an image with n segments is approximately $4^k n / k$
~10 views/object. Matching tolerant to pose/illumination changes, intra-category variation, error in previous steps
Matching regions to views
• GOAL: obtain small misclassification error using few views
• Matching allowing deformations of prototype views makes this possible
Matching with original and deformed prototypes
[Figure columns: Prototype, Test, Error]
Deforming Biological Shapes
• D’Arcy Thompson: On Growth and Form, 1917
– studied transformations between shapes of organisms
• Find correspondences between points on shape
• Estimate transformation
• Measure similarity
[Figure: model and target shapes]
Finding correspondences between shapes
• Each shape is represented by a set of sample points
• Each sample point has a descriptor – the shape context
• Define cost $C_{ij}$ for matching point i on the first shape with point j on the second shape.
• Solve for correspondence as optimum assignment.
Shape Context
Count the number of points inside each bin, e.g.:
Count = 4
Count = 10
...
Compact representation of distribution of points relative to each point
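A rough sketch of the bin counting, assuming a log-polar grid centred on each sample point. The bin layout (5 radial by 12 angular bins), the radius limits, and the normalization by mean pairwise distance are assumptions made for this example; the slide does not state them.

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Return an (n, n_r * n_theta) array of bin counts, one row per point.

    Row i counts how many of the other points fall into each log-polar
    bin of a grid centred on point i.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = p_j - p_i
    dist = np.hypot(diff[..., 0], diff[..., 1])
    # Divide by the mean pairwise distance so the descriptor ignores scale.
    dist = dist / dist[~np.eye(n, dtype=bool)].mean()
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_edges[0] = 0.0                          # innermost bin starts at the point
    r_bin = np.digitize(dist, r_edges) - 1    # equals n_r when beyond r_outer
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    hist = np.zeros((n, n_r, n_theta), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and r_bin[i, j] < n_r:
                hist[i, r_bin[i, j], t_bin[i, j]] += 1
    return hist.reshape(n, -1)

# Example: 20 points sampled on a circle.
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
hist = shape_contexts(circle)
print(hist.shape, hist[0].sum())
```

Log-spaced radial bins make the descriptor more sensitive to nearby points than to distant ones. Points farther than `r_outer` (in normalized units) are simply not counted.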
Comparing Shape Contexts
Compute matching costs using Chi Squared Test:
Recover correspondences by solving linear assignment problem with costs Cij
[Jonker & Volgenant 1987]
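The two steps can be sketched as follows, assuming the usual chi-squared histogram distance $C_{ij} = \frac{1}{2}\sum_k \frac{(g_i(k) - h_j(k))^2}{g_i(k) + h_j(k)}$ between histogram $g_i$ on the first shape and $h_j$ on the second. The assignment is solved with SciPy's `linear_sum_assignment`, which recent SciPy documentation describes as a modified Jonker-Volgenant algorithm; the histograms below are made-up toy data, and `eps` is an added guard against empty bins.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_cost(hists_a, hists_b, eps=1e-12):
    """C[i, j] = 0.5 * sum_k (g_i(k) - h_j(k))^2 / (g_i(k) + h_j(k))."""
    g = hists_a[:, None, :].astype(float)
    h = hists_b[None, :, :].astype(float)
    return 0.5 * ((g - h) ** 2 / (g + h + eps)).sum(axis=2)

def match(hists_a, hists_b):
    """Return, for each point on shape A, its matched point on shape B."""
    C = chi2_cost(hists_a, hists_b)
    rows, cols = linear_sum_assignment(C)   # minimizes the total cost
    return cols, C[rows, cols].sum()

# Toy data: shape B carries the same three histograms in a different order.
hists_a = np.array([[4, 0, 1],
                    [0, 3, 2],
                    [1, 1, 5]])
hists_b = hists_a[[2, 0, 1]]

cols, total = match(hists_a, hists_b)
print(cols, total)
```

Because the second set is a reordering of the first, the optimal assignment undoes the permutation at essentially zero cost.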
Matching Example
[Figure: model and target shapes]
Synthetic Test Results
[Figure panels: Fish, deformation + noise; Fish, deformation + outliers. Methods compared: ICP, Shape Context, RPM]
Measuring Shape Similarity
• Image appearance around matched points
– color or gray-level window
– orientation
• Shape context differences at matched points
• Bending Energy
COIL Object Database
Editing: Prototypes
• Human Shape Perception
• Computational Needs for K-NN
Prototype Selection: COIL-20
MNIST Handwritten Digits
Handwritten Digit Recognition
• MNIST 60 000:
– linear: 12.0%
– 40 PCA + quad: 3.3%
– 1000 RBF + linear: 3.6%
– K-NN: 5%
– K-NN (deskewed): 2.4%
– K-NN (tangent dist.): 1.1%
– SVM: 1.1%
– LeNet 5: 0.95%
• MNIST 600 000 (distortions):
– LeNet 5: 0.8%
– SVM: 0.8%
– Boosted LeNet 4: 0.7%
• MNIST 20 000:
– K-NN, Shape Context matching: 0.63%
Results: Digit Recognition
1-NN classifier using: Shape context + 0.3 * bending + 1.6 * image appearance
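The weighted distance and the 1-NN rule can be sketched as below. Only the weights 0.3 and 1.6 come from the slide; the function names and the per-prototype cost values are hypothetical.

```python
def combined_distance(shape_context_cost, bending_energy, appearance_cost):
    """Shape context + 0.3 * bending + 1.6 * image appearance."""
    return shape_context_cost + 0.3 * bending_energy + 1.6 * appearance_cost

def classify_1nn(costs_per_prototype):
    """Return the label of the prototype with the smallest combined distance.

    costs_per_prototype: list of (label, (shape_context, bending, appearance)).
    """
    label, _ = min(costs_per_prototype,
                   key=lambda item: combined_distance(*item[1]))
    return label

# Hypothetical costs of one query digit against three stored prototypes.
costs = [("3", (0.20, 0.50, 0.10)),
         ("8", (0.25, 0.10, 0.05)),
         ("5", (0.40, 0.30, 0.30))]
print(classify_1nn(costs))
```

In this made-up case the shape context term alone would favour "3", but the bending and appearance terms tip the decision to "8".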
Results: Digit Recognition (Detail)
Trademark Similarity
Future work..
• Indexing based on color/texture/shape features before correspondence matching
• Integrate segmentation and recognition
Computing cost on a Pentium PC
• Segmentation: 2 minutes /image (200x100)
• Matching : 0.2 sec / match (100 points)
Given a $10^4$ speedup..
• 5K object categories/sec
• Humans can recognize 10K -100K objects, so we could be in the ballpark of human level vision by 2020.
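One reading of the arithmetic that reproduces the 5K figure, assuming the speedup is a factor of 10,000, the 0.2 sec matching time quoted for the Pentium PC, and roughly 10 stored views per object as in the recognition framework. The variable names are illustrative.

```python
seconds_per_match = 0.2     # matching cost on the Pentium PC (100 points)
speedup = 10**4             # assumed hardware/algorithm speedup
views_per_object = 10       # roughly 10 stored views per object

matches_per_second = speedup / seconds_per_match
categories_per_second = matches_per_second / views_per_object

print(f"{matches_per_second:.0f} matches/sec, "
      f"{categories_per_second:.0f} object categories/sec")
```

At that rate, checking 10K to 100K objects would take on the order of 2 to 20 seconds of matching time per query, before any indexing.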