Visual Grouping and Recognition Jitendra Malik University of California at Berkeley Jitendra Malik...

Visual Grouping and Recognition

Jitendra Malik

University of California at Berkeley

Collaborators

• Grouping: Jianbo Shi (CMU), Serge Belongie, Thomas Leung (Compaq CRL)

• Ecological Statistics: Charless Fowlkes, David Martin, Xiaofeng Ren

• Recognition: Serge Belongie, Jan Puzicha

From images to objects

Labeled sets: tiger, grass, etc.

What enables us to parse a scene?

– Low level cues
  • Color/texture
  • Contours
  • Motion

– Mid level cues
  • T-junctions
  • Convexity

– High level cues
  • Familiar Object
  • Familiar Motion

Grouping factors

But is segmentation a meaningful problem?

• Difficult to define formally, but humans are remarkably consistent…

Human Segmentations (1)

Human Segmentations (2)

Consistency

• A, C are refinements of B

• A, C are mutual refinements

• A, B, C represent the same percept

• Attention accounts for differences

Perceptual organization forms a tree:

[Figure: segmentation tree with node labels BG, L-bird, R-bird, grass, bush, far, head, eye, beak, body]

Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
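One way to make the consistency definition concrete is sketched below. It assumes segmentations are given as 2D label maps and tests whether every pair of segments is either nested or disjoint, which is the condition under which both could be cuts through one tree. The function names and the toy label maps are illustrative, not taken from the talk.

```python
from itertools import product


def segments(label_map):
    """Group pixel coordinates by label and return a list of frozensets."""
    groups = {}
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            groups.setdefault(label, set()).add((r, c))
    return [frozenset(pixels) for pixels in groups.values()]


def is_refinement(fine, coarse):
    """True if every segment of `fine` lies inside a single segment of `coarse`."""
    coarse_segs = segments(coarse)
    return all(any(f <= c for c in coarse_segs) for f in segments(fine))


def are_consistent(seg_a, seg_b):
    """True if every pair of segments is nested or disjoint.

    Assumption: this nested-or-disjoint test stands in for
    "explained by the same segmentation tree".
    """
    for a, b in product(segments(seg_a), segments(seg_b)):
        if a & b and not (a <= b or b <= a):
            return False
    return True


# Toy 2x4 label maps (hypothetical data).
coarse = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
fine = [[0, 0, 1, 2],
        [0, 0, 1, 2]]
shifted = [[0, 0, 0, 1],
           [0, 0, 0, 1]]

print(is_refinement(fine, coarse))      # True
print(are_consistent(fine, coarse))     # True
print(are_consistent(shifted, coarse))  # False: a boundary moved, segments overlap partially
```

Mutual refinement, where each segmentation is finer than the other in a different part of the image, also passes `are_consistent`, since no pair of segments overlaps partially.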

Ecological Statistics of image segmentation

• Measure the conditional probability distribution of various grouping cues in human segmented images (Brunswik 1950)

• Design algorithm for incorporating multiple cues for image segmentation

Proximity

Similarity of brightness (cf. Coughlan & Yuille, Geman & Jedynak)

Convexity

Region Area

• Compare to Alvarez, Gousseau, Morel

$y = K x^{-\alpha}$, with $\alpha = 1.008$
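A power-law fit of this kind is commonly obtained by fitting a straight line in log-log coordinates. The sketch below assumes the fitted curve has the form y = K * x**(-alpha), with alpha our name for the exponent reported as 1.008, and recovers K and alpha from synthetic data. The constant K = 50 and the sample areas are made up for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y); returns (K, alpha) for y = K * x**(-alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated with the exponent from the slide.
areas = [10, 20, 50, 100, 200, 500, 1000]
counts = [50.0 * a ** -1.008 for a in areas]

K, alpha = fit_power_law(areas, counts)
print(round(K, 3), round(alpha, 3))
```

On real region-area histograms the points scatter around the line, so the recovered exponent depends on the binning and on the range of areas included in the fit.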

Lengths of curves

[Figure: distribution of contour lengths; x-axis: log(contour length)]

Image Segmentation as Graph Partitioning

Build a weighted graph G=(V,E) from image

V: image pixels

E: connections between pairs of nearby pixels

W_ij: probability that i & j belong to the same region

Partition graph so that similarity within group is large and similarity between groups is small -- Normalized Cuts [Shi&Malik 97]
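As a rough illustration of the graph construction, the sketch below builds edge weights for a tiny grayscale image. The Gaussian form of the weight (brightness similarity times spatial proximity) and the parameters `radius`, `sigma_i` and `sigma_x` are assumptions chosen for the example; the talk combines several learned cues rather than this single formula.

```python
import math


def build_affinity(image, radius=1.5, sigma_i=0.1, sigma_x=2.0):
    """Return {(i, j): weight} for pixel pairs closer than `radius`.

    Pixels are indexed row-major. The Gaussian form and all three
    parameters are illustrative assumptions.
    """
    rows, cols = len(image), len(image[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    W = {}
    for i, (r1, c1) in enumerate(coords):
        for j, (r2, c2) in enumerate(coords):
            if i == j:
                continue
            dist2 = (r1 - r2) ** 2 + (c1 - c2) ** 2
            if dist2 > radius ** 2:
                continue  # E only connects nearby pixels
            diff2 = (image[r1][c1] - image[r2][c2]) ** 2
            W[(i, j)] = math.exp(-diff2 / sigma_i ** 2) * math.exp(-dist2 / sigma_x ** 2)
    return W


# Toy 2x4 image: dark left half, bright right half (made-up intensities in [0, 1]).
image = [[0.1, 0.1, 0.9, 0.9],
         [0.1, 0.1, 0.9, 0.9]]
W = build_affinity(image)

print(W[(0, 1)] > W[(1, 2)])  # True: neighbours in the same region get the larger weight
```

Pixels 0 and 1 share a brightness, so their weight is high; pixels 1 and 2 straddle the brightness edge, so their weight is close to zero. Pairs farther apart than `radius` have no edge at all.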

Normalized Cut, a measure of dissimilarity

• Minimum cut is not appropriate since it favors cutting small pieces.

• Normalized Cut, Ncut:

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$$
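The difference between minimum cut and normalized cut can be checked on a small example. The sketch below assumes the usual definitions, cut(A,B) as the total weight crossing between A and B and assoc(A,V) as the total weight from A to all nodes, and evaluates both measures on a hypothetical five-node chain with one weakly attached node.

```python
def cut(W, A, B):
    """Total weight of edges running between A and B."""
    return sum(W[i][j] for i in A for j in B)


def assoc(W, A, V):
    """Total weight of edges from nodes in A to all nodes in V."""
    return sum(W[i][j] for i in A for j in V)


def ncut(W, A, B):
    """Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    V = sorted(set(A) | set(B))
    c = cut(W, A, B)
    return c / assoc(W, A, V) + c / assoc(W, B, V)


# Hypothetical graph: strong pairs (0,1) and (2,3), a weak link 1-2,
# and node 4 hanging off node 3 by an even weaker edge.
W = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.0],
]
outlier_split = ([4], [0, 1, 2, 3])
balanced_split = ([0, 1], [2, 3, 4])

for A, B in (outlier_split, balanced_split):
    print(A, B, round(cut(W, A, B), 3), round(ncut(W, A, B), 3))
```

Cutting off node 4 alone has the smaller cut (0.1 against 0.2), which is the small-piece bias of minimum cut. Its Ncut is about 1.02, while the balanced split scores about 0.17, so Ncut prefers the balanced split.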

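Normalized Cut As Generalized Eigenvalue problem

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)} = \frac{(1+x)^T (D-W)(1+x)}{4k\,1^T D 1} + \frac{(1-x)^T (D-W)(1-x)}{4(1-k)\,1^T D 1}$$

where $x_i = 1$ if $i \in A$ and $-1$ otherwise, $D$ is the diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and $k = \sum_{x_i > 0} D_{ii} / \sum_i D_{ii}$.

• after simplification, we get

$$\mathrm{Ncut}(A,B) = \frac{y^T (D-W)\, y}{y^T D\, y}, \quad \text{with } y_i \in \{1, -b\},\; y^T D 1 = 0.$$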
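If y is relaxed to take real values, the quotient above is minimized by the eigenvector with the second-smallest eigenvalue of (D - W) y = lambda D y. The sketch below, which assumes NumPy is available, solves that system through the equivalent symmetric problem and thresholds the eigenvector at zero. The threshold choice and the toy weight matrix are illustrative assumptions.

```python
import numpy as np


def ncut_bipartition(W):
    """Split a graph in two by thresholding the second generalized eigenvector.

    Solves (D - W) y = lambda * D y through the equivalent symmetric problem
    D^{-1/2} (D - W) D^{-1/2} z = lambda z, with y = D^{-1/2} z.
    Thresholding at zero is a simplification chosen for this sketch.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # diagonal of D
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * (np.diag(d) - W) * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]            # second-smallest eigenvector
    return y > 0, y, eigvals[1]


# Hypothetical chain graph: strong pairs (0,1) and (2,3), weak link 1-2,
# node 4 attached to node 3 by a very weak edge.
W = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.0],
]
labels, y, lam = ncut_bipartition(W)
print(labels, round(float(lam), 3))
```

For this toy graph the sign pattern separates nodes 0 and 1 from nodes 2, 3 and 4; which side is labelled True depends on the arbitrary sign of the eigenvector. The eigenvalue is a lower bound on the Ncut value of any discrete split.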

Cue-Integration for Image Segmentation

[Malik, Belongie, Shi, Leung 1999]

On image segmentation

• Humans are quite consistent, so model the goal as emulating their behavior.

• Ecological statistics of grouping cues can be learned from image data.

• We now have a generic image segmentation algorithm (code available) which can be applied for MPEG-4/7 compression and object recognition.

Framework for Recognition

(1) Segmentation: Pixels → Segments
Over-segmentation necessary; under-segmentation fatal

(2) Association: Segments → Regions
Enumerate: # of size k regions in image with n segments is ~(4**k)*n/k

(3) Matching: Regions → Prototypes
~10 views/object. Matching tolerant to pose/illumination changes, intra-category variation, error in previous steps
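To get a feel for the enumeration estimate in step (2), the short sketch below simply evaluates (4**k)*n/k for a few region sizes. It reads "size k" as the number of segments merged into a region, which is our interpretation, and n = 100 segments is a made-up figure.

```python
def approx_region_count(n_segments, k):
    """Evaluate the estimate (4**k) * n / k for regions built from k segments."""
    return (4 ** k) * n_segments / k


# n = 100 segments is a made-up figure for illustration.
for k in (1, 2, 3, 5):
    print(k, approx_region_count(100, k))
```

The count grows roughly fourfold with each extra segment, so candidate regions stay enumerable only while k is small.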

Matching regions to views

• GOAL: obtain small misclassification error using few views

• Matching allowing deformations of prototype views makes this possible

Matching with original and deformed prototypes

[Figure: columns labeled Prototype, Test, Error]

Deforming Biological Shapes

• D’Arcy Thompson: On Growth and Form, 1917
– studied transformations between shapes of organisms

• Find correspondences between points on shape

• Estimate transformation

• Measure similarity

[Figure: model and target shapes]

Finding correspondences between shapes

• Each shape is represented by a set of sample points

• Each sample point has a descriptor – the shape context

• Define cost Cij for matching point i on the first shape with point j on the second shape.

• Solve for correspondence as optimum assignment.

Shape Context

Count the number of points inside each bin, e.g.:

Count = 4

Count = 10

Compact representation of distribution of points relative to each point
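The bin counting can be sketched as a log-polar histogram centred on one sample point. The number of radial and angular bins, the radius limits, and the normalisation by mean pairwise distance are illustrative choices here, not values read off the slides.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points as seen from points[index].

    Bin counts, radii, and normalisation by mean pairwise distance are
    illustrative choices, not values taken from the slides.
    """
    px, py = points[index]
    pairs = [(a, b) for i, a in enumerate(points) for b in points[i + 1:]]
    mean_dist = sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in pairs) / len(pairs)
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    hist = [[0] * n_theta for _ in range(n_r)]
    for j, (qx, qy) in enumerate(points):
        if j == index:
            continue
        r = math.hypot(qx - px, qy - py) / mean_dist
        r = min(max(r, r_min), r_max)  # clamp very near/far points into the end rings
        r_bin = min(int((math.log(r) - log_lo) / (log_hi - log_lo) * n_r), n_r - 1)
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin][t_bin] += 1
    return hist


# Eight sample points on the outline of a unit square (hypothetical shape).
square = [(0, 0), (0.5, 0), (1, 0), (1, 0.5), (1, 1), (0.5, 1), (0, 1), (0, 0.5)]
hist = shape_context(square, 0)
print(sum(map(sum, hist)))  # 7: every other point falls in exactly one bin
```

Log-spaced radial bins make the descriptor more sensitive to nearby points than to distant ones, and dividing by the mean distance makes it insensitive to uniform scaling.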

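Comparing Shape Contexts

Compute matching costs using the chi-squared test statistic:

$$C_{ij} = \frac{1}{2} \sum_{k} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$$

where $h_i$ and $h_j$ are the normalized shape context histograms at point i on the first shape and point j on the second.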

Recover correspondences by solving linear assignment problem with costs Cij

[Jonker & Volgenant 1987]
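The two steps can be sketched as follows, assuming the standard chi-squared distance between normalised histograms for the cost. The assignment step here is a brute-force search over permutations, used as a stand-in for the Jonker-Volgenant solver cited above; it gives the same optimum but is only practical for a handful of points. The histograms are made up.

```python
from itertools import permutations


def chi2_cost(g, h):
    """0.5 * sum((g_k - h_k)^2 / (g_k + h_k)) over bins, skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(g, h) if a + b > 0)


def best_assignment(cost):
    """Exhaustive minimum-cost assignment; only viable for a handful of points."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)


# Made-up normalised histograms for three points on each shape.
shape_a = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [1.0, 0.0, 0.0]]
shape_b = [[0.9, 0.1, 0.0], [0.6, 0.4, 0.0], [0.0, 0.4, 0.6]]

C = [[chi2_cost(g, h) for h in shape_b] for g in shape_a]
print(best_assignment(C))  # [1, 2, 0]: point i on shape A matches point result[i] on shape B
```

For realistic point counts (around 100 per shape) a polynomial-time assignment solver is needed, since the number of permutations grows factorially.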

Matching Example

[Figure: model and target shapes]

Synthetic Test Results

[Figure: two panels, Fish - deformation + noise and Fish - deformation + outliers, comparing ICP, Shape Context and RPM]

Measuring Shape Similarity

• Image appearance around matched points
– color or gray-level window
– orientation

• Shape context differences at matched points

• Bending Energy

COIL Object Database

Editing: Prototypes

• Human Shape Perception

• Computational Needs for K-NN

Prototype Selection: COIL-20

MNIST Handwritten Digits

Handwritten Digit Recognition

• MNIST 60 000:
– linear: 12.0%
– 40 PCA + quad: 3.3%
– 1000 RBF + linear: 3.6%
– K-NN: 5%
– K-NN (deskewed): 2.4%
– K-NN (tangent dist.): 1.1%
– SVM: 1.1%
– LeNet 5: 0.95%

• MNIST 600 000 (distortions):
– LeNet 5: 0.8%
– SVM: 0.8%
– Boosted LeNet 4: 0.7%

• MNIST 20 000:
– K-NN, Shape Context matching: 0.63%

Results: Digit Recognition

1-NN classifier using: Shape context + 0.3 * bending + 1.6 * image appearance
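The weighted distance and the nearest-neighbour decision can be sketched as below. The weights 0.3 and 1.6 come from the slide; the function names, the tuple layout, and the per-prototype cost values are hypothetical.

```python
def combined_distance(shape_context_cost, bending_energy, appearance_cost):
    """Weighted sum from the slide: shape context + 0.3 * bending + 1.6 * image appearance."""
    return shape_context_cost + 0.3 * bending_energy + 1.6 * appearance_cost


def classify_1nn(query_costs):
    """query_costs: list of (label, shape context, bending, appearance), one per prototype."""
    label, *_ = min(query_costs, key=lambda t: combined_distance(*t[1:]))
    return label


# Hypothetical per-prototype costs for a single test digit.
costs = [
    ("3", 0.20, 0.50, 0.10),  # 0.20 + 0.15 + 0.16 = 0.51
    ("8", 0.25, 0.10, 0.05),  # 0.25 + 0.03 + 0.08 = 0.36
    ("5", 0.40, 0.30, 0.30),  # 0.40 + 0.09 + 0.48 = 0.97
]
print(classify_1nn(costs))  # 8
```

Note that the prototype with the lowest shape context cost alone ("3") is not the one chosen once bending energy and appearance are added in.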

Results: Digit Recognition (Detail)

Trademark Similarity

Future work

• Indexing based on color/texture/shape features before correspondence matching

• Integrate segmentation and recognition

Computing cost on a Pentium PC

• Segmentation: 2 minutes/image (200x100)

• Matching: 0.2 sec/match (100 points)

Given a 10^4 speedup

• 5K object categories/sec

• Humans can recognize 10K -100K objects, so we could be in the ballpark of human level vision by 2020.
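The 5K figure appears to follow from numbers quoted earlier in the talk. The sketch below reads the speedup as 10^4 and assumes one category costs about ten sequential matches (one per stored view), ignoring segmentation time; that reading is ours, not stated on the slides.

```python
match_seconds = 0.2      # from the slides: 0.2 sec per match (100 points)
views_per_object = 10    # from the slides: ~10 views/object
speedup = 10 ** 4        # speedup factor named in the slide title

seconds_per_category = match_seconds * views_per_object  # 2 seconds at the quoted speed
categories_per_second = speedup / seconds_per_category
print(categories_per_second)  # 5000.0
```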
