Computer Vision Group University of California Berkeley Visual Grouping and Object Recognition Jitendra Malik * U.C. Berkeley * with S. Belongie, C. Fowlkes, T. Leung, D. Martin, G. Mori, J. Puzicha, J.Shi, X. Ren


Visual Grouping and Object Recognition

Jitendra Malik*

U.C. Berkeley

* with S. Belongie, C. Fowlkes, T. Leung, D. Martin, G. Mori, J. Puzicha, J. Shi, X. Ren


From images/video to objects

Labeled sets: tiger, grass etc


Consistency

[Figure: three segmentations of the same image, labeled A, B and C]

• A, C are refinements of B
• A, C are mutual refinements
• A, B, C represent the same percept

• Attention accounts for differences

Perceptual organization forms a tree:

[Figure: segmentation tree. Image → BG, L-bird, R-bird; BG → grass, bush, far; L-bird and R-bird → head, eye, beak, body]

Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
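One way to read the consistency criterion is that every region of one segmentation is either disjoint from, contained in, or a superset of every region of the other, so that all regions fit into one tree. The sketch below checks that condition on flat label maps. The function names and the one-label-per-pixel encoding are assumptions made for illustration, not something the slides specify.

```python
from collections import defaultdict

def regions(labels):
    """Group pixel indices by segment label."""
    groups = defaultdict(set)
    for pixel, label in enumerate(labels):
        groups[label].add(pixel)
    return list(groups.values())

def consistent(seg_a, seg_b):
    """True if any two regions are disjoint or nested (i.e. fit in one tree)."""
    for ra in regions(seg_a):
        for rb in regions(seg_b):
            overlap = ra & rb
            # Partial overlap cannot be explained by a single tree.
            if overlap and overlap != ra and overlap != rb:
                return False
    return True

# Toy 8-pixel "image": B is coarse, A refines its left half, C its right half.
seg_b = [0, 0, 0, 0, 1, 1, 1, 1]
seg_a = [0, 0, 1, 1, 2, 2, 2, 2]
seg_c = [0, 0, 0, 0, 1, 1, 2, 2]
seg_d = [0, 0, 0, 1, 1, 1, 1, 1]  # boundary shifted by one pixel
```

Here `seg_a` and `seg_c` play the role of mutual refinements, while `seg_d` cuts across a region of `seg_a` and is therefore inconsistent with it under this reading.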


Outline

• Finding boundaries

• Recognizing objects

• Recognizing actions


Finding boundaries: Is texture a problem or a solution?

[Figure: image and its orientation energy]


Statistically optimal contour detection

• Use humans to segment a large collection of natural images.

• Train a classifier for the contour/non-contour classification using orientation energy and texture gradient as features.
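The training step can be sketched as follows. The slide does not name the classifier, so logistic regression is used here purely as an assumption, and the feature values and labels are made-up stand-ins for (orientation energy, texture gradient) pairs at human-marked contour and non-contour pixels.

```python
import math

def predict(weights, bias, x):
    """Probability that feature vector x lies on a contour."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            for d, xi in enumerate(x):
                grad_w[d] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical (orientation energy, texture gradient) features per pixel.
features = [(0.9, 0.8), (0.8, 0.7), (0.7, 0.9),   # contour pixels
            (0.1, 0.2), (0.2, 0.1), (0.3, 0.2)]   # non-contour pixels
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(features, labels)
```

In practice the features would come from the filter responses described on the following slides and the labels from the human segmentations.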


Orientation Energy

• Gaussian 2nd derivative and its Hilbert pair

• Can detect combination of bar and edge features [Perona & Malik 90]

$$OE = (I * f_{even})^2 + (I * f_{odd})^2$$
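A one-dimensional sketch of orientation energy follows. It samples a Gaussian second derivative as the even filter and builds the odd partner with a discrete, circular Hilbert transform over the filter window, which only approximates the continuous Hilbert pair. The filter width, `sigma`, and all function names are illustrative assumptions.

```python
import cmath
import math

def gaussian_second_derivative(half_width=8, sigma=2.0):
    """Even filter: sampled Gaussian second derivative, shifted to zero mean."""
    taps = [(x * x / sigma**4 - 1.0 / sigma**2) * math.exp(-x * x / (2 * sigma**2))
            for x in range(-half_width, half_width + 1)]
    mean = sum(taps) / len(taps)
    return [t - mean for t in taps]

def hilbert_pair(even):
    """Odd filter: discrete Hilbert transform computed with a plain DFT."""
    n = len(even)
    spectrum = [sum(even[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    shifted = []
    for k, value in enumerate(spectrum):
        if k == 0 or 2 * k == n:
            shifted.append(0j)            # DC and Nyquist are zeroed
        elif 2 * k < n:
            shifted.append(-1j * value)   # positive frequencies
        else:
            shifted.append(1j * value)    # negative frequencies
    return [(sum(shifted[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def correlate(signal, kernel):
    """Correlation with zero padding outside the signal."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = n + k - half
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out

def orientation_energy(signal, f_even, f_odd):
    """OE = (I * f_even)^2 + (I * f_odd)^2 at every sample."""
    even_resp = correlate(signal, f_even)
    odd_resp = correlate(signal, f_odd)
    return [e * e + o * o for e, o in zip(even_resp, odd_resp)]

f_even = gaussian_second_derivative()
f_odd = hilbert_pair(f_even)
step = [0.0] * 32 + [1.0] * 32        # edge between samples 31 and 32
bar = [0.0] * 64
bar[32] = 1.0                          # one-sample-wide bar
oe_step = orientation_energy(step, f_even, f_odd)
oe_bar = orientation_energy(bar, f_even, f_odd)
```

Both the step and the bar produce non-zero energy at the feature and none in the flat region, which is the property the slide attributes to combining an even and an odd filter.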


Texture gradient = Chi square distance between texton histograms in half disks across edge

$$\chi^2(h_i, h_j) = \frac{1}{2} \sum_{m=1}^{K} \frac{[h_i(m) - h_j(m)]^2}{h_i(m) + h_j(m)}$$

[Figure: texton histograms for half-disks i, j, k; example chi-square distances 0.1 and 0.8]
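The chi-square distance between two texton histograms can be computed directly. This is a minimal sketch: the function name is hypothetical, and skipping bins that are empty in both histograms is an assumption made to avoid division by zero.

```python
def chi_square(h_i, h_j):
    """0.5 * sum over bins of (h_i - h_j)^2 / (h_i + h_j)."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:          # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

# Toy normalized texton histograms from two half-disks.
same_texture = chi_square([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
different_texture = chi_square([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

For normalized histograms the value ranges from 0 (identical) to 1 (no shared bins), so a large value across the two half-disks suggests a texture boundary.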


ROC curve for local boundary detection


Outline

• Finding boundaries

• Recognizing objects

• Recognizing actions


Biological Shape

• D’Arcy Thompson: On Growth and Form, 1917
  – studied transformations between shapes of organisms


Deformable Templates: Related Work

• Fischler & Elschlager (1973)

• Grenander et al. (1991)

• von der Malsburg (1993)


Matching Framework

• Find correspondences between points on shape

• Fast pruning

• Estimate transformation & measure similarity

[Figure: model and target shapes]


Comparing Pointsets


Shape Context

Count the number of points inside each bin, e.g.:

[Figure: example bins with Count = 4, Count = 10, ...]

Compact representation of distribution of points relative to each point
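The bin counting can be sketched as a histogram over the positions of all other points relative to one reference point. The slide does not give the bin layout, so the log-polar grid, the bin counts, the radius limits, the normalization by mean distance, and the clamping of out-of-range radii are all assumptions chosen for illustration.

```python
import math

def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Histogram (n_r x n_theta) of the other points, relative to points[index]."""
    px, py = points[index]
    offsets = [(x - px, y - py) for k, (x, y) in enumerate(points) if k != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_dist = sum(dists) / len(dists)      # crude scale normalization
    log_span = math.log(r_max / r_min)
    hist = [[0] * n_theta for _ in range(n_r)]
    for (dx, dy), d in zip(offsets, dists):
        r = min(max(d / mean_dist, r_min), r_max)   # clamp into the bin range
        r_bin = min(int(n_r * math.log(r / r_min) / log_span), n_r - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = min(int(n_theta * theta / (2 * math.pi)), n_theta - 1)
        hist[r_bin][t_bin] += 1
    return hist

# Five sample points: a center and four neighbors at unit distance.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
hist = shape_context(points, 0)
```

Each of the `len(points) - 1` other points lands in exactly one bin, so the histogram for a point always sums to the number of remaining points.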


Shape Context


Comparing Shape Contexts

Compute matching costs using Chi Squared distance:

Recover correspondences by solving linear assignment problem with costs Cij

[Jonker & Volgenant 1987]
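The two steps on this slide, building the cost matrix C_ij and solving the assignment, can be sketched as below. The solver here is a brute-force search over permutations, swapped in only to keep the example short; it is not the Jonker & Volgenant algorithm cited above and is practical only for a handful of points. The function names are hypothetical.

```python
from itertools import permutations

def chi_square(h_i, h_j):
    """0.5 * sum over bins of (h_i - h_j)^2 / (h_i + h_j), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h_i, h_j) if a + b > 0)

def cost_matrix(model_scs, target_scs):
    """C[i][j] = chi-square cost of matching model point i to target point j."""
    return [[chi_square(hi, hj) for hj in target_scs] for hi in model_scs]

def best_assignment(costs):
    """Exhaustive linear assignment: returns (total cost, target index per row)."""
    n = len(costs)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(costs[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, list(best_perm)

# Flattened toy shape contexts; the target lists the same points in another order.
model = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
target = [[0, 4, 0], [0, 0, 4], [4, 0, 0]]
total, match = best_assignment(cost_matrix(model, target))
```

A dedicated assignment solver would replace `best_assignment` for realistic point counts; the cost matrix construction stays the same.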


Matching Framework

• Find correspondences between points on shape

• Fast pruning

• Estimate transformation & measure similarity

[Figure: model and target shapes]


Fast pruning

• Find best match for the shape context at only a few random points and add up cost

$$\mathrm{dist}(S_{query}, S_i) = \sum_{j=1}^{r} \chi^2(SC_{query}^{j}, SC_i^{*})$$

$$SC_i^{*} = \arg\min_{u} \chi^2(SC_{query}^{j}, SC_i^{u})$$
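Fast pruning can be sketched as follows: pick a few random points on the query shape, find the best-matching shape context on each stored shape, and add up those best costs to rank the candidates. The sample size `r`, the fixed random seed, the dictionary layout of the database, and the function names are assumptions for illustration.

```python
import random

def chi_square(h_i, h_j):
    """0.5 * sum over bins of (h_i - h_j)^2 / (h_i + h_j), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h_i, h_j) if a + b > 0)

def pruning_distance(query_scs, candidate_scs, r=3, seed=0):
    """Sum of best-match costs for r randomly chosen query shape contexts."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(query_scs)), min(r, len(query_scs)))
    return sum(min(chi_square(query_scs[j], sc) for sc in candidate_scs)
               for j in sample)

def shortlist(query_scs, database, keep=1, r=3):
    """Keep the `keep` stored shapes with the smallest pruning distance."""
    ranked = sorted(database, key=lambda name: pruning_distance(query_scs, database[name], r))
    return ranked[:keep]

# Toy flattened shape contexts for a query and two stored shapes.
query = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
database = {
    "same": [[0, 0, 4], [4, 0, 0], [0, 4, 0]],
    "other": [[1, 1, 2], [2, 1, 1], [1, 2, 1]],
}
candidates = shortlist(query, database)
```

Only the shortlisted shapes would then go through the more expensive correspondence and transformation steps.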


Matching Framework

• Find correspondences between points on shape

• Fast pruning

• Estimate transformation & measure similarity

[Figure: model and target shapes]


Thin Plate Spline Model

• 2D counterpart to cubic spline
• Minimizes bending energy
• Solve by inverting linear system
• Can be regularized when data is inexact

Duchon (1977), Meinguet (1979), Wahba (1991)
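The formulas for this slide did not survive extraction. For reference, the standard thin plate spline formulation is sketched below; the symbols (control points p_i, target values v_i, weights w, affine coefficients a, regularization parameter λ) are conventional notation and are assumed here rather than taken from the slide.

```latex
% Bending energy minimized by the thin plate spline f(x, y)
I_f = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x^2}\right)^2
      + 2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2
      + \left(\frac{\partial^2 f}{\partial y^2}\right)^2 \right] dx\,dy

% Interpolant through control points p_i = (x_i, y_i) with target values v_i
f(x, y) = a_1 + a_x x + a_y y
        + \sum_{i=1}^{n} w_i\, U\!\left(\lVert (x_i, y_i) - (x, y) \rVert\right),
\qquad U(r) = r^2 \log r^2

% Linear system for w and a: K_{ij} = U(\lVert p_i - p_j \rVert),
% row i of P is (1, x_i, y_i); \lambda = 0 interpolates exactly,
% \lambda > 0 regularizes when the data is inexact
\begin{bmatrix} K + \lambda I & P \\ P^{\top} & 0 \end{bmatrix}
\begin{bmatrix} w \\ a \end{bmatrix}
=
\begin{bmatrix} v \\ 0 \end{bmatrix}
```

For a 2D warp, one such system is solved for each output coordinate, with the same matrix on the left-hand side.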


Matching Example

[Figure: model and target shapes]


Outlier Test Example


Object Recognition Experiments

• Handwritten digits

• COIL 3D objects (Nayar-Murase)

• Human body configurations

• Trademarks


Terms in Similarity Score

• Shape Context difference

• Local Image appearance difference
  – orientation
  – gray-level correlation in Gaussian window
  – … (many more possible)

• Bending energy


Handwritten Digit Recognition

• MNIST 60 000:
  – linear: 12.0%
  – 40 PCA + quad: 3.3%
  – 1000 RBF + linear: 3.6%
  – K-NN: 5%
  – K-NN (deskewed): 2.4%
  – K-NN (tangent dist.): 1.1%
  – SVM: 1.1%
  – LeNet 5: 0.95%

• MNIST 600 000 (distortions):
  – LeNet 5: 0.8%
  – SVM: 0.8%
  – Boosted LeNet 4: 0.7%

• MNIST 20 000:
  – K-NN, Shape Context matching: 0.63%
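The last entry combines a nearest-neighbor classifier with a shape-based distance. The sketch below shows K-NN with a pluggable distance function; the chi-square distance on toy histograms stands in for the full shape context matching score, and the value of `k`, the data, and the function names are assumptions.

```python
from collections import Counter

def chi_square(h_i, h_j):
    """0.5 * sum over bins of (h_i - h_j)^2 / (h_i + h_j), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h_i, h_j) if a + b > 0)

def knn_classify(query, exemplars, labels, distance, k=3):
    """Majority label among the k exemplars closest to the query."""
    ranked = sorted(range(len(exemplars)), key=lambda i: distance(query, exemplars[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy stored exemplars (hypothetical descriptors) with class labels.
exemplars = [[4, 0], [3, 1], [0, 4], [1, 3]]
labels = ["a", "a", "b", "b"]
predicted = knn_classify([4, 1], exemplars, labels, chi_square)
```

Any dissimilarity can be passed as `distance`, which is what lets a matching-based shape score replace a plain pixel distance.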


COIL Object Database


Prototypes Selected for 2 Categories

Details in Belongie, Malik & Puzicha (NIPS 2000)


Error vs. Number of Views


Human body configurations


Deformable Matching

• Kinematic chain-based deformation model

• Use iterations of correspondence and deformation

• Keypoints on exemplars are deformed to locations on query image


Results


Trademark Similarity


Recognizing objects in scenes


Outline

• Finding boundaries

• Recognizing objects

• Recognizing actions


Examples of Actions

• Movement and posture change
  – run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …

• Object manipulation
  – pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …

• Conversational gesture
  – point, …

• Sign Language


Key cues for action recognition

• “Morpho-kinesics” of action (shape and movement of the body)

• Identity of the object(s)

• Activity context


Image/Video → Stick figure → Action

• Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom)
  – 2D joint positions
  – 3D joint positions
  – Joint angles

• Complete representation

• Evidence that it is effectively computable


Tracking by Repeated Finding


Achievable goals in 3 years

• Reasonable competence at object recognition at crude category level (~1000)

• Detection/Tracking of humans as kinematic chains, assuming adequate resolution.

• Recognition of ~10-100 actions and compositions thereof.