41
Visual Recognition: Prospects for Image & Video Analytics Jitendra Malik University of California at Berkeley

Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Visual Recognition:

Prospects for Image & Video Analytics

Jitendra Malik

University of California at Berkeley

Page 2: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Computer Vision Group

UC Berkeley

Classification & Segmentation

Tiger Grass

Water

Sand

outdoor

wildlife

Tiger

tail

eye

legs

head

back

shadow

mouth

Page 3: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

PASCAL Visual Object Challenge

Page 4: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

We want to locate the object

Orig. Image Segmentation Orig. Image Segmentation

Page 5: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Computer Vision Group

UC Berkeley

Fifty years of computer vision 1963-2013

• 1960s: Beginnings in artificial intelligence, image processing and pattern recognition

• 1970s: Foundational work on image formation: Horn, Koenderink, Longuet-Higgins …

• 1980s: Vision as applied mathematics: geometry, multi-scale analysis, probabilistic modeling, control theory, optimization

• 1990s: Geometric analysis largely completed, vision meets graphics, statistical learning approaches resurface

• 2000s: Significant advances in visual recognition, range of practical applications

Page 6: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group
Page 7: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Computer Vision Group University of California

Berkeley

Handwritten digit recognition

(MNIST,USPS)

• LeCun’s Convolutional Neural Networks variations (0.8%, 0.6% and 0.4% on MNIST)

• Tangent Distance(Simard, LeCun & Denker: 2.5% on USPS)

• Randomized Decision Trees (Amit, Geman & Wilder, 0.8%)

• K-NN based Shape context/TPS matching (Belongie, Malik & Puzicha: 0.6% on MNIST)

Page 8: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Computer Vision Group

UC Berkeley

EZ-Gimpy Results (Mori & Malik, 2003)

• 171 of 192 images correctly identified: 92 %

horse

smile

canvas

spade

join

here

Page 9: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Results on various images submitted to the CMU on-line face detector http://www.vasc.ri.cmu.edu/cgi-bin/demos/findface.cgi

Face Detection Carnegie Mellon University

Page 10: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Multiscale sliding window

Ask this question repeatedly, varying position, scale, category…

Paradigm introduced by Rowley, Baluja & Kanade 96 for face detection Viola & Jones 01, Dalal & Triggs 05, Felzenszwalb, McAllester, Ramanan 08

Page 11: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Computer Vision Group

UC Berkeley

Caltech-101 [Fei-Fei et al. 04]

• 102 classes, 31-300 images/class

Page 12: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Caltech 101 classification results

(even better by combining cues..)

Page 13: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

PASCAL Visual Object Challenge

Page 14: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group
Page 15: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group
Page 16: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group
Page 17: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Trying to find stick figures is

hard (and unnecessary!)

Generalized Cylinders (Binford, Marr & Nishihara)

Geons (Biederman)

Page 18: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Person detection is challenging

Page 19: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Can we build upon the success

of faces and pedestrians?

Pattern matching

Capture patterns that are common and visually characteristic

Are these the only two common and characteristic patterns?

Rowley, Baluja, Kanade CVPR96

Viola and Jones, IJCV01

Dalal and Triggs, CVPR05

Page 20: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Poselets

We will train classifiers for these different visual patterns

Page 21: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Segmenting people

[Bourdev, Maji, Brox and Malik, ECCV10]

Best person segmentation on PASCAL 2010 dataset

Page 22: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

“A person with

long pants”

“A man with short

hair and long sleeves”

“A man with short

hair, glasses, short

sleeves and shorts”

“A woman with long hair,

glasses and long pants”(??)

Describing people

Page 23: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Male or female?

Page 24: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Gender classifier per poselet is

much easier to train

Page 25: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Is male

Page 26: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Has long hair

Page 27: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Wears long pants

Page 28: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Wears a hat

Page 29: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Wears long sleeves

Page 30: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Wears glasses

Page 31: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Actions in still images …

have characteristic :

pose and appearance

interaction with objects and agents

Page 32: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Some discriminative poselets

Page 33: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Problem: Human Activity Recognition

12/20/2011 SMARTS Annual Review 2011

Mean Performance: 59.7% correct

Approach: Learn pose and appearance specific for an action

Page 34: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Results : Top Confusions

Page 35: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Low-Cost Automated Tuberculosis Diagnostics Using Mobile Microscopy Jeannette Chang1, Pablo Arbelaez1, Neil Switz2, Clay Reber2, Asa Tapley2,3

Lucian Davis3, Adithya Cattamanchi3, Daniel Fletcher2, and Jitendra Malik1

Department of Electrical Engineering and Computer Science, UC Berkeley1

Department of Bioengineering, UC Berkeley2

Medical School and San Francisco General Hospital, UC San Francisco3

Page 36: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Why Tuberculosis? Mortality and Treatment1

TB is second leading cause of deaths from infectious disease worldwide (after HIV/AIDS)

Highly effective antibiotic treatment

Current Diagnostics

Technicians screen microscopic images of sputum smears manually

Other methods include culture and PCR

Tremendous potential benefit from automated processing or classification

1. http://www.who.int/tb/publications/global_report/2011/gtbr11_full.pdf

2. http://www.thehindu.com/health/rx/article21138.ece

Examples of sputum smears with TB bacteria. Brightfield (top) and fluorescent (bottom) microscopy.2

Page 37: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Candidate TB Blob

Identification

Feature Extraction

Linear SVM Classification

Input image from

CellScope device

Each candidate TB object is

characterized by a feature vector

containing 8 Hu moment invariants

and 14 geometric/photometric

descriptors.

𝑥1⋮𝑥𝑁

=

Candidate TB objects sorted by their

SVM output confidence scores in

decreasing order (row-wise, from top

to bottom)

Bar plot with SVM output confidence

scores corresponding to sorted candidate

TB objects

Array of

candidate

TB objects

0 20 40 60 80 1000

0.2

0.4

0.6

0.8

1

Candidate Object Index

SV

M O

utp

ut

Confidence S

core

Sample subset of candidate TB objects with

corresponding confidence scores

0.918 0.885 0.389 0.374 0.008 0.002 0.001 0.000

Page 38: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Sample positive objects Sample negative objects

Sample Candidate Objects

Page 39: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Patches in Descending Order of Confidence

Page 40: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

0.000 0.200 0.400 0.600 0.800 1.000

MinIntensity

φ1

Perimeter

FilledArea

Area

φ5

φ7

φ6

φ4

φ11

MaxIntensity

EulerNumber

Extent

φ3

ConvexArea

Solidity

MajorAxisLength

EquivDiameter

φ2

MinorAxisLength

Eccentricity

MeanIntensity

Features listed in descending order of

normalized SVM weights.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Sensitivity (Recall)

Specific

ity o

r P

recis

ion

SS/RP curves, Avg spec: 0.96744, Avg prec: 0.95389 cost exp: 7

train-SS

train-RP

test-SS

test-RP

Object-Level Performance (Uganda Data)

Page 41: Jitendra Malik University of California at Berkeleylcm.csa.iisc.ernet.in/indous-symposium/slides/jitendra...Jitendra Malik University of California at Berkeley Computer Vision Group

Slide-Level Performance (Uganda Data)