
Object Detection with Discriminatively Trained Part Based Models

Pedro F. Felzenszwalb
Department of Computer Science, University of Chicago

Joint work with David McAllester, Deva Ramanan, and Ross Girshick

PASCAL Challenge

• ~10,000 images, with ~25,000 target objects

- Objects from 20 categories (person, car, bicycle, cow, table...)

- Objects are annotated with labeled bounding boxes

Why is it hard?

• Objects in rich categories exhibit significant variability

- Photometric variation

- Viewpoint variation

- Intra-class variability

- Cars come in a variety of shapes (sedan, minivan, etc.)

- People wear different clothes and take different poses

We need rich object models, but this leads to difficult matching and training problems.

Starting point: sliding window classifiers

Feature vector x = [... , ... , ... , ... ]

• Detect objects by testing each subwindow

- Reduces object detection to binary classification

- Dalal & Triggs: HOG features + linear SVM classifier

- Previous state of the art for detecting people

Histogram of Gradient (HOG) features

• Image is partitioned into 8x8 pixel blocks

• In each block we compute a histogram of gradient orientations

- Invariant to changes in lighting, small deformations, etc.

• Compute features at different resolutions (pyramid)
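A minimal NumPy sketch of these per-block orientation histograms (the function name and defaults are assumptions, not the authors' code; the full descriptor also applies contrast normalization and soft binning):

```python
import numpy as np

def hog_cells(img, cell=8, nbins=9):
    """Toy HOG sketch: one orientation histogram per cell x cell block.
    Gradients are binned by unsigned orientation, weighted by magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned angle in [0, pi)
    bins = np.minimum((ori / np.pi * nbins).astype(int), nbins - 1)
    h, w = img.shape
    H = np.zeros((h // cell, w // cell, nbins))
    for i in range(H.shape[0]):
        for j in range(H.shape[1]):
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            # magnitude-weighted histogram of orientations in this block
            H[i, j] = np.bincount(b, weights=m, minlength=nbins)
    return H
```

Computing such maps at several scales of an image pyramid yields the HOG feature pyramid used by the filters below.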

HOG Filters

• Array of weights for features in a subwindow of the HOG pyramid

• Score is the dot product of the filter and the feature vector

• Score of filter F at position p in HOG pyramid H is F ⋅ φ(p, H), where φ(p, H) is the concatenation of the HOG features from the subwindow specified by p

[Figure: image pyramid and HOG feature pyramid, with filter F placed at position p]

Dalal & Triggs: HOG + linear SVMs

[Figure: typical form of a model, scoring φ(p, H) and φ(q, H)]
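Concretely, scoring a filter densely over one pyramid level is a cross-correlation. A brute-force sketch, assuming the cells × cells × bins layout from the previous sketch (real implementations vectorize this):

```python
import numpy as np

def filter_response(F, H):
    """Brute-force sketch of F . phi(p, H) at every position p of one
    pyramid level: the dot product of filter F (fh x fw x nbins) with
    each fh x fw subwindow of the feature map H (h x w x nbins)."""
    fh, fw, _ = F.shape
    h, w, _ = H.shape
    R = np.zeros((h - fh + 1, w - fw + 1))
    for y in range(R.shape[0]):
        for x in range(R.shape[1]):
            R[y, x] = np.sum(F * H[y:y+fh, x:x+fw])
    return R
```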

There is much more background than objects. Start with random negatives and repeat: 1) train a model, 2) harvest false positives to define “hard negatives”.

Overview of our models

• Mixture of deformable part models

• Each component has global template + deformable parts

• Fully trained from bounding boxes alone

2-component bicycle model

[Figure: root filters (coarse resolution), part filters (finer resolution), deformation models]

Each component has a root filter F0 and n part models (Fi, vi, di)
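One way to picture this structure in code, as a sketch with assumed class and field names rather than the authors' implementation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class PartModel:
    """One part model (Fi, vi, di)."""
    F: np.ndarray                 # fh x fw x nbins part filter
    v: Tuple[int, int]            # anchor: ideal (dy, dx) offset from the root
    d: Tuple[float, float]        # quadratic deformation cost weights

@dataclass
class Component:
    """One mixture component: root filter F0 plus n part models."""
    F0: np.ndarray
    parts: List[PartModel] = field(default_factory=list)

# a mixture model (e.g. the 2-component bicycle model) is a list of Components
```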

Object hypothesis

[Figure: image pyramid and HOG feature pyramid]

Multiscale model captures features at two resolutions

Score is the sum of filter scores minus deformation costs

p0: location of root; p1, ..., pn: locations of parts

z = (p0, ..., pn)

Score of a hypothesis

score(p0, ..., pn) = Σ_{i=0..n} Fi ⋅ φ(H, pi) − Σ_{i=1..n} di ⋅ (dxi², dyi²)

The filter scores are the “data term”; the deformation costs are the “spatial prior”. Concatenating the filters and deformation parameters into β, and the corresponding HOG features and part displacement features into ψ(H, z), the score becomes

score(z) = β ⋅ ψ(H, z)
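A sketch of this score for a single hypothesis (the helper's name and argument layout are assumptions; the factor of 2 reflects parts living at twice the root's resolution, as in the pyramid figure below):

```python
import numpy as np

def hypothesis_score(filters, defs, anchors, feats, z):
    """Sum of filter scores minus deformation costs for z = (p0, ..., pn).
    Assumed layout: feats[i] is the feature map at part i's pyramid level;
    anchors[i] = vi is part i's ideal offset from the root; defs[i] = di."""
    total = 0.0
    for i, (y, x) in enumerate(z):                              # data term
        fh, fw, _ = filters[i].shape
        total += np.sum(filters[i] * feats[i][y:y+fh, x:x+fw])  # Fi . phi(H, pi)
    for i in range(1, len(z)):                                  # spatial prior
        dy = z[i][0] - (2 * z[0][0] + anchors[i][0])
        dx = z[i][1] - (2 * z[0][1] + anchors[i][1])
        total -= defs[i][0] * dx * dx + defs[i][1] * dy * dy    # di . (dx^2, dy^2)
    return total
```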

Matching

• Define an overall score for each root location

- Based on best placement of parts

• High scoring root locations define detections

- “sliding window approach”

• Efficient computation: dynamic programming + generalized distance transforms (max-convolution)

score(p0) = max_{p1, ..., pn} score(p0, ..., pn)

Response of filter F in the l-th pyramid level (a cross-correlation):

Rl(x, y) = F ⋅ φ(H, (x, y, l))

Transformed response:

Dl(x, y) = max_{dx, dy} [ Rl(x + dx, y + dy) − di ⋅ (dx², dy²) ]

This max-convolution is a generalized distance transform, computed in linear time; it generalizes spreading, local max, etc.

[Figure: head filter, its response on an input image, and the transformed response]
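A sketch of the 1-D case of this linear-time transform, following the lower-envelope method of Felzenszwalb & Huttenlocher (the function name and the reduction via −f are implementation choices, not taken from the slides). The quadratic cost is separable, so the 2-D transform runs this over rows and then columns:

```python
import numpy as np

def max_dt1d(f, a):
    """D[p] = max_q ( f[q] - a * (p - q)**2 ), a > 0, in O(n):
    solve the standard min distance transform on g = -f and negate."""
    g = -np.asarray(f, dtype=float)
    n = len(g)
    d = np.empty(n)
    v = np.zeros(n, dtype=int)            # positions of parabolas in the envelope
    z = np.full(n + 1, np.inf)            # boundaries between parabolas
    z[0] = -np.inf
    k = 0
    for q in range(1, n):
        while True:
            p = v[k]
            # intersection of the parabola rooted at q with the one rooted at p
            s = ((g[q] + a*q*q) - (g[p] + a*p*p)) / (2.0 * a * (q - p))
            if s <= z[k]:
                k -= 1                    # parabola p drops out of the envelope
            else:
                break
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = np.inf
    k = 0
    for p in range(n):                    # read the lower envelope back out
        while z[k + 1] < p:
            k += 1
        d[p] = g[v[k]] + a * (p - v[k])**2
    return -d
```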

[Figure: the matching pipeline. The feature map, and the feature map at twice the resolution, are cross-correlated with the model's root and part filters; the part responses are passed through the distance transform, and the transformed responses are summed with the root filter response to give a combined score of root locations. Color encodes filter response values.]

Matching results

(after non-maximum suppression)

~1 second to search all scales

Training

• Training data consists of images with labeled bounding boxes

• Need to learn the model structure, the filters, and the deformation costs

Latent SVM (MI-SVM)

Classifiers that score an example x using

f_β(x) = max_{z ∈ Z(x)} β ⋅ Φ(x, z)

β are model parameters; z are latent values

Training data: D = (⟨x1, y1⟩, ..., ⟨xn, yn⟩), with yi ∈ {−1, 1}

We would like to find β such that yi ⋅ f_β(xi) > 0

Minimize

L_D(β) = (1/2) ||β||² + C Σ_{i=1..n} max(0, 1 − yi ⋅ f_β(xi))

Semi-convexity

• Maximum of convex functions is convex

• f_β(x) = max_{z ∈ Z(x)} β ⋅ Φ(x, z) is a maximum of linear functions of β, so it is convex in β

• The hinge loss max(0, 1 − yi ⋅ f_β(xi)) is convex for negative examples: with yi = −1 it equals max(0, 1 + f_β(xi)), a maximum of convex functions

• Hence L_D(β) = (1/2) ||β||² + C Σ_{i=1..n} max(0, 1 − yi ⋅ f_β(xi)) is convex if the latent values for the positive examples are fixed

Latent SVM training

• L_D(β) is convex if we fix z for the positive examples

• Optimization, by coordinate descent: initialize β and iterate:

- Pick the best z for each positive example

- Optimize β via gradient descent with data-mining
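A toy sketch of this coordinate-descent scheme (names, hyper-parameters, and the plain subgradient step are illustrative assumptions; the real system mines hard negatives rather than looping over all of them):

```python
import numpy as np

def train_latent_svm(pos, neg, Z, Phi, dim, C=0.01, rounds=5, epochs=50, lr=1e-2):
    """Alternate (1) fixing the best latent value z for each positive and
    (2) subgradient descent on the resulting convex objective L_D(beta).
    Z(x) enumerates latent values; Phi(x, z) returns a feature vector."""
    beta = np.zeros(dim)
    for _ in range(rounds):
        # (1) pick the best z for each positive under the current model
        zpos = [max(Z(x), key=lambda z: beta @ Phi(x, z)) for x in pos]
        # (2) subgradient descent on the now-convex objective
        for _ in range(epochs):
            grad = beta.copy()                     # from (1/2) * ||beta||^2
            for x, z in zip(pos, zpos):
                if beta @ Phi(x, z) < 1:           # active hinge, y = +1
                    grad -= C * Phi(x, z)
            for x in neg:
                z = max(Z(x), key=lambda z: beta @ Phi(x, z))
                if beta @ Phi(x, z) > -1:          # active hinge, y = -1
                    grad += C * Phi(x, z)
            beta -= lr * grad
    return beta
```

Neither step can increase L_D(β): re-picking z only raises the positives' scores (lowering their hinge loss), and the descent step reduces the convex objective, so the alternation settles at a local optimum.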

Training Models

• Reduce to a Latent SVM training problem

• Positive example specifies some z should have high score

• Bounding box defines range of root locations

- Parts can be anywhere

- This defines Z(x)

Background

• Negative example specifies no z should have high score

• One negative example per root location in a background image

- Huge number of negative examples

- Consistent with requiring low false-positive rate

Training algorithm, with nested iterations (a sketch follows this list):

- Fix “best” latent values for the positive examples

- Harvest high-scoring (x, z) pairs from background images

- Update the model using gradient descent

- Throw away (x, z) pairs with low score
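A sketch of that loop as a higher-order function; every callable and threshold below is an assumption for illustration (the ±1 cutoffs mirror the hinge-loss margin), not the authors' actual interface:

```python
def train_with_data_mining(positives, backgrounds, model,
                           best_z, detect, retrain, score, rounds=4):
    """Nested training iterations with a hard-negative cache.
    Assumed callables: best_z(model, x) -> latent value for a positive x;
    detect(model, img) -> [(x, z, s)] candidate negatives with scores;
    retrain(model, pos, neg) -> model updated by gradient descent;
    score(model, x, z) -> score of a cached pair under the new model."""
    cache = []
    for _ in range(rounds):
        # fix "best" latent values for the positive examples
        pos = [(x, best_z(model, x)) for x in positives]
        # harvest high-scoring (x, z) pairs from background images
        for img in backgrounds:
            cache += [(x, z) for (x, z, s) in detect(model, img) if s >= -1]
        # update the model using gradient descent on positives + cache
        model = retrain(model, pos, cache)
        # throw away (x, z) pairs the updated model scores low
        cache = [(x, z) for (x, z) in cache if score(model, x, z) >= -1]
    return model
```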

• Sequence of training rounds

- Train root filters

- Initialize parts from root

- Train final model

Car model

[Figure: root filters (coarse resolution), part filters (finer resolution), deformation models]

Person model

[Figure: root filters (coarse resolution), part filters (finer resolution), deformation models]

Cat model

[Figure: root filters (coarse resolution), part filters (finer resolution), deformation models]

Bottle model

[Figure: root filters (coarse resolution), part filters (finer resolution), deformation models]

Car detections

[Figure: high-scoring true positives and high-scoring false positives]

Person detections

[Figure: high-scoring true positives and high-scoring false positives (not enough overlap)]

Horse detections

[Figure: high-scoring true positives and high-scoring false positives]

Cat detections

[Figure: high-scoring true positives and high-scoring false positives (not enough overlap)]

Quantitative results

• 7 systems competed in the 2008 challenge

• Out of 20 classes we got:

- First place in 7 classes

- Second place in 8 classes

• Some statistics:

- It takes ~2 seconds to evaluate a model on one image

- It takes ~4 hours to train a model

- MUCH faster than most systems

Precision/Recall results on Bicycles 2008

Precision/Recall results on Person 2008

Precision/Recall results on Bird 2008

Comparison of Car models on 2006 data

[Figure: precision/recall curves, class: car, year 2006. Legend (average precision): 1 Root (0.48), 2 Root (0.58), 1 Root+Parts (0.55), 2 Root+Parts (0.62), 2 Root+Parts+BB (0.64)]

Summary

• Deformable models for object detection

- Fast matching algorithms

- Learning from weakly-labeled data

- Leads to state-of-the-art results in PASCAL challenge

• Future work:

- Hierarchical models

- Visual grammars

- AO* search (coarse-to-fine)
