Epitomic Location Recognition: A generative approach for location recognition

K. Ni, A. Kannan, A. Criminisi and J. Winn. In proc. CVPR 2008. Anchorage, Alaska.


Page 1: Epitomic Location Recognition

Epitomic Location Recognition

A generative approach for location recognition

K. Ni, A. Kannan, A. Criminisi and J. Winn

In proc. CVPR 2008. Anchorage, Alaska.

Page 2: Epitomic Location Recognition
Page 3: Epitomic Location Recognition

Location Recognition

Where am I?

Instance recognition

Category recognition (more difficult)

Lobby? Cubicle? Hallway? Kitchen?

Page 4: Epitomic Location Recognition
Page 5: Epitomic Location Recognition

Geometry-Based Recognition

SLAM & structure from motion

Why do we need metric reconstruction?

Metric reconstruction loses the flexibility to do class recognition.

F. Schaffalitzky and A. Zisserman

[Diagram: Training Images and Testing Image, Features, Local Feature Database, Geometry & Labels]

G. Schindler, M. Brown, R. Szeliski

Page 6: Epitomic Location Recognition

Appearance-Based Recognition

Capture global appearance information

Gaussian mixture model used by A. Torralba et al.

[Diagram: Training Images → Preprocessing (e.g. PCA) → Image Vectors → Training → Appearance Model]

A. Torralba, K. Murphy, W. T. Freeman and M. A. Rubin

M. Cummins and P. Newman

Page 7: Epitomic Location Recognition

Appearance or Geometry?

Can we do better by fusing both kinds of information?

A small example with 2 location labels: cubicle and corridor

Page 8: Epitomic Location Recognition

The Simplest Model

Nearest neighbor classification

Naive, but still effective with enough samples.

A small shift may disrupt the recognition.

Does not capture uncertainty.
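A minimal sketch of nearest-neighbor classification over whole frames, using made-up 8 × 8 toy images rather than data from the talk; `nearest_neighbor_label` and the toy scenes are hypothetical. The two toy locations are deliberately contrived to differ by a one-pixel offset, so a one-pixel shift flips the decision, and the function returns a single hard label with no measure of uncertainty.

```python
import numpy as np

def nearest_neighbor_label(train_imgs, train_labels, test_img):
    """1-NN over flattened frames with squared Euclidean distance."""
    X = np.stack([im.ravel() for im in train_imgs]).astype(float)
    d = ((X - test_img.ravel().astype(float)) ** 2).sum(axis=1)
    return train_labels[int(np.argmin(d))]  # hard decision, no uncertainty

# Hypothetical toy "locations": a vertical line at column 2 vs. column 3.
corridor = np.zeros((8, 8)); corridor[:, 2] = 1.0
cubicle = np.zeros((8, 8)); cubicle[:, 3] = 1.0
train_imgs, train_labels = [corridor, cubicle], ["corridor", "cubicle"]

print(nearest_neighbor_label(train_imgs, train_labels, corridor))  # corridor
# The same corridor view, shifted one pixel to the right:
shifted = np.roll(corridor, 1, axis=1)
print(nearest_neighbor_label(train_imgs, train_labels, shifted))   # cubicle
```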

Page 9: Epitomic Location Recognition

How to Incorporate Translation Invariance?

We need something better than a “bag of frames” model

[Figure: training images and a testing image]

Page 10: Epitomic Location Recognition

Panorama

It models both appearance & geometry

Adapts to camera rotation and focal length change

M. Brown and D. G. Lowe

Generative: an image is a patch “extracted” from the panorama
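The generative view of a panorama (an image as a patch extracted from it) can be sketched as follows: treat a test frame as a noisy window cut out of the panorama and recover where it came from by exhaustive search. This is a minimal illustration on random data; `best_offset` is a hypothetical helper, and it handles translation only, not the rotation and focal-length changes that real panorama stitching accounts for.

```python
import numpy as np

def best_offset(panorama, image):
    """Exhaustively search for the (row, col) where `image` best matches
    a same-sized window of `panorama` (minimum sum of squared differences)."""
    H, W = panorama.shape
    h, w = image.shape
    best, best_cost = None, np.inf
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            cost = np.sum((panorama[r:r + h, c:c + w] - image) ** 2)
            if cost < best_cost:
                best, best_cost = (r, c), cost
    return best, best_cost

rng = np.random.default_rng(0)
panorama = rng.random((20, 60))  # stand-in for a stitched panorama
# A noisy view "extracted" from rows 5..14 and columns 23..40:
view = panorama[5:15, 23:41] + 0.01 * rng.standard_normal((10, 18))
print(best_offset(panorama, view)[0])  # (5, 23)
```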

Page 11: Epitomic Location Recognition

Cons of Panoramas

Not easy to build a panorama due to parallax

Do not capture uncertainty

Only work for location instance recognition

No compact representation for repetitive scenes

Page 12: Epitomic Location Recognition

Gaussian Mixture Model

Six mixture components, trained as in Torralba et al.'s paper

Handles uncertainties but has no translation invariance

[Figure: GMM means and variances; annotations: “Remove boundaries”, “Much more blurred”]
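For comparison, here is a minimal sketch of GMM-based location classification: one diagonal-covariance mixture per location, and a frame vector (for example, after PCA) is assigned to the location whose mixture gives it the highest log-likelihood. The function names and toy parameters are hypothetical, and this is not a reproduction of Torralba et al.'s exact setup. Each component scores the whole vector at fixed positions, which is why the model captures uncertainty but has no translation invariance.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under a diagonal-covariance Gaussian mixture.
    x: (D,), weights: (K,), means and variances: (K, D)."""
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    comp = np.log(weights) + log_norm + log_exp      # log w_k + log N(x; mu_k, var_k)
    top = comp.max()
    return top + np.log(np.sum(np.exp(comp - top)))  # stable log-sum-exp

def classify(x, models):
    """models: dict label -> (weights, means, variances). Returns the best
    label and the per-label log-likelihoods (usable as soft scores)."""
    scores = {lab: gmm_log_likelihood(x, *p) for lab, p in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical 2-component models over 4-D feature vectors.
models = {
    "corridor": (np.array([0.5, 0.5]), np.array([[0.0] * 4, [1.0] * 4]), np.ones((2, 4))),
    "cubicle": (np.array([0.5, 0.5]), np.array([[5.0] * 4, [6.0] * 4]), np.ones((2, 4))),
}
label, scores = classify(np.array([0.9, 1.1, 1.0, 1.0]), models)
print(label)  # corridor
```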

Page 13: Epitomic Location Recognition

A Weak Panorama

3D motions can be roughly modeled by 2D translation + scaling.

[Figure: examples of 2D translation and scaling]
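A minimal sketch of such a mapping, assuming nearest-neighbor sampling (the slides do not say how pixels are interpolated): an output patch is read from a larger source array at a top-left offset, with a single isotropic scale factor greater than zero. `extract_patch` is a hypothetical helper.

```python
import numpy as np

def extract_patch(source, top, left, scale, out_shape):
    """Read an out_shape patch from `source` under a 2D translation (top, left)
    and an isotropic scale > 0, using nearest-neighbor lookup.
    scale > 1 spans a larger source region (zoom out); scale < 1 zooms in."""
    h, w = out_shape
    rows = top + np.floor(np.arange(h) * scale).astype(int)
    cols = left + np.floor(np.arange(w) * scale).astype(int)
    H, W = source.shape[:2]
    if top < 0 or left < 0 or rows[-1] >= H or cols[-1] >= W:
        raise ValueError("mapping falls outside the source")
    return source[np.ix_(rows, cols)]  # extra channels (if any) are kept

source = np.arange(100).reshape(10, 10)
print(extract_patch(source, 2, 3, 1.0, (2, 2)).tolist())  # [[23, 24], [33, 34]]
print(extract_patch(source, 0, 0, 2.0, (3, 3)).tolist())  # [[0, 2, 4], [20, 22, 24], [40, 42, 44]]
```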

Page 14: Epitomic Location Recognition

Epitome = Panorama + GMM

Epitome

Generative model for image patches / video frames

Captures repetitive patterns in the original image

Mapping = 2D translation + scaling

[Figure: a source image, image patches, and the resulting epitome]

N. Jojic et al., ICCV 2003; N. Petrovic et al., CVPR 2006
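For reference, the standard epitome likelihood from Jojic et al. (ICCV 2003) can be written as follows; the notation is ours, not the talk's. Each epitome pixel j stores a mean μ_j and a variance σ_j², a mapping T sends pixel i of the patch x to epitome pixel T(i), and the patch likelihood sums over all mappings with prior p(T). Each mapping acts like one mixture component, and components share epitome pixels.

```latex
p(\mathbf{x} \mid T) = \prod_{i} \mathcal{N}\!\left(x_i;\ \mu_{T(i)},\ \sigma^2_{T(i)}\right),
\qquad
p(\mathbf{x}) = \sum_{T} p(T)\, p(\mathbf{x} \mid T)
```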

Page 15: Epitomic Location Recognition

Epitome as Probabilistic Panorama

Model 3D scenes rather than a single 2D image

Environment = virtual panorama

[Figure: location epitome means and variances]

Page 16: Epitomic Location Recognition

Learning the Location Epitome

Initialize the epitome randomly

EM iterations

E-step: infer the posteriors over all mappings

M-step: use the posteriors as weights to update the mean and variance of epitome pixels

[Plot: free energy vs. EM iterations]
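The learning loop can be sketched as follows, under simplifying assumptions of ours: a single feature channel, translation-only mappings with wrap-around at the epitome borders (no scaling), a uniform prior over mappings, and a variance floor `min_var` added for numerical stability. `learn_epitome` and `_window` are hypothetical names, and this is not the authors' implementation. The returned trace is the data log-likelihood at each E-step; at that point the free energy equals its negative, so the trace should never decrease.

```python
import numpy as np

def _window(offset, patch_shape, epitome_shape):
    """Epitome pixels covered by a patch placed at `offset` (with wrap-around)."""
    (a, b), (h, w), (e1, e2) = offset, patch_shape, epitome_shape
    return np.ix_((a + np.arange(h)) % e1, (b + np.arange(w)) % e2)

def learn_epitome(patches, epitome_shape, n_iters=20, min_var=1e-3, seed=0):
    """EM for a single-channel, translation-only epitome.
    patches: (N, h, w) array. Returns (mean, variance, log-likelihood trace)."""
    n, h, w = patches.shape
    assert h <= epitome_shape[0] and w <= epitome_shape[1], "patch larger than epitome"
    rng = np.random.default_rng(seed)
    mu = rng.random(epitome_shape)              # random initialization
    var = np.ones(epitome_shape)
    windows = [_window((a, b), (h, w), epitome_shape)
               for a in range(epitome_shape[0]) for b in range(epitome_shape[1])]
    trace = []
    for _ in range(n_iters):
        # E-step: log p(x_n | T) for every patch n and every mapping T.
        logp = np.empty((n, len(windows)))
        for k, idx in enumerate(windows):
            m, v = mu[idx], var[idx]
            logp[:, k] = -0.5 * np.sum(np.log(2 * np.pi * v) + (patches - m) ** 2 / v,
                                       axis=(1, 2))
        top = logp.max(axis=1, keepdims=True)
        q = np.exp(logp - top)
        norm = q.sum(axis=1, keepdims=True)
        trace.append(float(np.sum(top + np.log(norm)) - n * np.log(len(windows))))
        q /= norm                                  # posterior over mappings
        # M-step: posterior-weighted mean and variance of each epitome pixel.
        w_sum, x_sum, x2_sum = (np.zeros(epitome_shape) for _ in range(3))
        for k, idx in enumerate(windows):
            w_sum[idx] += q[:, k].sum()
            x_sum[idx] += np.tensordot(q[:, k], patches, axes=1)
            x2_sum[idx] += np.tensordot(q[:, k], patches ** 2, axes=1)
        seen = w_sum > 1e-12                       # untouched pixels keep old values
        safe = np.maximum(w_sum, 1e-12)
        new_mu = np.where(seen, x_sum / safe, mu)
        new_var = np.where(seen, x2_sum / safe - new_mu ** 2, var)
        mu, var = new_mu, np.maximum(new_var, min_var)
    return mu, var, trace

# Toy usage: 4 x 4 patches cut (with wrap-around) from a random 8 x 8 "scene".
rng = np.random.default_rng(1)
scene = rng.random((8, 8))
starts = rng.integers(0, 8, size=(30, 2))
patches = np.stack([scene[_window((r, c), (4, 4), (8, 8))] for r, c in starts])
mu, var, trace = learn_epitome(patches, (8, 8), n_iters=10)
```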

Page 17: Epitomic Location Recognition

Model Comparison

The epitome is a smart mixture-of-Gaussians model with parameters shared among components

For the same number of parameters, the epitome generalizes better
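A back-of-the-envelope sketch of what parameter sharing buys, under assumptions of ours (diagonal Gaussians, 32 × 32 frames, translation-only mappings with wrap-around, a uniform mapping prior): a six-component GMM over whole frames and a 64 × 96 epitome have almost the same number of parameters, but the epitome defines one Gaussian component per translation. The helper names and sizes are hypothetical.

```python
def gmm_params(k, h, w):
    """K diagonal Gaussians over h*w pixels: means + variances + K-1 free weights."""
    return k * 2 * h * w + (k - 1)

def epitome_params(e1, e2):
    """One mean and one variance per epitome pixel (uniform mapping prior)."""
    return 2 * e1 * e2

def epitome_components(e1, e2):
    """With wrap-around translations, every epitome pixel is a valid patch origin."""
    return e1 * e2

print(gmm_params(6, 32, 32))       # 12293 parameters, 6 components
print(epitome_params(64, 96))      # 12288 parameters
print(epitome_components(64, 96))  # 6144 components sharing those parameters
```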

Page 18: Epitomic Location Recognition
Page 19: Epitomic Location Recognition

Build Label Maps

The label maps are the posterior of the label given the mapping

[Figure: epitome and its label maps (corridor label map, cubicle label map)]
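One plausible reading of this slide, sketched under our own assumptions: each training frame's posterior over mappings spreads that frame's label onto the epitome pixels its mappings cover, and normalizing over labels at each pixel gives per-pixel label maps. `build_label_maps` is hypothetical; mappings are passed in as index tuples (for example, built with `np.ix_`).

```python
import numpy as np

def build_label_maps(q, labels, n_labels, windows, epitome_shape):
    """q: (N, K) posterior over K mappings for each of N training frames.
    labels: (N,) integer location label of each frame.
    windows: K index tuples, the epitome pixels covered by each mapping.
    Returns (n_labels, E1, E2) maps summing to 1 over labels at each pixel."""
    counts = np.zeros((n_labels,) + tuple(epitome_shape))
    for k, idx in enumerate(windows):
        for lab in range(n_labels):
            counts[lab][idx] += q[labels == lab, k].sum()  # soft label votes
    total = counts.sum(axis=0, keepdims=True)
    # Pixels no training frame maps to get a uniform label distribution.
    return np.where(total > 0, counts / np.maximum(total, 1e-300), 1.0 / n_labels)

# Toy example: a 1 x 4 epitome, two mappings covering columns {0, 1} and {2, 3}.
windows = [np.ix_([0], [0, 1]), np.ix_([0], [2, 3])]
q = np.array([[0.9, 0.1],    # frame 0 (label 0) mostly uses mapping 0
              [0.2, 0.8]])   # frame 1 (label 1) mostly uses mapping 1
maps = build_label_maps(q, np.array([0, 1]), 2, windows, (1, 4))
```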

Page 20: Epitomic Location Recognition

Recognition from Location Epitomes

Fast correlation: infer the best mapping region

Sum the pixel-wise votes

Temporal smoothing using HMM

[Figure: input testing image, best matching patch in the location epitome, corridor and cubicle label maps]
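A minimal sketch of the three recognition steps, with stand-ins of our own choosing: an exhaustive search over wrap-around translations replaces the fast correlation (single-channel epitome assumed), the votes are per-pixel label maps summed inside the best window, and temporal smoothing is HMM forward filtering with a hypothetical "sticky" transition probability `stay`. All function names are hypothetical.

```python
import numpy as np

def best_mapping(mu, var, image):
    """Exhaustive search (a slow stand-in for fast correlation) for the
    wrap-around translation whose epitome window best explains `image`
    under per-pixel Gaussians. Returns that window's index tuple."""
    (E1, E2), (h, w) = mu.shape, image.shape
    best, best_ll = None, -np.inf
    for a in range(E1):
        for b in range(E2):
            idx = np.ix_((a + np.arange(h)) % E1, (b + np.arange(w)) % E2)
            ll = -0.5 * np.sum(np.log(2 * np.pi * var[idx]) + (image - mu[idx]) ** 2 / var[idx])
            if ll > best_ll:
                best, best_ll = idx, ll
    return best

def frame_label_scores(label_maps, window):
    """Sum each label map's pixel-wise votes inside the window, then normalize."""
    votes = np.array([m[window].sum() for m in label_maps])
    return votes / votes.sum()

def hmm_smooth(frame_probs, stay=0.9):
    """Forward filtering: keep the previous label with probability `stay`,
    otherwise switch uniformly to another label. frame_probs: (T, L), L >= 2."""
    n_frames, n_labels = frame_probs.shape
    A = np.full((n_labels, n_labels), (1 - stay) / (n_labels - 1))
    np.fill_diagonal(A, stay)
    belief = np.full(n_labels, 1.0 / n_labels)
    out = np.empty_like(frame_probs, dtype=float)
    for t in range(n_frames):
        belief = (A.T @ belief) * frame_probs[t]  # predict, then weigh by evidence
        belief /= belief.sum()
        out[t] = belief
    return out

# A noisy middle frame is outvoted by its neighbors.
raw = np.array([[0.6, 0.4], [0.45, 0.55], [0.6, 0.4]])
print(hmm_smooth(raw).argmax(axis=1).tolist())  # [0, 0, 0]
```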

Page 21: Epitomic Location Recognition
Page 22: Epitomic Location Recognition

Color is not always the best feature

Other features besides RGB

For example, stereo features capture depth information.

High stereo accuracy is not needed (an efficient dynamic programming (DP) stereo algorithm is used here)

[Figure: stereo features for corridor, cubicle and kitchen]

Page 23: Epitomic Location Recognition

Integrating Multiple Features

Stack multiple feature “channels”

[Figure: B, G, R and stereo channels stacked]

Page 24: Epitomic Location Recognition

Local Histograms

Enable better translation invariance and more generalization

Error rate: 0.49 → 0.36 on a 4-class test dataset

Improve the efficiency dramatically: 30 times speed-up
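A minimal sketch of computing local histograms, assuming (our choice; the slides do not say) that each pixel's feature vector is quantized to its nearest entry in a fixed codebook and that histograms are taken over non-overlapping C1 × C2 blocks. With a 50-entry codebook this would give 50-dimensional features on a grid shrunk by C1 × C2. `local_histograms` is a hypothetical helper. Because a histogram ignores where inside its block a codeword occurs, small shifts change it little.

```python
import numpy as np

def local_histograms(features, codebook, c1, c2):
    """features: (M, N, D) per-pixel features (e.g. RGB); codebook: (B, D).
    Each pixel is assigned to its nearest codeword, then a normalized
    B-bin histogram is computed over every non-overlapping c1 x c2 block.
    Returns an (M // c1, N // c2, B) array."""
    M, N, D = features.shape
    B = codebook.shape[0]
    # Squared distance from every pixel to every codeword (fine for small images).
    d2 = ((features.reshape(-1, 1, D) - codebook[None]) ** 2).sum(-1)
    words = d2.argmin(axis=1).reshape(M, N)
    M2, N2 = M // c1, N // c2
    words = words[:M2 * c1, :N2 * c2]  # drop a ragged border, if any
    blocks = words.reshape(M2, c1, N2, c2).transpose(0, 2, 1, 3).reshape(M2, N2, c1 * c2)
    hist = np.zeros((M2, N2, B))
    for b in range(B):
        hist[..., b] = (blocks == b).mean(axis=-1)  # fraction of block in bin b
    return hist

features = np.zeros((4, 4, 3)); features[:, 2:] = 1.0    # dark left, bright right
codebook = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])  # toy 2-word codebook
h = local_histograms(features, codebook, 2, 2)
print(h.shape)  # (2, 2, 2)
```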

Page 25: Epitomic Location Recognition

Supervised Learning

Incorporates training image labels

Helps discriminate images with similar features but different location labels.

[Figure: an example epitome and an example label feature; discriminative features such as a monitor in the cubicle and a microwave in the kitchen]

Page 26: Epitomic Location Recognition
Page 27: Epitomic Location Recognition

MIT Image Database

Created by Antonio Torralba et al.

17 sequences, 62 locations, 7 categories, 72077 images

Page 28: Epitomic Location Recognition

Results on Recognizing Location Instances

Location epitome vs. GMM: 10% better on average

Page 29: Epitomic Location Recognition

Results on Recognizing Location Classes

Location epitome vs. GMM: 10%-20% better

Page 30: Epitomic Location Recognition

MSRC Data Set

Captured with a stereo camera

5409 images collected at 4 fps

11 sequences and 7 classes

corridor_visionlab cubicle_mlp kitchen-fl2-north lectureroom-large

lectureroom-small stairs-1st-to-2nd stairs-2nd-to-1st

Page 31: Epitomic Location Recognition

Integrate Depth Cues

[Figure panels: corridor_visionlab, cubicle_mlp, kitchen-fl2-north, lectureroom-large, lectureroom-small, stairs-1st-to-2nd, stairs-2nd-to-1st]

Page 32: Epitomic Location Recognition

Instance Recognition with Multiple Features

RGB & Stereo outperforms the other features

Learning: 5.7 fps

Recognition: 116 fps = 29 times the capture speed

Page 33: Epitomic Location Recognition

Summary

A generative model for the recognition of both location instances and classes

Fast: capable of real-time applications

Flexible: capable of integrating various features

Probabilistic: capable of capturing uncertainties

Future applications

Navigation for visually impaired people

Appearance-based loop closing for SLAM problems

Page 34: Epitomic Location Recognition

Epitomic Location Recognition

A generative approach for location recognition

K. Ni, A. Kannan, A. Criminisi and J. Winn

Thank you!

Page 35: Epitomic Location Recognition

Local Histograms (2)

Improves efficiency (both training and testing)

The bottleneck: convolving the epitome with images

Compression rate: $3\,(C_1 C_2)^2 / 50 = 2400$

Learning: 3 hours → 6 mins, 30 times faster

[Figure: convolving an M × N image with an Me × Ne epitome using 3-dimensional RGB features, vs. convolving an (M/C1) × (N/C2) image with an (Me/C1) × (Ne/C2) epitome using 50-dimensional local histograms]
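The compression rate on this slide can be reproduced under one assumption of ours: that the cost of correlating an image with the epitome scales as (feature dimension) × (image pixels) × (epitome pixels), as it does for direct (non-FFT) correlation. Shrinking both the image and the epitome by C1 × C2 while going from 3 to 50 dimensions gives the ratio below. The slides do not state C1 and C2 individually; only their product follows from the stated rate.

```latex
\frac{\text{cost}_{\text{RGB}}}{\text{cost}_{\text{hist}}}
  = \frac{3\, M N\, M_e N_e}{50\, \frac{M}{C_1}\frac{N}{C_2}\,\frac{M_e}{C_1}\frac{N_e}{C_2}}
  = \frac{3\,(C_1 C_2)^2}{50},
\qquad
\frac{3\,(C_1 C_2)^2}{50} = 2400 \;\Rightarrow\; C_1 C_2 = 200.
```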