Computer Vision Introduction One Lecture (Short)


Computer vision, in one lecture

Bill Freeman
Electrical Engineering and Computer Science Dept.
Massachusetts Institute of Technology
April 21, 2010

The Taiyuan University of Technology Computer Center staff, and me (1987)

Me and my wife, riding from the Foreigners’ Cafeteria

Inside the computer center, with the image processing equipment

While in China, I read this book (to be re-issued by MIT Press this year), and got very excited about computer vision. Studied for PhD at MIT.

Goal of computer vision

Marr: “To tell what is where by looking”.

Want to:
– Estimate the shapes and properties of things.
– Recognize objects
– Find and recognize people
– Find road lanes and other cars
– Help a robot walk, navigate, or fly.
– Inspect for manufacturing

Some particular goals of computer vision

• Wave a camera around, get a 3-D model out.
• Capture body pose of actor dancing.
• Detect and recognize faces.
• Recognize objects.
• Track people or objects

Let’s go back in time, to the mid-1980’s

What everyone looked like back then

Features

• Points

but also,
• Lines
• Conics
• Other fitted curves

Features “blocks world”

A toy world in which to study image interpretation. All we have to do is convert real world images to their blocks world equivalents and we’re all set.

Yvan Leclerc and Martin Fischler, An optimization-based approach to the interpretation of single line drawings as 3-D wire frames.

Objects

Huttenlocher and Ullman, Object recognition using alignment, ICCV, 1986

Computer vision research results, 1986

From Rothwell et al., Efficient model library access by projectively invariant indexing functions, CVPR 1992.

6 years later: Recognizing planar objects using invariants.

(Figure panels: input image; edge points fitted with lines or conics; objects that have been recognized and verified)

Computer vision research results, 1992

Back to the present…

Companies and applications

• Cognex
• Poseidon
• Mobileye
• EyeToy
• Identix
• Google
• Microsoft
• Face recognition in cameras

Mobileye

Google

Microsoft

Some particular goals of computer vision (status report)

• Wave a camera around, get a 3-D model out (almost).

• Capture body pose of actor dancing. Using multiple cameras (pretty well); using a single camera (not yet).

• Detect and recognize faces (frontal, yes).
• Recognize objects (working on it, lots of progress).
• Track people or objects (over short times).

What has allowed us to make progress?

• SIFT features
• Discriminative classifiers

• Bayesian methods

• Large databases


Building a Panorama

M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003

How do we build a panorama?

• We need to match (align) images

http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt

Matching with Features

• Detect feature points in both images
• Find corresponding pairs
• Use these pairs to align images (we know this)

Matching with Features

• Problem 1:
– Detect the same point independently in both images

Counter-example: a point detected in only one of the two images has no chance to match!

We need a repeatable detector

Matching with Features

• Problem 2:
– For each point, correctly recognize the corresponding one

We need a reliable and distinctive descriptor

Overview of feature detection for (instance) object recognition

(Figure labels: detector location; descriptor)

Note: here the viewpoint is different, not a panorama (they show off).

• Detector: detect the same scene points independently in both images

• Descriptor: encode the local neighboring window
– Note how the scale & rotation of the window are the same in both images (but computed independently)

• Correspondence: find the most similar descriptor in the other image
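The correspondence step can be sketched as a brute-force nearest-neighbor search over descriptor vectors. This is a minimal sketch that assumes NumPy and Euclidean distance. The function name `match_descriptors` and the toy data are illustrative only. Large databases usually call for an approximate search structure in place of this O(N·M) loop.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """For each descriptor (row) in desc_a, find the most similar row in desc_b.

    Returns (indices into desc_b, Euclidean distances). Brute force, O(N * M)."""
    diff = desc_a[:, None, :] - desc_b[None, :, :]   # shape (N, M, D)
    d2 = np.sum(diff * diff, axis=2)                 # squared distances, shape (N, M)
    idx = np.argmin(d2, axis=1)
    return idx, np.sqrt(d2[np.arange(len(desc_a)), idx])

# Toy data: image B's 128-D descriptors are a shuffled, slightly noisy copy of A's.
rng = np.random.default_rng(0)
desc_a = rng.random((5, 128))
perm = np.array([3, 0, 4, 1, 2])                     # desc_b[j] is a noisy desc_a[perm[j]]
desc_b = desc_a[perm] + 0.001 * rng.standard_normal((5, 128))
idx, dist = match_descriptors(desc_a, desc_b)
print(idx)  # [1 3 4 0 2]
```

Each descriptor in A finds its own noisy copy in B because the noise is tiny compared with the distance between unrelated 128-D vectors.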

CVPR 2003 Tutorial

Recognition and Matching Based on Local Invariant Features

David Lowe
Computer Science Department
University of British Columbia

http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

Invariant Local Features
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

SIFT Features

Freeman et al., 1998, http://people.csail.mit.edu/billf/papers/cga1.pdf

Advantages of invariant local features

• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)

• Distinctiveness: individual features can be matched to a large database of objects

• Quantity: many features can be generated for even small objects

• Efficiency: close to real-time performance

• Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness

SIFT vector formation
• Computed on a rotated and scaled version of the window, according to the computed orientation & scale
– resample a 16x16 version of the window

• Based on gradients weighted by a Gaussian with σ equal to half the window width (for smooth falloff)

SIFT vector formation
• 4x4 array of gradient orientation histograms
– not really a histogram: votes are weighted by gradient magnitude
• 8 orientations x 4x4 array = 128 dimensions
• Motivation: some sensitivity to spatial layout, but not too much.

(Figure shows only 2x2 here, but the descriptor is 4x4)

SIFT vector formation
• Thresholded image gradients are sampled over a 16x16 array of locations in scale space

• Create array of orientation histograms

• 8 orientations x 4x4 histogram array = 128 dimensions

Ensure smoothness
• Gaussian weight
• Trilinear interpolation
– a given gradient contributes to 8 bins: 4 in space times 2 in orientation
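A simplified version of this descriptor layout can be sketched as follows. It assumes NumPy and a 16x16 patch that has already been rotated and rescaled. The sketch builds the 4x4 grid of 8-bin, magnitude-weighted orientation histograms (4 × 4 × 8 = 128 values). It deliberately omits the Gaussian weight and trilinear interpolation listed above, so each gradient votes into exactly one bin. `sift_like_histograms` is a hypothetical name and is not Lowe's implementation.

```python
import numpy as np

def sift_like_histograms(patch):
    """Simplified SIFT-style layout for a 16x16 patch: a 4x4 grid of 8-bin
    orientation histograms, each vote weighted by gradient magnitude.
    Omits Gaussian weighting and trilinear interpolation (nearest bin only)."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))          # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)        # orientation in [0, 2*pi)
    obin = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)
    hist = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):
            # Each 4x4 block of pixels feeds one spatial cell
            hist[y // 4, x // 4, obin[y, x]] += mag[y, x]
    return hist.ravel()                                 # 4 * 4 * 8 = 128 dimensions

# A horizontal ramp has gradient (1, 0) everywhere, so all votes land in orientation bin 0.
ramp = np.tile(np.arange(16.0), (16, 1))
d = sift_like_histograms(ramp)
print(d.shape)  # (128,)
```

Adding trilinear interpolation would split each vote among the 8 neighboring bins (2 × 2 spatial cells × 2 orientations) in proportion to distance. That split is what removes the abrupt changes this nearest-bin version suffers when a gradient moves across a cell boundary.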

Reduce effect of illumination
• 128-dim vector normalized to 1

• Threshold gradient magnitudes to avoid excessive influence of high gradients
– after normalization, clamp gradients > 0.2
– renormalize
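The illumination normalization amounts to three vector operations. This is a minimal sketch that assumes NumPy and a non-negative 128-D histogram vector, which is what SIFT histograms are. The 0.2 clamp is the value given on the slide. The function name and the small `eps` guard against a zero vector are assumptions of this sketch.

```python
import numpy as np

def normalize_descriptor(vec, clamp=0.2, eps=1e-12):
    v = np.asarray(vec, dtype=float)
    v = v / max(np.linalg.norm(v), eps)     # unit length: cancels a global contrast change
    v = np.minimum(v, clamp)                # limit the influence of a few large gradients
    return v / max(np.linalg.norm(v), eps)  # renormalize to unit length

rng = np.random.default_rng(1)
h = rng.random(128)
# Scaling all gradients (e.g. doubling image contrast) leaves the descriptor unchanged.
print(np.allclose(normalize_descriptor(h), normalize_descriptor(2.0 * h)))  # True
```

For a vector dominated by one entry, such as [10, 1, 1, 1], the clamp cuts the dominant-to-other ratio from 10 down to about 2. After renormalization, the large gradient still counts, but it no longer swamps the rest of the descriptor.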

Feature stability to noise
• Match features after random change in image scale & orientation, with differing levels of image noise

• Find nearest neighbor in database of 30,000 features

Feature stability to affine change
• Match features after random change in image scale & orientation, with 2% image noise, and affine distortion

• Find nearest neighbor in database of 30,000 features

Distinctiveness of features
• Vary size of database of features, with 30 degree affine change, 2% image noise

• Measure % correct for single nearest neighbor match

These feature point detectors and descriptors are the most important recent advance in computer vision and graphics.

• Feature points are also used for:
– Image alignment (homography, fundamental matrix)
– 3D reconstruction
– Motion tracking
– Object recognition
– Indexing and database retrieval
– Robot navigation
– … other

More uses for SIFT features

SIFT features have also been applied to (categorical) object recognition

First, let’s present some of the issues in object recognition.

intra-class variation

Slide from: Li Fei-Fei, Rob Fergus and Antonio Torralba, short course on object recognition, http://people.csail.mit.edu/torralba/shortCourseRLOC/

Object recognition issues

– Generative / discriminative / hybrid
– Appearance only, or location and appearance
– Invariances
• Viewpoint
• Illumination
• Occlusion
• Scale
• Deformation
• Clutter
• etc.
– Parts, or global with sub-window
– Use set of features or each pixel in image

Current approaches in object recognition

• Bag of words
• Boosting
• Label transfer

Visual words

• Vector quantize SIFT descriptors to a vocabulary of 2 or 3 thousand “visual words”.

• Heuristic design of descriptors makes these words somewhat invariant to:
– Lighting
– 2-D orientation
– 3-D viewpoint

Object recognition using visual words

(Figure steps: find words; form histograms; compare with object class database)
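The vector-quantization and histogram steps can be sketched as below. The sketch assumes NumPy and a vocabulary (one row per visual word) that has already been learned, for example by k-means over training descriptors. The nearest-histogram rule in `classify` is a hypothetical placeholder for the comparison with the object class database. It is not the method of any paper cited here.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each descriptor (row) to its nearest visual word (row of vocabulary)."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def bag_of_words(descriptors, vocabulary):
    """Normalized histogram of visual-word counts for one image."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def classify(hist, class_hists):
    """Hypothetical rule: the class whose stored histogram is closest in L1 distance."""
    names = list(class_hists)
    dists = [np.abs(hist - class_hists[n]).sum() for n in names]
    return names[int(np.argmin(dists))]

vocabulary = np.eye(3)                        # 3 toy "visual words" in 3-D
descriptors = np.array([[0.9, 0.1, 0.0],
                        [1.0, 0.0, 0.1],
                        [0.0, 0.1, 0.8]])
h = bag_of_words(descriptors, vocabulary)     # word counts 2, 0, 1 -> [2/3, 0, 1/3]
print(classify(h, {"a": np.array([0.7, 0.0, 0.3]),
                   "b": np.array([0.0, 1.0, 0.0])}))  # a
```

The histogram discards where each word occurred in the image. The spatial-pyramid work referenced below adds some of that layout back.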

Many combinatorial matching problems to be solved for object recognition.

Instance recognition: with features allowed to appear or not in both the test and training examples.

Deformable object recognition: some feature clusters maintain spatial coherence, others can vary.

Category recognition: each class defined by many different training set exemplars. Find the class that best explains the observed feature set.

Semi-supervised object recognition: observed training set features include many background object features.

http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/cvpr06b.pdf

http://www.cs.utexas.edu/~grauman/research/projects/pmk/pmk_projectpage.htm

Caltech101

Caltech101 results over time

Problem: Category-level recognition using visual words representation.

Applications: Object recognition.

References: Lazebnik, Schmid, and Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, Computer Vision and Pattern Recognition (CVPR 2006), http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/cvpr06b.pdf

K. Grauman and T. Darrell. Unsupervised Learning of Categories from Sets of Partially Matching Image Features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York City, NY, June 2006, http://www.cs.utexas.edu/~grauman/papers/grauman_darrell_cvpr2006.pdf

What has allowed us to make progress?

• SIFT features
• Discriminative classifiers: SVMs and boosting

• Bayesian methods

• Large databases

Paul Viola, Michael J. Jones
Mitsubishi Electric Research Laboratories (MERL)
Cambridge, MA

Most of this work was done at Compaq CRL before the authors moved to MERL

Rapid Object Detection Using a Boosted Cascade of Simple Features

Manuscript available on web:
http://citeseer.ist.psu.edu/cache/papers/cs/23183/http:zSzzSzwww.ai.mit.eduzSzpeoplezSzviolazSzresearchzSzpublicationszSzICCV01-Viola-Jones.pdf/viola01robust.pdf

Viola-Jones approach

• Large feature set (… is huge, about 16,000,000 features)

• Efficient feature selection using AdaBoost

• Cascaded classifier for rapid detection
– Hierarchy of attentional filters

The combination of these ideas yields the fastest known face detector for gray scale images.

Viola and Jones, Rapid object detection using a boosted cascade of simple features, CVPR 2001

Image Features

“Rectangle filters”

Similar to Haar wavelets

Differences between sums of pixels in adjacent rectangles

$$h_t(x) = \begin{cases} +1 & \text{if } f_t(x) > \theta_t \\ -1 & \text{otherwise} \end{cases}$$

Huge “Library” of Filters

Integral Image

• Define the integral image: $ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$, the sum of all pixels above and to the left of $(x, y)$

• Any rectangular sum can be computed in constant time, from four array references

• Rectangle features can be computed as differences between rectangles
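The integral image and the constant-time rectangle sum can be sketched as follows, assuming NumPy. This version stores `ii[y, x]` as the sum of `img[:y, :x]` (an exclusive sum). The table is padded with a leading row and column of zeros, so rectangles touching the image border need no special case. `two_rect_feature` (left half minus right half) is one illustrative rectangle-filter shape, and its sign convention is an arbitrary choice of this sketch.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x], with a zero first row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] using four array references, regardless of size."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, y, x, h, w):
    """Two-rectangle feature: sum of the left half minus sum of the right half (w even)."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)

img = np.arange(20.0).reshape(4, 5)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 3))          # 57.0, same as img[1:3, 1:4].sum()
print(two_rect_feature(ii, 0, 0, 4, 4))  # -16.0
```

Building the table costs one pass over the image. After that, every rectangle filter at any position or scale costs the same handful of lookups, which is what makes evaluating thousands of features per sub-window affordable.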

Constructing classifiers by combining filter outputs

• Perceptron yields a sufficiently powerful classifier

• Use AdaBoost to efficiently choose best features

AdaBoost

• Initial uniform weight on training examples
• Weak classifier 1
• Incorrect classifications re-weighted more heavily
• Weak classifier 2
• Weak classifier 3
• Final classifier is a weighted combination of weak classifiers

(Freund & Schapire ’95)

Ada-Boost Tutorial

• Given a weak learning algorithm
– Learner takes a training set and returns the best classifier from a weak concept space
• required to have error < 50%

• Starting with a training set (initial weights 1/n)
– Weak learning algorithm returns a classifier
– Reweight the examples
• Weight on correct examples is decreased
• Weight on errors is increased

• Final classifier is a weighted majority of weak classifiers
– Weak classifiers with low error get larger weight

Review of AdaBoost (Freund & Schapire ’95)

• Given examples $(x_1, y_1), \ldots, (x_N, y_N)$ where $y_i = 0, 1$ for negative and positive examples respectively.
• Initialize weights $w_{1,i} = 1/N$.

• For $t = 1, \ldots, T$:
– Normalize the weights, $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{N} w_{t,j}$
– Find a weak learner, i.e. a hypothesis $h_t(x)$, with weighted error less than 0.5
– Calculate the error of $h_t$: $e_t = \sum_i w_{t,i} \, |h_t(x_i) - y_i|$
– Update the weights: $w_{t+1,i} = w_{t,i} \, B_t^{1 - d_i}$, where $B_t = e_t / (1 - e_t)$ and $d_i = 0$ if example $x_i$ is classified correctly, $d_i = 1$ otherwise.

• The final strong classifier is

$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) > 0.5 \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$

where $\alpha_t = \log(1/B_t)$.
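The boosting loop reviewed on this slide can be sketched as follows. The sketch assumes NumPy, labels in {0, 1}, and a matrix `X` whose column j holds the value of rectangle feature j on each training window. Following Viola and Jones, each weak learner thresholds a single feature with a polarity p: h(x) = 1 if p·f(x) < p·θ. The exhaustive threshold search is a simple, slow choice for illustration. The floor on the error is an added assumption that avoids dividing by zero when a stump is perfect.

```python
import numpy as np

def stump_predict(X, j, theta, p):
    # Weak classifier on one feature: h(x) = 1 if p * f_j(x) < p * theta, else 0
    return (p * X[:, j] < p * theta).astype(int)

def best_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) with lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for p in (1, -1):
                err = np.sum(w * np.abs(stump_predict(X, j, theta, p) - y))
                if err < best[0]:
                    best = (err, j, theta, p)
    return best

def adaboost(X, y, T):
    n = len(y)
    w = np.full(n, 1.0 / n)                         # initial weights 1/N
    stumps, alphas = [], []
    for _ in range(T):
        w = w / w.sum()                             # normalize the weights
        e, j, theta, p = best_stump(X, y, w)        # weighted error e_t
        e = max(e, 1e-10)                           # assumption: guard against e_t = 0
        beta = e / (1.0 - e)                        # B_t = e_t / (1 - e_t)
        d = np.abs(stump_predict(X, j, theta, p) - y)  # d_i = 0 if correct, 1 otherwise
        w = w * beta ** (1 - d)                     # shrink weights of correct examples
        stumps.append((j, theta, p))
        alphas.append(np.log(1.0 / beta))           # alpha_t = log(1 / B_t)
    return stumps, np.array(alphas)

def strong_classify(X, stumps, alphas):
    votes = sum(a * stump_predict(X, j, th, p) for (j, th, p), a in zip(stumps, alphas))
    return (votes > 0.5 * alphas.sum()).astype(int)

# Toy data: feature 0 separates the classes; feature 1 is noise.
X = np.array([[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]])
y = np.array([0, 0, 1, 1])
stumps, alphas = adaboost(X, y, T=3)
print(strong_classify(X, stumps, alphas))  # [0 0 1 1]
```

Because each round picks the single best feature, the list of stumps doubles as a feature-selection result. This is the sense in which AdaBoost chooses a few useful rectangle filters out of the huge library.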

Example Classifier for Face Detection

One stage: a classifier with 200 rectangle features was learned using AdaBoost

95% correct detection on test set with 1 in 14084 false positives.

(Figure: ROC curve for 200-feature classifier)

Develop fast, accurate classifier using a cascade

• Given a nested set of classifier hypothesis classes

• Computational Risk Minimization

(Figure: ROC curve of % detection (50 to 99) vs. % false positives (0 to 50), annotated “% false pos vs. false neg determined by”)

(Figure: each IMAGE SUB-WINDOW is passed to Classifier 1; a T (true) output sends it on to Classifier 2, then Classifier 3; an F (false) output at any stage rejects it as NON-FACE; sub-windows that pass every stage are labeled FACE)

Experiment: Simple Cascaded Classifier

Cascaded Classifier

(Figure: IMAGE SUB-WINDOW → 1 Feature → 5 Features → 20 Features → FACE, with cumulative false positive rates of 50%, 20% and 2% after the three stages; an F (false) output at any stage → NON-FACE)

• A 1-feature classifier achieves a 100% detection rate and about a 50% false positive rate.

• A 5-feature classifier achieves a 100% detection rate and a 40% false positive rate (20% cumulative)
– using data from the previous stage.

• A 20-feature classifier achieves a 100% detection rate with a 10% false positive rate (2% cumulative)
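The cascade logic and the cumulative false-positive arithmetic can be sketched in plain Python. Each stage is assumed to be a (score function, threshold) pair, which is a hypothetical interface. The expected-cost helper makes two further assumptions: every window is a non-face, and each stage's false-positive rate is measured on the windows that reached it (as the 5-feature stage was trained on data from the previous stage).

```python
def cascade_classify(window, stages):
    """stages: list of (score_fn, threshold), cheapest first. Reject at the first failing stage."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False          # NON-FACE: later (more expensive) stages never run
    return True                   # survived every stage: FACE

def cumulative_false_positive_rate(stage_rates):
    rate = 1.0
    for f in stage_rates:
        rate *= f                 # only windows passing earlier stages reach this one
    return rate

def expected_features_per_window(stage_sizes, stage_rates):
    """Average number of features evaluated on a non-face window."""
    total, reach = 0.0, 1.0
    for size, f in zip(stage_sizes, stage_rates):
        total += reach * size     # this stage runs only on windows that reached it
        reach *= f
    return total

# The 1-, 5- and 20-feature stages from the slide, with their per-stage false positive rates
print(cumulative_false_positive_rate([0.5, 0.4]))                  # 0.2 (20% cumulative)
print(cumulative_false_positive_rate([0.5, 0.4, 0.1]))             # about 0.02 (2% cumulative)
print(expected_features_per_window([1, 5, 20], [0.5, 0.4, 0.1]))   # 7.5
```

The last line shows why a cascade is fast. Even though the three stages hold 26 features in total, a typical background window costs only 7.5 feature evaluations, because most windows are rejected early.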

A Real-time Face Detection System

Training faces: 4916 face images (24x24 pixels) plus their mirror images (flipped about the vertical axis), for a total of 9832 faces

Training non-faces: 350 million sub-windows from 9500 non-face images

Final detector: 38-layer cascaded classifier. The number of features per layer was 1, 10, 25, 25, 50, 50, 50, 75, 100, …, 200, …

Final classifier contains 6061 features.

Accuracy of Face Detector

Performance on MIT+CMU test set containing 130 images with 507 faces and about 75 million sub-windows.

Comparison to Other Systems

Detection rate (%) at a given number of false detections (columns):

Detector                 10     31     50     65     78     95     110    167
Viola-Jones              76.1   88.4   91.4   92.0   92.1   92.9   93.1   93.9
Viola-Jones (voting)     81.1   89.7   92.1   93.1   93.1   93.2   93.7   93.7
Rowley-Baluja-Kanade     83.2   86.0   -      -      -      89.2   -      90.1
Schneiderman-Kanade      -      -      -      94.4   -      -      -      -

Speed of Face Detector

Speed is proportional to the average number of features computed per sub-window.

On the MIT+CMU test set, an average of 9 features out of a total of 6061 are computed per sub-window.

On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).

Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.

Output of Face Detector on Test Images

More Examples

Conclusions

• We [they] have developed the fastest known face detector for gray scale images

• Three contributions with broad applicability
– Cascaded classifier yields rapid classification
– AdaBoost as an extremely efficient feature selector
– Rectangle Features + Integral Image can be used for rapid image analysis

What has allowed us to make progress?

• SIFT features
• Discriminative classifiers

• Bayesian methods

• Large databases

Tracking a human in 3D

The appearance of people can vary dramatically.

People can appear in arbitrary poses.

Structure is unobservable: inference from visible parts.

Geometrically under-constrained.

But this requires that we use markers, which we don’t want, and also requires multiple cameras.

http://www.vicon.com/animation/

State of the Art.

• Brightness constancy cue
– Insensitive to appearance

• Full-body required multiple cameras

• Single hypothesis

State of the Art.

I(x, t) = I(x+u, 0) + η

• Single camera, multiple hypotheses
• 2D templates (no drift but view dependent)

State of the Art.

• Multiple hypotheses

• Multiple cameras

• Simplified clothing, lighting and background

• No special clothing
• Monocular, grayscale sequences (archival data)
• Unknown, cluttered environment

Task: Infer 3D human motion from 2D image

$$p(\text{model} \mid \text{cues}) = \frac{p(\text{cues} \mid \text{model})\, p(\text{model})}{p(\text{cues})}$$

1. Likelihood: Need a constraining likelihood model that is also invariant to variations in human appearance.

2. Prior: Need a prior model of how people move.

3. Posterior probability: Need an effective way to explore the model space (very high dimensional) and represent ambiguities.

System components for human body tracking

• Representation for probabilistic analysis.
• Models for human motion (prior term).
• Models for human appearance (likelihood term).

• Limbs are truncated cones
• Parameter vector of joint angles and angular velocities = φ

• Posterior distribution over model parameters often multi-modal (due to ambiguities)

• Represent whole distribution:
– sampled representation
– each sample is a pose
– predict over time using a particle filtering approach

• Isard and Blake, 1998, “Condensation Algorithm”

(Figure: posterior → temporal dynamics → likelihood → posterior)

Given the data so far, what do I think is the set of possible states the body could be in?

What could each of those states become at the next time step? (Uses prior model for human motion.)

How much is each of those possible states supported by the visual data at the next time step?

Update estimate of possible states, given the visual data.
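One pass through these four questions can be sketched as a particle-filter step, assuming NumPy. To keep the sketch short, the state is a single number rather than a vector of joint angles. The prior model of motion is a hypothetical Gaussian random walk, and the likelihood is a hypothetical Gaussian around a directly observed value. In the body tracker these would be the learned walking model and the image cues.

```python
import numpy as np

def condensation_step(particles, weights, observation, rng,
                      motion_sigma=0.5, obs_sigma=1.0):
    """One select-predict-measure cycle in the style of the Condensation algorithm."""
    n = len(particles)
    # Select: resample hypotheses in proportion to the current posterior weights
    idx = rng.choice(n, size=n, p=weights)
    # Predict: push each hypothesis through the (hypothetical) random-walk prior
    particles = particles[idx] + rng.normal(0.0, motion_sigma, size=n)
    # Measure: weight each hypothesis by how well it explains the observation
    weights = np.exp(-0.5 * ((observation - particles) / obs_sigma) ** 2)
    return particles, weights / weights.sum()

rng = np.random.default_rng(0)
particles = rng.uniform(-10.0, 10.0, size=1000)   # broad initial set of states
weights = np.full(1000, 1.0 / 1000)
for obs in [2.0] * 10:                            # a target sitting near 2.0
    particles, weights = condensation_step(particles, weights, obs, rng)
estimate = np.sum(weights * particles)            # posterior mean
print(round(float(estimate), 1))
```

The weighted particle set is the sampled representation of the whole posterior. With an ambiguous likelihood it can keep several separate clusters of hypotheses alive at once, which a single-hypothesis tracker cannot.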

• Representation for probabilistic analysis.
• Models for human motion (prior term).
• Models for human appearance (likelihood term).

• Only handles people walking.
• Very powerful constraint on human motion.

• Action-specific model: walking
– Training data: 3D motion capture data
– From training set, learn mean cycle and common modes of deviation (PCA)

(Figure panels: mean cycle; small noise; large noise)

Initialize to figure, then let go…

• Representation for probabilistic analysis.
• Models for human motion (prior term).
• Models for human appearance (likelihood term).

Changing background

Low contrast limb boundaries

Occlusion

Varying shadows

Deforming clothing

What do non-people look like?

What do people look like?

(5000 samples in each example)

Edge cues

Ridge cues

Flow cues

Walking model

2500 samples, ~10 min/frame

What has allowed us to make progress?

• SIFT features
• Discriminative classifiers

• Bayesian methods

• Large datasets
• Miscellaneous advances: exploiting context

Images by Antonio Torralba

Use of context for object detection

(Figure labels: car; pedestrian)

Identical local image features!

Context speeds object detection: this is what the world looks like to a face detector that doesn’t take advantage of context. Can you find the face?

Antonio Torralba

The best object detection algorithms combine top-down (context) with bottom-up (local features) cues.

The top-down information can help suppress false detections caused by ambiguous local information.

Feature vector for an image: the “gist” of the scene

– Compute 12 x 30 = 360 dim. feature vector
– Or use steerable filter bank, 6 orientations, 4 scales, averaged over 4x4 regions = 384 dim. feature vector
– Reduce to ~80 dimensions using PCA

Oliva & Torralba, IJCV 2001
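The final reduction step can be sketched as a PCA computed by SVD, assuming NumPy and a matrix with one 384-D gist vector per image (6 orientations × 4 scales × 4 × 4 regions = 384). The random data below only exercises shapes. Keeping ~80 dimensions assumes that, on real gist vectors, the leading components capture most of the variance.

```python
import numpy as np

def pca_fit(X, k):
    """X: (num_images, D) feature vectors. Returns the mean and the top-k principal axes."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_project(X, mean, axes):
    """Low-dimensional code for each row of X."""
    return (X - mean) @ axes.T

assert 6 * 4 * 4 * 4 == 384                  # orientations x scales x (4x4 regions)
rng = np.random.default_rng(0)
gist = rng.random((500, 384))                # stand-in for real gist vectors
mean, axes = pca_fit(gist, k=80)
codes = pca_project(gist, mean, axes)
print(codes.shape)  # (500, 80)
```

The projection axes are fit once on a training collection and then reused for every new image, so comparing scenes reduces to comparing short 80-D codes.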

Low-dimensional representation for image context

Images

Random noise filtered to have the same 80-dimensional representation as the images above.

“gist” useful for object priming

Examples of learned features for bottom-up detection: apply the filter shown in the top rows and average the squared output over the regions shown in the bottom rows.

The advantage of context in object detection

For each type of object, we plot the single most probable detection if it is above a threshold (set to give 80% detection rate).

If we know we are in a street, we can prune false positives such as chair and coffee-machine (which are hard to detect, and hence must have low thresholds to get 80% hit rate).

Object detections without context: note false alarms

Object detections after suppression of false detections using context

What has allowed us to make progress?

• SIFT features
• Discriminative classifiers

• Bayesian methods

• Large, labeled datasets.

A correspondence-based approach to scene parsing

Given an image:
– Find another annotated image with a similar scene
– Find correspondence between these two images
– Warp the annotation according to the correspondence

(Figure: input and support images, user annotation, and warped annotation; label legend: tree, sky, road, field, car, unlabeled, building, window)

Dense scene alignment using SIFT Flow for object recognition. C. Liu, J. Yuen, A. Torralba. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
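The last step, warping the annotation, can be sketched as a per-pixel lookup. The sketch assumes NumPy, an integer label map for the support image, and a dense integer displacement field such as SIFT flow would produce. In this sketch `flow[y, x] = (u, v)` points from query pixel (x, y) into the support image. That array layout and the border clamping are assumptions. The per-pixel rate is simply the fraction of pixels whose transferred label matches the ground truth.

```python
import numpy as np

def warp_labels(support_labels, flow):
    """Label transfer: out[y, x] = support_labels[y + v, x + u] with flow[y, x] = (u, v).
    Targets that fall outside the support image are clamped to its border."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    ty = np.clip(ys + flow[..., 1], 0, support_labels.shape[0] - 1)
    tx = np.clip(xs + flow[..., 0], 0, support_labels.shape[1] - 1)
    return support_labels[ty, tx]

def per_pixel_rate(pred, truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    return float(np.mean(pred == truth))

support = np.array([[0, 1],
                    [2, 3]])                 # e.g. 0=sky, 1=tree, 2=road, 3=car
flow = np.zeros((2, 2, 2), dtype=int)
flow[..., 0] = 1                             # every query pixel matches one column to the right
parsed = warp_labels(support, flow)
print(parsed.tolist())                       # [[1, 1], [3, 3]]
print(per_pixel_rate(parsed, support))       # 0.5
```

All of the recognition work lives in the flow field. Once correspondence is known, parsing is just copying labels along it.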

System overview

(Figure: query → RGB and SIFT images → nearest neighbors (RGB, SIFT, annotation) → SIFT flow; flow visualization code; label legend: tree, sky, road, field, car, unlabeled)

System overview

(Figure: SIFT flow → warped nearest neighbors (RGB, SIFT, annotation) → parsing of the query (RGB, SIFT), compared with ground truth; flow visualization code; label legend: tree, sky, road, field, car, unlabeled)

Scene parsing results (1)

(Figure panels: query; best match; annotation of best match; warped best match to query; parsing result; ground truth)

Scene parsing results (2)

(Figure panels: query; best match; annotation of best match; warped best match to query; parsing result; ground truth)

Pixel-wise performance

Our system, optimized parameters

Per-pixel rate: 74.75%

(Figure: pixel-wise frequency count of each class)

Comparison

J. Shotton et al. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV, 2006

(a) Our system, optimized parameters: 74.75%
(b) Our system, no Markov random field: 66.24%
(c) Shotton et al., no Markov random field: 51.67%
(d) Our system, matching color instead of SIFT: 49.68%

Comparison for each class

• We convert our system to a binary detector for each class and compare it with [Dalal & Triggs, CVPR 2005]

• In ROC, our system (red) outperforms theirs (blue) for most of the classes

What has allowed us to make progress?

• SIFT features
• Discriminative classifiers

• Bayesian methods

• Non-parametric methods

Algorithm
– Pick size of block and size of overlap
– Synthesize blocks in raster order
– Search input texture for block that satisfies overlap constraints (above and left)
• Easy to optimize using NN search [Liang et al., ’01]
– Paste new block into resulting texture
• use dynamic programming to compute minimal error boundary cut
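The dynamic-programming step can be sketched as below for a vertical overlap. The sketch assumes NumPy and an error surface holding the squared difference between the existing texture and the new block over the overlap region. Two details are assumptions of this sketch: pixels left of the cut keep the old texture while pixels on and right of it take the new block, and the seam may move at most one column per row.

```python
import numpy as np

def min_error_boundary_cut(err):
    """Return, for each row of the (H, W) overlap error surface, the column of the
    cheapest top-to-bottom seam; the seam may shift at most one column per row."""
    H, W = err.shape
    cost = err.astype(float).copy()
    for i in range(1, H):                       # accumulate minimal cost row by row
        for j in range(W):
            lo, hi = max(j - 1, 0), min(j + 2, W)
            cost[i, j] += cost[i - 1, lo:hi].min()
    path = [int(np.argmin(cost[-1]))]           # cheapest end point in the last row
    for i in range(H - 2, -1, -1):              # backtrack upward
        j = path[-1]
        lo = max(j - 1, 0)
        path.append(lo + int(np.argmin(cost[i, lo:min(j + 2, W)])))
    return path[::-1]

def paste_with_cut(old_overlap, new_overlap):
    """Left of the seam keeps the old texture; the seam and right of it take the new block."""
    err = (old_overlap - new_overlap) ** 2
    out = new_overlap.copy()
    for i, j in enumerate(min_error_boundary_cut(err)):
        out[i, :j] = old_overlap[i, :j]
    return out

err = np.ones((3, 3))
err[0, 0] = err[1, 1] = err[2, 2] = 0.0         # cheap diagonal seam
print(min_error_boundary_cut(err))              # [0, 1, 2]
```

The seam follows the pixels where old and new texture already agree, so the transition is hard to see even when the blocks differ elsewhere in the overlap.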

Problem: How to construct and manage a non-parametric signal prior? How to select the exemplars to use, and how to quickly find nearest neighbor matches?

Applications: Low-level vision: noise removal, super-resolution, filling-in, texture synthesis.

References: W. T. Freeman, E. C. Pasztor, O. T. Carmichael, Learning Low-Level Vision, International Journal of Computer Vision, 40(1), pp. 25-47, 2000. http://www.merl.com/reports/docs/TR2000-05.pdf

Alexei A. Efros and Thomas K. Leung, Texture Synthesis by Non-parametric Sampling, IEEE International Conference on Computer Vision (ICCV’99), Corfu, Greece, September 1999, http://graphics.cs.cmu.edu/people/efros/research/NPS/efros-iccv99.pdf

2009 BIRS Workshop on Computer Vision and the Internet

RobFergus

RickSzeliski

LanaLazebnik

Nearest neighbor search in high dimensions

Nearest neighbors in high dimensions for category recognition. For instance recognition, nearest-neighbor matching of individual features works fine. But for category recognition, the local features are often not, by themselves, a close match, due to within-class variations.

Nearest neighbor search, but taking into account our particular data. Or, tell us what questions we should be asking about our data in order to do nearest neighbor search well.

On the large database side: how to store memories, concepts, objects in very large databases? Large database issues. Multidimensional: kd-tree (but only up to 20 dims). Finding similar things in very high dimensions.

Parallelism: where can we exploit it? kd-tree high-d search. Does LSH work as advertised? In practice, not as well.

Problem: Nearest neighbor search in high dimensions.

Applications: Non-parametric texture synthesis and super-resolution. Image filling-in. Object recognition. Scene recognition.

References: (Many in CS literature, LSH, etc.)

PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Transactions on Graphics (Proc. SIGGRAPH), August 2009. Connelly Barnes, Eli Shechtman, Adam Finkelstein, Dan B Goldman, http://www.cs.princeton.edu/gfx/pubs/Barnes_2009_PAR/patchmatch.pdf
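One common approximate scheme, random-hyperplane locality-sensitive hashing, can be sketched as below, assuming NumPy. It is an illustrative choice among the many LSH variants, not one taken from the references above. Each stored vector gets a key made of the signs of its projections onto random hyperplanes, and a query is compared only against vectors sharing its key. That is also where it can fall short in practice: a true nearest neighbor that lands in a different bucket is never examined.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Single-table random-hyperplane LSH. Vectors at a small angle to each other tend to
    share a bucket key, so a query is compared only against its own bucket."""

    def __init__(self, dim, n_bits, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))   # one random hyperplane per bit
        self.buckets = defaultdict(list)
        self.data = []

    def _key(self, v):
        # Which side of each hyperplane the vector falls on
        return tuple(int(b) for b in (self.planes @ v > 0))

    def add(self, v):
        v = np.asarray(v, dtype=float)
        self.buckets[self._key(v)].append(len(self.data))
        self.data.append(v)

    def query(self, q):
        """Nearest stored vector within q's bucket, or None if the bucket is empty.
        Approximate: a true nearest neighbor hashed elsewhere is never examined."""
        q = np.asarray(q, dtype=float)
        candidates = self.buckets.get(self._key(q), [])
        if not candidates:
            return None
        return min(candidates, key=lambda i: np.linalg.norm(self.data[i] - q))

rng = np.random.default_rng(1)
database = rng.standard_normal((1000, 128))
index = HyperplaneLSH(dim=128, n_bits=10)
for v in database:
    index.add(v)
print(index.query(database[42]))  # 42: an exact copy always hashes to its own bucket
```

More bits make buckets smaller and queries faster but raise the chance of missing a near neighbor. Practical LSH therefore uses several independent tables, which is not shown in this sketch.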

Shai Avidan

Blind vision

Problem: Develop secure multi-party techniques for vision algorithms.

Applications: Secure, distributed image analysis.

References:

S. Avidan and M. Butman, Blind Vision, European Conference on Computer Vision (ECCV), Graz, Austria, 2006. http://www.merl.com/reports/docs/TR2006-006.pdf

Paper abstract: Alice would like to detect faces in a collection of sensitive surveillance images she owns. Bob has a face detection algorithm that he is willing to let Alice use, for a fee, as long as she learns nothing about his detector. Alice is willing to use Bob’s detector provided that he will learn nothing about her images, not even the result of the face detection operation. Blind vision is about applying secure multi-party techniques to vision algorithms so that Bob will learn nothing about the images he operates on, not even the result of his own operation, and Alice will learn nothing about the detector. The proliferation of surveillance cameras raises privacy concerns that can be addressed by secure multi-party techniques and their adaptation to vision algorithms.

Deva Ramanan

Evaluate easily over a power set of all segmentations.

Deva Ramanan: wants a fast and efficient way to search over all possible segmentations of an image, scoring each one against some model.

http://www.di.ens.fr/~russell/papers/Russell06.pdf

Problem: Evaluate some segmentation-dependent function over (some approximation to) all possible segmentations. Note: different than bottom-up segmentation, which I would not recommend as a research project.

Applications: Image understanding.

References: Deva’s homepage: http://www.ics.uci.edu/~dramanan/

Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, Bryan Russell, Alexei A. Efros, Josef Sivic, Bill Freeman, Andrew Zisserman, in CVPR 2006, http://people.csail.mit.edu/brussell/research/proj/mult_seg_discovery/index.html

Alyosha Efros

Efros comments

Alyosha: non-boolean retrieval of large datasets, i.e., it’s not logical operations we want to retrieve, but real-valued numbers.

Alyosha: the needle-in-the-haystack problem. Find signal clusters/characteristics when there’s lots of noise. Find the patterns, ignore the noise. See the picture of the 4 of us with hats and determine that hats are what’s in common.

Alyosha: we need to find something new to generalize from graphical models. Those were good for toy problems where there were lots of conditional independencies. But now we don’t have that. Want some other model: something that provides the abstraction, maybe, that only a few of these conditional independencies are active at any one time (like sparse coding). Sort of similar to higher-order cliques.

David Lowe

Need better features. An artist can draw the end of an elephant’s trunk, and you know immediately what it is. But our features don’t capture that similarity at all.

Learning of features from images. What is a natural encoding of images? As a warning for what approach not to take: don’t bother learning translation invariance or rotation invariance. So a little bit of supervision is OK.

Computer vision academic culture

No more “if only” papers

End-to-end empirical orientation.
There is a certain overhead in coming up to speed on the filters and representations.
Need dataset validation.
The competitive conferences have 20-25% acceptance rate. Other conferences have little impact.
The competitive conferences: CVPR, ICCV, ECCV, NIPS.

Thus: best to collaborate.

People at MIT to work with

Edward Adelson: Brain and Cognitive Sciences, material perception in humans and machines; multi-resolution image representations.

Fredo Durand: EECS, computational photography, computer graphics.
Bill Freeman: EECS, computational photography, computer vision.
John Fisher: CSAIL, machine learning, computer vision.
Polina Golland: EECS, medical applications.
Eric Grimson: EECS, surveillance, medical applications.
Berthold Horn: EECS, computed imaging.
Tommy Poggio: Brain and Cognitive Sciences, machine learning, computer vision, inspired by and modeling human vision.
Ramesh Raskar: Media Lab, computational photography.
Antonio Torralba: EECS, object recognition, scene interpretation.

A computer graphics application of nearest-neighbor finding in high dimensions

The image database

• We have collected ~6 million images from Flickr based on keyword and group searches
– typical image size is 500x375 pixels
– 720GB of disk space (jpeg compressed)

Image representation

Color layout

GIST [Oliva and Torralba’01]

Original image

Obtaining semantically coherent themes

We further break up the collection into themes of semantically coherent scenes:

Train SVM-based classifiers from 1-2k training images [Oliva and Torralba, 2001]

Basic camera motions

Forward motion Camera rotation Camera pan

Starting from a single image, find a sequence of images to simulate a camera motion:

Scene matching with camera view transformations: Translation

1. Move camera

2. View from the virtual camera

3. Find a match to fill the missing pixels

4. Locally align images

5. Find a seam

6. Blend in the gradient domain

Scene matching with camera view transformations: Camera rotation

1. Rotate camera

2. View from the virtual camera

3. Find a match to fill in the missing pixels

4. Stitched rotation

5. Display on a cylinder

More “infinite” images – camera translation

Virtual space as an image graph

• Nodes represent images

• Edges represent particular motions: forward, rotate (left/right), pan (left/right)

• Edge cost is given by the cost of the image match under the particular transformation

Image graph

Kaneva, Sivic, Torralba, Avidan, and Freeman, Infinite Images, to appear in Proceedings of the IEEE.
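One way to use such a graph is a cheapest-path search, for instance to produce a smooth sequence of virtual camera moves between two images. The sketch below runs Dijkstra's algorithm over a hypothetical adjacency-list encoding (image id → list of (neighbor, motion, match cost)). Both the encoding and the choice of Dijkstra are assumptions for illustration, not a description of the Infinite Images implementation.

```python
import heapq

def cheapest_path(graph, start, goal):
    """Dijkstra over an image graph.

    graph: {image_id: [(neighbor_id, motion, match_cost), ...]} (hypothetical encoding).
    Returns (total_cost, [(image_id, motion_used_to_reach_it), ...]),
    or (inf, []) if goal is unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue                       # stale heap entry
        for nbr, motion, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = (node, motion)
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return float("inf"), []
    path, node = [], goal
    while node != start:
        parent, motion = prev[node]
        path.append((node, motion))
        node = parent
    path.append((start, None))
    return dist[goal], path[::-1]

graph = {
    "a": [("b", "forward", 1.0), ("c", "pan left", 5.0)],
    "b": [("c", "rotate right", 1.5)],
    "c": [],
}
print(cheapest_path(graph, "a", "c"))
# (2.5, [('a', None), ('b', 'forward'), ('c', 'rotate right')])
```

Low edge costs mean good image matches. The cheapest path therefore prefers two well-matched moves over one poor direct jump, which is the behavior wanted for a seamless virtual walk.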

Virtual image space laid out in 3D


Outline

• About me
• Computer vision applications

• Computer vision techniques and problems:
– Low-level vision: underdetermined problems
– High-level vision: combinatorial problems
– Miscellaneous problems

Problem: Inference in Markov Random Fields. Want to handle higher-order clique potentials, high-dimensional state variables, and real-valued state variables.

Applications: Low-level vision: noise removal, super-resolution, filling-in, texture synthesis.

References: Pushmeet Kohli, Lubor Ladicky, Philip Torr, Robust Higher Order Potentials for Enforcing Label Consistency. In: International Journal of Computer Vision, 2009. http://research.microsoft.com/en-us/um/people/pkohli/papers/klt_IJCV09.pdf