Computer vision, in one lecture
Bill Freeman, Electrical Engineering and Computer Science Dept.
Massachusetts Institute of Technology, April 21, 2010
The Taiyuan University of Technology Computer Center staff, and me (1987)
Me and my wife, riding from the Foreigners' Cafeteria
Inside the computer center, with the image processing equipment
While in China, I read this book (to be re-issued by MIT Press this year), and got very excited about computer vision. Studied for PhD at MIT.
Goal of computer vision
Marr: "To tell what is where by looking."
Want to:
– Estimate the shapes and properties of things.
– Recognize objects.
– Find and recognize people.
– Find road lanes and other cars.
– Help a robot walk, navigate, or fly.
– Inspect for manufacturing.
Some particular goals of computer vision
• Wave a camera around, get a 3-d model out.
• Capture body pose of actor dancing.
• Detect and recognize faces.
• Recognize objects.
• Track people or objects.
Let's go back in time, to the mid-1980's
What everyone looked like back then
Features
• Points
but also,
• Lines
• Conics
• Other fitted curves
Features: "blocks world"
A toy world in which to study image interpretation. All we have to do is to convert real world images to their blocks world equivalents and we're all set.
Yvan Leclerc and Martin Fischler, An optimization-based approach to the interpretation of single line drawings as 3-d wire frames.
Objects
Huttenlocher and Ullman, Object recognition using alignment, ICCV, 1986
Computer vision research results, 1986
From Rothwell et al., Efficient model library access by projectively invariant indexing functions, CVPR 1992.
6 years later: recognizing planar objects using invariants.
Input image. Edge points fitted with lines or conics.
Objects that have been recognized and verified.
Computer vision research results, 1992
Back to the present…
Companies and applications
• Cognex
• Poseidon
• Mobileye
• EyeToy
• Identix
• Google
• Microsoft
• Face recognition in cameras
Mobileye
Microsoft
Some particular goals of computer vision (status report)
• Wave a camera around, get a 3-d model out (almost).
• Capture body pose of actor dancing. Using multiple cameras (pretty well); using a single camera (not yet).
• Detect and recognize faces (frontal, yes).
• Recognize objects (working on it, lots of progress).
• Track people or objects (over short times).
What has allowed us to make progress?
• SIFT features
• Discriminative classifiers
• Bayesian methods
• Large databases
Building a Panorama
M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003
How do we build a panorama?
• We need to match (align) images
http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt
Matching with Features
• Detect feature points in both images
• Find corresponding pairs
• Use these pairs to align images (we know this)
Matching with Features
• Problem 1:
– Detect the same point independently in both images
– Counter-example: if the detected points differ, there is no chance to match!
We need a repeatable detector
Matching with Features
• Problem 2:
– For each point, correctly recognize the corresponding one
We need a reliable and distinctive descriptor
Overview of feature detection for (instance) object recognition
Descriptor
Detector location
Note: here the viewpoint is different, not a panorama (they show off)
• Detector: detect the same scene points independently in both images
• Descriptor: encode the local neighboring window
– Note how the scale & rotation of the window are the same in both images (but computed independently)
• Correspondence: find the most similar descriptor in the other image
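The correspondence step can be illustrated with a brute-force nearest-neighbor search. This is a minimal sketch, assuming descriptors are plain lists of floats compared by Euclidean distance; the function name `match_descriptors` is hypothetical and not taken from the slides.

```python
import math

def match_descriptors(desc_a, desc_b):
    """For each descriptor in desc_a, return the index of the closest one in desc_b."""
    matches = []
    for a in desc_a:
        # Euclidean distance from this descriptor to every candidate in the other image.
        dists = [math.dist(a, b) for b in desc_b]
        matches.append(min(range(len(desc_b)), key=dists.__getitem__))
    return matches

# Two toy 2-d "descriptors" per image; real SIFT descriptors have 128 dimensions.
image_a = [[0.0, 0.0], [5.0, 5.0]]
image_b = [[5.0, 4.0], [0.0, 1.0]]
print(match_descriptors(image_a, image_b))  # [1, 0]
```

Brute force costs one distance computation per descriptor pair, which is why large databases need an approximate search structure instead.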
CVPR 2003 Tutorial
Recognition and Matching Based on Local Invariant Features
David Lowe, Computer Science Department, University of British Columbia
http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
Invariant Local Features
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
SIFT Features
Freeman et al, 1998 http://people.csail.mit.edu/billf/papers/cga1.pdf
Advantages of invariant local features
• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Efficiency: close to real-time performance
• Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness
SIFT vector formation
• Computed on a rotated and scaled version of the window, according to the computed orientation & scale
– resample a 16x16 version of the window
• Based on gradients weighted by a Gaussian of variance half the window (for smooth falloff)
SIFT vector formation
• 4x4 array of gradient orientation histograms
– not really a histogram: weighted by magnitude
• 8 orientations x 4x4 array = 128 dimensions
• Motivation: some sensitivity to spatial layout, but not too much.
(figure shows only 2x2 here, but it is 4x4)
SIFT vector formation
• Thresholded image gradients are sampled over a 16x16 array of locations in scale space
• Create an array of orientation histograms
• 8 orientations x 4x4 histogram array = 128 dimensions
Ensure smoothness
• Gaussian weight
• Trilinear interpolation
– a given gradient contributes to 8 bins: 4 in space times 2 in orientation
Reduce effect of illumination
• 128-dim vector normalized to 1
• Threshold gradient magnitudes to avoid excessive influence of high gradients
– after normalization, clamp gradients > 0.2
– renormalize
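The illumination steps (normalize, clamp at 0.2, renormalize) can be sketched in a few lines. This is an illustrative sketch, assuming the descriptor is a list of non-negative histogram entries; the function name `normalize_descriptor` is hypothetical.

```python
import math

def normalize_descriptor(vec, clamp=0.2):
    """Normalize to unit length, clamp large entries, then renormalize."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)  # all-zero descriptor: nothing to normalize
    unit = [v / norm for v in vec]
    # Limit the influence of any single large gradient entry.
    clamped = [min(v, clamp) for v in unit]
    norm2 = math.sqrt(sum(v * v for v in clamped))
    return [v / norm2 for v in clamped]

d = normalize_descriptor([10.0, 1.0, 1.0, 1.0])
print(sum(v * v for v in d))  # squared length is 1 up to rounding
```

Because the first step divides by the vector length, multiplying all gradients by a constant (a contrast change) leaves the result unchanged.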
Feature stability to noise
• Match features after random change in image scale & orientation, with differing levels of image noise
• Find nearest neighbor in database of 30,000 features
Feature stability to affine change
• Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
• Find nearest neighbor in database of 30,000 features
Distinctiveness of features
• Vary size of database of features, with 30 degree affine change, 2% image noise
• Measure % correct for single nearest neighbor match
These feature point detectors and descriptors are the most important recent advance in computer vision and graphics.
• Feature points are also used for:
– Image alignment (homography, fundamental matrix)
– 3D reconstruction
– Motion tracking
– Object recognition
– Indexing and database retrieval
– Robot navigation
– … other
More uses for SIFT features
SIFT features have also been applied to (categorical) object recognition
First, let's present some of the issues in object recognition.
Intra-class variation
Slide from: Li Fei-Fei, Rob Fergus and Antonio Torralba, short course on object recognition, http://people.csail.mit.edu/torralba/shortCourseRLOC/
Object recognition issues
– Generative / discriminative / hybrid
– Appearance only, or location and appearance
– Invariances
  • Viewpoint
  • Illumination
  • Occlusion
  • Scale
  • Deformation
  • Clutter
  • etc.
– Parts, or global with sub-window
– Use a set of features, or each pixel in the image
Current approaches in object recognition
• Bag of words
• Boosting
• Label transfer
Visual words
• Vector quantize SIFT descriptors to a vocabulary of 2 or 3 thousand "visual words".
• Heuristic design of descriptors makes these words somewhat invariant to:
– Lighting
– 2-d orientation
– 3-d viewpoint
Object recognition using visual words
Find words
Form histograms
Compare with object class database
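The "find words, form histograms" steps can be sketched as follows. This is a minimal illustration, assuming the vocabulary is already given as a list of cluster centers (in practice it would be learned, for example by k-means); the names `quantize` and `word_histogram` are hypothetical.

```python
import math

def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word."""
    return [min(range(len(vocabulary)), key=lambda k: math.dist(d, vocabulary[k]))
            for d in descriptors]

def word_histogram(descriptors, vocabulary):
    """Normalized count of how often each visual word occurs in one image."""
    counts = [0] * len(vocabulary)
    for word in quantize(descriptors, vocabulary):
        counts[word] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy 2-d example with a two-word vocabulary.
vocab = [[0.0, 0.0], [10.0, 10.0]]
image_descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [11.0, 10.0]]
print(word_histogram(image_descriptors, vocab))  # [0.5, 0.5]
```

The histogram discards where each word occurred, so two images are compared only by which words they contain and how often.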
Many combinatorial matching problems to be solved for object recognition.
Instance recognition: with features allowed to appear or not in both the test and training examples.
Deformable object recognition: some feature clusters maintain spatial coherence, others can vary.
Category recognition: each class defined by many different training set exemplars. Find the class that best explains the observed feature set.
Semi-supervised object recognition: observed training set features include many background object features.
http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/cvpr06b.pdf
http://www.cs.utexas.edu/~grauman/research/projects/pmk/pmk_projectpage.htm
Caltech 101
Caltech 101 results over time
Problem: Category level recognition using visual words representation.
Applications: Object recognition.
References:
Lazebnik, Schmid, and Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, Computer Vision and Pattern Recognition (CVPR 2006), http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/cvpr06b.pdf
K. Grauman and T. Darrell. Unsupervised Learning of Categories from Sets of Partially Matching Image Features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York City, NY, June 2006, http://www.cs.utexas.edu/~grauman/papers/grauman_darrell_cvpr2006.pdf
What has allowed us to make progress?
• SIFT features
• Discriminative classifiers: SVMs and boosting
• Bayesian methods
• Large databases
Rapid Object Detection Using a Boosted Cascade of Simple Features
Paul Viola, Michael J. Jones
Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA
Most of this work was done at Compaq CRL before the authors moved to MERL
Manuscript available on web:
http://citeseer.ist.psu.edu/cache/papers/cs/23183/http:zSzzSzwww.ai.mit.eduzSzpeoplezSzviolazSzresearchzSzpublicationszSzICCV01-Viola-Jones.pdf/viola01robust.pdf
Viola-Jones approach
• Large feature set (… is huge: about 16,000,000 features)
• Efficient feature selection using AdaBoost
• Cascaded classifier for rapid detection
– Hierarchy of attentional filters
The combination of these ideas yields the fastest known face detector for gray scale images.
Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001
Image Features
"Rectangle filters"
Similar to Haar wavelets
Differences between sums of pixels in adjacent rectangles
$$h_t(x) = \begin{cases} +1 & \text{if } f_t(x) > \theta_t \\ -1 & \text{otherwise} \end{cases}$$
Unique features
Huge "Library" of Filters
Integral Image
• Define the integral image
• Any rectangular sum can be computed in constant time:
• Rectangle features can be computed as differences between rectangles
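A small sketch of the integral image and a two-rectangle feature follows. It assumes the common convention of a zero-padded first row and column, which may differ from the definition on the original slide; the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle from four table lookups (constant time)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a rectangle (width assumed even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # 28  (5 + 6 + 8 + 9)
print(two_rect_feature(ii, 0, 0, 3, 2))  # -3  (1 + 4 + 7) - (2 + 5 + 8)
```

Building the table takes one pass over the image; after that the cost of a rectangle sum does not depend on the rectangle's size.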
Constructing classifiers by combining filter outputs
• Perceptron yields a sufficiently powerful classifier
• Use AdaBoost to efficiently choose best features
AdaBoost
Initial uniform weight on training examples
Weak classifier 1
Weak classifier 2
Incorrect classifications re-weighted more heavily
Weak classifier 3
Final classifier is weighted combination of weak classifiers
(Freund & Schapire '95)
AdaBoost Tutorial
• Given a weak learning algorithm
– Learner takes a training set and returns the best classifier from a weak concept space
  • required to have error < 50%
• Starting with a training set (initial weights 1/n)
– Weak learning algorithm returns a classifier
– Reweight the examples
  • Weight on correct examples is decreased
  • Weight on errors is increased
• Final classifier is a weighted majority of weak classifiers
– Weak classifiers with low error get larger weight
Review of AdaBoost (Freund & Schapire 95)
• Given examples $(x_1, y_1), \dots, (x_N, y_N)$ where $y_i = 0, 1$ for negative and positive examples respectively.
• Initialize weights $w_{1,i} = 1/N$.
• For $t = 1, \dots, T$:
  • Normalize the weights: $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{N} w_{t,j}$
  • Find a weak learner, i.e. a hypothesis $h_t(x)$, with weighted error less than 0.5.
  • Calculate the error of $h_t$: $e_t = \sum_i w_{t,i} \, |h_t(x_i) - y_i|$
  • Update the weights: $w_{t+1,i} = w_{t,i} \, B_t^{1 - d_i}$, where $B_t = e_t / (1 - e_t)$ and $d_i = 0$ if example $x_i$ is classified correctly, $d_i = 1$ otherwise.
• The final strong classifier is
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) > 0.5 \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
where $\alpha_t = \log(1/B_t)$.
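The AdaBoost review can be turned into a short runnable sketch. Two assumptions are not in the slides: the weak learners are one-dimensional threshold stumps, and the error is clipped away from 0 and 1 so that $B_t$ and $\alpha_t$ stay finite. The function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    # polarity +1 fires for x > threshold, polarity -1 fires for x < threshold
    return 1 if polarity * x > polarity * threshold else 0

def train_adaboost(xs, ys, rounds):
    """Return a list of (threshold, polarity, alpha) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]          # normalize
        best = None
        for threshold in sorted(set(xs)):               # pick lowest weighted error
            for polarity in (1, -1):
                preds = [stump_predict(x, threshold, polarity) for x in xs]
                err = sum(w * abs(p - y) for w, p, y in zip(weights, preds, ys))
                if best is None or err < best[0]:
                    best = (err, threshold, polarity, preds)
        err, threshold, polarity, preds = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)          # assumption: keep B_t finite
        beta = err / (1.0 - err)
        # Correct examples (d_i = 0) are scaled by beta; errors keep their weight.
        weights = [w * beta ** (1 - abs(p - y))
                   for w, p, y in zip(weights, preds, ys)]
        strong.append((threshold, polarity, math.log(1.0 / beta)))
    return strong

def predict(strong, x):
    score = sum(a * stump_predict(x, th, pol) for th, pol, a in strong)
    return 1 if score > 0.5 * sum(a for _, _, a in strong) else 0

model = train_adaboost([1, 2, 3, 4], [0, 0, 1, 1], rounds=3)
print([predict(model, x) for x in [1, 2, 3, 4]])  # [0, 0, 1, 1]
```

In the face detector each weak classifier thresholds a single rectangle feature, so choosing the best stump in each round doubles as feature selection.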
Example Classifier for Face Detection
ROC curve for 200 feature classifier
One stage: a classifier with 200 rectangle features was learned using AdaBoost
95% correct detection on test set with 1 in 14084 false positives.
Develop fast, accurate classifier using a cascade
• Given a nested set of classifier hypothesis classes
• Computational risk minimization
[Figure: ROC plot, % detection versus % false positives, showing the false positive vs. false negative trade-off.]
[Figure: cascade diagram. An image sub-window enters Classifier 1, then Classifier 2, then Classifier 3. A T (true) result passes it on to the next stage and finally to FACE; an F (false) result at any stage rejects it as NON-FACE.]
Experiment: Simple Cascaded Classifier
Cascaded Classifier
[Figure: image sub-window → 1-feature stage (50% pass) → 5-feature stage (20% pass) → 20-feature stage (2% pass) → FACE. An F result at any stage goes to NON-FACE.]
• A 1 feature classifier achieves 100% detection rate and about 50% false positive rate.
• A 5 feature classifier achieves 100% detection rate and 40% false positive rate (20% cumulative)
– using data from previous stage.
• A 20 feature classifier achieves 100% detection rate with 10% false positive rate (2% cumulative)
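The cumulative figures follow from multiplying the per-stage rates, since a sub-window must pass every stage to be reported. The sketch below checks that arithmetic and shows early rejection; the function names and the use of plain callables as stages are illustrative assumptions.

```python
from math import prod

def cumulative_rate(stage_rates):
    """Fraction of sub-windows that survive every stage in sequence."""
    return prod(stage_rates)

def cascade_accepts(window, stages):
    """Run stages in order; all() stops at the first stage that rejects."""
    return all(stage(window) for stage in stages)

# Per-stage false positive rates from the three-stage experiment.
print(cumulative_rate([0.5, 0.4, 0.1]))   # about 0.02, i.e. 2% cumulative
# Per-stage detection rates of 100% keep the overall detection rate at 100%.
print(cumulative_rate([1.0, 1.0, 1.0]))   # 1.0
```

Because most sub-windows are rejected by the first cheap stages, the expensive later stages run on only a small fraction of the image.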
A Real-time Face Detection System
Training faces: 4916 face images (24x24 pixels) plus vertical flips for a total of 9832 faces
Training non-faces: 350 million sub-windows from 9500 non-face images
Final detector: 38 layer cascaded classifier. The number of features per layer was 1, 10, 25, 25, 50, 50, 50, 75, 100, …, 200, …
Final classifier contains 6061 features.
Accuracy of Face Detector
Performance on MIT+CMU test set containing 130 images with 507 faces and about 75 million sub-windows.
Comparison to Other Systems
Detection rate (%) by detector, as a function of the number of false detections:
False detections: 10, 31, 50, 65, 78, 95, 110, 167
Viola-Jones: 76.1, 88.4, 91.4, 92.0, 92.1, 92.9, 93.1, 93.9
Viola-Jones (voting): 81.1, 89.7, 92.1, 93.1, 93.1, 93.2, 93.7, 93.7
Rowley-Baluja-Kanade: 83.2, 86.0, 89.2, 90.1 (reported at a subset of these false-detection counts)
Schneiderman-Kanade: 94.4 (reported at a single false-detection count)
Speed of Face Detector
Speed is proportional to the average number of features computed per sub-window.
On the MIT+CMU test set, an average of 9 features out of a total of 6061 are computed per sub-window.
On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).
Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.
Output of Face Detector on Test Images
More Examples
Conclusions
• We [they] have developed the fastest known face detector for gray scale images
• Three contributions with broad applicability
– Cascaded classifier yields rapid classification
– AdaBoost as an extremely efficient feature selector
– Rectangle features + integral image can be used for rapid image analysis
What has allowed us to make progress?
• SIFT features
• Discriminative classifiers
• Bayesian methods
• Large databases
Tracking a human in 3D
The appearance of people can vary dramatically.
People can appear in arbitrary poses.
Structure is unobservable: inference from visible parts.
Geometrically under-constrained.
But this requires that we use markers, which we don't want, and also requires multiple cameras.
http://www.vicon.com/animation/
State of the Art.
• Brightness constancy cue
– Insensitive to appearance
• Full-body required multiple cameras
• Single hypothesis
State of the Art.
I(x, t) = I(x+u, 0) + η
• Single camera, multiple hypotheses
• 2D templates (no drift but view dependent)
State of the Art.
• Multiple hypotheses
• Multiple cameras
• Simplified clothing, lighting and background
Task: Infer 3D human motion from 2D image
* No special clothing
* Monocular, grayscale, sequences (archival data)
* Unknown, cluttered, environment
$$p(\text{model} \mid \text{cues}) = \frac{p(\text{cues} \mid \text{model}) \, p(\text{model})}{p(\text{cues})}$$
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
3. Posterior probability: need an effective way to explore the model space (very high dimensional) and represent ambiguities.
System components for human body tracking
• Representation for probabilistic analysis.
• Models for human motion (prior term).
• Models for human appearance (likelihood term).
* Limbs are truncated cones
* Parameter vector of joint angles and angular velocities = φ
• Posterior distribution over model parameters is often multi-modal (due to ambiguities)
• Represent whole distribution:
– sampled representation
– each sample is a pose
– predict over time using a particle filtering approach
• Isard and Blake, 1998, "Condensation Algorithm"
Posterior: given the data so far, what do I think is the set of possible states the body could be in?
Temporal dynamics: what could each of those states become at the next time step? (Uses prior model for human motion.)
Likelihood: how much is each of those possible states supported by the visual data at the next time step?
Posterior: update estimate of possible states, given the visual data.
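One cycle of this loop can be sketched as a generic particle filter step: resample from the current posterior, push each sample through the motion prior, then reweight by the likelihood. The example uses a one-dimensional state, a random-walk motion model, and a Gaussian likelihood. All three are stand-in assumptions for illustration, not the body model from the lecture, and the function names are hypothetical.

```python
import math
import random

def random_walk(state, rng, sigma=0.5):
    # Hypothetical prior motion model: the state drifts by Gaussian noise.
    return state + rng.gauss(0.0, sigma)

def gaussian_likelihood(observation, state, sigma=1.0):
    # Hypothetical likelihood: how well a state explains the observation.
    return math.exp(-0.5 * ((observation - state) / sigma) ** 2)

def condensation_step(particles, weights, observation, rng,
                      dynamics=random_walk, likelihood=gaussian_likelihood):
    n = len(particles)
    # 1. Posterior so far: resample states in proportion to their weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Temporal dynamics: predict where each state could go next.
    predicted = [dynamics(p, rng) for p in resampled]
    # 3. Likelihood: score each prediction against the new data.
    new_weights = [likelihood(observation, p) for p in predicted]
    total = sum(new_weights)
    if total == 0.0:
        return predicted, [1.0 / n] * n   # no support anywhere: fall back to uniform
    # 4. New posterior: normalized weights over the predicted states.
    return predicted, [w / total for w in new_weights]

rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for _ in range(5):
    particles, weights = condensation_step(particles, weights, 3.0, rng)
estimate = sum(p * w for p, w in zip(particles, weights))
print(round(estimate, 1))  # close to the observed value 3.0
```

Keeping the whole weighted sample set, rather than a single best pose, is what lets the tracker carry several competing hypotheses through ambiguous frames.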
Models for human motion (prior term)
• Only handles people walking.
• Very powerful constraint on human motion.
• Action-specific model: walking
– Training data: 3D motion capture data
– From training set, learn mean cycle and common modes of deviation (PCA)
Mean cycle. Small noise. Large noise.
Initialize to figure, then let go…
Models for human appearance (likelihood term)
Changing background
Low contrast limb boundaries
Occlusion
Varying shadows
Deforming clothing
What do non-people look like?
What do people look like?
(5000 samples in each example)
Edge cues
Ridge cues
Flow cues
Walking model
2500 samples, ~10 min/frame
What has allowed us to make progress?
• SIFT features
• Discriminative classifiers
• Bayesian methods
• Large datasets
• Miscellaneous advances: exploiting context
Images by Antonio Torralba
Use of context for object detection
Car, pedestrian: identical local image features!
Context speeds object detection: this is what the world looks like to a face detector that doesn't take advantage of context. Can you find the face?
Antonio Torralba
The best object detection algorithms combine top-down (context) with bottom-up (local features) cues.
The top-down information can help suppress false detections caused by ambiguous local information.
Feature vector for an image: the "gist" of the scene
– Compute 12 x 30 = 360 dim. feature vector
– Or use steerable filter bank, 6 orientations, 4 scales, averaged over 4x4 regions = 384 dim. feature vector
– Reduce to ~80 dimensions using PCA
Oliva & Torralba, IJCV 2001
Low-dimensional representation for image context
Images
Random noise filtered to have the same 80-dimensional representation as the images above.
"Gist" useful for object priming
Examples of learned features for bottom-up detection: apply the filter shown in the top rows and average the squared output over the regions shown in the bottom rows.
The advantage of context in object detection
For each type of object, we plot the single most probable detection if it is above a threshold (set to give 80% detection rate)
If we know we are in a street, we can prune false positives such as chair and coffee-machine (which are hard to detect, and hence must have low thresholds to get 80% hit rate)
Object detections without context: note false alarms
Object detections after suppression of false detections using context
What has allowed us to make progress?
• SIFT features
• Discriminative classifiers
• Bayesian methods
• Large, labeled datasets.
A correspondence-based approach to scene parsing
Given an image
– Find another annotated image with similar scene
– Find correspondence between these two images
– Warp the annotation according to the correspondence
[Figure: input and support images, with user annotation and warped annotation. Label legend: tree, sky, road, field, car, building, window, unlabeled.]
Dense scene alignment using SIFT Flow for object recognition. C. Liu, J. Yuen, A. Torralba. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
System overview
[Figure: query (RGB, SIFT) and its nearest neighbors (RGB, SIFT, annotation, SIFT flow), with the flow visualization code. Label legend: tree, sky, road, field, car, unlabeled.]
System overview
[Figure: warped nearest neighbors (SIFT flow, RGB, SIFT, annotation) and the query (RGB, SIFT, parsing, ground truth), with the flow visualization code. Label legend: tree, sky, road, field, car, unlabeled.]
Scene parsing results (1)
[Figure panels: query, best match, annotation of best match, warped best match to query, parsing result, ground truth.]
Scene parsing results (2)
[Figure panels: query, best match, annotation of best match, warped best match to query, parsing result, ground truth.]
Pixel-wise performance
Our system, optimized parameters: per-pixel rate 74.75%
Pixel-wise frequency count of each class
Comparison
J. Shotton et al. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV, 2006
(a) Our system, optimized parameters: 74.75%
(b) Our system, no Markov random field: 66.24%
(c) Shotton et al., no Markov random field: 51.67%
(d) Our system, matching color instead of SIFT: 49.68%
Comparison for each class
• We convert our system to a binary detector for each class and compare it with [Dalal & Triggs, CVPR 2005]
• In ROC, our system (red) outperforms theirs (blue) for most of the classes
What has allowed us to make progress?
• SIFT features
• Discriminative classifiers
• Bayesian methods
• Non-parametric methods
Algorithm
– Pick size of block and size of overlap
– Synthesize blocks in raster order
– Search input texture for block that satisfies overlap constraints (above and left)
  • Easy to optimize using NN search [Liang et al., '01]
– Paste new block into resulting texture
  • use dynamic programming to compute minimal error boundary cut
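The minimal error boundary cut can be sketched as a standard dynamic program over the overlap region. The sketch assumes a vertical overlap whose per-pixel error `err[y][x]` (for example, the squared difference between the two overlapping blocks) has already been computed, and that the cut may shift by at most one column per row; the function name is hypothetical.

```python
def min_error_boundary_cut(err):
    """Return, for each row, the column of the cheapest top-to-bottom cut."""
    h, w = len(err), len(err[0])
    # cost[y][x] = cheapest total error of any cut ending at (y, x).
    cost = [list(err[0])]
    for y in range(1, h):
        prev = cost[-1]
        row = []
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)   # neighbors in the row above
            row.append(err[y][x] + min(prev[lo:hi]))
        cost.append(row)
    # Backtrack from the cheapest cell in the last row.
    x = min(range(w), key=cost[-1].__getitem__)
    path = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(0, x - 1), min(w, x + 2)
        x = min(range(lo, hi), key=cost[y - 1].__getitem__)
        path.append(x)
    path.reverse()
    return path

overlap_error = [[5, 1, 5],
                 [5, 5, 1],
                 [5, 1, 5]]
print(min_error_boundary_cut(overlap_error))  # [1, 2, 1]
```

Pixels on one side of the cut are taken from the existing texture and pixels on the other side from the new block, so the seam follows the places where the two agree best.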
Problem: How to construct and manage a non-parametric signal prior? How to select the exemplars to use, and how to quickly find nearest neighbor matches?
Applications: Low-level vision: noise removal, super-resolution, filling-in, texture synthesis.
References:
W. T. Freeman, E. C. Pasztor, O. T. Carmichael, Learning Low-Level Vision, International Journal of Computer Vision, 40(1), pp. 25-47, 2000. http://www.merl.com/reports/docs/TR2000-05.pdf
Alexei A. Efros and Thomas K. Leung, Texture Synthesis by Non-parametric Sampling, IEEE International Conference on Computer Vision (ICCV'99), Corfu, Greece, September 1999, http://graphics.cs.cmu.edu/people/efros/research/NPS/efros-iccv99.pdf
2009 BIRS Workshop on Computer Vision and the Internet
Rob Fergus
Rick Szeliski
Lana Lazebnik
Nearest neighbor search in high dimensions
Nearest neighbors in high dimensions, for category recognition. For instance recognition, nearest neighbors for individual features works fine. But for category recognition, many times the local features are not, by themselves, a close match, due to within-class variations.
Nearest neighbor search, but taking into account our particular data. Or, tell us what questions we should be asking about our data in order to do nearest neighbor search well.
On the large database side: how to store memories, concepts, objects in very large databases? Large database issues. Multidimensional: kd-tree (but only up to 20 dims). Finding similar things in very high dimensions.
Parallelism: where can we exploit it? kd-tree high-d search. Does LSH work as advertised? In practice, not as well.
Problem: Nearest neighbor search in high dimensions.
Applications: Non-parametric texture synthesis and super-resolution. Image filling-in. Object recognition. Scene recognition.
References: (Many in CS literature, LSH, etc.)
PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing, ACM Transactions on Graphics (Proc. SIGGRAPH), August 2009, Connelly Barnes, Eli Shechtman, Adam Finkelstein, Dan B Goldman, http://www.cs.princeton.edu/gfx/pubs/Barnes_2009_PAR/patchmatch.pdf
Shai Avidan
Blind vision
Problem: Develop secure multi-party techniques for vision algorithms.
Applications: Secure, distributed image analysis.
References:
S. Avidan and M. Butman, Blind Vision, European Conference on Computer Vision (ECCV), Graz, Austria, 2006. http://www.merl.com/reports/docs/TR2006-006.pdf
Paper abstract: Alice would like to detect faces in a collection of sensitive surveillance images she owns. Bob has a face detection algorithm that he is willing to let Alice use, for a fee, as long as she learns nothing about his detector. Alice is willing to use Bob's detector provided that he will learn nothing about her images, not even the result of the face detection operation. Blind vision is about applying secure multi-party techniques to vision algorithms so that Bob will learn nothing about the images he operates on, not even the result of his own operation, and Alice will learn nothing about the detector. The proliferation of surveillance cameras raises privacy concerns that can be addressed by secure multi-party techniques and their adaptation to vision algorithms.
Deva Ramanan
Evaluate easily over a power set of all segmentations.
Deva Ramanan: wants a fast and efficient way to search over all possible segmentations of an image, scoring each one against some model.
http://www.di.ens.fr/~russell/papers/Russell06.pdf
Problem: Evaluate some segmentation-dependent function over (some approximation to) all possible segmentations. Note: this is different from bottom-up segmentation, which I would not recommend as a research project.
Applications: Image understanding.
References:
Deva's homepage: http://www.ics.uci.edu/~dramanan/
Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, Bryan Russell, Alexei A. Efros, Josef Sivic, Bill Freeman, Andrew Zisserman, in CVPR 2006, http://people.csail.mit.edu/brussell/research/proj/mult_seg_discovery/index.html
Alyosha Efros
Efros comments
Alyosha: non-boolean retrieval from large datasets, i.e., it's not logical operations we want to retrieve, but real-valued numbers.
Alyosha: the needle-in-the-haystack problem. Find signal clusters/characteristics when there's lots of noise. Find the patterns, ignore the noise. See the picture of the 4 of us with hats and determine that hats are what's in common.
Alyosha: we need to find something new to generalize from graphical models. Those were good for toy problems where there were lots of conditional independencies. But now we don't have that. Want some other model. Something that provides the abstraction, maybe, that only a few of these conditional independencies are active at any one time (like sparse coding). Sort of similar to higher order cliques.
David Lowe
Need better features. An artist can draw the end of an elephant's trunk, and you know immediately what it is. But our features don't capture that similarity at all.
Learning of features from images. What is a natural encoding of images? As a warning for what approach not to take: don't bother learning translation invariance, or rotation invariance. So a little bit of supervision is OK.
Computer vision academic culture
No more "if only" papers
End-to-end empirical orientation
There is a certain overhead in coming up to speed on the filters and representations.
Need dataset validation
The competitive conferences have 20-25% acceptance rate. Other conferences have little impact.
The competitive conferences: CVPR, ICCV, ECCV, NIPS.
Thus: best to collaborate.
People at MIT to work with
Edward Adelson: Brain and Cognitive Sciences, material perception in humans and machines; multi-resolution image representations.
Fredo Durand: EECS, computational photography, computer graphics.
Bill Freeman: EECS, computational photography, computer vision.
John Fisher: CSAIL, machine learning, computer vision.
Polina Golland: EECS, medical applications.
Eric Grimson: EECS, surveillance, medical applications.
Berthold Horn: EECS, computed imaging.
Tommy Poggio: Brain and Cognitive Sciences, machine learning, computer vision, inspired by and modeling human vision.
Ramesh Raskar: Media Lab, computational photography.
Antonio Torralba: EECS, object recognition, scene interpretation.
A computer graphics application of nearest-neighbor finding in high dimensions
The image database
• We have collected ~6 million images from Flickr based on keyword and group searches
– typical image size is 500x375 pixels
– 720 GB of disk space (JPEG compressed)
Image representation
Color layout
GIST [Oliva and Torralba '01]
Original image
Obtaining semantically coherent themes
We further break up the collection into themes of semantically coherent scenes:
Train SVM-based classifiers from 1-2k training images [Oliva and Torralba, 2001]
Basic camera motions
Forward motion. Camera rotation. Camera pan.
Starting from a single image, find a sequence of images to simulate a camera motion:
Scene matching with camera view transformations: Translation
1. Move camera
2. View from the virtual camera
3. Find a match to fill the missing pixels
4. Locally align images
5. Find a seam
6. Blend in the gradient domain
Scene matching with camera view transformations: Camera rotation
1. Rotate camera
2. View from the virtual camera
3. Find a match to fill in the missing pixels
4. Stitched rotation
5. Display on a cylinder
More "infinite" images: camera translation
Virtual space as an image graph
• Nodes represent images
• Edges represent particular motions: forward, rotate (left/right), pan (left/right)
• Edge cost is given by the cost of the image match under the particular transformation
Image graph
Kaneva, Sivic, Torralba, Avidan, and Freeman, Infinite Images, to appear in Proceedings of IEEE.
Virtual image space laid out in 3D
Outline
• About me
• Computer vision applications
• Computer vision techniques and problems:
– Low-level vision: underdetermined problems
– High-level vision: combinatorial problems
– Miscellaneous problems
Problem: Inference in Markov Random Fields. Want to handle higher order clique potentials, high-dimensional state variables, and real-valued state variables.
Applications: Low-level vision: noise removal, super-resolution, filling-in, texture synthesis.
References: Pushmeet Kohli, Lubor Ladicky, Philip Torr, Robust Higher Order Potentials for Enforcing Label Consistency. In: International Journal of Computer Vision, 2009. http://research.microsoft.com/en-us/um/people/pkohli/papers/klt_IJCV09.pdf