
CAP 6412 Advanced Computer Vision

http://www.cs.ucf.edu/~bgong/CAP6412.html

Boqing Gong

Jan 26, 2016

Today

• Administrivia
• A bigger picture and some common questions
• Object detection proposals, by Samer

Past due (12 pm today)

• Assignment 2: Review the following paper

{Major} [Detection Proposals] J. Hosang, R. Benenson, P. Dollár, and B. Schiele. What makes for effective detection proposals? PAMI 2015.

Template for paper review: http://www.cs.ucf.edu/~bgong/CAP6412/Review.docx

An assignment with no due dates

• See “Paper Presentation” on UCF webcourse

• Sharing your slides
• Refer to the original sources of images, figures, etc. in your slides
• Convert them to a PDF file
• Upload the PDF file to “Paper Presentation” after your presentation

Schedule update

Week 2  CNN visualization & object recognition

Week 3  CNN & object localization

Week 4  CNN & transfer learning

Week 5  CNN & segmentation, super-resolution

Week 6  CNN & videos (optical flow, pose)

Week 7  Image captioning & attention model

Week 8  Visual question answering

Week 9  Attention model, aligning books with movies

Week 10--16  Video: tracking, action, surveillance; Human-centered CV; 3D CV; Low-level CV, etc.

Next week: Image captioning & attention model

Tuesday (02/02)

Harish RaviPrakash

Karpathy, Andrej, and Li Fei-Fei. “Deep visual-semantic alignments for generating image descriptions.” arXiv preprint arXiv:1412.2306 (2014).

& Secondary papers

Thursday (02/04)

Karan Daei-Mojdehi

Xu, Kelvin, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. “Show, attend and tell: Neural image caption generation with visual attention.” arXiv preprint arXiv:1502.03044 (2015).

& Secondary papers

Beginning next class

• Make good presentations --- #3 course objective
- Title, authors (full name), authors’ institutes, your name and email
- Motivation of the research (1-2 slides)
- Problem statement (1-2 slides)
- Main contributions of the paper
- Approach outline (1 slide)
- Details of the proposed approach
- Experiments
- Related work (1-3 slides)
- Conclusion: take-home message (1-2 slides)
- Strengths & weaknesses of the paper (1-2 slides)
- Overall rating & why (how you weigh the strengths and weaknesses) (1 slide)
- Future directions (1-3 slides)


40 mins only

Leave me time to cover:
• Underexploited points in slides/discussion
• Technique details
• More related work and reading references
• My own comments

Today


Why we read these papers: A personalized and biased perspective


Time | Event | Related Papers | Read?
01/2012 | Negative CVPR reviews | [LeNet] Yann LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998. | Yes
10/2012 | AlexNet wins ILSVRC 2012 | [AlexNet] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” In NIPS, 2012. | Yes
11/2013 | Visualize & understand CNNs | [Visualization] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” In ECCV, 2014. | Yes
2014 | CNN wins on object detection | Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. "Rich feature hierarchies for accurate object detection and semantic segmentation." In CVPR, 2014. | This Thursday

Basic network structures --- where is CNN?

• Feed-forward networks
• Recurrent neural networks

Image credit: http://mesin-belajar.blogspot.com/2016/01/a-brief-history-of-neural-nets-and-deep_84.html

CNN: a special form of feed-forward networks

• See whiteboard

Detour: Weight sharing in CNN

Convolution layer

Neurons of the same feature map share the same weights (the filter)

Significantly reduced # parameters

Image credit: deeplearning.net/tutorial/lenet.html

Detour: Sparse connection in CNN

The LeNet [LeCun et al. ’1998]

Sparse connections vs. full connection

Smaller # parameters, better learning efficiency
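To make the two detours concrete, the sketch below counts parameters for a single output feature map under three wiring schemes: full connection, sparse connection without weight sharing, and a convolution (sparse connection plus weight sharing). The layer sizes (32x32 input, 5x5 filter, 28x28 output) and the function names are hypothetical, chosen only for illustration.

```python
def fully_connected_params(in_h, in_w, out_h, out_w):
    """Every output neuron connects to every input pixel, plus one bias each."""
    n_out = out_h * out_w
    return n_out * (in_h * in_w) + n_out


def locally_connected_params(out_h, out_w, k):
    """Sparse connection only: each neuron sees a k x k patch with its own weights."""
    n_out = out_h * out_w
    return n_out * (k * k) + n_out


def conv_params(k):
    """Sparse connection plus weight sharing: one k x k filter and one bias per map."""
    return k * k + 1


# Hypothetical sizes: 32x32 single-channel input, 5x5 filter, 28x28 output map.
fc = fully_connected_params(32, 32, 28, 28)
local = locally_connected_params(28, 28, 5)
conv = conv_params(5)
print("full:", fc, "sparse:", local, "sparse + shared:", conv)
```

Sparse connection already cuts the count by a large factor, and weight sharing removes the dependence on the output size altogether.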

Today


What makes for effective detection proposals?

Jan Hosang¹, Rodrigo Benenson¹, Piotr Dollár², and Bernt Schiele¹

¹ Max Planck Institute for Informatics
² Facebook AI Research (FAIR)

Presented by: Samer Iskander

([email protected])

Motivation

• High-performing object detectors are based on object proposals, which avoid an exhaustive sliding-window search across the image.

• As a result, an in-depth analysis of the different proposal methods is needed to study their impact on detection performance.

Problem Statement

• Despite the widespread use of detection proposals, the trade-offs among performance metrics when employing them still need to be studied.

Main Contributions

• A systematic overview of detection proposal methods is provided.

• The notion of proposal repeatability is introduced.
• The object recall metric is studied on different datasets.
• The influence of different proposal methods on selected object detectors (DPM, R-CNN, and Fast R-CNN) is evaluated.

• A novel metric, the average recall (AR), is proposed. It rewards both proposal recall and localization accuracy, and it correlates with detection performance.

Approach Outline

1. Detection Proposal Methods
1.1 Baseline Proposal Method
2. Evaluation Metrics for Object Proposals
3. Proposal Repeatability
4. Proposal Recall
5. Using The Detection Proposals
5.1 Detector Responses Around Objects
5.2 LM-LLDA, R-CNN and Fast R-CNN detection performance
5.3 Predicting detection performance

1. Detection Proposal Methods

Details of The Proposed Approach

Detection Proposal Methods

Grouping Proposal Methods

• They attempt to generate segments (possibly overlapping) that are likely to correspond to objects.

Window Scoring Methods

• They score each candidate window according to how likely it is to contain an object.

• They are faster.
• Unless windows are sampled densely, localization accuracy is low.

1.1 Baseline Proposal Method

A. Uniform: Proposals are generated by uniformly sampling the bounding box center position (x, y), square root area, and log aspect ratio.

The PASCAL VOC 2007 training set is used to estimate these parameters.

B. Gaussian: Proposals are generated by sampling the bounding box center position (x, y), square root area, and log aspect ratio from a multivariate Gaussian distribution.

C. Sliding Window: Windows are generated so that they are equally distributed in space. BING (Binarized Normed Gradients for Objectness Estimation at 300fps) uses 29 specific sizes; this method spreads those sizes homogeneously inside the image.

D. Superpixels: Superpixels are generated with Efficient Graph-Based Image Segmentation.
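The Uniform and Gaussian baselines can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the parameter ranges, and the means and standard deviations in the usage note are hypothetical, and the Gaussian here uses independent components (a diagonal covariance) as a simplification of the multivariate Gaussian. Square root area s and log aspect ratio r are converted to a box with w = s·exp(r/2) and h = s·exp(-r/2), so that w·h = s² and w/h = exp(r).

```python
import math
import random


def params_to_box(cx, cy, sqrt_area, log_aspect, img_w, img_h):
    """Convert (center, sqrt area, log aspect ratio) to a clipped (x1, y1, x2, y2) box."""
    cx = min(max(cx, 0.0), float(img_w))  # keep the center inside the image
    cy = min(max(cy, 0.0), float(img_h))
    w = sqrt_area * math.exp(log_aspect / 2.0)
    h = sqrt_area * math.exp(-log_aspect / 2.0)
    x1 = max(0.0, cx - w / 2.0)
    y1 = max(0.0, cy - h / 2.0)
    x2 = min(float(img_w), cx + w / 2.0)
    y2 = min(float(img_h), cy + h / 2.0)
    return (x1, y1, x2, y2)


def uniform_proposals(n, img_w, img_h, sqrt_area_range, log_aspect_range, rng):
    """Baseline A: sample every box parameter uniformly within a given range."""
    boxes = []
    for _ in range(n):
        cx = rng.uniform(0, img_w)
        cy = rng.uniform(0, img_h)
        s = rng.uniform(*sqrt_area_range)
        r = rng.uniform(*log_aspect_range)
        boxes.append(params_to_box(cx, cy, s, r, img_w, img_h))
    return boxes


def gaussian_proposals(n, img_w, img_h, means, stds, rng):
    """Baseline B, simplified: independent Gaussians over (cx, cy, sqrt area, log aspect)."""
    boxes = []
    for _ in range(n):
        cx, cy, s, r = (rng.gauss(m, sd) for m, sd in zip(means, stds))
        s = max(s, 1.0)  # guard against non-positive sizes
        boxes.append(params_to_box(cx, cy, s, r, img_w, img_h))
    return boxes
```

For example, `uniform_proposals(1000, 500, 375, (20, 300), (-1.0, 1.0), random.Random(0))` draws 1000 boxes for a 500x375 image. In the slide's setting the ranges (or the Gaussian parameters) would be estimated from the PASCAL VOC 2007 training annotations rather than chosen by hand.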

2. Evaluation Metrics for Object Proposals

1. Intersection Over Union (IOU):
• The metrics used for evaluating object proposals are all typically functions of intersection over union (IOU) between generated proposals and ground-truth annotations.

• For two boxes/regions b_i and b_j, IOU is defined as:

$$\mathrm{IOU}(b_i, b_j) = \frac{\mathrm{area}(b_i \cap b_j)}{\mathrm{area}(b_i \cup b_j)}$$
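A minimal sketch of the IOU computation for axis-aligned boxes, assuming each box is given as a tuple (x1, y1, x2, y2) with x1 ≤ x2 and y1 ≤ y2. The box format and the function name are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IOU = 1 / (4 + 4 - 1) = 1/7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```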

2. Recall @ IOU Threshold t:
• For each ground-truth instance, check whether the best proposal from list L has IOU > t.

• If so, this ground-truth instance is considered detected or recalled.

• Recall is then averaged over all the ground-truth instances.

$$\mathrm{recall}@t = \frac{1}{|G|} \sum_{g_i \in G} I\Big[\max_{l_j \in L} \mathrm{IOU}(g_i, l_j) > t\Big]$$

I[·] is an indicator function for the logical proposition in its argument.

• Object proposals are evaluated using this metric in two ways:
1. Plotting recall vs. t by fixing # proposals in L.

2. Plotting recall vs. # proposals by fixing t.
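The recall@t definition can be sketched directly: for every ground-truth box, take the best IOU over the proposal list and count it as recalled if that value exceeds t. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the example boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at(gt_boxes, proposals, t):
    """Fraction of ground-truth boxes whose best proposal has IOU > t."""
    hits = 0
    for g in gt_boxes:
        best = max((iou(g, l) for l in proposals), default=0.0)
        if best > t:
            hits += 1
    return hits / len(gt_boxes)


# Hypothetical example: best IOUs are 0.8 and 81/119 (about 0.68).
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(0, 0, 10, 8), (21, 21, 31, 31), (50, 50, 60, 60)]
curve = [(t, recall_at(gt, props, t)) for t in (0.5, 0.7, 0.9)]
```

Evaluating `recall_at` over a grid of thresholds with a fixed proposal list gives the recall vs. t curve; truncating `props` to its first n entries at a fixed t gives recall vs. # proposals.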

3. Average Best Overlap (ABO): This metric eliminates the need for the threshold. Calculate the overlap between each ground-truth annotation g_i ∈ G and the best object hypothesis in L.

$$\mathrm{ABO} = \frac{1}{|G|} \sum_{g_i \in G} \max_{l_j \in L} \mathrm{IOU}(g_i, l_j)$$

4. Average Recall (AR):

$$\mathrm{AR} = \frac{2}{|G|} \sum_{g_i \in G} \max\Big(\max_{l_j \in L} \mathrm{IOU}(g_i, l_j) - 0.5,\ 0\Big)$$

Average recall (for IOU between 0.5 and 1) vs. # proposals
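A small sketch of ABO and AR, assuming the best IOU per ground-truth instance has already been computed (the list `best_ious` below is made up). It also checks numerically that AR equals twice the area under the recall vs. t curve between 0.5 and 1, which is why AR can be read as a best-overlap average restricted to that range.

```python
def average_best_overlap(best_ious):
    """ABO: mean of the best IOU per ground-truth instance."""
    return sum(best_ious) / len(best_ious)


def average_recall(best_ious):
    """AR: (2 / |G|) * sum over ground truths of max(best IOU - 0.5, 0)."""
    return 2.0 * sum(max(o - 0.5, 0.0) for o in best_ious) / len(best_ious)


def average_recall_by_integration(best_ious, steps=1000):
    """AR as 2 * integral of recall(t) for t in [0.5, 1], using the midpoint rule."""
    width = 0.5 / steps
    total = 0.0
    for k in range(steps):
        t = 0.5 + (k + 0.5) * width
        recall = sum(o > t for o in best_ious) / len(best_ious)
        total += recall * width
    return 2.0 * total


# Hypothetical best IOUs for four ground-truth boxes.
best_ious = [0.8, 0.6, 0.3, 0.95]
abo = average_best_overlap(best_ious)           # 0.6625
ar = average_recall(best_ious)                  # 0.425
ar_num = average_recall_by_integration(best_ious)
```

Note how the ground truth with best IOU 0.3 contributes to ABO but adds nothing to AR.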

5. Volume Under Surface (VUS): It plots recall as a function of both t and # proposals and computes the volume under the surface.

3. Proposal Repeatability

1. For each image in the PASCAL VOC 2007 test set, several perturbed versions are generated (blur, rotation, scale, illumination, JPEG compression, and “salt and pepper” noise).

2. For each pair of reference and perturbed images, detection proposals are computed with a given method (generating 1000 windows per image).
3. The proposals are projected back from the perturbed image into the reference image and then matched to the proposals in the reference image.
4. Then, recall is plotted vs. the IOU threshold t (from 0 to 1), and repeatability is the area under the curve.
5. Methods that propose windows at similar locations at high IoU, and thus on similar image content, are more repeatable, since the area under the curve is larger.
6. Large windows are more likely to match than smaller ones, since the same perturbation will have a larger relative effect on smaller windows.
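The repeatability steps can be sketched for one reference/perturbed image pair as follows. This is an illustration under stated assumptions, not the paper's exact protocol: the perturbation is taken to be a pure rescaling (so projecting back is a division by the scale factor), each reference proposal is matched to its best-overlapping projected proposal, and all function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def project_back(box, scale):
    """Map a box from the rescaled image to reference-image coordinates."""
    return tuple(v / scale for v in box)


def repeatability(ref_props, pert_props, scale, steps=100):
    """Area under the recall vs. IOU-threshold curve for t in [0, 1]."""
    projected = [project_back(b, scale) for b in pert_props]
    # Best match for every reference proposal among the projected proposals.
    best = [max((iou(r, p) for p in projected), default=0.0) for r in ref_props]
    area = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of the k-th threshold bin
        recall = sum(o > t for o in best) / len(best)
        area += recall / steps
    return area


# Hypothetical pair: the image was upscaled by 2; one proposal shifted in height.
ref = [(0, 0, 10, 10), (20, 20, 40, 40)]
pert = [(0, 0, 20, 20), (40, 40, 80, 60)]
score = repeatability(ref, pert, scale=2.0)  # best IOUs are 1.0 and 0.5
```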

• Scale: All methods except Bing show a drastic drop with small scale changes, but suffer only minor degradation for larger changes. Bing is more robust to small scale changes; however, it is more sensitive to larger changes due to its use of a coarse set of box sizes while searching for candidates.

• JPEG Compression: Small compression has a large effect, and more aggressive compression shows monotonic degradation. Despite using gradient information, Bing is the most robust to these kinds of changes.

• Rotation: All proposal methods are affected by image rotation. The repeatability loss is due to matching rotated bounding boxes.

• Illumination: Methods based on superpixels are heavily affected. Bing is more robust, likely due to its use of gradient information, which is known to be fairly robust to illumination changes.

• Blur: The repeatability results again exhibit a similar trend, although the drop is stronger (in comparison with other effects) for a small.

• Salt and pepper noise: Significant degradation in repeatability occurs for the majority of the methods when merely ten pixels are modified.

4. Proposal Recall

• If repeatability is a concern, the proposal method should be selected with care.

• For object detection, another aspect of interest is recall.

Dataset | Description
1. PASCAL | It includes 20 object categories that are present in nearly 5000 unconstrained images.
2. ImageNet | The larger ImageNet 2013 has 200 categories in over 20,000 images. It includes types of objects that are not in PASCAL. ImageNet and PASCAL have a similar number of objects per image and similar object sizes.
3. MS COCO | Microsoft Common Objects in Context (MS COCO) has more objects per image and smaller objects, but fewer object classes (80 object categories).

Overall, the methods fall into two groups:
1. Well-localized methods that gradually lose recall as the IoU threshold increases.
2. Methods that only provide coarse bounding box locations, so their recall drops rapidly.

5. Using The Detection Proposals

• This is an analysis of detection proposals to be used with object detection.

• The main 2 goals:
1. Measuring the performance of proposal methods for object detection.
2. The effect of object proposal metrics on final detection performance.

5.1 Detector Responses Around Objects

• It is necessary to check the importance of, and the relationship between, well-localized proposals (high IOU) and object detection (recall).

5.2 LM-LLDA, R-CNN and Fast R-CNN detection performance

1. Apply LM-LLDA models to generate dense detections using the standard sliding window.

2. Apply different object proposals to filter these detections at test time.

* These steps are used to evaluate the effect of proposals on detection quality.

• Using only 1000 proposals, the detection quality is reduced.

• But methods with high average recall (AR) also have high mean average precision (mAP), and vice versa.

• From the table below, the proposals (1) clearly hurt performance for some classes (bicycle, boat, bottle, car, chair, horse, mbike, person), reducing recall and precision because of bad localization; (2) improve performance for others (cat, table, dog); and (3) do not show significant change for all remaining classes.

• Fast R-CNN after re-training for each method.
• In the rightmost column, Fast R-CNN trained with 1000 Selective Search proposals and applied at test time with a given proposal method, versus Fast R-CNN trained for the test-time proposal method.

5.3 Predicting detection performance

RelatedWork:

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren¹, Kaiming He², Ross Girshick³, and Jian Sun²

¹ University of Science and Technology of China
² Microsoft Research
³ Facebook AI Research

• This object detection system is composed of two modules. The first module is a deep fully convolutional network that proposes regions, and the second module is the Fast R-CNN detector that uses the proposed regions.

• The RPN module tells the Fast R-CNN module where to look.

• A Region Proposal Network (RPN) takes an image (of any size) as input and outputs a set of rectangular object proposals, each with an objectness score.

• To generate region proposals, slide a small network over the convolutional feature map output by the last shared convolutional layer.

• This small network takes as input an n × n spatial window of the input convolutional feature map.

• Each sliding window is mapped to a lower-dimensional feature (256-d for ZF and 512-d for VGG, with ReLU following).

• This feature is fed into two sibling fully connected layers: a box-regression layer (reg) and a box-classification layer (cls).
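To get a feel for the size of this small network, the sketch below counts its parameters. Several values are assumptions taken from the Faster R-CNN paper rather than from these slides: a window size of n = 3, k = 9 reference boxes (anchors) per location, 2k outputs for cls and 4k outputs for reg, and 512 input channels for VGG (256 for ZF). The function name is hypothetical.

```python
def rpn_head_params(in_channels, mid_dim, n=3, k=9):
    """Parameter counts (weights + biases) for the sliding RPN head.

    n and k are assumptions: n x n window, k reference boxes per location.
    """
    # n x n window over all input channels mapped to a mid_dim feature.
    intermediate = n * n * in_channels * mid_dim + mid_dim
    # cls: 2 scores (object / not object) per reference box.
    cls = mid_dim * 2 * k + 2 * k
    # reg: 4 box coordinates per reference box.
    reg = mid_dim * 4 * k + 4 * k
    return intermediate, cls, reg


vgg = rpn_head_params(in_channels=512, mid_dim=512)
zf = rpn_head_params(in_channels=256, mid_dim=256)
print("VGG:", vgg, "total", sum(vgg))
print("ZF: ", zf, "total", sum(zf))
```

Because the same small network slides over every location, these counts do not depend on the image size.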

Conclusion

• This paper revisits the majority of existing detection proposal methods, proposes new evaluation metrics, and performs an extensive and direct comparison of existing methods.

• The repeatability of all proposal methods is limited: small changes to an image cause a noticeable change in the set of produced proposals.

• For object detection, improving proposal localization accuracy (improved IoU) is as important as improving recall.

• To simultaneously measure both proposal recall and localization accuracy, average recall (AR) summarizes the distribution of recall across a range of overlap thresholds.

Strengths

• This paper provides a new metric, Average Recall (AR), that ties together recall and localization quality (IOU).

• It demonstrates different evaluation protocols for comparing proposal methods (repeatability, recall, and using proposal methods for object detection).

Weaknesses

• The paper covers only 12 proposal methods, those whose implementations are available.

• The baseline proposal methods (uniform, Gaussian, sliding window, and superpixels) are not algorithms.

Overall Rating

• My rating (scale 0-5): 1

The new performance metric, Average Recall (AR), is just an Average Best Overlap (ABO) within the range 0.5:1.

The comparison covers only 12 proposal methods.
