CAP5415 Computer Vision
Lecture 13: Support Vector Machines for Computer Vision Applications
Guest Lecturer: Dr. Boqing Gong
10/6/15
Reminders
• October 14 – Choose your mini-projects (both). Send an email with a short proposal/explanation.
• October 8 – Due date for Programming Assignment #3
Pattern Classification Problem
• Suppose we are given two classes of objects; we are then faced with a new object and have to assign it to one of the two classes.
Motivation
(Figures: a two-class scatter plot, where + denotes +1 and − denotes −1, shown repeatedly with several different candidate separating lines.)
• How would you classify this data?
• Any of these would be fine... but which is best?
Classifier Design
• Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.
Maximum Margin
• The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a linear SVM, or LSVM).
• Support vectors are those data points that the margin pushes up against.

Why maximize the margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector data points.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very, very well.
SVM
• A supervised approach for classification and regression.
– Developed in the computer science community in the 1990s; has grown in popularity since then.
– Shown to perform well in a variety of settings, and often considered one of the best "out-of-the-box" classifiers.
Max-Margin Classifier → Support Vector Classifier → Support Vector Machines
(historical appearances in the literature)
Maximal-Margin Classifier
• In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p − 1.
– Ex. In 2D, a hyperplane is a 1D line.
– Ex. In 3D, a hyperplane is a 2D plane.
• General hyperplane definition: β0 + β1X1 + β2X2 + ... + βpXp = 0
• Hyperplane for 2D data: β0 + β1X1 + β2X2 = 0
• Example: the hyperplane 1 + 2X1 + 3X2 = 0 divides 2D space into the points where 1 + 2X1 + 3X2 > 0 and the points where 1 + 2X1 + 3X2 < 0; the sign of this expression classifies a point.
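The sign rule above can be checked with a few lines of code (a minimal sketch; the coefficient values come from the slide's example hyperplane 1 + 2X1 + 3X2 = 0):

```python
import numpy as np

# Hyperplane 1 + 2*X1 + 3*X2 = 0 from the slide's example.
beta0, beta = 1.0, np.array([2.0, 3.0])

def side_of_hyperplane(x):
    """Return +1 if beta0 + beta.x > 0, -1 if < 0, and 0 if exactly on the plane."""
    return int(np.sign(beta0 + beta @ np.asarray(x, dtype=float)))

print(side_of_hyperplane([1.0, 1.0]))    # 1 + 2 + 3 = 6 > 0, so +1
print(side_of_hyperplane([-2.0, 1.0]))   # 1 - 4 + 3 = 0, exactly on the plane
print(side_of_hyperplane([-1.0, -1.0]))  # 1 - 2 - 3 = -4 < 0, so -1
```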
Max-Margin Classifier
Left: There are two classes of observations, blue and purple (each of which has measurements on two variables). Three separating hyperplanes, out of many possible, are shown in black. Right: A separating hyperplane is shown in black: a test observation that falls in the blue portion of the grid will be assigned to the blue class, and a test observation that falls into the purple portion of the grid will be assigned to the purple class.

There are two classes of observations, shown in blue and in purple. The maximal margin hyperplane is shown as a solid line. The margin is the distance from the solid line to either of the dashed lines. The two blue points and the purple point that lie on the dashed lines are the support vectors, and the distance from those points to the margin is indicated by arrows. The purple and blue grid indicates the decision rule made by a classifier based on this separating hyperplane.
Construction of the Max-Margin Classifier
• Consider n training observations x1, ..., xn ∈ R^p
• and associated class labels y1, ..., yn ∈ {−1, +1}.
• Briefly, the max-margin hyperplane is the solution to the optimization problem:

  maximize M over β0, β1, ..., βp
  subject to Σj βj² = 1 and
  yi(β0 + β1 xi1 + ... + βp xip) ≥ M for all i = 1, ..., n.

• This constraint ensures that each observation is on the correct side of the hyperplane and at least a distance M from the hyperplane. Hence, M represents the margin of our hyperplane.
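The margin constraint can be verified on a fitted classifier. A sketch, assuming scikit-learn (the lecture prescribes no library); a very large C approximates the hard-margin problem, and in sklearn's scaling the support vectors have functional margin 1 while the geometric margin is 1/‖w‖:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable set; a huge C approximates the hard-margin problem.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Every observation satisfies y_i (w . x_i + b) >= 1, with equality (up to
# solver tolerance) for the support vectors sitting on the margin boundary.
functional_margins = y * (X @ w + b)
margin_width = 1.0 / np.linalg.norm(w)  # geometric margin M

print(functional_margins.min(), margin_width)
```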
Non-Separable Case
The maximal margin classifier is a very natural way to perform classification if a separating hyperplane exists. In many cases, however, no separating hyperplane exists, and we cannot exactly separate the two classes. (Notice the soft margin in the following slides!) The generalization of the maximal margin classifier to the non-separable case is known as the support vector classifier.
Support Vector Classifier – Separable Case
• The separable case is shown.
• The decision boundary is the solid line.
• Broken lines bound the shaded maximal margin of width 2M.
Support Vector Classifier – Overlap Case
• The overlapping (non-separable) case is shown.
• The points labeled ξi are on the wrong side of the margin.
• The margin is maximized subject to a total budget: Σi ξi ≤ constant.
Is Max-Margin Robust?
Left: Two classes of observations are shown in blue and in purple, along with the maximal margin hyperplane. Right: An additional blue observation has been added, leading to a dramatic shift in the maximal margin hyperplane (shown as a solid line). The dashed line indicates the maximal margin hyperplane that was obtained in the absence of this additional point. The max-margin classifier is sensitive to individual observations!
Soft Margin ~ Support Vector Classifier
• Rather than seeking the largest possible margin so that every observation is not only on the correct side of the hyperplane but also on the correct side of the margin, we instead allow some observations to be on the incorrect side of the margin, or even the incorrect side of the hyperplane.
• C: a nonnegative tuning parameter; ξi: slack variables allowing observations to be on the wrong side of the margin.
SV Classifier
Left: A support vector classifier was fit to a small data set. The hyperplane is shown as a solid line and the margins are shown as dashed lines. Purple observations: observations 3, 4, 5, and 6 are on the correct side of the margin, observation 2 is on the margin, and observation 1 is on the wrong side of the margin. Blue observations: observations 7 and 10 are on the correct side of the margin, observation 9 is on the margin, and observation 8 is on the wrong side of the margin. No observations are on the wrong side of the hyperplane. Right: Two additional points, 11 and 12, are added.

Four different values of the tuning parameter C were used to fit an SVM to a small data set. The largest value of C was used in the top left panel, and smaller values were used in the top right, bottom left, and bottom right panels. When C is large, there is a high tolerance for observations being on the wrong side of the margin, and so the margin will be large. As C decreases, the tolerance for observations being on the wrong side of the margin decreases, and the margin narrows.
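The effect of C can be seen by counting support vectors. One caveat for this sketch (scikit-learn assumed): sklearn's C is the inverse of the budget-style C described above, so a small sklearn C means more tolerance for margin violations and a wider margin:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping classes. Note: sklearn's C is the *inverse* of the
# budget-style C on the slide -- a SMALL sklearn C tolerates more margin
# violations (wide margin, many support vectors), a LARGE sklearn C
# tolerates fewer (narrow margin, fewer support vectors).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

n_sv = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = clf.support_vectors_.shape[0]

print(n_sv)  # wider margins touch (or are violated by) more points
```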
SV Classifier
In practice we are sometimes faced with non-linear class boundaries. A support vector classifier, or any linear classifier, will perform poorly here. Left: The observations fall into two classes, with a non-linear boundary between them. Right: The support vector classifier seeks a linear boundary, and consequently performs very poorly.
Non-Linear Support Vector Machines
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
The Kernel Trick
• The linear classifier relies on the dot product between vectors: K(xi, xj) = xiᵀxj.
• If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(xi, xj) = φ(xi)ᵀφ(xj).
• A kernel function is a function that corresponds to an inner product in some expanded feature space.
• Example: for 2-dimensional vectors x = [x1, x2], let K(xi, xj) = (1 + xiᵀxj)². We need to show that K(xi, xj) = φ(xi)ᵀφ(xj):

  K(xi, xj) = (1 + xiᵀxj)²
  = 1 + xi1²xj1² + 2·xi1xj1·xi2xj2 + xi2²xj2² + 2·xi1xj1 + 2·xi2xj2
  = [1, xi1², √2·xi1xi2, xi2², √2·xi1, √2·xi2]ᵀ [1, xj1², √2·xj1xj2, xj2², √2·xj1, √2·xj2]
  = φ(xi)ᵀφ(xj), where φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]
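The algebra above can be verified numerically; a minimal check that the kernel value matches the explicit feature-map inner product:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map from the slide (2D input -> 6D)."""
    x1, x2 = x
    return np.array([1.0, x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

def K(xi, xj):
    """Kernel (1 + xi . xj)^2 -- the same inner product, no explicit mapping."""
    return (1.0 + np.dot(xi, xj)) ** 2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])
print(K(xi, xj), phi(xi) @ phi(xj))  # both equal (1 + 11)^2 = 144
```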
Examples of Kernel Functions
• Linear: K(xi, xj) = xiᵀxj
• Polynomial of power p: K(xi, xj) = (1 + xiᵀxj)^p
• Gaussian (radial-basis function network): K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
• Sigmoid: K(xi, xj) = tanh(β0·xiᵀxj + β1)
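These four kernels are one-liners; a sketch (parameter defaults here are illustrative, not prescribed by the lecture):

```python
import numpy as np

def linear_k(xi, xj):
    return np.dot(xi, xj)

def poly_k(xi, xj, p=3):
    return (1.0 + np.dot(xi, xj)) ** p

def gaussian_k(xi, xj, sigma=1.0):
    # exp(-||xi - xj||^2 / (2 sigma^2)); always in (0, 1], equal to 1 iff xi == xj
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def sigmoid_k(xi, xj, beta0=1.0, beta1=0.0):
    return np.tanh(beta0 * np.dot(xi, xj) + beta1)

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
print(linear_k(x, z), poly_k(x, z, p=2), gaussian_k(x, x), gaussian_k(x, z))
```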
Non-Linear SVM Mathematically
• Dual problem formulation: find α1, ..., αN such that Q(α) = Σαi − ½ΣΣ αiαj yiyj K(xi, xj) is maximized, subject to (1) Σ αiyi = 0 and (2) αi ≥ 0 for all i.
• The solution is: f(x) = Σ αiyi K(xi, x) + b
• Optimization techniques for finding the αi's remain the same!
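The solution form f(x) = Σ αi yi K(xi, x) + b can be checked against a fitted model. In this sketch (scikit-learn assumed), SVC's dual_coef_ stores the products αi·yi for the support vectors, so the decision function can be rebuilt by hand:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.2, random_state=0)
gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors, so
# f(x) = sum_i alpha_i y_i K(x_i, x) + b can be reconstructed directly:
K = rbf_kernel(clf.support_vectors_, X, gamma=gamma)
f_manual = (clf.dual_coef_ @ K).ravel() + clf.intercept_

print(np.allclose(f_manual, clf.decision_function(X)))
```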
Formulating the Optimization Problem
(Figure: two classes in the (Var1, Var2) plane; the hyperplane w·x + b = 0 with margin boundaries w·x + b = +1 and w·x + b = −1; each slack variable ξi measures how far a point violates its margin.)
• The constraint becomes: yi(w·xi + b) ≥ 1 − ξi, with ξi ≥ 0 for all i.
• Objective function: min (½)‖w‖² + C Σi ξi
• The objective function penalizes misclassified instances and those within the margin; C trades off margin width and misclassifications.
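The objective can be evaluated directly for any candidate (w, b), since at the optimum the slack is the hinge loss ξi = max(0, 1 − yi(w·xi + b)). A small sketch with made-up numbers:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """(1/2)||w||^2 + C * sum_i xi_i, where the optimal slack is
    xi_i = max(0, 1 - y_i (w . x_i + b)) -- the hinge loss."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * slack.sum()

X = np.array([[2.0, 2.0], [-2.0, -2.0], [0.1, 0.1]])
y = np.array([+1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

# The first two points clear the margin (zero slack); the third is on the
# wrong side of the margin and contributes slack 1.2, so the total is
# 0.5 * 2 + 1.0 * 1.2 = 2.2.
print(soft_margin_objective(w, b, X, y, C=1.0))
```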
Non-Linear SVM Overview
• An SVM locates a separating hyperplane in the feature space and classifies points in that space.
• It does not need to represent the space explicitly; it simply defines a kernel function.
• The kernel function plays the role of the dot product in the feature space.
Disadvantages of Linear Decision Surfaces / Advantages of Non-Linear Decision Surfaces
(Figures: the same two-class data in the (Var1, Var2) plane, first with a poorly fitting linear boundary, then with a well-fitting non-linear boundary.)
SVM Classifier
Left: An SVM with a polynomial kernel of degree 3 is applied to the non-linear data from the previous slides, resulting in a far more appropriate decision rule. Right: An SVM with a radial kernel is applied. In this example, either kernel is capable of capturing the decision boundary.
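The lecture's figure uses data from ISLR; as a stand-in, concentric circles make the same point (scikit-learn assumed; coef0=1 makes the degree-3 polynomial kernel inhomogeneous, so it contains the quadratic terms needed for a circular boundary):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric-circle data: no linear boundary can separate the classes.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

scores = {}
for name, clf in [("linear", SVC(kernel="linear")),
                  ("poly", SVC(kernel="poly", degree=3, coef0=1.0)),
                  ("rbf", SVC(kernel="rbf"))]:
    scores[name] = clf.fit(Xtr, ytr).score(Xte, yte)

print(scores)  # the non-linear kernels should handily beat the linear one
```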
Multi-Class SVM
• SVMs can only handle two-class outputs (i.e., a categorical output variable with arity 2).
• What can be done?
• Answer: with output arity N, learn N SVMs:
– SVM 1 learns "Output == 1" vs. "Output != 1"
– SVM 2 learns "Output == 2" vs. "Output != 2"
– ...
– SVM N learns "Output == N" vs. "Output != N"
• Then, to predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
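The one-vs-rest recipe above can be sketched directly (scikit-learn and the iris data are assumptions for illustration): train N binary SVMs, then take the argmax of the N decision values:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

# One-vs-rest by hand: for each class k, train a binary SVM on
# "class k" vs. "everything else".
X, y = load_iris(return_X_y=True)
classes = np.unique(y)

models = [LinearSVC(C=1.0, max_iter=10000).fit(X, (y == k).astype(int))
          for k in classes]

# Classify by whichever SVM pushes the point furthest into its
# positive region, exactly as described on the slide.
scores = np.column_stack([m.decision_function(X) for m in models])
pred = classes[np.argmax(scores, axis=1)]

accuracy = (pred == y).mean()
print(accuracy)
```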
Trade-off Between Flexibility and Interpretability
In general, as the flexibility of a method increases, its interpretability decreases.
Why do SVMs generalize?
• Even though they map to a very high-dimensional space:
– They have a very strong bias in that space.
– The solution has to be a linear combination of the training instances.
• A large theory on Structural Risk Minimization provides bounds on the error of an SVM.
– Typically these error bounds are too loose to be of practical use.
Practical Issues
• Choice of kernel
– A Gaussian or polynomial kernel is the default; if ineffective, more elaborate kernels are needed.
– Domain experts can give assistance in formulating appropriate similarity measures.
• Choice of kernel parameters
– e.g., σ in the Gaussian kernel; σ is on the order of the distance between the closest points with different classifications.
– In the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters.
• Optimization criterion – hard margin vs. soft margin
– Typically a lengthy series of experiments in which various parameters are tested.
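Cross-validating the kernel parameters, as suggested above, might look like this (scikit-learn assumed; note that sklearn parameterizes the Gaussian kernel by gamma = 1/(2σ²) rather than by σ directly):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# 5-fold cross-validation over the Gaussian-kernel width and the
# soft-margin parameter C, in place of "reliable criteria".
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

print(search.best_params_, search.best_score_)
```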
CV Application of SVM: Human Detection
• Final feature vectors go to the SVM.

CV Application of SVM: Pedestrian Detection
• Feature vectors: HOG (histogram of oriented gradients).
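As a rough sketch of the HOG-plus-linear-SVM pipeline (everything here is illustrative: the descriptor is a stripped-down HOG without block normalization, and the "pedestrian" data is synthetic edge-vs-flat patches, not real images):

```python
import numpy as np
from sklearn.svm import LinearSVC

def hog_like_features(img, n_bins=9, cell=8):
    """Minimal HOG-like descriptor: per-cell histograms of gradient
    orientations, weighted by gradient magnitude. (A sketch only; the
    real Dalal-Triggs HOG adds block normalization and interpolation.)"""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=n_bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

# Toy "detector": strong-vertical-edge patches vs. flat patches,
# classified by a linear SVM on the HOG-like features.
rng = np.random.default_rng(0)
def edge_patch():
    return np.tile(np.linspace(0, 1, 16), (16, 1)) + 0.05 * rng.normal(size=(16, 16))
def flat_patch():
    return 0.5 + 0.05 * rng.normal(size=(16, 16))

X = np.array([hog_like_features(edge_patch()) for _ in range(40)] +
             [hog_like_features(flat_patch()) for _ in range(40)])
y = np.array([1] * 40 + [0] * 40)
clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
print(clf.score(X, y))
```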
References and Slide Credits
• An excellent tutorial on VC-dimension and Support Vector Machines: C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998. http://citeseer.nj.nec.com/burges98tutorial.html
• The VC/SRM/SVM Bible: Statistical Learning Theory by Vladimir Vapnik. Wiley-Interscience, 1998.
• Slide credits: Andrew W. Moore (CMU); James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning.