CS6140: Machine Learning, Spring 2017
Instructor: Lu Wang, College of Computer and Information Science
Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Email: [email protected]
Logistics
• Assignment 1 is out – Due 2/9/2017 – Start early!
What we learned last time
• Evaluation metrics
• Decision Tree
• Generative Models
• Generative Model and Discriminative Model
• Logistic Regression
ROC Plot
• Sensitivity = a / (a + b) = Recall – True positive rate
• 1 − Specificity = 1 − d / (c + d) = c / (c + d) – False positive rate
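These two rates can be computed directly from confusion-matrix counts. A minimal sketch in Python, using the slide's notation (a = true positives, b = false negatives, c = false positives, d = true negatives; function names are mine):

```python
def tpr(a, b):
    """Sensitivity / recall / true positive rate: a / (a + b)."""
    return a / (a + b)

def fpr(c, d):
    """False positive rate: 1 - specificity = c / (c + d)."""
    return c / (c + d)

# One point on the ROC curve: (FPR, TPR)
print(fpr(10, 90), tpr(80, 20))  # 0.1 0.8
```

Sweeping a classifier's decision threshold and plotting (FPR, TPR) pairs traces out the ROC curve.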
Decision Tree
• Play tennis?
• Each internal node: test one feature Xi
• Each branch from a node: selects one value for Xi
• Each leaf node: predict Y (or P(Y | X ∈ leaf))
Top-Down Induction of Decision Trees
• Which attribute to use for split?
• Good split if we are more certain about classification after split
 – Deterministic good (all true or all false)
 – Uniform distribution bad
 – What about distributions in between?
Information Gain
• Gain(S, A) = expected reduction in entropy due to sorting on A
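This definition can be sketched directly in code: Gain(S, A) = H(S) − Σ_v |S_v|/|S| · H(S_v), where the sum runs over the values v of attribute A. A minimal illustration (function and variable names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum_v p_v * log2(p_v) over the class labels in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v), splitting S on attribute A."""
    n = len(labels)
    subsets = {}
    for y, v in zip(labels, feature_values):
        subsets.setdefault(v, []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# A perfectly predictive attribute recovers all the entropy of the labels:
print(information_gain(['+', '+', '-', '-'], ['x', 'x', 'y', 'y']))  # 1.0
```

A deterministic split (each subset all true or all false) gives the maximum gain H(S); a split that leaves each subset uniform gives gain 0.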
Information Gain
Overfitting
Bayesian model
• H: Hypothesis space of possible concepts
• X: n examples of a concept C
• Evaluate hypotheses given data using Bayes' rule:
Naïve Bayes
Maximum Likelihood Estimation
[email protected]@veModel
• P(Y|X)=p(X,Y)/P(X)
• Genera@vemodel– LearnP(X,Y)fromtrainingsample– P(X,Y)=P(Y)P(X|Y)– Specifieshowtogeneratetheobservedfeaturesxfory
• Discrimina@vemodel– LearnP(Y|X)fromtrainingsample– Directlymodelsthemappingfromfeaturesxtoy
Logistic Regression
Sigmoid function
• Definition: σ(z) = 1 / (1 + e^(−z))
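As a quick sanity check, the sigmoid is easy to compute directly (a minimal sketch):

```python
import math

def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z)), mapping R into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))  # 0.5 — the decision boundary of logistic regression
```

Note the symmetry σ(z) + σ(−z) = 1, which is what makes P(Y=1 | x) and P(Y=0 | x) sum to one.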
Logistic Regression
Maximizing Log Likelihood
Gradient Descent
• Example
Changing Step Size
Adding Prior
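The fitting loop these slides describe can be sketched as batch gradient ascent on the log-likelihood; everything below (function names, step size, toy data) is illustrative, not from the slides:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient ascent on the log-likelihood:
    w <- w + lr * sum_i (y_i - sigma(w . x_i)) * x_i, with the bias folded in
    as a constant feature x[0] = 1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy 1-D data with a prepended bias feature; y = 1 roughly when x > 2.5.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
w = fit_logistic(X, y)
```

Adding a prior (e.g. a Gaussian prior on w, i.e. L2 regularization) would subtract an extra lr · λ · w_j term in each update, and the step size lr plays exactly the role discussed under "Changing Step Size".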
Today’sOutline
• Perceptron(andkernels)
• SupportVectorMachines
Perceptron
[SomeoftheslidesareborrowedfromAlexSmola’stutorial]
Biology and Learning
• Idea 1: Good behavior should be rewarded, bad behavior punished (or not rewarded).
 – Raising a dog.
• Idea 2: Correlated events should be combined.
 – Babies learn language.
• Training Mechanisms
 – Behavioral modification of individuals (learning)
  • Feeding the dog, then the dog learns to stand and sit.
 – Hard-coded behavior in the genes (instinct)
  • The wrongly coded animal dies.
Neurons
Perceptron
• Weighted combination
 – The output of the neuron is a linear combination of the inputs
• Decision Function
 – At the end the results are combined into a binary decision via a threshold (sign) function
Perceptron
• An abstract model is to assume that
 f(x) = sign(w · x + b)
 – where w is the weight vector, x is the feature vector, b is the bias
• Biological Interpretation
 – The weights wi correspond to the synaptic weights, the multiplication corresponds to the processing of inputs via the synapses, and the summation is the combination of signals in the cell body (soma).
Learning Goal: Linear Separation
Perceptron Algorithm
• Nothing happens if we classify (xi, yi) correctly
• If we see an incorrectly classified observation we update w and b
• Positive reinforcement of observations
Perceptron Algorithm
• About the solution
 – Weight vector is a linear combination of observations xi:
 – Classification can be written in terms of dot products:
Pseudocode
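The update rule above can be sketched in a few lines of Python (the toy data and names are mine):

```python
def perceptron_train(X, y, epochs=100):
    """Classic perceptron: on a mistake (y_i * (w . x_i + b) <= 0), update
    w <- w + y_i * x_i and b <- b + y_i; correctly classified points leave
    w and b unchanged."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:   # a full clean pass: the data is separated
            break
    return w, b

# Linearly separable toy data with labels in {-1, +1}.
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 4.0]]
y = [-1, -1, 1, 1]
w, b = perceptron_train(X, y)
```

Because each update adds ±xi to w, the final weight vector is a linear combination of the observations, which is exactly what makes the dot-product (kernel) form possible.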
The XOR Problem
• Perceptrons cannot learn such linearly inseparable functions!
Problem
• Linear functions are often too simple to provide good estimators.
• Idea:
 – Map to a higher dimensional feature space via φ(x)
 – Replace every ⟨x, x′⟩ by ⟨φ(x), φ(x′)⟩ in the perceptron algorithm.
Perceptron on Features
Problems with Constructing Features
• Need to be an expert in the domain (e.g. Chinese characters).
• Can be expensive to compute.
Polynomial Features
• Dimension = 1
• Dimension = 2
• Dimension = d (skip proof)
Kernels
• Definition
• A kernel function k(x, x′) is a symmetric function in its arguments for which the following property holds: k(x, x′) = ⟨φ(x), φ(x′)⟩ for some feature map φ.
Some choices of kernel functions
• RBF kernel: Radial basis function kernel
• Linear Kernel
• Laplacian Kernel
• Gaussian Kernel
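For reference, the standard forms of these kernels (with bandwidth parameter σ > 0) are, as a sketch:

```latex
k_{\mathrm{linear}}(x, x') = \langle x, x' \rangle
\qquad
k_{\mathrm{Laplacian}}(x, x') = \exp\!\left(-\frac{\lVert x - x' \rVert}{\sigma}\right)
\qquad
k_{\mathrm{Gaussian}}(x, x') = \exp\!\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)
```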
Kernel Perceptron
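A minimal sketch of the kernel (dual) perceptron: instead of an explicit weight vector it keeps a mistake count α_i per training point and classifies with f(x) = sign(Σ_i α_i y_i k(x_i, x) + b). The XOR data, which defeats the linear perceptron, becomes learnable with an RBF kernel (all names are illustrative):

```python
from math import exp

def rbf(u, v, sigma=1.0):
    """Gaussian RBF kernel: exp(-||u - v||^2 / (2 sigma^2))."""
    d2 = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return exp(-d2 / (2 * sigma ** 2))

def kernel_perceptron_train(X, y, kernel, epochs=100):
    """Dual perceptron: alpha[i] counts mistakes on x_i; the decision function
    is f(x) = sign(sum_i alpha[i] * y[i] * k(x_i, x) + b)."""
    n = len(X)
    alpha = [0] * n
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n)) + b
            if y[i] * score <= 0:          # mistake: reinforce this observation
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:                  # converged
            break
    return alpha, b

# XOR labels in {-1, +1}: no linear perceptron can learn this.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [-1, 1, 1, -1]
alpha, b = kernel_perceptron_train(X, y, rbf)
```

The only place the data enters is through k(x_i, x), so swapping kernels changes the feature space without ever computing φ explicitly.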
Linear Separators
• Which of these linear separators is optimal?
[Some of the slides are borrowed from David Sontag's lecture]
Outline
• Perceptron (and kernels)
• Support Vector Machines
Support Vector Machine (SVM)
• SVMs (Vapnik, 1990s) choose the linear separator with the largest margin.
Hyperplane
Support Vector Machine (SVM)
• Reasons:
 – Intuition
 – Theoretical guarantee (skip here)
 – In practical tasks: SVM became famous when, using images as input, it gave accuracy comparable to neural networks with hand-designed features in a handwriting recognition task.
Support Vector Machine (SVM)
• How to find the hyperplane?
Hyperplane
Planes
• A plane can be specified as the set of points x given by w · x + b = 0
• Normal Vector: decides the direction of the plane
Normal to a plane
Length of the vector
Scale invariance
What is the distance γ?
Final result: can maximize margin by minimizing w · w
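Putting the steps above together, the hard-margin objective takes the standard form (a sketch in the usual notation):

```latex
\max_{w, b} \; \gamma
\quad\Longleftrightarrow\quad
\min_{w, b} \; \tfrac{1}{2}\, w \cdot w
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 \;\; \forall i
```

since by scale invariance we may fix min_i y_i (w · x_i + b) = 1, after which the margin is γ = 1 / ‖w‖, and maximizing it is the same as minimizing w · w.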
Support vector machines
What if the data is not linearly separable?
• More features
• Old objective
• New objective
 – Jointly minimize w · w and number of training mistakes!
Allowing for slack: "Soft margin SVM"
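The slack-variable ("soft margin") objective can be written in the standard form (sketch):

```latex
\min_{w, b, \xi} \;\; \tfrac{1}{2}\, w \cdot w \;+\; C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \;\; \forall i
```

Each ξ_i measures how far point i is allowed inside (or past) the margin, and C sets the trade-off between a wide margin and few training mistakes.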
Popular Tools for SVMs
• LIBSVM (C++) – https://www.csie.ntu.edu.tw/~cjlin/libsvm/
• SVMlight (C) – http://svmlight.joachims.org/
• Scikit-learn (Python) – http://scikit-learn.org/
• Torch (LuaJIT) – http://torch.ch/
• Spider (Matlab)
• Weka (Java)
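As an illustration of the scikit-learn tool listed above, a minimal sketch (assuming scikit-learn is installed; the toy data is mine):

```python
from sklearn.svm import SVC

# Tiny linearly separable toy set; labels in {0, 1}.
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 4.0]]
y = [0, 0, 1, 1]

# C controls the soft-margin trade-off between margin width and training errors.
clf = SVC(kernel="linear", C=50.0)
clf.fit(X, y)
preds = clf.predict([[0.5, 0.5], [3.5, 3.5]])
```

Swapping in kernel="rbf" (with a gamma parameter) gives the Gaussian RBF SVM discussed later in these slides.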
How do we optimize the objective?
• Quadratic programming
Kernels
• Definition
• A kernel function is a symmetric function in its arguments for which the following property holds
How do we optimize the objective?
• Quadratic programming
 – No place to apply the kernel trick
Constrained optimization
How do we solve with constraints?
• Lagrange Multipliers
Lagrange multipliers – Dual variables
Back to SVM (hard margin)
Dual SVM derivation
• Slater's condition from convex optimization guarantees that these two optimization problems are equivalent! (skip proof)
To get w and b
Classification rule using dual solution
• Using the dual solution: dot products
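The resulting dual problem, in the standard hard-margin form (sketch):

```latex
\max_{\alpha} \;\; \sum_i \alpha_i \;-\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0
```

From the dual solution, w = Σ_i α_i y_i x_i, and b = y_k − w · x_k for any support vector x_k (a point with α_k > 0); the classification rule f(x) = sign(Σ_i α_i y_i (x_i · x) + b) then depends on the data only through dot products.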
Dual for the non-separable case
How to interpret the dual form
Back to the question: What if the data is not linearly separable?
• For example: higher order polynomials
• Dual formulation only depends on dot-products of the features!
Kernels
• Definition
• A kernel function is a symmetric function in its arguments for which the following property holds
Kernel Trick
Soft margin SVM with kernel
Common kernels for SVM
• Aka Gaussian Radial basis function (RBF) kernel
Overfitting
• Huge feature space with kernels: should we worry about overfitting?
 – SVM objective seeks a solution with large margin
 – Good theoretical guarantee
 – But everything overfits sometimes
• Can control by:
 – Setting C
 – Choosing a better kernel
 – Varying parameters of the kernels
Dual for the non-separable case
Linear SVM, C = 50 (plots)
Insights
• Changing C
 – For clean data C doesn't matter much.
 – For noisy data, large C leads to narrow margin (SVM tries to do a good job at separating, even though it isn't possible)
• Noisy data
 – Clean data has few support vectors
 – Noisy data leads to data in the margins
 – More support vectors for noisy data
Gaussian RBF kernel with C = 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8 (plots)
Insights
• Changing C
 – For clean data C doesn't matter much.
 – For noisy data, large C leads to more complicated margin (SVM tries to do a good job at separating, even though it isn't possible)
 – Overfitting for large C
• Noisy data
 – Clean data has few support vectors
 – Noisy data leads to data in the margins
 – More support vectors for noisy data
Common kernels for SVM
• Aka Gaussian Radial basis function (RBF) kernel
Gaussian RBF with different σ (plots)
Insights
• Changing σ
 – For clean data, σ doesn't matter much.
 – For noisy data, small σ leads to more complicated margin (SVM tries to do a good job at separating, even though it isn't possible)
 – Lots of overfitting for small σ
• Noisy data
 – Clean data has few support vectors
 – Noisy data leads to data in the margins
 – More support vectors for noisy data
Homework (part of assignment 2)
• Study the "Sequential Minimal Optimization" algorithm and implement an SVM classifier by yourself
• References
 – http://cs229.stanford.edu/materials/smo.pdf
 – Fast Training of Support Vector Machines using Sequential Minimal Optimization
 – http://research.microsoft.com/pubs/68391/smo-book.pdf
What we learned today
• Perceptron (and kernels)
• Support Vector Machines
Homework
• Read Murphy Ch. 14.1–14.2, 14.4–14.5.
• Assignment 1 is out. Due in two weeks.