51
CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen Thananjeyan and Aditya Baradwaj --- University of California, Berkeley [These slides were createdby Dan Klein and Pieter Abbeel for CS188 Introto AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

CS188:ArtificialIntelligenceKernelsandClustering

Instructors:BrijenThananjeyanandAdityaBaradwaj--- UniversityofCalifornia,Berkeley[TheseslideswerecreatedbyDanKleinandPieterAbbeelforCS188 IntrotoAIatUCBerkeley.AllCS188materialsareavailableathttp://ai.berkeley.edu.]

Page 2: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

AProbabilistic Perceptron

Page 3: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

A1DExample

definitely blue definitely rednot sure

probability increases exponentially as we move away from boundary

normalizer

Page 4: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

TheSoftMax

Page 5: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

HowtoLearn?

§ Maximumlikelihoodestimation

§ Maximumconditional likelihoodestimation

Page 6: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Local Search

o Simple, general idea:o Start wherevero Repeat: move to the best neighboring stateo If no neighbors better than current, quito Neighbors = small perturbations of w

Page 7: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Our Status

o Our objective

o Challenge: how to find a good w ?

o Equivalently:

ll(w)

maxw

ll(w)

minw

�ll(w)

Page 8: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

1D optimization

o Could evaluate ando Then step in best direction

o Or, evaluate derivative:

o Which tells which direction to step into

w

g(w)

w0

g(w0)

g(w0 + h) g(w0 � h)

@g(w0)

@w= lim

h!0

g(w0 + h)� g(w0 � h)

2h

Page 9: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

2-D Optimization

Source: Thomas Jungblut’s Blog

Page 10: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Steepest Descento Idea:

o Start somewhereo Repeat: Take a step in the steepest descent direction

Figure source: Mathworks

Page 11: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Steepest Direction

o Steepest Direction = direction of the gradient (points up the hill)

rg =

2

6664

@g@w1@g@w2

· · ·@g@wn

3

7775

Page 12: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

HowtoLearn?

Page 13: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Optimization Procedure: Gradient Descent

Page 14: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Stochastic Gradient Descent

probability of incorrect answer probability of incorrect answer

compare this to the multiclass perceptron: probabilistic weighting!

Page 15: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Non-SeparableData

Page 16: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Case-BasedReasoning

§ Classificationfromsimilarity§ Case-basedreasoning§ Predictaninstance’slabelusingsimilarinstances

§ Nearest-neighborclassification§ 1-NN:copythelabelofthemostsimilardatapoint§ K-NN:votetheknearestneighbors(needaweighting

scheme)§ Keyissue:howtodefinesimilarity§ Trade-offs:Smallkgivesrelevantneighbors,Largekgives

smootherfunctions

http://www.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html

Page 17: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Parametric/Non-Parametric

§ Parametricmodels:§ Fixedsetofparameters§ Moredatameansbettersettings

§ Non-parametricmodels:§ Complexityoftheclassifierincreaseswithdata§ Betterinthelimit,oftenworseinthenon-limit

§ (K)NNisnon-parametric Truth

2Examples 10Examples 100Examples 10000Examples

Page 18: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Nearest-NeighborClassification

§ Nearestneighborfordigits:§ Takenewimage§ Comparetoalltrainingimages§ Assignbasedonclosestexample

§ Encoding:imageisvectorofintensities:

§ What’sthesimilarityfunction?§ Dotproductoftwoimagesvectors?

§ Usuallynormalizevectorsso||x||=1§ min=0(when?),max=1(when?)

0

1

2

0

1

2

Page 19: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

SimilarityFunctions

Page 20: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

BasicSimilarity

§ Manysimilaritiesbasedonfeaturedotproducts:

§ Iffeaturesarejustthepixels:

§ Note:notallsimilaritiesareofthisform

Page 21: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

InvariantMetrics

§ Bettersimilarityfunctionsuseknowledgeaboutvision§ Example:invariantmetrics:

§ Similaritiesareinvariantundercertaintransformations§ Rotation,scaling,translation,stroke-thickness…§ E.g:

§ 16x16=256pixels;apointin256-dimspace§ ThesepointshavesmallsimilarityinR256(why?)

§ Howcanweincorporatesuchinvariances intooursimilarities?

ThisandnextfewslidesadaptedfromXiaoHu,UIUC

Page 22: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

RotationInvariantMetrics

§ EachexampleisnowacurveinR256

§ Rotationinvariantsimilarity:

s’=maxs(r(),r())

§ E.g.highestsimilaritybetweenimages’rotationlines

Page 23: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

TangentFamilies

§ Problemswiths’:§ Hardtocompute§ Allowslargetransformations(e.g.6® 9)

§ Tangentdistance:§ 1storderapproximationatoriginalpoints.

§ Easytocompute§ Modelssmallrotations

Page 24: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

ATaleofTwoApproaches…

§ Nearestneighbor-likeapproaches§ Canusefancysimilarityfunctions§ Don’tactuallygettodoexplicitlearning

§ Perceptron-likeapproaches§ Explicittrainingtoreduceempiricalerror§ Can’tusefancysimilarity,onlylinear§ Orcanthey?Let’sfindout!

Page 25: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Kernelization

Page 26: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

PerceptronWeights

§ Whatisthefinalvalueofaweightwy ofaperceptron?§ Canitbeanyrealvector?§ No!It’sbuiltbyaddingupinputs.

§ Canreconstructweightvectors(theprimalrepresentation)fromupdatecounts(thedualrepresentation)

Page 27: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

DualPerceptron

§ Howtoclassifyanewexamplex?

§ IfsomeonetellsusthevalueofKforeachpairofexamples,neverneedtobuildtheweightvectors(orthefeaturevectors)!

Page 28: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

DualPerceptron

§ Startwithzerocounts(alpha)§ Pickuptraininginstancesonebyone§ Trytoclassifyxn,

§ Ifcorrect,nochange!§ Ifwrong:lowercountofwrongclass(forthisinstance),raise

countofrightclass(forthisinstance)

y = argmaxy

X

i

↵i,yK(xi, xn)

wy = wy � f(xn)

wy⇤ = wy⇤ + f(xn)

Page 29: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

KernelizedPerceptron

§ Ifwehadablackbox(kernel)Kthattoldusthedotproductoftwoexamplesxandx’:§ Couldworkentirelywiththedualrepresentation§ Noneedtoevertakedotproducts(“kerneltrick”)

§ Likenearestneighbor– workwithblack-boxsimilarities§ Downside:slowifmanyexamplesgetnonzeroalpha

Page 30: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Kernels:WhoCares?

§ Sofar:averystrangewayofdoingaverysimplecalculation

§ “Kerneltrick”:wecansubstituteany* similarityfunctioninplaceofthedotproduct

§ Letsuslearnnewkindsofhypotheses

*Fineprint:ifyourkerneldoesn’t satisfycertaintechnicalrequirements, lotsofproofsbreak.E.g.convergence,mistakebounds. Inpractice,illegalkernelssometimeswork(butnotalways).

Page 31: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Non-Linearity

Page 32: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Non-LinearSeparators

§ Datathatislinearlyseparableworksoutgreatforlineardecisionrules:

§ Butwhatarewegoingtodoifthedatasetisjusttoohard?

§ Howabout…mappingdatatoahigher-dimensionalspace:

0

0

0

x2

x

x

x

ThisandnextfewslidesadaptedfromRayMooney,UT

Page 33: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Non-LinearSeparators

§ Generalidea:theoriginalfeaturespacecanalwaysbemappedtosomehigher-dimensionalfeaturespacewherethetrainingsetisseparable:

Φ: x→ φ(x)

Page 34: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

SomeKernels

§ Kernelsimplicitlymaporiginalvectorstohigherdimensionalspaces,takethedotproductthere,andhandtheresultback

§ Linearkernel:

§ Quadratickernel:

§ RBF:infinitedimensionalrepresentation

§ Discretekernels:e.g.stringkernels

Page 35: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

WhyKernels?

§ Can’tyoujustaddthesefeaturesonyourown(e.g.addallpairsoffeaturesinsteadofusingthequadratickernel)?

Page 36: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

WhyKernels?

§ Can’tyoujustaddthesefeaturesonyourown(e.g.addallpairsoffeaturesinsteadofusingthequadratickernel)?§ Yes,inprinciple,justcomputethem§ Noneedtomodifyanyalgorithms§ But,numberoffeaturescangetlarge(orinfinite)

§ Kernelsletuscomputewiththesefeaturesimplicitly§ Example:implicitdotproductinquadratickerneltakesmuchlessspaceandtimeperdotproduct

§ Ofcourse,there’sthecostforusingthepuredualalgorithms:youneedtocomputethesimilaritytoeverytrainingdatum

Page 37: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Clustering(Outofscopeforfinal)

Page 38: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Clustering

§ Clusteringsystems:§ Unsupervisedlearning§ Detectpatterns inunlabeleddata

§ E.g.groupemailsorsearchresults§ E.g.findcategoriesofcustomers§ E.g.detectanomalousprogramexecutions

§ Usefulwhendon’tknowwhatyou’relookingfor

§ Requiresdata,butnolabels§ Oftengetgibberish

Page 39: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Clustering

§ Basicidea:grouptogethersimilarinstances§ Example:2Dpointpatterns

§ Whatcould“similar”mean?§ Oneoption:small(squared)Euclideandistance

Page 40: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

K-Means

Page 41: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

K-Means

§ Aniterativeclusteringalgorithm§ PickKrandompointsasclustercenters(means)

§ Alternate:§ Assigndatainstancestoclosestmean

§ Assigneachmeantotheaverageofitsassignedpoints

§ Stopwhennopoints’assignmentschange

Page 42: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

K-MeansExample

Page 43: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

K-MeansasOptimization

§ Considerthetotaldistancetothemeans:

§ Eachiterationreducesphi

§ Twostageseachiteration:§ Updateassignments:fixmeansc,changeassignmentsa§ Updatemeans:fixassignmentsa,changemeansc

pointsassignments

means

Page 44: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

PhaseI:UpdateAssignments

§ Foreachpoint,re-assigntoclosestmean:

§ Canonlydecreasetotaldistancephi!

Page 45: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

PhaseII:UpdateMeans

§ Moveeachmeantotheaverageofitsassignedpoints:

§ Alsocanonlydecreasetotaldistance…(Why?)

§ Funfact:thepointywithminimumsquaredEuclideandistancetoasetofpoints{x}istheirmean

Page 46: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

Initialization

§ K-meansisnon-deterministic§ Requiresinitialmeans§ Itdoesmatterwhatyoupick!§ Whatcangowrong?

§ Variousschemesforpreventingthiskindofthing:variance-basedsplit/merge,initializationheuristics

Page 47: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

K-MeansGettingStuck

§ Alocaloptimum:

Whydoesn’tthisworkoutliketheearlierexample,withthepurpletakingoverhalftheblue?

Page 48: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

K-MeansQuestions

§ WillK-meansconverge?§ Toaglobaloptimum?

§ Willitalwaysfindthetruepatternsinthedata?§ Ifthepatternsareveryvery clear?

§ Willitfindsomethinginteresting?

§ Dopeopleeveruseit?

§ Howmanyclusterstopick?

Page 49: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

AgglomerativeClustering

Page 50: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

AgglomerativeClustering

§ Agglomerativeclustering:§ Firstmergeverysimilarinstances§ Incrementallybuild largerclustersoutof

smallerclusters

§ Algorithm:§ Maintainasetofclusters§ Initially,eachinstanceinitsowncluster§ Repeat:

§ Pickthetwoclosestclusters§ Mergethemintoanewcluster§ Stopwhenthere’sonlyoneclusterleft

§ Producesnotoneclustering,butafamilyofclusterings representedbyadendrogram

Page 51: Kernels and Clustering - University of California, Berkeleycs188/su19/assets/slides/... · 2019. 8. 6. · CS 188: Artificial Intelligence Kernels and Clustering Instructors: Brijen

AgglomerativeClustering

§ Howshouldwedefine“closest”forclusterswithmultipleelements?

§ Manyoptions§ Closestpair (single-linkclustering)§ Farthestpair (complete-linkclustering)§ Averageofallpairs§ Ward’smethod(minvariance,likek-means)

§ Differentchoicescreatedifferentclusteringbehaviors