
Machine Learning: Introduction and Unsupervised Learning

Chapter 18.1, 18.2, 18.8.1 and "Introduction to Statistical Machine Learning"

Optional: “A Few Useful Things to Know about Machine Learning,” P. Domingos, Comm. ACM 55, 2012

What is Learning?

•  "Learning is making useful changes in our minds" – Marvin Minsky

•  "Learning is constructing or modifying representations of what is being experienced" – Ryszard Michalski

•  "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time" – Herbert Simon

Why do Machine Learning?

•  Solve classification problems
•  Learn models of data ("data fitting")
•  Understand and improve efficiency of human learning (e.g., Computer-Aided Instruction (CAI))
•  Discover new things or structures that are unknown to humans ("data mining")
•  Fill in skeletal or incomplete specifications about a domain

Major Paradigms of Machine Learning

•  Rote Learning
•  Induction
•  Clustering
•  Discovery
•  Genetic Algorithms
•  Reinforcement Learning
•  Transfer Learning
•  Learning by Analogy
•  Multi-task Learning

Inductive Learning

•  Generalize from a given set of (training) examples so that accurate predictions can be made about future examples

•  Learn unknown function: f(x) = y
   – x: an input example (aka instance)
   – y: the desired output, a discrete or continuous scalar value

•  An h (hypothesis) function is learned that approximates f

Representing "Things" in Machine Learning

•  An example or instance, x, represents a specific object ("thing")

•  x is often represented by a D-dimensional feature vector x = (x1, ..., xD) ∈ R^D

•  Each dimension is called a feature or attribute
•  Features can be continuous or discrete
•  x is a point in the D-dimensional feature space
•  x is an abstraction of the object; it ignores all other aspects (e.g., two people having the same weight and height may be considered identical)

Feature Vector Representation

•  Preprocess raw data
   – extract a feature (attribute) vector, x, that describes all attributes relevant for an object

•  Each x is a list of (attribute, value) pairs
   x = [(Rank, queen), (Suit, hearts), (Size, big)]
   – number of attributes is fixed: Rank, Suit, Size
   – number of possible values for each attribute is fixed (if discrete)
     Rank: 2, …, 10, jack, queen, king, ace
     Suit: diamonds, hearts, clubs, spades
     Size: big, small
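For concreteness, the card representation above as a tiny Python sketch (the attribute domains are copied from the slide; the helper function is my own illustrative addition):

```python
# Fixed attribute domains, as on the slide.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "jack", "queen", "king", "ace"]
SUITS = ["diamonds", "hearts", "clubs", "spades"]
SIZES = ["big", "small"]

def make_example(rank, suit, size):
    """Build x as a fixed-length list of (attribute, value) pairs,
    validating each value against its fixed domain."""
    assert rank in RANKS and suit in SUITS and size in SIZES
    return [("Rank", rank), ("Suit", suit), ("Size", size)]

x = make_example("queen", "hearts", "big")
# x == [("Rank", "queen"), ("Suit", "hearts"), ("Size", "big")]
```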

Types of Features

•  A numerical feature has discrete or continuous values that are measurements, e.g., a person's weight

•  A categorical feature is one that has two or more values (categories), but there is no intrinsic ordering of the values, e.g., a person's religion (aka nominal feature)

•  An ordinal feature is similar to a categorical feature but there is a clear ordering of the values, e.g., economic status, with three values: low, medium and high

Feature Vector Representation

Each example can be interpreted as a point in a D-dimensional feature space, where D is the number of features/attributes.

[Figure: card examples plotted in a 2-D feature space with axes Rank (2, 4, 6, 8, 10, J, Q, K) and Suit (spades, clubs, hearts, diamonds)]

Feature Vector Representation Example

•  Text document
   – Vocabulary of size D (~100,000): aardvark, …, zulu

•  "Bag of words": counts of each vocabulary entry
   – "To marry my true love" → (3531:1 13788:1 19676:1)
   – "I wish that I find my soul mate this year" → (3819:1 13448:1 19450:1 20514:1)

•  Often remove "stop words": the, of, at, in, …
•  A special "out-of-vocabulary" (OOV) entry catches all unknown words
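A minimal bag-of-words sketch of the idea above. The vocabulary and stop-word list here are toy stand-ins (not the ~100,000-word vocabulary from the slide), and the index values are arbitrary:

```python
# Toy stop-word list and vocabulary; "OOV" catches all unknown words.
STOP_WORDS = {"the", "of", "at", "in", "to", "my", "that", "this"}
VOCAB = {"marry": 0, "true": 1, "love": 2, "wish": 3, "find": 4,
         "soul": 5, "mate": 6, "year": 7, "OOV": 8}

def bag_of_words(text):
    """Map a document to a sparse {vocab_index: count} vector,
    dropping stop words and routing unknown words to OOV."""
    counts = {}
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        idx = VOCAB.get(word, VOCAB["OOV"])
        counts[idx] = counts.get(idx, 0) + 1
    return counts

bag_of_words("To marry my true love")   # -> {0: 1, 1: 1, 2: 1}
```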

More Feature Representations

•  Image
   – Color histogram

•  Software
   – Execution profile: the number of times each line is executed

•  Bank account
   – Credit rating, balance, # deposits in last day, week, month, year, # withdrawals, …

•  Bioinformatics
   – Medical test1, test2, test3, …

Training Set

•  A training set (aka training sample) is a collection of examples (aka instances), x1, ..., xn, which is the input to the learning process

•  xi = (xi1, ..., xiD)

•  Assume these instances are all sampled independently from the same, unknown (population) distribution, P(x)

•  We denote this by xi ∼ P(x) i.i.d., where i.i.d. stands for independent and identically distributed

•  Example: repeated throws of dice

Training Set

•  A training set is the "experience" given to a learning algorithm

•  What the algorithm can learn from it varies

•  Two basic learning paradigms:
   – unsupervised learning
   – supervised learning

Inductive Learning

•  Supervised vs. unsupervised learning
   – supervised: a "teacher" gives a set of (x, y) pairs
   – unsupervised: only the x's are given

•  In either case, the goal is to estimate f so that it generalizes well to "correctly" deal with "future examples" in computing f(x) = y
   – That is, find f that minimizes some measure of the error over a set of samples

Unsupervised Learning

•  Training set is x1, ..., xn; that's it!

•  No "teacher" providing supervision as to how individual examples should be handled

•  Common tasks:
   – Clustering: separate the n examples into groups
   – Discovery: find hidden or unknown patterns
   – Novelty detection: find examples that are very different from the rest
   – Dimensionality reduction: represent each example with a lower-dimensional feature vector while maintaining key characteristics of the training samples

Clustering

•  Goal: Group training samples into clusters such that examples in the same cluster are similar, and examples in different clusters are different

•  How many clusters do you see?

•  Many clustering algorithms exist

Oranges and Lemons

(from Iain Murray, http://homepages.inf.ed.ac.uk/imurray2/)

Google News

Digital Photo Collections

•  You have 1000s of digital photos stored in various folders

•  Organize them better by grouping into clusters
   – Simplest idea: use image creation time (EXIF tag)
   – More complicated: extract image features

Histogram-Based Image Segmentation

•  Goal: Segment the image into K regions
   – Reduce the number of gray levels to K and map each pixel to the closest gray level

Detecting Events on Twitter

•  Use real-time text and images from tweets to discover new social events

•  Clusters defined by similar words and word co-occurrences, plus similar image features

Google's Embedding Projector Project

Three Frequently Used Clustering Methods

•  Hierarchical Agglomerative Clustering
   – Build a binary tree over the dataset by repeatedly merging clusters

•  K-Means Clustering
   – Specify the desired number of clusters and use an iterative algorithm to find them

•  Mean Shift Clustering

Hierarchical Agglomerative Clustering

•  Initially every point is in its own cluster
•  Find the pair of clusters that are the closest
•  Merge the two into a single cluster
•  Repeat… until the whole dataset is one giant cluster
•  You get a binary tree (not shown here)

Hierarchical Agglomerative Clustering Algorithm
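The merge loop can be sketched as follows. This is a minimal from-scratch version using single-linkage and 1-D toy data; the function names, the stopping criterion of "k clusters remain," and the distance are my own illustrative choices:

```python
# Minimal single-linkage agglomerative clustering sketch.
# Start with every point in its own cluster, then repeatedly merge
# the closest pair of clusters until k clusters remain.
def hac_single_linkage(points, k):
    clusters = [[p] for p in points]              # every point is its own cluster
    dist = lambda a, b: abs(a - b)                # 1-D distance for the sketch
    link = lambda A, B: min(dist(a, b) for a in A for b in B)   # single-linkage
    while len(clusters) > k:
        # find the closest pair of clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the two into one
        del clusters[j]
    return clusters

hac_single_linkage([1, 2, 9, 10, 30], 2)   # -> [[1, 2, 9, 10], [30]]
```

Running to k = 1 instead of stopping early would produce the full merge sequence of the binary tree (dendrogram) described on the following slides.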

Hierarchical Agglomerative Clustering

How do you measure the closeness between two clusters? At least three ways:

– Single-linkage: the shortest distance from any member of one cluster to any member of the other cluster

– Complete-linkage: the largest distance from any member of one cluster to any member of the other cluster

– Average-linkage: the average distance between all pairs of members, one from each cluster
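The three linkage measures as code, parameterized by any pairwise distance d (a minimal sketch; clusters are lists of points, and the 1-D example distance is illustrative):

```python
def single_linkage(A, B, d):
    """Shortest distance from any member of A to any member of B."""
    return min(d(a, b) for a in A for b in B)

def complete_linkage(A, B, d):
    """Largest distance from any member of A to any member of B."""
    return max(d(a, b) for a in A for b in B)

def average_linkage(A, B, d):
    """Average distance over all cross-cluster pairs."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

d = lambda a, b: abs(a - b)          # 1-D example distance
A, B = [0, 1], [4, 6]
single_linkage(A, B, d)              # -> 3   (1 to 4)
complete_linkage(A, B, d)            # -> 6   (0 to 6)
average_linkage(A, B, d)             # -> 4.5 (mean of 4, 6, 3, 5)
```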


Distance

•  How to measure the distance between a pair of examples, X = (x1, …, xn) and Y = (y1, …, yn)?

   – Euclidean: d(X, Y) = sqrt( Σ_i (x_i − y_i)^2 )

   – Manhattan / City-Block: d(X, Y) = Σ_i |x_i − y_i|

   – Hamming: the number of features that are different between the two examples

   – And many others
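The three distances above as Python functions (a straightforward transcription of the formulas; the example values are my own):

```python
import math

def euclidean(X, Y):
    """d(X, Y) = sqrt( sum_i (x_i - y_i)^2 )"""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))

def manhattan(X, Y):
    """d(X, Y) = sum_i |x_i - y_i|"""
    return sum(abs(x - y) for x, y in zip(X, Y))

def hamming(X, Y):
    """Number of features on which the two examples differ."""
    return sum(1 for x, y in zip(X, Y) if x != y)

euclidean((0, 0), (3, 4))                           # -> 5.0
manhattan((0, 0), (3, 4))                           # -> 7
hamming(("queen", "hearts"), ("queen", "spades"))   # -> 1
```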

Hierarchical Agglomerative Clustering

•  The binary tree you get is often called a dendrogram, or taxonomy, or a hierarchy of data points

•  The tree can be cut at any level to produce different numbers of clusters: if you want k clusters, just cut the (k−1) longest links

•  Example: 6 Italian cities, single-linkage

(Example created by Matteo Matteucci)

Hierarchical Agglomerative Clustering Example

•  Iteration 1: Merge MI and TO; recompute the minimum distance from the MI/TO cluster to all other cities

•  Iteration 2: Merge NA and RM

•  Iteration 3: Merge BA and NA/RM

•  Iteration 4: Merge FI and BA/NA/RM

•  Final dendrogram

What Factors Affect the Outcome of Hierarchical Agglomerative Clustering?

•  Features used
•  Range of values for each feature
•  Linkage method
•  Distance metric used
•  Weight of each feature
•  …

Hierarchical Agglomerative Clustering Applet

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

Three Frequently Used Clustering Methods

•  Hierarchical Agglomerative Clustering
   – Build a binary tree over the dataset

•  K-Means Clustering
   – Specify the desired number of clusters and use an iterative algorithm to find them

•  Mean Shift Clustering

K-Means Clustering

•  Suppose I tell you the cluster centers, ci
   – Q: How to determine which points to associate with each ci?
   – A: For each point x, choose the closest ci

•  Suppose I tell you the points in each cluster
   – Q: How to determine the cluster centers?
   – A: Choose ci to be the mean/centroid of all points in the cluster

K-Means Clustering

•  The dataset. Input k = 5

•  Randomly pick 5 positions as initial cluster centers (not necessarily data points)

•  Each point finds which cluster center it is closest to; the point belongs to that cluster

•  Each cluster computes its new centroid based on which points belong to it

•  Repeat until convergence (i.e., no cluster center moves)

K-Means Demo

•  http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

K-Means Algorithm

•  Input: x1, …, xn, k, where each xi is a point in a d-dimensional feature space

•  Step 1: Select k cluster centers, c1, …, ck

•  Step 2: For each point xi, determine its cluster: find the closest center (using, say, Euclidean distance)

•  Step 3: Update all cluster centers as the centroids

•  Repeat Steps 2 and 3 until cluster centers no longer change
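Steps 1–3 can be sketched as follows (a minimal 1-D version; initializing from random data points and the fixed seed are my own simplifications — Step 1 on the slide allows arbitrary positions):

```python
import random

def kmeans(points, k, seed=0):
    """Minimal 1-D k-means: returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                   # Step 1: pick k centers
    while True:
        # Step 2: assign each point to its closest center
        clusters = [[] for _ in range(k)]
        for x in points:
            i = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        # Step 3: recompute each center as the centroid of its cluster
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:                    # repeat until no change
            return centers, clusters
        centers = new_centers

centers, clusters = kmeans([1.0, 2.0, 9.0, 10.0], 2)
# centers converge to 1.5 and 9.5 (in some order)
```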

ci = (1 / num_pts_in_cluster_i) Σ_{x ∈ cluster i} x

Example: Image Segmentation

[Figure: input image; clusters on intensity; clusters on color]

K-Means Properties

•  Will it always terminate?
   – Yes (finite number of ways of partitioning a finite number of points into k groups)

•  Is it guaranteed to find an "optimal" clustering?
   – No, but each iteration will reduce the distortion (error) of the clustering

Copyright © 2001, 2004, Andrew W. Moore

Non-Optimal Clustering

Say k = 3 and you are given the following points:

Given a poor choice of the initial cluster centers, the following result is possible:

Picking Starting Cluster Centers

Which local optimum k-means goes to is determined solely by the starting cluster centers.

–  Idea 1: Run k-means multiple times with different starting, random cluster centers (hill climbing with random restarts)

–  Idea 2: Pick a random point x1 from the dataset
   1.  Find the point x2 farthest from x1 in the dataset
   2.  Find x3 farthest from the closer of x1, x2
   3.  … Pick k points like this, and use them as the starting cluster centers for the k clusters
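Idea 2 as code, often called farthest-first initialization (a minimal sketch; for determinism it takes the first point as given rather than picking it at random):

```python
def farthest_first_centers(points, k, dist, first=0):
    """Pick k starting centers: each new center is the point whose
    distance to its nearest already-picked center is largest."""
    centers = [points[first]]                  # 1. the starting point x1
    while len(centers) < k:
        nxt = max(points, key=lambda x: min(dist(x, c) for c in centers))
        centers.append(nxt)                    # 2./3. farthest remaining point
    return centers

d = lambda a, b: abs(a - b)
farthest_first_centers([0, 1, 5, 6, 20], 3, d)   # -> [0, 20, 6]
```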

Picking the Number of Clusters

•  Difficult problem

•  Heuristic approaches depend on the number of points and the number of dimensions

Measuring Cluster Quality

•  Distortion = sum of squared distances of each data point to its cluster center

•  The "optimal" clustering is the one that minimizes distortion (over all possible cluster center locations and assignments of points to clusters)

How to Pick k?

Try multiple values of k and pick the one at the "elbow" of the distortion curve.

[Figure: distortion (y-axis) vs. number of clusters, k (x-axis)]
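The distortion measure and the elbow idea can be sketched directly from the definition (1-D toy data; the center values shown are my own example, not a fitted clustering):

```python
def distortion(points, centers):
    """Sum of squared distances of each data point to its cluster center
    (each point is charged to its closest center)."""
    return sum(min((x - c) ** 2 for c in centers) for x in points)

pts = [1.0, 2.0, 9.0, 10.0]
distortion(pts, [5.5])        # k=1 -> 65.0
distortion(pts, [1.5, 9.5])   # k=2 -> 1.0  (sharp drop: the elbow is at k=2)
```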

Uses of K-Means

•  Often used as an exploratory data analysis tool

•  In one dimension, a good way to quantize real-valued variables into k non-uniform buckets

•  Used on acoustic data in speech recognition to convert waveforms into one of k categories (known as Vector Quantization)

•  Also used for choosing color palettes on graphical display devices

Three Frequently Used Clustering Methods

•  Hierarchical Agglomerative Clustering
   – Build a binary tree over the dataset

•  K-Means Clustering
   – Specify the desired number of clusters and use an iterative algorithm to find them

•  Mean Shift Clustering

Mean Shift Clustering

1.  Choose a search window size
2.  Choose the initial location of the search window
3.  Compute the mean location (centroid of the data) in the search window
4.  Center the search window at the mean location computed in Step 3
5.  Repeat Steps 3 and 4 until convergence

The mean shift algorithm seeks the mode, i.e., the point of highest density, of a data distribution.
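Steps 1–5 as a 1-D sketch (assumes the window always contains at least one point; the window size, tolerance, and data below are illustrative):

```python
def mean_shift_mode(points, start, window, tol=1e-6):
    """Slide a fixed-width window to the mean of the points inside it
    until the window stops moving; returns the mode it converges to."""
    center = start                               # Step 2: initial window location
    while True:
        inside = [x for x in points if abs(x - center) <= window]
        mean = sum(inside) / len(inside)         # Step 3: mean inside the window
        if abs(mean - center) < tol:             # Step 5: stop at convergence
            return mean
        center = mean                            # Step 4: recenter the window

mean_shift_mode([1, 9, 10, 11], start=8, window=2)   # -> 10.0
```

Starting the window at every data point and grouping the points whose windows converge to the same mode yields the clusters.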

Intuitive Description

[Figure, repeated over several slides: a distribution of identical points; a region of interest is shifted by the mean shift vector toward its centroid, step by step, until it converges. Objective: find the densest region]


Results

[Figure: mean shift segmentation results; the feature space is only gray level]

Supervised Learning

•  A labeled training sample is a collection of examples: (x1, y1), ..., (xn, yn)

•  Assume (xi, yi) ∼ P(x, y) i.i.d., and P(x, y) is unknown

•  Supervised learning learns a function h: x → y in some function family, H, such that h(x) predicts the true label y on future data x, where (x, y) ∼ P(x, y) i.i.d.
   – Classification: if y is discrete
   – Regression: if y is continuous

Labels

•  Examples
   – Predict gender (M, F) from weight, height
   – Predict adult, juvenile (A, J) from weight, height

•  A label y is the desired prediction for an instance x

•  Discrete label: classes
   – M, F; A, J: often encoded as 0, 1 or −1, 1
   – Multiple classes: 1, 2, 3, …, C. No class order implied.

•  Continuous label: e.g., blood pressure

Concept Learning

•  Determine if a given example is or is not an instance of the concept/class/category
   – If it is, call it a positive example
   – If not, call it a negative example

Example: Mushroom Classification

Edible or Poisonous?

http://www.usask.ca/biology/fungi/

Mushroom Features/Attributes

1.  cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
2.  cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
3.  cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
4.  bruises?: bruises=t, no=f
5.  odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
6.  gill-attachment: attached=a, descending=d, free=f, notched=n
7.  …

Classes: edible=e, poisonous=p

Supervised Concept Learning by Induction

•  Given a training set of positive and negative examples of a concept:
   – {(x1, y1), (x2, y2), ..., (xn, yn)}
   where each yi is either + or −

•  Construct a description that accurately classifies whether future examples are positive or negative:
   – h(xn+1) = yn+1
   where yn+1 is the + or − prediction

Supervised Learning Methods

•  k-nearest-neighbors (k-NN) (Chapter 18.8.1)
•  Decision trees
•  Neural networks (NN)
•  Support vector machines (SVM)
•  etc.

Inductive Learning by Nearest-Neighbor Classification

A simple approach:
– save each training example as a point in feature space
– classify a new example by giving it the same classification as its nearest neighbor in feature space

k-Nearest-Neighbors (k-NN)

[Figure: 1-NN decision boundary]

k-NN

•  What if we want regression?
   – Instead of majority vote, take the average of the neighbors' y values

•  How to pick k?
   – Split data into training and tuning sets
   – Classify the tuning set with different values of k
   – Pick the k that produces the smallest tuning-set error
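Both variants (majority vote for classification, neighbor average for regression) fit in one small sketch. This is a minimal 1-D version; the function name, the `regression` flag, and the toy data are my own:

```python
def knn_predict(train, x, k, dist, regression=False):
    """train is a list of (xi, yi) pairs; returns the k-NN prediction for x."""
    neighbors = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
    ys = [y for _, y in neighbors]
    if regression:
        return sum(ys) / k                      # average of neighbors' y values
    return max(set(ys), key=ys.count)           # majority vote over neighbors

d = lambda a, b: abs(a - b)
train = [(1, "A"), (2, "A"), (9, "B"), (10, "B")]
knn_predict(train, 3, k=3, dist=d)                        # -> "A"
knn_predict([(1, 2.0), (2, 4.0), (9, 9.0)], 1.5, k=2,
            dist=d, regression=True)                      # -> 3.0
```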

k-NN doesn't generalize well if the examples in each class are not well "clustered"

[Figure: card examples in the feature space with axes Rank (2, 4, 6, 8, 10, J, Q, K) and Suit (Spades, Clubs, Hearts, Diamonds), with the two classes interleaved]

k-NN Demo

•  http://www.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html

Inductive Bias

•  Inductive learning is an inherently conjectural process. Why?
   – any knowledge created by generalization from specific facts cannot be proven true
   – it can only be proven false

•  Hence, inductive inference is "falsity preserving," not "truth preserving"

Inductive Bias

•  Learning can be viewed as searching the hypothesis space H of possible h functions

•  Inductive bias
   – is used when one h is chosen over another
   – is needed to generalize beyond the specific training examples

•  A completely unbiased inductive algorithm
   – only memorizes training examples
   – can't predict anything about unseen examples

Inductive Bias

Biases commonly used in machine learning:

– Restricted Hypothesis Space Bias: allow only certain types of h's, not arbitrary ones

– Preference Bias: define a metric for comparing h's so as to determine whether one is better than another

Supervised Learning Methods

•  k-nearest-neighbor (k-NN)
•  Decision trees
•  Neural networks (NN)
•  Support vector machines (SVM)
•  etc.
