
Machine Learning: Introduction and Unsupervised Learning

Chapter 18.1, 18.2, 18.8.1 and "Introduction to Statistical Machine Learning"

Optional: “A Few Useful Things to Know about Machine Learning,” P. Domingos, Comm. ACM 55, 2012

What is Learning?

•  "Learning is making useful changes in our minds" – Marvin Minsky

•  "Learning is constructing or modifying representations of what is being experienced" – Ryszard Michalski

•  "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time" – Herbert Simon

Why do Machine Learning?

•  Solve classification problems
•  Learn models of data ("data fitting")
•  Understand and improve efficiency of human learning (e.g., Computer-Aided Instruction (CAI))
•  Discover new things or structures that are unknown to humans ("data mining")
•  Fill in skeletal or incomplete specifications about a domain

Major Paradigms of Machine Learning

•  Rote Learning
•  Induction
•  Clustering
•  Discovery
•  Genetic Algorithms
•  Reinforcement Learning
•  Transfer Learning
•  Learning by Analogy
•  Multi-task Learning

Inductive Learning

•  Generalize from a given set of (training) examples so that accurate predictions can be made about future examples

•  Learn unknown function: f(x) = y
   – x: an input example (aka instance)
   – y: the desired output, a discrete or continuous scalar value

•  An h (hypothesis) function is learned that approximates f

Representing "Things" in Machine Learning

•  An example or instance, x, represents a specific object ("thing")

•  x is often represented by a D-dimensional feature vector x = (x1, ..., xD) ∈ R^D

•  Each dimension is called a feature or attribute
•  Features can be continuous or discrete
•  x is a point in the D-dimensional feature space
•  x is an abstraction of the object; it ignores all other aspects (e.g., two people having the same weight and height may be considered identical)

Feature Vector Representation

•  Preprocess raw data
   – extract a feature (attribute) vector, x, that describes all attributes relevant for an object

•  Each x is a list of (attribute, value) pairs
   x = [(Rank, queen), (Suit, hearts), (Size, big)]
   – number of attributes is fixed: Rank, Suit, Size
   – number of possible values for each attribute is fixed (if discrete)
     Rank: 2, …, 10, jack, queen, king, ace
     Suit: diamonds, hearts, clubs, spades
     Size: big, small
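For concreteness, the card representation above as a tiny Python sketch (the attribute domains are copied from the slide; the helper function is my own illustrative addition):

```python
# Fixed attribute domains, as on the slide.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "jack", "queen", "king", "ace"]
SUITS = ["diamonds", "hearts", "clubs", "spades"]
SIZES = ["big", "small"]

def make_example(rank, suit, size):
    """Build x as a fixed-length list of (attribute, value) pairs,
    validating each value against its fixed domain."""
    assert rank in RANKS and suit in SUITS and size in SIZES
    return [("Rank", rank), ("Suit", suit), ("Size", size)]

x = make_example("queen", "hearts", "big")
# x == [("Rank", "queen"), ("Suit", "hearts"), ("Size", "big")]
```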

Types of Features

•  A numerical feature has discrete or continuous values that are measurements, e.g., a person's weight

•  A categorical feature is one that has two or more values (categories), but there is no intrinsic ordering of the values, e.g., a person's religion (aka nominal feature)

•  An ordinal feature is similar to a categorical feature but there is a clear ordering of the values, e.g., economic status, with three values: low, medium and high

Feature Vector Representation

Each example can be interpreted as a point in a D-dimensional feature space, where D is the number of features/attributes.

[Figure: card examples plotted in a 2-D feature space with axes Rank (2, 4, 6, 8, 10, J, Q, K) and Suit (spades, clubs, hearts, diamonds)]

Feature Vector Representation Example

•  Text document
   – Vocabulary of size D (~100,000): aardvark, …, zulu

•  "Bag of words": counts of each vocabulary entry
   – "To marry my true love" → (3531:1 13788:1 19676:1)
   – "I wish that I find my soul mate this year" → (3819:1 13448:1 19450:1 20514:1)

•  Often remove "stop words": the, of, at, in, …
•  A special "out-of-vocabulary" (OOV) entry catches all unknown words
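A minimal bag-of-words sketch of the idea above. The vocabulary and stop-word list here are toy stand-ins (not the ~100,000-word vocabulary from the slide), and the index values are arbitrary:

```python
# Toy stop-word list and vocabulary; "OOV" catches all unknown words.
STOP_WORDS = {"the", "of", "at", "in", "to", "my", "that", "this"}
VOCAB = {"marry": 0, "true": 1, "love": 2, "wish": 3, "find": 4,
         "soul": 5, "mate": 6, "year": 7, "OOV": 8}

def bag_of_words(text):
    """Map a document to a sparse {vocab_index: count} vector,
    dropping stop words and routing unknown words to OOV."""
    counts = {}
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        idx = VOCAB.get(word, VOCAB["OOV"])
        counts[idx] = counts.get(idx, 0) + 1
    return counts

bag_of_words("To marry my true love")   # -> {0: 1, 1: 1, 2: 1}
```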

More Feature Representations

•  Image
   – Color histogram

•  Software
   – Execution profile: the number of times each line is executed

•  Bank account
   – Credit rating, balance, # deposits in last day, week, month, year, # withdrawals, …

•  Bioinformatics
   – Medical test1, test2, test3, …

Training Set

•  A training set (aka training sample) is a collection of examples (aka instances), x1, ..., xn, which is the input to the learning process

•  xi = (xi1, ..., xiD)

•  Assume these instances are all sampled independently from the same, unknown (population) distribution, P(x)

•  We denote this by xi ∼ P(x) i.i.d., where i.i.d. stands for independent and identically distributed

•  Example: repeated throws of dice

Training Set

•  A training set is the "experience" given to a learning algorithm

•  What the algorithm can learn from it varies

•  Two basic learning paradigms:
   – unsupervised learning
   – supervised learning

Inductive Learning

•  Supervised vs. unsupervised learning
   – supervised: a "teacher" gives a set of (x, y) pairs
   – unsupervised: only the x's are given

•  In either case, the goal is to estimate f so that it generalizes well to "correctly" deal with "future examples" in computing f(x) = y
   – That is, find f that minimizes some measure of the error over a set of samples

Unsupervised Learning

•  Training set is x1, ..., xn; that's it!

•  No "teacher" providing supervision as to how individual examples should be handled

•  Common tasks:
   – Clustering: separate the n examples into groups
   – Discovery: find hidden or unknown patterns
   – Novelty detection: find examples that are very different from the rest
   – Dimensionality reduction: represent each example with a lower-dimensional feature vector while maintaining key characteristics of the training samples

Clustering

•  Goal: Group training samples into clusters such that examples in the same cluster are similar, and examples in different clusters are different

•  How many clusters do you see?

•  Many clustering algorithms exist

Oranges and Lemons

(from Iain Murray, http://homepages.inf.ed.ac.uk/imurray2/)

Google News

Digital Photo Collections

•  You have 1000s of digital photos stored in various folders

•  Organize them better by grouping into clusters
   – Simplest idea: use image creation time (EXIF tag)
   – More complicated: extract image features

Histogram-Based Image Segmentation

•  Goal: Segment the image into K regions
   – Reduce the number of gray levels to K and map each pixel to the closest gray level

Detecting Events on Twitter

•  Use real-time text and images from tweets to discover new social events

•  Clusters defined by similar words and word co-occurrences, plus similar image features

Google's Embedding Projector Project

Three Frequently Used Clustering Methods

•  Hierarchical Agglomerative Clustering
   – Build a binary tree over the dataset by repeatedly merging clusters

•  K-Means Clustering
   – Specify the desired number of clusters and use an iterative algorithm to find them

•  Mean Shift Clustering

Hierarchical Agglomerative Clustering

•  Initially every point is in its own cluster
•  Find the pair of clusters that are the closest
•  Merge the two into a single cluster
•  Repeat… until the whole dataset is one giant cluster
•  You get a binary tree (not shown here)

Hierarchical Agglomerative Clustering Algorithm
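The merge loop can be sketched as follows. This is a minimal from-scratch version using single-linkage and 1-D toy data; the function names, the stopping criterion of "k clusters remain," and the distance are my own illustrative choices:

```python
# Minimal single-linkage agglomerative clustering sketch.
# Start with every point in its own cluster, then repeatedly merge
# the closest pair of clusters until k clusters remain.
def hac_single_linkage(points, k):
    clusters = [[p] for p in points]              # every point is its own cluster
    dist = lambda a, b: abs(a - b)                # 1-D distance for the sketch
    link = lambda A, B: min(dist(a, b) for a in A for b in B)   # single-linkage
    while len(clusters) > k:
        # find the closest pair of clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the two into one
        del clusters[j]
    return clusters

hac_single_linkage([1, 2, 9, 10, 30], 2)   # -> [[1, 2, 9, 10], [30]]
```

Running to k = 1 instead of stopping early would produce the full merge sequence of the binary tree (dendrogram) described on the following slides.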

Hierarchical Agglomerative Clustering

How do you measure the closeness between two clusters? At least three ways:

– Single-linkage: the shortest distance from any member of one cluster to any member of the other cluster

– Complete-linkage: the largest distance from any member of one cluster to any member of the other cluster

– Average-linkage: the average distance between all pairs of members, one from each cluster
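The three linkage measures as code, parameterized by any pairwise distance d (a minimal sketch; clusters are lists of points, and the 1-D example distance is illustrative):

```python
def single_linkage(A, B, d):
    """Shortest distance from any member of A to any member of B."""
    return min(d(a, b) for a in A for b in B)

def complete_linkage(A, B, d):
    """Largest distance from any member of A to any member of B."""
    return max(d(a, b) for a in A for b in B)

def average_linkage(A, B, d):
    """Average distance over all cross-cluster pairs."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

d = lambda a, b: abs(a - b)          # 1-D example distance
A, B = [0, 1], [4, 6]
single_linkage(A, B, d)              # -> 3   (1 to 4)
complete_linkage(A, B, d)            # -> 6   (0 to 6)
average_linkage(A, B, d)             # -> 4.5 (mean of 4, 6, 3, 5)
```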


Distance

•  How to measure the distance between a pair of examples, X = (x1, …, xn) and Y = (y1, …, yn)?

   – Euclidean: d(X, Y) = sqrt( Σ_i (x_i − y_i)^2 )

   – Manhattan / City-Block: d(X, Y) = Σ_i |x_i − y_i|

   – Hamming: the number of features that are different between the two examples

   – And many others
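The three distances above as Python functions (a straightforward transcription of the formulas; the example values are my own):

```python
import math

def euclidean(X, Y):
    """d(X, Y) = sqrt( sum_i (x_i - y_i)^2 )"""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))

def manhattan(X, Y):
    """d(X, Y) = sum_i |x_i - y_i|"""
    return sum(abs(x - y) for x, y in zip(X, Y))

def hamming(X, Y):
    """Number of features on which the two examples differ."""
    return sum(1 for x, y in zip(X, Y) if x != y)

euclidean((0, 0), (3, 4))                           # -> 5.0
manhattan((0, 0), (3, 4))                           # -> 7
hamming(("queen", "hearts"), ("queen", "spades"))   # -> 1
```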

Hierarchical Agglomerative Clustering

•  The binary tree you get is often called a dendrogram, or taxonomy, or a hierarchy of data points

•  The tree can be cut at any level to produce different numbers of clusters: if you want k clusters, just cut the (k−1) longest links

•  Example: 6 Italian cities, single-linkage

(Example created by Matteo Matteucci)

Hierarchical Agglomerative Clustering Example

•  Iteration 1: Merge MI and TO; recompute the minimum distance from the MI/TO cluster to all other cities

•  Iteration 2: Merge NA and RM

•  Iteration 3: Merge BA and NA/RM

•  Iteration 4: Merge FI and BA/NA/RM

•  Final dendrogram

What Factors Affect the Outcome of Hierarchical Agglomerative Clustering?

•  Features used
•  Range of values for each feature
•  Linkage method
•  Distance metric used
•  Weight of each feature
•  …

Hierarchical Agglomerative Clustering Applet

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

Three Frequently Used Clustering Methods

•  Hierarchical Agglomerative Clustering
   – Build a binary tree over the dataset

•  K-Means Clustering
   – Specify the desired number of clusters and use an iterative algorithm to find them

•  Mean Shift Clustering

K-Means Clustering

•  Suppose I tell you the cluster centers, ci
   – Q: How to determine which points to associate with each ci?
   – A: For each point x, choose the closest ci

•  Suppose I tell you the points in each cluster
   – Q: How to determine the cluster centers?
   – A: Choose ci to be the mean/centroid of all points in the cluster

K-Means Clustering

•  The dataset. Input k = 5

•  Randomly pick 5 positions as initial cluster centers (not necessarily data points)

•  Each point finds which cluster center it is closest to; the point belongs to that cluster

•  Each cluster computes its new centroid based on which points belong to it

•  Repeat until convergence (i.e., no cluster center moves)

K-Means Demo

•  http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

K-Means Algorithm

•  Input: x1, …, xn, k, where each xi is a point in a d-dimensional feature space

•  Step 1: Select k cluster centers, c1, …, ck

•  Step 2: For each point xi, determine its cluster: find the closest center (using, say, Euclidean distance)

•  Step 3: Update all cluster centers as the centroids

•  Repeat Steps 2 and 3 until cluster centers no longer change
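Steps 1–3 can be sketched as follows (a minimal 1-D version; initializing from random data points and the fixed seed are my own simplifications — Step 1 on the slide allows arbitrary positions):

```python
import random

def kmeans(points, k, seed=0):
    """Minimal 1-D k-means: returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                   # Step 1: pick k centers
    while True:
        # Step 2: assign each point to its closest center
        clusters = [[] for _ in range(k)]
        for x in points:
            i = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        # Step 3: recompute each center as the centroid of its cluster
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:                    # repeat until no change
            return centers, clusters
        centers = new_centers

centers, clusters = kmeans([1.0, 2.0, 9.0, 10.0], 2)
# centers converge to 1.5 and 9.5 (in some order)
```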

ci = (1 / num_pts_in_cluster_i) Σ_{x ∈ cluster i} x

Example: Image Segmentation

[Figure: input image; clusters on intensity; clusters on color]

K-Means Properties

•  Will it always terminate?
   – Yes (finite number of ways of partitioning a finite number of points into k groups)

•  Is it guaranteed to find an "optimal" clustering?
   – No, but each iteration will reduce the distortion (error) of the clustering

Copyright © 2001, 2004, Andrew W. Moore

Non-Optimal Clustering

Say k = 3 and you are given the following points:

Given a poor choice of the initial cluster centers, the following result is possible:

Picking Starting Cluster Centers

Which local optimum k-means goes to is determined solely by the starting cluster centers.

–  Idea 1: Run k-means multiple times with different starting, random cluster centers (hill climbing with random restarts)

–  Idea 2: Pick a random point x1 from the dataset
   1.  Find the point x2 farthest from x1 in the dataset
   2.  Find x3 farthest from the closer of x1, x2
   3.  … Pick k points like this, and use them as the starting cluster centers for the k clusters
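Idea 2 as code, often called farthest-first initialization (a minimal sketch; for determinism it takes the first point as given rather than picking it at random):

```python
def farthest_first_centers(points, k, dist, first=0):
    """Pick k starting centers: each new center is the point whose
    distance to its nearest already-picked center is largest."""
    centers = [points[first]]                  # 1. the starting point x1
    while len(centers) < k:
        nxt = max(points, key=lambda x: min(dist(x, c) for c in centers))
        centers.append(nxt)                    # 2./3. farthest remaining point
    return centers

d = lambda a, b: abs(a - b)
farthest_first_centers([0, 1, 5, 6, 20], 3, d)   # -> [0, 20, 6]
```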

Picking the Number of Clusters

•  Difficult problem

•  Heuristic approaches depend on the number of points and the number of dimensions

Measuring Cluster Quality

•  Distortion = sum of squared distances of each data point to its cluster center

•  The "optimal" clustering is the one that minimizes distortion (over all possible cluster center locations and assignments of points to clusters)

How to Pick k?

Try multiple values of k and pick the one at the "elbow" of the distortion curve.

[Figure: distortion (y-axis) vs. number of clusters, k (x-axis)]
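The distortion measure and the elbow idea can be sketched directly from the definition (1-D toy data; the center values shown are my own example, not a fitted clustering):

```python
def distortion(points, centers):
    """Sum of squared distances of each data point to its cluster center
    (each point is charged to its closest center)."""
    return sum(min((x - c) ** 2 for c in centers) for x in points)

pts = [1.0, 2.0, 9.0, 10.0]
distortion(pts, [5.5])        # k=1 -> 65.0
distortion(pts, [1.5, 9.5])   # k=2 -> 1.0  (sharp drop: the elbow is at k=2)
```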

Uses of K-Means

•  Often used as an exploratory data analysis tool

•  In one dimension, a good way to quantize real-valued variables into k non-uniform buckets

•  Used on acoustic data in speech recognition to convert waveforms into one of k categories (known as Vector Quantization)

•  Also used for choosing color palettes on graphical display devices

Three Frequently Used Clustering Methods

•  Hierarchical Agglomerative Clustering
   – Build a binary tree over the dataset

•  K-Means Clustering
   – Specify the desired number of clusters and use an iterative algorithm to find them

•  Mean Shift Clustering

Mean Shift Clustering

1.  Choose a search window size
2.  Choose the initial location of the search window
3.  Compute the mean location (centroid of the data) in the search window
4.  Center the search window at the mean location computed in Step 3
5.  Repeat Steps 3 and 4 until convergence

The mean shift algorithm seeks the mode, i.e., the point of highest density, of a data distribution.
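Steps 1–5 as a 1-D sketch (assumes the window always contains at least one point; the window size, tolerance, and data below are illustrative):

```python
def mean_shift_mode(points, start, window, tol=1e-6):
    """Slide a fixed-width window to the mean of the points inside it
    until the window stops moving; returns the mode it converges to."""
    center = start                               # Step 2: initial window location
    while True:
        inside = [x for x in points if abs(x - center) <= window]
        mean = sum(inside) / len(inside)         # Step 3: mean inside the window
        if abs(mean - center) < tol:             # Step 5: stop at convergence
            return mean
        center = mean                            # Step 4: recenter the window

mean_shift_mode([1, 9, 10, 11], start=8, window=2)   # -> 10.0
```

Starting the window at every data point and grouping the points whose windows converge to the same mode yields the clusters.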

Intuitive Description

[Figure, repeated over several slides: a distribution of identical points; a region of interest is shifted by the mean shift vector toward its centroid, step by step, until it converges. Objective: find the densest region]


Results

[Figure: mean shift segmentation results; the feature space is only gray level]

Supervised Learning

•  A labeled training sample is a collection of examples: (x1, y1), ..., (xn, yn)

•  Assume (xi, yi) ∼ P(x, y) i.i.d., and P(x, y) is unknown

•  Supervised learning learns a function h: x → y in some function family, H, such that h(x) predicts the true label y on future data x, where (x, y) ∼ P(x, y) i.i.d.
   – Classification: if y is discrete
   – Regression: if y is continuous

Labels

•  Examples
   – Predict gender (M, F) from weight, height
   – Predict adult, juvenile (A, J) from weight, height

•  A label y is the desired prediction for an instance x

•  Discrete label: classes
   – M, F; A, J: often encoded as 0, 1 or −1, 1
   – Multiple classes: 1, 2, 3, …, C. No class order implied.

•  Continuous label: e.g., blood pressure

Concept Learning

•  Determine if a given example is or is not an instance of the concept/class/category
   – If it is, call it a positive example
   – If not, call it a negative example

Example: Mushroom Classification

Edible or Poisonous?

http://www.usask.ca/biology/fungi/

Mushroom Features/Attributes

1.  cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
2.  cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
3.  cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
4.  bruises?: bruises=t, no=f
5.  odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
6.  gill-attachment: attached=a, descending=d, free=f, notched=n
7.  …

Classes: edible=e, poisonous=p

Supervised Concept Learning by Induction

•  Given a training set of positive and negative examples of a concept:
   – {(x1, y1), (x2, y2), ..., (xn, yn)}
   where each yi is either + or −

•  Construct a description that accurately classifies whether future examples are positive or negative:
   – h(xn+1) = yn+1
   where yn+1 is the + or − prediction

Supervised Learning Methods

•  k-nearest-neighbors (k-NN) (Chapter 18.8.1)
•  Decision trees
•  Neural networks (NN)
•  Support vector machines (SVM)
•  etc.

Inductive Learning by Nearest-Neighbor Classification

A simple approach:
– save each training example as a point in feature space
– classify a new example by giving it the same classification as its nearest neighbor in feature space

k-Nearest-Neighbors (k-NN)

[Figure: 1-NN decision boundary]

k-NN

•  What if we want regression?
   – Instead of majority vote, take the average of the neighbors' y values

•  How to pick k?
   – Split data into training and tuning sets
   – Classify the tuning set with different values of k
   – Pick the k that produces the smallest tuning-set error
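Both variants (majority vote for classification, neighbor average for regression) fit in one small sketch. This is a minimal 1-D version; the function name, the `regression` flag, and the toy data are my own:

```python
def knn_predict(train, x, k, dist, regression=False):
    """train is a list of (xi, yi) pairs; returns the k-NN prediction for x."""
    neighbors = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
    ys = [y for _, y in neighbors]
    if regression:
        return sum(ys) / k                      # average of neighbors' y values
    return max(set(ys), key=ys.count)           # majority vote over neighbors

d = lambda a, b: abs(a - b)
train = [(1, "A"), (2, "A"), (9, "B"), (10, "B")]
knn_predict(train, 3, k=3, dist=d)                        # -> "A"
knn_predict([(1, 2.0), (2, 4.0), (9, 9.0)], 1.5, k=2,
            dist=d, regression=True)                      # -> 3.0
```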

k-NN doesn't generalize well if the examples in each class are not well "clustered"

[Figure: card examples in the feature space with axes Rank (2, 4, 6, 8, 10, J, Q, K) and Suit (Spades, Clubs, Hearts, Diamonds), with the two classes interleaved]

k-NN Demo

•  http://www.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html

Inductive Bias

•  Inductive learning is an inherently conjectural process. Why?
   – any knowledge created by generalization from specific facts cannot be proven true
   – it can only be proven false

•  Hence, inductive inference is "falsity preserving," not "truth preserving"

Inductive Bias

•  Learning can be viewed as searching the hypothesis space H of possible h functions

•  Inductive bias
   – is used when one h is chosen over another
   – is needed to generalize beyond the specific training examples

•  A completely unbiased inductive algorithm
   – only memorizes training examples
   – can't predict anything about unseen examples

Inductive Bias

Biases commonly used in machine learning:

– Restricted Hypothesis Space Bias: allow only certain types of h's, not arbitrary ones

– Preference Bias: define a metric for comparing h's so as to determine whether one is better than another

Supervised Learning Methods

•  k-nearest-neighbor (k-NN)
•  Decision trees
•  Neural networks (NN)
•  Support vector machines (SVM)
•  etc.
