Supervised Learning - Penn Engineering · 2019. 1. 22.


Supervised Learning

Robot image credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

The Badges Game

Background:
• Pre-registered attendees at the 1994 Machine Learning Conference received a name badge labeled with a "+" or "-"
• The label is based only upon the name
• There are 294 examples (210 positive and 84 negative)

What function was used to generate the +/- labeling?

+ Naoki Abe
- Eric Baum

Training Data

+ Naoki Abe
- Myriam Abramson
+ David W. Aha
+ Kamal M. Ali
- Eric Allender
+ Dana Angluin
- Chidanand Apte
+ Minoru Asada
+ Lars Asker
+ Javed Aslam
+ Jose L. Balcazar
- Cristina Baroglio

+ Peter Bartlett
- Eric Baum
+ Welton Becket
- Shai Ben-David
+ George Berg
+ Neil Berkman
+ Malini Bhandaru
+ Bir Bhanu
+ Reinhard Blasig
- Avrim Blum
- Anselm Blumer
+ Justin Boyan

+ Carla E. Brodley
+ Nader Bshouty
- Wray Buntine
- Andrey Burago
+ Tom Bylander
+ Bill Byrne
- Claire Cardie
+ John Case
+ Jason Catlett
- Philip Chan
- Zhixiang Chen
- Chris Darken

Test Data

? Shivani Agarwal
? Chris Callison-Burch
? Eric Eaton
? Peter Stone
? Matthew Taylor

Labeled Test Data

- Shivani Agarwal
- Chris Callison-Burch
- Eric Eaton
+ Peter Stone
+ Matthew Taylor

What is Learning?

• The Badges Game is an example of a key learning protocol: supervised learning
• First question: Are you sure you got it? Why?
• Issues:
  – Which problem was easier: prediction or modeling?
  – Representation
  – Problem setting
  – Background knowledge
  – When did learning take place?

Algorithm: Can you write a program that takes this data as input and predicts the label for your name?
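The question above can be answered with a very small learner. The Python sketch below assumes a hypothesis space of our own choosing (rules "predict + exactly when the n-th letter of the first name is a vowel", for n = 1 to 6, with y not counted as a vowel), transcribes the 36 training names listed above, and keeps only the rules that label every one of them correctly. The hypothesis space and the function names are illustrative assumptions, not the labeling function used at the conference, and the slides show only 36 of the 294 examples.

```python
# Training examples transcribed from the "Training Data" slide: (name, label).
TRAIN = [
    ("Naoki Abe", "+"), ("Myriam Abramson", "-"), ("David W. Aha", "+"),
    ("Kamal M. Ali", "+"), ("Eric Allender", "-"), ("Dana Angluin", "+"),
    ("Chidanand Apte", "-"), ("Minoru Asada", "+"), ("Lars Asker", "+"),
    ("Javed Aslam", "+"), ("Jose L. Balcazar", "+"), ("Cristina Baroglio", "-"),
    ("Peter Bartlett", "+"), ("Eric Baum", "-"), ("Welton Becket", "+"),
    ("Shai Ben-David", "-"), ("George Berg", "+"), ("Neil Berkman", "+"),
    ("Malini Bhandaru", "+"), ("Bir Bhanu", "+"), ("Reinhard Blasig", "+"),
    ("Avrim Blum", "-"), ("Anselm Blumer", "-"), ("Justin Boyan", "+"),
    ("Carla E. Brodley", "+"), ("Nader Bshouty", "+"), ("Wray Buntine", "-"),
    ("Andrey Burago", "-"), ("Tom Bylander", "+"), ("Bill Byrne", "+"),
    ("Claire Cardie", "-"), ("John Case", "+"), ("Jason Catlett", "+"),
    ("Philip Chan", "-"), ("Zhixiang Chen", "-"), ("Chris Darken", "-"),
]
VOWELS = set("aeiou")  # assumption: 'y' is not treated as a vowel


def nth_letter_is_vowel(name: str, n: int) -> bool:
    """Is the n-th letter (1-based) of the first name a vowel?"""
    first = name.split()[0].lower()
    return len(first) >= n and first[n - 1] in VOWELS


def make_rule(n: int):
    """Hypothesis h_n (illustrative): predict '+' exactly when the n-th letter is a vowel."""
    return lambda name: "+" if nth_letter_is_vowel(name, n) else "-"


def consistent_rules(data, max_n=6):
    """Return every n whose rule h_n labels all training examples correctly."""
    return [n for n in range(1, max_n + 1)
            if all(make_rule(n)(name) == label for name, label in data)]


if __name__ == "__main__":
    surviving = consistent_rules(TRAIN)
    print("consistent rules:", surviving)
    h = make_rule(surviving[0])
    # Apply the surviving rule to the unlabeled "Test Data" names.
    for name in ["Shivani Agarwal", "Chris Callison-Burch", "Eric Eaton",
                 "Peter Stone", "Matthew Taylor"]:
        print(h(name), name)
```

On the 36 transcribed examples only n = 2 survives, and that rule reproduces the five labels on the Labeled Test Data slide. A rule that fits the data is still only a hypothesis, so the "Are you sure you got it?" question applies to it too.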


Supervised Learning

• We consider systems that apply an unknown function f() to input items x and return an output y = f(x).

[Figure: Input x ∈ X, an item x drawn from an input space X → System y = f(x) → Output y ∈ Y, an item y drawn from an output space Y]


• In (supervised) machine learning, our goal is to learn, from examples, a function h() that approximates f().

Supervised learning

[Figure: Input x ∈ X, an item x drawn from an instance space X → Learned model y = h(x) → Output y ∈ Y, an item y drawn from a label space Y; the target function y = f(x) is shown alongside the learned model y = h(x)]

Supervised learning: Training

• Give the learner examples in D_train
• The learner returns a model h(x)

[Figure: Labeled Training Data D_train = (x1, y1), (x2, y2), …, (xN, yN) → Learning Algorithm → Learned model h(x)]

h(x) is the model we'll use in our application.

Can you suggest other learning protocols?
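A minimal Python sketch of the training protocol: a learning algorithm is a function that takes D_train and returns a model h. Nearest neighbor is used here only because it is short; it is an assumption for illustration, not the algorithm this section prescribes, and the four training points are made up.

```python
import math
from typing import Callable, Hashable, Sequence

Point = Sequence[float]


def learn_nearest_neighbor(d_train: list[tuple[Point, Hashable]]) -> Callable[[Point], Hashable]:
    """Learning algorithm: takes D_train = [(x1, y1), ..., (xN, yN)] and returns a model h."""
    examples = list(d_train)  # this particular learner simply memorizes the training data

    def h(x: Point) -> Hashable:
        # Predict the label of the training example closest to x (Euclidean distance).
        _, nearest_label = min(examples, key=lambda ex: math.dist(ex[0], x))
        return nearest_label

    return h


# Hypothetical toy training set: two clusters of points in a 2-D instance space.
D_train = [((0.0, 0.0), "-"), ((1.0, 0.5), "-"), ((5.0, 5.0), "+"), ((6.0, 4.5), "+")]
h = learn_nearest_neighbor(D_train)
print(h((0.5, 0.5)), h((5.5, 5.0)))
```

The returned closure is the "learned model h(x)" that an application would call; the learning algorithm itself is not needed after training.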

Function Approximation

Problem Setting:
• Set of possible instances X
• Set of possible labels Y
• Unknown target function $f : X \to Y$
• Set of function hypotheses $H = \{h \mid h : X \to Y\}$

Input: Training examples of unknown target function f: $\{\langle x_i, y_i \rangle\}_{i=1}^{n} = \{\langle x_1, y_1 \rangle, \ldots, \langle x_n, y_n \rangle\}$

Output: Hypothesis $h \in H$ that best approximates f

Based on slide by Tom Mitchell

Sample Dataset

• Columns denote features X_i
• Rows denote labeled instances ⟨x_i, y_i⟩
• Class label denotes whether a tennis game was played

[Figure: sample tennis dataset table]

Supervised learning: Testing

• Reserve some labeled data for testing

Labeled Test Data D_test: (x'1, y'1), (x'2, y'2), …, (x'M, y'M)


Test Labels Y_test: y'1, y'2, …, y'M

Raw Test Data X_test: x'1, x'2, …, x'M


Supervised learning: Testing

• Apply the model to the raw test data
• Evaluate by comparing predicted labels against the test labels

[Figure: Raw Test Data X_test → Learned model h(x) → Predicted Labels h(X_test) = h(x'1), h(x'2), …, h(x'M), compared against Test Labels Y_test]

Can you use the test data otherwise?
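The testing protocol can be sketched in Python as follows, assuming a random hold-out split and accuracy (the fraction of predicted labels that match the test labels) as the evaluation metric. The helper names, the 25% test fraction, the toy data, and the hand-written model standing in for a learned h(x) are all illustrative.

```python
import random
from typing import Callable, Sequence


def holdout_split(data: Sequence, test_fraction: float = 0.25, seed: int = 0):
    """Reserve a random fraction of the labeled data as D_test; the rest is D_train."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (D_train, D_test)


def accuracy(h: Callable, d_test: Sequence) -> float:
    """Apply h to the raw test data X_test and compare with the test labels Y_test."""
    x_test = [x for x, _ in d_test]
    y_test = [y for _, y in d_test]
    predicted = [h(x) for x in x_test]  # h(X_test)
    correct = sum(p == y for p, y in zip(predicted, y_test))
    return correct / len(y_test)


# Hypothetical labeled data: integers 0..19, labeled by whether they are >= 10.
data = [(x, x >= 10) for x in range(20)]
d_train, d_test = holdout_split(data)
# Hypothetical stand-in for a model learned from d_train.
print(len(d_train), len(d_test), accuracy(lambda x: x >= 10, d_test))
```

Keeping D_test disjoint from D_train is what makes the accuracy an estimate of performance on unseen inputs rather than on memorized ones.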

Supervised Learning: Examples

§ Disease diagnosis
  § x: Properties of patient (symptoms, lab tests)
  § f: Disease (or maybe: recommended therapy)
§ Part-of-Speech tagging
  § x: An English sentence (e.g., "The can will rust")
  § f: The part of speech of a word in the sentence
§ Face recognition
  § x: Bitmap picture of person's face
  § f: Name the person (or maybe: a property of the person)
§ Automatic Steering
  § x: Bitmap picture of road surface in front of car
  § f: Degrees to turn the steering wheel

Many problems that do not seem like classification problems can be decomposed into classification problems.

Key Issues in Machine Learning

• Modeling
  – How to formulate application problems as machine learning problems?
  – How to represent the data?
  – Learning protocols (where are the data and labels coming from?)
• Representation
  – What functions should we learn (hypothesis spaces)?
  – How to map raw input to an instance space?
  – Is there any rigorous way to find these? Any general approach?
• Algorithms
  – What are good algorithms?
  – How do we define success?
  – Generalization vs. overfitting
  – The computational problem

Using supervised learning

§ What is our instance space?
  § What kind of features are we using?
§ What is our label space?
  § What kind of learning task are we dealing with?
§ What is our hypothesis space?
  § What kind of functions (models) are we learning?
§ What learning algorithm do we use?
  § How do we learn the model from the labeled data?
§ What is our loss function/evaluation metric?
  § How do we measure success? What drives learning?

1. The instance space X

• Designing an appropriate instance space X is crucial for how well we can predict y.

1. The instance space X

§ When we apply machine learning to a task, we first need to define the instance space X.
§ Instances x ∈ X are defined by features:
  § Boolean features:
    § Is there a folder named after the sender?
    § Does this email contain the word 'class'?
    § Does this email contain the word 'waiting'?
    § Does this email contain the word 'class' and the word 'waiting'? (Does it add anything?)
  § Numerical features:
    § How often does 'learning' occur in this email?
    § How long is the email?
    § How many emails have I seen from this sender over the last day/week/month?
  § Bag of tokens:
    § Just list all the tokens in the input
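A minimal Python sketch of turning one raw email into some of these features. The tokenizer (a lowercase regular expression), measuring length in tokens, the folders argument, and all feature names are assumptions for illustration; the per-sender history feature is omitted because it needs a log of past mail.

```python
import re


def email_features(body: str, sender: str, folders: set[str]) -> dict:
    """Map a raw email to some of the Boolean and numerical features listed above."""
    # Assumption: a simple lowercase word tokenizer.
    tokens = re.findall(r"[a-z']+", body.lower())
    vocab = set(tokens)
    return {
        # Boolean features
        "folder_named_after_sender": sender in folders,
        "contains_class": "class" in vocab,
        "contains_waiting": "waiting" in vocab,
        "contains_class_and_waiting": "class" in vocab and "waiting" in vocab,
        # Numerical features
        "count_learning": tokens.count("learning"),
        "length_in_tokens": len(tokens),  # assumption: length measured in tokens
        # Bag of tokens: just list all the tokens in the input
        "tokens": tokens,
    }


features = email_features("Waiting for class to start. Learning, learning!",
                          sender="alice", folders={"alice", "work"})
print(features["contains_class_and_waiting"], features["count_learning"])
```

The conjunction feature is fully determined by the two single-word features, which is one way to read the question of whether it adds anything: it can help a learner that cannot combine features on its own, such as a linear model.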

What's X for the Badges game?

§ Possible features:
  § Gender
  § Name's country of origin
  § Length of their first or last name
  § Does the name contain the letter 'x'?
  § How many vowels does their name contain?
  § Is the n-th letter a vowel?
  § Does the name have the same number of vowels and consonants?
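Most of the candidate features above can be computed directly from the name string. This Python sketch covers the computable ones; gender and country of origin are left out because they would need outside data. Treating 'y' as a consonant, counting letters over the whole name, and the feature names are all assumptions.

```python
VOWELS = set("aeiou")  # assumption: 'y' is counted as a consonant


def badge_features(full_name: str, n: int = 1) -> dict:
    """Compute the name-based candidate features from the slide for one badge."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    # Letters of the whole name, ignoring spaces, '.', and '-'.
    letters = [c for c in full_name.lower() if c.isalpha()]
    n_vowels = sum(c in VOWELS for c in letters)
    n_consonants = len(letters) - n_vowels
    return {
        "first_name_length": len(first),
        "last_name_length": len(last),
        "contains_x": "x" in letters,
        "num_vowels": n_vowels,
        "nth_letter_is_vowel": len(letters) >= n and letters[n - 1] in VOWELS,
        "same_vowels_and_consonants": n_vowels == n_consonants,
    }


print(badge_features("Naoki Abe", n=2))
```

Each dictionary is one instance x; the choice of which entries to keep is exactly the choice of instance space X.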


X as a vector space

§ X is an N-dimensional vector space (e.g., ℝ^N)
  § Each dimension = one feature.
§ Each x is a feature vector (hence the boldface x).
§ Think of x = [x1 … xN] as a point in X:

[Figure: points in a two-dimensional instance space with axes x1 and x2]
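To make "each dimension = one feature" concrete, the Python sketch below maps a bag of tokens to a count vector over a fixed vocabulary, so that each email becomes a point x = [x1 … xN] with N equal to the vocabulary size. The vocabulary and the choice to drop out-of-vocabulary tokens are assumptions for illustration.

```python
def to_feature_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Map a bag of tokens to a point x = [x1 ... xN] in an N-dimensional space."""
    index = {word: i for i, word in enumerate(vocabulary)}  # one dimension per feature
    x = [0] * len(vocabulary)
    for token in tokens:
        if token in index:  # assumption: tokens outside the vocabulary are dropped
            x[index[token]] += 1
    return x


vocabulary = ["class", "learning", "waiting"]  # hypothetical feature set, N = 3
x = to_feature_vector("waiting for class learning learning".split(), vocabulary)
print(x)
```

Here x = [1, 2, 1]: one occurrence of 'class', two of 'learning', one of 'waiting'.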

Good features are essential

§ The choice of features is crucial for how well a task can be learned
§ In many application areas (language, vision, etc.), a lot of work goes into designing suitable features
§ This requires domain expertise
§ Think about the badges game: what if you were focusing on visual features?
§ We can't teach you what specific features to use for your task
  § But we will touch on some general principles

2. The label space Y

• The label space Y determines what kind of supervised learning task we are dealing with

Supervised learning tasks I

§ Output labels y ∈ Y are categorical:
  § Binary classification: two possible labels
  § Multi-class classification: k possible labels
§ Output labels y ∈ Y are structured objects (sequences of labels, parse trees, etc.):
  § Structure learning

Supervised learning tasks II

§ Output labels y ∈ Y are numerical:
  § Regression (linear/polynomial):
    § Labels are continuous-valued
    § Learn a linear/polynomial function f(x)
  § Ranking:
    § Labels are ordinal
    § Learn an ordering f(x1) > f(x2) over input
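A minimal Python sketch of the two numerical tasks, under stated assumptions: regression is done with ordinary least squares for a linear h(x) = w * x + b on synthetic noise-free data generated from y = 2x + 1, and ranking is approximated by sorting new inputs by the learned score. Dedicated ranking methods learn the ordering directly; sorting by a regression score is only the simplest stand-in.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for a linear hypothesis h(x) = w * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w = s_xy / s_xx
    b = mean_y - w * mean_x
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_line(xs, ys)


def h(x: float) -> float:
    """Learned regression model."""
    return w * x + b


# Ranking stand-in: order new inputs by their learned score, highest first.
ranking = sorted([3.5, 0.5, 2.0], key=h, reverse=True)
print(w, b, ranking)
```

On this data the fit recovers w = 2 and b = 1, and the inputs are ranked 3.5, 2.0, 0.5.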


3. The model h(x)

• We need to choose what kind of model we want to learn

A Learning Problem

[Figure: inputs x1, x2, x3, x4 → unknown function → y = f(x1, x2, x3, x4)]

Example  x1  x2  x3  x4  y
1        0   0   1   0   0
2        0   1   0   0   0
3        0   0   1   1   1
4        1   0   0   1   1
5        0   1   1   0   0
6        1   1   0   0   0
7        0   1   0   1   0

Can you learn this function? What is it?

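Hypothesis Space

Complete Ignorance: There are 2^4 = 16 possible inputs, and each can be labeled 0 or 1, so there are 2^16 = 65536 possible Boolean functions over four input features.

We can't figure out which one is correct until we've seen every possible input-output pair.

After observing seven examples, 9 of the 16 inputs are still unlabeled, so we still have 2^9 = 512 possibilities for f.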
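The counts above can be checked by brute force. This Python sketch (the names are illustrative) enumerates every Boolean function of four inputs as a full truth table, then keeps those that agree with the seven examples from the table.

```python
from itertools import product

# The seven training examples from the table: (x1, x2, x3, x4) -> y
EXAMPLES = {
    (0, 0, 1, 0): 0,
    (0, 1, 0, 0): 0,
    (0, 0, 1, 1): 1,
    (1, 0, 0, 1): 1,
    (0, 1, 1, 0): 0,
    (1, 1, 0, 0): 0,
    (0, 1, 0, 1): 0,
}

INPUTS = list(product([0, 1], repeat=4))  # all 2^4 = 16 possible inputs


def all_boolean_functions():
    """Yield every function f: {0,1}^4 -> {0,1} as a dict (its full truth table)."""
    for outputs in product([0, 1], repeat=len(INPUTS)):  # 2^16 truth tables
        yield dict(zip(INPUTS, outputs))


def count_consistent(examples):
    """Count the functions that agree with every observed (x, y) pair."""
    return sum(all(f[x] == y for x, y in examples.items())
               for f in all_boolean_functions())


print(len(INPUTS), 2 ** len(INPUTS), count_consistent(EXAMPLES))
```

It prints 16, 65536 and 512: each of the 9 unlabeled inputs can still go either way.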

Is Learning Possible?

Example  x1  x2  x3  x4  y
1        0   0   0   0   ?
2        0   0   0   1   ?
3        0   0   1   0   0
4        0   0   1   1   1
5        0   1   0   0   0
6        0   1   0   1   0
7        0   1   1   0   0
8        0   1   1   1   ?
9        1   0   0   0   ?
10       1   0   0   1   1
11       1   0   1   0   ?
12       1   0   1   1   ?
13       1   1   0   0   0
14       1   1   0   1   ?
15       1   1   1   0   ?
16       1   1   1   1   ?

§ There are |Y|^|X| possible functions f(x) from the instance space X to the label space Y.
§ Learners typically consider only a subset of the functions from X to Y, called the hypothesis space H: H ⊆ Y^X, where Y^X is the set of all functions from X to Y (so |H| ≤ |Y|^|X|).

General strategies for Machine Learning

§ Develop flexible hypothesis spaces:
  § Decision trees, neural networks, nested collections
  § Constraining the hypothesis space is done algorithmically
§ Develop representation languages for restricted classes of functions:
  § Serve to limit the expressivity of the target models
  § E.g., functional representation (n-of-m); grammars; linear functions; stochastic models
  § Get flexibility by augmenting the feature space
§ In either case:
  § Develop algorithms for finding a hypothesis in our hypothesis space that fits the data
  § And hope that they will generalize well
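As one concrete instance of a restricted representation language, the Python sketch below takes the "n-of-m" style rules mentioned above, written here as "y = 1 iff at least m of the variables in a set S are 1", and searches that hypothesis space for rules consistent with the seven examples from the learning problem. The rule family is one choice for illustration, and the helper names are hypothetical.

```python
from itertools import combinations

# (x1, x2, x3, x4), y  from the "A Learning Problem" table
EXAMPLES = [
    ((0, 0, 1, 0), 0), ((0, 1, 0, 0), 0), ((0, 0, 1, 1), 1), ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0), ((1, 1, 0, 0), 0), ((0, 1, 0, 1), 0),
]


def m_of_n_rules(num_vars=4):
    """Enumerate rules (S, m): 'y = 1 iff at least m of the variables in S are 1'."""
    for size in range(1, num_vars + 1):
        for subset in combinations(range(num_vars), size):
            for m in range(1, size + 1):
                yield subset, m


def predict(rule, x):
    """Apply one m-of-n rule to an input vector x."""
    subset, m = rule
    return int(sum(x[i] for i in subset) >= m)


rules = list(m_of_n_rules())
consistent = [r for r in rules if all(predict(r, x) == y for x, y in EXAMPLES)]
for subset, m in consistent:
    print(f"y = 1 iff at least {m} of", [f"x{i + 1}" for i in subset], "are 1")
print(len(rules), "rules in H;", len(consistent), "consistent")
```

The restricted space has only 32 rules instead of 65536 functions, and exactly one of them fits all seven examples: at least 2 of x1, x3, x4 are 1. Whether it generalizes to the nine unseen inputs is the "hope" in the last bullet.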
