CS4705 Probability Review and Naïve Bayes Slides from Dragomir Radev and modified

CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class participation using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on optimization


Announcements
• Reading for today: C.4, 4.5 NLP
• Reading for next class: C3, NLP
• Next class will be taught by Chris Kedzie
• For new students in class:
  • No laptop policy
  • Class participation using PollEverywhere or in-class comments

Today
• SciKit Learn Tutorial
• Wrap up on optimization
• Generative methods

Regularization
• Consider the case where one or more documents are mis-labeled
  • Text from a novel may be mis-labeled as social media if posted as a quote
• The classifier will attempt to learn weights that promote words characteristic of novels as predictors of social media
• Overfitting can also occur when the social media documents in the training set are not representative

Loss
• To prevent overfitting, a regularization term R(Θ) is added to the loss:
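One common way to write such a regularized training objective is sketched below. The per-example loss L, the model f(x; Θ), the n training pairs (x_i, y_i), and the weighting hyperparameter λ are assumptions introduced for illustration; none of them is defined on this slide.

```latex
\hat{\Theta} = \arg\min_{\Theta} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(f(x_i; \Theta),\, y_i\bigr) + \lambda R(\Theta)
```

Under this form, a larger λ puts more weight on keeping R(Θ) small and less on fitting the training data exactly.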

Two Common Regularizers
• L2 regularization
  • Keeps the sum of squares of the parameter values low
  • Gaussian prior or weight decay (here W is the weights, not including b)
  • Prefers to decrease one parameter with a high weight by 1 rather than 10 parameters with low weights by 0.1 each
• L1 regularization
  • Keeps the sum of absolute values of the parameters low
  • Punishes high and low values uniformly
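The difference between the two penalties can be checked numerically. The sketch below uses a made-up weight vector (one weight of 10 and ten weights of 1); the function names and numbers are illustrative assumptions, not part of the course material.

```python
def l2_penalty(weights):
    """Sum of squared weights (the bias b is assumed to be kept out of `weights`)."""
    return sum(w * w for w in weights)


def l1_penalty(weights):
    """Sum of absolute weight values."""
    return sum(abs(w) for w in weights)


# Hypothetical weight vector: one large weight and ten small ones.
weights = [10.0] + [1.0] * 10

# Option A: shrink the single large weight by 1.
shrink_large = [9.0] + [1.0] * 10
# Option B: shrink each of the ten small weights by 0.1 (same total change).
shrink_small = [10.0] + [0.9] * 10

# How much each option lowers the penalty.
l2_gain_large = l2_penalty(weights) - l2_penalty(shrink_large)
l2_gain_small = l2_penalty(weights) - l2_penalty(shrink_small)
l1_gain_large = l1_penalty(weights) - l1_penalty(shrink_large)
l1_gain_small = l1_penalty(weights) - l1_penalty(shrink_small)
```

With these numbers the L2 penalty drops far more when the large weight is reduced, while the L1 penalty drops by the same amount in both cases.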

Gradient-Based Optimization
• Repeat until L (loss) < margin:
  • Compute L over the training set
  • Compute the gradients of L with respect to Θ
  • Move the parameters in the opposite direction of the gradient
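The loop above can be sketched for a toy problem. The one-parameter model y = θ·x, the squared-error loss, the learning rate, and the step cap are all assumptions chosen to keep the example small; they are not taken from the slides.

```python
def loss(theta, data):
    """Mean squared error of the one-parameter model y = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def gradient(theta, data):
    """Derivative of the loss with respect to theta."""
    return sum(2 * (theta * x - y) * x for x, y in data) / len(data)


def gradient_descent(data, theta=0.0, learning_rate=0.05, margin=1e-6, max_steps=10_000):
    for _ in range(max_steps):
        if loss(theta, data) < margin:       # stop once L < margin
            break
        grad = gradient(theta, data)         # gradient of L with respect to theta
        theta -= learning_rate * grad        # move against the gradient
    return theta


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy data generated by y = 2x
theta_hat = gradient_descent(data)
```

On this noise-free data the estimate settles close to θ = 2. The `max_steps` cap is a safety net in case the loss never falls below the margin.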

Stochastic Gradient Descent

Problem
• Error is calculated based on just one training sample
• May not be representative of corpus-wide loss
• Instead calculate the error based on a set of training examples: minibatch
• -> Minibatch stochastic gradient descent
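A minimal sketch of minibatch stochastic gradient descent follows, assuming the same kind of toy one-parameter model y = θ·x with a squared-error loss. The batch size, learning rate, epoch count, and seed are illustrative choices, not values from the course.

```python
import random


def minibatch_gradient(theta, batch):
    """Gradient of the mean squared error of y = theta * x over one minibatch."""
    return sum(2 * (theta * x - y) * x for x, y in batch) / len(batch)


def minibatch_sgd(data, batch_size=2, learning_rate=0.02, epochs=200, seed=0):
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    data = list(data)
    theta = 0.0
    for _ in range(epochs):
        rng.shuffle(data)            # visit the examples in a new order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Update from the error on this minibatch only.
            theta -= learning_rate * minibatch_gradient(theta, batch)
    return theta


data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]   # y = 3x, noise-free
theta_hat = minibatch_sgd(data)
```

Setting `batch_size=1` gives single-sample stochastic gradient descent, and setting it to `len(data)` gives full-batch gradient descent.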

Computing Gradients

Summary
• Smoothing helps to account for zero-valued n-grams
• Text classification using feature vectors representing n-grams and other properties
• Discriminative learning
• Methods for optimization, loss functions and regularization

Classification using a Generative Approach
• Start with Naïve Bayes and Maximum Likelihood Estimation
• But we need some background in probability first

Probabilities in NLP
• Very important for language processing
• Example in speech recognition:
  • “recognize speech” vs “wreck a nice beach”
• Example in machine translation:
  • “l’avocat general”: “the attorney general” vs. “the general avocado”
• Example in information retrieval:
  • If a document includes three occurrences of “stir” and one of “rice”, what is the probability that it is a recipe?
• Probabilities make it possible to combine evidence from multiple sources systematically

Probabilities
• Probability theory
  • predicting how likely it is that something will happen
• Experiment (trial)
  • e.g., throwing a coin
• Possible outcomes
  • heads or tails
• Sample spaces
  • discrete (number of “rice”) or continuous (e.g., temperature)
• Events
  • Ω is the certain event
  • ∅ is the impossible event
  • event space - all possible events

Sample Space
• Random experiment: an experiment with uncertain outcome
  • e.g., flipping a coin, picking a word from text
• Sample space: all possible outcomes, e.g.,
  • Tossing 2 fair coins, Ω = {HH, HT, TH, TT}

Events
• Event: a subset of the sample space
  • E ⊆ Ω, E happens iff the outcome is in E, e.g.,
  • E = {HH} (all heads)
  • E = {HH, TT} (same face)
• Probability of an event: 0 ≤ P(E) ≤ 1, s.t.
  • P(Ω) = 1 (outcome always in Ω)
  • P(A ∪ B) = P(A) + P(B), if (A ∩ B) = ∅ (e.g., A = same face, B = different face)
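These definitions can be checked on the two-coin example by treating events as Python sets. This is a small sketch that assumes all four outcomes are equally likely; the helper name `prob` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins: every outcome is equally likely.
omega = {"".join(faces) for faces in product("HT", repeat=2)}


def prob(event):
    """P(E) for an event E (a subset of omega) under equally likely outcomes."""
    assert event <= omega, "an event must be a subset of the sample space"
    return Fraction(len(event), len(omega))


all_heads = {"HH"}
same_face = {"HH", "TT"}
different_face = {"HT", "TH"}

p_all_heads = prob(all_heads)
p_same = prob(same_face)
# same_face and different_face are disjoint, so their probabilities add up.
p_union = prob(same_face | different_face)
```

Using `Fraction` keeps the probabilities exact, so the additivity rule can be compared with `==` rather than with a floating-point tolerance.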

Example: Toss a Die
• Sample space: Ω = {1, 2, 3, 4, 5, 6}
• Fair die:
  • p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6
• Unfair die: p(1) = 0.3, p(2) = 0.2, ...
• N-sided die:
  • Ω = {1, 2, 3, 4, …, N}
• Example in modeling text:
  • Toss a die to decide which word to write in the next position
  • Ω = {cat, dog, tiger, …}
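The text-modeling example amounts to sampling from a categorical distribution over words. The sketch below assumes a three-word vocabulary with made-up probabilities; only the idea of one "die toss" per position comes from the slide.

```python
import random
from collections import Counter

# Hypothetical word probabilities (an "unfair die" over a three-word vocabulary).
word_probs = {"cat": 0.5, "dog": 0.3, "tiger": 0.2}


def generate(n_words, seed=0):
    """Draw n_words independently, one 'die toss' per position."""
    rng = random.Random(seed)
    words = list(word_probs)
    weights = [word_probs[w] for w in words]
    return rng.choices(words, weights=weights, k=n_words)


sample = generate(10_000)
# Relative frequencies should land near the probabilities assigned above.
frequencies = {w: c / len(sample) for w, c in Counter(sample).items()}
```

Because each position is drawn independently, the generated text has no word order structure; it only reflects how often each word is expected to appear.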

Example: Flip a Coin
• Ω: {Head, Tail}
• Fair coin:
  • p(H) = 0.5, p(T) = 0.5
• Unfair coin, e.g.:
  • p(H) = 0.3, p(T) = 0.7
• Flipping two fair coins:
  • Sample space: {HH, HT, TH, TT}
• Example in modeling text:
  • Flip a coin to decide whether or not to include a word in a document
  • Sample space = {appear, absence}

Probabilities
• Probabilities
  • numbers between 0 and 1
• Probability distribution
  • distributes a probability mass of 1 throughout the sample space Ω.
• Example:
  • A fair coin is tossed three times.
  • What is the probability of 3 heads?
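One way to work the three-toss example is to enumerate the sample space. The sketch below assumes the tosses are fair and independent, so all eight outcomes carry equal mass.

```python
from fractions import Fraction
from itertools import product

# All 2**3 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))
three_heads = [o for o in outcomes if o == ("H", "H", "H")]
p_three_heads = Fraction(len(three_heads), len(outcomes))

# The probability mass spread over the whole sample space sums to 1.
total_mass = sum(Fraction(1, len(outcomes)) for _ in outcomes)
```

Only one of the eight outcomes is HHH, so the enumeration gives 1/8, which matches 0.5 × 0.5 × 0.5.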

Probabilities
• Joint probability: P(A ∩ B), also written as P(A, B)
• Conditional probability: P(A|B) = P(A ∩ B) / P(B)
• P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
• So, P(A|B) = P(B|A) P(A) / P(B) (Bayes’ Rule)
• For independent events, P(A ∩ B) = P(A) P(B), so P(A|B) = P(A)
• Total probability: If A1, …, An form a partition of S, then
  • P(B) = P(B ∩ S) = P(B, A1) + … + P(B, An)
  • So, P(Ai|B) = P(B|Ai) P(Ai) / P(B) = P(B|Ai) P(Ai) / [P(B|A1) P(A1) + … + P(B|An) P(An)]
  • This allows us to compute P(Ai|B) based on P(B|Ai)
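The last formula can be sketched in a few lines. The two-class partition and all of the numbers below are hypothetical, chosen only to show how the denominator is assembled from the total probability rule.

```python
from fractions import Fraction

# Hypothetical partition of documents into classes, with made-up numbers.
prior = {"recipe": Fraction(1, 10), "other": Fraction(9, 10)}       # P(Ai)
# P(B | Ai): chance that a document of each class contains the word "stir".
likelihood = {"recipe": Fraction(6, 10), "other": Fraction(1, 20)}


def posterior(prior, likelihood):
    """Return P(Ai | B) for every Ai using Bayes' rule and total probability."""
    # Total probability: P(B) = sum over i of P(B | Ai) P(Ai)
    p_b = sum(likelihood[a] * prior[a] for a in prior)
    return {a: likelihood[a] * prior[a] / p_b for a in prior}


post = posterior(prior, likelihood)
```

With these assumed numbers, seeing "stir" raises the probability of the recipe class from 1/10 to 4/7, and the posteriors over the partition still sum to 1.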


Properties of Probabilities
• p(∅) = 0
• P(certain event) = 1
• p(X) ≤ p(Y), if X ⊆ Y
• p(X ∪ Y) = p(X) + p(Y), if X ∩ Y = ∅

Conditional Probability
• Prior and posterior probability
• Conditional probability

P(A|B) = P(A ∩ B) / P(B)

[Figure: Venn diagram of events A and B inside Ω, overlapping in A ∩ B]

Conditional Probability
• Six-sided fair die
  • P(D even) = ?
  • P(D >= 4) = ?
  • P(D even | D >= 4) = ?
  • P(D odd | D >= 4) = ?
• Multiple conditions
  • P(D odd | D >= 4, D <= 5) = ?
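These questions can be answered by counting outcomes. The sketch below assumes a fair die and represents each event as a set; the helper `p` is a made-up name that applies P(A|B) = P(A ∩ B) / P(B) with equally likely outcomes.

```python
from fractions import Fraction

omega = set(range(1, 7))          # six-sided fair die


def p(event, given=None):
    """P(event | given) with equally likely outcomes; given defaults to omega."""
    given = omega if given is None else given
    return Fraction(len(event & given), len(given))


even = {d for d in omega if d % 2 == 0}
odd = omega - even
at_least_4 = {d for d in omega if d >= 4}
at_most_5 = {d for d in omega if d <= 5}

answers = {
    "P(D even)": p(even),                                   # 1/2
    "P(D >= 4)": p(at_least_4),                             # 1/2
    "P(D even | D >= 4)": p(even, at_least_4),              # 2/3
    "P(D odd | D >= 4)": p(odd, at_least_4),                # 1/3
    # Multiple conditions are the intersection of the conditioning events.
    "P(D odd | D >= 4, D <= 5)": p(odd, at_least_4 & at_most_5),   # 1/2
}
```

The conditioning event must be non-empty, which mirrors the requirement P(B) > 0 in the definition.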


Independence
• Two events are independent when P(A ∩ B) = P(A) P(B)
• Unless P(B) = 0 this is equivalent to saying that P(A) = P(A|B)
• If two events are not independent, they are considered dependent
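The definition can be tested directly on a fair six-sided die. The events below are chosen for illustration: one pair happens to be independent and the other is not.

```python
from fractions import Fraction

omega = set(range(1, 7))          # six-sided fair die


def p(event):
    """P(event) with equally likely outcomes."""
    return Fraction(len(event), len(omega))


def independent(a, b):
    """True when P(A ∩ B) = P(A) P(B)."""
    return p(a & b) == p(a) * p(b)


even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
at_least_4 = {4, 5, 6}

even_vs_at_most_4 = independent(even, at_most_4)     # 1/3 on both sides
even_vs_at_least_4 = independent(even, at_least_4)   # 1/3 versus 1/4
```

Knowing that the roll is at most 4 leaves the chance of an even number at 1/2, while knowing that it is at least 4 raises that chance to 2/3, so only the first pair is independent.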

[slide from Brendan O’Connor]

Naïve Bayes Classifier
• We use Bayes’ rule:
  • P(C|D) = P(D|C) P(C) / P(D), where C = class, D = document
• We can simplify and ignore P(D) since it is independent of class choice
  • P(C|D) ∝ P(D|C) P(C) ≈ P(C) Π_{i=1..n} P(wi|C)
  • This estimates the probability of D being in class C, assuming that D has n tokens and wi is the i-th token in D.

Use Labeled Training Data
• P(C) is estimated as the number of labeled documents in the class / total number of documents:
  • P(C) = Dc / D
• P(wi|C) is estimated as the number of times wi occurs with label C / the number of times all words in the vocabulary (V) occur with label C:
  • P(wi|C) = Count(wi, C) / Σ_{vi ∈ V} Count(vi, C)
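Both estimates reduce to counting. The sketch below assumes a tiny, made-up training set of tokenized documents; the function names are illustrative and no smoothing is applied yet.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical labeled training set: (class label, tokenized document).
training = [
    ("recipe", ["stir", "rice", "stir"]),
    ("recipe", ["boil", "rice"]),
    ("news", ["election", "vote", "rice"]),
]

doc_counts = Counter(label for label, _ in training)
word_counts = {label: Counter() for label in doc_counts}
for label, tokens in training:
    word_counts[label].update(tokens)


def p_class(c):
    """P(C) = Dc / D"""
    return Fraction(doc_counts[c], len(training))


def p_word_given_class(w, c):
    """P(w | C) = Count(w, C) / sum over v in V of Count(v, C)"""
    return Fraction(word_counts[c][w], sum(word_counts[c].values()))
```

In this toy data "stir" never occurs with the label "news", so its unsmoothed estimate for that class is exactly zero, which would zero out the whole product for any document containing it.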

Multinomial Naïve Bayes Independence Assumptions
• Bag of words assumption
  • Assume position doesn’t matter
• Conditional independence
  • Assume the feature probabilities P(wi|c) are independent given the class c.

P(w1, …, wn | C) = Π_{i=1..n} P(wi|C)

[Jurafsky and Martin]

Multinomial Naïve Bayes Classifier
• C_MAP = argmax_C P(w1, …, wn | C) P(C)
• C_NB = argmax_Cj P(Cj) Π_{w ∈ W} P(w | Cj)

This is why it’s naïve!

[Jurafsky and Martin]
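The argmax can be sketched as follows. The priors and word probabilities are hypothetical and assumed to be already estimated with no zeros. Summing log-probabilities is swapped in for the product here: it picks the same class because the logarithm is monotonic, and it avoids numerical underflow on long documents.

```python
import math

# Hypothetical, already-estimated parameters (no zero probabilities).
priors = {"recipe": 0.4, "news": 0.6}
word_probs = {
    "recipe": {"stir": 0.5, "rice": 0.4, "vote": 0.1},
    "news": {"stir": 0.1, "rice": 0.2, "vote": 0.7},
}


def log_score(tokens, c):
    """log P(c) + sum of log P(w | c): the log of the Naive Bayes product."""
    return math.log(priors[c]) + sum(math.log(word_probs[c][w]) for w in tokens)


def classify(tokens):
    """Return the class with the highest score (the argmax over classes)."""
    return max(priors, key=lambda c: log_score(tokens, c))


label = classify(["stir", "rice", "stir"])
```

This sketch assumes every token is in the vocabulary; an unseen word would raise a `KeyError` and needs separate handling.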

Laplace Smoothing: Needed because counts may be zero

Maximum likelihood estimate:
P̂(wi | c) = count(wi, c) / Σ_{w ∈ V} count(w, c)

With add-one (Laplace) smoothing:
P̂(wi | c) = (count(wi, c) + 1) / Σ_{w ∈ V} (count(w, c) + 1)
          = (count(wi, c) + 1) / ((Σ_{w ∈ V} count(w, c)) + |V|)

[Jurafsky and Martin]
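The effect of add-one smoothing can be seen on a small example. The counts and vocabulary below are made up for illustration; "vote" is in the vocabulary but never occurs with class c.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical counts of words seen with class c, and an assumed vocabulary V.
counts_c = Counter({"stir": 2, "rice": 2, "boil": 1})
vocabulary = {"stir", "rice", "boil", "vote"}      # "vote" never occurs with c


def p_mle(w):
    """Unsmoothed estimate: count(w, c) / sum of counts over V."""
    return Fraction(counts_c[w], sum(counts_c[v] for v in vocabulary))


def p_laplace(w):
    """Add-one estimate: (count(w, c) + 1) / (sum of counts over V + |V|)."""
    total = sum(counts_c[v] for v in vocabulary)
    return Fraction(counts_c[w] + 1, total + len(vocabulary))
```

The unsmoothed estimate for "vote" is 0, while the smoothed estimate is 1/9, and the smoothed estimates still sum to 1 over the vocabulary.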


Questions?

SciKit Learn