
CS4705: Probability Review and Naïve Bayes
Slides from Dragomir Radev, modified

Announcements
• Reading for today: Ch. 4, 4.5, NLP
• Reading for next class: Ch. 3, NLP

• Next class will be taught by Chris Kedzie
• For new students in class:
  • No laptop policy
  • Class participation using Poll Everywhere or in-class comments

Today
• SciKit Learn Tutorial
• Wrap up on optimization
• Generative methods

Regularization
• Consider the case where one or more documents are mis-labeled
  • Text from a novel may be mis-labeled as social media if posted as a quote

• The classifier will attempt to learn weights that promote words characteristic of novels as predictors of social media
• Overfitting can also occur when the social media documents in the training set are not representative

Loss
• To prevent overfitting, a regularization term R(Θ), weighted by λ, is added to the loss:
  L̂(Θ) = L(Θ) + λR(Θ)

Two common regularizers
• L2 regularization
  • Keeps the sum of squares of the parameter values low

  • Gaussian prior or weight decay (here W is the weights, not including b)
  • Prefers to decrease one parameter with a high weight by 1 rather than ten parameters with low weights

• L1 regularization
  • Keeps the sum of absolute values of the parameters low
  • Punishes high and low values uniformly
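A minimal sketch (not from the slides) of choosing L1 vs. L2 regularization in scikit-learn's LogisticRegression. The documents, labels, and the C value are hypothetical, purely for illustration.

```python
# Sketch: L1 vs. L2 regularization with scikit-learn (toy, hypothetical data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["stir the rice gently", "posted a quote from the novel"]   # toy documents
labels = ["recipe", "social_media"]                                 # toy labels

X = CountVectorizer().fit_transform(docs)   # bag-of-words features

# L2 (weight decay): keeps the sum of squared weights low.
clf_l2 = LogisticRegression(penalty="l2", C=1.0)

# L1: keeps the sum of absolute weights low; tends to drive some weights to exactly zero.
clf_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)

clf_l2.fit(X, labels)
clf_l1.fit(X, labels)
```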

Gradient-based optimization
• Repeat until L (loss) < margin:
  • Compute L over the training set
  • Compute the gradients of L with respect to Θ
  • Move the parameters in the opposite direction of the gradient
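A minimal sketch of this loop for a single weight vector, assuming a squared-error loss; the toy data, learning rate, and stopping margin are illustrative, not from the slides.

```python
import numpy as np

# Toy data (hypothetical): X is (n_examples, n_features), y is (n_examples,).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])

theta = np.zeros(X.shape[1])     # parameters Θ
lr, margin = 0.1, 1e-3           # learning rate and stopping margin (illustrative)

for _ in range(10_000):          # cap on iterations, in case the margin is never reached
    preds = X @ theta
    loss = np.mean((preds - y) ** 2)        # compute L over the training set
    if loss < margin:                        # repeat until L (loss) < margin
        break
    grad = 2 * X.T @ (preds - y) / len(y)    # gradient of L with respect to Θ
    theta -= lr * grad                       # move opposite to the gradient
```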

Stochastic Gradient Descent

Problem
• Error is calculated based on just one training sample
• May not be representative of corpus-wide loss
• Instead, calculate the error based on a set of training examples: a minibatch
• → Minibatch stochastic gradient descent
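A sketch of the same loop modified to use minibatches, again with hypothetical data and hyperparameters: each step estimates the gradient from a randomly sampled subset rather than a single example or the full training set.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # hypothetical training set
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])   # hypothetical targets

theta = np.zeros(X.shape[1])
lr, batch_size, n_steps = 0.05, 16, 500        # illustrative hyperparameters

for _ in range(n_steps):
    idx = rng.choice(len(X), size=batch_size, replace=False)   # sample a minibatch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ theta - yb) / batch_size           # loss gradient on the minibatch
    theta -= lr * grad                                          # gradient step
```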

Computing Gradients

Summary
• Smoothing helps to account for zero-valued n-grams
• Text classification using feature vectors representing n-grams and other properties
• Discriminative learning
• Methods for optimization, loss functions and regularization

Classification using a Generative Approach
• Start with Naïve Bayes and maximum likelihood estimation
• But we need some background in probability first

Probabilities in NLP
• Very important for language processing
• Example in speech recognition:
  • "recognize speech" vs. "wreck a nice beach"

• Example in machine translation:
  • "l'avocat general": "the attorney general" vs. "the general avocado"

• Example in information retrieval:
  • If a document includes three occurrences of "stir" and one of "rice", what is the probability that it is a recipe?

• Probabilities make it possible to combine evidence from multiple sources systematically

Probabilities
• Probability theory
  • predicting how likely it is that something will happen

• Experiment (trial)
  • e.g., tossing a coin

• Possible outcomes
  • heads or tails

• Sample spaces
  • discrete (e.g., number of occurrences of "rice") or continuous (e.g., temperature)

• Events
  • Ω is the certain event
  • ∅ is the impossible event
  • event space: all possible events

Sample Space
• Random experiment: an experiment with an uncertain outcome
  • e.g., flipping a coin, picking a word from text
• Sample space: all possible outcomes, e.g.,
  • tossing 2 fair coins: Ω = {HH, HT, TH, TT}

Events
• Event: a subset of the sample space
  • E ⊆ Ω; E happens iff the outcome is in E, e.g.,
  • E = {HH} (all heads)
  • E = {HH, TT} (same face)

• Probability of an event: 0 ≤ P(E) ≤ 1, s.t.
  • P(Ω) = 1 (the outcome is always in Ω)
  • P(A∪B) = P(A) + P(B), if A∩B = ∅ (e.g., A = same face, B = different face)

Example: Toss a Die

• Sample space: Ω = {1, 2, 3, 4, 5, 6}
• Fair die:
  • p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6

• Unfair die: p(1) = 0.3, p(2) = 0.2, ...
• N-dimensional die:
  • Ω = {1, 2, 3, 4, …, N}

• Example in modeling text:
  • Toss a die to decide which word to write in the next position
  • Ω = {cat, dog, tiger, …}
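A minimal sketch of this "word die" idea, with a hypothetical vocabulary and made-up probabilities: each position in a text is filled by rolling a weighted die over the vocabulary.

```python
import random

# Hypothetical vocabulary and die weights; each "roll" picks the next word.
vocab = ["cat", "dog", "tiger"]
probs = [0.5, 0.3, 0.2]          # must sum to 1

# Generate a 10-word "text" by rolling the die once per position.
text = random.choices(vocab, weights=probs, k=10)
print(" ".join(text))
```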

Example: Flip a Coin
• Ω = {Head, Tail}
• Fair coin:
  • p(H) = 0.5, p(T) = 0.5
• Unfair coin, e.g.:
  • p(H) = 0.3, p(T) = 0.7
• Flipping two fair coins:
  • Sample space: {HH, HT, TH, TT}

• Example in modeling text:
  • Flip a coin to decide whether or not to include a word in a document
  • Sample space = {appear, absence}

Probabilities

• Probabilities
  • numbers between 0 and 1

• Probability distribution
  • distributes a probability mass of 1 throughout the sample space Ω

• Example:
  • A fair coin is tossed three times.
  • What is the probability of 3 heads?
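Worked answer (assuming the three tosses of the fair coin are independent): P(HHH) = P(H) · P(H) · P(H) = 0.5 × 0.5 × 0.5 = 1/8.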

Probabilities

• Joint probability: P(A∩B), also written as P(A,B)
• Conditional probability: P(A|B) = P(A∩B) / P(B)
• P(A∩B) = P(A)P(B|A) = P(B)P(A|B)
• So, P(A|B) = P(B|A)P(A) / P(B)  (Bayes' rule)
• For independent events, P(A∩B) = P(A)P(B), so P(A|B) = P(A)

• Total probability: If A1, …, An form a partition of S, then
  • P(B) = P(B∩S) = P(B,A1) + … + P(B,An)
  • So, P(Ai|B) = P(B|Ai)P(Ai) / P(B) = P(B|Ai)P(Ai) / [P(B|A1)P(A1) + … + P(B|An)P(An)]
  • This allows us to compute P(Ai|B) based on P(B|Ai)

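A small sketch (with made-up numbers) of Bayes' rule combined with total probability: compute P(Ai|B) from the priors P(Ai) and the likelihoods P(B|Ai).

```python
# Hypothetical priors P(Ai) over a partition A1, A2, A3 and likelihoods P(B|Ai).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.5, 0.1]

# Total probability: P(B) = P(B|A1)P(A1) + ... + P(B|An)P(An)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' rule: P(Ai|B) = P(B|Ai)P(Ai) / P(B)
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]

print(p_b)           # 0.62
print(posteriors)    # the posteriors sum to 1
```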

Properties of Probabilities
• p(∅) = 0
• P(certain event) = 1
• p(X) ≤ p(Y), if X ⊆ Y
• p(X∪Y) = p(X) + p(Y), if X∩Y = ∅

Conditional Probability

• Prior and posterior probability
• Conditional probability

P(A|B) = P(A∩B) / P(B)

[Venn diagram: sample space Ω containing overlapping events A and B, with the intersection A∩B shaded]

Conditional Probability
• Six-sided fair die
• P(D even) = ?
• P(D ≥ 4) = ?
• P(D even | D ≥ 4) = ?
• P(D odd | D ≥ 4) = ?
• Multiple conditions
• P(D odd | D ≥ 4, D ≤ 5) = ?
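A brute-force sketch that enumerates the six equally likely outcomes to check these conditional probabilities (the helper function and fraction printing are just for readability):

```python
from fractions import Fraction

outcomes = range(1, 7)   # six-sided fair die, each outcome has probability 1/6

def prob(event, given=lambda d: True):
    """P(event | given), by counting equally likely outcomes."""
    cond = [d for d in outcomes if given(d)]
    return Fraction(sum(1 for d in cond if event(d)), len(cond))

print(prob(lambda d: d % 2 == 0))                               # P(D even) = 1/2
print(prob(lambda d: d >= 4))                                    # P(D >= 4) = 1/2
print(prob(lambda d: d % 2 == 0, given=lambda d: d >= 4))        # P(D even | D >= 4) = 2/3
print(prob(lambda d: d % 2 == 1, given=lambda d: d >= 4))        # P(D odd | D >= 4) = 1/3
print(prob(lambda d: d % 2 == 1, given=lambda d: 4 <= d <= 5))   # P(D odd | D >= 4, D <= 5) = 1/2
```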

Independence

• Two events are independent when P(A∩B) = P(A)P(B)

• Unless P(B) = 0, this is equivalent to saying that P(A) = P(A|B)
• If two events are not independent, they are considered dependent

[slide from Brendan O'Connor]

Naïve Bayes Classifier
• We use Bayes' rule:
  • P(C|D) = P(D|C)P(C) / P(D), where C = class, D = document

• We can simplify and ignore P(D) since it is independent of the class choice
  • P(C|D) ∝ P(D|C)P(C) ≈ P(C) Π P(wi|C), i = 1, …, n
  • This estimates the probability of D being in class C, assuming that D has n tokens and wi is a token in D.

Use Labeled Training Data
• P(C) is estimated as the number of labeled documents in the class divided by the total number of documents:

  P(C) = Dc / D
• P(wi|C) is estimated as the number of times wi occurs with label C divided by the number of times all words in the vocabulary (V) occur with label C:

  P(wi|C) = Count(wi, C) / Σ Count(v, C),  v ∈ V
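A minimal counting sketch of these two estimates on a hypothetical labeled corpus (the documents and labels are made up; no smoothing yet):

```python
from collections import Counter, defaultdict

# Hypothetical labeled training data: (tokens, class label).
corpus = [
    (["stir", "the", "rice", "stir"], "recipe"),
    (["stir", "slowly"], "recipe"),
    (["great", "beach", "photo"], "social_media"),
]

doc_counts = Counter(label for _, label in corpus)
word_counts = defaultdict(Counter)
for tokens, label in corpus:
    word_counts[label].update(tokens)

# P(C) = Dc / D
p_class = {c: doc_counts[c] / len(corpus) for c in doc_counts}

# P(wi|C) = Count(wi, C) / Σ_v Count(v, C)
p_word = {c: {w: n / sum(cnt.values()) for w, n in cnt.items()}
          for c, cnt in word_counts.items()}

print(p_class)                     # {'recipe': 2/3, 'social_media': 1/3}
print(p_word["recipe"]["stir"])    # 3/6 = 0.5
```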

Multinomial Naïve Bayes: Independence Assumptions

• Bag-of-words assumption
  • Assume position doesn't matter

• Conditional independence
  • Assume the feature probabilities P(wi|c) are independent given the class c.

[Jurafsky and Martin]

P(w1, …, wn | C) = Π P(wi|C), i = 1, …, n

Multinomial Naïve Bayes Classifier
• C_MAP = argmax_C P(w1 … wn | C) P(C)
• C_NB = argmax_Cj P(Cj) Π P(w|Cj),  w ∈ W
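A sketch of this decision rule on the toy estimates from the previous sketch (all numbers hypothetical). Note how a single unseen (word, class) pair zeroes out an entire class, which is exactly the problem the Laplace smoothing slide addresses next.

```python
# Toy estimates (hypothetical), e.g. the output of the counting sketch above.
p_class = {"recipe": 2/3, "social_media": 1/3}
p_word = {
    "recipe":       {"stir": 3/6, "the": 1/6, "rice": 1/6, "slowly": 1/6},
    "social_media": {"great": 1/3, "beach": 1/3, "photo": 1/3},
}

def classify(tokens):
    """C_NB = argmax_Cj P(Cj) * prod_i P(wi|Cj), with no smoothing."""
    scores = {}
    for c in p_class:
        score = p_class[c]
        for w in tokens:
            score *= p_word[c].get(w, 0.0)   # a zero count zeroes out the whole class
        scores[c] = score
    return max(scores, key=scores.get), scores

print(classify(["stir", "the", "rice"]))
# ('recipe', {'recipe': ~0.0093, 'social_media': 0.0})
```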

This is why it's naïve!

[Jurafsky and Martin]

Laplace Smoothing: Needed because counts may be zero

Unsmoothed maximum likelihood estimate:

  P̂(wi | c) = count(wi, c) / Σ_{w∈V} count(w, c)

Laplace (add-1) smoothed estimate:

  P̂(wi | c) = (count(wi, c) + 1) / (Σ_{w∈V} count(w, c) + |V|)

[Jurafsky and Martin]
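A short sketch of the add-1 estimate on hypothetical per-class counts, showing how a zero count becomes a small but nonzero probability:

```python
from collections import Counter

# Hypothetical word counts for one class, and a hypothetical vocabulary V.
counts_c = Counter({"stir": 3, "the": 1, "rice": 1, "slowly": 1})
vocab = {"stir", "the", "rice", "slowly", "great", "beach", "photo"}

def p_laplace(w, counts, vocab):
    """P̂(w|c) = (count(w,c) + 1) / (Σ_v count(v,c) + |V|)."""
    return (counts[w] + 1) / (sum(counts.values()) + len(vocab))

print(p_laplace("stir", counts_c, vocab))    # (3+1)/(6+7) ≈ 0.308
print(p_laplace("beach", counts_c, vocab))   # (0+1)/(6+7) ≈ 0.077, no longer zero
```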

Questions?

SciKit Learn
