47
TTIC 31210: Advanced Natural Language Processing Kevin Gimpel Spring 2017 Lecture 14: Finish up Bayesian/Unsupervised NLP, Start Structured Prediction 1

TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

TTIC31210:AdvancedNaturalLanguageProcessing

KevinGimpelSpring2017

Lecture14:FinishupBayesian/UnsupervisedNLP,

StartStructuredPrediction

1

Page 2: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• TodayandWednesday:structuredprediction• NoclassMondayMay29(MemorialDay)• FinalclassisWednesdayMay31

2

Page 3: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• Assignment3hasbeenposted,dueThursdayJune1• FinalprojectreportdueFriday,June9

3

Page 4: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

KeyQuantities

4

Ourdataisasetofsamples:

Page 5: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

GibbsSamplingTemplate

5

Page 6: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

LDA

6

Page 7: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

ExpectationMaximization(EM)

• EMisanalgorithmictemplatethatfindsalocalmaximumofthemarginallikelihoodoftheobserveddata

7

Page 8: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

EM• “E”step:– computeposteriorsoverlatentvariables:

• “M”step:– updateparametersgivenposteriors:

8

Page 9: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

DifferentViewsoftheDirichlet Process(DP)

• lasttimewediscussedthe“stick-breaking”viewoftheDP

• todaywe’llbrieflydiscussthe“ChineseRestaurantProcess”view

• withbothviews,westillhavethesameDPhyperparameters(basedistribution&concentrationparameter)

9

Page 10: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

BaseDistributionforDP

• ourunboundeddistributionoveritemswillchoosethemfromthebasedistribution

• basedistributionusuallyhasinfinitesupport• simpleexamplebasedistributionforourmorphlexicon:

10

Page 11: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

ConcentrationParameter• instick-breakingprocess,concentrationparameterdetermineshowmuchofthestickwebreakoffeachtime

• highconcentration==smallpartsofstick

11

fullstick

Page 12: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• thestick-breakingconstructionoftheDPisusefulforspecifyingmodelsanddefininginferencealgorithms

• anotherusefulwayofrepresentingadrawfromaDPiswiththeChineseRestaurantProcess(CRP)– CRPprovidesadistributionoverpartitionswithanunboundednumberofparts

12

Page 13: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• imagineaChineserestaurantwithaninfinitenumberoftables…

13

1 2 3 4

Page 14: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• firstcustomersitsatfirsttable:

14

1 2 3 4

Page 15: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• secondcustomerenters,choosesatable:

15

1 2 3 4

Page 16: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• secondcustomerenters,

choosestable1:

16

1 2 3 4

Page 17: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• secondcustomerenters,

choosestable1:

choosesnewtable:

17

1 2 3 4

Page 18: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• secondcustomerenters,

choosestable1

18

1 2 3 4

Page 19: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• thirdcustomerenters,

19

1 2 3 4

Page 20: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• thirdcustomerenters,

choosestable1:

choosesnewtable:

20

1 2 3 4

Page 21: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• thirdcustomerenters,

choosesnewtable

21

1 2 3 4

Page 22: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• fourthcustomerenters,

p(choosetable1):p(choosetable2):

p(choosenewtable):

22

1 2 3 4

Page 23: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

23

1 2 3 4

Page 24: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• largevalueofconcentrationparameter:

24

1 2 3 4

fullstick

Page 25: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• smallvalueofconcentrationparameter:

25

1 2 3 4

Page 26: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

ADrawGfromaDP(Stick-BreakingRepresentation)

26

drawinfiniteprobabilitiesfromstick-breakingprocesswithparameters

drawatomsfrombasedistributionatomscanberepeated!

Page 27: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

ARepresentationofGDrawnfromaDP(ChineseRestaurantProcessRepresentation)

27

drawtableassignmentsforn customerswithparameters

foreachoccupiedtable,drawatomfrombasedistribution

numberoftablesoccupied

eachdrawfromGisanatom,whereitsprobabilitycomesfromthenumberofcustomersatitstable

Page 28: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

WhentobeBayesian?• ifyou’redoingunsupervisedlearningorlearningwithlatentvariables

• ifyouwanttomarginalizeoutsomemodelparameters

• ifyouwanttolearnthestructure/architectureofyourmodel

• ifyouwanttolearnapotentially-unboundedlexicon(Bayesiannonparametrics)

28

Page 29: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

WhatisStructuredPrediction?

29

Page 30: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Modeling,Inference,Learning

30

Page 31: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Modeling,Inference,Learning

• Modeling:Howdoweassignascoretoan(x,y)pairusingparameters?

modeling:definescorefunction

31

Page 32: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Modeling,Inference,Learning

• Inference:Howdoweefficientlysearchoverthespaceofalllabels?

inference:solve_ modeling:definescorefunction

32

Page 33: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Modeling,Inference,Learning

• Learning:Howdowechoose?

learning:choose_

modeling:definescorefunctioninference:solve_

33

Page 34: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Modeling,Inference,Learning

StructuredPrediction:sizeofoutputspaceisexponentialinsizeofinputorisunbounded(e.g.,machinetranslation)(wecan’tjustenumerateallpossibleoutputs)

learning:choose_

modeling:definescorefunctioninference:solve_

34

Page 35: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

determinerverb(past)prep.properproperposs.adj.noun

modalverbdet.adjectivenounprep.properpunc.

35

Part-of-SpeechTagging

determinerverb(past)prep.nounnounposs.adj.nounSomequestionedifTimCook’sfirstproduct

modalverbdet.adjectivenounprep.nounpunc.wouldbeabreakawayhitforApple.

Simplestkindofstructuredprediction:SequenceLabeling

Page 36: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

36

OOOB-PERSONI-PERSONOOOSomequestionedifTimCook’sfirstproduct

OOOOOOB-ORGANIZATIONOwouldbeabreakawayhitforApple.

NamedEntityRecognition

B=“begin”I=“inside”O=“outside”

FormulatingsegmentationtasksassequencelabelingviaB-I-Olabeling:

Page 37: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

ConstituentParsing(S(NPtheman)(VPwalked(PPto(NPthepark))))

37

themanwalkedtothepark

S

NP

NP

VP

PP

Key:S=sentenceNP=nounphraseVP=verbphrasePP=prepositionalphraseDT=determinerNN=nounVBD=verb(pasttense)IN=preposition

DT NN VBDINDTNN

Page 38: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

38

source: $ konnten sie es übersetzen ?

reference: $ could you translate it ?“wall”symbol

DependencyParsing

Page 39: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Coreference ResolutionAsweheadtowardstrainingcamp,thePhiladelphia

Eagleshavefinallyfilledmostoftheirneedsonoffense.Oneofthemaingoalsforthisoff-seasonwastofind

weaponsfortheteam’sfranchisequarterback,CarsonWentz.TheEagles neededawidereceiverwhocouldstretchthefieldandgiveWentz theopportunitytothrowthelongball.They signedreceiverTorreySmithtoa3-yeardeal.

WhilethesigningofSmith washugefortheteam,thebiggestsigningtheEaglesmadewasformerChicagoBearsreceiverAlshon Jeffery.He hadasolid5-yearstintinChicago,butastheteamstartedtofallapart,Jefferywasforcedtoexploreotheroptions.

39

Page 40: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Coreference Resolutioninput:adocumentoutput:asetof“mentions”(textualspansindocument),andmembershipsofthosementionsinclusters

40

Page 41: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

SemanticRoleLabelingApplications

` Question & answer systems

Who did what to whom at where?

30

The police officer detained the suspect at the scene of the crime

ARG0 ARG2 AM-loc V Agent ThemePredicate Location

J&M/SLP3

input:asentenceoutput:onespaninthesentenceidentifiedasapredicate,andasetofotherspansidentifiedasparticularrolesforthatpredicate

Page 42: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

SupervisedWordAlignment

42

givenparallelsentences,predictwordalignments:

Brownetal.(1990)

Page 43: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

konnten : could

konnten : could

konnten sie : could you

sie : you

sie : you

es übersetzen : translate it

sie es übersetzen : you translate it

übersetzen :translate

übersetzen :translate

es : it

es : it

es : it

? : ?

? : ?

MachineTranslation• phrase-basedmodel(Koehnetal.,2003):

input:asentenceinthesourcelanguageoutput:asegmentationofthesourcesentenceintosegments,atranslationofeachsegment,andanorderingofthetranslations

Page 44: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

• Ithinkofstructuredpredictionmethodsintwoprimarycategories:score-basedandsearch-based

KeyCategoriesofStructuredPrediction

44

Page 45: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Score-BasedStructuredPrediction• focusondefiningthescorefunctionofthestructuredinput/outputpair:

• independencyparsing,thisiscalled“graph-basedparsing”becauseminimumspanningtreealgorithmscanbeusedtofindtheglobally-optimalmax-scoringtree

45

Page 46: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

Search-BasedStructuredPrediction• focusontheprocedureforsearchingthroughthestructuredoutputspace(usuallyinvolvessimplegreedyorbeamsearch)

• designaclassifiertoscoreasmallnumberofdecisionsateachpositioninthesearch• thisclassifiercanuseinformationaboutthecurrentstate

aswellastheentirehistoryofthesearch

• independencyparsing,thisiscalled“transition-basedparsing”becauseitconsistsofgreedily,sequentiallydecidingwhatparsingdecisiontomake

46

Page 47: TTIC 31210 · 2017. 5. 22. · local maximum of the marginal likelihood of the observed data 7. EM ... 26 draw infinite probabilities from stick-breaking process with parameter s

StructuredPrediction• tomakeSPpractical,weneedtodecomposetheSPproblemintoparts

• thisistruewhetherwearegoingtousesearch-basedorscore-basedSP– score-based:scorefunctiondecomposesadditivelyintoscoresofparts

– search-based:searchfactorsintoasequenceofdecisions,eachoneaddingaparttothefinaloutputstructure

47