Sequence Prediction and Part-of-speech Tagging
CS5740: Natural Language Processing
Instructor: Yoav Artzi
Slides adapted from Dan Klein, Dan Jurafsky, Chris Manning, Michael Collins, Luke Zettlemoyer, Yejin Choi, and Slav Petrov



Page 1: Sequence Prediction and Part-of-speech Tagging

Sequence Prediction and Part-of-speech Tagging

Instructor: Yoav Artzi

CS5740: Natural Language Processing

Slides adapted from Dan Klein, Dan Jurafsky, Chris Manning, Michael Collins, Luke Zettlemoyer, Yejin Choi, and Slav Petrov

Page 2

Overview
• POS Tagging: the problem
• Hidden Markov Models (HMM)
  – Supervised Learning
  – Inference
    • The Viterbi algorithm
• Feature-rich models
  – Maximum-entropy Markov Models
  – Perceptron
  – Conditional Random Fields

Page 3

Parts of Speech
Open class (lexical) words:
• Nouns – Proper (IBM, Italy), Common (cat / cats, snow)
• Verbs – Main (see, registered)
• Adjectives (old, older, oldest)
• Adverbs (slowly)
• Numbers (122,312, one)
• … more
Closed class (functional) words:
• Verbs – Modals (can, had)
• Prepositions (to, with)
• Particles (off, up)
• Determiners (the, some)
• Conjunctions (and, or)
• Pronouns (he, its)
• Interjections (Ow, Eh)
• … more

Page 4

POS Tagging
• Words often have more than one POS: back
  – The back door = JJ
  – On my back = NN
  – Win the voters back = RB
  – Promised to back the bill = VB

• The POS tagging problem is to determine the POS tag for a particular instance of a word.

Page 5

POS Tagging

• Input: Plays well with others
• Ambiguity: NNS/VBZ UH/JJ/NN/RB IN NNS
• Output: Plays/VBZ well/RB with/IN others/NNS
• Uses:
  – Text-to-speech (how do we pronounce “lead”?)
  – Can write regular expressions like (Det) Adj* N+ over the output for phrases, etc.
  – As input to a full parser (e.g., to create dependency trees)
  – If you know the tag, you can back off to it in other tasks

Penn Treebank POS tags

Page 6

Penn TreeBank Tagset

• Possible tags: 45
• Tagging guidelines: 36 pages
• Newswire text

Page 7

CC    conjunction, coordinating                  and both but either or
CD    numeral, cardinal                          mid-1890 nine-thirty 0.5 one
DT    determiner                                 a all an every no that the
EX    existential there                          there
FW    foreign word                               gemeinschaft hund ich jeux
IN    preposition or conjunction, subordinating  among whether out on by if
JJ    adjective or numeral, ordinal              third ill-mannered regrettable
JJR   adjective, comparative                     braver cheaper taller
JJS   adjective, superlative                     bravest cheapest tallest
MD    modal auxiliary                            can may might will would
NN    noun, common, singular or mass             cabbage thermostat investment subhumanity
NNP   noun, proper, singular                     Motown Cougar Yvette Liverpool
NNPS  noun, proper, plural                       Americans Materials States
NNS   noun, common, plural                       undergraduates bric-a-brac averages
POS   genitive marker                            ' 's
PRP   pronoun, personal                          hers himself it we them
PRP$  pronoun, possessive                        her his mine my our ours their thy your
RB    adverb                                     occasionally maddeningly adventurously
RBR   adverb, comparative                        further gloomier heavier less-perfectly
RBS   adverb, superlative                        best biggest nearest worst
RP    particle                                   aboard away back by on open through
TO    "to" as preposition or infinitive marker   to
UH    interjection                               huh howdy uh whammo shucks heck
VB    verb, base form                            ask bring fire see take
VBD   verb, past tense                           pleaded swiped registered saw
VBG   verb, present participle or gerund         stirring focusing approaching erasing
VBN   verb, past participle                      dilapidated imitated reunified unsettled
VBP   verb, present tense, not 3rd person sing.  twist appear comprise mold postpone
VBZ   verb, present tense, 3rd person singular   bases reconstructs marks uses
WDT   WH-determiner                              that what whatever which whichever
WP    WH-pronoun                                 that what whatever which who whom
WP$   WH-pronoun, possessive                     whose
WRB   WH-adverb                                  however whenever where why

Main Tags

Page 8

Penn TreeBank Tagset
• How accurate are taggers? (Tag accuracy)
  – About >97% currently
  – But baseline is already 90%
• Baseline is performance of simplest possible method
  – Tag every word with its most frequent tag
  – Tag unknown words as nouns
• Partly easy because
  – Many words are unambiguous
  – You get points for them (the, a, etc.) and for punctuation marks!
• Upper bound: probably 2% annotation errors
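The baseline just described (most frequent tag per word, unknowns as nouns) takes only a few lines; here is an illustrative sketch with a made-up toy corpus, not from the slides:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Count (word, tag) pairs and keep each word's most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(model, words, unknown_tag="NN"):
    """Known words get their most frequent tag; unknown words are tagged as nouns."""
    return [model.get(w, unknown_tag) for w in words]

corpus = [[("the", "DT"), ("back", "NN"), ("door", "NN")],
          [("win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB")],
          [("on", "IN"), ("my", "PRP$"), ("back", "NN")]]
model = train_baseline(corpus)
print(baseline_tag(model, ["the", "back", "gizmos"]))  # ['DT', 'NN', 'NN']
```

Note how "back" gets NN regardless of context, which is exactly why this baseline tops out around 90%.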

Page 9

Hard Cases are Hard

• Mrs/NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG

• All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN

• Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

Page 10

How Difficult is POS Tagging?
• About 11% of the word types in the Brown corpus are ambiguous regarding their part of speech
• But they tend to be very common words. E.g., that
  – I know that he is honest = IN
  – Yes, that play was nice = DT
  – You can’t go that far = RB
• 40% of the word tokens are ambiguous

Page 11

The Tagset

• Wait, do we really need all these tags?
• What about other languages?
  – Each language has its own tagset

Page 12

Tagsets in Different Languages

[Petrov et al. 2012]

Page 13

The Tagset

• Wait, do we really need all these tags?
• What about other languages?
  – Each language has its own tagset
• But why is this bad?
  – Differences in downstream tasks
  – Harder to do language transfer

Page 14

Alternative: The Universal Tagset

• 12 tags:
  – NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, ‘.’, and X
• Deterministic conversion from tagsets in 22 languages
• Better unsupervised parsing results
• Was used to transfer parsers

[Petrov et al. 2012]

Page 15

Sources of Information
• What are the main sources of information for POS tagging?
  – Knowledge of neighboring words
    Bill  saw    that  man  yesterday
    NNP   VB(D)  DT    NN   NN
    VB    NN     IN    VB   NN
  – Knowledge of word probabilities
    • man is rarely used as a verb…
• The latter proves the most useful, but the former also helps

Page 16

Word-level Features
• Can do surprisingly well just looking at a word by itself:
  – Word: the → DT
  – Lowercased words: importantly → RB
  – Prefixes: unfathomable: un- → JJ
  – Suffixes: importantly: -ly → RB
  – Capitalization: Meridian: CAP → NNP
  – Word shapes: 35-year: d-x → JJ

Page 17

Sequence-to-Sequence
Consider the problem of jointly modeling a pair of strings, e.g., part-of-speech tagging:
  DT NNP NN VBD VBN RP NN NNS
  The Georgia branch had taken on loan commitments …
  DT NN IN NN VBD NNS VBD
  The average of interbank offered rates plummeted …
Q: How do we map each word in the input sentence onto the appropriate label?
A: We can learn a joint distribution $p(x_1 \ldots x_n, y_1 \ldots y_n)$ and then compute the most likely assignment:
$\arg\max_{y_1 \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$

Page 18

Classic Solution: HMMs
We want a model of sequences $y$ and observations $x$:
$p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
where $y_0 = \mathrm{START}$; we call $q(y_i \mid y_{i-1})$ the transition distribution and $e(x_i \mid y_i)$ the emission (or observation) distribution.
[Diagram: chain $y_0 \to y_1 \to y_2 \to \cdots \to y_n$ (transitions), with each $y_i$ emitting $x_i$ (emissions)]
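The factorization can be evaluated directly. A toy sketch with made-up parameters; the nested-dict layout for q and e is an assumption for illustration, not the slides' notation:

```python
def hmm_joint_prob(words, tags, q, e):
    """p(x_1..x_n, y_1..y_n) = q(STOP|y_n) * prod_i q(y_i|y_{i-1}) e(x_i|y_i),
    with y_0 = START.  q[prev][cur] and e[tag][word] are nested dicts."""
    p = 1.0
    prev = "START"
    for word, tag in zip(words, tags):
        p *= q[prev].get(tag, 0.0) * e[tag].get(word, 0.0)
        prev = tag
    return p * q[prev].get("STOP", 0.0)  # fold in the STOP transition

# hypothetical toy parameters
q = {"START": {"DT": 0.5}, "DT": {"NN": 0.8}, "NN": {"STOP": 0.4}}
e = {"DT": {"the": 0.6}, "NN": {"dog": 0.1}}
print(hmm_joint_prob(["the", "dog"], ["DT", "NN"], q, e))  # ≈ 0.0096
```

Each factor is one edge of the chain diagram above: one transition and one emission per position, plus the final STOP transition.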

Page 19

Model Assumptions
• Tag/state sequence is generated by a Markov model
• Words are chosen independently, conditioned only on the tag/state
• These are totally broken assumptions for POS: why?
$p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
[Diagram: chain $y_0 \to y_1 \to \cdots \to y_n$ with emissions $x_1 \ldots x_n$]

Page 20

HMM for POS Tagging

• HMM Model:
  – States $Y = \{\mathrm{DT}, \mathrm{NNP}, \mathrm{NN}, \ldots\}$ are the POS tags
  – Observations $X = V$ are words
  – Transition dist’n $q(y_i \mid y_{i-1})$ models the tag sequences
  – Emission dist’n $e(x_i \mid y_i)$ models words given their POS

The Georgia branch had taken on loan commitments …

DT NNP NN VBD VBN RP NN NNS


Page 22

HMM Inference and Learning
• Learning
  – Maximum likelihood: transitions $q$ and emissions $e$ in
    $p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
• Inference
  – Viterbi:
    $y^* = \arg\max_{y_1 \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$
  – Forward-backward:
    $p(x_1 \ldots x_n, y_i) = \sum_{y_1 \ldots y_{i-1}} \sum_{y_{i+1} \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$

Page 23

Learning: Maximum Likelihood

• Maximum likelihood methods for estimating transitions $q$ and emissions $e$:
  $q_{ML}(y_i \mid y_{i-1}) = \dfrac{c(y_{i-1}, y_i)}{c(y_{i-1})} \qquad e_{ML}(x \mid y) = \dfrac{c(y, x)}{c(y)}$
  used in
  $p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
• Will these estimates be high quality?
  – Which is likely to be more sparse, $q$ or $e$?
• Smoothing?
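The count ratios translate directly into code. A sketch on a toy corpus; the tuple-keyed dicts are an assumed representation, not from the slides:

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """Maximum-likelihood estimates:
    q_ML(y_i | y_{i-1}) = c(y_{i-1}, y_i) / c(y_{i-1}),
    e_ML(x | y)         = c(y, x) / c(y)."""
    trans, emit, out_counts = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "START"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            emit[(tag, word)] += 1
            out_counts[prev] += 1  # transitions out of prev
            prev = tag
        trans[(prev, "STOP")] += 1
        out_counts[prev] += 1
    q = {(p, t): c / out_counts[p] for (p, t), c in trans.items()}
    # emission denominator: occurrences of each tag
    tag_occ = Counter(t for s in tagged_sentences for _, t in s)
    e = {(t, w): c / tag_occ[t] for (t, w), c in emit.items()}
    return q, e

q, e = estimate_hmm([[("the", "DT"), ("dog", "NN")],
                     [("the", "DT"), ("cat", "NN")]])
print(q[("START", "DT")], e[("NN", "dog")])  # 1.0 0.5
```

Note that the emission table is far sparser than the transition table, which is why the next slide smooths them differently.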

Page 24

Learning: Low Frequency Words

• Typically, for transitions: linear interpolation
  $q(y_i \mid y_{i-1}) = \lambda_1 q_{ML}(y_i \mid y_{i-1}) + \lambda_2 q_{ML}(y_i)$
• However, other approaches are used for emissions
  – Step 1: Split the vocabulary
    • Frequent words: appear more than $M$ (often 5) times
    • Low frequency: everything else
  – Step 2: Map each low-frequency word to one of a small, finite set of possibilities (for example, based on prefixes, suffixes, etc.)
  – Step 3: Learn the model for this new space of possible word sequences

Page 25

Another Example: Chunking
• Goal: Segment text into spans with certain properties
• For example, named entities: PER, ORG, and LOC
Germany ’s representative to the European Union ’s veterinary committee Werner Zwingman said on Wednesday consumers should…
[Germany]LOC ’s representative to the [European Union]ORG ’s veterinary committee [Werner Zwingman]PER said on Wednesday consumers should…
How is this a sequence tagging problem?

Page 26

Named Entity Recognition

• HMM Model:
  – States $Y = \{$NA, BL, CL, BO, CO, BP, CP$\}$ represent beginnings (BL, BO, BP) and continuations (CL, CO, CP) of chunks, as well as other words (NA)
  – Observations $X = V$ are words
  – Transition dist’n $q(y_i \mid y_{i-1})$ models the tag sequences
  – Emission dist’n $e(x_i \mid y_i)$ models words given their type

Germany ’s representative to the European Union ’s veterinary committee Werner Zwingman said on Wednesday consumers should…

[Germany]LOC ’s representative to the [European Union]ORG ’s veterinary committee [Werner Zwingman]PER said on Wednesday consumers should…

Page 27

Low Frequency Words: An Example
• Named Entity Recognition [Bikel et al. 1999]
  – Used the following word classes for infrequent words:

Word class              Example                  Intuition
twoDigitNum             90                       Two-digit year
fourDigitNum            1990                     Four-digit year
containsDigitAndAlpha   A8956-67                 Product code
containsDigitAndDash    09-96                    Date
containsDigitAndSlash   11/9/89                  Date
containsDigitAndComma   23,000.00                Monetary amount
containsDigitAndPeriod  1.00                     Monetary amount, percentage
othernum                456789                   Other number
allCaps                 BBN                      Organization
capPeriod               M.                       Person name initial
firstWord               first word of sentence   No useful capitalization information
initCap                 Sally                    Capitalized word
lowercase               can                      Uncapitalized word
other                   ,                        Punctuation marks, all other words
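A sketch of this mapping; the regular expressions are assumptions reconstructed from the table's examples (checked in order, first match wins), not the paper's exact rules:

```python
import re

# Patterns reconstructed from the examples above; order matters.
CLASS_PATTERNS = [
    ("twoDigitNum", r"\d{2}$"),
    ("fourDigitNum", r"\d{4}$"),
    ("containsDigitAndAlpha", r"(?=.*\d)(?=.*[A-Za-z])"),
    ("containsDigitAndDash", r"(?=.*\d)(?=.*-)"),
    ("containsDigitAndSlash", r"(?=.*\d)(?=.*/)"),
    ("containsDigitAndComma", r"(?=.*\d)(?=.*,)"),
    ("containsDigitAndPeriod", r"(?=.*\d)(?=.*\.)"),
    ("othernum", r"\d+$"),
    ("allCaps", r"[A-Z]+$"),
    ("capPeriod", r"[A-Z]\.$"),
    ("initCap", r"[A-Z]"),
    ("lowercase", r"[a-z]"),
]

def word_class(word, is_first_word=False):
    """Map an infrequent word to one of the classes in the table above."""
    if is_first_word:
        return "firstWord"
    for name, pattern in CLASS_PATTERNS:
        if re.match(pattern, word):
            return name
    return "other"

print(word_class("1990"), word_class("A8956-67"), word_class("M."))
# fourDigitNum containsDigitAndAlpha capPeriod
```

Every infrequent word then shares emission statistics with its class, which is what makes the estimates on the next slides usable.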

Page 28

Low Frequency Words: An Example

• NA = No entity
• SO = Start Organization
• CO = Continue Organization
• SL = Start Location
• CL = Continue Location
• …

Profits/NA soared/NA at/NA Boeing/SO Co./CO ,/NA easily/NA topping/NA forecasts/NA on/NA Wall/SL Street/CL ,/NA as/NA their/NA CEO/NA Alan/SP Mulally/CP announced/NA first/NA quarter/NA results/NA ./NA

Page 29

Low Frequency Words: An Example

• NA = No entity
• SO = Start Organization
• CO = Continue Organization
• SL = Start Location
• CL = Continue Location
• …

Profits/NA soared/NA at/NA Boeing/SO Co./CO ,/NA easily/NA topping/NA forecasts/NA on/NA Wall/SL Street/CL ,/NA as/NA their/NA CEO/NA Alan/SP Mulally/CP announced/NA first/NA quarter/NA results/NA ./NA

firstword/NA soared/NA at/NA initCap/SO Co./CO ,/NA easily/NA lowercase/NA forecasts/NA on/NA initCap/SL Street/CL ,/NA as/NA their/NA CEO/NA Alan/SP initCap/CP announced/NA first/NA quarter/NA results/NA ./NA

Page 30

HMM Inference and Learning
• Learning
  – Maximum likelihood: transitions $q$ and emissions $e$ in
    $p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
• Inference
  – Viterbi:
    $y^* = \arg\max_{y_1 \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$
  – Forward-backward:
    $p(x_1 \ldots x_n, y_i) = \sum_{y_1 \ldots y_{i-1}} \sum_{y_{i+1} \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$

Page 31

Inference (Decoding)
• Problem: find the most likely (Viterbi) sequence under the model:
  $y^* = \arg\max_{y_1 \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$
• Given model parameters, we can score any sequence pair:
  NNP VBZ NN NNS CD NN .
  Fed raises interest rates 0.5 percent .
  q(NNP|START) e(Fed|NNP) q(VBZ|NNP) e(raises|VBZ) q(NN|VBZ) …
• In principle, we’re done: list all possible tag sequences, score each one, pick the best one (the Viterbi state sequence)
  NNP VBZ NN NNS CD NN .   log p(x, y) = −23
  NNP NNS NN NNS CD NN .   log p(x, y) = −29
  NNP VBZ VB NNS CD NN .   log p(x, y) = −27
Any issue?

Page 32

Finding the Best Trajectory
• Too many trajectories (state sequences) to list
• Option 1: Beam Search
  – A beam is a set of partial hypotheses
  – Start with just the single empty trajectory
  – At each derivation step:
    • Consider all continuations of previous hypotheses
    • Discard most, keep top k
[Diagram: <> expands to Fed:N, Fed:V, Fed:J; the kept hypotheses Fed:N and Fed:V each expand to raises:N and raises:V]
• Beam search often works OK in practice, but …
• … sometimes you want the optimal answer
• … and there’s usually a better option than naïve beams
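The expand-then-prune loop above can be sketched in a few lines. The scoring interface and the toy scorer are assumptions for illustration; a real tagger would score with the HMM factors:

```python
def beam_search(words, tags, score, k=2):
    """Expand every hypothesis with every tag, keep the top-k by score.
    `score` takes a partial tag tuple and returns a number (assumed interface)."""
    beam = [()]  # start with the single empty trajectory
    for _ in words:
        candidates = [h + (t,) for h in beam for t in tags]
        beam = sorted(candidates, key=score, reverse=True)[:k]  # discard most
    return beam

# toy scorer: prefer sequences that alternate tags
def score(h):
    return sum(1 for a, b in zip(h, h[1:]) if a != b)

best = beam_search(["Fed", "raises", "interest"], ["N", "V"], score, k=2)[0]
print(best)  # ('N', 'V', 'N')
```

With k smaller than the number of tag sequences, a globally best hypothesis can be pruned early, which is exactly the "no guarantees" caveat above.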

Page 33

The State Lattice / Trellis
[Figure: trellis with one column of states ^, N, V, J, D, $ per position (START, Fed, raises, interest, rates, STOP). The highlighted path ^ N V V J V corresponds to the factors q(N|^), e(Fed|N), q(V|N), e(raises|V), q(V|V), e(interest|V), q(J|V), e(rates|J), q(V|J), e(STOP|V).]

Page 34

Scoring a Sequence
• Goal:
  $y^* = \arg\max_{y_1 \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$, where
  $p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
• Define $\pi(i, y_i)$ to be the max score of a sequence of length $i$ ending in tag $y_i$:
  $\pi(i, y_i) = \max_{y_1 \ldots y_{i-1}} p(x_1 \ldots x_i, y_1 \ldots y_i)$
  $= \max_{y_{i-1}} e(x_i \mid y_i)\, q(y_i \mid y_{i-1}) \max_{y_1 \ldots y_{i-2}} p(x_1 \ldots x_{i-1}, y_1 \ldots y_{i-1})$
  $= \max_{y_{i-1}} e(x_i \mid y_i)\, q(y_i \mid y_{i-1})\, \pi(i-1, y_{i-1})$
• We can now design an efficient algorithm. How?

Page 35

The Viterbi Algorithm
Dynamic program for computing, for all $i$:
$\pi(i, y_i) = \max_{y_1 \ldots y_{i-1}} p(x_1 \ldots x_i, y_1 \ldots y_i)$
Base case:
$\pi(0, y_0) = 1$ if $y_0 = \mathrm{START}$, $0$ otherwise
Iterative computation, for $i = 1 \ldots n$:
$\pi(i, y_i) = \max_{y_{i-1}} e(x_i \mid y_i)\, q(y_i \mid y_{i-1})\, \pi(i-1, y_{i-1})$   // store score
$bp(i, y_i) = \arg\max_{y_{i-1}} e(x_i \mid y_i)\, q(y_i \mid y_{i-1})\, \pi(i-1, y_{i-1})$   // store back-pointer (what for?)
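The recursion and back-pointers translate directly into code. A self-contained sketch with toy parameters (the nested-dict layout is an assumption for illustration):

```python
def viterbi(words, tags, q, e):
    """pi(i, y) = max_{y'} e(x_i|y) q(y|y') pi(i-1, y'); back-pointers
    recover the argmax sequence.  q[prev][cur], e[tag][word] are toy dicts."""
    n = len(words)
    pi = [{"START": 1.0}]            # base case: pi(0, START) = 1
    bp = [{}]
    for i in range(1, n + 1):
        pi.append({}); bp.append({})
        for y in tags:
            best_prev, best = None, 0.0
            for y_prev, prev_score in pi[i - 1].items():
                s = prev_score * q.get(y_prev, {}).get(y, 0.0) \
                    * e.get(y, {}).get(words[i - 1], 0.0)
                if s > best:
                    best_prev, best = y_prev, s
            pi[i][y] = best
            bp[i][y] = best_prev     # this is what the back-pointer is for
    # fold in q(STOP | y_n), then walk the back-pointers
    y_n = max(tags, key=lambda y: pi[n][y] * q.get(y, {}).get("STOP", 0.0))
    seq = [y_n]
    for i in range(n, 1, -1):
        seq.append(bp[i][seq[-1]])
    return list(reversed(seq))

q = {"START": {"N": 0.6, "V": 0.4}, "N": {"V": 0.7, "STOP": 0.3},
     "V": {"N": 0.5, "STOP": 0.5}}
e = {"N": {"Fed": 0.8, "raises": 0.1}, "V": {"Fed": 0.1, "raises": 0.9}}
print(viterbi(["Fed", "raises"], ["N", "V"], q, e))  # ['N', 'V']
```

Without the back-pointers we would know the best score but not which tag sequence achieved it.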

Page 36

The State Lattice / Trellis
[Figure: trellis over positions START, Fed, raises, interest, STOP with states ^, N, V, $ per column.]
Transitions q(to | from):
from \ to    ^     N     V     $
^            0.0   0.6   0.4   0.0
N            0.0   0.4   0.2   0.4
V            0.0   0.6   0.1   0.3
$            0.0   0.0   0.0   1.0
Emissions:
             START  Fed    raises  interest  STOP
^            1.0    0.0    0.0     0.0       0.0
N            0.0    0.45   0.1     0.45      0.0
V            0.0    0.0    0.7     0.3       0.0
$            0.0    0.0    0.0     0.0       1.0
Tie breaking: prefer first.

Page 37

The State Lattice / Trellis
[Figure: the same trellis annotated with Viterbi scores π and back-pointers bp:]
START:    π(^) = 1 (bp = null); π(N) = π(V) = π($) = 0
Fed:      π(N) = 0.27 (bp = ^); π(^) = π(V) = π($) = 0
raises:   π(N) = 0.0108 (bp = N); π(V) = 0.0378 (bp = N)
interest: π(N) = 0.010206 (bp = V); π(V) = 0.001134 (bp = V)
STOP:     π($) = 0.0040824 (bp = N)
Transitions q(to | from):
from \ to    ^     N     V     $
^            0.0   0.6   0.4   0.0
N            0.0   0.4   0.2   0.4
V            0.0   0.6   0.1   0.3
$            0.0   0.0   0.0   1.0
Emissions:
             START  Fed    raises  interest  STOP
^            1.0    0.0    0.0     0.0       0.0
N            0.0    0.45   0.1     0.45      0.0
V            0.0    0.0    0.7     0.3       0.0
$            0.0    0.0    0.0     0.0       1.0
Tie breaking: prefer first.

Page 38

The Viterbi Algorithm: Runtime
• In terms of sentence length $n$? Linear
• In terms of number of states $|K|$? Polynomial
• Specifically: $O(n|K|)$ entries in $\pi(i, y_i)$, and $O(|K|)$ time to compute each
  $\pi(i, y_i) = \max_{y_{i-1}} e(x_i \mid y_i)\, q(y_i \mid y_{i-1})\, \pi(i-1, y_{i-1})$
• Total runtime: $O(n|K|^2)$
• Q: Is this a practical algorithm?
• A: Depends on $|K|$…

Page 39

Tagsets in Different Languages

[Petrov et al. 2012]

294² = 86436
45² = 2025
11² = 121

Page 40

HMM Inference and Learning
• Learning
  – Maximum likelihood: transitions $q$ and emissions $e$ in
    $p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
• Inference
  – Viterbi:
    $y^* = \arg\max_{y_1 \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$
  – Forward-backward:
    $p(x_1 \ldots x_n, y_i) = \sum_{y_1 \ldots y_{i-1}} \sum_{y_{i+1} \ldots y_n} p(x_1 \ldots x_n, y_1 \ldots y_n)$

Page 41

What about n-gram Taggers?
• States encode what is relevant about the past
• Transitions $P(s_i \mid s_{i-1})$ encode well-formed tag sequences
  – In a bigram tagger, states = tags: $s_0 = \langle * \rangle$, $s_i = \langle y_i \rangle$
  – In a trigram tagger, states = tag pairs: $s_0 = \langle *, * \rangle$, $s_1 = \langle *, y_1 \rangle$, $s_i = \langle y_{i-1}, y_i \rangle$
[Diagram: two chains $s_0 \to s_1 \to \cdots \to s_n$, each state $s_i$ emitting $x_i$]

Page 42

The State Lattice / Trellis
[Figure: trigram trellis over START, Fed, raises, interest; states are tag pairs such as (^,^), (^,N), (^,V), (N,N), (N,D), (D,V), with transitions q(N|^,^), q(D|^,N), q(V|N,D) and emissions e(Fed|N), e(raises|D), e(interest|V).]
Not all edges are allowed.

Page 43

Tagsets in Different Languages

[Petrov et al. 2012]

294² = 86436
45² = 2025
11² = 121
294⁴ = 7471182096
45⁴ = 4100625
11⁴ = 14641

Page 44

Some Numbers
• Rough accuracies (overall / unknown words):
  – Most freq tag: ~90% / ~50%
  – Trigram HMM: ~95% / ~55%
  – TnT (Brants, 2000): 96.7% / 85.5% (a carefully smoothed trigram tagger; suffix trees for emissions)
  – MaxEnt P(y | x): 93.7% / 82.6%
  – MEMM tagger 1: 96.7% / 84.5%
  – MEMM tagger 2: 96.8% / 86.9%
  – Perceptron: 97.1%
  – CRF++: 97.3%
  – Cyclic tagger: 97.2% / 89.0%
  – Upper bound: ~98%
Most errors are on unknown words.

Page 45

Re-visit P(x | y)

• Reality check:
  – What if we drop the sequence? Use only P(x | y)
  – Most frequent tag: 90.3% with a so-so unknown word model
  – Can we do better?

Page 46

What about better features?
• Looking at a word and its environment:
  – Add in previous / next word: the __
  – Previous / next word shapes: X __ X
  – Occurrence pattern features: [X: x X occurs]
  – Crude entity detection: __ ….. (Inc.|Co.)
  – Phrasal verb in sentence? put …… __
  – Conjunctions of these things
• Uses lots of features: > 200K

Page 47

Some Numbers
• Rough accuracies (overall / unknown words):
  – Most freq tag: ~90% / ~50%
  – Trigram HMM: ~95% / ~55%
  – TnT (Brants, 2000): 96.7% / 85.5%
  – MaxEnt P(y | x): 93.7% / 82.6%
  – MEMM tagger 1: 96.7% / 84.5%
  – MEMM tagger 2: 96.8% / 86.9%
  – Perceptron: 97.1%
  – CRF++: 97.3%
  – Cyclic tagger: 97.2% / 89.0%
  – Upper bound: ~98%
• What does this tell us about sequence models?
• How do we add more features to our sequence models?

Page 48

MEMM Taggers
One step up: also condition on previous tags:
$p(y_1 \ldots y_n \mid x_1 \ldots x_n) = \prod_{i=1}^{n} p(y_i \mid y_1 \ldots y_{i-1}, x_1 \ldots x_n) = \prod_{i=1}^{n} p(y_i \mid y_{i-1}, x_1 \ldots x_n)$
• Training:
  – Train $p(y_i \mid y_{i-1}, x_1 \ldots x_n)$ as a discrete log-linear (MaxEnt) model
• Scoring:
  $p(y_i \mid y_{i-1}, x_1 \ldots x_n) = \dfrac{e^{w \cdot \phi(x_1 \ldots x_n, i, y_{i-1}, y_i)}}{\sum_{y'} e^{w \cdot \phi(x_1 \ldots x_n, i, y_{i-1}, y')}}$
• This is referred to as an MEMM tagger [Ratnaparkhi 96]
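The local softmax above is straightforward to compute. A sketch with a hypothetical feature function and made-up weights (the dict-based feature interface is an assumption, not the paper's representation):

```python
import math

def memm_prob(weights, phi, tags, x, i, y_prev, y):
    """p(y_i | y_{i-1}, x_1..n) = exp(w . phi(x, i, y_prev, y)) /
    sum_{y'} exp(w . phi(x, i, y_prev, y')).
    phi returns a sparse dict of feature -> value (assumed interface)."""
    def score(tag):
        return sum(weights.get(f, 0.0) * v
                   for f, v in phi(x, i, y_prev, tag).items())
    z = sum(math.exp(score(t)) for t in tags)  # local normalizer
    return math.exp(score(y)) / z

# toy feature function: word identity and previous tag, each conjoined
# with the candidate tag
def phi(x, i, y_prev, y):
    return {("word", x[i], y): 1.0, ("prev_tag", y_prev, y): 1.0}

w = {("word", "Fed", "NNP"): 2.0, ("prev_tag", "START", "NNP"): 1.0}
p = memm_prob(w, phi, ["NNP", "VB"], ["Fed", "raises"], 0, "START", "NNP")
print(round(p, 3))  # 0.953
```

Because phi sees the whole input x, any of the rich word-and-context features from the earlier slides can be dropped in without changing the model.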

Page 49

HMM vs. MEMM

• HMM models the joint distribution:
  $p(x_1 \ldots x_n, y_1 \ldots y_n) = q(\mathrm{STOP} \mid y_n) \prod_{i=1}^{n} q(y_i \mid y_{i-1})\, e(x_i \mid y_i)$
• MEMM models the conditional distribution:
  $p(y_1 \ldots y_n \mid x_1 \ldots x_n) = \prod_{i=1}^{n} p(y_i \mid y_1 \ldots y_{i-1}, x_1 \ldots x_n)$

Page 50

Decoding MEMM Taggers

• Scoring:
  $p(y_i \mid y_{i-1}, x_1 \ldots x_n) = \dfrac{e^{w \cdot \phi(x_1 \ldots x_n, i, y_{i-1}, y_i)}}{\sum_{y'} e^{w \cdot \phi(x_1 \ldots x_n, i, y_{i-1}, y')}}$
• Beam search is effective
• Guarantees? Optimal?
• Can we do better?

Page 51

The State Lattice / Trellis
[Figure: trellis with one column of states ^, N, V, J, D, $ per position (START, Fed, raises, interest, rates, STOP). The highlighted path ^ N V V J V corresponds to the factors q(N|^), e(Fed|N), q(V|N), e(raises|V), q(V|V), e(interest|V), q(J|V), e(rates|J), q(V|J), e(STOP|V).]

Page 52

The MEMM State Lattice / Trellis
[Figure: the same trellis, but each transition now also conditions on the input X: the highlighted path ^ N V V J V uses q(N|^, X), q(V|N, X), q(V|V, X), q(J|V, X), q(V|J, X).]

Page 53

Decoding MEMM Taggers
• Decoding MaxEnt taggers:
  – Just like decoding HMMs
  – Viterbi, beam search
• Viterbi algorithm (HMMs):
  – Define $\pi(i, y_i)$ to be the max score of a sequence of length $i$ ending in tag $y_i$:
    $\pi(i, y_i) = \max_{y_{i-1}} e(x_i \mid y_i)\, q(y_i \mid y_{i-1})\, \pi(i-1, y_{i-1})$
• Viterbi algorithm (MaxEnt):
  – Can use the same algorithm for MEMMs, just need to redefine $\pi(i, y_i)$:
    $\pi(i, y_i) = \max_{y_{i-1}} p(y_i \mid y_{i-1}, x_1 \ldots x_n)\, \pi(i-1, y_{i-1})$

Page 54

Some Numbers
• Rough accuracies (overall / unknown words):
  – Most freq tag: ~90% / ~50%
  – Trigram HMM: ~95% / ~55%
  – TnT (Brants, 2000): 96.7% / 85.5%
  – MaxEnt P(y | x): 93.7% / 82.6%
  – MEMM tagger 1: 96.7% / 84.5%
  – MEMM tagger 2: 96.8% / 86.9%
  – Perceptron: 97.1%
  – CRF++: 97.3%
  – Cyclic tagger: 97.2% / 89.0%
  – Upper bound: ~98%
[Ratnaparkhi 1996]

Page 55

Feature Development

Common errors:
NN/JJ NN: official knowledge
RB VBD/VBN NNS: recently sold shares
[Toutanova and Manning 2000]

Page 56

Some Numbers
• Rough accuracies (overall / unknown words):
  – Most freq tag: ~90% / ~50%
  – Trigram HMM: ~95% / ~55%
  – TnT (Brants, 2000): 96.7% / 85.5%
  – MaxEnt P(y | x): 93.7% / 82.6%
  – MEMM tagger 1: 96.7% / 84.5%
  – MEMM tagger 2: 96.8% / 86.9%
  – Perceptron: 97.1%
  – CRF++: 97.3%
  – Cyclic tagger: 97.2% / 89.0%
  – Upper bound: ~98%
[Toutanova and Manning 2000]

Page 57

Locally Normalized Models

• So far:
  – Probabilities are products of locally normalized probabilities
  – Is this bad?
• Label bias:
  – States with fewer transitions are likely to be preferred because normalization is local

Page 58

Locally Normalized Models
• So far:
  – Probabilities are products of locally normalized probabilities
  – Is this bad?
Transitions (each row locally normalized):
from \ to   A     B     C
A           0.4   0.2   0.4
B           0.0   1.0   0.0
C           0.6   0.2   0.2
AAA → 0.4 × 0.4 = 0.16
ABB → 0.2 × 1.0 = 0.2
B → B transitions are likely to take over even if rarely observed!
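The arithmetic behind the example is worth tracing once; a tiny sketch using the transition table above:

```python
# Transition table from the slide above; each row is locally normalized.
trans = {"A": {"A": 0.4, "B": 0.2, "C": 0.4},
         "B": {"A": 0.0, "B": 1.0, "C": 0.0},
         "C": {"A": 0.6, "B": 0.2, "C": 0.2}}

def path_prob(path):
    """Product of locally normalized transition probabilities along a path."""
    p = 1.0
    for prev, cur in zip(path, path[1:]):
        p *= trans[prev][cur]
    return p

print(round(path_prob("AAA"), 2), path_prob("ABB"))  # 0.16 0.2
```

Because B has only one outgoing transition, its probability mass is never split, so ABB beats AAA even if B → B is rarely observed in data: that is the label bias problem motivating globally normalized models.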

Page 59

Global Discriminative Taggers

• Discriminative sequence models
  – CRFs (also Perceptrons)
  – Do not decompose training into independent local regions
  – Can be very slow* to train: require repeated inference on the training set

* Relatively slow. NN models are much slower.

Page 60

Linear Models: Perceptron

Sentence: X = x_1 … x_n
Tag sequence: Y = y_1 … y_m

• The perceptron algorithm
  – Iteratively processes the data, reacting to training errors
  – Can be thought of as trying to drive down training error
• The (online structured) perceptron algorithm:
  – Start with zero weights
  – Visit training instances (X^(i), Y^(i)) one by one
    • Make a prediction: Y* = argmax_Y w · Φ(X^(i), Y)
    • If correct (Y* == Y^(i)): no change, goto next example!
    • If wrong: adjust weights: w = w + Φ(X^(i), Y^(i)) − Φ(X^(i), Y*)
• Challenge: How to compute the argmax efficiently?
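The update rule above can be sketched as a short training loop. The emission/transition feature map and the brute-force argmax below are illustrative assumptions, not the slides' code; a real tagger computes the argmax with Viterbi:

```python
# Minimal structured-perceptron sketch. The feature map and brute-force
# argmax are illustrative; real taggers use Viterbi for the argmax.
from collections import defaultdict
from itertools import product

TAGS = ["N", "V"]

def phi(x, y):
    """Global feature vector Phi(X, Y): emission + transition counts."""
    feats = defaultdict(float)
    prev = "^"
    for word, tag in zip(x, y):
        feats[("emit", word, tag)] += 1.0
        feats[("trans", prev, tag)] += 1.0
        prev = tag
    return feats

def predict(w, x):
    """Y* = argmax_Y w . Phi(X, Y), by brute-force enumeration."""
    return max(product(TAGS, repeat=len(x)),
               key=lambda y: sum(w[f] * v for f, v in phi(x, y).items()))

def train(data, epochs=5):
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = predict(w, x)
            if y_hat != tuple(y_gold):  # wrong: w += Phi(X, Y) - Phi(X, Y*)
                for f, v in phi(x, y_gold).items():
                    w[f] += v
                for f, v in phi(x, y_hat).items():
                    w[f] -= v
    return w

data = [(["dogs", "bark"], ["N", "V"]), (["cats", "sleep"], ["N", "V"])]
w = train(data)
```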

Page 61: Sequence Prediction and Part-of-speech Tagging

Decoding

• Linear Perceptron
  – Features must be local: for X = x_1 … x_n and Y = y_1 … y_n,

      Y* = argmax_Y w · Φ(X, Y)

      Φ(X, Y) = Σ_{j=1}^{n} φ(X, j, y_{j−1}, y_j)

Page 62: Sequence Prediction and Part-of-speech Tagging

The MEMM State Lattice / Trellis

[Figure: a trellis with states {^, N, V, J, D, $} at each of six positions over "START Fed raises interest rates STOP"; the highlighted path is ^ N V V J V, with edges scored by local conditionals q(N|^), q(V|N), q(V|V), q(J|V), q(V|J).]

Page 63: Sequence Prediction and Part-of-speech Tagging

The Perceptron State Lattice / Trellis

[Figure: the same trellis over "START Fed raises interest rates STOP" with highlighted path ^ N V V J V, but edges are scored with dot products: w·Φ(X,1,N,^), w·Φ(X,2,V,N), w·Φ(X,3,V,V), w·Φ(X,4,J,V), w·Φ(X,5,V,J).]

Page 64: Sequence Prediction and Part-of-speech Tagging

Decoding

• Linear Perceptron
  – Features must be local: for X = x_1 … x_n and Y = y_1 … y_n,

      Y* = argmax_Y w · Φ(X, Y),   Φ(X, Y) = Σ_{j=1}^{n} φ(X, j, y_{j−1}, y_j)

  – Define π(i, y_i) to be the max score of a sequence of length i ending in tag y_i
• Viterbi algorithm (HMMs):

      π(i, y_i) = max_{y_{i−1}} e(x_i | y_i) q(y_i | y_{i−1}) π(i−1, y_{i−1})

• Viterbi algorithm (Maxent / MEMMs):

      π(i, y_i) = max_{y_{i−1}} p(y_i | y_{i−1}, x_1 … x_n) π(i−1, y_{i−1})

• Viterbi algorithm (Perceptron):

      π(i, y_i) = max_{y_{i−1}} w · φ(X, i, y_{i−1}, y_i) + π(i−1, y_{i−1})
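The additive perceptron recurrence can be sketched as follows; `score(i, prev, cur)` is a hypothetical stand-in for w · φ(X, i, y_{i−1}, y_i):

```python
# Viterbi decoding for a linear model with local features: a sketch.
# score(i, prev, cur) stands in for w . phi(X, i, y_{i-1}, y_i).
def viterbi(n, tags, score):
    pi = {(0, "^"): 0.0}  # pi[(i, t)]: max score of a length-i prefix ending in t
    bp = {}               # backpointers
    prev_tags = ["^"]
    for i in range(1, n + 1):
        for t in tags:
            best_prev = max(prev_tags, key=lambda p: pi[(i - 1, p)] + score(i, p, t))
            pi[(i, t)] = pi[(i - 1, best_prev)] + score(i, best_prev, t)
            bp[(i, t)] = best_prev
        prev_tags = tags
    # Follow backpointers from the best final tag
    last = max(tags, key=lambda t: pi[(n, t)])
    y = [last]
    for i in range(n, 1, -1):
        y.append(bp[(i, y[-1])])
    return y[::-1]
```

With a toy score that rewards changing tags, the decoder returns an alternating sequence, as expected.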

Page 65: Sequence Prediction and Part-of-speech Tagging

Some Numbers

• Rough accuracies (overall / unknown words):
  – Most freq tag: ~90% / ~50%
  – Trigram HMM: ~95% / ~55%
  – TnT (Brants, 2000): 96.7% / 85.5%
  – MaxEnt P(y | x): 93.7% / 82.6%
  – MEMM tagger 1: 96.7% / 84.5%
  – MEMM tagger 2: 96.8% / 86.9%
  – Perceptron: 97.1%
  – CRF++: 97.3%
  – Cyclic tagger: 97.2% / 89.0%
  – Upper bound: ~98%

[Collins 2002]

Page 66: Sequence Prediction and Part-of-speech Tagging

Conditional Random Fields (CRFs)

• What did we lose with the Perceptron?
  – No probabilities
  – Let's try again with a probabilistic model

Page 67: Sequence Prediction and Part-of-speech Tagging

CRFs

Sentence: X = x_1 … x_n
Tag sequence: Y = y_1 … y_n
Training data: {(X^(i), Y^(i))}_{i=1}^{m}

    p(Y | X; w) = exp(w · Φ(X, Y)) / Σ_{Y'} exp(w · Φ(X, Y'))

• Maximum entropy (logistic regression)
  – Learning: maximize the (log) conditional likelihood of the training data:

      ∂/∂w_j L(w) = Σ_{i=1}^{m} ( Φ_j(X^(i), Y^(i)) − Σ_Y p(Y | X^(i); w) Φ_j(X^(i), Y) ) − λ w_j

  – Computational challenges?
    • Most likely tag sequence, normalization constant, gradient

[Lafferty et al. 2001]
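The gradient above is empirical minus expected feature counts. It can be checked by brute-force enumeration on tiny inputs; the feature map here is an illustrative assumption, and a real CRF computes the expectation with forward-backward rather than enumeration:

```python
import math
from itertools import product

def phi(x, y):
    """Illustrative global features Phi(X, Y): emission + transition counts."""
    feats, prev = {}, "^"
    for word, tag in zip(x, y):
        for f in (("emit", word, tag), ("trans", prev, tag)):
            feats[f] = feats.get(f, 0.0) + 1.0
        prev = tag
    return feats

def crf_gradient(x, y_gold, tags, w, lam=0.0):
    """d/dw_j L = Phi_j(X, Y_gold) - E_{p(Y|X;w)}[Phi_j] - lam * w_j."""
    seqs = list(product(tags, repeat=len(x)))
    scores = [sum(w.get(f, 0.0) * v for f, v in phi(x, y).items()) for y in seqs]
    log_z = math.log(sum(math.exp(s) for s in scores))
    grad = dict(phi(x, y_gold))                 # empirical feature counts
    for y, s in zip(seqs, scores):
        p = math.exp(s - log_z)                 # p(Y | X; w)
        for f, v in phi(x, y).items():
            grad[f] = grad.get(f, 0.0) - p * v  # minus expected counts
    for f in grad:
        grad[f] -= lam * w.get(f, 0.0)          # L2 regularization term
    return grad

# With w = 0 both tags are equally likely, so the gold tag's emission
# feature gets gradient 1 - 0.5 and the competing tag's gets -0.5.
g = crf_gradient(["dog"], ("N",), ["N", "V"], w={})
```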

Page 68: Sequence Prediction and Part-of-speech Tagging

Decoding• CRFs

– Features must be local, for 𝑥 = 𝑥1…𝑥𝑛, and 𝑦 = 𝑦1…𝑦𝑛

• Looks familiar?• Same as linear Perceptron!

⇡(i, yi) = maxyi�1

�(x, i, yi�i, yi) + ⇡(i� 1, yi�1)

p(Y | X;w) =exp(w · �(X,Y ))PY 0 exp(w · �(X,Y 0))

<latexit sha1_base64="lbwFBxUDc4Hcn3NuZC3yIIwvrp8=">AAACNXicbVDLSgMxFM3Ud32NunQTLNIWpMyIoCKC6MaNoGBtS6eUTCbThiYzIclYy9CvcuN3uHPjQsWtv2D6WGjbA4HDOedyc48vGFXacd6szNz8wuLS8kp2dW19Y9Pe2n5QcSIxKeOYxbLqI0UYjUhZU81IVUiCuM9Ixe9cDfzKI5GKxtG97gnS4KgV0ZBipI3UtG9EoeZxGsDqGewW4Tn0Qolw6pEnUeh6OIi1J9q0UD2oFYv91FMJb6a1fH+GnzeBpp1zSs4QcJq4Y5IDY9w27VcviHHCSaQxQ0rVXUfoRoqkppiRftZLFBEId1CL1A2NECeqkQ7P7sN9owQwjKV5kYZD9e9EirhSPe6bJEe6rSa9gTjLqyc6PGmkNBKJJhEeLQoTBnUMBx3CgEqCNesZgrCk5q8Qt5HpTZums6YEd/LkaVI+LJ2W3Luj3MXluI1lsAv2QAG44BhcgGtwC8oAg2fwBj7Ap/VivVtf1vcomrHGMzvgH6yfXz0JqhM=</latexit><latexit sha1_base64="lbwFBxUDc4Hcn3NuZC3yIIwvrp8=">AAACNXicbVDLSgMxFM3Ud32NunQTLNIWpMyIoCKC6MaNoGBtS6eUTCbThiYzIclYy9CvcuN3uHPjQsWtv2D6WGjbA4HDOedyc48vGFXacd6szNz8wuLS8kp2dW19Y9Pe2n5QcSIxKeOYxbLqI0UYjUhZU81IVUiCuM9Ixe9cDfzKI5GKxtG97gnS4KgV0ZBipI3UtG9EoeZxGsDqGewW4Tn0Qolw6pEnUeh6OIi1J9q0UD2oFYv91FMJb6a1fH+GnzeBpp1zSs4QcJq4Y5IDY9w27VcviHHCSaQxQ0rVXUfoRoqkppiRftZLFBEId1CL1A2NECeqkQ7P7sN9owQwjKV5kYZD9e9EirhSPe6bJEe6rSa9gTjLqyc6PGmkNBKJJhEeLQoTBnUMBx3CgEqCNesZgrCk5q8Qt5HpTZums6YEd/LkaVI+LJ2W3Luj3MXluI1lsAv2QAG44BhcgGtwC8oAg2fwBj7Ap/VivVtf1vcomrHGMzvgH6yfXz0JqhM=</latexit><latexit sha1_base64="lbwFBxUDc4Hcn3NuZC3yIIwvrp8=">AAACNXicbVDLSgMxFM3Ud32NunQTLNIWpMyIoCKC6MaNoGBtS6eUTCbThiYzIclYy9CvcuN3uHPjQsWtv2D6WGjbA4HDOedyc48vGFXacd6szNz8wuLS8kp2dW19Y9Pe2n5QcSIxKeOYxbLqI0UYjUhZU81IVUiCuM9Ixe9cDfzKI5GKxtG97gnS4KgV0ZBipI3UtG9EoeZxGsDqGewW4Tn0Qolw6pEnUeh6OIi1J9q0UD2oFYv91FMJb6a1fH+GnzeBpp1zSs4QcJq4Y5IDY9w27VcviHHCSaQxQ0rVXUfoRoqkppiRftZLFBEId1CL1A2NECeqkQ7P7sN9owQwjKV5kYZD9e9EirhSPe6bJEe6rSa9gTjLqyc6PGmkNBKJJhEeLQoTBnUMBx3CgEqCNesZgrCk5q8Qt5HpTZums6YEd/LkaVI+LJ2W3Luj3MXluI1lsAv2QAG44BhcgGtwC8oAg2fwBj7Ap/VivVtf1vcomrHGMzvgH6yfXz0JqhM=</latexit><latexit 
sha1_base64="lbwFBxUDc4Hcn3NuZC3yIIwvrp8=">AAACNXicbVDLSgMxFM3Ud32NunQTLNIWpMyIoCKC6MaNoGBtS6eUTCbThiYzIclYy9CvcuN3uHPjQsWtv2D6WGjbA4HDOedyc48vGFXacd6szNz8wuLS8kp2dW19Y9Pe2n5QcSIxKeOYxbLqI0UYjUhZU81IVUiCuM9Ixe9cDfzKI5GKxtG97gnS4KgV0ZBipI3UtG9EoeZxGsDqGewW4Tn0Qolw6pEnUeh6OIi1J9q0UD2oFYv91FMJb6a1fH+GnzeBpp1zSs4QcJq4Y5IDY9w27VcviHHCSaQxQ0rVXUfoRoqkppiRftZLFBEId1CL1A2NECeqkQ7P7sN9owQwjKV5kYZD9e9EirhSPe6bJEe6rSa9gTjLqyc6PGmkNBKJJhEeLQoTBnUMBx3CgEqCNesZgrCk5q8Qt5HpTZums6YEd/LkaVI+LJ2W3Luj3MXluI1lsAv2QAG44BhcgGtwC8oAg2fwBj7Ap/VivVtf1vcomrHGMzvgH6yfXz0JqhM=</latexit>

�(X,Y ) =nX

j=1

�(X, j, yj�1, yj)

<latexit sha1_base64="Pz5sDx+rdbFFQKs2A3/Hx0F31hA=">AAACFnicbVDLSsNAFJ3UV62vqEs3g0VooYZEBHVRKLpxWcHaShvDZDppp51MwsxEKKF/4cZfceNCxa2482+ctllo9cDlHs65l5l7/JhRqWz7y8gtLC4tr+RXC2vrG5tb5vbOjYwSgUkDRywSLR9JwignDUUVI61YEBT6jDT94cXEb94TIWnEr9UoJm6IepwGFCOlJc+0OnGfllqV2zKswo5MQi8dVJ3xHYeZMaiMtHTojHUflD2zaFv2FPAvcTJSBBnqnvnZ6UY4CQlXmCEp244dKzdFQlHMyLjQSSSJER6iHmlrylFIpJtO7xrDA610YRAJXVzBqfpzI0WhlKPQ15MhUn05703E/7x2ooJTN6U8ThThePZQkDCoIjgJCXapIFixkSYIC6r/CnEfCYSVjrKgQ3DmT/5LGkfWmeVcHRdr51kaebAH9kEJOOAE1MAlqIMGwOABPIEX8Go8Gs/Gm/E+G80Z2c4u+AXj4xvRRJ1t</latexit><latexit sha1_base64="Pz5sDx+rdbFFQKs2A3/Hx0F31hA=">AAACFnicbVDLSsNAFJ3UV62vqEs3g0VooYZEBHVRKLpxWcHaShvDZDppp51MwsxEKKF/4cZfceNCxa2482+ctllo9cDlHs65l5l7/JhRqWz7y8gtLC4tr+RXC2vrG5tb5vbOjYwSgUkDRywSLR9JwignDUUVI61YEBT6jDT94cXEb94TIWnEr9UoJm6IepwGFCOlJc+0OnGfllqV2zKswo5MQi8dVJ3xHYeZMaiMtHTojHUflD2zaFv2FPAvcTJSBBnqnvnZ6UY4CQlXmCEp244dKzdFQlHMyLjQSSSJER6iHmlrylFIpJtO7xrDA610YRAJXVzBqfpzI0WhlKPQ15MhUn05703E/7x2ooJTN6U8ThThePZQkDCoIjgJCXapIFixkSYIC6r/CnEfCYSVjrKgQ3DmT/5LGkfWmeVcHRdr51kaebAH9kEJOOAE1MAlqIMGwOABPIEX8Go8Gs/Gm/E+G80Z2c4u+AXj4xvRRJ1t</latexit><latexit sha1_base64="Pz5sDx+rdbFFQKs2A3/Hx0F31hA=">AAACFnicbVDLSsNAFJ3UV62vqEs3g0VooYZEBHVRKLpxWcHaShvDZDppp51MwsxEKKF/4cZfceNCxa2482+ctllo9cDlHs65l5l7/JhRqWz7y8gtLC4tr+RXC2vrG5tb5vbOjYwSgUkDRywSLR9JwignDUUVI61YEBT6jDT94cXEb94TIWnEr9UoJm6IepwGFCOlJc+0OnGfllqV2zKswo5MQi8dVJ3xHYeZMaiMtHTojHUflD2zaFv2FPAvcTJSBBnqnvnZ6UY4CQlXmCEp244dKzdFQlHMyLjQSSSJER6iHmlrylFIpJtO7xrDA610YRAJXVzBqfpzI0WhlKPQ15MhUn05703E/7x2ooJTN6U8ThThePZQkDCoIjgJCXapIFixkSYIC6r/CnEfCYSVjrKgQ3DmT/5LGkfWmeVcHRdr51kaebAH9kEJOOAE1MAlqIMGwOABPIEX8Go8Gs/Gm/E+G80Z2c4u+AXj4xvRRJ1t</latexit><latexit 
sha1_base64="Pz5sDx+rdbFFQKs2A3/Hx0F31hA=">AAACFnicbVDLSsNAFJ3UV62vqEs3g0VooYZEBHVRKLpxWcHaShvDZDppp51MwsxEKKF/4cZfceNCxa2482+ctllo9cDlHs65l5l7/JhRqWz7y8gtLC4tr+RXC2vrG5tb5vbOjYwSgUkDRywSLR9JwignDUUVI61YEBT6jDT94cXEb94TIWnEr9UoJm6IepwGFCOlJc+0OnGfllqV2zKswo5MQi8dVJ3xHYeZMaiMtHTojHUflD2zaFv2FPAvcTJSBBnqnvnZ6UY4CQlXmCEp244dKzdFQlHMyLjQSSSJER6iHmlrylFIpJtO7xrDA610YRAJXVzBqfpzI0WhlKPQ15MhUn05703E/7x2ooJTN6U8ThThePZQkDCoIjgJCXapIFixkSYIC6r/CnEfCYSVjrKgQ3DmT/5LGkfWmeVcHRdr51kaebAH9kEJOOAE1MAlqIMGwOABPIEX8Go8Gs/Gm/E+G80Z2c4u+AXj4xvRRJ1t</latexit>

Y ⇤ = argmaxY

p(Y | X;w)<latexit sha1_base64="J6jKOFZWc2k6XII9t6ayZ4+AmMk=">AAACB3icbVBNS8NAEN3Ur1q/oh49uFiE6qEkIqiIUPTisYKxLU0Mm+22XbqbhN2NWkKPXvwrXjyoePUvePPfuG1z0NYHA4/3ZpiZF8SMSmVZ30ZuZnZufiG/WFhaXlldM9c3bmSUCEwcHLFI1AMkCaMhcRRVjNRjQRAPGKkFvYuhX7sjQtIovFb9mHgcdULaphgpLfnmduN2H55BF4mOy9GD34BxqQFdTluwfgrv93yzaJWtEeA0sTNSBBmqvvnltiKccBIqzJCUTduKlZcioShmZFBwE0lihHuoQ5qahogT6aWjRwZwVyst2I6ErlDBkfp7IkVcyj4PdCdHqisnvaH4n9dMVPvYS2kYJ4qEeLyonTCoIjhMBbaoIFixviYIC6pvhbiLBMJKZ1fQIdiTL08T56B8UravDouV8yyNPNgCO6AEbHAEKuASVIEDMHgEz+AVvBlPxovxbnyMW3NGNrMJ/sD4/AFpRJc9</latexit><latexit sha1_base64="J6jKOFZWc2k6XII9t6ayZ4+AmMk=">AAACB3icbVBNS8NAEN3Ur1q/oh49uFiE6qEkIqiIUPTisYKxLU0Mm+22XbqbhN2NWkKPXvwrXjyoePUvePPfuG1z0NYHA4/3ZpiZF8SMSmVZ30ZuZnZufiG/WFhaXlldM9c3bmSUCEwcHLFI1AMkCaMhcRRVjNRjQRAPGKkFvYuhX7sjQtIovFb9mHgcdULaphgpLfnmduN2H55BF4mOy9GD34BxqQFdTluwfgrv93yzaJWtEeA0sTNSBBmqvvnltiKccBIqzJCUTduKlZcioShmZFBwE0lihHuoQ5qahogT6aWjRwZwVyst2I6ErlDBkfp7IkVcyj4PdCdHqisnvaH4n9dMVPvYS2kYJ4qEeLyonTCoIjhMBbaoIFixviYIC6pvhbiLBMJKZ1fQIdiTL08T56B8UravDouV8yyNPNgCO6AEbHAEKuASVIEDMHgEz+AVvBlPxovxbnyMW3NGNrMJ/sD4/AFpRJc9</latexit><latexit sha1_base64="J6jKOFZWc2k6XII9t6ayZ4+AmMk=">AAACB3icbVBNS8NAEN3Ur1q/oh49uFiE6qEkIqiIUPTisYKxLU0Mm+22XbqbhN2NWkKPXvwrXjyoePUvePPfuG1z0NYHA4/3ZpiZF8SMSmVZ30ZuZnZufiG/WFhaXlldM9c3bmSUCEwcHLFI1AMkCaMhcRRVjNRjQRAPGKkFvYuhX7sjQtIovFb9mHgcdULaphgpLfnmduN2H55BF4mOy9GD34BxqQFdTluwfgrv93yzaJWtEeA0sTNSBBmqvvnltiKccBIqzJCUTduKlZcioShmZFBwE0lihHuoQ5qahogT6aWjRwZwVyst2I6ErlDBkfp7IkVcyj4PdCdHqisnvaH4n9dMVPvYS2kYJ4qEeLyonTCoIjhMBbaoIFixviYIC6pvhbiLBMJKZ1fQIdiTL08T56B8UravDouV8yyNPNgCO6AEbHAEKuASVIEDMHgEz+AVvBlPxovxbnyMW3NGNrMJ/sD4/AFpRJc9</latexit><latexit 
$$\arg\max_Y \frac{\exp(w \cdot \phi(X, Y))}{\sum_{Y'} \exp(w \cdot \phi(X, Y'))} = \arg\max_Y \exp(w \cdot \phi(X, Y)) = \arg\max_Y\; w \cdot \phi(X, Y)$$
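Since the normalizer does not depend on $Y$, decoding reduces to maximizing the unnormalized score $w \cdot \phi(X, Y)$, which is exactly the Viterbi recursion with additive scores in place of log probabilities. A minimal sketch, assuming a hypothetical `score(i, y_prev, y)` that returns $w \cdot \phi(X, i, y_{i-1}, y_i)$:

```python
def viterbi(n, tags, score, start="<s>"):
    """Return the argmax tag sequence for positions 1..n under additive local scores."""
    # pi[y] = best total score of any tag sequence for positions 1..i ending in y
    pi = {y: score(1, start, y) for y in tags}
    back = []  # back[i-2][y] = best previous tag before y at position i
    for i in range(2, n + 1):
        new_pi, bp = {}, {}
        for y in tags:
            best_prev = max(tags, key=lambda yp: pi[yp] + score(i, yp, y))
            new_pi[y] = pi[best_prev] + score(i, best_prev, y)
            bp[y] = best_prev
        pi = new_pi
        back.append(bp)
    # follow backpointers from the best final tag
    y = max(tags, key=lambda t: pi[t])
    seq = [y]
    for bp in reversed(back):
        y = bp[y]
        seq.append(y)
    return list(reversed(seq))
```

For a toy score function that rewards one tag at odd positions and another at even positions, this recovers the alternating sequence.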

Page 69:

CRFs: Computing Normalization

• Forward algorithm! Remember the HMM case:

$$\pi(i, y_i) = \max_{y_{i-1}} e(x_i \mid y_i)\, q(y_i \mid y_{i-1})\, \pi(i-1, y_{i-1})$$

We need the denominator of

$$p(Y \mid X; w) = \frac{\exp(w \cdot \phi(X, Y))}{\sum_{Y'} \exp(w \cdot \phi(X, Y'))}, \qquad \phi(X, Y) = \sum_{j=1}^{n} \phi(X, j, y_{j-1}, y_j)$$

Because the features decompose over adjacent tag pairs, each exponentiated sequence score factors into a product over positions:

$$\sum_{Y'} \exp(w \cdot \phi(X, Y')) = \sum_{Y'} \exp\left( \sum_{j=1}^{n} w \cdot \phi(X, j, y_{j-1}, y_j) \right) = \sum_{Y'} \prod_{j=1}^{n} \exp(w \cdot \phi(X, j, y_{j-1}, y_j))$$

Define $norm(i, y_i)$ to be the sum of scores over all tag sequences ending with tag $y_i$ at position $i$; it satisfies the recursion

$$norm(i, y_i) = \sum_{y_{i-1}} \exp(w \cdot \phi(X, i, y_{i-1}, y_i))\, norm(i-1, y_{i-1})$$
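The $norm$ recursion translates directly into code. A minimal sketch, again assuming a hypothetical `score(i, y_prev, y)` returning $w \cdot \phi(X, i, y_{i-1}, y_i)$; a real implementation would work in log space with log-sum-exp to avoid overflow:

```python
import math

def partition(n, tags, score, start="<s>"):
    """Compute Z(X) = sum over all tag sequences of exp(total score), in O(n * |tags|^2)."""
    # norm[y] = sum over all sequences for positions 1..i ending in tag y
    norm = {y: math.exp(score(1, start, y)) for y in tags}
    for i in range(2, n + 1):
        norm = {y: sum(math.exp(score(i, yp, y)) * norm[yp] for yp in tags)
                for y in tags}
    return sum(norm.values())  # Z(X) = sum over final tags y_n of norm(n, y_n)
```

The result can be checked against brute-force enumeration of all |tags|^n sequences, which is exactly the exponential sum the factorization avoids.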

Page 70:

CRFs: Computing Gradient

• Can compute with the forward-backward algorithm (see notes for full details!)

The objective gradient needs expected feature counts under

$$p(Y \mid X; w) = \frac{\exp(w \cdot \phi(X, Y))}{\sum_{Y'} \exp(w \cdot \phi(X, Y'))}, \qquad \phi(X, Y) = \sum_{j=1}^{n} \phi(X, j, y_{j-1}, y_j)$$

$$\frac{\partial}{\partial w_j} L(w) = \sum_{i=1}^{m} \left( \phi_j(X^{(i)}, Y^{(i)}) - \sum_{Y} p(Y \mid X^{(i)}; w)\, \phi_j(X^{(i)}, Y) \right) - \lambda w_j$$

The expectation decomposes into per-position pairwise marginals:

$$\sum_{Y} p(Y \mid X^{(i)}; w)\, \phi_j(X^{(i)}, Y) = \sum_{Y} p(Y \mid X^{(i)}; w) \sum_{k=1}^{n} \phi_j(X^{(i)}, k, y_{k-1}, y_k) = \sum_{k=1}^{n} \sum_{a,b} \left( \sum_{Y:\, y_{k-1}=a,\, y_k=b} p(Y \mid X^{(i)}; w) \right) \phi_j(X^{(i)}, k, a, b)$$
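The inner sums are pairwise marginals $p(y_{k-1} = a, y_k = b \mid X)$, which forward-backward computes in O(n · |tags|²). A sketch under the same hypothetical `score(i, y_prev, y)` local potential (real-space for clarity; log space would be used in practice):

```python
import math

def edge_marginals(n, tags, score, start="<s>"):
    """Return p(y_{k-1}=a, y_k=b | X) for k = 2..n via forward-backward."""
    # alpha[i][y]: total exp-score of all prefixes ending with tag y at position i
    alpha = [None, {y: math.exp(score(1, start, y)) for y in tags}]
    for i in range(2, n + 1):
        alpha.append({y: sum(math.exp(score(i, yp, y)) * alpha[i - 1][yp]
                             for yp in tags) for y in tags})
    # beta[i][y]: total exp-score of all suffixes given tag y at position i
    beta = [None] * (n + 1)
    beta[n] = {y: 1.0 for y in tags}
    for i in range(n - 1, 0, -1):
        beta[i] = {y: sum(math.exp(score(i + 1, y, yn)) * beta[i + 1][yn]
                          for yn in tags) for y in tags}
    Z = sum(alpha[n][y] for y in tags)
    marg = {}
    for k in range(2, n + 1):
        for a in tags:
            for b in tags:
                marg[(k, a, b)] = (alpha[k - 1][a] * math.exp(score(k, a, b))
                                   * beta[k][b]) / Z
    return marg
```

As a sanity check, the marginals at each position k must sum to one over all tag pairs (a, b).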

Page 71:

Some Numbers
• Rough accuracies:
  – Most freq tag: ~90% / ~50%
  – Trigram HMM: ~95% / ~55%
  – TnT (Brants, 2000): 96.7% / 85.5%
  – MaxEnt P(y | x): 93.7% / 82.6%
  – MEMM tagger 1: 96.7% / 84.5%
  – MEMM tagger 2: 96.8% / 86.9%
  – Perceptron: 97.1%
  – CRF++: 97.3%
  – Cyclic tagger: 97.2% / 89.0%
  – Upper bound: ~98%

[Sun 2014]

Page 72:

Cyclic Network
• Train two MEMMs, combine scores
• And be very careful
• Tune regularization
• Try lots of different features
• See paper for full details

Cyclic Tagging [Toutanova et al. 2003]

Another idea: train a bi-directional MEMM

(a) Left-to-Right CMM
(b) Right-to-Left CMM
(c) Bidirectional Dependency Network

Figure 1: Dependency networks: (a) the (standard) left-to-right first-order CMM, (b) the (reversed) right-to-left CMM, and (c) the bidirectional dependency network.

From the paper: having expressive templates leads to a large number of features, but by suitable use of a prior (i.e., regularization) in the conditional loglinear model, something not used by previous maximum entropy taggers, many such features can be added with an overall positive effect on the model. Combining all these ideas, together with a few additional handcrafted unknown word features, gives a part-of-speech tagger with a per-position tag accuracy of 97.24% and a whole-sentence correct rate of 56.34% on Penn Treebank WSJ data: an error reduction of 4.4% on the model of Collins (2002), using the same data splits, and of 12.1% from the best previous loglinear model in Toutanova and Manning (2000).

Both the preceding tag and the following tag carry useful information about the current tag. Unidirectional models do not ignore this influence, but in a left-to-right CMM the influence of the following tag is only implicit, via the local model at the next position; the situation is reversed for the right-to-left CMM in figure 1(b). When building a classifier to label the tag at a certain position, the obvious thing to do is to explicitly include in the local model all predictive features, no matter on which side of the target position they lie, as in the bidirectional network of figure 1(c).

The motivating example is observation bias (Klein and Manning, 2002) in a first-order left-to-right CMM. The word "to" has only one tag (TO) in the PTB tag set. The TO tag is often preceded by nouns, but rarely by modals (MD). In a sequence "will to fight", that trend indicates that "will" should be a noun rather than a modal verb. However, that effect is completely lost in a left-to-right CMM: the local model prefers the modal tagging, and the probability of TO given "to" is roughly 1 regardless of the previous tag, so although the model has an arrow between the two tag positions, that path of influence is severed.


And be careful experimentally!
• Try lots of features on the dev. set
• Use L2 regularization
• See paper...

[Toutanova et al. 2003]

Page 73:

Some Numbers
• Rough accuracies:
  – Most freq tag: ~90% / ~50%
  – Trigram HMM: ~95% / ~55%
  – TnT (Brants, 2000): 96.7% / 85.5%
  – MaxEnt P(y | x): 93.7% / 82.6%
  – MEMM tagger 1: 96.7% / 84.5%
  – MEMM tagger 2: 96.8% / 86.9%
  – Perceptron: 97.1%
  – CRF++: 97.3%
  – Cyclic tagger: 97.2% / 89.0%
  – Upper bound: ~98%

[Toutanova et al. 2003]

Page 74:

Summary
• Generative vs. discriminative
• Probabilistic or not
  – Probabilities are great for upstream tasks
  – But: label bias, global normalization, etc.
• Structured or not
  – Independent predictions are effective, but global structure matters
  – But: need to balance global vs. local for tractability
• Model expressivity
  – Higher n-grams are better
  – But: cost

Page 75:

Summary
• For tagging, the change from a generative to a discriminative model does not by itself result in great improvement
• But: we profit from these models by specifying dependence on overlapping features of the observation, such as spelling, suffix analysis, etc.
• MEMMs allow integration of rich features of the observations
• This additional power (of the MEMM, CRF, and Perceptron models) has been shown to result in improvements in accuracy
• The higher accuracy of discriminative models comes at the price of much slower training

Page 76:

Domain Effects
• Accuracies degrade outside of domain
  – Up to triple the error rate
  – Usually make the most errors on the things you care about in the domain (e.g., protein names)
• Open questions
  – How to effectively exploit unlabeled data from a new domain (what could we gain?)
  – How to best incorporate domain lexica in a principled way (e.g., UMLS specialist lexicon, ontologies)