Phrase-based models
•Exact decoding is NP-hard.
•As a consequence of arbitrary permutation…
•…but real permutations are not arbitrary!
•Parameterization of reordering is weak.
•No generalization!
Garcia and associates . / Garcia y asociados .
Carlos Garcia has three associates . / Carlos Garcia tiene tres asociados .
his associates are not strong . / sus asociados no son fuertes .
Garcia has a company also . / Garcia tambien tiene una empresa .
its clients are angry . / sus clientes estan enfadados .
the associates are also angry . / los asociados tambien estan enfadados .
the company has strong enemies in Europe . / la empresa tiene enemigos fuertes en Europa .
the clients and the associates are enemies . / los clientes y los asociados son enemigos .
the company has three groups . / la empresa tiene tres grupos .
its groups are in Europe . / sus grupos estan en Europa .
the modern groups sell strong pharmaceuticals . / los grupos modernos venden medicinas fuertes .
the groups do not sell zanzanine . / los grupos no venden zanzanina .
the small groups are not modern . / los grupos pequenos no son modernos .
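Same pattern: NN JJ → JJ NN (Spanish noun-adjective order becomes English adjective-noun order, e.g. grupos modernos / modern groups).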
Phrase-based models do not capture this generalization.
Context-free grammar
S → NP VP
NP → watashi wa
NP → hako wo
VP → NP V
V → akemasu
Derivation tree: [S [NP watashi wa] [VP [NP hako wo] [V akemasu]]]
Yield: watashi wa hako wo akemasu
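As a concrete check, here is a minimal Python sketch that enumerates every string this grammar derives. The dictionary encoding and the name `generate` are illustrative assumptions, not from the slides. The grammar has no recursion, so its language is finite:

```python
from itertools import product

# The grammar above: each nonterminal maps to a list of right-hand sides.
# Any symbol that is not a key of CFG is treated as a terminal.
CFG = {
    "S": [["NP", "VP"]],
    "NP": [["watashi", "wa"], ["hako", "wo"]],
    "VP": [["NP", "V"]],
    "V": [["akemasu"]],
}

def generate(symbol):
    """Return every terminal string (as a token list) derivable from symbol."""
    if symbol not in CFG:
        return [[symbol]]  # a terminal derives only itself
    results = []
    for rhs in CFG[symbol]:
        # Expand each right-hand-side symbol independently, then concatenate.
        for parts in product(*(generate(sym) for sym in rhs)):
            results.append([tok for part in parts for tok in part])
    return results

sentences = [" ".join(toks) for toks in generate("S")]
for sentence in sentences:
    print(sentence)
```

The example sentence is one of four outputs; the others, such as hako wo watashi wa akemasu, are also licensed by this toy grammar because either NP rule can fill either NP slot.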
Synchronous context-free grammar
Originally: syntax-directed translation (Lewis & Stearns 1966; Aho and Ullman 1969)
Japanese (source) grammar:
S → NP VP
NP → watashi wa
NP → hako wo
VP → NP V
V → akemasu
English (target) grammar:
S → NP VP
NP → I
NP → the box
VP → V NP
V → open
Japanese is SOV. English is SVO.
Synchronous rules (source / target):
S → NP1 VP2 / NP1 VP2
NP → watashi wa / I
NP → hako wo / the box
VP → NP1 V2 / V2 NP1
V → akemasu / open
REQUIREMENT: a one-to-one mapping between source and target nonterminals, indicated by coindexes.
Synchronous derivation (each step rewrites one linked pair of nonterminals on both sides at once):
S / S
⇒ NP1 VP2 / NP1 VP2
⇒ watashi wa VP2 / I VP2
⇒ watashi wa NP3 V4 / I V4 NP3
⇒ watashi wa hako wo V4 / I V4 the box
⇒ watashi wa hako wo akemasu / I open the box
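The same enumeration extends to the synchronous grammar. In this sketch, the encoding of a linked nonterminal as a `(label, index)` tuple and the names `SCFG`, `derive`, and `realize` are assumptions for illustration. Each linked pair is expanded once and the result is copied to both sides, which is what keeps the two derivations in lockstep:

```python
from itertools import product

# Each rule: (source RHS, target RHS). Linked nonterminals are (label, index)
# tuples; plain strings are terminals. Coindexes tie the two sides together.
SCFG = {
    "S": [([("NP", 1), ("VP", 2)], [("NP", 1), ("VP", 2)])],
    "NP": [(["watashi", "wa"], ["I"]),
           (["hako", "wo"], ["the", "box"])],
    "VP": [([("NP", 1), ("V", 2)], [("V", 2), ("NP", 1)])],
    "V": [(["akemasu"], ["open"])],
}

def realize(rhs, sub, side):
    """Spell out one side of a RHS, substituting sub[index][side] for links."""
    out = []
    for sym in rhs:
        if isinstance(sym, tuple):
            out.extend(sub[sym[1]][side])
        else:
            out.append(sym)
    return out

def derive(label):
    """Yield every (source tokens, target tokens) pair derivable from label."""
    for src_rhs, tgt_rhs in SCFG[label]:
        links = [sym for sym in src_rhs if isinstance(sym, tuple)]
        # One subderivation per linked nonterminal, shared by both sides.
        for choice in product(*(list(derive(lab)) for lab, _ in links)):
            sub = {idx: pair for (_, idx), pair in zip(links, choice)}
            yield realize(src_rhs, sub, 0), realize(tgt_rhs, sub, 1)

pairs = {(" ".join(s), " ".join(t)) for s, t in derive("S")}
for source, target in sorted(pairs):
    print(source, "/", target)
```

The VP rule is the only place where the two sides differ in order: its target side lists V2 before NP1, and that single rule carries the whole SOV-to-SVO reordering.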
Translation as parsing
Parse the source sentence with the source side of the SCFG; the target side of the same rules then gives the target tree.
Source: [S [NP watashi wa] [VP [NP hako wo] [V akemasu]]] → watashi wa hako wo akemasu
Target: [S [NP I] [VP [V open] [NP the box]]] → I open the box
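Once the source sentence has a parse, the translation can be read off by visiting the same tree and emitting children in target order. A minimal Python sketch, assuming the parse is given as nested tuples and the SCFG is stored as a pair of hypothetical tables (`REORDER` for the child order of each binary rule, `LEXICON` for lexical rules):

```python
# Source-side parse of "watashi wa hako wo akemasu" as (label, children...).
SOURCE_TREE = ("S",
               ("NP", "watashi wa"),
               ("VP", ("NP", "hako wo"), ("V", "akemasu")))

# Hypothetical encoding of the SCFG: for each binary rule, the target order of
# the source children; for each lexical rule, the target string.
REORDER = {("S", ("NP", "VP")): (0, 1),   # S -> NP1 VP2 / NP1 VP2
           ("VP", ("NP", "V")): (1, 0)}   # VP -> NP1 V2 / V2 NP1
LEXICON = {("NP", "watashi wa"): "I",
           ("NP", "hako wo"): "the box",
           ("V", "akemasu"): "open"}

def translate(tree):
    """Apply the target side of each rule used in the source parse."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[(label, children[0])]
    key = (label, tuple(child[0] for child in children))
    return " ".join(translate(children[i]) for i in REORDER[key])

print(translate(SOURCE_TREE))
```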
Preliminaries
S → NP VP
NP → watashi wa
NP → hako wo
VP → NP V
V → akemasu
Transform the source grammar into Chomsky normal form: all productions in the form X → w or X → Y Z.
S → NP VP
NP → X Y
X → watashi
Y → wa
NP → Z W
Z → hako
W → wo
VP → NP V
V → akemasu
Q: how do synchronous productions interact with this transformation?
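The monolingual part of this transformation can be sketched in Python. This is an illustration under stated assumptions: no empty rules, no unit rules between nonterminals, and fresh names like `T_wa` are hypothetical stand-ins for the slide's X, Y, Z, W. It leaves the synchronous question open:

```python
def to_cnf(rules, nonterminals):
    """Convert (lhs, rhs) rules to Chomsky normal form.

    Assumes no empty rules and no unit rules X -> Y between nonterminals;
    those would need extra elimination steps.
    """
    cnf, preterminals = [], {}
    fresh = 0

    def preterminal(word):
        # One fresh nonterminal per terminal, e.g. T_wa -> wa (hypothetical naming).
        if word not in preterminals:
            preterminals[word] = f"T_{word}"
            cnf.append((preterminals[word], (word,)))
        return preterminals[word]

    for lhs, rhs in rules:
        if len(rhs) == 1:
            cnf.append((lhs, tuple(rhs)))  # already of the form X -> w
            continue
        # Replace terminals inside longer right-hand sides by preterminals.
        syms = [s if s in nonterminals else preterminal(s) for s in rhs]
        # Right-binarize anything longer than two symbols.
        while len(syms) > 2:
            fresh += 1
            new = f"{lhs}_{fresh}"
            cnf.append((lhs, (syms[0], new)))
            lhs, syms = new, syms[1:]
        cnf.append((lhs, tuple(syms)))
    return cnf

GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("watashi", "wa")),
    ("NP", ("hako", "wo")),
    ("VP", ("NP", "V")),
    ("V", ("akemasu",)),
]
for lhs, rhs in to_cnf(GRAMMAR, {"S", "NP", "VP", "V"}):
    print(lhs, "->", " ".join(rhs))
```

Here T_watashi and T_wa play the roles of X and Y on the slide, and T_hako and T_wo those of Z and W.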
Decoding
•A binary-branching (i.e. CNF) grammar can produce a Catalan number of parses of an input sentence: $O\left(\frac{(2n)!}{(n+1)!\,n!}\right)$.
•Dynamic programming to the rescue!
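To see how fast this grows, the sketch below (plain Python; `catalan` and `bracketings` are illustrative names) computes the closed form and checks it against a direct recursive count of binary bracketings. A sentence of n words has $C_{n-1}$ binary bracketings, one index below the formula, which makes no difference asymptotically:

```python
from functools import lru_cache
from math import factorial

def catalan(n):
    """Closed form C_n = (2n)! / ((n + 1)! n!)."""
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

@lru_cache(maxsize=None)
def bracketings(leaves):
    """Count binary-branching trees over a span of `leaves` words."""
    if leaves == 1:
        return 1
    # Choose the split point of the root, then bracket each side independently.
    return sum(bracketings(k) * bracketings(leaves - k) for k in range(1, leaves))

for words in (4, 10, 20):
    print(words, "words:", bracketings(words), "binary trees")
```

The memoized recursion is itself a small dynamic program over spans, the same idea chart parsing applies to grammar-constrained parses.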
Parsing
SCFG (source, target):
NN → duck, pato
PRP → I, yo
VBD → saw, vi
PRP$ → her, su
NP → PRP$1 NN2, PRP$1 NN2
VP → VBD1 NP2, VBD1 NP2
S → PRP1 VP2, PRP1 VP2
PRP → her, ella
VB → duck, agacharse
SBAR → PRP1 VB2, PRP1 VB2
VP → VBD1 SBAR2, VBD1 SBAR2
Source projection, used for parsing:
NN → duck
PRP → I
VBD → saw
PRP$ → her
NP → PRP$ NN
VP → VBD NP
S → PRP VP
PRP → her
VB → duck
SBAR → PRP VB
VP → VBD SBAR
Parsing
Input: I1 saw2 her3 duck4 (subscripts mark word positions; item X_{i,j} means X spans words i+1 through j)
Lexical rule: $X_{i,i+1} \leftarrow (w_{i+1} = w) \wedge (X \rightarrow w)$
Example: $\mathrm{PRP}_{0,1} \leftarrow (w_1 = \text{I}) \wedge (\mathrm{PRP} \rightarrow \text{I})$
Items from the lexical rule: PRP0,1, VBD1,2, PRP$2,3, PRP2,3, NN3,4, VB3,4
Binary rule: $X_{i,j} \leftarrow Y_{i,k} \wedge Z_{k,j} \wedge (X \rightarrow Y\,Z)$
Example: $\mathrm{NP}_{2,4} \leftarrow \mathrm{PRP\$}_{2,3} \wedge \mathrm{NN}_{3,4} \wedge (\mathrm{NP} \rightarrow \mathrm{PRP\$}\;\mathrm{NN})$
Items from the binary rule, in the order they are built: NP2,4, SBAR2,4, VP1,4, S0,4
Parsing
I1 saw2 her3 duck4
Chart: PRP0,1; VBD1,2; PRP$2,3; PRP2,3; NN3,4; VB3,4; NP2,4; SBAR2,4; VP1,4; S0,4
Parse 1: [S [PRP I] [VP [VBD saw] [NP [PRP$ her] [NN duck]]]]
Target side: yo vi su pato
Parse 2: [S [PRP I] [VP [VBD saw] [SBAR [PRP her] [VB duck]]]]
Target side: yo vi ella agacharse
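Both translations can be collected in one pass. The Python sketch below runs CKY on the source side and stores, for each chart item, the set of target strings it can produce. It assumes PRP$ → her, su and PRP → her, ella (the assignment the two target strings above require) and a hypothetical table encoding of the rules. Storing full strings per item is exponential in general and is only for illustration; a practical decoder would typically keep backpointers instead:

```python
from collections import defaultdict
from itertools import product

# Hypothetical encoding of the SCFG. Lexical rules map (label, source word)
# to a target word; binary rules map (label, left child, right child) to the
# target order of the children. Every binary rule here keeps source order.
LEXICAL = {
    ("NN", "duck"): "pato", ("PRP", "I"): "yo", ("VBD", "saw"): "vi",
    ("PRP$", "her"): "su", ("PRP", "her"): "ella", ("VB", "duck"): "agacharse",
}
BINARY = {
    ("NP", "PRP$", "NN"): (0, 1), ("VP", "VBD", "NP"): (0, 1),
    ("S", "PRP", "VP"): (0, 1), ("SBAR", "PRP", "VB"): (0, 1),
    ("VP", "VBD", "SBAR"): (0, 1),
}

def translations(words, start="S"):
    """CKY over the source side, keeping every target string per chart item."""
    n = len(words)
    chart = defaultdict(set)  # (i, label, j) -> set of target strings
    for i, w in enumerate(words):
        for (label, src), tgt in LEXICAL.items():
            if src == w:
                chart[i, label, i + 1].add(tgt)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (x, y, z), order in BINARY.items():
                    # product is empty when either child item is missing.
                    for left, right in product(chart[i, y, k], chart[k, z, j]):
                        kids = (left, right)
                        chart[i, x, j].add(" ".join(kids[m] for m in order))
    return chart[0, start, n]

print(sorted(translations("I saw her duck".split())))
```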
Analysis
Chart nodes (items [i, X, j]): $O(Nn^2)$
Chart edges (rule applications): $O(Gn^3)$
(N = number of nonterminals, G = number of grammar rules, n = sentence length)
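The exponents come from counting positions. In this small Python check (function names are illustrative), a sentence of n words has C(n+1, 2) spans with 0 ≤ i < j ≤ n, which, multiplied by N labels, bounds the nodes. It also has C(n+1, 3) split triples with i < k < j, which, multiplied by G rules, bounds the edges:

```python
from math import comb

def count_spans(n):
    """Spans (i, j) with 0 <= i < j <= n: one chart cell per span."""
    return sum(1 for i in range(n + 1) for j in range(i + 1, n + 1))

def count_splits(n):
    """Triples (i, k, j) with i < k < j: the iterations of the binary-rule loop."""
    return sum(1 for i in range(n + 1)
                 for j in range(i + 2, n + 1)
                 for k in range(i + 1, j))

for n in (4, 10, 40):
    print(n, count_spans(n), comb(n + 1, 2), count_splits(n), comb(n + 1, 3))
```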
Wait a second!
•Phrase-based MT is NP-hard because of permutations (there are a factorial number).
•SCFGs also permute sentences.
•But the decoding algorithm is polynomial…
•What are we giving up for this efficiency?
Permutations
X → X1 X2, X1 X2
X → X1 X2, X2 X1
X → a, a
X → b, b
X → c, c
X → d, d
What permutations of a b c d can this grammar produce?
All 24 permutations of a b c d:
a b c d   a b d c   a c b d   a c d b   a d b c   a d c b
b a c d   b a d c   b c a d   b c d a   b d a c   b d c a
c a b d   c a d b   c b a d   c b d a   c d a b   c d b a
d a b c   d a c b   d b a c   d b c a   d c a b   d c b a
The grammar produces 22 of them. The two it cannot produce:
b d a c
c a d b
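The pattern can be checked mechanically. The sketch below is an illustration, not from the slides: it treats a target order as producible by the two-rule grammar exactly when it can be split recursively into two blocks of contiguous source positions joined in straight or inverted order (the separable permutations). Enumerating all orders of a b c d leaves exactly b d a c and c a d b:

```python
from itertools import permutations

def binarizable(perm):
    """True if perm splits recursively into two contiguous blocks of values,
    joined in straight (X1 X2) or inverted (X2 X1) order."""
    if len(perm) <= 1:
        return True
    for split in range(1, len(perm)):
        left, right = perm[:split], perm[split:]
        # Both halves must cover contiguous ranges of source positions;
        # the two ranges are then necessarily adjacent.
        if (max(left) - min(left) + 1 == len(left)
                and max(right) - min(right) + 1 == len(right)
                and binarizable(left) and binarizable(right)):
            return True
    return False

SOURCE = "a b c d".split()
producible, blocked = [], []
for target in permutations(SOURCE):
    # Source position of each target word, in target order.
    positions = tuple(SOURCE.index(word) for word in target)
    (producible if binarizable(positions) else blocked).append(" ".join(target))

print(len(producible), "producible; blocked:", blocked)
```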
These two are the "inside-out" alignments. A rank-4 SCFG produces them directly:
X → X1 X2 X3 X4, X2 X4 X1 X3
X → X1 X2 X3 X4, X3 X1 X4 X2
No equivalent binary-branching SCFG exists.
Complexity of many problems is exponential in rank (the maximum number of nonterminals on a right-hand side).
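Parsing as deduction
Recurrences: $X_{i,i+1} \leftarrow (w_{i+1} = w) \wedge (X \rightarrow w)$ and $X_{i,j} \leftarrow Y_{i,k} \wedge Z_{k,j} \wedge (X \rightarrow Y\,Z)$
For sentence w1…wn, grammar G with nonterminals N:
items: $[i, X, j]$ for all $i, j \in \{0, \ldots, n\}$, $X \in N$
axioms: $[X \rightarrow w]$ for all $X \rightarrow w \in P_G$; $[X \rightarrow Y\,Z]$ for all $X \rightarrow Y\,Z \in P_G$; $[w_i = w]$ for all $i \in \{1, \ldots, n\}$
inference rules:
$\dfrac{[X \rightarrow w] \quad [w_{i+1} = w]}{[i, X, i+1]}$
$\dfrac{[X \rightarrow Y\,Z] \quad [i, Y, k] \quad [k, Z, j]}{[i, X, j]}$
goal: $[0, S, n]$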
From proof to (pseudo)code
Input: w1…wn, grammar G (all chart entries start as false)
for i in 1,…,n:
    for X->w_i in P(G):
        chart[i-1,X,i] := true
for l in 2,…,n:
    for i in 0,…,n-l:
        j := i+l
        for k in i+1,…,j-1:
            for X->YZ in P(G):
                if chart[i,Y,k] and chart[k,Z,j]:
                    chart[i,X,j] := true
return chart[0,S,n]
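The pseudocode translates almost line for line into Python. This minimal sketch assumes the source projection of the I saw her duck grammar is split into hypothetical `LEXICAL` and `BINARY` lists. Python indexes words from 0, so `words[i - 1]` is w_i:

```python
from collections import defaultdict

# Source projection of the example grammar, split into lexical and binary rules.
LEXICAL = [("NN", "duck"), ("PRP", "I"), ("VBD", "saw"), ("PRP$", "her"),
           ("PRP", "her"), ("VB", "duck")]
BINARY = [("NP", "PRP$", "NN"), ("VP", "VBD", "NP"), ("S", "PRP", "VP"),
          ("SBAR", "PRP", "VB"), ("VP", "VBD", "SBAR")]

def cky_recognize(words, start="S"):
    n = len(words)
    chart = defaultdict(bool)  # (i, X, j) -> True once the item is proved
    for i in range(1, n + 1):
        for x, w in LEXICAL:
            if words[i - 1] == w:  # axiom [w_i = w] matches rule X -> w
                chart[i - 1, x, i] = True
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for x, y, z in BINARY:
                    if chart[i, y, k] and chart[k, z, j]:
                        chart[i, x, j] = True
    return chart[0, start, n], chart

ok, chart = cky_recognize("I saw her duck".split())
print(ok, sorted(key for key, proved in chart.items() if proved))
```

Reading a missing key from a `defaultdict(bool)` inserts a `False` entry, which is why the final print filters on `proved`.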
That’s nice, but…
•We need probabilities.
•We need to compute the most probable parse.
•We need to compute expectations.
Most probable parse
I1 saw2 her3 duck4
Rule probabilities (the rules for PRP and for VP each sum to 1):
NN → duck (1.0)
PRP → I (0.7)
VBD → saw (1.0)
PRP$ → her (1.0)
NP → PRP$ NN (1.0)
VP → VBD NP (0.8)
S → PRP VP (1.0)
PRP → her (0.3)
VB → duck (1.0)
SBAR → PRP VB (1.0)
VP → VBD SBAR (0.2)
Chart values:
PRP0,1 = 0.7, VBD1,2 = 1.0, PRP$2,3 = 1.0, PRP2,3 = 0.3, NN3,4 = 1.0, VB3,4 = 1.0
NP2,4 = 1.0 × 1.0 × 1.0 = 1.0
SBAR2,4 = 0.3 × 1.0 × 1.0 = 0.3
VP1,4 = max(1.0 × 1.0 × 0.8, 1.0 × 0.3 × 0.2) = max(0.8, 0.06) = 0.8
S0,4 = 0.7 × 0.8 × 1.0 = 0.56
$X_{i,j} = \max\left(X_{i,j},\; Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)\right)$
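Keeping a backpointer alongside each max recovers the tree as well as its probability. A minimal Python sketch, assuming PRP → I = 0.7, PRP → her = 0.3, VP → VBD NP = 0.8, VP → VBD SBAR = 0.2, and 1.0 for every other rule (the assignment consistent with the chart values on these slides); the dictionary encoding and function names are illustrative:

```python
# Rule probabilities, encoded as (LHS, word) and (LHS, left, right).
LEXICAL = {("NN", "duck"): 1.0, ("PRP", "I"): 0.7, ("VBD", "saw"): 1.0,
           ("PRP$", "her"): 1.0, ("PRP", "her"): 0.3, ("VB", "duck"): 1.0}
BINARY = {("NP", "PRP$", "NN"): 1.0, ("VP", "VBD", "NP"): 0.8,
          ("S", "PRP", "VP"): 1.0, ("SBAR", "PRP", "VB"): 1.0,
          ("VP", "VBD", "SBAR"): 0.2}

def viterbi_parse(words, start="S"):
    n = len(words)
    best, back = {}, {}  # (i, X, j) -> best probability / how it was built
    for i, w in enumerate(words):
        for (x, word), p in LEXICAL.items():
            if word == w and p > best.get((i, x, i + 1), 0.0):
                best[i, x, i + 1], back[i, x, i + 1] = p, w
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (x, y, z), p in BINARY.items():
                    score = best.get((i, y, k), 0.0) * best.get((k, z, j), 0.0) * p
                    if score > best.get((i, x, j), 0.0):  # the max in the recurrence
                        best[i, x, j], back[i, x, j] = score, (k, y, z)

    def tree(i, x, j):
        # Follow backpointers down to the words.
        bp = back[i, x, j]
        if isinstance(bp, str):
            return (x, bp)
        k, y, z = bp
        return (x, tree(i, y, k), tree(k, z, j))

    if (0, start, n) not in back:
        return 0.0, None
    return best[0, start, n], tree(0, start, n)

prob, parse = viterbi_parse("I saw her duck".split())
print(round(prob, 4), parse)
```

The SBAR reading (0.7 × 0.06 = 0.042) loses to the NP reading at VP1,4, so its backpointer is never stored.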
Rule expectations
I1 saw2 her3 duck4
Same rule probabilities (PRP → I 0.7, PRP → her 0.3, VP → VBD NP 0.8, VP → VBD SBAR 0.2, all other rules 1.0), but now sum over derivations instead of taking the max:
PRP0,1 = 0.7, VBD1,2 = 1.0, PRP$2,3 = 1.0, PRP2,3 = 0.3, NN3,4 = 1.0, VB3,4 = 1.0
NP2,4 = 1.0, SBAR2,4 = 0.3
VP1,4 = 1.0 × 1.0 × 0.8 + 1.0 × 0.3 × 0.2 = 0.8 + 0.06 = 0.86
S0,4 = 0.7 × 0.86 × 1.0 = 0.602 (total probability of the sentence)
$X_{i,j} = X_{i,j} + Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)$
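The sums above are inside probabilities; turning them into rule expectations also needs outside probabilities. Below is a minimal Python inside-outside sketch, assuming the same rule probabilities (the encoding and names are illustrative). The expected count of a rule use is outside(parent) × inside(children) × p(rule), divided by the sentence probability:

```python
from collections import defaultdict

LEXICAL = {("NN", "duck"): 1.0, ("PRP", "I"): 0.7, ("VBD", "saw"): 1.0,
           ("PRP$", "her"): 1.0, ("PRP", "her"): 0.3, ("VB", "duck"): 1.0}
BINARY = {("NP", "PRP$", "NN"): 1.0, ("VP", "VBD", "NP"): 0.8,
          ("S", "PRP", "VP"): 1.0, ("SBAR", "PRP", "VB"): 1.0,
          ("VP", "VBD", "SBAR"): 0.2}

def split_points(n):
    """All (i, k, j) with i < k < j, shortest spans first."""
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for k in range(i + 1, i + length):
                yield i, k, i + length

def rule_expectations(words, start="S"):
    n = len(words)
    inside, outside = defaultdict(float), defaultdict(float)
    for i, w in enumerate(words):
        for (x, word), p in LEXICAL.items():
            if word == w:
                inside[i, x, i + 1] += p
    for i, k, j in split_points(n):  # bottom-up: sum, not max
        for (x, y, z), p in BINARY.items():
            inside[i, x, j] += inside[i, y, k] * inside[k, z, j] * p
    total = inside[0, start, n]
    outside[0, start, n] = 1.0
    for i, k, j in reversed(list(split_points(n))):  # top-down: parents first
        for (x, y, z), p in BINARY.items():
            outside[i, y, k] += outside[i, x, j] * inside[k, z, j] * p
            outside[k, z, j] += outside[i, x, j] * inside[i, y, k] * p
    counts = defaultdict(float)
    for i, k, j in split_points(n):
        for (x, y, z), p in BINARY.items():
            counts[x, y, z] += outside[i, x, j] * inside[i, y, k] * inside[k, z, j] * p / total
    for i, w in enumerate(words):
        for (x, word), p in LEXICAL.items():
            if word == w:
                counts[x, word] += outside[i, x, i + 1] * p / total
    return total, counts

total, counts = rule_expectations("I saw her duck".split())
print(round(total, 3), round(counts["VP", "VBD", "SBAR"], 4))
```

Here VP → VBD SBAR gets expected count 0.042 / 0.602 ≈ 0.07, and VP → VBD NP gets the remainder, since every parse uses exactly one VP rule.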
Similarities
boolean: $X_{i,j} = X_{i,j} \vee \left(Y_{i,k} \wedge Z_{k,j} \wedge (X \rightarrow Y\,Z)\right)$, semiring $\langle \{T, F\}, \vee, F, \wedge, T \rangle$
tropical (max-times, i.e. Viterbi; the tropical semiring in probability space): $X_{i,j} = \max\left(X_{i,j},\; Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)\right)$, semiring $\langle \mathbb{R}^{+}, \max, 0, \times, 1 \rangle$
inside: $X_{i,j} = X_{i,j} + \left(Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)\right)$, semiring $\langle \mathbb{R}^{+}, +, 0, \times, 1 \rangle$
general semiring $\langle A, \oplus, 0, \otimes, 1 \rangle$: $X_{i,j} = X_{i,j} \oplus \left(Y_{i,k} \otimes Z_{k,j} \otimes R(X \rightarrow Y\,Z)\right)$
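Parsing as weighted deduction
For sentence w1…wn, grammar G with nonterminals N:
items: $[i, X, j]$ for all $i, j \in \{0, \ldots, n\}$, $X \in N$
axioms: $[X \rightarrow w]$ for all $X \rightarrow w \in P_G$; $[X \rightarrow Y\,Z]$ for all $X \rightarrow Y\,Z \in P_G$; $[w_i = w]$ for all $i \in \{1, \ldots, n\}$
inference rules:
$\dfrac{[X \rightarrow w] : u \quad [w_{i+1} = w] : v}{[i, X, i+1] : u \otimes v}$
$\dfrac{[X \rightarrow Y\,Z] : u \quad [i, Y, k] : v \quad [k, Z, j] : y}{[i, X, j] : u \otimes v \otimes y}$
goal: $[0, S, n]$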
From proof to (pseudo)code
Input: w1…wn, grammar G with rule weights u (all chart entries start as the semiring 0)
for i in 1,…,n:
    for X->w_i in P(G):
        chart[i-1,X,i] := u(X->w_i)
for l in 2,…,n:
    for i in 0,…,n-l:
        j := i+l
        for k in i+1,…,j-1:
            for X->YZ in P(G):
                chart[i,X,j] ⊕= u(X->YZ) ⊗ chart[i,Y,k] ⊗ chart[k,Z,j]
return chart[0,S,n]
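The same loop can be written once and instantiated with each semiring. In this minimal Python sketch, the `Semiring` tuple, its `lift` field (mapping a rule probability into the semiring's carrier set), and the rule probabilities (PRP → I 0.7, PRP → her 0.3, VP → VBD NP 0.8, VP → VBD SBAR 0.2, all others 1.0) are assumptions for illustration:

```python
from collections import defaultdict
from typing import Callable, NamedTuple

class Semiring(NamedTuple):
    plus: Callable   # ⊕: combines alternative derivations of an item
    zero: object     # identity of ⊕, the initial chart value
    times: Callable  # ⊗: combines the parts of one derivation
    one: object      # identity of ⊗ (unused below; kept to mirror <A, ⊕, 0, ⊗, 1>)
    lift: Callable   # hypothetical helper: rule probability -> semiring value

BOOLEAN = Semiring(lambda a, b: a or b, False, lambda a, b: a and b, True, lambda p: p > 0)
VITERBI = Semiring(max, 0.0, lambda a, b: a * b, 1.0, lambda p: p)
INSIDE = Semiring(lambda a, b: a + b, 0.0, lambda a, b: a * b, 1.0, lambda p: p)

LEXICAL = {("NN", "duck"): 1.0, ("PRP", "I"): 0.7, ("VBD", "saw"): 1.0,
           ("PRP$", "her"): 1.0, ("PRP", "her"): 0.3, ("VB", "duck"): 1.0}
BINARY = {("NP", "PRP$", "NN"): 1.0, ("VP", "VBD", "NP"): 0.8,
          ("S", "PRP", "VP"): 1.0, ("SBAR", "PRP", "VB"): 1.0,
          ("VP", "VBD", "SBAR"): 0.2}

def parse(words, sr, start="S"):
    n = len(words)
    chart = defaultdict(lambda: sr.zero)
    for i in range(1, n + 1):
        for (x, w), p in LEXICAL.items():
            if words[i - 1] == w:
                chart[i - 1, x, i] = sr.plus(chart[i - 1, x, i], sr.lift(p))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (x, y, z), p in BINARY.items():
                    # chart[i,X,j] ⊕= u(X->YZ) ⊗ chart[i,Y,k] ⊗ chart[k,Z,j]
                    term = sr.times(sr.lift(p), sr.times(chart[i, y, k], chart[k, z, j]))
                    chart[i, x, j] = sr.plus(chart[i, x, j], term)
    return chart[0, start, n]

WORDS = "I saw her duck".split()
for name, sr in (("boolean", BOOLEAN), ("viterbi", VITERBI), ("inside", INSIDE)):
    print(name, parse(WORDS, sr))
```

Only the `Semiring` instance changes between the three runs; the chart loop is identical. Another semiring would slot in the same way, provided it supplies ⊕, ⊗, and their identities.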
Semiring parsing
•Viterbi, inside, boolean (Goodman 1999)
•Expectation and variance semirings (Li & Eisner 2009)
•Feature expectations
•Minimum Bayes Risk
•Gradients, etc.
•Minimum error upper envelope (Kumar et al. 2009)