Neural Semantic Role Labeling: What works and what's next
or: What else can we do other than using 1000 LSTM layers :)
Luheng He†, Kenton Lee†, Mike Lewis‡ and Luke Zettlemoyer†*
† Paul G. Allen School of Computer Science & Engineering, Univ. of Washington
‡ Facebook AI Research
* Allen Institute for Artificial Intelligence
Semantic Role Labeling (SRL)

Motivation: who did what to whom, when and where.
• Find out "who did what to whom" in text.
• Given a predicate, identify its arguments and label them with roles (who, what, when, where, why, …).

Applications: Question Answering, Information Extraction, Machine Translation.
Semantic Role Labeling (SRL)

My mug broke into pieces immediately.
  Syntax: subj [My mug], v [broke], prep [into pieces], adv [immediately]
  Roles: ARG1 thing broken [My mug], ARG3 pieces / final state [into pieces], ARGM-TMP temporal [immediately]

The robot broke my favorite mug with a wrench.
  Syntax: subj [The robot], v [broke], obj [my favorite mug], prep [with a wrench]
  Roles: ARG0 breaker [The robot], ARG1 thing broken [my favorite mug], ARG2 instrument [with a wrench]

Frame: break.01
role  description
ARG0  breaker
ARG1  thing broken
ARG2  instrument
ARG3  pieces
ARG4  broken away from what?
The Proposition Bank (PropBank)
Paul Kingsbury and Martha Palmer. From Treebank to PropBank. 2002.

Annotated on top of the Penn Treebank syntax.

Core roles: verb-specific roles (ARG0-ARG5) defined in frame files.

Frame: break.01
role  description
ARG0  breaker
ARG1  thing broken
ARG2  instrument
ARG3  pieces
ARG4  broken away from what?

Frame: buy.01
role  description
ARG0  buyer
ARG1  thing bought
ARG2  seller
ARG3  price paid
ARG4  benefactive

Adjunct roles (ARGM-): shared across verbs.
role  description
TMP   temporal
LOC   location
MNR   manner
DIR   direction
CAU   cause
PRP   purpose
…

PropBank Annotation Guidelines, Bonial et al., 2010.
SRL is a hard problem …
• Over 10 years, F1 on the PropBank test set moved only from 79.4 (Punyakanok, 2005) to 80.3 (FitzGerald, 2015).
• Many interesting challenges: syntactic alternation, prepositional phrase attachment, long-range dependencies and common sense.
Syntactic Alternation

The robot plays piano.  (The robot = ARG0 player; piano = ARG2 instrument)
The music plays softly.  (The music = ARG1 thing performed; softly = ARGM-MNR)
The cafe is playing my favorite song.  (The cafe = ARG0 player; my favorite song = ARG1 thing performed)

The same surface positions (subjects, objects) map to different roles depending on the alternation.
Prepositional Phrase (PP) Attachment

I eat [pasta] [with delight].  (I = ARG0 eater; pasta = ARG1 meal; with delight = ARGM-MNR manner)
I eat [pasta with broccoli].  (I = ARG0 eater; pasta with broccoli = ARG1 meal)
Long-range Dependencies

We flew to Chicago.  (We = ARG1 passenger; to Chicago = ARGM-GOL goal)
We remember the nice view flying to Chicago.  (We = ARG1 passenger; to Chicago = ARGM-GOL goal)
We remember John and Mary flying to Chicago.  (John and Mary = ARG1 passenger; to Chicago = ARGM-GOL goal)
SRL is even harder for out-of-domain data …

"Dip chicken breasts into eggs to coat"

"Active, Ser133-phosphorylated CREB effects transcription of CRE-dependent genes via interaction with the 265-kDa …"
Long-term Plan for Improving SRL

Step 1: Collect more data for SRL
— Question-Answer Driven Semantic Role Labeling (QA-SRL)
— Human-in-the-Loop Parsing

Step 2: Build an accurate SRL model
— Neural Semantic Role Labeling (for PropBank SRL)

Step 3: SRL system for many domains
— Future work …
First Step: Collect more (cheaper) SRL Data

Intuition: anyone who understands the meaning of a sentence should be able to provide annotation for SRL.
Challenge: the complicated annotation process of traditional SRL.
Solution: design a simpler annotation scheme!
Question-Answer Driven SRL (QA-SRL)

Given a sentence and a verb:
Last month, we saw the Grand Canyon flying to Chicago.

Step 1: Ask a question about the verb:  Who was flying?
Step 2: Answer with words in the sentence:  we
Step 3: Repeat, writing as many Q/A pairs as possible; stop when all Q/A pairs are exhausted:
  Where did someone fly to?  Chicago
  When did someone fly?  Last month
Comparing QA-SRL to PropBank

Traditional SRL (PropBank)   QA-SRL
Predicate                    (Verbal) Predicate
Argument                     Answer
Role                         Question
Large role inventory         No role inventory!
Last month, we saw the Grand Canyon flying to Chicago.

QA-SRL:        Who was flying?   Where did someone fly to?   When did someone fly?
PropBank SRL:  ARG1              ARGM-GOL                    ARGM-TMP

Non-expert annotated QA-SRL has about 80% agreement with PropBank.
Long-term Plan for Improving SRL

Step 1: Collect more data for SRL ✓
— Question-Answer Driven Semantic Role Labeling (QA-SRL)
— Human-in-the-Loop Parsing

Step 2: Build an accurate SRL model
— Neural Semantic Role Labeling (for PropBank SRL)

Step 3: SRL system for many domains
— Future work …
SRL Systems

Pipeline systems (Punyakanok et al., 2008; Täckström et al., 2015; FitzGerald et al., 2015):
sentence, predicate → syntactic features → argument identification → candidate argument spans → labeling → labeled arguments → ILP/DP → prediction

End-to-end systems (Collobert et al., 2011; Zhou and Xu, 2015; Wang et al., 2015):
sentence, predicate → context window features → deep BiLSTM + CRF layer → BIO sequence → Viterbi → prediction

This work:
sentence, predicate → deep BiLSTM → BIO sequence → hard constraints → prediction
SRL as a BIO Tagging Problem

Input (sentence and predicate): The cats love hats .
BIO output (Begin, Inside, Outside): B-ARG0 I-ARG0 B-V B-ARG1 O
Final SRL output: [ARG0 The cats] [V love] [ARG1 hats]
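To make the conversion concrete, here is a minimal Python sketch (not from the talk; the function name and tag format are illustrative assumptions) that recovers labeled spans from a BIO sequence:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end, role) spans.

    A minimal sketch assuming tags like "B-ARG0", "I-ARG0", "O";
    malformed sequences (e.g. an I- tag with no open span) are simply
    ignored here and are handled later by constrained decoding.
    """
    spans, start, role = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or tag == "O":
            if role is not None:                  # close the open span
                spans.append((start, i - 1, role))
            start, role = (i, tag[2:]) if tag != "O" else (None, None)
        # "I-" tags extend the currently open span
    if role is not None:
        spans.append((start, len(tags) - 1, role))
    return spans

# bio_to_spans(["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "O"])
# -> [(0, 1, 'ARG0'), (2, 2, 'V'), (3, 3, 'ARG1')]
```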
Model Overview

the cats love hats  (predicate: love)

The softmax layer produces a distribution over BIO tags for every token, e.g. B-ARG0 0.4 and B-ARG1 0.5 for "the", and B-V 0.95 for the predicate "love". The model combines:

(1) Deep BiLSTM tagger
(2) Highway connections
(3) Variational dropout
(4) Viterbi decoding with hard constraints
Model - (1) Deep BiLSTM Tagger

Input: word embeddings concatenated with predicate-marker tag embeddings (100-dim + 100-dim).
Architecture: stacked, alternating forward and backward LSTM layers, followed by a softmax over BIO tags and a per-token argmax.

the cats love hats → B-ARG0 I-ARG0 B-V B-ARG1
Model - (2) Highway Connections

Trend: deeper models for higher accuracy.
— Grammar as a Foreign Language (Vinyals et al., 2014): 3 layers
— End-to-end Semantic Role Labeling (Zhou and Xu, 2015): 8 layers
— Google's Neural Machine Translation (GNMT, Wu et al., 2016): 8 layers
— This work: 8 layers
— Deep Residual Learning for Image Recognition (He et al., 2016): 152 layers
the cats love hats  (predicate: love)

Stacking more BiLSTM layers (1-2, 3-4, 5-6, …) increases expressive power, but makes the model harder to back-propagate through.
Two common remedies for training deep stacks:
— use different learning rates for different layers (harder to reimplement)
— use shortcut connections between layers ("highway" or "residual")
An LSTM layer computes h_t = F(c_{t-1}, h_{t-1}, x_t), where x_t is the input from the previous layer, h_{t-1} is the recurrent input from the previous timestep, and h_t (after a non-linearity) is the output to the next layer. Shortcut connections change the output passed up to the next layer:

— Residual network: h_t + x_t
— Gated highway network: r_t ⊙ h_t + (1 − r_t) ⊙ x_t, where r_t = σ(f(h_{t-1}, x_t))

References: Deep Residual Networks, Kaiming He, ICML 2016 Tutorial; Training Very Deep Networks, Srivastava et al., 2015.
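A minimal numpy sketch of the gated highway combination above (a sketch under assumed shapes; W_r and b_r stand in for the parameters of the gate function f, which are not specified on the slide):

```python
import numpy as np

def highway_output(h_t, x_t, h_prev, W_r, b_r):
    """Gated highway connection between stacked LSTM layers:
    r_t * h_t + (1 - r_t) * x_t with r_t = sigmoid(f(h_{t-1}, x_t)).

    h_t: layer output, x_t: layer input, h_prev: previous-timestep state;
    all 1-D vectors of one hidden size. W_r, b_r are assumed gate params.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # The gate decides how much of the new output h_t vs. the unchanged
    # input x_t is passed up to the next layer.
    r_t = sigmoid(W_r @ np.concatenate([h_prev, x_t]) + b_r)
    # The ungated residual variant would instead return h_t + x_t.
    return r_t * h_t + (1.0 - r_t) * x_t
```

Setting r_t near 1 recovers a plain stacked LSTM; near 0, the layer is skipped entirely, which is what lets gradients flow through very deep stacks.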
Model - (3) Variational Dropout

the cats love  (predicate: love)

Traditionally, dropout masks are only applied to vertical (layer-to-layer) connections, because applying fresh dropout masks to recurrent connections causes too much noise amplification. Variational dropout reuses the same dropout mask at each timestep (Gal and Ghahramani, 2016).
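A small sketch of the idea (dimensions and dropout rate are illustrative, not the paper's exact configuration): sample the recurrent dropout mask once per sequence and reuse it at every timestep, instead of resampling inside the recurrence:

```python
import numpy as np

rng = np.random.default_rng(0)

def variational_dropout_mask(batch_size, hidden_size, rate=0.1):
    """One recurrent dropout mask per sequence (note: no time axis).

    Because the same units are dropped at every timestep, the recurrence
    does not amplify fresh noise at each step (Gal and Ghahramani, 2016).
    """
    keep = 1.0 - rate
    return rng.binomial(1, keep, size=(batch_size, hidden_size)) / keep

# Usage inside the recurrent loop (lstm_step is a hypothetical step fn):
#   mask = variational_dropout_mask(B, H)   # sampled once per sequence
#   for t in range(T):
#       h = lstm_step(x[t], h * mask)       # same mask at every timestep
```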
Model - (4) Viterbi Decoding with Hard Constraints

Greedy (per-token argmax) output can contain BIO inconsistencies, e.g.:

the cats love hats → B-ARG1 I-ARG0 B-V B-ARG1   (I-ARG0 cannot follow B-ARG1)

Fix: decode with Viterbi over the softmax scores, using heuristic transition scores that rule out invalid tag pairs:

s(B-ARG0 → I-ARG0) = 0
s(B-ARG1 → I-ARG0) = −∞
…
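The following Python sketch implements this decoding step (an illustrative implementation, not the released code): build a transition matrix that scores illegal BIO transitions with −inf, then run standard Viterbi over the softmax log-probabilities:

```python
import numpy as np

def constrained_viterbi(log_probs, tags):
    """Viterbi decoding under hard BIO constraints.

    log_probs: [T, K] array of per-token log-probabilities (softmax output).
    tags: list of K tag strings such as "B-ARG0", "I-ARG0", "O".
    Transitions into I-X from anything other than B-X/I-X score -inf;
    all legal transitions score 0, mirroring the heuristic scores above.
    """
    T, K = log_probs.shape
    trans = np.zeros((K, K))
    for j, to_tag in enumerate(tags):
        if to_tag.startswith("I-"):
            for i, from_tag in enumerate(tags):
                if from_tag[2:] != to_tag[2:]:    # only B-X/I-X may precede I-X
                    trans[i, j] = -np.inf
    score = log_probs[0].copy()
    for j, tag in enumerate(tags):
        if tag.startswith("I-"):
            score[j] = -np.inf                    # spans may not start with I-
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + log_probs[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[k] for k in reversed(path)]
```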
Other Implementation Details …

• 8-layer BiLSTMs with 300-dim hidden layers.
• 100-dim GloVe embeddings, updated during training.
• Orthonormal initialization for LSTM weight matrices (Saxe et al., 2013).
• 5-model ensemble with product of experts (Hinton, 2002).
• Trained for 500 epochs.
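Of these choices, the initialization matters most for trainability (see the ablations below). A generic sketch of an orthonormal initializer in the spirit of Saxe et al. (2013), not necessarily the paper's exact variant:

```python
import numpy as np

def orthonormal_init(rows, cols, rng=np.random.default_rng(0)):
    """Orthogonal weight initialization via QR decomposition of a random
    Gaussian matrix, as popularized by Saxe et al. (2013). Orthogonal
    weights keep activations and gradients well-scaled in deep stacks.
    """
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))    # fix signs so the distribution is uniform
    return q[:rows, :cols] if rows >= cols else q.T[:rows, :cols]
```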
Datasets

                       CoNLL-2005 (PropBank)   CoNLL-2012 (OntoNotes)
Size                   40k sentences           140k sentences
Domains                newswire                telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs
Annotated predicates   verbs                   verbs, plus some nominal predicates
CoNLL 2005 Results

F1 on the WSJ test set and the Brown (out-of-domain) test set (*: ensemble models; Ours and Zhou are BiLSTM models, the rest are pipeline models):

Model         WSJ Test   Brown Test
Ours*         84.6       73.6
Ours          83.1       72.1
Zhou          82.8       69.4
FitzGerald*   80.3       72.2
Täckström     79.9       71.3
Toutanova*    80.3       68.8
Punyakanok*   79.4       67.8
CoNLL 2012 (OntoNotes) Results

F1 on the CoNLL 2012 test set (*: ensemble models; Ours and Zhou are BiLSTM models, the rest are pipeline models):

Model         F1
Ours*         83.4
Ours          81.7
Zhou          81.5
FitzGerald*   80.2
Täckström     79.4
Pradhan       77.5
Ablations on Number of Layers (2, 4, 6 and 8)

F1 on the CoNLL-05 development set:

Layers   Greedy decoding   Viterbi decoding
L2       74.6              77.2
L4       79.1              80.5
L6       80.1              81.4
L8       80.5              81.6

Performance increases as the model goes deeper, with the biggest jump from 2 to 4 layers. Shallow models benefit more from constrained decoding.
Ablations (single model, on CoNLL-05 Dev)

Figure: dev F1 over 500 training epochs for the full model and three ablations (no highway, no orthonormal initialization, no dropout).

— Without orthonormal initialization, the deep model learns very slowly.
— Without dropout, the model overfits at ~300 epochs.
What can we learn from the results?

1. What's in the remaining 17%? When does the model still struggle?
2. What are deeper models good at?
3. BiLSTM-based models are very accurate even without syntax. But can we conclude syntax is no longer useful in SRL?
Question (1): When does the model make mistakes?

Analysis
— Error breakdown with oracle transformations.
— E.g. tease apart labeling errors and boundary errors.
— Link the error types to known linguistic phenomena (e.g. PP attachment).
Oracle Transformations (Error Breakdown)

Apply a sequence of oracle transformations to the predicted output and measure F1 after each fix:

1) Fix label: [We] fly to NYC tomorrow.  (relabel ARG0 → ARG1)
2) Move core arg: [They] wrote [an email] to cancel it.  (move the ARG0 label to the correct span)
3) Split/merge span:
   I eat [pasta with delight]. → I eat [pasta] [with delight].  (split ARG1 into ARG1 + ARGM-MNR)
   I eat [pasta] [with broccoli]. → I eat [pasta with broccoli].  (merge into a single ARG1)
4) Fix span boundary: ["No broccoli",] I said. → ["No broccoli"], I said.  (move the comma out of the ARG1 span)
5) Drop/add arg: Hesitantly, they declined to elaborate [on that matter]. → drop the spurious ARG1 and add the missed [Hesitantly] ARGM-MNR.
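As an illustration, a hypothetical Python sketch of the first transformation; the (start, end, role) span representation is an assumption, not the paper's code:

```python
def oracle_fix_labels(predicted, gold):
    """Oracle transformation 1: relabel each predicted span whose
    boundaries exactly match a gold span but whose role differs.
    Boundary, split/merge and drop/add errors are left for the later
    transformations, so measuring F1 after each step isolates how much
    each error type contributes to the remaining gap."""
    gold_role = {(s, e): role for s, e, role in gold}
    return [(s, e, gold_role.get((s, e), role)) for s, e, role in predicted]
```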
Oracle Transformations (Error Breakdown) Results

Figure: F1 after applying each oracle fix in turn (Original → Fix labels → Move arg. → Merge/split spans → Fix boundary → Drop/add arg.), for Ours, Pradhan and Punyakanok (Pradhan and Punyakanok are CoNLL-2005 systems); F1 climbs toward 100 as fixes accumulate.

— Labeling errors account for 29.3% of our model's mistakes.
— Attachment errors account for 25.3%.
Labeling Errors: Confusion Matrix (row-normalized)

• ARG2 is often confused with certain adjuncts (DIR, LOC, MNR). Why? The meaning of ARG2 varies across frame files:

Predicate: move — Arg0-PAG: mover; Arg1-PPT: moved; Arg2-GOL: destination; Arg3-VSP: aspect, domain in which arg1 is moving
Predicate: cut — Arg0-PAG: intentional cutter; Arg1-PPT: thing cut; Arg2-DIR: medium, source; Arg3-MNR: instrument, unintentional cutter; Arg4-GOL: beneficiary
Predicate: strike — Arg0-PAG: Agent; Arg1-PPT: Theme(-Creation); Arg2-MNR: Instrument

• Argument-adjunct distinctions are difficult even for human annotators!

"After many attempts to find a reliable test to distinguish between arguments and adjuncts, we abandoned structurally marking this difference."
— The Penn Treebank: An Overview (Taylor et al., 2003)
PP Attachment Errors

Sumitomo financed the acquisition from Sears

Correct PP attachment (attach low): one span, [Arg1 (NP) the acquisition from Sears].
Wrong PP attachment (attach high): two spans, [Arg1 (NP) the acquisition] [Arg2 (PP) from Sears]; the oracle merge operation combines them into the correct span.

Merge/split span operations account for 25.3% of the model's mistakes. We categorize the Y spans in [XY] → [X][Y] and [X][Y] → [XY] operations using gold syntactic labels.
Question (1): When does the model make mistakes?

Analysis
— Error breakdown with oracle transformations.
— E.g. tease apart labeling errors and boundary errors.
— Link the error types to known linguistic phenomena (e.g. PP attachment).

Takeaway
— Traditionally hard tasks, such as argument-adjunct distinctions and PP attachment decisions, are still challenging!
— Use external information to improve PP attachment.
Question (2): What are deeper models good at?

Analysis
— Long-range dependencies: model performance on arguments that are far away from the predicate.
— Structural consistency: the number of inconsistent BIO tag pairs in greedy predictions.
Long-range Dependencies

Figure: average F1 by the argument's distance to the predicate (in words), for L8, L6, L4, L2, Punyakanok and Pradhan. Number of gold arguments per distance bucket: 4,469 (0), 2,151 (1-2), 1,174 (3-7), 657 (7-max).

Deep models' performance deteriorates more slowly on long-distance predictions.
Structural Consistency

BIO violations per token, e.g. (B-ArgX, I-ArgY) or (O, I-ArgY), in greedy predictions:

L2: 0.18   L4: 0.08   L6: 0.06   L8: 0.07   L8+Ensemble: 0.07

Deeper models (with 4+ layers) generate more consistent BIO sequences.
Question (3): Can syntax still help SRL?

Recap
— PropBank SRL is annotated on top of the PTB syntax.
— More than 98% of the gold SRL spans are syntactic constituents.

Analysis
— At decoding time, make predicted argument spans agree with a given syntactic structure.
— See if SRL performance increases.
Figure: F1 vs. percentage of predicted arguments that are constituents in the gold syntax tree (91-99%), for BiLSTM-based models (L2, L4, L6, L8, L8+PoE) and syntax-aware models (Punyakanok, Pradhan, Gold). Can we improve SRL accuracy by adding syntactic information?
Constrained Decoding with Syntax

[The cats] love [hats and the dogs] love bananas.  (predicted: ARG0 = The cats, ARG1 = hats and the dogs)

[The cats] ∈ syntax tree, but [hats and the dogs] ∉ syntax tree, so the second span should be penalized.

Penalize the sequence score for each argument span that is not a constituent:

score = Σ_{t=1}^{T} log p(tag_t | sentence) − C × Σ_{span} 1(span ∉ syntax tree)

where C is the penalty strength and the second sum counts arguments that disagree with the syntax.

• These constraints are not locally decomposable.
• Use A* search (Lewis and Steedman, 2014) to find the sequence with the highest score.
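A sketch of this scoring function (a hypothetical helper, not the released code); in practice the best sequence under this score is found with A* search rather than by scoring candidate sequences one at a time:

```python
def syntax_penalized_score(log_probs, tag_seq, arg_spans, constituents, C=1.0):
    """Score a BIO tag sequence as
        sum_t log p(tag_t | sentence) - C * #{argument spans not in the tree}.

    log_probs: per-token dicts mapping tag -> log-probability.
    arg_spans: (start, end) argument spans decoded from tag_seq.
    constituents: set of (start, end) spans from the syntactic parse.
    C: penalty strength (an illustrative default).
    """
    emission = sum(log_probs[t][tag] for t, tag in enumerate(tag_seq))
    violations = sum(1 for span in arg_spans if span not in constituents)
    return emission - C * violations
```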
Parsers used for constrained decoding:
— Charniak: A Maximum-Entropy-Inspired Parser, Charniak, 2000
— Choe: Parsing as Language Modeling, Choe and Charniak, 2016 (state of the art)
Figure: constrained decoding with automatic syntax (+AutoSyn) and gold syntax (+GoldSyn) moves the L8+PoE model toward the syntax-aware models on the F1 vs. syntactic-agreement plot.

Takeaway
— Modest gain observed with predicted syntax.
— Joint training could bring more improvement.
Contributions (Neural SRL)

• New state-of-the-art deep network for end-to-end SRL.
• Code and models will be publicly available at: https://github.com/luheng/deep_srl
• In-depth error analysis indicating where the models work well and where they still struggle.
• Syntax-based experiments pointing towards directions for future improvement.
Long-term Plan for Improving SRL

Step 1: Collect more data for SRL ✓
— Question-Answer Driven Semantic Role Labeling (QA-SRL)
— Human-in-the-Loop Parsing

Step 2: Build an accurate SRL model ✓
— Neural Semantic Role Labeling (for PropBank SRL)

Step 3: SRL system for many domains
— Future work …
Thanks!

Code will be available at: https://github.com/luheng/deep_srl