Neural Semantic Role Labeling: What works and what's next
or: What else can we do other than using 1000 LSTM layers :)
Luheng He†, Kenton Lee†, Mike Lewis‡ and Luke Zettlemoyer†*
† Paul G. Allen School of Computer Science & Engineering, Univ. of Washington
‡ Facebook AI Research
* Allen Institute for Artificial Intelligence
Semantic Role Labeling (SRL)

Motivation: who did what to whom, when and where.
• Find out "who did what to whom" in text.
• Given a predicate, identify its arguments and label them with roles (who, what, when, where, why, …).

Applications: Question Answering, Information Extraction, Machine Translation.
Semantic Role Labeling (SRL)

My mug broke into pieces immediately.
  Syntax: subj [My mug], v [broke], prep [into pieces], adv [immediately]
  Roles: ARG1 thing broken [My mug], ARG3 pieces / final state [into pieces], ARGM-TMP temporal [immediately]

The robot broke my favorite mug with a wrench.
  Syntax: subj [The robot], v [broke], obj [my favorite mug], prep [with a wrench]
  Roles: ARG0 breaker [The robot], ARG1 thing broken [my favorite mug], ARG2 instrument [with a wrench]

Frame: break.01
role  description
ARG0  breaker
ARG1  thing broken
ARG2  instrument
ARG3  pieces
ARG4  broken away from what?
The Proposition Bank (PropBank)
Paul Kingsbury and Martha Palmer. From Treebank to PropBank. 2002.

Annotated on top of the Penn Treebank syntax.

Core roles: verb-specific roles (ARG0-ARG5) defined in frame files.

Frame: break.01
role  description
ARG0  breaker
ARG1  thing broken
ARG2  instrument
ARG3  pieces
ARG4  broken away from what?

Frame: buy.01
role  description
ARG0  buyer
ARG1  thing bought
ARG2  seller
ARG3  price paid
ARG4  benefactive

Adjunct roles (ARGM-): shared across verbs.
role  description
TMP   temporal
LOC   location
MNR   manner
DIR   direction
CAU   cause
PRP   purpose
…

PropBank Annotation Guidelines, Bonial et al., 2010.
SRL is a hard problem …
• Over 10 years, F1 on the PropBank test set moved only from 79.4 (Punyakanok, 2005) to 80.3 (FitzGerald, 2015).
• Many interesting challenges: syntactic alternation, prepositional phrase attachment, long-range dependencies and common sense.
Syntactic Alternation

The robot plays piano.  (The robot = ARG0 player; piano = ARG2 instrument)
The music plays softly.  (The music = ARG1 thing performed; softly = ARGM-MNR)
The cafe is playing my favorite song.  (The cafe = ARG0 player; my favorite song = ARG1 thing performed)

The same surface positions (subjects, objects) map to different roles depending on the alternation.
Prepositional Phrase (PP) Attachment

I eat [pasta] [with delight].  (I = ARG0 eater; pasta = ARG1 meal; with delight = ARGM-MNR manner)
I eat [pasta with broccoli].  (I = ARG0 eater; pasta with broccoli = ARG1 meal)
Long-range Dependencies

We flew to Chicago.  (We = ARG1 passenger; to Chicago = ARGM-GOL goal)
We remember the nice view flying to Chicago.  (We = ARG1 passenger; to Chicago = ARGM-GOL goal)
We remember John and Mary flying to Chicago.  (John and Mary = ARG1 passenger; to Chicago = ARGM-GOL goal)
SRL is even harder for out-of-domain data …

"Dip chicken breasts into eggs to coat"

"Active, Ser133-phosphorylated CREB effects transcription of CRE-dependent genes via interaction with the 265-kDa …"
Long-term Plan for Improving SRL

Step 1: Collect more data for SRL
— Question-Answer Driven Semantic Role Labeling (QA-SRL)
— Human-in-the-Loop Parsing

Step 2: Build an accurate SRL model
— Neural Semantic Role Labeling (for PropBank SRL)

Step 3: SRL system for many domains
— Future work …
First Step: Collect more (cheaper) SRL Data

Intuition: anyone who understands the meaning of a sentence should be able to provide annotation for SRL.
Challenge: the complicated annotation process of traditional SRL.
Solution: design a simpler annotation scheme!
Question-Answer Driven SRL (QA-SRL)

Given a sentence and a verb:
Last month, we saw the Grand Canyon flying to Chicago.

Step 1: Ask a question about the verb:  Who was flying?
Step 2: Answer with words in the sentence:  we
Step 3: Repeat, writing as many Q/A pairs as possible; stop when all Q/A pairs are exhausted:
  Where did someone fly to?  Chicago
  When did someone fly?  Last month
Comparing QA-SRL to PropBank

Traditional SRL (PropBank)   QA-SRL
Predicate                    (Verbal) Predicate
Argument                     Answer
Role                         Question
Large role inventory         No role inventory!
Last month, we saw the Grand Canyon flying to Chicago.

QA-SRL:        Who was flying?   Where did someone fly to?   When did someone fly?
PropBank SRL:  ARG1              ARGM-GOL                    ARGM-TMP

Non-expert annotated QA-SRL has about 80% agreement with PropBank.
Long-term Plan for Improving SRL

Step 1: Collect more data for SRL ✓
— Question-Answer Driven Semantic Role Labeling (QA-SRL)
— Human-in-the-Loop Parsing

Step 2: Build an accurate SRL model
— Neural Semantic Role Labeling (for PropBank SRL)

Step 3: SRL system for many domains
— Future work …
SRL Systems

Pipeline systems (Punyakanok et al., 2008; Täckström et al., 2015; FitzGerald et al., 2015):
sentence, predicate → syntactic features → argument identification → candidate argument spans → labeling → labeled arguments → ILP/DP → prediction

End-to-end systems (Collobert et al., 2011; Zhou and Xu, 2015; Wang et al., 2015):
sentence, predicate → context window features → deep BiLSTM + CRF layer → BIO sequence → Viterbi → prediction

This work:
sentence, predicate → deep BiLSTM → BIO sequence → hard constraints → prediction
SRL as a BIO Tagging Problem

Input (sentence and predicate): The cats love hats .
BIO output (Begin, Inside, Outside): B-ARG0 I-ARG0 B-V B-ARG1 O
Final SRL output: [ARG0 The cats] [V love] [ARG1 hats]
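To make the conversion concrete, here is a minimal Python sketch (not from the talk; the function name and tag format are illustrative assumptions) that recovers labeled spans from a BIO sequence:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end, role) spans.

    A minimal sketch assuming tags like "B-ARG0", "I-ARG0", "O";
    malformed sequences (e.g. an I- tag with no open span) are simply
    ignored here and are handled later by constrained decoding.
    """
    spans, start, role = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or tag == "O":
            if role is not None:                  # close the open span
                spans.append((start, i - 1, role))
            start, role = (i, tag[2:]) if tag != "O" else (None, None)
        # "I-" tags extend the currently open span
    if role is not None:
        spans.append((start, len(tags) - 1, role))
    return spans

# bio_to_spans(["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "O"])
# -> [(0, 1, 'ARG0'), (2, 2, 'V'), (3, 3, 'ARG1')]
```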
Model Overview

the cats love hats  (predicate: love)

The softmax layer produces a distribution over BIO tags for every token, e.g. B-ARG0 0.4 and B-ARG1 0.5 for "the", and B-V 0.95 for the predicate "love". The model combines:

(1) Deep BiLSTM tagger
(2) Highway connections
(3) Variational dropout
(4) Viterbi decoding with hard constraints
Model - (1) Deep BiLSTM Tagger

Input: word embeddings concatenated with predicate-marker tag embeddings (100-dim + 100-dim).
Architecture: stacked, alternating forward and backward LSTM layers, followed by a softmax over BIO tags and a per-token argmax.

the cats love hats → B-ARG0 I-ARG0 B-V B-ARG1
Model - (2) Highway Connections

Trend: deeper models for higher accuracy.
— Grammar as a Foreign Language (Vinyals et al., 2014): 3 layers
— End-to-end Semantic Role Labeling (Zhou and Xu, 2015): 8 layers
— Google's Neural Machine Translation (GNMT, Wu et al., 2016): 8 layers
— This work: 8 layers
— Deep Residual Learning for Image Recognition (He et al., 2016): 152 layers
the cats love hats  (predicate: love)

Stacking more BiLSTM layers (1-2, 3-4, 5-6, …) increases expressive power, but makes the model harder to back-propagate through.
Two common remedies for training deep stacks:
— use different learning rates for different layers (harder to reimplement)
— use shortcut connections between layers ("highway" or "residual")
An LSTM layer computes h_t = F(c_{t-1}, h_{t-1}, x_t), where x_t is the input from the previous layer, h_{t-1} is the recurrent input from the previous timestep, and h_t (after a non-linearity) is the output to the next layer. Shortcut connections change the output passed up to the next layer:

— Residual network: h_t + x_t
— Gated highway network: r_t ⊙ h_t + (1 − r_t) ⊙ x_t, where r_t = σ(f(h_{t-1}, x_t))

References: Deep Residual Networks, Kaiming He, ICML 2016 Tutorial; Training Very Deep Networks, Srivastava et al., 2015.
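A minimal numpy sketch of the gated highway combination above (a sketch under assumed shapes; W_r and b_r stand in for the parameters of the gate function f, which are not specified on the slide):

```python
import numpy as np

def highway_output(h_t, x_t, h_prev, W_r, b_r):
    """Gated highway connection between stacked LSTM layers:
    r_t * h_t + (1 - r_t) * x_t with r_t = sigmoid(f(h_{t-1}, x_t)).

    h_t: layer output, x_t: layer input, h_prev: previous-timestep state;
    all 1-D vectors of one hidden size. W_r, b_r are assumed gate params.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # The gate decides how much of the new output h_t vs. the unchanged
    # input x_t is passed up to the next layer.
    r_t = sigmoid(W_r @ np.concatenate([h_prev, x_t]) + b_r)
    # The ungated residual variant would instead return h_t + x_t.
    return r_t * h_t + (1.0 - r_t) * x_t
```

Setting r_t near 1 recovers a plain stacked LSTM; near 0, the layer is skipped entirely, which is what lets gradients flow through very deep stacks.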
Model - (3) Variational Dropout

the cats love  (predicate: love)

Traditionally, dropout masks are only applied to vertical (layer-to-layer) connections, because applying fresh dropout masks to recurrent connections causes too much noise amplification. Variational dropout reuses the same dropout mask at each timestep (Gal and Ghahramani, 2016).
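A small sketch of the idea (dimensions and dropout rate are illustrative, not the paper's exact configuration): sample the recurrent dropout mask once per sequence and reuse it at every timestep, instead of resampling inside the recurrence:

```python
import numpy as np

rng = np.random.default_rng(0)

def variational_dropout_mask(batch_size, hidden_size, rate=0.1):
    """One recurrent dropout mask per sequence (note: no time axis).

    Because the same units are dropped at every timestep, the recurrence
    does not amplify fresh noise at each step (Gal and Ghahramani, 2016).
    """
    keep = 1.0 - rate
    return rng.binomial(1, keep, size=(batch_size, hidden_size)) / keep

# Usage inside the recurrent loop (lstm_step is a hypothetical step fn):
#   mask = variational_dropout_mask(B, H)   # sampled once per sequence
#   for t in range(T):
#       h = lstm_step(x[t], h * mask)       # same mask at every timestep
```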
Model - (4) Viterbi Decoding with Hard Constraints

Greedy (per-token argmax) output can contain BIO inconsistencies, e.g.:

the cats love hats → B-ARG1 I-ARG0 B-V B-ARG1   (I-ARG0 cannot follow B-ARG1)

Fix: decode with Viterbi over the softmax scores, using heuristic transition scores that rule out invalid tag pairs:

s(B-ARG0 → I-ARG0) = 0
s(B-ARG1 → I-ARG0) = −∞
…
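The following Python sketch implements this decoding step (an illustrative implementation, not the released code): build a transition matrix that scores illegal BIO transitions with −inf, then run standard Viterbi over the softmax log-probabilities:

```python
import numpy as np

def constrained_viterbi(log_probs, tags):
    """Viterbi decoding under hard BIO constraints.

    log_probs: [T, K] array of per-token log-probabilities (softmax output).
    tags: list of K tag strings such as "B-ARG0", "I-ARG0", "O".
    Transitions into I-X from anything other than B-X/I-X score -inf;
    all legal transitions score 0, mirroring the heuristic scores above.
    """
    T, K = log_probs.shape
    trans = np.zeros((K, K))
    for j, to_tag in enumerate(tags):
        if to_tag.startswith("I-"):
            for i, from_tag in enumerate(tags):
                if from_tag[2:] != to_tag[2:]:    # only B-X/I-X may precede I-X
                    trans[i, j] = -np.inf
    score = log_probs[0].copy()
    for j, tag in enumerate(tags):
        if tag.startswith("I-"):
            score[j] = -np.inf                    # spans may not start with I-
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + log_probs[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[k] for k in reversed(path)]
```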
Other Implementation Details …

• 8-layer BiLSTMs with 300-dim hidden layers.
• 100-dim GloVe embeddings, updated during training.
• Orthonormal initialization for LSTM weight matrices (Saxe et al., 2013).
• 5-model ensemble with product of experts (Hinton, 2002).
• Trained for 500 epochs.
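Of these choices, the initialization matters most for trainability (see the ablations below). A generic sketch of an orthonormal initializer in the spirit of Saxe et al. (2013), not necessarily the paper's exact variant:

```python
import numpy as np

def orthonormal_init(rows, cols, rng=np.random.default_rng(0)):
    """Orthogonal weight initialization via QR decomposition of a random
    Gaussian matrix, as popularized by Saxe et al. (2013). Orthogonal
    weights keep activations and gradients well-scaled in deep stacks.
    """
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))    # fix signs so the distribution is uniform
    return q[:rows, :cols] if rows >= cols else q.T[:rows, :cols]
```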
Datasets

                       CoNLL-2005 (PropBank)   CoNLL-2012 (OntoNotes)
Size                   40k sentences           140k sentences
Domains                newswire                telephone conversations, newswire, newsgroups, broadcast news, broadcast conversation, weblogs
Annotated predicates   verbs                   verbs, plus some nominal predicates
CoNLL 2005 Results

F1 on the WSJ test set and the Brown (out-of-domain) test set (*: ensemble models; Ours and Zhou are BiLSTM models, the rest are pipeline models):

Model         WSJ Test   Brown Test
Ours*         84.6       73.6
Ours          83.1       72.1
Zhou          82.8       69.4
FitzGerald*   80.3       72.2
Täckström     79.9       71.3
Toutanova*    80.3       68.8
Punyakanok*   79.4       67.8
CoNLL 2012 (OntoNotes) Results

F1 on the CoNLL 2012 test set (*: ensemble models; Ours and Zhou are BiLSTM models, the rest are pipeline models):

Model         F1
Ours*         83.4
Ours          81.7
Zhou          81.5
FitzGerald*   80.2
Täckström     79.4
Pradhan       77.5
Ablations on Number of Layers (2, 4, 6 and 8)

F1 on the CoNLL-05 development set:

Layers   Greedy decoding   Viterbi decoding
L2       74.6              77.2
L4       79.1              80.5
L6       80.1              81.4
L8       80.5              81.6

Performance increases as the model goes deeper, with the biggest jump from 2 to 4 layers. Shallow models benefit more from constrained decoding.
Ablations (single model, on CoNLL-05 Dev)

Figure: dev F1 over 500 training epochs for the full model and three ablations (no highway, no orthonormal initialization, no dropout).

— Without orthonormal initialization, the deep model learns very slowly.
— Without dropout, the model overfits at ~300 epochs.
What can we learn from the results?

1. What's in the remaining 17%? When does the model still struggle?
2. What are deeper models good at?
3. BiLSTM-based models are very accurate even without syntax. But can we conclude syntax is no longer useful in SRL?
Question (1): When does the model make mistakes?

Analysis
— Error breakdown with oracle transformations.
— E.g. tease apart labeling errors and boundary errors.
— Link the error types to known linguistic phenomena (e.g. PP attachment).
Oracle Transformations (Error Breakdown)

Apply a sequence of oracle transformations to the predicted output and measure F1 after each fix:

1) Fix label: [We] fly to NYC tomorrow.  (relabel ARG0 → ARG1)
2) Move core arg: [They] wrote [an email] to cancel it.  (move the ARG0 label to the correct span)
3) Split/merge span:
   I eat [pasta with delight]. → I eat [pasta] [with delight].  (split ARG1 into ARG1 + ARGM-MNR)
   I eat [pasta] [with broccoli]. → I eat [pasta with broccoli].  (merge into a single ARG1)
4) Fix span boundary: ["No broccoli",] I said. → ["No broccoli"], I said.  (move the comma out of the ARG1 span)
5) Drop/add arg: Hesitantly, they declined to elaborate [on that matter]. → drop the spurious ARG1 and add the missed [Hesitantly] ARGM-MNR.
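As an illustration, a hypothetical Python sketch of the first transformation; the (start, end, role) span representation is an assumption, not the paper's code:

```python
def oracle_fix_labels(predicted, gold):
    """Oracle transformation 1: relabel each predicted span whose
    boundaries exactly match a gold span but whose role differs.
    Boundary, split/merge and drop/add errors are left for the later
    transformations, so measuring F1 after each step isolates how much
    each error type contributes to the remaining gap."""
    gold_role = {(s, e): role for s, e, role in gold}
    return [(s, e, gold_role.get((s, e), role)) for s, e, role in predicted]
```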
Oracle Transformations (Error Breakdown) Results

Figure: F1 after applying each oracle fix in turn (Original → Fix labels → Move arg. → Merge/split spans → Fix boundary → Drop/add arg.), for Ours, Pradhan and Punyakanok (Pradhan and Punyakanok are CoNLL-2005 systems); F1 climbs toward 100 as fixes accumulate.

— Labeling errors account for 29.3% of our model's mistakes.
— Attachment errors account for 25.3%.
Labeling Errors: Confusion Matrix (row-normalized)

• ARG2 is often confused with certain adjuncts (DIR, LOC, MNR). Why? The meaning of ARG2 varies across frame files:

Predicate: move — Arg0-PAG: mover; Arg1-PPT: moved; Arg2-GOL: destination; Arg3-VSP: aspect, domain in which arg1 is moving
Predicate: cut — Arg0-PAG: intentional cutter; Arg1-PPT: thing cut; Arg2-DIR: medium, source; Arg3-MNR: instrument, unintentional cutter; Arg4-GOL: beneficiary
Predicate: strike — Arg0-PAG: Agent; Arg1-PPT: Theme(-Creation); Arg2-MNR: Instrument

• Argument-adjunct distinctions are difficult even for human annotators!

"After many attempts to find a reliable test to distinguish between arguments and adjuncts, we abandoned structurally marking this difference."
— The Penn Treebank: An Overview (Taylor et al., 2003)
PP Attachment Errors

Sumitomo financed the acquisition from Sears

Correct PP attachment (attach low): one span, [Arg1 (NP) the acquisition from Sears].
Wrong PP attachment (attach high): two spans, [Arg1 (NP) the acquisition] [Arg2 (PP) from Sears]; the oracle merge operation combines them into the correct span.

Merge/split span operations account for 25.3% of the model's mistakes. We categorize the Y spans in [XY] → [X][Y] and [X][Y] → [XY] operations using gold syntactic labels.
Question (1): When does the model make mistakes?

Analysis
— Error breakdown with oracle transformations.
— E.g. tease apart labeling errors and boundary errors.
— Link the error types to known linguistic phenomena (e.g. PP attachment).

Takeaway
— Traditionally hard tasks, such as argument-adjunct distinctions and PP attachment decisions, are still challenging!
— Use external information to improve PP attachment.
Question (2): What are deeper models good at?

Analysis
— Long-range dependencies: model performance on arguments that are far away from the predicate.
— Structural consistency: the number of inconsistent BIO tag pairs in greedy predictions.
Long-range Dependencies

Figure: average F1 by the argument's distance to the predicate (in words), for L8, L6, L4, L2, Punyakanok and Pradhan. Number of gold arguments per distance bucket: 4,469 (0), 2,151 (1-2), 1,174 (3-7), 657 (7-max).

Deep models' performance deteriorates more slowly on long-distance predictions.
Structural Consistency

BIO violations per token, e.g. (B-ArgX, I-ArgY) or (O, I-ArgY), in greedy predictions:

L2: 0.18   L4: 0.08   L6: 0.06   L8: 0.07   L8+Ensemble: 0.07

Deeper models (with 4+ layers) generate more consistent BIO sequences.
Question (3): Can syntax still help SRL?

Recap
— PropBank SRL is annotated on top of the PTB syntax.
— More than 98% of the gold SRL spans are syntactic constituents.

Analysis
— At decoding time, make predicted argument spans agree with a given syntactic structure.
— See if SRL performance increases.
Figure: F1 vs. percentage of predicted arguments that are constituents in the gold syntax tree (91-99%), for BiLSTM-based models (L2, L4, L6, L8, L8+PoE) and syntax-aware models (Punyakanok, Pradhan, Gold). Can we improve SRL accuracy by adding syntactic information?
Constrained Decoding with Syntax

[The cats] love [hats and the dogs] love bananas.  (predicted: ARG0 = The cats, ARG1 = hats and the dogs)

[The cats] ∈ syntax tree, but [hats and the dogs] ∉ syntax tree, so the second span should be penalized.

Penalize the sequence score for each argument span that is not a constituent:

score = Σ_{t=1}^{T} log p(tag_t | sentence) − C × Σ_{span} 1(span ∉ syntax tree)

where C is the penalty strength and the second sum counts arguments that disagree with the syntax.

• These constraints are not locally decomposable.
• Use A* search (Lewis and Steedman, 2014) to find the sequence with the highest score.
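A sketch of this scoring function (a hypothetical helper, not the released code); in practice the best sequence under this score is found with A* search rather than by scoring candidate sequences one at a time:

```python
def syntax_penalized_score(log_probs, tag_seq, arg_spans, constituents, C=1.0):
    """Score a BIO tag sequence as
        sum_t log p(tag_t | sentence) - C * #{argument spans not in the tree}.

    log_probs: per-token dicts mapping tag -> log-probability.
    arg_spans: (start, end) argument spans decoded from tag_seq.
    constituents: set of (start, end) spans from the syntactic parse.
    C: penalty strength (an illustrative default).
    """
    emission = sum(log_probs[t][tag] for t, tag in enumerate(tag_seq))
    violations = sum(1 for span in arg_spans if span not in constituents)
    return emission - C * violations
```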
Parsers used for constrained decoding:
— Charniak: A Maximum-Entropy-Inspired Parser, Charniak, 2000
— Choe: Parsing as Language Modeling, Choe and Charniak, 2016 (state of the art)
Figure: constrained decoding with automatic syntax (+AutoSyn) and gold syntax (+GoldSyn) moves the L8+PoE model toward the syntax-aware models on the F1 vs. syntactic-agreement plot.

Takeaway
— Modest gain observed with predicted syntax.
— Joint training could bring more improvement.
Contributions (Neural SRL)

• New state-of-the-art deep network for end-to-end SRL.
• Code and models will be publicly available at: https://github.com/luheng/deep_srl
• In-depth error analysis indicating where the models work well and where they still struggle.
• Syntax-based experiments pointing towards directions for future improvement.
Long-term Plan for Improving SRL

Step 1: Collect more data for SRL ✓
— Question-Answer Driven Semantic Role Labeling (QA-SRL)
— Human-in-the-Loop Parsing

Step 2: Build an accurate SRL model ✓
— Neural Semantic Role Labeling (for PropBank SRL)

Step 3: SRL system for many domains
— Future work …
Thanks!

Code will be available at: https://github.com/luheng/deep_srl