A Structured Model for Joint Learning of Argument Roles and Predicate Senses
Yotaro Watanabe, Masayuki Asahara, Yuji Matsumoto
ACL 2010, Uppsala, Sweden, July 12, 2010
Tohoku University; Nara Institute of Science and Technology
Page 2
Predicate-Argument Structure Analysis (Semantic Role Labeling)
Task of analyzing predicates and their arguments
– A predicate represents a state or an event, and its arguments have relations to the predicate
– Each argument has a particular semantic role (Agent, Theme, etc.)
In recent years, predicate sense disambiguation has been included in predicate-argument structure analysis [Surdeanu+ 08, Hajič+ 09]
– ‘sell.01’ means that ‘sold’ is an instance of the first sense of ‘sell’
Important for many NLP applications
– MT, QA, RTE, etc.
[Figure: the sentence "The luxury auto maker last year sold 1,214 cars in the U.S." annotated with the predicate senses maker.01 and sell.01 and the role labels Agent, Product, Temporal, Theme, and Location]
Page 3
Two Types of Dependencies of Elements in Predicate-Argument Structures
drive.01: drive a vehicle (A0: driver, A1: vehicle)
drive.02: cause to move (A0: driver, A1: things in motion)
(1) Inter-dependencies between a predicate and its arguments
– A1: car => we can infer that the correct sense is drive.01
(2) Non-local dependencies among arguments
• Two or more arguments of a predicate do not share the same role
• Basically, the obligatory roles of the predicate should appear in the sentence
[Figure: dependency tree of "Paul drove his car" (arcs SBJ, NMOD, OBJ) labeled with the sense drive.01 and the roles A0 (Paul) and A1 (car)]
In order to realize robust predicate-argument structure analysis, it is necessary to deal with these types of dependencies
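The two non-local dependencies among arguments can be read as checks on a complete role assignment. The sketch below is only an illustration: the FRAMES dictionary is a hypothetical lexicon copied from the two senses above, the function names are invented here, and every role listed for a sense is assumed to be obligatory.

```python
from collections import Counter

# Hypothetical frame lexicon, copied from the two senses on the slide.
# Assumption: every role listed for a sense is obligatory.
FRAMES = {
    "drive.01": {"A0": "driver", "A1": "vehicle"},
    "drive.02": {"A0": "driver", "A1": "things in motion"},
}

def repeated_roles(roles):
    """Return the roles assigned to more than one argument (NONE is ignored)."""
    counts = Counter(r for r in roles.values() if r != "NONE")
    return sorted(r for r, n in counts.items() if n > 1)

def missing_roles(sense, roles):
    """Return the roles listed for the sense that no argument fills."""
    return sorted(set(FRAMES[sense]) - set(roles.values()))

good = {"Paul": "A0", "car": "A1"}
bad = {"Paul": "A0", "car": "A0"}
print(repeated_roles(good), missing_roles("drive.01", good))  # [] []
print(repeated_roles(bad), missing_roles("drive.01", bad))    # ['A0'] ['A1']
```

Neither check can be computed from a single argument in isolation, which is why such dependencies are called non-local.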
Page 4
Previous Work
(1) Non-local dependencies among arguments: Re-ranking [Johansson and Nugues 2008, etc.]
• Generate N-best assignments of argument roles, then obtain global features for each assignment, finally select the argmax using the re-ranker
• Cannot explicitly capture inter-dependencies between a predicate and its arguments
(2) Inter-dependencies between a predicate and its arguments: Markov Logic Networks [Meza-Ruiz and Riedel 2009, etc.]
• Jointly learn and classify pred. senses and arg. roles
• MLN cannot deal with particular types of global features
Currently, no existing (discriminative) approach sufficiently handles both types of dependencies
We propose a structured model that can capture both types of dependencies simultaneously
Page 6
The proposed model
[Figure: dependency tree of "Paul drove his car" (arcs SBJ, NMOD, OBJ), with the candidate senses drive.01 (drive a vehicle; A0: driver, A1: vehicle) and drive.02 (cause to move; A0: driver, A1: things in motion)]
Page 7
The proposed model
[Figure: "Paul drove his car" with the predicate "drove" expanded into the candidate senses drive.01 and drive.02, and the argument candidates "Paul" and "car" each expanded into the candidate roles A0, A1, …, NONE]
Expand the possible labels of predicate senses and argument roles
We use four types of factors which score labels of elements in predicate-argument structures
These factors are defined by a linear model
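The formula shown on the slide did not survive extraction. A linear-model factor is commonly written in the form below; the symbols w (weight vector), Φ_k (feature function of factor k), x (input sentence) and y (the labels the factor looks at) are assumptions introduced here, not notation recovered from the slide.

```latex
F_k(\mathbf{x}, \mathbf{y}) = \mathbf{w} \cdot \Phi_k(\mathbf{x}, \mathbf{y}),
\qquad k \in \{P,\ A,\ PA,\ G\}
```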
Page 10
The proposed model
Use a factor (FP) which scores sense labels of the predicate
[Figure: factor FP attached to the predicate "drove", with the scores 1.4754 and 0.7268 shown for the candidate senses drive.01 and drive.02]
Page 11
The proposed model
Use a factor (FA) which scores role labels of each argument
[Figure: factor FA attached to the argument candidates "Paul" and "car", with the scores 1.784, 0.238, -1.665, -1.235, 0.876, -1.482 shown for the candidate roles A0, A1, …, NONE; FP as before]
Page 12
The proposed model
Add a factor (FPA) which scores label pairs of a predicate sense and a semantic role of an argument
[Figure: factor FPA connecting the candidate senses drive.01 and drive.02 to an argument role, with the scores 0.764 and 0.261 shown; FP and FA as before]
Page 13
The proposed model
Add a factor (FG) which captures the plausibility of the whole predicate-argument structure (using global features)
The predicate ‘drive’ has all of its obligatory roles, A0 and A1 => FG assigns a higher score through the weight that corresponds to this feature
[Figure: factor FG attached to the whole structure (A0, drive.01, A1); the global feature "A0,drive01,A1…" is shown with the score 1.865; FP, FA and FPA as before]
Page 15
The proposed model
The proposed model combines these types of factors
The highest scoring assignment is returned by the proposed model
[Figure: the factors FP, FA, FPA and FG scored jointly over "Paul drove his car"; the highest-scoring assignment is drive.01 with Paul = A0 and car = A1]
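How the four factor types combine can be illustrated with a small exhaustive search. This is a sketch under stated assumptions, not the inference procedure of the proposed model: all scores are hypothetical toy values (not the numbers on the slides), the global factor is reduced to a single hand-written feature, and every assignment is enumerated, which is only feasible for tiny examples.

```python
from itertools import product

SENSES = ["drive.01", "drive.02"]
ROLES = ["A0", "A1", "NONE"]
ARGS = ["Paul", "car"]

# All numbers below are hypothetical toy scores, not the values on the slides.
F_P = {"drive.01": 1.5, "drive.02": 0.7}
F_A = {
    ("Paul", "A0"): 1.8, ("Paul", "A1"): 0.2, ("Paul", "NONE"): -1.7,
    ("car", "A0"): -1.2, ("car", "A1"): 0.9, ("car", "NONE"): -1.5,
}
F_PA = {("drive.01", "car", "A1"): 0.8, ("drive.02", "car", "A1"): 0.3}

def f_g(sense, roles):
    """Toy global factor: reward structures in which A0 and A1 each appear once."""
    return 1.9 if sorted(roles) == ["A0", "A1"] else 0.0

def score(sense, roles):
    """Sum of the four factor types for one complete assignment."""
    total = F_P[sense] + f_g(sense, roles)
    for arg, role in zip(ARGS, roles):
        total += F_A[(arg, role)] + F_PA.get((sense, arg, role), 0.0)
    return total

def best_assignment():
    """Enumerate every (sense, roles) pair and return the highest-scoring one."""
    candidates = product(SENSES, product(ROLES, repeat=len(ARGS)))
    return max(candidates, key=lambda c: score(*c))

sense, roles = best_assignment()
print(sense, dict(zip(ARGS, roles)))  # drive.01 {'Paul': 'A0', 'car': 'A1'}
```

Because FPA and FG both look at the sense and the roles together, the sense and the roles are chosen jointly rather than one after the other.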
Page 17
Dealing with global (non-local) features
We adopt the fundamental idea of [Kazama and Torisawa 2007]
– Features are divided into local features and global features
– Inference: N-best based approach
(1) Generate N-best assignments using only local features
(2) Obtain global features in the N-best assignments
(3) Select the argmax
– Learning: train parameters with two margin constraints
• All: train parameters so as to ensure a sufficient margin using all features (both local features and global features)
• Local only: when the constraint All is satisfied, train parameters so as to ensure a sufficient margin using only local features
• K&T proposed a Margin-Perceptron Learning Algorithm
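Inference steps (1) to (3) can be sketched as follows. The function names, the toy scores and the single global feature are hypothetical, and the N-best list is produced by exhaustive enumeration for brevity, which is an assumption of this sketch rather than the method of [Kazama and Torisawa 2007].

```python
import heapq
from itertools import product

def local_score(local_scores, roles):
    """Sum of per-argument local scores for one role tuple."""
    return sum(scores[role] for scores, role in zip(local_scores, roles))

def infer(local_scores, global_score, n=3):
    """(1) N-best by local score, (2) add the global score, (3) take the argmax."""
    candidates = product(*(scores.keys() for scores in local_scores))
    n_best = heapq.nlargest(
        n, candidates, key=lambda r: local_score(local_scores, r))
    return max(
        n_best, key=lambda r: local_score(local_scores, r) + global_score(r))

# Hypothetical local scores for two arguments.
local_scores = [
    {"A0": 1.0, "A1": 0.2, "NONE": -1.0},
    {"A0": 1.1, "A1": 0.9, "NONE": -1.0},
]

def repeat_penalty(roles):
    """Toy global feature: penalize a role that is used more than once."""
    used = [r for r in roles if r != "NONE"]
    return 0.0 if len(used) == len(set(used)) else -2.0

print(infer(local_scores, lambda roles: 0.0))  # ('A0', 'A0')
print(infer(local_scores, repeat_penalty))     # ('A0', 'A1')
```

With local features alone the duplicated A0 wins; the global feature is only evaluated on the N-best list, so the correct assignment must already be among the N candidates.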
Page 18
Inference and Learning Algorithm of the Proposed Model
Inference: generate N-best assignments for each predicate sense
Learning: the online Passive-Aggressive Algorithm [Crammer 2006]
• The parameters are trained by solving the optimization problem used in PA with the two margin constraints: All (local + global) and Local only
[Figure: the two margin constraints, (1) All (local + global) and (2) Local only; each panel shows a margin between the positive assignment and the other assignments]
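One way to picture learning with the two margin constraints is the sketch below. It is an assumption-laden illustration, not the exact update of the proposed model: it uses a PA-I style step size with a fixed margin of 1, takes the competing assignments as ready-made sparse feature dictionaries, and all names (pa_step, update, is_local, the feature strings) are hypothetical.

```python
def dot(w, phi):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in phi.items())

def diff(phi_a, phi_b):
    """Sparse difference phi_a - phi_b with zero entries dropped."""
    keys = set(phi_a) | set(phi_b)
    d = {f: phi_a.get(f, 0.0) - phi_b.get(f, 0.0) for f in keys}
    return {f: v for f, v in d.items() if v != 0.0}

def pa_step(w, phi_gold, phi_other, margin=1.0, C=1.0):
    """One PA-I style step; returns True if the margin was violated."""
    delta = diff(phi_gold, phi_other)
    loss = max(0.0, margin - dot(w, delta))
    norm = sum(v * v for v in delta.values())
    if loss == 0.0 or norm == 0.0:
        return False
    tau = min(C, loss / norm)  # step size capped by the aggressiveness C
    for f, v in delta.items():
        w[f] = w.get(f, 0.0) + tau * v
    return True

def update(w, gold, other_all, other_local, is_local):
    """Try constraint (1) All first; only if it holds, try (2) Local only."""
    if pa_step(w, gold, other_all):
        return "all"
    keep = lambda phi: {f: v for f, v in phi.items() if is_local(f)}
    if pa_step(w, keep(gold), keep(other_local)):
        return "local"
    return "none"

w = {}
gold = {"loc:Paul=A0": 1.0, "glob:A0+A1": 1.0}   # hypothetical features
other = {"loc:Paul=A1": 1.0}
print(update(w, gold, other, other, lambda f: f.startswith("loc:")))  # all
```

The ordering mirrors the two constraints above: the local-only margin is enforced only once the margin with all features is satisfied.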
Page 19
Results on the CoNLL-2009 ST Dataset (average)
System        Feature selection   Overall (Sem. F1)   WSD (Acc.)   SRL (Lab. F1)
FP+FA         no                  79.17               89.65        72.20
FP+FA+FPA     no                  79.58               89.78        72.74
FP+FA+FG      no                  80.42               89.83        74.11
ALL           no                  80.75               90.15        74.46
Björkelund    yes                 80.80
Zhao          yes                 80.47
Meza-Ruiz     no                  77.46
[Figure: factor graph in which FP scores the sense, FA scores each role (role1, role2, …, roleN), FPA scores sense-role pairs, and FG scores the whole structure]
The best performance is obtained by using all factors
Our model achieved results competitive with the top system in the CoNLL-2009 Shared Task without any feature selection procedure
By adding the two types of factors FPA and FG, we obtained performance improvements in both tasks (predicate sense disambiguation and argument role labeling)
=> Succeeded in joint learning
Page 21
Summary
We proposed a structured model that can capture two types of dependencies
(1) Non-local dependencies among arguments
(2) Inter-dependencies between a predicate and its arguments
The proposed model achieved results competitive with state-of-the-art SRL systems without any feature selection procedure
By adding two types of factors, we obtained performance improvements on both predicate sense disambiguation and argument role labeling
=> succeeded in joint learning
Future Work
– exploiting unlabeled data (unsupervised or semi-supervised predicate-argument structure analysis)