Two-Phase Semantic Role Labeling based on Support Vector Machines
Kyung-Mi Park, Young-Sook Hwang, Hae-Chang Rim
NLP Lab., Korea Univ.
Contents
Introduction
Two-phase semantic role labeling based on SVMs
• Semantic argument boundary identification phase
• Semantic role classification phase
Experiments
Conclusion

Introduction(1)
Advantages of using SVMs
• high generalization performance in high-dimensional feature spaces
• polynomial kernel functions make it possible to learn from combinations of multiple features
The semantic role labeling (SRL) task is a multiclass classification task
• since an SVM is a binary classifier, SVMs have to be extended to multiclass classification
• multiclass classification tasks often suffer from an unbalanced class distribution

Introduction(2)
To apply SVMs to the SRL task
• we have to find a way to resolve the unbalanced class distribution problem
We propose a two-phase SRL method
• boundary identification phase + role classification phase
The two-phase method alleviates the unbalanced class distribution problem
• in the identification phase, only three SVM classifiers are required, to identify B-ARG, I-ARG, and O
• the number of negative examples decreases
• in the classification phase, non-argument constituents can be ignored
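
As a rough illustration of why the identification phase needs only three classes, the sketch below collapses role-specific BIO tags into B-ARG, I-ARG, and O. The role-specific tag list is a made-up example and the helper `collapse` is hypothetical; the slides do not show a single-phase encoding.

```python
from collections import Counter

def collapse(tag):
    """Map a role-specific BIO tag such as 'B-A0' to 'B-ARG', 'I-ARG', or 'O'."""
    if tag == "O":
        return "O"
    prefix = tag.split("-", 1)[0]  # 'B' or 'I'
    return f"{prefix}-ARG"

# Hypothetical role-specific tags for a handful of constituents.
single_phase_tags = ["B-AM-LOC", "I-AM-LOC", "O", "O", "B-A0",
                     "B-A1", "I-A1", "O", "B-A2"]
phase1_tags = [collapse(t) for t in single_phase_tags]

single_phase_classes = Counter(single_phase_tags)
phase1_classes = Counter(phase1_tags)
print(len(single_phase_classes), "classes collapse to", len(phase1_classes))
```

With one-vs-rest training, each class needs its own binary classifier, so fewer classes also means fewer classifiers for which most examples are negative.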

Two-phase Semantic Role Labeling(1)
First phase: semantic argument identification phase
• identify the boundaries of semantic arguments
• first, segment a sentence into syntactic constituents (C), using the chunk or the subclause as the unit
• second, classify each syntactic constituent as B-ARG, I-ARG, or O
Second phase: semantic role classification phase
• assign the appropriate semantic role to each identified semantic argument

Two-phase Semantic Role Labeling(2)
Input (one token per row)
Word       POS  Chunk  Clause  NE      Predicate  Args (exist)  Args (deliver)
Under      IN   B-PP   (S*     O       -          *             (AM-LOC*
the        DT   B-NP   *       O       -          *             *
existing   VBG  I-NP   *       O       exist      (V*V)         *
contract   NN   I-NP   *       O       -          (A1*A1)       *AM-LOC)
,          ,    O      *       O       -          *             *
Rockwell   NNP  B-NP   (S*     B-ORG   -          *             *
said       VBD  B-VP   *S)     O       -          *             *
,          ,    O      *       O       -          *             *
it         PRP  B-NP   *       O       -          *             (A0*A0)
has        VBZ  B-VP   *       O       -          *             *
already    RB   I-VP   *       O       -          *             (AM-TMP*AM-TMP)
delivered  VBN  I-VP   *       O       deliver    *             (V*V)
793        CD   B-NP   *       O       -          *             (A1*
of         IN   B-PP   *       O       -          *             *
the        DT   B-NP   *       O       -          *             *
shipsets   NNS  I-NP   *       O       -          *             *A1)
to         TO   B-PP   *       O       -          *             *
Boeing     NNP  B-NP   *       B-MISC  -          *             (A2*A2)
.          .    O      *S)     O       -          *             *
Output (for the predicate "deliver")
Sentence:        Under the existing contract , Rockwell said , it has already delivered 793 of the shipsets to Boeing
Constituents:    C C C C C C C P C C C C C
Identification:  B-ARG I-ARG I-ARG O O O B-ARG P B-ARG I-ARG I-ARG O B-ARG
Arguments:       ARG ARG P ARG ARG
Roles:           AM-LOC A0 P A1 A2
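
The step from the identification labels to argument spans can be sketched as follows. This is a minimal illustration, assuming the label sequence is given per constituent and that "P" marks the predicate constituent; the function name `bio_to_spans` is hypothetical. Applied to the sequence in the example, it yields four spans, matching the four ARG entries and the four roles.

```python
def bio_to_spans(labels):
    """Group B-ARG/I-ARG/O labels into (start, end) index pairs, end inclusive."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == "B-ARG":
            if start is not None:      # close a span that was still open
                spans.append((start, i - 1))
            start = i
        elif label == "I-ARG" and start is not None:
            continue                   # extend the open span
        else:                          # 'O', 'P', or an I-ARG with no open span
            if start is not None:
                spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans

labels = "B-ARG I-ARG I-ARG O O O B-ARG P B-ARG I-ARG I-ARG O B-ARG".split()
spans = bio_to_spans(labels)

# Second phase: one role per identified span (roles copied from the example).
roles = ["AM-LOC", "A0", "A1", "A2"]
labeled = dict(zip(spans, roles))
print(labeled)
```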

Semantic Argument Boundary Identification(1)
Restrict the search space in terms of constituents
• the left search boundary is set to the left boundary of the second upper clause
• the right search boundary is set to the right boundary of the immediate clause
Use features that identify the syntactic constituents that depend on the predicate
• semantic arguments are dependent on the predicate
• features for finding dependency relations are represented implicitly

Semantic Argument Boundary Identification(2)
29 features represent the syntactic and semantic information related to the dependency relationships between the syntactic constituents and the predicate
Features                                             Values
predicate-constituent (intervening features)
  position                                           -2, -1, 1
  distance                                           0, 1, 2…
  # of VP, NP, SBAR                                  0, 1, 2…
  # of POS [CC] [,] [:]                              0, 1, 2…
  POS[“] & POS[”]                                    -1, 0, 1
  path                                               VP-PP-NP, …
predicate itself & context
  headword, headword’s POS, chunk type
  beginning word’s POS                               MD, TO, VBZ, …
  context-1: headword, headword’s POS, chunk type
constituent itself & context
  headword, headword’s POS, chunk type
  context-2: headword, headword’s POS, chunk type
  context-1: headword, headword’s POS, chunk type
  context+1: headword, headword’s POS, chunk type
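
A few of the intervening features can be sketched in code. The slides do not define them precisely, so everything here is an assumption made for illustration: distance is taken as the number of constituents strictly between the predicate and the candidate, the VP/NP/SBAR counts are taken over those same constituents, and the path is the sequence of chunk types from the predicate to the candidate. The function `intervening_features` and the chunk-type list are hypothetical.

```python
from collections import Counter

def intervening_features(chunk_types, pred_idx, const_idx):
    """Compute assumed distance, chunk counts, and path between two constituents."""
    lo, hi = sorted((pred_idx, const_idx))
    between = chunk_types[lo + 1:hi]           # constituents strictly in between
    counts = Counter(between)
    step = 1 if const_idx >= pred_idx else -1  # walk from predicate to candidate
    path = "-".join(chunk_types[i]
                    for i in range(pred_idx, const_idx + step, step))
    return {
        "distance": len(between),
        "n_VP": counts["VP"],
        "n_NP": counts["NP"],
        "n_SBAR": counts["SBAR"],
        "path": path,
    }

# Hypothetical chunk types for 13 constituents; index 7 holds the predicate.
chunk_types = ["PP", "NP", "O", "NP", "VP", "O", "NP",
               "VP", "NP", "PP", "NP", "PP", "NP"]
right = intervening_features(chunk_types, pred_idx=7, const_idx=10)
left = intervening_features(chunk_types, pred_idx=7, const_idx=6)
print(right)
print(left)
```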

Semantic Role Classification(1)
We consider only 18 semantic roles, selected by frequency in the training data
• AM-MOD and AM-NEG are post-processed by hand-crafted rules
• we do not consider the 19 semantic roles that appear fewer than 36 times in the training data
  - A5, AM-PRD, AM-REC, AA
  - R-A3, R-AA, R-AM-TMP, R-AM-LOC, R-AM-MNR, R-AM-ADV, R-AM-PNC
  - C-A0, C-A2, C-A3, C-AM-MNR, C-AM-ADV, C-AM-EXT, C-AM-DIS, C-AM-CAU
18 semantic roles
  A0, A1, A2, A3, A4, R-A0, R-A1, R-A2, C-A1
  AM-TMP, AM-ADV, AM-MNR, AM-LOC, AM-DIS
  AM-PNC, AM-CAU, AM-DIR, AM-EXT
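
The frequency-based selection can be sketched as a simple count-and-threshold step. The threshold of 36 and the rule-based handling of AM-MOD and AM-NEG come from the slide; the function `select_roles` and the toy counts are hypothetical and do not reflect the real training data.

```python
from collections import Counter

MIN_COUNT = 36                       # roles seen fewer than 36 times are dropped
RULE_BASED = {"AM-MOD", "AM-NEG"}    # handled by hand-crafted rules instead

def select_roles(training_roles, min_count=MIN_COUNT):
    """Return the roles frequent enough to get their own classifier."""
    counts = Counter(training_roles)
    return sorted(role for role, n in counts.items()
                  if n >= min_count and role not in RULE_BASED)

# Toy label list, not the real training distribution.
toy_roles = (["A0"] * 40 + ["A1"] * 36 + ["A5"] * 2
             + ["AM-MOD"] * 50 + ["R-A3"] * 35)
selected = select_roles(toy_roles)
print(selected)
```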

Semantic Role Classification(2)
This phase uses all the features applied in the identification phase
• except for # of POS[:] and POS[“] & POS[”]
In addition, we use a voice feature
• a binary feature indicating whether the target phrase is active or passive
Named-entity information is not used
• performance decreases when NE information is included

Experiments(1)
We used the SVM light package (http://svm-light.joachims.org/)
In both phases, we used a polynomial kernel (degree 2) with the one-vs-rest classification method
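
The one-vs-rest decision rule with a degree-2 polynomial kernel can be sketched in plain Python. This is not the SVM light interface: the support vectors, coefficients, and biases below are invented toy values, and `poly_kernel`, `decision_value`, and `predict` are hypothetical helpers. Each class has one binary scorer, and the class with the highest decision value is chosen.

```python
def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0)^degree; coef0=1 is an assumption."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree

def decision_value(support_vectors, coeffs, bias, x):
    """Kernel expansion: sum_i coeff_i * K(sv_i, x) + bias."""
    return sum(c * poly_kernel(sv, x)
               for sv, c in zip(support_vectors, coeffs)) + bias

def predict(models, x):
    """One-vs-rest: pick the label whose binary scorer gives the largest value."""
    return max(models, key=lambda label: decision_value(*models[label], x))

# Toy binary models: label -> (support vectors, coefficients, bias).
models = {
    "B-ARG": ([(1.0, 0.0)], [1.0], -2.0),
    "I-ARG": ([(0.0, 1.0)], [1.0], -2.0),
    "O":     ([(0.0, 0.0)], [1.0], 0.0),
}
predictions = [predict(models, x) for x in [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]]
print(predictions)
```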
Results on the development set (closed challenge)
                Precision  Recall   F-measure  Accuracy
Overall         67.27%     64.36%   65.78%     -
Identification  75.96%     72.30%   74.08%     -
Classification  -          -        -          85.45%
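
The F-measure column is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The slides do not state the formula, so treating it as the balanced F1 is an assumption; the check below reproduces the reported development-set values from the precision and recall columns.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

overall_f = round(f_measure(67.27, 64.36), 2)
identification_f = round(f_measure(75.96, 72.30), 2)
print(overall_f, identification_f)
```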

Experiments(2)
Results on the test set (closed challenge)
           prec.    rec.    F1
Overall    65.63    62.43   63.99
A0         78.24    74.60   76.38
A1         65.83    66.46   66.14
A2         49.84    43.70   46.57
A3         56.04    34.00   42.32
A4         62.86    44.00   51.76
A5          0.00     0.00    0.00
AM-ADV     45.18    44.30   44.74
AM-CAU     36.67    22.45   27.85
AM-DIR     20.00    20.00   20.00
AM-DIS     56.62    58.22   57.41
AM-EXT     61.54    57.14   59.26
AM-LOC     26.01    31.14   28.34
AM-MNR     43.54    35.69   39.22
AM-MOD     97.46    91.10   94.17
AM-NEG     94.92    88.19   91.43
AM-PNC     40.00    28.24   33.10
AM-PRD      0.00     0.00    0.00
AM-TMP     51.83    45.38   48.39
R-A0       80.49    83.02   81.73
R-A1       75.00    51.43   61.02
R-A2      100.00    33.33   50.00
R-A3        0.00     0.00    0.00
R-AM-LOC    0.00     0.00    0.00
R-AM-MNR    0.00     0.00    0.00
R-AM-PNC    0.00     0.00    0.00
R-AM-TMP    0.00     0.00    0.00
V          96.66    96.66   96.66

Conclusion
We proposed a two-phase semantic role labeling method based on support vector machines
By applying the two-phase method
• we can alleviate the unbalanced class distribution problem caused by the negative examples
Our system obtains an F-measure of 63.99% on the test set and 65.78% on the development set