Incremental Joint Extraction of Entity Mentions and Relations
Qi Li and Heng Ji
{liq7,jih}@rpi.edu
Rensselaer Polytechnic Institute
End-to-End Relation Extraction
Example: Baltimore is the largest city in the U.S. state of Maryland
• two mentions are tagged as geopolitical entities
• is there a "located" relation between them?
Baseline System
• Typical pipelined approach
Input: The tire maker still employs 1,400
Entity Mention Boundaries + Types:
  ORG: tire maker
  PER: 1,400
Relation Extraction:
  EMP-ORG (arg-1: ORG tire maker, arg-2: PER 1,400)
• Error propagation
Problem Statement
• Exploit global features in the joint search space
• Jointly extract and improve both subtasks
[Figure: "the tire maker still employs 1,400" with candidate labels PER / NIL / ORG and an EMP-ORG relation link]
Problem Formulation
Joint Extraction of Entity Mentions and Relations
• Joint search space is exponentially large
• Global features make inference even harder
• Exact inference is expensive
[Figure: joint search algorithm on "The tire maker still employs 1,400"; a beam inside the search space]
Learning Framework
• For each training set:
• In each training iteration:
Beam search → update weights
(Collins and Roark 2004, Huang et al. 2012)
• Weights update:
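The update formula on this slide did not survive extraction. As a rough sketch only, assuming the standard structured perceptron update used in the cited line of work (add the gold structure's features, subtract the predicted structure's features), the step could look like the following. The feature names and the `lr` parameter are hypothetical.

```python
from collections import Counter

def perceptron_update(weights, gold_features, predicted_features, lr=1.0):
    """Assumed update: w <- w + lr * (f(x, gold) - f(x, predicted)).

    Features are sparse count dictionaries; `weights` is modified in place.
    """
    for name, count in gold_features.items():
        weights[name] = weights.get(name, 0.0) + lr * count
    for name, count in predicted_features.items():
        weights[name] = weights.get(name, 0.0) - lr * count
    return weights

# Hypothetical feature names for the running example sentence.
weights = {}
gold = Counter({"seg=tire maker|ORG": 1, "seg=1,400|PER": 1, "rel=EMP-ORG": 1})
pred = Counter({"seg=tire maker|ORG": 1, "seg=1,400|O": 1})
perceptron_update(weights, gold, pred)
```

Features shared by both structures cancel out, so only the decisions where the beam's best hypothesis disagrees with the gold structure move the weights.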
Search Algorithm
• Joint search framework
  o beam search
    • flexible and efficient
  o segment-based decoding
    • "segment": a subsequence of the input sentence
    • each segment is a hypothesis: an entity mention or NIL

token-based vs. segment-based
  token-based:   The/O tire/B-ORG maker/L-ORG still/O employs/O 1,400/U-PER
  segment-based: The [tire maker]ORG still employs [1,400]PER
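To make the token-based vs. segment-based contrast concrete, here is a minimal sketch that converts the token-level tags from the example into segments. It assumes the tags follow the B/I/L/O/U scheme suggested by the slide; the function name is hypothetical.

```python
def bilou_to_segments(tokens, tags):
    """Group token-level B/I/L/O/U tags into (text, label) segments.

    Tokens tagged "O" become single-token NIL segments.
    """
    segments = []
    start = None  # index where the currently open multi-token mention began
    for i, tag in enumerate(tags):
        if tag == "O":
            segments.append((i, i + 1, "NIL"))
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            segments.append((i, i + 1, label))
        elif prefix == "B":
            start = i
        elif prefix == "L" and start is not None:
            segments.append((start, i + 1, label))
            start = None
        # "I" tags simply continue the open mention
    return [(" ".join(tokens[s:e]), label) for s, e, label in segments]

tokens = "The tire maker still employs 1,400".split()
tags = ["O", "B-ORG", "L-ORG", "O", "O", "U-PER"]
segments = bilou_to_segments(tokens, tags)
```

In the segment view, "tire maker" is a single unit with one label, which is what lets features and relation links refer to the whole mention.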
Joint Search Algorithm
• Token-based decoder doesn’t work
  o unfair to compare mentions with different boundaries
    • Complete mention is biased by the model
  o difficult to synchronize relation links
    • (New/B-FAC York/I-FAC) is not yet a complete mention; no link can be made at this step
[Figure label: Not parsed yet]
Joint Search Algorithm
• Mention-step
  o propose various segments at the current token
  o append to previous assignments
  o get best-k new assignments
[Figure: "The tire maker still employs 1,400" with candidate labels ORG, PER, O, …]
[Figure: candidate labels ORG, PER, O, … for a segment of "The tire maker still employs 1,400"; one PER candidate is crossed out (×)]
Context Features:
• noun phrase
• person gazetteer
• previous word: “the”
Joint Search Algorithm
• Mention-step (cont.)
[Figure: partial assignment "O ORG O O" for "The tire maker still employs", extended at "1,400" with candidate labels PER, O, ORG, …]
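The mention-step described above (propose segments ending at the current token, append them to earlier assignments, keep the best k) can be sketched roughly as follows. This is a simplified illustration, not the authors' decoder: the scoring function, the label set, `max_len`, and `k` are all hypothetical, and the relation-step is left out.

```python
def decode_mentions(tokens, score_segment, labels=("ORG", "PER"), max_len=2, k=2):
    """Segment-based beam search over entity mentions only.

    beams[i] holds the best-k (score, segments) assignments covering tokens[:i].
    """
    beams = [[(0.0, [])]]
    for i in range(1, len(tokens) + 1):
        candidates = []
        # Propose segments of every allowed length that end at token i.
        for length in range(1, min(max_len, i) + 1):
            start = i - length
            # Single tokens may be "O"; longer segments must be mentions.
            seg_labels = labels + ("O",) if length == 1 else labels
            for prev_score, prev_segs in beams[start]:
                for label in seg_labels:
                    score = prev_score + score_segment(tokens, start, i, label)
                    candidates.append((score, prev_segs + [(start, i, label)]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams.append(candidates[:k])  # keep the best-k new assignments
    return beams[-1][0]

def toy_score(tokens, start, end, label):
    """Hypothetical scorer standing in for the learned model."""
    text = " ".join(tokens[start:end])
    if (text, label) in {("tire maker", "ORG"), ("1,400", "PER")}:
        return 2.0
    return 0.5 if label == "O" else -1.0

tokens = "The tire maker still employs 1,400".split()
best_score, best_segments = decode_mentions(tokens, toy_score)
```

Because hypotheses in `beams[i]` all cover exactly the same prefix, mentions with different boundaries are compared on equal footing, which is the point made against token-based decoding.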
Joint Search Algorithm
• Relation-step
  o link each new node to previous ones
  o following type constraints
  o iteratively update the beam
• Prune relations incompatible w/ entity types
  o Physical, Person-Social are ruled out in this example
• Final structure
  o return top-ranked configuration in the beam
[Figure: "The tire maker still employs 1,400" labeled O ORG O O PER, with an EMP-ORG link between the ORG and PER mentions]
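As a sketch of how type constraints prune the relation-step, the snippet below proposes links from the newest mention to earlier ones and keeps only relation types compatible with both entity types. The constraint table is a hypothetical illustration chosen to match the slide's example; it is not the ACE specification.

```python
# Hypothetical constraint table (illustration only, NOT the ACE definition):
# relation type -> allowed (arg-1 type, arg-2 type) pairs.
TYPE_CONSTRAINTS = {
    "EMP-ORG": {("ORG", "PER"), ("PER", "ORG")},
    "Physical": {("PER", "GPE"), ("PER", "FAC"), ("PER", "LOC")},
    "Person-Social": {("PER", "PER")},
}

def compatible_relations(arg1_type, arg2_type, constraints=TYPE_CONSTRAINTS):
    """Relation types whose constraints allow this pair of entity types."""
    return sorted(rel for rel, pairs in constraints.items()
                  if (arg1_type, arg2_type) in pairs)

def relation_step(new_mention, previous_mentions):
    """Propose typed links between the newest mention and each earlier one.

    Mentions are (text, entity_type) pairs; "O" segments take no links.
    """
    proposals = []
    for prev in previous_mentions:
        if prev[1] == "O" or new_mention[1] == "O":
            continue
        for rel in compatible_relations(prev[1], new_mention[1]):
            proposals.append((rel, prev[0], new_mention[0]))
    return proposals

previous = [("The", "O"), ("tire maker", "ORG"), ("still", "O"), ("employs", "O")]
proposals = relation_step(("1,400", "PER"), previous)
```

With this table, the ORG and PER pair admits only EMP-ORG, so Physical and Person-Social never enter the beam for this example.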
[Figure: competing beam hypotheses for "The tire maker still employs 1,400", each with a different labeling of the segments (ORG, PER, O) and, in some hypotheses, an EMP-ORG or Affiliation link]
Search Algorithm
[Figure: final beam for "The tire maker still employs 1,400"; the hypothesis shown is labeled O ORG O O PER with an EMP-ORG link; values shown next to the hypotheses: 532, 501, 397, 302, 205, 103]
Features
• Segment-based features
  o Based on the entire mention instead of individual tokens
  o Gazetteer features
    • “New York City” is a city
    • “New York” is a state or city
  o Word case features
    • case information about all tokens contained
    • all-capitalized “Lusaka”
    • all-lowercase “magistrate”
    • mixture “Lusaka magistrate” (a bad mention)
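The segment-based features above can be sketched as simple functions over the whole segment. The gazetteer contents and function names below are hypothetical stand-ins, and "capitalized" is assumed to mean that a token starts with an uppercase letter.

```python
# Hypothetical toy gazetteer: lowercased segment text -> possible categories.
GAZETTEER = {
    "new york city": {"city"},
    "new york": {"state", "city"},
}

def gazetteer_feature(segment_tokens, gazetteer=GAZETTEER):
    """Look up the entire segment, not its individual tokens."""
    return sorted(gazetteer.get(" ".join(segment_tokens).lower(), set()))

def case_pattern(segment_tokens):
    """Summarize the case of all tokens contained in the segment."""
    shapes = set()
    for tok in segment_tokens:
        if tok[:1].isupper():
            shapes.add("cap")
        elif tok.islower():
            shapes.add("lower")
        else:
            shapes.add("other")
    if shapes == {"cap"}:
        return "all-capitalized"
    if shapes == {"lower"}:
        return "all-lowercase"
    return "mixture"
```

For instance, `case_pattern(["Lusaka", "magistrate"])` yields "mixture", the pattern the slide flags as a sign of a bad mention.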
Features
• Segment-based features (cont.)
  o Contextual features
    • neighbor unigrams and bigrams
  o Parsing features
    • phrase label of common ancestor (NP)
    • depth of common ancestor (2)
    • whether the segment matches a base phrase (true) or is a suffix of a base phrase
    • head word of the segment (maker)
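For the contextual features, a minimal sketch of neighbor unigrams and bigrams around a segment might look like this. The padding symbols and feature names are hypothetical; the parsing features are omitted because they need a parser.

```python
def context_features(tokens, start, end):
    """Unigrams and bigrams immediately left and right of tokens[start:end]."""
    # Pad both sides so segments at the sentence edges still get neighbors.
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>", "</s>"]
    s, e = start + 2, end + 2  # shift indices to account for the padding
    return {
        "prev_unigram": padded[s - 1],
        "prev_bigram": padded[s - 2] + " " + padded[s - 1],
        "next_unigram": padded[e],
        "next_bigram": padded[e] + " " + padded[e + 1],
    }

tokens = "The tire maker still employs 1,400".split()
feats = context_features(tokens, 1, 3)  # the segment "tire maker"
```

Here the previous-word feature for "tire maker" is "The", in line with the "previous word" context feature shown in the mention-step example.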
Global Features
• Involve multiple local decisions
  o dynamically created during the search
  o capture long-distance dependencies
  o entity mentions are inter-dependent
  o a relation may indicate or contradict other ones
Global Entity Mention Features
• Co-referential mentions should be assigned the same label
  o thousands of Muslims marched to their main mosque
  o the senior Moscow official, who was ..
[Figure labels: GPE, PER; candidate labels: GPE, PER, O …]
(GPE = Geopolitical Entity)
Global Entity Mention Features
• Neighbor entity mentions should have coherent types
  o Barbara Starr was reporting from the Pentagon (prep_from)
    • “PER–prep_from–PER” will receive negative weights
  o Syria, China and Germany all opposing (conj_and)
    • “GPE–conj_and–GPE” will receive positive weights
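One way to picture the neighbor-coherence feature is as a string built from the two hypothesized entity types and the dependency label between them. The sketch below makes several assumptions: dependencies are given as direct (head, label, child) edges between mentions, and the weights are made-up values that only mirror the signs stated on the slide.

```python
def neighbor_type_features(mention_types, dependencies):
    """Build "TYPE-dep-TYPE" features for dependency-linked mention pairs."""
    feats = []
    for head, dep_label, child in dependencies:
        if head in mention_types and child in mention_types:
            feats.append(f"{mention_types[head]}-{dep_label}-{mention_types[child]}")
    return feats

# Made-up weights; only their signs follow the slide.
ILLUSTRATIVE_WEIGHTS = {"PER-prep_from-PER": -1.0, "GPE-conj_and-GPE": 1.0}

def global_score(features, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum the weights of all global features that fire."""
    return sum(weights.get(f, 0.0) for f in features)

# A hypothesis that mislabels "the Pentagon" as PER fires the negative feature.
bad = neighbor_type_features(
    {"Barbara Starr": "PER", "the Pentagon": "PER"},
    [("Barbara Starr", "prep_from", "the Pentagon")],
)
good = neighbor_type_features(
    {"Syria": "GPE", "China": "GPE", "Germany": "GPE"},
    [("Syria", "conj_and", "China"), ("Syria", "conj_and", "Germany")],
)
```

Since these features depend on two labeling decisions at once, they can only be computed during search, once both mentions are present in a hypothesis.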
Global Entity Mention Features
• If an entity mention is semantically part of another mention, they should be assigned the same entity type
• Examples:
  o some of Iraq’s exiles
  o one of the town’s two meat-packing plants
  o the rest of America
  o …
• Part-whole relation is identified by prep_of dependency
[Figure: "one" and "plants" linked by prep_of; label shown: GPE]
Global Entity Mention Features
• Entity role coherence
  o entity mentions should play coherent roles
  o a person mention is unlikely to have two employers
  o a geo-political mention is likely to be the physical location of two other mentions
Example: US forces in Somalia, Haiti and Kosovo
Global Entity Mention Features
• Penalize triangle structures
  o multiple entity mentions are unlikely to be fully connected with the same relation type
  o triangle structure will be penalized
Example: US forces in Somalia, Haiti and Kosovo
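A triangle feature of this kind could be computed by counting triples of mentions that are pairwise linked with the same relation type. The sketch below is an assumption about how such a count might be done; the relation label "PHYS" and the mention strings are used only for illustration.

```python
from itertools import combinations

def count_same_type_triangles(relations):
    """Count mention triples fully connected by one relation type.

    `relations` is an iterable of (mention_a, mention_b, rel_type);
    links are treated as undirected for this check.
    """
    edges = {}
    for a, b, rel_type in relations:
        edges[frozenset((a, b))] = rel_type
    nodes = sorted({m for pair in edges for m in pair})
    count = 0
    for x, y, z in combinations(nodes, 3):
        types = [edges.get(frozenset(p)) for p in ((x, y), (y, z), (x, z))]
        if types[0] is not None and types[0] == types[1] == types[2]:
            count += 1
    return count

# "US forces" located in each place: a star shape, no triangle.
star = [("US forces", "Somalia", "PHYS"),
        ("US forces", "Haiti", "PHYS"),
        ("US forces", "Kosovo", "PHYS")]
# Adding a spurious link between two places closes a triangle.
with_triangle = star + [("Somalia", "Haiti", "PHYS")]
```

A hypothesis with a non-zero count would fire the penalized feature, steering the search away from over-connected structures.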
Global Entity Mention Features
• Dependency compatibility
  o two dependent mentions should have compatible relations
Example: US forces in Somalia, Haiti and Kosovo
Experiments
• Data
  o ACE’05 corpus: exclude genres cts and un
  o ACE’04 corpus: bnews and nwire subsets
• Evaluation Metric
  o precision/recall and f-measure for entity mention and relation
  o entity mention + relation: consider entity type

Data Set       # sentences   # mentions   # relations
ACE’05 Train   7,273         26,470       4,779
ACE’05 Dev     1,765         6,421        1,179
ACE’05 Test    1,535         5,476        1,147
ACE’04         6,789         22,740       4,368
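The evaluation metric above can be sketched as set-based precision, recall and F-measure. How an item is represented is an assumption here: a mention is (start, end, type), and for the "entity mention + relation" setting a relation tuple carries both argument mentions with their entity types, so a wrong entity type makes the relation count as wrong.

```python
def prf(gold, predicted):
    """Precision, recall and F-measure over two collections of items."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Mentions as (start, end, type); one predicted mention has the wrong type.
gold_mentions = {(1, 3, "ORG"), (5, 6, "PER")}
pred_mentions = {(1, 3, "ORG"), (5, 6, "ORG")}
mention_scores = prf(gold_mentions, pred_mentions)

# Relations include the argument mentions with their types, so the
# mistyped argument makes the predicted relation incorrect.
gold_relations = {("EMP-ORG", (1, 3, "ORG"), (5, 6, "PER"))}
pred_relations = {("EMP-ORG", (1, 3, "ORG"), (5, 6, "ORG"))}
relation_scores = prf(gold_relations, pred_relations)
```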
Experiments
• Performance on development set (beam size = 8)
  o global features improve performance on both tasks
  o number of training iterations set to 22 for the remaining experiments
Experiments
• Overall performance on ACE’05 corpus

[Bar chart, axis range 40 to 90; values as read from the chart labels]
System                Entity Mention   Relation   Entity Mention + Relation
Pipeline              78.1             49.8       48
Joint w/ Local        80               50.6       48.3
Joint w/ Global       80.8             52.1       49.5
Annotator-Agreement   86.5             55         51.9
Experiments
• Overall performance on ACE’04 corpus

[Bar chart, axis range 20 to 90; values as read from the chart labels]
System            Entity Mention   Relation   Entity Mention + Relation
Pipeline          77.6             46         42.9
Joint w/ Local    78.8             46.9       44.1
Joint w/ Global   79.7             48.3       45.3
Chan & Roth (2011): 40.8 (its column cannot be recovered from the extracted text)
Experiments
• Real Example: ranking of hypotheses for "a marcher from Florida" during the search

After "a marcher":
  rank 1:    O O
  rank 2:    O PER
• the correct hypothesis is ranked lower

After "a marcher from":
  rank 1:    O O O
  rank 2→4:  O PER O
• correct one is ranked lower

After "a marcher from Florida":
  rank 4→1:  O PER O GPE
  rank 1→2:  O O O GPE
• global entity feature of (per-prep_from-gpe) pushed the correct assignment to the top

After adding the GEN-AFF relation link:
  rank 1:    O PER O GPE, with GEN-AFF
  rank 2→4:  O O O GPE
• adding relation link makes the margin even larger
Related Work
• ACE Entity Mention and Relation Extraction
  o Florian et al., 2006, Florian et al., 2010, Ohta et al., 2012 etc.
  o Zhou et al., 2007, Jiang & Zhai, 2007, Chan & Roth 2011, etc.
  o Pipelined methods, assumed entity mentions were given
• Joint Inference Methods for IE
  o Re-ranking: Ji & Grishman 2005. Parsing: Kate & Mooney, 2010
  o ILP-inference: Roth & Yih, 2004, Roth & Yih 2007, Yang & Cardie, 2013 etc.
  o Models are separately learned
  o Ours: single model + global features
• Joint Graphical Models
  o Singh et al., 2013, Yu & Lam, 2010 etc.
  o Computationally expensive