PRIS at Slot Filling in KBP 2012: An Enhanced Adaboost Pattern-Matching
System
Yan Li
Beijing University of Posts and Telecommunications
Outline
  Introduction
  Preprocessing
  Entity Expansion
  Pattern Bootstrapping
  Post-processing
  Evaluation Results
  Conclusion
Introduction: the framework
Preprocessing
NLP (the Stanford CoreNLP toolkit)
  POS tagger
  NER
  Date and time expression recognition
  Dependency parser
  Coreference resolution
Preprocessing (cont’d)
Example: Takeshi Watanabe, the first president of the ADB, died in his native Japan.
The categorization of slots

PER slots
Domain   Slots
PER      alternate_names; spouses; children; parents; siblings; other_family
ORG      member_of; employee_of
LOC      country/state/city_of_birth/death/residence
DATE     date_of_birth/death
NUM      age
ORI      origin
REL      religion
SCHOOL   schools_attended
CAUSE    cause_of_death
TITLE    titles
CHARGE   charges

ORG slots
Domain   Slots
PER      alternate_names; members; shareholders; founded_by; top_members/employees
ORG      parents; members; member_of; shareholders; subsidiaries
LOC      member_of; country/state/city_of_headquarters
DATE     founded; dissolved
NUM      number_of_employees/members
URL      website
REL      political/religious_affiliation
Entity Expansion
Coreferences and alternate names of an entity occur in the relevant documents; expanding the entity with them improves recall.
Scheme 1 (PER & ORG): coreference resolution
  The coreference chains produced by the Stanford CoreNLP.
  Example:
Entity Expansion (cont’d)
Scheme 2 (PER & ORG): identifying alternate names
  Rule-based information extraction
  Interpretative entities in parentheses
  Example: Starr International Co., known as SICO, ……
Scheme 3 (ORG)
  Removing the corporate suffixes in queries
  Finding the acronyms or full expressions
  Example: Norwegian University of Science and Technology (NTNU)
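Schemes 2 and 3 can be illustrated with a few string rules. The following is a minimal sketch, not the system's actual rule set: the suffix list, the regular expressions and all function names are assumptions made for illustration, since the slides do not list the rules that were used.

```python
import re

# Hypothetical suffix list; the slides do not enumerate the suffixes removed.
CORPORATE_SUFFIXES = ("Co.", "Corp.", "Inc.", "Ltd.", "LLC")


def strip_corporate_suffix(name):
    """Drop one trailing corporate suffix, e.g. 'Starr International Co.'."""
    for suffix in CORPORATE_SUFFIXES:
        if name.endswith(" " + suffix):
            return name[: -len(suffix) - 1].rstrip(" ,")
    return name


def parenthesized_alias(name, text):
    """Return the text in parentheses directly after `name`, if any."""
    match = re.search(re.escape(name) + r"\s*\(([^)]+)\)", text)
    return match.group(1).strip() if match else None


def known_as_alias(name, text):
    """Return the single-token alias in '<name>, known as <alias>', if any."""
    pattern = re.escape(name) + r",?\s+(?:also\s+)?known as\s+([A-Z][\w.&-]*)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


def expand_entity(name, text):
    """Collect the query name plus every alternate name the rules find."""
    names = {name, strip_corporate_suffix(name)}
    for alias in (parenthesized_alias(name, text), known_as_alias(name, text)):
        if alias:
            names.add(alias)
    return names


sico = expand_entity("Starr International Co.",
                     "Starr International Co., known as SICO, ...")
ntnu = expand_entity("Norwegian University of Science and Technology",
                     "Norwegian University of Science and Technology (NTNU)")
```

Each expanded name can then be used as an additional query string when retrieving relevant documents, which is where the recall gain would come from.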
Pattern Bootstrapping: Workflow
Ralph Grishman and Bonan Min, “New York University KBP 2010 Slot-Filling System”, 2010.
Pattern Bootstrapping: Seed Pairs
The KBP English Monolingual Slot Filling Evaluation Data from the past three years
  92 PER entities
  106 ORG entities
  1,627 entity-value pairs
Pattern Bootstrapping: Pattern Generation
Word sequence pattern
  The middle context between an entity-value pair
  Example: PER:countries_of_residence <PER> native <LOC>
Dependency path pattern
  The shortest dependency path that connects an entity-value pair
  Examples:
    PER:title <PER> appos <TITLE>
    PER:member_of <PER> appos president prep_of <ORG>
    PER:country_of_death <PER> nsubj-1 died prep_in <LOC>
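The two pattern types can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the system's code: the dependency edges for the Takeshi Watanabe sentence are written by hand in collapsed Stanford style, tokens are identified by their word form (which only works when words are unique in the sentence), and the word sequence example assumes that "his" has already been resolved to the query entity.

```python
from collections import deque


def word_sequence_pattern(tokens, entity_idx, value_idx, entity_type, value_type):
    """Middle context between the entity token and the value token."""
    middle = tokens[entity_idx + 1 : value_idx]
    return " ".join([f"<{entity_type}>", *middle, f"<{value_type}>"])


def dependency_path_pattern(edges, entity, value, entity_type, value_type):
    """Shortest dependency path between two tokens, found by BFS.

    `edges` holds (governor, relation, dependent) triples. Walking an edge
    from dependent to governor is marked with the '-1' suffix.
    """
    graph = {}
    for gov, rel, dep in edges:
        graph.setdefault(gov, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel + "-1", gov))

    queue = deque([(entity, [])])
    seen = {entity}
    while queue:
        node, path = queue.popleft()
        if node == value:
            parts = [f"<{entity_type}>"]
            for rel, target in path:
                parts.append(rel)
                parts.append(f"<{value_type}>" if target == value else target)
            return " ".join(parts)
        for rel, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(rel, target)]))
    return None  # the two tokens are not connected


# Hand-written edges for: "Takeshi Watanabe, the first president of the ADB,
# died in his native Japan." (assumed parse, reduced to the relevant edges)
edges = [
    ("died", "nsubj", "Watanabe"),
    ("Watanabe", "appos", "president"),
    ("president", "prep_of", "ADB"),
    ("died", "prep_in", "Japan"),
]
death = dependency_path_pattern(edges, "Watanabe", "Japan", "PER", "LOC")
member = dependency_path_pattern(edges, "Watanabe", "ADB", "PER", "ORG")
title = dependency_path_pattern(edges, "Watanabe", "president", "PER", "TITLE")
native = word_sequence_pattern(["his", "native", "Japan"], 0, 2, "PER", "LOC")
```

Breadth-first search is used because every edge has the same cost, so the first time the value token is reached the path is guaranteed to be a shortest one.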
Pattern Bootstrapping: Pattern Evaluation
Purpose: improving precision
  Pattern frequency
  Trigger phrase
  High-confidence patterns
  New entity-value pairs
  Iteration
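One way to read the frequency and trigger criteria is as a filter applied to the candidate patterns in each iteration. The sketch below is only an assumption about how such a filter could look: the threshold, the trigger lexicon and the function name `select_patterns` are hypothetical, and the slides do not describe the actual confidence measure.

```python
from collections import Counter


def select_patterns(pattern_hits, min_frequency=2, triggers=None):
    """Keep patterns that are frequent and, where required, contain a trigger.

    `pattern_hits` is a list of (slot, pattern) pairs, one per seed pair the
    pattern was extracted from. If `triggers` maps a slot to trigger words,
    a pattern for that slot must contain at least one of them.
    """
    counts = Counter(pattern_hits)
    selected = set()
    for (slot, pattern), frequency in counts.items():
        if frequency < min_frequency:
            continue  # too rare to be trusted
        words = triggers.get(slot) if triggers else None
        if words and not any(word in pattern.split() for word in words):
            continue  # frequent, but lacks a trigger phrase for this slot
        selected.add((slot, pattern))
    return selected


hits = (
    [("per:country_of_death", "<PER> nsubj-1 died prep_in <LOC>")] * 3
    + [("per:country_of_death", "<PER> nsubj-1 visited dobj <LOC>")] * 2
    + [("per:title", "<PER> appos <TITLE>")]
)
kept = select_patterns(hits, min_frequency=2,
                       triggers={"per:country_of_death": ["died"]})
```

In a bootstrapping loop, the selected patterns would be matched against the corpus to harvest new entity-value pairs, which then serve as seeds for the next iteration.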
Post-processing
Purpose: improving precision
DATE
  The SUTime module of the CoreNLP
  TIMEX2 normalization
PER: spouses, children and parents
  Last name complement
  Example: John Doe’s first wife, Ruth
  “Ruth Doe” is better than “Ruth”.
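The last name complement can be sketched in a few lines. This is a simplified illustration with a hypothetical function name; it assumes whitespace-separated names with the family name last, and it blindly assumes the relative shares the query entity's surname, which will not always hold.

```python
def complement_last_name(filler, entity_name):
    """Append the query entity's last name to a single-token filler."""
    filler_tokens = filler.split()
    entity_tokens = entity_name.split()
    if len(filler_tokens) == 1 and len(entity_tokens) > 1:
        return f"{filler} {entity_tokens[-1]}"
    return filler  # already a full name, or no surname available


completed = complement_last_name("Ruth", "John Doe")
unchanged = complement_last_name("Ruth Smith", "John Doe")
```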
Post-processing (cont’d)
Identifying countries, states/provinces and cities for LOC slots
  A Wikipedia list containing all countries and states or provinces.
Adding modifiers into fillers of per:title
  Adjectival modifier: financial minister
  Noun compound modifier: police chief
  Prepositional modifier: chief of military operations
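The modifier step for per:title can be sketched on top of dependency edges. The relation names `amod`, `nn` and `prep_of` follow the collapsed Stanford dependency scheme; the function name, the edge format and the restriction to these three relations are assumptions for illustration, and the edges are expected to be listed in sentence order.

```python
def expand_title(head, edges):
    """Attach amod/nn modifiers before the head and a prep_of phrase after it.

    `edges` holds (governor, relation, dependent) triples in sentence order.
    """
    def premodifiers(word):
        return [dep for gov, rel, dep in edges
                if gov == word and rel in ("amod", "nn")]

    parts = premodifiers(head) + [head]
    for gov, rel, dep in edges:
        if gov == head and rel == "prep_of":
            # Include the object's own modifiers, e.g. "military operations".
            parts += ["of", *premodifiers(dep), dep]
    return " ".join(parts)


adjectival = expand_title("minister", [("minister", "amod", "financial")])
compound = expand_title("chief", [("chief", "nn", "police")])
prepositional = expand_title(
    "chief",
    [("chief", "prep_of", "operations"), ("operations", "amod", "military")],
)
```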
Evaluation Results PRIS
Summary Statistics   LDC         Top-1        Top-2        Median
Precision            0.9278607   0.6757322    0.48955223   0.11392405
Recall               0.7252106   0.41866493   0.21257292   0.0874919
F1                   0.8141142   0.5170068    0.2964302    0.0989736
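The F1 row is consistent with the harmonic mean of the precision and recall rows, F1 = 2PR / (P + R). The short check below recomputes it from the reported figures; the function name `f1_score` is just a label chosen for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the summary statistics.
reported = {
    "LDC": (0.9278607, 0.7252106, 0.8141142),
    "Top-1": (0.6757322, 0.41866493, 0.5170068),
    "Top-2": (0.48955223, 0.21257292, 0.2964302),
    "Median": (0.11392405, 0.0874919, 0.0989736),
}
recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}
```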
PER slots, non-NIL
Slot                                correct  redundant  inexact  wrong  missing
Alternate names                     6        0          0        0      23
Date of birth                       16       4          0        1      1
Date of death                       17       1          0        4      2
Age                                 22       0          0        2      2
Country of birth                    1        0          0        0      1
State or province of birth          8        0          2        3      2
City of birth                       13       1          0        5      2
Country of death                    1        0          0        2      0
State or province of death          13       0          2        1      2
City of death                       17       0          0        4      1
Country of residence                10       2          2        7      3
State or province of residence      22       1          4        5      13
City of residence                   35       1          0        14     8
Origin                              16       2          0        17     0
Cause of death                      18       0          0        1      13
Schools attended                    19       7          0        1      14
Titles                              85       13         8        24     4
Member of                           26       2          4        17     10
Employee of                         7        0          2        5      20
Religion                            4        0          0        1      3
Spouses                             16       5          1        3      10
Children                            73       0          3        10     6
Parents                             21       4          0        1      4
Siblings                            20       0          1        8      3
Other family                        2        0          0        0      7
Charges                             5        0          0        4      2
ORG slots, non-NIL
Slot                                correct  redundant  inexact  wrong  missing
Alternate names                     46       4          5        25     5
Political/religious affiliations    7        1          0        6      3
Top members/employees               59       1          2        20     8
Number of employees/members         3        0          0        0      8
Members                             0        0          0        0      4
Member of                           0        0          0        0      7
Subsidiaries                        7        0          0        3      10
Parents                             4        1          0        4      4
Founded by                          5        0          0        3      5
Founded                             5        0          0        1      3
Dissolved                           1        0          0        0      2
Country of headquarters             3        0          0        1      20
State or province of headquarters   1        1          0        7      11
City of headquarters                2        0          0        3      10
Shareholders                        3        0          1        8      0
Website                             7        0          0        1      8
Conclusion
In the slot filling task of KBP 2012, we designed an enhanced pattern-matching system that consists of preprocessing, entity expansion, pattern bootstrapping and post-processing.
Precision and recall are relatively good for some specific slots.
Improving the remaining slots is the most pressing next step.
Tips
  Adequate preparation
  A harmonious team
  Active and disciplined environment
  Be passionate, patient and hardworking
  ……
Thank you!