ACE Annotation
Ralph Grishman
New York University
4/14/2005
ACE
(Automatic Content Extraction)
• Government evaluation task for information extraction
• 6 evaluations since 2000
  – next one Nov. 2005
  – incremental increases in task complexity
• (Current) criteria for what to annotate:
  – interest to Government sponsors
  – good inter-annotator agreement
  – reasonable density of annotations
• initially for news, now for wider range of genres
  (trade-off between coverage and agreement)
Types of Annotations
• Entities
• Relations
• Events
• Inter-annotator agreement measured by ‘value’ metric
  – roughly 1.00 - % missing - % spurious
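As a rough illustration of this metric, here is a minimal Python sketch that scores a set of system annotations against a gold set. The function and counting scheme are simplifications for exposition, not the official ACE scorer (which also weights annotation types and handles partial matches).

    def value_score(gold, system):
        # Fraction of gold annotations the system missed, and fraction
        # of spurious system annotations, both relative to the gold count.
        gold, system = set(gold), set(system)
        missing = len(gold - system) / len(gold)
        spurious = len(system - gold) / len(gold)
        return 1.0 - missing - spurious

    # 10 gold mentions; the system finds 8 of them plus 1 spurious one:
    gold = {"m%d" % i for i in range(10)}
    out = {"m%d" % i for i in range(8)} | {"extra"}
    print(value_score(gold, out))  # 1.0 - 0.2 - 0.1 = 0.7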
Entities
• Objects of the discourse
• (Semantic) Types:
  – persons, organizations, geo-political entities, [non-political] locations, facilities, vehicles, weapons
• Two levels of annotation (see the sketch below):
  – mentions (individual names, nominals, pronouns)
  – entities (sets of coreferring mentions)
• Inter-annotator agreement around 0.90
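To make the two levels concrete, here is a minimal Python sketch of the mention/entity distinction. The class and field names are illustrative, not the ACE file format (actual ACE data is XML distributed by the LDC), though the type codes in the comments follow ACE naming conventions.

    from dataclasses import dataclass, field

    @dataclass
    class Mention:
        # One textual reference to an object: a name, nominal, or pronoun
        text: str          # e.g. "Bill Gates", "the CEO", "he"
        mention_type: str  # "NAM" (name), "NOM" (nominal), "PRO" (pronoun)
        start: int         # character offsets linking back to the source text
        end: int

    @dataclass
    class Entity:
        # A set of coreferring mentions of one discourse object
        entity_type: str   # "PER", "ORG", "GPE", "LOC", "FAC", "VEH", "WEA"
        mentions: list = field(default_factory=list)

    # "Bill Gates ... the CEO ... he" all refer to one PER entity
    # (offsets here are made up for illustration):
    gates = Entity("PER", [Mention("Bill Gates", "NAM", 0, 10),
                           Mention("the CEO", "NOM", 42, 49),
                           Mention("he", "PRO", 80, 82)])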
Relations
• Binary, generally static relationships between entities
• Main types:
  – physical (location), part-whole, personal-social, org-affiliation, gen-affiliation, and agent-artifact
• Example: “the CEO of Microsoft” (an org-affiliation relation; see the sketch below)
• Inter-annotator agreement (given entities) around 0.75 - 0.80
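Continuing the illustrative Mention/Entity classes from the entities sketch, a relation annotation for this example might look like the following. The type label "ORG-AFF" follows ACE naming style but is shown here only as an example.

    @dataclass
    class Relation:
        # A binary relation between two previously annotated entities
        relation_type: str  # e.g. "ORG-AFF" (org-affiliation)
        arg1: Entity        # e.g. the person
        arg2: Entity        # e.g. the organization

    # "the CEO of Microsoft"
    ceo = Entity("PER", [Mention("the CEO", "NOM", 0, 7)])
    msft = Entity("ORG", [Mention("Microsoft", "NAM", 11, 20)])
    rel = Relation("ORG-AFF", ceo, msft)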
Events
• New for 2005
• Types:
  – life (born / marry / die), movement, transaction, business (start / end), personnel (hire / fire), conflict (attack), contact (meet), justice
• Example: “China purchased two subs from Russia in 1998.” (sketched in code after this list)
  – a transfer-ownership event: trigger = “purchased”; buyer = China; artifact = two subs; seller = Russia; time = 1998
• Inter-annotator agreement (given entities) around 0.55-0.60
  – some events (born, hire/fire, justice) fairly clear-cut
  – others (attack, meet, move) hard to delimit
  – coreference sometimes hard
• No causal / subevent linkage -- too hard (maybe in 2006?)
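As referenced in the example above, the transfer-ownership event could be sketched as follows, again extending the classes from the earlier snippets. The dotted type name and the role labels follow ACE 2005 naming style but should be treated as illustrative, not as the official annotation format.

    @dataclass
    class Event:
        event_type: str      # e.g. "Transaction.Transfer-Ownership"
        trigger: str         # the word anchoring the event
        trigger_span: tuple  # character offsets of the trigger
        arguments: dict      # role name -> Entity filling the role
                             # (full ACE also allows time/value arguments)

    # "China purchased two subs from Russia in 1998."
    purchase = Event(
        event_type="Transaction.Transfer-Ownership",
        trigger="purchased", trigger_span=(6, 15),
        arguments={
            "Buyer":    Entity("GPE", [Mention("China", "NAM", 0, 5)]),
            "Artifact": Entity("VEH", [Mention("two subs", "NOM", 16, 24)]),
            "Seller":   Entity("GPE", [Mention("Russia", "NAM", 30, 36)]),
            # "1998" would be a time-expression argument in full ACE
        },
    )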
Corpora
• Genres
  – newswire and broadcast news
  – adding weblogs, conversational telephone, talk shows, usenet this year
• Multi-lingual
  – English, Chinese, Arabic (since 2003)
• Volume
  – 2004 set: 140 KW (thousand words) training, 50 KW test per language
• Distributed by LDC
A (Nearly) Semantic Annotation
• Annotation criteria primarily truth-conditional, not linguistic
  – although annotations are linked back to text
    • e.g., event triggers
  – and some constraints are included to improve inter-annotator agreement
    • e.g., event arguments must be in same sentence as trigger
• Event arguments are filled in using the ‘true beyond a reasonable doubt’ rule
  – Example: “An attack in the Middle East killed two Israelis.”
  – Both the attack and die events are tagged as occurring in the Middle East (see the sketch below)
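Using the same illustrative Event class from the events sketch, the rule means both events receive the same Place argument even though the text states a location only for the attack. The role names here ("Place", "Target", "Victim") follow ACE style but are an assumption for this sketch.

    # "An attack in the Middle East killed two Israelis."
    mideast = Entity("LOC", [Mention("the Middle East", "NAM", 13, 28)])
    victims = Entity("PER", [Mention("two Israelis", "NOM", 36, 48)])

    attack = Event("Conflict.Attack", "attack", (3, 9),
                   {"Place": mideast, "Target": victims})
    # The text gives no explicit place for the dying, but it is true
    # beyond a reasonable doubt that it happened in the Middle East:
    die = Event("Life.Die", "killed", (29, 35),
                {"Place": mideast, "Victim": victims})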