Scaling Answer Type Detection to Large Hierarchies
Kirk Roberts and Andrew Hickl{kirk,andy}@languagecomputer.com
May 29, 2008
Answer Type Hierarchy (ATH)

Introduction

• Work in factoid question-answering (Q/A) has long leveraged answer type detection (ATD) systems in order to identify the semantic class (or answer type) of the entities, words, or phrases most likely to correspond to the exact answer of a question.

[Figure: A fragment of an answer type hierarchy. HUMAN branches into INDIVIDUAL, GROUP, and ORGANIZATION; INDIVIDUAL into ACTOR, ARTIST, AWARD, and ATHLETE; ATHLETE into BASEBALL PLAYER, CRICKET PLAYER, and SOCCER PLAYER. The question "Who wears #23 for the Los Angeles Galaxy?" maps to the SOCCER PLAYER answer type.]
Answer Types and Entity Types
• While articulated ATHs clearly have value for question-answering applications, most work in ATD has been limited by the number of types recognized by current named entity recognition systems:
– ACE Guidelines: ~35 entity types
– Typical Commercial Offering: ~50 entity types
– LCC’s CiceroLite™: > 350 entity types
• But are more types really better? Or do they make for a tougher learning problem?
Four Challenges
Challenge 1: Creating an Answer Type Hierarchy

• First (published) Answer Type Hierarchy (Li and Roth 2002, et seq.):
– 2-tiered structure: 6 “coarse” answer types, ~50 “fine” answer types
– LAND VEHICLE (7): Automobile, Truck, Mass Transport, Train, Military Vehicle, Industrial Vehicle
– WATER VEHICLE (4): Ships, Submarines, Civilian Watercraft, Other Watercraft
– AIR VEHICLE (4): Commercial Airliner, Military Plane, Other Aircraft, Blimp
– SPACE VEHICLE (3): Spacecraft, Satellite, Fictional Spacecraft
• Questions to answer:
– Why not just use an entity hierarchy as the answer type hierarchy?
– What is the right set of leaf nodes for an ATH?
– What is the right set of non-terminals for an ATH?
Why not just use the entity hierarchy?
• Short answer: Entity hierarchies aren’t organized according to the potential information needs expressed by natural language questions.
• Entity types are semantic categories assigned to phrases found in text:
– David Beckham was born on February 9, 1976. [ENTITY TYPE: DATE]
– David Beckham was 33 years old in 2008. [ENTITY TYPE: AGE]
– David Beckham (1976 – ) plays for the LA Galaxy. [ENTITY TYPE: YEAR_RANGE]
– David Beckham is one year older than Luis Figo. [ENTITY TYPE: RELATIVE_AGE]
– David Beckham, 33, was scratched by Capello. [ENTITY TYPE: GENERIC_NUMBER]
– David Beckham has been living for 33 years. [ENTITY TYPE: DURATION]
• Answer types are semantic categories sought by a question:
– How old is David Beckham?
• The answer type is AGE, but valid entity types include:
– AGE
– RELATIVE_AGE
– GENERIC_NUMBER
– DURATION
– DATE / YEAR_RANGE
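The AGE example can be sketched as a one-to-many mapping from answer types to acceptable entity types. This is a hypothetical illustration of the distinction, not LCC’s actual type tables:

```python
# Hypothetical illustration: one answer type can be satisfied by several
# entity types, so ATD cannot simply reuse the entity hierarchy.
# The mapping below mirrors the AGE example from the slides.
ANSWER_TO_ENTITY_TYPES = {
    "AGE": {"AGE", "RELATIVE_AGE", "GENERIC_NUMBER", "DURATION",
            "DATE", "YEAR_RANGE"},
}

def entity_type_matches(answer_type, entity_type):
    """True if a phrase of entity_type can answer a question of answer_type."""
    # Default: an unlisted answer type only matches its own entity type.
    return entity_type in ANSWER_TO_ENTITY_TYPES.get(answer_type, {answer_type})

print(entity_type_matches("AGE", "DURATION"))  # True
print(entity_type_matches("AGE", "COMPANY"))   # False
```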
Constructing an ATH from an ETH

• Step 1: Initialize.
– Create the initial ATH as a direct clone of the existing ETH.
• Step 2: Consolidate Similar Nodes.
– Combine similar nodes under “abstract” parent nodes corresponding to a possible Q-stem:
• SOCCER PLAYER, BASEBALL PLAYER, CRICKET PLAYER → ATHLETE (Which player?)
• CITY, FACILITY, GEOPOLITICAL ENTITY → LOCATION (Where?)
• POEM, BOOK, MOVIE, GOVERNMENT DOCUMENT → AUTHORED_WORK (What work?)
• Step 3: Separate Existing Nodes into Subtypes.
– Create multiple answer types for a single entity type when it belongs under different parents:
• AIRPORT → AIRPORT_LOC and AIRPORT_ORG
• Step 4: Repeat (as necessary).
– Perform Steps 2 and 3 until all “merge-able” types are included in the ATH.
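The construction steps above can be sketched on a toy hierarchy. The child→parent map and type names are hypothetical examples, not the full LCC hierarchy:

```python
# Minimal sketch of the four-step ETH-to-ATH procedure on toy data.
# A hierarchy is stored as a child -> parent map.
eth = {
    "SOCCER PLAYER": "HUMAN", "BASEBALL PLAYER": "HUMAN",
    "CRICKET PLAYER": "HUMAN", "AIRPORT": "LOCATION",
}

# Step 1: Initialize the ATH as a direct clone of the ETH.
ath = dict(eth)

# Step 2: Consolidate similar nodes under an abstract parent ("Which player?").
for node in ("SOCCER PLAYER", "BASEBALL PLAYER", "CRICKET PLAYER"):
    ath[node] = "ATHLETE"
ath["ATHLETE"] = "HUMAN"

# Step 3: Separate an entity type that belongs under different parents.
del ath["AIRPORT"]
ath["AIRPORT_LOC"] = "LOCATION"
ath["AIRPORT_ORG"] = "ORGANIZATION"

# Step 4 (repeat) would re-apply Steps 2 and 3 until no merge-able types remain.
print(ath["SOCCER PLAYER"])  # ATHLETE
```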
Resultant Answer Type Hierarchy

• 11 coarse answer types (UIUC Hierarchy: 6)
– HUMAN, LOCATION, NUMERIC, ABBREVIATION, ENTITY, COMPLEX, WORK, TEMPORAL, TITLE, CONTACT-INFO, OTHER-VALUE*
• 296 fine types (UIUC Hierarchy: ~50)
– Examples: CASINO, MUSEUM, CITY, COUNTRY, STATE, ACTOR, BASEBALL PLAYER, MILITARY PERSON, COMPANY, UNIVERSITY, BASEBALL TEAM, ISLAND, PLANET, RIVER, ALBUM, SONG, BOOK, WRESTLER, SOCCER PLAYER, SPACE LOCATION, MOON, etc.
• Average depth: 3.8 levels
• Average number of “sisters”: 4.2 nodes

* Corresponds to a UIUC coarse type
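As a hypothetical illustration, the two statistics above (average depth, average number of “sisters”) can be computed directly from a child→parent map; the toy fragment below is not the actual 296-type hierarchy:

```python
# Sketch: computing average leaf depth and average sister count
# from a child -> parent map (toy fragment, assumed data).
from collections import Counter

parents = {
    "HUMAN": "ROOT", "LOCATION": "ROOT",
    "INDIVIDUAL": "HUMAN", "GROUP": "HUMAN",
    "ATHLETE": "INDIVIDUAL",
}

def depth(node):
    """Number of edges from node up to the ROOT."""
    return 0 if node == "ROOT" else 1 + depth(parents[node])

# Leaves are nodes that never appear as anyone's parent.
leaves = [n for n in parents if n not in parents.values()]
avg_depth = sum(depth(n) for n in leaves) / len(leaves)

# For each node, its sisters are the other children of its parent.
fanout = Counter(parents.values())
avg_sisters = sum(fanout[p] - 1 for p in parents.values()) / len(parents)

print(avg_depth)    # 2.0
print(avg_sisters)  # 0.8
```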
Annotation Methodology
• We experimented with three different annotation methodologies:
– Method #1: One Pass
• A traditional one-pass annotation where the annotator assigns the final “fine” answer type
– Method #2: Two Passes
• Annotators first select a “coarse” answer type, then select a “fine” answer type
– Method #3: Multiple Passes
• Annotators annotate each question according to each decision point in the hierarchy
• Annotators can STOP annotation at any level in the hierarchy
• Method #1: Who wears #23 for the Los Angeles Galaxy? → SOCCER PLAYER
• Method #2: Who wears #23 for the Los Angeles Galaxy? → HUMAN → SOCCER PLAYER
• Method #3: Who wears #23 for the Los Angeles Galaxy? → HUMAN → INDIVIDUAL → ATHLETE → SOCCER PLAYER
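The multi-pass procedure can be sketched as a loop over decision points. The `children` map and the scripted “annotator” below are illustrative assumptions, not the actual annotation tool:

```python
# Sketch of Method #3 (multi-pass annotation): one decision per
# branching point, with the option to STOP at any level.
children = {
    "ROOT": ["HUMAN", "LOCATION"],
    "HUMAN": ["INDIVIDUAL", "GROUP", "ORGANIZATION"],
    "INDIVIDUAL": ["ATHLETE", "ACTOR"],
    "ATHLETE": ["SOCCER PLAYER", "BASEBALL PLAYER"],
}

def annotate(choose):
    """Walk the hierarchy one decision point at a time until a leaf or STOP."""
    node, path = "ROOT", []
    while node in children:
        pick = choose(node, children[node] + ["STOP"])
        if pick == "STOP":
            break
        node = pick
        path.append(node)
    return path

# A scripted annotator for "Who wears #23 for the Los Angeles Galaxy?"
script = iter(["HUMAN", "INDIVIDUAL", "ATHLETE", "SOCCER PLAYER"])
print(annotate(lambda node, options: next(script)))
# ['HUMAN', 'INDIVIDUAL', 'ATHLETE', 'SOCCER PLAYER']
```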
One-Pass Annotation

[Figure: For “Who is the owner of the Los Angeles Galaxy?”, the annotator selects the final fine type directly from the hierarchy (coarse types: Named, Value, Complex, Abbrev, Human, Location, Work, Other; fine types below, e.g. Individual, Group, Organization → Actor, Artist, Writer, Coach, Athlete).]
Two-Pass Annotation

[Figure: For “Who is the owner of the Los Angeles Galaxy?”, the annotator first selects a coarse type, then a fine type from the same hierarchy.]
Multi-Pass Annotation

[Figure: For “Who is the owner of the Los Angeles Galaxy?”, the annotator makes one decision per branching point in the hierarchy and may select STOP at any level.]
Annotating Questions
• Annotated a corpus of 10,000 questions using all three annotation methods
• UIUC Set: factoid and definition questions (Li and Roth 2002)
– How many villi are found in the small intestine?
• Web Crawl Set: “what” questions taken from on-line FAQs
– What is the e-mail address for the mayor of Miami?
• Ferret Log: factoid and complex questions taken from previous experiments with LCC’s Ferret question-answering system (Hickl et al. 2006)
– What is the relationship between Iran and Hezbollah?
– What power plants are in Baluchestan?
Source             Questions
UIUC Train & Test  5,952
Web Crawl          3,485
FERRET Log         563
Total              10,000
Experimental Methodology
• Manually annotated 10,000 factoid questions:
– “Warm-up” set: 1,000 questions (Method 1)
– Method 2: 4,000 questions
– Method 3: 4,000 questions
– “Cool-down” set: 1,000 questions (Method 1)
• Each set annotated by 2 different pairs of annotators
• Annotators tasked with annotating 1K questions per session
• Differences between annotators resolved after each session; differences between annotator pairs resolved after all questions were annotated
• Average agreement between individuals per session:
– Method 1: 47.4% (initial 1K), 72.3% (final 1K)
– Method 2: 86.2% (coarse), 79.4% (fine)
– Method 3 (per hierarchy level): 99.8%, 85.3%, 84.7%, 91.5%, 97.0%
Performing Answer Type Detection
• Heuristic (Harabagiu et al. 2001, Harabagiu et al. 2002):
– Used lexicosemantic features (e.g., WordNet synsets) to map between question terms and answer types
– Performance dependent on the number of synset mappings
• “Flat” classification:
– A classifier maps directly to one of n fine answer types
– Performance degrades as n increases
• (Pure) “hierarchical” classification (Li and Roth 2002, Das et al. 2005, Hickl et al. 2006):
– Recursively identifies the best “child” node for each answer type
– Only the children of the current type are considered as outcomes at every branching point in the hierarchy
– Proceeds until no more branching points remain, or until a STOP type has been selected
• “Hierarchical” classifier + heuristics (Hickl et al. 2007, Roberts & Hickl 2008):
– Uses a classifier to identify the best “child” node for selected sets of answer types
– Uses heuristics to map to some terminal nodes
– Proceeds until:
• No more branching points remain
• No heuristics are available for mapping to finer types
• A STOP type has been selected
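The hybrid hierarchical loop can be sketched as below. The `children` map, `toy_classify`, and `heuristic_map` are hypothetical stand-ins for the trained per-branching-point classifiers and heuristic leaf mappings, not LCC’s implementation:

```python
# Sketch of the hybrid "hierarchical classifier + heuristics" ATD loop.
children = {
    "ROOT": ["HUMAN", "LOCATION"],
    "HUMAN": ["INDIVIDUAL", "GROUP"],
    "INDIVIDUAL": ["ATHLETE", "ACTOR"],
}

def detect_answer_type(question, classify, heuristic_map):
    node = "ROOT"
    while node in children:               # branching points remain
        if node in heuristic_map:         # heuristic maps straight to a finer type
            return heuristic_map[node](question)
        child = classify(node, question)  # classifier picks the best child
        if child == "STOP":               # STOP selected: keep the current type
            return node
        node = child
    return node                           # reached a terminal node

# Toy classifier that always routes toward ATHLETE.
def toy_classify(node, question):
    return {"ROOT": "HUMAN", "HUMAN": "INDIVIDUAL", "INDIVIDUAL": "ATHLETE"}[node]

print(detect_answer_type("Who wears #23?", toy_classify, {}))  # ATHLETE
```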
Performance of ATD
• Compared the performance of 3 classification-based approaches:
– “Flat” classification
– Pure “hierarchical” classification
– “Hierarchical” classification + heuristics
• All approaches trained/tested on the same questions:
– UIUC Hierarchy: 4,000 train / 2,000 test
– LCC Hierarchy: 8,000 train / 2,000 test

ATD Method           Coarse Type  Fine Type
Flat                 --           79.8%
Pure Hierarchical    92.5%        86.7%
Hybrid Hierarchical  92.5%        89.5%
Architecture of Ferret
• We used a “baseline” version of LCC’s question-answering system, Ferret (Hickl et al. 2006), to evaluate the impact that an expanded ATH could have on Q/A performance.

[Figure: Ferret pipeline — Question Processing (including ATD) → Document Retrieval → Passage Retrieval → Answer Extraction → Answer Ranking → Answer Validation]
Impact on Question-Answering
• Used Ferret on a set of 188 factoid questions taken from past TREC QA evaluations that had known answer types in both the UIUC and LCC ATHs
– Document collection: AQUAINT-2 newswire corpus (2 GB)
– Answers judged by hand based on TREC QA answer keys
– A question is considered answered correctly (“Top 1”) if a valid answer is returned in the first position
– A question is considered answered correctly (“Top 5”) if a valid answer is returned in any of the top 5 answers returned by the system

Q/A Method           Top 1 Performance  Top 5 Performance
Coarse Only (11)     31.3% (+8.8%)      34.4% (+7.9%)
Coarse + Fine (307)  38.6% (+10.3%)     48.5% (+17.1%)
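The Top 1 / Top 5 criteria above amount to accuracy-at-k scoring, sketched here with illustrative names and toy data (not the TREC judgment files):

```python
# Sketch: a question counts as correct at rank k if any valid answer
# appears among the first k answers the system returns.
def accuracy_at_k(ranked_answers_per_q, gold, k):
    hits = sum(1 for q, ranked in ranked_answers_per_q.items()
               if any(a in gold[q] for a in ranked[:k]))
    return hits / len(ranked_answers_per_q)

# Toy example with two questions: q1 has its gold answer at rank 2.
ranked = {"q1": ["x", "gold1", "y"], "q2": ["z", "w", "v"]}
gold = {"q1": {"gold1"}, "q2": {"gold2"}}
print(accuracy_at_k(ranked, gold, 1))  # 0.0
print(accuracy_at_k(ranked, gold, 5))  # 0.5
```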
Conclusions
• Annotated a corpus of more than 10,000 factoid questions with appropriate “fine” answer types with nearly 90% inter-annotator agreement
• Constructed classifier-based ATD models capable of associating questions with their appropriate answer type with nearly 90% accuracy
• Incorporated new ATD system into a baseline Q/A system; showed improvement of more than 10% over system using previous ATH
Talk Overview
• Introduction
• Four Challenges:
– Challenge 1: Organization. Can we organize a large entity hierarchy into a workable answer type hierarchy?
– Challenge 2: Annotation. Can we reliably annotate questions with fine-grained types from a large ATH? What’s the best way to perform annotation?
– Challenge 3: Learning. Can we learn models for performing fine-grained ATD? How do they compare with current ATD models?
– Challenge 4: Implementation. How do we incorporate ATD into a Q/A system (without sacrificing performance)?
• Conclusions