Lecture 32 Question Answering
Page 1: Lecture 32 Question Answering

04/22/23 1

Lecture 32

Question Answering

November 10, 2005

Page 2: Lecture 32 Question Answering

04/22/23 2

Question Answering Tutorial
John M. Prager
IBM T.J. Watson Research Center
[email protected]

Page 3: Lecture 32 Question Answering

04/22/23 3

Tutorial Overview

Ground Rules

Part I - Anatomy of QA
  A Brief History of QA
  Terminology
  The Essence of Text-based QA
  Basic Structure of a QA System
  NE Recognition and Answer Types
  Answer Extraction

Part II - Specific Approaches
  By Genre
  By System

Part III - Issues and Advanced Topics
  Evaluation
  No Answer
  Question Difficulty
  Dimensions of QA
  Relationship Questions
  Decomposition/Recursive QA
  Constraint-based QA
  Cross-Language QA

References

Page 4: Lecture 32 Question Answering

04/22/23 4

Part I - Anatomy of QA

Terminology
The Essence of Text-based QA
Basic Structure of a QA System
NE Recognition and Answer Types
Answer Extraction

Page 5: Lecture 32 Question Answering

04/22/23 5

Some "factoid" questions from TREC8-9:
  9: How far is Yaroslavl from Moscow?
  15: When was London's Docklands Light Railway constructed?
  22: When did the Jurassic Period end?
  29: What is the brightest star visible from Earth?
  30: What are the Valdez Principles?
  73: Where is the Taj Mahal?
  134: Where is it planned to berth the merchant ship, Lane Victory, which Merchant Marine veterans are converting into a floating museum?
  197: What did Richard Feynman say upon hearing he would receive the Nobel Prize in Physics?
  198: How did Socrates die?
  199: How tall is the Matterhorn?
  200: How tall is the replica of the Matterhorn at Disneyland?
  227: Where does dew come from?
  269: Who was Picasso?
  298: What is California's state tree?

Page 6: Lecture 32 Question Answering

04/22/23 6

Terminology

Question Type
Answer Type
Question Focus
Question Topic
Candidate Passage
Candidate Answer
Authority File/List

Page 7: Lecture 32 Question Answering

04/22/23 7

Terminology – Question Type

Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats.

E.g. TREC2003:
  FACTOID: "How far is it from Earth to Mars?"
  LIST: "List the names of chewing gums"
  DEFINITION: "Who is Vlad the Impaler?"

Other possibilities:
  RELATIONSHIP: "What is the connection between Valentina Tereshkova and Sally Ride?"
  SUPERLATIVE: "What is the largest city on Earth?"
  YES-NO: "Is Saddam Hussein alive?"
  OPINION: "What do most Americans think of gun control?"
  CAUSE&EFFECT: "Why did Iraq invade Kuwait?"
  …

Page 8: Lecture 32 Question Answering

04/22/23 8

Terminology – Answer Type

Answer Type: the class of object (or rhetorical type of sentence) sought by the question.

E.g.
  PERSON (from "Who …")
  PLACE (from "Where …")
  DATE (from "When …")
  NUMBER (from "How many …")
  …
but also
  EXPLANATION (from "Why …")
  METHOD (from "How …")
  …

Answer types are usually tied intimately to the classes recognized by the system’s Named Entity Recognizer.
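As an illustration of how a question word might be mapped to answer types, a minimal lookup could look like the sketch below. The table and function are hypothetical, not from the tutorial; real question analysis is far richer.

```python
# Minimal sketch: mapping question cue words to answer types, mirroring the
# examples above. Illustrative only.
ANSWER_TYPE_MAP = {
    "who": ["PERSON"],
    "where": ["PLACE"],
    "when": ["DATE"],
    "how many": ["NUMBER"],
    "why": ["EXPLANATION"],
    "how": ["METHOD"],
}

def guess_answer_type(question: str) -> list:
    q = question.lower()
    # Check multi-word cues before single-word ones ("how many" before "how").
    for cue in sorted(ANSWER_TYPE_MAP, key=len, reverse=True):
        if q.startswith(cue):
            return ANSWER_TYPE_MAP[cue]
    return ["ANY"]

print(guess_answer_type("How many moons does Mars have?"))  # ['NUMBER']
```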

Page 9: Lecture 32 Question Answering

04/22/23 9

Terminology – Question Focus

Question Focus: The property or entity that is being sought by the question.

E.g.
  "In what state is the Grand Canyon?"
  "What is the population of Bulgaria?"
  "What colour is a pomegranate?"

Page 10: Lecture 32 Question Answering

04/22/23 10

Terminology – Question Topic

Question Topic: the object (person, place, …) or event that the question is about. The question might well be about a property of the topic, which will be the question focus.

E.g. "What is the height of Mt. Everest?"
  height is the focus
  Mt. Everest is the topic

Page 11: Lecture 32 Question Answering

04/22/23 11

Terminology – Candidate Passage

Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question.

Depending on the query and kind of index used, there may or may not be a guarantee that a candidate passage has any candidate answers.

Candidate passages will usually have associated scores, from the search engine.

Page 12: Lecture 32 Question Answering

04/22/23 12

Terminology – Candidate Answer

Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type. In some systems the type match may be approximate, if there is the concept of confusability. Candidate answers are found in candidate passages.

E.g.
  50
  Queen Elizabeth II
  September 8, 2003
  by baking a mixture of flour and water

Page 13: Lecture 32 Question Answering

04/22/23 13

Terminology – Authority List

Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership. Instances should be derived from an authoritative source and be as close to complete as possible. Ideally the class is small, easily enumerated, and its members have a limited number of lexical forms.

Good:
  Days of the week
  Planets
  Elements

Good statistically, but difficult to get 100% recall:
  Animals
  Plants
  Colours

Problematic:
  People
  Organizations

Impossible:
  All numeric quantities
  Explanations and other clausal quantities

Page 14: Lecture 32 Question Answering

04/22/23 14

Essence of Text-based QA

Need to find a passage that answers the question:
  Find a candidate passage (search)
  Check that the semantics of the passage and the question match
  Extract the answer

(Single source answers)

Page 15: Lecture 32 Question Answering

04/22/23 15

Essence of Text-based QA

For a very small corpus, every passage could be considered a candidate, but this is not interesting.

Need to perform a search to locate good passages:
  If the search is too broad, not much has been achieved, and we are faced with lots of noise
  If the search is too narrow, good passages will be missed

Two broad possibilities:
  Optimize the search
  Use iteration

Search

Page 16: Lecture 32 Question Answering

04/22/23 16

Essence of Text-based QA

Need to test whether the semantics of the passage match the semantics of the question:
  Count question words present in the passage
  Score based on proximity
  Score based on syntactic relationships
  Prove the match

Match
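As a concrete illustration of the first two tests (word overlap and proximity), here is a minimal sketch. The scoring function and its weights are mine, not from any system described in this tutorial.

```python
# Illustrative sketch: score a candidate passage by (a) how many question words
# it contains and (b) how densely they occur.
def passage_score(question: str, passage: str) -> float:
    q_words = {w.lower().strip(".,?!") for w in question.split()
               if len(w.strip(".,?!")) > 3}
    p_tokens = [w.lower().strip(".,?!") for w in passage.split()]
    positions = [i for i, tok in enumerate(p_tokens) if tok in q_words]
    if not positions:
        return 0.0
    overlap = len({p_tokens[i] for i in positions}) / len(q_words)  # coverage of question words
    span = positions[-1] - positions[0] + 1                         # window containing all matches
    proximity = len(positions) / span                               # denser matches score higher
    return overlap + proximity

print(passage_score("How did Socrates die?",
                    "Socrates died after drinking hemlock."))
```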

Page 17: Lecture 32 Question Answering

04/22/23 17

Essence of Text-based QA

Find candidate answers of the same type as the answer type sought by the question.

This has implications for:
  the size of the type hierarchy
  where/when/whether to consider subsumption (considered later)

Answer Extraction

Page 18: Lecture 32 Question Answering

04/22/23 18

Basic Structure of a QA-System

See for example Abney et al., 2000; Clarke et al., 2001; Harabagiu et al.; Hovy et al., 2001; Prager et al. 2000

(Block diagram on the slide:)

Question -> Question Analysis -> (Query, Answer Type) -> Search (Corpus or Web) -> Documents/Passages -> Answer Extraction -> Answer
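The diagram above can be read as a thin pipeline. The sketch below is illustrative only; the function names, stopword list, and scoring are mine, not those of any particular system.

```python
# Toy QA pipeline sketch: question analysis -> search -> answer extraction.
from dataclasses import dataclass

@dataclass
class Analysis:
    query: list
    answer_type: str

def analyze_question(question: str) -> Analysis:
    # Toy analysis: keywords = non-stopwords; answer type from the wh-word.
    stop = {"what", "who", "where", "when", "is", "the", "did", "a"}
    keywords = [w.strip("?,.") for w in question.lower().split() if w not in stop]
    atype = "PERSON" if question.lower().startswith("who") else "ANY"
    return Analysis(query=keywords, answer_type=atype)

def search(query: list, corpus: list) -> list:
    # Return passages containing at least one query keyword, best first.
    scored = [(sum(k in p.lower() for k in query), p) for p in corpus]
    return [p for s, p in sorted(scored, reverse=True) if s > 0]

def extract_answer(passages: list, analysis: Analysis) -> str:
    # Placeholder: real systems run NER and rank candidates (see later slides).
    return passages[0] if passages else "NIL"

corpus = ["Jefferson wrote the Declaration of Independence.",
          "The Declaration was signed in 1776."]
a = analyze_question("Who wrote the Declaration of Independence?")
print(extract_answer(search(a.query, corpus), a))
```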

Page 19: Lecture 32 Question Answering

04/22/23 19

Essence of Text-based QA

There are three broad locations in the system where expansion takes place, for the purpose of matching passages. Where is the right trade-off?

  Question Analysis
    Expand individual terms to synonyms (hypernyms, hyponyms, related terms)
    Reformulate the question
  In the Search Engine
    Generally avoided for reasons of computational expense
  At indexing time
    Stemming/lemmatization

High-Level View of Recall

Page 20: Lecture 32 Question Answering

04/22/23 20

Essence of Text-based QA

There are three broad locations in the system where narrowing/filtering/matching takes place. Where is the right trade-off?

  Question Analysis
    Include all question terms in the query
    Use IDF-style weighting to indicate preferences
  Search Engine
    Possibly store POS information for polysemous terms
  Answer Extraction
    Reward (penalize) passages/answers that pass (don't pass) a test
    Particularly attractive for temporal modification

High-Level View of Precision

Page 21: Lecture 32 Question Answering

04/22/23 21

Answer Types and Modifiers

Most likely there is no type for "French cities", so look for CITY and:
  include "French/France" in the bag of words, and hope for the best
  include "French/France" in the bag of words, retrieve documents, and look for evidence (deep parsing, logic)
  use high-precision Language Identification on the results
  if you have a list of French cities, either
    filter the results by the list, or
    use Answer-Based QA (see later)
  use longitude/latitude information of cities and countries

Name 5 French Cities

Page 22: Lecture 32 Question Answering

04/22/23 22

Answer Types and Modifiers

Most likely there is no type for "female figure skater", nor even for "figure skater". Look for PERSON, with query terms {figure, skater}. What to do about "female"? Two approaches:

  1. Include "female" in the bag-of-words.
     Relies on the logic that if "femaleness" is an interesting property, it might well be mentioned in answer passages.
     Does not apply to, say, "singer".
  2. Leave out "female" but test candidate answers for gender.
     Needs either an authority file or a heuristic test.
     The test may not be definitive.

Name a female figure skater

Page 23: Lecture 32 Question Answering

04/22/23 23

Named Entity Recognition

BBN's IdentiFinder (Bikel et al. 1999)
  Hidden Markov Model
Sheffield GATE (http://www.gate.ac.uk/)
  Development environment for IE and other NLP activities
IBM's Textract/Resporator (Byrd & Ravin, 1999; Wacholder et al. 1997; Prager et al. 2000)
  FSMs and Authority Files
+ others

The inventory of semantic classes recognized by the NER is closely related to the set of answer types the system can handle.

Page 24: Lecture 32 Question Answering

04/22/23 24

Named Entity Recognition

Page 25: Lecture 32 Question Answering

04/22/23 25

Probabilistic Labelling (IBM)

In Textract, a proper name can be one of the following:
  PERSON
  PLACE
  ORGANIZATION
  MISC_ENTITY (e.g. names of Laws, Treaties, Reports, …)

However, the NER needs another class (UNAME) for any proper name it cannot identify, and in a large corpus many entities end up being UNAMEs. If, for example, a "Where" question seeks a PLACE (and similarly for the other classes above), is being classified as UNAME a death sentence? How will a UNAME ever be searched for?

Page 26: Lecture 32 Question Answering

04/22/23 26

Probabilistic Labelling (IBM)

When an entity is ambiguous or simply unknown, use a set of disjoint special labels in the NER instead of UNAME. This assumes the NER can rule out some possibilities, at least sometimes. Annotate with all remaining possibilities and use these labels as part of the answer type. E.g.

  UNP <-> could be a PERSON
  UNL <-> could be a PLACE
  UNO <-> could be an ORGANIZATION
  UNE <-> could be a MISC_ENTITY

So {UNP UNL} <-> could be a PERSON or a PLACE. This would be a good label for Beverly Hills.

Page 27: Lecture 32 Question Answering

04/22/23 27

Probabilistic Labelling (IBM)

So "Who" questions that would normally generate {PERSON} as the answer type now generate {PERSON UNP}.

Question: "Who is David Beckham married to?"
Answer passage: "David Beckham, the soccer star engaged to marry Posh Spice, is being blamed for England's World Cup defeat."

"Posh Spice" gets annotated with {UNP UNO}. The label sets intersect, so a match occurs and the answer is found. Crowd erupts!

Page 28: Lecture 32 Question Answering

04/22/23 28

Issues with NER

Coreference
  Should referring terms (definite noun phrases, pronouns) be labelled the same way as the referent terms?
Nested Noun Phrases (and other structures of interest)
  What granularity? Partly depends on whether multiple annotations are allowed.
Subsumption and Ambiguity
  What label(s) to choose? Probabilistic labelling.

Page 29: Lecture 32 Question Answering

04/22/23 29

How to Annotate?

"Spanish Prime Minister Felipe Gonzales" contains a Nationality ("Spanish"), a Role ("Prime Minister") and a Person ("Felipe Gonzales"); the slide numbers seven candidate spans that could be annotated.

"… Baker will leave Jerusalem on Saturday and stop in Madrid on the way home to talk to Spanish Prime Minister Felipe Gonzales."

What about: "The U.S. ambassador to Spain, Ed Romero"?

Page 30: Lecture 32 Question Answering

04/22/23 30

Answer Extraction

Also called Answer Selection or Pinpointing: given a question and candidate passages, the process of selecting and ranking candidate answers. Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question. Ranking the candidate answers depends on assessing how well the passage context relates to the question.

3 approaches:
  Heuristic features
  Shallow parse fragments
  Logical proof

Page 31: Lecture 32 Question Answering

04/22/23 31

Answer Extraction using Features

Heuristic feature sets (Prager et al. 2003+); see also (Radev et al. 2000). Calculate feature values for each candidate answer, then calculate a linear combination using weights learned from training data.

Ranking criteria:
  Good global context: the global context of a candidate answer evaluates the relevance of the passage from which the candidate answer is extracted to the question.
  Good local context: the local context of a candidate answer assesses the likelihood that the answer fills the gap in the question.
  Right semantic type: the semantic type of a candidate answer should either be the same as, or a subtype of, the answer type identified by the question analysis component.
  Redundancy: the degree of redundancy for a candidate answer increases as more instances of the answer occur in retrieved passages.

Page 32: Lecture 32 Question Answering

04/22/23 32

Answer Extraction using Features (cont.)

Features for Global Context:
  KeywordsInPassage: the ratio of keywords present in a passage to the total number of keywords issued to the search engine.
  NPMatch: the number of words in noun phrases shared by both the question and the passage.
  SEScore: the ratio of the search engine score for a passage to the maximum achievable score.
  FirstPassage: a Boolean value which is true for the highest-ranked passage returned by the search engine, and false for all other passages.

Features for Local Context:
  AvgDistance: the average distance between the candidate answer and keywords occurring in the passage.
  NotInQuery: the number of words in the candidate answer that are not query keywords.
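An illustrative sketch of how such features could be combined is shown below; the feature values and weights are invented for the example (in the actual work the weights are learned from training data).

```python
# Hedged sketch of feature-based ranking: compute a few of the listed features
# per candidate answer and combine them linearly with (made-up) weights.
def score_candidate(features: dict, weights: dict) -> float:
    return sum(weights[name] * value for name, value in features.items())

weights = {"KeywordsInPassage": 2.0, "SEScore": 1.0,
           "AvgDistance": -0.5, "Redundancy": 1.5}     # illustrative values only

candidate_features = {
    "KeywordsInPassage": 0.75,   # 3 of 4 query keywords present
    "SEScore": 0.6,              # search-engine score / max achievable
    "AvgDistance": 2.0,          # mean token distance to keywords
    "Redundancy": 3,             # occurrences across retrieved passages
}
print(score_candidate(candidate_features, weights))    # higher is better
```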

Page 33: Lecture 32 Question Answering

04/22/23 33

Answer Extraction using Relationships

Computing ranking scores: use linguistic knowledge to compute passage and candidate-answer scores.
  Perform syntactic processing on the question and the candidate passages
  Extract predicate-argument and modification relationships from the parse

Question: "Who wrote the Declaration of Independence?"
  Relationships: [X, write], [write, Declaration of Independence]
Answer text: "Jefferson wrote the Declaration of Independence."
  Relationships: [Jefferson, write], [write, Declaration of Independence]

Compute scores based on the number of question relationships matched:
  Passage score: consider all instantiated relationships
  Candidate answer score: consider relationships containing the variable

Page 34: Lecture 32 Question Answering

04/22/23 34

Answer Extraction using Relationships (cont.)

Example: "When did Amtrak begin operations?"
Question relationships: [Amtrak, begin], [begin, operation], [X, begin]

Compute passage scores from the passages and their relationships:
  "In 1971, Amtrak began operations, …"
    [Amtrak, begin], [begin, operation], [1971, begin], …
  ""Today, things are looking better," said Claytor, expressing optimism about getting the additional federal funds in future years that will allow Amtrak to begin expanding its operations."
    [Amtrak, begin], [begin, expand], [expand, operation], [today, look], …
  "Airfone, which began operations in 1984, has installed air-to-ground phones…. Airfone also operates Railfone, a public phone service on Amtrak trains."
    [Airfone, begin], [begin, operation], [1984, operation], [Amtrak, train], …
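A toy sketch of relationship-based scoring for the Amtrak example above; the tuple representation and matching logic are simplified stand-ins for parser output, not the actual system's data structures.

```python
# Score a passage by how many of the question's fixed relationships it contains,
# and extract candidate answers by matching relationships containing the variable X.
def score_passage(question_rels, passage_rels):
    fixed = [r for r in question_rels if "X" not in r]
    return sum(1 for r in fixed if r in passage_rels)

def candidate_answers(question_rels, passage_rels):
    answers = []
    for q1, q2 in question_rels:
        for p1, p2 in passage_rels:
            if q1 == "X" and q2 == p2:
                answers.append(p1)
            elif q2 == "X" and q1 == p1:
                answers.append(p2)
    return answers

q_rels = [("Amtrak", "begin"), ("begin", "operation"), ("X", "begin")]
p_rels = [("Amtrak", "begin"), ("begin", "operation"), ("1971", "begin")]
print(score_passage(q_rels, p_rels))      # 2 instantiated relationships match
print(candidate_answers(q_rels, p_rels))  # ['Amtrak', '1971']; question terms
                                          # like 'Amtrak' would be filtered out in practice
```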

Page 35: Lecture 32 Question Answering

04/22/23 35

Answer Extraction using Logic

Logical proof:
  Convert the question to a goal
  Convert the passage to a set of logical forms representing individual assertions
  Add predicates representing subsumption rules and real-world knowledge
  Prove the goal

See the section on LCC later.

Page 36: Lecture 32 Question Answering

04/22/23 36

Question Answering Tutorial Part II

John M. Prager
IBM T.J. Watson Research Center

[email protected]

Page 37: Lecture 32 Question Answering

04/22/23 37

Part II - Specific Approaches

By Genre
  Statistical QA
  Pattern-based QA
  Web-based QA
  Answer-based QA (TREC only)

By System
  SMU
  LCC
  USC-ISI
  Insight
  Microsoft
  IBM Statistical
  IBM Rule-based

Page 38: Lecture 32 Question Answering

04/22/23 38

Approaches by Genre

By Genre:
  Statistical QA
  Pattern-based QA
  Web-based QA
  Answer-based QA (TREC only)
    Web-based QA
    Database-based QA

Considerations:
  Effectiveness by question type (precision and recall)
  Expandability to other domains
  Ease of adaptation to CL-QA

Page 39: Lecture 32 Question Answering

04/22/23 39

Statistical QA

Use statistical distributions to model likelihoods of answer type and answer

E.g. IBM (Ittycheriah, 2001) – see later section

Page 40: Lecture 32 Question Answering

04/22/23 40

Pattern-based QA

For a given question type, identify the typical syntactic constructions used in text to express answers to such questions

Typically very high precision, but a lot of work to get decent recall

Page 41: Lecture 32 Question Answering

04/22/23 41

Web-Based QA

Exhaustive string transformations: Brill et al. 2002
Learning: Radev et al. 2001

Page 42: Lecture 32 Question Answering

04/22/23 42

Answer-Based QA

Problem: Sometimes it is very easy to find an answer to a question using resource A, but the task demands that you find it in resource B.

Solution: First find the answer in resource A, then locate the same answer, along with original question terms, in resource B.

Artificial problem, but real for TREC participants.

Page 43: Lecture 32 Question Answering

04/22/23 43

Answer-Based QA

Web-based solution: when a QA system looks for answers within a relatively small textual collection, the chance of finding strings/sentences that closely match the question string is small. However, when a QA system looks for strings/sentences that closely match the question string on the Web, the chance of finding a correct answer is much higher (Hermjakob et al. 2002).

Why this is true:
  The Web is much larger than the TREC corpus (3,000 : 1)
  TREC questions are generated from Web logs, and the style of language (and subjects of interest) in these logs is more similar to Web content than to newswire collections.

Page 44: Lecture 32 Question Answering

04/22/23 44

Answer-Based QA

Database/Knowledge-base/Ontology solution:
  When the question syntax is simple and reliably recognizable, it can be expressed as a logical form.
  The logical form represents the entire semantics of the question and can be used to access a structured resource:
    WordNet
    On-line dictionaries
    Tables of facts & figures
    Knowledge bases such as Cyc
  Having found the answer:
    Construct a query with the original question terms + the answer
    Retrieve passages
    Tell Answer Extraction the answer it is looking for

Page 45: Lecture 32 Question Answering

04/22/23 45

Approaches of Specific Systems

SMU Falcon
LCC
USC-ISI
Insight
Microsoft
IBM

Note: Some of the slides and/or examples in these sections are taken from papers or presentations from the respective system authors

Page 46: Lecture 32 Question Answering

04/22/23 46

SMU Falcon

Harabagiu et al. 2000

Page 47: Lecture 32 Question Answering

04/22/23 47

SMU Falcon

From the question, a dependency structure called the question semantic form is created. The query is a Boolean conjunction of terms. From answer passages that contain at least one instance of the answer type, an answer semantic form is generated.

3 processing loops:
  Loop 1: triggered when too few or too many passages are retrieved from the search engine
  Loop 2: triggered when the question semantic form and the answer semantic form cannot be unified
  Loop 3: triggered when unable to perform an abductive proof of answer correctness

Page 48: Lecture 32 Question Answering

04/22/23 48

SMU Falcon

The loops provide opportunities to perform alternations:
  Loop 1: morphological expansions and nominalizations
  Loop 2: lexical alternations – synonyms, direct hypernyms and hyponyms
  Loop 3: paraphrases

Evaluation (Pasca & Harabagiu, 2001): increase in accuracy on the 50-byte task in TREC9:
  Loop 1: 40%
  Loop 2: 52%
  Loop 3: 8%
  Combined: 76%

Page 49: Lecture 32 Question Answering

04/22/23 49

LCC

Moldovan & Rus, 2001 Uses Logic Prover for answer justification

Question logical form Candidate answers in logical form XWN glosses Linguistic axioms Lexical chains

Inference engine attempts to verify answer by negating question and proving a contradiction

If proof fails, predicates in question are gradually relaxed until proof succeeds or associated proof score is below a threshold.

Page 50: Lecture 32 Question Answering

04/22/23 50

LCC: Lexical Chains

Q:1518 What year did Marco Polo travel to Asia?
Answer: Marco Polo divulged the truth after returning in 1292 from his travels, which included several months on Sumatra.
Lexical Chains:
  (1) travel_to:v#1 -> GLOSS -> travel:v#1 -> RGLOSS -> travel:n#1
  (2) travel_to#1 -> GLOSS -> travel:v#1 -> HYPONYM -> return:v#1
  (3) Sumatra:n#1 -> ISPART -> Indonesia:n#1 -> ISPART -> Southeast_Asia:n#1 -> ISPART -> Asia:n#1

Q:1570 What is the legal age to vote in Argentina?
Answer: Voting is mandatory for all Argentines aged over 18.
Lexical Chains:
  (1) legal:a#1 -> GLOSS -> rule:n#1 -> RGLOSS -> mandatory:a#1
  (2) age:n#1 -> RGLOSS -> aged:a#3
  (3) Argentine:a#1 -> GLOSS -> Argentina:n#1

Page 51: Lecture 32 Question Answering

04/22/23 51

LCC: Logic Prover

Question: "Which company created the Internet Browser Mosaic?"
QLF: (_organization_AT(x2)) & company_NN(x2) & create_VB(e1,x2,x6) & Internet_NN(x3) & browser_NN(x4) & Mosaic_NN(x5) & nn_NNC(x6,x3,x4,x5)

Answer passage: "... Mosaic, developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign ..."
ALF: ... Mosaic_NN(x2) & develop_VB(e2,x2,x31) & by_IN(e2,x8) & National_NN(x3) & Center_NN(x4) & for_NN(x5) & Supercomputing_NN(x6) & application_NN(x7) & nn_NNC(x8,x3,x4,x5,x6,x7) & NCSA_NN(x9) & at_IN(e2,x15) & University_NN(x10) & of_NN(x11) & Illinois_NN(x12) & at_NN(x13) & Urbana_NN(x14) & nn_NNC(x15,x10,x11,x12,x13,x14) & Champaign_NN(x16) ...

Lexical chains: develop <-> make and make <-> create
  exists x2 x3 x4 all e2 x1 x7 (develop_vb(e2,x7,x1) <-> make_vb(e2,x7,x1) & something_nn(x1) & new_jj(x1) & such_jj(x1) & product_nn(x2) & or_cc(x4,x1,x3) & mental_jj(x3) & artistic_jj(x3) & creation_nn(x3)).
  all e1 x1 x2 (make_vb(e1,x1,x2) <-> create_vb(e1,x1,x2) & manufacture_vb(e1,x1,x2) & man-made_jj(x2) & product_nn(x2)).

Linguistic axioms: all x0 (mosaic_nn(x0) -> internet_nn(x0) & browser_nn(x0))

Page 52: Lecture 32 Question Answering

04/22/23 52

USC-ISI TextMap System

Ravichandran and Hovy, 2002; Hermjakob et al. 2003.

Use of surface text patterns: "When was X born?" ->
  "Mozart was born in 1756"
  "Gandhi (1869-1948)"
These can be captured in the expressions:
  <NAME> was born in <BIRTHDATE>
  <NAME> (<BIRTHDATE> -

These patterns can be learned.

Page 53: Lecture 32 Question Answering

04/22/23 53

USC-ISI TextMap

Use bootstrapping to learn patterns. For an identified question type ("When was X born?"), start with known answers for some values of X:
  Mozart 1756
  Gandhi 1869
  Newton 1642

Then:
  Issue Web search engine queries (e.g. "+Mozart +1756")
  Collect the top 1000 documents
  Filter, tokenize, smooth etc.
  Use a suffix tree constructor to find the best substrings, e.g. "Mozart (1756-1791)"
  Filter, e.g. to "Mozart (1756-"
  Replace the query strings with e.g. <NAME> and <ANSWER>

Determine the precision of each pattern:
  Find documents containing just the question term (Mozart)
  Apply the patterns and calculate precision

Page 54: Lecture 32 Question Answering

04/22/23 54

USC-ISI TextMap

Finding answers:
  Determine the question type
  Perform the IR query
  Do sentence segmentation and smoothing
  Replace the question term by the question tag (i.e. replace Mozart with <NAME>)
  Search for instances of the patterns associated with the question type
  Select the words matching <ANSWER>
  Assign scores according to the precision of each pattern
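A minimal sketch (my rendering, not ISI's code) of applying learned surface patterns, assuming 4-digit answers and precomputed pattern precisions:

```python
# Substitute the question term for <NAME>, turn <ANSWER> into a capture group,
# and score each match by the pattern's estimated precision.
import re

patterns = [                      # (pattern template, estimated precision)
    (r"<NAME> was born in <ANSWER>", 0.85),
    (r"<NAME> \(<ANSWER>-", 0.69),
]

def find_answers(name, text):
    scores = {}
    for template, precision in patterns:
        regex = template.replace("<NAME>", re.escape(name)) \
                        .replace("<ANSWER>", r"(\d{4})")
        for m in re.finditer(regex, text):
            ans = m.group(1)
            scores[ans] = max(scores.get(ans, 0.0), precision)
    return sorted(scores.items(), key=lambda kv: -kv[1])

text = ("Wolfgang Amadeus Mozart (1756-1791) was a prolific composer. "
        "Mozart was born in 1756 in Salzburg.")
print(find_answers("Mozart", text))   # [('1756', 0.85)]
```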

Page 55: Lecture 32 Question Answering

04/22/23 55

Insight

Soubbotin, 2002; Soubbotin & Soubbotin, 2003. Performed very well in TREC10/11. Comprehensive and systematic use of "indicative patterns". E.g. the pattern

  capitalized word; parenthesis; 4 digits; dash; 4 digits; parenthesis

matches "Mozart (1756-1791)". The patterns are broader than named entities ("semantics in syntax"), and each pattern has an intrinsic score (reliability), independent of the question.
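The indicative pattern just described, rendered as a regular expression (my rendering; the actual pattern language is richer):

```python
# Capitalized word, "(", 4 digits, "-", 4 digits, ")".
import re

LIFESPAN = re.compile(r"\b([A-Z][a-z]+) \((\d{4})-(\d{4})\)")
m = LIFESPAN.search("Mozart (1756-1791) wrote over 600 works.")
print(m.groups())   # ('Mozart', '1756', '1791')
```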

Page 56: Lecture 32 Question Answering

04/22/23 56

Insight

Patterns with more sophisticated internal structure are more indicative of an answer; 2/3 of their correct entries in TREC10 were answered by patterns. E.g., with

  a == {countries}
  b == {official posts}
  w == {proper names (first and last)}
  e == {titles or honorifics}

the patterns for "Who is the President (Prime Minister) of a given country?" include:
  abeww
  ewwdb,a
  b,aeww

Definition questions (A is the primary query term, X is the answer):
  <A; comma; [a/an/the]; X; [comma/period]>  for: "Moulin Rouge, a cabaret"
  <X; [comma]; [also] called; A [comma]>  for: "naturally occurring gas called methane"
  <A; is/are; [a/an/the]; X>  for: "Michigan's state flower is the apple blossom"

Page 57: Lecture 32 Question Answering

04/22/23 57

Insight

Emphasis on shallow techniques and little NLP. Look in the vicinity of a text string potentially matching a pattern for "zeroing" terms, e.g. for occupational roles:
  Former
  Elect
  Deputy
  Negation

Comments:
  Relies on the redundancy of a large corpus
  Works for the factoid question types of TREC-QA; not clear how it extends
  Not clear how they match questions to patterns
  Named entities within patterns still have to be recognized

Page 58: Lecture 32 Question Answering

04/22/23 58

Microsoft

Data-Intensive QA, Brill et al. 2002: "Overcoming the surface string mismatch between the question formulation and the string containing the answer."

The approach is based on the assumption/intuition that someone on the Web has answered the question in the same way it was asked. Want to avoid dealing with:
  lexical, syntactic and semantic relationships (between Q & A)
  anaphora resolution
  synonymy
  alternate syntax
  indirect answers

Take advantage of redundancy on the Web, then project to the TREC corpus (Answer-based QA).

Page 59: Lecture 32 Question Answering

04/22/23 59

Microsoft AskMSR

Formulate multiple queries; each rewrite has an intrinsic score. E.g. for "What is relative humidity?":
  ["+is relative humidity", LEFT, 5]
  ["relative +is humidity", RIGHT, 5]
  ["relative humidity +is", RIGHT, 5]
  ["relative humidity", NULL, 2]
  ["relative" AND "humidity", NULL, 1]

Then:
  Get the top 100 documents from Google
  Extract n-grams from the document summaries
  Score each n-gram by summing the scores of the rewrites it came from
  Use tiling to merge n-grams
  Search for supporting documents in the TREC corpus
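A rough sketch in the spirit of the rewrite-and-vote scheme above. It is simplified: the Web search step is replaced by hard-coded summaries, and tiling is omitted.

```python
# N-grams from each summary vote with the score of the rewrite that retrieved it.
from collections import Counter

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def score_ngrams(summaries_per_rewrite):
    votes = Counter()
    for rewrite_score, summaries in summaries_per_rewrite:
        for s in summaries:
            toks = [t.strip(".,()") for t in s.lower().split()]
            for n in (1, 2, 3):
                for g in set(ngrams(toks, n)):
                    votes[g] += rewrite_score
    return votes.most_common(5)

results = [
    (5, ["relative humidity is the ratio of water vapor present to the maximum possible"]),
    (1, ["the ratio of actual water vapor pressure to saturation pressure"]),
]
print(score_ngrams(results))
```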

Page 60: Lecture 32 Question Answering

04/22/23 60

Microsoft AskMSR

The question is: "What is the rainiest place on Earth?" The answer from the Web is "Mount Waialeale". The passage in the TREC corpus is: "… In misty Seattle, Wash., last year, 32 inches of rain fell. Hong Kong gets about 80 inches a year, and even Pago Pago, noted for its prodigious showers, gets only about 196 inches annually. (The titleholder, according to the National Geographic Society, is Mount Waialeale in Hawaii, where about 460 inches of rain falls each year.) …"

Very difficult to imagine getting this passage by other means

Page 61: Lecture 32 Question Answering

04/22/23 61

IBM Statistical QA (Ittycheriah, 2001)

The Answer Type Model (ATM) predicts, from the question and a proposed answer, the answer type they both satisfy. Given a question, an answer, and the predicted answer type, the Answer Selection Model (ASM) models the correctness of this configuration. Both distributions are modelled using a maximum entropy formulation.

Training data = human judgments:
  for the ATM, 13K questions annotated with 31 categories
  for the ASM, ~5K questions from TREC plus trivia questions

p(c | q, a) = Σ_e p(c, e | q, a) = Σ_e p(c | e, q, a) p(e | q, a)

where q = question, a = answer, c = "correctness", e = answer type; p(e | q, a) is the answer type model (ATM) and p(c | e, q, a) is the answer selection model (ASM).

Page 62: Lecture 32 Question Answering

04/22/23 62

IBM Statistical QA (Ittycheriah)

Question Analysis (by the ATM):
  Selects one of 31 categories
Search:
  Question expanded by Local Context Analysis
  Top 1000 documents retrieved
Passage Extraction: top 100 passages that:
  Maximize question word match
  Have the desired answer type
  Minimize dispersion of question words
  Have syntactic structure similar to the question
Answer Extraction:
  Candidate answers ranked using the ASM

Page 63: Lecture 32 Question Answering

04/22/23 63

IBM Rule-based: Predictive Annotation (Prager 2000, Prager 2003)

Want to make sure that the passages retrieved by the search engine have at least one candidate answer:
  Recognize that a candidate answer of the correct answer type corresponds to a label (or several) generated by the Named Entity Recognizer
  Annotate the entire corpus, and index the semantic labels along with the text
  Identify answer types in questions and include the corresponding labels in queries
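A toy sketch of the predictive-annotation idea: label the corpus at indexing time with a tiny gazetteer and require the label in the query. The gazetteer, matching, and data here are illustrative only, not IBM's code.

```python
# Run a toy NER over the corpus, index the semantic labels as extra features,
# and require both the query terms and the answer-type label at search time.
def annotate(sentence, gazetteer):
    labels = {label for label, terms in gazetteer.items()
              if any(t in sentence for t in terms)}
    return sentence, labels

gazetteer = {"PERSON$": ["Doubleday", "Jefferson"], "SPORT$": ["baseball"]}
corpus = ["baseball had been invented by Doubleday",
          "baseball is played in summer"]

index = [annotate(s, gazetteer) for s in corpus]

def search(query_terms, query_labels):
    # A sentence matches only if it contains the query terms AND an entity of
    # the requested label, as in {PERSON$ invent baseball}.
    return [s for s, labels in index
            if all(t in s for t in query_terms) and query_labels & labels]

print(search({"invented", "baseball"}, {"PERSON$"}))
```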

Page 64: Lecture 32 Question Answering

04/22/23 64

IBM PIQUANT: Predictive Annotation

E.g.: the question is "Who invented baseball?" "Who" can map to PERSON$ or ORGANIZATION$; suppose we assume only people invent things (it doesn't really matter). So "Who invented baseball?" -> {PERSON$ invent baseball}.

Consider the text: "… but its conclusion was based largely on the recollections of a man named Abner Graves, an elderly mining engineer, who reported that baseball had been "invented" by Doubleday between 1839 and 1841."

(The slide shows the parse of this sentence, with "baseball" annotated SPORT$ and "Doubleday" annotated PERSON$, linked via "be invent … by".)

Page 65: Lecture 32 Question Answering

04/22/23 65

IBM PIQUANT: Predictive Annotation (previous example)

"Who invented baseball?" -> {PERSON$ invent baseball}

However, the same annotated structure is equally effective at answering "What sport did Doubleday invent?" -> {SPORT$ invent Doubleday}.

(Same parse as on the previous slide, with the SPORT$ and PERSON$ annotations.)

Page 66: Lecture 32 Question Answering

04/22/23 66

IBM Rule-Based

Handling Subsumption & Disjunction If an entity is of a type which has a parent type, then how is annotation done? If a proposed answer type has a parent type, then what answer type should be

used? If an entity is ambiguous then what should the annotation be? If the answer type is ambiguous, then what should be used?

Guidelines: If an entity is of a type which has a parent type, then how is annotation done? If a proposed answer type has a parent type, then what answer type should be

used? If an entity is ambiguous then what should the annotation be? If the answer type is ambiguous, then what should be used?

Page 67: Lecture 32 Question Answering

04/22/23 67

Subsumption & Disjunction Consider New York City – both a CITY and a PLACE

To answer “Where did John Lennon die?”, it needs to be a PLACE To answer “In what city is the Empire State Building?”, it needs to be a

CITY. Do NOT want to do subsumption calculation in search engine

Two scenarios 1. Expand Answer Type and use most specific entity annotation

1A { (CITY PLACE) John_Lennon die} matches CITY 1B {CITY Empire_State_Building} matches CITYOr2. Use most specific Answer Type and multiple annotations of NYC 2A {PLACE John_Lennon die} matches (CITY PLACE) 2B {CITY Empire_State_Building} matches (CITY PLACE)

Case 2 preferred for simplicity, because disjunction in #1 should contain all hyponyms of PLACE, while disjunction in #2 should contain all hypernyms of CITY

Choice #2 suggests can use disjunction in answer type to represent ambiguity: “Who invented the laser” -> {(PERSON ORGANIZATION) invent laser}

Page 68: Lecture 32 Question Answering

04/22/23 68

Clausal Classes

Any structure that can be recognized in text can be annotated:
  Quotations
  Explanations
  Methods
  Opinions
  …

Any semantic class label used in annotation can be indexed, and hence used as a target of search:
  What did Karl Marx say about religion?
  Why is the sky blue?
  How do you make bread?
  What does Arnold Schwarzenegger think about global warming?
  …

Page 69: Lecture 32 Question Answering

04/22/23 69

Named Entity Recognition

Page 70: Lecture 32 Question Answering

04/22/23 70

IBM Predictive Annotation – Improving Precision at No Cost to Recall

E.g.: the question is "Where is Belize?" "Where" can map to (CONTINENT$, WORLDREGION$, COUNTRY$, STATE$, CITY$, CAPITAL$, LAKE$, RIVER$ …). But we know Belize is a country, so "Where is Belize?" -> {(CONTINENT$ WORLDREGION$) Belize}.

  Belize occurs 1068 times in the TREC corpus
  Belize and PLACE$ co-occur in only 537 sentences
  Belize and CONTINENT$ or WORLDREGION$ co-occur in only 128 sentences

Page 71: Lecture 32 Question Answering

04/22/23 71

Page 72: Lecture 32 Question Answering

04/22/23 72

Page 73: Lecture 32 Question Answering

04/22/23 73

Virtual Annotation (Prager 2001)

Use WordNet to find all candidate answers (hypernyms), then use corpus co-occurrence statistics to select the "best" ones. This is rather like the approach to WSD of Mihalcea and Moldovan (1999).

Page 74: Lecture 32 Question Answering

04/22/23 74

Parentage of "nematode"

Level  Synset

0 {nematode, roundworm}

1 {worm}

2 {invertebrate}

3 {animal, animate being, beast, brute, creature, fauna}

4 {life form, organism, being, living thing}

5 {entity, something}

Page 75: Lecture 32 Question Answering

04/22/23 75

Parentage of "meerkat"

Level  Synset

0 {meerkat, mierkat}

1 {viverrine, viverrine mammal}

2 {carnivore}

3 {placental, placental mammal, eutherian, eutherian mammal}

4 {mammal}

5 {vertebrate, craniate}

6 {chordate}

7 {animal, animate being, beast, brute, creature, fauna}

8 {life form, organism, being, living thing}

9 {entity, something}

Page 76: Lecture 32 Question Answering

04/22/23 76

Natural Categories

“Basic Objects in Natural Categories” Rosch et al. (1976)

According to psychological testing, these are categorization levels of intermediate specificity that people tend to use in unconstrained settings.

Page 77: Lecture 32 Question Answering

04/22/23 77

What is this?

Page 78: Lecture 32 Question Answering

04/22/23 78

What can we conclude?

There are descriptive terms that people are drawn to use naturally.

We can expect to find instances of these in text, in the right contexts.

These terms will serve as good answers.

Page 79: Lecture 32 Question Answering

04/22/23 79

Virtual Annotation (cont.)

Find all parents of the query term in WordNet. Look for co-occurrences of the query term and a parent in the text corpus; expect to find snippets such as "… meerkats and other Y …". Many different phrasings are possible, so we just look for proximity rather than parse.

Scoring:
  Count the co-occurrences of each parent with the search term, and divide by the level number (only levels >= 1), generating the Level-Adapted Count (LAC).
  Exclude the very highest levels (too general).
  Select the parent with the highest LAC, plus any others with a LAC within 20% of it.
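A small sketch of the Level-Adapted Count (my code, following the description above), using the co-occurrence counts shown on the annotated "nematode" parentage slide below:

```python
# LAC = co-occurrence count of (parent, query term) divided by the parent's level;
# keep the best parent and any within 20% of it.
def virtual_annotation(cooccurrence_by_level):
    lac = {parent: count / level
           for level, parent, count in cooccurrence_by_level if level >= 1}
    best = max(lac.values())
    return [p for p, v in lac.items() if v >= 0.8 * best]

# Counts from the annotated "nematode" parentage slide: worm(13), animal(2), organism(3).
nematode = [(1, "worm", 13), (3, "animal", 2), (4, "organism", 3)]
print(virtual_annotation(nematode))   # ['worm']
```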

Page 80: Lecture 32 Question Answering

04/22/23 80

Parentage of “nematode”

Level Synset

0 {nematode, roundworm}

1 {worm(13)}

2 {invertebrate}

3 {animal(2), animate being, beast, brute, creature, fauna}

4 {life form(2), organism(3), being, living thing}

5 {entity, something}

Page 81: Lecture 32 Question Answering

04/22/23 81

Parentage of “meerkat”

Level Synset

0 {meerkat, mierkat}

1 {viverrine, viverrine mammal}

2 {carnivore}

3 {placental, placental mammal, eutherian, eutherian mammal}

4 {mammal}

5 {vertebrate, craniate}

6 {chordate}

7 {animal(2), animate being, beast, brute, creature, fauna}

8 {life form, organism, being, living thing}

9 {entity, something}

Page 82: Lecture 32 Question Answering

04/22/23 82

Sample Answer Passages

"What is a nematode?" ->
  "Such genes have been found in nematode worms but not yet in higher animals."

"What is a meerkat?" ->
  "South African golfer Butch Kruger had a good round going in the central Orange Free State trials, until a mongoose-like animal grabbed his ball with its mouth and dropped down its hole. Kruger wrote on his card: "Meerkat.""

Use Answer-based QA to locate answers

Page 83: Lecture 32 Question Answering

04/22/23 83

Use of Cyc as Sanity Checker

Cyc: a large knowledge base and inference engine (Lenat 1995). A post-hoc process for:
  rejecting "insane" answers, e.g. "How much does a grey wolf weigh?" -> 300 tons
  boosting confidence for "sane" answers

The sanity checker is invoked with:
  a predicate, e.g. "weight"
  a focus, e.g. "grey wolf"
  a candidate value, e.g. "300 tons"

The sanity checker returns:
  "Sane": within +/- 10% of the value in Cyc
  "Insane": outside of the reasonable range (the plan is to use distributions instead of ranges)
  "Don't know"

The confidence score is highly boosted when an answer is "sane".
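An illustrative sketch of the sanity check; the reference weight and the fixed +/- 10% band are assumptions for the example, not Cyc's actual contents.

```python
# Compare a candidate value against a known reference value with a 10% band.
def sanity_check(known_value, candidate_value):
    if known_value is None:
        return "Don't know"
    lo, hi = 0.9 * known_value, 1.1 * known_value
    return "Sane" if lo <= candidate_value <= hi else "Insane"

KG_PER_TON = 907.0                   # US short ton, for the example
grey_wolf_weight_kg = 45.0           # assumed reference value, not from Cyc
print(sanity_check(grey_wolf_weight_kg, 300 * KG_PER_TON))   # Insane
print(sanity_check(grey_wolf_weight_kg, 43.0))               # Sane
```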

Page 84: Lecture 32 Question Answering

04/22/23 84

Cyc Sanity Checking Example

TREC11 Q: "What is the population of Maryland?"

Without sanity checking:
  PIQUANT's top answer: "50,000"
  Justification: "Maryland's population is 50,000 and growing rapidly."
  The passage discusses an exotic species, the nutria, not humans.

With sanity checking:
  Cyc knows the population of Maryland is 5,296,486
  It rejects the top "insane" answers
  PIQUANT's new top answer: "5.1 million", with very high confidence

Page 85: Lecture 32 Question Answering

04/22/23 85

Question Answering Tutorial Part III

John M. Prager
IBM T.J. Watson Research Center

[email protected]

Page 86: Lecture 32 Question Answering

04/22/23 86

Part III – Issues, Advanced Topics

Evaluation
No Answer
Question Difficulty
Future of QA / Hot Topics
Dimensions of QA
Relationship Questions
Decomposition / Recursive QA
Constraint-based QA
Cross-Language QA

Page 87: Lecture 32 Question Answering

04/22/23 87

Evaluation

Evaluation is relatively straightforward for "factoid" questions. TREC-8 (1999) & TREC-9 (2000):
  50-byte and 250-byte tasks
  Systems returned their top 5 answers
  Scored by Mean Reciprocal Rank:
    1 point if the top answer is correct, else
    0.5 point if the second answer is correct, else …
    0.2 point if the fifth answer is correct, else 0
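Mean Reciprocal Rank in code form, a straightforward rendering of the definition above:

```python
# Each question contributes 1/rank of its first correct answer
# (0 if none of the returned answers is correct).
def mrr(ranks_of_first_correct):
    # Ranks are 1-based; None means no correct answer was returned.
    return (sum(1.0 / r for r in ranks_of_first_correct if r)
            / len(ranks_of_first_correct))

print(mrr([1, 2, None, 5]))   # (1 + 0.5 + 0 + 0.2) / 4 = 0.425
```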

Page 88: Lecture 32 Question Answering

04/22/23 88

Evaluation

For each question there is a set of "correct" answers. "Correctness" testing is easy to automate with pattern files, but the patterns are subjective, and patterns don't/can't test for justification.

Page 89: Lecture 32 Question Answering

04/22/23 89

Evaluation

TREC-10 (2001):
  Dropped the 250-byte task
  Introduced NIL (No Answer) questions
TREC-11 (2002):
  Instead of the top 5 answers, systems returned their top 1
  The answer must be "exact"
  Definition questions ("What/who is X?") dropped
  Results returned sorted in order of the system's confidence
  Scored by Confidence-Weighted Score (= Average Precision)
TREC-12 (2003):
  Definition questions re-introduced, with answers assumed to be a collection of "nuggets"
  List questions introduced; answers must be exact
  Definition and List questions evaluated by an F-measure biased to favour recall
  Factoid questions evaluated by fraction correct

Page 90: Lecture 32 Question Answering

04/22/23 90

Confidence-Weighted Score (Average Precision)

With the N questions sorted by the system's confidence,

  CWS = (1/N) * Σ_{i=1..N} (number correct among the first i questions) / i

i.e. the average of N different precision measures. Question 1's correctness participates in every term, question 2's in all but the first, …, and question N's in just the last term, so much more weight is given to the early (high-confidence) terms in the sum.
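The CWS/Average Precision formula above in code form:

```python
# Questions are assumed sorted by the system's confidence (1 = correct, 0 = not);
# precision is averaged over every prefix of the ranking.
def cws(correct_in_confidence_order):
    n = len(correct_in_confidence_order)
    running, total = 0, 0.0
    for i, is_correct in enumerate(correct_in_confidence_order, start=1):
        running += is_correct
        total += running / i          # precision of the first i questions
    return total / n

print(cws([1, 1, 0, 1, 0]))   # ~0.803: early correct answers count most
```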

Page 91: Lecture 32 Question Answering

04/22/23 91

Contribution by Rank Position

For N questions, if the contribution of a correct answer in position k is c_k, then c_k = c_{k+1} + 1/(kN) with c_{N+1} = 0, i.e. c_k = (1/N) Σ_{i=k..N} 1/i ≈ (1/N) ln((N+1)/k). (The slide plots this contribution curve for N = 500.)

Page 92: Lecture 32 Question Answering

04/22/23 92

Average Precision

With no sorting by confidence, the expected score is n/N. For N questions in total, if n questions are correct, the maximum possible AP (CWS) score (the green curve on the slide) is approximately (n/N)(1 + ln(N/n)); the slide plots this for N = 500.

Ranking-ability = (AP - expected) / (max - expected)

The ranking-ability values reported are .54, .63, .63.

Page 93: Lecture 32 Question Answering

04/22/23 93

Evaluation Issues

What is really meant by an "exact answer"? What if there is a mistake in the question?
  Suppose the question is "Who said X?", where X is a famous saying with a mistake in it. Maybe the answer is NIL.
What granularity is required?
  "Where is Chicago?"
  "What is acetominophen?"
These are difficult to answer without a model of the user.

Page 94: Lecture 32 Question Answering

04/22/23 94

Questions with No Answer

There is a subtle difference between:
  1. This question has no answer (within the available resources),
  2. This question has no answer (at all), and
  3. I don't know the answer.
TREC-QA tests #1 ("NIL questions"), but systems typically answer as if #3.

Strategies used:
  When the top 5 answers (with confidences) were allowed:
    Always put NIL in position X (X in {2,3,4,5})
    If some criterion succeeds, put NIL in position X (X in {1,2,3,4,5})
    Determine some threshold T, and insert NIL at the corresponding position in the confidence ranking (1-5, or not at all)
  When a single answer is returned:
    Determine some threshold T, and insert NIL if the answer confidence < T

Page 95: Lecture 32 Question Answering

04/22/23 95

NIL and CWS

When Confidence-Weighted Score is used, what should the NIL strategy be?

If an answer has low confidence and is replaced by NIL, then what is its new confidence?

Study strategy used by IBM in TREC11 (Chu-Carroll et al. 2003)

Page 96: Lecture 32 Question Answering

04/22/23 96

No-Answer Confidence-Based Calculation

Use TREC10 data to determine the strategy and thresholds. Observe that the lowest-confidence questions are more often No-Answer than correct.
  Examine the TREC10 distribution to determine a cut-off threshold, and convert all questions below it to NIL; this improves the average confidence of the block.
  Move the converted block to the rank with the same average precision.

Confidences are based on:
  Grammatical relationships
  Semantic relationships
  Redundancy

Page 97: Lecture 32 Question Answering

04/22/23 97

TREC10 Distribution

NIL CORRECT OUT OF xxxxxxxxxxxxxx.xx.xxxxxxxxxxxxxx.x..xx.xx 0 35 41xxxxxx-x.-x.xxxxxxxx..x-xxxxxxxxxx.xxxxx.x.xxx.-xx 4 38 50xx.....x.-xx.....xx....x.xx.x..xxx.xx...xx.x..xx.x 1 22 50.-...x.xx-..x..x.xx....xx.x...xx.....x..xxx....xx. 2 18 50........x....x..xxxx...x...xx....xxxxx--......xxx. 2 17 50..x.xxx...-x-...xx.....x...xx--.xx-....xx..x..x... 5 16 50..x.x.-......x....x.x-.x.xx...-x-x-x-...-..x-x.x.x 8 15 50x..-x.....x.x.....-..........-...-..x.-....-..x... 6 6 50.x--......xx....-.-..x.-....-.-..x...........--... 9 5 50-.-.-..--...-x.xx....-.-x......-.....-..-...-.x.-. 13 5 50

Key: X Correct. Incorrect- NIL

Page 98: Lecture 32 Question Answering

04/22/23 98

Changing all answers in block to NIL gains 22-10 = 12 correct. Note confidence of leading element = C.

TREC10 Distribution

NIL CORRECT OUT OF xxxxxxxxxxxxxx.xx.xxxxxxxxxxxxxx.x..xx.xx 0 35 41xxxxxx-x.-x.xxxxxxxx..x-xxxxxxxxxx.xxxxx.x.xxx.-xx 4 38 50xx.....x.-xx.....xx....x.xx.x..xxx.xx...xx.x..xx.x 1 22 50.-...x.xx-..x..x.xx....xx.x...xx.....x..xxx....xx. 2 18 50........x....x..xxxx...x...xx....xxxxx--......xxx. 2 17 50..x.xxx...-x-...xx.....x...xx--.xx-....xx..x..x... 5 16 50..x.x.-......x....x.x-.x.xx...-x-x-x-...-..x-x.x.x 8 15 50x..-x.....x.x.....-..........-...-..x.-....-..x... 6 6 50.x--......xx....-.-..x.-....-.-..x...........--... 9 5 50-.-.-..--...-x.xx....-.-x......-.....-..-...-.x.-. 13 5 50

Key: X Correct. Incorrect- NIL

C

Page 99: Lecture 32 Question Answering

04/22/23 99

Changing all answers in block to NIL gains 22-10 = 12 correct. Note confidence of leading element = C.

TREC10 Distribution

NIL CORRECT OUT OF xxxxxxxxxxxxxx.xx.xxxxxxxxxxxxxx.x..xx.xx 0 35 41xxxxxx-x.-x.xxxxxxxx..x-xxxxxxxxxx.xxxxx.x.xxx.-xx 4 38 50xx.....x.-xx.....xx....x.xx.x..xxx.xx...xx.x..xx.x 1 22 50.-...x.xx-..x..x.xx....xx.x...xx.....x..xxx....xx. 2 18 50........x....x..xxxx...x...xx....xxxxx--......xxx. 2 17 50..x.xxx...-x-...xx.....x...xx--.xx-....xx..x..x... 5 16 50..x.x.-......x....x.x-.x.xx...-x-x-x-...-..x-x.x.x 8 15 50x..-x.....x.x.....-..........-...-..x.-....-..x... 6 6 50..xx............x.x....x....x.x..............xx... all 9 50x.x.x..xx...x........x.x.......x.....x..x...x...x. all 13 50

Key: X Correct. Incorrect- NIL

C

Page 100: Lecture 32 Question Answering

04/22/23 100

Changing all answers in block to NIL gains 22-10 = 12 correct. Note confidence of leading element = C.

TREC10 Distribution

NIL CORRECT OUT OF xxxxxxxxxxxxxx.xx.xxxxxxxxxxxxxx.x..xx.xx 0 35 41xxxxxx-x.-x.xxxxxxxx..x-xxxxxxxxxx.xxxxx.x.xxx.-xx 4 38 50xx.....x.-xx.....xx....x.xx.x..xxx.xx...xx.x..xx.x 1 22 50.-...x.xx-..x..x.xx....xx.x...xx.....x..xxx....xx. 2 18 50........x....x..xxxx...x...xx....xxxxx--......xxx. 2 17 50..x.xxx...-x-...xx.....x...xx--.xx-....xx..x..x... 5 16 50..x.x.-......x....x.x-.x.xx...-x-x-x-...-..x-x.x.x 8 15 50x..-x.....x.x.....-..........-...-..x.-....-..x... 6 6 50..xx............x.x....x....x.x..............xx... all 9 50x.x.x..xx...x........x.x.......x.....x..x...x...x. all 13 50

Key: X Correct. Incorrect- NIL

Calculate precision of block P = 22/100

C

Page 101: Lecture 32 Question Answering

04/22/23 101

Changing all answers in block to NIL gains 22-10 = 12 correct. Note confidence of leading element = C.

TREC10 Distribution

NIL CORRECT OUT OF xxxxxxxxxxxxxx.xx.xxxxxxxxxxxxxx.x..xx.xx 0 35 41xxxxxx-x.-x.xxxxxxxx..x-xxxxxxxxxx.xxxxx.x.xxx.-xx 4 38 50xx.....x.-xx.....xx....x.xx.x..xxx.xx...xx.x..xx.x 1 22 50.-...x.xx-..x..x.xx....xx.x...xx.....x..xxx....xx. 2 18 50........x....x..xxxx...x...xx....xxxxx--......xxx. 2 17 50..x.xxx...-x-...xx.....x...xx--.xx-....xx..x..x... 5 16 50..x.x.-......x....x.x-.x.xx...-x-x-x-...-..x-x.x.x 8 15 50x..-x.....x.x.....-..........-...-..x.-....-..x... 6 6 50..xx............x.x....x....x.x..............xx... all 9 50x.x.x..xx...x........x.x.......x.....x..x...x...x. all 13 50

Key: X Correct. Incorrect- NIL

Calculate precision of block P = 22/100

Calculate point with same local precision P. Note confidence K.

C

K

Page 102: Lecture 32 Question Answering

04/22/23 102

NIL Placement in TREC11 Answers

(The slide shows the TREC11 answers as rows of "?": sorted by confidence, but correctness unknown; "-" marks answers converted to NIL.)

  Find the point with confidence C. (The block of lowest-confidence answers is of size 147.)
  Make all answers in the block NIL, and add K - C to each confidence.
  Find the point with confidence K. Insert the block at this point, and subtract C from all confidences to its right.

Page 103: Lecture 32 Question Answering

04/22/23 103

NIL Placement in TREC11 Answers (cont.)

(Same visualization, after the block has been converted to NIL and inserted at the point with confidence K, with C subtracted from the confidences to its right.)

Page 104: Lecture 32 Question Answering

04/22/23 104

NIL Placement in TREC11 Answers - Impact

29 out of 46 NIL answers located – recall of .63

9 previously-correct answers lost

Total of 20 correct questions gained … 20/500 = 4%

Minimal (< 0.5%) improvement in final AP score

Page 105: Lecture 32 Question Answering

04/22/23 105

Question Complexity

"Simple" questions are not a solved problem:
  Complex questions can be decomposed into simpler components.
  If simpler questions cannot be handled successfully, there is no hope for more complex ones.

Areas not explored (intentionally) by TREC to date:
  spelling errors
  grammatical errors
  syntactic precision, e.g. the significance of articles
  "not", "only", "just" …

Page 106: Lecture 32 Question Answering

04/22/23 106

Page 107: Lecture 32 Question Answering

04/22/23 107

Question Complexity

"When was Queen Victoria born?"
  … King George III's only granddaughter to survive infancy was born in 1819 …
  … Victoria was the only daughter of Edward, Duke of Kent …
  … George III's fourth son Edward became Duke of Kent …

"Should the Fed raise interest rates?"
  "All of the current leading economic indicators point in the direction of the Federal Reserve Bank raising interest rates at next week's meeting." (Alan Greenspan, Fed chairman)

"What is the meaning of life?"
  42. (The Hitchhiker's Guide to the Galaxy)

Page 108: Lecture 32 Question Answering

04/22/23 108

Question Complexity

Not a function of question alone, but rather the pair {question, corpus}

In general, it is a function of the question and the resources to answer it, which include text corpora, databases, knowledge bases, ontologies and processing modules

Complexity ≡ Impedance Match

Page 109: Lecture 32 Question Answering

04/22/23 109

Future of QA

How to advance the field:
  By fixing the resources, factoid QA can be made more difficult by intentionally exploiting requirements for advanced NLP and/or reasoning
  Questions that require more than one resource/document for an answer, e.g. "What is the relationship between A and B?"
  Question decomposition
  Cross-language QA

Page 110: Lecture 32 Question Answering

04/22/23 110

Dimensions of QA

"Answer Topology"
  Characteristics of the correct answer set
Language
  Vocabulary & syntax
Question as a problem
  Enumeration, arithmetic, inference
User Model
  Who is asking the question
  Opinions, hypotheses, predictions, beliefs

Page 111: Lecture 32 Question Answering

04/22/23 111

Answer Set Topology

No answer, one, or many. When are two different answers the same?
  Natural variation: the size of an elephant
  Estimation: populations
  Variation over time: populations, Prime Ministers
Choose the correct presentation format:
  Lists, charts, graphs, dialogues

Page 112: Lecture 32 Question Answering

04/22/23 112

Language

The biggest current roadblock to Question Answering is arguably Natural Language itself:
  Anaphora
  Definite noun phrases
  Synonyms
  Subsumption
  Metonyms
  Paraphrases
  Negation & other such qualification
  Nonce words
  Idioms
  Figures of speech
  Poetic & other stylistic variations
  …

Page 113: Lecture 32 Question Answering

04/22/23 113

Negation (1)

Q: Who invented the electric guitar?
A: While Mr. Fender did not invent the electric guitar, he did revolutionize and perfect it.

Note: Not all instances of “not” will invalidate a passage.

Page 114: Lecture 32 Question Answering

04/22/23 114

Questions as Word Problems: "What is the largest city in England?"

Text Match
  Find text that says "London is the largest city in England" (or a paraphrase).
"Superlative" Search
  Find a table of English cities and their populations, and sort.
  Find a list of the 10 largest cities in the world, and see which are in England.
    Uses logic: if L is larger than every object in set R, then L is larger than every object in any subset E of R.
  Find the population of as many individual English cities as possible, and choose the largest.
Heuristics
  London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)
Complex Inference
  E.g. "Birmingham is England's second-largest city"; "Paris is larger than Birmingham"; "London is larger than Paris"; "London is in England".

Page 115: Lecture 32 Question Answering

04/22/23 115

Negation (2)

Name a US state where cars are manufactured. versus

Name a US state where cars are not manufactured.

Certain kinds of negative events or instances are rarely asserted explicitly in text, but must be deduced by other means

Page 116: Lecture 32 Question Answering

04/22/23 116

Other Adverbial Modifiers (Only, Just etc.)

Name an astronaut who nearly made it to the moon

To satisfactorily answer such questions, need to know what are the different ways in which events can fail to happen. In this case there are several.

Page 117: Lecture 32 Question Answering

04/22/23 117

Need for User Model

"Where is Chicago?" What is meant?
  The city (and if so, what granularity is required?)
  The rock group
  The play/movie
  The sports team (which one?)

One can hardly choose the right answer without knowing who is asking the question, and why. The same applies to "What is mold?"

Page 118: Lecture 32 Question Answering

04/22/23 118

Not all “What is” Questions are definitional

• Subclass or instance– What is a powerful adhesive?

• Distinction from co-members of class– What is a star fruit?

• Value or more common synonym– What is a nanometer?– What is rubella?

• Subclass/instance with property – What is a yellow spotted lizard?

• Ambiguous: definition or instance – What is an antacid?

From a Web log:

Page 119: Lecture 32 Question Answering

04/22/23 119

Attention to Details

Tenses
  Who is the Prime Minister of Japan?
Number
  What are the largest snakes in the world?
Articles
  What is mold? Where is the Taj Mahal?

Page 120: Lecture 32 Question Answering

04/22/23 120

Opinions, Hypotheses, Predictions and Beliefs

What does X think about Y? Will X happen?
  '"X will happen", says Dr. A'
  'Prof. B believes that X will happen.'
  'X will happen' (asserted by the article writer)
  e.g. Is global warming real?

How many countries did the Pope visit in 1990?
  "… the Pope's planned visit to Argentina …" (a planned visit may never have taken place)

Page 121: Lecture 32 Question Answering

04/22/23 121

What is appropriate for QA? How much emphasis should be placed on:
  Retrieval
  Built-in knowledge
  Computation
  Estimation
  Inference

Sample questions:
  What is one plus one?
  How many $2 pencils can I buy for $10?
  How many genders are there?
  How many legs does a person have?
  How many books are there in a local library?
  What was the dilemma facing Hamlet?

Page 122: Lecture 32 Question Answering

04/22/23 122

Relationship Questions

An exercise in the ARDA AQUAINT program. E.g.:
  "What has been the relationship between Osama bin Laden and Sudan?"
  "What do Soviet Cosmonaut Valentina (Vladimirovna) Tereshkova and U.S. Astronaut Sally Ride have in common?"
  "What is the connection between actor and comedian Chris Rock and former Washington, D.C. mayor Marion Barry?"

Two approaches (Cycorp and IBM).

Page 123: Lecture 32 Question Answering

04/22/23 123

Cycorp Approach

  Use the original question terms as the IR query
  Break the top retrieved documents into sentences
  Generate a Bayesian network, with words as nodes, from the Sentence x Word matrix
  Select ancestor terms to augment the query
    E.g. for "What is the connection between actor and comedian Chris Rock and former Washington, D.C. mayor Marion Barry?", the augmentation terms = {drug, arrested}
  Iterate, but with sentences as the nodes of the new network
  Output the sentences that are neighbours of the augmented query

Single Strategy

Page 124: Lecture 32 Question Answering

04/22/23 124

IBM Approach

Multi-part strategy, including:
• Extending the pattern-based agent
  "What is the relationship between X and Y?" -> locate syntactic contexts with X and Y:
  – conjunction
  – subject-verb-object
  – objects of prepositions
• New profile-based agent
  – Local Context Analysis on documents containing either X or Y
  – Form a vector of terms, normalize, intersect, sort
  "What do Valentina Tereshkova and Sally Ride have in common?" ->
  – Space
  – First Woman
  – Collins (the first woman to ever fly the space shuttle)
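A minimal sketch of the profile-intersection step (form term vectors from documents about X and about Y, normalize, intersect, sort), assuming simple whitespace tokenization rather than full Local Context Analysis.

```python
from collections import Counter

def profile(docs):
    # bag-of-words profile of a document set, normalized to relative frequencies
    vec = Counter(w.lower() for d in docs for w in d.split())
    total = sum(vec.values())
    return {w: c / total for w, c in vec.items()}

def common_profile(docs_x, docs_y, n=5):
    px, py = profile(docs_x), profile(docs_y)
    shared = set(px) & set(py)                    # intersect the two vectors
    scored = {w: px[w] * py[w] for w in shared}
    return sorted(scored, key=scored.get, reverse=True)[:n]   # sort

# Hypothetical usage: common_profile(docs_about_tereshkova, docs_about_ride)
# might surface terms such as "space" or "woman".
```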

Page 125: Lecture 32 Question Answering

04/22/23 125

Decomposition/Recursive QA

"Who/What is X?" questions require a profile of the subject: QA-by-Dossier.
• Can generate auxiliary questions based on the type of the question focus:
  – When/where was X born?
  – When/where/how did X die?
  – What occupation did X have?
• Can generate follow-up questions based on earlier answers:
  – What did X win?
  – What did X write?
  – What did X discover?

Page 126: Lecture 32 Question Answering

04/22/23 126

Constraint-based QA

QA-by-Dossier-with-Constraints: a variation of QA-by-Dossier. Ask auxiliary questions that constrain the answer to the original question. Prager et al. (submitted).

Page 127: Lecture 32 Question Answering

04/22/23 127

When did Leonardo paint the Mona Lisa?

Rank  Score  Answer
1     .64    2000
2     .43    1988
3     .34    1911
4     .31    1503
5     .31    1490

Page 128: Lecture 32 Question Answering

04/22/23 128

Constraints

Capitalize on the existence of natural relationships between events/situations that can be used as constraints. E.g. a person's achievements occurred during his/her lifetime. Develop constraints for a person and an achievement event:
  date(died) <= date(born) + 100
  date(event) >= date(born) + 10
  date(event) <= date(died)
For each constraint variable, ask an Auxiliary Question to generate a set of candidate answers, e.g.
  When was Leonardo born?
  When did Leonardo die?

Page 129: Lecture 32 Question Answering

04/22/23 129

Auxiliary Questions

When was Leonardo born?
  Rank  Score  Answer
  1     .66    1452
  2     .12    1519
  3     .04    1920
  4     .04    1987
  5     .04    1501

When did Leonardo die?
  Rank  Score  Answer
  1     .99    1519
  2     .98    1989
  3     .96    1452
  4     .60    1988
  5     .60    1990

Page 130: Lecture 32 Question Answering

04/22/23 130

Dossier-with-Constraints Process

(Process diagram.) The Original Question and the Auxiliary Questions, together with the Constraints, feed into Constraint Satisfaction + Confidence Combination, which produces the dossier for Leonardo: Born = 1452, Died = 1519, Painted Mona Lisa = 1503.
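A minimal sketch of the constraint-satisfaction and confidence-combination step, using the candidate lists from the slides above; combining scores by simple multiplication is an assumption, not necessarily the combination used by Prager et al.

```python
# Keep only answer combinations that satisfy the date constraints, and pick the
# combination with the best (naively multiplied) confidence.
from itertools import product

def best_consistent(painted, born, died):
    """Each argument is a list of (score, year) candidates."""
    best = None
    for (sp, p), (sb, b), (sd, d) in product(painted, born, died):
        ok = (d <= b + 100) and (p >= b + 10) and (p <= d)
        if not ok:
            continue
        combined = sp * sb * sd                  # naive confidence combination
        if best is None or combined > best[0]:
            best = (combined, {"painted": p, "born": b, "died": d})
    return best

painted = [(.64, 2000), (.43, 1988), (.34, 1911), (.31, 1503), (.31, 1490)]
born    = [(.66, 1452), (.12, 1519), (.04, 1920), (.04, 1987), (.04, 1501)]
died    = [(.99, 1519), (.98, 1989), (.96, 1452), (.60, 1988), (.60, 1990)]
print(best_consistent(painted, born, died))
# -> painted=1503, born=1452, died=1519 (ties broken by candidate order)
```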

Page 131: Lecture 32 Question Answering

04/22/23 131

Cross-Language QA

Probably the easiest approach is to translate the question into the language of the collection and perform monolingual QA.

All considerations that apply to CL-IR apply to CL-QA, and then some:
• Named Entity Recognition
• Parsers
• Ontologies
• …

Page 132: Lecture 32 Question Answering

04/22/23 132

Cross-Language QA

Jung and Lee, 2002:
• User Query -> NLP -> SQL -> Relational Database
• Morphological analysis and linguistic resources are language dependent
• Generate lexico-semantic patterns

Page 133: Lecture 32 Question Answering

04/22/23 133

Cross-Language QA

• TREC: CLIR track for several years
• CLEF (Cross-Language Evaluation Forum): http://clef.iei.pi.cnr.it:2002/
  – CLIR activities for several years
  – CL-QA in 2003: http://clef-qa.itc.it/

Page 134: Lecture 32 Question Answering

04/22/23 134

References

Abney, S., Collins, M. and Singhal, A. "Answer Extraction". In Proceedings of ANLP 2000.
Brill, E., Lin, J., Banko, M., Dumais, S. and Ng, A. "Data-Intensive Question Answering". In Proceedings of the 10th Text Retrieval Conference (TREC-2001), NIST, Gaithersburg, MD, 2002.
Bikel, D., Schwartz, R. and Weischedel, R. "An Algorithm that Learns What's in a Name". Machine Learning, 1999.
Byrd, R. and Ravin, Y. "Identifying and Extracting Relations in Text". In Proceedings of NLDB 99, Klagenfurt, Austria, 1999.
Chu-Carroll, J., Prager, J., Welty, C., Czuba, K. and Ferrucci, D. "A Multi-Strategy and Multi-Source Approach to Question Answering". In Proceedings of TREC 2002, Gaithersburg, MD, 2003.
Clarke, C.L.A., Cormack, G.V., Kisman, D.I.E. and Lynam, T.R. "Question Answering by Passage Selection (MultiText Experiments for TREC-9)". In Proceedings of the 9th Text Retrieval Conference, pp. 673-683, NIST, Gaithersburg, MD, 2001.
Harabagiu, S., Moldovan, D., Pasca, M., Mihalcea, R., Surdeanu, M., Bunescu, R., Girju, R., Rus, V. and Morarescu, P. "FALCON: Boosting Knowledge for Answer Engines". In Proceedings of the 9th Text Retrieval Conference, pp. 479-488, NIST, Gaithersburg, MD, 2001.
Harabagiu, S., Moldovan, D., Pasca, M., Mihalcea, R., Surdeanu, M., Bunescu, R., Girju, R., Rus, V. and Morarescu, P. "The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering". In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL-2001), Toulouse, France, July 2001, pp. 274-281.
Hendrix, G.G., Sacerdoti, E.D., Sagalowicz, D. and Slocum, J. "Developing a Natural Language Interface to Complex Data". VLDB 1977, p. 292.

Page 135: Lecture 32 Question Answering

04/22/23 135

References

Hovy, E., Gerber, L., Hermjakob, U., Junk, M. and Lin, C-Y. "Question Answering in Webclopedia". In Proceedings of the 9th Text Retrieval Conference, pp. 655-664, NIST, Gaithersburg, MD, 2001.
Hermjakob, U., Echihabi, A. and Marcu, D. "Natural Language Based Reformulation Resource and Web Exploitation for Question Answering". In Proceedings of TREC 2002, Gaithersburg, MD, 2003.
Jung, H. and Lee, G.G. "Multilingual Question Answering with High Portability on Relational Databases". Workshop on Multilingual Summarization and Question Answering, COLING 2002.
Katz, B. "Annotating the World Wide Web Using Natural Language". In Proceedings of RIAO 1997.
Kupiec, J. "MURAX: A Robust Linguistic Approach for Question Answering Using an On-line Encyclopedia". In Proceedings of the 16th SIGIR, Pittsburgh, PA, 1993.
Lenat, D.B. "Cyc: A Large-Scale Investment in Knowledge Infrastructure". Communications of the ACM 38(11), 1995.
Mihalcea, R. and Moldovan, D. "A Method for Word Sense Disambiguation of Unrestricted Text". In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL-99), pp. 152-158, College Park, MD, 1999.
Miller, G. "WordNet: A Lexical Database for English". Communications of the ACM 38(11), pp. 39-41, 1995.
Moldovan, D.I. and Rus, V. "Logic Form Transformation of WordNet and its Applicability to Question Answering". In Proceedings of the ACL 2001 Conference, Toulouse, France, July 2001.
Pasca, M. and Harabagiu, S. "High Performance Question/Answering". In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2001), New Orleans, LA, September 2001, pp. 366-374.

Page 136: Lecture 32 Question Answering

04/22/23 136

References

Prager, J.M., Chu-Carroll, J. and Czuba, K. "A Multi-Strategy, Multi-Question Approach to Question Answering". Submitted for publication.
Prager, J.M., Chu-Carroll, J., Brown, E.W. and Czuba, K. "Question Answering by Predictive Annotation". In Advances in Open-Domain Question-Answering, Strzalkowski, T. and Harabagiu, S. (Eds.), Kluwer Academic Publishers, to appear 2003?.
Prager, J.M., Radev, D.R. and Czuba, K. "Answering What-Is Questions by Virtual Annotation". In Proceedings of the Human Language Technologies Conference, San Diego, CA, March 2001.
Prager, J.M., Brown, E.W., Coden, A. and Radev, D.R. "Question-Answering by Predictive Annotation". In Proceedings of SIGIR 2000, pp. 184-191, Athens, Greece.
Radev, D.R., Qi, H., Zheng, Z., Blair-Goldensohn, S., Zhang, Z., Fan, W. and Prager, J.M. "Mining the Web for Answers to Natural Language Questions". In Proceedings of CIKM, Atlanta, GA, 2001.
Radev, D.R., Prager, J.M. and Samn, V. "Ranking Suspected Answers to Natural Language Questions Using Predictive Annotation". In Proceedings of ANLP 2000, pp. 150-157, Seattle, WA.
Ravichandran, D. and Hovy, E. "Learning Surface Text Patterns for a Question Answering System". In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 41-47.
Rosch, E. et al. "Basic Objects in Natural Categories". Cognitive Psychology 8, pp. 382-439, 1976.
Soubbotin, M. "Patterns of Potential Answer Expressions as Clues to the Right Answers". In Proceedings of the 10th Text Retrieval Conference, pp. 293-302, NIST, Gaithersburg, MD, 2002.
Soubbotin, M. and Soubbotin, S. "Use of Patterns for Detection of Answer Strings: A Systematic Approach". In Proceedings of the 11th Text Retrieval Conference, pp. 325-331, NIST, Gaithersburg, MD, 2003.

Page 137: Lecture 32 Question Answering

04/22/23 137

References

Voorhees, E.M. and Tice, D. "Building a Question Answering Test Collection". In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 200-207, Athens, August 2000.
Wacholder, N., Ravin, Y. and Choi, M. "Disambiguation of Proper Names in Text". In Proceedings of ANLP'97, Washington, DC, April 1997.
Warren, D.H.D. and Pereira, F.C.N. "An Efficient Easily Adaptable System for Interpreting Natural Language Queries". Computational Linguistics 8(3-4), pp. 110-122, 1982.
Winograd, T. "Procedures as a Representation for Data in a Computer Program for Understanding Natural Language". Cognitive Psychology 3(1), 1972.

Page 138: Lecture 32 Question Answering

04/22/23 138

QA Block Architecture

(Architecture diagram.) Q -> Question Processing -> (Keywords, Question Semantics) -> Passage Retrieval (backed by Document Retrieval) -> Passages -> Answer Extraction -> A.

Question Processing captures the semantics of the question and selects keywords for passage retrieval; Passage Retrieval extracts and ranks passages using surface-text techniques; Answer Extraction extracts and ranks answers using NL techniques. WordNet, an NER and a parser support both Question Processing and Answer Extraction.

Page 139: Lecture 32 Question Answering

04/22/23 139

Question Processing

Two main tasks:
• Determine the type of the answer
• Extract keywords from the question and formulate a query

Page 140: Lecture 32 Question Answering

04/22/23 140

Answer Types

Factoid questions: Who, where, when, how many, …
The answers fall into a limited and somewhat predictable set of categories:
• Who questions are going to be answered by …
• Where questions …
Generally, systems select answer types from a set of Named Entities, augmented with other types that are relatively easy to extract.

Page 141: Lecture 32 Question Answering

04/22/23 141

Answer Types

Of course, it isn't that easy…
• Who questions can have organizations as answers: Who sells the most hybrid cars?
• Which questions can have people as answers: Which president went to war with Mexico?

Page 142: Lecture 32 Question Answering

04/22/23 142

Answer Type Taxonomy
• Contains ~9000 concepts reflecting expected answer types
• Merges named entities with the WordNet hierarchy

Page 143: Lecture 32 Question Answering

04/22/23 143

Answer Type Detection

Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.

But remember our notion of matching. It doesn’t do any good to do something complex here if it can’t also be done in potential answer texts.

Page 144: Lecture 32 Question Answering

04/22/23 144

Keyword Selection

Answer Type indicates what the question is looking for, but that doesn’t really help in finding relevant texts (i.e. Ok, let’s look for texts with people in them)

Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations provide the required context.

Page 145: Lecture 32 Question Answering

04/22/23 145

Lexical Terms Extraction

• Questions approximated by sets of unrelated words (lexical terms)
• Similar to bag-of-words IR models

Question (from TREC QA track)                                           Lexical terms
Q002: What was the monetary value of the Nobel Peace Prize in 1989?    monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture?                       Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993?               Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer?   name, managing, director, Apricot, Computer

Page 146: Lecture 32 Question Answering

04/22/23 146

Keyword Selection Algorithm

1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
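A sketch of a few of these heuristics, assuming the question arrives already tokenized and POS-tagged; the tag names and stop-word list are illustrative, and the complex-nominal heuristics are omitted.

```python
def select_keywords(tagged_question, quoted_spans=(), named_entities=()):
    """tagged_question: list of (word, pos) pairs."""
    STOP = {"the", "a", "an", "of", "in", "is", "was", "does", "what", "who"}
    keywords = []

    def add(words):
        for w in words:
            if w.lower() not in STOP and w not in keywords:
                keywords.append(w)

    add(w for span in quoted_spans for w in span.split())               # 1. quoted words
    add(w for w, pos in tagged_question
        if pos == "NNP" and any(w in ne for ne in named_entities))      # 2. NNP words in NEs
    add(w for w, pos in tagged_question if pos.startswith("NN"))        # 5-6. nouns
    add(w for w, pos in tagged_question if pos.startswith("VB"))        # 7. verbs
    return keywords

tagged = [("What", "WP"), ("does", "VBZ"), ("the", "DT"),
          ("Peugeot", "NNP"), ("company", "NN"), ("manufacture", "VB")]
print(select_keywords(tagged, named_entities=["Peugeot"]))
# -> ['Peugeot', 'company', 'manufacture']
```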

Page 147: Lecture 32 Question Answering

04/22/23 147

Passage Retrieval

(QA block architecture diagram repeated from the earlier slide.)

Page 148: Lecture 32 Question Answering

04/22/23 148

Passage Extraction Loop

• Passage Extraction Component
  – Extracts passages that contain all selected keywords
  – Passage size dynamic
  – Start position dynamic
• Passage quality and keyword adjustment
  – In the first iteration use the first 6 keyword selection heuristics
  – If the number of passages is lower than a threshold, the query is too strict: drop a keyword
  – If the number of passages is higher than a threshold, the query is too relaxed: add a keyword
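A sketch of the keyword add/drop loop, assuming a hypothetical retrieve() function that returns the passages containing all currently selected keywords; the thresholds and iteration limit are illustrative.

```python
def passage_loop(primary, extra, retrieve, lo=10, hi=500, max_iter=5):
    """primary: keywords from heuristics 1-6; extra: keywords held back (heuristics 7-8)."""
    active, spare = list(primary), list(extra)
    for _ in range(max_iter):
        passages = retrieve(active)              # passages containing all active keywords
        if len(passages) < lo and len(active) > 1:
            spare.insert(0, active.pop())        # query too strict: drop a keyword
        elif len(passages) > hi and spare:
            active.append(spare.pop(0))          # query too relaxed: add a keyword
        else:
            return passages                      # passage quantity acceptable
    return retrieve(active)
```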

Page 149: Lecture 32 Question Answering

04/22/23 149

Passage Scoring

Passages are scored based on keyword windows.

For example, if a question has a set of keywords: {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:

(Figure: four keyword windows are built over the matched sequence k1 k2 k3 k2 k1, one for each combination of the two k1 occurrences and the two k2 occurrences.)

Page 150: Lecture 32 Question Answering

04/22/23 150

Passage Scoring

Passage ordering is performed using a sort that involves three scores:
• The number of words from the question that are recognized in the same sequence in the window
• The number of words that separate the most distant keywords in the window
• The number of unmatched keywords in the window
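A sketch of these three sort keys for a single keyword window; the greedy in-order match is only an approximation of the "same sequence" count.

```python
def score_window(window, question_terms):
    """window: list of tokens; question_terms: question keywords in question order."""
    positions = [i for i, w in enumerate(window) if w in question_terms]
    # 1. question words recognized in the same sequence in the window (greedy in-order match)
    it = iter(window)
    same_order = sum(1 for q in question_terms if q in it)
    # 2. words separating the most distant matched keywords
    span = (positions[-1] - positions[0] - 1) if len(positions) > 1 else 0
    # 3. question keywords not matched in the window
    unmatched = len(set(question_terms) - {window[i] for i in positions})
    # higher same_order is better; smaller span and fewer unmatched are better
    return (same_order, -span, -unmatched)

# Rank windows (or the passages containing them) by this tuple:
# ranked = sorted(windows, key=lambda w: score_window(w, qterms), reverse=True)
```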

Page 151: Lecture 32 Question Answering

04/22/23 151

Answer Extraction

(QA block architecture diagram repeated from the earlier slide.)

Page 152: Lecture 32 Question Answering

04/22/23 152

Ranking Candidate Answers

Answer type: Person Text passage:

“Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Q066: Name the first private citizen to fly in space.

Page 153: Lecture 32 Question Answering

04/22/23 153

Ranking Candidate Answers

Answer type: Person Text passage:

“Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”

Best candidate answer: Christa McAuliffe

Q066: Name the first private citizen to fly in space.

Page 154: Lecture 32 Question Answering

04/22/23 154

Features for Answer Ranking

• Number of question terms matched in the answer passage
• Number of question terms matched in the same phrase as the candidate answer
• Number of question terms matched in the same sentence as the candidate answer
• Flag set to 1 if the candidate answer is followed by a punctuation sign
• Number of question terms matched, separated from the candidate answer by at most three words and one comma
• Number of terms occurring in the same order in the answer passage as in the question
• Average distance from the candidate answer to question term matches

(SIGIR '01)
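A sketch computing a few of these features; the answer span, token indices and sentence boundaries are assumed to be supplied by earlier stages (a real system would take phrase boundaries from a parser), and the "three words and one comma" feature is simplified.

```python
def answer_features(passage, answer_span, qterms, sentence_of,
                    punctuation={'.', ',', '!', '?'}):
    """passage: list of tokens; answer_span: (start, end) token indices (end exclusive);
    sentence_of: token index -> sentence id."""
    start, end = answer_span
    match_pos = [i for i, w in enumerate(passage) if w in qterms]
    feats = {}
    feats["terms_in_passage"] = len(match_pos)
    feats["terms_in_same_sentence"] = sum(
        1 for i in match_pos if sentence_of[i] == sentence_of[start])
    feats["followed_by_punct"] = int(end < len(passage) and passage[end] in punctuation)
    feats["terms_within_3_words"] = sum(
        1 for i in match_pos if 0 < i - end <= 3 or 0 < start - i <= 3)
    feats["avg_distance"] = (
        sum(min(abs(i - start), abs(i - end)) for i in match_pos) / len(match_pos)
        if match_pos else 0.0)
    return feats

passage = "Among them was Christa McAuliffe , the first private citizen to fly in space .".split()
qterms = {"first", "private", "citizen", "fly", "space"}
print(answer_features(passage, (3, 5), qterms, [0] * len(passage)))
```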

Page 155: Lecture 32 Question Answering

04/22/23 155

Evaluation

Evaluation of this kind of system is usually based on some kind of TREC-like metric.

In Q/A the most frequent metric is Mean reciprocal rank

You’re allowed to return N answers. Your score is based on 1/Rank of the first right answer.

Averaged over all the questions you answer.
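Concretely, MRR is the mean over all questions of 1/rank of the first correct answer among the N returned (0 if none is correct); a small sketch:

```python
def mean_reciprocal_rank(ranked_answers, is_correct):
    """ranked_answers: one ranked answer list per question;
    is_correct(qid, answer) -> bool is a judgment function."""
    total = 0.0
    for qid, answers in enumerate(ranked_answers):
        for rank, ans in enumerate(answers, start=1):
            if is_correct(qid, ans):
                total += 1.0 / rank
                break                     # only the first correct answer counts
    return total / len(ranked_answers)
```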

Page 156: Lecture 32 Question Answering

04/22/23 156

LCC/UTD/SMU

Page 157: Lecture 32 Question Answering

04/22/23 157

Is the Web Different?

In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.

The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.

But…

Page 158: Lecture 32 Question Answering

04/22/23 158

The Web is Different

On the Web, popular factoids are likely to be expressed in a gazillion different ways.

At least a few of those ways will likely match the way the question was asked.

So why not just grep (or agrep) the Web using all or pieces of the original question?

Page 159: Lecture 32 Question Answering

04/22/23 159

The Google answer #1

Include question words etc. in your stop-list. Do standard IR.

Sometimes this (sort of) works:

Question: Who was the prime minister of Australia during the Great Depression?

Answer: James Scullin (Labor) 1929–31.

Page 160: Lecture 32 Question Answering

04/22/23 160

(Screenshots of result pages:)
• Page about Curtin (WW II Labor Prime Minister) – can deduce answer
• Page about Curtin (WW II Labor Prime Minister) – lacks answer
• Page about Chifley (Labor Prime Minister) – can deduce answer

Page 161: Lecture 32 Question Answering

04/22/23 161

But often it doesn’t…

Question: How much money did IBM spend on advertising in 2002?

Answer: I dunno, but I’d like to …

Page 162: Lecture 32 Question Answering

04/22/23 162

(Screenshots of result pages:)
• Lots of ads on Google these days!
• No relevant info (marketing firm page)
• No relevant info (magazine page on an ad exec)
• No relevant info (magazine page on MS-IBM)

Page 163: Lecture 32 Question Answering

04/22/23 163

The Google answer #2

• Take the question and try to find it as a string on the web
• Return the next sentence on that web page as the answer
• Works brilliantly if this exact question appears as a FAQ question, etc.
• Works lousily most of the time
• Reminiscent of the line about monkeys and typewriters producing Shakespeare
• But a slightly more sophisticated version of this approach has been revived in recent years with considerable success…

Page 164: Lecture 32 Question Answering

04/22/23 164

A Brief (Academic) History

In some sense question answering is not a new research area. Question answering systems can be found in many areas of NLP research, including:
• Natural language database systems
  – A lot of early NLP work on these
• Spoken dialog systems
  – Currently very active and commercially relevant

The focus on open-domain QA is new:
• MURAX (Kupiec 1993): Encyclopedia answers
• Hirschman: Reading comprehension tests
• TREC QA competition: 1999–

Page 165: Lecture 32 Question Answering

04/22/23 165

AskJeeves

• AskJeeves is probably the most hyped example of "question answering"
• It largely does pattern matching to match your question to their own knowledge base of questions
• If that works, you get the human-curated answers to that known question
• If that fails, it falls back to regular web search
• A potentially interesting middle ground, but a fairly weak shadow of real QA

Page 166: Lecture 32 Question Answering

04/22/23 166

Online QA Examples

Examples:
• AnswerBus is an open-domain question answering system: www.answerbus.com
• Ionaut: http://www.ionaut.com:8400/
• EasyAsk, AnswerLogic, AnswerFriend, Start, Quasm, Mulder, Webclopedia, etc.

Page 167: Lecture 32 Question Answering

04/22/23 167

Question Answering at TREC

• The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?"
• For the first three years systems were allowed to return 5 ranked answer snippets (50/250 bytes) for each question.
  – IR-style Mean Reciprocal Rank (MRR) scoring: 1, 0.5, 0.33, 0.25, 0.2, 0 for a first correct answer at rank 1, 2, 3, 4, 5, 6+
  – Mainly Named Entity answers (person, place, date, …)
• From 2002 the systems are only allowed to return a single exact answer, and the notion of confidence has been introduced.

Page 168: Lecture 32 Question Answering

04/22/23 168

The TREC Document Collection

• The current collection uses news articles from the following sources:
  – AP newswire, 1998-2000
  – New York Times newswire, 1998-2000
  – Xinhua News Agency newswire, 1996-2000
• In total there are 1,033,461 documents in the collection (3 GB of text).
• Clearly this is too much text to process entirely using advanced NLP techniques, so the systems usually consist of an initial information retrieval phase followed by more advanced processing.
• Many supplement this text with use of the web and other knowledge bases.

Page 169: Lecture 32 Question Answering

04/22/23 169

Sample TREC questions

1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What debts did Qintex group leave?
8. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?

Page 170: Lecture 32 Question Answering

04/22/23 170

Top Performing Systems

• Currently the best performing systems at TREC can answer approximately 70% of the questions
• Approaches and successes have varied a fair deal
  – Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000, 2001
  – The AskMSR system stressed how much could be achieved by very simple methods with enough text (and now various copycats)
  – A middle ground is to use a large collection of surface matching patterns (ISI)

Page 171: Lecture 32 Question Answering

04/22/23 171

AskMSR

Web Question Answering: Is More Always Better? Dumais, Banko, Brill, Lin, Ng (Microsoft, MIT, Berkeley)

Q: "Where is the Louvre located?"

Want "Paris" or "France" or "75058 Paris Cedex 01" or a map.

Don't just want URLs.

Page 172: Lecture 32 Question Answering

04/22/23 172

AskMSR: Shallow approach

In what year did Abraham Lincoln die?

Ignore hard documents and find easy ones.

04/22/23 173

AskMSR: Details

(Architecture diagram, numbering the five steps described on the following slides.)

Page 174: Lecture 32 Question Answering

04/22/23 174

Step 1: Rewrite queries

Intuition: the user's question is often syntactically quite close to sentences that contain the answer.

  Where is the Louvre Museum located?
  -> The Louvre Museum is located in Paris.

  Who created the character of Scrooge?
  -> Charles Dickens created the character of Scrooge.

Page 175: Lecture 32 Question Answering

04/22/23 175

Query rewriting

Classify the question into seven categories:
  Who is/was/are/were…?  When is/did/will/are/were…?  Where is/are/were…?
a. Category-specific transformation rules, e.g. "For Where questions, move 'is' to all possible locations":
  "Where is the Louvre Museum located" ->
  "is the Louvre Museum located"
  "the is Louvre Museum located"
  "the Louvre is Museum located"
  "the Louvre Museum is located"
  "the Louvre Museum located is"
  (Nonsense, but who cares? It's only a few more queries to Google.)
b. Expected answer "Datatype" (e.g. Date, Person, Location, …): When was the French Revolution? -> DATE
Hand-crafted classification/rewrite/datatype rules (could they be automatically learned?)
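A sketch of the "move the verb" rewrite in the spirit of the example above; the weights and the single bag-of-words fallback are illustrative, not the exact AskMSR rule set.

```python
def move_verb_rewrites(question):
    words = question.rstrip("?").split()
    wh, verb, rest = words[0], words[1], words[2:]        # e.g. Where / is / ...
    rewrites = []
    for i in range(len(rest) + 1):
        candidate = rest[:i] + [verb] + rest[i:]
        rewrites.append((" ".join(candidate), 5))         # quoted rewrite, high weight
    rewrites.append((" ".join("+" + w for w in rest), 1))  # bag of words, low weight
    return rewrites

for rw, weight in move_verb_rewrites("Where is the Louvre Museum located?"):
    print(weight, rw)
# 5 is the Louvre Museum located
# 5 the is Louvre Museum located
# ...
# 1 +the +Louvre +Museum +located
```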

Page 176: Lecture 32 Question Answering

04/22/23 176

Query Rewriting - weights

One wrinkle: some query rewrites are more reliable than others.

Where is the Louvre Museum located?
  +"the Louvre Museum is located"    Weight 5: if we get a match, it's probably right
  +Louvre +Museum +located           Weight 1: lots of non-answers could come back too

Page 177: Lecture 32 Question Answering

04/22/23 177

Step 2: Query search engine

• Send all rewrites to a Web search engine
• Retrieve top N answers (100?)
• For speed, rely just on the search engine's "snippets", not the full text of the actual document

Page 178: Lecture 32 Question Answering

04/22/23 178

Step 3: Mining N-Grams

Unigram, bigram, trigram, … N-gram: a list of N adjacent terms in a sequence.
E.g., "Web Question Answering: Is More Always Better":
  Unigrams: Web, Question, Answering, Is, More, Always, Better
  Bigrams: Web Question, Question Answering, Answering Is, Is More, More Always, Always Better
  Trigrams: Web Question Answering, Question Answering Is, Answering Is More, Is More Always, More Always Better

Page 179: Lecture 32 Question Answering

04/22/23 179

Mining N-Grams

• Simple: enumerate all N-grams (N = 1, 2, 3, say) in all retrieved snippets
• Use a hash table and other fancy footwork to make this efficient
• Weight of an n-gram: occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite that fetched the document
• Example: "Who created the character of Scrooge?"
  Dickens - 117, Christmas Carol - 78, Charles Dickens - 75, Disney - 72, Carl Banks - 54, A Christmas - 41, Christmas Carol - 45, Uncle - 31
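A sketch of weighted n-gram mining over snippets, where each snippet carries the weight of the rewrite that fetched it.

```python
from collections import Counter

def mine_ngrams(weighted_snippets, max_n=3):
    scores = Counter()
    for snippet, weight in weighted_snippets:
        words = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                scores[" ".join(words[i:i + n])] += weight   # count weighted by rewrite reliability
    return scores

snippets = [("Charles Dickens created the character of Scrooge", 5),
            ("Scrooge the character created by Dickens", 1)]
print(mine_ngrams(snippets).most_common(3))
```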

Page 180: Lecture 32 Question Answering

04/22/23 180

Step 4: Filtering N-Grams

Each question type is associated with one or more "data-type filters" (regular expressions), e.g.:
  When …  -> Date
  Where … -> Location
  Who …   -> Person
  What …  -> …
Boost the score of n-grams that do match the regexp.
Lower the score of n-grams that don't match the regexp.
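A sketch of such data-type filters; the regular expressions and boost/penalty values are illustrative only.

```python
import re

FILTERS = {
    "when":  re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),          # crude year pattern
    "who":   re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),    # crude person-name pattern
    "where": re.compile(r"\b[A-Z][a-z]+\b"),                     # crude place pattern
}

def filter_boost(question, ngram, boost=2.0, penalty=0.5):
    qword = question.split()[0].lower()
    pattern = FILTERS.get(qword)
    if pattern is None:
        return 1.0                        # no filter for this question type
    return boost if pattern.search(ngram) else penalty

print(filter_boost("When did Abraham Lincoln die?", "1865"))          # 2.0
print(filter_boost("When did Abraham Lincoln die?", "assassinated"))  # 0.5
```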

Page 181: Lecture 32 Question Answering

04/22/23 181

Step 5: Tiling the Answers

(Tiling example.) Candidate n-grams "Dickens" (score 20), "Charles Dickens" (score 15) and "Mr Charles" (score 10) overlap. Tile the highest-scoring n-gram with the others, merging them into "Mr Charles Dickens" (score 45) and discarding the old n-grams. Repeat until no more overlap.
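A sketch of greedy answer tiling that reproduces the example above: merge the highest-scoring n-gram with any n-gram it overlaps at the word level, summing scores, until nothing overlaps.

```python
def tile(ngrams):
    """ngrams: dict mapping n-gram string -> score."""
    items = dict(ngrams)
    merged = True
    while merged:
        merged = False
        best = max(items, key=items.get)          # highest-scoring n-gram
        for other in list(items):
            if other == best:
                continue
            combined = overlap_merge(best.split(), other.split())
            if combined:
                score = items.pop(best) + items.pop(other)
                items[" ".join(combined)] = score  # merged, discard old n-grams
                merged = True
                break
    return max(items, key=items.get), items

def overlap_merge(a, b):
    # merge two token lists if one's suffix overlaps the other's prefix
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
        if b[-k:] == a[:k]:
            return b + a[k:]
    return None

print(tile({"Dickens": 20, "Charles Dickens": 15, "Mr Charles": 10}))
# -> ('Mr Charles Dickens', {'Mr Charles Dickens': 45})
```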

Page 182: Lecture 32 Question Answering

04/22/23 182

Results

• Standard TREC contest test-bed: ~1M documents; 900 questions
• The technique doesn't do too well on it (though it would have placed in the top 9 of ~30 participants!)
  – MRR = 0.262 (i.e., the right answer is ranked about #4-#5)
  – Why? Because it relies on the enormity of the Web!
• Using the Web as a whole, not just TREC's 1M documents: MRR = 0.42 (i.e., on average, the right answer is ranked about #2-#3)

Page 183: Lecture 32 Question Answering

04/22/23 183

Issues

• In many scenarios (e.g., monitoring an individual's email…) we only have a small set of documents
• Works best/only for "Trivial Pursuit"-style fact-based questions
• Limited/brittle repertoire of:
  – question categories
  – answer data types/filters
  – query rewriting rules

Page 184: Lecture 32 Question Answering

04/22/23 184

ISI: Surface patterns approach

Use of characteristic phrases: "When was <person> born?"
Typical answers:
  "Mozart was born in 1756."
  "Gandhi (1869-1948)..."
Suggests phrases like
  "<NAME> was born in <BIRTHDATE>"
  "<NAME> ( <BIRTHDATE> -"
which, used as regular expressions, can help locate the correct answer.
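A sketch of applying two such BIRTHDATE patterns as regular expressions; the exact pattern syntax is illustrative.

```python
import re

BIRTHDATE_TEMPLATES = [
    r"{name} was born in (\d{{4}})",   # "<NAME> was born in <BIRTHDATE>"
    r"{name} \((\d{{4}})\s*-",         # "<NAME> ( <BIRTHDATE> -"
]

def find_birthdate(name, text):
    for template in BIRTHDATE_TEMPLATES:
        m = re.search(template.format(name=re.escape(name)), text)
        if m:
            return m.group(1)
    return None

print(find_birthdate("Mozart", "Mozart was born in 1756."))    # 1756
print(find_birthdate("Gandhi", "Gandhi (1869-1948) led ..."))  # 1869
```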

Page 185: Lecture 32 Question Answering

04/22/23 185

Use Pattern Learning

Example:
  "The great composer Mozart (1756-1791) achieved fame at a young age"
  "Mozart (1756-1791) was a genius"
  "The whole world would always be indebted to the great music of Mozart (1756-1791)"
• The longest matching substring for all 3 sentences is "Mozart (1756-1791)"
• A suffix tree would extract "Mozart (1756-1791)" as an output, with a score of 3
• Reminiscent of IE pattern learning

Page 186: Lecture 32 Question Answering

04/22/23 186

Pattern Learning (cont.)

• Repeat with different examples of the same question type: "Gandhi 1869", "Newton 1642", etc.
• Some patterns learned for BIRTHDATE:
  a. born in <ANSWER>, <NAME>
  b. <NAME> was born on <ANSWER>,
  c. <NAME> ( <ANSWER> -
  d. <NAME> ( <ANSWER> - )

Page 187: Lecture 32 Question Answering

04/22/23 187

Experiments

6 different Q types from Webclopedia QA Typology (Hovy et al., 2002a)

BIRTHDATE LOCATION INVENTOR DISCOVERER DEFINITION WHY-FAMOUS

Page 188: Lecture 32 Question Answering

04/22/23 188

Experiments: pattern precision

BIRTHDATE:
  1.0   <NAME> ( <ANSWER> - )
  0.85  <NAME> was born on <ANSWER>,
  0.6   <NAME> was born in <ANSWER>
  0.59  <NAME> was born <ANSWER>
  0.53  <ANSWER> <NAME> was born
  0.50  - <NAME> ( <ANSWER>
  0.36  <NAME> ( <ANSWER> -

INVENTOR:
  1.0   <ANSWER> invents <NAME>
  1.0   the <NAME> was invented by <ANSWER>
  1.0   <ANSWER> invented the <NAME> in

Page 189: Lecture 32 Question Answering

04/22/23 189

Experiments (cont.)

DISCOVERER:
  1.0   when <ANSWER> discovered <NAME>
  1.0   <ANSWER>'s discovery of <NAME>
  0.9   <NAME> was discovered by <ANSWER> in

DEFINITION:
  1.0   <NAME> and related <ANSWER>
  1.0   form of <ANSWER>, <NAME>
  0.94  as <NAME>, <ANSWER> and

Page 190: Lecture 32 Question Answering

04/22/23 190

Experiments (cont.)

WHY-FAMOUS:
  1.0   <ANSWER> <NAME> called
  1.0   laureate <ANSWER> <NAME>
  0.71  <NAME> is the <ANSWER> of

LOCATION:
  1.0   <ANSWER>'s <NAME>
  1.0   regional : <ANSWER> : <NAME>
  0.92  near <NAME> in <ANSWER>

Depending on the question type, the system gets high MRR (0.6–0.9), with higher results from use of the Web than from the TREC QA collection.

Page 191: Lecture 32 Question Answering

04/22/23 191

Shortcomings & Extensions

Need for POS and/or semantic types:
  "Where are the Rocky Mountains?"
  "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty"
  Pattern: <NAME> in <ANSWER>
An NE tagger and/or ontology could enable the system to determine that "background" is not a location.

Page 192: Lecture 32 Question Answering

04/22/23 192

Shortcomings... (cont.)

Long-distance dependencies:
  "Where is London?"
  "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
  would require a pattern like: <QUESTION>, (<any_word>)*, lies on <ANSWER>
The abundance and variety of Web data helps the system find an instance of a pattern without losing answers to long-distance dependencies.

Page 193: Lecture 32 Question Answering

04/22/23 193

Shortcomings... (cont.)

• The system currently has only one anchor word
  – Doesn't work for question types requiring multiple words from the question to be in the answer:
    "In which county does the city of Long Beach lie?"
    "Long Beach is situated in Los Angeles County"
    Required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>
• Does not use case
  "What is a micron?"
  "... a spokesman for Micron, a maker of semiconductors, said SIMMs are ..."
  If Micron had been capitalized in the question, this would be a perfect answer

Page 194: Lecture 32 Question Answering

04/22/23 194

QA Typology from ISI (USC)

Typology of typical Q forms: 94 nodes (47 leaf nodes). Analyzed 17,384 questions (from answers.com).

(THING ((AGENT (NAME (FEMALE-FIRST-NAME (EVE MARY ...)) (MALE-FIRST-NAME (LAWRENCE SAM ...)))) (COMPANY-NAME (BOEING AMERICAN-EXPRESS)) JESUS ROMANOFF ...) (ANIMAL-HUMAN (ANIMAL (WOODCHUCK YAK ...)) PERSON) (ORGANIZATION (SQUADRON DICTATORSHIP ...)) (GROUP-OF-PEOPLE (POSSE CHOIR ...)) (STATE-DISTRICT (TIROL MISSISSIPPI ...)) (CITY (ULAN-BATOR VIENNA ...)) (COUNTRY (SULTANATE ZIMBABWE ...)))) (PLACE (STATE-DISTRICT (CITY COUNTRY...)) (GEOLOGICAL-FORMATION (STAR CANYON...)) AIRPORT COLLEGE CAPITOL ...) (ABSTRACT (LANGUAGE (LETTER-CHARACTER (A B ...))) (QUANTITY (NUMERICAL-QUANTITY INFORMATION-QUANTITY MASS-QUANTITY MONETARY-QUANTITY TEMPORAL-QUANTITY ENERGY-QUANTITY TEMPERATURE-QUANTITY ILLUMINATION-QUANTITY

(SPATIAL-QUANTITY (VOLUME-QUANTITY AREA-QUANTITY DISTANCE-QUANTITY)) ...

PERCENTAGE))) (UNIT ((INFORMATION-UNIT (BIT BYTE ... EXABYTE)) (MASS-UNIT (OUNCE ...)) (ENERGY-UNIT (BTU ...)) (CURRENCY-UNIT (ZLOTY PESO ...)) (TEMPORAL-UNIT (ATTOSECOND ... MILLENIUM)) (TEMPERATURE-UNIT (FAHRENHEIT KELVIN CELCIUS)) (ILLUMINATION-UNIT (LUX CANDELA)) (SPATIAL-UNIT ((VOLUME-UNIT (DECILITER ...)) (DISTANCE-UNIT (NANOMETER ...)))) (AREA-UNIT (ACRE)) ... PERCENT)) (TANGIBLE-OBJECT ((FOOD (HUMAN-FOOD (FISH CHEESE ...))) (SUBSTANCE ((LIQUID (LEMONADE GASOLINE BLOOD ...)) (SOLID-SUBSTANCE (MARBLE PAPER ...)) (GAS-FORM-SUBSTANCE (GAS AIR)) ...)) (INSTRUMENT (DRUM DRILL (WEAPON (ARM GUN)) ...) (BODY-PART (ARM HEART ...)) (MUSICAL-INSTRUMENT (PIANO))) ... *GARMENT *PLANT DISEASE)

Page 195: Lecture 32 Question Answering

04/22/23 195

Question Answering Example

How hot does the inside of an active volcano get?
  get(TEMPERATURE, inside(volcano(active)))
"lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit"
  fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
  volcano ISA mountain
  lava ISPARTOF volcano
  lava inside volcano
  fragments of lava HAVEPROPERTIESOF lava
The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough 'proofs'.

Page 196: Lecture 32 Question Answering

04/22/23 196

References

AskMSR: Question Answering Using the Worldwide Web Michele Banko, Eric Brill, Susan Dumais, Jimmy Lin http://www.ai.mit.edu/people/jimmylin/publications/Banko-etal-AA

AI02.pdf In Proceedings of 2002 AAAI SYMPOSIUM on Mining Answers

from Text and Knowledge Bases, March 2002  Web Question Answering: Is More Always Better?

Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng http://research.microsoft.com/~sdumais/SIGIR2002-QA-Submit-C

onf.pdf D. Ravichandran and E.H. Hovy. 2002.

Learning Surface Patterns for a Question Answering System.ACL conference, July 2002.

Page 197: Lecture 32 Question Answering

04/22/23 197

Harder Questions

Factoid question answering is really pretty silly. A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time:
• Who is Condoleezza Rice?
• Who is Mahmoud Abbas?
• Why was Arafat flown to Paris?