Page 1: Question-Answering: Overview

Question-Answering: Overview

Ling573: Systems & Applications

March 31, 2011

Page 2: Quick Schedule Notes

- No Treehouse this week!
- CS Seminar: Retrieval from Microblogs (Metzler)
  April 8, 3:30pm; CSE 609

Page 3: Roadmap

- Dimensions of the problem
- A (very) brief history
- Architecture of a QA system
- QA and resources
- Evaluation
- Challenges
- Logistics check-in

Page 5: Dimensions of QA

Basic structure:
- Question analysis
- Answer search
- Answer selection and presentation

Rich problem domain: tasks vary on
- Applications
- Users
- Question types
- Answer types
- Evaluation
- Presentation

Page 10: Applications

Applications vary by:

Answer sources
- Structured: e.g., database fields
- Semi-structured: e.g., database with comments
- Free text
  - Web
  - Fixed document collection (typical TREC QA)
  - Book or encyclopedia
  - Specific passage/article (reading comprehension)

Media and modality:
- Within or cross-language; video/images/speech

Page 12: Users

Novice
- Must understand the capabilities/limitations of the system

Expert
- Assumed familiar with the system's capabilities
- Wants efficient information access
- May be willing to set up a profile

Page 19: Question Types

Could be factual vs opinion vs summary

Factual questions:
- Yes/no; wh-questions
- Vary dramatically in difficulty:
  - Factoid, list
  - Definitions
  - Why/how...
  - Open-ended: 'What happened?'

Affected by form:
- "Who was the first president?" vs "Name the first president"

Page 23: Answers

Like tests!

Form:
- Short answer
- Long answer
- Narrative

Processing:
- Extractive vs generated vs synthetic

In the limit -> summarization: "What is the book about?"

Page 28: Evaluation & Presentation

What makes an answer good?
- Bare answer
- Longer, with justification

Implementation vs usability:
- QA interfaces still rudimentary
- Ideally should be interactive, support refinement, dialogic

Page 34: (Very) Brief History

Earliest systems: NL queries to databases (60s-70s)
- BASEBALL, LUNAR
- Linguistically sophisticated: syntax, semantics, quantification, ...
- Restricted domain!

Spoken dialogue systems (Turing!, 70s-current)
- SHRDLU (blocks world), MIT's Jupiter, lots more

Reading comprehension (~2000)

Information retrieval (TREC); information extraction (MUC)

Page 35: General Architecture

Page 39: Basic Strategy

Given a document collection and a query, execute the following steps:
- Question processing
- Document collection processing
- Passage retrieval
- Answer processing and presentation
- Evaluation

Systems vary in detailed structure and complexity.
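The slides do not prescribe an implementation, but the step sequence can be made concrete. Below is a toy, self-contained Python sketch of the pipeline; the stopword list and the overlap heuristic are stand-ins, and answer extraction is omitted.

```python
# Toy sketch of the basic QA pipeline; heuristics are illustrative stand-ins.
import re

STOP = {"what", "who", "where", "when", "is", "was", "the", "a", "of", "in"}

def formulate_query(question):
    """Question processing: keep content words as the IR query."""
    return [w for w in re.findall(r"\w+", question.lower()) if w not in STOP]

def retrieve_passages(query, documents):
    """Passage retrieval: split documents into sentences, rank by keyword overlap."""
    sentences = [s for doc in documents for s in re.split(r"(?<=[.!?])\s+", doc)]
    scored = sorted(
        ((len(set(query) & set(re.findall(r"\w+", s.lower()))), s)
         for s in sentences),
        reverse=True,
    )
    return [s for score, s in scored if score > 0]

def answer_question(question, documents):
    """Question processing -> passage retrieval; answer extraction omitted."""
    return retrieve_passages(formulate_query(question), documents)

docs = ["George Washington was the first president of the United States. "
        "The capital is named after him."]
print(answer_question("Who was the first president?", docs)[0])
```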

Page 40: AskMSR: Shallow Processing for QA

[Figure: AskMSR system architecture, numbered steps 1-5]

Page 41: Deep Processing Technique for QA

LCC (Moldovan, Harabagiu, et al.)

Page 46: Query Formulation

Convert the question to a form suitable for IR; the strategy depends on the document collection.

Web (or similar large collection): 'stop structure' removal
- Delete function words, q-words, even low-content verbs

Corporate sites (or similar smaller collection): query expansion
- Can't count on document diversity to recover word variation
- Add morphological variants; WordNet as thesaurus
- Reformulate as declarative, rule-based:
  "Where is X located" -> "X is located in"
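To make these transformations concrete, here is an illustrative Python sketch. The stop-structure list and the rewrite rule are toy examples, not the lecture's actual rules, and the expansion assumes NLTK with the WordNet data downloaded.

```python
# Illustrative query formulation: stop-structure removal, rule-based
# declarative rewriting, and WordNet-based expansion (all toy versions).
import re
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

STOP_STRUCTURE = {"who", "what", "where", "when", "is", "was", "did", "the", "a"}

def remove_stop_structure(question):
    """Web-scale collections: strip function words and question words."""
    return [w for w in re.findall(r"\w+", question.lower())
            if w not in STOP_STRUCTURE]

def reformulate(question):
    """Rule-based declarative rewrite: 'Where is X located' -> 'X is located in'."""
    m = re.match(r"where is (.+?) located\??$", question.strip(), re.IGNORECASE)
    return f"{m.group(1)} is located in" if m else question

def expand(keywords, max_senses=2):
    """Smaller collections: add WordNet synonyms to recover word variation."""
    expanded = set(keywords)
    for w in keywords:
        for syn in wordnet.synsets(w)[:max_senses]:
            expanded.update(l.lower().replace("_", " ") for l in syn.lemma_names())
    return sorted(expanded)

print(reformulate("Where is the Louvre located?"))  # the Louvre is located in
print(expand(remove_stop_structure("Who invented the telephone?")))
```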

Page 52: Question Classification

Answer type recognition:
- Who -> Person
- What Canadian city -> City
- What is surf music -> Definition

Identifies the type of entity (e.g., named entity) or form (biography, definition) to return as the answer.

- Build an ontology of answer types (by hand)
- Train classifiers to recognize them, using:
  - POS, NE, words
  - Synsets, hyper-/hyponyms
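As a concrete illustration of the "train classifiers" bullet, here is a toy scikit-learn classifier over word n-grams. The six training questions are fabricated; a real system would train on hundreds of labeled questions with richer features (POS, NE tags, WordNet hypernyms).

```python
# Toy answer-type classifier: word n-gram features + a linear model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "Who wrote Hamlet?",
    "Who was the first president?",
    "What Canadian city hosted Expo 86?",
    "What city is the capital of France?",
    "What is surf music?",
    "What is photosynthesis?",
]
answer_types = ["PERSON", "PERSON", "CITY", "CITY", "DEFINITION", "DEFINITION"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(questions, answer_types)

print(clf.predict(["Who discovered penicillin?"]))  # expect PERSON
```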

Page 58: Passage Retrieval

Why not just perform general information retrieval?
- Documents are too big, too non-specific for answers

Instead, identify shorter, focused spans (e.g., sentences):
- Filter for correct type: answer type classification
- Rank passages with a trained classifier
  - Features: question keywords, named entities, longest overlapping
    sequence, shortest keyword-covering span, n-gram overlap between
    question and passage

For web search, use result snippets
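Three of the ranking features named above are easy to sketch. These are illustrative implementations, not the lecture's code; a real system would feed such scores into the trained classifier.

```python
# Illustrative passage-ranking features: keyword overlap, n-gram overlap,
# and shortest keyword-covering span.
import re

def tokens(text):
    return re.findall(r"\w+", text.lower())

def keyword_overlap(question, passage):
    """Count of question keywords appearing in the passage."""
    return len(set(tokens(question)) & set(tokens(passage)))

def ngram_overlap(question, passage, n=2):
    """Count of word n-grams shared by question and passage."""
    grams = lambda ws: {tuple(ws[i:i + n]) for i in range(len(ws) - n + 1)}
    return len(grams(tokens(question)) & grams(tokens(passage)))

def shortest_covering_span(question, passage):
    """Length of the shortest passage window containing every shared
    keyword (smaller is better); None if there are no shared keywords."""
    words = tokens(passage)
    keys = set(tokens(question)) & set(words)
    if not keys:
        return None
    best = None
    for i in (i for i, w in enumerate(words) if w in keys):
        seen, j = set(), i
        while j < len(words) and seen != keys:
            if words[j] in keys:
                seen.add(words[j])
            j += 1
        if seen == keys:
            best = (j - i) if best is None else min(best, j - i)
    return best

q = "What Canadian city hosted Expo 86?"
p = "Expo 86 was a World's Fair hosted by the Canadian city of Vancouver."
print(keyword_overlap(q, p), ngram_overlap(q, p), shortest_covering_span(q, p))
```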

Page 62: Answer Processing

Find the specific answer in the passage

Pattern extraction-based:
- Include answer types, regular expressions
- Similar to relation extraction: learn the relation between the answer
  type and an aspect of the question
  - E.g., date-of-birth/person name; term/definition
- Can use a bootstrap strategy for contexts, like Yarowsky:
  <NAME> (<BD>-<DD>) or <NAME> was born on <BD>
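The two slide patterns translate directly into regular expressions. The name and date shapes below are deliberately naive stand-ins; in a bootstrapped system the patterns would be induced from seed (name, birth-date) pairs.

```python
# Literal regex rendering of the slide's two birth-date patterns.
import re

NAME = r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"  # naive capitalized-name shape
DATE = r"(?P<bd>\d{3,4})"                          # naive year shape

PATTERNS = [
    re.compile(NAME + r" \(" + DATE + r"-\d{3,4}\)"),   # <NAME> (<BD>-<DD>)
    re.compile(NAME + r" was born (?:on|in) " + DATE),  # <NAME> was born on <BD>
]

def extract_birth(passage):
    """Return (name, birth year) pairs matched by either pattern."""
    return [(m.group("name"), m.group("bd"))
            for pat in PATTERNS for m in pat.finditer(passage)]

print(extract_birth("Wolfgang Amadeus Mozart (1756-1791) was a composer."))
print(extract_birth("Ada Lovelace was born in 1815 in London."))
```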

Page 69: Resources

System development requires resources
- Especially true of data-driven machine learning

QA resources: sets of questions with answers for development/test
- Specifically manually constructed / manually annotated
- 'Found data': trivia games!!!, FAQs, answer sites, etc.
- Multiple choice tests (IP???)
- Partial data: web logs - queries and click-throughs

Page 72: Information Resources

Proxies for world knowledge:
- WordNet: synonymy; IS-A hierarchy
- Wikipedia
- Web itself...

Term management:
- Acronym lists
- Gazetteers
- ...

Page 76: Software Resources

General: machine learning tools

Passage/document retrieval:
- Information retrieval engine: Lucene, Indri/Lemur, MG
- Sentence breaking, etc.

Query processing:
- Named entity extraction
- Synonymy expansion
- Parsing?

Answer extraction: NER, IE (patterns)

Page 77: Question-Answering: Overview

EvaluationCandidate criteria:

RelevanceCorrectnessConciseness:

No extra informationCompleteness:

Penalize partial answersCoherence:

Easily readable Justification

Tension among criteria

Page 80: Evaluation

Consistency/repeatability:
- Are answers scored reliably?

Automation:
- Can answers be scored automatically?
- Required for machine-learning tuning/testing
- Short-answer answer keys: Litkowski's patterns
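A minimal sketch of how pattern-based answer keys score answers; the two questions and their regexes below are invented examples in the style of Litkowski's TREC patterns, not actual answer-key entries.

```python
# Pattern-based answer scoring: an answer is correct if it matches any
# of the question's answer-key regexes. The key below is fabricated.
import re

ANSWER_KEY = {
    "Who was the first US president?": [r"(George )?Washington"],
    "When did World War II end?": [r"(August |September )?1945"],
}

def is_correct(question, answer):
    """Score an answer string against the question's answer-key patterns."""
    return any(re.search(p, answer) for p in ANSWER_KEY.get(question, []))

print(is_correct("Who was the first US president?", "George Washington"))  # True
print(is_correct("Who was the first US president?", "John Adams"))         # False
```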

Page 83: Evaluation

Classical: return a ranked list of answer candidates
- Idea: correct answer higher in the list => higher score
- Measure: Mean Reciprocal Rank (MRR)

For each question:
- Take the reciprocal of the rank of the first correct answer
  - E.g., first correct answer at rank 4 => 1/4; none correct => 0
- Average over all questions
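MRR as defined on the slide is only a few lines of code:

```python
# MRR: reciprocal rank of the first correct answer per question
# (0 if none is correct), averaged over all questions.

def mrr(ranked_correctness):
    """ranked_correctness: one list of booleans per question, in rank order."""
    total = 0.0
    for flags in ranked_correctness:
        rank = next((i + 1 for i, ok in enumerate(flags) if ok), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_correctness)

# Three questions: first correct answer at ranks 1, 4, and never.
print(mrr([[True, False], [False, False, False, True], [False, False]]))
# (1 + 1/4 + 0) / 3 = 0.4166...
```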

Page 89: Dimensions of TREC QA

Applications:
- Open-domain free-text search
- Fixed collections: news, blogs

Users: novice

Question types: factoid -> list, relation, etc.

Answer types: predominantly extractive, short answer in context

Evaluation: official: human; proxy: patterns

Presentation: one interactive track