
Question Answering
Approaches towards better human question answering

Tomasz Jurczyk, Emory NLP Group Meeting
February 16th, 2015

Information Overload
“Getting information off the Internet is like taking a drink from a fire hydrant.”
~ Mitchell Kapor

Question Answering
● Intersection of Information Retrieval and Natural Language Processing
● Queries a structured database of knowledge (a knowledge base)
● Able to pull an answer from an unstructured collection of natural language documents
● Variety of question types (open-domain, closed-domain, factual, etc.)

Some challenges in QA
● Question types
● Processing & context
● Data sources & answer extraction
● Specific needs for QA systems (real-time question answering, multilingual, etc.)
● Information clustering

Existing projects
● Watson (IBM)
● START Natural Language Question Answering System (MIT)
● Google Search

Watson
● Won Jeopardy! on February 16, 2011

START

Google Search

Mapping Dependencies Trees
● Evaluating the distance between a question and an answer candidate
● Distance is calculated by an approximate tree matching algorithm
  ○ Distance is the cost of the sequence of operations (add/delete/modify) needed to transform one tree into the other

Vasin Punyakanok, Dan Roth, Wen-tau Yih, Mapping Dependencies Trees: An Application to Question Answering
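The edit-cost idea can be sketched in a few lines. This is a minimal illustration over (label, children) tuples, not the paper's approximate tree-matching algorithm: adding or deleting a whole subtree costs its node count, relabeling a node costs 1, and the two child sequences are aligned with a standard edit-distance DP.

```python
def size(tree):
    """Number of nodes in a (label, children) tree."""
    label, children = tree
    return 1 + sum(size(c) for c in children)

def tree_distance(a, b):
    """Cost of transforming tree a into tree b (simplified sketch)."""
    la, ca = a
    lb, cb = b
    relabel = 0 if la == lb else 1          # "modify" operation
    return relabel + forest_distance(ca, cb)

def forest_distance(xs, ys):
    """Edit-distance DP over two child sequences."""
    m, n = len(xs), len(ys)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(xs[i - 1])      # delete subtree
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(ys[j - 1])      # add subtree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(xs[i - 1]),                          # delete
                d[i][j - 1] + size(ys[j - 1]),                          # add
                d[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),  # modify
            )
    return d[m][n]

# Identical trees have distance 0; one mismatched label costs 1.
q = ("is", [("car", [("fastest", [])]), ("world", [])])
c = ("is", [("car", [("fastest", [])]), ("world", [])])
```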

Bag of words?
Q: What is the fastest car in the world?

CA1: The Jaguar XJ220 is the dearest (415000 pounds), fastest (217mph) and most sought after car in the world.
CA2: (...) will stretch Volkswagen’s lead in the world’s fastest growing vehicle market.
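The failure mode this slide points at is easy to reproduce: under a bag-of-words view, the incorrect candidate CA2 still shares many words with the question. A quick sketch, assuming a simple lowercase letters-only tokenizer:

```python
import re

def tokens(text):
    """Crude tokenizer: lowercase runs of letters."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(question, candidate):
    """Bag-of-words score: count of shared word types."""
    return len(tokens(question) & tokens(candidate))

q = "What is the fastest car in the world?"
ca1 = ("The Jaguar XJ220 is the dearest (415000 pounds), fastest (217mph) "
       "and most sought after car in the world.")
ca2 = "(...) will stretch Volkswagen's lead in the world's fastest growing vehicle market."

# CA2 shares "the", "in", "world", "fastest" with the question, so plain
# word overlap gives the wrong candidate a competitive score even though
# it says nothing about the fastest car — hence the move to structure.
```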

Dependency trees matching distance

Measurements
● MAP (Mean Average Precision)
  ○ The mean of the average precision scores over a set of queries
● MRR (Mean Reciprocal Rank)
  ○ The mean of the reciprocal ranks of the first correct result over a sample of queries
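Both metrics can be stated precisely in a few lines. A sketch over binary relevance judgments (1 = correct answer) listed in ranked order:

```python
def average_precision(relevance):
    """relevance: list of 0/1 judgments in ranked order for one query."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank        # precision at this correct result
    return total / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: mean of per-query average precision."""
    return sum(average_precision(r) for r in queries) / len(queries)

def mean_reciprocal_rank(queries):
    """MRR: mean of 1/rank of the first correct result per query."""
    def rr(relevance):
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                return 1.0 / rank
        return 0.0
    return sum(rr(r) for r in queries) / len(queries)
```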

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494

Passage Retrieval Using Dependency Relations
● Fuzzy relation matching based on statistical models
● Two methods for learning relation mapping scores from past QA pairs:
  1. Mutual information
  2. Expectation maximization

Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, Tat-Seng Chua, Question Answering Passage Retrieval Using Dependency Relations
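The mutual-information option can be illustrated with pointwise mutual information over relation co-occurrence counts. The counts below are made up for the example, and the paper actually scores relation *paths*; this only sketches the co-occurrence-statistics idea for single dependency relations:

```python
import math
from collections import Counter

# Hypothetical counts of question/answer dependency relations that
# co-occurred on aligned paths in past QA pairs (assumed toy data).
pair_counts = Counter({
    ("nsubj", "nsubj"): 30, ("nsubj", "nsubjpass"): 10,
    ("dobj", "dobj"): 25, ("dobj", "nsubjpass"): 15,
})

total = sum(pair_counts.values())
q_counts, a_counts = Counter(), Counter()
for (q_rel, a_rel), c in pair_counts.items():
    q_counts[q_rel] += c
    a_counts[a_rel] += c

def pmi(q_rel, a_rel):
    """Pointwise mutual information of a question/answer relation pair."""
    p_joint = pair_counts[(q_rel, a_rel)] / total
    p_q = q_counts[q_rel] / total
    p_a = a_counts[a_rel] / total
    return math.log(p_joint / (p_q * p_a))

# Identical relations end up with a higher mapping score than
# mismatched ones, which is what fuzzy relation matching exploits.
```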

Extracting and Pairing Relation Paths

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494
Passage Retrieval (2005)    0.427  0.526

Jeopardy Model - A Quasi-Synchronous Grammar for QA
● Used a probabilistic quasi-synchronous grammar
● Parameterized by mixtures of a robust non-lexical syntax/alignment model
  ○ 3 adjustments in their model:
    ■ Bayes’ rule
    ■ Labeled, structured dependency trees
    ■ Alignment between question and answer words

Mengqiu Wang, Noah A. Smith, Teruko Mitamura, What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA

Alignment relations

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494
Passage Retrieval (2005)    0.427  0.526
Jeopardy Model (2007)       0.603  0.685

Tree Edit Models
● Tree edit models for representing sequences of tree transformations
● Similar to Mapping Dependencies Trees, but more advanced
  ○ Used 6 main operations that combine move, delete, merge, relabel, etc.
● Greedy best-first search used to find sensible edit sequences (using a tree kernel heuristic)
● Defined constraints on the search space
● Trained a logistic regression classification model
  ○ 33 features consisting of the number/type of edits, node types, etc.

Michael Heilman, Noah A. Smith, Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions
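The search component can be illustrated generically. The sketch below runs a heuristic-guided best-first search, as the model does, but over flat label sequences with a crude mismatch heuristic; the paper searches over dependency-tree edits with a tree-kernel heuristic, so everything here beyond the search loop itself is an assumption for illustration:

```python
import heapq

def heuristic(state, goal):
    """Rough distance estimate: labels appearing in only one side."""
    return len(set(state) ^ set(goal))

def find_edit_sequence(source, target):
    """Best-first search for insert/delete edits turning source into target."""
    source, target = tuple(source), tuple(target)
    frontier = [(heuristic(source, target), source, [])]
    seen = {source}
    while frontier:
        _, state, edits = heapq.heappop(frontier)
        if state == target:
            return edits
        successors = []
        for i in range(len(state)):                       # delete any label
            successors.append((state[:i] + state[i + 1:], ("delete", state[i])))
        for i in range(len(state) + 1):                   # insert a goal label
            for label in set(target):
                successors.append(
                    (state[:i] + (label,) + state[i:], ("insert", label)))
        for nxt, edit in successors:
            # Bound the state length to keep the search space finite.
            if nxt not in seen and len(nxt) <= len(source) + len(target):
                seen.add(nxt)
                heapq.heappush(
                    frontier, (heuristic(nxt, target), nxt, edits + [edit]))
    return None
```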

A Tree Edit Sequence

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494
Passage Retrieval (2005)    0.427  0.526
Jeopardy Model (2007)       0.603  0.685
Tree Edit Models (2010)     0.609  0.692

Probabilistic Tree-Edit Models with Structured Latent Variables

Recognizing Textual Entailment example:
Text: Gabriel Garcia Marquez is a novelist and winner of the Nobel prize for literature.
Hypothesis: Gabriel Garcia Marquez won the Nobel for Literature.

Mengqiu Wang, Christopher D. Manning, Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering

Text Edits Technique
● Similar idea to the previous approach, with text edits
  ○ 45 edit operations (12 delete, 12 insert, 21 substitute)
● Designed a finite-state machine (each edit operation is mapped to a unique state, and an edit sequence is mapped to a transition sequence)
● The probability of an edit sequence is calculated from a mix of features (word-matching features, tree-structure features)
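The finite-state view can be made concrete: each edit-operation type is a state, and an edit sequence scores as the product of transition probabilities. The three operation types and every probability below are invented for the sketch; the actual model has 45 operation states and derives its probabilities from features:

```python
# Made-up transition probabilities between edit-operation states.
transition_prob = {
    ("START", "match"): 0.7, ("START", "substitute"): 0.2, ("START", "delete"): 0.1,
    ("match", "match"): 0.6, ("match", "substitute"): 0.2, ("match", "delete"): 0.2,
    ("substitute", "match"): 0.5, ("substitute", "substitute"): 0.3, ("substitute", "delete"): 0.2,
    ("delete", "match"): 0.5, ("delete", "substitute"): 0.2, ("delete", "delete"): 0.3,
}

def sequence_probability(edits):
    """Probability of an edit sequence as a walk through the FSM."""
    prob, state = 1.0, "START"
    for op in edits:
        prob *= transition_prob[(state, op)]
        state = op
    return prob
```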

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494
Passage Retrieval (2005)    0.427  0.526
Jeopardy Model (2007)       0.603  0.685
Tree Edit Models (2010)     0.609  0.692
Probabilistic TEM (2010)    0.595  0.695

Answer Extraction as Sequence Tagging with Tree Edit Distance
● Extended work of Tree Edit Models
  ○ Added synonyms, entailment and causing verbs, parts-of/member-of entities

Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, Peter Clark, Answer Extraction as Sequence Tagging with Tree Edit Distance

Answer extraction using Conditional Random Fields
● Sequence tagging with three states: start/middle/end
● Features used by the CRF:
  ○ Chunking (“kind of silly” is unlikely to be an answer, while “in 90 days” is)
  ○ Question type (“how many” questions expect numerical answer types)
  ○ Edit script (during sequencing, words are deleted/renamed, and they could be an answer)
  ○ Alignment distance (a candidate answer often appears close to an aligned word)
● Then a voting mechanism is applied to find an answer
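The final two steps, span extraction from start/middle/end tags and voting across candidate sentences, can be sketched as follows. The tagged example sentences are assumed here; the real tags would come from the trained CRF:

```python
from collections import Counter

def extract_span(words, tags):
    """Join the words tagged start/middle/end into one answer string."""
    span = [w for w, t in zip(words, tags) if t in ("start", "middle", "end")]
    return " ".join(span) if span else None

def vote(tagged_sentences):
    """Each candidate sentence votes for its extracted answer span."""
    votes = Counter()
    for words, tags in tagged_sentences:
        answer = extract_span(words, tags)
        if answer:
            votes[answer] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical CRF output for "How many days does the race last?"
tagged = [
    ("the race lasts 90 days".split(),
     ["other", "other", "other", "start", "end"]),
    ("it takes 90 days to finish".split(),
     ["other", "other", "start", "end", "other", "other"]),
    ("kind of silly answer here".split(), ["other"] * 5),
]
```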

Example Prediction Trace

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494
Passage Retrieval (2005)    0.427  0.526
Jeopardy Model (2007)       0.603  0.685
Tree Edit Models (2010)     0.609  0.692
Probabilistic TEM (2010)    0.595  0.695
Sequence Tagging (2013)     0.631  0.748

Enhanced Lexical Semantic Models
● Designed lexical semantic models
  ○ Synonymy and antonymy
  ○ Hypernymy and hyponymy
    ■ Class-inclusion or Is-A relation (What color is Saturn? → Saturn is a giant gas planet with brown and beige clouds.)
  ○ Semantic word similarity
● Learning QA matching models
  ○ Bag of words
  ○ Learning latent structures
    ■ Similar to a latent SVM (different learning formulations and a replaced decision function)

Wen-tau Yih, Ming-Wei Chang, Christopher Meek, Andrzej Pastusiak, Question Answering Using Enhanced Lexical Semantic Models
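A toy version of lexical-semantic matching shows how these relations enrich plain word overlap. The tiny synonym/hypernym lexicons and the relation weights are made up for the example; the paper learns these relations from large resources rather than hand-coding them:

```python
# Made-up toy lexicons (the real models draw on WordNet-scale resources).
synonyms = {("buy", "purchase"), ("fast", "quick")}
hypernyms = {("saturn", "planet"), ("jaguar", "car")}  # (hyponym, hypernym)

def related(w1, w2):
    """Return the lexical-semantic relation linking two words, if any."""
    if w1 == w2:
        return "identical"
    if (w1, w2) in synonyms or (w2, w1) in synonyms:
        return "synonym"
    if (w1, w2) in hypernyms or (w2, w1) in hypernyms:
        return "hypernym"   # class-inclusion / Is-A relation
    return None

def match_score(question_words, answer_words):
    """Average, over question words, of the best-related answer word."""
    weight = {"identical": 1.0, "synonym": 0.8, "hypernym": 0.6}
    score = 0.0
    for q in question_words:
        best = max((weight.get(related(q, a), 0.0) for a in answer_words),
                   default=0.0)
        score += best
    return score / max(len(question_words), 1)

# "fast car" now matches "quick jaguar", which pure bag-of-words misses.
```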

Relations Between Text

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494
Passage Retrieval (2005)    0.427  0.526
Jeopardy Model (2007)       0.603  0.685
Tree Edit Models (2010)     0.609  0.692
Probabilistic TEM (2010)    0.595  0.695
Sequence Tagging (2013)     0.631  0.748
Lexical Semantic M. (2013)  0.709  0.770

Automatic Feature Engineering for Answer Selection and Extraction
● Trained an SVM with tree kernels as an answer sentence classifier
● Trained a kernel-based classifier to select the best answer

Aliaksei Severyn, Alessandro Moschitti, Automatic Feature Engineering for Answer Selection and Extraction
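The "automatic feature engineering" comes from the kernel: an SVM with a tree kernel never enumerates features, it only needs a similarity function between trees. A much-simplified kernel that counts shared grammar productions between two (label, children) trees gives the flavor; the actual work uses richer, recursively weighted subset-tree kernels:

```python
def productions(tree):
    """Collect label -> child-labels productions of a (label, children) tree."""
    label, children = tree
    out = [(label, tuple(c[0] for c in children))] if children else []
    for c in children:
        out.extend(productions(c))
    return out

def tree_kernel(t1, t2):
    """Similarity = number of matching production pairs (simplified sketch)."""
    p1, p2 = productions(t1), productions(t2)
    return sum(1 for a in p1 for b in p2 if a == b)

# Two parses sharing the same S and NP expansions score 2.
t1 = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [])])
t2 = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VB", [])])])
```

An SVM trained with such a kernel implicitly operates in the space of all tree fragments, which is why no hand-designed feature set is needed.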

Method                      MAP    MRR
Mapping DT (2004)           0.419  0.494
Passage Retrieval (2005)    0.427  0.526
Jeopardy Model (2007)       0.603  0.685
Tree Edit Models (2010)     0.609  0.692
Probabilistic TEM (2010)    0.595  0.695
Sequence Tagging (2013)     0.631  0.748
Lexical Semantic M. (2013)  0.709  0.770
AFE (2013)                  0.678  0.736

Summary

● The presented SOTA reaches roughly 0.70 MAP / 0.77 MRR

● A great need remains for more accurate QA systems

Thanks!

Questions?
