Talk Schedule Question Answering from Email
Bryan Klimt, July 28, 2005


Page 1

Talk Schedule Question Answering from Email

Bryan Klimt
July 28, 2005

Page 2

Project Goals

• To build a practical, working question answering system for personal email
• To learn about the technologies that go into QA (IR, IE, NLP, MT)
• To discover which techniques work best, and when

Page 3

System Overview

Page 4

Dataset

• 18 months of email (Sept 2003 to Feb 2005)

• 4799 emails in total
• 196 are talk announcements
  – hand-labelled and annotated

• 478 questions and answers

Page 5

A new email arrives…

• Is it a talk announcement?

• If so, we should index it.

Page 6

Email Classifier

[Diagram: Email Data → Logistic Regression → Decision]
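
As a concrete illustration, here is a minimal sketch of this stage using scikit-learn (an assumed library; the original system used its own implementation): bag-of-words features feed a logistic regression whose output is the talk/non-talk decision.

    # Minimal sketch of the classifier stage (scikit-learn is an
    # assumption; not the original code). Bag-of-words features feed a
    # logistic regression that outputs the talk/non-talk decision.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    emails = ["Talk: QA from email. 5:30pm, 4513 Newell Simon Hall.",
              "Reminder: seminar on speech translation, abstract below.",
              "Lunch on Friday?",
              "Re: homework 3 grading"]
    labels = [1, 1, 0, 0]  # 1 = talk announcement, 0 = other

    clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(emails, labels)
    print(clf.predict(["Distinguished lecture: machine translation, Thu 3pm"]))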

Page 7

Classification Performance

• precision = 0.81
• recall = 0.66
• (previous work reported better performance)
• Top features:
  – abstract, bio, speaker, copeta, multicast, esm, donut, talk, seminar, cmtv, broadcast, speech, distinguish, ph, lectur, ieee, approach, translat, professor, award

Page 8

Annotator

• Use Information Extraction techniques to identify certain types of data in the emails:
  – speaker names and affiliations
  – dates and times
  – locations
  – lecture series and titles

Page 9

Annotator

Page 10

Rule-based Annotator

• Combine regular expressions and dictionary lookups

• defSpanType date =: ...[re('^\d\d?$') ai(dayEnd)? ai(month)]... ;

• matches “23rd September”
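
The same idea in plain Python regular expressions plus dictionary lookups (illustrative only; the rule above is written in the annotator's own span-rule language):

    # Regex + dictionary sketch of the date rule above (illustrative;
    # not the annotator's actual rule language).
    import re

    MONTHS = {"january", "february", "march", "april", "may", "june", "july",
              "august", "september", "october", "november", "december"}

    # one- or two-digit day, optional ordinal suffix, then a month word
    DATE_RE = re.compile(r"\b(\d\d?)(st|nd|rd|th)?\s+([A-Za-z]+)\b")

    def find_dates(text):
        """Return spans that look like '23rd September'."""
        return [m.span() for m in DATE_RE.finditer(text)
                if m.group(3).lower() in MONTHS]

    print(find_dates("The talk is on 23rd September."))  # [(15, 29)]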

Page 11

Conditional Random Fields

• Probabilistic framework for labelling sequential data

• Known to outperform HMMs (relaxed independence assumptions) and MEMMs (avoids the “label bias” problem)

• Allow for multiple output features at each node in the sequence
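
A minimal sketch of CRF tagging with the sklearn-crfsuite library (an assumption; this tooling postdates the project). The per-token feature dicts are the multiple features at each node that the slide describes:

    # CRF sequence-labelling sketch with sklearn-crfsuite (an assumed
    # library; not what the project used). Each node in the sequence
    # carries a dict of features, as the slide describes.
    import sklearn_crfsuite

    def features(tokens, i):
        t = tokens[i]
        return {"lower": t.lower(), "is_title": t.istitle(),
                "has_digit": any(c.isdigit() for c in t),
                "prev": tokens[i - 1].lower() if i > 0 else "<s>"}

    sents = [["Frank", "Lin", "speaks", "at", "5:30pm"]]
    X = [[features(s, i) for i in range(len(s))] for s in sents]
    y = [["B-name", "I-name", "O", "O", "B-time"]]  # hand-labelled tags

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, y)
    print(crf.predict(X))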

Page 12

Rule-based vs. CRFs

Page 13

Rule-based vs. CRFs

• Both results are much higher than in a previous study

• For dates, times, and locations, rules are easy to write and perform extremely well

• For names, titles, affiliations, and series, rules are very difficult to write, and CRFs are preferable

Page 14

Template Filler

• Creates a database record for each talk announced in the email
• This database is used by the NLP answer extractor
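
A sketch of what that record might look like, with SQLite standing in for the database (the slides do not name the actual store):

    # Sketch of the template filler's output: one record per talk,
    # stored in a database (SQLite here is an assumption).
    import sqlite3
    from dataclasses import dataclass, astuple

    @dataclass
    class Seminar:
        title: str
        name: str
        time: str
        date: str
        location: str
        affiliation: str = ""
        series: str = ""

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE seminar_templates "
               "(title, name, time, date, location, affiliation, series)")
    rec = Seminar("Keyword Translation from English to Chinese for Multilingual QA",
                  "Frank Lin", "5:30pm", "Thursday, Sept. 23",
                  "4513 Newell Simon Hall")
    db.execute("INSERT INTO seminar_templates VALUES (?,?,?,?,?,?,?)",
               astuple(rec))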

Page 15

Filled Template

Seminar {
  title = “Keyword Translation from English to Chinese for Multilingual QA”
  name = Frank Lin
  time = 5:30pm
  date = Thursday, Sept. 23
  location = 4513 Newell Simon Hall
  affiliation =
  series =
}

Page 16

Search Time

• Now the email is indexed
• The user can ask questions

Page 17

IR Answer Extractor

“Where is Frank Lin’s talk?”

0.5055 3451.txt
  search[468:473]: "frank"
  search[2025:2030]: "frank"
  search[474:477]: "lin"

0.1249 2547.txt
  search[580:583]: "lin"

0.0642 2535.txt
  search[2283:2286]: "lin"

• Performs a traditional IR (TF-IDF) search using the question as a query

• Determines the answer type from simple heuristics (“Where”->LOCATION)
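
A sketch of both steps together (scikit-learn for the TF-IDF index is an assumption, and the heuristic table is illustrative):

    # Sketch of the IR extractor: TF-IDF ranking with the question as
    # the query, plus a question-word heuristic for the answer type.
    # (scikit-learn is an assumption; names are illustrative.)
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    ANSWER_TYPE = {"where": "LOCATION", "when": "DATE", "who": "NAME"}

    emails = ["Frank Lin speaks at 5:30pm in 4513 Newell Simon Hall.",
              "Reading group: Lin et al., keyword translation."]
    vec = TfidfVectorizer()
    index = vec.fit_transform(emails)

    def answer(question):
        qtype = ANSWER_TYPE.get(question.split()[0].lower(), "ANY")
        scores = cosine_similarity(vec.transform([question]), index)[0]
        # A real system would return the span annotated with qtype in the
        # top-ranked email; here we return the type and the email itself.
        return qtype, emails[scores.argmax()]

    print(answer("Where is Frank Lin's talk?"))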

Page 18

IR Answer Extractor

Page 19

NL Question Analyzer

• Uses the Tomita parser to fully parse questions and translate them into a structured query language
• “Where is Frank Lin’s talk?”
• ((FIELD LOCATION) (FILTER (NAME “FRANK LIN”)))
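
The real analyzer runs a full parse; as a toy stand-in, here is a pattern that emits the same structured query for this one question shape:

    # Toy stand-in for the question analyzer (the real system runs a
    # full Tomita parse; this handles exactly one question shape).
    import re

    def analyze(question):
        m = re.match(r"Where is (.+?)'s talk\?", question)
        if m:
            return (("FIELD", "LOCATION"),
                    ("FILTER", ("NAME", m.group(1).upper())))
        raise ValueError("question not understood")

    print(analyze("Where is Frank Lin's talk?"))
    # (('FIELD', 'LOCATION'), ('FILTER', ('NAME', 'FRANK LIN')))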

Page 20

NL Answer Extractor

• Simply executes the structured query produced by the Question Analyzer

• ((FIELD LOCATION) (FILTER (NAME “FRANK LIN”)))

• select LOCATION from seminar_templates where NAME=“FRANK LIN”;
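
A sketch of that translation step, run against a toy SQLite table (SQLite and the helper names are assumptions; field names are whitelisted and the filter value is a bound parameter):

    # Sketch of the NL answer extractor: translate the structured query
    # into SQL over the seminar table. SQLite and the helper names are
    # assumptions; field names are whitelisted, values are parameters.
    import sqlite3

    FIELDS = {"LOCATION": "location", "NAME": "name",
              "DATE": "date", "TIME": "time"}

    def execute(db, query):
        (_, field), (_, (fkey, fval)) = query
        sql = ("SELECT %s FROM seminar_templates WHERE %s = ? COLLATE NOCASE"
               % (FIELDS[field], FIELDS[fkey]))
        return [row[0] for row in db.execute(sql, (fval,))]

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE seminar_templates (name, location, date, time)")
    db.execute("INSERT INTO seminar_templates VALUES "
               "('Frank Lin', '4513 Newell Simon Hall', 'Sept. 23', '5:30pm')")
    q = (("FIELD", "LOCATION"), ("FILTER", ("NAME", "FRANK LIN")))
    print(execute(db, q))  # ['4513 Newell Simon Hall']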

Page 21

Results

• NL Answer Extractor -> 0.870
• IR Answer Extractor -> 0.755

[Bar chart: Answer Accuracy (0 to 1) for the NL and IR Answer Extractors]

Page 22

Results

• Both answer extractors have similar (good) performance
• IR-based extractor
  – easy to implement (1-2 days)
  – better on questions w/ titles and names
  – very bad on yes/no questions
• NLP-based extractor
  – more difficult to implement (4-5 days)
  – better on questions w/ dates and times

Page 23

Examples

• “Where is the lecture on dolphin language?”
  – NLP Answer Extractor: fails to find any talk
  – IR Answer Extractor: finds the correct talk
  – Actual title: “Natural History and Communication of Spotted Dolphin, Stenella Frontalis, in the Bahamas”
• “Who is speaking on September 10?”
  – NLP Extractor: finds the correct record(s)
  – IR Extractor: extracts the wrong answer
  – A talk at “10 am, November 10” ranks higher than one on “Sept 10th”

Page 24

Future Work

• Add an annotation “feedback loop” for the classifier
• Add a planner module to decide which answer extractor to apply to each individual question (a first-cut sketch follows this list)

• Tune parameters for classifier and TF-IDF search engine

• Integrate into a mail client!
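
Purely as illustration of the proposed planner (the slides leave its design open): a keyword heuristic that routes date/time questions to the NLP extractor and everything else to the IR extractor, matching the strengths observed on the Examples slide.

    # Illustrative first cut at the proposed planner (not from the
    # slides): route questions mentioning dates or times to the NLP
    # extractor, everything else to the IR extractor.
    import re

    DATEISH = re.compile(
        r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\b"
        r"|\b\d\d?(st|nd|rd|th)?\b|\d\d?:\d\d")

    def choose_extractor(question):
        return "NLP" if DATEISH.search(question.lower()) else "IR"

    print(choose_extractor("Who is speaking on September 10?"))          # NLP
    print(choose_extractor("Where is the lecture on dolphin language?")) # IR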

Page 25

Conclusions

• Overall performance is good enough for the system to be helpful to end users

• Both rule-based and automatic annotators should be used, but for different types of annotations

• Both IR-based and NLP-based answer extractors should be used, but for different types of questions

Page 26

DEMO