Aware

Discovering characteristics of habitable question answering systems with iterative formative evaluation

Bill Ogden, Ron Zacharski

New Mexico State University

CRL Computing Research Laboratory

Habitability

• Watt (1968): A language is considered habitable if users can express everything that is needed for a task using language they would expect the system to understand.

– Describes how easily, naturally, and effectively users can use language to express themselves within the constraints of a system language.

– If there are 26 ways that a user population would be likely to ask a question, a habitable system will process all 26.
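One way to make the "26 ways" criterion concrete is to treat habitability as coverage: the fraction of expected user phrasings that the system can process. The sketch below is only an illustration under that assumption; the function `habitability`, the toy `toy_understands` system, and the sample phrasings are hypothetical and are not part of the study.

```python
from typing import Callable, Iterable


def habitability(phrasings: Iterable[str], understands: Callable[[str], bool]) -> float:
    """Return the fraction of expected user phrasings the system can process."""
    phrasings = list(phrasings)
    if not phrasings:
        raise ValueError("need at least one phrasing")
    handled = sum(1 for p in phrasings if understands(p))
    return handled / len(phrasings)


def toy_understands(question: str) -> bool:
    """Hypothetical system that only accepts questions starting with 'what'."""
    return question.lower().startswith("what")


# Illustrative phrasings of roughly the same information need.
phrasings = [
    "What does Burundi export?",
    "Which countries export tea?",
    "Does Burundi export tea?",
    "What are the exports of Burundi?",
]

score = habitability(phrasings, toy_understands)
print(f"{score:.2f}")  # 0.50
```

Under this reading, a fully habitable system scores 1.0: it processes all 26 of the 26 likely phrasings.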

Today

• Progress with three methodologies

– User protocol analysis

– Wizard of Oz dialog

– Formative evaluation

• Collaboration

Project Goals

• Identify and/or develop new interface elements aiding Q&A visibility.

• Innovative user interfaces are best achieved through iterative user testing

• New evaluation methodologies will emerge from interactive Q&A user testing.

User Protocol Task

• e.g. “You are not sure about the safety of genetically engineered foods, and would like to find more information and research on this topic. Name four potential types of safety problems that have been raised.”

• 8 tasks, 6 users, 2-3 sessions.

• Recorded screen and voice.

• User protocols are now being analyzed

Dialog pre-evaluation

• We used primarily Wizard responses.

• Two NIST analysts, each with 10 tasks.

• Surprises:

– Users asked complex questions

– and were satisfied with simple answers.

Referring Expressions in the Wizard of Oz Study

• Kehler (2000) analyzed referring expressions in a Wizard of Oz (WOz) study that simulated a multimodal travel guide application.

• While reference resolution for human-human conversations is extremely difficult, reference resolution for human-computer conversations can be computed by means of a simple model.

• He found that all third person pronominal forms (24% of the referring expressions) referred to an entity introduced by an NP or displayed as an object on the display.

Referring expressions

• 217 total

• Third person pronominal forms accounted for only 3.6% of referring expressions

• Only 11% were anaphoric expressions, and 1/3 of these did not have NP antecedents.
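For a sense of scale, the percentages above can be converted back into approximate counts. The slide reports only the total and the percentages, so the counts below are rounded estimates derived from them, not figures from the study.

```python
TOTAL = 217  # referring expressions reported on the slide

# Rounded estimates derived from the reported percentages.
third_person = round(TOTAL * 0.036)      # 3.6% third person pronominal forms
anaphoric = round(TOTAL * 0.11)          # 11% anaphoric expressions
no_np_antecedent = round(anaphoric / 3)  # one third of those lacked NP antecedents

print(third_person, anaphoric, no_np_antecedent)  # 8 24 8
```

That is roughly 8 third person pronominal forms out of 217, against the 24% share that Kehler (2000) reported.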

Examples

Definite Determiner

the operating system used by microsoft has been infiltrated with a computer virus which disrupts the overall system operation. I would like to know what microsoft is doing to correct the situation.

Demonstrative Determiner

i would like to know what microsoft is doing to correct the situation and what statements they have made as to their progress in this problem solution

Examples

There was only one occurrence of a third person pronoun whose referent was introduced in an earlier sentence by the system:

Wizard: Perhaps the following is relevent: I can find reports of nine recent joint ventures with Japanese companies. Do you want to view all or just the first one

User: all of them

START Evaluation

• START Natural language system

• Developed by Boris Katz at MIT's Artificial Intelligence Laboratory

• Answers questions in English about the MIT AI Laboratory, geography, and assorted other topics

Iterative Formative Evaluation

• Short, empirical design evaluation studies

• Focus is on improvement, not validation

• Fixing ‘details’ often leads to expected productivity gains

START examples

• ===> which countries export tea?
• Sorry - I don't know.

• ===> what does Burundi export?
• Burundi
• Exports - commodities: coffee, tea, sugar, cotton, hides

• ===> Does Burundi export tea?
• I'm not following what you're saying. Please try it a different way

• ===> Does China export tea?
• I am sorry to say I don't know whether or not China exports tea.

START examples

• ===> Is Montana bigger than California?
• Unfortunately, I don't know whether Montana is bigger than California.

• ===> Is Mars bigger than Mercury?
• Mars has a diameter of 6,780 km, and Mercury has a diameter of 4,880 km.
• Mars' diameter is 1,900 km larger than Mercury's.
• Source: Planetary Sciences at the National Space Science Data Center

START examples

• ===> How big is Montana?
• The area of Montana is 145,556 square miles.
• Source: START KB

• ===> How big is California?
• There are 155,973 square miles in California.
• Source: START KB

START Evaluation

• Task: “Find the three biggest: countries in Asia, states in the US, planets.”

• Order made a difference.

• Users will echo the system's responses.

START Evaluation

• QUERY: What are the three largest countries in Asia?

• START's reply

• The following are the countries ranked in the top three places for area among countries in Asia.

• China

• India

• Indonesia

START Evaluation

• QUERY: What are the states ranked in the top three places for area among the states in the United States?

• START's reply

• The word RANKED may be misspelled. Please choose one of the following:
• Ranged
• Yanked
• Accept Word
• Abort
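The slides do not say how START picked "Ranged" and "Yanked". One observation that can be checked directly is that each is a single-letter substitution away from "ranked", which is what a simple edit-distance spelling checker would propose for a word missing from its lexicon. The sketch below computes Levenshtein distance to illustrate this; it is an assumption about the behavior, not a description of START's actual spelling component.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


for suggestion in ("ranged", "yanked"):
    print(suggestion, edit_distance("ranked", suggestion))
```

Both suggestions are at distance 1 from "ranked". The point for habitability is that the user had echoed the word from START's own earlier reply, yet the system did not accept it as input.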

Decomposition in START

• QUERY: What is the population of the capitol of Greenland?

• I don't know the answer to your question.

• QUERY: What is the capitol of Greenland?
• Nuuk is the capital of Greenland.

• QUERY: What is the population of Nuuk?
• Main Entry: Nuuk
Pronunciation: 'nük
Variant(s): or Godt·håb /'got-"hop/
Usage: geographical name
town capital of Greenland on SW coast population 12,181
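The exchange above shows the user decomposing a nested question by hand: first resolve the inner question, then substitute its answer into the outer one. A minimal sketch of doing that automatically follows. The `FACTS` table holds only the two facts quoted above, and `lookup` and `answer_nested` are hypothetical helpers, not START functions.

```python
from typing import Optional

# Toy knowledge base holding only the two facts quoted on the slide.
FACTS = {
    ("capital", "Greenland"): "Nuuk",
    ("population", "Nuuk"): "12,181",
}


def lookup(attribute: str, entity: str) -> Optional[str]:
    """Return the stored value for (attribute, entity), or None if unknown."""
    return FACTS.get((attribute, entity))


def answer_nested(outer: str, inner: str, entity: str) -> Optional[str]:
    """Answer 'what is the <outer> of the <inner> of <entity>?' in two steps."""
    intermediate = lookup(inner, entity)  # e.g. capital of Greenland -> Nuuk
    if intermediate is None:
        return None
    return lookup(outer, intermediate)    # e.g. population of Nuuk


print(answer_nested("population", "capital", "Greenland"))  # 12,181
```

If either step fails, the function returns `None`, which a dialog component could turn into a partial answer instead of a flat "I don't know".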

Solutions.

• Fix the system

• Use dialog
“I don’t know if Montana is bigger than Texas but I have information you can use to calculate it yourself”

• Give partial answers.
Montana area: 145,556 square miles
California area: 155,973 square miles
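The dialog and partial-answer options can be combined: when no direct comparison is stored, the system falls back to the component facts it does have. The sketch below assumes a small area table built from the two START replies quoted earlier; `compare_area` and its wording are hypothetical.

```python
from typing import Optional

# Areas in square miles, taken from the START replies shown earlier.
AREAS_SQ_MI = {"Montana": 145_556, "California": 155_973}


def compare_area(a: str, b: str, direct_answer: Optional[str] = None) -> str:
    """Return a direct answer if one exists, else a partial answer from known areas."""
    if direct_answer is not None:
        return direct_answer
    missing = [s for s in (a, b) if s not in AREAS_SQ_MI]
    if missing:
        return f"I don't know the area of {', '.join(missing)}."
    lines = [f"I don't know if {a} is bigger than {b}, "
             "but I have information you can use to calculate it yourself:"]
    lines += [f"{s} area: {AREAS_SQ_MI[s]:,} square miles" for s in (a, b)]
    return "\n".join(lines)


reply = compare_area("Montana", "California")
print(reply)

# "Fix the system" would go one step further and do the comparison itself.
montana_is_bigger = AREAS_SQ_MI["Montana"] > AREAS_SQ_MI["California"]
print(montana_is_bigger)  # False
```

The fallback keeps the user moving even when the system cannot do the comparison, which matches the earlier observation that users were satisfied with simple answers.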

Project Goals for Collaboration

– Identify the characteristics of habitable Aquaint systems

– Use prototype Aquaint systems for iterative formative evaluation

Remaining Issue

• Using surrogate users/tasks may not capture ‘real’ Q&A user behavior

• We are looking for ways to observe users who are working on their own questions.

– Lack of control will be offset by richness of behavior