22 August 2003 CLEF 2003 Answering Spanish Questions from English Documents Abdessamad Echihabi, Douglas W. Oard, Daniel Marcu, Ulf Hermjakob USC Information Sciences Institute


Page 1:

22 August 2003 CLEF 2003

Answering Spanish Questions from English Documents

Abdessamad Echihabi, Douglas W. Oard, Daniel Marcu, Ulf Hermjakob

USC Information Sciences Institute

Page 2:

Outline

• Development collection

• Choosing a cross-language approach

• TextMap-TMT architecture

• What did we learn?

Page 3:

Cross-Language QA

• Evaluation conditions:
  – English documents (LA Times)
  – 200 questions (we chose Spanish)
  – Exact answers

• ISI development collection:
  – English documents (TREC-2003 QA track)
  – 100 Spanish questions (translated from TREC)
  – Answer patterns
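Answer patterns like those in the ISI development collection are typically regular expressions matched against a system's answer string. A minimal sketch of that style of scoring (the function name and the example pattern are hypothetical, not from the slides):

```python
import re

def score_run(answers, patterns):
    """Count system answers that match any gold regex answer pattern,
    in the style of TREC-style answer-pattern scoring.

    answers:  {question_id: answer_string} produced by the system
    patterns: {question_id: [regex, ...]} gold answer patterns
    """
    correct = 0
    for qid, answer in answers.items():
        if any(re.search(p, answer) for p in patterns.get(qid, [])):
            correct += 1
    return correct

# Hypothetical answer pattern for "When did Alaska become a state?"
gold = {"q1": [r"(January 3,? )?1959"]}
print(score_run({"q1": "Alaska became a state in 1959"}, gold))  # prints 1
```

This kind of automatic scoring is what makes a 100-question development collection practical to iterate on.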

Page 4:

Design Space

• Architecture
  – Question translation + English QA
  – Document translation + Spanish QA
  – Mix of language-specific + translation components

• Translation approaches
  – Statistical MT, trained on European Parliament proceedings
  – Transfer-method MT (Systran on the Web)
  – Human translation (as an upper bound)
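The first architecture option, question translation followed by English QA, can be sketched as a simple pipeline. The MT and QA components below are toy stand-ins (a lookup table and word-overlap sentence selection); the real system used Systran/Contex and the TextMap QA engine:

```python
def mt_es_en(question):
    # Toy stand-in for the Spanish-to-English MT step.
    toy_table = {"¿Cuándo se convirtió Alaska en estado?":
                 "When did Alaska become a state?"}
    return toy_table[question]

def english_qa(question, sentences):
    # Toy English QA: pick the sentence with the largest word overlap.
    q_words = set(question.lower().rstrip("?").split())
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

def qa_via_question_translation(spanish_q, sentences):
    # Design option 1: translate the question, then run English QA.
    return english_qa(mt_es_en(spanish_q), sentences)

docs = ["Alaska became a state in 1959.",
        "Hawaii followed later the same year."]
print(qa_via_question_translation("¿Cuándo se convirtió Alaska en estado?",
                                  docs))
```

The appeal of this option is that only the question crosses the language barrier, so the expensive document-side processing stays monolingual.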

Page 5:

TextMap-TMT Architecture

[Architecture diagram] A Spanish question is translated to English by Systran or Contex, producing an English question together with an answer type (e.g., DATE, ISLAND, BASEBALL-SPORTS-TEAM), reformulation patterns (e.g., ["Alaska became a state on"] OR ["Alaska became a state in"] OR …), and a question parse tree; translation fix-up rules repair MT output (e.g., cuanto → whichever, whichever → "how many"). Two retrieval paths run in parallel: Web IR over the web and CLEF IR over the CLEF corpus, each feeding its top 300 sentences to an answer-pinpointing module. The web path yields the top 3 answers and the CLEF path the top 5. An optional answer-validation step checks candidates against the web (e.g., [Alaska (49)] [state (7)] [{January 3 1959} OR {1867} OR {1959} (3)]) before the final answer is selected.
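The validation idea in the diagram can be sketched as frequency counting: candidates that co-occur with a reformulation pattern in retrieved text score higher. The function below is a simplified stand-in (the real system used web hit counts, not a toy sentence list):

```python
import re
from collections import Counter

def validate(candidates, reformulations, sentences):
    """Rank candidate answers by how often they co-occur with a
    question reformulation in retrieved text."""
    counts = Counter()
    for sent in sentences:
        if any(r.lower() in sent.lower() for r in reformulations):
            for cand in candidates:
                if re.search(re.escape(cand), sent, re.IGNORECASE):
                    counts[cand] += 1
    return counts.most_common()

reforms = ["Alaska became a state on", "Alaska became a state in"]
corpus = ["Alaska became a state on January 3, 1959.",
          "Alaska became a state in 1959, as the 49th state.",
          "Russia sold Alaska to the United States in 1867."]
print(validate(["1959", "1867"], reforms, corpus))  # [('1959', 2)]
```

Here the distractor "1867" never co-occurs with a reformulation, so the correct date wins, mirroring how the counts in the diagram separate 1959 from 1867.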

Page 6:

Development Test Results

[Bar chart, x-axis "Correct Answers", 0–100:]

SMT-all: 6
TMT-all: 20
TMT-search: 28
TMT-pinpoint: 31
English: 35

Page 7:

Official Results

              Top-1, Validated   Top-1, Not validated   Top-3, Not validated
Supported     43                 53                     69
Unsupported   5                  3                      8
Wrong         157                144                    123

Page 8:

Lessons Learned

• Cross-Language QA is a tractable problem
  – Better than 25% @ top 1, almost 40% @ top 3!

• Our best MT systems are statistical
  – But our best QA systems are heavily rule-based

• Virtually every component needs to be redone
  – As complex as making a new monolingual system

• Strong synergy with CLIR is possible
  – Web search, local collection search