
Page 1: The Multiple Language  Question Answering Track at CLEF 2003

CLEF – Cross Language Evaluation Forum

Question Answering at CLEF 2003 (http://clef-qa.itc.it)

The Multiple Language

Question Answering Track at CLEF 2003

Bernardo Magnini*, Simone Romagnoli*, Alessandro Vallin*

Jesús Herrera**, Anselmo Peñas**, Víctor Peinado**, Felisa Verdejo**

Maarten de Rijke***

* ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento - Italy

{magnini,romagnoli,vallin}@itc.it

** UNED, Spanish Distance Learning University, Madrid – Spain

{jesus.herrera,anselmo,victor,felisa}@lsi.uned.es

*** Language and Inference Technology Group, ILLC, University of Amsterdam - The Netherlands

[email protected]

Page 2: The Multiple Language  Question Answering Track at CLEF 2003

Outline

Overview of the Question Answering track at CLEF 2003

• Report on the organization of QA tasks

• Present and discuss the participants’ results

• Perspectives for future QA campaigns

Page 3: The Multiple Language  Question Answering Track at CLEF 2003

Question Answering

• QA: find the answer to an open domain question in a large collection of documents

INPUT: questions (instead of keyword-based queries)

OUTPUT: answers (instead of documents)

• QA track at TREC

– Mostly fact-based questions

Question: Who invented the electric light?

Answer: Edison

• Scientific Community

– NLP and IR

– AQUAINT program in USA

• QA as an application scenario

Page 4: The Multiple Language  Question Answering Track at CLEF 2003

Multilingual QA

Purposes:

• Answers may be found in languages different from the language of the question

• Interest in QA systems for languages other than English

• Push the QA community to design truly multilingual systems

• Check/improve the portability of the technologies implemented in current English QA systems

• Creation of reusable resources and benchmarks for further multilingual QA evaluation

Page 5: The Multiple Language  Question Answering Track at CLEF 2003

QA at CLEF 2003 - Organization

“QA@CLEF” WEB SITE ( http://clef-qa.itc.it )

CLEF QA MAILING LIST ( [email protected] )

GUIDELINES FOR THE TRACK (following the model of TREC 2001)

Page 6: The Multiple Language  Question Answering Track at CLEF 2003

Tasks at CLEF 2003

Each task consists of 200 questions posed against a target corpus; systems return either exact answers or 50-byte answer strings.

Page 7: The Multiple Language  Question Answering Track at CLEF 2003

QA Tasks at CLEF 2003

             Monolingual                       Bilingual against English
Language     Q-set          Assessment         Q-set                     Assessment
Italian      ITC-irst       ITC-irst           ITC-irst                  NIST
Dutch        U. Amsterdam   U. Amsterdam       ITC-irst, U. Amsterdam    NIST
Spanish      UNED           UNED               ITC-irst, UNED            NIST
French       -              -                  ITC-irst, U. Montreal     NIST
German       -              -                  ITC-irst, DFKI            NIST

Page 8: The Multiple Language  Question Answering Track at CLEF 2003

Tasks at CLEF 2003

The same task matrix as on the previous slide, annotated with the number of participating groups per task:

Monolingual Italian: 1        Bilingual Italian: 1
Monolingual Dutch: 1          Bilingual Dutch: 0
Monolingual Spanish: 1        Bilingual Spanish: 1
                              Bilingual French: 3
                              Bilingual German: 1

Page 9: The Multiple Language  Question Answering Track at CLEF 2003

Bilingual against English

[Diagram] Question extraction produces 200 English questions (about 1 person-month per 200 questions); these are translated into Italian (about 2 person-days per 200 questions); the QA system takes the Italian questions and the English text collection and returns English answers; the answers are then assessed (about 4 person-days per run, i.e. 600 answers).

Page 10: The Multiple Language  Question Answering Track at CLEF 2003

Document Collections

Corpora licensed by CLEF in 2002:

• Dutch: Algemeen Dagblad and NRC Handelsblad (1994 and 1995), used for the monolingual Dutch task

• Italian: La Stampa and the SDA press agency (1994), used for the monolingual Italian task

• Spanish: EFE press agency (1994), used for the monolingual Spanish task

• English: Los Angeles Times (1994), used as the target collection for the bilingual tasks

Page 11: The Multiple Language  Question Answering Track at CLEF 2003

Creating the Test Collection

[Diagram] Starting from the CLEF topics, each group produced 150 question/answer pairs in its own language (150 Dutch, 150 Italian, 150 Spanish), which form the monolingual test sets, together with their English translations (150 Dutch/English, 150 Italian/English, 150 Spanish/English). Through question sharing, each group then received the 300 English questions originating from the other two languages (ILLC: 300 Italian+Spanish, ITC-irst: 300 Dutch+Spanish, UNED: 300 Italian+Dutch) and searched for answers in its own target language. Merging all the data produced the DISEQuA corpus.

Page 12: The Multiple Language  Question Answering Track at CLEF 2003

Questions

200 fact-based questions for each task:

- queries related to events that occurred in 1994 and/or 1995, i.e. the years covered by the target corpora;

- coverage of different categories of questions: date, location, measure, person, object, organization, other;

- questions were not guaranteed to have an answer in the corpora: 10% of the test sets required the answer string “NIL”

Page 13: The Multiple Language  Question Answering Track at CLEF 2003

Questions

200 fact-based questions for each task:

- queries related to events that occurred in 1994 and/or 1995, i.e. the years covered by the target corpora

- coverage of different categories of questions (date, location, measure, person, object, organization, other)

- questions were not guaranteed to have an answer in the corpora: 10% of the test sets required the answer string “NIL”

- not considered this year: definition questions (“Who/What is X?”), yes/no questions, list questions

Page 14: The Multiple Language  Question Answering Track at CLEF 2003

Answers

Participants were allowed to submit up to three answers per question and up to two runs:

- answers must be either exact (i.e. contain just the minimal information) or strings up to 50 bytes long

- answers must be supported by a document

- answers must be ranked by confidence

Answers were judged by human assessors, according to four categories:

• CORRECT (R)
• UNSUPPORTED (U)
• INEXACT (X)
• INCORRECT (W)

Page 15: The Multiple Language  Question Answering Track at CLEF 2003

Judging the Answers

Questions, judged responses and comments:

What museum is directed by Henry Hopkins?
  W  1  irstex031bi  1  3253  LA011694-0094  Modern Art
  U  1  irstex031bi  2  1776  LA011694-0094  UCLA
  X  1  irstex031bi  3  1251  LA042294-0050  Cultural Center
Comment: the second answer was correct, but the retrieved document did not support it. The third response missed part of the name and was judged inexact.

Where did the Purussaurus live before becoming extinct?
  W  2  irstex031bi  1  9  NIL
Comment: the system erroneously “believed” that the question had no answer in the corpus, or could not find one.

When did Shapour Bakhtiar die?
  R  3  irstex031bi  1  484  LA012594-0239  1991
  W  3  irstex031bi  2  106  LA012594-0239  Monday
Comment: for questions asking for the date of an event, the year alone was often regarded as sufficient.

Who is John J. Famalaro accused of having killed?
  W  4  irstex031bi  1  154  LA072294-0071  Clark
  R  4  irstex031bi  2  117  LA072594-0055  Huber
  W  4  irstex031bi  3  110  LA072594-0055  Department
Comment: the second answer, which returned the victim’s last name, was considered correct, since no other person named “Huber” was mentioned in the retrieved document.
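As an aside, each judged response line above follows a fixed field layout, and a minimal parsing sketch is given below. The field interpretation (judgment, question number, run name, answer rank, a numeric score, document id, answer string) is inferred from these examples; in particular, the meaning of the numeric field is an assumption rather than something stated on the slide.

from dataclasses import dataclass

@dataclass
class JudgedAnswer:
    judgment: str    # R, U, X or W
    question: int    # question number
    run: str         # run name, e.g. irstex031bi
    rank: int        # rank of the answer within the run (1-3)
    score: int       # numeric field; assumed here to be a system-assigned score
    docid: str       # supporting document id, or "NIL"
    answer: str      # answer string (empty for NIL responses)

def parse_judged_line(line: str) -> JudgedAnswer:
    # Split into at most 7 fields so that multi-word answer strings stay intact.
    parts = line.split(None, 6)
    judgment, question, run, rank, score, docid = parts[:6]
    answer = parts[6] if len(parts) > 6 else ""   # NIL responses carry no answer string
    return JudgedAnswer(judgment, int(question), run, int(rank),
                        int(score), docid, answer)

# Example taken from the slide above:
print(parse_judged_line("W 1 irstex031bi 1 3253 LA011694-0094 Modern Art"))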

Page 16: The Multiple Language  Question Answering Track at CLEF 2003

Evaluation Measures

The score for each question was the reciprocal of the rank of the first answer to be found correct; if no correct answer was returned, the score was 0.

The total score, or Mean Reciprocal Rank (MRR), was the mean score over all questions.

In STRICT evaluation only correct (R) answers scored points.

In LENIENT evaluation the unsupported (U) answers were considered correct, as well.
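To make the measure concrete, here is a minimal sketch of the strict and lenient scoring described above, assuming each question's ranked answers are given as judgment codes (R, U, X, W); the function names are invented for the example.

def question_score(judgments, lenient=False):
    # Reciprocal rank of the first correct answer, or 0 if none is correct.
    # judgments: judgment codes ('R', 'U', 'X', 'W') in the system's ranking order.
    accepted = {"R", "U"} if lenient else {"R"}   # lenient also accepts unsupported answers
    for rank, judgment in enumerate(judgments, start=1):
        if judgment in accepted:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(run, lenient=False):
    # Mean of the per-question scores over all questions in a run.
    return sum(question_score(j, lenient) for j in run) / len(run)

# Example: three questions with up to three ranked answers each.
run = [["W", "R", "W"], ["R"], ["W", "U", "W"]]
print(mean_reciprocal_rank(run))                # strict:  (0.5 + 1 + 0) / 3 = 0.50
print(mean_reciprocal_rank(run, lenient=True))  # lenient: (0.5 + 1 + 0.5) / 3 ≈ 0.67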

Page 17: The Multiple Language  Question Answering Track at CLEF 2003

Participants

Group     Affiliation                               Task(s)                                        Run name(s)
DLSI-UA   University of Alicante, Spain             Monolingual Spanish                            alicex031ms, alicex032ms
UVA       University of Amsterdam, The Netherlands  Monolingual Dutch                              uamsex031md, uamsex032md
ITC-irst  Italy                                     Monolingual Italian, Bilingual Italian         irstex031mi, irstst032mi, irstex031bi, irstex032bi
ISI       University of Southern California, USA    Bilingual Spanish (Bilingual Dutch: no runs)   isixex031bs, isixex032bs
DFKI      Germany                                   Bilingual German                               dfkist031bg
CS-CMU    Carnegie Mellon University, USA           Bilingual French                               lumoex031bf, lumoex032bf
DLTG      University of Limerick, Ireland           Bilingual French                               dltgex031bf, dltgex032bf
RALI      University of Montreal, Canada            Bilingual French                               udemst031bf, udemex032bf

Page 18: The Multiple Language  Question Answering Track at CLEF 2003

Participants in past QA tracks

Comparison of the number and place of origin of the participants in the past TREC QA tracks and in this year's CLEF track:

Track       US & Canada   Europe   Asia   Australia   Total participants   Submitted runs
TREC-8           13          3       3        1               20                 46
TREC-9           14          7       6        -               27                 75
TREC-10          19          8       8        -               35                 67
TREC-11          16         10       6        -               32                 67
CLEF 2003         3          5       -        -                8                 17

Page 19: The Multiple Language  Question Answering Track at CLEF 2003

Performances at TREC-QA

• Evaluation metric: Mean Reciprocal Rank (MRR), based on 1 / rank of the first correct answer

                              TREC-8   TREC-9   TREC-10
Best result                     66%      58%      67%
Average over submitted runs     25%      24%      23%

Page 20: The Multiple Language  Question Answering Track at CLEF 2003

Results - EXACT ANSWERS RUNS

MONOLINGUAL TASKS

Group     Task                 Run name     MRR (strict/lenient)   Q. with a right answer (strict/lenient)   NIL returned / correct
DLSI-UA   Monolingual Spanish  alicex031ms  .307 / .320            80 / 87                                    21 / 5
DLSI-UA   Monolingual Spanish  alicex032ms  .296 / .317            70 / 77                                    21 / 5
ITC-irst  Monolingual Italian  irstex031mi  .422 / .442            97 / 101                                   4 / 2
UVA       Monolingual Dutch    uamsex031md  .298 / .317            78 / 82                                    200 / 17
UVA       Monolingual Dutch    uamsex032md  .305 / .335            82 / 89                                    200 / 17

Page 21: The Multiple Language  Question Answering Track at CLEF 2003

Results - EXACT ANSWERS RUNS

MONOLINGUAL TASKS

[Bar chart: strict and lenient MRR (0 to 0.6) for each monolingual exact-answer run: alicex031ms, alicex032ms, irstex031mi, uamsex031md, uamsex032md]

Page 22: The Multiple Language  Question Answering Track at CLEF 2003

Results - EXACT ANSWERS RUNS

CROSS-LANGUAGE TASKS

Group     Task               Run name     MRR (strict/lenient)   Q. with a right answer (strict/lenient)   NIL returned / correct
ISI       Bilingual Spanish  isixex031bs  .302 / .328            69 / 77                                    4 / 0
ISI       Bilingual Spanish  isixex032bs  .271 / .307            68 / 78                                    4 / 0
ITC-irst  Bilingual Italian  irstex031bi  .322 / .334            77 / 81                                    49 / 6
ITC-irst  Bilingual Italian  irstex032bi  .393 / .400            90 / 92                                    28 / 5
CS-CMU    Bilingual French   lumoex031bf  .153 / .170            38 / 42                                    92 / 8
CS-CMU    Bilingual French   lumoex032bf  .131 / .149            31 / 35                                    91 / 7
DLTG      Bilingual French   dltgex031bf  .115 / .120            23 / 24                                    119 / 10
DLTG      Bilingual French   dltgex032bf  .110 / .115            22 / 23                                    119 / 10
RALI      Bilingual French   udemex032bf  .140 / .160            38 / 42                                    3 / 1

Page 23: The Multiple Language  Question Answering Track at CLEF 2003

Results - EXACT ANSWERS RUNS

CROSS-LANGUAGE TASKS

[Bar chart: strict and lenient MRR (0 to 0.6) for each cross-language exact-answer run: isixex031bs, isixex032bs, irstex031bi, irstex032bi, lumoex031bf, lumoex032bf, dltgex031bf, dltgex032bf, udemex032bf]

Page 24: The Multiple Language  Question Answering Track at CLEF 2003

Results - 50 BYTES ANSWERS RUNS

MONOLINGUAL TASKS

Group     Task                 Run name     MRR (strict/lenient)   Q. with a right answer (strict/lenient)   NIL returned / correct
ITC-irst  Monolingual Italian  irstst032mi  .449 / .471            99 / 104                                   5 / 2

Page 25: The Multiple Language  Question Answering Track at CLEF 2003

Results - 50 BYTES ANSWERS RUNS

CROSS-LANGUAGE TASKS

Group   Task              Run name     MRR (strict/lenient)   Q. with a right answer (strict/lenient)   NIL returned / correct
DFKI    Bilingual German  dfkist031bg  .098 / .103            29 / 30                                    18 / 0
RALI    Bilingual French  udemst031bf  .213 / .220            56 / 58                                    4 / 1

Page 26: The Multiple Language  Question Answering Track at CLEF 2003

Average Results in Different Tasks

EXACT ANSWERS - MONOLINGUAL (5 runs)

[Bar chart: number of questions (0 to 200) whose first correct answer appeared at rank 1, 2, 3 or was not found, under strict and lenient evaluation]

EXACT ANSWERS - BILINGUAL (9 runs)

[Bar chart: the same breakdown for the nine bilingual exact-answer runs]

Page 27: The Multiple Language  Question Answering Track at CLEF 2003

Approaches in CL QA

Two main approaches were used in the Cross-Language QA systems:

1. Translation of the question into the target language (i.e. the language of the document collection), followed by question processing and answer extraction in that language.

2. Question processing in the source language to extract information (such as keywords, question focus, expected answer type, etc.), then translation and expansion of the extracted data, followed by answer extraction.

Page 28: The Multiple Language  Question Answering Track at CLEF 2003

Approaches in CL QA

The participating cross-language groups (ITC-irst, RALI, DFKI, ISI, CS-CMU, University of Limerick) each adopted one of the two approaches above; a rough sketch of the two pipelines is given below.
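As an illustration of the two strategies, the following sketch contrasts them in simplified Python; all helpers (translate, translate_terms, expand, analyze_question, retrieve, extract_answer) are hypothetical stand-ins, not any participant's actual components.

# Hypothetical stand-ins so the sketch runs; real systems would plug in
# machine translation, question analysis, retrieval and answer extraction.
def translate(text, target_lang):            # full machine translation of the question
    return text
def translate_terms(terms, target_lang):     # dictionary / term-by-term translation
    return terms
def expand(terms):                           # e.g. add synonyms or morphological variants
    return terms
def analyze_question(question):              # keywords, focus, expected answer type
    return {"keywords": question.lower().split(), "answer_type": "UNKNOWN"}
def retrieve(keywords, collection):          # document / passage retrieval
    return [doc for doc in collection["docs"] if any(k in doc.lower() for k in keywords)]
def extract_answer(docs, analysis):          # pick an answer string from the candidates
    return docs[0] if docs else "NIL"

# Approach 1: translate the whole question into the collection language,
# then run an ordinary monolingual QA pipeline on the translation.
def qa_translate_question(question_src, collection):
    question_tgt = translate(question_src, collection["lang"])
    analysis = analyze_question(question_tgt)
    return extract_answer(retrieve(analysis["keywords"], collection), analysis)

# Approach 2: analyse the question in the source language and translate
# (and expand) only the extracted data before searching the collection.
def qa_translate_keywords(question_src, collection):
    analysis = analyze_question(question_src)
    keywords_tgt = expand(translate_terms(analysis["keywords"], collection["lang"]))
    return extract_answer(retrieve(keywords_tgt, collection), analysis)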

Page 29: The Multiple Language  Question Answering Track at CLEF 2003

Conclusions

A pilot evaluation campaign for multiple-language Question Answering systems has been carried out.

Five European languages were considered: three monolingual tasks and five bilingual tasks against an English collection were activated.

Considering the differences between the tasks, the results are comparable with those of QA at TREC.

A corpus of 450 questions, each in four languages and with at least one known answer in the respective text collection, has been built.

This year's experience was very positive: we intend to continue with QA at CLEF 2004.

Page 30: The Multiple Language  Question Answering Track at CLEF 2003

Perspective for Future QA Campaigns

• Organization issues:

• Promote larger participation

• Collaboration with NIST

• Financial issues:

• Find a sponsor: ELRA, the new CELCT center, …

• Tasks (to be discussed)

• Update to TREC 2003: definition questions, list questions

• Consider only exact answers: the 50-byte format found little favor

• Introduce new languages: easy to do in the cross-language task

• New steps toward multilinguality: English questions against other-language collections; a small set of full cross-language tasks (e.g. Italian/Spanish).

Page 31: The Multiple Language  Question Answering Track at CLEF 2003

Creation of the Question Set

1. Find 200 questions for each language (Dutch, Italian, Spanish), based on CLEF-2002 topics, with at least one answer in the respective corpus.

2. Translate each question into English, and from English into the other two languages.

3. Find answers in the corpora of the other languages (e.g. a Dutch question was translated and processed in the Italian text collection).

4. The result is a corpus of 450 questions, each in four languages, with at least one known answer in the respective text collection. More details in the paper and in the Poster.

5. Questions with at least one answer in all the corpora were selected for the final question set.
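Purely as an illustration of the outcome of steps 1-5, one might represent each entry of the resulting multilingual question set as a record holding the question in all four languages together with its known answers per corpus, from which the final set is filtered; the field names below are invented for this sketch and are not the actual DISEQuA format.

from dataclasses import dataclass, field

@dataclass
class MultilingualQuestion:
    qid: int
    text: dict = field(default_factory=dict)     # language code -> question string (e.g. DUT, ENG, ITA, SPA)
    answers: dict = field(default_factory=dict)  # language code -> list of (answer string, document id) pairs

def has_answer_everywhere(q: MultilingualQuestion,
                          languages=("DUT", "ITA", "SPA")) -> bool:
    # Step 5 above: keep only questions with at least one known answer in every target corpus.
    return all(q.answers.get(lang) for lang in languages)

# Hypothetical usage: filter the shared pool of questions down to the final question set.
# final_set = [q for q in shared_pool if has_answer_everywhere(q)]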