
CLEF – Cross Language Evaluation Forum

Question Answering at CLEF 2003 (http://clef-qa.itc.it)

Bridging Languages for Question Answering: DIOGENE at CLEF-2003

Matteo Negri, Hristo Tanev and Bernardo Magnini

ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento – Italy

{negri, tanev, magnini}@itc.it


Outline

• The DIOGENE QA system for English

• Porting DIOGENE to monolingual Italian

• Porting DIOGENE to the bilingual (Italian-English) task

• Results at CLEF-2003

• Question Answering for Italian


System Architecture

[Diagram: the DIOGENE pipeline, from QUESTION to ANSWER]

• Question Processing: Tokenization and POS tagging, Multiwords Recognition, Answer Type Identification, Keywords Extraction, Keywords Translation (bilingual task), Keyword Expansion (using WordNet)

• Search: Query Composition, Search Engine (over the Document collection), Query Reformulation

• Answer Extraction: Named Entities Recognition, Candidate Answer Selection, Answer Validation and Ranking (using the WEB)
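To make the data flow between the three modules of the architecture (Question Processing, Search, Answer Extraction) concrete, here is a toy sketch. Everything in it is hypothetical: the stopword list, the answer-type rules, the regular-expression "entity" patterns and all function names are simplified stand-ins chosen for illustration, not DIOGENE's actual components. Keyword expansion, keyword translation, query reformulation and web-based answer validation are deliberately omitted.

```python
import re

# Hypothetical stand-ins for the real resources (POS tagger, rule sets, NER).
STOPWORDS = {"what", "when", "who", "where", "is", "was", "the", "a", "an",
             "of", "in", "did", "does"}
ANSWER_TYPE_RULES = {"when": "DATE", "who": "PERSON"}  # question word -> answer type
ENTITY_PATTERNS = {
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),   # four-digit years
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),  # two capitalised words
}


def words(text):
    """Lower-cased word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def process_question(question):
    """Question Processing: identify the answer type and extract keywords."""
    tokens = words(question)
    answer_type = ANSWER_TYPE_RULES.get(tokens[0], "OTHER")
    keywords = [t for t in tokens if t not in STOPWORDS]
    return answer_type, keywords


def search(keywords, collection):
    """Search: keep documents sharing a keyword, most overlapping first."""
    scored = [(len(set(keywords) & set(words(doc))), doc) for doc in collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [doc for overlap, doc in scored if overlap > 0]


def extract_answer(answer_type, keywords, documents):
    """Answer Extraction: pick an entity of the expected type from the
    document with the highest keyword overlap; None stands for NIL."""
    pattern = ENTITY_PATTERNS.get(answer_type)
    if pattern is None:
        return None
    best, best_overlap = None, 0
    for doc in documents:
        overlap = len(set(keywords) & set(words(doc)))
        for candidate in pattern.findall(doc):
            if overlap > best_overlap:
                best, best_overlap = candidate, overlap
    return best


def answer(question, collection):
    """Run the three modules in sequence."""
    answer_type, keywords = process_question(question)
    return extract_answer(answer_type, keywords, search(keywords, collection))


collection = [
    "The Berlin Wall fell in 1989 after weeks of protest.",
    "The Great Wall of China is very long.",
    "Berlin hosted the Olympic Games in 1936.",
]
print(answer("When did the Berlin Wall fall?", collection))  # 1989
```

Returning None when no candidate of the expected type is found mirrors, very loosely, the idea of a NIL answer; the real system relies on a full NER module and web-based validation rather than keyword overlap.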



Porting from English to Italian

Components involved: Tokenization and POS tagging, Multiwords Recognition, Answer Type Identification, Keywords Extraction, Keyword Expansion, Named Entities Recognition, with WordNet as a resource.

Resources and effort (p/m = person-months):

• POS tagger for Italian: already available

• Multiwords Recognition: list of about 5,000 Italian multiwords extracted from an electronic dictionary, 1 p/m

• Rules for Italian (about 250): 1 p/m

• Keyword Expansion: rules for Italian (e.g. verb expansion), 0.5 p/m

• Rules for Italian (about 300): 1 p/m

• WordNet for Italian, aligned with the English WordNet: already available
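Adding up the effort figures on this slide gives the new work needed for the Italian port, reading "p/m" as person-months; the items marked "already available" (POS tagger, Italian WordNet) contribute nothing:

$$
E_{\text{Italian}} = \underbrace{1}_{\text{multiwords}} + \underbrace{1}_{\text{rules, about 250}} + \underbrace{0.5}_{\text{expansion rules}} + \underbrace{1}_{\text{rules, about 300}} = 3.5 \ \text{p/m}
$$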


Porting to the Italian-English task

• Tokenization and POS tagging, Multiwords Recognition, Answer Type Identification, Keywords Extraction: already available from the monolingual Italian task

• Keyword Translation (new): word-by-word translation based on Italian/English aligned resources; sense ambiguities are addressed with statistical techniques.


Keyword Translation

• Method proposed by [Federico and Bertoldi, 2002]

• Resources: a bilingual dictionary, a target-language (English) corpus and a search engine

• Given a sequence of Italian keywords:

1. Extract all possible translations from the Italian-English dictionary and from the aligned wordnets

2. Estimate the probability of each translation sequence against the English target corpus
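The two steps above can be illustrated with a deliberately simplified sketch. It is not the statistical model of Federico and Bertoldi: as a stand-in, each candidate translation sequence is scored by the add-one-smoothed fraction of target-corpus sentences in which each pair of its words co-occurs, and all sequences are enumerated exhaustively. The dictionary entries, the toy corpus and the function names `pair_scorer` and `rank_translations` are hypothetical; the dictionary stands in for both the Italian-English dictionary and the aligned wordnets.

```python
import math
from itertools import combinations, product


def pair_scorer(corpus_sentences):
    """Return a function giving a smoothed log-probability that two English
    words co-occur in the same sentence of the target corpus."""
    sentences = [set(s.lower().split()) for s in corpus_sentences]
    n = len(sentences)

    def log_prob(a, b):
        both = sum(1 for s in sentences if a in s and b in s)
        return math.log((both + 1) / (n + 2))  # add-one smoothing

    return log_prob


def rank_translations(keywords, dictionary, corpus_sentences):
    """Rank every translation sequence of the Italian keywords, best first."""
    log_prob = pair_scorer(corpus_sentences)
    # Step 1: all candidate translations of each keyword
    # (a word missing from the dictionary is kept untranslated).
    options = [dictionary.get(k.lower(), [k.lower()]) for k in keywords]
    # Step 2: score each full sequence against the English target corpus.
    ranked = []
    for sequence in product(*options):
        score = sum(log_prob(a, b) for a, b in combinations(sequence, 2))
        ranked.append((score, list(sequence)))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


dictionary = {  # hypothetical Italian-English entries
    "squadra": ["team", "squad"],
    "calcio": ["football", "kick", "calcium"],
}
corpus = [
    "The football team won the match",
    "Milk contains calcium",
    "The team plays football on Sunday",
    "He gave the ball a kick",
]
best_score, best_sequence = rank_translations(["squadra", "calcio"], dictionary, corpus)[0]
print(best_sequence)  # ['team', 'football']
```

Because "team" and "football" co-occur in two corpus sentences while every other pair never does, the ambiguous "calcio" is resolved to "football". Exhaustive enumeration grows as the product of the number of options per keyword, so a practical system would need pruning or an n-best search instead.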


Results

All runs were submitted by ITC-irst.

MONOLINGUAL ITALIAN

Run name      Task              MRR              Q. with ≥1 right answer   NIL questions
                                strict  lenient  strict  lenient           returned  correct
irstex031mi   Exact answer      .422    .442     97      101               4         2
irstst032mi   50-byte answer    .449    .471     99      104               5         2

BILINGUAL ITALIAN-ENGLISH

Run name      Task              MRR              Q. with ≥1 right answer   NIL questions
                                strict  lenient  strict  lenient           returned  correct
irstex031bi   Exact answer      .322    .334     77      81                49        6
irstex032bi   Exact answer      .393    .400     90      92                28        5
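The MRR (mean reciprocal rank) and "questions with at least one right answer" columns can both be derived from per-question judgments. A minimal sketch, assuming up to three ranked answers per question, each judged R (right), U (unsupported), X (inexact) or W (wrong), with the strict score accepting only R and the lenient score also accepting U; the judgment labels, the toy run and the function name `evaluate_run` are assumptions for illustration.

```python
def evaluate_run(run, lenient=False):
    """Return (MRR, number of questions with at least one accepted answer).

    `run` holds one list of judgments per question, best-ranked answer first.
    """
    accepted = {"R", "U"} if lenient else {"R"}
    total_rr = 0.0
    answered = 0
    for judgments in run:
        for rank, judgment in enumerate(judgments, start=1):
            if judgment in accepted:
                total_rr += 1.0 / rank  # only the first accepted answer counts
                answered += 1
                break
    return total_rr / len(run), answered


run = [["R", "W", "W"], ["W", "U", "W"], ["W", "W", "R"], ["X", "W", "W"]]
print(evaluate_run(run))                # strict: MRR = 1/3, 2 questions
print(evaluate_run(run, lenient=True))  # lenient: MRR = 11/24 (about 0.458), 3 questions
```

A question with no accepted answer contributes a reciprocal rank of 0, which is why the lenient MRR can never be lower than the strict one.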


Situation for Italian Tasks

• Two groups from Italy participated in the TREC-2002 QA track

• Four groups expressed interest in QA for Italian

• Only one registered and took part in CLEF-2003

• Resources available for Italian:

  • Wordnets (aligned with the English WordNet)

  • Named Entities Recognition