CLEF – Cross Language Evaluation Forum
Question Answering at CLEF 2003 (http://clef-qa.itc.it)
Bridging Languages for Question Answering: DIOGENE at CLEF-2003
Matteo Negri, Hristo Tanev and Bernardo Magnini
ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento – Italy
{negri, tanev, magnini}@itc.it
Outline
• DIOGENE QA English system
• Porting DIOGENE to monolingual Italian
• Porting DIOGENE to the bilingual task
• Results at CLEF-2003
• Question Answering for Italian
System Architecture
[Figure: DIOGENE pipeline from QUESTION to ANSWER in three stages]
• Question Processing: Tokenization and POS tagging, Multiwords Recognition, Keywords Extraction, Keywords Translation, Answer Type Identification, Keyword Expansion
• Search: Query Composition, Query Reformulation, Search Engine (over the Document collection)
• Answer Extraction: Named Entities Recognition, Candidate Answer Selection, Answer Validation and Ranking
• External resources: WordNet, WEB
Porting from English to Italian
Modules involved: Tokenization and POS tagging, Multiwords Recognition, Keywords Extraction, Answer Type Identification, Keyword Expansion, Named Entities Recognition, WordNet
Resources and effort (p/m = person-months):
• POS tagger for Italian: already available
• List of Italian multiwords (about 5000) extracted from an electronic dictionary: 1 p/m
• Rules for Italian (about 250): 1 p/m
• Rules for Italian (e.g. verb expansion): 0.5 p/m
• Rules for Italian (about 300): 1 p/m
• WordNet for Italian aligned with the English WordNet: already available
Porting to the Italian-English task
• Multiwords Recognition, Answer Type Identification, Tokenization and POS tagging, Keywords Extraction: already available from the monolingual Italian task
• Keyword Translation: word-by-word translation based on Italian/English aligned resources; sense ambiguities are addressed with statistical techniques
Keyword Translation
• Method proposed by [Federico and Bertoldi, 2002]
• Bilingual dictionary, target corpus and search engine
• Given a sequence of Italian keywords:
1. Extract all possible translations from the Italian-English dictionary and from the aligned wordnets
2. Estimate the probability of each translation sequence against the English target corpus
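The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model of [Federico and Bertoldi, 2002]: the dictionary and corpus are invented toy data, the smoothed document co-occurrence score stands in for the real probability estimate, and keeping the single highest-scoring sequence is also an assumption. All names (`DICTIONARY`, `TARGET_CORPUS`, `score`, `best_translation`) are hypothetical.

```python
import math
from itertools import combinations, product

# Hypothetical toy resources, invented for illustration only.
DICTIONARY = {
    "campo": ["field", "camp"],
    "calcio": ["football", "calcium", "kick"],
}
TARGET_CORPUS = [
    "the football field was crowded",
    "a football match on a muddy field",
    "calcium is good for bones",
    "the summer camp opened",
]


def tokenize_corpus(documents):
    """Represent each target-language document as a set of lowercase tokens."""
    return [set(doc.lower().split()) for doc in documents]


def score(sequence, token_sets):
    """Assumed stand-in score: smoothed log document frequency of each word
    plus smoothed log co-occurrence frequency of each word pair."""
    n_docs = len(token_sets)
    total = 0.0
    for word in sequence:
        df = sum(1 for tokens in token_sets if word in tokens)
        total += math.log((df + 1) / (n_docs + 1))
    for a, b in combinations(sequence, 2):
        co = sum(1 for tokens in token_sets if a in tokens and b in tokens)
        total += math.log((co + 1) / (n_docs + 1))
    return total


def best_translation(keywords, dictionary, token_sets):
    # Step 1: collect every candidate translation of every keyword
    # (a keyword missing from the dictionary is kept untranslated).
    options = [dictionary.get(k, [k]) for k in keywords]
    # Step 2: score each full translation sequence against the target corpus.
    return max(product(*options), key=lambda seq: score(seq, token_sets))


token_sets = tokenize_corpus(TARGET_CORPUS)
best = best_translation(["campo", "calcio"], DICTIONARY, token_sets)
print(best)  # ('field', 'football')
```

The number of candidate sequences grows as the product of the translation counts per keyword, so exhaustive enumeration as done here is only practical for short keyword lists.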
Results
GROUP     TASK             RUN NAME     MRR              Q. with at least one right answer  NIL questions
                                        strict  lenient  strict  lenient                    returned  correctly returned
MONOLINGUAL ITALIAN
ITC-irst  Exact answer     irstex031mi  .422    .442     97      101                        4         2
ITC-irst  50 bytes answer  irstst032mi  .449    .471     99      104                        5         2
BILINGUAL ITALIAN
ITC-irst  Exact answer     irstex031bi  .322    .334     77      81                         49        6
ITC-irst  Exact answer     irstex032bi  .393    .400     90      92                         28        5
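For readers unfamiliar with the MRR column: Mean Reciprocal Rank averages, over all questions, the reciprocal of the rank of the first right answer (0 when no right answer is returned). A minimal sketch follows, assuming a hypothetical input format of one rank-ordered list of judgments per question; the judgments are made up and are not the CLEF-2003 data.

```python
def mean_reciprocal_rank(judgments):
    """judgments: one list of booleans per question, in rank order
    (True means the answer at that rank was judged right)."""
    total = 0.0
    for ranked in judgments:
        for rank, is_right in enumerate(ranked, start=1):
            if is_right:
                total += 1.0 / rank  # only the first right answer counts
                break
    return total / len(judgments)


# Made-up judgments for four questions with three ranked answers each.
toy_judgments = [
    [True, False, False],   # right at rank 1 -> 1
    [False, True, False],   # right at rank 2 -> 1/2
    [False, False, False],  # no right answer -> 0
    [False, False, True],   # right at rank 3 -> 1/3
]
mrr = mean_reciprocal_rank(toy_judgments)
print(f"{mrr:.3f}")  # 0.458
```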
Situation for Italian Tasks
• Two groups from Italy participated in the TREC-2002 QA track
• Four groups expressed interest in QA for Italian
• Only one registered and took part in CLEF-2003
• Resources available for Italian:
• Wordnets (aligned with the English WordNet)
• Named Entities Recognition