AQA: a multilingual Anaphora annotation scheme for Question Answering

E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra

[eboldrini/patricio/borja/marcel/]@dlsi.ua.es [email protected]

CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages

Outline

• Introduction

• Corpus

• Principles

• Previous work

• Problematic cases

• Evaluation

• Conclusion

Introduction: interaction

• AQA: a multilingual annotation scheme for anaphora resolution that can be used in machine learning to improve QA systems

• To understand and annotate the way anaphora is used in each language

• To detect the antecedent of each anaphora and find the correct answer

• INTERACTION between the user and the system

Introduction: languages

• Languages: Italian, Spanish, English

• Advantages: successful participation in competitions in which the question is formulated in one language and the system returns the answer in another

• Disadvantages: the languages have different characteristics

<t><q id="q065">¿Qué medio de transporte se utilizó en la Expedición Kon-tiki?</q><q id="q066">¿Cuántas personas <link rel="dir" status="ok" type="pron" ref="" ant="a" refq="q065">la</link> tripulaban?</q></t>

<t><q id="q265">Quale mezzo di trasporto venne usato nella spedizione Kon-Tiki?</q><q id="q266">Quanti membri d'equipaggio aveva <link rel="dir" status="ok" type="elips" ref="" ant="a" refq="q265">0</link>?</q></t>

<t><q id="q465">What transport was used in the Kon-Tiki Expedition?</q><q id="q466">How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>?</q></t>
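The three annotations above can be compared programmatically. The following is a minimal sketch, assuming the samples are well-formed XML with straight quotes and using only Python's standard library; the function name `anaphor_types` and the language keys are illustrative and not part of AQA.

```python
import xml.etree.ElementTree as ET

# The three Kon-Tiki follow-up annotations from the slide, one per language.
# The keys "es", "it" and "en" are labels chosen for this sketch.
SAMPLES = {
    "es": """<t><q id="q065">¿Qué medio de transporte se utilizó en la Expedición Kon-tiki?</q><q id="q066">¿Cuántas personas <link rel="dir" status="ok" type="pron" ref="" ant="a" refq="q065">la</link> tripulaban?</q></t>""",
    "it": """<t><q id="q265">Quale mezzo di trasporto venne usato nella spedizione Kon-Tiki?</q><q id="q266">Quanti membri d'equipaggio aveva <link rel="dir" status="ok" type="elips" ref="" ant="a" refq="q265">0</link>?</q></t>""",
    "en": """<t><q id="q465">What transport was used in the Kon-Tiki Expedition?</q><q id="q466">How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>?</q></t>""",
}


def anaphor_types(samples):
    """Map each language to a list of (anaphor text, anaphora type) pairs."""
    result = {}
    for lang, xml_text in samples.items():
        root = ET.fromstring(xml_text)
        # Every <link> element marks one anaphor; its type attribute names the kind.
        result[lang] = [(link.text, link.get("type")) for link in root.iter("link")]
    return result


for lang, pairs in anaphor_types(SAMPLES).items():
    print(lang, pairs)
```

In these samples the Italian question realises the anaphor as a zero element (type="elips"), where the Spanish and English questions use a pronoun (type="pron").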

Corpus

• Corpus for CLEF 2008 in English, Italian and Spanish

• 200 questions per language

• Topic-related questions

• Categories of questions: factoid, definition, and list

Principles: annotated elements

• Each group of questions has a topic (the <t> element)

<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>

• If there is a subtopic, we mark it (the <subt> element)

• Each question (question/answer pair) has a number (the id attribute of <q>)

• Each anaphora carries the same number as its antecedent (the ref attribute matches the id of the antecedent's <de> element)

• We indicate whether the antecedent is in the question or in the answer (ant="q" or ant="a")

<t><q id="q482">Which city is the headquarters of the China's Eastern Fleet?</q><q id="q483">How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>?</q><q id="q484">What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002?</q></t>

• We indicate the number of the question or answer in which the antecedent is located (the refq attribute)

• We select the type of anaphora (the type attribute, e.g. pron, dd, adv, elips)

<t><q id="q453">In which country is <de id="n28">the Colditz Castle</de>?</q><q id="q454">Exactly in which state is <link rel="dir" status="ok" type="pron" ref="n28" ant="q" refq="q453">it</link>?</q><q id="q455">Who was the first who escaped from <link rel="dir" status="ok" type="adv" ref="n28" ant="q" refq="q453">there</link> ?</q></t>

<t><q id="q412">Who published the Evangelium Vitae <de id="n6">encyclical</de>?</q><q id="q413">How many <link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link> did <link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link> publish?</q></t>

• We select the type of relation (the rel attribute: dir or indir)

<t><q id="q416">Which islands are in <de id="n9">the Pelagie Islands</de>?</q><q id="q417">Which is <link rel="indir" status="ok" type="dd" ref="n9" ant="q" refq="q416">the biggest one</link>?</q></t>

• We mark whether or not the annotator has doubts (the status attribute)
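The annotated attributes are enough to pair each anaphor with its antecedent mechanically. The following is a minimal sketch on the Brunete example from the slides, using only Python's standard library. The helper name `resolve_links` is hypothetical, and reading `ref` as a pointer to the `id` of a `<de>` element is an interpretation of the slides, not a documented interface.

```python
import xml.etree.ElementTree as ET

# The Brunete topic as shown on the slides (one topic with a nested subtopic).
TOPIC = """<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>"""


def resolve_links(xml_text):
    """Return one record per <link>, pairing the anaphor with its antecedent text."""
    root = ET.fromstring(xml_text)
    # Index every discourse entity by id; itertext() also covers nested markup.
    entities = {de.get("id"): "".join(de.itertext()).strip() for de in root.iter("de")}
    resolved = []
    for link in root.iter("link"):
        resolved.append({
            "anaphor": "".join(link.itertext()).strip(),
            # An empty ref has no matching <de>, so it resolves to None.
            "antecedent": entities.get(link.get("ref")),
            "in": link.get("ant"),
            "question": link.get("refq"),
        })
    return resolved


for record in resolve_links(TOPIC):
    print(record)
```

Links whose `ref` is empty, as in the samples where the antecedent is in the answer, come back with `antecedent` set to `None`; a QA system would have to look up the answer to the question named in `refq` instead.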

Previous work

• UCREL (Fligelstone, 1992; Garside et al., 1997): first scheme for anaphora resolution

• MUC: inclusion of the coreference task in MUC-6 and MUC-7

• Last decade of the 20th century: anaphora resolution project for French (Popescu-Belis and Robba, 1997)

• Martínez-Barco and Palomar (2001): an annotation scheme for dialogues applied to anaphora resolution algorithms

• MATE/GNOME (Poesio, 2004): meta-model

Previous work: what we added

• MATE/GNOME (Poesio, 2004): meta-model

• A link element in the text holding the information about the anaphora

• Identification of the question/answer pair

• Topic/subtopic

• Antecedent in the question or in the answer

• Status of the annotation

• Applied to three languages

• Applied to collections of questions

Problematic cases

• World knowledge

• An antecedent contains another one

• Collective nouns

• Two antecedents, but separated

• Doubtful position of the antecedent

• An anaphora inside a discourse entity

Problematic cases: world knowledge

<t><q id="q404">Which was <de id="n2">the "gordo" in the 1995 Christmas</de>?</q><q id="q405">Which was <link rel="indir" status="no" type="dd" ref="n2" ant="q" refq="q404">the prize</link>?</q></t>

Problematic cases: an antecedent contains another one

<t><q id="q427">Who were <de id="n14">the founders of <de id="n15">Magnum Photos</de> </de>?</q><q id="q428">In what year did <link rel="dir" status="ok" ant="q" refq="q427" type="pron" ref="n14">they</link> found <link rel="dir" status="ok" type="pron" ref="n15" ant="q" refq="q427">it</link>?</q></t>

Problematic cases: collective nouns

<t><q id="q432">What is <de id="n18">the starring cast</de> of the film Beetlejuice?</q><q id="q433">Who of <link rel="dir" status="ok" type="pron" ref="n18" ant="q" refq="q432">them</link> is the main character?</q></t>

Problematic cases: two antecedents, but separated

<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>

Problematic cases: doubtful position of the antecedent

<t><q id="q465">What transport was used in the Kon-Tiki Expedition?</q><q id="q466">How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>?</q></t>

Problematic cases: an anaphora inside a discourse entity

<t><q id="q434">What is <de id="n19">a censer</de> ?</q><q id="q435">What name is given to <de id="n20"> <link rel="dir" status="no" type="pron" ref="n19" ant="q" refq="q434">the one</link> of the Cathedral of Santiago de Compostela </de>?</q><q id="q436">How much does <link rel="dir" status="ok" type="pron" ref="n20" ant="q" refq="q434">it</link> weight?</q></t>

Evaluation

• Annotation

  • 2 annotators

  • Blind annotation

• Evaluation

  • Each language independently

  • Global results

Evaluation: subdivision

• Topic boundary

• Anaphora detection

• Anaphora attributes

• Antecedent recognition

Evaluation: topic boundary

• Class N: new topic

• Class S: same topic

         SPANISH        ITALIAN        ENGLISH
A1\A2    S      N       S      N       S      N
S        62     0       62     0       61     0
N        0      138     0      138     1      138
Kappa    1              1              0.988
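The kappa values in the table can be reproduced from the confusion matrices. The following is a sketch, assuming the measure is Cohen's kappa for two annotators (the slides only say "Kappa") with A1 on the rows and A2 on the columns; the function name `cohen_kappa` is chosen for this example.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: A1, columns: A2)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of the two annotators' marginal proportions.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Topic-boundary counts per language: class S (same topic), class N (new topic).
topic_boundary = {
    "Spanish": [[62, 0], [0, 138]],
    "Italian": [[62, 0], [0, 138]],
    "English": [[61, 0], [1, 138]],
}

kappas = {lang: cohen_kappa(m) for lang, m in topic_boundary.items()}
for lang, value in kappas.items():
    print(f"{lang}: {value:.3f}")
```

Under this assumption the Spanish and Italian matrices give a kappa of 1 and the English matrix gives about 0.988, which matches the reported figures.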

Evaluation: anaphora detection

                                SPANISH   ITALIAN   ENGLISH
Anaphors detected by A1         70        69        67
Anaphors detected by A2         70        69        68
Anaphor detection agreement     70        69        67
Different anaphora boundary     1         1         0

Evaluation: anaphora attributes (antecedent)

         SPANISH        ITALIAN        ENGLISH
A1\A2    Q      A       Q      A       Q      A
Q        64     0       62     0       61     0
A        0      6       0      7       0      6
Kappa    1              1              1

Evaluation: anaphora attributes (type)

         SPANISH        ITALIAN        ENGLISH
         A1     A2      A1     A2      A1     A2
Elips    33     33      32     32      3      3
Pron     13     15      13     13      42     42
Adv      1      1       2      2       1      1
Sup      1      0       0      0       0      0
DD       22     21      22     22      21     21
Kappa    0.955          1              1

Evaluation: anaphora attributes (relation)

• Dir: direct relation

• Indir: bridging relation

         SPANISH         ITALIAN         ENGLISH
A1\A2    DIR    INDIR    DIR    INDIR    DIR    INDIR
DIR      52     0        51     0        52     0
INDIR    4      14       1      17       2      13
Kappa    0.838           0.961           0.909

Evaluation: antecedent recognition

                                                             SPANISH   ITALIAN   ENGLISH
Total antecedents in the answer (agreement)                  6         7         6
Total antecedents in the question (agreement)                64        62        61
Anaphors pointing to the same question (refq) (agreement)    64        62        61
Antecedents with different boundary (disagreement)           2         3         1

Evaluation: global results

• Total agreement results

  • Spanish: 60/70 = 0.857

  • Italian: 60/69 = 0.869

  • English: 59/67 = 0.880

  • Average: 0.868
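These ratios are easy to check. The following is a small sketch, assuming the slide truncates (rather than rounds) to three decimals, which is what reproduces the reported figures; values are kept as integer thousandths to avoid floating-point noise, and the names used are illustrative.

```python
# (agreed anaphors, total anaphors) per language, as reported on the slide.
results = {"Spanish": (60, 70), "Italian": (60, 69), "English": (59, 67)}


def thousandths(agreed, total):
    """Agreement ratio truncated to three decimals, as an integer count of thousandths."""
    return agreed * 1000 // total


per_language = {lang: thousandths(a, t) for lang, (a, t) in results.items()}
# Average of the truncated per-language values, truncated again.
average = sum(per_language.values()) // len(per_language)

for lang, value in per_language.items():
    print(f"{lang}: 0.{value:03d}")
print(f"Average: 0.{average:03d}")
```

Averaging the unrounded ratios instead gives roughly 0.869, so the reported average appears to be computed from the already truncated per-language values.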

Conclusion

• Multilingual annotation scheme for anaphora resolution

• For the improvement of QA systems: the system can detect the antecedent of each anaphora and extract the correct answer

• For a true interaction between the system and the user

• Simple but complete

• Positive results of the evaluation

Future work

• Integration of other languages

• Application of the annotation scheme to other corpora

Evaluation: measure used

• Kappa
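Assuming this refers to Cohen's kappa for two annotators (the slides do not name the variant), the coefficient corrects the observed agreement $p_o$ for the agreement $p_e$ expected by chance. Here $n_{kk}$ are the diagonal cells of the A1\A2 confusion matrix, $n_{k\cdot}$ and $n_{\cdot k}$ its row and column totals, and $N$ the number of items. The worked line uses the Spanish relation counts (DIR/DIR 52, DIR/INDIR 0, INDIR/DIR 4, INDIR/INDIR 14) and lands on the reported 0.838 up to truncation.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{1}{N}\sum_{k} n_{kk}, \qquad
p_e = \frac{1}{N^{2}}\sum_{k} n_{k\cdot}\, n_{\cdot k}

% Worked example: Spanish relation attribute, N = 70
p_o = \frac{52 + 14}{70} \approx 0.9429, \qquad
p_e = \frac{52 \cdot 56 + 18 \cdot 14}{70^{2}} = \frac{3164}{4900} \approx 0.6457, \qquad
\kappa \approx \frac{0.9429 - 0.6457}{1 - 0.6457} \approx 0.8387
```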
