A multiple knowledge source algorithm for anaphora resolution

Allaoua Refoufi
Computer Science Department
University of Setif, Setif 19000, Algeria
email : [email protected]

• Introduction
• Types of anaphora
• Knowledge sources
• The main algorithm
• Discussion
• Conclusion



What is anaphora?

• The term anaphora relates to the presence in a text of entities (noun phrases, pronouns, etc.) which, on the one hand, refer to the same entity (are coreferential) and, on the other hand, supply additional information.

• Reference to an entity is generally termed anaphora; the entity to which the anaphor refers is the antecedent (or referent), and the anaphor is the expression used to make the reference. Example: “My brother called last night, he wanted to see me” (anaphor: “he”; antecedent: “My brother”).

• An anaphor is a linguistic unit that substitutes for another linguistic unit already introduced.


Types of anaphora

• Pronominal: the most common type; the reference is made by a pronoun: “Sabrina took the apple on the table. She ate it.”

• Definite noun phrase: the antecedent is referred to by a definite noun phrase: “The president visited the city. The host of the people’s palace inaugurated several projects.”

• Verb phrase as antecedent: “Sarah tried to convince him to stay. The attempt was in vain.”

• Ordinal anaphora: the anaphor is an ordinal such as first, second, etc. “Sarah was not satisfied by the solution. She looked for a new one.”


Knowledge sources

• Morphology is concerned with the structure of words; it tells us how to extract the base forms from the inflected forms that occur in texts.

• Syntax is concerned with the ways words combine to form phrases, and phrases combine to form sentences. Syntactic analysis identifies the category of each word (verb, noun, pronoun, etc.) and its function in the sentence. This process is known as parsing.

• Semantics deals with the meaning of words, phrases and sentences.

• Pragmatic knowledge uses the context to disambiguate among competing interpretations.


The main algorithm

• Recognition phase
– Morphosyntactic analysis
– Recognition of non-anaphoric pronouns
– Identification of focusing expressions
– Building of the data structures

• Resolution phase
• For each anaphor do:
– Apply the constraints, in order
– Apply the preferences, in order
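
The two phases above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this talk: the `Mention` record, the `resolve` function, the feature names, and the idea of passing constraints and preferences as plain callables are all hypothetical. The recognition phase is assumed to have already produced the list of mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Mention:
    """Hypothetical record built during the recognition phase."""
    text: str
    position: int                       # order of appearance in the text
    is_anaphor: bool = False
    features: Dict[str, object] = field(default_factory=dict)

# Both kinds of rule take (anaphor, candidate) and return a boolean.
Rule = Callable[[Mention, Mention], bool]

def resolve(mentions: List[Mention],
            constraints: List[Rule],
            preferences: List[Rule]) -> Dict[int, List[Mention]]:
    """Return, for each anaphor position, the candidates that survive."""
    result: Dict[int, List[Mention]] = {}
    for anaphor in (m for m in mentions if m.is_anaphor):
        # Candidates: non-anaphoric mentions that precede the anaphor.
        candidates = [m for m in mentions
                      if not m.is_anaphor and m.position < anaphor.position]
        # Constraints are hard: a failing candidate is eliminated.
        for rule in constraints:
            candidates = [c for c in candidates if rule(anaphor, c)]
        # Preferences are soft: narrow the set only if some candidate passes.
        for rule in preferences:
            preferred = [c for c in candidates if rule(anaphor, c)]
            if preferred:
                candidates = preferred
        result[anaphor.position] = candidates
    return result

# "Sabrina took the apple on the table. She ate it."
mentions = [
    Mention("Sabrina", 0, features={"gender": "f", "in_pp": False}),
    Mention("the apple", 1, features={"gender": "n", "in_pp": False}),
    Mention("the table", 2, features={"gender": "n", "in_pp": True}),
    Mention("She", 3, is_anaphor=True, features={"gender": "f"}),
    Mention("it", 4, is_anaphor=True, features={"gender": "n"}),
]
same_gender = lambda a, c: a.features["gender"] == c.features["gender"]
outside_pp = lambda a, c: not c.features["in_pp"]
links = resolve(mentions, [same_gender], [outside_pp])
print({pos: [c.text for c in cands] for pos, cands in links.items()})
# {3: ['Sabrina'], 4: ['the apple']}
```

Keeping constraints and preferences as separate ordered lists is one way to obtain the modularity the approach aims for: a rule can be added or reordered without touching the loop.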


Constraints

Constraints are rules that purge candidates from the structures built during the parsing process.

• Consistency conditions: candidates are eliminated on morphological grounds (number, gender, person).

• Condition on insertions: an expression included in an insertion cannot be the antecedent of an anaphor located outside the insertion.
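
The two constraints can be expressed as simple filters. The sketch below is only an illustration: the `Mention` fields, the `insertion_id` marker and the function names are assumptions, and the morphological features are taken as given by the parser.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Mention:
    """Hypothetical mention record; field names are illustrative."""
    text: str
    gender: str                          # "m" or "f"
    number: str                          # "sg" or "pl"
    person: int                          # 1, 2 or 3
    insertion_id: Optional[int] = None   # enclosing insertion, if any

def agrees(anaphor: Mention, candidate: Mention) -> bool:
    """Consistency condition: number, gender and person must match."""
    return ((anaphor.gender, anaphor.number, anaphor.person)
            == (candidate.gender, candidate.number, candidate.person))

def reachable(anaphor: Mention, candidate: Mention) -> bool:
    """Condition on insertions: a candidate inside an insertion is
    unavailable to an anaphor located outside that insertion."""
    return (candidate.insertion_id is None
            or candidate.insertion_id == anaphor.insertion_id)

def apply_constraints(anaphor: Mention,
                      candidates: List[Mention]) -> List[Mention]:
    # Constraints are applied in order; each one purges the candidate list.
    for constraint in (agrees, reachable):
        candidates = [c for c in candidates if constraint(anaphor, c)]
    return candidates

# "La dame, assise en face de Sarah, était anxieuse. Elle voulait ..."
dame = Mention("La dame", "f", "sg", 3)
sarah = Mention("Sarah", "f", "sg", 3, insertion_id=1)
elle = Mention("Elle", "f", "sg", 3)
survivors = apply_constraints(elle, [dame, sarah])
print([m.text for m in survivors])   # ['La dame']
```

Both candidates agree with “Elle”, so agreement alone cannot decide; “Sarah” is removed only because it sits inside the insertion.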


Preferences

• Preferences, as opposed to constraints, can be violated by the antecedent candidates; they are used to rank the candidates rather than to eliminate them. Candidates that satisfy a preference are retained ahead of those that do not. The order in which the preferences appear reflects their weight.

• Syntactic parallelism

• Antecedent not occurring in a prep. phrase

• Focus expressions

• Recency
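
One way to read “the order reflects the weight” is as a strict priority: a later preference only matters when all earlier ones tie. The sketch below encodes the four preferences above as a tuple compared left to right. This lexicographic reading, the `Candidate` fields and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    """Hypothetical candidate record; field names are illustrative."""
    text: str
    function: str          # syntactic function, e.g. "subject", "object"
    in_prep_phrase: bool
    in_focus: bool         # introduced by a focus expression
    position: int          # larger means closer to the anaphor

def preference_key(anaphor_function: str, c: Candidate) -> Tuple:
    # Tuples compare left to right, so earlier entries outweigh later ones.
    return (
        c.function == anaphor_function,   # syntactic parallelism
        not c.in_prep_phrase,             # not inside a prep. phrase
        c.in_focus,                       # focus expression
        c.position,                       # recency
    )

def rank(anaphor_function: str, candidates: List[Candidate]) -> List[Candidate]:
    """Best candidate first."""
    return sorted(candidates,
                  key=lambda c: preference_key(anaphor_function, c),
                  reverse=True)

# "La voiture de la voisine bloque le passage, il faut la déplacer"
# The anaphor "la" is an object; both feminine nouns pass agreement.
voiture = Candidate("La voiture", "subject", False, False, 0)
voisine = Candidate("la voisine", "complement", True, False, 1)
ranked = rank("object", [voiture, voisine])
print([c.text for c in ranked])   # ['La voiture', 'la voisine']
```

Recency alone would favour “la voisine”; because the prep. phrase preference comes earlier in the order, “La voiture” wins.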


Some preferences

• Syntactic parallelism states that we prefer the antecedent that shares the same syntactic function as the anaphor. “The child recognized the king, although he had never met him before.”

• An expression included in a prep. phrase is unlikely to be referred to, because it only brings additional information. “La voiture de la voisine bloque le passage, il faut la déplacer” (“The neighbour’s car is blocking the way; it has to be moved”).


Focus expressions

They identify the main theme, the focus of attention.

Of the form:

• C’est NP qui … (“It is NP that …”)

• Il y a NP qui … (“There is NP that …”)
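
A rough surface approximation of these two patterns can be written as a regular expression. This is only a sketch: in the algorithm itself the NP would come from the morphosyntactic analysis, whereas here whatever text sits between the trigger and “qui” is taken as the NP. The pattern name, the function name and the example sentences are invented for illustration.

```python
import re
from typing import List

# Matches "C'est <NP> qui" and "Il y a <NP> qui", case-insensitively.
FOCUS_PATTERN = re.compile(r"\b(?:c['’]est|il y a)\s+(.+?)\s+qui\b",
                           re.IGNORECASE)

def focused_phrases(sentence: str) -> List[str]:
    """Return the phrases introduced by a focus expression."""
    return [m.group(1) for m in FOCUS_PATTERN.finditer(sentence)]

print(focused_phrases("C'est Sarah qui a pris la parole."))    # ['Sarah']
print(focused_phrases("Il y a un homme qui attend dehors."))   # ['un homme']
print(focused_phrases("Sarah a pris la parole."))              # []
```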


Appositions

Fragments of sentences which can be removed without altering the main meaning. Goal: eliminate the candidates which occur inside them.

Mainly three forms:

• Delimited by separators such as “,” or “(”: “La dame, assise en face de Sarah, était anxieuse. Elle voulait prendre la parole.” (“The lady, seated opposite Sarah, was anxious. She wanted to speak.”)

• Relative clauses: “La dame qui discute avec Sarah est une voisine.” (“The lady who is talking with Sarah is a neighbour.”)

• Just one comma: “Caesar, the Roman emperor VERB …”
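
The first form (fragments delimited by paired separators) can be approximated on raw text as below. This is a sketch under strong assumptions: it handles only paired commas and parentheses, it would wrongly treat the middle item of a comma-separated list as an insertion, and relative clauses and single-comma appositions are left to the parser. The names `INSERTION` and `split_insertions` are hypothetical.

```python
import re
from typing import List, Tuple

# Either "( ... )" or ", ... ," with no sentence punctuation inside.
INSERTION = re.compile(r"\s*\(([^)]*)\)|\s*,([^,.;:!?]*),")

def split_insertions(sentence: str) -> Tuple[str, List[str]]:
    """Return the sentence without its insertions, plus the removed fragments."""
    fragments: List[str] = []

    def collect(match: "re.Match[str]") -> str:
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        fragments.append(inner.strip())
        return ""                        # drop the insertion from the main clause

    main = INSERTION.sub(collect, sentence)
    return main, fragments

print(split_insertions("La dame, assise en face de Sarah, était anxieuse."))
# ('La dame était anxieuse.', ['assise en face de Sarah'])
```

Candidates found in the returned fragments (here “Sarah”) would then be excluded for any anaphor outside the insertion.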


Discussion

• The algorithm achieves a success rate of 68%. The evaluation has been carried out so far on more than 100 texts of reasonable size (1 page) taken from literary stories.

• The results show that the resolution of pronouns such as il(s), elle(s), le, la is relatively successful (success rate of 93%).

• The insertion constraint tends to add complexity to the implementation.


Unresolved problems

• Multiple source anaphor: “Sarah and Sofia left early this morning. They have an appointment at the university.”

• Self-referring expressions: “Everyone knows it, John is a good driver.”

• Reference to verb phrases, sentences: “On two wheels we are vulnerable. The problem is to forget it.”


Conclusion

• The main idea of our work is to establish a link between nominal phrases that share a similar context with constituents in the input text.

• It relies heavily on a morphosyntactic parser. The application of a set of constraints followed by a set of preferences provides an elegant, modular, easy-to-update anaphora resolution algorithm.

• Unfortunately, the current state of the art in practically applicable parsing technology still falls short of robust and reliable delivery of syntactic analysis of real texts to the level of detail and precision that most algorithms assume.

• Shallow parsing, on the other hand, can greatly affect the performance and the efficiency of the algorithm.


Related work

Approach          Type of knowledge   Success rate   Corpus
Lappin & Leass    Robust parser       75%            Computer texts
Kennedy et al.    Shallow parser      85%            Web documents
Mitkov            P.O.S. tagger       89.7%          Manual texts