Three Stories on Automated Reasoning for Natural Language Understanding
Johan Bos, University of Rome "La Sapienza", Dipartimento di Informatica





Slide 2: Background — My work sits between natural language processing, computational linguistics, and formal and computational semantics. The aim of my work: implement linguistic theories and use automated reasoning in modelling natural language understanding.

Slide 3: Applications — What kind of applications? Human-machine dialogue systems, question answering systems, textual entailment systems. Use of logical inference: off-the-shelf systems, FOL theorem provers and finite model builders. Empirically successful?

Slide 4: Surprise — Perhaps surprisingly, automated reasoning tools rarely make it into NLP applications. Why? It requires an interdisciplinary background; there is a gap between formal semantic theory and practical implementation; and it is just not trendy: statistical approaches dominate the field.

Slide 5: Three Stories — World Wide Computational Semantics: the world's first serious implementation of Discourse Representation Theory, with the help of the web and theorem proving. Godot, the talking robot: the first robot that computes semantic representations and performs inferences using theorem proving and model building. Recognising Textual Entailment: automated deduction applied in wide-coverage natural language processing.

Slide 6: The First Story (1994-2001) — World Wide Computational Semantics: the first serious implementation of Discourse Representation Theory, with the help of the internet.

Slide 7: How it started — Implementing tools for the semantic analysis of English, following linguistic theory as closely as possible: Discourse Representation Theory (DRT), first-order logic, presupposition projection.

Slide 8: Computational Semantics — How can we automate the process of associating semantic representations with expressions of natural language? How can we use logical representations of natural language expressions to automate the process of drawing inferences?

Slides 9-13: Basic idea (built up over five slides) —
Text: Vincent loves Mia.
DRT: [x, y | vincent(x), mia(y), love(x,y)]
FOL: ∃x∃y(vincent(x) ∧ mia(y) ∧ love(x,y))
BK: ∀x(vincent(x) → man(x)), ∀x(mia(x) → woman(x)), ∀x(man(x) → ¬woman(x))
Model: D = {d1,d2}; F(vincent) = {d1}; F(mia) = {d2}; F(love) = {(d1,d2)}

Slide 14: ? — Text: Vincent loves Mia. DRT: [x, y | vincent(x), mia(y), love(x,y)]. But how do we get from the text to the DRS, and how do we use it for inference?
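To make this concrete, here is a small sketch (not part of the original system) that writes the example and its background knowledge out as a TPTP problem, the input format accepted by first-order provers such as Vampire; the helper function and the formula names are illustrative.

```python
# Illustrative sketch: encode the "Vincent loves Mia" example as a TPTP problem
# so that an off-the-shelf FOL prover can check it. Asking the prover to derive
# $false from the axioms is an inconsistency check; a finite model builder run
# on the same axioms checks consistency instead.

def fof(name, role, formula):
    """Format one TPTP first-order formula."""
    return "fof(%s, %s, %s)." % (name, role, formula)

background = [
    fof("bk1", "axiom", "! [X] : (vincent(X) => man(X))"),
    fof("bk2", "axiom", "! [X] : (mia(X) => woman(X))"),
    fof("bk3", "axiom", "! [X] : (man(X) => ~woman(X))"),
]

# FOL translation of the DRS for "Vincent loves Mia."
text = fof("t", "axiom", "? [X,Y] : (vincent(X) & mia(Y) & love(X,Y))")

problem = "\n".join(background + [text, fof("goal", "conjecture", "$false")])
print(problem)   # feed this to a TPTP-compliant prover such as Vampire
```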
Slide 15: Compositional Semantics — The problem: given a natural language expression, how do we convert it into a logical formula? Frege's principle: the meaning of a compound expression is a function of the meaning of its parts.

Slide 16: Lexical semantics —
Category: N     Semantics: λx.[ | spokesman(x)]                  Example: spokesman
Category: NP/N  Semantics: λp.λq.(([x | ] ; p(x)) ; q(x))        Example: a
Category: NP/N  Semantics: as for "a", but the introduced DRS is presupposed rather than asserted    Example: the
Category: S\NP  Semantics: λy.[e | lie(e), agent(e,y)]           Example: lied

Slides 17-23: A derivation (built up step by step) — the derivation of "A spokesman lied":
a          NP/N:  λp.λq.(([x | ] ; p(x)) ; q(x))
spokesman  N:     λz.[ | spokesman(z)]
lied       S\NP:  λy.[e | lie(e), agent(e,y)]
Forward application (FA) of "a" to "spokesman" gives NP: λq.([x | spokesman(x)] ; q(x)).
Backward application (BA) of "lied" to "a spokesman" gives S: [x, e | spokesman(x), lie(e), agent(e,x)].

Slide 24: The DORIS System — Reasonable grammar coverage. Parsed English sentences, then resolved ambiguities (pronouns, presuppositions). Generated many different semantic representations for a text.

Slides 25-28: Texts and Ambiguity — Usually, ambiguities cause many possible interpretations. Example (worked through over four slides): Butch walks into his modest kitchen. He opens the refrigerator. He takes out a milk and drinks it.

Slide 29: Basic idea of DORIS — Given a text, produce as many different DRSs (semantic interpretations) as possible, then filter out "strange" interpretations: inconsistent interpretations and uninformative interpretations. Apply theorem proving, using the general-purpose FOL theorem prover Bliksem (Hans de Nivelle).

Slide 30: Screenshot of the DORIS system.

Slide 31: Consistency checking — Inconsistent text: "Mia likes Vincent. She does not like him." Two interpretations, only one consistent: "Mia likes Jody. She does not like her."

Slide 32: Informativity checking — Uninformative text: "Mia likes Vincent. She likes him." Two interpretations, only one informative: "Mia likes Jody. She likes her."
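A minimal sketch of this filtering step follows, with a hypothetical prove() function standing in for the call to an external prover such as Bliksem; the actual DORIS system was considerably more involved.

```python
# Illustrative sketch of the DORIS filtering idea (hypothetical helpers, not the
# original implementation). Each reading is a FOL formula string; readings that
# are inconsistent with the context, or that add no information, are discarded.

def prove(formula):
    """Stand-in for a call to a FOL theorem prover (e.g. Bliksem).
    Here it always reports 'no proof found'; a real system calls the prover."""
    return False

def consistent(context, reading):
    # Inconsistent if the prover derives a contradiction from context + reading,
    # i.e. it can prove the negation of their conjunction.
    return not prove("~((%s) & (%s))" % (context, reading))

def informative(context, reading):
    # Uninformative if the reading already follows from the context.
    return not prove("(%s) => (%s)" % (context, reading))

def filter_readings(context, readings):
    """Keep only the consistent and informative interpretations."""
    return [r for r in readings
            if consistent(context, r) and informative(context, r)]

print(filter_readings("like(mia,vincent)",
                      ["~like(mia,vincent)", "~like(mia,jody)"]))
```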
Slide 33: Local informativity — Example: "Mia is the wife of Marsellus. If Mia is the wife of Marsellus, Vincent will be disappointed." The second sentence is informative with respect to the first. But...

Slide 34: Local informativity — DRS of the first sentence: [x, y | mia(x), marsellus(y), wife-of(x,y)]

Slide 35: Local informativity — DRS of the whole text: [x, y, z | mia(x), marsellus(y), wife-of(x,y), vincent(z), [ | wife-of(x,y)] ⇒ [ | disappointed(z)]]. The antecedent of the conditional is already entailed by the context, so this embedded DRS violates local informativity.

Slide 36: Local consistency — Example: "Jules likes big kahuna burgers. If Jules does not like big kahuna burgers, Vincent will order a whopper." The second sentence is consistent with respect to the first. But...

Slide 37: Local consistency — DRS of the first sentence: [x, y | jules(x), big-kahuna-burgers(y), like(x,y)]

Slide 38: Local consistency — DRS of the whole text: [x, y, z | jules(x), big-kahuna-burgers(y), like(x,y), vincent(z), [ | ¬like(x,y)] ⇒ [u | order(z,u), whopper(u)]]. The antecedent of the conditional is inconsistent with the context, so this embedded DRS violates local consistency.

Slide 39: Studying Presupposition — The DORIS system allowed one to study the behaviour of presupposition, with examples such as: "If Mia has a husband, then her husband is out of town." / "If Mia is married, then her husband is out of town." / "If Mia is dating Vincent, then her husband is out of town."

Slide 40: Applying Theorem Proving — The first version of DORIS sort of worked, but: there were many readings to start with (an explosion); the local constraints added a large number of inference tasks; and it could take about 10 minutes to process a complex sentence.

Slide 41: MathWeb — MathWeb (by Kohlhase & Franke) came to the rescue: theorem-proving services via the internet. We interfaced DORIS with MathWeb. At the time this was a sensation! What happened exactly?

Slide 42: World Wide Computational Semantics — Just in order to find out whether Mia was married or not, thousands of computers world-wide were used.

Slide 43: What can we say about DORIS? — It demonstrated that FOL could play an interesting role in natural language understanding, helped to develop presupposition theory, and generated a new application area for automated deduction (TPTP). Limitations: scalability and coverage.

Slide 44: Six feet under — The DORIS system, 1993-2001. Why? Limited grammatical coverage; an unfocussed application domain; and it would take at least 20 years to develop a parser that was both robust and accurate (at least, that was my belief).

Slide 45: The Second Story (2001-2004) — Godot the talking robot, or how a mobile, talking robot engaged in conversation using theorem provers and model builders for first-order logic.

Slide 46: Human-Computer Dialogue — Focus on small domains: grammatical coverage ensured, background knowledge encoded. A spoken dialogue system is the killer app. Godot the robot: speech recognition and synthesis; people could give Godot directions, ask it questions, and tell it new information. Godot was a REAL robot.

Slide 47: Godot the Robot — (photo: Godot with Tetsushi Oka)

Slide 48: Typical conversation with Godot — Simple dialogues in the beginning:
Human: Robot?
Robot: Yes?
Human: Where are you?
Robot: I am in the hallway.
Human: OK. Go to the rest room!

Slide 49: Adding inference to Godot — Using the components of DORIS, we added semantics and inference, based on DRT. Inference tasks: consistency checking and informativity checking, using FOL theorem provers and finite model builders.

Slide 50: Advanced conversation with Godot — Dealing with inconsistency and informativeness:
Human: Robot, where are you?
Robot: I am in the hallway.
Human: You are in my office.
Robot: No, that is not true.
Human: You are in the hallway.
Robot: Yes, I know!

Slide 51: Parallel inference — Given a text T and its translation φ into FOL: if a theorem prover finds a proof of ¬φ, then T is inconsistent; if a model builder finds a model of φ, then T is consistent. In other words, this is the yin and yang of inference.
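Below is a minimal sketch of this parallel set-up, assuming hypothetical wrapper functions for the external prover and model builder: the two tools race, and whichever gives a definitive answer first decides the question.

```python
# Illustrative sketch of the "yin and yang" of inference: run a theorem prover
# on the negation of a formula and a model builder on the formula itself, in
# parallel. The two helper functions are hypothetical stand-ins for external
# tools (e.g. Vampire/Bliksem and Mace/Paradox).
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed

def try_prove_negation(formula):
    """Stand-in for a prover call: True if it proves the negation of formula."""
    return False   # a real system calls an external theorem prover here

def try_build_model(formula):
    """Stand-in for a model builder call: a finite model, or None."""
    return None    # a real system calls an external model builder here

def consistent(formula, timeout=30):
    """Decide consistency with prover and model builder racing in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        jobs = {pool.submit(try_prove_negation, formula): "prover",
                pool.submit(try_build_model, formula): "builder"}
        try:
            for job in as_completed(jobs, timeout=timeout):
                if jobs[job] == "prover" and job.result():
                    return False   # negation proved: text is inconsistent
                if jobs[job] == "builder" and job.result() is not None:
                    return True    # finite model found: text is consistent
        except TimeoutError:
            pass
    return None                    # neither tool gave an answer in time
```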
Slides 52-55: Why is this relevant to natural language? — Testing a discourse for consistency (the table is built up sentence by sentence; "??" means the tool gives no answer):

Discourse                     Theorem prover   Model builder
Vincent is a man.             ??               Model
Mia loves every man.          ??               Model
Mia does not love Vincent.    Proof            ??

Slides 56-59: Why is this relevant to natural language? — Testing a discourse for informativity:

Discourse                     Theorem prover   Model builder
Vincent is a man.             ??               Model
Mia loves every man.          ??               Model
Mia loves Vincent.            Proof            ??

Slide 60: Minimal Models — Model builders normally generate models by iterating over the domain size. As a side-effect, the output is a model with a minimal domain size. From a linguistic point of view this is interesting, as there is no redundant information. Minimal in extensions as well.

Slide 61: Using models — Examples: "Turn on a light." "Turn on every light." "Turn on everything except the radio." "Turn off the red light or the blue light." "Turn on another light."

Slide 62: Videos of Godot — Video 1: Godot in the basement of Buccleuch Place. Video 2: screenshot of the dialogue manager with DRSs, and camera view of Godot.

Slide 63: What can we say about Godot? — It demonstrated that FOL could play an interesting role in human-machine dialogue systems, and also showed a new application of finite model building: when the domain is known, all background knowledge is known. Limitations: scalability (only "small" dialogues), lack of incremental inference, and minimal models are required.

Slide 64: Godot the Robot (later) — Godot at the Scottish museum.

Slide 65: The Third Story (2005-present) — Recognising Textual Entailment, or how first-order automated deduction is applied to wide-coverage semantic processing of texts.

Slide 66: Recognising Textual Entailment — What is it? A task for NLP systems: recognise entailment between two (short) texts. It proved to be a difficult, but popular, task. Organisation: introduced in 2004/2005 as part of the PASCAL Network of Excellence (RTE-1); a second challenge (RTE-2) was held in 2005/2006; PASCAL provided development and test sets of several hundred examples.

Slide 67: RTE Example (entailment) — RTE 1977 (TRUE)
His family has steadfastly denied the charges.
-----------------------------------------------------
The charges were denied by his family.

Slide 68: RTE Example (no entailment) — RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
-----------------------------------------------------
Lyon is the capital of France.

Slide 69: Aristotle's Syllogisms — ARISTOTLE 1 (TRUE)
All men are mortal. Socrates is a man.
-----------------------------------------------------
Socrates is mortal.

Slide 70: Five methods — Five different methods for RTE, ranging in sophistication from very basic to advanced.

Slide 71: Method 1: Flipping a coin.
Slide 72: Advantages: easy to implement, cheap. Disadvantages: just 50% accuracy.
Slide 73: Method 2: Calling a friend.
Slide 74: Advantages: high accuracy (95%). Disadvantages: you lose friends, and the phone bill is high.
Slide 75: Method 3: Ask the audience.

Slide 76: RTE 893 (????)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
----------------------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.

Slide 77: Human Upper Bound — the same pair, RTE 893, with gold label TRUE.

Slide 78: Method 4: Word Overlap.

Slide 79: Word Overlap Approaches — A popular approach, ranging in sophistication from a simple bag of words to the use of WordNet. Accuracy rates are around 55%.

Slide 80: Word Overlap — Advantages: a relatively straightforward algorithm. Disadvantages: hardly better than flipping a coin.
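For comparison, here is a minimal bag-of-words overlap baseline of the kind just described; the tokenisation and the 0.6 threshold are illustrative choices, not the settings of any particular RTE system.

```python
# A minimal word-overlap baseline for RTE (method 4): the fraction of
# hypothesis words that also occur in the text decides the prediction.
import re

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def word_overlap(text, hypothesis):
    """Fraction of hypothesis words that also occur in the text."""
    t, h = tokens(text), tokens(hypothesis)
    return len(t & h) / len(h) if h else 0.0

def entails(text, hypothesis, threshold=0.6):
    return word_overlap(text, hypothesis) >= threshold

print(entails("His family has steadfastly denied the charges.",
              "The charges were denied by his family."))   # True (5/7 overlap)
```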
Slide 81: RTE State-of-the-Art — The PASCAL RTE challenge is a hard problem; it requires semantics.

Slide 82: Method 5: Semantic Interpretation.

Slide 83: Basic idea — Given a textual entailment pair T/H with text T and hypothesis H: produce DRSs for T and H; translate these DRSs into FOL; generate background knowledge in FOL; use automated theorem provers to determine the likelihood of entailment.

Slide 84: Wait a minute — This requires that we have the means to produce semantic representations (DRSs) for any kind of English input. Recall the DORIS experience. Do we have English parsers at our disposal that do this?

Slide 85: Robust Parsing — There have been rapid developments in statistical parsing over the last decades; these parsers are trained on large annotated corpora (treebanks). Yet most of them produce syntactic analyses that are not suitable for systematic semantic work. This changed with the development of CCGbank and a fast CCG parser.

Slide 86: Implementation CCG/DRT — Use standard statistical techniques: a robust wide-coverage parser, Clark & Curran (ACL 2004), with a grammar derived from CCGbank (409 different categories), Hockenmaier & Steedman (ACL 2002). Compositional semantics with DRT: wide-coverage semantics, Bos (IWCS 2005).

Slide 87: Example Output — Example: "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group." Semantic representations in DRT for the complete Wall Street Journal.

Slide 88: Using Theorem Proving — Given a textual entailment pair T/H with text T and hypothesis H: produce DRSs for T and H; translate these DRSs into FOL; give T → H to the theorem prover. If the theorem prover finds a proof, then we predict that T entails H.

Slide 89: Vampire (Riazanov & Voronkov 2002) — Let's try this with the theorem prover Vampire. This gives us good results for: apposition, relative clauses, coordination, intersective adjectives/complements, and passive/active alternations.

Slide 90: Example (Vampire: proof) — RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.
-----------------------------------------------------
A bomb exploded outside a mosque.

Slide 91: Example (Vampire: proof) — RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.
-----------------------------------------------------
The introduction of the euro has been opposed.

Slide 92: Background Knowledge — However, this doesn't give us good results for cases requiring additional knowledge: lexical knowledge and world knowledge. We use WordNet as a start to get additional knowledge. All of WordNet is too much, so we create MiniWordNets.

Slide 93: MiniWordNets — Use the hyponym relations from WordNet to build an ontology, but only for the relevant symbols, and convert the ontology into first-order axioms.

Slides 94-95: MiniWordNet: an example — Example text: "There is no asbestos in our products now. Neither Lorillard nor the researchers who studied the workers were aware of any research on smokers of the Kent cigarettes."

Slide 96: (the ontology built from the relevant WordNet hyponym relations for this text)

Slide 97: Hyponymy axioms, for example:
∀x(user(x) → person(x))
∀x(worker(x) → person(x))
∀x(researcher(x) → person(x))

Slide 98: Disjointness axioms, for example:
∀x(person(x) → ¬risk(x))
∀x(person(x) → ¬cigarette(x))
...
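A rough sketch of the MiniWordNet idea using NLTK's WordNet interface follows (assuming the WordNet data is installed): it collects hypernym chains for given nouns, using the first sense only, and turns each is-a link into an axiom. The axiom syntax and the sense-selection heuristic are illustrative, and the disjointness axioms shown above are not generated here.

```python
# A rough approximation of MiniWordNet construction: only the hypernym (is-a)
# chains of the content words that occur in the T/H pair are turned into
# first-order axioms. Illustrative only; not the original implementation.
from nltk.corpus import wordnet as wn

def hypernym_axioms(words):
    axioms = set()
    for word in words:
        synsets = wn.synsets(word, pos=wn.NOUN)
        if not synsets:
            continue
        synset = synsets[0]          # first sense only, a crude approximation
        for path in synset.hypernym_paths():   # ordered from root to synset
            for lower, upper in zip(path[1:], path):
                sub = lower.lemma_names()[0].lower()
                sup = upper.lemma_names()[0].lower()
                axioms.add("all x (%s(x) -> %s(x))" % (sub, sup))
    return sorted(axioms)

for axiom in hypernym_axioms(["researcher", "worker", "cigarette"]):
    print(axiom)
```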
Slide 99: Using Background Knowledge — Given a textual entailment pair T/H with text T and hypothesis H: produce DRSs for T and H; translate drs(T) and drs(H) into FOL; create background knowledge BK for T and H; give (BK ∧ T) → H to the theorem prover.

Slide 100: MiniWordNets at work — Background knowledge: ∀x(soar(x) → rise(x)). RTE 1952 (TRUE)
Crude oil prices soared to record levels.
-----------------------------------------------------
Crude oil prices rise.

Slide 101: Troubles with theorem proving — Theorem provers are extremely precise: they won't tell you when there is almost a proof. Even if only a little background knowledge is missing, Vampire will say "don't know".

Slide 102: Vampire: no proof — RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.
----------------------------------------------------------------
Four firefighters were killed in a car accident.

Slide 103: Using Model Building — We need a more robust way of doing inference: use model builders, such as Mace (McCune) and Paradox (Claessen & Sorensson 2003). Use the size of the (minimal) model: compare the size of the model of T with that of T & H. If the difference is small, then it is likely that T entails H.

Slide 104: Using Model Building — Given a textual entailment pair T/H with text T and hypothesis H: produce DRSs for T and H; translate these DRSs into FOL; generate background knowledge; give the model builder (i) BK & T and (ii) BK & T & H. If the models for (i) and (ii) are similar, then we predict that T entails H.

Slide 105: Model similarity — When are two models similar? When there is a small difference in domain size and a small difference in the predicate extensions.

Slide 106: Example 1 — T: John met Mary in Rome. H: John met Mary. Model of T: 3 entities; model of T+H: 3 entities; model size difference: 0. Prediction: entailment.

Slide 107: Example 2 — T: John met Mary. H: John met Mary in Rome. Model of T: 2 entities; model of T+H: 3 entities; model size difference: 1. Prediction: no entailment.

Slide 108: Model size differences — Of course this is a very rough approximation, but it turns out to be a useful one: it gives us a notion of robustness. Negation: give ¬T and ¬(T & H) to the model builder. Disjunction: there is not necessarily one unique minimal model.

Slide 109: How well does this work? — We tried this at RTE-1 and RTE-2, using standard machine learning methods to build a decision tree over the features: proof found (yes/no), domain size difference, and model size difference. Better than the baseline, but there is still room for improvement.

Slide 110: RTE Results 2004/5 (Bos & Markert 2005) —

System         Accuracy
Shallow        0.569
Deep           0.562
Hybrid (S+D)   0.577
Hybrid+Task    0.612
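The sketch below shows how these pieces could fit together, with hypothetical stand-ins for the prover and model-builder calls and made-up toy data; it illustrates the shape of the feature extraction and the decision-tree step, not the actual Bos & Markert (2005) setup.

```python
# Illustrative sketch of the deep-feature approach: wrappers around a prover and
# a finite model builder supply three features per T/H pair (proof found,
# domain-size difference, extension-size difference), and a standard
# decision-tree learner combines them.
from sklearn.tree import DecisionTreeClassifier

def proof_found(bk, text, hypothesis):
    """Stand-in for a theorem prover call on (BK & T) => H."""
    raise NotImplementedError   # a real system calls e.g. Vampire here

def model_statistics(formulas):
    """Stand-in for a finite model builder (e.g. Mace or Paradox): returns
    (domain size, total size of the predicate extensions) of a minimal model."""
    raise NotImplementedError

def features(bk, text, hypothesis):
    dom_t, ext_t = model_statistics(bk + [text])
    dom_th, ext_th = model_statistics(bk + [text, hypothesis])
    return [int(proof_found(bk, text, hypothesis)),  # strict logical entailment
            dom_th - dom_t,                          # how much H enlarges the domain
            ext_th - ext_t]                          # ... and the extensions

# Toy feature vectors and labels (1 = entailment), showing only the final step:
X_train = [[1, 0, 0], [0, 0, 1], [0, 1, 4], [0, 2, 6]]
y_train = [1, 1, 0, 0]
classifier = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
print(classifier.predict([[0, 0, 0]]))   # features of a new, unseen T/H pair
```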
Slide 111: RTE State-of-the-Art — The PASCAL RTE challenge is a hard problem; it requires semantics.

Slide 112: What can we say about RTE? — We can use FOL inference techniques successfully; there might be an interesting role for model building; the bottleneck is getting the right background knowledge.

Slide 113: Lack of Background Knowledge — RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.
---------------------------------------------------------------
There is a territorial waters dispute.

Slide 114: Winding up — Summary, conclusion, shameless plug, future.

Slide 115: Summary — The use of first-order inference tools has had a major influence on how computational semantics is perceived today. The implementations were used in pioneering work on first-order inference in NLP and in spoken dialogue systems, and are now also used in wide-coverage NLP systems.

Slide 116: Conclusions — We have the tools for doing computational semantics in a principled way using DRT. For many applications, success depends on the ability to systematically generate background knowledge: small restricted domains (dialogue) versus the open domain. Finite model building has potential. Incremental inference.

Slide 117: Shameless Plug — For more on the basic architecture underlying this work on computational semantics, and in particular on implementations of the lambda calculus and the parallel use of theorem provers and model builders, see: www.blackburnbos.org