See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/2336135

Datalog Grammars for Abductive Syntactic Error Diagnosis and Repair

Article · July 2000. Source: CiteSeer. Citations: 16. Reads: 54.

Author: Gabriel Pereira Lopes, New University of Lisbon (124 publications, 834 citations).

All content following this page was uploaded by Gabriel Pereira Lopes on 10 December 2013. The user has requested enhancement of the downloaded file.




Datalog Grammars for Abductive Syntactic Error Diagnosis and Repair

J. Balsa¹, V. Dahl², and J.G. Pereira Lopes³

¹ Departamento de Informática, Faculdade de Ciências de Lisboa, Bloco C5, Piso 1, Campo Grande, 1700 Lisboa, Portugal. [email protected]
² Logical and Functional Programming Group, Simon Fraser University, Burnaby, B.C. V5A 1S6, Canada. [email protected]
³ Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2825 Monte da Caparica, Portugal. [email protected]

Abstract. In this paper we present a method based on abduction for explaining and repairing the errors detected in the analysis of natural language sentences. This method builds the most plausible corrections for any unrecognised sentence. It is a declarative approach and, as such, departs from the traditional heuristic relaxation-based approaches. In the proposed method, the ungrammaticality of a sentence is viewed as a diagnosis problem, and abductive reasoning is used to obtain the explanations for the detected errors. The problem is represented as an abduction problem and is implemented using logic programming with explicit negation and with integrity constraints. The existence of an error in a sentence will cause a contradiction to become apparent due to the violation of an integrity rule. Contradictions are removed according to a revision process developed for contradictory programs in the sense of the well-founded semantics for extended logic programs. A contradiction can be eliminated by adding rules that force the change of the logic value of some literals. The plausibility criterion among the different correction hypotheses is related to the minimality of the revisions and to the number of fault modes in the revisions.
In order to improve the efficiency of an earlier reported experiment that used a top-down parsing strategy, a bottom-up parsing strategy, using a Datalog grammar, is now proposed. Building on the partial results obtained during this phase, the proposed methodology for detecting and repairing errors is launched.

1 Introduction

In this paper a new method for explaining and repairing errors detected during the analysis of natural language sentences is described. It is based on abduction plus Datalog Grammars (DLGs, [12]). We are interested in detecting and explaining errors that are context dependent. So, we do not deal with the correction of isolated misspelled words.

For a long time, the problem of error correction was viewed as reducible to a few cases, all amenable to relatively ad-hoc techniques such as constraint

relaxation. More recent efforts try to integrate more natural language processing power into the detection and repair tasks, as the sole means to cover more classes of errors accurately.

In [5] an abductive framework for syntactic error repair was developed. Syntactic error detection and repair is conceived as a model-based diagnosis problem, in the sense of Console and Torasso [11]. This problem is then transformed into an abductive problem, and the method proposed by Alferes and Pereira [1, 3] for solving abductive diagnosis problems as problems of contradiction removal in a logic program is then applied. Once possible explanations for the abductive problem are found, a correction for the input can be easily obtained.

In [5], the diagnosis of errors and their repair is performed in two phases. In the first phase there is an attempt to fully parse the input string. In the second phase, if the first one was not successful, the framework for abducing minimal (in the sense of subset ordering) sets of explanations is then fully set up. One of the problems that was not dealt with relates to the fact that, during the second phase, the partial parses obtained in the first phase were not used. As no memory of earlier work is kept, the second phase needs to go again through the same problems.

In this paper, we explain how memory loss about the work already done is overcome by using a Datalog Grammar [12] to obtain a database with all the possible partial parses for the input string. In the second phase, using this database as contextual information, an abductive approach, similar to the one proposed in [5], is then set up in order to detect whether the input string is a correct sentence or a faulty one. In the latter case changes are then proposed for the input.

Datalog grammars are basically logic grammars in which function symbols are either non-existent, or are restricted in such a way that their use in a grammar does not affect termination (i.e., termination is still guaranteed).
Their efficiency was shown to be better than that of their DCG counterparts under (terminating) OLDT-resolution [39].

Other methods for detecting and repairing syntactic faults have been widely discussed in the literature (see [20] for a complete survey). However, as will become apparent from this paper, our approach is completely declarative and can be tuned either to propose corrections for the input, or (this requires further research) to help a grammar engineer to pinpoint a faulty or missing rule in the grammar (declarative abductive program debugging), or a faulty, incompletely specified, or missing lexical entry.

The other approaches are mostly based on heuristic relaxation techniques. The EPISTLE/CRITIQUE system [18, 33], which is used for editing English correspondence, is based on an Augmented Phrase Structure Grammar for the syntactic analysis. It uses a relaxation-based technique, i.e. a technique that relaxes, one by one, different kinds of constraints from the rules (such as number agreement) in order to enable the recognition of the input string as a sentence.

A more recent relaxation-based system, by Douglas and Dale [16], does syntax and style correction, extending the PATR-II formalism [37] with a declarative

specification of possible relaxations on the rules. This procedure enables a clear identification of possible input faults. However, there is no clear declarative procedure either for generating corrections for hypothesised input errors or for pinpointing faults (in the grammar and lexicon) when the input is correct.

Another system, for Dutch [42], after a spelling correction phase, uses a shift-reduce parsing technique with an Augmented Context-free Grammar in order to enable robust parsing of the sentence and perform the syntactic correction.

A more recent system, described in [17], as it currently stands, performs only the correction of agreement errors for French by relaxing constraints.

Another system, for Portuguese, was developed [4]. It uses a chart parser capable of handling quite complex movement of linguistic material in Portuguese sentences (topicalization, relative clauses, interrogative sentences, etc.), taking into account different kinds of barriers to that movement [26]. It also uses relaxation and, like the previous systems, draws heavily on heuristic criteria to decide what the possible corrections are and which of those should be preferred. It enables person, number and gender agreement correction, identification either of missing words or of word excess, recognition of unknown words, etc. A comparable system had been developed by C. Mellish [30]. It takes a context-free grammar and generalises chart parsing, allowing some kind of relaxation. It does not handle feature unification relaxation.

Quite recently, a commercial product for Portuguese syntactic correction has been presented [31]. This system is implemented in C++ and, at its core, the syntactic corrector rules are based on Augmented Transition Networks [43]. After making the spelling correction of the text, the main module of the system first tries to find specific patterns of common errors.
In a second phase, a heuristic strategy is used to perform automatic syntactic parsing.

The methodology proposed in this paper, built on top of Datalog Grammars [12], extends the techniques based on abduction and on model-based diagnosis which were first developed for Portuguese [5], and it is portable to other languages and to other grammar formalisms and parsing methodologies. Our extension focuses the abductive reasoner in such a way that it becomes more efficient than in the previous model.

Section 2 provides the background necessary for us to build on, in an informal and intuitive manner (for the formal definitions and proofs, please see [5] and [12]). Section 3 presents the use of DLGs in the bottom-up parsing phase. Section 4 presents the abductive component. Section 5 discusses limitations and extensions, and Section 6 presents our concluding remarks.

2 Background

2.1 The Abductive Approach to Error Correction

Our approach is based on the characterisation of the syntactic error correction and detection problem as a diagnostic problem [11].

Any system whose behaviour we want to diagnose will be described as a pair ⟨BM, COMP⟩, where BM is the behaviour model (whose basic constituent

is the grammar, in our approach) and COMP is the set of components of the system (the words of the input sentence). As the same word might appear more than once in a sentence, each component is represented not only by the word but also by its position in the input string. So, for the input string John ate two cake, whose incorrectness must be explained, the set of components will be COMP = {₀John₁, ₁ate₂, ₂two₃, ₃cake₄}. Of course we could think of higher-level sentence components other than the words, as will be shown in Section 5.

Each component of the system is associated with a set of behavioural modes. For a generic word ᵢ₋₁wᵢ, this set is

  modes(ᵢ₋₁wᵢ) = {correct, faultᵢ₁, ..., faultᵢₘᵢ}

where mᵢ is the number of fault modes for the word ᵢ₋₁wᵢ, and correct represents the possibility that the given word form is correct in the input string. The fault modes for a word depend:

- on the representation of that word in the lexicon (it may or may not be represented there), and
- when the word is there, on the assigned syntactic categories.

As we want to detect and repair agreement errors, all words recognised as inflecting in number will have the fault mode wrong_number. But, as will be mentioned in Section 5, if we wanted to tackle person, gender, tense and mode agreement (namely the use of the subjunctive and the tensed infinitive in subordinate clauses in Portuguese), then we should consider other kinds of fault modes.

Following Console and Torasso [11], a diagnostic problem is defined as follows:

Definition 1 (Diagnostic Problem). A diagnostic problem DP is a triple DP = ⟨⟨BM, COMP⟩, CXT, OBS⟩ where:

- ⟨BM, COMP⟩ is the description of the system to be diagnosed;
- CXT is a set of ground atoms denoting the set of contextual data;
- OBS is a set of ground atoms denoting the set of observations to be explained.

If we consider the words given in the input string as the components of the system to be diagnosed, each new input string (unrecognised sentence) defines a new diagnostic problem.
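As an illustration, the positioned-component representation above can be sketched in a few lines of Python. This is a hypothetical sketch, not the authors' implementation: the toy lexicon and the mode names are stand-ins consistent with the examples in this section.

```python
def components(sentence):
    # Pair each word with its start and end positions, so that repeated
    # words remain distinct components (e.g. two occurrences of "the").
    words = sentence.split()
    return [(i, w, i + 1) for i, w in enumerate(words)]

# Toy lexicon: maps each known word to whether it inflects in number.
# These entries are illustrative assumptions, not taken from the paper.
LEXICON = {"John": False, "ate": False, "two": False, "cake": True}

def modes(component):
    # Behaviour modes of one positioned word: "correct" plus any
    # applicable fault modes, depending on the lexicon (cf. Section 2.1).
    _, word, _ = component
    faults = []
    if word not in LEXICON:
        faults.append("misspelled")
    elif LEXICON[word]:
        faults.append("wrong_number")
    return ["correct"] + faults

COMP = components("John ate two cake")
# COMP == [(0, 'John', 1), (1, 'ate', 2), (2, 'two', 3), (3, 'cake', 4)]
```

Positions rather than word forms identify components, which is exactly what lets two occurrences of the same word carry different fault modes.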
Of course different diagnostic problems will have many things in common, namely the grammar. In our system we consider a single observation requiring explanation. So, OBS will only have a single element, i.e. the information that the input was not recognised.

The contextual data (CXT) depends on the sentence that is currently being examined for correction, and it contains a set of facts that represent possible ways of repairing possibly incorrect words (not only the misspelled ones but also wrongly inflected words) [5].

A solution for the diagnostic problem is an explanation for the observations. Console and Torasso [11] defined a solution to a diagnostic problem as a solution to an equivalent abduction problem with consistency constraints. Let us have a closer look into their definitions.

Definition 2 (Abduction Problem). Given a diagnostic problem DP = ⟨⟨BM, COMP⟩, CXT, OBS⟩, an abduction problem AP corresponding to DP is a triple AP = ⟨⟨BM, COMP⟩, CXT, ⟨Ψ⁺, Ψ⁻⟩⟩ where:

- Ψ⁺ ⊆ OBS;
- Ψ⁻ = {¬f(x) : f(y) ∈ OBS, for each admissible value of x other than y}.

So, Ψ⁺ represents the subset of the observations that must be covered by the solution. As we are interested in a purely abductive approach, we will consider in our work Ψ⁺ = OBS. In our case Ψ⁻, a set of explicitly negated atoms with which the solution must be consistent, is the empty set.

Using this approach, not everything can be abduced. The set of abducible symbols is the union of all the sets of behaviour modes, i.e.

  abducibles = ⋃ᵢ₌₁ⁿ modes(ᵢ₋₁wᵢ)

where n is the number of words in the sentence. Given an abducible symbol, α, the fact that a word ᵢ₋₁wᵢ is in fault mode α is represented in the behaviour model by α(ᵢ₋₁wᵢ). In the behaviour model, BM (which is a set of Horn clauses), no abducible symbol can appear in the head of a clause. So, BM will include:

1. a representation of the grammar used to recognise the sentences;
2. clauses that represent the coverage of each particular type of error;
3. clauses that depend on the given input sentence and that establish the link between the errors covered and the fault modes each given input word can be in.

Assuming that each component can be in only one behaviour mode, we can see what an explanation for an abduction problem is.

Definition 3 (Explanation for an Abduction Problem). Given an abduction problem AP = ⟨⟨BM, COMP⟩, CXT, ⟨Ψ⁺, Ψ⁻⟩⟩, a set of literals W is an explanation for AP iff:

1. W contains exactly one element of the form α(c), for each c ∈ COMP.
2. W covers Ψ⁺, that is, for each m ∈ Ψ⁺ we have that BM ∪ CXT ∪ W ⊨ m.
3. W is consistent with Ψ⁻, that is, BM ∪ CXT ∪ W ∪ Ψ⁻ is consistent, or, equivalently, for each ¬m ∈ Ψ⁻, we have BM ∪ CXT ∪ W ⊭ m.

In general, more than one explanation for an abduction problem can exist.
A solution to an abduction problem is defined as an explanation that minimises (in the sense of set inclusion) the set of faulty components. For instance, given the sentence John ate two cke, the system could find these two explanations:

  W1 = {correct(₀John₁), correct(₁ate₂), correct(₂two₃), misspelled(₃cke₄)}
  W2 = {correct(₀John₁), correct(₁ate₂), wrong(₂two₃), misspelled(₃cke₄)}

where the correction of the misspelling would substitute cke by cakes (in W1) or by cake (in W2). If we prefer lower-cardinality explanations, W1 will be preferred, as it assumes only one fault mode.
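The preference among explanations can be made concrete with a small sketch. The dictionary encoding of W1 and W2 below is our own illustration, not the paper's representation:

```python
def fault_set(expl):
    # The faulty components assumed by an explanation: every literal
    # whose mode is something other than "correct".
    return frozenset((c, m) for c, m in expl.items() if m != "correct")

def subset_minimal(explanations):
    # Keep explanations whose fault set does not strictly include
    # another explanation's fault set (minimality under set inclusion).
    fsets = [fault_set(e) for e in explanations]
    return [e for e, f in zip(explanations, fsets)
            if not any(g < f for g in fsets)]

# W1 and W2 for the input "John ate two cke", as in the text above.
W1 = {"0John1": "correct", "1ate2": "correct",
      "2two3": "correct", "3cke4": "misspelled"}
W2 = {"0John1": "correct", "1ate2": "correct",
      "2two3": "wrong", "3cke4": "misspelled"}

minimal = subset_minimal([W1, W2])                           # only W1 survives
preferred = min([W1, W2], key=lambda e: len(fault_set(e)))   # lower cardinality
```

Here W1's fault set is a strict subset of W2's, so set-inclusion minimality and the lower-cardinality preference happen to agree; in general the two criteria can differ, as discussed in Section 4.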

2.2 Abductive Diagnosis as Contradiction Removal in Logic Programs

In order to effectively solve our problem, we use the methodology presented in Alferes and Pereira [3] for the abductive resolution of diagnostic problems. These authors defined a general program transformation (Theorem 1, below) that converts an abduction problem (Definition 2, above) into a logic program extended with explicit negation¹ and integrity constraints. Such a logic program is a set of rules of the form

  H ← L1, ..., Ln    (n ≥ 0)

where H is an objective literal² and the Li's are literals. Integrity rules have the form

  ⊥ ← A1, ..., An, not B1, ..., not Bm    (with n + m > 0)

where all Ai's and Bj's are objective literals, and the symbol ⊥ denotes falsity.

In [1, 3] a semantics for this kind of logic program, WFSX³, was defined. For programs that are contradictory with respect to the WFSX, a process for removing the contradiction was developed. As a matter of fact, contradictions may arise as a consequence of assuming some default literals as true when they should not be. For instance, consider the following program:

  a ← not b.
  ¬a ← not c.
  ⊥ ← a, ¬a.

This program is contradictory because both a and ¬a are entailed (since not b and not c are true by default). The contradiction removal process finds the minimal sets (in the sense of set inclusion) of default literals that cause the contradiction, and revises their value. In the above example, if the predefined set of revisables is {not b, not c}, two minimal revisions of the program, {b} and {c}, will be found. Of course we could remove the contradiction by revising both not b and not c, but this would not lead to a minimal revision ({b, c} is not minimal).

Using Alferes and Pereira's methodology, the abduction problem is solved as a contradiction removal problem. Let us then see how we can transform an abduction problem in order to take advantage of the contradiction removal process provided by the machinery developed by Pereira et al. [1-3, 15].
The following transformation is due to [3].

¹ Although in this paper we do not illustrate the use of explicit negation, we have used it in some experiments in the context of [5].
² A literal is either an objective literal or a default literal. An objective literal, L, is an atom (A) or its explicit negation (¬A). A default literal is an objective literal or its default negation (not L). An explicitly negated atom is true only if it is entailed, whereas a default negated literal may be assumed true when the literal itself is not entailed.
³ Well-Founded Semantics with eXplicit negation.
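For the three-rule program above, the minimal-revision search can be simulated by brute force. This is a sketch of the idea only; the actual system uses the WFSX contradiction-removal interpreter, not this enumeration:

```python
from itertools import combinations

# The contradictory program above:
#   a <- not b.    ¬a <- not c.    ⊥ <- a, ¬a.
# Revisables: {not b, not c}. Revising "not b" to false means making b true.

def contradictory(revised):
    # a holds when "not b" is still true, i.e. when b was not revised;
    # likewise ¬a holds when c was not revised. Contradiction iff both hold.
    a = "b" not in revised
    neg_a = "c" not in revised
    return a and neg_a

def minimal_revisions(revisables):
    # Smallest-first enumeration keeps only subset-minimal revision sets.
    found = []
    for size in range(len(revisables) + 1):
        for subset in combinations(sorted(revisables), size):
            s = set(subset)
            if not contradictory(s) and not any(f <= s for f in found):
                found.append(s)
    return found

revisions = minimal_revisions({"b", "c"})
# revisions == [{"b"}, {"c"}]; revising both b and c is not minimal.
```

Enumerating subsets smallest-first makes the subset-minimality check cheap: any set that contains an already-found revision can be discarded immediately.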

Theorem 1. Given an abduction problem (AP) corresponding to a diagnostic problem, the minimal solutions of AP are the minimal revising assumptions of the modelling program plus contextual data and the following rules:

1. ⊥ ← not obs(v), for each obs(v) ∈ Ψ⁺.
2. ¬obs(v), for each ¬obs(v) ∈ Ψ⁻.

And for each component ci with distinct abnormal behaviour modes bj and bk:

3. correct(ci) ← not ab(ci).⁴
4. bj(ci) ← ab(ci), fault_mode(ci, bj).
5. ⊥ ← fault_mode(ci, bj), fault_mode(ci, bk), for each pair of distinct bj and bk,

with fault_mode(ci, bj) and ab(ci) being the revisables.

The great advantage of having this transformation is that there is an interpreter for the WFSX that performs contradiction removal (the algorithm on which this interpreter is based is described in [2]). So, if we represent the behavioural model of the system as a logic program (and this is a straightforward task, since BM is a set of Horn clauses), adding the rules indicated in the above theorem, we get a direct implementation. This means that when we apply the contradiction removal process to the program obtained as a result of the transformation defined, the minimal revisions of the program are the abductive explanations we are looking for (in our application, errors in words). And from these abductive explanations we may choose one as the diagnosis. This choice takes into account other criteria, namely the cardinality of the revisions.

2.3 Datalog Grammars

In their most common implementations, logic grammars [10] resort to list representations of the strings being analysed or synthesised.
For instance, the rule:

  s --> np, vp.

where s stands for sentence, np for noun phrase and vp for verb phrase, typically translates into:

  s(In, Out) :- np(In, Remainder), vp(Remainder, Out).

The first added argument represents the input from which to consume symbols while analysing them, while the second one produces the as yet unanalysed portion of the string, which is to be consumed by the next grammar symbol. Thus the query:

  ?- s([the, parse, failed], []).

⁴ ab(c) means that the component c is abnormal.

can be read as: "Can the initial part of the string the parse failed be recognised as a sentence by the grammar, leaving an empty remainder?", i.e., can the whole string be recognised as a sentence? In general, string manipulation resorts invisibly to the system predicate 'C' (standing for connects), usually defined as 'C'(From, [From|To], To), where From denotes the head of the input list and To the rest of that list.

In Datalog grammars (DLG) [12], a given context-free grammar is automatically translated into an assertional representation [19] which is largely equivalent to the list-based one but which, under appropriate evaluation mechanisms such as OLDT resolution, ensures termination.

For instance, a call to analyse the parse failed compiles into:

  'D'(the, 0, 1).      (the stretches between points 0 and 1)
  'D'(parse, 1, 2).
  'D'(failed, 2, 3).

  ?- s(0, 3).          (Is there a sentence between points 0 and 3?)

while lexical rules compile into forms that use these representations accordingly, e.g.:

  noun(P1, P2) :- 'D'(parse, P1, P2).

meaning that there is a noun between points P1 and P2 if there is an appropriate 'D' fact connecting P1 with P2.

Termination can be guaranteed even in the presence of extra arguments, as long as these are orthogonal to the syntactic derivation process described by the grammar, such as those typically used to build syntactic or semantic representations as a by-product of parsing, or those which act as constraints to avoid overgeneration, as in the case of number and gender agreement.

Such arguments will, if anything, decrease the number of derivations possible from the unaugmented grammar, and thus termination will be maintained.
From a practical point of view, the restriction to these kinds of arguments can be ensured through the (not uncommon) technique of writing and testing a pure DLG (with no function symbols) grammar, and then adding allowed types of extra arguments as needed.

In [12] an incremental evaluation implementation of Datalog grammars was studied, based upon the semi-naive evaluation algorithm [9], in which we begin with the set of axioms and obtain the theorems of the first "layer" by applying the derivation rules; then we take these theorems as a new starting point to derive the theorems of the second layer, and so on. Generally, to derive the theorems of the next "layer", at least one theorem produced at the previous stage must be used. This process terminates when no more new theorems can be generated. This implementation also includes theorem counters, but we shall disregard them for our purposes here.

It is interesting to note that the Datalog grammar formalism is not dependent on implementation. It allows, other than the semi-naive bottom-up evaluation used here, other possible implementations, such as magic sets [8] or OLDT.

The main focus of this paper is not the optimal execution of Datalog grammars but, rather, their applicability to abductive reasoning in the context of error detection. The application of faster implementation methods for Datalog grammars would, of course, indirectly improve on the results presented here.

However, we should point out that the semi-naive evaluation algorithm is not as bad as it would seem at first, since the parsing is focused by the direct, immediate use of the input string. A great degree of efficiency is provided by the method proposed. Previous approaches, such as the one described in [30], also use a first bottom-up parse to generate an initial chart on which a modified top-down parser can then hypothesise possible complete parses. But this approach needs a myriad of modifications to the grammar: modified top-down rules which must know how to refine a set of global needs, fundamental rules, e.g. to incorporate found constituents from either direction, versions of top-down rules which stop when reaching an appropriate category that has already been found bottom-up, etc. Our approach cleanly deals with error detection and correction through a first Datalog pass followed by abductive reasoning on the (unmodified) user's grammar.

Figure 1 shows a toy grammar for sentences such as the parse failed, its translation into DLGs, and the sequence of theorems obtained incrementally as outlined above. Terminals are noted between brackets.

  s --> np, vp.          s(A, B) :- np(A, C), vp(C, B).
  np --> det, n.         np(A, B) :- det(A, C), n(C, B).
  vp --> v.              vp(A, B) :- v(A, B).
  det --> [the].         det(A, B) :- 'D'(the, A, B).
  n --> [parse].         n(A, B) :- 'D'(parse, A, B).
  v --> [failed].        v(A, B) :- 'D'(failed, A, B).

  ?- s(the parse failed).

  T1 = {'D'(the, 0, 1), 'D'(parse, 1, 2), 'D'(failed, 2, 3)}
  T2 = T1 ∪ {det(0,1), n(1,2), v(2,3)}
  T3 = T2 ∪ {np(0,2), vp(2,3)}
  T4 = T3 ∪ {s(0,3)}

  Fig. 1. A CF grammar and query, its DLG translation, and the result of incremental parsing
Since s(0,3) is present among the set of theorems derived, we conclude that the sentence in question is syntactically correct.

Incremental parsing has been advocated from a psycho-linguistic as well as a computational viewpoint [38]. But of course, other implementations of DLGs are also possible (cf. [12]).
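The layered derivation of Figure 1 can be reproduced with a small bottom-up evaluator. The following Python sketch is illustrative only: it is a naive fixpoint that reapplies all rules to all facts each round, whereas true semi-naive evaluation only combines facts new in the previous layer, and the rule set is hard-coded from Figure 1 rather than compiled from a grammar:

```python
# 'D'(word, From, To) facts for the input "the parse failed".
FACTS = {("D", "the", 0, 1), ("D", "parse", 1, 2), ("D", "failed", 2, 3)}

LEXICAL = {"the": "det", "parse": "n", "failed": "v"}

def step(facts):
    # Apply every grammar rule of Figure 1 once to the current fact set.
    new = set(facts)
    for f in facts:
        if f[0] == "D":                      # cat(A,B) :- 'D'(word,A,B).
            _, word, a, b = f
            new.add((LEXICAL[word], a, b))
    cats = {f for f in facts if f[0] != "D"}
    for (c, a, b) in cats:
        if c == "v":                         # vp(A,B) :- v(A,B).
            new.add(("vp", a, b))
    for (c1, a, m) in cats:
        for (c2, m2, b) in cats:
            if m != m2:
                continue                     # spans must be contiguous
            if (c1, c2) == ("det", "n"):     # np(A,B) :- det(A,C), n(C,B).
                new.add(("np", a, b))
            if (c1, c2) == ("np", "vp"):     # s(A,B)  :- np(A,C), vp(C,B).
                new.add(("s", a, b))
    return new

def layers(facts):
    # Iterate to a fixpoint, keeping each layer (the T1, T2, ... of Fig. 1).
    ts = [set(facts)]
    while True:
        nxt = step(ts[-1])
        if nxt == ts[-1]:
            return ts
        ts.append(nxt)

T = layers(FACTS)
# T[-1] contains ("s", 0, 3): the whole string is recognised as a sentence.
```

Running this yields exactly four layers, matching T1 through T4 in Figure 1, with ("s", 0, 3) appearing only in the last one.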

3 DLG Bottom-up Parsing Phase

Let us exemplify our approach through the toy grammar in Figure 2, in which number disagreement is prevented through term unification.

  s(Number) --> np(Number), vp(Number).
  np(N) --> det(N), n(N).
  vp(N) --> iv(N).
  vp(N) --> tv(N), np(_).

  det(sing) --> [the].      det(plu) --> [the].
  det(sing) --> [this].     det(plu) --> [these].
  det(sing) --> [a].
  n(sing) --> [boy].        n(plu) --> [boys].
  n(sing) --> [book].       n(plu) --> [books].
  iv(sing) --> [laughs].    iv(plu) --> [laugh].
  tv(sing) --> [reads].     tv(plu) --> [read].

  Fig. 2. A grammar with number agreement specifications

Given the input the boys laugh, we obtain the results expressed in Figure 3(a). Given the input the boys laughs, we obtain the results expressed in Figure 3(b), and no parse for the whole sentence is obtained. Given the input a boys laughs, we obtain the results expressed in Figure 3(c), and no parse for the whole sentence is obtained. Given the input this boys read a book, we obtain the results expressed in Figure 4, and no parse for the whole sentence is obtained.

  (a) T1 = {'D'(the, 0, 1), 'D'(boys, 1, 2), 'D'(laugh, 2, 3)}
      T2 = T1 ∪ {det(sing,0,1), det(plu,0,1), n(plu,1,2), iv(plu,2,3)}
      T3 = T2 ∪ {np(plu,0,2), vp(plu,2,3)}
      T4 = T3 ∪ {s(plu,0,3)}

  (b) T1 = {'D'(the, 0, 1), 'D'(boys, 1, 2), 'D'(laughs, 2, 3)}
      T2 = T1 ∪ {det(sing,0,1), det(plu,0,1), n(plu,1,2), iv(sing,2,3)}
      T3 = T2 ∪ {np(plu,0,2), vp(sing,2,3)}

  (c) T1 = {'D'(a, 0, 1), 'D'(boys, 1, 2), 'D'(laughs, 2, 3)}
      T2 = T1 ∪ {det(sing,0,1), n(plu,1,2), iv(sing,2,3)}
      T3 = T2 ∪ {vp(sing,2,3)}

  Fig. 3. The result of using DLG to parse the strings: (a) the boys laugh, (b) the boys laughs, and (c) a boys laughs

4 The Abductive Error Detection and Repair Phase

In order to run the correction process, the grammar presented earlier, in Figure 2, must be expanded with rules 1, 3 and 5, as shown in Figure 5, after having

stripped the grammar from the lexicon part. The added rules will be used for justifying correct behaviour of grammar non-terminals. These rules use the information produced during the first parsing phase (using the Datalog Grammar formalism).

  T1 = {'D'(this, 0, 1), 'D'(boys, 1, 2), 'D'(read, 2, 3), 'D'(a, 3, 4), 'D'(book, 4, 5)}
  T2 = T1 ∪ {det(sing,0,1), n(plu,1,2), tv(plu,2,3), det(sing,3,4), n(sing,4,5)}
  T3 = T2 ∪ {np(sing,3,5)}
  T4 = T3 ∪ {vp(plu,2,5)}

  Fig. 4. The result of using DLG to parse the string this boys read a book

  1. s(N,A,B) ← s1(N,A,B).
  2. s(N,A,B) ← np(N,A,C), vp(N,C,B).
  3. np(N,A,B) ← np1(N,A,B).
  4. np(N,A,B) ← det(N,A,C), n(N,C,B).
  5. vp(N,A,B) ← vp1(N,A,B).
  6. vp(N,A,B) ← iv(N,A,B).
  7. vp(N,A,B) ← tv(N,A,C), np(_,C,B).

  Fig. 5. The grammar for the abductive error detection and repair

So that the reader may easily understand the use of Theorem 1, let us take the example illustrated in Figure 3(b). The transformation we require from that data collapses levels T1 and T2 into the data depicted in Figure 6, under the predicate 'D'/4 (in T1, 'D' was a three-argument predicate). The other, higher-level syntactic categories np, vp and s are transformed into np1, vp1, and s1, respectively (np1(plu,0,2) and vp1(sing,2,3) of Figure 6).

In the second phase, as np1(plu,0,2) is a fact, the system may assume the subject np as correct (see Rule 3 of Figure 5 and the fact np1(plu,0,2) of Figure 6). In this case it only needs to parse through the verb phrase in order to find that the error results from the number of iv, which should be plural (laugh) rather than singular (laughs). The other possible repair assumes a correct vp (see Rule 5, Figure 5), stretching from position 2 to 3 (the resulting fact is vp1(sing,2,3) of Figure 6), and the repair must build upon the noun, changing its number from plural (boys) to singular (boy). Let us now look in detail at how our technique acts.

For each grammar pre-terminal symbol, pre_terminal(Info, A, B), a generic rule must be added:

  pre_terminal(Info, A, B) ← rw(pre_terminal, Info, A, B).

So, for the noun, we must add the rule:

n(N, A, B) ← rw(n, N, A, B).

where rw stands for recognised word, and represents the link between the grammar and the rules that will allow the system to explain the errors.

  'D'(det,sing,0,1).     'D'(det,plu,0,1).
  'D'(n,plu,1,2).        'D'(iv,sing,2,3).
  np1(plu,0,2).          vp1(sing,2,3).

  Fig. 6. Transforming the results obtained in the first phase for the parse of the boys laughs

For each agreement error that we want to cover, we must add a rule for rw/4:

  rw(Cat, N, A, B) ← 'D'(Cat, N1, A, B),           /* how the word was recognised */
                     canchange_n(Cat, N1, A, B, N), /* contextual information */
                     wrong_number(w(A, B)).         /* fault mode */

But we must also consider a rule for the correct-word case:

  rw(Cat, N, A, B) ← 'D'(Cat, N, A, B), correct(w(A, B)).

So far, we have presented the rules that are needed for the error correction, independently of the sentence to which they will be applied. Now, given the information created during the first-phase analysis of the input string (performed by the Datalog parser), let us see what rules must be created in order to perform the error correction as contradiction removal in the logic program. Consider another incorrect input (Figure 3(c)): a boys laughs. After the first phase, the transformed results are shown in Figure 7.

  'D'(det,sing,0,1).     'D'(n,plu,1,2).
  'D'(iv,sing,2,3).      vp1(sing,2,3).

  Fig. 7. Transforming the results obtained in the first phase for the parse of a boys laughs

According to the transformations imposed by Theorem 1, we must add, for each word in the input string, a rule for its possible correct behaviour mode (condition 3 of Theorem 1). These rules should have the form:

  correct(w(A,B)) ← not ab(w(A,B)).

For each possible fault mode of each word we must add a rule. As the only fault mode we are illustrating is wrong_number, and as every word in the sentence can be in that fault mode (according to their syntactic categories), we must add three rules of the generic form

  wrong_number(w(A, B)) ← ab(w(A, B)), fault_mode(w(A, B), wrong_number).

It is also necessary to add contextual information. In this example it is necessary to identify the words that can have their number changed:

  canchange_n(n,plu,1,2,sing).
  canchange_n(iv,sing,2,3,plu).

Notice that the article a, which stretches from position 0 to 1, cannot have its number changed, in the sense that it cannot be used in a plural noun phrase. This is because its stem is different from that of the plural indefinite article, some (not present in the toy grammar of Figure 2)⁵. In the context of a systematic generation of canchange_n facts, such irregular cases will need, as in all dictionaries, to be explicitly dealt with. Finally, it is necessary to assert the rule that will trigger the revision process and thus build the possible explanations for the error in the sentence. This rule is

  ⊥ ← not s(N, 0, 3).

This is the integrity constraint that corresponds to the one that must be added according to condition 1 of Theorem 1. In the interpreter we are using, this rule takes the form

  s(N, 0, 3) ⇐ true.

where ⇐ is a special operator used to define integrity constraints. The previous rule states that the system should find, unconditionally, all possible sets of atoms (abducibles) which, when revised to true, will allow the proof of s(N, 0, 3). In this example the revisables would be identified by the facts

  revisable(fault_mode(w(A, B), wrong_number)).
  revisable(ab(w(A, B))).

where A and B should again be instantiated to the pairs of values (0,1), (1,2) and (2,3).
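The revision search for a boys laughs can be emulated end-to-end with a brute-force sketch. Everything here is a hypothetical stand-in: a hand-coded det-n-iv agreement check plays the role of the grammar, and OTHER_NUMBER plays the role of the canchange_n facts; the real system performs this search via contradiction removal in the WFSX interpreter rather than by enumeration:

```python
from itertools import combinations

# Toy lexicon with a number feature, plus number-changed counterparts.
# The article "a" has no plural counterpart, so it cannot change.
LEX = {"a": ("det", "sing"), "boys": ("n", "plu"), "boy": ("n", "sing"),
       "laughs": ("iv", "sing"), "laugh": ("iv", "plu")}
OTHER_NUMBER = {"boys": "boy", "boy": "boys",
                "laughs": "laugh", "laugh": "laughs"}

def recognised(words):
    # Hand-coded check for the fragment s --> np(N), vp(N);
    # np(N) --> det(N), n(N); vp(N) --> iv(N).
    if len(words) != 3:
        return False
    (c1, n1), (c2, n2), (c3, n3) = (LEX[w] for w in words)
    return (c1, c2, c3) == ("det", "n", "iv") and n1 == n2 == n3

def minimal_repairs(words):
    # Smallest-first search for subset-minimal sets of word positions
    # whose number change makes the sentence parse (the revisions).
    changeable = [i for i, w in enumerate(words) if w in OTHER_NUMBER]
    found = []
    for size in range(len(changeable) + 1):
        for subset in combinations(changeable, size):
            fixed = [OTHER_NUMBER[w] if i in subset else w
                     for i, w in enumerate(words)]
            if recognised(fixed) and not any(set(f) <= set(subset)
                                             for f in found):
                found.append(subset)
    return found

repairs = minimal_repairs(["a", "boys", "laughs"])
# The single minimal repair changes only the noun: a boy laughs.
```

Because a cannot change number, changing only the verb (a boys laugh) still fails agreement with the determiner, so the only minimal repair is the noun change, mirroring the single minimal revision found by the interpreter below.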
According to the revision process, it is easy to check that, in this example, there is only one minimal revision for the program (which corresponds to recognising the sentence as "a boy laughs"):

^5 Of course these are somewhat simplified examples, and we present them for illustration purposes only. Note that the way the canchange_n facts are created is more elaborate than it may seem. In fact, to generate these facts, we use some morphological information that is not present in the grammar presented.

R1 = { correct(w(0,1)), correct(w(2,3)),
       ab(w(1,2)), fault_mode(w(1,2),wrong_number) }

As a matter of fact, the determiner "a" cannot, in this simplified scenario, change in number (although we have the fault mode wrong_number, we do not have the contextual information, and thus the rw/4 rule that requires canchange_n can never succeed). So, the only possibility is to change the number of the noun.

Considering the example in Figure 4, "this boys read a book", there are two possible explanations, i.e., there are two minimal revisions of the program:

R1 = { correct(w(0,1)), correct(w(3,4)), correct(w(4,5)),
       ab(w(1,2)), fault_mode(w(1,2),wrong_number),
       ab(w(2,3)), fault_mode(w(2,3),wrong_number) }

and

R2 = { ab(w(0,1)), fault_mode(w(0,1),wrong_number),
       correct(w(1,2)), correct(w(2,3)), correct(w(3,4)), correct(w(4,5)) }

where R1 corresponds to recognising the sentence as "this boy reads a book", and R2 to recognising it as "these boys read a book". If we were taking into account the tense of the verbs, R1 would not be minimal, because "read" is also past tense. However, as we are only considering the present tense, and since neither of the above revisions contains the other, they are both minimal revisions and the interpreter does not force the choice of either of them. But a possible preference criterion could choose those revisions with a lower number of fault modes. If this were the case, R2 would obviously be preferred.

5 Limitations and Extensions

In our present approach the components of the system to be diagnosed are the words in the input string. So we can associate to each word a set of fault modes. One possible improvement is to consider as components also the partial parses that are obtained in the first analysis.
For instance, for the sentence "the boys laughs", we will have the usual components w(0, 1), w(1, 2) and w(2, 3) (one corresponding to each word), and two other components corresponding to the fact that we can parse a plural noun phrase between points 0 and 2 (np1(plu,0,2)), and a singular verb phrase between points 2 and 3 (vp1(sing,2,3)). Moreover, we must take into account the correct behaviour mode and the fault modes that we can now associate to each of these components. In our simple example we will only add the fault mode wrong_number. For noun phrases this corresponds to adding the rule

   np(N, A, B) ← np1(N1, A, B),               /* successful partial parse */
                 canchange_n(N1, N),          /* change the number */
                 wrong_number(np1(N1, A, B)). /* fault mode */
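Under this phrase-level view, the diagnosis for "the boys laughs" can already be stated over the two partial parses, before descending to individual words. A hypothetical sketch (illustrative Python; the fact names follow the text, the pairing logic is an assumption for this toy case):

```python
# Partial parses from the first phase of "the boys laughs" (illustrative).
NP1 = {("plu", 0, 2)}    # np1(plu,0,2)
VP1 = {("sing", 2, 3)}   # vp1(sing,2,3)

def phrase_level_explanations():
    """Adjacent noun/verb phrases that disagree in number each yield one
    wrong_number explanation per phrase-level component."""
    out = []
    for (n1, a, b) in NP1:
        for (n2, c, d) in VP1:
            if b == c and n1 != n2:          # adjacent and disagreeing
                out.append(("wrong_number", "np1", (a, b)))
                out.append(("wrong_number", "vp1", (c, d)))
    return out
```

Each returned component corresponds to one coarse explanation: change the number of the noun phrase, or change the number of the verb phrase.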

The use of these new components is suitable for an interactive environment where the user can choose the level at which he wants to receive the explanations. In the example above, if the user chooses a higher level, the only two possible explanations might be:

- there is an error in number in the noun phrase of "the boys laughs"
- there is an error in number in the verb phrase of "the boys laughs"

At this point the user might correct the error himself or tell the system to find explanations at a lower level (in our case the usual word level). The implementation of this feature might be achieved using the possibility of defining levels for the revisable literals involved [14].

We can also extend the previous approach to the pre-terminal categories. This will allow the system to suggest possible missing words in the lexicon. But then we run into the problem of updating the lexicon (and, in another extension, into the debugging of the lexicon itself and into grammar debugging). For this purpose, we can add for each pre-terminal category a rule like the following (example for nouns):

   n(N, A, B) ← n1(N1, A, B),               /* successful partial parse */
                canchange_n(N1, N),         /* change the number */
                wrong_number(n1(N1, A, B)). /* fault mode */

Using this rule, and the corresponding fault modes for n1(N1,A,B), the system might propose, as an explanation for the error, an error in the number of the noun, even if, for the recognised noun, the morphological analyser could not find an alternative number inflection (contextual information in the first presented rule for rw/4).
Of course the revisables corresponding to these fault modes would only be active at the lowest level.

Another possible way to improve the efficiency of the explanation/correction process is to have a previous parse that builds the parse tree for the input. The basic idea is that, by adding this extra argument to represent the syntactic structure, we can use it to guide the final analysis, thus limiting the number of candidate rules at each step.

Another possible extension concerns situations in which error correction results in possible alternative parses. For instance, in a speech recognition system, in which words are not properly heard, we can imagine the incorrect input

   April sea delight

corrected, according to context, into a sentence with "April seas" as its subject, namely

   April seas delight

or one with just "April" as its subject:

   April sees the light

In these uses, we can exploit the Datalog component of our system to propose alternatives as described in [12], and use the abductive component to further refine the choices. One obvious extension is the coverage of the other types of agreement errors (person, gender, tense, mood, case, etc.).^6 This can easily be done by adding a specific rule for each category whose agreement we want to check. The rules would be analogous to the ones we have for number agreement (cf. [5]).

A frequent error that is not covered in this paper relates to the existence of extra words in the input string. Let us simplify and assume that there can be at most one extra word. One way to cover this error is to include new rules for the syntactic categories that, taking advantage of the facts derived during the Datalog parse phase, consider the existence of an extra (erroneous) word. For instance, we can add the following rule for sentence:

   s(N, A, B) ← np1(N, A, C), vp1(N, D, B), extra_word(w(C, D)).

where np1(N, A, C) and vp1(N, D, B) represent the partial parses obtained during the Datalog parse phase and the predicate extra_word corresponds to the fault mode. For instance, if we consider the input string

   0 the 1 boys 2 eat 3 eat 4 the 5 cakes 6

during the Datalog parse the facts np1(plu, 0, 2) and vp1(plu, 3, 6) are derived. So, with the previous rule for s, it is possible to identify at a very early stage that one possible explanation for the faulty sentence is the existence of an extra word in the input string.

6 Concluding Remarks

This work represents an attempt to solve the problem of syntactic error correction in natural languages, using Datalog Grammars, abductive reasoning and model-based fault diagnosis. The method we described has several strong points. By using a DLG^7 in order to obtain as much information as we can about the input string, we have been able to prune the search space, thus avoiding an explosion in the abductive procedure.
The results obtained so far point to a 15% improvement in response time relative to the process described in [5]. As we use a logic representation for the grammar, the integration of the syntactic analyser into the extended logic programming language used is straightforward. As the definition of the grammar is independent from the definition of the rules that represent the types of errors that are covered, this makes it easier to expand

^6 Some of these agreement errors are not relevant in English. But at least in Portuguese (and other Latin languages) they are very important, namely the use of the subjunctive and the tensed infinitive in subordinate clauses.
^7 The flexibility of the Datalog grammar formalism for describing natural grammars has been recently attested in a study of its use for reconstituting missing elements in parallel structures such as coordination [13].

any of these two components of the system, i.e., we can extend the coverage of the grammar without changing the rules for the errors, and we can cover a larger number of errors without changing the grammar. Of course there are some points of contact between these two components. Although the examples we used focus only on number agreement errors, other types of errors are already covered, namely other kinds of agreement errors (gender, person, case) and also missing, unknown and misspelled words.^8 Although we did not try to correct semantic errors, in principle these can be treated in a similar way; however, this topic requires further extensive research.

With respect to relaxation-based methods (e.g. [16]), our approach is simpler in that it does not require the ordering of constraints. This need springs from the fact that relaxation methods depend crucially on how much of the grammar's information is represented as constraints on feature values. Relaxation may transform grammars into less closely related ones, and therefore it becomes important to identify, for instance through constraint ordering, which constraints (such as gender and number agreement) are more important to relax, which should not be tried if certain others have failed, etc. It is not clear how the ordering of constraints could be automated for relaxation methods, and relying on the user to declare some kind of ordering results in lower transparency.

It should also be noted that the use of Datalog grammars, with its stress on lexical information in the input string and its immediate use in the analysis, favours currently promising linguistic approaches which stress the incorporation of all kinds of linguistic information into the lexicon.
A lexicon thus enriched, coupled with our lexicon-focusing Datalog approach, is likely to provide at an early stage a lot of information that will further reduce the search space.

When the first version of this paper was prepared for the International Workshop on Natural Language Understanding and Logic Programming [7], we identified several possible future developments for the work described in this paper. One was the augmentation of the grammar coverage: not only should the grammar have more rules, but it is also worthwhile to study the adaptability of this method to more powerful logic grammar formalisms, namely Extraposition Grammars [32] or Bound Movement Grammars [26, 4]. Regarding the Bound Movement Grammars formalism [26], which was designed as a generalization of the Extraposition Grammars formalism [32] in order to maintain declarativity and descriptive simplicity when dealing with different kinds of movements of sentence constituents (in interrogative sentences, in relative clauses, for topicalization, for clitics in Latin languages, etc.), a wide-coverage grammar (with more than 200 rules) for Portuguese was designed, as well as a parsing framework for a head-driven, bottom-up and top-down, bidirectional (LR and RL) chart parser [35, 36] implemented in DyALog [41, 40]. This parser, with such a grammar, is quite fast (it partially parses 6 pages of text in less than 14 seconds). We have not yet reproduced the results of the research described in

^8 We do not perform the spelling correction, but it is possible to integrate a spelling corrector with our system (for instance, [34]).

this paper using this wide-coverage grammar for explaining incorrectly written sentences.

As we have also pointed out, it is desirable to cover more types of errors. Even at the syntactic level there are types of errors that are not considered, for instance errors due to an incorrect separation of words ("thecar" instead of "the car", or "bo ok" instead of "book").

As already noted, we assume that errors can occur only in the given input. Of course, this might not be the case. There may be errors (especially due to lack of information), both in the grammar and in the lexicon, that make it impossible to recognize a sentence. Whether or not we can represent this type of error in a uniform way is still an open issue for further research. However, quite recently, Lopes et al. [25, 21, 6, 23] have experimented with the chart parser mentioned above [35, 36] for explaining partial parses of real text by assuming wrong subcategorization information in the lexicon used (with 120000 basic word entries: singular for nouns, singular masculine for adjectives, infinitive for verbs [22, 24]). The hypotheses posed that lead to better partial parses (with a lower number of non-intersecting edges spanning the partially parsed input) may then be statistically validated using a method similar to the one explained in [29, 27]. These experiments aim at enabling self-evolving parsing systems [23, 21].

Another issue for further research arises as a consequence of pruning the search space for larger-coverage grammars, namely by using a POS tagger [28]. This may lead to locally incorrect tagging. Relying on the assigned tags may then make it impossible to recognize correct input. This is a problem we have just started to tackle, and experimental results are not yet available.

Acknowledgements. We gratefully acknowledge support from Canada's National Scientific Research Council (grant 31-611024), from Universidade Nova de Lisboa, from Simon Fraser University, and from V.
Dahl's Calouste Gulbenkian Award for Science and Technology, 1994. We also want to thank JNICT for the support received through grant PBIC/C/TIT/1226/92, corresponding to the project FEELING (Tools for the Portuguese Language Industry). We also want to thank Luís Moniz Pereira for his encouragement towards an extended application of the abductive machinery developed at the Computer Science Department of FCT/UNL, and Paul Tarau, as well as the anonymous referees, for helpful comments on this article's first draft.

References

1. J. J. Alferes. Semantics of Logic Programs with Explicit Negation. PhD thesis, Universidade Nova de Lisboa, 1993.
2. J. J. Alferes, C. V. Damásio, and L. M. Pereira. A top-down query evaluation for well-founded semantics with explicit negation. In A. Cohn, editor, European Conference on Artificial Intelligence, pages 140-144. Morgan Kaufmann, 1994.
3. J. J. Alferes and L. M. Pereira. Reasoning with Logic Programming. LNAI 1111. Springer Verlag, 1996.

4. A. Baião. Chart parsing: Repairing morpho-syntactic errors on written Portuguese. Master's thesis, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 1994. (In Portuguese.)
5. J. Balsa. Abductive reasoning for repairing and explaining errors in Portuguese. Master's thesis, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 1994. (In Portuguese.)
6. J. Balsa. A hierarchical multi-agent system for natural language diagnosis. In Henri Prade, editor, Proceedings of the 13th European Conference on Artificial Intelligence (ECAI'98), pages 195-196. John Wiley and Sons, 1998.
7. J. Balsa, V. Dahl, and J. G. P. Lopes. Datalog grammars for syntactic error diagnosis and repair. In Proc. of the 5th NLULP Workshop, 1995.
8. François Bancilhon, David Maier, Yehoshua Sagiv, and Jeffrey D. Ullman. Magic sets and other strange ways to implement logic programs. In PODS '86: Proc. of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 1-15, 1986.
9. François Bancilhon and Raghu Ramakrishnan. An amateur's introduction to recursive query processing strategies. In Carlo Zaniolo, editor, Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 16-52, 1986.
10. A. Colmerauer. Metamorphosis Grammars, pages 133-189. LNCS 63. Springer Verlag, 1978.
11. L. Console and P. Torasso. A spectrum of logical definitions of model-based diagnosis. Computational Intelligence, 7(3):133-141, 1991.
12. V. Dahl, P. Tarau, and Y. N. Huang. Datalog grammars. In M. Alpuente, R. Barbuti, and I. Ramos, editors, Proceedings of the Joint International Conference on Declarative Programming (GULP-PRODE'94), 1994.
13. V. Dahl, P. Tarau, L. Moreno, and M. Palomar. Treating coordination through datalog grammars. In Proceedings of the COMPULOGNET/ELSNET/EAGLES Workshop on Computational Logic for Natural Language Processing, pages 1-17, 1995.
14. C. Damásio, W. Nejdl, and L. M. Pereira. REVISE: An extended logic programming system for revising knowledge bases.
In J. Doyle, E. Sandewall, and P. Torasso, editors, Proc. of KR'94. Morgan Kaufmann, 1994.
15. C. V. Damásio. Paraconsistent Extended Logic Programs with Constraints. PhD thesis, Universidade Nova de Lisboa, 1996.
16. Shona Douglas and Robert Dale. Towards robust PATR. In Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING '92), pages 468-474, 1992.
17. Damien Genthial, Jacques Courtin, and Jacques Menézo. Towards a more user-friendly correction. In Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING '94), pages 1083-1088, 1994.
18. G. E. Heidorn, K. Jensen, L. A. Miller, R. J. Byrd, and M. S. Chodorow. The EPISTLE text-critiquing system. IBM Systems Journal, 21(3):305-326, 1982.
19. R. A. Kowalski. Logic for Problem Solving. Elsevier, 1979.
20. Karen Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4):377-439, 1992.
21. J. G. P. Lopes and J. Balsa. Overcoming incomplete information in NLP systems - verb sub-categorization. In Fausto Giunchiglia, editor, Artificial Intelligence: Methodology, Systems and Applications, pages 331-340. Springer Verlag, 1998.

22. J. G. Pereira Lopes. Desenvolvimento de Dicionários Electrónicos e de Bases de Dados Lexicais Para Processamento Computacional de Línguas Naturais. Colección Aspectos de Lingüística Aplicada. Madri/Frankfurt: Iberoamericana/Vervuert, 1998.
23. J. G. Pereira Lopes. Overcoming incomplete lexical information. In Steven Krawer, editor, Proceedings of the Workshop "ELSNET in Wonderland", pages 114-116. ELSNET, 1998.
24. J. G. Pereira Lopes, N. Marques, and V. Rocio. POLARIS: A POrtuguese Lexicon Acquisition and Retrieval Interactive System. In Leon Sterling, editor, Proceedings of the Second International Conference on the Practical Applications of Prolog, page 665, 1994.
25. J. G. Pereira Lopes, V. Rocio, and J. Balsa. Overcoming incomplete lexical information, 1998. (Submitted to the Journal of Computational Logic.)
26. J. G. Pereira Lopes, Vitor Rocio, Rosa Viccari, and Emiliano Padilha. Bound-movement grammar for natural language parsing. In Proceedings of the "II Encontro Para O Processamento Computacional de Português Escrito e Falado", pages 11-19, 1996.
27. N. Marques, J. G. Pereira Lopes, and C. A. Coelho. Using loglinear clustering for subcategorization identification. In M. Quafafou and J. Zytkov, editors, Proceedings of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD '98), LNCS 1510, pages 379-387. Springer Verlag, 1998.
28. N. M. Marques and J. G. P. Lopes. Using neural nets for Portuguese part-of-speech tagging. In Proc. of the 5th CSNLP Conference, September 1996.
29. N. M. Marques, J. G. P. Lopes, and C. A. Coelho. Learning verbal transitivity using loglinear models. In C. Nedellec and C. Rouveirol, editors, Machine Learning: ECML-98, pages 19-24. Springer Verlag, 1998.
30. Chris S. Mellish. Some chart-based techniques for parsing ill-formed input. In Proceedings of the 27th Annual Meeting of the ACL, pages 102-109, 1989.
31. Maria das Graças V. Nunes, Claudete M.
Ghiraldelo, Gisele Montilha, Marcelo Turine, Maria Cristina de Oliveira, Ricardo Hasegawa, Ronaldo Martins, and Osvaldo Oliveira Jr. Desenvolvimento de um sistema de revisão gramatical automática para o português do Brasil. In Proceedings of the "II Encontro Para O Processamento Computacional de Português Escrito e Falado", pages 71-80, 1996. (In Portuguese.)
32. Fernando Pereira. Extraposition grammars. American Journal of Computational Linguistics, 9(4):243-255, 1981.
33. Stephen Richardson and Lisa Braden-Harder. The experience of developing a large-scale natural language text processing system: CRITIQUE. In Proceedings of the 2nd Conference on Applied Natural Language Processing, pages 195-202, 1988.
34. V. Rocio and J. G. P. Lopes. Approximate word matching in retrieval from electronic dictionaries. In Proceedings of the International Conference on Natural Language Processing and Industrial Applications, pages 237-241, 1996.
35. V. Rocio and J. G. Pereira Lopes. Partial parsing, deduction and tabling. In Proceedings of the First Workshop on Tabulation in Parsing and Deduction (TAPD '98), pages 52-61, 1998.
36. V. Rocio and J. G. Pereira Lopes. An infra-structure for diagnosing causes for partially parsed natural language input. In Proceedings of the Sixth International Symposium on Social Communication, 1999. (To be published.)
37. S. Shieber, H. Uszkoreit, F. Pereira, J. Robinson, and M. Tyson. The Formalism and Implementation of PATR-II, pages 39-79. AI Center, SRI International, 1983.

38. Edward Stabler. Parsing for incremental interpretation. Technical report, UCLA, 1994.
39. Hisao Tamaki and Taisuke Sato. OLD resolution with tabulation. In Third International Conference on Logic Programming, LNCS 225, pages 84-98. Springer Verlag, 1986.
40. Éric Villemonte de la Clergerie. Automates à Piles et Programmation Dynamique. DyALog: Une application à la programmation en Logique. PhD thesis, Université Paris 7, 1993.
41. Éric Villemonte de la Clergerie and Bernard Lang. LPDA: Another look at tabulation in logic programming. In Van Hentenryck, editor, Proc. of the 11th International Conference on Logic Programming (ICLP '94), pages 470-486. MIT Press, June 1994.
42. Theo Vosse. Detecting and correcting morpho-syntactic errors in real texts. In Proceedings of the 3rd Conference on Applied Natural Language Processing, pages 111-118, 1992.
43. W. A. Woods. Transition network grammars for natural language analysis. Communications of the ACM, 13(10):591-606, 1970.
