
A Robust Linguistic Processing Architecture

C.J. Rupp and David Milward

Distribution: Public

Specification, Interaction and Reconfiguration in Dialogue Understanding Systems: IST-1999-10516

Deliverable D4.1

September, 2000


Specification, Interaction and Reconfiguration in Dialogue Understanding Systems: IST-1999-10516

Göteborg University, Department of Linguistics

SRI Cambridge, Natural Language Processing Group

Telefónica Investigación y Desarrollo SA Unipersonal, Speech Technology Division

Universität des Saarlandes, Department of Computational Linguistics

Universidad de Sevilla, Julietta Research Group in Natural Language Processing

For copies of reports, updates on project activities and other SIRIDUS-related information, contact:

The SIRIDUS Project Administrator
SRI International
23 Millers Yard, Mill Lane
Cambridge, United Kingdom
CB2 [email protected]

See also our internet homepage http://www.cam.sri.com/siridus

© 2000, The Individual Authors


No part of this document may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner.


Primary responsibility for authorship is divided as follows: David Milward wrote Chapters 1 and 4; C.J. Rupp wrote Chapters 2, 3, 5 and 6.


Contents

1 Introduction
2 Linguistic Approach to Robust Processing
  2.1 On Robustness
    2.1.1 Defining Robustness
    2.1.2 Engineering Robustness
    2.1.3 Robust Linguistic Processing
    2.1.4 Robust Processing of Spoken Language
  2.2 Architectures for Robust Linguistic Processing
    2.2.1 Putting Robustness before Parsing
    2.2.2 Robust Parsing Processes
  2.3 Robust Postprocessing
  2.4 Robust Multi-Engine Analysis
  2.5 The Verbmobil Experience
3 Robustness as Repair
4 Robust Interpretation
  4.1 Distributed Representation
  4.2 Mapping from a Semantic Chart to Slot Values
  4.3 Distinctive features of the approach
    4.3.1 Task specific interpretation
    4.3.2 Choosing the best fragments, not just the largest
    4.3.3 Exploitation of underspecification
    4.3.4 Context dependent interpretation
    4.3.5 Relationship to other Approaches
  4.4 Reconfigurability
  4.5 Evaluation on a transcribed corpus
  4.6 Conclusion
5 The Contrast between two Approaches to Robustness
6 A Baseline Architecture for Combining Robustness Strategies
  6.1 A Simple Analysis Architecture


Chapter 1

Introduction

In this report we will describe two apparently very different approaches to robust linguistic processing. The first is based around the notion of dealing with exceptional phenomena: a system is robust if it deals with phenomena outside its normal range of input, such as ungrammatical sentences. To achieve this the approach encodes various patterns which reconstruct self corrections, repetitions etc. The second approach has no hard and fast distinction between normal or exceptional input: it assumes that we have to come to conclusions under partial information (utterances without a full parse are just one example of this) and must make use of whatever information sources are available at the time. Under this approach, if the input is noisy and hence uncertain, greater weight is given to expectation from the context, and vice versa.

The two approaches embody different philosophies, and this will be reflected in the chapters which follow. Despite this it does make sense to incorporate both strategies in a single system. The first approach is concerned with providing as complete an analysis of the input utterance as possible, creating larger constituents which span e.g. a repetition or a correction. The second approach is concerned with taking a set of constituents and the context and mapping from these to a semantics. It does not care whether constituents are created via grammar rules or reconstruction rules provided the rules are reliable (or the new edges are weighted appropriately). In fact, from the perspective of the second approach, the fact that reliable patterns can be created to deal with exceptional behaviour suggests that the behaviour is not exceptional at all, just a matter of predictable performance effects, similar to the predictable competence effects which are captured via syntax rules.
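The point that the interpretation stage is agnostic about where an edge came from can be made concrete. The following sketch is purely illustrative (the names and the simple chart representation are our own, not the system's): grammar-derived and repair-derived constituents share one format, differing only in their weights.

from dataclasses import dataclass

@dataclass
class Edge:
    start: int       # chart vertex where the constituent begins
    end: int         # chart vertex where it ends
    category: str    # e.g. "NP"
    weight: float    # confidence; repair rules would carry a penalty
    source: str      # "grammar" or "reconstruction"

def best_edge(edges):
    # Rank purely by span and weight, ignoring provenance.
    return max(edges, key=lambda e: (e.end - e.start, e.weight))

edges = [Edge(0, 3, "NP", 0.9, "grammar"),
         Edge(0, 5, "NP", 0.6, "reconstruction")]  # spans a repetition
print(best_edge(edges))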

The first part of this deliverable describes the first approach, but also includes background concerning the nature of robustness, and some alternative approaches to robustness. The second part describes the second approach, and how this has been incorporated in the initial Siridus architecture. The final chapter describes how the two approaches could be put together, and some of the challenges we foresee in achieving this.


Chapter 2

Linguistic Approach to Robust Processing

2.1 On Robustness.

This chapter picks up and extends an exposition begun in [19]. The original theme was what it actually means to provide a system with robust behaviour. Here¹, we will go slightly deeper into the semantics of the term “robustness” and include cases where systems are considered to be inherently robust, as well as those where their functionality is extended to provide robustness. These insights will then be applied to the most common notion of robustness in linguistic processing and then to the processing of spoken language in particular.

2.1.1 Defining Robustness.

As we consider it useful to analyse the accepted definition of the term “robustness” we will take as a starting point a definition of the underlying adjective taken from an online technical dictionary [9]:

robust Said of a system that has demonstrated an ability to recover gracefully from the whole range of exceptional inputs and situations in a given environment. One step below bulletproof. Carries the additional connotation of elegance in addition to just careful attention to detail. Compare smart, opposite: brittle.

¹ Because here the pages are cheaper.


In fact we will mainly make use of the first sentence, which provides some very useful concepts for the structure of the discussion that follows. We might have gone further back to definitions of the original usage from which the technical metaphor is drawn, but this smacks of excessive etymological enthusiasm and, more importantly, standard dictionary entries distinguish readings which share the properties we would most want to emphasise, i.e. deeper analysis would appear more diffuse rather than more concentrated.

The definition given above actually emphasises some of the properties of the best or most successful instantiations of robust strategies, in the reference to graceful recovery and elegance. Such heights are not always reached, at least in the research literature. However, we believe that we can structure an exposition on the essence of robustness around four concepts in the first sentence of the definition:

Demonstrate: Firstly, robustness has to be demonstrated. You cannot prove that a system is robust. You must allow it to exhibit behaviours that are then deemed to demonstrate robustness. Hence, robustness is a user-oriented concept. Designers and developers may attempt to ensure robustness but only the user can determine whether that was successful. How do they make that judgement?

Recover: The system exhibits robustness by recovering, preferably gracefully. But recovery implies that the system's behaviour is detectably different in an exceptional situation, and yet that does not include failing or crashing. In fact, it is the absence of failure that first characterises robustness and then, in the second instance, the relative gracefulness of the “recovery” that it does exhibit. The user does not necessarily know what is going on within the system and yet she can identify the recovery behaviour.

Exceptional: The triggers for robust behaviours, or rather for interpreting the system's behaviour as a recovery, are exceptional inputs and situations. Well, who is to say what is exceptional? We are clearly dealing with a set of preconceived expectations about which circumstances ought to be within the scope of the system's normal performance and which are exceptional. The correct response to a normal situation is unremarkable. Any non-terminative response to an abnormal situation will be interpreted as a recovery, but some recoveries are more graceful than others.

Environment: Finally, the whole system is further relativised because it takes place in a given environment. The class of normal and exceptional circumstances varies according to the environment. But what happens when the conditions for a given environment are broken? We must necessarily suspend robustness judgements; hence there are malfunctions that are outside the scope of robust behaviour because they undermine the environment against which judgements are to be made.

For our current purposes this latter insight is useful, but mainly because it helps us to arrive at the practical definition of criteria for robustness judgements that we are looking for:

• Robustness is an externally observed, and therefore subjective, property.

• It is assigned on the basis of the system's behaviour in a situation that is outside the observer's expectations for the system's normal working parameters.

• Robustness judgements are made relative to a more global set of expectations regarding normal working parameters.

Such a practical definition, based on the appearance of being able to withstand a certain degree of adversity, accords quite well with the original usage of the term “robust” as applied to physical objects, like, say, furniture. However, it provides absolutely no guidance on how to encode robustness in any specific system.

2.1.2 Engineering Robustness.

We can caricature the definition given in the previous section by saying that robustness is generally in the eye of the observer. This external viewpoint suggests a degree of subjectivity, but this is not necessarily the case. While you can vary the observer, you can also assume a generalised model of the expectations of both naive and expert users. In particular, this will be necessary for the purposes of design and evaluation. For evaluation, you need a more formal set of criteria for robustness judgements to allow for comparisons between strategies and for systems with and without a specific strategy. For design purposes, you need a goal to work towards, and this can be characterised in a generalised model of robustness judgements for a specific system.

However, an external viewpoint does mean that there is no easy correlation between the internal state of the system and whether or not its behaviours will be considered robust. In a complex system there may, indeed, be a considerable distance between the modules that contribute to robustness and the system's external behaviour, so that a successful local strategy may not give the best global results. Since it is the global results that really count, this decoupling of internal behaviour and external effect would appear to be correct, but it risks concealing an important distinction in the design of robust systems.

As far as the engineering of a system is concerned there is an obvious difference between inherently robust mechanisms and those that require additional components and behaviours to achieve a level of robustness. The root of such a distinction often lies in the application itself or the degree to which the available technology matches the application. In fact, there are probably even applications that are inherently robust. If doing nothing is an acceptable behaviour then the application itself offers a “no lose” strategy. More generally, a system may give the impression of being robust just by being correct and running in a stable operating system. The latter part of the definition above implies that this is possible in the reference to “careful attention to detail”, but that equally implies that user expectations are somewhat deformed, probably by exposure to incorrect or unfinished software. The additional criterion of elegance can also, and eminently, be met by an inherently robust system. However, you have to wonder whether the definition of exceptional situations is not in some way defective. For instance, this is most easily explained for an expert observer, with expectations of the cases that are typically problematic for the state of the art, who is confronted with a genuine advance in the underlying technology, such as a theory that is oriented towards the actual data and not an idealisation of it. The perception of the system's behaviour is then defective in underestimating the actual capabilities of the technology, but the system is still robust, because it is the perception of robustness that counts and not any particular strategy that is engineered into the system to generate this perception.

Inherent robustness is, however, by no means the rule, and where it does occur it is usually a relative rather than an absolute judgement, where some problem cases are taken into the normal scope of the system's functionality. In most cases robust behaviours are generated by additional mechanisms over and above the basic system functionality. The solution of employing a “bag of tricks” to achieve an acceptable level of robustness is also relative, but in a slightly different way. Specific mechanisms and strategies are employed to counter specific problems, so that claims and judgements have to be made relative to the phenomena that were to be accounted for. The interplay of expectation is even more complex here: the system designer is attempting to counter exceptional, essentially unexpected, circumstances that may occur in the actual system usage, or simply to expect the unexpected. The occurrence of such problematic cases may be amplified by inappropriate or idealised models underlying the basic functionality, or there may be other genuine, perturbing factors, e.g. in the gathering of information. There are really only two types of strategies you can employ to achieve additional robust behaviours:

focused: Identify problem phenomena, analyse them and formulate specific responses.

fuzzy: Make processing more, potentially successively more, approximative.

The focused approach is highly phenomenon specific, which directly affects the claims of robustness that can be made. The greatest danger here is that the identification process will misfire and employ strategies that are not required. Of course, phenomena that are not predicted will not evoke any robust response. The fuzzy approach is more uniformly robust, but may involve the system becoming less responsive. This could lead to behaviours triggered more by the system's own internal state than the information that was input. In the extreme case it also carries a certain risk that the system will be so unperturbed that it fails to exhibit any response at all. It is then really a matter of perception whether this is regarded as a recovery. While we would not for one minute assume that anyone would actually implement the extreme cases of these strategies, we can use them to characterise the shape of what they are about. On this view the catatonia induced by terminally fuzzy perception is far less dangerous than the paranoia that would result from taking the focused view to its logical conclusion.

2.1.3 Robust Linguistic Processing

The previous discussion has been somewhat abstract and without any grounding in concrete examples. In fact, we have largely been extrapolating back from the specific instances that we are familiar with to a more general perspective. This is not an apology, just a description, since we are now ready to apply our perspective on robustness to systems that process natural language, and, in particular, spoken language.

The first question we will ask in addressing robust linguistic processing is: against what influences is it necessary to be robust? Here, we can leave aside non-linguistic influences such as process communication or memory management. The main source of linguistically exceptional circumstances is then the user. This is not meant to be in any way demeaning; it is a simple and explicable fact that the problems the system has to be robust against come from outside. Therefore there are whole areas of linguistic processing that are usually impervious to these influences, e.g. generation from an internal representation to a linguistic form for output. In contrast, any information that originates with the user has to be regarded with some degree of suspicion. There are two main reasons for this. The first is that computational models of linguistic behaviour do not completely match actual behaviour. Formal models are oriented towards competence rather than performance. Statistical models are data driven and therefore implicitly normative. Hence, there will be some degree of mismatch between the analyses proposed and what was actually intended. Usually, this is not the major source of inaccuracy, but performance effects are increased in spoken language.

The other main source of perturbation in linguistic inputs is the medium of input itself. For written language this is also a relatively minor factor, but it does exist. Increasingly, typed text exists in various different qualities. Most of these are relatively free of performance phenomena, such as spelling errors, because they have been adequately edited, but wherever controls are less strict linguistic performance errors will be mixed with physical typing errors which are purely a consequence of the medium. If you start to process typed input incrementally then both sides are amplified. Perhaps you could also consider non-native usage here, which will generate competence errors from human sources, but also, increasingly, automatic translation errors. However, none of these factors occurs on quite the same scale as the effects of automatic speech recognition on spoken input. Despite the advances in speech recognition that have made speech applications a reality, recognition errors are persistent. The word error rate is the standard measure of quality in speech recognition. This indicates immediately that it is the frequency of errors that is decreasing. Their nature has not changed, nor has their gravity. The worst case of media interference therefore coincides with the most frequent performance phenomena.


Figure 2.1: An example of a word hypothesis graph for English.


Since most of the phenomena that require a robust response occur in user inputs, robust strategies are required throughout the analysis process. Most of robust linguistic processing is, thus, reduced to robust linguistic analysis. In order to adequately present the issues associated with a robust linguistic analysis we first need, at least schematically, a general model of linguistic analysis without a robust component. The predominant component in linguistic analysis is, of course, the parser. We all know what parsers do specifically, so generally what they do is associate an input form with a result form, or analysis. For text processing the input form is essentially a string. For speech processing this can be more complex, because it may be a single string, a set of strings, or a set of strings organised into a word lattice or word hypothesis graph.
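To make the notion of a word hypothesis graph concrete, here is a minimal sketch in the style of Figure 2.1. It is our own illustration, not a Verbmobil data structure: vertices stand for points in the signal, edges carry word hypotheses with recogniser scores, and every path through the graph is one candidate string.

from collections import defaultdict

class WordLattice:
    def __init__(self):
        # start vertex -> list of (end vertex, word, acoustic score)
        self.edges = defaultdict(list)

    def add(self, start, end, word, score):
        self.edges[start].append((end, word, score))

    def paths(self, start, goal):
        # Enumerate (word sequence, summed score) for every path.
        if start == goal:
            yield [], 0.0
            return
        for end, word, score in self.edges[start]:
            for words, total in self.paths(end, goal):
                yield [word] + words, score + total

lat = WordLattice()
lat.add(1, 2, "I", 8.8)      # scores modelled on Figure 2.1,
lat.add(1, 2, "hi", 28.0)    # where lower is better
lat.add(2, 3, "have", 24.0)
for words, total in lat.paths(1, 3):
    print(" ".join(words), total)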

Any of these forms may be annotated with additional information. Probabilities or confidence values are common, but the range of possible annotations is quite broad. The range of objects that parsers may produce as analyses is, if anything, wider. Formally, the most popular structures are trees, logical forms or attribute value structures, although care is needed with some of these terminologies. A logical form is usually a string, but may be a tree. An attribute value structure/matrix/graph or feature structure can encode both a tree and a logical form if interpreted as a metalanguage expression. None of these distinctions are particularly important to the architecture for implementing robustness, but they may be important to the design of the parser. They are almost certain to be relevant to the linguistic description that a parser applies. The traditional and default model of parsing employs a linguistic description, but there are also other ways of mapping from an input to an analysis. We will not need to explore these here either, because they are typically so uniform that only inherent or external robustness is worth considering. On a reductionist model of linguistic analysis there are only really three things you can do to introduce additional robust behaviours:

• Modify the input.

• Modify the parser.

• Modify the analysis.

Modification of the parser can be further divided into modification of the behaviour of the parser and modification of the linguistic description that the parser applies, but that is already getting into difficult territory, because you are taking a linguistic model of competence and adding, probably specific, performance and application-specific information. It could be argued that as a result the description only models what it is to be acceptable to the specific parser in a specific application. In the next chapter we will examine how each of these options for introducing robustness into the linguistic analysis architecture has been exploited in processing spoken language, but first we must also examine the phenomena that must be treated in more detail.

2.1.4 Robust Processing of Spoken Language

We have already noted in the previous section that spoken language exhibits two classes of phenomena that may require a robust treatment in analysis: spontaneous speech phenomena and recognition errors. Spontaneous speech phenomena show linguistic performance and the effects of the language production process. They present a problem for linguistic models that mostly model linguistic competence and are in many cases adapted from models designed for the analysis of text. On the other hand, spontaneous speech phenomena do present as clear and recognisable behaviours that can themselves be modelled. Recognition errors are less predictable, as they arise out of the techniques of automatic speech recognition. There are various points in the speech recognition process where errors can enter. The most obvious and trivial examples are actual disturbances on the speech signal that is recorded and the effects of statistical language models applied as a final filter. In the best, or rather least worst, case recognition errors may approximate human recognition errors and confuse words with similar phonetic forms, but even that is not guaranteed. The one thing that we can confidently say is that in recent years the word error rate for automatic speech recognition has been dropping steadily. This will not eliminate recognition errors but it may provide enough of a framework of correct word hypotheses to detect errors and determine a robust response.

The two main factors in the analysis of spoken language that require a robust response, spontaneous speech phenomena and recognition errors, can be conceived of as noise on a signal. The speech recogniser itself must detect the speech signal from among the noise that occurs in the recorded data. We can similarly see the information the spoken input should carry as a linguistic signal. This is perturbed by spontaneous speech phenomena and recognition errors. In reality, the real noise is a contributing factor in the figurative noise, but that is not our current concern. What we need here is a way of filtering out the noise or compensating for it. Taken at the metaphorical level this would mean either placing a threshold on the information that is supplied, or attempting to recognise patterns that are indicative of the perturbing phenomena. This coincides roughly with the notion of fuzzy and focused strategies for robustness. We should note here that a threshold or filter requires a reliable and uniform measure of confidence in each item of information. Conversely, pattern matching must accept the recognition of the individual pattern elements. This can be taken into account, but it indicates how pervasive confidence values become once they are first introduced.

The two most significant spontaneous speech phenomena are self corrections and false starts. These both reflect errors in the speech production process. Words that have been uttered are explicitly or implicitly retracted or replaced. This is equivalent to online speech editing.

e040ach1_040_ADB_280000: well Sunday night as well . Sunday night , Monday night , and Tuesday night . we are coming back Wednesday . so , -/I don't suggest taking/- -/there is/- I don't think on a Sunday there is any need to take the #six $A-$M flight . but it is a #twelve hour flight , I would like to get there before it is +/too/+ very late on the Sunday .

For example, we present here a turn, or single dialogue contribution, from the Verbmobil English corpus. This displays both a self correction and a false start. The example employs a reduced version of the Verbmobil dialogue annotations to highlight these behaviours. In a self correction the redundant section or reparandum, literally that which is to be repaired, is denoted by a preceding +/ and a following /+. The latter mark implicitly marks the point at which the correction occurs, but the extent of the reparans, the replacement, is not marked.

False starts are marked with a similar notation, where the -/ and /- bracket the redundant component. Here there is no direct relation to the restart, as a false start is intended to mark a change in the train of thought. Practically, though, the distinction between a correction and a restart may not always be easy to make. The only hard test lies in the notion of a structural relation between reparandum and reparans, but that can potentially be at any level of structure. These behaviours can also be nested, so that a correction may actually contribute to the redundant part of a false start, as in the following:

e049ach1_034_LMT_280000: okay . yeah , +/I could/+ -/well , I could leave/- I have dinner plans , you know +/but/+ -/we could leave/- it looks like there is a midnight flight .²

² The interpretation of the second correction is a dubious case.

Conversely, a false start may be immediately followed by a correction, which may also lead into a grey area.

e064ach2_019_RNH_280000: and -/after/- +/I guess I will be back/+ I will be back in the office on Monday , the #fifth of January .

Is this a correction or a repetition?

e067ach2_010_PNP_280000: okay . -/so we will/- +/so when do you want it/+ +/what/+ why don't we leave a little bit , like the #twelfth ? maybe .

Are these corrections or further false starts? Fortunately, the questions we raise here concern only the interpretation suggested by the dialogue annotations. An account of these phenomena can only be concerned with relatively simple structural patterns that focus on the simplest and most frequent phenomena.

Hesitations, stammering, and repetitions are also typical spontaneous speech phenomena, but these can formally be mapped onto the more intrusive classes, e.g. as a self correction with identity. In fact, the annotation scheme adopted in Verbmobil does not distinguish between correction and repetition. A repetition is simply a correction where reparans and reparandum are identical, and these are quite frequent.

e045ach2_068_MBB_280000: it is got , indoor pool , and , exercise room , all sort of , nice little features . -/I/- I am not really sure , about these , locations . but they seem to be , in relation to +/their/+ their central station . and , you know as far as I can tell , +/either/+ either the ˜Marriott , or maybe the hotel ˜Cristal , ˜Hanover which is actually at the central station . seem to be +/the/+ the good ideas for me .
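Since the annotation scheme is purely textual, the marked-up reparanda can be exploited mechanically. The following helper is hypothetical (our own names, not a Verbmobil tool) and handles only flat, unnested markers; it shows how a transcribed turn could be reduced to an approximation of the intended string.

import re

REPAIR = re.compile(r"\+/(.*?)/\+")   # self correction: +/reparandum/+
RESTART = re.compile(r"-/(.*?)/-")    # false start: -/redundant part/-

def normalise(turn: str) -> str:
    turn = REPAIR.sub("", turn)
    turn = RESTART.sub("", turn)
    return " ".join(turn.split())     # tidy the leftover whitespace

example = ("and -/after/- +/I guess I will be back/+ "
           "I will be back in the office on Monday")
print(normalise(example))
# -> "and I will be back in the office on Monday"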

Most of these phenomena then have a specific pattern, even at some level a symmetrical one. If we can detect these patterns at some level of the analysis we can counter them by simplifying the information that is available to further processing. This would be an intrusive process: you are altering the object under analysis. So you have to be careful that you are not eliminating information that was really intended to be part of the communication.

Recognition errors are exhibited at a more basic level and should be countered by simpler strategies. Speech recognition is essentially word recognition, so an error is either a missing word or the intervention of an incorrect word hypothesis, or both, as in the case of a substitution. The counter measures are similarly made up of basic string operations such as deletion and insertion. The problems of confidence in the error detection are also similar. The goal that is usually set is the completion of an analysis apparently blocked by such an error, but the level of confidence in the analysis must also be taken into account. Here too we would be fabricating an analysis, and therefore information, that we have not actually seen. The situation may be slightly more certain where it appears that morphological variants of the same lemma have been substituted. This can be easily explained by relative frequency in a language model, but it is also most typical of highly inflected languages.
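The view of recognition errors as insertions, deletions and substitutions is exactly the alignment underlying the word error rate mentioned earlier. The sketch below is a textbook dynamic-programming alignment, included for illustration rather than taken from the report:

def word_errors(reference, hypothesis):
    # Minimal edit distance over words: each insertion, deletion or
    # substitution counts as one recognition error.
    m, n = len(reference), len(hypothesis)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i                # deletions
    for j in range(n + 1):
        cost[0][j] = j                # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = cost[i-1][j-1] + (reference[i-1] != hypothesis[j-1])
            cost[i][j] = min(sub, cost[i-1][j] + 1, cost[i][j-1] + 1)
    return cost[m][n]

ref = "I have an appointment on the seventeenth".split()
hyp = "I have appointment on seventh".split()
print(word_errors(ref, hyp) / len(ref))   # word error rate for this pair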

The interplay of confidence values in the various analysis processes, robust or otherwise, is a pervasive factor in the previous discussion. We should point out that confidence factors can reflect different sorts of information and may even already be composed from different sources. Word recognition makes use of various acoustic models, which include reduction to the level of individual phones. These may be combined with statistical language models, typically N-gram probabilities over a given corpus. The combination of acoustic and distributional probabilities may occur within the recogniser or within the analysis module. The analyser itself may attach confidence values to constructions on the basis of their distribution in a tree bank or analysed corpus, or on subjective hand-coded preferences. Finally, the robustness mechanisms, if they are separately implemented, may attach further confidences to their own operations. In the latter case these are most likely to be subjective, as it is hard to see how a probabilistic preference for robustness mechanisms could be developed.
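One common way of combining such heterogeneous scores, offered here as an assumption of ours rather than anything prescribed by the report, is a weighted sum in the log domain, with hand-set weights standing in for the subjective preferences just mentioned:

import math

def combined_score(acoustic_cost, lm_prob, parse_pref,
                   w_ac=1.0, w_lm=5.0, w_parse=2.0):
    # Acoustic scores typically arrive as costs (lower is better), the
    # language model as a probability, and the parser preference as a
    # hand-coded value; the weights themselves would need tuning.
    return (-w_ac * acoustic_cost
            + w_lm * math.log(lm_prob)
            + w_parse * parse_pref)

print(combined_score(acoustic_cost=45.3, lm_prob=0.02, parse_pref=1.5))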

Spoken language processing is much more reliant on probabilities and other scoring mechanisms than text processing, because its input data, the speech recognition results, are uncertain. Another characteristic, or at least intensified, feature of spoken language processing is time pressure. Most of the benefits of using spoken inputs can only be realised if the system's overall performance is close to real time, i.e. close to human performance for a similar task. Practically, this often imposes quite stringent time constraints on individual processes, to the extent that time factors can also play a role in determining the best analysis result, or rather the best result derivable in the available time. Time factors often play an influential role in determining how the interface between the parser and recogniser is explored, as in how many and which hypotheses are explored. Conversely, this affects how the probabilities and confidence values may be used.

The interplay of various such scoring mechanisms is already a complex problem, in particular the combination of probabilities and subjective scores, but in processing analyses of spoken language there is often a further factor that has to be taken into account. The scores for various competing constructions can be compared, but only if they are all present. This may not always be the case. We have noted that the interface between parser and recogniser allows a certain amount of variation. In the simplest case the input is a string, the best string according to the recogniser; then the only choices are between competing analyses, or the choice of continuing with standard analysis or going robust. However, the best string is usually not the only one available. This complicates the choice problem with the factor of whether to suspend current processing and consider the next best string, as it might just yield better results. A word lattice or word hypothesis graph can provide a whole set of ranked strings simultaneously and in a compact data structure. The full ranking is, however, implicit. It would be possible to consider all analyses generated by word sequences in a word lattice. This would reduce the choice between strings to the choice between constructions, but for a very large set of constructions. In practice, word lattices are often reduced to N-best string sequences before or during parsing, just to ensure that the best strings can be processed within a realistic time. Where processing time allows, a lattice can even be processed as a packed structure to produce a packed analysis result, but this further implies that the individual analysis steps are relatively cheap and that subsequent modules in the application can either process all packed ambiguities or select an appropriate result.
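Unpacking a lattice into a ranked sequence of strings can be sketched as a best-first search over the graph. This is our own minimal illustration (assuming edge scores are costs, lower being better), not the interface of any particular recogniser:

import heapq

def n_best(edges, start, goal, n):
    # edges: {vertex: [(next_vertex, word, cost), ...]}
    # Pop partial paths cheapest-first; each time the goal is reached,
    # the completed path is the next best string.
    results = []
    frontier = [(0.0, start, [])]
    while frontier and len(results) < n:
        total, vertex, words = heapq.heappop(frontier)
        if vertex == goal:
            results.append((total, " ".join(words)))
            continue
        for nxt, word, cost in edges.get(vertex, []):
            heapq.heappush(frontier, (total + cost, nxt, words + [word]))
    return results

edges = {1: [(2, "I", 8.8), (2, "hi", 28.0)],
         2: [(3, "have", 24.0), (3, "am", 18.3)]}
for cost, string in n_best(edges, 1, 3, n=3):
    print(cost, string)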

In the subsequent discussion of architectures for robust linguistic processing and, in particular, of robust parsing processes, we will take the paradigm in which a lattice is unpacked into a ranked sequence of strings as a standard example of an analysis architecture, because this most clearly indicates the scheduling problems, under time pressure, between grammatical and robust operations on uncertain data. Such systems are already relatively complex, and we see no reason to believe that analysis methods that are more exhaustive or employ more compact data structures would face any fewer conflicts in principle, but they may make them harder to see.

2.2 Architectures for Robust Linguistic Processing.

The following exposition is based on a more in-depth survey in [31], but this is not widely available yet. A similar summary appears in [32]. These surveys focus on architecture decisions relevant to the robust processing of spoken language. The conclusions that were previously arrived at in the references above were applied in the Verbmobil system. The strategy for robust analysis adopted there is one of the modules to be accommodated in the Siridus demonstrator. Most of the conclusions arrived at in the Verbmobil discussion remain valid. However, it will also be useful to consider whether this solution was to some degree dependent on specific features of the Verbmobil system and whether other alternatives may become feasible in a different context.

We have already noted that, in practice, robust linguistic processing is mainly concerned with robust analysis, since most of the exceptional situations that require a robust response are inputs from the users. Those that do not originate with the user inputs are typically not of a linguistic nature. We have similarly noted in basic terms the architecture of linguistic analysis, or, at least, some of the typical interface behaviours of parsers, as the main analysis component. This gives us the three basic options for modifying the analysis architecture to accommodate additional robustness operations. Each of these will be explored in the subsequent sections of this chapter, where we consider what is directly involved and how these options have been adopted in previous approaches. However, it should be noted that not all of the strategies we describe have been tested in realistic running systems.

2.2.1 Putting Robustness before Parsing

Robustness strategies that are applied before parsing effectively modify the parser's input. At this level we come the closest to treating ill-formed input as a signal perturbed by noise. In the simplest cases the input to the parser is a string. It may be appropriate to manipulate such a string input if specific phenomena can be detected that would prevent a well-formed analysis. Both the detection of patterns of ill-formedness and the necessary manipulation can be carried out at a relatively low level, because the input structures themselves are quite simple. This is equally the key to the limitations on modifying the parser's input: which phenomena can be accurately detected at the level of a string, and how does this scale up to a sequence of strings or a lattice?

The most successful preprocessors for robust linguistic analysis detect structural phenomena, such as symmetric self corrections, but do not rely only on the string evidence. Further cues such as prosodic irregularities can be used to locate the point where a correction occurs. This evidence can be combined with the occurrence of words or simple phrases that typically indicate disfluency in the speech production process. The level of structural information that is available is typically quite limited. For example, it may be useful to carry out part of speech tagging to assist in the recognition of structural patterns, but any deeper analysis would require more processing and, above all, duplicate part of the analysis task of the parser. Assuming that the point of correction can be detected, the appropriate manipulation then has to be derived and applied. A statistical language model may be applied here to locate phrases that may be interchangeable, as candidates for reparandum and reparans. In the most interesting case, [27], an alignment model is applied which treats the modification of self correction structures as a translation problem. The hidden danger of syntactic pattern matching is that what appear to be low level symmetries are semantically quite distinct elements, as in the example in Figure 2.3. The information to separate such cases simply is not available without a deeper analysis.

Such pattern matching for spontaneous speech phenomena at the string level is, of course, extensible to more complex parser inputs. This is most obviously the case when the most likely strings for analysis are organised into a lattice, or word hypothesis graph. Patterns which match on a string will also do so in a graph, effectively picking out all the represented strings in which the pattern is a substring. Word lattices also have the additional safety net that the modification can be entered without deleting the original sequence of word hypotheses. This offers the parser both the original and the modified strings to choose from, with the appropriate weightings in the scores associated with old and new edges.


Figure 2.2: Word hypothesis graphs with repair edges.


Figure 2.3: Word hypothesis graphs with incorrectly repaired edges.


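The monotonic character of this modification is easy to see in code. In the hypothetical sketch below (our own construction, in the spirit of the repair edges of Figure 2.2), a detected repair simply adds a new, penalised edge that bypasses the reparandum while every original word hypothesis remains available:

def add_repair_edge(edges, start, end, penalty):
    # edges: {vertex: [(next_vertex, word, cost), ...]}
    # Bridge the reparandum with an epsilon-like edge so the parser can
    # choose between the original words and the repaired reading.
    edges.setdefault(start, []).append((end, "<repair>", penalty))

edges = {1: [(2, "I", 8.8)],
         2: [(3, "could", 20.1)],   # reparandum: "I could"
         3: [(4, "I", 9.0)],
         4: [(5, "will", 15.5)]}
add_repair_edge(edges, 1, 3, penalty=30.0)  # skip the corrected words
print(edges[1])   # original hypothesis and repair edge coexist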

Such a modification is much harder to carry out where speech recognition offers an ordered sequence of “N-best” strings to the parser. Supposing that you are able to match a pattern that would trigger a modification on the current string, how do you know that the result of applying this robustness operation will produce a better input for the parser than the next string in the sequence, which may not exhibit any obvious pattern of ill-formedness? The quality of the robustness operations must be weighed against the relative confidence associated with the individual strings by the recogniser, although these two criteria may be expressed as measures of totally unrelated properties. This is a clear instance of the choice problem.

We noted above that pattern matching to detect spontaneous speech phenomena carries a certain risk because of the limited information available to a preprocessor. This is even more clearly the case with any attempt to account for recognition errors between the recogniser itself and the parser. Effectively, there is no more information available than that which the recogniser should have access to. Thus, any process that could work at this level ought to be combined as part of the recognition rather than regarded as a component of robust linguistic analysis. The exception to this that we are aware of in the literature, [21, 22], is effectively a recogniser adaptation postprocessor. It uses the same techniques as the latter stages of recognition in that a language model is applied, but one that is specifically trained for the application domain. The same effect could be achieved by retraining the recogniser, assuming this were possible, and one runtime processing step could be eliminated.

In summary, certain spontaneous speech phenomena can be identified in a parser's input structure using statistical language models and prosodic cues. The input structure can then be modified to eliminate the effects of the detected phenomena, increasing the probability of a successful analysis. This works best where the parser input is a word lattice, permitting modifications to be monotonic and reducing the cost of an incorrect modification. One of the reasons that pattern matching may misfire is the relatively limited information available at the string and word level. This is also the reason why true recognition errors cannot be treated by a preprocessor. Even for spontaneous speech phenomena there are clear limitations on the capabilities of pattern matching preprocessors.

Obviously, there must be a detectable pattern at the level of words or parts of speech, as provided by a language model. This already breaks down with symmetric corrections where the intervening edit term, in effect the actual correction marker, is syntactically complex. Even detectable corrections may be asymmetric, involving the same structures as coordination. It is also not obvious that false starts have a syntactic or lexical pattern. Where robustness operations apply before parsing, they are best combined with additional selection processes later in the analysis process and with further robustness modules which treat the phenomena beyond their remit.


2.2.2 Robust Parsing Processes

Probably the most intuitive, or at least the most obvious, architecture for robust linguistic analysis is to modify the main analysis process itself, the parser. This would essentially mean that there is no explicit accommodation of robustness in the architecture and the parsing process gives the impression of being inherently robust. In practice, the parser may be modified in two ways, but the overall and intended consequence is the same: the parser becomes more liberal and assigns analyses to inputs it would otherwise have rejected. This can be achieved by extending the linguistic description that the parser applies to include constructions that would not normally be regarded as well formed. Alternatively, the parsing process itself can be modified.

Modifications to the parsing process offer the widest range of possible approaches. The operations of a typical parser, using a linguistic description to guide the construction of analyses, basically consist of accepting new words, constructing new constituent analyses and determining whether the analysis is complete. The conditions for a complete analysis may be relaxed, so that categories other than the target are accepted. It may also be practical to accept a sequence of fragments rather than a single analysis spanning the whole input. However, this is most appropriate when there is some further processor which can make use of fragmentary analyses. In the construction of constituents, constraints may also be relaxed, so that variants of the standard matching or unification processes are acceptable. This will generate more constituents of varying quality, according to the degree to which constraints have been relaxed. At the word level the parser may, in effect, carry out insertions, deletions or substitutions on the sequence of words in the input string. More radical structural operations on the input are not so acceptable as part of the parser's functionality, since they should be guided by a description of the kind of patterns that are likely to occur; i.e. the parser's modifications of the analysis string are best geared to word-level errors, typically recognition errors.
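As an illustration of relaxed constituent construction, consider unification over flat feature sets that records clashes as penalties instead of failing. This is a minimal sketch under our own assumptions, not the mechanism of any particular parser:

def relaxed_unify(fs1, fs2, relaxable=("case", "num")):
    # Unify two flat feature dictionaries; clashes on relaxable features
    # are tolerated at a cost, clashes elsewhere fail as usual.
    result, penalty = dict(fs1), 0.0
    for feat, val in fs2.items():
        if feat in result and result[feat] != val:
            if feat in relaxable:
                penalty += 1.0      # build the constituent, mark it down
            else:
                return None, None   # hard constraint
        else:
            result[feat] = val
    return result, penalty

# A number clash, e.g. a plural verb with a singular subject, still
# yields a constituent, carrying a confidence penalty.
print(relaxed_unify({"cat": "s", "num": "sg"}, {"cat": "s", "num": "pl"}))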

Each of these options for modifying the parser's basic functionality should be associated with a weighting or confidence value. It is difficult to see a way of making such weightings systematically relative to the application domain, but the degree involved in the modification may be apparent, as in liberalising the application of constraints. Where the parsing process is itself a constraint system, e.g. [13], things look slightly different, since analyses are not constructed but disambiguated or eliminated. However, the robust extension of a constraint system is quite similar to the liberalisation of constructive parsing. The constraint system is overdefined and the constraints are ranked, so that, should there be any solutions, the quality of the result is inversely proportional to the significance of the constraints that were not applied. However these weightings are arrived at and organised, they are, at best, properties of the parser or, by extension, of the linguistic description that it applies. They are not related to the probabilistic scores associated with recognition results and, hence, can only be combined with them with circumspection, if not difficulty. Extending the parser's options for which action to carry out next increases the range of possible and, we would assume, actual processing tasks. The options may also include deferring the currently pending tasks and looking at the next best input. Once more the choice problem is clearly present.

Modifications to the actual parser operation can really only be made in a general way, by systematically modifying the conditions for a successful analysis. As such, they are better suited to countering recognition errors, where there is no natural structural pattern to be observed. In particular, the substitution of an incorrect morphological variant by a language model can be counteracted by the relaxation of unification constraints. Modification of the linguistic description that the parser applies can, of course, also provide such systematic liberalisation of analysis constraints, but to achieve this a considerable and potentially inefficient duplication may be unavoidable. This really depends on how information and constraints are organised in the linguistic description. A classical rule-based grammar would incur an additional rule for each variation. Descriptions that are oriented towards type hierarchies have a potential for abstraction, but the granularity of the type hierarchy is oriented to and generated by the structure of the linguistic description, and rightly so. The specific type necessary to provide the exact degree of constraint relaxation required for a robust analysis may simply not be either explicitly or automatically defined. More general constraints, e.g. [24], may provide more structure for such relaxations. If a description is structured in terms of linguistic generalisations, each such generalisation could be defined with a set of successively weaker constraint sets, which may then apply in every case where the generalisation is evoked. This would be more systematic, but then the granularity of constraint relaxation would have to be coded by hand and the conditions for constraint relaxation would also have to be determined. It should also be noted that such general constraints are not supported by any extensive and popular linguistic theory, though the inverse does hold. The design principles for linguistic theories require that the formalism support the minimum of expressivity, but linguistic formalisms that are more like programming languages may well be required to handle the compromises involved in robust analysis, where no general linguistic claims are to be made.

Essentially, modifications to the linguistic description are better suited to problems that behave like linguistic constructions. Here we come back to the typical performance problems, like false starts and self-corrections. Rules that detect such structures and modify their analyses can be encoded within most linguistic formalisms. Such rules may not be as sensitive to cues from the prosodic analysis as methods that focus on the string level, but they are also not locally restricted. They can account for structures that are not simply symmetric and which may extend over quite a number of words; a simple or extended symmetry at the level of constituents would be required instead. Longer edit terms are also not a problem, provided they can be analysed in their own right as intruding structures that do not attach to their surrounding constituents. Finally, the potential for including semantic constraints in such an analysis would permit the detection of false symmetries, as in the case of modifiers with identical syntactic structure but differing semantic functions or levels of granularity.

There are really two problems with extending a linguistic description to cover spontaneous speech phenomena. On the face of it the first is rather abstract, in that a linguistically motivated description is intended to be a model of linguistic competence. This is what informs the theory which determines the structure of the linguistic description and many of its details. There is no guarantee that performance phenomena will fit into this framework. Should we really worry about this? Perhaps we should. Quite apart from the theoretical niceties, we are breeding something of a mongrel in selectively extending a competence model to some performance phenomena. It is unlikely that we would be able to claim extensive performance coverage. As a result of this mixed bag of constraints we would have difficulty saying what a successful analysis meant. The relation between inputs and analyses that the parser encodes would then be precisely that extensional relation, with no further significance. This connects to the second point. When the parser no longer determines just grammaticality, how can you ensure that the best grammatical structure was actually analysed? You may push the analysis of one string through to the bitter end when the next best string provides a grammatical structure, but by compromising the notion of grammaticality you can no longer see the difference. Some of the distinction can be maintained by weighting or preference mechanisms, but the score assigned to a parse would have the same value as the confidence values attached to explicit parser operations. This brings us back to the very same selection problem of choosing between continuing with the current analysis and trying alternative input data.

Before we proceed to the final set of options for a robust analysis architecture, we should consider a couple of quite distinct developments that may extend the scope of parser modifications in an interesting way. We have just addressed the difficulty of controlling the notion of grammaticality in a linguistic description that supports different types of constraints and constructions. We would note that recent developments in LFG [6] posit an additional constraint domain that deals with preferences and variations in analysis. This is termed the optimality domain and is seen in much the same way as constituent structure, functional structure and semantic structure. As with each of these other domains, it is governed by its own constraints. In fact the only disturbing factor in this proposal is the name of the domain. We believe that we can harmlessly abstract away from any allusions to other linguistic theories, intended or otherwise, and consider this mechanism as a purely formal construct. This mechanism would not solve the conflict between constructions motivated by different degrees of grammaticality, but it provides a way of factoring out precisely those constraints that are not concerned with absolute grammaticality judgements, so that they can then be treated in a transparent and systematic manner. There is still much work to be done on the practical application of this idea.

Finally, what of the absolutely utopian solution of considering all paths in the word lattice, i.e. all strings and substrings, simultaneously? The main argument against this is the overgeneration of intermediate constituents, and ultimately of competing analyses. On the face of it this is the choice problem written very large. Is there any scope for controlling this threatened explosion? The most obvious option would be to start with a smaller lattice, either by pruning the lattice and allowing fewer hypotheses through, or perhaps by segmenting it. Segmentation alone is limited by the fact that information is lost about the hypotheses that would cross a segment boundary. Just segmenting is only really helpful if you can be fairly sure that the segments coincide with natural constituents; in that case no word hypotheses that cross a segment boundary would be at all relevant to the actual analysis. This type of segmentation can be achieved using prosodic and acoustic information that indicates significant pauses at constituent boundaries. Even with such segmentation word lattices can be fairly large. The next step down would be to take incremental slices of the word lattice while recognition is still in progress. There are obvious efficiency advantages in allowing recognition and parsing to run in parallel, but this is really only an advantage if the parser does not fall behind. Essentially, what would be required is an incremental parser with a relatively shallow initial analysis. It would also be necessary to adapt the parser information in the light of adjustments in the lattice increments: since recognition is not complete when a lattice increment is submitted for analysis, the probabilities on the leading edge may be subject to modification in the light of further context to the right. An appropriate kind of parser might be a constituent parser with packing that still leaves some of the finer decisions to later processes. To our knowledge this approach has not been tried on any serious scale. It may even provide a context for combining the insights of constructive parsing and constraint propagation.

2.3 Robust Postprocessing

Both of the approaches that we will incorporate into the Siridus demonstrator have been implemented as postprocessing strategies for robustness. In fact, some of the most successful approaches to robustness that we know of apply after parsing is completed or partially completed. These are typically strategies that can work on the kind of fragments provided by an incomplete analysis, as mentioned in the previous section. When the parser has applied all the grammatical constraints and no well-formed spanning analysis is available, there are effectively three things that you can do:

- You can examine the resulting fragments and select the best sequence.

- You can attempt to complete the analysis by applying a different set of linguistically motivated constraints.

- You can attempt to combine the fragments guided by the semantic representations required in the application.

The first option is really the null option, since each of these strategies will have to select the most viable fragments. Just selecting fragments assumes that the application can make use of fragments almost as well as full analyses. Such applications do exist, for example in translation between languages that are closely related or have a similar constituent order. In most cases, however, it is not sufficient just to select fragmentary analyses; the combination, and perhaps modification, of the analyses is required. The question which separates the last two options above is whether the robust operations are intended to complete the analysis, or whether you regard the analysis as complete as it is and pass over to some level of interpretation.


Robust postprocessing that attempts to complete the analysis is similar to robust parsing processes, only the robust component has been factored out to the end. This avoids the selection problem within parsing, allowing the parser to find a correct analysis if it can, given the input and available processing time. Postprocessing is then only necessary when the parser has clearly failed, unless you allow parser and postprocessor to run in parallel, so that robustness operations apply to fragments incrementally as they are produced. Whatever the relation between parser and postprocessor, there can be no feedback from the robustness module to the parser, otherwise you cannot guarantee the full grammaticality of a spanning parser result. This in turn implies that the postprocessor carries out some operations that emulate the standard grammatical analyses in the parser. When fragments have been repaired using operations for robustness they may still have to be combined further. It may be that this combination requires no special modification, so general grammatical rules as well as robust repair rules will have to be encoded in the postprocessor. There is a certain redundancy here, but this cost is outweighed by the simplification of the choice problem in both the parser and postprocessor, provided that the postprocessor only gets to see a selected subset of the fragmentary analyses. If you give a postprocessor full access to the parser's chart, then the number of possible tasks is the same as with robust parsing processes and most of the complexity of the selection problem returns. Under those circumstances only the full grammaticality of the parser result remains as an advantage, and the distribution of processing tasks is clearly not optimal. But we have noted above that a selection of fragments is typically a feature of postprocessing for robustness. It should also be noted that we have talked here of repairing the fragmentary analyses at a local level in order to complete the global analysis. This implies an approach that is oriented to specific phenomena, much like pattern matching rules on parser inputs or additional grammar rules for robustness. This does not imply that this type of postprocessing is better suited to treating spontaneous speech phenomena than to recognition errors, but it does require that some detectable phenomena be present, and it does risk misfiring on unreliable data that appears to show signs of an expected phenomenon.

The other major form of robust postprocessing essentially approaches the problem from the other end. It takes note of the partial analyses available from parsing and attempts to mould these to the expectations the system as a whole has for the next user input. This may also involve the combination of information from various fragments. It may even involve the modification of that information, but this is less likely; what is more likely is that information that is incompatible with the current expectations is ignored. In essence it is assumed that all the information that can be derived with some degree of confidence during analysis is now available, but that the system should have more confidence in its own expectations than in the analysis results. This is not unreasonable, since there are a number of factors perturbing the input which may generate unreliable analyses. However, it can also lead to the system eliminating useful information from the input. In practical terms this is already an interpretation step, or a preprocessing step to interpretation. Partial analyses are converted to internal semantic representations and, where these are incomplete, information is gathered from other fragments in the same local environment. This may be guided by linguistic cues, but most of the linguistic information is eliminated in the result structures. Whereas repair rules construct objects of the same type as their inputs, this robust interpretation approach is essentially a mapping from whatever form of analysis the parser supplies to the internal semantic representations of the underlying application. Dynamic interpretation is also basically a fuzzy strategy, because it eliminates information that does not match its expectations, whereas repair strategies are more focused on specific phenomena. We have now outlined most of the basic properties of the strategy we have called dynamic interpretation. The one question that we have not raised here is whether we should consider this type of strategy a part of the robust analysis process, or whether it is strictly part of some interpretation mechanism. Having isolated the analysis side of linguistic processing as the focus of robust linguistic processing, we cannot exclude analytical processes from it just because they operate at some particular level, or on semantic rather than more generally linguistic analyses. Any processing that builds on information contributed by user inputs is in this general sense analysis; otherwise there is no argument for localising the requirement for robustness there.

2.4 Robust Multi-Engine Analysis

So far we have discussed various ways in which additional robustness strategies can be added to a linguistic analysis architecture. In effect, these are the points at which a robustness module or sub-module may plug into an existing architecture. We have not, however, discussed purely architectural solutions to the provision of a more robust analysis component, or at least we have not done so explicitly. The kind of solution we have in mind here is the use of more than one type of parser, so that there is a fallback position if the preferred analysis mechanism fails to produce a result. For example, you might employ a full grammatical analysis with an HPSG, or similar, grammar to provide an accurate and restrictive grammatical analysis where one exists or can be found in the available processing time. Then, you might back this up with a chunk parser or a similarly shallow or more statistically oriented analysis. The gain in robustness, or at least coverage, comes in having somewhere to go if the "deep" restrictive analysis fails. No mechanisms that are explicitly oriented to robustness, recognition errors or performance phenomena need be applied to achieve an improvement over the original parser running alone. In this respect, the degree of robustness achieved is inherent in the architectural solution.

However, this is an oversimplification, because the relationship between different analysis strategies is not that suggested by a fallback strategy. You may be able to rank analysers in terms of their general accuracy, but their performance on a wider set of data is often complementary, because the analysis methods are distinct rather than being weaker versions of each other. In fact, a fallback strategy is better modelled by successive reparsing of the input with ever more liberal constraint sets, but this is not a realistic solution in a real-time system. As will be discussed below, Verbmobil also had a multi-engine architecture, combining an HPSG parser, a statistical parser trained on treebanks for the Verbmobil corpus, and a chunk parser combining finite-state chunk parsing with knowledge of the treebank information, as shown in Figure 2.4. But the Verbmobil strategy for combining these results was not determined by a general ordering of the accuracy of the individual parsers.


Figure 2.4: The Verbmobil analysis architecture in context. (Diagram not reproduced: the sound signal passes through speech recognition and prosody into word hypothesis graphs (WHGs), which feed the HPSG parser, the statistical parser and the chunk parser; their partial and annotated VITs are resolved by robust semantic processing and integrated processing, and passed on to transfer, repair/correction, and generation and synthesis.)

Although it did include preferences, this strategy relied on the confidence values that the parsers were able to provide, and on a statistical model of the interaction between dialogue structure and semantic content, see [25]. The overall aims of this architecture are described in [23]. This was combined with several explicit robustness mechanisms, such as a repair module and pattern matching on the input lattices. In principle, this offers more than the "belt and braces" approach implied by a fallback strategy, but it also becomes unstuck when the coordination or communication between the various locally robust modules is inadequate to the sophistication of the architecture.

2.5 The Verbmobil Experience

We have taken the Verbmobil system as a reference point several times in the discussion so far:

- As the background for our survey, classification and assessment of robust linguistic analysis techniques,

- As a prime example of robust operations at the word lattice level,

- As an example of a multi-engine analysis architecture,

- And as the source of one of the strategies we intend to adopt and adapt in the Siridus demonstrator and prototype.

This pervasive perspective can be motivated by the size of the Verbmobil project, its recency and, above all, the fact that we have direct knowledge of the system, some of which may not be documented elsewhere. Although we have already referred to experience in Verbmobil in various examples, we have not exhausted the lessons to be learned from that system, because some of the most interesting insights come from the interactions of various robustness mechanisms within such a large system.

While the size of the Verbmobil project and the complexity of the resulting system presented a coordination overhead, they also offered some unique opportunities to combine strategies and technologies. We have outlined in this chapter three places in a system architecture where robust processes can be added to linguistic analysis. The approach that we will adopt from Verbmobil is an instantiation of one of the last strategies, namely a robust postprocessing strategy. However, this was not the only explicitly or implicitly robust module in the Verbmobil analysis architecture; in fact each of the architectural options we have outlined above was in some way explicitly or implicitly instantiated. A very successful approach to preprocessing word lattices for spontaneous speech phenomena was implemented. The parsers provided selected partial analyses on an incremental basis and thus incorporated part of a robustness strategy that was completed by the robust postprocessing.

The combination of several robust modules within the multi-engine linguistic analysis component, described in the previous section, posed coordination problems of its own, as well as making the evaluation of any one robustness mechanism more difficult. Whereas the results of various parsers can, to some extent, be directly compared, there is a danger that interactions between robustness mechanisms would evade such control. Parser analyses in Verbmobil are objects of the same type and can thus be subject to direct comparison, even when simple factors such as the proportion of the input covered are major factors in their evaluation. The effects of low-level robustness operations are incorporated in those analyses, so that their traces may not be easily detectable. This may be dangerous if further robustness operations build on these results, counteracting the original contribution or moving the analysis still further away from what the recogniser actually saw. The intended control mechanism against such composite effects was the use of confidence values, so that any robust analysis is marked as being less desirable than an analysis of the same coverage that uses only grammatical construction rules. This works effectively in the final instance of robust postprocessing, but the final selection process ultimately failed to deliver the desired result, because confidence values for individual parser analyses that may have included robust constructions in their input lattices or in the parsing itself were simply missing, see [25]. Such preference mechanisms may not have been totally adequate, but there was not even the opportunity to evaluate them properly.

Evaluation of robust postprocessing was carried out at two levels:

- The subjective local assessment of whether the intended rules applied.

- A global assessment of the system's input and output, comparing translation performance with and without the use of the robust postprocessing rules.


Of course, only the latter is of any real significance, particularly given our position set out in Chapter 2.1. The results are satisfactory, see [31]: with robust postprocessing, roughly 10% of successful translations involved robust constructions in their analysis.

The design choice of factoring the robustness rules into a separate module running as a postprocessor, with the attendant overhead of completing analyses containing robust constructions, was also significantly influenced by the complexity of the analysis architecture. We have outlined the general choice problem that occurs in scheduling parser operations when a sequence of N-best strings is being parsed. Even with only grammatical rules there is always the option of suspending processing on the current string and moving to the next string in the sequence. The addition of robust parser operations complicates that scheduling problem by potentially adding another option at each choice point. With multiple parsers the scheduling problem would be multiplied out, quite apart from the problems of communication with the internal as well as the external parser representations. Instead of plunging into these complexities, the robustness rules were factored out and applied over all parser results equally. However, this does not preclude a similar ruleset being applied within the parser operations, given a simpler architecture.

Factoring out the robustness operations also reduced the risk of interference between robustness rules and the grammatical rules used generally in the parsers. This was felt to be desirable, as priority was given to a grammatical analysis. This is seen also in the fact that the confidence factors associated with robustness rules are factors by which the overall score of the resulting constructions is reduced, so that robust constructions always get a lower score than spanning parser analyses. Even this preference for grammatical analyses is to some extent motivated by the Verbmobil architecture and the nature of the application. On the one hand, analyses of grammatical constructions are easier for the transfer and generation components to deal with, because that is what they expect to see. On the other hand, Verbmobil is a dialogue system in which the system is not a dialogue participant. This means that, in general, the tracking of the dialogue may be somewhat looser and processing may rely less on predictions about the state of the dialogue. Nevertheless, the problem in robust linguistic processing of spoken language inputs remains that the speech recogniser will provide a set of word hypotheses, some of which are correct for the input signal. These can be arranged into strings, some of which are grammatical. Spontaneous speech is not always grammatical, but in the absence of other evidence grammatical strings should be preferred for analysis, because the chances are that the speaker intended to make a grammatical utterance.

As a final word, we would note that Siridus is, of course, different from Verbmobil, not least in the scale of the project. We have noted that we intend to adopt a ported version of the Verbmobil robust postprocessor within the Siridus demonstrator and build on this in the subsequent prototype. However, the Siridus application is in the framework of a real dialogue system that does participate in the dialogue. The Siridus prototype is also intended not simply as a single system but as a framework for further development and experimentation on the scalability and robustness of dialogue systems. While there are clear structural and motivational differences between the two projects, the Verbmobil experience offers a wealth of recent and detailed results that we are fortunate enough to be able to plunder, as we require, for Siridus.


Chapter 3

Robustness as Repair

The Robust Semantic Interpretation module (RoSI) developed in Verbmobil [19, 31, 32] implements a rule-based strategy that applies as a postprocessor, taking the result of a partial analysis and attempting, in effect, to complete the analysis process. The term interpretation in the module title has limited significance: the RoSI strategy applies at a level of semantic representations augmented with other linguistic information, and its results are objects of the same type, so no interpretation as such is involved; indeed, relatively little semantic interpretation is actually required in the translation application as a whole.

This approach is well suited to the Verbmobil environment, because it allows the optimum use of both the parsing resources and the processing time. Linguistic analysis in Verbmobil combines three distinct parser modules which deliver complete and partial analyses to the RoSI module. Partial analyses are provided incrementally, allowing parsing and robust processing to operate in parallel with a minimal offset. The parsers share the same output format, Verbmobil Interface Terms or VITs [26, 1]. They also share roughly the same inputs, in that best strings are selected successively from a word lattice, but some of the parsers can process more strings than others. The general architecture of linguistic analysis in Verbmobil is described in [23]. We have noted in the previous section that this strategy is most probably optimal in the context of Verbmobil's complex analysis architecture. In this section we will consider the internal architecture of the RoSI module and the type of robust processing that can best be coded in such repair rules.

anaab :: [ [type(V1,anaphor), at_beginning_of_segment(V1)],
           [type(V2,prop), no_zero(V2)] ]
     ---> [dummy_op(0.9)] & V2.

The RoSI module is modelled on a standard chart parser. It has a rule base of rewrite rules with conditions. These are similar to traditional grammar rules, except that, strictly speaking, they have no terminal categories.


2-4    1: I (64.0) []
4-13   8: I have scheduled an appointment on the seventeenth (65025.0) []
2-13   9: I + I have scheduled an appointment on the seventeenth (66918.0) [1,8]

Figure 3.1: A VIT hypothesis graph showing a successful application of a repair rule (anaab)

All of the inputs are partial analyses, whether provided as outputs from a parser or locally constructed in the RoSI chart. These rules may also be non-monotonic, in that an input or daughter may play no role in the output or mother analysis, as in the treatment of self-corrections, where the reparandum is not part of the intended content.

The example rule given above treats a specific, but common, case of hesitation, self-correction or false start. The rule anaab matches the case of an unattached anaphor preceding a full proposition at the beginning of an utterance. The rule consists of a name, a sequence of condition sets, a set of operations and a designated output structure. The rule syntax exploits Prolog variable binding, particularly in making the output assignment. A set of conditions is applied to each daughter fragment in the order in which they are encountered. The first daughter fragment should be an anaphoric pronoun, and there is a check that its span coincides with the beginning of a new segment. The second daughter should be a proposition with no arguments filled by pro-drop mechanisms, e.g. zero pronouns. It is significant that all arguments in the proposition should be bound; otherwise the free pronoun is more probably an argument and should be treated by other rules. Similarly, an incorrect pro-drop analysis can be repaired by a different rule. The effect of the anaab rule is that the preceding pronoun is excluded from the analysis, i.e. only the content of the second daughter V2 is the output of the rule. The constraints that are specified in this rule indicate that the ruleset as a whole has been maintained to limit redundant matches, but at the expense of some modularity.
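
To make the rule anatomy concrete, the following Python sketch shows how a RoSI-style rule such as anaab might be represented and applied to adjacent chart fragments. This is our own illustrative reduction, not the Verbmobil implementation: the fragment encoding, the condition predicates and the scoring scheme (confidence factor times the weakest daughter score) are all assumptions. Note that, as in Figure 3.1, the mother spans both daughters even though only the second daughter contributes content.

def type_is(t):
    return lambda frag: frag["type"] == t

def at_segment_start(frag):
    # simplification: treat the utterance start as the only segment boundary
    return frag["start"] == 0

def no_zero_args(frag):
    return not frag.get("zero_pronouns", False)

ANAAB = {
    "name": "anaab",
    "daughters": [                  # one condition set per daughter fragment
        [type_is("anaphor"), at_segment_start],
        [type_is("prop"), no_zero_args],
    ],
    "confidence": 0.9,              # dummy_op(0.9) in the rule above
    "output": 1,                    # only daughter V2 supplies the content
}

def apply_rule(rule, fragments):
    """Match adjacent fragments against the rule; return the mother or None."""
    if len(fragments) != len(rule["daughters"]):
        return None
    for frag, conditions in zip(fragments, rule["daughters"]):
        if not all(cond(frag) for cond in conditions):
            return None
    mother = dict(fragments[rule["output"]])   # content of the output daughter
    mother["start"] = fragments[0]["start"]    # but the span covers them all
    mother["end"] = fragments[-1]["end"]
    mother["score"] = rule["confidence"] * min(f["score"] for f in fragments)
    return mother

anaphor = {"type": "anaphor", "start": 0, "end": 1, "score": 1.0}
prop = {"type": "prop", "start": 1, "end": 8, "score": 1.0}
print(apply_rule(ANAAB, [anaphor, prop]))   # mother spanning 0-8, score 0.9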


1-3    1: how about (1296.0) []
3-4    2: in (81.0) []
4-5    3: well (324.0) []
1-3    6: how about (1294.0) []
3-5    7: in well (781.0) []
5-8    8: we forget it (2205.0) []
4-8   25: well + we forget it (3028.9) [3,8]
1-8   39: how about + well + we forget it (5998.1) [1,25]
1-5   47: how about + well (1941.0) [6,3]

Figure 3.2: A VIT hypothesis graph with bridging

This internal organisation supports the notion that the repair operations that RoSI carries out are essentially robust grammar rules factored out of the parser to gain clarity. However, there are some other aspects of these rules and their application that reflect more of the strategies we have associated with robust parsing processes. The RoSI rules are each assigned a confidence factor, so that each construction bears a confidence value that is derived from the scores of its immediate constituents and the way that they were combined. The major factor in calculating the confidence value associated with a robust construction is usually the set of operations carried out when applying the RoSI rule. Each operation has an associated confidence factor, so that it is primarily not the rules but the operations which determine confidence. The anaab rule is an exception, because no explicit operation is carried out: a dummy operation is specified and the confidence factor is given explicitly. In addition, as a minor factor, constraints may be selectively relaxed in applying rule conditions, and this is also taken into account, e.g. for case or agreement mismatches. Each constraint relaxation contributes a further factor reducing the overall score of the output relative to the combined score of the inputs. In this way breaches of agreement conditions that may be the consequence of recognition selecting the wrong morphological form can be neutralised. Selectional restrictions based on sortal information can also be relaxed.
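
The confidence arithmetic just described can be summarised in a few lines. The sketch below, with invented factor values, multiplies the daughters' confidence values by one factor per operation performed and one per constraint relaxed; it is an illustration of the scheme, not code from the module.

OP_FACTORS = {"dummy_op": 0.9, "insert_relation": 0.7}   # hypothetical values
RELAX_FACTORS = {"agreement": 0.85, "sortal": 0.8}        # hypothetical values

def construction_confidence(daughter_confs, ops, relaxed):
    """Combine daughter confidences with operation and relaxation factors."""
    conf = 1.0
    for d in daughter_confs:
        conf *= d
    for op in ops:                     # major factor: the operations applied
        conf *= OP_FACTORS[op]
    for constraint in relaxed:         # minor factor: relaxed constraints
        conf *= RELAX_FACTORS[constraint]
    return conf

# A grammar-emulating rule (identity factor, no operations) applied with
# one agreement relaxation:
print(construction_confidence([0.9, 1.0], ops=[], relaxed=["agreement"]))
# -> 0.765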

It is not only in the analysis domain that liberalisation of the parser operations can be emulated. The RoSI module can also effect certain specific operations at the string level. The most interesting of these we term bridging. Here the string operation itself is quite trivial, in that some information is excluded; in other words, robust processing is able to skip part of the input string in order to arrive at an analysis. In Figure 3.2 we show an example in which the spanning analysis leaves out the word in by the bridging mechanism. The interesting part of this operation is the conditions under which it is carried out. This is a last-resort technique and is only applied after it is clear that no more fragments from the parsers will be delivered or accepted. In principle any fragment could be bridged, but in practice only single-word constituents are bridged. This is only licensed if there appears to be a good chance that a specific analysis may be completed by skipping the intervening word. The score assigned to the constituent that bridges takes account of the fact that some material was excluded, since coverage is a factor in the overall score. A longer bridge would therefore be unlikely to be included in a final result, because the coverage factor would drop fairly quickly, as the sketch below illustrates.

Bridging is essentially designed to skip spurious words generated by recognition errors, but there are cases of self-correction that have the same structure. Bridging is the main systematic modification to the input string that the RoSI module is allowed to make. Really the only other modification at the string level is allowing certain common confusions between word forms, but this is a quick fix and very specific to a particular language in a particular application.

The RoSI rule base includes two types of rules that are applied according to these liberalised operations. The more interesting class are those that are designed to repair the effects of specific phenomena, notably spontaneous speech phenomena like false starts or self-corrections. The other rules, which in practice are more numerous, carry out a simplified form of a standard grammatical analysis, so that when a constituent has been locally repaired it can still combine further with other adjacent constituents to get as close as possible to a spanning analysis containing the repaired constituent. The underlying confidence factor of rules that emulate a grammatical analysis is, of course, identity, i.e. 1.0, but this may be reduced by the effects of liberalised constraint satisfaction or bridging.

The robust semantic processing is completed when no more rules can be applied, or when a given time, a small multiple of the utterance time, has elapsed since the last parser contribution was accepted, i.e. RoSI times out a bit later than the parsers. The last phase of processing involves selecting the best repaired analysis. Of course, if there is a spanning parser analysis it is passed on immediately anyway, so it will not occur in the final selection. Repaired analyses are compared on the basis of their scores and their contribution to the overall coverage of the input. This is solved by a standard graph traversal, in much the same way as the input fragments are selected from the parsers' internal charts.
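
The final selection is, as stated, a standard graph traversal. The following sketch treats analyses as weighted edges over string positions and finds the best-scoring sequence by dynamic programming; the scores and the skip penalty for uncovered words are our own devising, standing in for whatever score/coverage trade-off the module actually uses.

def best_sequence(fragments, n):
    """fragments: list of (start, end, score); n: utterance length.
    Dynamic programming over positions 0..n; single positions may be
    left uncovered at a penalty, so some path always exists."""
    SKIP = -1.0                          # cost of leaving a word uncovered
    best = [float("-inf")] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0
    for pos in range(1, n + 1):
        # option 1: leave position pos-1 uncovered
        if best[pos - 1] + SKIP > best[pos]:
            best[pos] = best[pos - 1] + SKIP
            back[pos] = (pos - 1, None)
        # option 2: end a fragment at pos
        for i, (s, e, score) in enumerate(fragments):
            if e == pos and best[s] + score > best[pos]:
                best[pos] = best[s] + score
                back[pos] = (s, i)
    chosen, pos = [], n                  # read off the chosen fragments
    while pos > 0:
        prev, frag = back[pos]
        if frag is not None:
            chosen.append(fragments[frag])
        pos = prev
    return list(reversed(chosen)), best[n]

frags = [(0, 3, 2.0), (3, 8, 4.5), (0, 8, 5.0)]
print(best_sequence(frags, 8))    # prefers the two-fragment path (6.5)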


Chapter 4

Robust Interpretation

A key to providing graceful degradation in the face of noisy input is to be able to exploit redundancy. First consider an exchange such as the following, which assumes perfect recognition:

System: Where are you leaving from?
User: Cambridge
System: Where do you want to go?
User: I would like to go to London

In a slot-filling task we have to work out which slots are to be filled, and with which values. Consider interpretation of the final utterance, "I would like to go to London". The desired output is the slot name, 'destination', and the value, 'London'. There are several clues that the slot name is destination:

1. The question concerns the destination

2. London is an argument of the preposition 'to', which signals a destination

3. London is a city, so is suitable as a destination or origin slot, and the origin slot is already filled

We thus have a degree of redundancy. If the input were noisy we could make do with one of these clues, and still come up with a reliable guess that the slot name is 'destination'. The key to the approach we outline in this chapter is to provide a system which comes up with the best hypothesis given the user utterance and the dialogue context. We may or may not be able to provide a full syntactic or semantic analysis of the user utterance, but we should aim to use all the information available.


The kind of approach we are outlining here contrasts with the approach taken in many current commercial dialogue systems, where a grammar is compiled into the recogniser. In these systems robustness at the interpretation stage is not an issue: the recogniser will always produce an output which fits the grammar. However, statistically based n-gram recognisers have had the most success for broad-coverage recognition. Our hypothesis is that as dialogues become more flexible, and user input becomes less predictable, a grammar-based approach will do worse than a statistical approach, so we should investigate the possibility of putting together statistical recognition with robust interpretation. In later deliverables we will try to provide some concrete comparative evaluation, and we will also be looking at ways in which recognition language models can incorporate aspects of both statistical and grammar-based approaches.

The rest of this chapter is based closely on [16].

4.1 Distributed Representation

By using and exploiting distributed representation schemes, we can provide algorithms that are less liable to failure in the face of missing information. The classic (though largely outdated) contrast is between NL algorithms which rely on finding a fully connected syntax tree for each sentence, and Information Retrieval style approaches that represent meaning as a bag of content words.

Representing the content of dialogue utterances as bags of words (even including function words) loses too much information, e.g. "from Boston to London Heathrow" and "to Boston from London Heathrow" get the same representation:

to, from, Boston, London, Heathrow

The next level is a bag of constraints, encoding the information in the original surface string by indexing each word and providing linear order constraints, e.g.

i1:from, i2:Boston, i3:to, i4:London, i5:Heathrow,
i1 < i2, i2 < i3, i3 < i4, i4 < i5

The representations for "from Boston to London Heathrow" and "to Boston from London Heathrow" now differ; however, the representation is not particularly convenient for picking out information required by a dialogue system. To find the destination we want the noun phrase argument, London Heathrow. This suggests we need a representation encoding structural information, not just linear order.


Figure 4.1: Indexed Tree Structures (diagram not reproduced: a pair of syntax trees for "from Boston to London Heathrow", with each node uniquely indexed)

Structural information can be included by adding extra constraints. For the case above, we can create a node, i7, corresponding to the phrase "London Heathrow" and state that this dominates both i4 and i5. To see how this can be done systematically, consider the pair of syntax trees for "from Boston to London Heathrow" in Figure 4.1, in which each node is uniquely indexed.

The following constraints encode the bracketing:

i1:from, i2:Boston, i3:to, i4:London, i5:Heathrow,
i6:(i1,i2), i8:(i3,i7), i7:(i4,i5)

To encode the full tree, including the syntactic labels, we would also need to assert the category for each index, e.g. cat(i6, PP). The set of constraints can be considered as an alternative representation of the tree, or as a description of the tree, c.f. D-Theory [12].

As an alternative to working with a syntax tree we can use a semantic representation. In this case, adding indices to the predicate argument structures gives:

from_i1(Boston_i2)_i6,
to_i3(London Heathrow_i7)_i8

The equivalent constraints would be:

i1:from, i2:Boston, i3:to, i7:London Heathrow,
i6:i1(i2), i8:i3(i7)

Page 40: A Robust Linguistic Processing Architecture2000:RLP.pdfpossible, creating larger constituents which span e.g. a repetition or a correction. The second approach is concernedwith takinga

SIRIDUS project Ref. IST-1999-10516, February 26, 2001 Page 40/58

This provides the lexemes from, to, Boston and London Heathrow, along with constraints specifying their relationship. Note that we have only changed how we represent the semantics: there is a one-to-one mapping between the set of constraints and the original recursive representation (assuming index renumbering). The representation is thus closer in spirit to the use of labelling in underspecified semantics, e.g. [20], [5], than to event-based approaches in the tradition of Davidson, e.g. [8].

In spoken dialogue systems we may be starting with a word lattice and have an ambiguous grammar. It is then convenient to use the indices to pack ambiguity. The same index can be used more than once to give alternative readings (i.e. meta-level disjunction). For example, i4:P, i4:Q is taken to mean that i4 has the two readings, P or Q (see footnote 1). If the recogniser hypothesised either "Boston" or "Bolton" in the case above we get:

i1:from, i2:Boston, i2:Bolton, i3:to,
i7:London Heathrow,
i6:i1(i2), i8:i3(i7)

This representation can be obtained using indices corresponding to edges in a chart or lattice (this exploits the context-free assumption that any two readings of the same span of the utterance which have the same category can be interchanged). The result for our example is as follows, numbering spans according to word counts:

0-1-p:from, 1-2-np:Boston, 1-2-np:Bolton, 2-3-p:to,
3-5-np:London Heathrow,
0-2-pp:0-1-p(1-2-np), 2-5-pp:2-3-p(3-5-np)

We have ended up with a 'semantic chart' (see footnote 2). This should not be surprising: although a chart is not always thought of as a distributed representation, its distributed nature is what allows packing to occur (representations are split up so that bits in common can be shared).

Footnote 1: Note that in Minimal Recursion Semantics [4], there is a similar ability to use the same index more than once, but the interpretation is rather different. In MRS, i4:P, i4:Q is equivalent to conjoining P and Q, similar to hanging two predicates off the same event variable. MRS also differs in not splitting up the representation quite as much, e.g. instead of {i13:i3(i2), i3:to} MRS would have a single constraint equivalent to i13:to(i2). The VIT formalism described in Chapter 3 also assumes a conjunctive interpretation for shared indices, based on conjunction of Discourse Representation Structures.

Footnote 2: Semantic charts are similar to the packed semantic structures used in the Core Language Engine [17]. The main difference is that in the CLE the semantic analysis records more closely follow phrase structure syntax, and semantic representations are not reduced (i.e. we have an application structure saying what applies to what, rather than what is the argument of which predicate). Consider the following record:

0-6-s: apply(1-6-vp,0-1-np)

This states that the semantics for the verb phrase (from positions 1 to 6) is to be applied to the semantics for the noun phrase (from positions 0 to 1).


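As a concrete rendering of the packed constraint store, the sketch below encodes the Boston/Bolton example as a dictionary keyed by (start, end, category) edges. The dictionary encoding and the enumeration function are our own illustration, not a description of any particular implementation; packing falls out because the same key may carry alternative values.

from collections import defaultdict

chart = defaultdict(list)
chart[(0, 1, "p")].append("from")
chart[(1, 2, "np")].append("Boston")
chart[(1, 2, "np")].append("Bolton")       # packed alternative reading
chart[(2, 3, "p")].append("to")
chart[(3, 5, "np")].append("London Heathrow")
# application structure: a functor edge applied to an argument edge
chart[(0, 2, "pp")].append(((0, 1, "p"), (1, 2, "np")))
chart[(2, 5, "pp")].append(((2, 3, "p"), (3, 5, "np")))

def readings(key):
    """Enumerate the readings packed under one edge."""
    for val in chart[key]:
        if isinstance(val, tuple):          # an application f(a)
            for f in readings(val[0]):
                for a in readings(val[1]):
                    yield f"{f}({a})"
        else:                               # a plain lexeme
            yield val

print(list(readings((0, 2, "pp"))))   # ['from(Boston)', 'from(Bolton)']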

To generate semantic charts we use an incremental parser based on Categorial Grammar, which builds the chart essentially left to right and bottom up. The Categorial Grammar is compiled into a simple Dependency Grammar to allow packing, c.f. [15]. An alternative would be to generate the semantic chart from semantic analysis records, as used in [17], using a more standard bottom-up or left-corner parser.

The semantic chart is regarded as a semantic representation in its own right. It may be underspecified, in the sense that it corresponds to more than one reading. It may be partial, if there is no edge spanning the whole utterance. Mapping to the task-specific language is performed directly on the chart. There is no attempt to choose a particular analysis, or a particular set of fragments, before task-specific information is brought to bear.

4.2 Mapping from a Semantic Chart to Slot Values

Consider the following utterance and its associated semantic chart:

I leave Boston at 3

0-1-np:I, 1-2-v:leave, 2-3-np:boston,
3-4-p:at, 4-5-np:3,
0-3-s:1-2-v(0-1-np,2-3-np),
0-5-s:3-4-p(0-3-s,4-5-np)

The 'departure time' can be extracted using the following rule, which requires the lexemes 'at' and 'leave' and treats the second argument of 'at' as the departure-time.

J:at, L:T, M:leave, I:J(K,L)
=> departure-time = T

The following components match the left-hand side of the rule, giving the result 'departure-time = 3' (see footnote 3).

Footnote 3: The system treats 'departure-time=3' as an assertion move. Currently we only have this move plus replace moves.


3-4-p:at, 4-5-np:3, 1-2-v:leave,
0-5-s:3-4-p(0-3-s,4-5-np)

Checking for an occurrence of the word 'leave' in the rest of the utterance ensures that the user is likely to be talking about a departure time rather than an arrival time. Note that there is no structural constraint on 'leave', so the departure-time rule will apply equally well to the following utterance, where 'leave' is an intransitive rather than a transitive verb:

I leave at 3

The uses of 'leave' in the two sentences above are not regarded as involving two different senses, hence both satisfy the constraint 'M:leave'. If the different subcategorisation possibilities had corresponded to different senses, separate lexemes would have been used, e.g. leave1 and leave2.
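
To make the matching step concrete, here is a sketch of the departure-time rule applied to the flat constraint set for "I leave Boston at 3". The data encoding and the matcher are our own simplification: the pattern variables J, K, L are resolved by brute enumeration over application structures rather than by any real unification, and M:leave is checked as a simple occurrence anywhere in the utterance.

# Constraints for "I leave Boston at 3": the lexemes plus the one
# application structure that matters here, 0-5-s:3-4-p(0-3-s,4-5-np),
# rendered abstractly as i6: i4(i7, i5).
lexemes = {"i1": "I", "i2": "leave", "i3": "boston", "i4": "at", "i5": "3"}
applications = [("i6", "i4", ("i7", "i5"))]   # (I, J, (K, L))

def match_departure_time():
    """Look for J:at and I:J(K,L), with 'leave' anywhere in the utterance."""
    for (I, J, (K, L)) in applications:
        if lexemes.get(J) != "at":
            continue                           # J must be the lexeme 'at'
        if "leave" not in lexemes.values():    # M:leave, no structural constraint
            continue
        return ("departure-time", lexemes[L])  # L:T, second argument of 'at'
    return None

print(match_departure_time())   # -> ('departure-time', '3')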

The actual mapping rule is more complex, with constraints split according to whether they concern the term which is being mapped, the rest of the current utterance, sortal constraints, the prior utterance, or the current dialogue context (e.g. that a particular slot has a particular value):

Term mapped:        L:T
Utt context:        I:J(K,L), J:at, M:leave
Sortal constraints: time(T)
Prior utt:          -
Dialogue context:   -
=> departure-time = T

Weights are attached to outputs according to how specific the rules are. This is determined by the number of constraints, with utterance constraints counting for more than contextual constraints. The motivation is that mappings which require more specific contexts are likely to be the better ones, and what a person said counts for more than the prior context. For transcribed dialogues a weighting of x2 for utterance constraints works well; this may need to be re-estimated for speech recogniser output. To see how the weighting scheme works in practice, consider the exchange:

S: When do you want to arrive?
U: I'd like to leave now let's see, yes, at 3pm

The system includes the following rule for arrival-time:


Term mapped:        L:T
Utt context:        -
Sortal constraints: time(T)
Prior utt:          question(arrival-time)
Dialogue context:   -
=> arrival-time = T

The arrival-time and departure-time rules both fire. There is a subsequent filtering stage which ensures that overlapping material (in this case, "3pm") cannot be used twice. The departure-time output is chosen, since more utterance constraints are satisfied, giving a higher weighting.
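
The weighting computation itself is trivial. The sketch below assumes, as the text suggests for transcribed dialogue, that utterance constraints count twice as much as contextual constraints, and reproduces the choice made in the example above; the constraint counts are read off the two rules.

UTT_WEIGHT, CTX_WEIGHT = 2, 1   # x2 for utterance constraints, as above

def rule_weight(n_utt_constraints, n_ctx_constraints):
    return UTT_WEIGHT * n_utt_constraints + CTX_WEIGHT * n_ctx_constraints

# "I'd like to leave now let's see, yes, at 3pm" after
# "When do you want to arrive?":
departure = rule_weight(3, 0)   # J:at, M:leave, I:J(K,L) in the utterance
arrival = rule_weight(0, 1)     # question(arrival-time) in the prior utterance
print(departure > arrival)      # True: the departure-time mapping wins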

When deciding between rules, the aim is to provide the most likely mapping given the evidence available. The arrival-time rule should not be read as stating that an arrival-time question expects an arrival time for an answer. The rule merely states that, if there is no other evidence, a time phrase should be interpreted as an arrival time in the context of a question about the arrival time. The rules above may still be valid even if it happens that the most common response to an arrival-time question is a statement about the departure time (assuming the departure time is always flagged with extra linguistic information).

The rules given so far look like the kind of rules you might see in a shallow NLP system, e.g. an Information Extraction system based on pattern matching over a chunked list of words. However, we can include as much structural information as we want. Thus we can include a more specific arrival-time rule which would override the departure-time rule in a scenario such as:

S: When do you want to arrive?
U: I want to arrive at 3pm leaving from Cambridge

Here we need to use the structural relationship between 'arrive' and '3pm' to override the appearance of 'leave' in the same sentence.

The ability to use higher-level linguistic structure is a key difference from shallow processing approaches, where the level of analysis is limited even if the parser could have reliably extracted higher-level structural information for the particular sentence.

We are currently investigating various ways in which to simplify or automate the construction of mapping rules. The simplest method would be to allow optional constraints, and this could be further extended to an inheritance hierarchy to minimise redundancy.

Another area being investigated is the way outputs are put together. So far, outputs have been at the top level, i.e. slot-values which we would expect to be asserted into the context.


However, even when doing simple slot-filling, it is natural to use some recursive constructions. For example, some corrections are naturally modelled via a replace operator, e.g.

But I want to go to London Heathrow, not to London Stansted
=> replace(destination=London Stansted, destination=London Heathrow)

The current mapping rule is specific to this construction, i.e. an occurrence of a negation between two prepositional phrases. This could be generalised by separately mapping the destination slot-values and the negation to get an indexed representation in the slot-filling language, i.e.

I: replace(J,K),
J: destination=London Stansted,
K: destination=London Heathrow

[4] and [28] use a similar approach for machine translation, mapping from an indexed representation of the source to an indexed representation of the target. The difference here is that we would not necessarily translate all the input material, just what we are interested in. Everything else is assumed to have a null or identity translation. As a side note, this gives a perspective on why leaving out negation is a problem in Information Retrieval: it is a particularly bad assumption that negation is an identity mapping.

Both potential changes preserve a key feature of the approach: structural information (encompassing constituency and scope) is used where it is available from the parse but, if it is not available, the system backs off to more relaxed versions of the mapping rules, which in the extreme are equivalent to just keyword spotting. These backed-off rules are defeasible rules: they assume that the relationships that are missing will not affect the mapping.

4.3 Distinctive features of the approach

4.3.1 Task specific interpretation

Consider the following sentence in the Air Traffic Domain:

Show flights to Boston


This has two readings: one where "flights to Boston" is a constituent, the other where "to Boston" is an adverbial modifier (similar to "to Fred" in "Show flights to Fred"). In full semantics approaches, the system is specialised to the domain to achieve the correct reading, either via specialisation of the grammar, c.f. OVIS [29], or via domain-specific preference mechanisms, c.f. [3].

In contrast, in this approach all domain-dependent information necessary for the task is incorporated into the mapping rules. For the example above, a rule would pick up "flights to <city>", but there would be no rule looking for "show to <city>", so the second reading is simply ignored. Note that ambiguities irrelevant to the task are left unresolved. Thus incorporating the necessary information into task-specific mapping rules is a smaller job than tuning to a domain and trying to resolve all domain-specific ambiguities.

4.3.2 Choosing the best fragments, not just the largest

Many grammar-based systems, e.g. SmartSpeak [2] and Verbmobil [7], [10] (also see Chapter 3), try to find a full sentence analysis first, and back off to considering fragments on failure. The common strategies on failure are to find the largest possible single fragment, e.g. SmartSpeak, or the set of largest possible fragments, e.g. Verbmobil, defined as the smallest set of fragments which span the utterance, i.e. the shortest path.

However, by always selecting the full analysis you can end up with an implausible sentence, as opposed to plausible fragments. This occurs commonly when the recogniser suggests an extra bogus word at the end of an utterance: a comprehensive grammar may still find an (implausible) full analysis. Similarly, the largest possible fragments are not necessarily the most plausible ones. This has been recognised in the OVIS system, which is experimenting with a limited amount of contextual information to weight fragments.

In the robust interpretation approach, task-specific mapping rules apply to the whole chart (including edges corresponding to large and small fragments). Preference is given to more specific mapping rules, but this may not always correspond to choosing a larger fragment. A larger fragment will only be preferred if it satisfies more constraints relevant to the task (including contextual constraints). The addition of bogus words is not rewarded, and is more likely to cause a constraint to fail. By retaining a lattice or chart throughout, nothing is thrown away until there is a chance to bring task-specific information to bear.

The approach is tailored to task-oriented dialogue: we are only looking for relevant information, and there is no need to come up with a single path through the lattice, or even to make a hypothesis about exactly what was said (except for the relevant words). Verbmobil has the more difficult task of translating dialogue utterances. However, some of the same issues are relevant. For example, fragment choice could be determined according to which fragments are most relevant to the task, rather than according to their length.

4.3.3 Exploitation of underspecification

The most obvious gain by working directly with an underspecified representation (in this case a chart or lattice) should be an efficiency one. This is particularly true when working with a lattice, where many of the words hypothesised will be irrelevant for the task, and we only home in on the bits of the lattice which are mentioned in task specific rules. The current implementation applies the mapping rules in every possible way to provide a set of potential slot-value pairs. It then filters the set to obtain a consistent set of pairs. The first stage is to filter out any cases where the translated material overlaps (we cannot use “3pm” in both a departure time and an arrival time). In these cases the more specific mapping is retained. Next the algorithm uses task specific constraints, e.g. that there can be only one departure time, to prune the outputs.
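
The two filtering stages can be sketched as follows; the candidate format, the weights and the constraint set are illustrative assumptions rather than the actual implementation:

    # Each candidate is (slot, value, span, weight): span is the chart
    # extent of the matched words, weight reflects rule specificity.

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    def filter_candidates(cands, unique_slots):
        # Stage 1: where the translated material overlaps, keep the
        # candidate produced by the more specific (higher-weight) rule.
        kept = []
        for c in sorted(cands, key=lambda c: -c[3]):
            if not any(overlaps(c[2], k[2]) for k in kept):
                kept.append(c)
        # Stage 2: task specific constraints, e.g. at most one value
        # for certain slots, prune the outputs greedily by weight.
        result, seen = [], set()
        for c in sorted(kept, key=lambda c: -c[3]):
            if c[0] in unique_slots and c[0] in seen:
                continue
            seen.add(c[0])
            result.append(c)
        return result

    cands = [
        ("departure-time", "3pm", (4, 5), 2.0),  # specific rule fired
        ("arrival-time", "3pm", (4, 5), 1.0),    # laxer rule, same words
    ]
    print(filter_candidates(cands, {"departure-time", "arrival-time"}))
    # -> the departure-time reading survives the overlap filter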

The current algorithm is ‘greedy’, taking the best local choices, not necessarily providing the best global solution. We have not yet come across a real example where this would have made a difference, but consider the following possible exchange:

S: When would you like to arrive?
U: I would like to leave at 3 to get in at 4

The system currently has no rules for the construction ‘get in’, so the most specific rule to apply to ‘4’ is the departure time rule (‘4’ is a time in a sentence containing the word ‘leave’). However, ‘3’ is also mapped to a departure time and receives a higher weight, since it is in a construction with ‘leave’. Thus at the next filtering stage, the mapping to ‘departure-time = 4’ is discarded and we are left with the single output ‘departure-time = 3’. In contrast, an algorithm which looked for the best global solution might have provided the required result, ‘departure-time = 3, arrival-time = 4’.
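
For contrast, a global alternative can be sketched as an exhaustive search over consistent subsets of candidates, keeping the subset with the greatest total weight. We assume here, purely for illustration, that some laxer hypothetical rule has also proposed ‘4’ as an arrival time; on these candidates the greedy filter sketched earlier returns only ‘departure-time = 3’, while the global search recovers both slots:

    # Exhaustive search over consistent candidate subsets (sketch).
    from itertools import combinations

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    def consistent(subset, unique_slots):
        slots = [c[0] for c in subset]
        spans = [c[2] for c in subset]
        if any(slots.count(s) > 1 for s in set(slots) & unique_slots):
            return False
        return not any(overlaps(spans[i], spans[j])
                       for i in range(len(spans))
                       for j in range(i + 1, len(spans)))

    def best_global(cands, unique_slots):
        best, best_weight = [], float("-inf")
        for r in range(1, len(cands) + 1):
            for subset in combinations(cands, r):
                if consistent(subset, unique_slots):
                    weight = sum(c[3] for c in subset)
                    if weight > best_weight:
                        best, best_weight = list(subset), weight
        return best

    cands = [
        ("departure-time", "3", (5, 6), 2.0),   # '3' in construction with 'leave'
        ("departure-time", "4", (9, 10), 1.0),  # '4' via a laxer departure rule
        ("arrival-time", "4", (9, 10), 0.5),    # hypothetical lax arrival reading
    ]
    print(best_global(cands, {"departure-time", "arrival-time"}))
    # -> departure-time = 3 together with arrival-time = 4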

4.3.4 Context dependent interpretation

A key feature of the approach is that the interpretation of one item can be dependent upon the interpretation of another part of the utterance. This includes cases where the two items would occur in separate fragments in a grammar based analysis. Consider the following examples:

I’d like to leave York, now let’s see, yes, at 3pm

at 3pm → departure time(3pm)

I’d like to arrive at York, now let’s see, yes, at 3pm

at 3pm → arrival time(3pm)

The translation of the phrase “at 3pm” is dependent here not on any outside context but on the rest of the utterance. This is naturally incorporated in the rule given earlier which just looks for ‘leave’ anywhere in the utterance.
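
A minimal sketch of such a rule follows; the trigger words and the time pattern are illustrative assumptions:

    # The trigger ('leave' or 'arrive') may occur anywhere in the
    # utterance, not adjacent to the time phrase being translated.
    import re

    TIME = re.compile(r"\b(\d{1,2}\s?[ap]m)\b")

    def interpret_time(utterance):
        match = TIME.search(utterance)
        if not match:
            return None
        if "leave" in utterance:
            return "departure time", match.group(1)
        if "arrive" in utterance:
            return "arrival time", match.group(1)
        return None

    print(interpret_time("i'd like to leave york, now let's see, yes, at 3pm"))
    # -> ('departure time', '3pm')
    print(interpret_time("i'd like to arrive at york, now let's see, yes, at 3pm"))
    # -> ('arrival time', '3pm')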

4.3.5 Relationship to other Approaches

The use of a distributed representation (flat structured semantics) to enable mapping rules to take pieces of information from different parts of an utterance is a key idea in the work of [4]. The use of underspecified representations to provide a semantic representation for fragments has been discussed by Pinkal in the context of Radical Underspecification [18]. Viewing interpretation as the process of finding the most likely meaning in the given context is a view shared with statistical models of dialogue interpretation such as [14].

4.4 Reconfigurability

Reconfiguring to a new task requires the introduction of new mapping rules, and the addition of lexical entries for at least the words mentioned in the mapping rules. Parsing and morphology modules are shared across different tasks. The robust nature of the approach means that we can provide a working system without providing a full parse for all, or even a majority of, the utterances in the domain. There is no need to deal with new words or constructions in a new domain unless they are specifically mentioned in task specific mappings.

For route planning we were able to obtain better performance than our previous systems using just 70 mapping rules to cover the five slots ‘destination’, ‘origin’, ‘arrival-time’, ‘trip-mode’ and ‘departure-time’, and just over a hundred lexical entries (excluding city names).

4.5 Evaluation on a transcribed corpus

The approach has been evaluated using a corpus of transcribed spoken dialogues collected by the Wizard of Oz technique (i.e. a human pretending to be a computer: in this case by typing responses which were relayed down the phone using a speech synthesiser). The transcriptions include repairs and hesitations, but not recognition errors. The system was trained on one third of the corpus, and tested on two thirds. The test set included 200 user replies. Precision and recall were measured on slot-value pairings, i.e. for a pairing to be correct both the slot and the value had to be correct.
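
The measure can be stated compactly as follows (illustrative names):

    # Precision and recall over slot-value pairs: a pair counts as
    # correct only if both the slot and the value match the reference.

    def precision_recall(predicted, reference):
        pred, ref = set(predicted), set(reference)
        correct = len(pred & ref)
        precision = correct / len(pred) if pred else 0.0
        recall = correct / len(ref) if ref else 0.0
        return precision, recall

    predicted = [("destination", "cambridge"), ("departure-time", "5pm")]
    reference = [("destination", "cambridge"), ("arrival-time", "5pm")]
    print(precision_recall(predicted, reference))
    # -> (0.5, 0.5): "5pm" was assigned to the wrong slot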

The first evaluation was against an existing phrase spotting system which had performed well when evaluated against a full semantics approach [11] in a different domain. A significant improvement in recall and precision was achieved, but coverage differences meant it was unclear how valid the comparison was. We therefore performed a second evaluation, which investigated to what extent the different knowledge sources made a difference. In all cases we used sortal information, but we tried the system with and without access to the previous utterance, with and without access to utterance context outside the phrase we are interested in, and with and without phrasal information. The precision and recall results were as follows:

Prev Utt   Utt Cxt   Phr Cxt    R    P
   -          -         -      22   96
   -         yes        -      34   75
   -          -        yes     38   89
   -         yes       yes     40   87
  yes         -        yes     51   78
  yes        yes       yes     52   79

All six systems are relatively conservative, giving good precision. The first system corresponds to taking only sortal information into account. For this domain, sortal information safely determines the slot in the case of ‘trip mode’, e.g. whether the user wants the shortest journey or the quickest. A phrase spotter which does not use context corresponds to a recall of 38 percent. This will pick up ‘to Cambridge’ as the destination, but does not have enough information to decide whether ‘at 5pm’ is an arrival time or a departure time. As hoped, the use of more linguistic information from within the utterance improves recall and overall performance, though the improvements are not huge.

The poor recall figures for all six systems are also reflected in the training set. Almost all the loss is due to inadequate lexical/syntactic coverage of potential slot values (e.g. complex city names and time expressions). There are, however, some cases where even the most relaxed versions of the rules were still too restrictive, e.g.

from uhm Evesham to uh Windermere

This particular case could be dealt with easily by the use of simple reconstruction rules (deleting ‘uh’ and ‘uhm’). Other options we are considering are to expand the use of the distributed representation to include positional and syntactic constraints, or to allow non-exact matches between the ‘ideal’ scenarios represented by the left hand sides of mapping rules and the actual input (weighting then becomes more akin to a distance measure).
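
The reconstruction rule for fillers amounts to no more than the following sketch; the filler inventory shown is an assumption:

    # Delete fillers before matching, so that the relaxed pattern
    # "from <city> to <city>" applies to the example above.

    FILLERS = {"uh", "uhm"}

    def reconstruct(words):
        return [w for w in words if w not in FILLERS]

    print(reconstruct("from uhm evesham to uh windermere".split()))
    # -> ['from', 'evesham', 'to', 'windermere']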

4.6 Conclusion

In this chapter we have described an approach which exploits a distributed representation including both content and structure. Mapping rules (from semantics to slot-fillers) apply directly to the underspecified semantic interpretation. Ungrammatical input is treated by relaxing the mapping rules rather than by trying to recreate a connected parse. By being able to use structural and contextual information where it is available and relevant to the task, the approach can improve on keyword or phrase spotting approaches, while avoiding many of the pitfalls of premature commitment (e.g. to longest fragments) found in many grammar based systems.


Chapter 5

The Contrast between two Approaches to Robustness

We have described two approaches to robust linguistic processing that share some fairly abstract properties. They are both implemented as postprocessing on fragments supplied by a parser, and both work mainly at the level of semantic representations. However, there are some rather deep-seated differences, both in the methods they adopt and in the perspectives on the problem of robustness that underlie those methods. Given the rather abstract discussion of robustness at the beginning of this report, it is appropriate to place the actual methods we intend to adopt against a similarly abstract background. We are not doing this for its own sake: we intend to demonstrate that the differences between these approaches are such that even the least intelligent way of combining them ought to lead to a complementary effect.

The most general characterisation of the difference in perspective that the two strategies embody could be seen as a simple contrast between optimism and pessimism. Most of the specific differences in their execution follow from there. The repair strategy is essentially optimistic. It starts from the premise that the effects you are trying to be robust against can be counteracted and the analysis repaired. The input is then really only a normal linguistic signal perturbed by some noise. A repair module sets out to restore the ideal situation and eliminate the effects of the noise perturbation, to make the linguistic signal whole again. It proceeds by applying methods that are very much oriented to specific phenomena, seeking to identify the nature of the problem, or figuratively to subtract the noise from the signal. Each repair rule is therefore like a little demon looking for the phenomenon that fits its pattern. The repair module doesn’t really know what happened, so it applies any rule that matches, and the resulting constituents can be subject to further repair rules. This is essentially a bottom-up process, working from the data that is presented, but treating that data with a degree of circumspection, so that some of it may be rejected if it appears to belong to the redundant part of a known phenomenon. The top-down needs of the current context or the underlying application are not taken into account.


If we adopt Robust Interpretation without any repair mechanism, then this is essentially a pessimistic approach. It accepts the fact that the parser has completed its work and that all the linguistic information that can be found has been found. It does not attempt to reconstruct the underlying competent input by counteracting performance and processing effects. Instead, it attempts to match the information that is available to the expectations that the application provides. The state it is trying to reach is the best available compromise between the information that is present in the parser’s partial analysis and the expectation of what was most likely to happen next.

Both approaches seek to achieve graceful degradation. This means that the quality of the robust result degrades in proportion to the disturbance in the original input: the worse the problems you are working against, the greater the degree of compromise in the result that is accepted. However, the emphasis placed on the two words of the phrase is rather different. Repair is trying to reconstruct ideal conditions and is, therefore, most at home in the top part of the scale, where the performance and recognition effects are actually least pronounced. In fact, repair will be quite happy to perfectly restore the original or intended linguistic input whether this actually helps the interpretation process or not. Robust interpretation can cope with degradation, and its response is proportional to the degree of degradation in the input signal all the way down the line, including, if you allow it, beyond the point where the internal system expectations determine the result more than the actual input.


Chapter 6

A Baseline Architecture for Combining Robustness Strategies

While we believe that the robustness strategies that we will adopt in the Siridus prototype can provide highly complementary coverage, they represent such distinct perspectives on how to provide robust operation that the best combination will require careful analysis of their individual strengths and weaknesses. In the first instance we will not be able to present an optimal combination, so our first integration for the demonstrator will be a relatively simple combination of the existing implemented modules, or ported versions of these. We expect that even a simple combination will provide some complementary coverage, i.e. that the combination of the two approaches will perform better than either alone. To test this hypothesis we also require that the architecture permit each to act independently, so that its functionality can be neutralised without affecting any other module. We therefore set ourselves the immediate task of combining the two modules so that they can be evaluated both in combination and separately.

Both of these modules were originally designed to apply over fragmentary analyses resulting from incomplete parsing. However, the repair strategy is closer to the parser and has the same type of object as both input and output. Robust interpretation carries out a mapping, so that its output is not available for any similar analysis process. There is therefore a natural precedence between the two modules. A first approximation of the appropriate architecture would therefore create a pipeline between parsing, repair and robust interpretation. This would indeed be a viable architecture, but just stating the sequence of module applications is not an adequate description. We also need to consider the interface definitions and how the overall processing dependencies interact with incrementality at run time. Even if we can sketch the module structure, this is only a schema until the module interactions are also described.

In addition, we should also consider what modifications will be required to incorporate these modules within the Siridus architecture. Part of this will involve changes in the interface structures, but it may also involve module interaction. We have noted that the position of the repair module in the architecture and its interactions with the parser were largely determined by the complex analysis architecture in Verbmobil. We must also describe the analysis architecture in the Siridus demonstrator and consider how this affects the interfaces and interactions between the parser and the ported repair module. The deciding factors for the combination of these modules will include general resource management, since we are trying to reuse software developed for various different systems. The components will have to be adapted, but it is only worth making pragmatic modifications in order to get a first model of how they should interact and how they best complement each other.

6.1 A Simple Analysis Architecture

In this section we consider the concrete decisions concerning how the recogniser, parser, repair and interpretation modules should interact in a baseline architecture to be employed in the Siridus demonstrator. The first phases of this processing do not directly affect the robustness modules, as we are not adopting any robustness operations that apply before parsing, but we must consider them in order to account for the design of the parser and its interactions with subsequent modules. Similarly, we represent the dialogue manager module in our architecture diagram, although we will have nothing to say about it in this report. Each of the modules we incorporate in the baseline analysis architecture has been adopted, with some degree of modification, from usage in a previous project. The following are the main modules in the architecture and their main interface structures:

Recognition: Automatic statistically based speech recognition. The module takes a speech signal as input and produces a word lattice with prosodic annotations.

Parser: The parser is an incremental categorial parser that produces a result in a packed form containing mainly semantic information. We refer to this as the semantic chart.

Repair: The repair module acts on the semantic chart and extends it with additional edges representing analyses completed using repair rules and bridging.

Translation: This is the interpretation module: it extracts the most viable analyses from the semantic chart and constructs an interpretation, in the form of a slot and filler representation, that is submitted to the dialogue manager, or dialogue move engine. (A schematic sketch of the whole pipeline follows.)
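
All names in the sketch below are hypothetical stand-ins rather than the actual Siridus module interfaces, and the stub bodies merely indicate what is passed between modules. Note that the repair step can be neutralised independently, as the evaluation strategy of this chapter requires:

    def recognise(speech):
        """Speech signal -> word lattice with prosodic annotations."""
        ...

    def parse(lattice):
        """Word lattice -> packed semantic chart."""
        ...

    def repair(chart):
        """Extend the semantic chart with repaired and bridged edges."""
        ...

    def translate(chart):
        """Extract the most viable analyses from the chart and map them
        to a slot-and-filler representation for the dialogue move engine."""
        ...

    def analyse(speech, use_repair=True):
        chart = parse(recognise(speech))
        if use_repair:  # repair can be switched off for evaluation
            chart = repair(chart)
        return translate(chart)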

This description corresponds directly to the data flow diagram of the architecture presented in Deliverable 6.1 (reproduced as Figure 6.1). While we have described the interfaces in general and formal terms, we have not fully determined the module interactions, since questions of incrementality and synchronisation have to be answered, as well as the more specific questions of which interface structures are employed and how these have been modified to accommodate the initial integration of these module functionalities.

Figure 6.1: Data flow architecture taken from D6.1

The architectural framework described in Deliverable 6.1 and supported by the Trindikit 2.0 allows asynchronous interactions between modules, but the analysis modules define, for the most part, distinct “levels” of analysis. A traditional architecture would really require only a strictly sequential application of each of these modules. What we will implement in Siridus is a variant on this, in that we attempt to compress the processing time as much as possible, because speech recognition is known to be expensive in time. However, we will retain the discrete levels of representation. This can be achieved by determining the incrementality of the processing and the corresponding units that are communicated between the modules. The most innovative step would be to make the interaction between recogniser and parser incremental beyond the level of individual segments or phrases. This would entail a successive updating of the lattice structure as it is read by the parser. This can be done if the lattice is effectively the lower stratum of a chart representation. However, lattice updates may change their information, and particularly the scores on edges, in a non-monotonic way, so the parser’s chart representation has to be sensitive to this.
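
The required sensitivity can be sketched with a hypothetical data structure; scoring derived edges by summing over their supports, and the single level of derivation, are simplifications made purely for illustration:

    class Chart:
        def __init__(self):
            self.score = {}     # edge id -> current score
            self.supports = {}  # derived edge id -> supporting edge ids

        def add(self, edge, score, supports=()):
            self.score[edge] = score
            self.supports[edge] = tuple(supports)

        def rescore(self, lattice_edge, new_score):
            # A lattice update may raise or lower a score non-monotonically,
            # so every edge built over it must be recomputed.
            self.score[lattice_edge] = new_score
            for edge, deps in self.supports.items():
                if lattice_edge in deps:
                    self.score[edge] = sum(self.score[d] for d in deps)

    chart = Chart()
    chart.add("w:flights", 0.5)
    chart.add("w:boston", 0.25)
    chart.add("np:flights-to-boston", 0.75, supports=("w:flights", "w:boston"))
    chart.rescore("w:boston", 0.75)  # recogniser revises an edge score
    print(chart.score["np:flights-to-boston"])
    # -> 1.25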


Incrementality at the next level is mainly a question of visibility. If the parser constructs a complete chart and then passes this to the repair module for additions, then you have a strictly sequential module interaction. This would also include augmenting the repair rules with generalised pseudo-syntactic rules to complete repaired analyses. However, we can allow both the parser and repair to work on the same chart, with each having access to the passive edges created by the other. In this respect the architecture would differ from the original Verbmobil analysis architecture in which the repair module was developed. A close combination of parsing and repair is a compromise between extending the parser and adding a purely sequential postprocessor. There will be some selection conflicts in determining which tasks to pursue next. These can be countered, as in the RoSI module, by balancing analysis quality against coverage, where repaired constructions always have a lower score than analysed ones.
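
The balance between quality and coverage might be realised by a simple scoring discount; the penalty factor below is an assumed constant, not a value from the RoSI module:

    REPAIR_PENALTY = 0.5  # assumption: repaired edges are discounted

    def edge_score(base_score, repaired):
        return base_score * (REPAIR_PENALTY if repaired else 1.0)

    candidates = [("parsed analysis", 0.8, False),
                  ("repaired analysis", 0.9, True)]
    best = max(candidates, key=lambda c: edge_score(c[1], c[2]))
    print(best[0])
    # -> 'parsed analysis' (0.8 beats 0.9 * 0.5)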

One further complexity enters the representations via the combination of ambiguity packing and repair rules. The basic principle of packing ambiguities in the chart relies on the compositionality of the analysis. However, the repair rules can impose constraints that do not preserve this. Repair rules may access not just the span and category information concerning a constituent, but may also look inside it, e.g. for a particular head word. The result is that they may split the candidates for a constituent into those that do and those that do not match a repair pattern. Accommodating this requires changes in the representation of the semantic chart. Options include associating extra constraints with nodes, possibly combined with explicit meta-level disjunction of indices, similar to a packed parse forest.


Bibliography

[1] Johan Bos, Bianka Buschbeck-Wolf, Michael Dorna, and C. J. Rupp. Managing information at linguistic interfaces. In Proc. of the 17th COLING/36th ACL, Montreal, Canada, 1998.

[2] J. Boye, M. Wirén, M. Rayner, I. Lewin, D. Carter, and R. Becket. Language-processing strategies and mixed-initiative dialogues. In IJCAI-99 Workshop on Knowledge and Reasoning in Practical Dialogue Systems, 1999.

[3] D. Carter. The TreeBanker: a tool for supervised training of parsed corpora. In ACL Workshop on Computational Environments for Grammar Development and Linguistic Engineering, 1997. Available as SRI Cambridge Technical Report CRC-068.

[4] A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, and I. Sag. Translation using minimal recursion semantics. In Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, Leuven, Belgium, 1995.

[5] M. Egg, J. Niehren, P. Ruhrberg, and F. Xu. Constraints over lambda-structures in semantic underspecification. In COLING-ACL '98, Montreal, Canada, 1998.

[6] Annette Frank, Tracy Holloway King, Jonas Kuhn, and John Maxwell. Optimality theory style constraint ranking in large-scale LFG grammars. In Proceedings of the LFG98 Conference, The University of Queensland, Brisbane, 1998.

[7] G. Goerz, J. Spilker, V. Strom, and H. Weber. Architectural considerations for conversational systems – the Verbmobil/INTARC experience. In First International Workshop on Human Computer Conversation, Bellagio, Italy, 1999.

[8] J. Hobbs. An improper treatment of quantification in ordinary English. In 21st Annual Meeting of the ACL, Cambridge, Mass., 1983.

[9] Denis Howe. Free on-line dictionary of computing. http://www.foldoc.org, 2000.

[10] W. Kasper, B. Kiefer, H.-U. Krieger, C. J. Rupp, and K. L. Worm. Charting the depths of robust speech processing. In Proceedings of the 37th ACL, 1999.


[11] I. Lewin, R. Becket, J. Boye, D. Carter, M. Rayner, and M. Wirén. Language processing for spoken dialogue systems: is shallow parsing enough? In ESCA ETRW Workshop on Accessing Information in Spoken Audio, Cambridge, 1999. Available as SRI Cambridge Technical Report CRC-074.

[12] M. Marcus, D. Hindle, and M. Fleck. D-theory: Talking about talking about trees. In 21st Annual Meeting of the ACL, pages 129–136, Cambridge, Mass., 1983.

[13] Wolfgang Menzel. Robust processing of natural language. In Proc. of the 19th German Conference on Artificial Intelligence, pages ?–?, Bielefeld, 1995.

[14] S. Miller, D. Stallard, R. Bobrow, and R. Schwartz. A fully statistical approach to natural language interfaces. In Proceedings of the 34th Annual Meeting of the ACL, pages 55–61, University of California, 1996.

[15] D. Milward. Dynamic dependency grammar. Linguistics and Philosophy, 17:561–605, 1994.

[16] David Milward. Distributing representation for robust interpretation of dialogue utterances. In Proceedings of the 38th ACL, pages 133–141, Hong Kong, 2000.

[17] R. C. Moore and H. Alshawi. Syntactic and semantic processing. In The Core Language Engine, pages 129–146. MIT Press, 1992.

[18] M. Pinkal. Radical underspecification. In 10th Amsterdam Colloquium, volume III, pages 587–606, 1995.

[19] Manfred Pinkal, C. J. Rupp, and Karsten Worm. Robust semantic processing of spoken language. In Wahlster [30], pages 321–335.

[20] U. Reyle. Dealing with ambiguities by underspecification: Construction, representation and deduction. Journal of Semantics, 10:123–179, 1993.

[21] Eric K. Ringger and James F. Allen. Error correction via a post-processor for continuous speech recognition. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Atlanta, GA, 1996. IEEE Signal Processing Society.

[22] Eric K. Ringger and James F. Allen. Robust error correction of continuous speech recognition. In Proceedings of the ESCA-NATO Workshop on Robust Speech Recognition for Unknown Communication Channels, Pont-à-Mousson, France, April 1997.

[23] Tobias Ruland, C. J. Rupp, Jörg Spilker, Hans Weber, and Karsten L. Worm. Making the most of multiplicity: A multi-parser multi-strategy architecture for the robust processing of spoken language. In Proc. of the 1998 International Conference on Spoken Language Processing (ICSLP 98), Sydney, Australia, 1998.


[24] C. J. Rupp. Constraint propagation and semantic representation. In C. J. Rupp, M. Rosner, and R. Johnson, editors, Constraints, Language, and Computation. Academic Press, London, 1993.

[25] C. J. Rupp, Jörg Spilker, Martin Klarner, and Karsten Worm. Combining analyses from various parsers. In Wahlster [30], pages 311–320.

[26] Michael Schiehlen, Johan Bos, and Michael Dorna. Verbmobil Interface Terms (VITs). In Wahlster [30], pages 183–199.

[27] Jörg Spilker, Martin Klarner, and Günter Görz. Processing self-corrections in a speech-to-speech system. In Wahlster [30], pages 131–140.

[28] A. Trujillo. Bi-lexical rules for multi-lexeme translation in lexicalist MT. In Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, pages 48–66, Leuven, Belgium, 1995.

[29] G. van Noord, G. Bouma, R. Koeling, and M.-J. Nederhof. Robust grammatical analysis for spoken dialogue systems. Natural Language Engineering, 5(1):45–93, 1999.

[30] Wolfgang Wahlster, editor. Verbmobil: Foundations of Speech-to-Speech Translation. Springer-Verlag, Berlin, 2000.

[31] Karsten L. Worm. Robust Semantic Processing for Spoken Language. PhD thesis, Universität des Saarlandes, Saarbrücken, Germany, June 2000.

[32] Karsten L. Worm and C. J. Rupp. Towards robust understanding of speech by combination of partial analyses. In Proc. of the 13th ECAI, pages 190–194, Brighton, UK, 1998.