
Edinburgh Research Explorer

Compiling language models from a linguistically motivated unification grammar

Citation for published version:
Rayner, M, Hockey, BA, James, F, Bratt, EO, Goldwater, S & Gawron, JM 2000, Compiling language models from a linguistically motivated unification grammar. in Proceedings of the 18th conference on Computational linguistics - Volume 2. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 670-676. <http://www.aclweb.org/anthology-new/C/C00/C00-2097.pdf>

Link:
Link to publication record in Edinburgh Research Explorer

Document Version:
Peer reviewed version

Published In:
Proceedings of the 18th conference on Computational linguistics - Volume 2

General rights
Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and/or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy
The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 08. Aug. 2020


Compiling Language Models from a Linguistically Motivated Unification Grammar

Manny Rayner†‡§, Beth Ann Hockey†, Frankie James†
Elizabeth Owen Bratt‡, Sharon Goldwater‡ and Jean Mark Gawron‡

†Research Institute for Advanced Computer Science, Mail Stop 19-39, NASA Ames Research Center, Moffett Field, CA 94035-1000
‡SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025
§netdecisions, Wellington House, East Road, Cambridge CB1 1BH, England

Abstract

Systems now exist which are able to compile unification grammars into language models that can be included in a speech recognizer, but it is so far unclear whether non-trivial linguistically principled grammars can be used for this purpose. We describe a series of experiments which investigate the question empirically, by incrementally constructing a grammar and discovering what problems emerge when successively larger versions are compiled into finite state graph representations and used as language models for a medium-vocabulary recognition task.

1 Introduction¹

Construction of speech recognizers for medium-vocabulary dialogue tasks has now become an important practical problem. The central task is usually building a suitable language model, and a number of standard methodologies have become established. Broadly speaking, these fall into two main classes. One approach is to obtain or create a domain corpus, and from it induce a statistical language model, usually some kind of N-gram grammar; the alternative is to manually design a grammar which specifies the utterances the recognizer will accept. There are many theoretical reasons to prefer the first course if it is feasible, but in practice there is often no choice. Unless a substantial domain corpus is available, the only method that stands a chance of working is hand-construction of an explicit grammar based on the grammar-writer's intuitions.

¹The majority of the research reported was performed at RIACS under NASA Cooperative Agreement Number NCC 2-1006. The research described in Section 3 was supported by the Defense Advanced Research Projects Agency under Contract N66001-94-C-6046 with the Naval Command, Control, and Ocean Surveillance Center.

If the application is simple enough, experience shows that good grammars of this kind can be constructed quickly and efficiently using commercially available products like ViaVoice SDK (IBM 1999) or the Nuance Toolkit (Nuance 1999). Systems of this kind typically allow specification of some restricted subset of the class of context-free grammars, together with annotations that permit the grammar-writer to associate semantic values with lexical entries and rules. This kind of framework is fully adequate for small grammars. As the grammars increase in size, however, the limited expressive power of context-free language notation becomes increasingly burdensome. The grammar tends to become large and unwieldy, with many rules appearing in multiple versions that constantly need to be kept in step with each other. It represents a large development cost, is hard to maintain, and does not usually port well to new applications.

It is tempting to consider the option of moving towards a more expressive grammar formalism, like unification grammar, writing the original grammar in unification grammar form and compiling it down to the context-free notation required by the underlying toolkit. At least one such system (Gemini; (Moore et al. 1997)) has been implemented and used to build successful and non-trivial applications, most notably CommandTalk (Stent et al. 1999). Gemini accepts a slightly constrained version of the unification grammar formalism originally used in the Core Language Engine (Alshawi 1992), and compiles it into context-free grammars in the GSL formalism supported by the Nuance Toolkit. The Nuance Toolkit compiles GSL grammars into sets of probabilistic finite state graphs (PFSGs), which form the final language model.

The relative success of the Gemini system suggests a new question. Unification grammars have been used many times to build substantial general grammars for English and other natural languages, but the language model oriented grammars so far developed for Gemini (including the one for CommandTalk) have all been domain-specific. One naturally wonders how feasible it is to take yet another step in the direction of increased generality; roughly, what we want to do is start with a completely general, linguistically motivated grammar, combine it with a domain-specific lexicon, and compile the result down to a domain-specific context-free grammar that can be used as a language model. If this programme can be realized, it is easy to believe that the result would be an extremely useful methodology for rapid construction of language models. It is important to note that there are no obvious theoretical obstacles in our way. The claim that English is context-free has been respectable since at least the early 80s (Pullum and Gazdar 1982)², and the idea of using unification grammar as a compact way of representing an underlying context-free language is one of the main motivations for GPSG (Gazdar et al. 1985) and other formalisms based on it. The real question is whether the goal is practically achievable, given the resource limitations of current technology.

²We are aware that this claim is most probably not true for natural languages in general (Bresnan et al. 1987), but further discussion of this point is beyond the scope of the paper.

In this paper, we describe work aimed at the target outlined above, in which we used the Gemini system (described in more detail in Section 2) to attempt to compile a variety of linguistically principled unification grammars into language models. Our first experiments (Section 3) were performed on a large pre-existing unification grammar. These were unsuccessful, for reasons that were not entirely obvious; in order to investigate the problem more systematically, we then conducted a second series of experiments (Section 4), in which we incrementally built up a smaller grammar. By monitoring the behavior of the compilation process and the resulting language model as the grammar's coverage was expanded, we were able to identify the point at which serious problems began to emerge (Section 5). In the final section, we summarize and suggest further directions.

2 The Gemini Language Model Compiler

To make the paper more self-contained, this section provides some background on the method used by Gemini to compile unification grammars into CFGs, and then into language models. The basic idea is the obvious one: enumerate all possible instantiations of the features in the grammar rules and lexicon entries, and thus transform each rule and entry in the original unification grammar into a set of rules in the derived CFG. For this to be possible, the relevant features must be constrained so that they can only take values in a finite predefined range. The finite range restriction is inconvenient for features used to build semantic representations, and the formalism consequently distinguishes syntactic and semantic features; semantic features are discarded at the start of the compilation process.
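To make the expansion step concrete, here is a minimal sketch of the naive algorithm in Python. Everything in it is illustrative: the rule representation, the feature value table, and the naming scheme for derived categories are our assumptions, not Gemini's actual data structures.

```python
from itertools import product

# Illustrative sketch of the naive expansion step, not Gemini's actual code.
# A rule is a mother plus daughters; each is (category, features), and a
# feature value may be a shared variable, written here as an uppercase name.

FEATURE_VALUES = {          # assumed finite, predefined value ranges
    "agr": ["sg", "pl"],
    "inv": ["y", "n"],
}

def expand_rule(mother, daughters):
    """Enumerate every instantiation of the variables in a unification
    rule, yielding one atomic CFG rule per instantiation."""
    cats = [mother] + daughters
    variables = sorted({v for _, feats in cats
                        for v in feats.values() if v.isupper()})
    # A variable ranges over the values of the features it is bound to.
    ranges = []
    for var in variables:
        vals = set()
        for _, feats in cats:
            for feat, val in feats.items():
                if val == var:
                    vals.update(FEATURE_VALUES[feat])
        ranges.append(sorted(vals))
    for binding in product(*ranges):
        env = dict(zip(variables, binding))
        def ground(cat, feats):
            inst = sorted((f, env.get(v, v)) for f, v in feats.items())
            suffix = "_".join(f"{f}-{x}" for f, x in inst)
            return f"{cat}_{suffix}" if suffix else cat
        yield ground(*mother), [ground(*d) for d in daughters]

# S -> NP:[agr=A] VP:[agr=A] expands into one CFG rule per value of A:
#   ('s', ['np_agr-pl', 'vp_agr-pl']) and ('s', ['np_agr-sg', 'vp_agr-sg'])
for rule in expand_rule(("s", {}), [("np", {"agr": "A"}), ("vp", {"agr": "A"})]):
    print(rule)
```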

A naive implementation of the basic method would be impractical for any but the smallest and simplest grammars, and considerable ingenuity has been expended on various optimizations. Most importantly, categories are expanded in a demand-driven fashion, with information being percolated both bottom-up (from the lexicon) and top-down (from the grammar's start symbol). This is done in such a way that potentially valid combinations of feature instantiations in rules are successively filtered out if they are not licensed by the top-down and bottom-up constraints. Ranges of feature values are also kept together when possible, so that sets of context-free rules produced by the naive algorithm may in these cases be merged into single rules.
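The net effect of the two percolation directions can be approximated after the fact by the standard reachability/productivity reduction on the expanded CFG. The sketch below is our simplification: the real demand-driven algorithm interleaves this filtering with the expansion itself rather than running it as a separate pass.

```python
def filter_cfg(rules, lexical_cats, start):
    """Keep only rules licensed both bottom-up and top-down.
    rules is a list of (mother, [daughter, ...]) pairs."""
    # Bottom-up (from the lexicon): a category is productive if it is
    # lexical, or some rule derives it from all-productive daughters.
    productive, changed = set(lexical_cats), True
    while changed:
        changed = False
        for mother, daughters in rules:
            if mother not in productive and all(d in productive for d in daughters):
                productive.add(mother)
                changed = True
    # Top-down (from the start symbol): a category is reachable if it is
    # the start symbol or a daughter of a reachable, productive mother.
    reachable, changed = {start}, True
    while changed:
        changed = False
        for mother, daughters in rules:
            if mother in reachable and mother in productive:
                for d in daughters:
                    if d not in reachable:
                        reachable.add(d)
                        changed = True
    licensed = productive & reachable
    return [(m, ds) for m, ds in rules
            if m in licensed and all(d in licensed for d in ds)]
```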

By exploiting the structure of the grammar and lexicon, the demand-driven expansion method can often effect substantial reductions in the size of the derived CFG. (For the type of grammar we consider in this paper, the reduction is typically by a factor of over 10²⁰.) The downside is that even an apparently small change in the syntactic features associated with a rule may have a large effect on the size of the CFG, if it opens up or blocks an important percolation path. Adding or deleting lexicon entries can also have a significant effect on the size of the CFG, especially when there are only a small number of entries in a given grammatical category; as usual, entries of this type behave from a software engineering standpoint like grammar rules.

The language model compiler also performs a number of other non-trivial transformations. The most important of these is related to the fact that Nuance GSL grammars are not allowed to contain left-recursive rules, and left-recursive unification-grammar rules must consequently be converted into a non-left-recursive form. Rules of this type do not however occur in the grammars described below, and we consequently omit further description of the method.
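Although the details are omitted here, the classic removal of direct left recursion gives the flavor of what such a conversion must do. The following sketch is the textbook transformation, under our own rule representation, and is not necessarily Gemini's method.

```python
def remove_direct_left_recursion(cat, rules):
    """Textbook transformation (not necessarily Gemini's method):
    cat -> cat alpha | beta   becomes
    cat -> beta cat_tail ; cat_tail -> alpha cat_tail | (empty)."""
    recursive = [ds[1:] for m, ds in rules if m == cat and ds and ds[0] == cat]
    base = [ds for m, ds in rules if m == cat and (not ds or ds[0] != cat)]
    if not recursive:                       # nothing to do
        return [(m, ds) for m, ds in rules if m == cat]
    tail = cat + "_tail"
    new_rules = [(cat, ds + [tail]) for ds in base]
    new_rules += [(tail, alpha + [tail]) for alpha in recursive]
    new_rules.append((tail, []))            # empty production
    return new_rules

# NP -> NP PP | DET N   becomes
# NP -> DET N NP_tail ; NP_tail -> PP NP_tail ; NP_tail -> (empty)
print(remove_direct_left_recursion(
    "NP", [("NP", ["NP", "PP"]), ("NP", ["DET", "N"])]))
```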

3 Initial Experiments

Our initial experiments were performed on a recent unification grammar in the ATIS (Air Travel Information System) domain, developed as a linguistically principled grammar with a domain-specific lexicon. This grammar was created for an experiment comparing coverage and recognition performance of a hand-written grammar with that of automatically derived recognition language models, as increasing amounts of data from the ATIS corpus were made available for each method. Examples of sentences covered by this grammar are "yes", "on friday", "i want to fly from boston to denver on united airlines on friday september twenty third", "is the cheapest one way fare from boston to denver a morning flight", and "what flight leaves earliest from boston to san francisco with the longest layover in denver". Problems obtaining a working recognition grammar from the unification grammar ended our original experiment prematurely, and led us to investigate the factors responsible for the poor recognition performance.

We explored several likely causes of recognition trouble: number of rules, number of vocabulary items, size of node array, perplexity, and complexity of the grammar, measured by average and highest number of transitions per graph in the PFSG form of the grammar.

We were able to immediately rule out simple size metrics as the cause of Nuance's difficulties with recognition. Our smallest air travel grammar had 141 Gemini rules and 1043 words, producing a Nuance grammar with 368 rules. This compares to the CommandTalk grammar, which had 1231 Gemini rules and 1771 words, producing a Nuance grammar with 4096 rules.

To determine whether the number of words in the grammar or the structure of the phrases was responsible for the recognition problems, we created extreme cases of a Word+ grammar (i.e. a grammar that constrains the input to be any sequence of the words in the vocabulary) and a one-word-per-category grammar. We found that both of these variants of our grammar produced reasonable recognition, though the Word+ grammar was very inaccurate. However, a three-words-per-category grammar could not produce successful speech recognition.
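For concreteness, the Word+ baseline can be pictured as the following toy acceptor (the vocabulary is invented for illustration; the real grammar was compiled into a Nuance language model, not Python):

```python
# Toy illustration of the Word+ baseline: any non-empty sequence of
# in-vocabulary words is accepted, so the language model contributes
# nothing beyond the word list itself.
VOCABULARY = {"yes", "no", "flight", "boston", "denver", "friday"}  # invented

def word_plus_accepts(utterance):
    words = utterance.split()
    return bool(words) and all(w in VOCABULARY for w in words)

print(word_plus_accepts("boston denver yes"))    # True: word salad is fine
print(word_plus_accepts("cheapest flight"))      # False: out-of-vocabulary word
```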

Many feature specifications can make a grammar more accurate, but will also result in a larger recognition grammar due to multiplication of feature values to derive the categories of the context-free grammar. We experimented with various techniques of selecting features to be retained in the recognition grammar. As described in the previous section, Gemini's default method is to select only syntactic features and not consider semantic features in the recognition grammar. We experimented with selecting a subset of syntactic features to apply and with applying only semantic sortal features, and no syntactic features. None of these grammars produced successful speech recognition.

From these experiments, we were unable to isolate any simple set of factors to explain which grammars would be problematic for speech recognition. However, the number of transitions per graph in a PFSG did seem suggestive of a factor: the ATIS grammar had a high of 1184 transitions per graph, while the semantic grammar of CommandTalk had a high of 428 transitions per graph, and produced very reasonable speech recognition.

Still, at the end of these attempts, it became clear that we did not yet know the precise characteristic that makes a linguistically motivated grammar intractable for speech recognition, nor the best way to retain the advantages of the hand-written grammar approach while providing reasonable speech recognition.


4 Incremental Grammar Development

In our second series of experiments, we incrementally developed a new grammar from scratch. The new grammar is basically a scaled-down and adapted version of the Core Language Engine grammar for English (Pulman 1992; Rayner 1993); concrete development work and testing were organized around a speech interface to a set of functionalities offered by a simple simulation of the Space Shuttle (Rayner, Hockey and James 2000). Rules and lexical entries were added in small groups, typically 2-3 rules or 5-10 lexical entries in one increment. After each round of expansion, we tested to make sure that the grammar could still be compiled into a usable recognizer, and at several points this suggested changes in our implementation strategy. The rest of this section describes the new grammar in more detail.

4.1 Overview of Rules

The current versions of the grammar and lexicon contain 58 rules and 301 uninflected entries respectively. They cover the following phenomena:

1. Top-level utterances: declarative clauses, WH-questions, Y-N questions, imperatives, elliptical NPs and PPs, interjections.

2. WH-movement of NPs and PPs.

3. The following verb types: intransitive, simple transitive, PP complement, modal/auxiliary, -ing VP complement, particle+NP complement, sentential complement, embedded question complement.

4. PPs: simple PP, PP with postposition ("ago"), PP modification of VP and NP.

5. Relative clauses with both relative NP pronoun ("the temperature that I measured") and relative PP ("the deck where I am").

6. Numeric determiners, time expressions, and postmodification of NP by numeric expressions.

7. Constituent conjunction of NPs and clauses.

The following example sentences illustrate current coverage: "yes", "how about scenario three?", "what is the temperature?", "measure the pressure at flight deck", "go to the crew hatch and close it", "what were temperature and pressure at fifteen oh five?", "is the temperature going up?", "do the fixed sensors say that the pressure is decreasing?", "find out when the pressure reached fifteen p s i", "what is the pressure that you measured?", "what is the temperature where you are?", "can you find out when the fixed sensors say the temperature at flight deck reached thirty degrees celsius?".

4.2 Unusual Features of the Grammar

Most of the grammar, as already stated, is closely based on the Core Language Engine grammar. We briefly summarize the main divergences between the two grammars.

4.2.1 Inversion

The new grammar uses a novel treatment of inversion, which is partly designed to simplify the process of compiling a feature grammar into context-free form. The CLE grammar's treatment of inversion uses a movement account, in which the fronted verb is moved to its notional place in the VP through a feature. So, for example, the sentence "is pressure low?" will in the original CLE grammar have the phrase structure

[[is]_V [pressure]_NP [[]_V [low]_ADJ]_VP]_S

in which the head of the VP is a V gap coindexed with the fronted main verb "is".

Our new grammar, in contrast, handles inversion without movement, by making the combination of inverted verb and subject into a VBAR constituent. A binary feature invsubj picks out these VBARs, and there is a question-formation rule of the form

S → VP:[invsubj=y]

Continuing the example, the new grammar assigns this sentence the simpler phrase structure

[[[is]_V [pressure]_NP]_VBAR [[low]_ADJ]_VP]_S

4.2.2 Sortal Constraints

Sortal constraints are coded into most grammar rules as syntactic features in a straightforward manner, so they are available to the compilation process which constructs the context-free grammar, and ultimately the language model. The current lexicon allows 11 possible sortal values for nouns, and 5 for PPs.

We have taken the rather non-standard step of organizing the rules for PP modification so that a VP or NP cannot be modified by two PPs of the same sortal type. The principal motivation is to tighten the language model with regard to prepositions, which tend to be phonetically reduced and often hard to distinguish from other function words. For example, without this extra constraint we discovered that an utterance like

measure temperature at flight deck and lower deck

would frequently be misrecognized as

measure temperature at flight deck in lower deck
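One plausible encoding of the constraint (our reconstruction; the paper does not spell out the rule mechanics) is to let each VP or NP record which sortal types its PP modifiers have used, so that a second PP of the same sort, as in the misrecognition above, cannot attach:

```python
# Sketch of the "at most one PP per sortal type" restriction; the sort
# names and the set-based encoding are our assumptions, not the paper's
# actual feature system (which would express this with grammar features).
def attach_pps(head, pp_sorts):
    """Attach PP modifiers to a VP/NP head, blocking a second PP of
    any sortal type that has already modified this head."""
    used, attached = set(), []
    for sort in pp_sorts:
        if sort in used:
            print(f"blocked: second PP of sort '{sort}' on {head}")
        else:
            used.add(sort)
            attached.append(sort)
    return attached

# "measure temperature at flight deck in lower deck" would need two
# location PPs on the same VP, so the second one is rejected:
print(attach_pps("VP(measure temperature)", ["location", "location"]))
```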

5 Experiments with Incremental Grammars

Our intention when developing the new grammar was to find out just when problems began to emerge with respect to compilation of language models. Our initial hypothesis was that these would probably become serious if the rules for clausal structure were reasonably elaborate; we expected that the large number of possible ways of combining modal and auxiliary verbs, question formation, movement, and sentential complements would rapidly combine to produce an intractably loose language model. Interestingly, this did not prove to be the case. Instead, the rules which appear to be the primary cause of difficulties are those relating to relative clauses. We describe the main results in Section 5.1; quantitative results on recognizer performance are presented together in Section 5.2.

5.1 Main Findings

We discovered that addition of the single rule which allowed relative clause modification of an NP had a drastic effect on recognizer performance. The most obvious symptoms were that recognition became much slower and the size of the recognition process much larger, sometimes causing it to exceed resource bounds. The false reject rate (the proportion of utterances which fell below the recognizer's minimum confidence threshold) also increased substantially, though we were surprised to discover no significant increase in the word error rate for sentences which did produce a recognition result. To investigate the cause of these effects, we examined the results of performing compilation to GSL and PFSG level. The compilation processes are such that symbols retain mnemonic names, so that it is relatively easy to find GSL rules and graphs used to recognize phrases of specified grammatical categories.

At the GSL level, addition of the relative clause rule to the original unification grammar only increased the number of derived Nuance rules by about 15%, from 4317 to 4959. The average size of the rules, however, increased much more³. It is easiest to measure size at the level of PFSGs, by counting nodes and transitions; we found that the total size of all the graphs had increased from 48836 nodes and 57195 transitions to 113166 nodes and 140640 transitions, rather more than doubling. The increase was not distributed evenly between graphs. We extracted figures for only the graphs relating to specific grammatical categories; this showed that the number of graphs for NPs had increased from 94 to 258, and moreover that the average size of each NP graph had increased from 21 nodes and 25.5 transitions to 127 nodes and 165 transitions, a more than sixfold increase. The graphs for clause (S) phrases had only increased in number from 53 to 68. They had however also greatly increased in average size, from 171 nodes and 212 transitions to 445 nodes and 572 transitions, or slightly less than a threefold increase. Since NP and S are by far the most important categories in the grammar, it is not surprising that these large changes make a great difference to the quality of the language model, and indirectly to that of speech recognition.

Comparing the original unification grammar and the compiled GSL version, we were able to make a precise diagnosis. The problem with the relative clause rules is that they unify feature values in the critical S and NP subgrammars; this means that each constrains the other, leading to the large observed increase in the size and complexity of the derived Nuance grammar.

³GSL rules are written in a notation which allows disjunction and Kleene star.


Specifically, agreement information and sortal category are shared between the two daughter NPs in the relative clause modification rule, which is schematically as follows:

NP:[agr=A, sort=S] → NP:[agr=A, sort=S] REL:[agr=A, sort=S]

These feature settings are needed in order to get the right alternation in pairs like

the robot that *measure/measures the temperature [agr]

the *deck/temperature that you measured [sort]
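A back-of-the-envelope count illustrates the multiplicative cost of this linking. The 11 sortal values are from Section 4.2.2; the number of agreement values is our assumption for illustration.

```python
# Rough illustration of the multiplicative cost of the linked features.
N_AGR = 2     # assumed: e.g. singular/plural
N_SORT = 11   # sortal values for nouns (Section 4.2.2)

# Linked rule: mother NP, daughter NP and REL must all agree on (agr, sort),
# so the compiler needs one specialized copy of the rule, and of the NP and
# REL subgrammars it connects, per combination.
print("specialized variants with linked features:", N_AGR * N_SORT)   # 22

# Unlinked rule: with agr and sort removed from the rule, one derived rule
# suffices, and the NP and S subgrammars no longer constrain each other.
print("specialized variants with unlinked features:", 1)
```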

We tested our hypothesis by commenting out the agr and sort features in the above rule. This completely solves the main problem of explosion in the size of the PFSG representation; the new version is only very slightly larger than the one with no relative clause rule (50647 nodes and 59322 transitions against 48836 nodes and 57195 transitions). Most importantly, there is no great increase in the number or average size of the NP and S graphs. NP graphs increase in number from 94 to 130, and stay constant in average size; S graphs increase in number from 53 to 64, and actually decrease in average size to 135 nodes and 167 transitions. Tests on speech data show that recognition quality is nearly the same as for the version of the recognizer which does not cover relative clauses. Although speed is still significantly degraded, the process size has been reduced sufficiently that the problems with resource bounds disappear.

It would be reasonable to expect that removing the explosion in the PFSG representation would result in an underconstrained language model for the relative clause part of the grammar, causing degraded performance on utterances containing a relative clause. Interestingly, this does not appear to happen, though recognition speed under the new grammar is significantly worse for these utterances compared to utterances with no relative clause.

5.2 Recognition Results

This section summarizes our empirical recognition results. With the help of the Nuance Toolkit batchrec tool, we evaluated three versions of the recognizer, which differed only with respect to the language model. no rels used the version of the language model derived from a grammar with the relative clause rule removed; rels is the version derived from the full grammar; and unlinked is the compromise version, which keeps the relative clause rule but removes the critical features. We constructed a corpus of 41 utterances, of mean length 12.1 words. The utterances were chosen so that the first 31 were within the coverage of all three versions of the grammar; the last 10 contained relative clauses, and were within the coverage of rels and unlinked but not of no rels. Each utterance was recorded by eight different subjects, none of whom had participated in development of the grammar or recognizers. Tests were run on a dual-processor SUN Ultra60 with 1.5 GB of RAM.

The recognizer was set to reject utterances if their associated confidence measure fell under the default threshold. Figures 1 and 2 summarize the results for the first 31 utterances (no relative clauses) and the last 10 utterances (relative clauses) respectively. Under 'xRT', we give mean recognition speed (averaged over subjects) expressed as a multiple of real time; 'FRej' gives the false reject rate, the mean percentage of utterances which were rejected due to low confidence measures; 'Mem' gives the mean percentage of utterances which failed due to the recognition process exceeding memory resource bounds; and 'WER' gives the mean word error rate on the sentences that were neither rejected nor failed due to resource bound problems. Since the distribution was highly skewed, all means were calculated over the six subjects remaining after exclusion of the extreme high and low values.

Looking first at Figure 1, we see that rels is clearly inferior to no rels on the subset of the corpus which is within the coverage of both versions: nearly twice as many utterances are rejected due to low confidence values or resource problems, and recognition speed is about five times slower. unlinked is in contrast not significantly worse than no rels in terms of recognition performance, though it is still two and a half times slower.

Figure 2 compares rels and unlinked on the utterances containing a relative clause.


Grammar    xRT    FRej    Mem    WER
no rels    1.04    9.0%    -     6.0%
rels       4.76   16.1%   1.1%   5.7%
unlinked   2.60    9.6%    -     6.5%

Figure 1: Evaluation results for 31 utterances not containing relative clauses, averaged across 8 subjects excluding extreme values.

Grammar    xRT    FRej    Mem    WER
rels       4.60   26.7%   1.6%   3.5%
unlinked   5.29   20.0%    -     5.4%

Figure 2: Evaluation results for 10 utterances containing relative clauses, averaged across 8 subjects excluding extreme values.

It seems reasonable to say that recognition performance is comparable for the two versions: rels has a lower word error rate, but also rejects more utterances. Recognition speed is marginally lower for unlinked, though it is not clear to us whether the difference is significant given the high variability of the data.

6 Conclusions and Further Directions

We found the results presented above surprising and interesting. When we began our programme of attempting to compile increasingly large linguistically based unification grammars into language models, we had expected to see a steady combinatorial increase, which we guessed would be most obviously related to complex clause structure. This did not turn out to be the case. Instead, the serious problems we encountered were caused by a small number of critical rules, of which the one for relative clause modification was by far the worst. It was not immediately obvious how to deal with the problem, but a careful analysis revealed a reasonable compromise solution, whose only drawback was a significant but undisastrous degradation in recognition speed.

It seems optimistic to hope that the relative clause problem is the end of the story; the obvious way to investigate is by continuing to expand the grammar in the same incremental fashion, and find out what happens next. We intend to do this over the next few months, and expect in due course to be able to present further results.

References

H. Alshawi. 1992. The Core Language Engine. Cambridge, Massachusetts: The MIT Press.

J. Bresnan, R. M. Kaplan, S. Peters, and A. Zaenen. 1987. Cross-Serial Dependencies in Dutch. In W. J. Savitch et al. (eds.), The Formal Complexity of Natural Language, Reidel, Dordrecht, pages 286-319.

G. Gazdar, E. Klein, G. Pullum, and I. Sag. 1985. Generalized Phrase Structure Grammar. Basil Blackwell.

IBM. 1999. ViaVoice SDK for Windows, version 1.5.

R. Moore, J. Dowding, H. Bratt, J. M. Gawron, Y. Gorfu, and A. Cheyer. 1997. CommandTalk: A Spoken-Language Interface for Battlefield Simulations. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 1-7, Washington, DC. Available online from http://www.ai.sri.com/natural-language/projects/arpa-sls/commandtalk.html.

Nuance Communications. 1999. Nuance Speech Recognition System Developer's Manual, Version 6.2.

G. Pullum and G. Gazdar. 1982. Natural Languages and Context-Free Languages. Linguistics and Philosophy, 4, pages 471-504.

S. G. Pulman. 1992. Unification-Based Syntactic Analysis. In (Alshawi 1992).

M. Rayner. 1993. English Linguistic Coverage. In M. S. Agnäs et al., Spoken Language Translator: First Year Report. SRI Technical Report CRC-043. Available online from http://www.sri.com.

M. Rayner, B. A. Hockey, and F. James. 2000. Turning Speech into Scripts. To appear in Proceedings of the 2000 AAAI Spring Symposium on Natural Language Dialogues with Practical Robotic Devices.

A. Stent, J. Dowding, J. M. Gawron, E. O. Bratt, and R. Moore. 1999. The CommandTalk Spoken Dialogue System. In Proceedings of the 37th Annual Meeting of the ACL, pages 183-190. Available online from http://www.ai.sri.com/natural-language/projects/arpa-sls/commandtalk.html.