
The Induction and Evaluation of Word Order Rules using Corpora based on the Two Concepts of Topological Models

Bernd Bohnet
Innovative Language Technology, TTI GmbH

and University of Stuttgart

Visualization and Interactive Systems Group
Universitätstr. 58, 70569 Stuttgart, Germany

[email protected]

Abstract

Using dependency trees in natural language generation and machine translation raises the need to derive the word order from dependency trees. This task is difficult for languages with (partly) free word order and comparatively easier for languages with fixed word order. This paper describes (a) the two basic elements of topological models, (b) rule patterns for the mapping of dependency trees to topological trees, (c) the automatic acquisition of word order rules from corpora annotated with dependency structures, and (d) an approach for the automatic evaluation of the results.

1 Introduction

Dependency structures are frequently used in Natural Language Generation (NLG) and in some cases in Machine Translation (MT). In NLG, dependency structures are used in the surface realization step. The input to the surface realizer is defined by the standard architecture RAGS for NLG systems as "syntactic representations", which are "based on some notion of abstract syntactic structure which does not encode surface constituency or word order", cf. page 17 (Mellish et al., 2006). These input structures are usually dependency structures.

In MT, statistical n-gram approaches are rather successful and therefore mostly used. Nevertheless, some systems use dependency trees in order to improve the results, for instance Lavoie et al. (2000), Čmejrek et al. (2003), and Ding and Palmer (2005), and some approaches describe rather the theoretical foundations for improving the translation results, like Mel'čuk and Wanner (2006).

We will give a brief overview of the history of topological models, since topological models are the basis for this paper. The first model that was developed was the German topological model, cf. Drach (1937), Bech (1955), and Höhle (1986). The complexity of the German word order is obviously the reason behind the development of the German topological model. The syntax and the information structure determine the positions of the constituents in the model. The most salient parts are the so-called sentence brackets, namely the left bracket (LK) and the right bracket (RK). The left bracket usually contains the finite verb or a conjunction. The right bracket can be empty or it can contain the infinite verb(s).

John wird[LK] nach Berlin reisen[RK].
lit. 'John will to Berlin travel.'

weil[LK] es ihm gefällt[RK].
lit. 'because it he likes.'

The syntax determines the constituents of the brackets. In relation to the brackets, three positions are possible: before the left bracket (pre-field, VF), in the middle, i.e., between the left and the right brackets (middle field, MF), and after the right bracket (post-field).

[Nach Berlin]VF ist[LK] [er um 5]MF abgereist[RK].
lit. 'to Berlin is he at 5 departed.'

This kind of approach has been developed only with regard to German and a few other languages, such as ancient French by Skarup (1975) and Warlpiri by Donohue and Sag (1999). For some languages, such as English, the word order is easy to derive, and therefore only a simple topological model is needed. For instance, phrase structures already fulfil the requirements for a topological model with regard to English as well as quite a few other languages. Phrase structures describe the syntax by the inclusion of constituents in other constituents and by the position within the constituents. In the following example on the left, the and kitchen are constituents within the constituent c. This can occur recursively, as shown in the example on the right.

[[the]a [kitchen]b]c        [[in]a [the kitchen]b]c

Figure 1 shows a dependency tree. The words of the dependency tree are not ordered; therefore, the word order has to be derived. Xia and Palmer (2001) describe a method for converting dependency structures to phrase structures. The conversion grammar is trained on corpora annotated with dependency structures and phrase structures. In order to learn word order rules, this method can be simplified, so that only dependency structures and the word order are needed.

[Figure 1: Dependency Tree. Nodes: has, John, read, book, about, Berlin, a; edge labels: SBJ, VC, OBJ, ADV, NMOD, PMOD. (diagram omitted)]

Bohnet and Seniv (2004) introduced a method to map dependency structures to unordered phrase structures for languages with free word order. Since the constituents are not ordered, additional rules are needed to derive the order. For German, phrase structures do not make it possible to determine all possible orders, since, for instance, the phrases can be discontinuous. The classical German topological model describes the word order appropriately. Ideally, the dependency trees would be mapped directly to the topological model. Unfortunately, no large corpora annotated with topological fields are available. Therefore, we were forced to use phrase structures in order to learn German word order rules.

Fortunately, in the last years, corpora annotated with dependency structures have become available for several languages. This was encouraged by two "shared tasks" for dependency parsing: in 2006, 13 corpora became available, and in 2007, 10 partly identical corpora followed. Out of these, we used the Catalan corpus, the English Penn Treebank, the German Tiger corpus, and the Italian corpus to learn linearization grammars.

2 The Two Elements of Topological Models

The two elements of topological models can be discovered by analyzing various topological models that have been developed, mainly for text generation.

Bröker (1997) describes word order by domains and explicit precedence relations. The domains do not allow words from outside a domain to go between words contained in the domain. A topology is derived from a dependency tree with rules based on modal logic.

Duchier and Debusmann (2001) use a tree to describe the linear precedence (LP). The tree is projective and partially ordered, and its edges are labelled with the names of topological fields. The LP tree is derived from a dependency tree called the immediate dominance (ID) tree. The LP tree is computed from the ID tree by a constraint-based approach using lexical constraints and conditions for the climbing of nodes.

The topological model of Gerdes (2002) consists of topological fields and boxes. Boxes contain fields, which recursively contain boxes, and the boxes again contain fields. The boxes and fields are supplied in the form of lists, and both behave like domains. A topology is derived by traversing the dependency tree top-down: each time a word is placed in a box or field, new boxes or fields are created within that box or field, depending on the word's subcategorization frame.

Word order domains have also been used for the linearization of phrase structures, cf. (Reape, 1993), (Rambow, 1994), (Kathol and Pollard, 1995). Reape describes word order in terms of containers which are associated with phrases. A container can recursively include other containers and words. For the mapping from phrase structures to such a container structure, the continuous phrases are associated with a container, and the words of discontinuous phrases are included in the parent container. The order between the words is kept during the mapping.

The topological models above define word order either in terms of lists or in terms of sets. The sets need additional precedence relations for ordering the elements. A model using sets and precedence relations makes it possible to define models which use lists. The models using sets cannot be formulated directly in terms of lists, since models based on sets have the possibility of underspecification: underspecified models contain partially ordered sets and represent several possible orders.

Domains and precedence relations are the most general concepts; each of the previously sketched models can be decomposed using them. In the following, we formally define the two concepts of topological models.

Def. 1 [Precedence Relation]
A precedence relation defines an order between words or sets. Formally, an order relation is defined as a subset of the Cartesian product of a set W: < ⊆ W × W, where (w_i, w_j) ∈ < if w_i is before w_j.

Def. 2 [Domain¹]
A domain is a set of words and domains, where the elements of a domain that contains another domain do not occur between the words of this other domain or of recursively contained domains:
if ∃ D_m ∈ P_t: ∀ x ∈ P_t ∧ x ≠ D_m → (∀ y ∈* D_m: x < y) ∨ (∀ y ∈* D_m: x > y).²
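To make the domain condition concrete, here is a minimal sketch (our own illustration, not from the paper) that represents a topology as nested lists of word positions and checks Def. 2, i.e., that no element of a parent domain intervenes between the words of a contained domain:

```python
def words_in(domain):
    """Recursively collect the word positions contained in a domain."""
    for el in domain:
        if isinstance(el, list):
            yield from words_in(el)
        else:
            yield el

def is_valid_domain(domain):
    """Check Def. 2: every element of a parent domain must lie entirely
    before or entirely after the words of each contained domain."""
    for el in domain:
        if isinstance(el, list):
            inner = set(words_in(el))
            outer = set(words_in(domain)) - inner
            # an outer word inside the inner span violates the condition
            if any(min(inner) < w < max(inner) for w in outer):
                return False
            if not is_valid_domain(el):
                return False
    return True

# positions 0..3 of 'weil es ihm gefaellt': the clause domain is continuous
print(is_valid_domain([0, [1, 2, 3]]))  # True
print(is_valid_domain([0, 2, [1, 3]]))  # False: 2 intervenes between 1 and 3
```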

3 Rule Patterns and Mapping Procedure

For the definition of rule patterns and examples, graph transducers are used. Knight and Graehl (2005) give a good overview of different (tree) transducer types. We use transducers with two tapes, cf. (Bohnet, 2006). These transducers have many advantages over single-tape transducers: context is possible on both tapes, they are bidirectional, no ordering of rules is needed, and a static relationship between the symbols of the tapes is introduced. The most important feature is that the rules can be applied in parallel to all parts of the tree, in contrast to normal tree transducers, which traverse a tree top-down or bottom-up. Nevertheless, it is still possible to have an execution order between rules; the order is then determined by the rule context and not by manual ordering. The main advantage of applying rules in parallel is that it allows one to define a grammar in correspondence to the independence of syntactic constituents as it exists in natural language. For instance, the complements and adjuncts of a verb can be ordered in parallel and independently from the adjuncts of a noun, such as modifiers and determiners.

¹We have chosen the name domain since in mathematical domain theory (Gierz et al., 2003) and order theory the partly ordered sets are called domains; we therefore use the same name for this basic element of topological word order models. Additionally, this name is already frequently used for this concept, cf. (Reape, 1993), (Kathol and Pollard, 1995), (Bröker, 1997).

²The definition means that if a domain D_m is in a domain P_t (its parent domain), then the other words or domains of the domain P_t are either before the words (and recursively contained words) or after the words of the domain D_m.

The topological models of the sentences are represented as hierarchical graphs. We use the hierarchical graph definition of Busatto (2002), who defines a hierarchical graph as an underlying flat graph with, on top of it, a directed acyclic graph (DAG) which represents the hierarchy.

[Figure 2: Precedence Graph. Nodes: has, John, read, book, about, Berlin, a. (diagram omitted)]

The order relations and domains can be mapped directly to hierarchical graphs. The underlying flat graph represents the order relation and is named G_O. The nodes of the graph G_O are the words which have to be ordered, and the order relation as defined in Def. 1 is equal to the edge definition: E_W ⊆ W × W. The domains and the recursive inclusion of domains build the hierarchy, which is represented by the DAG G_D with the nodes D and the edges E_D ⊆ D × D. The two graphs are coupled by a bipartite graph, which consists of two different node types; in this case, it links nodes from the words W and the domains D. Its edges are defined as E_C ⊆ D × W. The nodes, the edges, and the domains can be labelled; since we do not need the formal definition for the labels, we leave it out. Finally, the hierarchical graph is formed by the three graphs as defined above: H = ⟨G_O, G_D, G_C⟩. An example of a hierarchical graph that describes the word order of a sentence is shown in Figure 2. We call this type of graph a precedence graph.
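As a data-structure illustration (our own sketch; the example edges and the domain names d1, d2 are assumptions, not taken from the paper), the triple H = ⟨G_O, G_D, G_C⟩ can be stored as three edge sets over words and domain names:

```python
from dataclasses import dataclass, field

@dataclass
class HierarchicalGraph:
    """H = <G_O, G_D, G_C>: a flat order graph over words, a DAG over
    domains, and a bipartite coupling graph assigning words to domains."""
    words: set = field(default_factory=set)           # W
    order_edges: set = field(default_factory=set)     # E_W subset of W x W
    domains: set = field(default_factory=set)         # D
    domain_edges: set = field(default_factory=set)    # E_D subset of D x D
    coupling_edges: set = field(default_factory=set)  # E_C subset of D x W

# a fragment of the precedence graph of Figure 2
h = HierarchicalGraph(
    words={"John", "has", "read", "a", "book", "about", "Berlin"},
    order_edges={("John", "has"), ("a", "book"), ("about", "Berlin")},
    domains={"d1", "d2"},
    domain_edges={("d2", "d1")},                      # d1 is nested in d2
    coupling_edges={("d1", "a"), ("d1", "book"), ("d2", "read")},
)
```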

Deriving the word order from a precedence graph. The word order of a precedence graph is derived in two steps. (1) If the graph has cycles, then an edge from each of the cycles is removed. (2) A topological sort is applied to the precedence graph in order to derive the word order. A topological sort is a permutation p of the nodes of a directed graph such that an edge (i, j) implies that i appears before j in p. The topological sort is applied recursively to the domains. The domains are ordered by edges crossing domain borders.
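The following sketch illustrates step (2) (our own code; it assumes cycles have already been removed, and it simplifies the paper's treatment by letting precedence edges name sub-domains directly, whereas the paper orders domains via edges crossing domain borders):

```python
from graphlib import TopologicalSorter  # Python 3.9+

def order_domain(name, members, edges):
    """Topologically sort the members of the domain `name`; an edge (i, j)
    means i precedes j. Sub-domains are ordered recursively and flattened.
    Raises CycleError if the precedence graph still contains a cycle."""
    ts = TopologicalSorter({m: set() for m in members[name]})
    for i, j in edges:
        if i in members[name] and j in members[name]:
            ts.add(j, i)  # register: j comes after i
    result = []
    for m in ts.static_order():
        result.extend(order_domain(m, members, edges) if m in members else [m])
    return result

# domains for 'John has read a book about Berlin' (labels are our own)
members = {
    "S":  ["John", "has", "VP"],
    "VP": ["read", "NP", "PP"],
    "NP": ["a", "book"],
    "PP": ["about", "Berlin"],
}
edges = {("John", "has"), ("has", "VP"), ("read", "NP"), ("NP", "PP"),
         ("a", "book"), ("about", "Berlin")}
print(order_domain("S", members, edges))
# ['John', 'has', 'read', 'a', 'book', 'about', 'Berlin']
```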

Creating the precedence graph. The precedence graph is built by rules. Parts of the dependency tree build the left-hand side of the rules, and parts of the precedence graph build the right-hand side. The right-hand sides of the rules are created without replacing the parts in the source graph; the created parts build a new graph as result. In this sense, the rules read from one tape and write to another one. After the creation of the parts defined on the right-hand sides, the resulting graph parts are not yet connected. In order to connect these parts, correspondence links, as in the case of FSTs³, can be introduced. Using this technique, the rules can either (1) grasp a part of the target graph and attach a new part to it, or (2) indicate which parts should be glued together. The first approach is useful when the rules are applied in a sequence; in this way, already created parts can be accessed.

The advantage of this is that with (1) rules can be executed in a sequence, and with (2) the rules can be applied independently in parallel. In this way, an execution sequence of rules is not forced by the processing technique; on the other hand, it is still possible, as in the case of FSTs and of tree transducers, to have a sequence, but then justified by linguistic demands.

³An FST rule consists of three parts: the correspondence, the rule operator, and the environment or context, e.g., e:t => C:@. The correspondence specifies a lexical symbol that corresponds to a surface symbol; the correspondence is indicated by a colon. We have taken up this idea and connect, in the case of tree transducers, the left-hand side with the right-hand side by dashed lines instead of the colon. The operator of an FST rule specifies the relationship between the correspondence and the context. The context specifies the environment in which the correspondence is found. While the lexical and surface symbols of an FST are on the same side, the symbols of tree and graph transducers are usually separated on the left-hand side and right-hand side.

A linearization grammar has two tasks: building the domain hierarchy and ordering the nodes contained in the domains. Therefore, the rules fall into two types: domain rules and precedence rules. Domain rules consist of domain creation rules and domain adjunct rules. In the next paragraphs, we describe the rule patterns and a sample grammar.

Domain creation rules are used to define elementary domains. Figure 3 shows on the left a rule pattern which represents this rule type. The domain creation rules have on the left-hand side two nodes connected by an edge, and on the right-hand side a domain containing two nodes. The nodes of the left-hand side and right-hand side are connected by correspondence links, which are used to unify target nodes having a link to the same node in the source graph. These rules can also build domains containing more than two nodes. Overlapping domains are unified by the rule interpreter if the labels are equal. The rule on the right of Figure 3 shows an example. The example rule can be applied to the dependency tree shown in Figure 1: the left-hand side matches the nodes labelled with book and a and creates a domain with these two nodes, as shown in Figure 2. In order to build the topological model for this sentence, two additional rules of the same type are needed, with the edges labelled SBJ and PMOD.⁴ The rules might have additional conditions which, for instance, take the part-of-speech tag into account.

[Figure 3: Domain creation rule pattern (LS: nodes ?G, ?D connected by an edge ?rel; RS: corresponding nodes ?G′, ?D′ in a domain) and an example rule (LS: ?Xs –NMOD→ ?Ys; RS: ?Xd, ?Yd in a domain labelled NP). (diagram omitted)]
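Reduced to its effect, a domain creation rule maps a matching dependency edge to a labelled domain over the two corresponding target nodes. A minimal sketch (our own simplification of the transducer rules; the domain labels below are our guesses, and the unification of overlapping equally-labelled domains is only noted, not implemented):

```python
def apply_domain_creation_rules(dep_edges, rules):
    """dep_edges: (head, label, dependent) triples of the dependency tree;
    rules: edge label -> domain label. Each match creates a domain over the
    two corresponding nodes; the rule interpreter would unify overlapping
    domains with equal labels."""
    return [(rules[label], {head, dep})
            for head, label, dep in dep_edges if label in rules]

dep_edges = {("has", "SBJ", "John"), ("has", "VC", "read"),
             ("read", "OBJ", "book"), ("read", "ADV", "about"),
             ("book", "NMOD", "a"), ("about", "PMOD", "Berlin")}
# the NMOD example rule plus the two further rules mentioned in the text
rules = {"NMOD": "NP", "SBJ": "NP", "PMOD": "NP"}
print(apply_domain_creation_rules(dep_edges, rules))
# e.g. [('NP', {'a', 'book'}), ('NP', {'has', 'John'}), ('NP', {'about', 'Berlin'})]
```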

Domain adjunct rules place domains into other domains. This rule type matches an edge, grasps a domain in the target graph, and places it in a newly created domain. This new domain might also be unified with domains created by domain creation rules. Figure 4 shows a rule pattern of this rule type. The dashed lines on the right-hand side indicate context in the target graph. Because of this context, the rules are applied in the second step. For the creation of English topological models, only two steps are needed: in the first step, all domain creation rules and precedence rules are applied in parallel; in the second step, the adjunct rules are applied. In order to map German dependency trees to phrase structures, more complex rules are used; the context on the right-hand side is frequently deeply embedded.

⁴A complete grammar also has rules with the edge labels OBJ and ADV. For the example these rules are not necessary.

[Figure 4: Domain adjunct rule pattern (RS with a dashed context domain ?XP) and an example rule (LS: ?Xs –OBJ→ ?Ys; RS: a domain labelled VP placed into the context domain ?YP). (diagram omitted)]

Word order rules fall into two rule types: verticalrules and horizontal rules.

Vertical rules order a parent node in relation to a child node. The left-hand side of a rule consists of a path from the node p to the node c. In most cases, the path consists of only one edge. Figure 5 shows a rule pattern and an example rule. The order of the nodes on the right-hand side is adapted according to the order of the two nodes in the training sentence. The left-hand side of the example rule matches edges labelled NMOD, and the right-hand side orders the child node before the parent node. This rule is applicable to the nodes book and a and orders a before book. The rule pattern has several variants. In order to build topological models for German sentences, some of these rules have to access the right-hand side, for instance in order to determine whether a verb goes into the left or the right bracket.

[Figure 5: Pattern for vertical rules (LS: ?G –?rel→ ?D; RS: ?D′ ordered before ?G′) and an example with NMOD. (diagram omitted)]

Horizontal rules order two child nodes or grandchildren. The left-hand side of a rule consists of two paths starting at a node p and ending at the nodes v and w. Figure 6 shows a typical pattern of a horizontal rule. The example rule matches the nodes read, book, and about.

[Figure 6: Rule pattern of horizontal rules (LS: a node ?G with edges ?rel1, ?rel2 to ?D1, ?D2; RS: the corresponding nodes ?D1′, ?D2′ connected by a precedence edge) and an example with the edge labels ADV and OBJ. (diagram omitted)]
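Reduced in the same way, vertical and horizontal rules simply contribute precedence edges. A minimal sketch (our own; rule conditions and paths longer than one edge are omitted):

```python
def apply_precedence_rules(dep_edges, vertical, horizontal):
    """vertical: edge label -> 'child_first' or 'parent_first';
    horizontal: (label1, label2) -> True if the first child precedes the
    second. Returns precedence edges (i, j) meaning i is ordered before j."""
    prec = set()
    for head, label, dep in dep_edges:
        if vertical.get(label) == "child_first":
            prec.add((dep, head))
        elif vertical.get(label) == "parent_first":
            prec.add((head, dep))
    # horizontal rules relate two children of the same head
    for h1, l1, d1 in dep_edges:
        for h2, l2, d2 in dep_edges:
            if h1 == h2 and horizontal.get((l1, l2)):
                prec.add((d1, d2))
    return prec

dep_edges = {("book", "NMOD", "a"), ("read", "OBJ", "book"),
             ("read", "ADV", "about")}
vertical = {"NMOD": "child_first", "OBJ": "parent_first"}  # a < book, read < book
horizontal = {("OBJ", "ADV"): True}                        # book < about
print(apply_precedence_rules(dep_edges, vertical, horizontal))
# contains ('a', 'book'), ('read', 'book'), ('book', 'about')
```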

4 Induction of Linearization Grammars

In order to learn linearization grammars, we use the defined rule patterns to derive rules from pairs of dependency trees and topological models. The rules are built by completing the rule patterns: the left-hand side is completed with edge names and conditions which are taken from the dependency tree, and the right-hand side is completed with labels as well as the direction of the nodes which have to be ordered.

For some of the rule patterns, variations have to be built. For the domain adjunct rules, rule patterns with different embedding depths of the domain are examined until a possible solution is found.

The size of the left-hand side is increased in order to order all nodes of a domain. The advantage of this method is that it adapts automatically to different topological models, which might have domains of different sizes.

Fundamental to this method and to the termination of the learning process is a termination condition which defines when the extension of rules is stopped. For the definition of the termination condition, we need to know which nodes have to be ordered by the rules. These are not only the nodes of a domain; they also include the children (in the dependency tree) of the nodes contained in a domain. The children are required to order the domain in relation to other domains. We call this set the order set. Figure 7 shows on the left a dependency tree overlapped by the order set and on the right a dependency tree overlapped by domains.

[Figure 7: Dependency trees overlapped with order sets (left) and domains (right); same tree as in Figure 1. (diagram omitted)]

Def. 3 [Order Set]
O_p is a set of nodes consisting of all nodes of a domain p and the target nodes of all dependency edges whose source node is contained in the domain. Given that G_T = ⟨N_T, E_T⟩ is a dependency tree, H_P = ⟨G_P, G_D, G_C⟩ is a precedence graph where the DAG G_D = ⟨N_D, E_D⟩ represents the domain hierarchy, G_C = ⟨W_C, D_C, E_C⟩ is the coupling graph between the graph G_P and the domain hierarchy G_D, the function f: N_T → N_P maps the words of the dependency tree to the words of the precedence graph H_P, and the set D_p = {w | (p, w) ∈ E_C} contains the nodes included in a domain p, then the order set for the domain p is defined by the set of nodes:
O_p = { n | f(n) ∈ D_p ∨ ( (s, t) ∈ E_T ∧ n = t ∧ f(s) ∈ D_p ) }

With this definition, it is possible to define the termination condition for the rule creation in the learning algorithm: either the order set O_p is totally ordered, or no further rules can be built that order further elements. The learning algorithm is shown below:

for all pairs of the input structures P_i = (G_i, H_i) do
    for all O_p of P_i do
        s ← 1
        repeat
            build all rules for the rule patterns of size s
            if a rule was already built then
                increase the rule counter
            else
                store the rule
            s ← s + 1
        until O_p is totally ordered or
              no more rules can be built (cf. termination condition)
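Schematically, the loop can be rendered as follows (our own sketch: `order_sets`, `build_rules`, and `is_totally_ordered` are hypothetical helpers, and the size bound is our safeguard, not part of the paper's algorithm):

```python
from collections import Counter

def induce_grammar(pairs, order_sets, build_rules, is_totally_ordered,
                   max_size=5):
    """pairs: (dependency tree G_i, topological model H_i) training pairs.
    Returns the induced rules with normalized counts, later used as edge
    weights when breaking cycles in the precedence graph."""
    counts = Counter()
    for tree, model in pairs:
        for o_p in order_sets(model):          # one order set per domain p
            size = 1
            while True:
                rules = build_rules(tree, model, o_p, size)
                counts.update(rules)           # store new rules, count known ones
                if is_totally_ordered(o_p, counts) or not rules \
                        or size >= max_size:   # cf. the termination condition
                    break
                size += 1
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()}
```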

A grammar can cause cycles in a precedence graph. In order to solve this problem, we keep all rules and assign a normalized count of the occurrences as weight to the edges. The cycles in the precedence graph are dissolved by removing from each cycle the edge with the lowest weight. The derived order is one with a high probability. This is sufficient in text generation and machine translation, since we always look for the order which fits best.
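A sketch of this cycle-breaking step (our own implementation of the idea: find a directed cycle, drop its lowest-weight edge, repeat until acyclic):

```python
def find_cycle(nodes, edges):
    """Return the edge list of one directed cycle, or None if acyclic."""
    adj = {n: [w for (v, w) in edges if v == n] for n in nodes}
    state = {n: 0 for n in nodes}      # 0 = new, 1 = on stack, 2 = done

    def dfs(n, path):
        state[n] = 1
        for w in adj[n]:
            if state[w] == 1:          # back edge closes a cycle
                i = next((k for k, (v, _) in enumerate(path) if v == w),
                         len(path))
                return path[i:] + [(n, w)]
            if state[w] == 0:
                cycle = dfs(w, path + [(n, w)])
                if cycle:
                    return cycle
        state[n] = 2
        return None

    for n in nodes:
        if state[n] == 0:
            cycle = dfs(n, [])
            if cycle:
                return cycle
    return None

def break_cycles(nodes, weighted_edges):
    """weighted_edges: {(i, j): weight}. Remove the lowest-weight edge of
    each cycle so that a topological sort becomes possible."""
    edges = dict(weighted_edges)
    while (cycle := find_cycle(nodes, edges)) is not None:
        del edges[min(cycle, key=edges.get)]
    return set(edges)

nodes = {"a", "b", "c"}
weights = {("a", "b"): 0.6, ("b", "c"): 0.5, ("c", "a"): 0.1}
print(break_cycles(nodes, weights))  # drops ('c', 'a'), the lightest edge
```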

5 Evaluation of the Linearization Grammars

Due to the lack of an automatic evaluation method for the results of linearization components, we suggest and apply a method which computes the similarity between different word orders of a sentence. With this method, it is possible to measure the difference between the original word order and the derived word order. For languages with fixed word order, the value is correlated, as shown below, with the error rate of a sentence; in the case of languages with free word order, the evaluation method would probably work if the information structure were annotated.

A measured difference does not always indicate an error, since, for instance, prepositional constituents can be placed at different positions even in languages with fixed word order. In order to resolve this, it would be possible to define the exceptions manually. We think that this is acceptable, since it is not possible to have an automatic evaluation method for all cases.

The similarity measure is defined between two word orders of a sentence based on the domains. The position of each element of all domains from the two orders is compared, and the elements having the same position are counted. The similarity is defined as the fraction of this count over the total number of elements. In the following example, the order of the two elements in domain 2 is switched. Therefore, the count of the elements taking the same positions in this domain is zero.

[John has [read [a book]_1]_2]_3
[John has [[a book]_1 read]_2]_3

The count of domain 1 is 2 and the count of domain 3 is 3. The number of correctly placed elements is 5 and the total number of elements is 7. The similarity is therefore 5/7 (0.714). The following equation defines the similarity relation:

similarity = ( Σ_{i=1..d} |elements with equal position in D_i^{s1} and D_i^{s2}| ) / ( number of words + number of domains − 1 )
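The measure is easy to operationalize. A minimal sketch (our own code, assuming the two orders are given as nested lists and that sub-domains can be identified across orders by the set of words they contain):

```python
def words(domain):
    """All words of a (nested) domain."""
    for el in domain:
        yield from words(el) if isinstance(el, list) else [el]

def all_domains(domain):
    """The domain itself and all recursively contained sub-domains."""
    yield domain
    for el in domain:
        if isinstance(el, list):
            yield from all_domains(el)

def positions(domain):
    """Position of each direct element; sub-domains are keyed by word set."""
    return {frozenset(words(el)) if isinstance(el, list) else el: i
            for i, el in enumerate(domain)}

def similarity(order1, order2):
    """Fraction of elements with equal position in corresponding domains.
    The total equals #words + #domains - 1, as in the equation above."""
    doms2 = {frozenset(words(d)): positions(d) for d in all_domains(order2)}
    equal = total = 0
    for d in all_domains(order1):
        pos1 = positions(d)
        pos2 = doms2[frozenset(words(d))]
        total += len(pos1)
        equal += sum(1 for el, i in pos1.items() if pos2.get(el) == i)
    return equal / total

o1 = ["John", "has", ["read", ["a", "book"]]]
o2 = ["John", "has", [["a", "book"], "read"]]
print(similarity(o1, o2))  # 0.714... = 5/7, as in the example above
```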

In order to evaluate the learning procedure for linearization grammars, we trained grammars on selected corpora as used in the shared tasks for dependency parsing for Catalan, English, German, and Italian. For the languages with fixed word order, we used the automatic evaluation method. For German, we still had to evaluate the results manually, since in most of the cases we did not get the original word order. One important reason for this is the missing information structure within the dependency trees.

As input for the training of the Catalan, English, and Italian grammars, we used pairs of dependency trees and topological models that can be derived from the dependency trees. Each node was placed top-down in a domain, and only the leaves of the dependency tree were placed together with their parent in one domain, cf. Figure 7 (right).

Punctuation marks and coordinations are excluded from the training and evaluation, since the annotation did not allow us to derive the word order for these parts. The constituents of a coordination are all placed at the conjunction. Other annotations for dependency trees define a structure which would allow one to derive the order, cf. (Mel'čuk and Pertsov, 1987).

Table 1 shows the results of the evaluation. The first column shows the language of the used corpus; the second column shows the number of sentences used for training; the third column shows the number of test sentences; the fourth column shows the average test sentence length; the fifth column shows the automatically computed similarity value between the original sentences and the sentences with the derived word order; the sixth column shows the average of the manual evaluation, where the results of the test set have been rated as good, acceptable, or wrong. The value gives the accuracy for sentences rated good or acceptable. The last value shows the average correlation between the ratings and the similarity values. In order to compute the correlation, the ratings have been mapped to numerical values. Since the sentence length is an important factor for the accuracy, we computed the correlation between the similarity values and the length of the English sentences, which is 0.22. The test sentences of the Catalan corpus are unusually long, and the English ones are a bit longer than the average of 24 words. The longer Catalan sentences are probably the reason for the slightly worse result compared to English.

Corpus (lang.)   Training (# sent.)   Test (# sent.)   Length (# words)   Similarity   Eval. (%)   Corr.
Catalan          14796                50               35.3               0.904        84          0.53
English          18526                50               27.2               0.922        86          0.51
Italian          3049                 50               19.3               0.879        70          0.42

Table 1: Summary of the results.

As input for German, we used the dependency tree and phrase structure annotation of the Tiger corpus. For German, the usage of phrase structures as topological models is not the best choice, since then not all possible word orders can be derived; but there are no large corpora of written text annotated with the German topological model. Additionally, we used about 10 hand-written rules in order to assign additional features to the nodes that indicate the position of the verbs in the German topological model. The rule patterns then also had to consider these additional features during the learning phase.

Since German has a free word order, we evaluated 100 sentences manually; 21 of the sentences were wrong, which is an accuracy of 79%. In 8 cases, the pre-field was empty, which is caused by using phrase structures as topological models. In 5 cases, rules were missing, and in 4 cases, adverbs were placed wrongly. Each of the following cases occurred once: wrong order of adjectives in a noun phrase, wrong order of a date format, wrong position of a verb due to the hand-written rules, and finally an unspecified position of a negation particle. For German, we conducted another experiment and evaluated 20 randomly selected sentences with a length of up to 12 words, since sentences used in text generation are currently shorter and not so complex. We got only one error, which relates to an accuracy of 96%.

The tool for the induction of linearization grammars is embedded in a linguistic environment. We provide the environment, including the tool for learning the linearization grammars, and a description of how to proceed.⁵

⁵Requests are welcome and should be directed via email to the author of the paper.


6 Conclusion

We identified the two basic concepts for building topological models, which probably underlie any topological model. The basic concepts allow one to describe complex word order models, such as the classic German topological model, as well as simpler models for languages with fixed word order. Based on these concepts, rule patterns have been formulated using a tree/graph transducer formalism. The graph transducers use two tapes and therefore have several advantages. The most important one is that they allow word order rules to be described according to syntactic necessities and to be applied independently in parallel. The human brain is also massively parallel and is certainly able to handle word order in parallel, and not top-down as many algorithms do.

We introduced a method for the induction of word order rules which takes as input sentences annotated with a dependency tree and a topological model. The algorithm uses rule patterns to learn grammars. The results have been evaluated automatically for languages with fixed word order. The learning and evaluation methods can easily be adapted to other languages.

References

G. Bech. 1955. Studien über das deutsche Verbum infinitum. Max Niemeyer Verlag, Tübingen.

B. Bohnet and H. Seniv. 2004. Mapping Dependency Structures to Phrase Structures and the Automatic Acquisition of Mapping Rules. In LREC, Lisbon, Portugal.

B. Bohnet. 2006. Textgenerierung durch Transduktion linguistischer Strukturen. Ph.D. thesis, University of Stuttgart.

N. Bröker. 1997. Eine Dependenzgrammatik zur Kopplung heterogener Wissenssysteme auf modallogischer Basis. Ph.D. thesis, Albert-Ludwigs-Universität, Freiburg.

G. Busatto. 2002. An Abstract Model of Hierarchical Graphs and Hierarchical Graph Transformation. Ph.D. thesis, Universität Paderborn.

Y. Ding and M. Palmer. 2005. Machine translation using probabilistic synchronous dependency insertion grammars. In ACL '05.

C. Donohue and I. Sag. 1999. Domains in Warlpiri. In HPSG99, Edinburgh.

E. Drach. 1937. Grundgedanken der deutschen Satzlehre. Diesterweg, Frankfurt.

D. Duchier and R. Debusmann. 2001. Topological dependency trees: A constraint-based account of linear precedence. In Proceedings of the ACL.

K. Gerdes. 2002. Topologie et grammaires formelles de l'allemand. Ph.D. thesis, Université Paris 7.

G. Gierz, K. H. Hofmann, K. Keimel, J. D. Lawson, M. Mislove, and D. S. Scott. 2003. Continuous Lattices and Domains.

T. Höhle. 1986. Der Begriff Mittelfeld. Anmerkungen über die Theorie der topologischen Felder. In Akten des 7. Internationalen Germanistenkongresses, Band 3, pages 329–340, Tübingen.

A. Kathol and C. Pollard. 1995. Extraposition via complex domain formation. In Meeting of the Association for Computational Linguistics, pages 174–180.

K. Knight and J. Graehl. 2005. An overview of probabilistic tree transducers for natural language processing. In Sixth International Conference on Intelligent Text Processing and Computational Linguistics, Lecture Notes in Computer Science.

B. Lavoie, R. Kittredge, T. Korelsky, and O. Rambow. 2000. A Framework for MT and Multilingual NLG Systems Based on Uniform Lexico-Structural Processing. In ANLP/NAACL Conference.

C. Mellish, D. Scott, L. Cahill, D. Paiva, R. Evans, and M. Reape. 2006. A reference architecture for natural language generation systems. Nat. Lang. Eng., 12(1).

I. A. Mel'čuk and N. Pertsov. 1987. Surface-syntax of English, a formal model in the Meaning-Text Theory. Benjamins, Amsterdam/Philadelphia.

I. Mel'čuk and L. Wanner. 2006. Syntactic mismatches in machine translation. Machine Translation, 20(2).

O. Rambow. 1994. Formal and Computational Aspects of Natural Language Syntax. Ph.D. thesis, University of Pennsylvania.

M. Reape. 1993. A Formal Theory of Word Order. A Case Study in West Germanic. Ph.D. thesis, University of Edinburgh.

P. Skarup. 1975. Les premières zones de la proposition en ancien français. Akademisk Forlag, Copenhagen.

M. Čmejrek, J. Cuřín, and J. Havelka. 2003. Czech-English dependency-based machine translation. In EACL '03.

F. Xia and M. Palmer. 2001. Converting Dependency Structures to Phrase Structures. In Proc. of the Human Language Technology Conference, San Diego.
