
Proceedings of FG 2008: The 13th conference on Formal Grammar

Philippe de Groote (Ed.)

Hamburg, Germany
August 9–10, 2008

CENTER FOR THE STUDY OF LANGUAGE AND INFORMATION

Contents

Preface

1 Underspecification: from Semantics to Discourse — Invited Talk —
  MARKUS EGG

2 The Emerging Role of Grammars in Statistical Machine Translation — Invited Talk —
  PHILIPP KOEHN

3 Danish There-Constructions with Intransitive Verbs
  ANNE BJERRE AND TAVS BJERRE

4 A Landscape of Logics for Finite Unordered Unranked Trees
  STEPHAN KEPSER

5 Semantics in Minimalist-Categorial Grammars
  ALAIN LECOMTE

6 Treebanks and Mild Context-Sensitivity
  WOLFGANG MAIER AND ANDERS SØGAARD

7 Toward a Universal Underspecified Semantic Representation
  MEHDI HAFEZI MANSHADI, JAMES F. ALLEN, MARY SWIFT

8 Inessential Features and Expressive Power of Descriptive Metalanguages
  GEOFFREY K. PULLUM AND HANS-JÖRG TIEDE

9 Type Signature Modules
  YAEL SYGAL AND SHULY WINTNER

List of Contributors

Preface

The Formal Grammar conference series provides a forum for the presentation of new and original research on formal grammar, mathematical linguistics and the application of formal and mathematical methods to the study of natural language.

FG-2008, the 13th conference on Formal Grammar, was held in Hamburg, Germany on the 9th and 10th of August 2008. The conference consisted of seven contributed papers (selected out of eighteen submissions) and two invited talks.

We would like to thank the people who made this 13th FG conference possible: the two invited speakers, Philipp Koehn and Markus Egg, the members of the Program Committee, and Laura Kallmeyer for taking care of the organization. She worked closely with the ESSLLI 2008 organizing committee, in particular with Benedikt Löwe, to whom we are deeply grateful.

Philippe de Groote

Program Committee

Pierre Boullier (INRIA Paris - Rocquencourt, France)
Wojciech Buszkowski (Uniwersytet im. Adama Mickiewicza, Poland)
Miriam Butt (Universität Konstanz, Germany)
Alexander Clark (Royal Holloway University of London, UK)
Berthold Crysmann (DFKI, Germany)
Alexander Dikovsky (Université de Nantes, France)
Denys Duchier (Université d'Orléans, France)
Annie Foret (Université de Rennes 1, France)
Nissim Francez (Technion, Israel)
Philippe de Groote (INRIA Nancy - Grand Est, France), Chair
Gerhard Jäger (Universität Bielefeld, Germany)
Makoto Kanazawa (National Institute of Informatics, Japan)
Stephan Kepser (Universität Tübingen, Germany)
Glyn Morrill (Universitat Politècnica de Catalunya, Spain)
Richard Moot (CNRS, France)
Larry Moss (Indiana University, USA)
Stefan Müller (Freie Universität Berlin, Germany)
Mark-Jan Nederhof (University of St. Andrews, UK)
Joakim Nivre (Växjö Universitet, Sweden)
Frank Richter (Universität Tübingen, Germany)
Sylvain Salvati (INRIA Bordeaux - Sud-Ouest, France)
Giorgio Satta (Università di Padova, Italy), Chair
Ed Stabler (UCLA, USA)
Hans-Jörg Tiede (Illinois Wesleyan University, USA)
Jesse Tseng (CNRS, France)
Shuly Wintner (University of Haifa, Israel)

FG Standing Committee

Philippe de Groote (INRIA Nancy - Grand Est, France)
Laura Kallmeyer (Universität Tübingen, Germany)
Gerald Penn (University of Toronto, Canada)
Giorgio Satta (Università di Padova, Italy)

1

Underspecification: from Semantics to Discourse — Invited Talk —

MARKUS EGG†

Underspecification has been introduced into semantics as a means to handle ambiguity. In the meantime, a host of underspecification formalisms is available, which represent the meaning of an ambiguous expression in terms of partial semantic information. Two properties of formalisms have emerged as crucial:

First, is a formalism efficient, i.e., can the readings of an ambiguous expression be derived and enumerated easily from its underspecified representation, even for very high numbers of readings? (In NLP applications, the number is much higher than one would expect due to spurious ambiguities, see Koller and Thater 2006.) Second, is a formalism expressive, i.e., can it represent any subset of the readings of an ambiguous expression (König and Reyle, 1999, Ebert, 2005)?

Practical work on discourse annotation in Potsdam (Reitter and Stede, 2003) and in Groningen shows that underspecification is desirable for discourse processing as well, because not every discourse can be assigned a single fully specified structure by human analysts (or discourse parsers), which introduces ambiguity at the discourse level.

But for discourse processing, efficiency and expressivity of underspecification formalisms get even more important: The items to be analysed get drastically larger (the number of atomic segments in a discourse exceeds the number of scope-bearing entities in a sentence by far), which calls for much more efficient processing. And, discourse processing requires a high grade of

† This talk presents joint work with Michaela Regneri and Alexander Koller.

FG-2008. Philippe de Groote (Ed.). Copyright © 2008, CSLI Publications.

expressivity to allow the integration of preferences. These preferences can be extracted from large corpora annotated for discourse structure (for the corpora see Carlson et al. 2003 and Stede 2004).

The preferences describe the interaction of discourse relations (e.g., CONDITION, BACKGROUND, or SUMMARY) and discourse configuration (how smaller segments of discourse are arranged into larger ones), which together constitute discourse structure. Consider for instance the discourse (1):

(1) I try to read a novel (C1) if I feel bored (C2) because the TV programmes disappoint me (C3) but I can't concentrate on anything. (C4)

For (1) five different discourse structures are possible, which are ranked by the constraint that the second argument of the condition relation (introduced by if) is maximally short: Ideally, this argument should be only C2, i.e., the speaker reads a novel if he feels bored, independently of the TV programmes and/or his ability to concentrate. Structures where this argument consists of C2 and C3 are less preferred, but still more preferred than structures where this argument is C2–C4.

Weighted Regular Tree Grammars (wRTGs; Koller et al. 2008) are introduced as a formalism to represent and process partial information on discourse structures in an efficient and expressive way. Preferences as illustrated for (1) are integrated as soft constraints.
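The idea can be illustrated with a toy weighted regular tree grammar. The sketch below is not from the talk or from Koller et al. (2008); the states, labels and weights are invented to mirror example (1): the grammar derives two discourse trees for the condition relation, and the rule weights act as soft constraints encoding the preference for a maximally short second argument.

```python
from itertools import product

# A weighted regular tree grammar (wRTG): each rule rewrites a state into a
# labelled node whose children are again states, and carries a weight.
# A tree's weight is the product of the weights of the rules used, so
# preferences over structures become soft constraints on derivations.
# (Illustrative sketch only: states, labels and weights are invented.)
RULES = {
    # state -> list of (label, child_states, weight)
    "S":    [("cond", ("C1", "Arg2"), 1.0)],
    "Arg2": [
        ("c2",  (), 1.0),                # condition argument is just C2: preferred
        ("seq", ("C2x", "C3x"), 0.5),    # condition argument is C2-C3: dispreferred
    ],
    "C1":  [("c1", (), 1.0)],
    "C2x": [("c2", (), 1.0)],
    "C3x": [("c3", (), 1.0)],
}

def derivations(state):
    """Enumerate all (tree, weight) pairs derivable from a state."""
    for label, kids, w in RULES[state]:
        for combo in product(*(list(derivations(k)) for k in kids)):
            tree = (label,) + tuple(t for t, _ in combo)
            weight = w
            for _, child_weight in combo:
                weight *= child_weight
            yield tree, weight

def best(state):
    """Return the highest-weighted tree derivable from a state."""
    return max(derivations(state), key=lambda tw: tw[1])
```

Calling `best("S")` on this grammar returns the preferred reading, the tree in which the condition relation takes only C2 as its second argument.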

References

Carlson, Lynn, Daniel Marcu, and Mary Ellen Okurowski. 2003. Building a discourse-tagged corpus in the framework of Rhetorical Structure Theory. In J. van Kuppevelt and R. Smith, eds., Current Directions in Discourse and Dialogue, pages 85–112. Dordrecht: Kluwer.

Ebert, C. 2005. Formal investigations of underspecified representations. Ph.D. thesis, King's College, London.

Koller, Alexander, Michaela Regneri, and Stefan Thater. 2008. Regular tree grammars as a formalism for scope underspecification. In Proceedings of ACL-08. To appear.

Koller, Alexander and Stefan Thater. 2006. An improved redundancy elimination algorithm for underspecified descriptions. In Proceedings of COLING-ACL 2006. Sydney.

König, Esther and Uwe Reyle. 1999. A general reasoning scheme for underspecified representations. In H. J. Ohlbach and U. Reyle, eds., Logic, Language and Reasoning. Essays in Honour of Dov Gabbay, pages 1–28. Dordrecht: Kluwer.

Reitter, David and Manfred Stede. 2003. Step by step: underspecified markup in incremental rhetorical analysis. In Proceedings 4th International Workshop on Linguistically Interpreted Corpora (LINC-03). Budapest.

Stede, Manfred. 2004. The Potsdam Commentary Corpus. In B. Webber and D. Byron, eds., ACL 2004 Workshop on Discourse Annotation, pages 96–102. Barcelona, Spain: Association for Computational Linguistics.

2

The Emerging Role of Grammars in Statistical Machine Translation — Invited Talk —

PHILIPP KOEHN

Statistical machine translation has proven to be a successful approach to this ancient problem. From a linguistic point of view, current methods are very simplistic, based on the mapping of small text chunks according to huge probabilistic mapping tables. While this works fairly well for languages with similar syntactic structure (such as French-English, and even Arabic-English), it does have systematic problems with language pairs such as German-English and Japanese-English. There is a rich literature on developing syntax-based (or grammar-based) approaches in statistical machine translation. This talk will motivate some of this work on the example of German-English translation, and highlight the technical challenges involved in using grammars.


3

Danish There-Constructions with Intransitive Verbs

ANNE BJERRE AND TAVS BJERRE

Abstract

In this paper we argue that the distribution of verbs in there-constructions is determined by a "locative" constraint. We show that an important function of the there-construction is to "locate" the logical subject referent at a place or in a state. This accommodates an unaccusative interpretation. However, agentive manner of motion verbs, typically analyzed as unergative verbs, also appear in there-constructions in Danish. We introduce a lexical rule inserting there. In order for our lexical rule to account for all verbs appearing in there-constructions, including the agentive manner of motion verbs, a non-resultative complex event structure is proposed for these verbs, representing both the unaccusative existence or appearance meaning and the agentive meaning of these verbs.

Keywords: THERE-INSERTION, HPSG, LEXICAL SEMANTICS, DANISH.

3.1 Introduction

According to the Unaccusative Hypothesis (Perlmutter, 1978), intransitive verbs split into two classes, unaccusative verbs and unergative verbs, based on their different underlying structures. In Perlmutter's terminology, unaccusatives have 'an initial 2 but no initial 1' (Perlmutter, 1978, 160). This means that unaccusatives have an underlying object but no subject. Unergatives, on the other hand, have an underlying subject. Although the classification is based on syntactic characteristics, Perlmutter points out that semantic factors determine the syntactic classes, e.g. unaccusatives take a patient argument whereas unergatives typically describe an activity, cf. (Perlmutter, 1978, 162–163).

Other authors have tried to determine the syntactic class semantically. Zaenen (1993), based on Dowty (1991), proposes that the argument of unaccusatives has more patient properties than agent properties, and the argument of unergatives has more agent properties than patient properties. Levin and Hovav (1995) argue that the syntactic classification of verbs into unaccusatives and unergatives corresponds to a distinction between verbs which are externally caused and internally caused (Levin and Hovav, 1995, 98). Sorace (2004) posits a hierarchy of auxiliary selection based on a hierarchy of semantic verb classes, and suggests that this same hierarchy may be a hierarchy of unaccusativity.

The there-construction has traditionally been discussed within the context of unaccusativity, claiming that the verbs that allow there-insertion are unaccusative verbs, cf. Burzio (1986). Treating there-insertion as an unaccusative diagnostic, however, begs an explanation as to why certain apparently unergative verbs allow there-insertion. In Danish, intransitive agentive manner of motion verbs frequently appear in there-constructions, cf. (1).1

(1) a. Der løber en hest på motorvejen.
       There runs a horse on motorway-the

    b. Der gik en mand længere ude i mosen.
       There walked a man further out-stative in bog-the

Based on such examples with agentive objects, it has been rejected that there-insertion is an unaccusativity diagnostic, cf. e.g. Sveen (1996) and Lødrup (2000). The verbs appearing in there-constructions may then be explained by e.g. the discourse function of the construction, saying that the construction weakens or bleaches the meaning of certain verbs, cf. Oxenvad (1934), Börjars and Vincent (2005) and many others, or by positing two lexical entries for these verbs, one consistent with an unaccusative verb and one consistent with an unergative verb, cf. e.g. Hoekstra and Mulder (1990). The meaning consistent with the unaccusative representation may then be considered to be a "deagentivized" version of the unergative, cf. also Kirsner (1973) and Maling (1987).

In this paper we want to argue that the distribution of verbs in there-constructions is determined by a "locative" constraint. We want to show that an important function of the there-construction is to "locate" the logical subject referent at a place or in a state. See also Bresnan (1993), who proposes a similar constraint for the English locative inversion construction. This means that the function of the there-construction is to state the existence or appearance of the logical subject referent at some location or in some state. This meaning accommodates an unaccusative interpretation. However, as also mentioned by e.g. Brink (1997) and Lødrup (2000), the agentive manner of motion verbs still have an agentive interpretation truth-conditionally, and so we believe that this has to be represented lexically. In order for our lexical rule to account for all the verbs appearing in there-constructions, including the agentive manner of motion verbs, a non-resultative complex event structure is proposed for these verbs. This event structure represents both the unaccusative existence or appearance meaning and the agentive meaning of these verbs. Further support for the non-resultative complex event structure is provided by the behaviour of the verbs in Danish pseudo-coordination constructions.

1 All examples in this paper are found on the internet.

3.2 Danish intransitive verbs in there-constructions

In this paper we concentrate on intransitive verbs in Danish there-constructions, cf. Bjerre and Bjerre (Forthcoming) for an account of transitive verbs in Danish there-constructions. In Danish we find unaccusative verbs of existence or appearance2 in there-constructions, as shown in (2).

(2) a. Der eksisterer mange former for realisme.
       There exists many types of realism

    b. Der opstod en fejl.
       There appeared a mistake

We also find unaccusative verbs of change of state, both internally and externally caused change of state, as in (3).

(3) a. Der blomstrede et Æbletræ saa rigt og en vild Kastanie.3
       There bloomed an apple tree so richly and a wild chestnut

    b. Der brændte et hus ved Hørup Mølle ved lynnedslag.
       There burned a house at Hørup Mølle by strike of lightning

    c. Der gik en stol i stykker under valget af dirigent.
       There went a chair to pieces during election-the of chairman

With internally caused verbs which are ambiguous between an existence and a change of state reading, according to (Milsark, 1979, 252–253) only the former reading appears in there-constructions in English; in Danish, however, both readings are found, cf. (4).

(4) a. Der vokser blomster.
       There grow flowers

    b. Der vokser et barn i dig.
       There grows a child in you

We also find verbs of emission in Danish there-constructions, as in (5).

2 We use the verb classes in Levin and Hovav (1995) in this presentation.
3 Until 1948, Danish nouns were written with capital letters and the character 'å' was written 'aa'.


(5) a. Da hjulet var kommet af, kunne vi konstatere at der lækkede olie fra systemet.
       When wheel-the was come off could we ascertain that there leaked oil from system-the

    b. Der lyser en stjerne på himlen et sted.
       There shines a star on sky-the some place

Another class of verbs found in Danish there-constructions is verbs of spatial configuration. We find both verbs in the "simple position" sense and verbs in the "assume position" sense. Examples are given in (6).

(6) a. Der ligger en bombe på min terasse.
       There lies a bomb on my terrace

    b. Der satte sig en kvinde ved hans bord.
       There sat a woman at his table

With respect to motion verbs, we find both verbs of directed motion and motion verbs with locational PPs, as in (7).

(7) a. Der gik en høj, svær Haandværksmand ud af Døren, idet jeg traadte ind.
       There walked a tall, heavy workman out of door-the, as I walked in

    b. Der gik en ko på Nørrebro.
       There walked a cow on Nørrebro

With respect to the verbs of directed motion, we find examples where the verbs have a disappearance interpretation as well as examples where they have an appearance interpretation in Danish, as in (8). This is apparently in contrast to English; Levin and Hovav (1995) cite Kimball (1973) for a constraint against verbs of disappearance in there-constructions.

(8) For selv om der gik mange ud ad den ene dør, kom der ikke nye ind ad den anden.
    Because even if there walked many out of the one door, came there not new in of the other

Intransitive verbs that do not predicate a state or location of their logical subject cannot occur in there-constructions, e.g. grine, 'laugh', nyse, 'sneeze', etc.

In this section we have shown that many different classes of intransitive verbs appear in Danish there-constructions. As we have stated earlier, we will, however, show in 3.5 that all these verbs share a common characteristic in stating the existence or appearance of the logical subject referent at some location or in some state.


3.3 Agentive manner of motion verbs

As shown in the previous section, agentive manner of motion verbs appear in the there-construction. As also mentioned earlier, their presence in the construction has been explained as a de-agentivization of the verbs. Truth-conditionally, this is a problem, as there is no doubt that the agentivity and the manner component of their meaning is evident, as shown by (9).

(9) Der spadserer en flue på væggen.
    There strolls a fly on wall-the

The manner of the motion is important: the fly "strolls", flies typically do not stroll, and it is used to emphasize the manner of the motion.

We also find other examples with adverbials which are inconsistent with a pure existence at a location interpretation, cf. (10).

(10) a. Der løber en lille krokodille hurtigt rundt på væggen i mit soveværelse.
        There runs a small crocodile quickly about on wall-the in my bedroom

     b. Der svømmede 2 delfiner stille rundt.
        There swam 2 dolphins quietly about

Indeed, the combination with rundt, 'about', may turn other agentive verbs into motion verbs, allowing these to appear in the there-construction as shown in (11).

(11) a. *Der fjoller cirka 22 mænd efter en sort/hvid bold i stedet for.
         There fool approximately 22 men after a black/white ball instead

     b. Der fjoller cirka 22 mænd rundt efter en sort/hvid bold i stedet for.
        There fool approximately 22 men about after a black/white ball instead

(12) a. *Der fiser en mand med en violin.
         There farts a man with a violin

     b. Der fiser en mand rundt med en violin.
        There farts a man about with a violin

This phenomenon also suggests that the meaning of manner of motion verbs, when used in there-constructions, is not reduced to a pure existence or appearance at a location interpretation.

The data presented here suggest that agentive manner of motion verbs have two "submeanings", an existence at location meaning, but also an agentive activity meaning. It may be that the agentive "submeaning" is weakened or bleached for some discourse functional purpose, but it is still present.

3.4 There-insertion and pseudo-coordination

Danish does not express aspect morphologically. Aspectual differences are instead expressed by using various verbal constructions, so-called pseudo-coordinations. Such aspectual constructions have not received much attention in the Danish linguistic literature, but cf. Diderichsen (1946, 156), Hansen (1967, vol. 3, 30–31), Jensen (1985, 113), Brandt (1992), Jørgensen (2001) and Bjerre and Bjerre (2007b).

Examples of agentive manner of motion verbs in pseudo-coordinations are shown in (13).

(13) a. Børnene løber og leger.
        Children-the run and play

     b. De sidder og kysser.
        They sit and kiss

The combination of the two conjuncts makes the construction imperfective, and the events expressed by the second verb in the construction, play and kiss, are understood to be in progress and continuous.

The first verb in a pseudo-coordination is a motion verb or a verb of spatial configuration. Here, too, the verbs retain their full meaning, though it may be bleached. As mentioned above, these are verbs that appear in there-constructions, and there-insertion is possible in all pseudo-coordinations, even when the second verb in the pseudo-coordination is a verb that does not on its own allow there-insertion.

Mateu and Amadas (1999), based on studies by Bybee et al. (1994) showing that the progressive corresponds with or originates as a locative construction in most languages, propose an analysis of progressive constructions where a locative unaccusative structure which is associated with "be" locates the event depicted by the full verb. (14) is an example from Mateu and Amadas (1999).

(14) John is breaking the window.

Thus, (14) means "John is centrally located in the event of causing the window to become broken", in this way giving it a progressive interpretation.

The Danish pseudo-coordination construction and the English "be" + "-ing" construction have similar functions. However, in Danish we can use an agentive manner of motion verb. If we adopt the idea of Mateu and Amadas (1999) that the progressive is a locative structure and we analyse these verbs as denoting a non-resultative complex situation with a locative subevent, then in addition to accounting for their agentive meaning, we account for both their appearance in pseudo-coordinations and their appearance in there-constructions. In pseudo-coordinations the locative structure additionally locates the following event, but in there-constructions, the locative structure additionally locates the focused logical subject referent.

3.5 Complex event structure

Before we can formulate our lexical rule for there-insertion, we need to show how we will represent the lexical semantics of verbs. The analyses provided in this section are modifications of analyses presented in Bjerre (2003) and Bjerre and Bjerre (2007a).

Verbs split into a number of semantic classes reflected in their event and argument structure. Verbs (or predicates) denote situations. Situations may be divided into simple situations, a process or a state, and complex situations. Complex situations have typically been explained as situations where a process results in another situation, in most cases a state. The idea of decomposing event structure goes back at least to Lakoff (1965) and McCawley (1968) and is employed in combination with the Vendlerian classification (Vendler, 1957) in Dowty (1979) and Levin and Hovav (1995), among many others.

In this paper we propose that a complex situation can also be non-resultative and consist of two subsituations that are not causally linked, but happen or exist in parallel. This is reflected in (15).

(15)
psoa [SEM-ARGS list]
├─ situation [SIT-STRUC list-of-event-rels]
│  ├─ simple-sit [SIT-STRUC ⟨event-rel⟩]
│  └─ complex-sit [TEMP-REL temp-rel, SIT-STRUC ⟨event-rel, event-rel⟩]
│     ├─ non-resultative [TEMP-REL [included-rel, SIT1 e1, SIT2 e2]]
│     └─ resultative [TEMP-REL [precede-cause-rel, SIT1 e1, SIT2 e2]]
└─ relation

In a resultative situation, subsituation 1 precedes and causes subsituation 2, whereas in a non-resultative situation, subsituation 1 is temporally included in subsituation 2.
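The two situation types can be sketched in code. This is our illustration, not the paper's formalism: the class and field names below are simplified stand-ins for the AVM types in (15), and the sample entry mirrors the locational reading of gå 'walk' discussed later in the paper.

```python
from dataclasses import dataclass

# Sketch of the situation typology in (15): a complex situation pairs two
# subevents with a temporal relation. "resultative" means subevent 1 precedes
# and causes subevent 2; "non-resultative" means subevent 1 is temporally
# included in subevent 2. Names are illustrative, not the paper's notation.

@dataclass
class Event:
    relation: str   # e.g. "walk-rel", "loc-rel"
    roles: dict     # e.g. {"ACT": "i"} or {"THEME": "i", "GRND": "j"}

@dataclass
class SimpleSit:
    event: Event

@dataclass
class ComplexSit:
    temp_rel: str   # "precede-cause-rel" or "included-rel"
    sit1: Event
    sit2: Event

    @property
    def resultative(self) -> bool:
        return self.temp_rel == "precede-cause-rel"

# ga 'walk' (locational): a walking process temporally included in a locative
# state; actor and theme share the referent index "i".
ga_loc = ComplexSit(
    temp_rel="included-rel",
    sit1=Event("walk-rel", {"ACT": "i"}),
    sit2=Event("loc-rel", {"THEME": "i", "GRND": "j"}),
)
```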


We assume that semantic relations come with a fixed number of arguments. We are inspired by Davis (2001), though many details differ. Semantic roles are introduced as features on relations as shown in the hierarchy in (16).

(16)
relation
└─ event-rel [E-IND e-ind]
   ├─ process-rel
   │  ├─ act-rel [ACT ref]
   │  │  ├─ spec-act-rel
   │  │  │  └─ spec-act-only-rel
   │  │  ├─ act-only-rel
   │  │  ├─ act-und-rel [UND ref]
   │  │  └─ unspec-act-rel
   │  └─ unspec-rel
   │     ├─ fully-unspec-rel
   │     └─ unspec-act-rel
   └─ state-rel [THEME ref]
      ├─ loc-rel [GRND ref]
      ├─ th-only-rel
      └─ exp-theme-rel [EXP ref]

Based on the types for event and argument structure in (15) and (16), lexical representations for the verb classes we have discussed earlier can be formulated. The relations used in our formalizations are subsumed by the relations in (16).

In (17) the representation for forsvinde, 'disappear', a verb of disappearance, is shown.

(17) forsvinde, 'disappear'

[word
 S|L [CAT|HEAD verb
      CONT [resultative
            SEM-ARGS 1 ⊕ 2
            TEMP-REL [precede-cause-rel
                      SIT1 e1
                      SIT2 e2]
            SIT-STRUC ⟨[fully-unspec-rel
                        E-IND e1
                        SEM-ARGS 1 ⟨⟩],
                       [disappeared-rel
                        E-IND e2
                        THEME i
                        SEM-ARGS 2 ⟨[IDX i]⟩]⟩]]]

The meaning of forsvinde is that an unspecified process with no semantic roles results in the state of some theme entity being disappeared, disappeared-rel being a subtype of theme-only-rel.

(18) gives the lexical representation of blomstre in its internally caused interpretation.

(18) blomstre, 'bloom' (internally caused)

[word
 S|L [CAT|HEAD verb
      CONT [resultative
            SEM-ARGS 1 ⊕ 2
            TEMP-REL [precede-cause-rel
                      SIT1 e1
                      SIT2 e2]
            SIT-STRUC ⟨[fully-unspec-rel
                        E-IND e1
                        SEM-ARGS 1 ⟨⟩],
                       [in-bloom-rel
                        E-IND e2
                        THEME i
                        SEM-ARGS 2 ⟨[IDX i]⟩]⟩]]]

Again the meaning of blomstre involves an unspecified process leading to the state of a theme entity being in bloom.

In (19) we see the representation for vokse in its change of state interpretation.

(19) vokse, 'grow' (change-of-state reading)

[word
 S|L [CAT|HEAD verb
      CONT [resultative
            SEM-ARGS 1 ⊕ 2
            TEMP-REL [precede-cause-rel
                      SIT1 e1
                      SIT2 e2]
            SIT-STRUC ⟨[fully-unspec-rel
                        E-IND e1
                        SEM-ARGS 1 ⟨⟩],
                       [bigger-rel
                        E-IND e2
                        THEME i
                        SEM-ARGS 2 ⟨[IDX i]⟩]⟩]]]

Vokse means that an unspecified process leads to the state of a theme entity being bigger. We do not try to solve the problem of how the relative relation bigger should be represented.

In (20) the representation for the verb of emission lække is shown.

(20) lække, 'leak'

[word
 S|L [CAT|HEAD verb
      CONT [resultative
            SEM-ARGS 1 ⊕ 2
            TEMP-REL [precede-cause-rel
                      SIT1 e1
                      SIT2 e2]
            SIT-STRUC ⟨[unspec-rel
                        E-IND e1
                        SEM-ARGS 1 ⟨⟩],
                       [emitted-rel
                        E-IND e2
                        THEME i
                        SEM-ARGS 2 ⟨[IDX i]⟩]⟩]]]

The meaning is that an unspecified process results in the state of a theme entity being emitted. The emitted entity is restricted to be a fluid, but we will assume that this restriction must be stated in terms of selectional restrictions on the argument by the verb, rather than being reflected in event and argument structure.

(21) shows the representation of the spatial configuration verb ligge.

(21) ligge, 'lie'

[word
 S|L [CAT|HEAD verb
      CONT [simple-sit
            SEM-ARGS 1
            SIT-STRUC ⟨[lie-rel
                        THEME i
                        GRND j
                        SEM-ARGS 1 ⟨[IDX i], [IDX j]⟩]⟩]]]

The verb means that some theme entity is located in some place.

In (22) the representation of the spatial configuration verb sætte sig, which has an assume position meaning, is shown.

(22) sætte sig, 'sit down'

[word
 SS|LOC [CAT|HEAD verb
         CONT [resultative
               SEM-ARGS 1 ⊕ 2
               TEMP-REL [precede-cause-rel
                         SIT1 e1
                         SIT2 e2]
               SIT-STRUC ⟨[unspec-act-rel
                           E-IND e1
                           ACT i
                           SEM-ARGS 1 ⟨[IDX i]⟩],
                          [sit-rel
                           E-IND e2
                           THEME j
                           SEM-ARGS 2 ⟨[IDX j]⟩]⟩]]]

The meaning is that some unspecified process involving an actor results in a state of a theme being placed at a location. The actor and the theme have the same referent.

In (23) we show the lexical representation for the agentive manner of motion verb gå with a directed motion interpretation.

(23) gå, 'walk' (directional)

[word
 SS|LOC [CAT|HEAD verb
         CONT [resultative
               SEM-ARGS 1 ⊕ ⟨2⟩
               TEMP-REL [precede-cause-rel
                         SIT1 e1
                         SIT2 e2]
               SIT-STRUC ⟨[walk-rel
                           E-IND e1
                           ACT i
                           SEM-ARGS 1 ⟨[IDX i]⟩],
                          [loc-rel
                           E-IND e2
                           THEME i
                           GRND j
                           SEM-ARGS ⟨[IDX i], 2 [IDX j]⟩]⟩]]]

The meaning is that a walking process leads to the state of some theme entity being in some location. The actor and the theme have the same referent.

Finally we give an example of a representation for an agentive manner of motion verb with a locational rather than directional interpretation. As can be seen in (24), the event structure associated with this group of verbs is a non-resultative complex event. The complex event consists of a walking process subevent and a location subevent.

(24) gå, 'walk' (locational)

[word
 SS|LOC [CAT|HEAD verb
         CONT [non-resultative
               SEM-ARGS 1 ⊕ ⟨2⟩
               TEMP-REL [included-rel
                         SIT1 e1
                         SIT2 e2]
               SIT-STRUC ⟨[walk-rel
                           E-IND e1
                           ACT i
                           SEM-ARGS 1 ⟨[IDX i]⟩],
                          [loc-rel
                           E-IND e2
                           THEME i
                           GRND j
                           SEM-ARGS ⟨[IDX i], 2 [IDX j]⟩]⟩]]]

The meaning of the verb is that two subevents are involved simultaneously, a walking process involving an actor and a locational state involving the placement of a theme at a location. The actor and the theme have the same referent.

3.6 Constraint on there-insertion

In 3.1 we stated that an important function of the there-construction is to "locate" the logical subject referent at a place or in a state. The formalizations of the semantics of the verbs appearing in there-constructions show a generalization of the classes of verbs allowing there-insertion. The formal constraint on there-insertion in Danish can be seen in the rule in (25).4

(25) there-insertion-lexical-rule

IN:  [word
      SYNSEM|LOC [CAT [HEAD verb
                       SUBJ ⟨1 NP[indef]: 2 i⟩
                       COMPS 3]
                  CONT [situation
                        SIT-STRUC list ⊕ ⟨[state-rel
                                            THEME i]⟩ ⊕ list]]
      INFO-STRUC|TOPIC ⟨2⟩]

OUT: [word
      SYNSEM|LOCAL|CATEGORY [SUBJ ⟨der⟩
                             COMPS 3 ⊕ ⟨1⟩]
      INFO-STRUC|TOPIC ⟨⟩]

The function of this lexical rule is to produce a verb (and thereby a clause) without a topic. The input to the rule is a verb that has an indefinite subject; the subject position is coded as topic. The referent of the subject is semantically located in a state relation, i.e. it is the theme argument of a state relation. As (16) shows, state-rel subsumes both loc-rel and theme-only-rel, which are the relations involved in verbs allowing there-insertion. Der, 'there', is inserted on the SUBJ list5 and the logical subject is placed on the COMPS list. The output has no topic. Everything not explicitly mentioned in the rule is carried over unaltered from input to output.
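The effect of the rule in (25) can be rendered procedurally. The sketch below is our illustration, not the paper's HPSG machinery: plain dictionaries stand in for typed feature structures, feature names are abbreviated, and the indefiniteness and topic bookkeeping is simplified.

```python
# Sketch of the there-insertion lexical rule of (25) over simplified lexical
# entries. Input condition: an indefinite subject whose referent is the THEME
# of a state relation in the verb's event structure. Output: 'der' becomes
# the subject, the logical subject is demoted to COMPS, the topic is emptied.
# (Dict keys and the "state" flag are illustrative simplifications.)

def there_insertion(entry):
    """Apply the there-insertion rule; return None if it is inapplicable."""
    subj = entry["SUBJ"][0]
    if not subj["indef"]:
        return None
    # the subject referent must be located: THEME of some state-rel subevent
    # (state-rel subsumes loc-rel and theme-only-rel in the hierarchy (16))
    if not any(r["state"] and r.get("THEME") == subj["index"]
               for r in entry["SIT-STRUC"]):
        return None
    out = dict(entry)
    out["SUBJ"] = ["der"]
    out["COMPS"] = entry["COMPS"] + [subj]   # logical subject demoted
    out["TOPIC"] = []                        # the output clause has no topic
    return out

# ga 'walk' (locational), cf. (24): walk subevent plus locative state subevent
ga = {
    "SUBJ": [{"indef": True, "index": "i"}],
    "COMPS": [],
    "TOPIC": [{"index": "i"}],
    "SIT-STRUC": [
        {"type": "walk-rel", "state": False, "ACT": "i"},
        {"type": "loc-rel", "state": True, "THEME": "i", "GRND": "j"},
    ],
}
```

A verb like grine 'laugh', whose event structure contains no state relation locating the subject referent, fails the check and the rule correctly does not apply.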

As der has no referential index, it is unable to occur as a 'normal' argument. Note that the lexical rule in (25) does not preclude transitive verbs, cf. Bjerre and Bjerre (Forthcoming).

4 Cf. Kuthy and Meurers (2003) for details concerning the feature INFO-STRUC.
5 In German, the corresponding element es is not a subject, cf. e.g. Platzack (1983), so in Linearization-based HPSG (e.g. Kathol (2000)) the there-insertion lexical rule for German would insert es not on the SUBJ list but directly in the F slot.


3.7 Conclusion

We have shown that the distribution of verbs in Danish there-constructions is determined by a "locative" constraint, meaning that the function of the there-construction is to state the existence or appearance of the logical subject referent at some location or in some state. In order for the constraint to account for the verbs appearing in there-constructions, including the agentive manner of motion verbs, a non-resultative complex event structure was proposed for the motion verbs. This event structure represents both the unaccusative existence or appearance meaning and the agentive meaning of the agentive manner of motion verbs. Support for the non-resultative complex event structure was provided by the behaviour of these verbs in Danish pseudo-coordinations. Given an analysis with a non-resultative complex situation with a locative subevent, we accounted for their agentive meaning, but also both their appearance in pseudo-coordinations and their appearance in there-constructions. Whereas in pseudo-coordinations the locative structure could be said to locate the following event, in there-constructions, it was shown to locate the logical subject referent.

References

Bjerre, Anne and Tavs Bjerre. 2007a. Perfect and periphrastic passive constructions in Danish. Nordic Journal of Linguistics 30(1):5–53.

Bjerre, Anne and Tavs Bjerre. 2007b. Pseudocoordination in Danish. In Proceedings of the 14th International Conference on Head-Driven Phrase Structure Grammar, pages 6–24.

Bjerre, Anne and Tavs Bjerre. Forthcoming. Danish there-constructions with transitive verbs. In Proceedings of the 15th International Conference on Head-Driven Phrase Structure Grammar.

Bjerre, Tavs. 2003. Syntactically and Semantically Complex Predicates. Ph.D. thesis, University of Southern Denmark.

Börjars, Kersti and Nigel Vincent. 2005. Position vs. Function in Scandinavian Presentational Constructions. In M. Butt and T. Holloway, eds., Proceedings of the LFG05 Conference.

(Displaced figure, footnote 5: the there-insertion lexical rule for German.
there-insertion-lexical-rule-German
IN: word with DOM ⟨[v]⟩ and SYNSEM | LOC | CAT | HEAD verb;
OUT: word with DOM ⟨[F [PHON ⟨es⟩]], [v]⟩.)

REFERENCES / 21

Brandt, Søren. 1992. Two problems in Danish verb syntax. Nordic Journal of Linguistics 15:47–64.

Bresnan, Joan. 1993. Locative Inversion and the Architecture of UG. Master's thesis, Stanford University.

Brink, Lars. 1997. Den danske der-konstruktion [The Danish there-construction]. Danske Studier, pages 32–83.

Burzio, Luigi. 1986. Italian Syntax: A Government-Binding Approach. D. Reidel Publishing Company.

Bybee, Joan, Revere Perkins, and William Pagliuca. 1994. The Evolution of Grammar: Tense, Aspect, and Modality in the Languages of the World. University of Chicago Press.

Davis, Anthony. 2001. Linking by Types in the Hierarchical Lexicon. Stanford: CSLI Publications.

Diderichsen, Paul. 1946. Elementær Dansk Grammatik. København: Gyldendal.

Dowty, David. 1979. Word Meaning and Montague Grammar. Dordrecht: Reidel.

Dowty, David. 1991. Thematic proto-roles and argument selection. Language 67(3):547–619.

Hansen, Aage. 1967. Moderne Dansk [Modern Danish]. Grafisk Forlag.

Hoekstra, Teun and René Mulder. 1990. Unergatives as copular verbs: Locational and existential predication. The Linguistic Review 7.

Jensen, Per Anker. 1985. Principper for grammatisk analyse [Principles for grammatical analysis]. København: Nyt Nordisk Forlag Arnold Busk.

Jørgensen, Henrik. 2001. Nogle bemærkninger om aspekt i dansk [Some remarks on aspect in Danish]. In C. Madsen, H. Skov, and P. E. Sørensen, eds., Jeget og ordene, pages 115–136. Århus: Klim and Institute for Nordic Languages and Literature.

Kathol, Andreas. 2000. Linear Syntax. Oxford University Press.

Kimball, John P. 1973. The grammar of existence. In Papers from the Ninth Regional Meeting, Chicago Linguistic Society, pages 262–70. Chicago Linguistic Society, University of Chicago.

Kirsner, R. S. 1973. Natural focus and agentive interpretation: On semantics of Dutch expletive er. Stanford Occasional Papers in Linguistics, pages 101–114.

Kuthy, Kordula De and Detmar Meurers. 2003. The secret life of focus exponents, and what it tells us about fronted verbal projections. In Proceedings from the HPSG03 Conference, pages 88–96.


Lakoff, George. 1965. On the Nature of Syntactic Irregularity. Ph.D. thesis, Indiana University.

Levin, B. and M. R. Hovav. 1995. Unaccusativity: At the Syntax–Lexical Semantics Interface. Cambridge, Massachusetts: MIT Press.

Lødrup, Helge. 2000. Linking and optimality in the Norwegian presentational focus construction. Nordic Journal of Linguistics 22(2):205–230.

Maling, Joan. 1987. Existential sentences in Swedish and Icelandic: reference to thematic roles. Working Papers in Scandinavian Syntax 28.

Mateu, Jaume and Laia Amadas. 1999. Extended argument structure: Progressive as unaccusative. CatWPL 7:159–174.

McCawley, James D. 1968. Lexical insertion in a transformational grammar without deep structure. CLS 4:71–80.

Milsark, Gary L. 1979. Existential Sentences in English. Garland Publishing, Inc.

Oxenvad, Erik. 1934. Om nogle upersonlige konstruktioner i dansk [On certain impersonal constructions in Danish]. In Studier tilegnede Verner Dahlerup.

Perlmutter, David M. 1978. Impersonal passives and the unaccusative hypothesis. BLS 4:157–189.

Platzack, Christer. 1983. Existential sentences in English, German, Icelandic and Swedish. In Papers from the 7th Scandinavian Conference of Linguistics, pages 80–100. University of Helsinki.

Sorace, Antonella. 2004. Gradience at the lexicon–syntax interface. In A. Alexiadou, E. Anagnostopoulou, and M. Everaert, eds., The Unaccusativity Puzzle, pages 243–268. Oxford University Press.

Sveen, Andreas. 1996. Norwegian Personal Actives and the Unaccusative Hypothesis. Ph.D. thesis, University of Oslo.

Vendler, Zeno. 1957. Verbs and times. Reprinted in Z. Vendler, Linguistics in Philosophy, pages 97–121. New York: Cornell University Press, 1967.

Zaenen, Annie. 1993. Integrating syntax and lexical semantics. In J. Pustejovsky, ed., Semantics and the Lexicon, pages 129–161. Kluwer Academic Publishers.

4

A Landscape of Logics for Finite Unordered Unranked Trees

STEPHAN KEPSER

Abstract

In this paper, we draw a landscape of the expressive power of diverse logics over finite unordered unranked trees. A tree is unordered iff for each node there is no order on its children. A tree is unranked iff for each node the number of its children is independent of its label. We compare here the expressive power of logics from three non-disjoint areas: logics related to automata theory, logics from descriptive complexity theory, and second-order logics. Several of these logics form natural hierarchies of expressive power. We will show several separation results in these hierarchies, thus showing that the hierarchies are mostly proper. We also show that the automata logics are incomparable to the logics from descriptive complexity theory.

Keywords TREE LANGUAGES, TREE AUTOMATA, TREE LOGICS

4.1 Introduction

In this paper, we consider finite labelled unordered unranked trees. A tree is called ordered iff for each node there is a linear order on the children of this node. A tree is called unordered iff for each node there is no order on its children. The two notions are not complementary, but partially ordered trees have so far not attracted any research interest.

A tree is ranked iff for each node the number of its children is a function of its label. More generally, a ranking assigns to each label a finite set of natural numbers. Each member of the set is a potential number of child nodes. We consider in this paper the unranked case. That means each node may have an arbitrary, but finite, number of children, independent of the label it bears.

Finite unordered unranked trees have many applications in computer science. The one that is probably best known comes from semi-structured


FG-2008. Philippe de Groote (Ed.). Copyright © 2008, CSLI Publications.

24 / STEPHAN KEPSER

database theory. Unordered unranked trees provide the so-called database model of XML (Abiteboul et al., 2000). Unordered unranked trees also have applications in computational linguistics. They can be seen as the underlying data structures of dependency treebanks. The dependency structure usually forms a tree. There exists an ordering on the word level, but this order is not relevant for the dependency structure. Thus the trees are unordered. This forms the main motivation for our work: we intend to investigate the expressive power of diverse logics as query languages for dependency treebanks.

In this paper we study a large number of logics to define languages of unordered unranked trees and compare their expressive power. Generally speaking, the logics we consider stem from three non-disjoint areas: logics related to automata theory, logics discussed in descriptive complexity theory, and second-order logics. The basic logic from automata theory is monadic second-order logic. Two extensions of this logic will also be discussed. From the area of descriptive complexity theory we consider

- deterministic transitive closure logic,
- transitive closure logic,
- least or inflationary fixed-point logic,
- partial fixed-point logic, and
- infinitary logic with finitely many variables.

We also discuss full second-order logic, its restriction to pure existential quantification of second-order variables, and its extension by second-order transitive closure.

Several of these logics form natural hierarchies of expressive power. This is true for the automata logics, the logics from descriptive complexity theory, and second-order logics. We will show numerous separation results in these hierarchies, thus showing that the hierarchies are mostly proper. We also show that the automata logics are incomparable to the logics from descriptive complexity theory.

This paper is organised as follows. After the definition of finite unordered unranked trees in the preliminaries, we briefly recall the definitions of all logics of this paper in Section 4.3. Section 4.4 provides two simple results to start with. Section 4.5 contains the separation of automata logics from fixed-point logics. How to separate the fixed-point logics from second-order logics is shown in Section 4.6. We close the paper with an overview of the results obtained in Section 4.7. For comparison, we added an appendix containing a description of the situation for finite ordered ranked trees.

Due to restrictions of space, most formal definitions and some proofs had to be omitted from this paper. A technical report containing all definitions and proofs is obtainable from the author.

LOGICS FOR FINITE TREES / 25

4.2 Preliminaries

We consider node-labelled finite unordered unranked trees. A tree is a finite digraph with a distinguished node, the root, and the property that for every node there is a unique path from the root to this node. We also assume a finite set Λ of node labels.

Formally, a tree is given by a triple (V, E, λ) where V is a finite, non-empty set of vertices or nodes, E ⊆ V × V is a finite set of edges, and λ is a mapping from V to Λ. Moreover, there is an r ∈ V, the root, such that for each node v ∈ V there are n ∈ N and nodes v0, v1, ..., vn ∈ V with r = v0, vn = v and (vi, vi+1) ∈ E for all 0 ≤ i < n (existence of a path from the root to every node). Finally, for all v, v′ ∈ V, if there are n, m ∈ N, nodes v0, v1, ..., vn ∈ V and u0, u1, ..., um ∈ V with v = v0 = u0, vn = um = v′ and (vi, vi+1) ∈ E for 0 ≤ i < n and (uj, uj+1) ∈ E for 0 ≤ j < m, then n = m and vi = ui for all 0 ≤ i ≤ n (uniqueness of paths).

A tree language is a set of trees.

Similar to ordered trees, unordered trees can also be defined as terms. This way of formalising them is useful in the discussion concerning automata-related logics. We provide it here as an equivalent alternative to the definition above. Let M be a set. A multi-set is a function f : M → N stating for each element of M its multiplicity. For a sequence m1, ..., mk ∈ M of not necessarily different elements from M we denote by {m1, ..., mk} its multi-set. A multi-set can also be seen as an unordered sequence.

Based on multi-sets, unordered unranked trees for a given signature Λ are defined as follows. Each L ∈ Λ is an unordered unranked tree. If t1, ..., tk are unordered unranked trees and L ∈ Λ, then L{t1, ..., tk} is an unordered unranked tree. The multi-set union is denoted by ⊎. M ⊆mfin N means that M is a finite sub-multiset of N (where N may also be a set).
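As an illustration (our example, not from the paper), the term notation renders a root labelled f with two g-children, one of which has an h-child, as follows; since the braces enclose a multi-set, the order of the arguments is immaterial:

```latex
% Example term over hypothetical labels \Lambda = \{f, g, h\}:
f\{\, g\{h\},\; g \,\} \;=\; f\{\, g,\; g\{h\} \,\}
% Both sides denote the same unordered unranked tree, because
% \{g\{h\}, g\} is a multi-set, not a sequence.
```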

4.3 The Logics

The basic logic we consider is first-order logic (denoted FO). From the point of view of logic, trees are particular finite first-order structures. With every tree (V, E, λ) we associate a first-order structure (V, E, (L)L∈Λ) such that L(v) iff λ(v) = L for every v ∈ V. Hence we use the following atomic formulae: E(x, y) denotes the directed edge from x (parent) to y (child), and L(x) expresses that node x is labelled with L ∈ Λ.

4.3.1 Automata Related Logics

The logics in this section are logics defined to be equivalent to certain types of tree automata. In contrast to the case of ordered ranked trees, differences in the definition of tree automata lead to differences in expressive power. The automata and logic definitions that follow are taken from (Boneva and Talbot, 2005) and (Seidl et al., 2003).

Monadic second-order logic (MSO) is the extension of first-order logic by set variables and quantification over sets.

The logic Counting MSO, defined by Courcelle (1990), denoted CMSO, is an extension of MSO by predicates that allow modulo counting of sets. The syntax of MSO is extended by atomic formulae Mod^i_j(X) where X is a set variable and i, j ∈ N, j < i. The formula Mod^i_j(X) is true iff X has j elements modulo i.

Seidl, Schwentick, and Muscholl (2003) propose another, yet more powerful, extension of MSO, namely Presburger MSO (denoted PMSO). The name Presburger refers to the fact that for an arbitrary node, subsets of its child nodes can be restricted by constraints expressed in Presburger arithmetic. An example would be to state that there are twice as many child nodes labelled L1 as children labelled L2.

The syntax of PMSO is given by the following grammar (quoted from (Seidl et al., 2003)):

f ::= E(x, x) | x ∈ S | x/p | f ∧ f | ¬f | ∃x.f | ∃X.f
S ::= X | L
p ::= t = t | t + t = t | p ∧ p | ¬p | ∃y.p
t ::= [S] | y | n

Here f is a PMSO formula, S is a set, p is a Presburger constraint, and t is a term. x ∈ X0 is a first-order variable, X ∈ X1 is a set variable. y ∈ Y is a first-order Presburger variable, Y ∩ X0 = ∅. L ∈ Λ is a node label. The formulae p of x/p are Presburger-closed, i.e., do not contain free variables from Y. Intuitively, the assertion x/p means that the children of x satisfy constraint p, where a term [S] inside p is interpreted as the number of those children of x which are contained in S. Arithmetic expressions have their natural semantics.
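For instance, the informal example above ("twice as many children labelled L1 as children labelled L2") can be rendered in this grammar roughly as follows (our sketch, not a formula from Seidl et al.), using the production t + t = t and treating ∀ as the usual abbreviation for ¬∃¬:

```latex
% Sketch of a PMSO formula stating the constraint at every node x:
\forall x.\; x \,/\, \bigl( [L_2] + [L_2] = [L_1] \bigr)
% The term [L_i] counts the children of x that are labelled L_i.
```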

Seidl et al. (2003) also provide an automaton model for PMSO, namely Presburger tree automata (PTA). We explain this automaton model here because we will use it in subsequent proofs.

Given a finite set Q of states, we consider the canonical set YQ of variables which are indexed by elements in Q, i.e., YQ = {yq | q ∈ Q}. A Presburger tree automaton is a quadruple A = (Q, Λ, δ, F) where

- Q is a finite set of states,
- F ⊆ Q is the set of accepting states,
- Λ is the set of node labels, and
- δ maps pairs (q, L) of states and labels to Presburger constraints with free variables from the set YQ.

The formula ϕ = δ(q, L) represents the pre-condition on the children of a node labelled by L for the transition into state q, where the possible values of the variables yp represent the admissible multiplicities of the state p on the children. We introduce a satisfaction relation t |=A q between a tree t and a state q as follows. Assume that t = L({t1, ..., tk}) and δ(q, L) = ϕ. Then t |=A q iff there are k cardinalities nj and k states pj ∈ Q such that

- tj |=A pj for 1 ≤ j ≤ k, and
- {y_{pj} ↦ nj | 1 ≤ j ≤ k} |= ϕ.

The language L(A) of unordered unranked trees which is accepted by the automaton A is given by

L(A) = {t | ∃q ∈ F : t |=A q}.

A tree language L is PMSO-definable iff it is accepted by some Presburger tree automaton (Seidl et al., 2003).

We also consider a subclass of Presburger constraints, namely unary ordering constraints. An ordering constraint is defined as

p ::= t ≤ t | p ∧ p | ¬p
t ::= y | n | t + t

There is no existential quantification. An atomic constraint t ≤ t′ is called unary iff it contains only one variable (but potentially several occurrences of this one variable). A Presburger ordering constraint is called unary iff all its atomic constraints are unary. Note that a unary constraint may contain several different variables as long as each of its atomic constraints contains only one variable.

A Presburger tree automaton over unary ordering constraints is called a unary ordering PTA. Boneva and Talbot (2005) showed that a tree language L is MSO-definable iff there exists a unary ordering PTA that accepts L.

On the basis of results by Boneva and Talbot (2005), Courcelle (1990), and Seidl et al. (2003), the following is known about the expressive power of the different automata logics over unordered unranked trees. Here and in the following, an inclusion A ⊆ B means that every tree language definable in logic A is also definable in logic B. A proper inclusion A ⊊ B indicates that there exist tree languages definable in B which are undefinable in A.

FO ⊊ MSO ⊊ CMSO ⊊ PMSO.

4.3.2 Transitive-Closure Logics

Transitive closure logic is the extension of FO by transitive closure operators. This extension is sensible because FO is known to be incapable of expressing transitive closures. Formally, let k ∈ N and R a binary relation over k-tuples (R ⊆ M^k × M^k). Then

TC(R) := ⋂ { W | R ⊆ W ⊆ M^k × M^k, ∀ x̄, ȳ, z̄ ∈ M^k : (x̄, ȳ), (ȳ, z̄) ∈ W ⇒ (x̄, z̄) ∈ W }.

Deterministic transitive closure is the transitive closure of a deterministic, i.e., functional relation. For an arbitrary binary relation R over k-tuples we define its deterministic reduct by R^D := {(x̄, ȳ) ∈ R | ∀z̄ : (x̄, z̄) ∈ R ⇒ ȳ = z̄}. Now DTC(R) := TC(R^D).
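To make the two operators concrete, here is a small sketch (ours, not part of the paper) that computes TC and DTC for a finite binary relation represented as a set of pairs; relations over k-tuples work the same way with tuples as elements:

```python
# Sketch: TC and DTC of a finite binary relation, mirroring the
# set-theoretic definitions above.

def tc(rel):
    """Transitive closure of a finite binary relation (set of pairs)."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

def dtc(rel):
    """Deterministic transitive closure: TC of the deterministic reduct
    R^D, which keeps only pairs (x, y) where x has no other successor."""
    reduct = {(x, y) for (x, y) in rel
              if all(z == y for (x2, z) in rel if x2 == x)}
    return tc(reduct)
```

On a functional relation, tc and dtc coincide; once a node has two successors, all its pairs drop out of the reduct.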

The formulae of TC are defined by adding to first-order logic the transitive closure operator (TC):

If ϕ is a TC formula, x̄ = x1, ..., xn and ȳ = y1, ..., yn are a subset of the free variables of ϕ such that xi ≠ yj for all i, j, and s̄ = s1, ..., sn and t̄ = t1, ..., tn are terms, then [TC_{x̄,ȳ} ϕ](s̄, t̄) is a TC formula.

For DTC we add the deterministic transitive closure operator: if ϕ is a DTC formula, then [DTC_{x̄,ȳ} ϕ](s̄, t̄) is a DTC formula.

A predicate of the form [TC_{x̄,ȳ} ϕ] ([DTC_{x̄,ȳ} ϕ]) is supposed to denote the (deterministic) transitive closure of the relation defined by ϕ.

We also consider the special case where the transitive closure is restricted to binary relations, i.e., the tuple size is 1. These logics are denoted MTC (where M stands for monadic) and MDTC.

We just mention in passing that for every formula in DTC there exists an equivalent formula in TC (see, e.g., (Immerman, 1999)).

4.3.3 Fixed-Point Logics and Infinitary Logics

The concept of adding transitive closure operators to FO can be generalised to adding fixed-point operators. Indeed, the transitive closure is a particularly simple type of fixed-point operator. In this paper, we will consider least fixed-points, inflationary fixed-points, and partial fixed-points. More explanation of these logics can be found in (Ebbinghaus and Flum, 1995, Immerman, 1999, Libkin, 2004).

Let M be a set. An operator on M is a mapping F : ℘(M) → ℘(M). An operator F is called monotone if X ⊆ Y implies F(X) ⊆ F(Y), and inflationary if X ⊆ F(X) for all X ∈ ℘(M). Monotone operators are known to have least fixed-points (Tarski–Knaster Theorem). For F : ℘(M) → ℘(M) monotone we define LFP(F) = ⋂ {X | X = F(X)}.

Inflationary operators also have fixed-points. This fact is used to transform an arbitrary operator G into a fixed-point operator by making it inflationary: simply set G_infl(X) = X ∪ G(X). Now for X0 = ∅ and X_{i+1} = Xi ∪ G(Xi) set IFP(G) = ⋃_{i≥0} Xi.

Finally, consider an arbitrary operator F : ℘(M) → ℘(M) and the sequence X0 = ∅ and X_{i+1} = F(Xi). This sequence need not be inflationary; hence it need not have a fixed-point. We therefore define the partial fixed-point of F as PFP(F) = Xn if Xn = X_{n+1}, and PFP(F) = ∅ if Xn ≠ X_{n+1} for all n ≤ 2^{|M|}.

These operators will now be added to FO in the following way. Let R be a relational variable of arity k. For each tree t = (V, E, λ) the formula ϕ(R, x̄) where |x̄| = k gives rise to an operator Fϕ : ℘(V^k) → ℘(V^k) defined as

Fϕ(X) = {v̄ | t |= ϕ(X/R, v̄)}.

Now let ϕ(R, x̄) be a formula where |x̄| = |t̄| = k. Then [IFP_{R,x̄} ϕ(R, x̄)](t̄) is a formula of IFP, [LFP_{R,x̄} ϕ(R, x̄)](t̄) is a formula of LFP (assuming R to be positive in ϕ), and [PFP_{R,x̄} ϕ(R, x̄)](t̄) is a formula of PFP. Note that Gurevich and Shelah (1986) showed IFP = LFP.
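As a concrete illustration (our sketch, not the paper's), the IFP and PFP constructions can be run directly on operators over subsets of a finite universe:

```python
# Sketch: inflationary and partial fixed-points of set operators.

def ifp(g):
    """Inflationary fixed-point: X0 = {}, X_{i+1} = X_i ∪ G(X_i).
    On a finite universe the chain is increasing, so it stabilises."""
    x = frozenset()
    while True:
        nxt = x | frozenset(g(x))
        if nxt == x:
            return x
        x = nxt

def pfp(f, universe):
    """Partial fixed-point: iterate X_{i+1} = F(X_i) from {}; return the
    fixed-point if reached within 2^|universe| steps, else the empty set."""
    x = frozenset()
    for _ in range(2 ** len(universe)):
        nxt = frozenset(f(x))
        if nxt == x:
            return x
        x = nxt
    return frozenset()
```

For example, with universe M = {0, 1, 2, 3} and G(X) = {0} ∪ {i+1 | i ∈ X, i+1 ∈ M}, ifp computes all of M, while the operator F(X) = M \ X never stabilises, so pfp returns the empty set.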

The infinitary logic L_∞ω is the extension of FO by arbitrary infinite disjunctions and conjunctions: if Ψ is a set of formulae, then ⋁Ψ and ⋀Ψ are formulae. Because L_∞ω is known to be much too powerful, we are interested here in a particular sublogic of L_∞ω, namely one in which each formula contains only finitely many different variables.

The class of L_∞ω formulae that use at most k distinct variables will be denoted L^k_∞ω. And the finite variable infinitary logic L^ω_∞ω is defined by

L^ω_∞ω = ⋃_{k∈N} L^k_∞ω.

This logic is interesting because it comprises the fixed-point logics LFP, IFP, and PFP, i.e., every class of finite structures definable in one of these logics is definable in L^ω_∞ω. The following inclusions hold for the logics defined in the last two subsections on finite unordered unranked trees; they are a consequence of the definitions of the logics.

DTC ⊆ TC ⊆ LFP ⊆ PFP ⊆ L^ω_∞ω
MDTC ⊆ MTC ⊆ MLFP, with MDTC ⊆ DTC, MTC ⊆ TC, and MLFP ⊆ LFP.

Furthermore, Dawar et al. (1995) showed that LFP ⊊ L^ω_∞ω.

4.3.4 Second-Order Logics

In this section we introduce three variants of second-order logic. Full second-order logic (denoted SO) is the extension of FO by arbitrary relation variables and arbitrary (second-order) quantification over these variables.

Existential second-order logic (ESO) is a restriction of SO. In ESO all second-order variables are globally existentially quantified; they are not involved in any quantifier alternation. Hence an ESO formula consists of a prefix of existential second-order quantifiers followed by an FO formula with SO variables but without SO quantification.

The third logic of this section is SO with second-order transitive closure, denoted SO(TC). It was introduced by Immerman (1999) as a logic that strongly captures PSPACE, i.e., the logic and the complexity class have the same expressive power on arbitrary (finite) structures, not just ordered structures. We will only make use of this logic as a logic that strongly captures PSPACE. Hence we will not give a full definition, but rather refer the interested reader to the given reference.

Second-order logics are certainly full logics in their own right, but they also have a strong connection to complexity theory. Indeed, descriptive complexity theory was initiated by Fagin's result showing that ESO strongly captures NPTIME (Fagin, 1975). The logic SO strongly captures PH, the polynomial hierarchy, and SO(TC) strongly captures PSPACE (see, e.g., (Immerman, 1999)).

It follows immediately from the definitions that

FO ⊊ ESO ⊆ SO ⊆ SO(TC).

Whether any of these inclusions is strict is a famous open problem in complexity theory.

4.3.5 Overview

We close this section with an overview of what is known about the expressive power of the different logics defined above on finite unordered unranked trees (see Figure 1).

Let us explain those parts of Figure 1 that have not yet been justified, proceeding from bottom to top.

MLFP ⊆ MSO: Every monadic least fixed-point is expressible in MSO. See (Ebbinghaus and Flum, 1995).

TC ⊊ SO(TC): On ordered structures, TC captures NLOGSPACE. The proof of this theorem also shows that TC ⊆ NLOGSPACE on arbitrary structures. Since SO(TC) strongly captures PSPACE = NPSPACE, the proper inclusion follows from the space hierarchy theorem.

PMSO ⊆ ESO: Seidl et al. (2003) show that any PMSO-definable tree language is recognised in (deterministic) linear time. Since ESO strongly captures NPTIME, the inclusion follows.

LFP ⊆ ESO: On ordered structures, LFP captures PTIME. The proof of this theorem also shows that LFP ⊆ PTIME on arbitrary structures. Since ESO strongly captures NPTIME, the inclusion follows.

4.4 Two Initial Results

We start with two smaller results. The first one states that even the weakest logic extending FO, namely MDTC, is strictly more powerful than FO.

Theorem 1 The logic MDTC is strictly more powerful than FO over unordered unranked trees.


[Figure 1: a diagram of the inclusions, with FO at the bottom; above it MDTC, then MTC and DTC, then MLFP and TC, then MSO, CMSO, and PMSO on the automata side and LFP and PFP on the descriptive-complexity side, and ESO, SO, SO(TC), and L^ω_∞ω at the top.]

FIGURE 1 Logics for finite unordered unranked trees: the base. ⊃— indicates a proper inclusion.

A tree language undefinable in FO but definable in MDTC is the language of trees in which each leaf node is at an even depth. The second result concerns the expressive power of MSO and MLFP.
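One hedged way to write such a definition (our sketch, not a formula from the paper) uses the "grandparent" step, which is deterministic because every node has at most one grandparent, so its DTC reaches exactly the even-distance ancestors:

```latex
% Sketch of an MDTC definition of "every leaf is at even depth".
% Root(x) := \neg\exists y\, E(y,x) and Leaf(x) := \neg\exists y\, E(x,y).
\forall x\, \forall r\;
  \bigl( \mathit{Leaf}(x) \wedge \mathit{Root}(r) \bigr) \rightarrow
  \Bigl( x = r \;\vee\;
    \bigl[\mathrm{DTC}_{y,z}\; \exists u\, (E(z,u) \wedge E(u,y))\bigr](x, r)
  \Bigr)
```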

Theorem 2 The logics MSO and MLFP have the same expressive power over unordered unranked trees.

It can be shown that an accepting run of a unary ordering PTA can be logically rendered in MLFP. This proves the more important direction of the theorem. That MSO can express monadic least fixed-points was mentioned just above.

4.5 Separating Automata Logics and Fixed-Point Logics

The aim of this section is to separate the automata logics from the transitive closure and fixed-point logics. This is done in two parts. In the first we present a tree language that is DTC-definable but not PMSO-definable. In the second we present a tree language that is MSO-definable but not TC-definable.


4.5.1 A DTC-Definable Tree Language

In this section we present a tree language which is DTC-definable but not PMSO-definable (and therefore neither CMSO-definable nor MSO-definable). It is a variation of a tree language defined by Tiede and Kepser (2006). We have the node labels f and g, where f labels the root and g is the label of all other nodes. The language is defined as L1 = { f{g^n, g^n} | n ∈ N+ }. It is the language of two g-chains of equal length below the root.

The language L1 is definable in DTC as follows. Let

Root(x) := ¬∃y E(y, x)

define the root of a tree and

Leaf(x) := ¬∃y E(x, y)

define a leaf in the tree. The formula

OneCh(x) := ∃y (E(x, y) ∧ ∀z (E(x, z) → z = y))

expresses that node x has exactly one child. Consider the following predicate P:

P := [DTC_{(y1,y3),(y2,y4)} E(y1, y2) ∧ E(y3, y4)]

which states that y2 is at the same distance from y1 as y4 from y3. Let ϕ(x1, x2) be the formula

∀y1, y2 P(x1, x2, y1, y2) → ( g(y1) ∧ g(y2) ∧ ((Leaf(y1) ∧ Leaf(y2)) ∨ (OneCh(y1) ∧ OneCh(y2))) )

expressing that if y1 is at the same distance from x1 as y2 from x2, then both are labelled with g and either both are leaves or both have exactly one child. Now the tree language is given by

∃r, x1, x2 Root(r) ∧ f(r) ∧ E(r, x1) ∧ E(r, x2) ∧ g(x1) ∧ g(x2) ∧ x1 ≠ x2 ∧ ∀z (E(r, z) → (z = x1 ∨ z = x2)) ∧ ϕ(x1, x2)

The formula says that r is the root, labelled f, that r has exactly two children x1 and x2, both labelled g, and that ϕ holds for x1 and x2.

It is known that this tree language is not MSO-definable. It can be shown that it is not even PMSO-definable. The proof method is a variant of the proof of the pumping lemma for recognisable tree languages, adapted to unordered unranked trees and PTA.

Proposition 3 The tree language L1 is DTC-definable, but is not PMSO-definable.

PROOF. Suppose A = (Q, Λ, δ, F) is a Presburger tree automaton accepting L1 and k = |Q| is the number of states. Let m > k. Consider the tree t = f{g^m, g^m} ∈ L1 and in particular its subtree g^m. Since m > k, there must be a tree t′ = g^{l1}, a non-empty context C = g^{l2}{•}, a context C′ = g^{l3}{•}, and a state q ∈ Q such that l1 + l2 + l3 = m, g^m = C′{C{t′}}, and both the root of t′ and the root of C{t′} receive state q in an accepting run for t.

Therefore u = f{g^m, C′{C{C{t′}}}} is accepted by A, because both C{t′} and C{C{t′}} receive state q in an accepting run. But u ∉ L1. 2

FIGURE 2 Separating automata logics from fixed-point logics.

The results of this subsection are depicted in Figure 2. Logics in the green area are capable of defining L1, whereas logics in the red area are not.

Theorem 4 The following inclusions are strict.

- MDTC is strictly less powerful than DTC.
- MTC is strictly less powerful than TC.
- MLFP is strictly less powerful than LFP.
- PMSO is strictly less powerful than ESO.

4.5.2 An MSO-Definable Tree Language

Consider the following tree language, originally defined in (Ebbinghaus and Flum, 1995) as a class of finite graphs. All leaves are labelled either with 0 or 1. All internal nodes are labelled with B for blank, some void node label that is there only because we demand all nodes to be labelled. The leaf labels 0 and 1 are interpreted as false and true (respectively). Internal nodes function as gates: they are set to true iff exactly one child node is set to false. We consider the class of trees whose root node is evaluated to true.

Formally, we define two tree languages inductively as follows. Let Λ = {0, 1, B} be the set of labels. The tree languages L2 and L3 are the smallest sets such that

0 ∈ L3
1 ∈ L2
B(L′) ∈ L3 where L′ ⊆mfin L2
B({t} ⊎ L′) ∈ L2 where t ∈ L3 and L′ ⊆mfin L2
B({t, t′} ⊎ L′ ⊎ L′′) ∈ L3 where t, t′ ∈ L3, L′ ⊆mfin L3, and L′′ ⊆mfin L2.

The tree language L2 is recognised by the following Presburger tree automaton A = ({qt, qf}, Λ, δ, {qt}) where

δ(0, qf) = true    δ(0, qt) = false
δ(1, qf) = false   δ(1, qt) = true
δ(B, qf) = (y_qf = 0 ∨ y_qf ≥ 2)    δ(B, qt) = (y_qf = 1)

Hence L2 is PMSO-definable. A close inspection of δ reveals that all constraints in the transitions are unary ordering constraints. Hence L2 is even MSO-definable.
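As a quick sanity check (our sketch, not part of the paper), the gate semantics behind L2 can be evaluated bottom-up: an internal B-node is true iff exactly one of its children is false, which is exactly what the constraint y_qf = 1 in δ(B, qt) encodes:

```python
# Sketch: bottom-up evaluation of the gate trees defining L2.
# A tree is a leaf value 0 or 1, or a list of child trees (a B-labelled
# internal node); child order is irrelevant, matching unordered trees.

def evaluates_true(tree):
    """True iff the tree belongs to L2 under the gate semantics:
    leaves 0/1 are false/true; an internal node is true iff
    exactly one of its children evaluates to false."""
    if tree == 0:
        return False
    if tree == 1:
        return True
    false_children = sum(1 for child in tree if not evaluates_true(child))
    return false_children == 1
```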

Proposition 5 There exists a tree language which is MSO-definable, but is not TC-definable.

The proof that the tree language L2 is not TC-definable is an application of results by Grohe (1994), reported in (Ebbinghaus and Flum, 1995, Chap. 7.6.3).

The results of this subsection are depicted in Figure 3. Logics in the green area are capable of defining L2, whereas logics in the red area are not.

Theorem 6 The following inclusions are strict.

- MTC is strictly less powerful than MLFP and MSO.
- TC is strictly less powerful than LFP.

Theorem 7 The logics (P)MSO and TC are incomparable over the class of finite unordered unranked trees.

FIGURE 3 Separating automata logics from fixed-point logics.

4.6 Separating Fixed-Point Logics and Second-Order Logics

The main result of this section is that there is a tree language definable in CMSO that is not L^ω_∞ω-definable. We use the well-known fact that L^ω_∞ω is not particularly good at counting.

Let Λ = {A}. Define the tree language L4 = {(V, E, λ) | |V| = 2n for some n ∈ N} as the set of all trees with an even number of nodes (where each node is labelled with A). We first show that L4 is CMSO-definable. The following formula defines L4:

∃X (∀x. x ∈ X ∧ Mod^2_0(X))

We will next show that L4 is not L^ω_∞ω-definable, using infinite pebble games. For the definition of this type of game, the reader is referred to, e.g., (Libkin, 2004, Chap. 11.2). For a natural number k, define A_k to be the tree consisting of an A-labelled root with k A-labelled leaves as children, and B_k to be the tree consisting of an A-labelled root with k+1 A-labelled leaves as children.

If k is even, then A_k has an odd number of nodes while B_k has an even number of nodes. If k is odd, then A_k has an even number of nodes while B_k has an odd number of nodes.

Lemma 8 The duplicator has a winning strategy for the infinite pebble game PG^∞_k(A_k, B_k) for every k ∈ N.

PROOF. Let (a1, ..., ak) ↦ (b1, ..., bk) be a partial isomorphism between A_k and B_k. We assume no two pebbles are ever placed on the same node, because doing so leads to a game with fewer than k pebbles. We also assume that the spoiler never leaves a pebble in its place when making a move, because if he did, the duplicator would do the same and the move would be void.

Assume the spoiler chooses Bk and decides to reposition pebble j. We distinguish the following cases.

Case 1: There is a pebble on the root of Bk. Since (a1, . . . , ak) 7→ (b1, . . . , bk) is a partial isomorphism, there is an l with 1 ≤ l ≤ k such that bl is the pebble on the root of Bk and al is a pebble on the root of Ak.

Case 1.1: j = l, i.e., the spoiler chooses the pebble on the root. Since there is now no pebble on the root of Bk, the substructure (b1, . . . , bk) is now a discrete structure of k elements. Since Ak has k leaves and one pebble is placed on the root of Ak, there must be an unpebbled leaf of Ak. The duplicator places his j-th pebble on this leaf. Now (a1, . . . , ak) is also a discrete structure and (a1, . . . , ak) 7→ (b1, . . . , bk) is a partial isomorphism.

Case 1.2: j ≠ l, i.e., the spoiler chooses a pebble on one of the leaves. The spoiler moves pebble j onto an unpebbled leaf. The resulting substructure induced by (b1, . . . , bk) is obviously isomorphic to the one before the move. Actually, it is Bk−2 = Ak−1. Since the substructure induced by (a1, . . . , ak) is also Ak−1, the duplicator leaves all his pebbles in place and (a1, . . . , ak) 7→ (b1, . . . , bk) is a partial isomorphism.

Case 2: There is no pebble on the root of Bk. Both (b1, . . . , bk) and (a1, . . . , ak) are discrete structures.

Case 2.1: The spoiler moves pebble j onto the root of Bk. The induced structure of (b1, . . . , bk) is now Bk−2 = Ak−1. The duplicator mimics this move, moving his pebble j onto the root of Ak. Now the induced structure of (a1, . . . , ak) is also Ak−1 and (a1, . . . , ak) 7→ (b1, . . . , bk) is a partial isomorphism.

Case 2.2: The spoiler moves pebble j onto an unpebbled leaf of Bk. Then (b1, . . . , bk) remains a discrete structure. Thus it is already isomorphic to (a1, . . . , ak), and the duplicator leaves all his pebbles in place.

FIGURE 4 Separating fixed-point logics from second-order logics.

The argument for the situation where the spoiler chooses to move on structure Ak is analogous, actually simpler. □

The lemma implies that Ak |= ϕ iff Bk |= ϕ for every k ∈ N and ϕ ∈ L^k_∞ω.

Proposition 9 The tree language L4 of trees with an even number of nodes is CMSO-definable, but is not L^ω_∞ω-definable.

PROOF. Suppose L4 were L^ω_∞ω-definable, i.e., there were a formula ϕ ∈ L^ω_∞ω that defined L4. By definition of L^ω_∞ω there is a k ∈ N such that ϕ ∈ L^k_∞ω. By the above lemma, either Ak |= ϕ and Bk |= ϕ, or Ak ⊭ ϕ and Bk ⊭ ϕ. But one of Ak, Bk has an even number of nodes, while the other has an odd number of nodes. □
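The CMSO-definable side of the proposition is computationally trivial: membership in L4 is just a node count modulo 2, which CMSO expresses with its modular counting quantifiers and which, by the lemma, no L^ω_∞ω formula can express. A minimal sketch (the nested-tuple tree encoding is mine, not the paper's):

```python
# A finite unordered unranked tree encoded as a tuple of its subtrees;
# () is a single leaf.  L4-membership is a parity check on the node count.
def node_count(tree):
    return 1 + sum(node_count(child) for child in tree)

def in_L4(tree):
    return node_count(tree) % 2 == 0

star_2 = ((), ())        # root with two leaves: 3 nodes, odd
star_3 = ((), (), ())    # root with three leaves: 4 nodes, even
```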

The results of this section are summarised in Figure 4. Logics in the green area can define L4 whereas logics in the red one cannot.

Theorem 10 The following inclusions are strict.


[FIGURE 5 here: an inclusion diagram relating FO, MDTC, MTC, DTC, TC, MSO = MLFP, CMSO, PMSO, LFP, PFP, ESO, SO, SO(TC), L^ω_∞ω, and L^ω_∞ω ∩ PSPACE.]

FIGURE 5 Logics for finite unordered unranked trees. ⊃— indicates a proper inclusion.

. PFP is strictly less powerful than SO(TC).
. LFP is strictly less powerful than ESO.
. MSO is strictly less powerful than CMSO.

The last result is already known. We just provided an alternative proof of the result.

4.7 Conclusion

Figure 5 depicts a landscape of the expressive power of different logics for finite unordered unranked trees. This includes the relationship between PFP and L^ω_∞ω, which we have not been able to present here due to space restrictions. As one can see, most inclusions between different logics turn out to be proper.

An important result one can see from this picture is that the automata logics are largely incomparable to the logics stemming from descriptive complexity theory (TC, LFP, PFP).

Most of the remaining open questions are directly related to difficult open problems in complexity theory. This is true for the second-order logics, but also concerns the transitive closure logics. Also, the separation of LFP from


[FIGURE 6 here: an inclusion diagram relating FO, MDTC, MTC, MSO = MLFP = CMSO = PMSO, DTC = DLOGSPACE, TC = NLOGSPACE, LFP = P, ESO = NP, SO = PH, SO(TC) = PFP = PSPACE, and L^ω_∞ω.]

FIGURE 6 Logics for finite ordered ranked trees. ⊃— indicates a proper inclusion.

PFP amounts to the separation of PTIME from PSPACE by the Abiteboul-Vianu theorem (Abiteboul and Vianu, 1995).

Appendix: The Situation for Finite Ordered Ranked Trees

For comparison we also show what is known about the expressive power of the above-mentioned logics on finite ordered ranked trees. Most questions on whether or not inclusions are proper are open. This is probably due to the fact that they are directly related to famous open problems in classical complexity theory. Figure 6 summarises the results.

There are only a few known non-trivial results of proper inclusion. Kolaitis and Vardi (1992) showed that PFP ⊊ L^ω_∞ω. The proper inclusion TC ⊊ PFP follows from the space hierarchy theorem. Tiede and Kepser (2006) showed that MSO ⊊ DTC. Recently, ten Cate and Segoufin (2008) were able to show that also MTC ⊊ MSO.


References

Abiteboul, Serge, Peter Buneman, and Dan Suciu. 2000. Data on the Web. Morgan Kaufmann.

Abiteboul, Serge and Victor Vianu. 1995. Computing with first-order logic. Journal of Computer and System Sciences 50:309–335.

Boneva, Iovka and Jean-Marc Talbot. 2005. Automata and logics over unranked and unordered trees. In J. Giesl, ed., Proceedings RTA 2005, LNCS 3467, pages 500–515. Springer.

Courcelle, Bruno. 1990. The monadic second-order logic of graphs I: Recognizable sets of finite graphs. Information and Computation 85:12–75.

Dawar, Anuj, Steven Lindell, and Scott Weinstein. 1995. Infinitary logic and inductive definability over finite structures. Information and Computation 119(2):160–175.

Ebbinghaus, Heinz-Dieter and Jörg Flum. 1995. Finite Model Theory. Springer-Verlag.

Fagin, Ronald. 1975. Monadic generalized spectra. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 21:89–96.

Grohe, Martin. 1994. The Structure of Fixed-Point Logics. Ph.D. thesis, Albert-Ludwigs-Universität Freiburg.

Gurevich, Yuri and Saharon Shelah. 1986. Fixed-point extensions of first-order logic. Annals of Pure and Applied Logic 32:265–280.

Immerman, Neil. 1999. Descriptive Complexity. Springer.

Kolaitis, Phokion G. and Moshe Y. Vardi. 1992. Infinitary logics and 0-1 laws. Information and Computation 98(2):258–294.

Libkin, Leonid. 2004. Elements of Finite Model Theory. Springer.

Seidl, Helmut, Thomas Schwentick, and Anca Muscholl. 2003. Numerical document queries. In T. Milo, ed., Proc. 22nd Symposium on Principles of Database Systems (PODS 2003), pages 155–166. ACM.

ten Cate, Balder and Luc Segoufin. 2008. XPath, transitive closure logic, and nested tree walking automata. In Proceedings PODS 2008.

Tiede, Hans-Jörg and Stephan Kepser. 2006. Monadic second-order logic over trees and deterministic transitive closure logics. In G. Mints, ed., 13th Workshop on Logic, Language, Information and Computation, ENTCS 165, pages 189–199. Springer.

5

Semantics in Minimalist-Categorial Grammars

ALAIN LECOMTE

Abstract
This paper is an attempt to develop a strictly derivationalist version of Chomsky's Minimalist theory integrating θ-roles and quantifier scoping without referring to an LF level. It is assumed that different readings of the same sentence are obtained by various evaluation strategies, in a calculus which integrates a non-deterministic version of λµ-calculus.

Keywords SCOPE AMBIGUITIES, THETA-ROLES, MINIMALIST GRAMMARS, CATEGORIAL GRAMMARS, λµ-CALCULUS

5.1 Introduction
Much work has been done on Type-Theoretical Grammars since the famous books by Glyn Morrill (Morrill (1994)) and Aarne Ranta (Ranta (1994)), respectively based on the Categorial tradition (Lambek (1958), Moortgat (1997)) and on Martin-Löf's Constructive Type Theory. More recently, much has been done exploiting Curry's distinction between the tectogrammatical and the phenogrammatical levels, and this has led to interesting proposals like Lambda Grammars, Abstract Categorial Grammars and Convergent Grammars (de Groote (2001a), Muskens (2003), Pollard (2007)). Type-theoretic formulations of Minimalist Grammars have also been proposed (Lecomte and Retoré (2001), Amblard (2007), Lecomte (2005), Anoun and Lecomte (2006)). All these works take care of problems like scope ambiguities, which are traditional in the Montagovian perspective, but they pay little attention to thematic roles and binding phenomena (except Anoun and Lecomte (2006) and Pollard (2007)). These questions have been more widely addressed in the Generative frameworks, but unfortunately without giving a proper and


FG-2008. Philippe de Groote (Ed.). Copyright © 2008, CSLI Publications.


rigorous account of the derivations and above all of the syntax-semantics interface. In Chomsky's Minimalist Program, we can say that little attention is given to the conceptual structure (contrary, say, to Jackendoff). Logical Form is simply a grammatical level, which remains very poor with regard to the interpretation. Moreover, if it is simply a level of Universal Grammar, the question arises whether such an extra level is really needed. Some, like C. Pollard (Pollard (2007)), have suggested that LF is mainly a way to take scope ambiguity into account, by means of ad hoc transformations of Quantifier Raising, the only displacements which occur after Spell Out. It is therefore tempting to develop a frame which keeps the rigorous aspects of Categorial Grammar and reconciles them with some intuitions of Generativist linguists about the thematic structure, in order to get richer semantic representations through linguistic derivation.

In a nutshell, our proposal consists in using a bi-dimensional calculus, one dimension devoted to (narrow) syntax, and the other to semantics (or "logical form", but in a more elaborate version than is the case in Minimalism). In this sense, it has several points in common with Pollard (2007) and Pollard (2008), which recommend that the syntax-semantics interface be purely derivational and parallel. By purely derivational he means that derivations are proofs, and by parallel that there are separate proofs that provide, respectively, candidate syntactic and semantic proofs, and that it is the object of linguistic theory to specify those proof pairs that belong to the language in question.

Here, our viewpoint is slightly different: like in the traditional type-theoretic formalisms which, following Montague, are in favor of a functional approach to semantic interpretation (along the lines of the Curry-Howard correspondence), we assume that the (narrow) syntactic derivation functionally provides a semantic form, BUT this form is underspecified, that is: it may give various readings according to the way it is evaluated. This evaluation is performed according to various strategies, which are known in the theoretical computer science literature under the names of Call-by-value and Call-by-name (or variants), and it consists in a normalisation procedure which is applied after the syntactic part of the calculus. This consists in fact in switching from the syntactic proof to the semantic one (by means of the translation from syntactic types to semantic ones) and then normalising the semantic proof. The point is that in those semantic proofs, a particular type (t) may be interpreted as the formula ⊥, thus introducing into the calculus negation and rules for introducing and eliminating it: this justifies λµ-calculus, since we know that it gives a computational content to classical logic (a logic where negation exists and where double negation may be eliminated).

Moreover, the results of evaluation are not predicate (or intensional) logic formulae à la Montague, but (fragments of) Discourse Representation Structures, simply because questions of binding are easier to solve in such a framework than in predicate (or intensional) logic. Pieces of information on the same entity are given at various places during a proof. Sometimes it could appear that a variable is bound before a new piece of information is provided (like in donkey sentences), thus failing in the attribution of this information (which can be for instance an information on the thematic role). In such a case, the intermediate level of discourse markers proves useful.

We must also add that the syntactic machinery is here provided by a piece of logical calculus (the so-called mixed calculus, invented by Philippe de Groote (de Groote (1996)) and worked out by Christian Retoré and Maxime Amblard (Amblard and Retoré (2007))), a fragment which has been proven equivalent to Stabler's minimalist grammars (Amblard (2007)). When translated into the sequent calculus, we are only using cut-free proofs. Because of that, the fact that we confine ourselves to this fragment has no severe consequences. Of course the use of the cut-rule and of the cut-elimination procedure would lead us to leave this fragment, thus obtaining more proofs, some of them having perhaps no linguistic interpretation. Another alternative is to keep Minimalist Grammars as they are, using them as mere guidelines for obtaining semantic proofs that could still be normalized afterwards as we do here.

5.2 Elements of VP analysis

Various works (Davidson (1966), Vendler (1967)) have led to the idea that verbs express events and that there are complex events, which are structured into an inner and an outer event, where the outer one is associated with causation and agency, and the inner one is associated with telicity and change of state. This semantic idea is reflected in syntax by the introduction of a v label which is added to V in the structure of VP in order to distinguish between an internal part and an external one. Sometimes (see Hale and Keyser (1993)) it is said that v is associated with protoverbs like DO or ACT.

Such views make it possible to understand transitivity alternations, that is, cases where the same verb can appear alternately in causative (a) or inchoative (b) sentences, like:

Example 1 a. Bill melted the ice
b. the ice melted

(a) may be rephrased as Bill caused the ice to melt, and the history of derivations may be represented as the three following trees.


v’

��� HHH

v VP

�� HH

DPthe ice

Vmelted

v’P

����HHHH

DPBill

v’

����HHHH

vCAUSED

VP

�� HH

DPthe ice

Vto melt

v’P

���HHH

DPBill

v’

���HHH

vmelted

VP�� HH

DPthe ice

V/0

where the first tree corresponds to (b), the third one to (a) and the second one to the rephrasing of (a). As we see, v can be either interpreted as a protoverb (introducing a cause) or as an empty node which can serve as a target for a movement from V.

In parallel, questions have been asked on how the semantic arguments of a verb are linked to syntactic positions, in such a way that for instance agents are generally subjects and themes are generally objects. This question led to the idea of an attribution order of the so-called θ-roles, which has been expressed by M. Baker (Baker (1997)) in the following principle, known as UTAH (Uniform Theta Assignment Hypothesis):

Two arguments which fulfill the same thematic function with respect to a given predicate must occupy the same underlying position in the syntax.

If we put together these ideas we are led to a structure of VP with different levels, such that AGENTS are introduced in a specifier position of the highest VP (an "external" position), whereas THEMES are introduced in a specifier position of the lowest (an "internal" one).

Moreover these principles help us find a structure for ditransitive (or double-object) constructions like:

Example 2 John gave a book to Mary

and for structures with resultative predicates, like:

Example 3 The acid will turn the paper red

From these observations, the canonical structure of a VP has been defined as:


[vP [DP agent] [v′ v [VP [DP theme] [V′ V [PP [P to] [DP goal]]]]]]

If Hale & Keyser conceive v as a proto-verb expressing action or causation, other researchers have assumed that agents are introduced by a special predicate, which is independent and additional to a transitive VP, and that this predicate selects an agent and the event described by the verb. Many considerations show that this predicate corresponds to an inflectional head. Kratzer (Kratzer (1994)) assumes this head to be Voice. In this paper, we shall concentrate on this version of so-called "constructionalism".

In all the sequel, e will denote the type of individuals and s the type of events.

According to Kratzer (Kratzer (1994)), External Arguments are base-generated in SPEC of VoiceP, and direct objects in SPEC of VP, thus leading to the following analysis of Paul feeds the dog:

[VoiceP [DP Paul] [Voice′ [Voice Agent] [VP [DP the dog] [V′ [V feed]]]]]

Its semantic interpretation is provided by the following steps:

1. feed∗ = λxe λes. feed(x)(e)
2. the dog∗ = the dog
3. (the dog feed)∗ = λes. feed(the dog)(e)
4. Agent∗ = λxe λes. Agent(x)(e)
5. (Agent (the dog feed))∗ = λxe λes. Agent(x)(e) ∧ feed(the dog)(e)
6. Paul∗ = Paul
7. ((Agent (the dog feed)) Paul)∗ = λes. Agent(Paul)(e) ∧ feed(the dog)(e)

where step (5) rests on an ad hoc rule that Kratzer names Event Identification:

It takes a function f and a function g as input and yields a function h as output. f is of type <e,<s,t>> and g of type <s,t>. h is of type <e,<s,t>>. The function h is calculated by the rule: h = λxe λes. f(x)(e) ∧ g(e)
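Event Identification is just pointwise conjunction over the event argument, and can be sketched with Python closures (the encoding is mine, not Kratzer's: entities and events are strings, a <s,t> meaning is a function from events to a tuple of conditions, and ∧ is modelled as tuple concatenation):

```python
# Stand-in denotations of type <e,<s,t>>: curried functions from an
# entity to an event predicate, each returning a tuple of conditions.
feed  = lambda x: lambda e: (('feed', x, e),)
agent = lambda x: lambda e: (('agent', x, e),)

def event_identification(f, g):
    """h = λx_e λe_s. f(x)(e) ∧ g(e), for f : <e,<s,t>> and g : <s,t>."""
    return lambda x: lambda e: f(x)(e) + g(e)

# Steps (3), (5) and (7) of the derivation above:
the_dog_feed = feed('the dog')                       # type <s,t>
step5 = event_identification(agent, the_dog_feed)    # type <e,<s,t>>
step7 = step5('Paul')                                # type <s,t>
```

Applying step7 to an event yields both the Agent condition on Paul and the feed condition on the dog, mirroring step (7).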

One of the goals of this paper is to provide a solution which avoids such operations. This solution is based on a Type calculus that we name the Categorial-Minimalist framework (Amblard (2007), Lecomte and Retoré (2001), Lecomte (2005)). Moreover, we keep in mind that we don't only seek a solution for these verbal puzzles, but we also still want to express the usual properties of determiner phrases (and in particular of Generalized Quantifiers) along lines similar to the Categorial framework (Moortgat (1997)).

5.3 The mixed calculus and the categorial-minimalist framework

5.3.1 A Labelled Type Grammar

Our framework is based on the mixed calculus (de Groote (1996), Amblard and Retoré (2007)), a formulation of Partially Commutative Linear Logic. The plain calculus contains introduction and elimination rules for the non-commutative product • and its residuals (/ and \), and for the commutative product ⊗ and its residual −◦. Moreover there is an entropy rule which allows us to relax the order between hypotheses. This calculus has been shown to be normalizable (Amblard and Retoré (2007)). For its use in linguistics, however, we restrict it to the elimination rules of /, \ and ⊗, since we don't see particular evidence for using introduction rules (there are hypotheses in this calculus, but they are discharged by means of the ⊗ elimination rule).1 The operations Merge and Move of Minimalist Grammars (Stabler (1997), Stabler (2001)) are replaced and simulated by combinations of logical rules, labelled with strings for the phonological parts and by λµ-terms for the semantics (Amblard

1 We therefore accept the criticism that it is not a "Type Logic"; this is the reason why we speak rather of a "Type Grammar".

SEMANTICS IN MCG / 47

(2007)). Merge is elimination of / or \ followed by entropy:

∆ ⊢ u : A    Γ ⊢ w : A\C
—————————————— [\E]
∆ ; Γ ⊢ uw : C
—————————————— [entropy]
∆ , Γ ⊢ uw : C

Γ ⊢ w : C/A    ∆ ⊢ u : A
—————————————— [/E]
Γ ; ∆ ⊢ wu : C
—————————————— [entropy]
Γ , ∆ ⊢ wu : C

Move is [⊗E].

Γ ⊢ (z1,z2) : A⊗B    ∆ , x : A , y : B , ∆′ ⊢ t : C
—————————————————— [⊗E]
∆ , Γ , ∆′ ⊢ t[z1/x, z2/y] : C

The comma (”,”) is the structural counterpart of the commutative product,while the semi-column (”;”) is the counterpart of the non-commutative prod-uct.
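As a toy illustration of how Merge combines labelled signs, here is a sketch over (string, type) pairs (the tuple encoding of slash types and the merge function are mine, not the paper's; the entropy step on hypothesis order and the λµ semantic labels are left out):

```python
# A sign is (phonological string, type); a slash type C/A is encoded as
# ('/', C, A) and A\C as ('\\', A, C).  Merge is /- or \-elimination
# with plain string concatenation on the phonological side.
def merge(sign1, sign2):
    (w1, t1), (w2, t2) = sign1, sign2
    if t1[0] == '/' and t1[2] == t2:      # C/A merged with A gives C
        return (w1 + ' ' + w2, t1[1])
    if t2[0] == '\\' and t2[1] == t1:     # A merged with A\C gives C
        return (w1 + ' ' + w2, t2[2])
    raise TypeError('types do not match')
```

For instance, the entry to read : V/d used in Section 5.4 merged with a hypothesis of type d yields a sign of type V.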

5.3.2 The semantical tier

Semantic representations are built by means of λµ-terms. Let us simply recall here that the λµ-calculus is a strict extension of the λ-calculus (Parigot (1992), de Groote (2001b)). Its syntax is provided with a second alphabet of variables (α, β, γ, ..., called µ-variables), and two additional constructs: µ-abstraction (µα.t) and naming ((α t)). The syntax of Parigot's λµ-calculus is (Parigot (1992)):

V ::= x | λx.v
v ::= V | (v v) | µα.c
c ::= (α v)

where the V's are called values. We shall see later the role these values play in the choice of an evaluation strategy.

µ-reduction may be defined in two ways, according to two rules that we shall denote by (µ) and (µ′).

(µ) (µα.u) v −→ µβ.u[α t := β (t v)]

where u[α t := β (t v)] stands for the term u in which each subterm of the form (α t) has been replaced by β (t v).

(µ′) v (µα.u) −→ µβ.u[α t := β (v t)]

µ-reductions take their contexts (or "continuations") as their arguments and put them inside their body at the points which are marked by the corresponding µ-variable. With these two rules the calculus is not deterministic (see again de Groote (2001b)), but this is exactly what we want when we wish to have an account, say, of scope ambiguities.

In order to terminate a reduction, we shall moreover assume one simplification rule, which applies when there is no more continuation to pass:

(σ) µα.u −→ u[α t := t]
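The three rules can be prototyped as a small term-rewriting system over nested tuples (this encoding and the function names are mine; fresh-name generation for the new µ-variable β is elided, so the sketch reuses α and assumes no variable capture):

```python
# λµ-terms: ('var', x), ('lam', x, body), ('app', f, a),
#           ('mu', a, body), ('name', a, t)
def subst_name(u, alpha, wrap):
    """Replace every subterm (alpha t) in u by wrap(t)."""
    tag = u[0]
    if tag == 'var':
        return u
    if tag in ('lam', 'mu'):
        return (tag, u[1], subst_name(u[2], alpha, wrap))
    if tag == 'app':
        return ('app', subst_name(u[1], alpha, wrap),
                       subst_name(u[2], alpha, wrap))
    a, t = u[1], subst_name(u[2], alpha, wrap)       # tag == 'name'
    return wrap(t) if a == alpha else ('name', a, t)

def mu_rule(term):       # (µ):  (µα.u) v  −→  µβ.u[α t := β (t v)]
    (_, (_, alpha, u), v) = term
    return ('mu', alpha,
            subst_name(u, alpha, lambda t: ('name', alpha, ('app', t, v))))

def mu_prime(term):      # (µ′): v (µα.u)  −→  µβ.u[α t := β (v t)]
    (_, v, (_, alpha, u)) = term
    return ('mu', alpha,
            subst_name(u, alpha, lambda t: ('name', alpha, ('app', v, t))))

def sigma(term):         # (σ):  µα.u  −→  u[α t := t]
    (_, alpha, u) = term
    return subst_name(u, alpha, lambda t: t)
```

Applying mu_rule and then sigma to (µα.(α x)) v yields (x v), while mu_prime covers the symmetric case; on a term with two µ-redexes, choosing which rule fires first is exactly the non-determinism that yields the two scope readings of Section 5.4.2.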


The problem which arises when we deal with the assignment of theta roles is that the derivation of a VP is such that meaning components concerning the event to which a verb refers are given at different stages. For instance, according to the principle that the THEME or PATIENT role is assigned in a specifier position with regard to the lowest part of the verb, it is assumed that the corresponding argument is raised from its complement position (where it feeds the verbal entry with a suitable variable or reference marker) to the specifier position (where it receives its θ-role).

Because movement is expressed by the [⊗E] rule, that mechanism supposes that two variables, each associated with a hypothesized type (e.g. d and k), are discharged at the same time by the respective components of the same pair (e.g. associated with a d⊗k type), but these two components are relative to the very same object, which we propose to consider as the same reference marker. This leads us to use a formalism which is very close to Kamp's DRSs.

We will then actually use DRSs as semantic recipes (see for instance R. Muskens and Visser (1997)), a way of getting a dynamic system which allows us to merge several conditions on the same referent at different times. We shall therefore have discourse referents and variables; the first ones will be noted with a dot on their top (ẋ, ẏ, ż, ...). A discourse referent is an individual which can be passed as a value to any individual variable.

The syntax of structures is provided by:

γ ::= (P ξ) | (α ξ) | ξ1 = ξ2 | ¬K | K1 ∨ K2 | K1 ⇒ K2
K ::= [ξ1 . . . ξn | γ1, . . . , γm]

where each ξ or ξi is either a variable or a referent. P is a predicate-variable. α is a µ-variable (also called a co-variable). λ-abstraction operates only on variables (individual or predicate), and µ-abstraction on co-variables.

An operation on pairs of DRSs or on pairs (DRS, condition) is Fusion (elsewhere named Merge, but we reserve this name for the syntactic operation), which we will denote ⊔. We don't develop this point here (see Zeevat (1991), Vermeulen (1993) and more recently de Groote (2007), van Eijck (2007), Muskens (1996)) because it is not the object of the present paper. It suffices to know that we have at least two alternatives for defining Fusion:

1. [V|F] ⊔1 [W|G] = [V ∪ W | F ∪ G]
2. [V|F] ⊔2 [W|G] = [V ⊕ W | F∗ ∪ G∗]

In (1) the same discourse referents which occur in V and in W are identified. In (2), they are renamed before fusion (and F∗ and G∗ are the conditions in F and G taking this renaming into account). While the choice between these two modes is sometimes ambiguous when dealing with discourse, that will not be the case in the simple frame of one sentence.
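The two modes can be sketched over a DRS encoded as a pair (set of referents, list of conditions); the encoding, the helper names, and the choice to rename only the clashing referents in ⊔2 are my assumptions, not the paper's:

```python
# A DRS is (referents, conditions): a set of referent names and a list
# of conditions, each condition a tuple of symbols.
def fusion1(d1, d2):
    """⊔1: shared discourse referents are identified."""
    (v1, c1), (v2, c2) = d1, d2
    return (v1 | v2, c1 + [c for c in c2 if c not in c1])

def fusion2(d1, d2):
    """⊔2: clashing referents of d2 are renamed apart before merging."""
    (v1, c1), (v2, c2) = d1, d2
    ren = {x: x + "'" for x in v1 & v2}
    rename = lambda cond: tuple(ren.get(t, t) for t in cond)
    return (v1 | {ren.get(x, x) for x in v2}, c1 + [rename(c) for c in c2])
```

Fusing [ẋ | book(ẋ)] with [ẋ | read(ẋ)] under ⊔1 keeps one referent carrying both conditions, while ⊔2 keeps two distinct referents.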

Nevertheless, this is not enough for defining Fusion, because in most of our examples we shall have to make the fusion of conditions containing an arbitrary set of variables with complex DRSs where the same variables (or a subset of these) are introduced at different levels. That leads to a more complex definition.

Definition 1 Let D be a complex DRS, let ≺ be the relation of subordination between sub-DRSs and DRSs, let RD be the set of reference markers of D, let mδ be the set of reference markers accessible from the sub-DRS δ, let Mδ be the set of reference markers contained inside δ, and let Cδ be the set of conditions inside δ (some of which may be sub-DRSs). Let [V|φ] be a DRS expressing a single condition on variables in V. Let us suppose that V = V1 ∪ V2 with V2 ∩ MD = ∅ and V1 ⊂ MD.

. if V1 ⊂ RD then D ⊔ [V|φ] = [RD ∪ V2 | CD ∪ {φ}]
. else if ∃δ, δ ≺ D and V1 ⊂ mδ:
  . if δ = δ1 ⇒ δ2 then D ⊔ [V|φ] = [δ2 ← [Rδ ∪ V2 | Cδ ∪ {φ}]]D
  . else D ⊔ [V|φ] = [δ ← [Rδ ∪ V2 | Cδ ∪ {φ}]]D

where "←" expresses the substitution.

The terms of our calculus are therefore the previous structures and those which are obtained by λ- or µ-abstraction on them, like for instance:

λQ.µα.[ẋ | (Q ẋ) ∧ (α ẋ)]

Let us take for instance a µ-term like (1) µα.[ẋ | (BOOK ẋ) ∧ (α ẋ)]. If we admit that [ẋ | (BOOK ẋ) ∧ (α ẋ)] is of type t (because it represents the Discourse Structure of a sentence), µα at the same time makes an abstraction on α (thus leading to the type (e→t)→t) and performs the elimination of double negation, thus leading finally to the type e. Therefore, when we have a semantic recipe like (1), it is of type e and we are allowed to put it as an e-type argument of a function like a transitive verb (of type e→e→t). We prove this typing by the following derivation (where the distinguished type t is considered the "observable" type, which can also be seen as ⊥; therefore e→t is equivalent to e→⊥, i.e., to ¬e):

Γ ⊢ BOOK : e→⊥    Γ ⊢ ẋ : e
Γ ⊢ BOOK(ẋ) : ⊥

α : ¬e ⊢ α : e→⊥    Γ ⊢ ẋ : e
Γ, α : ¬e ⊢ (α ẋ) : ⊥

Γ, α : ¬e ⊢ [ẋ | BOOK(ẋ) ∧ (α ẋ)] : ⊥
—————————————————— [E⊥]
Γ ⊢ µα.[ẋ | BOOK(ẋ) ∧ (α ẋ)] : e

Let us notice that ẋ, as a discourse referent, is introduced here like a constant; it is exactly as if the context (here represented by Γ) provided a new marker each time it is needed. Such properties are important because they will allow us to state that:


. scope is no longer dependent on c-command (a binder may bind the rest of a sentence even if remaining in situ)
. quantified expressions no longer need higher-order types (they can take their scopes from inside the terms in which they are enclosed, keeping their original type e for an NP, for instance)

Another λµ-term for our grammar is:

λQ.µα.[ | [ẋ | (Q ẋ)] ⇒ [ | (α ẋ)]]

which is introduced to serve as the translation of every, like in

every child reads a book

The term µα.[ | [ẋ | (Q ẋ)] ⇒ [ | (α ẋ)]] is also of type e by the same reasoning.

5.4 Application to the VP structure

5.4.1 VP syntactic derivation

Let us assume a verb like to read has the syntactic type V/d, the semantic type e→e→s→t and the semantic recipe λxλyλe.read(e,x,y). In a first step of the derivation, it will receive a first argument by merging with a hypothesis u of type d (hypotheses are put inside square brackets).

[V to read : λyλe.read(e,u,y)
  [V/d to read : λxλyλe.read(e,x,y)]
  [d [u]] ]

We then assume a phonologically empty entry corresponding to the adding of the first thematic role (patient), of syntactic type k\d\v/V and semantic recipe:

λPλx2λyλe.P(y,e) ⊔1 patient(e,x2)

The fusion operation ⊔1 is introduced at this point even if it is not particularly relevant, because we still have only atomic formulae and neither DRSs nor "conditions"; but it is not difficult to define an extension of Fusion to these formulae: it simply amounts to conjunction in this particular case.

By merge with the previously obtained V, we get:

λx2λyλe.read(e,u,y) ⊔1 patient(e,x2)

[k\d\v λx2λyλe.read(e,u,y) ⊔1 patient(e,x2)
  [k\d\v/V λPλx2λyλe.P(y,e) ⊔1 patient(e,x2)]
  [V to read : λyλe.read(e,u,y)] ]

At this point, a second hypothesis, v, of type k is introduced, so that it results in:

[d\v λyλe.read(e,u,y) ⊔1 patient(e,v)
  [k [v]]
  [k\d\v λx2λyλe.read(e,u,y) ⊔1 patient(e,x2)
    [k\d\v/V λPλx2λyλe.P(y,e) ⊔1 patient(e,x2)]
    [V to read : λyλe.read(e,u,y)] ] ]

We can comment on these steps by saying that:

1. the verbal phrase is prepared to host a first thematic role (patient)
2. the thematic role will be assigned to the DP which will raise towards the specifier position, here marked by the attribution of a formal feature k

The two hypotheses u and v can be discharged altogether by the [⊗E] rule, by means of the DP a book.

We assume this DP is of syntactic type k⊗d, and of semantic type e⊗e. Its semantic recipe is the pair (ẋ, µα.[ẋ | book(ẋ) ∧ (α ẋ)]).

This step, based on [⊗E], builds up the following tree:

[d\v λyλe.read(e, µα.[ẋ | book(ẋ) ∧ (α ẋ)], y) ⊔1 patient(e, ẋ)
  [k⊗d (ẋ, µα.[ẋ | book(ẋ) ∧ (α ẋ)])]
  [d\v λyλe.read(e,u,y) ⊔1 patient(e,v)] ]

We get λyλe.read(e, µα.[ẋ | book(ẋ) ∧ (α ẋ)], y) ⊔1 patient(e, ẋ), of syntactic type d\v.

After this sequence of steps, there is a new merge, with a new hypothesis w of type d, thus giving λe.read(e, µα.[ẋ | book(ẋ) ∧ (α ẋ)], w) ⊔1 patient(e, ẋ) of type v. Tense is then added by merge with an entry of type (k\t)/v which at the same time provides an agentive role, the semantics of which is λP.λy2.λe.(P(e) ⊔ prst(e)) ⊔ agent(e,y2), thus leading to:

[k\t λy2λe.(read(e, µα.[ẋ | book(ẋ) ∧ (α ẋ)], w) ⊔ patient(e, ẋ) ⊔ prst(e)) ⊔ agent(e,y2)
  [(k\t)/v λP.λy2.λe.(P(e) ⊔ prst(e)) ⊔ agent(e,y2)]
  [v λe.read(e, µα.[ẋ | book(ẋ) ∧ (α ẋ)], w) ⊔ patient(e, ẋ)] ]

This is merged again with a hypothesis z of type k, thus providing the sequent:

Γ, z : k, w : d ⊢ t : λe.(read(e, µα.[ẋ | book(ẋ) ∧ (α ẋ)], w) ⊔ patient(e, ẋ) ⊔ prst(e) ⊔ agent(e,z))

When these hypotheses are discharged by a DP like:

(ẏ, µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]])

we finally get

Γ ⊢ t : λe.(read(e, µα.[ẋ | book(ẋ) ∧ (α ẋ)], µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]]) ⊔ patient(e, ẋ) ⊔ prst(e) ⊔ agent(e, ẏ))

The final type c/t, the semantic recipe of which is (for simplification) λP.P(ė), where ė is a discourse referent of type s (event), finally gives the following semantic representation:

read(ė, µα.[ẋ | book(ẋ) ∧ (α ẋ)], µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]]) ⊔ pt(ė, ẋ) ⊔ prst(ė) ⊔ agt(ė, ẏ)

which does not yet provide an interpretable meaning. For that, steps of evaluation are still needed; this is the object of the next subsection. But we may summarize the previous discussion by giving the list of grammatical expressions that this derivation required.


entry                     syntactic type   semantic recipe
transitive verb to read   V/d              λxλyλe.read(e,x,y)
θ-role 1 (ε)              k\d\v/V          λPλx2λyλe.P(y,e) ⊔ patient(e,x2)
θ-role 2 (Tense)          (k\t)/v          λPλx2λe.(P(e) ⊔ prst(e)) ⊔ agent(e,x2)
DP a book                 k⊗d              (ẋ, µα.[ẋ | book(ẋ) ∧ (α ẋ)])
DP every child            k⊗d              (ẋ, µα.[ | [ẋ | child(ẋ)] ⇒ [ | (α ẋ)]])
comp (ε)                  c/t              λP.P(ė)

5.4.2 VP semantic evaluation

We will assume conditions C(ẋ1, . . . , ẋn) equivalent to DRSs:

[ẋ1, . . . , ẋn | C(ẋ1, . . . , ẋn)]

The substructure

read(ė, µα.[ẋ | book(ẋ) ∧ (α ẋ)], µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]])

can be evaluated in two ways.

First calculation:

1. by µ′-reduction, (α ẋ) is replaced by (α (read(ė) ẋ)):
((read(ė), µα.[ẋ | book(ẋ) ∧ (α ẋ)]), µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]])
−→ (µα.[ẋ | book(ẋ) ∧ (α (read(ė) ẋ))], µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]])

2. by µ-reduction, (α (read(ė) ẋ)) is replaced by (α ((read(ė) ẋ) (µβ.[ | [ẏ | ...]))):
(µα.[ẋ | book(ẋ) ∧ (α (read(ė) ẋ))], µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]])
−→ (µα.[ẋ | book(ẋ) ∧ (α ((read(ė) ẋ) (µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]])))])

3. by the simplification rule σ:
(µα.[ẋ | book(ẋ) ∧ (α ((read(ė) ẋ) (µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]])))])
−→ [ẋ | book(ẋ) ∧ ((read(ė) ẋ) (µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]]))]

4. by µ′-reduction, (β ẏ) is replaced by (β ((read(ė) ẋ) ẏ)):
[ẋ | book(ẋ) ∧ ((read(ė) ẋ) (µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ẏ)]]))]
−→ [ẋ | book(ẋ) ∧ (µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ((read(ė) ẋ) ẏ))]])]

5. by simplification again:
[ẋ | book(ẋ) ∧ (µβ.[ | [ẏ | child(ẏ)] ⇒ [ | (β ((read(ė) ẋ) ẏ))]])]
−→ [ẋ | book(ẋ), [ | [ẏ | child(ẏ)] ⇒ [ | ((read(ė) ẋ) ẏ)]]]

This reading of course corresponds to the one in which at some instant there is a book which is read by every child.

In the second calculation, after (µ′), (µ′) is used again, replacing (β ẏ) by (β (µα.[ẋ | ...])), followed by simplification, (µ) and again simplification, leading to:

[ | [ẏ | child(ẏ)] ⇒ [ | [ẋ | book(ẋ) ∧ ((read(ė) ẋ) ẏ)]]]

This reading corresponds to the one in which for every child, there is a book which is read by him or her on a particular event ė.

Now, if we take these two readings, we may combine them with the other parts of the resulting structure, since we just obtained, on either side, a DRS (or a condition in a DRS). We obtain both structures:

[ė, ẋ | prst(ė) ∧ book(ẋ) ∧ pt(ė, ẋ), [ | [ẏ | child(ẏ)] ⇒ [ | ((read(ė) ẋ) ẏ) ∧ agt(ė, ẏ)]]] and

[ė | prst(ė), [ẏ | child(ẏ)] ⇒ [ | [ẋ | book(ẋ) ∧ pt(ė, ẋ) ∧ ((read(ė) ẋ) ẏ) ∧ agt(ė, ẏ)]]]

where Fusion is defined as in Section 5.3.

5.5 Binding

One of the advantages of this framework is its ability to easily deal with binding phenomena. Let us imagine the following supplementary lexical entries.

entry                       syntactic type   semantic recipe
stative verb to be smart    V/d              λxλe.smart(e,x)
prop verb to think          V/c              λxλyλe.think(e,x,y)
θ-role 1 (Tense)            k\t/V            λPλx2λe.(P(e) ⊔ state(e)) ⊔ expc(e,x2)
θ-role 1 (ε)                k\d\v/V          λPλx2λyλe.P(y,e) ⊔ patient(e,x2)
ana (s)he                   k⊗d              (ana(ẋ), ana(ẋ))
refl him/herself            k⊗d              (refl(ẋ), refl(ẋ))
comp (ε)                    c/t              λP.P(ė)


For the following sentences:

Example 4 a. Peter thinks he is smart
b. John thinks Paul shaves himself

what we get is2:

(4 a): [ė, ẏ | ẏ = Peter, agent(ė, ẏ), think(ė, [ė1, ana(ẋ) | smart(ė1, ẋ), expc(ė1, ẋ)], ẏ)]

(4 b): [ė, ẏ | ẏ = John, agent(ė, ẏ), think(ė, [ė1, refl(ẋ), ż | ż = Paul, shave(ė1, ẋ, ż), pt(ė1, ẋ), agt(ė1, ż)], ẏ)]

A sentence is said to be closed when its event variable has been instantiated (by an event-reference marker). It is assumed here that every closed sentence must have its ana- and refl- reference markers linked either inside the sentence (S-linked) or by the discourse (in which case they are D-linked). It is assumed that all refl- reference markers must be S-linked, and all ana- reference markers must be S-linked or D-linked.

It is otherwise assumed that an ana- or a refl- reference marker may be S-linked only by being identified with an ordinary reference marker (of the same type, e, s or t), in the same sub-DRS for refl and in the immediately higher DRS for ana (of course it may be linked to it by an identification chain). This provides convenient readings for both sentences: (4 a):

[e, y | y = Peter, agent(e, y), think(e, [e1 | smart(e1, y), expc(e1, y)], y)]

[e, y | y = Peter, agt(e, y), think(e, [e1, xD | smart(e1, x), expc(e1, x)], y)]

(4 b):

[e, y | y = John, agt(e, y), think(e, [e1, z | z = Paul, shave(e1, z, z), pt(e1, z), agt(e1, z)], y)]

We do not treat here the relations between events. The multiple occurrences of event-reference markers open the field to the exploration of tense correspondences inside complex sentences.

5.6 Conclusion and perspectives

The previous analysis may pose some problems:

1. more readings exist for sentences including an event argument (for instance readings like there is a book such that every child reads this book at some event or every child reads a book at some event),

2ana and refl are essentially syntactic devices; on the semantic side, they can be interpreted as identity functors.

56 / ALAIN LECOMTE

2. not all readings are required in all cases: for instance, in the case of a question, the wh-DP may always have wide scope over any QNP.

The first problem is solved by giving the empty comp the semantic recipe λP.P(µδ.[e | (δ e)]). The second problem is more delicate: it is here that the choice between several strategies is relevant. Semantic recipes associated with some wh-DP or QNP may be labelled by the kind of regime they ask for.

As we know from works on the duality of computation (among them Herbelin (2005), Curien and Herbelin (2000)), several reduction systems may be considered which are deterministic. Among the deterministic calculi, let us mention the left to right call by value and the right to left call by value versions.

Left to Right Call by Value

(βv)   ((λx.v) V)       → v[x←V]
(µL)   ((µα.c) v)       → µα.c[α← (α ([ ] v))]
(µRv)  ((λx.v) (µα.c))  → µα.c[α← (α ((λx.v) [ ]))]
(µvar) (α µβ.c)         → c[β← (α [ ])]

Notice that in that version,

. the β reduction rule only applies to values, that is not, for instance, to µ-terms
. the (µ) reduction rule is kept free of any conditions
. the (µ′) rule may be applied, but only in the context of a λ-term

Let us now look at the right to left call by value version.

Right to Left Call by Value

(βv)   ((λx.v) V)       → v[x←V]
(µLv)  ((µα.c) V)       → µα.c[α← (α ([ ] V))]
(µR)   (v (µα.c))       → µα.c[α← (α (v [ ]))]
(µvar) (α µβ.c)         → c[β← (α [ ])]

Symmetrically with regard to the left to right version, we may observe that:

. the β reduction rule only applies to values, that is not, for instance, to µ-terms
. the (µ) reduction rule only applies when a µ-term is applied to a value
. the (µ′) reduction rule is kept free of any conditions

Now, if we look at our two calculations in 4-2, we see that for each one, the second step is crucial, because there, expressions of the form (µα.φ, µβ.ψ) are met. Under a LRCBV regime, (µ′) cannot be applied (thus blocking the second calculation), and under a RLCBV regime, it is (µ) which cannot be applied, thus blocking the first calculation. Therefore, if we wish to block the


reading where the existential a book is raising, we specify that µα is under the RLCBV regime, and we write µ←α. In this case only the second reading will obtain. Reciprocally, if we wish to compel this existential to raise, we simply specify that µβ is under the LRCBV regime, and we write µ→β. This means that the complete evaluation will be held under this regime. By default, µ is used with an underspecified regime, thus resulting in several readings. Notice that two different labels for µ operators in the same sentence would result in a conflict (a crash of the calculation).
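The divergence between the two regimes on such a critical pair can be illustrated with a toy rewriter. The encoding below is our own, not the paper's notation, and the names (mu_step, plug, is_value) are illustrative; only the µ-rules are implemented, just enough to show which side of an application gets priority under each regime.

```python
# Toy illustration in our own encoding (not the paper's notation):
# terms are tagged tuples ('var',x) / ('lam',x,t) / ('mu',a,c) /
# ('app',t,t); a command is ('named',a,t).  Only the mu-rules of the
# two regimes are implemented, to show which side gets priority.

def is_value(t):
    return t[0] in ('var', 'lam')

def plug(c, a, wrap):
    """Replace every subterm named by a, [a]t, with [a]wrap(t)."""
    tag = c[0]
    if tag == 'named':
        _, b, t = c
        t = plug(t, a, wrap)
        return ('named', b, wrap(t) if b == a else t)
    if tag == 'app':
        return ('app', plug(c[1], a, wrap), plug(c[2], a, wrap))
    if tag == 'lam':
        return ('lam', c[1], plug(c[2], a, wrap))
    if tag == 'mu' and c[1] != a:          # stop at a rebinding of a
        return ('mu', c[1], plug(c[2], a, wrap))
    return c

def mu_step(f, v, regime):
    """One mu-reduction step of the application (f v), or None."""
    if f[0] == 'mu' and (regime == 'LR' or is_value(v)):
        # (mu)/(muL): the function-side mu gets priority
        return ('mu', f[1], plug(f[2], f[1], lambda t: ('app', t, v)))
    if v[0] == 'mu' and (regime == 'RL' or f[0] == 'lam'):
        # (mu')/(muR): the argument-side mu gets priority
        return ('mu', v[1], plug(v[2], v[1], lambda t: ('app', f, t)))
    return None

# A critical pair ((mu a.c1) (mu b.c2)), as in the second step of the
# two calculations discussed above:
t1 = ('mu', 'a', ('named', 'a', ('var', 'x')))
t2 = ('mu', 'b', ('named', 'b', ('var', 'y')))
print(mu_step(t1, t2, 'LR')[1])   # 'a': the function-side binder wins
print(mu_step(t1, t2, 'RL')[1])   # 'b': the argument-side binder wins
```

On the critical pair, exactly one rule fires per regime, which is the determinism the labels µ→ and µ← exploit.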

Finally, we may compare this proposal with the ideas included in the Minimalist Program: whereas, in MP, a part of the derivation is continued after Spell-Out, which consists in a "covert" part of the derivation (mainly Quantifier-Raising), we propose to replace these transformations by an evaluation stage which is done according to a well-defined strategy.

Acknowledgements

Thanks to Ruth Kempson, Glyn Morrill and Carl Pollard for providing a helpful discussion during the workshop Glyn organized in Barcelona in December, thanks also to Maxime Amblard for his suggestions and advice, to Lea Nash for having drawn my attention to "constructionalism", and to Philippe de Groote, Michael Moortgat and Hugo Herbelin for having given me a better understanding of the λµ-calculus (and other related formalisms) through the talks they gave at the Prelude workshop held last year in Carry-le-Rouet, and to three anonymous referees who helped much in formulating some parts of this paper. Of course, all remaining errors are my own responsibility.

References

ACL'01. 2001. Proceedings of the 39th Meeting of ACL. Toulouse: ACL 2001.

Amblard, M. 2007. Calculs de représentations sémantiques et syntaxe générative: les grammaires minimalistes catégorielles. PhD thesis, Université Bordeaux 1.

Amblard, M. and C. Retoré. 2007. Natural deduction and normalization for partially commutative linear logic and Lambek calculus with product. In Computation and Logic in the Real World, Quaderni del Dipartimento di Scienze Matematiche, Roberto Maggiori.

Anoun, H. and A. Lecomte. 2006. Linear grammars with labels. In P. Monachesi, G. Penn, G. Satta and S. Wintner, eds., Proceedings of Formal Grammar 06. CSLI Publications.

Baker, M. 1997. Thematic Roles and Syntactic Structure. In L. Haegeman, ed., Elements of Grammar, Handbook of Generative Syntax, pages 73–137. Dordrecht: Kluwer.


Curien, P. and H. Herbelin. 2000. Duality of computation. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, pages 233–243. Montreal, Canada: SIGPLAN Notices.

Davidson, D. 1966. The Logical Form of Action Sentences. In D. Davidson, ed., Essays on Actions and Events. Oxford: Clarendon Press.

de Groote, P. 1996. Partially commutative linear logic: sequent calculus and phase semantics. In M. Abrusci and C. Casadio, eds., Proofs and Linguistic Categories, pages 199–208. CLUEB - University of Chieti.

de Groote, P. 2001a. Towards Abstract Categorial Grammars. In ACL'01 (2001), pages 148–155.

de Groote, P. 2001b. Type raising, continuations, and classical logic. In Proceedings of the 13th Amsterdam Colloquium, pages 97–101. Amsterdam: Language and Computation.

de Groote, P. 2007. Towards a Montagovian Account of Dynamics. In Proceedings of Semantics and Linguistic Theory XVI. CLC Publications.

Hale and Keyser. 1993. On Argument Structure and the Lexical Expression of Syntactic Relations. In Hale and Keyser, eds., The View from Building 20. Cambridge, MA: MIT Press.

Herbelin, H. 2005. Au cœur de la dualité. PhD thesis, Université Paris 11.

Kratzer, A. 1994. External arguments. In E. Benedicto and J. Runner, eds., Functional Projections. Amherst: University of Massachusetts, Occasional Papers.

Lambek, J. 1958. The Mathematics of Sentence Structure. American Mathematical Monthly 65:154–170.

Lecomte, A. 2005. Categorial grammar for minimalism. In C. Casadio, P. J. Scott and R. Seely, eds., Language and Grammar, Studies in Mathematical Linguistics and Natural Language, pages 163–188. Stanford: CSLI Publications.

Lecomte, A. and C. Retoré. 2001. Extending Lambek grammars: a logical account of Minimalist Grammars. In ACL'01 (2001), pages 354–362.

Moortgat, M. 1997. Categorial type logics. In van Benthem and ter Meulen (1997), chap. 2, pages 93–178.

Morrill, G. 1994. Type Logical Grammar, Categorial Logic of Signs. Dordrecht: Kluwer.

Muskens, R. 1996. Combining Montague Semantics and Discourse Representation. Linguistics and Philosophy 19:143–186.


Muskens, R. 2003. Languages, lambdas and logic. In G.-J. Kruijff and R. Oehrle, eds., Resource Sensitivity in Binding and Anaphora. Amsterdam: Kluwer.

Parigot, M. 1992. λµ-calculus: an algorithmic interpretation of classical natural deduction. In A. Voronkov, ed., Proceedings of the International Conference on Logic Programming and Automated Reasoning. Berlin: Springer.

Pollard, Carl. 2007. Convergent Grammars. Tech. rep., The Ohio State University.

Pollard, Carl. 2008. Covert movement in logical grammar. ESSLLI Workshop on Ludics, Symmetric calculi and Continuations.

Muskens, R., J. van Benthem and A. Visser. 1997. Dynamics. In van Benthem and ter Meulen (1997), chap. 10, pages 587–648.

Ranta, A. 1994. Type Theoretical Grammar. Oxford University Press.

Stabler, E. 1997. Derivational minimalism. In C. Retoré, ed., Logical Aspects of Computational Linguistics, vol. 1328 of LNCS/LNAI, pages 68–95. Springer.

Stabler, E. 2001. Recognizing head movement. In P. de Groote and G. Morrill, eds., Logical Aspects of Computational Linguistics, vol. 2099 of LNCS/LNAI, pages 245–260. Springer.

van Benthem, J. and A. ter Meulen, eds. 1997. Handbook of Logic and Language. Elsevier.

van Eijck, J. 2007. Context and the composition of meaning. In H. Bunt and R. Muskens, eds., Computing Meaning 3, pages 173–193. Netherlands: Springer.

Vendler, Z. 1967. Verbs and Times. In Z. Vendler, ed., Linguistics in Philosophy. Ithaca: Cornell University Press.

Vermeulen, C. F. M. 1993. Merging without Mystery or: Variables in Dynamic Semantics. Journal of Philosophical Logic 24(4):405–450.

Zeevat, H. 1991. Aspects of Discourse Representation Theory and Unification Grammar. PhD thesis, Amsterdam University.

6

Treebanks and Mild Context-Sensitivity
WOLFGANG MAIER AND ANDERS SØGAARD†

Abstract

Some treebanks, such as German TIGER/NeGra, represent discontinuous elements directly, i.e. trees contain crossing edges, but the context-free grammars that are extracted from them fail to make any use of this information. In this paper, we present a method for extracting mildly context-sensitive grammars, i.e. simple range concatenation grammars (RCGs), from such treebanks. A measure for the degree of a treebank's mild context-sensitivity is presented and compared to similar measures used in non-projective dependency parsing. Our work is also compared to discontinuous phrase structure grammar (DPSG).

Keywords TREEBANKS, ANNOTATION, DISCONTINUOUS CONSTITUENTS, MILD CONTEXT-SENSITIVITY

6.1 Introduction

Discontinuous constituents (Huck and Ojeda, 1987) are common across natural languages, and they occur particularly frequently in languages with relatively free word order, such as German. In the following example, the discontinuity is caused by topicalization:

(1) Drei Papiere will ich heute noch schreiben
    three papers want I today still write
    'I still want to write three papers today.'

However, discontinuous constituents are also found in languages with relatively fixed word order, such as Chinese:

†Thanks to Laura Kallmeyer and the three anonymous reviewers for helpful comments and suggestions.


FG-2008. Philippe de Groote (Ed.). Copyright © 2008, CSLI Publications.


(2) shu1, wo zhi mai pian-yi-de t1.
    book1, I only buy cheap t1.
    'As for books, I only buy cheap ones.'

The constituent annotation schemata used in treebanks typically include some mechanism for treating discontinuous constituents. One of the most common ways is simply to use special labels. Consider, for instance, the following case of right node raising in the Penn Treebank (Marcus et al., 1994):

(S But
   (NP-SBJ-2 our outlook)
   (VP (VP has
           (VP been
               (ADJP *RNR*-1)))
       ,
       and
       (VP continues
           (S (NP-SBJ *-2)
              (VP to
                  (VP be
                      (ADJP *RNR*-1)))))
       ,
       (ADJP-1 defensive)))

FIGURE 1 A PTB tree

A reference is established between the raised constituent and its original sites by the special label *RNR* and by the coindexation (-1). The German Tübingen Treebank of Written German (TüBa-D/Z) (Telljohann et al., 2006), as another example, uses edge labels to establish the reference between parts of discontinuous constituents. This mechanism is supported by an additional level of topological field annotation. Figure 2 shows the annotation of (3).

(3) Schillen wies dies gestern zurück
    Schillen rejected that yesterday VPART
    'Schillen rejected that yesterday.'

‘Schillen rejected that yesterday.’

Here, the edge label V-MOD on the adverb (gestern) establishes a link to its referent, the verb wies. Similar conventions are found in the Spanish 3LB treebank (Civit and Martí Antonín, 2002), for example. The German NeGra (Skut et al., 1997) and TIGER (Brants et al., 2002) treebanks take a different approach. Both depart from annotation backbones based on context-free grammar and represent discontinuous constituents directly. Figure 3 shows the NeGra annotation for (4).


FIGURE 2 A TüBa-D/Z tree

(4) Darüber muß nachgedacht werden
    thereof must thought be
    'Thereof must be thought.'

FIGURE 3 A NeGra tree

The verb phrase darüber nachgedacht in (4) is a discontinuous constituent. The discontinuity is represented in NeGra by crossing edges. TIGER also uses crossing edges to represent discontinuities.

Most grammars extracted from TIGER/NeGra, if not all, have nevertheless been context-free. Conversions into context-free representations, however, introduce inconsistencies (Kübler et al., 2006, Boyd, 2007). Müller (2004) also shows in an experiment with two head-driven phrase structure grammars (HPSGs) for German that, in addition, the grammar that did not use discontinuous constituents led to around twice as many passive edges in parsing.

Simple RCGs are equivalent to linear context-free rewriting systems (LCFRSs) (Weir, 1988), as also shown in Boullier (1998). The derivation


structures of simple range concatenation grammars (RCGs) (Boullier, 1998) can be used as a unified approach to formally describe trees with discontinuous elements. So can the derivation structures of other similar grammar formalisms (see Sect. 6.4). Our choice of formalism is mainly motivated by the authors' recent work on using RCGs for parsing multicomponent tree-adjoining grammars (Lichte, 2007, Kallmeyer et al., 2008) and on using RCGs in syntax-based machine translation (Søgaard, 2008). The extraction of such derivation structures from treebanks is also a first step toward probabilistic RCGs.

In Sect. 6.2, RCGs and their derivation structures are introduced in some detail. Sect. 6.3 shows how to interpret TIGER/NeGra trees as RCG derivations, and how to extract the underlying simple RCGs. This enables the treebanks to serve as resources for probabilistic RCG parsing. A measure of mild context-sensitivity is then defined and compared to a similar notion from non-projective dependency parsing. In Sect. 6.4, ties to other grammar formalisms are discussed. Future work is also outlined.

6.2 Range concatenation grammars

In RCGs (Boullier, 1998), predicates can be negated (for complementation). If RCGs contain no negated predicates they are called positive RCGs. Since simple RCGs are included in the positive RCGs, negated predicates are ignored in the following.

Definition 1 [Positive RCGs] A positive RCG is a 5-tuple G = 〈N,T,V,P,S〉. N is a finite set of predicate names with an arity function ρ: N → ℕ, T and V are finite sets of terminal and non-terminal symbols. P is a finite set of clauses of the form

ψ0 → ψ1 . . . ψm

where each of the ψi, 0 ≤ i ≤ m, is a predicate of the form A(α1, . . . , αρ(A)). Each αj ∈ (T ∪ V)∗, 1 ≤ j ≤ ρ(A), is an argument. S ∈ N is the start predicate name with ρ(S) = 1.

Note that the order of RHS predicates in a clause is of no importance. Two subclasses of RCGs are introduced for further reference:

. An RCG G = 〈N,T,V,P,S〉 is simple iff for all c ∈ P, it holds that no variable X occurs more than once in the LHS of c, and if X occurs in the LHS then it occurs exactly once in the RHS, and each argument in the RHS of c contains exactly one variable.
. An RCG G = 〈N,T,V,P,S〉 is a k-RCG iff for all A ∈ N, ρ(A) ≤ k.
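As a quick check on the conditions defining simple RCGs, the following sketch uses a toy clause representation of our own (not from the paper): a predicate is a pair of a name and a list of arguments, each argument a list of symbols, with uppercase strings as variables and lowercase strings as terminals.

```python
# Sketch (our own toy encoding, not from the paper): a predicate is
# (name, args), where args is a list of arguments, each a list of
# symbols; variables are uppercase strings, terminals lowercase.

def variables(pred):
    """All variable occurrences in a predicate, left to right."""
    return [s for arg in pred[1] for s in arg if s.isupper()]

def is_simple(lhs, rhs):
    lhs_vars = variables(lhs)
    rhs_vars = [v for p in rhs for v in variables(p)]
    return (
        # no variable occurs more than once in the LHS ...
        len(lhs_vars) == len(set(lhs_vars))
        # ... every LHS variable occurs exactly once in the RHS ...
        and sorted(lhs_vars) == sorted(rhs_vars)
        # ... and every RHS argument is exactly one variable.
        and all(len(a) == 1 and a[0].isupper() for p in rhs for a in p[1])
    )

# S(XY) -> A(X,Y) is simple; a copying clause A(X) -> B(X) C(X) is not.
print(is_simple(("S", [["X", "Y"]]), [("A", [["X"], ["Y"]])]))      # True
print(is_simple(("A", [["X"]]), [("B", [["X"]]), ("C", [["X"]])]))  # False
```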

The language of RCGs is based on the notion of range. For a string w1 . . . wn a range is a pair of indices 〈i, j〉 with 0 ≤ i ≤ j ≤ n, i.e. a string span, which denotes a substring wi+1 . . . wj in the source string or a substring


vi+1 . . . vj in the target string. Only consecutive ranges can be concatenated into new ranges. Terminals, variables and arguments in a clause are bound to ranges by a substitution mechanism. An instantiated clause is a clause in which variables and arguments are consistently replaced by ranges; its components are instantiated predicates. For example A(〈g . . . h〉) → B(〈g+1 . . . h〉) is an instantiation of the clause A(aX1) → B(X1) if the target string is such that wg+1 = a. A derive relation =⇒ is defined on strings of instantiated predicates. If an instantiated predicate is the LHS of some instantiated clause, it can be replaced by the RHS of that instantiated clause. The language of an RCG G = 〈N,T,V,P,S〉 is the set L(G) = {w1 . . . wn | S(〈0,n〉) ∗=⇒ ε}, i.e. an input string w1 . . . wn is recognized if and only if the empty string can be derived from S(〈0,n〉).

Example 1 Let G = 〈{S,A}, {a,b}, {X,Y}, P, S〉 be a simple 2-RCG with P = {S(XY) → A(X,Y), A(aX,aY) → A(X,Y), A(bX,bY) → A(X,Y), A(ε,ε) → ε}. It is easy to see that L(G) = {ww | w ∈ {a,b}∗} (the copy language). Consider, for instance, a derivation of the string abab in G:

S(〈0,4〉) =⇒ A(〈0,2〉,〈2,4〉) =⇒ A(〈1,2〉,〈3,4〉) =⇒ A(〈ε〉,〈ε〉) =⇒ ε
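The derivation can be traced mechanically. The following sketch (our own illustration, using Python as a metalanguage) follows the clauses of G on ranges (i, j) of the input:

```python
# The clauses of the 2-RCG for the copy language, read directly as a
# recognizer over ranges (i, j) of the input string w.

def A(w, r1, r2):
    """A(r1, r2) derives eps iff both ranges spell the same substring."""
    (i, j), (k, l) = r1, r2
    if i == j and k == l:                 # clause A(eps, eps) -> eps
        return True
    if i < j and k < l and w[i] == w[k]:  # clauses A(aX,aY) / A(bX,bY)
        return A(w, (i + 1, j), (k + 1, l))
    return False

def S(w):
    """Clause S(XY) -> A(X, Y): try every split point of the input."""
    n = len(w)
    return any(A(w, (0, m), (m, n)) for m in range(n + 1))

print(S("abab"))  # True:  S(<0,4>) => A(<0,2>,<2,4>) => ... => eps
print(S("abba"))  # False: no split yields two equal halves
```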

6.2.1 RCG derivation structures

All possible parses of a string w with respect to some RCG G can be represented as a context-free grammar GD (Bertsch and Nederhof, 2001). Intuitively, this is achieved by introducing a context-free production for each possible instantiation of every clause in G with ranges of w, interpreting the instantiated predicates as non-terminal symbols of the resulting CFG. It allows for a packed representation of all parses, i.e. a shared forest or AND-OR graph (Billot and Lang, 1989). The derivation in Example 1 is, for instance, represented as:

S(〈0,4〉)

A(〈0,2〉,〈2,4〉)

A(〈1,2〉,〈3,4〉)

A(〈ε〉,〈ε〉)

ε


6.3 RCG derivation structure treebanks

6.3.1 Reconstructing clauses from derivation structures

The trees of both NeGra and TIGER can be interpreted as RCG derivations; in other words, these treebanks can be considered a resource for estimating probabilistic RCGs for German. Since the estimated RCGs are guaranteed to be simple, standard estimation procedures can be adopted, e.g. Kato et al. (2006) (Sect. 6.4.2). A method is presented below for extracting RCGs from treebanks with crossing edges.

Our goal is thus to interpret the treebank trees as RCG derivations. In order to do that in a meaningful way, we first have to identify the clauses the parse tree is composed of. To achieve that, different arguments of RCG clauses have to be identified. For each tree over some sentence w1 . . . wn, we extract a set of clauses. All clauses are counted and collected into a single grammar. The clauses for a single tree are extracted as follows. For each nonterminal node N with m daughters N′1, . . . , N′m, where N is not a preterminal, introduce a clause with an LHS predicate named N and m RHS predicates named N′1 to N′m. For each wi, 1 ≤ i ≤ n, we introduce a variable Xi. Then for the predicate N, the following conditions must hold:

. The arguments of N must contain no terminals,
. the concatenation of all variables in all arguments of N must be the concatenation of all X ∈ {Xi | N dominates wi} such that Xi precedes Xj if i < j,
. a variable Xi with 1 ≤ i < n is the right boundary of an argument of the predicate N iff Xi+1 ∉ {Xi | N dominates wi}, i.e., an argument boundary is introduced at each discontinuity.

The arguments of the RHS predicates are determined in the same way. Range variables which are adjacent on the LHS and the RHS are collapsed into single variables, which assures that the RHS predicates of the resulting clauses have a single variable per argument. For each preterminal node N dominating some terminal node wi, we introduce a clause N(wi) → ε, called a lexical clause. This procedure yields the following set of clauses for the tree in Figure 3:

PROAV(Darüber) → ε
VMFIN(muß) → ε
VVPP(nachgedacht) → ε
VAINF(werden) → ε

S(X1X2X3) → VP(X1,X3) VMFIN(X2)
VP(X1,X2X3) → VP(X1,X2) VAINF(X3)
VP(X1,X2) → PROAV(X1) VVPP(X2)

We can now reconstruct the RCG derivation, as in Figure 4. Note that


〈0,1〉 is Darüber, 〈1,2〉 is muß, 〈2,3〉 is nachgedacht and 〈3,4〉 is werden.

S(〈0,1〉,〈1,2〉,〈2,4〉)
  VP(〈0,1〉,〈2,4〉)
    VP(〈0,1〉,〈2,3〉)
      PROAV(〈0,1〉) =⇒ ε
      VVPP(〈2,3〉) =⇒ ε
    VAINF(〈3,4〉) =⇒ ε
  VMFIN(〈1,2〉) =⇒ ε

FIGURE 4 Interpretation of a NeGra tree as RCG derivation

It is easy to see that the RCGs extracted this way are all simple RCGs. This is a result of the fact that no string range can be part of more than one constituent in TIGER/NeGra. What differentiates the TIGER/NeGra annotation from a context-free annotation is merely the possibility to group all parts of discontinuous constituents under the same node, disregarding the possible intervening material.

6.3.2 Extracting range concatenation grammars

The algorithm described above was applied to TIGER/NeGra to extract simple RCGs. The dimensions of the treebanks are shown in Table 1.

        sent    cross           av slen  av nt
NeGra   20602   5853 (28.40%)   17.24    6.07
TIGER   50474   14114 (27.96%)  17.60    6.44

TABLE 1 Properties of NeGra and TIGER

TIGER is roughly 2.5 times as big as NeGra. The ratio of sentences with crossing edges (cross), the average sentence length (av slen) and the average number of nonterminals per sentence (av nt) are all comparable, however, which confirms a consistent application of the annotation guidelines. Table 2 shows some dimensions of the extracted grammars GN, i.e. the simple RCG extracted from NeGra, and GT, i.e. the simple RCG extracted from TIGER.

The most frequent clauses in the extracted grammars that involve discontinuities are listed below, i.e. the most frequent clauses with predicates of arity ≥ 2. Note how similar the lists are: the first most frequent clauses in the two grammars are identical; the second most frequent clause in GN is the third most frequent clause in GT; the third most frequent clause in GN is the second most frequent clause in GT; 4–6 are again identical; and finally, 7–9 in


                              GN (NeGra)   GT (TIGER)
total # of clauses            468,607      1,192,807
total # of different clauses  71,868       127,154
lexical clauses               52,747       92,731
non-lexical clauses           19,121       34,423

TABLE 2 Dimensions of extracted RCGs

GN are identical to, resp., 8, 10 and 7 in GT. Only 10 in GN is not in the top ten list in GT, and 9 in GT is not in the top ten list in GN.

 1  733  S(X1X2X3X4) → VP(X1,X4) VAFIN(X2) NP(X3)
 2  271  VP(X1,X2) → PP(X1) VVPP(X2)
 3  268  S(X1X2X3X4X5) → VP(X1,X3,X5) VAFIN(X2) NP(X4)
 4  236  S(X1X2X3X4) → VP(X1,X4) VAFIN(X2) PPER(X3)
 5  193  S(X1X2X3X4) → VP(X1,X3) NP(X2) VAFIN(X4)
 6  149  NP(X1X2,X3) → ART(X1) NN(X2) S(X3)
 7  148  NP(X1,X2) → PPER(X1) VP(X2)
 8  142  S(X1X2X3X4) → VP(X1,X4) VMFIN(X2) NP(X3)
 9  130  VP(X1,X2X3) → VP(X1,X2) VAINF(X3)
10  127  NP(X1,X2) → PPER(X1) S(X2)

TABLE 3 Most frequent clauses with predicates of arity≥ 2 in GN (NeGra)

 1  1996  S(X1X2X3X4) → VP(X1,X4) VAFIN(X2) NP(X3)
 2   790  S(X1X2X3X4X5) → VP(X1,X3,X5) VAFIN(X2) NP(X4)
 3   645  VP(X1,X2) → PP(X1) VVPP(X2)
 4   526  S(X1X2X3X4) → VP(X1,X4) VAFIN(X2) PPER(X3)
 5   454  S(X1X2X3X4) → VP(X1,X3) NP(X2) VAFIN(X4)
 6   401  NP(X1X2,X3) → ART(X1) NN(X2) S(X3)
 7   378  VP(X1,X2X3) → VP(X1,X2) VAINF(X3)
 8   364  NP(X1,X2) → PPER(X1) VP(X2)
 9   351  PP(X1,X2) → PROAV(X1) S(X2)
10   325  S(X1X2X3X4) → VP(X1,X4) VMFIN(X2) NP(X3)

TABLE 4 Most frequent clauses with predicates of arity≥ 2 in GT (TIGER)

6.3.3 Degree of mild context-sensitivity

The extracted RCGs give an intuitive picture of the constituents contained in the treebank trees: It is easy to see which subtree corresponds to a clause such as VP(X1,X2X3) → VP(X1,X2) VAINF(X3), and especially that two VPs with one interruption per yield each are involved. How can we classify the degree


of discontinuity (context-sensitivity, respectively) of our RCGs in a precise way?

Simple RCGs are, as already said, equivalent to LCFRSs. It follows that the languages L(GT) and L(GN) of the extracted grammars GT and GN are mildly context-sensitive. In order to make a more fine-grained statement about the degree of mild context-sensitivity of the treebanks, the notion of gap degree used in non-projective dependency parsing (Nivre, 2006, Kuhlmann and Nivre, 2006) is useful. In short, the gap degree of a dependency graph corresponds to the maximal number of interruptions in the projection of a node.

Definition 2 [Dependency graph] D is a dependency graph for some sentence s = w1, . . . , wn iff D = 〈V,E,L〉 is a labeled directed graph with V = {0, . . . , n} with a bijection f : {w1, . . . , wn} → V \ {0} such that f(wi) = i, E ⊆ V × (V \ {0}), and L : E → R is a labeling function from the set of edges to some set R of dependency types. We introduce the relation → (dominates). We write i → j if (i, j) ∈ E. ∗→ is the reflexive transitive closure of →. The set of nodes dominated by i is called the yield of i. We use πi to refer to the yield of i, arranged in ascending order. πi is called the projection of i. A dependency graph D is well-formed iff it is acyclic and connected, and the in-degree of all vertices is at most 1.

Definition 3 [Gap degree] Let D = 〈V,E,L〉 be a dependency graph. Let πi be the projection of some node i ∈ V.

1. For some i ∈ V, a gap is a pair (jk, jk+1) of nodes adjacent in πi such that jk+1 − jk > 1, i.e., a gap is a discontinuity in the projection of a node. The gap degree d of a node i in a dependency graph is the number of gaps in πi,

2. The gap degree d of a dependency graph D is the maximal gap degree of any of its nodes.

A dependency graph D is called projective if its gap degree is 0.
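Definition 3 is straightforward to operationalize. The sketch below uses our own encoding of a dependency graph as a head map (heads[i] is the head of word i, with 0 as the artificial root); it counts, for each word's projection, the discontinuities between adjacent positions.

```python
# Sketch following Definition 3, with a dependency graph given as a
# head map (our own encoding): heads[i] is the head of word i, and
# node 0 is the artificial root.

def yield_of(heads, i):
    """Nodes reachable from i via the dominance relation (reflexive)."""
    out = {i}
    for j, h in heads.items():
        if h == i:
            out |= yield_of(heads, j)
    return out

def gap_degree(heads):
    """Maximal number of discontinuities in any word's projection."""
    degs = []
    for i in heads:
        pi = sorted(yield_of(heads, i))              # projection of i
        degs.append(sum(1 for a, b in zip(pi, pi[1:]) if b - a > 1))
    return max(degs)

# Words 1..4; word 3 dominates {1, 3}, which is interrupted by 2:
print(gap_degree({1: 3, 2: 0, 3: 2, 4: 2}))  # 1 (non-projective)
print(gap_degree({1: 2, 2: 0, 3: 2}))        # 0 (projective)
```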

       PDT      DDT
d = 0  76.85%   84.95%
d = 1  22.72%   14.89%
d ≥ 2  0.43%    0.16%

TABLE 5 Gap degree of graphs in two dependency treebanks

Table 5 (from Kuhlmann and Nivre (2006)) shows the gap degree figures of the Prague Dependency Treebank (PDT) and the Danish Dependency Treebank (DDT). How can we transfer the notion of projectivity to constituent


structures? Our extracted RCGs give us easy access to the sentences in which terminal sequences dominated by some nonterminal node N are interrupted by material not dominated by N. Whenever there is discontinuous constituency, predicates with multiple arguments are extracted. Moreover, the number of arguments in the predicates in question reflects the minimum number of connected subtrees that span the intervening substring.

Call the number of arguments of a predicate in the RHS of a clause generated for some nonterminal node N in a treebank tree, minus one, the constituent gap degree c of the clause. The constituent gap degree of a treebank tree is the maximal constituent gap degree of one of its extracted clauses. Table 6 shows the constituent gap degree figures for NeGra and TIGER.

       NeGra            TIGER
c = 0  14,924 (72.44%)  36,573 (72.46%)
c = 1  4,991 (24.23%)   12,302 (24.37%)
c = 2  679 (3.30%)      1,585 (3.14%)
c = 3  8 (0.04%)        14 (0.03%)

TABLE 6 Constituent gap degree of TIGER/NeGra trees

If it is assumed that the linguistic phenomena that give rise to non-projectivity in dependency structures in dependency treebanks are similar to the phenomena that are described in terms of discontinuous constituents in constituent-based treebanks, the figures for all four treebanks can be compared. The fact that the figures for NeGra and TIGER are closer to each other than to the gap degree figures for the two dependency treebanks may be due to differences between annotation guidelines, or it may reflect structural differences between the languages in question. The difference between the gap degree figures in the two dependency treebanks suggests the latter. This hypothesis remains to be confirmed by an analysis of dependency versions of the TIGER/NeGra treebanks (Daum et al., 2004).

If the two measures are assumed to reflect exactly the same linguistic phenomena, our results indicate that discontinuous constituents are more frequent (modulo text types) in German than in Czech or Danish, and more frequent in Czech than in Danish. This result is consistent with the literature, e.g. Kübler et al. (2006). The difference between the frequency of discontinuous constituents in languages like Danish and German can also be shown by the ratio of translation units and discontinuous translation units in hand-aligned parallel corpora, e.g. in the Danish–English parallel corpus used in Buch-Kromann (2007) and the English–German parallel corpus used in Padó and Lapata (2006). It should be noted that these two parallel corpora differ


considerably in size, i.e. the Danish–English parallel corpus contains 4,729 sentences, whereas the English–German one only contains 650 sentences. It should also be noted that a discontinuous translation unit need not be a discontinuous constituent, and a discontinuous constituent need not always be treated as a discontinuous translation unit in parallel corpora, e.g. if it translates into a structurally similar translation unit. Nevertheless, the numbers in Table 7 are comparable to our results. The results are from the two parallel corpora just mentioned.

                TUs/DTUs
Danish–English  1.63%
English–German  7.36%

TABLE 7 Ratio of translation units (TUs) and discontinuous translation units (DTUs)in two parallel corpora

6.4 Related work

6.4.1 Discontinuous phrase structure grammar

Discontinuous phrase structure grammar (DPSG) warrants a separate comparison, since it is explicitly motivated by discontinuous constituency and has been used in practical applications. DPSG was introduced in Bunt et al. (1987) as an extension of context-free grammar that enables direct representation of discontinuous elements. Plaehn (1999, 2004) presents applications to treebank-based parsing.

The notion of discontinuous trees or discotrees is central to DPSG. See Figure 5 for an example of a discotree (a) and two of its subtrees (b) and (c).

Essentially, DPSG represents discontinuity in some subtree T rooted at some node r by specifying material not dominated by r alongside r's daughters. The DPSG productions that correspond to (b) and (c) are P → a[b]c and Q → b[c]d. DPSG rules extracted from a tree without discontinuous elements are simply context-free rules.

(a) the discotree: S dominates P and Q over the terminals a b c d, with crossing edges (P dominates a and c, Q dominates b and d)
(b) the subtree P over a b c
(c) the subtree Q over b c d

FIGURE 5 Discotree (a) and two of its subtrees (b), (c)

Certain DPSG productions seem somewhat unintuitive from a linguistic


FIGURE 6 An extracted relative clause in TIGER

point of view. In the case of extracted relative clauses, for instance, the relative clause is not in any way influenced by the material that may occur between itself and the modified noun. Nevertheless, a DPSG rule would in this case include all intervening material. A corresponding RCG clause, though, would simply separate the noun and the relative clause into two different arguments, allowing for intervening material, but not specifying it. Consider, for instance, the NeGra annotation for (5) in Figure 6.

(5) . . . und steckten alles in Brand, was nach staatlichen Einrichtungen aussah.
    . . . and set everything alight what after state facilities looked
    '. . . and set everything alight what looked like state facilities.'

A DPSG rule describing the relative clause would be NP → PIS [PP] S. An RCG clause describing the same datum would simply be NP(X,Y) → PIS(X) S(Y), with the immediate advantages that (i) already from the LHS of the clause, we know that we are dealing with a constituent with a single discontinuity, and that (ii) in the RHS, we do not have to specify exactly what it is that separates the relative clause from its dependent.

Note also that RCG has better worst-case complexity than DPSG. The complexity of the parsing algorithm in Plaehn (2004) is exponential, while there exist polynomial time parsing algorithms for RCG (Boullier, 2000, Villemonte de la Clergerie, 2002, Parmentier et al., 2008).

6.4.2 Linear context-free rewriting systems and multiple context-free grammars

Simple RCGs and LCFRSs are also equivalent to multiple context-free grammars (MCFGs) (Seki et al., 1991). For all three theories, available parsers exist:


LCFRS: Burden and Ljunglöf (2005)
MCFG: Kato et al. (2006), Kanazawa (2008)
RCG: Parmentier et al. (2008)

Kato et al. (2006) even present algorithms for parsing and estimation of probabilistic MCFGs.

The three theories are merely notational variants if MCFGs are assumed to be non-erasing, i.e. a variable of a function f must be used exactly once in the RHS of f. It is thus possible to compare these parsers, something left for future work for now.

6.5 Conclusion

In this paper motivation has been provided for the interpretation of two treebanks using annotation schemata with crossing edges, namely TIGER and NeGra, as collections of simple RCG derivation structures. Such interpretations also give us ready-to-use resources for the extraction of probabilistic RCGs. The degree of mild context-sensitivity of the RCGs extracted from the two treebanks was measured, and it was shown that our results are comparable to related results from dependency treebanks. Nivre (2006) remarks that not much work has been done on parsing discontinuous structures directly. Our current research involves using the extracted simple RCGs for probabilistic parsing.

Estimation and probabilistic parsing of simple RCGs is relatively simple if the spans in the complex labels are ignored and can be done with the techniques for MCFGs described in Kato et al. (2006). Estimation and probabilistic parsing of positive RCGs in general, however, is more complicated because of the copying of substrings that occurs when there are multiple occurrences of the same variable in a clause's RHS. What is needed, it seems, is to unravel the underlying simple derivation structures and estimate the probabilities of the unravelled trees separately. If the probability of a derivation structure in which substrings are copied n times is said to be p0 with p0^n = p1 × . . . × pn, tightness follows immediately.

References

Bertsch, Eberhard and Mark-Jan Nederhof. 2001. On the complexity of some extensions of RCG parsing. In Proceedings of the 7th International Workshop on Parsing Technologies, pages 66–77. Beijing, China.

Billot, Sylvie and Bernard Lang. 1989. The structure of shared forests in ambiguous parsing. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, pages 143–151. Vancouver, Canada.

Boullier, Pierre. 1998. Proposal for a natural language processing syntactic backbone. Rapport de Recherche RR-3342, Institut National de Recherche en Informatique et en Automatique, Le Chesnay, France.

Boullier, Pierre. 2000. Range concatenation grammars. In Proceedings of the 6th International Workshop on Parsing Technologies, pages 53–64. Trento, Italy.

Boyd, Adriane. 2007. Discontinuity revisited: An improved conversion to context-free representations. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, the Linguistic Annotation Workshop, pages 41–44. Prague, Czech Republic.

Brants, Sabine, Stefanie Dipper, Silvia Hansen, Wolfgang Lezius, and George Smith. 2002. The TIGER Treebank. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories, pages 24–42. Sozopol, Bulgaria.

Buch-Kromann, Matthias. 2007. Computing translation units and quantifying parallelism in parallel dependency treebanks. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, the Linguistic Annotation Workshop, pages 69–76.

Bunt, Harry, Jan Thesingh, and Ko van der Sloot. 1987. Discontinuous constituents in trees, rules and parsing. In Third Conference of the European Chapter of the Association of Computational Linguistics, pages 203–210. Copenhagen, Denmark.

Burden, Håkan and Peter Ljunglöf. 2005. Parsing linear context-free rewriting systems. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 11–17. Vancouver, British Columbia.

Civit, Montserrat and M. Antònia Martí Antonín. 2002. Design principles for a Spanish treebank. In Proceedings of the 1st Workshop on Treebanks and Linguistic Theories. Sozopol, Bulgaria.

Daum, Michael, Kilian Foth, and Wolfgang Menzel. 2004. Automatic transformation of phrase treebanks to dependency trees. In Proceedings of the 4th International Conference on Language Resources and Evaluation. Lisbon, Portugal.

Huck, Geoffrey and Almerindo Ojeda, eds. 1987. Discontinuous constituency. New York, New York: Academic Press.

Kallmeyer, Laura, Timm Lichte, Wolfgang Maier, Yannick Parmentier, and Johannes Dellert. 2008. Developing an MCTAG for German with an RCG-based parser. In Proceedings of the 6th International Conference on Language Resources and Evaluation. Marrakech, Morocco. To appear.

Kanazawa, Makoto. 2008. A prefix-correct Earley recognizer for multiple context-free grammars. In Proceedings of the Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9), pages 49–56. Tübingen, Germany.

Kato, Yuki, Hiroyuki Seki, and Tadao Kasami. 2006. RNA pseudoknotted structure prediction using stochastic multiple context-free grammar. IPSJ Digital Courier 2:655–664.

Kübler, Sandra, Erhard W. Hinrichs, and Wolfgang Maier. 2006. Is it really that difficult to parse German? In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 111–119. Sydney, Australia.

Kuhlmann, Marco and Joakim Nivre. 2006. Mildly non-projective dependency structures. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 507–514. Sydney, Australia: Association for Computational Linguistics.

Lichte, Timm. 2007. An MCTAG with tuples for coherent constructions in German. In Proceedings of the 12th Conference on Formal Grammar. Dublin, Ireland.

Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. 1994. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330.

Müller, Stefan. 2004. Continuous or discontinuous constituents? Research on Language & Computation 2(2):209–257.

Nivre, Joakim. 2006. Constraints on non-projective dependency parsing. In 11th Annual Meeting of the European Chapter of the Association for Computational Linguistics, pages 73–80. Trento, Italy.

Padó, Sebastian and Mirella Lapata. 2006. Optimal constituent alignment with edge covers for semantic projection. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 1161–1168.

Parmentier, Yannick, Laura Kallmeyer, Wolfgang Maier, Timm Lichte, and Johannes Dellert. 2008. TuLiPA: A syntax-semantics parsing environment for mildly context-sensitive formalisms. In Proceedings of the Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9). Tübingen, Germany.

Plaehn, Oliver. 1999. Probabilistic parsing with discontinuous phrase structure grammar. Diploma thesis, Dpt. of Computational Linguistics, Saarland University, Saarbrücken, Germany.

Plaehn, Oliver. 2004. Computing the most probable parse for a discontinuous phrase-structure grammar. In H. Bunt, J. Carroll, and G. Satta, eds., New Developments in Parsing Technology, pages 91–106. Kluwer Academic Publishers.

Seki, Hiroyuki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science 88:191–229.

Skut, Wojciech, Brigitte Krenn, Thorsten Brants, and Hans Uszkoreit. 1997. An annotation scheme for free word order languages. In Proceedings of the 5th Applied Natural Language Processing Conference, pages 88–95. Washington, District of Columbia.

Søgaard, Anders. 2008. Range concatenation grammars for translation. In Proceedings of the 22nd International Conference on Computational Linguistics. Manchester, England. To appear.

Telljohann, Heike, Erhard Hinrichs, Sandra Kübler, and Heike Zinsmeister. 2006. Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Technischer Bericht, Seminar für Sprachwissenschaft, Universität Tübingen, Tübingen. Revidierte Fassung.

Villemonte de la Clergerie, Éric. 2002. Parsing mildly context-sensitive languages with thread automata. In Proceedings of the 19th International Conference on Computational Linguistics, pages 1–7. Taipei, Taiwan.

Weir, David J. 1988. Characterizing mildly context-sensitive grammar formalisms. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA.

7

Toward a Universal Underspecified Semantic Representation

MEHDI HAFEZI MANSHADI, JAMES F. ALLEN, MARY SWIFT

Abstract

We define Canonical Form Minimal Recursion Semantics (CF-MRS) and prove that all the well-formed MRS structures generated by the MRS semantic composition algorithm are in this form. We prove that the qeq relationships are equivalent to outscoping relations when MRS structures are in this form. This result fills the gap between some underspecification formalisms and motivates defining a Canonical Form Underspecified Representation (CF-UR) which brings those underspecification formalisms together.

Keywords UNDERSPECIFICATION, SEMANTIC FORMALISMS, MINIMAL RECURSION SEMANTICS

7.1 Introduction

Several underspecification formalisms in semantic representation have been proposed during the last two decades, such as Quasi Logical Form (Alshawi and Crouch 1992), Hole Semantics (Bos 1996 and 2002), Minimal Recursion Semantics (Copestake et al. 2001), and Dominance Constraints (Egg et al. 2001). Recently there have been some efforts to bring these formalisms under a unified theory of underspecification.

Koller et al. (2003) define a back-and-forth translation between Hole Semantics and Dominance Constraints and show that under some specific restrictions (chain-connectedness and leaf-labeledness) the two formalisms generate the same number of solutions. Under these restrictions, however, the encoding is exact. By giving an example of a grammar, they also claim that all linguistically useful structures satisfy these restrictions.

Niehren and Thater (2003) give a translation from Minimal Recursion Semantics (MRS) to Dominance Constraints. They define the concept of nets and show that when an MRS is in this form, there is a one-to-one correspondence between the (bounded) scope-resolved structures of an MRS and minimal solved forms of its corresponding dominance net. In their approach, however, they treat MRS's qeq relationships, which are a restricted version of outscoping relations (see section 7.2 for details), as simple dominance (i.e. outscoping) relations. Fuchss et al. (2004) claim that in MRS nets, the additional power of qeq relationships is not necessary and replacing them by simple outscoping relationships does not affect the number of scope-resolved structures. Although experimental data (the output of the English Resource Grammar on the Redwoods corpus) supports this claim, there has been no theorem to show this equivalence rigorously. Furthermore, there are examples of coherent English sentences for which the MRS structure is not a net (Thater 2007). Therefore even if we accept that this equivalence holds for all MRS nets, the notion of net is not broad enough to cover all linguistically well-formed MRS structures.

FG-2008. Philippe de Groote (Ed.). Copyright © 2008, CSLI Publications.

In this paper we seek to prove the equivalence of qeq and outscoping relations for a class of MRS structures which we call Canonical Form MRS or CF-MRS. In a recent paper, Copestake et al. (2005) give an algorithm which applies to the syntactic tree of a sentence to build its MRS structure. Here, we define the notion of canonical form and show that every well-formed MRS structure which is generated by this algorithm is in this form. Then, we show a very useful property of canonical form: we prove that when an MRS structure is in this form, Fuchss et al. (2004)'s claim about the equivalence of qeq and outscoping relations holds. However, our approach has the following two advantages:

• The assumption that all the well-formed MRS structures occurring in practice are in canonical form is well-justified.
• We rigorously prove that the equivalence of qeq and outscoping relationships holds for every MRS structure which is in this form.

The notion of CF-MRS and the equivalence of qeq and dominance relations for this class of MRS structures motivate the definition of a universal underspecified semantic representation which we call Canonical Form Underspecified Representation or CF-UR. We define the notion of CF-UR and show the back and forth translation between CF-MRS and CF-UR. We leave the details of the translation between CF-UR and the other two formalisms (Hole Semantics and Dominance Constraints) for future work.

The rest of this paper is organized as follows. We review the definition of MRS (7.2). We define CF-MRS and prove that every MRS which is generated by the semantic composition algorithm is in canonical form (7.3). We give a formal definition of CF-MRS (7.4) and show that qeq and outscoping relationships are equivalent when MRS structures are in this form (7.5). The notion of CF-UR is defined in section (7.6).

7.2 Minimal Recursion Semantics

Elementary Predications (EPs) are the basic blocks of MRS. An EP is a labeled relation of the form

l: P(x1, x2, ..., h1, h2, ...)

where l is the label of the EP, P is the relation, x1, x2, ... are variables of the object language called non-scopal arguments (also referred to as ordinary variables), and h1, h2, ... are variables over the set of labels, called handle-taking arguments or holes of the EP. We use the term handle to include both holes and labels.

MRS recognizes three different types of EP. Non-scopal EPs are EPs with no hole. They model first-order predicates in the object language. Floating-scopal or quantifier EPs are of the form l:Q(x, hr, hb), where Q is the actual generalized quantifier; x is the variable quantified by Q; and hr and hb are holes for the restriction and the body of the quantifier, referred to as the restriction hole and body hole respectively. All other EPs are called fixed-scopal EPs. They model modal operators in the object language. The term scopal is used for both fixed and floating scopal EPs.

Consider the following bag of EPs for the example Every hungry dog probably chases a cat:

(1) {h1:Every(x,h7,h8), h2:Hungry(x), h2:Dog(x), h3:Probably(h9), h4:Chase(x,y), h5:A(y,h10,h11), h6:Cat(y)}

This example shows the three kinds of EP: non-scopal EPs Hungry(x), Dog(x), Chase(x,y), Cat(y); quantifier EPs Every(x,h7,h8), A(y,h10,h11); and the fixed-scopal EP Probably(h9). In this example, h1...h6 are labels and h7...h11 are holes.

A group of EPs with the same label is called an EP conjunction. For example, the EPs Dog(x) and Hungry(x) form an EP conjunction which can be thought of as the semantic fragment Dog(x)∧Hungry(x) if interpreted in first-order logic. Every MRS has a unique hole called the (global) top handle to mark the highest EP (or EP conjunction). There is also a set of (handle) constraints associated with every MRS that restrict how holes are equated with labels. Every handle constraint (or simply constraint) relates one hole to one label and is written h =q l, where h is a hole and l is a label. This handle constraint is satisfied iff either h = l or h = l1, where l1 is the label of a quantifier EP Q(x, h1, h2) and h2 =q l recursively holds. Handle constraints are also called qeq (equality modulo quantifier) relationships. In summary, an MRS structure (or simply an MRS) is a triple 〈GT, R, C〉 where GT is the global top handle, R is a bag of EPs and C is a set of handle constraints. As an example, the complete MRS structure for the above sentence is:

(2) 〈h0, {h1:Every(x,h7,h8), h2:Hungry(x), h2:Dog(x), h3:Probably(h9), h4:Chase(x,y), h5:A(y,h10,h11), h6:Cat(y)}, {h0 =q h3, h7 =q h2, h9 =q h4, h10 =q h6}〉

FIGURE 1 Two scope-resolved MRSs

Every MRS corresponds to a set of scope-resolved MRSs in which every hole is equated with some label and no label is equated with more than one hole. A scope-resolved MRS must form a tree of EPs (or EP conjunctions), in which dominance is determined by the outscoping¹ relation, and must satisfy all the qeq relationships. For example the scope-resolved MRS in (3) can be obtained from the above MRS using the equalities h0=h1, h7=h2, h8=h5, h10=h6, h11=h3 and h9=h4.

(3) {h0:Every(x,h2,h5), h2:Hungry(x), h2:Dog(x), h3:Probably(h4), h4:Chase(x,y), h5:A(y,h6,h3), h6:Cat(y)}

Scope-resolved MRSs are usually represented as tree structures. For example, the scope-resolved structure in (3) is represented as the tree shown in figure (1a). An MRS is called well-formed if it corresponds to at least one scope-resolved structure. A scope-resolved MRS is called bounded if every non-scopal argument (i.e. x, y, ...) is in the scope of its quantifier. It can be easily verified that the above MRS has six bounded scope-resolved structures, two of which are shown in figures (1a, b).
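The qeq definition above can be checked mechanically. The following sketch is our own encoding, not code from the paper; it verifies that the equalities behind (3) satisfy all the handle constraints of (2):

```python
# h =q l holds under an assignment I of holes to labels iff I(h) = l, or
# I(h) is the label of a quantifier EP and its body hole is qeq l in turn.
def satisfies_qeq(I, body_of, h, l):
    """I: hole -> label; body_of: quantifier label -> its body hole."""
    if I[h] == l:
        return True
    if I[h] in body_of:                      # a quantifier may intervene
        return satisfies_qeq(I, body_of, body_of[I[h]], l)
    return False

# MRS (2): quantifier EPs h1:Every(x, h7, h8) and h5:A(y, h10, h11)
body_of = {"h1": "h8", "h5": "h11"}
constraints = [("h0", "h3"), ("h7", "h2"), ("h9", "h4"), ("h10", "h6")]

# The equalities that turn (2) into the scope-resolved MRS (3)
I = {"h0": "h1", "h7": "h2", "h8": "h5",
     "h10": "h6", "h11": "h3", "h9": "h4"}

print(all(satisfies_qeq(I, body_of, h, l) for h, l in constraints))  # True
```

For h0 =q h3, for instance, the check descends through the body holes of Every and A before reaching Probably's label.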

7.3 Semantic composition algorithm

Copestake et al. (2005) give a semantic composition algorithm which converts a syntactic tree to an MRS. In this section we introduce the notion of Canonical Form MRS (CF-MRS) and prove that every well-formed MRS structure that is generated by this algorithm is in this form.

A Canonical Form MRS (CF-MRS) is an MRS which satisfies the following conditions:

¹ A label l (or its corresponding EP or EP conjunction) immediately outscopes a label l′ (or its corresponding EP or EP conjunction) iff l is the label of some EP P(...h...) and h=l′. Outscopes is the reflexive transitive closure of immediately outscopes.


• No quantifier EP is involved in an EP conjunction.
• The body hole of no quantifier EP is involved in any constraint.
• The label of no quantifier EP is involved in any constraint.
• Every other hole and label occurs in exactly one constraint.
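These conditions translate directly into a checker. The sketch below uses our own encoding (an EP as a label, a relation name, and its list of holes, with a quantifier's body hole listed last) and is not code from the paper; run on the MRS in (2), it confirms that (2) is canonical:

```python
def is_canonical(gt, eps, C, quant_labels):
    """gt: global top; eps: list of (label, relation, holes);
    C: dict hole -> label; quant_labels: labels of quantifier EPs."""
    labels = [lab for lab, _, _ in eps]
    body_holes = {hs[-1] for lab, _, hs in eps if lab in quant_labels}
    all_holes = {gt} | {h for _, _, hs in eps for h in hs}
    # 1. no quantifier EP is involved in an EP conjunction
    if any(labels.count(lab) > 1 for lab in quant_labels):
        return False
    # 2. no quantifier's body hole occurs in any constraint
    if body_holes & set(C):
        return False
    # 3. no quantifier label occurs in any constraint
    if set(quant_labels) & set(C.values()):
        return False
    # 4. every other hole and label occurs in exactly one constraint
    return (set(C) == all_holes - body_holes
            and sorted(C.values()) == sorted(set(labels) - set(quant_labels)))

eps = [("h1", "Every", ["h7", "h8"]), ("h2", "Hungry", []),
       ("h2", "Dog", []), ("h3", "Probably", ["h9"]),
       ("h4", "Chase", []), ("h5", "A", ["h10", "h11"]),
       ("h6", "Cat", [])]
C = {"h0": "h3", "h7": "h2", "h9": "h4", "h10": "h6"}
print(is_canonical("h0", eps, C, {"h1", "h5"}))  # True
```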

Theorem 1 Every well-formed MRS structure which is generated by the MRS semantic composition algorithm is in canonical form.

To prove this theorem, we need to describe the semantic composition algorithm. In order to do this, we define a partial MRS to be a 4-tuple 〈GT, LT, R, C〉, where LT is a new handle called the local top. As an initialization step, for every leaf of the syntactic tree (i.e. every word in the sentence), an MRS of the form 〈h0, h1, {h2:P(...)}, {}〉 is created, where the EP P comes from the lexicon and label h2 is a new distinct label. If P is floating scopal, h1 is a new distinct handle; otherwise, h1 = h2. Note that h0, the global top handle, would be the same for all the partial MRSs which are built during the semantic composition process. For example consider the sentence Every hungry dog frequently barks. (4) shows the partial MRSs built for the floating-scopal Every, the fixed-scopal Frequently and the non-scopal EP Bark.

(4) 〈h0, h1, {h2:Every(x, h3, h4)}, {}〉
〈h0, h5, {h5:Frequently(h6)}, {}〉
〈h0, h7, {h7:Bark(x)}, {}〉

Once a partial MRS is created for every leaf, the semantic composition algorithm moves up in the syntactic tree and for every interior node assigns the combination of its children's partial MRSs to that node. There are two kinds of MRS combination: scopal and intersective. Consider the two partial MRSs m1=〈h0, lt1, R1, C1〉 and m2=〈h0, lt2, R2, C2〉 and let m = 〈h0, lt, R, C〉 be their combination. If m1 has a scopal EP P(..., h, ...) which scopes over an EP in m2, the combination of m1 and m2 is a scopal combination, defined as:

(5) lt = lt1
R = R1 + R2
C = C1 ∪ C2 ∪ {h =q lt2}

Otherwise it is an intersective combination:

(6) lt = lt1 = lt2
R = R1 + R2
C = C1 ∪ C2

where h is the hole of the scopal EP in m1 and + means append. The definitions can easily be extended to the case where more than two MRSs are combined. It should be noted that the body hole of the quantifier EPs is ignored during semantic composition; that is, no handle constraint for the body of the quantifier EPs is created. Figure (2) shows how the partial MRS for each interior node is built using scopal and intersective combinations.

FIGURE 2 Semantic composition process for the sentence Every hungry dog frequently barks.

Once the algorithm gets to the root, if the partial MRS for the root is 〈h0, h1, R, C〉, it outputs 〈h0, R, C ∪ {h0 =q h1}〉 as the final MRS for the whole sentence. (7) shows the final MRS built for the above example.

(7) 〈h0, {h2:Every(x,h3,h4), h5:Hungry(x), h5:Dog(x), h1:Frequently(h8), h9:Bark(x)}, {h0 =q h1, h3 =q h5, h8 =q h9}〉
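The two combination operations can be sketched as follows. This is our own simplification, not the authors' code: the handle names are ours, and the identification lt = lt1 = lt2 in the intersective case is implemented by renaming m2's local top, a step the definitions leave implicit. On Every hungry dog frequently barks it yields an MRS matching (7) up to handle naming.

```python
# A partial MRS is (LT, R, C); the shared global top h0 is left implicit.
def scopal(m1, m2, hole):
    # (5): lt = lt1, R = R1 + R2, C = C1 u C2 u {hole =q lt2}
    lt1, r1, c1 = m1
    lt2, r2, c2 = m2
    return (lt1, r1 + r2, c1 | c2 | {(hole, lt2)})

def intersective(m1, m2):
    # (6): lt = lt1 = lt2, R = R1 + R2, C = C1 u C2
    lt1, r1, c1 = m1
    lt2, r2, c2 = m2
    r2 = [(lt1 if lab == lt2 else lab, ep) for lab, ep in r2]
    return (lt1, r1 + r2, c1 | c2)

# Leaves for "Every hungry dog frequently barks" (our handle names):
every  = ("h1", [("h2", "Every(x, h3, h4)")], set())   # floating scopal
hungry = ("h5", [("h5", "Hungry(x)")], set())
dog    = ("h6", [("h6", "Dog(x)")], set())
freq   = ("h7", [("h7", "Frequently(h8)")], set())     # fixed scopal
bark   = ("h9", [("h9", "Bark(x)")], set())

n  = intersective(hungry, dog)     # EP conjunction Hungry(x), Dog(x)
np = scopal(every, n, "h3")        # Every's restriction hole =q N
vp = scopal(freq, bark, "h8")      # Frequently's hole =q Bark
s  = intersective(np, vp)          # NP's floating top = VP's top
lt, r, c = s
final = ("h0", r, c | {("h0", lt)})   # root step adds h0 =q lt
print(sorted(final[2]))
# [('h0', 'h1'), ('h3', 'h5'), ('h8', 'h9')]
```

Note how the quantifier's floating local top ends up identified with the label of Frequently, while its body hole h4 never enters a constraint.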

Let M be the output of the semantic composition process for an arbitrary sentence with the syntactic tree T. Here we sketch the proof of theorem 1 by stating the following propositions. We leave the detailed proof of this theorem to the longer version of this paper.

Proposition 2 No quantifier EP in M is involved in an EP conjunction.

Proposition 3 Every hole in M is involved in at most one handle constraint. Furthermore, there is no handle constraint for the body hole of the quantifier EPs.

Proposition 4 Quantifier EP labels in M are not involved in any handle constraint. Every other label is involved in exactly one constraint.

Propositions 2 and 3 directly result from the definition of the algorithm. To prove proposition 4, we first show the following lemma.

Lemma 5 The local top of every node in the tree is involved in exactly one handle constraint in M.

Proof. Consider a node v in the tree with the local top ltv and the parent u. From the definition of combination, either ltv is involved in a handle constraint in Mu (the MRS of the parent) or it is the local top of u as well. The same argument can be applied to the node u. If ltv is not involved in any handle constraint until we hit the root, then the termination step of the algorithm adds the constraint h0 =q ltv. This proves that every local top is involved in at least one constraint. On the other hand, every local top can occur in at most one constraint, because once a local top gets into a handle constraint, it cannot be the local top of the parent or any of its ancestors. □

From the above lemma and the facts that the labels of all EPs except quantifier EPs are a local top at the leaf level (refer to the initialization step of the algorithm) and that the quantifier EP labels are never a local top, proposition 4 is proved.

From propositions 2, 3 and 4, we can see that the number of holes is greater than or equal to the number of labels in M (note that we count the global top handle as a hole). On the other hand, in every well-formed MRS structure the number of labels is always greater than or equal to the number of holes.² These two facts lead us to the following proposition:

Proposition 6 The number of holes is equal to the number of labels in M, or M is not well-formed.

In conjunction with propositions 4 and 6, proposition 3 results in the following corollary:

Corollary 7 If M is well-formed, every hole which is not the body hole of a quantifier is involved in exactly one constraint in M.

Theorem 1 directly results from propositions 2, 3, and 4 and corollary 7.³

Although not mentioned in Copestake et al. (2005), it seems that in practice a slightly modified version of this algorithm is used, where in the scopal combination an equality constraint can be added instead of a normal handle constraint. This allows labels to occur as an argument of a scopal EP in an MRS. That is, an MRS can have two EPs of the form l1:P1(..h..) and l2:P2(...) with an equality h = l2. In this case, we can collapse the two EPs into one EP whose arguments are the union of the arguments of the two EPs excluding the hole h. For example the EPs l1:P1(x, h1, l2) and l2:P2(y, h3, h4) can be transformed into a single EP l1:P1-2(x, y, h1, h3, h4). While this transformation does not affect the number of interpretations, it conforms to theorem 1 and lets us keep the same definition for CF-MRS.
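The collapsing step can be sketched as below. This is a hypothetical helper, not code from the paper; we simply concatenate the remaining arguments, whereas the paper's P1-2 lists them in a slightly different order, which is immaterial:

```python
# Collapse l1:P1(..h..) and l2:P2(...) related by the equality h = l2:
# the filled argument l2 is dropped and the remaining arguments are kept.
def collapse(ep1, ep2):
    """Each EP is (label, relation, args); ep2's label occurs in ep1's args."""
    l1, p1, args1 = ep1
    l2, p2, args2 = ep2
    return (l1, p1 + "-" + p2, [a for a in args1 if a != l2] + args2)

print(collapse(("l1", "P1", ["x", "h1", "l2"]),
               ("l2", "P2", ["y", "h3", "h4"])))
# ('l1', 'P1-P2', ['x', 'h1', 'y', 'h3', 'h4'])
```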

² This is true because in order to build a scope-resolved structure, every hole must be equated with some label and no label can be equated with more than one hole.

³ We have made an implicit assumption in proving theorem 1 which needs to be clarified here. We have assumed that in the scopal combination (equation 5) lt2 is always a label; however, this is not the case if the grammar is not linguistically meaningful (for example consider the case where we have a quantifier EP Every as a leaf node and a quantifier EP Some as its parent which scopes over Every). However, in this case, in the final MRS M, there is a handle constraint between two handles neither of which is a label in M. This contradicts the definition of handle constraints, which are required to relate one hole to one label; hence such an MRS can be considered an ill-formed MRS structure. As a result, theorem 1 remains valid even when the grammar is not linguistically meaningful.


7.4 Canonical Form MRS

In section 7.3, we defined the notion of canonical form MRS as a subset of MRS structures; however, the definition of MRS given in section 7.2, following the strategy of the standard MRS formalism, was not formal and precise. Here, in order to give a rigorous proof for the equivalence of qeq and outscoping relationships in CF-MRS, we need a mathematically clean definition of all the concepts. Therefore, in this section we define the notion of CF-MRS more formally as an independent concept. Note that most of the concepts which were already defined in section 7.2 are redefined in this section in a more formal fashion.

Definition 1 A CF-MRS is a triple 〈GT, R, C〉 where R is a set of EPs as defined in section 7.2; GT is a unique hole which does not occur in any argument position in R⁴; and C, the set of (handle) constraints, is a bijection from H−Hb to L−Lq, in which H and L are the sets of all the holes and labels in R respectively; Hb is the set of body holes of the quantifiers; and Lq is the set of labels of the quantifier EPs. We require that every label and hole (except GT) occurs exactly once in R.⁵

In order to get a more intuitive representation, we introduce a graph representation for CF-MRS. The graph of a CF-MRS is a directed graph with two types of node and two types of edge. Every label/hole in the CF-MRS is represented as a single label/hole node in the graph. Solid edges connect the labels of EPs to their holes, and dotted edges represent handle constraints; that is, every handle constraint (h, l) in C is represented as a dotted edge from hole node h to label node l. As an example, figure (3) gives the graph representation of the CF-MRS shown in (8) for the sentence Every dog probably chases some cat.⁶

(8) 〈h0, {l1:Every(x, h1, h2), l2:Dog(x), l3:Probably(h3), l4:Chase(x, y), l5:Some(y, h4, h5), l6:Cat(y)}, {(h0,l3), (h1,l2), (h3,l4), (h4,l6)}〉

As shown in this figure, every label node in the graph is labeled with its corresponding EP, dropping the label and the handle-taking arguments when there is no ambiguity. Note that in the graphical representation, we order the solid outgoing edges of every node from left to right based on the position of the corresponding handle-taking argument in the EP; for example, the restriction hole of a quantifier always lies on the left side of its body hole in the graphical representation.

FIGURE 3 An MRS graph

⁴ Note that by hole we mean a variable over the set of labels; therefore the set of holes of a CF-MRS is the set of handle-taking arguments plus the global top GT.

⁵ This is not a limitation in CF-MRS, as it is always possible to collapse the set of all the EPs which share the same label (i.e. an EP conjunction) to a single EP whose arguments are the union of all the arguments of all the EPs in the conjunction. For example the EPs l1:P1(x1, h1) and l2:P2(x2, h2) can be collapsed into one EP of the form P1-2(x1, x2, h1, h2) where P1-2 is a new relation corresponding to the conjunction of the two relations P1 and P2.

⁶ We have removed the arrows from the edges in the graphical representation of CF-MRS and scope-resolved structures throughout this paper, since the direction is clear from the context.

Definition 2 Every bijection from H, the set of holes, to L, the set of labels, is called a label assignment.

Definition 3 Given a CF-MRS M, a scope-resolved structure for M is the pair 〈M, I〉 where I is a label assignment which satisfies all the constraints in M.

Here, we give two different interpretations of a handle constraint. The first interpretation is the standard definition of handle constraints in MRS, i.e. qeq relationships. As before, when interpreted as a qeq relationship, a handle constraint (h, l) in C is represented as h =q l. A label assignment I satisfies this qeq relationship iff either I(h)=l or I(h)=l′, where l′ is the label of a quantifier EP Q(x, hr, hb) and I recursively satisfies hb =q l. The second interpretation of a handle constraint is the outscoping relation; that is, I satisfies the constraint (h, l) (shown as h ≤ l for this case) iff either I(h)=l or I(h)=l′, where l′ is the label of some EP P(...h′...) and I recursively satisfies h′ ≤ l. To differentiate between the two possible definitions of a scope-resolved structure, we define the following two versions of a scope-resolved structure.

Definition 4 We call a scope-resolved structure standard when handle constraints are treated as qeq relationships, and call it simple when they are considered outscoping relations.

The graph of a scope-resolved structure is built by removing all the dotted edges in the original graph and merging every hole node h with the label node I(h). More precisely, the graph of a scope-resolved structure 〈M, I〉 is G = (V, E) where there is exactly one node v in V corresponding to every hole hv in H and its corresponding label lv = I(hv). Here, for the benefit of section 7.5, and to emphasize that every node v in V corresponds to exactly one hole and one label of M, we represent the vertices of G as big circles with a dot in the center. For example, figure (4) gives two scope-resolved structures for the CF-MRS given in (8). From the above definitions, it is easy to see that the graph of a scope-resolved structure for every CF-MRS is always a tree.⁷ In this paper, whenever we refer to the holes and labels of a scope-resolved tree or its subtrees, we mean the holes and labels in M which correspond to the nodes of that tree/subtree. For example, the holes and the labels of the subtree rooted at the node Probably in figure (4a) are h5, h3 and l3, l4 respectively (refer to the CF-MRS given in (8)).

FIGURE 4 Two scope-resolved structures

FIGURE 5 A simple but not standard scope-resolved structure

Note that the trees shown in figure (4) are both standard and simple scope-resolved structures. In section 7.5, we show that this is not a coincidence, but that for every CF-MRS structure this property holds; that is, every simple scope-resolved structure is also a standard one and vice versa. Figure (5a,b), on the other hand, shows the graph representation of an MRS (which is not a CF-MRS) and one of its simple scope-resolved structures that is not a standard one.⁸

Definition 5 A non-quantifier EP l1:P(...) is said to be dependent on the quantifier EP l2:Q(x, hr, hb) iff x is an argument of P. We say that a scope-resolved MRS M satisfies this dependency constraint iff l2 outscopes l1 in M.⁹

⁷ Since we order the outgoing solid edges of every label node in the CF-MRS graph, the graph of a scope-resolved structure is actually an ordered tree.

⁸ Note that the graph representation and the concepts of simple and standard scope-resolved structures for a general MRS can be defined in exactly the same way as they were defined for CF-MRS.

FIGURE 6 General structure of CF-MRS

Definition 6 A scope-resolved structure is called bounded if it satisfies all the dependency constraints carried by the non-scopal arguments of the EPs.

As the final point in this section, note that from the definition of CF-MRSevery CF-MRS is a forest of exactly n+1 trees (n is the number of quantifiers)whose roots are the global top handle and the quantifier labels, as shown infigure (6).

7.5 Equivalence of qeq and outscoping relationships in CF-MRSTo prove this equivalence we need to prove the following theorem:

Theorem 8 Given an arbitrary CF-MRS M=〈h0, R, C〉, T=〈M, I〉 is a simplescope-resolved structure if and only if T is a standard scope-resolved struc-ture.

The if direction is trivial as qeq relation always implies outscoping. In orderto prove the only if direction, we use the following lemma:

Lemma 9 Let T=〈M, I〉 be a simple scope-resolved structure and T′ be asubtree10 of T with no quantifier’s body hole. For every hole h in T′ we haveI(h) = C(h)11 .

Proof. We prove this using induction on the depth of T′, d. If d = 0, T′ is asingle leaf node u of T which corresponds to some hole h in M. Because h isnot the body hole of some quantifier, there is some label l suchthat C(h)=l orequivalently h≤l. Since u is a leaf node, this constraint is satisfied in T onlyif I(h)=l, which implies I(h)=C(h).

Now suppose that d > 0; let u be the root of T′, let u1, u2, ..., uk be the children of u, and let T1, T2, ..., Tk be the subtrees of T rooted at u1 ... uk (figure 7). Based on the induction assumption, for every hole h′ in T1 ... Tk, we have I(h′) = C(h′). Therefore, all we need to show is that I(hu) = C(hu), where hu is the hole corresponding to the node u. Assume to the contrary that I(hu) ≠ C(hu). Because hu is not the body hole of some quantifier, there is some label l such that C(hu) = l. In order to satisfy the constraint hu ≤ l, there must be some hole h′ in one of the subtrees T1 ... Tk such that I(h′) = l. But we already saw that for every h′ in these subtrees I(h′) = C(h′). This implies C(h′) = I(h′) = l = C(hu), which is a contradiction because C is a one-to-one function. □

⁹Refer to footnote 1 for the definition of outscoping.

¹⁰In this paper, by subtree of a tree T we mean a node with all of its descendants in T.

¹¹Since C, the set of constraints, is a function, if (h, l) ∈ C we can refer to l as C(h).

FIGURE 7 Inductive proof of lemma 9

FIGURE 8 Inductive proof of theorem 8

Proof of the main theorem: Let T = 〈M, I〉 be a simple scope-resolved structure; using induction on n, the number of quantifiers, we show that T is a standard scope-resolved structure as well. For n = 0, there is no quantifier in T; therefore, according to lemma 9, for every hole h in T, I(h) = C(h), which means that if we treat all the constraints as qeq relationships, I satisfies all these constraints; hence T is also a standard scope-resolved structure.

Now let n > 0, and consider an arbitrary CF-MRS M with n quantifiers, as shown in figure (6). There is a quantifier node in T = 〈M, I〉 that does not outscope any other quantifier (for example, the deepest quantifier in T). Without loss of generality, let us assume that Qn has this property, and call the tree rooted at Qn in T Tn. We call the tree rooted at the left child of Qn in T the restriction tree and the tree rooted at the right child of Qn the body tree of Qn, and represent them by Tr and Tb respectively (figure 8).


There is no quantifier in Tr; intuitively, this means that Tr is the tree tn (figure 6) in which every hole is merged with its paired label. More precisely, according to lemma 9, for every hole h in Tr, I(h) = C(h). Therefore, if we treat all the constraints in tn (refer to figure 6) as qeq relationships, these constraints are satisfied in Tr. Let us detach the tree Tn from T, replace it with Tb, and call the new tree T′ (figure 8). It is easy to see that T′ is a simple scope-resolved structure for the CF-MRS M′ with n−1 quantifiers (that is, the CF-MRS shown in figure (6) without the whole tree rooted at Qn). To see why, first note that there is a one-to-one correspondence between the nodes of T′ and the holes/labels in M′. Second, every handle constraint in M′ is satisfied in T, and hence is satisfied in T′ as well (because the transformation in figure 8 does not violate any outscoping relation).

T′ is a simple scope-resolved structure for M′; therefore, based on the induction assumption, T′ is also a standard scope-resolved structure. This means that if we treat all the handle constraints in M′ as qeq relationships, they are all satisfied in T′. But moving from T′ to T (by replacing back the node Qn and the subtree Tr; see figure 8) does not violate any qeq relationship which already holds in T′. On the other hand, we already saw that if we treat all the handle constraints in tn as qeq relationships, they are satisfied in Tr. As a result, if we treat all the handle constraints in M as qeq relationships, they are all satisfied in T. Hence T is also a standard scope-resolved structure.

7.6 Canonical Form Underspecified Representation

This result motivates a universal Canonical Form Underspecified Representation (CF-UR) similar to the CF-MRS structure in figure (6). In CF-MRS, however, the dependency constraints are encoded in the non-scopal arguments, while in Hole Semantics and Dominance Constraints all the constraints are explicitly expressed in the underspecified representation using outscoping constraints. The equivalence of qeq and outscoping relationships in CF-MRS allows us to do the same thing in CF-UR. We can remove all the non-scopal arguments and represent dependency constraints using outscoping constraints between the label of every quantifier EP and the labels of the non-quantifier EPs which are dependent on that quantifier. Figure (9) shows the graph representation of the CF-MRS given in (8), in which both handle and dependency constraints are shown using dotted edges.

FIGURE 9 CF-MRS with explicit dependency constraints

In addition, we label the dependency edges (i.e. the edges which correspond to the dependency constraints) with integers. This is necessary in order to keep the information that states which argument position in an EP is filled by which variable. In this example, the edge between Every and Chase is labeled 1, which shows that the variable quantified by Every, say x, fills the first non-scopal argument position of the EP Chase. In general, instead of numbers we can label the dependency edges using a set of predefined roles. For example, the edge between Every and Chase can be labeled by the role agent and the edge between Some and Chase can be labeled by the role theme. For similar reasons, the edges from Every to Dog and from Some to Cat are necessary, as they encode the non-scopal arguments of each predicate. However, since these predicates have only one argument, we haven’t shown the integer label for these two edges.

More formally, we define a CF-UR as a 6-tuple 〈L, H, F, T, C, A〉 where L is a set of labels; H is a set of variables over labels called holes; F is a set of labeled formulas consisting of two types: the predications of form li:Pi(h1, h2, ..., hk) and the quantifications of form l′i:Qi(h′1, h′2), where li, l′i ∈ L and h1, h2, ..., hk, h′1, h′2 ∈ H; and T is a unique hole in H called top which does not occur in any argument position in F. We require that every label and every hole (except T) occurs exactly once in F. Therefore in a CF-UR, no two formulas can be labeled by the same label and no two argument positions can be filled by the same hole. We define LQ as the set of all the labels which label some quantification in F and LP as the set of all other labels. We also define Hb as the set of all the holes which occur in the second argument position of some quantification (called body holes) and HC as the set of all other holes. C is a relation over H ∪ L and L, called the set of constraints; more precisely, C = CH ∪ CL, where CH is a bijection from HC to LP and CL is a relation over LQ and LP. Intuitively, CH and CL are equivalent to the sets of handle and dependency constraints in a CF-MRS respectively. Finally, A is a total function from CL to ROLES, where ROLES is a set of predefined roles (such as {agent, theme, ...}). Intuitively, A specifies the role of the argument position which is encoded by every dependency constraint. ROLES can also be defined as the set of positive integers. In this case, A represents the argument position that every dependency constraint encodes, and we need to force the condition that ((l1, l2), i) ∈ A only if for every 0 < j < i, there is some label l such that ((l, l2), j) ∈ A.
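To make this definition concrete, the following Python sketch (ours, not the paper's) encodes a CF-UR as a small data structure with a well-formedness check for the conditions just stated; the field and function names are our own, and the role function A is omitted for brevity:

```python
# Illustrative encoding of a CF-UR (field names are ours, not the paper's).
# `formulas` plays the role of F: it maps each label to its list of hole
# arguments; quantifications are exactly the labels in L_Q, whose second
# hole argument is the body hole.
from dataclasses import dataclass

@dataclass
class CFUR:
    formulas: dict      # F: label -> list of hole arguments
    lq: set             # L_Q: labels of quantifications
    top: str            # T: the unique top hole
    ch: dict            # C_H: hole -> label handle constraints
    cl: set             # C_L: label-to-label dependency constraints

    def check(self):
        holes = [h for args in self.formulas.values() for h in args]
        # every hole except T occurs exactly once in F, and T not at all
        assert len(holes) == len(set(holes)) and self.top not in holes
        lp = set(self.formulas) - self.lq            # L_P: non-quantifier labels
        hb = {self.formulas[q][1] for q in self.lq}  # H_b: body holes
        hc = (set(holes) | {self.top}) - hb          # H_C: all other holes
        # C_H must be a bijection from H_C to L_P
        assert set(self.ch) == hc and set(self.ch.values()) == lp
        assert len(self.ch) == len(set(self.ch.values()))
        # C_L relates quantifier labels to non-quantifier labels
        assert all(l1 in self.lq and l2 in lp for l1, l2 in self.cl)
        return True

# The CF-UR given in (9), without the role function A:
u9 = CFUR(
    formulas={"l1": ["h1", "h2"], "l2": [], "l3": ["h3"],
              "l4": [], "l5": ["h4", "h5"], "l6": []},
    lq={"l1", "l5"}, top="h0",
    ch={"h0": "l3", "h1": "l2", "h3": "l4", "h4": "l6"},
    cl={("l1", "l2"), ("l1", "l4"), ("l5", "l4"), ("l5", "l6")})
assert u9.check()
```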

In order to have a more intuitive representation, we usually represent a CF-UR U = 〈L, H, F, T, C, A〉 as a directed graph GU = (L ∪ H, S ∪ C), where S is the set of all the pairs (l, h) such that h is an argument of the formula labeled by l. The nodes in L and H are represented as dots and holes respectively, and the edges in S and C are represented using solid and dotted edges respectively. We label the dependency edges (i.e. label-to-label dotted edges) by their corresponding role (or argument position) specified by the function A. We also order the outgoing solid edges of every label node from left to right based on the position of the hole (in the corresponding labeled formula) to which the edge is connected. For example, the graph shown in figure (10) is the graphical representation of the CF-UR in (9), in which for the purpose of clarity we labeled every label node with its corresponding labeled formula.

FIGURE 10 Graphical representation of CF-UR

(9) U = 〈{l1, l2, l3, l4, l5, l6}, {h0, h1, h2, h3, h4, h5}, {l1:Every(h1, h2), l2:Dog, l3:Probably(h3), l4:Chase, l5:Some(h4, h5), l6:Cat}, h0, {h0≤l3, h1≤l2, h3≤l4, h4≤l6, l1≤l2, l1≤l4, l5≤l4, l5≤l6}, {((l1, l2), of), ((l1, l4), agent), ((l5, l4), theme), ((l5, l6), of)}〉

Note that in this example, we have assumed ROLES is a set of predefined roles which includes the three roles agent, theme and of. As shown in (9), we usually represent the ordered pairs (x, y) in C as x ≤ y. From the above definitions, it is easy to see that the labels in LQ and the hole T are the only roots (i.e. nodes with no incoming edge) of a CF-UR graph.

Given a CF-UR, any bijection from H to L is called a label assignment. A label assignment I satisfies a constraint x ≤ y iff

- when x is a hole and y is a label: either I(x) = y, or I(x) = z and I recursively satisfies z ≤ y;
- when x and y are both labels: either x = y, or x:Fi(...z...) is in F and I recursively satisfies z ≤ y.

A tuple 〈U, I〉, where U is a CF-UR and I is a label assignment which satisfies all the constraints in C, is called a solution of U. As with CF-MRS, sometimes we use a graphical representation to represent a solution. The graph of a solution 〈U, I〉 is built by taking the graph of the CF-UR and merging every hole node h with the label node I(h).

FIGURE 11 Two solutions for the CF-UR given in figure (9)

For example, the CF-UR in (9) has 6 possible solutions, two of which are shown in figure (11a,b),¹² one with I = {(h0, l1), (h1, l2), (h2, l5), (h3, l4), (h4, l6), (h5, l3)} and one with I = {(h0, l5), (h1, l2), (h2, l4), (h3, l1), (h4, l6), (h5, l3)}.
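The notion of a solution can be checked mechanically. The following brute-force sketch (our own illustration, not an algorithm from the paper) enumerates the bijections from H to L for the CF-UR in (9) and keeps those satisfying every constraint under the two clauses above:

```python
from itertools import permutations

FORMULAS = {"l1": ["h1", "h2"],   # l1:Every(h1, h2)
            "l2": [],             # l2:Dog
            "l3": ["h3"],         # l3:Probably(h3)
            "l4": [],             # l4:Chase
            "l5": ["h4", "h5"],   # l5:Some(h4, h5)
            "l6": []}             # l6:Cat
HOLES = ["h0", "h1", "h2", "h3", "h4", "h5"]
CONSTRAINTS = [("h0", "l3"), ("h1", "l2"), ("h3", "l4"), ("h4", "l6"),
               ("l1", "l2"), ("l1", "l4"), ("l5", "l4"), ("l5", "l6")]

def satisfies(I, x, y, seen=frozenset()):
    """x <= y under label assignment I, per the two clauses above."""
    if x in seen:                 # cyclic assignment: constraint unsatisfiable
        return False
    if x in I:                    # x is a hole: I(x) = y, or descend via I(x)
        return I[x] == y or satisfies(I, I[x], y, seen | {x})
    if x == y:                    # x and y are both labels
        return True
    return any(satisfies(I, h, y, seen | {x}) for h in FORMULAS[x])

def solutions():
    for labels in permutations(FORMULAS):         # bijections H -> L
        I = dict(zip(HOLES, labels))
        if all(satisfies(I, x, y) for x, y in CONSTRAINTS):
            yield I

print(len(list(solutions())))     # 6, matching the count in the text
```

The six surviving assignments correspond to the six relative scopings of Every, Some, and Probably along the spine above Chase.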

We have shown how a CF-MRS can be converted to a CF-UR. The inverse is straightforward. The corresponding CF-MRS of a CF-UR U = 〈L, H, F, T, C, A〉 is a tuple MU = 〈T, R, C′〉 in which R is the set of all the EPs of the form li:Qi(xi, hri, hbi) where li:Qi(hri, hbi) is a quantification in F, and of the EPs of the form lk:Pk(xi, xj, ..., h1, h2, ...) where lk:Pk(h1, h2, ...) is a predication in F and xi is an argument of the EP lk:Pk(...) if and only if li ≤ lk is in C. Finally, C′ is the subset of C which includes all the hole-to-label (but not any label-to-label) constraints in C. Every (h, l) (or equivalently h ≤ l) in C is represented as h =q l in C′. Trivially MU is a CF-MRS; hence the qeq relationships are equivalent to outscoping constraints in MU. Using this fact, it can be easily seen that every bounded scope-resolved structure of MU corresponds to exactly one distinct solution of U and vice versa. As a result, there is a one-to-one correspondence between the bounded scope-resolved structures of MU and the solutions of U.

In a forthcoming paper we show the back-and-forth translation between CF-UR and the two other formalisms, Hole Semantics and Dominance Constraints.

7.7 Conclusion

We defined Canonical Form MRS and showed that every well-formed MRS which is generated by this algorithm is in this form. We have shown that for CF-MRS, the qeq relationship is equivalent to the dominance relationship. Based on this result, we have proposed a universal underspecified representation, called Canonical Form Underspecified Representation or CF-UR. In a forthcoming paper, we will show how this single representation can be translated back and forth between other semantic formalisms such as Dominance Constraints and Hole Semantics.

¹²Although dependency edges are part of the graph, for the purpose of clarity, we have only shown the solid edges in these figures.

Acknowledgment

This work was supported in part by a grant from the National Science Foundation (#0748942), by the Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-07-D-0185, and by the Office of Naval Research grant No. N000140510314.

References

Alshawi, H. and R. Crouch. 1992. Monotonic semantic interpretation. In Proc. 30th ACL, pages 32–39.

Bos, J. 1996. Predicate logic unplugged. In Proc. 10th Amsterdam Colloquium, pages 133–143.

Bos, J. 2002. Underspecification and Resolution in Discourse Semantics. PhD Thesis. Saarland University.

Copestake, A., D. Flickinger, C. Pollard, and I. Sag. 2005. Minimal Recursion Semantics: An introduction. Research on Language and Computation 3:281–332.

Copestake, A., A. Lascarides, and D. Flickinger. 2001. An algebra for semantic construction in constraint-based grammars. In ACL-01, Toulouse, France.

Egg, M., A. Koller, and J. Niehren. 2001. The constraint language for lambda structures. Journal of Logic, Language, and Information 10:457–485.

Fuchss, R., A. Koller, J. Niehren, and S. Thater. 2004. Minimal Recursion Semantics as dominance constraints: Translation, evaluation, and analysis. In Proc. ACL-04, Barcelona, Spain, pages 247–254.

Koller, A., J. Niehren, and S. Thater. 2003. Bridging the gap between underspecification formalisms: Hole semantics as dominance constraints. In EACL-03.

Niehren, J. and S. Thater. 2003. Bridging the gap between underspecification formalisms: Minimal Recursion Semantics as dominance constraints. In ACL-03.

Thater, S. 2007. Bridging the Gap Between Underspecification Formalisms: Minimal Recursion Semantics as Dominance Constraints. PhD Thesis. Universität des Saarlandes.

8

Inessential Features and Expressive Power of Descriptive Metalanguages

GEOFFREY K. PULLUM AND HANS-JÖRG TIEDE

Abstract

Linguists employ a variety of features, ranging from traditional morphosyntactic features like those encoding person or number to more recent inventions encoding bar level and gap locations. Linguists feel intuitively that there is a distinction between (i) real features reflecting genuine properties of languages and (ii) formal tricks exploiting the feature machinery. Our thesis in this chapter is that this issue is trickier and more subtle than might be thought. Notions like ‘spurious feature distinction’ or ‘artifact of the descriptive machinery’ are not really well-defined. There is a very close relationship between expressiveness of the formal metalanguage and necessity of particular features: in a fairly precise sense captured by a theorem, the more expressive the descriptive metalanguage employed, the smaller the number of features that need to be posited.

Keywords: MODEL-THEORETIC SYNTAX, FEATURES, METHODOLOGY OF LINGUISTICS

8.1 Introduction

It is natural enough for linguists to think that the features they posit in descriptions of natural languages are genuine, not spurious — that they reflect aspects of the subject matter rather than aspects of the machinery invoked in devising the description or the linguistic theory.

Having seen features like CASE, GENDER, NUMBER, PERSON, and TENSE used repeatedly in describing hundreds of languages, we feel that they have some inherent connection with the way human languages work, rather than with the way human linguists work. And anyone who has attempted to describe English syntax would probably feel that a feature AUX (distinguishing the verbs that take -n’t from those that don’t) and WH (distinguishing the relative and interrogative pronouns from the others) also draw real rather than artifactual distinctions.

FG-2008. Philippe de Groote (Ed.). Copyright © 2008, CSLI Publications.

96 / GEOFFREY K. PULLUM AND HANS-JÖRG TIEDE

But linguists don’t always feel this way about all of the rich array of features posited in current or past work on syntax, and certainly not about devices such as the DOOM feature used by Postal (1970) to mark noun phrases targeted for erasure later in the derivation, or the ‘[±F]’ annotations that have often been used to draw ad hoc distinctions among constituents with differing behaviours.

What is the basis of the feeling that we can tell a spurious feature from a genuine one? Generalised phrase structure grammar (GPSG) and head-driven phrase structure grammar (HPSG), for example, posit features such as SLASH, marking constituents containing ‘gaps’; BAR, indicating the ‘bar level’ of phrasal constituents; SUBCAT, coding the subcategorisation of lexical heads according to the complements they select; and so on. These do not necessarily strike linguists as having the same kind of status as more traditional features. How can we tell when we are looking at spurious, artifactual, or superfluous features, features that should not be counted among the genuine ones reflecting attributes natural languages really have?

Our claim here is that this issue is trickier and more subtle than might have been thought. We argue that notions like ‘spurious feature distinction’ or ‘artifact of the descriptive machinery’ are not really well-defined. This means that linguists’ feelings of distaste or acceptance for particular features may be based in nothing more solid than personal prejudice.

8.2 Model-theoretic syntax

We make our argument via techniques from model-theoretic syntax. We will use trees formalised as relational structures. We briefly review the basics in the following section.

8.2.1 Logics on trees

To illustrate how we formalise trees as relational structures, we consider the phrase structure tree in (8.1).

(8.1) [S [NP [D this] [N job]] [VP [V stinks]]]

(The tree is rendered here in labelled bracket notation.)

EXPRESSIVE POWER OF DESCRIPTIVE METALANGUAGES / 97

The domain has 9 nodes. It is convenient to take them to be not just atomic elements but rather addresses of nodes — descriptions of positions relative to the root — represented by strings. The root will have the empty string (ε) as its address; ‘0’ will be the address of the left child of the root and ‘1’ the address of the right child, and we go on down: ‘01’ for the right child of the left child of the root, ‘010’ for the left child of that, and so on.

(8.2) [ε [0 [00 [000]] [01 [010]]] [1 [10 [100]]]]

Each node in a binary branching tree will be a string over {0,1}; and since no string ever appears at more than one node — each string over {0,1} corresponds to a unique address — we can equate trees with sets of such strings.

Of course, not every set of strings over {0,1} represents a tree; but the two conditions we need to impose to guarantee that such a set does correspond to a tree are remarkably simple. Simplifying by assuming (with much recent literature) that all branching is binary (see Rogers 1998:19 for a more general statement, which also is very simple), the conditions defining binary tree domains are these:

(8.3) A binary tree domain is a set T ⊆ {0,1}∗ such that
      (a) if uv is in T, then so is u, and
      (b) if u1 is in T, then so is u0.

Part (a) requires any node to have all of its ancestors in the domain, and part (b) requires all right branches to have left siblings. This defines binary tree domains in terms quite independent of both the diagrams with which we depict trees and the generative grammars that linguists often assume for generating them, though thus far the result looks a bit unfamiliar.

We now define a binary tree structure as a triple (T, R↓, R→), where T is a binary tree domain, R↓ is the child-of relation (we use androgynous kinship terminology for relations between nodes: ‘parent’, ‘child’, and ‘sibling’); i.e., (n, m) ∈ R↓ iff m = n0 or m = n1, and R→ is the left-sibling-of relation, i.e. (m, n) ∈ R→ iff m = s0 and n = s1 for some s.

So the tree in (8.2) would be formalised as a tree structure triple (T, R↓, R→), where T is the set {ε, 0, 1, 00, 01, 10, 000, 010, 100} and, for example, the pair (10, 100) stands in the R↓ relation and the pair (00, 01) stands in the R→ relation.
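These definitions translate directly into executable form. The following Python sketch (our own illustration; function names are ours) checks conditions (a) and (b) of (8.3), with ε as the empty string, and derives the relations R↓ and R→ for the domain in (8.2):

```python
def is_binary_tree_domain(t):
    """Conditions (a) and (b) of (8.3), for a set t of strings over {0,1}."""
    for node in t:
        # (a): the immediate parent prefix is in t (all ancestors follow
        # by induction over the set)
        if node and node[:-1] not in t:
            return False
        # (b): every right branch u1 has its left sibling u0
        if node.endswith("1") and node[:-1] + "0" not in t:
            return False
    return True

def tree_structure(t):
    """The triple (T, R_down, R_right) of section 8.2.1."""
    r_down = {(n, n + d) for n in t for d in "01" if n + d in t}
    r_right = {(s + "0", s + "1") for s in t if s + "0" in t and s + "1" in t}
    return t, r_down, r_right

T = {"", "0", "1", "00", "01", "10", "000", "010", "100"}   # the domain in (8.2)
assert is_binary_tree_domain(T)
_, R_down, R_right = tree_structure(T)
assert ("10", "100") in R_down and ("00", "01") in R_right
```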


8.2.2 Modal logic metalanguages

To talk about trees we will use propositional modal logics, an idea that goes back to Gazdar et al. (1988); see Blackburn et al. (1993) for a much more thoroughgoing development of the idea, and Moss and Tiede (2006) for a detailed and up-to-date survey. Modal logics provide a very tightly restricted way of describing the structure of trees, and their relationships with and translations into other formalisms are beginning to be very well understood. Though they originate in philosophical efforts to understand the notions of necessity and possibility, they are best seen much more generally, as languages for describing relational structures (such as trees) from a local and internal perspective. Blackburn et al. (2001:xii) observe that “the reader who pictures a modal formula as a little automaton standing at some state in a relational structure, and only permitted to explore the structure by making journeys to neighbouring states, will have grasped one of the key intuitions of modal model theory.” The linguist can read ‘state’ as ‘node’ and ‘relational structure’ as ‘tree’.

The key idea is to represent the labels attached to nodes in trees by atomic propositional formulae of the logic. In other interpretations of modal logic these would be propositions true at particular worlds in a set of worlds under an accessibility relation; here they simply represent labelings present at certain points in the model.

Before we formalise labelled trees in these terms, we need to fix the syntax of the logics that we will be considering. We can use a very simple and basic kind of modal logic LB as a reference point. LB has a set of atomic formulae corresponding to syntactic categories (we can assume this is finite), and just two modalities, 〈→〉 and 〈↓〉. The semantics is such that a formula 〈↓〉ϕ is true at a node v iff ϕ is true at a child of v (so it means ‘v has a ϕ-labelled child’) and 〈→〉ϕ is true at a node v iff ϕ is true at a right sibling of v.

LB can be used to say some of the things about trees that could be guaranteed by a context-free grammar. Consider the two formulae in (8.4):

(8.4) a. PP ⇒ 〈↓〉P
      b. 〈↓〉during ⇒ (P ∧ 〈→〉NP)

We use ‘⇒’ for material implication, defined ϕ ⇒ ψ ≡ ¬(ϕ ∧ ¬(ψ)). So what (8.4a) says is that a node where PP is true has a child where P is true; that is, it expresses the claim of X-bar theory that every PP node must immediately dominate a Preposition node. And what (8.4b) says is that a node that has a child labelled during is labelled P and has a right sibling labelled NP — a subcategorisation statement for the preposition during.

However, in general LB is too weak to permit much interesting progress toward the description of human languages, so we need to consider stronger logics.


Three modal logics of increasing expressive strength have been a particular focus of attention in the context of model-theoretic syntax. They are known as Lcore, Lcp and PDLtree, respectively. All of them are less expressive than wMSO.

For our purposes here, it will be sufficient to concentrate on Lcore and Lcp. (For a detailed discussion of PDLtree, which is increasingly important for reasons relating to the rise of XML as a data representation language, see Afanasiev et al. 2005.) The syntax of formulae for the Lcore and Lcp languages is defined as follows:

(8.5) Basic syntax for Lcore and Lcp
      a. any atomic proposition is a formula;
      b. ¬(ϕ) is a formula if ϕ is;
      c. ϕ ∧ ψ is a formula if ϕ and ψ are;
      d. a formula prefixed by a modal operator is a formula.

In addition we will use ‘⊤’ for a dummy proposition that is always true (it could be defined as ¬(a ∧ ¬(a)) or in some similar way; its utility will become clear below).

The logics Lcore and Lcp differ only with respect to the modal operators they employ. These operators are logically akin to the diamond operators that represent possibility in the alethic modal logics familiar from formal semantics. Each is written in the form 〈π〉 (the angle brackets ‘〈〉’ are intended to convey a visual suggestion of the diamond ‘♦’). The ‘box’ modalities are used as well, and as usual they are defined in terms of the diamond ones: [π]ϕ (with the square brackets visually suggesting the box ‘□’) abbreviates ¬〈π〉¬ϕ.

Lcore is a logic with eight modal operators, the two in LB plus their inverses, and operators corresponding to the ancestrals of all four:

(8.6) Lcore modal operators: → ← ↑ ↓ →∗ ←∗ ↑∗ ↓∗

Lcore permits not only statements like 〈↓〉ϕ, meaning that at one dominance step down there is a node where ϕ holds, but also 〈↓∗〉ϕ, which corresponds to the ancestral of the relation that ↓ corresponds to: it means that there is some finite number k such that k dominance steps down there is a node where ϕ holds. Thus 〈↓∗〉ϕ means that either ϕ holds (k = 0), or 〈↓〉ϕ holds (k = 1), or 〈↓〉〈↓〉ϕ holds (k = 2), and so on for all k ≥ 0.

The logic Lcp has an infinite set of modalities. They are defined recursively. All modalities from Lcore are available in Lcp, but it has additional modalities that cannot be defined using those modalities. The following four simple operators are included (as in Lcore):

(8.7) Lcp basic modal operators: → ← ↑ ↓

But in addition, for any modal operator π and any formula ϕ, the following are both modal operators in Lcp:


(8.8) Recursively defined modal operators of Lcp:
      π∗ for each modal operator π;
      π;ϕ? for each modal operator π and formula ϕ.

As in the case of the 〈↓∗〉ϕ modality of Lcore, the π∗ operators afford access to the ancestral of the accessibility relation for π: the formula 〈π∗〉ϕ is satisfied at a node u iff ϕ holds at a node v that you can get to from u via a sequence of zero or more steps mediated by the Rπ relation.

The interpretation of 〈π;ϕ?〉 needs a little more explanation. Intuitively, evaluating 〈π;ϕ?〉ψ involves checking that ψ holds at a node that we can get to using π and at which ϕ holds. The two examples in (8.9) will help.

(8.9) a. 〈↓;ϕ?〉ψ
      b. 〈(↓;ϕ?)∗〉ψ

Formula (8.9a) can be read as ‘at some child of this node where ϕ is true, ψ is true’. This is equivalent to 〈↓〉(ϕ ∧ ψ), and thus would also be expressible in Lcore. But the same is not true of (8.9b). For (8.9b) to be satisfied at a node u, there has to be a node v, dominated by u, at which ψ is true, and additionally ϕ has to be true at all nodes on the path from u to v. Notice that the asterisk is on ‘(↓;ϕ?)’, so what is repeated is both the step down from parent to child and the check on whether ϕ holds. There has to be a parent-child chain in which at every node the check to see if ϕ holds is successful, and it has to lead down to a node where ψ holds. This relation is inexpressible in Lcore.

Notice the potential applications of a formula like (8.9b) in describing syntactic facts in human languages. It has been suggested for various syntactic phenomena that they are permitted only within the region between the top and bottom of an unbounded dependency (see Zaenen 1983). Such phenomena could be described with a statement using 〈(↓;ϕ?)∗〉ψ, where ψ is the property of being a trace and ϕ, holding at all the nodes on the spine leading down to the node where ψ holds, is the property that determines the relevant phenomena.

8.2.3 Trees as models for modal logics

We can identify a labelled tree with a tree model, in the following sense. A binary tree model is a pair M = 〈T, Val〉 where T is a tree structure and Val is a valuation function — a function from formulae to node sets which assigns to each atomic formula the set of all and only those nodes in the tree at which it holds.

So, to complete our example, assume that we have atomic formulae S, NP, VP, D, ..., and thus the binary tree model corresponding to the example in (8.1) would contain a valuation Val such that Val(NP) = {0}, Val(V) = {10}, etc.


The remaining thing we need is a definition of the satisfaction relation. We write ‘M, v |= ϕ’ for ‘the model M, at the node v, satisfies (or, is a model of) the formula ϕ’. We define the relation |= in the following standard way.

(8.10) For a model M, a node v of M, and a formula ϕ:
       a. M, v |= p ⇔ v ∈ Val(p)
          (v satisfies atomic formula p iff v is in the set Val assigns to p)
       b. M, v |= ϕ ∧ ψ ⇔ M, v |= ϕ and M, v |= ψ
          (v satisfies a conjunction iff it satisfies both conjuncts)
       c. M, v |= ¬ϕ ⇔ M, v ⊭ ϕ
          (v satisfies the negation of any formula that it doesn’t satisfy)

As is familiar from alethic modal logic, evaluating a formula containing a modality always involves an accessibility relation that defines permitted access from one state or node to another in the model. Given that both Lcore and Lcp have multiple modalities, each modality 〈π〉 will have a corresponding accessibility relation Rπ:

(8.11) M, v |= 〈π〉ϕ ⇔ ∃u[(v, u) ∈ Rπ ∧ M, u |= ϕ]
       (v satisfies 〈π〉ϕ iff it bears the π relation to a node u that satisfies ϕ)

Given our discussion of R↓ and R→ in section 8.2.1 above, it is fairly straightforward to get a sense of the accessibility relations for the modalities in Lcore. The accessibility relations for the modalities in Lcp are more complex, and will be omitted here (but the details are in Moss and Tiede 2006).
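The satisfaction clauses in (8.10) and (8.11) can be prototyped directly. Here is a sketch (ours, not from the chapter) for a fragment of Lcore with 〈↓〉 and 〈↓∗〉, evaluated over the tree model for (8.1)/(8.2); the tuple-based formula syntax is an ad hoc ASCII stand-in:

```python
# Nodes are the addresses of (8.2); VAL is the valuation for (8.1).
# Formula syntax (ours): atoms are strings; ("not", f), ("and", f, g),
# ("down", f) for the diamond <down> f, ("down*", f) for <down*> f.

T = {"", "0", "1", "00", "01", "10", "000", "010", "100"}
VAL = {"S": {""}, "NP": {"0"}, "VP": {"1"}, "D": {"00"}, "N": {"01"},
       "V": {"10"}, "this": {"000"}, "job": {"010"}, "stinks": {"100"}}

def children(v):
    return [v + d for d in "01" if v + d in T]

def sat(v, phi):
    """M, v |= phi, following clauses (8.10) and (8.11)."""
    if isinstance(phi, str):                       # atomic formula
        return v in VAL.get(phi, set())
    op = phi[0]
    if op == "not":
        return not sat(v, phi[1])
    if op == "and":
        return sat(v, phi[1]) and sat(v, phi[2])
    if op == "down":                               # some child satisfies
        return any(sat(u, phi[1]) for u in children(v))
    if op == "down*":                              # reflexive-transitive closure
        return sat(v, phi[1]) or any(sat(u, ("down*", phi[1]))
                                     for u in children(v))
    raise ValueError(op)

# <down*> stinks holds at the root: some descendant-or-self is 'stinks'.
print(sat("", ("down*", "stinks")))   # True
```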

8.2.4 Definability

In order to relate the model-theoretic approach to the generative approach, we need a model-theoretic notion that corresponds to the set of derived structures in the latter approach. We will restrict ourselves to a finite set of atomic formulae denoted by F. We will denote the set of trees that are labelled only with the features from the set F by TF, and for the set of formulae in the logic L that only use the atomic formulae from the set F we will write LF.

We say that a subset of TF is definable in L if there is a formula in LF such that the subset in question is exactly the set of all those trees which at their root nodes satisfy the formula. What it means to say that some set T ⊆ TF is definable in L is simply that there is some ϕ ∈ LF such that T = {τ | τ, ε |= ϕ} (that is, T is the set of all and only those trees which at the root node satisfy ϕ).

As an example of how grammars of certain specific types can be formalised in certain specific logics, it is worth noting — with an informally sketched proof — that despite its very limited expressive power, Lcore is capable of defining the set of all the parse trees obtainable from an arbitrary context-free phrase structure grammar (CF-PSG).

(8.12) Theorem For any context-free phrase structure grammarG there isa formulaϕG in Lcore that defines the set of all parse trees ofG.

Proof Let the terminals, nonterminals, rules, and start symbol ofG be re-spectivelyVT , VN, P, andS. Since we are only considering binary branchingtrees (it is not hard to generalise the result ton-ary branching), every rule inPis of the formA−→ BC or A−→ a, with A,B,C∈VN anda∈VT . (Here andbelow we reserve ‘−→’ for the rewriting operation of CF-PSG rules.) Theeffects of such rules can be encoded directly inLcore as follows.

The set of formulae covering the binary branching rules contains for eachsymbol A appearing on the left hand side of a branching rule a statement‘A ⇒ Ψ’, whereΨ is a disjunction that for each ruleA−→ BC contains adisjunct of this form:

(8.13) 〈↓〉(B ∧ 〈→〉C)

So if the only rules with VP on the left of the arrow were (i) VP −→ V1, (ii) VP −→ V2 NP, and (iii) VP −→ V3 Clause, the corresponding logical statement would contain a statement that in effect says this: ‘If VP holds at a node then either (i) V1 holds at a child node that has no right sibling, or (ii) V2 holds at a child node that has a right sibling where NP holds, or (iii) V3 holds at a child node that has a right sibling where Clause holds.’

To this we add, for each A that appears on the left hand side of a unary rule, a statement A ⇒ Ψ, where Ψ is a disjunction with disjuncts of the form 〈↓〉a, one for each a such that A −→ a.

This much ensures that the models of ϕ_G comply with the restrictions that the rules impose on parse trees of G. The rest of what we need to do is to ensure that only parse trees of G are models of ϕ_G (from what has been said so far, there could be other models of ϕ_G with all sorts of labellings about which the rules say nothing, and they could vacuously satisfy the statements summarised above). This we accomplish by adding four further statements.

First, we affirm that node labels are unique — at each node exactly one propositional symbol is true — by stating that at every node some proposition holds:

(8.14) [↓∗](A1 ∨ A2 ∨ ··· ∨ An), where V_T ∪ V_N = {A1, A2, ..., An}

and we state that for all pairs of distinct propositions the negation of one of them holds:

(8.15) [↓∗](ϕ1 ∧ ϕ2 ∧ ··· ∧ ϕk), where ϕ1, ..., ϕk is the list of all statements of the form ‘(¬(α) ∨ ¬(β))’, for α, β ∈ V_T ∪ V_N and α ≠ β

EXPRESSIVE POWER OF DESCRIPTIVE METALANGUAGES / 103

Second, we assert that the start symbol S is true at the root — that is, S must hold at any node where not even the dummy tautology ⊤ holds at the immediately dominating node:

(8.16) [↑]¬(⊤) ⇒ S

Third, we stipulate that the terminal symbols are true only at leaves — that wherever a terminal symbol holds, not even the dummy tautology holds at any immediately dominated node thereof (which means there cannot be one):

(8.17) [↓∗]Φ, where Φ is the conjunction (a1 ⇒ ¬〈↓〉⊤) ∧ (a2 ⇒ ¬〈↓〉⊤) ∧ ··· ∧ (ak ⇒ ¬〈↓〉⊤), for all ai ∈ V_T.

Fourth, we assert that non-terminal symbols are true only at internal nodes — that wherever a nonterminal holds, the dummy tautology holds at the immediately dominated node (which means there must be one).

(8.18) [↓∗]Φ, where Φ is the conjunction (A1 ⇒ 〈↓〉⊤) ∧ (A2 ⇒ 〈↓〉⊤) ∧ ··· ∧ (Ak ⇒ 〈↓〉⊤), for all Ai ∈ V_N.

This guarantees that any model of the complex formula we have constructed will be a parse tree of G, which completes the proof.
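The construction in the proof lends itself to a direct implementation. The sketch below is not from the chapter: the function name and the ASCII spelling of the modal operators are my assumptions (`<d>` for 〈↓〉, `<r>` for 〈→〉, `[d*]` for [↓∗], `[u]` for [↑], `T` for ⊤), used here only to show how the statements (8.13)–(8.18) of ϕ_G are assembled from the rules of a binary-branching CF-PSG.

```python
# Sketch: emit the L_core statements of phi_G as strings, following the
# proof of (8.12). Operator spelling and rule format are assumptions.

def phi_g(nonterminals, terminals, branching, unary, start):
    """branching: {A: [(B, C), ...]} for rules A -> B C;
    unary: {A: [a, ...]} for rules A -> a."""
    stmts = []
    # (8.13): each branching A gets A => a disjunction of <d>(B & <r>C)
    for a, rhss in branching.items():
        disj = " | ".join(f"<d>({b} & <r>{c})" for b, c in rhss)
        stmts.append(f"{a} => ({disj})")
    # unary rules A -> a become A => a disjunction of <d>a
    for a, words in unary.items():
        disj = " | ".join(f"<d>{w}" for w in words)
        stmts.append(f"{a} => ({disj})")
    symbols = list(nonterminals) + list(terminals)
    # (8.14): some symbol holds at every node
    stmts.append("[d*](" + " | ".join(symbols) + ")")
    # (8.15): no two distinct symbols hold at the same node
    pairs = [f"(~{x} | ~{y})" for i, x in enumerate(symbols)
             for y in symbols[i + 1:]]
    stmts.append("[d*](" + " & ".join(pairs) + ")")
    # (8.16): the start symbol holds at the root
    stmts.append(f"[u]~T => {start}")
    # (8.17): terminals hold only at leaves
    stmts.append("[d*](" + " & ".join(f"({w} => ~<d>T)" for w in terminals) + ")")
    # (8.18): nonterminals hold only at internal nodes
    stmts.append("[d*](" + " & ".join(f"({a} => <d>T)" for a in nonterminals) + ")")
    return stmts
```

For a toy grammar with rules S −→ VP VP and VP −→ v, `phi_g(["S", "VP"], ["v"], {"S": [("VP", "VP")]}, {"VP": ["v"]}, "S")` yields, among other statements, `S => (<d>(VP & <r>VP))` and `[u]~T => S`.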

8.3 Features

The trees considered so far are labelled with atomic category labels, represented in the description logic as atomic formulae with the property that each node satisfies exactly one of them. If we want to label trees with features, we have to extend the approach presented above. One easy way to include features is to allow each node to satisfy multiple atomic formulae. That way, each atomic formula corresponds to a binary valued feature: if some feature F holds at a node, that node is [+F], and if it is false, the node is [−F].

We can use the approach to represent non-binary features too, as long as it is combined with some formulae limiting their co-occurrence. Thus we could represent the BAR feature in a two-bar system by means of three features, call them BAR0, BAR1, and BAR2. It is easy to assert what the GPSG literature calls a feature co-occurrence restriction — a statement of the type exemplified in (8.15), saying that one and only one of these features is true at each node.

To give an example of the use of L_cp, consider the following formalisation of projection from heads, based on Palm (1999). We first introduce an abbreviation meaning ‘the feature ϕ belongs to a node that is a head’, where (for this purpose) we treat being a head as simply a matter of bearing an atomic feature, corresponding to the atomic proposition head, with the statement Hϕ ≡ ϕ ∧ head.


Then we define what it is for a feature ϕ to be projected from a leaf:

(8.19) Projϕ ≡ 〈(↓; (Hϕ))∗〉(Hϕ ∧ lexical)

Here lexical is just an abbreviation for 〈↓〉¬(〈↓〉⊤).

Finally, we can require every node to be a projection: given a finite set of lexical features Lex, we assert [↓∗]Φ, where Φ is the disjunction of all the statements Projϕ such that ϕ is in Lex.

The feature indicating that a node is the head would be needed in cases where two siblings shared the same lexical feature. Furthermore, there are certain regularities that this head feature has to observe, such as that (if we set aside the multiple-heads treatment of coordination argued for in some GPSG work) no two siblings may both be heads, a condition that we could state thus:

(8.20) [↓∗](head ⇒ ¬(〈←〉head ∨ 〈→〉head))

8.3.1 Eliminable Features

The clearest sense in which a feature can be considered intuitively superfluous is when one can eliminate it from the grammar without any loss to the description. Given a tree τ ∈ T_F and a subset of features G ⊆ F, there is a corresponding tree τ′ ∈ T_G that is the result of removing the features in F − G from τ. We will denote the corresponding function by π; thus π(τ) = τ′, and define π(T) as {π(τ) | τ ∈ T}.

The notion of a feature being superfluous in the sense that it can be eliminated without loss from the description can now be formalised by means of the following definition:

(8.21) Let F be a finite set of features, G ⊆ F, T ⊆ T_F, and L a logic. Suppose that T is definable in L_F. We say that G is eliminable in L for T iff π(T) is definable in L_{F−G}.

Notice that this notion of eliminability is relative to a given logic: the features in G are eliminable in some language L with respect to some set of trees T if and only if the image of T under the function that gets rid of the G features is definable in L without using any G features. This might hold for some particular L but not in another metalanguage. In other words, questions of whether some feature is truly needed cannot be addressed in isolation, but only in the context of a particular descriptive metalanguage in which the feature is used. This observation is made more precise in the following theorem:

(8.22) Theorem (Tiede, 2008) Any tree language that is not definable in L_core but is definable in L_cp can be defined with additional features in L_core that are not eliminable in L_core.

This theorem could actually be strengthened, as its proof (for which see Tiede 2008) does not depend on any of the logics in particular. It applies to any case of two different formal ways of defining sets of trees, each capable of defining all local sets (those that a CF-PSG can define), and one defining a proper subset of the tree-sets definable by the other, provided they are not more powerful than wMSO. For any set T of trees definable in the more powerful formalism but not in the less powerful one, T will be definable in the less powerful formalism if we are permitted to decorate its nodes with additional features.

These results can be read in two different ways. First, they state that any language that cannot be defined in a weaker formalism but can in a stronger one can be defined in the weaker one if additional features are added. Conversely, they state that the only difference between the different formalisms mentioned above, as well as a variety of other formalisms, is which features are required to define languages: the more expressive the formalism, the fewer features are required for defining languages.

When we move to the most powerful of the logics commonly used in model theoretic syntax, wMSO, a single feature suffices. This follows from the fact that wMSO characterises the tree-sets that are recognisable by finite-state tree automata (Doner, 1970). These tree-sets are known as the regular tree-sets (or ‘regular tree languages’). The set of all regular tree-sets is known to be closed under linear tree homomorphisms, which means that any systematic symbol-for-symbol relabelling of all the nodes in all the trees of a regular tree-set will always yield a regular tree-set.

An example will make this point clearer. Consider a finite-state tree automaton recognising some set of trees in which the nodes are labelled with distinct labels A and B, distinguished by some crucial syntactic feature, and suppose we wanted to relabel the B nodes as A nodes without losing track of which ones they were. This can be accomplished by modifying the automaton so that it has two different states for admitting A nodes: one corresponding to the original A nodes, and one corresponding to the notion ‘A-labelled node that is really one of the relabelled B nodes’. Since any finite-state tree automaton is equivalent to a wMSO logical description (Doner, 1970), there is a wMSO theory that corresponds to the new automaton.
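The state-doubling idea can be sketched concretely. The automaton below is my own toy instance, not from the chapter: a nondeterministic bottom-up tree automaton over binary trees, with invented state names q0 (‘no B in this subtree’) and q1 (‘exactly one B in this subtree’); the original automaton accepts the trees with exactly one B, and the modified one reads only A labels but may nondeterministically treat an A as a relabelled B.

```python
# Illustrative sketch: trees are tuples, ("A",) a leaf, ("A", l, r) internal.

def runs(tree, leaf, node):
    """Return the set of states reachable at the root of `tree`."""
    if len(tree) == 1:
        return set(leaf.get(tree[0], ()))
    out = set()
    for l in runs(tree[1], leaf, node):
        for r in runs(tree[2], leaf, node):
            out |= set(node.get((tree[0], l, r), ()))
    return out

# Original automaton: accepts {A, B}-trees with exactly one B (accept in q1).
leaf1 = {"A": ["q0"], "B": ["q1"]}
node1 = {("A", "q0", "q0"): ["q0"], ("A", "q0", "q1"): ["q1"],
         ("A", "q1", "q0"): ["q1"], ("B", "q0", "q0"): ["q1"]}

# Modified automaton: every node now reads A, but wherever the original
# could read a B, the new automaton may pretend this A node 'is' that B.
leaf2 = {"A": ["q0", "q1"]}
node2 = {("A", "q0", "q0"): ["q0", "q1"], ("A", "q0", "q1"): ["q1"],
         ("A", "q1", "q0"): ["q1"]}

one_b = ("A", ("A",), ("B",))
relabelled = ("A", ("A",), ("A",))
print("q1" in runs(one_b, leaf1, node1))        # True: original accepts one B
print("q1" in runs(relabelled, leaf2, node2))   # True: image is accepted too
```

The second automaton accepts exactly the relabelled images of the trees the first accepts, which is the point of the construction: the relabelling loses no automaton-theoretic information.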

So consider in this light the question of whether the SLASH feature of GPSG and HPSG is a genuine substantive element of the grammars of human languages. A node dominated by a category α is marked with a feature specification [SLASH: β] in GPSG in order to identify it as containing an extraction site of category β that is present somewhere within α. This eliminates any need for movement transformations in the description of unbounded dependencies (Gazdar, 1981), as seen in (8.23), where we simplify visually by notating α[SLASH : β] in the form α/β.

(8.23) [Clause [NP this] [Clause/NP [NP I] [VP/NP [V think] [Clause/NP [NP she] [VP/NP [V knew] [NP/NP e]]]]]]

It might be charged that this simply substitutes another formal device for the formal device of movement: instead of the NP this being moved from an NP position to another NP position as sibling of a Clause node, it is a sister of a Clause/NP node that dominates a chain of α/NP nodes that leads to an NP/NP node at what would have been the pre-movement location. The chain of slash categories marks the path from the root of the landing-site constituent down to the extraction site. So is the feature SLASH artifactual, rather than corresponding to a genuine syntactic property of constituents?

Our thesis is that the answer is neither yes nor no. The question is a pseudo-question, insufficiently well-defined to receive an answer.

8.3.2 Inessential Features

Given that the question whether a feature is eliminable depends on the formalism employed, it is only natural to try to give a purely structural definition of uselessness applying to features. Marcus Kracht (1997) has proposed such a definition. Kracht called a feature inessential “if its distribution is fixed by the other features,” and he proposed the following formal definition.

(8.24) Let F be a finite set of features; let G be a subset of F; and let T be a set of trees labelled with the features in F. The features in G are inessential for T if the function that eliminates the features in G is one-to-one.

The reason for identifying superfluous features with those that can be eliminated by a one-to-one (injective) function is that no two trees can be distinguished only with these features. If they could, the function that eliminates them would map them to the same tree, hence it would not be one-to-one.
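Kracht's test is directly computable on a finite set of trees. The sketch below is my own illustration, not from the chapter (the representation and function names are assumptions): a tree is a pair of a feature set and a tuple of subtrees, erasure removes the features in G at every node, and G is inessential exactly when erasure collapses no two distinct trees.

```python
# Sketch of (8.24): trees are (frozenset_of_features, tuple_of_children).

def erase(tree, G):
    """Remove the features in G from every node of `tree` (the map pi)."""
    feats, kids = tree
    return (frozenset(feats) - frozenset(G),
            tuple(erase(k, G) for k in kids))

def inessential(trees, G):
    """G is inessential for `trees` iff erasing G is one-to-one on them."""
    trees = set(trees)
    return len({erase(t, G) for t in trees}) == len(trees)

# Two trees that differ only in whether a hypothetical feature F is present:
t1 = (frozenset({"VP"}), ((frozenset({"V", "F"}), ()),))
t2 = (frozenset({"VP"}), ((frozenset({"V"}), ()),))
print(inessential({t1, t2}, {"F"}))   # False: erasing F collapses t1 and t2
print(inessential({t1, t2}, set()))   # True: erasing nothing is one-to-one
```

The first call fails injectivity precisely because F was the only thing keeping the two trees apart, which is the situation in which Kracht's definition counts a feature as essential.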

The features referred to in the theorem in (8.22) are inessential in exactly Kracht's sense. And this might seem to conform to the intuition that when a feature is added solely to make a non-definable set of structures definable, it has an ad-hoc nature. When there is a distinction in tree structure that we are unable to capture using a given logic, and we add a special feature to certain nodes in certain trees just to enable the distinction to be captured using that logic, erasing the features we added would just give us back our original trees. They would not be identical with any other trees in the set, because if they were we would not have needed to add the feature annotations in the first place.

An example due to Thatcher and used by Rogers (1998:60) will be useful in making the point. Consider the set of all finite binary trees in which all nodes are labelled A except that in each tree exactly one node is labelled B. This set of trees is not definable in L_core, or by any CF-PSG. But we can make it describable if we add a feature. We will assume that its presence or absence can be explicitly referenced at any node, and we will indicate its presence by ‘H’. We annotate a tree like (8.25a) as shown in (8.25b).

(8.25) [Tree diagrams: (a) a finite binary tree labelled A at every node except for a single node labelled B; (b) the same tree with every A node that dominates the B relabelled AH.]

The ‘H’ feature is attached to every A node that dominates the unique B, and to no other node. This allows us to describe the set with a CF-PSG, using AH as the start symbol:

(8.26) AH −→ A AH    AH −→ A B    A −→ A A
       AH −→ AH A    AH −→ B A    B −→ A A

By the theorem in (8.12) we know that the set of trees this grammar generates is describable in L_core. However, if we erase the H feature from every tree, we will get exactly the set mentioned above: in every tree there will be one B and all other nodes will be labelled A. Yet no two distinct trees will ever be collapsed into one under this H-erasure operation. Therefore the H feature is inessential in Kracht's technical sense.
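The annotation and erasure just described can be sketched as code. This is my own illustration (the tree encoding and function names are assumptions): annotation relabels as AH every A node dominating the unique B, and erasure maps AH back to A; since erasure exactly undoes annotation, it can never collapse two distinct annotated trees.

```python
# Sketch of the H-annotation for the Thatcher/Rogers tree set.
# Trees are tuples: ("A",), ("B",) as leaves, ("A", left, right) internal.

def contains_b(t):
    return t[0] == "B" or any(contains_b(k) for k in t[1:])

def annotate(t):
    """Relabel as AH every A node that dominates the unique B."""
    lab, kids = t[0], tuple(annotate(k) for k in t[1:])
    if lab == "A" and any(contains_b(k) for k in t[1:]):
        lab = "AH"
    return (lab,) + kids

def erase_h(t):
    """The H-erasure map: AH back to A, everything else unchanged."""
    return ("A" if t[0] == "AH" else t[0],) + tuple(erase_h(k) for k in t[1:])

t = ("A", ("A",), ("A", ("B", ("A",), ("A",)), ("A",)))
assert annotate(t)[0] == "AH"       # the root dominates the B, so it gets H
assert erase_h(annotate(t)) == t    # erasure recovers the original tree
```

Because `erase_h` is a left inverse of `annotate` on this tree set, the erasure is one-to-one on the annotated trees, which is exactly why H comes out inessential in Kracht's sense.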

Both the SLASH feature of GPSG and the BAR feature familiar from X-bar syntax are inessential in exactly the same way. The feature SLASH works in a way almost exactly analogous to ‘H’ above: a constituent containing a gap is marked by placing a specification of a value for SLASH on the root of the constituent and on each node in the continuous sequence of nodes, from the root downwards, that dominate the gap.

The BAR feature is also (as Kracht notes) inessential. An easy way to see this intuitively is to imagine being presented with any of the trees in, say, Jackendoff (1977) with all of the bar-level indications (Jackendoff uses primes) removed. It would be easy to put them all back without error, given nothing more than the content of the X-bar principles that Kornai and Pullum (1990) call Lexicality, Uniformity, Succession, and Weak Maximality, plus the observation that Jackendoff simplifies his diagrams in certain respects (specifically, a branch with X′′′, X′′, X′, X, and some terminal symbol σ will generally be shown with just X′′′ and σ). Stripping all the primes from Jackendoff's trees will never collapse one legal tree into the prime-stripped version of another distinct tree. (Many versions of X-bar theory are much less principled than Jackendoff's, of course, and it may be that in some of those the feature BAR might turn out to be essential.)

What would be most desirable would be for the notion ‘inessential’ to be diagnostic for features that are spurious in the sense of being linguist's artifacts. Unfortunately things do not work out this way. Many features that linguists probably wouldn't want to eliminate are inessential under this definition, and many features of which they would be suspicious are not inessential.

Consider, for example, the CASE feature for pronouns in English, taking Nom, Acc, DepGen, and IndGen as its values. No two trees in most varieties of Standard English can be distinguished on the basis of this feature, as its distribution is fixed by other aspects of the trees. On a pronoun functioning as Subject in a Clause having a tensed Verb, the Nom value (e.g., they) is mandated; on a pronoun functioning as Determiner in an NP the DepGen value (e.g., their) is required. The ordinary morphosyntactic feature CASE is plainly inessential.

At first it might seem that any feature appearing on lexical category labels would be inessential in Kracht's sense, but this is not quite so. Counterintuitively, features are essential when they are optional on a certain node. Consider the feature AUX in English. Any tree showing an ‘inverted’ auxiliary (clause-initial, before a subject) will be one that predictably has AUX on its initial verb, and likewise any tree with a verb form ending in the suffix ·n't, which only appears on auxiliary verbs. But take a dialect in which both We haven't enough milk and We don't have enough milk are grammatical. In such dialects, possession have may be treated either as an auxiliary verb or a lexical verb, optionally. So both of the following trees would be well-formed (we intend the V in the second one to be taken as [−AUX]):


(8.27) [Clause [NP we] [VP [V[+AUX] have] [NP [D enough] [N milk]]]]

(8.28) [Clause [NP we] [VP [V have] [NP [D enough] [N milk]]]]

These trees will be collapsed if the [±AUX] markings are erased, and solely because of this, AUX counts as an essential feature! Yet of course, its presence is intuitively quite unimportant: because the verb is not in one of the auxiliary-only ·n't forms, and is not before the subject, it simply doesn't matter whether it bears the marking [+AUX] or not. Though essential in the technical sense, it is entirely superfluous in the intuitive descriptive sense.

In short, Kracht's notion of inessentiality does not correspond at all to the descriptive linguist's notion of being an essential or significant component of a description.

8.4 Conclusions

Our summary and conclusions can be given briefly. The argument of this chapter has been that it is unlikely that any formal reconstruction can be given of the notion of a feature that is a technical artifact rather than a genuine element of natural language structure that we should expect to turn up in some guise in any reasonable description. There is a crucial tradeoff between eliminability of features and expressive power of the descriptive metalanguage; the two issues cannot be separated.

Thus, just like the question of when a feature used in the description of one language should be equated with a feature used in the description of another, the issue of when a feature is a technical trick and when it is a properly motivated distinguishing property of linguistic expressions will not, we suspect, be reducible to any formal criterion. It may perhaps be approached informally through an understanding of what natural languages are typically like, but it will not submit to an authoritative mathematical adjudication.

References

Afanasiev, Loredana, Patrick Blackburn, Ioanna Dimitriou, Bertrand Gaiffe, Evan Goris, Maarten Marx, and Maarten de Rijke. 2005. PDL for ordered trees. Journal of Applied Non-Classical Logic 15(2):115–135.

Blackburn, Patrick, Maarten de Rijke, and Yde Venema. 2001. Modal Logic. Cambridge: Cambridge University Press.

Blackburn, Patrick, Claire Gardent, and Wilfried Meyer-Viol. 1993. Talking about trees. In Sixth Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of the Conference, pages 21–29. Morristown, NJ: European Association for Computational Linguistics.

Doner, John. 1970. Tree acceptors and some of their applications. Journal of Computer and System Sciences 4:406–451.

Gazdar, Gerald. 1981. Unbounded dependencies and coordinate structure. Linguistic Inquiry 12:155–184.

Gazdar, Gerald, Geoffrey K. Pullum, Bob Carpenter, Ewan Klein, Thomas E. Hukari, and Robert D. Levine. 1988. Category structures. Computational Linguistics 14:1–19.

Jackendoff, Ray S. 1977. X̄ Syntax. Cambridge, MA: MIT Press.

Kornai, András and Geoffrey K. Pullum. 1990. The X-bar theory of phrase structure. Language 66:24–50.

Kracht, Marcus. 1997. Inessential features. In C. Retoré, ed., Logical Aspects of Computational Linguistics: First International Conference, LACL '96 (Selected Papers), no. 1328 in Lecture Notes in Artificial Intelligence, pages 43–62. Berlin and New York: Springer.

Moss, Lawrence S. and Hans-Jörg Tiede. 2006. Applications of modal logic in linguistics. In P. Blackburn, J. van Benthem, and F. Wolter, eds., Handbook of Modal Logic. Amsterdam: Elsevier.

Palm, Adi. 1999. Propositional tense logic for finite trees. Presented at the Sixth Meeting on Mathematics of Language, University of Central Florida, Orlando, Florida; http://www.phil.uni-passau.de/linguistik/palm/papers/mol99.pdf.

Postal, Paul M. 1970. On coreferential complement subject deletion. Linguistic Inquiry 1:439–500.

Rogers, James. 1998. A Descriptive Approach to Language-Theoretic Complexity. Stanford, CA: CSLI Publications.

Tiede, Hans-Jörg. 2008. Inessential features, ineliminable features, and modal logics for model theoretic syntax. Journal of Logic, Language and Information 17(2):217–227.

Zaenen, Annie. 1983. On syntactic binding. Linguistic Inquiry 14:469–504.

9

Type Signature Modules

YAEL SYGAL AND SHULY WINTNER

Abstract

This work provides the essential foundations for modular construction of typed unification grammars for natural languages. Much of the information in such grammars is encoded in the type signature, and hence we focus on modularized development of the signatures. We extend the preliminary results of Cohen-Sygal and Wintner (2006) and define signature modules, facilitating module interaction and modular development of grammars. Our definitions are motivated by the actual needs of grammar developers and meet these needs by conforming to a detailed set of desiderata.

Keywords: TYPE SIGNATURES, MODULARIZATION, GRAMMAR ENGINEERING

9.1 Introduction

Development of large-scale grammars for natural languages is an active area of research in human language technology. Such grammars are developed not only for purposes of theoretical linguistic research, but also for natural language applications such as machine translation, speech generation, etc. Wide-coverage grammars are being developed for various languages (Oepen et al., 2002, Hinrichs et al., 2004, King et al., 2005) in several theoretical frameworks, e.g., LFG (Dalrymple, 2001) and HPSG (Pollard and Sag, 1994).

Grammar development is a complex enterprise: it is not unusual for a single grammar to be developed by a team including several linguists, computational linguists and computer scientists. The scale of grammars is overwhelming: for example, the English resource grammar (Copestake and Flickinger, 2000) includes thousands of types. This raises problems reminiscent of those encountered in large-scale software development. Yet while software engineering provides adequate solutions for the programmer, no grammar development environment supports even the most basic needs, such as grammar modularization, combination of sub-grammars, separate compilation and automatic linkage of grammars, information encapsulation, etc. (Klint et al., 2005). Referring to grammar engineering, Copestake and Flickinger (2000) note: “to some extent it just has to be accepted that it really is inherently difficult.”

FG-2008. Philippe de Groote (Ed.). Copyright © 2008, CSLI Publications.

114 / YAEL SYGAL AND SHULY WINTNER

This paper provides a thorough, well-founded solution to this difficult problem. After a review of some basic notions we list a set of desiderata in Section 9.1.2 and discuss related work in Section 9.1.3, highlighting the shortcomings of existing approaches. We review the definitions of Cohen-Sygal and Wintner (2006) in Section 9.2. We then introduce signature modules in Section 9.3 and show how two modules are combined in Section 9.4. We exemplify the definitions on two toy examples, but the solution can be scaled up to real-life grammars. We conclude with directions for future research.

9.1.1 Type signatures

We assume familiarity with theories of (typed) unification grammars, as formulated by, e.g., Carpenter (1992) and Penn (2000). The definitions in this section set the notation and recall basic notions. For a partial function F, ‘F(x)↓’ means that F is defined for the value x.

Definition 1 A type signature is a tuple 〈TYPE, ⊑, FEAT, Approp〉, where:

1. 〈TYPE, ⊑〉 is a finite bounded complete partial order¹ (the type hierarchy)

2. FEAT is a finite set, disjoint from TYPE.

3. Approp : TYPE × FEAT → TYPE (the appropriateness specification) is a partial function such that for every F ∈ FEAT:

   Feature Introduction: there exists a type Int(F) such that Approp(Int(F), F)↓, and for all t ∈ TYPE, if Approp(t, F)↓, then Int(F) ⊑ t;

   Upward Closure: for all s, t ∈ TYPE, if Approp(s, F)↓ and s ⊑ t, then Approp(t, F)↓ and Approp(s, F) ⊑ Approp(t, F).

If x ⊑ y, then x is a supertype of y and y is a subtype of x.

9.1.2 Desiderata

In defining a framework for grammar modularization we are guided by the following set of desiderata, adapted from Cohen-Sygal and Wintner (2006):

Partiality: Modules should provide means for specifying partial information about the components of a grammar. Since most of the information in typed formalisms is encoded by the type signature, modularization must be carried out mainly through the distribution of the signature between the different modules.

¹A partial order is bounded complete (BCPO) if every subset that has an upper bound has a unique least upper bound.

SIGNATURE MODULES / 115

Extensibility: While modules can specify partial information, it must be possible to deterministically extend a module (which can be the result of the combination of several modules) into a full grammar.

Privacy: Modules should be able to hide (encapsulate) information and render it unavailable to other modules.

Consistency: Contradicting information in different modules must be detected when modules are combined.

Flexibility: The grammar designer should be provided with as much flexibility as possible. The definition of modules should not be unnecessarily constrained.

(Remote) Reference: A good solution should enable one module to refer to entities defined in another. Specifically, it should enable the designer of module Mi to use an entity (e.g., a type or a feature structure) defined in Mj without specifying the entity explicitly.

Parsimony: When two modules are combined, the resulting module must include only information encoded in each of the modules and information resulting from the combination operation itself.

Summing up, a good solution for grammar modularization should facilitate collaborative development of grammars, whether it is a single large-scale grammar developed by a team, or a set of grammars for different languages sharing some core fragments and principles (Bender et al., 2005, King et al., 2005), or a sequence of grammars reflecting language development. The solution we advocate here satisfies all these requirements and can be used for these and similar applications.

9.1.3 Related work

Several works address the issue of grammar modularization in unification formalisms. Moshier (1997) views HPSG, and in particular its signature, as a collection of constraints over maps between sets. This allows the grammar writer to specify any partial information about the signature, and provides the needed mathematical and computational capabilities to integrate the information with the rest of the signature. Pendar (2007) outlines an approach to incorporate soft constraints into grammars, to resolve conflicting requirements. These works do not explicitly define grammar modules and the way they interact. There are no mechanisms for privacy and information encapsulation.

A modular version of context-free grammars is given by Wintner (2002). Based on it, Keselj (2001) defines modular HPSG, where each module is an ordinary type signature, but each of the sets FEAT and TYPE is divided into two disjoint sets of private and public elements. In this solution, modules cannot specify partial information; module combination is not associative; and the only channel of interaction between modules is the names of types.

King et al. (2005) augment LFG with a makeshift signature to allow modular development of untyped unification grammars. In addition, they suggest that any development team should agree in advance on the feature space. This work emphasizes the observation that the modularization of the signature is the key for modular development of grammars. However, the proposed solution is ad-hoc and cannot be taken seriously as a concept of modularization. In particular, the suggestion for an agreement on the feature space undermines the essence of modular design.

To support rapid prototyping of deep grammars, Bender and Flickinger (2005) propose a framework in which the grammar developer can select pre-written grammar fragments, accounting for common linguistic phenomena that vary across languages (e.g., word order, yes-no questions and sentential negation). The developer can specify how these phenomena are realized in a given language, and a grammar for that language is automatically generated, implementing that particular realization of the phenomenon, integrated with a language-independent grammar core. While Bender and Flickinger (2005) refer to such pre-written components as modules, these are clearly not modules in the usual sense: they are pre-written fragments of code which the grammar developer does not develop, and they do not interact freely with other fragments of the grammar.

9.2 Partially specified signatures

Our work extends and improves the results of Cohen-Sygal and Wintner (2006), who introduce partially specified signatures, which we review and evaluate in this section. The key here is a move from concrete type signatures to descriptions thereof; rather than specify types, a description uses nodes to denote types and arcs to denote elements of the subsumption and appropriateness relations of signatures. We assume enumerable, disjoint sets TYPE of types, FEAT of features and NODES of nodes, over which PSSs are defined.

Definition 2 A partially specified signature (PSS) over TYPE, FEAT and NODES is a finite, directed labeled graph S = 〈Q, T, �, Ap〉, where:

1. Q ⊂ NODES is a finite, nonempty set of nodes.

2. T : Q → TYPE is a partial one-to-one function, marking some of the nodes with types.

3. � ⊆ Q × Q is an antireflexive relation specifying (immediate) subsumption; its reflexive-transitive closure, ‘�*’, is antisymmetric.

4. Ap ⊆ Q × FEAT × Q is a relation specifying appropriateness.

5. (Relaxed Upward Closure) for all q1, q′1, q2 ∈ Q and F ∈ FEAT, if (q1, F, q2) ∈ Ap and q1 �* q′1, then there exists q′2 ∈ Q such that q2 �* q′2 and (q′1, F, q′2) ∈ Ap.

A PSS is a finite directed graph whose nodes denote types and whose edges denote the subsumption and appropriateness relations. Nodes can be marked by types through the function T but can also be anonymous (unmarked). Anonymous nodes facilitate reference, in one module, to types that are defined in another. T is one-to-one since two marked nodes must denote different types.

The ‘�’ relation specifies an immediate subsumption order over the nodes, with the intention that this order hold later for the types denoted by nodes. This is why ‘�*’ is required to be a partial order. The type hierarchy of PSSs is partially ordered but this order is not necessarily a bounded complete one, thus allowing more flexibility in grammar design.

In contrast to type signatures, the appropriateness relation Ap is not required to be a function. Rather, it is a relation which may specify several appropriate nodes for the values of a feature F at a node q. The intention is that the eventual value of Approp(T(q), F) be the lub of the types of all those nodes q′ such that Ap(q, F, q′). The Ap relation is restricted by a relaxed version of upward closure. Finally, the feature introduction condition of type signatures is not enforced by PSSs, again, to allow more flexibility for the grammar designer.

Example 1 A simple PSS S1 is depicted in Figure 1, where solid arrows represent the ‘�’ (subsumption) relation and dashed arrows, labeled by features, the Ap relation. S1 stipulates two subtypes of cat, n and v, with a common subtype, gerund. The feature AGR is appropriate to all three categories, with distinct (but anonymous) values for Approp(n, AGR) and Approp(v, AGR). Approp(gerund, AGR) will eventually be the lub of Approp(n, AGR) and Approp(v, AGR), hence the multiple outgoing AGR arcs from gerund.

Observe that in S1, ‘�’ is not a BCPO, Ap is not a function and the feature introduction condition does not hold.
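Condition 5 of Definition 2 is mechanically checkable on a PSS given as a graph. The sketch below is my own encoding of S1, not from the paper: `a1` and `a2` are invented names for the two anonymous AGR-value nodes, and I assume subsumption arcs from agr to them, which the relaxed-closure condition requires given the AGR arc from cat to agr.

```python
# Sketch: checking Relaxed Upward Closure (Definition 2, condition 5)
# on an encoding of the PSS S1 of Figure 1.

def reach(sub, a, b):
    """Reflexive-transitive closure of the subsumption arcs (assumed acyclic)."""
    return a == b or any(reach(sub, m, b) for (x, m) in sub if x == a)

def relaxed_upward_closed(nodes, sub, ap):
    # for every (q1, F, q2) in Ap and every q1' above q1, some q2' above q2
    # must satisfy (q1', F, q2') in Ap
    for (q1, f, q2) in ap:
        for q1p in nodes:
            if reach(sub, q1, q1p):
                if not any(s == q1p and g == f and reach(sub, q2, q2p)
                           for (s, g, q2p) in ap):
                    return False
    return True

nodes = ["cat", "n", "v", "gerund", "agr", "a1", "a2"]
sub = [("cat", "n"), ("cat", "v"), ("n", "gerund"), ("v", "gerund"),
       ("agr", "a1"), ("agr", "a2")]
ap = [("cat", "AGR", "agr"), ("n", "AGR", "a1"), ("v", "AGR", "a2"),
      ("gerund", "AGR", "a1"), ("gerund", "AGR", "a2")]
print(relaxed_upward_closed(nodes, sub, ap))   # True for S1
```

Removing one of gerund's AGR arcs breaks the condition, since n (or v) would then have an AGR value with no counterpart above it at gerund.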

An additional restriction is imposed on PSSs: a PSS is well-formed if it contains no redundant arcs and nodes. A subsumption arc (q1, q2) is redundant if it is a member of the transitive closure of ⊑, where ⊑ excludes (q1, q2). A PSS does not contain redundant appropriateness arcs if any two nodes that are appropriate for the same node and feature are not related by subsumption. If they are, then the appropriateness arc whose target is the smaller node is redundant due to the ‘lub’ intention of appropriateness arcs. Finally, a PSS includes no redundant nodes if any two anonymous nodes are distinguishable, i.e., if each node encodes unique information. Given a PSS S, it can

118 / YAEL SYGAL AND SHULY WINTNER

FIGURE 1 A partially specified signature, S1

be compacted into a PSS, compact(S), by removing all the redundant arcs and by unifying all the indistinguishable nodes in S. Two nodes, only one of which is anonymous, can still be otherwise indistinguishable. Such nodes will, eventually, be coalesced, but only after all modules are combined.
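The redundancy test for subsumption arcs can be sketched directly from the definition above: an arc is redundant if its target is still reachable once the arc itself is removed. The following Python fragment is our illustration (function names are ours):

```python
def reachable(edges, src, dst):
    """Reflexive-transitive reachability over a set of directed edges."""
    seen, stack = set(), [src]
    while stack:
        q = stack.pop()
        if q == dst:
            return True
        if q not in seen:
            seen.add(q)
            stack.extend(b for (a, b) in edges if a == q)
    return False

def redundant_sub_arcs(sub):
    """A subsumption arc (q1, q2) is redundant if it belongs to the
    transitive closure of the remaining arcs."""
    return {(a, b) for (a, b) in sub
            if reachable(sub - {(a, b)}, a, b)}
```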

To combine two PSSs, Cohen-Sygal and Wintner (2006) define the merge operator, which essentially unions two PSSs, taking care to coalesce nodes that are marked by the same type, as well as pairs of indistinguishable anonymous nodes. An anonymous node cannot be coalesced with a typed node, even if they are otherwise indistinguishable, to guarantee the associativity of the operation. Anonymous nodes are assigned types only after all modules combine. Two PSSs can be merged only if the resulting subsumption relation is indeed a partial order, where the only obstacle can be the antisymmetry of the resulting relation.
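A rough sketch of the node-coalescing core of merge follows (our illustration; it coalesces only same-typed nodes, omitting indistinguishable anonymous nodes and the antisymmetry check that the full definition requires):

```python
def merge(s1, s2):
    """Union two PSS-like dicts (keys 'typing', 'sub', 'ap'),
    coalescing nodes marked by the same type.  Anonymous nodes are
    kept apart, mirroring the associativity requirement."""
    typing_all = {**s1['typing'], **s2['typing']}

    def canon(node):
        # Typed nodes are canonicalized by their type; anonymous by identity.
        ty = typing_all.get(node)
        return ('type', ty) if ty is not None else ('node', node)

    sub = {(canon(a), canon(b)) for (a, b) in s1['sub'] | s2['sub']}
    ap = {(canon(a), f, canon(b)) for (a, f, b) in s1['ap'] | s2['ap']}
    typing = {canon(q): t for q, t in typing_all.items()}
    return {'typing': typing, 'sub': sub, 'ap': ap}
```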

After all the modules are combined, the PSS is extended into a bona fide signature by assigning types to anonymous nodes, extending ‘⊑’ to a BCPO and extending Ap to a full appropriateness specification.

The solution of Cohen-Sygal and Wintner (2006) adheres to the desiderata of section 9.1.2, but provides very limited means for modules to interact: a module Mi can use an entity defined in Mj only by specifically referring to its type or by specifying all its attributes. Practically, Mi can only incorporate information from Mj by being familiar with all the information encoded in Mj. In the following section we open up new channels of communication among modules, inspired by modules in programming languages. We show that our definitions neatly support collaborative development of unification grammars.

SIGNATURE MODULES / 119

9.3 Signature modules

Signature modules extend PSSs and provide a complete, well-defined framework for modular development of type signatures and hence of typed unification grammars. Modules may choose which information to expose to other modules and how other modules may use the information they encode. Our solution extends the denotation of nodes by viewing them as parameters: similarly to parameters in programming languages, these are entities through which information can be imported from or exported to other modules.

Definition 3 A (signature) module over the sets TYPE, FEAT and NODES is a tuple P = 〈S, Int, Imp, Exp〉, where S = 〈Q, T, ⊑, Ap〉 is a PSS, and where:

- Int ⊆ Q is a set of internal types
- Imp ⊆ Q is an ordered set of imported parameters
- Exp ⊆ Q is an ordered set of exported parameters
- Int ∩ Imp = Int ∩ Exp = ∅
- for all q ∈ Int, T(q)↓

We refer to elements of (the repetition-free lists) Imp and Exp using indices, with the notation Imp[i], Exp[j], respectively. S is the underlying PSS of P.

A module is a PSS whose nodes are distributed among three sets of internal, imported and exported nodes. If a node is internal it cannot be imported or exported; but a node can be simultaneously imported and exported. A node which does not belong to any of the three sets is called external. Nodes denote types, as above, but they differ in the way they communicate with nodes in other modules. As their name implies, internal nodes are internal to one module and cannot interact with nodes in other modules. Such nodes provide a mechanism similar to local variables in programming languages.

Non-internal nodes may interact with nodes in other modules: imported nodes expect to receive information from other modules, while exported nodes provide information to other modules. Since anonymous nodes facilitate reference, in one module, to information encoded in another module, such nodes cannot be internal. The order of imported and exported nodes controls the assignment of parameters when two modules are combined, as will be shown below.2

A signature module is compacted in the same way a PSS is compacted; the classification of nodes is induced from the input module. Internal nodes may be coalesced only with each other, resulting in an internal node. If an

2 In fact, Imp and Exp can be general sets, rather than lists, as long as the combination operations can deterministically map nodes from Exp to nodes of Imp. For simplicity, we limit the discussion to the familiar case of lists, where matching elements from Exp to Imp is done by the location of the element on the list; see definitions 4 and 5.


imported node is coalesced with some other node, the resulting node is imported. Similarly, if one of the nodes is exported then the resulting node is exported. A module is extended to a bona fide type signature by extending its underlying PSS to a type signature, ignoring the classification of nodes.

Figure 2 depicts a module, P1, based on the PSS of Figure 1. In the examples, the classification of nodes (internal, imported, exported, or external) is encoded graphically in the figures.

FIGURE 2 A module, P1

9.4 Module Combination

We introduce two operators for combining signature modules. For both operators, we assume that the two modules are consistent: one module does not include types which are internal to the other module. If this is not the case, the internal nodes can be renamed.

We begin by lifting the merge operation from PSSs to modules. The only change is that the parameters need to be combined: this is achieved by concatenating the internal, imported and exported parameters in the two modules, respectively. The formal definition is suppressed; see Cohen-Sygal and Wintner (2006) for more details.

9.4.1 Attachment

The novel operation we introduce in this work is attachment. While the merge operation is symmetric, this is not the case for attachment, which behaves as a function, where one module is the input for another. Inspired by the concept of parameters in programming languages, the exported nodes of the called


module are viewed as actual parameters which instantiate the imported nodes of the calling module, which are viewed as formal parameters. Informally, a module P1 receives as input another module P2. The information encoded in P2 is added to P1 (as in the merge operation), but additionally, the exported parameters of P2 are assigned to the imported parameters of P1: each of the exported parameters of the called module is forced to coalesce with its corresponding imported parameter in the calling module, regardless of the attributes of these two parameters (i.e., whether they are indistinguishable or not).

Definition 4 Let P1 = 〈S1, Int1, Imp1, Exp1〉 and P2 = 〈S2, Int2, Imp2, Exp2〉 be two consistent modules where S1 = 〈Q1, T1, ⊑1, Ap1〉 and S2 = 〈Q2, T2, ⊑2, Ap2〉 are node-disjoint. P2 can be attached to P1 if the following conditions hold:

1. |Imp1| = |Exp2|
2. for all i, 1 ≤ i ≤ |Imp1|, if T1(Imp1[i])↓ and T2(Exp2[i])↓, then T1(Imp1[i]) = T2(Exp2[i])
3. S1 and S2 are mergeable
4. for all i, j, 1 ≤ i ≤ |Imp1| and 1 ≤ j ≤ |Imp1|, if Imp1[i] ⊑1* Imp1[j], then Exp2[j] ⋢2* Exp2[i]

The first condition requires that the number of formal parameters of the calling module be equal to the number of actual parameters in the called module. The second condition states that if two typed nodes are attached to each other, they are marked by the same type. If they are marked by two different types they cannot be coalesced. Finally, the last two conditions guarantee the antisymmetry of the subsumption relation in the resulting module: the third condition requires that the two underlying PSSs be mergeable, and the last condition requires that no subsumption cycles be created by the attachment of parameters.3
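The four attachment conditions can be sketched as a straightforward check. In the Python fragment below (our illustration) condition 3 is abstracted as a boolean, the reachability helper is ours, and condition 4 is applied to distinct parameter positions:

```python
def _reach(edges):
    """Return a reflexive-transitive reachability predicate over directed edges."""
    def reach(a, b):
        seen, stack = set(), [a]
        while stack:
            q = stack.pop()
            if q == b:
                return True
            if q not in seen:
                seen.add(q)
                stack.extend(y for (x, y) in edges if x == q)
        return False
    return reach

def can_attach(imp1, exp2, t1, t2, sub1, sub2, mergeable=True):
    """Check the attachment preconditions of Definition 4 (a sketch).
    imp1/exp2: parameter lists; t1/t2: partial typing dicts;
    sub1/sub2: immediate subsumption arcs."""
    reach1, reach2 = _reach(sub1), _reach(sub2)
    # 1. Equally many formal and actual parameters.
    if len(imp1) != len(exp2):
        return False
    # 2. Matched typed nodes must carry the same type.
    if any(q in t1 and p in t2 and t1[q] != t2[p]
           for q, p in zip(imp1, exp2)):
        return False
    # 3. The underlying PSSs must be mergeable (abstracted here).
    if not mergeable:
        return False
    # 4. No subsumption cycle may arise from identifying parameters.
    n = len(imp1)
    for i in range(n):
        for j in range(n):
            if i != j and reach1(imp1[i], imp1[j]) and reach2(exp2[j], exp2[i]):
                return False
    return True
```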

Definition 5 Let P1 = 〈S1, Int1, Imp1, Exp1〉 and P2 = 〈S2, Int2, Imp2, Exp2〉 be two consistent modules where S1 = 〈Q1, T1, ⊑1, Ap1〉 and S2 = 〈Q2, T2, ⊑2, Ap2〉 are node-disjoint. If P2 can be attached to P1, then the attachment of P2 to P1 is4 P1(P2) = compact(P), where P = 〈〈Q, T, ⊑, Ap〉, Int, Imp, Exp〉 is defined as follows: let ≡ be an equivalence relation over Q1 ∪ Q2 defined by the reflexive and symmetric closure of {(Imp1[i], Exp2[i]) | 1 ≤ i ≤ |Imp1|}. Then:

3 Relaxed versions of these conditions are conceivable, but we did not find such versions useful. For example, one can require |Imp1| ≤ |Exp2| rather than |Imp1| = |Exp2|; or that T1(Imp1[i]) and T2(Exp2[i]) be consistent rather than equal.

4 This is a simplified version; the full definition is more involved and requires some adjustment of the appropriateness arcs, to guarantee relaxed upward closure (definition 2).


- Q = {[q]≡ | q ∈ Q1 ∪ Q2}
- T([q]≡) = (T1 ∪ T2)(q′) if there exists q′ ∈ [q]≡ such that (T1 ∪ T2)(q′)↓, and T([q]≡) is undefined otherwise
- ⊑ = {([q1]≡, [q2]≡) | (q1, q2) ∈ ⊑1 ∪ ⊑2}
- Ap = {([q1]≡, F, [q2]≡) | (q1, F, q2) ∈ Ap1 ∪ Ap2}
- Int = {[q]≡ | q ∈ Int1 ∪ Int2}
- Imp = {[q]≡ | q ∈ Imp1}
- Exp = {[q]≡ | q ∈ Exp1}
- the order of Imp and Exp is induced by the order of Imp1 and Exp1, respectively.

The attachment of a parametric module P2 to a parametric module P1 is done in several stages: first the two graphs are unioned (this is a simple pointwise union of the coordinates of the graph), and all the exported nodes of P2

are identified with the imported nodes of P1, respectively. This is achieved through the equivalence relation ‘≡’. In this way, for each imported node of P1, all the information encoded by the corresponding exported node of P2 is added. Then, similarly to the merge operation, pairs of nodes marked by the same type and pairs of indistinguishable anonymous nodes are coalesced via the compact operation.

The imported and exported nodes of the resulting module are the equivalence classes of the imported and exported nodes of the calling module, P1, respectively. The nodes of P2 which are neither internal nor exported yield external nodes in the resulting module. This asymmetric view of nodes stems from the view of P1 as the calling module and P2 as the called module.
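The equivalence classes of ‘≡’ can be computed with a standard union-find; this sketch (ours, not the paper's) returns only the node-to-representative map, omitting compaction and the quotient of the arcs:

```python
def attach_quotient(nodes, pairs):
    """Identify Imp1[i] with Exp2[i] via the equivalence relation generated
    by `pairs`, returning a map from each node to a class representative
    (a small union-find with path compression)."""
    parent = {q: q for q in nodes}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path compression
            q = parent[q]
        return q

    for a, b in pairs:
        parent[find(a)] = find(b)
    return {q: find(q) for q in nodes}
```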

The parametric view of modules facilitates interaction between modules through two channels: by naming or by reference. Through interaction by naming, nodes marked by the same type are coalesced. Interaction by reference is achieved when the imported nodes of the calling module are coalesced with the exported nodes of the called module. The merge operation allows modules to interact only through naming, whereas attachment facilitates both kinds of interaction.

We present two examples that demonstrate the utility of module combination through attachment. Section 9.4.2 provides a toy grammar example, inspired by a linguistically-motivated type signature but necessarily small-scale. In Section 9.4.3 we emphasize the utility of module combination for grammar engineering tasks by implementing parametric lists, a concept that requires heavy machinery in alternative approaches and is natural and simple with signature modules. A large-scale example of the benefits of signature modules is outside the scope of this paper.


9.4.2 A simple example

Let P1 and P2 be the modules depicted in Figures 2 and 3. P1 stipulates two distinct (but anonymous) values for Approp(n, AGR) and Approp(v, AGR). P2

stipulates two nodes, typed nagr and vagr, with the intention that these nodes be coalesced with the two anonymous nodes of P1. Notice that all nodes in both P1 and P2 are non-internal. Let Imp1 = 〈q4, q5〉 and let Exp2 = 〈p9, p10〉. P1(P2) is the module depicted in Figure 4. Notice how q4, q5 are coalesced with p9, p10, respectively, even though q4, q5 are anonymous and p9, p10 are typed and each pair of nodes has different attributes. Such unification of nodes cannot be achieved with the merge operation.

FIGURE 3 An agreement module, P2

9.4.3 A utility example: Parametric lists

Lists and parametric lists are extensively used in typed unification-based formalisms, e.g., HPSG. The mathematical foundations for parametric lists were established by Penn (2000), resorting to infinite type signatures. To demonstrate the utility of signature modules, we show how they can be used to construct parametric lists without hampering the finiteness of the signature.

Consider Figure 5. The module List depicts a parametric list module. It receives as input, through the imported node q3, a node which determines the type of the list members. The entire list can then be used through the exported node q4. Notice that q2 is an external anonymous node. Although its intended denotation is the type ne list, it is anonymous in order to be unique for each copy of the list, as will be shown below. Now, if Phrase is a simple module consisting of one exported node, of type phrase, then the module obtained by List(Phrase) is depicted in Figure 6.
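The uniqueness-per-call behavior of List can be mimicked in a few lines of Python (an informal analogy, not the module semantics; node and type names follow Figure 5 loosely, and the fresh anonymous object mirrors the anonymous node q2):

```python
def instantiate_list(member_type):
    """Each call yields a fresh copy of a list signature fragment for the
    given member type: a fresh anonymous non-empty-list node, FIRST
    pointing at the member type, REST closing the list recursion."""
    ne = object()                  # fresh anonymous ne-list node, unique per call
    top = member_type + '_list'    # the exported list type (cf. q4)
    return {
        'sub': {(top, ne), (top, 'elist')},  # the list subsumes its two subtypes
        'ap': {(ne, 'FIRST', member_type),   # FIRST: a list member
               (ne, 'REST', top)},          # REST: the list itself
    }
```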

Other modules can now use lists of phrases; for example, the module


FIGURE 4 Attachment: P1(P2)

Struct of Figure 5 uses an imported node as the appropriate value for the feature COMP-DTRS. Via attachment, this node can be instantiated by List(Phrase), as in Figure 7.

More copies of the list with other list members can be created by different calls to the module List. Each such call creates a unique copy of the list, potentially with different types of list elements. Uniqueness is guaranteed by the anonymity of the node q2 of List.

A major difference between our solution and that of Penn (2000) is that while our solution produces only a finite number of list copies, one copy for each desired list, the solution of Penn (2000) recursively produces a list copy for all the nodes in the signature, resulting in infinitely many copies (and hence an infinite signature), while only a small finite number of them are actually necessary.

9.5 Conclusions

We presented a complete definition of type signature modules and their interaction. Unlike existing approaches, our solution is formally defined, mathematically proven, can be easily and efficiently implemented, and meets all the desiderata listed in section 9.1.2. Modular construction of signatures is a crucial step toward (typed unification) grammar modularity and an essential requirement for the maintainability and sustainability of large-scale grammars.

The examples we used are necessarily simplistic. We believe, however, that our definition of signature modules, along with the operations of merge


FIGURE 5 Modules defining parametric lists

FIGURE 6 List(Phrase)

and attachment, provide grammar developers with powerful and flexible tools for collaborative development of large-scale grammars.

Modules provide abstraction; for example, the module List of Figure 5 defines the structure of a list, abstracting over the type of its elements. In a real-life setting, the grammar designer must determine how to abstract away certain aspects of the developed theory, thereby identifying the interaction points between the defined module and the rest of the grammar. A first step in this direction was taken by Bender and Flickinger (2005) (section 9.1.3); we believe that we provide a more general, flexible and powerful framework to achieve the full goal of grammar modularization.

This work can be extended in various ways. First, the definition of modules can be extended to also include parts of the grammar, distributing the rules among several modules. The combination operators we defined for signature modules can be naturally extended to grammar modules. This reflects our observation (section 9.1.2) that most of the information in typed formalisms is encoded by the signature. We are actively pursuing this direction.

FIGURE 7 Struct(List(Phrase))

While this work is mainly theoretical, it has important practical implications. We would like to integrate our solutions in an existing environment for grammar development. An environment that supports modular construction of large-scale grammars will greatly contribute to grammar development and will have a significant impact on practical implementations of grammatical formalisms.

Once grammar modules are fully integrated in a grammar development system, two immediate applications of modularity are conceivable. One is the development of parallel grammars for multiple languages under a single theory, as in Bender et al. (2005) or King et al. (2005). Here, a core module is common to all grammars, and language-specific fragments are developed as separate modules. A second application is a sequence of grammars modeling language development, e.g., language acquisition or (historical) language change. Here, a “new” grammar is obtained from a “previous” grammar; formal modeling of such operations through module composition can shed new light on the linguistic processes that take place as language develops.

Acknowledgments

This research was supported by THE ISRAEL SCIENCE FOUNDATION (grant No. 137/06). We are grateful to the participants of the ISF Workshop on Large-scale Grammar Development and Grammar Engineering, held in Haifa, Israel, in June 2006, for useful comments and feedback. Special thanks to Nurit Melnik and Gerald Penn for detailed discussions, and to the FG-2008 reviewers for useful comments. All remaining errors are, of course, our own.

References

Bender, Emily M. and Dan Flickinger. 2005. Rapid prototyping of scalable grammars: Towards modularity in extensions to a language-independent core. In Proceedings of IJCNLP-05. Jeju Island, Korea.

Bender, Emily M., Dan Flickinger, Fredrik Fouvry, and Melanie Siegel. 2005. Shared representation in multilingual grammar engineering. Research on Language and Computation 3:131–138.

Carpenter, Bob. 1992. The Logic of Typed Feature Structures. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press.

Cohen-Sygal, Yael and Shuly Wintner. 2006. Partially specified signatures: A vehicle for grammar modularity. In Proceedings of COLING-ACL, pages 145–152. Sydney, Australia.

Copestake, Ann and Dan Flickinger. 2000. An open-source grammar development environment and broad-coverage English grammar using HPSG. In Proceedings of LREC-2000. Athens, Greece.

Dalrymple, Mary. 2001. Lexical Functional Grammar. Academic Press.

Hinrichs, Erhard W., W. Detmar Meurers, and Shuly Wintner. 2004. Linguistic theory and grammar implementation. Research on Language and Computation 2:155–163.

Keselj, Vlado. 2001. Modular HPSG. Tech. Rep. CS-2001-05, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada.

King, Tracy Holloway, Martin Forst, Jonas Kuhn, and Miriam Butt. 2005. The feature space in parallel grammar writing. Research on Language and Computation 3:139–163.

Klint, Paul, Ralf Lämmel, and Chris Verhoef. 2005. Toward an engineering discipline for grammarware. ACM Transactions on Software Engineering and Methodology 14(3):331–380.

Moshier, Andrew M. 1997. Is HPSG featureless or unprincipled? Linguistics and Philosophy 20(6):669–695.

Oepen, Stephan, Daniel Flickinger, J. Tsujii, and Hans Uszkoreit, eds. 2002. Collaborative Language Engineering: A Case Study in Efficient Grammar-Based Processing. Stanford: CSLI Publications.

Pendar, Nick. 2007. Soft constraints at interfaces. In T. H. King and E. M. Bender, eds., Proceedings of the GEAF07 Workshop, pages 285–305. Stanford, CA: CSLI.

Penn, Gerald B. 2000. The algebraic structure of attributed type signatures. Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.

Pollard, Carl and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press and CSLI Publications.

Wintner, Shuly. 2002. Modular context-free grammars. Grammars 5(1):41–63.

List of Contributors

Allen, James F.

Department of Computer Science, University of Rochester, Rochester, NY 14627, U.S.A.

[email protected]

Bjerre, Anne

University of Southern Denmark, Engstien 1, DK-6000 Kolding, Denmark

[email protected]

Bjerre, Tavs

Aarhus University, Jens Chr. Skous Vej 7, DK-8000 Århus, Denmark

[email protected]

Egg, Markus

Rijksuniversiteit Groningen, Oude Kijk in ’t Jatstraat 26, 9712 EK Groningen, The Netherlands

[email protected]



Kepser, Stephan

CRC 441, University of Tübingen, Nauklerstr. 35, 72074 Tübingen, Germany

[email protected]

Koehn, Philipp

School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, United Kingdom

[email protected]

Lecomte, Alain

UMR “Structures Formelles de la Langue” CNRS-Paris 8, 2, rue de la Liberté, F-93526 Saint-Denis cedex, France

[email protected]

Maier, Wolfgang

SFB 441, Universität Tübingen, Nauklerstr. 35, 72074 Tübingen, Germany

[email protected]

Manshadi, Mehdi Hafezi

Department of Computer Science, University of Rochester, Rochester, NY 14627, U.S.A.

[email protected]


Pullum, Geoffrey K.

Linguistics and English Language, University of Edinburgh, 14 Buccleuch Place, Edinburgh EH8 9LN, Scotland, United Kingdom

[email protected]

Søgaard, Anders

Universität Potsdam, Institut für Linguistik, Karl-Liebknecht-Str. 24/25, D-14476 Golm, Germany

[email protected]

Swift, Mary

Department of Computer Science, University of Rochester, Rochester, NY 14627, U.S.A.

[email protected]

Sygal, Yael

Department of Computer Science, University of Haifa, Mount Carmel, 31905 Haifa, Israel

[email protected]

Tiede, Hans-Jörg

Department of Mathematics and Computer Science, Illinois Wesleyan University, P.O. Box 2900, Bloomington, IL 61702-2900, U.S.A.

[email protected]


Wintner, Shuly

Department of Computer Science, University of Haifa, Mount Carmel, 31905 Haifa, Israel

[email protected]