Representation Theory
Representation Theory Edwin Williams
The MIT Press
Cambridge, Massachusetts
London, England
© 2003 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any
electronic or mechanical means (including photocopying, recording, or informa-
tion storage and retrieval) without permission in writing from the publisher.
This book was set in Times New Roman on 3B2 by Asco Typesetters, Hong
Kong, and was printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Williams, Edwin.
Representation theory / by Edwin Williams.
p. cm. — (Current studies in linguistics)
Includes bibliographical references and index.
ISBN 0-262-23225-1 (hc. : alk. paper) — ISBN 0-262-73150-9 (pbk. : alk. paper)
1. Grammar, Comparative and general—Syntax. I. Title. II. Current studies in
linguistics series
P291 .W54 2002
415—dc21 2002071774
To Daddy
Contents

Preface
Introduction: Architecture for a New Economy
Chapter 1: Economy as Shape Conservation
Chapter 2: Topic and Focus in Representation Theory
Chapter 3: Embedding
Chapter 4: Anaphora
Chapter 5: A/Ā/Ā̄/Ā̄̄
Chapter 6: Superiority and Movement
Chapter 7: X-Bar Theory and Clause Structure
Chapter 8: Inflectional Morphology
Chapter 9: Semantics in Representation Theory
References
Index
Preface
In 1971 I wrote the two required qualifying papers for Ph.D. dissertation
work in linguistics. One was about ‘‘small clauses’’—the notion that
clause structure has several layers, that syntactic operations are associated
with particular layers, and that each layer can be embedded directly,
without the mediation of the higher layers. The other proposed that tones
in tonal languages compose structures that are independent of segmental
or syllabic structure and that a certain kind of mapping holds between the
tonal and segmental representations. I guess these were the two best ideas
I’ve ever had. After thirty years of trying to bring something better to
light, I have given up and have determined instead that my further con-
tribution will be to combine them—if not into one idea, then at least into
one model of the linguistic system. That is what I try to do in this book.
The two ideas take the following guise: (1) syntactic economy is actually
shape conservation (here I return to the idea from tonal systems that
grammar involves not one complex representation, but two simple ones
put into a simple relation to one another), and (2) different clausal types
can be embedded at different levels (the Level Embedding Conjecture—
an implementation of the ‘‘small clause’’ idea).
In fact, though, when I put those two ideas together, a third popped
out that isn’t inherent in either. It’s this third idea that is responsible for
the sharpest new predictions in the book: the generalization of the A/Ā
system to A/Ā/Ā̄/Ā̄̄ . . . , which may be viewed as an n-ary generalization
of the binary structure of the NP Structure model proposed in Van
Riemsdijk and Williams 1981. So this book also brings forward a strand
of my collaboration with longtime friend Henk van Riemsdijk.
Most of the ideas in this book have been presented in series of four to
five lectures or in one-week summer school courses: in Lesbos (1999),
Plovdiv (1999), and Vienna (1996–2001), and at UCLA (1997),
University of British Columbia (1998), LOT (2001), and University College
London (2002). Other parts were presented in multiple meetings of the
Dutch/Hungarian verb movement study group in Wassenaar, Pécs,
Budapest, and Öttevény, in the years 1997–2001. I have benefited particularly
from the extended contact with an audience that such series afford.
I have received encouragement in developing this book from Peter
Ackema, Misi Brody, Memo Cinque, Christine Czinglar, Henry Davis,
Rose-Marie Déchaine, Marcel den Dikken, Hans-Martin Gärtner, Jane
Grimshaw, Yosef Grodzinsky, Catherine Hanson, Steve Hanson, Marika
Lekakou, Mark Liberman, Ad Neeleman, Andrew Nevins, Øystein Nilsen,
Jean-Yves Pollock, Martin Prinzhorn, Henk van Riemsdijk, Dominique
Sportiche, Tim Stowell, Peter Svenonius, Anna Szabolcsi, Kriszta
Szendrői, and Martina Wiltschko.
I do heartily thank Anne Mark for applying the Jaws of Life to the
car wreck of a manuscript she got, and I won’t let her edit just this one
sentence so the reader may understand exactly what there is to be
grateful to her for and why.
Introduction
Architecture for a New Economy
Opus ultra vires nostras agere praesumsimus. [‘We have presumed to undertake a work beyond our powers.’]
The work reported here brings to light two main findings: first, when
syntax is economical, what it economizes on is shape distortion rather
than distance; and second, this new notion of economy calls for a new
architecture for the grammatical system, and in fact a new notion of
derivation.
For example, the theta structure on the left in (1) has the same shape as
the Case structure on the right.
(1) [agent [predicate theme]] ↝ [nominative [Case-assigner accusative]]
The two structures are isomorphic in an obvious sense. I will speak in
this book of one structure as representing another structure if it is iso-
morphic to it, and I will use the wavy arrow to symbolize this type of
representation.
Sometimes one structure will be said to represent another even if not
isomorphic to it, so long as it is nearly isomorphic to it, and nothing else
is closer to it. It is in this sense that syntax economizes on, or tries to
minimize, shape distortion. I will present evidence that this gives a better
account of economy than distance minimization principles like Shortest
Move. The issue can become subtle, as each theory can be made to mimic
the other; in fact, I will argue that some of the uses of distance minimi-
zation economy in the minimalist literature are transparent contrivances
to achieve shape conservation with jury-rigged definitions of distance.
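The claim that syntax economizes on shape distortion rather than distance can be made concrete with a toy computation. The sketch below is purely illustrative (the nested-tuple encoding and the function name are my own, not part of the theory): it treats structures like those in (1) as trees and checks whether two trees are isomorphic, i.e., have the same branching shape regardless of labels.

```python
# Toy illustration (not from the book): a structure is a leaf (a label)
# or a tuple of substructures. Two structures count as isomorphic when
# their branching patterns match, whatever their labels.

def isomorphic(s1, s2):
    """True if s1 and s2 have the same tree shape."""
    leaf1, leaf2 = isinstance(s1, str), isinstance(s2, str)
    if leaf1 or leaf2:
        return leaf1 and leaf2          # two leaves match; leaf vs. node fails
    return (len(s1) == len(s2)
            and all(isomorphic(a, b) for a, b in zip(s1, s2)))

# (1): the theta structure and the Case structure are isomorphic.
theta = ("agent", ("predicate", "theme"))
case = ("nominative", ("Case-assigner", "accusative"))
print(isomorphic(theta, case))          # True
```

On this view, a wavy arrow between two structures is licensed, in the simplest case, exactly when a check like this succeeds.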
The need for a new architecture should be evident from (1). In order to
say that a theta structure is isomorphic to a Case structure, we need to
have the two structures in the first place. The two structures in (1) have
no standing in standard minimalist practice: there is no theta structure
that exists independent of Case structure; rather, Case and theta are two
parts, or de facto ‘‘regions,’’ of a single structural representation of a
clause, the notion of clause that began with Pollock 1989 and has been
elaborated in functional structure sequencing labs around the world. The
model in which a principle of shape conservation will fit most naturally is
one in which the several different aspects of clausal structure are charac-
terized as separate ‘‘sublanguages’’ (to anticipate: Theta Structure (TS),
Case Structure (CS), Surface Structure (SS), Quantification Structure
(QS), Focus Structure (FS)). Then the syntax of a sentence will be a col-
lection of structures, one (or more; see chapter 3) from each of these sub-
languages, and a set of shape-conserving mappings among them. In this
sense, then, a new economy (shape conservation) calls for a new archi-
tecture ({TS, CS, SS, QS, FS}).
The new architecture offers a new style of clausal embedding that has
no analogue in standard minimalist practice: the Level Embedding
Conjecture of chapter 3, a scheme that tightly fixes the relation among
locality, reconstructivity, and target type for syntactic relations in a way
that I think is not available in any other model of grammar (see below
for definitions of these terms). The new architecture requires, and in fact
automatically provides, a generalization of the A/Ā distinction to an
A/Ā/Ā̄/Ā̄̄ . . . distinction to derive these correlations.
I have called the theory Representation Theory to put the notion of
economy at the forefront: a Case structure ‘‘represents’’ a theta structure
it is paired with, and the essence of representation is isomorphism. So,
syntax is a series of representations of one sublanguage in another.
Chapter 1 develops some analyses in which shape conservation is
manifestly implicated in the domains of lexical structure, compound
structure, bracketing paradoxes, and Case-theta relations. These serve as
a basis for framing a general theory according to which syntax consists
of the sublanguages mentioned earlier, with the representation relation
holding among them. The Mirror Principle is viewed not as a principle
but as an effect that arises automatically whenever two different
sublanguages each represent a third.
Chapter 2 applies the model to scrambling and its relation to
topicalization, scope, and focus, using the concept of shape conservation to
reanalyze these domains. Known properties are shown to follow from
conflicting representation requirements, and language differences are
analyzed as different choices in resolving such conflicts.
Chapter 3 defines different kinds of embedding for each of the
sublanguages. At the two extremes are clause union embedding in TS
and the isolating, nonbridge verb embedding in FS; intermediate-sized
clauses are embedded in the intervening levels. The Level Embedding
Conjecture (LEC) says that the different clause types are not all embedded
at the same level; rather, each type is embedded at the level at which it is
defined. This leads to derivations quite different from those generated in
other known models of syntax. A generalized version of the Ban on
Improper Movement follows from this architecture.
Chapters 4–6 explore consequences of the LEC proposed in chapter 3.
Three characteristics of a rule are its locality (its range), its reconstruc-
tivity (for a movement (or scrambling) rule, which relations are computed
on its input and which on its output), and its target (the type of element—
A-position, Ā-position, Ā̄-position, and so on—it targets). RT with the
LEC automatically fixes the connections among them, or correlates them
(I will thus refer to LRT correlations), enabling us to answer questions
like ‘‘Why does long-distance scrambling reconstruct for binding theory,
but not short-distance scrambling?’’ and generalized versions of such
questions.
Chapter 4 defines di¤erent kinds of anaphors for each sublanguage;
tight ‘‘coargument’’ anaphors are defined at TS, and long-distance ana-
phors at SS. The theory draws a connection between the locality of an
anaphor and the type of antecedent it can have, where the types are
‘‘coargument, A position, Ā position, . . . ,’’ in line with the LRT correla-
tions of chapter 3.
Chapter 5 develops the empirical consequences of the generalized no-
tion of the A/Ā relation that flows from the LEC, and the resulting
generalized notion of reconstruction. Essentially, every pair of sublanguages
in a representation relation can be said to give rise to a different A/Ā
distinction and a different level of reconstruction.
Chapter 6 draws a distinction between movement, as classically under-
stood, and misrepresentation, as defined here. Under special circum-
stances an element might seem to have moved because it occurs in a
structure that stands in a representation relation with another structure
it is not strictly isomorphic to. I argue that classical movement does not
reduce to misrepresentation, and in fact both are needed. Classical wh
movement, for example, is a part of the definition of the sublanguage SS
and does not arise through misrepresentation. In particular, I argue that
parallelism effects observed in multiple wh movements are not the same
kind of thing as the parallelisms that motivate shape conservation and
that they appear to be so only for the simplest cases.
Chapters 7 and 8 develop the RT account of phrase structure and head
movement. Chapter 7 develops an account of X-bar theory in which a
lexical item directly ‘‘lexicalizes’’ a subsequence of functional structure; it
then defines the notion of X-bar category in syntax implied by this notion
of what a lexical item does. It is a consequence of the A/Ā generalization
of previous chapters that Relativized Minimality must be a misgeneral-
ization, in that it attempts to subsume head movement under a general
theory of movement. Chapter 7 argues that head movement is not move-
ment, but part of the X-bar calculus, and its locality follows from the
laws of X-bar theory, not movement theory.
Chapter 8 explains how such lexicalized subsequences can be spelled
out morphologically. Representation is argued to directly derive the Mir-
ror Principle, with a strict separation of syntax and morphological spell-
out. A model of inflectional morphology is developed, a combinatorial
system called CAT, which predicts a precise range of possible realizations
of a set of universal inflectional elements; those possible realizations
are compared with known facts. The mechanism is also applied to verb
cluster systems and is proposed to be the underlying syntax of the recur-
sive argument-of relation, wherever it occurs.
Chapter 9 develops some preliminary ideas about how semantics must
be done differently in RT. Semantic compositionality must be rethought
in view of the syntactic architecture; it becomes less likely that there can
be a single representation of linguistically determined meaning. Chapter 9
also elaborates the notion of semantic value defined at each level, and it
seeks to explicate the differences among types of anaphora (pronominal
anaphora, ellipsis, anaphoric destressing) in terms of these different kinds
of value.
Chapter 1
Economy as Shape Conservation
I begin by exploring a problem with the usual solutions to bracketing
paradoxes. The solution to this problem leads to a new principle of
economy, Shape Conservation, which shows itself capable of replacing
the more familiar economy principles. I fashion a new theoretical archi-
tecture to maximize the empirical scope of the principle.
Linguists have distinguished two types of economy, local and nonlocal,
to use Collins’s (1996) terminology—that is, economy that compares
derivations and economy that does not. Although research seems to have
moved away from nonlocal economy, the principle studied here is non-
local, and transderivational.
It is sometimes suggested that computational considerations weigh
against nonlocal economy, but I am personally willing to put such con-
siderations aside while I try to figure out what the overall organization of
the grammatical systems should be. The computation would seem to re-
duce to a metric of tree similarity, which the considerations presented in
this book delimit somewhat, but do not fully determine.
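A minimal sketch of what such a metric of tree similarity might look like, under my own assumptions (the cost function is invented for illustration; the book deliberately leaves the metric underdetermined): mismatched branching is penalized, so an isomorphic pair scores 0 and competing representations can be ranked by distortion.

```python
# Illustrative distortion metric over nested-tuple trees (a toy
# definition of my own; the book does not commit to a particular metric).

def size(s):
    """Number of leaves in a structure."""
    return 1 if isinstance(s, str) else sum(size(c) for c in s)

def distortion(s1, s2):
    """0 for isomorphic structures; larger as shapes diverge."""
    leaf1, leaf2 = isinstance(s1, str), isinstance(s2, str)
    if leaf1 and leaf2:
        return 0
    if leaf1 or leaf2:
        return size(s2 if leaf1 else s1)    # leaf matched against a node
    extra = abs(len(s1) - len(s2))          # unmatched daughters
    return extra + sum(distortion(a, b) for a, b in zip(s1, s2))

# A shape-conserving pair is cost-free; a flattened structure is penalized.
print(distortion(("a", ("b", "c")), ("x", ("y", "z"))))  # 0
print(distortion(("a", ("b", "c")), ("x", "y", "z")))    # 3
```

A nonisomorphic match is then licensed, on the view developed here, only when no candidate with a lower score is available.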
1.1 Bracketing Paradoxes
First, the problem with bracketing paradoxes. A bracketing paradox is a
situation in which two different considerations about the structure of a
form lead to different conclusions. Usually, but not always, one consid-
eration is syntactic and the other semantic. Bracketing paradoxes are
generally dispelled by some kind of syntactic restructuring operation. My
first point is that any such restructuring must be inhibited by the existence
of other derivations and forms, and that the relation of the restructuring
to these other derivations and forms is ‘‘economical.’’
The phrase beautiful dancer is a typical example of a bracketing para-
dox. The phrase is famously ambiguous, having the meanings ‘beautiful
one who dances’ and ‘person who dances beautifully’. The two meanings
can be represented as in (1a). The first reading is easy to get, but the sec-
ond is a bracketing paradox. (& and ~& stand for ambiguous and non-
ambiguous, respectively.)
(1) a. &a beautiful [dance -er]
i. beautiful [-er [dance]]
ii. [-er [beautiful dance]]
b. ~&a beautiful [person who dances]
c. a person who [dances beautifully]
d. *a [beautiful dance] -er
If (1ai) is the logical structure of (1a), and if modification is restricted
to sisters, then (1a) should have only the meaning ‘beautiful one who
dances’, because beautiful is sister to an expression headed by -er, which
means ‘one who’. But this leaves no room for (1aii), because that mean-
ing, ‘one who dances beautifully’, has a logical structure that is athwart
the structure of (1a).
Now, we could write a restructuring rule that would relate (1aii) to
(1a), thus making it ambiguous, on the assumption that the relation of
(1a) to (1ai) is transparent and will arise in any case. But a problem then
arises for (1b). If we write the restructuring rule for (1a) in a completely
general form, it will most likely apply to (1b) as well, making it ambigu-
ous too; but it is not. Then why is it not? The idea I would like to explore,
or exploit, is that (1b) is not ambiguous because the ‘‘paradoxical’’
branch of the ambiguity is already covered by a di¤erent form, namely,
(1c), and (1c) fits the meaning better, that is, more transparently. In other
words, (1c) blocks (1b) from having its nontransparent meaning. By con-
trast, the right (transparent) form for the other, nontransparent meaning
of (1a) is (1aii), which cannot be generated. So what we have here is a
principle that says, ‘‘Use the right form, unless there isn’t one. If there
isn’t one, it’s OK to use a form that doesn’t match the meaning.’’
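The principle ‘‘Use the right form, unless there isn’t one’’ can be sketched as a competition among candidate forms. The sketch below is my own illustration of (1), not machinery the book proposes: each meaning is paired with its transparent (‘‘right’’) form, and some other form may carry that meaning only when the right form is not generable.

```python
# Toy blocking computation for (1) (illustrative encodings, not the
# book's). A form carries a meaning if it IS that meaning's transparent
# form, or if the transparent form is ungrammatical, as (1d) is.

TRANSPARENT = {
    "beautiful one who dances": "beautiful [dance-er]",
    "one who dances beautifully": "[beautiful dance]-er",         # = (1d)
    "beautiful person who dances": "beautiful [person who dances]",
    "person who dances beautifully": "a person who [dances beautifully]",
}
GENERABLE = lambda form: form != "[beautiful dance]-er"           # *(1d)

def readings(form, candidate_meanings):
    """Meanings a form can carry once blocking is taken into account."""
    return [m for m in candidate_meanings
            if TRANSPARENT[m] == form or not GENERABLE(TRANSPARENT[m])]

# (1a) is ambiguous: it also picks up the meaning orphaned by (1d).
print(len(readings("beautiful [dance-er]",
                   ["beautiful one who dances",
                    "one who dances beautifully"])))               # 2
# (1b) is not: (1c) already covers the paradoxical meaning transparently.
print(len(readings("beautiful [person who dances]",
                   ["beautiful person who dances",
                    "person who dances beautifully"])))            # 1
```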
Of course, the failure of (1b) to mean (1c) could be explained
differently; it could be taken to represent an island or locality condition on the
restructuring operation, for example. But further cases, such as com-
pounding, suggest that this view is wrong.
It is a stricture of form that gives rise to the gap. We expect then that if
there is no stricture of form, there will be no gap, and consequently no
bracketing paradox. The system of English compounds is striking in its
lack of bracketing paradoxes.
(2) a. kitchen [TOWEL rack]
b. [KITCHEN towel] rack
c. ‘‘[x [y z]] means [[x y] z] only if [x [y z]] is not generable’’
For example, we have the famous compounds in (2a) and (2b), which
have the meanings and accent patterns indicated. Each structure deter-
mines a different meaning and a different pronunciation; therefore, the
meanings and pronunciations and structures are in one-to-one corre-
spondence, and we can say that in each case the structure ‘‘mirrors’’ or
‘‘represents’’ the meaning perfectly. Importantly, (2a) is unambiguous
and cannot have the meaning that (2b) has. The question is, why isn’t the
restructuring mechanism for bracketing paradoxes, whatever it is, appli-
cable here? Why can’t the form in (2a) (and its predictable pronunciation)
be restructured so it is semantically interpreted as though it were like
(2b)? In light of the examples in (1), we may now attribute the lack of
ambiguity in (2a) to the existence of the form (2b) itself; the other mean-
ing that (2a) would have is the one that (2b) represents directly. The rea-
son there are no bracketing paradoxes in the compounding system is that
the ‘‘right’’ structure is always generable; this is expressed in (2c). And the
reason for that is that there are only the barest restrictions on the syntax
of N-N compounds—any pair of nouns can be concatenated, bar none. It
is only where there is some stricture of form, as in (1d), that bracketing
paradoxes can arise. The rest of the book develops this idea, exercised
across the whole of syntax, and the architecture that syntactic theory and
syntactic derivation must have in order for this account of bracketing
paradoxes to work.
That language should seek isomorphic matches between related struc-
tures, and accept nonisomorphic matches only when isomorphic matches
are missing, is really an application of Pāṇini’s principle, ‘‘Use the most
specific applicable form.’’ The isomorphic form is simply the most specific
applicable form, and distorted forms are available only when the
isomorphic form is not. Shape Conservation thus turns Pāṇini’s principle
into the economy condition governing syntax. (For further thoughts on
this, see Williams 1997, where it is shown that even economy-as-distance-
minimization can be construed as an application of Pāṇini’s principle.)
I will now expand the scope of this kind of treatment somewhat. Ad-
verbial modification also manifests bracketing paradoxes. As a sentence
adverb, probably must modify a tense, or some higher sentence operator.
Completely, on the other hand, must modify the VP. (3) shows how these
sort out: in (3a) and (3b) Tense and V are separate, so the two adverbs
occupy separate positions. But what happens when they are not separate,
as in (3c,d)?
(3) a. John probably was completely cleaning it out.
b. *John completely was probably cleaning it out.
c. John probably [clean + ed] it out. (probably [-ed [clean]])
d. John completely [clean + ed] it out. (??completely [-ed [clean]])
(3c) poses no problem: if Tense is the exterior element of V + ed, then
probably can be seen as modifying it directly. But then (3d) is a bracket-
ing paradox: past tense intervenes between the adverb completely and the
verb it is meant to be modifying. So, completing the analogy with the
previous examples, we can say that (4b) can have the meaning in (4a),
because (4a) itself is not generable.
(4) a. *[completely clean]V -ed
b. completely [clean -ed]V
This gives a different view of modification than we would expect to
have in the exploded-Infl clause structure proposed by Pollock (1989). In
the Pollock-style clause structure this particular bracketing paradox does
not arise. Tense is a separate element; both (3a) and (3b) have the struc-
ture in (5), and each adverb is adjoined to its proper modifiee. Then V
moves to Tense covertly.
(5) [probably T [completely [V NP]]]
A fully lexicalist account of inflection, where functional structure is not
part of clause structure directly but is rather part of the internal structure
of lexical items, will always involve us in these sorts of bracketing para-
doxes, and so the viability of the lexicalist account depends on gaining
some understanding of how bracketing paradoxes work. My first guess is
that bracketing paradoxes arise when the ‘‘best match’’ for a given struc-
ture is not available for some reason, so the ‘‘next best match’’ must be
used. In a lexicalist account of inflection, functional structure will be vis-
ible only at the ‘‘joints’’ between words, so any case in which an adverb
modifies an interior element will be a bracketing paradox. Chapters 7 and
8 pursue a lexicalist model of inflection in RT.
1.2 The Meaning of Synthetic Compounds
The notion of representation, as understood here, can also be applied
to the interpretation of compound structures. The problem to be solved
arises precisely because of the extreme productivity of compounding. Any
two nouns can be put together, and some meaning that connects them
can be concocted, the only inhibition being that the head, in the normal
case, must count as setting the ‘‘major dimension’’ in determining the
meaning of the compound; the nonhead then provides discrimination
within the major dimension. So my students have no trouble thinking of
endless lists of possible relations that could hold between the two ran-
domly selected nouns biography and bicycle of the following compounds:
(6) a. biography bicycle: a bicycle on which biographies are inscribed, a
bicycle on which manuscripts of biographies are messengered in a
large publishing house, etc.
b. bicycle biography: a biography written while touring on a bicycle,
the biography of a bicycle, etc.
Although there are quite narrow rules for pronouncing compounds, it
would seem we can be no more precise about how to determine their
meaning than to say, ‘‘Find some semantic relation that can hold between
the two elements.’’ This is the general understanding of what have been
called root compounds.
It has also been suggested that there is a substrain of compounds,
complement-taking deverbal nouns, that follows a more precise rule.
(7) a. book destroyer
b. church goer
c. *goer
If the root rule can compose new forms only out of existing forms, then
the nonexistence of (7c) is cited as evidence that (7b) cannot arise simply
by applying that rule; hence, a special rule for these synthetic compounds
is postulated (Roeper and Siegel 1978). The synthetic rule is a specific rule
that manipulates the thematic structure of a lexical item, adding an ele-
ment that satisfies one of its theta roles. For example, starting with the
verb go, which takes a goal argument, this rule adds church to it to satisfy
that goal argument in the compound structure.
One problem with positing two rules for English compounding, the
root rule and the synthetic rule, is that the outputs of the two rules are
suspiciously similar: both give rise to head-final structures, with identical
accent patterns. But a much greater problem, and the one I want to con-
centrate on, is that the output of the synthetic rule is completely swamped
by the output of the root rule.
Since the root rule says, ‘‘Find some relation R . . . ,’’ with no imagin-
able restrictions on what R can be, and since ‘‘x is (or ‘restricts the refer-
ence of ’) the thematic complement of y’’ is some such R, what is to stop
the root rule from deriving compounds just like the synthetic rule, making
the synthetic rule redundant?
We might begin by thinking of the connection between the two rules as
a ‘‘blocking’’ relationship (i.e., governed by Pāṇini’s rule): the specific
synthetic rule blocks the more general root rule, in order to prevent
the root rule from deriving synthetic compounds. I think the intuition
behind this idea is correct, but it raises a telling question that can only be
answered by bringing in the notion of representation in the sense devel-
oped here.
But the first thing to establish is that there really is a problem. Is there
anything to be lost by simply giving up the synthetic rule, leaving only the
root rule for interpreting compounds? There is at least this: the root rule
will not only derive all the good synthetic compounds, but also derive bad
ones. Consider a further fact about synthetic compounds, specifically
about nominalizations derived from ditransitive verbs like supply: the two
theta roles for supply have to be realized in a particular order with the
noun supplier.
(8) a. army gun supplier
b. *gun army supplier
Presumably (8a) is the only form generated by the specific synthetic rule,
but why can (8b) not then be generated by the root rule? The answer
cannot be ‘‘blocking,’’ because the synthetic rule cannot produce (8b),
and so the root rule will not be blocked for that case. Apparently army
supplier is a decent compound on its own, so the question reduces to this:
what is to stop the root rule from composing gun and army supplier as
shown in (9) (where R(x, y) is ‘‘y is (or ‘restricts the reference of’) the
theme argument of the head of x’’)?
(9) a. Syntax: gun + ‘‘army supplier’’ → gun army supplier
b. Semantics: R(army supplier, gun)
If such Rs are admitted, and I see no principled way to stop them, then
the root rule can derive anything, including ‘‘bad’’ synthetic compounds
—a real problem.
In fact, if any R is allowed, it is not even clear how to maintain the
special role of the head in compounds—the right R could effectively
reverse the role of head and nonhead.
(10) R(H, non-H) = (some) R′(non-H, H)
In other words, R says, ‘‘Interpret a compound as though the head were
the nonhead and the nonhead were the head.’’ This R defeats the very
notion of head, as it pertains to meaning.
To treat the second problem first: whatever the semantic content of the
notion ‘‘head’’ is, it relies on every relation R having an obvious choice
about which end of the relation is the ‘‘head’’ end. Semantically, the head
is the ‘‘major dimension’’ of referent discrimination. In the ordinary case
such as baby carriage the choice is obvious: a baby carriage is not a baby
at all, but a type of carriage, subtype baby. But the dvandva compounds
show how very slim the semantic contribution of headship can be.
(11) a. baby athlete
b. athlete baby
(12) a. athlete celebrity
b. celebrity athlete
In each of these the (a) and (b) examples have the same referents: ‘‘things
that are babies and athletes’’ or ‘‘things that are athletes and celebrities.’’
But in fact (11b) is somewhat strange, presumably because it implies that
babies come in types, one of which is ‘‘athlete,’’ even if it is not obvious
why this is less acceptable than the notion that athletes come in types, one
of which is ‘‘baby.’’
I think that these are both ‘‘representational’’ questions: what syntactic
structures ‘‘represent’’ various semantic structures, where one structure
represents another by mirroring its structure and parts.
We can turn the concept of head into a representational question, in
the following way:
(13) Suppose that
a. the head-complement relation is a syntactic relation [H C], and
b. R is any asymmetric semantic relation {A, B} between two
elements.
Then how is [H C] to be matched up with {A, B}?
The syntactic relation will best ‘‘represent’’ the semantic relation if its
asymmetry is matched by the asymmetry of {A, B}—but which identifi-
cation is the one that can be said to match the asymmetries? This question
can best be answered by first considering the question, what is the syn-
tactic asymmetry itself? I think the source of the syntactic asymmetry is
the syntactic definition of head, which I take to be the following, or at
least to have the following as an immediate consequence:
(14) [H C] is a (syntactic thing of type) H.
That is, syntactically, a unit composed of a head and its complement
([H C]) ‘‘is a thing of the same type as’’ the type of H itself. Phrasing the
matter this way, there can be no question which of the two items in (15b)
is ‘‘best represented’’ by the form in (15a), namely, (15bi).
(15) a. [baby athlete]
b. i. ‘‘baby athlete’’ is a thing of the same type as ‘‘athlete’’
ii. ‘‘baby athlete’’ is a thing of the same type as ‘‘baby’’
And likewise for ‘‘athlete baby.’’
Crucially, I am assuming that the representation must match the asym-
metry of syntax with some asymmetry in the structure it is representing,
as a part of the representation relation.
Now let us return to *gun army supplier. By the root derivation men-
tioned earlier, this form has (among others) a meaning in common with
army gun supplier, and the question is how to block that. To apply the
logic above, we must assume that there is a theta structure with the form
in (16a), but none with the form in (16b).
(16) a. [goal [theme supplier]]
b. *[theme [goal supplier]]
This is a fact about theta structures themselves, not how they are repre-
sented. Then we can say that this is best represented by a structure in
which the highest N is mapped to the goal, and the next highest N is
mapped to the theme, rather than the reverse.
(17) [tree diagram not reproduced: army gun supplier, with the higher nonhead army mapped to the goal of (16a) and the lower nonhead gun mapped to the theme]
The result is that R can be any imaginable relation; but for a given
representation relation, we must choose R so as to maximize isomor-
phism to the represented structure. This is why the root rule appears to be
constrained by the synthetic rule. A compound does not have to represent
a theta relation; but if it does, it must do so in the best possible way.
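The conclusion that a compound need not represent a theta relation, but that if it does it must do so in the best possible way, can be sketched as choosing the role-to-position pairing that minimizes distortion against the fixed theta structure (16a). The encoding below is my own illustration: the roles of supply and the positional mapping come from (8) and (16), but the scoring is invented for exposition.

```python
# (16a) fixes supply's roles: goal above theme. Positionally, the outer
# nonhead of '<outer> <inner> supplier' maps to goal, the inner to theme.
THETA = {"goal": "army", "theme": "gun"}   # who supplies guns to the army

def mismatches(compound):
    """Distortion of an (outer, inner) compound against (16a)."""
    outer, inner = compound
    return (outer != THETA["goal"]) + (inner != THETA["theme"])

candidates = [("army", "gun"), ("gun", "army")]
best = min(candidates, key=mismatches)
print(best)                          # ('army', 'gun'): 'army gun supplier'
print(mismatches(("gun", "army")))   # 2: '*gun army supplier' always loses
```

On this toy picture the crossed order is not excluded by a rule; it simply never wins the competition to represent (16a), which is the sense in which the root rule appears to be constrained by the synthetic rule.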
We have seen that there are two ways to think about this, in terms of
rules and in terms of representation. The account in terms of rules is
insufficient in an important way and can be remedied only by reference to
something like representation. Therefore, we may as well devote ourselves
to solving the problem of representation and in the end be able to forget
about the rules. It is tempting to think of the synthetic rule as blocking
the root rule, but this does not give a straightforward account of why (8b)
is ungrammatical, since the synthetic rule would not derive it anyway. In
order to prevent (8b) by rule blocking, we must have recourse to what
(8b) is trying to do, and then block it because (8a) does the same thing
better. But of course what it is trying to do is to represent (8a), only it
does it less well than another form. I don’t see any way around this.
1.3 Case ⇝ Theta Representations
I have spoken of one system ‘‘representing’’ another system. I have
chosen the word represent purposely to bring to mind the mathematical
sense of representation, which involves isomorphism. So, the set of theta
structures at the level of Theta Structure (TS) is one system, and the
stems and affixes of a language are another system, and we can speak of
how, and how well, one represents the other. For example, we have the
theta structure ‘‘complement-predicate,’’ and this structure is represented
by the stem-suffix structure provided by morphology. Of course, in this
case there is a natural isomorphism that relates the two.
(18) TS ⇜ Morphology
TS: Morphology:
{complement predicate} ⇜ [stem suffix]
(e.g., lique-fy)
In the case of TS we (as investigators) are lucky: there are two different
systems that represent TS. One is morphology, or word structure, and the
other is Case theory, or, as it will be called here, the level of Case Struc-
ture (CS): the system of Case assigners and their assignees, a part of
phrasal syntax. These representations are different: they reflect the differ-
ence between affix and XP, differences in the positioning of the head, and
other differences. But they are the same in their representation of theta
structures, so we can learn something about the representation relation by
comparing them. (Later in this chapter, and in more detail in chapter 7,
I will derive the Mirror Principle from this arrangement, and in chapter
4, some other consequences.)
Throughout this book the wavy arrow (⇜ or ⇝) will stand for the
representation relation. (19a) diagrams the arrangement under which
both morphology and phrasal syntax (specifically, ‘‘Case frames’’ in
phrasal syntax) represent theta relations.
(19) a. Morphology ⇝ TS ⇜ CS
b. {supply theme}
i. ⇜ [gun supplier]N
ii. ⇜ [supply guns]VP
c. {{supply theme} goal}
i. ⇜ [army [gun supplier]]N
ii. ⇜ [[supply guns] to an army]VP
d. {{advise theme} goal}
i. ⇜ [graduate student [course advisor]]N
ii. ⇜ *[course [graduate student advisor]]N
iii. ⇜ [[advise graduate students] about courses]VP
iv. ⇜ *[[advise courses] to graduate students]VP
e. advise: NP aboutP
By stipulated convention, the arrow points from the representing structure
to the represented structure.
(19b) illustrates the simple theta structure consisting of the predicate
supply and its theme complement; this relation can be represented by
either a compound (N) or a Case frame (VP), as shown. A more complex
theta structure, as in (19c), begets correspondingly more complex repre-
sentations. For (19c) the Case and morphological representations are dif-
ferent, but both are isomorphic to the theta structure, so long as linear
order is ignored. In other cases, however, the two representations diverge.
For example, if advise takes a theme and a goal, in that order, then
the compound seems to be isomorphic to the resulting structure (19di),
but the syntactic representation does not seem to be (19diii). And the
compound that would be isomorphic to the syntactic representation is
ungrammatical (19dii). How can this come about? We have already seen
why the compound (19dii) is ungrammatical: there is a better representa-
tion of the target theta structure. As for the syntactic representation,
suppose that the verb advise is stipulated to have the Case frame in (19e),
but not the one that would allow (19div). Then the theta structure in
(19d) will map, or mismap, onto (19diii), because that is the best avail-
able. Hence the divergence between the compound and the Case struc-
ture. (19diii) is a misrepresentation of (19d) (and so is a ‘‘bracketing
paradox’’), which arises from a perhaps arbitrary stricture in the repre-
senting system, the stipulated subcategorization of advise (19e).
The exceptional-Case-marking (ECM) construction is another obvious
example of a Case-theta misrepresentation. TS provides a representation
of the sort given in (20). Now suppose that CS provides the representa-
tion indicated, but nothing isomorphic to the theta structure. Then the
Case structure will misrepresent the theta structure. This account misses
an important fact about ECM—that it is a rare construction—but cap-
tures the essential features of the construction itself.
(20) ECM as a bracketing paradox in syntax
TS: [believe [Mary to be alive]]
CS: [[believe Mary] to be alive]
Throughout these examples the economy principle at work is this: ‘‘Use
the ‘most isomorphic’ structure that satisfies the strictures of the repre-
senting level.’’ If only we could specify in a general way what sets of
structures are taken to be in competition, we would have a theory.
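The "most isomorphic" criterion can be made concrete with a toy calculation. The sketch below is my own formalization, not Williams's: it scores a candidate bracketing by how many constituents (taken as unordered leaf-sets, since linear order is ignored) it shares with the target theta structure, and selects the best-scoring competitor, reproducing the *gun army supplier* vs. *\*gun army supplier* contrast from section 1.2.

```python
def leaves(tree):
    """Flatten a bracketed structure into its leaf list."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for sub in tree for leaf in leaves(sub)]

def constituents(tree):
    """The set of constituents of a tree, each as an unordered leaf-set."""
    if not isinstance(tree, tuple):
        return {frozenset([tree])}
    out = {frozenset(leaves(tree))}
    for sub in tree:
        out |= constituents(sub)
    return out

def isomorphy(candidate, target):
    """Score: number of constituents shared with the target, order ignored."""
    return len(constituents(candidate) & constituents(target))

# Theta structure (16a), [goal [theme supplier]], with the roles filled in:
theta = ('army', ('gun', 'supplier'))

# Competing compounds: "army gun supplier" vs. "*gun army supplier"
good = ('army', ('gun', 'supplier'))
bad = ('gun', ('army', 'supplier'))

# Economy: use the most isomorphic representation available.
best = max([good, bad], key=lambda c: isomorphy(c, theta))
```

The good bracketing shares all five constituents with the theta structure, the bad one only four, so the competition selects the former. The open question flagged in the text, which structures count as competitors, corresponds here to the choice of the candidate list.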
1.4 Shape Conservation
I think that most of the economy proposals about grammatical structure
made during the 1990s can be understood as principles partly designed to
aid and abet the kind of shape conservation under discussion here. First
of course is the Mirror Principle (Baker 1985), which says that the inte-
rior structure of words will mirror the exterior syntactic structure in
which the words occur. The Mirror Principle is not really a principle, but
a robust generalization that is reflected in different theories in different
ways. It is implemented in Chomsky 1993, for example, by the algorithm
of feature checking, which is stated in such a way that as a verb moves up
the tree, one of its features can be checked in syntax only after features
more deeply buried in the word have already been checked; this achieves
the mirror effect because morphology adds the most deeply embedded
features first. This reduces the Mirror Principle to coordinating the two
uses of the feature set via a list, or more specifically, a ‘‘stack.’’ A stack
gives mirror behavior, but is of course only one way to get it.
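The stack remark can be illustrated directly. In this sketch (my paraphrase of the point, with a hypothetical affix inventory Asp-T-Agr), features are pushed as morphology attaches them inside-out; popping them then yields the top-down order of the clausal spine, which is the mirror image of the word read outward.

```python
# Hypothetical word: V-Asp-T-Agr, built inside-out by morphology.
stack = []
for affix in ['Asp', 'T', 'Agr']:
    stack.append(affix)               # push features in attachment order

spine_top_down = []
while stack:
    spine_top_down.append(stack.pop())  # LIFO: outermost affix pops first

# word outward:   V-Asp-T-Agr
# spine top-down: Agr > T > Asp   -- the mirror order
```

As the text notes, a stack is only one way to get this behavior; it suffices exactly when the structures involved are right-linear, i.e. list-like.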
My own view is that the Mirror Principle arises from the holistic
matching of two structures. Since a list is an ‘‘abstract’’ of a structure, it
can serve the same purpose in some circumstances, but only where the list
is an adequate abstract of the structure in question. I regard Chomsky’s
mechanism as an artifice that mimics structural isomorphism for simple
cases—essentially right-linear structures, which are equivalent to lists,
in the sense that there is an obvious way to construct a list from a right-
linear structure and vice versa.
As mentioned earlier, I take the Mirror Principle to be the result of
having two systems that represent one and the same theta system, in the
sense of isomorphic representation.
(21) Mirror Principle
morphology ⇝ theta roles, inflectional elements ⇜ Case system
So, just as there are derivational pairs that mirror each other (22a,b),
there are also inflectional pairs that do the same thing (22c,d).
(22) Derivation
a. [can [swim]VP]
b. [[swim]V able]
Inflection
c. [didT,VP [see]VP]
d. [[see]V -edT,V]
In the same vein, Chomsky’s (1993, 1995) definition of equidistance
can be seen as a principle that promotes shape conservation, though with-
out explicitly saying so. The question he posed, to which equidistance
was the answer, is, why does the object move to AgrO and the subject to
AgrS, and not vice versa? (Here I use Chomsky’s (1993) terminology; the
problem remains in more recent Agr-less theories.) Chomsky engineers a
solution to this problem in the definition of equidistance, and as a result,
the permitted combination of movements is the familiar pair of intersect-
ing movements.
(23)
Verb movement ‘‘extends the domain’’ of the lowest NP, as domain is
defined in terms of head chains. With the domain of the lower NP ex-
tended in this way, the two NPs in (23) are in the same domain and
hence, by definition, equally distant from anything outside that domain;
hence, they are equally eligible to move outside that domain; hence,
the subject can move over the object without violating economy con-
ditions, and the intersecting derivation results. A ‘‘shortest derivation’’
principle rules out the other, nesting derivation. The odd result is that al-
though the economy conditions are distance minimizing, distance itself is
never defined, only equidistance. I believe this is a clue that the result is
artificial.
Intersecting paths are not what previous work has taught us to expect
from movement. (24a) illustrates the famous intersecting pair of tough
movement and wh movement; as is evident, the intersecting case is much
worse than the nesting case (24b) (Fodor 1978).
(24) a. *Which sonatas is this violin easy to play tsonatas on tviolin?
b. Which violin are these sonatas easy to play tsonatas on tviolin?
So the intersecting movement of subject and object is mysterious.
Intersection might be an illusion arising from the analytic tools and not
from the phenomenon itself. Intersection only arises if two items are
moving to two different positions in the same structure. But suppose that
instead of moving both subject and object up a single tree, we are instead
trying to find their correspondents in a different tree altogether—the sort
of operation illustrated in (25). Then there is no intersection of move-
ment; what we have instead is a setup of correspondences between two
structures that preserves the interrelation of those elements (the subject
and object).
(25)
In standard minimalist practice, A would be embedded beneath B, and
movement would relate the agent to nominative, and the theme to accu-
sative. But in RT these relations are a part of the holistic mapping of TS
(containing A) to CS (containing B).
An examination of Holmberg’s (1985) generalization leads to similar
conclusions: it is better seen as a constraint on mapping one representa-
tion into another, than as a constraint on the coordinated movements,
within a single tree, of the items it pertains to (verb and direct object).
The generalization says that object shift must be accompanied by verb
movement: if the object is going to move to the left, then its verb must
do so too. The following are Icelandic examples in which the verb and/or
the direct object can be seen to reposition itself/themselves leftward over
negation:
(26) a. að Jón keypti ekki bókina              V neg NP
        that Jón bought not the-book
     ‘that Jon didn’t buy the book’
     b. að Jón keypti bókina ekki tV tNP       V NP neg
(27) a. Jón hefur ekki keypt bókina.           aux neg V NP
        Jón has not bought the-book
     ‘Jon hasn’t bought the book.’
     b. *Jón hefur bókina ekki keypt tNP.      aux NP neg V
(26) shows that the object can appear on either side of negation, so long
as both are to the right of the verb. (27) shows that when the verb is to the
right of negation, the object cannot cross negation, even though it could
in (26). Clearly, what is being conserved here is the relation of the verb to
the object. (This, by the way, is not Holmberg’s original proposal, but
one derived from it that many researchers take to be his proposal. In fact,
he proposed that the object cannot move at all unless the verb moves, a
weaker generalization; it remains an empirical question which version is
the one worth pursuing.)
There are various proposals for capturing Holmberg’s generalization,
including Chomsky’s (1995) idea that the D and V ‘‘strength’’ features of
the attracting functional projection must be coordinated—that is, both
strong or both weak. This won’t really work, because if the V and the di-
rect object are attracted to the same functional projection, they will cross
over each other, and this is exactly what is not allowed.
(28) ‘‘ . . . AgrO is {strong [D-], strong [V-]}.’’ (Chomsky 1995, 352)
strong DAgrO ⇔ strong VAgrO
(29)
In order to capture the strong and most interesting form of Holmberg’s
generalization (i.e., in order to guarantee that the object cannot cross the
verb), Chomsky’s account must be accompanied by a further stipulation
that the V obligatorily moves to Tense.
But I think that further facts demonstrate the insufficiency of this ap-
proach. When there are two objects, they cannot cross over each other,
though the first can move by itself.
(30) a. NP V ekki NP1 NP2
     b. NP V NP1 ekki t1 NP2
     c. NP V NP1 NP2 ekki t1 t2
     d. *NP V NP2 NP1 ekki t1 t2
Clearly, coordinating attraction features will not work here either. What
is obviously going on is that any set of movements is allowed that does
not perturb the interrelation of V, NP1, and NP2. Again, a holistic prin-
ciple of Shape Conservation would seem to go most directly to the heart
of the problem.
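The descriptive generalization just stated, that any combination of movements is licit so long as the interrelation of V, NP1, and NP2 is undisturbed, amounts to a simple order-preservation check. Here is a minimal sketch (my own formalization of the pattern in (30), not a claim about the actual licensing mechanism):

```python
def conserves_shape(surface, tracked=('V', 'NP1', 'NP2')):
    """An output order is licit iff the tracked elements appear in the
    same relative order as in the base (other items, e.g. ekki, are free)."""
    present = [w for w in surface if w in tracked]
    return present == [t for t in tracked if t in present]

# The judgments in (30): only the order-perturbing (30d) is excluded.
ok_a = conserves_shape(['NP', 'V', 'ekki', 'NP1', 'NP2'])      # (30a)
ok_b = conserves_shape(['NP', 'V', 'NP1', 'ekki', 'NP2'])      # (30b)
ok_c = conserves_shape(['NP', 'V', 'NP1', 'NP2', 'ekki'])      # (30c)
bad_d = conserves_shape(['NP', 'V', 'NP2', 'NP1', 'ekki'])     # (30d)
```

Note that the check is holistic over the whole string, which is exactly what a per-item attraction feature cannot state.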
A further mystery for the standard view arises from the fact that
Holmberg’s generalization does not hold for V-final languages like
German.
(31) Sie hat Peter gestern gesehen.
     she has Peter yesterday seen
     ‘She saw Peter yesterday.’
In (31) the object has moved leftward over the adverb without an ac-
companying movement of the verb. If the view I am suggesting here is
correct, Holmberg’s generalization does not hold because the leftward
movement of the object in Germanic (over an adverb) does not change
the relation of object to verb—the original order is conserved.
In particular theories shape conservation shows up in particular ways.
In hyper-Kaynian theories (Antisymmetry theories with massive remnant
movement) there is a signature derivation of shape-conserving mappings.
The key is systematic remnant movement—namely, remnant movement
resulting automatically from the fact that a phrase is a remnant. All trans-
formational theories of grammar have countenanced remnant move-
ment (see chapter 5 for discussion): NP movement can give rise to an AP
with a gap in it (32a), and then that AP can be displaced by wh movement
(32b).
(32) a. John is [how certain t to win]AP
     b. [how certain t to win]AP is John
But in such a case the two movements are triggered by different things
(Case for NP movement and wh requirements for wh movement); and in
fact the movements can occur alone, and so are not coordinated with one
another. But in hyper-Kaynian remnant movement the movement of the
remnant and the movement that creates the remnant are keyed to each
other in some way. There are several ways to implement this (one could
propose that both movements are triggered by the same attractor, or
some more complicated arrangement), but in any such arrangement the
movements will always be paired.
Now suppose we find evidence in RT for a shape-conserving ‘‘trans-
lation’’ of structures in one level (L1) to structures in another (L2), as
shown in (33) (where the lines are points of correspondence under the
shape-conserving mapping).
(33)
We can mimic this behavior in Antisymmetry as follows. First, the
derivation concerns a single structure, rather than the pair of structures in
(33); that structure is the result of embedding F″L1 as a complement of FL2. Three movements are needed to map the material in the embedded (FL1)
structure into positions in the higher (FL2) structure in shape-conserving
fashion. We therefore need four specifiers, of F0, F1, F2, and F3 (shown in
(34), with F0 at the very top not visible). F3 in (34) corresponds to F″L1
in (33), and SpecF3 corresponds to SpecFL1. Instead of mapping from
one level to another, as in RT, we move everything in F3 up the tree into
the region of F1 and F0. In order for these movements to achieve shape
conservation, a minimum of three moves are needed, two movements
of SpecF3 and one of F3 itself, in the following order: (a) movement of
SpecF3, making F3 a remnant; (b) movement of that remnant to a Spec
higher than the one SpecF3 was moved to; (c) a second movement of
SpecF3 (to SpecF0), to ‘‘reconstitute’’ the original order of SpecF3 and
the rest of F3.
(34) Achieving shape conservation in Antisymmetry
What is conserved is the order, and the c-command relations, among the
elements of F3. Of course, F3 itself is not conserved, having been broken
into parts, but since the parts maintain their order and c-command rela-
tions and are therefore almost indistinguishable from an intact F3, the
result does deserve some recognition as exemplifying shape conservation.
I believe there is no simpler set of movements in Antisymmetry that could
be called shape conserving.
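The three-step derivation (a)–(c) can be simulated on a flat list, abstracting away from the tree in (34). The sketch below is my own simplification: F0–F2 stand for empty specifier slots, and the three moves land Spec3 above the remnant, reconstituting their original order and relative prominence.

```python
# Start: three empty specifier slots above the embedded [Spec3 Rest3].
seq = ['F0', 'F1', 'F2', ['Spec3', 'Rest3']]

spec3 = seq[3].pop(0)   # (a) move Spec3 to SpecF2, creating a remnant
seq[2] = spec3
remnant = seq.pop(3)    # (b) move the remnant higher, to SpecF1
seq[1] = remnant
seq[0] = seq[2]         # (c) move Spec3 again, to SpecF0
seq[2] = 't'            #     leaving a trace in SpecF2

# Top-down result: Spec3 > [Rest3] > t -- the original Spec3-before-Rest3
# order is conserved, at the cost of three coordinated movements.
```

Fewer than three moves cannot both evacuate F3 and restore the order, which is the sense in which this is the minimal shape-conserving derivation in Antisymmetry.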
For this reason, I find it telling that derivations such as (34) abound in
the Antisymmetry literature. It suggests to me that there is something
fundamental about shape conservation. Since Antisymmetry was not
built to capture shape conservation directly, it can only do so in this
roundabout way—yet this roundabout derivation occurs on every page.
Of course, not all derivations in Antisymmetry instantiate (34), only
the shape-conserving ones. After all, things do get reordered in some
derivations, in all accounts. But it will still be suspicious if the ‘‘nothing
special is happening’’ derivation in Antisymmetry always instantiates
(34). It suggests to me that (33) is right.
Another principle with shape-conserving character was a principle of
Generative Semantics, where interpreted structure was deep structure,
and surface structure was the endpoint of derivation in a completely
linear model. The gist of it is this: if Q1 has scope over Q2 in interpreted
structure, then Q1 c-commands Q2 in surface structure (see, e.g., Lakoff
1972). Ignore for now that the principle is false to certain facts, such as
the ambiguity, in English, of sentences with two quantified NPs (e.g.,
Everyone likes someone)—it represents a real truth about quantifiers, and
I will in the end incorporate it into RT directly as a subinstance of the
Shape Conservation principle.
This principle has been reformulated a few times—for example, by
Huang (1982, 220),
(35) General Condition on Scope
Suppose A and B are both QPs or Q-expressions, then if A
c-commands B at SS, A also c-commands B at LF.
and by Hoji (1985, 248).
(36) *QPi QPj tj ti
     where each member c-commands the member to its right.
Probably related is the observation widely made about a number of lan-
guages that if two quantifiers are in their base order, then their interpre-
tation is fixed by that order; but if they have been permuted, then the
possibility of ambiguity arises.
All of these versions of the principle achieve the same thing: a cor-
respondence between (something close to) an interpreted structure and
(something close to) a heard structure. In fact, the correspondence is a
sameness of structure, and so encourages us to pursue the idea of a gen-
eral principle of Shape Conservation. Lakoff’s and Huang’s versions are
transparently shape-conserving principles. Hoji’s is not, until one realizes
that it is a representational equivalent of Huang’s and Lakoff’s. Fox’s
(1995) results concerning economy of scope can be seen in the same
light.
Finally, Shape Conservation bears an obvious relation to ‘‘faithfulness
to input’’ in Optimality Theory and to the f-structure/c-structure mapping
in Lexical-Functional Grammar. I will comment further on the relation
between RT and these other theories in chapter 3.
I have by now recited a lengthy catalogue of shape-conserving princi-
ples in syntax: the Mirror Principle, equidistance, Holmberg’s generaliza-
tion, various scope principles, faithfulness. I omitted Emonds’s Structure
Preservation despite its similarity in name, because it governs individual
rule applications and so lacks the holistic character of the other principles.
But I would add one more to the list: to my knowledge, the first shape-
conserving principle in the tradition of generative grammar was proposed
in Williams 1971b, namely, that tonal elements (e.g., High and Low) are
not features of vowels or syllables, but constitute a representation sepa-
rate from segmental structure, with its own properties, and that that sep-
arate representation is made to correspond algorithmically to segmental
structure, also with its own, but di¤erent, structure. Tonal Structure, at
least as I discussed it then, was rather primitive, consisting of a sequence
of tones (L, H) grouped into morphemes; and this structure was mapped
to another linear representation, the sequence of vowels (or syllables) of
the segmental structure, in a one-to-one left-to-right manner, in a way
that accounted for such phenomena as tonal spreading. Clearly, there is
a shape-conserving principle in this, even if I did not explicitly identify it
as such; to use the terminology of this book, after the mapping Syllable
Structure represents Tonal Structure, in that elements of Tonal Structure
are put into one-to-one correspondence with elements of Syllable Struc-
ture, and the properties of Tonal Structure (only ‘‘x follows y,’’ since it is
a list) are preserved under the representation.
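The one-to-one, left-to-right association just described can be sketched as follows. This is a simplification of the Williams 1971b system; treating leftover syllables as targets of spreading by the final tone is my assumption for the illustration.

```python
def associate(tones, syllables):
    """Map tones to syllables one-to-one, left to right; the final tone
    spreads onto any remaining syllables (extra tones are ignored in
    this simplification)."""
    return [(syl, tones[i] if i < len(tones) else tones[-1])
            for i, syl in enumerate(syllables)]

pairing = associate(['L', 'H'], ['ba', 'na', 'na'])
# pairing == [('ba', 'L'), ('na', 'H'), ('na', 'H')]: H spreads rightward
```

The "x follows y" relation of the tonal list is preserved in the output: tones never reorder under association, which is the shape-conserving character noted in the text.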
1.5 The Representation Model
If there are systematic circumstances in which grammar seems to want to
preserve relations between elements, we might consider building a model
from scratch that captures these directly and without contrivance.
Suppose we analyze the grammatical system into several distinct com-
ponents, each of which defines a set of structures (a sublanguage), and
which are related to each other by shape-conserving mappings. The syn-
tax of a clause will then be a mapping across a series of representations,
from Theta Structure to Case Structure to Surface Structure, and so on.
(37)
AS is a partial phonological representation with sentence accent structure
assigned. Its role will be developed in chapter 9.
To compare with the more standard model, we can see this series of
structures (38) as a decomposition of the standard clause structure (39),
with the following correspondences: what is done by structural embed-
ding in the standard theory is done by the representation relation in RT;
and what is done by movement up the tree in the standard theory is done
by isomorphic mapping across this series of representations in RT.
(38)
(39)
An immediate consequence of this decomposition is that in RT there
can be no such thing as an item being left far behind—everything that is
going to be in the clause must make it to the last representation of (38),
which would be equivalent to every NP moving to the very top shell
of (39). The single deep tree of the standard Pollock-style minimalist
theory, on the other hand, allows such ‘‘deep stragglers.’’ Although cer-
tain widely accepted accounts of some constructions (e.g., transitive ex-
pletive constructions) entail the surface positioning of NPs in original
theta positions, it seems that the trend has instead been more and more
toward analyses in which NPs never appear in deep positions. To the ex-
tent that this trend is responding to some feature of reality, I would say
that it confirms RT, in which any other arrangement is not just impossi-
ble, but literally incoherent.
Another way to contrast the two theories is in how semantics is done.
Semantics in the ramified Pollock- and Cinque-style model can be com-
positional, in the usual sense; but semantics in RT is ‘‘cumulative,’’ in a
sense spelled out below and in chapter 9. ‘‘Embedding’’ here is not struc-
tural embedding, but ‘‘homomorphic’’ embedding: TS is ‘‘embedded’’ in
FS by a series of shape-conserving mappings.
Not everything that is a movement in the standard theory will become
an interlevel (mis-)mapping in RT. I have already remarked that wh
movement is a movement within a level, presumably SS. An interesting
pair in this regard is short-distance and long-distance scrambling. Short
scrambling might best be modeled as a mismapping between CS and SS,
whereas long scrambling might best be treated like wh movement, or
perhaps a ‘‘higher’’ mismapping (SS ⇜ FS, for example). The different
behavior of short and long scrambling with respect to binding theory and
reconstruction should follow from this distinction. (See section 3.1 for
details, and chapters 4 and 5 for generalized applications of the chapter 3
methodology.)
Many questions about this model and its differences from standard
models are still unaddressed. Though most of them will remain so, I will
take up two fundamental questions in chapters 3 and 4.
First, there is the issue of embedding: how is clausal embedding ac-
complished in RT? Embedding could have worked something like this:
elements defined in ‘‘later’’ systems (QS, FS, etc.) are ‘‘rechristened’’ as
theta objects, which can then enter into theta relations in TS. This ac-
count would preserve the obvious relation to standard minimalist practice
and its antecedents back to Syntactic Structures (Chomsky 1957). But in
chapter 3 I will try out a different view, with surprisingly different con-
sequences: embedding of different subordinate clause types happens at
different levels in RT, where the different clause types vary along the di-
mension of ‘‘degree of clause union.’’ The principle for embedding is,
‘‘Embed at the level at which the embedded object is first defined’’ (the
Level Embedding Conjecture of chapter 3). For small embeddings, like
that found in serial verb constructions, the level is TS; but for tensed-
clause embedding, the level is SS.
Second, how is semantic interpretation done in this model? Each of the
levels is associated with a different sort of value, and in chapters 4 and 9
I will try to specify what these values are. Perhaps the most important
difference between RT and the standard model, then, is that there is not
one single tree that represents the meaning; TS represents theta structures,
QS scope relations, FS information structure of the kind relevant to focus,
and so on. The structure of a sentence consists of a set of structures, one
from each of these components, with the shape-conserving mapping
holding among them. Clearly, the meaning is determinable from these
representations; for example, it would be trivial to write an algorithm
that would convert such representations into classical LF structures. But
it is not the case that linguistic meaning can be identified with one of
these levels. To borrow a philosopher’s term, one might say that linguis-
tic meaning is supervenient on these representations (if it is not iden-
tical with them), in that any difference in the meaning of two sentences
will correspond systematically with some difference in their representation
structure. Systematicity will guarantee some notion of semantic composi-
tionality. Compositionality will hold within a level, but it will also hold
across levels. I am not sure that linguistic semantics needs anything more
than this.
Having promised to address these two substantive issues in future
chapters, I would now like to put aside a concern that I think is over-
rated. The following sentiment was often expressed to me while I was
developing the ideas outlined here: ‘‘You’ve replaced movement gov-
erned by distance minimization with holistic mapping between levels
governed by shape conservation. But the properties of movement are
rather well understood, whereas you can give only the barest idea of what
constitutes ‘structure matching’—so the theories aren’t really empirically
comparable.’’
My main objection to this is not what it says about my account of
shape conservation. I accept the charge. But I must question the claim
that there is a notion of movement that is widely accepted, much less un-
derstood. If we review the properties of movement, we find that none of
them are constant across even a highly selective ‘‘centralist’’ list of works
that seek to use movement in significant acts of explanation. What would
the properties be?
1. Is movement always to a c-commanding position?
2. Is movement always to the left?
3. Is movement always island governed?
4. Does movement always leave a gap?
5. Does movement always result in overt material in the landing site?
6. Does movement always move to the top?
7. Is movement always of an XP?
For each of these questions it is easy to find two serious e¤orts at ex-
planation giving opposite answers. For example, in work reviewed in
chapter 6 of this book, Richards (1997) proposes that some movement
does not obey islands (question 3). In addition, Richards proposes that
movement is not always to the edge of its domain, but sometimes ‘‘tucks
in’’ beneath the top element, to use his informal terminology (question 6).
Koopman and Szabolcsi (2000) insist that there is no head movement
(question 7). And so on.
Movement, then, is a term associated with different properties in differ-
ent acts of explanation, and the intersection of those properties is essen-
tially null. This does not mean that no one who uses the term knows what
he or she means by it, only that there is no common understanding. I
don’t think that is a bad thing. The different uses are after all related; for
example, although it is perfectly acceptable to build a theory in which
movement sometimes leaves a gap, and sometimes leaves a pronoun, it
would be unacceptable to use the term movement in such a way that it
covered none of the cases of gap formation. So it is not that the term is
completely meaningless. But still there is no shared set of properties that
has any significant empirical entailments on its own. Someone who is
pursuing Antisymmetry, for example, will have a very different under-
standing of the term than someone who is not.
It is the familiarity of the term itself that gives rise to the illusion that
there is a substantive shared understanding of what it refers to. If every
linguist had to replace every use of the term movement with the more
elaborate syntactic relation with properties P1, P2, P3, P7, P23, I think
fewer linguists would claim that ‘‘movement is rather well understood,’’
and then some audience could be mustered for notions of syntactic rela-
tion for which the term movement is not particularly appropriate.
Chapter 2
Topic and Focus in Representation Theory
In chapter 1 I made some rather vague suggestions about how Case sys-
tems might be seen as ‘‘representing’’ TS, and in doing so gave some idea
about how the ‘‘left end’’ of the RT model uses the principle of Shape
Conservation. In this chapter I will turn to the other end and show how
the same notion can be used to develop an understanding of how topic
and focus interact with surface syntax.
This chapter is essentially about the interpretive e¤ects of local scram-
bling. Although English will figure in the discussion, my chief aim
will be to explicate, in terms of Shape Conservation, some mainly well
known findings about Italian, German, Spanish, and Hungarian having
to do with word order, topic, and focus. The interpretive e¤ects of long-
distance scrambling, and its place in RT, will be taken up in chapters 3
and 5, where the A/Ā distinction is generalized in a way that makes sense
of the difference between long- and short-distance scrambling.
Long and short scrambling pose a special problem for Checking
Theory. Checking Theory provides a methodology for analyzing any
correlation between a difference in syntactic form and a difference in
meaning: a functional element is postulated, one whose semantics deter-
mines the di¤erence in meaning by a compositional semantics, and whose
syntax determines a difference in form by acting as an attractor for
movement of some class of phrases to its position. That is, interpretable
features trigger movement. But, as I will show, in the case of focus the
moved constituent does not in general correspond to the Focus. It of
course can be the Focus itself; but in addition, it can be some phrase that
includes the Focus, or it can be some phrase that is included in the Focus.
While the first might be (mis)analyzed as a kind of pied-piping, the sec-
ond makes no sense at all from the point of view of triggered movement.
The problem with Checking Theory that will emerge from the following
observations is that it atomizes syntactic relations into trigger/moved-
element pairs, whereas in fact the syntactic computation targets structures
holistically.
2.1 Preliminaries
I will use Topic and Focus in their currently understood sense: the Topic
consists of presupposed information, and the Focus of new information.
Elsewhere (Williams 1997) I have developed the idea that Focus is es-
sentially an anaphoric notion and that Topic is a subordinated Focus. I
will take this idea up again in chapter 9, but will ignore it until then.
In chapter 1 I introduced two sets of structures, QS (= TopS) and FS.
The properties of these structures and their relation to other struc-
tures under Shape Conservation will carry the burden of accounting for
the features of topic and focus to be examined here. The differences
among the languages to be discussed will be determined by either (a) dif-
ferences in the rules for forming each structure or (b) differing repre-
sentational demands (e.g., SS ≅ QS representation ‘‘trumping’’ SS ≅ CS
representation in some languages, with SS → FS figuring in, in a way to be
described).
QS represents not only the topic structure of the clause, but also the
scopes of quantifiers. The reason for collapsing these two is empirical,
and possibly false: wide scope quantifiers seem to behave like Topics, and
unlike Focuses. First, languages in which topic structure is heavily re-
flected in surface syntax tend to be languages in which quantifier scope is
also heavily reflected. German is such a language, but English is not.
Second, focusing allows for reconstruction in the determination of scope,
but topicalization does not. The latter difference has a principled account
in RT, a topic explored in chapters 3 and 5.
2.2 The Structure of QS and FS
QS and FS bear representational relations to SS: SS represents QS, and
FS represents SS. In this section I will give a rough sketch of these struc-
tures, leaving many details to be fixed as analysis demands, as usual.
One question to be resolved in establishing the basic notions in this
domain is, what is the relation among the semantic notions to be repre-
sented (Topic status, wide scope) and the structural predicates precedes
and c-commands? Most clearly for adjuncts, relative scope seems to de-
pend on the stacking relation, not the linear order, if we can rely on our
judgments of the following sentences:
(1) a. John was there a few times every day. (every > few)
b. [[[was there] a few times] every day]
c. [[[John was there] every time] a few days] (few > every)
Adjuncts are not subject to the long scope assignment that is characteris-
tic of argument NPs in a language like English, and so the stacking
order determines the interpretation: every > few for (1a), and few > every
for (1c). By contrast, in (2) the understood order of the quantifiers is
ambiguous.
(2) John saw a friend of his every day.
The simplest assumption is that again the stacking order determines the
order of interpretation, but that the direct object in (2) is subject to wide
scope assignment. So in QS scope is determined by stacking, but some
items (NPs in argument positions) are subject to long scope assignment.
Unlike quantification, topicalization seems to always be associated
with leftward positioning of elements, not just in English, but generally
across language types.
We will assume that QS incorporates both of these facts, generating a
set of structures that represent both topicalization and scope, around a
head X. These structures have roughly the following form:
(3) [tree diagram: a Topic segment on the left edge, followed by a non-Topic segment, projected around a head X]
The structures have a Topic segment and a non-Topic segment with
obvious, if not well understood, interpretation; in addition, hierarchical
relations determine relative scope.
Surface structures are mapped into QS under the regime of Shape
Conservation. Since the Topic segment of quantification structures is on
the left edge, items on the left edge in SS will be mapped into them iso-
morphically. In English this will include subjects, and Topics derived by
movement.
(4) a. [XP* [XP* [ . . . ]]]
Topic segment non-Topic segment
b. John left early
c. John I saw yesterday
This permits the Topic-like qualities of the subject position to assert
themselves without any explicit movement to the subject position; the
subject is mapped to one of the Topic positions in QS just as a moved
Topic would be.
The interpretation of focus is not at all straightforward. It is traditional
to distinguish two kinds of focus, normal and contrastive. In Williams
1981a, 1997, I argued that they should not be distinguished. Here, and
especially in chapter 9, I will in fact defend the distinction, but I will
rationalize it as involving di¤erent RT levels. In this chapter I will use
the distinction for expository, nontheoretical purposes. I will take normal
focus to be reliably identified by what can be the answer to a question;
thus, the Focus in (5B) is exactly that part of the answer that corresponds
to the wh phrase in (5A).
(5) A: What did George buy yesterday?
B: George bought [a hammock]F yesterday.
Contrastive focus, on the other hand, arises in ‘‘parallel’’ structures of the
sort illustrated in (6).
(6) John likes Mary and SHE likes HIM.
It will be worthwhile to make this distinction because (a) some languages
have different distributions for normal and contrastive focus, and (b) the
terminology will be convenient for describing some of the interpretive
effects of scrambling discussed here.
The Focus itself, in a language like English, is always a phrase bearing
accent on its final position. In FS there seems to be a preference for the
Focus to come at the end of the sentence; this is reflected in normal focus
in Spanish, and in interpretive e¤ects for English scrambling (heavy NP
shift). I conclude therefore that FS is characterized by final positioning of
Focus.
But apparently these directional properties of the English focus sys-
tem are not fixed universally. Hungarian seems to exhibit the opposite
scheme. It has a Focus position that appears at the left edge of the VP,
just before the verb; all of the nontopicalized verbal constituents, includ-
ing the subject, appear to the right.
(7) János Évát várta a mozi előtt.
    János.nom ÉVA.acc waited the cinema in-front-of
    ‘János waited for ÉVA in front of the cinema.’
    (É. Kiss 1995, 212)
In (7) Évát is focused, as it is the preverbal constituent; János is topical-
ized. Hungarian FS thus has the following form:
(8) Hungarian FS
Topic Topic . . . Focus [V XP YP . . . ]
(Furthermore, Hungarian Focuses are left accented, instead of right ac-
cented, perhaps an independent property.)
In fact, the normal Focus is not always at the right periphery even in
languages like English. In addition to rightward-positioned Focuses, par-
ticular XPs in particular constructions have the force of a Focus by virtue
of the constructions themselves; examples in English are the cleft and
pseudocleft constructions.
(9) a. Cleft
        it was XPF that S          It was John that Mary saw.
     b. Pseudocleft
        [what S] is XPF
        XPF is [what S]            John is what Mary saw.
The XPs in such structures can be used to answer questions and so can be
normal Focuses, or they can be contrastive Focuses (10a); furthermore,
they are incompatible with being Topics (10b). There is thus strong rea-
son to associate the pivots of these constructions with Focus.
(10) a. What did John experience?
What John experienced was humiliation.
It was humiliation that John experienced.
b. What did John experience?
*It was John who experienced humiliation.
*John is who experienced humiliation.
I will simply include these structures in FS without speculating about
why they do not have the Focus on the right or whether there is a single
coherent ‘‘definition’’ of the structures in FS. I will postpone the latter
issue until chapter 9, where I take up the general question of how levels
determine interpretation.
2.3 Heavy NP Shift
With these preliminaries, I now proceed to an analysis of heavy NP shift
(HNPS). I will argue that HNPS is not the result of movement, either to
the left or to the right, but arises from mismapping CS onto SS. In par-
ticular, I will argue that Checking Theory does not analyze HNPS
appropriately.
That focus is implicated in HNPS is evident from the following
paradigm:
(11) a. John gave to Mary all of the money in the SATCHEL.
b. *John gave to MARY all of the money in the satchel.
c. John gave all of the money in the satchel to MARY.
d. John gave all of the money in the SATCHEL to Mary.
One could summarize (11) in this way: HNPS can take place to put the
Focus at the end of the clause, but not to remove a Focus from the end
of the clause—thus, (11b) is essentially ungrammatical. It is as though
HNPS must take place only to aid and abet canonical FS representation,
in which focused elements are final. (11d) shows that whatever HNPS is,
it is optional. In sum, the neutral order (V NP PP) is valid regardless of
whether the Focus is final or not, but the nonneutral order (V PP NP) is
valid only if NP is the Focus.
In fact, though, the situation is slightly more complicated, and much
more interesting. In what follows I will refer to the direct object in the
shifted sentences as the shifted NP, because in the classical analysis it is
the moved element. The form in (11a) is valid not just when the Focus is
the shifted NP, but in fact as long as the Focus is clause final in the
shifted structure, whether or not the shifted NP is the Focus itself. It is
valid both for Focuses smaller than the shifted NP and for Focuses larger
than the shifted NP, as the following observations will establish.
First, the licensing Focus can be a subpart of the shifted NP.
(12) A: John gave all the money in some container to Mary. What
container?
B: (11a) John gave to Mary all of the money in the SATCHEL.
In this case the Focus is satchel, smaller than the shifted NP. Second, the
licensing Focus can be larger than, and include, the shifted NP; specifi-
cally, it can be the VP.
(13) A: What did John do?
B: (11a) John gave to Mary all of the money in the SATCHEL.
In sum, HNPS is licensed if it puts the Focus at the end of the sentence
(12), or if it allows Focus projection from the end of the sentence (13). It
thus feeds Focus projection; recall that Focus projection is nothing more
than the definition of the internal accent pattern of the focused phrase
itself, which in English must have a final accent.
This constellation of properties is not well modeled by Checking
Theory, including Checking Theories implementing remnant analyses. To
apply these theories to the interaction of HNPS and focus would be first
to identify a functional projection with a focus feature, then to endow the
Focus of the clause with that same focus feature, and then to move the
one to the other. Without remnant movement the result would be classi-
cal NP shift, a movement to the right. Remnant movement allows the
possibility of simulating rightward movement with a pair of leftward
movements. Suppose, for example, that NP in (14) is the Focus.
(14) [V NPF PP] → . . . NPF [V t PP] → [V t PP] NPF t
First the focused NP moves; then the remnant VP moves around it.
The problem with both the remnant movement and the classical
Checking Theory analyses is that the shifted NP is the Focus only in the
special case, not in general. So it is hard to see why, for example, a
structure like the one in (14) would be appropriate for VP focus—the
movement of the NP would be groundless, as it is not the Focus.
The correct generalization is the one stated: HNPS is licensed if it
results in a canonical SS → FS representation. This means that it results
in the rightward shifting either of the focused constituent or of some
phrase containing the focused constituent. So, for example, (11a) with VP
focus has the following structure:
(15) CS: [V NP PP] ↛ SS: [V PP NP] → FS: [V PP NP]F
In other words, the CS, SS mismatch (marked by ‘‘↛’’) is tolerated be-
cause of the SS, FS match. In (11b), on the other hand, both CS → SS and
SS → FS are mismatched.
(16) CS: [V NP PP] ↛ SS: [V PP NP] ↛ FS: [V NP PPF]
This double misrepresentation is not tolerated in the face of alternatives
with no misrepresentation. (In chapter 9 I will elaborate the theory of
focus, as well as these representations, with a further relevant level (Ac-
cent Structure), but these changes will not affect the structure of the
explanations given here.)
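The licensing logic just described can be given a quasi-algorithmic rendering. The following toy sketch is my own illustration, not part of the book's formalism; the encoding of CS, SS, and FS as lists of category labels is an assumption made purely for exposition. It shows how a CS/SS mismatch is tolerated exactly when it buys an SS/FS match, as in (15) versus (16):

```python
def mismatch(level_a, level_b):
    # One unit of misrepresentation if the two orders are not congruent.
    return 0 if level_a == level_b else 1

def tolerated(cs, ss, fs):
    # An SS that mismatches BOTH adjacent levels loses to an alternative
    # with fewer misrepresentations, so it is not tolerated.
    return mismatch(cs, ss) + mismatch(ss, fs) < 2

cs = ["V", "NP", "PP"]                 # canonical Case-structure order
shifted_ss = ["V", "PP", "NP"]         # heavy-NP-shifted surface order

# Cf. (15): Focus-final FS matches the shifted SS, licensing the mismatch.
fs_focus_final = ["V", "PP", "NP"]
print(tolerated(cs, shifted_ss, fs_focus_final))   # True

# Cf. (16): the Focus (PP) should be final in FS, so SS matches neither level.
fs_pp_final = ["V", "NP", "PP"]
print(tolerated(cs, shifted_ss, fs_pp_final))      # False
```

The point of the sketch is only that the licensing condition is global over adjacent levels, rather than a property of the shifted NP itself.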
What this little system displays is an excessive lack of ‘‘greed,’’ to use
Chomsky’s (1993) term: HNPS is licensed by a ‘‘global’’ property of the
VP, not by the shifted NP’s needs. This is why it is difficult to model it
with Checking Theory, because Checking Theory atomizes the move-
ments and requires each to have separate motivation—interesting if cor-
rect, but apparently not. The remnant movement analysis is particularly
bad: not only is the wrong thing moved (sometimes a subphrase, some-
times a superphrase of the target), but the ensuing remnant movement
has no motivation either.
Hungarian focusing shows the same lack of correspondence between
displaced constituents and Focuses that English focusing does. Recall
that Hungarian has Focus-initial FS structures; furthermore, the Focus
itself is accented on the first word.
(17) a. János [a TEGNAPI cikkeket] olvasta . . .
        János  the YESTERDAY’s articles  read
        ‘János read YESTERDAY’s articles . . .’
     b. . . . nem a maiakat.
              not the today’s
        ‘. . . not today’s.’
     c. . . . nem a könyveket.
              not the books
        ‘. . . not the books.’
     d. . . . nem a fürdőszobában énekelt.
              not the bathroom-in sang
        ‘. . . not sang in the bathroom.’
     (Kenesei 1998, as reported in Szendrői 2001)
The fronted constituent is bracketed in (17a). As (17c) shows, that con-
stituent can be the Focus; but (17b) shows that the Focus can be smaller,
and (17d) shows that it can be larger, including the verb.
I have suppressed one further detail in connection with HNPS that is
now worth bringing to light. (11b) is not, strictly speaking, ungrammat-
ical. Rather, it has a very specialized use: it can be used ‘‘metalinguisti-
cally,’’ as in (18).
(18) A: John gave to Joe all the money in the SATCHEL.
B: No, John gave to MARY all the money in the satchel.
That is, it can be used to correct someone. Rather than brushing these
examples aside, I will show that their properties follow from the way in
which phonological and syntactically defined focus are related to each
other. But I will not do this until chapter 9, where I take up the notion of
the ‘‘values’’ that are defined at each level, and how the values of one
level are related to the values of other levels.
So, HNPS is analyzed here, not as a movement, but as a mismapping
between CS and SS that is licensed by a proper mapping between SS and
FS. As such, it should not show the telltale marks of real movement; that
is, it should not leave phonologically detectable traces, it should intersect
rather than nest with itself, and so on. Some of these behaviors are hard
to demonstrate. However, there is one property of HNPS that has been
put forward to show that it is a real movement: it can license parasitic
gaps, and so is in fact a kind of Ā movement. (19) is the kind of sentence
that is meant to support this idea.
(19) John put t in the satchel, and Sam t in the suitcase, all the money
they found.
The argument is based on the correct hypothesis that only ‘‘real’’ traces of
movement can license parasitic gaps, but it wrongly assumes that HNPS
is necessarily involved in the derivation of such examples.
In fact, such examples can arise independently of HNPS, through the
action of right node raising (RNR), a process not fully understood, but
clearly needed in addition to HNPS. RNR, in the classical analysis, is an
across-the-board application of a rightward movement rule in a coordi-
nate structure, as illustrated in (20).
(20) John wrote t, and Bill read t, that book.
This analysis of RNR has been contested (see Wilder 1997; Kayne 1994),
but not in a way that changes its role in the following discussion. Given
such a rule, we would expect sentences like (19) even if there were no
HNPS, so it cannot be cited to show that HNPS is a trace-leaving move-
ment rule.
We can understand (19) as arising from the across-the-board extraction
of the NP [all the money they found ] from the two Ss that precede it,
thereby not involving HNPS essentially (though of course the input
structures to RNR could be shifted; it is hard to tell).
(21) [John put ti in the satchel] and [Sam put ti in the suitcase] NPi
Evidence that RNR is the correct rule for this construction comes from
the fact that HNPS does not strand prepositions, combined with the ob-
servation that such stranded prepositions are indeed found in sentences
analogous to (21).
(22) a. John talked to ti about money, and Bill harangued ti about
politics, [all of the . . . ]i
b. *John talked to ti about money [all of the . . . ]i
Although awkward, (22a) is dramatically better than (22b), and so HNPS
is an unlikely source for sentences like (21). See Williams 1994b for fur-
ther argument.
Although the failure of HNPS to leave stranded prepositions is used
as a diagnostic in the argument just given, it is actually a theoretically
interesting detail in itself. If HNPS is a movement rule, and, I suppose,
especially if it is a leftward parasitic-gap-licensing movement, as it is in
the remnant movement analyses of it, then why does it not strand prepo-
sitions, as other such rules do? In the RT account, HNPS arises in the
mismatch between SS and CS: the same items occur, but in different
arrangement, so stranding cannot arise, as stranding creates two con-
stituents ([P t] and NP) where there was one, in turn creating intolerable
mismatch between levels.
2.4 Variation
Some levels are in representation relations with more than one other level,
giving rise to the possibility that conflicting representational demands will
be made on one and the same level. An item in SS, for example, must be
congruent to a Case structure and to a quantification structure, and these
might make incompatible demands on the form of SS. Since mismatches
are allowed in the first place, the only question is whether there is a sys-
tematic way to resolve these conflicts. I will suggest that languages differ
with respect to which representation relations are favored.
This arrangement is somewhat like Optimality Theory (OT), if we
identify the notion ‘‘shape-conserving representation relation’’ with
‘‘faithfulness.’’ But RT and OT differ in certain ways. In RT only com-
peting representation relations can be ranked, and they can be ranked
only among themselves and only where they compete on a single level.
Intralevel constraints are simply parts of the grammar of each indepen-
dent sublanguage, and so cannot be ranked with the representation rela-
tions those sublanguages enter into. In this regard RT is more restrictive
than OT. On the other hand, I will be assuming that the properties of the
sublanguages themselves are open to language-particular variation; and
in this respect RT is less restrictive than OT, as OT seeks to account for
all language particularity through reordering of a homogeneous set of
constraints.
RT also resembles theories about how grammatical relations (subject,
object, etc.) are realized in syntactic material. For example, Lexical-
Functional Grammar (LFG; Kaplan and Bresnan 1982) posits two levels
of representation, f-structure and c-structure. F-structure corresponds
most closely to the level called TS here, and c-structure corresponds most
closely to everything else. An algorithm matches up c-structures and
f-structures by generating f-descriptions, which are constraints on what
c-structures can represent a given f-structure. Since the overall effect is
to achieve a kind of isomorphism between c-structures and f-structures,
the grammatical system in LFG bears an architectural similarity to the
RT model, especially at the ‘‘low’’ (TS) end of the model, even though
there is no level in RT explicitly devoted to grammatical relations them-
selves, that work being divided among other levels. Similar remarks apply
to the analysis of grammatical relations presented in Marantz 1984.
LFG differs from RT in several ways. First, the matching between
c-structure and f-structure is not an economy principle, so the notion
‘‘closest match’’ plays no role. The LFG f-description algorithm tends
to enforce isomorphism, but its exact relation to isomorphism is an ac-
cidental consequence of the particulars of how it is formulated. By com-
parison, in RT exact isomorphism is the ‘‘goal’’ of the relations that hold
between successive levels, and deviations from exact isomorphism occur
only when, and to the exact degree to which, that goal cannot be achieved.
Second, LFG posits only two levels, whereas RT extends the matching
to a substantially larger number of representations, in order to maximize
the work of the economy principle.
Third, and most important, the place of embedding in the two systems
is different. I will propose in chapter 3 that embedding takes place at
every level, in the sense that complements and adjuncts are embedded in
later levels that have no correspondents in previous levels. In LFG, if
embedding is done anywhere, it is done everywhere; that is, if a clause is
present in c-structure, it has an f-structure image. Thus, the predictions of
RT made and tested in chapters 3–6 are not available in LFG.
2.5 English versus German Scrambling
Let us now turn to a systematic analysis of the di¤erence between English
and German in terms of mismapping between levels. Keeping in mind the
rough characterization of QS and FS given above, we may now charac-
terize that difference as follows:
(23) a. German: SS ≅ QS > SS ≅ CS
b. English: SS ≅ CS > SS ≅ QS
c. Universal: SS → FS
That is, in German SS representation of QS is more important than SS
representation of CS (signified by ‘‘>’’); in English the reverse is true.
And in all languages of course FS represents SS.
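The rankings in (23) can also be put in rough computational terms. The sketch below is my illustration only; the encoding of levels as word-order lists and the lexicographic cost function are assumptions for exposition, not the book's machinery. Each language ranks the congruence demands on SS, and the surface order that best satisfies the higher-ranked congruence wins:

```python
def best_ss(candidates, levels, ranking):
    # levels maps a level name ("QS", "CS") to its canonical order;
    # ranking lists level names, most important first. The cost is
    # lexicographic: a mismatch with a higher-ranked level outweighs
    # any mismatch with a lower-ranked one.
    def cost(ss):
        return tuple(0 if ss == levels[name] else 1 for name in ranking)
    return min(candidates, key=cost)

cs_order = ["IO", "DO", "V"]   # canonical Case order, cf. (24a)
qs_order = ["DO", "IO", "V"]   # DO as wide-scope Topic
levels = {"CS": cs_order, "QS": qs_order}
candidates = [cs_order, qs_order]

# German ranks SS ≅ QS over SS ≅ CS: the scrambled order surfaces.
print(best_ss(candidates, levels, ["QS", "CS"]))   # ['DO', 'IO', 'V']

# English ranks SS ≅ CS over SS ≅ QS: the Case order surfaces.
print(best_ss(candidates, levels, ["CS", "QS"]))   # ['IO', 'DO', 'V']
```

The lexicographic cost is one way of cashing out ‘‘>’’ in (23): no number of lower-ranked matches can compensate for a higher-ranked mismatch.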
Let us now examine what expectations about German will flow from
the specifications in (23). Perhaps arbitrarily, I identify the following four:
1. Two definite NPs in German should not be reorderable, apart from
focus.
2. Definite pronouns move leftward.
3. A definite NP obligatorily moves leftward over (only indefinite)
adverbs.
4. Surface order disambiguates quantification, except where Q is focused.
Expectation 1: First, two definite NPs in German should not be re-
orderable, unless a special focusing is to be achieved. This is true because
SS must represent CS, unless that requirement is countervailed by some
other representational need.
(24) Two definites are not reorderable with normal focus
a. IO DO V (CS order)
b. *DO IO V
This conclusion is true, and in fact is a commonplace of the literature on
the German middlefield; see Déprez 1989 for a summary account.
Expectation 2: Definite pronouns appear on the left edge in SS, as
required by QS (= TopS), since they are always D-linked—again, a commonplace of the literature.
Expectation 3: A definite NP will move leftward over an adverb, in
defiance of CS, in order for SS to match QS, as definites always have
wider scope than indefinite adverbs; see the end of this section for a dis-
cussion of the behavior induced by definite adverbs, based on findings of
Van Riemsdijk (1996). But the pull to the left to move the direct object
into the clause-initial Topic field of QS can be countervailed by the need
to place narrow focus on the object, as in (25b), which makes leaving the
NP after the adverb an option, even though the NP is D-linked. The key
here is to understand that an NP can be both focused and D-linked, in
e¤ect both focused and topicalized, and that both of these properties are
needed to understand the German middlefield behavior. The following
cases show that these expectations are fulfilled:
(25) Definites move left, except if narrowly focused
a. weil ich die Katze selten streichle
   because I the cat seldom pet
   ‘because I seldom pet the cat’
b. ?*weil ich selten die Katze streichle
(good only if contrastive focus on Katze (Diesing 1992) or
[Katze streichle] (M. Noonan, personal communication))
c. weil ich die KATZE selten streichle
(only narrow focus on KATZE )
d. What did Karl do?
Den HUND hat Karl geschlagen.
the dog has Karl beaten
‘Karl beat the DOG.’
(Prinzhorn 1998)
In passing, note the difficulty this sort of example poses for a remnant
movement analysis of topicalization, or for rightward movement. The
problem, in both cases, is that the verb stays at the end, no matter what.
If we assume SVO order (as remnant movement theories generally do for
SOV languages), then to derive (25b) where the object is focused, we must
perform the operations of focusing and remnant movement, resulting in
something like one of the two following derivations:
(26) a. weil ich selten streichle die Katze → topicalization
weil ich die Katze [selten t streichle] → remnant movement
weil ich [selten streichle] die Katze → ?? derive SOV order
weil ich selten die Katze streichle
b. weil ich selten streichle die Katze → derive SOV order
weil ich selten die Katze [streichle t] → topicalization
weil ich die Katze [selten streichle t] → ?? remnant movement
weil ich selten die Katze streichle
The last step is the puzzler—how to get the verb in final position again,
but at the same time end up with the adverb before the direct object. The
operations otherwise motivated, including the remnant movement half of
focusing, do not seem to have the properties needed to achieve this.
In German, scrambling is more or less obligatory to disambiguate the
scope of coarguments, so there is much less surface quantifier ambiguity
in German than in English. This is because German favors QS represen-
tation over CS. But again, there is an important exception: when the sec-
ond of the two NPs is narrowly focused, it can remain in situ and be
scopally ambiguous there. The important thing here is that the possibility
of wide scope in the rightmost position is dependent on narrow focus.
Despite the other differences between the two languages, German behaves
identically to English in this respect, mimicking the special contours of
the HNPS construction discussed in section 2.3, mutatis mutandis: in
German FS countervails QS representation, whereas in English HNPS it
countervails CS representation.
Importantly, German does not require that the rightmost NP be the
Focus itself; rather, it must be a part of a narrow Focus, as (25b) shows.
This detail precisely matches the case of English HNPS. It would appear
that the ‘‘global’’ property of having a canonical FS representation over-
rides the German-particular requirement that SS be a canonical QS
representation.
Expectation 4: The notion that QS is the level in which both Topics
and quantifiers get their scopes is supported by the fact that scope inter-
pretation interacts with focusing in exactly the same way that Topics do,
as the following examples establish:
(27) Movement disambiguates quantified NPs
a. ~&dass eine Sopranistin jedes Schubertlied gesungen hat (eine > jedes)
   that a soprano every Schubert.song sung has
   ‘that a soprano sang every song by Schubert’
b. ~&dass jedes Schubertlied eine Sopranistin gesungen hat (jedes > eine)
   (Diesing 1992)
(28) ‘‘Unmoved’’ NP is ambiguous if and only if narrowly focused
a. &Er hat ein paar Mal das längste Buch gelesen.
   he has a couple times the longest book read
   ‘He read the longest book a couple of times.’
b. ~&Er hat das längste Buch ein paar Mal gelesen.
Example (28) in particular shows that a wide scope quantifier can be left
in situ exactly in case it is narrowly focused.
In remnant movement Checking Theories (28a) would need to be rep-
resented as follows:
(29) a. Assign (i.e., check) scope
er hat das längste Buch [ein paar Mal [t gelesen]]
b. Assign (i.e., check) Focus
er hat [ein paar Malj [das längste Buchi [tj [ti gelesen]]]]
Ein paar Mal must move precisely because das längste Buch is the Focus,
and thus not for reasons of its own. The difficulty is increased, just as it
was in the case of HNPS, by the fact that the same word order and scope
interpretation are possible if the whole VP [das längste Buch gelesen] is
narrowly focused. In other words, not only does narrow focus in a quan-
tified NP permit in-situ positioning, but so does canonical Focus projec-
tion from that NP. Again, this is exactly the behavior found earlier for
HNPS in English. Although I do not have relevant examples, I would
expect the same results in (27) and (28) if the Focus was a subconstituent
of the direct object (e.g., contrastive focus on the noun Buch), again by
parallelism with the HNPS facts.
The overall relation of focus to topic in German can be summarized in
the following cascade of exceptions:
(30) NP must be in Case position
except if D-linked or wide scoped
except if narrowly focused or part of a canonical narrow
Focus.
RT derives this cascade from the competition of congruences that SS
must enter into.
In English, SS does not represent QS, but rather CS; thus, quantifier
ambiguities abound.
(31) He has read the longest book a couple of times.
Example (31) is ambiguous even if the whole sentence is the Focus (as it
would be, for example, in answer to the question, What happened?). The
two readings have the following structures:
(32) a. CS → SS ≇ QS (narrow scope for the longest book)
b. CS → SS ≅ QS (wide scope for the longest book)
By the logic of RT, (32a) is tolerated, in the face of (32b), because (32a)
gives a meaning that (32b) does not.
But it is not enough for a misrepresentation (or in classical terms, a
movement) to serve some purpose—it matters which purpose. For exam-
ple, HNPS is not justified simply to achieve QS → SS representation,
as (33) shows.
(33) *John gave to every FRIEND of mine a book. (every > a)
Rather, HNPS is justified only to achieve FS ≅ SS congruence, as estab-
lished earlier. Although it is conceivable that a language could work the
other way (since in fact German does), English does not. It does not be-
cause it rates CS representation over QS representation tout court.
In the main line of work within the ramified Pollock-style theory of
clause structure, the leftward positioning of topicalized NPs is achieved
by movement—that is, by the same kind of relation that wh movement is.
Evidence of movement comes from viewing the di¤erent positions an NP
can occupy under di¤erent interpretations, where positions are identified
with respect to adverb positions. This methodology has been thoroughly
explored in a variety of languages.
Van Riemsdijk (1996) has pointed out the following problem with this
methodology. In German the adverbs themselves seem subject to the
same dislocating forces as the NPs; that is, definite adverbs such as dort
‘there’ move leftward, compared with their indefinite counterparts such as
irgendwo ‘somewhere’, as the following paradigm illustrates:
(34) a. Ich habe irgendwem/dem Typ irgendwas/das Buch versprochen.
        I have someone/that guy something/the book promised
        ‘I promised someone/that guy something/the book.’
     b. *Ich habe irgendwas dem Typ versprochen.
     c. Ich habe das Buch dem Typ versprochen.
(35) a. Sie hat irgendwo/dort wen/den Typ aufgegabelt.
        she has somewhere/there someone/that guy picked up
        ‘She picked someone/that guy up somewhere/there.’
     b. ??Sie hat irgendwo den Typ aufgegabelt.
     c. Sie hat dort den Typ aufgegabelt.
     (Van Riemsdijk 1996)
Example (34) shows the relative ordering properties for a definite and an
indefinite NP, and (35) shows the same thing for an adverb and an NP: a
definite NP is bad after an indefinite adverb, but OK after a definite ad-
verb. This finding calls into serious question whether adverbs can be used
as a frame of reference against which to measure the movement of NPs. It
44 Chapter 2
also calls into question the notion that adverbs occupy fixed positions
in functional structure determined solely by what they are understood to
be modifying. And it suggests that everything, including adverbs, is mov-
ing in the same wind, or rather the same two countervailing winds of QS (= TS) and FS.
2.6 Hungarian Scope
Brody and Szabolcsi (2000) (B&S) present Hungarian cases just like the
German cases observed by Noonan and others cited earlier. That is,
moved quantifiers are unambiguous in scope, while unmoved ones are
ambiguous; but not moving has consequences for focus.
According to standard analyses since E. Kiss 1987, Hungarian quanti-
fied NPs (including the subject) are generated postverbally and then
moved to the left of the verb; leftward movement fixes scope. There are
two types of position to the left of the verb: a single Focus position im-
mediately to the left of the verb, and then a series of ‘‘Topic’’ positions to
the left of that, giving the following structure:
(36) [NPT NPT . . . NPF V . . . ]
To illustrate: (37a) is not ambiguous, but (37b) is ambiguous. This is
because in (37a) both NPs have moved, so their relative scope is fixed; but
in (37b) minden filmet has not moved, so it is scopally ambiguous.
(37) a. Minden filmet kevés ember nézett meg. (every > few)
        every film few people saw prt
     b. Kevés ember nézett meg minden filmet.
        few people saw prt every film
(B&S 2000, 8)
But B&S have provided a more fine-grained version of the facts. They report that the accent pattern of the sentence disambiguates (37b); in particular, if minden filmet is accented, then it has wide scope over kevés ember.

(38) a. Kevés ember nézett meg MINDEN FILMET. (every > few)
     b. Kevés ember nézett meg minden filmet. (few > every)
This is now a familiar pattern, the same one we have seen in German
and English; but how it arises in Hungarian remains to be spelled out,
and this requires a few remarks about the Hungarian FS and QS levels.
Topic and Focus 45
Like English, Hungarian allows multiple Focuses, and only one of
them can occupy the designated Focus position to the left of the verb.
Secondary Focuses can be located to the right of the verb; they cannot
occupy the positions to the left of the primary Focus, as these are Topic
positions. Thus, the Hungarian FS looks like this:
(39) Hungarian FS
[ . . . F V . . . (F) (F)]
The initial Focus position is the ‘‘normal’’ position for a single Focus; in
particular, it is the position from which Focus ‘‘projects’’ in Hungarian.
The postverbal Focus positions, if a sentence has any, are strictly narrow,
nonprojecting Focus positions.
From these remarks, we can see that the RT analysis of Hungarian is essentially the same as that of German: in particular, SS↔QS > SS↔CS (i.e., SS representation of QS dominates SS representation of CS). However, as in German, FS representation can tip the balance back.
Apart from considerations of focus, in order for minden filmet to have
wide scope, it would need to appear in preposed position, as it does in
(37a).
From the fact that preposing fixes relative scope among the pre-
posed elements, we can conclude that Hungarian QS has the following
structure:
(40) Hungarian QS
[QPi [QPj V . . . ]], where QPi has scope over QPj
And from the fact that apart from special focusing considerations, pre-
posing of quantified NPs is essentially obligatory, we can again conclude
that SS representation of QS dominates SS representation of CS.
We can see the two requirements of (39) and (40) interacting exactly in
the case of a wide scope focused quantified NP. If there is a single Focus,
it must occur in the single preverbal canonical Focus position, to satisfy
Focus representation. Such representation will also fix its scope. But if
there are two Focuses, only one can appear preverbally. The other must
appear postverbally, for the reason already discussed.
The following problem then arises. Suppose the second Focus is to
have wide scope, the situation of minden filmet in (38a). A case like this
has the following representational configuration:
(41)
As is clear, QS is misrepresented by SS. Ordinarily, this would not
be tolerated, but in this special circumstance SS representation of FS
compensates.
If, on the other hand, minden filmet is not a Focus, as in (38b), then it
must move in order to take wide scope; the reason is that the match with
FS will not be improved by not moving, whereas the match with QS will
be. In other words, for (38b) the following three structures will be in
competition with each other:
(42)
Leaving CS representation aside, (42b) and (42c) are clearly superior to
(42a), as (42a) has a misrepresentation of QS. But (42b) and (42c) repre-
sent different meanings: (42b) has wide scope for QPi and (42c) for QPj.
(42c) and (42a) are competing for representation of wide scope for QPj,
and (42c) wins. The result is that (42b) must be the representation for
(38b) where the second quantifier minden filmet is unmoved, and so it
must have narrow scope. The difference that focus makes is that (42c) is
not a viable candidate to represent focus on the second NP, and so (42a)
wins unopposed.
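The competition among (42a–c) can be sketched as a small ranked-choice procedure. This is my own toy formalization, not the book's: the candidate tuples and the `viable` flag (standing in for independent requirements such as Focus placement) are illustrative assumptions.

```python
# Toy sketch of the RT candidate competition in (42): surface orders
# compete to express a given scope order, and a candidate is in the
# running only if it is "viable" -- compatible with independent
# requirements such as where a Focus must sit.

def best_ss(qs_scope, candidates):
    """qs_scope: tuple of QPs, widest scope first.
    candidates: list of (surface_order, viable) pairs.
    Prefer viable candidates whose surface order matches the scope order."""
    viable = [ss for ss, ok in candidates if ok]
    faithful = [ss for ss in viable if ss == qs_scope]
    return faithful[0] if faithful else viable[0]

# No focus on QPj: the moved order (cf. (42c)) is viable and wins.
unfocused = [(("QPi", "QPj"), True), (("QPj", "QPi"), True)]
assert best_ss(("QPj", "QPi"), unfocused) == ("QPj", "QPi")

# QPj focused: the moved order is not viable (the Focus must stay
# postverbal), so the unmoved order (cf. (42a)) wins unopposed,
# even though it misrepresents QS.
focused = [(("QPi", "QPj"), True), (("QPj", "QPi"), False)]
assert best_ss(("QPj", "QPi"), focused) == ("QPi", "QPj")
```

The point of the sketch is that whether a mismatch is tolerated depends on what other candidates exist, not on any property of the mismatching candidate alone.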
By similar reasoning, we can explain why two preverbal QPs have fixed
scope. I will assume that neither is focused, so QS representation is all
that is at stake. The canonical mapping gives the surface order (43b), so
the question is why the noncanonical mapping in (43a) is barred.
(43)
It turns out that (43a) is blocked by an alternative surface order, which
represents the scope order perfectly.
(44) SS: QPj QPi V
     QS: QPj QPi V
Thus, these Hungarian cases pattern just like the German cases con-
sidered earlier. Given this parallelism, one would expect parallels to the
cases in German in which the apparently ‘‘moved’’ phrases are not the
focused phrases themselves, but projections of the focused phrases, or
subparts of the focused phrases. I do not know the pertinent facts.
B&S give a different analysis of the ambiguity of (37b). In their view, on the wide scope reading for minden filmet it has the structure in (45), and the reason minden filmet has wide scope is that it is structurally higher than the subject+V.

(45) [Kevés ember nézett meg] minden filmet.

The problem this analysis raises is of course that the subject+V is not a natural constituent. However, in the framework adopted by B&S it is: it arises in a derivation in which both NPs are preposed to a position in front of the verb.
(46) minden filmet [kevés ember [nézett meg t t]]FP
The traditional Hungarian Focus position is the position immediately
preceding the verb; accepting this traditional account, B&S call the constituent consisting of the VP and the first-to-the-left NP FP. Then this entire FP is itself preposed, giving the structure in (47).

(47) [Kevés ember [nézett meg t t]]FP [minden filmet t].
That is, the derivation proceeds by hyper-Kaynian remnant movement.
There are some special problems here for analyses that use remnant
movement. The first is that such analyses cannot be applied to German, for reasons given in the preceding section; nor can they be applied to English HNPS, also for reasons already given—essentially, the two-way failure of correspondence between the Focus and the moved constituent. But there is a problem peculiar to Hungarian itself. The remnant movement of the subject+V is actually a movement of the entire FP, which consists of the entire VP and the focused constituent that immediately precedes it. So, one would expect any phrase that was a part of the VP to show up to the left of the in-situ QP; but in fact, such phrases (videón ‘on videotape’, in the following example) can appear either before or after that QP,
(48) a. Kevés ember nézett meg videón minden filmet.
     b. Kevés ember nézett meg minden filmet videón.

and the scope of minden filmet in both cases can be construed as wide (B. Ugrozdi, personal communication). Example (48a) is compatible with all theories, but (48b) is mysterious for B&S’s account, as it must have the following structure:

(49) [Kevés ember [nézett meg]VP] minden filmet tFP videón.
Somehow videón has escaped the VP (and FP), to the right. Pursuing the
logic of radical remnant movement, we might assign this example the
following structure, in which the apparent rightward movement of videon
is really the result of its leftward movement, plus radical leftward rem-
nant movement:
(50) a. [kevés ember minden filmet videón [nézett meg t t t]] →
     b. [nézett meg t t t] [kevés ember [minden filmet [videón t . . .
But the problem with this is that there should be no space between minden
filmet, which is focused, and the verb, as the Focus must always precede
the verb directly.
The general character of the problem that Hungarian poses for check-
ing theories of focus and topic is no different from what we have seen for
other languages: Checking Theory armed with triggering features for
focus and topicalization will wipe out any trace of Case and theta struc-
ture: once a remnant movement has taken place, all trace of Case and
theta structures is invisibly buried in entirely emptied constituents. This
consequence of remnant movement does not seem to hold empirically.
2.7 Spanish Focus
We have adopted the ‘‘answer to a question’’ test for identifying normal
focus. English allows normal focus anywhere, not just on the right edge,
as the constitution of FS would lead us to expect.
(51) A: Who did John give the books to t?
B: John gave MARY the books.
This can be taken to show that English allows FS to be misrepresented by
SS, sacrificed in this case for accurate CS representation.
(52)
Spanish, on the other hand, does not seem to permit nonfinal normal
Focuses—at least, not as answers to questions.
(53) A: Who called?
     B: *JUAN llamó por teléfono.
         JUAN called
     (Zubizarreta 1998)
     B′: Llamó por teléfono JUAN.
(54) Spanish
FS↔SS > . . .
The logic of this chapter suggests that Spanish differs from other languages in favoring FS↔SS representation over all others. The fact that Spanish has a subject-postposing rule (as illustrated in (53B′)) aids it in meeting this requirement, though RT does not causally connect the ungrammaticality of (53B) with the presence of the postposing rule. One reason for making no such connection is that other languages with subject postposing (specifically Italian; see (55)) permit both (53B) and (53B′). The ungrammaticality of (53B) follows directly from the ranking in (54). A related but different approach to the problem would be to allow Spanish to have the same FS as English, and to block (53B) by (53B′)—that is, to say that the mere availability of (53B′) is enough to ensure that (53B) is blocked. I think this is the wrong approach in general. First, there are languages like Italian, where the analogues of both (53B) and (53B′) are grammatical.
(55) A: Who called?
     B: GIANNI ha urlato.
        GIANNI has called
     B′: Ha urlato GIANNI.
     (Samek-Lodovici 1996)
Second, even in a language like English, which lacks subject postposing,
we can create cases where the same logic would apply, blocking com-
pletely grammatical answer patterns like (56B).
(56) B: I gave the SATCHEL to Mary.
     B′: I gave to Mary the SATCHEL.

Clearly, the alternative order in (56B′) does not compete with the order in (56B), or at least it does not win.
In German and English we saw that focus considerations can counter-
vail requirements of scope assignment. In Spanish we would expect focus
considerations to override requirements of scope assignment. That is, we
should find cases where NPs are obligatorily mis-scoped in surface struc-
ture because of overriding focus requirements. I do not have the relevant
facts at the moment. There is one methodological obstacle to getting
relevant facts: we have identified normal focus with answerhood, but
answers to questions generally take wide scope.
This is not to say that Spanish lacks any sort of Focus non-phrase-
finally—it lacks only the kind of Focus that is needed for answering
questions. Zubizarreta (1998, 76) gives the following example:
(57) JUAN llamó por teléfono (no PEDRO).
     JUAN called            not PEDRO
Here a phrase-initial accented NP can serve as a contrastive Focus—just
where it cannot serve as a Focus for the purpose of answering questions.
In chapter 9 I will embed a theory of contrastive versus normal focus in a
theory of the values assigned at each level: FS will be the input to ques-
tion interpretation, but Accent Structure (a level to be introduced in
chapter 9), which normally ‘‘represents’’ FS by matching an accented
phrase to a focused phrase at FS, will be shown to give special meta-
linguistic effects when FS is not canonically represented, as in (57).
What happens in Spanish when a normal Focus cannot be postposed,
for some reason intrinsic to the structural (i.e., CS- or SS-related) restric-
tions in the language? It is not clear, as it is difficult to form a question in
Spanish where the question word is nonfinal, because postposing and
reordering always seem to permit postposing. Nevertheless, small clause
constructions might be relevant cases.
(58) A: Con quién llegaron enferma?
        with who arrived sick
        ‘Whoi did they arrive with sicki?’
     B: Llegaron con MARÍA enferma.
     B′: *Llegaron con enferma MARÍA.
     B″: *Llegaron enferma con María.
     (J. Camacho, personal communication)
As the translation indicates, the PP con MARÍA modifies the verb, and the adjective enferma (with feminine ending) modifies María and so enters into some kind of secondary predication relation with it. That predication relation does not permit postposing, of either María or the PP con María.
In that case the normal Focus can be nonfinal, as in (58B). This shows
that Spanish does permit nonfinal normal Focuses, but only when it has
no choice.
What does it mean to have no choice? In RT it must mean one of two
things. First, it could mean that the representing level simply has no form
that corresponds to [V PP AP], the form of the VP in (58B′). Second, it could mean that SS, in addition to representing FS, must also represent
some other structure, presumably the one in which small clause predica-
tion is adjudicated, and that the call to represent that structure is stronger
than the call to represent FS. As I have no considerations favoring one
over the other, I will let the question stand.
2.8 Russian Subjects
Russian exhibits the same behavior we found in German scrambling and
English HNPS: obligatory leftward positioning of elements unless they
are narrowly focused.
(59) a. Usi zalozilo.
        ears.acc.pl clogged-up.neut.sg
        (Lavine 1997)
     b. *Zalozilo usi.
        (unless usi is narrowly focused)
        (S. Harves, personal communication)
The only argument to zalozilo is the accusatively marked internal argu-
ment usi; one would normally expect it to appear postverbally, as other
such internal arguments would. But in fact that is not the normal order
for such sentences; rather, the order in which the argument occurs pre-
verbally is the normal order. It is normal in the sense that it is the only
order, for example, in which Focus projects, and so the only focus-neutral
order.
The difference between German and Russian lies in the freedom with
which arguments can cross the verb. Nothing like Holmberg’s general-
ization holds in Russian.
There are two ways to account for this state of affairs. I will outline
them, without choosing between them.
The first possibility, the simpler of the two, is that Russian FS imposes
the NP V order, in that such a structure is the only one from which Rus-
sian permits Focus projection. In other words, Russian FS has the fol-
lowing structures, among others (where ′ marks accented positions).

(60) Russian FS
     a. [NP′ V NP″]F
     b. [NP′ V]F
     c. [V NP′F]
The pattern in (60b) is in fact the pattern for Focus projection in
English intransitive sentences.
(61) a. One of my friends′ died.
     b. One of my friends died′.
If the main accent is on died, as in (61b), then died also bears narrow
focus; but if it is on friends, as in (61a), then it can project to the entire
sentence.
Under this regime the derivation of (59a,b) would look like this:
(62) a. CS: [zalozilo usi] ↔ SS: [zalozilo usi] ↮ FS: [usi zalozilo]F
     b. CS: [zalozilo usi] ↔ SS: [zalozilo usi] ↔ FS: [zalozilo usiF]
In this scheme Russian has a notion of subject at FS in the sense that
only structures with a preverbal NP allow projection. But the requirement
that there be a subject could arise somewhat earlier, so long as it did not
arise as early as CS, or wherever nominative Case is assigned, because it
clearly has nothing to do with nominative Case. Suppose, for concrete-
ness, that there is an SS requirement that there be a subject, which must
be met even if there is no nominative. Given such a requirement, surface
structures would have to have the following form, where the first NP is
the ‘‘subject’’:
(63) SS: [NP V . . . XP]
In that case the structures assigned to (59a) would look like this:
(64) CS: [zalozilo usi] ↮ SS: [usi zalozilo] ↔ FS: [usi zalozilo]F
That is, SS misrepresents CS, but faithfully represents the Focus-
projecting FS. Because constraints within levels are inviolable, the surface
structure for (59b) must be the same as the surface structure for (59a); but
then, the ‘‘heard’’ output is wrong, since the form of (59b) is Zalozilo usi.
In order to model the facts, there must be a ‘‘heard’’ representation that is
subsequent to SS; suppose that FS is such a representation. Then, FS will
(mis)represent SS, rather than the reverse, and the following derivation is
possible:
(65) CS: [zalozilo usi] ↮ SS: [usi zalozilo] ↮ FS: [zalozilo usiF]
Here SS misrepresents CS, as it must in order to meet the SS subject
requirement; in addition, FS misrepresents SS, presumably in order to
achieve narrow focus on usi.
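The two-step derivation in (63)–(65) can be sketched as a pipeline of mappings. The encoding below is my own assumption, not the book's formalism: word orders are simple tuples, and the SS subject requirement and narrow focus are modeled as reordering operations.

```python
# Minimal sketch of the second Russian account, (63)-(65): SS imposes a
# preverbal "subject" even when CS is verb-initial, and FS then maps
# from SS, reordering only to place a narrowly focused NP postverbally.

def ss_of(cs):
    """CS is verb-initial, e.g. ('zalozilo', 'usi'); SS fronts the NP
    to satisfy the subject requirement in (63)."""
    v, np = cs
    return (np, v)

def fs_of(ss, narrow_focus=False):
    """FS represents SS faithfully, unless narrow focus on the NP
    forces it into final (accented) position, as in (65)."""
    np, v = ss
    return (v, np) if narrow_focus else (np, v)

cs = ("zalozilo", "usi")
assert ss_of(cs) == ("usi", "zalozilo")               # (64): SS misrepresents CS
assert fs_of(ss_of(cs)) == ("usi", "zalozilo")        # (64): FS faithful to SS
assert fs_of(ss_of(cs), True) == ("zalozilo", "usi")  # (65): narrow focus on usi
```

Note that both mismappings of (65) are localized in a single step each, which is what lets FS, as the "heard" level, diverge from SS without any level-internal constraint being violated.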
In later chapters I will adopt two elements from this analysis: (a) that
FS represents SS, rather than the reverse, and (b) that different levels have different ‘‘subjects.’’
The notion ‘‘subject’’ could be a feature of many different levels, but with predictably differing properties, if the properties depend on the prop-
erties of the levels themselves. Shape Conservation will tie the subjects
together: in the canonical mapping between levels, subject at one level
will map to subject at the next. See section 3.2.2 for a generalization of
the notion ‘‘subject’’ across the levels.
In the second account SS has a notion of subject that motivates the first
mismapping. This notion of subject is completely analogous to the Ex-
tended Projection Principle (EPP), since it is understood as a requirement
distinct from Case assignment in minimalism. See Lavine (forthcoming)
for extended argument for this arrangement in a minimalist account of
Russian.
What Russian adds to the picture developed here is the fact that com-
plement and verb can reorder in mismapping. In Germanic and Romance
any reordering of complement and head is associated with Case, and the
evidence for separating the EPP from Case has come largely from exple-
tive constructions. Lavine’s work establishes that the phenomenon is a
good deal more general. I will return to Russian impersonal verbs in
chapter 5, after necessary notions about the RT levels are introduced in
chapter 3.
At present we have no means to weigh the relative cost of mismapping
that respects head order and mismapping that does not. In Williams
1994b, in a different theoretical context, I proposed the principle TRAC,
which suggested that reordering (for scrambling) was compelled to main-
tain the theta role assignment configuration, which among other things
specified the directionality of theta role assignment; but clearly this is not
generally true. Still, although I have no concrete suggestion to offer at this
point, I am tempted to think that reorderings that violate TRAC are
more costly than reorderings that do not.
2.9 Conclusion
2.9.1 Semantics of Form
The facts pertaining to the interaction of scrambling, topic, and focus
provide a rich testing ground for theories attempting to account for cor-
relations between syntactic form and meaning. Checking Theory provides
a simple account, interesting if correct because it assumes a straightfor-
ward compositional semantics: interpretable features are interpreted in
situ, accounting for meaning, and they act as syntactic attractors, ac-
counting for form. But for the constructions examined here, this account
does not seem to work; instead, what we find is a holistic matching of a
clause structure with a Case structure on the one hand and a quantifica-
tion structure on the other, without the possibility of reducing the inter-
relations involved in the match to a set of triggered movement relations.
This is because the possibility of mismatching two structures depends
crucially on what other structures exist, and because the ‘‘moved’’ con-
stituent does not correspond to the constituent on which the interpreta-
tion turns.
Perhaps the most radical conclusion that can be drawn from this is that
semantics is not compositional in a significant sense: the quantification
structure of a clause is fixed holistically, by matching a surface structure
with an independently generated quantification structure, and how that
match works is determined by what other matching relations the sur-
face structure enters into. To this extent, the quantification and focus
structures of a sentence are not determined by a strictly compositional
computation.
If this conclusion is accepted, then we must account for why semantics
appears to be compositional. I think we can best understand this by con-
sidering the question, when would a pattern-matching theory of semantics
be fully indistinguishable from a compositional semantics? The answer is,
when every possible attempt to match succeeded—when for any given
quantification structure there was a surface structure that fully matched a
Case structure and a focus structure, so that full isomorphism held across
the board. In that case we could use either theory interchangeably; the
result would always be the same. If the conclusion of this chapter is cor-
rect, English and German approximate this state, but neither achieves it,
and in fact they deviate from it in different ways. The approximation is
close enough that if only a narrow range of facts is examined in any one
analysis, the failure of compositionality will escape detection. Given sub-
stantive conclusions about the nature of each of the sublanguages, it
is probably inevitable that a completely isomorphic system would be
impossible.
2.9.2 How Many Levels?
How many levels are there? In this chapter I suggested four or five (CS,
TS, SS, QS, FS). At different points in what follows, I will talk about models with different numbers of levels. What is the right number? If we
had the right number, and the properties of each, we would pretty much
have a complete theory. I have nothing like that. What I have instead is
evidence for a number of implicational relations of the sort, ‘‘If property
A occurs in level X and property B occurs in later/earlier level Y, then it
follows that . . .’’; and in fact the discussion in this chapter has had exactly
this character. These implicational predictions exploit the main idea
without requiring a full theory, and seem sufficiently rich to me to en-
courage further investigation into what might be viewed as a family of
representation theories.
Every theory—or more properly, every theoretical enterprise—has
at least one open-ended aspect to it. For example, different Checking Theories propose different numbers of functional elements and different numbers of features distributed among them. It is no trivial matter to
determine whether some group of checking analyses, and the Checking
Theories that lie behind them, are compatible with one another, and
consequently whether there is a prospect of a final Checking Theory that
is compatible with all of those analyses. What makes them all Checking
Theories is that they all have the same view of the design plan of syntax:
they all incorporate some notion of movement governed by locality or
economy that results in checked features, which are used up.
The same is true of representation theories. In chapter 4 I introduce a
new level, Predicate Structure. The reason for the new level is that the
levels determined by the considerations in chapters 1–3 do not allow
enough distinctions. In introducing the new level, I assume, basically
without demonstration, that it is compatible with the results of the previ-
ous chapters. In chapter 9 I introduce a new kind of level, Accent Struc-
ture, for focus. Again, I do so because the levels proposed earlier do not
allow enough distinctions, and I hope that the newly extended theory is at
least compatible with the results of this chapter. One can see repeating
itself here the history of the development of Checking Theories. Many
journal articles are devoted simply to achieving some descriptive goal by
splitting some functional element into finer structure.
Much the same can be said of OT. There, the content of the constraints
themselves is not fixed, nor is the architecture (division into modules) of
the linguistic system. So the number of ‘‘Optimality’’ Theories is enor-
mous and varied, but we are still justified in calling them Optimality
Theories if they hew to the basic tenets: the calculus for evaluating can-
didate structures against a set of constraints, and the notion that all vari-
ation reduces to constraint ordering.
In like manner, I would reserve the term Representation Theory for any
theory that posits multiple syntactic levels in a shape-conserving relation
to one another, whatever the levels turn out to be. To that, I would like to
add one other substantive hypothesis, the Level Embedding Conjecture of
chapter 3, if for no other reason than I feel that the most interesting pre-
dictions follow from the model that incorporates that idea. A number of
things can be inferred about this class of theories, things that are inde-
pendent of various decisions about what the levels are.
The correct RT will have no fewer levels than are envisioned in this
chapter. Can we see enough of how the methodology works to gain some
rough idea about what the final model might look like? I think the limit-
ing case is an RT with exactly the same number of levels as there are
functional elements in the structure of a clause in the corresponding
Checking Theory. That is, it would not have a ‘‘Case Structure’’; rather,
it would have an ‘‘Accusative Structure’’ and a ‘‘Dative Structure.’’
Likewise, it would not have a Theta Structure; rather, it would have a
Patient Structure and an Agent Structure. I think this limiting case is not
correct, because there appear to be functional subgroupings of these
notions: patient and theme seem to be part of a system with certain
properties, as do accusative and dative. But even if this limiting case
turned out to be correct, RT would not thereby become a notational
variant of Checking Theory, because the architecture is different, and the
architecture makes predictions that Checking Theory is intrinsically inca-
pable of. I turn to those predictions in the next chapter.
Chapter 3
Embedding
In the preceding chapters the levels of RT have been used to account for
word order facts of a certain sort: mismapping between levels has been
invoked as a means of achieving marked word orders with certain inter-
pretive effects. In this chapter I will sketch other properties of the levels
and indicate how certain high-level syntactic generalizations might be
derived from the architecture of the model in a way that I think is un-
available in other theoretical frameworks.
I will consider two kinds of embedding here, complement embedding
and functional embedding, and I will treat them very differently. Suppose
we accept the notion that there is a fixed hierarchy of functional elements
(T, Agr, etc.) that compose clause structure (and similar sets for other
phrase types). Functional embedding is then the embedding that takes
place within one fixed chain of such elements—embedding AgrO under T,
for example. Complement embedding is the embedding that takes place
between two such chains—embedding NP or CP under V, for example.
In this chapter I suggest that complement embedding takes place at
every level, with different complement types entering at different levels. The result is an explanation of the range of clause union effects and a
derivation of a generalized version of the Ban on Improper Movement.
The methodology is pursued further in chapters 4 and 5, resulting in what
I call the LRT correlations: for any syntactic process, three of its prop-
erties will inevitably covary, namely, its locality, its reconstructive behav-
ior, and its target (e.g., A or A position). These properties are tied together
by what level they apply at, and in particular by what complement types
are defined there. In chapter 4 I show that anaphors are ‘‘indexable’’ in
this way by level, with predictably varying properties across the levels.
English himself, for example, is a CS anaphor, whereas Japanese zibun is
an SS anaphor; ideally, all properties are determined by those assignments,
and earlier anaphors ‘‘block’’ later ones by general principle (the Level
Blocking Principle). In chapter 5 I do the same for scrambling rules. The
predictions bound up in these correlations rely on the feature of RT that
does not translate into minimalism or other theories, namely, the decom-
position of clause structure into distinct sublevels or sublanguages.
In chapter 7, turning to functional embedding, I propose an axiomati-
zation of X-bar theory that reduces head-to-head movement to X-bar
theory, accounting for its locality and especially for its restriction to a
single clause structure. In chapter 8 I take up the morphological con-
sequences of this account. In RT a lexical item is understood as ‘‘lexical-
izing’’ or ‘‘representing’’ a subsequence of functional structure.
3.1 The Asymmetry of Representation
Before turning to complement embedding, I need to make a point about
representation that is entailed by the account I will give. Representation
will necessarily be an asymmetric relation in the model that embraces the
results of this chapter, for reasons having to do with how embedding is
accomplished.
By hypothesis, all levels are involved in embedding (the Level Embed-
ding Conjecture; see section 3.2.1). Functional elements are themselves
associated with particular levels. Tense, for example, is not defined before
SS, and so enters structures there at the earliest. Consequently, there will
be representation relations that systematically violate isomorphism. For
example:
(1) TS: [agent [V theme]] ↔ CS: [NPnom [VT NPacc]]T
There is at least one element in CS—namely, the T(ense) marking—that
is absent from TS in (1); hence, there is not a two-way one-to-one map-
ping between the two sets of structures.
Despite the lack of isomorphism, such relations will count as com-
pletely true mappings, not mismappings. The reason is that the represen-
tation relation itself will have an asymmetric definition. To take TS↔CS
as a special case, true representation will have the following properties:
(2) a. Every item in TS maps to an item in CS.
b. Every significant relation between items in TS maps to a relation
in CS (for relations like ‘‘head of ’’).
Importantly, (2) does not impose the reverse requirements: that every
item in CS be mapped to an item in TS, and so on. If (2) defines repre-
sentation, then representation is not really isomorphism, but homomor-
phism, and so is asymmetric. A homomorphism is like an isomorphism in
being structure preserving and therefore reversible where defined; but the
reverse mapping is not defined for the full range. Representation must be
asymmetric if new lexical or functional material enters at each level, as the hypotheses to be
entertained in this chapter will require. The Case structure in (1) includes
more than the theta structure (T, in particular), but it can still be said to
represent the theta structure in (1), if (2) is true. Under this view the mis-
mappings described in chapter 2 are now to be viewed as deviations from
homomorphism, rather than from isomorphism.
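The asymmetric definition in (2) can be put in concrete terms. The following sketch is my own illustrative encoding, not part of RT: a level is rendered as a set of items plus a set of labeled relations, and the check enforces exactly (2a) and (2b) in one direction only, so that extra CS material such as T does no harm.

```python
# A level is encoded here (illustratively) as a set of items plus a
# set of labeled relations over those items.
TS = {
    "items": {"agent", "V", "theme"},
    "relations": {("head of", "V", "VP"),
                  ("subject of", "agent", "VP")},
}
CS = {
    "items": {"NPnom", "V", "T", "NPacc"},   # note the extra T
    "relations": {("head of", "V", "VP"),
                  ("subject of", "NPnom", "VP")},
}
mapping = {"agent": "NPnom", "V": "V", "theme": "NPacc"}

def represents(source, target, m):
    """The asymmetric relation of (2): (a) every source item maps to a
    target item; (b) every significant source relation maps to a target
    relation. Nothing is required of target material outside the image
    of m (such as T in CS)."""
    if not all(m.get(i) in target["items"] for i in source["items"]):
        return False                                    # (2a) fails
    mapped = {(r, m.get(x, x), m.get(y, y))
              for (r, x, y) in source["relations"]}
    return mapped <= target["relations"]                # (2b)

print(represents(TS, CS, mapping))   # True: the homomorphism holds
inverse = {v: k for k, v in mapping.items()}
print(represents(CS, TS, inverse))   # False: T has no TS correspondent
```

Run both directions and the asymmetry of (1) falls out: TS ⇝ CS succeeds, while the reverse fails on the unmapped T.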
No kind of embedding is immune. Adjuncts will also enter clause
structure at later levels, perhaps at all levels. Wh movement itself is not
defined until SS, presumably also the level where CP structure is defined
(or, where it takes IP), and so any adjuncts that are themselves CPs (such
as when, where, and why clauses) involving wh movement cannot enter
until that point either.
Let us look at a concrete example involving adjuncts. (3) is a fully valid
representation relation; the tree on the right obviously has more in it, but
that doesn’t matter if all the items and relations in the first tree have cor-
respondents in the second.
(3)
(4) Preserved relations
V head of VP
NP1 subject of VP
NP2 object of V
NP1 left of VP
The new item, the adverb, and the new relations it enters into with the
rest of the sentence do not interfere with the representation relation.
In what follows I will speak of the representation relation as holding
sometimes between two levels or sublanguages, sometimes between two
members (or trees) of those levels or sublanguages, and even sometimes
Embedding 61
between subparts of trees in different levels. It is of course the fact that
the representation relation preserves the structure of one level in the
structure of the next level that makes it possible to slip from one to an-
other of these usages.
Wh movement takes place within the SS level, in the following way. A
structure in CS is mapped into a very similar structure in SS; wh move-
ment derives another structure within SS; and that structure (at least in
languages with overt wh movement) is then mapped to a structure in FS.
(5)
As in previous chapters, the wavy arrow (⇝) marks a representation
relation, and now the straight arrow marks an intralevel derivational
relation. So the structure has ‘‘grown’’ a SpecC in SS. In effect, the Case
structure is mapped isomorphically to a subpart of the surface structure
that carries forward (backward?) from there.
Exactly how the functional elements, ‘‘real’’ movement rules, and so
on, sort out into levels remains to be fixed empirically. But in advance of
that, this chapter lays out a theory that says that all the important prop-
erties of the items will in turn be fixed by that choice.
Some processes, elements, and such, may be defined at more than one
level. For those cases, two of which are anaphors and scrambling rules,
the model has further consequences: blocking holds between levels, so
‘‘early’’ elements always block ‘‘late’’ elements (see Williams 1997 for
further discussion).
It should be clear that there is a relation between the levels of RT and
the layers of functional structure in standard Checking Theories. The
asymmetry noted above is fully consistent with this. Later levels of RT
correspond to higher layers in functional structure. In particular, later
levels have ‘‘bigger’’ structures than earlier levels: I will suggest below
that CP exists in SS, for example, but only IP exists in some earlier
structure (CS or PS). For some considerations, it will be simple to trans-
late between RT and Checking Theories, because of the ‘‘higher equals
later’’ correspondence that holds between them. I will naturally dwell on
those considerations for which there appears to be no easy translation
from RT to Checking Theory in order to efficiently assess the differences
between them.
3.2 Complement Embedding and the Level Embedding Conjecture
I will suggest in this section that each of the RT levels defines a different
complement type and that all complement types are embeddable. The
complement types range from the very ‘‘small’’ clauses at TS to the very
‘‘large’’ clauses at FS. The range of complement types corresponds to
the degree of clause union that the embedding involves: TS complements
are very tight clause union complements (like serial verb constructions),
whereas FS complements are syntactically isolated from the clause they
are embedded into. This difference follows immediately from the model
itself: RT automatically defines a range of types of embedding comple-
ments, one type defined at each level, as summarized in (6).
(6) Types of embedding
TS objects: serial verb constructions (VPs?)
CS objects: exceptional Case marking; control? (IPs)
SS objects: transparent that clause embedding (CPs)
FS objects: nonbridge verb embedding (big CPs)
On the right I have indicated the category in standard theory to which
the objects defined at each level correspond. This correspondence cannot
be taken literally as a statement about what objects are defined in each
level of RT, because different RT levels define different types of objects
altogether. For example, TS does not define VPs; rather, it defines theta
structures, which consist of a predicate and its arguments. Nevertheless,
the objects in the RT level of TS correspond most closely to the VPs of
standard theory, and so on for the rest of the levels in (6).
This aspect of embedding is a ramified ‘‘small clause’’ theory, with
small, medium, large, and extra large as available sizes. In a strict sense,
the structures ‘‘grow’’ from left to right, theta structures being the small-
est and focus structures the largest.
3.2.1 The Level Embedding Conjecture
There are thus many types of embeddable complements under a ramified
small clause theory, but where does embedding take place? One way to
treat complement embedding in RT would be to do all embedding at
TS. Complex theta structures would be mapped forward into complex
Case structures, and so on; and higher clause types would then be
‘‘recycled’’ back through TS for complement embedding, as the diagram
in (7) indicates.
(7)
This arrangement would make RT most resemble minimalist practice and
its antecedents. I think, though, that much can be gained by a different
scheme: the one already alluded to, in which different kinds of embedding
are done at different levels. As there seem to be different ‘‘degrees’’ or
‘‘types’’ of embedding with respect to how isolated from one another the
matrix and embedded clauses are, we might gain some insight into them
by associating the different types with different levels in RT. I will refer
to this theory of embedding as the Level Embedding Conjecture (LEC). In
RT the LEC is in a way the simplest answer to the question of how
embedding is done: it says that an item can be embedded exactly at the
level at which it is defined, and no other.
(8)
For example, the tightest clause union e¤ects can be achieved by
embedding one theta structure into another in TS, deriving a complex
theta structure, which is then mapped into a simple Case structure. The
behavior of such embedding is dominated by the fact that there are too
many theta roles for the number of Cases, so some kind of sharing or
Case shifting must take place. A good example of this is serial verb con-
structions, where two theta role assigners (i.e., verbs) must typically share
a single Case-marked direct object, and where there must be a tight se-
mantic relation between the two.
At the other extreme, that clause embedding takes place much later, in
SS for example. What does a derivation involving that clause embedding
look like? Two clauses (matrix and embedded) both need to be derived to
the level of SS, at which point one is embedded in the other.
(9) TS:                   CS:                   SS:
    [Bill, [believes]]  ⇝ [Bill, [believes]]  ⇝ [Bill [believes]]   +
    [Mary, [ate a dog]] ⇝ [Mary [ate a dog]]  ⇝ [Mary [ate a dog]]  →
                                                [Bill [believes [Mary [ate a dog]]]]
The verb believe is subcategorized to take an SS complement. This sub-
categorization is always taken to determine not only the type of the
complement, but also the level in which the embedding takes place; it is
this double determination that generates the broad consequences alluded
to at the beginning of this chapter, and detailed below.
Before we turn to the details of embedding at different levels, a word
about the notion ‘‘lexical item’’ in RT. Lexical items obviously partici-
pate in multiple representations. Ordinarily the entries in the lexicon are
regarded as triples of phonological, syntactic, and semantic information.
In RT lexical items are n-tuples of TS, CS, . . . , and phonological infor-
mation. For example, the theta role assigner squander, which assigns a
theme and an agent role in TS, is related to the Case assigner squander,
which assigns accusative Case in CS; to the surface verb squander with its
properties, whatever they are; and so on.
(10) squander TS: [agent [squander theme]]
CS: [squander accusative]
SS: . . .
. . .
Part of the algorithm that computes isomorphism between levels clearly
takes into account identity of lexical items across different levels; thus,
(11a) and (11b) will count as isomorphic, but (11c) and (11d) will not.
(11) a. [agent [squander [theme]]] ⇝ b. [nominative [squander accusative]]
     c. [agent [squander [theme]]] *⇝ d. [nominative [squash accusative]]
Lexical entries such as (10) are the basis for such identities. The rest of
this chapter assumes something like this conception of the lexicon, actu-
ally just the obvious elaboration of the usual assumption.
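The way lexical identity enters the isomorphism computation can be sketched as follows. The encoding is hypothetical (tuples standing in for the entry in (10); position 1 standing in for the head): two structures count as isomorphic only if their heads are guises of one and the same lexical entry.

```python
# Hypothetical rendering of lexical entries as n-tuples of guises,
# as in (10); names and format are my own.
LEXICON = {
    "squander": {"TS": ("agent", "squander", "theme"),
                 "CS": ("nominative", "squander", "accusative")},
    "squash":   {"TS": ("agent", "squash", "theme"),
                 "CS": ("nominative", "squash", "accusative")},
}

def isomorphic(ts_struct, cs_struct):
    """Cross-level identity as in (11): the structures match only if
    their heads (middle positions here) are guises of a single
    lexical entry."""
    return any(entry["TS"][1] == ts_struct[1]
               and entry["CS"][1] == cs_struct[1]
               for entry in LEXICON.values())

# (11a)/(11b): the same entry, squander, at both levels
print(isomorphic(("agent", "squander", "theme"),
                 ("nominative", "squander", "accusative")))  # True
# (11c)/(11d): squander at TS but squash at CS
print(isomorphic(("agent", "squander", "theme"),
                 ("nominative", "squash", "accusative")))    # False
```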
3.2.1.1 TS Embedding As mentioned above, the lowest level of embedding
is associated with the strongest clause union effects, since a complex
theta structure is represented by a simple Case structure. Consider
for example the following serial verb constructions from Dagaare (12a)
and ǂHoan (12b):
(12) a. o    da    mOng  la       saao  de    bing  bare   ko    ma
        3sg  past  stir  factive  food  take  put   leave  give  me
        (Bodomo 1998, (32))
     b. ma   a-    qkhu  j’o     djo    ki    kx’u  na
        1sg  prog  pour  put.in  water  part  pot   in
        ‘I am pouring water into a pot.’
        (Collins 2001)
In the serial verb construction the clause contains several verbs, each
thematically related in some way to at least some of the objects. Signifi-
cantly, there is a single direct object, and a single indirect object. We can
view this as a combination of two theta structures, followed by a subse-
quent representation by a single Case structure.
(13) TS: {V1 theme} + {V2 theme} + {V3 theme, goal}
         = {V1 V2 V3 theme goal}
     ⇝ CS: [VCase assigner NP NP]
In other words, three simple theta structures, one for each V, are com-
bined into a complex theta structure, and that is mapped onto a simple
ditransitive Case structure.
It is typically remarked in connection with such constructions that the
connection between the verbs is extremely tight semantically, so tight that
the verbs can only be understood as denoting subparts of a single event.
If so, we might suppose that events are defined in TS, hence that complex
events are derived there. The ‘‘+’’ in (13), then, is a complex-event-deriving
operator with a limited range of possible meanings, and only
these are available for serial verb constructions. The possible meanings
include ‘causes’, ‘occurs as a part of the same event’, and so on.
Such remarks are reminiscent of what is often said about ‘‘lexical’’
causatives: that the notion of causation is extremely direct, causing and
caused events constituting a single complex event. For example, (14a,b)
are not synonymous.
(14) a. John encoded the information.
b. John brought it about that the information got encoded.
(14b) holds of a much wider set of situations than (14a). (14a) covers only
the case where John performed an action that resulted in the encoding
without other mediating events or other agents. In fact, (14b) might tend
to exclude the meaning that (14a) has, but this is most likely due to
blocking (i.e., for the situations for which (14a) and (14b) are both appli-
cable, (14a) is preferred, because it is more specific than (14b)).
As we have hypothesized that morphology has access only to TS, and
to nothing higher, it is not surprising that lexical causatives are restricted
to the ‘‘single complex event’’ interpretation, since that is the only inter-
pretation available at TS, a fact we know independently from serial verb
constructions.
There is a more complex situation that arises in serial verb construc-
tions: each of the verbs has Case-assigning properties. The second verb
is sometimes felt to be ‘‘preposition-like.’’ These might be analyzed as a
complex theta structure mapping into a complex Case structure, where
the complex Case structure has two Case assigners, V and P. I will leave
the matter for further work.
Other examples of TS embedding might include tight causative
constructions. The causative in Romance involves Case shifting
(nom → acc, acc → dat) that can be understood as arising from the need
to accommodate a complex theta structure in a simple Case frame.
(15) Jean a fait + [Pierre manger la pomme] →
     Jean  a fait  manger  la   pomme      à   Pierre
     Jean  made    eat     the  apple.acc  to  Pierre.dat
     ‘Jean made Pierre eat the apple.’
The complex predicate constructions studied in Neeleman 1994 are further
potential examples. We could characterize embedding in TS as embedding
that shows obvious apparent violations of the Theta Criterion—two or
more verbs assign the same theta role to the same NP, without the media-
tion of PRO or trace. The reason this embedding does not respect the Theta
Criterion is that the Theta Criterion itself does not hold in TS; rather, it
holds of the way that theta structures are mapped to Case structures.
3.2.1.2 CS Embedding CS embedding conforms strictly to the Theta
Criterion, but may exhibit Case interrelatedness between two clauses.
Exceptional Case-marking (ECM) constructions might well be good
instances of CS embedding. Case is not really shared between the two
clauses in these constructions; rather, the matrix V has Case influence in
the embedded clause. With regard to event structure, there is no ‘‘single
event’’ interpretation, as the two verbs are part of the designation of dif-
ferent events.
(16) John believes himself to have won the race.
Furthermore, although the embedded clause in (16) is transparent to Case
assignment by the verb of the matrix clause, the sentence clearly has two
Case assignment domains, and in fact in (16) two accusative Cases have
been assigned. Thus, ECM is different from TS embedding.
(17) CS: [John believes] + [himself to have won the raceacc]
     = John believes himselfacc to have won the raceacc
English provides some minimal pairs illustrating the difference between
CS and TS embedding. Expletives do not exist in TS, where every relation
is a pure theta relation. Expletives exist to fill Case positions that do not
have arguments in TS mapped to them. Given this, we might wish to an-
alyze certain small clause constructions as CS embeddings and others as
TS embeddings, depending on whether an expletive is involved or not.
English has two constructions that might differ in just this way: most
small clause constructions require an expletive in the direct object posi-
tion when the subject of the small clause is itself a clause, but a few do
not.
(18) a. I want to make *(it) obvious that Bill was wrong.
b. I want to make (very) clear that Bill was wrong.
For a handful of adjectives like clear and certain, the verb make does not
require an expletive; and as the adverb very in (18b) indicates, the reason
is not simply that make-clear is an idiosyncratic compound verb. If we
suppose that expletives do not enter until CS, we could assign (18a,b) the
following structures, respectively:
(19) a. TS: [make clear]VP ⇝ CS: [make clear]V that S
     b. TS: [make]V ⇝ CS: [make it clear . . . ]VP
     c. *How clear did he make that he was leaving?
     d. How clear did he make it that he was leaving?
Make-clear is a complex predicate formed in TS, analogous to causative
constructions of the kind found in Romance, where, incidentally, exple-
tives are also excluded (Kayne 1975).
Expletives then mark ‘‘formal’’ non-TS Case positions, that is, posi-
tions with no correspondent in TS. It is likely that ‘‘Case’’ itself is not a
single notion; in particular, it is likely that so-called inherent Case is
present in TS, and only derivatively in CS. CS then would introduce only
formal Cases, not inherent or semantic Cases. Evidence for this would
come from compounding: as we have restricted compounding to repre-
senting TS, only inherent Case should show up in compounding. Al-
though I have not investigated the matter in detail, this does conform to
my general impression.
In the case of make clear, the TS phrase [make clear]VP is mapped to
the CS atom [make clear]V. That it is truly atomic can be seen in the
contrast between (19c) and (19d): make clear does not allow the extrac-
tion of clear, but make it clear does. In previous work (Williams 1998a) I
attributed this to the di¤erence between a lexical formation (make clear)
and a phrasal formation (make it clear), along with a principle stipulating
the atomicity of lexical units in phrasal syntax. RT allows a relativized
notion of atomicity: if a phrase at one level corresponds to, or is (mis)-
mapped to, an atom at the next level, that atom will be frozen for all
processes subsequent to that level. An advantage of this conception is that
it does not force us to call make clear a word in the narrow sense, a des-
ignation discouraged by its left-headedness and by its modifiability (make
very clear). The relativization involved here—relativizing the notion of
atomicity to hold between every pair of adjacent levels—will become a
familiar notion in chapter 4 and subsequently.
3.2.1.3 SS and FS Embedding Embedding at SS is ordinary that clause
embedding. Case cannot be shared across the that clause boundary (but
see Kayne 1981) because Case is already fully assigned by the time the
that clause is embedded in its matrix.
(20) CS:           SS:
     I think     ⇝ I think     +
     he is sick  ⇝ he is sick  = I think that he is sick
If wh occurs in SS, as I have assumed, then embedding in FS should be
out of the reach of wh movement; that is, complements embedded in FS
should be absolute islands with respect to FS embedding. What sort of
embeddings would be expected in FS? Presumably, embeddings in which
it would be reasonable to attribute a focus structure to the complement.
Since focus is generally a root ‘‘utterance’’ feature, the embedded clauses
that are focus structures would be those that most closely match matrix
utterances in their semantics and properties. From this perspective, it
would be reasonable to expect ‘‘utterance’’ verbs like exclaimed and
yelled to embed focus structures. These verbs embed not just proposi-
tions, but ‘‘speech acts,’’ loosely speaking, as the verbs qualify the manner
of the act itself. This is the class of verbs traditionally identified as non-
bridge verbs, so called because their complements resist extraction.
(21) *Who did John exclaim that he had seen t?
To the extent that this is so, then the assignment of this kind of
embedding to FS derives the behavior of these verbs with respect to wh
extraction.
(22) SS (wh movement):               ⇝ FS (too late for wh movement):
     [John exclaimed] + [he saw who]   John exclaimed [he saw who]
In the case of nonbridge verbs, the parts are simply not put together in
time for extraction, hence their islandhood. In fact, though, they should
not be absolute islands, but islands only to pre-FS movement. If a move-
ment is defined for FS, these verbs should act like bridge verbs for that
movement.
In order to guarantee that embedding is delayed until FS, the lexi-
cal entry for nonbridge verbs must be endowed with subcategorization
for FS objects, which is in keeping with their meaning, as mentioned
earlier.
It is reported that some languages (e.g., Russian) resist wh extraction
from all tensed clauses. Perhaps in such a language, all tensed-clause
embedding takes place at FS.
The derivation of the islandhood of nonbridge verb complements is
an example of a kind of explanation natural to RT. I will refer to such
explanations as timing explanations.
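The logic of a timing explanation can be sketched directly. The following is an illustrative encoding of my own (the level names are from the text, the function is not): an operation defined at a given level can only see material already assembled by that level, so FS-embedded complements are invisible to wh movement at SS.

```python
# The levels in derivational order; names from the text, encoding mine.
ORDER = ["TS", "CS", "SS", "FS"]

def extractable(embedding_level, movement_level="SS"):
    """wh movement is an SS operation; it can reach into a complement
    only if that complement is already embedded at or before SS."""
    return ORDER.index(embedding_level) <= ORDER.index(movement_level)

print(extractable("SS"))  # True: bridge-verb that clause, wh can escape
print(extractable("FS"))  # False: nonbridge complement, as in (21)
```

Nothing about the embedded clause itself makes it an island; its islandhood is a by-product of when it is put together with its matrix.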
3.2.1.4 Countercyclic Derivation The LEC forces some rather un-
expected derivations. The matrix may develop a very complex structure
itself before the lowest embedded clause is actually embedded into it. For
example, consider a sentence in which an ECM infinitive is embedded in
a matrix that clause, and another that clause is embedded under the verb
in the ECM clause.
(23) a. [that . . . [him to have said [that . . . ]]ECM]
b. He believes him to have said that he was leaving.
The LEC actually requires that the ECM construction be embedded in its
matrix before the that clause is embedded under the verb in the ECM
clause, so for this kind of case the order of embedding is ‘‘countercyclic.’’
This is of course because under the LEC, ECM embedding takes place in
CS, and that clause embedding takes place in SS, so the derivation looks
like this:
(24)
Similarly, it could happen that a verb taking a that complement is
embedded under a matrix raising verb before its own complement clause
is added.
(25) TS:             . . .  SS:
     [seems + sad]          seems [sad that Bill is leaving]
The reason for thinking that raising embedding takes place in TS is that it
is found in compound formations.
(26) a. sad seeming
b. odd appearing
We have seen reason to restrict compounds to levels that are repre-
sentations of TS; but then since raising constructions can appear as com-
pounds, raising must be a TS relation, and so the order of derivation in
(25) follows.
I do believe that it is entirely harmless that derivations proceed this
way. I wish it were more than this; countercyclic embedding is a
distinctive feature of RT, so that one should be able to exploit it to find
empirical differences with other theories, none of which have this property. Still,
I have not been able to find any such di¤erences.
It is important to emphasize that the LEC ensures an orderly assem-
blage of multiclause structure, just as much as the incremental application
of Merge in minimalist practice; it simply gives a different order.
Embeddings take place in the order of complement type, rather than in
bottom-to-top order.
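The contrast in assembly order can be put schematically. The encoding below is my own (the embedding records follow the case in (23)-(24)): sorting the embeddings by level, as the LEC requires, puts the shallower ECM embedding before the deeper that clause embedding.

```python
# Levels in derivational order, and the embeddings of (23)-(24); each
# record is (complement, matrix, level at which the embedding applies).
LEVELS = ["TS", "CS", "SS", "FS"]
embeddings = [
    ("inner that clause", "ECM infinitive", "SS"),  # deepest clause
    ("ECM infinitive", "matrix that clause", "CS"),
]

# Under the LEC, embeddings apply in level order, not bottom-to-top:
lec_order = sorted(embeddings, key=lambda e: LEVELS.index(e[2]))
for complement, matrix, level in lec_order:
    print(f"{level}: embed the {complement} into the {matrix}")
# CS: embed the ECM infinitive into the matrix that clause
# SS: embed the inner that clause into the ECM infinitive
```

A bottom-to-top (Merge-style) regime would sort by depth instead, embedding the inner that clause first; the LEC's order is the ‘‘countercyclic’’ one.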
3.2.2 Consequences of the LEC
To sum up the consequences of the LEC, one might say that it forces or
suggests generalizations of fundamental elements of linguistic structure:
generalized A/Ā distinction, subjecthood, generalized anaphoric binding,
generalized scrambling. The dimension of generalization is always across
the RT levels. The first two are taken up in the remainder of this section,
the last two in chapters 4 and 5.
3.2.2.1 The Relational Nature of Improper Movement The LEC derives
the Ban on Improper Movement (BOIM) directly. In fact, it derives a
generalization of it that is distinctive to RT.
The BOIM is generally taken to block movement from Ā positions to
A positions, as in (27), in which John moves, in its last step, from SpecC
of the lower clause to SpecI in the higher clause.
(27) *John seems [t [Bill has seen t]]CP.
I will take it as given that the BOIM is real. I will suggest how it can be
generalized in RT, and how it can be derived from the basic architecture
of the model in a way that is not possible in standard minimalist practice
or its antecedents.
The generalization of the BOIM to the Generalized BOIM (GBOIM)
is nothing more than the generalization of the A/Ā distinction that we
will see in this chapter and in chapters 4 and 5. I will state the GBOIM as
it would occur if it were instantiated in a standard model, one with a
ramified Pollock/Cinque-style clause structure.
(28) The GBOIM
Given a Pollock/Cinque-style clausal structure X1 > · · · > Xn
(where Xi takes Xi+1P as its complement), a movement operation
that spans a matrix and an embedded clause cannot move an
element from Xj in the embedded clause to Xi in the matrix, where
i > j.
In RT, as we will see shortly, the GBOIM follows from the architecture
of the theory and therefore needs no independent statement.
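The content of (28) amounts to a simple check on launching and landing sites. The encoding below is a toy of my own; the layer labels C, I, V merely stand in for a full Pollock/Cinque hierarchy, highest layer first.

```python
# A Pollock/Cinque-style hierarchy, highest layer first (X1 > ... > Xn);
# the labels are placeholders, not a claim about clause structure.
HIERARCHY = ["C", "I", "V"]

def gboim_allows(source_in_embedded, target_in_matrix):
    """Cross-clausal movement is barred when the landing site in the
    matrix is lower in the hierarchy than the launching site in the
    embedded clause."""
    return (HIERARCHY.index(target_in_matrix)
            <= HIERARCHY.index(source_in_embedded))

print(gboim_allows("C", "I"))  # False: SpecC -> SpecI, improper, as in (27)
print(gboim_allows("I", "C"))  # True: e.g. wh movement from an A position
print(gboim_allows("I", "I"))  # True: e.g. raising
```

The classical BOIM (no Ā-to-A movement) is the special case where the hierarchy is just C over I.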
The GBOIM is a proper generalization of the BOIM to the extent that
A positions are beneath Ā positions in clausal architecture as a special
case; in general, according to the GBOIM, if you are on the second floor of
clause A, and you move into clause B, you can’t move to a floor any
lower than the second.
Since we will generalize the A/Ā distinction in RT to the relation
between any pair of levels, and since there will be no A/Ā distinction apart
from this, the BOIM → GBOIM generalization is forced in the present
theoretical context. In this generalized version, items in Case positions
in an embedded clause, for example, cannot move into theta positions in
the matrix, and so forth. However, items in theta positions can move to
higher theta positions, higher Case positions, and so on.
The GBOIM is not obviously true, and a number of existing analyses
run counter to it, to the extent that it regiments A and Ā positions as
special cases. For example, any analysis in which clitic movement is Ā
movement is contrary to the BOIM, if the subject position is an A
position superior to the clitic position. Analyses of this sort must be
reexamined in light of the GBOIM. Some are taken up below, though most
will remain unaddressed.
The BOIM itself is not derivable in minimalist practice from the basic
principles governing derivation, such as economy or extension (the strict
cycle). For example, at the point at which wh movement would violate
the BOIM, a minimalist analysis would have built up a structure like
(29a), and neither economy nor the strict cycle nor extension prevents the
application of wh movement to derive (29b) by putting the wh in SpecV
(or SpecI, for that matter).
(29) a. [V [wh . . . ]CP]V′
     b. [wh [V [t . . . ]CP]V′]V′
This is not to say that there cannot be principles that block particular
cases of the BOIM (the GBOIM is in fact such a principle); my limited
point is that it does not follow organically from basic assumptions about
derivation or economy.
But I believe the GBOIM does follow unavoidably from the basic ar-
chitecture of RT, or something like it, so long as the LEC is a part of it.
The RT levels determine different kinds of embedding, as described in the
previous sections. To make the discussion concrete, assume that SS is the
level at which ‘‘transparent’’ that clause embedding takes place. Different
levels are also associated with different kinds of movement; again, for the
sake of concreteness, let’s assume that SS is the level at which wh move-
ment takes place and CP structure is introduced. Proper movement takes
place in derivations with the following character: first, two surface struc-
tures are built up by building up all of the structures smaller than (read,
‘‘earlier than’’) these structures. Then the two surface structures are com-
bined, and finally movement takes place.
(30)
The GBOIM follows from the RT architecture in this way. The earliest
that wh movement can take place is after the embedding in SS. However,
at that point, not only has the embedded clause been built up to the level
of SS, but so has the matrix clause; thus, there is no analogue of (29a) for
wh movement to apply to. When wh movement applies in SS, since the
surface structure it applies to already has a CP structure, extension (or
something like it) requires that it operate in such a way as to move the wh
item to the periphery of that surface structure. It will thus always move
the wh item to SpecC, since that position is introduced in SS.
For improper movement to take place, the matrix would have to have
peripheral positions ‘‘lower’’ than the highest position in the embedded
clause. However, that possibility is excluded by the LEC, which says that
embedding can take place only among elements of the same type, because
each level defines a di¤erent type. (31), repeated here from (29a), is
therefore not a possible structure in RT with the LEC.
(31) [V [wh . . . ]CP]V′
The problem in deriving the GBOIM in a theory in which (31) is a well-
formed syntactic object is that the matrix and embedded clauses are in
different degrees of development. The embedded clause is fully developed
to the level CP, but the matrix is only partially developed, so there is no
level at which it can embed this CP and thereby derive the improper
movement in (29b). Of course, the matrix itself can be developed to the
level CP, but then the embedding will occur in SS, and extension, or some
equivalent, will force movement to the top of the matrix CP, respecting
the BOIM. It is this difference in development of matrix and embedded
structures that gives rise to the problem of improper movement. In RT,
since embedding is always of objects at the same level, no such difference
arises and improper movement is therefore impossible.
RT crucially needs some notion of extension to prevent trivial defeat
of the most interesting predictions of the LEC. These trivial defeats cor-
respond to what in the standard model would be violations of the strict
cycle if it were applied in a phrase-by-phrase manner, as suggested in
Williams 1974. I will assume that extension, essentially as it is used in
Chomsky 1995, has to be part of the intended interpretation of RT as
well: any operation has to a¤ect material that could not have been af-
fected in a previous level. The parallelism with the standard interpretation
is clear: simply replace level with cycle, where every node is ‘‘cyclic.’’
Without something like extension there is no good reason why movement
in SS would have to be to the periphery of the CP structure defined there,
and not, for example, to SpecIP. In general, extension requires that the
periphery be affected by an operation. There are in fact some problems
with the literal notion of extension, which I will take up later.
Two immediate empirical consequences of the GBOIM are worth
noting here.
First, ‘‘raising to object position’’ as a movement rule is impossible,
since it is a movement from a higher (subject) position in the embedded
clause to a lower (object) position in the matrix clause. If the arguments
(in, e.g., Postal 1974 or Lasnik 1999) for raising to object in ECM con-
structions are correct, then the analysis involving (improper) movement
must now be replaced by an analysis in which mismapping the TS ⇝ CS
representation accounts for the facts. Only ‘‘real’’ (intralevel) movement
is governed by extension.
The more difficult problem is tough movement. I think the widely
accepted misanalysis of tough movement as involving movement to ma-
trix subject position has obstructed progress in syntax at several points in
the past 40 years, and so deserves close attention. According to the stan-
dard analysis, tough movement actually seems to involve a pair of move-
ments: first, wh movement to SpecC, and second, a (BOIM-violating)
movement from SpecC of the lower clause to SpecI of the higher.
(32) Johni is tough ti to please ti.
Of course, the difficulty can be solved by simply generating John in the
top position in the first place, eliminating the second movement. But that
implies that John receives a theta role from tough, and what has always
stood in the way of that conclusion is the synonymy of (32) with (33).
(33) It is tough to please John.
Call (32) the object form, and call (33) the event form (because (32) has
the ‘‘object’’ John as its subject, and (33) has the event to please John as
its subject (extraposed)).
The main argument for tough movement, then, is the synonymy of the
event form and the object form of these sentences. But this synonymy
could be misleading. One component of the synonymy is the perception
that selection restraints on John in the two sentences not only are similar,
but seem to emanate wholly from the lower predicate ( please), and not at
all from the higher predicate (tough). But that perception may be illusory.
It may be that a class of predicates (easy, tough, etc.) takes such a broad
class of arguments, including both events and objects in general, that it
is hard to detect selection restraints; in e¤ect, anything can be easy, for
example. In some cases there is an obvious sense in which a thing can be
easy.
(34) The test/contest/chore/task/errand/puzzle was easy.
At least for such cases, it must be admitted that easy takes a single
nominal argument as subject. For other cases it is less obvious what it
means to apply the term easy.
Embedding 75
(35) The book/store/bank/rock/tower/dog was easy.
For such cases, though, either the context will determine in what way the
thing is easy, or the way it is easy can be specified in an adjunct clause.
(36) The book was easy [to read/write/clean/hide].
But if this view is correct, we are taking the object form to have the
following properties: easy takes the object as its (thematic) subject, and
the clause after easy is an adjunct. We then must conclude that the tough
sentences are at least ambiguous, between this and the usual BOIM-
violating derivation; but now perhaps we can eliminate the latter deriva-
tion, as redundant.
In fact, there is good reason to. First, there are structures just like (36)
whose object and event forms are not synonymous, or even equivalent in
terms of grammaticality.
(37) a. Mary is pretty to look at.
b. *It is pretty to look at Mary.
So we know we need structures of the type suggested by (36) anyway. The
ungrammaticality of (37b) follows simply from the fact that pretty cannot
take an event as an argument, but easy can.
Second, there are structures synonymous with (35) that cannot be
derived by movement. Consider (38a–f), where (38a) parallels the sen-
tences in (35).
(38) a. John is good.
b. John is good to talk to.
c. It is good to talk to John.
d. John is good for conversation.
e. John is a good person to talk to t.
f. *It is a good person to talk to John.
Good acts like a tough predicate in (38a–c), showing the synonymy of
object and event forms. However, (38d), though roughly synonymous
with (38c), could not conceivably be derived from it. The same is true of
(38e), as (38f) shows.
So we need to generate the object form directly, with the object getting
its primary theta role from the tough predicate, and getting its relation to
the embedded predicate only indirectly, as the embedded predicate is an
adjunct to the tough predicate.
The adjunct status of the embedded clause is further shown by its option-
ality (see (39a)); in true cases where a matrix subject gets its theta role from
an embedded predicate, the embedded predicate is not optional (see (39b)).
(39) a. John is easy.
b. *John seems.
But so far I have not explained one of the salient facts about the
construction that supports the movement relation I am trying to ban:
namely, that the matrix subject (e.g., of (36)) is interpreted as the object
of the embedded verb. Since in my analysis the matrix subject gets its
theta role from the matrix predicate, and the embedded clause is an ad-
junct clause, it does not immediately follow that the subject will be inter-
preted as identical to the embedded object. Clearly, some mechanism
must interpret the matrix subject as ‘‘controlling’’ the embedded object
position, or more precisely, the operator chain in the adjunct clause that
includes the object gap. I have nothing to contribute to that topic here;
for my purposes it is enough to observe that several diverse constructions
require such a mechanism as a part of their description; the pretty to look
at construction in (37) is one such case, and (40) illustrates two more.
(40) a. John bought it [to look at t]. (purpose clause)
b. John is too big [to lift t]. (too/enough complement)
In each of these the embedded operator chain is linked to a matrix
argument—object in (40a) and subject in (40b). As there is no chance that
movement could establish that link for these cases, I will stick with my
conclusion about the tough cases: the matrix subject gets a simple theta
role from the tough predicate; the embedded clause is an adjunct with an
operator chain, which is interpretively linked to the matrix subject.
If this analysis of the tough construction is correct, then a major ob-
stacle to the (G)BOIM is eliminated, and this I think is in fact the most
compelling reason to accept that analysis.
The LEC rules out more than the (G)BOIM. It also rules out, for
example, any relation between two subject positions if CP structure
intervenes. M. Prinzhorn (personal communication) points out that it
automatically rules out superraising.
(41) a. *John seems [that t saw Bill].
b. *John seems [that Bill saw t].
c. *John seems [that it was seen t].
Not all of (41a–c) count as pure superraising cases in all theories, but in
fact they are all ruled out by the LEC: once any CP structure is present in
the embedded clause, it is present by hypothesis in the matrix clause, and
so, by extension, it is too late to execute any subject-to-subject relations.
H.-M. Gärtner (personal communication) provides more cases that are
relevant for the GBOIM, and hence for the LEC—namely, the following
intriguing examples from German:
(42) a. Weni glaubst du [t′i dass Maria ti sieht]?
        who believe you that Maria.nom sees
        'Who do you believe that Maria sees?'
b. Weni glaubst du [t′i sieht Maria ti]?
c. Ich frage mich [weni du glaubst [t′i dass Maria ti sieht]].
   I wonder who you believe that Maria.nom sees
   'I wonder who you believe that Maria sees.'
d. *Ich frage mich [weni du glaubst [t′i sieht Maria ti]].
(H.-M. Gärtner, personal communication)
Schematically:
(43) a. [wh V [twh]Vfinal]V2
     b. [wh V [twh]V2]V2
     c. . . . [wh V [twh]Vfinal]Vfinal
     d. *. . . [wh V [twh]V2]Vfinal
The clear generalization is that it is possible to extract into a V2 (verb-
second) clause from either a V2 or a Vfinal (verb-final) clause, but it is
possible to extract into a Vfinal clause only from a Vfinal clause. This is a
very odd fact. Clearly, V2 clauses are not themselves islands, as (43b)
shows; islandhood is determined not just by where the extracted element
is coming from, but also by where it is going.
This is the sort of fact that barriers were designed for (Chomsky 1986).
But I will instead develop a ‘‘timing’’ explanation in terms of the LEC. It
will be a little like the account of nonbridge verb embedding: specifically,
it will be based on the supposition that V2 clauses are ‘‘bigger’’ (and
therefore ‘‘later’’) than Vfinal clauses. The supposition takes some plau-
sibility from the fact that V2 clauses are most often matrix clauses. We
might imagine that matrix clauses have more functional structure than
embedded clauses—functional structure associated with ‘‘speech act’’
aspects of an utterance (this is the ‘‘performative’’ syntax that harks back
to Ross 1970).
(44) [[[ . . . ] . . . ]FVfinal . . . ]F′

F′ here is the extra functional structure that triggers V2; FVfinal structure
is strictly smaller.
Furthermore, and in fact as a consequence of being ‘‘bigger,’’ V2
clauses will be later than Vfinal clauses in RT. For concreteness, I will
assume that V2 structures are defined in FS, whereas Vfinal structures are
defined in SS, where SS→FS.
In this setup wh movement will have to take place at two different
levels, since the cases we are looking at have embedded wh and matrix
wh. Matrix wh is in FS, and embedded wh is in SS. We might imagine
that FS wh is fed by embedded wh; that is, in terms of the structure in
(44), wh moves to SpecFVfinal in SS, and from there to SpecF′.
(45) [wh [t [ . . . t . . . ] . . . ]FVfinal . . . ]F′
The second movement might not be a movement, but part of the SS→FS
representation. However, I will ignore that possibility here as it plays no
role in the explanation of Gärtner's paradigm.
As is well known, some German verbs embed V2 complements, which
in present terms means that they embed FS clauses at the level FS. If
these V2 complements are indirect questions, they will involve FS wh
movement to SpecF′, as well as V2, which itself is presumably triggered
by F′. So such embedded questions are completely parallel to matrix
questions in their syntax and relation to the levels. The diagrams in (43)
can now be annotated with the clausal structure postulated in (44), to give
the following structures:
(46) a. [wh V [t [ . . . t . . . ]]FVfinal ]F′
     b. [wh V [t [ . . . t . . . ]]F′ ]F′
     c. [wh V [t [ . . . t . . . ]]FVfinal ]FVfinal
     d. *[wh [V [t [ . . . t . . . ]]F′ ]]FVfinal

Only the final movement of the wh in each case is of interest here. Given
that F′ > FVfinal in the functional hierarchy, only in (46d) is that final
movement a GBOIM-violating ''downgrading,'' from F′ to FVfinal; all
the other final movements are either upgradings (46a) or movements that
maintain the functional level of the trace (46b,c). Hence, Gärtner's para-
digm follows from the GBOIM.
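The timing account just given amounts to a monotonicity condition on movement chains: rank the functional levels, and require that no step land at a lower rank than the position it moves from. The following sketch is purely illustrative and not part of the theory as stated in the text; the function and ranking names are invented, with F′ assumed to outrank FVfinal. It checks the final movement step in each of (46a–d):

```python
# Minimal sketch: the GBOIM as a "no downgrading" condition on
# movement chains. Levels are ranked; F' (the V2-triggering layer)
# outranks FVfinal. Names here are illustrative, not from the text.

LEVEL_RANK = {"FVfinal": 0, "F'": 1}

def obeys_gboim(chain):
    """chain: functional levels of successive landing sites,
    launch site first. True iff no step lowers the rank."""
    ranks = [LEVEL_RANK[level] for level in chain]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

# Final movement step of the wh phrase in each of (46a-d):
print(obeys_gboim(["FVfinal", "F'"]))       # True  -- (46a) upgrading
print(obeys_gboim(["F'", "F'"]))            # True  -- (46b) level-preserving
print(obeys_gboim(["FVfinal", "FVfinal"]))  # True  -- (46c) level-preserving
print(obeys_gboim(["F'", "FVfinal"]))       # False -- (46d) downgrading
```

Only (46d) fails the check, matching the starred status of extraction from a V2 clause into a Vfinal clause.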
I will conclude this section by pointing out a case that is a counter-
example to the LEC so long as it relies on the completely literal notion of
extension: the French L-tous construction, illustrated here:
(47) a. Marie a toutesi voulu [les manger ti].
        Marie has all wanted them to-eat
‘Marie wanted to eat them all.’
b. Il a tousi fallu [qu'ils parlent].
   it has all needed that-they speak
‘It was necessary that they all speak.’
c. Il a tousi fallu [que Louis les lise ti].
   it has all needed that Louis them read
‘It was necessary that Louis read them all.’
In each of these the tous in the matrix modifies the embedded direct ob-
ject, suggesting it has been moved from there. The problem, as noted by
J.-Y. Pollock (personal communication), is that the tous seems to violate
extension under the LEC. Tous is located to the right of the matrix sub-
ject, but seems to have been moved out of an embedded clause that is
‘‘bigger’’ (in terms of functional structure) than the phrase to which it has
attached. This is especially apparent in cases like (47c): tous has moved
out of an embedded that clause, but still has moved to a position short of
the subject in the matrix. The LEC with extension would not allow this: if
the embedded clause is a CP, then so is the matrix, and extension would
dictate no movement except to the edge of that CP. I can imagine two
sorts of answer. First, although tous movement can span clauses, the
clauses must be infinitival, or, as in (47b,c), subjunctive. Infinitival clauses
are smaller (and therefore earlier) than full CPs; perhaps subjunctive
clauses are also smaller and earlier, despite the presence of que. The other
sort of answer requires a reformulation of extension. I have thus far taken
extension quite literally to crucially involve the periphery of the domain.
I might instead reformulate it in a more abstract way, as ‘‘Movement
within a level can only be to positions that are uniquely made available at
that level,’’ without requiring that those positions be peripheral in that
level. I have no concrete suggestion to make, but the issue will recur in
later chapters, as there are other examples of this sort to consider.
3.2.2.2 Subjects In this chapter I have shown that a generalized ban on
improper movement follows from the architecture of RT, and in chapters
4 and 5, I will show how a generalized notion of the A/Ā distinction
and reconstruction emerges as well. Similarly, I will suggest in this section
that there is a generalized notion of subject in RT, with each level defin-
ing its own particular kind of subject: theta subject in TS, perhaps identi-
fied as agent; Case subject in CS, perhaps identified with nominative
Case; surface subject in SS, perhaps identified with ‘‘pure’’ EPP subjects in
languages with nonnominative subjects like Russian (Lavine 2000) and
Icelandic. Even FS may involve some notion of subject.
In what sense, though, is there a generalized notion of subject? Isn’t
it simply the case that agents are introduced at TS, nominative Case is
introduced at CS, and so on, and that there is no intrinsic link among
these elements, as the term subject tends to imply? In fact, the represen-
tation relation ties these different notions of subject together: the agent
is ‘‘canonically’’ mapped into the nominative NP in CS, which is ‘‘ca-
nonically’’ mapped into the ‘‘pure’’ EPP subject position in SS, and so
on. I put quotation marks around ‘‘canonically,’’ because that concept is
exactly what this book tries to explicate in terms of the notion of shape
conservation. So RT offers a natural account of the notion that subjects
are agents, nominative, and topicalized: this results from the purely ca-
nonical mapping across all the relevant levels, but it also permits devia-
tion from canonicity, of the type shown in chapter 2.
In what follows I will try to sort out some of the wealth of what is now
known about subjects into properties of different levels. I cannot pretend
to offer anything more than suggestions at this point. I do think that RT
gives voice to the old intuition that there are several different notions of
subject that get wrapped up into one; at the same time it seems to offer
the possibility to derive the properties of the different notions from what
is already known about the structure of each level and how it is repre-
sented in the next.
3.2.2.2.1 Quirky Subjects For languages like Icelandic at least, it is
obvious that there is a notion of subject more ‘‘superficial’’ than Case as-
signment. I will tentatively identify the level at which this more superficial
notion of subject applies as SS, though in the next section I will revise this
guess to a level intermediate between SS and CS.
As detailed, for example, in Andrews 1982 and Yip, Maling, and
Jackendoff 1987, Icelandic has a class of verbs that take subjects that are
not nominative, but are instead ‘‘quirkily’’ Case marked with dative, ac-
cusative, or genitive.
(48) Drengina vantar mat.
     the-boys.acc lacks food.acc
(Andrews 1982, 462)
In the appropriate circumstances nominative Case can show up on the
direct object when the subject receives a quirky Case.
(49) Mér sýndist álfur.
     me.dat thought-saw elf.nom
     'I thought I saw an elf.'
(Andrews 1982, 462)
Andrews presents clear evidence that the dative and accusative NPs
in these two examples are subjects in the obvious senses. First, quirkily
Case-marked NPs can undergo raising, and the quirky Case is preserved
under that operation.
(50) Hana virðist vanta peninga.
     her.acc seems to-lack money.acc
(Andrews 1982, 464)
The verb vanta assigns quirky accusative Case to its subject, and (50)
shows that raising preserves the Case. It is only in the case of quirky Case
assignment that a raised subject can be Case marked anything but nomi-
native. Second, quirky subject Case marking shows up in Icelandic ECM
constructions.
(51) Hann telur barninu (í barnaskap sínum) hafa batnað veikin.
     he believes the-child.dat (in his foolishness) to-have recovered-from
     the-disease.nom
(Andrews 1982, 464)
Third, quirkily Case-marked subjects are ‘‘controllable’’ subjects.
(52) Égi vonast til að PROi vanta ekki efni í ritgerðina.
     I hope to to lack not material for the-thesis
(Andrews 1982, 465)
As mentioned before, vanta assigns accusative Case to its subject, and as
(52) shows, that accusative NP is silent, but understood as coreferential
with the nominative matrix NP.
Andrews emphasizes that other preverbal NPs, such as topicalized
NPs, cannot participate as the pivot NP in an ECM, control, or raising
construction. So the quirkily Case-marked subjects really are subjects in a
substantive sense.
Clearly, the subject in these sentences is at some point within the Case-
assigning reach of the verb. I will assume that these Cases are assigned in
CS, in the following sorts of structures:
(53) a. CS: [NPnom [V NPacc]]
b. CS: [NPdat [V NPnom]]
Suppose that SS generates structures like the following:
(54) SS: [NPA [V NPB]]
We could regard structures like (54) as Case free, or Case indifferent,
leading to slightly different theories. I will arbitrarily pursue the idea that
such structures are Case indifferent. Surface structures are Case indifferent
in that A and B in (54) can bear any Case insofar as the well-formedness
conditions of SS are concerned; what Cases they turn out to bear in a
particular sentence will be determined by what Case structures they are
matched up with. The natural shape-conserving isomorphism will identify
NPnom with NPA, and NPacc with NPB. It is natural to identify NPA in SS
as a ‘‘subject’’ and to inquire about its properties. The notion of subject in
CS is obvious: the most externally assigned Case in CS. I will not go into
how structures like (53) are generated, but see Harley 1995 for sugges-
tions compatible with proposals made here (see especially the Mechanical
Case Rule).
Quirky Case marking splits the subject properties in two, a split that
corresponds to the two levels CS and SS in RT: specifically, quirky sub-
jects are Case marked (CS), nonagreeing (CS), raisable (SS), and con-
trollable (SS).
The controllable subject will be the SS subject (to be revised shortly,
when a further level is interposed between CS and SS) regardless of the
Case of the NP in CS that is matched to the SS subject.
Quirky subjects, on the other hand, do not act like nominative subjects
in regard to agreement—quirky subjects do not agree.
(55) Verkjanna er talið ekki gæta.
     the-pains.gen is believed not to-be-noticeable
(Andrews 1982, 468)
Agreement is presumably then a property determined in CS. This
arrangement—Case-marked subject and agreement in CS, controllable
subject in SS, with representational mapping connecting the two—gives
the two notions of subject needed to interact with other phenomena in
grammar. CS looks inward, and SS outward.
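The division of labor just described can be caricatured in a few lines of code. This is a purely illustrative sketch: the level assignments are the ones argued for above, but the function name and property encoding are invented.

```python
# Illustrative sketch only: "subject" as a family of level-indexed
# notions. CS supplies Case and agreement; the later level (SS, or PS
# once that level is introduced) supplies raisability and
# controllability; the representational mapping links the two.

def subject_properties(case):
    """Properties of a clause's subject NP, given the Case that CS
    assigns to it ('nom', 'dat', 'acc', 'gen')."""
    props = {"raisable", "controllable"}   # later-level properties
    props.add("case-marked:" + case)       # assigned at CS
    if case == "nom":
        props.add("agreement")             # only nominative subjects agree
    return props

# An Icelandic quirky dative subject: raisable and controllable,
# but nonagreeing -- exactly the split described above.
quirky = subject_properties("dat")
print("agreement" in quirky)      # False
print("controllable" in quirky)   # True
```

The point of the toy encoding is only that the Icelandic facts force the two property sets apart, while in a language like English they happen to coincide on a single NP.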
3.2.2.2.2 EPP Subjects, Raising, and Control In chapter 2 we saw that
Russian also has a notion ‘‘subject’’ that is ‘‘beyond Case.’’ In certain
circumstances a clause-initial position must be filled, a requirement that
can be evaded only to achieve a special focus effect. Furthermore, the
trigger for this movement is not Case, as the NP (or other phrase moved
to clause-initial position) already has its own Case, which it brings with it.
In a ramified Pollock-style model, such examples can be understood as
instances of ‘‘pure’’ EPP, a movement motivated apart from any Case
requirement. They are also beyond any requirement of agreement. They
are therefore beyond CS, like the Icelandic examples. But in fact, they
differ from the Icelandic examples in an important way: the pure EPP
position in Russian is also not a controllable position. The Russian verb
tosnit’ ‘feel nauseous’, like the verb zalozilo ‘clogged’ discussed in chapter
2, takes no subject argument; but it does, again like zalozilo, take an in-
ternal accusative object that must be fronted in ‘‘neutral’’ circumstances. I
have chosen this verb because it has an animate argument and so could
potentially participate in control structures. But in fact that NP argument
cannot be controlled.
(56) a. Džona tošnilo.
        John.acc felt-nauseous.neut
     b. Menja prodolžalo tošnit'.
        me.acc continued to-feel-nauseous
        (Babby 1998a)
     c. *Ja xoču tošnit'.
        I want to-feel-nauseous
     d. Ja xoču, čtoby menja tošnilo.
        I want so-that me.acc feel-nauseous
(E. Chernishenko, personal communication)
(56a) illustrates the use of the verb in a tensed clause. (56b) shows that
the verb is compatible with aspectual predicates. (56c) illustrates the un-
grammatical situation where the accusative NP is controlled as the sub-
ject of an embedded infinitive. (56d) shows how a Russian speaker would
say what (56c) intends to say—using a subjunctive clause with an overt
accusative argument, clearly not a control structure.
In the view put forward here, (56a), (56b), and the embedded clause of
(56d) all have subjectless TSs, which are mapped, at least in the case of
(56a) and (56b), to subjectful surface structures, but too late for control;
at the relevant level for determining control, they still have no subject.
Assuming that control is established at CS (this will be amended shortly),
(56a) and (56c) are derived as follows:
(57) a. TS: [tošnilo Džona] → CS: [tošnilo Džona]
        → SS: [Džona tošnilo]
     b. TS: ja xoču [tošnit' PRO] → CS: ja xoču [tošnit' PRO]
        → SS: ja xoču [PRO tošnit']
The infinitive in (57b) does not have a PRO subject until SS, too late for
control in CS. I have implemented control in terms of PRO, but that is
not essential to the point. What is essential is that at the relevant level,
and in the relevant sense, tošnit' does not have a subject.
So Russian diverges from Icelandic on this point (cf. Icelandic (52)). In
order to assess this difference between Russian and Icelandic, we must fix
the level at which control is established. This question can be approached
in both RT and the standard ramified Pollock/Cinque-style clause struc-
ture. In a theory with such a clause structure, we would conclude that
there was a further level of functional structure that could be used to sort
out the different notions of subject, as shown in (58).
(58)
This array of conclusions can be modeled in RT by the following sub-
sequence of the representational chain:
(59) Case-Agr Structure→Control Structure→Russian-EPP Structure
Each representation would have a subject position, which would be
mapped or mismapped from a previous level. Control Structure would
have mapped into its subject position the highest Case position in CS; the
objects defined in Control Structure would be the ones selected by raising
and control predicates; and Russian-EPP Structure would have a notion
of subject more abstract than (in other words, not limited to) Control
Structure.
The equivalence of (58) and (59) should be familiar by now, which is
of course not to say that the theories in which they arise are equivalent. In
both theories certain results must obtain to achieve empirical adequacy:
in English all three notions must collapse into one; in Icelandic control
subjects must be distinct from Case-Agr subjects; and in Russian all three
notions must be distinct. The two models will achieve these results in dif-
ferent ways.
The question for RT is how to graft the subchain in (59) into the model
presented in chapter 2. This question could be definitively answered by
identifying the ends of (59) with elements of the chapter 2 sequence. A
plausible candidate of course is that Case-Agr Structure is CS and that
Russian-EPP Structure is SS; but then Control Structure will intervene
between CS and SS as a new level.
In fact, there is good reason to posit a level between CS and SS. The
reasoning is simple: there is a notion of subject that is more abstract or
more general than ‘‘most externally assigned Case’’ but narrower than
‘‘topicalized subject.’’ Control and raising seem to require some interme-
diate notion of subject. In chapter 4 we will see that anaphoric control
requires a further notion of subject as well. The question then emerges, do
all these phenomena converge on a single notion of intermediate subject?
One consideration is the bounding of anaphors. Earlier I identified the
English anaphor as a CS anaphor. One reason for positing a level earlier
than SS is that CP structure is defined in SS, and elements in SpecC do
not seem to be able to antecede English reflexives, as shown earlier (this
is simply the well-known generalization that English reflexives must be
A-bound). Himself is thus bound by some earlier notion of subject; the
question is, is it the CS subject? For English it is difficult to say, but for
the Icelandic reflexive sig the answer is no.
We also know from Icelandic that the control subject is not the agree-
ment subject. For one thing, Icelandic allows control of NPs that would
not be assigned nominative Case. Moreover, when nominative Case is
assigned to the object and the verb agrees with it, control nevertheless
targets a different NP, the ''subject'' in some higher (or later) sense. This
later subject then is not the agreement subject, which we might take to be
the CS ‘‘subject.’’ But neither is it an SS subject, in that it is restricted to
A antecedents. This Icelandic anaphor, as well as the English himself, is
thus likely to be an element introduced in a level intermediate between CS
and SS, a level I will now identify with the label Predicate Structure (PS).
We have thus identified (58) as the subsequence CS→PS→SS, so that
the model now looks like this:
(60) TS→CS→PS→SS→FS→PSb
                      ↘ QS
Assigning himself to PS in English is slightly arbitrary, since it could
as easily be assigned to CS; only Icelandic shows evidence of the slightly
more abstract notion of subject. But this assignment does allow the im-
mediate annexation of the findings reported in Williams 1980, where the
licensing of anaphor binding in English was identified with the notion
‘‘predicate,’’ rather than ‘‘subject’’; in the present context we could return
to the notion ‘‘subject,’’ but only if we mean precisely the PS subject.
Another phenomenon that might be accounted for in terms of the
properties of PS is VP deletion. PS will define a notion of one-place
predicate, corresponding to some version of the (traditional) English VP,
which is abstracted away from whatever subject it is applied to; this
abstracted VP is what is needed to account for so-called sloppy identity.
(61) John likes himself and Sam does too.
What does Sam do? In the sloppy reading he does not ‘‘like John’’;
rather, he ‘‘self-likes,’’ just as John does. There is some controversy
whether this is the right view. I will return to the matter in chapter 9,
where I fill in some idea of what the semantic values assigned to the
objects in each level are.
Control and raising themselves must be assigned to some representa-
tion earlier than SS, if SS is where CP structure is introduced. Essentially,
this follows from the logic of the GBOIM in RT, even though it is not
usually considered a case of improper movement. Control and raising are
NP Structure rules, in the terminology of Van Riemsdijk and Williams
(1981), which entails that they are always relations between pairs of A
positions. But by the LEC, they must then be defined in a level that has
only A positions; this excludes SS, if SS is the level in which CP structures
are introduced. In other words, the following will always be ungrammat-
ical structures:
(62) a. *John seems [ . . . to have won]CP.
b. *John tried [ . . . to have won]CP.
These violate the GBOIM in RT, though not in the familiar application
of the term improper movement, as noted earlier. Since we already know
from Icelandic that control is defined in a more abstract level than CS, we
are left with the conclusion that control is bounded by CS on one side and
SS on the other—and so we are left with the conclusion that control is
defined at PS as well.
The conclusion about (62a) was established independently in Williams
1994b, where it is argued that CP structure inhibits the transmission of
the theta role to the matrix subject. (62b) is a case of obligatory control,
in the sense of Williams 1980, where it is demonstrated that there are no
cases of obligatory control over CP structure; that is, control of x by John
in examples like the following is always an instance of optional or ‘‘arbi-
trary’’ control:
(63) a. John wonders [who [x to talk to]]CP.
b. [Who [x to talk to]]CP was not known to John.
See Williams 1980 for further discussion, and also see Wurmbrand 1998
for a comprehensive account of the di¤erence between obligatory and
optional control exercised across a variety of European languages that
delivers exactly this conclusion. But see also Landau 1999, where it is
argued that the obligatory/optional distinction is specious.
It follows as well that PRO in CP cannot be controllable; that is, deri-
vations like (64) are impossible.
(64) *Johni wants [PROi [ — to talk to ti]]CP.
This again follows if control is defined at PS. But alongside (64) we do
find (65a,b).
(65) a. John bought it [OPi [ — to read ti]]CP.
b. A shelf arrived [OPi [ — to put books on ti]]CP.
(65b) appears to involve a control relation between the direct object and
the SpecCP of the clause [OP [ — to put books on t]]. Why is that relation
allowed, if control is consigned to PS? The crucial di¤erence between
(65a) and (65b) must be that the clause in (65b) is an adjunct clause. The
rules determining the form and meaning of adjunct clauses are patently
not confined to PS, as in general wh movement can be involved in the
formation of adjuncts (e.g., relative clauses). The question remains, are
there any principled grounds for separating ‘‘real’’ control from control
of wh-moved operators in adjunct structures? I will postpone this question
until it is appropriate to discuss in general when adjuncts are embedded.
For the time being we may satisfy ourselves with the idea that ‘‘argu-
mental’’ control is established at PS.
Part of the benefit of the LEC can be achieved in a theory with stan-
dard clausal architecture by allowing the embedding of structures smaller
than CP—that is, ''small clauses.'' Locality effects and limitations on the
target of rules can be achieved in this way: embedding structures smaller
than CP will give a weaker clause boundary (thus allowing local rules
to apply in such a way as to bridge the clause boundary), and omitting
CP will at the same time provide a narrower class of targets (the Ā target
SpecC will be excluded, for example). This was the strategy adopted in
Williams 1974, where I argued that certain clause types lack CP structure,
having only IP or smaller structure (hence, ‘‘small clauses’’) (though this
terminology did not exist at the time—CP90s = S′70s; IP90s = S70s). For
example, there are no gerunds with a wh complementizer system, so ger-
unds cannot be used to form indirect questions.
(66) *I wondered [whose book Bill’s having seen t].
What the LEC in RT adds to the small clause theory is that ‘‘smaller’’
corresponds to ‘‘earlier,’’ and this draws in the further property of rules
connected with reconstructivity—that is, the details about what move-
ment rules reconstruct for what relations. It also draws in the notion of
target type (A vs. Ā), if each RT level defines different types of NPs.
Small clause theories have no means of connecting locality with these
notions of target type and reconstructivity in a theoretically organic way.
I will discuss the full set of locality-reconstructivity-target correlations
(LRT correlations) in chapter 4. But for the moment I restrict attention
to the correlation between target and locality.
Wurmbrand (1998) has pursued the small clause methodology for
German restructuring verbs; she argues that they lack CP and IP struc-
ture, having only something like VP structure, and proposes that their
clause-union-like properties result from the smaller structure. This sort of
analysis is quite similar to the proposal I am making, in that smaller
clause types result in more clause union effects, and it thus explains
locality-target correlations—penetrable complements are ones that lack
Ā targets.
Cinque (2001) has taken a different but related tack. He has argued
that restructuring verbs actually are themselves functional elements. Sup-
pose that clausal functional structure = F1 > F2 > . . . > Fn. Normally, a
main verb takes a complement by instantiating Fn, and taking an FiP as
its complement. But Cinque suggests that a restructuring verb is an Fi,
and that it takes the rest of the functional chain, Fi+1 > . . . > Fn, as its
complement, just as an abstract Fi would.
At first glance this would appear to give the same results as the small
clause approach: the restructuring verbs will take smaller complements
than normal verbs, in that a restructuring verb identified as an Fi will
take as its complement only the tail of the clausal functional chain starting at Fi+1 and so will in effect take a small clause as its complement. Clause union effects will derive from the fact that the restructuring verb
and its complement compose a single clausal functional chain.
On the last point, though, Cinque's proposal is quite different from the
small clause embedding proposal, the RT proposal (with the LEC), and
Wurmbrand’s proposal. In these accounts a small clause complement is a
separate (if degenerate) subchain from the chain that terminates in the
restructuring verb, not a continuation of that subchain.
The difference is radical enough that it should be easy to devise decisive
tests, though I will not try to do so here. On one count, though, the evi-
dence is very simple, at least in principle.
Cinque argues for his proposal in part by pointing out that adverbs
that cannot be repeated in a single clause also cannot be repeated in a
restructuring verb structure. This of course does not follow at all from a
theory in which there is an actual operation of clause reduction. It
does follow from Cinque’s proposal if we accept Cinque’s (1998) central
idea about the distribution of adverbs: namely, that adverb types are in
a one-to-one relation to clausal functional structure, and that the non-
repeatability of adverbs follows from the absence of subcycles in the
clausal functional structure. Naturally, this nonrepeatability will carry
over to restructuring structures, if the verb and its complement instantiate
a single clausal functional structure.
The prediction is somewhat different in a small clause theory of the restructuring predicates. The difference between the two theories is schematized in (67) (RV = restructuring verb; MV = main verb).

(67) a. Cinque-style theory
        F1 > F2 > F3 > F4 > F5 > F6 > F7
                       RV             MV
     b. Small clause theory
        F1 > F2 > F3 > F4 > F5 > F6 > F7 > F5 > F6 > F7
                                      RV             MV
In the Cinque-style structure there is one clausal architecture, F1 . . . F7;
in the small clause structure the restructuring verb itself is an F7 and takes
the small clause F5 > F6 > F7 as its complement.
The theories coincide in predicting that adverbs associated with ‘‘high’’ positions at F1 . . . F4 cannot be repeated, if we make Cinque's assumption about the relation of adverb type to functional structure, simply because
these functional projections occur only once in each structure. But with
respect to ‘‘low’’ adverbs, ones associated with F5 . . . F7, the theories
diverge. The Cinque-style structure predicts that they will not be repeat-
able. The small clause theory predicts that they will be repeatable—once
modifying the restructuring verb, and once modifying the main verb.
The small clause analysis seems to be borne out in the following
example:
(68) John quickly made Bill quickly leave.
The manner adverb quickly can be seen to modify both the restructuring
verb and the main (embedded) verb, and thus the structure (67b) appears
to be the correct one. This at least establishes that the small clause anal-
ysis is correct for make in English; I have not obtained the facts about
Romance restructuring verbs to determine whether they behave as make
does in (68).
In this section I have taken up some new empirical domains (control,
raising, predication, VP deletion, and their interaction with Case) and
posited a further level in RT to treat the complex of phenomena that arise
when they interact. I cannot blame the reader who at this point is dis-
tressed by the proliferation of levels in RT. But I do think that some
perspective is required in evaluating the practice involved. Much of the
proliferation of levels corresponds, point by point, with proliferation in
a ramified Pollock/Cinque-style theory (RP/CT), in that there is at the
limit (the worst case) a one-to-one correspondence between levels in
RT and functional elements in RP/CT. As I remarked earlier, the worst
case deflates my theory, because in this case the parallelism induced by
Shape Conservation is trivialized. But for the moment I would focus on
the fact that RP/CT lives with the following more or less permanent
mystery: there is a fixed universal set of functional elements in some fixed
order that defines clause structure, each with its own properties and its
own dimensions of linguistic variation. Now, this mystery corresponds
exactly to the ramified levels of RT—to the extent that often a revision
in the understanding of the role of a functional element in RP/CT will
translate straightforwardly into a revision in the understanding of a
level in RT. The fact that functional elements are called lexical items in
RP/CT and levels in RT should not be allowed to obscure this corre-
spondence. I think the correspondence puts into perspective the method-
ology that RT naturally gives rise to: solve problems by figuring out what
levels are involved in the phenomena and fix the details of those levels
accordingly—in the worst case, standard practice.
3.2.2.2.3 Subject Case and Agreement
In this last subsection I will speculate on how an insight of Yip, Maling, and Jackendoff (1987) could
be expressed in RT. There is a difference between English, on the one
hand, and both ergative languages and languages like Icelandic, on the
other hand, which has eluded the model so far. In English, the subject, if
Case marked, is always Case marked in a way that is independent of the
verb it is subject of, and in particular, independent of what Cases are
assigned in the VP. But in the other languages mentioned, subject Case
marking is dependent on the Case structure of the VP in ways noted
earlier. Yip, Maling, and Jackendoff suggest that the subject falls within
the Case domain of the verb in Icelandic-type languages, whereas in
English the subject is in a separate domain; in Icelandic, in their view,
there is only one Case domain, whereas in English the clause is divided
into two Case domains. A further corollary of this view is that nonsub-
ject nominatives will be found only in Icelandic-type languages.
In the present theoretical context we might adapt Yip, Maling, and
Jackendoff's conclusions by treating English nominative as a Case
assigned at PS instead of CS. If it is the only Case assigned in PS, and if
it is always assigned to the subject defined at that level, then there will be
no opportunity for it to mix with the rest of the Case system, which is
assigned at CS.
Under this arrangement we no longer have a ‘‘single-level’’ Case
theory. But perhaps it is arbitrary to expect that in the first place.
This arrangement makes an interesting prediction about expletives. The
simplest account of expletives is to treat them as ‘‘formal’’ Case holders;
that is, they occupy a Case position in CS that does not correspond to a
theta position in TS. But in fact, we might consider confining expletives
to PS; in that case we would expect (subject) expletives only in languages
like English, which has an ‘‘absolute’’ nominative subject requirement.
I do not know if the facts will bear out this conclusion. But German is
clearly a language of the Icelandic type with regard to Case assignment.
(69) Mir   ist geholfen.
     I.dat is  helped
     'I was helped.'
Since the dative subject in (69) is a controllable nonnominative subject,
the remarks about Icelandic apply here. Moreover, German does not
seem to have a subject expletive.
(70) a. Es wurde getanzt.
        it was   danced
        'There was dancing.'
     b. Gestern   wurde (*es) getanzt.
        yesterday was    it   danced
     c. Ich glaube  dass (*es) getanzt wurde.
        I   believe that   it  danced  was
The expletive es appears only in matrix clauses, presumably because it is
not a subject expletive, but a fill-in for Topic position; therefore, because
of the well-known matrix/subordinate difference in German clausal syn-
tax—topicalization and V-to-C movement apply only in the matrix—it
will play a role only in matrix clauses. So, even the notion ‘‘expletive’’
needs to be generalized across the RT levels.
The lesson from this section will become familiar: a previously unitary
concept is generalized across the levels of RT. In this case it is the notion
‘‘subject,’’ difficult to define, but now decomposed into components:
agreement subject, control subject, thematic subject, Case subject, pure
EPP subject, and so on. But the decomposition brings more than its
parts, because these notions are ordered with respect to one another by
the asymmetric representation relation. The ordering allows us to say,
for example, that Icelandic quirkily Case-marked subjects are ‘‘earlier’’
than Russian pure EPP subjects and therefore liable to control.
Chapter 4
Anaphora
The overall typology of anaphoric elements can be reinterpreted in terms
of the different levels of RT. Associating different anaphors with different
levels interacts with the LEC to fix properties of anaphoric items in a way
that I think is unique. In a sense it is a generalization of the method used
to explain the BOIM in chapter 3. The same method will be applied more
broadly still in chapters 5 and 6.
The Level Blocking Principle introduced in chapter 3 will play an im-
portant role in the discussion as well. According to this principle, if one
and the same operation can take place in two di¤erent levels, the appli-
cation in the early level blocks the application in the later level. If ana-
phors are introduced at every level, the applicability of such a principle
will be obvious.
4.1 The Variable Locality of Anaphors
It emerged in the 1980s, beginning with Koster 1985, that there is a hier-
archy of anaphoric elements, from ones that must find their antecedents
at very close range, to those whose antecedents can be very far away. It
will be natural to associate these with the levels of RT in such a way that
the more long-distance types are assigned to later structures, with the
hope that the ranges of the different types can be made to follow from
the ‘‘sizes’’ of the objects defined at each level. In this sense RT levels
index the set of anaphors in the same way that they index embedding
types as shown in chapter 3. Here and in chapter 5 we will see that RT,
with the LEC, draws together three different properties of syntactic
relations: their locality, their reconstructivity, and their target (where
target refers to choice of A or Ā antecedent, generalized in a way to be
suggested in chapter 5). I will refer to the correlations among these three
different aspects of syntactic relation as the LRT correlations (locality-reconstructivity-target). Although different aspects of this three-way correlation have been identified in previous work, it seems to me that the
whole of it has not been drawn together theoretically, nor has the scope
of the generalization involved been well delineated. I believe it is a dis-
tinctive feature of RT that it forces a very strong generalized version of
the correlation.
For example, RT makes explicit the following correlation about how
locality and type of possible antecedent covary. Traditionally, it has been
assumed that an anaphor must have an A position antecedent. For ex-
ample, it has been held that a wh-movement-derived antecedent is not
available for English reflexives (except of course under reconstruction).
Thus:
(1) a. *John wondered [which man]i pictures of himselfi convinced
Mary that she should investigate t.
b. John wondered which mani Bill thought [ti would like himself ].
In (1b) the reflexive is bound to which man, but via its A position trace,
which c-commands and is local to it. In (1a), however, this is impos-
sible; the trace of which man does not c-command the anaphor, and (most
importantly) which man is in an Ā position and so is ineligible itself as
antecedent.
But in RT, the notion of A position is relativized. Each representation relation gives rise to a unique A/Ā distinction: positions at level Ri are A positions with respect to positions at level Ri+1. As a result, we might expect anaphors at different levels to behave differently; specifically, we
might expect anaphors at later levels to have an apparently ‘‘expanded’’
notion of potential antecedent (target). Furthermore, as we move ‘‘right-
ward’’ in the RT model, this expanding class of antecedents should be
correlated with loosening locality restrictions, simply because the struc-
tures get ‘‘bigger.’’ Thus, the discussion in this chapter helps substantiate
the LRT correlations made possible by the LEC in chapter 3, and first
put to analytic use there to generalize and explain the BOIM.
The correlations are purely empirical, and not necessary (apart from
the theory that predicts them, of course). Consider, for example, Japa-
nese zibun and Korean caki. As is well known, these anaphors are not
bounded by subjects as English himself is, or in fact by any sort of clause
type; nor are they bounded by Subjacency.
(2) Johni-i  Billj-ekey Maryk-ka cakii/j/k-lul cohahanta-ko malhayssta.
    John-nom Bill-dat   Mary-nom self-acc      like-compl   told
    'Johni told Billj that Maryk likes selfi/j/k.'
    (Korean; Gill 2001, 1)
As a consequence, in RT caki must be an anaphor that is introduced in a
late level, perhaps SS or FS, the levels at which tensed-clause embedding
takes place. As an SS (or FS) anaphor, it will take as its antecedents the
elements that are developed at SS (or FS), among them the Topic and
the Focus of the utterance. So, caki should be able to be bound by a class
of antecedents not available for the English reflexive, namely, Ā antecedents; and this prediction seems to be borne out.
(3) Johni-un ttal-i       cakii-pota ki-ka      te   kuta.
    John-top daughter-nom self-than  height-nom more is-tall
    'As for Johni, (his) daughter is taller than selfi.'
    (Korean; Gill 2001, 1)
In this structure caki is bound from the derived Ā Topic position. This is
possible because caki is licensed at SS, where such elements as Topic are
introduced. Similar facts hold for zibun in Japanese and ziji in Chinese.
Such licensing is impossible in English, as English reflexives are
licensed in CS (or at least, before SS), and Topics don’t exist in that level.
(4) *(As for) [JohnT]i . . . the book for himselfi to read was given to t by
Bill.
In RT this property of the English reflexive is not a free parameter, but is
determined by another difference between zibun and himself. Namely,
subject opacity holds for himself, but not zibun, because in the RT model
each of these properties is determined by what level the reflexive is introduced in, so only certain combinations of properties are possible.
In addition to zibun, Japanese has another reflexive, zibunzisin, which
is essentially like English reflexive himself, both in locality and in type of
antecedent (A/Ā).
(5) Johni-wa [[Billj-ga Maryk-ni zibunzisin*i/j/*k-o subete sasageta] to]  omotta.
    John-top   Bill-nom Mary-dat himself-acc         all    devote    that thought
    'John thought that Bill devoted all (of) himself to Mary.'
Latin also shows a correlation between distance and type of anteced-
ent. According to facts and analysis provided in Benedicto 1991, the
Latin se anaphor has both a greater scope and a greater class of possible
antecedents than standard anaphors. First, reflexive binding of se (dative
sibi here) can penetrate finite clause boundaries.
(6) Ciceroi    effecerat    [ut   Quintus Curius     consilia    Catalina     sibii    proderet].
    Cicero.nom had-achieved comp  Quintus Curius.nom designs.acc Catalina.gen refl.dat reveal.subj
    'Cicero had induced Quintus Curius to reveal Cataline's designs to him.'
    (Sall., Cat., 26.3; from Benedicto 1991, (1))
In fact, it can even penetrate into finite relative clauses.
(7) Epaminondasi    [ei     [qui      sibii    ex lege    praetor     successerat]]  exercitum non tradidit.
    Epaminondas.nom him.dat that.nom  refl.dat by law.abl praetor.nom succeeded.ind  army.acc  not transferred
    'Epaminondas did not transfer the army to the one who succeeded him as a praetor according to the law.'
This is especially noteworthy since it casts doubt on treating long-distance
reflexivization in terms of movement, as several accounts have proposed.
Given this, we would expect the reflexive to occur in late levels. From
that it would follow that it could target A antecedents. Citing the follow-
ing examples, Benedicto argues that this is exactly the case:
(8) Canumi   tam  fida   custodia         quid significat aliud nisi   [sei     ad  hominem commoditates esse generatos]?
    dogs.gen such trusty watchfulness.nom what mean       else  except refl.acc for men.gen comfort.acc  be   created.inf
    'The trusty watchfulness of the dogs, . . . what else does it mean, except that they were created for human comfort?'
    (Cic., Nat. deor., 2.158; from Benedicto 1991, (24))
(9) A  Caesarei   ualde liberaliter inuitor    [sibii   ut   sim     legatus].
    by Caesar.abl very  generously  am-invited refl.dat comp be.subj legate.nom
    'Caesar most liberally invites me to take a place on his personal staff.'
    (Cic., Att., 2.18.3; from Benedicto 1991, (25))
(10) A  Curione   mihi   nuntiatum est [eum    ad me     uenire].
     by Curio.abl me.dat announced was  he.acc to me.acc come.inf
     'It was announced to me by Curio that he was coming to me.'
     (Benedicto 1991, (33))
Benedicto makes the point that normally passive by phrases cannot con-
trol reflexives in general. The fact that the by phrase in (10) is the ante-
cedent of the reflexive suggests that it can be so solely by virtue of its role
as a topicalized NP, which of course is consistent as well with its surface
position.
In RT terms this means that the reflexives in these examples are directly
bound by the Topic position, not bound to the trace of the Topic position.
(11) [Topic]i . . . reflexivei   (the reflexive bound directly by the Topic position, not by its trace)
Anticipating the discussion of Reinhart and Reuland’s (1993) theory of
anaphora in section 4.3, I will note that it seems unlikely that Benedicto’s
conclusions can be rewritten in terms of logophoricity (unless logophoric
is redefined to correspond to topic-anteceded ).
Important to evaluating RT in this connection is that in the absence
of a theory there is no logical connection between locality and type of
antecedent, in either direction. Thus, the locality of himself does not pre-
dict the ungrammaticality of (1a), as no subject interrupts the anaphor-
antecedent relation. In the other direction, there is nothing about the
lack of locality of zibun that directly predicts that it could be bound by
Ā antecedents. One can easily imagine a language in which, for example,
(2) is grammatical but (3) is not—all one would need is the ability to
independently specify the locality and the antecedent class for a given
anaphor.
RT does not allow this, as both properties of an anaphor derive from
the particular level the anaphor is assigned to. Assigning an anaphor to a
level simultaneously determines its locality (its relation to its antecedent
will be restricted to spanning the objects that are manufactured at that
level) and its antecedent class (it will take as antecedents the elements that
appear in structures at that level). And it does so in a completely gener-
alized ‘‘graded’’ or indexed way: the larger the locality domain, the wider
the class of antecedents. In this regard RT is more generous than other
theories with only the A/Ā distinction. But that generosity is apparently
needed, and it is compensated by the locality-type correlation. In section
4.3 I will suggest that the flaws in Reinhart and Reuland’s (1993) theory
stem mainly from its having only a binary distinction for types of ana-
phors (their ‘‘reflexive predicate/logophoric pronoun’’ distinction) instead
of the indexed notion suggested here.
In advance of looking at any data, we can in fact sketch what we would
expect to be the properties of anaphors at di¤erent levels in RT. These are
all consequences of the LRT correlations, which in turn follow from the
architecture of the model. If there is an anaphor associated with TS, for
example, it will relate coarguments of a single predicate, and nothing else,
because the structures of TS are verbs combined with arguments. If there
are complex theta structures for clause union effects, we would expect the
antecedent-anaphor relation for these anaphors to be able to span these
complex units. In English the prefix self- has exactly this property: it
can relate coarguments of a single predicate, but nothing further away,
whether or not a subject intervenes. Its extreme locality can best be
appreciated by comparing it with the English syntactic reflexives him/her/
it/oneself, which permit the following pattern of antecedents:
(12) a. Stories about the destruction of oneself can be amusing.
b. ‘x’s stories about y’s destruction of x’
c. ‘x’s stories about y’s destruction of y’
(12b) and (12c) are both possible interpretations of (12a); but with the
anaphoric prefix self-, instead, only the reading corresponding to (12c) is
available.
(13) a. Self-destruction stories can be amusing.
b. *‘x’s stories about y’s destruction of x’
c. ‘x’s stories about y’s destruction of y’
(13b) represents the case where the antecedent is not a coargument of the
reflexive; such cases are impossible for self-, but possible for oneself. A
first guess about what is wrong with (13b) would be that destruction had a
covert opacity-inducing subject; but that account would fail to explain
why (12b) is not parallel to (13b).
In the context of RT, if we assign self- to the earliest level, TS, the
observed behavior is expected. Anaphors like himself and oneself will be
assigned to (possibly) higher levels. The assignment of self- to the lowest
level is probably not accidental; being an affix, it has access only to TS,
since the levels higher than TS in RT play no role in morphology, as I
proposed in the account of the Mirror Principle in chapter 1. This conclusion holds only for what is traditionally called derivational morphology. Inflectional morphology clearly must have access to all levels. In
chapter 8 I reconstruct the traditional distinction in RT terms.
There are nonaffixal syntactic reflexives that also seem to be confined
to TS. For example, Baker (1985) reports that the reflexive in Chi-mwi:ni
is a free reflexive like English himself, but it is confined to direct object
position and can take only the immediate subject as argument. And in
fact one of the Dutch reflexives discussed in the next section is probably
another case of this kind.
4.2 Dutch zich and zichzelf (Koster 1985)
The Dutch reflexives zich and zichzelf, discussed in detail by Koster
(1985), can be distinguished by assigning them to di¤erent RT levels.
(14a–d) are Koster’s examples showing the di¤erence in locality between
the two.
(14) a. *Max haat  zich.
         Max hates self
         'Max hates himself.'
     b. Max hoorde mij over  zich praten.
        Max heard  me  about self talk
        'Max heard me talk about him.'
     c. Max haat  zichzelf.
        Max hates selfself
        'Max hates himself.'
     d. *Max hoorde mij over  zichzelf praten.
         Max heard  me  about selfself talk
         'Max heard me talk about him.'
(14a) shows that zich cannot take a clausemate antecedent, and (14c)
shows that zichzelf can. We may achieve an adequate description of these
facts by assigning the two reflexives to di¤erent levels: zichzelf to TS, and
zich to CS. These assignments are warranted if zich approximates English
himself and zichzelf approximates English self-, given the discussion in
section 4.1.
These assignments explain (14d), insofar as zichzelf, being a TS ana-
phor, is restricted to coargument antecedents; but they do not, strictly
speaking, explain (14a)—that is, why zich is ungrammatical with a co-
argument antecedent. An obvious first guess is that zich is subject to some
kind of Condition B and is too close to Max to satisfy that condition.
However, I think it would be more interesting to explore the idea that
zich and zichzelf are in a blocking relation with one another: where one
is used, the other cannot be. (See Williams 1997 for a general discussion
of the role of blocking in anaphora.) As in other blocking relations, the
direction of blocking is determined by the licensing conditions that hold
for the two items in the blocking relation; when the licensing conditions
associated with one of the items are strictly narrower than the licensing
conditions associated with the other, the former will block the latter when
those narrower conditions obtain. In the case of zich and zichzelf it is
obvious that zichzelf will block zich, because the conditions for licensing
TS anaphors are narrower than the conditions for licensing CS anaphors:
TS anaphors are limited to coargument antecedents, while CS anaphors
are not so limited, but could include them. The existence of (14c), then, is
the reason that (14a) is ungrammatical. If this is correct, then Condition
B will not be relevant, a conclusion that I will demonstrate again shortly,
on di¤erent grounds.
In general, if a given ‘‘process’’ can occur at more than one level, then
an application in an earlier level will block an application in a later level.
I would hope that the reasoning will always reduce to the asymmetry of
the representation relation, as suggested by the last part of the previous
paragraph in connection with reflexives at di¤erent levels, but I have
not thought through the problem sufficiently to be sure that the logic
appealed to there will be available in all cases. This kind of blocking of an
early level by a late level is frequent enough to deserve a name, so I will
call it level blocking. It will be relevant again in chapters 5 and 6 in con-
nection with scrambling. The problem for applying the principle in gen-
eral is identifying instances of the ‘‘same process,’’ a murky concept; it is
in fact what makes blocking murky in general, but no more so here.
Murky, but inescapable, apparently.
Not only is blocking a more interesting theoretical possibility than a
Condition B solution to the ungrammaticality of (14a); there are also
empirical obstacles to implementing the latter solution. With some inher-
ently reflexive verbs, zich is permitted in a clausemate context.
(15) a. Max wast   zich.
        Max washes self
        'Max washes himself.'
     b. Max schaamt zich.
        Max shames  self
        'Max is ashamed.'
This strongly suggests that zich is not subject to anything like Condition
B; if it were, there would be no account of the difference between (14a)
and (15a), since they have identical syntactic structure. But the blocking
account of the antilocality of zich at least hints where to look for the
answer. Under the blocking account (15a) would be grammatical only if
for some reason zichzelf is not permitted with these verbs; and in fact it is
not.
(16) *Max schaamt zichzelf.
Why is this so? Perhaps these verbs are only ‘‘formally’’ reflexive; that
is, perhaps they are, thematically speaking, intransitive verbs. In that case
there would be no possibility of introducing the reflexive in TS, as TS
consists purely of theta roles, and so nothing corresponding to the posi-
tion of the reflexive. The reflexive is therefore a kind of expletive element
in such cases. Having no theta role, it cannot be introduced until CS,
when the nonthematic but Case-marked direct object is introduced. But
since zichzelf is eligible only for coargument anaphora, it cannot be
used, enabling zich to appear. This explanation is satisfying because it
relates the thematic structure of the verb to the already established
blocking relation between the two reflexives in the only way that they
could be related.
In fact, this conclusion carries over to English, which also has ‘‘formal’’
or ‘‘expletive’’ reflexives. Consider (17).
(17) John behaved himself.
As noted earlier, the English form self-, being a prefix, must be resolved
in TS; therefore, it cannot participate in formal reflexive structures.
(18) *self-behavior
Admittedly, the import of (18) is undercut somewhat by the fact that even
the English CS full reflexive form is blocked in such contexts.
(19) *behavior/*shame/*perjury of oneself
It seems that formal reflexives are systematically blocked in nominaliza-
tions. But this again is reasonable in RT, after all, since nominalizations
do not have Case, and formal anaphors are pure ‘‘Case holders.’’ What-
ever the preposition of governs in nominalizations must be thematic.
At first glance wast in (15a) appears to exemplify a third pattern, different from those of both haat and schaamt; but it is actually simply ambiguous between the two. Alongside its intransitive use (as in Max wast)
it also has a transitive use; furthermore, the intransitive use has a formal
reflexive, just like schaamt, so that wast merely appears to take both
reflexives, a situation inconsistent with the use of level blocking. English
wash shows the same ambiguity as Dutch wast, except that the intransi-
tive in English perhaps does not take a formal reflexive.
(20) a. Max wast zich/zichzelf.
b. Max washed.
c. Max washed himself.
4.3 Reconstruing Reinhart and Reuland’s (1993) Findings
If the proposals made thus far are correct, we can elaborate on a distinc-
tion used by Reinhart and Reuland (1993) (R&R) to account for the
behavior of different kinds of anaphora. In the end we will reject their
theory of anaphora, because it is incompatible with the one developed
here, and because of its own unresolved flaws. Our model will more
closely resemble Koster’s (1985), which drew something like the same
distinction that R&R’s model draws, but without its limitations.
R&R identify circumstances in which the locality of binding is sus-
pended, as in (21a).
(21) a. Johni thinks that Mary likes Sue and himselfi.
b. *Johni thinks that Mary likes himselfi.
The difference between (21a) and (21b) is that in (21b) the reflexive is in
an argument position, whereas in (21a) it is in only part of an argument
position. R&R conclude that there are two types of anaphor: lexical
(SELF, in R&R’s terms) and logophoric (SE). Lexical anaphora holds of
coarguments of a single predicate and hence occurs ‘‘in the lexicon,’’
whereas logophoric anaphora is a discourse-level business, the same
business that resolves pronoun antecedence.
I think the fundamental problem with R&R’s account is that there is
nothing in the account intermediate between lexical and logophoric ana-
phora. In Dutch, for example, zichzelf does seem to hold roughly of
coarguments, as we saw, and hence could be construed as lexical. How-
ever, not only is zich not a discourse-level anaphor, it in fact has rather
tight locality restrictions, something like English himself—a property
R&R’s account will entirely miss.
The binary distinction made in R&R’s account leads to two other
problems as well.
The first problem is posed by ECM constructions, which show opacity
effects even though the reflexive is not a coargument of the antecedent.
(22) a. John believes [himself to have won].
b. John thinks that Mary believes herself to have won.
R&R devise the notion ‘‘syntactic coargument’’ for this case: believes
assigns Case to himself and a theta role to John, and so they are co-
arguments in some extended sense. In fact, I should not quarrel too
much with this conclusion, as it corresponds so closely to my own, in that
one could call an antecedent in RT’s CS a ‘‘syntactic’’ argument. But
even so, I think R&R's account suffers here mainly from having only a
binary distinction. Once their account is revised so that ‘‘syntactic co-
argument’’ replaces ‘‘thematic coargument’’ as the determinant of reflex-
ive antecedence, it becomes impossible to distinguish himself from self- in
English, where indeed coargument in the narrowest sense (theta-theoretic
coargument) seems to be the governing relation. Consider (23), for
example.
(23) *John self-believes to have left.
Here self- cannot correspond to the ‘‘syntactic’’ object of believes. Of
course, one might stipulate some property of believe that would rule (23)
out, but that would fail to express the very likely and interesting conclu-
sion that such cases are impossible.
In the RT analysis of ECM presented earlier, ECM arises from mis-
mapping TS to CS.
(24) a. TS: [John believes x] + [himself to have left] =
     b. CS: John believes himself [to have left]
If anaphora applies in CS (or perhaps in CS and TS), then in CS it will
relate the two Case positions John and himself. The locality condition will
apply in CS and so will be bounded by the subject of believe, as that is the
highest NP available in the Case domain of that verb.
The other problem R&R’s account never satisfactorily resolves has to
do with reciprocals. Reciprocals in direct object position show familiar
locality e¤ects; but other reciprocals, while appearing in a broader
range of contexts, do not escape the utterance altogether in finding their
antecedents.
(25) John and Mary think pictures of each other are in the post office.
In (26a) each other is clearly not a coargument of John and Mary; but
from this we cannot conclude that each other is logophoric, as (26b)
shows.
(26) a. [John and Mary]i called on each other at the same time.
b. *[Each other]i’s houses consequently had a forlorn and deserted
look.
What is needed again is something intermediate between coargument
and discourse level; CS or SS application of reciprocal interpretation will
give the right result.
In fact, examples like (25), despite not being constrained to coargu-
ments, nevertheless show strict locality effects.
(27) *John and Mary think that Bill wants pictures of each other to be
in the post office.
R&R’s account predicts that such cases should be grammatical: since the
antecedent cannot by any means be construed as a coargument of its
predicate, it must be taken to be a logophoric pronoun and should then
show no opacity effects. Stipulating that long-distance reflexives involve
movement does not help either, as we saw in the case of Latin that long-
distance reflexivization penetrated the most hardened islands.
A correct summary of the situation in English must include the fact
that there are at least two different uses of the reflexive that cannot be
construed as coargument-antecedent-taking cases: one like (27) in which
the reflexive occurs in an argument position, in which case it shows sub-
ject opacity effects; and another involving coordinated reflexives as in
(21a) (Sue and himself ), which do not show such opacity effects. R&R's
theory cannot distinguish these two, as it makes only a binary distinction
and neither of these qualifies as the ‘‘reflexive predicate’’ case. I will out-
line the RT account of such cases in the next section.
4.4 Predicate Structure
To resolve a problem with binding theory in RT, it is necessary to posit a
level between CS and SS. If that level is identified as the level at which the
subject-predicate relation is established, further puzzles can be tentatively
resolved, and new differences between the RT treatment and the standard
treatment of binding theory emerge.
4.4.1 The Level of Binding Theory
The behavior of German binding theory will compel us to interpose a
level between CS and SS, by the following reasoning. Short-distance
scrambling in German takes place after Case assignment, because the
scrambled NPs retain their Cases. Short scrambling takes place before
binding theory applies, because binding theory relations are computed
strictly on the output of short scrambling. Furthermore, binding theory
(in English) applies strictly before wh movement. Assuming wh movement
takes place in SS, we have the following implications:
(28) Case              BT    wh
     CS < scrambling < XS < SS
In order to account for these relations, we must posit the level XS at
which binding theory applies, and this level cannot be identified with
either CS or SS.
In this section I will take up the idea from chapter 3 that XS is Predi-
cate Structure (PS), the level in which the subject-predicate relation is
instantiated. I suggested in chapter 3 that such a level is needed to un-
derstand the difference between Icelandic and Russian EPP effects. Here I
show that its existence resolves other puzzles as well, and I rationalize the
behavior of English-style anaphors by identifying them as PS anaphors.
This conclusion interacts with the LEC to make an unusual, but per-
haps correct, prediction about long-distance anaphors in English: specifi-
cally, if a tensed-clause boundary intervenes between an anaphor and the
nearest of its possible antecedents, then the anaphor can take more dis-
tant antecedents than if no tensed-clause boundary intervenes. The pre-
diction follows from the LEC because tensed Ss are introduced late (in
SS), and so any anaphor that still lacks an antecedent when the derivation
reaches SS is in fact an SS anaphor and therefore enjoys the broadest se-
lection of antecedents under the LRT correlations. The prediction is quite
contrary to intuition, and contrary to the standard treatment of anaphors
in general, wherein tensed-clause boundaries either are irrelevant to
choice of antecedent or prevent the choice of any higher antecedent, but
never enlarge the choice of possible antecedents.
The closest to a test case I have been able to construct for this predic-
tion is the following pair:
(29) a. Mary hopes that John will think that pictures of herself are in
the post office.
b. *Mary hopes that John believes pictures of herself to be in the
post office.
If the judgments are as indicated, then the prediction is borne out. In
(29a) a tensed-clause boundary is introduced before herself has found its
antecedent. As a result, the anaphor is an SS anaphor and therefore not
subject to the bounding effects of subjects at PS; hence, it is allowed to
skip over the subject John to target Mary as its antecedent (perhaps only
if Mary is a Focus—but this makes sense since SS (or AS, to be intro-
duced in chapter 9) defines Focuses, not subjects). In (29b), on the other
hand, the structure [John believes [ pictures of herself to be . . . ]] is a CS (or
maybe PS, it doesn’t matter) embedding, because believe takes an IP-sized
complement, and that structure contains a subject for herself to bind to;
as a result, the anaphor here is a PS anaphor and must take that subject
as its antecedent. Recall that earlier anaphors always block later ana-
phors, so that if an anaphor can be identified as an earlier anaphor, it
must be.
Note that Reinhart and Reuland’s (1993) proposals do not discriminate
between (29a) and (29b), because in both cases the anaphor is not a
coargument (even in their extended sense of coargument) of any possible
antecedent. Hence, these must both be logophoric anaphors and so
should behave similarly; in particular, they should not show the difference
that (29a) and (29b) illustrate.
From this section and preceding ones, it emerges that the English re-
flexive is a flexible type of anaphor: it takes its antecedent at the earliest
possible moment. If it has a possible antecedent in TS, it takes it; but
then, if it arrives in PS without an antecedent and a subject is available
there, it must take that; and finally, if it reaches SS with no possible
antecedent, it can even skip over subjects in its search. Reinhart and Reu-
land (1993) capture one part of this with their binary distinction, but if
the reasoning here is correct, there is really an n-way distinction, one for
each level in RT.
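The earliest-possible-antecedent logic lends itself to a mechanical statement. The following toy sketch is my own illustration, not part of RT's formal apparatus: the level names follow RT, but the data structure and function are invented for exposition.

```python
# Toy sketch of the "earliest level wins" logic for English reflexives.
# Levels are ordered as in the RT derivation; an anaphor is resolved at
# the first level that offers a possible antecedent, and that resolution
# blocks resolution at any later level. Names are hypothetical.

LEVELS = ["TS", "PS", "SS"]  # theta, predicate, surface structure

def resolve_anaphor(antecedents_by_level):
    """Return (level, antecedent) for the earliest level with a candidate."""
    for level in LEVELS:
        candidates = antecedents_by_level.get(level, [])
        if candidates:
            return level, candidates[0]
    return None, None  # no antecedent at any level: unresolvable

# A coargument antecedent available in TS preempts everything later:
print(resolve_anaphor({"TS": ["John"], "SS": ["Mary"]}))  # ('TS', 'John')
# Only if no TS or PS antecedent exists may the anaphor wait until SS,
# where even non-subjects (e.g. a Focus) become available:
print(resolve_anaphor({"SS": ["Mary"]}))                  # ('SS', 'Mary')
```

The blocking effect ("earlier anaphors always block later anaphors") falls out of the ordered search: a later antecedent is reachable only when every earlier level comes up empty, as in (29a), where the tensed-clause boundary removes the PS candidate.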
The standard view is that English has a subject-oriented anaphor as its
main anaphor, plus some special cases like (21), which are perhaps,
according to this view, not really anaphors at all, and for which a special
unrelated theory must be devised.
By contrast, RT has a single notion of anaphor whose properties vary
across the levels in a predictable way. According to this view, English
only appears to have a subject-oriented anaphor as its main anaphor, be-
cause it is only in very unusual circumstances that an anaphor can escape
antecedents until as late as SS. In other languages, like Japanese and
Dutch, which have multiple anaphors, there is less flexibility, because the
different anaphors are specialized to different levels of the RT model.
The RT model of anaphora is thus much tighter than Reinhart and
Reuland’s model, because it does not posit different types of anaphors
with potentially di¤erent properties. Rather, it posits a single anaphor,
with properties that vary predictably across levels.
4.4.2 Nominative Case and PS
In discussing Case so far, I have alluded only briefly to the special status
of nominative Case and its relation to subject position (chapter 2). The
distinction just made between CS and PS offers an opportunity to address
the different ways in which languages treat the Case of the subject.
In some languages the subject seems to be assigned Case by a calculus
involving the verb and the other verbal arguments as well; but in other
languages the subject is invariably assigned nominative Case. Languages
of the first type are the ergative and the quirky Case-marking languages,
and languages of the second type are the nominative-accusative languages
like English—what I will call labile nominative and fixed nominative
languages, respectively.
In RT it is natural to associate this di¤erence in behavior with two
different levels: labile nominative will be assigned at the same time as ac-
cusative and other VP-internal Cases, whereas fixed nominative will be
assigned later, in isolation from the assignment of the other Cases. In this
I follow the ‘‘Case in tiers’’ model (Yip, Maling, and Jackendoff 1987),
which distinguishes the two behaviors by identifying different domains for
the Case assignment algorithm: VP for English-style languages, and S for
quirky Case-marking languages.
This approach can be implemented in several different ways. I think it
is worthwhile to explore the simplest scheme, wherein nominative in fixed
nominative languages is assigned in PS, and assigned only to the subject
there, but nominative in labile nominative languages is assigned in CS,
and the nominative NP is associated with PS under the CS→PS map-
ping. This comes closest to modeling Yip, Maling, and Jackendoff’s
scheme. This leaves open the possibility that some languages might have
both rules, a possibility I will not explore here, but which might open the
way to a coherent description of ‘‘mixed’’ ergative languages.
The Case and predicate structures derived for ordinary monadic and
dyadic predicates for the two language types are illustrated in (30).
(30)
After the Case and predicate structures are ‘‘unified’’ in the simplest pos-
sible way (by CS→PS), both languages will have predicate structures
that look like this:
(31) [NPnom [V]]
[NPnom [V NPacc]]
Which is only to say that for the simplest cases the two language types
will look identical, which of course they do. Of interest, then, is how they
diverge for the less ordinary cases.
Consider first the Icelandic quirky Case-marking verbs, as discussed in
chapter 3. For some verbs, the subject is dative, and the object is nomi-
native. So the Case and predicate structures must be as shown in (32b,c).
(32) a. Barninu       batnaði        veikin.
        the-child.dat recovered-from the-disease.nom
        ‘The child recovered from the disease.’
        (Yip, Maling, and Jackendoff 1987, 223)
b. CS: [NPdat [V NPnom]]
c. PS: [NP [V XP]pred]
The obvious CS→PS isomorphism gives the desired result. The rea-
son this could never happen in English, or any other fixed nominative
language, is that PS requires that its subject be nominative, and so there is
no opportunity for a CS nonnominative to be mapped to that position.
In (32c) the subject NP in PS has no Case marking. This should be
interpreted to mean that no Case marking is assigned or licensed at that
level for the structures in question. But if that position is put into corre-
spondence with an NP in the earlier CS level that does have a Case, then
any NP that occupies that level must have a Case as well; otherwise, the
shape-conserving correspondence is compromised.
As mentioned earlier, the CS/PS distinction in RT duplicates the dis-
tinction between the VP and S domains of Case assignment found in
Yip, Maling, and Jackendoff’s model, and therefore assimilates their
results. Nevertheless, many questions remain. First, where do Case struc-
tures like (32b) come from? They could be generated by something like
Yip, Maling, and Jackendoff’s algorithm. But the algorithm itself will
involve mapping a series of Cases onto a syntactic structure, and so it
might be more interesting in the present context to try to model it as a
matching of two structures. However, I will leave that project for further
research.
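To indicate what such a modeling might look like, here is a deliberately crude caricature of a tier-style association, in the spirit of (though not faithful to) Yip, Maling, and Jackendoff’s algorithm; the function, its arguments, and the encoding of quirky Case are all my own illustrative inventions.

```python
# Caricature of a "Case in tiers" style assignment: a fixed tier of Cases
# is associated one-to-one, left to right, with the Case-needing NPs of a
# domain. NPs with a lexically fixed (quirky) Case keep it, so the tier's
# Cases pass on to the remaining NPs. Illustrative only, not the actual
# Yip, Maling, and Jackendoff algorithm.

def associate(tier, nps, lexical=None):
    """Pair tier Cases with NPs left to right; quirky NPs keep theirs."""
    lexical = lexical or {}
    out, it = {}, iter(tier)
    for np in nps:
        out[np] = lexical.get(np) or next(it)
    return out

# Plain dyadic predicate, S domain: subject gets nom, object gets acc.
print(associate(["nom", "acc"], ["subject", "object"]))
# Quirky verb like (32): a lexical dative on the subject lets nominative
# fall on the object instead.
print(associate(["nom", "acc"], ["subject", "object"], {"subject": "dat"}))
```

The single parameter of interest is the domain: handing the whole S to the association yields the labile pattern, while restricting it to VP and adding nominative separately at PS yields the fixed pattern.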
For all cases—both normal dyadic predicates and quirky Case-
marking predicates—there is a notion of subject in PS that is independent
of CS. This captures the truth that however Case is assigned to the sub-
ject, its status as ‘‘subject’’ for such purposes as (target of ) raising and
(antecedent or target of ) control will be independent of that assignment.
This accords with the general finding that quirky Case-marked subjects
are ‘‘real’’ subjects in these respects. RT can be seen as defining several
notions of subject, one at each level, with predictably varying properties,
so that if all sorts of subject in levels earlier than PS converge as the same
sort of subject in PS, then they will be treated alike in all respects for any
processes later than PS.
Ergative languages can be treated similarly. In ergative languages an
intransitive subject has the same Case as the transitive object. The exam-
ples in (33) are from West Greenlandic.
(33) a. Kaalip   Hansi    takuaa.
        Karl.erg Hans.abs sees
        ‘Karl sees Hans.’
     b. Kaali    pisuppoq.
        Karl.abs walks
        ‘Karl walks.’
     (Yip, Maling, and Jackendoff 1987, 220)
In such a language both ergative and absolutive are assigned in relation
to V in CS. No Case is assigned to the subject in PS. In the mapping to
PS, the ‘‘highest’’ NP is chosen for the NP subject position, because that
choice minimizes distortion.
(34) a. CS: [NPerg NPabs V]
         ↓
        PS: [NP [ . . . V]]
     b. CS: [NPabs V]
         ↓
        PS: [NP [ . . . V]]
I have not seriously studied how ergative languages would fare under
this modelization, and I make these last remarks only to indicate what
seems to me to be the most obvious direction for such a project.
4.5 Functional Structure and Antecedence
In the literature on the locality of anaphora, beginning with Pica 1991,
much has been made of the morphological nature of the anaphor in
determining locality. Pica’s generalization is that long-distance anaphors
are monomorphemic, and local anaphors are bimorphemic. A number
of attempts have been made to explain this. The main ideas of RT do not
seem to me to connect with Pica’s generalization in any particular way.
But Burzio (1996) has brought into focus a generalization about the
nature of the antecedent of the anaphor that under some straightforward
assumptions does connect with RT in an interesting way.
What I want to concentrate on here is Burzio’s finding that subjects
of tensed clauses are, as he puts it, ‘‘more prominent’’ antecedents than
subjects of infinitives. One part of this finding is already widely recog-
nized: namely, that subjects of tensed clauses are more likely ‘‘blockers’’
of long-range anaphora than subjects of infinitives. However, this fact
has usually been expressed by saying that tensed clauses are islands for
some anaphors, whereas infinitives are not, thus making the relation to
the subject position a secondary, incidental thing. But Burzio, drawing on
the work of Timberlake (1979) and others, observes that it also sometimes
matters whether the antecedent itself is the subject of a tensed clause or of
an infinitive. (35) illustrates this effect in Russian.
(35) a. I   oni ne  prosil nikogo iz nix  [provesti sebjai v  nuznoe mesto] . . .
        and he  not asked  any    of them  lead     self   to needed place
        ‘And he did not ask any of them to lead him to the necessary
        place . . .’
     b. ?(pro) I   oni stydilsja   [PROi poprosit’ kogo-libo iz nix
               and he  embarrassed       ask       any       of them
        [provesti sebjai v  nuznoe mesto]].
         lead     self   to needed place
        ‘And he was embarrassed to ask any of them to lead him to the
        necessary place.’
     (Timberlake 1979, as reported in Burzio 1996)
Schematically, these examples have the following form:
(36) a. . . . [TensedP antecedentnom [Infinitive V reflexive]]
b. ? . . . [Infinitive antecedentpro [Infinitive V reflexive]]
Burzio’s generalization is not a consequence of any current theory, as
Burzio himself indicates. In fact, to the extent that analysis of anaphoric
relations is reduced to locality, the finding is slightly paradoxical, in that
it says that if an anaphoric relation is going to span at least x amount of
structure, then it must span even slightly more, x + D, where D is the dif-
ference in functional structure that would separate a nominative Case-
marked subject from a PRO subject, if any.
In trying to come to grips with his finding in classical terms, Burzio
concludes that the long-distance antecedent is not the NP itself, but a
structure that includes the NP as well as part of clausal functional struc-
ture (the part responsible for nominative Case checking). Although I will
not need to distinguish the nominative antecedent in this way, I will make
a proposal that I think captures Burzio’s insight that nominatives are
‘‘more prominent.’’
In order to accommodate Burzio’s finding in RT, we might think of it
in the following way: the more long-distance the anaphoric relation is,
the higher the antecedent must be in the functional structure of its own
clause. Put this way, the resemblance of Burzio’s finding to the LRT cor-
relations is obvious—specifically, it is a locality-target correlation. This
suggests a timing explanation of the kind that RT implements with the
LEC. However, there are some obstacles, one perhaps fundamental, to
implementing Burzio’s finding in RT.
In previous explanations of the LRT correlations (e.g., in the deriva-
tion of the BOIM in chapter 3), I have used the concept of extension as
an auxiliary hypothesis; and I believe I have used it in a straightforward
way. In the present context, however, the use of extension is either a very
delicate matter, or impossible, and some rethinking is called for.
To see the problem, let’s assume that Russian is like English in that
T(ense) is introduced in SS and infinitives, control, and so on, are located
in PS. Let’s assume further that nominative Case is introduced in the
structure just as in English—that is, at the level at which T is introduced,
SS (see chapter 3 for other possibilities). Then (35a) is as expected, be-
cause at SS, and not before SS, the most peripheral element in the struc-
ture will be the matrix nominative NP, which in fact is the antecedent.
However, (35b) is more problematic. There are several different as-
sumptions about what the analysis is, but in fact none of them make the
problem go away. The matrix, although tensed, does not have an overt
nominative subject; instead, it might have one of the following:
(37) a. a covert nominative subject
b. a covert nonnominative subject (call it pro)
c. no syntactically represented subject at all
If (37a) is correct, then we have to ask, why is a covert nominative not
targeted when nominative is the target?, and we have no answer. If (37b)
is correct, then we have a serious problem with interpreting extension,
because we have to ask, why can’t pro be targeted as long as it is pe-
ripheral?, and we have no answer. If (37c) is correct, we might have an
answer, depending on how extension is interpreted. If we understand it to
mean that a relation must span every level of embedding, then perhaps we
can account for (35b): the long-distance reflexive cannot be assigned until
SS, but the surface structure of (35b) has no targetable antecedent, be-
cause there is no antecedent in the matrix.
Although (37c) might seem to be the best choice, I am not sure it leads
to the best theory or the best understanding of extension. Extension is
perhaps nothing more than a very simple approximation of the principle
that is needed here.
The intuition behind extension is that rules should always target ‘‘new’’
material. In classical minimalism, extension requires that the most
recently merged element be targeted; it does so because the most recently
merged element is the most peripheral. So far I have followed this think-
ing. But we might instead concentrate on ‘‘new’’ itself, and not try to im-
plement it in terms of peripherality. We would then say that what is new
in SS is T and nominative Case (at least), and that therefore a rule
assigned to that level must target nominative. That would explain the
difference between (35a) and (35b), derive Burzio’s finding, and in fact
account for what he means by calling the nominative ‘‘more prominent.’’
Importantly, while nominative Case is ‘‘new’’ in SS, the NP itself is
not; under shape-conserving mapping, nominative NPs correspond to
NPs in the previous levels (TS, CS, PS).
(38) PS: [NP    PredP]
          ↓      ↓
     SS: [NPnom VPT]
The reason that the long-distance reflexive cannot be assigned to these
earlier ‘‘shadows’’ of the nominative NP is that it is assigned to SS, the
one stipulation about Russian on which this account hangs.
The most general lesson from this chapter is that if a distinction must
be drawn, the RT levels might provide a nonarbitrary way to draw it.
Here I discussed different kinds of anaphors, and different kinds of Case
systems. Rather than saying that there are two (or more) kinds of ana-
phors with different and perhaps unrelated properties, and the same for
Case systems, RT allows the properties to differ in a completely system-
atic way, if the distinction to be drawn can be aligned with the difference
between RT levels. I will explore further instances of this method in the
next two chapters.
Chapter 5
A/Ā/Ā̄/Ā̄̄
In chapter 4 I developed a typology for anaphoric elements in RT by
assigning different anaphors to different RT levels, in a way that explains
their properties; in particular, this methodology explains the link between
the locality of an anaphor and the type of antecedent it requires (theta
antecedent/A antecedent/Ā antecedent). The later the level, the larger the
defined structures are and the more types of NPs are available, and thus
locality and antecedent type are linked with one another. This coordina-
tion is one dimension of what I have called the LRT correlations, corre-
lations that stem from the basic architecture of RT in a way that I think
distinguishes it from other models.
In this chapter I will apply the same methodology to scrambling and
movement rules. Every representation relation is capable of mismatches,
that is, nonisomorphic relations between structures, to which I will con-
tentiously apply the term scrambling. The later the representation rela-
tion, the broader the scrambling, and the wider the class of elements
targeted; all this is parallel to the account of anaphors in chapter 4.
But for scrambling and other rules, including in fact ‘‘real’’ movement
rules, there is a third dimension of variation: reconstruction. As with lo-
cality and target, the reconstruction relations a scrambling or movement
rule enters into are determined entirely by where in the model it occurs.
Simply put, a scrambling or movement relation reconstructs for any rela-
tion defined in previous levels, and for no relation defined at the same or
at later levels.
This is not a stipulation. Instead, it is the inevitable consequence of
how the notion of representation organizes the various levels: if a certain
relation (say ‘‘antecedent to anaphor’’) is established at level Xi, and level
Xi+1 represents level Xi, then when level Xi mismaps to level Xi+1, that
mismapping will appear to reconstruct, in that the configuration in level
Xi will be the one relevant to establishing the relation, not the configura-
tion in level Xi+1. In fact, by this reasoning, reconstruction is entirely
relativized to the levels: each representation relation will (appear to) re-
construct for any relations defined on any previous levels. This notion
of reconstruction is in fact indistinguishable theoretically from the rela-
tivized version of the A/Ā distinction discussed in chapter 4, and if my
reasoning is correct, the relativized notions should entirely replace the
binary notions.
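The relativization just described can be stated as a one-line condition. The sketch below is my own encoding, not part of RT’s formal apparatus: a displacement assigned to a level reconstructs exactly for relations established at strictly earlier levels, and for nothing at the same or later levels.

```python
# Toy formalization of level-relativized reconstruction. A displacement
# (scrambling or movement) associated with level M reconstructs for a
# relation R exactly when R is established at an earlier level. The level
# names follow the RT derivation; the numeric encoding is illustrative.

LEVEL_ORDER = {"TS": 0, "CS": 1, "PS": 2, "SS": 3}

def reconstructs(displacement_level, relation_level):
    """True iff the relation is fixed strictly before the displacement."""
    return LEVEL_ORDER[relation_level] < LEVEL_ORDER[displacement_level]

# wh movement (at SS) reconstructs for binding theory fixed at PS ...
print(reconstructs("SS", "PS"))  # True
# ... but a relation defined at the same level, or later, never
# reconstructs: short scrambling does not feed earlier-level relations.
print(reconstructs("PS", "PS"))  # False
print(reconstructs("PS", "SS"))  # False
```

On this encoding the traditional A/Ā dichotomy is just the two-level special case, and "a movement could reconstruct for one relation and be reconstructed for by another" is simply transitivity of the level order.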
The prediction about reconstruction is exactly the prediction made by
the model outlined in Van Riemsdijk and Williams 1981, but again gen-
eralized. In that model all NP movement (and some other similar rules)
first was applied to derive NP Structure, where binding theory was
defined, and then wh movement defined S Structure. This established a
natural relation between NP movement and wh movement, a relation in
which the latter reconstructs for the former, but not the reverse, and nei-
ther reconstructs with itself. RT generalizes this notion of reconstruc-
tivity, as part of the LRT correlation. I do believe that RT not only
generalizes reconstruction and the A/Ā distinction, but in fact rationalizes
them: they arise organically and inevitably as a result of the sequence of
representation relations that a grammar assigns to a sentence. In addi-
tion, RT links both rule type and reconstructivity with locality (via the
LEC), correlations the NP Structure model, lacking the LEC, could not
make.
In this chapter I will treat ‘‘real’’ movement and scrambling similarly,
even though they correspond to quite different mechanisms in the theory.
For the purposes of reconstruction, this difference will be irrelevant.
In chapter 6 I will return to the diagnostic I used earlier to distinguish
movement from (mis)representation: representation gives rise to (the ap-
pearance of ) intersecting dependencies, whereas movement gives rise to
nesting dependencies. In this chapter we will see another diagnostic for
the difference: movement can apply only once in a given domain, but
scrambling can apply more than once. This difference follows from the
basic di¤erence in the nature of the rules: scrambling is part of the
matching up of two structures, and where the matchup is not perfect that
will involve, or give the appearance of involving, multiple displacements.
The title of this chapter is a sort of joke, playing on the ambiguity of
bar. In the A/Ā distinction, the bar actually means ‘not’, so Ā means ‘not
argument’, as opposed to ‘argument’. So Ā̄, if it means anything, means
‘not not argument’, which is identical to argument. The bar notation thus
locks us into the binariness of the opposition. I suggest that the bar = not
interpretation simply be dropped, opening up the possibility of generaliz-
ing to a series, a series in fact indexed by the RT levels.
5.1 Long-Distance and Short-Distance Scrambling
Extremely local scrambling will be identified as CS misrepresentation;
that is, CS is mapped onto a misrepresenting SS, or some level later than
CS. This identification generates expectations about reconstruction—that
is, about the interaction of scrambling with binding theory (BT) and the
other properties that Van Riemsdijk and Williams (1981) associated with
A movement in the NP Structure model. The precise expectations depend
on where the theories that govern these phenomena intersect the levels of
the RT model. If, for example, BT applies in SS, meaning that surface
structures are the sole determinants of BT relations, and if local scram-
bling precedes SS, then local scrambling will interact with BT in the fol-
lowing way:
(1) Only the structures arising from local scrambling will determine the
applicability of the BT definitions.
This is largely correct. In German, for example, local scrambling
(sometimes called object shift) shows the following behavior:
(2) a. Ich habe die Gästei      einander        ti vorgestellt.
       I   have the guests.acc  one-another.dat    introduced
       ‘I introduced the guests to one another.’
    b. *Ich habe einander die Gästei vorgestellt.
In this example I have assumed that the base order in German is ‘‘dative
accusative V,’’ an assumption that is somewhat controversial. I have
used a trace to mark the scrambled-from position; but of course in RT
there will be no trace of scrambling, because it is not a real movement,
but a displacement that arises from the mismatch of two levels. Only the
scrambled order (2a) permits the accusative NP to bind the dative NP,
assuming the theta order [goal [theme V]]. I will therefore assume that
BT applies in SS, or shortly after CS, if other levels intervene. The NP
Structure model captured a part of this generalization, in separating NP
movement from wh movement.
Long-distance movement (including scrambling), on the other hand,
does not conform to (1); in fact, it defies it systematically, in that (a) the
target position does not license antecedents that could not have been
licensed from the start position, and (b) moved anaphors that need local
antecedents find them local to the start position, not the target position.
All this is well known (see Van Riemsdijk and Williams 1981; Webelhuth
1989; Mahajan 1989; Vanden Wyngaerd 1989; Deprez 1989; Santorini
1990; Williams 1994b).
I have assumed that wh movement (and its special instance called top-
icalization) is a movement internal to one of the levels in the model, SS
or later. Suppose for discussion that the relevant level is SS, and suppose
that for English, BT is determined at PS (i.e., between CS and SS), as
suggested in chapter 4. Then the interaction of BT and wh will show re-
construction effects, as diagrammed in (3).
(3) CS → PS → SS . . .
         BT   wh
Binding relations established in SS will be established independent of—in
fact, in complete ignorance of—the operation of wh movement in FS.
How does this work concretely?
Suppose we have the theta structure John likes himself, which is
mapped onto a CS object and thence onto the PS object John likes himself,
which in turn is transformed into himself John likes t in SS. The binding
relations established in CS are not perturbed by the later movement.
(4) PS: John likes himself (BT)→ Johni likes himselfi
        ↓
    SS: John likes himself wh→ himself John likes t
Notice that the derivation involves a mixture of derivation and represen-
tation relations. The antecedent is defined in PS, SS represents PS, and
movement occurs within SS.
Nothing fundamental is changed if we replace wh movement within a
level with a representation relation occurring after PS, or, more to the
point, a misrepresentation relation. For example, suppose for purposes of
illustration that there were a scrambled relation between PS and SS; then
we would find the derivation in (5), which shows the same reconstruction
features as (4).
(5) PS: John likes himself (BT)→ Johni likes himselfi
        ↓
    SS: himself John likes
So reconstruction is indifferent as to whether the reconstructing relation
is interlevel misrepresentation or intralevel movement. Therefore, even
without deciding whether ‘‘long-distance’’ scrambling is a wh movement–
like movement or a misrepresentation, we know how it will interact with
BT from the fact that it takes place later than BT.
In RT it is inevitable that reconstruction is a relative term. A movement
M reconstructs with respect to a relation R if R is established before M,
in the sense of the examples just discussed. In principle, a given move-
ment relation could reconstruct for one relation (even a movement rela-
tion) before it, and ‘‘be reconstructed for’’ by another movement after it,
all in the same sentence.
Given this, we might expect to find scrambling at every level. Long-
and short-distance scrambling could be understood as misrepresentations
of CS and PS, respectively, but are there other scramblings as well? RT
leads us to expect that there could be scramblings both earlier and later
than the ones identified as long and short scrambling.
Linguists have learned to think of long scrambling as involving
reconstruction, and short scrambling as not. In fact, though, even short
scrambling shows reconstruction effects of a certain kind: theta relations
could be viewed as being assigned ‘‘under reconstruction’’ of short
scrambling. Thinking of the theme in (6) as getting its thematic relation to
the verb under reconstruction of object shift is perfectly analogous to the
long-distance case.
(6) Ich habe Bill gestern   t gesehen.
    I   have Bill yesterday   seen
    ‘I saw Bill yesterday.’
But in fact RT even suggests that certain ‘‘Case-changing’’ or
grammatical-relation-changing operations (e.g., the antipassive construc-
tion) could be the result of ‘‘scrambling’’ at the earliest level, between TS
and CS. I note this possibility, but will not pursue it here.
Conversely, RT leads us to look for scramblings later than long
scrambling. This expectation too is fulfilled, as we will see in section 5.2.
The behavior of the extremely long and the extremely short cases of
scrambling tends to support the idea that scrambling occurs for every
representation relation in RT, and to call into question the binarity of the
A/Ā distinction.
Thus far I have accounted for the relation of different sorts of
scrambling to BT in a way that generalizes the A/Ā distinction, and that
generalizes to other sets of relations besides BT relations. But I have not
yet accounted for the locality of scrambling and its interaction with BT. It
has traditionally been thought that long scrambling is Ā scrambling and
that local scrambling is A scrambling. The basic still-unanswered
question is why the possibility of reconstruction should correlate positively
with the distance moved. Linguists are so familiar with this correlation
that they do not generally appreciate that it remains unexplained. I will
now outline how the correlation between distance moved and type of
scrambling is achieved, and in fact is inevitable, in RT.
Recall that in chapter 3 different types of embedding were associated
with, and took place at, different RT levels, according to the LEC. These
ranged from very small TS embeddings, showing the tightest clause union
effects, to embedding at FS, which showed the strong ‘‘insulating’’
properties of nonbridge verb embedding. The locality of movement relations
will be determined, in part, by what level the movement applies at, and in
particular, by what embedding has taken place at that level; an extraction
is of course impossible if the phrase to be extracted from is not yet present
in the same structure as the target of the movement.
This arrangement makes predictions about the relation of the locality
of scrambling to the type of reconstruction that takes place. If a particu-
lar kind of scrambling is defined on an early representation relation, it
will be ‘‘local,’’ in that it will not be able to bridge tensed embeddings,
and it will also not show reconstruction effects with respect to BT,
assuming BT applies later; but if a scrambling takes place later, it will be
nonlocal, and it will interact with BT reconstructively.
(7)  XS → PS → SS
          BT         tensed-S embedding
       XS scrambling    PS scrambling
It will be nonlocal in that it has access to a new set of embeddings to ex-
tract from; it will not interact with BT reconstructively because BT occurs
later. For example, on this view any scrambling that spans tensed Ss must
show BT reconstruction—to span tensed Ss, it must occur at or later than
SS, and by then BT relations are already fixed.
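The ordering logic in this paragraph lends itself to a small mechanical sketch: a movement reconstructs for exactly those relations fixed at a strictly earlier level than the one at which the movement applies. The level inventory and assignments below are illustrative stand-ins for exposition, not the book's settled model.

```python
# Illustrative sketch of RT's timing logic (level assignments hypothetical).
# Levels form a linear order; each relation or movement applies at one level.
LEVELS = ["TS", "CS", "PS", "SS", "FS"]

LEVEL_OF = {
    "theta": "TS",              # theta relations fixed earliest
    "short scrambling": "CS",
    "BT": "PS",                 # anaphor binding fixed here
    "wh movement": "SS",
    "long scrambling": "FS",
}

def reconstructs_for(movement, relation):
    """A movement reconstructs for a relation iff the relation is
    established at a strictly earlier level than the movement."""
    return LEVELS.index(LEVEL_OF[movement]) > LEVELS.index(LEVEL_OF[relation])
```

On these toy assignments, long scrambling reconstructs for BT while short scrambling does not, mirroring the long/short asymmetry described in the text.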
German presents an interesting constellation of scramblings. It has
what is called object shift, which is a purely local clause-bound scram-
bling. It has a subcase of object shift out of ‘‘restructuring’’ infinitives.
And some dialects allow scrambling out of infinitives in general, in a
construction studied by Santorini (1990). Santorini demonstrates two
properties of the construction: the scrambling is middlingly local, and it
acts like A movement for BT. As for locality, although it is not clause
bound, it can bridge at most an embedded infinitive construction.
(8) a. Ich habe vorgehabt, [die     Gäste_i einander  t_i vorzustellen]_IP.
       I   have planned     the.acc guests  recip.dat     to-introduce
       ‘I planned to introduce the guests to each other.’
    b. ?Ich habe die Gäste_i einander_j vorgehabt [t′_i t_j t_i vorzustellen]_IP.
Here the accusative and the reciprocal have both scrambled out of the
embedded clause, and they have switched places; the new order licenses
binding. Santorini shows that the binding is not licensed under recon-
struction, inasmuch as (9b) is impossible.
(9) a. Ich habe vorgehabt, [einander_i die Gäste_j t_i t_j vorzustellen]_IP.
    b. *Ich habe einander_i vorgehabt, [t_i die Gäste_j t_i t_j vorzustellen]_IP.
In (9b) the reciprocal has scrambled out of the infinitive, leaving the an-
tecedent behind. The result is ungrammatical, showing that reconstruc-
tion is not the right approach, as it should work here if it were working in
(8). So the movement in (8) is like A movement.
The problem with calling it A movement is precisely its nonlocality.
First, it is not like NP movement in that it preserves earlier Case assign-
ments; relevant here is the fact that only vorzustellen ‘to introduce’ in (8)
assigns dative Case, not vorgehabt ‘planned’. Second, it is unlike NP
movement in being able to move nonsubject arguments out of an em-
bedded clause.
It is instructive to see how RT determines where this scrambling must
take place. The particular pairing of properties just described can be
accommodated in RT only in a very particular way. The scrambling in
(8) is bounded on the left by CS and on the right by BT; the first because
the original Cases are preserved on the scrambled NPs, and the second
because the binding is licensed in the derived positions. Furthermore, it
is bounded on the left by whatever level infinitive embedding occurs at,
and it is bounded on the right by whatever level tensed-S embedding
occurs at. There are a number of arrangements that satisfy all of these
constraints, though of course they have a quite specific character, so the
proposal is not without content, even if the levels are not specifically
identified.
(10)  CS → XS . . .A. . . YS . . .B. . . ZS . . .C. . . WS
      Case           infinitive        BT        tensed-S
                     embedding                   embedding
The scrambling must occur somewhere between YS and ZS here, in the
region subscripted B.
In order for this to be a fully satisfying account of Santorini’s con-
struction, the levels XS . . . WS must be identified. I could arbitrarily
make assignments now (e.g., ZS = PS, WS = SS, YS = CS) on the basis
of the levels identified in chapter 4. Even without those details, however,
some of the interest of the theory can be enjoyed, because of the implica-
tional relations that must hold no matter how those assignments are
made, as the boundings illustrated in (10) must be preserved in any more
specific model. For example, if the relative ordering of the relevant levels
is that given in (10), there cannot be a type of scrambling that spans
tensed Ss yet interacts with BT nonreconstructively, since BT applies later
than Case assignment.
(11)
Nor could there be a type of scrambling that leads to new Case
assignments (and so is earlier than CS) yet interacts with BT
reconstructively. RT allows a number of different kinds of scrambling (A, B, C in
(10)), thus expanding the A/Ā repertoire in a way perhaps prefigured
by Webelhuth (1989). However, it does not allow for just any arbitrary
combination of properties, which Webelhuth’s theory unfortunately did,
because saying that scrambling is simply a mixture of some A and some
Ā properties leads to exactly this expectation.
Thus far I have considered reconstruction for anaphor binding. But
languages also exhibit reconstruction for scope, with the same difference
between short and long scrambling. Ueyama (1998) presents evidence
that these two types of scrambling in Japanese differ in their
reconstruction behavior just in the manner the RT model would predict.
First, it has been well known since Hoji 1985 that monoclausal scram-
bling in languages like Japanese gives rise to scope ambiguity. The inter-
pretation of the scopal order of S and O is unambiguous in SOV clauses,
but OSV order introduces the possibility for O to take wide scope. This is
characteristic of scrambling before scope fixing, a possibility only for very
early, and therefore very local, scrambling. See chapter 6 for further dis-
cussion of monoclausal scrambling.
Ueyama (1998) presents a range of scope interactions involving scram-
bling in biclausal structures, documenting a sharp distinction between
monoclausal and biclausal scrambling in Japanese.
First, as already indicated, if scrambling does not occur, scopes are
fixed.
(12) [Yaohan-sae]_QP2-ga [seizi-dantai     X-ga  [55%-no ginkoo]_QP1-ni
     Yaohan-even-nom      political party X-nom  55%-gen bank-dat
     supai-o okurikonda to]_CP kimetuketeiru.
     spy-acc dispatched comp   conclude
     ‘[Even Yaohan]_QP2 concludes [that political party X had dispatched
     spies to [55% of the banks]_QP1]_CP.’
     (Ueyama 1998, 50)
For these cases with no scrambling, QP_2 has scope over QP_1 unambiguously.
Second, scrambling within the lower clause does not change scope
relations between lower-clause and upper-clause quantifiers.
(13) [Yaohan-sae]_QP2-ga [[55%-no ginkoo]_QP1-ni seizi-dantai     X-ga
     Yaohan-even-nom       55%-gen bank-dat      political party X-nom
     supai-o okurikonda to]_CP kimetuketeiru.
     spy-acc dispatched comp   conclude
     ‘[Even Yaohan]_QP2 concludes [that political party X had dispatched
     spies to [55% of the banks]_QP1]_CP.’
     (Ueyama 1998, 51)
Here QP_1 has been scrambled to the head of the embedded clause, but
that does not affect its scope interaction with the matrix QP_2: again,
QP_2 has scope over QP_1. This result is perhaps expected on all accounts,
but it is relevant to Ueyama’s demonstration nevertheless.
The surprising fact is that when the embedded quantifier QP_1 is
scrambled to the matrix clause, it still cannot take scope over the matrix
QP_2.
(14) [55%-no ginkoo]_QP1-ni [Yaohan-sae]_QP2-ga [seizi-dantai     X-ga
     55%-gen bank-dat        Yaohan-even-nom     political party X-nom
     supai-o okurikonda to]_CP kimetuketeiru.
     spy-acc dispatched comp   conclude
     ‘[Even Yaohan]_QP2 concludes [that political party X had dispatched
     spies to [55% of the banks]_QP1]_CP.’
     (Ueyama 1998, 51)
QP_1 unambiguously takes scope beneath QP_2, even though it has been
fronted beyond it. This is surprising, because in the monoclausal case
OSV order leads to the possibility of wide scope for O over S. These cases
clearly indicate a close connection between locality and reconstruction.
Long scrambling reconstructs for quantifier scope fixing, whereas short
scrambling does not, exactly the direction of correlation that RT predicts.
5.2 Scrambling and the Subject
Bayer and Kornfilt (1994) present another set of scrambling cases that
show the full scope of the LRT correlations (locality, reconstructivity,
and target). They demonstrate a three-way split for scrambling that
strands a quantifier. On the assumption that quantifiers, or at least some
quantifiers, are not present until SS (or QS), it follows that scrambling
targeting those quantifiers cannot apply at least until then, and with a
further assumption it follows that quantifier-related scrambling cannot
take place to a position beneath the subject. Bayer and Kornfilt’s exam-
ples are these:
(15) a. Socken zieht der Heinrich im     Sommer keine an.
        socks  puts  the Heinrich in-the summer none  on
        ‘Heinrich puts no socks on in the summer.’
     b. ? . . . dass [Socken der Heinrich im Sommer keine anzieht]_IP.
               that
     c. * . . . dass der Heinrich [Socken [im Sommer keine anzieht]].
               that
In these examples the noun Socken has been scrambled away from its
quantifier keine. Since this scrambling appears to target quantifiers, it
applies at SS, but then extension will require it to move to the edge of
the constituents defined at that level. Since those constituents are IPs at
a minimum (more likely CPs), any scrambling will have to move to the
edge of IP, or to CP; as a result, (15c) is impossible, because here the
scrambling moves only to the left edge of VP.
Moltmann (1990) shows convincingly that scrambling targeting
quantifier expressions obligatorily exhibits reconstruction effects, even when
applying in a simple clause.
(16) a. . . . weil    Hans Bilder   voneinander_i      den Leuten_i
              because Hans pictures of-each-other.acc the people.dat
              keine t zeigen  möchte.
              none    to-show wants
              ‘. . . because Hans doesn’t want to show the people any
              pictures of each other.’
              (Moltmann 1990, (116a))
     b. . . . weil    Maria diese Bilder   voneinander_i      den Leuten_i
              because Maria these pictures of-each-other.acc the people.dat
              sicher t zeigen  wollte.
              surely   to-show wanted
              ‘. . . because Maria surely wanted to show the people these
              pictures of each other.’
              (Moltmann 1990, (117a))
In (16a) the binding of voneinander by Leute takes place under
reconstruction; that this is a special feature of the rule splitting quantifier
phrases is shown by the unavailability of binding in (16b), completely
parallel to (16a) except that the scrambling moves an intact definite NP.
It appears then that German actually has three kinds of scrambling
(scrambling in simple clauses, scrambling that moves out of embedded
clauses, and quantifier-targeting scrambling of the kind just illustrated),
and each shows the reconstructivity properties that are expected to follow
from the particular level at which it applies.
In the chapter 6 discussion of Superiority in Japanese, we will see the
same difference between scrambling to a position above and scrambling
to a position below the subject: scrambling two wh words to a position
below the subject does not lead to ambiguity, but scrambling them both
to a position above the subject potentially does, thus repeating the lesson
learned from German in chapter 2.
In the debate over the nature of short, clause-bounded scrambling the
recurring question is whether it is an A movement or an Ā movement.
Some researchers have argued for A movement, some for Ā movement,
some for a mixed status of one kind or another. The cluster of properties
distinguishing A from Ā movements was hypothesized in Van Riemsdijk
and Williams 1981 to involve the now familiar cluster of properties
concerning BT and reconstruction. In the present context the questions
concerning the status of movement have all been relativized. That is, we now
ask not whether a given movement reconstructs, but what it reconstructs
for; and we ask not whether a moved constituent may antecede BT
elements, but what it may antecede. I think that some of the confusion and
conflicting results pertaining to clause-bounded scrambling can be solved
in the context of a notion of the A/Ā distinction relativized in this way.
A recurring observation is that scrambling to a position beneath the
subject has different properties from scrambling to a position above the
subject. In German, for example, an accusative NP scrambled over a
dative reflexive may antecede that reflexive, but an accusative NP
scrambled over a nominative subject may not antecede that subject.
(17) a. . . . dass der Arzt_i       den Patienten_j sich_{i/j}  t_i im
              that the doctor.nom the patient.acc  himself.dat     in-the
              Spiegel zeigte.
              mirror  showed
              ‘. . . that the doctor showed the patient himself in the mirror.’
              (Müller 1995, 160)
     b. * . . . dass den Frank      sich_i  manchmal  t_i nicht gemocht hat.
               that the Frank.acc himself sometimes      not   liked   has
               (Müller 1995, 161)
Can such e¤ects be understood as arising from timing in RT? I have
already suggested in chapters 3 and 4 that there are several notions of
subject, each pertaining to a di¤erent level in the model: Case-theoretic,
controllable, nominative, and so on. Each of these occupies a position in
its own level, related to the subject positions in the other levels through
the representation relation.
Suppose, as suggested in chapter 3, that the controllable or nominative
subject is defined in PS and that the binding of subject-sensitive anaphors
takes place there as well. Schematically:
(18)  TS → CS →(A) PS →(B) SS → FS
                   control,
                   nominative Case,
                   reflexive binding
If representation relation A is a scrambled relation, the scrambling will be
restricted to positions beneath the surface subject position, which we are
identifying here with nominative Case; and it will also appear to
determine the input to reflexive binding, as it precedes the level at which
reflexive binding takes place. If representation relation B is a scrambled
relation, it will involve scrambling over the surface subject position; and
it will appear that the binding relations are computed on the input to the
scrambling relation. Since the subject is overtly marked nominative, we
know that it is the subject of PS, and so scrambling must follow it. If
scrambling took place before PS, it would not preserve the nominative
Case relations, as these, unlike the Case relations of internal arguments,
are not determined until PS.
For representation relation B, the arrangement shown in (18) deter-
mines that scrambling reconstructs for BT. In fact, that is exactly what
happens for scrambling to a position above the subject, but not for
scrambling to a position beneath the subject.
(19) a. . . . dass der Arzt_j   sich_{*i/j} den Patienten_i  t_i im
              that the doctor  himself     the patient.acc      in-the
              Spiegel gezeigt hat.
              mirror  showed  has
              ‘. . . that the doctor showed the patient himself in the mirror.’
              (Müller 1995, 177)
     b. . . . dass sich_i  der Fritz      t_i schlau      vorkommt.
              that himself the Fritz.nom    intelligent appears
              ‘. . . that Fritz appears intelligent to himself.’
(19a) shows that scrambling to a position beneath the subject does not
allow reconstruction. In RT this means that such scrambling occurs
strictly before PS. But (19b) shows that scrambling to a position above
the subject does permit reconstruction, just as we would expect if scram-
bling to that position was not possible until PS.
Examples (19a,b) draw a fine distinction between movement to posi-
tions above and below the subject, but excluding movement to SpecC. Of
course, movement to SpecC permits reconstruction as well, in English as
well as in German, as (20) shows.
(20) a. Himself_i John likes t_i.
     b. Sich_i  hat Fritz_i schon immer t_i gemocht.
        himself has Fritz   always        liked
        (Müller 1995, 177)
(17) and (19) together show the fine interaction among binding, recon-
struction, and Case assignment, an interaction unique to RT as far as I
can tell.
5.3 A/Ā/Ā̄ Reconstruction
The limitation of the binary A/Ā distinction becomes more acutely
evident when we consider ‘‘higher’’ (or ‘‘later’’) reconstructions. Suppose
there were a rule that moved wh words, perhaps among other things, but
in such a way that a moved wh word was interpreted strictly in reference
to its reconstructed, or original, position. RT in fact leads us to expect
such a scrambling rule, on grounds of full generality: why should any
representation relation not be subject to mismatching to achieve semantic
effects? Exactly such a rule is found in Japanese.
(21) ?Dono  hon-o_i  Masao-ga  [Hanako-ga  t_i tosyokan-kara karidasita
      which book-acc Masao-nom  Hanako-nom     library-from  checked-out
      ka]_CP siritagatteiru.
      comp   wants-to-know
      ‘Masao wants to know which book Hanako checked out.’
      (Saito 1991, (33a))
The wh word at the top of the matrix clause in (21) is interpreted at
the top of the embedded clause, even though it has been entirely
removed from the embedded clause, which means that it is licensed (wh-
interpreted) in its reconstructed position. We will assume that the move-
ment illustrated in (21) (called long topicalization by Saito (1991)) occurs
later than wh movement or wh interpretation. If wh movement (or con-
strual) occurs at SS, then long topicalization must occur at FS; and since
reconstruction is relative, this means that long topicalization reconstructs
for the purpose of wh movement/interpretation. Although this construc-
tion is called topicalization, it lacks the wa topic marking of more famil-
iar Japanese topicalization structures. Presumably this is because long
topicalization applies after such wa marking is licensed. The lack of wa
marking again would make sense if long topicalization applied at FS,
with wa marking applying at SS or earlier.
It is important to observe that X reconstructs for Y only if X is strictly
later than Y. Specifically, elements at the same level do not reconstruct
for one another. For example, as noted in Van Riemsdijk and Williams
1981 and Williams 1994b, wh movement does not reconstruct for wh
movement.
(22) a. *[Which picture of t_i]_j do you wonder who_i t_j upset?
     b. ?Who_i do you wonder [which picture of t_i upset]?
Both (22a) and (22b) involve extraction of one wh phrase from another
and so neither is fully grammatical. But only (22a) requires reconstruction
of wh movement for wh movement, and so it is far worse than (22b).
Likewise, long topicalization in Japanese, although it reconstructs
for wh movement, does not reconstruct for long topicalization, even
though multiple long topicalizations in a single multiclause structure are
grammatical.
(23) a. Taroo-ga [Hanako-ga   Masao-ni  sono hon-o    watasita to]
        Taro-nom  Hanako-nom Masao-dat that book-acc handed   that
        omotteiru koto.
        thinks
        ‘Taro thinks that Hanako handed the book to Masao.’
     b. Sono hon_i-o Masao_j-ni Taroo-ga [Hanako-ga t_i t_j watasita to]
        omotteiru koto.
     c. Taroo-ga [Hanako-ga   sono hon-o    yonda to]
        Taro-nom  Hanako-nom that book-acc read  that
        itta koto.
        said
        ‘Taro said that Hanako read the book.’
     d. *[Hanako-ga t_i yonda to]_j sono hon_i-o [Taroo-ga t_j itta koto].
        (Saito 1991, 16)
(23b) is a version of (23a) in which double long topicalization has taken
place. Likewise, (23d) is a version of (23c) in which long topicalization
has taken place. The difference is that in (23d) the applications are
intrinsically nested, which is to say nothing more than that in order for
(23d) to be grammatical, long scrambling would have to reconstruct for
itself—and as we have seen, this is in general impossible. In RT the no-
tion that a given type of scrambling could reconstruct for itself is inco-
herent, since any given type of scrambling is simply the relation between
two adjacent levels, and any given sentence could involve only one such
relation.
Other details about the interaction of A and Ā systems also follow
from the architecture of RT. It is a theorem of RT that if X reconstructs
for Y, then Y cannot reconstruct for X. We already know that wh
movement reconstructs for NP movement.
(24) How [likely t_Bill to win] is Bill t_AP?
We can therefore conclude that NP movement does not reconstruct for
wh movement. But what would that mean? Consider a language that
has both wh in situ for indirect questions (like Chinese) and Case-driven
raising (like English). Then one version of the question we are presently
addressing is, what would block the following derivation?
(25) a. [— wondered [wh [who to see Bill]]]  raising ⇒
     b. [who wondered [wh [t to see Bill]]]  reconstruction ⇒
     c. [— wondered [wh [who to see Bill]]]  wh construal ⇒
     d. [— wondered [wh who [t to see Bill]]]
In other words, the embedded wh word is raised to the matrix, then
reconstructed into its original position, and then used to make the em-
bedded clause an indirect question by wh construal strictly within the
embedded clause. This is what it would mean for raising to reconstruct
for wh movement. There are somewhat more complicated cases that
make the same point for a language like the one just imagined, but with
real wh movement instead of just wh construal.
(26) [Pictures of t_wh wondered [who t_NP to bother Bill]].
Here the NP movement of pictures of who reconstructs for the licensing of
the embedded wh movement.
It is I think safe to assume at this point that such cases do not exist. But
why not? In RT this can be predicted from the very fact that wh recon-
structs for raising, via the theorem just mentioned.
The examples in (25) and (26) are at variance with RT in another,
though related, way: since raising is an IP rule, it cannot apply in the
presence of CP structure in the first place (see chapter 3).
Importantly, the details of the interaction illustrated in (25) and (26) do
not follow from the BOIM by itself—none of the movements in (25),
overt or covert, is improper. (25) does follow from the NP Structure
model of Van Riemsdijk and Williams (1981), and in fact follows in the
same way it does in RT. So a theory in which the GBOIM is added as an
extra condition will need still more conditions to regulate NP/wh recon-
struction interactions. By contrast, both the GBOIM and the facts in (25)
and (26) can be derived from the architecture of RT itself. These cases
thus add further weight to the argument that the GBOIM should be
architecturally derived.
Examples (25) and (26) are exactly like Saito’s (1991) long topical-
ization examples discussed earlier, except that Case-driven raising is
substituted for long topicalization. In both instances we tried to create
cases in which wh construal (or movement) takes place under
reconstruction. One works, the other is blocked. This tells us that there is no
absolute answer to a question like, ‘‘Is wh interpreted under reconstruction?’’
Rather, one must ask, ‘‘Is wh interpreted under reconstruction of Y?’’
RT suggests that there will potentially be a series of scramblings, one
between each representationally related pair X_n → X_{n+1}, and that each
will appear to reconstruct for the purposes of any relations established at
or prior to X_n. The nature of each kind of scrambling will be determined
by the level at which it operates; n/n+1 scrambling will scramble (and
‘‘reconstruct’’) only nodes of the type defined at X_n or earlier, and its
‘‘range’’ will be determined by the size of the structures defined at X_n.
This again is one dimension of the LRT correlations.
The following diagrams the potential scrambling relationships and
their effects on interpretation. The model assumed is the one presented in
chapter 9, which omits the level PS discussed in chapters 3 and 4.
(27)
5.4 Remnant Movement
The term remnant movement has great currency recently. But in fact there
have always been remnant movements, and there always will be, even
if present proposals fall by the wayside. Remnant movement is the
movement of a phrase containing the trace of something that has been
removed from it. Uncontroversially, remnant movement has taken place
in (28), assuming that certain is a raising predicate.
(28) [How certain t_i to win]_j is John_i t_j?
Given the existence of remnant movement, the problem, as usual, is to
exclude most instances of it—for example, (29a).
(29) a. *Who_i were [pictures of t_i]_j seen t_j?
     b. Who_i were seen pictures of t_i?
There is a derivation of (29a) in which the prohibition against extracting
from subjects has been evaded—by first extracting the wh word from the
direct object, (29b), and then moving the direct object to subject position,
(29b) → (29a). There will be any number of ways to exclude any particular case.
For example, the derivation of (29a) via (29b) is ruled out by the cycle
in Williams 1974 and by extension in Chomsky 1995. RT automatically
excludes most remnant movements, including (29), in the following way.
A remnant movement always involves two rules, the remnant-moving
rule (wh movement in (29)) and the remnant-creating rule. For remnant
movement to take place, the remnant-moving rule must ‘‘reconstruct
for’’ the remnant-creating rule. In RT this will happen only when the
remnant-creating rule applies earlier than the remnant-moving rule. In
other words, remnant movement is really a special case of reconstruction,
and everything that has been said about reconstruction applies.
Since NP movement (or its equivalent) occurs in CS (or PS) or there-
abouts, and wh movement in SS, wh movement reconstructs for NP
movement, giving (28) but excluding (29a). The general implication for
remnant movement is this:
(30) Corollary about remnant movement
A moved remnant cannot contain a hole ‘‘bigger’’ (or ‘‘later’’) than
the one it creates.
Importantly, no stipulation is needed to ensure this behavior; it follows,
as do all reconstruction interactions, from the architecture of RT.
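Corollary (30) reduces to a single comparison of levels. A toy rendering (my own, with hypothetical level indices, not the book's formalism):

```python
# Toy rendering of corollary (30): remnant movement is licensed only if
# the remnant-creating rule applies earlier than the remnant-moving rule.
RULE_LEVEL = {"NP movement": 1, "wh movement": 2}  # smaller = earlier

def remnant_movement_ok(creating_rule, moving_rule):
    """The moved remnant may not contain a hole 'later' than the one the
    movement itself creates."""
    return RULE_LEVEL[creating_rule] < RULE_LEVEL[moving_rule]

# (28): wh movement of a remnant containing an NP-movement trace -- allowed.
# (29a): NP movement of a remnant containing a wh trace -- excluded.
```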
Some but not all of this (and I think in fact an arbitrary subpart) fol-
lows from minimalist practice, from the incremental version of the cycle
that Chomsky (1995) has called extension (see also Williams 1974). Ex-
tension requires that every operation must enlarge the tree. As mentioned,
extension blocks the derivation of (29a) from (29b), as the movement
from object to subject does not reach the edge of the tree, but ‘‘tucks in’’
beneath the subject, to use Richards’s (1997) term. But this works only
within a single clause. In a multiple-clause structure, extension does not
block illicit remnant movements. Consider the following derivation:
(31) a. seems a picture of who to be for sale   wh movement ⇒
     b. seems who a picture of t_i to be for sale   NP movement ⇒
     c. *[A picture of t_i]_j seems [who_i t_j to be for sale]?
(31c) is ungrammatical, of course—but why? It is not ungrammatical be-
cause of extension—it never disobeys extension, because the NP move-
ment strictly follows the wh movement. Perhaps it is ungrammatical
because seem does not take a wh complement. But why is that? Why do
raising predicates never take wh complements? Whatever the reason (it
used to be government; now its identity is uncertain), it is clearly not the
principle of extension itself, so (31c) and (31a) receive fundamentally dif-
ferent accounts.
In RT (31c) is ungrammatical for exactly the same reason as (29a):
since all NP movements must precede all wh movements, there is no
opportunity for the interaction that (31) illustrates. In fact, this explana-
tion was already a part of the NP Structure model (Van Riemsdijk and
Williams 1981), of which RT in this regard is a generalization.
The conclusions about remnant movement will provide no comfort for
proponents of Antisymmetry. RT has a rich movement rule typology;
because the representational levels index the set of movement rules, many
opportunities for remnant movement arise even while excluding (25) and
(18), as these are not consistent with the regime of remnant movement
(expressed in (30)) that follows from RT architecture. But in a theory
with a greatly reduced inventory of movement rules—in extreme cases, a
single movement rule (Move XP)—this regime would allow no remnant
movement at all. And remnant movement theories tend to have a greatly
reduced inventory of types of movement, seeking in particular to elimi-
nate head movement (e.g., Koopman and Szabolcsi 2000).
5.5 Summary of Findings
The overall argument for generalizing the A/Ā distinction along the
lines suggested here is summarized in table 5.1. This table charts
‘‘reconstruction’’ possibilities. Each column represents a ‘‘reconstructing’’ movement
or relation of some kind. Each cell in the column specifies whether that
movement ‘‘reconstructs’’ for the purposes of the relation corresponding
to the row of that cell. For example, wh movement (second column)
‘‘reconstructs for’’ anaphor binding, in the sense that binding relations
are licensed by the pre–wh movement structure.
The squinting eye can detect a rough diagonal from top left to bot-
tom right, with check marks below the diagonal and stars above the
diagonal (and question marks where the facts are indeterminate). This
diagonal arises because of the correlation of reconstruction with levels;
the correlation follows from how the levels are related to one another by
representation.

Table 5.1
What reconstructs for what

This → reconstructs              Wh         Long        Short       NP         Movement
for this ↓             Focus     movement   scrambling  scrambling  movement   for Case
Wh movement            ✓         —          *           ?           *
Long scrambling        ?(opaque) ?          —           ?           *
Weak quantifiers       ✓         ✓          ✓           *           ?(raising) ?
Anaphor binding        ✓         ✓          ✓           *           *
Short scrambling       ?(opaque) ?(opaque)  *           —           *          *
NP movement            ✓?        ✓          ✓?          *           —
Q-float                ✓?        ✓?         *           *           *
Theta relations        ✓         ✓          ✓           ✓           ✓          ✓
RT links the reconstruction correlation with two other correlations:
rule target type and locality also vary systematically across levels, as
described in previous chapters. So we now have a sketch of the full set of
what I have called the LRT correlations.
As I noted at the outset, the title of this chapter is facetious: it pretends
that the "bar" of Ā, which means 'not', can be iterated like the bar of
X-bar theory. The serious side of this abuse of notation is that if the
approach in this chapter is correct, then A and Ā are simply two arbitrary
points in a spectrum of rule types. The cost of moving from the binary
distinction to the n-ary one is a richness of rule types. But I think that
richness is more than compensated for by the LRT correlations, and the
fact that these correlations flow organically from the architecture of the
model.
It seems to me that the set of LRT correlations is highly constraining,
eliminating many possible analyses, since it ties together three different
qualities of syntactic relations. It also seems to me that the full set of LRT
correlations follows from the representation model in a way that cannot
be duplicated without the architecture that representation requires.
Chapter 6
Superiority and Movement
I have assumed thus far that ‘‘real’’ movement—specifically, wh move-
ment—does not arise from (mis)representation of one level by another,
but is in fact movement in the traditional sense, as a part of the definition
of one of the levels, tentatively identified as SS in previous chapters.
Scrambling and wh movement therefore each have a completely different
status in the theory. The considerations offered so far in favor of
separating the two theoretically were (a) the observation that wh movement,
unlike, for example, object shift in Icelandic, shows nesting rather than
intersecting patterns, and (b) that rules like wh movement operate once
per applicable domain, whereas scrambling rules operate multiple times. I
suggested that nesting and single application are diagnostic of ‘‘true’’
movement. Now it is time to back that suggestion up.
The empirical anxiety that presents itself is of course the notion that as
more and more pieces of the puzzle fall into place, scrambling and wh
movement will come to be seen as basically the same operation. This has
certainly been the widely held view so far. And recent theories and
findings seem to bolster the identification of a single notion of displacement
that is responsible for both, individuating differences being attributed to
nonessential features of the relations involved.
For example, in work from the past decade on some Slavic wh systems,
it appears that wh movement sometimes exhibits what might be seen as
‘‘parallel’’ movement within a single domain, resulting in intersecting
derivations. If this initial impression is sustained, it undermines RT, in
that it suggests that ‘‘real’’ movement must be governed by principles that
enforce parallelism of movement of a set of elements in a single structure;
these principles would naturally extend to ‘‘movement’’ versions of the
phenomena I have cited as cases of shape-conserving representation, such
as scrambling, and would suggest that a unified theory could be achieved
if all phenomena were treated as cases of movement. But then the repre-
sentation relation would be left with nothing to account for. To consider
concrete cases, if parallelism of some kind governs wh movement, then
why does it not govern object shift (as a movement rule) as well, thus
making redundant the representation account of object shift and related
phenomena under the regime of Shape Conservation?
I have cited parallelism (Shape Conservation) as evidence for decom-
posing clause structure so that parallelism can be said to hold of the re-
lation among the decomposed parts, but if very similar parallelism can
also be shown to hold within a tree, then the architecture that arises from
the decomposition is less interesting and may in fact stand in the way of a
truly general theory. So it is a pressing empirical and theoretical problem
to see whether various sorts of parallelism effects can actually be
assimilated to one another, as success in this endeavor would undermine not
only the results of chapters 1 and 2, but also the host of generalizations
that follow from the LEC in chapters 3–5.
Richards (1997) has built a theory in which the parallelism effects of
scrambling are derived from a general theory of movement that also has
wh movement in its scope—in other words, a unified theory. I suppose I
should have been tempted to build a unified representational theory as
well, by which I mean a theory in which the principal features of wh
movement derive from Shape Conservation. I could not see any interesting
way to do this, so I leave that possibility unexplored here. But I will
point to circumstantial evidence, some of which I think is based on
compelling analyses, for distinguishing movement from shape-conserving
mapping between levels.
6.1 Is Superiority a Case of Shape Conservation?
One type of parallelism effect that wh movement exhibits, even in a
language like English, has been called Superiority since Chomsky 1973. An
obvious and worthy goal would be to develop a theory of Superiority
governing movement that could account for scrambling parallelisms by
claiming that they arise from construing scrambling as movement. Sev-
eral researchers, most notably Richards (1997), have constructed theories
exactly along these lines.
Clearly, if there is no real difference between the "Superiority" effects
of wh movement and the "parallelism" effects of local scrambling, and
the like, such a unified theory must be sought. But in fact I think the
two sorts of parallelism are fundamentally different, and different in a
way that draws exactly the distinction between a movement relation
(wh movement) and a (mis)representation relation, what I am calling
scrambling.
I will argue (and in fact already have, in Williams 1994b) that Superi-
ority is in any event not a constraint on movement, but a consequence of
BT, to the extent that Superiority violations are really Crossover viola-
tions. If the parallelism distortion found in Superiority violations turns
out to result from Crossover violations, then there is no way that the
analysis can be extended to scrambling. Theoretically, then, a lot is up
in the air: Can scrambling and wh movement be assimilated under one
general theory? Is the Superiority Condition a part of that theory? Is
nesting versus intersecting a diagnostic of anything? In presenting my case
for the BT treatment of Superiority, I will also review Richards’s (1997)
version of Superiority, as it seems closest to achieving the unified theory
of movement whose scope would include wh movement and scrambling.
I will argue that its main conclusions are incorrect and that the correct
understanding of Superiority would make the unified theory Richards
discusses impossible in any event.
Superiority (Chomsky 1973) says that if two wh phrases are both eligi-
ble to move by wh movement, the higher one moves.
(1) a. Who saw whom?
b. *Whomi did who see ti?
Superiority thus preserves the order of the two wh words and so can be
understood to enforce a kind of parallelism constraint. Starting from (1),
we might seek to expand the coverage of Superiority to all parallelism
effects, including the ones that were used in previous chapters to
motivate RT levels and Shape Conservation. So, another form of the central
question I would like to address in this chapter is, can A movements
and short-distance scrambling be shown to be governed by a general
Superiority Condition? In short, do A movement and scrambling show
Superiority effects, in the sense in which wh movement does?
The answer will be an unequivocal no. In Williams 1991, 1994b, I sug-
gested an analysis of Superiority that in fact would make the extension to
A movement and scrambling impossible. Taking my inspiration from
Chierchia’s (1992) theory of the ambiguity ofWho does everybody like t?–
type sentences, I suggested that multiple-wh questions involve a ‘‘bind-
ing’’ relation between the unmoved wh word and the trace of the moved
wh word, so that Superiority violations are really Weak (and sometimes
Strong) Crossover violations.
(2) [tree diagram not reproduced: *Whomi did who see ti?, with ti construed as binding the in-situ who]
Since the object position can never bind an anaphor that occupies the
subject position, binding of who by ti here clearly violates BT (Weak/
Strong Crossover); as a result, Superiority is reduced to W/SCO.
But what purpose would such binding serve? It clearly does not result
in ‘‘coreference’’ between the two terms. But it has been noted since Kuno
and Robinson 1972 that the relation between the two wh words in a
multiple-wh question is not symmetric, the moved wh word serving as
an ‘‘independent’’ variable (‘‘sorting key,’’ to use Kuno and Robinson’s
term) and the unmoved wh word as the ‘‘dependent’’ variable. This can
be seen in the answers to multiple-wh questions. Such questions can be
answered by giving pair lists, but also by giving a function for relating the
dependent and the independent variable, so understood.
(3) a. Who wrote what?
b. Each student wrote his own name.
(3b) maps students onto written things—exactly such a function.
We must not be misled by the fact that list answers can be given to
multiple-wh questions.
(4) A: Who read what?
B: Bill read Moby Dick,
Sam read Omoo,
Pete read Typee.
A list is simply one way to specify a function; the function in (4B) is
( f(Bill) = Moby Dick, f(Sam) = Omoo, . . .). In fact, "function" is exactly
the right notion. A function can map different independent variables onto
the same dependent variable, but it cannot map one and the same
independent variable onto different dependent variables; and answers to
multiple-wh questions seem to conform to this restriction.
(5) A: Who read what?
B: Bill read Moby Dick,
Sam read Omoo,
Pete read Omoo.
*B′: Bill read Moby Dick,
Sam read Omoo,
Sam read Typee.
The (5B′) answer is odd; it can be improved by replacing the last two
parts of the answer with Sam read Omoo and Typee, which restores
functionhood (K. Kohler, personal communication).
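The functionhood restriction at work in (4) and (5) can be made concrete as a simple check on pair-list answers: a list of ⟨independent, dependent⟩ pairs specifies a function just in case no independent (sorting-key) value is paired with two distinct dependent values. A minimal sketch of that check, purely illustrative (the pair lists are those of (5B) and (5B′)):

```python
def is_function(pairs):
    """True iff the pair list specifies a function: no independent
    (sorting-key) value maps to two distinct dependent values."""
    mapping = {}
    for independent, dependent in pairs:
        if independent in mapping and mapping[independent] != dependent:
            return False  # one independent value, two dependent values
        mapping[independent] = dependent
    return True

# (5B): many-to-one is a legitimate function
assert is_function([("Bill", "Moby Dick"), ("Sam", "Omoo"), ("Pete", "Omoo")])
# (5B'): one-to-many is not, and the answer is correspondingly odd
assert not is_function([("Bill", "Moby Dick"), ("Sam", "Omoo"), ("Sam", "Typee")])
```

The asymmetry of the check mirrors the asymmetry of the dependent-independent variable relation: only the sorting-key side is constrained to take each value once.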
Telling evidence for this view comes from the following example:
(6) Who knows what who bought t?
On the standard account this should be a Superiority violation, because
what has crossed over who. But on my account it will be a violation
on only one interpretation, the one where the embedded who function-
ally depends on what. When the embedded who depends on the matrix
who, the WCO configuration does not arise, and in fact the sentence is
grammatical on exactly that one interpretation. This view leads to a
straightforward account of the role of D-linking in Superiority as well,
incorporating the findings of Pesetsky (1987). See Williams 1994b for
more details.
A further observation on Superiority in German by Wiltschko (1997)
strongly supports the view that the two wh words in a multiple-wh
question are in the dependent-independent variable relation. Wiltschko first
presents incontrovertible evidence that German has Superiority effects,
effects that had been hidden from previous investigators by factors
that Wiltschko identifies and controls for. In a discussion of the role of
D-linking in Superiority violations, she then shows that Superiority in
German is governed by a very mysterious semantic condition, illustrated
by the following examples:
(7) I am sure that Peter and Mary must have talked to each other on the
phone.
a. Weißt du wer wen angerufen hat?
   know you who whom called has
   'Do you know who called whom?'
b. *Weißt du wen wer angerufen hat?
(Wiltschko 1997, (32))
(8) I am sure that Peter, Paul, and Mary must have all talked to each
other on the phone.
a. Weißt du wer wen angerufen hat?
‘Do you know who called whom?’
b. Weißt du wen wer angerufen hat?
(Wiltschko 1997, (33))
The only difference between (7) and (8) is that there are three individuals
in (8) and only two in (7); but, as Wiltschko notes, this means that in (8)
"the answer can consist of (at least) two pairs." We might take this
strange condition to be a condition on how dependent and independent
variables are related: if the independent variable cannot take at least two
different values, then there is no real nontrivial function, just a simple
fixed answer. Apart from any such consideration the condition is quite
peculiar, and, as Wiltschko shows, it does not follow from any of the
accounts of D-linking; in fact, the initial "setup" sentence in both (7) and
(8) guarantees the D-linking of all the wh phrases, thereby eliminating it
as a factor in discriminating them.
Important for present concerns is the conclusion that since this account
of Superiority is specific to binding relations, it is impossible to extend it
to scrambling, because elements that are A-scrambled do not in general
bear any relation to one another, binding-theoretic or otherwise. For ex-
ample, in neither (9a) nor (9b) is there a binding relation, or any other,
between Johann and das Buch.
(9) a. weil Johann das Buch gelesen hat
       because Johann the book read has
       'because Johann read the book'
b. weil das Buch Johann gelesen hat
Referentially speaking, Johann and das Buch are completely independent
of one another, showing no coreference or dependency of reference, and
so there is no reason for any BT principle to force them to be in one or
another structural relation with each other. Likewise, the scrambling of
the verb and its complement NPs over negation in Scandinavian lan-
guages discussed in chapter 2 cannot conceivably involve any binding
relations among the moved elements, in general. So if the BT account
of Superiority in Williams 1994b is correct, it simply cannot be extended
to A movement. Put the other way around, if Superiority needs to be
extended to A movement and especially to scrambling, then it must be
something very different from what I have just suggested, and at heart it
must not have anything to do with the configurations that license
dependent reference.
But as I mentioned at the outset, some multiple-wh constructions
look at first glance just like Icelandic scrambling, suggesting a common
account. To use the terminology of this book, they raise the question
whether wh movement is shape conserving in the sense in which I am
using that term here. It is important that wh movement not show any true
shape-conserving properties, because if it does, there is no good reason to
distinguish scrambling from other cases of movement and the rationale
for RT begins to evaporate. So my plan will be to show that wh move-
ment appears to have shape-conserving properties for special simple cases
where BT relations are involved, but that it is not shape conserving in
general.
The Slavic languages provide a rich source of information on multiple-
wh structures, including parallelism effects of a kind that can be only
weakly illustrated by the English Superiority paradigm. In Bulgarian
multiple-wh questions, for example, both wh words move to the front of
the clause, maintaining their relative order.
(10) a. Kogo kakvo e pital Ivan?
        whom what aux asked Ivan
        'Whom did Ivan ask what?'
     b. *Kakvo kogo e pital Ivan?
(Boskovic 1995, 13–14, as reported in Richards 1997, 281)
There is one obvious difference between Bulgarian and English
multiple-wh questions that I will put aside for the moment: in Bulgarian
all of the wh phrases in a multiple-wh question move, whereas in English
only the single independent one moves. I will concentrate first on what
the languages have in common: a single wh word is selected, moved to the
front, and interpreted as the independent variable. I will return to the
difference in the fate of the dependent variables later.
In RT, facts such as those in (10) could be treated in two different
ways, with different consequences. (10) could be subsumed under a general
theory of Superiority of the type already discussed, which reduces
Superiority to a BT relation; or it could be accounted for by whatever RT
mechanism gives rise to the parallelism effects in multiple scrambling
structures—the Shape Conservation principle regulating interlevel
matching, as I have proposed.
For the special case of two wh words, Shape Conservation appears to
hold, as (10) illustrates, and it could be used to support either account.
But for other cases wh movement appears not to obey Shape Conserva-
tion, suggesting that it is not to be subsumed under the same theory as
scrambling or object shift (analyzed as an interlevel mismatching con-
strained by Shape Conservation) and so must be an instance of ‘‘real’’
movement constrained by the W/SCO account of Superiority proposed in
Williams 1994b.
For example, Bulgarian multiple-wh questions involving three wh
words exhibit the following behavior:
(11) a. Koj kogo kakvo e pital?
        who whom what aux asked
b. Koj kakvo kogo e pital?
c. *Kakvo koj kogo e pital?
(Boskovic 1995, 13–14, as reported in Richards 1997, 281)
The wh word that was highest before movement (here, koj) must remain
highest after movement, but the other two wh words can appear in either
order. Why would this be? In the theory of Superiority I have just out-
lined, the answer is straightforward: each of the lower wh words must
stand in the dependent-independent variable relation to the highest wh
word, a relation governed by BT; but they need not bear any particular
relation to each other. That is, the dependence is strictly binary. In this
regard Bulgarian is just like English, where in (12), for example, what
depends on who and whom depends on who, but what and whom bear no
particular relation to each other.
(12) [structure not reproduced: an English multiple-wh question in which both what and whom depend on the moved who]
Why should they? Two reflexive pronouns, for example, may share an
antecedent, but nothing in any theory I am aware of forces any particular
structural relation between the two anaphors themselves.
(13) John gave a picture of himself to himself.
In a multiple-wh question there can be but one independent variable,
and the rest of the wh words are dependent. In English, the independent
variable is the moved one, and all the unmoved wh words must be de-
pendent on it. This is perhaps why wh movement is obligatory in English,
in the sense that exactly one wh word must move: if an unmoved wh
word must be dependent, then there must be a moved wh word that is
independent.
So Bulgarian and English multiple-wh questions are alike in that one
wh word (always the independent variable) moves to SpecC, and the rest
of the wh words are dependent on that one. They differ in that in
Bulgarian all wh words are moved (or scrambled) to the position of the
moved wh word (a difference I will take up in later sections). But that is
perhaps the only difference; in other words, there is no more reason for
Bulgarian than for English to assume that the dependent wh words bear
any particular relation to one another.
In the Icelandic object shift construction involving V and two NPs (see
chapter 2), the situation is quite different; it is in fact the entire
constellation of V + NP1 + NP2 whose pieces can be reordered with negation,
but never in such a way as to reorder any of the parts of the constellation.
Here an entire pattern is being holistically conserved; in multiple-wh
movement, only the relation of each of the dependent variables to the
independent variable is conserved. So the condition governing object shift
(Shape Conservation) and the condition governing multiple wh (BT
applying to the dependent-independent variable relation) are fundamentally
different, and different in a way that flows from their very different status
in RT.
As Richards (1997) shows, only the first of the wh words in a multiple
question shows Subjacency effects in its relation to its deep position. He
accounts for this in terms of his notion of a "Subjacency tax": in effect,
the first movement to a particular SpecC must obey Subjacency, and all
further movements to that SpecC are free to violate Subjacency. Strictly
speaking, we could preserve the RT program by simply accepting this
view in the present context as well and moving on to other questions. But
the distinction we have been using between dependent and independent
wh words suggests a different view: namely, that only independent wh
words are subject to Subjacency. The movement of the dependent wh
words could be effected by further applications of wh movement (relieved
of the need to obey Subjacency) or, in RT, by interlevel scrambling. In
the following, I will suggest that Rudin's (1988) original distinction is
valid (see section 6.2.1): the independent wh word moves by wh movement,
an intralevel movement, and the rest of the wh words move by
scrambling, the interlevel relation governed by Shape Conservation.
6.2 Scrambling Wh Words
There is good evidence that multiple wh movement always involves
scrambling. One consideration is the role of D-linking in governing multiple
wh movement—essentially the same as its role in governing scrambling.
Another consideration is the focusing effects that the reordering of
wh words has on interpretation—again, just what is found with scrambling.
Focusing and D-linking are of course different, as I emphasized in
chapter 2. D-linked elements can be focused, a fact that seems to me to
have been overlooked, partly because of the notion that focusing involves
"new information" and D-linking "old information," which I regard as a
confusion (see chapter 2 and Wiltschko 1997).
6.2.1 Scrambling and D-Linking
There is clear evidence, presented in Rudin 1988 and strengthened since
then, that the movement of the ‘‘extra’’ wh words in both Serbo-Croatian
and Bulgarian multiple-wh questions is due to focus-motivated scram-
bling, and not to a rule akin to wh movement. Part of the evidence comes
from the behavior of D-linked wh expressions.
First, D-linked wh expressions in Serbo-Croatian need not move,
whereas non-D-linked wh expressions must, on the assumption that in
this language, as in English, bare wh words are not D-linked, but ‘which
N’ NPs are.
(14) a. Ko sta kupuje?
        who what bought
     b. *Ko kupuje sta?
        who bought what
     c. Ko je kupio koju knjigu?
        who.nom aux.3sg bought.prt which book.acc
(Konapasky 2002, 101)
However, if the D-linked wh word is the only wh NP in a question, then it
must move.
(15) Jucer je Petar kupio koju knjigu?
     yesterday aux.3sg Petar.nom bought.prt which book.acc
(Konapasky 2002, 105)
These facts suggest that wh movement is obligatory in the sense that a wh
SpecC must be filled, but not obligatory apart from that. Moreover, they
suggest that the movement of the noninitial wh words is not a movement
targeting wh attractors, but a kind of scrambling. Boskovic (1999) in fact
suggests that there is no wh movement in Serbo-Croatian single-clause
sentences, only scrambling; but Konapasky (2002) uses the facts just cited
to justify Rudin’s original claim, against Boskovic’s—namely, that the
first wh phrase targets wh attractors, but the others do not.
A second rather e¤ective argument can be built on the fact that in some
dialects at least, non-D-linked wh phrases in embedded non-wh clauses
are not extracted from their embedded clause, but are nevertheless obli-
gatorily fronted within their clause.
(16) a. Kok tvrdis [da koga tk voli]?
        who.nom claim.2sg that who.acc love.3sg
        'Who do you claim that who loves?'
     b. *Kok tvrdis [da tk voli koga]?
(Konapasky 2002, 97)
Koga in (16a), whose movement is apparently obligatory, does not end up
in the supposedly triggering Spec in the matrix. Rather, this movement
appears to happen in response to D-linking-related scrambling pressures
that arise within the embedded clause itself. This again strongly suggests
that the movement of the noninitial wh words is not targeting wh attrac-
tors in any of the cases.
Taken together, then, the behaviors exhibited by D-linked wh ex-
pressions strongly suggest that the first wh phrase is moved by obligatory
wh movement, and that the other wh phrases are moved by D-linking-
sensitive scrambling.
Further evidence leading to the same conclusion comes from (only)
Bulgarian. As noted earlier, multiple wh movement in Bulgarian is order
preserving; however, as both Rudin (1988) and Richards (1997) discuss,
D-linked wh expressions do not obey this stricture.
(17) a. Koj kogo e vidjal?
        who whom aux seen
        'Who saw whom?'
     b. *Kogo koj e vidjal?
     c. Koj profesor koja kniga e vidjal?
        which professor which book aux read
     d. ?Koja kniga koj profesor e vidjal?
     (Richards 1997, 104; from R. Izvorski, personal communication)
There is apparently a noteworthy difference between (17b) and (17d),
and D-linking is presumably implicated since the difference comes down
to 'which N' versus 'who'. Why is crossing-over allowed for D-linked
phrases only? Richards (1997, 111) suggests that there is an extra
attractor in (17d), a Topic phrase above CP. I will accept this conclusion, but
interpreted in RT terms—it implies that in (17d) the primary wh movement
affects koj profesor; the movement of koja kniga is secondary, and
hence scrambling. If the wh-moved phrase in (17d) is the independent
variable, then the topicalization of koja kniga reconstructs for the
establishment of the dependent-independent variable relation, and koja kniga
is therefore the dependent variable, despite appearing first in the clause.
(18) [diagram not reproduced: YS, with koja kniga fronted over koj profesor, misrepresents XS]
XS is whatever level the dependent variable relation is licensed in, and YS
"misrepresents" XS (symbolized by '↝').
It is important to realize that there is nothing incoherent about
topicalizing, focusing, or in any manner moving a dependent variable to
the head of the clause. In fact, this happens in English (just not with wh
variables).
(19) Which of his poems does every poet like best t?
In (19) his poems is dependent on every poet but has been moved beyond
it by wh movement. The dependent variable relation is determined in such
cases under reconstruction.
The conclusion that the primary wh word undergoes wh movement,
and that the movement of the secondary wh words is achieved by different
means, has consequences for models of these structures. A model in
which the only means for displacement is the unified and general theory
of "movement" is hard pressed to account for the different behaviors of
different kinds of movements without undermining the unification and
generality of the theory. I think that Richards's (1997) theory of movement
comes the closest to addressing these questions. In his view, what I
would call shape-conserving movement occurs whenever several elements
are attracted to the same (instance of) the same feature. Shape Conservation
is not a principle of Richards's theory, but a consequence of how
Shortest Move is defined (see Richards 1997 for a discussion of the
definitions that yield the results).
Richards analyzes multiple-wh question movement in Serbo-Croatian
and Bulgarian as multiple movements to a single attractor. But examples
like (16a) pose difficulties for the view that the movement of the secondary
wh expressions is provoked by a wh attractor, since the movement
does indeed occur, and in fact obligatorily, but not to the site of the
purported attractor.
In the RT view there are two kinds of displacement: movement, with
approximately the properties associated with wh movement, and scram-
bling, which results when the shape-conserving mapping that must hold
between levels is relaxed for one reason or another.
6.2.2 Long-Distance versus Short-Distance Scrambling
I now take up Rudin’s (1988) notion that the final position of all but the
first of the wh words in a multiple-wh question arises from scrambling,
looking especially at problems that come up in implementing her insights
in RT.
6.2.2.1 Serbo-Croatian If the above conclusions about how Bulgarian
and English multiple-wh questions work are correct, then we will not
want to extend Superiority to A movements or scrambling; we have good
reason to maintain that local scrambling and object shift are best ana-
lyzed as interlevel holistic (mis)mapping, whereas the relation between wh
words in long-distance multiple wh movement is best treated as pairwise
instances of binding within a single level. In fact, though, some construc-
tions seem to arise from an interaction between the two kinds of relations,
and it has been a commonplace in the literature on multiple wh move-
ment since Rudin 1988 to distinguish long- and short-distance movement
along these lines.
Serbo-Croatian differs from Bulgarian in allowing reordering of wh
words.
(20) a. Ko je koga vidjeo?
        who aux whom saw
        'Who saw whom?'
b. Koga je ko vidjeo?
Serbo-Croatian thus shows no Superiority effects here; however, with
long multiple wh movement (not grammatical for all speakers)
Superiority effects again show up.
(21) a. Ko je koga vidjeo?
        who aux whom saw
     b. Koga je ko vidjeo?
     c. Ko si koga tvrdio da je istukao?
        who aux whom claimed that t aux beaten t
        'Who claimed that who was beaten?'
     d. *Koga si ko tvrdio da je istukao?
(Boskovic 1995, as reported in Richards 1997, 32)
(21a,b) show that exchange is possible for short movements, while (21c,d)
show that it is not possible for long ones.
This difference between long and short scrambling is familiar from
the findings in chapter 5. The scrambling involved in (21) resembles the
scrambling that "fixes" WCO violations. Since we are in fact assuming
that Superiority is a subcase of WCO, these facts are not surprising. If,
for concreteness, we assume that WCO is adjudicated in PS, then the
scrambling in question could occur in the CS↝PS mapping. Rudin (1988) in fact
gives evidence that Serbo-Croatian has WCO-correcting scrambling
independent of what happens to wh words.
(22) CS ↝ PS
     (WCO scrambling)
Although Serbo-Croatian SS is compatible with either order of two wh
words, we should not expect the two orders to be equivalent (‘‘Nature
hates a synonymy’’), and they are not. Konapasky (2002) translates the
two cases in the following way:
(23) a. Ko je sta prodao?
        who.nom aux.3sg what.acc sold.prt
        'Who sold what?'
     b. Sta je ko prodao?
        what.acc aux.3sg who.nom sold.prt
        'What exactly did who sell?'
Konapasky interprets the difference as a difference in focus, pointing out
that in (23b) the moved wh word is interpreted as focused. We might
slightly reinterpret this finding in light of the ideas about the dependent-
independent interpretation of multiple questions; we could well imagine
that what is special about (23b) is the reversal in the dependent-
independent interpretation. This is the "marked" interpretation precisely because
it is the one that does not faithfully mirror PS; but the mismatch is
licensed precisely because it does achieve the other construal of the
sentence, the one switching the dependent and independent variables. Such
a change in interpretation is consistent with the conclusion that Serbo-
Croatian has WCO-fixing CS↝PS scrambling. On this account the
primary interpretive difference will be "logical"—having to do not with
focus, but with the relation between the two wh words; the focusing
difference would then be a side effect. Further study of the semantic
difference between the inverted and uninverted structures is clearly required, as
I am only guessing at what Konapasky's gloss might mean.
In Bulgarian, where any wh word except the top one can reorder (see
(11)), one would expect differences in meaning to be associated with the
different orders. I have seen no discussion of the relevant cases, and I had
difficulty getting Bulgarian informants to verbalize any such difference.
An intriguingly similar situation arises in a completely different domain:
the ordering of prenominal adjectives in English. As is well known,
the ordering is largely fixed, although the principles governing the order
remain obscure.
(24) a. i. a big red house
ii. *a red big house
b. i. a stupid old man
ii. *an old stupid man
It may be that the ordering is determined by something like which predi-
cate expresses a more natural ‘‘general’’ class, where the relevant sense
of natural is not strictly linguistic. Be that as it may, the relevant point
here is that the ‘‘wrong’’ order can legitimately occur, but with a special
interpretation.
(25) a. a RED big house
b. an OLD stupid man
At first glance the difference between (24) and (25) might be seen as a pure
focusing e¤ect, a kind of contrastive focusing. In a Checking Theory,
for example, one might insert a Focus projection somewhere in the func-
tional structure of NP (or DP), with a feature that draws the focused
adjective to it.
I think, though, that this approach is not correct, and in fact that the
focusing effect here, as in the Serbo-Croatian inverted-wh cases, is
secondary. The crux of the focusing account is that the adjectives in the
inverted cases function semantically as though they occurred in their
uninverted order except for the fact that they are focused; that is, focusing
is laid on top of the usual interpretation that these adjectives would have.
But the focusing account of inversion can be sustained only for cases
that involve predicates for which differences in the reference of the NP
would seem to be indifferent to the predicates’ order. To take (24bi), if we
take the set of old men and then take the stupid ones of those, we should
get the same result as if we were to take the stupid men and then take
the old ones of those. So there appears to be no ‘‘logical’’ difference in the
interpretation of the two cases. But for an important class of cases the
intersective interpretation of the adjectives is not available.
(26) the second green ball
(Matthei 1979)
Superiority and Movement 153
Here the order is fixed: first we take the green subset of all balls, and then
we take the second (according to some ordering scheme) of those. Impor-
tantly, in such cases reversing the two adjectives produces more than just
a change in the focusing.
(27) the GREEN second ball
(27) is sensible only when there is some way to define a set of ‘‘second
balls’’ and then take the (unique) green one from it. For example, if we
came across a two-dimensional array of balls, we might understand the
second column of balls to be the set of ‘‘second balls’’ and then look for
the green one among them.
Most significantly, (27) is not ambiguous, and in particular it has no
interpretation that has the same extension as (26). This means that the
interpretation is not simply focusing laid on top of the usual interpre-
tation of the two adjectives. Rather, the fundamental logical relation
between the two adjectives has changed. (27) forces into existence a
weird notion, the set of ‘‘second balls’’; but as soon as we understand how
that notion might be realized in some concrete situation, the weirdness
subsides.
In turn, this means that if the adjectives are inverted by scrambling,
then that scrambling precedes the ‘‘compositional’’ semantic interpreta-
tion of modification. Importantly, this ordering is obligatory, as I think
there is no alternative purely ‘‘focused’’ interpretation of (27).
It seems to me that exactly the same holds for Serbo-Croatian scram-
bling of wh words: it precedes the establishment of the basic logical rela-
tions among the wh words, and thus precedes the level (by assumption,
PS) in which those relations are established. This conclusion is buttressed
by the fact that Serbo-Croatian in any case has a type of scrambling that
could do this, namely, WCO-fixing scrambling.
(28) a. ??Njegovi_i susjedi   ne  vjeruju nijednom politicaru_i.
         his        neighbors not trust   no       politician
     b. Nijednom politicaru_i njegovi_i susjedi ne vjeruju.
(Richards 1997, 30; from M. Mihaljevic, personal
communication)
6.2.2.2 Modeling Bulgarian Bulgarian differs from Serbo-Croatian in
not allowing scrambling of the two wh-words in a multiple-wh question;
as I noted earlier, drawing on Richards 1997 and Boskovic 1999, scram-
bling can occur among the subordinate wh words, but none of them can
scramble with the first wh word.
(29) a. Koj kogo kakvo e   pital?
        who whom what  aux asked
b. Koj kakvo kogo e pital?
c. *Kakvo koj kogo e pital?
(Boskovic 1995, 13–14, as reported in Richards 1997, 281)
This is trickier to model in RT than the Serbo-Croatian situation, and
in fact it cannot be modeled straightforwardly. Clearly, scrambling occurs
in Bulgarian, but not before wh dependency relations are determined;
otherwise, (29c) would be grammatical, as its counterpart is in Serbo-
Croatian, and its dependencies would be the reverse of those in (29a). At
the same time, though, ‘‘free’’ scrambling cannot occur after the determi-
nation of wh dependencies; if it could, we would again expect (29c) to be
grammatical, but with a ‘‘reconstructed’’ interpretation—that is, with an
interpretation identical to that of (29b). But this leads to the conclusion
that there is no scrambling at all, which of course is inconsistent with the
fact that both (29a) and (29b) are grammatical. So any straightforward
interleaving of scrambling and the other levels involved here (Case as-
signment, the establishment of wh dependencies) by itself will not do jus-
tice to the known facts.
I will propose a solution that capitalizes on what we already know
about wh movement: it always moves the independent variable. If this
feature of wh movement is held constant across all languages, then the
Bulgarian facts can be accounted for in this way: scrambling takes place
after wh dependencies are determined, but before wh movement, as shown
in (30).
(30) CS ------> PS ---(scrambling)---> SS
     Case       wh dependencies        wh movement
So wh movement will apply after scrambling, but because it is constrained
to move only the independent variable, it will lift that independent vari-
able, no matter where it lies among the scrambled wh phrases, to the top
of the structure.
For example, (29b) will have the following derivation (where I = independent and D = dependent):
(31)
The PS→CS representation is rigid, and wh dependencies set up in PS
are fixed: the ‘‘superior’’ wh word must be chosen, since it is identified as
the independent variable. Then scrambling occurs in the PS→SS
representation; by itself, it would give the appearance of ‘‘reconstructed’’
dependencies; the NP superior in CS would be interpreted as the inde-
pendent variable no matter what the surface order. But wh movement
then moves the independent variable to the top in SS, and so the
independent variable in effect ‘‘regains’’ its original superior position.
I must admit I feel uneasy about this account, because it feels like
‘‘cheating’’ against the spirit of RT. Specifically, by allowing wh move-
ment to target only the independent variable, we are ‘‘coding’’ a Superi-
ority property at a previous level (CS, PS) and then allowing it to reassert
itself at a later level.
Against that unease, I rehearse to myself the following. First, the
needed feature of wh movement is already attested for English, and for
that matter, for Bulgarian; and even Japanese, with no overt wh move-
ment, shows a pattern that reflects the wh variable dependencies.
(32) a. *John-ga naze nani-o   katta  no?
        John-nom why  what-acc bought q
b. John-ga nani-o naze katta no?
(Saito 1994)
Here scrambling must obligatorily reorder the wh words. We might take
these facts to show that (a) why is not a good independent variable (as
proposed in Williams 1994b), and (b) Japanese has a scrambling rule that
can apply before wh dependencies are determined. That is, (32a) is the CS
order, but (32b) is the scrambled PS order, the one on which wh depen-
dencies are calculated; (32a) as a CS order will yield an interpretation in
which naze is the independent variable, and so is ungrammatical.
Support for (a) comes from some observations about English quantifi-
cation constructions.
(33) a. *For every reason, someone left.
b. Everyone left for some reason.
c. For every girl, there is a boy.
If we regard the interpretive configuration (Q1 (Q2 (. . .))) as a dependency
of Q2 on Q1, then (33a) shows that reasons are not good independent
variables, but they are good dependent variables. (33c) is a control show-
ing that (33a) is not ungrammatical simply because every cannot take
wide scope from the preposed position. (33a) might have an interpreta-
tion for some speakers in which someone has wider scope than every, but
that is irrelevant here. See chapter 5 and section 6.2.2.3 for more on the
varieties of scrambling in Japanese.
Further support for this interpretation of (32) comes from the following
example:
(34) Dare-ga naze nani-o   katta  no?
     who-nom why  what-acc bought q
     'Who bought what why?'
(Richards 1997, 282)
The additional wh word dare in this example makes it possible for naze to
precede nani. In the context of the proposal just made, the reason is that
dare is now the independent variable, on which both naze and nani are
dependent, and so naze is not forced into the position of being the inde-
pendent variable.
The second thing I rehearse against my unease about the solution
under discussion is that it might not be necessary to countenance
derivations like (31), where a short movement ‘‘hides’’ invisibly beneath a long
movement. I turn to this topic in section 6.3.
6.2.2.3 Long versus Short Monoclausal Scrambling in Japanese Japanese
presents a similar puzzle concerning A and Ā movement, and likewise
presents a puzzle for straightforward Checking Theories. According
to a well-known generalization originally due to Kuroda (1970) (see also
Hoji 1986), Japanese quantifiers are unambiguous in situ, but movement
introduces scope ambiguity.
(35) a. Dareka-ga   daremo-o     hihansita. (∃ > ∀ only)
        someone-nom everyone-acc criticized
     b. Daremo-o dareka-ga t hihansita. (ambiguous)
(Kuroda 1970)
One way to understand the difference between (35a) and (35b) is to
suppose that a moved quantifier may be interpreted in either its moved or
its unmoved position and thus has an ambiguous relation to anything that
it moves over. In classical terms we might understand this in the sense of
A versus Ā movement: A movement results in interpretation in the moved-to
position, whereas Ā movement results in interpretation in the moved-from
position (i.e., scope reconstruction for Ā but not A movement). In the RT
relativization of the A/Ā distinction, we would say instead that scrambling
occurs either before or after scope determination.
The special problem for Checking Theories arises in cases where two
NPs move over the subject.
(36) a. John-ga  dareka-ni   daremo-o     syookaisita. (∃ > ∀ only)
        John-nom someone-dat everyone-acc introduced
     b. Dareka-ni John-ga daremo-o syookaisita. (∃ > ∀ only)
     c. Dareka-ni daremo-o John-ga syookaisita. (∃ > ∀ only)
     d. Daremo-o John-ga dareka-ni syookaisita. (ambiguous)
(Yatsushiro 1996, as reported in Richards 1997, 82)
(36a) and (36b) are as expected. However, (36c) is surprising on the
Checking Theory account: if (36c) has the representation in (37), we ex-
pect it to be ambiguous, which it is not.
(37) [NP1-ni [NP2-o [NP-ga t1 t2 V]]]
The reason is that if both NP1 and NP2 are scopally ambiguous be-
tween deep and derived positions, then either scope order is possible:
NP1 > NP2 if NP1 takes scope from the derived position and NP2 takes
scope from the deep position, and the reverse if the reverse. But the fact is
simply that NP2 cannot take scope over NP1.
Importantly, though, if the two NPs crossing the subject switch their
relative order, then ambiguity results again.
(38) a. Daremo-o dareka-ni John-ga syookaisita. (ambiguous)
     b. Dareka-ni daremo-o John-ga syookaisita. (∃ > ∀ only)
The problem for Checking Theory is that it atomizes the NP movement
relations here, as each NP is checked independently of the others. It
therefore cannot account for effects that arise from the relative
‘‘movement’’ of two NPs with respect to each other, just the kinds of effects for
which RT was envisaged.
Richards (1997) uses such examples to promote the idea that Superior-
ity holds for A movement, once ambiguity is controlled for. That is, (38b)
illustrates A movement obeying Superiority, and (38a) doesn’t count, be-
ing ambiguous. In Richards’s account of (38a) the two NPs move to the
same functional node, and Superiority dictates their relative order. In
(38b) they again move to the same node, in the same order, but an extra
higher attractor (EXTRA in (39)) attracts NP-o to a higher position,
giving rise to ambiguity; since the extra attractor only attracts NP-o, Su-
periority does not prevent this movement (see Richards 1997 for formu-
lation of the relevant principles).
(39) [NP2-o [NP1-ni [t2 [NP-ga t1 t2 V]]]]EXTRA
Positing extra attractors does not solve this problem, though. In prin-
ciple, there is now no reason not to posit yet another attractor that
attracts NP-ni over the (derived) position of NP-o, thus again predicting
that the order ‘‘NP-ni NP-o NP-ga’’ will be ambiguous.
(40) [NP1-ni [NP2-o [t1 [t2 [NP-ga t1 t2 V]]]]EXTRA1 ]EXTRA2
But we know from (36c) that it is not.
The basic generalization about Japanese quantifiers is, ‘‘If two NPs
cross, ambiguity results,’’ understood in such a way that NP-ni and NP-o
do not cross in (36c), but do cross in (38a). But Checking Theory, because
it atomizes movement relations, cannot deal with cases where several
things move in concert. It must be augmented with an extrinsic principle
that controls either the input or the output of the derivation in a way that
has nothing to do with the operation of Checking Theory itself. In this
way, Checking Theory can be shielded against these and other related
empirical challenges, but at the cost of having less and less to say about
how these systems actually work.
How are these facts to be accounted for in RT?
Let us suppose that the orders of NP-ga, NP-ni, and NP-o are
represented in CS by the following structure:
(41) [NP-ga [NP-ni [NP-o V]]]
And let us suppose that SS can be generated by general rules, such as
(42).

(42) S → [NP1 [NP2 [NP3 V]]]

Suppose further that SS uniquely determines quantifier scope; that is,
SS→QS is strictly enforced.
The problem then reduces to this: how can (41) be mapped onto (42)
isomorphically? There is only one way, of course: NP-ga → NP1, and so
on. This is why (41) as a surface structure is not ambiguous. The other
mappings are misrepresentations of (41). The following two are the mis-
representations that give rise to (38a) and (38b), respectively:
(43)
(44)
(43) shows why ‘‘NP-ni NP-o NP-ga’’ has wide scope for NP-ni, and (44)
shows why ‘‘NP-o NP-ni NP-ga’’ has wide scope for NP-o.
Now the question remains, why can NP-ni have wide scope in ‘‘NP-o
NP-ni NP-ga V,’’ when NP-o does not have wide scope in ‘‘NP-ni NP-o
NP-ga’’? The answer must be that ‘‘. . . -o . . . -ni . . .’’ is more distant from
CS than ‘‘. . . -ni . . . -o . . .’’ and so is warranted only if a difference in
meaning is achieved—that is, only if the further mismatch is compensated
by a closer match to QS. At least, that is the logic of RT.
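The ‘‘distance from CS’’ invoked here is left informal in the text. As a purely illustrative sketch (the metric and the function name are my own, not part of RT), one might measure distance as the number of word pairs whose relative order reverses the canonical CS order of (41):

```python
# Illustrative sketch only: RT does not define "distance from CS" formally.
# Here distance is taken to be the number of pairwise order reversals
# (inversions) relative to the canonical CS order [ga, ni, o] of (41).
from itertools import combinations

CS_ORDER = ["ga", "ni", "o"]  # assumed canonical order, from (41)

def distance_from_cs(surface, base=CS_ORDER):
    """Count pairs in `surface` whose relative order reverses `base`."""
    rank = {np: i for i, np in enumerate(base)}
    return sum(1 for a, b in combinations(surface, 2) if rank[a] > rank[b])

# (36c) "NP-ni NP-o NP-ga": two pairs reversed (ni-ga, o-ga)
# (38a) "NP-o NP-ni NP-ga": three pairs reversed (o-ni, o-ga, ni-ga)
print(distance_from_cs(["ni", "o", "ga"]))  # → 2
print(distance_from_cs(["o", "ni", "ga"]))  # → 3
```

On this toy metric, ‘‘. . . -o . . . -ni . . . -ga’’ is indeed farther from CS than ‘‘. . . -ni . . . -o . . . -ga,’’ matching the asymmetry the argument requires; nothing in the text commits the analysis to this particular measure.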
6.2.2.4 Long versus Short Scrambling in Hungarian The standard
treatment of long versus short scrambling facts is to posit two different
movements, A and Ā, and (or) two different positions, A and Ā, which
are their respective targets. This is the strategy adopted in Checking
Theories, for example. But in the context of RT, we could instead propose
a single position that is the ‘‘target’’ of two different ‘‘movements’’:
a ‘‘virtual’’ movement, which arises as a part of the (mis)representation
of one level by another, and an Ā movement, which is intralevel SS
movement.
Analysis of WCO and Superiority facts in Hungarian suggests that this
must be so. There appears to be only one Focus position, which appears
just before the verb and whose filling triggers verbal particle postposing;
but WCO and Superiority violations arise only when the position is filled
by long wh movement.
(45) a. Kit_i   szeret az  anyja_i    t_i?
        who.acc loves  the mother-his
     b. *Kit_i  gondol az  anyja_i    hogy Mari szeret t_i?
        who.acc thinks the mother-his that Mari loves
(E. Kiss 1989, 208)
This suggests that we cannot associate the preverbal Focus position with
either A or Ā status; that is, we cannot call it an A or an Ā position,
independent of when the movement takes place. In RT we need only set
things up in Hungarian so that the Focus position is accessible some time
before CP embedding takes place. There is no need to fix ahead of time
how a given position in SS will be filled.
6.3 Masked Scrambling
The worrisome thing about the last derivation posited above for Bulgar-
ian (see (31)) is that there is an ‘‘invisible’’ structure in which the inde-
pendent wh word is not superior to the rest. A similar situation arose in
the discussion of Japanese quantifier scrambling in section 6.2.2.3. Per-
haps ‘‘invisible’’ scramblings are not allowed. If so, then derivation (31)
will not occur, but another one will be allowed, as the scrambling in that
case is not invisible.
(46)
A ‘‘paradox’’ arises from having both A and Ā movements available
for the same ‘‘process.’’ The problem is that a sentence in which Ā
movement is supposed to have applied can always be viewed instead as the
outcome of an application of A movement, followed by the application
of Ā movement; the surface order will be the same, but the interpretive
effects will be different. As we saw in (21), repeated here, in Serbo-
Croatian short scrambling precedes dependent-independent variable
fixing, whereas long scrambling follows it (and so reconstructs for it).
(47) a. Ko  je  koga vidjeo?
        who aux whom saw
     b. Koga je ko vidjeo?
     c. Ko  si  koga tvrdio  da   t je  istukao t?
        who aux whom claimed that   aux beaten
     d. *Koga si ko tvrdio da je istukao?
But what prevents a derivation of (47d) in which first the two wh words
switch positions in the lower clause by short scrambling, and then the
same wh words move to the Ā position in the higher clause, thus
nullifying Superiority effects?
(48) D-Structure → A scrambling → Ā movement

(Of course, in RT scrambling is not classical movement; I put the matter
in classical terms here because the issue is not specific to RT.) If such a
derivation were possible, (47d) should be grammatical. We must prevent
A scrambling from applying to the wh words in the lower clause, or at
least prevent wh movement from applying to its output.
There is a subtlety in determining what would count toward making a
scrambling ‘‘invisible.’’ Certainly part of it has to do with whether the
surface string shows the scrambling order; if it does, then the scrambling
is certainly not invisible. However, there is another way in which a
scrambling, even one that did not manifest itself in the surface string,
could achieve visibility: it could induce some effect in the interpretation.
In fact, visibility is a matter of interpretation anyway. The scrambling
is visible in the obvious sense if there is some sign of it in the phonological
interpretation; therefore, one could easily imagine that the semantic
interpretation could provide some sign as well, in the form of an effect on
meaning.
The crucial case of this type would be the one in which long scram-
bling appeared to give rise to WCO repair, by virtue of a prior, ‘‘string-
invisible’’ short scrambling.
(49) CS: [NP1 NP2 V]S1
     PS: [NP2 NP1 V]S1
     SS: [NP2 [NP1 V]S1]S2
Scrambling takes place at both CS→PS and PS→SS; the CS→PS
scrambling is string invisible. If this derivation is allowed, then the crucial
question is, what relation does it bear to the derivation in (50), with which
it coincides in both CS and SS?
(50) CS: [NP1 NP2 V]S1
     PS: [NP1 NP2 V]S1
     SS: [NP2 [NP1 V]S1]S2
Although (49) and (50) are string indistinguishable, they might differ in
interpretation. The difference would center on the interpretive properties
of PS and (under the assumptions we have made) would include the
bound variable dependencies that WCO governs. A single long scram-
bling will appear to reconstruct for such dependencies; but a short
scrambling, followed by a long scrambling, will not. So the crucial ques-
tion is, if a language has both short and long scrambling, and the short
scrambling has interpretive effects, are all long scramblings ambiguous?
In the cases examined in this book, it appears they are not. From this we
would tentatively conclude that the prohibition against ‘‘invisible’’
scrambling is a prohibition against ‘‘string-invisible’’ scrambling. How-
ever, I regard this as an open question, and it is entirely possible that the
correct answer is more complicated than the present discussion suggests:
it might, for example, depend on how evident the semantic effect is. In
other words, there is no conclusion about invisible scrambling that
follows from the central tenets of RT, and in fact a number of different
answers to the questions about it are compatible with those tenets. In
what follows I will explore some considerations suggesting that ‘‘string-
invisible’’ scrambling should not be allowed, but further research could
uncover a more complicated situation.
We can in fact observe the behavior of masked scrambling in English.
In the context of RT, scrambled orders (i.e., ones that deviate from TS
and later structure) are marked; and such deviation can be tolerated only
to achieve isomorphy somewhere else. But marked orders must be ‘‘visi-
ble’’; that is, there must be some way to reconstruct them. But if the re-
gion in which the marked order occurs has been evacuated, then that
evidence is gone; for example, once wh movement has taken place in (51),
no evidence remains to show which of the two orders was instantiated in
the lower clause. In such a case we assume the unmarked order, as it has
the lowest ‘‘energy state.’’
(51) a. wh_i . . . t_i NP
     b. wh_i . . . NP t_i
(52) Assume Lowest Energy State
If there is no evidence for the marked order, assume the unmarked
order.
There is some evidence from English for such a supposition. The evi-
dence comes from the interaction of scrambling and contraction. The
known law governing contraction is (53), illustrated in (54).
(53) Don’t contract right before an extraction or ellipsis site.
(54) a. Bill’s in the garage.
b. Do you know where Bill is t?
c. *Do you know where Bill’s t?
But because English has scrambling that can potentially move extraction
sites away from contractions, we can see how (53) interacts with such
scramblings.
The ‘‘normal’’ order for a series of time specifications within a clause
runs from the smallest scale to the largest.
(55) The meeting is at 2:00 p.m. on Thursdays in October in odd years
. . .
Any of these time specifications can be questioned.
(56) a. When is the meeting at 2:00 p.m. t? (Answer: on Thursday)
b. When is the meeting t on Thursday? (Answer: at 2:00)
Furthermore, the time specifications can be scrambled, up to ambiguity.
(57) The meeting is on Thursdays at 2:00 p.m.
Crucially, though, scrambling cannot be used to evade the restriction on
contraction.
(58) a. Do you know when the meeting is t on Thursday? (Answer: at
2 p.m.)
b. *Do you know when the meeting’s t on Thursday? (Answer: at
2 p.m.)
c. Do you know when the meeting is at 2:00 p.m. t? (Answer: on
Thursday)
d. Do you know when the meeting’s at 2:00 p.m. t? (Answer: on
Thursday)
e. *Do you know when the meeting’s on Thursday t? (Answer: at
2 p.m.)
(58b) clearly runs afoul of the trace contraction law (53); but why is (58e)
not a possible structure that would give the appearance that cases like
(58b) had evaded the law? (58e) must be eliminated, and a prohibition
against masked scrambling (52) looks like a promising means of doing
that. But again, I think it would be foolish not to explore more subtle
possibilities governing visibility.
6.4 Locality in RT
In chapter 3 the LEC was used to explain certain locality e¤ects, and in
particular the correlation between locality of operations and other prop-
erties of operations. This naturally raises the issue of whether all locality
effects can be so derived. In fact, not only scrambling is affected by the
locality imposed by the LEC—wh movement is as well. As detailed in
chapter 3, wh movement cannot extract from structures that are not
embedded until after the level at which wh movement applies, and in fact
the islandhood of nonbridge verb complements was cited as an example
of that kind of explanation.
But if we accept the results of this chapter, there will be some obstacles
to reducing all locality to the LEC. Specifically, restrictions on wh move-
ment that fall under the traditional rubrics of Subjacency and the ECP
cannot be explained.
The Wh Island Constraint, for example, cannot be derived. (59) is a
typical Wh Island Constraint violation.
(59) *What_i do you wonder who bought t_i?
Assume the LEC. As attested by the presence of the wh word in its
SpecC, the embedded clause is built up to the level of CP at the level at
which wh movement is defined—let’s say, SS; but if wh movement is
available at SS, there is no timing explanation for the ungrammaticality
of (59). If CP is present in the embedded clause, then it is also present,
and available for targeting, in the matrix clause.
Of course, one could supplement the LEC with more specific ideas
about how levels are characterized. For example, one could require that
all movement in a level applies before all embedding in a level; then tim-
ing would account for the Wh Island Constraint.
I am not at all convinced this is worthwhile. To begin with, there are
languages that are reported not to have a Wh Island Constraint; this
would at least tell us that the stipulation just mentioned was subject to
variation, an odd conclusion given its ‘‘architectural’’ flavor. We would
especially find ourselves in a bind if we were to accept Rizzi’s (1982)
conclusion that Italian has a wh island paradigm like the following:
(60) a. *wh_i . . . [wh . . . [that . . . t_i . . . ]]
     b. wh_i . . . [that . . . [wh . . . t_i . . . ]]
That is, extraction from a that clause inside an indirect question is un-
grammatical, but extraction from an indirect question inside a that clause
is grammatical. Since both wh clauses and that clauses clearly involve CP
structure, they are introduced at the same level, and there is no way to
make this distinction with timing under the LEC. If it is ‘‘too late’’ to
extract wh in (60a), then it is too late in (60b) as well, and so there is no
way to distinguish them. See Rizzi 1982 for examples and for an account
of how languages vary with respect to wh-island effects.
I will tentatively conclude, then, that wh movement is subject to local-
ity constraints on embedding, beyond those predicted by RT.
Importantly, scrambling cannot be subject to constraints beyond those
RT imposes. That is because scrambling is not a rule operating within
any level, but arises as competing requirements of Shape Conservation
are played out. So it is important that scrambling not show any locality
conditions that cannot be reduced to the LEC and its e¤ect on timing.
From this point of view, the conclusions reached in this chapter
about multiple-wh questions are especially significant. Rudin (1988)
argues that the primary and secondary wh movements are different sorts
of movement—the difference between wh movement and scrambling,
respectively. We would thus expect the primary wh movement to obey
Subjacency, and the secondary wh movements to obey only the strictures
imposed by the LEC.
Richards (1997) documents detailed differences between the primary
and secondary wh movements that suggest this distinction might be cor-
rect. Interestingly, Richards’s own theory draws no distinction between
the movement of wh and the movement of other elements; they are all
instances of Move, which has a uniform (if spare) set of properties. In-
stead, Richards proposes what he calls a ‘‘Subjacency tax’’ theory of how
rules are governed by constraints: if several movements target the same
functional projection, the first movement obeys Subjacency, but the rest
of the movements are free to apply in defiance of Subjacency (the first one
having paid the ‘‘Subjacency tax’’). The tax notion exactly distinguishes
the first movement from the rest.
Consider, for example, the following cases in Bulgarian:
(61) a. *Koja kniga_i otrece senatorat   [malvata  ce   iska   da zabrani t_i]?
         which book   denied the-senator  the-rumor that wanted to ban
     b. ?Koj  senator koja  kniga_i otrece [malvata  ce   iska   da zabrani t_i]?
         which senator which book   denied  the-rumor that wanted to ban
(Richards 1997, 240)
The single complex-NP extraction of koja kniga in (61a) is ungrammatical
because of Subjacency; but in (61b) the same extraction causes only weak
unacceptability, because the primary extraction targeting the matrix
SpecC (of koj senator) obeys Subjacency. The movement of NP1 ‘‘pays
the Subjacency tax’’; NP2 is then free to move in violation of Subjacency,
which it in fact does in this example under reasonable assumptions. (See
Richards 1997 for the original formulation of this theory and extensive
examples.)
In the end, then, Richards’s theory delineates approximately the same
difference between the primary and secondary wh movements that Rudin
(1988) proposed, and that is needed in RT; Richards simply derives that
difference from his notion of the Subjacency tax.
We have already discussed examples that cast doubt on the view that
the two movements are the same kind of movement in the first place:
namely, the Bulgarian examples in which a secondary wh word in an
embedded clause does not move to its primary counterpart, but never-
theless obligatorily moves within its own clause ((16), repeated here).
(62) a. Ko_k    tvrdis    [da   koga    t_k voli]?
        who.nom claim.2sg  that who.acc     love.3sg
     b. *Ko_k tvrdis [da t_k voli koga]?
(Konapasky 2002, 97)
Such examples suggest that the difference between the primary and the
secondary wh words has nothing to do with wh attraction. If that were so,
the Subjacency tax theory would be irrelevant, as only a single wh word
would ever be moved to SpecC anyway.
6.5 Conclusion
In this chapter I have pursued the notion that scrambling and wh move-
ment are fundamentally di¤erent: wh movement is an intralevel move-
ment rule, and scrambling is simply the misrepresentation of one RT level
by the next level.
I have argued in particular that in multiple wh movement languages
only one wh expression undergoes wh movement, and the rest undergo
scrambling, essentially Rudin’s (1988) conclusion. I have argued that
assimilating scrambling to wh movement is a mistake, and that in partic-
ular the theory proposed by Richards (1997) leaves significant questions
unanswered.
After citing problems for the views of others, especially Richards, I
think it is only fair to expose a problem with the RT formulation of
multiple movement. The problem arises in trying to state precisely what
occurs at what levels and to correlate that with conclusions drawn from
other languages. For Bulgarian in particular, the problem manifests itself
as a conflict between the ordering of scrambling and its locality. Bulgar-
ian wh scrambling is a long-distance phenomenon (at least in the dialects
that allow it; see the discussion surrounding (16)), penetrating CPs in
particular.
(63) Koj   profesor_i koj   vapros_j t_i iska   [da kaze molitva
     which professor  which question     wanted  to say  prayer
     [predi  da   obsadim      t_j]]?
      before that we-discussed
‘Which professor wanted to say a prayer before we discuss which
issue?’
(Richards 1997, 109; from R. Izvorski, personal communication)
So wh scrambling must occur after CP embedding.
At the same time I have supposed that wh scrambling occurs before wh
movement, since this explains, in the context of RT, why the independent
variable is always exterior. Combining these conclusions with the finding
of previous chapters that wh and CP embedding occur in the same level
(say, SS) results in the following ‘‘ordering’’ paradox (x > y means ‘y
happens before x’):
(64) a. wh movement > wh scrambling
     b. wh scrambling > CP embedding
     c. CP embedding = wh movement
By one consideration, then, wh scrambling is strictly ordered before wh
movement; by another, they occur at the same level, exactly the level at
which CP embedding occurs.
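As an informal illustration (a sketch of mine, not part of the text's formalism), the clash in (64) can be checked mechanically: reading "happens before" as a transitive relation, (64a) and (64b) together force CP embedding to precede wh movement, contradicting (64c). The relation names are taken from (64); everything else is an assumption of the sketch.

```python
# Illustrative sketch of the ordering paradox in (64).
from itertools import product

def closure(pairs):
    """Transitive closure of a 'happens before' relation."""
    pairs = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(pairs), repeat=2):
            if b == c and (a, d) not in pairs:
                pairs.add((a, d))
                changed = True
    return pairs

before = {("CP embedding", "wh scrambling"),   # (64b): embedding precedes scrambling
          ("wh scrambling", "wh movement")}    # (64a): scrambling precedes movement

# (64c) says CP embedding and wh movement occur at the same level, yet the
# closure derives that CP embedding strictly precedes wh movement: paradox.
paradox = ("CP embedding", "wh movement") in closure(before)
```

Here `paradox` comes out true, mirroring the conclusion that one of the three assumptions must be given up.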
The only way to dissolve paradoxes is to attack their assumptions until
one falls. The easiest one to attack here is the identification of the level at
which CP embedding takes place and the level at which wh movement
takes place. Ordering either wh scrambling or wh movement before CP
embedding is out of the question; in particular, it is incoherent, as it is
impossible to extract from something that is not embedded yet. The
ordering we need is CP embedding, wh scrambling, wh movement. There
is no paradox in this order, so long as there are further levels after the
level of CP embedding. We simply don’t have the means to independently
identify the other levels.
The other possibility would be to develop some means of allowing wh
scrambling to occur after wh movement, but in such a way that a sec-
ondary (non-D-linked) wh expression could not be scrambled above the
primary one. The latter task is daunting because we have seen that such
scrambling above the primary wh expression is possible in certain lan-
guages: witness the long topicalization of wh words in Japanese, for ex-
ample, discussed in chapter 5. I will leave the problem unresolved.
Chapter 7
X-Bar Theory and Clause Structure
Taken together, this chapter and the next provide what I would tenta-
tively call the RT model of phrase structure, inflection, and head-to-head
phenomena. Even taken together, they are too ambitious for their length,
as they propose a theory of phrase structure that incorporates (i.e., elimi-
nates) both overt and covert head movement, and an account of the
morphology/syntax interface (‘‘morphosyntax’’) that presumes to forgo
‘‘readjustment’’ rules.
The two chapters are interdependent in that this chapter introduces
the definitions of phrasal categories, the mechanisms responsible for
agreement and Case assignment, and the relation between these and
the inflectional categories marked on the verbal head, and the next chap-
ter proposes a theory about how the inflected verbal head is spelled
out. Beyond that, this chapter uses mechanisms that are not fully devel-
oped or justified until the next chapter: specifically, the marking of the
complement-of relation on category nodes (using the sign ‘‘>’’), the no-
tion of reassociation, and the particular theory of multiple exponence.
Before turning to these matters, I would like to outline why I think
there is an RT model of phrase structure that is different from the stan-
dard treatment, and to briefly suggest how it is different. The phenomena
explored here are accounted for in the standard model by a combination
of X-bar theory and movement governed by the Head Movement Con-
straint (HMC; Travis 1984). The HMC is commonly understood to be a
subcase of Relativized Minimality (Rizzi 1990). Relativized Minimality
says that locality conditions are parameterized and that the significant
subcases correspond to the A, A, and V (or head) subsystems. But in the
past several chapters I have suggested that the A/A distinction should be
generalized to, or dissolved into, a more general parameterized distinction
(A/A/A/A) defined by the RT levels, and that the locality associated with
each of these is determined by the level in which it is defined, in that it is
determined by the size of the structures that are assembled at that level.
But now the Relativized Minimality series ‘‘A/A/head’’ becomes awk-
ward. There is no natural place for ‘‘head’’ in the new generalization.
This suggests that the locality of head movement needs a separate ac-
count, not related to the A/A distinction or its generalization in RT.
V cannot be located in any particular level, but in fact occurs in every
level; it differs in this respect from the entities A/A/A . . . and so cannot
be assimilated to them. In fact, verbs, and heads in general, are inde-
pendently parameterized by the RT levels. See the more extensive discus-
sion of Relativized Minimality in section 7.3.
In the following discussion the sign ‘‘>’’ indicates the complement-of
relation. In this chapter and the next, we will see that this relation always
holds between two elements, but in fact elements of quite diverse types,
including at least the following:
(1) a. a word and a phrase (saw > [the boy]NP)
b. a morpheme and a morpheme (pick < ed )
c. a word and a word (V > V)
d. a feature and a feature (Tense > AgrO)
This is further complicated by the fact that words make up phrases,
features make up word labels, and so on, and there must be some rela-
tion between the complement-of relations of complex forms and the
complement-of relations of their parts. What follows, in this chapter and
the next, is a calculus of these relations that seems to me to be the most
appropriate for RT.
These two chapters flesh out a view of the relation between syntax and
morphology that I have put forward in a number of places, particularly in
Williams 1981a and 1994a,b and in Di Sciullo and Williams 1987. In
those works I viewed the Mirror Principle as arising from the fact that
words and phrasal syntax instantiate the same kinds of relations. As
argued in Di Sciullo and Williams 1987, the Mirror Principle is nothing
more than the compositionality of word formation; that is, [pick + -ed]V
as a morphological unit is equivalent to [did pick]VP as a syntactic unit.
Both instantiate the complement-of relation between T and V, but one
does it in a head-final ‘‘word’’ structure with its properties, and the other
in a head-initial ‘‘phrase’’ structure with its own different properties.
These two chapters attempt to provide a more explicit calculus to back up
that claim.
This view, sometimes called lexicalism, has been confused with another
view, one that goes back to Den Besten 1976 and before that to Genera-
tive Semantics, related to ‘‘deep versus surface’’ lexical insertion. The no-
tion ‘‘surface’’ (or ‘‘late’’) lexical insertion of course only makes sense in a
derivational theory. But even in a nonderivational theory we can ask
what the relation is between the form of the word and the environment it
appears in. In earlier work I took the view that the lexicon contains its
own laws of formation, sharing some features with, but different from,
the laws of syntax, and that the ‘‘interface’’ between the lexicon and syn-
tax could be narrowed to exactly this: the lexicon produces lexical items
with their properties, and syntax determines the distribution of such
words solely on the basis of their ‘‘top-level’’ properties, not on the basis
of how they came to have those properties during lexical derivation. I
think this view is vindicated by the nearly inevitable role that ‘‘lexicalism’’
plays in RT.
I think the question of whether insertion is ‘‘late’’ or ‘‘early’’ depends to
such an extent on the particular theory in which it is asked that to raise it
in the abstract is useless. For example, Generative Semantics and Den
Besten’s (1976) theory are quite different frameworks, so different that
each one’s assumption of ‘‘late insertion’’ can hardly be seen as support-
ing it in the other. But I do think that the above-mentioned question
about ‘‘lexicalism’’ can be fruitfully raised as a general programmatic
question.
7.1 Functional Structure
I will propose here an X-bar theory in which a lexical item directly ‘‘lex-
icalizes’’ a subsequence of the functional hierarchy, where by functional
hierarchy I mean the sequence of elements that make up clause structure:
T > AgrS > AgrO . . . Aspect > V. In the construction of a clause, the
entire functional hierarchy must be lexicalized; however, there is more
than one way to accomplish that. For example:
(2) [tree diagrams not reproduced: alternative lexicalizations of the
functional hierarchy, e.g., was in (2a) lexicalizing the subsequence
T > AgrS]
In the theory to be presented, it is not just that was in (2a) bears some
relation to the bracketed subsequence T > AgrS; rather, it is T > AgrS
in that ‘‘T > AgrS’’ is its categorial label. All types of elements—
morphemes, words, compounds, phrases—can realize subsequences; and
no element can realize anything but a subsequence. In this theory ‘‘lexi-
calizing a subsequence’’ is not a derived property of lexical items; rather,
it is simply what lexical items do.
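As a rough sketch of this idea (my own illustration, with an abbreviated four-member hierarchy standing in for the full Pollock-Cinque sequence; the function names are invented for the example), ‘‘lexicalizing the entire hierarchy’’ can be read as covering it, in order, with contiguous subsequences:

```python
# Illustrative sketch: each item realizes a contiguous subsequence of the
# functional hierarchy, and a clause must lexicalize the whole hierarchy.
# The four-member list abbreviates T > AgrS > ... > AgrO > V.
HIERARCHY = ["T", "AgrS", "AgrO", "V"]

def is_subsequence(label):
    """True iff label is a nonempty contiguous stretch of the hierarchy."""
    n = len(label)
    return n > 0 and any(HIERARCHY[i:i + n] == label
                         for i in range(len(HIERARCHY) - n + 1))

def lexicalizes(items):
    """True iff the items' labels, in order, jointly spell out the whole
    hierarchy, each item realizing one contiguous subsequence."""
    if not all(is_subsequence(label) for label in items):
        return False
    return [cat for label in items for cat in label] == HIERARCHY

# An auxiliary realizing T > AgrS plus a main verb realizing the rest
# (an assumed split in the spirit of (2)):
lexicalizes([["T", "AgrS"], ["AgrO", "V"]])   # well formed
lexicalizes([["T", "AgrO"], ["AgrS", "V"]])   # ill formed: not subsequences
```

A single inflected verb realizing the whole chain passes the same check, which is the sense in which ‘‘there is more than one way’’ to lexicalize the hierarchy.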
I suppose the biggest mystery about language that these proposals turn
on is where the functional hierarchy comes from in the first place—why
there is such a thing as Cinque’s (1998) functional hierarchy. Put dif-
ferently, why is intraclause embedding so fundamentally different from
interclause embedding? The question is acute in all frameworks, but has
not been seriously addressed. I am not innocent. The tradition of solving
syntactic problems by introducing new fixed levels of internal clause
structure includes my own dissertation (Williams 1974), which sought to
explain transformational rule ordering by appealing to four levels of in-
ternal clause structure (and I secretly thought there were six) and arguing
that the subparts of that structure had independent existence (as small
clauses), but without asking where the structures came from.
To me, the mystery is this: why aren’t all embeddings on a par, each
phrase’s properties being determined by its head, or its subparts, in the
same way? In such a theory the transition from V to its complement CP
would be no di¤erent from the transition from T to AgrS or from AgrS to
AgrO. But this simplest state of affairs is not what we find, and in ac-
knowledgment of the mysterious distinction I will refer to them as com-
plement embedding and functional embedding, respectively.
A related mystery is, why is the internal structure so rigid? Cinque
(1998) has identified over a hundred steps from the top to the bottom of
a clause. In fact, the clause might not have the purely linear structure
Cinque suggests. It at least seems to have some adjunct ‘‘subcycles,’’ as
shown here:
(3) a. John let us down every month every other year every decade . . .
b. John let us down on every planet in every galaxy . . .
c. [[[VP] XP] XP] . . .
d. ?John let us down because Mary was there because he was sick
. . .
Recursion of time and place is possible, as schematized in (3c), but per-
haps not of causes. The obvious nesting of meanings in such subcycles
suggests that the whole structure itself might be explicated in terms of
meaning, but nothing substantive has been forthcoming, and I have
nothing to add myself.
At any rate, what follows is a theory of what the syntax of expressions
that express a single functional hierarchy can look like, and it executes
the idea that all items, including all lexical items, lexicalize (or realize)
subsequences of the functional hierarchy. As interesting as I think the
consequences are, I must warn in advance that my proposals do not
address this mystery of why complex phrases with fixed functional struc-
ture exist in the first place. I hope that whatever the solution to this mys-
tery turns out to be, it will be compatible with what follows, and so I will
take the existence of the functional sequence and its linear structure as
axiomatic.
7.2 An Axiomatization of X-Bar Theory
Consider the complement embedding of the direct object NP under V.
Full NPs (or DPs, whichever turns out to be right) may not exist until SS;
at least, some of their components, such as relative clauses, do not exist
until then. Nevertheless, TS contains a ‘‘primitive’’ version of an SS NP,
CS a more developed version, and so on; and these are in correspondence
with one another under Shape Conservation.
(4) a. TS: [amalgamatev holdingsnp]vp
    b. CS: [amalgamateV [his holdings]NPacc]
(The introduction of adjuncts, in this case his, will be taken up later.) To
propose a term, there is a shadow of the CS NP his holdings in TS, and
that shadow is its correspondent under representation: the np holdings in
TS.
Functional embedding, on the other hand, introduces material into a
tree at a later level that has no shadow or correspondent in TS. Suppose,
for the purpose of exposition, that T(ense) is introduced in SS; then the
surface structure in (5b) will represent the Case structure in (5a).
(5) a. CS: [amalgamateV [his holdings]NPacc]VP
    b. SS: [amalgamate[T>V] [his holdings]NPacc][T>V]P
T in SS clearly has no correspondent in CS.
T is not an independent node in the surface structure or any other
structure; rather, it is a feature that has been applied to the projection of
V. (Shortly I will explain how such structures arise and what expressions
of the form [x > y] mean.) Functional embedding can also introduce a
lexical head, like complement embedding. In this version, auxiliaries re-
alize functional elements.
(6) a. CS: [amalgamate [his holdings]NPacc]
    b. SS: [willT [amalgamate [his holdings]NPacc]VP]TP
Complement embedding, on the other hand, has no analogue of (5b); it
is always done by explicit subordination to an overt head. That is to say,
the main verb is never realized as an a‰x on its direct object. This is be-
cause the construct consisting of the a‰x plus the direct object would
have to have a label, and that label would violate the second axiom in the
formalism I will provide shortly.
Important for the present discussion is that in neither style of func-
tional embedding (feature or full word) does a shadow of the embedding
element (T in (5), will in (6)) appear in TS.
In this section I want to develop the rationale for the distinction be-
tween complement embedding (4) and functional embedding (5). It is
central to the way in which RT and minimalist practice di¤er, and the
distinctive consequences of RT stem from it. The discussion culminates in
an axiomatization of X-bar theory.
Although complement and functional embedding differ in the funda-
mental way just mentioned, they both are compatible with the principle
of Shape Conservation, which holds of the successive members of a deri-
vation regardless of what kind of embedding is involved.
Consider the mapping in (4); it puts in correspondence the elements in
the theta structure and the Case structure, and also their relations, in the
following sense. First, the TS ‘‘verb’’ amalgamate and the CS ‘‘verb’’
amalgamate are in what we might call ‘‘lexical’’ correspondence; that is,
these are two faces of the same lexical item. A lexical item has tradition-
ally been understood as a collection of different forms of the same thing;
the usual list of the forms includes syntactic form, phonological form, and
semantic form. I would expand that list to include all of the RT levels,
but the idea is the same: a lexical item is the coordination of its con-
tributions to all the levels it participates in. Thus, for lexical items the
representational mapping conserves the relation ‘‘x is a ‘face’ of the lex-
ical item y.’’ Second, the mapping also conserves the complement-of re-
lation: in TS holdings is in a theta relation to amalgamate, and in CS it is
in a Case relation, but since these are the complement relations of the two
respective representations, the correspondence is again conservative. And
third, the head-of relation is conserved: heads are mapped into heads.
We can formalize this conservation somewhat in terms of the notion
commutation. If a relation is preserved, we can say that it commutes with
the representation relation. For example, we will say that the head-of re-
lation commutes with the representation relation, in that the following
relation will always hold:
(7) The head of the representation of X = the representation of the head
of X.
Schematically:
(8) [amalgamatev holdingsnp]vp —head-of→ amalgamate
      ↓ representation               ↓ representation
    [amalgamateV [his holdings]NPacc] —head-of→ amalgamate
Construing representation this way allows a more abstract characteriza-
tion of what is conserved than simply geometrical congruence, though
geometrical congruences will certainly be entailed by it.
I have spoken sometimes of a subpart of a structure at one level as the
‘‘correspondent’’ or ‘‘shadow’’ or ‘‘image’’ of a subpart of a structure at a
different level. The shape-conserving mapping between levels warrants
such locutions. The shape-conserving mapping is defined as a mapping of
one whole level (i.e., set of structures) onto another whole level. As a part
of that process, it derivatively maps individual structures at one level to
individual structures at another level. Further, if it conserves the part-of
relation, then it will map parts of individual structures at one level to
parts of individual structures at the other level. In (8), for example, hold-
ings is a part of the TS :VP amalgamate holdings. That TS :VP is mapped
to an XS:VP amalgamate his holdings; in virtue of that mapping, and its
conservation of the part-of relation, TS :holdings is also mapped to
XS :his holdings. That is, TS :holdings is the TS correspondent, or
shadow, of XS :his holdings under the shape-conserving mapping. If the
mapping were not shape conserving, then it would be impossible to know
what the correspondents of an XS :phrase were. And in fact because the
mapping allows deviations and therefore is not fully shape conserving and
also because new elements enter at each level, problems may well arise in
some cases in determining what is the correspondent of what. But in
general there is a coherent and obvious notion of the earlier correspon-
dents of a phrase.
Complement embedding always involves one lexical freestanding head
embedding a phrase as its complement. But, as already mentioned, func-
tional embedding is often, though not always, signified not by a lexical
item, but simply by a feature on the head, as in (5), where the feature
controls some aspect of the morphology of the head. We might call this
kind of embedding affixal embedding, since its sign is usually an affix
(perhaps silent) on the head. As with complement embedding, we want to
understand how this functional embedding relation conforms to Shape
Conservation. I will consider two ways of making sense of this kind of
embedding.
In standard minimalist practice, stemming from Travis 1984, affixal
embedding is accomplished by head-to-head movement, wherein the main
verb is generated in a phrase subordinate to the affix (or its featural com-
position) and then moves to the affix. The successive movements of the
verb account for the Mirror Principle, since if the movement is always,
for example, left adjunction, the order in which the affixes (now, suffixes)
appear on the verb will correspond to their hierarchical embedding in the
structure that the verb moved through.
(9) [[[[V + af1] + af2] + af3] . . . [. . . tV+af1+af2 . . . [. . . tV+af1 . . .
    [. . . tV]V]F1P]F2P]F3P
    (where afi bears features Fi)
Such a proposal accounts for the Mirror Principle by building morpho-
logical affixation directly into the syntactic derivation in a particular way.
Although such a view still has explicit adherents (see, e.g., Cinque 1998),
most researchers have retreated from this strongly antilexicalist view.
Unfortunately, the retreat has usually involved a weakening of the expla-
nation of Mirror Principle effects.
The account I will present here will divide the problem into two parts:
first, how does X-bar theory regulate the information on phrase labels?
And second, how does morphology realize those labels when they occur
on terminal nodes? Despite being more complicated than the hybrid
Cinque-style theory in having two separate components, phrasal and
morphological, it succeeds in capturing the full Mirror Principle effects,
because it involves nothing but X-bar theory and direct morphological
realization. In other words, there is nothing more relating the two—no
readjustment rules, no movements of any kind, and therefore no locality
conditions of any kind, just the calculus itself. The practical problem with
locality conditions and readjustment rules is that they lead to ‘‘instantly
revisable’’ theories. It therefore seems to me that the strongest possible
theory lies down the path that begins by separating phrasal syntax and
lexical syntax from one another.
In the place of head-to-head movement for phrasal syntax, I propose
that a feature can directly take a phrase as a functional complement, and
when it does, the feature is realized on the head of the phrase by the in-
teraction of Shape Conservation and the definition of head.
If we want to embed a phrase under a full lexical item H, the most ob-
vious and simplest way to do so is by concatenation: that is, concatenate
the head and the phrase, and name the resulting phrase after the head.
(10) Lexical Headþ Some Phrase ¼ [Lexical Head� Some Phrase]LHPBut suppose that instead we want to subordinate a phrase to a feature, as
in (11). The simplest way is to add the feature to the feature complex of
the phrase itself. Then the feature will ‘‘percolate’’ to the head. The per-
colation is in fact forced by representation—in particular, by the com-
mutation of representation and the head-of relation.
(11) [diagram not reproduced: the feature is added to the label of the
phrase it subordinates and percolates down to the head]
If the feature did not percolate (downward) in SS, then the SS:VP would
not count as having a head (since its feature composition would be dif-
ferent from that of its V), and this would break the commutation dia-
gram. Note that the representation relation is not symmetric; the surface
structure has ‘‘more information’’ than the Case structure. This reflects
the general asymmetry of the representation relation already discussed
and does not alter the conclusion about percolation.
The notation X > Y used in (11) and elsewhere indicates the
complement-of relation; it means ‘X takes Y as a complement’. For ex-
ample, T > V is what results from adding T to the featural complex I
have abbreviated by V. In other words, the label itself is structured by
the complement-of relation. This is meant as an alternative to the usual
notion that a node is a set of features, with no order or relation among
them. In the account I am proposing here, nodes are features in ‘‘comple-
ment chains’’ of the kind that can be symbolized as A > B > C > D > E.
(See chapter 8 for more on this, and for some theorems about the latent
descriptive power of this notation.)
The notation gives structure to the set of features. That structure makes
possible a simple axiomatization of X-bar theory, at least insofar as
X-bar theory concerns the well-formedness of phrase labels in trees—
instantiating in particular the head-of relation and a feature percolation
mechanism.
Below are two trees that will fall under the axiomatization. (12a) is an
example of a simple clause with a single main verb. (12b) is an example of
a clause with an auxiliary verb and a main verb. The axioms to be dis-
cussed will be illustrated with respect to (12a).
(12) [tree diagrams not reproduced: (a) a simple clause with a single
main verb; (b) a clause with an auxiliary verb and a main verb]
(12a) illustrates two properties we want the system to have. First, each
node is structured with respect to the complement-of relation. Second,
there are only three relations that can hold between a mother node and a
daughter node:
(13) Axiom 1 (The juncture types of X-bar theory)
     There are just three juncture types:
     a. mother node = X > daughter node (embedding)
     b. daughter node = mother node > X (satisfaction)
     c. mother node = daughter node (adjunction)
Case (13a) licenses embedding the daughter node under X at the mother
node. For example, in tree (12a) T embeds AgrS at the top. Case (13b)
licenses the ‘‘satisfaction’’ of features under agreement; for example, in
tree (12a) AgrO is discharged or ‘‘checked’’ by the direct object, as illus-
trated by the relation of [T > AgrS] to its daughter [T > AgrS > AgrO].
Finally, case (13c) licenses adjunction structures, where mother and
daughter nodes are identical; this is illustrated in tree (12a) by the two
Adv nodes.
On this account agreement is strictly local. The AgrS feature percolates
as far as it likes, except of course that it can be ‘‘checked’’ only by a fea-
ture, and only when it is peripheral in the label, and it must be checked by
a sister to the label.
I think Axiom 1 is in fact X-bar theory itself. It tells what form suc-
cession of heads must take, thereby defining the notion ‘‘head’’; and at
the same time it defines the permissible percolations of features. But the
structures found in natural language are also defined by that previously
discussed mysterious condition on functional sequences, which I will call
the Pollock-Cinque functional hierarchy (PCFH).
(14) Axiom 2 (PCFH)
There is a universal set of elements (T, AgrO, AgrS, . . . , V) that are
in a fixed chain of complement-of relations:
(T > AgrS > . . . > AgrO > V)
Labels must be subsequences of this hierarchy.
Labels in trees must conform to both Axiom 1 and Axiom 2. The
structures admitted by Axiom 1 filtered by Axiom 2 turn out to be just
the right structures; that is, they turn out to be structures like (12a), and
most other possibilities are left out.
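To make the filtering concrete, here is an informal rendering (my own sketch, not the text's formalism) of the two axioms as checks on labels, with labels written as lists ordered by the ‘‘>’’ relation and an abbreviated hierarchy standing in for the full one:

```python
# Illustrative sketch of Axiom 1's three juncture types and Axiom 2's
# subsequence requirement; all names here are invented for the example.
HIER = ["T", "AgrS", "AgrO", "V"]   # abbreviated functional hierarchy

def juncture(mother, daughter):
    """Classify a mother/daughter label pair per (13), or return None."""
    if mother == daughter:
        return "adjunction"          # (13c): label unchanged
    if mother[1:] == daughter:
        return "embedding"           # (13a): mother = X > daughter
    if daughter[:-1] == mother:
        return "satisfaction"        # (13b): daughter = mother > X
    return None

def obeys_axiom2(label):
    """Axiom 2: the label must be a contiguous stretch of the hierarchy."""
    n = len(label)
    return n > 0 and any(HIER[i:i + n] == label
                         for i in range(len(HIER) - n + 1))

# T embedding AgrS at the top, and AgrO discharged by the direct object,
# as described for tree (12a):
juncture(["T", "AgrS"], ["AgrS"])                # "embedding"
juncture(["T", "AgrS"], ["T", "AgrS", "AgrO"])   # "satisfaction"
```

Because (13a) and (13b) change a label only at its ends, any tree built from junctures that pass both checks keeps every label a subsequence of the hierarchy, which is the sense in which the two axioms jointly admit structures like (12a) and little else.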
Axiom 2 guarantees that a lexical item, or in fact any element, whether
simple or derived, must lexicalize a subsequence of functional structure.
The ‘‘operations’’ of embedding and satisfaction defined in Axiom 1 pre-
serve the subsequence property: they add to labels and remove from them
only at the ends, so that going up the projection, the labels on the heads
will vary smoothly from V to C, never departing from subsequencehood.
Axiom 1 in RT accomplishes much of what is done by (covert) verb
movement (or head-to-head movement) in standard minimalist practice.
As mentioned earlier, head-to-head movement captures at least part of
the mirror relation that exists between word structure and phrase struc-
ture. Axiom 1 accomplishes the same work by limiting successions be-
tween hierarchically adjacent nodes to pairs that differ only at the top
(13a) or the bottom (13b) of the label; this has the provable effect that the
label on the very lowest node bears a mirror relation to the succession of
phrases that dominates it. The fact that every pair of labels must meet the
stringent conditions of (13) is comparable to the restriction, in minimalist
practice, that head-to-head movement is extremely local. The admissibil-
ity of adjunct junctures (13c), which leave the label unchanged, corre-
sponds, in minimalist practice, to the feature of Relativized Minimality
that makes certain adjuncts invisible to head-to-head movement.
Axioms 1 and 2 account for what the syntactic structures can look like,
but do not say how labels are spelled out. The spell-out of labels is the
topic of chapter 8, where the morphological interpretation of ‘‘>’’ is
taken up.
The role of X-bar theory in RT, then, can be summarized as follows.
There is a relation ‘‘head-of’’ that holds in all levels and participates in
the correspondences between the levels under Shape Conservation. There
are two kinds of embedding: functional embedding and complement
embedding. Functional embedding occurs between levels, because some
elements (e.g., T) are simply not defined for levels earlier than PS in the
version of RT given in chapter 4, for example. A functional element may
be introduced as a free lexical item or as a feature. In either case it sub-
ordinates another phrase; if it is a feature, it is added to the label of the
phrase it subordinates, and that added feature propagates down to the
head to preserve the head-of relations that the structure enters into.
With this more specific understanding of how X-bar theory operates in
the architecture of RT, I would like to return to the discussion in chapter
3 in which I suggested that the embedding of different kinds of Ss occurs
at different levels, and that locality and clause union effects of the array of
small clause embedding types can be made to follow from that arrange-
ment. The specific problem I want to address is that I risk inconsistency
with that earlier conclusion if I now say that NPs have shadows in TS,
but Ss (at least some—for example, tensed Ss) do not. Furthermore, if Ss
do have shadows in early structure comparable to the ones that NPs
have, then the LRT correlations laid out in chapters 3–5 are jeopardized
in a way I will explain shortly.
The main reason for saying that the head of an NP appears as a
shadow in TS is for selection, and we do find tight selection between the
verb and the head noun of the direct object. On the other hand, there is
no selection whatever between the matrix verb and the verb of a that
clause. This is ordinarily understood as resulting from the fact that that is
the head of the clause, and the matrix verb selects the head. Although this
answers the point about clauses, it raises a problem for the DP theory of
NPs (e.g., Abney 1987): if D is really the head of the direct object, then it
is hard to see why there is selection between V and the N beneath D.
In fact, the difference between NP and S is even more extreme: the
main verb does not even select the tense, or for that matter the finiteness,
of the embedded complement. Grimshaw (1978) made this point clearly
when she showed that when a verb selects wh, it cannot select even the T
value of the IP beneath wh (much less its main verb); consequently, any
verb that selects wh automatically and inevitably takes both finite and
infinitive complements.
(15) I know why {the bird sings / to sing}.
This means that the apparent selection for T shown by most verbs must
be mediated.
(16) I know that {he left / *to leave}.
That is, know selects that and that selects [+finite]. Some difficulties arise
on this view; for example, some predicates seem to be able to determine
the subjunctivity of that clauses.
(17) a. It is important that he be here.
b. *It is known that he be here.
In addition, sequence-of-tense phenomena, although not involving selec-
tion by the main verb, do show that that is not absolutely opaque, since
they link main and embedded T specifications. Despite these problems I
will assume that Grimshaw’s conclusion is essentially correct, and that
verbs select only for C.
It is a lexical fact about the complementizer that in English that it, un-
like wh, is restricted to finite complements. Overt complementizers in
other languages are not so restricted, taking both finite and infinitival
complements; and in fact English whether does so as well.
(18) I don’t know whether {to go / he went}.
In RT the difference in the behavior of NPs and Ss with respect to
selection will follow from the fact that NP will have its head N as its
shadow in TS, whereas a that clause will have only that, the selected head,
as its shadow in TS. This is exactly what we would expect if that were the
head of the that clause and N were the head of the NP. On this view what
is special about that is that its complement (TP) is not defined until SS,
because T itself is defined only at SS.
This conclusion gives up the DP hypothesis of NPs, but for a good
reason: the obvious difference in selection between NPs and Ss. Propo-
nents of the DP hypothesis (Abney (1987), and others) have taken pains
to develop mechanisms and definitions that permit selection between the
matrix verb and the N head of NP (inside DP), but not in a way that
draws any distinction between NP and S. As a result, the hypothesis sug-
gests that the same selection will be found with Ss; but it is not—selection
by the main verb stops with that for CP clauses. Moreover, while that
is selected by verbs, as suggested by the CP hypothesis for clauses, the D
of a DP is never selected by verbs; if a verb takes a DP at all, then it
takes the full range of determiners, with some completely explainable
exceptions.
In RT, then, there is a fundamental difference between the embedding
of NPs and the embedding of Ss: an NP complement is embedded in
TS, and in all subsequent levels; S embedding, on the other hand, is
distributed across the RT levels depending on what kind of S is being
embedded. Baker (1996) offers some evidence for treating NP and S
embedding in sharply di¤erent ways. He shows that in polysynthetic lan-
guages NP arguments do not occupy theta or Case-licensing positions;
rather, what appear to be the expression of NP arguments are actually
adjuncts. The arguments involve standard binding-theoretic tests for
constituency. S complements, on the other hand, are embedded as argu-
ments exactly as they are in English; again, standard binding-theoretic
arguments involving c-command lead inevitably to this conclusion.
Actually, there is a version of RT that o¤ers the possibility of having it
both ways. In this version NPs could be N-headed in TS and D-headed in
SS, and clauses would be that-headed in both levels, thereby preserving
their di¤erent selectional behavior. I will not pursue this possibility here,
because it threatens to undermine the LRT correlations of chapter 3, in
the following way. If correspondence across levels does not respect cate-
gories, as the NP‘DP correspondence would not, then possibilities
arise that defeat the LRT correlations. Suppose, for example, that an IP
is embedded beneath V at an early level (somewhere before SS—say, PS)
for ECM, raising, and obligatory control constructions, as suggested in
chapter 3. Various clause union effects that depend on the absence of CP
structure (e.g., obligatory control) could take place there; then the IP
could ‘‘grow’’ a CP through correspondence with an SS structure; in
other words, an SS :CP would be put in correspondence with the PS : IP.
The SS :SpecC could then be the target of wh movement. We would then
have derived obligatory control across a filled SpecC, exactly contrary to
the prediction outlined in chapter 3. The following illustrates the deriva-
tion just described, with (19a) → (19b):
(19) a. PS: NPi [V [NPi . . . ]IP] (obligatory control established)
b. SS: NPi [V [wh [NPi . . . ]IP]CP] (obligatory control preserved, wh
movement)
The straightforward way to avoid this defeat of the LRT correlations is
to prevent correspondence under Shape Conservation where the cate-
gories are not homogeneous: [[ . . . ]IP . . . ]CP cannot be a representation of
[ . . . ]IP. The only ‘‘growth’’ that is allowed is growth that preserves the
category, essentially adjunction. This would be a feature of the Shape
Conservation algorithm, which unfortunately is still under development.
But if this feature survives further investigation, then NP cannot become
DP under shape-conserving ‘‘correspondence.’’ We could still maintain
that a TS :NP could be embedded in SS under a D. However, since it
would not have had any previous communication with the V that the DP
is embedded under, we would need, as Abney did, to make D transparent
to selection so that V could directly see NP beneath D in the structure.
(20) [V [ [ ]NP]DP]
7.3 Relativized Minimality
One must pause soberly before putting aside one of the most fruitful ideas
of modern linguistics, but if the theory I have developed thus far is taken
seriously, Relativized Minimality must be seen as a pseudogeneralization.
Although there is some correspondence between head movement and
the mechanisms proposed here, a close examination of the context in
which head movement operates reveals decisive differences. The real
content of a theory with head-to-head movement lies in the constraints
limiting the movement, as otherwise the theory says, ‘‘Anything can move
anywhere.’’ The best candidate for the theory of the bound on head
movement is the Head Movement Constraint (Travis 1984), and in par-
ticular, the generalization called Relativized Minimality (Culicover and
Wilkins 1984; Rizzi 1990).
The main problem with Relativized Minimality in the context of RT
was stated earlier in this chapter. The generalization of the A/A distinc-
tion to the A/A/A . . . distinction and the rationalization of the properties
of each type in terms of its association with a level under the LEC leaves
no room for heads, as heads themselves have no privileged relation to
any of the levels, occurring in all of them. But if head movement is not
covered under anything like Relativized Minimality, then some other
account of the localities it exhibits must be sought.
Other considerations point in the same direction. For the A/A/A . . .
series, Relativized Minimality is weak compared with the locality that
follows from the RT architecture in that it permits rule interactions that
cannot arise in RT. For example, Relativized Minimality permits head
movement over SpecCP, to the matrix verb.
(21) a. [V+C [wh tC IP]]
     b. *I wonder-that [who tthat Bill saw t]
The reason is that according to Relativized Minimality, different systems
—head, A, Ā—do not interfere with each other; they only self-interfere,
in the sense that a movement of type X will be bounded only by occur-
rences of targets of type X. But (21) is not possible in RT with the LEC.
It remains to find out whether languages instantiate the type of structure
illustrated in (21), but the prediction is clear. Likewise for cases in which
an A movement bridges an Ā specifier—the latter includes what has been
called superraising, as discussed in section 3.1, and is again not possible in
RT on principled grounds.
Another difference between head movement governed by Relativized
Minimality and the account of inflection and agreement suggested here lies
in the different ways that one can ‘‘cheat’’ in the two theories. I say ‘‘cheat
in,’’ but I should probably say ‘‘extend’’: different theories allow different
sorts of ‘‘natural’’ extension, and I think a theory should be evaluated on
the basis of whether its natural extensions would be welcome or not.
There are two obvious ways to cheat with head movement in Rela-
tivized Minimality, and in fact both have been exploited, or should I say,
explored. One is to extend the number of self-interfering systems, to four
(or more); in the limit, the theory reduces to the null theory, as in the
limit every element belongs to a different category from every other ele-
ment, and so nothing interferes with anything. The other obvious way to
cheat is to sidestep the locality condition by chaining a number of little
moves together into one large move; in the context of head movement
this is called excorporation. Both are standard.
For example, in Serbo-Croatian we find the following evidence of verb
clustering:
(22) a. Zaspali  bejahu.
        slept.prt aux.3pl
        ‘They had slept.’
     b. *Zaspali [Marko i Petar] bejahu.
     c. [Marko i Petar] bejahu zaspali.
     (Konapasky 2002, 233)
The auxiliary and the participle cannot be separated, suggesting that they
form a tight cluster, one naturally seen as arising from head movement of
the lexical verb to the participle. But when there are two participles (byl
and koupil in the following related Czech examples), the second seems
able to hop over the first.
(23) a. Tehdy bych    byl     koupil     knihy.
        then  aux.1sg was.prt bought.prt books.acc
        ‘Then I would have bought books.’
     b. Byl bych tbyl koupil knihy.
     c. *Koupil bych byl tkoupil knihy.
     (Konapasky 2002, 233)
This appears to be a case of ‘‘long head movement.’’ There are two ways
to extend Relativized Minimality to accommodate this phenomenon.
First, we might increase the number of self-interfering categories, the
course taken by Rivero (1991), who proposes to account for the pattern in
(23) by saying that bych is functional, while byl and koupil are lexical: in
(23b) lexical grammatically hops over functional, whereas in (23c) lexical
ungrammatically hops over lexical. Second, we might say that clustering
takes place in the usual way, but then excorporation accounts for the
possibility of (23b) (and details about how it operates account for (23c));
this is the course taken by Boskovic (1999). See Konapasky 2002 for
summary discussion and critique.
In the theory presented here, where there is no head movement, and
no Relativized Minimality, these extensions are not available. The inad-
missibility of excorporation will follow from theorems about reassocia-
tion given in chapter 8. And separating heads into two types will make no
di¤erence to the system under discussion here, so long as both types are
part of the calculus of complement taking.
But in fact the X-bar theory proposed here has its own way to accom-
modate such facts. I will postpone my own analysis of the Serbo-Croatian
paradigm until chapter 8, where I claim that verb clustering follows from
a narrow theory of label spell-out. For the time being I simply want to
emphasize that the types of solutions or extensions available to RT are
very different, lacking as it does head movement and its governing theory
of locality, Relativized Minimality.
7.4 Clause Structure and Head-to-Head Movement
The sort of X-bar theory sketched in the previous section will permit a
full account of head-to-head movement effects without movement, local-
ity conditions, or readjustment rules.
7.4.1 Reassociation and Case-Preposition Duality
There is a functional equivalence between Case marking and preposi-
tions, long recognized but not formalized; for example, there is some
equivalence between to NP and NPdat. This is not to say that the two are
interchangeable, just that there seem to be two ways to ‘‘mark’’ an NP, or
two ways to ‘‘embed’’ an NP under a Case/preposition (see Williams
1994b). Suppose that P stands for some Case/preposition; suppose further
that the relation between the Case marking/preposition and the NP it is
attached to is one of embedding. I will leave open whether it is functional
embedding or complement embedding (it can probably be either, de-
pending on the preposition), and I will use the symbol ‘‘>’’ already
introduced to indicate the embedding relation. Then the equivalence we
are discussing is this:
(24) [P > NP]P ≅ [[ . . . [P > N] . . . ]P>N]P
On the left P governs the full NP and projects a P node; on the right
P > N is realized on the head noun (as, for example, [dat > N], which is
more traditionally notated as Ndat). In both cases P subordinates N. I will
call this relation Case-preposition (C-P) duality, even though it will turn
out to be a broader relation.
In what way are these two structures equivalent? And in what way are
they different? They are obviously not identical, in that in a given con-
struction in a given language with a given meaning, only one of them can
be used; so English has only to boys, whereas Latin has only pueris.
Nevertheless, the two expressions are alike in two ways: first, the relation
between P/dat and the N/NP is approximately the same in the two cases,
and second, the distribution of P > NP and [P > N]P is approximately
the same (i.e., they fulfill the same function, that of expressing the dative
argument of a verb).
But what is the basis of their equivalence? Why should they be alike
at all? We might regard the two representations in (24) as mutually
derivable by abstraction/conversion (‘‘⇒’’ signals abstraction, and ‘‘⇐’’
conversion).
(25) [ . . . [X > Y] . . . ]X ⇔ [X > . . . [Y] . . . ]
I will aim to derive the equivalence from the X-bar theory developed here
without any independent operations, but I will nevertheless refer to these
operations in exposition.
Specifically, I will explore the possibility that C-P duality is nothing
other than the relation called reassociation in chapter 8. There, reassocia-
tion is shown to be a property of the complement-of relation in the mor-
phology of the functional system, so that, for example, if the left-hand
side is a valid expression, then so is the right-hand side, and vice versa.
(26) [[X > Y] > Z] ⇔ [X > [Y > Z]]
The relation accounts for, among other things, a kind of ‘‘clumping’’ in
how functional elements are realized morphologically. Given that func-
tional elements are strictly ordered (T > AgrS > AgrO > V), one would
expect only right- (or perhaps left-) linear structures to realize the in-
flected verb; however, morphological structures like these are found as
realizations of this order:
(27) Swahili inflected verb
     [a-li]     [ki-soma]
     AgrS-past  AgrO-V
     [AgrS > T] > [AgrO > V] ⇔ [AgrS > T > AgrO > V]
     (Barrett-Keach 1986, 559)
The structure is a symmetrical binary tree, rather than a right-branching
one (see chapter 8 for details; see also Barrett-Keach 1986). Why can the
symmetrical structure on the left realize the linear chain of functional
elements on the right? The bottom line of (27) shows that the actual
structure of the inflected verb enters into a ‘‘C-P-like’’ duality relation
with the functional structure it is supposed to represent. See chapter 8 for
further discussion.
Reassociation is the relation illustrated here:
(28) [A > B] > C ⇔ A > [B > C]
C-P duality of the sort instantiated in (25) can be viewed as a straight-
forward case of reassociation if we can appeal to a null element 0 to serve
as the third term in the reassociation.
(29) [[0 > X] > Y] ⇔ [0 > [X > Y]]
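The content of Reassociate can be checked mechanically: however a chain of elements is bracketed, the ordered sequence it realizes is unchanged. The following Python sketch is my own illustration, not part of the book's formalism; the function names and the encoding of ‘‘>’’ structures as nested pairs are assumptions made purely for exposition.

```python
# Toy sketch (my illustration, not the book's formalism): Reassociate as a
# rewrite on nested pairs. A structure is an atom (a string) or a pair
# (left, right) standing for [left > right].

def reassociate_right(t):
    """[[A > B] > C] => [A > [B > C]], when the shape matches."""
    if isinstance(t, tuple) and isinstance(t[0], tuple):
        (a, b), c = t
        return (a, (b, c))
    return None

def reassociate_left(t):
    """[A > [B > C]] => [[A > B] > C], when the shape matches."""
    if isinstance(t, tuple) and isinstance(t[1], tuple):
        a, (b, c) = t
        return ((a, b), c)
    return None

def flatten(t):
    """The ordered chain of elements a structure realizes."""
    if isinstance(t, tuple):
        return flatten(t[0]) + flatten(t[1])
    return [t]

# (26): both bracketings realize the same ordered chain X > Y > Z.
left, right = (("X", "Y"), "Z"), ("X", ("Y", "Z"))
assert reassociate_right(left) == right and reassociate_left(right) == left
assert flatten(left) == flatten(right) == ["X", "Y", "Z"]

# (29): the same equivalence with a null element 0 as one of the terms.
assert flatten((("0", "X"), "Y")) == flatten(("0", ("X", "Y")))
```

The invariant checked by `flatten` is the same one that licenses the ‘‘clumped’’ realizations of a strictly ordered functional sequence in (27).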
But the extension to 0 suggests an even more exotic possibility. In (25)
X is abstracted out, leaving behind Y; but suppose even more were
abstracted out, namely, X > Y.
(30) a. [[ . . . [X > Y] . . . ]X>Y]X ⇒ [[X > Y] > [ . . . [0] . . . ]X>Y]X
     b. [0 > [X > Y > 0]] ⇒ [[0 > X] > [Y > 0]] ⇒ [[0 > X > Y] > 0]
In terms of reassociation, (30a) is simply a double application of the op-
eration Reassociate, as indicated in (30b).
For X = P and Y = N we would then have:
(31) [[ . . . [P > N] . . . ]P>N]P ⇒ [[P > N] > [ . . . [0] . . . ]P>N]P
This essentially evacuates the head position of the complement entirely—
not just the P/Case, but the N as well. This suggests that the head of the
noun could be realized on the preposition itself. And this possibility arises
purely through X-bar theory, with no further mechanisms, so long as the
theory includes C-P duality, as it arises if reassociation holds of X-bar
syntax. Such cases resemble the ‘‘inflected prepositions’’ found in Breton,
where an agreement mark on a preposition precludes overt expression of
its direct object (Anderson 1982).
Examples like (31) will be well formed only if the label [P > N] satisfies
Axiom 2; that is, P must be higher than N on the relevant functional
hierarchy. There are perhaps two kinds of prepositions: one ‘‘functional’’
and transparent, for which Axiom 2 would be satisfied; and another that
takes ‘‘true’’ complements, for which it would not be satisfied (see Wil-
liams 1994b for extended discussion). Structures that instantiate the right-
hand side of (30) might be the coalescences of P and pronoun or article
found in some languages.
(32) a. zu dem     ⇒ zum (German)
        to the.dat
     b. à  le  chien ⇒ au chien (French)
        to the dog
Taking (32b), and assuming the DP hypothesis, we have:
(33) P > [D > NP]DP ⇒ [P > D] [0 > NP]NP
In this way, head-to-head movement, in its instantiation as a kind of
‘‘incorporation,’’ is realized directly by X-bar theory.
Used in this way, Reassociate bears an obvious relation to covert verb
movement. However, of course it is not movement, and it need not be
bounded by any extrinsic locality conditions; rather, it is localized by the
X-bar formalism itself.
In the remainder of this section I will explore the possibility that C-P
duality, as an instantiated application of Reassociate in syntax, is the ap-
propriate syntax for overt verb movement as well. Given the conclusions
of the last two sections, this is an almost obligatory step to take. In sec-
tion 7.3 I suggested problems with Relativized Minimality as an account
of head-to-head relations, partly because in section 7.2 I developed an
alternative account of inflection in syntax. But since in the standard ac-
count of clause structure Relativized Minimality is the principle govern-
ing the locality of overt head movement, something must be developed in
its stead if it is to be eliminated on general grounds.
Verb-second (or subject-aux inversion (SAI) in English) can be seen as
arising from C-P duality in the following way. A declarative clause is a
tensed entity, where the tense is realized on the head of VP (also the head
of NP, if nominative is simply the realization of tense on N, as suggested
in Williams 1994b).
(34) [NP VPT]T
By C-P duality, this is the same as (35).
(35) [T [NP VP]]
(35) itself is not instantiated, because there are no lexical items that purely
instantiate T (unless do is one). But if V in the tensed clause is represented
as T > V, then the indicative clause instead looks like (36),
(36) [NP [T > V]P]T
which, by (radical) C-P duality, is the same as (37).
(37) [[T > V] [NP 0P]]T ⇔ [NP [T > V]P]T
Thus, auxiliary inversion structures (the left-hand side of (37)) arise from
uninverted structures through C-P duality. Duality captures the most es-
sential properties of inversion: it is local (only the top tensed verb can
move, and only within a single clausal structure), and it is to the left (like
P). Duality does not capture the fact that inversion is restricted to modal
verbs in English, but not in German, a point to which I will return.
The restriction to movement within a single clausal structure follows
from Axiom 2, which says that all labels must be substrings of the PCFH,
just as [P > D] in (33) is. T > V either is, or abbreviates, one such sub-
string, but to move into a higher clause it would require a label like
[T > · · · C · · · > T · · · ], which violates the PCFH. The reason
‘‘movement’’ appears to displace items to the left follows from the fact
that T (or for that matter V) takes its complements to the right.
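The locality just described amounts to a simple membership test on labels. The sketch below is my own illustration, not the text's: the particular hierarchy C > T > AgrS > AgrO > V and the reading of ‘‘substring’’ as a contiguous stretch are assumptions made for concreteness.

```python
# Toy sketch (mine, not from the text): Axiom 2 as a contiguity test.
# The hierarchy below and the reading of "substring" as a contiguous
# stretch are illustrative assumptions.

PCFH = ["C", "T", "AgrS", "AgrO", "V"]  # hypothetical clause-level hierarchy

def valid_label(label, hierarchy=PCFH):
    """A label is valid iff it is a contiguous substring of the hierarchy."""
    n = len(label)
    return any(hierarchy[i:i + n] == label
               for i in range(len(hierarchy) - n + 1))

# T > V abbreviates the substring T > AgrS > AgrO > V, which is licit:
assert valid_label(["T", "AgrS", "AgrO", "V"])

# A label that climbs back up through C, as "movement" into a higher
# clause would require, is never a substring of the hierarchy:
assert not valid_label(["T", "C", "T"])
```

On this picture no extrinsic locality condition is needed: displacement beyond a single clausal structure is ill-formed because its label cannot be stated.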
Thus far I have assumed that the subject is an adjunct or specifier of
the VP and hence does not participate in the duality. In fact, though, the
subject is treated somewhat di¤erently in the two constructions related by
the duality. In the V-initial structure, the subject is treated more as a
direct object than as a subject, in that adverbs cannot intervene between
it and [T > V].
(38) a. *Did recently John leave?
b. John recently did leave.
c. *John saw recently Bill.
In this, the fronted auxiliary is playing the same role that the preposition
plays in certain absolutive constructions.
(39) a. With John recently departed, . . .
b. *With recently John departed, . . .
In fact, the absolutive construction is a good model for SAI, and it
emphasizes the notion that SAI arises from C-P duality. Some absolutive
constructions—for example, the Latin ablative absolute—even use Case
instead of P, thus confirming the connection.
(40) Caesare    vivo, . . .
     Caesar.abl living.abl
     ‘With Caesar living, . . .’
That is, the Latin absolutive bears the same relation to the English
absolutive that an uninverted clause bears to an inverted clause, the rela-
tion of C-P duality. Another model is the consider construction (consider
Bill foolish) and similar small clause constructions, in which the V relates
to the NP as it would to a direct object.
In English, we can say that the inverted auxiliary relates to the subject
in its ‘‘derived’’ position; this is because the invertible verbs are all auxil-
iaries, and auxiliaries only take subject arguments. But in other inversion
constructions this is impossible. In German, for example, the class of in-
vertible verbs includes all predicates, and so the verb must govern aspects
of the clause structure that should not be accessible to it from its derived
position. For instance:
(41) Gestern kaufteV Hans das Buch tV.
The derived position of the verb should not permit a theta relation to
the direct object, because of the locality of theta relations. Rather, the
‘‘trace’’ of the verb should be responsible for that relation, as it is in
standard accounts. How can this be done with C-P duality?
C-P duality can give us a kind of trace, if we take the 0 of reassociation
seriously. I have used representations in which XPs have 0 heads without
saying how they are licensed, and they clearly do not occur freely. In the
following structure,
(42) [X > 0P]
[X > [0 . . . ]0P]XP
X ‘‘controls’’ 0 by virtue of governing 0P. It controls it in the sense that
the 0P acts, in its interior, as though X occupied its head position. It
might not be far-fetched to regard the 0 as an ‘‘anaphor,’’ with X as its
antecedent. This in fact makes it just like movement: the 0 head of 0P is
identified with X. But in this case, the antecedence arises from Reasso-
ciate and the complete evacuation of the label on the head when Reasso-
ciate applies in the most radical manner.
Control of a 0 head is also found in gapping.
(43) [John saw Mary][T]P and [Bill 0 Pete][0]P.
In the simplest interpretation the [T] label on the first conjunct serves
as the antecedent of the [0] label on the second conjunct, and by virtue of
C-P duality governs the interior of the second conjunct. This explains the
locality of the construction: it cannot occur in coordinated CPs, because
the antecedence holds only for immediate conjuncts.
(44) *I think [that John saw Mary] and [that Bill 0 Pete].
In (43) more than just T is deleted in the second conjunct; the verb is
also deleted, and the verb is understood as identical to the verb of the first
conjunct. In Williams 1997 I suggested that a 0 head always licenses a 0
complement if the following relation holds:
(45) Antecedent of complement of 0 head = complement of antecedent of 0 head.
That is, antecedence and complementation commute. See Williams 1997
for a derivation of (45) from a more general principle and for a discussion
of its scope and properties.
Returning to SAI, the notion that the 0 head of 0P is anteceded by
whatever governs 0P explains why the absent head nevertheless manages
to govern the internal structure of the 0P that it heads. For example,
fronted auxiliaries are compatible only with whatever complements are
possible when fronting does not take place:
(46) a. Can John [0 swim]0P?
b. *Can John [0 swimming]0P?
c. John can swim.
d. *John can swimming.
Questions about the type of complement of the 0 head are deferred to its
antecedent, the fronted auxiliary.
A. Neeleman (personal communication) suggests that traces might not
in fact be necessary—at least one of the forms in the dual relation will
represent a structure at a previous level in which the relevant licensing
takes place. So, for example, (46a) is the dual of (46c), and (46c) itself or
some structure that it represents licenses the relation between can and the
present participle; in that case the trace in (46a) is not needed for licensing.
7.4.2 Multiple Exponence in Syntax
The mechanism I have called Reassociate also provides a means of ac-
counting for multiple exponence in syntax. Multiple exponence is an em-
barrassment for the theory of labels proposed here; it shouldn’t exist. The
reason is, given a functional hierarchy F1 > · · · > F13 as in (47a), where
each Fi is subcategorized for Fi+1, applying Reassociate will derive ob-
jects like those in (47b), but not like those in (47c) or (47d). (M marks
subunits that correspond to morphemes.)
(47) a. F1 > F2 > F3 > F4 > F5 > F6 > F7 > F8 > F9 > F10 > F11 > F12 > F13
     b. [F1 > F2 > F3]M > [F4 > F5 > F6 > F7 > F8 > F9 > F10 > F11 > F12 > F13]M
        [F1 > F2]M > [F3 > F4]M > [F5 > F6 > F7 > F8]M > [F9 > F10]M > [F11 > F12 > F13]M
        [F1 > F2 > F3 > F4 > F5 > F6 > F7 > F8 > F9]M > [F10 > F11 > F12 > F13]M
     c. F10 > F2 > F3 > F6 > F5 > F13 > F8 > F8 > F9 > F1 > F11 > F2 > F13
     d. [F1 > F2 > F3 > F4 > F5 > F6]M > [F6 > F7 > F8 > F9 > F10 > F11 > F12 > F13]M
(47c) is simply a random assortment of the original set of features, of
course inadmissible. But cases like (47d), described with the term multiple
exponence, seem to occur rather frequently. An example that will be dis-
cussed more thoroughly in section 8.3 is this one from Georgian:
(48) g-xedav-s
2sg.obj-see-3sg.subj
The problem is that both the prefix and the suffix are sensitive to both
subject and object agreement features, and hence in some sense realize
them; but then, no matter what the feature hierarchy is, there is no way to
segment it into morphemes by Reassociate.
Suppose that the feature hierarchy here is (49a) (where S# stands for
subject number agreement, Sp for subject person agreement, etc.). Then
an acceptable segmentation would be (49b), but what is found, appar-
ently, is (49c).
(49) a. S# > Sp > O# > Op > V
b. [[O# > Op] > V] < [S# > Sp]
c. [[(S# > Sp >) O# > Op] > V] < [S# > Sp (> O# > Op)]
However, the problem would disappear if we were to make ‘‘silent’’ some
of the features in the two morphemes (here, as in chapter 8, parentheses
indicate silent features).
(50) [S# > Sp (> O# > Op)]prefix > [(S# > Sp >) O# > Op]suffix
Now the forms are combinatorially valid. We have drawn a distinction
between what a morpheme is paradigmatically sensitive to, and what it
‘‘expresses’’ insofar as the rules for combining forms are concerned. So
both prefix and suffix can be sensitive to the value of some feature Fi, but
only one of them will ‘‘express’’ it. The theory makes the interesting pre-
diction that multiple exponence will always involve features that are ad-
jacent on the functional hierarchy. See chapter 8 for further discussion.
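The combinatorial point can be put as a small check. The sketch below is my own illustration, using the feature names assumed in (49a); it treats a morpheme as the list of features it expresses, with silent features simply omitted from that list (linearization of the blocks on the verb, prefix versus suffix, is set aside here).

```python
# Toy sketch (my illustration): validity of a morpheme segmentation, with
# feature names from (49a). A morpheme is listed by its *expressed*
# features; silent features are paradigm-sensitive but unexpressed.

HIERARCHY = ["S#", "Sp", "O#", "Op", "V"]

def valid_segmentation(morphemes, hierarchy=HIERARCHY):
    """Valid iff the expressed features, concatenated in hierarchy order,
    reproduce the hierarchy exactly (each feature expressed once)."""
    return [f for m in morphemes for f in m] == hierarchy

# A plain segmentation into contiguous blocks, as in (49b), is fine:
assert valid_segmentation([["S#", "Sp"], ["O#", "Op", "V"]])

# The naive Georgian analysis (49c): both affixes express the subject and
# object features, so some features are expressed twice -- invalid:
assert not valid_segmentation(
    [["S#", "Sp", "O#", "Op"], ["S#", "Sp", "O#", "Op"], ["V"]])

# (50): silence the duplicates on one affix each; each affix can remain
# *sensitive* to all four features, but each feature is *expressed* once:
assert valid_segmentation([["S#", "Sp"], ["O#", "Op"], ["V"]])
```

Because each expressed block must be a contiguous stretch of the hierarchy, the check also embodies the prediction noted above: multiple exponence can only involve hierarchy-adjacent features.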
Multiple exponence is found in phrasal syntax as well. If we accept the
account of head movement–type phenomena that I have suggested, then
my proposed theory of multiple exponence can be imported here, making
the same very specific predictions about the character of multiple expo-
nence in phrasal syntax.
As an example, consider the complementizer agreement phenomena
found in certain dialects of Dutch (Zwart 1997).
(51) datte   wy speult
     that.pl we play.pl
     (East Netherlandic; from Zwart 1997)
As Zwart notes, in some dialects the morphology on the complementizer
differs from the morphology on the verb, while in others the two are
identical; in East Netherlandic they are di¤erent. What the agreeing
dialects have in common is that the complementizer always agrees with
the subject, and it always agrees in addition to (not instead of ) the verb.
Let us suppose that the functional hierarchy is C > T > SA > · · · > V
(SA = subject agreement). Then we can understand the East Netherlandic
(52) datte (pl)       speult
     [C > (T > SA)]   [T > SA > · · · > V]
That is, the T and SA features are silent on the complementizer (they
could as easily have been silent on the verb; see chapter 8 for a discussion
of underdetermination of such analyses). From the point of view of syn-
tax, speult will ‘‘be a’’ T > SA > V, and datte will ‘‘be a’’ C and so can
combine with the tensed V in accord with the functional hierarchy.
7.4.3 The Distribution of Dual Forms
In sum: if X-bar theory is formulated to account for C-P duality, through
an extension of Reassociate to the domain of phrasal syntax, then it will
also account for various cases of absorption, head-to-head movement,
gapping, and so on. Is it a notational variant of head-to-head move-
ment? Putting aside the dismissive tone of the phrase, we can answer,
yes, in some respects. It would be quite surprising if it were not, because
the theory of head-to-head movement now answers to numerous well-
documented findings. The theories converge in many ways—for example,
with respect to locality and traces. Nonetheless, they are quite different in
character, the one consisting entirely of the laws of X-bar theory, and the
other also including movement and the theory of locality that movement
requires.
I remarked at the outset that C-P duality is a possibility not always
realized, in the sense that the two forms it relates do not always both ex-
ist, or if they do, are not always equipotent. But why not? Why do we not
find a given structure existing alongside all its fully grammatical dual
structures?
There may be no single answer to this question. In some cases the
absence of requisite lexical items is the cause; for example, the T in (35)
([T [NP VP]]) cannot be realized because T does not correspond to any
lexical item.
Blocking is the answer in other cases. Consider the à le ⇒ au rule in
French. C-P duality gives two structures:
(53) à  [le  N], au [0 N]N
     to  the
The fact that there is a special lexical item (au) for the right-hand side
may be enough to make the left-hand side ungrammatical, through
blocking, since the left-hand side is what would be expected on more
general grounds, if the item au did not exist.
In still other cases both sides of the duality may be permitted to exist if
there is some di¤erence in meaning. For example, a dative preposition
and a dative Case marking might exist side by side, so long as they dif-
fered in meaning. The English SAI~declarative duality is clearly another
example: the semantic di¤erence is the di¤erence between interrogative
and declarative, or, more accurately, between a range of interpretations
that includes interrogative (also exclamative, conditional, and imperative)
and a range of interpretations that includes declarative. But other dis-
crepancies are unaccounted for. For example, in English only the auxil-
iary verbs participate in SAI. The SAI verbs must be subcategorized to
take the ‘‘absolutive’’ NP VP sequence as their complement, as well as
having their usual VP subcategorization. Only auxiliary verbs as a class
have this possibility in English (whereas in German, for example, any
tensed verb can participate). In English there are telltale discrepancies
suggesting that double subcategorization is in fact the correct way to
characterize the situation.
(54) a. *Amn’t I going?
b. Aren’t I going?
c. *I aren’t going?
Amn’t is not a possible IP verb, and aren’t is an IP verb with agreement
possibilities different from those of its VP counterpart.
Chapter 8
Inflectional Morphology
8.1 The Mirror Principle
The Mirror Principle is the name of an effect, in that it is derived in
theories, not fundamental. Specifically, the e¤ect is that the order of
morphemes on inflected verbs seems to reflect the structure of the syntac-
tic categories that dominate that verb. For example, in a language in
which the verb is marked for both object agreement and subject agree-
ment, subject agreement marking is generally ‘‘outside of ’’ (i.e., farther
from the stem than) object agreement marking; this ordering mirrors the
ordering of subject and object in the clause, where subject is outside of
object. To give another example, admirably detailed by Cinque (1998),
the expression of various kinds of modality by means of affixes on the
verb mirrors the expression of those same kinds of modality when that
expression is achieved by means of adverbs or auxiliary verbs. For
example, in English, ability is expressed by the modal can, whereas in
the Una language of New Guinea, ability is expressed by a verbal suffix
-ti. Through painstaking language comparisons, Cinque shows that if an
auxiliary verb in one language and an affix in another language represent
the same functional element, then they will occur in the same spot on the
functional hierarchy.
So long as functional embedding occurs between levels, we have al-
ready derived the Mirror Principle in RT. As shown in chapter 7, it arises
from the interaction of Shape Conservation with functional embedding.
Recall that there are two kinds of embedding, complement embedding
and functional embedding. A lexical item is concatenated with its com-
plement, whereas a feature is added to the top of the label of its comple-
ment; in both cases the ‘‘>’’ relation is instantiated. Suppose that f is a
feature borne by morpheme a.
(1) a. Complement embedding: [a > B]fP
b. Functional embedding: [ . . . ][f>B]
A familiar example in English is pairs related by do support.
(2) a. [didT leave][T>V]P
b. [left][T>V]P
The following is the example of functional embedding discussed in chap-
ter 7, on the assumption that T marking occurs at SS:
(3) a. CS: [amalgamateV . . . ]VP, head-of: amalgamateV
b. SS: [amalgamate[T>V] . . . ][T>VP], head-of: amalgamate[T>V]
The addition of T to the VP in SS (giving [T > VP]) is the act of embed-
ding under T that is privileged to occur in SS; the ‘‘percolation’’ of T to
the head of VP is necessary to conserve the head-of relation that existed
in CS. What happens with one feature happens with any number of fea-
tures. English is not a good language for illustration; still, supposing that
AgrS > T in the functional hierarchy, then (3) would subsequently un-
dergo further functional embedding.
(4) [amalgamate[T>V] . . . ][T>VP] ⇒ [amalgamate[AgrS>T>V] . . . ][AgrS>T>VP]

In this way, Shape Conservation will derive a mirroring of the syntactic
structure of a phrase on the label of the phrase itself.
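The feature addition and percolation in (3)–(4) can be sketched as a toy computation. The pair encoding (phrase label, head label) and the function name are illustrative assumptions, not the book's formalism:

```python
# A toy sketch of functional embedding under Shape Conservation; a phrase
# is modeled as a pair (phrase label, head label), which is an illustrative
# assumption, not the book's own notation.

def embed(feature, phrase):
    """Add a feature to the top of the phrase's label and percolate it to
    the head's label, conserving the head-of relation."""
    label, head = phrase
    return (feature + '>' + label, feature + '>' + head)

vp = ('VP', 'V')
tp = embed('T', vp)          # ('T>VP', 'T>V'), as in (3b)
agrsp = embed('AgrS', tp)    # ('AgrS>T>VP', 'AgrS>T>V'), as in (4)
```

Iterating `embed` up the functional hierarchy yields exactly the mirroring of the phrase's structure on its label described above.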
Actually, the Mirror Principle also arises in RT in a di¤erent way. If
X-bar theory requires that labels honor the functional hierarchy, and if
that requirement applies equally to labels on phrases and labels on heads,
then, assuming that the set of features is the same for each, this require-
ment will enforce mirror e¤ects. For example, the following case will
never arise:
(5) [ . . . HF1>F2>F3>F4 . . . ]F1>F3>F2>F4
If there is a single functional hierarchy, then it is impossible for both the
head and the label on the phrase to respect it, since they have di¤erent
orders of elements.
But is the Mirror Principle actually true? There are a number of cases,
some of which will be discussed later in this chapter, that seem to argue
against it. There are languages, for example, where subject agreement
marking stands between the verb and object agreement marking, and
there is no reason to think that the syntax of such languages differs from
the syntax of English in the relative ordering of the subject and object
NPs. How can we respond to such cases?
We could abandon the principle, or come to view it as a superficial
tendency that does not deserve deep codification in the theory of gram-
mar. Mirroring is a norm, but not required. One version of this strategy
calls for a separate set of operations whose role is to mediate between the
syntactic and the morphological representations—in other words, a set of
rules to fix the mistakes that arise when they don’t match, so-called re-
adjustment rules. The problem with such rules is that, once they are a
part of a theory at all, there is no stopping them. If, for example, one’s
theory of the syntax-morphology interface includes three (types of ) read-
justment rules, and one encounters a language whose morphosyntactic
interactions lie beyond those rules, it doesn’t hurt much to add a fourth
one. Adding a readjustment rule component to a theory elasticizes it in
such a way that it can respond flexibly to new data. While that might be a
good property of some engineering projects, it is a bad property of a
theory.
Cinque (1998) offers the most rigid (i.e., the best) theory of morpho-
syntax. In this theory all inflectional affixation arises from head-to-head
movement; as a result, if afi bears feature fi, then the following structure
arises and a perfect mirroring results.

(6) [[[[[[V + af1] + af2] + af3]3P t]2P t]1P t]VP

Cinque’s theory is quite rigid in its predictions, and clearly false, as
Cinque himself recognizes. How can it be fixed? One possibility would be
to introduce a readjustment component. Cinque steers clear of this blank
check and instead suggests (but does not work out) a theory of null aux-
iliary verbs and applies it to some obviously troublesome cases. I think
Cinque’s instinct is correct—not to write a blank check, but to develop a
substantive theory—but I would like to suggest a di¤erent solution to the
problem of apparent instances of nonmirroring.
I think the best place to start is with the recognition that syntax and
morphology (i.e., word formation, including inflected-word formation)
have different syntaxes; there are universal differences (syntax includes
XPs, morphology does not), and there are language-particular differences
(English words are head final, English phrases are not; in other lan-
guages, such as Japanese, words and phrases are both head final). But
they have one thing in common: they are both productive systems for,
among other things, representing the functional hierarchy. Crucially, they
represent the same functional hierarchy, but because they are different
systems, they do so differently.
It is of course an empirical fact, or claimed fact, that words and phrases
are different, in this way or in some other way. It is in fact an empirical
imposition that there are words—combinations of morphemes that are
smaller than phrases—at all. On minimalist assumptions it is not clear
that nonmonomorphemic words should even exist; certainly one can
imagine developing a mapping between a sound representation and a
meaning representation that does not have anything corresponding to
morphologically complex words. Artificial logical languages do not have
morphology, for example, nor do programming languages (actually, pro-
gramming languages do have morphology, but as a practice among
programmers, not as a part of official language specification).
But given that there are words, and that words cannot be theoreti-
cally dissolved into phrases, leaving only morphemes or features, the
questions remain, how do words ‘‘work’’ internally, and how do they
interface with syntactic representations? In chapter 7 I gave some idea
about how phrasal syntax represents the functional hierarchy and how
the head of a phrase relates to the head of a word; and I have located the
Mirror Principle in that representation and that relation. But now, how
do words work themselves? That is, how do words represent the func-
tional hierarchy?
The hope is of course that the correct theory of how words work will
eliminate the need for any devices that mediate between syntax and word
structure; then we will have eliminated any need for the Mirror Principle
itself, as it will simply be a property of the architecture of the theory. It
will not be the case that morphology mirrors syntax, or vice versa; rather,
they will both mirror or ‘‘represent’’ the same functional hierarchy, but in
different ways.
I have declared that RT inflectional morphology is lexicalist. But I am
sure I have overstated the matter—I am sure it would be possible to make
a Checking Theory account of verbal morphology consonant with the
rest of RT. As usual, the things we call theories are really much looser
than we think. But I do think that a lexicalist morphology is the best kind
for RT. For one thing, it makes maximal use of the representation rela-
tion to account for inflectional morphology, and for the Mirror Principle
in particular. For another, I think it would indeed be peculiar to have
eliminated NP movement from scrambling, and from the association of
theta roles with Case, but to still have a Checking Theory for verbs.
In what follows I will develop an account of the sublanguage of inflec-
tional morphology as an independent language. I will treat inflected ver-
bal elements and VPs as different languages, both representing the same
set of abstract functional elements, in accordance with the conclusion of
chapter 7.
I will propose a model for the sublanguage of verbal inflection, a for-
mal language that I think is an accurate model of inflectional morphol-
ogy, and I will present some statistical evaluation of its accuracy. I will
also give some idea of how the model can be applied to other aspects of
linguistic structure—in particular, how it models Germanic and Hungar-
ian verb raising and verb projection raising. In general, it seems to be a
promising model wherever ‘‘inheritable’’ lexical specifications play them-
selves out combinatorially without extrinsic inhibition.
8.2 The Language CAT
Let us assume that there is a universal set of elements, as in (7), and
that these elements are in a fixed hierarchical relation to one another, as
indicated.
(7) Universal elements and hierarchy
AgrS > T > (Asp >) AgrO > V
(or T > AgrS > (Asp >) AgrO > V)
The question I want to explore is how such elements can be realized by
lexical items. The elements are not themselves lexical items, but they are
realized by lexical items. For example, -ed in English realizes T (and
maybe other features at the same time), and plead realizes V.
To express the fact that one morpheme is above another in the hierar-
chy in (7), we will endow each element in (7) with a ‘‘subcategorization’’
of the form in (8), adopting the convention that if a morpheme expresses
an element, it inherits its subcategorization.
(8) a. T: AgrO
b. -ed: T, AgrO
T takes an AgrO complement, and -ed, because it realizes T, takes an
AgrO complement. Given a set of lexical elements each of which expresses
one of the elements in (7), we can derive a linear string that contains them
all, if we adopt the X-bar convention that when an element is combined
with another element that it is subcategorized for, the result is of the same
category as the subcategorizing element (the principle of X-Bar Head
Projection).
(9) [morpheme1 [morpheme2 [morpheme3 [morpheme4 [morpheme5]V]AgrO]Asp]T]AgrS
Such an account predicts that the surface order of the morphemes will
mirror the underlying relation of the elements in (7) to one another.
However, in general we find that the surface order of the morphemes of
inflected verbs di¤ers from the order in (7). One way to accommodate
these di¤erent orders is to generate (9) directly and then apply rules
that ‘‘adjust’’ it into another structure. As already mentioned, while this
approach is obviously not incoherent and so is conceivably correct, it
should nevertheless be suppressed because of the following considera-
tions: first, readjustment rules are an inorganic addition to the theory,
and second, their presence undercuts any specific expectations about the
surface order of morphemes. While (9) is not the only order realizing
(7), it is quite obvious that the possible orders realizing (7) are sharply
limited. I have posed the problem of readjustment rules as a problem
for derivation of inflected verbs in the lexicon, but it applies equally to
models that derive the inflected verb in syntax (as in Cinque 1998) or in
‘‘both’’ (as in Chomsky 1995—some sort of derivation in the lexicon,
and feature-by-feature checking in syntax under a ‘‘mirror’’ regime). In
Cinque’s model, for example, the verb moves successively through a
series of functional projections that define clause structure, picking up
one affix in each projection under adjunction; this predicts a right-linear
string of morphemes mirroring the underlying order of the functional
elements, and any deviation must be handled by a different mechanism.
I think a better approach is to somewhat enlarge the combinatorial
possibilities among the elements in (7) in the first place. The sole conven-
tion that governs combinations thus far is X-Bar Head Projection. Sup-
pose we add to this the convention that a composed unit can inherit a
subcategorization as well as a type; this subcategorization is inherited
from the nonhead (whereas the type is inherited from the head). Com-
bining the two conventions gives the following Rule of Combination
(RC):
(10) a. Rule of Combination
X: Y + Y: Z → [X + Y]: X, Z
b. X/Y + Y/Z = X/Z
(10a) is the basic rule of Categorial Grammar (given in that theory’s no-
tation in (10b)), and for this reason I will call the language generated by
RC from a set of elements CAT. To illustrate, suppose for simplicity that
T takes V as its complement, and some V takes N(P) as its complement;
then, we derive the tensed transitive verb to the right of the arrow in (11)
by applying RC to the two elements to the left of the arrow.
(11) V: NP + -ed: T, V → Ved: T, NP

RC will also derive such objects as tensed Asp, as in (12).

(12) T-morpheme: T, Asp + Asp-morpheme: Asp, AgrO → T-Asp: T, AgrO
Such a rule does not by itself allow for the generation of alternative
morpheme orders. However, it does allow for more diversity in the struc-
tures that can instantiate (7), permitting (in addition to the purely right-
linear structure) such structures as the following, where T and Asp have
combined to form an intermediate unit of type T, with subcategoriza-
tion AgrO:
(13) [AgrS [[T Asp]T, AgrO [AgrO V]AgrO]]
RC accounts straightforwardly for morphological fusion, the situation
in which one morpheme instantiates more than one feature. If some fea-
tures are permitted to have phonologically empty realizations, derivations
like the following will be possible:
(14) [e]: X, Y + [morpheme]: Y, Z → [morpheme]: X, Z

This account of fusion predicts that fused elements must be adjacent in
the hierarchy in (7), since RC will only combine adjacent elements. The
prediction is overwhelmingly true.
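The RC, including fusion through null morphemes, can be sketched in a few lines. The `(form, typ, subcat)` triple and the concatenation of forms are illustrative assumptions (CAT leaves linear order free, so the linear form shown is arbitrary):

```python
# A minimal sketch of the Rule of Combination (RC) with fusion; the
# Morph triple and the suffixal form concatenation are illustrative
# assumptions, not the book's own notation.
from typing import NamedTuple, Optional

class Morph(NamedTuple):
    form: str              # phonological form; '' models a null morpheme
    typ: str               # the functional element the morpheme realizes
    subcat: Optional[str]  # the type it selects as complement

def combine(x: Morph, y: Morph) -> Optional[Morph]:
    """RC: a morpheme of type X subcategorized for Y combines with a Y;
    the result is of type X and inherits the nonhead's subcategorization."""
    if x.subcat == y.typ:
        return Morph(y.form + x.form, x.typ, y.subcat)
    return None

ed = Morph('-ed', 'T', 'V')       # -ed realizes T and selects V, as in (11)
plead = Morph('plead', 'V', 'N')  # plead realizes V and selects N(P)
tensed = combine(ed, plead)       # Morph('plead-ed', 'T', 'N'), i.e. Ved: T, NP

# Fusion as in (14): a null AgrS morpheme combines with the T morpheme,
# so one overt form ends up spanning the adjacent elements AgrS and T.
agrs = Morph('', 'AgrS', 'T')
fused = combine(agrs, ed)         # Morph('-ed', 'AgrS', 'V')
```

Because combination always consumes a matching type/subcategorization pair, fused elements can only ever be adjacent in the chain in (7), as the text observes.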
This model still does not generate alternative orders of morphemes. To
generate different orders, we will relax the interpretation of the subcate-
gorization notation. The traditional notion of subcategorization bundles
together three different kinds of information: type, order, and level.
(15) Subcategorization
a. Type (N vs. V, etc.)
b. Order (left vs. right)
c. Level (root vs. stem; X0 vs. Xn)
So NP encodes the idea that the verb takes a nominal object (N), that it
takes it to the right, and that it takes a phrase-level complement
(NP as opposed to N). I want to investigate the properties of the lan-
guage that results from relaxing the order and level restrictions, retaining
only type subcategorization. Relaxing the order restriction means that if
V takes N(P) as a complement, then it can take it either to the right or to
the left: [V N] or [N V]. To eliminate ambiguity about which element
takes which as complement in a structure, I will use the sign ‘‘>’’ intro-
duced in chapter 7 to indicate the relation of head to complement, with
the narrow end pointing to the complement. For example, if V takes an N
complement, then both of the following constructions are licensed when
the order restriction is dropped:
(16) a. [N < V]
b. [V > N]
I will now define CAT to be the language that is generated by a set of
elements in head-complement order under the RC, where subcategoriza-
tion specifies type only, leaving level and order free.
(17) CAT = {A(B), B(C), C(D) . . . + RC}
CAT uses type subcategorization only
Put di¤erently, CAT is the set of permutations that arise from suspending
order and level subcategorization. I will now determine some properties
of CAT with an eye to evaluating its role as a model of some linguistic
systems, inflectional morphology among them.
The first thing to establish is the relation of CAT, where order and
level are relaxed, to the language that results when they are enforced.
When the elements in (17) are combined in such a way that order is
fixed, subcategorization is not inherited, and only head type is projected,
they determine a single structure, which I will call the right-linear string
(RLS).
(18) Right-linear string
[A > [B > [C > [D > [E]E]D]C]B]A
The RLS is the model of Pollock/Cinque-style clause structure, and, via
the Mirror Principle, the model of inflectional morphology that is widely
assumed.
The RLS bears a particular relation to CAT that can be explicated by
defining two CAT-preserving operations, Flip and Reassociate.
(19) a. Flip
If X ¼ [A > B], A and B terminal or nonterminal,
Flip(X) ¼ [B < A].
b. Reassociate
If X ¼ [A > [B > C]], R(X) ¼ [[A > B] > C].
Flip is CAT preserving in the sense that if [A > B] belongs to CAT, then
it is guaranteed that [B < A] belongs to CAT, by virtue of CAT’s indif-
ference to order. To show that Reassociate is CAT preserving, we reason
from the RC in this way: in X ¼ [A > [B > C]], [B > C] is of type B, with
subcategorization the same as C’s; so A must have subcategorization B
if X belongs to CAT; but then A must be directly combinable with B, and
the result of that combination will have subcategorization C; so, given
the RC, [[A > B] > C] must also belong to CAT. So, both operations are
CAT preserving. Furthermore, both have obvious inverses, and the in-
verses are also CAT preserving.
We can now show that CAT is the language that can be generated
from the RLS by Flip and Reassociate. We do this by showing that any
member X of CAT can be mapped onto the RLS by some combination of
Flip and Reassociate, and since these are invertible and CAT preserving,
that mapping can be viewed backward as a generation of X from the RLS
by some combination of Flip and Reassociate.
Suppose there is a structure X that is a member of CAT but cannot be
mapped onto the RLS by Flip or Reassociate, or their inverses. Then,
there must be some node in X that is either a left-branching structure
([[A > B] > C]) or a structure of the form [A < B], for if there are only
right-branching structures and rightward-pointing carets, then the struc-
ture is the RLS. In the first case, if right association cannot convert X to
[A > [B > C]] by reasoning already given, then it cannot belong to CAT
in the first place, and likewise for the second case; hence, there can be no
such structure. So,
(20) CAT = RLS+

where by RLS+ I mean the language generated from the RLS by Flip
and Reassociate.
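The RLS+ construction can be computed directly for small n: start from the RLS and close under Flip and Reassociate (and their inverses). The tuple encoding of trees below is an illustrative assumption:

```python
# A sketch of CAT as RLS+: the closure of the right-linear string under
# Flip and Reassociate. A tree is a tuple (d, a, b), where d == '>' means
# the left daughter is the head ([A > B]) and '<' the right ([A < B]);
# leaves are the integers 1..n. The encoding is an assumption.

def rls(n):
    t = n
    for i in range(n - 1, 0, -1):
        t = ('>', i, t)
    return t

def leaves(t):
    return (t,) if isinstance(t, int) else leaves(t[1]) + leaves(t[2])

def neighbors(t):
    if isinstance(t, int):
        return
    d, a, b = t
    # Flip: [A > B] <-> [B < A]
    yield ('<', b, a) if d == '>' else ('>', b, a)
    if d == '>':
        # Reassociate: [A > [B > C]] -> [[A > B] > C], and its inverse
        if isinstance(b, tuple) and b[0] == '>':
            yield ('>', ('>', a, b[1]), b[2])
        if isinstance(a, tuple) and a[0] == '>':
            yield ('>', a[1], ('>', a[2], b))
    for s in neighbors(a):          # apply the operations at any depth
        yield (d, s, b)
    for s in neighbors(b):
        yield (d, a, s)

def cat_orders(n):
    seen = {rls(n)}
    frontier = [rls(n)]
    while frontier:
        new = {u for t in frontier for u in neighbors(t)} - seen
        seen |= new
        frontier = list(new)
    return {leaves(t) for t in seen}
```

For n = 4 this closure yields 22 of the 24 permutations; the two missing orders are exactly the starred strings of (23) below, (3 1 4 2) and (2 4 1 3).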
The properties of CAT just identified are useful in discussing CAT as a
model of linguistic systems. By virtue of Flip and Reassociate, CAT can
be taken as a model of systems that appear to involve movement. In fact,
CAT, via its RLS+ interpretation, mimics movement of constituents of
arbitrary size, over arbitrary distances. To see this, consider the RLS in
(21a), and whether H in that structure could be moved to the position
between B and C.
(21) Flip and Reassociate can effect long-distance moves, of any node to
any higher position.
a. [A > [B > [C > [D > [E > [F > [G > [H > [I > J]]]]]]]]]
Derivation:
b. [A > [B > [C > [D > [E > [F > [G > [H > [I > J]]]]]]]]]
⇓ Reassociate
c. [A > [[[[[[B > C] > D] > E] > F] > G] > H] > [I > J]]
⇓ Flip
d. [A > [H < [[[[[B > C] > D] > E] > F] > G]] > [I > J]]
In the derivation (21b–d), first several applications of Left-Reassociate
gather all of the material intervening between the moving item and the
landing site, and then a single Flip effects the movement. It is important
to understand that as far as CAT is concerned, there is no movement;
rather, there is a theorem that if (21b) belongs to CAT, then so does
(21d); Flip and Reassociate are simply a way of thinking about this via
the RLS+ interpretation of CAT. Nevertheless, these conclusions invite
us to consider CAT as a model of linguistic structures that appear to in-
volve movement.
While a single unbounded movement is allowed, multiple movements
are quite constrained. The Flip operation in (21c) reverses the caret, thus
blocking any further applications of Reassociate. Hence, any further
movement in the vicinity of the movement path will be blocked; in par-
ticular, there will be
(22) a. no movement of the moved constituent
b. no movement out of the moved constituent (where it is complex)
c. no movement out of extracted-from constituents
It is again important to realize that these are not constraints that need to
be imposed on Flip and Reassociate; they all reduce to theorems about
CAT. A question I have not been able to answer is, is any system of
transformations of the RLS constrained by (22) equivalent to CAT or
RLS+?

Because of the restrictions in (22), CAT cannot be used to model wh
movement, as wh movement does not conform to any of them. CAT thus
differs from full-blown Categorial Grammar. In particular, it does not
have ‘‘type lifting,’’ which can be used to evade (22).
I will now try to assess how big CAT is. If the set of base elements
is finite, as it is in the cases we intend to model, CAT itself is finite.
As I characterized it earlier, for some fixed chain of elements in the
complement-taking relation, CAT defines some set of permutations of
those elements. The full set of permutations of n elements (call it P) has n!
elements (n × (n − 1) × (n − 2) × · · · × 2). As n grows, CAT becomes a tiny
subset of P; for this reason, any system of a certain size that resembles
CAT most likely is CAT.
For three elements, CAT is actually identical to P, but for any larger n
it is not.
(23) Suppose 1 > 2 > 3 > 4 > 5. Then:
3: 1 2 3
[2 < 1] > 3
1 [3 2]
3 < [1 > 2]
[2 3] 1
3 < [2 > 1]
4: 1 2 [3 > 4]
3 [1 2] 4
1 2 [4 < 3]
*3 1 4 2
1 > [[3 < 2] > 4]
*2 4 1 3
5: *3 1 5 2 4, etc.
The starred strings are the non-CAT strings for n = 3, 4, and 5. To see
that they are non-CAT, we can try to build a parse tree for them from the
bottom up; for the examples given, there is no way to start building the
tree, because no adjacent elements are combinable in either direction (this
does not, however, characterize all failures of strings to be members of
CAT).
The non-CAT strings given here are derivable from the RLS by move-
ment free of the constraints in (22). For example, (24) gives the derivation
of ‘‘*3 1 5 2 4.’’
(24) (1 2 3 4 5) → 1 5 2 3 (4 t) → 3 1 5 2 t (4 t) = 3 1 5 2 4

In the first step 5 is extracted from ‘‘2 3 4 5’’; in the second step 3 is
extracted from that as well, violating the prohibition against extraction
from extracted-from constituents.
In what follows I will try to give some idea of how fast CAT grows rela-
tive to P. The table in (25) shows how many elements of P are excluded
from CAT for n = 3 . . . 9, and the percentage excluded. Evidently, CAT
becomes a vanishing portion of P.
(25) n    Total : Excluded from CAT    % excluded
3    6 : 0            0
4    24 : 2           8.3
5    120 : 30         25.0
6    720 : 326        45.3
7    5,040 : 3,234    64.1
8    40,320 : 31,762  78.8
9    362,880 : 321,244  88.5
I have not been able to devise a formula that will give the number of
CAT elements for n elements, so the figures in (25) were calculated by
hand. There is a formula that puts an upper bound on CAT and is still
smaller than P; the table in (26) compares the value of this formula with
P. (FR = Flip-Reassociate upper bound)

(26) FR = 2^(2n-3)

n     Ratio of n! to FR(n)
2     1.00e+000
3     7.50e-001
4     7.50e-001
5     9.38e-001
6     1.41e+000
7     2.46e+000
8     4.92e+000
9     1.11e+001
10    2.77e+001
11    7.61e+001
12    2.28e+002
13    7.42e+002
14    2.60e+003
15    9.74e+003
16    3.90e+004
17    1.66e+005
18    7.45e+005
19    3.54e+006
The formula is arrived at by considering each node to be independently
flippable, and each pair of adjacent nodes to be independently reassoci-
able; since there are n - 1 of the former and n - 2 of the latter, there are

(27) 2^(n-1) × 2^(n-2) = 2^(2n-3)
ways to transform the RLS to generate CAT. But this overestimates the
actual number of permutations: any pair of adjacent unflipped right-
associated nodes in a structure X can be left-associated to yield another
member of CAT that has the same order of terminal elements, so the
same permutation of elements will be counted twice. I have not figured
out a way to subtract or to estimate the size of such redundancies.
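The bound and the ratios in (26)–(27) are easy to recompute mechanically; this small sketch just reproduces the tabulated values:

```python
# A sketch recomputing the Flip-Reassociate upper bound FR(n) = 2^(2n-3)
# and the ratio n!/FR(n) tabulated in (26).
import math

def fr(n):
    # each of the n-1 nodes is independently flippable, and each of the
    # n-2 adjacent node pairs independently reassociable, as in (27)
    return 2 ** (2 * n - 3)

def ratio(n):
    return math.factorial(n) / fr(n)

# e.g. ratio(5) = 120/128 = 0.9375, matching the 9.38e-001 entry in (26)
```

As the table shows, n! overtakes the bound at n = 6 and the ratio grows rapidly thereafter.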
Clearly, if we were modeling a linguistic system involving 15 con-
catenating elements, and the observed permutations of these elements
were found to conform to what would be expected of a CAT system, we
would have resounding confirmation that CAT is a good model of the
system, since the chance that exactly these orders would arise in a system
not essentially equivalent to CAT would be small. Unfortunately, most
linguistic systems do not involve the concatenation of such large numbers
of elements; some cases of interest, such as inflectional systems, may in-
volve 4 to 6 elements, and at that level the difference between CAT and P
is not astronomical. Conclusions can nevertheless be drawn for systems of
this size as well, if a number of different languages are considered. For
example, since the chance that 10 languages with 5 morphemes are all
CAT is .75^10, about 5%, one could claim significant confirmation of the CAT-
like behavior of the subsystem in question from a collection of 10 such
languages. With this in mind, in section 8.4 I will survey inflectional sys-
tems with 4 and 5 morphemes to assess CAT as a model of inflection.
8.3 Inflectional Systems as an Instantiation of CAT
Suppose we have a fixed universal chain of elements in the complement-of
relation, as in (28).
(28) Universal elements and hierarchy
AgrS > T > Asp > AgrO > V, or perhaps
T > AgrS > Asp > AgrO > V (type subcategorization only)
As before, the caret in X > Y means ‘X takes things of type Y as com-
plement’, but with no restriction on the linear order of the elements or on
the ‘‘level’’ (i.e., bar level, as in X-bar theory) of the elements.
CAT with (28) as its base is clearly not a good model of any particular
language’s inflectional morphology, as no language has inflectional mor-
phology where, for example, the past tense affix may freely occur either
before or after the verb (corresponding to Flip). Any given language will
fix the linear order. In addition, any given language will fix the ‘‘level’’ at
which items attach, in a way that I will make precise.
We might say that CAT models inflectional morphology in the sense
that it sets the limits on possible realizations of the universal chain in (28),
but that any particular language will impose order and level constraints
on the subcategorization of particular items that will yield some subset of
CAT. In particular, it would be interesting to explore the possibility that
the only way inflectional systems can differ is in terms of these two prop-
erties. (29) is an attempt to formulate this hypothesis.
(29) Lexical Variation Hypothesis
Language-particular inflectional systems differ only in
a. order restrictions
b. level restrictions
on the subcategorizations of individual morphemes or classes of
morphemes.
The Lexical Variation Hypothesis (LVH) is independent of whether CAT
is a good model of inflection in general; it could be that CAT sets accu-
rate bounds on what permutations of elements in general can instantiate
the chain in (28), but that the way languages di¤er within that bound is
something other than (29). In what follows I will be evaluating the LVH
as well as CAT, but CAT is the main prey.
The order restriction determines the difference between prefix and suffix
for morphemes, and the difference between head-initial and head-final
order in syntax.
The level restrictions have to do with what ‘‘size’’ the complement must
be. The details depend on assumptions about what units are available in
the first place. Two cases will be of interest here. One, already mentioned,
will be the word/phrase distinction; the subcategorization N, for exam-
ple, I will take to be ambiguous between N0 and NP. In addition, we
will need recourse to levels of morphological structure, the most familiar
version of which is the root/stem/word distinction introduced in Selkirk
1982, where stems are composed of roots, but not vice versa, and words
are composed of stems, but not vice versa, giving a three-way distinction
among levels. So we will allow a language to impose a restriction on an
AgrO morpheme, for example, that it attach to a verb root, and not to
any other level of verb, in accordance with the LVH.
I should note that this system will give ambiguous derivations for cases
that are not normally ambiguous, and where there is no obvious semantic
ambiguity. To take an example from English derivational morphology, if
both -ate and -ion are type 1 (root-attaching) suffixes, then both of the
following structures will be allowed:
(30) a. [[affect + ate] + ion]
b. [affect [ate + ion]]
If the subcategorizations and restrictions are satisfied in (30a), then under
the RC, they must be satisfied in (30b) as well. The possibility of structure
(30b) might be welcome for such cases, as there is some tendency to think
of -ation as a single affix in such cases; in the present case, for instance,
there is no word *affectate.
Strictly speaking, a further unfamiliar sort of derivation should be
possible as well. Typically, the lexicon is divided into roots, stems, root-
attaching affixes, and stem-attaching affixes. But, in fact, the system
proposed here does not give exactly this classification. Consider the prop-
erties of -able and -ity listed in (31a,b).
(31) a. -able: A, Vstem
b. -ity: N, Aroot
c. -ability: N, Vstem
d. [compact + [ability]]
The question raised in (31c) is, can -ity attach directly to -able, to derive
the complex suffix -ability, with the properties shown in (31c) (derived by
the RC)? The question comes down to whether or not -able can satisfy the
subcategorization of -ity, and crucial to the answer is whether it satisfies
the restriction that -ity attaches only to roots. Now, -able itself attaches to
stems, but this leaves open the question whether it is itself a stem or a
root or both. If we decide that it can be a root, then there is nothing to
block (31c), and so (31d) will be a typical nominalization using these two
affixes. If CAT is right, then these ambiguities are harmless; if they can be
verified, I would in fact consider them confirmatory, because they would
be puzzling without CAT.
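The -able/-ity question can be made concrete. The record fields, the `'root'`/`'stem'` labels, and the decision to let -able itself count as a root are all illustrative assumptions following (31):

```python
# A sketch of level restrictions on affix subcategorization, as in (31);
# the field names and the choice to treat -able as a root are illustrative
# assumptions, not settled analyses.
from typing import NamedTuple, Optional

class Affix(NamedTuple):
    form: str
    typ: str          # category of the result
    sel: str          # category selected
    attaches_to: str  # level restriction on the complement: 'root' or 'stem'
    level: str        # level of the affix itself

def attach(outer: Affix, inner: Affix) -> Optional[Affix]:
    """Combine under the RC if type and level restrictions are met; the
    result keeps the outer's type and inherits the inner's selection."""
    if outer.sel == inner.typ and outer.attaches_to == inner.level:
        # morphophonology (able + ity -> ability) is ignored here
        return Affix(inner.form + outer.form.lstrip('-'), outer.typ,
                     inner.sel, inner.attaches_to, outer.level)
    return None

able = Affix('-able', 'A', 'V', 'stem', 'root')  # treating -able as a root
ity = Affix('-ity', 'N', 'A', 'root', 'root')
ability = attach(ity, able)
# -> Affix('-ableity', 'N', 'V', 'stem', 'root'): the complex suffix of (31c)
```

Changing -able's own level to `'stem'` makes `attach(ity, able)` fail, which is exactly the open question the text raises: the derivation of the complex suffix stands or falls with whether -able can be a root.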
For each language I will examine in section 8.4, I will ask two ques-
tions. First, is the order of inflected elements a CAT order or not? Sec-
ond, is there a reasonable specification of order and level restrictions on
the morphemes that instantiate the functional elements that will yield the
particular shape of the inflected word in that language? The first question
addresses CAT by itself, the second, CAT plus the LVH.
I will begin with the assumption that the chain of elements in (28) is
the fixed universal base for CAT; any flexibility introduced into this as-
sumption would not be necessarily incompatible with CAT, but it would
weaken empirical expectations. If, for example, T and AgrS were ordered
differently in different languages, we would simply have different bases
for CAT in those different languages.
In order to have a verbal morphology, a language needs a set of mor-
pheme classes that span the functional chain. Recall that a morpheme can
span subchains of the functional chain through fusion, which arises when
one of the morphemes that the RC combines is a null morpheme. In
general, the fusions that occur in a language are systematic; for example,
in English AgrS and T always fuse. Such generalizations are part of the
lexical style of the language; but, while fascinating in their own right
and essentially not understood, they are not directly the subject at hand.
In (32) the set {m1, m2, m3} is a spanning vocabulary for F1 . . . F6.

(32) [diagram: m1, m2, and m3 each span a contiguous stretch of the chain
F1 > F2 > F3 > F4 > F5 > F6, with m1 the highest]
If the RC generates m1, m2, and m3, then it is guaranteed that m2 can
combine with m3, and the result of that combination can combine with
m1, and so [m1 [m2 m3]] will span the functional structure. The spanning
vocabulary might consist of affixes, in which case single inflected words
will span the functional structure; or it might consist of words, in which
case syntactic constructions will span the functional structure (giving rise
to what are called auxiliary verb systems); or it might consist of some
combination of the two. In English, for example, the spanning vocabu-
lary consists of both words and roots and affixes.
(33) T  >  AgrS  >  Asp  >  AgrO  >  V
     |-----was-----|  |-------seeing-------|
Was is a word that spans T and AgrS; seeing is a (derived) word that
spans Asp, AgrO, and V (under the assumption that AgrO is universally a
part of the chain). Was and seeingP can be combined in syntax, since was is
a T, AgrO element and seeingP is a projection of the AgrO element seeing.
(34) a. In lexicon
        ing: AgrO + Asp, V
        see: V, NP
        see + ing → seeing: Asp, NP
     b. In syntax
        was: T, AspP
        seeing: AspP, NP
        seeing + NP → [seeing NP]AspP
        [was] + [seeing NP]AspP → [was [seeing NP]]TP
Importantly, it is the RC that is responsible for the operations in both
syntax and morphology. The only difference is that in morphology it
combines X0-level objects, whereas in syntax it combines X0- and XP-
level objects; but this is the characteristic difference between syntax and
morphology in any event. From that difference arises the further differ-
ence that inheritance of subcategorization largely has no effect in phrasal
syntax, since XPs have no subcategorization. It is possible that there are
phrasal syntax junctures of units smaller than XP, in which case inheri-
tance should be detectable again; I will suggest in section 8.5 that this is
the correct view of the syntax of verb-raising constructions.
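The RC's role as a single type-driven combinatory operation can be made concrete. The following is a minimal sketch of the derivation of seeing in (34a), in my own encoding rather than the book's formalism: a morpheme carries a span of functional elements (its type is the topmost) and a subcategorization, and combination checks the subcategorization against the complement's type, with the complement's own subcategorization inherited by the result.

```python
# A sketch of the RC as type-driven combination, following the notation of
# (34). Class and function names are illustrative, not the book's.

class Morpheme:
    def __init__(self, form, span, subcat=None):
        self.form = form        # phonological shape, e.g. "ing"
        self.span = span        # functional elements realized, top to bottom
        self.subcat = subcat    # the element this morpheme selects

    @property
    def type(self):
        return self.span[0]     # topmost element of the span

def combine(affix, stem):
    """RC: affix + stem. The affix's subcategorization must match the stem's
    type; the result spans both and inherits the stem's subcategorization."""
    if affix.subcat != stem.type:
        raise ValueError(f"{affix.form} selects {affix.subcat}, not {stem.type}")
    return Morpheme(stem.form + affix.form,   # suffixation, as in English
                    affix.span + stem.span,
                    stem.subcat)              # inheritance of subcategorization

# The lexical derivation of "seeing" in (34a):
ing = Morpheme("ing", ["Asp", "AgrO"], subcat="V")
see = Morpheme("see", ["V"], subcat="NP")
seeing = combine(ing, see)
print(seeing.form, seeing.type, seeing.subcat)   # seeing Asp NP
```

The same `combine` would serve in syntax, with the level of the combined objects (X0 versus XP) the only point of variation.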
An obvious difficulty for the notion of spanning vocabulary, as it arises
from the RC, is the existence of multiple exponence. Multiple exponence
(the expression of a single functional element on more than one mor-
pheme in an inflected verb) should be impossible given the RC. This is
because if a feature is in two morphemes, there is no way those mor-
phemes can be combined by the RC: the subcategorization of one can
never match the type of the other, nor can they be hooked together by
any intermediate morphemes, for essentially the same reason.
Thus far I have assumed that the functional elements that a morpheme
‘‘realizes’’ will be exactly the set of elements that the shape of the mor-
pheme is sensitive to. This is a very natural assumption; for example, the
fact that the appearance, or not, of the -s marker on English verbs is
sensitive to the functional elements of Tense and Person and Number
leads us to suppose that -s ‘‘represents’’ these features. To account for the
possibility of multiple exponence, we must pull apart somewhat these two
properties of morphemes. We must allow a morpheme to be ‘‘sensitive
to’’ more features than it realizes. The result will inevitably be a weaker
theory, though there is at least one version that has some teeth to it.
Suppose the functional elements that a morpheme represents must be
some continuous subsequence of the full chain of functional elements that
it is sensitive to. This would allow a notation in which the functional ele-
ments that the morpheme is sensitive to, but that it does not represent,
can simply be marked as ‘‘inert’’ for the purposes of the RC. (I will use
parentheses to mark such an inert subsequence.)
(35) Multiple exponence
T > AgrS > Asp > AgrO > V
|----af1----(-----------)|
            |----af2----|
Suppose that af1 is ‘‘sensitive to’’ the features T through AgrO, whereas
af2 is sensitive to Asp through AgrO. Without the notion of inert element
the RC could not combine both af1 and af2 with a verb, to derive forms
like (36), because both affixes would be subcategorized for AgrO, but
neither would be AgrO.
(36) V + af1 + af2

But suppose Asp and AgrO are inert for the purposes of the RC, even
though they are relevant for the paradigmatic behavior of the class of
affixes that af1 belongs to. The resulting representations will be as in (37)
and the RC can combine them both with the verb as in (37c), since now
the type of af2 matches the subcategorization of af1.
(37) a. af1: T > AgrS > (Asp > AgrO)   T, Asp
     b. af2: Asp > AgrO                Asp, V
     c. [[V + af2]Asp + af1]

The restriction of inert elements to a single subsequence is an empiri-
cally sharp prediction, though I am not in a position to present evidence
that would confirm or refute it. We could envision an even tighter ver-
sion, in which the subsequence was always peripheral; again, I have no
idea how such a restriction would fare empirically, but it is easy to imag-
ine a kind of language that would refute it.
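The single-subsequence restriction is mechanically checkable. Here is a small sketch (names and encoding are my own) that computes the RC-visible part of a morpheme's span and rejects scattered inertness assignments, on the model of (37):

```python
# A sketch of the inert-subsequence restriction: only one contiguous,
# parenthesized stretch of a morpheme's span may be inert. The helper
# returns the elements visible to the RC; names are illustrative.

def active_span(span, inert):
    """Check that `inert` picks out one contiguous subsequence of `span`,
    then return the remaining (active) elements."""
    positions = [i for i, f in enumerate(span) if f in inert]
    if positions and positions != list(range(positions[0], positions[-1] + 1)):
        raise ValueError("inert elements must form a single contiguous subsequence")
    return [f for f in span if f not in inert]

# af1 of (37a): sensitive to T > AgrS > (Asp > AgrO); for the RC it behaves
# as a T-element subcategorized for Asp.
print(active_span(["T", "AgrS", "Asp", "AgrO"], {"Asp", "AgrO"}))  # ['T', 'AgrS']

# A scattered assignment such as T > (AgrS) > Asp > (AgrO) is excluded:
try:
    active_span(["T", "AgrS", "Asp", "AgrO"], {"AgrS", "AgrO"})
except ValueError as err:
    print("rejected:", err)
```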
But one problem with the account as it stands is that it permits arbi-
trary choices in determining which affix has the inert features, and where
features are inert. Imagine two affixes, af1 and af2, each sensitive to the
same subchain of three elements.
(38) a. . . . F1 > F2 > F3 . . .
        af1: F1 > F2 > F3
        af2: F1 > F2 > F3
     b. V + af1 + af2
There are six different ways that inertness could be assigned so that af1
and af2 can be combined with a verb as successive morphemes, as in
(38b); (39) shows three of them.
(39) a. af1: (F1 > F2 > F3)
        af2: F1 > F2 > F3
     b. af1: (F1 > F2) > F3
        af2: F1 > F2 > (F3)
     c. af1: (F1) > F2 > F3
        af2: F1 > (F2 > F3)
We can probably rule out (39a) on general grounds: it gives af1 no fea-
tures for the RC to use in deriving complex verbs, so such an affix would
never appear in a derivation that was purely the result of successive
applications of the RC. As for the difference between (39b) and (39c),
there is an interesting connection between inertness of features and para-
digm structure that could be used to give determinate analyses in such
cases. Elsewhere (Williams 1997) I have proposed that the inert elements
will always be minor paradigm dimensions, and the noninert elements will
be major paradigm dimensions. Major dimensions represent the broadest
subdivisions in the paradigm, and evidence for major versus minor
status comes from studying syncretism in the paradigm. The fact that all
English past tense forms fall together (e.g., pleaded is the past form for
all persons and numbers) is evidence for the major status of Tense in
English, whereas the fact that English 3rd person forms do not fall to-
gether ( pleads vs. plead ) shows that Person is a minor dimension.
This connection to paradigm structure could resolve ambiguities in the
lexical assignment of inertness. If the analysis in (39b) were correct, for
example, we would expect F3 to be more major than af2, but no such
expectation arises from analysis (39c).
There are in fact languages with exactly the affix pattern illustrated in
(39), Arabic and Georgian being among those I have analyzed as just
proposed. In each language there are two morpheme classes (a prefix class
and a suffix class) each of which is sensitive to exactly the same set of
features ({SubjNumber, SubjPerson, ObjNumber, ObjPerson} in Geor-
gian and {SubjGender, SubjPerson, SubjNumber} in Arabic). (My anal-
yses were based on prior studies by Anderson (1992) and Noyer (1992),
respectively.)
There is a huge potential for ambiguity in the assignment of inertness
for the Georgian case especially, where four features are implicated. (40)
lists half of the possibilities.
(40) a. af1: F1 > F2 > F3 > F4
        af2: (F1 > F2 > F3 > F4)
     b. af1: (F1) > F2 > F3 > F4
        af2: F1 > (F2 > F3 > F4)
     c. af1: (F1 > F2) > F3 > F4
        af2: F1 > F2 > (F3 > F4)
     d. af1: F1 > (F2 > F3 > F4)
        af2: (F1) > F2 > F3 > F4
     e. af1: (F1 > F2 > F3 > F4)
        af2: F1 > F2 > F3 > F4
In both languages examined it turns out that if Fi is a major dimension
for af1, then it is a minor dimension for af2, and vice versa. The Georgian
inflected verb, for example, has the form in (41), where both affixes
are sensitive to both subject agreement features and object agreement
features.
(41) F1 > F2 > F3 > F4
     af1 + root + af2
But an examination of syncretisms in the paradigms for the two affixes
shows that Subject features are major dimensions for the suffix, and
minor for the prefix, and Object features are the opposite, so that the
underdetermination is resolved, giving a system something like (40c).
It is perhaps surprising that the paradigms associated with morphemes
sensitive to identical sets of features should not have the same major/
minor dimensional ordering within the same language, but that may be
one of the milder surprises in store for us in the much-studied but little-
understood human ability to build paradigms. The structure of paradigms
is not the subject of this book; but see Williams 1997 for a discussion of
the paradigms from Arabic and Georgian that substantiate the claims
about paradigms made here.
For present purposes it is enough to know that the hypothesized con-
nection to paradigm structure can eliminate the arbitrariness of deter-
mining what functional elements are inert in what morphemes and can
therefore yield more determinate analyses.
8.4 Some Inflectional Systems
I have already outlined the enterprise of this section. For each language I
will first determine whether CAT plus a universal base of functional ele-
ments sets the proper bounds on what an inflectional system can do to
represent functional elements; and, second, see if the details of word shape
in particular languages can be predicted by specifying level and order
restrictions on particular morphemes or classes of morphemes, in accor-
dance with the LVH.
The simplest sort of language from an inflectional point of view is one
where the RLS of functional elements is realized as an RLS of suffixes.
(42) [[[[V]V + af2]AgrO + af3]T + af4]AgrS
Such a language is the one expected in particular in Cinque’s version of
the Pollock-style model, in which the verb moves in syntax through the
head position of a series of functional projections, one projection for each
functional element, picking up an affix in each move by left-adjoining to
it. In the terms I will use to describe inflectional systems here, it is a lan-
guage that exhibits no fusion, and in which each morpheme takes its
complement to the left. If we use a kind of level restriction to bar affix-
ation to other affixes, then exactly the left-linear structure will result.
(43) [[[V + af1] + af2] + af3]

While languages do exist that so transparently represent the functional
chain, they are somewhat rare.
More complex is a language with some fusion, but with the trans-
parently mirroring order of markers. Consider for example Mohawk or
Southern Tiwa, whose verbal inflectional systems look like this:
(44) a. Ka-’u’u-wia-ban.
        1subj.2obj-baby-give-past
        (Southern Tiwa)
     b. [AgrS > AgrO > V] < T
        [AgrS = AgrO > V] < T
     c. T: suffix, T, AgrS
        AgrS = AgrO: prefix arising from fusion: AgrS, V
        V: stem
Example (44a) shows that subject and object agreement marking are
fused into one morpheme, ka. (‘‘=’’ represents the boundary at which two
adjacent elements are fused.) Mohawk and Southern Tiwa have the fur-
ther complication that T is on the opposite side of the stem, but as the
parse in (44b) suggests, this is not a problem for the hierarchical relation
among the elements. The match between functional elements and mor-
phemes is one-to-one except for the single fusion. Note that Mohawk and
Southern Tiwa require the complement order AgrS < T so that AgrO and
AgrS will be adjacent, hence able to fuse; it remains to be seen if that
order is universally possible. (44c) shows the language-particular specifi-
cations that determine the shape of the inflected verb.
Swahili, which does not exhibit fusion, would at first glance seem to
provide an even more transparent representation of the functional ele-
ments and thus an exact match between morphemes and functional
elements. But findings reported by Barrett-Keach (1986) show that the
Swahili inflected verb does not instantiate the RLS. Barrett-Keach shows
that the inflected verb has an internally bifurcated structure, as illustrated
in (45b).
(45) a. [AgrS + T + AgrO + V]word
     b. [[AgrS + T] [AgrO + V]]word
Barrett-Keach gives two kinds of evidence for this conclusion. First, the
inflected verb has the accent pattern that Swahili assigns to compound
terms generally, including nominal compounds: main stress on the pen-
ultimate syllable of the second element, and secondary stress on the
penultimate syllable of the first element. (SP and OP stand for subject
and object pronoun clitic.)
(46) Juma a-li-ki-soma kitabu.
     Juma sp-past-op-read book
     ‘Juma read the book.’
     (Barrett-Keach 1986, (1a))
This would follow if the structure in (45b) were correct, and the two con-
stituents of the inflected verb were identified as stems.
(47) [[T AgrS]stem [AgrO V]stem]word
Barrett-Keach’s second piece of evidence is that Swahili has a suffix, cho,
indicating relativization, which can appear in the middle of the inflected
verb, exactly between the two hypothesized stems.
(48) kitabu a-li-cho-ki-soma
     book sp-past-rel-op-read
     ‘the book which s/he read’
     (Barrett-Keach 1986, (10))
Cho is clearly a suffix, because it can also be appended to the comple-
mentizer (amba + cho). It can receive a unitary account only if a-li-cho-ki-
soma has the internal structure indicated in (45), which allows cho to be
appended to the first stem of the inflected verb. We can achieve that
structure by stipulating the following language-specific constraints on the
morphemes that realize the functional elements:
(49) T: prefix
AgrS: stem
AgrO: prefix
V: stem
T and AgrS compose a stem through affixation, as do AgrO and V; then
compounding (actually, the RC applying to two stems) assembles the
complete inflected verb from these subunits. Swahili illustrates what
might be called a word-internal auxiliary system (the T-AgrS stem), and
this treatment of it prefigures my general treatment of auxiliary systems.
I now turn to the more problematic cases for theories that essentially
expect the RLS (or LLS) as the only realization of functional elements.
The first is Navajo, in which AgrS intervenes between AgrO and V.
(50) AgrO Asp T AgrS V
There are two ways to parse this structure in CAT terms, depending
on whether T > AgrS (51a) or AgrS > T (51b). The lexical specifications
needed to force the analysis are given below each parse.
(51) a. [AgrO < [Asp < [T > AgrS]]] > V
T: prefix
AgrS: stem
Asp: prefix
AgrO: prefix
b. [[AgrO < Asp] < T] < [AgrS > V]
T: suffix
Asp: suffix
AgrS: suffix
AgrO: stem
Mohawk and Swahili both require T > AgrS, so we might want to tenta-
tively assume that as the universal order and therefore favor parse (51a).
On behalf of parse (51b) we could point to the uniform suffixation to the
AgrO stem that would result; although mixed systems exist with both
prefixes and suffixes, the economics of the lexicon may favor uniform
prefixation or suffixation. I leave the question open.
Inuit is the mirror image of Navajo, with AgrS between V and AgrO as
a suffix.
(52) a. V T AgrS AgrO
     b. Piita-p mattak niri-va-a-0.
        Piita-erg mattak.abs ate-indic-3sg.subj-3sg.obj
        ‘Piita ate the mattak.’
        (Bok-Bennema 1995, 105)
(53) V < [T > AgrS > AgrO]
AgrS: prefix
T: prefix
AgrO: stem
(52) shows the order of elements, and (53) shows the parse and the lexical
specifications that force the analysis. As with Navajo, a different parse
results if AgrS > T.
In Yuman, Lakhota, and Alabama, on the other hand, there is a CAT
parse only if T > AgrO.
(54) AgrO AgrS V T
[[AgrO < AgrS] > V] < T
(P. Munro, personal communication)
There is no parse if AgrS > T, as the string then represents the ‘‘3 1 4 2’’
configuration already shown to lie outside CAT. We now have a conflict
between the requirements of two different languages: Navajo requires
T > AgrS because its fusion of AgrS and AgrO entails that these must be
adjacent in the chain; Yuman, on the other hand, requires AgrS > T. An
obvious way to resolve this would be to allow languages to differ in their
choice on this point, or, equivalently, to claim that there are two distinct
notions of Tense, one superior to and the other inferior to AgrS. Con-
vincing support for the latter position would of course be a language in
which both occur. I will leave the matter unresolved.
In English, auxiliary verbs are part of the spanning vocabulary. The
auxiliary verbs take their complements in syntax rather than morphology;
consequently, their complements are XPs rather than Xs. The special
feature of the morphology is that only one affix occurs on the verb; AgrS
and T are always fused. Temporarily abandoning the fixed universal
functional chain. How this new chain is related to the universal chain will
be left open.
(55) English functional chain
AgrS = T > Asp1 > Asp2 > Voi > V
How various elements of the spanning vocabulary are related to the
functional chain is shown in (56).
(56) T Asp1 Asp2 Voi V
might have been being killed
|-killed-|
|------killing------|
|-----------killed-----------|
|------------------kill------------------|
|-----------------------kills-----------------------|
|------------passive--was------------|
|-------has--------|
|--been--|
|-modal-|
|-----s-----|
The complex items shown here are derived by the RC from more ele-
mentary morphemes; for example, kills, which spans the whole chain, is
formed from kill and -s. The basic rule for relating form to function here
is the following: the stem of a form is determined by the right edge of its
span, and the form of that stem by the left edge. For example, has spans
T and Asp1. If it spanned more to the right, a di¤erent stem would be
used (has vs. was); if it spanned less to the left, a di¤erent form would
be used (e.g., has vs. have).
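The edge rule can be made concrete with a small lookup that hand-encodes a few cells of (56). The tables and names below are my own illustration, keying the stem to the right edge of the span and the inflectional form to the left edge, as the has/was/have examples suggest:

```python
# A sketch of the edge rule: the right edge of a span selects the stem, and
# the left edge selects that stem's inflectional form. Only a few cells of
# (56) are encoded; the tables are illustrative, not a full analysis.

STEM = {"Asp1": "have", "Asp2": "be", "Voi": "be", "V": "kill"}  # by right edge

FORM = {                                  # (left edge, stem) -> surface shape
    ("T", "have"): "has",
    ("T", "be"): "was",
    ("Asp1", "be"): "been",
    ("T", "kill"): "kills",
}

def realize(left, right):
    """Realize the morpheme whose span runs from `left` down to `right`."""
    return FORM[(left, STEM[right])]

print(realize("T", "Asp1"))     # has:  span T..Asp1
print(realize("T", "Voi"))      # was:  span reaches further right, stem "be"
print(realize("Asp1", "Asp2"))  # been: lower left edge, a nonfinite form
print(realize("T", "V"))        # kills: a main verb spanning the whole chain
```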
A given clause will span functional structure by a combination of mor-
phological and syntactic derived units. For example, (57) shows the
derivation of the pieces of John was being sued, from bottom to top.
(57) [be < ing]Voi, V morphology
[sue < ed]V, NP morphology
[being > suedP]VoiP syntax
John [was > [being > sued]]T syntax
The RC applies in both the lexicon and the syntax. The only difference in
the outcome is determined by independent differences between morphol-
ogy and syntax: complementation is left-headed in syntax but right-headed
in morphology, and complements are phrases in syntax but X0s in
morphology.
The RC, along with lexical specifications of the type that the LVH
affords, thus lays out a good first approximation to the general question,
what is a possible verbal inflectional system in natural language? The fact
that the RC is invariant across agglutinating and isolating systems makes
it the only real candidate for a general answer. In what follows I will
sketch its role in other domains.
8.5 Verb (Projection) Raising as an Instance of CAT
I now turn to an application of CAT outside inflectional morphology:
namely, the realm of verb projection raising. In fact, I believe the appli-
cations of CAT outside morphology are numerous, and I have picked
verb projection raising merely as an illustration. My best guess is that
CAT is the relevant model of a system that involves only the playing out
of lexical specifications of type, order, and level.
The analysis presented here is based on Haegeman and Van Riems-
dijk’s (1986) discussion of the phenomenon. The model I present below
incorporates insights from their work, but rejects the role of movement in
the system, deriving all forms directly by the RC and lexical specifications
of order and level.
Example (58) illustrates verb raising in Dutch.
(58) a. *dat
that
Jan
Jan
een
a
huis
house
kopen
buy
wil
wants
(‘‘DS’’) NP < V < V
‘that Jan wants to buy a house’
b. dat Jan een huis wil kopen (VR) NP < [V > V]
c. *dat Jan wil een huis kopen (VPR) V > [NP < V]
(H&VR 1986, 419)
(58a) is the (ungrammatical) deep structure in Haegeman and Van
Riemsdijk’s (H&VR) model; (58b) is the verb-raising (VR) structure.
(58c) is the verb projection raising (VPR) structure, which is ungram-
matical in Dutch. In the VR construction, an embedded verb is raised out
of its complement and adjoined to the matrix verb, to the right; in the
VPR construction, the same operation is performed on an embedded VP.
While VPR is ungrammatical in Dutch, it is found in some other Ger-
manic dialects, such as West Flemish (59) and Swiss German (60).
(59) a. da Jan een hus kopen wilt NP < V < V
b. da Jan een hus wilt kopen (VR) NP < [V > V]
c. da Jan wilt een hus kopen (VPR) V > [NP < V]
(H&VR 1986, 419)
(60) a. das de Hans es huus chaufe wil NP < V < V
b. das de Hans es huus wil chaufe (VR) NP < [V > V]
c. das de Hans wil es huus chaufe (VPR) V > [NP < V]
(H&VR 1986, 419)
I will analyze the VR and VPR constructions as instantiations of CAT.
This means that CAT sets the outer bounds on the form that these
constructions can take. It also means that all variation will be found in
the level and order subcategorizations of predicates or classes of predi-
cates. In the right margin of the constructions listed above are the CAT
representations.
If we were to interpret CAT as RLS+ (or, more appropriately, LLS+:
just like RLS+, but using the LLS instead of the RLS as the base), then
we would take (59a) as the LLS and [[NP < V1]VP < V0] as the base
structure, and we would apply Flip and Reassociate to derive [NP <
[V0 > V1]], which is the West Flemish VR structure. Some mechanism
would be needed to guarantee that Reassociate and Flip applied obliga-
torily in this case.
I will instead model V(P)R directly as CAT, in accordance with the
LVH. Under this interpretation Flip will correspond to order : right,
absence of Flip to order : left, and optional Flip to unspecified order. Left-
Reassociate will correspond to level :X0, which gives VR; lack of Left-
Reassociate will correspond to level :XP; and optional Left-Reassociate
will correspond to unspecified level. It seems to me that the entire range
of constructions discussed by H&VR can be described in these terms.
Dutch, for example, obligatorily Flips embedded verbs, but never VPs;
in CAT terms this means that the verbs in question have the subcatego-
rization shown in (61).
(61) V0
Modal verbs are exceptional in that they undergo Flip optionally.
(62) a. dat ik hem zien wilM
        that I him see want
     b. dat ik hem wilM zien
     (H&VR 1986, 426)
In CAT terms this means that the order parameter is unset for these
verbs; or, equivalently, they have the additional subcategorization in (63).
(63) V0
There is an unexpected exception to (63): only basic Vs can have this
subcategorization, not V0s that are themselves complex verbs.
(64) a. *dat ik hem kunnen zien wilM
        that I him can see want
     b. dat ik hem wilM kunnen zien
        ‘that I want to be able to see him’
     (H&VR 1986, 426)
This restriction is intuitively a level constraint: complex [V V] structures
are ‘‘bigger’’ than simple Vs. If we use the term stem in such a way that it
includes simple Vs, but excludes V-V compound verbs, then we could add
the level restriction to (63) to get (65).
(65) V0stem
In all of these cases the derived verb cluster has the same subcatego-
rization as the complement verb in the cluster, as determined by the RC.
As a result, hem is the direct object of the complex cluster in (64b), for
example, so the CAT structure of that clause is as follows:
(66) dat ik [hem < [wil > [kunnen > zien]V]V]VP
German has obligatory Flip for auxiliary verbs (H&VR 1986, 427) but
optional Flip for modals; these are straightforwardly treated as order
constraints on the model of Dutch.
West Flemish obligatorily Flips either the V or the whole VP around a
modal or auxiliary, as in (59). The order and level restrictions that ac-
count for this are as follows:
(67) M,A: V
The notation V is to be understood as ‘V0 or VP’; that is, no level con-
straint is applied, and so the term covers both VR and VPR.
I now turn to the complexities that arise when a series of VPs is
involved in VPR in Swiss German. I will show that the lack of a level
constraint in (67) accounts precisely for a complex array of possible out-
comes. The possible orders of a series of four verbs in which the lowest
takes a direct object are listed in (68).
(68) a. das er [[[en arie singe] chone] wele] hat
        that he an aria sing can want has
        ‘that he has wanted to be able to sing an aria’
     b. NP < V4 < V3 < V2 < V1
     c. V1 NP V2 V3 V4
     d. V1 V2 NP V3 V4
     e. V1 V2 V3 NP V4
     f. *V1 V2 V3 V4 NP
     (H&VR 1986, 428)
The verbs must all appear in Flipped order; the direct object can appear
anywhere in the series except after the most deeply embedded comple-
ment. This patterning follows immediately from the stipulation in (67),
coupled with the further stipulation that no verb that takes a direct object
can take it on the right.
(69) a. M,A, V
b. V, NP
The absence of a level constraint in (69a) corresponds in RLS+ to op-
tional Reassociate; Flip is obligatory, so the verbs always appear in
exactly reverse order (the reverse of (68b)).
(70) a. Reassociate at will.
b. Flip all V < V nodes. (for complex as well as simple Vs)
c. Flip no NP < V nodes. (for complex as well as simple Vs)
d. V2, NP
e. V1, V
f. V1 þ V2 ! [V1 > V2]V, NPThe stipulation in (70b,c) that Flip is forced (or fails) for both complex
and simple Vs taking direct objects follows from the RC, hence does not
count as a separate stipulation. If a complex verb is formed by combining
a modal or auxiliary with a transitive verb, the subcategorization of the
transitive verb will be inherited, including any order restriction, as the RC
dictates—(70f ) is the result of combining (70d) and (70e) with the RC. So
the extra stipulation in (70c) is not part of the theory; rather, it is added
for clarification.
(70a–c) can generate all of the patterns in (68). (68d), for example, is
derived by applying Reassociate followed by obligatory Flip.
(71)
It is important to remember that Flip and Reassociate are not essential to
the analysis; rather, they are just a way to think about CAT. The entire
analysis is (69) by itself.
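As a check on the stipulations, the procedure in (70) can be simulated directly. In the sketch below (my own encoding, not H&VR's formalism), a node is a (complement, head) pair, Reassociate applies freely, and the word order is read off with obligatory Flip at V < V nodes and no Flip at NP < V nodes:

```python
# A sketch simulating (70): start from the base [[[NP < V4] < V3] < V2] < V1,
# apply Left-Reassociate at will, then read off the frontier with obligatory
# Flip at V < V nodes and no Flip at NP < V nodes. A node is a
# (complement, head) pair; all names are illustrative.

from collections import deque

def reassociate_once(t):
    """Yield every tree reachable by one Reassociate: [[A < B] < C] -> [A < [B < C]]."""
    if not isinstance(t, tuple):
        return
    comp, head = t
    if isinstance(comp, tuple):
        a, b = comp
        yield (a, (b, head))
    for c2 in reassociate_once(comp):
        yield (c2, head)
    for h2 in reassociate_once(head):
        yield (comp, h2)

def frontier(t):
    """Word order under (70b,c): Flip every V < V node, never an NP < V node."""
    if not isinstance(t, tuple):
        return [t]
    comp, head = t
    if comp == "NP":                        # NP < V: no Flip
        return ["NP"] + frontier(head)
    return frontier(head) + frontier(comp)  # V < V: obligatory Flip

def derivable_orders(base):
    seen, queue = {base}, deque([base])
    while queue:
        t = queue.popleft()
        for t2 in reassociate_once(t):
            if t2 not in seen:
                seen.add(t2)
                queue.append(t2)
    return {tuple(frontier(t)) for t in seen}

base = (((("NP", "V4"), "V3"), "V2"), "V1")
orders = derivable_orders(base)
print(("V1", "V2", "V3", "NP", "V4") in orders)   # (68e): True
print(("V1", "V2", "NP", "V3", "V4") in orders)   # (68d): True
print(("V1", "NP", "V2", "V3", "V4") in orders)   # (68c): True
print(("V1", "V2", "V3", "V4", "NP") in orders)   # (68f): False
```

In particular, the ungrammatical (68f) is underivable because NP never flips past its sister, which always contains V4.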
A further consequence is that when the embedded verb has two argu-
ments, they may individually appear anywhere among the set of rean-
alyzed verbs, so long as they do not exchange places; the verbs will be
ordered among themselves exactly as in the one-argument case (68).
(72) das er em Karajan1 en arie2 vorsinge3 chone2 wil1
     that he (to) Karajan an aria sing-for can wants
     ‘that he wants to be able to sing an aria for Karajan’
     (H&VR 1986, 434)
(73) a. NP1 NP2 V1 V2 V3
     b. NP1 V1 NP2 V2 V3
     c. V1 NP1 V2 NP2 V3
     d. V1 NP1 NP2 V2 V3
     e. V1 NP1 V2 NP2 V3
     f. V1 V2 NP1 NP2 V3
     g. *. . . NP2 . . . NP1 . . .
     h. *. . . V3 . . . NP . . .
In order to treat these cases as CAT, we must have some means of
representing verbs that take two arguments. We will adopt the ‘‘small
clause’’ analysis.
(74) [[NP < NP] < V]
Given this, we can derive all of the patterns in (73) from the stipulations
in (70). In terms of Flip and Reassociate, we can derive all of the patterns
in (73) from (73a). For example, we can apply Reassociate to (73a) to
derive (75a), and then apply Flip to derive (73f ); or we can apply Reas-
sociate to (73a) to derive (75b) and then apply Flip to derive (73d); or we
can simply not apply Reassociate, but then apply Flip to derive (73b).
(75) a. → [[[NP1 < [NP2 < V3]] < V2] < V1]   Flip ↓
        [V1 > [V2 > [NP1 < [NP2 < V3]]]]     (73f)
     b. → NP1 < [[NP2 < [V3 < V2]] < V1]     Flip ↓
        [V1 > [NP1 < [NP2 < [V2 > V3]]]]     (73d)
     c. [NP1 < [V1 > [NP2 < [V2 > V3]]]]     (73b)
As in the previous example, Flip and Reassociate play no role in the
analysis, which is completely determined by (69).
CAT’s success in modeling V(P)R is considerable, and the evidence
for the LVH is compelling as well. With very simple lexical stipulations
about subcategorization of individual lexical items or classes of lexical
items—mechanisms that surely no theory could forgo—we have suc-
ceeded in modeling V(P)R as described by H&VR, but without move-
ment and without the novel mechanism of dual analysis that they
believed necessary to describe the phenomena.
If CAT is the appropriate model whenever lexical subcategorizations
are played out in syntax, then it should come as no surprise that V(P)R
shows CAT-like behavior. Other constructions where CAT should be ap-
plicable are noun incorporation, causatives, derivational morphology,
and preposition stranding.
But not wh movement. CAT is not Categorial Grammar as espoused
by (among others) Bach (1976), Moortgat (1988), and Steedman (1996) in
that it lacks type-lifting, the feature that makes it possible to embed
descriptions of the broadest long-distance dependencies.
8.6 The Hungarian Verbal System
Hungarian has a verbal system very much like that of Germanic. It can
be similarly modeled by CAT, but with one striking shortcoming. Tradi-
tionally, the positioning of the Hungarian verbal modifier (VM, to be
explained below) has been modeled along with the rest of the verbal sys-
tem. CAT cannot do this. CAT gives a simple and satisfying model of the
verbal system minus the VM, capturing many of its very particular (but
robust) properties. But when the CAT definitions needed to model the
positioning of the VM are added to it, it overgenerates to the point that
the model is useless, no longer predicting any of the interesting features.
CAT is so restrictive that its failure to model a system is by itself in-
formative, and so no cause for lament. But in this case the message is
sharper: it suggests that, despite tradition, the positioning of the VM is
independent of the verbal system. In the end I will offer reasons to think
this is so.
8.6.1 The Verbal System without VMs
I will quickly sketch the verbal system first without the VM, and then
with the VM, noting the main generalizations. These generalizations
represent a hard-won understanding of the system developed over a de-
cade or so by Kenesei (1994), Szabolcsi (1996), Koopman and Szabolcsi
(2000), and Brody (1997), among many others.
Hungarian has a small series of optional ‘‘modal’’ verbs that occur in a
clause in fixed interpretive order, just the sort of system CAT likes.
(76) Nem fogok kezdeni akarni be menni.
     not will.1sg begin.inf want.inf in go.inf
     ‘I will not begin to want to go in.’
     (Koopman and Szabolcsi 2000, 16)
Ignoring the VM (be), each element in (76) has scope over all elements to
its right. Furthermore, any reordering of adjacent elements results in
ungrammaticality. From this, we can conclude that the following order
holds:
(77) nem > fogok > kezdeni > akarni > main-verb
In its rigidity, and its rightward orientation, this system resembles for ex-
ample the English auxiliary system, and in fact, Koopman and Szabolcsi
(2000) refer to the order in (76) as the English order. I will adopt this term
from them and use it to refer to the ‘‘head-first’’ order. It is of course the
RLS.
In addition to the order displayed in (76), Hungarian has a different—
in fact, opposite—way to deploy the series in (77).
(78) a. Nem [fogok > kezdeni > [[be < menni] < akarni]].
b. Nem [fogok > [[[be < menni] < akarni] < kezdeni]].
(Koopman and Szabolcsi 2000, 210)
Importantly, the interpretive order of the elements in (78) is the same as
in (76); that is, akarni always has scope over menni, for example, despite
their being in opposite orders in (76) and (78). In other words, (78) rep-
resents di¤erent ways to realize the abstract structure in (77). The carets
in (78) indicate the understood orders. The order of elements in (78b) I
will call the compound order, as the head-complement order is that found
in compound terms. Brody calls it the roll-up order, for good reason, as
we shall see. The tensed verb and its complement are always in the
English order.
As the forms in (78) show, any given sentence with multiple auxiliaries
will show a mixture of the English and compound orders. But there are
strong constraints on the mixture.
1. The tensed verb cannot occur in a compound order.
(79) a. fogok > be < menni < akarni < kezdeni
b. *be < menni < akarni < kezdeni < fogok
2. Any compound structure must be at the bottom of the string of
auxiliaries.
(80) a. nem > fogok > kezdeni > akarni > be < menni
b. nem > fogok > akarni > be < menni < kezdeni
c. *nem > fogok > [akarni < kezdeni] > be < menni
3. The English order cannot occur inside a compound order.
(81) a. fogok > be < menni < akarni < kezdeni
b. *fogok > be < [akarni > menni] < kezdeni
These three findings can be summed up in the following recipe for creat-
ing alternative orders for a given string of auxiliary verbs completely in
the English order: beginning at the bottom, the bottom two terms can be
compounded, or ‘‘rolled up’’; and this rule can be applied repeatedly, but
not at the very top, where the tensed verb must be in the English order.
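This recipe is mechanical enough to state as a short program. The following sketch is my own encoding, not the book's formalism: the function name and the list representation of a cluster are assumptions, and the cluster is given in the English (head-first) order with the tensed verb first.

```python
def licit_orders(english_order):
    """Generate the licit surface orders of a Hungarian verb cluster.

    english_order: the cluster in the head-first (English) order,
    tensed verb first, main verb last. Starting at the bottom, the
    lowest two terms may be rolled up (compounded, inverting their
    order); the rule may apply repeatedly, but never to the tensed verb.
    """
    tensed, rest = english_order[0], english_order[1:]
    orders = []
    for k in range(len(rest)):              # k = number of roll-up steps
        head_first = rest[:len(rest) - 1 - k]
        rolled_up = list(reversed(rest[len(rest) - 1 - k:]))
        orders.append([tensed] + head_first + rolled_up)
    return orders

# (77) without nem: fogok > kezdeni > akarni > main verb (menni)
for order in licit_orders(["fogok", "kezdeni", "akarni", "menni"]):
    print(" ".join(order))
```

The three lines printed correspond, ignoring the VM, to the pure English order and to the partial and full roll-ups of (78).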
This system is easily modeled in CAT. Since each auxiliary, apart from
the tensed auxiliary, can appear on either side of its complement, each is
ambiguous with respect to order; that is, each has both of the following
subcategorizations:
(82) F, F
This by itself is not enough, because, with the RC, it will generate all of
the ungrammatical orders in (79)–(81). (80c), for example, would count
as grammatical, with exactly the parse indicated. To prevent this, we
must also impose level constraints. There is some question what the rele-
vant levels are; I will assume they are word and phrase (as the term com-
pound in compound order suggests). Assuming further that the compound
order is essentially lexical, and the English order is essentially phrasal, we
have the following subcategorization:
(83) Aux: Fⁿ, F⁰
That is, each auxiliary takes a phrase of type F to the right, or a word of
type F to the left.
Furthermore, because the tensed auxiliary does not participate in the
compound structures, it has only the first of the two subcategorizations
in (83), the phrasal one.
Inflectional Morphology 231
(84) AuxT: Fⁿ
I assume this is a further stipulation, as there is in general no ban on
tensed verbs entering compound structures (e.g., English baby-sat).
Then, given the RC, along with the assumption that words can head
words, and words can head phrases, but phrases cannot occur in words,
we predict some of the contours of the Hungarian system. The fact that
the English order cannot occur in the middle of a compound follows from
the fact that a phrase (the bracketed FP in (85)) cannot occur in a com-
pound (marked here with { }).
(85) *fogok > {[akarni > [be < menni]]FP < kezdeni}
The fact that a compound cannot occur in the middle of a sequence of
auxiliaries does not follow from the specifications in (83). (86) is a parse
of such a case consistent with (83).
(86) *nem > fogok > [akarni < kezdeni]Aux; VP > [be < menni]VP
In (86) akarni and kezdeni form a compound verb, where akarni has its
VP-taking, rather than V-taking, subcategorization; that subcategoriza-
tion is inherited by the compound, according to the RC. Although some
speakers accept forms very much like this, I will assume that they are
ungrammatical, and I will introduce the further specifications necessary
to rule them out.
The problem would be solved if akarni were prevented from using its
VP-taking subcategorization when it was in a compound. This can be
achieved by reconstruing the ambiguity of the auxiliary verbs in a slightly
different way. Specifically, the principal ambiguity will be between root-
and word-level forms for each of the auxiliaries, as in (87).
(87) akarni: root, Froot; word, Fⁿ
That is, akarni is still ambiguous, but between the two levels root and
word; roots enter into the compounding system, and words into phrasal
syntax. Now (86) cannot be produced; only the root akarni can appear on
the left of a compound, and only a further root subcategorization can be
inherited by the compound.
To allow compound structures to appear in syntax, we must allow
roots to be reconstrued as words; once this is done, they can be used in
syntax, but they cannot enter the compounding system again. But this is
the classical relation between words and phrases.
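The root/word regime can be made concrete. The sketch below is my own encoding of (87) and (84), not the book's formalism: entries are (form, level, need) triples, compounding is restricted to roots, the nonhead's unsatisfied subcategorization percolates (standing in for the RC), and promotion to word level is one-way. As a simplification, the main verb menni enters as a bare root with no remaining need (its particle is ignored here).

```python
ROOT, WORD, PHRASE = "root", "word", "phrase"

# Hypothetical entry builders: a root wants a root on its left (the
# compound order); a word wants a phrase on its right (the English order).
def root_entry(form):
    return (form, ROOT, ("left", ROOT))

def word_entry(form):
    return (form, WORD, ("right", PHRASE))

def compound(nonhead, head):
    """[nonhead < head]: only roots compound; the head's leftward-root
    slot is filled, and the NONHEAD's own unsatisfied need percolates."""
    assert nonhead[1] == ROOT and head[1] == ROOT
    assert head[2] == ("left", ROOT)
    return ("[%s < %s]" % (nonhead[0], head[0]), ROOT, nonhead[2])

def promote(node):
    """Reconstrue a root as a word; it can no longer compound."""
    assert node[1] == ROOT
    return (node[0], WORD, node[2])

def combine(head, comp):
    """English order head > comp: a word-level head takes its complement
    on the right."""
    assert head[1] == WORD and head[2] == ("right", PHRASE)
    return ("[%s > %s]" % (head[0], comp[0]), PHRASE, None)

# (78b)-style roll-up, then promotion into phrasal syntax:
menni = ("menni", ROOT, None)           # simplification: no particle
c = compound(compound(menni, root_entry("akarni")), root_entry("kezdeni"))
top = combine(word_entry("fogok"), promote(c))

# The starred (86): [akarni < kezdeni] inherits akarni's root-level need
# for a root on its LEFT, so it can never take a VP on its right.
bad = compound(root_entry("akarni"), root_entry("kezdeni"))
```

Here `bad` comes out still needing a root to its left, something phrasal syntax cannot supply, so the structure in (86) is underivable.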
While the ‘‘coding’’ in (87) may appear suspicious, it is really harmless,
when one considers that if CAT is the model, the only way languages can
differ is with respect to level, order, and type restrictions, and these
restrictions are enforced in a rigid local fashion by X-bar inheritance and
the RC. I suspect that the ambiguity in (87) occurs in English as well,
with particle-verb constructions; that is, the relation between (88a) and
(88b) is really a level-order ambiguity between root- and word-level iden-
tification of the particle itself.
(88) a. John [looked up]V the answer.
b. John looked the answer up.
c. *John looked right up the answer.
d. John looked the answer right up.
e. the looking up of the answer
f. *the looking of the answer up
The lexical version of the particle excludes modification (88c), whereas
the syntactic version allows it (88d). The lexical version nominalizes (88e);
the lexical particle is ‘‘inside’’ the nominalization and therefore immune
to the laws governing the form of NPs. The syntactic version does not
nominalize (88f ); the syntactic particle is ‘‘outside’’ the nominalization,
where it is excluded from NP on general grounds. I imagine this line of
analysis could be applied to German separable prefixes as well.
Finally, to account for the absence of tensed verbs inside compound
structures, we require that T be represented only by a word-level element.
In the reformulation this remains a separate stipulation.
These stipulations exactly account for the Hungarian compounding
paradigm, if the VM is excluded.
Koopman and Szabolcsi (2000) seek a theory of clusters that involves
only phrasal syntax and XP movement. They thus seek to avoid any
reference to the lexical/phrasal distinction on which the analysis just
given rests. Their theory thereby also distinguishes itself from any of the
theories in which the roll-up structure results from X0 movement, and
VM fronting from XP movement.
But on close inspection the relevant distinction can be found in Koop-
man and Szabolcsi’s account, just relabeled as ‘‘smallness’’ instead of
‘‘lexicality.’’ Smallness, never defined, has less intuitive content than lex-
icality, though it would seem to be extensionally equivalent to it, judging
from the examples that Koopman and Szabolcsi give. But ‘‘smallness’’
leads to grave problems that ‘‘lexicality’’ does not have.
What allows Koopman and Szabolcsi to contemplate the elimination
of X0 movement is that massive remnant movement makes it possible
to simulate lexical movement by phrasal movement, as in the following
derivations:
(89) a. [XP YP ZP H]HP → [XP [YP [ZP [tXP tYP tZP H]HP]]] →
        [[tXP tYP tZP H]HP [XP [YP [ZP tHP]]]]
     b. [XP YP ZP H]HP → [H [XP YP ZP tH]HP]
The pair of movements in (89a) result in the same surface configuration
as the movement in (89b). The movements in (89a) are first evacuation of
everything in HP except its head, followed by movement of the remnant
HP. The movement in (89b) is head movement.
Koopman and Szabolcsi simulate the head clustering for verbs in the
compound structure with the following condition:
(90) When the specifier of VP+ is a small VM or an inverted sequence,
     VP+ optionally extracts from CP. Otherwise, VP+ cannot extract
     from CP.
For reasons of space, I will not explain here how this principle interacts
with the theoretical environment that Koopman and Szabolcsi provide to
yield the constructions I have identified as lexical, or at least as involving
nonphrasal heads; but see Williams, in preparation, for a full discussion.
It is enough to see that lexicality is entering the system under the guise of
smallness. I think this is a step backward from the general understanding
of these constructions, in that it replaces a word with a relatively concrete
meaning (lexical) with one distinctly less concrete (small).
8.6.2 The Verbal System with VMs
I think that the fact that the RC with X-bar inheritance allows the be-
havior of the Hungarian verbal system, so complex at first glance, to be
boiled down to (87) (with help from (84)) is an impressive result. The
analysis is challenged, however, by the behavior of the VMs, which can-
not be fit into the system without losing all predictions.
The VM is a particle, or sometimes a short phrase, that is closely asso-
ciated with the main verb, sometimes forming an idiomatic expression
with it. The VM occurs either before or after the tensed verb, depending
on features of the sentence in which it occurs. If there is a preverbal neg-
ative or Focus phrase, the VM occurs after the verb; if not, and if some
other conditions are met, it occurs before the verb.
(91) a. Nem fogok be menni.
        not will.1sg in go.inf
        ‘I will not go in.’
     b. Be fogok menni.
     c. *Nem be fogok menni.
     d. *Be nem fogok menni.
Be is a complement of menni; but in (91b) it occurs to the left of the
tensed auxiliary verb. And in fact, an unbounded number of auxiliary
verbs can appear between the particle to the left of the tensed verb and
the verb of which it is a complement.
(92) Be fogok kezdeni akarni menni.
     in will.1sg begin.inf want.inf go.inf
The question is, what regulates the relation between these two positions?
The ‘‘trigger’’ for the appearance of be in initial position has been
argued to be phonological (e.g., Szendroi 2001): the auxiliary verb needs
‘‘support,’’ if not from a negative or a Focus, then from a particle. I will
assume that the trigger is an extrinsic constraint that CAT is not obliged
to model. Even so, CAT fails.
So far I have posited leftward root subcategorization for the compound
order and rightward phrasal subcategorization for the English order. To
generate (92), the CAT specifications must admit a third possibility—
namely, that a sequence of words can realize the English order, as only
words can transmit, via the RC, the lower verb’s need for the particle to
the top of the verb chain.
(93) a. Aux: Fword
b. menni: be
c. be < [fogok > kezdeni > akarni > menni]
If each auxiliary has a specification like the one in (93a), and the verbs
taking VMs have specifications like the one for menni in (93b), then (92)
will have a parse like (93c).
There is in fact some circumstantial evidence in favor of treating VMs
in this way. The verbs that enter into compounding relations with one
another are approximately the same verbs that permit VM raising: utalni
‘hate’, for example, does neither. But the lists are not identical (K. Szen-
droi, personal communication), so this consideration is hard to evaluate.
But there are two problems with analyzing VMs in this way.
First, (93) predicts that particle movement should be compatible with
compounding, but it is not.
(94) *Be < [fogok > kezdeni > [menni < akarni]].
Particle raising is compatible only with the pure English order, so any
compounding interferes. From the point of view of CAT this is very odd,
as other phrasal complements are compatible with compounding, which
shows that compounding is transparent to a main verb’s subcategoriza-
tion. For example:
(95) Nem > fogom > akarni > [szet szedni < kezdeni] a radiot.
     not will.1sg want.inf apart take.inf begin.inf the radio
This example shows that compounding of the main verb (represented by
the bracketed sequence) does not prevent the main verb’s direct object
subcategorization (szetszedni: NP) from becoming the subcategorization
of higher constituents. If for direct objects, then why not for particles?
Second, particles seem to be able to raise out of embedded CP com-
plements under certain circumstances. For example:
(96) Szet kell, hogy szedjem a radiot.
     apart must that take.subjunctive.1sg the radio
     ‘I must take apart the radio.’
     (Koopman and Szabolcsi 2000, 211)
Although such cases are quite restricted, the fact that they exist at all
suggests that CAT is not the right mechanism to account for them.
These two properties of VM positioning—opacity of the compound
structures and nonlocality—both point to movement in the classical
sense, rather than CAT inheritance. Compounds are always opaque to
syntactic movement, but CPs are not.
If indeed the VM is positioned by movement and not by the same sort
of system that creates the verbal clusters, a sharp theory is needed to ex-
plain how a child would not be led astray by all the evidence that has
misled linguists into analyzing the two phenomena as one system. CAT is
just such a theory, because simple considerations unequivocally rule it
out as a model of the VM, even though it is an obvious model of the
verbal clusters.
Another reason to implicate movement in the positioning of the VM is
noted repeatedly by Koopman and Szabolcsi (2000): the VM can often be
a full phrase. This again is characteristic of movement, especially move-
ment that bridges CPs.
(97) [a szoba-ba]PP menni
     the room-into go.inf
     ‘go into the room’
And, importantly, the VM cannot be phrasal when incorporated into a
compound.
(98) *[[a szoba-ban]PP maradni] akarni
      the room-in stay.inf want.inf
      ‘want to stay in the room’
This example falls within the scope of the theory outlined in section 8.6.1:
compounding involves X0s exclusively. (96) and (97) fall outside that
theory.
I think that CAT’s initial di‰culty in modeling the Hungarian verbal
complex turns out to be its virtue: CAT has the grace to fail obviously
and thereby to show where nature is jointed. Perhaps, as the last few
points independently suggest, the Hungarian VM does not compose a
homogeneous class of elements with the verbal particles after all.
In light of our conclusions about Hungarian, we can return to the
problem raised in chapter 7 about verb clusters in Czech and related lan-
guages; (99) repeats the facts from that discussion.
(99) a. Dal jsem mu peníze.
        give.prt aux.1sg him.dat money.acc
        ‘I gave him money.’
     b. Tehdy bych byl koupil knihy.
        then aux.1sg was.prt bought.prt books.acc
        ‘Then I would have bought books.’
     c. Byl bych tbyl koupil knihy.
     d. *Koupil bych byl tkoupil knihy.
        (Konapasky 2002, 246)
When there is a single participle, it can move to the left of the auxiliary.
When there are two participles, the first can move to the left of the auxil-
iary, but the second cannot. With the Hungarian system as a model, we
formulate the following restrictions:
(100) a. aux: PartP
b. Part0
c. part: XP
That is, auxiliary verbs can take a following participial phrase or a pre-
ceding participial stem; participles, on the other hand, always take an XP
complement. When both an auxiliary and a participle are present, two
structures are possible.
(101) a. aux > [part > X]PartP
      b. [[part < aux] > X]
(101b) corresponds to the possibility of (99a). When there are two par-
ticiples, the following structures are possible:
(102) a. aux > [part1 > [part2 . . . ]Part2P]Part1P
      b. [part1 < aux] > [part2 . . . ]Part2P
      c. *[part2 < [aux > part1]PartP]
      d. *[part2 < [part1 < aux]PartP]
The first participle can form a complex word with the auxiliary, and the
result will have the subcategorization of the nonhead part1 and so takes
Part2P on the right. But there is no way for the second participle to ap-
pear on the left as in (102c), because the unit [aux > part1] will itself be
phrasal and therefore cannot take a stem complement to the left. Simi-
larly, (102d) cannot be formed because [part1 < aux], while a stem-level
object, inherits its subcategorization from its nonhead (part1) and so can
only take an XP to the right, not a participial stem to the left. In Czech,
then, auxiliary verbs are just like the Hungarian cluster-forming auxiliary
verbs, and participles are like Hungarian nonauxiliary verbs.
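On this analysis the Czech pattern can be enumerated directly. A minimal sketch (the function and list encoding are my own, assuming the subcategorizations in (100)): only the highest participle may invert with the auxiliary, so a cluster has exactly two surface variants.

```python
def czech_variants(aux, participles):
    """Surface orders licensed by (100)-(102): the full head-first
    order, or inversion of the HIGHEST participle with the auxiliary
    ([part1 < aux]). Lower participles can never invert, since
    [part1 < aux] inherits part1's rightward XP subcategorization."""
    head_first = [aux] + participles
    inverted = [participles[0], aux] + participles[1:]
    return [head_first, inverted]

# (99b-d): bych > byl > koupil
print(czech_variants("bych", ["byl", "koupil"]))
```

The two variants returned are (99b) and (99c); the excluded (99d), with the lower participle fronted, is simply not generated.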
These languages even have an analogue of Hungarian VM positioning.
There is a general rule of XP topicalization (Rivero 1991, Konapasky
2002) that can fill the initial position, illustrated here in Serbo-Croatian.
(103) [Citao knjigu]VP je Ivan tVP.
      read.prt book aux Ivan
      ‘Ivan had read the book.’
      (Konapasky 2002, 244)
If such an example were taken to show that aux had, in addition to (100a)
and (100b), a subcategorization like the following, then, as in Hungarian,
all sorts of unrealized possibilities would arise:
(104) aux: XP
Rather, as Konapasky (2002) shows, such phrases occupy the initial po-
sition by virtue of an entirely different process of XP topicalization.
Chapter 9
Semantics in Representation Theory
Two features of RT lead to revisions in the standard assumptions about
how semantics is determined by syntactic form. One stems from the no-
tion of derivation in RT. In the syntactic analysis of a sentence, there is
no single structure that represents all of the information relevant to se-
mantics; semantics then must be done over the whole set of forms that
constitute the derivation and the matching relations that hold among
them. The other stems from the fact that the shape-conserving matching
that holds between levels does not always correspond to isomorphism, as
we have seen in several cases, beginning with the bracketing paradoxes of
chapter 1. To the extent that one end of such matches is semantic (or,
more semantic than the other), they give rise to instances in which the
system deviates from a strictly compositional system, the sort of system
that is standardly assumed. In sections 9.1 and 9.2 I will briefly outline
the issues involved in these two deviations, but without arriving at any
firm conclusions, apart from what I have just mentioned; the discussion
is provisional and speculative throughout, even by the standards of the
previous chapters. The role of blocking in determining meaning will re-
ceive special attention, since, as pointed out frequently in this book,
blocking is part and parcel of Shape Conservation: the most similar
blocks all the less similar, all else being equal.
In sections 9.2–9.5 I will explore, in the most preliminary possible way,
how RT fares in analyzing certain problems connected with the form-
meaning relation. In some cases I think an obvious advantage can be
demonstrated; in other cases I can show no more than that a coherent
account is possible. In section 9.2 I will illustrate the role of the blocking
aspect of Shape Conservation in understanding the contribution of Case
marking and the like. In section 9.3 I use the RT levels to index different
sorts of focus. In section 9.4 I address some problems in ellipsis, and in
section 9.5 I sketch how RT levels can be understood to index different
kinds of NP interpretations.
9.1 Compositionality
9.1.1 Matching and Compositionality
I take the interesting hypothesis about compositionality to be that it is
strict—every phrase’s meaning is some strict function of the meaning of
its parts; otherwise, the hypothesis does not say much. In what follows I
will be talking about representations of meaning that have structure: rep-
resentations that indicate the scopes of quantifiers, or that identify the
thematic roles of NPs, or whatever else there is—some of the levels of
RT. So I will be discussing translation of syntactic structures into some
other kind of language, not real semantics, which relates sentences to the
world. The question about compositionality then is one of compositional
translation: is the translation of every phrase X strictly a function of the
translation of its parts?
In the compositional scheme we start with a syntactic tree in language
A, and step by step, from the bottom up, we build some translation of
that tree. We do this for every sentence in language A, thus deriving a
second language B, consisting of all those translations. So B is whatever
A translates to.
But there is another way to think of translation. We can think of the
languages A and B as both antecedently defined, and of the translation as
a ‘‘matching’’ relation between them, one that matches to every sentence
in the first language a corresponding item in the second language.
Of course, compositional translation can be viewed as one particular
kind of matching relation. In fact, if we require the matching relation to
be absolutely ‘‘shape conserving’’—that is, if it matches up structures in
language A with structures in language B, observing conditions on the
identification of terminal elements across the two languages and respect-
ing isomorphism of structure—then the matching kind of translation
might be indistinguishable from compositional translation. In composi-
tional translation, the bottom-up piece-by-piece building of the second
tree based on what is found in the first tree will result in a tree that is
isomorphic to the first tree, and so matching translation and composi-
tional translation will always come out the same.
But there is a circumstance in which these two notions of translation
could diverge. For matching translation, we can think of the two lan-
guages that are being matched up as definable independent of one an-
other, according to laws of form that might differ. In that case there
might not be an isomorphic match in the second tree for every phrase in
the first. This need not necessarily prevent the matching translation from
being a complete translation. For example, if there is no isomorphic
structure, the matching relation might pick the ‘‘nearest’’ structure as the
translation—still shape conserving, but not strict. The translation will be
fully defined, but it will diverge from a compositional translation for such
cases.
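The contrast between the two notions of translation can be put in toy computational terms. The sketch below is purely illustrative and entirely my own: `compositional` builds the target bottom-up from the parts, while `matching` assumes the target language is antecedently defined and picks the nearest structure under some similarity measure (here, a crude leaf-overlap count, itself an assumption).

```python
def compositional(tree, lex):
    """Strictly compositional translation: leaves via the lexicon,
    phrases as the tuple of their parts' translations."""
    if isinstance(tree, str):
        return lex[tree]
    return tuple(compositional(part, lex) for part in tree)

def matching(source, target_language, similarity):
    """Matching (holistic) translation: both languages are antecedently
    defined; the translation is the best-matching target structure."""
    return max(target_language, key=lambda t: similarity(source, t))

def overlap(a, b):
    """A crude similarity measure: number of shared leaves."""
    def leaves(t):
        return {t} if isinstance(t, str) else set().union(*map(leaves, t))
    return len(leaves(a) & leaves(b))

lex = {"boy": "BOY", "sees": "SEE"}
compositional(("boy", "sees"), lex)          # → ('BOY', 'SEE')
matching(("BOY", "SEE"),
         [("GIRL", "SEE"), ("BOY", "SEE")],
         overlap)                            # → ('BOY', 'SEE')
```

When the target language happens to contain the isomorphic structure, the two coincide; when it does not, `matching` still returns a nearest neighbor, which is exactly the divergent case described above.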
In the course of exposing RT in this book I have already presented
cases like this, which support the idea that the translation is matching,
not compositional. The cases of mismatch discussed in chapters 1 and 2
all have this character: in-situ quantifiers get wide scope by ‘‘mismatch-
ing’’ the structures at a later level, for example. But if this is possible
in general, then the question becomes, what makes the translation look
largely compositional? The answer has to be a combination of the fact
that the matching relation that happens to be in use in language is shape
conserving in the sense just mentioned, and the fact that the structures
defined in the two sublanguages are largely similar. It is straightforward
that if the two languages are fully isomorphic, then the result is indistin-
guishable from compositional translation. But if the structures defined in
the two languages are only slightly divergent, then the discrepancies be-
tween the two results might be infrequent and localized.
However, there is an interesting difference between compositional
translation and matching translation that goes beyond these discrep-
ancies. In compositional translation, the translation of any given sentence
proceeds on the basis of that sentence by itself. But in the matching
theory, at least for the discrepant cases, the conclusion that b in language
B is the ‘‘best match’’ for a in language A cannot be determined by
looking just at a and b; instead, it must involve seeing what other struc-
tures are defined in languages A and B, insofar as there cannot be any-
thing that is a better match to a than b is. In this sense the matching
translation is ‘‘holistic’’: it matches the whole of language A to the whole
of language B in a way that cannot be broken down into the matching of
individual elements in A to individual elements in B.
In linguistics, syntactic transformation has traditionally been the means
of accounting for divergences of this kind, preserving compositionality.
For example: how can we compositionally determine the thematic struc-
ture of the verb see when its direct object is moved many clauses away?
Undo the transformation first; the transformation is responsible for the
distorted picture of thematic structure in surface structure. But I have
argued in specific cases (quantifier scope, heavy NP shift, scrambling,
etc.) that movement is not the correct account; rather, it is interlevel
holistic matching of structures.
9.1.2 Compositionality and the Place of Semantics in RT
We can think of RT as involving two different representational situations.
In one, a structure represents another piece of syntax, and in the other, it
represents a piece of semantics. To take one example, SS representing CS
is a case of syntax representing syntax, and SS representing QS (= TopS)
is a case of syntax representing a semantic structure. To take another ex-
ample, in chapter 2 I analyzed a particular kind of linguistic variation as
arising from the way different languages resolve a conflict between a case
of ‘‘structural’’ representation and a case of ‘‘semantic’’ representation.
The formulas for English and German scrambling are these:
(1) English favors SS→CS over SS→QS (= TopS).
    German favors SS→QS (= TopS) over SS→CS.
And in both languages the possibility of SS→FS can neutralize the dif-
ference. From this mechanism I derived the fact that English requires
elements following the verb to maintain a strict order that only focusing
effects can disrupt, whereas in German scrambling is obligatory, except in
the face of some focusing effects.
The model implicit in the above discussion is not the linear representa-
tion model, but a model in which there are three levels that SS must rep-
resent, namely, CS, QS (= TopS), and FS.

(2) CS ← SS → FS
         ↓
    QS (= TopS)

In such a model we can talk about the competing representational
requirements that these three peripheral structures place on SS.
In what follows I want to bring the model back in line with the linear
representation model, yet allow for the representational competition that
the results in chapter 2 depend on. But at the same time I want to model
certain other phenomena involving focus, which will make the model
in (2) unworkable. Furthermore, I want to develop a sense of the gross
architecture of the entire model, instead of simply adding a new level
represented by SS every time a new descriptive problem presents itself.
The project begins with the previously mentioned, possibly indefensi-
ble, categorization of the levels into ‘‘semantic’’ and ‘‘syntactic.’’ In some
linguists’ view, all representations (i.e., ‘‘structures’’) are syntactic. But
some are intuitively more semantic than others: the representation that
unambiguously displays the scope of quantifiers is more semantic than
the representation that displays structural Case relations. But there is
another way to describe the difference between two kinds of representa-
tions: one kind lies directly on the path to spell-out, and the other kind
does not. So, in the model that was the basis for the early part of the
book, CS and SS were indubitably on the way to spell-out, and QS cer-
tainly was not, given the existence of in-situ ambiguous quantifiers in the
output of English pronounced sentences.
In this light, FS itself is a fudge. FS consists of at least the two dif-
ferent elements, ‘‘display of primary sentential accent’’ and ‘‘display of
most salient new information.’’ These two different notions are clearly
related—but how?
We can begin to form a new model by identifying certain representa-
tions as ‘‘semantic’’: TS, QS, FS. These will not be on the main line to
spell-out. The other representations will be: CS, PS, SS, and AS (Accent
Structure, which displays the accent structure of the utterance). The main
line from CS to AS will be a linear series of representation relations, as
follows:
(3) CS ← PS ← SS ← AS
We must also add the interpretive representations. Clearly, different syn-
tactic levels are relevant for different aspects of interpretation; for exam-
ple, AS is relevant for focus, but CS may not be. A simple scheme would
be to associate each of the interpretive levels with one of the syntactic
levels.
(4) TS   ?S   QS (= TopS)   FS
    ↑    ↑    ↑             ↑
    CS ← PS ← SS ← AS
In general, representational conflicts at a given level will arise between
the interpretive level and the structural representational demands on that
level. Whether there are further conflicts will be taken up in the next
section.
This model permits the chapter 2 analysis of English and German,
though now the analysis is cast in slightly different terms. English favors
CS ← SS over QS (= TopS) ← SS, and German favors the reverse.
Moreover, the effects of focus can be factored in by taking AS→FS
fidelity into account, in that it can tip the balance back to parity in the
otherwise lopsided representational conflict. (Review chapter 2 for the
empirical basis of the English/German difference, and see section 9.3 for
further analysis of the AS, SS, FS system.)
The model in (4) suggests that in general, since each syntactic level
represents both another syntactic level and an interpretive level, repre-
sentational conflicts will arise between these two. Whether there are fur-
ther sources of conflicts will be taken up in the next section.
Although it is compatible with the findings of this book, and in fact
makes them natural, the model in (4) raises questions about the linguistic
representation of meaning. Each interpretive level is separate from the
others, and there is no connection, no representation relation, between
them. Each of them is an aspect of LF, in the usual sense, exactly in the
sense that in RT each of the syntactic levels is an aspect of the syntax of
a clause. But how are these different aspects related to one another? The
theta structure of a clause will display the theta relations of the verb in
relation to the verb, and the quantification structure will display the
quantificational structure, but what is the relation between the two? One
wants to know which argument of the verb is quantified in which way.
To take a concrete example, consider a focused definite NP agent of a
verb. Its agentivity is represented in TS, its definiteness in CS, and its
focused property in FS, but how are all these facts related to each other?
The obvious answer is representation. Although I have spoken of repre-
sentation as relating whole structures to whole structures, in doing so it
relates parts of structures to parts of structures. For example, an internal
argument of V in TS will be mapped to a Case-marked accusative in CS,
and so forth, all the way to a focused constituent in AS/FS. We can thus
speak of an NP that is Case-marked, theta-marked, and focused only by
taking into account all of these levels and how they are related to one
another by shape-conserving representation.
We can even define a relation between the theta role an object receives
and its scope, even though these will not be in any direct chain of repre-
sentation, because there will be an induced representation relation that
holds between them, by virtue of the representations that the model does
express directly.
(5) TS ⇠ ?S ⇠ QS (= TopS) ⇠ FS
    ↑    ↑    ↑             ↑
    CS ← PS ← SS ← AS
The representations symbolized by the long arrows are induced by the
representations expressed by the short arrows, in the fashion described be-
fore. Some NP in AS represents a focused NP in FS, and that NP repre-
sents an NP in SS, which represents . . . some NP in TS, and so there is an
indirect relation between FS and TS, and also between particular NPs in
FS and NPs (or whatever arguments are) in TS.
For example, consider the following TS/QS pair, with the obvious
head-to-head matches:
(6) TS: [boy]agent [V [girl]patient]
QS: [some boy]QP [V [every girl]QP]
The natural isomorphism will match the agent in TS to the preverbal QP
in QS; this will result in the further match between boy and boy. Some
will not be matched, as it makes its ‘‘first’’ appearance in SS and QS. Boy
occurs in both TS and QS; in TS it is agent, and in QS it is (head of ) a
quantified NP. The full interpretation of [some boy] in QS and later levels
will be some function of the interpretation of [boy] as agent of V. And
so on.
If the matching between levels were always isomorphic, then the
induced isomorphism could be established directly, abridging the repre-
sentation circuit. But owing to the existence of misrepresentation, the
induced representation must make essential use of the chain of represen-
tation relations to establish the relation between TS and QS.
But nothing is changed when mismatching occurs. Recall that English
favors SS as a representation of CS over QS, and so surface structures
with two quantification structures are ambiguous, in one instance mis-
mapping the two Case structures by crossing.
(7)
Here, as before, representation provides the relation between the quanti-
fied NPs and their images in TS.
This aspect of semantics in RT is nothing other than the ‘‘higher equals
later’’ correspondent of compositionality in standard Checking Theory
practice. That is, representation replaces domination for functional
Semantics in RT 245
embedding. For example, in RT an accusative represents a patient; in
standard practice it functionally dominates it.
9.2 Blocking in Semantics
The blocking principle in general prevents multiple representation; that is,
the following situation is not allowed:
(8) *x → ys1; x → ys2
(8) corresponds to the notion that ‘‘nature hates a synonymy’’—there
cannot be a difference in form without some difference in meaning. If x is
a concept and ys1 and ys2 are words, then (8) is the notion of synonymy
that holds in the lexicon, and especially in inflectional morphology, where
it is understood that variant forms like sneaked/snuck cannot coexist in
the same grammar. The blocking principle, thus construed, has been
shown to be an operative constraint on language acquisition (Pinker
1984); it is what drives out *goed. It can also, and perhaps thereby, be
construed as a constraint on the form of a grammar.
As ordinarily understood, the blocking principle does more than forbid
synonymy; it says which of two forms is chosen to represent the given
meaning—namely, the one more specifically tailored for that meaning.
For example, were is the general form of the past tense of be, and was is
the form specific to the 1st singular past; although both was and were are
compatible with ‘‘1st singular past,’’ blocking dictates that only was can
express that notion, being most specifically fitted to it.
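The was/were pattern can be rendered as specificity-based selection. The following is a minimal sketch, assuming feature sets for forms; the encoding and names are mine, not a formalization from the text.

```python
def best_form(target_features, forms):
    """Pick the compatible form whose feature set fits most specifically.

    A form is compatible if all of its features hold of the target;
    among compatible forms, the one with the most features wins,
    blocking the more general form.
    """
    compatible = [(name, feats) for name, feats in forms.items()
                  if feats <= target_features]
    return max(compatible, key=lambda nf: len(nf[1]))[0]

past_be = {
    "were": {"past"},                     # general past of "be"
    "was":  {"past", "1st", "singular"},  # specific to 1st singular past
}
print(best_form({"past", "1st", "singular"}, past_be))  # -> was
print(best_form({"past", "2nd", "plural"}, past_be))    # -> were
```

The same selection schema carries over to Shape Conservation if "number of matching features" is replaced by a measure of congruence between candidate and target.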
Although I do not think that the blocking principle is well understood,
I nevertheless regard the principle of Shape Conservation to be a case of
blocking in the sense just described. If ys1 and ys2 are both candidates to
represent x in (8), and if ys1 is more congruent to x than ys2 is, then ys1 is
‘‘more specific to’’ x than ys2 is, and must be chosen to resolve the syn-
onymy. In the simplest case (‘‘all else being equal’’) that should settle the
matter. But in fact, since different representational levels are connected to
different aspects of meaning, it is inevitable that blocking will not give a
determinate answer to the question of which of two forms is to be used to
represent, for example, a given theta structure.
For this reason the role of the blocking principle in the present con-
text is not straightforward. Such a principle is clearly required, but it is
not clear what phenomena fall under it. For example, we have analyzed
HNPS as a case of ‘‘misrepresentation’’ between CS and SS, which exists
alongside the ‘‘true’’ representation; so, restricting ourselves to CS and
SS, (8) seems to be instantiated. But, as we saw in the discussion of
HNPS, this ‘‘misrepresentation’’ is accompanied by differences in
interpretation at FS. So the blocking principle is upheld in the end, but in a
wider context, one that includes FS. Constructions that look synonymous
(the shifted and unshifted variants of HNPS cases) turn out to have
different meanings at FS.
But this raises the question, what differences can count as differences
that license a representational synonymy? For, in the case of HNPS, the
TS→CS representation does display representational synonymy; it is
only in a later representation that the focus-related difference in meaning
arises. So it is natural to ask, is there any limit on the ‘‘delay’’ that can
occur between a representational synonymy and the difference in meaning
that rescues it?
To put the question in concrete terms: Scrambling interacts with defi-
niteness in German, and other semantic classifications, in ways analyzed
in chapter 2. There, the Synonymy Principle was seen to be satisfied in a
direct way, in that the structure of the example discussed always looked
like (9).
(9)            Case structure
              ↙              ↘
     surface structure1    surface structure2
              ↓              ↓
     interpretation1       interpretation2   (in QS)
Suppose surface structure1 and surface structure2 are the scrambled and
unscrambled representations of one and the same Case structure, as the
diagram illustrates. We know that in German this situation is correlated
with differences in scope/topicalization aspects of interpretation
represented in QS. So the CS→SS representation involves ‘‘synonymy,’’ but
the surface structures do not, as each surface structure in SS receives a
different interpretation in QS.
Now consider a different kind of case, one that in fact appears to model
known phenomena. Suppose that two different Case-marking systems
could represent one and the same theta structure, but with the same or a
related difference in meaning as in the case of German scrambling; in
other words, scope, or specificity, or something else, turns on the
difference. In such a case the sign of the difference in meaning would be
‘‘remote’’ from the representation of the difference in meaning itself: a
Case distinction would control a difference in meaning two
representations away, so to speak, as shown in diagram (10).
(10)      TS
         ↙    ↘
       CS1     CS2
        ↓       ↓
       SS1     SS2
        ↓       ↓
       QS1     QS2
The licensing of the TS→CS synonymy is ‘‘delayed’’ until QS.
On methodological grounds I suppose we should begin by disallowing
such cases, in that we would then have a much tighter idea about the
scope of the Synonymy Principle. With delayed licensing, we are saying
that any difference in meaning can license any difference in form. Without
delayed licensing, we can more narrowly specify how differences in form
and differences in meaning are related to one another: only differences in
form and differences in meaning that are in the same region of the model
can interact in this way. The actual predictions would of course depend
on the details of the model, but to take an extreme case, differences in
Case marking (at the early end of the model) could not correspond to
differences in information structure (at the late end).
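The contrast between delayed and local licensing can be stated as a locality condition over the ordered levels. The sketch below is my own toy encoding: the ordering of levels follows diagram (5), and the one-level window is just one way of cashing out "same region of the model."

```python
LEVELS = ["TS", "CS", "PS", "SS", "QS", "AS", "FS"]

def licenses(form_level, meaning_level, window=1):
    """May a meaning difference at one level license a form difference
    at another? Only if the two levels are close enough in the model."""
    i, j = LEVELS.index(form_level), LEVELS.index(meaning_level)
    return abs(i - j) <= window

# German scrambling: an SS form difference licensed at adjacent QS.
print(licenses("SS", "QS"))  # -> True
# The disallowed extreme: Case marking licensed by information structure.
print(licenses("CS", "FS"))  # -> False
```

Widening the window (or removing it) gives the delayed-licensing option; setting it to one gives the tighter methodological stance adopted provisionally in the text.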
To illustrate with a concrete case, consider Swahili object agreement
(OM indicates the object agreement affix).
(11) a. N-a-m-penda Juma.
I-tns-om-like Juma
‘I like Juma.’
b. *Napenda Juma.
c. N-a-ki-soma kitabu.
I-tns-om-read book
‘I read the book.’
d. Nasoma kitabu.
‘I read a book.’
When the object is animate, object agreement is obligatory; but when the
object is inanimate, it occurs only with definites. If object agreement is at
the same level as Case assignment, then the pattern in (11) shows that
Swahili agreement for indefinites has TS→CS synonymy, not resolved
until QS, if QS is where definites and indefinites are sorted out.
This conclusion that delayed synonymy resolution is possible can be
averted by structuring the model differently. For example, suppose that
QS represents the scope of quantifiers, as before, but that the definite/
indefinite distinction is established earlier—say, in CS. Then of course the
TS→CS synonymy is resolved on the spot, and the more narrow
conception of how blocking enforces itself is possible.
I do not find myself in any position to resolve the question of delayed
licensing of synonymy. It is a question that does not translate easily into
standard minimalist practice with Checking Theory, and so deserves fur-
ther study in empirically distinguishing these two styles of modeling how
semantics is determined by syntactic form.
9.3 Kinds of Focus
9.3.1 IFocus and LFocus
The RT model just outlined provides an index to another set of related
entities, the different kinds of focus. Several kinds of focus, or focusing
effects, have been cited in the literature: normal focus, contrastive focus,
and the focusing that occurs in special constructions like pseudocleft,
cleft, scrambling, and HNPS. I think this variety can be understood in
terms of mismappings between levels. If we look at the right-hand side of
the model as it now stands, we see several opportunities for mismatch.
(12) (QS→) SS → AS (≈FS)

The mismatch between SS and QS (= TopS) has already been discussed
in chapter 2, and nothing said here will change the conclusions drawn
there. I will try to show that the way SS, AS, and FS relate to one another
can account for the variety of focusing effects and can allow them, despite
their different properties, to be seen as part of a systematic whole.
I will begin by drawing attention to an only partly appreciated
dimension on which types of focus can be differentiated. The discussion that
follows depends on sorting them out clearly.
One kind of focus generates a propositional presupposition—that is, a
presupposition that some proposition is true. This sort of focus is found
in the cleft construction, for example.
(13) It was John who Bill saw.
(13) presupposes that Bill saw someone. I will call this kind of Focus a
Logical Focus (LFocus). I include in this type the answers to questions.
When the answer to a question is a whole sentence, the ‘‘real’’ answer
must be the focus of the sentence.
(14) A: What did you buy in New York?
B: I bought a RECORD in New York.
B′: *I bought a record in New YORK.
The question-answer focus is often cited as the core case of normal focus.
The other kind of focus is tied directly to the placement of main sen-
tence accent, but it does not involve anything propositional. For example:
(15) John wants a red hat and a BLUE hat.
The ‘‘presupposition’’ generated by focusing on BLUE is just the word
hat, and nothing bigger than that. One could try to extract a proposi-
tional presupposition from (15) (e.g., John wants an X-colored hat), but
that is an artifact of the particular example and is not possible in general.
(16) John compared the red hat to the BLUE hat.
There is no proposition out of which BLUE has been abstracted in (16).
Rather, BLUE is what I called a disanaphor in Williams 1997, and hat is
its paired anaphor; the requirement is that the disanaphor be different
from whatever stands in the same relation (‘‘—R→’’ in (17)) to the
antecedent of the anaphor that the disanaphor bears to the anaphor.
(17)  X —R→ antecedent of anaphor
      disanaphor —R→ anaphor
      (where X ≠ disanaphor, and antecedent of anaphor = anaphor)
This is the Disanaphora Principle proposed in Williams 1997, where it is
shown that the relation between hat and hat in (16) obeys general
principles of anaphora. The accent pattern, and the accompanying anaphoric
commitments, are essentially obligatory.
(18) *John compared the red hat to the blue HAT.
(The fact that (18) is not absolutely ungrammatical is a point to which I
will return.)
The most convincing examples showing that accent-induced Focus/
Presupposition structure has nothing to do with propositional
presupposition come from how telephone numbers are pronounced when they
include repeated digits (M. Liberman, personal communication).
(19) a. 258-3648
     b. *258-3648
     c. *258-3656
     d. 258-3656
Here again the pattern is obligatory, so long as the speaker groups the
digits in the usual way (3-2-2). Again, no propositional presupposition
is raised. The anaphora involved here takes ‘‘same digit’’ as the identity
condition in the domain in which that anaphora operates. As this kind of
focus pertains to what has been called the information structure of a sen-
tence, I will call it Information Focus (IFocus).
I will associate IFocus and LFocus with different levels in RT. As
LFocus for the pseudocleft construction involves wh movement, it cannot
occur any earlier than SS, and I will assume that it is defined in SS (or the
closely related QS). As IFocus involves the phonological accent pattern, it
is plausibly associated with AS, which itself determines FS (Information
Structure (IS)), resulting in the following diagram (the same as (12)):
(20) (QS→) SS → AS (≈FS)
         LFocus   IFocus
We now have two notions of focus, so it is important to know how they
are related to each other. The answer is representation. That is, in the
normal situation, IFocus represents LFocus. Notice that the representa-
tion is not direct, but rather induced by the circuit.
Given a sentence with a nontrivial LFocus in SS, how is it represented
by AS?
LFocus and IFocus are similar in an important way. Each breaks up a
sentence into two parts: the Focus and the rest. We might suppose, then,
that matching up the structures on this basis would be a part of the natural
isomorphism between the two levels SS and AS, with the consequence
that, in the normal case, the IFocus and the LFocus would be identified
with each other. That is indeed what we find in the ‘‘unmarked’’ pronun-
ciation of cleft sentences.
(21) a. It was JOHN that Bill saw.
b. *It was John that Bill SAW.
It is also what we find in normal focus, as defined by question-answer
pairs.
(22) A: What did you buy in New York?
B: I bought a RECORD in New York.
B′: *I bought a record in New YORK.
For the relation between IFocus and LFocus to be completely clear,
the full details of AS—and for that matter QS (= TopS)—must be
developed, and I will not do that here. I will make the smallest number of
assumptions possible. That is, AS generates a set of accent structures, and
in particular defines the notion ‘‘Accented Phrase’’ in a way that captures
its central property: for English, it appears that the Accented Phrase can
be any phrase that contains the main accent on a right branch. The fact
that in a right-branching structure a number of di¤erent phrases will
qualify is the phenomenon of Focus projection.
(23) I [want to [see [the man [in the [red HAT]]]]].
Any of the bracketed expressions in (23) can be the Accented Phrase in
AS, hence the IFocus in IS. The definition of Accented Phrase accounts
for Focus projection. The IFocus in FS will canonically map to the
Accented Phrase in AS.
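The definition of the Accented Phrase lends itself to a procedural statement. The sketch below is my own simplification: it treats "main accent on a right branch" as the accented word being the rightmost terminal of the phrase, and collects every subtree of a bracketed structure that qualifies.

```python
def terminals(tree):
    """Flatten a nested-list tree into its terminal words."""
    if isinstance(tree, str):
        return [tree]
    return [w for sub in tree for w in terminals(sub)]

def accented_phrases(tree, accent, found=None):
    """Collect phrases whose rightmost terminal is the accented word."""
    if found is None:
        found = []
    if not isinstance(tree, str):
        if terminals(tree)[-1] == accent:
            found.append(" ".join(terminals(tree)))
        for sub in tree:
            accented_phrases(sub, accent, found)
    return found

# (23) I [want to [see [the man [in the [red HAT]]]]].
sent = ["I", ["want", "to", ["see", [["the", "man"],
        ["in", ["the", ["red", "HAT"]]]]]]]
for phrase in accented_phrases(sent, "HAT"):
    print(phrase)
```

Every phrase collected, from "red HAT" up through the larger bracketings, is a possible Accented Phrase; this nesting is the Focus-projection effect. A phrase like "the man", whose rightmost word is unaccented, is excluded.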
SS likewise defines LFocus in some manner. At the worst, certain con-
structions, like clefts and sentential answers to questions, are specified as
determining an LFocus. In the natural isomorphism between the levels,
LFocus = Accented Phrase = IFocus. Correspondingly, the LPresupposition
of (21a) (Bill saw someone) and its IPresupposition (Bill saw t) are
matched as well.
What is odd about (21b), then, is that the IFocus is not identified with
the LFocus. An odd sentence, but not a truly ungrammatical one—it
simply has a very specialized use. We can use the machinery just devel-
oped to explicate that use.
When the natural isomorphism holds between SS and AS, the pairings
IFocus = LFocus and IPresupposition = LPresupposition result. The
semantics is straightforward: the meaning of the IFocus is some function of
the LFocus, and the meaning of the IPresupposition is some function of
the LPresupposition. But when the isomorphism is broken, as in (21b),
these identities do not hold. Instead, for (21b) the identities are these:
(24) It was John that Bill SAW.
SS: LFocus = John
    LPresup = Bill saw someone
PP: Accented Phrase = SAW
    Rest = it was John that Bill X
IS: IFocus = SAW
    IPresup = it was John that Bill X
The IPresupposition here includes both the LFocus and (part of ) the
LPresupposition. It therefore cannot be identified with the LPresupposi-
tion—or, for that matter, with any other constituent in SS. Its meaning
therefore cannot be (a function of ) the meaning of the LPresupposition,
or the meaning of any subconstituent in SS. Rather, it must take the
whole surface structure (with both LFocus and LPresupposition) as its
value, but with the IFocus abstracted out.
(25) [saw]IFocus [[John]LFocus [Bill Xed someone]LPresup]IPresup
(25) shows how AS represents SS, but without the natural isomorphism.
It is because of the nonisomorphism that (21b) has such a special-
ized use. Normally, the LFocus is not IPresupposed. In this example it
is; in fact, a particular LFocus:LPresupposition pair is IPresupposed.
Under what circumstances would this be appropriate? Only if that
LFocus:LPresupposition pair had occurred together in recent previous
discourse. But that could really only be the case if something like (26A)
preceded (21b).
(26) A: It was JOHN that Bill heard.
B: No, it was John that Bill SAW.
The narrow circumstance in which this sort of IPresupposition is pos-
sible is what gives examples like (21b) their metalinguistic or ‘‘corrective’’
flavor. In ordinary terminology, the focus on saw in (21b) would be called
contrastive focus and would be given a separate theoretical treatment, or
at least the promise of one. But in fact, many of the things that are true of
focus in general are true of contrastive focus as well, and there is there-
fore much to lose in not giving them a common account. For example,
the rules for determining Focus projection are the same for both con-
trastive focus and normal focus, as the following examples show.
(27) a. It was John that Bill SAW in the morning.
b. It was John that Bill saw in the MORNING.
c. A: What did you do to John?
B: I SAW him.
d. A: What happened?
B: Bill saw John in the MORNING.
In (27a) the contrastive focus is narrow, just as the normal focus is in
(27c); likewise, in (27b) the contrastive focus is potentially broad, just as
the normal focus is in (27d). Such parallels compel us to treat contrastive
and normal focus by the same mechanisms, which include the identifica-
tion of the IFocus and the relation of IFocus to the Accented Phrase.
In addition, when a language has left-accented Focuses, as Hungarian
does, the left accenting holds for both normal and contrastive focusing.
But of course a difference must be drawn somewhere. In the present
scheme it is drawn in the relation of SS to AS, and specifically in the re-
lation of the IFocus to the LFocus—when IFocus represents LFocus, we
get normal focusing; when it doesn’t, we get contrastive.
An important element in this explanation is that LFocus is subordi-
nate to IFocus. This is shown by the fact that LFocus can wind up in the
IPresupposition, but the reverse can never happen, because of how AS
and SS relate to one another. In other words, it is not enough to say of
(28B) that it has two Focuses. The following exchange will always be
impossible:
(28) A: JOHN saw Bill.
B: *No, it was Sam that JOHN saw.
Here, speaker B has attempted to correct speaker A, but has chosen the
wrong focus strategy to do it: he has preserved speaker A’s main Focus
as an Accented Phrase and has added his own correction as an LFocus
different from the Accented-Phrase-defined Focus. A theory that assigns
triggering features to Focuses does not thereby explain this particular
asymmetry, even if it assigns different features to the two. RT
distinguishes them by virtue of the asymmetric relation between levels and the
fact that they are located in different levels.
9.3.2 Copular Inversion and Focus
The apparatus developed here can unravel some of the intricacy of
copular constructions. Copular sentences with two NPs show a complex
interaction among IFocus, LFocus, and referentiality. Such sentences
usually have inverted and uninverted variants.
(29) a. John is the mayor.
b. The mayor is John.
From small clause constructions, we know that one of these orders is
more basic.
(30) a. I consider John the mayor.
b. *I consider the mayor John.
I will assume that the ‘‘narrower’’ term (John) is the subject of the sen-
tence in some sense of subject relevant to a level prior to SS or to SS itself,
the earliest level in which LFocus and IFocus are defined; I will then refer
to the order in (29a) as the subject order (see Williams 1997 for fuller
discussion, but in a different theoretical context). Both (29a) and (29b) are
grammatical with final accent; however, they diverge if the accent falls on
the initial NP.
(31) a. JOHN is the mayor.
b. *The MAYOR is John.
Like some previous examples, (31b) is not ungrammatical; rather, it is
restricted to ‘‘corrective’’ contexts. We may gain some understanding of
(31) if we assume that the order in (31a) is the subject order, but the order
in (31b) is not. Then the pattern in (31) is just the familiar pattern we
have seen for HNPS, and the logic of (31) is, ‘‘Invert to deliver a canoni-
cal (final) Focus, but not otherwise.’’
The two orders show a surprising further difference in relatives and
questions.
(32) a. I wonder who is the mayor?
b. I wonder who the mayor is?
c. I met the person who is the mayor.
d. *I met the person who the mayor is.
The intriguing contrast is (32b) versus (32d): since both involve wh
movement, it seems unlikely that the difference has to do with movement
per se. Also, both have noncanonical (nonfinal) IFocuses, so the answer
does not lie there either.
But two plausible suppositions will suffice to explain the difference in
the context of RT. First, suppose the inverted subject must be an LFocus;
and second, suppose that questions, but not relatives, have LFocus
‘‘pivots’’ (wh words). Then (32b) ‘‘compensates’’ for noncanonical sub-
ject order by establishing a canonical LFocus; but in (32d) there is
no corresponding compensation, so the noncanonical subject order is
unmitigated.
There is some evidence for both of the suppositions needed in this ex-
planation. First, questions do seem to raise a propositional presupposi-
tion of exactly the sort that would be given by identifying the pivot as the
LFocus. That is, (33a) seems to presuppose the truth of (33b).
(33) a. Who did you see?
b. You saw someone.
Second, there is some difference in the presuppositions for inverted and
uninverted copular sentences, which I think the following examples bring
out:
(34) a. Bill thought that John was the mayor, but in fact the town had
no mayor.
b. ?Bill thought that the mayor was John, but in fact the town had
no mayor.
That is, the inverted form seems to carry a presupposition, ‘‘the mayor is
somebody,’’ which the uninverted form does not carry. (For further dis-
cussion, see Williams 1998a.)
9.3.3 Spanish LFocus
Having made the distinction between IFocus and LFocus, let us return to
a problem alluded to in chapter 2. It has often been noted that ‘‘answers
to questions’’ in Spanish are obligatorily clause final.
(35) A: Who called?
B: *JUAN llamó por teléfono.
    JUAN called
    (Zubizarreta 1998, 76)
B′: Llamó por teléfono JUAN.
Even to discuss the problem, we must distinguish normal focus from
contrastive focus, because contrastive focus in Spanish is not subject to
the limitation just illustrated. However, distinguishing them risks losing
an account of all they have in common, as noted earlier in this chapter:
they have anaphoric commitments of the same kind, they both carry
nuclear stress internally, and so on. The IFocus/LFocus distinction allows
us to treat them separately without abandoning a common account of the
phenomena just described.
First, we will need to assume one of the conclusions reached earlier:
that questions, and their answers, involve LFocus of the answer, for rea-
sons already given—a question generates an LPresupposition, and the
response to the question carries forward the LPresupposition of the ques-
tion and substitutes the answer for the wh phrase as LFocus. Now we
may begin to approach the question of how Spanish focus works.
First, why must the answer to a question, which we have identified now
as the LFocus, be clause final in Spanish? We have already assumed that
in the canonical SS→AS representation, the LFocus is mapped to the
IFocus. Let us further assume that the SS Focus is rightmost. The right-
ward positioning of the LFocus in SS arises from the requirement that SS
match QS; in other words, we will assume that the rightness requirement
originates in QS and propagates to SS under Shape Conservation.
Rightness of LFocus will be enforced to the extent that SS≈QS is enforced. In
particular, if SS≈QS supersedes SS≈PS, then LFocuses will appear in
a rightward position, if possible.
Let us suppose that Spanish is such a language. Then we do expect the
behavior in (35): if the LFocus can be rightmost, then it must be right-
most. But other predictions are generated as well.
First, the LFocus will appear on the right only if the syntax allows
it. Since subjects can be postposed in Spanish, rightward positioning of
LFocused subjects is possible. But, as we saw in section 2.7, there are sit-
uations in which such postposing is impossible.
(36) A: Con
with
quien
who
llegaron
arrived
enferma?
sick
‘Whoi did he arrive with sicki?’
B: Llegaron con MARIA enferma.
B 0: *Llegaron enferma con MARIA.
Example (36) is significant in sorting out theoretical treatments of
focusing effects. In RT (36B) is grammatical precisely because (36B′) is
not. (36B′) is not grammatical because it is not an available structure in
the relevant level of representation (SS in the present context). Therefore,
(36B) is the closest match to the quantification structure, and so even
though it mismatches on the positioning of the Focus, it is the best match,
hence grammatical (though it is judged slightly worse than a ‘‘normal’’
answer in which postposing is possible). So, the best match wins, even
when the best match is a bad match. I must stress that I do not have an
account for why postposing is not allowed in these cases, only for why
nonfinal Focuses are acceptable when postposing is not allowed.
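The "best match wins, even when the best match is a bad match" logic can be rendered as a simple selection over the structures a level actually makes available. This is my own toy rendering; the congruence score (shared linear positions) merely stands in for the real notion of shape matching.

```python
def best_match(target_order, available_orders):
    """Choose the available order most congruent with the target order."""
    def congruence(order):
        # Count positions where the candidate agrees with the target.
        return sum(1 for a, b in zip(order, target_order) if a == b)
    return max(available_orders, key=congruence)

# Target (Focus-final, as QS demands) vs. what SS actually makes available.
target = ["llegaron", "enferma", "con MARIA"]
only_option = [["llegaron", "con MARIA", "enferma"]]
print(best_match(target, only_option))   # the bad match, but the best one

# When postposing is available, the fully congruent order wins instead.
both_options = only_option + [["llegaron", "enferma", "con MARIA"]]
print(best_match(target, both_options))
```

The contrast with a checking account is visible in the code: nothing here ever rejects a candidate outright for imperfect congruence; an imperfect candidate loses only when a better one is available.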
In a Checking Theory account of such structures, (36B) is mysterious.
If there is a focus feature that must be checked in Spanish, resulting in
obligatory postposing of the subject, why is that feature not left unsatis-
fied in (36B), making the sentence ungrammatical? That is, the gramma-
ticality of (36B) cannot be understood in terms of the ungrammaticality
of (36B′), because an unchecked feature is an unchecked feature.
In RT, English differs from Spanish in two ways. First, English does
not allow subjects to be postposed. I assume this is due to a difference in
the constitution of SS (or PS). Second, English does not favor FS≈SS
(or derivatively FS≈QS), in that LFocuses are tolerated in nonfinal
position, as we saw earlier. I assume these are independent differences
between the two languages. If so, then there is room for other language
types—specifically, for a language that strongly favors LFocus in right-
most position, but without subject postposing. Such a language would
treat subject LFocuses in the same way that English does, that is, in situ;
but in the VP, where reordering is possible, not putting the LFocus in
final position would be sharply worse than in English. French might be
such a language.
The second thing to understand about Spanish is why the rightward
positioning requirement is not imposed for contrastive focus. In short,
because contrastive focus does not involve an LFocus. The LPresupposi-
tion is a presupposition of truth, and, as we saw in section 9.3.2, it is not
relevant to the general case of contrastive focus.
(37) I prefer the red book to the [BLUE]IFocus book.
The same notion of IFocus is applicable to both contrastive and normal
focus, but the rightward positioning requirement for answers stems from
the syntax of LFocus in SS, not from IFocus, and so has no effect on
examples like (37) or like Zubizarreta’s (1998, 76) (see (57) in chapter 2).
(38) JUAN llamó por teléfono (no PEDRO).
     JUAN called            (not PEDRO)
Here the Focus is an IFocus and is not extraposed even though
extraposition is possible.
The focusing in (38) involves no truth presupposition, insofar as saying
JUAN called does not presuppose the truth of someone called. It
presupposes that x called has occurred in the discourse already; but that is
nothing more than to say that x called is an anaphor, not that it is true.
(39) Mary didn’t call; but JUAN called.
The anaphor called is licensed by Mary didn’t call, even though that
clause explicitly denies that Mary called and gives no indication that
anyone else did.
9.3.4 Hungarian Focus
As we saw in chapter 2, Hungarian focus structure is Focus initial, in
that the Focus precedes all nontopicalized clause elements, including the
subject.
(40) Hungarian focus structure
Topic . . . Topic F [V . . . ]
Hungarian differs in this way from the languages we have considered
so far—English, Spanish, Italian. If this is correct, then Hungarian differs
parametrically in how it structures one of the levels (FS), which tells us
that the levels themselves are not fully fixed universally. We will see that
languages can vary in two ways: not only in which representation rela-
tions they prefer over others, as in chapter 2, but also in how the levels
themselves are structured. RT will then di¤er from other theories in
having a nonuniform source of variation—Checking Theory reduces all
variation to strength of features, Antisymmetry reduces all variation to
remnant movement; Optimality Theory reduces all variation to reorder-
ing of constraints. For some this might be enough to put RT out of the
running, but surely that conclusion is premature.
Hungarian differs from English in another way: the Focus itself must
be initially accented. In (41) the Focus can be any of the underlined
constituents.
(41) János [a TEGNAPI cikkeket] olvasta.
     János the yesterday’s articles read
     ‘János read yesterday’s articles.’
     (Kenesei 1998, as reported in Szendrői 2001)
Hungarian and English thus differ on two parameters: Is neutral Focus
position on the left or the right? and Is the focused constituent left
accented or right accented? If it turns out that all languages are of either
the Hungarian or the English type, then I will be deeply embarrassed, as I
have constructed a theory in which there are four possible language types,
including as well, for example, languages where the left-accented Focus
occurs on the right periphery, and the reverse.
(42) a. [ . . . [Accent . . . ]]
b. [[ . . . Accent] . . . ]
I frankly cannot think of a natural scheme to tie these two parameters
together as one. In RT in particular it would be difficult to coordinate
them, as they govern different levels: the accent placement parameter
governs AS, and the left versus right placement of Focus itself is a feature
of FS. For these reasons I hope the two parameters do not turn out to be
linked empirically.
Furthermore, there is a little evidence, from English and symmetrically
from Hungarian, suggesting that they are independent. Both English and
Hungarian have nonperipheral Focuses, and those Focuses are accented
like their peripheral counterparts.
(43)
In Hungarian, noninitial Focuses are allowed only as second Focuses, as
single Focuses must move to initial Focus position.
These examples show that the internal placement of the accent is inde-
pendent of whether the Focus is peripheral or not, suggesting that the in-
ternal placement is independent of the external distribution and in turn
that languages with the parameters set as in (42) are to be expected.
In sum, then, it appears we might say that universally, (a) Focuses are
either left or right accented, as a part of the definition of the Accented
Phrase in AS; (b) the principal constituent of FS is located either left-
peripherally or right-peripherally in the structures defined there; and (c)
the AS is mapped to the FS under Shape Conservation.
9.4 Ellipsis in RT
In section 9.3 we had call to identify the complement of an IFocus as an
‘‘anaphoric’’ IPresupposition. In fact, IPresupposition is a poor term,
since, as shown there, there is no presupposition in the sense of a propo-
sition with a truth value. Finding anaphora operating in AS, in the form
of destressing, suggests revisiting the theme of chapter 4, where it was
shown that different reflexive anaphors occupy different RT levels, with
predictably different properties. Are there other kinds of anaphors that
can be ‘‘indexed’’ according to the RT levels?
A good candidate is the family of ellipsis rules. English and other
languages display several kinds of ellipsis, with puzzling differences in
behavior. I think some of these properties, particularly involving
differences in locality, can be explained by locating them at different RT levels.
English, as well as other languages, has an ellipsis rule that deletes
everything but a single remnant constituent.
(44) John wants to build Mary a tree house on Friday, and
     {Sam_nom, too / Sam_acc, too / a coffin, too / on Sunday, too}.
Although (45) is potentially ambiguous, given a particular focus, its in-
terpretation is fairly well fixed.
(45) Bob saw BILL, and Pete too.
= and Bob saw Pete
≠ and Pete saw Bill
This is exactly what we would expect if the construction in question were
interpreted in AS. The interpretive layer of AS (FS) partitions a sentence
into IFocus and IPresupposition, and it is the IPresupposition, and only
the IPresupposition, that is used as the antecedent. For this reason I will
refer to this kind of ellipsis as Focus ellipsis.
Fixing Focus ellipsis at AS—that is, very late—suggests that it will be
highly nonlocal. In particular, it suggests that the ellipsis site itself can
span CP boundaries, which does indeed seem possible (elided elements
are struck through).
(46) Someone thinks that Bill likes fruitcake, and
Someone thinks that Pete likes fruitcake too
Semantics in RT 261
VP ellipsis presents quite a different picture. VP ellipsis seems intrinsi-
cally bound up with the notion of subjecthood we have associated with
PS: the elided material is always interpreted as a predicate that takes the
remnant of ellipsis as its subject.
(47) Sue likes oats in the morning and John does too.
Since VP deletion is licensed (first) in PS, we would expect it to be im-
mune to the identification of the Focus, and this seems largely true.
(48) {John saw MARY / JOHN saw Mary} and then BILL did too.
The anaphora is compatible with any choice of Focus. Not only can
the main accent be located anywhere; in addition, wherever it is, Focus
projection is possible without affecting the interpretation of the ellipsis.
Again, this is what would be expected if VP ellipsis were adjudicated
in PS, before AS. Moreover, the availability of ‘‘strict’’ versus ‘‘sloppy’’
readings does not turn on focus structure, as the following examples
show:
(49) a. JOHN likes his mother, and so does BILL.
b. John likes his MOTHER, and so does Bill.
c. i. Bill likes Bill’s mother (sloppy)
ii. Bill likes John’s mother (strict)
Both (49a) and (49b) have both readings in (49c), despite having different
accent structures. (See Williams 1974 or Fiengo and May 1994 for
accounts of the strict and sloppy readings.)
What is invariant about VP deletion is the relation of the ellipsis to
what remains. The VP is a predicate on the subject that remains, and it
is on the basis of this that the strict/sloppy readings are sorted out—the
ambiguous pronoun bears an ambiguous relation to the subject.
Focus ellipsis bears the relation IPresupposition to the IFocus that
remains undeleted; therefore, in both cases the target of the ellipsis is ap-
propriate to the level at which it takes place. Focus ellipsis also shows
strict/sloppy identity ambiguities.
(50) a. Sam told JOHN to buy his mother a present, and PETE as well.
b. i. Sam told Pete to buy John’s mother a present
ii. Sam told Pete to buy Sam’s mother a present
Appropriately, the ambiguity lies in how the pronoun relates to the rem-
nant of the ellipsis, in this case, the Focus; as a result, all else being equal,
Focus ellipsis behaves in a way parallel to VP ellipsis. For both VP ellip-
sis and Focus ellipsis, we can imagine the sort of account put forward
in Williams 1974, wherein the deleted material bears an ‘‘abstraction’’
relation to the remnant material. In the case of VP ellipsis the abstrac-
tion is the abstraction inherent in the subject-predicate relation; in the
case of Focus we can easily imagine that the same kind of abstraction is
involved.
(51) a. John λx (x likes his mother)
b. John λx (Sam told x to buy his mother a present)
Then in both cases the ambiguity will lie in whether the pronoun takes as
its antecedent the lambda variable x (for the sloppy reading) or the argu-
ment of the lambda expression, John (for the strict reading).
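The mechanics of (51) can be made concrete with a small computational sketch (my own illustration, not part of the text): the elided predicate is modeled as the lambda abstract, and the pronoun inside it is resolved either to the bound variable (the sloppy reading) or to the fixed individual John (the strict reading).

```python
# A minimal sketch of the lambda-abstraction account of strict/sloppy
# identity. All names here are illustrative, not from the source text.

def predicate(pronoun_antecedent):
    """Return the abstract 'lambda x. x likes <antecedent>'s mother'."""
    def abstract(x):
        # Resolve the pronoun: to the lambda variable x (sloppy), or to
        # a fixed individual such as John (strict).
        antecedent = x if pronoun_antecedent == "bound" else pronoun_antecedent
        return f"{x} likes {antecedent}'s mother"
    return abstract

sloppy = predicate("bound")
strict = predicate("John")

# Applying the same abstract to the ellipsis remnant 'Bill':
print(sloppy("Bill"))  # Bill likes Bill's mother  (sloppy reading)
print(strict("Bill"))  # Bill likes John's mother  (strict reading)
```

The point of the sketch is that the ambiguity lives entirely in how the pronoun is resolved inside the abstract; the remnant-to-abstract relation itself is the same for both readings, just as in the text's account.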
The result so far is that the interpretation of the ellipsis, and in par-
ticular the behavior of the strict/sloppy ambiguity, turns on structures
needed independently: the articulation into subject and predicate in PS
for VP ellipsis, and the articulation into Focus and Presupposition for
Focus ellipsis.
However, this pretty picture is marred somewhat by the existence of
speakers who accept a wider class of sloppy readings for VP ellipsis. The
following sort of case is reported by Fiengo and May (1994):
(52) a. John’s father thinks that he will win, and Bill’s father does too.
b. i. Bill’s father thinks that John will win (strict)
ii. Bill’s father thinks that Bill will win (sloppy)
Fiengo and May develop a theory of sloppy identity that depends on a
general notion of ‘‘parallelism’’ that must hold in ellipsis sites; the sloppy
interpretation arises here because the relation between John’s and he in
the first clause of (52a) is structurally parallel to the relation between
Bill’s and Bill in (52bii).
The sloppy readings for examples like (52) are, I think, only marginally
available, and not at all for some speakers. But the mystery remains:
where do they come from? I think the focus structures of the examples
shed some light on the situation. Importantly, the success of sloppy am-
biguity that turns on antecedents other than subjects depends completely
on focus structure, as the following examples show:
(53) a. John’s father thinks he will win, and BILL’s father does too.
b. John’s father thinks he will win, and Bill’s MOTHER does too.
≠ Bill’s MOTHER thinks Bill will win
c. *John’s father thinks he will win, and BILL’s mother does too.
(53b) does not have a sloppy reading, the one indicated beneath it. This
is clearly the result of Bill’s not being the Focus of the second clause.
(53c) simply shows that given the context, BILL could not be the Focus,
because of the disanaphora conditions on focusing discussed earlier.
Two points will clarify the situation. First, for some speakers it appears
that sloppy identity for VP ellipsis is being licensed in exactly the manner
of Focus ellipsis: sloppy identity can turn only on the Focus. That is, the
ellipsis is being licensed by a structure that looks like this:
(54) BILL λx (x’s mother [thinks he will win])
This is a structure that arises in FS, not PS. So we might conclude that for
some speakers the sloppiness can arise in FS, not PS. This will also ex-
plain why (53b) does not have a sloppy reading; it does not qualify for
one in PS, because the sloppiness does not turn on the subject, and it does
not qualify for one in FS, because the sloppiness does not turn on the
Focus. So we may account for the phenomenon in (53) by supposing that
for some speakers VP ellipsis is licensed in FS, instead of (or actually, in
addition to) PS.
The most compelling reason that this picture must be essentially correct
is that even for speakers who allow Focus-anteceded sloppy identity for
VP ellipsis, focus plays no role when the licensing is subject-anteceded.
This can be verified in examples already given; for example, (53a,b),
which are repeated here, both have valid sloppy interpretations in which
the antecedent for the pronoun is Bill’s mother (the reading indicated in
(55c)).
(55) a. John’s father thinks he will win, and BILL’s father does too.
b. John’s father thinks he will win, and BILL’s MOTHER does
too.
c. Bill’s mother thinks that Bill’s mother will win.
What this means is that all speakers have access to the ‘‘core’’ case of
VP licensing—the one found in PS, where only subjects antecede elided
material, and where variations in focus structure play no role in the
availability of antecedents. So focus-based variation arises only when the
licensing takes place at FS.
Now let us apply this methodology to other ellipsis rules. English has
another ellipsis rule called gapping, a stylistically somewhat formal rule.
Gapping seems restricted to coordinated IPs; at least, that is what the
following paradigm suggests:
(56) a. I think that John saw Mary, and Mary John.
b. *I think that John saw Mary, and that Mary, John.
This restriction suggests that gapping is defined on the level at which IPs
are defined, but not CPs—in other words, on something like PS. If that is
so, then gapping should be bounded by CPs not only as shown in (56),
but also as shown in (57).
(57) a. John thinks that Sue bought a dog, and Pete, a cat.
b. John wants to buy a dog, and Pete, a cat.
c. John wants Sue to buy a dog and Pete, a cat.
(57a) is grammatical, but it cannot mean ‘. . . and Pete thinks that Sue
bought a cat’; that is, the ellipsis cannot bridge the tensed complement
structure, but must be contained entirely within it.
(58) a. *[John thinks that Sue bought a dog] and [Pete thinks that Sue
bought a cat].
b. John thinks that [Sue bought a dog] and [Pete bought a cat].
c. John wants Sue to buy a dog and Pete, wants Sue to buy a cat.
d. John wants Sue to buy a dog and Pete wants a cat to buy a
dog.
The restriction follows if gapping is restricted to PS, where CP structure
has not yet been introduced. Of special interest is (58c), as the embedded
clause has a subject, but is not tensed. (58c) is slightly more difficult to
parse in the manner indicated. In fact, a different reading interferes, the
one indicated in (58d) (see Hankamer 1973 for discussion). But most
speakers accept (57b), particularly if the pause is made especially prom-
inent. If these discriminations are correct, they strongly confirm the
framework that predicts them. To summarize the prediction: from a fact
about the context in which gapping takes place (56), we infer the dis-
criminations in (57) and (58), discriminations we have no right to expect
in the absence of RT.
In all of the discussions of locality so far, I have given cases in which
the ellipsis slices into the complement—that is, deletes part of it. But
then what about cases in which the ellipsis includes the whole of the
complement?
(59) John said [that he was leaving]_CP on Monday, and
Bill said [that he was leaving]_CP on Tuesday.
In (59) an entire CP has been gapped along with the verb. But how is that
possible, if gapping occurs at a level where CP has not yet been intro-
duced? The answer must be something like this. At the point at which the
gapped structure is assigned an antecedent, which I will continue to sup-
pose is IP, the full CP structure has not been introduced in the antecedent
VP, but the gapping rule nevertheless establishes the antecedent relation
between the two VPs. (The relation is indicated here by coindexation.)
(60) John [said that]_VPi on Monday and Bill [e]_VPi on Tuesday.
At a later stage—say, SS—the full tensed CP is filled into the comple-
ment position in the first clause.
(61) John [said [that he was leaving]_CP]_VPi on Monday, and Bill [e]_VPi on
Tuesday.
In the resulting structure [e]_VP will be understood as having the whole VP
as its antecedent, including the CP.
Under this arrangement the rule licensing the gapped material does not
have access to the CP structure; but it does not need to have that access.
Therefore, it will still be impossible to delete a proper subpart of a com-
plement CP.
The final ellipsis rule I will consider in connection with the RT levels
is sluicing. Sluicing is triggered by the presence of wh phrases, so it is in-
evitable that it is licensed in SS, the level in which wh is defined.
(62) John likes someone, but I don’t know who [John likes t].
Given that sluicing is licensed in a structure in which CP has been intro-
duced, we expect that it can slice into CPs, and this appears to be so.
(63) John thinks that Mary will lie to someone, but I don’t know
who John thinks [that Mary will lie to t].
The residual preposition guarantees that the embedded clause has been
sliced into, and not simply deleted as a whole, which (as we saw in the
case of gapping) is irrelevant to evaluating locality.
9.5 The Semantic Values of Elements in RT Levels
An NP in TS corresponds to a pure theta role; an NP in higher levels
corresponds more and more to what we think of as a full NP—refer-
ential, quantificational, and so on. An NP in CS is a Cased NP; pre-
sumably it is here, and possibly in later levels, that expletives enter. We
can then talk about the ‘‘history’’ of an NP as the series of objects at dif-
ferent levels that are put in correspondence under the isomorphic map-
ping that relates the levels to one another.
(64) TS: [dog . . . ]‘
CS: [dog_nom . . . ]‘
SS: [[every dog] . . . ]
The sequence dog, dog_nom, every dog is established by Shape Conserva-
tion.
Since presumably every NP has an image in every level, it might at first
seem difficult to distinguish the different levels. But in fact I think that
anaphors, as described in chapter 4, can give us some insight into the
differences between the levels. Recall that anaphoric bindings are a part
of what Shape Conservation carries forward from one level to the next,
so that a coindexation (or its equivalent) established in an early level will
persist in later levels.
(65) TS: [dog_i likes himself_i]‘
CS: [dog_nom,i likes himself_i]‘
SS: [[every dog]_i likes himself_i]
If the anaphor is assigned its antecedent in CS (for concreteness), then
that assignment is carried forward to SS by the Shape Conservation
mapping. Put in terms used in earlier chapters, the ‘‘antecedent’’ relation
commutes with the representation relation, in that, given an anaphor, the
image (under shape-conserving mapping) of the anaphor takes as its an-
tecedent the image of the antecedent of the anaphor.
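The commutation claim can be stated compactly: where f is the shape-conserving mapping between two levels and ant is the antecedent function at each level, f(ant_CS(x)) = ant_SS(f(x)). A toy check of this identity (my own illustration, with invented item names, not from the text):

```python
# CS-level items and their SS-level images under Shape Conservation.
# The labels below are hypothetical placeholders for expository purposes.
image = {"dog_nom": "every dog", "himself_CS": "himself_SS"}

# The antecedent relation established at CS, and the SS relation that
# Shape Conservation carries forward from it.
antecedent_CS = {"himself_CS": "dog_nom"}
antecedent_SS = {image[a]: image[b] for a, b in antecedent_CS.items()}

anaphor = "himself_CS"
# Commutation: image(antecedent(x)) == antecedent(image(x))
assert image[antecedent_CS[anaphor]] == antecedent_SS[image[anaphor]]
```

Both paths through the diagram land on the same object ("every dog"), which is what it means for the antecedent relation to commute with the representation relation.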
But an anaphoric relation established in an early level may ‘‘mean’’
something different from an anaphoric relation established in later levels;
at least, that is what I will tentatively suggest in what follows.
For example, an anaphoric binding in TS binds two theta roles together
—two coarguments, or, as suggested in chapter 4, perhaps a somewhat
broader notion. One cannot coherently say that the two theta roles
‘‘corefer’’ since reference, in the sense of that property which, for exam-
ple, definite NPs have, is not a concept at that level. Theta roles in TS are
the actors, patients, and so on, that are the arguments of predicates, and
coindexing two theta roles says that they are ‘‘the same’’—that is, ‘‘iden-
tified.’’ This will translate into coreference in a later level—specifically, in
whatever later level the relevant notion of reference is operative. This at
least tells us that split antecedents are impossible at this level, as splitting
an anaphor implies some kind of substructure, and theta roles themselves
are indivisible at TS, that is, atomic. Later coindexings might be liable to
split antecedents, as at least the full notion of reference will have to allow
the sorts of relations that have been referred to as coreference, overlap in
reference, subsumption of reference, disjointness of reference, and so on,
and therefore will clearly allow the sort of structure that would support
split antecedence. By this thinking, then, we arrive at the notion that early
anaphors will not allow split antecedence, but late anaphors will.
This will be more than a way to simply classify anaphors, as we now
know some things about the behavior of early and late anaphors: early
anaphors will display sharp locality restrictions, will have a limited set
of admissible antecedents (in the A/A sense), and will always be trans-
parently reconstructed for by movement and scrambling relations. If it
turns out that these things also correlate with the possibility of having
split antecedents, then that becomes a strong cross-brace in the empirical
underpinning of RT.
I have not carried out the broad empirical survey that would deliver
a sound decision on this speculation. It would be relevant to know, for
example, whether long-distance uses of Japanese zibun allow split ante-
cedents. But there is one suggestive indication that the correlations are
exactly as expected. It is well known that English clausemate, coargument
antecedents are not allowed to be split, and as I have already suggested,
these are CS or possibly TS anaphors, on the grounds of locality and
reconstructivity.
(66) *John_i told Mary_j about themselves_[i,j].
This fact is certainly consonant with my proposals; indeed, if it were false,
it would call into serious question the premise on which I am basing the
further predictions in this section. At the other end of the scale are ana-
phors of the kind discussed by Reinhart and Reuland (1993); as deter-
mined in chapter 4, these are defined at a late stage in the model, on
grounds of their lack of locality.
(67) John told Mary that at least Bill and himself would be there.
The question then is, can these anaphors be split? The following example
is relevant:
(68) John_i told Mary_j that at least Bill and themselves_[i,j] would be
invited to the party.
If the judgment discriminating (66) and (68) is reliable, these examples are
encouraging, because in the absence of RT, there is no particular reason
that locality and target type should correlate with the possibility of split
antecedents.
If these two types of anaphors differ in this way, then we would expect
them to differ in reconstructivity as well: anaphors that do not allow split
antecedents would reconstruct, and anaphors that do allow split ante-
cedents would not. Although the following examples are the right kinds
of examples to make the point, I think they are complex enough that firm
judgments are not available; consequently, although the marks in (69) do
correspond to my own judgments, perhaps they should be read as only
the ‘‘predicted’’ judgments.
(69) a. What John_i saw t was himself_i dancing in the street.
b. *What John_i told Mary_i that he saw on TV was Bill and
themselves_i dancing in the streets.
Of course, in order for (69b) to be relevant at all, it must be determined
that reconstruction is necessary in the first place; if the surface, unrecon-
structed configuration of the anaphor and its putative antecedents is
valid, then (69b) would be irrelevant to the question of reconstruction.
But I think the following example establishes that something like c-
command is necessary even for these sorts of reflexives:
(70) *Exactly when John told Mary to leave, I saw Bill and themselves
dancing in the streets on TV.
Controllable (null) subjects are like anaphors in dividing into two sorts,
one allowing splitting, and the other not; the former are traditionally
called obligatory control cases, and the latter, non–obligatory control
cases. Obligatory control cases take determinate local antecedents; non–
obligatory control cases take ‘‘arbitrary’’ and ‘‘inferred’’ antecedents.
As suggested in chapter 3, it is very likely that these two sorts of con-
trol correspond to different ‘‘sizes’’ of infinitives. Wurmbrand (1998) has
documented that this is the case in German. Applying the same reasoning
used earlier, the RT expectation is that the ‘‘smaller’’ the infinitive, the
earlier the control relation is established, and the less possibility there will
be for split antecedence. Again, some very clear cases suggest that this is
so. As discussed in chapter 3, no infinitive that clearly takes CP structure
shows the properties of obligatory control; likewise, such infinitives show
split antecedents.
(71) a. Non–obligatory control
John_i told Mary_j [how [PRO]_[i,j] to save themselves]_CP.
b. Obligatory control
*John_i promised Mary_j [[PRO]_[i,j] to save themselves]_CP.
(I have included a reflexive in both cases to guarantee the relevant con-
struals for the examples. The reflexive itself cannot be the locus of the
splitting or nonsplitting of antecedents, as it occupies (the whole of) an
argument position and cannot be split; but such a reflexive can take as its
unsplit antecedent another NP that itself has split antecedents.)
Not all non–obligatory control cases show overt CP structure, but at
least the ones that do behave exactly as expected, uniformly allowing split
antecedents. Conversely, obligatory control structures do not allow split
antecedents.
The anaphoric systems in other languages should reveal the same
pattern: long-distance anaphors should allow split antecedents, and
anaphors with high locality should have unsplit antecedents. In Japanese, for example, we
might expect zibun and zibunzisin to differ in exactly this way. However,
in checking the literature I have not found examples that unambiguously
demonstrate this, independently of the splitting involved in infinitival
control.
If I am putting RT to correct use here, in trying to rationalize the ‘‘split
antecedents’’ divide among anaphoric elements, then in fact that divide
must be the tip of the iceberg, as every pair of RT levels has the potential
to give rise to other, but related, kinds of distinctions. This will require
sorting out the RT levels more precisely than I have been able to do here.
An additional distinction is perhaps isolated in the following pair:
(72) a. John wants to win.
b. John wants himself to win.
First, I think these do not differ at all regarding the possibility of split
antecedents; in both cases the antecedent of the embedded subject is sim-
ply John. But another distinction has often been noted: namely, that (72a)
has the de se reading and (72b) does not. Partee (1971) caught one aspect
of this distinction in the contrast between the following pair, which differ
sharply in their meanings:
(73) a. Only John wants to win.
b. Only John wants himself to win.
In standard theory this might be attributed to a difference between PRO
and himself. RT at least offers the opportunity to interpret the differences
in another way. Significantly, the structure that gives rise to the de se
reading is ‘‘smaller,’’ and therefore earlier, than the one that does not.
In a related vein, RT levels can also be used to distinguish various
kinds of quantifier scope assignment. The first clue is to understand how
quantifier scope relates to various opportunities for reconstruction.
We know, for example, that wh movement reconstructs for quantifier
interpretation in some instances, and not in others, and in fact that NP
movement itself reconstructs for certain quantifiers. Wh reconstruction
for scope takes place in examples like this:
(74) How many people does John think Bill saw t?
This example is actually ambiguous between de dicto and de re inter-
pretations, which can be schematized as follows:
(75) a. John thinks [x many people [Bill saw t]] What is x?
b. [x many people] [John thinks [Bill saw t]] What is x?
(75a) represents the de dicto interpretation, which plausibly involves
a quantifier having scope in the lower clause; (75b) represents the wide
scope de re interpretation.
(75a) certainly suggests that, in RT, wh movement can occur later than
the construal of quantifiers like that many, by the theory’s general meth-
odology. In the working model I have adopted for this book, that is not
strictly speaking possible, but of course we might take SS to be an ab-
breviation of some number of levels in which this can be sorted out. Does
(75b) suggest that quantifier construal occurs after wh movement as well?
Quite possibly, I would guess, though not necessarily, as an embedded
quantifier could have wide scope without the benefit of wh movement.
When we turn to NP movement, we again find evidence for recon-
struction—what have been called quantifier lowering cases with raising
verbs.
(76) Someone seems to have been here.
a. for someone x, x seems to have been here
b. seems [for someone x, x to have been here]
In RT there will be no lowering; instead, there will be ordering. The con-
strual of the quantifier someone precedes NP movement; since NP move-
ment is associated with the level PS, quantifier construal must precede
that level. The conclusion that presents itself from the data examined thus
far is that quantifiers can be construed in any level; but in fact, quantifiers
differ regarding where they are construed.
(77) Not many boys are believed [t to have left].
a. not many boys [believed [t to have left]]
b. believed [not many boys [t to have left]]
Most speakers reject the narrow scope reading (77b). So, NP movement
seems to reconstruct for construal of someone, but not for construal of
not many. In RT this simply means that the levels (or range of levels)
at which these two quantifiers are construed are di¤erent: one before, one
after PS. In this regard RT mimics the findings of Beghelli and Stowell
(1997) under the ‘‘later equals higher’’ equivalence discussed in chapter 2.
That the existential is construed early is consistent with the fact that the
implicit quantification of suppressed arguments is interpreted as existen-
tial, and with extremely narrow scope:
(78) a. They weren’t attacked.
b. They weren’t attacked by someone.
c. not [∃x [x attacked them]]
d. ∃x [not [x attacked them]]
(78a) can only have meaning (78c), whereas (78b) can have meanings
(78c) and (78d). Perhaps the existential binding of implicit arguments
is accomplished at TS, thus explaining its generally narrow scope—
anything else will come later.
Splitting up quantifier construals between levels raises some technical
questions, to which I can at this point only stipulate arbitrary answers,
but I suppose I should do at least that if only to show that the project is
not incoherent. The general idea is this. As in earlier sections of this
chapter, we have seen that the interpretation of structures ‘‘accumulates’’
across levels. Just as with anaphoric bindings, then, scope assignments
that are established at earlier levels are preserved in later structure under
Shape Conservation.
Many questions remain unanswered. For example, why are some
quantifiers excluded from early construal, and presumably, some ex-
cluded from late construal? I have no specific ideas about this, though I
would of course note that it is a problem for the standard model as well.
It is particularly troublesome for Beghelli and Stowell’s (1997) model,
where quantifiers are assigned scope by moving them to preestablished,
dedicated positions in functional structure. The question is, why are those
positions located where they are in functional structure?—essentially the
same question that arises under the already mentioned ‘‘higher equals
later’’ equivalence between the two styles of modeling the relation be-
tween syntax and semantics.
But even with so much in darkness, I am encouraged to try to extend
the LRT correlations of earlier chapters to questions of scope and ante-
cedence, so as to lock together an even more disparate array of properties
of syntactic relationships in a way I think is impossible in other models.
References
Abney, S. 1987. The English noun phrase in its sentential aspect. Doctoral dis-
sertation, MIT.
Anderson, S. 1982. Where’s morphology? Linguistic Inquiry 13, 571–612.
Anderson, S. 1992. A-morphous morphology. Cambridge: Cambridge University
Press.
Andrews, A. 1982. The representation of Case in Modern Icelandic. In J.
Bresnan, ed., The mental representation of grammatical relations, 427–503. Cam-
bridge, Mass.: MIT Press.
Babby, L. 1998a. Subject control in direct predication: Evidence from Russian.
In Z. Boskovic, S. Franks, and W. Snyder, eds., Formal Approaches to Slavic
Linguistics 1997: The Connecticut Meeting, 17–37. Ann Arbor: Michigan Slavic
Publications.
Babby, L. 1998b. Voice and diathesis in Slavic. Ms., Princeton University.
Bach, E. 1976. An extension of classical transformational grammar. In Problems
in linguistic metatheory: Proceedings of the 1976 conference at Michigan State
University, 183–224. East Lansing: Michigan State University, Department of
Linguistics.
Baker, M. 1985. The Mirror Principle and morphosyntactic explanation. Linguis-
tic Inquiry 16, 373–415.
Baker, M. 1996. The polysynthesis parameter. Oxford: Oxford University Press.
Barrett-Keach, C. N. 1986. Word-internal evidence from Swahili for Aux/Infl.
Linguistic Inquiry 17, 559–564.
Bayer, J., and J. Kornfilt. 1994. Against scrambling as an instance of Move-alpha.
In N. Corver and H. van Riemsdijk, eds., Studies on scrambling, 17–60. Berlin:
Mouton de Gruyter.
Beghelli, P., and T. Stowell. 1997. Distributivity and negation. In A. Szabolcsi,
ed., Ways of scope taking, 71–107. Dordrecht: Kluwer.
Benedicto, E. 1991. Latin long-distance anaphora. In J. Koster and E. Reuland,
eds., Long-distance anaphora, 171–184. Cambridge: Cambridge University Press.
Besten, H. den. 1976. Surface lexicalization and trace theory. In H. van Riems-
dijk, ed., Green ideas blown up: Papers from the Amsterdam Colloquium on Trace
Theory. Publications of the Linguistics Department 13. Amsterdam: University of
Amsterdam, Linguistics Department.
Bodomo, A. B. 1998. Serial verbs as complex predicates in Dagaare and Akan. In
I. Maddieson and T. J. Hinnebusch, eds., Language history and linguistic descrip-
tion in Africa. Vol. 2, Trends in African linguistics, 195–204. Trenton, N.J.: Africa
World Press.
Bok-Bennema, R. 1995. Case and agreement in Inuit. Berlin: Mouton de Gruyter.
Boskovic, Z. 1995. On certain violations of the Superiority Condition, AgrO, and
economy of derivation. Ms., University of Connecticut.
Boskovic, Z. 1999. On multiple feature checking. In S. D. Epstein and N. Horn-
stein, eds., Working minimalism, 159–187. Cambridge, Mass.: MIT Press.
Brody, M. 1997. Mirror theory. Ms., University College London.
Brody, M., and A. Szabolcsi. 2000. Overt scope: A case study in Hungarian. Ms.,
University College London and New York University.
Burzio, L. 1996. The role of the antecedent in anaphoric relations. In R. Freidin,
ed., Current issues in comparative grammar, 1–45. Dordrecht: Kluwer.
Chierchia, G. 1992. Functional wh and weak crossover. In D. Bates, ed., Pro-
ceedings of the 10th West Coast Conference on Formal Linguistics, 75–90. Stan-
ford, Calif.: CSLI Publications.
Chomsky, N. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, N. 1973. Conditions on transformations. In S. Anderson and P.
Kiparsky, eds., A festschrift for Morris Halle, 232–286. New York: Holt, Rine-
hart and Winston.
Chomsky, N. 1986. Barriers. Cambridge, Mass.: MIT Press.
Chomsky, N. 1993. A minimalist program for linguistic theory. In K. Hale and
S. J. Keyser, eds., The view from Building 20: Essays in linguistics in honor of
Sylvain Bromberger, 1–52. Cambridge, Mass.: MIT Press.
Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass.: MIT Press.
Cinque, G. 1998. Adverbs and functional heads. Oxford: Oxford University Press.
Cinque, G. 2001. ‘‘Restructuring’’ and functional structure. Ms., University of
Venice.
Collins, C. 1996. Local economy. Cambridge, Mass.: MIT Press.
Collins, C. 2001. The internal structure of verbs in Ju|’hoan and ǂHoan. In A. Bell and P. Washburn, eds., Cornell working papers in linguistics 18. Ithaca, N.Y.:
Cornell University, CLC Publications.
Culicover, P., and W. Wilkins. 1984. Locality in linguistic theory. New York:
Academic Press.
Deprez, V. 1989. On the typology of syntactic positions and the nature of chains.
Doctoral dissertation, MIT.
Diesing, M. 1992. Indefinites. Cambridge, Mass.: MIT Press.
Di Sciullo, A.-M., and E. Williams. 1987. On the definition of word. Cambridge,
Mass.: MIT Press.
Fiengo, R., and R. May. 1994. Indices and identity. Cambridge, Mass.: MIT
Press.
Fodor, J. 1978. Parsing strategies and constraints on transformations. Linguistic
Inquiry 9, 427–474.
Fox, D. 1995. Economy and scope. Natural Language Semantics 3, 283–341.
Gill, K.-H. 2001. The long-distance anaphora conspiracy: The case of Korean.
Ms., University of Edinburgh.
Grimshaw, J. 1978. English wh-constructions and the theory of grammar. Doc-
toral dissertation, University of Massachusetts, Amherst.
Haegeman, L., and H. van Riemsdijk. 1986. Verb projection raising, scope, and
the typology of rules affecting verbs. Linguistic Inquiry 17, 417–466.
Hankamer, J. 1973. Unacceptable ambiguity. Linguistic Inquiry 4, 17–68.
Harley, H. 1995. Subjects, events, and licensing. Doctoral dissertation, MIT.
Hoji, H. 1985. Logical Form constraints and configurational structures in Japa-
nese. Doctoral dissertation, University of Washington.
Hoji, H. 1986. Scope interpretation in Japanese and its theoretical implications. In
M. Dalrymple, J. Goldberg, K. Hanson, M. Inman, C. Pinon, and S. Wechsler,
eds., Proceedings of the 5th West Coast Conference on Formal Linguistics, 87–101.
Stanford, Calif.: CSLI Publications.
Holmberg, A. 1985. Word order and syntactic features. Doctoral dissertation,
University of Stockholm.
Huang, C.-T. J. 1982. Logical relations in Chinese and the theory of grammar.
Doctoral dissertation, MIT.
Kaplan, R., and J. Bresnan 1982. Lexical-Functional Grammar: A formal system
for grammatical representation. In J. Bresnan, ed., The mental representation of
grammatical relations, 173–281. Cambridge, Mass.: MIT Press.
Kayne, R. 1975. French syntax. Cambridge, Mass.: MIT Press.
Kayne, R. 1981. Two notes on the NIC. In A. Belletti, L. Brandi, and L. Rizzi,
eds., Theory of markedness in generative grammar, 317–346. Pisa: Scuola Normale
Superiore.
Kayne, R. 1994. The antisymmetry of syntax. Cambridge, Mass.: MIT Press.
Kenesei, I. 1994. The syntax of focus. Ms., University of Szeged.
Kenesei, I. 1998. Adjuncts and arguments in VP-focus. Acta Linguistica Hungar-
ica 45/1–2, 61–88.
É. Kiss, K. 1987. Configurationality in Hungarian. Dordrecht: Reidel.
É. Kiss, K. 1995. NP movement, operator movement, and scrambling in Hun-
garian. In K. É. Kiss, ed., Discourse configurational languages, 207–243. Oxford:
Oxford University Press.
Konapasky, A. 2002. A syntacto-morphological analysis of dependent heads in
Slavic. Doctoral dissertation, Princeton University.
Koopman, H., and A. Szabolcsi. 2000. Verbal complexes. Cambridge, Mass.:
MIT Press.
Koster, J. 1985. Reflexives in Dutch. In J. Gueron, H.-G. Obenauer, and J.-Y.
Pollock, eds., Grammatical representations, 141–167. Dordrecht: Foris.
Kuno, S., and J. Robinson. 1972. Multiple wh-questions. Linguistic Inquiry 3,
463–488.
Kuroda, S.-Y. 1970. Remarks on the notion of subject with reference to words
like ‘‘also,’’ ‘‘even,’’ or ‘‘only.’’ Annual Bulletin, vol. 3, 111–129; vol. 4, 127–152.
Tokyo: Research Institute of Logopedics and Phoniatrics.
Lakoff, G. 1972. On Generative Semantics. In D. Steinberg and L. Jakobovits,
eds., Semantics, 232–296. Cambridge: Cambridge University Press.
Landau, I. 1999. Elements of control. Doctoral dissertation, MIT.
Lasnik, H. 1999. Minimalist analysis. Oxford: Blackwell.
Lavine, J. 1997. Null expletives and the EPP in Slavic. Ms., Princeton University.
Lavine, J. 2000. Topics in the syntax of non-agreeing predicates in Slavic. Doc-
toral dissertation, Princeton University.
Mahajan, A. 1989. The A/A′ distinction and movement theory. Doctoral disser-
tation, MIT.
Marantz, A. 1984. Grammatical relations. Cambridge, Mass.: MIT Press.
Matthei, E. 1979. The acquisition of prenominal modifier sequences: Stalking the
second green ball. Doctoral dissertation, University of Massachusetts, Amherst.
Moltmann, F. 1990. Scrambling in German and the specificity e¤ect. Ms., MIT.
Moortgat, M. 1988. Categorial investigations. Doctoral dissertation, University
of Amsterdam.
Muller, G. 1995. A-bar syntax: A study in movement types. Berlin: Mouton de
Gruyter.
Neeleman, A. 1994. Complex predicates. Doctoral dissertation, Utrecht
University.
Noyer, R. 1992. Features, positions, and affixes in autonomous morphological
structure. Doctoral dissertation, MIT.
Partee, B. 1971. On the requirement that transformations preserve meaning. In
C. Fillmore and D. T. Langendoen, eds., Studies in linguistic semantics, 1–21.
New York: Holt, Rinehart and Winston.
Pesetsky, D. 1987. Wh-in-situ: Movement and unselective binding. In E. Reuland
and A. ter Meulen, eds., The representation of (in)definiteness, 98–129. Cam-
bridge, Mass.: MIT Press.
Pica, P. 1991. On the interaction between antecedent-government and binding:
The case of long-distance reflexivization. In J. Koster and E. Reuland, eds., Long-
distance anaphora, 119–135. Cambridge: Cambridge University Press.
Pinker, S. 1984. Language learnability and language development. Cambridge,
Mass.: Harvard University Press.
Pollock, J.-Y. 1989. Verb movement, Universal Grammar, and the structure of
IP. Linguistic Inquiry 20, 365–424.
Postal, P. 1974. On raising: One rule of English grammar and its theoretical impli-
cations. Cambridge, Mass.: MIT Press.
Prinzhorn, M. 1998. Prosodic and syntactic structure. Ms., University of Vienna.
Reinhart, T., and E. Reuland. 1993. Reflexivity. Linguistic Inquiry 24, 657–720.
Richards, N. 1997. What moves where when in which language? Doctoral disser-
tation, MIT.
Riemsdijk, H. van. 1996. Adverbia en bepaaldheid. Ms., University of Tilburg.
Riemsdijk, H. van, and E. Williams. 1981. NP Structure. The Linguistic Review 1,
171–217.
Rivero, M.-L. 1991. Long head movement and negation: Serbo-Croatian vs.
Slovak and Czech. The Linguistic Review 8, 319–351.
Rizzi, L. 1982. Violations of the Wh-Island Constraint and the Subjacency Con-
dition. In Issues in Italian syntax, 49–76. Dordrecht: Kluwer.
Rizzi, L. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press.
Roeper, T., and M. Siegel. 1978. A lexical transformation for verbal compounds.
Linguistic Inquiry 9, 199–260.
Ross, J. R. 1970. On declarative sentences. In R. A. Jacobs and P. S. Rosenbaum,
eds., Readings in English transformational grammar, 222–272. Waltham, Mass.:
Ginn.
Rudin, C. 1988. On multiple questions and multiple wh-fronting. Natural
Language and Linguistic Theory 6, 445–501.
Saito, M. 1991. Long distance scrambling in Japanese. Ms., University of Con-
necticut, Storrs.
Saito, M. 1992. Long distance scrambling in Japanese. Journal of East Asian
Linguistics 1, 69–118.
Saito, M. 1994. Improper adjunction. In M. Koizumi and H. Ura, eds., Formal
Approaches to Japanese Linguistics 1, 263–293. MIT Working Papers in Linguis-
tics 24. Cambridge, Mass.: MIT, Department of Linguistics and Philosophy,
MITWPL.
Samek-Lodovici, V. 1996. Constraints on subjects: An optimality-theoretic anal-
ysis. Doctoral dissertation, Rutgers University.
Santorini, B. 1990. Long distance scrambling and anaphora binding. Ms., Uni-
versity of Pennsylvania.
Selkirk, E. 1982. The syntax of words. Cambridge, Mass.: MIT Press.
Steedman, M. 1996. Surface structure and interpretation. Cambridge, Mass.: MIT
Press.
Szabolcsi, A. 1996. Verb and particle movement in Hungarian. Ms., UCLA.
Szendroi, K. 2001. Focus and the syntax-phonology interface. Doctoral disserta-
tion, University of Southern California.
Timberlake, A. 1979. Reflexivization and the cycle in Russian. Linguistic Inquiry
10, 109–141.
Travis, L. 1984. Parameters and e¤ects of word order variation. Doctoral disser-
tation, MIT.
Ueyama, A. 1998. Two types of dependency. Doctoral dissertation, University of
Southern California.
Vanden Wyngaerd, G. 1989. Object shift as an A-movement rule. In P. Branigan,
J. Gaulding, M. Kubo, and K. Murasugi, eds., Student Conference in Linguistics
1989, 256–271. MIT Working Papers in Linguistics 11. Cambridge, Mass.: MIT,
Department of Linguistics and Philosophy, MITWPL.
Webelhuth, G. 1989. Syntactic saturation phenomena and the modern Germanic
languages. Doctoral dissertation, University of Massachusetts, Amherst.
Wilder, C. 1997. Some properties of ellipsis in coordination. In A. Alexiadou and
T. H. Hall, eds., Studies on Universal Grammar and typological variation, 59–107.
Amsterdam: John Benjamins.
Williams, E. 1971a. Small clauses in English. Ms., MIT.
Williams, E. 1971b. Underlying tone in Margi and Igbo. Ms., MIT. [Published
1976, Linguistic Inquiry 7, 463–484.]
Williams, E. 1974. Rule ordering in syntax. Doctoral dissertation, MIT.
Williams, E. 1977. Discourse and Logical Form. Linguistic Inquiry 8, 101–139.
Williams, E. 1980. Predication. Linguistic Inquiry 11, 203–238.
Williams, E. 1981a. Argument structure and morphology. The Linguistic Review
1, 81–114.
Williams, E. 1981b. Language acquisition, markedness, and phrase structure. In
S. Tavakolian, ed., Language acquisition and linguistic theory, 8–34. Cambridge,
Mass.: MIT Press.
Williams, E. 1981c. On the notions ‘‘lexically related’’ and ‘‘head of a word.’’
Linguistic Inquiry 12, 245–274.
Williams, E. 1986. A reassignment of the functions of LF. Linguistic Inquiry 17,
265–299.
Williams, E. 1987. Implicit arguments, the binding theory, and control. Natural
Language and Linguistic Theory 5, 151–180.
Williams, E. 1991. ‘‘Why crossover?’’ Handout, colloquium presentation, MIT.
Williams, E. 1994a. Negation in English and French. In D. Lightfoot, ed., Verb
movement, 189–206. Cambridge, Mass.: MIT Press.
Williams, E. 1994b. Thematic structure in syntax. Cambridge, Mass.: MIT Press.
Williams, E. 1997. Blocking and anaphora. Linguistic Inquiry 28, 577–628.
Williams, E. 1998a. The asymmetry of predication. In R. Blight, ed., Texas Lin-
guistic Forum 38, 323–333. Austin: University of Texas, Texas Linguistic Forum.
Williams, E. 1998b. Economy as shape conservation. In Celebration: An electronic
festschrift in honor of Noam Chomsky’s 70th birthday. http://addendum.mit.edu/
celebration.
Williams, E. In preparation. The structure of clusters. Ms., Rutgers University.
[To be presented at NIAS/Collegium Budapest Cluster Study Group.]
Wiltschko, M. 1997. D-linking, scrambling and superiority in German. Groninger
Arbeiten zur germanistischen Linguistik 41, 107–142.
Wurmbrand, S. 1998. Infinitives. Doctoral dissertation, MIT.
Yatsushiro, K. 1996. On the unaccusative construction in nominative Case
licensing. Ms., University of Connecticut, Storrs.
Yip, M., J. Maling, and R. Jackendoff. 1987. Case in tiers. Language 63, 217–
250.
Zubizarreta, M. L. 1998. Prosody, focus, and word order. Cambridge, Mass.: MIT
Press.
Zwart, C. J.-W. 1997. Morphosyntax of verb movement: A minimalist approach to
the syntax of Dutch. Dordrecht: Kluwer.
Index
A/A′ distinction, 72, 118–121, 171
  relativization of, 96, 121, 130–133
Ablative absolute, 192–193
Accent Structure (AS), 243, 251–261
Adjective order, 153–154
Adjuncts and X-bar, 61–62
Adverb positioning, 44
Anaphora, 95–116
Antisymmetry, 19–21
Arabic inflection, 217–218
Assume Lowest Energy State, 163
Benedicto, E., 98–99
Binding, 120
Blocking in semantics, 10, 246–249
Bracketing paradoxes, 5–8
Bridge verbs, 69
Bulgarian, 145–146, 154–157, 168
Burzio, L., 112–113
Case structure, 13
Case-preposition duality, 188–194
CAT, 203–238
Causativization, 66–67
Checking Theory, 29, 35–36
Cinque, G.
  restructuring verbs, 90–91
  functional structure, 201–202
Complement-of relation, 179
Complementizer agreement, 196
Compositionality, 240–246
Contraction, 163–164
Control, 85–86, 269–270
  obligatory/optional distinction, 87–88
Copular inversion, 254–256
Countercyclic derivation, 70–71
CS embedding, 67–69
Czech verb clusters, 237–238
D-linking, 41, 144, 148–149
Disanaphora Principle, 250
Dutch reflexive, 101–103
Dutch verb clusters, 224–229
ECM as CS embedding, 15, 67–68, 105
Ellipsis, 261–266
Embedding, 25
  functional vs. complement, 59, 174–176, 199–201
English auxiliary system, 222–224
EPP subjects, 83–85
Equidistance, 16
Ergative case, 110–111
Excorporation, 187–188
Expletives, 68, 92
Extension, 73, 114
Flip, 206–211
Focus, 34
  IFocus vs. LFocus, 249–261
  normal and contrastive, 32, 249
Focus ellipsis, 261
Focus Structure (FS), 30–33
FS embedding, 60–70
Functional structure, 173
Gapping, 193–194, 264–266
General Ban on Improper Movement (GBOIM), 72
General Condition on Scope, 22
Georgian inflection, 217–218
German
  restructuring verbs, 89–91
  scrambling, 39–44, 119, 122–124, 126–129
  V2 vs. V-final, 78–79
  WCO, 143–145
Haegeman, L., and H. van Riemsdijk (1986), 224–229
Head-complement relation, 11–12
Head Movement Constraint, 171
Heavy NP Shift, 33–38
Holmberg's generalization, 17–19
Hungarian
  focus, 36, 259–260
  scope, 45–50
  scrambling, 160–165
  verb clusters, 229–237
IFocus, 249–261
Improper movement, 71–75
Induced representation, 244
IPresupposition, 252–253
Japanese
  long vs. short scrambling, 157–161
  reflexive, 97
  scope and scrambling, 124–126
Konopasky, A., 148–152
Koopman, H., and A. Szabolcsi (2000), 229–237
L-tous, 79–80
Latin reflexive, 98–99
Level Blocking Principle, 95, 102
Level Embedding Conjecture, 63–65
Lexical-Functional Grammar, 22, 38
Lexical Variation Hypothesis (LVH), 212, 219
Lexicalism, 172–173, 202
LFocus, 249–261
Locality, 164–167
Long topicalization, 130–132
LPresupposition, 252–253
LRT correlations, 59, 117–135
Mirror Principle, 15, 178, 199–203
Mohawk inflection, 219–220
Movement, 26, 62
Multiple exponence, 194–196, 216
Multiple-WH movement, 145–147
Navajo inflection, 221–222
Nominative case, 109
NP structure model, 118, 127
Optimality theory, 22, 38
Pāṇini's principle, 7, 10
Parallel movement, 139–141
Predicate Structure (PS), 86–87, 106–112
QS, 30–33
Quantifier interpretation and reconstruction, 42–43, 271–273
Quantifier scope, 42
Quirky case, 81–83, 110–112
Reassociation, 188–194, 206–211
Reconstruction, 117–135
  and quantifier interpretation, 271–273
Reinhart, T., and E. Reuland (1993), 99–101, 104–106, 108
Relativized Minimality, 185–188
Remnant movement, 19–21, 133–135
Representation, 13–14
  asymmetry of, 60
  as homomorphism, 61
  model for, 23–24
Richards, N., 140, 149–150, 158–159, 166
Right node raising (RNR), 37
Rule of Combination (RC), 204–205
Russian subject position, 52–55, 83–85
Scrambling, 39–44, 117–135, 157–165
  long vs. short, 157–161
  masked, 161
Selection, 183
Self-, 100
Semantic compositionality, 242–246
Semantic interpretation, 25
Serbo-Croatian, 151–152
  verb clusters, 187–188, 237–238
Serial verbs, 65–66
Shadow, 175–177
Shape Conservation, 5, 7–8, 15–23, 239–242, 246
Small clause theory, 63
Southern Tiwa inflection, 219–220
Spanish focus, 50–52, 256–259
Spanning vocabulary, 214–215
Split antecedents, 267–269
SS embedding, 69–70
Subcategorization, 203–204
Subject auxiliary inversion, 191–192
Subjects
  EPP, 83–85
  quirky, 81–83
  and scrambling, 126–129
Superiority, 140–145, 158–159
Superraising, 77–78
Swahili inflection, 220–221
  object agreement, 248–249
Swiss German verb clusters, 224–229
Synonymy Principle, 247
Synthetic compounds, 9–13
Target, 95–96
Theta Structure, 13
Topic, 30–32
Tough movement, 75–77
TS Embedding, 65–67
Ueyama, A., 124–126
V2, 78–79, 191–192
Verb projection raising, 224–229
Verb-particle construction, 233
Verbal modifier (Hungarian), 229–237
VP ellipsis, 262–264
Weak Crossover, 141–145, 154
West Flemish verb clusters, 224–229
Wiltschko, M., 143–145
Wurmbrand, S. (restructuring verbs), 89–90
X-bar theory, 175–185
Yip, M., J. Maling, and R. Jackendoff (1987), 109–112
Yuman (Lakhota, Alabama) inflection, 222
Current Studies in Linguistics
Samuel Jay Keyser, general editor
1. A Reader on the Sanskrit Grammarians
J. F. Staal, editor
2. Semantic Interpretation in Generative Grammar
Ray Jackendoff
3. The Structure of the Japanese Language
Susumu Kuno
4. Speech Sounds and Features
Gunnar Fant
5. On Raising: One Rule of English Grammar and Its Theoretical Implications
Paul M. Postal
6. French Syntax: The Transformational Cycle
Richard S. Kayne
7. Pāṇini as a Variationist
Paul Kiparsky, S. D. Joshi, editor
8. Semantics and Cognition
Ray Jackendoff
9. Modularity in Syntax: A Study of Japanese and English
Ann Kathleen Farmer
10. Phonology and Syntax: The Relation between Sound and Structure
Elisabeth O. Selkirk
11. The Grammatical Basis of Linguistic Performance: Language Use and
Acquisition
Robert C. Berwick and Amy S. Weinberg
12. Introduction to the Theory of Grammar
Henk van Riemsdijk and Edwin Williams
13. Word and Sentence Prosody in Serbocroatian
Ilse Lehiste and Pavle Ivic
14. The Representation of (In)definiteness
Eric J. Reuland and Alice G. B. ter Meulen, editors
15. An Essay on Stress
Morris Halle and Jean-Roger Vergnaud
16. Language and Problems of Knowledge: The Managua Lectures
Noam Chomsky
17. A Course in GB Syntax: Lectures on Binding and Empty Categories
Howard Lasnik and Juan Uriagereka
18. Semantic Structures
Ray Jackendoff
19. Events in the Semantics of English: A Study in Subatomic Semantics
Terence Parsons
20. Principles and Parameters in Comparative Grammar
Robert Freidin, editor
21. Foundations of Generative Syntax
Robert Freidin
22. Move α: Conditions on Its Application and Output
Howard Lasnik and Mamoru Saito
23. Plurals and Events
Barry Schein
24. The View from Building 20: Essays in Linguistics in Honor of Sylvain
Bromberger
Kenneth Hale and Samuel Jay Keyser, editors
25. Grounded Phonology
Diana Archangeli and Douglas Pulleyblank
26. The Magic of a Common Language: Jakobson, Mathesius, Trubetzkoy, and
the Prague Linguistic Circle
Jindrich Toman
27. Zero Syntax: Experiencers and Cascades
David Pesetsky
28. The Minimalist Program
Noam Chomsky
29. Three Investigations of Extraction
Paul M. Postal
30. Acoustic Phonetics
Kenneth N. Stevens
31. Principle B, VP Ellipsis, and Interpretation in Child Grammar
Rosalind Thornton and Kenneth Wexler
32. Working Minimalism
Samuel Epstein and Norbert Hornstein, editors
33. Syntactic Structures Revisited: Contemporary Lectures on Classic Trans-
formational Theory
Howard Lasnik with Marcela Depiante and Arthur Stepanov
34. Verbal Complexes
Hilda Koopman and Anna Szabolcsi
35. Parasitic Gaps
Peter W. Culicover and Paul M. Postal, editors
36. Ken Hale: A Life in Language
Michael Kenstowicz, editor
37. Flexibility Principles in Boolean Semantics: The Interpretation of Coordina-
tion, Plurality, and Scope in Natural Language
Yoad Winter
38. Phrase Structure Composition and Syntactic Dependencies
Robert Frank
39. Representation Theory
Edwin Williams