Representation Theory
Representation Theory Edwin Williams
The MIT Press
Cambridge, Massachusetts
London, England
© 2003 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any
electronic or mechanical means (including photocopying, recording, or informa-
tion storage and retrieval) without permission in writing from the publisher.
This book was set in Times New Roman on 3B2 by Asco Typesetters, Hong
Kong, and was printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Williams, Edwin.
Representation theory / by Edwin Williams.
p. cm. — (Current studies in linguistics)
Includes bibliographical references and index.
ISBN 0-262-23225-1 (hc. : alk. paper) — ISBN 0-262-73150-9 (pbk. : alk. paper)
1. Grammar, Comparative and general—Syntax. I. Title. II. Current studies in
linguistics series
P291 .W54 2002
415—dc21 2002071774
To Daddy
Contents

Preface
Introduction: Architecture for a New Economy
Chapter 1: Economy as Shape Conservation
Chapter 2: Topic and Focus in Representation Theory
Chapter 3: Embedding
Chapter 4: Anaphora
Chapter 5: A/Ā/Ā̄/Ā̄̄
Chapter 6: Superiority and Movement
Chapter 7: X-Bar Theory and Clause Structure
Chapter 8: Inflectional Morphology
Chapter 9: Semantics in Representation Theory
References
Index
Preface
In 1971 I wrote the two required qualifying papers for Ph.D. dissertation
work in linguistics. One was about ‘‘small clauses’’—the notion that
clause structure has several layers, that syntactic operations are associated
with particular layers, and that each layer can be embedded directly,
without the mediation of the higher layers. The other proposed that tones
in tonal languages compose structures that are independent of segmental
or syllabic structure and that a certain kind of mapping holds between the
tonal and segmental representations. I guess these were the two best ideas
I’ve ever had. After thirty years of trying to bring something better to
light, I have given up and have determined instead that my further con-
tribution will be to combine them—if not into one idea, then at least into
one model of the linguistic system. That is what I try to do in this book.
The two ideas take the following guise: (1) syntactic economy is actually
shape conservation (here I return to the idea from tonal systems that
grammar involves not one complex representation, but two simple ones
put into a simple relation to one another), and (2) different clausal types
can be embedded at different levels (the Level Embedding Conjecture—
an implementation of the ‘‘small clause’’ idea).
In fact, though, when I put those two ideas together, a third popped
out that isn’t inherent in either. It’s this third idea that is responsible for
the sharpest new predictions in the book: the generalization of the A/Ā
system to A/Ā/Ā̄/Ā̄̄ . . . , which may be viewed as an n-ary generalization
of the binary structure of the NP Structure model proposed in Van
Riemsdijk and Williams 1981. So this book also brings forward a strand
of my collaboration with longtime friend Henk van Riemsdijk.
Most of the ideas in this book have been presented in series of four to
five lectures or in one-week summer school courses: in Lesbos (1999),
Plovdiv (1999), and Vienna (1996–2001), and at UCLA (1997),
University of British Columbia (1998), LOT (2001), and University College
London (2002). Other parts were presented in multiple meetings of the
Dutch/Hungarian verb movement study group in Wassenaar, Pécs,
Budapest, and Öttevény, in the years 1997–2001. I have benefited particularly
from the extended contact with an audience that such series afford.
I have received encouragement in developing this book from Peter
Ackema, Misi Brody, Memo Cinque, Christine Czinglar, Henry Davis,
Rose-Marie Déchaine, Marcel den Dikken, Hans-Martin Gärtner, Jane
Grimshaw, Yosef Grodzinsky, Catherine Hanson, Steve Hanson, Marika
Lekakou, Mark Liberman, Ad Neeleman, Andrew Nevins, Øystein Nilsen,
Jean-Yves Pollock, Martin Prinzhorn, Henk van Riemsdijk, Dominique
Sportiche, Tim Stowell, Peter Svenonius, Anna Szabolcsi, Kriszta
Szendrői, and Martina Wiltschko.
I do heartily thank Anne Mark for applying the Jaws of Life to the
car wreck of a manuscript she got, and I won’t let her edit just this one
sentence so the reader may understand exactly what there is to be
grateful to her for and why.
Introduction
Architecture for a New Economy
Opus ultra vires nostras agere praesumsimus. [‘We have presumed to undertake a work beyond our powers.’]
The work reported here brings to light two main findings: first, when
syntax is economical, what it economizes on is shape distortion rather
than distance; and second, this new notion of economy calls for a new
architecture for the grammatical system, and in fact a new notion of
derivation.
For example, the theta structure on the left in (1) has the same shape as
the Case structure on the right.
(1) [agent [predicate theme]] ↝ [nominative [Case-assigner accusative]]
The two structures are isomorphic in an obvious sense. I will speak in
this book of one structure as representing another structure if it is iso-
morphic to it, and I will use the wavy arrow to symbolize this type of
representation.
Sometimes one structure will be said to represent another even if not
isomorphic to it, so long as it is nearly isomorphic to it, and nothing else
is closer to it. It is in this sense that syntax economizes on, or tries to
minimize, shape distortion. I will present evidence that this gives a better
account of economy than distance minimization principles like Shortest
Move. The issue can become subtle, as each theory can be made to mimic
the other; in fact, I will argue that some of the uses of distance minimi-
zation economy in the minimalist literature are transparent contrivances
to achieve shape conservation with jury-rigged definitions of distance.
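The claim that syntax economizes on shape distortion rather than distance can be made concrete with a toy computation. The sketch below is purely illustrative (the nested-tuple encoding and the function name are my own, not part of the theory): it treats structures like those in (1) as trees and checks whether two trees are isomorphic, i.e., have the same branching shape regardless of labels.

```python
# Toy illustration (not from the book): a structure is a leaf (a label)
# or a tuple of substructures. Two structures count as isomorphic when
# their branching patterns match, whatever their labels.

def isomorphic(s1, s2):
    """True if s1 and s2 have the same tree shape."""
    leaf1, leaf2 = isinstance(s1, str), isinstance(s2, str)
    if leaf1 or leaf2:
        return leaf1 and leaf2          # two leaves match; leaf vs. node fails
    return (len(s1) == len(s2)
            and all(isomorphic(a, b) for a, b in zip(s1, s2)))

# (1): the theta structure and the Case structure are isomorphic.
theta = ("agent", ("predicate", "theme"))
case = ("nominative", ("Case-assigner", "accusative"))
print(isomorphic(theta, case))          # True
```

On this view, a wavy arrow between two structures is licensed, in the simplest case, exactly when a check like this succeeds.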
The need for a new architecture should be evident from (1). In order to
say that a theta structure is isomorphic to a Case structure, we need to
have the two structures in the first place. The two structures in (1) have
no standing in standard minimalist practice: there is no theta structure
that exists independent of Case structure; rather, Case and theta are two
parts, or de facto ‘‘regions,’’ of a single structural representation of a
clause, the notion of clause that began with Pollock 1989 and has been
elaborated in functional structure sequencing labs around the world. The
model in which a principle of shape conservation will fit most naturally is
one in which the several different aspects of clausal structure are charac-
terized as separate ‘‘sublanguages’’ (to anticipate: Theta Structure (TS),
Case Structure (CS), Surface Structure (SS), Quantification Structure
(QS), Focus Structure (FS)). Then the syntax of a sentence will be a col-
lection of structures, one (or more; see chapter 3) from each of these sub-
languages, and a set of shape-conserving mappings among them. In this
sense, then, a new economy (shape conservation) calls for a new archi-
tecture ({TS, CS, SS, QS, FS}).
The new architecture offers a new style of clausal embedding that has
no analogue in standard minimalist practice: the Level Embedding
Conjecture of chapter 3, a scheme that tightly fixes the relation among
locality, reconstructivity, and target type for syntactic relations in a way
that I think is not available in any other model of grammar (see below
for definitions of these terms). The new architecture requires, and in fact
automatically provides, a generalization of the A/Ā distinction to an
A/Ā/Ā̄/Ā̄̄ . . . distinction to derive these correlations.
I have called the theory Representation Theory to put the notion of
economy at the forefront: a Case structure ‘‘represents’’ a theta structure
it is paired with, and the essence of representation is isomorphism. So,
syntax is a series of representations of one sublanguage in another.
Chapter 1 develops some analyses in which shape conservation is
manifestly implicated in the domains of lexical structure, compound
structure, bracketing paradoxes, and Case-theta relations. These serve as
a basis for framing a general theory according to which syntax consists
of the sublanguages mentioned earlier, with the representation relation
holding among them. The Mirror Principle is viewed not as a principle
but as an effect that arises automatically whenever two different
sublanguages each represent a third.
Chapter 2 applies the model to scrambling and its relation to
topicalization, scope, and focus, using the concept of shape conservation to
reanalyze these domains. Known properties are shown to follow from
conflicting representation requirements, and language differences are
analyzed as different choices in resolving such conflicts.
Chapter 3 defines different kinds of embedding for each of the
sublanguages. At the two extremes are clause union embedding in TS
and the isolating, nonbridge verb embedding in FS; intermediate-sized
clauses are embedded in the intervening levels. The Level Embedding
Conjecture (LEC) says that the different clause types are not all embedded
at the same level; rather, each type is embedded at the level at which it is
defined. This leads to derivations quite different from those generated in
other known models of syntax. A generalized version of the Ban on
Improper Movement follows from this architecture.
Chapters 4–6 explore consequences of the LEC proposed in chapter 3.
Three characteristics of a rule are its locality (its range), its reconstruc-
tivity (for a movement (or scrambling) rule, which relations are computed
on its input and which on its output), and its target (the type of element—
A-position, Ā-position, Ā̄-position, and so on—it targets). RT with the
LEC automatically fixes the connections among them, or correlates them
(I will thus refer to LRT correlations), enabling us to answer questions
like ‘‘Why does long-distance scrambling reconstruct for binding theory,
but not short-distance scrambling?’’ and generalized versions of such
questions.
Chapter 4 defines di¤erent kinds of anaphors for each sublanguage;
tight ‘‘coargument’’ anaphors are defined at TS, and long-distance ana-
phors at SS. The theory draws a connection between the locality of an
anaphor and the type of antecedent it can have, where the types are
‘‘coargument, A position, Ā position, . . . ,’’ in line with the LRT correla-
tions of chapter 3.
Chapter 5 develops the empirical consequences of the generalized no-
tion of the A/Ā relation that flows from the LEC, and the resulting
generalized notion of reconstruction. Essentially, every pair of sublanguages
in a representation relation can be said to give rise to a different A/Ā
distinction and a different level of reconstruction.
Chapter 6 draws a distinction between movement, as classically under-
stood, and misrepresentation, as defined here. Under special circum-
stances an element might seem to have moved because it occurs in a
structure that stands in a representation relation with another structure
it is not strictly isomorphic to. I argue that classical movement does not
reduce to misrepresentation, and in fact both are needed. Classical wh
movement, for example, is a part of the definition of the sublanguage SS
and does not arise through misrepresentation. In particular, I argue that
parallelism effects observed in multiple wh movements are not the same
kind of thing as the parallelisms that motivate shape conservation and
that they appear to be so only for the simplest cases.
Chapters 7 and 8 develop the RT account of phrase structure and head
movement. Chapter 7 develops an account of X-bar theory in which a
lexical item directly ‘‘lexicalizes’’ a subsequence of functional structure; it
then defines the notion of X-bar category in syntax implied by this notion
of what a lexical item does. It is a consequence of the A/Ā generalization
of previous chapters that Relativized Minimality must be a misgeneral-
ization, in that it attempts to subsume head movement under a general
theory of movement. Chapter 7 argues that head movement is not move-
ment, but part of the X-bar calculus, and its locality follows from the
laws of X-bar theory, not movement theory.
Chapter 8 explains how such lexicalized subsequences can be spelled
out morphologically. Representation is argued to directly derive the Mir-
ror Principle, with a strict separation of syntax and morphological spell-
out. A model of inflectional morphology is developed, a combinatorial
system called CAT, which predicts a precise range of possible realizations
of a set of universal inflectional elements; those possible realizations
are compared with known facts. The mechanism is also applied to verb
cluster systems and is proposed to be the underlying syntax of the recur-
sive argument-of relation, wherever it occurs.
Chapter 9 develops some preliminary ideas about how semantics must
be done differently in RT. Semantic compositionality must be rethought
in view of the syntactic architecture; it becomes less likely that there can
be a single representation of linguistically determined meaning. Chapter 9
also elaborates the notion of semantic value defined at each level, and it
seeks to explicate the differences among types of anaphora (pronominal
anaphora, ellipsis, anaphoric destressing) in terms of these different kinds
of value.
Chapter 1
Economy as Shape Conservation
I begin by exploring a problem with the usual solutions to bracketing
paradoxes. The solution to this problem leads to a new principle of
economy, Shape Conservation, which shows itself capable of replacing
the more familiar economy principles. I fashion a new theoretical archi-
tecture to maximize the empirical scope of the principle.
Linguists have distinguished two types of economy, local and nonlocal,
to use Collins’s (1996) terminology—that is, economy that compares
derivations and economy that does not. Although research seems to have
moved away from nonlocal economy, the principle studied here is non-
local, and transderivational.
It is sometimes suggested that computational considerations weigh
against nonlocal economy, but I am personally willing to put such con-
siderations aside while I try to figure out what the overall organization of
the grammatical systems should be. The computation would seem to re-
duce to a metric of tree similarity, which the considerations presented in
this book delimit somewhat, but do not fully determine.
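A minimal sketch of what such a metric of tree similarity might look like, under my own assumptions (the cost function is invented for illustration; the book deliberately leaves the metric underdetermined): mismatched branching is penalized, so an isomorphic pair scores 0 and competing representations can be ranked by distortion.

```python
# Illustrative distortion metric over nested-tuple trees (a toy
# definition of my own; the book does not commit to a particular metric).

def size(s):
    """Number of leaves in a structure."""
    return 1 if isinstance(s, str) else sum(size(c) for c in s)

def distortion(s1, s2):
    """0 for isomorphic structures; larger as shapes diverge."""
    leaf1, leaf2 = isinstance(s1, str), isinstance(s2, str)
    if leaf1 and leaf2:
        return 0
    if leaf1 or leaf2:
        return size(s2 if leaf1 else s1)    # leaf matched against a node
    extra = abs(len(s1) - len(s2))          # unmatched daughters
    return extra + sum(distortion(a, b) for a, b in zip(s1, s2))

# A shape-conserving pair is cost-free; a flattened structure is penalized.
print(distortion(("a", ("b", "c")), ("x", ("y", "z"))))  # 0
print(distortion(("a", ("b", "c")), ("x", "y", "z")))    # 3
```

A nonisomorphic match is then licensed, on the view developed here, only when no candidate with a lower score is available.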
1.1 Bracketing Paradoxes
First, the problem with bracketing paradoxes. A bracketing paradox is a
situation in which two different considerations about the structure of a
form lead to different conclusions. Usually, but not always, one consid-
eration is syntactic and the other semantic. Bracketing paradoxes are
generally dispelled by some kind of syntactic restructuring operation. My
first point is that any such restructuring must be inhibited by the existence
of other derivations and forms, and that the relation of the restructuring
to these other derivations and forms is ‘‘economical.’’
The phrase beautiful dancer is a typical example of a bracketing para-
dox. The phrase is famously ambiguous, having the meanings ‘beautiful
one who dances’ and ‘person who dances beautifully’. The two meanings
can be represented as in (1a). The first reading is easy to get, but the sec-
ond is a bracketing paradox. (& and ~& stand for ambiguous and non-
ambiguous, respectively.)
(1) a. &a beautiful [dance -er]
i. beautiful [-er [dance]]
ii. [-er [beautiful dance]]
b. ~&a beautiful [person who dances]
c. a person who [dances beautifully]
d. *a [beautiful dance] -er
If (1ai) is the logical structure of (1a), and if modification is restricted
to sisters, then (1a) should have only the meaning ‘beautiful one who
dances’, because beautiful is sister to an expression headed by -er, which
means ‘one who’. But this leaves no room for (1aii), because that mean-
ing, ‘one who dances beautifully’, has a logical structure that is athwart
the structure of (1a).
Now, we could write a restructuring rule that would relate (1aii) to
(1a), thus making it ambiguous, on the assumption that the relation of
(1a) to (1ai) is transparent and will arise in any case. But a problem then
arises for (1b). If we write the restructuring rule for (1a) in a completely
general form, it will most likely apply to (1b) as well, making it ambigu-
ous too; but it is not. Then why is it not? The idea I would like to explore,
or exploit, is that (1b) is not ambiguous because the ‘‘paradoxical’’
branch of the ambiguity is already covered by a di¤erent form, namely,
(1c), and (1c) fits the meaning better, that is, more transparently. In other
words, (1c) blocks (1b) from having its nontransparent meaning. By con-
trast, the right (transparent) form for the other, nontransparent meaning
of (1a) is (1aii), which cannot be generated. So what we have here is a
principle that says, ‘‘Use the right form, unless there isn’t one. If there
isn’t one, it’s OK to use a form that doesn’t match the meaning.’’
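The principle ‘‘Use the right form, unless there isn’t one’’ can be sketched as a competition among candidate forms. The sketch below is my own illustration of (1), not machinery the book proposes: each meaning is paired with its transparent (‘‘right’’) form, and some other form may carry that meaning only when the right form is not generable.

```python
# Toy blocking computation for (1) (illustrative encodings, not the
# book's). A form carries a meaning if it IS that meaning's transparent
# form, or if the transparent form is ungrammatical, as (1d) is.

TRANSPARENT = {
    "beautiful one who dances": "beautiful [dance-er]",
    "one who dances beautifully": "[beautiful dance]-er",         # = (1d)
    "beautiful person who dances": "beautiful [person who dances]",
    "person who dances beautifully": "a person who [dances beautifully]",
}
GENERABLE = lambda form: form != "[beautiful dance]-er"           # *(1d)

def readings(form, candidate_meanings):
    """Meanings a form can carry once blocking is taken into account."""
    return [m for m in candidate_meanings
            if TRANSPARENT[m] == form or not GENERABLE(TRANSPARENT[m])]

# (1a) is ambiguous: it also picks up the meaning orphaned by (1d).
print(len(readings("beautiful [dance-er]",
                   ["beautiful one who dances",
                    "one who dances beautifully"])))               # 2
# (1b) is not: (1c) already covers the paradoxical meaning transparently.
print(len(readings("beautiful [person who dances]",
                   ["beautiful person who dances",
                    "person who dances beautifully"])))            # 1
```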
Of course, the failure of (1b) to mean (1c) could be explained
differently; it could be taken to represent an island or locality condition on the
restructuring operation, for example. But further cases, such as com-
pounding, suggest that this view is wrong.
It is a stricture of form that gives rise to the gap. We expect then that if
there is no stricture of form, there will be no gap, and consequently no
bracketing paradox. The system of English compounds is striking in its
lack of bracketing paradoxes.
(2) a. kitchen [TOWEL rack]
b. [KITCHEN towel] rack
c. ‘‘[x [y z]] means [[x y] z] only if [x [y z]] is not generable’’
For example, we have the famous compounds in (2a) and (2b), which
have the meanings and accent patterns indicated. Each structure deter-
mines a different meaning and a different pronunciation; therefore, the
meanings and pronunciations and structures are in one-to-one corre-
spondence, and we can say that in each case the structure ‘‘mirrors’’ or
‘‘represents’’ the meaning perfectly. Importantly, (2a) is unambiguous
and cannot have the meaning that (2b) has. The question is, why isn’t the
restructuring mechanism for bracketing paradoxes, whatever it is, appli-
cable here? Why can’t the form in (2a) (and its predictable pronunciation)
be restructured so it is semantically interpreted as though it were like
(2b)? In light of the examples in (1), we may now attribute the lack of
ambiguity in (2a) to the existence of the form (2b) itself; the other mean-
ing that (2a) would have is the one that (2b) represents directly. The rea-
son there are no bracketing paradoxes in the compounding system is that
the ‘‘right’’ structure is always generable; this is expressed in (2c). And the
reason for that is that there are only the barest restrictions on the syntax
of N-N compounds—any pair of nouns can be concatenated, bar none. It
is only where there is some stricture of form, as in (1d), that bracketing
paradoxes can arise. The rest of the book develops this idea, exercised
across the whole of syntax, and the architecture that syntactic theory and
syntactic derivation must have in order for this account of bracketing
paradoxes to work.
That language should seek isomorphic matches between related struc-
tures, and accept nonisomorphic matches only when isomorphic matches
are missing, is really an application of Pāṇini’s principle, ‘‘Use the most
specific applicable form.’’ The isomorphic form is simply the most specific
applicable form, and distorted forms are available only when the
isomorphic form is not. Shape Conservation thus turns Pāṇini’s principle
into the economy condition governing syntax. (For further thoughts on
this, see Williams 1997, where it is shown that even economy-as-distance-
minimization can be construed as an application of Pāṇini’s principle.)
I will now expand the scope of this kind of treatment somewhat. Ad-
verbial modification also manifests bracketing paradoxes. As a sentence
adverb, probably must modify a tense, or some higher sentence operator.
Completely, on the other hand, must modify the VP. (3) shows how these
sort out: in (3a) and (3b) Tense and V are separate, so the two adverbs
occupy separate positions. But what happens when they are not separate,
as in (3c,d)?
(3) a. John probably was completely cleaning it out.
b. *John completely was probably cleaning it out.
c. John probably [clean + ed] it out. (probably [-ed [clean]])
d. John completely [clean + ed] it out. (??completely [-ed [clean]])
(3c) poses no problem: if Tense is the exterior element of V + ed, then
probably can be seen as modifying it directly. But then (3d) is a bracket-
ing paradox: past tense intervenes between the adverb completely and the
verb it is meant to be modifying. So, completing the analogy with the
previous examples, we can say that (4b) can have the meaning in (4a),
because (4a) itself is not generable.
(4) a. *[completely clean]V -ed
b. completely [clean -ed]V
This gives a different view of modification than we would expect to
have in the exploded-Infl clause structure proposed by Pollock (1989). In
the Pollock-style clause structure this particular bracketing paradox does
not arise. Tense is a separate element; both (3a) and (3b) have the struc-
ture in (5), and each adverb is adjoined to its proper modifiee. Then V
moves to Tense covertly.
(5) [probably T [completely [V NP]]]
A fully lexicalist account of inflection, where functional structure is not
part of clause structure directly but is rather part of the internal structure
of lexical items, will always involve us in these sorts of bracketing para-
doxes, and so the viability of the lexicalist account depends on gaining
some understanding of how bracketing paradoxes work. My first guess is
that bracketing paradoxes arise when the ‘‘best match’’ for a given struc-
ture is not available for some reason, so the ‘‘next best match’’ must be
used. In a lexicalist account of inflection, functional structure will be vis-
ible only at the ‘‘joints’’ between words, so any case in which an adverb
modifies an interior element will be a bracketing paradox. Chapters 7 and
8 pursue a lexicalist model of inflection in RT.
1.2 The Meaning of Synthetic Compounds
The notion of representation, as understood here, can also be applied
to the interpretation of compound structures. The problem to be solved
arises precisely because of the extreme productivity of compounding. Any
two nouns can be put together, and some meaning that connects them
can be concocted, the only inhibition being that the head, in the normal
case, must count as setting the ‘‘major dimension’’ in determining the
meaning of the compound; the nonhead then provides discrimination
within the major dimension. So my students have no trouble thinking of
endless lists of possible relations that could hold between the two ran-
domly selected nouns biography and bicycle of the following compounds:
(6) a. biography bicycle: a bicycle on which biographies are inscribed, a
bicycle on which manuscripts of biographies are messengered in a
large publishing house, etc.
b. bicycle biography: a biography written while touring on a bicycle,
the biography of a bicycle, etc.
Although there are quite narrow rules for pronouncing compounds, it
would seem we can be no more precise about how to determine their
meaning than to say, ‘‘Find some semantic relation that can hold between
the two elements.’’ This is the general understanding of what have been
called root compounds.
It has also been suggested that there is a substrain of compounds,
complement-taking deverbal nouns, that follows a more precise rule.
(7) a. book destroyer
b. church goer
c. *goer
If the root rule can compose new forms only out of existing forms, then
the nonexistence of (7c) is cited as evidence that (7b) cannot arise simply
by applying that rule; hence, a special rule for these synthetic compounds
is postulated (Roeper and Siegel 1978). The synthetic rule is a specific rule
that manipulates the thematic structure of a lexical item, adding an ele-
ment that satisfies one of its theta roles. For example, starting with the
verb go, which takes a goal argument, this rule adds church to it to satisfy
that goal argument in the compound structure.
One problem with positing two rules for English compounding, the
root rule and the synthetic rule, is that the outputs of the two rules are
suspiciously similar: both give rise to head-final structures, with identical
accent patterns. But a much greater problem, and the one I want to con-
centrate on, is that the output of the synthetic rule is completely swamped
by the output of the root rule.
Since the root rule says, ‘‘Find some relation R . . . ,’’ with no imagin-
able restrictions on what R can be, and since ‘‘x is (or ‘restricts the refer-
ence of ’) the thematic complement of y’’ is some such R, what is to stop
the root rule from deriving compounds just like the synthetic rule, making
the synthetic rule redundant?
We might begin by thinking of the connection between the two rules as
a ‘‘blocking’’ relationship (i.e., governed by Pāṇini’s rule): the specific
synthetic rule blocks the more general root rule, in order to prevent
the root rule from deriving synthetic compounds. I think the intuition
behind this idea is correct, but it raises a telling question that can only be
answered by bringing in the notion of representation in the sense devel-
oped here.
But the first thing to establish is that there really is a problem. Is there
anything to be lost by simply giving up the synthetic rule, leaving only the
root rule for interpreting compounds? There is at least this: the root rule
will not only derive all the good synthetic compounds, but also derive bad
ones. Consider a further fact about synthetic compounds, specifically
about nominalizations derived from ditransitive verbs like supply: the two
theta roles for supply have to be realized in a particular order with the
noun supplier.
(8) a. army gun supplier
b. *gun army supplier
Presumably (8a) is the only form generated by the specific synthetic rule,
but why can (8b) not then be generated by the root rule? The answer
cannot be ‘‘blocking,’’ because the synthetic rule cannot produce (8b),
and so the root rule will not be blocked for that case. Apparently army
supplier is a decent compound on its own, so the question reduces to this:
what is to stop the root rule from composing gun and army supplier as
shown in (9) (where R(x, y) is ‘‘y is (or ‘restricts the reference of’) the
theme argument of the head of x’’)?
(9) a. Syntax: gun + ‘‘army supplier’’ → gun army supplier
b. Semantics: R(army supplier, gun)
If such Rs are admitted, and I see no principled way to stop them, then
the root rule can derive anything, including ‘‘bad’’ synthetic compounds
—a real problem.
In fact, if any R is allowed, it is not even clear how to maintain the
special role of the head in compounds—the right R could effectively
reverse the role of head and nonhead.
(10) R(H, non-H) = (some) R′(non-H, H)
In other words, R says, ‘‘Interpret a compound as though the head were
the nonhead and the nonhead were the head.’’ This R defeats the very
notion of head, as it pertains to meaning.
To treat the second problem first: whatever the semantic content of the
notion ‘‘head’’ is, it relies on every relation R having an obvious choice
about which end of the relation is the ‘‘head’’ end. Semantically, the head
is the ‘‘major dimension’’ of referent discrimination. In the ordinary case
such as baby carriage the choice is obvious: a baby carriage is not a baby
at all, but a type of carriage, subtype baby. But the dvandva compounds
show how very slim the semantic contribution of headship can be.
(11) a. baby athlete
b. athlete baby
(12) a. athlete celebrity
b. celebrity athlete
In each of these the (a) and (b) examples have the same referents: ‘‘things
that are babies and athletes’’ or ‘‘things that are athletes and celebrities.’’
But in fact (11b) is somewhat strange, presumably because it implies that
babies come in types, one of which is ‘‘athlete,’’ even if it is not obvious
why this is less acceptable than the notion that athletes come in types, one
of which is ‘‘baby.’’
I think that these are both ‘‘representational’’ questions: what syntactic
structures ‘‘represent’’ various semantic structures, where one structure
represents another by mirroring its structure and parts.
We can turn the concept of head into a representational question, in
the following way:
(13) Suppose that
a. the head-complement relation is a syntactic relation [H C], and
b. R is any asymmetric semantic relation {A, B} between two
elements.
Then how is [H C] to be matched up with {A, B}?
The syntactic relation will best ‘‘represent’’ the semantic relation if its
asymmetry is matched by the asymmetry of {A, B}—but which identifi-
cation is the one that can be said to match the asymmetries? This question
can best be answered by first considering the question, what is the syn-
tactic asymmetry itself? I think the source of the syntactic asymmetry is
the syntactic definition of head, which I take to be the following, or at
least to have the following as an immediate consequence:
(14) [H C] is a (syntactic thing of type) H.
That is, syntactically, a unit composed of a head and its complement
([H C]) ‘‘is a thing of the same type as’’ the type of H itself. Phrasing the
matter this way, there can be no question which of the two items in (15b)
is ‘‘best represented’’ by the form in (15a), namely, (15bi).
(15) a. [baby athlete]
b. i. ‘‘baby athlete’’ is a thing of the same type as ‘‘athlete’’
ii. ‘‘baby athlete’’ is a thing of the same type as ‘‘baby’’
And likewise for ‘‘athlete baby.’’
Crucially, I am assuming that the representation must match the asym-
metry of syntax with some asymmetry in the structure it is representing,
as a part of the representation relation.
Now let us return to *gun army supplier. By the root derivation men-
tioned earlier, this form has (among others) a meaning in common with
army gun supplier, and the question is how to block that. To apply the
logic above, we must assume that there is a theta structure with the form
in (16a), but none with the form in (16b).
(16) a. [goal [theme supplier]]
b. *[theme [goal supplier]]
This is a fact about theta structures themselves, not how they are repre-
sented. Then we can say that this is best represented by a structure in
which the highest N is mapped to the goal, and the next highest N is
mapped to the theme, rather than the reverse.
(17) [tree diagram not reproduced: army gun supplier, with the higher nonhead army mapped to the goal of (16a) and the lower nonhead gun mapped to the theme]
The result is that R can be any imaginable relation; but for a given
representation relation, we must choose R so as to maximize isomor-
phism to the represented structure. This is why the root rule appears to be
constrained by the synthetic rule. A compound does not have to represent
a theta relation; but if it does, it must do so in the best possible way.
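The conclusion that a compound need not represent a theta relation, but that if it does it must do so in the best possible way, can be sketched as choosing the role-to-position pairing that minimizes distortion against the fixed theta structure (16a). The encoding below is my own illustration: the roles of supply and the positional mapping come from (8) and (16), but the scoring is invented for exposition.

```python
# (16a) fixes supply's roles: goal above theme. Positionally, the outer
# nonhead of '<outer> <inner> supplier' maps to goal, the inner to theme.
THETA = {"goal": "army", "theme": "gun"}   # who supplies guns to the army

def mismatches(compound):
    """Distortion of an (outer, inner) compound against (16a)."""
    outer, inner = compound
    return (outer != THETA["goal"]) + (inner != THETA["theme"])

candidates = [("army", "gun"), ("gun", "army")]
best = min(candidates, key=mismatches)
print(best)                          # ('army', 'gun'): 'army gun supplier'
print(mismatches(("gun", "army")))   # 2: '*gun army supplier' always loses
```

On this toy picture the crossed order is not excluded by a rule; it simply never wins the competition to represent (16a), which is the sense in which the root rule appears to be constrained by the synthetic rule.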
We have seen that there are two ways to think about this, in terms of
rules and in terms of representation. The account in terms of rules is
insufficient in an important way and can be remedied only by reference to
something like representation. Therefore, we may as well devote ourselves
to solving the problem of representation and in the end be able to forget
about the rules. It is tempting to think of the synthetic rule as blocking
the root rule, but this does not give a straightforward account of why (8b)
is ungrammatical, since the synthetic rule would not derive it anyway. In
order to prevent (8b) by rule blocking, we must have recourse to what
(8b) is trying to do, and then block it because (8a) does the same thing
better. But of course what it is trying to do is to represent (8a), only it
does it less well than another form. I don’t see any way around this.
1.3 Case ⇝ Theta Representations
I have spoken of one system ‘‘representing’’ another system. I have
chosen the word represent purposely to bring to mind the mathematical
sense of representation, which involves isomorphism. So, the set of theta
structures at the level of Theta Structure (TS) is one system, and the
stems and affixes of a language are another system, and we can speak of
how, and how well, one represents the other. For example, we have the
theta structure ‘‘complement-predicate,’’ and this structure is represented
by the stem-suffix structure provided by morphology. Of course, in this
case there is a natural isomorphism that relates the two.
(18) TS ⇜ Morphology
TS: Morphology:
{complement predicate} ⇜ [stem suffix]
(e.g., lique-fy)
In the case of TS we (as investigators) are lucky: there are two different
systems that represent TS. One is morphology, or word structure, and the
other is Case theory, or, as it will be called here, the level of Case Struc-
ture (CS): the system of Case assigners and their assignees, a part of
phrasal syntax. These representations are different: they reflect the differ-
ence between affix and XP, differences in the positioning of the head, and
other differences. But they are the same in their representation of theta
structures, so we can learn something about the representation relation by
comparing them. (Later in this chapter, and in more detail in chapter 7,
I will derive the Mirror Principle from this arrangement, and in chapter
4, some other consequences.)
Throughout this book the wavy arrow (⇜ or ⇝) will stand for the
representation relation. (19a) diagrams the arrangement under which
both morphology and phrasal syntax (specifically, ‘‘Case frames’’ in
phrasal syntax) represent theta relations.
(19) a. Morphology ⇝ TS ⇜ CS
b. {supply theme}
i. ⇜ [gun supplier]N
ii. ⇜ [supply guns]VP
c. {{supply theme} goal}
i. ⇜ [army [gun supplier]]N
ii. ⇜ [[supply guns] to an army]VP
d. {{advise theme} goal}
i. ⇜ [graduate student [course advisor]]N
ii. ⇜ *[course [graduate student advisor]]N
iii. ⇜ [[advise graduate students] about courses]VP
iv. ⇜ *[[advise courses] to graduate students]VP
e. advise: NP aboutP
By stipulated convention, the arrow points from the representing structure
to the represented structure.
(19b) illustrates the simple theta structure consisting of the predicate
supply and its theme complement; this relation can be represented by
either a compound (N) or a Case frame (VP), as shown. A more complex
theta structure, as in (19c), begets correspondingly more complex repre-
sentations. For (19c) the Case and morphological representations are dif-
ferent, but both are isomorphic to the theta structure, so long as linear
order is ignored. In other cases, however, the two representations diverge.
For example, if advise takes a theme and a goal, in that order, then
the compound seems to be isomorphic to the resulting structure (19di),
but the syntactic representation does not seem to be (19diii). And the
compound that would be isomorphic to the syntactic representation is
ungrammatical (19dii). How can this come about? We have already seen
why the compound (19dii) is ungrammatical: there is a better representa-
tion of the target theta structure. As for the syntactic representation,
suppose that the verb advise is stipulated to have the Case frame in (19e),
but not the one that would allow (19div). Then the theta structure in
(19d) will map, or mismap, onto (19diii), because that is the best avail-
able. Hence the divergence between the compound and the Case struc-
ture. (19diii) is a misrepresentation of (19d) (and so is a ‘‘bracketing
paradox’’), which arises from a perhaps arbitrary stricture in the repre-
senting system, the stipulated subcategorization of advise (19e).
The exceptional-Case-marking (ECM) construction is another obvious
example of a Case-theta misrepresentation. TS provides a representation
of the sort given in (20). Now suppose that CS provides the representa-
tion indicated, but nothing isomorphic to the theta structure. Then the
Case structure will misrepresent the theta structure. This account misses
an important fact about ECM—that it is a rare construction—but cap-
tures the essential features of the construction itself.
(20) ECM as a bracketing paradox in syntax
TS: [believe [Mary to be alive]]
CS: [[believe Mary] to be alive]
Throughout these examples the economy principle at work is this: ‘‘Use
the ‘most isomorphic’ structure that satisfies the strictures of the repre-
senting level.’’ If only we could specify in a general way what sets of
structures are taken to be in competition, we would have a theory.
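The "most isomorphic" criterion can be made concrete with a toy calculation. The sketch below is my own formalization, not Williams's: it scores a candidate bracketing by how many constituents (taken as unordered leaf-sets, since linear order is ignored) it shares with the target theta structure, and selects the best-scoring competitor, reproducing the *gun army supplier* vs. *\*gun army supplier* contrast from section 1.2.

```python
def leaves(tree):
    """Flatten a bracketed structure into its leaf list."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for sub in tree for leaf in leaves(sub)]

def constituents(tree):
    """The set of constituents of a tree, each as an unordered leaf-set."""
    if not isinstance(tree, tuple):
        return {frozenset([tree])}
    out = {frozenset(leaves(tree))}
    for sub in tree:
        out |= constituents(sub)
    return out

def isomorphy(candidate, target):
    """Score: number of constituents shared with the target, order ignored."""
    return len(constituents(candidate) & constituents(target))

# Theta structure (16a), [goal [theme supplier]], with the roles filled in:
theta = ('army', ('gun', 'supplier'))

# Competing compounds: "army gun supplier" vs. "*gun army supplier"
good = ('army', ('gun', 'supplier'))
bad = ('gun', ('army', 'supplier'))

# Economy: use the most isomorphic representation available.
best = max([good, bad], key=lambda c: isomorphy(c, theta))
```

The good bracketing shares all five constituents with the theta structure, the bad one only four, so the competition selects the former. The open question flagged in the text, which structures count as competitors, corresponds here to the choice of the candidate list.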
1.4 Shape Conservation
I think that most of the economy proposals about grammatical structure
made during the 1990s can be understood as principles partly designed to
aid and abet the kind of shape conservation under discussion here. First
of course is the Mirror Principle (Baker 1985), which says that the inte-
rior structure of words will mirror the exterior syntactic structure in
which the words occur. The Mirror Principle is not really a principle, but
a robust generalization that is reflected in different theories in different
ways. It is implemented in Chomsky 1993, for example, by the algorithm
of feature checking, which is stated in such a way that as a verb moves up
the tree, one of its features can be checked in syntax only after features
more deeply buried in the word have already been checked; this achieves
the mirror effect because morphology adds the most deeply embedded
features first. This reduces the Mirror Principle to coordinating the two
uses of the feature set via a list, or more specifically, a ‘‘stack.’’ A stack
gives mirror behavior, but is of course only one way to get it.
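The stack remark can be illustrated directly. In this sketch (my paraphrase of the point, with a hypothetical affix inventory Asp-T-Agr), features are pushed as morphology attaches them inside-out; popping them then yields the top-down order of the clausal spine, which is the mirror image of the word read outward.

```python
# Hypothetical word: V-Asp-T-Agr, built inside-out by morphology.
stack = []
for affix in ['Asp', 'T', 'Agr']:
    stack.append(affix)               # push features in attachment order

spine_top_down = []
while stack:
    spine_top_down.append(stack.pop())  # LIFO: outermost affix pops first

# word outward:   V-Asp-T-Agr
# spine top-down: Agr > T > Asp   -- the mirror order
```

As the text notes, a stack is only one way to get this behavior; it suffices exactly when the structures involved are right-linear, i.e. list-like.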
My own view is that the Mirror Principle arises from the holistic
matching of two structures. Since a list is an ‘‘abstract’’ of a structure, it
can serve the same purpose in some circumstances, but only where the list
is an adequate abstract of the structure in question. I regard Chomsky’s
mechanism as an artifice that mimics structural isomorphism for simple
cases—essentially right-linear structures, which are equivalent to lists,
in the sense that there is an obvious way to construct a list from a right-
linear structure and vice versa.
As mentioned earlier, I take the Mirror Principle to be the result of
having two systems that represent one and the same theta system, in the
sense of isomorphic representation.
(21) Mirror Principle
morphology ⇝ theta roles, inflectional elements ⇜ Case system
So, just as there are derivational pairs that mirror each other (22a,b),
there are also inflectional pairs that do the same thing (22c,d).
(22) Derivation
a. [can [swim]VP]
b. [[swim]V able]
Inflection
c. [didT,VP [see]VP]
d. [[see]V -edT,V]
In the same vein, Chomsky’s (1993, 1995) definition of equidistance
can be seen as a principle that promotes shape conservation, though with-
out explicitly saying so. The question he posed, to which equidistance
was the answer, is, why does the object move to AgrO and the subject to
AgrS, and not vice versa? (Here I use Chomsky’s (1993) terminology; the
problem remains in more recent Agr-less theories.) Chomsky engineers a
solution to this problem in the definition of equidistance, and as a result,
the permitted combination of movements is the familiar pair of intersect-
ing movements.
(23)
Verb movement ‘‘extends the domain’’ of the lowest NP, as domain is
defined in terms of head chains. With the domain of the lower NP ex-
tended in this way, the two NPs in (23) are in the same domain and
hence, by definition, equally distant from anything outside that domain;
hence, they are equally eligible to move outside that domain; hence,
the subject can move over the object without violating economy con-
ditions, and the intersecting derivation results. A ‘‘shortest derivation’’
principle rules out the other, nesting derivation. The odd result is that al-
though the economy conditions are distance minimizing, distance itself is
never defined, only equidistance. I believe this is a clue that the result is
artificial.
Intersecting paths are not what previous work has taught us to expect
from movement. (24a) illustrates the famous intersecting pair of tough
movement and wh movement; as is evident, the intersecting case is much
worse than the nesting case (24b) (Fodor 1978).
(24) a. *Which sonatas is this violin easy to play tsonatas on tviolin?
b. Which violin are these sonatas easy to play tsonatas on tviolin?
So the intersecting movement of subject and object is mysterious.
Intersection might be an illusion arising from the analytic tools and not
from the phenomenon itself. Intersection only arises if two items are
moving to two different positions in the same structure. But suppose that
instead of moving both subject and object up a single tree, we are instead
trying to find their correspondents in a different tree altogether—the sort
of operation illustrated in (25). Then there is no intersection of move-
ment; what we have instead is a setup of correspondences between two
structures that preserves the interrelation of those elements (the subject
and object).
(25)
In standard minimalist practice, A would be embedded beneath B, and
movement would relate the agent to nominative, and the theme to accu-
sative. But in RT these relations are a part of the holistic mapping of TS
(containing A) to CS (containing B).
An examination of Holmberg’s (1985) generalization leads to similar
conclusions: it is better seen as a constraint on mapping one representa-
tion into another, than as a constraint on the coordinated movements,
within a single tree, of the items it pertains to (verb and direct object).
The generalization says that object shift must be accompanied by verb
movement: if the object is going to move to the left, then its verb must
do so too. The following are Icelandic examples in which the verb and/or
the direct object can be seen to reposition itself/themselves leftward over
negation:
(26) a. að Jón keypti ekki bókina              V neg NP
        that Jón bought not the-book
     ‘that Jon didn’t buy the book’
     b. að Jón keypti bókina ekki tV tNP       V NP neg
(27) a. Jón hefur ekki keypt bókina.           aux neg V NP
        Jón has not bought the-book
     ‘Jon hasn’t bought the book.’
     b. *Jón hefur bókina ekki keypt tNP.      aux NP neg V
(26) shows that the object can appear on either side of negation, so long
as both are to the right of the verb. (27) shows that when the verb is to the
right of negation, the object cannot cross negation, even though it could
in (26). Clearly, what is being conserved here is the relation of the verb to
the object. (This, by the way, is not Holmberg’s original proposal, but
one derived from it that many researchers take to be his proposal. In fact,
he proposed that the object cannot move at all unless the verb moves, a
weaker generalization; it remains an empirical question which version is
the one worth pursuing.)
There are various proposals for capturing Holmberg’s generalization,
including Chomsky’s (1995) idea that the D and V ‘‘strength’’ features of
the attracting functional projection must be coordinated—that is, both
strong or both weak. This won’t really work, because if the V and the di-
rect object are attracted to the same functional projection, they will cross
over each other, and this is exactly what is not allowed.
(28) ‘‘ . . . AgrO is {strong [D-], strong [V-]}.’’ (Chomsky 1995, 352)
strong DAgrO ⇔ strong VAgrO
(29)
In order to capture the strong and most interesting form of Holmberg’s
generalization (i.e., in order to guarantee that the object cannot cross the
verb), Chomsky’s account must be accompanied by a further stipulation
that the V obligatorily moves to Tense.
But I think that further facts demonstrate the insufficiency of this ap-
proach. When there are two objects, they cannot cross over each other,
though the first can move by itself.
(30) a. NP V ekki NP1 NP2
     b. NP V NP1 ekki t1 NP2
     c. NP V NP1 NP2 ekki t1 t2
     d. *NP V NP2 NP1 ekki t1 t2
Clearly, coordinating attraction features will not work here either. What
is obviously going on is that any set of movements is allowed that does
not perturb the interrelation of V, NP1, and NP2. Again, a holistic prin-
ciple of Shape Conservation would seem to go most directly to the heart
of the problem.
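The descriptive generalization just stated, that any combination of movements is licit so long as the interrelation of V, NP1, and NP2 is undisturbed, amounts to a simple order-preservation check. Here is a minimal sketch (my own formalization of the pattern in (30), not a claim about the actual licensing mechanism):

```python
def conserves_shape(surface, tracked=('V', 'NP1', 'NP2')):
    """An output order is licit iff the tracked elements appear in the
    same relative order as in the base (other items, e.g. ekki, are free)."""
    present = [w for w in surface if w in tracked]
    return present == [t for t in tracked if t in present]

# The judgments in (30): only the order-perturbing (30d) is excluded.
ok_a = conserves_shape(['NP', 'V', 'ekki', 'NP1', 'NP2'])      # (30a)
ok_b = conserves_shape(['NP', 'V', 'NP1', 'ekki', 'NP2'])      # (30b)
ok_c = conserves_shape(['NP', 'V', 'NP1', 'NP2', 'ekki'])      # (30c)
bad_d = conserves_shape(['NP', 'V', 'NP2', 'NP1', 'ekki'])     # (30d)
```

Note that the check is holistic over the whole string, which is exactly what a per-item attraction feature cannot state.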
A further mystery for the standard view arises from the fact that
Holmberg’s generalization does not hold for V-final languages like
German.
(31) Sie hat Peter gestern gesehen.
     she has Peter yesterday seen
     ‘She saw Peter yesterday.’
In (31) the object has moved leftward over the adverb without an ac-
companying movement of the verb. If the view I am suggesting here is
correct, Holmberg’s generalization does not hold because the leftward
movement of the object in Germanic (over an adverb) does not change
the relation of object to verb—the original order is conserved.
In particular theories shape conservation shows up in particular ways.
In hyper-Kaynian theories (Antisymmetry theories with massive remnant
movement) there is a signature derivation of shape-conserving mappings.
The key is systematic remnant movement—namely, remnant movement
resulting automatically from the fact that a phrase is a remnant. All trans-
formational theories of grammar have countenanced remnant move-
ment (see chapter 5 for discussion): NP movement can give rise to an AP
with a gap in it (32a), and then that AP can be displaced by wh movement
(32b).
(32) a. John is [how certain t to win]AP
     b. [how certain t to win]AP is John
But in such a case the two movements are triggered by different things
(Case for NP movement and wh requirements for wh movement); and in
fact the movements can occur alone, and so are not coordinated with one
another. But in hyper-Kaynian remnant movement the movement of the
remnant and the movement that creates the remnant are keyed to each
other in some way. There are several ways to implement this (one could
propose that both movements are triggered by the same attractor, or
some more complicated arrangement), but in any such arrangement the
movements will always be paired.
Now suppose we find evidence in RT for a shape-conserving ‘‘trans-
lation’’ of structures in one level (L1) to structures in another (L2), as
shown in (33) (where the lines are points of correspondence under the
shape-conserving mapping).
(33)
We can mimic this behavior in Antisymmetry as follows. First, the
derivation concerns a single structure, rather than the pair of structures in
(33); that structure is the result of embedding F″L1 as a complement of FL2. Three movements are needed to map the material in the embedded (FL1)
structure into positions in the higher (FL2) structure in shape-conserving
fashion. We therefore need four specifiers, of F0, F1, F2, and F3 (shown in
(34), with F0 at the very top not visible). F3 in (34) corresponds to F″L1
in (33), and SpecF3 corresponds to SpecFL1. Instead of mapping from
one level to another, as in RT, we move everything in F3 up the tree into
the region of F1 and F0. In order for these movements to achieve shape
conservation, a minimum of three moves are needed, two movements
of SpecF3 and one of F3 itself, in the following order: (a) movement of
SpecF3, making F3 a remnant; (b) movement of that remnant to a Spec
higher than the one SpecF3 was moved to; (c) a second movement of
SpecF3 (to SpecF0), to ‘‘reconstitute’’ the original order of SpecF3 and
the rest of F3.
(34) Achieving shape conservation in Antisymmetry
What is conserved is the order, and the c-command relations, among the
elements of F3. Of course, F3 itself is not conserved, having been broken
into parts, but since the parts maintain their order and c-command rela-
tions and are therefore almost indistinguishable from an intact F3, the
result does deserve some recognition as exemplifying shape conservation.
I believe there is no simpler set of movements in Antisymmetry that could
be called shape conserving.
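The three-step derivation (a)–(c) can be simulated on a flat list, abstracting away from the tree in (34). The sketch below is my own simplification: F0–F2 stand for empty specifier slots, and the three moves land Spec3 above the remnant, reconstituting their original order and relative prominence.

```python
# Start: three empty specifier slots above the embedded [Spec3 Rest3].
seq = ['F0', 'F1', 'F2', ['Spec3', 'Rest3']]

spec3 = seq[3].pop(0)   # (a) move Spec3 to SpecF2, creating a remnant
seq[2] = spec3
remnant = seq.pop(3)    # (b) move the remnant higher, to SpecF1
seq[1] = remnant
seq[0] = seq[2]         # (c) move Spec3 again, to SpecF0
seq[2] = 't'            #     leaving a trace in SpecF2

# Top-down result: Spec3 > [Rest3] > t -- the original Spec3-before-Rest3
# order is conserved, at the cost of three coordinated movements.
```

Fewer than three moves cannot both evacuate F3 and restore the order, which is the sense in which this is the minimal shape-conserving derivation in Antisymmetry.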
For this reason, I find it telling that derivations such as (34) abound in
the Antisymmetry literature. It suggests to me that there is something
fundamental about shape conservation. Since Antisymmetry was not
built to capture shape conservation directly, it can only do so in this
roundabout way—yet this roundabout derivation occurs on every page.
Of course, not all derivations in Antisymmetry instantiate (34), only
the shape-conserving ones. After all, things do get reordered in some
derivations, in all accounts. But it will still be suspicious if the ‘‘nothing
special is happening’’ derivation in Antisymmetry always instantiates
(34). It suggests to me that (33) is right.
Another principle with shape-conserving character was a principle of
Generative Semantics, where interpreted structure was deep structure,
and surface structure was the endpoint of derivation in a completely
linear model. The gist of it is this: if Q1 has scope over Q2 in interpreted
structure, then Q1 c-commands Q2 in surface structure (see, e.g., Lakoff
1972). Ignore for now that the principle is false to certain facts, such as
the ambiguity, in English, of sentences with two quantified NPs (e.g.,
Everyone likes someone)—it represents a real truth about quantifiers, and
I will in the end incorporate it into RT directly as a subinstance of the
Shape Conservation principle.
This principle has been reformulated a few times—for example, by
Huang (1982, 220),
(35) General Condition on Scope
Suppose A and B are both QPs or Q-expressions, then if A
c-commands B at SS, A also c-commands B at LF.
and by Hoji (1985, 248).
(36) *QPi QPj tj ti
     where each member c-commands the member to its right.
Probably related is the observation widely made about a number of lan-
guages that if two quantifiers are in their base order, then their interpre-
tation is fixed by that order; but if they have been permuted, then the
possibility of ambiguity arises.
All of these versions of the principle achieve the same thing: a cor-
respondence between (something close to) an interpreted structure and
(something close to) a heard structure. In fact, the correspondence is a
sameness of structure, and so encourages us to pursue the idea of a gen-
eral principle of Shape Conservation. Lakoff’s and Huang’s versions are
transparently shape-conserving principles. Hoji’s is not, until one realizes
that it is a representational equivalent of Huang’s and Lakoff’s. Fox’s
(1995) results concerning economy of scope can be seen in the same
light.
Finally, Shape Conservation bears an obvious relation to ‘‘faithfulness
to input’’ in Optimality Theory and to the f-structure/c-structure mapping
in Lexical-Functional Grammar. I will comment further on the relation
between RT and these other theories in chapter 3.
I have by now recited a lengthy catalogue of shape-conserving princi-
ples in syntax: the Mirror Principle, equidistance, Holmberg’s generaliza-
tion, various scope principles, faithfulness. I omitted Emonds’s Structure
Preservation despite its similarity in name, because it governs individual
rule applications and so lacks the holistic character of the other principles.
But I would add one more to the list: to my knowledge, the first shape-
conserving principle in the tradition of generative grammar was proposed
in Williams 1971b, namely, that tonal elements (e.g., High and Low) are
not features of vowels or syllables, but constitute a representation sepa-
rate from segmental structure, with its own properties, and that that sep-
arate representation is made to correspond algorithmically to segmental
structure, also with its own, but di¤erent, structure. Tonal Structure, at
least as I discussed it then, was rather primitive, consisting of a sequence
of tones (L, H) grouped into morphemes; and this structure was mapped
to another linear representation, the sequence of vowels (or syllables) of
the segmental structure, in a one-to-one left-to-right manner, in a way
that accounted for such phenomena as tonal spreading. Clearly, there is
a shape-conserving principle in this, even if I did not explicitly identify it
as such; to use the terminology of this book, after the mapping Syllable
Structure represents Tonal Structure, in that elements of Tonal Structure
are put into one-to-one correspondence with elements of Syllable Struc-
ture, and the properties of Tonal Structure (only ‘‘x follows y,’’ since it is
a list) are preserved under the representation.
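The one-to-one, left-to-right association just described can be sketched as follows. This is a simplification of the Williams 1971b system; treating leftover syllables as targets of spreading by the final tone is my assumption for the illustration.

```python
def associate(tones, syllables):
    """Map tones to syllables one-to-one, left to right; the final tone
    spreads onto any remaining syllables (extra tones are ignored in
    this simplification)."""
    return [(syl, tones[i] if i < len(tones) else tones[-1])
            for i, syl in enumerate(syllables)]

pairing = associate(['L', 'H'], ['ba', 'na', 'na'])
# pairing == [('ba', 'L'), ('na', 'H'), ('na', 'H')]: H spreads rightward
```

The "x follows y" relation of the tonal list is preserved in the output: tones never reorder under association, which is the shape-conserving character noted in the text.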
1.5 The Representation Model
If there are systematic circumstances in which grammar seems to want to
preserve relations between elements, we might consider building a model
from scratch that captures these directly and without contrivance.
Suppose we analyze the grammatical system into several distinct com-
ponents, each of which defines a set of structures (a sublanguage), and
which are related to each other by shape-conserving mappings. The syn-
tax of a clause will then be a mapping across a series of representations,
from Theta Structure to Case Structure to Surface Structure, and so on.
(37)
AS is a partial phonological representation with sentence accent structure
assigned. Its role will be developed in chapter 9.
To compare with the more standard model, we can see this series of
structures (38) as a decomposition of the standard clause structure (39),
with the following correspondences: what is done by structural embed-
ding in the standard theory is done by the representation relation in RT;
and what is done by movement up the tree in the standard theory is done
by isomorphic mapping across this series of representations in RT.
(38)
(39)
An immediate consequence of this decomposition is that in RT there
can be no such thing as an item being left far behind—everything that is
going to be in the clause must make it to the last representation of (38),
which would be equivalent to every NP moving to the very top shell
of (39). The single deep tree of the standard Pollock-style minimalist
theory, on the other hand, allows such ‘‘deep stragglers.’’ Although cer-
tain widely accepted accounts of some constructions (e.g., transitive ex-
pletive constructions) entail the surface positioning of NPs in original
theta positions, it seems that the trend has instead been more and more
toward analyses in which NPs never appear in deep positions. To the ex-
tent that this trend is responding to some feature of reality, I would say
that it confirms RT, in which any other arrangement is not just impossi-
ble, but literally incoherent.
Another way to contrast the two theories is in how semantics is done.
Semantics in the ramified Pollock- and Cinque-style model can be com-
positional, in the usual sense; but semantics in RT is ‘‘cumulative,’’ in a
sense spelled out below and in chapter 9. ‘‘Embedding’’ here is not struc-
tural embedding, but ‘‘homomorphic’’ embedding: TS is ‘‘embedded’’ in
FS by a series of shape-conserving mappings.
Not everything that is a movement in the standard theory will become
an interlevel (mis-)mapping in RT. I have already remarked that wh
movement is a movement within a level, presumably SS. An interesting
pair in this regard is short-distance and long-distance scrambling. Short
scrambling might best be modeled as a mismapping between CS and SS,
whereas long scrambling might best be treated like wh movement, or
perhaps a ‘‘higher’’ mismapping (SS ⇜ FS, for example). The different
behavior of short and long scrambling with respect to binding theory and
reconstruction should follow from this distinction. (See section 3.1 for
details, and chapters 4 and 5 for generalized applications of the chapter 3
methodology.)
Many questions about this model and its differences from standard
models are still unaddressed. Though most of them will remain so, I will
take up two fundamental questions in chapters 3 and 4.
First, there is the issue of embedding: how is clausal embedding ac-
complished in RT? Embedding could have worked something like this:
elements defined in ‘‘later’’ systems (QS, FS, etc.) are ‘‘rechristened’’ as
theta objects, which can then enter into theta relations in TS. This ac-
count would preserve the obvious relation to standard minimalist practice
and its antecedents back to Syntactic Structures (Chomsky 1957). But in
chapter 3 I will try out a different view, with surprisingly different con-
sequences: embedding of different subordinate clause types happens at
different levels in RT, where the different clause types vary along the di-
mension of ‘‘degree of clause union.’’ The principle for embedding is,
‘‘Embed at the level at which the embedded object is first defined’’ (the
Level Embedding Conjecture of chapter 3). For small embeddings, like
that found in serial verb constructions, the level is TS; but for tensed-
clause embedding, the level is SS.
Second, how is semantic interpretation done in this model? Each of the
levels is associated with a different sort of value, and in chapters 4 and 9
I will try to specify what these values are. Perhaps the most important
difference between RT and the standard model, then, is that there is not
one single tree that represents the meaning; TS represents theta structures,
QS scope relations, FS information structure of the kind relevant to focus,
and so on. The structure of a sentence consists of a set of structures, one
from each of these components, with the shape-conserving mapping
holding among them. Clearly, the meaning is determinable from these
representations; for example, it would be trivial to write an algorithm
that would convert such representations into classical LF structures. But
it is not the case that linguistic meaning can be identified with one of
these levels. To borrow a philosopher’s term, one might say that linguis-
tic meaning is supervenient on these representations (if it is not iden-
tical with them), in that any difference in the meaning of two sentences
will correspond systematically with some difference in their representation
structure. Systematicity will guarantee some notion of semantic composi-
tionality. Compositionality will hold within a level, but it will also hold
across levels. I am not sure that linguistic semantics needs anything more
than this.
Having promised to address these two substantive issues in future
chapters, I would now like to put aside a concern that I think is over-
rated. The following sentiment was often expressed to me while I was
developing the ideas outlined here: ‘‘You’ve replaced movement gov-
erned by distance minimization with holistic mapping between levels
governed by shape conservation. But the properties of movement are
rather well understood, whereas you can give only the barest idea of what
constitutes ‘structure matching’—so the theories aren’t really empirically
comparable.’’
My main objection to this is not what it says about my account of
shape conservation. I accept the charge. But I must question the claim
that there is a notion of movement that is widely accepted, much less un-
derstood. If we review the properties of movement, we find that none of
them are constant across even a highly selective ‘‘centralist’’ list of works
that seek to use movement in significant acts of explanation. What would
the properties be?
1. Is movement always to a c-commanding position?
2. Is movement always to the left?
3. Is movement always island governed?
4. Does movement always leave a gap?
5. Does movement always result in overt material in the landing site?
6. Does movement always move to the top?
7. Is movement always of an XP?
For each of these questions it is easy to find two serious e¤orts at ex-
planation giving opposite answers. For example, in work reviewed in
chapter 6 of this book, Richards (1997) proposes that some movement
does not obey islands (question 3). In addition, Richards proposes that
movement is not always to the edge of its domain, but sometimes ‘‘tucks
in’’ beneath the top element, to use his informal terminology (question 6).
Koopman and Szabolcsi (2000) insist that there is no head movement
(question 7). And so on.
Movement, then, is a term associated with different properties in differ-
ent acts of explanation, and the intersection of those properties is essen-
tially null. This does not mean that no one who uses the term knows what
he or she means by it, only that there is no common understanding. I
don’t think that is a bad thing. The different uses are after all related; for
example, although it is perfectly acceptable to build a theory in which
movement sometimes leaves a gap, and sometimes leaves a pronoun, it
would be unacceptable to use the term movement in such a way that it
covered none of the cases of gap formation. So it is not that the term is
completely meaningless. But still there is no shared set of properties that
has any significant empirical entailments on its own. Someone who is
pursuing Antisymmetry, for example, will have a very different under-
standing of the term than someone who is not.
It is the familiarity of the term itself that gives rise to the illusion that
there is a substantive shared understanding of what it refers to. If every
linguist had to replace every use of the term movement with the more
elaborate syntactic relation with properties P1, P2, P3, P7, P23, I think
fewer linguists would claim that ‘‘movement is rather well understood,’’
and then some audience could be mustered for notions of syntactic rela-
tion for which the term movement is not particularly appropriate.
Chapter 2
Topic and Focus in Representation Theory
In chapter 1 I made some rather vague suggestions about how Case sys-
tems might be seen as ‘‘representing’’ TS, and in doing so gave some idea
about how the ‘‘left end’’ of the RT model uses the principle of Shape
Conservation. In this chapter I will turn to the other end and show how
the same notion can be used to develop an understanding of how topic
and focus interact with surface syntax.
This chapter is essentially about the interpretive e¤ects of local scram-
bling. Although English will figure in the discussion, my chief aim
will be to explicate, in terms of Shape Conservation, some mainly well
known findings about Italian, German, Spanish, and Hungarian having
to do with word order, topic, and focus. The interpretive e¤ects of long-
distance scrambling, and its place in RT, will be taken up in chapters 3
and 5, where the A/Ā distinction is generalized in a way that makes sense
of the difference between long- and short-distance scrambling.
Long and short scrambling pose a special problem for Checking
Theory. Checking Theory provides a methodology for analyzing any
correlation between a difference in syntactic form and a difference in
meaning: a functional element is postulated, one whose semantics deter-
mines the di¤erence in meaning by a compositional semantics, and whose
syntax determines a difference in form by acting as an attractor for
movement of some class of phrases to its position. That is, interpretable
features trigger movement. But, as I will show, in the case of focus the
moved constituent does not in general correspond to the Focus. It of
course can be the Focus itself; but in addition, it can be some phrase that
includes the Focus, or it can be some phrase that is included in the Focus.
While the first might be (mis)analyzed as a kind of pied-piping, the sec-
ond makes no sense at all from the point of view of triggered movement.
The problem with Checking Theory that will emerge from the following
observations is that it atomizes syntactic relations into trigger/moved-
element pairs, whereas in fact the syntactic computation targets structures
holistically.
2.1 Preliminaries
I will use Topic and Focus in their currently understood sense: the Topic
consists of presupposed information, and the Focus of new information.
Elsewhere (Williams 1997) I have developed the idea that Focus is es-
sentially an anaphoric notion and that Topic is a subordinated Focus. I
will take this idea up again in chapter 9, but will ignore it until then.
In chapter 1 I introduced two sets of structures, QS (= TopS) and FS.
The properties of these structures and their relation to other struc-
tures under Shape Conservation will carry the burden of accounting for
the features of topic and focus to be examined here. The differences
among the languages to be discussed will be determined by either (a) dif-
ferences in the rules for forming each structure or (b) differing repre-
sentational demands (e.g., SS ≅ QS representation ‘‘trumping’’ SS ≅ CS
representation in some languages, with SS → FS figuring in, in a way to be
described).
QS represents not only the topic structure of the clause, but also the
scopes of quantifiers. The reason for collapsing these two is empirical,
and possibly false: wide scope quantifiers seem to behave like Topics, and
unlike Focuses. First, languages in which topic structure is heavily re-
flected in surface syntax tend to be languages in which quantifier scope is
also heavily reflected. German is such a language, but English is not.
Second, focusing allows for reconstruction in the determination of scope,
but topicalization does not. The latter difference has a principled account
in RT, a topic explored in chapters 3 and 5.
2.2 The Structure of QS and FS
QS and FS bear representational relations to SS: SS represents QS, and
FS represents SS. In this section I will give a rough sketch of these struc-
tures, leaving many details to be fixed as analysis demands, as usual.
One question to be resolved in establishing the basic notions in this
domain is, what is the relation among the semantic notions to be repre-
sented (Topic status, wide scope) and the structural predicates precedes
and c-commands? Most clearly for adjuncts, relative scope seems to de-
pend on the stacking relation, not the linear order, if we can rely on our
judgments of the following sentences:
(1) a. John was there a few times every day. (every > few)
b. [[[was there] a few times] every day]
c. [[[John was there] every time] a few days] (few > every)
Adjuncts are not subject to the long scope assignment that is characteris-
tic of argument NPs in a language like English, and so the stacking
order determines the interpretation: every > few for (1a), and few > every
for (1c). By contrast, in (2) the understood order of the quantifiers is
ambiguous.
(2) John saw a friend of his every day.
The simplest assumption is that again the stacking order determines the
order of interpretation, but that the direct object in (2) is subject to wide
scope assignment. So in QS scope is determined by stacking, but some
items (NPs in argument positions) are subject to long scope assignment.
Unlike quantification, topicalization seems to always be associated
with leftward positioning of elements, not just in English, but generally
across language types.
We will assume that QS incorporates both of these facts, generating a
set of structures that represent both topicalization and scope, around a
head X. These structures have roughly the following form:
(3) [tree diagram: a Topic segment on the left edge, followed by a non-Topic segment, projected around a head X]
The structures have a Topic segment and a non-Topic segment with
obvious, if not well understood, interpretation; in addition, hierarchical
relations determine relative scope.
Surface structures are mapped into QS under the regime of Shape
Conservation. Since the Topic segment of quantification structures is on
the left edge, items on the left edge in SS will be mapped into them iso-
morphically. In English this will include subjects, and Topics derived by
movement.
(4) a. [XP* [XP* [ . . . ]]]
Topic segment non-Topic segment
b. John left early
c. John I saw yesterday
This permits the Topic-like qualities of the subject position to assert
themselves without any explicit movement to the subject position; the
subject is mapped to one of the Topic positions in QS just as a moved
Topic would be.
The interpretation of focus is not at all straightforward. It is traditional
to distinguish two kinds of focus, normal and contrastive. In Williams
1981a, 1997, I argued that they should not be distinguished. Here, and
especially in chapter 9, I will in fact defend the distinction, but I will
rationalize it as involving di¤erent RT levels. In this chapter I will use
the distinction for expository, nontheoretical purposes. I will take normal
focus to be reliably identified by what can be the answer to a question;
thus, the Focus in (5B) is exactly that part of the answer that corresponds
to the wh phrase in (5A).
(5) A: What did George buy yesterday?
B: George bought [a hammock]F yesterday.
Contrastive focus, on the other hand, arises in ‘‘parallel’’ structures of the
sort illustrated in (6).
(6) John likes Mary and SHE likes HIM.
It will be worthwhile to make this distinction because (a) some languages
have different distributions for normal and contrastive focus, and (b) the
terminology will be convenient for describing some of the interpretive
effects of scrambling discussed here.
The Focus itself, in a language like English, is always a phrase bearing
accent on its final position. In FS there seems to be a preference for the
Focus to come at the end of the sentence; this is reflected in normal focus
in Spanish, and in interpretive e¤ects for English scrambling (heavy NP
shift). I conclude therefore that FS is characterized by final positioning of
Focus.
But apparently these directional properties of the English focus sys-
tem are not fixed universally. Hungarian seems to exhibit the opposite
scheme. It has a Focus position that appears at the left edge of the VP,
just before the verb; all of the nontopicalized verbal constituents, includ-
ing the subject, appear to the right.
(7) János Évát várta a mozi előtt.
    János.nom ÉVA.acc waited the cinema in-front-of
    ‘János waited for ÉVA in front of the cinema.’
    (É. Kiss 1995, 212)
In (7) Évát is focused, as it is the preverbal constituent; János is topical-
ized. Hungarian FS thus has the following form:
(8) Hungarian FS
Topic Topic . . . Focus [V XP YP . . . ]
(Furthermore, Hungarian Focuses are left accented, instead of right ac-
cented, perhaps an independent property.)
In fact, the normal Focus is not always at the right periphery even in
languages like English. In addition to rightward-positioned Focuses, par-
ticular XPs in particular constructions have the force of a Focus by virtue
of the constructions themselves; examples in English are the cleft and
pseudocleft constructions.
(9) a. Cleft
        it was XPF that S          It was John that Mary saw.
     b. Pseudocleft
        [what S] is XPF
        XPF is [what S]            John is what Mary saw.
The XPs in such structures can be used to answer questions and so can be
normal Focuses, or they can be contrastive Focuses (10a); furthermore,
they are incompatible with being Topics (10b). There is thus strong rea-
son to associate the pivots of these constructions with Focus.
(10) a. What did John experience?
What John experienced was humiliation.
It was humiliation that John experienced.
b. What did John experience?
*It was John who experienced humiliation.
*John is who experienced humiliation.
I will simply include these structures in FS without speculating about
why they do not have the Focus on the right or whether there is a single
coherent ‘‘definition’’ of the structures in FS. I will postpone the latter
issue until chapter 9, where I take up the general question of how levels
determine interpretation.
2.3 Heavy NP Shift
With these preliminaries, I now proceed to an analysis of heavy NP shift
(HNPS). I will argue that HNPS is not the result of movement, either to
the left or to the right, but arises from mismapping CS onto SS. In par-
ticular, I will argue that Checking Theory does not analyze HNPS
appropriately.
That focus is implicated in HNPS is evident from the following
paradigm:
(11) a. John gave to Mary all of the money in the SATCHEL.
b. *John gave to MARY all of the money in the satchel.
c. John gave all of the money in the satchel to MARY.
d. John gave all of the money in the SATCHEL to Mary.
One could summarize (11) in this way: HNPS can take place to put the
Focus at the end of the clause, but not to remove a Focus from the end
of the clause—thus, (11b) is essentially ungrammatical. It is as though
HNPS must take place only to aid and abet canonical FS representation,
in which focused elements are final. (11d) shows that whatever HNPS is,
it is optional. In sum, the neutral order (V NP PP) is valid regardless of
whether the Focus is final or not, but the nonneutral order (V PP NP) is
valid only if NP is the Focus.
In fact, though, the situation is slightly more complicated, and much
more interesting. In what follows I will refer to the direct object in the
shifted sentences as the shifted NP, because in the classical analysis it is
the moved element. The form in (11a) is valid not just when the Focus is
the shifted NP, but in fact as long as the Focus is clause final in the
shifted structure, whether or not the shifted NP is the Focus itself. It is
valid both for Focuses smaller than the shifted NP and for Focuses larger
than the shifted NP, as the following observations will establish.
First, the licensing Focus can be a subpart of the shifted NP.
(12) A: John gave all the money in some container to Mary. What
container?
B: (11a) John gave to Mary all of the money in the SATCHEL.
In this case the Focus is satchel, smaller than the shifted NP. Second, the
licensing Focus can be larger than, and include, the shifted NP; specifi-
cally, it can be the VP.
(13) A: What did John do?
B: (11a) John gave to Mary all of the money in the SATCHEL.
In sum, HNPS is licensed if it puts the Focus at the end of the sentence
(12), or if it allows Focus projection from the end of the sentence (13). It
thus feeds Focus projection; recall that Focus projection is nothing more
than the definition of the internal accent pattern of the focused phrase
itself, which in English must have a final accent.
This constellation of properties is not well modeled by Checking
Theory, including Checking Theories implementing remnant analyses. To
apply these theories to the interaction of HNPS and focus would be first
to identify a functional projection with a focus feature, then to endow the
Focus of the clause with that same focus feature, and then to move the
one to the other. Without remnant movement the result would be classi-
cal NP shift, a movement to the right. Remnant movement allows the
possibility of simulating rightward movement with a pair of leftward
movements. Suppose, for example, that NP in (14) is the Focus.
(14) [V NPF PP] → . . . NPF [V t PP] → [V t PP] NPF t
First the focused NP moves; then the remnant VP moves around it.
The problem with both the remnant movement and the classical
Checking Theory analyses is that the shifted NP is the Focus only in the
special case, not in general. So it is hard to see why, for example, a
structure like the one in (14) would be appropriate for VP focus—the
movement of the NP would be groundless, as it is not the Focus.
The correct generalization is the one stated: HNPS is licensed if it
results in a canonical SS → FS representation. This means that it results
in the rightward shifting either of the focused constituent or of some
phrase containing the focused constituent. So, for example, (11a) with VP
focus has the following structure:
(15) CS: [V NP PP] ↛ SS: [V PP NP] → FS: [V PP NP]F
In other words, the CS, SS mismatch (marked by ‘‘↛’’) is tolerated be-
cause of the SS, FS match. In (11b), on the other hand, both CS → SS and
SS → FS are mismatched.
(16) CS: [V NP PP] ↛ SS: [V PP NP] ↛ FS: [V NP PPF]
This double misrepresentation is not tolerated in the face of alternatives
with no misrepresentation. (In chapter 9 I will elaborate the theory of
focus, as well as these representations, with a further relevant level (Ac-
cent Structure), but these changes will not affect the structure of the
explanations given here.)
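The licensing logic just described can be given a quasi-algorithmic rendering. The following toy sketch is my own illustration, not part of the book's formalism; the encoding of CS, SS, and FS as lists of category labels is an assumption made purely for exposition. It shows how a CS/SS mismatch is tolerated exactly when it buys an SS/FS match, as in (15) versus (16):

```python
def mismatch(level_a, level_b):
    # One unit of misrepresentation if the two orders are not congruent.
    return 0 if level_a == level_b else 1

def tolerated(cs, ss, fs):
    # An SS that mismatches BOTH adjacent levels loses to an alternative
    # with fewer misrepresentations, so it is not tolerated.
    return mismatch(cs, ss) + mismatch(ss, fs) < 2

cs = ["V", "NP", "PP"]                 # canonical Case-structure order
shifted_ss = ["V", "PP", "NP"]         # heavy-NP-shifted surface order

# Cf. (15): Focus-final FS matches the shifted SS, licensing the mismatch.
fs_focus_final = ["V", "PP", "NP"]
print(tolerated(cs, shifted_ss, fs_focus_final))   # True

# Cf. (16): the Focus (PP) should be final in FS, so SS matches neither level.
fs_pp_final = ["V", "NP", "PP"]
print(tolerated(cs, shifted_ss, fs_pp_final))      # False
```

The point of the sketch is only that the licensing condition is global over adjacent levels, rather than a property of the shifted NP itself.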
What this little system displays is an excessive lack of ‘‘greed,’’ to use
Chomsky’s (1993) term: HNPS is licensed by a ‘‘global’’ property of the
VP, not by the shifted NP’s needs. This is why it is difficult to model it
with Checking Theory, because Checking Theory atomizes the move-
ments and requires each to have separate motivation—interesting if cor-
rect, but apparently not. The remnant movement analysis is particularly
bad: not only is the wrong thing moved (sometimes a subphrase, some-
times a superphrase of the target), but the ensuing remnant movement
has no motivation either.
Hungarian focusing shows the same lack of correspondence between
displaced constituents and Focuses that English focusing does. Recall
that Hungarian has Focus-initial FS structures; furthermore, the Focus
itself is accented on the first word.
(17) a. János [a TEGNAPI cikkeket] olvasta . . .
        János  the YESTERDAY’s articles  read
        ‘János read YESTERDAY’s articles . . .’
     b. . . . nem a maiakat.
              not the today’s
        ‘. . . not today’s.’
     c. . . . nem a könyveket.
              not the books
        ‘. . . not the books.’
     d. . . . nem a fürdőszobában énekelt.
              not the bathroom-in sang
        ‘. . . not sang in the bathroom.’
     (Kenesei 1998, as reported in Szendrői 2001)
The fronted constituent is bracketed in (17a). As (17c) shows, that con-
stituent can be the Focus; but (17b) shows that the Focus can be smaller,
and (17d) shows that it can be larger, including the verb.
I have suppressed one further detail in connection with HNPS that is
now worth bringing to light. (11b) is not, strictly speaking, ungrammat-
ical. Rather, it has a very specialized use: it can be used ‘‘metalinguisti-
cally,’’ as in (18).
(18) A: John gave to Joe all the money in the SATCHEL.
B: No, John gave to MARY all the money in the satchel.
That is, it can be used to correct someone. Rather than brushing these
examples aside, I will show that their properties follow from the way in
which phonological and syntactically defined focus are related to each
other. But I will not do this until chapter 9, where I take up the notion of
the ‘‘values’’ that are defined at each level, and how the values of one
level are related to the values of other levels.
So, HNPS is analyzed here, not as a movement, but as a mismapping
between CS and SS that is licensed by a proper mapping between SS and
FS. As such, it should not show the telltale marks of real movement; that
is, it should not leave phonologically detectable traces, it should intersect
rather than nest with itself, and so on. Some of these behaviors are hard
to demonstrate. However, there is one property of HNPS that has been
put forward to show that it is a real movement: it can license parasitic
gaps, and so is in fact a kind of Ā movement. (19) is the kind of sentence
that is meant to support this idea.
(19) John put t in the satchel, and Sam t in the suitcase, all the money
they found.
The argument is based on the correct hypothesis that only ‘‘real’’ traces of
movement can license parasitic gaps, but it wrongly assumes that HNPS
is necessarily involved in the derivation of such examples.
In fact, such examples can arise independently of HNPS, through the
action of right node raising (RNR), a process not fully understood, but
clearly needed in addition to HNPS. RNR, in the classical analysis, is an
across-the-board application of a rightward movement rule in a coordi-
nate structure, as illustrated in (20).
(20) John wrote t, and Bill read t, that book.
This analysis of RNR has been contested (see Wilder 1997; Kayne 1994),
but not in a way that changes its role in the following discussion. Given
such a rule, we would expect sentences like (19) even if there were no
HNPS, so it cannot be cited to show that HNPS is a trace-leaving move-
ment rule.
We can understand (19) as arising from the across-the-board extraction
of the NP [all the money they found ] from the two Ss that precede it,
thereby not involving HNPS essentially (though of course the input
structures to RNR could be shifted; it is hard to tell).
(21) [John put ti in the satchel] and [Sam put ti in the suitcase] NPi
Evidence that RNR is the correct rule for this construction comes from
the fact that HNPS does not strand prepositions, combined with the ob-
servation that such stranded prepositions are indeed found in sentences
analogous to (21).
(22) a. John talked to ti about money, and Bill harangued ti about
politics, [all of the . . . ]i
b. *John talked to ti about money [all of the . . . ]i
Although awkward, (22a) is dramatically better than (22b), and so HNPS
is an unlikely source for sentences like (21). See Williams 1994b for fur-
ther argument.
Although the failure of HNPS to leave stranded prepositions is used
as a diagnostic in the argument just given, it is actually a theoretically
interesting detail in itself. If HNPS is a movement rule, and, I suppose,
especially if it is a leftward parasitic-gap-licensing movement, as it is in
the remnant movement analyses of it, then why does it not strand prepo-
sitions, as other such rules do? In the RT account, HNPS arises in the
mismatch between SS and CS: the same items occur, but in different
arrangement, so stranding cannot arise, as stranding creates two con-
stituents ([P t] and NP) where there was one, in turn creating intolerable
mismatch between levels.
2.4 Variation
Some levels are in representation relations with more than one other level,
giving rise to the possibility that conflicting representational demands will
be made on one and the same level. An item in SS, for example, must be
congruent to a Case structure and to a quantification structure, and these
might make incompatible demands on the form of SS. Since mismatches
are allowed in the first place, the only question is whether there is a sys-
tematic way to resolve these conflicts. I will suggest that languages differ
with respect to which representation relations are favored.
This arrangement is somewhat like Optimality Theory (OT), if we
identify the notion ‘‘shape-conserving representation relation’’ with
‘‘faithfulness.’’ But RT and OT differ in certain ways. In RT only com-
peting representation relations can be ranked, and they can be ranked
only among themselves and only where they compete on a single level.
Intralevel constraints are simply parts of the grammar of each indepen-
dent sublanguage, and so cannot be ranked with the representation rela-
tions those sublanguages enter into. In this regard RT is more restrictive
than OT. On the other hand, I will be assuming that the properties of the
sublanguages themselves are open to language-particular variation; and
in this respect RT is less restrictive than OT, as OT seeks to account for
all language particularity through reordering of a homogeneous set of
constraints.
RT also resembles theories about how grammatical relations (subject,
object, etc.) are realized in syntactic material. For example, Lexical-
Functional Grammar (LFG; Kaplan and Bresnan 1982) posits two levels
of representation, f-structure and c-structure. F-structure corresponds
most closely to the level called TS here, and c-structure corresponds most
closely to everything else. An algorithm matches up c-structures and
f-structures by generating f-descriptions, which are constraints on what
c-structures can represent a given f-structure. Since the overall effect is
to achieve a kind of isomorphism between c-structures and f-structures,
the grammatical system in LFG bears an architectural similarity to the
RT model, especially at the ‘‘low’’ (TS) end of the model, even though
there is no level in RT explicitly devoted to grammatical relations them-
selves, that work being divided among other levels. Similar remarks apply
to the analysis of grammatical relations presented in Marantz 1984.
LFG differs from RT in several ways. First, the matching between
c-structure and f-structure is not an economy principle, so the notion
‘‘closest match’’ plays no role. The LFG f-description algorithm tends
to enforce isomorphism, but its exact relation to isomorphism is an ac-
cidental consequence of the particulars of how it is formulated. By com-
parison, in RT exact isomorphism is the ‘‘goal’’ of the relations that hold
between successive levels, and deviations from exact isomorphism occur
only when, and to the exact degree to which, that goal cannot be achieved.
Second, LFG posits only two levels, whereas RT extends the matching
to a substantially larger number of representations, in order to maximize
the work of the economy principle.
Third, and most important, the place of embedding in the two systems
is different. I will propose in chapter 3 that embedding takes place at
every level, in the sense that complements and adjuncts are embedded in
later levels that have no correspondents in previous levels. In LFG, if
embedding is done anywhere, it is done everywhere; that is, if a clause is
present in c-structure, it has an f-structure image. Thus, the predictions of
RT made and tested in chapters 3–6 are not available in LFG.
2.5 English versus German Scrambling
Let us now turn to a systematic analysis of the di¤erence between English
and German in terms of mismapping between levels. Keeping in mind the
rough characterization of QS and FS given above, we may now charac-
terize that difference as follows:
(23) a. German: SS ≅ QS > SS ≅ CS
b. English: SS ≅ CS > SS ≅ QS
c. Universal: SS → FS
That is, in German SS representation of QS is more important than SS
representation of CS (signified by ‘‘>’’); in English the reverse is true.
And in all languages of course FS represents SS.
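The rankings in (23) can also be put in rough computational terms. The sketch below is my illustration only; the encoding of levels as word-order lists and the lexicographic cost function are assumptions for exposition, not the book's machinery. Each language ranks the congruence demands on SS, and the surface order that best satisfies the higher-ranked congruence wins:

```python
def best_ss(candidates, levels, ranking):
    # levels maps a level name ("QS", "CS") to its canonical order;
    # ranking lists level names, most important first. The cost is
    # lexicographic: a mismatch with a higher-ranked level outweighs
    # any mismatch with a lower-ranked one.
    def cost(ss):
        return tuple(0 if ss == levels[name] else 1 for name in ranking)
    return min(candidates, key=cost)

cs_order = ["IO", "DO", "V"]   # canonical Case order, cf. (24a)
qs_order = ["DO", "IO", "V"]   # DO as wide-scope Topic
levels = {"CS": cs_order, "QS": qs_order}
candidates = [cs_order, qs_order]

# German ranks SS ≅ QS over SS ≅ CS: the scrambled order surfaces.
print(best_ss(candidates, levels, ["QS", "CS"]))   # ['DO', 'IO', 'V']

# English ranks SS ≅ CS over SS ≅ QS: the Case order surfaces.
print(best_ss(candidates, levels, ["CS", "QS"]))   # ['IO', 'DO', 'V']
```

The lexicographic cost is one way of cashing out ‘‘>’’ in (23): no number of lower-ranked matches can compensate for a higher-ranked mismatch.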
Let us now examine what expectations about German will flow from
the specifications in (23). Perhaps arbitrarily, I identify the following four:
1. Two definite NPs in German should not be reorderable, apart from
focus.
2. Definite pronouns move leftward.
3. A definite NP obligatorily moves leftward over (only indefinite)
adverbs.
4. Surface order disambiguates quantification, except where Q is focused.
Expectation 1: First, two definite NPs in German should not be re-
orderable, unless a special focusing is to be achieved. This is true because
SS must represent CS, unless that requirement is countervailed by some
other representational need.
(24) Two definites are not reorderable with normal focus
a. IO DO V (CS order)
b. *DO IO V
This conclusion is true, and in fact is a commonplace of the literature on
the German middlefield; see Déprez 1989 for a summary account.
Expectation 2: Definite pronouns appear on the left edge in SS, as
required by QS (= TopS), since they are always D-linked—again, a commonplace of the literature.
Expectation 3: A definite NP will move leftward over an adverb, in
defiance of CS, in order for SS to match QS, as definites always have
wider scope than indefinite adverbs; see the end of this section for a dis-
cussion of the behavior induced by definite adverbs, based on findings of
Van Riemsdijk (1996). But the pull to the left to move the direct object
into the clause-initial Topic field of QS can be countervailed by the need
to place narrow focus on the object, as in (25b), which makes leaving the
NP after the adverb an option, even though the NP is D-linked. The key
here is to understand that an NP can be both focused and D-linked, in
e¤ect both focused and topicalized, and that both of these properties are
needed to understand the German middlefield behavior. The following
cases show that these expectations are fulfilled:
(25) Definites move left, except if narrowly focused
a. weil ich die Katze selten streichle
   because I the cat seldom pet
   ‘because I seldom pet the cat’
b. ?*weil ich selten die Katze streichle
(good only if contrastive focus on Katze (Diesing 1992) or
[Katze streichle] (M. Noonan, personal communication))
c. weil ich die KATZE selten streichle
(only narrow focus on KATZE )
d. What did Karl do?
Den HUND hat Karl geschlagen.
the dog has Karl beaten
‘Karl beat the DOG.’
(Prinzhorn 1998)
In passing, note the difficulty this sort of example poses for a remnant
movement analysis of topicalization, or for rightward movement. The
problem, in both cases, is that the verb stays at the end, no matter what.
If we assume SVO order (as remnant movement theories generally do for
SOV languages), then to derive (25b) where the object is focused, we must
perform the operations of focusing and remnant movement, resulting in
something like one of the two following derivations:
(26) a. weil ich selten streichle die Katze → topicalization
weil ich die Katze [selten t streichle] → remnant movement
weil ich [selten streichle] die Katze → ?? derive SOV order
weil ich selten die Katze streichle
b. weil ich selten streichle die Katze → derive SOV order
weil ich selten die Katze [streichle t] → topicalization
weil ich die Katze [selten streichle t] → ?? remnant movement
weil ich selten die Katze streichle
The last step is the puzzler—how to get the verb in final position again,
but at the same time end up with the adverb before the direct object. The
operations otherwise motivated, including the remnant movement half of
focusing, do not seem to have the properties needed to achieve this.
In German, scrambling is more or less obligatory to disambiguate the
scope of coarguments, so there is much less surface quantifier ambiguity
in German than in English. This is because German favors QS represen-
tation over CS. But again, there is an important exception: when the sec-
ond of the two NPs is narrowly focused, it can remain in situ and be
scopally ambiguous there. The important thing here is that the possibility
of wide scope in the rightmost position is dependent on narrow focus.
Despite the other differences between the two languages, German behaves
identically to English in this respect, mimicking the special contours of
the HNPS construction discussed in section 2.3, mutatis mutandis: in
German FS countervails QS representation, whereas in English HNPS it
countervails CS representation.
Importantly, German does not require that the rightmost NP be the
Focus itself; rather, it must be a part of a narrow Focus, as (25b) shows.
This detail precisely matches the case of English HNPS. It would appear
that the ‘‘global’’ property of having a canonical FS representation over-
rides the German-particular requirement that SS be a canonical QS
representation.
Expectation 4: The notion that QS is the level in which both Topics
and quantifiers get their scopes is supported by the fact that scope inter-
pretation interacts with focusing in exactly the same way that Topics do,
as the following examples establish:
(27) Movement disambiguates quantified NPs
a. ~&dass eine Sopranistin jedes Schubertlied gesungen hat (eine > jedes)
   that a soprano every Schubert.song sung has
   ‘that a soprano sang every song by Schubert’
b. ~&dass jedes Schubertlied eine Sopranistin gesungen hat (jedes > eine)
   (Diesing 1992)
(28) ‘‘Unmoved’’ NP is ambiguous if and only if narrowly focused
a. &Er hat ein paar Mal das längste Buch gelesen.
   he has a couple times the longest book read
   ‘He read the longest book a couple of times.’
b. ~&Er hat das längste Buch ein paar Mal gelesen.
Example (28) in particular shows that a wide scope quantifier can be left
in situ exactly in case it is narrowly focused.
In remnant movement Checking Theories (28a) would need to be rep-
resented as follows:
(29) a. Assign (i.e., check) scope
er hat das längste Buch [ein paar Mal [t gelesen]]
b. Assign (i.e., check) Focus
er hat [ein paar Malj [das längste Buchi [tj [ti gelesen]]]]
Ein paar Mal must move precisely because das längste Buch is the Focus,
and thus not for reasons of its own. The difficulty is increased, just as it
was in the case of HNPS, by the fact that the same word order and scope
interpretation are possible if the whole VP [das längste Buch gelesen] is
narrowly focused. In other words, not only does narrow focus in a quan-
tified NP permit in-situ positioning, but so does canonical Focus projec-
tion from that NP. Again, this is exactly the behavior found earlier for
HNPS in English. Although I do not have relevant examples, I would
expect the same results in (27) and (28) if the Focus was a subconstituent
of the direct object (e.g., contrastive focus on the noun Buch), again by
parallelism with the HNPS facts.
The overall relation of focus to topic in German can be summarized in
the following cascade of exceptions:
(30) NP must be in Case position
except if D-linked or wide scoped
except if narrowly focused or part of a canonical narrow
Focus.
RT derives this cascade from the competition of congruences that SS
must enter into.
In English, SS does not represent QS, but rather CS; thus, quantifier
ambiguities abound.
(31) He has read the longest book a couple of times.
Example (31) is ambiguous even if the whole sentence is the Focus (as it
would be, for example, in answer to the question, What happened?). The
two readings have the following structures:
(32) a. CS → SS ≇ QS (narrow scope for the longest book)
b. CS → SS ≅ QS (wide scope for the longest book)
By the logic of RT, (32a) is tolerated, in the face of (32b), because (32a)
gives a meaning that (32b) does not.
But it is not enough for a misrepresentation (or in classical terms, a
movement) to serve some purpose—it matters which purpose. For exam-
ple, HNPS is not justified simply to achieve QS → SS representation,
as (33) shows.
(33) *John gave to every FRIEND of mine a book. (every > a)
Rather, HNPS is justified only to achieve FS ≅ SS congruence, as estab-
lished earlier. Although it is conceivable that a language could work the
other way (since in fact German does), English does not. It does not be-
cause it rates CS representation over QS representation tout court.
In the main line of work within the ramified Pollock-style theory of
clause structure, the leftward positioning of topicalized NPs is achieved
by movement—that is, by the same kind of relation that wh movement is.
Evidence of movement comes from viewing the di¤erent positions an NP
can occupy under di¤erent interpretations, where positions are identified
with respect to adverb positions. This methodology has been thoroughly
explored in a variety of languages.
Van Riemsdijk (1996) has pointed out the following problem with this
methodology. In German the adverbs themselves seem subject to the
same dislocating forces as the NPs; that is, definite adverbs such as dort
‘there’ move leftward, compared with their indefinite counterparts such as
irgendwo ‘somewhere’, as the following paradigm illustrates:
(34) a. Ich habe irgendwem/dem Typ irgendwas/das Buch versprochen.
        I have someone/that guy something/the book promised
        ‘I promised someone/that guy something/the book.’
     b. *Ich habe irgendwas dem Typ versprochen.
     c. Ich habe das Buch dem Typ versprochen.
(35) a. Sie hat irgendwo/dort wen/den Typ aufgegabelt.
        she has somewhere/there someone/that guy picked up
        ‘She picked someone/that guy up somewhere/there.’
     b. ??Sie hat irgendwo den Typ aufgegabelt.
     c. Sie hat dort den Typ aufgegabelt.
     (Van Riemsdijk 1996)
Example (34) shows the relative ordering properties for a definite and an
indefinite NP, and (35) shows the same thing for an adverb and an NP: a
definite NP is bad after an indefinite adverb, but OK after a definite ad-
verb. This finding calls into serious question whether adverbs can be used
as a frame of reference against which to measure the movement of NPs. It
44 Chapter 2
also calls into question the notion that adverbs occupy fixed positions
in functional structure determined solely by what they are understood to
be modifying. And it suggests that everything, including adverbs, is mov-
ing in the same wind, or rather the same two countervailing winds of QS (= TS) and FS.
2.6 Hungarian Scope
Brody and Szabolcsi (2000) (B&S) present Hungarian cases just like the
German cases observed by Noonan and others cited earlier. That is,
moved quantifiers are unambiguous in scope, while unmoved ones are
ambiguous; but not moving has consequences for focus.
According to standard analyses since E. Kiss 1987, Hungarian quanti-
fied NPs (including the subject) are generated postverbally and then
moved to the left of the verb; leftward movement fixes scope. There are
two types of position to the left of the verb: a single Focus position im-
mediately to the left of the verb, and then a series of ‘‘Topic’’ positions to
the left of that, giving the following structure:
(36) [NPT NPT . . . NPF V . . . ]
To illustrate: (37a) is not ambiguous, but (37b) is ambiguous. This is
because in (37a) both NPs have moved, so their relative scope is fixed; but
in (37b) minden filmet has not moved, so it is scopally ambiguous.
(37) a. Minden filmet kevés ember nézett meg. (every > few)
        every film few people saw prt
     b. Kevés ember nézett meg minden filmet.
        few people saw prt every film
(B&S 2000, 8)
But B&S have provided a more fine-grained version of the facts. They report that the accent pattern of the sentence disambiguates (37b); in particular, if minden filmet is accented, then it has wide scope over kevés ember.

(38) a. Kevés ember nézett meg MINDEN FILMET. (every > few)
     b. Kevés ember nézett meg minden filmet. (few > every)
This is now a familiar pattern, the same one we have seen in German
and English; but how it arises in Hungarian remains to be spelled out,
and this requires a few remarks about the Hungarian FS and QS levels.
Topic and Focus 45
Like English, Hungarian allows multiple Focuses, and only one of
them can occupy the designated Focus position to the left of the verb.
Secondary Focuses can be located to the right of the verb; they cannot
occupy the positions to the left of the primary Focus, as these are Topic
positions. Thus, the Hungarian FS looks like this:
(39) Hungarian FS
[ . . . F V . . . (F) (F)]
The initial Focus position is the ‘‘normal’’ position for a single Focus; in
particular, it is the position from which Focus ‘‘projects’’ in Hungarian.
The postverbal Focus positions, if a sentence has any, are strictly narrow,
nonprojecting Focus positions.
From these remarks, we can see that the RT analysis of Hungarian is essentially the same as that of German: in particular, SS↔QS > SS↔CS (i.e., SS representation of QS dominates SS representation of CS). However, as in German, FS representation can tip the balance back.
Apart from considerations of focus, in order for minden filmet to have
wide scope, it would need to appear in preposed position, as it does in
(37a).
From the fact that preposing fixes relative scope among the pre-
posed elements, we can conclude that Hungarian QS has the following
structure:
(40) Hungarian QS
[QPi [QPj V . . . ]], where QPi has scope over QPj
And from the fact that apart from special focusing considerations, pre-
posing of quantified NPs is essentially obligatory, we can again conclude
that SS representation of QS dominates SS representation of CS.
We can see the two requirements of (39) and (40) interacting exactly in
the case of a wide scope focused quantified NP. If there is a single Focus,
it must occur in the single preverbal canonical Focus position, to satisfy
Focus representation. Such representation will also fix its scope. But if
there are two Focuses, only one can appear preverbally. The other must
appear postverbally, for the reason already discussed.
The following problem then arises. Suppose the second Focus is to
have wide scope, the situation of minden filmet in (38a). A case like this
has the following representational configuration:
(41)
As is clear, QS is misrepresented by SS. Ordinarily, this would not
be tolerated, but in this special circumstance SS representation of FS
compensates.
If, on the other hand, minden filmet is not a Focus, as in (38b), then it
must move in order to take wide scope; the reason is that the match with
FS will not be improved by not moving, whereas the match with QS will
be. In other words, for (38b) the following three structures will be in
competition with each other:
(42)
Leaving CS representation aside, (42b) and (42c) are clearly superior to
(42a), as (42a) has a misrepresentation of QS. But (42b) and (42c) repre-
sent different meanings: (42b) has wide scope for QPi and (42c) for QPj.
(42c) and (42a) are competing for representation of wide scope for QPj,
and (42c) wins. The result is that (42b) must be the representation for
(38b) where the second quantifier minden filmet is unmoved, and so it
must have narrow scope. The difference that focus makes is that (42c) is
not a viable candidate to represent focus on the second NP, and so (42a)
wins unopposed.
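The competition among (42a–c) can be sketched as a small ranked-choice procedure. This is my own toy formalization, not the book's: the candidate tuples and the `viable` flag (standing in for independent requirements such as Focus placement) are illustrative assumptions.

```python
# Toy sketch of the RT candidate competition in (42): surface orders
# compete to express a given scope order, and a candidate is in the
# running only if it is "viable" -- compatible with independent
# requirements such as where a Focus must sit.

def best_ss(qs_scope, candidates):
    """qs_scope: tuple of QPs, widest scope first.
    candidates: list of (surface_order, viable) pairs.
    Prefer viable candidates whose surface order matches the scope order."""
    viable = [ss for ss, ok in candidates if ok]
    faithful = [ss for ss in viable if ss == qs_scope]
    return faithful[0] if faithful else viable[0]

# No focus on QPj: the moved order (cf. (42c)) is viable and wins.
unfocused = [(("QPi", "QPj"), True), (("QPj", "QPi"), True)]
assert best_ss(("QPj", "QPi"), unfocused) == ("QPj", "QPi")

# QPj focused: the moved order is not viable (the Focus must stay
# postverbal), so the unmoved order (cf. (42a)) wins unopposed,
# even though it misrepresents QS.
focused = [(("QPi", "QPj"), True), (("QPj", "QPi"), False)]
assert best_ss(("QPj", "QPi"), focused) == ("QPi", "QPj")
```

The point of the sketch is that whether a mismatch is tolerated depends on what other candidates exist, not on any property of the mismatching candidate alone.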
By similar reasoning, we can explain why two preverbal QPs have fixed
scope. I will assume that neither is focused, so QS representation is all
that is at stake. The canonical mapping gives the surface order (43b), so
the question is why the noncanonical mapping in (43a) is barred.
(43)
It turns out that (43a) is blocked by an alternative surface order, which
represents the scope order perfectly.
(44) SS: QPj QPi V
     QS: QPj QPi V
Thus, these Hungarian cases pattern just like the German cases con-
sidered earlier. Given this parallelism, one would expect parallels to the
cases in German in which the apparently ‘‘moved’’ phrases are not the
focused phrases themselves, but projections of the focused phrases, or
subparts of the focused phrases. I do not know the pertinent facts.
B&S give a different analysis of the ambiguity of (37b). In their view, on the wide scope reading for minden filmet it has the structure in (45), and the reason minden filmet has wide scope is that it is structurally higher than the subject+V.

(45) [Kevés ember nézett meg] minden filmet.

The problem this analysis raises is of course that the subject+V is not a natural constituent. However, in the framework adopted by B&S it is: it arises in a derivation in which both NPs are preposed to a position in front of the verb.
(46) minden filmet [kevés ember [nézett meg t t]]FP
The traditional Hungarian Focus position is the position immediately
preceding the verb; accepting this traditional account, B&S call the constituent consisting of the VP and the first-to-the-left NP FP. Then this entire FP is itself preposed, giving the structure in (47).

(47) [Kevés ember [nézett meg t t]]FP [minden filmet t].
That is, the derivation proceeds by hyper-Kaynian remnant movement.
There are some special problems here for analyses that use remnant
movement. The first is that such analyses cannot be applied to German, for reasons given in the preceding section; nor can they be applied to English HNPS, also for reasons already given—essentially, the two-way failure of correspondence between the Focus and the moved constituent. But there is a problem peculiar to Hungarian itself. The remnant movement of the subject+V is actually a movement of the entire FP, which consists of the entire VP and the focused constituent that immediately precedes it. So, one would expect any phrase that was a part of the VP to show up to the left of the in-situ QP; but in fact, such phrases (videón ‘on videotape’, in the following example) can appear either before or after that QP,
(48) a. Kevés ember nézett meg videón minden filmet.
     b. Kevés ember nézett meg minden filmet videón.

and the scope of minden filmet in both cases can be construed as wide (B. Ugrozdi, personal communication). Example (48a) is compatible with all theories, but (48b) is mysterious for B&S’s account, as it must have the following structure:

(49) [Kevés ember [nézett meg]VP] minden filmet tFP videón.
Somehow videón has escaped the VP (and FP), to the right. Pursuing the
logic of radical remnant movement, we might assign this example the
following structure, in which the apparent rightward movement of videon
is really the result of its leftward movement, plus radical leftward rem-
nant movement:
(50) a. [kevés ember minden filmet videón [nézett meg t t t]] →
     b. [nézett meg t t t] [kevés ember [minden filmet [videón t . . .
But the problem with this is that there should be no space between minden
filmet, which is focused, and the verb, as the Focus must always precede
the verb directly.
The general character of the problem that Hungarian poses for check-
ing theories of focus and topic is no different from what we have seen for
other languages: Checking Theory armed with triggering features for
focus and topicalization will wipe out any trace of Case and theta struc-
ture: once a remnant movement has taken place, all trace of Case and
theta structures is invisibly buried in entirely emptied constituents. This
consequence of remnant movement does not seem to hold empirically.
2.7 Spanish Focus
We have adopted the ‘‘answer to a question’’ test for identifying normal
focus. English allows normal focus anywhere, not just on the right edge,
as the constitution of FS would lead us to expect.
(51) A: Who did John give the books to t?
B: John gave MARY the books.
This can be taken to show that English allows FS to be misrepresented by
SS, sacrificed in this case for accurate CS representation.
(52)
Spanish, on the other hand, does not seem to permit nonfinal normal
Focuses—at least, not as answers to questions.
(53) A: Who called?
     B: *JUAN llamó por teléfono.
         JUAN called
     (Zubizarreta 1998)
     B′: Llamó por teléfono JUAN.
(54) Spanish
FS↔SS > . . .
The logic of this chapter suggests that Spanish differs from other languages in favoring FS↔SS representation over all others. The fact that Spanish has a subject-postposing rule (as illustrated in (53B′)) aids it in meeting this requirement, though RT does not causally connect the ungrammaticality of (53B) with the presence of the postposing rule. One reason for making no such connection is that other languages with subject postposing (specifically Italian; see (55)) permit both (53B) and (53B′). The ungrammaticality of (53B) follows directly from the ranking in (54). A related but different approach to the problem would be to allow Spanish to have the same FS as English, and to block (53B) by (53B′)—that is, to say that the mere availability of (53B′) is enough to ensure that (53B) is blocked. I think this is the wrong approach in general. First, there are languages like Italian, where the analogues of both (53B) and (53B′) are grammatical.
(55) A: Who called?
     B: GIANNI ha urlato.
        GIANNI has called
     B′: Ha urlato GIANNI.
     (Samek-Lodovici 1996)
Second, even in a language like English, which lacks subject postposing,
we can create cases where the same logic would apply, blocking com-
pletely grammatical answer patterns like (56B).
(56) B: I gave the SATCHEL to Mary.
     B′: I gave to Mary the SATCHEL.

Clearly, the alternative order in (56B′) does not compete with the order in (56B), or at least it does not win.
In German and English we saw that focus considerations can counter-
vail requirements of scope assignment. In Spanish we would expect focus
considerations to override requirements of scope assignment. That is, we
should find cases where NPs are obligatorily mis-scoped in surface struc-
ture because of overriding focus requirements. I do not have the relevant
facts at the moment. There is one methodological obstacle to getting
relevant facts: we have identified normal focus with answerhood, but
answers to questions generally take wide scope.
This is not to say that Spanish lacks any sort of Focus non-phrase-
finally—it lacks only the kind of Focus that is needed for answering
questions. Zubizarreta (1998, 76) gives the following example:
(57) JUAN llamó por teléfono (no PEDRO).
     JUAN called            not PEDRO
Here a phrase-initial accented NP can serve as a contrastive Focus—just
where it cannot serve as a Focus for the purpose of answering questions.
In chapter 9 I will embed a theory of contrastive versus normal focus in a
theory of the values assigned at each level: FS will be the input to ques-
tion interpretation, but Accent Structure (a level to be introduced in
chapter 9), which normally ‘‘represents’’ FS by matching an accented
phrase to a focused phrase at FS, will be shown to give special meta-
linguistic effects when FS is not canonically represented, as in (57).
What happens in Spanish when a normal Focus cannot be postposed,
for some reason intrinsic to the structural (i.e., CS- or SS-related) restric-
tions in the language? It is not clear, as it is difficult to form a question in
Spanish where the question word is nonfinal, because postposing and
reordering always seem to permit postposing. Nevertheless, small clause
constructions might be relevant cases.
(58) A: Con quién llegaron enferma?
        with who arrived sick
        ‘Whoi did they arrive with sicki?’
     B: Llegaron con MARÍA enferma.
     B′: *Llegaron con enferma MARÍA.
     B″: *Llegaron enferma con María.
     (J. Camacho, personal communication)
As the translation indicates, the PP con MARÍA modifies the verb, and the adjective enferma (with feminine ending) modifies María and so enters into some kind of secondary predication relation with it. That predication relation does not permit postposing, of either María or the PP con María.
In that case the normal Focus can be nonfinal, as in (58B). This shows
that Spanish does permit nonfinal normal Focuses, but only when it has
no choice.
What does it mean to have no choice? In RT it must mean one of two
things. First, it could mean that the representing level simply has no form
that corresponds to [V PP AP], the form of the VP in (58B′). Second, it could mean that SS, in addition to representing FS, must also represent
some other structure, presumably the one in which small clause predica-
tion is adjudicated, and that the call to represent that structure is stronger
than the call to represent FS. As I have no considerations favoring one
over the other, I will let the question stand.
2.8 Russian Subjects
Russian exhibits the same behavior we found in German scrambling and
English HNPS: obligatory leftward positioning of elements unless they
are narrowly focused.
(59) a. Usi zalozilo.
        ears.acc.pl clogged-up.neut.sg
        (Lavine 1997)
     b. *Zalozilo usi.
        (unless usi is narrowly focused)
        (S. Harves, personal communication)
The only argument to zalozilo is the accusatively marked internal argu-
ment usi; one would normally expect it to appear postverbally, as other
such internal arguments would. But in fact that is not the normal order
for such sentences; rather, the order in which the argument occurs pre-
verbally is the normal order. It is normal in the sense that it is the only
order, for example, in which Focus projects, and so the only focus-neutral
order.
The difference between German and Russian lies in the freedom with
which arguments can cross the verb. Nothing like Holmberg’s general-
ization holds in Russian.
There are two ways to account for this state of affairs. I will outline
them, without choosing between them.
The first possibility, the simpler of the two, is that Russian FS imposes
the NP V order, in that such a structure is the only one from which Rus-
sian permits Focus projection. In other words, Russian FS has the fol-
lowing structures, among others (where ′ marks accented positions).

(60) Russian FS
     a. [NP′ V NP″]F
     b. [NP′ V]F
     c. [V NP′F]
The pattern in (60b) is in fact the pattern for Focus projection in
English intransitive sentences.
(61) a. One of my friends′ died.
     b. One of my friends died′.
If the main accent is on died, as in (61b), then died also bears narrow
focus; but if it is on friends, as in (61a), then it can project to the entire
sentence.
Under this regime the derivation of (59a,b) would look like this:
(62) a. CS: [zalozilo usi] ↔ SS: [zalozilo usi] ↮ FS: [usi zalozilo]F
     b. CS: [zalozilo usi] ↔ SS: [zalozilo usi] ↔ FS: [zalozilo usiF]
In this scheme Russian has a notion of subject at FS in the sense that
only structures with a preverbal NP allow projection. But the requirement
that there be a subject could arise somewhat earlier, so long as it did not
arise as early as CS, or wherever nominative Case is assigned, because it
clearly has nothing to do with nominative Case. Suppose, for concrete-
ness, that there is an SS requirement that there be a subject, which must
be met even if there is no nominative. Given such a requirement, surface
structures would have to have the following form, where the first NP is
the ‘‘subject’’:
(63) SS: [NP V . . . XP]
In that case the structures assigned to (59a) would look like this:
(64) CS: [zalozilo usi] ↮ SS: [usi zalozilo] ↔ FS: [usi zalozilo]F
That is, SS misrepresents CS, but faithfully represents the Focus-
projecting FS. Because constraints within levels are inviolable, the surface
structure for (59b) must be the same as the surface structure for (59a); but
then, the ‘‘heard’’ output is wrong, since the form of (59b) is Zalozilo usi.
In order to model the facts, there must be a ‘‘heard’’ representation that is
subsequent to SS; suppose that FS is such a representation. Then, FS will
(mis)represent SS, rather than the reverse, and the following derivation is
possible:
(65) CS: [zalozilo usi] ↮ SS: [usi zalozilo] ↮ FS: [zalozilo usiF]
Here SS misrepresents CS, as it must in order to meet the SS subject
requirement; in addition, FS misrepresents SS, presumably in order to
achieve narrow focus on usi.
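The two-step derivation in (63)–(65) can be sketched as a pipeline of mappings. The encoding below is my own assumption, not the book's formalism: word orders are simple tuples, and the SS subject requirement and narrow focus are modeled as reordering operations.

```python
# Minimal sketch of the second Russian account, (63)-(65): SS imposes a
# preverbal "subject" even when CS is verb-initial, and FS then maps
# from SS, reordering only to place a narrowly focused NP postverbally.

def ss_of(cs):
    """CS is verb-initial, e.g. ('zalozilo', 'usi'); SS fronts the NP
    to satisfy the subject requirement in (63)."""
    v, np = cs
    return (np, v)

def fs_of(ss, narrow_focus=False):
    """FS represents SS faithfully, unless narrow focus on the NP
    forces it into final (accented) position, as in (65)."""
    np, v = ss
    return (v, np) if narrow_focus else (np, v)

cs = ("zalozilo", "usi")
assert ss_of(cs) == ("usi", "zalozilo")               # (64): SS misrepresents CS
assert fs_of(ss_of(cs)) == ("usi", "zalozilo")        # (64): FS faithful to SS
assert fs_of(ss_of(cs), True) == ("zalozilo", "usi")  # (65): narrow focus on usi
```

Note that both mismappings of (65) are localized in a single step each, which is what lets FS, as the "heard" level, diverge from SS without any level-internal constraint being violated.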
In later chapters I will adopt two elements from this analysis: (a) that
FS represents SS, rather than the reverse, and (b) that different levels have different ‘‘subjects.’’
The notion ‘‘subject’’ could be a feature of many different levels, but with predictably differing properties, if the properties depend on the prop-
erties of the levels themselves. Shape Conservation will tie the subjects
together: in the canonical mapping between levels, subject at one level
will map to subject at the next. See section 3.2.2 for a generalization of
the notion ‘‘subject’’ across the levels.
In the second account SS has a notion of subject that motivates the first
mismapping. This notion of subject is completely analogous to the Ex-
tended Projection Principle (EPP), since it is understood as a requirement
distinct from Case assignment in minimalism. See Lavine (forthcoming)
for extended argument for this arrangement in a minimalist account of
Russian.
What Russian adds to the picture developed here is the fact that com-
plement and verb can reorder in mismapping. In Germanic and Romance
any reordering of complement and head is associated with Case, and the
evidence for separating the EPP from Case has come largely from exple-
tive constructions. Lavine’s work establishes that the phenomenon is a
good deal more general. I will return to Russian impersonal verbs in
chapter 5, after necessary notions about the RT levels are introduced in
chapter 3.
At present we have no means to weigh the relative cost of mismapping
that respects head order and mismapping that does not. In Williams
1994b, in a different theoretical context, I proposed the principle TRAC,
which suggested that reordering (for scrambling) was compelled to main-
tain the theta role assignment configuration, which among other things
specified the directionality of theta role assignment; but clearly this is not
generally true. Still, although I have no concrete suggestion to offer at this
point, I am tempted to think that reorderings that violate TRAC are
more costly than reorderings that do not.
2.9 Conclusion
2.9.1 Semantics of Form
The facts pertaining to the interaction of scrambling, topic, and focus
provide a rich testing ground for theories attempting to account for cor-
relations between syntactic form and meaning. Checking Theory provides
a simple account, interesting if correct because it assumes a straightfor-
ward compositional semantics: interpretable features are interpreted in
situ, accounting for meaning, and they act as syntactic attractors, ac-
counting for form. But for the constructions examined here, this account
does not seem to work; instead, what we find is a holistic matching of a
clause structure with a Case structure on the one hand and a quantifica-
tion structure on the other, without the possibility of reducing the inter-
relations involved in the match to a set of triggered movement relations.
This is because the possibility of mismatching two structures depends
crucially on what other structures exist, and because the ‘‘moved’’ con-
stituent does not correspond to the constituent on which the interpreta-
tion turns.
Perhaps the most radical conclusion that can be drawn from this is that
semantics is not compositional in a significant sense: the quantification
structure of a clause is fixed holistically, by matching a surface structure
with an independently generated quantification structure, and how that
match works is determined by what other matching relations the sur-
face structure enters into. To this extent, the quantification and focus
structures of a sentence are not determined by a strictly compositional
computation.
If this conclusion is accepted, then we must account for why semantics
appears to be compositional. I think we can best understand this by con-
sidering the question, when would a pattern-matching theory of semantics
be fully indistinguishable from a compositional semantics? The answer is,
when every possible attempt to match succeeded—when for any given
quantification structure there was a surface structure that fully matched a
Case structure and a focus structure, so that full isomorphism held across
the board. In that case we could use either theory interchangeably; the
result would always be the same. If the conclusion of this chapter is cor-
rect, English and German approximate this state, but neither achieves it,
and in fact they deviate from it in different ways. The approximation is
close enough that if only a narrow range of facts is examined in any one
analysis, the failure of compositionality will escape detection. Given sub-
stantive conclusions about the nature of each of the sublanguages, it
is probably inevitable that a completely isomorphic system would be
impossible.
2.9.2 How Many Levels?
How many levels are there? In this chapter I suggested four or five (CS,
TS, SS, QS, FS). At different points in what follows, I will talk about models with different numbers of levels. What is the right number? If we
had the right number, and the properties of each, we would pretty much
have a complete theory. I have nothing like that. What I have instead is
evidence for a number of implicational relations of the sort, ‘‘If property
A occurs in level X and property B occurs in later/earlier level Y, then it
follows that . . .’’; and in fact the discussion in this chapter has had exactly
this character. These implicational predictions exploit the main idea
without requiring a full theory, and seem sufficiently rich to me to en-
courage further investigation into what might be viewed as a family of
representation theories.
Every theory—or more properly, every theoretical enterprise—has
at least one open-ended aspect to it. For example, different Checking Theories propose different numbers of functional elements and different numbers of features distributed among them. It is no trivial matter to
determine whether some group of checking analyses, and the Checking
Theories that lie behind them, are compatible with one another, and
consequently whether there is a prospect of a final Checking Theory that
is compatible with all of those analyses. What makes them all Checking
Theories is that they all have the same view of the design plan of syntax:
they all incorporate some notion of movement governed by locality or
economy that results in checked features, which are used up.
The same is true of representation theories. In chapter 4 I introduce a
new level, Predicate Structure. The reason for the new level is that the
levels determined by the considerations in chapters 1–3 do not allow
enough distinctions. In introducing the new level, I assume, basically
without demonstration, that it is compatible with the results of the previ-
ous chapters. In chapter 9 I introduce a new kind of level, Accent Struc-
ture, for focus. Again, I do so because the levels proposed earlier do not
allow enough distinctions, and I hope that the newly extended theory is at
least compatible with the results of this chapter. One can see repeating
itself here the history of the development of Checking Theories. Many
journal articles are devoted simply to achieving some descriptive goal by
splitting some functional element into finer structure.
Much the same can be said of OT. There, the content of the constraints
themselves is not fixed, nor is the architecture (division into modules) of
the linguistic system. So the number of ‘‘Optimality’’ Theories is enor-
mous and varied, but we are still justified in calling them Optimality
Theories if they hew to the basic tenets: the calculus for evaluating can-
didate structures against a set of constraints, and the notion that all vari-
ation reduces to constraint ordering.
In like manner, I would reserve the term Representation Theory for any
theory that posits multiple syntactic levels in a shape-conserving relation
to one another, whatever the levels turn out to be. To that, I would like to
add one other substantive hypothesis, the Level Embedding Conjecture of
chapter 3, if for no other reason than I feel that the most interesting pre-
dictions follow from the model that incorporates that idea. A number of
things can be inferred about this class of theories, things that are inde-
pendent of various decisions about what the levels are.
The correct RT will have no fewer levels than are envisioned in this
chapter. Can we see enough of how the methodology works to gain some
rough idea about what the final model might look like? I think the limit-
ing case is an RT with exactly the same number of levels as there are
functional elements in the structure of a clause in the corresponding
Checking Theory. That is, it would not have a ‘‘Case Structure’’; rather,
it would have an ‘‘Accusative Structure’’ and a ‘‘Dative Structure.’’
Likewise, it would not have a Theta Structure; rather, it would have a
Patient Structure and an Agent Structure. I think this limiting case is not
correct, because there appear to be functional subgroupings of these
notions: patient and theme seem to be part of a system with certain
properties, as do accusative and dative. But even if this limiting case
turned out to be correct, RT would not thereby become a notational
variant of Checking Theory, because the architecture is different, and the
architecture makes predictions that Checking Theory is intrinsically inca-
pable of. I turn to those predictions in the next chapter.
Chapter 3
Embedding
In the preceding chapters the levels of RT have been used to account for
word order facts of a certain sort: mismapping between levels has been
invoked as a means of achieving marked word orders with certain inter-
pretive effects. In this chapter I will sketch other properties of the levels
and indicate how certain high-level syntactic generalizations might be
derived from the architecture of the model in a way that I think is un-
available in other theoretical frameworks.
I will consider two kinds of embedding here, complement embedding
and functional embedding, and I will treat them very differently. Suppose
we accept the notion that there is a fixed hierarchy of functional elements
(T, Agr, etc.) that compose clause structure (and similar sets for other
phrase types). Functional embedding is then the embedding that takes
place within one fixed chain of such elements—embedding AgrO under T,
for example. Complement embedding is the embedding that takes place
between two such chains—embedding NP or CP under V, for example.
In this chapter I suggest that complement embedding takes place at
every level, with different complement types entering at different levels. The result is an explanation of the range of clause union effects and a
derivation of a generalized version of the Ban on Improper Movement.
The methodology is pursued further in chapters 4 and 5, resulting in what
I call the LRT correlations: for any syntactic process, three of its prop-
erties will inevitably covary, namely, its locality, its reconstructive behav-
ior, and its target (e.g., A or A position). These properties are tied together
by what level they apply at, and in particular by what complement types
are defined there. In chapter 4 I show that anaphors are ‘‘indexable’’ in
this way by level, with predictably varying properties across the levels.
English himself, for example, is a CS anaphor, whereas Japanese zibun is
an SS anaphor; ideally, all properties are determined by those assignments,
and earlier anaphors ‘‘block’’ later ones by general principle (the Level
Blocking Principle). In chapter 5 I do the same for scrambling rules. The
predictions bound up in these correlations rely on the feature of RT that
does not translate into minimalism or other theories, namely, the decom-
position of clause structure into distinct sublevels or sublanguages.
In chapter 7, turning to functional embedding, I propose an axiomati-
zation of X-bar theory that reduces head-to-head movement to X-bar
theory, accounting for its locality and especially for its restriction to a
single clause structure. In chapter 8 I take up the morphological con-
sequences of this account. In RT a lexical item is understood as ‘‘lexical-
izing’’ or ‘‘representing’’ a subsequence of functional structure.
3.1 The Asymmetry of Representation
Before turning to complement embedding, I need to make a point about
representation that is entailed by the account I will give. Representation
will necessarily be an asymmetric relation in the model that embraces the
results of this chapter, for reasons having to do with how embedding is
accomplished.
By hypothesis, all levels are involved in embedding (the Level Embed-
ding Conjecture; see section 3.2.1). Functional elements are themselves
associated with particular levels. Tense, for example, is not defined before
SS, and so enters structures there at the earliest. Consequently, there will
be representation relations that systematically violate isomorphism. For
example:
(1) TS: [agent [V theme]] ↔ CS: [NPnom [VT NPacc]]T
There is at least one element in CS—namely, the T(ense) marking—that
is absent from TS in (1); hence, there is not a two-way one-to-one map-
ping between the two sets of structures.
Despite the lack of isomorphism, such relations will count as com-
pletely true mappings, not mismappings. The reason is that the represen-
tation relation itself will have an asymmetric definition. To take TS↔CS
as a special case, true representation will have the following properties:
(2) a. Every item in TS maps to an item in CS.
b. Every significant relation between items in TS maps to a relation
in CS (for relations like ‘‘head of ’’).
Importantly, (2) does not impose the reverse requirements: that every
item in CS be mapped to an item in TS, and so on. If (2) defines repre-
sentation, then representation is not really isomorphism, but homomor-
phism, and so is asymmetric. A homomorphism is like an isomorphism in
being structure preserving and therefore reversible where defined; but the
reverse mapping is not defined for the full range. Representation must be
asymmetric if new lexical or functional material enters at each level, as the hypotheses to be
entertained in this chapter will require. The Case structure in (1) includes
more than the theta structure (T, in particular), but it can still be said to
represent the theta structure in (1), if (2) is true. Under this view the mis-
mappings described in chapter 2 are now to be viewed as deviations from
homomorphism, rather than from isomorphism.
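The asymmetric definition in (2) can be put in concrete terms. The following sketch is my own illustrative encoding, not part of RT: a level is rendered as a set of items plus a set of labeled relations, and the check enforces exactly (2a) and (2b) in one direction only, so that extra CS material such as T does no harm.

```python
# A level is encoded here (illustratively) as a set of items plus a
# set of labeled relations over those items.
TS = {
    "items": {"agent", "V", "theme"},
    "relations": {("head of", "V", "VP"),
                  ("subject of", "agent", "VP")},
}
CS = {
    "items": {"NPnom", "V", "T", "NPacc"},   # note the extra T
    "relations": {("head of", "V", "VP"),
                  ("subject of", "NPnom", "VP")},
}
mapping = {"agent": "NPnom", "V": "V", "theme": "NPacc"}

def represents(source, target, m):
    """The asymmetric relation of (2): (a) every source item maps to a
    target item; (b) every significant source relation maps to a target
    relation. Nothing is required of target material outside the image
    of m (such as T in CS)."""
    if not all(m.get(i) in target["items"] for i in source["items"]):
        return False                                    # (2a) fails
    mapped = {(r, m.get(x, x), m.get(y, y))
              for (r, x, y) in source["relations"]}
    return mapped <= target["relations"]                # (2b)

print(represents(TS, CS, mapping))   # True: the homomorphism holds
inverse = {v: k for k, v in mapping.items()}
print(represents(CS, TS, inverse))   # False: T has no TS correspondent
```

Run both directions and the asymmetry of (1) falls out: TS ⇝ CS succeeds, while the reverse fails on the unmapped T.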
No kind of embedding is immune. Adjuncts will also enter clause
structure at later levels, perhaps at all levels. Wh movement itself is not
defined until SS, presumably also the level where CP structure is defined
(or, where it takes IP), and so any adjuncts that are themselves CPs (such
as when, where, and why clauses) involving wh movement cannot enter
until that point either.
Let us look at a concrete example involving adjuncts. (3) is a fully valid
representation relation; the tree on the right obviously has more in it, but
that doesn’t matter if all the items and relations in the first tree have cor-
respondents in the second.
(3)
(4) Preserved relations
V head of VP
NP1 subject of VP
NP2 object of V
NP1 left of VP
The new item, the adverb, and the new relations it enters into with the
rest of the sentence do not interfere with the representation relation.
In what follows I will speak of the representation relation as holding
sometimes between two levels or sublanguages, sometimes between two
members (or trees) of those levels or sublanguages, and even sometimes
Embedding 61
between subparts of trees in different levels. It is of course the fact that
the representation relation preserves the structure of one level in the
structure of the next level that makes it possible to slip from one to an-
other of these usages.
Wh movement takes place within the SS level, in the following way. A
structure in CS is mapped into a very similar structure in SS; wh move-
ment derives another structure within SS; and that structure (at least in
languages with overt wh movement) is then mapped to a structure in FS.
(5)
As in previous chapters, the wavy arrow (⇝) marks a representation
relation, and now the straight arrow marks an intralevel derivational
relation. So the structure has ‘‘grown’’ a SpecC in SS. In effect, the Case
structure is mapped isomorphically to a subpart of the surface structure
that carries forward (backward?) from there.
Exactly how the functional elements, ‘‘real’’ movement rules, and so
on, sort out into levels remains to be fixed empirically. But in advance of
that, this chapter lays out a theory that says that all the important prop-
erties of the items will in turn be fixed by that choice.
Some processes, elements, and such, may be defined at more than one
level. For those cases, two of which are anaphors and scrambling rules,
the model has further consequences: blocking holds between levels, so
‘‘early’’ elements always block ‘‘late’’ elements (see Williams 1997 for
further discussion).
It should be clear that there is a relation between the levels of RT and
the layers of functional structure in standard Checking Theories. The
asymmetry noted above is fully consistent with this. Later levels of RT
correspond to higher layers in functional structure. In particular, later
levels have ‘‘bigger’’ structures than earlier levels: I will suggest below
that CP exists in SS, for example, but only IP exists in some earlier
structure (CS or PS). For some considerations, it will be simple to trans-
late between RT and Checking Theories, because of the ‘‘higher equals
later’’ correspondence that holds between them. I will naturally dwell on
those considerations for which there appears to be no easy translation
from RT to Checking Theory in order to efficiently assess the differences
between them.
3.2 Complement Embedding and the Level Embedding Conjecture
I will suggest in this section that each of the RT levels defines a different
complement type and that all complement types are embeddable. The
complement types range from the very ‘‘small’’ clauses at TS to the very
‘‘large’’ clauses at FS. The range of complement types corresponds to
the degree of clause union that the embedding involves: TS complements
are very tight clause union complements (like serial verb constructions),
whereas FS complements are syntactically isolated from the clause they
are embedded into. This difference follows immediately from the model
itself: RT automatically defines a range of types of embedding comple-
ments, one type defined at each level, as summarized in (6).
(6) Types of embedding
TS objects: serial verb constructions (VPs?)
CS objects: exceptional Case marking; control? (IPs)
SS objects: transparent that clause embedding (CPs)
FS objects: nonbridge verb embedding (big CPs)
On the right I have indicated the category in standard theory to which
the objects defined at each level correspond. This correspondence cannot
be taken literally as a statement about what objects are defined in each
level of RT, because different RT levels define different types of objects
altogether. For example, TS does not define VPs; rather, it defines theta
structures, which consist of a predicate and its arguments. Nevertheless,
the objects in the RT level of TS correspond most closely to the VPs of
standard theory, and so on for the rest of the levels in (6).
This aspect of embedding is a ramified ‘‘small clause’’ theory, with
small, medium, large, and extra large as available sizes. In a strict sense,
the structures ‘‘grow’’ from left to right, theta structures being the small-
est and focus structures the largest.
3.2.1 The Level Embedding Conjecture
There are thus many types of embeddable complements under a ramified
small clause theory, but where does embedding take place? One way to
treat complement embedding in RT would be to do all embedding at
TS. Complex theta structures would be mapped forward into complex
Case structures, and so on; and higher clause types would then be
‘‘recycled’’ back through TS for complement embedding, as the diagram
in (7) indicates.
(7)
This arrangement would make RT most resemble minimalist practice and
its antecedents. I think, though, that much can be gained by a different
scheme: the one already alluded to, in which different kinds of embedding
are done at different levels. As there seem to be different ‘‘degrees’’ or
‘‘types’’ of embedding with respect to how isolated from one another the
matrix and embedded clauses are, we might gain some insight into them
by associating the different types with different levels in RT. I will refer
to this theory of embedding as the Level Embedding Conjecture (LEC). In
RT the LEC is in a way the simplest answer to the question of how
embedding is done: it says that an item can be embedded exactly at the
level at which it is defined, and no other.
(8)
For example, the tightest clause union e¤ects can be achieved by
embedding one theta structure into another in TS, deriving a complex
theta structure, which is then mapped into a simple Case structure. The
behavior of such embedding is dominated by the fact that there are too
many theta roles for the number of Cases, so some kind of sharing or
Case shifting must take place. A good example of this is serial verb con-
structions, where two theta role assigners (i.e., verbs) must typically share
a single Case-marked direct object, and where there must be a tight se-
mantic relation between the two.
At the other extreme, that clause embedding takes place much later, in
SS for example. What does a derivation involving that clause embedding
look like? Two clauses (matrix and embedded) both need to be derived to
the level of SS, at which point one is embedded in the other.
(9) TS:                   CS:                   SS:
    [Bill, [believes]]  ⇝ [Bill, [believes]]  ⇝ [Bill [believes]]   +
    [Mary, [ate a dog]] ⇝ [Mary [ate a dog]]  ⇝ [Mary [ate a dog]]  →
                                                [Bill [believes [Mary [ate a dog]]]]
The verb believe is subcategorized to take an SS complement. This sub-
categorization is always taken to determine not only the type of the
complement, but also the level in which the embedding takes place; it is
this double determination that generates the broad consequences alluded
to at the beginning of this chapter, and detailed below.
Before we turn to the details of embedding at different levels, a word
about the notion ‘‘lexical item’’ in RT. Lexical items obviously partici-
pate in multiple representations. Ordinarily the entries in the lexicon are
regarded as triples of phonological, syntactic, and semantic information.
In RT lexical items are n-tuples of TS, CS, . . . , and phonological infor-
mation. For example, the theta role assigner squander, which assigns a
theme and an agent role in TS, is related to the Case assigner squander,
which assigns accusative Case in CS; to the surface verb squander with its
properties, whatever they are; and so on.
(10) squander TS: [agent [squander theme]]
CS: [squander accusative]
SS: . . .
. . .
Part of the algorithm that computes isomorphism between levels clearly
takes into account identity of lexical items across different levels; thus,
(11a) and (11b) will count as isomorphic, but (11c) and (11d) will not.
(11) a. [agent [squander [theme]]] ⇝ b. [nominative [squander accusative]]
     c. [agent [squander [theme]]] *⇝ d. [nominative [squash accusative]]
Lexical entries such as (10) are the basis for such identities. The rest of
this chapter assumes something like this conception of the lexicon, actu-
ally just the obvious elaboration of the usual assumption.
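The way lexical identity enters the isomorphism computation can be sketched as follows. The encoding is hypothetical (tuples standing in for the entry in (10); position 1 standing in for the head): two structures count as isomorphic only if their heads are guises of one and the same lexical entry.

```python
# Hypothetical rendering of lexical entries as n-tuples of guises,
# as in (10); names and format are my own.
LEXICON = {
    "squander": {"TS": ("agent", "squander", "theme"),
                 "CS": ("nominative", "squander", "accusative")},
    "squash":   {"TS": ("agent", "squash", "theme"),
                 "CS": ("nominative", "squash", "accusative")},
}

def isomorphic(ts_struct, cs_struct):
    """Cross-level identity as in (11): the structures match only if
    their heads (middle positions here) are guises of a single
    lexical entry."""
    return any(entry["TS"][1] == ts_struct[1]
               and entry["CS"][1] == cs_struct[1]
               for entry in LEXICON.values())

# (11a)/(11b): the same entry, squander, at both levels
print(isomorphic(("agent", "squander", "theme"),
                 ("nominative", "squander", "accusative")))  # True
# (11c)/(11d): squander at TS but squash at CS
print(isomorphic(("agent", "squander", "theme"),
                 ("nominative", "squash", "accusative")))    # False
```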
3.2.1.1 TS Embedding As mentioned above, the lowest level of embedding
is associated with the strongest clause union effects, since a complex
theta structure is represented by a simple Case structure. Consider
for example the following serial verb constructions from Dagaare (12a)
and ǂHoan (12b):
(12) a. o    da    mOng  la       saao  de    bing  bare   ko    ma
        3sg  past  stir  factive  food  take  put   leave  give  me
        (Bodomo 1998, (32))
     b. ma   a-    qkhu  j’o     djo    ki    kx’u  na
        1sg  prog  pour  put.in  water  part  pot   in
        ‘I am pouring water into a pot.’
        (Collins 2001)
In the serial verb construction the clause contains several verbs, each
thematically related in some way to at least some of the objects. Signifi-
cantly, there is a single direct object, and a single indirect object. We can
view this as a combination of two theta structures, followed by a subse-
quent representation by a single Case structure.
(13) TS: {V1 theme} + {V2 theme} + {V3 theme, goal}
         = {V1 V2 V3 theme goal}
     ⇝ CS: [VCase assigner NP NP]
In other words, three simple theta structures, one for each V, are com-
bined into a complex theta structure, and that is mapped onto a simple
ditransitive Case structure.
It is typically remarked in connection with such constructions that the
connection between the verbs is extremely tight semantically, so tight that
the verbs can only be understood as denoting subparts of a single event.
If so, we might suppose that events are defined in TS, hence that complex
events are derived there. The ‘‘+’’ in (13), then, is a complex-event-deriving
operator with a limited range of possible meanings, and only
these are available for serial verb constructions. The possible meanings
include ‘causes’, ‘occurs as a part of the same event’, and so on.
Such remarks are reminiscent of what is often said about ‘‘lexical’’
causatives: that the notion of causation is extremely direct, causing and
caused events constituting a single complex event. For example, (14a,b)
are not synonymous.
(14) a. John encoded the information.
b. John brought it about that the information got encoded.
(14b) holds of a much wider set of situations than (14a). (14a) covers only
the case where John performed an action that resulted in the encoding
without other mediating events or other agents. In fact, (14b) might tend
to exclude the meaning that (14a) has, but this is most likely due to
blocking (i.e., for the situations for which (14a) and (14b) are both appli-
cable, (14a) is preferred, because it is more specific than (14b)).
As we have hypothesized that morphology has access only to TS, and
to nothing higher, it is not surprising that lexical causatives are restricted
to the ‘‘single complex event’’ interpretation, since that is the only inter-
pretation available at TS, a fact we know independently from serial verb
constructions.
There is a more complex situation that arises in serial verb construc-
tions: each of the verbs has Case-assigning properties. The second verb
is sometimes felt to be ‘‘preposition-like.’’ These might be analyzed as a
complex theta structure mapping into a complex Case structure, where
the complex Case structure has two Case assigners, V and P. I will leave
the matter for further work.
Other examples of TS embedding might include tight causative
constructions. The causative in Romance involves Case shifting
(nom → acc, acc → dat) that can be understood as arising from the need
to accommodate a complex theta structure in a simple Case frame.
(15) Jean a fait + [Pierre manger la pomme] →
     Jean  a fait  manger  la   pomme      à   Pierre
     Jean  made    eat     the  apple.acc  to  Pierre.dat
     ‘Jean made Pierre eat the apple.’
The complex predicate constructions studied in Neeleman 1994 are further
potential examples. We could characterize embedding in TS as embedding
that shows obvious apparent violations of the Theta Criterion—two or
more verbs assign the same theta role to the same NP, without the media-
tion of PRO or trace. The reason this embedding does not respect the Theta
Criterion is that the Theta Criterion itself does not hold in TS; rather, it
holds of the way that theta structures are mapped to Case structures.
3.2.1.2 CS Embedding CS embedding conforms strictly to the Theta
Criterion, but may exhibit Case interrelatedness between two clauses.
Exceptional Case-marking (ECM) constructions might well be good
instances of CS embedding. Case is not really shared between the two
clauses in these constructions; rather, the matrix V has Case influence in
the embedded clause. With regard to event structure, there is no ‘‘single
event’’ interpretation, as the two verbs are part of the designation of dif-
ferent events.
(16) John believes himself to have won the race.
Furthermore, although the embedded clause in (16) is transparent to Case
assignment by the verb of the matrix clause, the sentence clearly has two
Case assignment domains, and in fact in (16) two accusative Cases have
been assigned. Thus, ECM is different from TS embedding.
(17) CS: [John believes] + [himself to have won the raceacc]
     = John believes himselfacc to have won the raceacc
English provides some minimal pairs illustrating the difference between
CS and TS embedding. Expletives do not exist in TS, where every relation
is a pure theta relation. Expletives exist to fill Case positions that do not
have arguments in TS mapped to them. Given this, we might wish to an-
alyze certain small clause constructions as CS embeddings and others as
TS embeddings, depending on whether an expletive is involved or not.
English has two constructions that might differ in just this way: most
small clause constructions require an expletive in the direct object posi-
tion when the subject of the small clause is itself a clause, but a few do
not.
(18) a. I want to make *(it) obvious that Bill was wrong.
b. I want to make (very) clear that Bill was wrong.
For a handful of adjectives like clear and certain, the verb make does not
require an expletive; and as the adverb very in (18b) indicates, the reason
is not simply that make-clear is an idiosyncratic compound verb. If we
suppose that expletives do not enter until CS, we could assign (18a,b) the
following structures, respectively:
(19) a. TS: [make clear]VP ⇝ CS: [make clear]V that S
     b. TS: [make]V ⇝ CS: [make it clear . . . ]VP
     c. *How clear did he make that he was leaving?
     d. How clear did he make it that he was leaving?
Make-clear is a complex predicate formed in TS, analogous to causative
constructions of the kind found in Romance, where, incidentally, exple-
tives are also excluded (Kayne 1975).
Expletives then mark ‘‘formal’’ non-TS Case positions, that is, posi-
tions with no correspondent in TS. It is likely that ‘‘Case’’ itself is not a
single notion; in particular, it is likely that so-called inherent Case is
present in TS, and only derivatively in CS. CS then would introduce only
formal Cases, not inherent or semantic Cases. Evidence for this would
come from compounding: as we have restricted compounding to repre-
senting TS, only inherent Case should show up in compounding. Al-
though I have not investigated the matter in detail, this does conform to
my general impression.
In the case of make clear, the TS phrase [make clear]VP is mapped to
the CS atom [make clear]V. That it is truly atomic can be seen in the
contrast between (19c) and (19d): make clear does not allow the extrac-
tion of clear, but make it clear does. In previous work (Williams 1998a) I
attributed this to the di¤erence between a lexical formation (make clear)
and a phrasal formation (make it clear), along with a principle stipulating
the atomicity of lexical units in phrasal syntax. RT allows a relativized
notion of atomicity: if a phrase at one level corresponds to, or is (mis)-
mapped to, an atom at the next level, that atom will be frozen for all
processes subsequent to that level. An advantage of this conception is that
it does not force us to call make clear a word in the narrow sense, a des-
ignation discouraged by its left-headedness and by its modifiability (make
very clear). The relativization involved here—relativizing the notion of
atomicity to hold between every pair of adjacent levels—will become a
familiar notion in chapter 4 and subsequently.
3.2.1.3 SS and FS Embedding Embedding at SS is ordinary that clause
embedding. Case cannot be shared across the that clause boundary (but
see Kayne 1981) because Case is already fully assigned by the time the
that clause is embedded in its matrix.
(20) CS:           SS:
     I think     ⇝ I think     +
     he is sick  ⇝ he is sick  = I think that he is sick
If wh occurs in SS, as I have assumed, then embedding in FS should be
out of the reach of wh movement; that is, complements embedded in FS
should be absolute islands with respect to FS embedding. What sort of
embeddings would be expected in FS? Presumably, embeddings in which
it would be reasonable to attribute a focus structure to the complement.
Since focus is generally a root ‘‘utterance’’ feature, the embedded clauses
that are focus structures would be those that most closely match matrix
utterances in their semantics and properties. From this perspective, it
would be reasonable to expect ‘‘utterance’’ verbs like exclaimed and
yelled to embed focus structures. These verbs embed not just proposi-
tions, but ‘‘speech acts,’’ loosely speaking, as the verbs qualify the manner
of the act itself. This is the class of verbs traditionally identified as non-
bridge verbs, so called because their complements resist extraction.
(21) *Who did John exclaim that he had seen t?
To the extent that this is so, then the assignment of this kind of
embedding to FS derives the behavior of these verbs with respect to wh
extraction.
(22) SS (wh movement):               ⇝ FS (too late for wh movement):
     [John exclaimed] + [he saw who]   John exclaimed [he saw who]
In the case of nonbridge verbs, the parts are simply not put together in
time for extraction, hence their islandhood. In fact, though, they should
not be absolute islands, but islands only to pre-FS movement. If a move-
ment is defined for FS, these verbs should act like bridge verbs for that
movement.
In order to guarantee that embedding is delayed until FS, the lexi-
cal entry for nonbridge verbs must be endowed with subcategorization
for FS objects, which is in keeping with their meaning, as mentioned
earlier.
It is reported that some languages (e.g., Russian) resist wh extraction
from all tensed clauses. Perhaps in such a language, all tensed-clause
embedding takes place at FS.
The derivation of the islandhood of nonbridge verb complements is
an example of a kind of explanation natural to RT. I will refer to such
explanations as timing explanations.
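The logic of a timing explanation can be sketched directly. The following is an illustrative encoding of my own (the level names are from the text, the function is not): an operation defined at a given level can only see material already assembled by that level, so FS-embedded complements are invisible to wh movement at SS.

```python
# The levels in derivational order; names from the text, encoding mine.
ORDER = ["TS", "CS", "SS", "FS"]

def extractable(embedding_level, movement_level="SS"):
    """wh movement is an SS operation; it can reach into a complement
    only if that complement is already embedded at or before SS."""
    return ORDER.index(embedding_level) <= ORDER.index(movement_level)

print(extractable("SS"))  # True: bridge-verb that clause, wh can escape
print(extractable("FS"))  # False: nonbridge complement, as in (21)
```

Nothing about the embedded clause itself makes it an island; its islandhood is a by-product of when it is put together with its matrix.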
3.2.1.4 Countercyclic Derivation The LEC forces some rather un-
expected derivations. The matrix may develop a very complex structure
itself before the lowest embedded clause is actually embedded into it. For
example, consider a sentence in which an ECM infinitive is embedded in
a matrix that clause, and another that clause is embedded under the verb
in the ECM clause.
(23) a. [that . . . [him to have said [that . . . ]]ECM]
b. He believes him to have said that he was leaving.
The LEC actually requires that the ECM construction be embedded in its
matrix before the that clause is embedded under the verb in the ECM
clause, so for this kind of case the order of embedding is ‘‘countercyclic.’’
This is of course because under the LEC, ECM embedding takes place in
CS, and that clause embedding takes place in SS, so the derivation looks
like this:
(24)
Similarly, it could happen that a verb taking a that complement is
embedded under a matrix raising verb before its own complement clause
is added.
(25) TS:             . . .  SS:
     [seems + sad]          seems [sad that Bill is leaving]
The reason for thinking that raising embedding takes place in TS is that it
is found in compound formations.
(26) a. sad seeming
b. odd appearing
We have seen reason to restrict compounds to levels that are repre-
sentations of TS; but then since raising constructions can appear as com-
pounds, raising must be a TS relation, and so the order of derivation in
(25) follows.
I do believe that it is entirely harmless that derivations proceed this
way. I wish it were more than this; countercyclic embedding is a
distinctive feature of RT, so that one should be able to exploit it to find
empirical differences with other theories, none of which have this property. Still,
I have not been able to find any such di¤erences.
It is important to emphasize that the LEC ensures an orderly assem-
blage of multiclause structure, just as much as the incremental application
of Merge in minimalist practice; it simply gives a different order.
Embeddings take place in the order of complement type, rather than in
bottom-to-top order.
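The contrast in assembly order can be put schematically. The encoding below is my own (the embedding records follow the case in (23)-(24)): sorting the embeddings by level, as the LEC requires, puts the shallower ECM embedding before the deeper that clause embedding.

```python
# Levels in derivational order, and the embeddings of (23)-(24); each
# record is (complement, matrix, level at which the embedding applies).
LEVELS = ["TS", "CS", "SS", "FS"]
embeddings = [
    ("inner that clause", "ECM infinitive", "SS"),  # deepest clause
    ("ECM infinitive", "matrix that clause", "CS"),
]

# Under the LEC, embeddings apply in level order, not bottom-to-top:
lec_order = sorted(embeddings, key=lambda e: LEVELS.index(e[2]))
for complement, matrix, level in lec_order:
    print(f"{level}: embed the {complement} into the {matrix}")
# CS: embed the ECM infinitive into the matrix that clause
# SS: embed the inner that clause into the ECM infinitive
```

A bottom-to-top (Merge-style) regime would sort by depth instead, embedding the inner that clause first; the LEC's order is the ‘‘countercyclic’’ one.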
3.2.2 Consequences of the LEC
To sum up the consequences of the LEC, one might say that it forces or
suggests generalizations of fundamental elements of linguistic structure:
generalized A/Ā distinction, subjecthood, generalized anaphoric binding,
generalized scrambling. The dimension of generalization is always across
the RT levels. The first two are taken up in the remainder of this section,
the last two in chapters 4 and 5.
3.2.2.1 The Relational Nature of Improper Movement The LEC derives
the Ban on Improper Movement (BOIM) directly. In fact, it derives a
generalization of it that is distinctive to RT.
The BOIM is generally taken to block movement from Ā positions to
A positions, as in (27), in which John moves, in its last step, from SpecC
of the lower clause to SpecI in the higher clause.
(27) *John seems [t [Bill has seen t]]CP.
I will take it as given that the BOIM is real. I will suggest how it can be
generalized in RT, and how it can be derived from the basic architecture
of the model in a way that is not possible in standard minimalist practice
or its antecedents.
The generalization of the BOIM to the Generalized BOIM (GBOIM)
is nothing more than the generalization of the A/Ā distinction that we
will see in this chapter and in chapters 4 and 5. I will state the GBOIM as
it would occur if it were instantiated in a standard model, one with a
ramified Pollock/Cinque-style clause structure.
(28) The GBOIM
Given a Pollock/Cinque-style clausal structure X1 > · · · > Xn
(where Xi takes Xi+1P as its complement), a movement operation
that spans a matrix and an embedded clause cannot move an
element from Xj in the embedded clause to Xi in the matrix, where
i > j.
In RT, as we will see shortly, the GBOIM follows from the architecture
of the theory and therefore needs no independent statement.
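The content of (28) amounts to a simple check on launching and landing sites. The encoding below is a toy of my own; the layer labels C, I, V merely stand in for a full Pollock/Cinque hierarchy, highest layer first.

```python
# A Pollock/Cinque-style hierarchy, highest layer first (X1 > ... > Xn);
# the labels are placeholders, not a claim about clause structure.
HIERARCHY = ["C", "I", "V"]

def gboim_allows(source_in_embedded, target_in_matrix):
    """Cross-clausal movement is barred when the landing site in the
    matrix is lower in the hierarchy than the launching site in the
    embedded clause."""
    return (HIERARCHY.index(target_in_matrix)
            <= HIERARCHY.index(source_in_embedded))

print(gboim_allows("C", "I"))  # False: SpecC -> SpecI, improper, as in (27)
print(gboim_allows("I", "C"))  # True: e.g. wh movement from an A position
print(gboim_allows("I", "I"))  # True: e.g. raising
```

The classical BOIM (no Ā-to-A movement) is the special case where the hierarchy is just C over I.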
The GBOIM is a proper generalization of the BOIM to the extent that
A positions are beneath Ā positions in clausal architecture as a special
case; in general, according to the GBOIM, if you are on the second floor of
clause A, and you move into clause B, you can’t move to a floor any
lower than the second.
Since we will generalize the A/Ā distinction in RT to the relation
between any pair of levels, and since there will be no A/Ā distinction apart
from this, the BOIM → GBOIM generalization is forced in the present
theoretical context. In this generalized version, items in Case positions
in an embedded clause, for example, cannot move into theta positions in
the matrix, and so forth. However, items in theta positions can move to
higher theta positions, higher Case positions, and so on.
The GBOIM is not obviously true, and a number of existing analyses
run counter to it, to the extent that it regiments A and Ā positions as
special cases. For example, any analysis in which clitic movement is Ā
movement is contrary to the BOIM, if the subject position is an A
position superior to the clitic position. Analyses of this sort must be
reexamined in light of the GBOIM. Some are taken up below, though most
will remain unaddressed.
The BOIM itself is not derivable in minimalist practice from the basic
principles governing derivation, such as economy or extension (the strict
cycle). For example, at the point at which wh movement would violate
the BOIM, a minimalist analysis would have built up a structure like
(29a), and neither economy nor the strict cycle nor extension prevents the
application of wh movement to derive (29b) by putting the wh in SpecV
(or SpecI, for that matter).
(29) a. [V [wh . . . ]CP]V′
     b. [wh [V [t . . . ]CP]V′]V′
This is not to say that there cannot be principles that block particular
cases of the BOIM (the GBOIM is in fact such a principle); my limited
point is that it does not follow organically from basic assumptions about
derivation or economy.
But I believe the GBOIM does follow unavoidably from the basic ar-
chitecture of RT, or something like it, so long as the LEC is a part of it.
The RT levels determine different kinds of embedding, as described in the
previous sections. To make the discussion concrete, assume that SS is the
level at which ‘‘transparent’’ that clause embedding takes place. Different
levels are also associated with different kinds of movement; again, for the
sake of concreteness, let’s assume that SS is the level at which wh move-
ment takes place and CP structure is introduced. Proper movement takes
place in derivations with the following character: first, two surface struc-
tures are built up by building up all of the structures smaller than (read,
‘‘earlier than’’) these structures. Then the two surface structures are com-
bined, and finally movement takes place.
(30)
The GBOIM follows from the RT architecture in this way. The earliest
that wh movement can take place is after the embedding in SS. However,
at that point, not only has the embedded clause been built up to the level
of SS, but so has the matrix clause; thus, there is no analogue of (29a) for
wh movement to apply to. When wh movement applies in SS, since the
surface structure it applies to already has a CP structure, extension (or
something like it) requires that it operate in such a way as to move the wh
item to the periphery of that surface structure. It will thus always move
the wh item to SpecC, since that position is introduced in SS.
For improper movement to take place, the matrix would have to have
peripheral positions ‘‘lower’’ than the highest position in the embedded
clause. However, that possibility is excluded by the LEC, which says that
embedding can take place only among elements of the same type, because
each level defines a di¤erent type. (31), repeated here from (29a), is
therefore not a possible structure in RT with the LEC.
(31) [V [wh . . . ]CP]V′
The problem in deriving the GBOIM in a theory in which (31) is a well-
formed syntactic object is that the matrix and embedded clauses are in
different degrees of development. The embedded clause is fully developed
to the level CP, but the matrix is only partially developed, so there is no
level at which it can embed this CP and thereby derive the improper
movement in (29b). Of course, the matrix itself can be developed to the
level CP, but then the embedding will occur in SS, and extension, or some
equivalent, will force movement to the top of the matrix CP, respecting
the BOIM. It is this difference in development of matrix and embedded
structures that gives rise to the problem of improper movement. In RT,
since embedding is always of objects at the same level, no such difference
arises and improper movement is therefore impossible.
RT crucially needs some notion of extension to prevent trivial defeat
of the most interesting predictions of the LEC. These trivial defeats cor-
respond to what in the standard model would be violations of the strict
cycle if it were applied in a phrase-by-phrase manner, as suggested in
Williams 1974. I will assume that extension, essentially as it is used in
Chomsky 1995, has to be part of the intended interpretation of RT as
well: any operation has to a¤ect material that could not have been af-
fected in a previous level. The parallelism with the standard interpretation
is clear: simply replace level with cycle, where every node is ‘‘cyclic.’’
Without something like extension there is no good reason why movement
in SS would have to be to the periphery of the CP structure defined there,
and not, for example, to SpecIP. In general, extension requires that the
periphery be affected by an operation. There are in fact some problems
with the literal notion of extension, which I will take up later.
Two immediate empirical consequences of the GBOIM are worth
noting here.
First, ‘‘raising to object position’’ as a movement rule is impossible,
since it is a movement from a higher (subject) position in the embedded
clause to a lower (object) position in the matrix clause. If the arguments
(in, e.g., Postal 1974 or Lasnik 1999) for raising to object in ECM con-
structions are correct, then the analysis involving (improper) movement
must now be replaced by an analysis in which mismapping the TS ⇝ CS
representation accounts for the facts. Only ‘‘real’’ (intralevel) movement
is governed by extension.
The more difficult problem is tough movement. I think the widely
accepted misanalysis of tough movement as involving movement to ma-
trix subject position has obstructed progress in syntax at several points in
the past 40 years, and so deserves close attention. According to the stan-
dard analysis, tough movement actually seems to involve a pair of move-
ments: first, wh movement to SpecC, and second, a (BOIM-violating)
movement from SpecC of the lower clause to SpecI of the higher.
(32) Johni is tough ti to please ti.
Of course, the difficulty can be solved by simply generating John in the
top position in the first place, eliminating the second movement. But that
implies that John receives a theta role from tough, and what has always
stood in the way of that conclusion is the synonymy of (32) with (33).
(33) It is tough to please John.
Call (32) the object form, and call (33) the event form (because (32) has
the ‘‘object’’ John as its subject, and (33) has the event to please John as
its subject (extraposed)).
The main argument for tough movement, then, is the synonymy of the
event form and the object form of these sentences. But this synonymy
could be misleading. One component of the synonymy is the perception
that selection restraints on John in the two sentences not only are similar,
but seem to emanate wholly from the lower predicate ( please), and not at
all from the higher predicate (tough). But that perception may be illusory.
It may be that a class of predicates (easy, tough, etc.) takes such a broad
class of arguments, including both events and objects in general, that it
is hard to detect selection restraints; in e¤ect, anything can be easy, for
example. In some cases there is an obvious sense in which a thing can be
easy.
(34) The test/contest/chore/task/errand/puzzle was easy.
At least for such cases, it must be admitted that easy takes a single
nominal argument as subject. For other cases it is less obvious what it
means to apply the term easy.
Embedding 75
(35) The book/store/bank/rock/tower/dog was easy.
For such cases, though, either the context will determine in what way the
thing is easy, or the way it is easy can be specified in an adjunct clause.
(36) The book was easy [to read/write/clean/hide].
But if this view is correct, we are taking the object form to have the
following properties: easy takes the object as its (thematic) subject, and
the clause after easy is an adjunct. We then must conclude that the tough
sentences are at least ambiguous, between this and the usual BOIM-
violating derivation; but now perhaps we can eliminate the latter deriva-
tion, as redundant.
In fact, there is good reason to. First, there are structures just like (36)
whose object and event forms are not synonymous, or even equivalent in
terms of grammaticality.
(37) a. Mary is pretty to look at.
b. *It is pretty to look at Mary.
So we know we need structures of the type suggested by (36) anyway. The
ungrammaticality of (37b) follows simply from the fact that pretty cannot
take an event as an argument, but easy can.
Second, there are structures synonymous with (35) that cannot be
derived by movement. Consider (38a–f), where (38a) parallels the sen-
tences in (35).
(38) a. John is good.
b. John is good to talk to.
c. It is good to talk to John.
d. John is good for conversation.
e. John is a good person to talk to t.
f. *It is a good person to talk to John.
Good acts like a tough predicate in (38a–c), showing the synonymy of
object and event forms. However, (38d), though roughly synonymous
with (38c), could not conceivably be derived from it. The same is true of
(38e), as (38f) shows.
So we need to generate the object form directly, with the object getting
its primary theta role from the tough predicate, and getting its relation to
the embedded predicate only indirectly, as the embedded predicate is an
adjunct to the tough predicate.
The adjunct status of the embedded clause is further shown by its option-
ality (see (39a)); in true cases where a matrix subject gets its theta role from
an embedded predicate, the embedded predicate is not optional (see (39b)).
(39) a. John is easy.
b. *John seems.
But so far I have not explained one of the salient facts about the
construction that supports the movement relation I am trying to ban:
namely, that the matrix subject (e.g., of (36)) is interpreted as the object
of the embedded verb. Since in my analysis the matrix subject gets its
theta role from the matrix predicate, and the embedded clause is an ad-
junct clause, it does not immediately follow that the subject will be inter-
preted as identical to the embedded object. Clearly, some mechanism
must interpret the matrix subject as ‘‘controlling’’ the embedded object
position, or more precisely, the operator chain in the adjunct clause that
includes the object gap. I have nothing to contribute to that topic here;
for my purposes it is enough to observe that several diverse constructions
require such a mechanism as a part of their description; the pretty to look
at construction in (37) is one such case, and (40) illustrates two more.
(40) a. John bought it [to look at t]. (purpose clause)
b. John is too big [to lift t]. (too/enough complement)
In each of these the embedded operator chain is linked to a matrix
argument—object in (40a) and subject in (40b). As there is no chance that
movement could establish that link for these cases, I will stick with my
conclusion about the tough cases: the matrix subject gets a simple theta
role from the tough predicate; the embedded clause is an adjunct with an
operator chain, which is interpretively linked to the matrix subject.
If this analysis of the tough construction is correct, then a major ob-
stacle to the (G)BOIM is eliminated, and this I think is in fact the most
compelling reason to accept that analysis.
The LEC rules out more than the (G)BOIM. It also rules out, for
example, any relation between two subject positions if CP structure
intervenes. M. Prinzhorn (personal communication) points out that it
automatically rules out superraising.
(41) a. *John seems [that t saw Bill].
b. *John seems [that Bill saw t].
c. *John seems [that it was seen t].
Not all of (41a–c) count as pure superraising cases in all theories, but in
fact they are all ruled out by the LEC: once any CP structure is present in
the embedded clause, it is present by hypothesis in the matrix clause, and
so, by extension, it is too late to execute any subject-to-subject relations.
H.-M. Gärtner (personal communication) provides more cases that are
relevant for the GBOIM, and hence for the LEC—namely, the following
intriguing examples from German:
(42) a. Weni glaubst du [t′i dass Maria ti sieht]?
        who believe you that Maria.nom sees
        'Who do you believe that Maria sees?'
b. Weni glaubst du [t′i sieht Maria ti]?
c. Ich frage mich [weni du glaubst [t′i dass Maria ti sieht]].
   I wonder who you believe that Maria.nom sees
   'I wonder who you believe that Maria sees.'
d. *Ich frage mich [weni du glaubst [t′i sieht Maria ti]].
(H.-M. Gärtner, personal communication)
Schematically:
(43) a. [wh V [twh]Vfinal]V2
     b. [wh V [twh]V2]V2
     c. . . . [wh V [twh]Vfinal]Vfinal
     d. *. . . [wh V [twh]V2]Vfinal
The clear generalization is that it is possible to extract into a V2 (verb-
second) clause from either a V2 or a Vfinal (verb-final) clause, but it is
possible to extract into a Vfinal clause only from a Vfinal clause. This is a
very odd fact. Clearly, V2 clauses are not themselves islands, as (43b)
shows; islandhood is determined not just by where the extracted element
is coming from, but also by where it is going.
This is the sort of fact that barriers were designed for (Chomsky 1986).
But I will instead develop a ‘‘timing’’ explanation in terms of the LEC. It
will be a little like the account of nonbridge verb embedding: specifically,
it will be based on the supposition that V2 clauses are ‘‘bigger’’ (and
therefore ‘‘later’’) than Vfinal clauses. The supposition takes some plau-
sibility from the fact that V2 clauses are most often matrix clauses. We
might imagine that matrix clauses have more functional structure than
embedded clauses—functional structure associated with ‘‘speech act’’
aspects of an utterance (this is the ‘‘performative’’ syntax that harks back
to Ross 1970).
(44) [[[ . . . ] . . . ]FVfinal . . . ]F′

F′ here is the extra functional structure that triggers V2; FVfinal structure
is strictly smaller.
Furthermore, and in fact as a consequence of being ‘‘bigger,’’ V2
clauses will be later than Vfinal clauses in RT. For concreteness, I will
assume that V2 structures are defined in FS, whereas Vfinal structures are
defined in SS, where SS→FS.
In this setup wh movement will have to take place at two different
levels, since the cases we are looking at have embedded wh and matrix
wh. Matrix wh is in FS, and embedded wh is in SS. We might imagine
that FS wh is fed by embedded wh; that is, in terms of the structure in
(44), wh moves to SpecFVfinal in SS, and from there to SpecF′.
(45) [wh [t [ . . . t . . . ] . . . ]FVfinal . . . ]F′
The second movement might not be a movement, but part of the SS→FS
representation. However, I will ignore that possibility here as it plays no
role in the explanation of Gärtner's paradigm.
As is well known, some German verbs embed V2 complements, which
in present terms means that they embed FS clauses at the level FS. If
these V2 complements are indirect questions, they will involve FS wh
movement to SpecF′, as well as V2, which itself is presumably triggered
by F′. So such embedded questions are completely parallel to matrix
questions in their syntax and relation to the levels. The diagrams in (43)
can now be annotated with the clausal structure postulated in (44), to give
the following structures:
(46) a. [wh V [t [ . . . t . . . ]]FVfinal ]F′
     b. [wh V [t [ . . . t . . . ]]F′ ]F′
     c. [wh V [t [ . . . t . . . ]]FVfinal ]FVfinal
     d. *[wh [V [t [ . . . t . . . ]]F′ ]]FVfinal

Only the final movement of the wh in each case is of interest here. Given
that F′ > FVfinal in the functional hierarchy, only in (46d) is that final
movement a GBOIM-violating ''downgrading,'' from F′ to FVfinal; all
the other final movements are either upgradings (46a) or movements that
maintain the functional level of the trace (46b,c). Hence, Gärtner's para-
digm follows from the GBOIM.
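The timing account just given amounts to a monotonicity condition on movement chains: rank the functional levels, and require that no step land at a lower rank than the position it moves from. The following sketch is purely illustrative and not part of the theory as stated in the text; the function and ranking names are invented, with F′ assumed to outrank FVfinal. It checks the final movement step in each of (46a–d):

```python
# Minimal sketch: the GBOIM as a "no downgrading" condition on
# movement chains. Levels are ranked; F' (the V2-triggering layer)
# outranks FVfinal. Names here are illustrative, not from the text.

LEVEL_RANK = {"FVfinal": 0, "F'": 1}

def obeys_gboim(chain):
    """chain: functional levels of successive landing sites,
    launch site first. True iff no step lowers the rank."""
    ranks = [LEVEL_RANK[level] for level in chain]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

# Final movement step of the wh phrase in each of (46a-d):
print(obeys_gboim(["FVfinal", "F'"]))       # True  -- (46a) upgrading
print(obeys_gboim(["F'", "F'"]))            # True  -- (46b) level-preserving
print(obeys_gboim(["FVfinal", "FVfinal"]))  # True  -- (46c) level-preserving
print(obeys_gboim(["F'", "FVfinal"]))       # False -- (46d) downgrading
```

Only (46d) fails the check, matching the starred status of extraction from a V2 clause into a Vfinal clause.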
I will conclude this section by pointing out a case that is a counter-
example to the LEC so long as it relies on the completely literal notion of
extension: the French L-tous construction, illustrated here:
(47) a. Marie a toutesi voulu [les manger ti].
        Marie has all wanted them to-eat
‘Marie wanted to eat them all.’
b. Il a tousi fallu [qu'ils parlent].
   it has all needed that-they speak
‘It was necessary that they all speak.’
c. Il a tousi fallu [que Louis les lise ti].
   it has all needed that Louis them read
‘It was necessary that Louis read them all.’
In each of these the tous in the matrix modifies the embedded direct ob-
ject, suggesting it has been moved from there. The problem, as noted by
J.-Y. Pollock (personal communication), is that the tous seems to violate
extension under the LEC. Tous is located to the right of the matrix sub-
ject, but seems to have been moved out of an embedded clause that is
‘‘bigger’’ (in terms of functional structure) than the phrase to which it has
attached. This is especially apparent in cases like (47c): tous has moved
out of an embedded that clause, but still has moved to a position short of
the subject in the matrix. The LEC with extension would not allow this: if
the embedded clause is a CP, then so is the matrix, and extension would
dictate no movement except to the edge of that CP. I can imagine two
sorts of answer. First, although tous movement can span clauses, the
clauses must be infinitival, or, as in (47b,c), subjunctive. Infinitival clauses
are smaller (and therefore earlier) than full CPs; perhaps subjunctive
clauses are also smaller and earlier, despite the presence of que. The other
sort of answer requires a reformulation of extension. I have thus far taken
extension quite literally to crucially involve the periphery of the domain.
I might instead reformulate it in a more abstract way, as ‘‘Movement
within a level can only be to positions that are uniquely made available at
that level,’’ without requiring that those positions be peripheral in that
level. I have no concrete suggestion to make, but the issue will recur in
later chapters, as there are other examples of this sort to consider.
3.2.2.2 Subjects In this chapter I have shown that a generalized ban on
improper movement follows from the architecture of RT, and in chapters
4 and 5, I will show how a generalized notion of the A/Ā distinction
and reconstruction emerges as well. Similarly, I will suggest in this section
that there is a generalized notion of subject in RT, with each level defin-
ing its own particular kind of subject: theta subject in TS, perhaps identi-
fied as agent; Case subject in CS, perhaps identified with nominative
Case; surface subject in SS, perhaps identified with ‘‘pure’’ EPP subjects in
languages with nonnominative subjects like Russian (Lavine 2000) and
Icelandic. Even FS may involve some notion of subject.
In what sense, though, is there a generalized notion of subject? Isn’t
it simply the case that agents are introduced at TS, nominative Case is
introduced at CS, and so on, and that there is no intrinsic link among
these elements, as the term subject tends to imply? In fact, the represen-
tation relation ties these different notions of subject together: the agent
is ‘‘canonically’’ mapped into the nominative NP in CS, which is ‘‘ca-
nonically’’ mapped into the ‘‘pure’’ EPP subject position in SS, and so
on. I put quotation marks around ‘‘canonically,’’ because that concept is
exactly what this book tries to explicate in terms of the notion of shape
conservation. So RT offers a natural account of the notion that subjects
are agents, nominative, and topicalized: this results from the purely ca-
nonical mapping across all the relevant levels, but it also permits devia-
tion from canonicity, of the type shown in chapter 2.
In what follows I will try to sort out some of the wealth of what is now
known about subjects into properties of different levels. I cannot pretend
to offer anything more than suggestions at this point. I do think that RT
gives voice to the old intuition that there are several different notions of
subject that get wrapped up into one; at the same time it seems to offer
the possibility to derive the properties of the different notions from what
is already known about the structure of each level and how it is repre-
sented in the next.
3.2.2.2.1 Quirky Subjects For languages like Icelandic at least, it is
obvious that there is a notion of subject more ‘‘superficial’’ than Case as-
signment. I will tentatively identify the level at which this more superficial
notion of subject applies as SS, though in the next section I will revise this
guess to a level intermediate between SS and CS.
As detailed, for example, in Andrews 1982 and Yip, Maling, and
Jackendoff 1987, Icelandic has a class of verbs that take subjects that are
not nominative, but are instead ‘‘quirkily’’ Case marked with dative, ac-
cusative, or genitive.
(48) Drengina vantar mat.
     the-boys.acc lacks food.acc
(Andrews 1982, 462)
In the appropriate circumstances nominative Case can show up on the
direct object when the subject receives a quirky Case.
(49) Mér sýndist álfur.
     me.dat thought-saw elf.nom
     'I thought I saw an elf.'
(Andrews 1982, 462)
Andrews presents clear evidence that the dative and accusative NPs
in these two examples are subjects in the obvious senses. First, quirkily
Case-marked NPs can undergo raising, and the quirky Case is preserved
under that operation.
(50) Hana virðist vanta peninga.
     her.acc seems to-lack money.acc
(Andrews 1982, 464)
The verb vanta assigns quirky accusative Case to its subject, and (50)
shows that raising preserves the Case. It is only in the case of quirky Case
assignment that a raised subject can be Case marked anything but nomi-
native. Second, quirky subject Case marking shows up in Icelandic ECM
constructions.
(51) Hann telur barninu (í barnaskap sínum) hafa batnað veikin.
     he believes the-child.dat (in his foolishness) to-have recovered-from
     the-disease.nom
(Andrews 1982, 464)
Third, quirkily Case-marked subjects are ‘‘controllable’’ subjects.
(52) Égi vonast til að PROi vanta ekki efni í ritgerðina.
     I hope to to lack not material for the-thesis
(Andrews 1982, 465)
As mentioned before, vanta assigns accusative Case to its subject, and as
(52) shows, that accusative NP is silent, but understood as coreferential
with the nominative matrix NP.
Andrews emphasizes that other preverbal NPs, such as topicalized
NPs, cannot participate as the pivot NP in an ECM, control, or raising
construction. So the quirkily Case-marked subjects really are subjects in a
substantive sense.
Clearly, the subject in these sentences is at some point within the Case-
assigning reach of the verb. I will assume that these Cases are assigned in
CS, in the following sorts of structures:
(53) a. CS: [NPnom [V NPacc]]
b. CS: [NPdat [V NPnom]]
Suppose that SS generates structures like the following:
(54) SS: [NPA [V NPB]]
We could regard structures like (54) as Case free, or Case indifferent,
leading to slightly different theories. I will arbitrarily pursue the idea that
such structures are Case indifferent. Surface structures are Case indifferent
in that A and B in (54) can bear any Case insofar as the well-formedness
conditions of SS are concerned; what Cases they turn out to bear in a
particular sentence will be determined by what Case structures they are
matched up with. The natural shape-conserving isomorphism will identify
NPnom with NPA, and NPacc with NPB. It is natural to identify NPA in SS
as a ‘‘subject’’ and to inquire about its properties. The notion of subject in
CS is obvious: the most externally assigned Case in CS. I will not go into
how structures like (53) are generated, but see Harley 1995 for sugges-
tions compatible with proposals made here (see especially the Mechanical
Case Rule).
Quirky Case marking splits the subject properties in two, a split that
corresponds to the two levels CS and SS in RT: specifically, quirky sub-
jects are Case marked (CS), nonagreeing (CS), raisable (SS), and con-
trollable (SS).
The controllable subject will be the SS subject (to be revised shortly,
when a further level is interposed between CS and SS) regardless of the
Case of the NP in CS that is matched to the SS subject.
Quirky subjects, on the other hand, do not act like nominative subjects
in regard to agreement—quirky subjects do not agree.
(55) Verkjanna er talið ekki gæta.
     the-pains.gen is believed not to-be-noticeable
(Andrews 1982, 468)
Agreement is presumably then a property determined in CS. This
arrangement—Case-marked subject and agreement in CS, controllable
subject in SS, with representational mapping connecting the two—gives
the two notions of subject needed to interact with other phenomena in
grammar. CS looks inward, and SS outward.
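The division of labor just described can be caricatured in a few lines of code. This is a purely illustrative sketch: the level assignments are the ones argued for above, but the function name and property encoding are invented.

```python
# Illustrative sketch only: "subject" as a family of level-indexed
# notions. CS supplies Case and agreement; the later level (SS, or PS
# once that level is introduced) supplies raisability and
# controllability; the representational mapping links the two.

def subject_properties(case):
    """Properties of a clause's subject NP, given the Case that CS
    assigns to it ('nom', 'dat', 'acc', 'gen')."""
    props = {"raisable", "controllable"}   # later-level properties
    props.add("case-marked:" + case)       # assigned at CS
    if case == "nom":
        props.add("agreement")             # only nominative subjects agree
    return props

# An Icelandic quirky dative subject: raisable and controllable,
# but nonagreeing -- exactly the split described above.
quirky = subject_properties("dat")
print("agreement" in quirky)      # False
print("controllable" in quirky)   # True
```

The point of the toy encoding is only that the Icelandic facts force the two property sets apart, while in a language like English they happen to coincide on a single NP.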
3.2.2.2.2 EPP Subjects, Raising, and Control In chapter 2 we saw that
Russian also has a notion ‘‘subject’’ that is ‘‘beyond Case.’’ In certain
circumstances a clause-initial position must be filled, a requirement that
can be evaded only to achieve a special focus effect. Furthermore, the
trigger for this movement is not Case, as the NP (or other phrase moved
to clause-initial position) already has its own Case, which it brings with it.
In a ramified Pollock-style model, such examples can be understood as
instances of ‘‘pure’’ EPP, a movement motivated apart from any Case
requirement. They are also beyond any requirement of agreement. They
are therefore beyond CS, like the Icelandic examples. But in fact, they
differ from the Icelandic examples in an important way: the pure EPP
position in Russian is also not a controllable position. The Russian verb
tosnit’ ‘feel nauseous’, like the verb zalozilo ‘clogged’ discussed in chapter
2, takes no subject argument; but it does, again like zalozilo, take an in-
ternal accusative object that must be fronted in ‘‘neutral’’ circumstances. I
have chosen this verb because it has an animate argument and so could
potentially participate in control structures. But in fact that NP argument
cannot be controlled.
(56) a. Džona tošnilo.
        John.acc felt-nauseous.neut
     b. Menja prodolžalo tošnit'.
        me.acc continued to-feel-nauseous
        (Babby 1998a)
     c. *Ja xoču tošnit'.
        I want to-feel-nauseous
     d. Ja xoču, čtoby menja tošnilo.
        I want so-that me.acc feel-nauseous
(E. Chernishenko, personal communication)
(56a) illustrates the use of the verb in a tensed clause. (56b) shows that
the verb is compatible with aspectual predicates. (56c) illustrates the un-
grammatical situation where the accusative NP is controlled as the sub-
ject of an embedded infinitive. (56d) shows how a Russian speaker would
say what (56c) intends to say—using a subjunctive clause with an overt
accusative argument, clearly not a control structure.
In the view put forward here, (56a), (56b), and the embedded clause of
(56d) all have subjectless TSs, which are mapped, at least in the case of
(56a) and (56b), to subjectful surface structures, but too late for control;
at the relevant level for determining control, they still have no subject.
Assuming that control is established at CS (this will be amended shortly),
(56a) and (56c) are derived as follows:
(57) a. TS: [tošnilo Džona] → CS: [tošnilo Džona]
        → SS: [Džona tošnilo]
     b. TS: ja xoču [tošnit' PRO] → CS: ja xoču [tošnit' PRO]
        → SS: ja xoču [PRO tošnit']
The infinitive in (57b) does not have a PRO subject until SS, too late for
control in CS. I have implemented control in terms of PRO, but that is
not essential to the point. What is essential is that at the relevant level,
and in the relevant sense, tošnit' does not have a subject.
So Russian diverges from Icelandic on this point (cf. Icelandic (52)). In
order to assess this difference between Russian and Icelandic, we must fix
the level at which control is established. This question can be approached
in both RT and the standard ramified Pollock/Cinque-style clause struc-
ture. In a theory with such a clause structure, we would conclude that
there was a further level of functional structure that could be used to sort
out the different notions of subject, as shown in (58).
(58)
This array of conclusions can be modeled in RT by the following sub-
sequence of the representational chain:
(59) Case-Agr Structure→Control Structure→Russian-EPP Structure
Each representation would have a subject position, which would be
mapped or mismapped from a previous level. Control Structure would
have mapped into its subject position the highest Case position in CS; the
objects defined in Control Structure would be the ones selected by raising
and control predicates; and Russian-EPP Structure would have a notion
of subject more abstract than (in other words, not limited to) Control
Structure.
The equivalence of (58) and (59) should be familiar by now, which is
of course not to say that the theories in which they arise are equivalent. In
both theories certain results must obtain to achieve empirical adequacy:
in English all three notions must collapse into one; in Icelandic control
subjects must be distinct from Case-Agr subjects; and in Russian all three
notions must be distinct. The two models will achieve these results in dif-
ferent ways.
The question for RT is how to graft the subchain in (59) into the model
presented in chapter 2. This question could be definitively answered by
identifying the ends of (59) with elements of the chapter 2 sequence. A
plausible candidate of course is that Case-Agr Structure is CS and that
Russian-EPP Structure is SS; but then Control Structure will intervene
between CS and SS as a new level.
In fact, there is good reason to posit a level between CS and SS. The
reasoning is simple: there is a notion of subject that is more abstract or
more general than ‘‘most externally assigned Case’’ but narrower than
‘‘topicalized subject.’’ Control and raising seem to require some interme-
diate notion of subject. In chapter 4 we will see that anaphoric control
requires a further notion of subject as well. The question then emerges, do
all these phenomena converge on a single notion of intermediate subject?
One consideration is the bounding of anaphors. Earlier I identified the
English anaphor as a CS anaphor. One reason for positing a level earlier
than SS is that CP structure is defined in SS, and elements in SpecC do
not seem to be able to antecede English reflexives, as shown earlier (this
is simply the well-known generalization that English reflexives must be
A-bound). Himself is thus bound by some earlier notion of subject; the
question is, is it the CS subject? For English it is difficult to say, but for
the Icelandic reflexive sig the answer is no.
We also know from Icelandic that the control subject is not the agree-
ment subject. For one thing, Icelandic allows control of NPs that would
not be assigned nominative Case. Moreover, when nominative Case is
assigned to the object and the verb agrees with it, control nevertheless
targets a different NP, the ''subject'' in some higher (or later) sense. This
later subject then is not the agreement subject, which we might take to be
the CS ‘‘subject.’’ But neither is it an SS subject, in that it is restricted to
A antecedents. This Icelandic anaphor, as well as the English himself, is
thus likely to be an element introduced in a level intermediate between CS
and SS, a level I will now identify with the label Predicate Structure (PS).
We have thus identified (58) as the subsequence CS→PS→SS, so that
the model now looks like this:
(60) TS→CS→PS→SS→FS→PSb
                      ↘ QS
Assigning himself to PS in English is slightly arbitrary, since it could
as easily be assigned to CS; only Icelandic shows evidence of the slightly
more abstract notion of subject. But this assignment does allow the im-
mediate annexation of the findings reported in Williams 1980, where the
licensing of anaphor binding in English was identified with the notion
‘‘predicate,’’ rather than ‘‘subject’’; in the present context we could return
to the notion ‘‘subject,’’ but only if we mean precisely the PS subject.
Another phenomenon that might be accounted for in terms of the
properties of PS is VP deletion. PS will define a notion of one-place
predicate, corresponding to some version of the (traditional) English VP,
which is abstracted away from whatever subject it is applied to; this
abstracted VP is what is needed to account for so-called sloppy identity.
(61) John likes himself and Sam does too.
What does Sam do? In the sloppy reading he does not ‘‘like John’’;
rather, he ‘‘self-likes,’’ just as John does. There is some controversy
whether this is the right view. I will return to the matter in chapter 9,
where I fill in some idea of what the semantic values assigned to the
objects in each level are.
Control and raising themselves must be assigned to some representa-
tion earlier than SS, if SS is where CP structure is introduced. Essentially,
this follows from the logic of the GBOIM in RT, even though it is not
usually considered a case of improper movement. Control and raising are
NP Structure rules, in the terminology of Van Riemsdijk and Williams
(1981), which entails that they are always relations between pairs of A
positions. But by the LEC, they must then be defined in a level that has
only A positions; this excludes SS, if SS is the level in which CP structures
are introduced. In other words, the following will always be ungrammat-
ical structures:
(62) a. *John seems [ . . . to have won]CP.
b. *John tried [ . . . to have won]CP.
These violate the GBOIM in RT, though not in the familiar application
of the term improper movement, as noted earlier. Since we already know
from Icelandic that control is defined in a more abstract level than CS, we
are left with the conclusion that control is bounded by CS on one side and
SS on the other—and so we are left with the conclusion that control is
defined at PS as well.
The conclusion about (62a) was established independently in Williams
1994b, where it is argued that CP structure inhibits the transmission of
the theta role to the matrix subject. (62b) is a case of obligatory control,
in the sense of Williams 1980, where it is demonstrated that there are no
cases of obligatory control over CP structure; that is, control of x by John
in examples like the following is always an instance of optional or ‘‘arbi-
trary’’ control:
(63) a. John wonders [who [x to talk to]]CP.
b. [Who [x to talk to]]CP was not known to John.
See Williams 1980 for further discussion, and also see Wurmbrand 1998
for a comprehensive account of the di¤erence between obligatory and
optional control exercised across a variety of European languages that
delivers exactly this conclusion. But see also Landau 1999, where it is
argued that the obligatory/optional distinction is specious.
It follows as well that PRO in CP cannot be controllable; that is, deri-
vations like (64) are impossible.
(64) *Johni wants [PROi [ — to talk to ti]]CP.
This again follows if control is defined at PS. But alongside (64) we do
find (65a,b).
(65) a. John bought it [OPi [ — to read ti]]CP.
b. A shelf arrived [OPi [ — to put books on ti]]CP.
(65b) appears to involve a control relation between the direct object and
the SpecCP of the clause [OP [ — to put books on t]]. Why is that relation
allowed, if control is consigned to PS? The crucial di¤erence between
(65a) and (65b) must be that the clause in (65b) is an adjunct clause. The
rules determining the form and meaning of adjunct clauses are patently
not confined to PS, as in general wh movement can be involved in the
formation of adjuncts (e.g., relative clauses). The question remains, are
there any principled grounds for separating ‘‘real’’ control from control
of wh-moved operators in adjunct structures? I will postpone this question
until it is appropriate to discuss in general when adjuncts are embedded.
For the time being we may satisfy ourselves with the idea that ‘‘argu-
mental’’ control is established at PS.
Part of the benefit of the LEC can be achieved in a theory with stan-
dard clausal architecture by allowing the embedding of structures smaller
than CP—that is, ''small clauses.'' Locality effects and limitations on the
target of rules can be achieved in this way: embedding structures smaller
than CP will give a weaker clause boundary (thus allowing local rules
to apply in such a way as to bridge the clause boundary), and omitting
CP will at the same time provide a narrower class of targets (the Ā target
SpecC will be excluded, for example). This was the strategy adopted in
Williams 1974, where I argued that certain clause types lack CP structure,
having only IP or smaller structure (hence, ‘‘small clauses’’) (though this
terminology did not exist at the time—CP90s = S′70s; IP90s = S70s). For
example, there are no gerunds with a wh complementizer system, so ger-
unds cannot be used to form indirect questions.
(66) *I wondered [whose book Bill’s having seen t].
What the LEC in RT adds to the small clause theory is that ‘‘smaller’’
corresponds to ‘‘earlier,’’ and this draws in the further property of rules
connected with reconstructivity—that is, the details about what move-
ment rules reconstruct for what relations. It also draws in the notion of
target type (A vs. Ā), if each RT level defines different types of NPs.
Small clause theories have no means of connecting locality with these
notions of target type and reconstructivity in a theoretically organic way.
I will discuss the full set of locality-reconstructivity-target correlations
(LRT correlations) in chapter 4. But for the moment I restrict attention
to the correlation between target and locality.
Wurmbrand (1998) has pursued the small clause methodology for
German restructuring verbs; she argues that they lack CP and IP struc-
ture, having only something like VP structure, and proposes that their
clause-union-like properties result from the smaller structure. This sort of
analysis is quite similar to the proposal I am making, in that smaller
clause types result in more clause union effects, and it thus explains
locality-target correlations—penetrable complements are ones that lack
Ā targets.
Cinque (2001) has taken a different but related tack. He has argued
that restructuring verbs actually are themselves functional elements. Sup-
pose that clausal functional structure = F1 > F2 > . . . > Fn. Normally, a
main verb takes a complement by instantiating Fn, and taking an FiP as
its complement. But Cinque suggests that a restructuring verb is an Fi,
and that it takes the rest of the functional chain, Fi+1 > . . . > Fn, as its
complement, just as an abstract Fi would.
At first glance this would appear to give the same results as the small
clause approach: the restructuring verbs will take smaller complements
than normal verbs, in that a restructuring verb identified as an Fi will
take as its complement only the tail of the clausal functional chain starting at Fi+1 and so will in effect take a small clause as its complement. Clause union effects will derive from the fact that the restructuring verb
and its complement compose a single clausal functional chain.
On the last point, though, Cinque's proposal is quite different from the
small clause embedding proposal, the RT proposal (with the LEC), and
Wurmbrand’s proposal. In these accounts a small clause complement is a
separate (if degenerate) subchain from the chain that terminates in the
restructuring verb, not a continuation of that subchain.
The difference is radical enough that it should be easy to devise decisive
tests, though I will not try to do so here. On one count, though, the evi-
dence is very simple, at least in principle.
Cinque argues for his proposal in part by pointing out that adverbs
that cannot be repeated in a single clause also cannot be repeated in a
restructuring verb structure. This of course does not follow at all from a
theory in which there is an actual operation of clause reduction. It
does follow from Cinque’s proposal if we accept Cinque’s (1998) central
idea about the distribution of adverbs: namely, that adverb types are in
a one-to-one relation to clausal functional structure, and that the non-
repeatability of adverbs follows from the absence of subcycles in the
clausal functional structure. Naturally, this nonrepeatability will carry
over to restructuring structures, if the verb and its complement instantiate
a single clausal functional structure.
The prediction is somewhat different in a small clause theory of the restructuring predicates. The difference between the two theories is schematized in (67) (RV = restructuring verb; MV = main verb).

(67) a. Cinque-style theory
        F1 > F2 > F3 > F4 > F5 > F6 > F7
                       RV             MV
     b. Small clause theory
        F1 > F2 > F3 > F4 > F5 > F6 > F7 > F5 > F6 > F7
                                      RV             MV
In the Cinque-style structure there is one clausal architecture, F1 . . . F7;
in the small clause structure the restructuring verb itself is an F7 and takes
the small clause F5 > F6 > F7 as its complement.
The theories coincide in predicting that adverbs associated with ‘‘high’’ positions at F1 . . . F4 cannot be repeated, if we make Cinque's assumption about the relation of adverb type to functional structure, simply because
these functional projections occur only once in each structure. But with
respect to ‘‘low’’ adverbs, ones associated with F5 . . . F7, the theories
diverge. The Cinque-style structure predicts that they will not be repeat-
able. The small clause theory predicts that they will be repeatable—once
modifying the restructuring verb, and once modifying the main verb.
The small clause analysis seems to be borne out in the following
example:
(68) John quickly made Bill quickly leave.
The manner adverb quickly can be seen to modify both the restructuring
verb and the main (embedded) verb, and thus the structure (67b) appears
to be the correct one. This at least establishes that the small clause anal-
ysis is correct for make in English; I have not obtained the facts about
Romance restructuring verbs to determine whether they behave as make
does in (68).
In this section I have taken up some new empirical domains (control,
raising, predication, VP deletion, and their interaction with Case) and
posited a further level in RT to treat the complex of phenomena that arise
when they interact. I cannot blame the reader who at this point is dis-
tressed by the proliferation of levels in RT. But I do think that some
perspective is required in evaluating the practice involved. Much of the
proliferation of levels corresponds, point by point, with proliferation in
a ramified Pollock/Cinque-style theory (RP/CT), in that there is at the
limit (the worst case) a one-to-one correspondence between levels in
RT and functional elements in RP/CT. As I remarked earlier, the worst
case deflates my theory, because in this case the parallelism induced by
Shape Conservation is trivialized. But for the moment I would focus on
the fact that RP/CT lives with the following more or less permanent
mystery: there is a fixed universal set of functional elements in some fixed
order that defines clause structure, each with its own properties and its
own dimensions of linguistic variation. Now, this mystery corresponds
exactly to the ramified levels of RT—to the extent that often a revision
in the understanding of the role of a functional element in RP/CT will
translate straightforwardly into a revision in the understanding of a
level in RT. The fact that functional elements are called lexical items in
RP/CT and levels in RT should not be allowed to obscure this corre-
spondence. I think the correspondence puts into perspective the method-
ology that RT naturally gives rise to: solve problems by figuring out what
levels are involved in the phenomena and fix the details of those levels
accordingly—in the worst case, standard practice.
3.2.2.2.3 Subject Case and Agreement
In this last subsection I will speculate on how an insight of Yip, Maling, and Jackendoff (1987) could
be expressed in RT. There is a difference between English, on the one
hand, and both ergative languages and languages like Icelandic, on the
other hand, which has eluded the model so far. In English, the subject, if
Case marked, is always Case marked in a way that is independent of the
verb it is subject of, and in particular, independent of what Cases are
assigned in the VP. But in the other languages mentioned, subject Case
marking is dependent on the Case structure of the VP in ways noted
earlier. Yip, Maling, and Jackendoff suggest that the subject falls within
the Case domain of the verb in Icelandic-type languages, whereas in
English the subject is in a separate domain; in Icelandic, in their view,
there is only one Case domain, whereas in English the clause is divided
into two Case domains. A further corollary of this view is that nonsub-
ject nominatives will be found only in Icelandic-type languages.
In the present theoretical context we might adapt Yip, Maling, and
Jackendoff's conclusions by treating English nominative as a Case
assigned at PS instead of CS. If it is the only Case assigned in PS, and if
it is always assigned to the subject defined at that level, then there will be
no opportunity for it to mix with the rest of the Case system, which is
assigned at CS.
Under this arrangement we no longer have a ‘‘single-level’’ Case
theory. But perhaps it is arbitrary to expect that in the first place.
This arrangement makes an interesting prediction about expletives. The
simplest account of expletives is to treat them as ‘‘formal’’ Case holders;
that is, they occupy a Case position in CS that does not correspond to a
theta position in TS. But in fact, we might consider confining expletives
to PS; in that case we would expect (subject) expletives only in languages
like English, which has an ‘‘absolute’’ nominative subject requirement.
I do not know if the facts will bear out this conclusion. But German is
clearly a language of the Icelandic type with regard to Case assignment.
(69) Mir   ist geholfen.
     I.dat is  helped
     'I was helped.'
Since the dative subject in (69) is a controllable nonnominative subject,
the remarks about Icelandic apply here. Moreover, German does not
seem to have a subject expletive.
(70) a. Es wurde getanzt.
        it was   danced
        'There was dancing.'
     b. Gestern   wurde (*es) getanzt.
        yesterday was    it   danced
     c. Ich glaube  dass (*es) getanzt wurde.
        I   believe that   it  danced  was
The expletive es appears only in matrix clauses, presumably because it is
not a subject expletive, but a fill-in for Topic position; therefore, because
of the well-known matrix/subordinate difference in German clausal syn-
tax—topicalization and V-to-C movement apply only in the matrix—it
will play a role only in matrix clauses. So, even the notion ‘‘expletive’’
needs to be generalized across the RT levels.
The lesson from this section will become familiar: a previously unitary
concept is generalized across the levels of RT. In this case it is the notion
‘‘subject,’’ difficult to define, but now decomposed into components:
agreement subject, control subject, thematic subject, Case subject, pure
EPP subject, and so on. But the decomposition brings more than its
parts, because these notions are ordered with respect to one another by
the asymmetric representation relation. The ordering allows us to say,
for example, that Icelandic quirkily Case-marked subjects are ‘‘earlier’’
than Russian pure EPP subjects and therefore liable to control.
Chapter 4
Anaphora
The overall typology of anaphoric elements can be reinterpreted in terms
of the different levels of RT. Associating different anaphors with different
levels interacts with the LEC to fix properties of anaphoric items in a way
that I think is unique. In a sense it is a generalization of the method used
to explain the BOIM in chapter 3. The same method will be applied more
broadly still in chapters 5 and 6.
The Level Blocking Principle introduced in chapter 3 will play an im-
portant role in the discussion as well. According to this principle, if one
and the same operation can take place in two di¤erent levels, the appli-
cation in the early level blocks the application in the later level. If ana-
phors are introduced at every level, the applicability of such a principle
will be obvious.
4.1 The Variable Locality of Anaphors
It emerged in the 1980s, beginning with Koster 1985, that there is a hier-
archy of anaphoric elements, from ones that must find their antecedents
at very close range, to those whose antecedents can be very far away. It
will be natural to associate these with the levels of RT in such a way that
the more long-distance types are assigned to later structures, with the
hope that the ranges of the different types can be made to follow from
the ‘‘sizes’’ of the objects defined at each level. In this sense RT levels
index the set of anaphors in the same way that they index embedding
types as shown in chapter 3. Here and in chapter 5 we will see that RT,
with the LEC, draws together three different properties of syntactic
relations: their locality, their reconstructivity, and their target (where
target refers to choice of A or Ā antecedent, generalized in a way to be
suggested in chapter 5). I will refer to the correlations among these three
different aspects of syntactic relation as the LRT correlations (locality-reconstructivity-target). Although different aspects of this three-way correlation have been identified in previous work, it seems to me that the
whole of it has not been drawn together theoretically, nor has the scope
of the generalization involved been well delineated. I believe it is a dis-
tinctive feature of RT that it forces a very strong generalized version of
the correlation.
For example, RT makes explicit the following correlation about how
locality and type of possible antecedent covary. Traditionally, it has been
assumed that an anaphor must have an A position antecedent. For ex-
ample, it has been held that a wh-movement-derived antecedent is not
available for English reflexives (except of course under reconstruction).
Thus:
(1) a. *John wondered [which man]i pictures of himselfi convinced
Mary that she should investigate t.
b. John wondered which mani Bill thought [ti would like himself ].
In (1b) the reflexive is bound to which man, but via its A position trace,
which c-commands and is local to it. In (1a), however, this is impos-
sible; the trace of which man does not c-command the anaphor, and (most
importantly) which man is in an Ā position and so is ineligible itself as
antecedent.
But in RT, the notion of A position is relativized. Each representation relation gives rise to a unique A/Ā distinction: positions at level Ri are A positions with respect to positions at level Ri+1. As a result, we might expect anaphors at different levels to behave differently; specifically, we
might expect anaphors at later levels to have an apparently ‘‘expanded’’
notion of potential antecedent (target). Furthermore, as we move ‘‘right-
ward’’ in the RT model, this expanding class of antecedents should be
correlated with loosening locality restrictions, simply because the struc-
tures get ‘‘bigger.’’ Thus, the discussion in this chapter helps substantiate
the LRT correlations made possible by the LEC in chapter 3, and first
put to analytic use there to generalize and explain the BOIM.
The correlations are purely empirical, and not necessary (apart from
the theory that predicts them, of course). Consider, for example, Japa-
nese zibun and Korean caki. As is well known, these anaphors are not
bounded by subjects as English himself is, or in fact by any sort of clause
type; nor are they bounded by Subjacency.
(2) Johni-i  Billj-ekey Maryk-ka cakii/j/k-lul cohahanta-ko malhayssta.
    John-nom Bill-dat   Mary-nom self-acc      like-compl   told
    'Johni told Billj that Maryk likes selfi/j/k.'
    (Korean; Gill 2001, 1)
As a consequence, in RT caki must be an anaphor that is introduced in a
late level, perhaps SS or FS, the levels at which tensed-clause embedding
takes place. As an SS (or FS) anaphor, it will take as its antecedents the
elements that are developed at SS (or FS), among them the Topic and
the Focus of the utterance. So, caki should be able to be bound by a class
of antecedents not available for the English reflexive, namely, Ā antecedents; and this prediction seems to be borne out.
(3) Johni-un ttal-i       cakii-pota ki-ka      te   kuta.
    John-top daughter-nom self-than  height-nom more is-tall
    'As for Johni, (his) daughter is taller than selfi.'
    (Korean; Gill 2001, 1)
In this structure caki is bound from the derived Ā Topic position. This is
possible because caki is licensed at SS, where such elements as Topic are
introduced. Similar facts hold for zibun in Japanese and ziji in Chinese.
Such licensing is impossible in English, as English reflexives are
licensed in CS (or at least, before SS), and Topics don’t exist in that level.
(4) *(As for) [JohnT]i . . . the book for himselfi to read was given to t by
Bill.
In RT this property of the English reflexive is not a free parameter, but is
determined by another difference between zibun and himself. Namely,
subject opacity holds for himself, but not zibun, because in the RT model
each of these properties is determined by what level the reflexive is introduced in, so only certain combinations of properties are possible.
In addition to zibun, Japanese has another reflexive, zibunzisin, which
is essentially like English reflexive himself, both in locality and in type of
antecedent (A/Ā).
(5) Johni-wa [[Billj-ga Maryk-ni zibunzisin*i/j/*k-o subete sasageta] to]  omotta.
    John-top   Bill-nom Mary-dat himself-acc         all    devote    that thought
    'John thought that Bill devoted all (of) himself to Mary.'
Latin also shows a correlation between distance and type of anteced-
ent. According to facts and analysis provided in Benedicto 1991, the
Latin se anaphor has both a greater scope and a greater class of possible
antecedents than standard anaphors. First, reflexive binding of se (dative
sibi here) can penetrate finite clause boundaries.
(6) Ciceroi    effecerat    [ut   Quintus Curius     consilia    Catalina     sibii    proderet].
    Cicero.nom had-achieved comp  Quintus Curius.nom designs.acc Catalina.gen refl.dat reveal.subj
    'Cicero had induced Quintus Curius to reveal Cataline's designs to him.'
    (Sall., Cat., 26.3; from Benedicto 1991, (1))
In fact, it can even penetrate into finite relative clauses.
(7) Epaminondasi    [ei     [qui      sibii    ex lege    praetor     successerat]]  exercitum non tradidit.
    Epaminondas.nom him.dat that.nom  refl.dat by law.abl praetor.nom succeeded.ind  army.acc  not transferred
    'Epaminondas did not transfer the army to the one who succeeded him as a praetor according to the law.'
This is especially noteworthy since it casts doubt on treating long-distance
reflexivization in terms of movement, as several accounts have proposed.
Given this, we would expect the reflexive to occur in late levels. From
that it would follow that it could target A antecedents. Citing the follow-
ing examples, Benedicto argues that this is exactly the case:
(8) Canumi   tam  fida   custodia         quid significat aliud nisi   [sei     ad  hominem commoditates esse generatos]?
    dogs.gen such trusty watchfulness.nom what mean       else  except refl.acc for men.gen comfort.acc  be   created.inf
    'The trusty watchfulness of the dogs, . . . what else does it mean, except that they were created for human comfort?'
    (Cic., Nat. deor., 2.158; from Benedicto 1991, (24))
(9) A  Caesarei   ualde liberaliter inuitor    [sibii   ut   sim     legatus].
    by Caesar.abl very  generously  am-invited refl.dat comp be.subj legate.nom
    'Caesar most liberally invites me to take a place on his personal staff.'
    (Cic., Att., 2.18.3; from Benedicto 1991, (25))
(10) A  Curione   mihi   nuntiatum est [eum    ad me     uenire].
     by Curio.abl me.dat announced was  he.acc to me.acc come.inf
     'It was announced to me by Curio that he was coming to me.'
     (Benedicto 1991, (33))
Benedicto makes the point that normally passive by phrases cannot con-
trol reflexives in general. The fact that the by phrase in (10) is the ante-
cedent of the reflexive suggests that it can be so solely by virtue of its role
as a topicalized NP, which of course is consistent as well with its surface
position.
In RT terms this means that the reflexives in these examples are directly
bound by the Topic position, not bound to the trace of the Topic position.
(11) [Topic]i . . . reflexivei   (the reflexive bound directly by the Topic position, not by its trace)
Anticipating the discussion of Reinhart and Reuland’s (1993) theory of
anaphora in section 4.3, I will note that it seems unlikely that Benedicto’s
conclusions can be rewritten in terms of logophoricity (unless logophoric
is redefined to correspond to topic-anteceded ).
Important to evaluating RT in this connection is that in the absence
of a theory there is no logical connection between locality and type of
antecedent, in either direction. Thus, the locality of himself does not pre-
dict the ungrammaticality of (1a), as no subject interrupts the anaphor-
antecedent relation. In the other direction, there is nothing about the
lack of locality of zibun that directly predicts that it could be bound by
Ā antecedents. One can easily imagine a language in which, for example,
(2) is grammatical but (3) is not—all one would need is the ability to
independently specify the locality and the antecedent class for a given
anaphor.
RT does not allow this, as both properties of an anaphor derive from
the particular level the anaphor is assigned to. Assigning an anaphor to a
level simultaneously determines its locality (its relation to its antecedent
will be restricted to spanning the objects that are manufactured at that
level) and its antecedent class (it will take as antecedents the elements that
appear in structures at that level). And it does so in a completely gener-
alized ‘‘graded’’ or indexed way: the larger the locality domain, the wider
the class of antecedents. In this regard RT is more generous than other
theories with only the A/Ā distinction. But that generosity is apparently
needed, and it is compensated by the locality-type correlation. In section
4.3 I will suggest that the flaws in Reinhart and Reuland’s (1993) theory
stem mainly from its having only a binary distinction for types of ana-
phors (their ‘‘reflexive predicate/logophoric pronoun’’ distinction) instead
of the indexed notion suggested here.
In advance of looking at any data, we can in fact sketch what we would
expect to be the properties of anaphors at di¤erent levels in RT. These are
all consequences of the LRT correlations, which in turn follow from the
architecture of the model. If there is an anaphor associated with TS, for
example, it will relate coarguments of a single predicate, and nothing else,
because the structures of TS are verbs combined with arguments. If there
are complex theta structures for clause union effects, we would expect the
antecedent-anaphor relation for these anaphors to be able to span these
complex units. In English the prefix self- has exactly this property: it
can relate coarguments of a single predicate, but nothing further away,
whether or not a subject intervenes. Its extreme locality can best be
appreciated by comparing it with the English syntactic reflexives him/her/
it/oneself, which permit the following pattern of antecedents:
(12) a. Stories about the destruction of oneself can be amusing.
b. ‘x’s stories about y’s destruction of x’
c. ‘x’s stories about y’s destruction of y’
(12b) and (12c) are both possible interpretations of (12a); but with the
anaphoric prefix self-, instead, only the reading corresponding to (12c) is
available.
(13) a. Self-destruction stories can be amusing.
b. *‘x’s stories about y’s destruction of x’
c. ‘x’s stories about y’s destruction of y’
(13b) represents the case where the antecedent is not a coargument of the
reflexive; such cases are impossible for self-, but possible for oneself. A
first guess about what is wrong with (13b) would be that destruction had a
covert opacity-inducing subject; but that account would fail to explain
why (12b) is not parallel to (13b).
In the context of RT, if we assign self- to the earliest level, TS, the
observed behavior is expected. Anaphors like himself and oneself will be
assigned to (possibly) higher levels. The assignment of self- to the lowest
level is probably not accidental; being an affix, it has access only to TS,
since the levels higher than TS in RT play no role in morphology, as I
proposed in the account of the Mirror Principle in chapter 1. This conclusion holds only for what is traditionally called derivational morphology. Inflectional morphology clearly must have access to all levels. In
chapter 8 I reconstruct the traditional distinction in RT terms.
There are nonaffixal syntactic reflexives that also seem to be confined
to TS. For example, Baker (1985) reports that the reflexive in Chi-mwi:ni
is a free reflexive like English himself, but it is confined to direct object
position and can take only the immediate subject as argument. And in
fact one of the Dutch reflexives discussed in the next section is probably
another case of this kind.
4.2 Dutch zich and zichzelf (Koster 1985)
The Dutch reflexives zich and zichzelf, discussed in detail by Koster
(1985), can be distinguished by assigning them to di¤erent RT levels.
(14a–d) are Koster’s examples showing the di¤erence in locality between
the two.
(14) a. *Max haat  zich.
         Max hates self
         'Max hates himself.'
     b. Max hoorde mij over  zich praten.
        Max heard  me  about self talk
        'Max heard me talk about him.'
     c. Max haat  zichzelf.
        Max hates selfself
        'Max hates himself.'
     d. *Max hoorde mij over  zichzelf praten.
         Max heard  me  about selfself talk
         'Max heard me talk about him.'
(14a) shows that zich cannot take a clausemate antecedent, and (14c)
shows that zichzelf can. We may achieve an adequate description of these
facts by assigning the two reflexives to di¤erent levels: zichzelf to TS, and
zich to CS. These assignments are warranted if zich approximates English
himself and zichzelf approximates English self-, given the discussion in
section 4.1.
These assignments explain (14d), insofar as zichzelf, being a TS ana-
phor, is restricted to coargument antecedents; but they do not, strictly
speaking, explain (14a)—that is, why zich is ungrammatical with a co-
argument antecedent. An obvious first guess is that zich is subject to some
kind of Condition B and is too close to Max to satisfy that condition.
However, I think it would be more interesting to explore the idea that
zich and zichzelf are in a blocking relation with one another: where one
is used, the other cannot be. (See Williams 1997 for a general discussion
of the role of blocking in anaphora.) As in other blocking relations, the
direction of blocking is determined by the licensing conditions that hold
for the two items in the blocking relation; when the licensing conditions
associated with one of the items are strictly narrower than the licensing
conditions associated with the other, the former will block the latter when
those narrower conditions obtain. In the case of zich and zichzelf it is
obvious that zichzelf will block zich, because the conditions for licensing
TS anaphors are narrower than the conditions for licensing CS anaphors:
TS anaphors are limited to coargument antecedents, while CS anaphors
are not so limited, but could include them. The existence of (14c), then, is
the reason that (14a) is ungrammatical. If this is correct, then Condition
B will not be relevant, a conclusion that I will demonstrate again shortly,
on di¤erent grounds.
In general, if a given ‘‘process’’ can occur at more than one level, then
an application in an earlier level will block an application in a later level.
I would hope that the reasoning will always reduce to the asymmetry of
the representation relation, as suggested by the last part of the previous
paragraph in connection with reflexives at di¤erent levels, but I have
not thought through the problem sufficiently to be sure that the logic
appealed to there will be available in all cases. This kind of blocking of an
early level by a late level is frequent enough to deserve a name, so I will
call it level blocking. It will be relevant again in chapters 5 and 6 in con-
nection with scrambling. The problem for applying the principle in gen-
eral is identifying instances of the ‘‘same process,’’ a murky concept; it is
in fact what makes blocking murky in general, but no more so here.
Murky, but inescapable, apparently.
Not only is blocking a more interesting theoretical possibility than a
Condition B solution to the ungrammaticality of (14a); there are also
empirical obstacles to implementing the latter solution. With some inher-
ently reflexive verbs, zich is permitted in a clausemate context.
(15) a. Max wast   zich.
        Max washes self
        'Max washes himself.'
     b. Max schaamt zich.
        Max shames  self
        'Max is ashamed.'
This strongly suggests that zich is not subject to anything like Condition
B; if it were, there would be no account of the difference between (14a)
and (15a), since they have identical syntactic structure. But the blocking
account of the antilocality of zich at least hints where to look for the
answer. Under the blocking account (15a) would be grammatical only if
for some reason zichzelf is not permitted with these verbs; and in fact it is
not.
(16) *Max schaamt zichzelf.
Why is this so? Perhaps these verbs are only ‘‘formally’’ reflexive; that
is, perhaps they are, thematically speaking, intransitive verbs. In that case
there would be no possibility of introducing the reflexive in TS, as TS
consists purely of theta roles, and so nothing corresponding to the posi-
tion of the reflexive. The reflexive is therefore a kind of expletive element
in such cases. Having no theta role, it cannot be introduced until CS,
when the nonthematic but Case-marked direct object is introduced. But
since zichzelf is eligible only for coargument anaphora, it cannot be
used, enabling zich to appear. This explanation is satisfying because it
relates the thematic structure of the verb to the already established
blocking relation between the two reflexives in the only way that they
could be related.
In fact, this conclusion carries over to English, which also has ‘‘formal’’
or ‘‘expletive’’ reflexives. Consider (17).
(17) John behaved himself.
As noted earlier, the English form self-, being a prefix, must be resolved
in TS; therefore, it cannot participate in formal reflexive structures.
(18) *self-behavior
Admittedly, the import of (18) is undercut somewhat by the fact that even
the English CS full reflexive form is blocked in such contexts.
(19) *behavior/*shame/*perjury of oneself
It seems that formal reflexives are systematically blocked in nominaliza-
tions. But this again is reasonable in RT, after all, since nominalizations
do not have Case, and formal anaphors are pure ‘‘Case holders.’’ What-
ever the preposition of governs in nominalizations must be thematic.
At first glance wast in (15a) appears to exemplify a third pattern, different from those of both haat and schaamt; but it is actually simply ambiguous between the two. Alongside its intransitive use (as in Max wast)
it also has a transitive use; furthermore, the intransitive use has a formal
reflexive, just like schaamt, so that wast merely appears to take both
reflexives, a situation inconsistent with the use of level blocking. English
wash shows the same ambiguity as Dutch wast, except that the intransi-
tive in English perhaps does not take a formal reflexive.
(20) a. Max wast zich/zichzelf.
b. Max washed.
c. Max washed himself.
4.3 Reconstruing Reinhart and Reuland’s (1993) Findings
If the proposals made thus far are correct, we can elaborate on a distinc-
tion used by Reinhart and Reuland (1993) (R&R) to account for the
behavior of different kinds of anaphora. In the end we will reject their
theory of anaphora, because it is incompatible with the one developed
here, and because of its own unresolved flaws. Our model will more
closely resemble Koster’s (1985), which drew something like the same
distinction that R&R’s model draws, but without its limitations.
R&R identify circumstances in which the locality of binding is sus-
pended, as in (21a).
(21) a. Johni thinks that Mary likes Sue and himselfi.
b. *Johni thinks that Mary likes himselfi.
The difference between (21a) and (21b) is that in (21b) the reflexive is in
an argument position, whereas in (21a) it is in only part of an argument
position. R&R conclude that there are two types of anaphor: lexical
(SELF, in R&R’s terms) and logophoric (SE). Lexical anaphora holds of
coarguments of a single predicate and hence occurs ‘‘in the lexicon,’’
whereas logophoric anaphora is a discourse-level business, the same
business that resolves pronoun antecedence.
I think the fundamental problem with R&R’s account is that there is
nothing in the account intermediate between lexical and logophoric ana-
phora. In Dutch, for example, zichzelf does seem to hold roughly of
coarguments, as we saw, and hence could be construed as lexical. How-
ever, not only is zich not a discourse-level anaphor, it in fact has rather
tight locality restrictions, something like English himself—a property
R&R’s account will entirely miss.
The binary distinction made in R&R’s account leads to two other
problems as well.
The first problem is posed by ECM constructions, which show opacity
effects even though the reflexive is not a coargument of the antecedent.
(22) a. John believes [himself to have won].
b. John thinks that Mary believes herself to have won.
R&R devise the notion ‘‘syntactic coargument’’ for this case: believes
assigns Case to himself and a theta role to John, and so they are co-
arguments in some extended sense. In fact, I should not quarrel too
much with this conclusion, as it corresponds so closely to my own, in that
one could call an antecedent in RT’s CS a ‘‘syntactic’’ argument. But
even so, I think R&R's account suffers here mainly from having only a
binary distinction. Once their account is revised so that ‘‘syntactic co-
argument’’ replaces ‘‘thematic coargument’’ as the determinant of reflex-
ive antecedence, it becomes impossible to distinguish himself from self- in
English, where indeed coargument in the narrowest sense (theta-theoretic
coargument) seems to be the governing relation. Consider (23), for
example.
(23) *John self-believes to have left.
Here self- cannot correspond to the ‘‘syntactic’’ object of believes. Of
course, one might stipulate some property of believe that would rule (23)
out, but that would fail to express the very likely and interesting conclu-
sion that such cases are impossible.
In the RT analysis of ECM presented earlier, ECM arises from mis-
mapping TS to CS.
(24) a. TS: [John believes x] + [himself to have left] =
     b. CS: John believes himself [to have left]
If anaphora applies in CS (or perhaps in CS and TS), then in CS it will
relate the two Case positions John and himself. The locality condition will
apply in CS and so will be bounded by the subject of believe, as that is the
highest NP available in the Case domain of that verb.
The other problem R&R’s account never satisfactorily resolves has to
do with reciprocals. Reciprocals in direct object position show familiar
locality e¤ects; but other reciprocals, while appearing in a broader
range of contexts, do not escape the utterance altogether in finding their
antecedents.
(25) John and Mary think pictures of each other are in the post office.
In (26a) each other is clearly not a coargument of John and Mary; but
from this we cannot conclude that each other is logophoric, as (26b)
shows.
(26) a. [John and Mary]i called on each other at the same time.
b. *[Each other]i’s houses consequently had a forlorn and deserted
look.
What is needed again is something intermediate between coargument
and discourse level; CS or SS application of reciprocal interpretation will
give the right result.
In fact, examples like (25), despite not being constrained to coargu-
ments, nevertheless show strict locality effects.
(27) *John and Mary think that Bill wants pictures of each other to be
in the post office.
R&R’s account predicts that such cases should be grammatical: since the
antecedent cannot by any means be construed as a coargument of its
predicate, it must be taken to be a logophoric pronoun and should then
show no opacity effects. Stipulating that long-distance reflexives involve
movement does not help either, as we saw in the case of Latin that long-
distance reflexivization penetrated the most hardened islands.
A correct summary of the situation in English must include the fact
that there are at least two different uses of the reflexive that cannot be
construed as coargument-antecedent-taking cases: one like (27) in which
the reflexive occurs in an argument position, in which case it shows sub-
ject opacity effects; and another involving coordinated reflexives as in
(21a) (Sue and himself ), which do not show such opacity effects. R&R's
theory cannot distinguish these two, as it makes only a binary distinction
and neither of these qualifies as the ‘‘reflexive predicate’’ case. I will out-
line the RT account of such cases in the next section.
4.4 Predicate Structure
To resolve a problem with binding theory in RT, it is necessary to posit a
level between CS and SS. If that level is identified as the level at which the
subject-predicate relation is established, further puzzles can be tentatively
resolved, and new differences between the RT treatment and the standard
treatment of binding theory emerge.
4.4.1 The Level of Binding Theory
The behavior of German binding theory will compel us to interpose a
level between CS and SS, by the following reasoning. Short-distance
scrambling in German takes place after Case assignment, because the
scrambled NPs retain their Cases. Short scrambling takes place before
binding theory applies, because binding theory relations are computed
strictly on the output of short scrambling. Furthermore, binding theory
(in English) applies strictly before wh movement. Assuming wh movement
takes place in SS, we have the following implications:
(28) Case              BT    wh
     CS < scrambling < XS < SS
In order to account for these relations, we must posit the level XS at
which binding theory applies, and this level cannot be identified with
either CS or SS.
In this section I will take up the idea from chapter 3 that XS is Predi-
cate Structure (PS), the level in which the subject-predicate relation is
instantiated. I suggested in chapter 3 that such a level is needed to un-
derstand the difference between Icelandic and Russian EPP effects. Here I
show that its existence resolves other puzzles as well, and I rationalize the
behavior of English-style anaphors by identifying them as PS anaphors.
This conclusion interacts with the LEC to make an unusual, but per-
haps correct, prediction about long-distance anaphors in English: specifi-
cally, if a tensed-clause boundary intervenes between an anaphor and the
nearest of its possible antecedents, then the anaphor can take more dis-
tant antecedents than if no tensed-clause boundary intervenes. The pre-
diction follows from the LEC because tensed Ss are introduced late (in
SS), and so any anaphor that still lacks an antecedent when the derivation
reaches SS is in fact an SS anaphor and therefore enjoys the broadest se-
lection of antecedents under the LRT correlations. The prediction is quite
contrary to intuition, and contrary to the standard treatment of anaphors
in general, wherein tensed-clause boundaries either are irrelevant to
choice of antecedent or prevent the choice of any higher antecedent, but
never enlarge the choice of possible antecedents.
The closest to a test case I have been able to construct for this predic-
tion is the following pair:
(29) a. Mary hopes that John will think that pictures of herself are in
the post office.
b. *Mary hopes that John believes pictures of herself to be in the
post office.
If the judgments are as indicated, then the prediction is borne out. In
(29a) a tensed-clause boundary is introduced before herself has found its
antecedent. As a result, the anaphor is an SS anaphor and therefore not
subject to the bounding effects of subjects at PS; hence, it is allowed to
skip over the subject John to target Mary as its antecedent (perhaps only
if Mary is a Focus—but this makes sense since SS (or AS, to be intro-
duced in chapter 9) defines Focuses, not subjects). In (29b), on the other
hand, the structure [John believes [ pictures of herself to be . . . ]] is a CS (or
maybe PS, it doesn’t matter) embedding, because believe takes an IP-sized
complement, and that structure contains a subject for herself to bind to;
as a result, the anaphor here is a PS anaphor and must take that subject
as its antecedent. Recall that earlier anaphors always block later ana-
phors, so that if an anaphor can be identified as an earlier anaphor, it
must be.
Note that Reinhart and Reuland’s (1993) proposals do not discriminate
between (29a) and (29b), because in both cases the anaphor is not a
coargument (even in their extended sense of coargument) of any possible
antecedent. Hence, these must both be logophoric anaphors and so
should behave similarly; in particular, they should not show the difference
that (29a) and (29b) illustrate.
From this section and preceding ones, it emerges that the English re-
flexive is a flexible type of anaphor: it takes its antecedent at the earliest
possible moment. If it has a possible antecedent in TS, it takes it; but
then, if it arrives in PS without an antecedent and a subject is available
there, it must take that; and finally, if it reaches SS with no possible
antecedent, it can even skip over subjects in its search. Reinhart and Reu-
land (1993) capture one part of this with their binary distinction, but if
the reasoning here is correct, there is really an n-way distinction, one for
each level in RT.
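The earliest-possible-antecedent logic lends itself to a mechanical statement. The following toy sketch is my own illustration, not part of RT's formal apparatus: the level names follow RT, but the data structure and function are invented for exposition.

```python
# Toy sketch of the "earliest level wins" logic for English reflexives.
# Levels are ordered as in the RT derivation; an anaphor is resolved at
# the first level that offers a possible antecedent, and that resolution
# blocks resolution at any later level. Names are hypothetical.

LEVELS = ["TS", "PS", "SS"]  # theta, predicate, surface structure

def resolve_anaphor(antecedents_by_level):
    """Return (level, antecedent) for the earliest level with a candidate."""
    for level in LEVELS:
        candidates = antecedents_by_level.get(level, [])
        if candidates:
            return level, candidates[0]
    return None, None  # no antecedent at any level: unresolvable

# A coargument antecedent available in TS preempts everything later:
print(resolve_anaphor({"TS": ["John"], "SS": ["Mary"]}))  # ('TS', 'John')
# Only if no TS or PS antecedent exists may the anaphor wait until SS,
# where even non-subjects (e.g. a Focus) become available:
print(resolve_anaphor({"SS": ["Mary"]}))                  # ('SS', 'Mary')
```

The blocking effect ("earlier anaphors always block later anaphors") falls out of the ordered search: a later antecedent is reachable only when every earlier level comes up empty, as in (29a), where the tensed-clause boundary removes the PS candidate.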
The standard view is that English has a subject-oriented anaphor as its
main anaphor, plus some special cases like (21), which are perhaps,
according to this view, not really anaphors at all, and for which a special
unrelated theory must be devised.
By contrast, RT has a single notion of anaphor whose properties vary
across the levels in a predictable way. According to this view, English
only appears to have a subject-oriented anaphor as its main anaphor, be-
cause it is only in very unusual circumstances that an anaphor can escape
antecedents until as late as SS. In other languages, like Japanese and
Dutch, which have multiple anaphors, there is less flexibility, because the
different anaphors are specialized to different levels of the RT model.
The RT model of anaphora is thus much tighter than Reinhart and
Reuland’s model, because it does not posit different types of anaphors
with potentially di¤erent properties. Rather, it posits a single anaphor,
with properties that vary predictably across levels.
4.4.2 Nominative Case and PS
In discussing Case so far, I have alluded only briefly to the special status
of nominative Case and its relation to subject position (chapter 2). The
distinction just made between CS and PS offers an opportunity to address
the different ways in which languages treat the Case of the subject.
In some languages the subject seems to be assigned Case by a calculus
involving the verb and the other verbal arguments as well; but in other
languages the subject is invariably assigned nominative Case. Languages
of the first type are the ergative and the quirky Case-marking languages,
and languages of the second type are the nominative-accusative languages
like English—what I will call labile nominative and fixed nominative
languages, respectively.
In RT it is natural to associate this di¤erence in behavior with two
different levels: labile nominative will be assigned at the same time as ac-
cusative and other VP-internal Cases, whereas fixed nominative will be
assigned later, in isolation from the assignment of the other Cases. In this
I follow the ‘‘Case in tiers’’ model (Yip, Maling, and Jackendoff 1987),
which distinguishes the two behaviors by identifying different domains for
the Case assignment algorithm: VP for English-style languages, and S for
quirky Case-marking languages.
This approach can be implemented in several different ways. I think it
is worthwhile to explore the simplest scheme, wherein nominative in fixed
nominative languages is assigned in PS, and assigned only to the subject
there, but nominative in labile nominative languages is assigned in CS,
and the nominative NP is associated with PS under the CS→PS map-
ping. This comes closest to modeling Yip, Maling, and Jackendoff’s
scheme. This leaves open the possibility that some languages might have
both rules, a possibility I will not explore here, but which might open the
way to a coherent description of ‘‘mixed’’ ergative languages.
The Case and predicate structures derived for ordinary monadic and
dyadic predicates for the two language types are illustrated in (30).
(30)
After the Case and predicate structures are ‘‘unified’’ in the simplest pos-
sible way (by CS→PS), both languages will have predicate structures
that look like this:
(31) [NPnom [V]]
[NPnom [V NPacc]]
Which is only to say that for the simplest cases the two language types
will look identical, which of course they do. Of interest, then, is how they
diverge for the less ordinary cases.
Consider first the Icelandic quirky Case-marking verbs, as discussed in
chapter 3. For some verbs, the subject is dative, and the object is nomi-
native. So the Case and predicate structures must be as shown in (32b,c).
(32) a. Barninu       batnaði        veikin.
        the-child.dat recovered-from the-disease.nom
        ‘The child recovered from the disease.’
        (Yip, Maling, and Jackendoff 1987, 223)
b. CS: [NPdat [V NPnom]]
c. PS: [NP [V XP]pred]
The obvious CS→PS isomorphism gives the desired result. The rea-
son this could never happen in English, or any other fixed nominative
language, is that PS requires that its subject be nominative, and so there is
no opportunity for a CS nonnominative to be mapped to that position.
In (32c) the subject NP in PS has no Case marking. This should be
interpreted to mean that no Case marking is assigned or licensed at that
level for the structures in question. But if that position is put into corre-
spondence with an NP in the earlier CS level that does have a Case, then
any NP that occupies that level must have a Case as well; otherwise, the
shape-conserving correspondence is compromised.
As mentioned earlier, the CS/PS distinction in RT duplicates the dis-
tinction between the VP and S domains of Case assignment found in
Yip, Maling, and Jackendoff’s model, and therefore assimilates their
results. Nevertheless, many questions remain. First, where do Case struc-
tures like (32b) come from? They could be generated by something like
Yip, Maling, and Jackendoff’s algorithm. But the algorithm itself will
involve mapping a series of Cases onto a syntactic structure, and so it
might be more interesting in the present context to try to model it as a
matching of two structures. However, I will leave that project for further
research.
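To indicate what such a modeling might look like, here is a deliberately crude caricature of a tier-style association, in the spirit of (though not faithful to) Yip, Maling, and Jackendoff’s algorithm; the function, its arguments, and the encoding of quirky Case are all my own illustrative inventions.

```python
# Caricature of a "Case in tiers" style assignment: a fixed tier of Cases
# is associated one-to-one, left to right, with the Case-needing NPs of a
# domain. NPs with a lexically fixed (quirky) Case keep it, so the tier's
# Cases pass on to the remaining NPs. Illustrative only, not the actual
# Yip, Maling, and Jackendoff algorithm.

def associate(tier, nps, lexical=None):
    """Pair tier Cases with NPs left to right; quirky NPs keep theirs."""
    lexical = lexical or {}
    out, it = {}, iter(tier)
    for np in nps:
        out[np] = lexical.get(np) or next(it)
    return out

# Plain dyadic predicate, S domain: subject gets nom, object gets acc.
print(associate(["nom", "acc"], ["subject", "object"]))
# Quirky verb like (32): a lexical dative on the subject lets nominative
# fall on the object instead.
print(associate(["nom", "acc"], ["subject", "object"], {"subject": "dat"}))
```

The single parameter of interest is the domain: handing the whole S to the association yields the labile pattern, while restricting it to VP and adding nominative separately at PS yields the fixed pattern.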
For all cases—both normal dyadic predicates and quirky Case-
marking predicates—there is a notion of subject in PS that is independent
of CS. This captures the truth that however Case is assigned to the sub-
ject, its status as ‘‘subject’’ for such purposes as (target of ) raising and
(antecedent or target of ) control will be independent of that assignment.
This accords with the general finding that quirky Case-marked subjects
are ‘‘real’’ subjects in these respects. RT can be seen as defining several
notions of subject, one at each level, with predictably varying properties,
so that if all sorts of subject in levels earlier than PS converge as the same
sort of subject in PS, then they will be treated alike in all respects for any
processes later than PS.
Ergative languages can be treated similarly. In ergative languages an
intransitive subject has the same Case as the transitive object. The exam-
ples in (33) are from West Greenlandic.
(33) a. Kaalip   Hansi    takuaa.
        Karl.erg Hans.abs sees
        ‘Karl sees Hans.’
     b. Kaali    pisuppoq.
        Karl.abs walks
        ‘Karl walks.’
     (Yip, Maling, and Jackendoff 1987, 220)
In such a language both ergative and absolutive are assigned in relation
to V in CS. No Case is assigned to the subject in PS. In the mapping to
PS, the ‘‘highest’’ NP is chosen for the NP subject position, because that
choice minimizes distortion.
(34) a. CS: [NPerg NPabs V]
         ↓
        PS: [NP [ . . . V]]
     b. CS: [NPabs V]
         ↓
        PS: [NP [ . . . V]]
I have not seriously studied how ergative languages would fare under
this modelization, and I make these last remarks only to indicate what
seems to me to be the most obvious direction for such a project.
4.5 Functional Structure and Antecedence
In the literature on the locality of anaphora, beginning with Pica 1991,
much has been made of the morphological nature of the anaphor in
determining locality. Pica’s generalization is that long-distance anaphors
are monomorphemic, and local anaphors are bimorphemic. A number
of attempts have been made to explain this. The main ideas of RT do not
seem to me to connect with Pica’s generalization in any particular way.
But Burzio (1996) has brought into focus a generalization about the
nature of the antecedent of the anaphor that under some straightforward
assumptions does connect with RT in an interesting way.
What I want to concentrate on here is Burzio’s finding that subjects
of tensed clauses are, as he puts it, ‘‘more prominent’’ antecedents than
subjects of infinitives. One part of this finding is already widely recog-
nized: namely, that subjects of tensed clauses are more likely ‘‘blockers’’
of long-range anaphora than subjects of infinitives. However, this fact
has usually been expressed by saying that tensed clauses are islands for
some anaphors, whereas infinitives are not, thus making the relation to
the subject position a secondary, incidental thing. But Burzio, drawing on
the work of Timberlake (1979) and others, observes that it also sometimes
matters whether the antecedent itself is the subject of a tensed clause or of
an infinitive. (35) illustrates this effect in Russian.
(35) a. I   oni ne  prosil nikogo iz nix  [provesti sebjai v  nuznoe mesto] . . .
        and he  not asked  any    of them  lead     self   to needed place
        ‘And he did not ask any of them to lead him to the necessary
        place . . .’
     b. ?(pro) I   oni stydilsja   [PROi poprosit’ kogo-libo iz nix
               and he  embarrassed       ask       any       of them
        [provesti sebjai v  nuznoe mesto]].
         lead     self   to needed place
        ‘And he was embarrassed to ask any of them to lead him to the
        necessary place.’
     (Timberlake 1979, as reported in Burzio 1996)
Schematically, these examples have the following form:
(36) a. . . . [TensedP antecedentnom [Infinitive V reflexive]]
b. ? . . . [Infinitive antecedentpro [Infinitive V reflexive]]
Burzio’s generalization is not a consequence of any current theory, as
Burzio himself indicates. In fact, to the extent that analysis of anaphoric
relations is reduced to locality, the finding is slightly paradoxical, in that
it says that if an anaphoric relation is going to span at least x amount of
structure, then it must span even slightly more, x + D, where D is the dif-
ference in functional structure that would separate a nominative Case-
marked subject from a PRO subject, if any.
In trying to come to grips with his finding in classical terms, Burzio
concludes that the long-distance antecedent is not the NP itself, but a
structure that includes the NP as well as part of clausal functional struc-
ture (the part responsible for nominative Case checking). Although I will
not need to distinguish the nominative antecedent in this way, I will make
a proposal that I think captures Burzio’s insight that nominatives are
‘‘more prominent.’’
In order to accommodate Burzio’s finding in RT, we might think of it
in the following way: the more long-distance the anaphoric relation is,
the higher the antecedent must be in the functional structure of its own
clause. Put this way, the resemblance of Burzio’s finding to the LRT cor-
relations is obvious—specifically, it is a locality-target correlation. This
suggests a timing explanation of the kind that RT implements with the
LEC. However, there are some obstacles, one perhaps fundamental, to
implementing Burzio’s finding in RT.
In previous explanations of the LRT correlations (e.g., in the deriva-
tion of the BOIM in chapter 3), I have used the concept of extension as
an auxiliary hypothesis; and I believe I have used it in a straightforward
way. In the present context, however, the use of extension is either a very
delicate matter, or impossible, and some rethinking is called for.
To see the problem, let’s assume that Russian is like English in that
T(ense) is introduced in SS and infinitives, control, and so on, are located
in PS. Let’s assume further that nominative Case is introduced in the
structure just as in English—that is, at the level at which T is introduced,
SS (see chapter 3 for other possibilities). Then (35a) is as expected, be-
cause at SS, and not before SS, the most peripheral element in the struc-
ture will be the matrix nominative NP, which in fact is the antecedent.
However, (35b) is more problematic. There are several different as-
sumptions about what the analysis is, but in fact none of them make the
problem go away. The matrix, although tensed, does not have an overt
nominative subject; instead, it might have one of the following:
(37) a. a covert nominative subject
b. a covert nonnominative subject (call it pro)
c. no syntactically represented subject at all
If (37a) is correct, then we have to ask, why is a covert nominative not
targeted when nominative is the target?, and we have no answer. If (37b)
is correct, then we have a serious problem with interpreting extension,
because we have to ask, why can’t pro be targeted as long as it is pe-
ripheral?, and we have no answer. If (37c) is correct, we might have an
answer, depending on how extension is interpreted. If we understand it to
mean that a relation must span every level of embedding, then perhaps we
can account for (35b): the long-distance reflexive cannot be assigned until
SS, but the surface structure of (35b) has no targetable antecedent, be-
cause there is no antecedent in the matrix.
Although (37c) might seem to be the best choice, I am not sure it leads
to the best theory or the best understanding of extension. Extension is
perhaps nothing more than a very simple approximation of the principle
that is needed here.
The intuition behind extension is that rules should always target ‘‘new’’
material. In classical minimalism, extension requires that the most
recently merged element be targeted; it does so because the most recently
merged element is the most peripheral. So far I have followed this think-
ing. But we might instead concentrate on ‘‘new’’ itself, and not try to im-
plement it in terms of peripherality. We would then say that what is new
in SS is T and nominative Case (at least), and that therefore a rule
assigned to that level must target nominative. That would explain the
difference between (35a) and (35b), derive Burzio’s finding, and in fact
account for what he means by calling the nominative ‘‘more prominent.’’
Importantly, while nominative Case is ‘‘new’’ in SS, the NP itself is
not; under shape-conserving mapping, nominative NPs correspond to
NPs in the previous levels (TS, CS, PS).
(38) PS: [NP    PredP]
          ↓      ↓
     SS: [NPnom VPT]
The reason that the long-distance reflexive cannot be assigned to these
earlier ‘‘shadows’’ of the nominative NP is that it is assigned to SS, the
one stipulation about Russian on which this account hangs.
The most general lesson from this chapter is that if a distinction must
be drawn, the RT levels might provide a nonarbitrary way to draw it.
Here I discussed different kinds of anaphors, and different kinds of Case
systems. Rather than saying that there are two (or more) kinds of ana-
phors with different and perhaps unrelated properties, and the same for
Case systems, RT allows the properties to differ in a completely system-
atic way, if the distinction to be drawn can be aligned with the difference
between RT levels. I will explore further instances of this method in the
next two chapters.
Chapter 5
A/Ā/Ā̄/Ā̄̄
In chapter 4 I developed a typology for anaphoric elements in RT by
assigning different anaphors to different RT levels, in a way that explains
their properties; in particular, this methodology explains the link between
the locality of an anaphor and the type of antecedent it requires (theta
antecedent/A antecedent/Ā antecedent). The later the level, the larger the
defined structures are and the more types of NPs are available, and thus
locality and antecedent type are linked with one another. This coordina-
tion is one dimension of what I have called the LRT correlations, corre-
lations that stem from the basic architecture of RT in a way that I think
distinguishes it from other models.
In this chapter I will apply the same methodology to scrambling and
movement rules. Every representation relation is capable of mismatches,
that is, nonisomorphic relations between structures, to which I will con-
tentiously apply the term scrambling. The later the representation rela-
tion, the broader the scrambling, and the wider the class of elements
targeted; all this is parallel to the account of anaphors in chapter 4.
But for scrambling and other rules, including in fact ‘‘real’’ movement
rules, there is a third dimension of variation: reconstruction. As with lo-
cality and target, the reconstruction relations a scrambling or movement
rule enters into are determined entirely by where in the model it occurs.
Simply put, a scrambling or movement relation reconstructs for any rela-
tion defined in previous levels, and for no relation defined at the same or
at later levels.
This is not a stipulation. Instead, it is the inevitable consequence of
how the notion of representation organizes the various levels: if a certain
relation (say ‘‘antecedent to anaphor’’) is established at level Xi, and level
Xi+1 represents level Xi, then when level Xi mismaps to level Xi+1, that
mismapping will appear to reconstruct, in that the configuration in level
Xi will be the one relevant to establishing the relation, not the configura-
tion in level Xi+1. In fact, by this reasoning, reconstruction is entirely
relativized to the levels: each representation relation will (appear to) re-
construct for any relations defined on any previous levels. This notion
of reconstruction is in fact indistinguishable theoretically from the rela-
tivized version of the A/Ā distinction discussed in chapter 4, and if my
reasoning is correct, the relativized notions should entirely replace the
binary notions.
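The relativization just described can be stated as a one-line condition. The sketch below is my own encoding, not part of RT’s formal apparatus: a displacement assigned to a level reconstructs exactly for relations established at strictly earlier levels, and for nothing at the same or later levels.

```python
# Toy formalization of level-relativized reconstruction. A displacement
# (scrambling or movement) associated with level M reconstructs for a
# relation R exactly when R is established at an earlier level. The level
# names follow the RT derivation; the numeric encoding is illustrative.

LEVEL_ORDER = {"TS": 0, "CS": 1, "PS": 2, "SS": 3}

def reconstructs(displacement_level, relation_level):
    """True iff the relation is fixed strictly before the displacement."""
    return LEVEL_ORDER[relation_level] < LEVEL_ORDER[displacement_level]

# wh movement (at SS) reconstructs for binding theory fixed at PS ...
print(reconstructs("SS", "PS"))  # True
# ... but a relation defined at the same level, or later, never
# reconstructs: short scrambling does not feed earlier-level relations.
print(reconstructs("PS", "PS"))  # False
print(reconstructs("PS", "SS"))  # False
```

On this encoding the traditional A/Ā dichotomy is just the two-level special case, and "a movement could reconstruct for one relation and be reconstructed for by another" is simply transitivity of the level order.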
The prediction about reconstruction is exactly the prediction made by
the model outlined in Van Riemsdijk and Williams 1981, but again gen-
eralized. In that model all NP movement (and some other similar rules)
first was applied to derive NP Structure, where binding theory was
defined, and then wh movement defined S Structure. This established a
natural relation between NP movement and wh movement, a relation in
which the latter reconstructs for the former, but not the reverse, and nei-
ther reconstructs with itself. RT generalizes this notion of reconstruc-
tivity, as part of the LRT correlation. I do believe that RT not only
generalizes reconstruction and the A/Ā distinction, but in fact rationalizes
them: they arise organically and inevitably as a result of the sequence of
representation relations that a grammar assigns to a sentence. In addi-
tion, RT links both rule type and reconstructivity with locality (via the
LEC), correlations the NP Structure model, lacking the LEC, could not
make.
In this chapter I will treat ‘‘real’’ movement and scrambling similarly,
even though they correspond to quite different mechanisms in the theory.
For the purposes of reconstruction, this difference will be irrelevant.
In chapter 6 I will return to the diagnostic I used earlier to distinguish
movement from (mis)representation: representation gives rise to (the ap-
pearance of ) intersecting dependencies, whereas movement gives rise to
nesting dependencies. In this chapter we will see another diagnostic for
the difference: movement can apply only once in a given domain, but
scrambling can apply more than once. This difference follows from the
basic di¤erence in the nature of the rules: scrambling is part of the
matching up of two structures, and where the matchup is not perfect that
will involve, or give the appearance of involving, multiple displacements.
The title of this chapter is a sort of joke, playing on the ambiguity of
bar. In the A/Ā distinction, the bar actually means ‘not’, so Ā means ‘not
argument’, as opposed to ‘argument’. So Ā̄, if it means anything, means
‘not not argument’, which is identical to argument. The bar notation thus
locks us into the binariness of the opposition. I suggest that the bar = not
interpretation simply be dropped, opening up the possibility of generaliz-
ing to a series, a series in fact indexed by the RT levels.
5.1 Long-Distance and Short-Distance Scrambling
Extremely local scrambling will be identified as CS misrepresentation;
that is, CS is mapped onto a misrepresenting SS, or some level later than
CS. This identification generates expectations about reconstruction—that
is, about the interaction of scrambling with binding theory (BT) and the
other properties that Van Riemsdijk and Williams (1981) associated with
A movement in the NP Structure model. The precise expectations depend
on where the theories that govern these phenomena intersect the levels of
the RT model. If, for example, BT applies in SS, meaning that surface
structures are the sole determinants of BT relations, and if local scram-
bling precedes SS, then local scrambling will interact with BT in the fol-
lowing way:
(1) Only the structures arising from local scrambling will determine the
applicability of the BT definitions.
This is largely correct. In German, for example, local scrambling
(sometimes called object shift) shows the following behavior:
(2) a. Ich habe die Gästei      einander        ti vorgestellt.
       I   have the guests.acc  one-another.dat    introduced
       ‘I introduced the guests to one another.’
    b. *Ich habe einander die Gästei vorgestellt.
In this example I have assumed that the base order in German is ‘‘dative
accusative V,’’ an assumption that is somewhat controversial. I have
used a trace to mark the scrambled-from position; but of course in RT
there will be no trace of scrambling, because it is not a real movement,
but a displacement that arises from the mismatch of two levels. Only the
scrambled order (2a) permits the accusative NP to bind the dative NP,
assuming the theta order [goal [theme V]]. I will therefore assume that
BT applies in SS, or shortly after CS, if other levels intervene. The NP
Structure model captured a part of this generalization, in separating NP
movement from wh movement.
Long-distance movement (including scrambling), on the other hand,
does not conform to (1); in fact, it defies it systematically, in that (a) the
target position does not license antecedents that could not have been
licensed from the start position, and (b) moved anaphors that need local
antecedents find them local to the start position, not the target position.
All this is well known (see Van Riemsdijk and Williams 1981; Webelhuth
1989; Mahajan 1989; Vanden Wyngaerd 1989; Deprez 1989; Santorini
1990; Williams 1994b).
I have assumed that wh movement (and its special instance called top-
icalization) is a movement internal to one of the levels in the model, SS
or later. Suppose for discussion that the relevant level is SS, and suppose
that for English, BT is determined at PS (i.e., between CS and SS), as
suggested in chapter 4. Then the interaction of BT and wh will show re-
construction effects, as diagrammed in (3).
(3) CS → PS → SS . . .
         BT   wh
Binding relations established in SS will be established independent of—in
fact, in complete ignorance of—the operation of wh movement in FS.
How does this work concretely?
Suppose we have the theta structure John likes himself, which is
mapped onto a CS object and thence onto the PS object John likes himself,
which in turn is transformed into himself John likes t in SS. The binding
relations established in CS are not perturbed by the later movement.
(4) PS: John likes himself (BT)→ Johni likes himselfi
        ↓
    SS: John likes himself wh→ himself John likes t
Notice that the derivation involves a mixture of derivation and represen-
tation relations. The antecedent is defined in PS, SS represents PS, and
movement occurs within SS.
Nothing fundamental is changed if we replace wh movement within a
level with a representation relation occurring after PS, or, more to the
point, a misrepresentation relation. For example, suppose for purposes of
illustration that there were a scrambled relation between PS and SS; then
we would find the derivation in (5), which shows the same reconstruction
features as (4).
(5) PS: John likes himself (BT)→ Johni likes himselfi
        ↓
    SS: himself John likes
So reconstruction is indifferent as to whether the reconstructing relation
is interlevel misrepresentation or intralevel movement. Therefore, even
without deciding whether ‘‘long-distance’’ scrambling is a wh movement–
like movement or a misrepresentation, we know how it will interact with
BT from the fact that it takes place later than BT.
In RT it is inevitable that reconstruction is a relative term. A movement
M reconstructs with respect to a relation R if R is established before M,
in the sense of the examples just discussed. In principle, a given move-
ment relation could reconstruct for one relation (even a movement rela-
tion) before it, and ‘‘be reconstructed for’’ by another movement after it,
all in the same sentence.
Given this, we might expect to find scrambling at every level. Long-
and short-distance scrambling could be understood as misrepresentations
of CS and PS, respectively, but are there other scramblings as well? RT
leads us to expect that there could be scramblings both earlier and later
than the ones identified as long and short scrambling.
Linguists have learned to think of long scrambling as involving
reconstruction, and short scrambling as not. In fact, though, even short
scrambling shows reconstruction effects of a certain kind: theta relations
could be viewed as being assigned ‘‘under reconstruction’’ of short
scrambling. Thinking of the theme in (6) as getting its thematic relation to
the verb under reconstruction of object shift is perfectly analogous to the
long-distance case.
(6) Ich habe Bill gestern   t gesehen.
    I   have Bill yesterday   seen
    ‘I saw Bill yesterday.’
But in fact RT even suggests that certain ‘‘Case-changing’’ or
grammatical-relation-changing operations (e.g., the antipassive construc-
tion) could be the result of ‘‘scrambling’’ at the earliest level, between TS
and CS. I note this possibility, but will not pursue it here.
Conversely, RT leads us to look for scramblings later than long
scrambling. This expectation too is fulfilled, as we will see in section 5.2.
The behavior of the extremely long and the extremely short cases of
scrambling tends to support the idea that scrambling occurs for every
representation relation in RT, and to call into question the binarity of the
A/Ā distinction.
Thus far I have accounted for the relation of different sorts of
scrambling to BT in a way that generalizes the A/Ā distinction, and that
generalizes to other sets of relations besides BT relations. But I have not
yet accounted for the locality of scrambling and its interaction with BT. It
has traditionally been thought that long scrambling is Ā scrambling and
that local scrambling is A scrambling. The basic still-unanswered
question is why the possibility of reconstruction should correlate positively
with the distance moved. Linguists are so familiar with this correlation
that they do not generally appreciate that it remains unexplained. I will
now outline how the correlation between distance moved and type of
scrambling is achieved, and in fact is inevitable, in RT.
Recall that in chapter 3 different types of embedding were associated
with, and took place at, different RT levels, according to the LEC. These
ranged from very small TS embeddings, showing the tightest clause union
effects, to embedding at FS, which showed the strong ‘‘insulating’’
properties of nonbridge verb embedding. The locality of movement relations
will be determined, in part, by what level the movement applies at, and in
particular, by what embedding has taken place at that level; an extraction
is of course impossible if the phrase to be extracted from is not yet present
in the same structure as the target of the movement.
This arrangement makes predictions about the relation of the locality
of scrambling to the type of reconstruction that takes place. If a particu-
lar kind of scrambling is defined on an early representation relation, it
will be ‘‘local,’’ in that it will not be able to bridge tensed embeddings,
and it will also not show reconstruction effects with respect to BT,
assuming BT applies later; but if a scrambling takes place later, it will be
nonlocal, and it will interact with BT reconstructively.
(7)  XS → PS → SS
          BT         tensed-S embedding
       XS scrambling    PS scrambling
It will be nonlocal in that it has access to a new set of embeddings to ex-
tract from; it will not interact with BT reconstructively because BT occurs
later. For example, on this view any scrambling that spans tensed Ss must
show BT reconstruction—to span tensed Ss, it must occur at or later than
SS, and by then BT relations are already fixed.
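The ordering logic in this paragraph lends itself to a small mechanical sketch: a movement reconstructs for exactly those relations fixed at a strictly earlier level than the one at which the movement applies. The level inventory and assignments below are illustrative stand-ins for exposition, not the book's settled model.

```python
# Illustrative sketch of RT's timing logic (level assignments hypothetical).
# Levels form a linear order; each relation or movement applies at one level.
LEVELS = ["TS", "CS", "PS", "SS", "FS"]

LEVEL_OF = {
    "theta": "TS",              # theta relations fixed earliest
    "short scrambling": "CS",
    "BT": "PS",                 # anaphor binding fixed here
    "wh movement": "SS",
    "long scrambling": "FS",
}

def reconstructs_for(movement, relation):
    """A movement reconstructs for a relation iff the relation is
    established at a strictly earlier level than the movement."""
    return LEVELS.index(LEVEL_OF[movement]) > LEVELS.index(LEVEL_OF[relation])
```

On these toy assignments, long scrambling reconstructs for BT while short scrambling does not, mirroring the long/short asymmetry described in the text.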
German presents an interesting constellation of scramblings. It has
what is called object shift, which is a purely local clause-bound scram-
bling. It has a subcase of object shift out of ‘‘restructuring’’ infinitives.
And some dialects allow scrambling out of infinitives in general, in a
construction studied by Santorini (1990). Santorini demonstrates two
properties of the construction: the scrambling is middlingly local, and it
acts like A movement for BT. As for locality, although it is not clause
bound, it can bridge at most an embedded infinitive construction.
(8) a. Ich habe vorgehabt, [die     Gäste_i einander  t_i vorzustellen]_IP.
       I   have planned     the.acc guests  recip.dat     to-introduce
       ‘I planned to introduce the guests to each other.’
    b. ?Ich habe die Gäste_i einander_j vorgehabt [t′_i t_j t_i vorzustellen]_IP.
Here the accusative and the reciprocal have both scrambled out of the
embedded clause, and they have switched places; the new order licenses
binding. Santorini shows that the binding is not licensed under recon-
struction, inasmuch as (9b) is impossible.
(9) a. Ich habe vorgehabt, [einander_i die Gäste_j t_i t_j vorzustellen]_IP.
    b. *Ich habe einander_i vorgehabt, [t_i die Gäste_j t_i t_j vorzustellen]_IP.
In (9b) the reciprocal has scrambled out of the infinitive, leaving the an-
tecedent behind. The result is ungrammatical, showing that reconstruc-
tion is not the right approach, as it should work here if it were working in
(8). So the movement in (8) is like A movement.
The problem with calling it A movement is precisely its nonlocality.
First, it is not like NP movement in that it preserves earlier Case assign-
ments; relevant here is the fact that only vorzustellen ‘to introduce’ in (8)
assigns dative Case, not vorgehabt ‘planned’. Second, it is unlike NP
movement in being able to move nonsubject arguments out of an em-
bedded clause.
It is instructive to see how RT determines where this scrambling must
take place. The particular pairing of properties just described can be
accommodated in RT only in a very particular way. The scrambling in
(8) is bounded on the left by CS and on the right by BT; the first because
the original Cases are preserved on the scrambled NPs, and the second
because the binding is licensed in the derived positions. Furthermore, it
is bounded on the left by whatever level infinitive embedding occurs at,
and it is bounded on the right by whatever level tensed-S embedding
occurs at. There are a number of arrangements that satisfy all of these
constraints, though of course they have a quite specific character, so the
proposal is not without content, even if the levels are not specifically
identified.
(10)  CS → XS . . .A. . . YS . . .B. . . ZS . . .C. . . WS
      Case           infinitive        BT        tensed-S
                     embedding                   embedding
The scrambling must occur somewhere between YS and ZS here, in the
region subscripted B.
In order for this to be a fully satisfying account of Santorini’s con-
struction, the levels XS . . . WS must be identified. I could arbitrarily
make assignments now (e.g., ZS = PS, WS = SS, YS = CS) on the basis
of the levels identified in chapter 4. Even without those details, however,
some of the interest of the theory can be enjoyed, because of the implica-
tional relations that must hold no matter how those assignments are
made, as the boundings illustrated in (10) must be preserved in any more
specific model. For example, if the relative ordering of the relevant levels
is that given in (10), there cannot be a type of scrambling that spans
tensed Ss yet interacts with BT nonreconstructively, since BT applies later
than Case assignment.
(11)
Nor could there be a type of scrambling that leads to new Case
assignments (and so is earlier than CS) yet interacts with BT
reconstructively. RT allows a number of different kinds of scrambling (A, B, C in
(10)), thus expanding the A/Ā repertoire in a way perhaps prefigured
by Webelhuth (1989). However, it does not allow for just any arbitrary
combination of properties, which Webelhuth’s theory unfortunately did,
because saying that scrambling is simply a mixture of some A and some
Ā properties leads to exactly this expectation.
Thus far I have considered reconstruction for anaphor binding. But
languages also exhibit reconstruction for scope, with the same difference
between short and long scrambling. Ueyama (1998) presents evidence
that these two types of scrambling in Japanese differ in their
reconstruction behavior just in the manner the RT model would predict.
First, it has been well known since Hoji 1985 that monoclausal scram-
bling in languages like Japanese gives rise to scope ambiguity. The inter-
pretation of the scopal order of S and O is unambiguous in SOV clauses,
but OSV order introduces the possibility for O to take wide scope. This is
characteristic of scrambling before scope fixing, a possibility only for very
early, and therefore very local, scrambling. See chapter 6 for further dis-
cussion of monoclausal scrambling.
Ueyama (1998) presents a range of scope interactions involving scram-
bling in biclausal structures, documenting a sharp distinction between
monoclausal and biclausal scrambling in Japanese.
First, as already indicated, if scrambling does not occur, scopes are
fixed.
(12) [Yaohan-sae]_QP2-ga [seizi-dantai     X-ga  [55%-no ginkoo]_QP1-ni
     Yaohan-even-nom      political party X-nom  55%-gen bank-dat
     supai-o okurikonda to]_CP kimetuketeiru.
     spy-acc dispatched comp   conclude
     ‘[Even Yaohan]_QP2 concludes [that political party X had dispatched
     spies to [55% of the banks]_QP1]_CP.’
     (Ueyama 1998, 50)
For these cases with no scrambling, QP_2 has scope over QP_1 unambiguously.
Second, scrambling within the lower clause does not change scope
relations between lower-clause and upper-clause quantifiers.
(13) [Yaohan-sae]_QP2-ga [[55%-no ginkoo]_QP1-ni seizi-dantai     X-ga
     Yaohan-even-nom       55%-gen bank-dat      political party X-nom
     supai-o okurikonda to]_CP kimetuketeiru.
     spy-acc dispatched comp   conclude
     ‘[Even Yaohan]_QP2 concludes [that political party X had dispatched
     spies to [55% of the banks]_QP1]_CP.’
     (Ueyama 1998, 51)
Here QP_1 has been scrambled to the head of the embedded clause, but
that does not affect its scope interaction with the matrix QP_2: again,
QP_2 has scope over QP_1. This result is perhaps expected on all accounts,
but it is relevant to Ueyama’s demonstration nevertheless.
The surprising fact is that when the embedded quantifier QP_1 is
scrambled to the matrix clause, it still cannot take scope over the matrix
QP_2.
(14) [55%-no ginkoo]_QP1-ni [Yaohan-sae]_QP2-ga [seizi-dantai     X-ga
     55%-gen bank-dat        Yaohan-even-nom     political party X-nom
     supai-o okurikonda to]_CP kimetuketeiru.
     spy-acc dispatched comp   conclude
     ‘[Even Yaohan]_QP2 concludes [that political party X had dispatched
     spies to [55% of the banks]_QP1]_CP.’
     (Ueyama 1998, 51)
QP_1 unambiguously takes scope beneath QP_2, even though it has been
fronted beyond it. This is surprising, because in the monoclausal case
OSV order leads to the possibility of wide scope for O over S. These cases
clearly indicate a close connection between locality and reconstruction.
Long scrambling reconstructs for quantifier scope fixing, whereas short
scrambling does not, exactly the direction of correlation that RT predicts.
5.2 Scrambling and the Subject
Bayer and Kornfilt (1994) present another set of scrambling cases that
show the full scope of the LRT correlations (locality, reconstructivity,
and target). They demonstrate a three-way split for scrambling that
strands a quantifier. On the assumption that quantifiers, or at least some
quantifiers, are not present until SS (or QS), it follows that scrambling
targeting those quantifiers cannot apply at least until then, and with a
further assumption it follows that quantifier-related scrambling cannot
take place to a position beneath the subject. Bayer and Kornfilt’s exam-
ples are these:
(15) a. Socken zieht der Heinrich im     Sommer keine an.
        socks  puts  the Heinrich in-the summer none  on
        ‘Heinrich puts no socks on in the summer.’
     b. ? . . . dass [Socken der Heinrich im Sommer keine anzieht]_IP.
               that
     c. * . . . dass der Heinrich [Socken [im Sommer keine anzieht]].
               that
In these examples the noun Socken has been scrambled away from its
quantifier keine. Since this scrambling appears to target quantifiers, it
applies at SS, but then extension will require it to move to the edge of
the constituents defined at that level. Since those constituents are IPs at
a minimum (more likely CPs), any scrambling will have to move to the
edge of IP, or to CP; as a result, (15c) is impossible, because here the
scrambling moves only to the left edge of VP.
Moltmann (1990) shows convincingly that scrambling targeting
quantifier expressions obligatorily exhibits reconstruction effects, even when
applying in a simple clause.
(16) a. . . . weil    Hans Bilder   voneinander_i      den Leuten_i
              because Hans pictures of-each-other.acc the people.dat
              keine t zeigen  möchte.
              none    to-show wants
              ‘. . . because Hans doesn’t want to show the people any
              pictures of each other.’
              (Moltmann 1990, (116a))
     b. . . . weil    Maria diese Bilder   voneinander_i      den Leuten_i
              because Maria these pictures of-each-other.acc the people.dat
              sicher t zeigen  wollte.
              surely   to-show wanted
              ‘. . . because Maria surely wanted to show the people these
              pictures of each other.’
              (Moltmann 1990, (117a))
In (16a) the binding of voneinander by Leute takes place under
reconstruction; that this is a special feature of the rule splitting quantifier
phrases is shown by the unavailability of binding in (16b), completely
parallel to (16a) except that the scrambling moves an intact definite NP.
It appears then that German actually has three kinds of scrambling
(scrambling in simple clauses, scrambling that moves out of embedded
clauses, and quantifier-targeting scrambling of the kind just illustrated),
and each shows the reconstructivity properties that are expected to follow
from the particular level at which it applies.
In the chapter 6 discussion of Superiority in Japanese, we will see the
same difference between scrambling to a position above and scrambling
to a position below the subject: scrambling two wh words to a position
below the subject does not lead to ambiguity, but scrambling them both
to a position above the subject potentially does, thus repeating the lesson
learned from German in chapter 2.
In the debate over the nature of short, clause-bounded scrambling the
recurring question is whether it is an A movement or an Ā movement.
Some researchers have argued for A movement, some for Ā movement,
some for a mixed status of one kind or another. The cluster of properties
distinguishing A from Ā movements was hypothesized in Van Riemsdijk
and Williams 1981 to involve the now familiar cluster of properties
concerning BT and reconstruction. In the present context the questions
concerning the status of movement have all been relativized. That is, we now
ask not whether a given movement reconstructs, but what it reconstructs
for; and we ask not whether a moved constituent may antecede BT
elements, but what it may antecede. I think that some of the confusion and
conflicting results pertaining to clause-bounded scrambling can be solved
in the context of a notion of the A/Ā distinction relativized in this way.
A recurring observation is that scrambling to a position beneath the
subject has different properties from scrambling to a position above the
subject. In German, for example, an accusative NP scrambled over a
dative reflexive may antecede that reflexive, but an accusative NP
scrambled over a nominative subject may not antecede that subject.
(17) a. . . . dass der Arzt_i       den Patienten_j sich_{i/j}  t_i im
              that the doctor.nom the patient.acc  himself.dat     in-the
              Spiegel zeigte.
              mirror  showed
              ‘. . . that the doctor showed the patient himself in the mirror.’
              (Müller 1995, 160)
     b. * . . . dass den Frank      sich_i  manchmal  t_i nicht gemocht hat.
               that the Frank.acc himself sometimes      not   liked   has
               (Müller 1995, 161)
Can such e¤ects be understood as arising from timing in RT? I have
already suggested in chapters 3 and 4 that there are several notions of
subject, each pertaining to a di¤erent level in the model: Case-theoretic,
controllable, nominative, and so on. Each of these occupies a position in
its own level, related to the subject positions in the other levels through
the representation relation.
Suppose, as suggested in chapter 3, that the controllable or nominative
subject is defined in PS and that the binding of subject-sensitive anaphors
takes place there as well. Schematically:
(18)  TS → CS →(A) PS →(B) SS → FS
                   control,
                   nominative Case,
                   reflexive binding
If representation relation A is a scrambled relation, the scrambling will be
restricted to positions beneath the surface subject position, which we are
identifying here with nominative Case; and it will also appear to
determine the input to reflexive binding, as it precedes the level at which
reflexive binding takes place. If representation relation B is a scrambled
relation, it will involve scrambling over the surface subject position; and
it will appear that the binding relations are computed on the input to the
scrambling relation. Since the subject is overtly marked nominative, we
know that it is the subject of PS, and so scrambling must follow it. If
scrambling took place before PS, it would not preserve the nominative
Case relations, as these, unlike the Case relations of internal arguments,
are not determined until PS.
For representation relation B, the arrangement shown in (18) deter-
mines that scrambling reconstructs for BT. In fact, that is exactly what
happens for scrambling to a position above the subject, but not for
scrambling to a position beneath the subject.
(19) a. . . . dass der Arzt_j   sich_{*i/j} den Patienten_i  t_i im
              that the doctor  himself     the patient.acc      in-the
              Spiegel gezeigt hat.
              mirror  showed  has
              ‘. . . that the doctor showed the patient himself in the mirror.’
              (Müller 1995, 177)
     b. . . . dass sich_i  der Fritz      t_i schlau      vorkommt.
              that himself the Fritz.nom    intelligent appears
              ‘. . . that Fritz appears intelligent to himself.’
(19a) shows that scrambling to a position beneath the subject does not
allow reconstruction. In RT this means that such scrambling occurs
strictly before PS. But (19b) shows that scrambling to a position above
the subject does permit reconstruction, just as we would expect if scram-
bling to that position was not possible until PS.
Examples (19a,b) draw a fine distinction between movement to posi-
tions above and below the subject, but excluding movement to SpecC. Of
course, movement to SpecC permits reconstruction as well, in English as
well as in German, as (20) shows.
(20) a. Himself_i John likes t_i.
     b. Sich_i  hat Fritz_i schon immer t_i gemocht.
        himself has Fritz   always        liked
        (Müller 1995, 177)
(17) and (19) together show the fine interaction among binding, recon-
struction, and Case assignment, an interaction unique to RT as far as I
can tell.
5.3 A/Ā/Ā̄ Reconstruction
The limitation of the binary A/Ā distinction becomes more acutely
evident when we consider ‘‘higher’’ (or ‘‘later’’) reconstructions. Suppose
there were a rule that moved wh words, perhaps among other things, but
in such a way that a moved wh word was interpreted strictly in reference
to its reconstructed, or original, position. RT in fact leads us to expect
such a scrambling rule, on grounds of full generality: why should any
representation relation not be subject to mismatching to achieve semantic
effects? Exactly such a rule is found in Japanese.
(21) ?Dono  hon-o_i  Masao-ga  [Hanako-ga  t_i tosyokan-kara karidasita
      which book-acc Masao-nom  Hanako-nom     library-from  checked-out
      ka]_CP siritagatteiru.
      comp   wants-to-know
      ‘Masao wants to know which book Hanako checked out.’
      (Saito 1991, (33a))
The wh word at the top of the matrix clause in (21) is interpreted at
the top of the embedded clause, even though it has been entirely
removed from the embedded clause, which means that it is licensed (wh-
interpreted) in its reconstructed position. We will assume that the move-
ment illustrated in (21) (called long topicalization by Saito (1991)) occurs
later than wh movement or wh interpretation. If wh movement (or con-
strual) occurs at SS, then long topicalization must occur at FS; and since
reconstruction is relative, this means that long topicalization reconstructs
for the purpose of wh movement/interpretation. Although this construc-
tion is called topicalization, it lacks the wa topic marking of more famil-
iar Japanese topicalization structures. Presumably this is because long
topicalization applies after such wa marking is licensed. The lack of wa
marking again would make sense if long topicalization applied at FS,
with wa marking applying at SS or earlier.
It is important to observe that X reconstructs for Y only if X is strictly
later than Y. Specifically, elements at the same level do not reconstruct
for one another. For example, as noted in Van Riemsdijk and Williams
1981 and Williams 1994b, wh movement does not reconstruct for wh
movement.
(22) a. *[Which picture of t_i]_j do you wonder who_i t_j upset?
     b. ?Who_i do you wonder [which picture of t_i upset]?
Both (22a) and (22b) involve extraction of one wh phrase from another
and so neither is fully grammatical. But only (22a) requires reconstruction
of wh movement for wh movement, and so it is far worse than (22b).
Likewise, long topicalization in Japanese, although it reconstructs
for wh movement, does not reconstruct for long topicalization, even
though multiple long topicalizations in a single multiclause structure are
grammatical.
(23) a. Taroo-ga [Hanako-ga   Masao-ni  sono hon-o    watasita to]
        Taro-nom  Hanako-nom Masao-dat that book-acc handed   that
        omotteiru koto.
        thinks
        ‘Taro thinks that Hanako handed the book to Masao.’
     b. Sono hon_i-o Masao_j-ni Taroo-ga [Hanako-ga t_i t_j watasita to]
        omotteiru koto.
     c. Taroo-ga [Hanako-ga   sono hon-o    yonda to]
        Taro-nom  Hanako-nom that book-acc read  that
        itta koto.
        said
        ‘Taro said that Hanako read the book.’
     d. *[Hanako-ga t_i yonda to]_j sono hon_i-o [Taroo-ga t_j itta koto].
        (Saito 1991, 16)
(23b) is a version of (23a) in which double long topicalization has taken
place. Likewise, (23d) is a version of (23c) in which long topicalization
has taken place. The difference is that in (23d) the applications are
intrinsically nested, which is to say nothing more than that in order for
(23d) to be grammatical, long scrambling would have to reconstruct for
itself—and as we have seen, this is in general impossible. In RT the no-
tion that a given type of scrambling could reconstruct for itself is inco-
herent, since any given type of scrambling is simply the relation between
two adjacent levels, and any given sentence could involve only one such
relation.
Other details about the interaction of A and Ā systems also follow
from the architecture of RT. It is a theorem of RT that if X reconstructs
for Y, then Y cannot reconstruct for X. We already know that wh
movement reconstructs for NP movement.
(24) How [likely t_Bill to win] is Bill t_AP?
We can therefore conclude that NP movement does not reconstruct for
wh movement. But what would that mean? Consider a language that
has both wh in situ for indirect questions (like Chinese) and Case-driven
raising (like English). Then one version of the question we are presently
addressing is, what would block the following derivation?
(25) a. [— wondered [wh [who to see Bill]]]  raising ⇒
     b. [who wondered [wh [t to see Bill]]]  reconstruction ⇒
     c. [— wondered [wh [who to see Bill]]]  wh construal ⇒
     d. [— wondered [wh who [t to see Bill]]]
In other words, the embedded wh word is raised to the matrix, then
reconstructed into its original position, and then used to make the em-
bedded clause an indirect question by wh construal strictly within the
embedded clause. This is what it would mean for raising to reconstruct
for wh movement. There are somewhat more complicated cases that
make the same point for a language like the one just imagined, but with
real wh movement instead of just wh construal.
(26) [Pictures of t_wh wondered [who t_NP to bother Bill]].
Here the NP movement of pictures of who reconstructs for the licensing of
the embedded wh movement.
It is I think safe to assume at this point that such cases do not exist. But
why not? In RT this can be predicted from the very fact that wh recon-
structs for raising, via the theorem just mentioned.
The examples in (25) and (26) are at variance with RT in another,
though related, way: since raising is an IP rule, it cannot apply in the
presence of CP structure in the first place (see chapter 3).
Importantly, the details of the interaction illustrated in (25) and (26) do
not follow from the BOIM by itself—none of the movements in (25),
overt or covert, is improper. (25) does follow from the NP Structure
model of Van Riemsdijk and Williams (1981), and in fact follows in the
same way it does in RT. So a theory in which the GBOIM is added as an
extra condition will need still more conditions to regulate NP/wh recon-
struction interactions. By contrast, both the GBOIM and the facts in (25)
and (26) can be derived from the architecture of RT itself. These cases
thus add further weight to the argument that the GBOIM should be
architecturally derived.
Examples (25) and (26) are exactly like Saito’s (1991) long topical-
ization examples discussed earlier, except that Case-driven raising is
substituted for long topicalization. In both instances we tried to create
cases in which wh construal (or movement) takes place under
reconstruction. One works, the other is blocked. This tells us that there is no
absolute answer to a question like, ‘‘Is wh interpreted under reconstruction?’’
Rather, one must ask, ‘‘Is wh interpreted under reconstruction of Y?’’
RT suggests that there will potentially be a series of scramblings, one
between each representationally related pair X_n → X_{n+1}, and that each
will appear to reconstruct for the purposes of any relations established at
or prior to X_n. The nature of each kind of scrambling will be determined
by the level at which it operates; n/n+1 scrambling will scramble (and
‘‘reconstruct’’) only nodes of the type defined at X_n or earlier, and its
‘‘range’’ will be determined by the size of the structures defined at X_n.
This again is one dimension of the LRT correlations.
The following diagrams the potential scrambling relationships and
their effects on interpretation. The model assumed is the one presented in
chapter 9, which omits the level PS discussed in chapters 3 and 4.
(27)
5.4 Remnant Movement
The term remnant movement has great currency recently. But in fact there
have always been remnant movements, and there always will be, even
if present proposals fall by the wayside. Remnant movement is the
movement of a phrase containing the trace of something that has been
removed from it. Uncontroversially, remnant movement has taken place
in (28), assuming that certain is a raising predicate.
(28) [How certain t_i to win]_j is John_i t_j?
Given the existence of remnant movement, the problem, as usual, is to
exclude most instances of it—for example, (29a).
(29) a. *Who_i were [pictures of t_i]_j seen t_j?
     b. Who_i were seen pictures of t_i?
There is a derivation of (29a) in which the prohibition against extracting
from subjects has been evaded—by first extracting the wh word from the
direct object, (29b), and then moving the direct object to subject position,
(29b) → (29a). There will be any number of ways to exclude any particular case.
For example, the derivation of (29a) via (29b) is ruled out by the cycle
in Williams 1974 and by extension in Chomsky 1995. RT automatically
excludes most remnant movements, including (29), in the following way.
A remnant movement always involves two rules, the remnant-moving
rule (wh movement in (29)) and the remnant-creating rule. For remnant
movement to take place, the remnant-moving rule must ‘‘reconstruct
for’’ the remnant-creating rule. In RT this will happen only when the
remnant-creating rule applies earlier than the remnant-moving rule. In
other words, remnant movement is really a special case of reconstruction,
and everything that has been said about reconstruction applies.
Since NP movement (or its equivalent) occurs in CS (or PS) or there-
abouts, and wh movement in SS, wh movement reconstructs for NP
movement, giving (28) but excluding (29a). The general implication for
remnant movement is this:
(30) Corollary about remnant movement
A moved remnant cannot contain a hole ‘‘bigger’’ (or ‘‘later’’) than
the one it creates.
Importantly, no stipulation is needed to ensure this behavior; it follows,
as do all reconstruction interactions, from the architecture of RT.
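Corollary (30) reduces to a single comparison of levels. A toy rendering (my own, with hypothetical level indices, not the book's formalism):

```python
# Toy rendering of corollary (30): remnant movement is licensed only if
# the remnant-creating rule applies earlier than the remnant-moving rule.
RULE_LEVEL = {"NP movement": 1, "wh movement": 2}  # smaller = earlier

def remnant_movement_ok(creating_rule, moving_rule):
    """The moved remnant may not contain a hole 'later' than the one the
    movement itself creates."""
    return RULE_LEVEL[creating_rule] < RULE_LEVEL[moving_rule]

# (28): wh movement of a remnant containing an NP-movement trace -- allowed.
# (29a): NP movement of a remnant containing a wh trace -- excluded.
```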
Some but not all of this (and I think in fact an arbitrary subpart) fol-
lows from minimalist practice, from the incremental version of the cycle
that Chomsky (1995) has called extension (see also Williams 1974). Ex-
tension requires that every operation must enlarge the tree. As mentioned,
extension blocks the derivation of (29a) from (29b), as the movement
from object to subject does not reach the edge of the tree, but ‘‘tucks in’’
beneath the subject, to use Richards’s (1997) term. But this works only
within a single clause. In a multiple-clause structure, extension does not
block illicit remnant movements. Consider the following derivation:
(31) a. seems a picture of who to be for sale   wh movement ⇒
     b. seems who a picture of t_i to be for sale   NP movement ⇒
     c. *[A picture of t_i]_j seems [who_i t_j to be for sale]?
(31c) is ungrammatical, of course—but why? It is not ungrammatical be-
cause of extension—it never disobeys extension, because the NP move-
ment strictly follows the wh movement. Perhaps it is ungrammatical
because seem does not take a wh complement. But why is that? Why do
raising predicates never take wh complements? Whatever the reason (it
used to be government; now its identity is uncertain), it is clearly not the
principle of extension itself, so (31c) and (31a) receive fundamentally dif-
ferent accounts.
In RT (31c) is ungrammatical for exactly the same reason as (29a):
since all NP movements must precede all wh movements, there is no
opportunity for the interaction that (31) illustrates. In fact, this explana-
tion was already a part of the NP Structure model (Van Riemsdijk and
Williams 1981), of which RT in this regard is a generalization.
The conclusions about remnant movement will provide no comfort for
proponents of Antisymmetry. RT has a rich movement rule typology;
because the representational levels index the set of movement rules, many
opportunities for remnant movement arise even while excluding (25) and
(18), as these are not consistent with the regime of remnant movement
(expressed in (30)) that follows from RT architecture. But in a theory
with a greatly reduced inventory of movement rules—in extreme cases, a
single movement rule (Move XP)—this regime would allow no remnant
movement at all. And remnant movement theories tend to have a greatly
reduced inventory of types of movement, seeking in particular to elimi-
nate head movement (e.g., Koopman and Szabolcsi 2000).
5.5 Summary of Findings
The overall argument for generalizing the A/Ā distinction along the
lines suggested here is summarized in table 5.1. This table charts
‘‘reconstruction’’ possibilities. Each column represents a ‘‘reconstructing’’ movement
or relation of some kind. Each cell in the column specifies whether that
movement ‘‘reconstructs’’ for the purposes of the relation corresponding
to the row of that cell. For example, wh movement (second column)
‘‘reconstructs for’’ anaphor binding, in the sense that binding relations
are licensed by the pre–wh movement structure.
The squinting eye can detect a rough diagonal from top left to bot-
tom right, with check marks below the diagonal and stars above the
diagonal (and question marks where the facts are indeterminate). This
diagonal arises because of the correlation of reconstruction with levels;
the correlation follows from how the levels are related to one another by
representation.

Table 5.1
What reconstructs for what

This → reconstructs              Wh         Long        Short       NP         Movement
for this ↓             Focus     movement   scrambling  scrambling  movement   for Case
Wh movement            ✓         —          *           ?           *
Long scrambling        ?(opaque) ?          —           ?           *
Weak quantifiers       ✓         ✓          ✓           *           ?(raising) ?
Anaphor binding        ✓         ✓          ✓           *           *
Short scrambling       ?(opaque) ?(opaque)  *           —           *          *
NP movement            ✓?        ✓          ✓?          *           —
Q-float                ✓?        ✓?         *           *           *
Theta relations        ✓         ✓          ✓           ✓           ✓          ✓
RT links the reconstruction correlation with two other correlations:
rule target type and locality also vary systematically across levels, as
described in previous chapters. So we now have a sketch of the full set of
what I have called the LRT correlations.
As I noted at the outset, the title of this chapter is facetious: it pretends
that the "bar" of Ā, which means 'not', can be iterated like the bar of
X-bar theory. The serious side of this abuse of notation is that if the
approach in this chapter is correct, then A and Ā are simply two arbitrary
points in a spectrum of rule types. The cost of moving from the binary
distinction to the n-ary one is a richness of rule types. But I think that
richness is more than compensated for by the LRT correlations, and the
fact that these correlations flow organically from the architecture of the
model.
It seems to me that the set of LRT correlations is highly constraining,
eliminating many possible analyses, since it ties together three different
qualities of syntactic relations. It also seems to me that the full set of LRT
correlations follows from the representation model in a way that cannot
be duplicated without the architecture that representation requires.
Chapter 6
Superiority and Movement
I have assumed thus far that ‘‘real’’ movement—specifically, wh move-
ment—does not arise from (mis)representation of one level by another,
but is in fact movement in the traditional sense, as a part of the definition
of one of the levels, tentatively identified as SS in previous chapters.
Scrambling and wh movement therefore each have a completely different
status in the theory. The considerations offered so far in favor of
separating the two theoretically were (a) the observation that wh movement,
unlike, for example, object shift in Icelandic, shows nesting rather than
intersecting patterns, and (b) that rules like wh movement operate once
per applicable domain, whereas scrambling rules operate multiple times. I
suggested that nesting and single application are diagnostic of ‘‘true’’
movement. Now it is time to back that suggestion up.
The empirical anxiety that presents itself is of course the notion that as
more and more pieces of the puzzle fall into place, scrambling and wh
movement will come to be seen as basically the same operation. This has
certainly been the widely held view so far. And recent theories and
findings seem to bolster the identification of a single notion of displacement
that is responsible for both, individuating differences being attributed to
nonessential features of the relations involved.
For example, in work from the past decade on some Slavic wh systems,
it appears that wh movement sometimes exhibits what might be seen as
‘‘parallel’’ movement within a single domain, resulting in intersecting
derivations. If this initial impression is sustained, it undermines RT, in
that it suggests that ‘‘real’’ movement must be governed by principles that
enforce parallelism of movement of a set of elements in a single structure;
these principles would naturally extend to ‘‘movement’’ versions of the
phenomena I have cited as cases of shape-conserving representation, such
as scrambling, and would suggest that a unified theory could be achieved
if all phenomena were treated as cases of movement. But then the repre-
sentation relation would be left with nothing to account for. To consider
concrete cases, if parallelism of some kind governs wh movement, then
why does it not govern object shift (as a movement rule) as well, thus
making redundant the representation account of object shift and related
phenomena under the regime of Shape Conservation?
I have cited parallelism (Shape Conservation) as evidence for decom-
posing clause structure so that parallelism can be said to hold of the re-
lation among the decomposed parts, but if very similar parallelism can
also be shown to hold within a tree, then the architecture that arises from
the decomposition is less interesting and may in fact stand in the way of a
truly general theory. So it is a pressing empirical and theoretical problem
to see whether various sorts of parallelism effects can actually be
assimilated to one another, as success in this endeavor would undermine not
only the results of chapters 1 and 2, but also the host of generalizations
that follow from the LEC in chapters 3–5.
Richards (1997) has built a theory in which the parallelism effects of
scrambling are derived from a general theory of movement that also has
wh movement in its scope—in other words, a unified theory. I suppose I
should have been tempted to build a unified representational theory as
well, by which I mean a theory in which the principal features of wh
movement derive from Shape Conservation. I could not see any interesting
way to do this, so I leave that possibility unexplored here. But I will
point to circumstantial evidence, some of which I think is based on
compelling analyses, for distinguishing movement from shape-conserving
mapping between levels.
6.1 Is Superiority a Case of Shape Conservation?
One type of parallelism effect that wh movement exhibits, even in a
language like English, has been called Superiority since Chomsky 1973. An
obvious and worthy goal would be to develop a theory of Superiority
governing movement that could account for scrambling parallelisms by
claiming that they arise from construing scrambling as movement. Sev-
eral researchers, most notably Richards (1997), have constructed theories
exactly along these lines.
Clearly, if there is no real difference between the "Superiority" effects
of wh movement and the "parallelism" effects of local scrambling, and
the like, such a unified theory must be sought. But in fact I think the
two sorts of parallelism are fundamentally different, and different in a
way that draws exactly the distinction between a movement relation
(wh movement) and a (mis)representation relation, what I am calling
scrambling.
I will argue (and in fact already have, in Williams 1994b) that Superi-
ority is in any event not a constraint on movement, but a consequence of
BT, to the extent that Superiority violations are really Crossover viola-
tions. If the parallelism distortion found in Superiority violations turns
out to result from Crossover violations, then there is no way that the
analysis can be extended to scrambling. Theoretically, then, a lot is up
in the air: Can scrambling and wh movement be assimilated under one
general theory? Is the Superiority Condition a part of that theory? Is
nesting versus intersecting a diagnostic of anything? In presenting my case
for the BT treatment of Superiority, I will also review Richards’s (1997)
version of Superiority, as it seems closest to achieving the unified theory
of movement whose scope would include wh movement and scrambling.
I will argue that its main conclusions are incorrect and that the correct
understanding of Superiority would make the unified theory Richards
discusses impossible in any event.
Superiority (Chomsky 1973) says that if two wh phrases are both eligi-
ble to move by wh movement, the higher one moves.
(1) a. Who saw whom?
b. *Whomi did who see ti?
Superiority thus preserves the order of the two wh words and so can be
understood to enforce a kind of parallelism constraint. Starting from (1),
we might seek to expand the coverage of Superiority to all parallelism
effects, including the ones that were used in previous chapters to
motivate RT levels and Shape Conservation. So, another form of the central
question I would like to address in this chapter is, can A movements
and short-distance scrambling be shown to be governed by a general
Superiority Condition? In short, do A movement and scrambling show
Superiority effects, in the sense in which wh movement does?
The answer will be an unequivocal no. In Williams 1991, 1994b, I sug-
gested an analysis of Superiority that in fact would make the extension to
A movement and scrambling impossible. Taking my inspiration from
Chierchia’s (1992) theory of the ambiguity ofWho does everybody like t?–
type sentences, I suggested that multiple-wh questions involve a ‘‘bind-
ing’’ relation between the unmoved wh word and the trace of the moved
wh word, so that Superiority violations are really Weak (and sometimes
Strong) Crossover violations.
(2) [tree diagram not reproduced: *Whomi did who see ti?, with ti construed as binding the in-situ who]
Since the object position can never bind an anaphor that occupies the
subject position, binding of who by ti here clearly violates BT (Weak/
Strong Crossover); as a result, Superiority is reduced to W/SCO.
But what purpose would such binding serve? It clearly does not result
in ‘‘coreference’’ between the two terms. But it has been noted since Kuno
and Robinson 1972 that the relation between the two wh words in a
multiple-wh question is not symmetric, the moved wh word serving as
an ‘‘independent’’ variable (‘‘sorting key,’’ to use Kuno and Robinson’s
term) and the unmoved wh word as the ‘‘dependent’’ variable. This can
be seen in the answers to multiple-wh questions. Such questions can be
answered by giving pair lists, but also by giving a function for relating the
dependent and the independent variable, so understood.
(3) a. Who wrote what?
b. Each student wrote his own name.
(3b) maps students onto written things—exactly such a function.
We must not be misled by the fact that list answers can be given to
multiple-wh questions.
(4) A: Who read what?
B: Bill read Moby Dick,
Sam read Omoo,
Pete read Typee.
A list is simply one way to specify a function; the function in (4B) is
( f(Bill) = Moby Dick, f(Sam) = Omoo, . . .). In fact, "function" is exactly
the right notion. A function can map different independent variables onto
the same dependent variable, but it cannot map one and the same
independent variable onto different dependent variables; and answers to
multiple-wh questions seem to conform to this restriction.
(5) A: Who read what?
B: Bill read Moby Dick,
Sam read Omoo,
Pete read Omoo.
*B′: Bill read Moby Dick,
Sam read Omoo,
Sam read Typee.
The (5B′) answer is odd; it can be improved by replacing the last two
parts of the answer with Sam read Omoo and Typee, which restores
functionhood (K. Kohler, personal communication).
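The functionhood restriction at work in (4) and (5) can be made concrete as a simple check on pair-list answers: a list of ⟨independent, dependent⟩ pairs specifies a function just in case no independent (sorting-key) value is paired with two distinct dependent values. A minimal sketch of that check, purely illustrative (the pair lists are those of (5B) and (5B′)):

```python
def is_function(pairs):
    """True iff the pair list specifies a function: no independent
    (sorting-key) value maps to two distinct dependent values."""
    mapping = {}
    for independent, dependent in pairs:
        if independent in mapping and mapping[independent] != dependent:
            return False  # one independent value, two dependent values
        mapping[independent] = dependent
    return True

# (5B): many-to-one is a legitimate function
assert is_function([("Bill", "Moby Dick"), ("Sam", "Omoo"), ("Pete", "Omoo")])
# (5B'): one-to-many is not, and the answer is correspondingly odd
assert not is_function([("Bill", "Moby Dick"), ("Sam", "Omoo"), ("Sam", "Typee")])
```

The asymmetry of the check mirrors the asymmetry of the dependent-independent variable relation: only the sorting-key side is constrained to take each value once.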
Telling evidence for this view comes from the following example:
(6) Who knows what who bought t?
On the standard account this should be a Superiority violation, because
what has crossed over who. But on my account it will be a violation
on only one interpretation, the one where the embedded who function-
ally depends on what. When the embedded who depends on the matrix
who, the WCO configuration does not arise, and in fact the sentence is
grammatical on exactly that one interpretation. This view leads to a
straightforward account of the role of D-linking in Superiority as well,
incorporating the findings of Pesetsky (1987). See Williams 1994b for
more details.
A further observation on Superiority in German by Wiltschko (1997)
strongly supports the view that the two wh words in a multiple-wh
question are in the dependent-independent variable relation. Wiltschko first
presents incontrovertible evidence that German has Superiority effects,
effects that had been hidden from previous investigators by factors
that Wiltschko identifies and controls for. In a discussion of the role of
D-linking in Superiority violations, she then shows that Superiority in
German is governed by a very mysterious semantic condition, illustrated
by the following examples:
(7) I am sure that Peter and Mary must have talked to each other on the
phone.
a. Weißt du wer wen angerufen hat?
   know you who whom called has
   'Do you know who called whom?'
b. *Weißt du wen wer angerufen hat?
(Wiltschko 1997, (32))
(8) I am sure that Peter, Paul, and Mary must have all talked to each
other on the phone.
a. Weißt du wer wen angerufen hat?
‘Do you know who called whom?’
b. Weißt du wen wer angerufen hat?
(Wiltschko 1997, (33))
The only difference between (7) and (8) is that there are three individuals
in (8) and only two in (7); but, as Wiltschko notes, this means that in (8)
"the answer can consist of (at least) two pairs." We might take this
strange condition to be a condition on how dependent and independent
variables are related: if the independent variable cannot take at least two
different values, then there is no real nontrivial function, just a simple
fixed answer. Apart from any such consideration the condition is quite
peculiar, and, as Wiltschko shows, it does not follow from any of the
accounts of D-linking; in fact, the initial "setup" sentence in both (7) and
(8) guarantees the D-linking of all the wh phrases, thereby eliminating it
as a factor in discriminating them.
Important for present concerns is the conclusion that since this account
of Superiority is specific to binding relations, it is impossible to extend it
to scrambling, because elements that are A-scrambled do not in general
bear any relation to one another, binding-theoretic or otherwise. For ex-
ample, in neither (9a) nor (9b) is there a binding relation, or any other,
between Johann and das Buch.
(9) a. weil Johann das Buch gelesen hat
       because Johann the book read has
       'because Johann read the book'
b. weil das Buch Johann gelesen hat
Referentially speaking, Johann and das Buch are completely independent
of one another, showing no coreference or dependency of reference, and
so there is no reason for any BT principle to force them to be in one or
another structural relation with each other. Likewise, the scrambling of
the verb and its complement NPs over negation in Scandinavian lan-
guages discussed in chapter 2 cannot conceivably involve any binding
relations among the moved elements, in general. So if the BT account
of Superiority in Williams 1994b is correct, it simply cannot be extended
to A movement. Put the other way around, if Superiority needs to be
extended to A movement and especially to scrambling, then it must be
something very different from what I have just suggested, and at heart it
must not have anything to do with the configurations that license
dependent reference.
But as I mentioned at the outset, some multiple-wh constructions
look at first glance just like Icelandic scrambling, suggesting a common
account. To use the terminology of this book, they raise the question
whether wh movement is shape conserving in the sense in which I am
using that term here. It is important that wh movement not show any true
shape-conserving properties, because if it does, there is no good reason to
distinguish scrambling from other cases of movement and the rationale
for RT begins to evaporate. So my plan will be to show that wh move-
ment appears to have shape-conserving properties for special simple cases
where BT relations are involved, but that it is not shape conserving in
general.
The Slavic languages provide a rich source of information on multiple-
wh structures, including parallelism effects of a kind that can be only
weakly illustrated by the English Superiority paradigm. In Bulgarian
multiple-wh questions, for example, both wh words move to the front of
the clause, maintaining their relative order.
(10) a. Kogo kakvo e pital Ivan?
        whom what aux asked Ivan
        'Whom did Ivan ask what?'
     b. *Kakvo kogo e pital Ivan?
(Boskovic 1995, 13–14, as reported in Richards 1997, 281)
There is one obvious difference between Bulgarian and English
multiple-wh questions that I will put aside for the moment: in Bulgarian
all of the wh phrases in a multiple-wh question move, whereas in English
only the single independent one moves. I will concentrate first on what
the languages have in common: a single wh word is selected, moved to the
front, and interpreted as the independent variable. I will return to the
difference in the fate of the dependent variables later.
In RT, facts such as those in (10) could be treated in two different
ways, with different consequences. (10) could be subsumed under a general
theory of Superiority of the type already discussed, which reduces
Superiority to a BT relation; or it could be accounted for by whatever RT
mechanism gives rise to the parallelism effects in multiple scrambling
structures—the Shape Conservation principle regulating interlevel
matching, as I have proposed.
For the special case of two wh words, Shape Conservation appears to
hold, as (10) illustrates, and it could be used to support either account.
But for other cases wh movement appears not to obey Shape Conserva-
tion, suggesting that it is not to be subsumed under the same theory as
scrambling or object shift (analyzed as an interlevel mismatching con-
strained by Shape Conservation) and so must be an instance of ‘‘real’’
movement constrained by the W/SCO account of Superiority proposed in
Williams 1994b.
For example, Bulgarian multiple-wh questions involving three wh
words exhibit the following behavior:
(11) a. Koj kogo kakvo e pital?
        who whom what aux asked
b. Koj kakvo kogo e pital?
c. *Kakvo koj kogo e pital?
(Boskovic 1995, 13–14, as reported in Richards 1997, 281)
The wh word that was highest before movement (here, koj) must remain
highest after movement, but the other two wh words can appear in either
order. Why would this be? In the theory of Superiority I have just out-
lined, the answer is straightforward: each of the lower wh words must
stand in the dependent-independent variable relation to the highest wh
word, a relation governed by BT; but they need not bear any particular
relation to each other. That is, the dependence is strictly binary. In this
regard Bulgarian is just like English, where in (12), for example, what
depends on who and whom depends on who, but what and whom bear no
particular relation to each other.
(12) [structure not reproduced: an English multiple-wh question in which both what and whom depend on the moved who]
Why should they? Two reflexive pronouns, for example, may share an
antecedent, but nothing in any theory I am aware of forces any particular
structural relation between the two anaphors themselves.
(13) John gave a picture of himself to himself.
In a multiple-wh question there can be but one independent variable,
and the rest of the wh words are dependent. In English, the independent
variable is the moved one, and all the unmoved wh words must be de-
pendent on it. This is perhaps why wh movement is obligatory in English,
in the sense that exactly one wh word must move: if an unmoved wh
word must be dependent, then there must be a moved wh word that is
independent.
So Bulgarian and English multiple-wh questions are alike in that one
wh word (always the independent variable) moves to SpecC, and the rest
of the wh words are dependent on that one. They differ in that in
Bulgarian all wh words are moved (or scrambled) to the position of the
moved wh word (a difference I will take up in later sections). But that is
perhaps the only difference; in other words, there is no more reason for
Bulgarian than for English to assume that the dependent wh words bear
any particular relation to one another.
In the Icelandic object shift construction involving V and two NPs (see
chapter 2), the situation is quite different; it is in fact the entire
constellation of V + NP1 + NP2 whose pieces can be reordered with negation,
but never in such a way as to reorder any of the parts of the constellation.
Here an entire pattern is being holistically conserved; in multiple-wh
movement, only the relation of each of the dependent variables to the
independent variable is conserved. So the condition governing object shift
(Shape Conservation) and the condition governing multiple wh (BT
applying to the dependent-independent variable relation) are fundamentally
different, and different in a way that flows from their very different status
in RT.
As Richards (1997) shows, only the first of the wh words in a multiple
question shows Subjacency effects in its relation to its deep position. He
accounts for this in terms of his notion of a "Subjacency tax": in effect,
the first movement to a particular SpecC must obey Subjacency, and all
further movements to that SpecC are free to violate Subjacency. Strictly
speaking, we could preserve the RT program by simply accepting this
view in the present context as well and moving on to other questions. But
the distinction we have been using between dependent and independent
wh words suggests a different view: namely, that only independent wh
words are subject to Subjacency. The movement of the dependent wh
words could be effected by further applications of wh movement (relieved
of the need to obey Subjacency) or, in RT, by interlevel scrambling. In
the following, I will suggest that Rudin's (1988) original distinction is
valid (see section 6.2.1): the independent wh word moves by wh movement,
an intralevel movement, and the rest of the wh words move by
scrambling, the interlevel relation governed by Shape Conservation.
6.2 Scrambling Wh Words
There is good evidence that multiple wh movement always involves
scrambling. One consideration is the role of D-linking in governing multiple
wh movement—essentially the same as its role in governing scrambling.
Another consideration is the focusing effects that the reordering of
wh words has on interpretation—again, just what is found with scrambling.
Focusing and D-linking are of course different, as I emphasized in
chapter 2. D-linked elements can be focused, a fact that seems to me to
have been overlooked, partly because of the notion that focusing involves
"new information" and D-linking "old information," which I regard as a
confusion (see chapter 2 and Wiltschko 1997).
6.2.1 Scrambling and D-Linking
There is clear evidence, presented in Rudin 1988 and strengthened since
then, that the movement of the ‘‘extra’’ wh words in both Serbo-Croatian
and Bulgarian multiple-wh questions is due to focus-motivated scram-
bling, and not to a rule akin to wh movement. Part of the evidence comes
from the behavior of D-linked wh expressions.
First, D-linked wh expressions in Serbo-Croatian need not move,
whereas non-D-linked wh expressions must, on the assumption that in
this language, as in English, bare wh words are not D-linked, but ‘which
N’ NPs are.
(14) a. Ko sta kupuje?
        who what bought
     b. *Ko kupuje sta?
        who bought what
     c. Ko je kupio koju knjigu?
        who.nom aux.3sg bought.prt which book.acc
(Konapasky 2002, 101)
However, if the D-linked wh word is the only wh NP in a question, then it
must move.
(15) Jucer je Petar kupio koju knjigu?
     yesterday aux.3sg Petar.nom bought.prt which book.acc
(Konapasky 2002, 105)
These facts suggest that wh movement is obligatory in the sense that a wh
SpecC must be filled, but not obligatory apart from that. Moreover, they
suggest that the movement of the noninitial wh words is not a movement
targeting wh attractors, but a kind of scrambling. Boskovic (1999) in fact
suggests that there is no wh movement in Serbo-Croatian single-clause
sentences, only scrambling; but Konapasky (2002) uses the facts just cited
to justify Rudin’s original claim, against Boskovic’s—namely, that the
first wh phrase targets wh attractors, but the others do not.
A second rather e¤ective argument can be built on the fact that in some
dialects at least, non-D-linked wh phrases in embedded non-wh clauses
are not extracted from their embedded clause, but are nevertheless obli-
gatorily fronted within their clause.
(16) a. Kok tvrdis [da koga tk voli]?
        who.nom claim.2sg that who.acc love.3sg
        'Who do you claim that who loves?'
     b. *Kok tvrdis [da tk voli koga]?
(Konapasky 2002, 97)
Koga in (16a), whose movement is apparently obligatory, does not end up
in the supposedly triggering Spec in the matrix. Rather, this movement
appears to happen in response to D-linking-related scrambling pressures
that arise within the embedded clause itself. This again strongly suggests
that the movement of the noninitial wh words is not targeting wh attrac-
tors in any of the cases.
Taken together, then, the behaviors exhibited by D-linked wh ex-
pressions strongly suggest that the first wh phrase is moved by obligatory
wh movement, and that the other wh phrases are moved by D-linking-
sensitive scrambling.
Further evidence leading to the same conclusion comes from (only)
Bulgarian. As noted earlier, multiple wh movement in Bulgarian is order
preserving; however, as both Rudin (1988) and Richards (1997) discuss,
D-linked wh expressions do not obey this stricture.
(17) a. Koj kogo e vidjal?
        who whom aux seen
        'Who saw whom?'
     b. *Kogo koj e vidjal?
     c. Koj profesor koja kniga e vidjal?
        which professor which book aux read
     d. ?Koja kniga koj profesor e vidjal?
     (Richards 1997, 104; from R. Izvorski, personal communication)
There is apparently a noteworthy difference between (17b) and (17d),
and D-linking is presumably implicated since the difference comes down
to 'which N' versus 'who'. Why is crossing-over allowed for D-linked
phrases only? Richards (1997, 111) suggests that there is an extra
attractor in (17d), a Topic phrase above CP. I will accept this conclusion, but
interpreted in RT terms—it implies that in (17d) the primary wh movement
affects koj profesor; the movement of koja kniga is secondary, and
hence scrambling. If the wh-moved phrase in (17d) is the independent
variable, then the topicalization of koja kniga reconstructs for the
establishment of the dependent-independent variable relation, and koja kniga
is therefore the dependent variable, despite appearing first in the clause.
(18) [diagram not reproduced: YS, with koja kniga fronted over koj profesor, misrepresents XS]
XS is whatever level the dependent variable relation is licensed in, and YS
"misrepresents" XS (symbolized by '↝').
It is important to realize that there is nothing incoherent about
topicalizing, focusing, or in any manner moving a dependent variable to
the head of the clause. In fact, this happens in English (just not with wh
variables).
(19) Which of his poems does every poet like best t?
In (19) his poems is dependent on every poet but has been moved beyond
it by wh movement. The dependent variable relation is determined in such
cases under reconstruction.
The conclusion that the primary wh word undergoes wh movement,
and that the movement of the secondary wh words is achieved by different
means, has consequences for models of these structures. A model in
which the only means for displacement is the unified and general theory
of "movement" is hard pressed to account for the different behaviors of
different kinds of movements without undermining the unification and
generality of the theory. I think that Richards's (1997) theory of movement
comes the closest to addressing these questions. In his view, what I
would call shape-conserving movement occurs whenever several elements
are attracted to the same (instance of) the same feature. Shape Conservation
is not a principle of Richards's theory, but a consequence of how
Shortest Move is defined (see Richards 1997 for a discussion of the
definitions that yield the results).
Richards analyzes multiple-wh question movement in Serbo-Croatian
and Bulgarian as multiple movements to a single attractor. But examples
like (16a) pose difficulties for the view that the movement of the secondary
wh expressions is provoked by a wh attractor, since the movement
does indeed occur, and in fact obligatorily, but not to the site of the
purported attractor.
In the RT view there are two kinds of displacement: movement, with
approximately the properties associated with wh movement, and scram-
bling, which results when the shape-conserving mapping that must hold
between levels is relaxed for one reason or another.
6.2.2 Long-Distance versus Short-Distance Scrambling
I now take up Rudin’s (1988) notion that the final position of all but the
first of the wh words in a multiple-wh question arises from scrambling,
looking especially at problems that come up in implementing her insights
in RT.
6.2.2.1 Serbo-Croatian If the above conclusions about how Bulgarian
and English multiple-wh questions work are correct, then we will not
want to extend Superiority to A movements or scrambling; we have good
reason to maintain that local scrambling and object shift are best ana-
lyzed as interlevel holistic (mis)mapping, whereas the relation between wh
words in long-distance multiple wh movement is best treated as pairwise
instances of binding within a single level. In fact, though, some construc-
tions seem to arise from an interaction between the two kinds of relations,
and it has been a commonplace in the literature on multiple wh move-
ment since Rudin 1988 to distinguish long- and short-distance movement
along these lines.
Serbo-Croatian differs from Bulgarian in allowing reordering of wh
words.
(20) a. Ko je koga vidjeo?
        who aux whom saw
        'Who saw whom?'
b. Koga je ko vidjeo?
Serbo-Croatian thus shows no Superiority effects here; however, with
long multiple wh movement (not grammatical for all speakers)
Superiority effects again show up.
(21) a. Ko je koga vidjeo?
        who aux whom saw
     b. Koga je ko vidjeo?
     c. Ko si koga tvrdio da je istukao?
        who aux whom claimed that t aux beaten t
        'Who claimed that who was beaten?'
     d. *Koga si ko tvrdio da je istukao?
(Boskovic 1995, as reported in Richards 1997, 32)
(21a,b) show that exchange is possible for short movements, while (21c,d)
show that it is not possible for long ones.
This difference between long and short scrambling is familiar from
the findings in chapter 5. The scrambling involved in (21) resembles the
scrambling that "fixes" WCO violations. Since we are in fact assuming
that Superiority is a subcase of WCO, these facts are not surprising. If,
for concreteness, we assume that WCO is adjudicated in PS, then the
scrambling in question could occur in the CS↝PS mapping. Rudin (1988) in fact
gives evidence that Serbo-Croatian has WCO-correcting scrambling
independent of what happens to wh words.
(22) CS ↝ PS
     (WCO scrambling)
Although Serbo-Croatian SS is compatible with either order of two wh
words, we should not expect the two orders to be equivalent (‘‘Nature
hates a synonymy’’), and they are not. Konapasky (2002) translates the
two cases in the following way:
(23) a. Ko je sta prodao?
        who.nom aux.3sg what.acc sold.prt
        'Who sold what?'
     b. Sta je ko prodao?
        what.acc aux.3sg who.nom sold.prt
        'What exactly did who sell?'
Konapasky interprets the difference as a difference in focus, pointing out
that in (23b) the moved wh word is interpreted as focused. We might
slightly reinterpret this finding in light of the ideas about the dependent-
independent interpretation of multiple questions; we could well imagine
that what is special about (23b) is the reversal in the dependent-
independent interpretation. This is the "marked" interpretation precisely because
it is the one that does not faithfully mirror PS; but the mismatch is
licensed precisely because it does achieve the other construal of the
sentence, the one switching the dependent and independent variables. Such
a change in interpretation is consistent with the conclusion that Serbo-
Croatian has WCO-fixing CS↝PS scrambling. On this account the
primary interpretive difference will be "logical"—having to do not with
focus, but with the relation between the two wh words; the focusing
difference would then be a side effect. Further study of the semantic
difference between the inverted and uninverted structures is clearly required, as
I am only guessing at what Konapasky's gloss might mean.
In Bulgarian, where any wh word except the top one can reorder (see
(11)), one would expect differences in meaning to be associated with the
different orders. I have seen no discussion of the relevant cases, and I had
difficulty getting Bulgarian informants to verbalize any such difference.
An intriguingly similar situation arises in a completely different domain:
the ordering of prenominal adjectives in English. As is well known,
the ordering is largely fixed, although the principles governing the order
remain obscure.
(24) a. i. a big red house
ii. *a red big house
b. i. a stupid old man
ii. *an old stupid man
It may be that the ordering is determined by something like which predi-
cate expresses a more natural ‘‘general’’ class, where the relevant sense
of natural is not strictly linguistic. Be that as it may, the relevant point
here is that the ‘‘wrong’’ order can legitimately occur, but with a special
interpretation.
(25) a. a RED big house
b. an OLD stupid man
At first glance the difference between (24) and (25) might be seen as a pure
focusing e¤ect, a kind of contrastive focusing. In a Checking Theory,
for example, one might insert a Focus projection somewhere in the func-
tional structure of NP (or DP), with a feature that draws the focused
adjective to it.
I think, though, that this approach is not correct, and in fact that the
focusing effect here, as in the Serbo-Croatian inverted-wh cases, is
secondary. The crux of the focusing account is that the adjectives in the
inverted cases function semantically as though they occurred in their
uninverted order except for the fact that they are focused; that is, focusing
is laid on top of the usual interpretation that these adjectives would have.
But the focusing account of inversion can be sustained only for cases
that involve predicates for which differences in the reference of the NP
would seem to be indifferent to the predicates’ order. To take (24bi), if we
take the set of old men and then take the stupid ones of those, we should
get the same result as if we were to take the stupid men and then take
the old ones of those. So there appears to be no ‘‘logical’’ difference in the
interpretation of the two cases. But for an important class of cases the
intersective interpretation of the adjectives is not available.
(26) the second green ball
(Matthei 1979)
Superiority and Movement 153
Here the order is fixed: first we take the green subset of all balls, and then
we take the second (according to some ordering scheme) of those. Impor-
tantly, in such cases reversing the two adjectives produces more than just
a change in the focusing.
(27) the GREEN second ball
(27) is sensible only when there is some way to define a set of ‘‘second
balls’’ and then take the (unique) green one from it. For example, if we
came across a two-dimensional array of balls, we might understand the
second column of balls to be the set of ‘‘second balls’’ and then look for
the green one among them.
Most significantly, (27) is not ambiguous, and in particular it has no
interpretation that has the same extension as (26). This means that the
interpretation is not simply focusing laid on top of the usual interpre-
tation of the two adjectives. Rather, the fundamental logical relation
between the two adjectives has changed. (27) forces into existence a
weird notion, the set of ‘‘second balls’’; but as soon as we understand how
that notion might be realized in some concrete situation, the weirdness
subsides.
In turn, this means that if the adjectives are inverted by scrambling,
then that scrambling precedes the ‘‘compositional’’ semantic interpreta-
tion of modification. Importantly, this ordering is obligatory, as I think
there is no alternative purely ‘‘focused’’ interpretation of (27).
It seems to me that exactly the same holds for Serbo-Croatian scram-
bling of wh words: it precedes the establishment of the basic logical rela-
tions among the wh words, and thus precedes the level (by assumption,
PS) in which those relations are established. This conclusion is buttressed
by the fact that Serbo-Croatian in any case has a type of scrambling that
could do this, namely, WCO-fixing scrambling.
(28) a. ??Njegovi_i susjedi   ne  vjeruju nijednom politicaru_i.
         his        neighbors not trust   no       politician
     b. Nijednom politicaru_i njegovi_i susjedi ne vjeruju.
(Richards 1997, 30; from M. Mihaljevic, personal
communication)
6.2.2.2 Modeling Bulgarian Bulgarian differs from Serbo-Croatian in
not allowing scrambling of the two wh-words in a multiple-wh question;
as I noted earlier, drawing on Richards 1997 and Boskovic 1999, scram-
bling can occur among the subordinate wh words, but none of them can
scramble with the first wh word.
(29) a. Koj kogo kakvo e   pital?
        who whom what  aux asked
b. Koj kakvo kogo e pital?
c. *Kakvo koj kogo e pital?
(Boskovic 1995, 13–14, as reported in Richards 1997, 281)
This is trickier to model in RT than the Serbo-Croatian situation, and
in fact it cannot be modeled straightforwardly. Clearly, scrambling occurs
in Bulgarian, but not before wh dependency relations are determined;
otherwise, (29c) would be grammatical, as its counterpart is in Serbo-
Croatian, and its dependencies would be the reverse of those in (29a). At
the same time, though, ‘‘free’’ scrambling cannot occur after the determi-
nation of wh dependencies; if it could, we would again expect (29c) to be
grammatical, but with a ‘‘reconstructed’’ interpretation—that is, with an
interpretation identical to that of (29b). But this leads to the conclusion
that there is no scrambling at all, which of course is inconsistent with the
fact that both (29a) and (29b) are grammatical. So any straightforward
interleaving of scrambling and the other levels involved here (Case as-
signment, the establishment of wh dependencies) by itself will not do jus-
tice to the known facts.
I will propose a solution that capitalizes on what we already know
about wh movement: it always moves the independent variable. If this
feature of wh movement is held constant across all languages, then the
Bulgarian facts can be accounted for in this way: scrambling takes place
after wh dependencies are determined, but before wh movement, as shown
in (30).
(30) CS ------> PS ---(scrambling)---> SS
     Case       wh dependencies        wh movement
So wh movement will apply after scrambling, but because it is constrained
to move only the independent variable, it will lift that independent vari-
able, no matter where it lies among the scrambled wh phrases, to the top
of the structure.
For example, (29b) will have the following derivation (where I = independent and D = dependent):
(31)
The PS→CS representation is rigid, and wh dependencies set up in PS
are fixed: the ‘‘superior’’ wh word must be chosen, since it is identified as
the independent variable. Then scrambling occurs in the PS→SS
representation; by itself, it would give the appearance of ‘‘reconstructed’’
dependencies; the NP superior in CS would be interpreted as the inde-
pendent variable no matter what the surface order. But wh movement
then moves the independent variable to the top in SS, and so the
independent variable in effect ‘‘regains’’ its original superior position.
I must admit I feel uneasy about this account, because it feels like
‘‘cheating’’ against the spirit of RT. Specifically, by allowing wh move-
ment to target only the independent variable, we are ‘‘coding’’ a Superi-
ority property at a previous level (CS, PS) and then allowing it to reassert
itself at a later level.
Against that unease, I rehearse to myself the following. First, the
needed feature of wh movement is already attested for English, and for
that matter, for Bulgarian; and even Japanese, with no overt wh move-
ment, shows a pattern that reflects the wh variable dependencies.
(32) a. *John-ga naze nani-o   katta  no?
        John-nom why  what-acc bought q
b. John-ga nani-o naze katta no?
(Saito 1994)
Here scrambling must obligatorily reorder the wh words. We might take
these facts to show that (a) why is not a good independent variable (as
proposed in Williams 1994b), and (b) Japanese has a scrambling rule that
can apply before wh dependencies are determined. That is, (32a) is the CS
order, but (32b) is the scrambled PS order, the one on which wh depen-
dencies are calculated; (32a) as a CS order will yield an interpretation in
which naze is the independent variable, and so is ungrammatical.
Support for (a) comes from some observations about English quantifi-
cation constructions.
(33) a. *For every reason, someone left.
b. Everyone left for some reason.
c. For every girl, there is a boy.
If we regard the interpretive configuration (Q1 (Q2 (. . .))) as a dependency
of Q2 on Q1, then (33a) shows that reasons are not good independent
variables, but they are good dependent variables. (33c) is a control show-
ing that (33a) is not ungrammatical simply because every cannot take
wide scope from the preposed position. (33a) might have an interpreta-
tion for some speakers in which someone has wider scope than every, but
that is irrelevant here. See chapter 5 and section 6.2.2.3 for more on the
varieties of scrambling in Japanese.
Further support for this interpretation of (32) comes from the following
example:
(34) Dare-ga naze nani-o   katta  no?
     who-nom why  what-acc bought q
     'Who bought what why?'
(Richards 1997, 282)
The additional wh word dare in this example makes it possible for naze to
precede nani. In the context of the proposal just made, the reason is that
dare is now the independent variable, on which both naze and nani are
dependent, and so naze is not forced into the position of being the inde-
pendent variable.
The second thing I rehearse against my unease about the solution
under discussion is that it might not be necessary to countenance
derivations like (31), where a short movement ‘‘hides’’ invisibly beneath a long
movement. I turn to this topic in section 6.3.
6.2.2.3 Long versus Short Monoclausal Scrambling in Japanese Japanese
presents a similar puzzle concerning A and Ā movement, and likewise
presents a puzzle for straightforward Checking Theories. According
to a well-known generalization originally due to Kuroda (1970) (see also
Hoji 1986), Japanese quantifiers are unambiguous in situ, but movement
introduces scope ambiguity.
(35) a. Dareka-ga   daremo-o     hihansita. (∃ > ∀ only)
        someone-nom everyone-acc criticized
     b. Daremo-o dareka-ga t hihansita. (ambiguous)
(Kuroda 1970)
One way to understand the difference between (35a) and (35b) is to
suppose that a moved quantifier may be interpreted in either its moved or
its unmoved position and thus has an ambiguous relation to anything that
it moves over. In classical terms we might understand this in the sense of
A versus Ā movement: A movement results in interpretation in the moved-to
position, whereas Ā movement results in interpretation in the moved-from
position (i.e., scope reconstruction for Ā but not A movement). In the RT
relativization of the A/Ā distinction, we would say instead that scrambling
occurs either before or after scope determination.
The special problem for Checking Theories arises in cases where two
NPs move over the subject.
(36) a. John-ga  dareka-ni   daremo-o     syookaisita. (∃ > ∀ only)
        John-nom someone-dat everyone-acc introduced
     b. Dareka-ni John-ga daremo-o syookaisita. (∃ > ∀ only)
     c. Dareka-ni daremo-o John-ga syookaisita. (∃ > ∀ only)
     d. Daremo-o John-ga dareka-ni syookaisita. (ambiguous)
(Yatsushiro 1996, as reported in Richards 1997, 82)
(36a) and (36b) are as expected. However, (36c) is surprising on the
Checking Theory account: if (36c) has the representation in (37), we ex-
pect it to be ambiguous, which it is not.
(37) [NP1-ni [NP2-o [NP-ga t1 t2 V]]]
The reason is that if both NP1 and NP2 are scopally ambiguous be-
tween deep and derived positions, then either scope order is possible:
NP1 > NP2 if NP1 takes scope from the derived position and NP2 takes
scope from the deep position, and the reverse if the reverse. But the fact is
simply that NP2 cannot take scope over NP1.
Importantly, though, if the two NPs crossing the subject switch their
relative order, then ambiguity results again.
(38) a. Daremo-o dareka-ni John-ga syookaisita. (ambiguous)
     b. Dareka-ni daremo-o John-ga syookaisita. (∃ > ∀ only)
The problem for Checking Theory is that it atomizes the NP movement
relations here, as each NP is checked independently of the others. It
therefore cannot account for effects that arise from the relative
‘‘movement’’ of two NPs with respect to each other, just the kinds of effects for
which RT was envisaged.
Richards (1997) uses such examples to promote the idea that Superior-
ity holds for A movement, once ambiguity is controlled for. That is, (38b)
illustrates A movement obeying Superiority, and (38a) doesn’t count, be-
ing ambiguous. In Richards’s account of (38a) the two NPs move to the
same functional node, and Superiority dictates their relative order. In
(38b) they again move to the same node, in the same order, but an extra
higher attractor (EXTRA in (39)) attracts NP-o to a higher position,
giving rise to ambiguity; since the extra attractor only attracts NP-o, Su-
periority does not prevent this movement (see Richards 1997 for formu-
lation of the relevant principles).
(39) [NP2-o [NP1-ni [t2 [NP-ga t1 t2 V]]]]EXTRA
Positing extra attractors does not solve this problem, though. In prin-
ciple, there is now no reason not to posit yet another attractor that
attracts NP-ni over the (derived) position of NP-o, thus again predicting
that the order ‘‘NP-ni NP-o NP-ga’’ will be ambiguous.
(40) [NP1-ni [NP2-o [t1 [t2 [NP-ga t1 t2 V]]]]EXTRA1 ]EXTRA2
But we know from (36c) that it is not.
The basic generalization about Japanese quantifiers is, ‘‘If two NPs
cross, ambiguity results,’’ understood in such a way that NP-ni and NP-o
do not cross in (36c), but do cross in (38a). But Checking Theory, because
it atomizes movement relations, cannot deal with cases where several
things move in concert. It must be augmented with an extrinsic principle
that controls either the input or the output of the derivation in a way that
has nothing to do with the operation of Checking Theory itself. In this
way, Checking Theory can be shielded against these and other related
empirical challenges, but at the cost of having less and less to say about
how these systems actually work.
How are these facts to be accounted for in RT?
Let us suppose that the orders of NP-ga, NP-ni, and NP-o are
represented in CS by the following structure:
(41) [NP-ga [NP-ni [NP-o V]]]
And let us suppose that SS can be generated by general rules, such as
(42).

(42) S → [NP1 [NP2 [NP3 V]]]

Suppose further that SS uniquely determines quantifier scope; that is,
SS→QS is strictly enforced.
The problem then reduces to this: how can (41) be mapped onto (42)
isomorphically? There is only one way, of course: NP-ga → NP1, and so
on. This is why (41) as a surface structure is not ambiguous. The other
mappings are misrepresentations of (41). The following two are the mis-
representations that give rise to (38a) and (38b), respectively:
(43)
(44)
(43) shows why ‘‘NP-ni NP-o NP-ga’’ has wide scope for NP-ni, and (44)
shows why ‘‘NP-o NP-ni NP-ga’’ has wide scope for NP-o.
Now the question remains, why can NP-ni have wide scope in ‘‘NP-o
NP-ni NP-ga V,’’ when NP-o does not have wide scope in ‘‘NP-ni NP-o
NP-ga’’? The answer must be that ‘‘. . . -o . . . -ni . . .’’ is more distant from
CS than ‘‘. . . -ni . . . -o . . .’’ and so is warranted only if a difference in
meaning is achieved—that is, only if the further mismatch is compensated
by a closer match to QS. At least, that is the logic of RT.
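The ‘‘distance from CS’’ invoked here is left informal in the text. As a purely illustrative sketch (the metric and the function name are my own, not part of RT), one might measure distance as the number of word pairs whose relative order reverses the canonical CS order of (41):

```python
# Illustrative sketch only: RT does not define "distance from CS" formally.
# Here distance is taken to be the number of pairwise order reversals
# (inversions) relative to the canonical CS order [ga, ni, o] of (41).
from itertools import combinations

CS_ORDER = ["ga", "ni", "o"]  # assumed canonical order, from (41)

def distance_from_cs(surface, base=CS_ORDER):
    """Count pairs in `surface` whose relative order reverses `base`."""
    rank = {np: i for i, np in enumerate(base)}
    return sum(1 for a, b in combinations(surface, 2) if rank[a] > rank[b])

# (36c) "NP-ni NP-o NP-ga": two pairs reversed (ni-ga, o-ga)
# (38a) "NP-o NP-ni NP-ga": three pairs reversed (o-ni, o-ga, ni-ga)
print(distance_from_cs(["ni", "o", "ga"]))  # → 2
print(distance_from_cs(["o", "ni", "ga"]))  # → 3
```

On this toy metric, ‘‘. . . -o . . . -ni . . . -ga’’ is indeed farther from CS than ‘‘. . . -ni . . . -o . . . -ga,’’ matching the asymmetry the argument requires; nothing in the text commits the analysis to this particular measure.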
6.2.2.4 Long versus Short Scrambling in Hungarian The standard
treatment of long versus short scrambling facts is to posit two different
movements, A and Ā, and (or) two different positions, A and Ā, which
are their respective targets. This is the strategy adopted in Checking
Theories, for example. But in the context of RT, we could instead propose
a single position that is the ‘‘target’’ of two different ‘‘movements’’:
a ‘‘virtual’’ movement, which arises as a part of the (mis)representation
of one level by another, and an Ā movement, which is intralevel SS
movement.
Analysis of WCO and Superiority facts in Hungarian suggests that this
must be so. There appears to be only one Focus position, which appears
just before the verb and whose filling triggers verbal particle postposing;
but WCO and Superiority violations arise only when the position is filled
by long wh movement.
(45) a. Kit_i   szeret az  anyja_i    t_i?
        who.acc loves  the mother-his
     b. *Kit_i  gondol az  anyja_i    hogy Mari szeret t_i?
        who.acc thinks the mother-his that Mari loves
(E. Kiss 1989, 208)
This suggests that we cannot associate the preverbal Focus position with
either A or Ā status; that is, we cannot call it an A or an Ā position,
independent of when the movement takes place. In RT we need only set
things up in Hungarian so that the Focus position is accessible some time
before CP embedding takes place. There is no need to fix ahead of time
how a given position in SS will be filled.
6.3 Masked Scrambling
The worrisome thing about the last derivation posited above for Bulgar-
ian (see (31)) is that there is an ‘‘invisible’’ structure in which the inde-
pendent wh word is not superior to the rest. A similar situation arose in
the discussion of Japanese quantifier scrambling in section 6.2.2.3. Per-
haps ‘‘invisible’’ scramblings are not allowed. If so, then derivation (31)
will not occur, but another one will be allowed, as the scrambling in that
case is not invisible.
(46)
A ‘‘paradox’’ arises from having both A and Ā movements available
for the same ‘‘process.’’ The problem is that a sentence in which Ā
movement is supposed to have applied can always be viewed instead as the
outcome of an application of A movement, followed by the application
of Ā movement; the surface order will be the same, but the interpretive
effects will be different. As we saw in (21), repeated here, in Serbo-
Croatian short scrambling precedes dependent-independent variable
fixing, whereas long scrambling follows it (and so reconstructs for it).
(47) a. Ko  je  koga vidjeo?
        who aux whom saw
     b. Koga je ko vidjeo?
     c. Ko  si  koga tvrdio  da   t je  istukao t?
        who aux whom claimed that   aux beaten
     d. *Koga si ko tvrdio da je istukao?
But what prevents a derivation of (47d) in which first the two wh words
switch positions in the lower clause by short scrambling, and then the
same wh words move to the Ā position in the higher clause, thus
nullifying Superiority effects?
(48) D-Structure → A scrambling → Ā movement

(Of course, in RT scrambling is not classical movement; I put the matter
in classical terms here because the issue is not specific to RT.) If such a
derivation were possible, (47d) should be grammatical. We must prevent
A scrambling from applying to the wh words in the lower clause, or at
least prevent wh movement from applying to its output.
There is a subtlety in determining what would count toward making a
scrambling ‘‘invisible.’’ Certainly part of it has to do with whether the
surface string shows the scrambling order; if it does, then the scrambling
is certainly not invisible. However, there is another way in which a
scrambling, even one that did not manifest itself in the surface string,
could achieve visibility: it could induce some effect in the interpretation.
In fact, visibility is a matter of interpretation anyway. The scrambling
is visible in the obvious sense if there is some sign of it in the phonological
interpretation; therefore, one could easily imagine that the semantic
interpretation could provide some sign as well, in the form of an effect on
meaning.
The crucial case of this type would be the one in which long scram-
bling appeared to give rise to WCO repair, by virtue of a prior, ‘‘string-
invisible’’ short scrambling.
(49) CS: [NP1 NP2 V]S1
     PS: [NP2 NP1 V]S1
     SS: [NP2 [NP1 V]S1]S2
Scrambling takes place at both CS→PS and PS→SS; the CS→PS
scrambling is string invisible. If this derivation is allowed, then the crucial
question is, what relation does it bear to the derivation in (50), with which
it coincides in both CS and SS?
(50) CS: [NP1 NP2 V]S1
     PS: [NP1 NP2 V]S1
     SS: [NP2 [NP1 V]S1]S2
Although (49) and (50) are string indistinguishable, they might differ in
interpretation. The difference would center on the interpretive properties
of PS and (under the assumptions we have made) would include the
bound variable dependencies that WCO governs. A single long scram-
bling will appear to reconstruct for such dependencies; but a short
scrambling, followed by a long scrambling, will not. So the crucial ques-
tion is, if a language has both short and long scrambling, and the short
scrambling has interpretive effects, are all long scramblings ambiguous?
In the cases examined in this book, it appears they are not. From this we
would tentatively conclude that the prohibition against ‘‘invisible’’
scrambling is a prohibition against ‘‘string-invisible’’ scrambling. How-
ever, I regard this as an open question, and it is entirely possible that the
correct answer is more complicated than the present discussion suggests:
it might, for example, depend on how evident the semantic effect is. In
other words, there is no conclusion about invisible scrambling that
follows from the central tenets of RT, and in fact a number of different
answers to the questions about it are compatible with those tenets. In
what follows I will explore some considerations suggesting that ‘‘string-
invisible’’ scrambling should not be allowed, but further research could
uncover a more complicated situation.
We can in fact observe the behavior of masked scrambling in English.
In the context of RT, scrambled orders (i.e., ones that deviate from TS
and later structure) are marked; and such deviation can be tolerated only
to achieve isomorphy somewhere else. But marked orders must be ‘‘visi-
ble’’; that is, there must be some way to reconstruct them. But if the re-
gion in which the marked order occurs has been evacuated, then that
evidence is gone; for example, once wh movement has taken place in (51),
no evidence remains to show which of the two orders was instantiated in
the lower clause. In such a case we assume the unmarked order, as it has
the lowest ‘‘energy state.’’
(51) a. wh_i . . . t_i NP
     b. wh_i . . . NP t_i
(52) Assume Lowest Energy State
If there is no evidence for the marked order, assume the unmarked
order.
There is some evidence from English for such a supposition. The evi-
dence comes from the interaction of scrambling and contraction. The
known law governing contraction is (53), illustrated in (54).
(53) Don’t contract right before an extraction or ellipsis site.
(54) a. Bill’s in the garage.
b. Do you know where Bill is t?
c. *Do you know where Bill’s t?
But because English has scrambling that can potentially move extraction
sites away from contractions, we can see how (53) interacts with such
scramblings.
The ‘‘normal’’ order for a series of time specifications within a clause
runs from the smallest scale to the largest.
(55) The meeting is at 2:00 p.m. on Thursdays in October in odd years
. . .
Any of these time specifications can be questioned.
(56) a. When is the meeting at 2:00 p.m. t? (Answer: on Thursday)
b. When is the meeting t on Thursday? (Answer: at 2:00)
Furthermore, the time specifications can be scrambled, up to ambiguity.
(57) The meeting is on Thursdays at 2:00 p.m.
Crucially, though, scrambling cannot be used to evade the restriction on
contraction.
(58) a. Do you know when the meeting is t on Thursday? (Answer: at
2 p.m.)
b. *Do you know when the meeting’s t on Thursday? (Answer: at
2 p.m.)
c. Do you know when the meeting is at 2:00 p.m. t? (Answer: on
Thursday)
d. Do you know when the meeting’s at 2:00 p.m. t? (Answer: on
Thursday)
e. *Do you know when the meeting’s on Thursday t? (Answer: at
2 p.m.)
(58b) clearly runs afoul of the trace contraction law (53); but why is (58e)
not a possible structure that would give the appearance that cases like
(58b) had evaded the law? (58e) must be eliminated, and a prohibition
against masked scrambling (52) looks like a promising means of doing
that. But again, I think it would be foolish not to explore more subtle
possibilities governing visibility.
6.4 Locality in RT
In chapter 3 the LEC was used to explain certain locality e¤ects, and in
particular the correlation between locality of operations and other prop-
erties of operations. This naturally raises the issue of whether all locality
effects can be so derived. In fact, not only scrambling is affected by the
locality imposed by the LEC—wh movement is as well. As detailed in
chapter 3, wh movement cannot extract from structures that are not
embedded until after the level at which wh movement applies, and in fact
the islandhood of nonbridge verb complements was cited as an example
of that kind of explanation.
But if we accept the results of this chapter, there will be some obstacles
to reducing all locality to the LEC. Specifically, restrictions on wh move-
ment that fall under the traditional rubrics of Subjacency and the ECP
cannot be explained.
The Wh Island Constraint, for example, cannot be derived. (59) is a
typical Wh Island Constraint violation.
(59) *What_i do you wonder who bought t_i?
Assume the LEC. As attested by the presence of the wh word in its
SpecC, the embedded clause is built up to the level of CP at the level at
which wh movement is defined—let’s say, SS; but if wh movement is
available at SS, there is no timing explanation for the ungrammaticality
of (59). If CP is present in the embedded clause, then it is also present,
and available for targeting, in the matrix clause.
Of course, one could supplement the LEC with more specific ideas
about how levels are characterized. For example, one could require that
all movement in a level applies before all embedding in a level; then tim-
ing would account for the Wh Island Constraint.
I am not at all convinced this is worthwhile. To begin with, there are
languages that are reported not to have a Wh Island Constraint; this
would at least tell us that the stipulation just mentioned was subject to
variation, an odd conclusion given its ‘‘architectural’’ flavor. We would
especially find ourselves in a bind if we were to accept Rizzi’s (1982)
conclusion that Italian has a wh island paradigm like the following:
(60) a. *wh_i . . . [wh . . . [that . . . t_i . . . ]]
     b. wh_i . . . [that . . . [wh . . . t_i . . . ]]
That is, extraction from a that clause inside an indirect question is un-
grammatical, but extraction from an indirect question inside a that clause
is grammatical. Since both wh clauses and that clauses clearly involve CP
structure, they are introduced at the same level, and there is no way to
make this distinction with timing under the LEC. If it is ‘‘too late’’ to
extract wh in (60a), then it is too late in (60b) as well, and so there is no
way to distinguish them. See Rizzi 1982 for examples and for an account
of how languages vary with respect to wh-island effects.
I will tentatively conclude, then, that wh movement is subject to local-
ity constraints on embedding, beyond those predicted by RT.
Importantly, scrambling cannot be subject to constraints beyond those
RT imposes. That is because scrambling is not a rule operating within
any level, but arises as competing requirements of Shape Conservation
are played out. So it is important that scrambling not show any locality
conditions that cannot be reduced to the LEC and its e¤ect on timing.
From this point of view, the conclusions reached in this chapter
about multiple-wh questions are especially significant. Rudin (1988)
argues that the primary and secondary wh movements are different sorts
of movement—the difference between wh movement and scrambling,
respectively. We would thus expect the primary wh movement to obey
Subjacency, and the secondary wh movements to obey only the strictures
imposed by the LEC.
Richards (1997) documents detailed differences between the primary
and secondary wh movements that suggest this distinction might be cor-
rect. Interestingly, Richards’s own theory draws no distinction between
the movement of wh and the movement of other elements; they are all
instances of Move, which has a uniform (if spare) set of properties. In-
stead, Richards proposes what he calls a ‘‘Subjacency tax’’ theory of how
rules are governed by constraints: if several movements target the same
functional projection, the first movement obeys Subjacency, but the rest
of the movements are free to apply in defiance of Subjacency (the first one
having paid the ‘‘Subjacency tax’’). The tax notion exactly distinguishes
the first movement from the rest.
Consider, for example, the following cases in Bulgarian:
(61) a. *Koja kniga_i otrece senatorat   [malvata  ce   iska   da zabrani t_i]?
         which book   denied the-senator  the-rumor that wanted to ban
     b. ?Koj  senator koja  kniga_i otrece [malvata  ce   iska   da zabrani t_i]?
         which senator which book   denied  the-rumor that wanted to ban
(Richards 1997, 240)
The single complex-NP extraction of koja kniga in (61a) is ungrammatical
because of Subjacency; but in (61b) the same extraction causes only weak
unacceptability, because the primary extraction targeting the matrix
SpecC (of koj senator) obeys Subjacency. The movement of NP1 ‘‘pays
the Subjacency tax’’; NP2 is then free to move in violation of Subjacency,
which it in fact does in this example under reasonable assumptions. (See
Richards 1997 for the original formulation of this theory and extensive
examples.)
In the end, then, Richards’s theory delineates approximately the same
difference between the primary and secondary wh movements that Rudin
(1988) proposed, and that is needed in RT; Richards simply derives that
difference from his notion of the Subjacency tax.
We have already discussed examples that cast doubt on the view that
the two movements are the same kind of movement in the first place:
namely, the Bulgarian examples in which a secondary wh word in an
embedded clause does not move to its primary counterpart, but never-
theless obligatorily moves within its own clause ((16), repeated here).
(62) a. Ko_k    tvrdis    [da   koga    t_k voli]?
        who.nom claim.2sg  that who.acc     love.3sg
     b. *Ko_k tvrdis [da t_k voli koga]?
(Konapasky 2002, 97)
Such examples suggest that the difference between the primary and the
secondary wh words has nothing to do with wh attraction. If that were so,
the Subjacency tax theory would be irrelevant, as only a single wh word
would ever be moved to SpecC anyway.
6.5 Conclusion
In this chapter I have pursued the notion that scrambling and wh move-
ment are fundamentally di¤erent: wh movement is an intralevel move-
ment rule, and scrambling is simply the misrepresentation of one RT level
by the next level.
I have argued in particular that in multiple wh movement languages
only one wh expression undergoes wh movement, and the rest undergo
scrambling, essentially Rudin’s (1988) conclusion. I have argued that
assimilating scrambling to wh movement is a mistake, and that in partic-
ular the theory proposed by Richards (1997) leaves significant questions
unanswered.
After citing problems for the views of others, especially Richards, I
think it is only fair to expose a problem with the RT formulation of
multiple movement. The problem arises in trying to state precisely what
occurs at what levels and to correlate that with conclusions drawn from
other languages. For Bulgarian in particular, the problem manifests itself
as a conflict between the ordering of scrambling and its locality. Bulgar-
ian wh scrambling is a long-distance phenomenon (at least in the dialects
that allow it; see the discussion surrounding (16)), penetrating CPs in
particular.
(63) Koj   profesor_i koj   vapros_j t_i iska   [da kaze molitva
     which professor  which question     wanted  to say  prayer
     [predi  da   obsadim      t_j]]?
      before that we-discussed
‘Which professor wanted to say a prayer before we discuss which
issue?’
(Richards 1997, 109; from R. Izvorski, personal communication)
So wh scrambling must occur after CP embedding.
At the same time I have supposed that wh scrambling occurs before wh
movement, since this explains, in the context of RT, why the independent
variable is always exterior. Combining these conclusions with the finding
of previous chapters that wh and CP embedding occur in the same level
(say, SS) results in the following ‘‘ordering’’ paradox (x > y means ‘y
happens before x’):
(64) a. wh movement > wh scrambling
     b. wh scrambling > CP embedding
     c. CP embedding = wh movement
By one consideration, then, wh scrambling is strictly ordered before wh
movement; by another, they occur at the same level, exactly the level at
which CP embedding occurs.
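As an informal illustration (a sketch of mine, not part of the text's formalism), the clash in (64) can be checked mechanically: reading "happens before" as a transitive relation, (64a) and (64b) together force CP embedding to precede wh movement, contradicting (64c). The relation names are taken from (64); everything else is an assumption of the sketch.

```python
# Illustrative sketch of the ordering paradox in (64).
from itertools import product

def closure(pairs):
    """Transitive closure of a 'happens before' relation."""
    pairs = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(pairs), repeat=2):
            if b == c and (a, d) not in pairs:
                pairs.add((a, d))
                changed = True
    return pairs

before = {("CP embedding", "wh scrambling"),   # (64b): embedding precedes scrambling
          ("wh scrambling", "wh movement")}    # (64a): scrambling precedes movement

# (64c) says CP embedding and wh movement occur at the same level, yet the
# closure derives that CP embedding strictly precedes wh movement: paradox.
paradox = ("CP embedding", "wh movement") in closure(before)
```

Here `paradox` comes out true, mirroring the conclusion that one of the three assumptions must be given up.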
The only way to dissolve paradoxes is to attack their assumptions until
one falls. The easiest one to attack here is the identification of the level at
which CP embedding takes place and the level at which wh movement
takes place. Ordering either wh scrambling or wh movement before CP
embedding is out of the question; in particular, it is incoherent, as it is
impossible to extract from something that is not embedded yet. The
ordering we need is CP embedding, wh scrambling, wh movement. There
is no paradox in this order, so long as there are further levels after the
level of CP embedding. We simply don’t have the means to independently
identify the other levels.
The other possibility would be to develop some means of allowing wh
scrambling to occur after wh movement, but in such a way that a sec-
ondary (non-D-linked) wh expression could not be scrambled above the
primary one. The latter task is daunting because we have seen that such
scrambling above the primary wh expression is possible in certain lan-
guages: witness the long topicalization of wh words in Japanese, for ex-
ample, discussed in chapter 5. I will leave the problem unresolved.
Chapter 7
X-Bar Theory and Clause Structure
Taken together, this chapter and the next provide what I would tenta-
tively call the RT model of phrase structure, inflection, and head-to-head
phenomena. Even taken together, they are too ambitious for their length,
as they propose a theory of phrase structure that incorporates (i.e., elimi-
nates) both overt and covert head movement, and an account of the
morphology/syntax interface (‘‘morphosyntax’’) that presumes to forgo
‘‘readjustment’’ rules.
The two chapters are interdependent in that this chapter introduces
the definitions of phrasal categories, the mechanisms responsible for
agreement and Case assignment, and the relation between these and
the inflectional categories marked on the verbal head, and the next chap-
ter proposes a theory about how the inflected verbal head is spelled
out. Beyond that, this chapter uses mechanisms that are not fully devel-
oped or justified until the next chapter: specifically, the marking of the
complement-of relation on category nodes (using the sign ‘‘>’’), the no-
tion of reassociation, and the particular theory of multiple exponence.
Before turning to these matters, I would like to outline why I think
there is an RT model of phrase structure that is different from the stan-
dard treatment, and to briefly suggest how it is different. The phenomena
explored here are accounted for in the standard model by a combination
of X-bar theory and movement governed by the Head Movement Con-
straint (HMC; Travis 1984). The HMC is commonly understood to be a
subcase of Relativized Minimality (Rizzi 1990). Relativized Minimality
says that locality conditions are parameterized and that the significant
subcases correspond to the A, A, and V (or head) subsystems. But in the
past several chapters I have suggested that the A/A distinction should be
generalized to, or dissolved into, a more general parameterized distinction
(A/A/A/A) defined by the RT levels, and that the locality associated with
each of these is determined by the level in which it is defined, in that it is
determined by the size of the structures that are assembled at that level.
But now the Relativized Minimality series ‘‘A/A/head’’ becomes awk-
ward. There is no natural place for ‘‘head’’ in the new generalization.
This suggests that the locality of head movement needs a separate ac-
count, not related to the A/A distinction or its generalization in RT.
V cannot be located in any particular level, but in fact occurs in every
level; it differs in this respect from the entities A/A/A . . . and so cannot
be assimilated to them. In fact, verbs, and heads in general, are inde-
pendently parameterized by the RT levels. See the more extensive discus-
sion of Relativized Minimality in section 7.3.
In the following discussion the sign ‘‘>’’ indicates the complement-of
relation. In this chapter and the next, we will see that this relation always
holds between two elements, but in fact elements of quite diverse types,
including at least the following:
(1) a. a word and a phrase (saw > [the boy]NP)
b. a morpheme and a morpheme (pick < ed )
c. a word and a word (V > V)
d. a feature and a feature (Tense > AgrO)
This is further complicated by the fact that words make up phrases,
features make up word labels, and so on, and there must be some rela-
tion between the complement-of relations of complex forms and the
complement-of relations of their parts. What follows, in this chapter and
the next, is a calculus of these relations that seems to me to be the most
appropriate for RT.
These two chapters flesh out a view of the relation between syntax and
morphology that I have put forward in a number of places, particularly in
Williams 1981a and 1994a,b and in Di Sciullo and Williams 1987. In
those works I viewed the Mirror Principle as arising from the fact that
words and phrasal syntax instantiate the same kinds of relations. As
argued in Di Sciullo and Williams 1987, the Mirror Principle is nothing
more than the compositionality of word formation; that is, [pick + -ed]V
as a morphological unit is equivalent to [did pick]VP as a syntactic unit.
Both instantiate the complement-of relation between T and V, but one
does it in a head-final ‘‘word’’ structure with its properties, and the other
in a head-initial ‘‘phrase’’ structure with its own different properties.
These two chapters attempt to provide a more explicit calculus to back up
that claim.
This view, sometimes called lexicalism, has been confused with another
view, one that goes back to Den Besten 1976 and before that to Genera-
tive Semantics, related to ‘‘deep versus surface’’ lexical insertion. The no-
tion ‘‘surface’’ (or ‘‘late’’) lexical insertion of course only makes sense in a
derivational theory. But even in a nonderivational theory we can ask
what the relation is between the form of the word and the environment it
appears in. In earlier work I took the view that the lexicon contains its
own laws of formation, sharing some features with, but different from,
the laws of syntax, and that the ‘‘interface’’ between the lexicon and syn-
tax could be narrowed to exactly this: the lexicon produces lexical items
with their properties, and syntax determines the distribution of such
words solely on the basis of their ‘‘top-level’’ properties, not on the basis
of how they came to have those properties during lexical derivation. I
think this view is vindicated by the nearly inevitable role that ‘‘lexicalism’’
plays in RT.
I think the question of whether insertion is ‘‘late’’ or ‘‘early’’ depends to
such an extent on the particular theory in which it is asked that to raise it
in the abstract is useless. For example, Generative Semantics and Den
Besten’s (1976) theory are quite different frameworks, so different that
each one’s assumption of ‘‘late insertion’’ can hardly be seen as support-
ing it in the other. But I do think that the above-mentioned question
about ‘‘lexicalism’’ can be fruitfully raised as a general programmatic
question.
7.1 Functional Structure
I will propose here an X-bar theory in which a lexical item directly ‘‘lex-
icalizes’’ a subsequence of the functional hierarchy, where by functional
hierarchy I mean the sequence of elements that make up clause structure:
T > AgrS > AgrO . . . Aspect > V. In the construction of a clause, the
entire functional hierarchy must be lexicalized; however, there is more
than one way to accomplish that. For example:
(2) [tree diagrams not reproduced: alternative lexicalizations of the
functional hierarchy, e.g., was in (2a) lexicalizing the subsequence
T > AgrS]
In the theory to be presented, it is not just that was in (2a) bears some
relation to the bracketed subsequence T > AgrS; rather, it is T > AgrS
in that ‘‘T > AgrS’’ is its categorial label. All types of elements—
morphemes, words, compounds, phrases—can realize subsequences; and
no element can realize anything but a subsequence. In this theory ‘‘lexi-
calizing a subsequence’’ is not a derived property of lexical items; rather,
it is simply what lexical items do.
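As a rough sketch of this idea (my own illustration, with an abbreviated four-member hierarchy standing in for the full Pollock-Cinque sequence; the function names are invented for the example), ‘‘lexicalizing the entire hierarchy’’ can be read as covering it, in order, with contiguous subsequences:

```python
# Illustrative sketch: each item realizes a contiguous subsequence of the
# functional hierarchy, and a clause must lexicalize the whole hierarchy.
# The four-member list abbreviates T > AgrS > ... > AgrO > V.
HIERARCHY = ["T", "AgrS", "AgrO", "V"]

def is_subsequence(label):
    """True iff label is a nonempty contiguous stretch of the hierarchy."""
    n = len(label)
    return n > 0 and any(HIERARCHY[i:i + n] == label
                         for i in range(len(HIERARCHY) - n + 1))

def lexicalizes(items):
    """True iff the items' labels, in order, jointly spell out the whole
    hierarchy, each item realizing one contiguous subsequence."""
    if not all(is_subsequence(label) for label in items):
        return False
    return [cat for label in items for cat in label] == HIERARCHY

# An auxiliary realizing T > AgrS plus a main verb realizing the rest
# (an assumed split in the spirit of (2)):
lexicalizes([["T", "AgrS"], ["AgrO", "V"]])   # well formed
lexicalizes([["T", "AgrO"], ["AgrS", "V"]])   # ill formed: not subsequences
```

A single inflected verb realizing the whole chain passes the same check, which is the sense in which ‘‘there is more than one way’’ to lexicalize the hierarchy.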
I suppose the biggest mystery about language that these proposals turn
on is where the functional hierarchy comes from in the first place—why
there is such a thing as Cinque’s (1998) functional hierarchy. Put dif-
ferently, why is intraclause embedding so fundamentally different from
interclause embedding? The question is acute in all frameworks, but has
not been seriously addressed. I am not innocent. The tradition of solving
syntactic problems by introducing new fixed levels of internal clause
structure includes my own dissertation (Williams 1974), which sought to
explain transformational rule ordering by appealing to four levels of in-
ternal clause structure (and I secretly thought there were six) and arguing
that the subparts of that structure had independent existence (as small
clauses), but without asking where the structures came from.
To me, the mystery is this: why aren’t all embeddings on a par, each
phrase’s properties being determined by its head, or its subparts, in the
same way? In such a theory the transition from V to its complement CP
would be no di¤erent from the transition from T to AgrS or from AgrS to
AgrO. But this simplest state of affairs is not what we find, and in ac-
knowledgment of the mysterious distinction I will refer to them as com-
plement embedding and functional embedding, respectively.
A related mystery is, why is the internal structure so rigid? Cinque
(1998) has identified over a hundred steps from the top to the bottom of
a clause. In fact, the clause might not have the purely linear structure
Cinque suggests. It at least seems to have some adjunct ‘‘subcycles,’’ as
shown here:
(3) a. John let us down every month every other year every decade . . .
b. John let us down on every planet in every galaxy . . .
c. [[[VP] XP] XP] . . .
d. ?John let us down because Mary was there because he was sick
. . .
Recursion of time and place is possible, as schematized in (3c), but per-
haps not of causes. The obvious nesting of meanings in such subcycles
suggests that the whole structure itself might be explicated in terms of
meaning, but nothing substantive has been forthcoming, and I have
nothing to add myself.
At any rate, what follows is a theory of what the syntax of expressions
that express a single functional hierarchy can look like, and it executes
the idea that all items, including all lexical items, lexicalize (or realize)
subsequences of the functional hierarchy. As interesting as I think the
consequences are, I must warn in advance that my proposals do not
address this mystery of why complex phrases with fixed functional struc-
ture exist in the first place. I hope that whatever the solution to this mys-
tery turns out to be, it will be compatible with what follows, and so I will
take the existence of the functional sequence and its linear structure as
axiomatic.
7.2 An Axiomatization of X-Bar Theory
Consider the complement embedding of the direct object NP under V.
Full NPs (or DPs, whichever turns out to be right) may not exist until SS;
at least, some of their components, such as relative clauses, do not exist
until then. Nevertheless, TS contains a ‘‘primitive’’ version of an SS NP,
CS a more developed version, and so on; and these are in correspondence
with one another under Shape Conservation.
(4) a. TS: [amalgamatev holdingsnp]vp
    b. CS: [amalgamateV [his holdings]NPacc]
(The introduction of adjuncts, in this case his, will be taken up later.) To
propose a term, there is a shadow of the CS NP his holdings in TS, and
that shadow is its correspondent under representation: the np holdings in
TS.
Functional embedding, on the other hand, introduces material into a
tree at a later level that has no shadow or correspondent in TS. Suppose,
for the purpose of exposition, that T(ense) is introduced in SS; then the
surface structure in (5b) will represent the Case structure in (5a).
(5) a. CS: [amalgamateV [his holdings]NPacc]VP
    b. SS: [amalgamate[T>V] [his holdings]NPacc][T>V]P
T in SS clearly has no correspondent in CS.
T is not an independent node in the surface structure or any other
structure; rather, it is a feature that has been applied to the projection of
V. (Shortly I will explain how such structures arise and what expressions
of the form [x > y] mean.) Functional embedding can also introduce a
lexical head, like complement embedding. In this version, auxiliaries re-
alize functional elements.
(6) a. CS: [amalgamate [his holdings]NPacc]
    b. SS: [willT [amalgamate [his holdings]NPacc]VP]TP
Complement embedding, on the other hand, has no analogue of (5b); it
is always done by explicit subordination to an overt head. That is to say,
the main verb is never realized as an a‰x on its direct object. This is be-
cause the construct consisting of the a‰x plus the direct object would
have to have a label, and that label would violate the second axiom in the
formalism I will provide shortly.
Important for the present discussion is that in neither style of func-
tional embedding (feature or full word) does a shadow of the embedding
element (T in (5), will in (6)) appear in TS.
In this section I want to develop the rationale for the distinction be-
tween complement embedding (4) and functional embedding (5). It is
central to the way in which RT and minimalist practice di¤er, and the
distinctive consequences of RT stem from it. The discussion culminates in
an axiomatization of X-bar theory.
Although complement and functional embedding differ in the funda-
mental way just mentioned, they both are compatible with the principle
of Shape Conservation, which holds of the successive members of a deri-
vation regardless of what kind of embedding is involved.
Consider the mapping in (4); it puts in correspondence the elements in
the theta structure and the Case structure, and also their relations, in the
following sense. First, the TS ‘‘verb’’ amalgamate and the CS ‘‘verb’’
amalgamate are in what we might call ‘‘lexical’’ correspondence; that is,
these are two faces of the same lexical item. A lexical item has tradition-
ally been understood as a collection of different forms of the same thing;
the usual list of the forms includes syntactic form, phonological form, and
semantic form. I would expand that list to include all of the RT levels,
but the idea is the same: a lexical item is the coordination of its con-
tributions to all the levels it participates in. Thus, for lexical items the
representational mapping conserves the relation ‘‘x is a ‘face’ of the lex-
ical item y.’’ Second, the mapping also conserves the complement-of re-
lation: in TS holdings is in a theta relation to amalgamate, and in CS it is
in a Case relation, but since these are the complement relations of the two
respective representations, the correspondence is again conservative. And
third, the head-of relation is conserved: heads are mapped into heads.
We can formalize this conservation somewhat in terms of the notion
commutation. If a relation is preserved, we can say that it commutes with
the representation relation. For example, we will say that the head-of re-
lation commutes with the representation relation, in that the following
relation will always hold:
(7) The head of the representation of X = the representation of the head
of X.
Schematically:
(8) [amalgamatev holdingsnp]vp —head-of→ amalgamate
      ↓ representation               ↓ representation
    [amalgamateV [his holdings]NPacc] —head-of→ amalgamate
Construing representation this way allows a more abstract characteriza-
tion of what is conserved than simply geometrical congruence, though
geometrical congruences will certainly be entailed by it.
I have spoken sometimes of a subpart of a structure at one level as the
‘‘correspondent’’ or ‘‘shadow’’ or ‘‘image’’ of a subpart of a structure at a
different level. The shape-conserving mapping between levels warrants
such locutions. The shape-conserving mapping is defined as a mapping of
one whole level (i.e., set of structures) onto another whole level. As a part
of that process, it derivatively maps individual structures at one level to
individual structures at another level. Further, if it conserves the part-of
relation, then it will map parts of individual structures at one level to
parts of individual structures at the other level. In (8), for example, hold-
ings is a part of the TS :VP amalgamate holdings. That TS :VP is mapped
to an XS:VP amalgamate his holdings; in virtue of that mapping, and its
conservation of the part-of relation, TS :holdings is also mapped to
XS :his holdings. That is, TS :holdings is the TS correspondent, or
shadow, of XS :his holdings under the shape-conserving mapping. If the
mapping were not shape conserving, then it would be impossible to know
what the correspondents of an XS :phrase were. And in fact because the
mapping allows deviations and therefore is not fully shape conserving and
also because new elements enter at each level, problems may well arise in
some cases in determining what is the correspondent of what. But in
general there is a coherent and obvious notion of the earlier correspon-
dents of a phrase.
Complement embedding always involves one lexical freestanding head
embedding a phrase as its complement. But, as already mentioned, func-
tional embedding is often, though not always, signified not by a lexical
item, but simply by a feature on the head, as in (5), where the feature
controls some aspect of the morphology of the head. We might call this
kind of embedding affixal embedding, since its sign is usually an affix
(perhaps silent) on the head. As with complement embedding, we want to
understand how this functional embedding relation conforms to Shape
Conservation. I will consider two ways of making sense of this kind of
embedding.
In standard minimalist practice, stemming from Travis 1984, affixal
embedding is accomplished by head-to-head movement, wherein the main
verb is generated in a phrase subordinate to the affix (or its featural com-
position) and then moves to the affix. The successive movements of the
verb account for the Mirror Principle, since if the movement is always,
for example, left adjunction, the order in which the affixes (now, suffixes)
appear on the verb will correspond to their hierarchical embedding in the
structure that the verb moved through.
(9) [[[[V + af1] + af2] + af3] . . . [. . . tV+af1+af2 . . . [. . . tV+af1 . . .
    [. . . tV]V]F1P]F2P]F3P
    (where afi bears features Fi)
Such a proposal accounts for the Mirror Principle by building morpho-
logical affixation directly into the syntactic derivation in a particular way.
Although such a view still has explicit adherents (see, e.g., Cinque 1998),
most researchers have retreated from this strongly antilexicalist view.
Unfortunately, the retreat has usually involved a weakening of the expla-
nation of Mirror Principle effects.
The account I will present here will divide the problem into two parts:
first, how does X-bar theory regulate the information on phrase labels?
And second, how does morphology realize those labels when they occur
on terminal nodes? Despite being more complicated than the hybrid
Cinque-style theory in having two separate components, phrasal and
morphological, it succeeds in capturing the full Mirror Principle effects,
because it involves nothing but X-bar theory and direct morphological
realization. In other words, there is nothing more relating the two—no
readjustment rules, no movements of any kind, and therefore no locality
conditions of any kind, just the calculus itself. The practical problem with
locality conditions and readjustment rules is that they lead to ‘‘instantly
revisable’’ theories. It therefore seems to me that the strongest possible
theory lies down the path that begins by separating phrasal syntax and
lexical syntax from one another.
In the place of head-to-head movement for phrasal syntax, I propose
that a feature can directly take a phrase as a functional complement, and
when it does, the feature is realized on the head of the phrase by the in-
teraction of Shape Conservation and the definition of head.
If we want to embed a phrase under a full lexical item H, the most ob-
vious and simplest way to do so is by concatenation: that is, concatenate
the head and the phrase, and name the resulting phrase after the head.
(10) Lexical Headþ Some Phrase ¼ [Lexical Head� Some Phrase]LHPBut suppose that instead we want to subordinate a phrase to a feature, as
in (11). The simplest way is to add the feature to the feature complex of
the phrase itself. Then the feature will ‘‘percolate’’ to the head. The per-
colation is in fact forced by representation—in particular, by the com-
mutation of representation and the head-of relation.
(11) [diagram not reproduced: the feature is added to the label of the
phrase it subordinates and percolates down to the head]
If the feature did not percolate (downward) in SS, then the SS:VP would
not count as having a head (since its feature composition would be dif-
ferent from that of its V), and this would break the commutation dia-
gram. Note that the representation relation is not symmetric; the surface
structure has ‘‘more information’’ than the Case structure. This reflects
the general asymmetry of the representation relation already discussed
and does not alter the conclusion about percolation.
The notation X > Y used in (11) and elsewhere indicates the
complement-of relation; it means ‘X takes Y as a complement’. For ex-
ample, T > V is what results from adding T to the featural complex I
have abbreviated by V. In other words, the label itself is structured by
the complement-of relation. This is meant as an alternative to the usual
notion that a node is a set of features, with no order or relation among
them. In the account I am proposing here, nodes are features in ‘‘comple-
ment chains’’ of the kind that can be symbolized as A > B > C > D > E.
(See chapter 8 for more on this, and for some theorems about the latent
descriptive power of this notation.)
The notation gives structure to the set of features. That structure makes
possible a simple axiomatization of X-bar theory, at least insofar as
X-bar theory concerns the well-formedness of phrase labels in trees—
instantiating in particular the head-of relation and a feature percolation
mechanism.
Below are two trees that will fall under the axiomatization. (12a) is an
example of a simple clause with a single main verb. (12b) is an example of
a clause with an auxiliary verb and a main verb. The axioms to be dis-
cussed will be illustrated with respect to (12a).
(12) [tree diagrams not reproduced: (a) a simple clause with a single
main verb; (b) a clause with an auxiliary verb and a main verb]
(12a) illustrates two properties we want the system to have. First, each
node is structured with respect to the complement-of relation. Second,
there are only three relations that can hold between a mother node and a
daughter node:
(13) Axiom 1 (The juncture types of X-bar theory)
     There are just three juncture types:
     a. mother node = X > daughter node (embedding)
     b. daughter node = mother node > X (satisfaction)
     c. mother node = daughter node (adjunction)
Case (13a) licenses embedding the daughter node under X at the mother
node. For example, in tree (12a) T embeds AgrS at the top. Case (13b)
licenses the ‘‘satisfaction’’ of features under agreement; for example, in
tree (12a) AgrO is discharged or ‘‘checked’’ by the direct object, as illus-
trated by the relation of [T > AgrS] to its daughter [T > AgrS > AgrO].
Finally, case (13c) licenses adjunction structures, where mother and
daughter nodes are identical; this is illustrated in tree (12a) by the two
Adv nodes.
On this account agreement is strictly local. The AgrS feature percolates
as far as it likes, except of course that it can be ‘‘checked’’ only by a fea-
ture, and only when it is peripheral in the label, and it must be checked by
a sister to the label.
I think Axiom 1 is in fact X-bar theory itself. It tells what form suc-
cession of heads must take, thereby defining the notion ‘‘head’’; and at
the same time it defines the permissible percolations of features. But the
structures found in natural language are also defined by that previously
discussed mysterious condition on functional sequences, which I will call
the Pollock-Cinque functional hierarchy (PCFH).
(14) Axiom 2 (PCFH)
There is a universal set of elements (T, AgrO, AgrS, . . . , V) that are
in a fixed chain of complement-of relations:
(T > AgrS > . . . > AgrO > V)
Labels must be subsequences of this hierarchy.
Labels in trees must conform to both Axiom 1 and Axiom 2. The
structures admitted by Axiom 1 filtered by Axiom 2 turn out to be just
the right structures; that is, they turn out to be structures like (12a), and
most other possibilities are left out.
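To make the filtering concrete, here is an informal rendering (my own sketch, not the text's formalism) of the two axioms as checks on labels, with labels written as lists ordered by the ‘‘>’’ relation and an abbreviated hierarchy standing in for the full one:

```python
# Illustrative sketch of Axiom 1's three juncture types and Axiom 2's
# subsequence requirement; all names here are invented for the example.
HIER = ["T", "AgrS", "AgrO", "V"]   # abbreviated functional hierarchy

def juncture(mother, daughter):
    """Classify a mother/daughter label pair per (13), or return None."""
    if mother == daughter:
        return "adjunction"          # (13c): label unchanged
    if mother[1:] == daughter:
        return "embedding"           # (13a): mother = X > daughter
    if daughter[:-1] == mother:
        return "satisfaction"        # (13b): daughter = mother > X
    return None

def obeys_axiom2(label):
    """Axiom 2: the label must be a contiguous stretch of the hierarchy."""
    n = len(label)
    return n > 0 and any(HIER[i:i + n] == label
                         for i in range(len(HIER) - n + 1))

# T embedding AgrS at the top, and AgrO discharged by the direct object,
# as described for tree (12a):
juncture(["T", "AgrS"], ["AgrS"])                # "embedding"
juncture(["T", "AgrS"], ["T", "AgrS", "AgrO"])   # "satisfaction"
```

Because (13a) and (13b) change a label only at its ends, any tree built from junctures that pass both checks keeps every label a subsequence of the hierarchy, which is the sense in which the two axioms jointly admit structures like (12a) and little else.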
Axiom 2 guarantees that a lexical item, or in fact any element, whether
simple or derived, must lexicalize a subsequence of functional structure.
The ‘‘operations’’ of embedding and satisfaction defined in Axiom 1 pre-
serve the subsequence property: they add to labels and remove from them
only at the ends, so that going up the projection, the labels on the heads
will vary smoothly from V to C, never departing from subsequencehood.
Axiom 1 in RT accomplishes much of what is done by (covert) verb
movement (or head-to-head movement) in standard minimalist practice.
As mentioned earlier, head-to-head movement captures at least part of
the mirror relation that exists between word structure and phrase struc-
ture. Axiom 1 accomplishes the same work by limiting successions be-
tween hierarchically adjacent nodes to pairs that differ only at the top
(13a) or the bottom (13b) of the label; this has the provable effect that the
label on the very lowest node bears a mirror relation to the succession of
phrases that dominates it. The fact that every pair of labels must meet the
stringent conditions of (13) is comparable to the restriction, in minimalist
practice, that head-to-head movement is extremely local. The admissibil-
ity of adjunct junctures (13c), which leave the label unchanged, corre-
sponds, in minimalist practice, to the feature of Relativized Minimality
that makes certain adjuncts invisible to head-to-head movement.
Axioms 1 and 2 account for what the syntactic structures can look like,
but do not say how labels are spelled out. The spell-out of labels is the
topic of chapter 8, where the morphological interpretation of ‘‘>’’ is
taken up.
The role of X-bar theory in RT, then, can be summarized as follows.
There is a relation ‘‘head-of’’ that holds in all levels and participates in
the correspondences between the levels under Shape Conservation. There
are two kinds of embedding: functional embedding and complement
embedding. Functional embedding occurs between levels, because some
elements (e.g., T) are simply not defined for levels earlier than PS in the
version of RT given in chapter 4, for example. A functional element may
be introduced as a free lexical item or as a feature. In either case it sub-
ordinates another phrase; if it is a feature, it is added to the label of the
phrase it subordinates, and that added feature propagates down to the
head to preserve the head-of relations that the structure enters into.
With this more specific understanding of how X-bar theory operates in
the architecture of RT, I would like to return to the discussion in chapter
3 in which I suggested that the embedding of different kinds of Ss occurs
at different levels, and that locality and clause union effects of the array of
small clause embedding types can be made to follow from that arrange-
ment. The specific problem I want to address is that I risk inconsistency
with that earlier conclusion if I now say that NPs have shadows in TS,
but Ss (at least some—for example, tensed Ss) do not. Furthermore, if Ss
do have shadows in early structure comparable to the ones that NPs
have, then the LRT correlations laid out in chapters 3–5 are jeopardized
in a way I will explain shortly.
The main reason for saying that the head of an NP appears as a
shadow in TS is for selection, and we do find tight selection between the
verb and the head noun of the direct object. On the other hand, there is
no selection whatever between the matrix verb and the verb of a that
clause. This is ordinarily understood as resulting from the fact that that is
the head of the clause, and the matrix verb selects the head. Although this
answers the point about clauses, it raises a problem for the DP theory of
NPs (e.g., Abney 1987): if D is really the head of the direct object, then it
is hard to see why there is selection between V and the N beneath D.
In fact, the difference between NP and S is even more extreme: the
main verb does not even select the tense, or for that matter the finiteness,
of the embedded complement. Grimshaw (1978) made this point clearly
when she showed that when a verb selects wh, it cannot select even the T
value of the IP beneath wh (much less its main verb); consequently, any
verb that selects wh automatically and inevitably takes both finite and
infinitive complements.
(15) I know why {the bird sings / to sing}.
This means that the apparent selection for T shown by most verbs must
be mediated.
(16) I know that {he left / *to leave}.
That is, know selects that and that selects [+finite]. Some difficulties arise
on this view; for example, some predicates seem to be able to determine
the subjunctivity of that clauses.
(17) a. It is important that he be here.
b. *It is known that he be here.
In addition, sequence-of-tense phenomena, although not involving selec-
tion by the main verb, do show that that is not absolutely opaque, since
they link main and embedded T specifications. Despite these problems I
will assume that Grimshaw’s conclusion is essentially correct, and that
verbs select only for C.
It is a lexical fact about the complementizer that in English that it, un-
like wh, is restricted to finite complements. Overt complementizers in
other languages are not so restricted, taking both finite and infinitival
complements; and in fact English whether does so as well.
(18) I don’t know whether {to go / he went}.
In RT the difference in the behavior of NPs and Ss with respect to
selection will follow from the fact that NP will have its head N as its
shadow in TS, whereas a that clause will have only that, the selected head,
as its shadow in TS. This is exactly what we would expect if that were the
head of the that clause and N were the head of the NP. On this view what
is special about that is that its complement (TP) is not defined until SS,
because T itself is defined only at SS.
This conclusion gives up the DP hypothesis of NPs, but for a good
reason: the obvious difference in selection between NPs and Ss. Propo-
nents of the DP hypothesis (Abney (1987), and others) have taken pains
to develop mechanisms and definitions that permit selection between the
matrix verb and the N head of NP (inside DP), but not in a way that
draws any distinction between NP and S. As a result, the hypothesis sug-
gests that the same selection will be found with Ss; but it is not—selection
by the main verb stops with that for CP clauses. Moreover, while that
is selected by verbs, as suggested by the CP hypothesis for clauses, the D
of a DP is never selected by verbs; if a verb takes a DP at all, then it
takes the full range of determiners, with some completely explainable
exceptions.
In RT, then, there is a fundamental difference between the embedding
of NPs and the embedding of Ss: an NP complement is embedded in
TS, and in all subsequent levels; S embedding, on the other hand, is
distributed across the RT levels depending on what kind of S is being
embedded. Baker (1996) offers some evidence for treating NP and S
embedding in sharply di¤erent ways. He shows that in polysynthetic lan-
guages NP arguments do not occupy theta or Case-licensing positions;
rather, what appear to be the expression of NP arguments are actually
adjuncts. The arguments involve standard binding-theoretic tests for
constituency. S complements, on the other hand, are embedded as argu-
ments exactly as they are in English; again, standard binding-theoretic
arguments involving c-command lead inevitably to this conclusion.
Actually, there is a version of RT that o¤ers the possibility of having it
both ways. In this version NPs could be N-headed in TS and D-headed in
SS, and clauses would be that-headed in both levels, thereby preserving
their di¤erent selectional behavior. I will not pursue this possibility here,
because it threatens to undermine the LRT correlations of chapter 3, in
the following way. If correspondence across levels does not respect cate-
gories, as the NP‘DP correspondence would not, then possibilities
arise that defeat the LRT correlations. Suppose, for example, that an IP
is embedded beneath V at an early level (somewhere before SS—say, PS)
for ECM, raising, and obligatory control constructions, as suggested in
chapter 3. Various clause union effects that depend on the absence of CP
structure (e.g., obligatory control) could take place there; then the IP
could ‘‘grow’’ a CP through correspondence with an SS structure; in
other words, an SS :CP would be put in correspondence with the PS : IP.
The SS :SpecC could then be the target of wh movement. We would then
have derived obligatory control across a filled SpecC, exactly contrary to
the prediction outlined in chapter 3. The following illustrates the deriva-
tion just described, with (19a) → (19b):
(19) a. PS: NPi [V [NPi . . . ]IP] (obligatory control established)
b. SS: NPi [V [wh [NPi . . . ]IP]CP] (obligatory control preserved, wh
movement)
The straightforward way to avoid this defeat of the LRT correlations is
to prevent correspondence under Shape Conservation where the cate-
gories are not homogeneous: [[ . . . ]IP . . . ]CP cannot be a representation of
[ . . . ]IP. The only ‘‘growth’’ that is allowed is growth that preserves the
category, essentially adjunction. This would be a feature of the Shape
Conservation algorithm, which unfortunately is still under development.
But if this feature survives further investigation, then NP cannot become
DP under shape-conserving ‘‘correspondence.’’ We could still maintain
that a TS :NP could be embedded in SS under a D. However, since it
would not have had any previous communication with the V that the DP
is embedded under, we would need, as Abney did, to make D transparent
to selection so that V could directly see NP beneath D in the structure.
(20) [V [ [ ]NP]DP]
7.3 Relativized Minimality
One must pause soberly before putting aside one of the most fruitful ideas
of modern linguistics, but if the theory I have developed thus far is taken
seriously, Relativized Minimality must be seen as a pseudogeneralization.
Although there is some correspondence between head movement and
the mechanisms proposed here, a close examination of the context in
which head movement operates reveals decisive differences. The real
content of a theory with head-to-head movement lies in the constraints
limiting the movement, as otherwise the theory says, ‘‘Anything can move
anywhere.’’ The best candidate for the theory of the bound on head
movement is the Head Movement Constraint (Travis 1984), and in par-
ticular, the generalization called Relativized Minimality (Culicover and
Wilkins 1984; Rizzi 1990).
The main problem with Relativized Minimality in the context of RT
was stated earlier in this chapter. The generalization of the A/A distinc-
tion to the A/A/A . . . distinction and the rationalization of the properties
of each type in terms of its association with a level under the LEC leaves
no room for heads, as heads themselves have no privileged relation to
any of the levels, occurring in all of them. But if head movement is not
covered under anything like Relativized Minimality, then some other
account of the localities it exhibits must be sought.
Other considerations point in the same direction. For the A/A/A . . .
series, Relativized Minimality is weak compared with the locality that
follows from the RT architecture in that it permits rule interactions that
cannot arise in RT. For example, Relativized Minimality permits head
movement over SpecCP, to the matrix verb.
(21) a. [V+C [wh tC IP]]
     b. *I wonder-that [who tthat Bill saw t]
The reason is that according to Relativized Minimality, different systems
—head, A, Ā—do not interfere with each other; they only self-interfere,
in the sense that a movement of type X will be bounded only by occur-
rences of targets of type X. But (21) is not possible in RT with the LEC.
It remains to find out whether languages instantiate the type of structure
illustrated in (21), but the prediction is clear. Likewise for cases in which
an A movement bridges an Ā specifier—the latter includes what has been
called superraising, as discussed in section 3.1, and is again not possible in
RT on principled grounds.
Another difference between head movement governed by Relativized
Minimality and the account of inflection and agreement suggested here lies
in the different ways that one can ‘‘cheat’’ in the two theories. I say ‘‘cheat
in,’’ but I should probably say ‘‘extend’’: different theories allow different
sorts of ‘‘natural’’ extension, and I think a theory should be evaluated on
the basis of whether its natural extensions would be welcome or not.
There are two obvious ways to cheat with head movement in Rela-
tivized Minimality, and in fact both have been exploited, or should I say,
explored. One is to extend the number of self-interfering systems, to four
(or more); in the limit, the theory reduces to the null theory, as in the
limit every element belongs to a different category from every other ele-
ment, and so nothing interferes with anything. The other obvious way to
cheat is to sidestep the locality condition by chaining a number of little
moves together into one large move; in the context of head movement
this is called excorporation. Both are standard.
For example, in Serbo-Croatian we find the following evidence of verb
clustering:
(22) a. Zaspali  bejahu.
        slept.prt aux.3pl
        ‘They had slept.’
     b. *Zaspali [Marko i Petar] bejahu.
     c. [Marko i Petar] bejahu zaspali.
     (Konapasky 2002, 233)
The auxiliary and the participle cannot be separated, suggesting that they
form a tight cluster, one naturally seen as arising from head movement of
the lexical verb to the participle. But when there are two participles (byl
and koupil in the following related Czech examples), the second seems
able to hop over the first.
(23) a. Tehdy bych    byl     koupil     knihy.
        then  aux.1sg was.prt bought.prt books.acc
        ‘Then I would have bought books.’
     b. Byl bych tbyl koupil knihy.
     c. *Koupil bych byl tkoupil knihy.
     (Konapasky 2002, 233)
This appears to be a case of ‘‘long head movement.’’ There are two ways
to extend Relativized Minimality to accommodate this phenomenon.
First, we might increase the number of self-interfering categories, the
course taken by Rivero (1991), who proposes to account for the pattern in
(23) by saying that bych is functional, while byl and koupil are lexical: in
(23b) lexical grammatically hops over functional, whereas in (23c) lexical
ungrammatically hops over lexical. Second, we might say that clustering
takes place in the usual way, but then excorporation accounts for the
possibility of (23b) (and details about how it operates account for (23c));
this is the course taken by Boskovic (1999). See Konapasky 2002 for
summary discussion and critique.
In the theory presented here, where there is no head movement, and
no Relativized Minimality, these extensions are not available. The inad-
missibility of excorporation will follow from theorems about reassocia-
tion given in chapter 8. And separating heads into two types will make no
di¤erence to the system under discussion here, so long as both types are
part of the calculus of complement taking.
But in fact the X-bar theory proposed here has its own way to accom-
modate such facts. I will postpone my own analysis of the Serbo-Croatian
paradigm until chapter 8, where I claim that verb clustering follows from
a narrow theory of label spell-out. For the time being I simply want to
emphasize that the types of solutions or extensions available to RT are
very different, lacking as it does head movement and its governing theory
of locality, Relativized Minimality.
7.4 Clause Structure and Head-to-Head Movement
The sort of X-bar theory sketched in the previous section will permit a
full account of head-to-head movement effects without movement, local-
ity conditions, or readjustment rules.
7.4.1 Reassociation and Case-Preposition Duality
There is a functional equivalence between Case marking and preposi-
tions, long recognized but not formalized; for example, there is some
equivalence between to NP and NPdat. This is not to say that the two are
interchangeable, just that there seem to be two ways to ‘‘mark’’ an NP, or
two ways to ‘‘embed’’ an NP under a Case/preposition (see Williams
1994b). Suppose that P stands for some Case/preposition; suppose further
that the relation between the Case marking/preposition and the NP it is
attached to is one of embedding. I will leave open whether it is functional
embedding or complement embedding (it can probably be either, de-
pending on the preposition), and I will use the symbol ‘‘>’’ already
introduced to indicate the embedding relation. Then the equivalence we
are discussing is this:
(24) [P > NP]P ≅ [[ . . . [P > N] . . . ]P>N]P
On the left P governs the full NP and projects a P node; on the right
P > N is realized on the head noun (as, for example, [dat > N], which is
more traditionally notated as Ndat). In both cases P subordinates N. I will
call this relation Case-preposition (C-P) duality, even though it will turn
out to be a broader relation.
In what way are these two structures equivalent? And in what way are
they different? They are obviously not identical, in that in a given con-
struction in a given language with a given meaning, only one of them can
be used; so English has only to boys, whereas Latin has only pueris.
Nevertheless, the two expressions are alike in two ways: first, the relation
between P/dat and the N/NP is approximately the same in the two cases,
and second, the distribution of P > NP and [P > N]P is approximately
the same (i.e., they fulfill the same function, that of expressing the dative
argument of a verb).
But what is the basis of their equivalence? Why should they be alike
at all? We might regard the two representations in (24) as mutually
derivable by abstraction/conversion (‘‘⇒’’ signals abstraction, and ‘‘⇐’’
conversion).
(25) [ . . . [X > Y] . . . ]X ⇔ [X > . . . [Y] . . . ]
I will aim to derive the equivalence from the X-bar theory developed here
without any independent operations, but I will nevertheless refer to these
operations in exposition.
Specifically, I will explore the possibility that C-P duality is nothing
other than the relation called reassociation in chapter 8. There, reassocia-
tion is shown to be a property of the complement-of relation in the mor-
phology of the functional system, so that, for example, if the left-hand
side is a valid expression, then so is the right-hand side, and vice versa.
(26) [[X > Y] > Z] ⇔ [X > [Y > Z]]
The relation accounts for, among other things, a kind of ‘‘clumping’’ in
how functional elements are realized morphologically. Given that func-
tional elements are strictly ordered (T > AgrS > AgrO > V), one would
expect only right- (or perhaps left-) linear structures to realize the in-
flected verb; however, morphological structures like these are found as
realizations of this order:
(27) Swahili inflected verb
     [a-li]     [ki-soma]
     AgrS-past  AgrO-V
     [AgrS > T] > [AgrO > V] ⇔ [AgrS > T > AgrO > V]
     (Barrett-Keach 1986, 559)
The structure is a symmetrical binary tree, rather than a right-branching
one (see chapter 8 for details; see also Barrett-Keach 1986). Why can the
symmetrical structure on the left realize the linear chain of functional
elements on the right? The bottom line of (27) shows that the actual
structure of the inflected verb enters into a ‘‘C-P-like’’ duality relation
with the functional structure it is supposed to represent. See chapter 8 for
further discussion.
Reassociation is the relation illustrated here:
(28) [A > B] > C ⇔ A > [B > C]
C-P duality of the sort instantiated in (25) can be viewed as a straight-
forward case of reassociation if we can appeal to a null element 0 to serve
as the third term in the reassociation.
(29) [[0 > X] > Y] ⇔ [0 > [X > Y]]
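The content of Reassociate can be checked mechanically: however a chain of elements is bracketed, the ordered sequence it realizes is unchanged. The following Python sketch is my own illustration, not part of the book's formalism; the function names and the encoding of ‘‘>’’ structures as nested pairs are assumptions made purely for exposition.

```python
# Toy sketch (my illustration, not the book's formalism): Reassociate as a
# rewrite on nested pairs. A structure is an atom (a string) or a pair
# (left, right) standing for [left > right].

def reassociate_right(t):
    """[[A > B] > C] => [A > [B > C]], when the shape matches."""
    if isinstance(t, tuple) and isinstance(t[0], tuple):
        (a, b), c = t
        return (a, (b, c))
    return None

def reassociate_left(t):
    """[A > [B > C]] => [[A > B] > C], when the shape matches."""
    if isinstance(t, tuple) and isinstance(t[1], tuple):
        a, (b, c) = t
        return ((a, b), c)
    return None

def flatten(t):
    """The ordered chain of elements a structure realizes."""
    if isinstance(t, tuple):
        return flatten(t[0]) + flatten(t[1])
    return [t]

# (26): both bracketings realize the same ordered chain X > Y > Z.
left, right = (("X", "Y"), "Z"), ("X", ("Y", "Z"))
assert reassociate_right(left) == right and reassociate_left(right) == left
assert flatten(left) == flatten(right) == ["X", "Y", "Z"]

# (29): the same equivalence with a null element 0 as one of the terms.
assert flatten((("0", "X"), "Y")) == flatten(("0", ("X", "Y")))
```

The invariant checked by `flatten` is the same one that licenses the ‘‘clumped’’ realizations of a strictly ordered functional sequence in (27).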
But the extension to 0 suggests an even more exotic possibility. In (25)
X is abstracted out, leaving behind Y; but suppose even more were
abstracted out, namely, X > Y.
(30) a. [[ . . . [X > Y] . . . ]X>Y]X ⇒ [[X > Y] > [ . . . [0] . . . ]X>Y]X
     b. [0 > [X > Y > 0]] ⇒ [[0 > X] > [Y > 0]] ⇒ [[0 > X > Y] > 0]
In terms of reassociation, (30a) is simply a double application of the op-
eration Reassociate, as indicated in (30b).
For X = P and Y = N we would then have:
(31) [[ . . . [P > N] . . . ]P>N]P ⇒ [[P > N] > [ . . . [0] . . . ]P>N]P
This essentially evacuates the head position of the complement entirely—
not just the P/Case, but the N as well. This suggests that the head of the
noun could be realized on the preposition itself. And this possibility arises
purely through X-bar theory, with no further mechanisms, so long as the
theory includes C-P duality, as it arises if reassociation holds of X-bar
syntax. Such cases resemble the ‘‘inflected prepositions’’ found in Breton,
where an agreement mark on a preposition precludes overt expression of
its direct object (Anderson 1982).
Examples like (31) will be well formed only if the label [P > N] satisfies
Axiom 2; that is, P must be higher than N on the relevant functional
hierarchy. There are perhaps two kinds of prepositions: one ‘‘functional’’
and transparent, for which Axiom 2 would be satisfied; and another that
takes ‘‘true’’ complements, for which it would not be satisfied (see Wil-
liams 1994b for extended discussion). Structures that instantiate the right-
hand side of (30) might be the coalescences of P and pronoun or article
found in some languages.
(32) a. zu dem     ⇒ zum (German)
        to the.dat
     b. à  le  chien ⇒ au chien (French)
        to the dog
Taking (32b), and assuming the DP hypothesis, we have:
(33) P > [D > NP]DP ⇒ [P > D] [0 > NP]NP
In this way, head-to-head movement, in its instantiation as a kind of
‘‘incorporation,’’ is realized directly by X-bar theory.
Used in this way, Reassociate bears an obvious relation to covert verb
movement. However, of course it is not movement, and it need not be
bounded by any extrinsic locality conditions; rather, it is localized by the
X-bar formalism itself.
In the remainder of this section I will explore the possibility that C-P
duality, as an instantiated application of Reassociate in syntax, is the ap-
propriate syntax for overt verb movement as well. Given the conclusions
of the last two sections, this is an almost obligatory step to take. In sec-
tion 7.3 I suggested problems with Relativized Minimality as an account
of head-to-head relations, partly because in section 7.2 I developed an
alternative account of inflection in syntax. But since in the standard ac-
count of clause structure Relativized Minimality is the principle govern-
ing the locality of overt head movement, something must be developed in
its stead if it is to be eliminated on general grounds.
Verb-second (or subject-aux inversion (SAI) in English) can be seen as
arising from C-P duality in the following way. A declarative clause is a
tensed entity, where the tense is realized on the head of VP (also the head
of NP, if nominative is simply the realization of tense on N, as suggested
in Williams 1994b).
(34) [NP VPT]T
By C-P duality, this is the same as (35).
(35) [T [NP VP]]
(35) itself is not instantiated, because there are no lexical items that purely
instantiate T (unless do is one). But if V in the tensed clause is represented
as T > V, then the indicative clause instead looks like (36),
(36) [NP [T > V]P]T
which, by (radical) C-P duality, is the same as (37).
(37) [[T > V] [NP 0P]]T ⇔ [NP [T > V]P]T
Thus, auxiliary inversion structures (the left-hand side of (37)) arise from
uninverted structures through C-P duality. Duality captures the most es-
sential properties of inversion: it is local (only the top tensed verb can
move, and only within a single clausal structure), and it is to the left (like
P). Duality does not capture the fact that inversion is restricted to modal
verbs in English, but not in German, a point to which I will return.
The restriction to movement within a single clausal structure follows
from Axiom 2, which says that all labels must be substrings of the PCFH,
just as [P > D] in (33) is. T > V either is, or abbreviates, one such sub-
string, but to move into a higher clause it would require a label like
[T > · · · C · · · > T · · · ], which violates the PCFH. The reason
‘‘movement’’ appears to displace items to the left follows from the fact
that T (or for that matter V) takes its complements to the right.
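The locality just described amounts to a simple membership test on labels. The sketch below is my own illustration, not the text's: the particular hierarchy C > T > AgrS > AgrO > V and the reading of ‘‘substring’’ as a contiguous stretch are assumptions made for concreteness.

```python
# Toy sketch (mine, not from the text): Axiom 2 as a contiguity test.
# The hierarchy below and the reading of "substring" as a contiguous
# stretch are illustrative assumptions.

PCFH = ["C", "T", "AgrS", "AgrO", "V"]  # hypothetical clause-level hierarchy

def valid_label(label, hierarchy=PCFH):
    """A label is valid iff it is a contiguous substring of the hierarchy."""
    n = len(label)
    return any(hierarchy[i:i + n] == label
               for i in range(len(hierarchy) - n + 1))

# T > V abbreviates the substring T > AgrS > AgrO > V, which is licit:
assert valid_label(["T", "AgrS", "AgrO", "V"])

# A label that climbs back up through C, as "movement" into a higher
# clause would require, is never a substring of the hierarchy:
assert not valid_label(["T", "C", "T"])
```

On this picture no extrinsic locality condition is needed: displacement beyond a single clausal structure is ill-formed because its label cannot be stated.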
Thus far I have assumed that the subject is an adjunct or specifier of
the VP and hence does not participate in the duality. In fact, though, the
subject is treated somewhat di¤erently in the two constructions related by
the duality. In the V-initial structure, the subject is treated more as a
direct object than as a subject, in that adverbs cannot intervene between
it and [T > V].
(38) a. *Did recently John leave?
b. John recently did leave.
c. *John saw recently Bill.
In this, the fronted auxiliary is playing the same role that the preposition
plays in certain absolutive constructions.
(39) a. With John recently departed, . . .
b. *With recently John departed, . . .
In fact, the absolutive construction is a good model for SAI, and it
emphasizes the notion that SAI arises from C-P duality. Some absolutive
constructions—for example, the Latin ablative absolute—even use Case
instead of P, thus confirming the connection.
(40) Caesare    vivo, . . .
     Caesar.abl living.abl
     ‘With Caesar living, . . .’
That is, the Latin absolutive bears the same relation to the English
absolutive that an uninverted clause bears to an inverted clause, the rela-
tion of C-P duality. Another model is the consider construction (consider
Bill foolish) and similar small clause constructions, in which the V relates
to the NP as it would to a direct object.
In English, we can say that the inverted auxiliary relates to the subject
in its ‘‘derived’’ position; this is because the invertible verbs are all auxil-
iaries, and auxiliaries only take subject arguments. But in other inversion
constructions this is impossible. In German, for example, the class of in-
vertible verbs includes all predicates, and so the verb must govern aspects
of the clause structure that should not be accessible to it from its derived
position. For instance:
(41) Gestern kaufteV Hans das Buch tV.
The derived position of the verb should not permit a theta relation to
the direct object, because of the locality of theta relations. Rather, the
‘‘trace’’ of the verb should be responsible for that relation, as it is in
standard accounts. How can this be done with C-P duality?
C-P duality can give us a kind of trace, if we take the 0 of reassociation
seriously. I have used representations in which XPs have 0 heads without
saying how they are licensed, and they clearly do not occur freely. In the
following structure,
(42) [X > 0P]
[X > [0 . . . ]0P]XP
X ‘‘controls’’ 0 by virtue of governing 0P. It controls it in the sense that
the 0P acts, in its interior, as though X occupied its head position. It
might not be far-fetched to regard the 0 as an ‘‘anaphor,’’ with X as its
antecedent. This in fact makes it just like movement: the 0 head of 0P is
identified with X. But in this case, the antecedence arises from Reasso-
ciate and the complete evacuation of the label on the head when Reasso-
ciate applies in the most radical manner.
Control of a 0 head is also found in gapping.
(43) [John saw Mary][T]P and [Bill 0 Pete][0]P.
In the simplest interpretation the [T] label on the first conjunct serves
as the antecedent of the [0] label on the second conjunct, and by virtue of
C-P duality governs the interior of the second conjunct. This explains the
locality of the construction: it cannot occur in coordinated CPs, because
the antecedence holds only for immediate conjuncts.
(44) *I think [that John saw Mary] and [that Bill 0 Pete].
In (43) more than just T is deleted in the second conjunct; the verb is
also deleted, and the verb is understood as identical to the verb of the first
conjunct. In Williams 1997 I suggested that a 0 head always licenses a 0
complement if the following relation holds:
(45) Antecedent of complement of 0 head = complement of antecedent of 0 head.
That is, antecedence and complementation commute. See Williams 1997
for a derivation of (45) from a more general principle and for a discussion
of its scope and properties.
Returning to SAI, the notion that the 0 head of 0P is anteceded by
whatever governs 0P explains why the absent head nevertheless manages
to govern the internal structure of the 0P that it heads. For example,
fronted auxiliaries are compatible only with whatever complements are
possible when fronting does not take place:
(46) a. Can John [0 swim]0P?
b. *Can John [0 swimming]0P?
c. John can swim.
d. *John can swimming.
Questions about the type of complement of the 0 head are deferred to its
antecedent, the fronted auxiliary.
A. Neeleman (personal communication) suggests that traces might not
in fact be necessary—at least one of the forms in the dual relation will
represent a structure at a previous level in which the relevant licensing
takes place. So, for example, (46a) is the dual of (46c), and (46c) itself or
some structure that it represents licenses the relation between can and the
present participle; in that case the trace in (46a) is not needed for licensing.
7.4.2 Multiple Exponence in Syntax
The mechanism I have called Reassociate also provides a means of ac-
counting for multiple exponence in syntax. Multiple exponence is an em-
barrassment for the theory of labels proposed here; it shouldn’t exist. The
reason is, given a functional hierarchy F1 > · · · > F13 as in (47a), where
each Fi is subcategorized for Fi+1, applying Reassociate will derive ob-
jects like those in (47b), but not like those in (47c) or (47d). (M marks
subunits that correspond to morphemes.)
(47) a. F1 > F2 > F3 > F4 > F5 > F6 > F7 > F8 > F9 > F10 > F11 > F12 > F13
     b. [F1 > F2 > F3]M > [F4 > F5 > F6 > F7 > F8 > F9 > F10 > F11 > F12 > F13]M
        [F1 > F2]M > [F3 > F4]M > [F5 > F6 > F7 > F8]M > [F9 > F10]M > [F11 > F12 > F13]M
        [F1 > F2 > F3 > F4 > F5 > F6 > F7 > F8 > F9]M > [F10 > F11 > F12 > F13]M
     c. F10 > F2 > F3 > F6 > F5 > F13 > F8 > F8 > F9 > F1 > F11 > F2 > F13
     d. [F1 > F2 > F3 > F4 > F5 > F6]M > [F6 > F7 > F8 > F9 > F10 > F11 > F12 > F13]M
(47c) is simply a random assortment of the original set of features, of
course inadmissible. But cases like (47d), described with the term multiple
exponence, seem to occur rather frequently. An example that will be dis-
cussed more thoroughly in section 8.3 is this one from Georgian:
(48) g-xedav-s
2sg.obj-see-3sg.subj
The problem is that both the prefix and the suffix are sensitive to both
subject and object agreement features, and hence in some sense realize
them; but then, no matter what the feature hierarchy is, there is no way to
segment it into morphemes by Reassociate.
Suppose that the feature hierarchy here is (49a) (where S# stands for
subject number agreement, Sp for subject person agreement, etc.). Then
an acceptable segmentation would be (49b), but what is found, appar-
ently, is (49c).
(49) a. S# > Sp > O# > Op > V
b. [[O# > Op] > V] < [S# > Sp]
c. [[(S# > Sp >) O# > Op] > V] < [S# > Sp (> O# > Op)]
However, the problem would disappear if we were to make ‘‘silent’’ some
of the features in the two morphemes (here, as in chapter 8, parentheses
indicate silent features).
(50) [S# > Sp (> O# > Op)]prefix > [(S# > Sp >) O# > Op]suffix
Now the forms are combinatorially valid. We have drawn a distinction
between what a morpheme is paradigmatically sensitive to, and what it
‘‘expresses’’ insofar as the rules for combining forms are concerned. So
both prefix and suffix can be sensitive to the value of some feature Fi, but
only one of them will ‘‘express’’ it. The theory makes the interesting pre-
diction that multiple exponence will always involve features that are ad-
jacent on the functional hierarchy. See chapter 8 for further discussion.
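The combinatorial point can be put as a small check. The sketch below is my own illustration, using the feature names assumed in (49a); it treats a morpheme as the list of features it expresses, with silent features simply omitted from that list (linearization of the blocks on the verb, prefix versus suffix, is set aside here).

```python
# Toy sketch (my illustration): validity of a morpheme segmentation, with
# feature names from (49a). A morpheme is listed by its *expressed*
# features; silent features are paradigm-sensitive but unexpressed.

HIERARCHY = ["S#", "Sp", "O#", "Op", "V"]

def valid_segmentation(morphemes, hierarchy=HIERARCHY):
    """Valid iff the expressed features, concatenated in hierarchy order,
    reproduce the hierarchy exactly (each feature expressed once)."""
    return [f for m in morphemes for f in m] == hierarchy

# A plain segmentation into contiguous blocks, as in (49b), is fine:
assert valid_segmentation([["S#", "Sp"], ["O#", "Op", "V"]])

# The naive Georgian analysis (49c): both affixes express the subject and
# object features, so some features are expressed twice -- invalid:
assert not valid_segmentation(
    [["S#", "Sp", "O#", "Op"], ["S#", "Sp", "O#", "Op"], ["V"]])

# (50): silence the duplicates on one affix each; each affix can remain
# *sensitive* to all four features, but each feature is *expressed* once:
assert valid_segmentation([["S#", "Sp"], ["O#", "Op"], ["V"]])
```

Because each expressed block must be a contiguous stretch of the hierarchy, the check also embodies the prediction noted above: multiple exponence can only involve hierarchy-adjacent features.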
Multiple exponence is found in phrasal syntax as well. If we accept the
account of head movement–type phenomena that I have suggested, then
my proposed theory of multiple exponence can be imported here, making
the same very specific predictions about the character of multiple expo-
nence in phrasal syntax.
As an example, consider the complementizer agreement phenomena
found in certain dialects of Dutch (Zwart 1997).
(51) datte   wy speult
     that.pl we play.pl
     (East Netherlandic; from Zwart 1997)
As Zwart notes, in some dialects the morphology on the complementizer
differs from the morphology on the verb, while in others the two are
identical; in East Netherlandic they are di¤erent. What the agreeing
dialects have in common is that the complementizer always agrees with
the subject, and it always agrees in addition to (not instead of ) the verb.
Let us suppose that the functional hierarchy is C > T > SA > · · · > V
(SA = subject agreement). Then we can understand the East Netherlandic
(52) datte (pl)       speult
     [C > (T > SA)]   [T > SA > · · · > V]
That is, the T and SA features are silent on the complementizer (they
could as easily have been silent on the verb; see chapter 8 for a discussion
of underdetermination of such analyses). From the point of view of syn-
tax, speult will ‘‘be a’’ T > SA > V, and datte will ‘‘be a’’ C and so can
combine with the tensed V in accord with the functional hierarchy.
7.4.3 The Distribution of Dual Forms
In sum: if X-bar theory is formulated to account for C-P duality, through
an extension of Reassociate to the domain of phrasal syntax, then it will
also account for various cases of absorption, head-to-head movement,
gapping, and so on. Is it a notational variant of head-to-head move-
ment? Putting aside the dismissive tone of the phrase, we can answer,
yes, in some respects. It would be quite surprising if it were not, because
the theory of head-to-head movement now answers to numerous well-
documented findings. The theories converge in many ways—for example,
with respect to locality and traces. Nonetheless, they are quite different in
character, the one consisting entirely of the laws of X-bar theory, and the
other also including movement and the theory of locality that movement
requires.
I remarked at the outset that C-P duality is a possibility not always
realized, in the sense that the two forms it relates do not always both ex-
ist, or if they do, are not always equipotent. But why not? Why do we not
find a given structure existing alongside all its fully grammatical dual
structures?
There may be no single answer to this question. In some cases the
absence of requisite lexical items is the cause; for example, the T in (35)
([T [NP VP]]) cannot be realized because T does not correspond to any
lexical item.
Blocking is the answer in other cases. Consider the à le ⇒ au rule in
French. C-P duality gives two structures:
(53) à  [le  N], au [0 N]N
     to  the
The fact that there is a special lexical item (au) for the right-hand side
may be enough to make the left-hand side ungrammatical, through
blocking, since the left-hand side is what would be expected on more
general grounds, if the item au did not exist.
In still other cases both sides of the duality may be permitted to exist if
there is some di¤erence in meaning. For example, a dative preposition
and a dative Case marking might exist side by side, so long as they dif-
fered in meaning. The English SAI~declarative duality is clearly another
example: the semantic di¤erence is the di¤erence between interrogative
and declarative, or, more accurately, between a range of interpretations
that includes interrogative (also exclamative, conditional, and imperative)
and a range of interpretations that includes declarative. But other dis-
crepancies are unaccounted for. For example, in English only the auxil-
iary verbs participate in SAI. The SAI verbs must be subcategorized to
take the ‘‘absolutive’’ NP VP sequence as their complement, as well as
having their usual VP subcategorization. Only auxiliary verbs as a class
have this possibility in English (whereas in German, for example, any
tensed verb can participate). In English there are telltale discrepancies
suggesting that double subcategorization is in fact the correct way to
characterize the situation.
(54) a. *Amn’t I going?
b. Aren’t I going?
c. *I aren’t going?
Amn’t is not a possible IP verb, and aren’t is an IP verb with agreement
possibilities different from those of its VP counterpart.
Chapter 8
Inflectional Morphology
8.1 The Mirror Principle
The Mirror Principle is the name of an effect, in that it is derived in
theories, not fundamental. Specifically, the e¤ect is that the order of
morphemes on inflected verbs seems to reflect the structure of the syntac-
tic categories that dominate that verb. For example, in a language in
which the verb is marked for both object agreement and subject agree-
ment, subject agreement marking is generally ‘‘outside of ’’ (i.e., farther
from the stem than) object agreement marking; this ordering mirrors the
ordering of subject and object in the clause, where subject is outside of
object. To give another example, admirably detailed by Cinque (1998),
the expression of various kinds of modality by means of affixes on the
verb mirrors the expression of those same kinds of modality when that
expression is achieved by means of adverbs or auxiliary verbs. For
example, in English, ability is expressed by the modal can, whereas in
the Una language of New Guinea, ability is expressed by a verbal suffix
-ti. Through painstaking language comparisons, Cinque shows that if an
auxiliary verb in one language and an affix in another language represent
the same functional element, then they will occur in the same spot on the
functional hierarchy.
So long as functional embedding occurs between levels, we have al-
ready derived the Mirror Principle in RT. As shown in chapter 7, it arises
from the interaction of Shape Conservation with functional embedding.
Recall that there are two kinds of embedding, complement embedding
and functional embedding. A lexical item is concatenated with its com-
plement, whereas a feature is added to the top of the label of its comple-
ment; in both cases the ‘‘>’’ relation is instantiated. Suppose that f is a
feature borne by morpheme a.
(1) a. Complement embedding: [a > B]fP
b. Functional embedding: [ . . . ][f>B]
A familiar example in English is pairs related by do support.
(2) a. [didT leave][T>V]P
b. [left][T>V]P
The following is the example of functional embedding discussed in chap-
ter 7, on the assumption that T marking occurs at SS:
(3) a. CS: [amalgamateV . . . ]VP, head-of: amalgamateV
b. SS: [amalgamate[T>V] . . . ][T>VP], head-of: amalgamate[T>V]
The addition of T to the VP in SS (giving [T > VP]) is the act of embed-
ding under T that is privileged to occur in SS; the ‘‘percolation’’ of T to
the head of VP is necessary to conserve the head-of relation that existed
in CS. What happens with one feature happens with any number of fea-
tures. English is not a good language for illustration; still, supposing that
AgrS > T in the functional hierarchy, then (3) would subsequently un-
dergo further functional embedding.
(4) [amalgamate[T>V] . . . ][T>VP] ⇒ [amalgamate[AgrS>T>V] . . . ][AgrS>T>VP]

In this way, Shape Conservation will derive a mirroring of the syntactic
structure of a phrase on the label of the phrase itself.
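The feature addition and percolation in (3)–(4) can be sketched as a toy computation. The pair encoding (phrase label, head label) and the function name are illustrative assumptions, not the book's formalism:

```python
# A toy sketch of functional embedding under Shape Conservation; a phrase
# is modeled as a pair (phrase label, head label), which is an illustrative
# assumption, not the book's own notation.

def embed(feature, phrase):
    """Add a feature to the top of the phrase's label and percolate it to
    the head's label, conserving the head-of relation."""
    label, head = phrase
    return (feature + '>' + label, feature + '>' + head)

vp = ('VP', 'V')
tp = embed('T', vp)          # ('T>VP', 'T>V'), as in (3b)
agrsp = embed('AgrS', tp)    # ('AgrS>T>VP', 'AgrS>T>V'), as in (4)
```

Iterating `embed` up the functional hierarchy yields exactly the mirroring of the phrase's structure on its label described above.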
Actually, the Mirror Principle also arises in RT in a di¤erent way. If
X-bar theory requires that labels honor the functional hierarchy, and if
that requirement applies equally to labels on phrases and labels on heads,
then, assuming that the set of features is the same for each, this require-
ment will enforce mirror e¤ects. For example, the following case will
never arise:
(5) [ . . . HF1>F2>F3>F4 . . . ]F1>F3>F2>F4
If there is a single functional hierarchy, then it is impossible for both the
head and the label on the phrase to respect it, since they have di¤erent
orders of elements.
But is the Mirror Principle actually true? There are a number of cases,
some of which will be discussed later in this chapter, that seem to argue
against it. There are languages, for example, where subject agreement
marking stands between the verb and object agreement marking, and
there is no reason to think that the syntax of such languages differs from
the syntax of English in the relative ordering of the subject and object
NPs. How can we respond to such cases?
We could abandon the principle, or come to view it as a superficial
tendency that does not deserve deep codification in the theory of gram-
mar. Mirroring is a norm, but not required. One version of this strategy
calls for a separate set of operations whose role is to mediate between the
syntactic and the morphological representations—in other words, a set of
rules to fix the mistakes that arise when they don’t match, so-called re-
adjustment rules. The problem with such rules is that, once they are a
part of a theory at all, there is no stopping them. If, for example, one’s
theory of the syntax-morphology interface includes three (types of ) read-
justment rules, and one encounters a language whose morphosyntactic
interactions lie beyond those rules, it doesn’t hurt much to add a fourth
one. Adding a readjustment rule component to a theory elasticizes it in
such a way that it can respond flexibly to new data. While that might be a
good property of some engineering projects, it is a bad property of a
theory.
Cinque (1998) offers the most rigid (i.e., the best) theory of morpho-
syntax. In this theory all inflectional affixation arises from head-to-head
movement; as a result, if afi bears feature fi, then the following structure
arises and a perfect mirroring results.

(6) [[[[[[V + af1] + af2] + af3]3P t]2P t]1P t]VP

Cinque’s theory is quite rigid in its predictions, and clearly false, as
Cinque himself recognizes. How can it be fixed? One possibility would be
to introduce a readjustment component. Cinque steers clear of this blank
check and instead suggests (but does not work out) a theory of null aux-
iliary verbs and applies it to some obviously troublesome cases. I think
Cinque’s instinct is correct—not to write a blank check, but to develop a
substantive theory—but I would like to suggest a di¤erent solution to the
problem of apparent instances of nonmirroring.
I think the best place to start is with the recognition that syntax and
morphology (i.e., word formation, including inflected-word formation)
have different syntaxes; there are universal differences (syntax includes
XPs, morphology does not), and there are language-particular differences
(English words are head final, English phrases are not; in other lan-
guages, such as Japanese, words and phrases are both head final). But
they have one thing in common: they are both productive systems for,
among other things, representing the functional hierarchy. Crucially, they
represent the same functional hierarchy, but because they are different
systems, they do so differently.
It is of course an empirical fact, or claimed fact, that words and phrases
are different, in this way or in some other way. It is in fact an empirical
imposition that there are words—combinations of morphemes that are
smaller than phrases—at all. On minimalist assumptions it is not clear
that nonmonomorphemic words should even exist; certainly one can
imagine developing a mapping between a sound representation and a
meaning representation that does not have anything corresponding to
morphologically complex words. Artificial logical languages do not have
morphology, for example, nor do programming languages (actually, pro-
gramming languages do have morphology, but as a practice among
programmers, not as a part of official language specification).
But given that there are words, and that words cannot be theoreti-
cally dissolved into phrases, leaving only morphemes or features, the
questions remain, how do words ‘‘work’’ internally, and how do they
interface with syntactic representations? In chapter 7 I gave some idea
about how phrasal syntax represents the functional hierarchy and how
the head of a phrase relates to the head of a word; and I have located the
Mirror Principle in that representation and that relation. But now, how
do words work themselves? That is, how do words represent the func-
tional hierarchy?
The hope is of course that the correct theory of how words work will
eliminate the need for any devices that mediate between syntax and word
structure; then we will have eliminated any need for the Mirror Principle
itself, as it will simply be a property of the architecture of the theory. It
will not be the case that morphology mirrors syntax, or vice versa; rather,
they will both mirror or ‘‘represent’’ the same functional hierarchy, but in
different ways.
I have declared that RT inflectional morphology is lexicalist. But I am
sure I have overstated the matter—I am sure it would be possible to make
a Checking Theory account of verbal morphology consonant with the
rest of RT. As usual, the things we call theories are really much looser
than we think. But I do think that a lexicalist morphology is the best kind
for RT. For one thing, it makes maximal use of the representation rela-
tion to account for inflectional morphology, and for the Mirror Principle
in particular. For another, I think it would indeed be peculiar to have
eliminated NP movement from scrambling, and from the association of
theta roles with Case, but to still have a Checking Theory for verbs.
In what follows I will develop an account of the sublanguage of inflec-
tional morphology as an independent language. I will treat inflected ver-
bal elements and VPs as different languages, both representing the same
set of abstract functional elements, in accordance with the conclusion of
chapter 7.
I will propose a model for the sublanguage of verbal inflection, a for-
mal language that I think is an accurate model of inflectional morphol-
ogy, and I will present some statistical evaluation of its accuracy. I will
also give some idea of how the model can be applied to other aspects of
linguistic structure—in particular, how it models Germanic and Hungar-
ian verb raising and verb projection raising. In general, it seems to be a
promising model wherever ‘‘inheritable’’ lexical specifications play them-
selves out combinatorially without extrinsic inhibition.
8.2 The Language CAT
Let us assume that there is a universal set of elements, as in (7), and
that these elements are in a fixed hierarchical relation to one another, as
indicated.
(7) Universal elements and hierarchy
AgrS > T > (Asp >) AgrO > V
(or T > AgrS > (Asp >) AgrO > V)
The question I want to explore is how such elements can be realized by
lexical items. The elements are not themselves lexical items, but they are
realized by lexical items. For example, -ed in English realizes T (and
maybe other features at the same time), and plead realizes V.
To express the fact that one morpheme is above another in the hierar-
chy in (7), we will endow each element in (7) with a ‘‘subcategorization’’
of the form in (8), adopting the convention that if a morpheme expresses
an element, it inherits its subcategorization.
(8) a. T: AgrO
b. -ed: T, AgrO
T takes an AgrO complement, and -ed, because it realizes T, takes an
AgrO complement. Given a set of lexical elements each of which expresses
one of the elements in (7), we can derive a linear string that contains them
all, if we adopt the X-bar convention that when an element is combined
with another element that it is subcategorized for, the result is of the same
category as the subcategorizing element (the principle of X-Bar Head
Projection).
(9) [morpheme1 [morpheme2 [morpheme3 [morpheme4 [morpheme5]V]AgrO]Asp]T]AgrS
Such an account predicts that the surface order of the morphemes will
mirror the underlying relation of the elements in (7) to one another.
However, in general we find that the surface order of the morphemes of
inflected verbs di¤ers from the order in (7). One way to accommodate
these di¤erent orders is to generate (9) directly and then apply rules
that ‘‘adjust’’ it into another structure. As already mentioned, while this
approach is obviously not incoherent and so is conceivably correct, it
should nevertheless be suppressed because of the following considera-
tions: first, readjustment rules are an inorganic addition to the theory,
and second, their presence undercuts any specific expectations about the
surface order of morphemes. While (9) is not the only order realizing
(7), it is quite obvious that the possible orders realizing (7) are sharply
limited. I have posed the problem of readjustment rules as a problem
for derivation of inflected verbs in the lexicon, but it applies equally to
models that derive the inflected verb in syntax (as in Cinque 1998) or in
‘‘both’’ (as in Chomsky 1995—some sort of derivation in the lexicon,
and feature-by-feature checking in syntax under a ‘‘mirror’’ regime). In
Cinque’s model, for example, the verb moves successively through a
series of functional projections that define clause structure, picking up
one affix in each projection under adjunction; this predicts a right-linear
string of morphemes mirroring the underlying order of the functional
elements, and any deviation must be handled by a different mechanism.
I think a better approach is to somewhat enlarge the combinatorial
possibilities among the elements in (7) in the first place. The sole conven-
tion that governs combinations thus far is X-Bar Head Projection. Sup-
pose we add to this the convention that a composed unit can inherit a
subcategorization as well as a type; this subcategorization is inherited
from the nonhead (whereas the type is inherited from the head). Com-
bining the two conventions gives the following Rule of Combination
(RC):
(10) a. Rule of Combination
X: Y + Y: Z → [X + Y]: X, Z
b. X/Y + Y/Z = X/Z
(10a) is the basic rule of Categorial Grammar (given in that theory’s no-
tation in (10b)), and for this reason I will call the language generated by
RC from a set of elements CAT. To illustrate, suppose for simplicity that
T takes V as its complement, and some V takes N(P) as its complement;
then, we derive the tensed transitive verb to the right of the arrow in (11)
by applying RC to the two elements to the left of the arrow.
(11) V: NP + -ed: T, V → Ved: T, NP

RC will also derive such objects as tensed Asp, as in (12).

(12) T-morpheme: T, Asp + Asp-morpheme: Asp, AgrO → T-Asp: T, AgrO
Such a rule does not by itself allow for the generation of alternative
morpheme orders. However, it does allow for more diversity in the struc-
tures that can instantiate (7), permitting (in addition to the purely right-
linear structure) such structures as the following, where T and Asp have
combined to form an intermediate unit of type T, with subcategoriza-
tion AgrO:
(13) [AgrS [[T Asp]T, AgrO [AgrO V]AgrO]]
RC accounts straightforwardly for morphological fusion, the situation
in which one morpheme instantiates more than one feature. If some fea-
tures are permitted to have phonologically empty realizations, derivations
like the following will be possible:
(14) [e]: X, Y + [morpheme]: Y, Z → [morpheme]: X, Z

This account of fusion predicts that fused elements must be adjacent in
the hierarchy in (7), since RC will only combine adjacent elements. The
prediction is overwhelmingly true.
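The RC, including fusion through null morphemes, can be sketched in a few lines. The `(form, typ, subcat)` triple and the concatenation of forms are illustrative assumptions (CAT leaves linear order free, so the linear form shown is arbitrary):

```python
# A minimal sketch of the Rule of Combination (RC) with fusion; the
# Morph triple and the suffixal form concatenation are illustrative
# assumptions, not the book's own notation.
from typing import NamedTuple, Optional

class Morph(NamedTuple):
    form: str              # phonological form; '' models a null morpheme
    typ: str               # the functional element the morpheme realizes
    subcat: Optional[str]  # the type it selects as complement

def combine(x: Morph, y: Morph) -> Optional[Morph]:
    """RC: a morpheme of type X subcategorized for Y combines with a Y;
    the result is of type X and inherits the nonhead's subcategorization."""
    if x.subcat == y.typ:
        return Morph(y.form + x.form, x.typ, y.subcat)
    return None

ed = Morph('-ed', 'T', 'V')       # -ed realizes T and selects V, as in (11)
plead = Morph('plead', 'V', 'N')  # plead realizes V and selects N(P)
tensed = combine(ed, plead)       # Morph('plead-ed', 'T', 'N'), i.e. Ved: T, NP

# Fusion as in (14): a null AgrS morpheme combines with the T morpheme,
# so one overt form ends up spanning the adjacent elements AgrS and T.
agrs = Morph('', 'AgrS', 'T')
fused = combine(agrs, ed)         # Morph('-ed', 'AgrS', 'V')
```

Because combination always consumes a matching type/subcategorization pair, fused elements can only ever be adjacent in the chain in (7), as the text observes.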
This model still does not generate alternative orders of morphemes. To
generate different orders, we will relax the interpretation of the subcate-
gorization notation. The traditional notion of subcategorization bundles
together three different kinds of information: type, order, and level.
(15) Subcategorization
a. Type (N vs. V, etc.)
b. Order (left vs. right)
c. Level (root vs. stem; X0 vs. Xn)
So NP encodes the idea that the verb takes a nominal object (N), that it
takes it to the right, and that it takes a phrase-level complement
(NP as opposed to N). I want to investigate the properties of the lan-
guage that results from relaxing the order and level restrictions, retaining
only type subcategorization. Relaxing the order restriction means that if
V takes N(P) as a complement, then it can take it either to the right or to
the left: [V N] or [N V]. To eliminate ambiguity about which element
takes which as complement in a structure, I will use the sign ‘‘>’’ intro-
duced in chapter 7 to indicate the relation of head to complement, with
the narrow end pointing to the complement. For example, if V takes an N
complement, then both of the following constructions are licensed when
the order restriction is dropped:
(16) a. [N < V]
b. [V > N]
I will now define CAT to be the language that is generated by a set of
elements in head-complement order under the RC, where subcategoriza-
tion specifies type only, leaving level and order free.
(17) CAT = {A(B), B(C), C(D) . . . + RC}
CAT uses type subcategorization only
Put di¤erently, CAT is the set of permutations that arise from suspending
order and level subcategorization. I will now determine some properties
of CAT with an eye to evaluating its role as a model of some linguistic
systems, inflectional morphology among them.
The first thing to establish is the relation of CAT, where order and
level are relaxed, to the language that results when they are enforced.
When the elements in (17) are combined in such a way that order is
fixed, subcategorization is not inherited, and only head type is projected,
they determine a single structure, which I will call the right-linear string
(RLS).
(18) Right-linear string
[A > [B > [C > [D > [E]E]D]C]B]A
The RLS is the model of Pollock/Cinque-style clause structure, and, via
the Mirror Principle, the model of inflectional morphology that is widely
assumed.
The RLS bears a particular relation to CAT that can be explicated by
defining two CAT-preserving operations, Flip and Reassociate.
(19) a. Flip
If X ¼ [A > B], A and B terminal or nonterminal,
Flip(X) ¼ [B < A].
b. Reassociate
If X ¼ [A > [B > C]], R(X) ¼ [[A > B] > C].
Flip is CAT preserving in the sense that if [A > B] belongs to CAT, then
it is guaranteed that [B < A] belongs to CAT, by virtue of CAT’s indif-
ference to order. To show that Reassociate is CAT preserving, we reason
from the RC in this way: in X ¼ [A > [B > C]], [B > C] is of type B, with
subcategorization the same as C’s; so A must have subcategorization B
if X belongs to CAT; but then A must be directly combinable with B, and
the result of that combination will have subcategorization C; so, given
the RC, [[A > B] > C] must also belong to CAT. So, both operations are
CAT preserving. Furthermore, both have obvious inverses, and the in-
verses are also CAT preserving.
We can now show that CAT is the language that can be generated
from the RLS by Flip and Reassociate. We do this by showing that any
member X of CAT can be mapped onto the RLS by some combination of
Flip and Reassociate, and since these are invertible and CAT preserving,
that mapping can be viewed backward as a generation of X from the RLS
by some combination of Flip and Reassociate.
Suppose there is a structure X that is a member of CAT but cannot be
mapped onto the RLS by Flip or Reassociate, or their inverses. Then,
there must be some node in X that is either a left-branching structure
([[A > B] > C]) or a structure of the form [A < B], for if there are only
right-branching structures and rightward-pointing carets, then the struc-
ture is the RLS. In the first case, if right association cannot convert X to
[A > [B > C]] by reasoning already given, then it cannot belong to CAT
in the first place, and likewise for the second case; hence, there can be no
such structure. So,
(20) CAT = RLS+

where by RLS+ I mean the language generated from the RLS by Flip
and Reassociate.
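The RLS+ construction can be computed directly for small n: start from the RLS and close under Flip and Reassociate (and their inverses). The tuple encoding of trees below is an illustrative assumption:

```python
# A sketch of CAT as RLS+: the closure of the right-linear string under
# Flip and Reassociate. A tree is a tuple (d, a, b), where d == '>' means
# the left daughter is the head ([A > B]) and '<' the right ([A < B]);
# leaves are the integers 1..n. The encoding is an assumption.

def rls(n):
    t = n
    for i in range(n - 1, 0, -1):
        t = ('>', i, t)
    return t

def leaves(t):
    return (t,) if isinstance(t, int) else leaves(t[1]) + leaves(t[2])

def neighbors(t):
    if isinstance(t, int):
        return
    d, a, b = t
    # Flip: [A > B] <-> [B < A]
    yield ('<', b, a) if d == '>' else ('>', b, a)
    if d == '>':
        # Reassociate: [A > [B > C]] -> [[A > B] > C], and its inverse
        if isinstance(b, tuple) and b[0] == '>':
            yield ('>', ('>', a, b[1]), b[2])
        if isinstance(a, tuple) and a[0] == '>':
            yield ('>', a[1], ('>', a[2], b))
    for s in neighbors(a):          # apply the operations at any depth
        yield (d, s, b)
    for s in neighbors(b):
        yield (d, a, s)

def cat_orders(n):
    seen = {rls(n)}
    frontier = [rls(n)]
    while frontier:
        new = {u for t in frontier for u in neighbors(t)} - seen
        seen |= new
        frontier = list(new)
    return {leaves(t) for t in seen}
```

For n = 4 this closure yields 22 of the 24 permutations; the two missing orders are exactly the starred strings of (23) below, (3 1 4 2) and (2 4 1 3).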
The properties of CAT just identified are useful in discussing CAT as a
model of linguistic systems. By virtue of Flip and Reassociate, CAT can
be taken as a model of systems that appear to involve movement. In fact,
CAT, via its RLS+ interpretation, mimics movement of constituents of
arbitrary size, over arbitrary distances. To see this, consider the RLS in
(21a), and whether H in that structure could be moved to the position
between B and C.
(21) Flip and Reassociate can effect long-distance moves, of any node to
any higher position.
a. [A > [B > [C > [D > [E > [F > [G > [H > [I > J]]]]]]]]]
Derivation:
b. [A > [B > [C > [D > [E > [F > [G > [H > [I > J]]]]]]]]]
⇓ Reassociate
c. [A > [[[[[[B > C] > D] > E] > F] > G] > H] > [I > J]]
⇓ Flip
d. [A > [H < [[[[[B > C] > D] > E] > F] > G]] > [I > J]]
In the derivation (21b–d), first several applications of Left-Reassociate
gather all of the material intervening between the moving item and the
landing site, and then a single Flip effects the movement. It is important
to understand that as far as CAT is concerned, there is no movement;
rather, there is a theorem that if (21b) belongs to CAT, then so does
(21d); Flip and Reassociate are simply a way of thinking about this via
the RLS+ interpretation of CAT. Nevertheless, these conclusions invite
us to consider CAT as a model of linguistic structures that appear to in-
volve movement.
While a single unbounded movement is allowed, multiple movements
are quite constrained. The Flip operation in (21c) reverses the caret, thus
blocking any further applications of Reassociate. Hence, any further
movement in the vicinity of the movement path will be blocked; in par-
ticular, there will be
(22) a. no movement of the moved constituent
b. no movement out of the moved constituent (where it is complex)
c. no movement out of extracted-from constituents
It is again important to realize that these are not constraints that need to
be imposed on Flip and Reassociate; they all reduce to theorems about
CAT. A question I have not been able to answer is, is any system of
transformations of the RLS constrained by (22) equivalent to CAT or
RLS+?

Because of the restrictions in (22), CAT cannot be used to model wh
movement, as wh movement does not conform to any of them. CAT thus
differs from full-blown Categorial Grammar. In particular, it does not
have ‘‘type lifting,’’ which can be used to evade (22).
I will now try to assess how big CAT is. If the set of base elements
is finite, as it is in the cases we intend to model, CAT itself is finite.
As I characterized it earlier, for some fixed chain of elements in the
complement-taking relation, CAT defines some set of permutations of
those elements. The full set of permutations of n elements (call it P) has n!
elements (n × (n − 1) × (n − 2) × · · · × 2). As n grows, CAT becomes a tiny
subset of P; for this reason, any system of a certain size that resembles
CAT most likely is CAT.
For three elements, CAT is actually identical to P, but for any larger n
it is not.
(23) Suppose 1 > 2 > 3 > 4 > 5. Then:
3: 1 2 3
[2 < 1] > 3
1 [3 2]
3 < [1 > 2]
[2 3] 1
3 < [2 > 1]
4: 1 2 [3 > 4]
3 [1 2] 4
1 2 [4 < 3]
*3 1 4 2
1 > [[3 < 2] > 4]
*2 4 1 3
5: *3 1 5 2 4, etc.
The starred strings are the non-CAT strings for n = 3, 4, and 5. To see
that they are non-CAT, we can try to build a parse tree for them from the
bottom up; for the examples given, there is no way to start building the
tree, because no adjacent elements are combinable in either direction (this
does not, however, characterize all failures of strings to be members of
CAT).
The non-CAT strings given here are derivable from the RLS by move-
ment free of the constraints in (22). For example, (24) gives the derivation
of ‘‘*3 1 5 2 4.’’
(24) (1 2 3 4 5) → 1 5 2 3 (4 t) → 3 1 5 2 t (4 t) = 3 1 5 2 4

In the first step 5 is extracted from ‘‘2 3 4 5’’; in the second step 3 is
extracted from that as well, violating the prohibition against extraction
from extracted-from constituents.
In what follows I will try to give some idea of how fast CAT grows rela-
tive to P. The table in (25) shows how many elements of P are excluded
from CAT for n = 3 . . . 9, and the percentage excluded. Evidently, CAT
becomes a vanishing portion of P.
(25) n    Total : Excluded from CAT    % excluded
3    6 : 0            0
4    24 : 2           8.3
5    120 : 30         25.0
6    720 : 326        45.3
7    5,040 : 3,234    64.1
8    40,320 : 31,762  78.8
9    362,880 : 321,244  88.5
I have not been able to devise a formula that will give the number of
CAT elements for n elements, so the figures in (25) were calculated by
hand. There is a formula that puts an upper bound on CAT and is still
smaller than P; the table in (26) compares the value of this formula with
P. (FR = Flip-Reassociate upper bound)

(26) FR = 2^(2n-3)

n     Ratio of n! to FR(n)
2     1.00e+000
3     7.50e-001
4     7.50e-001
5     9.38e-001
6     1.41e+000
7     2.46e+000
8     4.92e+000
9     1.11e+001
10    2.77e+001
11    7.61e+001
12    2.28e+002
13    7.42e+002
14    2.60e+003
15    9.74e+003
16    3.90e+004
17    1.66e+005
18    7.45e+005
19    3.54e+006
The formula is arrived at by considering each node to be independently
flippable, and each pair of adjacent nodes to be independently reassoci-
able; since there are n - 1 of the former and n - 2 of the latter, there are

(27) 2^(n-1) × 2^(n-2) = 2^(2n-3)
ways to transform the RLS to generate CAT. But this overestimates the
actual number of permutations: any pair of adjacent unflipped right-
associated nodes in a structure X can be left-associated to yield another
member of CAT that has the same order of terminal elements, so the
same permutation of elements will be counted twice. I have not figured
out a way to subtract or to estimate the size of such redundancies.
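The bound and the ratios in (26)–(27) are easy to recompute mechanically; this small sketch just reproduces the tabulated values:

```python
# A sketch recomputing the Flip-Reassociate upper bound FR(n) = 2^(2n-3)
# and the ratio n!/FR(n) tabulated in (26).
import math

def fr(n):
    # each of the n-1 nodes is independently flippable, and each of the
    # n-2 adjacent node pairs independently reassociable, as in (27)
    return 2 ** (2 * n - 3)

def ratio(n):
    return math.factorial(n) / fr(n)

# e.g. ratio(5) = 120/128 = 0.9375, matching the 9.38e-001 entry in (26)
```

As the table shows, n! overtakes the bound at n = 6 and the ratio grows rapidly thereafter.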
Clearly, if we were modeling a linguistic system involving 15 con-
catenating elements, and the observed permutations of these elements
were found to conform to what would be expected of a CAT system, we
would have resounding confirmation that CAT is a good model of the
system, since the chance that exactly these orders would arise in a system
not essentially equivalent to CAT would be small. Unfortunately, most
linguistic systems do not involve the concatenation of such large numbers
of elements; some cases of interest, such as inflectional systems, may in-
volve 4 to 6 elements, and at that level the difference between CAT and P
is not astronomical. Conclusions can nevertheless be drawn for systems of
this size as well, if a number of different languages are considered. For
example, since the chance that 10 languages with 5 morphemes are all
CAT is .75^10, about 5%, one could claim significant confirmation of the CAT-
like behavior of the subsystem in question from a collection of 10 such
languages. With this in mind, in section 8.4 I will survey inflectional sys-
tems with 4 and 5 morphemes to assess CAT as a model of inflection.
8.3 Inflectional Systems as an Instantiation of CAT
Suppose we have a fixed universal chain of elements in the complement-of
relation, as in (28).
(28) Universal elements and hierarchy
AgrS > T > Asp > AgrO > V, or perhaps
T > AgrS > Asp > AgrO > V (type subcategorization only)
As before, the caret in X > Y means ‘X takes things of type Y as com-
plement’, but with no restriction on the linear order of the elements or on
the ‘‘level’’ (i.e., bar level, as in X-bar theory) of the elements.
CAT with (28) as its base is clearly not a good model of any particular
language’s inflectional morphology, as no language has inflectional mor-
phology where, for example, the past tense affix may freely occur either
before or after the verb (corresponding to Flip). Any given language will
fix the linear order. In addition, any given language will fix the ‘‘level’’ at
which items attach, in a way that I will make precise.
We might say that CAT models inflectional morphology in the sense
that it sets the limits on possible realizations of the universal chain in (28),
but that any particular language will impose order and level constraints
on the subcategorization of particular items that will yield some subset of
CAT. In particular, it would be interesting to explore the possibility that
the only way inflectional systems can differ is in terms of these two prop-
erties. (29) is an attempt to formulate this hypothesis.
(29) Lexical Variation Hypothesis
Language-particular inflectional systems differ only in
a. order restrictions
b. level restrictions
on the subcategorizations of individual morphemes or classes of
morphemes.
The Lexical Variation Hypothesis (LVH) is independent of whether CAT
is a good model of inflection in general; it could be that CAT sets accu-
rate bounds on what permutations of elements in general can instantiate
the chain in (28), but that the way languages di¤er within that bound is
something other than (29). In what follows I will be evaluating the LVH
as well as CAT, but CAT is the main prey.
The order restriction determines the difference between prefix and suffix
for morphemes, and the difference between head-initial and head-final
order in syntax.
The level restrictions have to do with what ‘‘size’’ the complement must
be. The details depend on assumptions about what units are available in
the first place. Two cases will be of interest here. One, already mentioned,
will be the word/phrase distinction; the subcategorization N, for exam-
ple, I will take to be ambiguous between N0 and NP. In addition, we
will need recourse to levels of morphological structure, the most familiar
version of which is the root/stem/word distinction introduced in Selkirk
1982, where stems are composed of roots, but not vice versa, and words
are composed of stems, but not vice versa, giving a three-way distinction
among levels. So we will allow a language to impose a restriction on an
AgrO morpheme, for example, that it attach to a verb root, and not to
any other level of verb, in accordance with the LVH.
I should note that this system will give ambiguous derivations for cases
that are not normally ambiguous, and where there is no obvious semantic
ambiguity. To take an example from English derivational morphology, if
both -ate and -ion are type 1 (root-attaching) suffixes, then both of the
following structures will be allowed:
(30) a. [[affect + ate] + ion]
b. [affect [ate + ion]]
If the subcategorizations and restrictions are satisfied in (30a), then under
the RC, they must be satisfied in (30b) as well. The possibility of structure
(30b) might be welcome for such cases, as there is some tendency to think
of -ation as a single affix in such cases; in the present case, for instance,
there is no word *affectate.
Strictly speaking, a further unfamiliar sort of derivation should be
possible as well. Typically, the lexicon is divided into roots, stems, root-
attaching affixes, and stem-attaching affixes. But, in fact, the system
proposed here does not give exactly this classification. Consider the prop-
erties of -able and -ity listed in (31a,b).
(31) a. -able: A, Vstem
b. -ity: N, Aroot
c. -ability: N, Vstem
d. [compact + [ability]]
The question raised in (31c) is, can -ity attach directly to -able, to derive
the complex suffix -ability, with the properties shown in (31c) (derived by
the RC)? The question comes down to whether or not -able can satisfy the
subcategorization of -ity, and crucial to the answer is whether it satisfies
the restriction that -ity attaches only to roots. Now, -able itself attaches to
stems, but this leaves open the question whether it is itself a stem or a
root or both. If we decide that it can be a root, then there is nothing to
block (31c), and so (31d) will be a typical nominalization using these two
affixes. If CAT is right, then these ambiguities are harmless; if they can be
verified, I would in fact consider them confirmatory, because they would
be puzzling without CAT.
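The -able/-ity question can be made concrete. The record fields, the `'root'`/`'stem'` labels, and the decision to let -able itself count as a root are all illustrative assumptions following (31):

```python
# A sketch of level restrictions on affix subcategorization, as in (31);
# the field names and the choice to treat -able as a root are illustrative
# assumptions, not settled analyses.
from typing import NamedTuple, Optional

class Affix(NamedTuple):
    form: str
    typ: str          # category of the result
    sel: str          # category selected
    attaches_to: str  # level restriction on the complement: 'root' or 'stem'
    level: str        # level of the affix itself

def attach(outer: Affix, inner: Affix) -> Optional[Affix]:
    """Combine under the RC if type and level restrictions are met; the
    result keeps the outer's type and inherits the inner's selection."""
    if outer.sel == inner.typ and outer.attaches_to == inner.level:
        # morphophonology (able + ity -> ability) is ignored here
        return Affix(inner.form + outer.form.lstrip('-'), outer.typ,
                     inner.sel, inner.attaches_to, outer.level)
    return None

able = Affix('-able', 'A', 'V', 'stem', 'root')  # treating -able as a root
ity = Affix('-ity', 'N', 'A', 'root', 'root')
ability = attach(ity, able)
# -> Affix('-ableity', 'N', 'V', 'stem', 'root'): the complex suffix of (31c)
```

Changing -able's own level to `'stem'` makes `attach(ity, able)` fail, which is exactly the open question the text raises: the derivation of the complex suffix stands or falls with whether -able can be a root.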
For each language I will examine in section 8.4, I will ask two ques-
tions. First, is the order of inflected elements a CAT order or not? Sec-
ond, is there a reasonable specification of order and level restrictions on
the morphemes that instantiate the functional elements that will yield the
particular shape of the inflected word in that language? The first question
addresses CAT by itself, the second, CAT plus the LVH.
I will begin with the assumption that the chain of elements in (28) is
the fixed universal base for CAT; any flexibility introduced into this as-
sumption would not be necessarily incompatible with CAT, but it would
weaken empirical expectations. If, for example, T and AgrS were ordered
differently in different languages, we would simply have different bases
for CAT in those different languages.
In order to have a verbal morphology, a language needs a set of mor-
pheme classes that span the functional chain. Recall that a morpheme can
span subchains of the functional chain through fusion, which arises when
one of the morphemes that the RC combines is a null morpheme. In
general, the fusions that occur in a language are systematic; for example,
in English AgrS and T always fuse. Such generalizations are part of the
lexical style of the language; but, while fascinating in their own right
and essentially not understood, they are not directly the subject at hand.
In (32) the set {m1, m2, m3} is a spanning vocabulary for F1 . . . F6.

(32) [diagram: m1, m2, and m3 each span a contiguous stretch of the chain
F1 > F2 > F3 > F4 > F5 > F6, with m1 the highest]
If the RC generates m1, m2, and m3, then it is guaranteed that m2 can
combine with m3, and the result of that combination can combine with
m1, and so [m1 [m2 m3]] will span the functional structure. The spanning
vocabulary might consist of affixes, in which case single inflected words
will span the functional structure; or it might consist of words, in which
case syntactic constructions will span the functional structure (giving rise
to what are called auxiliary verb systems); or it might consist of some
combination of the two. In English, for example, the spanning vocabu-
lary consists of both words and roots and affixes.
(33) T  >  AgrS  >  Asp  >  AgrO  >  V
     |-----was-----|  |-------seeing-------|
Was is a word that spans T and AgrS; seeing is a (derived) word that
spans Asp, AgrO, and V (under the assumption that AgrO is universally a
part of the chain). Was and seeingP can be combined in syntax, since was is
a T, AgrO element and seeingP is a projection of the AgrO element seeing.
(34) a. In lexicon
        ing: AgrO + Asp, V
        see: V, NP
        see + ing → seeing: Asp, NP
     b. In syntax
        was: T, AspP
        seeing: AspP, NP
        seeing + NP → [seeing NP]AspP
        [was] + [seeing NP]AspP → [was [seeing NP]]TP
Importantly, it is the RC that is responsible for the operations in both
syntax and morphology. The only difference is that in morphology it
combines X0-level objects, whereas in syntax it combines X0- and XP-
level objects; but this is the characteristic difference between syntax and
morphology in any event. From that difference arises the further differ-
ence that inheritance of subcategorization largely has no effect in phrasal
syntax, since XPs have no subcategorization. It is possible that there are
phrasal syntax junctures of units smaller than XP, in which case inheri-
tance should be detectable again; I will suggest in section 8.5 that this is
the correct view of the syntax of verb-raising constructions.
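The RC's role as a single type-driven combinatory operation can be made concrete. The following is a minimal sketch of the derivation of seeing in (34a), in my own encoding rather than the book's formalism: a morpheme carries a span of functional elements (its type is the topmost) and a subcategorization, and combination checks the subcategorization against the complement's type, with the complement's own subcategorization inherited by the result.

```python
# A sketch of the RC as type-driven combination, following the notation of
# (34). Class and function names are illustrative, not the book's.

class Morpheme:
    def __init__(self, form, span, subcat=None):
        self.form = form        # phonological shape, e.g. "ing"
        self.span = span        # functional elements realized, top to bottom
        self.subcat = subcat    # the element this morpheme selects

    @property
    def type(self):
        return self.span[0]     # topmost element of the span

def combine(affix, stem):
    """RC: affix + stem. The affix's subcategorization must match the stem's
    type; the result spans both and inherits the stem's subcategorization."""
    if affix.subcat != stem.type:
        raise ValueError(f"{affix.form} selects {affix.subcat}, not {stem.type}")
    return Morpheme(stem.form + affix.form,   # suffixation, as in English
                    affix.span + stem.span,
                    stem.subcat)              # inheritance of subcategorization

# The lexical derivation of "seeing" in (34a):
ing = Morpheme("ing", ["Asp", "AgrO"], subcat="V")
see = Morpheme("see", ["V"], subcat="NP")
seeing = combine(ing, see)
print(seeing.form, seeing.type, seeing.subcat)   # seeing Asp NP
```

The same `combine` would serve in syntax, with the level of the combined objects (X0 versus XP) the only point of variation.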
An obvious difficulty for the notion of spanning vocabulary, as it arises
from the RC, is the existence of multiple exponence. Multiple exponence
(the expression of a single functional element on more than one mor-
pheme in an inflected verb) should be impossible given the RC. This is
because if a feature is in two morphemes, there is no way those mor-
phemes can be combined by the RC: the subcategorization of one can
never match the type of the other, nor can they be hooked together by
any intermediate morphemes, for essentially the same reason.
Thus far I have assumed that the functional elements that a morpheme
‘‘realizes’’ will be exactly the set of elements that the shape of the mor-
pheme is sensitive to. This is a very natural assumption; for example, the
fact that the appearance, or not, of the -s marker on English verbs is
sensitive to the functional elements of Tense and Person and Number
leads us to suppose that -s ‘‘represents’’ these features. To account for the
possibility of multiple exponence, we must pull apart somewhat these two
properties of morphemes. We must allow a morpheme to be ‘‘sensitive
to’’ more features than it realizes. The result will inevitably be a weaker
theory, though there is at least one version that has some teeth to it.
Suppose the functional elements that a morpheme represents must be
some continuous subsequence of the full chain of functional elements that
it is sensitive to. This would allow a notation in which the functional ele-
ments that the morpheme is sensitive to, but that it does not represent,
can simply be marked as ‘‘inert’’ for the purposes of the RC. (I will use
parentheses to mark such an inert subsequence.)
(35) Multiple exponence
T > AgrS > Asp > AgrO > V
|----af1----(-----------)|
            |----af2----|
Suppose that af1 is ‘‘sensitive to’’ the features T through AgrO, whereas
af2 is sensitive to Asp through AgrO. Without the notion of inert element
the RC could not combine both af1 and af2 with a verb, to derive forms
like (36), because both affixes would be subcategorized for AgrO, but
neither would be AgrO.
(36) V + af1 + af2

But suppose Asp and AgrO are inert for the purposes of the RC, even
though they are relevant for the paradigmatic behavior of the class of
affixes that af1 belongs to. The resulting representations will be as in (37)
and the RC can combine them both with the verb as in (37c), since now
the type of af2 matches the subcategorization of af1.
(37) a. af1: T > AgrS > (Asp > AgrO)   T, Asp
     b. af2: Asp > AgrO                Asp, V
     c. [[V + af2]Asp + af1]

The restriction of inert elements to a single subsequence is an empiri-
cally sharp prediction, though I am not in a position to present evidence
that would confirm or refute it. We could envision an even tighter ver-
sion, in which the subsequence was always peripheral; again, I have no
idea how such a restriction would fare empirically, but it is easy to imag-
ine a kind of language that would refute it.
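The single-subsequence restriction is mechanically checkable. Here is a small sketch (names and encoding are my own) that computes the RC-visible part of a morpheme's span and rejects scattered inertness assignments, on the model of (37):

```python
# A sketch of the inert-subsequence restriction: only one contiguous,
# parenthesized stretch of a morpheme's span may be inert. The helper
# returns the elements visible to the RC; names are illustrative.

def active_span(span, inert):
    """Check that `inert` picks out one contiguous subsequence of `span`,
    then return the remaining (active) elements."""
    positions = [i for i, f in enumerate(span) if f in inert]
    if positions and positions != list(range(positions[0], positions[-1] + 1)):
        raise ValueError("inert elements must form a single contiguous subsequence")
    return [f for f in span if f not in inert]

# af1 of (37a): sensitive to T > AgrS > (Asp > AgrO); for the RC it behaves
# as a T-element subcategorized for Asp.
print(active_span(["T", "AgrS", "Asp", "AgrO"], {"Asp", "AgrO"}))  # ['T', 'AgrS']

# A scattered assignment such as T > (AgrS) > Asp > (AgrO) is excluded:
try:
    active_span(["T", "AgrS", "Asp", "AgrO"], {"AgrS", "AgrO"})
except ValueError as err:
    print("rejected:", err)
```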
But one problem with the account as it stands is that it permits arbi-
trary choices in determining which affix has the inert features, and where
features are inert. Imagine two affixes, af1 and af2, each sensitive to the
same subchain of three elements.
(38) a. . . . F1 > F2 > F3 . . .
        af1: F1 > F2 > F3
        af2: F1 > F2 > F3
     b. V + af1 + af2
There are six different ways that inertness could be assigned so that af1
and af2 can be combined with a verb as successive morphemes, as in
(38b); (39) shows three of them.
(39) a. af1: (F1 > F2 > F3)
        af2: F1 > F2 > F3
     b. af1: (F1 > F2) > F3
        af2: F1 > F2 > (F3)
     c. af1: (F1) > F2 > F3
        af2: F1 > (F2 > F3)
We can probably rule out (39a) on general grounds: it gives af1 no fea-
tures for the RC to use in deriving complex verbs, so such an affix would
never appear in a derivation that was purely the result of successive
applications of the RC. As for the difference between (39b) and (39c),
there is an interesting connection between inertness of features and para-
digm structure that could be used to give determinate analyses in such
cases. Elsewhere (Williams 1997) I have proposed that the inert elements
will always be minor paradigm dimensions, and the noninert elements will
be major paradigm dimensions. Major dimensions represent the broadest
subdivisions in the paradigm, and evidence for major versus minor
status comes from studying syncretism in the paradigm. The fact that all
English past tense forms fall together (e.g., pleaded is the past form for
all persons and numbers) is evidence for the major status of Tense in
English, whereas the fact that English 3rd person forms do not fall to-
gether ( pleads vs. plead ) shows that Person is a minor dimension.
This connection to paradigm structure could resolve ambiguities in the
lexical assignment of inertness. If the analysis in (39b) were correct, for
example, we would expect F3 to be more major than af2, but no such
expectation arises from analysis (39c).
There are in fact languages with exactly the affix pattern illustrated in
(39), Arabic and Georgian being among those I have analyzed as just
proposed. In each language there are two morpheme classes (a prefix class
and a suffix class) each of which is sensitive to exactly the same set of
features ({SubjNumber, SubjPerson, ObjNumber, ObjPerson} in Geor-
gian and {SubjGender, SubjPerson, SubjNumber} in Arabic). (My anal-
yses were based on prior studies by Anderson (1992) and Noyer (1992),
respectively.)
There is a huge potential for ambiguity in the assignment of inertness
for the Georgian case especially, where four features are implicated. (40)
lists half of the possibilities.
(40) a. af1: F1 > F2 > F3 > F4
        af2: (F1 > F2 > F3 > F4)
     b. af1: (F1) > F2 > F3 > F4
        af2: F1 > (F2 > F3 > F4)
     c. af1: (F1 > F2) > F3 > F4
        af2: F1 > F2 > (F3 > F4)
     d. af1: F1 > (F2 > F3 > F4)
        af2: (F1) > F2 > F3 > F4
     e. af1: (F1 > F2 > F3 > F4)
        af2: F1 > F2 > F3 > F4
In both languages examined it turns out that if Fi is a major dimension
for af1, then it is a minor dimension for af2, and vice versa. The Georgian
inflected verb, for example, has the form in (41), where both affixes
are sensitive to both subject agreement features and object agreement
features.
(41) F1 > F2 > F3 > F4
     af1 + root + af2
But an examination of syncretisms in the paradigms for the two affixes
shows that Subject features are major dimensions for the suffix, and
minor for the prefix, and Object features are the opposite, so that the
underdetermination is resolved, giving a system something like (40c).
It is perhaps surprising that the paradigms associated with morphemes
sensitive to identical sets of features should not have the same major/
minor dimensional ordering within the same language, but that may be
one of the milder surprises in store for us in the much-studied but little-
understood human ability to build paradigms. The structure of paradigms
is not the subject of this book; but see Williams 1997 for a discussion of
the paradigms from Arabic and Georgian that substantiate the claims
about paradigms made here.
For present purposes it is enough to know that the hypothesized con-
nection to paradigm structure can eliminate the arbitrariness of deter-
mining what functional elements are inert in what morphemes and can
therefore yield more determinate analyses.
8.4 Some Inflectional Systems
I have already outlined the enterprise of this section. For each language I
will first determine whether CAT plus a universal base of functional ele-
ments sets the proper bounds on what an inflectional system can do to
represent functional elements; and, second, see if the details of word shape
in particular languages can be predicted by specifying level and order
restrictions on particular morphemes or classes of morphemes, in accor-
dance with the LVH.
The simplest sort of language from an inflectional point of view is one
where the RLS of functional elements is realized as an RLS of suffixes.
(42) [[[[V]V + af2]AgrO + af3]T + af4]AgrS
Such a language is the one expected in particular in Cinque’s version of
the Pollock-style model, in which the verb moves in syntax through the
head position of a series of functional projections, one projection for each
functional element, picking up an affix in each move by left-adjoining to
it. In the terms I will use to describe inflectional systems here, it is a lan-
guage that exhibits no fusion, and in which each morpheme takes its
complement to the left. If we use a kind of level restriction to bar affix-
ation to other affixes, then exactly the left-linear structure will result.
(43) [[[V + af1] + af2] + af3]

While languages do exist that so transparently represent the functional
chain, they are somewhat rare.
More complex is a language with some fusion, but with the trans-
parently mirroring order of markers. Consider for example Mohawk or
Southern Tiwa, whose verbal inflectional systems look like this:
(44) a. Ka-’u’u-wia-ban.
        1subj.2obj-baby-give-past
        (Southern Tiwa)
     b. [AgrS > AgrO > V] < T
        [AgrS = AgrO > V] < T
     c. T: suffix, T, AgrS
        AgrS = AgrO: prefix arising from fusion: AgrS, V
        V: stem
Example (44a) shows that subject and object agreement marking are
fused into one morpheme, ka. (‘‘=’’ represents the boundary at which two
adjacent elements are fused.) Mohawk and Southern Tiwa have the fur-
ther complication that T is on the opposite side of the stem, but as the
parse in (44b) suggests, this is not a problem for the hierarchical relation
among the elements. The match between functional elements and mor-
phemes is one-to-one except for the single fusion. Note that Mohawk and
Southern Tiwa require the complement order AgrS < T so that AgrO and
AgrS will be adjacent, hence able to fuse; it remains to be seen if that
order is universally possible. (44c) shows the language-particular specifi-
cations that determine the shape of the inflected verb.
Swahili, which does not exhibit fusion, would at first glance seem to
provide an even more transparent representation of the functional ele-
ments and thus an exact match between morphemes and functional
elements. But findings reported by Barrett-Keach (1986) show that the
Swahili inflected verb does not instantiate the RLS. Barrett-Keach shows
that the inflected verb has an internally bifurcated structure, as illustrated
in (45b).
(45) a. [AgrS + T + AgrO + V]word
     b. [[AgrS + T] [AgrO + V]]word
Barrett-Keach gives two kinds of evidence for this conclusion. First, the
inflected verb has the accent pattern that Swahili assigns to compound
terms generally, including nominal compounds: main stress on the pen-
ultimate syllable of the second element, and secondary stress on the
penultimate syllable of the first element. (SP and OP stand for subject
and object pronoun clitic.)
(46) Juma a-li-ki-soma kitabu.
     Juma sp-past-op-read book
     ‘Juma read the book.’
     (Barrett-Keach 1986, (1a))
This would follow if the structure in (45b) were correct, and the two con-
stituents of the inflected verb were identified as stems.
(47) [[T AgrS]stem [AgrO V]stem]word
Barrett-Keach’s second piece of evidence is that Swahili has a suffix, cho,
indicating relativization, which can appear in the middle of the inflected
verb, exactly between the two hypothesized stems.
(48) kitabu a-li-cho-ki-soma
     book sp-past-rel-op-read
     ‘the book which s/he read’
     (Barrett-Keach 1986, (10))
Cho is clearly a suffix, because it can also be appended to the comple-
mentizer (amba + cho). It can receive a unitary account only if a-li-cho-ki-
soma has the internal structure indicated in (45), which allows cho to be
appended to the first stem of the inflected verb. We can achieve that
structure by stipulating the following language-specific constraints on the
morphemes that realize the functional elements:
(49) T: prefix
AgrS: stem
AgrO: prefix
V: stem
T and AgrS compose a stem through affixation, as do AgrO and V; then
compounding (actually, the RC applying to two stems) assembles the
complete inflected verb from these subunits. Swahili illustrates what
might be called a word-internal auxiliary system (the T-AgrS stem), and
this treatment of it prefigures my general treatment of auxiliary systems.
I now turn to the more problematic cases for theories that essentially
expect the RLS (or LLS) as the only realization of functional elements.
The first is Navajo, in which AgrS intervenes between AgrO and V.
(50) AgrO Asp T AgrS V
There are two ways to parse this structure in CAT terms, depending
on whether T > AgrS (51a) or AgrS > T (51b). The lexical specifications
needed to force the analysis are given below each parse.
(51) a. [AgrO < [Asp < [T > AgrS]]] > V
T: prefix
AgrS: stem
Asp: prefix
AgrO: prefix
b. [[AgrO < Asp] < T] < [AgrS > V]
T: suffix
Asp: suffix
AgrS: suffix
AgrO: stem
Mohawk and Swahili both require T > AgrS, so we might want to tenta-
tively assume that as the universal order and therefore favor parse (51a).
On behalf of parse (51b) we could point to the uniform suffixation to the
AgrO stem that would result; although mixed systems exist with both
prefixes and suffixes, the economics of the lexicon may favor uniform
prefixation or suffixation. I leave the question open.
Inuit is the mirror image of Navajo, with AgrS between V and AgrO as
a suffix.
(52) a. V T AgrS AgrO
     b. Piita-p mattak niri-va-a-0.
        Piita-erg mattak.abs ate-indic-3sg.subj-3sg.obj
        ‘Piita ate the mattak.’
        (Bok-Bennema 1995, 105)
(53) V < [T > AgrS > AgrO]
AgrS: prefix
T: prefix
AgrO: stem
(52) shows the order of elements, and (53) shows the parse and the lexical
specifications that force the analysis. As with Navajo, a different parse
results if AgrS > T.
In Yuman, Lakhota, and Alabama, on the other hand, there is a CAT
parse only if T > AgrO.
(54) AgrO AgrS V T
[[AgrO < AgrS] > V] < T
(P. Munro, personal communication)
There is no parse if AgrS > T, as the string then represents the ‘‘3 1 4 2’’
configuration already shown to lie outside CAT. We now have a conflict
between the requirements of two different languages: Navajo requires
T > AgrS because its fusion of AgrS and AgrO entails that these must be
adjacent in the chain; Yuman, on the other hand, requires AgrS > T. An
obvious way to resolve this would be to allow languages to differ in their
choice on this point, or, equivalently, to claim that there are two distinct
notions of Tense, one superior to and the other inferior to AgrS. Con-
vincing support for the latter position would of course be a language in
which both occur. I will leave the matter unresolved.
In English, auxiliary verbs are part of the spanning vocabulary. The
auxiliary verbs take their complements in syntax rather than morphology;
consequently, their complements are XPs rather than Xs. The special
feature of the morphology is that only one affix occurs on the verb; AgrS
and T are always fused. Temporarily abandoning the fixed universal
functional chain. How this new chain is related to the universal chain will
be left open.
(55) English functional chain
AgrS = T > Asp1 > Asp2 > Voi > V
How various elements of the spanning vocabulary are related to the
functional chain is shown in (56).
(56) T Asp1 Asp2 Voi V
might have been being killed
|-killed-|
|------killing------|
|-----------killed-----------|
|------------------kill------------------|
|-----------------------kills-----------------------|
|------------passive--was------------|
|-------has--------|
|--been--|
|-modal-|
|-----s-----|
The complex items shown here are derived by the RC from more ele-
mentary morphemes; for example, kills, which spans the whole chain, is
formed from kill and -s. The basic rule for relating form to function here
is the following: the stem of a form is determined by the right edge of its
span, and the form of that stem by the left edge. For example, has spans
T and Asp1. If it spanned more to the right, a di¤erent stem would be
used (has vs. was); if it spanned less to the left, a di¤erent form would
be used (e.g., has vs. have).
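The edge rule can be made concrete with a small lookup that hand-encodes a few cells of (56). The tables and names below are my own illustration, keying the stem to the right edge of the span and the inflectional form to the left edge, as the has/was/have examples suggest:

```python
# A sketch of the edge rule: the right edge of a span selects the stem, and
# the left edge selects that stem's inflectional form. Only a few cells of
# (56) are encoded; the tables are illustrative, not a full analysis.

STEM = {"Asp1": "have", "Asp2": "be", "Voi": "be", "V": "kill"}  # by right edge

FORM = {                                  # (left edge, stem) -> surface shape
    ("T", "have"): "has",
    ("T", "be"): "was",
    ("Asp1", "be"): "been",
    ("T", "kill"): "kills",
}

def realize(left, right):
    """Realize the morpheme whose span runs from `left` down to `right`."""
    return FORM[(left, STEM[right])]

print(realize("T", "Asp1"))     # has:  span T..Asp1
print(realize("T", "Voi"))      # was:  span reaches further right, stem "be"
print(realize("Asp1", "Asp2"))  # been: lower left edge, a nonfinite form
print(realize("T", "V"))        # kills: a main verb spanning the whole chain
```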
A given clause will span functional structure by a combination of mor-
phological and syntactic derived units. For example, (57) shows the
derivation of the pieces of John was being sued, from bottom to top.
(57) [be < ing]Voi, V morphology
[sue < ed]V, NP morphology
[being > suedP]VoiP syntax
John [was > [being > sued]]T syntax
The RC applies in both the lexicon and the syntax. The only difference in
the outcome is determined by independent differences between morphol-
ogy and syntax: complementation is left-headed in syntax but right-headed
in morphology, and complements are phrases in syntax but X0s in
morphology.
The RC, along with lexical specifications of the type that the LVH
affords, thus lays out a good first approximation to the general question,
what is a possible verbal inflectional system in natural language? The fact
that the RC is invariant across agglutinating and isolating systems makes
it the only real candidate for a general answer. In what follows I will
sketch its role in other domains.
8.5 Verb (Projection) Raising as an Instance of CAT
I now turn to an application of CAT outside inflectional morphology:
namely, the realm of verb projection raising. In fact, I believe the appli-
cations of CAT outside morphology are numerous, and I have picked
verb projection raising merely as an illustration. My best guess is that
CAT is the relevant model of a system that involves only the playing out
of lexical specifications of type, order, and level.
The analysis presented here is based on Haegeman and Van Riems-
dijk’s (1986) discussion of the phenomenon. The model I present below
incorporates insights from their work, but rejects the role of movement in
the system, deriving all forms directly by the RC and lexical specifications
of order and level.
Example (58) illustrates verb raising in Dutch.
(58) a. *dat
that
Jan
Jan
een
a
huis
house
kopen
buy
wil
wants
(‘‘DS’’) NP < V < V
‘that Jan wants to buy a house’
b. dat Jan een huis wil kopen (VR) NP < [V > V]
c. *dat Jan wil een huis kopen (VPR) V > [NP < V]
(H&VR 1986, 419)
(58a) is the (ungrammatical) deep structure in Haegeman and Van
Riemsdijk’s (H&VR) model; (58b) is the verb-raising (VR) structure.
(58c) is the verb projection raising (VPR) structure, which is ungram-
matical in Dutch. In the VR construction, an embedded verb is raised out
of its complement and adjoined to the matrix verb, to the right; in the
VPR construction, the same operation is performed on an embedded VP.
While VPR is ungrammatical in Dutch, it is found in some other Ger-
manic dialects, such as West Flemish (59) and Swiss German (60).
(59) a. da Jan een hus kopen wilt NP < V < V
b. da Jan een hus wilt kopen (VR) NP < [V > V]
c. da Jan wilt een hus kopen (VPR) V > [NP < V]
(H&VR 1986, 419)
(60) a. das de Hans es huus chaufe wil NP < V < V
b. das de Hans es huus wil chaufe (VR) NP < [V > V]
c. das de Hans wil es huus chaufe (VPR) V > [NP < V]
(H&VR 1986, 419)
I will analyze the VR and VPR constructions as instantiations of CAT.
This means that CAT sets the outer bounds on the form that these
constructions can take. It also means that all variation will be found in
the level and order subcategorizations of predicates or classes of predi-
cates. In the right margin of the constructions listed above are the CAT
representations.
If we were to interpret CAT as RLS+ (or, more appropriately, LLS+:
just like RLS+, but using the LLS instead of the RLS as the base), then
we would take (59a) as the LLS and [[NP < V1]VP < V0] as the base
structure, and we would apply Flip and Reassociate to derive [NP <
[V0 > V1]], which is the West Flemish VR structure. Some mechanism
would be needed to guarantee that Reassociate and Flip applied obliga-
torily in this case.
I will instead model V(P)R directly as CAT, in accordance with the
LVH. Under this interpretation Flip will correspond to order : right,
absence of Flip to order : left, and optional Flip to unspecified order. Left-
Reassociate will correspond to level :X0, which gives VR; lack of Left-
Reassociate will correspond to level :XP; and optional Left-Reassociate
will correspond to unspecified level. It seems to me that the entire range
of constructions discussed by H&VR can be described in these terms.
Dutch, for example, obligatorily Flips embedded verbs, but never VPs;
in CAT terms this means that the verbs in question have the subcatego-
rization shown in (61).
(61) V0
Modal verbs are exceptional in that they undergo Flip optionally.
(62) a. dat ik hem zien wilM
        that I him see want
     b. dat ik hem wilM zien
     (H&VR 1986, 426)
In CAT terms this means that the order parameter is unset for these
verbs; or, equivalently, they have the additional subcategorization in (63).
(63) V0
There is an unexpected exception to (63): only basic Vs can have this
subcategorization, not V0s that are themselves complex verbs.
(64) a. *dat ik hem kunnen zien wilM
        that I him can see want
     b. dat ik hem wilM kunnen zien
        ‘that I want to be able to see him’
     (H&VR 1986, 426)
This restriction is intuitively a level constraint: complex [V V] structures
are ‘‘bigger’’ than simple Vs. If we use the term stem in such a way that it
includes simple Vs, but excludes V-V compound verbs, then we could add
the level restriction to (63) to get (65).
(65) V0stem
In all of these cases the derived verb cluster has the same subcatego-
rization as the complement verb in the cluster, as determined by the RC.
As a result, hem is the direct object of the complex cluster in (64b), for
example, so the CAT structure of that clause is as follows:
(66) dat ik [hem < [wil > [kunnen > zien]V]V]VP
German has obligatory Flip for auxiliary verbs (H&VR 1986, 427) but
optional Flip for modals; these are straightforwardly treated as order
constraints on the model of Dutch.
West Flemish obligatorily Flips either the V or the whole VP around a
modal or auxiliary, as in (59). The order and level restrictions that ac-
count for this are as follows:
(67) M,A: V
The notation V is to be understood as ‘V0 or VP’; that is, no level con-
straint is applied, and so the term covers both VR and VPR.
I now turn to the complexities that arise when a series of VPs is
involved in VPR in Swiss German. I will show that the lack of a level
constraint in (67) accounts precisely for a complex array of possible out-
comes. The possible orders of a series of four verbs in which the lowest
takes a direct object are listed in (68).
(68) a. das er [[[en arie singe] chone] wele] hat
        that he an aria sing can want has
        ‘that he has wanted to be able to sing an aria’
     b. NP < V4 < V3 < V2 < V1
     c. V1 NP V2 V3 V4
     d. V1 V2 NP V3 V4
     e. V1 V2 V3 NP V4
     f. *V1 V2 V3 V4 NP
     (H&VR 1986, 428)
The verbs must all appear in Flipped order; the direct object can appear
anywhere in the series except after the most deeply embedded comple-
ment. This patterning follows immediately from the stipulation in (67),
coupled with the further stipulation that no verb that takes a direct object
can take it on the right.
(69) a. M,A, V
b. V, NP
The absence of a level constraint in (69a) corresponds in RLS+ to op-
tional Reassociate; Flip is obligatory, so the verbs always appear in
exactly reverse order (the reverse of (68b)).
(70) a. Reassociate at will.
b. Flip all V < V nodes. (for complex as well as simple Vs)
c. Flip no NP < V nodes. (for complex as well as simple Vs)
d. V2, NP
e. V1, V
f. V1 þ V2 ! [V1 > V2]V, NPThe stipulation in (70b,c) that Flip is forced (or fails) for both complex
and simple Vs taking direct objects follows from the RC, hence does not
count as a separate stipulation. If a complex verb is formed by combining
a modal or auxiliary with a transitive verb, the subcategorization of the
transitive verb will be inherited, including any order restriction, as the RC
dictates—(70f ) is the result of combining (70d) and (70e) with the RC. So
the extra stipulation in (70c) is not part of the theory; rather, it is added
for clarification.
(70a–c) can generate all of the patterns in (68). (68d), for example, is
derived by applying Reassociate followed by obligatory Flip.
(71)
It is important to remember that Flip and Reassociate are not essential to
the analysis; rather, they are just a way to think about CAT. The entire
analysis is (69) by itself.
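As a check on the stipulations, the procedure in (70) can be simulated directly. In the sketch below (my own encoding, not H&VR's formalism), a node is a (complement, head) pair, Reassociate applies freely, and the word order is read off with obligatory Flip at V < V nodes and no Flip at NP < V nodes:

```python
# A sketch simulating (70): start from the base [[[NP < V4] < V3] < V2] < V1,
# apply Left-Reassociate at will, then read off the frontier with obligatory
# Flip at V < V nodes and no Flip at NP < V nodes. A node is a
# (complement, head) pair; all names are illustrative.

from collections import deque

def reassociate_once(t):
    """Yield every tree reachable by one Reassociate: [[A < B] < C] -> [A < [B < C]]."""
    if not isinstance(t, tuple):
        return
    comp, head = t
    if isinstance(comp, tuple):
        a, b = comp
        yield (a, (b, head))
    for c2 in reassociate_once(comp):
        yield (c2, head)
    for h2 in reassociate_once(head):
        yield (comp, h2)

def frontier(t):
    """Word order under (70b,c): Flip every V < V node, never an NP < V node."""
    if not isinstance(t, tuple):
        return [t]
    comp, head = t
    if comp == "NP":                        # NP < V: no Flip
        return ["NP"] + frontier(head)
    return frontier(head) + frontier(comp)  # V < V: obligatory Flip

def derivable_orders(base):
    seen, queue = {base}, deque([base])
    while queue:
        t = queue.popleft()
        for t2 in reassociate_once(t):
            if t2 not in seen:
                seen.add(t2)
                queue.append(t2)
    return {tuple(frontier(t)) for t in seen}

base = (((("NP", "V4"), "V3"), "V2"), "V1")
orders = derivable_orders(base)
print(("V1", "V2", "V3", "NP", "V4") in orders)   # (68e): True
print(("V1", "V2", "NP", "V3", "V4") in orders)   # (68d): True
print(("V1", "NP", "V2", "V3", "V4") in orders)   # (68c): True
print(("V1", "V2", "V3", "V4", "NP") in orders)   # (68f): False
```

In particular, the ungrammatical (68f) is underivable because NP never flips past its sister, which always contains V4.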
A further consequence is that when the embedded verb has two argu-
ments, they may individually appear anywhere among the set of rean-
alyzed verbs, so long as they do not exchange places; the verbs will be
ordered among themselves exactly as in the one-argument case (68).
(72) das er em Karajan1 en arie2 vorsinge3 chone2 wil1
     that he (to) Karajan an aria sing-for can wants
     ‘that he wants to be able to sing an aria for Karajan’
     (H&VR 1986, 434)
(73) a. NP1 NP2 V1 V2 V3
     b. NP1 V1 NP2 V2 V3
     c. V1 NP1 V2 NP2 V3
     d. V1 NP1 NP2 V2 V3
     e. V1 NP1 V2 NP2 V3
     f. V1 V2 NP1 NP2 V3
     g. *. . . NP2 . . . NP1 . . .
     h. *. . . V3 . . . NP . . .
In order to treat these cases as CAT, we must have some means of
representing verbs that take two arguments. We will adopt the ‘‘small
clause’’ analysis.
(74) [[NP < NP] < V]
Given this, we can derive all of the patterns in (73) from the stipulations
in (70). In terms of Flip and Reassociate, we can derive all of the patterns
in (73) from (73a). For example, we can apply Reassociate to (73a) to
derive (75a), and then apply Flip to derive (73f ); or we can apply Reas-
sociate to (73a) to derive (75b) and then apply Flip to derive (73d); or we
can simply not apply Reassociate, but then apply Flip to derive (73b).
(75) a. → [[[NP1 < [NP2 < V3]] < V2] < V1]   Flip ↓
        [V1 > [V2 > [NP1 < [NP2 < V3]]]]     (73f)
     b. → NP1 < [[NP2 < [V3 < V2]] < V1]     Flip ↓
        [V1 > [NP1 < [NP2 < [V2 > V3]]]]     (73d)
     c. [NP1 < [V1 > [NP2 < [V2 > V3]]]]     (73b)
As in the previous example, Flip and Reassociate play no role in the
analysis, which is completely determined by (69).
CAT’s success in modeling V(P)R is considerable, and the evidence
for the LVH is compelling as well. With very simple lexical stipulations
about subcategorization of individual lexical items or classes of lexical
items—mechanisms that surely no theory could forgo—we have suc-
ceeded in modeling V(P)R as described by H&VR, but without move-
ment and without the novel mechanism of dual analysis that they
believed necessary to describe the phenomena.
If CAT is the appropriate model whenever lexical subcategorizations
are played out in syntax, then it should come as no surprise that V(P)R
shows CAT-like behavior. Other constructions where CAT should be ap-
plicable are noun incorporation, causatives, derivational morphology,
and preposition stranding.
But not wh movement. CAT is not Categorial Grammar as espoused
by (among others) Bach (1976), Moortgat (1988), and Steedman (1996) in
that it lacks type-lifting, the feature that makes it possible to embed
descriptions of the broadest long-distance dependencies.
8.6 The Hungarian Verbal System
Hungarian has a verbal system very much like that of Germanic. It can
be similarly modeled by CAT, but with one striking shortcoming. Tradi-
tionally, the positioning of the Hungarian verbal modifier (VM, to be
explained below) has been modeled along with the rest of the verbal sys-
tem. CAT cannot do this. CAT gives a simple and satisfying model of the
verbal system minus the VM, capturing many of its very particular (but
robust) properties. But when the CAT definitions needed to model the
positioning of the VM are added to it, it overgenerates to the point that
the model is useless, no longer predicting any of the interesting features.
CAT is so restrictive that its failure to model a system is by itself in-
formative, and so no cause for lament. But in this case the message is
sharper: it suggests that, despite tradition, the positioning of the VM is
independent of the verbal system. In the end I will offer reasons to think
this is so.
8.6.1 The Verbal System without VMs
I will quickly sketch the verbal system first without the VM, and then
with the VM, noting the main generalizations. These generalizations
represent a hard-won understanding of the system developed over a de-
cade or so by Kenesei (1994), Szabolcsi (1996), Koopman and Szabolcsi
(2000), and Brody (1997), among many others.
Hungarian has a small series of optional ‘‘modal’’ verbs that occur in a
clause in fixed interpretive order, just the sort of system CAT likes.
(76) Nem fogok kezdeni akarni be menni.
     not will.1sg begin.inf want.inf in go.inf
     ‘I will not begin to want to go in.’
     (Koopman and Szabolcsi 2000, 16)
Ignoring the VM (be), each element in (76) has scope over all elements to
its right. Furthermore, any reordering of adjacent elements results in
ungrammaticality. From this, we can conclude that the following order
holds:
(77) nem > fogok > kezdeni > akarni > main-verb
In its rigidity, and its rightward orientation, this system resembles for ex-
ample the English auxiliary system, and in fact, Koopman and Szabolcsi
(2000) refer to the order in (76) as the English order. I will adopt this term
from them and use it to refer to the ‘‘head-first’’ order. It is of course the
RLS.
In addition to the order displayed in (76), Hungarian has a different—
in fact, opposite—way to deploy the series in (77).
(78) a. Nem [fogok > kezdeni > [[be < menni] < akarni]].
b. Nem [fogok > [[[be < menni] < akarni] < kezdeni]].
(Koopman and Szabolcsi 2000, 210)
Importantly, the interpretive order of the elements in (78) is the same as
in (76); that is, akarni always has scope over menni, for example, despite
their being in opposite orders in (76) and (78). In other words, (78) rep-
resents di¤erent ways to realize the abstract structure in (77). The carets
in (78) indicate the understood orders. The order of elements in (78b) I
will call the compound order, as the head-complement order is that found
in compound terms. Brody calls it the roll-up order, for good reason, as
we shall see. The tensed verb and its complement are always in the
English order.
As the forms in (78) show, any given sentence with multiple auxiliaries
will show a mixture of the English and compound orders. But there are
strong constraints on the mixture.
1. The tensed verb cannot occur in a compound order.
(79) a. fogok > be < menni < akarni < kezdeni
b. *be < menni < akarni < kezdeni < fogok
2. Any compound structure must be at the bottom of the string of
auxiliaries.
(80) a. nem > fogok > kezdeni > akarni > be < menni
b. nem > fogok > akarni > be < menni < kezdeni
c. *nem > fogok > [akarni < kezdeni] > be < menni
3. The English order cannot occur inside a compound order.
(81) a. fogok > be < menni < akarni < kezdeni
b. *fogok > be < [akarni > menni] < kezdeni
These three findings can be summed up in the following recipe for creat-
ing alternative orders for a given string of auxiliary verbs completely in
the English order: beginning at the bottom, the bottom two terms can be
compounded, or ‘‘rolled up’’; and this rule can be applied repeatedly, but
not at the very top, where the tensed verb must be in the English order.
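This recipe is mechanical enough to state as a short program. The following sketch is my own encoding, not the book's formalism: the function name and the list representation of a cluster are assumptions, and the cluster is given in the English (head-first) order with the tensed verb first.

```python
def licit_orders(english_order):
    """Generate the licit surface orders of a Hungarian verb cluster.

    english_order: the cluster in the head-first (English) order,
    tensed verb first, main verb last. Starting at the bottom, the
    lowest two terms may be rolled up (compounded, inverting their
    order); the rule may apply repeatedly, but never to the tensed verb.
    """
    tensed, rest = english_order[0], english_order[1:]
    orders = []
    for k in range(len(rest)):              # k = number of roll-up steps
        head_first = rest[:len(rest) - 1 - k]
        rolled_up = list(reversed(rest[len(rest) - 1 - k:]))
        orders.append([tensed] + head_first + rolled_up)
    return orders

# (77) without nem: fogok > kezdeni > akarni > main verb (menni)
for order in licit_orders(["fogok", "kezdeni", "akarni", "menni"]):
    print(" ".join(order))
```

The three lines printed correspond, ignoring the VM, to the pure English order and to the partial and full roll-ups of (78).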
This system is easily modeled in CAT. Since each auxiliary, apart from
the tensed auxiliary, can appear on either side of its complement, each is
ambiguous with respect to order; that is, each has both of the following
subcategorizations:
(82) F, F
This by itself is not enough, because, with the RC, it will generate all of
the ungrammatical orders in (79)–(81). (80c), for example, would count
as grammatical, with exactly the parse indicated. To prevent this, we
must also impose level constraints. There is some question what the rele-
vant levels are; I will assume they are word and phrase (as the term com-
pound in compound order suggests). Assuming further that the compound
order is essentially lexical, and the English order is essentially phrasal, we
have the following subcategorization:
(83) Aux: Fⁿ, F⁰
That is, each auxiliary takes a phrase of type F to the right, or a word of
type F to the left.
Furthermore, because the tensed auxiliary does not participate in the
compound structures, it has only the first of the two subcategorizations
in (83), the phrasal one.
Inflectional Morphology 231
(84) AuxT: Fⁿ
I assume this is a further stipulation, as there is in general no ban on
tensed verbs entering compound structures (e.g., English baby-sat).
Then, given the RC, along with the assumption that words can head
words, and words can head phrases, but phrases cannot occur in words,
we predict some of the contours of the Hungarian system. The fact that
the English order cannot occur in the middle of a compound follows from
the fact that a phrase (the bracketed FP in (85)) cannot occur in a com-
pound (marked here with { }).
(85) *fogok > {[akarni > [be < menni]]FP < kezdeni}
The fact that a compound cannot occur in the middle of a sequence of
auxiliaries does not follow from the specifications in (83). (86) is a parse
of such a case consistent with (83).
(86) *nem > fogok > [akarni < kezdeni]Aux; VP > [be < menni]VP
In (86) akarni and kezdeni form a compound verb, where akarni has its
VP-taking, rather than V-taking, subcategorization; that subcategoriza-
tion is inherited by the compound, according to the RC. Although some
speakers accept forms very much like this, I will assume that they are
ungrammatical, and I will introduce the further specifications necessary
to rule them out.
The problem would be solved if akarni were prevented from using its
VP-taking subcategorization when it was in a compound. This can be
achieved by reconstruing the ambiguity of the auxiliary verbs in a slightly
different way. Specifically, the principal ambiguity will be between root-
and word-level forms for each of the auxiliaries, as in (87).
(87) akarni: root, Froot; word, Fⁿ
That is, akarni is still ambiguous, but between the two levels root and
word; roots enter into the compounding system, and words into phrasal
syntax. Now (86) cannot be produced; only the root akarni can appear on
the left of a compound, and only a further root subcategorization can be
inherited by the compound.
To allow compound structures to appear in syntax, we must allow
roots to be reconstrued as words; once this is done, they can be used in
syntax, but they cannot enter the compounding system again. But this is
the classical relation between words and phrases.
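The root/word regime can be made concrete. The sketch below is my own encoding of (87) and (84), not the book's formalism: entries are (form, level, need) triples, compounding is restricted to roots, the nonhead's unsatisfied subcategorization percolates (standing in for the RC), and promotion to word level is one-way. As a simplification, the main verb menni enters as a bare root with no remaining need (its particle is ignored here).

```python
ROOT, WORD, PHRASE = "root", "word", "phrase"

# Hypothetical entry builders: a root wants a root on its left (the
# compound order); a word wants a phrase on its right (the English order).
def root_entry(form):
    return (form, ROOT, ("left", ROOT))

def word_entry(form):
    return (form, WORD, ("right", PHRASE))

def compound(nonhead, head):
    """[nonhead < head]: only roots compound; the head's leftward-root
    slot is filled, and the NONHEAD's own unsatisfied need percolates."""
    assert nonhead[1] == ROOT and head[1] == ROOT
    assert head[2] == ("left", ROOT)
    return ("[%s < %s]" % (nonhead[0], head[0]), ROOT, nonhead[2])

def promote(node):
    """Reconstrue a root as a word; it can no longer compound."""
    assert node[1] == ROOT
    return (node[0], WORD, node[2])

def combine(head, comp):
    """English order head > comp: a word-level head takes its complement
    on the right."""
    assert head[1] == WORD and head[2] == ("right", PHRASE)
    return ("[%s > %s]" % (head[0], comp[0]), PHRASE, None)

# (78b)-style roll-up, then promotion into phrasal syntax:
menni = ("menni", ROOT, None)           # simplification: no particle
c = compound(compound(menni, root_entry("akarni")), root_entry("kezdeni"))
top = combine(word_entry("fogok"), promote(c))

# The starred (86): [akarni < kezdeni] inherits akarni's root-level need
# for a root on its LEFT, so it can never take a VP on its right.
bad = compound(root_entry("akarni"), root_entry("kezdeni"))
```

Here `bad` comes out still needing a root to its left, something phrasal syntax cannot supply, so the structure in (86) is underivable.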
While the ‘‘coding’’ in (87) may appear suspicious, it is really harmless,
when one considers that if CAT is the model, the only way languages can
differ is with respect to level, order, and type restrictions, and these
restrictions are enforced in a rigid local fashion by X-bar inheritance and
the RC. I suspect that the ambiguity in (87) occurs in English as well,
with particle-verb constructions; that is, the relation between (88a) and
(88b) is really a level-order ambiguity between root- and word-level iden-
tification of the particle itself.
(88) a. John [looked up]V the answer.
b. John looked the answer up.
c. *John looked right up the answer.
d. John looked the answer right up.
e. the looking up of the answer
f. *the looking of the answer up
The lexical version of the particle excludes modification (88c), whereas
the syntactic version allows it (88d). The lexical version nominalizes (88e);
the lexical particle is ‘‘inside’’ the nominalization and therefore immune
to the laws governing the form of NPs. The syntactic version does not
nominalize (88f ); the syntactic particle is ‘‘outside’’ the nominalization,
where it is excluded from NP on general grounds. I imagine this line of
analysis could be applied to German separable prefixes as well.
Finally, to account for the absence of tensed verbs inside compound
structures, we require that T be represented only by a word-level element.
In the reformulation this remains a separate stipulation.
These stipulations exactly account for the Hungarian compounding
paradigm, if the VM is excluded.
Koopman and Szabolcsi (2000) seek a theory of clusters that involves
only phrasal syntax and XP movement. They thus seek to avoid any
reference to the lexical/phrasal distinction on which the analysis just
given rests. Their theory thereby also distinguishes itself from any of the
theories in which the roll-up structure results from X0 movement, and
VM fronting from XP movement.
But on close inspection the relevant distinction can be found in Koop-
man and Szabolcsi’s account, just relabeled as ‘‘smallness’’ instead of
‘‘lexicality.’’ Smallness, never defined, has less intuitive content than lex-
icality, though it would seem to be extensionally equivalent to it, judging
from the examples that Koopman and Szabolcsi give. But ‘‘smallness’’
leads to grave problems that ‘‘lexicality’’ does not have.
What allows Koopman and Szabolcsi to contemplate the elimination
of X0 movement is that massive remnant movement makes it possible
to simulate lexical movement by phrasal movement, as in the following
derivations:
(89) a. [XP YP ZP H]HP → [XP [YP [ZP [tXP tYP tZP H]HP]]] →
        [[tXP tYP tZP H]HP [XP [YP [ZP tHP]]]]
     b. [XP YP ZP H]HP → [H [XP YP ZP tH]HP]
The pair of movements in (89a) result in the same surface configuration
as the movement in (89b). The movements in (89a) are first evacuation of
everything in HP except its head, followed by movement of the remnant
HP. The movement in (89b) is head movement.
Koopman and Szabolcsi simulate the head clustering for verbs in the
compound structure with the following condition:
(90) When the specifier of VP+ is a small VM or an inverted sequence,
     VP+ optionally extracts from CP. Otherwise, VP+ cannot extract
     from CP.
For reasons of space, I will not explain here how this principle interacts
with the theoretical environment that Koopman and Szabolcsi provide to
yield the constructions I have identified as lexical, or at least as involving
nonphrasal heads; but see Williams, in preparation, for a full discussion.
It is enough to see that lexicality is entering the system under the guise of
smallness. I think this is a step backward from the general understanding
of these constructions, in that it replaces a word with a relatively concrete
meaning (lexical) with one distinctly less concrete (small).
8.6.2 The Verbal System with VMs
I think that the fact that the RC with X-bar inheritance allows the be-
havior of the Hungarian verbal system, so complex at first glance, to be
boiled down to (87) (with help from (84)) is an impressive result. The
analysis is challenged, however, by the behavior of the VMs, which can-
not be fit into the system without losing all predictions.
The VM is a particle, or sometimes a short phrase, that is closely asso-
ciated with the main verb, sometimes forming an idiomatic expression
with it. The VM occurs either before or after the tensed verb, depending
on features of the sentence in which it occurs. If there is a preverbal neg-
ative or Focus phrase, the VM occurs after the verb; if not, and if some
other conditions are met, it occurs before the verb.
(91) a. Nem fogok be menni.
        not will.1sg in go.inf
        ‘I will not go in.’
     b. Be fogok menni.
     c. *Nem be fogok menni.
     d. *Be nem fogok menni.
Be is a complement of menni; but in (91b) it occurs to the left of the
tensed auxiliary verb. And in fact, an unbounded number of auxiliary
verbs can appear between the particle to the left of the tensed verb and
the verb of which it is a complement.
(92) Be fogok kezdeni akarni menni.
     in will.1sg begin.inf want.inf go.inf
The question is, what regulates the relation between these two positions?
The ‘‘trigger’’ for the appearance of be in initial position has been
argued to be phonological (e.g., Szendroi 2001): the auxiliary verb needs
‘‘support,’’ if not from a negative or a Focus, then from a particle. I will
assume that the trigger is an extrinsic constraint that CAT is not obliged
to model. Even so, CAT fails.
So far I have posited leftward root subcategorization for the compound
order and rightward phrasal subcategorization for the English order. To
generate (92), the CAT specifications must admit a third possibility—
namely, that a sequence of words can realize the English order, as only
words can transmit, via the RC, the lower verb’s need for the particle to
the top of the verb chain.
(93) a. Aux: Fword
b. menni: be
c. be < [fogok > kezdeni > akarni > menni]
If each auxiliary has a specification like the one in (93a), and the verbs
taking VMs have specifications like the one for menni in (93b), then (92)
will have a parse like (93c).
There is in fact some circumstantial evidence in favor of treating VMs
in this way. The verbs that enter into compounding relations with one
another are approximately the same verbs that permit VM raising: utalni
‘hate’, for example, does neither. But the lists are not identical (K. Szen-
droi, personal communication), so this consideration is hard to evaluate.
But there are two problems with analyzing VMs in this way.
First, (93) predicts that particle movement should be compatible with
compounding, but it is not.
(94) *Be < [fogok > kezdeni > [menni < akarni]].
Particle raising is compatible only with the pure English order, so any
compounding interferes. From the point of view of CAT this is very odd,
as other phrasal complements are compatible with compounding, which
shows that compounding is transparent to a main verb’s subcategoriza-
tion. For example:
(95) Nem > fogom > akarni > [szet szedni < kezdeni] a radiot.
     not will.1sg want.inf apart take.inf begin.inf the radio
This example shows that compounding of the main verb (represented by
the bracketed sequence) does not prevent the main verb’s direct object
subcategorization (szetszedni: NP) from becoming the subcategorization
of higher constituents. If for direct objects, then why not for particles?
Second, particles seem to be able to raise out of embedded CP com-
plements under certain circumstances. For example:
(96) Szet kell, hogy szedjem a radiot.
     apart must that take.subjunctive.1sg the radio
     ‘I must take apart the radio.’
     (Koopman and Szabolcsi 2000, 211)
Although such cases are quite restricted, the fact that they exist at all
suggests that CAT is not the right mechanism to account for them.
These two properties of VM positioning—opacity of the compound
structures and nonlocality—both point to movement in the classical
sense, rather than CAT inheritance. Compounds are always opaque to
syntactic movement, but CPs are not.
If indeed the VM is positioned by movement and not by the same sort
of system that creates the verbal clusters, a sharp theory is needed to ex-
plain how a child would not be led astray by all the evidence that has
misled linguists into analyzing the two phenomena as one system. CAT is
just such a theory, because simple considerations unequivocally rule it
out as a model of the VM, even though it is an obvious model of the
verbal clusters.
Another reason to implicate movement in the positioning of the VM is
noted repeatedly by Koopman and Szabolcsi (2000): the VM can often be
a full phrase. This again is characteristic of movement, especially move-
ment that bridges CPs.
(97) [a szoba-ba]PP menni
     the room-into go.inf
     ‘go into the room’
And, importantly, the VM cannot be phrasal when incorporated into a
compound.
(98) *[[a szoba-ban]PP maradni] akarni
      the room-in stay.inf want.inf
      ‘want to stay in the room’
This example falls within the scope of the theory outlined in section 8.6.1:
compounding involves X0s exclusively. (96) and (97) fall outside that
theory.
I think that CAT’s initial di‰culty in modeling the Hungarian verbal
complex turns out to be its virtue: CAT has the grace to fail obviously
and thereby to show where nature is jointed. Perhaps, as the last few
points independently suggest, the Hungarian VM does not compose a
homogeneous class of elements with the verbal particles after all.
In light of our conclusions about Hungarian, we can return to the
problem raised in chapter 7 about verb clusters in Czech and related lan-
guages; (99) repeats the facts from that discussion.
(99) a. Dal jsem mu peníze.
        give.prt aux.1sg him.dat money.acc
        ‘I gave him money.’
     b. Tehdy bych byl koupil knihy.
        then aux.1sg was.prt bought.prt books.acc
        ‘Then I would have bought books.’
     c. Byl bych tbyl koupil knihy.
     d. *Koupil bych byl tkoupil knihy.
        (Konapasky 2002, 246)
When there is a single participle, it can move to the left of the auxiliary.
When there are two participles, the first can move to the left of the auxil-
iary, but the second cannot. With the Hungarian system as a model, we
formulate the following restrictions:
(100) a. aux: PartP
b. Part0
c. part: XP
That is, auxiliary verbs can take a following participial phrase or a pre-
ceding participial stem; participles, on the other hand, always take an XP
complement. When both an auxiliary and a participle are present, two
structures are possible.
(101) a. aux > [part > X]PartP
      b. [[part < aux] > X]
(101b) corresponds to the possibility of (99a). When there are two par-
ticiples, the following structures are possible:
(102) a. aux > [part1 > [part2 . . . ]Part2P]Part1P
      b. [part1 < aux] > [part2 . . . ]Part2P
      c. *[part2 < [aux > part1]PartP]
      d. *[part2 < [part1 < aux]PartP]
The first participle can form a complex word with the auxiliary, and the
result will have the subcategorization of the nonhead part1 and so takes
Part2P on the right. But there is no way for the second participle to ap-
pear on the left as in (102c), because the unit [aux > part1] will itself be
phrasal and therefore cannot take a stem complement to the left. Simi-
larly, (102d) cannot be formed because [part1 < aux], while a stem-level
object, inherits its subcategorization from its nonhead (part1) and so can
only take an XP to the right, not a participial stem to the left. In Czech,
then, auxiliary verbs are just like the Hungarian cluster-forming auxiliary
verbs, and participles are like Hungarian nonauxiliary verbs.
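On this analysis the Czech pattern can be enumerated directly. A minimal sketch (the function and list encoding are my own, assuming the subcategorizations in (100)): only the highest participle may invert with the auxiliary, so a cluster has exactly two surface variants.

```python
def czech_variants(aux, participles):
    """Surface orders licensed by (100)-(102): the full head-first
    order, or inversion of the HIGHEST participle with the auxiliary
    ([part1 < aux]). Lower participles can never invert, since
    [part1 < aux] inherits part1's rightward XP subcategorization."""
    head_first = [aux] + participles
    inverted = [participles[0], aux] + participles[1:]
    return [head_first, inverted]

# (99b-d): bych > byl > koupil
print(czech_variants("bych", ["byl", "koupil"]))
```

The two variants returned are (99b) and (99c); the excluded (99d), with the lower participle fronted, is simply not generated.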
These languages even have an analogue of Hungarian VM positioning.
There is a general rule of XP topicalization (Rivero 1991, Konapasky
2002) that can fill the initial position, illustrated here in Serbo-Croatian.
(103) [Citao knjigu]VP je Ivan tVP.
      read.prt book aux Ivan
      ‘Ivan had read the book.’
      (Konapasky 2002, 244)
If such an example were taken to show that aux had, in addition to (100a)
and (100b), a subcategorization like the following, then, as in Hungarian,
all sorts of unrealized possibilities would arise:
(104) aux: XP
Rather, as Konapasky (2002) shows, such phrases occupy the initial po-
sition by virtue of an entirely different process of XP topicalization.
Chapter 9
Semantics in Representation Theory
Two features of RT lead to revisions in the standard assumptions about
how semantics is determined by syntactic form. One stems from the no-
tion of derivation in RT. In the syntactic analysis of a sentence, there is
no single structure that represents all of the information relevant to se-
mantics; semantics then must be done over the whole set of forms that
constitute the derivation and the matching relations that hold among
them. The other stems from the fact that the shape-conserving matching
that holds between levels does not always correspond to isomorphism, as
we have seen in several cases, beginning with the bracketing paradoxes of
chapter 1. To the extent that one end of such matches is semantic (or,
more semantic than the other), they give rise to instances in which the
system deviates from a strictly compositional system, the sort of system
that is standardly assumed. In sections 9.1 and 9.2 I will briefly outline
the issues involved in these two deviations, but without arriving at any
firm conclusions, apart from what I have just mentioned; the discussion
is provisional and speculative throughout, even by the standards of the
previous chapters. The role of blocking in determining meaning will re-
ceive special attention, since, as pointed out frequently in this book,
blocking is part and parcel of Shape Conservation: the most similar
blocks all the less similar, all else being equal.
In sections 9.2–9.5 I will explore, in the most preliminary possible way,
how RT fares in analyzing certain problems connected with the form-
meaning relation. In some cases I think an obvious advantage can be
demonstrated; in other cases I can show no more than that a coherent
account is possible. In section 9.2 I will illustrate the role of the blocking
aspect of Shape Conservation in understanding the contribution of Case
marking and the like. In section 9.3 I use the RT levels to index different
sorts of focus. In section 9.4 I address some problems in ellipsis, and in
section 9.5 I sketch how RT levels can be understood to index different
kinds of NP interpretations.
9.1 Compositionality
9.1.1 Matching and Compositionality
I take the interesting hypothesis about compositionality to be that it is
strict—every phrase’s meaning is some strict function of the meaning of
its parts; otherwise, the hypothesis does not say much. In what follows I
will be talking about representations of meaning that have structure: rep-
resentations that indicate the scopes of quantifiers, or that identify the
thematic roles of NPs, or whatever else there is—some of the levels of
RT. So I will be discussing translation of syntactic structures into some
other kind of language, not real semantics, which relates sentences to the
world. The question about compositionality then is one of compositional
translation: is the translation of every phrase X strictly a function of the
translation of its parts?
In the compositional scheme we start with a syntactic tree in language
A, and step by step, from the bottom up, we build some translation of
that tree. We do this for every sentence in language A, thus deriving a
second language B, consisting of all those translations. So B is whatever
A translates to.
But there is another way to think of translation. We can think of the
languages A and B as both antecedently defined, and of the translation as
a ‘‘matching’’ relation between them, one that matches to every sentence
in the first language a corresponding item in the second language.
Of course, compositional translation can be viewed as one particular
kind of matching relation. In fact, if we require the matching relation to
be absolutely ‘‘shape conserving’’—that is, if it matches up structures in
language A with structures in language B, observing conditions on the
identification of terminal elements across the two languages and respect-
ing isomorphism of structure—then the matching kind of translation
might be indistinguishable from compositional translation. In composi-
tional translation, the bottom-up piece-by-piece building of the second
tree based on what is found in the first tree will result in a tree that is
isomorphic to the first tree, and so matching translation and composi-
tional translation will always come out the same.
But there is a circumstance in which these two notions of translation
could diverge. For matching translation, we can think of the two lan-
guages that are being matched up as definable independent of one an-
other, according to laws of form that might differ. In that case there
might not be an isomorphic match in the second tree for every phrase in
the first. This need not necessarily prevent the matching translation from
being a complete translation. For example, if there is no isomorphic
structure, the matching relation might pick the ‘‘nearest’’ structure as the
translation—still shape conserving, but not strict. The translation will be
fully defined, but it will diverge from a compositional translation for such
cases.
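The contrast between the two notions of translation can be put in toy computational terms. The sketch below is purely illustrative and entirely my own: `compositional` builds the target bottom-up from the parts, while `matching` assumes the target language is antecedently defined and picks the nearest structure under some similarity measure (here, a crude leaf-overlap count, itself an assumption).

```python
def compositional(tree, lex):
    """Strictly compositional translation: leaves via the lexicon,
    phrases as the tuple of their parts' translations."""
    if isinstance(tree, str):
        return lex[tree]
    return tuple(compositional(part, lex) for part in tree)

def matching(source, target_language, similarity):
    """Matching (holistic) translation: both languages are antecedently
    defined; the translation is the best-matching target structure."""
    return max(target_language, key=lambda t: similarity(source, t))

def overlap(a, b):
    """A crude similarity measure: number of shared leaves."""
    def leaves(t):
        return {t} if isinstance(t, str) else set().union(*map(leaves, t))
    return len(leaves(a) & leaves(b))

lex = {"boy": "BOY", "sees": "SEE"}
compositional(("boy", "sees"), lex)          # → ('BOY', 'SEE')
matching(("BOY", "SEE"),
         [("GIRL", "SEE"), ("BOY", "SEE")],
         overlap)                            # → ('BOY', 'SEE')
```

When the target language happens to contain the isomorphic structure, the two coincide; when it does not, `matching` still returns a nearest neighbor, which is exactly the divergent case described above.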
In the course of exposing RT in this book I have already presented
cases like this, which support the idea that the translation is matching,
not compositional. The cases of mismatch discussed in chapters 1 and 2
all have this character: in-situ quantifiers get wide scope by ‘‘mismatch-
ing’’ the structures at a later level, for example. But if this is possible
in general, then the question becomes, what makes the translation look
largely compositional? The answer has to be a combination of the fact
that the matching relation that happens to be in use in language is shape
conserving in the sense just mentioned, and the fact that the structures
defined in the two sublanguages are largely similar. It is straightforward
that if the two languages are fully isomorphic, then the result is indistin-
guishable from compositional translation. But if the structures defined in
the two languages are only slightly divergent, then the discrepancies be-
tween the two results might be infrequent and localized.
However, there is an interesting difference between compositional
translation and matching translation that goes beyond these discrep-
ancies. In compositional translation, the translation of any given sentence
proceeds on the basis of that sentence by itself. But in the matching
theory, at least for the discrepant cases, the conclusion that b in language
B is the ‘‘best match’’ for a in language A cannot be determined by
looking just at a and b; instead, it must involve seeing what other struc-
tures are defined in languages A and B, insofar as there cannot be any-
thing that is a better match to a than b is. In this sense the matching
translation is ‘‘holistic’’: it matches the whole of language A to the whole
of language B in a way that cannot be broken down into the matching of
individual elements in A to individual elements in B.
In linguistics, syntactic transformation has traditionally been the means
of accounting for divergences of this kind, preserving compositionality.
For example: how can we compositionally determine the thematic struc-
ture of the verb see when its direct object is moved many clauses away?
Undo the transformation first; the transformation is responsible for the
distorted picture of thematic structure in surface structure. But I have
argued in specific cases (quantifier scope, heavy NP shift, scrambling,
etc.) that movement is not the correct account; rather, it is interlevel
holistic matching of structures.
9.1.2 Compositionality and the Place of Semantics in RT
We can think of RT as involving two different representational situations.
In one, a structure represents another piece of syntax, and in the other, it
represents a piece of semantics. To take one example, SS representing CS
is a case of syntax representing syntax, and SS representing QS (= TopS)
is a case of syntax representing a semantic structure. To take another ex-
ample, in chapter 2 I analyzed a particular kind of linguistic variation as
arising from the way different languages resolve a conflict between a case
of ‘‘structural’’ representation and a case of ‘‘semantic’’ representation.
The formulas for English and German scrambling are these:
(1) English favors SS→CS over SS→QS (= TopS).
    German favors SS→QS (= TopS) over SS→CS.
And in both languages the possibility of SS→FS can neutralize the dif-
ference. From this mechanism I derived the fact that English requires
elements following the verb to maintain a strict order that only focusing
effects can disrupt, whereas in German scrambling is obligatory, except in
the face of some focusing effects.
The model implicit in the above discussion is not the linear representa-
tion model, but a model in which there are three levels that SS must rep-
resent, namely, CS, QS (= TopS), and FS.

(2) CS ← SS → FS
         ↓
    QS (= TopS)

In such a model we can talk about the competing representational
requirements that these three peripheral structures place on SS.
In what follows I want to bring the model back in line with the linear
representation model, yet allow for the representational competition that
the results in chapter 2 depend on. But at the same time I want to model
certain other phenomena involving focus, which will make the model
in (2) unworkable. Furthermore, I want to develop a sense of the gross
architecture of the entire model, instead of simply adding a new level
represented by SS every time a new descriptive problem presents itself.
The project begins with the previously mentioned, possibly indefensi-
ble, categorization of the levels into ‘‘semantic’’ and ‘‘syntactic.’’ In some
linguists’ view, all representations (i.e., ‘‘structures’’) are syntactic. But
some are intuitively more semantic than others: the representation that
unambiguously displays the scope of quantifiers is more semantic than
the representation that displays structural Case relations. But there is
another way to describe the difference between two kinds of representa-
tions: one kind lies directly on the path to spell-out, and the other kind
does not. So, in the model that was the basis for the early part of the
book, CS and SS were indubitably on the way to spell-out, and QS cer-
tainly was not, given the existence of in-situ ambiguous quantifiers in the
output of English pronounced sentences.
In this light, FS itself is a fudge. FS consists of at least the two dif-
ferent elements, ‘‘display of primary sentential accent’’ and ‘‘display of
most salient new information.’’ These two different notions are clearly
related—but how?
We can begin to form a new model by identifying certain representa-
tions as ‘‘semantic’’: TS, QS, FS. These will not be on the main line to
spell-out. The other representations will be: CS, PS, SS, and AS (Accent
Structure, which displays the accent structure of the utterance). The main
line from CS to AS will be a linear series of representation relations, as
follows:
(3) CS ← PS ← SS ← AS
We must also add the interpretive representations. Clearly, different syn-
tactic levels are relevant for different aspects of interpretation; for exam-
ple, AS is relevant for focus, but CS may not be. A simple scheme would
be to associate each of the interpretive levels with one of the syntactic
levels.
(4) TS   ?S   QS (= TopS)   FS
    ↑    ↑    ↑             ↑
    CS ← PS ← SS ← AS
In general, representational conflicts at a given level will arise between
the interpretive level and the structural representational demands on that
level. Whether there are further conflicts will be taken up in the next
section.
This model permits the chapter 2 analysis of English and German,
though now the analysis is cast in slightly different terms. English favors
CS ← SS over QS (= TopS) ← SS, and German favors the reverse.
Moreover, the effects of focus can be factored in by taking AS→FS
fidelity into account, in that it can tip the balance back to parity in the
otherwise lopsided representational conflict. (Review chapter 2 for the
empirical basis of the English/German difference, and see section 9.3 for
further analysis of the AS, SS, FS system.)
The model in (4) suggests that in general, since each syntactic level
represents both another syntactic level and an interpretive level, repre-
sentational conflicts will arise between these two. Whether there are fur-
ther sources of conflicts will be taken up in the next section.
Although it is compatible with the findings of this book, and in fact
makes them natural, the model in (4) raises questions about the linguistic
representation of meaning. Each interpretive level is separate from the
others, and there is no connection, no representation relation, between
them. Each of them is an aspect of LF, in the usual sense, exactly in the
sense that in RT each of the syntactic levels is an aspect of the syntax of
a clause. But how are these different aspects related to one another? The
theta structure of a clause will display the theta relations of the verb in
relation to the verb, and the quantification structure will display the
quantificational structure, but what is the relation between the two? One
wants to know which argument of the verb is quantified in which way.
To take a concrete example, consider a focused definite NP agent of a
verb. Its agentivity is represented in TS, its definiteness in CS, and its
focused property in FS, but how are all these facts related to each other?
The obvious answer is representation. Although I have spoken of repre-
sentation as relating whole structures to whole structures, in doing so it
relates parts of structures to parts of structures. For example, an internal
argument of V in TS will be mapped to a Case-marked accusative in CS,
and so forth, all the way to a focused constituent in AS/FS. We can thus
speak of an NP that is Case-marked, theta-marked, and focused only by
taking into account all of these levels and how they are related to one
another by shape-conserving representation.
We can even define a relation between the theta role an object receives
and its scope, even though these will not be in any direct chain of repre-
sentation, because there will be an induced representation relation that
holds between them, by virtue of the representations that the model does
express directly.
(5) TS ⇠ ?S ⇠ QS (= TopS) ⇠ FS
    ↑    ↑    ↑             ↑
    CS ← PS ← SS ← AS
The representations symbolized by the long arrows are induced by the
representations expressed by the short arrows, in the fashion described be-
fore. Some NP in AS represents a focused NP in FS, and that NP repre-
sents an NP in SS, which represents . . . some NP in TS, and so there is an
indirect relation between FS and TS, and also between particular NPs in
FS and NPs (or whatever arguments are) in TS.
For example, consider the following TS/QS pair, with the obvious
head-to-head matches:
(6) TS: [boy]agent [V [girl]patient]
QS: [some boy]QP [V [every girl]QP]
The natural isomorphism will match the agent in TS to the preverbal QP
in QS; this will result in the further match between boy and boy. Some
will not be matched, as it makes its ‘‘first’’ appearance in SS and QS. Boy
occurs in both TS and QS; in TS it is agent, and in QS it is (head of ) a
quantified NP. The full interpretation of [some boy] in QS and later levels
will be some function of the interpretation of [boy] as agent of V. And
so on.
If the matching between levels were always isomorphic, then the
induced isomorphism could be established directly, abridging the repre-
sentation circuit. But owing to the existence of misrepresentation, the
induced representation must make essential use of the chain of represen-
tation relations to establish the relation between TS and QS.
But nothing is changed when mismatching occurs. Recall that English
favors SS as a representation of CS over QS, and so surface structures
with two quantification structures are ambiguous, in one instance mis-
mapping the two Case structures by crossing.
(7)
Here, as before, representation provides the relation between the quanti-
fied NPs and their images in TS.
This aspect of semantics in RT is nothing other than the ‘‘higher equals
later’’ correspondent of compositionality in standard Checking Theory
practice. That is, representation replaces domination for functional
Semantics in RT 245
embedding. For example, in RT an accusative represents a patient; in
standard practice it functionally dominates it.
9.2 Blocking in Semantics
The blocking principle in general prevents multiple representation; that is,
the following situation is not allowed:
(8) *x → ys1; x → ys2
(8) corresponds to the notion that ‘‘nature hates a synonymy’’—there
cannot be a difference in form without some difference in meaning. If x is
a concept and ys1 and ys2 are words, then (8) is the notion of synonymy
that holds in the lexicon, and especially in inflectional morphology, where
it is understood that variant forms like sneaked/snuck cannot coexist in
the same grammar. The blocking principle, thus construed, has been
shown to be an operative constraint on language acquisition (Pinker
1984); it is what drives out *goed. It can also, and perhaps thereby, be
construed as a constraint on the form of a grammar.
As ordinarily understood, the blocking principle does more than forbid
synonymy; it says which of two forms is chosen to represent the given
meaning—namely, the one more specifically tailored for that meaning.
For example, were is the general form of the past tense of be, and was is
the form specific to the 1st singular past; although both was and were are
compatible with ‘‘1st singular past,’’ blocking dictates that only was can
express that notion, being most specifically fitted to it.
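The was/were pattern can be rendered as specificity-based selection. The following is a minimal sketch, assuming feature sets for forms; the encoding and names are mine, not a formalization from the text.

```python
def best_form(target_features, forms):
    """Pick the compatible form whose feature set fits most specifically.

    A form is compatible if all of its features hold of the target;
    among compatible forms, the one with the most features wins,
    blocking the more general form.
    """
    compatible = [(name, feats) for name, feats in forms.items()
                  if feats <= target_features]
    return max(compatible, key=lambda nf: len(nf[1]))[0]

past_be = {
    "were": {"past"},                     # general past of "be"
    "was":  {"past", "1st", "singular"},  # specific to 1st singular past
}
print(best_form({"past", "1st", "singular"}, past_be))  # -> was
print(best_form({"past", "2nd", "plural"}, past_be))    # -> were
```

The same selection schema carries over to Shape Conservation if "number of matching features" is replaced by a measure of congruence between candidate and target.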
Although I do not think that the blocking principle is well understood,
I nevertheless regard the principle of Shape Conservation to be a case of
blocking in the sense just described. If ys1 and ys2 are both candidates to
represent x in (8), and if ys1 is more congruent to x than ys2 is, then ys1 is
‘‘more specific to’’ x than ys2 is, and must be chosen to resolve the syn-
onymy. In the simplest case (‘‘all else being equal’’) that should settle the
matter. But in fact, since different representational levels are connected to
different aspects of meaning, it is inevitable that blocking will not give a
determinate answer to the question of which of two forms is to be used to
represent, for example, a given theta structure.
For this reason the role of the blocking principle in the present con-
text is not straightforward. Such a principle is clearly required, but it is
not clear what phenomena fall under it. For example, we have analyzed
HNPS as a case of ‘‘misrepresentation’’ between CS and SS, which exists
alongside the ‘‘true’’ representation; so, restricting ourselves to CS and
SS, (8) seems to be instantiated. But, as we saw in the discussion of
HNPS, this ‘‘misrepresentation’’ is accompanied by differences in
interpretation at FS. So the blocking principle is upheld in the end, but in a
wider context, one that includes FS. Constructions that look synonymous
(the shifted and unshifted variants of HNPS cases) turn out to have
different meanings at FS.
But this raises the question, what differences can count as differences
that license a representational synonymy? For, in the case of HNPS, the
TS→CS representation does display representational synonymy; it is
only in a later representation that the focus-related difference in meaning
arises. So it is natural to ask, is there any limit on the ‘‘delay’’ that can
occur between a representational synonymy and the difference in meaning
that rescues it?
To put the question in concrete terms: Scrambling interacts with defi-
niteness in German, and other semantic classifications, in ways analyzed
in chapter 2. There, the Synonymy Principle was seen to be satisfied in a
direct way, in that the structure of the example discussed always looked
like (9).
(9)            Case structure
              ↙              ↘
     surface structure1    surface structure2
              ↓              ↓
     interpretation1       interpretation2   (in QS)
Suppose surface structure1 and surface structure2 are the scrambled and
unscrambled representations of one and the same Case structure, as the
diagram illustrates. We know that in German this situation is correlated
with differences in scope/topicalization aspects of interpretation
represented in QS. So the CS→SS representation involves ‘‘synonymy,’’ but
the surface structures do not, as each surface structure in SS receives a
different interpretation in QS.
Now consider a different kind of case, one that in fact appears to model
known phenomena. Suppose that two different Case-marking systems
could represent one and the same theta structure, but with the same or a
related difference in meaning as in the case of German scrambling; in
other words, scope, or specificity, or something else, turns on the
difference. In such a case the sign of the difference in meaning would be
‘‘remote’’ from the representation of the difference in meaning itself: a
Case distinction would control a difference in meaning two
representations away, so to speak, as shown in diagram (10).
(10)      TS
         ↙    ↘
       CS1     CS2
        ↓       ↓
       SS1     SS2
        ↓       ↓
       QS1     QS2
The licensing of the TS→CS synonymy is ‘‘delayed’’ until QS.
On methodological grounds I suppose we should begin by disallowing
such cases, in that we would then have a much tighter idea about the
scope of the Synonymy Principle. With delayed licensing, we are saying
that any difference in meaning can license any difference in form. Without
delayed licensing, we can more narrowly specify how differences in form
and differences in meaning are related to one another: only differences in
form and differences in meaning that are in the same region of the model
can interact in this way. The actual predictions would of course depend
on the details of the model, but to take an extreme case, differences in
Case marking (at the early end of the model) could not correspond to
differences in information structure (at the late end).
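The contrast between delayed and local licensing can be stated as a locality condition over the ordered levels. The sketch below is my own toy encoding: the ordering of levels follows diagram (5), and the one-level window is just one way of cashing out "same region of the model."

```python
LEVELS = ["TS", "CS", "PS", "SS", "QS", "AS", "FS"]

def licenses(form_level, meaning_level, window=1):
    """May a meaning difference at one level license a form difference
    at another? Only if the two levels are close enough in the model."""
    i, j = LEVELS.index(form_level), LEVELS.index(meaning_level)
    return abs(i - j) <= window

# German scrambling: an SS form difference licensed at adjacent QS.
print(licenses("SS", "QS"))  # -> True
# The disallowed extreme: Case marking licensed by information structure.
print(licenses("CS", "FS"))  # -> False
```

Widening the window (or removing it) gives the delayed-licensing option; setting it to one gives the tighter methodological stance adopted provisionally in the text.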
To illustrate with a concrete case, consider Swahili object agreement
(OM indicates the object agreement affix).
(11) a. N-a-m-penda Juma.
I-tns-om-like Juma
‘I like Juma.’
b. *Napenda Juma.
c. N-a-ki-soma kitabu.
I-tns-om-read book
‘I read the book.’
d. Nasoma kitabu.
‘I read a book.’
When the object is animate, object agreement is obligatory; but when the
object is inanimate, it occurs only with definites. If object agreement is at
the same level as Case assignment, then the pattern in (11) shows that
Swahili agreement for indefinites has TS→CS synonymy, not resolved
until QS, if QS is where definites and indefinites are sorted out.
This conclusion that delayed synonymy resolution is possible can be
averted by structuring the model differently. For example, suppose that
QS represents the scope of quantifiers, as before, but that the definite/
indefinite distinction is established earlier—say, in CS. Then of course the
TS→CS synonymy is resolved on the spot, and the more narrow
conception of how blocking enforces itself is possible.
I do not find myself in any position to resolve the question of delayed
licensing of synonymy. It is a question that does not translate easily into
standard minimalist practice with Checking Theory, and so deserves fur-
ther study in empirically distinguishing these two styles of modeling how
semantics is determined by syntactic form.
9.3 Kinds of Focus
9.3.1 IFocus and LFocus
The RT model just outlined provides an index to another set of related
entities, the different kinds of focus. Several kinds of focus, or focusing
effects, have been cited in the literature: normal focus, contrastive focus,
and the focusing that occurs in special constructions like pseudocleft,
cleft, scrambling, and HNPS. I think this variety can be understood in
terms of mismappings between levels. If we look at the right-hand side of
the model as it now stands, we see several opportunities for mismatch.
(12) (QS→) SS → AS (≈FS)

The mismatch between SS and QS (= TopS) has already been discussed
in chapter 2, and nothing said here will change the conclusions drawn
there. I will try to show that the way SS, AS, and FS relate to one another
can account for the variety of focusing effects and can allow them, despite
their different properties, to be seen as part of a systematic whole.
I will begin by drawing attention to an only partly appreciated
dimension on which types of focus can be differentiated. The discussion that
follows depends on sorting them out clearly.
One kind of focus generates a propositional presupposition—that is, a
presupposition that some proposition is true. This sort of focus is found
in the cleft construction, for example.
(13) It was John who Bill saw.
(13) presupposes that Bill saw someone. I will call this kind of Focus a
Logical Focus (LFocus). I include in this type the answers to questions.
When the answer to a question is a whole sentence, the ‘‘real’’ answer
must be the focus of the sentence.
(14) A: What did you buy in New York?
B: I bought a RECORD in New York.
B′: *I bought a record in New YORK.
The question-answer focus is often cited as the core case of normal focus.
The other kind of focus is tied directly to the placement of main sen-
tence accent, but it does not involve anything propositional. For example:
(15) John wants a red hat and a BLUE hat.
The ‘‘presupposition’’ generated by focusing on BLUE is just the word
hat, and nothing bigger than that. One could try to extract a proposi-
tional presupposition from (15) (e.g., John wants an X-colored hat), but
that is an artifact of the particular example and is not possible in general.
(16) John compared the red hat to the BLUE hat.
There is no proposition out of which BLUE has been abstracted in (16).
Rather, BLUE is what I called a disanaphor in Williams 1997, and hat is
its paired anaphor; the requirement is that the disanaphor be different
from whatever stands in the same relation (‘‘—R→’’ in (17)) to the
antecedent of the anaphor that the disanaphor bears to the anaphor.
(17)  X —R→ antecedent of anaphor
      disanaphor —R→ anaphor
      (where X ≠ disanaphor, and antecedent of anaphor = anaphor)
This is the Disanaphora Principle proposed in Williams 1997, where it is
shown that the relation between hat and hat in (16) obeys general
principles of anaphora. The accent pattern, and the accompanying anaphoric
commitments, are essentially obligatory.
(18) *John compared the red hat to the blue HAT.
(The fact that (18) is not absolutely ungrammatical is a point to which I
will return.)
The most convincing examples showing that accent-induced Focus/
Presupposition structure has nothing to do with propositional
presupposition come from how telephone numbers are pronounced when they
include repeated digits (M. Liberman, personal communication).
(19) a. 258-3648
     b. *258-3648
     c. *258-3656
     d. 258-3656
Here again the pattern is obligatory, so long as the speaker groups the
digits in the usual way (3-2-2). Again, no propositional presupposition
is raised. The anaphora involved here takes ‘‘same digit’’ as the identity
condition in the domain in which that anaphora operates. As this kind of
focus pertains to what has been called the information structure of a sen-
tence, I will call it Information Focus (IFocus).
I will associate IFocus and LFocus with different levels in RT. As
LFocus for the pseudocleft construction involves wh movement, it cannot
occur any earlier than SS, and I will assume that it is defined in SS (or the
closely related QS). As IFocus involves the phonological accent pattern, it
is plausibly associated with AS, which itself determines FS (Information
Structure (IS)), resulting in the following diagram (the same as (12)):
(20) (QS→) SS → AS (≈FS)
         LFocus   IFocus
We now have two notions of focus, so it is important to know how they
are related to each other. The answer is representation. That is, in the
normal situation, IFocus represents LFocus. Notice that the representa-
tion is not direct, but rather induced by the circuit.
Given a sentence with a nontrivial LFocus in SS, how is it represented
by AS?
LFocus and IFocus are similar in an important way. Each breaks up a
sentence into two parts: the Focus and the rest. We might suppose, then,
that matching up the structures on this basis would be a part of the natural
isomorphism between the two levels SS and AS, with the consequence
that, in the normal case, the IFocus and the LFocus would be identified
with each other. That is indeed what we find in the ‘‘unmarked’’ pronun-
ciation of cleft sentences.
(21) a. It was JOHN that Bill saw.
b. *It was John that Bill SAW.
It is also what we find in normal focus, as defined by question-answer
pairs.
(22) A: What did you buy in New York?
B: I bought a RECORD in New York.
B′: *I bought a record in New YORK.
For the relation between IFocus and LFocus to be completely clear,
the full details of AS—and for that matter QS (= TopS)—must be
developed, and I will not do that here. I will make the smallest number of
assumptions possible. That is, AS generates a set of accent structures, and
in particular defines the notion ‘‘Accented Phrase’’ in a way that captures
its central property: for English, it appears that the Accented Phrase can
be any phrase that contains the main accent on a right branch. The fact
that in a right-branching structure a number of di¤erent phrases will
qualify is the phenomenon of Focus projection.
(23) I [want to [see [the man [in the [red HAT]]]]].
Any of the bracketed expressions in (23) can be the Accented Phrase in
AS, hence the IFocus in IS. The definition of Accented Phrase accounts
for Focus projection. The IFocus in FS will canonically map to the
Accented Phrase in AS.
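The definition of the Accented Phrase lends itself to a procedural statement. The sketch below is my own simplification: it treats "main accent on a right branch" as the accented word being the rightmost terminal of the phrase, and collects every subtree of a bracketed structure that qualifies.

```python
def terminals(tree):
    """Flatten a nested-list tree into its terminal words."""
    if isinstance(tree, str):
        return [tree]
    return [w for sub in tree for w in terminals(sub)]

def accented_phrases(tree, accent, found=None):
    """Collect phrases whose rightmost terminal is the accented word."""
    if found is None:
        found = []
    if not isinstance(tree, str):
        if terminals(tree)[-1] == accent:
            found.append(" ".join(terminals(tree)))
        for sub in tree:
            accented_phrases(sub, accent, found)
    return found

# (23) I [want to [see [the man [in the [red HAT]]]]].
sent = ["I", ["want", "to", ["see", [["the", "man"],
        ["in", ["the", ["red", "HAT"]]]]]]]
for phrase in accented_phrases(sent, "HAT"):
    print(phrase)
```

Every phrase collected, from "red HAT" up through the larger bracketings, is a possible Accented Phrase; this nesting is the Focus-projection effect. A phrase like "the man", whose rightmost word is unaccented, is excluded.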
SS likewise defines LFocus in some manner. At the worst, certain con-
structions, like clefts and sentential answers to questions, are specified as
determining an LFocus. In the natural isomorphism between the levels,
LFocus = Accented Phrase = IFocus. Correspondingly, the LPresupposition
of (21a) (Bill saw someone) and its IPresupposition (Bill saw t) are
matched as well.
What is odd about (21b), then, is that the IFocus is not identified with
the LFocus. An odd sentence, but not a truly ungrammatical one—it
simply has a very specialized use. We can use the machinery just devel-
oped to explicate that use.
When the natural isomorphism holds between SS and AS, the pairings
IFocus = LFocus and IPresupposition = LPresupposition result. The
semantics is straightforward: the meaning of the IFocus is some function of
the LFocus, and the meaning of the IPresupposition is some function of
the LPresupposition. But when the isomorphism is broken, as in (21b),
these identities do not hold. Instead, for (21b) the identities are these:
(24) It was John that Bill SAW.
SS: LFocus = John
    LPresup = Bill saw someone
PP: Accented Phrase = SAW
    Rest = it was John that Bill X
IS: IFocus = SAW
    IPresup = it was John that Bill X
The IPresupposition here includes both the LFocus and (part of ) the
LPresupposition. It therefore cannot be identified with the LPresupposi-
tion—or, for that matter, with any other constituent in SS. Its meaning
therefore cannot be (a function of ) the meaning of the LPresupposition,
or the meaning of any subconstituent in SS. Rather, it must take the
whole surface structure (with both LFocus and LPresupposition) as its
value, but with the IFocus abstracted out.
(25) [saw]IFocus [[John]LFocus [Bill Xed someone]LPresup]IPresup
(25) shows how AS represents SS, but without the natural isomorphism.
It is because of the nonisomorphism that (21b) has such a special-
ized use. Normally, the LFocus is not IPresupposed. In this example it
is; in fact, a particular LFocus:LPresupposition pair is IPresupposed.
Under what circumstances would this be appropriate? Only if that
LFocus:LPresupposition pair had occurred together in recent previous
discourse. But that could really only be the case if something like (26A)
preceded (21b).
(26) A: It was JOHN that Bill heard.
B: No, it was John that Bill SAW.
The narrow circumstance in which this sort of IPresupposition is pos-
sible is what gives examples like (21b) their metalinguistic or ‘‘corrective’’
flavor. In ordinary terminology, the focus on saw in (21b) would be called
contrastive focus and would be given a separate theoretical treatment, or
at least the promise of one. But in fact, many of the things that are true of
focus in general are true of contrastive focus as well, and there is there-
fore much to lose in not giving them a common account. For example,
the rules for determining Focus projection are the same for both con-
trastive focus and normal focus, as the following examples show.
(27) a. It was John that Bill SAW in the morning.
b. It was John that Bill saw in the MORNING.
c. A: What did you do to John?
B: I SAW him.
d. A: What happened?
B: Bill saw John in the MORNING.
In (27a) the contrastive focus is narrow, just as the normal focus is in
(27c); likewise, in (27b) the contrastive focus is potentially broad, just as
the normal focus is in (27d). Such parallels compel us to treat contrastive
and normal focus by the same mechanisms, which include the identifica-
tion of the IFocus and the relation of IFocus to the Accented Phrase.
In addition, when a language has left-accented Focuses, as Hungarian
does, the left accenting holds for both normal and contrastive focusing.
But of course a difference must be drawn somewhere. In the present
scheme it is drawn in the relation of SS to AS, and specifically in the re-
lation of the IFocus to the LFocus—when IFocus represents LFocus, we
get normal focusing; when it doesn’t, we get contrastive.
An important element in this explanation is that LFocus is subordi-
nate to IFocus. This is shown by the fact that LFocus can wind up in the
IPresupposition, but the reverse can never happen, because of how AS
and SS relate to one another. In other words, it is not enough to say of
(28B) that it has two Focuses. The following exchange will always be
impossible:
(28) A: JOHN saw Bill.
B: *No, it was Sam that JOHN saw.
Here, speaker B has attempted to correct speaker A, but has chosen the
wrong focus strategy to do it: he has preserved speaker A’s main Focus
as an Accented Phrase and has added his own correction as an LFocus
different from the Accented-Phrase-defined Focus. A theory that assigns
triggering features to Focuses does not thereby explain this particular
asymmetry, even if it assigns different features to the two. RT
distinguishes them by virtue of the asymmetric relation between levels and the
fact that they are located in different levels.
9.3.2 Copular Inversion and Focus
The apparatus developed here can unravel some of the intricacy of
copular constructions. Copular sentences with two NPs show a complex
interaction among IFocus, LFocus, and referentiality. Such sentences
usually have inverted and uninverted variants.
(29) a. John is the mayor.
b. The mayor is John.
From small clause constructions, we know that one of these orders is
more basic.
(30) a. I consider John the mayor.
b. *I consider the mayor John.
I will assume that the ‘‘narrower’’ term (John) is the subject of the sen-
tence in some sense of subject relevant to a level prior to SS or to SS itself,
the earliest level in which LFocus and IFocus are defined; I will then refer
to the order in (29a) as the subject order (see Williams 1997 for fuller
discussion, but in a different theoretical context). Both (29a) and (29b) are
grammatical with final accent; however, they diverge if the accent falls on
the initial NP.
(31) a. JOHN is the mayor.
b. *The MAYOR is John.
Like some previous examples, (31b) is not ungrammatical; rather, it is
restricted to ‘‘corrective’’ contexts. We may gain some understanding of
(31) if we assume that the order in (31a) is the subject order, but the order
in (31b) is not. Then the pattern in (31) is just the familiar pattern we
have seen for HNPS, and the logic of (31) is, ‘‘Invert to deliver a canoni-
cal (final) Focus, but not otherwise.’’
The two orders show a surprising further difference in relatives and
questions.
(32) a. I wonder who is the mayor?
b. I wonder who the mayor is?
c. I met the person who is the mayor.
d. *I met the person who the mayor is.
The intriguing contrast is (32b) versus (32d): since both involve wh
movement, it seems unlikely that the difference has to do with movement
per se. Also, both have noncanonical (nonfinal) IFocuses, so the answer
does not lie there either.
But two plausible suppositions will suffice to explain the difference in
the context of RT. First, suppose the inverted subject must be an LFocus;
and second, suppose that questions, but not relatives, have LFocus
‘‘pivots’’ (wh words). Then (32b) ‘‘compensates’’ for noncanonical sub-
ject order by establishing a canonical LFocus; but in (32d) there is
no corresponding compensation, so the noncanonical subject order is
unmitigated.
There is some evidence for both of the suppositions needed in this ex-
planation. First, questions do seem to raise a propositional presupposi-
tion of exactly the sort that would be given by identifying the pivot as the
LFocus. That is, (33a) seems to presuppose the truth of (33b).
(33) a. Who did you see?
b. You saw someone.
Second, there is some difference in the presuppositions for inverted and
uninverted copular sentences, which I think the following examples bring
out:
(34) a. Bill thought that John was the mayor, but in fact the town had
no mayor.
b. ?Bill thought that the mayor was John, but in fact the town had
no mayor.
That is, the inverted form seems to carry a presupposition, ‘‘the mayor is
somebody,’’ which the uninverted form does not carry. (For further dis-
cussion, see Williams 1998a.)
9.3.3 Spanish LFocus
Having made the distinction between IFocus and LFocus, let us return to
a problem alluded to in chapter 2. It has often been noted that ‘‘answers
to questions’’ in Spanish are obligatorily clause final.
(35) A: Who called?
B: *JUAN llamó por teléfono.
    JUAN called
    (Zubizarreta 1998, 76)
B′: Llamó por teléfono JUAN.
Even to discuss the problem, we must distinguish normal focus from
contrastive focus, because contrastive focus in Spanish is not subject to
the limitation just illustrated. However, distinguishing them risks losing
an account of all they have in common, as noted earlier in this chapter:
they have anaphoric commitments of the same kind, they both carry
nuclear stress internally, and so on. The IFocus/LFocus distinction allows
us to treat them separately without abandoning a common account of the
phenomena just described.
First, we will need to assume one of the conclusions reached earlier:
that questions, and their answers, involve LFocus of the answer, for rea-
sons already given—a question generates an LPresupposition, and the
response to the question carries forward the LPresupposition of the ques-
tion and substitutes the answer for the wh phrase as LFocus. Now we
may begin to approach the question of how Spanish focus works.
First, why must the answer to a question, which we have identified now
as the LFocus, be clause final in Spanish? We have already assumed that
in the canonical SS→AS representation, the LFocus is mapped to the
IFocus. Let us further assume that the SS Focus is rightmost. The right-
ward positioning of the LFocus in SS arises from the requirement that SS
match QS; in other words, we will assume that the rightness requirement
originates in QS and propagates to SS under Shape Conservation.
Rightness of LFocus will be enforced to the extent that SS≈QS is enforced. In
particular, if SS≈QS supersedes SS≈PS, then LFocuses will appear in
a rightward position, if possible.
Let us suppose that Spanish is such a language. Then we do expect the
behavior in (35): if the LFocus can be rightmost, then it must be right-
most. But other predictions are generated as well.
First, the LFocus will appear on the right only if the syntax allows
it. Since subjects can be postposed in Spanish, rightward positioning of
LFocused subjects is possible. But, as we saw in section 2.7, there are sit-
uations in which such postposing is impossible.
(36) A: Con
with
quien
who
llegaron
arrived
enferma?
sick
‘Whoi did he arrive with sicki?’
B: Llegaron con MARIA enferma.
B 0: *Llegaron enferma con MARIA.
Example (36) is significant in sorting out theoretical treatments of
focusing effects. In RT (36B) is grammatical precisely because (36B′) is
not. (36B′) is not grammatical because it is not an available structure in
the relevant level of representation (SS in the present context). Therefore,
(36B) is the closest match to the quantification structure, and so even
though it mismatches on the positioning of the Focus, it is the best match,
hence grammatical (though it is judged slightly worse than a ‘‘normal’’
answer in which postposing is possible). So, the best match wins, even
when the best match is a bad match. I must stress that I do not have an
account for why postposing is not allowed in these cases, only for why
nonfinal Focuses are acceptable when postposing is not allowed.
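The "best match wins, even when the best match is a bad match" logic can be rendered as a simple selection over the structures a level actually makes available. This is my own toy rendering; the congruence score (shared linear positions) merely stands in for the real notion of shape matching.

```python
def best_match(target_order, available_orders):
    """Choose the available order most congruent with the target order."""
    def congruence(order):
        # Count positions where the candidate agrees with the target.
        return sum(1 for a, b in zip(order, target_order) if a == b)
    return max(available_orders, key=congruence)

# Target (Focus-final, as QS demands) vs. what SS actually makes available.
target = ["llegaron", "enferma", "con MARIA"]
only_option = [["llegaron", "con MARIA", "enferma"]]
print(best_match(target, only_option))   # the bad match, but the best one

# When postposing is available, the fully congruent order wins instead.
both_options = only_option + [["llegaron", "enferma", "con MARIA"]]
print(best_match(target, both_options))
```

The contrast with a checking account is visible in the code: nothing here ever rejects a candidate outright for imperfect congruence; an imperfect candidate loses only when a better one is available.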
In a Checking Theory account of such structures, (36B) is mysterious.
If there is a focus feature that must be checked in Spanish, resulting in
obligatory postposing of the subject, why is that feature not left unsatis-
fied in (36B), making the sentence ungrammatical? That is, the gramma-
ticality of (36B) cannot be understood in terms of the ungrammaticality
of (36B′), because an unchecked feature is an unchecked feature.
In RT, English differs from Spanish in two ways. First, English does
not allow subjects to be postposed. I assume this is due to a difference in
the constitution of SS (or PS). Second, English does not favor FS≈SS
(or derivatively FS≈QS), in that LFocuses are tolerated in nonfinal
position, as we saw earlier. I assume these are independent differences
between the two languages. If so, then there is room for other language
types—specifically, for a language that strongly favors LFocus in right-
most position, but without subject postposing. Such a language would
treat subject LFocuses in the same way that English does, that is, in situ;
but in the VP, where reordering is possible, not putting the LFocus in
final position would be sharply worse than in English. French might be
such a language.
The second thing to understand about Spanish is why the rightward
positioning requirement is not imposed for contrastive focus. In short,
because contrastive focus does not involve an LFocus. The LPresupposi-
tion is a presupposition of truth, and, as we saw in section 9.3.2, it is not
relevant to the general case of contrastive focus.
(37) I prefer the red book to the [BLUE]IFocus book.
The same notion of IFocus is applicable to both contrastive and normal
focus, but the rightward positioning requirement for answers stems from
the syntax of LFocus in SS, not from IFocus, and so has no effect on
examples like (37) or like Zubizarreta’s (1998, 76) (see (57) in chapter 2).
(38) JUAN llamó por teléfono (no PEDRO).
     JUAN called            (not PEDRO)
Here the Focus is an IFocus and is not extraposed even though
extraposition is possible.
The focusing in (38) involves no truth presupposition, insofar as saying
JUAN called does not presuppose the truth of someone called. It
presupposes that x called has occurred in the discourse already; but that is
nothing more than to say that x called is an anaphor, not that it is true.
(39) Mary didn’t call; but JUAN called.
The anaphor called is licensed by Mary didn’t call, even though that
clause explicitly denies that Mary called and gives no indication that
anyone else did.
9.3.4 Hungarian Focus
As we saw in chapter 2, Hungarian focus structure is Focus initial, in
that the Focus precedes all nontopicalized clause elements, including the
subject.
(40) Hungarian focus structure
Topic . . . Topic F [V . . . ]
Hungarian differs in this way from the languages we have considered
so far—English, Spanish, Italian. If this is correct, then Hungarian differs
parametrically in how it structures one of the levels (FS), which tells us
that the levels themselves are not fully fixed universally. We will see that
languages can vary in two ways: not only in which representation rela-
tions they prefer over others, as in chapter 2, but also in how the levels
themselves are structured. RT will then di¤er from other theories in
having a nonuniform source of variation—Checking Theory reduces all
variation to strength of features, Antisymmetry reduces all variation to
remnant movement; Optimality Theory reduces all variation to reorder-
ing of constraints. For some this might be enough to put RT out of the
running, but surely that conclusion is premature.
Hungarian differs from English in another way: the Focus itself must
be initially accented. In (41) the Focus can be any of the underlined
constituents.
(41) János [a TEGNAPI cikkeket] olvasta.
     János the yesterday’s articles read
     ‘János read yesterday’s articles.’
     (Kenesei 1998, as reported in Szendrői 2001)
Hungarian and English thus differ on two parameters: Is neutral Focus
position on the left or the right? and Is the focused constituent left
accented or right accented? If it turns out that all languages are of either
the Hungarian or the English type, then I will be deeply embarrassed, as I
have constructed a theory in which there are four possible language types,
including as well, for example, languages where the left-accented Focus
occurs on the right periphery, and the reverse.
(42) a. [ . . . [Accent . . . ]]
b. [[ . . . Accent] . . . ]
I frankly cannot think of a natural scheme to tie these two parameters
together as one. In RT in particular it would be difficult to coordinate
them, as they govern different levels: the accent placement parameter
governs AS, and the left versus right placement of Focus itself is a feature
of FS. For these reasons I hope the two parameters do not turn out to be
linked empirically.
Furthermore, there is a little evidence, from English and symmetrically
from Hungarian, suggesting that they are independent. Both English and
Hungarian have nonperipheral Focuses, and those Focuses are accented
like their peripheral counterparts.
(43)
In Hungarian, noninitial Focuses are allowed only as second Focuses, as
single Focuses must move to initial Focus position.
These examples show that the internal placement of the accent is inde-
pendent of whether the Focus is peripheral or not, suggesting that the in-
ternal placement is independent of the external distribution and in turn
that languages with the parameters set as in (42) are to be expected.
In sum, then, it appears we might say that universally, (a) Focuses are
either left or right accented, as a part of the definition of the Accented
Phrase in AS; (b) the principal constituent of FS is located either left-
peripherally or right-peripherally in the structures defined there; and (c)
the AS is mapped to the FS under Shape Conservation.
9.4 Ellipsis in RT
In section 9.3 we had call to identify the complement of an IFocus as an
‘‘anaphoric’’ IPresupposition. In fact, IPresupposition is a poor term,
since, as shown there, there is no presupposition in the sense of a propo-
sition with a truth value. Finding anaphora operating in AS, in the form
of destressing, suggests revisiting the theme of chapter 4, where it was
shown that different reflexive anaphors occupy different RT levels, with
predictably different properties. Are there other kinds of anaphors that
can be ‘‘indexed’’ according to the RT levels?
A good candidate is the family of ellipsis rules. English and other
languages display several kinds of ellipsis, with puzzling differences in
behavior. I think some of these properties, particularly involving
differences in locality, can be explained by locating them at different RT levels.
English, as well as other languages, has an ellipsis rule that deletes
everything but a single remnant constituent.
(44) John wants to build Mary a tree house on Friday, and
     {Sam_nom, too / Sam_acc, too / a coffin, too / on Sunday, too}.
Although (45) is potentially ambiguous, given a particular focus, its in-
terpretation is fairly well fixed.
(45) Bob saw BILL, and Pete too.
= and Bob saw Pete
≠ and Pete saw Bill
This is exactly what we would expect if the construction in question were
interpreted in AS. The interpretive layer of AS (FS) partitions a sentence
into IFocus and IPresupposition, and it is the IPresupposition, and only
the IPresupposition, that is used as the antecedent. For this reason I will
refer to this kind of ellipsis as Focus ellipsis.
Fixing Focus ellipsis at AS—that is, very late—suggests that it will be
highly nonlocal. In particular, it suggests that the ellipsis site itself can
span CP boundaries, which does indeed seem possible (elided elements
are struck through).
(46) Someone thinks that Bill likes fruitcake, and
Someone thinks that Pete likes fruitcake too
Semantics in RT 261
VP ellipsis presents quite a different picture. VP ellipsis seems intrinsi-
cally bound up with the notion of subjecthood we have associated with
PS: the elided material is always interpreted as a predicate that takes the
remnant of ellipsis as its subject.
(47) Sue likes oats in the morning and John does too.
Since VP deletion is licensed (first) in PS, we would expect it to be im-
mune to the identification of the Focus, and this seems largely true.
(48) {John saw MARY / JOHN saw Mary} and then BILL did too.
The anaphora is compatible with any choice of Focus. Not only can
the main accent be located anywhere; in addition, wherever it is, Focus
projection is possible without affecting the interpretation of the ellipsis.
Again, this is what would be expected if VP ellipsis were adjudicated
in PS, before AS. Moreover, the availability of ‘‘strict’’ versus ‘‘sloppy’’
readings does not turn on focus structure, as the following examples
show:
(49) a. JOHN likes his mother, and so does BILL.
b. John likes his MOTHER, and so does Bill.
c. i. Bill likes Bill’s mother (sloppy)
ii. Bill likes John’s mother (strict)
Both (49a) and (49b) have both readings in (49c), despite having different
accent structures. (See Williams 1974 or Fiengo and May 1994 for
accounts of the strict and sloppy readings.)
What is invariant about VP deletion is the relation of the ellipsis to
what remains. The VP is a predicate on the subject that remains, and it
is on the basis of this that the strict/sloppy readings are sorted out—the
ambiguous pronoun bears an ambiguous relation to the subject.
Focus ellipsis bears the relation IPresupposition to the IFocus that
remains undeleted; therefore, in both cases the target of the ellipsis is ap-
propriate to the level at which it takes place. Focus ellipsis also shows
strict/sloppy identity ambiguities.
(50) a. Sam told JOHN to buy his mother a present, and PETE as well.
b. i. Sam told Pete to buy John’s mother a present
ii. Sam told Pete to buy Sam’s mother a present
Appropriately, the ambiguity lies in how the pronoun relates to the rem-
nant of the ellipsis, in this case, the Focus; as a result, all else being equal,
Focus ellipsis behaves in a way parallel to VP ellipsis. For both VP ellip-
sis and Focus ellipsis, we can imagine the sort of account put forward
in Williams 1974, wherein the deleted material bears an ‘‘abstraction’’
relation to the remnant material. In the case of VP ellipsis the abstrac-
tion is the abstraction inherent in the subject-predicate relation; in the
case of Focus we can easily imagine that the same kind of abstraction is
involved.
(51) a. John λx (x likes his mother)
b. John λx (Sam told x to buy his mother a present)
Then in both cases the ambiguity will lie in whether the pronoun takes as
its antecedent the lambda variable x (for the sloppy reading) or the argu-
ment of the lambda expression, John (for the strict reading).
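The mechanics of (51) can be made concrete with a small computational sketch (my own illustration, not part of the text): the elided predicate is modeled as the lambda abstract, and the pronoun inside it is resolved either to the bound variable (the sloppy reading) or to the fixed individual John (the strict reading).

```python
# A minimal sketch of the lambda-abstraction account of strict/sloppy
# identity. All names here are illustrative, not from the source text.

def predicate(pronoun_antecedent):
    """Return the abstract 'lambda x. x likes <antecedent>'s mother'."""
    def abstract(x):
        # Resolve the pronoun: to the lambda variable x (sloppy), or to
        # a fixed individual such as John (strict).
        antecedent = x if pronoun_antecedent == "bound" else pronoun_antecedent
        return f"{x} likes {antecedent}'s mother"
    return abstract

sloppy = predicate("bound")
strict = predicate("John")

# Applying the same abstract to the ellipsis remnant 'Bill':
print(sloppy("Bill"))  # Bill likes Bill's mother  (sloppy reading)
print(strict("Bill"))  # Bill likes John's mother  (strict reading)
```

The point of the sketch is that the ambiguity lives entirely in how the pronoun is resolved inside the abstract; the remnant-to-abstract relation itself is the same for both readings, just as in the text's account.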
The result so far is that the interpretation of the ellipsis, and in par-
ticular the behavior of the strict/sloppy ambiguity, turns on structures
needed independently: the articulation into subject and predicate in PS
for VP ellipsis, and the articulation into Focus and Presupposition for
Focus ellipsis.
However, this pretty picture is marred somewhat by the existence of
speakers who accept a wider class of sloppy readings for VP ellipsis. The
following sort of case is reported by Fiengo and May (1994):
(52) a. John’s father thinks that he will win, and Bill’s father does too.
b. i. Bill’s father thinks that John will win (strict)
ii. Bill’s father thinks that Bill will win (sloppy)
Fiengo and May develop a theory of sloppy identity that depends on a
general notion of ‘‘parallelism’’ that must hold in ellipsis sites; the sloppy
interpretation arises here because the relation between John’s and he in
the first clause of (52a) is structurally parallel to the relation between
Bill’s and Bill in (52bii).
The sloppy readings for examples like (52) are, I think, only marginally
available, and not at all for some speakers. But the mystery remains:
where do they come from? I think the focus structures of the examples
shed some light on the situation. Importantly, the success of sloppy am-
biguity that turns on antecedents other than subjects depends completely
on focus structure, as the following examples show:
(53) a. John’s father thinks he will win, and BILL’s father does too.
b. John’s father thinks he will win, and Bill’s MOTHER does too.
≠ Bill’s MOTHER thinks Bill will win
c. *John’s father thinks he will win, and BILL’s mother does too.
(53b) does not have a sloppy reading, the one indicated beneath it. This
is clearly the result of Bill’s not being the Focus of the second clause.
(53c) simply shows that given the context, BILL could not be the Focus,
because of the disanaphora conditions on focusing discussed earlier.
Two points will clarify the situation. First, for some speakers it appears
that sloppy identity for VP ellipsis is being licensed in exactly the manner
of Focus ellipsis: sloppy identity can turn only on the Focus. That is, the
ellipsis is being licensed by a structure that looks like this:
(54) BILL λx (x’s mother [thinks he will win])
This is a structure that arises in FS, not PS. So we might conclude that for
some speakers the sloppiness can arise in FS, not PS. This will also ex-
plain why (53b) does not have a sloppy reading; it does not qualify for
one in PS, because the sloppiness does not turn on the subject, and it does
not qualify for one in FS, because the sloppiness does not turn on the
Focus. So we may account for the phenomenon in (53) by supposing that
for some speakers VP ellipsis is licensed in FS, instead of (or actually, in
addition to) PS.
The most compelling reason that this picture must be essentially correct
is that even for speakers who allow Focus-anteceded sloppy identity for
VP ellipsis, focus plays no role when the licensing is subject-anteceded.
This can be verified in examples already given; for example, (53a,b),
which are repeated here, both have valid sloppy interpretations in which
the antecedent for the pronoun is Bill’s mother (the reading indicated in
(55c)).
(55) a. John’s father thinks he will win, and BILL’s father does too.
b. John’s father thinks he will win, and BILL’s MOTHER does
too.
c. Bill’s mother thinks that Bill’s mother will win.
What this means is that all speakers have access to the ‘‘core’’ case of
VP licensing—the one found in PS, where only subjects antecede elided
material, and where variations in focus structure play no role in the
availability of antecedents. So focus-based variation arises only when the
licensing takes place at FS.
Now let us apply this methodology to other ellipsis rules. English has
another ellipsis rule called gapping, a stylistically somewhat formal rule.
Gapping seems restricted to coordinated IPs; at least, that is what the
following paradigm suggests:
(56) a. I think that John saw Mary, and Mary John.
b. *I think that John saw Mary, and that Mary, John.
This restriction suggests that gapping is defined on the level at which IPs
are defined, but not CPs—in other words, on something like PS. If that is
so, then gapping should be bounded by CPs not only as shown in (56),
but also as shown in (57).
(57) a. John thinks that Sue bought a dog, and Pete, a cat.
b. John wants to buy a dog, and Pete, a cat.
c. John wants Sue to buy a dog and Pete, a cat.
(57a) is grammatical, but it cannot mean ‘. . . and Pete thinks that Sue
bought a cat’; that is, the ellipsis cannot bridge the tensed complement
structure, but must be contained entirely within it.
(58) a. *[John thinks that Sue bought a dog] and [Pete thinks that Sue
bought a cat].
b. John thinks that [Sue bought a dog] and [Pete bought a cat].
c. John wants Sue to buy a dog and Pete, wants Sue to buy a cat.
d. John wants Sue to buy a dog and Pete wants a cat to buy a
dog.
The restriction follows if gapping is restricted to PS, where CP structure
has not yet been introduced. Of special interest is (58c), as the embedded
clause has a subject, but is not tensed. (58c) is slightly more difficult to
parse in the manner indicated. In fact, a different reading interferes, the
one indicated in (58d) (see Hankamer 1973 for discussion). But most
speakers accept (57b), particularly if the pause is made especially prom-
inent. If these discriminations are correct, they strongly confirm the
framework that predicts them. To summarize the prediction: from a fact
about the context in which gapping takes place (56), we infer the dis-
criminations in (57) and (58), discriminations we have no right to expect
in the absence of RT.
In all of the discussions of locality so far, I have given cases in which
the ellipsis slices into the complement—that is, deletes part of it. But
then what about cases in which the ellipsis includes the whole of the
complement?
(59) John said [that he was leaving]_CP on Monday, and
Bill said [that he was leaving]_CP on Tuesday.
In (59) an entire CP has been gapped along with the verb. But how is that
possible, if gapping occurs at a level where CP has not yet been intro-
duced? The answer must be something like this. At the point at which the
gapped structure is assigned an antecedent, which I will continue to sup-
pose is IP, the full CP structure has not been introduced in the antecedent
VP, but the gapping rule nevertheless establishes the antecedent relation
between the two VPs. (The relation is indicated here by coindexation.)
(60) John [said that]_VPi on Monday and Bill [e]_VPi on Tuesday.
At a later stage—say, SS—the full tensed CP is filled into the comple-
ment position in the first clause.
(61) John [said [that he was leaving]_CP]_VPi on Monday, and Bill [e]_VPi on
Tuesday.
In the resulting structure [e]_VP will be understood as having the whole VP
as its antecedent, including the CP.
Under this arrangement the rule licensing the gapped material does not
have access to the CP structure; but it does not need to have that access.
Therefore, it will still be impossible to delete a proper subpart of a com-
plement CP.
The final ellipsis rule I will consider in connection with the RT levels
is sluicing. Sluicing is triggered by the presence of wh phrases, so it is in-
evitable that it is licensed in SS, the level in which wh is defined.
(62) John likes someone, but I don’t know who [John likes t].
Given that sluicing is licensed in a structure in which CP has been intro-
duced, we expect that it can slice into CPs, and this appears to be so.
(63) John thinks that Mary will lie to someone, but I don’t know
who John thinks [that Mary will lie to t].
The residual preposition guarantees that the embedded clause has been
sliced into, and not simply deleted as a whole, which (as we saw in the
case of gapping) is irrelevant to evaluating locality.
9.5 The Semantic Values of Elements in RT Levels
An NP in TS corresponds to a pure theta role; an NP in higher levels
corresponds more and more to what we think of as a full NP—refer-
ential, quantificational, and so on. An NP in CS is a Cased NP; pre-
sumably it is here, and possibly in later levels, that expletives enter. We
can then talk about the ‘‘history’’ of an NP as the series of objects at dif-
ferent levels that are put in correspondence under the isomorphic map-
ping that relates the levels to one another.
(64) TS: [dog . . . ]‘
CS: [dog_nom . . . ]‘
SS: [[every dog] . . . ]
The sequence dog, dog_nom, every dog is established by Shape Conserva-
tion.
Since presumably every NP has an image in every level, it might at first
seem difficult to distinguish the different levels. But in fact I think that
anaphors, as described in chapter 4, can give us some insight into the
differences between the levels. Recall that anaphoric bindings are a part
of what Shape Conservation carries forward from one level to the next,
so that a coindexation (or its equivalent) established in an early level will
persist in later levels.
(65) TS: [dog_i likes himself_i]‘
CS: [dog_nom,i likes himself_i]‘
SS: [[every dog]_i likes himself_i]
If the anaphor is assigned its antecedent in CS (for concreteness), then
that assignment is carried forward to SS by the Shape Conservation
mapping. Put in terms used in earlier chapters, the ‘‘antecedent’’ relation
commutes with the representation relation, in that, given an anaphor, the
image (under shape-conserving mapping) of the anaphor takes as its an-
tecedent the image of the antecedent of the anaphor.
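The commutation claim can be stated compactly: where f is the shape-conserving mapping between two levels and ant is the antecedent function at each level, f(ant_CS(x)) = ant_SS(f(x)). A toy check of this identity (my own illustration, with invented item names, not from the text):

```python
# CS-level items and their SS-level images under Shape Conservation.
# The labels below are hypothetical placeholders for expository purposes.
image = {"dog_nom": "every dog", "himself_CS": "himself_SS"}

# The antecedent relation established at CS, and the SS relation that
# Shape Conservation carries forward from it.
antecedent_CS = {"himself_CS": "dog_nom"}
antecedent_SS = {image[a]: image[b] for a, b in antecedent_CS.items()}

anaphor = "himself_CS"
# Commutation: image(antecedent(x)) == antecedent(image(x))
assert image[antecedent_CS[anaphor]] == antecedent_SS[image[anaphor]]
```

Both paths through the diagram land on the same object ("every dog"), which is what it means for the antecedent relation to commute with the representation relation.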
But an anaphoric relation established in an early level may ‘‘mean’’
something different from an anaphoric relation established in later levels;
at least, that is what I will tentatively suggest in what follows.
For example, an anaphoric binding in TS binds two theta roles together
—two coarguments, or, as suggested in chapter 4, perhaps a somewhat
broader notion. One cannot coherently say that the two theta roles
‘‘corefer’’ since reference, in the sense of that property which, for exam-
ple, definite NPs have, is not a concept at that level. Theta roles in TS are
the actors, patients, and so on, that are the arguments of predicates, and
coindexing two theta roles says that they are ‘‘the same’’—that is, ‘‘iden-
tified.’’ This will translate into coreference in a later level—specifically, in
whatever later level the relevant notion of reference is operative. This at
least tells us that split antecedents are impossible at this level, as splitting
an anaphor implies some kind of substructure, and theta roles themselves
are indivisible at TS, that is, atomic. Later coindexings might be liable to
split antecedents, as at least the full notion of reference will have to allow
the sorts of relations that have been referred to as coreference, overlap in
reference, subsumption of reference, disjointness of reference, and so on,
and therefore will clearly allow the sort of structure that would support
split antecedence. By this thinking, then, we arrive at the notion that early
anaphors will not allow split antecedence, but late anaphors will.
This will be more than a way to simply classify anaphors, as we now
know some things about the behavior of early and late anaphors: early
anaphors will display sharp locality restrictions, will have a limited set
of admissible antecedents (in the A/A sense), and will always be trans-
parently reconstructed for by movement and scrambling relations. If it
turns out that these things also correlate with the possibility of having
split antecedents, then that becomes a strong cross-brace in the empirical
underpinning of RT.
I have not carried out the broad empirical survey that would deliver
a sound decision on this speculation. It would be relevant to know, for
example, whether long-distance uses of Japanese zibun allow split ante-
cedents. But there is one suggestive indication that the correlations are
exactly as expected. It is well known that English clausemate, coargument
antecedents are not allowed to be split, and as I have already suggested,
these are CS or possibly TS anaphors, on the grounds of locality and
reconstructivity.
(66) *John_i told Mary_j about themselves_[i,j].
This fact is certainly consonant with my proposals; indeed, if it were false,
it would call into serious question the premise on which I am basing the
further predictions in this section. At the other end of the scale are ana-
phors of the kind discussed by Reinhart and Reuland (1993); as deter-
mined in chapter 4, these are defined at a late stage in the model, on
grounds of their lack of locality.
(67) John told Mary that at least Bill and himself would be there.
The question then is, can these anaphors be split? The following example
is relevant:
(68) John_i told Mary_j that at least Bill and themselves_[i,j] would be
invited to the party.
If the judgment discriminating (66) and (68) is reliable, these examples are
encouraging, because in the absence of RT, there is no particular reason
that locality and target type should correlate with the possibility of split
antecedents.
If these two types of anaphors differ in this way, then we would expect
them to differ in reconstructivity as well: anaphors that do not allow split
antecedents would reconstruct, and anaphors that do allow split ante-
cedents would not. Although the following examples are the right kinds
of examples to make the point, I think they are complex enough that firm
judgments are not available; consequently, although the marks in (69) do
correspond to my own judgments, perhaps they should be read as only
the ‘‘predicted’’ judgments.
(69) a. What John_i saw t was himself_i dancing in the street.
b. *What John_i told Mary_i that he saw on TV was Bill and
themselves_i dancing in the streets.
Of course, in order for (69b) to be relevant at all, it must be determined
that reconstruction is necessary in the first place; if the surface, unrecon-
structed configuration of the anaphor and its putative antecedents is
valid, then (69b) would be irrelevant to the question of reconstruction.
But I think the following example establishes that something like c-
command is necessary even for these sorts of reflexives:
(70) *Exactly when John told Mary to leave, I saw Bill and themselves
dancing in the streets on TV.
Controllable (null) subjects are like anaphors in dividing into two sorts,
one allowing splitting, and the other not; the former are traditionally
called obligatory control cases, and the latter, non–obligatory control
cases. Obligatory control cases take determinate local antecedents; non–
obligatory control cases take ‘‘arbitrary’’ and ‘‘inferred’’ antecedents.
As suggested in chapter 3, it is very likely that these two sorts of con-
trol correspond to different ‘‘sizes’’ of infinitives. Wurmbrand (1998) has
documented that this is the case in German. Applying the same reasoning
used earlier, the RT expectation is that the ‘‘smaller’’ the infinitive, the
earlier the control relation is established, and the less possibility there will
be for split antecedence. Again, some very clear cases suggest that this is
so. As discussed in chapter 3, no infinitive that clearly takes CP structure
shows the properties of obligatory control; likewise, such infinitives show
split antecedents.
(71) a. Non–obligatory control
John_i told Mary_j [how [PRO]_[i,j] to save themselves]_CP.
b. Obligatory control
*John_i promised Mary_j [[PRO]_[i,j] to save themselves]_CP.
(I have included a reflexive in both cases to guarantee the relevant con-
struals for the examples. The reflexive itself cannot be the locus of the
splitting or nonsplitting of antecedents, as it occupies (the whole of) an
argument position and cannot be split; but such a reflexive can take as its
unsplit antecedent another NP that itself has split antecedents.)
Not all non–obligatory control cases show overt CP structure, but at
least the ones that do behave exactly as expected, uniformly allowing split
antecedents. Conversely, obligatory control structures do not allow split
antecedents.
The anaphoric systems in other languages should reveal the same
pattern: long-distance anaphors should allow split antecedents, and
anaphors with high locality should have unsplit antecedents. In Japanese, for example, we
might expect zibun and zibunzisin to differ in exactly this way. However,
in checking the literature I have not found examples that unambiguously
demonstrate this, independently of the splitting involved in infinitival
control.
If I am putting RT to correct use here, in trying to rationalize the ‘‘split
antecedents’’ divide among anaphoric elements, then in fact that divide
must be the tip of the iceberg, as every pair of RT levels has the potential
to give rise to other, but related, kinds of distinctions. This will require
sorting out the RT levels more precisely than I have been able to do here.
An additional distinction is perhaps isolated in the following pair:
(72) a. John wants to win.
b. John wants himself to win.
First, I think these do not differ at all regarding the possibility of split
antecedents; in both cases the antecedent of the embedded subject is sim-
ply John. But another distinction has often been noted: namely, that (72a)
has the de se reading and (72b) does not. Partee (1971) caught one aspect
of this distinction in the contrast between the following pair, which differ
sharply in their meanings:
(73) a. Only John wants to win.
b. Only John wants himself to win.
In standard theory this might be attributed to a difference between PRO
and himself. RT at least offers the opportunity to interpret the differences
in another way. Significantly, the structure that gives rise to the de se
reading is ‘‘smaller,’’ and therefore earlier, than the one that does not.
In a related vein, RT levels can also be used to distinguish various
kinds of quantifier scope assignment. The first clue is to understand how
quantifier scope relates to various opportunities for reconstruction.
We know, for example, that wh movement reconstructs for quantifier
interpretation in some instances, and not in others, and in fact that NP
movement itself reconstructs for certain quantifiers. Wh reconstruction
for scope takes place in examples like this:
(74) How many people does John think Bill saw t?
This example is actually ambiguous between de dicto and de re inter-
pretations, which can be schematized as follows:
(75) a. John thinks [x many people [Bill saw t]] What is x?
b. [x many people] [John thinks [Bill saw t]] What is x?
(75a) represents the de dicto interpretation, which plausibly involves
a quantifier having scope in the lower clause; (75b) represents the wide
scope de re interpretation.
(75a) certainly suggests that, in RT, wh movement can occur later than
the construal of quantifiers like that many, by the theory’s general meth-
odology. In the working model I have adopted for this book, that is not
strictly speaking possible, but of course we might take SS to be an ab-
breviation of some number of levels in which this can be sorted out. Does
(75b) suggest that quantifier construal occurs after wh movement as well?
Quite possibly, I would guess, though not necessarily, as an embedded
quantifier could have wide scope without the benefit of wh movement.
When we turn to NP movement, we again find evidence for recon-
struction—what have been called quantifier lowering cases with raising
verbs.
(76) Someone seems to have been here.
a. for someone x, x seems to have been here
b. seems [for someone x, x to have been here]
In RT there will be no lowering; instead, there will be ordering. The con-
strual of the quantifier someone precedes NP movement; since NP move-
ment is associated with the level PS, quantifier construal must precede
that level. The conclusion that presents itself from the data examined thus
far is that quantifiers can be construed in any level; but in fact, quantifiers
differ regarding where they are construed.
(77) Not many boys are believed [t to have left].
a. not many boys [believed [t to have left]]
b. believed [not many boys [t to have left]]
Most speakers reject the narrow scope reading (77b). So, NP movement
seems to reconstruct for construal of someone, but not for construal of
not many. In RT this simply means that the levels (or range of levels)
at which these two quantifiers are construed are di¤erent: one before, one
after PS. In this regard RT mimics the findings of Beghelli and Stowell
(1997) under the ‘‘later equals higher’’ equivalence discussed in chapter 2.
That the existential is construed early is consistent with the fact that the
implicit quantification of suppressed arguments is interpreted as existen-
tial, and with extremely narrow scope:
(78) a. They weren’t attacked.
b. They weren’t attacked by someone.
c. not [∃x [x attacked them]]
d. ∃x [not [x attacked them]]
(78a) can only have meaning (78c), whereas (78b) can have meanings
(78c) and (78d). Perhaps the existential binding of implicit arguments
is accomplished at TS, thus explaining its generally narrow scope—
anything else will come later.
Splitting up quantifier construals between levels raises some technical
questions, to which I can at this point only stipulate arbitrary answers,
but I suppose I should do at least that if only to show that the project is
not incoherent. The general idea is this. As in earlier sections of this
chapter, we have seen that the interpretation of structures ‘‘accumulates’’
across levels. Just as with anaphoric bindings, then, scope assignments
that are established at earlier levels are preserved in later structure under
Shape Conservation.
Many questions remain unanswered. For example, why are some
quantifiers excluded from early construal, and presumably, some ex-
cluded from late construal? I have no specific ideas about this, though I
would of course note that it is a problem for the standard model as well.
It is particularly troublesome for Beghelli and Stowell’s (1997) model,
where quantifiers are assigned scope by moving them to preestablished,
dedicated positions in functional structure. The question is, why are those
positions located where they are in functional structure?—essentially the
same question that arises under the already mentioned ‘‘higher equals
later’’ equivalence between the two styles of modeling the relation be-
tween syntax and semantics.
But even with so much in darkness, I am encouraged to try to extend
the LRT correlations of earlier chapters to questions of scope and ante-
cedence, so as to lock together an even more disparate array of properties
of syntactic relationships in a way I think is impossible in other models.
References
Abney, S. 1987. The English noun phrase in its sentential aspect. Doctoral dis-
sertation, MIT.
Anderson, S. 1982. Where’s morphology? Linguistic Inquiry 13, 571–612.
Anderson, S. 1992. A-morphous morphology. Cambridge: Cambridge University
Press.
Andrews, A. 1982. The representation of Case in Modern Icelandic. In J.
Bresnan, ed., The mental representation of grammatical relations, 427–503. Cam-
bridge, Mass.: MIT Press.
Babby, L. 1998a. Subject control in direct predication: Evidence from Russian.
In Z. Boskovic, S. Franks, and W. Snyder, eds., Formal Approaches to Slavic
Linguistics 1997: The Connecticut Meeting, 17–37. Ann Arbor: Michigan Slavic
Publications.
Babby, L. 1998b. Voice and diathesis in Slavic. Ms., Princeton University.
Bach, E. 1976. An extension of classical transformational grammar. In Problems
in linguistic metatheory: Proceedings of the 1976 conference at Michigan State
University, 183–224. East Lansing: Michigan State University, Department of
Linguistics.
Baker, M. 1985. The Mirror Principle and morphosyntactic explanation. Linguis-
tic Inquiry 16, 373–415.
Baker, M. 1996. The polysynthesis parameter. Oxford: Oxford University Press.
Barrett-Keach, C. N. 1986. Word-internal evidence from Swahili for Aux/Infl.
Linguistic Inquiry 17, 559–564.
Bayer, J., and J. Kornfilt. 1994. Against scrambling as an instance of Move-alpha.
In N. Corver and H. van Riemsdijk, eds., Studies on scrambling, 17–60. Berlin:
Mouton de Gruyter.
Beghelli, P., and T. Stowell. 1997. Distributivity and negation. In A. Szabolcsi,
ed., Ways of scope taking, 71–107. Dordrecht: Kluwer.
Benedicto, E. 1991. Latin long-distance anaphora. In J. Koster and E. Reuland,
eds., Long-distance anaphora, 171–184. Cambridge: Cambridge University Press.
Besten, H. den. 1976. Surface lexicalization and trace theory. In H. van Riems-
dijk, ed., Green ideas blown up: Papers from the Amsterdam Colloquium on Trace
Theory. Publications of the Linguistics Department 13. Amsterdam: University of
Amsterdam, Linguistics Department.
Bodomo, A. B. 1998. Serial verbs as complex predicates in Dagaare and Akan. In
I. Maddieson and T. J. Hinnebusch, eds., Language history and linguistic descrip-
tion in Africa. Vol. 2, Trends in African linguistics, 195–204. Trenton, N.J.: Africa
World Press.
Bok-Bennema, R. 1995. Case and agreement in Inuit. Berlin: Mouton de Gruyter.
Boskovic, Z. 1995. On certain violations of the Superiority Condition, AgrO, and
economy of derivation. Ms., University of Connecticut.
Boskovic, Z. 1999. On multiple feature checking. In S. D. Epstein and N. Horn-
stein, eds., Working minimalism, 159–187. Cambridge, Mass.: MIT Press.
Brody, M. 1997. Mirror theory. Ms., University College London.
Brody, M., and A. Szabolcsi. 2000. Overt scope: A case study in Hungarian. Ms.,
University College London and New York University.
Burzio, L. 1996. The role of the antecedent in anaphoric relations. In R. Freidin,
ed., Current issues in comparative grammar, 1–45. Dordrecht: Kluwer.
Chierchia, G. 1992. Functional wh and weak crossover. In D. Bates, ed., Pro-
ceedings of the 10th West Coast Conference on Formal Linguistics, 75–90. Stan-
ford, Calif.: CSLI Publications.
Chomsky, N. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, N. 1973. Conditions on transformations. In S. Anderson and P.
Kiparsky, eds., A festschrift for Morris Halle, 232–286. New York: Holt, Rine-
hart and Winston.
Chomsky, N. 1986. Barriers. Cambridge, Mass.: MIT Press.
Chomsky, N. 1993. A minimalist program for linguistic theory. In K. Hale and
S. J. Keyser, eds., The view from Building 20: Essays in linguistics in honor of
Sylvain Bromberger, 1–52. Cambridge, Mass.: MIT Press.
Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass.: MIT Press.
Cinque, G. 1998. Adverbs and functional heads. Oxford: Oxford University Press.
Cinque, G. 2001. ‘‘Restructuring’’ and functional structure. Ms., University of
Venice.
Collins, C. 1996. Local economy. Cambridge, Mass.: MIT Press.
Collins, C. 2001. The internal structure of verbs in Ju|’hoan and ǂHoan. In A. Bell and P. Washburn, eds., Cornell working papers in linguistics 18. Ithaca, N.Y.:
Cornell University, CLC Publications.
Culicover, P., and W. Wilkins. 1984. Locality in linguistic theory. New York:
Academic Press.
Deprez, V. 1989. On the typology of syntactic positions and the nature of chains.
Doctoral dissertation, MIT.
Diesing, M. 1992. Indefinites. Cambridge, Mass.: MIT Press.
Di Sciullo, A.-M., and E. Williams. 1987. On the definition of word. Cambridge,
Mass.: MIT Press.
Fiengo, R., and R. May. 1994. Indices and identity. Cambridge, Mass.: MIT
Press.
Fodor, J. 1978. Parsing strategies and constraints on transformations. Linguistic
Inquiry 9, 427–474.
Fox, D. 1995. Economy and scope. Natural Language Semantics 3, 283–341.
Gill, K.-H. 2001. The long-distance anaphora conspiracy: The case of Korean.
Ms., University of Edinburgh.
Grimshaw, J. 1978. English wh-constructions and the theory of grammar. Doc-
toral dissertation, University of Massachusetts, Amherst.
Haegeman, L., and H. van Riemsdijk. 1986. Verb projection raising, scope, and
the typology of rules affecting verbs. Linguistic Inquiry 17, 417–466.
Hankamer, J. 1973. Unacceptable ambiguity. Linguistic Inquiry 4, 17–68.
Harley, H. 1995. Subjects, events, and licensing. Doctoral dissertation, MIT.
Hoji, H. 1985. Logical Form constraints and configurational structures in Japa-
nese. Doctoral dissertation, University of Washington.
Hoji, H. 1986. Scope interpretation in Japanese and its theoretical implications. In
M. Dalrymple, J. Goldberg, K. Hanson, M. Inman, C. Pinon, and S. Wechsler,
eds., Proceedings of the 5th West Coast Conference on Formal Linguistics, 87–101.
Stanford, Calif.: CSLI Publications.
Holmberg, A. 1985. Word order and syntactic features. Doctoral dissertation,
University of Stockholm.
Huang, C.-T. J. 1982. Logical relations in Chinese and the theory of grammar.
Doctoral dissertation, MIT.
Kaplan, R., and J. Bresnan 1982. Lexical-Functional Grammar: A formal system
for grammatical representation. In J. Bresnan, ed., The mental representation of
grammatical relations, 173–281. Cambridge, Mass.: MIT Press.
Kayne, R. 1975. French syntax. Cambridge, Mass.: MIT Press.
Kayne, R. 1981. Two notes on the NIC. In A. Belletti, L. Brandi, and L. Rizzi,
eds., Theory of markedness in generative grammar, 317–346. Pisa: Scuola Normale
Superiore.
Kayne, R. 1994. The antisymmetry of syntax. Cambridge, Mass.: MIT Press.
Kenesei, I. 1994. The syntax of focus. Ms., University of Szeged.
Kenesei, I. 1998. Adjuncts and arguments in VP-focus. Acta Linguistica Hungar-
ica 45/1–2, 61–88.
É. Kiss, K. 1987. Configurationality in Hungarian. Dordrecht: Reidel.
É. Kiss, K. 1995. NP movement, operator movement, and scrambling in Hun-
garian. In K. É. Kiss, ed., Discourse configurational languages, 207–243. Oxford:
Oxford University Press.
Konapasky, A. 2002. A syntacto-morphological analysis of dependent heads in
Slavic. Doctoral dissertation, Princeton University.
Koopman, H., and A. Szabolcsi. 2000. Verbal complexes. Cambridge, Mass.:
MIT Press.
Koster, J. 1985. Reflexives in Dutch. In J. Gueron, H.-G. Obenauer, and J.-Y.
Pollock, eds., Grammatical representations, 141–167. Dordrecht: Foris.
Kuno, S., and J. Robinson. 1972. Multiple wh-questions. Linguistic Inquiry 3,
463–488.
Kuroda, S.-Y. 1970. Remarks on the notion of subject with reference to words
like ‘‘also,’’ ‘‘even,’’ or ‘‘only.’’ Annual Bulletin, vol. 3, 111–129; vol. 4, 127–152.
Tokyo: Research Institute of Logopedics and Phoniatrics.
Lakoff, G. 1972. On Generative Semantics. In D. Steinberg and L. Jakobovits,
eds., Semantics, 232–296. Cambridge: Cambridge University Press.
Landau, I. 1999. Elements of control. Doctoral dissertation, MIT.
Lasnik, H. 1999. Minimalist analysis. Oxford: Blackwell.
Lavine, J. 1997. Null expletives and the EPP in Slavic. Ms., Princeton University.
Lavine, J. 2000. Topics in the syntax of non-agreeing predicates in Slavic. Doc-
toral dissertation, Princeton University.
Mahajan, A. 1989. The A/A′ distinction and movement theory. Doctoral disser-
tation, MIT.
Marantz, A. 1984. Grammatical relations. Cambridge, Mass.: MIT Press.
Matthei, E. 1979. The acquisition of prenominal modifier sequences: Stalking the
second green ball. Doctoral dissertation, University of Massachusetts, Amherst.
Moltmann, F. 1990. Scrambling in German and the specificity e¤ect. Ms., MIT.
Moortgat, M. 1988. Categorial investigations. Doctoral dissertation, University
of Amsterdam.
Muller, G. 1995. A-bar syntax: A study in movement types. Berlin: Mouton de
Gruyter.
Neeleman, A. 1994. Complex predicates. Doctoral dissertation, Utrecht
University.
Noyer, R. 1992. Features, positions, and affixes in autonomous morphological
structure. Doctoral dissertation, MIT.
Partee, B. 1971. On the requirement that transformations preserve meaning. In
C. Fillmore and D. T. Langendoen, eds., Studies in linguistic semantics, 1–21.
New York: Holt, Rinehart and Winston.
Pesetsky, D. 1987. Wh-in-situ: Movement and unselective binding. In E. Reuland
and A. ter Meulen, eds., The representation of (in)definiteness, 98–129. Cam-
bridge, Mass.: MIT Press.
Pica, P. 1991. On the interaction between antecedent-government and binding:
The case of long-distance reflexivization. In J. Koster and E. Reuland, eds., Long-
distance anaphora, 119–135. Cambridge: Cambridge University Press.
Pinker, S. 1984. Language learnability and language development. Cambridge,
Mass.: Harvard University Press.
Pollock, J.-Y. 1989. Verb movement, Universal Grammar, and the structure of
IP. Linguistic Inquiry 20, 365–424.
Postal, P. 1974. On raising: One rule of English grammar and its theoretical impli-
cations. Cambridge, Mass.: MIT Press.
Prinzhorn, M. 1998. Prosodic and syntactic structure. Ms., University of Vienna.
Reinhart, T., and E. Reuland. 1993. Reflexivity. Linguistic Inquiry 24, 657–720.
Richards, N. 1997. What moves where when in which language? Doctoral disser-
tation, MIT.
Riemsdijk, H. van. 1996. Adverbia en bepaaldheid. Ms., University of Tilburg.
Riemsdijk, H. van, and E. Williams. 1981. NP Structure. The Linguistic Review 1,
171–217.
Rivero, M.-L. 1991. Long head movement and negation: Serbo-Croatian vs.
Slovak and Czech. The Linguistic Review 8, 319–351.
Rizzi, L. 1982. Violations of the Wh-Island Constraint and the Subjacency Con-
dition. In Issues in Italian syntax, 49–76. Dordrecht: Kluwer.
Rizzi, L. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press.
Roeper, T., and M. Siegel. 1978. A lexical transformation for verbal compounds.
Linguistic Inquiry 9, 199–260.
Ross, J. R. 1970. On declarative sentences. In R. A. Jacobs and P. S. Rosenbaum,
eds., Readings in English transformational grammar, 222–272. Waltham, Mass.:
Ginn.
Rudin, C. 1988. On multiple questions and multiple wh-fronting. Natural
Language and Linguistic Theory 6, 445–501.
Saito, M. 1991. Long distance scrambling in Japanese. Ms., University of Con-
necticut, Storrs.
Saito, M. 1992. Long distance scrambling in Japanese. Journal of East Asian
Linguistics 1, 69–118.
Saito, M. 1994. Improper adjunction. In M. Koizumi and H. Ura, eds., Formal
Approaches to Japanese Linguistics 1, 263–293. MIT Working Papers in Linguis-
tics 24. Cambridge, Mass.: MIT, Department of Linguistics and Philosophy,
MITWPL.
Samek-Lodovici, V. 1996. Constraints on subjects: An optimality-theoretic anal-
ysis. Doctoral dissertation, Rutgers University.
Santorini, B. 1990. Long distance scrambling and anaphora binding. Ms., Uni-
versity of Pennsylvania.
Selkirk, E. 1982. The syntax of words. Cambridge, Mass.: MIT Press.
Steedman, M. 1996. Surface structure and interpretation. Cambridge, Mass.: MIT
Press.
Szabolcsi, A. 1996. Verb and particle movement in Hungarian. Ms., UCLA.
Szendroi, K. 2001. Focus and the syntax-phonology interface. Doctoral disserta-
tion, University of Southern California.
Timberlake, A. 1979. Reflexivization and the cycle in Russian. Linguistic Inquiry
10, 109–141.
Travis, L. 1984. Parameters and e¤ects of word order variation. Doctoral disser-
tation, MIT.
Ueyama, A. 1998. Two types of dependency. Doctoral dissertation, University of
Southern California.
Vanden Wyngaerd, G. 1989. Object shift as an A-movement rule. In P. Branigan,
J. Gaulding, M. Kubo, and K. Murasugi, eds., Student Conference in Linguistics
1989, 256–271. MIT Working Papers in Linguistics 11. Cambridge, Mass.: MIT,
Department of Linguistics and Philosophy, MITWPL.
Webelhuth, G. 1989. Syntactic saturation phenomena and the modern Germanic
languages. Doctoral dissertation, University of Massachusetts, Amherst.
Wilder, C. 1997. Some properties of ellipsis in coordination. In A. Alexiadou and
T. H. Hall, eds., Studies on Universal Grammar and typological variation, 59–107.
Amsterdam: John Benjamins.
Williams, E. 1971a. Small clauses in English. Ms., MIT.
Williams, E. 1971b. Underlying tone in Margi and Igbo. Ms., MIT. [Published
1976, Linguistic Inquiry 7, 463–484.]
Williams, E. 1974. Rule ordering in syntax. Doctoral dissertation, MIT.
Williams, E. 1977. Discourse and Logical Form. Linguistic Inquiry 8, 101–139.
Williams, E. 1980. Predication. Linguistic Inquiry 11, 203–238.
Williams, E. 1981a. Argument structure and morphology. The Linguistic Review
1, 81–114.
Williams, E. 1981b. Language acquisition, markedness, and phrase structure. In
S. Tavakolian, ed., Language acquisition and linguistic theory, 8–34. Cambridge,
Mass.: MIT Press.
Williams, E. 1981c. On the notions ‘‘lexically related’’ and ‘‘head of a word.’’
Linguistic Inquiry 12, 245–274.
Williams, E. 1986. A reassignment of the functions of LF. Linguistic Inquiry 17,
265–299.
Williams, E. 1987. Implicit arguments, the binding theory, and control. Natural
Language and Linguistic Theory 5, 151–180.
Williams, E. 1991. ‘‘Why crossover?’’ Handout, colloquium presentation, MIT.
Williams, E. 1994a. Negation in English and French. In D. Lightfoot, ed., Verb
movement, 189–206. Cambridge, Mass.: MIT Press.
Williams, E. 1994b. Thematic structure in syntax. Cambridge, Mass.: MIT Press.
Williams, E. 1997. Blocking and anaphora. Linguistic Inquiry 28, 577–628.
Williams, E. 1998a. The asymmetry of predication. In R. Blight, ed., Texas Lin-
guistic Forum 38, 323–333. Austin: University of Texas, Texas Linguistic Forum.
Williams, E. 1998b. Economy as shape conservation. In Celebration: An electronic
festschrift in honor of Noam Chomsky’s 70th birthday. http://addendum.mit.edu/
celebration.
Williams, E. In preparation. The structure of clusters. Ms., Rutgers University.
[To be presented at NIAS/Collegium Budapest Cluster Study Group.]
Wiltschko, M. 1997. D-linking, scrambling and superiority in German. Groninger
Arbeiten zur germanistischen Linguistik 41, 107–142.
Wurmbrand, S. 1998. Infinitives. Doctoral dissertation, MIT.
Yatsushiro, K. 1996. On the unaccusative construction in nominative Case
licensing. Ms., University of Connecticut, Storrs.
Yip, M., J. Maling, and R. Jackendoff. 1987. Case in tiers. Language 63, 217–
250.
Zubizarreta, M. L. 1998. Prosody, focus, and word order. Cambridge, Mass.: MIT
Press.
Zwart, C. J.-W. 1997. Morphosyntax of verb movement: A minimalist approach to
the syntax of Dutch. Dordrecht: Kluwer.
Index
A/A′ distinction, 72, 118–121, 171
  relativization of, 96, 121, 130–133
Ablative absolute, 192–193
Accent Structure (AS), 243, 251–261
Adjective order, 153–154
Adjuncts and X-bar, 61–62
Adverb positioning, 44
Anaphora, 95–116
Antisymmetry, 19–21
Arabic inflection, 217–218
Assume Lowest Energy State, 163
Benedicto, E., 98–99
Binding, 120
Blocking in semantics, 10, 246–249
Bracketing paradoxes, 5–8
Bridge verbs, 69
Bulgarian, 145–146, 154–157, 168
Burzio, L., 112–113
Case structure, 13
Case-preposition duality, 188–194
CAT, 203–238
Causativization, 66–67
Checking Theory, 29, 35–36
Cinque, G.
  restructuring verbs, 90–91
  functional structure, 201–202
Complement-of relation, 179
Complementizer agreement, 196
Compositionality, 240–246
Contraction, 163–164
Control, 85–86, 269–270
  obligatory/optional distinction, 87–88
Copular inversion, 254–256
Countercyclic derivation, 70–71
CS embedding, 67–69
Czech verb clusters, 237–238
D-linking, 41, 144, 148–149
Disanaphora Principle, 250
Dutch reflexive, 101–103
Dutch verb clusters, 224–229
ECM as CS embedding, 15, 67–68, 105
Ellipsis, 261–266
Embedding, 25
  functional vs. complement, 59, 174–176, 199–201
English auxiliary system, 222–224
EPP subjects, 83–85
Equidistance, 16
Ergative case, 110–111
Excorporation, 187–188
Expletives, 68, 92
Extension, 73, 114
Flip, 206–211
Focus, 34
  IFocus vs. LFocus, 249–261
  normal and contrastive, 32, 249
Focus ellipsis, 261
Focus Structure (FS), 30–33
FS embedding, 60–70
Functional structure, 173
Gapping, 193–194, 264–266
General Ban on Improper Movement (GBOIM), 72
General Condition on Scope, 22
Georgian inflection, 217–218
German
  restructuring verbs, 89–91
  scrambling, 39–44, 119, 122–124, 126–129
  V2 vs. V-final, 78–79
  WCO, 143–145
Haegeman, L., and H. van Riemsdijk (1986), 224–229
Head-complement relation, 11–12
Head Movement Constraint, 171
Heavy NP Shift, 33–38
Holmberg's generalization, 17–19
Hungarian
  focus, 36, 259–260
  scope, 45–50
  scrambling, 160–165
  verb clusters, 229–237
IFocus, 249–261
Improper movement, 71–75
Induced representation, 244
IPresupposition, 252–253
Japanese
  long vs. short scrambling, 157–161
  reflexive, 97
  scope and scrambling, 124–126
Konopasky, A., 148–152
Koopman, H., and A. Szabolcsi (2000), 229–237
L-tous, 79–80
Latin reflexive, 98–99
Level Blocking Principle, 95, 102
Level Embedding Conjecture, 63–65
Lexical-Functional Grammar, 22, 38
Lexical Variation Hypothesis (LVH), 212, 219
Lexicalism, 172–173, 202
LFocus, 249–261
Locality, 164–167
Long topicalization, 130–132
LPresupposition, 252–253
LRT correlations, 59, 117–135
Mirror Principle, 15, 178, 199–203
Mohawk inflection, 219–220
Movement, 26, 62
Multiple exponence, 194–196, 216
Multiple-WH movement, 145–147
Navajo inflection, 221–222
Nominative case, 109
NP structure model, 118, 127
Optimality theory, 22, 38
Pāṇini's principle, 7, 10
Parallel movement, 139–141
Predicate Structure (PS), 86–87, 106–112
QS, 30–33
Quantifier interpretation and reconstruction, 42–43, 271–273
Quantifier scope, 42
Quirky case, 81–83, 110–112
Reassociation, 188–194, 206–211
Reconstruction, 117–135
  and quantifier interpretation, 271–273
Reinhart, T., and E. Reuland (1993), 99–101, 104–106, 108
Relativized Minimality, 185–188
Remnant movement, 19–21, 133–135
Representation, 13–14
  asymmetry of, 60
  as homomorphism, 61
  model for, 23–24
Richards, N., 140, 149–150, 158–159, 166
Right node raising (RNR), 37
Rule of Combination (RC), 204–205
Russian subject position, 52–55, 83–85
Scrambling, 39–44, 117–135, 157–165
  long vs. short, 157–161
  masked, 161
Selection, 183
Self-, 100
Semantic compositionality, 242–246
Semantic interpretation, 25
Serbo-Croatian, 151–152
  verb clusters, 187–188, 237–238
Serial verbs, 65–66
Shadow, 175–177
Shape Conservation, 5, 7–8, 15–23, 239–242, 246
Small clause theory, 63
Southern Tiwa inflection, 219–220
Spanish focus, 50–52, 256–259
Spanning vocabulary, 214–215
Split antecedents, 267–269
SS embedding, 69–70
Subcategorization, 203–204
Subject auxiliary inversion, 191–192
Subjects
  EPP, 83–85
  quirky, 81–83
  and scrambling, 126–129
Superiority, 140–145, 158–159
Superraising, 77–78
Swahili inflection, 220–221
  object agreement, 248–249
Swiss German verb clusters, 224–229
Synonymy Principle, 247
Synthetic compounds, 9–13
Target, 95–96
Theta Structure, 13
Topic, 30–32
Tough movement, 75–77
TS Embedding, 65–67
Ueyama, A., 124–126
V2, 78–79, 191–192
Verb projection raising, 224–229
Verb-particle construction, 233
Verbal modifier (Hungarian), 229–237
VP ellipsis, 262–264
Weak Crossover, 141–145, 154
West Flemish verb clusters, 224–229
Wiltschko, M., 143–145
Wurmbrand, S. (restructuring verbs), 89–90
X-bar theory, 175–185
Yip, M., J. Maling, and R. Jackendoff (1987), 109–112
Yuman (Lakhota, Alabama) inflection, 222
Current Studies in Linguistics
Samuel Jay Keyser, general editor
1. A Reader on the Sanskrit Grammarians
J. F. Staal, editor
2. Semantic Interpretation in Generative Grammar
Ray Jackendoff
3. The Structure of the Japanese Language
Susumu Kuno
4. Speech Sounds and Features
Gunnar Fant
5. On Raising: One Rule of English Grammar and Its Theoretical Implications
Paul M. Postal
6. French Syntax: The Transformational Cycle
Richard S. Kayne
7. Pāṇini as a Variationist
Paul Kiparsky, S. D. Joshi, editor
8. Semantics and Cognition
Ray Jackendoff
9. Modularity in Syntax: A Study of Japanese and English
Ann Kathleen Farmer
10. Phonology and Syntax: The Relation between Sound and Structure
Elisabeth O. Selkirk
11. The Grammatical Basis of Linguistic Performance: Language Use and
Acquisition
Robert C. Berwick and Amy S. Weinberg
12. Introduction to the Theory of Grammar
Henk van Riemsdijk and Edwin Williams
13. Word and Sentence Prosody in Serbocroatian
Ilse Lehiste and Pavle Ivic
14. The Representation of (In)definiteness
Eric J. Reuland and Alice G. B. ter Meulen, editors
15. An Essay on Stress
Morris Halle and Jean-Roger Vergnaud
16. Language and Problems of Knowledge: The Managua Lectures
Noam Chomsky
17. A Course in GB Syntax: Lectures on Binding and Empty Categories
Howard Lasnik and Juan Uriagereka
18. Semantic Structures
Ray Jackendoff
19. Events in the Semantics of English: A Study in Subatomic Semantics
Terence Parsons
20. Principles and Parameters in Comparative Grammar
Robert Freidin, editor
21. Foundations of Generative Syntax
Robert Freidin
22. Move α: Conditions on Its Application and Output
Howard Lasnik and Mamoru Saito
23. Plurals and Events
Barry Schein
24. The View from Building 20: Essays in Linguistics in Honor of Sylvain
Bromberger
Kenneth Hale and Samuel Jay Keyser, editors
25. Grounded Phonology
Diana Archangeli and Douglas Pulleyblank
26. The Magic of a Common Language: Jakobson, Mathesius, Trubetzkoy, and
the Prague Linguistic Circle
Jindrich Toman
27. Zero Syntax: Experiencers and Cascades
David Pesetsky
28. The Minimalist Program
Noam Chomsky
29. Three Investigations of Extraction
Paul M. Postal
30. Acoustic Phonetics
Kenneth N. Stevens
31. Principle B, VP Ellipsis, and Interpretation in Child Grammar
Rosalind Thornton and Kenneth Wexler
32. Working Minimalism
Samuel Epstein and Norbert Hornstein, editors
33. Syntactic Structures Revisited: Contemporary Lectures on Classic Trans-
formational Theory
Howard Lasnik with Marcela Depiante and Arthur Stepanov
34. Verbal Complexes
Hilda Koopman and Anna Szabolcsi
35. Parasitic Gaps
Peter W. Culicover and Paul M. Postal, editors
36. Ken Hale: A Life in Language
Michael Kenstowicz, editor
37. Flexibility Principles in Boolean Semantics: The Interpretation of Coordina-
tion, Plurality, and Scope in Natural Language
Yoad Winter
38. Phrase Structure Composition and Syntactic Dependencies
Robert Frank
39. Representation Theory
Edwin Williams