Constraining the Theory - Prof. Frederick J. Newmeyer


The Edge of Linguistics lecture series by Prof. Frederick J. Newmeyer

From Oct 7 to Oct 17, Prof. Newmeyer gave a lecture series on a wide range of linguistic topics at Beijing Language and Culture University.

Lecture 1: The Chomskyan Revolution
Lecture 2: Constraining the Theory
Lecture 3: The Boundary between Syntax and Semantics
Lecture 4: The Boundary between Competence and Performance
Lecture 5: Can One Language Be ‘More Complex’ Than Another?

Background: Frederick J. Newmeyer is Professor Emeritus of Linguistics at the University of Washington and adjunct professor in the University of British Columbia Department of Linguistics and the Simon Fraser University Department of Linguistics. He has published widely in theoretical and English syntax.


1

FREDERICK J. NEWMEYER

UNIVERSITY OF WASHINGTON, UNIVERSITY OF BRITISH COLUMBIA,

AND SIMON FRASER UNIVERSITY

Class 2: Constraining the Theory

2

The need for constraints

Syntactic Structures contrasted three formal models of syntax:

Finite-State grammars

Phrase-structure grammars

Transformational grammars

3

The need for constraints

Finite-State grammars. Every rule is of the form:

S_i → a S_j (a terminal symbol followed by a nonterminal (state) symbol)

Notice that a FSG does generate an infinite set of sentences.

But it provides no structure to these sentences.

And it cannot generate sentences like:

If S_i, then S_j

Either S_i or S_j
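As a rough illustration of the finite-state rule format above, here is a minimal Python sketch. The states, the vocabulary, and the function name `generate` are invented for this example and do not come from the lecture. The loop on state S1 is what makes the set of sentences infinite, and the output is a flat word string with no constituent structure.

```python
# Toy finite-state grammar: each rule rewrites a state as a terminal
# word followed by the next state (None means "stop").
# States and vocabulary are hypothetical, chosen only for illustration.
RULES = {
    "S0": [("the", "S1")],
    "S1": [("old", "S1"), ("man", "S2")],  # the loop on S1 allows unbounded length
    "S2": [("comes", None)],
}

def generate(max_words):
    """Return every sentence of at most max_words words, shortest first."""
    results = []
    stack = [("S0", [])]
    while stack:
        state, words = stack.pop()
        if state is None:            # derivation finished
            results.append(" ".join(words))
            continue
        if len(words) >= max_words:  # cut off the infinite loop
            continue
        for terminal, next_state in RULES[state]:
            stack.append((next_state, words + [terminal]))
    return sorted(results, key=len)

sentences = generate(5)
```

Raising `max_words` keeps adding longer sentences, but each one is still just a string of words: nothing in the grammar records which words group together.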

4

The need for constraints

Phrase-structure grammars. Every rule is of the form:

A → B C
B → a D

These rules give you a tree diagram:

        A
       / \
      B   C
     / \
    a   D

• So they can handle structure and constituency.

• But they cannot handle discontinuous elements or long-distance dependencies:

• Mary is work+ing

• Who did you see?
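The way phrase-structure rules assign constituent structure can be sketched in a few lines of Python. This is only an illustration: the rule table encodes the two rules from the slide (A → B C and B → a D), while the function name `expand` and the choice to leave C and D unexpanded are assumptions made here.

```python
# Phrase-structure rules from the slide: A -> B C and B -> a D.
# C and D are left as unexpanded leaves, since no rules are given for them.
RULES = {"A": ["B", "C"], "B": ["a", "D"]}

def expand(symbol):
    """Rewrite a symbol top-down and return its labelled bracketing."""
    if symbol not in RULES:
        return symbol  # terminal (or unexpanded) symbol
    children = " ".join(expand(child) for child in RULES[symbol])
    return f"[{symbol} {children}]"

tree = expand("A")
```

Each rule application adds one layer of brackets, which is the constituency information a finite-state grammar lacks. Because every rule rewrites a single symbol in place, the sketch has no way to relate two separated positions such as be ... -ing or a fronted wh-word and its gap.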

5

The need for constraints

Transformational rules. They transform trees into trees:

6

The need for constraints

The problem: Transformational rules are too powerful!

They allow things to happen that never occur in any human language:

To reverse all of the words in a sentence.

To move a preposition in the highest clause and attach it to an adjective in the lowest clause.

To delete every other word.
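To see how easily such unattested operations can be stated once rules are unconstrained, consider the following Python sketch. It treats a sentence as a plain word list rather than a tree, which is a simplification, and the function names and the sample sentence are invented for illustration.

```python
def reverse_words(words):
    """An unattested 'rule': reverse the order of all words."""
    return words[::-1]

def delete_every_other_word(words):
    """Another unattested 'rule': keep only the 1st, 3rd, 5th, ... word."""
    return words[::2]

sentence = "Mary saw the boy at the station".split()
reversed_sentence = reverse_words(sentence)
thinned_sentence = delete_every_other_word(sentence)
```

Both functions are one-liners, which is the point: a formalism in which they are as easy to write as Passive or Wh-Movement says nothing about why no language uses them.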

7

The need for constraints

So the important thing was to constrain transformational rules — to make them less powerful.

This was especially important, given that a child acquires a transformational grammar.

The less powerful the rules, the smaller the ‘search space’ for the child and therefore the easier to explain the rapidity of language acquisition.

8

Island constraints

The most important set of constraints over the years are what are called ‘island constraints’.

They prohibit movement out of a particular syntactic configuration.

A configuration out of which nothing can move is called an ‘island’.

9

Island constraints

The first island constraint was the A-over-A principle (from Chomsky 1964).

Mary saw the boy walking to the railroad station is ambiguous:

Mary saw [NP the boy [VP walking to the railroad station]]

Mary saw [NP [NP the boy] [VP walking to the railroad station]]

10

Island constraints

Who did Mary see walking to the railroad station? is unambiguous

It can be a question corresponding to (i), but not to (ii):

(i) Mary saw [NP the boy [VP walking to the railroad station]]

(ii) Mary saw [NP [NP the boy] [VP walking to the railroad station]]

11

Island constraints

The idea of the A-over-A principle: You can’t move an NP (or a VP, S, etc.) if it is dominated by an NP (or a VP, S, etc.):

So A1 is an island, preventing the movement of x.

12

Island constraints

JOHN R. ROSS

In his 1967 dissertation, John R. Ross showed that the A-over-A principle did not work.

In its place he proposed six or seven other constraints.

13

Island constraints

Complex Noun Phrase Constraint:

Element A cannot be moved out of NP1

*Who do you believe the claim that John saw?

14

Island constraints

The Coordinate Structure Constraint

Neither conjunct in a coordinate structure can be moved.

*What did John eat beans and?

15

Island Constraints

Ross proposed several other constraints as well.

We still talk today about ‘Ross constraints’.

However, Chomsky in 1973 found a way to unify most of them under one simple constraint.

16

Island Constraints

The crucial notion is ‘bounding node’.

Bounding nodes (for English) are the nodes S (=IP), and NP (=DP).

Subjacency (for English): No element may be moved across more than one bounding node.

17

Island Constraints

Who did you believe Bill saw? is a grammatical sentence. The first movement crosses only one bounding node and the second also crosses only one.

18

Island Constraints

*Who did you believe the claim that Bill saw? is an ungrammatical sentence. The first movement crosses only one bounding node, but the second crosses two.

19

Island Constraints

[Tree diagram: did you wonder [where John put what]]

*What did you wonder where John put? is impossible because it is a Subjacency violation.
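The Subjacency examples above can be checked mechanically once each movement step is annotated with the nodes it crosses. The following Python sketch does that. The path annotations are hand-made assumptions based on the slide descriptions (they are not computed from trees), and the names `obeys_subjacency` and `BOUNDING_NODES` are hypothetical.

```python
# Bounding nodes for English, as stated in the lecture: S (=IP) and NP (=DP).
BOUNDING_NODES = {"IP", "DP"}

def obeys_subjacency(steps, bounding_nodes=BOUNDING_NODES):
    """steps holds one list of crossed node labels per movement step.

    Subjacency is obeyed if no single step crosses more than one
    bounding node.
    """
    return all(
        sum(label in bounding_nodes for label in crossed) <= 1
        for crossed in steps
    )

# Hand-annotated paths (assumed for this sketch):
# "Who did you believe Bill saw?": two steps via the embedded COMP.
believe = [["IP"], ["CP", "IP"]]
# "*Who did you believe the claim that Bill saw?": second step leaves the NP.
believe_the_claim = [["IP"], ["CP", "DP", "IP"]]
# "*What did you wonder where John put?": COMP is filled by "where",
# so "what" has to move in a single step.
wonder_where = [["IP", "CP", "IP"]]

results = [obeys_subjacency(p) for p in (believe, believe_the_claim, wonder_where)]
```

The check is per step, not per derivation, which is why successive short movements through COMP are allowed while one long movement is not.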

20

SIMPLIFYING THE SYNTACTIC COMPONENT

The trend in Chomskyan thinking has been to reduce the scope and complexity of the ‘narrow syntax’ in two ways:

1. To derive syntactic complexity, to the extent possible, from independently needed principles,

and

2. To shift the burden of accounting for specific phenomena from the syntax to other components (lexicon, morphology, interfaces).

21

SIMPLIFYING THE SYNTACTIC COMPONENT

It’s the first (more interesting!) strategy that dominated syntactic theory for the most part.

In early transformational syntax, grammars were lists of complex rules, the rule lists of one language not looking very much like the rule lists of another language.

22

SIMPLIFYING THE SYNTACTIC COMPONENT

23

SIMPLIFYING THE SYNTACTIC COMPONENT

Throughout most of the history of generative syntax, language-particular rules have been simplified (or eliminated).

More general principles have replaced them.

24

SIMPLIFYING THE SYNTACTIC COMPONENT

A good example of deriving syntactic facts from independently needed principles:

Joe Emonds’s Structure Preserving Constraint

A large class of T-rules can move an element only into a position that could have been created by the Phrase-Structure rules.

25

SIMPLIFYING THE SYNTACTIC COMPONENT

PASSIVE RULE

Russia defeated Germany.

Germany was defeated by Russia.

BUT NOT

a. *Germany Russia was defeated by

b. *Germany was Russia defeated by

c. *Germany was by Russia defeated

d. *Germany was defeated Russia by

Therefore, the passive rule can be greatly simplified.

26

THE LEXICALIST HYPOTHESIS

THE LEXICALIST HYPOTHESIS: Transformational rules cannot change the syntactic category of an item, perform derivational morphology, etc.

Before the late 1960s (and in Generative Semantics), John’s refusal of the offer was derived transformationally from something like the fact that John refused the offer.

27

THE LEXICALIST HYPOTHESIS

But that fails to explain why

John’s three unexpected refusals of the offer

has exactly the same structure as

John’s three boring books about surfing

In other words, you lose a generalization by deriving nouns from verbs.

28

THE LEXICALIST HYPOTHESIS

Another argument for the LH is that the relationship between verbs and their corresponding nominalizations can be very idiosyncratic:

motion, but *mote; usher, but *ush; tuition, but *tuit; etc.

profess (‘declare openly’) — professor (‘university teacher’) — profession (‘career’)

ignore (‘pay no attention to’) — ignorance (‘lack of knowledge’) — ignoramus (‘very stupid person’)

person (‘human individual’) — personal (‘private’) — personable (‘friendly’) — personality (‘character’) — personalize (‘tailor to the individual’) — impersonate (‘pass oneself off as’)

social (‘pertaining to society’; ‘interactive with others’) — socialist (‘follower of a particular political doctrine’) — socialite (‘member of high society’)

29

SURFACE SEMANTIC INTERPRETATION

The lexicalist hypothesis emphasized the importance of ‘shallow’ levels of syntactic structure.

Shallow levels became even more important with the introduction of surface semantic interpretation in the late 1960s.

RAY JACKENDOFF

30

SURFACE SEMANTIC INTERPRETATION

[S [NP [Q many] [N men]] [VP [V read] [NP [Q few] [N books]]]]

many: INTERPRET WITH WIDE SCOPE

few: INTERPRET WITH NARROW SCOPE

31

THE EXTENDED STANDARD THEORY

But some aspects of interpretation still seemed to take place at Deep Structure.

For example, interpretation seems to have to take place before Passive, since in a passive sentence like Mary was seen by John, Mary is interpreted in object position.

The model with both Deep and Surface Structure rules of interpretation was called the Extended Standard Theory (EST).

32

THE EXTENDED STANDARD THEORY

33

TRACES AND OTHER ABSTRACT ELEMENTS

The ‘price paid’ for the addition of constraints on movement, surface interpretation, and so on was an explosion of the number of ‘invisible elements’ like traces, PRO, pro, and so on.

Look at how traces work(ed).

34

TRACES AND OTHER ABSTRACT ELEMENTS

[S [NP Mary] [AUX was] [VP [V seen] [NP t]]]

35

TRACES AND OTHER ABSTRACT ELEMENTS

But the biggest plus for traces and other empty elements was that they seemed to allow for the unification of constraints on movement and constraints on anaphora.

36

TRACES AND OTHER ABSTRACT ELEMENTS

Mary helped herself and

Mary seemed t to be happy

are grammatical for the same reason.

*Mary asked [John to help herself] and

*Mary seemed [to be true t to be happy]

are ungrammatical for the same reason.

37

MOVE-α

By the mid-1970s the transformational component had been ‘cleaned up’ to the point where Chomsky suggested that there could be one all-purpose movement rule, Move-α.

38

MOVE-α

[Diagram: Passive, Dative, wh-movement, Extraposition, Scrambling, and Subject-Aux Inversion are all subsumed under Move-α]

39

ALL INTERPRETATION NOW ON THE SURFACE

Note that traces allow all interpretation to take place on the surface.

John_i was seen t_i by Mary

The trace of John marks its original D-Structure position.

The model on the next slide (also called the Extended Standard Theory) was dominant from the mid 1970s to the mid 1990s.

40

41

THE GOVERNMENT-BINDING THEORY (GB)

The next major step forward in syntactic theory was the Government-Binding Theory (GB).

Chomsky’s Lectures on Government and Binding (LGB), published in 1981, synthesized the results of the previous decade.

42

THE GOVERNMENT-BINDING THEORY (GB)

GB posited that a grammar is a set of interacting principles.

Movement applies freely, constrained by these principles.

The principles are:

Bounding
Government
Theta-theory
Binding
Case
Control
X-bar

43

THE GOVERNMENT-BINDING THEORY (GB)

Bounding: The principles that determine how far an element can move. Subjacency is the most important Bounding principle.

Government: The relation between a head and its dependent element, e.g. V and NP, V and PP, INFL and the subject position.

The centrepiece of government was the Empty Category Principle: Every empty element needs to be governed in a particularly strong way.

44

THE GOVERNMENT-BINDING THEORY (GB)

*Who did you wonder if solved the problem? is an ECP violation (note that it does not violate Subjacency).

45

THE GOVERNMENT-BINDING THEORY (GB)

Theta-theory: governs the positioning of arguments (elements that have thematic roles).

A consequence of theta-theory: an element can move only into a position that is not assigned a semantic role:

it seems [John is willing to help]

John_i seems [t_i willing to help]

This movement is possible because seem does not assign a thematic role to its subject.

46

THE GOVERNMENT-BINDING THEORY (GB)

Binding theory: Governs the relationship between an element and its antecedent:

Principle A: An anaphor must be bound in its governing category: They_i like each other_i, but not *They_i think that Mary likes each other_i.

Principle B: A pronominal must be free in its governing category: John_i likes him_j, but not *John_i likes him_i.

Principle C: A referring expression must be free everywhere: *He_i thinks that John_i is smart.
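The three binding principles can be stated as a small decision procedure. In the standard formulation an anaphor must be bound in its governing category, a pronominal must be free there, and a referring expression must be free everywhere. The Python sketch below encodes just that. It is heavily simplified: whether an element is bound, and whether the binder is inside the governing category, is annotated by hand instead of being computed from c-command in a tree, and the function name `satisfies_binding` is hypothetical.

```python
def satisfies_binding(kind, bound_locally, bound_anywhere):
    """Apply Principles A, B, and C to one hand-annotated noun phrase."""
    if kind == "anaphor":        # Principle A: bound in its governing category
        return bound_locally
    if kind == "pronominal":     # Principle B: free in its governing category
        return not bound_locally
    if kind == "r-expression":   # Principle C: free everywhere
        return not bound_anywhere
    raise ValueError(f"unknown kind: {kind}")

# (sentence, kind of the relevant NP, bound locally?, bound anywhere?)
# The annotations are assumptions made by hand for this sketch.
EXAMPLES = [
    ("They_i like each other_i", "anaphor", True, True),
    ("*They_i think that Mary likes each other_i", "anaphor", False, True),
    ("John_i likes him_j", "pronominal", False, False),
    ("*John_i likes him_i", "pronominal", True, True),
    ("*He_i thinks that John_i is smart", "r-expression", False, True),
]

judgements = [satisfies_binding(k, loc, anyw) for _, k, loc, anyw in EXAMPLES]
```

The computed judgements line up with the stars on the example sentences: a sentence is starred exactly when the function returns False.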

47

THE GOVERNMENT-BINDING THEORY (GB)

Case theory: Every NP must be Case-marked:

e was seen John

John_i was seen e_i

John has to move because participles do not assign Case. (It only looks like Passive is obligatory.)

Control theory: The relationship between an antecedent and PRO (John_i wants PRO_i to leave).

48

THE GOVERNMENT-BINDING THEORY (GB)

X-bar theory: The principles that govern phrase structure:

Functional and lexical categories

49

THE GOVERNMENT-BINDING THEORY (GB)

GOVERNMENT THEORY

BINDING THEORY

CASE THEORY

THETA-THEORY

X-BAR THEORY

BOUNDING THEORY

ETC.

A SENTENCE IS THE PRODUCT OF THE INTERACTION OF THE DIFFERENT PRINCIPLES

50

PARAMETERS

The principles are parameterized: (Ideally) by allowing a small number of principles each to have a small number of settings, the superficially complex differences among the world’s languages can be accounted for.

LUIGI RIZZI

51

PARAMETERS

Rizzi noticed that extraction is more permissive in Italian than in English.

In Italian the literal equivalent of English *What did you wonder where John put? is grammatical.

Rizzi proposed that Subjacency is parameterized:

Bounding nodes in English: IP and DP

Bounding nodes in Italian: CP and DP

52

PARAMETERS

Other languages are more restrictive than English.

Russian has Wh-Movement, but the wh-element cannot be extracted from its clause.

So in Russian you can say things like Who did you see?, but not *Who did you ask Mary to see?

Therefore in Russian, the bounding nodes are both CP and IP.
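The parameterization of Subjacency can be sketched as a lookup table of bounding nodes per language, using the settings given above for English, Italian, and Russian. The paths below are hand annotations of which nodes each movement step crosses. They are assumptions made for this illustration, as are the names `BOUNDING` and `grammatical`.

```python
# Bounding-node settings as given in the lecture.
BOUNDING = {
    "English": {"IP", "DP"},
    "Italian": {"CP", "DP"},
    "Russian": {"CP", "IP"},
}

def grammatical(language, steps):
    """True if no movement step crosses more than one bounding node."""
    nodes = BOUNDING[language]
    return all(
        sum(label in nodes for label in crossed) <= 1
        for crossed in steps
    )

# Hand-annotated movement paths (assumed for this sketch):
SIMPLE_QUESTION = [["IP"]]                # Who did you see?
LONG_EXTRACTION = [["IP"], ["CP", "IP"]]  # Who did you ask Mary to see?
WH_ISLAND = [["IP", "CP", "IP"]]          # What did you wonder where John put?

table = {
    lang: (
        grammatical(lang, SIMPLE_QUESTION),
        grammatical(lang, LONG_EXTRACTION),
        grammatical(lang, WH_ISLAND),
    )
    for lang in BOUNDING
}
```

One principle with three settings yields three different extraction patterns: only Italian allows the wh-island case, and only Russian blocks extraction from an embedded clause.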

53

PARAMETERS

But what about languages like Chinese that appear not to have any Wh-Movement (wh-in situ languages)?

Zhangsan xiang-zhidao [Lisi mai-le shenme] (Chinese)
Zhangsan wonder Lisi bought what
‘Zhangsan wonders what Lisi bought’

John-ga dare-o butta ka (Japanese)
John-SU who-OB hit Q
‘Whom did John hit?’ (compare: John-ga Bill-o butta ‘John hit Bill’)

54

PARAMETERS

James Huang has worked out the parameters for Chinese and typologically similar languages:

JAMES HUANG

55

PARAMETERS

Huang has proposed a ‘Question Movement Parameter’:

In languages like English, Italian, and Russian, Wh-Movement applies in the overt syntax.

In languages like Chinese and Japanese, Wh-Movement applies covertly in LF:

56

PARAMETERS

So the first sentence below would have the LF representation under it:

Zhangsan xiang-zhidao [Lisi mai-le shenme]
Zhangsan wonder Lisi bought what
‘Zhangsan wonders what Lisi bought’

Zhangsan xiang-zhidao [CP shenme_i [IP Lisi mai-le t_i]]
Zhangsan wonder what Lisi bought

57

PARAMETERS

Why believe in LF movement?

Because interpretations in wh-in-situ languages (often!) obey at least some island constraints.

*Ni xiangxin Lisi weisheme lai de shuofa
you believe [the claim [that [Lisi came why]]]
‘*Why do you believe the claim that Lisi came ___?’

*John-wa Mary-ga naze sore-o katta kadooka siritagatte iru no
John wants to know [whether [Mary bought it why]]
‘Why does John want to know whether Mary bought it ___?’

This can be captured if LF Wh-Movement is subject to these constraints.

58

PARAMETERS

Another parametric difference among languages:

No null arguments (English, French): It is raining, *is raining; Mary left, *left

Null subjects (Spanish, Italian): llueve (‘It is raining’); comió la manzana (‘He/She ate the apple’)

Null subjects are usually analyzed as the empty pronominal ‘pro’

59

PARAMETERS

Both null subjects and null objects (Chinese):

Zhangsan_i xiwang [e_i keyi kanjian Lisi]
Zhangsan hope can see Lisi
‘Zhangsan hopes that he can see Lisi’

Zhangsan_i shuo Lisi kanjian-le e_i
Zhangsan say Lisi see LE
‘Zhangsan said Lisi saw him’

• Huang analysed the empty position as a null topic (old information, salient in discourse).

60

PARAMETERS

The Head Parameter has also been historically very important.

Joseph Greenberg’s 1963 paper launched modern typology.

JOSEPH GREENBERG, 1915-2001

61

PARAMETERS

The Greenbergian correlations:

VO correlate                          OV correlate
adposition - NP                       NP - adposition
copula verb - predicate               predicate - copula verb
‘want’ - VP                           VP - ‘want’
tense/aspect auxiliary verb - VP      VP - tense/aspect auxiliary verb
negative auxiliary - VP               VP - negative auxiliary
complementizer - S                    S - complementizer
question particle - S                 S - question particle
adverbial subordinator - S            S - adverbial subordinator
article - N'                          N' - article
plural word - N'                      N' - plural word
noun - genitive                       genitive - noun
noun - relative clause                relative clause - noun
adjective - standard of comparison    standard of comparison - adjective
verb - PP                             PP - verb
verb - manner adverb                  manner adverb - verb

62

PARAMETERS

The correlations are generally captured by the Head parameter: A language is either head-initial (VO) or head-final (OV).

The problem: Many, probably most, languages are not completely consistent.

For example, Chinese is consistently head-final except in the rule expanding X’ to X0 (if the head is verbal it precedes the complement).

63

PARAMETERS

So Chinese manifests the ordering V-NP, but NP-N:

you sange ren mai-le shu
HAVE three man buy-ASP book
‘Three men bought books’

Zhangsan de sanben shu
Zhangsan DE three book
‘Zhangsan’s three books’

64

PARAMETERS

The usual assumption has been that ‘inconsistent’ languages have more complex grammars than ‘consistent’ languages.

So Huang has suggested that Chinese has a more complicated X-bar schema to ‘pay’ for its inconsistency:

XP → YP X'

X' → X0 YP iff X = [+v]

X' → YP X0 otherwise
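The conditional X-bar schema above amounts to a small linearization rule, sketched here in Python. The function name `project` and the abstract labels used in the examples are assumptions for illustration. The sketch only orders a specifier, a head, and a complement and does not build full trees.

```python
def project(specifier, head, complement, verbal):
    """Linearize one phrase under the schema attributed to Huang.

    XP -> YP X'        (the specifier comes first)
    X' -> X0 YP        iff X = [+v]  (a verbal head precedes its complement)
    X' -> YP X0        otherwise     (any other head follows it)
    """
    x_bar = [head, complement] if verbal else [complement, head]
    return [specifier] + x_bar

verbal_phrase = project("Spec", "V", "NP", verbal=True)
nominal_phrase = project("Spec", "N", "NP", verbal=False)
```

The extra `verbal` condition is the added complexity that the schema uses to ‘pay’ for the mixed ordering: a fully consistent head-final language would need only the last line.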

65

PARAMETERS

Lisa Travis has suggested a different way of handling the inconsistent ordering of Chinese.

LISA TRAVIS

Normally, if a language is head-final, it assigns Case and theta-role to the left, as in (a). However, Chinese has a special setting (b) that violates this default ordering.

a. Unmarked setting: HEAD-RIGHT & THETA-ASSIGNMENT TO LEFT & CASE-ASSIGNMENT TO LEFT

b. Marked setting (Chinese): HEAD-RIGHT & THETA-ASSIGNMENT TO RIGHT & CASE-ASSIGNMENT TO RIGHT

66

PARAMETERS

Some other important parameters:

• Serial verbs (YES, as in Chinese; NO, as in English)

• Polysynthesis (YES, as in Inuit and Athabaskan; NO, as in English and Chinese)

• Accusative or Ergative (Accusative, as in English and Chinese; Ergative, as in Dyirbal and Georgian)

67

PARAMETERS

68

PARAMETERS

Parameterized principles came to play such an important role that the Government-Binding theory is sometimes called the ‘Principles-and-Parameters’ approach.

At one point, syntacticians were confident that acquiring a language was just a matter of finding the right ON-OFF settings for each language.

69

PARAMETERS

The theory of parameters is in difficulty now:

• One might need hundreds or even thousands of them.

• The clustering effects have not worked out very well.

• They are out of spirit with the Minimalist Program.

• There have been recent attempts to derive them from the process of language acquisition.

70

THE MINIMALIST PROGRAM

Starting in the 1990s, the Government-Binding theory has been gradually replaced by the Minimalist Program.

But the MP is in many ways not an abrupt change of direction from GB.

Many conceptions from GB were incorporated directly into the MP.

71

THE MINIMALIST PROGRAM

Everybody knew that there was huge redundancy in the scope of the GB principles.

Binding, bounding, Case, theta, etc. overlapped considerably in their domains.

Some ungrammatical sentences were ruled out by 3 or 4 different principles!

So it became clear that it was desirable to reduce the number of principles.

72

THE MINIMALIST PROGRAM

Many of the principles seemed to have a ‘least-effort/economy’ essence.

That is, they moved elements as short a distance as possible …

… or they looked at only the closest possible relationship between an anaphor and its antecedent or between a gap and its filler.

That seemed to suggest that ‘formal economy’ should be at the centre of the theory.

73

THE MINIMALIST PROGRAM

Other things that were important in the early theory no longer seemed so important.

The levels of D(eep)-Structure and S(urface)-Structure seemed to be doing less and less work.

X-bar theory seemed to follow from independent principles.

More and more generalisations seemed to apply at the interfaces with PF and LF, rather than in the course of the derivation.

74

THE MINIMALIST PROGRAM

Chomsky’s big idea: Rebuild grammatical theory from the ‘bottom up’: Start with only what we know is necessary and go from there.

That’s why it is called the ‘Minimalist Program’.

The idea that language might be ‘perfect’ is a leading idea of the MP.

If that is true, language is unlike all other known biological systems.

75

THE MINIMALIST PROGRAM

• The Minimalist Program is committed to probing to what extent the human language faculty is an optimal solution to minimal design specifications.

• The hope is that the only grammatical processes are those that are subject to ‘virtual conceptual necessity’.

• Notice that one consequence is that there is a much smaller innate UG.

76

THE MINIMALIST PROGRAM

has no level of D-Structure or S-Structure

leaves a more important role for the semantic and phonetic interfaces

So processes that used to be considered syntax-internal, like binding, bounding, etc., are now handled at LF or at PF.

77

THE MINIMALIST PROGRAM

has only one structure-building operation, namely ‘Merge’. In other words, recursion is all that there is in the narrow syntax.

Sentences are built from the ‘bottom up’, in the manner of categorial grammar.

Movement is considered to be ‘Internal Merge’, that is, the merging (expansion) of an element already in the derivation.
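Merge and Internal Merge can be illustrated with a short Python sketch. It models Merge as unlabelled two-member set formation, which is one common simplification and an assumption here, not a claim about how the lecture formalizes it. The function names and the toy derivation are likewise invented for illustration.

```python
def merge(a, b):
    """Merge: combine two syntactic objects into one unordered set."""
    return frozenset([a, b])

def contains(structure, item):
    """True if item is the structure itself or occurs somewhere inside it."""
    if structure == item:
        return True
    if isinstance(structure, frozenset):
        return any(contains(part, item) for part in structure)
    return False

def internal_merge(structure, item):
    """Internal Merge ('movement'): re-merge an item already in the structure."""
    if not contains(structure, item):
        raise ValueError("internal merge needs an element already in the derivation")
    return merge(item, structure)

# Bottom-up toy derivation (lexical items chosen only for illustration).
vp = merge("see", "who")         # {see, who}
tp = merge("you", vp)            # {you, {see, who}}
cp = internal_merge(tp, "who")   # {who, {you, {see, who}}}
```

In this sketch "who" occurs twice in `cp`, once at the top and once in its original position inside `vp`. Movement is therefore not a separate operation: it is the same `merge` call applied to something the derivation already contains.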

78

THE MINIMALIST PROGRAM

79

THE MINIMALIST PROGRAM

appeals to ‘third-factor’ explanations: those that are based on factors outside of universal grammar.

The reason for that is clear — the more that you remove from UG, the more that other systems are going to need to take over the work.

So maybe economy principles arise from pressure for efficient computation and have nothing to do with UG.

80

THE MINIMALIST PROGRAM

The next few classes will go into more detail about the MP, its strengths and its weaknesses.

One thing to keep in mind: If the narrow syntax is not accounting for grammatical complexity, then what is?

The answer: the lexicon and the interface components.

If so, does the MP lead to an overall simplification?

81

LEXICALIST APPROACHES

Not all formal linguists work in ‘Chomskyan’ syntax.

Recall that the idea was to impose more and more constraints on syntactic transformations.

By the late 1970s, some linguists proposed ‘constraining’ transformations out of existence.

These became (super)-lexicalist approaches.

82

LEXICALIST APPROACHES

Head-Driven Phrase Structure Grammar (HPSG)

IVAN SAG, 1949-2013

83

LEXICALIST APPROACHES

Lexical Functional Grammar (LFG)

JOAN BRESNAN

84

LEXICALIST APPROACHES

A wide range of approaches called ‘Construction Grammar’

ADELE GOLDBERG
