
What cross-linguistic variation tells us about information density in on-line processing


Page 1: What cross-linguistic variation tells us about information density in on-line processing

What cross-linguistic variation tells us about information density in on-line processing

John A. Hawkins

UC Davis & University of Cambridge

Page 2: What cross-linguistic variation tells us about information density in on-line processing

Patterns of variation across languages provide relevant evidence for current issues in psychology on information density in on-line processing.


Page 3: What cross-linguistic variation tells us about information density in on-line processing

Some background, first of all.

I have argued (Hawkins 1994, 2004, 2009, to appear) for a ‘Performance-Grammar Correspondence Hypothesis’:


Page 4: What cross-linguistic variation tells us about information density in on-line processing

Performance-Grammar Correspondence Hypothesis (PGCH)

Languages have conventionalized grammatical properties in proportion to their degree of preference in performance, as evidenced by patterns of selection in corpora and by ease of processing in psycholinguistic experiments.

Page 5: What cross-linguistic variation tells us about information density in on-line processing

I.e. languages have conventionalized or ‘fixed’ in their grammars the same kinds of preferences and principles that we see in performance,

esp. in those languages in which speakers have alternatives to choose from in language use.

Page 6: What cross-linguistic variation tells us about information density in on-line processing

E.g. between:

alternative word orders

relative clauses with or without a relativizer,

with a gap or a resumptive pronoun

extraposed vs non-extraposed phrases

‘Heavy’ NP Shift or no shift

alternative ditransitive constructions

zero vs non-zero case markers

and so on

Page 7: What cross-linguistic variation tells us about information density in on-line processing

The patterns and principles found in these selections are, according to the PGCH, the same patterns and principles that we see in grammars in languages with fewer conventionalized options (more fixed orderings, gaps only in certain relativization environments, etc).

Page 8: What cross-linguistic variation tells us about information density in on-line processing

If so, linguists developing theories of grammar and of typological variation need to look seriously at theories of processing, in order to understand which structures are selected in performance, when, and why, with the result that grammars come to conventionalize these, and not other, patterns.

See Hawkins (2004, 2009, to appear)

Page 9: What cross-linguistic variation tells us about information density in on-line processing

Conversely, psychologists need to look at grammars and at cross-linguistic variation in order to see what they tell us about processing, since grammars are conventionalized processing preferences.

Page 10: What cross-linguistic variation tells us about information density in on-line processing

Alternative variants across grammars are also, by hypothesis, alternatives for efficient processing.

And the frequency with which these alternatives are conventionalized is, again by hypothesis, correlated with their degree of preference and efficiency in processing.

Page 11: What cross-linguistic variation tells us about information density in on-line processing

Looking at grammatical variation from a processing perspective can be revealing, therefore.

Page 12: What cross-linguistic variation tells us about information density in on-line processing

E.g. Japanese, Korean, and Dravidian languages do not move heavy and complex phrases to the end of their clauses, as English does; they move them to the beginning, in proportion to their (relative) complexity.

If your psychological model predicts that all languages should be like English, then you need to go back to the drawing board and look at these different grammars, and at their performance, before you define and test your model further.

Page 13: What cross-linguistic variation tells us about information density in on-line processing

Which brings me to today’s topic:

What do grammars and typological variation tell us about information density in on-line processing?

Page 14: What cross-linguistic variation tells us about information density in on-line processing

Let us define Information as:

the set of linguistic forms {F} (phonemes, morphemes, words, etc) and the set of properties {P} (ultimately semantic properties in a semantic representation) that are assigned to them by linguistic convention and in processing.

Page 15: What cross-linguistic variation tells us about information density in on-line processing

Let us define Density as:

the number of these forms and properties that are assigned at a particular point in processing, i.e. the size of a given {Fi}-{Pi} pairing at point … i … in on-line comprehension or production.
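To make the definition concrete, here is a minimal Python sketch (my own illustration; the example forms and properties are invented) that treats density at a point i simply as the size of the {Fi}-{Pi} pairing assigned at that point.

```python
# Minimal sketch (illustration only): density at processing point i is the
# size of the {Fi}-{Pi} pairing, i.e. how many forms plus how many properties
# are assigned at that point.

def density(forms_at_i, properties_at_i):
    """Number of forms plus properties assigned at a single on-line point."""
    return len(forms_at_i) + len(properties_at_i)

# Invented example: at the word "teach" the processor might assign the word
# itself plus several syntactic and semantic properties.
forms_i = ["teach"]
properties_i = ["V", "VP constructed", "Agent = I", "Patient = the students"]

print(density(forms_i, properties_i))  # -> 5
```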

Page 16: What cross-linguistic variation tells us about information density in on-line processing

I see evidence for two very general and complementary principles of information density in cross-linguistic patterns.


Page 17: What cross-linguistic variation tells us about information density in on-line processing

First, minimize {Fi}

minimize the set {Fi} required for the assignment of a particular Pi or {Pi}

I.e. minimize the number of linguistic forms that need to be processed at each point in order to assign a given morphological, syntactic or semantic property or set of properties to these forms on-line.


Page 18: What cross-linguistic variation tells us about information density in on-line processing

The conditions that determine the degree of permissible minimization can be inferred from the patterns themselves and essentially involve efficiency and ease of processing in the assignment of {Pi} to {Fi}.


Page 19: What cross-linguistic variation tells us about information density in on-line processing

Examples will be given from morphological hierarchies and from syntactic patterns such as word order and filler-gap dependencies.

Page 20: What cross-linguistic variation tells us about information density in on-line processing

Second, maximize {Pi}

maximize the set {Pi} that can be assigned to a particular Fi or {Fi}.

I.e. select and arrange linguistic forms so that as many as possible of their (correct) syntactic and semantic properties can be assigned to them at each point in on-line processing.

Page 21: What cross-linguistic variation tells us about information density in on-line processing

A set of linear ordering universals will be presented in which category A is systematically preferred before B regardless of language type, i.e. A + B. Positioning B first would always result in incomplete or incorrect assignments of properties to B on-line, whereas positioning it after A permits the full assignment of properties to B at the time it is processed.

These universals provide systematic evidence for maximize {Pi}.


Page 22: What cross-linguistic variation tells us about information density in on-line processing

Consider first some grammatical patterns from morphology that support the minimize {Fi} principle:

minimize the set {Fi} required for the assignment of a particular Pi or {Pi}

Page 23: What cross-linguistic variation tells us about information density in on-line processing

In Hawkins (2004) I formulated the following principle of form minimization based on parallel data from cross-linguistic variation and language-internal selection patterns.

Page 24: What cross-linguistic variation tells us about information density in on-line processing

Minimize Forms (MiF)

The human processor prefers to minimize the formal complexity of each linguistic form F (its phoneme, morpheme, word or phrasal units) and the number of forms with unique conventionalized property assignments, thereby assigning more properties to fewer forms. These minimizations apply in proportion to the ease with which a given property P can be assigned in processing to a given F.

Page 25: What cross-linguistic variation tells us about information density in on-line processing

The basic premise of MiF is that the processing of linguistic forms and their conventionalized property assignments requires effort. Minimizing the forms required for property assignments is efficient since it reduces that effort by fine-tuning it to information that is already active in processing through accessibility, high frequency, and inferencing strategies of various kinds.

Page 26: What cross-linguistic variation tells us about information density in on-line processing

MiF is visible in two sets of variation data across and within languages.

The first involves complexity differences between surface forms (morphology and syntax), with preferences for minimal expression (e.g. zero morphemes) in proportion to their frequency of occurrence and hence ease of processing through degree of expectedness (cf. Levy 2008, Jaeger 2006).

Page 27: What cross-linguistic variation tells us about information density in on-line processing

E.g. singular number for nouns is much more frequent than plural; absolutive case is more frequent than ergative.

Correspondingly, singularity on nouns is expressed by shorter or equal morphemes, often zero (cf. English cat vs. cat-s), almost never by more. Similarly for absolutive and ergative case marking.

Page 28: What cross-linguistic variation tells us about information density in on-line processing

A second data pattern captured in MiF involves the number and nature of lexical and grammatical distinctions that languages conventionalize.

The preferences are again in proportion to their efficiency, including frequency of use.

Page 29: What cross-linguistic variation tells us about information density in on-line processing

There are preferred lexicalization patterns across languages.

Certain grammatical distinctions are cross-linguistically preferred:

certain numbers on nouns

certain tenses

aspects

causativity

some basic speech act types

thematic roles like Agent, Patient

etc

Page 30: What cross-linguistic variation tells us about information density in on-line processing

The result is numerous ‘hierarchies’ of lexical and grammatical patterns

E.g. the famous color term hierarchy of Berlin & Kay (1969), and the Greenbergian morphological hierarchies

Page 31: What cross-linguistic variation tells us about information density in on-line processing

Where we have comparative performance and grammatical data for these hierarchies it is very clear that the grammatical rankings (e.g. Singular > Plural) correspond to a frequency/ease of processing ranking, with higher positions receiving less or equal formal marking and more or equal unique forms for the expression of that category alone.

Page 32: What cross-linguistic variation tells us about information density in on-line processing

Form Minimization Prediction 1

The formal complexity of each F is reduced in proportion to the frequency of that F and/or the processing ease of assigning a given P to a reduced F (e.g. to zero).

Page 33: What cross-linguistic variation tells us about information density in on-line processing

The cross-linguistic effects of this can be seen in the following Greenbergian (1966) morphological hierarchies (with reformulations and revisions by the authors shown):

Page 34: What cross-linguistic variation tells us about information density in on-line processing

Sing > Plur > Dual > Trial/Paucal (for number)

[Greenberg 1966, Croft 2003]

Nom/Abs > Acc/Erg > Dat > Other (for case marking)

[Primus 1999]

Masc,Fem > Neut (for gender) [Hawkins 2004]

Positive > Comparative > Superlative [Greenberg 1966]


Page 35: What cross-linguistic variation tells us about information density in on-line processing

Greenberg pointed out that these grammatical hierarchies define performance frequency rankings for the relevant properties in each domain.

The frequencies of number inflections on nouns in a corpus of Sanskrit, for example, were:

 

Singular = 70.3%; Plural = 25.1%; Dual = 4.6%


Page 36: What cross-linguistic variation tells us about information density in on-line processing

By MiF Prediction 1 we therefore expect:

For each hierarchy H the amount of formal marking (i.e. phonological and morphological complexity) will be greater or equal down each hierarchy position.

Page 37: What cross-linguistic variation tells us about information density in on-line processing

E.g. in (Austronesian) Manam:

3rd Singular suffix on nouns = 0

3rd Plural suffix = -di,

3rd Dual suffix = -di-a-ru

3rd Paucal = -di-a-to (Lichtenberk 1983)

The amount of formal marking increases from singular to plural, and from plural to dual, and is equal from dual to paucal, in accordance with the hierarchy prediction.
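MiF Prediction 1 amounts to a monotonicity check over a hierarchy. Below is a small Python sketch (my own illustration, using the Manam suffixes just cited and string length as a crude proxy for formal complexity) of what such a check looks like.

```python
# Sketch of MiF Prediction 1 (illustration only): down a hierarchy, the amount
# of formal marking should be greater than or equal at each successive position.
# String length of the suffix is used here as a crude proxy for complexity.

def marking_is_monotonic(hierarchy, suffixes):
    """True if suffix length never decreases down the hierarchy."""
    lengths = [len(suffixes[category]) for category in hierarchy]
    return all(a <= b for a, b in zip(lengths, lengths[1:]))

# 3rd person suffixes on Manam nouns (Lichtenberk 1983); zero marking = "".
manam_hierarchy = ["singular", "plural", "dual", "paucal"]
manam_suffixes = {"singular": "", "plural": "-di", "dual": "-di-a-ru", "paucal": "-di-a-to"}

print(marking_is_monotonic(manam_hierarchy, manam_suffixes))  # -> True
```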


Page 38: What cross-linguistic variation tells us about information density in on-line processing

Form Minimization Prediction 2

The number of unique F:P pairings in a language is reduced by grammaticalizing or lexicalizing a given F:P in proportion to the frequency and preferred expressiveness of that P in performance.

Page 39: What cross-linguistic variation tells us about information density in on-line processing

In the lexicon the property associated with teacher is frequently used in performance, that of teacher who is late for class much less so. The event of X hitting Y is frequently selected, that of X hitting Y with X’s right hand less so.

The more frequently selected properties are conventionalized in single lexemes or unique categories and constructions. Less frequently used properties must then be expressed through word and phrase combinations and their meanings must be derived by semantic composition.


Page 40: What cross-linguistic variation tells us about information density in on-line processing

This makes the expression of more frequently used meanings shorter, that of less frequently used meanings longer, and this pattern matches the first pattern of less versus more complexity in the surface forms themselves correlating with relative frequency.

Both patterns make utterances shorter and the communication of meanings more efficient overall, which is why I have collapsed them both into one common Minimize Forms principle.


Page 41: What cross-linguistic variation tells us about information density in on-line processing

By MiF Prediction 2 we expect:

For each hierarchy H (A > B > C) if a language assigns at least one morpheme uniquely to C, then it assigns at least one uniquely to B; if it assigns at least one uniquely to B, it does so to A.


Page 42: What cross-linguistic variation tells us about information density in on-line processing

E.g. a distinct Dual implies a distinct Plural and Singular in the grammar of Sanskrit.

A distinct Dative implies a distinct Accusative and Nominative in the case grammar of Latin and German

(or a distinct Ergative and Absolutive in Basque, cf. Primus 1999).
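The implicational pattern behind this prediction can also be phrased as a simple check: once unique marking stops at some point down a hierarchy, it must not resume lower down. A minimal Python sketch follows (my own illustration; the inventories are schematic, not real language descriptions).

```python
# Sketch of the implicational pattern behind MiF Prediction 2 (illustration
# only): a unique form low on a hierarchy implies unique forms at every
# higher position, so marking may stop but may not skip a position.

def respects_hierarchy(hierarchy, has_unique_form):
    """True if every position above a uniquely marked position is also marked."""
    marked = [has_unique_form[category] for category in hierarchy]
    return all(marked[i] or not any(marked[i + 1:]) for i in range(len(marked)))

number_hierarchy = ["singular", "plural", "dual", "trial/paucal"]

sanskrit_like = {"singular": True, "plural": True, "dual": True, "trial/paucal": False}
unattested    = {"singular": True, "plural": False, "dual": True, "trial/paucal": False}

print(respects_hierarchy(number_hierarchy, sanskrit_like))  # -> True
print(respects_hierarchy(number_hierarchy, unattested))     # -> False (gap in the middle)
```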


Page 43: What cross-linguistic variation tells us about information density in on-line processing

A unique number or case assignment low in the hierarchy implies unique and differentiated numbers and cases in all higher positions.


Page 44: What cross-linguistic variation tells us about information density in on-line processing

I.e. grammars prioritize categories for unique formal expression in each of these areas in proportion to their relative frequency and preferred expressiveness.

This results in these hierarchies for conventionalized categories whereby languages with fewer categories match the performance frequency rankings of languages with many.

Page 45: What cross-linguistic variation tells us about information density in on-line processing

By MiF Prediction 2 we also expect:

For each hierarchy H any combinatorial features that partition references to a given position on H will result in fewer or equal morphological distinctions down each lower position of H.


Page 46: What cross-linguistic variation tells us about information density in on-line processing

E.g. when gender features combine with and partition number, unique gender-distinctive pronouns often exist for the singular and not for the plural

English he/she/it vs they

the reverse uniqueness is not found (i.e. with a gender-distinctive plural, but gender-neutral singular).


Page 47: What cross-linguistic variation tells us about information density in on-line processing

More generally MiF Prediction 2 leads to a general principle of cross-linguistic morphology:

 

Morphologization

A morphological distinction will be grammaticalized in proportion to the performance frequency with which it can uniquely identify a given subset of entities {E} in a grammatical and/or semantic domain D.

Page 48: What cross-linguistic variation tells us about information density in on-line processing

This enables us to make sense of ‘markedness reversals’.

E.g. in certain nouns in Welsh whose referents are much more frequently plural than singular, like ‘leaves’ and ‘beans’, it is the singular form that is morphologically more complex than the plural:

deilen ("leaf") vs. dail ("leaves")

ffäen ("bean") vs. ffa ("beans")

Cf. Haspelmath (2002:244).


Page 49: What cross-linguistic variation tells us about information density in on-line processing

All of these data provide support for our minimize {Fi} principle:

minimize the set {Fi} required for the assignment of a particular Pi or {Pi}

I.e. minimize the number of linguistic forms that need to be processed at each point in order to assign a given morphological, syntactic or semantic property or set of properties to these forms on-line.


Page 50: What cross-linguistic variation tells us about information density in on-line processing

Either the surface forms of the morphology are reduced, in proportion to frequency and/or ease of processing.

Or lexical and grammatical categories are given priority for unique formal expression, in proportion to frequency and/or preferred expression, resulting in reduced morpheme and word combinations for their expression.

Page 51: What cross-linguistic variation tells us about information density in on-line processing

The result of both is more minimal forms in proportion to frequency/ease of processing/preferred expressiveness, i.e. fewer and shorter forms for the expression of the speakers’ preferred meanings in performance.

Page 52: What cross-linguistic variation tells us about information density in on-line processing

Consider now some patterns from syntax that support the minimize {Fi} principle:

minimize the set {Fi} required for the assignment of a particular Pi or {Pi}

Page 53: What cross-linguistic variation tells us about information density in on-line processing

In Hawkins (2004) I formulated a second minimization principle for the combination of forms and dependencies between them based on parallel data from cross-linguistic variation and language-internal selection patterns: Minimize Domains (MiD).

Page 54: What cross-linguistic variation tells us about information density in on-line processing

Minimize Domains (MiD)

The human processor prefers to minimize the connected sequences of linguistic forms and their conventionally associated syntactic and semantic properties in which relations of combination and/or dependency are processed.

Page 55: What cross-linguistic variation tells us about information density in on-line processing

E.g. in order to recognize how the words of a sentence are grouped together into phrases and into a hierarchical tree structure, the human parser prefers to access the smallest possible linear string of words that enables it to make each phrase structure decision:

the principle of Early Immediate Constituents (EIC) (Hawkins 1994).

Page 56: What cross-linguistic variation tells us about information density in on-line processing

More generally, the processing of all syntactic and semantic relations prefers minimal domains (Hawkins 2004).

Page 57: What cross-linguistic variation tells us about information density in on-line processing

Minimize Domains predicts that each Phrasal Combination Domain (PCD) should be as short as possible.

A PCD consists of the smallest amount of surface structure on the basis of which the human processor can recognize (and produce) a mother node M and assign the correct daughter ICs to it, i.e. on the basis of which phrase structure can be processed.

Page 58: What cross-linguistic variation tells us about information density in on-line processing

Some linear orderings reduce the number of words and their associated properties that need to be accessed for this purpose.

The degree of this preference is proportional to the minimization difference for the same PCDs in competing orderings.

Page 59: What cross-linguistic variation tells us about information density in on-line processing

I.e. linear orderings should be preferred that minimize PCDs by maximizing their “IC-to-word” ratios.

The result will be a preference for short before long phrases in head-initial languages like English.

Page 60: What cross-linguistic variation tells us about information density in on-line processing

(1) a. The man vp[waited pp1[for his son] pp2[in the cold but not unpleasant wind]]

    b. The man vp[waited pp2[in the cold but not unpleasant wind] pp1[for his son]]

The three items, V, PP1, PP2, can be recognized and constructed on the basis of five words in (1a), compared with nine in (1b), assuming that (head) categories such as P immediately project to mother nodes such as PP, enabling the parser to construct them on-line.

(1a) VP PCD: IC-to-word ratio of 3/5 = 60%
(1b) VP PCD: IC-to-word ratio of 3/9 = 33%
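The IC-to-word ratio in (1) can be computed mechanically. The Python sketch below (my own illustration) assumes, as above, that each immediate constituent of the VP is constructed by its first word (V, P, etc.); for head-final phrases the constructing word would instead be the last.

```python
# Sketch of the EIC / Minimize Domains calculation (illustration only).
# The VP's Phrasal Combination Domain runs from its first word to the word
# that constructs its last immediate constituent (IC); the IC-to-word ratio
# is the number of ICs divided by the number of words in that span.

def pcd_ratio(ics):
    """ics: the VP's ICs in surface order, each given as its list of words.
    Assumes each IC is constructed by its first word (head-initial phrases)."""
    words_before_last_ic = sum(len(ic) for ic in ics[:-1])
    pcd_length = words_before_last_ic + 1   # + the word constructing the last IC
    return len(ics) / pcd_length

v   = ["waited"]
pp1 = ["for", "his", "son"]
pp2 = ["in", "the", "cold", "but", "not", "unpleasant", "wind"]

print(pcd_ratio([v, pp1, pp2]))   # (1a): 3 ICs / 5 words = 0.60
print(pcd_ratio([v, pp2, pp1]))   # (1b): 3 ICs / 9 words = 0.33...
```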

Page 61: What cross-linguistic variation tells us about information density in on-line processing

For experimental support (in production and comprehension) for short before long effects in English, see e.g. Stallings (1998), Gibson (1998), Wasow (2002).

Page 62: What cross-linguistic variation tells us about information density in on-line processing

A Corpus Study Testing MiD in English

Structures like (1ab) with vp{V, PP1, PP2} were examined (Hawkins 2000) in which the two PPs were permutable with truth-conditional equivalence (i.e. the speaker had a choice).

Only 15% (58/394) had long before short. Among those with at least a one-word weight difference, 82% had short before long, and there was a gradual reduction in the long before short orders the bigger the weight difference (PPS = shorter PP, PPL = longer PP):

Page 63: What cross-linguistic variation tells us about information density in on-line processing

(2)           PPL > PPS by 1 word   by 2-4      by 5-6     by 7+
[V PPS PPL]   60% (58)              86% (108)   94% (31)   99% (68)
[V PPL PPS]   40% (38)              14% (17)    6% (2)     1% (1)
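A table like (2) is the kind of summary that can be tallied directly from parsed corpus tokens: bin each {V, PP, PP} token by the weight difference between its two PPs and count which PP comes first. Here is a minimal Python sketch (my own illustration; the token list is invented, not Hawkins' 2000 data). The Japanese mirror-image pattern in (6) below could be tallied the same way, counting which IC precedes the verb.

```python
# Sketch of how a table like (2) could be tallied from corpus tokens
# (illustration only; the tokens below are invented, not Hawkins' data).
from collections import Counter

def weight_bin(diff):
    """Bin the absolute weight difference in words, as in table (2)."""
    if diff == 0: return None        # equal weight: not binned in (2)
    if diff == 1: return "1 word"
    if diff <= 4: return "2-4"
    if diff <= 6: return "5-6"
    return "7+"

def tally(tokens):
    """tokens: (words_in_first_PP, words_in_second_PP) in surface order."""
    counts = Counter()
    for first, second in tokens:
        bin_label = weight_bin(abs(first - second))
        if bin_label is None:
            continue
        order = "[V PPS PPL]" if first < second else "[V PPL PPS]"
        counts[(bin_label, order)] += 1
    return counts

invented_tokens = [(3, 7), (2, 3), (8, 2), (4, 12), (2, 9)]
print(tally(invented_tokens))
```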

Page 64: What cross-linguistic variation tells us about information density in on-line processing

For head-final languages long before short orders provide minimal domains for processing phrase structure:

(3) a. Mary ga [[kinoo John ga kekkonsi-ta to]s it-ta]vp
       Mary SU yesterday John SU married that said
       ‘Mary said that John got married yesterday’
    b. [kinoo John ga kekkonsi-ta to]s Mary ga [it-ta]vp

Page 65: What cross-linguistic variation tells us about information density in on-line processing

Why?

Because placing longer before shorter phrases in Japanese positions constructing categories or heads (V, P, Comp, etc) close, or as close as possible, to each other, each being on the right of their respective phrasal sisters.

Result: PCDs are smaller

Page 66: What cross-linguistic variation tells us about information density in on-line processing

(4) Some basic word orders of Japanese grammar

a. Taroo ga vp[tegami o kaita]           NP-V
   T. SU letter DO wrote
   'Taroo wrote a letter'

b. Taroo ga pp[Tokyo kara] ryokoosita    NP-P
   T. SU Tokyo from travelled
   'Taroo travelled from Tokyo'

c. np[[Taroo no] ie]                     Gen-N
   Taroo 's house

The heavier phrasal categories, e.g. NPs, occur to the left of their single-word (shorter) heads in Japanese, e.g. before V and P, and P and V are adjacent on the right of their respective sisters.

Page 67: What cross-linguistic variation tells us about information density in on-line processing

For experimental and corpus support for long before short phrases in Japanese and Korean when there is a plurality of phrases before V, see Hawkins (1994, 2004), Yamashita & Chang (2001, 2006), Choi (2007).

Page 68: What cross-linguistic variation tells us about information density in on-line processing

An early corpus study testing long before short in Japanese (Hawkins 1994):

[{NPo, PPm} V]

(5) a. (Tanaka ga) [[Hanako kara]pp [sono hon o]np katta]vp
       Tanaka SU Hanako from that book DO bought
       'Tanaka bought that book from Hanako'
    b. (Tanaka ga) [[sono hon o]np [Hanako kara]pp katta]vp

Page 69: What cross-linguistic variation tells us about information density in on-line processing

ICS = shorter Immediate Constituent; ICL = longer Immediate Constituent; regardless of NP or PP status

(6)           ICL > ICS by 1-2 words   by 3-4     by 5-8     by 9+
[ICS ICL V]   34% (30)                 28% (8)    17% (4)    9% (1)
[ICL ICS V]   66% (59)                 72% (21)   83% (20)   91% (10)

Data from Hawkins (1994:152), collected by Kaoru Horie.

I.e. the bigger the weight difference, the more the heavy phrase occurs to the left; the mirror-image of English.

Page 70: What cross-linguistic variation tells us about information density in on-line processing

Given these data from performance, we can now better understand:

(a) the Greenbergian word order correlations

(b) why there are two, and only two, productive word order types cross-linguistically, head-initial and head-final

(c) why and when there are “exceptional” departures from the expected head-initial and head-final orders

Page 71: What cross-linguistic variation tells us about information density in on-line processing

The "Greenbergian" word order correlations (Greenberg 1963, Dryer 1992)

(7) vp{V, pp{P, NP}}

a. vp[travels pp[to the city]]       b. [[the city to]pp travels]vp
c. vp[travels [the city to]pp]       d. [pp[to the city] travels]vp

The adjacency of V and P guarantees the smallest possible string of words for the recognition and construction of VP and its two constituents (V and PP).

Page 72: What cross-linguistic variation tells us about information density in on-line processing

Language Quantities in Matthew Dryer's (1992) Cross-linguistic Sample

(8) a. vp[V pp[P NP]] = 161 (41%)       b. [[NP P]pp V]vp = 204 (52%)
    c. vp[V [NP P]pp] = 18 (5%)         d. [pp[P NP] V]vp = 6 (2%)

Preferred (a)+(b) with consistent ‘head’ ordering = 365/389 (94%)

Page 73: What cross-linguistic variation tells us about information density in on-line processing

Both head-initial (English) and head-final (Japanese) orders can be equally efficient for processing: whether heads are adjacent to one another on the left of their respective sisters (English), or on the right (Japanese),

hence two and only two highly productive word order types, as predicted by MiD

Page 74: What cross-linguistic variation tells us about information density in on-line processing

MiD helps us to understand these cross-linguistic patterns and their frequencies. It also enables us to explain some systematic grammatical exceptions to these head-ordering universals.

Page 75: What cross-linguistic variation tells us about information density in on-line processing

Dryer (1992): there are exceptions to the preferred consistent head ordering when the category that modifies a head is a single-word item, e.g. an adjective modifying a noun (yellow book).

Page 76: What cross-linguistic variation tells us about information density in on-line processing

Many otherwise head-initial languages have non-initial heads with the adjective preceding the noun here (e.g. English), and many otherwise head-final languages have noun before adjective (e.g. Basque).

BUT when the non-head is a branching phrasal category (e.g. adjective phrase, cf. English books yellow with age) there are good correlations with the predominant head ordering.

Why?

Page 77: What cross-linguistic variation tells us about information density in on-line processing

When heads are separated by a non-branching single word, then the difference between, say,

vp[V [Adj N]np] and vp[V np[N Adj]]
[read [yellow book]] [read [book yellow]]

is short, only one word. Hence the MiD preference for noun initiality (and for noun-finality in postpositional languages) is significantly less than it is for intervening branching phrases, and either less head ordering consistency or no consistency is predicted.

Page 78: What cross-linguistic variation tells us about information density in on-line processing

English [yellow book] but [book [yellow with age]]

Romance languages have both prenominal and postnominal adjectives

French grand homme / homme grand

but postnominal adjective phrases like English

Page 79: What cross-linguistic variation tells us about information density in on-line processing

Similarly, when there is just a one-word difference between competing domains in performance, e.g. in the corpus data of English and Japanese above, both ordering options are generally productive, and so too in grammars.

Page 80: What cross-linguistic variation tells us about information density in on-line processing

Center embedding hierarchies and EIC

The more complex a center-embedded constituent and the longer the PCD for its containing phrase, the fewer languages.

E.g. in the environment pp[P np[__ N]] we have a center-embedding hierarchy, cf. Hawkins (1983).

(9) Prep lgs:   AdjN     32%      NAdj     68%
                PosspN   12%      NPossp   88%
                RelN      1%      NRel     99%

Mary traveled pp[to np[interesting cities]]       AdjN
              np[[this country’s] cities]]        PosspN
              np[[I already visited] cities]]     RelN

Page 81: What cross-linguistic variation tells us about information density in on-line processing

I.e. the Greenbergian word order universals support domain minimization and locality (Hawkins 2004, Gibson 1998).

There are minor and predicted departures from consistent ordering and head adjacency, as we have seen.

There are also certain conflicts between MiD and other ease of processing principles, e.g. Fillers before Gaps, which result in e.g. NRel in certain (non-rigid) OV languages (Hawkins 2004, to appear).

Page 82: What cross-linguistic variation tells us about information density in on-line processing

Apart from these, I see no evidence in grammars for any preference for “non-locality” of the kind that certain psycholinguists have argued for based on experimental evidence with head-final languages (e.g. Konieczny 2000, Vasishth & Lewis 2006).

E.g. Konieczny showed in a self-paced reading experiment in German that the verb is read systematically faster when an NRel precedes it, in proportion to the length of Rel.

Page 83: What cross-linguistic variation tells us about information density in on-line processing

This finding makes sense in terms of expectedness and predictability (Levy 2008, Jaeger 2006): the longer you have to wait for a verb in a verb-final structure, the more you expect to find one, making verb recognition easier.

However, Konieczny found no evidence for this facilitation at the verb in his German corpus data (Uszkoreit et al. 1998). Instead the predictions made for the relevant structures by MiD and locality were strongly confirmed.

Page 84: What cross-linguistic variation tells us about information density in on-line processing

In fact, corpus studies quite generally do not support non-locality: none of the data from numerous typologically diverse language corpora reported in Hawkins (1994, 2004) support it.

Page 85: What cross-linguistic variation tells us about information density in on-line processing

Nor do word order universals support it. The Greenbergian correlations strongly support locality, and the exceptions to Greenberg involve either small single-word non-localities or competitions with independently motivated preferences that do produce some non-localities in certain language types – but not because non-locality is a good thing!

Page 86: What cross-linguistic variation tells us about information density in on-line processing

The experimental evidence for greater ease of processing at the verb appears to be evidence, therefore, for a certain facilitation (arguably through predictability) at a single temporal point in sentence processing: it tells us nothing about processing load for the structure as a whole, and it does not implicate any preference for non-locality as such.

Page 87: What cross-linguistic variation tells us about information density in on-line processing

Corpus data appear to reflect these overall processing advantages for alternative structures within which the verb may appear early or late. The predictions for these alternations are based squarely on the preferred locality of phrasal daughters and these predictions are empirically correct (Konieczny 2000, Uszkoreit et al. 1998). Non-locality arises only when the locality demands of two phrases are in conflict and cannot be satisfied at the same time.

E.g. if N is adjacent to its Rel in German, then N is separated from a final V.

Page 88: What cross-linguistic variation tells us about information density in on-line processing

Grammars also support locality in word order universals and provide no evidence for non-locality as an independent factor.

Let us turn now to relative clauses and look at the cross-linguistic evidence for form and domain minimization in this area.

Page 89: What cross-linguistic variation tells us about information density in on-line processing

Relative clauses in many languages (e.g. Hebrew) exhibit both a 'gap' and a 'resumptive pronoun' structure:

(10) a. the students_i [that I teach O_i]        Gap
     b. the students_i [that I teach them_i]     Resumptive Pronoun

In English we find relative clauses with and without a relative pronoun:

(11) a. the students_i [whom_i I teach O_i]      Relative Pronoun
     b. the students_i [O_i I teach O_i]         Zero Relative

Page 90: What cross-linguistic variation tells us about information density in on-line processing

Patterns in Performance

The retention of the relative pronoun in English is correlated, inter alia, with the degree of separation of the relative clause from its head noun: the bigger the separation, the more the rel pros are retained (Quirk 1957, Hawkins 2004:153).

Page 91: What cross-linguistic variation tells us about information density in on-line processing

(12) a. [the students_i [whom_i I teach O_i]] visited me
     b. [the students_i [O_i I teach O_i]] visited me

(13) a. [the students_i (from Denmark) [whom_i I teach O_i]] visited me
     b. [the students_i (from Denmark) [O_i I teach O_i]] visited me

(14) a. [the students_i (from Denmark)] visited me [whom_i I teach O_i]
     b. [the students_i (from Denmark)] visited me [O_i I teach O_i]

(12a) Rel Pro = 60%     (12b) Zero Rel = 40%
(13a) Rel Pro = 94%     (13b) Zero Rel = 6%
(14a) Rel Pro = 99%     (14b) Zero Rel = 1%

Page 92: What cross-linguistic variation tells us about information density in on-line processing

The Hebrew gap is favored when the distance between head and gap is small, cf. Ariel (1999):

(15) a. Shoshana hi [ha-isha_i [she-nili ohevet O_i]]      Gap
        Shoshana is the-woman that-Nili loves
     b. Shoshana hi [ha-isha_i [she-nili ohevet ota_i]]    Res Pro
        Shoshana is the-woman that-Nili loves her

(15a) Gap = 91%     (15b) Res Pro = 9%


Resumptive pronouns in Hebrew become more frequent in more complex relatives with bigger distances between the head and the position relativized on, as in (16b):

(16) a. Shoshana hi ha-isha_i [she-dani siper [she-moshe rixel [she-nili ohevet O_i]]]
     b. Shoshana hi ha-isha_i [she-dani siper [she-moshe rixel [she-nili ohevet ota_i]]]
        Shoshana is the-woman that-Danny said that-Moshe gossiped that-Nili loves (her)

With 3+ words separating the head from the position relativized on (i.e. the gap or resumptive pronoun), pronouns become much more frequent (Ariel 1999):

(16a) Gap = 58%     (16b) Res Pro = 42%


Relative clauses with larger domains are more complex and harder to process. The harder-to-process relatives have the less minimal, more explicit form, in accordance with our minimize {F_i} principle above.


Specifically, the explicit resumptive pronoun makes the relative easier to process because the position relativized on is now explicitly signaled and flagged, in contrast to the zero gap, and because the explicit pronoun shortens various domains for processing combinatorial and dependency relations within the relative clause (these processes must otherwise access the head noun itself), cf. Hawkins (2004).
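As a rough illustration of the domain-shortening point, here is a minimal sketch using a crude word-count measure over the Hebrew example in (16); the metric and the segmentation are simplifications of my own, not Hawkins' exact domain definitions:

def domain_size(words, verb, resolver):
    # Number of words from the verb to the element that resolves its object.
    return abs(words.index(verb) - words.index(resolver)) + 1

gap_rel = "ha-isha she-dani siper she-moshe rixel she-nili ohevet".split()
pro_rel = "ha-isha she-dani siper she-moshe rixel she-nili ohevet ota".split()

# Gap relative (16a): the object can only be recovered from the distant head noun.
print(domain_size(gap_rel, "ohevet", "ha-isha"))   # 7 words
# Resumptive relative (16b): the object is recovered from the adjacent pronoun.
print(domain_size(pro_rel, "ohevet", "ota"))       # 2 words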


A Cross-linguistic Universal: the Accessibility Hierarchy

Keenan & Comrie (1977) proposed an Accessibility Hierarchy (AH) for universal rules of relativization on different structural positions within a clause:

Subjects > Direct Objects > Indirect Objects/Obliques > Genitives

(17) a. the professor_i [that O_i/he_i wrote the letter]                   SU
     b. the professor_i [that the student knows O_i/him_i]                 DO
     c. the professor_i [that the student showed the book to O_i/him_i]    IO/OBL
     d. the professor_i [that the student knows O_i/his_i son]             GEN


Relative clauses "cut off" (may cease to apply) down AH, cf. (18): if a Relative clauses "cut off" (may cease to apply) down AH, cf. (18): if a language can form a relative clause on any low position, it can language can form a relative clause on any low position, it can (generally) relativize on all higher positions. (generally) relativize on all higher positions.

(18) (18) SU only:SU only: Malagasy, MaoriMalagasy, Maori SU & DO only: SU & DO only: Kinyarwanda, IndonesianKinyarwanda, Indonesian SU & DO & IO/OBL only: SU & DO & IO/OBL only: Basque, CatalanBasque, Catalan SU & DO & IO/OBL & GEN: SU & DO & IO/OBL & GEN: English, Hausa English, Hausa

(19)(19) ny mpianatrany mpianatraii [izay nahita ny vehivavy O [izay nahita ny vehivavy Oii] (Malagasy)] (Malagasy)the student that saw the womanthe student that saw the woman

'the student that saw the woman' (NOT the student that the woman saw)'the student that saw the woman' (NOT the student that the woman saw)
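The cut-off generalization amounts to saying that a language's relativizable positions form a contiguous top segment of the AH. A minimal sketch of that check, assuming the simplified position sets listed in (18):

AH = ["SU", "DO", "IO/OBL", "GEN"]

relativizable = {
    "Malagasy":   {"SU"},
    "Indonesian": {"SU", "DO"},
    "Basque":     {"SU", "DO", "IO/OBL"},
    "English":    {"SU", "DO", "IO/OBL", "GEN"},
}

def obeys_cutoff(positions):
    # True if the relativizable positions form a contiguous top segment of the AH.
    lowest = max(AH.index(p) for p in positions)
    return set(AH[: lowest + 1]) == set(positions)

for language, positions in relativizable.items():
    print(language, obeys_cutoff(positions))   # True for all four languages in (18)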


The distribution of gaps to resumptive pronouns across languages also follows the AH, with gaps higher and pronouns lower:

If a gap occurs low on the hierarchy, it occurs all the way up; if a pronoun occurs high, it occurs all the way down.


Languages Combining Gaps with Resumptive Pronouns (data from Keenan & Comrie 1977)

Language           SU     DO        IO/OBL     GEN
Aoban              gap    pro       pro        pro
Arabic             gap    pro       pro        pro
Gilbertese         gap    pro       pro        pro
Kera               gap    pro       pro        pro
Chinese (Peking)   gap    gap/pro   pro        pro
Genoese            gap    gap/pro   pro        pro
Hebrew             gap    gap/pro   pro        pro
Persian            gap    gap/pro   pro        pro
Tongan             gap    gap/pro   pro        pro
Fulani             gap    gap       pro        pro
Greek              gap    gap       pro        pro
Welsh              gap    gap       pro        pro
Zurich German      gap    gap       pro        pro
Toba Batak         gap    *         pro        pro
Hausa              gap    gap       gap/pro    pro
Shona              gap    gap       gap/pro    pro
Minang-Kabau       gap    *         */pro      pro
Korean             gap    gap       gap        pro
Roviana            gap    gap       gap        pro
Turkish            gap    gap       gap        pro
Yoruba             gap    gap       0          pro
Malay              gap    gap       RP         pro
Javanese           gap    *         *          pro
Japanese           gap    gap       gap        gap/pro

Gaps     = 24 [100%]    17 [65%]    6 [26%]     1 [4%]
Res Pros =  0 [0%]       9 [35%]   17 [74%]    24 [96%]
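The reverse implicational pattern can be checked row by row: a gap strategy should never reappear below a position that allows only a resumptive pronoun. A minimal sketch over a few illustrative rows of the table, with a simplified coding of my own (gap/pro = both strategies available):

# Columns follow the AH order: SU, DO, IO/OBL, GEN.
rows = {
    "Arabic":   ["gap", "pro",     "pro",     "pro"],
    "Hebrew":   ["gap", "gap/pro", "pro",     "pro"],
    "Hausa":    ["gap", "gap",     "gap/pro", "pro"],
    "Japanese": ["gap", "gap",     "gap",     "gap/pro"],
}

def gaps_high_pronouns_low(row):
    # Once a position allows only a resumptive pronoun, no lower position uses a gap.
    seen_pro_only = False
    for cell in row:
        if seen_pro_only and "gap" in cell:
            return False
        if cell == "pro":
            seen_pro_only = True
    return True

for language, row in rows.items():
    print(language, gaps_high_pronouns_low(row))   # True for each sampled row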


Keenan & Comrie argued that these grammatical patterns were ultimately explainable by declining ease of processing down the AH.

They hypothesized that the AH was a complexity ranking.

Cf. Hawkins 1999, 2004:177-190, to appear, for elaboration in terms of Minimize Forms and Minimize Domains.


Keenan (1987) gave data from English corpora showing declining frequencies of relative clause usage correlating with the AH positions relativized on.


Experimental evidence for SU > (easier than) DO relativization (English)

Wanner & Maratsos (1978): first pointed to greater processing load for DO relatives
Ford (1983): longer lexical decision times in DO relatives
King & Just (1991): lower comprehension accuracy and longer lexical decision times in self-paced reading experiments
Pickering & Shillcock (1992): significant reaction time differences in self-paced reading experiments, both within and across clause boundaries (i.e. for embedded and non-embedded gap positions)
King & Kutas (1992, 1993): neurolinguistic support using ERPs
Traxler et al (2002): eye movement study controlling also for agency and animacy
Frauenfelder et al (1980) and Holmes & O'Regan (1981): similar (SU > DO) results for French
Kwon et al (2010): an eye-tracking study of Korean and a recent literature review of the SU/DO asymmetry in English and other languages


Let us take stock

We see in these studies a clear correlation between performance data measuring preferred selections in corpora and ease of processing in experiments, on the one hand, and the fixed conventions of grammars in languages with fewer options, on the other:

● SU relatives have been shown to be easier to process than DO relatives in English and certain other languages; correspondingly, languages like Malagasy have only the SU option

● the distribution of resumptive pronouns to gaps across grammars follows the AH ranking, with pronouns in the more difficult environments and gaps in the easier ones: this reverse implicational hierarchy appears to be structured by ease of processing


All of these data, morphological and syntactic, support minimize {F_i}, in proportion to the ease with which a given property P_i can be assigned in processing to a given F_i.


Let us turn now to our second principle of Information Density, maximize {P_i}:

Maximize the set {P_i} that can be assigned to a particular F_i or {F_i}.


In Hawkins (2004) I argued for a further very general principle of efficiency, in addition to Minimize Forms and Minimize Domains: Maximize On-line Processing.

There is a clear preference for selecting and arranging linguistic forms so as to provide the earliest possible access to as much of the ultimate syntactic and semantic representation as possible.


This principle also results in a preference for error-free on-line processing, since errors delay the assignment of intended properties and increase processing effort.


Maximize On-line Processing (MaOP)

The human processor prefers to maximize the set of properties that are assignable to each item X as X is processed, thereby increasing O(n-line) P(roperty) to U(ltimate) P(roperty) ratios. The maximization difference between competing orders and structures will be a function of the number of properties that are unassigned or misassigned to X in a structure/sequence S, compared with the number in an alternative.
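A minimal sketch of how such OP-to-UP ratios could be computed word by word for two competing orders; the property counts below are hypothetical toy numbers, not Hawkins' worked metric:

def op_to_up(cumulative_props, ultimate_total):
    # On-line-to-ultimate property ratios (as percentages) at each word.
    return [round(100 * c / ultimate_total) for c in cumulative_props]

ultimate = 10                    # hypothetical total number of properties
nrel = [3, 5, 7, 9, 10]          # head noun first: properties assigned early
reln = [1, 2, 3, 5, 10]          # relative clause first: assignments delayed

print(op_to_up(nrel, ultimate))  # [30, 50, 70, 90, 100]
print(op_to_up(reln, ultimate))  # [10, 20, 30, 50, 100]

The order whose ratios climb earlier assigns more of the ultimate representation on-line, which is the preference MaOP describes.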


Clear examples can be seen across languages when certain common categories {A, B} are ordered asymmetrically A + B, regardless of the language type, in contrast to symmetries in which both orders are productive [A+B / B+A], e.g. Verb+Object [VO] and Object+Verb [OV].

Some examples of asymmetries are summarized below:


Some Asymmetries (Hawkins 2002, 2004)

(i)  Displaced WH preposed to the left of its (gap-containing) clause [almost exceptionless]
     Who_i [did you say O_i came to the party]

(ii) Head Noun (Filler) to the left of its (gap-containing) Relative Clause
     E.g. the students_i [that I teach O_i]

     If a language has basic VO, then NRel [exceptions = rare] (Hawkins 1983)

     VO                OV
     NRel (English)    NRel (Persian)
     *RelN             RelN (Japanese)


(iii) Antecedent precedes Anaphor [highly preferred cross-linguistically]
      E.g. John washed himself (SVO), Washed John himself (VSO), John himself washed (SOV) = highly preferred over e.g. Washed himself John (VOS)

(iv)  Wide Scope Quantifier/Operator precedes Narrow Scope Q/O [preferred]
      E.g. Every student a book read (SOV languages): 'every' > 'a' reading preferred
           A book every student read (SOV languages): 'a' > 'every' reading preferred


In these examples there is an asymmetric dependency of B on A: the gap is dependent on the head-noun filler in (ii) (for gap-filling), the anaphor on its antecedent in (iii) (for co-indexation), and the narrow scope quantifier on the wide scope quantifier in (iv) (the number of books read depends on the quantifier in the subject NP in Every student read a book / Many students read a book / Three students read a book, etc.).


The assignment of dependent properties to B is more efficient when A precedes, since these properties can be assigned to B immediately in on-line processing. In the reverse B + A there will be delays in property assignments on-line ("unassignments") or misanalyses ("misassignments").

If the relative clause precedes the head noun, the gap is not immediately recognized and there are delays in argument structure assignment within the relative clause; if a narrow scope quantifier precedes a wide scope quantifier, a wide scope interpretation will generally be (mis)assigned on-line to the narrow scope quantifier; and so on.


I have argued that MaOP (in the form of Fillers before Gaps) competes with Minimize Domains to give asymmetries in relative clause ordering:

a head-before-relative-clause preference is visible in both VO and OV languages, with only rigid V-final languages resisting this preference to any degree (Hawkins 2004:203-10).


              MiD    MaOP
VO & NRel:     +      +
VO & RelN:     -      -
OV & RelN:     +      -
OV & NRel:     -      +


WALS data (Dryer 2005ab):

                 Rel-Noun     Noun-Rel or Mixed/Other
Rigid SOV        50% (17)     50% (17)
Non-rigid SOV     0% (0)     100% (17)
VO                3% (3)      97% (116)
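A minimal sketch relating the MiD/MaOP table above to these WALS counts, assuming a crude one-point-per-satisfied-principle score (my own simplification, not a fitted model):

scores = {
    ("VO", "NRel"): {"MiD": True,  "MaOP": True},
    ("VO", "RelN"): {"MiD": False, "MaOP": False},
    ("OV", "RelN"): {"MiD": True,  "MaOP": False},
    ("OV", "NRel"): {"MiD": False, "MaOP": True},
}

for combo, principles in scores.items():
    print(combo, sum(principles.values()))
# ('VO', 'NRel') 2 -> overwhelmingly attested (97% of VO languages are NRel/mixed)
# ('VO', 'RelN') 0 -> vanishingly rare (3%)
# ('OV', 'RelN') 1 -> both OV options attested; rigid SOV splits roughly 50/50
# ('OV', 'NRel') 1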


Language Variation in Psycholinguistics

What this all means for psycholinguistics is that grammatical patterns and rules provide data that can inform language processing theories (Hawkins 2007, Jaeger & Norcliffe 2009).

Conversely, processing can help us understand grammars better.


We can now give an explanation for what has so far been simply observed and stipulated in grammatical models, e.g. the existence of a head ordering parameter, with head-initial (VO) and head-final (OV) languages being roughly equally productive:

they are equally efficient for processing whether adjacent heads occur on the left of their sisters (English) or on the right (Japanese).


Performance data motivate the Accessibility Hierarchy for relative clause formation, the cut-offs for relativization, the reverse implicational patterns for gaps and resumptive pronouns, and numerous other regularities and language-particular subtleties (Hawkins 1999, 2004, to appear).


This approach helps us understand exceptions to proposed universals (involving e.g. differential ordering for single-word versus phrasal modifiers of heads).

I.e. linguists can benefit from the inclusion of processing ideas in their theories and descriptions.


The leftward versus rightward movement of heavy phrases in different language types is directly relevant for processing theories, on the other hand (cf. the theory of de Smedt 1994, which predicts only rightward movements).

As is the absence of any independent evidence for "anti-locality" in any word order universals.


For theories of information density we have seen lots of cross-linguistic patterns and hierarchies in morphology and syntax that support two complementary principles:

minimize {F_i} and maximize {P_i}


Minimize {F_i}

Minimize the set {F_i} required for the assignment of a particular P_i or {P_i}, in proportion to the processing ease with which each P_i can be assigned.


Maximize {P_i}

Maximize the set {P_i} that can be assigned to a particular F_i or {F_i} at each point in on-line processing.


References

Ariel, M. (1999) 'Cognitive universals and linguistic conventions: The case of resumptive pronouns', Studies in Language 23: 217-269.
Choi, H.W. (2007) 'Length and order: A corpus study of Korean dative-accusative construction', Discourse and Cognition 14: 207-27.
Croft, W. (1990) Typology and Universals, CUP, Cambridge.
de Smedt, K.J.M.J. (1994) 'Parallelism in incremental sentence generation', in G. Adriens & U. Hahn, eds., Parallelism in Natural Language Processing, Ablex, Norwood, NJ.
Dryer, M.S. (1992) 'The Greenbergian word order correlations', Language 68: 81-138.
Dryer, M.S. (2005a) 'Order of relative clause and noun', in M. Haspelmath, M.S. Dryer, D. Gil & B. Comrie, eds., The World Atlas of Language Structures, OUP, Oxford.
Dryer, M.S. (2005b) 'Relationship between the order of object and verb and the order of relative clause and noun', in M. Haspelmath, M.S. Dryer, D. Gil & B. Comrie, eds., The World Atlas of Language Structures, OUP, Oxford.
Ford, M. (1983) 'A method of obtaining measures of local parsing complexity throughout sentences', Journal of Verbal Learning and Verbal Behavior 22: 203-218.
Gibson, E. (1998) 'Linguistic complexity: Locality of syntactic dependencies', Cognition 68: 1-76.
Greenberg, J.H. (1963) 'Some universals of grammar with particular reference to the order of meaningful elements', in J.H. Greenberg, ed., Universals of Language, MIT Press, Cambridge, Mass.
Greenberg, J.H. (1966) Language Universals with Special Reference to Feature Hierarchies, Mouton, The Hague.
Haspelmath, M. (2002) Morphology, Arnold, London.
Hawkins, J.A. (1983) Word Order Universals, Academic Press, New York.
Hawkins, J.A. (1994) A Performance Theory of Order and Constituency, CUP, Cambridge.
Hawkins, J.A. (1999) 'Processing complexity and filler-gap dependencies', Language 75: 244-285.
Hawkins, J.A. (2000) 'The relative ordering of prepositional phrases in English: Going beyond manner-place-time', Language Variation and Change 11: 231-266.
Hawkins, J.A. (2004) Efficiency and Complexity in Grammars, OUP, Oxford.
Hawkins, J.A. (2007) 'Processing typology and why psychologists need to know about it', New Ideas in Psychology 25: 87-107.
Hawkins, J.A. (2009) 'Language universals and the performance-grammar correspondence hypothesis', in M.H. Christiansen, C. Collins & S. Edelman, eds., Language Universals, OUP, Oxford, 54-78.
Hawkins, J.A. (to appear) Cross-linguistic Variation and Efficiency, OUP, Oxford.
Holmes, V.M. & O'Regan, J.K. (1981) 'Eye fixation patterns during the reading of relative clause sentences', Journal of Verbal Learning and Verbal Behavior 20: 417-430.
Jaeger, T.F. (2006) 'Redundancy and syntactic reduction in spontaneous speech', Unpublished PhD dissertation, Stanford University, Stanford, CA.
Jaeger, T.F. & Norcliffe, E. (2009) 'The cross-linguistic study of sentence production: State of the art and a call for action', Language and Linguistics Compass, Blackwell.
Just, M.A. & Carpenter, P.A. (1992) 'A capacity theory of comprehension: Individual differences in working memory', Psychological Review 99: 122-49.
Keenan, E.L. (1987) 'Variation in Universal Grammar', in E.L. Keenan, Universal Grammar: 15 Essays, Croom Helm, London, 46-59.
Keenan, E.L. & Comrie, B. (1977) 'Noun phrase accessibility and Universal Grammar', Linguistic Inquiry 8: 63-99.
Keenan, E.L. & Hawkins, S. (1987) 'The psychological validity of the Accessibility Hierarchy', in E.L. Keenan, Universal Grammar: 15 Essays, Croom Helm, London.
King, J. & Just, M.A. (1991) 'Individual differences in syntactic processing: The role of working memory', Journal of Memory and Language 30: 580-602.
King, J. & Kutas, M. (1992) 'ERP responses to sentences that vary in syntactic complexity: Differences between good and poor comprehenders', Poster, Annual Conference of the Society for Psychophysiological Research, San Diego, CA.
King, J. & Kutas, M. (1993) 'Bridging gaps with longer spans: Enhancing ERP studies of parsing', Poster presented at the Sixth Annual CUNY Sentence Processing Conference, University of Massachusetts, Amherst.
Konieczny, L. (2000) 'Locality and parsing complexity', Journal of Psycholinguistic Research 29(6): 627-645.
Kwon, N., Gordon, P.C., Lee, Y., Kluender, R. & Polinsky, M. (2010) 'Cognitive and linguistic factors affecting subject/object asymmetry: An eye-tracking study of prenominal relative clauses in Korean', Language 86: 546-82.
Levy, R. (2008) 'Expectation-based syntactic comprehension', Cognition 106: 1126-1177.
Lichtenberk, F. (1983) A Grammar of Manam, University of Hawaii Press, Honolulu.
Primus, B. (1999) Cases and Thematic Roles, Max Niemeyer Verlag, Tuebingen.
Quirk, R. (1957) 'Relative clauses in educated spoken English', English Studies 38: 97-109.
Stallings, L.M. (1998) 'Evaluating Heaviness: Relative Weight in the Spoken Production of Heavy-NP Shift', Ph.D. dissertation, University of Southern California.
Traxler, M.J., Morris, R.K. & Seeley, R.E. (2002) 'Processing subject and object relative clauses: Evidence from eye movements', Journal of Memory and Language 47: 69-90.
Uszkoreit, H., Brants, T., Duchier, D., Krenn, B., Konieczny, L., Oepen, S. & Skut, W. (1998) 'Studien zur performanzorientierten Linguistik: Aspekte der Relativsatzextraposition im Deutschen', Kognitionswissenschaft 7: 129-133.
Vasishth, S. & Lewis, R. (2006) 'Argument-head distance and processing complexity: Explaining both locality and anti-locality effects', Language 82: 767-794.
Wanner, E. & Maratsos, M. (1978) 'An ATN approach to comprehension', in M. Halle, J. Bresnan & G.A. Miller, eds., Linguistic Theory and Psychological Reality, MIT Press, Cambridge, Mass., 119-161.
Wasow, T. (2002) Postverbal Behavior, CSLI Publications, Stanford University, Stanford.
Yamashita, H. & Chang, F. (2001) '"Long before short" preference in the production of a head-final language', Cognition 81: B45-B55.
Yamashita, H. & Chang, F. (2006) 'Sentence production in Japanese', in M. Nakayama, R. Mazuka & Y. Shirai, eds., Handbook of East Asian Psycholinguistics, Vol. 2, CUP, Cambridge.


Acknowledgements

Special thanks to the many collaborators and contributors to this research program as presented here, especially:

Gontzal Aldai        Barbara Jansing
Bernard Comrie       Stephen Matthews
Gisbert Fanselow     Fritz Newmeyer
Luna Filipovic       Beatrice Primus
Kaoru Horie          Anna Siewierska
Ed Keenan            Lynne Stallings
Lewis Lawyer         Tom Wasow


Financial Support

has been received from the following sources for the research reported here and is gratefully acknowledged:

German National Science Foundation fellowship (DFG grant INK 12/A1)

European Science Foundation small grant

Max Planck Institute for Evolutionary Anthropology (Leipzig) research fellowships 2000-04

University of California Davis research funds

University of Cambridge Research Centre for English and Applied Linguistics research funds and UCD teaching buy-outs 2007-10
