Frequency, Chunks & Hesitations: An Empirical Analysis of Bybee’s Exemplar Model

Dr. Ulrike Schneider

Linguistisches Kolloquium Mainz, 19 January 2015



Ulrike Schneider | Linguistisches Kolloquium | 19 January 2015

Hesitations

• uh I don’t agree

• and uh fortunately we agreed

• and when they say you know [pause] buy one get one free it’s hard to resist

• we have a tremendous amount of um sunny days


Hesitation Placement


Proposed Explanations

Hesitation Placement

• Intonation Units
  - Filled and unfilled pauses are preferentially placed after the first word in a phonemic clause (Boomer 1965).
  - Filled pauses are most likely to occur at intonation unit boundaries (Clark & Fox Tree 2002).

• Constituents
  - Hesitations are preferably placed at constituent boundaries (e.g. Maclay & Osgood 1959; Clark & Clark 1977; Swerts 1998; Biber et al. 1999).
  - Major planning points (Clark & Clark 1977):
    a. Grammatical junctures [clause boundaries]
    b. Other constituent boundaries
    c. Before the first content word within a constituent


New Ideas

Hesitation Placement

• Lounsbury (1954):
  Hypothesis 1: Hesitation pauses correspond to the points of highest statistical uncertainty in the sequencing of units of any given order.
  Hypothesis 2: [These points] correspond to the beginning of units of encoding.

• Goldman-Eisler (1968):
  Speech is often extremely complex but still fluent.
  The conception of ready-made sentence schemata, models of sentences or modules implies that they are selected in one piece so to speak, that they are not constructed from individual lexical elements – and this would account for the fluency of speakers irrespective of their complexity, in the same way as efficiency in mass production is a matter of use of prefabricated units.


Usage-Based Models


“Prefabricated Units”

Usage-Based Theories

... are the basic units of grammar

• No separation between grammar and the lexicon
• Both concrete units (like words) and abstract units (like constructions) are stored in the mental lexicon
• Even compositional units can be stored in the lexicon
• Grammatical structure emerges because speakers combine several ready-made units
• What is mentally stored, and how strongly it is represented, is determined by the individual speaker’s experience (i.e. usage)


“Prefabricated Units”

Usage-Based Theories

• Constructions: abstract
  e.g. the ‘time’-away construction (Jackendoff 1997):
  twistin’ the night away / danced the night away / while the day away (schema: VERB the TIME away)

• Chunks: concrete
  sequences of units, often words


Chunks

Usage-Based Theories

Image (source: http://www.madeleineshaw.com.au/) showing example chunks: of course, how are you?, on the other hand, sorry to keep you waiting


Chunking


1. What Causes Chunking?

Chunking

• Frequency of use: frequently used sequences = chunks (e.g. Bybee 2002, 2010)

  it’s, that’s, don’t; and I, in the

• Likelihood of co-occurrence: determined e.g. by means of transitional probabilities or the MI score (e.g. Gries 2008; Hilpert 2013; Wiechmann 2008)

  can’t, willing to, wind up, oh dear, I suppose; aesthetically pleasing, collapsible sailboat, juvenile delinquents
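The two criteria diverge as soon as the measures are computed. A sketch with invented counts for a hypothetical one-million-word corpus (not Switchboard figures); TPD and TPB are forward and backward transitional probability, and MI is pointwise mutual information, with corpus size used as the number of bigram slots (a simplification):

```python
import math

def association_measures(f_xy, f_x, f_y, n):
    """Frequency-based and probabilistic scores for a bigram x y.

    f_xy: co-occurrence frequency of x followed by y
    f_x, f_y: frequencies of x and y
    n: corpus size in words (treated as the number of bigram slots)
    """
    return {
        "freq": f_xy,                              # raw co-occurrence frequency
        "TPD": f_xy / f_x,                         # forward TP, P(y | x)
        "TPB": f_xy / f_y,                         # backward TP, P(x | y)
        "MI": math.log2(f_xy * n / (f_x * f_y)),   # pointwise mutual information
    }

# Invented counts for a hypothetical 1,000,000-word corpus.
N = 1_000_000
in_the = association_measures(f_xy=5_000, f_x=20_000, f_y=60_000, n=N)
collapsible_sailboat = association_measures(f_xy=3, f_x=4, f_y=10, n=N)

for name, scores in [("in the", in_the), ("collapsible sailboat", collapsible_sailboat)]:
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

With these toy counts, frequency ranks in the far above collapsible sailboat, while all three probabilistic measures reverse the order.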


2. Is Chunking Abrupt or Gradual?

Chunking

• Threshold Approach
  We have two distinct classes of multi-word sequences: chunks and non-chunks. Any sequence which the mind deems sufficiently frequent is stored as a chunk (e.g. Pawley & Syder 1983; Erman & Warren 2000).

• Continuous Approach
  Chunking is a gradual phenomenon. There are no chunks and non-chunks, only more or less chunky sequences (e.g. Langacker 1987; Bybee 2002, 2010; Arnon & Snider 2010).


3. How are Chunks Stored?

Chunking

• Holistic Storage
  Chunks receive a separate entry in the mental lexicon. Chunking strength (should the model require it) is reflected by stronger representations (e.g. Arnon & Snider 2010).

• Network
  Chunks are not stored holistically but as connections between the representations of their components. Chunking strength is reflected by stronger connections (e.g. McClelland & Rumelhart 1981).


Bybee’s (2010) Model


1. What Causes Chunking?

Bybee’s Exemplar Model

• Co-occurrence Frequency
  - Some combinations that receive a high chunkiness rating based on co-occurrence frequency receive a low rating based on probabilistic measures of co-occurrence (e.g. in the).
  - The formulae for probabilistic measures (such as transitional probabilities) always contain co-occurrence frequency. The other terms in the formula therefore “devalue” the frequency rating.
  - Bybee argues that this does not happen in the mind: the mind does not “devalue” frequent combinations.
  - See Bybee’s (2010) discussion of Collostructional Analysis (Stefanowitsch & Gries 2003).
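The “devaluation” becomes visible when two standard measures are written in log form. Notation introduced here for illustration: f(xy) is the co-occurrence frequency, f(x) and f(y) the frequencies of the two words, N the corpus size.

```latex
\begin{aligned}
\mathrm{TP}_{\mathrm{D}}(x,y) &= \frac{f(xy)}{f(x)}
  &&\Rightarrow\quad \log_2 \mathrm{TP}_{\mathrm{D}}(x,y) = \log_2 f(xy) - \log_2 f(x)\\
\mathrm{MI}(x,y) &= \log_2 \frac{f(xy)\,N}{f(x)\,f(y)}
  &&= \log_2 f(xy) + \log_2 N - \log_2 f(x) - \log_2 f(y)
\end{aligned}
```

Co-occurrence frequency enters both with a positive sign, and the component frequencies are subtracted from it. For a sequence such as in the, whose parts are themselves extremely frequent, the subtracted terms are large, so a high co-occurrence frequency can still yield a low score (most visibly on MI).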


2. Is Chunking Abrupt or Gradual?

Bybee’s Exemplar Model

• Continuous Approach
  Chunking is a gradual phenomenon. From the first encounter, sequences of any length are mentally stored. The more often a sequence is used, the chunkier it becomes.
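A toy sketch of the continuous approach, assuming that representational strength can be approximated by a simple count: every sequence is entered the first time it is heard and strengthened with each further use. The utterances and the `max_len` cut-off are invented for illustration.

```python
from collections import Counter

def update_store(store, utterance, max_len=4):
    """Strengthen every word sequence of length 2..max_len in one utterance.

    Toy exemplar store: a sequence is entered at its first occurrence and each
    further use adds 1 to its strength, so 'chunkiness' grows gradually.
    """
    words = utterance.split()
    for size in range(2, max_len + 1):
        for i in range(len(words) - size + 1):
            store[tuple(words[i:i + size])] += 1
    return store

store = Counter()
for utterance in ["on the other hand", "but on the other hand", "on the other side", "of course"]:
    update_store(store, utterance)

print(store[("on", "the", "other", "hand")])  # 2
print(store[("on", "the", "other")])          # 3
print(store[("of", "course")])                # 1
```

There is no threshold anywhere in the sketch: a sequence heard once already has a (weak) entry, and shorter sequences inside longer ones accumulate strength from every context they occur in.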


3. How are Chunks Stored?

Bybee’s Exemplar Model

• Holistic Storage
  Holistic storage at the first encounter:
  “Certainly words that have never been experienced together do not constitute a chunk, but otherwise there is a continuum from words that have been experienced together only once and fairly recently, which will constitute a weak chunk whose internal parts are stronger than the whole, to more frequent chunks such as lend a hand and pick and choose which are easily accessible as wholes while still maintaining connections to their parts.” (Bybee 2010)

• Network
  “[I]tems that are used together frequently will form tighter bonds than items that occur together less often” (Bybee 2007)


Chunking and Constituents

Bybee’s Exemplar Model

• Chunks are not units of planning that speakers can fall back on in addition to constituents.

• Chunks do not even result from constituents; instead: “Sequentiality is more basic than hierarchy” (Bybee 2010)
  - Chunks can be combined.
  - Smaller chunks can occur within larger ones.
  - From these combinatorial possibilities, and from the varying chunking strengths within the strings thus created, the hierarchical structure of language emerges.
  - Concrete surface sequences are primary; abstract hierarchical phrase structure is derived.

• Not all of the abstractions that linguists have made (e.g. certain phrase boundaries) should be rethought based on the frequency data we now have.


Strong Chunks

Bybee’s Exemplar Model

frequent sequences → strongly represented → unit-like appearance in speech (+ writing)

• frequent sequences: string frequency, not frequency of the individual components
• strongly represented: strong, easily accessible holistic representation
• unit-like appearance: fluent pronunciation, phonetic reduction, uninterrupted


frequent sequences → strongly represented → unit-like appearance in speech (+ writing) → form the basis of constituents


Hypotheses

1. Co-occurrence frequency should be a better predictor of hesitation placement than transitional probabilities and similar probabilistic measures.

2. The frequency of a sequence and its chance of being interrupted to hesitate should be inversely related.

3. Co-occurrence frequency should be a better predictor of hesitation placement than phrase structure.


Chunking in the PP


Contexts

Chunking in the PP

• Prepositional phrases

1. Prep N about baseball

2. Prep Det N of the cowboys

3. Prep N N of Princess Di

4. Prep Det N N through a fax machine

5. Prep Adj N with stiff penalties

6. Prep Det Adj N in a nice neighbourhood


Data

Chunking in the PP

• SWITCHBOARD NXT
• Telephone conversations between strangers (1990/91)
• Spoken American English
• 830,000 words
• Annotated: part of speech, phrases etc.
• Time-aligned


Hesitations

Chunking in the PP

• Unfilled pauses (0.2-1 sec.)
• Filled pauses (uh, um)
• Discourse markers (well, like, you know, I mean)


Hesitations

Chunking in the PP

• Prepositional phrases
• n = 4,724 data points

1. Prep N about baseball n = 1,231

2. Prep Det N of the cowboys n = 1,440

3. Prep N N of Princess Di n = 346

4. Prep Det N N through a fax machine n = 218

5. Prep Adj N with stiff penalties n = 254

6. Prep Det Adj N in a nice neighbourhood n = 575


Possible Positions

Chunking in the PP

and [uh] in [uh] the [uh] movie

Position 1: before in (Prep) · Position 2: before the (Det) · Position 3: before movie (N)
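The candidate positions can be enumerated mechanically: position k sits before the k-th word of the phrase and interrupts the bigram that ends in that word. The sketch assumes that the word preceding the preposition belongs to the context, as the predictor labels in the tree and variable-importance output (bi0, w0, ...) suggest; that reading is an inference, and `hesitation_positions` is a hypothetical helper.

```python
def hesitation_positions(prev_word, pp_words):
    """List the candidate hesitation positions in a prepositional phrase.

    Position k (1-based) lies before the k-th word of the PP, so a hesitation
    there interrupts the bigram (word k-1, word k); word 0 is the word before the PP.
    """
    words = [prev_word] + list(pp_words)
    return [(k, words[k - 1], words[k]) for k in range(1, len(words))]

for k, left, right in hesitation_positions("and", ["in", "the", "movie"]):
    print(f"Position {k}: {left} _ {right}")
# Position 1: and _ in
# Position 2: in _ the
# Position 3: the _ movie
```

A Prep Det N phrase thus has three positions, each tied to one bigram whose frequency or association score can serve as a predictor.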


Chunking in the PP

Figure 4.1: Distribution of hesitations across prepositional phrase types. Panels (y-axis: total amount of hesitations; x-axis: hesitation placement): Prep N (before Prep, before N); Prep Det N (before Prep, before Det, before N); Prep N N (before Prep, before N1, before N2); Prep Det N N (before Prep, before Det, before N1, before N2); Prep Adj N (before Prep, before Adj, before N); Prep Det Adj N (before Prep, before Det, before Adj, before N). White bars indicate unfilled pauses, ruled bars indicate filled pauses and grey bars indicate discourse markers.


Chunking ‘Grain Size’

Chunking in the PP

• Bigram: two consecutive words, though not across sentence boundaries

• Word: word form + POS tag, separated by spaces from other word forms
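A minimal sketch of these two definitions, assuming sentences arrive as lists of (word form, POS tag) pairs; the Penn-style tags and the sentences are invented. Because a word is a form plus its tag, in/IN and in/RP count as different words, and no bigram is formed across the boundary between two sentences.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count bigrams of (word form, POS tag) tokens within, never across, sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(zip(sentence, sentence[1:]))  # adjacent pairs inside one sentence
    return counts

sentences = [
    [("in", "IN"), ("the", "DT"), ("movie", "NN")],
    [("come", "VB"), ("in", "RP")],
    [("the", "DT"), ("movie", "NN")],
]
counts = bigram_counts(sentences)
print(counts[(("the", "DT"), ("movie", "NN"))])  # 2
print(counts[(("in", "RP"), ("the", "DT"))])     # 0: would span a sentence boundary
```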


Predictors

Chunking in the PP

• Bigram frequencies
• Direct transitional probability
• Backwards transitional probability
• Mutual Information Score (MI)
• Lexical Gravity G
• Word frequencies
• Hesitation type
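Lexical gravity G differs from MI and the transitional probabilities in that it also takes into account how many different words a word combines with. A sketch of G in its commonly cited formulation, assuming natural logarithms and counting f(x) and f(y) over bigram positions; the bigram tokens are invented and this is not necessarily the exact variant used in the study.

```python
import math
from collections import Counter, defaultdict

def lexical_gravity(bigrams):
    """Lexical gravity G for each bigram type, given a list of (x, y) bigram tokens.

    G(x, y) = ln(f(x,y) * types_after(x) / f(x)) + ln(f(x,y) * types_before(y) / f(y))
    types_after(x): number of different words that follow x
    types_before(y): number of different words that precede y
    f(x), f(y) are counted as first and second bigram members (a simplification).
    """
    f_xy = Counter(bigrams)
    f_x = Counter(x for x, _ in bigrams)
    f_y = Counter(y for _, y in bigrams)
    after, before = defaultdict(set), defaultdict(set)
    for x, y in f_xy:
        after[x].add(y)
        before[y].add(x)
    return {
        (x, y): math.log(f * len(after[x]) / f_x[x]) + math.log(f * len(before[y]) / f_y[y])
        for (x, y), f in f_xy.items()
    }

# Invented bigram tokens, for illustration only.
bigrams = [("of", "the"), ("of", "the"), ("of", "a"), ("in", "the"), ("juvenile", "delinquents")]
for pair, g in lexical_gravity(bigrams).items():
    print(pair, round(g, 3))
```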


Lounsbury’s Hypothesis


Lounsbury’s Hypothesis

• Lounsbury (1954):
  Hypothesis 1: Hesitation pauses correspond to the points of highest statistical uncertainty in the sequencing of units of any given order.


Distribution of Lowest TPD (by position)

Phrase Type            1       2       3       4   % at Lowest   p
Prep N               180   1,050                         59.9%   p<.001
Prep Det N           195     102   1,140                 32.4%   -
Prep N N              23     519       4                 58.4%   p<.001
Prep Det N N          26       9     433      18         39.5%   p<.001
Prep Adj N            21     382      27                 59.5%   p<.001
Prep Det Adj N        57       9     424      81         33.3%   p<.001


Distribution of Lowest MI (by position)

Phrase Type            1       2       3       4   % at Lowest   p
Prep N               805     415                         63.9%   p<.001
Prep Det N           696     577     168                 48.7%   p<.001
Prep N N             305     239       3                 47.2%   p<.001
Prep Det N N         201     216      29       0         35.3%   p<.001
Prep Adj N           192     231       5                 51.9%   p<.001
Prep Det Adj N       254     247      74       0         39.0%   p<.001


Results

• The different measures of association disagree considerably about where the point of highest statistical uncertainty lies.

• The hypothesis is not confirmed in its strongest form:
  hesitations are not always placed at the point of highest statistical uncertainty.

• BUT:
  more hesitations are placed at the point of highest statistical uncertainty than expected by chance.
  This holds for all measures tested except backwards transitional probability.
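The slides report the share of hesitations at the lowest-association point and a significance level, but not the test behind it. One way to set up such a comparison, sketched under two assumptions: chance is the agreement expected if hesitation position and lowest-association position were independent, and significance is assessed with an exact one-sided binomial test. The toy records are invented and far too few for a meaningful p-value.

```python
import math
from collections import Counter

def binom_upper_tail(k, n, p):
    """Exact one-sided binomial p-value P(X >= k), X ~ Binomial(n, p), 0 < p < 1.

    Terms are computed in log space so that n in the thousands does not overflow.
    """
    log_terms = (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    )
    return min(1.0, sum(math.exp(t) for t in log_terms))

def lounsbury_test(records):
    """records: (hesitation_position, lowest_association_position) per data point.

    Returns the observed share of hesitations at the lowest-association position,
    the share expected if the two positions were independent, and a p-value.
    """
    n = len(records)
    hits = sum(1 for hes, low in records if hes == low)
    hes_counts = Counter(hes for hes, _ in records)
    low_counts = Counter(low for _, low in records)
    expected = sum(hes_counts[j] * low_counts[j] for j in hes_counts) / n**2
    return hits / n, expected, binom_upper_tail(hits, n, expected)

# Invented (hesitation position, lowest-TP position) pairs.
records = [(2, 2), (2, 2), (2, 1), (1, 1), (2, 2), (1, 2), (2, 2), (2, 2)]
observed, expected, p_value = lounsbury_test(records)
print(observed, expected, round(p_value, 3))
```

The independence baseline is stricter than a flat 1/k chance level, because it accounts for the fact that some positions attract both hesitations and low association scores more often than others.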


Multifactorial Analysis


Conditions

Multifactorial Analysis

• Multinomial outcomes
• Multifactorial
• Partially correlated/collinear predictors


CART-Trees

Multifactorial Analysis

• Classification and Regression Trees
• Algorithm ‘grows’ trees through recursive binary partitioning
• Can handle multinomial outcomes, complex interactions & collinear predictors
• ctree function from the party package for R (Hothorn et al. 2006)
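The core step of recursive binary partitioning is a search for the best single split. Note that ctree does not choose splits by impurity: it grows conditional inference trees, which select variables via permutation tests (hence the p-values in the tree output). The sketch below instead shows the classic impurity-based (Gini) CART step for one numeric predictor and a multinomial outcome, purely to illustrate binary partitioning; the data are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Best threshold t for the split x <= t vs x > t, by weighted Gini impurity.

    x: numeric predictor values; y: class labels (e.g. hesitation position 1, 2, 3).
    Returns (threshold, impurity_after_split), or None if x is constant.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (pairs[i - 1][0], impurity)
    return best

# Invented bigram frequencies and hesitation positions.
freqs = [12, 40, 95, 350, 410, 820]
positions = [1, 1, 1, 2, 2, 2]
print(best_split(freqs, positions))  # (95, 0.0)
```

Growing a tree means applying the same search to every predictor, taking the best split, and repeating the procedure within each resulting half until a stopping rule applies.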


CART-Trees

Multifactorial Analysis

Node 1: bi0.freq.NXT (p < 0.001)
  ≤ 336 → Node 2: MI0.NXT (p < 0.001)
    ≤ 2.873 → Node 3: G2.NXT (p < 0.001)
      ≤ -0.549 → Node 4 (n = 202)
      > -0.549 → Node 5: MI1.NXT (p = 0.007)
        ≤ 2.502 → Node 6: MI0.NXT (p = 0.043)
          ≤ 1.835 → Node 7 (n = 295)
          > 1.835 → Node 8 (n = 69)
        > 2.502 → Node 9 (n = 252)
    > 2.873 → Node 10: G0.NXT (p < 0.001)
      ≤ -0.507 → Node 11 (n = 292)
      > -0.507 → Node 12: TPD.bi0.NXT (p < 0.001)
        ≤ 0.261 → Node 13: bi0.freq.NXT (p = 0.045)
          ≤ 176 → Node 14 (n = 163)
          > 176 → Node 15 (n = 15)
        > 0.261 → Node 16 (n = 115)
  > 336 → Node 17 (n = 37)

Terminal nodes: bar charts of the proportion (0 to 1) of hesitations at positions 1, 2 and 3.


Random Forests

Multifactorial Analysis

• Reliance on a single tree may be problematic:
  - only locally optimal splits
  - variable predictions
  - some predictors never appear
  - importance of predictors hard to assess

• Reliance on several thousand trees (here: 3,000)
• Random subset of predictors
• Random subset of data points
• cforest function from the party package for R (Hothorn et al. 2006; Strobl et al. 2007; Strobl et al. 2008)
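The two sources of randomness can be sketched as follows. This is an illustration, not cforest itself: the subsampling fraction of 0.632, mtry = 3 and the shortened predictor names are assumptions chosen for the example. Data points are drawn once per tree (here without replacement); predictors are drawn afresh at every split, which is how mtry is applied in random forests. The points a tree never sees are its out-of-bag (OOB) points.

```python
import random

def draw_tree_sample(rng, n_obs, fraction=0.632):
    """Subsample data points without replacement for one tree.

    Returns (in_bag, out_of_bag) index lists; the out-of-bag points are the
    ones this tree never saw and can be used to evaluate it.
    """
    in_bag = set(rng.sample(range(n_obs), round(fraction * n_obs)))
    oob = [i for i in range(n_obs) if i not in in_bag]
    return sorted(in_bag), oob

def candidate_predictors(rng, predictors, mtry):
    """Random subset of predictors considered at ONE split (node) of a tree."""
    return rng.sample(predictors, mtry)

rng = random.Random(42)
predictors = ["bi0.freq", "MI0", "TPD.bi0", "TPB.bi0", "G0", "w0.freq", "hes.type"]
in_bag, oob = draw_tree_sample(rng, n_obs=1440)
print(len(in_bag), len(oob))  # 910 530
print(candidate_predictors(rng, predictors, mtry=3))
```

Repeating this for thousands of trees means every predictor gets chances to be split on, even when a stronger correlated predictor would always win in a single tree.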


Random Forests - Variable Importance

Multifactorial Analysis

Dot chart of sort(PDN.varimp): variable importance (x-axis 0.000 to 0.025) for the predictors, as listed in the chart: MI2.NXT, TPB.bi2.NXT, w2.freq.NXT, bi1.freq.NXT, G1.NXT, bi2.freq.NXT, TPD.bi2.NXT, w3.freq.NXT, TPB.bi1.NXT, MI1.NXT, TPD.bi1.NXT, hes.type, TPB.bi0.NXT, G2.NXT, w1.freq.NXT, w0.freq.NXT, G0.NXT, bi0.freq.NXT, TPD.bi0.NXT, MI0.NXT
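Variable importance in random forests is usually measured by permutation: shuffle one predictor, breaking its link to the outcome, and see how much accuracy drops. A simplified sketch, assuming a single fitted model and the full data set (party's varimp works per tree on out-of-bag observations); the one-split model and the data are hypothetical.

```python
import random

def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after randomly permuting one predictor column.

    predict: any fitted classifier, given here as a function row -> label.
    rows: list of dicts mapping predictor names to values.
    """
    def accuracy(rs):
        return sum(predict(r) == y for r, y in zip(rs, labels)) / len(labels)

    rng = random.Random(seed)
    baseline = accuracy(rows)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, column: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(permuted))
    return sum(drops) / n_repeats

# Hypothetical one-split model and toy data (not the talk's data).
def model(row):
    return 1 if row["bi0.freq"] <= 336 else 2

rows = [{"bi0.freq": f, "MI0": m} for f, m in
        [(100, 1.2), (500, 3.1), (120, 2.2), (900, 0.4), (80, 2.9),
         (450, 1.7), (60, 0.9), (700, 2.5), (200, 3.3), (350, 1.1)]]
labels = [model(r) for r in rows]

print(permutation_importance(model, rows, labels, "bi0.freq"))
print(permutation_importance(model, rows, labels, "MI0"))  # 0.0
```

A predictor the model ignores (here MI0) scores exactly zero, while a predictor the model relies on scores above zero; the size of the drop is what the dot chart ranks.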


What causes chunking?


Results (Random Forests)

Multifactorial Analysis

Phrase Type      Correct Pred.  Sig. Level  Residuals  Residuals
Prep N                   82.7%      p<.001       15.2     -15.69
Prep Det N               71.9%      p<.001       6.46      -7.72
Prep N N                 72.7%      p<.001       4.84      -5.59
Prep Det N N             65.4%      p<.001      10.35      -7.94
Prep Adj N               71.0%      p<.001       3.47       -4.1
Prep Det Adj N           63.1%      p<.001       7.86      -6.68


Results (Out-of-Bag)

Multifactorial Analysis

Phrase Type      Correct Pred.  Sig. Level  Residuals  Residuals
Prep N                   69.9%      p<.001       8.93      -9.22
Prep Det N               64.4%      p<.001       2.78      -3.33
Prep N N                 59.5%    non-sig.          -          -
Prep Det N N             44.1%       p<.01       2.59      -1.98
Prep Adj N               57.3%    non-sig.          -          -
Prep Det Adj N           48.2%      p<.001       2.38      -2.02


1. Co-occurrence frequency should be a better predictor of hesitation placement than transitional probabilities and similar probabilistic measures.

2. The frequency of a sequence and its chance of being interrupted to hesitate should be inversely related.

3. Co-occurrence frequency should be a better predictor of hesitation placement than phrase structure.

What causes chunking?


Performance of Predictors

Multifactorial Analysis

Chart: variable importance (y-axis, approx. 0.005 to 0.015) by predictor (x-axis): 1 TPB, 2 w.freq, 3 bi.freq, 4 MI, 5 TPD, 6 G, 7 hes.type


1. Co-occurrence frequency should be a better predictor of hesitation placement than transitional probabilities and similar probabilistic measures.

No, co-occurrence frequency performs on par with the other predictors.

No evidence that co-occurrence frequency is the sole cause of chunking.

But: No sign of highly frequent sequences being ‘devalued’.

What causes chunking?

51

Page 52

CART-Trees

Multifactorial Analysis

[Figure: classification tree; each terminal node shows the proportions (0 to 1) of outcome classes 1, 2 and 3]

1) bi0.freq (p < 0.001)
   ≤ 336:
     2) MI0 (p < 0.001)
        ≤ 2.873:
          3) G2 (p < 0.001)
             ≤ -0.549: Node 4 (n = 202)
             > -0.549:
               5) MI1 (p = 0.007)
                  ≤ 2.502:
                    6) MI0 (p = 0.043)
                       ≤ 1.835: Node 7 (n = 295)
                       > 1.835: Node 8 (n = 69)
                  > 2.502: Node 9 (n = 252)
        > 2.873:
          10) G0 (p < 0.001)
              ≤ -0.507: Node 11 (n = 292)
              > -0.507:
                12) TPD.bi0 (p < 0.001)
                    ≤ 0.261:
                      13) bi0.freq (p = 0.045)
                          ≤ 176: Node 14 (n = 163)
                          > 176: Node 15 (n = 15)
                    > 0.261: Node 16 (n = 115)
   > 336: Node 17 (n = 37)
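The split structure of this tree can be written as a routing function that sends a word transition to its terminal node. This is a hypothetical sketch: the argument names are Python-safe versions of the slide's variable names (bi0.freq becomes bi0_freq, and so on), the slide does not explain what the indices 0, 1 and 2 refer to, and the class proportions in the terminal nodes cannot be recovered from the text, so the function returns node numbers only.

```python
# Terminal-node sizes as printed on the slide.
NODE_SIZES = {4: 202, 7: 295, 8: 69, 9: 252, 11: 292,
              14: 163, 15: 15, 16: 115, 17: 37}


def terminal_node(bi0_freq, mi0, g2, mi1, g0, tpd_bi0):
    """Follow the tree's splits; a value <= threshold takes the left branch."""
    if bi0_freq > 336:                        # node 1
        return 17
    if mi0 <= 2.873:                          # node 2
        if g2 <= -0.549:                      # node 3
            return 4
        if mi1 <= 2.502:                      # node 5
            return 7 if mi0 <= 1.835 else 8   # node 6
        return 9
    if g0 <= -0.507:                          # node 10
        return 11
    if tpd_bi0 <= 0.261:                      # node 12
        return 14 if bi0_freq <= 176 else 15  # node 13
    return 16


print(sum(NODE_SIZES.values()))                      # 1440
print(terminal_node(100, 1.0, 0.0, 2.0, 0.0, 0.1))  # 7
```

The node sizes add up to 1440, i.e. every case in the training data ends in exactly one leaf.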

Page 53

Is Chunking Abrupt or Gradual?

2. The frequency of a sequence and its chance of being interrupted to hesitate should be inversely related.

Yes: not a single split in the CART-trees suggests the opposite.

We always find: the higher the score of a bigram, the less likely the speaker is to interrupt the speech flow at this transition.

Splits in the trees occur across the whole range of scores and are based on all predictors.
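The inverse relation can also be checked without a tree: group transitions by score and compare hesitation rates across the groups. A minimal sketch, assuming each observation is a (score, hesitated) pair for one word transition; the data below are invented for illustration and the cut points are arbitrary.

```python
from bisect import bisect_left


def hesitation_rates(observations, cutpoints):
    """Hesitation rate per score bin (None for an empty bin).

    As in a tree split, a score <= cut point falls into the lower bin.
    """
    counts = [[0, 0] for _ in range(len(cutpoints) + 1)]  # [hesitations, total]
    for score, hesitated in observations:
        i = bisect_left(cutpoints, score)
        counts[i][0] += int(hesitated)
        counts[i][1] += 1
    return [h / t if t else None for h, t in counts]


def inversely_related(rates):
    """True if hesitation rates never rise as the score bins get higher."""
    filled = [r for r in rates if r is not None]
    return all(lower >= higher for lower, higher in zip(filled, filled[1:]))


# Invented toy data: (bigram frequency, hesitation at this transition?)
toy = [(10, True), (20, True), (50, False), (100, True),
       (250, True), (300, False), (400, False), (500, False)]
rates = hesitation_rates(toy, cutpoints=[176, 336])
print(rates)                     # [0.75, 0.5, 0.0]
print(inversely_related(rates))  # True
```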

Page 54

Chunking and Constituents

3. Co-occurrence frequency should be a better predictor of hesitation placement than phrase structure.

Yes, frequency-derived measures are far better predictors of hesitation placement than phrase structure.

Chunking across the prepositional phrase boundary is possible and, in fact, common.

Page 55

Chunking in Violation of the PP Boundary

Multifactorial Analysis

[Figure: classification tree; each terminal node shows the proportions (0 to 1) of outcome classes 1, 2 and 3]

1) TPD.bi0 (p < 0.001)
   ≤ 0.13:
     2) hes.type (p = 0.022)
        u: Node 3 (n = 85)
        {dm, pause}: Node 4 (n = 181)
   > 0.13:
     5) G0 (p = 0.002)
        ≤ 3.11: Node 6 (n = 61)
        > 3.11: Node 7 (n = 104)

Page 56

Chunking in Violation of the PP Boundary

Multifactorial Analysis

§ Quantifier + of
It would be great to have some of those [pause] organisations [...]
Examples: one of, many of, (a) lot of
n = 289
Hesitation before the preposition: 7.3 % (rest: 47.17 %)

§ Further of-Collocates
Examples: sort(s) of, kind(s) of, out of, terms of
n = 121
Hesitation before the preposition: 4.1 % (rest: 44.6 %)

§ Ctree models perform above average on these structures

§ Characterised by a positive or high MI score and a high direct transitional probability

Page 57

Multifactorial Analysis

Chunking in Violation of the PP Boundary

[Figure, four panels:
(a) Backwards Transitional Probability and G for 'Support Noun + of' Bigrams (x: backwards transitional probability, log scaled; y: G)
(b) MI and Direct Transitional Probability for 'Support Noun + of' Bigrams (x: MI; y: direct transitional probability, log scaled)
Bigrams shown in (a) and (b): kind of, kinds of, sort of, sorts of, type of, types of, form of, forms of
(c) Backwards Transitional Probability and G for 'out of' & 'terms of' (axes as in a)
(d) MI and Direct Transitional Probability for 'out of' & 'terms of' (axes as in b)]

Page 58

Multifactorial Analysis

[Figure: variable importance (y-axis, 0.000 to 0.020) of the predictors 1 TPB, 2 bi.freq, 3 MI, 4 TPD, 5 G]
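Variable-importance plots of this kind usually come from a random forest: each predictor's values are shuffled, and the resulting loss of (out-of-bag) accuracy is recorded. The slides do not say which variant was used (a conditional variant is also common), so the following is only a minimal sketch of plain permutation importance, computed on invented toy data and on the given rows rather than on out-of-bag samples.

```python
import random


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, n_repeats=50, seed=0):
    """Mean drop in accuracy when one predictor's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def predicts_hesitation(row):
    # Hypothetical toy model: uses MI only and ignores bi.freq entirely.
    return row["MI"] <= 2


rows = [{"MI": mi, "bi.freq": f} for mi, f in
        [(0.5, 10), (1.0, 40), (1.5, 5), (0.2, 80), (1.8, 12),
         (3.0, 300), (4.5, 90), (2.5, 700), (6.0, 30), (3.3, 150)]]
labels = [row["MI"] <= 2 for row in rows]
print(permutation_importance(predicts_hesitation, rows, labels, "bi.freq"))  # 0.0
print(permutation_importance(predicts_hesitation, rows, labels, "MI") > 0)   # True
```

A predictor the model never consults gets an importance of exactly zero, since shuffling it leaves every prediction unchanged.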

Page 59

Chunking and Constituents

3. Co-occurrence frequency should be a better predictor of hesitation placement than phrase structure.

Yes, frequency-derived measures are far better predictors of hesitation placement than phrase structure.

Chunking across the prepositional phrase boundary is possible and, in fact, common.

Chunking strengths across the phrase boundary vary and are immensely important for the model.

Page 60

Summary

Page 61

Hypotheses

1. Co-occurrence frequency should be a better predictor of hesitation placement than transitional probabilities and similar probabilistic measures. ✘

2. The frequency of a sequence and its chance of being interrupted to hesitate should be inversely related. ✔

3. Co-occurrence frequency should be a better predictor of hesitation placement than phrase structure. ✔

Page 62

How are Chunks Stored?

Bybee’s Exemplar Model

§ Holistic Storage
Holistic storage at the first encounter

Page 63

How are Chunks Stored?

[Figure: exemplar representation of "a lot of people": the whole sequence, its words (a, lot, of, people) and its sub-sequences (a lot, lot of, of people), each stored as multiple exemplars; labelled "semantic filter"]

Page 64

How are Chunks Stored?

[Figure: the words a, lot, of, people shown as separate units]

Page 65

Conclusions Concerning the Mental Model

How are Chunks Stored?

§ Measures like the MI score tend to rate sequences which form a semantic unit much higher than sequences which do not form a semantic unit.
The good performance of the MI score could be interpreted as a semantic filter being at work.
BUT: MI is no better a predictor than frequency.

§ Word frequencies are poor predictors.
In an exemplar model, we would expect competition between the parts and the whole, for which we find no evidence in the data.

§ A very simple network model of chunking suffices to explain a processing phenomenon like hesitation placement.
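The claim that a very simple network model suffices can be illustrated with a toy model. This is a hypothetical sketch, not the author's model: it only counts how often adjacent-word transitions have been used (no semantic filter, no competition between parts and wholes) and assumes that a hesitation is likeliest at the weakest link. The exposure data are invented.

```python
from collections import defaultdict


class ChunkNetwork:
    """Toy network: words are nodes; each use of a transition strengthens its link."""

    def __init__(self):
        self.links = defaultdict(int)

    def expose(self, utterance):
        words = utterance.split()
        for w1, w2 in zip(words, words[1:]):
            self.links[(w1, w2)] += 1

    def link_strengths(self, utterance):
        words = utterance.split()
        return [self.links.get((w1, w2), 0) for w1, w2 in zip(words, words[1:])]

    def likeliest_hesitation_site(self, utterance):
        """Index i of the weakest link, i.e. between words[i] and words[i + 1]."""
        strengths = self.link_strengths(utterance)
        return min(range(len(strengths)), key=strengths.__getitem__)


net = ChunkNetwork()
for u in ["a lot of people", "a lot of fun", "a lot of work", "many of them"]:
    net.expose(u)
print(net.link_strengths("a lot of people"))             # [3, 3, 1]
print(net.likeliest_hesitation_site("a lot of people"))  # 2, i.e. between "of" and "people"
```

Because "a lot of" recurs across many utterances, its internal links grow strongest, and the toy model places the hesitation after the preposition rather than before it.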

Page 66

Well, thank you for uh your attention.

Page 67

Sources: http://www.freidok.uni-freiburg.de/volltexte/9793/