Effect and artifact in the perception of stress; a cross-linguistic view

Vincent J. van Heuven

Introduction, terminology

30 April 2008 Stress UAB 3

Introduction: terms

Stress
  Abstract linguistic property of a word
  Position of the strongest syllable in the word
  Only one head: culminative property

Accent
  Phonetic realisation of a stressed syllable

Introduction: terms

Typically, inventory of stressed syllables is larger than that of unstressed syllables

Identity of word is mainly determined by make-up of stressed syllable

Listeners pay more attention when they expect a stress

Word recognition waits for stressed syll.


Introduction: terms

Stress is realised by
  More careful (‘clear’, ‘hyper’) articulation
  More expanded vowel space
  Longer duration
  More intensity (decibels)
  Flatter spectral tilt (faster adduction)
  Resistance to assimilation and coarticulation

Introduction: terms

When a word is important in the sentence
  Stress is additionally signalled by a conspicuous pitch movement
  The movement is associated (‘aligned’) with the stressed syllable

Sentence stress
  Is sometimes called ‘pitch accent’ [not to be confused with Tokyo Japanese]

Production ~ Perception

Perception
  Sentence stress is more prominent than just word stress
  A well-aligned pitch movement is always heard as a stress: strongest cue by far
  But it is not always present
    Absent when the word has no sentence stress
  Therefore: pitch is a strong but inconsistent cue

Production ~ Perception

Production
  Most consistent cue is the relative duration of the rhyme portion of the syllable
  Ratio between stressed and unstressed version of the rhyme (in paradigmatic comparison) is the same, whether a pitch movement is present or not

Aside

Paradigmatic ~ syntagmatic comparison
  (the) IMport ~ (to) imPORT
  Do not compare the first syllable with the second syllable
    You will find that unstressed port is longer and louder (dB) than stressed IM
  Compare stressed IM with unstressed im, and stressed PORT with unstressed port
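The difference between the two comparisons can be made concrete with a small sketch. The durations below are invented placeholders, not measurements from any study, and the function names are hypothetical; the only point is that the syntagmatic ratio is dominated by the inherent length of port, whereas the paradigmatic ratio isolates the effect of stress.

```python
# Hypothetical rhyme durations in ms for one speaker.
# These numbers are invented placeholders for illustration only.
durations_ms = {
    ("im", "stressed"): 110.0,      # IM in (the) IMport
    ("im", "unstressed"): 80.0,     # im in (to) imPORT
    ("port", "stressed"): 320.0,    # PORT in (to) imPORT
    ("port", "unstressed"): 240.0,  # port in (the) IMport
}

def paradigmatic_ratio(syllable, table):
    """Stressed / unstressed duration of the same syllable."""
    return table[(syllable, "stressed")] / table[(syllable, "unstressed")]

def syntagmatic_ratio(word_stress, table):
    """First / second syllable within one word (the comparison to avoid)."""
    if word_stress == "initial":  # IMport
        return table[("im", "stressed")] / table[("port", "unstressed")]
    return table[("im", "unstressed")] / table[("port", "stressed")]  # imPORT

for syll in ("im", "port"):
    print(syll, "paradigmatic:", round(paradigmatic_ratio(syll, durations_ms), 2))
print("IMport syntagmatic:", round(syntagmatic_ratio("initial", durations_ms), 2))
```

With values of this kind the syntagmatic ratio for IMport stays below 1 even though IM is the stressed syllable, which is exactly the trap the slide warns against.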

Functional load hypothesis


Functional load hypothesis

Classical order of importance of stress cues (Fry 1955, 1958, 1965)
  1. Pitch (movement)
  2. Duration
  3. Intensity
  4. Spectral expansion

Based mainly on English stress

Functional load hypothesis

Berinstein (1979)
  You can spend your money only once
  If a language uses a parameter for a segmental contrast, it cannot use the same parameter as a stress cue
  E.g., if a language has long ~ short vowels, duration is no longer an effective stress cue

Berinstein (1979)

Languages contrasted

Language     Stress position      Vowel length contrast
English      variable, initial    no (?)
Spanish      variable, prefinal   no
K’ekchi      fixed, final         yes
Cakchiquel   fixed, final         no

Functional load hypothesis

Berinstein (1979)
  English has tense (long) and lax (short) vowels
  Spanish has neither tenseness nor length as a parameter
  Prediction: duration is a less effective stress cue in English than in Spanish

Functional load hypothesis

Berinstein (1979)
  K’ekchi, fixed final stress, with vowel length contrast
  Cakchiquel, fixed final stress, no length contrast
  Prediction: duration is a less effective stress cue in K’ekchi than in Cakchiquel

Berinstein (1979)

Perception study
  Stimuli
    /bibibibi/, 100 ms base vowel duration (+ 40 ms for /b/)
    test vowel has deviant duration: 70, 100 (control), 120, 140, 160, 200 ms
  Listeners
    36 native English (mean age 22)
    22 monolingual Spanish (mean age 23)
    31 K’ekchi (mostly bilingual, mean age < 20)
    46 Cakchiquel (all bilingual, mean age < 20)
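The stimulus set described above can be enumerated in a few lines. This is a sketch only: it assumes that every syllable position was crossed with every test duration and that listeners chose one of the four syllables (so that chance is 25%); the slides state neither explicitly, and all names are hypothetical.

```python
from itertools import product

BASE_VOWEL_MS = 100   # base vowel duration given on the slide
CONSONANT_MS = 40     # duration added for each /b/
N_SYLLABLES = 4       # /bi.bi.bi.bi/
TEST_DURATIONS_MS = [70, 100, 120, 140, 160, 200]  # 100 ms is the control

def stimulus(position, test_ms):
    """Vowel durations (ms) of one stimulus; position is 1-based."""
    vowels = [BASE_VOWEL_MS] * N_SYLLABLES
    vowels[position - 1] = test_ms
    return vowels

def total_duration(vowels):
    """Word duration in ms: every syllable is /b/ plus its vowel."""
    return sum(CONSONANT_MS + v for v in vowels)

# Assumption: full crossing of position (1..4) and test duration.
design = [(p, d, stimulus(p, d))
          for p, d in product(range(1, N_SYLLABLES + 1), TEST_DURATIONS_MS)]

# The control duration yields the same stimulus at every position.
distinct = {tuple(v) for _, _, v in design}

# Assumption: one-of-four forced choice, so chance is 100 / 4.
chance_level = 100 / N_SYLLABLES

print(len(design), "cells,", len(distinct), "distinct stimuli")
print("chance level:", chance_level, "%")
```

Under the forced-choice assumption, ‘better than 2x chance’ would mean more than 50% stress judgments on the test syllable.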

Percent stress judgments per language (position values: syllables 1 to 4; duration values: test vowels of 120, 140, 160 and 200 ms)

English
  clear position bias: more stress judgments as the test syllable occurs earlier in the word: 86, 67, 62, 46%
  huge effect of duration (lengthening > 50% attracts stress): 34, 44, 89, 94%
  overall effect better than 2x chance

Spanish
  no position bias: 34, 34, 32, 32%
  small effect of duration: 28, 26, 39, 39%
  overall effects just above chance

K’ekchi
  clear position bias: more stress judgments as the test syllable occurs later in the word: 19, 23, 31, 44%
  no clear effect of duration manipulations: 28, 26, 30, 34%
  overall effect hardly above chance

Berinstein (1979)

Results of perception test (cont.)
  English
    clear position bias: more stress judgments as the test syllable occurs earlier in the word
    huge effect of duration (lengthening > 50% attracts stress)
    overall effect better than 2x chance
  Note
    I replicated the experiment with Dutch listeners: results identical to English

Berinstein (1979)

Results of perception test
  K’ekchi
    clear position bias: more stress judgments as the test syllable occurs later in the word
    no clear effect of duration manipulations
    overall effect hardly above chance
  Spanish
    no position bias
    small effect of duration
    overall effects just above chance
  Cakchiquel
    tiny effect of duration: only 200-ms vowels attract some stress judgments: 24, 29, 25, 35%

[Figure: stress perceived (%) as a function of position of syllable (1 to 4) and of vowel duration (120, 140, 160, 200 ms), for English, Spanish, K’ekchi and Cakchiquel]

Berinstein (1979)

Summary of observations re. duration
  Large effect in English
    But English also has a length contrast
  Small effect in Spanish
    Even though Spanish has no length contrast
  Small effect in K’ekchi
    Even though K’ekchi has a vowel length contrast
  Same small effect in Cakchiquel
    Even though Cakchiquel has no length contrast

Berinstein (1979)

Conclusion re. Berinstein (1979)
  Results simply contradict all predictions
    Within the European languages Spanish should use duration more than English (but does not)
    Within the Mayan languages Cakchiquel should use duration more than K’ekchi (but does not)
  Therefore little credibility for the functional load hypothesis

Berinstein (1979)

Extra: position bias in Berinstein
  Strong initial-stress bias in English
    OK, most words have initial stress
  Weak final-stress bias in K’ekchi
    OK, but why weak?
  Weak prefinal stress bias in Cakchiquel
    Not predicted
  No stress bias at all in Spanish
    Why? What is the distribution of stress in Spanish?

Functional load hypothesis

Potisuk, Gandour & Harper (1996)
  Thai has five lexical tones
    Prediction: pitch cannot be an effective stress cue
  Thai contrasts long ~ short vowels
    Prediction: duration cannot be an effective stress cue
  Acoustic correlates were measured, i.e. NOT a perception study

Potisuk et al. (1996)

Method
  two male, three female speakers (read-out speech)
  25 sentences with minimal stress pairs (20 with long vowels, 5 with short vowels)
  full 5 x 5 matrix of two-tone sequences

Potisuk et al. (1996)

Note: stress pairs are not really minimal
  one is a two-word sequence (N-V)
  the other is a two-syllable compound

Measurements
  only initial syllables were measured (paradigmatic)
  F0 curve, in ERB + Z-transform, time-normalised (reduction to mean and SD)
  Rhyme duration (re. sentence duration, within-speaker normalisation for inherent segment duration)
  Intensity curve (normalised within speakers, reduction to mean and SD through Z-transform)
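The ‘ERB + Z-transform’ step can be sketched as follows. The slides do not say which ERB formula or which SD definition was used; the sketch assumes the Glasberg & Moore (1990) ERB-rate formula and the population SD, and the F0 samples are invented placeholders. Function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def hz_to_erb_rate(f_hz):
    """ERB-rate scale; Glasberg & Moore (1990) formula, assumed here."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def z_transform(values, reference):
    """z-scores of `values` relative to the mean and SD of `reference`."""
    m, sd = mean(reference), pstdev(reference)
    return [(v - m) / sd for v in values]

def summarise(curve):
    """Reduce a normalised curve to its mean and SD."""
    return mean(curve), pstdev(curve)

# Hypothetical F0 samples (Hz): all tokens of one speaker, and one token.
speaker_f0_hz = [105.0, 110.0, 120.0, 125.0, 140.0, 150.0]
token_f0_hz = [120.0, 125.0, 140.0]

speaker_erb = [hz_to_erb_rate(f) for f in speaker_f0_hz]
token_erb = [hz_to_erb_rate(f) for f in token_f0_hz]

# Within-speaker normalisation: the token is scaled by the speaker's own range.
token_z = z_transform(token_erb, speaker_erb)
print(summarise(token_z))
```

Normalising against the speaker's own distribution is what makes male and female speakers comparable before the curves are reduced to a mean and an SD.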

five-member lexical tone contrast is fully maintained in [–stress], even though F0 curves are flattened considerably

Mean F0: No difference between +stress and –stress

F0 variability: larger for +stress, stronger for some tones than for others (interaction of stress and tone)

Mean intensity: no difference

Intensity variability: no difference

Duration: [+stress] much longer than [–stress], for all lexical tones (i.e. no stress x tone interaction)


Potisuk et al. (1996)

Results
  Mean F0: no difference
  F0 variability: larger for [+stress], stronger for some tones than for others (interaction of stress and tone)
  Mean intensity: no difference
  Intensity variability: no difference
  Duration: [+stress] longer than [–stress], for all lexical tones (i.e. no stress x tone interaction)

Potisuk et al. (1996)

Acid test: automatic classification by LDA
  rhyme duration >> F0-SD >> Intensity SD
  99% correct classification with duration alone

Interesting point
  five-member lexical tone contrast is fully maintained in [–stress], even though F0 curves are flattened considerably
  In other languages lexical-tone contrasts may be neutralised in [–stress] conditions
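For readers unfamiliar with LDA, the idea of classifying stress from a single predictor such as rhyme duration can be sketched as below. This is not the analysis of Potisuk et al.: it is a minimal two-class linear discriminant on one feature, with a pooled variance and priors taken from the sample sizes, and the durations are invented placeholders.

```python
import math
from statistics import mean

def fit_lda_1d(samples):
    """samples: dict label -> list of values. Returns (means, pooled variance, priors)."""
    n_total = sum(len(v) for v in samples.values())
    means = {k: mean(v) for k, v in samples.items()}
    ss = sum((x - means[k]) ** 2 for k, v in samples.items() for x in v)
    pooled_var = ss / (n_total - len(samples))
    priors = {k: len(v) / n_total for k, v in samples.items()}
    return means, pooled_var, priors

def classify(x, model):
    """Pick the class with the highest linear discriminant score."""
    means, var, priors = model
    def score(k):
        return x * means[k] / var - means[k] ** 2 / (2 * var) + math.log(priors[k])
    return max(means, key=score)

def accuracy(samples, model):
    """Proportion of the training tokens classified correctly (resubstitution)."""
    hits = sum(classify(x, model) == k for k, v in samples.items() for x in v)
    return hits / sum(len(v) for v in samples.values())

# Hypothetical rhyme durations in ms; placeholders, not data from the study.
rhyme_ms = {
    "+stress": [310.0, 295.0, 330.0, 305.0, 320.0],
    "-stress": [180.0, 200.0, 190.0, 210.0, 175.0],
}
model = fit_lda_1d(rhyme_ms)
print(accuracy(rhyme_ms, model), classify(300.0, model))
```

Note that the accuracy here is computed on the same tokens used for fitting; a cross-validated figure would normally be lower.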


Potisuk et al. (1996)

Conclusions
  Results largely go against the functional load hypothesis
  Duration is by far the strongest correlate (but should not be)
  F0 should not be a correlate
    and indeed is not in terms of mean F0
    But it is a good stress cue in terms of F0 range

Multiple sources of variability

Vowel duration is longer (e.g. Klatt, 1974)
  in [+long] vowels
  before deeper prosodic breaks
  in syllables with word stress
  in words with sentence stress
  in slow speech
  before voiced (and esp. sonorant) consonants

Multiple sources of variability

Listeners are able to decompose different sources of variability in a parameter
  E.g. Nooteboom (1979) shows that Dutch listeners use duration effectively to make multiple simultaneous contrasts
    Long ~ short vowels
    Depth of prosodic break
  They adjust the long ~ short boundary depending on the depth of the break

Functional load hypothesis

Since simultaneous effects are perceptually decomposed, the functional load hypothesis seems too simple
  Results indicate that we can both have our cake and eat it
  ‘Get two for the price of one’

Original hierarchy still stands

Duration as a stress cue in English


Postnuclear stress contrast?

Beckman & Edwards (1994)
  Simple prominence hierarchy in English
  Four degrees of prominence
    Full vowel > reduction vowel (schwa)
    Pitch movement > no pitch accent
    Last accent > earlier accents

Postnuclear stress contrast?

Beckman & Edwards (1994)
  Predictions
    Schwa cannot be stressed unless it is transformed to a full vowel first
    No contrast between initial and final stress in postnuclear words with full vowels (Scott 1939, Huss 1978)

Postnuclear stress contrast?

Scott (1939)
  One sentence, initial stress only
  Noun ~ verb minimal stress pair
  11 listeners, forced choice
  Response distribution skewed towards initial stress
  But not significantly so

Postnuclear stress contrast?

Pilch (1970)
  Difference between IMport ~ imPORT is exclusively a matter of intonation
  Not carried by stress
  If intonation cues are removed (by embedding the target in postnuclear position), no difference between noun and verb reading should remain

Postnuclear stress contrast?

Huss (1978)
  Used same clever sentences as Scott
    Actually, even cleverer
  Identical word sequences with different stress pattern on noun~verb pairs in postnuclear position
  See examples

(1) It is not true that all nations have always been equally self-sufficient as far as the production of sinks is concerned. The degree of self-sufficiency has changed during the last year: Whereas formerly the Americans used to import sinks, now the Germans import sinks.
    Did you say the Germans import sinks?

(2) It is not true that the balance of payment of all nations has always been equally healthy. The amount of net import has changed in different ways for different nations: Whereas formerly the Americans’ import used to sink, now the Germans’ import sinks.
    Did you say the Germans’ import sinks?

Huss (1978)

Method
  4 different noun~verb pairs
  Nuclear~postnuclear target position
  Statement~question
  7 speakers
  3 phonetic expert listeners
  4 x 7 x 3 = 84 stress judgments per condition

Huss (1978) perception test: percent responses

                                          Perceived as
Sentence type   Lexical stress pattern    Noun (initial)   Verb (final)
Statement       Noun                      25               75
Statement       Verb                      24               76
Question        Noun                      24               76
Question        Verb                      14               86

Statement: no effect
Question: trend, χ2 = 1.89 (p = 0.167)
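The trend statistic can be checked with a short calculation. The sketch below assumes that the question percentages correspond to counts out of the 84 judgments per condition (24% ≈ 20 and 14% ≈ 12 ‘noun’ responses) and that a 2 x 2 chi-square with Yates continuity correction was used; neither is stated on the slides. Under those assumptions the statistic comes out at about 1.89, with a p-value close to the reported one.

```python
import math

def chi2_2x2_yates(table):
    """Chi-square with Yates continuity correction for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (abs(observed - expected) - 0.5) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Assumption: percentages converted back to counts out of 84 judgments.
question_counts = [
    [20, 64],  # lexical noun: about 24% heard as noun, 76% as verb
    [12, 72],  # lexical verb: about 14% heard as noun, 86% as verb
]
chi2 = chi2_2x2_yates(question_counts)
p = p_value_df1(chi2)
print(round(chi2, 2), round(p, 3))
```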

(1) [the GERmans] [import sinks]

(2) [the GERmans’ import] [sinks]

Final lengthening of unstressed syll.

No lengthening of stressed syll.


Huss (1978)

No clear difference between initial and final stress in postnuclear minimal pairs with full vowels only
  As predicted by Beckman & Edwards
  But stress and phrasing confounded
  Let us keep phrasing constant and vary stress only. See Huss (1975)

Huss (1975)

Method
  10 minimal stress noun~verb pairs
    We FIRST import, he said [Verb, final stress]
    His FIRST import, he said [Noun, initial stress]
  2 male speakers
  Informal listening procedure
  Unknown number of listeners (but phonetically trained)

Huss (1975)

Perceptual results
  One group of words with stress perceived in conformity with noun~verb contrast, high agreement among listeners
  In ‘a few words’ listeners did not agree
  In ‘some other words’ listeners did agree but reported stress the wrong way around
  Unfortunately no quantitative data

The decisive auditory parameter in the identification of stress in post-nuclear position, i.e. in the absence of a pitch contrast, was the duration ratio between the two syllables; the experimental follow-up study should bear out which acoustic parameters correlate with this auditory impression.


Huss (1975)

Perceptual results
  In pairwise comparison of noun~verb pairs vowel duration seemed the clearest correlate
  Acoustic measurements of one speaker presented (better speaker)
  Second speaker had more perceptual ambiguities (and reversals)
  No quantitative data

[Figure: duration ratio S1 / S2 for nouns (initial stress) and verbs (final stress)]

Duration contrast even more extreme in postnuclear than nuclear stress

Huss (1975)

Conclusion
  At least some speakers produce a very reliable contrast between initial and final stress in postnuclear position in words with full vowels only
  The correlate is syllable duration
  The contrast, when made, is adequately perceived

Postnuclear stress contrast?

Beckman & Edwards seem wrong
  English speakers tend to preserve stress contrast in postnuclear position
  English listeners are sensitive to the contrast even when there is no pitch movement (duration is effective cue)
  Same effects were found for Dutch
    Nooteboom (1972), van Katwijk (1974), Sluijter & van Heuven (1996)

Sluijter & van Heuven (1996)

Prenuclear (unaccented) targets
Lexical pair ‘canon~cannon’
Reiterant mimicry

[Figure: duration ratio S1 / S2 by word stress (initial, final) for lexical and reiterant versions; panels: Nuclear, Pre-nuclear]

Sluijter & van Heuven (1996)

Results
  Duration (ratio S1/S2) very strong stress cue
  Equally effective in nuclear and non-nuclear position
  Affords 100% stress decisions in LDA
    Linear Discriminant Analysis
    Automatic classification algorithm

Sluijter et al. (1997)

Duration, intensity and loudness as perceptual cues in stress perception in non-nuclear position

Overall result:
  Duration is strongest cue
  Loudness (intensity > 500 Hz) is second
  Intensity is weak cue

Aside: strength of cues

Standard plots
  % stress as a function of
    X but averaged over all Y steps
    Y but averaged over all X steps
  Observe difference in psychometric function
  Obscures interaction between X and Y

Alternative: quasi 3D plots
  Plot quasi 3-D
  Determine cross-overs (50%) in X and Y dimensions, by e.g.
    Linear interpolation
    Probit fitting
  Compute linear regression line through points
  Determine slope of function
    90°: X only cue
    0°: Y only cue
    45°: equal strength
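The cross-over procedure can be sketched as follows. This is one possible operationalisation, not necessarily the one used for the plots: it finds the 50% cross-over along X for every Y step by linear interpolation, regresses the cross-over on Y, and converts the slope to an angle. It assumes both cues are expressed in comparable step units, and the response grids are synthetic.

```python
import math

def crossover(xs, pct, level=50.0):
    """x where pct crosses `level` (linear interpolation); None if it never does."""
    for i in range(len(xs) - 1):
        p0, p1 = pct[i], pct[i + 1]
        if p0 == level:
            return float(xs[i])
        if (p0 - level) * (p1 - level) < 0:
            return xs[i] + (level - p0) * (xs[i + 1] - xs[i]) / (p1 - p0)
    return float(xs[-1]) if pct[-1] == level else None

def ls_slope(u, v):
    """Least-squares slope of v regressed on u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return (sum((a - mu) * (b - mv) for a, b in zip(u, v))
            / sum((a - mu) ** 2 for a in u))

def cue_angle(xs, ys, grid):
    """Angle (degrees) of the 50% boundary; grid[j][i] is % stress at ys[j], xs[i]."""
    points = [(y, crossover(xs, row)) for y, row in zip(ys, grid)]
    points = [(y, x) for y, x in points if x is not None]
    if len(points) < 2:
        return 0.0  # heuristic: no boundary along X, treated as Y-only cue
    slope = ls_slope([y for y, _ in points], [x for _, x in points])  # dx/dy
    return math.degrees(math.atan2(1.0, abs(slope)))

steps = [1, 2, 3, 4, 5]
# Synthetic response grids (% stress), for illustration only.
equal = [[50 + 10 * (x - 3) + 10 * (y - 3) for x in steps] for y in steps]
x_only = [[50 + 20 * (x - 3) for x in steps] for _ in steps]

print(cue_angle(steps, steps, equal), cue_angle(steps, steps, x_only))
```

Probit fitting would replace `crossover` with a fitted psychometric function; the regression and angle steps stay the same.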


Last minute results

Dutch minimal stress pair
  ‘I have yesterday a canon/cannon heard’
  Prenuclear
    ik heb gisteren een kanon GEHOORD
    ik heb gisteren een kanon GEHOORD
  Postnuclear
    ik heb GISTEREN een kanon gehoord
    ik heb GISTEREN een kanon gehoord

Last minute results

Starting from each natural base stimulus
  7 manipulations of syllable duration ratio (using Praat PSOLA)
  4 repetitions of each type
  20 native Dutch listeners
  80 responses per data point

Last minute results

Duration ratio is a very effective stress cue in Dutch
  Also (smaller) effect of base stimulus
  Same effects before and after nuclear accent
  Same effects are expected for English

Summing up

Duration is a very effective stress cue in Dutch, even in non-nuclear position
  It should also be so in English
  Work in progress at Leiden University
    Production and perception of stress in pre- and postnuclear position in Dutch and English.
    No results for English at this stage.

Stress bias


Van Heuven & Menert (1996)

Strange difference
  Strong initial bias for English (but no fixed initial stress)
  Weaker final bias for K’ekchi (although exceptionless fixed stress)

Why the difference?
  Bias is partly the result of artifact

Van Heuven & Menert (1996)

Experiment 1
  Synthesized Dutch minimal stress pairs
    Monotone: 100 Hz flat
    Declination: 100 ... 70 Hz
    Inclination: 100 ... 130 Hz
    Noise source (i.e. no periodicity, whisper)
  Manipulated duration ratio S1 / S2

Van Heuven & Menert (1996)

Experiment 1: Results
  Large effects of duration manipulation
  Strong overall bias for initial stress
  Reduction of initial-stress bias:
    Declination (85%) > Monotone (80%) > Inclination (60%) > Noise (55%)

Van Heuven & Menert (1996)

Experiment 2: Effect of context
  Same stimuli & manipulations as before
  Also preceded by short carrier, so that first syllable of target does not appear out of the blue

Van Heuven & Menert (1996)

Experiment 2: Results
  Isolated targets: replicates Experiment 1
  Preceding context: bias for initial stress completely gone

Van Heuven & Menert (1996)

Apparently: bias is not inherent but induced by
  Presence/absence of a preceding context
  Whether (first syllable of) target has pitch

Suggestion:
  Bias is induced by virtual pitch jump from assumed/inferred F0 baseline

Van Heuven & Menert (1996)

Inferred baseline is speaker’s bottom pitch (roughly 70 Hz)

Prediction
  The higher the level pitch of an isolated target, the larger the virtual F0 jump, the stronger the initial stress bias
  No bias when target has 70 Hz pitch

Van Heuven & Menert (1996)

Experiment 3
  Same reiterant stimuli
  Synthesized at 70, 100, 130 and 160 Hz
  We also manipulated formant settings: +20%, –15%, 0% (neutral)
  If virtual pitch jump, then initial stress bias should increase with onset F0
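The size of the hypothesised virtual jump for each synthesis level can be sketched as below. Expressing the jump in semitones is an assumption made here for illustration (the slides give only Hz values), as is taking exactly 70 Hz as the inferred baseline; the function name is hypothetical.

```python
import math

BASELINE_HZ = 70.0                               # inferred bottom pitch (roughly 70 Hz)
LEVEL_PITCHES_HZ = [70.0, 100.0, 130.0, 160.0]   # Experiment 3 synthesis levels

def virtual_jump_semitones(level_hz, baseline_hz=BASELINE_HZ):
    """Interval from the inferred baseline up to the level pitch, in semitones."""
    return 12.0 * math.log2(level_hz / baseline_hz)

jumps = {f: virtual_jump_semitones(f) for f in LEVEL_PITCHES_HZ}
for f, st in jumps.items():
    print(f"{f:.0f} Hz: {st:.1f} semitones above baseline")
```

On this reading the prediction is a monotonic one: no jump and hence no bias at 70 Hz, and a progressively larger jump, and stronger initial-stress bias, at 100, 130 and 160 Hz.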

Some initial-stress bias is stimulus induced

Inferred virtual pitch from speaker’s baseline seems justified

Other effects may also play a role

Listeners expect final lengthening in isolated words

Through perceptual compensation, the last syllable in a string of four equal-duration syllables sounds less stressed

Results help to explain why initial stress bias is strong in English and final bias is weaker in Mayan languages K’ekchi and Cakchiquel

Vowel reduction as a stress cue


FRY (1965): DURATION vs. SPECTRAL REDUCTION

4 minimal stress pairs (noun vs. verb)
  CONtract ~ conTRACT
  SUBject ~ subJECT
  DIgest ~ diGEST
  OBject ~ obJECT

3 duration steps (smaller range than in Fry 1955, 1958)

DURATION vs. SPECTRAL REDUCTION

3 degrees of vowel reduction/expansion
  for V1 while keeping V2 constant (mid value): f1, f2, f3
  for V2 while keeping V1 constant: f4, f5, f6
Note: reduction of diphthong /ai/ by reduction of glide trajectory (full, halfway, none = endpoint only)

[Figure: stimulus design; axes: duration manipulation, quality manipulation; levels: V1<V2, V1=V2, V1>V2]

DURATION vs. SPECTRAL REDUCTION

Intensity (V1=V2) and F0 (120 Hz) were kept constant

Problem?
  There is a constant 6 dB difference between F1 and F2, i.e., spectral tilt depends on frequency difference between F1 and F2: the larger the distance the flatter the tilt

DURATION vs. SPECTRAL REDUCTION

RESULTS
  Effects of duration structure (in spite of restricted duration range) stronger than of spectral reduction
  Effects of reduction of V1 stronger than of V2

Van Bergem (1993)

Spectral reduction in Dutch
  Production study
  Measurement of F1 and F2 at most stable portion during vowel (least spectral change)
  Systematic manipulation of stress, focus, and lexical status of words
  Manipulation of focus through question/answer pairs:

Test syllable: can

(What did you buy for your mother?)
I bought [CANdy]+F for my mother           +C +A +S
(For whom did you buy candy?)
I bought [CANdy]-F for my mother           +C -A +S
(Where do they sell beer?)
In our [canTEEN]+F they sell beer          +C +A -S
(What do they sell in our canteen?)
In our [canTEEN]-F they sell beer          +C -A -S
(What can your sister do for hours?)
My sister can [TALK]+F for hours           -C +A
(How long can your sister talk?)
My sister can [TALK]-F for hours           -C -A
[CAN]+F (spoken in isolation)              ISO

Van Bergem (1993)

Experimental set-up
  15 (male) speakers
  7 stress/accent/status conditions
  33 test syllables
  yielding 3465 vowel tokens (15 x 7 x 33)

Van Bergem (1993)

Selected results
  For test syllables with /e:/, /o:/ and /a:/ only
  No function words
  Spectrally most expanded tokens for isolated words
  Marginal reduction for +A+S
  Appreciable reduction for -A
  Appreciable reduction for -S
  Effects of A and S are equal and additive

Van Bergem (1993)

Notes
  These are acoustic effects
  Proper studies of the cue value of spectral reduction for stress/accent perception have yet to be carried out (for any language whatsoever)
  …preferably in relationship with cues to domain-final lengthening

Unified view


Unified view

There is no unified view
  I would like to assume that all languages use stress parameters in the same way
    Not necessarily in speech production but certainly in speech perception
    Although the use of pitch for the marking of sentence stress may differ

Unified view

No room for a functional load hypothesis
  Unclear why duration is such a weak cue for Spanish in Berinstein (1979)
    But strong cue in Catalan in recent work at UAB
    Also in Spanish?
  (Much) more research needed

Thanks for bearing with me