Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 http://www.icsi.berkeley.edu/~steveng [email protected] In Collaboration with Hannah Carvey, Leah Hitchcock and Shawn Chang


Acknowledgements and Thanks

Statistical Analysis and Automatic Classification: Hannah Carvey, Shawn Chang, Leah Hitchcock

Research Funding: U.S. National Science Foundation, U.S. Department of Defense


For Further Information

Consult the web site:

www.icsi.berkeley.edu/~steveng


OVERTURE

The Central Challenge for Models of Speech Recognition

Language - The Traditional Perspective

The “classical” view of spoken language posits a quasi-arbitrary relation between the lower and higher tiers of linguistic organization

Cat = [k] + [ae] + [t]

Cat = /k/ + /ae/ + /t/

The Serial Frame Perspective on Speech

Traditional models of speech recognition assume that the identity of a phonetic segment is derived from a detailed spectral profile of the acoustic signal (provided courtesy of the auditory system) computed for each interval (frame) of speech (this is literally how automatic speech recognition systems decode the speech signal)
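The frame-by-frame spectral analysis described on this slide can be sketched roughly as follows. This is a minimal illustration, not the front end of any particular recognizer: the 25-ms frame, 10-ms hop, Hamming taper and the name `frame_spectra` are all assumptions chosen for the example, and a synthetic tone stands in for speech.

```python
import cmath
import math


def frame_spectra(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return a list of magnitude spectra, one per analysis frame.

    frame_ms and hop_ms are illustrative defaults, not values from the talk.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Hamming taper to reduce spectral leakage at the frame edges.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Plain DFT, written out for clarity rather than speed.
        spectra.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ])
    return spectra


# Synthetic stand-in for speech: 0.25 s of a 1-kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(2000)]
spectra = frame_spectra(tone, sample_rate)
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
peak_hz = peak_bin * sample_rate / 200  # 200 samples per 25-ms frame at 8 kHz
```

Each row of `spectra` is the spectral profile of one frame; a serial-frame recognizer would classify these rows one after another.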


Challenge Number One

Pronunciation Variability

Pronunciation Variability of Real Speech

Pronunciation patterns encountered in everyday life are extremely diverse

There are literally dozens of ways in which common words are pronounced

(as the following two slides illustrate for the word “and” based on manual phonetic annotation of a corpus comprising telephone dialogues)

How Many Pronunciations of “and”?

  N  Pronunciation
 82  ae n
 63  eh n
 45  ix n
 35  ax n
 34  en
 30  n
 20  ae n dcl d
 17  ih n
 17  q ae n
 11  ae n d
  7  q eh n
  7  ae nx
  6  ae ae n
  6  ah n
  5  eh nx
  4  uh n
  4  ix nx
  4  q ae n dcl d
  3  eh n d
  3  q ae nx
  3  eh

N = 2 each: ae n dcl; ae; ax m; ax n d; ae eh n dcl d; eh n dcl d; ax nx; q ae ae n; q ix n; ix n dcl d; ih; eh eh n; q eh nx; ix d n

N = 1 each: eh m; ax n dcl d; aw n; ae q; eh dcl; ah nx; ae n t; eh d; ah n dcl d; ey ih n dcl; ae ix n; ae nx ax; ax ng; ay n; ih ah n d; ae hh; ih ng; ix; ae n d dcl; ix dcl d; ae eh n; hh n; ix n t; ae ax n dcl d; iy eh n; m; ae ae n d; nx; q ae ae n; q ae ae n dcl d; q ae eh n dcl d; q ae ih n; aa n; q ae n d; ? nx; q ae n q; eh n m; q eh en dcl; eh ng; q eh n q; em; q eh ow m; q ih n; q ix en; er

[Slide callout: Canonical pronunciation]

Pronunciation Variability of Real Speech

There are literally dozens of ways in which common words are pronounced

And as the following slide illustrates for the 20 most frequent words from the same corpus (Switchboard)

How Many Different Pronunciations?

Rank  Word       N    #Pron  MCP % of Total  Most Common Pronunciation (MCP)
   1  I          649     53              53  ay
   2  and        521     87              16  ae n
   3  the        475     76              27  dh ax
   4  you        406     68              20  y ix
   5  that       328    117              11  dh ae
   6  a          319     28              64  ax
   7  to         288     66              14  tcl t uw
   8  know       249     34              56  n ow
   9  of         242     44              21  ax v
  10  it         240     49              22  ih
  11  yeah       203     48              43  y ae
  12  in         178     22              45  ih n
  13  they       152     28              60  dh ey
  14  do         131     30              54  dcl d uw
  15  so         130     14              74  s ow
  16  but        123     45              12  bcl b ah tcl t
  17  is         120     24              50  ih z
  18  like       119     19              46  l ay kcl k
  19  have       116     22              54  hh ae v
  20  was        111     24              23  w ah z
  21  we         108     13              83  w iy
  22  it's       101     14              20  ih tcl s
  23  just       101     34              17  jh ix s
  24  on          98     18              49  aa n
  25  or          94     23              36  er
  26  not         92     24              24  m aa q
  27  think       92     23              32  th ih ng kcl k
  28  for         87     19              46  f er
  29  well        84     49              23  w eh l
  30  what        82     40              14  w ah dx
  31  about       77     46              12  ax bcl b aw
  32  all         74     27              24  ao l
  33  that's      74     19              16  dh eh s
  34  oh          74     17              61  ow
  35  really      71     25              45  r ih l iy
  36  one         69      8              78  w ah n
  37  are         68     19              42  er
  38  I'm         67      9              26  q aa m
  39  right       61     21              28  r ay
  40  uh          60     16              41  ah
  41  them        60     18              23  ax m
  42  at          59     36               8  ae dx
  43  there       58     28              22  dh eh r
  44  my          58      9              66  m ay
  45  mean        56     10              58  m iy n
  46  don't       56     21              14  dx ow
  47  no          55      8              77  n ow
  48  with        55     20              35  w ih th
  49  if          55     18              41  ih f
  50  when        54     18              31  w eh n
  51  can         54     28              15  kcl k ae n
  52  then        51     19              38  dh eh n
  53  be          50     11              76  bcl b iy
  54  as          49     16              18  ae z
  55  out         47     19              22  ae dx
  56  kind        47     17              21  kcl k ax nx
  57  because     46     31              15  kcl k ax z
  58  people      45     21              44  pcl p iy pcl l el
  59  go          45      5              83  gcl g ow
  60  got         45     32              15  gcl g aa
  61  this        44     11              47  dh ih s
  62  some        43      4              48  s ah m
  63  would       41     16              29  w ih dcl
  64  things      41     15              52  th ih ng z
  65  now         39     11              69  n aw
  66  lot         39      9              47  l aa dx
  67  had         39     19              24  hh ae dcl
  68  how         39     11              53  hh aw
  69  good        38     13              27  gcl g uh dcl
  70  get         38     20              13  gcl g eh dx
  71  see         37      6              80  s iy
  72  from        36     10              28  f r ah m
  73  he          36      7              39  iy
  74  me          35      5              87  m iy
  75  don't       35     21              14  dx ow
  76  their       33     19              25  dh eh r
  77  more        32     11              56  m ao r
  78  it's        31     14              20  ih tcl s
  79  that's      31     20              16  dh eh s
  80  too         31      6              60  tcl t uw
  81  okay        31     17              45  ow kcl k ey
  82  very        30     11              36  v eh r iy
  83  up          30     11              34  ah pcl p
  84  been        30     11              51  bcl b ih n
  85  guess       29      8              42  gcl g eh s
  86  time        29      8              62  tcl t ay m
  87  going       29     21              13  gcl g ow ih ng
  88  into        28     20              14  ih n tcl t uw
  89  those       27     12              42  dh ow z
  90  here        27     11              25  hh iy er
  91  did         27     13              23  dcl d ih dx
  92  work        25      8              66  w er kcl k
  93  other       25     14              26  ah dh er
  94  an          25     12              28  ax n
  95  I've        25      7              46  ay v
  96  thing       24      9              52  th ih ng
  97  even        24      7              40  iy v ix n
  98  our         23      9              33  aa r
  99  any         23     11              23  ix n iy
 100  we're       23      8              25  w ey r

The 20 most frequent words account for 35% of the tokens
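The columns of this table (N, #Pron, most common pronunciation and its share of the total) can be derived from a list of phonetically transcribed word tokens. The sketch below shows one way to do so; the function name `pronunciation_stats`, the input layout and the toy tokens are assumptions made for illustration and are not corpus data.

```python
from collections import Counter


def pronunciation_stats(tokens):
    """tokens: iterable of (word, pronunciation) pairs, one per spoken token."""
    by_word = {}
    for word, pron in tokens:
        by_word.setdefault(word, Counter())[pron] += 1
    rows = []
    for word, counts in by_word.items():
        n = sum(counts.values())                   # N: tokens of this word
        mcp, mcp_count = counts.most_common(1)[0]  # most common pronunciation
        rows.append({
            "word": word,
            "n": n,
            "n_pron": len(counts),                 # #Pron: distinct variants
            "mcp": mcp,
            "mcp_pct": round(100.0 * mcp_count / n),
        })
    rows.sort(key=lambda row: row["n"], reverse=True)  # rank by frequency
    return rows


# Toy tokens, invented for illustration only.
toy_tokens = [
    ("and", "ae n"), ("and", "ae n"), ("and", "eh n"), ("and", "ae n d"),
    ("I", "ay"), ("I", "ay"), ("I", "aa"),
]
stats = pronunciation_stats(toy_tokens)
```

Applied to the “and” counts listed earlier, the same arithmetic gives 82 / 521, or roughly 16%, for the variant “ae n”, in line with the table.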

QUESTION

How do listeners decode the speech signal given the large amount of pronunciation variation?


Challenge Number Two

Acoustic Variability

Effects of Reverberation on the Speech Signal

Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions

Yet, the intelligibility of speech is remarkably stable (unless the amount of reverberation or background noise is truly extreme)

How can this be so?

QUESTION

Is there some acoustic property that provides a basis for perceptual stability of the speech signal?

An Invariant Property of the Speech Signal?

Low-frequency energy fluctuations of the pressure waveform are largely preserved under many acoustic-interference conditions

In reverberant environments the modulation spectrum’s peak is attenuated and shifted down to ca. 2 Hz (but is largely preserved)

(“What is the modulation spectrum?” you ask) Let’s find out!

[Figure: Modulation Spectrum (based on an illustration by Hynek Hermansky)]

Modulation Spectrum Computation
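The computation diagram from this slide did not survive extraction. As a rough stand-in, the sketch below shows one simple way to estimate a modulation spectrum: extract the slow amplitude envelope of the signal and take the Fourier transform of that envelope. It is a simplification under stated assumptions (a single full-band envelope, block averaging of the rectified waveform, a 100-Hz envelope rate, and the name `modulation_spectrum`), not a reconstruction of the procedure used in the talk. The test signal is a synthetic 500-Hz carrier modulated at 4 Hz.

```python
import cmath
import math


def modulation_spectrum(signal, sample_rate, env_rate=100, max_mod_hz=20.0):
    """Return (modulation frequency in Hz, magnitude) pairs for the envelope.

    env_rate and max_mod_hz are illustrative choices, not values from the talk.
    """
    block = sample_rate // env_rate          # samples per envelope point
    n_blocks = len(signal) // block
    if n_blocks < 2:
        raise ValueError("signal too short for envelope analysis")
    # 1. Envelope: mean rectified amplitude in consecutive blocks.
    env = [sum(abs(x) for x in signal[i * block:(i + 1) * block]) / block
           for i in range(n_blocks)]
    # 2. Remove the mean so the DC term does not dominate.
    mean = sum(env) / n_blocks
    env = [e - mean for e in env]
    # 3. DFT of the envelope, evaluated only at low modulation frequencies.
    duration_s = n_blocks * block / sample_rate
    spectrum = []
    k = 1
    while k / duration_s <= max_mod_hz:
        coeff = sum(e * cmath.exp(-2j * math.pi * k * n / n_blocks)
                    for n, e in enumerate(env))
        spectrum.append((k / duration_s, abs(coeff) / n_blocks))
        k += 1
    return spectrum


# Synthetic test signal: 2 s of a 500-Hz carrier, amplitude-modulated at 4 Hz.
sample_rate = 8000
signal = [(1.0 + 0.8 * math.cos(2.0 * math.pi * 4.0 * n / sample_rate))
          * math.sin(2.0 * math.pi * 500.0 * n / sample_rate)
          for n in range(2 * sample_rate)]
spectrum = modulation_spectrum(signal, sample_rate)
peak_hz = max(spectrum, key=lambda pair: pair[1])[0]
```

For this test signal the largest component sits at the 4-Hz modulation rate. A per-band analysis would repeat the same steps on each band-pass filtered channel.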

Intelligibility and the Modulation Spectrum

Significant attenuation (or distortion) of the modulation spectrum results in an appreciable decline in the ability to understand spoken language

Why should this be so?

Greenberg and Arai (1998)

Anatomy of the Modulation Spectrum

Why is the modulation spectrum’s integrity so crucial for intelligibility?

What does it reflect linguistically?

Why is the bandwidth of the modulation spectrum associated with (intelligible) speech so broad?

Does the modulation spectrum reflect a unitary property of the speech signal?

Or something more complex?

[Figure: Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth)]

The Modulation Spectrum Reflects Syllables

The peak in the modulation spectrum (for speech) is ca. 5 Hz (200 ms)

The distribution associated with SYLLABLE DURATION is similar to the pattern of the MODULATION SPECTRUM …

Suggesting that the latter reflects SYLLABLES

[Figure: Syllable duration (in terms of equivalent modulation frequency) compared with the modulation spectrum. Modulation spectrum of a short excerpt from the Switchboard Corpus; syllable duration distribution associated with a 30-minute subset of Switchboard]
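The comparison on this slide rests on expressing each syllable duration as an equivalent modulation frequency, f = 1 / T, so that a 200-ms syllable corresponds to 5 Hz. The sketch below makes that conversion explicit and bins the results. The function names, bin edges and durations are invented for illustration and are not Switchboard measurements.

```python
def equivalent_modulation_hz(duration_ms):
    """Convert a duration in milliseconds to its reciprocal frequency in Hz."""
    return 1000.0 / duration_ms


def duration_histogram_hz(durations_ms, bin_edges_hz):
    """Count durations per modulation-frequency bin [low, high)."""
    counts = [0] * (len(bin_edges_hz) - 1)
    for duration in durations_ms:
        freq = equivalent_modulation_hz(duration)
        for i in range(len(counts)):
            if bin_edges_hz[i] <= freq < bin_edges_hz[i + 1]:
                counts[i] += 1
                break
    return counts


# Invented syllable durations (ms) and octave-wide bins (Hz).
durations = [100, 150, 200, 200, 250, 400, 500]
edges = [1, 2, 4, 8, 16]
hist = duration_histogram_hz(durations, edges)
```

Plotting such a histogram on the same frequency axis as a modulation spectrum is what allows the two distributions to be compared directly.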

The Trouble with Syllables …

The question thus arises …

If the modulation spectrum truly reflects syllables in the speech signal

Why is the distribution of syllable duration so broad?

And does this variability in syllable duration reflect something significant?

[Figure: Syllable duration (modulation frequency) compared with the modulation spectrum. Modulation spectrum of 15 minutes of spontaneous Japanese speech (OGI-TS corpus) compared with the syllable duration distribution for the same material (Arai and Greenberg, 1997)]

PART ONE

What Underlies Variation in Word Duration?

Word Duration

Most words (81%) in the Switchboard corpus are monosyllabic, and most of the remainder are disyllabic (together comprising 95% of the words)

The distribution of word duration therefore largely parallels that of syllables (plotted in units of duration [ms] on a logarithmic scale)

[Figure: All Words]

What Underlies Word Duration Variability?

Is this distribution of lexical duration of a uniform nature (and source)?

Or does it reflect a more complex set of phenomena?

It has been observed for WRITTEN text that the more frequent words tend to be shorter and the less common words longer (i.e., Zipf’s law of abbreviation)

Does such a relationship hold for spoken language?

Let’s find out!

Is Word Duration Related to Word Frequency?

Word duration (derived from the phonetically annotated portion of the Switchboard corpus) can be plotted relative to frequency of occurrence

Such an exercise shows that there is a WEAK relationship (r = -0.42) between lexical (unigram) frequency and word duration

There is a lot of variability in word duration for any given frequency range

Suggesting that lexical frequency, alone, is unlikely to account for variation in word duration

[Figure: Duration (ms), axis from 0 to 500, plotted against Number of Occurrences, axis ticks at 1, 10, 100 and 1000; r = -0.42. Words with fewer than 5 instances omitted from graph]
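A correlation of this kind can be computed along the following lines. The sketch is illustrative only: it assumes that each word is represented by its mean duration and the base-10 logarithm of its token count (the slides do not say how r was obtained), it reuses the slide's rule of omitting words with fewer than 5 instances, and both the function names and the toy durations are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def frequency_duration_r(word_durations_ms, min_count=5):
    """word_durations_ms maps each word to a list of token durations (ms)."""
    log_counts, mean_durations = [], []
    for durations in word_durations_ms.values():
        if len(durations) >= min_count:  # omit rare words, as on the slide
            log_counts.append(math.log10(len(durations)))
            mean_durations.append(sum(durations) / len(durations))
    return pearson_r(log_counts, mean_durations)


# Invented durations: frequent words short, rarer words longer.
toy = {
    "the": [80] * 50,
    "know": [150] * 10,
    "people": [300] * 5,
    "rare": [500] * 2,  # fewer than 5 instances, so excluded
}
r = frequency_duration_r(toy)
```

The toy data are constructed to give a strongly negative r; real speech shows far more scatter, as the slide's value of -0.42 indicates.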

If Not (entirely) Word Frequency, Then What?

One parameter that might be more directly related to word duration (and other durational properties of speech) is STRESS ACCENT

Stress Accent is related to the emphasis (or prominence) associated with individual syllables within a word

Although dictionaries list the stress patterns associated with words, this information is but a rough guide to the actual patterns observed (as is the phonetic pronunciation provided in the dictionary)

In order to obtain empirical data pertaining to stress accent, it is necessary to manually annotate a corpus (syllable by syllable)

This manual annotation has been performed for a 45-minute subset of the Switchboard corpus, which has also been labeled with respect to phonetic segments, syllables and words

It is thus possible to ascertain the relationship between stress accent and duration at the level of the word, syllable and phonetic segment

The remainder of this presentation focuses on the statistical relationship between stress accent and duration at these different linguistic tiers

Page 64: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

If Not (entirely) Word Frequency, Then What? One parameter that might be more directly related to word duration (and

other durational properties of speech) is STRESS ACCENT

Stress Accent is related to the emphasis (or prominence) associated with individual syllables within a word

Although dictionaries list the stress patterns associated with words, this information is but a rough guide to the actual patterns observed

(as is the phonetic pronunciation provided in the dictionary)

In order to obtain empirical data pertaining to stress accent, it is necessary to manually annotate a corpus (syllable by syllable)

This manual annotation has been performed for a 45-minute subset of the Switchboard corpus, which has also been labeled with respect to phonetic segments, syllables and words

It is thus possible to ascertain the relationship between stress accent and duration at the level of the word, syllable and phonetic segment

The remainder of this presentation focuses on the statistical relationship between stress accent and duration at these different linguistic tiers

Before examining these data, let’s briefly consider the nature of the annotated material

If Not (entirely) Word Frequency, Then What?

One parameter that might be more directly related to word duration (and other durational properties of speech) is STRESS ACCENT

Stress Accent is related to the emphasis (or prominence) associated with individual syllables within a word

Although dictionaries list the stress patterns associated with words, this information is but a rough guide to the actual patterns observed (as is the phonetic pronunciation provided in the dictionary)

In order to obtain empirical data pertaining to stress accent, it is necessary to manually annotate a corpus (syllable by syllable)

This manual annotation has been performed for a 45-minute subset of the Switchboard corpus, which has also been labeled with respect to phonetic segments, syllables and words

It is thus possible to ascertain the relationship between stress accent and duration at the level of the word, syllable and phonetic segment

The remainder of this presentation focuses on the statistical relationship between stress accent and duration at these different linguistic tiers

Before examining these data, let’s briefly consider the nature of the annotated material (this is important for evaluating the reliability of the results obtained)

INTERMEZZO

Being Phonetically (and Prosodically) Annotated

Phonetic Transcription of Spontaneous English

Telephone dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically annotated (labeled and segmented)

Most of this Material has been Manually Annotated

4 hours labeled at the phone level and segmented at the syllabic level

1 hour labeled and segmented at the phonetic-segment level

The remaining material has been segmented at the phonetic-segment level using automatic methods

45 minutes of stress-accent-labeled material

An additional four hours of material automatically labeled with respect to accent (this latter material was not used in the current analysis, but will be available soon)

There is a Lot of Diversity in the Material Transcribed

Spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality

Transcription System

A variant of Arpabet (which was also used for transcription of the TIMIT corpus)

Phonetic Transcription of Spontaneous English

The Data are Available at …

http://www.icsi.berkeley.edu/real/stp

Phonetic Transcription

How was the Labeling and Segmentation Performed?

VERY carefully … by UC-Berkeley linguistics students

Using a display of the signal waveform, spectrogram, word transcription and “forced alignments” (automatic estimates of phones and boundaries) + audio (listening at multiple time scales - phone, word, utterance) on Sun workstations

Additionally, automatic segmentation and labeling of articulatory manner was used as a guide for phonetic labeling and segmentation in recent work

Annotation of Stress Accent

Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent

Three levels of accent were distinguished:

Heavy

Light

None

(In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others)

An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary)

In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) and one other labeled as very lightly accented (0.25)
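
The labeling convention described above can be sketched in a few lines of Python. This is only an illustration, not the annotators' actual tooling: the function name `accent_level`, the use of `None` to stand for the “null” label, and the choice to fold every intermediate value (0.5, or finer values such as 0.25) into the “light” category are all assumptions made for the example.

```python
from typing import Optional

def accent_level(label: Optional[float]) -> str:
    """Map a raw syllable label to one of the three accent levels.

    Assumed encoding (hypothetical, for illustration only):
      1.0  -> "heavy"  (fully accented)
      None -> "none"   (the "null" label, completely unaccented)
      else -> "light"  (0.5, or finer values such as 0.25)
    """
    if label is None or label <= 0:
        return "none"
    if label >= 1.0:
        return "heavy"
    return "light"

# Hypothetical label sequence, one label per vocalic nucleus.
labels = [None, 0.5, None, None, 0.25, None, 0.5]
levels = [accent_level(x) for x in labels]
```

In this made-up sequence most syllables come out as unaccented and three as lightly accented, mirroring the kind of pattern described for the example utterance.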

PART TWO

The Relation between Stress Accent and Word Duration

Back to Stress Accent and Word Duration…

Stress accent is supposed to bear some systematic relation to three principal acoustic parameters of the speech signal:

Fundamental Frequency

Amplitude

Duration

In previous studies my colleagues and I have shown that f0-related cues play a relatively small role in stress accent assignment (at least for spontaneous American English material)

Amplitude and duration appear to play a far more important role than f0

Therefore, it is not unreasonable to assume that the stress accent patterns associated with words bear some tangible relation to lexical duration

So … let’s find out!

Word Duration and Stress Accent Level

Let’s first examine the durational properties of heavily accented words (these are words containing at least one heavily accented syllable)

The mean duration of this subset (36%) is 378 ms (s.d. = 168 ms)

Most of the heavily accented words are longer than 200 ms

[Figure legend: Heavily Accented]
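
As a rough sketch of how such word-level statistics could be derived from syllable-level labels, the Python below classifies each word by its most heavily accented syllable and then summarizes the heavily accented subset. The rule for heavily accented words follows the definition given above; the rule for lightly accented words (at least one light syllable and no heavy one), the function name `word_accent`, and the toy durations are assumptions made purely for illustration and are not data from the corpus.

```python
from statistics import mean, stdev

def word_accent(syllable_levels):
    """Assign a word the level of its most heavily accented syllable."""
    if "heavy" in syllable_levels:
        return "heavy"
    if "light" in syllable_levels:   # assumed rule, not stated in the talk
        return "light"
    return "none"

# Hypothetical toy records: (word duration in ms, accent level of each syllable).
words = [
    (410, ["none", "heavy"]),
    (350, ["heavy"]),
    (180, ["heavy"]),
    (260, ["light", "none"]),
    (120, ["none"]),
]

heavy = [dur for dur, syl in words if word_accent(syl) == "heavy"]
share = len(heavy) / len(words)                       # proportion of heavily accented words
avg, sd = mean(heavy), stdev(heavy)                   # sample mean and s.d. in ms
over_200 = sum(d > 200 for d in heavy) / len(heavy)   # fraction longer than 200 ms
```

The same three summaries (proportion, mean with s.d., and fraction above 200 ms) can be computed for the lightly accented and unaccented subsets by changing the level in the filter.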

Word Duration and Stress Accent Level

Let’s now compare the duration of the heavily accented words with that of their lightly accented counterparts (25% of the total)

The mean duration of this subset is 255 ms (s.d. = 116 ms)

In many respects the durational properties of these two subsets are similar

[Figure legend: Heavily Accented, Lightly Accented]

Word Duration and Stress Accent Level

Let’s now compare the duration of unaccented words with that of their accented counterparts

The mean duration of the unaccented subset (39%) is 149 ms (s.d. = 78 ms)

The unaccented words are generally shorter than 200 ms and have a very different distributional form from their accented counterparts

[Figure legend: Heavily Accented, Lightly Accented, Unaccented]
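
The figures quoted for the three subsets can be combined to see what they imply for the corpus as a whole. The sketch below treats the word population as a mixture of the heavily accented (36%), lightly accented (25%) and unaccented (39%) subsets. It assumes that these three percentages partition all words and that the quoted standard deviations can be used as population values; both are assumptions of the example rather than statements from the talk.

```python
from math import sqrt

# (proportion of words, mean duration in ms, s.d. in ms), as quoted in the talk.
subsets = {
    "heavy": (0.36, 378.0, 168.0),
    "light": (0.25, 255.0, 116.0),
    "none":  (0.39, 149.0, 78.0),
}

total_w = sum(w for w, _, _ in subsets.values())

# Mean of a mixture: weighted average of the component means.
pooled_mean = sum(w * m for w, m, _ in subsets.values()) / total_w

# Variance of a mixture: E[x^2] - (E[x])^2, where each component
# contributes w * (sd^2 + mean^2) to E[x^2].
second_moment = sum(w * (s**2 + m**2) for w, m, s in subsets.values()) / total_w
pooled_sd = sqrt(second_moment - pooled_mean**2)
```

Under these assumptions the pooled mean comes out at roughly 258 ms and the pooled s.d. at roughly 160 ms. The pooled spread is considerably larger than that of the lightly accented or unaccented subset alone, because the subset means themselves lie far apart.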

Word Duration and Stress Accent Level

Let’s now compare the durational properties of ALL WORDS in the corpus with those pertaining to words of varying accent levels

When we do so, we notice that the left-hand branch of the lexical distribution largely reflects unaccented words, while the right-hand branch reflects mostly accented words (with the peak reflecting both)

[Figure legend: Heavily Accented, Lightly Accented, Unaccented, All Words]

Page 125: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Heavily Accented

LightlyAccented

Unaccented

All Words

Word Duration and Stress Accent LevelTherefore, it appears that the broad distribution of word duration (and, in turn,

syllable duration) largely reflects the co-existence of accented and unaccented words within spontaneous speech

What are the implications of this insight?
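The claim that a broad distribution arises from pooling accented and unaccented words can be illustrated with a minimal sketch. The durations below are hypothetical placeholders chosen only for illustration; they are not measurements from the corpus, and the helper name `spread` is an assumption of this example.

```python
from statistics import mean, pstdev

# Hypothetical word durations in milliseconds (placeholders, NOT corpus data).
unaccented = [90, 110, 120, 130, 150]
heavily_accented = [260, 300, 320, 350, 400]


def spread(durations):
    """Return (mean, population standard deviation) of a list of durations."""
    return mean(durations), pstdev(durations)


# Pooling both accent classes mimics the "All Words" distribution.
pooled = unaccented + heavily_accented

for label, data in [("unaccented", unaccented),
                    ("heavily accented", heavily_accented),
                    ("all words", pooled)]:
    m, s = spread(data)
    print(f"{label:>17}: mean = {m:6.1f} ms, sd = {s:5.1f} ms")
```

With these placeholder values the pooled set has a larger standard deviation than either class on its own, which is the sense in which co-existing accent levels broaden the overall distribution.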

Breadth of the Modulation Spectrum

The broad bandwidth of the modulation spectrum, therefore, appears to reflect the heterogeneity in syllabic and lexical duration associated with variation in stress accent level

Does this insight have implications for the lower tiers of spoken language? (e.g., the phonetic and phonological levels)

Let’s find out!

[Figure: Modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth); curves: Unaccented, Heavily Accented, All Accents (Convergence)]
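The link between unit duration and the modulation spectrum can be made concrete with a rough rule of thumb: a unit lasting T seconds contributes modulation energy near 1/T Hz. This reciprocal mapping is a simplifying assumption of the sketch below, and the durations are hypothetical rather than taken from the talk.

```python
def modulation_frequency_hz(duration_ms):
    """Approximate modulation frequency (Hz) for a unit lasting duration_ms."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return 1000.0 / duration_ms


# Hypothetical syllable durations in milliseconds (illustrative only).
for duration_ms in (100, 200, 400):
    freq = modulation_frequency_hz(duration_ms)
    print(f"{duration_ms} ms -> {freq:.1f} Hz")
```

Under this assumption, a mixture of short (unaccented) and long (accented) units spreads energy over a wide range of modulation frequencies, which is consistent with the broad bandwidth described on the slide.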

INTERMEZZO

Anatomy of the Syllable

The Importance of the Syllable

The analyses to follow are all linked, in some fashion, to syllable structure

In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level)

As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns

What is an onset? What is a nucleus? What is a coda?

The following slides provide a brief (and gentle) introduction to syllable structure

Syllable and Phonetic Segment Illustrated

Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA

Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition)

Most (but not all) syllables also contain an ONSET (usually a CONSONANT)

Many syllables contain a CODA (also typically a CONSONANT)

The most common syllable form in English is Onset + Nucleus + Coda (“Nine”)

Followed in popularity by Onset + Nucleus (“Two”)

“J” = JUNCTURE
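The three-way division described above can be sketched in a few lines. The vowel inventory and the ARPAbet-style phone labels are assumptions made for this example, as is the function name `split_syllable`; the talk does not specify a transcription scheme here.

```python
# Hypothetical vowel inventory (ARPAbet-style labels), for illustration only.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def split_syllable(phones):
    """Split one syllable's phone list into (onset, nucleus, coda)."""
    nucleus_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nucleus_positions:
        raise ValueError("syllable has no vocalic nucleus")
    first, last = nucleus_positions[0], nucleus_positions[-1]
    # Everything before the vocalic stretch is the onset, everything after is the coda.
    return phones[:first], phones[first:last + 1], phones[last + 1:]


print(split_syllable(["n", "ay", "n"]))  # "nine": Onset + Nucleus + Coda
print(split_syllable(["t", "uw"]))       # "two": Onset + Nucleus, empty coda
```

The function takes a single, already-delimited syllable; finding syllable boundaries in running speech is a separate problem that this sketch does not attempt.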

PART THREE

Stress Accent and Syllable Position

The Importance of Syllable Structure

Before going into the details of durational variation at the segmental level, we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent

These data serve to illustrate the sort of variation observed that is conditioned by position within the syllable

Pronunciation Variation – Syllable and Accent

Pronunciation variation is systematic at the level of the syllable

It’s also systematic when stress accent is taken into account

BOTH syllable structure and accent level are required for a full accounting

[Figure: panels for All Segments, Deletions, Insertions and Substitutions, with regions labeled ONSET Territory, NUCLEUS Territory and CODA Territory]

A Coarse Perspective on Pronunciation Variation (at the level of the syllable and stress accent)

Analysis of Durational Properties of Speech

The following analyses are conditioned on stress accent level and (for the most part) syllable position

We will begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration

However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level

Under such conditions, the durational properties associated with light accent are generally intermediate between heavy accent and none

Syllable Duration - Across Syllable Forms

There is a broad range of syllable structures observed in spoken English

Together, the V, VC, CV and CVC forms account for 85% of syllables

The CVCC and CCVC forms account for another 10%

Together, the CV and CVC forms cover ca. 60% of the syllables
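The percentages above can be combined with a little arithmetic, and the notion of a canonical form can be made explicit in code. This is a sketch only: the vowel inventory, phone labels and the name `canonical_form` are assumptions of the example, and because the slide gives "ca. 60%" the derived figures are approximate.

```python
# Hypothetical vowel inventory (ARPAbet-style labels), for illustration only.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def canonical_form(phones):
    """Map a syllable's phones to a C/V pattern such as 'CVC'."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


# Percentages quoted on the slide.
FOUR_SIMPLE_FORMS = 85   # V, VC, CV and CVC together
CVCC_AND_CCVC = 10       # CVCC and CCVC together
CV_AND_CVC = 60          # "ca. 60%"

# Derived (approximate) shares implied by those figures.
v_and_vc = FOUR_SIMPLE_FORMS - CV_AND_CVC             # V and VC together
remaining = 100 - FOUR_SIMPLE_FORMS - CVCC_AND_CCVC   # all other forms

print(canonical_form(["n", "ay", "n"]))  # pattern for "nine"
print(f"V + VC: about {v_and_vc}% of syllables")
print(f"All other forms: about {remaining}% of syllables")
```

Read this way, the six forms named on the slide cover roughly 95% of syllables, with V and VC contributing about a quarter of the total.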

Syllable Duration - Across Syllable Forms

It is not surprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below)

Note the systematic lengthening of the syllable for each form as the accent level increases from none to light to heavy

This pattern is representative of accent’s impact on duration (as we’ll see)

[Figure: graph plotted across the Canonical Syllable Forms; V = Vowel, C = Consonant]

Syllable Duration - Accent Level/Syllable Form

[Figure: graph plotted across the Canonical Syllable Forms]

This graph shows the same data as the previous slides, but from the perspective of only two accent levels (heavy and none)

The heavily accented syllables are generally 60-100% longer than their unaccented counterparts

The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV)

This pattern implies that accent has the greatest impact on vocalic duration

V = Vowel, C = Consonant
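To make the "60-100% longer" comparison concrete, the relative lengthening can be computed as below. The durations are hypothetical placeholders, not values read from the graph, and `percent_longer` is a name introduced for this example.

```python
def percent_longer(accented_ms, unaccented_ms):
    """Percentage by which the accented duration exceeds the unaccented one."""
    if unaccented_ms <= 0:
        raise ValueError("unaccented duration must be positive")
    return 100.0 * (accented_ms - unaccented_ms) / unaccented_ms


# Hypothetical (accented, unaccented) durations in milliseconds.
for accented_ms, unaccented_ms in [(160, 100), (200, 100)]:
    gain = percent_longer(accented_ms, unaccented_ms)
    print(f"{accented_ms} ms vs {unaccented_ms} ms -> {gain:.0f}% longer")
```

On this measure, 100% longer means the accented unit lasts twice as long as its unaccented counterpart.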

Nucleus Duration - Accent Level/Syllable Form

The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below

[Figure: graph plotted across the Canonical Syllable Forms]

Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts

This pattern implies that the syllable nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned)

PART FOUR

Stress Accent and the Vocalic Nucleus

Stress Accent’s Impact on the Vocalic Nucleus

Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form, it is likely that the structure of the syllable has relatively little impact on vocalic duration

As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form

We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact

But first … a brief primer on vocalic acoustics (which should facilitate digesting the material that follows)

INTERMEZZO

A Brief Primer on Vowel Acoustics

A Brief Primer on Vocalic Acoustics

Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue

• The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance

• The height parameter is closely linked to the frequency of F1

In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows:

[Figure: vowel triangle]
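The two acoustic correlates named above can be turned into rough coordinates for placing a vowel in the triangle. This is a minimal sketch: the formant values are approximate figures of the kind quoted in introductory phonetics texts and should be treated as placeholders, not data from the talk; the function name `vowel_coordinates` is likewise an assumption.

```python
# Approximate (F1, F2) values in Hz for three corner vowels.
# Placeholder figures for illustration only, NOT measurements from the talk.
FORMANTS = {
    "iy": (270, 2290),  # high front
    "aa": (730, 1090),  # low
    "uw": (300, 870),   # high back
}


def vowel_coordinates(f1_hz, f2_hz):
    """Return (front-back proxy, height proxy) for one vowel.

    A larger F2 - F1 suggests a more front vowel;
    a lower F1 suggests a higher vowel.
    """
    return f2_hz - f1_hz, f1_hz


for vowel, (f1, f2) in FORMANTS.items():
    front_back, height = vowel_coordinates(f1, f2)
    print(f"{vowel}: F2 - F1 = {front_back} Hz, F1 = {height} Hz")
```

Plotting F2 - F1 against F1 for a fuller vowel set would give an acoustic analogue of the articulatory triangle, under the assumptions stated here.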

The Spatial Patterning of Duration in Vocalic Nuclei

Spatial Patterning of Duration

Let’s return to the vowel triangle and see if it can shed light on certain patterns in the vocalic data

The duration will be plotted on a 2-D grid, where the x-axis will always be in terms of hypothetical front-back tongue position (and hence remain a constant throughout the plots to follow)

The y-axis will serve as the dependent measure expressed in terms of duration or the proportion of fully stressed (or unstressed) nuclei

Vocalic Duration and Vowel Height

Figure panels: All nuclei, Diphthongs, Monophthongs

The spatial patterning of vocalic segments is systematic with respect to duration

Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

Thus, duration appears to be highly correlated with vowel height

But … the situation is a little more complicated than first appearances would suggest


Durational Differences - Stressed/Unstressed

There is a large dynamic range in duration between accented and unaccented vocalic nuclei

Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs

Canonical Syllable Forms

Lax monophthongs


Vocalic Identity Among Unstressed Nuclei

The high, lax monophthongs are almost always unstressed

The low vowels, be they monophthongs or diphthongs, are rarely unstressed

The high diphthongs and high/mid, tense monophthongs occupy an intermediate position


Vocalic Identity Among Fully Stressed Nuclei

The high vowels are rarely fully stressed

The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed

An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here)


Is It Stress? Vocalic Identity? Or What?

Duration Appears to Play An Important (but certainly not exclusive) Role in Stress Accent for Spontaneous American English Discourse

For any given vocalic class, stressed segments are longer (on average)

The durational disparity is most pronounced among the low vowels and the diphthongs

Low Vowels Tend to be Much Longer in Duration than High Vowels

This is the case even for diphthongs

Low Vowels are Rarely without Some Measure of Stress Accent

This is true for monophthongs as well as diphthongs

High Vowels are Fully Stressed Extremely Rarely

This is particularly so for monophthongs, but also applies to diphthongs

Thus, Stress Accent Appears to Be Intricately Involved with Vocalic Identity (as illustrated on the next several slides)


The Vowel Space Under (Full) Stress (Accent)

There is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels

Canonical Vowels Only


The Vowel Space Without (Stress) Accent

In unaccented syllables, vowels are confined largely to the high-front and high-central sectors of the articulatory space

The low and mid vowels “get creamed”

Canonical Vowels Only


The Vowel Spaces Compared

Stress accent exerts a profound effect on the character of the vowel space

High vowels are largely associated with unaccented syllables

Low vowels are mostly associated with accented forms

This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation

Figure panels: Heavily Accented, Unaccented

Canonical Vowels Only


PART FIVE

Stress Accent’s Impact on Syllable Onsets


Stress Accent and Syllable Onsets

The onset is often cited as the key syllabic constituent with respect to “lexical access”

It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level

Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level

The following slides suggest that this assumption is incorrect, and that the structure of the onset is more complex (and more interesting) than initial intuition would suggest


Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as in vocalic constituents)

Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter)

The duration of unaccented onsets is similar across syllable forms


Canonical Syllable Forms

Onset Duration - Accent Level/Syllable Form

Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts

Although this durational difference is not quite as large as observed for vocalic nuclei, it is still substantial (and mostly consistent across forms)
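As a minimal sketch of the arithmetic behind a statement such as "50-60% longer", the relative lengthening can be computed from a pair of mean durations. The function name `percent_longer` and the millisecond values in the usage lines are hypothetical placeholders chosen for illustration; they are not measurements from the corpus discussed here.

```python
def percent_longer(accented_ms: float, unaccented_ms: float) -> float:
    """Return how much longer the accented mean duration is, in percent.

    Both arguments are mean durations in milliseconds. The unaccented mean
    serves as the reference (denominator).
    """
    if unaccented_ms <= 0:
        raise ValueError("unaccented_ms must be positive")
    return 100.0 * (accented_ms - unaccented_ms) / unaccented_ms


# Placeholder values only (NOT corpus measurements): a reference duration of
# 100 ms and accented durations of 150 ms and 160 ms bracket the quoted range.
low_end = percent_longer(150.0, 100.0)
high_end = percent_longer(160.0, 100.0)
```

Using the unaccented duration as the denominator matches the wording "longer than their unaccented counterparts"; dividing by the accented duration instead would yield a smaller percentage for the same pair of means.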


Onset Duration and Place of Articulation

It is of interest to examine accent’s impact on duration of onset (and coda) constituents in somewhat greater detail

A convenient means to do so is to partition the data with respect to place of maximum articulatory constriction in order to highlight certain patterns

What is place of articulation? Let’s find out!


Place of Articulation – A Brief Primer

The tongue contacts (or nearly so) the roof of the mouth in producing many of the consonantal sounds in English

Anterior
Labial [p] [b] [m]
Labio-dental [f] [v]
Inter-dental [th] [dh]

Central
Alveolar [t] [d] [n] [s] [z]

Posterior
Palatal [sh] [zh]
Velar [k] [g] [ng]

Chameleon
Rhoticized [r]
Lateral [l]
Approximant [hh]

From Daniloff (1973)
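The place classes listed above can be written down as a small lookup table, which is one way to partition segment labels by place of articulation. This is only a sketch: the names `PLACE_CLASSES`, `place_class` and `partition_by_place` are hypothetical, and the table contains only the symbols that appear on the primer slide, so any other symbol is reported as unclassified.

```python
# Place-of-articulation classes, copied from the primer slide.
# Outer key: broad place class; inner key: specific place; value: segments.
PLACE_CLASSES = {
    "anterior": {
        "labial": ["p", "b", "m"],
        "labio-dental": ["f", "v"],
        "inter-dental": ["th", "dh"],
    },
    "central": {
        "alveolar": ["t", "d", "n", "s", "z"],
    },
    "posterior": {
        "palatal": ["sh", "zh"],
        "velar": ["k", "g", "ng"],
    },
    "chameleon": {
        "rhoticized": ["r"],
        "lateral": ["l"],
        "approximant": ["hh"],
    },
}


def place_class(segment):
    """Return (broad class, specific place) for a segment, or None if unlisted."""
    for broad, places in PLACE_CLASSES.items():
        for place, segments in places.items():
            if segment in segments:
                return broad, place
    return None


def partition_by_place(segments):
    """Group segment labels by broad place class; unlisted ones go under None."""
    groups = {}
    for seg in segments:
        found = place_class(seg)
        broad = found[0] if found else None
        groups.setdefault(broad, []).append(seg)
    return groups


groups = partition_by_place(["p", "t", "k", "r", "dh", "s"])
```

Keeping the broad class and the specific place as separate levels makes it possible to analyse the data at either granularity without rebuilding the table.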


Onset Duration and Place of Articulation

We will examine accent’s impact on the duration of onset (and coda) constituents on the basis of articulatory place

First, we will examine the anterior consonants, followed by the central and posterior onsets

Finally, we will examine those segments whose place of articulation assimilates to that of the following vocalic segment (“place chameleons”)

Although the heavily accented onsets are generally 50-60% longer than their unaccented counterparts, there is a large disparity in the durational differences due to accent level

We will now examine the specific durational patterns as a function of articulatory place ...

The patterns are revealing


Syllable Onset Duration - ANTERIOR Place

Canonical Syllable Forms

The voiceless consonants ([p] and [f]) are longer than the other segments

The largest durational disparity (as a function of accent level) is exhibited in the glide [y]

The smallest durational disparity is manifest in the voiced fricative [dh]

The other segments exhibit intermediate patterns


Segmental Identity and Stress Accent

It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …)

Usually, non-canonical realizations are manifest as segmental deletions

The pattern of segmental realization bears some correspondence to durational variation as a function of accent level

But also exhibits some interesting differences (which are potentially significant for models of phonetic organization)

Before we examine the segmental patterns in detail, a brief primer on the interpretation of these data is presented


Road Map - How to Interpret the Data

Accent

Segment Can Trans Can Trans Can Trans Can Trans

p 203 205 153 153 94 94 450 452

b 126 127 227 225 214 190 567 542

m 137 137 211 211 116 110 464 458

f 136 136 104 104 113 103 353 343

v 35 33 58 58 108 93 201 184

th 62 61 102 100 28 26 192 187

TotalHeavy Light None

dh 95 80 311 257 625 451 1031 788

y 63 72 135 136 193 145 391 353

Compare the numbers in the YELLOW and ORANGE columns

Most numbers in the YELLOW / ORANGE columns will be similar

Indicating that the phonetic realization of the segment is the canonical form

A large disparity between columns is marked with a blue box

READY?

Can = Canonical formTrans = Transcribed (i.e., phonetically realized)

Page 261: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Road Map - How to Interpret the Data

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
p          203   205   153   153    94    94   450   452
b          126   127   227   225   214   190   567   542
m          137   137   211   211   116   110   464   458
f          136   136   104   104   113   103   353   343
v           35    33    58    58   108    93   201   184
th          62    61   102   100    28    26   192   187
dh          95    80   311   257   625   451  1031   788
y           63    72   135   136   193   145   391   353

Compare the numbers in the YELLOW and ORANGE columns

Most numbers in the YELLOW / ORANGE columns will be similar, indicating that the phonetic realization of the segment is the canonical form

A large disparity between columns is marked with a blue box

READY? OK, Let’s go!

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
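One way to check a reading of the table is to confirm that the Total columns equal the sums of the Heavy, Light and None columns, separately for the Can and Trans counts. The sketch below does this for the rows shown on this slide. The names `ROWS` and `totals_consistent` are illustrative and are not part of the original presentation.

```python
# Rows copied from the slide: segment -> eight counts in the order
# Heavy (Can, Trans), Light (Can, Trans), None (Can, Trans), Total (Can, Trans).
ROWS = {
    "p":  (203, 205, 153, 153, 94, 94, 450, 452),
    "b":  (126, 127, 227, 225, 214, 190, 567, 542),
    "m":  (137, 137, 211, 211, 116, 110, 464, 458),
    "f":  (136, 136, 104, 104, 113, 103, 353, 343),
    "v":  (35, 33, 58, 58, 108, 93, 201, 184),
    "th": (62, 61, 102, 100, 28, 26, 192, 187),
    "dh": (95, 80, 311, 257, 625, 451, 1031, 788),
    "y":  (63, 72, 135, 136, 193, 145, 391, 353),
}


def totals_consistent(row):
    """Return True if the Total Can/Trans counts equal the sums over accent levels."""
    h_can, h_tr, l_can, l_tr, n_can, n_tr, t_can, t_tr = row
    return (h_can + l_can + n_can == t_can) and (h_tr + l_tr + n_tr == t_tr)


# Collect any segment whose Total columns do not match the per-level sums.
inconsistent = [seg for seg, row in ROWS.items() if not totals_consistent(row)]
print(inconsistent)  # prints [] for the rows above
```

An empty list means every row is internally consistent, which supports the column order Heavy, Light, None, Total used here.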

Page 264: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Onset Statistics – ANTERIOR Place

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
p          203   205   153   153    94    94   450   452
b          126   127   227   225   214   190   567   542
m          137   137   211   211   116   110   464   458
f          136   136   104   104   113   103   353   343
v           35    33    58    58   108    93   201   184
th          62    61   102   100    28    26   192   187
dh          95    80   311   257   625   451  1031   788
y           63    72   135   136   193   145   391   353

Stress accent exerts relatively little effect on anterior onset segments

EXCEPT for [dh] and [y]

[dh] (as in “the” and “them”) tends to delete in unaccented syllables, as does [y] (although to a lesser extent)

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
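The comparison of Can and Trans counts can be sketched as a relative disparity, (Trans - Can) / Can, computed per segment and accent level. The slides do not state what criterion was used to mark a "large disparity", so the 15% threshold below is purely an assumption chosen for illustration, as are the function names.

```python
# (Can, Trans) counts per accent level, copied from the anterior onset table.
ANTERIOR_ONSETS = {
    "p":  {"heavy": (203, 205), "light": (153, 153), "none": (94, 94)},
    "b":  {"heavy": (126, 127), "light": (227, 225), "none": (214, 190)},
    "m":  {"heavy": (137, 137), "light": (211, 211), "none": (116, 110)},
    "f":  {"heavy": (136, 136), "light": (104, 104), "none": (113, 103)},
    "v":  {"heavy": (35, 33),   "light": (58, 58),   "none": (108, 93)},
    "th": {"heavy": (62, 61),   "light": (102, 100), "none": (28, 26)},
    "dh": {"heavy": (95, 80),   "light": (311, 257), "none": (625, 451)},
    "y":  {"heavy": (63, 72),   "light": (135, 136), "none": (193, 145)},
}


def relative_disparity(can, trans):
    """Signed change from canonical to transcribed count, as a fraction of Can."""
    return (trans - can) / can


def flag_disparities(table, threshold=0.15):
    """List (segment, accent level) pairs whose disparity exceeds the threshold.

    The threshold is a hypothetical cut-off, not the author's criterion.
    """
    flagged = []
    for seg, levels in table.items():
        for level, (can, trans) in levels.items():
            if can > 0 and abs(relative_disparity(can, trans)) > threshold:
                flagged.append((seg, level))
    return flagged


print(flag_disparities(ANTERIOR_ONSETS))
# [('dh', 'heavy'), ('dh', 'light'), ('dh', 'none'), ('y', 'none')]
```

With this assumed threshold only [dh] and [y] are flagged, in line with the exceptions named on the slide. A lower cut-off such as 10% would also flag unaccented [b] and [v].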

Page 266: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Onset Duration - CENTRAL Place

Canonical Syllable Forms

The voiceless consonants ([t] and [s]) are longer than the other segments

The alveolar flap [dx] and nasal flap [nx] are the shortest segments and don’t exhibit a durational disparity as a function of accent level

Page 269: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Onset Statistics – CENTRAL Place

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
t          241   245   276   230   513   276  1030   751
d          141   143   149   134   173   128   463   405
dx           0     3     0    62     0   179     0   244
n          133   135   237   196   194   130   564   461
nx           0     2     0    40     0    73     0   115
s          289   290   284   287   187   186   760   763
z           14    13    16    16    43    45    73    74

Central segments tend to “disappear” under (absence of) stress (accent)

There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions

In heavily accented syllables, central segments maintain their canonical identity

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
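The two tendencies noted above (canonical [t], [d] and [n] going missing, and flaps appearing) can be set side by side with a little arithmetic on the table. The sketch below sums, for each accent level, the shortfall Can - Trans over [t], [d], [n] and the transcribed counts of [dx] and [nx]. The function name and the grouping of segments are illustrative choices. The totals alone cannot show which individual tokens were replaced by a flap and which were deleted outright.

```python
# (Can, Trans) counts per accent level, copied from the central onset table.
CENTRAL_ONSETS = {
    "t":  {"heavy": (241, 245), "light": (276, 230), "none": (513, 276)},
    "d":  {"heavy": (141, 143), "light": (149, 134), "none": (173, 128)},
    "n":  {"heavy": (133, 135), "light": (237, 196), "none": (194, 130)},
    "dx": {"heavy": (0, 3),     "light": (0, 62),    "none": (0, 179)},
    "nx": {"heavy": (0, 2),     "light": (0, 40),    "none": (0, 73)},
}


def shortfall_and_flaps(table, level, stops=("t", "d", "n"), flaps=("dx", "nx")):
    """Return (canonical tokens not transcribed as such, flap tokens transcribed)."""
    shortfall = sum(table[s][level][0] - table[s][level][1] for s in stops)
    flap_count = sum(table[f][level][1] for f in flaps)
    return shortfall, flap_count


for level in ("heavy", "light", "none"):
    print(level, shortfall_and_flaps(CENTRAL_ONSETS, level))
# heavy (-8, 5)
# light (102, 102)
# none (346, 252)
```

In heavily accented syllables there is essentially no shortfall and almost no flaps. In lightly accented syllables the two totals coincide, and in unaccented syllables the shortfall exceeds the flap count, which would be consistent with additional outright deletions.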

Page 273: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Onset Duration - POSTERIOR Place

CANONICAL Syllable Forms

The voiceless consonants ([k], [sh], [ch]) are longer than the other segments

Most of the segments exhibit a durational disparity between accented and unaccented forms

The duration of the voiced segments in unaccented syllables is ca. 50-60 ms

The glide [w] exhibits a significant disparity between accented and unaccented forms

Page 275: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Onset Statistics – Posterior Place

Posterior segments are remarkably stable in onset position

The only significant “deviation” from canonical representation is the intrusion of the glottal stop [q], which lacks phonemic status in English

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
k          185   186   189   187   170   168   544   541
g          115   116   138   137    54    51   307   304
ng           0     0     2     3     1     1     3     4
sh          26    26    40    40    73    80   139   146
zh           0     1     2     9    11    17    13    27
ch          32    34    19    27    22    23    73    84
jh          31    30    52    43    58    48   141   121
w          201   209   310   330   276   287   787   826
q            0    33     0    64     0    38     0   135

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Page 277: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Onset Duration - Place Chameleons

CANONICAL Syllable Forms

Place chameleon segments exhibit a consistent durational disparity between accented and unaccented forms

In unaccented syllables the duration of these segments is ca. 50-60 ms

Page 280: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Onset Statistics – Place Chameleons

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
r          272   269   233   215   233   162   738   646
l          184   180   226   212   220   162   630   554
hh         158   156   169   157    67    37   394   350
er           0     0     0     2     0     0     0     2
lg           0     2     0     8     0    21     0    31
el           0     1     0     0     0     0     0     1

“Chameleons” assimilate their place of articulation to the following vowel

They are relatively stable at syllable onset, except in unaccented forms

The reduced form of [l] is [lg], a glide-like element – it tends to assume the functional status of [l] in unaccented syllables

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Page 282: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Pronunciation Patterns – Syllable Onsets

The ANTERIOR and POSTERIOR onsets are generally canonically realized

(the exceptions typically function as “junctures,” rather than as segments)

The CENTRAL and PLACE CHAMELEON onsets are often non-canonical (and also often function as “junctures”)

C = Canonical realization
N = Non-canonical realization
N0 = Non-canonical in unaccented syllables

Place of Articulation Approximants

Page 283: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

PART SIX

Stress Accent’s Impact on Syllable Codas

Page 289: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Stress Accent and Syllable Codas

Stress accent’s impact on syllable codas differs from that of onsets

The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are NOT taken into account)

There is a far greater probability of segmental deletion in coda constituents

Accent level exerts a powerful influence on segmental deletion and on segmental duration

To a certain degree segmental deletion and duration interact (or are flip sides of the same phonetic coin)

(for this reason the durational properties of ALL syllables, including those in which coda segments are deleted, are also shown)
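The interaction between deletion and duration can be illustrated with a small sketch. It assumes, and this is an assumption rather than something the slides state, that a deleted coda segment contributes 0 ms when duration is averaged over ALL syllable forms. Under that assumption the mean over all forms is the mean over realized tokens scaled by the fraction of tokens realized. The function name and the numbers are hypothetical and are not taken from the data.

```python
def mean_duration_all_forms(mean_realized_ms, n_canonical, n_realized):
    """Mean duration over ALL canonical tokens, counting deleted tokens as 0 ms.

    Assumption (not stated in the slides): a deleted segment contributes
    zero duration to the average.
    """
    if n_canonical <= 0 or not 0 <= n_realized <= n_canonical:
        raise ValueError("need n_canonical > 0 and 0 <= n_realized <= n_canonical")
    return mean_realized_ms * n_realized / n_canonical


# Hypothetical numbers, for illustration only: the realized segments have the
# same mean duration in both conditions, but more tokens are deleted in
# unaccented syllables.
accented = mean_duration_all_forms(60.0, n_canonical=100, n_realized=90)
unaccented = mean_duration_all_forms(60.0, n_canonical=100, n_realized=50)
print(accented, unaccented)  # 54.0 30.0
```

Even with identical durations among the realized tokens, a higher deletion rate lowers the mean computed across all forms, which is one way to read the "flip sides of the same phonetic coin" remark.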

Page 292: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - ANTERIOR Place

CANONICAL Syllable Forms

The durational disparity between accented and unaccented forms is smaller for codas than for onsets

Certain segments exhibit little if any difference in duration as a function of accent (e.g., [b], [m], [v])

Such segments manifest certain properties of flaps

Page 294: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - ANTERIOR Place

ALL Syllable Forms

Because of the significant number of deletions in coda constituents, particularly in unaccented syllables, the durational disparity between accented and unaccented syllables is preserved when duration is computed across ALL syllable forms (including those with deletions)

Those segments exhibiting flap-like properties (e.g., [b], [m], [v]) tend to delete the most in unaccented codas

Page 298: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Statistics – Anterior Place

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
p           33    32    39    32    17    13    89    77
b            9     6     4     4     1     1    14    11
m          108    96   148   148   112    83   368   327
f           37    36    40    40    36    48   113   124
v           63    55   102    87   172    94   337   236
th          11    10    24    16    34    20    69    46
dh           0     0     0     4     0     5     0     9

Anterior coda segments are relatively stable under stress (accent)

The segments [m] and [v] are exceptions: they often function as “flaps” in this context, and they tend to delete in unaccented syllables

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Page 301: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - CENTRAL Place

CANONICAL Syllable Forms

The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables (see durational data for ALL syllables)

Many of the coda segments do not exhibit a difference in duration as a function of accent (when computed for the canonical syllable forms)

Most of the unaccented codas are short in duration

Page 303: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - CENTRAL Place

ALL Syllable Forms

Because of the high probability of deletions for central coda consonants the mean durations are quite low relative to other conditions

In some sense the default duration for central codas is very short (more on this point later on in the presentation)

Page 307: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Statistics – Central Place

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
t          322   126   575   191   562   172  1459   489
d          200   119   295   127   370    96   865   342
n          311   237   498   381   773   542  1582  1160
s          142   135   202   214   151   155   495   504
z          179   149   258   208   271   221   708   578

Central coda segments are extremely unstable under stress (accent)

(except for the fricatives [s] and [z])

The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables

The major effect of stress accent is on the probability of segmental deletion (which is appreciably higher in unaccented forms)

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)
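To put the coda figures next to the onset figures for the same central segments, the sketch below computes the ratio Trans / Can from the Total columns of the two tables. This ratio is only a rough proxy for how often a segment survives, since Trans counts can also include tokens that were inserted or substituted. The dictionary and function names are illustrative.

```python
# Total (Can, Trans) counts copied from the central-place onset and coda tables.
ONSET_TOTALS = {"t": (1030, 751), "d": (463, 405), "n": (564, 461),
                "s": (760, 763), "z": (73, 74)}
CODA_TOTALS = {"t": (1459, 489), "d": (865, 342), "n": (1582, 1160),
               "s": (495, 504), "z": (708, 578)}


def realization_ratio(can, trans):
    """Transcribed count as a fraction of the canonical count."""
    return trans / can


for seg in ONSET_TOTALS:
    onset = realization_ratio(*ONSET_TOTALS[seg])
    coda = realization_ratio(*CODA_TOTALS[seg])
    print(f"{seg}: onset {onset:.2f}, coda {coda:.2f}")
```

For [t] the ratio falls from roughly 0.73 at onset to roughly 0.34 in the coda, and [d] shows a similar drop, whereas [s] stays close to 1 in both positions.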

Page 309: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - POSTERIOR Place

CANONICAL Syllable Forms

Many coda consonants are short in duration

Most segments exhibit relatively little sensitivity to accent level

Page 310: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - POSTERIOR Place

ALL Syllable Forms

There are relatively few deletions in posterior coda segments, hence the durational patterns for ALL syllable forms are similar to those for the canonical syllable forms

Page 313: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Statistics – POSTERIOR Place

                        Accent
             Heavy       Light       None        Total
Segment    Can Trans   Can Trans   Can Trans   Can Trans
k          170   150   196   162    51    39   417   351
g           10    10     8    10     4     5    22    25
q            0    42     0    71     0    54     0   167
ng          63    60   139   126   203   129   405   315
sh           9     9     2     2     4     6    15    17
zh           1     0     0     4     0     2     1     6
ch          26    25    27    25    12    12    65    62
jh          10    10    11    10    15    12    36    32
w            0     4     0     2     0     6     0    12

Posterior coda segments are relatively stable under stress (accent)

The primary exception is [ng], which tends to delete in unaccented syllables

The “infamous” glottal stop [q] tends to insert in this context

Can = Canonical form
Trans = Transcribed (i.e., phonetically realized)

Page 315: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - Place Chameleons

CANONICAL Syllable Forms

There is a large durational disparity between the accented and unaccented chameleon segments

In unaccented syllables the duration of these segments is ca. 60 ms

Page 317: Time Frames of Spoken Language Steven Greenberg International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 steveng

Syllable Coda Duration - Place Chameleons

ALL Syllable Forms

There are a lot of deletions of coda chameleons in unaccented syllables

Hence the mean duration of these segments in unaccented forms is short
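
One way to see why deletions shorten the mean is to count each deleted token as contributing 0 ms. The sketch below assumes that reading; the durations and the helper name are hypothetical and are not values from the corpus.

```python
def mean_duration_ms(realized_durations_ms, n_deleted=0):
    """Mean segment duration when each deleted token counts as 0 ms.

    `realized_durations_ms` holds durations of tokens that were actually
    produced; `n_deleted` is the number of canonical tokens that were not.
    """
    n_tokens = len(realized_durations_ms) + n_deleted
    if n_tokens == 0:
        raise ValueError("no tokens to average over")
    return sum(realized_durations_ms) / n_tokens


if __name__ == "__main__":
    realized = [60, 70, 50, 60]  # hypothetical durations in ms
    print(mean_duration_ms(realized))               # realized tokens only
    print(mean_duration_ms(realized, n_deleted=4))  # deletions included
```

With as many deleted as realized tokens, the mean over all canonical tokens is half the mean over realized tokens.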

Syllable Coda Statistics – Place Chameleons

Chameleon segments are unstable under stress (accent)

This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms

The segment [r] tends to delete in unaccented syllables, but not otherwise

Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Pronunciation Patterns – Syllable Codas

The ANTERIOR and POSTERIOR codas are generally canonically realized (the exceptions typically function as “junctures,” rather than segments)

The CENTRAL and PLACE CHAMELEON segments are often non-canonical (and also often function as “junctures”)

C = Canonical realization; N = Non-canonical realization; N0 = Non-canonical in unaccented syllables

[Figure axis labels: Place of Articulation; Approximants]

PART SEVEN

Onset and Coda Patterns Compared

Comparison of Syllable Onsets and Codas

Onsets tend to be more stable than codas

The centrally articulated segments are highly unstable in both contexts

As are the place chameleons

The unstable anterior and posterior phones are mostly “junctures”

C = Canonical realization; N = Non-canonical realization; N0 = Non-canonical in unaccented syllables

[Figure axis labels: Place of Articulation; Approximants]

PART EIGHT

A Preliminary Juncture-Accent Model

Road Map to the Juncture-Accent Model

A means of visualizing important properties of the acoustic signal

The juncture-accent representation is based on log, critical-band energy across time and frequency

Although it is not intended as an auditory representation, it does represent spectro-temporal properties of the signal in a manner consistent with auditory principles

Let’s take a look at some illustrations – Spectro-Temporal Profiles or “STePs”
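
A minimal sketch of what "log, critical-band energy across time and frequency" could look like in code is given below. The slides do not specify the filterbank, frame length or band count, so everything here is an assumption: band edges come from Traunmüller's Bark-scale approximation, frames are 256 samples with a Hann window, and the function names are hypothetical. A naive DFT is used to keep the example free of external libraries.

```python
import cmath
import math


def hz_to_bark(freq_hz):
    """Traunmueller's approximation of the Bark (critical-band) scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def log_band_energies(frame, sample_rate, n_bands=18, floor=1e-10):
    """Return log10 energy per critical band for one frame of samples."""
    n = len(frame)
    # Hann window to reduce spectral leakage between bands.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    energies = [0.0] * n_bands
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        # Assign the bin to a one-Bark-wide band, clamped to the valid range.
        band = min(n_bands - 1, max(0, int(hz_to_bark(k * sample_rate / n))))
        energies[band] += abs(coeff) ** 2
    return [math.log10(e + floor) for e in energies]


def spectro_temporal_profile(signal, sample_rate, frame_len=256, hop=128):
    """Stack per-frame band energies into a time-by-band grid."""
    return [
        log_band_energies(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(1024)]
    profile = spectro_temporal_profile(tone, rate)
    strongest = max(range(len(profile[0])), key=lambda b: profile[0][b])
    print(len(profile), "frames; strongest band index:", strongest)
```

Each row of the returned grid is one time frame and each column one critical band, which is the kind of time-by-frequency layout the profiles that follow appear to visualize.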

Anatomy of a Spectro-Temporal Profile

[Figure: spectro-temporal profile of “Seven”, [s] [eh] [vx] [en], full-spectrum perspective (OGI Numbers95). Labels: juncture, accented syllable, unaccented syllable, mean duration]

Anatomy of a Spectro-Temporal Profile

[Figure: spectro-temporal profile of “Seven”, [s] [eh] [vx] [en], high-frequency perspective (OGI Numbers95). Labels: juncture, accented syllable, unaccented syllable, mean duration]

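Anatomy of a Spectro-Temporal Profile

[Figure: spectro-temporal profile of “Zero”, full-spectrum perspective (OGI Numbers95). Segment labels: [z], [ih], [r], [ax]; transcription line: [z] [ih] [r] [ah]. Other labels: juncture, accented syllable, unaccented syllable, mean duration]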

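Spectro-Temporal Profile

[Figure: spectro-temporal profile of “Zero”, high-frequency perspective (OGI Numbers95). Segment labels: [z], [ih], [r], [ax]; transcription line: [z] [ih] [r] [ah]. Other labels: juncture, accented syllable, unaccented syllable, mean duration]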

Spectro-Temporal Profile

[Figure: spectro-temporal profile of “Three”, [th] [r] [iy], full-spectrum perspective (OGI Numbers95). Labels: accented syllable, mean duration]

Spectro-Temporal Profile

[Figure: spectro-temporal profile of “Three”, [th] [r] [iy], high-frequency perspective (OGI Numbers95). Labels: accented syllable, mean duration]

Summary and Conclusions (at last!)

Summary and Conclusions

Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard), the following conclusions are drawn:

Stress accent is the primary linguistic property associated with duration at the segmental, syllabic and lexical levels

Stress accent’s impact on duration is most pronounced in the vocalic nucleus

But it also affects the duration of the syllable onset

The duration of the syllable coda is less affected by stress accent, however ...

Coda constituents are more prone to deletion as a function of stress accent

Thus, stress accent has an (indirect) impact on duration even for codas (via segmental deletion)

These data are inconsistent with a segmental model of spoken language

But they are consistent with a JUNCTURE-ACCENT model based on syllable forms of variable accent level

That’s All, Folks

Many Thanks for Your Time and Attention

What’s Going on in Pronunciation?

What’s Going On? (in pronunciation)

With respect to onset and coda segments (i.e., consonants) there are two basic forms – (1) those that are relatively stable across accent level, and (2) those that are not

Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulatory constriction is either anterior or posterior

The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables

The place chameleons (i.e., the approximants) are not very stable in either onset or coda position

The vowels are divisible into two main groups – accented and unaccented

The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space

The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space

Certain segments are actually junctures – e.g., the flaps and the glottal stop

Many so-called segments are actually junctures (as they are flaps); the most noteworthy examples are [dh] and [v]

None of these properties is consistent with a segmental model of language

Syllable Duration and Number of Segments

For syllables longer than a single segment there is relatively little difference in duration as the number of segments (within a syllable) increases

This suggests that syllable duration is largely controlled by processes independent of segmental production

Canonical Syllable Forms