CS578 - Speech Signal Processing
Lecture 2: Production and Classification of Speech Sounds

Yannis Stylianou
University of Crete, Computer Science Dept., Multimedia Informatics Lab

Outline

1 Anatomy and Physiology of Speech Production

    Larynx
    Vocal Tract
    Categories of sound by source

2 Spectrographic analysis of Speech

3 Elements of Language

4 Prosody of Speech

5 Perception of Speech

6 Acknowledgments

A Simple view

Cross sectional view

Downward-looking into the larynx: Vocal folds

Left: Voicing, Right: Breathing

Vocal Folds vibration

Bernoulli’s principle in the glottis

Glottal airflow velocity

Softer, typical, and relaxed glottal flow

Other vocal folds configurations

Left: Whispering, Middle: Voicing, Right: Whispering voicing

Other forms of vibration

Creaky voice:

    vocal folds very tense
    only a portion of them in oscillation
    harsh-sounding voice
    high and irregular pitch

Vocal fry:

    folds are massy and relaxed
    abnormally low and irregular pitch
    secondary pulses during open phase

Diplophonia:

    extra flaps
    secondary pulses during the closed phase

Examples

Upper panel: vocal fry, Lower panel: diplophonia

Vocal Tract

By vocal tract we mean:

    Oral cavity: from the larynx to the lips
    Nasal cavity

Oral tract length: about 17 cm for a male voice, shorter for females

Its purpose is to spectrally “color” the source and to generate new sources for sound production

Vocal tract shapes

Spectral Shaping

Vocal tract is often approximated by a linear filter with:

Formant frequencies

Formant amplitude

Formant bandwidth

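Assuming a stable, all-pole vocal tract filter:

$$H(z) = \frac{A}{\prod_{k=1}^{N_i} (1 - c_k z^{-1})(1 - c_k^{*} z^{-1})} = \sum_{k=1}^{N_i} \frac{A_k}{(1 - c_k z^{-1})(1 - c_k^{*} z^{-1})}$$

Each complex-conjugate pole pair $c_k, c_k^{*}$ models one formant.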
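The all-pole model can be explored numerically. The following is a minimal sketch, not the lecture's own code: it places one conjugate pole pair per formant and evaluates the magnitude of H(z) on the unit circle. The sampling rate, the formant frequencies and bandwidths, and the helper names `formant_poles` and `all_pole_response` are illustrative assumptions, as is the common mapping from bandwidth B to pole radius, r = exp(-pi B / fs).

```python
import cmath
import math

def formant_poles(formants_hz, bandwidths_hz, fs):
    """Map (frequency, bandwidth) pairs to poles c_k inside the unit circle."""
    poles = []
    for f, b in zip(formants_hz, bandwidths_hz):
        radius = math.exp(-math.pi * b / fs)   # |c_k| < 1 keeps the filter stable
        angle = 2.0 * math.pi * f / fs         # pole angle sets the formant frequency
        poles.append(cmath.rect(radius, angle))
    return poles

def all_pole_response(poles, freq_hz, fs, gain=1.0):
    """Evaluate |H(z)| at z = exp(j*2*pi*freq_hz/fs) for the all-pole product form."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    denom = 1.0 + 0j
    for c in poles:
        denom *= (1 - c * z_inv) * (1 - c.conjugate() * z_inv)
    return abs(gain / denom)

fs = 8000                        # assumed sampling rate in Hz
formants = [500, 1500, 2500]     # assumed formant frequencies in Hz
bandwidths = [60, 90, 120]       # assumed formant bandwidths in Hz
poles = formant_poles(formants, bandwidths, fs)

# Sample |H| every 100 Hz and keep the local maxima of the sampled curve.
magnitudes = {f: all_pole_response(poles, f, fs) for f in range(0, 4001, 100)}
peaks = [f for f in range(100, 4000, 100)
         if magnitudes[f] > magnitudes[f - 100] and magnitudes[f] > magnitudes[f + 100]]
print(peaks)
```

Under these assumptions the spectral peaks fall at the chosen formant frequencies, and widening a bandwidth moves the corresponding pole away from the unit circle, which flattens its peak.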

Example

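Let the excitation of the vocal tract, whose impulse response is $h[n]$, be:

$$u[n] = g[n] * p[n]$$

where $p[n]$ is a periodic impulse train with period $P$ and $g[n]$ is the glottal flow over one cycle. Then the output speech, $x[n, \tau]$, is given by:

$$x[n, \tau] = w[n, \tau]\,\{h[n] * (g[n] * p[n])\}$$

and

$$X(\omega, \tau) = \frac{1}{P} \sum_{k=-\infty}^{\infty} H(\omega_k)\, G(\omega_k)\, W(\omega - \omega_k, \tau)$$

where $\omega_k = 2\pi k / P$ are the harmonic frequencies.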
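The relation above says that the short-time spectrum is a sum of window transforms placed at the harmonics and weighted by H and G. A minimal sketch of this, under toy assumptions: the glottal pulse `g` is a decaying exponential, the vocal tract `h` is a single resonance near 500 Hz, and the window is a Hamming window spanning five pitch periods. None of these choices come from the lecture.

```python
import cmath
import math

def convolve(a, b):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def dtft_magnitude(x, freq_hz, fs):
    """|X(omega)| of a finite sequence at a single frequency."""
    omega = 2.0 * math.pi * freq_hz / fs
    return abs(sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x)))

fs = 8000      # assumed sampling rate in Hz
P = 80         # pitch period in samples, so F0 = fs / P = 100 Hz
p = [1.0 if n % P == 0 else 0.0 for n in range(1200)]   # impulse train p[n]
g = [0.9 ** n for n in range(40)]                       # toy glottal pulse g[n] (assumed)

# Toy vocal tract h[n]: truncated impulse response of one conjugate pole pair.
r = math.exp(-math.pi * 60 / fs)
theta = 2.0 * math.pi * 500 / fs
h = [r ** n * math.sin((n + 1) * theta) / math.sin(theta) for n in range(200)]

s = convolve(h, convolve(g, p))                         # h[n] * (g[n] * p[n])

# Short-time segment: Hamming window over 5 pitch periods, taken in steady state.
N, tau = 400, 400
w = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]
x = [w[n] * s[tau + n] for n in range(N)]

# Magnitude at the first ten harmonics and at one point halfway between two of them.
harmonics = {k * fs // P: dtft_magnitude(x, k * fs / P, fs) for k in range(1, 11)}
between = dtft_magnitude(x, 250, fs)
strongest = max(harmonics, key=harmonics.get)
print(strongest, harmonics[200] / between)
```

In this sketch the energy concentrates at multiples of F0, and the strongest harmonic is the one closest to the assumed resonance, which is the "harmonics shaped by formants" picture the equation describes.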

Harmonics and Formants

Soprano

Ways to categorize speech sounds

Vocal fold state:

    Voiced
    Unvoiced

Oral tract state:

    Plosives
    Fricatives

Also: voiced and unvoiced plosives (/b/, /t/), voiced and unvoiced fricatives (/z/, /f/), whispered unvoiced

“Which tea party did baker go to?”

Short Time Fourier Transform, STFT

STFT:

$$X(\omega, \tau) = \sum_{n=-\infty}^{\infty} x[n, \tau]\, e^{-j\omega n}$$

where

$$x[n, \tau] = w[n, \tau]\, x[n]$$

Spectrogram:

$$S(\omega, \tau) = |X(\omega, \tau)|^2$$
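The definitions above translate almost directly into code. The following is a minimal sketch, assuming a sliding Hamming window (w[n, τ] taken as w[n - τ]) and frequencies sampled on a DFT grid; the function names `hamming`, `stft`, and `spectrogram` and all parameter values are illustrative, not part of the lecture.

```python
import cmath
import math

def hamming(N):
    """Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def stft(x, window, hop):
    """Return a list of frames; frame i holds X(omega_k, tau_i) for k = 0..N/2."""
    N = len(window)
    frames = []
    for tau in range(0, len(x) - N + 1, hop):
        frame = []
        for k in range(N // 2 + 1):
            omega = 2.0 * math.pi * k / N
            # x[n, tau] = w[n - tau] x[n]; the sum runs over the window support only.
            frame.append(sum(window[m] * x[tau + m] * cmath.exp(-1j * omega * (tau + m))
                             for m in range(N)))
        frames.append(frame)
    return frames

def spectrogram(x, window, hop):
    """S(omega, tau) = |X(omega, tau)|^2 for every frame and frequency bin."""
    return [[abs(X) ** 2 for X in frame] for frame in stft(x, window, hop)]

fs = 8000                                                  # assumed sampling rate in Hz
x = [math.sin(2.0 * math.pi * 1000 * n / fs) for n in range(512)]   # 1 kHz test tone
S = spectrogram(x, hamming(64), hop=32)

# Frequency (in Hz) of the strongest bin in each frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in S]
peak_hz = [k * fs / 64 for k in peak_bins]
print(peak_hz[:3])
```

The window length is the main design choice here: a window spanning several pitch periods resolves individual harmonics (narrowband spectrogram), while a window shorter than a pitch period resolves individual glottal pulses in time (wideband spectrogram).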

Narrowband Spectrogram

Wideband Spectrogram

Spectrogram on speech

Spectrogram on speech; another example

Do we know better now?

How to classify sounds by looking in the time or in the frequency domain for:

    periodic, noisy, or impulsive sources?
    the shape of the vocal tract?
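One simple time-domain cue for telling a periodic source from a noisy one is the peak of the normalized autocorrelation. The sketch below is only an illustration of that idea and is not a method given in the lecture; the function name `periodicity`, the lag range, and the synthetic test signals are all assumptions.

```python
import math
import random

def periodicity(x, min_lag, max_lag):
    """Return (best_lag, score): the peak of the normalized autocorrelation."""
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:len(x) - lag], x[lag:]
        num = sum(u * v for u, v in zip(a, b))
        den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

fs = 8000   # assumed sampling rate in Hz

# Voiced-like signal: 100 Hz fundamental plus one harmonic (period of 80 samples).
voiced_like = [math.sin(2.0 * math.pi * 100 * n / fs)
               + 0.5 * math.sin(2.0 * math.pi * 200 * n / fs) for n in range(400)]

# Noise-like signal: seeded Gaussian noise, standing in for a fricative source.
rng = random.Random(0)
noise_like = [rng.gauss(0.0, 1.0) for _ in range(400)]

# Search lags 20..150 samples, i.e. pitch between roughly 53 Hz and 400 Hz.
lag_v, score_v = periodicity(voiced_like, 20, 150)
lag_n, score_n = periodicity(noise_like, 20, 150)
print(lag_v, round(score_v, 3), round(score_n, 3))
```

A score close to 1 suggests a periodic source and gives a pitch estimate of fs / best_lag, while a low score suggests a noisy source. Any decision threshold between the two would have to be tuned on real speech.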

Phonemes’ map

Vowels

Source: Quasi-periodic puffs of airflow

System: Each vowel phoneme corresponds to a different vocal tract configuration.

Vowels: time and spectrogram

Nasals

Source: Quasi-periodic puffs of airflow

System: Air flows mainly through the nasal cavity, with the oral tract being constricted

Nasals: time and spectrogram

Fricatives

Source:

    Voiced: vocal folds vibrate
    Unvoiced: vocal folds are relaxed and not vibrating

System: Oral tract being constricted by tongue at the back, center, or front of the oral tract, or at the teeth or lips

Fricatives’ profile

Fricatives: time and spectrogram

Plosives, or “burst” signals

Voiced:

Source: vocal folds are vibrating (“voice bar”)

System: Oral tract being constricted by tongue at the back, center, or front of the oral tract, or at the teeth or lips

Unvoiced:

Source: vocal folds are not vibrating

System: Oral tract being constricted by tongue at the back, center, or front of the oral tract, or at the teeth or lips

Plosives’ profile

Voice Onset Time

Plosives: time and spectrogram

Semi-vowels

Transitional Speech Sounds: “boy”

Prosody of speech

By prosody of speech we refer to:

Rhythm

Fundamental frequency contour (pitch)

Loudness

Stressed speech

“Please do this today”:

Perception of Speech

?

Acknowledgments

Most, if not all, figures in this lecture come from the book:

T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall, 2002

and have been used with permission from Prentice Hall
