Sonorant Grab Bag March 27, 2014

Page 1: Sonorant Grab Bag

Sonorant Grab Bag

March 27, 2014

Page 2: Sonorant Grab Bag

Speech Synthesis: A Basic Overview

• Speech synthesis is the generation of speech by machine.

• The reasons for studying synthetic speech have evolved over the years:

1. Novelty

2. To control acoustic cues in perceptual studies

3. To understand the human articulatory system

• “Analysis by Synthesis”

4. Practical applications

• Reading machines for the blind, navigation systems

Page 3: Sonorant Grab Bag

Speech Synthesis: A Basic Overview

• There are four basic types of synthetic speech:

1. Mechanical synthesis

2. Formant synthesis

• Based on Source/Filter theory

3. Concatenative synthesis

• = stringing bits and pieces of natural speech together

4. Articulatory synthesis

• = generating speech from a model of the vocal tract.

Page 4: Sonorant Grab Bag

1. Mechanical Synthesis

• The very first attempts to produce synthetic speech were made without electricity.

• = mechanical synthesis

• In the late 1700s, models were produced which used:

• reeds as a voicing source

• differently shaped tubes for different vowels

Page 5: Sonorant Grab Bag

Mechanical Synthesis, part II

• Later, Wolfgang von Kempelen and Charles Wheatstone created a more sophisticated mechanical speech device…

• with independently manipulable source and filter mechanisms.

Page 6: Sonorant Grab Bag

Mechanical Synthesis, part III

• An interesting historical footnote:

• Alexander Graham Bell and his “questionable” experiments with his dog.

• Mechanical synthesis has largely gone out of style ever since.

• …but check out Mike Brady’s talking robot.

Page 7: Sonorant Grab Bag

The Voder

• The next big step in speech synthesis was to generate speech electronically.

• This was most famously demonstrated at the New York World’s Fair in 1939 with the Voder.

• The Voder was a manually controlled speech synthesizer.

• (operated by highly trained young women)

Page 8: Sonorant Grab Bag

Voder Principles

• The Voder basically operated like a vocoder.

• Voicing and fricative source sounds were filtered by 10 different resonators…

• each controlled by an individual finger!

• Only about 1 in 10 people had the ability to learn how to play the Voder.

Page 9: Sonorant Grab Bag

Overtone Singing

• F0 stays the same (on a “drone”), while the singer shapes the vocal tract so that individual harmonics (“overtones”) resonate.

• What kind of voice quality would be conducive to this?

Page 10: Sonorant Grab Bag

Vowels and Sonorants

• So far, we’ve talked a lot about the acoustics of vowels:

• Source: periodic openings and closings of the vocal folds.

• Filter: characteristic resonant frequencies of the vocal tract (above the glottis)

• Today, we’ll talk about the acoustics of sonorants:

• Nasals

• Laterals

• Approximants

• The source/filter characteristics of sonorants are similar to vowels… with a few interesting complications.

Page 11: Sonorant Grab Bag

Damping

• One interesting acoustic property exhibited by (some) sonorants is damping.

• Recall that resonance occurs when:

• a sound wave travels through an object

• that sound wave is reflected...

• ...and reinforced, on a periodic basis

• The periodic reinforcement sets up alternating patterns of high and low air pressure

• = a standing wave

Page 12: Sonorant Grab Bag

Resonance in a closed tube

(Figure: resonance in a closed tube, with time as one axis)

Page 13: Sonorant Grab Bag

Damping, schematized

• In a closed tube:

• With only one pressure pulse from the loudspeaker, the wave will eventually dampen and die out.

• Why?

• The walls of the tube absorb some of the acoustic energy, with each reflection of the standing wave.

Page 14: Sonorant Grab Bag

Damping Comparison

• A heavily damped wave will die out more quickly than a lightly damped wave.
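A rough numerical sketch of this comparison, assuming the wave's envelope decays exponentially (a common model, not something the slides state). The decay rates of 10 and 50 per second are illustrative values, not measurements:

```python
import math


def damped_wave(freq_hz, decay_per_sec, t):
    """Value at time t (seconds) of a sine wave with an exponentially decaying envelope."""
    envelope = math.exp(-decay_per_sec * t)
    return envelope * math.sin(2 * math.pi * freq_hz * t)


def time_to_fraction(decay_per_sec, fraction=0.01):
    """Seconds until the envelope exp(-decay * t) falls to `fraction` of its starting size."""
    return math.log(1 / fraction) / decay_per_sec


if __name__ == "__main__":
    # Hypothetical decay rates chosen only to contrast light and heavy damping.
    for label, decay in [("light", 10.0), ("heavy", 50.0)]:
        print(f"{label} damping: envelope reaches 1% after {time_to_fraction(decay):.3f} s")
```

The larger the decay rate, the sooner the envelope shrinks to any given fraction of its starting size.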

Page 15: Sonorant Grab Bag

Damping Factors

• The amount of damping in a tube is a function of:

• The volume of the tube

• The surface area of the tube

• The material of which the tube is made

• More volume, more surface area = more damping

• Think about the resonant characteristics of:

• a Home Depot

• a post-modern restaurant

• a movie theater

• an anechoic chamber

Page 16: Sonorant Grab Bag

An Anechoic Chamber

Page 17: Sonorant Grab Bag

Resonance and Recording

• Remember: any room will reverberate at its characteristic resonant frequencies

• Hence: high quality sound recordings need to be made in specially designed rooms which damp any reverberation

• Examples:

• Classroom recording (29 dB signal-to-noise ratio)

• “Soundproof” booth (44 dB SNR)

• Anechoic chamber (90 dB SNR)
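To get a feel for these figures, decibels can be converted back into plain ratios. A minimal sketch, assuming the SNR values are amplitude (sound pressure) ratios on the usual 20·log10 scale:

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude ratio (20 * log10 convention)."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    rooms = [("classroom", 29), ("soundproof booth", 44), ("anechoic chamber", 90)]
    for room, snr_db in rooms:
        ratio = db_to_amplitude_ratio(snr_db)
        print(f"{room}: signal is about {ratio:,.0f} times the noise amplitude")
```

Under that assumption, 29 dB is a ratio of roughly 28 to 1, 44 dB roughly 158 to 1, and 90 dB roughly 31,600 to 1.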

Page 18: Sonorant Grab Bag

Spectrograms

(Figure: spectrograms of the classroom and “soundproof” booth recordings)

Page 19: Sonorant Grab Bag

Spectrograms

(Figure: spectrogram of the anechoic chamber recording)

Page 20: Sonorant Grab Bag

Inside Your Nose

• In nasals, air flows through the nasal cavities.

• The resonating “filter” of nasal sounds therefore has:

• increased volume

• increased surface area

• increased damping

• Note:

• the exact size and shape of the nasal cavities vary wildly from speaker to speaker.

Page 21: Sonorant Grab Bag

Nasal Variability

• Measurements based on MRI data (Dang et al., 1994)

Page 22: Sonorant Grab Bag

Damping Effects, part 1

[m] [m]

• Damping by the nasal cavities decreases the overall amplitude of the sound coming out through the nose.

Page 23: Sonorant Grab Bag

Damping Effects, part 2

• How might the power spectrum of an undamped wave compare to that of a damped wave?

• A: Undamped waves have only one component;

• Damped waves have a broader range of components.

Page 24: Sonorant Grab Bag

Here’s Why

100 Hz sinewave + 90 Hz sinewave + 110 Hz sinewave

Page 25: Sonorant Grab Bag

The Result

90 Hz + 100 Hz + 110 Hz

• If the 90 Hz and 110 Hz components have less amplitude than the 100 Hz wave, there will be less damping:
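One way to see this: by the identity sin(a − b) + sin(a + b) = 2 sin(a) cos(b), adding 90 Hz and 110 Hz components of equal amplitude a to a 100 Hz wave gives a 100 Hz wave scaled by 1 + 2a·cos(2π·10·t). Over the first 0.05 s that scale factor shrinks from 1 + 2a to 1 − 2a, which looks like damping (the sum is periodic, so it grows again afterwards). The sketch below checks this numerically; the component amplitudes 0.25 and 0.1 are illustrative assumptions:

```python
import math


def three_component_wave(t, side_amp):
    """100 Hz sine plus 90 Hz and 110 Hz sines, each with amplitude `side_amp`."""
    return math.sin(2 * math.pi * 100 * t) + side_amp * (
        math.sin(2 * math.pi * 90 * t) + math.sin(2 * math.pi * 110 * t)
    )


def envelope(t, side_amp):
    """Slowly varying scale factor on the 100 Hz wave: 1 + 2a * cos(2*pi*10*t)."""
    return 1 + 2 * side_amp * math.cos(2 * math.pi * 10 * t)


if __name__ == "__main__":
    # Hypothetical amplitudes for the 90 Hz and 110 Hz components.
    for side_amp in (0.25, 0.1):
        start, end = envelope(0.0, side_amp), envelope(0.05, side_amp)
        print(f"side components at {side_amp}: scale goes from {start:.2f} to {end:.2f}")
```

Weaker 90 Hz and 110 Hz components leave the scale factor closer to 1 throughout, so the wave shrinks proportionally less.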

Page 26: Sonorant Grab Bag

Damping Spectra

(Figure: spectra under light and medium damping)

Page 27: Sonorant Grab Bag

Damping Spectra

(Figure: spectrum under heavy damping)

• Damping increases the bandwidth of the resonating filter.

• Bandwidth = the range of frequencies over which a filter responds at 0.707 or more of its maximum output (the half-power, or -3 dB, points).

• Nasal formants will have a larger bandwidth than vowel formants.
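The link between damping and bandwidth can be checked with a small experiment. This is only a sketch under stated assumptions: the damped wave is modelled as a 100 Hz sine with an exponentially decaying envelope, the spectrum is a plain DFT evaluated at 1 Hz steps, and the decay rates are illustrative. The function names are made up for this example.

```python
import cmath
import math


def dft_magnitudes(signal, sample_rate, freqs_hz):
    """Magnitude of the discrete Fourier transform of `signal` at each frequency in `freqs_hz`."""
    mags = []
    for f in freqs_hz:
        acc = sum(
            x * cmath.exp(-2j * math.pi * f * k / sample_rate)
            for k, x in enumerate(signal)
        )
        mags.append(abs(acc))
    return mags


def half_power_bandwidth_hz(decay_per_sec, freq_hz=100.0, sample_rate=1000, duration_s=1.0):
    """Width (Hz) of the band where the spectrum is at least 0.707 of its peak."""
    n = int(sample_rate * duration_s)
    signal = [
        math.exp(-decay_per_sec * k / sample_rate)
        * math.sin(2 * math.pi * freq_hz * k / sample_rate)
        for k in range(n)
    ]
    freqs = list(range(50, 151))  # 1 Hz steps around the 100 Hz component
    mags = dft_magnitudes(signal, sample_rate, freqs)
    peak = max(mags)
    strong = [f for f, m in zip(freqs, mags) if m >= 0.707 * peak]
    return max(strong) - min(strong)


if __name__ == "__main__":
    # Hypothetical decay rates: 0 = undamped, larger = more heavily damped.
    for decay in (0.0, 15.0, 60.0):
        print(f"decay {decay:>4}: bandwidth ~ {half_power_bandwidth_hz(decay)} Hz")
```

Heavier damping should report a wider band, while an undamped sine that fits the analysis window exactly should report 0 Hz at this 1 Hz resolution.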

Page 28: Sonorant Grab Bag

Bandwidth in Spectrograms

The formants in nasals have increased bandwidth, in comparison to the formants in vowels.

F3 of [m] F3 of

Page 29: Sonorant Grab Bag

Nasal Formants

• The values of formant frequencies for nasal stops can be calculated with the same formula that we used to calculate formant frequencies for a tube that is closed at one end and open at the other.

• $f_n = \frac{(2n - 1)\,c}{4L}$

• The simplest case: the uvular nasal [ɴ].

• The length of the tube is a combination of:

• distance from glottis to uvula (9 cm)

• distance from uvula to nares (12.5 cm)

• An average tube length (for adult males): 21.5 cm

Page 30: Sonorant Grab Bag

The Math

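Tube length: 9 cm (glottis to uvula) + 12.5 cm (uvula to nares)

$L = 21.5$ cm

$c = 35000$ cm/sec

$f_n = \frac{(2n - 1)\,c}{4L}$

$F_1 = \frac{35000}{4 \times 21.5} = \frac{35000}{86} \approx 407$ Hz

$F_2 = \frac{3 \times 35000}{86} \approx 1221$ Hz

$F_3 = \frac{5 \times 35000}{86} \approx 2035$ Hz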
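The same arithmetic as a short sketch, using the slide's values for tube length and the speed of sound. The function name is made up for illustration:

```python
def quarter_wave_resonances(length_cm, count=3, speed_cm_per_sec=35000.0):
    """Resonances f_n = (2n - 1) * c / (4L) of a tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * speed_cm_per_sec / (4 * length_cm)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    glottis_to_uvula_cm = 9.0
    uvula_to_nares_cm = 12.5
    tube_cm = glottis_to_uvula_cm + uvula_to_nares_cm  # 21.5 cm
    for n, f in enumerate(quarter_wave_resonances(tube_cm), start=1):
        print(f"F{n} = {f:.0f} Hz")
```

Because the resonances are odd multiples of the lowest one, a longer tube lowers every formant and narrows the spacing between them.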

Page 31: Sonorant Grab Bag

The Real Thing

• Check out Peter’s production of a uvular nasal in Praat.

• And also Dustin’s neutral vowel!

• Note: the higher formants are low in amplitude

• Some reasons why:

• Overall damping

• “Nostril-rounding” reduces intensity

• Resonance is lost in the side passages of the sinuses.

• Nasal stops with fronter places of articulation also have anti-formants.

Page 32: Sonorant Grab Bag

Anti-Formants

• For nasal stops, the occlusion in the mouth creates a side cavity.

• This side cavity resonates at particular frequencies.

• These resonances absorb acoustic energy in the system.

• They form anti-formants

Page 33: Sonorant Grab Bag

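Anti-Formant Math

• Anti-formant resonances are based on the length of the side cavity tube (the closed-off mouth).

• For [m], this length is about 8 cm.

• $f_n = \frac{(2n - 1)\,c}{4L}$

$L = 8$ cm

$\mathrm{AF}_1 = \frac{35000}{4 \times 8} \approx 1094$ Hz

$\mathrm{AF}_2 = \frac{3 \times 35000}{4 \times 8} \approx 3281$ Hz

etc.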

Page 34: Sonorant Grab Bag

Spectral Signatures

• In a spectrogram, acoustic energy lowers--or drops out completely--at the anti-formant frequencies.

anti-formants

Page 35: Sonorant Grab Bag

Nasal Place Cues

• At more posterior places of articulation, the “anti-resonating” tube is shorter.

• anti-formant frequencies will be higher.

• for [n], L = 5.5 cm

• AF1 = 1600 Hz

• AF2 = 4800 Hz

• for , L = 3.3 cm

• AF1 = 2650 Hz

• for , L = 2.3 cm

• AF1 = 3700 Hz
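These values can be reproduced with the same quarter-wavelength formula, applied to the side cavity lengths listed above. A minimal sketch, assuming c = 35000 cm/sec as elsewhere in these slides; the function name is made up for illustration:

```python
def anti_formants(side_cavity_cm, count=2, speed_cm_per_sec=35000.0):
    """Anti-formant frequencies (2n - 1) * c / (4L) for a side cavity of length L."""
    return [
        (2 * n - 1) * speed_cm_per_sec / (4 * side_cavity_cm)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Side cavity lengths taken from the slides, longest (most anterior) first.
    for length_cm in (8.0, 5.5, 3.3, 2.3):
        af1, af2 = anti_formants(length_cm)
        print(f"L = {length_cm} cm: AF1 = {af1:.0f} Hz, AF2 = {af2:.0f} Hz")
```

The formula gives about 1590 Hz for 5.5 cm, 2650 Hz for 3.3 cm, and 3800 Hz for 2.3 cm, so the figures on the slide appear to be rounded, and the 3700 Hz value sits a little below what 2.3 cm yields.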

Page 36: Sonorant Grab Bag

[m] vs. [n]

• Production of [meno], by a speaker of Tsonga

• Tsonga is spoken in South Africa and Mozambique

(Figure: spectrogram of [meno], segmented as [m] [e] [n] [o], with AF1 marked for [m] and for [n])

Page 37: Sonorant Grab Bag

Nasal Stop Acoustics: Summary

• Here’s the general pattern of what to look for in a spectrogram for nasals:

1. Periodic voicing.

2. Overall amplitude lower than in vowels.

3. Formants (resonance).

4. Formants have broad bandwidths.

5. Low frequency first formant.

6. Less space between formants.

7. Higher formants have low amplitude.

Page 38: Sonorant Grab Bag

Perceiving Nasal Place

• Nasal “murmurs” do not provide particularly strong cues to place of articulation.

• Can you identify the following as [m], [n] or ?

• Repp (1986) found that listeners can only distinguish between [n] and [m] 72% of the time.

• Transitions provide important place cues for nasals.

• Repp (1986): 95% of nasals identified correctly when presented with the first 10 msec of the following vowel.

• Can you identify these nasal + transition combos?