CENTER FOR SPOKEN LANGUAGE UNDERSTANDING 1
PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL
BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu
Center for Spoken Language Understanding
OGI School of Science & Technology at OHSU
OVERVIEW
1. IMPORTANCE OF SPECTRAL BALANCE
2. MEASUREMENT OF SPECTRAL BALANCE
3. ANALYSIS METHODS
4. RESULTS
5. SYNTHESIS
6. CONCLUSIONS
1. IMPORTANCE OF SPECTRAL BALANCE
• Linguistic Control Factors
  – Stress-like factors
  – Positional factors
  – Phonemic factors
• Acoustic Correlates
  – Traditionally TTS-controlled:
    • Pitch, timing, amplitude
  – Demonstrated in natural speech, but usually not TTS-controlled:
    • Spectral tilt, balance
    • Formant dynamics
    • …
2. MEASUREMENT OF SPECTRAL BALANCE
• Data:
  – 472 greedily selected sentences
    • Genre: newspaper
    • Greedy features: linguistic control factors
  – One female speaker
  – Manual segmentation
  – Accent: independent rating by 3 judges
    • 0-3 score
2. MEASUREMENT OF SPECTRAL BALANCE
• Energy in 5 formant-range frequency bands
  – B0: 100-300 Hz [~F0]
  – B1: 300-800 Hz [~F1]
  – B2: 800-2500 Hz [~F2]
  – B3: 2500-3500 Hz [~F3]
  – B4: 3500-max Hz [~fricative noise]
• In other words, a multidimensional measure
• Filter bank → Square → Average [1 ms rect.] → $20 \log_{10}(B_i)$
• Subtract estimated per-utterance means
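The measurement chain above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: a naive DFT with brick-wall band masks stands in for the filter bank, a whole frame stands in for the 1 ms rectangular averaging window, the top band is taken to run up to the Nyquist frequency, and the per-utterance mean is subtracted separately for each band. The names `band_levels_db` and `subtract_utterance_mean` are hypothetical.

```python
import cmath
import math

# Band edges in Hz as listed on the slide; None means "up to the Nyquist frequency" (assumption).
BAND_EDGES_HZ = [(100, 300), (300, 800), (800, 2500), (2500, 3500), (3500, None)]

def band_levels_db(frame, sample_rate):
    """Return one level in dB per band for a single frame of samples."""
    n = len(frame)
    energies = [0.0] * len(BAND_EDGES_HZ)
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        # Naive DFT coefficient for bin k (slow, but dependency-free).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for b, (lo, hi) in enumerate(BAND_EDGES_HZ):
            if freq >= lo and (hi is None or freq < hi):
                energies[b] += abs(coeff) ** 2
    # 10*log10 of an energy equals 20*log10 of the corresponding amplitude.
    return [10 * math.log10(max(e, 1e-12)) for e in energies]

def subtract_utterance_mean(levels_per_frame):
    """Subtract, band by band, the mean level over all frames of one utterance."""
    n_frames = len(levels_per_frame)
    n_bands = len(levels_per_frame[0])
    means = [sum(f[b] for f in levels_per_frame) / n_frames for b in range(n_bands)]
    return [[f[b] - means[b] for b in range(n_bands)] for f in levels_per_frame]
```

For a pure 500 Hz tone, the largest level should fall in B1 (300-800 Hz); a real system would replace the DFT loop with proper band-pass filters.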
2. MEASUREMENT OF SPECTRAL BALANCE
• Details:
  – Confounding with F0
    • Measure both pitch-corrected and raw
      – For certain wave shapes, pitch is directly related to fixed-frame energy
      – Why do both: wave shapes may change in unknown ways
    • F0 is not confined to B0 [female speech]
  – Vowel formants are not quite confined to bands [e.g., F1 for /EE/ and F3 for /ER/]
2. MEASUREMENT OF SPECTRAL BALANCE
• Why not more or different bands?
  – Multiple interacting Linguistic Control Factors
    • Need measurements that minimize interactions
  – 5 bands → different vowels “behave similarly”
    • Can model vowels as a class
• Why not simply spectral tilt?
  – 5 bands carry more information than a single measure
  – Supply more information for synthesis
3. ANALYSIS METHODS
• Measures likely to behave like segmental duration:
  – Multiple interacting, confounded factors:
    • Interaction: magnitude of the effect of one factor may depend on other factors
    • Confounding: unequal frequencies of control factor combinations
  – “Directional Invariance”
    • Direction of the effect of one factor is independent of other factors
3. ANALYSIS METHODS
• Need a method that
  – can handle multiple interacting, confounded factors and
  – takes advantage of Directional Invariance
• Used: Sums of Products Model:
  $$B_i(c_0, \ldots, c_n) = \sum_{i \in K} \prod_{j \in I_i} S_{i,j}(c_j)$$
3. ANALYSIS METHODS
• Special cases:
  – Multiplicative model: K = {1}, I_1 = {0,…,n}
    $$B_i(c_0, \ldots, c_n) = S_{1,0}(c_0) \times \cdots \times S_{1,n}(c_n)$$
  – Additive model: K = {0,…,n}, I_i = {i}
    $$B_i(c_0, \ldots, c_n) = S_{0,0}(c_0) + \cdots + S_{n,n}(c_n)$$
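A sums-of-products model is fully specified by the index set K, the factor sets I_i, and the parameter tables S_{i,j}, so the multiplicative and additive models are the same evaluation with different index sets. The sketch below shows this for two factors. The function name `sums_of_products`, the factor levels, and all parameter values are made up for illustration and are not results from the talk. Note that the subscript on B names the band, while i inside the sum ranges over K.

```python
from math import prod

def sums_of_products(K, I, S, c):
    """Evaluate the sum over i in K of the product over j in I[i] of S[(i, j)][c[j]].

    K: iterable of term indices
    I: dict mapping each term index i to the factor indices in that product
    S: dict mapping (i, j) to a table {factor level: parameter value}
    c: tuple of factor levels (c_0, ..., c_n)
    """
    return sum(prod(S[(i, j)][c[j]] for j in I[i]) for i in K)

# Two illustrative factors: c_0 = stress, c_1 = position (hypothetical values).
c = ("stressed", "right")

# Additive model: K = {0, 1}, I_i = {i}
additive = sums_of_products(
    K=[0, 1],
    I={0: [0], 1: [1]},
    S={(0, 0): {"unstressed": 0.0, "stressed": 3.0},
       (1, 1): {"left": -1.0, "right": 1.0}},
    c=c,
)

# Multiplicative model: K = {1}, I_1 = {0, 1}
multiplicative = sums_of_products(
    K=[1],
    I={1: [0, 1]},
    S={(1, 0): {"unstressed": 1.0, "stressed": 2.0},
       (1, 1): {"left": 0.5, "right": 1.5}},
    c=c,
)
```

With these made-up tables the additive model gives 3.0 + 1.0 and the multiplicative model gives 2.0 × 1.5.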
3. ANALYSIS METHODS
• Used additive model
• Note: Parameter estimates are:
  – Estimates of marginal means …
  – … in balanced design:
    $$S_{i,i}(c_i) = \operatorname{Mean}_{c_j \in C_j,\; j \neq i} \, B(c_0, \ldots, c_i, \ldots, c_n)$$
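In a balanced design every combination of factor levels occurs equally often, so averaging over all other factors isolates one factor's contribution. The sketch below builds a tiny balanced table from a hypothetical additive model and takes marginal means. The function name `marginal_means` and every number are illustrative assumptions. Because the second factor's parameters are chosen to average to zero, the marginal means of the first factor reproduce its parameters exactly.

```python
from collections import defaultdict
from itertools import product

def marginal_means(observations, factor):
    """Average the observed values over all cells that share a level of `factor`.

    observations: dict mapping a tuple of factor levels to one band value per cell.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for levels, value in observations.items():
        totals[levels[factor]] += value
        counts[levels[factor]] += 1
    return {level: totals[level] / counts[level] for level in totals}

# Hypothetical additive parameters (not values from the talk).
S0 = {"unstressed": -1.5, "stressed": 1.5}   # factor 0: stress
S1 = {"left": -0.5, "right": 0.5}            # factor 1: position, mean zero

# Balanced design: exactly one observation for every combination of levels.
balanced = {(a, b): S0[a] + S1[b] for a, b in product(S0, S1)}

stress_means = marginal_means(balanced, factor=0)
```

With real, unbalanced speech data the raw marginal means would be biased by confounding, which is why the parameters are estimated from the model instead.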
3. ANALYSIS METHODS
• Pitch correction:
  $$B_i^{[c]} = 20 \log_{10}(B_i) - 20 \log_{10}(f_0 \, t_w)$$
• Confounding with F0: Show both
<B0, B1, B2, B3, B4> and:
<B0 + B1, B2, B3, B4>
4. RESULTS: (A) POSITIONAL EFFECTS
5 Bands, not pitch-corrected
Solid: right position, dashed: left position. Y-axis: corrected mean
4. RESULTS: (A) POSITIONAL EFFECTS
5 Bands, pitch-corrected
4. RESULTS: (A) POSITIONAL EFFECTS
4 Bands, not pitch-corrected
4. RESULTS: (A) POSITIONAL EFFECTS
4 Bands, pitch-corrected
4. RESULTS: (B) STRESS/ACCENT EFFECTS
5 Bands, not pitch-corrected
Solid: stressed syllable, dashed: unstressed. Y-axis: corrected mean
4. RESULTS: (B) STRESS/ACCENT EFFECTS
5 Bands, pitch-corrected
4. RESULTS: (B) STRESS/ACCENT EFFECTS
4 Bands, not pitch-corrected
4. RESULTS: (B) STRESS/ACCENT EFFECTS
4 Bands, pitch-corrected
4. RESULTS: (C) TILT EFFECTS
$$\mathrm{Tilt} = (-2, -1, 0, 1, 2) \cdot (B_0, B_1, B_2, B_3, B_4)^{\mathsf{T}}$$
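The tilt measure is a single linear contrast over the five band levels. A minimal sketch follows, assuming the weights are (-2, -1, 0, 1, 2); the minus signs were lost in the extracted slide text, so the sign convention is an assumption. The name `tilt` and the example band values are hypothetical.

```python
# Contrast weights for B0..B4; the signs are an assumption (see lead-in).
TILT_WEIGHTS = (-2, -1, 0, 1, 2)

def tilt(bands):
    """Dot product of the contrast weights with the five band levels (in dB)."""
    if len(bands) != len(TILT_WEIGHTS):
        raise ValueError("expected five band levels B0..B4")
    return sum(w * b for w, b in zip(TILT_WEIGHTS, bands))

flat = tilt([1.0, 1.0, 1.0, 1.0, 1.0])        # same level in every band
falling = tilt([4.0, 2.0, 0.0, -2.0, -4.0])   # energy concentrated in low bands
```

Because the weights sum to zero, adding the same offset to every band leaves the tilt unchanged; under the assumed signs, a spectrum that falls toward high frequencies gives a negative value.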
5. SYNTHESIS
• Use ABS/OLA sinusoidal model:
  – s[n] = sum of overlapped short-time signal frames s_k[n]
  – s_k[n] = sum of quasi-harmonic sinusoidal components:
    $$s_k[n] = \sum_l A_{k,l} \cos(\omega_{k,l}\, n + \phi_{k,l})$$
• Each frame of a unit is represented by a set of quasi-harmonic sinusoidal parameters;
• Given the desired F0 contour, a pitch shift is applied to the sinusoidal parameters of the unit to obtain the target parameters $A_{k,l}$.
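The frame model and the overlap-add step can be sketched as follows. This is a simplified illustration, not the ABS/OLA algorithm itself: it omits the analysis-by-synthesis parameter estimation and the pitch-shifting step, and the window is passed in by the caller. The names `synthesize_frame` and `overlap_add` are hypothetical; frequencies are in radians per sample.

```python
import math

def synthesize_frame(amps, omegas, phases, length):
    """One short-time frame: sum over components of A * cos(omega * n + phi)."""
    return [
        sum(a * math.cos(w * n + p) for a, w, p in zip(amps, omegas, phases))
        for n in range(length)
    ]

def overlap_add(frames, hop, window):
    """Window each frame and add it into the output, `hop` samples after the previous one."""
    out = [0.0] * (hop * (len(frames) - 1) + len(window))
    for k, frame in enumerate(frames):
        for n, (x, w) in enumerate(zip(frame, window)):
            out[k * hop + n] += w * x
    return out

# Two identical frames holding a single constant component (omega = 0).
frame = synthesize_frame(amps=[1.0], omegas=[0.0], phases=[0.0], length=4)
signal = overlap_add([frame, frame], hop=2, window=[0.5, 1.0, 1.0, 0.5])
```

Keeping amplitudes, frequencies, and phases as explicit per-frame lists is what makes later per-harmonic modifications, such as band reweighting, straightforward.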
5. SYNTHESIS
• Considering the differences in prosodic factors between the original and target unit, compute the band differences:
  $$\Delta_i = \hat{B}_i - B_i$$
• Transform the band differences into weights applied to the sinusoidal parameters:
  $$w_i = 10^{\Delta_i / 20}$$
• $\hat{A}_{k,j} = w_i A_{k,j}$ when the j'th harmonic is located in the i'th band;
• Spectral smoothing across unit boundaries.
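The band-difference weighting for one frame can be sketched as below, assuming the difference is taken as target level minus original level in dB and that harmonics below 100 Hz, which fall in no band, are left unchanged. Both are assumptions, as are the names `band_index` and `reweight_harmonics`. Spectral smoothing across unit boundaries is not shown.

```python
# Band edges in Hz from the measurement slide; None means no upper limit.
BAND_EDGES_HZ = [(100, 300), (300, 800), (800, 2500), (2500, 3500), (3500, None)]

def band_index(freq_hz):
    """Return the index of the band containing freq_hz, or None if below the first band."""
    for i, (lo, hi) in enumerate(BAND_EDGES_HZ):
        if freq_hz >= lo and (hi is None or freq_hz < hi):
            return i
    return None

def reweight_harmonics(amps, freqs_hz, original_db, target_db):
    """Scale each harmonic amplitude by w_i = 10 ** (delta_i / 20) for its band i."""
    weights = [10 ** ((t - o) / 20) for o, t in zip(original_db, target_db)]
    out = []
    for a, f in zip(amps, freqs_hz):
        i = band_index(f)
        out.append(a if i is None else a * weights[i])  # out-of-band: unchanged (assumption)
    return out

# Raise B0 by 20 dB and lower B2 by 20 dB (illustrative values only).
new_amps = reweight_harmonics(
    amps=[1.0, 1.0, 1.0],
    freqs_hz=[50.0, 200.0, 1000.0],
    original_db=[0.0, 0.0, 0.0, 0.0, 0.0],
    target_db=[20.0, 0.0, -20.0, 0.0, 0.0],
)
```

A 20 dB band difference corresponds to a factor of 10 in amplitude, so the 200 Hz harmonic is scaled by 10 and the 1000 Hz harmonic by 0.1.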
5. SYNTHESIS
5 Bands modification example [i:]
6. CONCLUSIONS
• Described simple methods for predicting and synthesizing spectral balance
• But: Spectral balance is only one “non-standard acoustic correlate”
• Others that remain to be addressed:
  – Spectral dynamics
  – Phase