IIT Bombay, pcpandey@ee.iitb.ac.in
ICA 2004, Kyoto, Japan, April 4-9, 2004
ICA 2004, Kyoto, April 4-9, 2004 / Session: SPP02, Paper No 00574 (Th.P3.17)
Harmonic Plus Noise Model Based Speech Synthesis in Hindi and Pitch Modification
By P.K. Lehana and P.C. Pandey
IIT Bombay, India
http://www.ee.iitb.ac.in
ABSTRACT
In the harmonic plus noise model (HNM), each segment of speech is modeled as two bands with a dynamically varying boundary: a lower "harmonic" band, represented by the amplitudes and phases of the harmonics of a fundamental, and an upper "noise" band, modeled by an all-pole filter excited by white noise. HNM based synthesis gives good quality output with a relatively small number of parameters, and it permits pitch and time scaling without explicit estimation of vocal tract parameters. We have investigated its use for synthesis in Hindi, which has aspirated stops and lacks voiced fricatives. Good quality synthesis could be carried out, including that of aspirated stops. The upper band of HNM was needed only for the palatal and alveolar fricatives. The sensitivity of output quality to errors in glottal closure instants was studied: random perturbations exceeding 4% of the local pitch period resulted in noticeable degradation. Synthesis with pitch scaling showed that the frequency scale of the amplitudes and phases of the harmonics of the original signal needed to be modified by a speaker dependent warping function, obtained by studying the relationship between pitch frequency and formant frequencies for the three cardinal vowels spoken at different pitches.
OVERVIEW
Introduction
Harmonic plus noise model (HNM)
Methodology
Results
Conclusions
INTRODUCTION
Research Objective
Use of harmonic plus noise model (HNM) based pitch synchronous synthesis to study:
• Speech synthesis with phoneme sets in Hindi (“aspiration” is a feature for stops)
• Effect of perturbations in glottal closure instants (GCIs) on speech quality, using the electroglottogram (EGG) for accurate specification of GCIs
• Speaker modification
HARMONIC PLUS NOISE MODEL (1/3)
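Harmonic plus Noise Model (HNM) of Speech (Stylianou, 1995; 2001)
Speech signal divided into:
• Harmonic part
$$ s'(t) = \mathrm{Re} \sum_{l=0}^{L(t)} a_l(t) \exp\left\{ j \left[ \int_0^t l\,\omega_0(\sigma)\,d\sigma + \theta_l \right] \right\} $$
• Noise part
$$ n'(t) = w(t)\,\left[ h(\tau; t) * b(t) \right] $$
Here a_l(t) and θ_l are the amplitude and phase of the l-th harmonic, ω_0 is the fundamental frequency, L(t) is the number of harmonics, w(t) is a time envelope, h(τ; t) is the all-pole filter response, and b(t) is white noise.
Parameters:
• Max. voiced frequency
• V/UV & pitch
• Harm. ampl. & phases
• Noise parameters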
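The harmonic and noise parts described on this slide can be illustrated for a single short frame. The following is only a minimal sketch under simplifying assumptions that are not stated in the slides: the fundamental is held constant over the frame (so the phase integral reduces to l·ω0·t), the all-pole filter is fixed within the frame, and the noise is Gaussian. The function names `harmonic_part` and `noise_part` and the coefficient convention are hypothetical, chosen for illustration.

```python
import math
import random


def harmonic_part(amps, phases, f0, fs, n_samples):
    """Harmonic part of one frame.

    amps[l] and phases[l] belong to harmonic l (l = 0 is the DC term).
    Assumes f0 is constant over the frame, so the phase of harmonic l
    at time t is 2*pi*l*f0*t + phases[l].
    """
    out = []
    for n in range(n_samples):
        t = n / fs
        # Re{a * exp(j*phi)} equals a * cos(phi) for a real amplitude a.
        out.append(sum(a * math.cos(2.0 * math.pi * l * f0 * t + th)
                       for l, (a, th) in enumerate(zip(amps, phases))))
    return out


def noise_part(ar_coeffs, envelope, seed=0):
    """Noise part of one frame: envelope * (all-pole filter applied to noise).

    Assumed convention: A(z) = 1 + sum_k ar_coeffs[k-1] * z^-k, so the
    filter output is y[n] = b[n] - sum_k ar_coeffs[k-1] * y[n-k].
    """
    rng = random.Random(seed)
    y = []
    for n in range(len(envelope)):
        acc = rng.gauss(0.0, 1.0)  # white Gaussian excitation b[n]
        for k, a_k in enumerate(ar_coeffs, start=1):
            if n - k >= 0:
                acc -= a_k * y[n - k]
        y.append(acc)
    # Time-domain envelope w[n] shapes the filtered noise.
    return [w * v for w, v in zip(envelope, y)]


# One 10 ms frame at 8 kHz: two harmonics of 100 Hz plus shaped noise.
fs = 8000.0
h = harmonic_part([0.0, 1.0, 0.5], [0.0, 0.0, 0.0], 100.0, fs, 80)
n = noise_part([-0.9], [0.05] * 80)
frame = [a + b for a, b in zip(h, n)]
```

A full synthesizer would update the parameters frame by frame and overlap-add the frames pitch synchronously; this sketch only shows how the two parts combine within one frame.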
HARMONIC PLUS NOISE MODEL (2/3)
Analysis
HARMONIC PLUS NOISE MODEL (3/3)
Synthesis
METHODOLOGY (1/3)
Synthesis with Hindi Phoneme Sets
Material: Recordings of speech and electroglottogram (EGG) for
• Isolated vowels
• Syllables
• Words
• Sentences
Processing: Analysis/synthesis of recorded material using HNM
METHODOLOGY (2/3)
Effect of Pitch Perturbation on Speech Quality
Material: 2-channel recording of vowels for male and female speakers
• Speech signal
• EGG from impedance glottograph
Processing:
– Estimation of pitch periods from
  • speech signal
  • EGG
– Analysis of vowels for HNM parameters
– Resynthesis, with 0 - 20 % perturbation in GCIs
– Assessment of quality of resynthesized vowels
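The resynthesis step with perturbed GCIs can be sketched as follows. The slides do not state how the perturbation was generated, so this sketch assumes a uniform random shift bounded by a given percentage of the local pitch period, with the local period taken as the distance to the next GCI (or to the previous one for the last GCI). The function name `perturb_gcis` is hypothetical.

```python
import random


def perturb_gcis(gcis, max_percent, seed=0):
    """Shift each GCI by a random amount within +/- max_percent of the
    local pitch period (assumed uniform distribution).

    gcis: increasing list of glottal closure instants in seconds.
    """
    rng = random.Random(seed)
    out = []
    for i, t in enumerate(gcis):
        # Local pitch period: gap to the next GCI, or to the previous
        # one when this is the last GCI in the list.
        if i + 1 < len(gcis):
            period = gcis[i + 1] - t
        else:
            period = t - gcis[i - 1]
        limit = period * max_percent / 100.0
        out.append(t + rng.uniform(-limit, limit))
    return out


# GCIs of a steady 100 Hz vowel, perturbed by up to 4 % of the period.
gcis = [0.00, 0.01, 0.02, 0.03, 0.04]
perturbed = perturb_gcis(gcis, 4.0)
```

Sweeping `max_percent` from 0 to 20 and resynthesizing at each setting would mirror the 0 - 20 % range listed above.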
METHODOLOGY (3/3)
Spectral Modifications
Material: Sustained vowels at different notes by male and female speakers
Processing:
• Study of F0 & formants in cardinal vowels
• Formant synthesis after interchanging the notes
• Scaling of HNM parameters by pitch-scaling factor
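One way to picture the scaling of HNM parameters is as a resampling of the harmonic amplitude envelope at the new harmonic frequencies. The sketch below is an illustration under stated assumptions, not the authors' procedure: amplitudes are linearly interpolated between the original harmonics, phases are ignored for brevity, and the `warp` argument is a hypothetical placeholder for a frequency mapping, since the slides do not give the form of the speaker dependent function.

```python
def pitch_scale_harmonics(amps, f0, alpha, warp=None):
    """Return (new_f0, new_amps) for a pitch-scaling factor alpha.

    amps[l] is the amplitude of harmonic l at frequency l*f0 (l = 0 is DC);
    at least two entries are required.
    warp maps a target frequency to the source frequency whose envelope
    value is used; the identity (default) keeps the envelope fixed.
    """
    if warp is None:
        warp = lambda f: f
    f_max = (len(amps) - 1) * f0  # highest frequency covered by the envelope
    new_f0 = alpha * f0
    new_amps = []
    l = 0
    while l * new_f0 <= f_max + 1e-9:
        # Clamp the source frequency to the range covered by the envelope.
        src = min(max(warp(l * new_f0), 0.0), f_max)
        pos = src / f0  # fractional harmonic index
        lo = min(int(pos), len(amps) - 2)
        frac = pos - lo
        # Linear interpolation between neighbouring original harmonics.
        new_amps.append((1.0 - frac) * amps[lo] + frac * amps[lo + 1])
        l += 1
    return new_f0, new_amps


amps = [0.0, 1.0, 0.5, 0.25, 0.125]  # harmonics of a 100 Hz frame
# Envelope kept fixed while the pitch is raised by a factor of 1.5.
f0_a, amps_a = pitch_scale_harmonics(amps, 100.0, 1.5)
# Linear frequency scaling: harmonic l keeps its original amplitude.
f0_b, amps_b = pitch_scale_harmonics(amps, 100.0, 2.0, warp=lambda f: f / 2.0)
```

The two calls show the extremes between which a warping function could lie: an unchanged envelope, and an envelope stretched by the same factor as the pitch.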
RESULTS (1/7)
ANALYSIS/SYNTHESIS RESULTS
Synthesis with Hindi Phoneme Sets
• All vowels and VCV syllables were natural and intelligible when synthesized using the harmonic part only, except /aʃa/ and /asa/, which also require the noise part.
• GCIs obtained from glottal signal (EGG) give better synthesis.
RESULTS (2/7)
Example 1: Synthesis of /ata/
[Panels: Recorded; Synth. (H); Synth. (H+N)]
RESULTS (3/7)
Example 2: Synthesis of /aʃa/
[Panels: Recorded; Synth. (H); Synth. (H+N)]
RESULTS (4/7)
Effect of Pitch Perturbation, Example: vowel /a/
[Panels: Recorded; Syn. and Syn. with GCI perturbation < 4 %, each for GCIs from speech and GCIs from EGG]
RESULTS (5/7)
Effect of Pitch Perturbation on Vowel Quality
Quality                   GCI perturbation
                          GCIs from speech    GCIs from EGG
Acceptable                < 4 %               < 6 %
Noticeable degradation    4 - 8 %             6 - 10 %
Unacceptable              > 8 %               > 10 %
RESULTS (6/7)
F0 and Formant Relations
• F1 monotonically increases with F0.
• Interchanging the notes results in unnatural output → proper relation between F0 and formants necessary.
• Speaker dependent relationship between F0 and formants.
RESULTS (7/7)
Scaling of HNM parameters by speaker dependent scaling factor gives more natural output.
Quality         Scaling factor
Natural         < 1.5
Degradation     1.5 - 2.0
Unacceptable    > 2.0
Examples: recorded /a/ (male, 117 Hz); /a/ pitch-scaled by 2.1
CONCLUSIONS
Conclusions
• HNM based synthesis provided good quality synthesis in Hindi.
• GCI perturbations > 4 % → quality degradation.
• GCIs from EGG → better output, indicating HNM’s sensitivity to pitch estimation errors.
• Modest pitch modification possible with linear frequency scaling of HNM parameters.