IIT Bombay, pcpandey@ee.iitb.ac.in | ICA 2004, Kyoto, Japan, April 4-9, 2004


ICA 2004, Kyoto, April 4-9, 2004 / Session: SPP02, Paper No 00574 (Th.P3.17)

Harmonic Plus Noise Model Based Speech Synthesis in Hindi and Pitch Modification

By

P.K. Lehana, P.C. Pandey

IIT Bombay, India, http://www.ee.iitb.ac.in


ABSTRACT

In the harmonic plus noise model (HNM), each segment of speech is modeled as two bands with a dynamically varying boundary: a lower "harmonic" part, represented by the amplitudes and phases of the harmonics of a fundamental, and an upper "noise" part, represented by an all-pole filter excited by random white noise. HNM based synthesis gives good quality output with a relatively small number of parameters, and it permits pitch and time scaling without explicit estimation of vocal tract parameters. We have investigated its use for synthesis in Hindi, which has aspirated stops and lacks voiced fricatives. Good quality synthesis could be carried out, including that of aspirated stops. The upper band of HNM was needed only for the palatal and alveolar fricatives. The sensitivity of output quality to errors in glottal closure instants was studied: random perturbations exceeding 4% of the local pitch period resulted in noticeable degradation. Synthesis with pitch scaling showed that the frequency scale of the amplitudes and phases of the harmonics of the original signal needed to be modified by a speaker dependent warping function, obtained by studying the relationship between pitch frequency and formant frequencies for the three cardinal vowels spoken at different pitches.


OVERVIEW

Introduction

Harmonic plus noise model (HNM)

Methodology

Results

Conclusions


INTRODUCTION

Research Objective

Use of harmonic plus noise model (HNM) based pitch synchronous synthesis to study:

• Speech synthesis with phoneme sets in Hindi (“aspiration” is a feature for stops)

• Effect of perturbations in glottal closure instants (GCIs) on speech quality, using the electroglottogram (EGG) for accurate specification of GCIs

• Speaker modification


HARMONIC PLUS NOISE MODEL (1/3)

Harmonic plus Noise Model (HNM) of Speech (Stylianou, 1995; 2001)

Speech signal divided into:
• Harmonic part
• Noise part

Parameters:
• Max. voiced frequency
• V/UV & pitch
• Harm. ampl. & phases
• Noise parameters

Harmonic part:

$$ s(t) = \operatorname{Re} \sum_{l=0}^{L(t)} a_l(t) \exp\left\{ j \left[ \int_0^t l\,\omega_0(\sigma)\, d\sigma + \theta_l \right] \right\} $$

Noise part:

$$ n(t) = w(t)\,[\, h(\tau; t) * b(t) \,] $$
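The two parts of the model can be illustrated for a single short frame. The following is a minimal sketch, not the authors' implementation: it assumes a constant fundamental within the frame (so the phase integral reduces to l·ω0·t), a fixed all-pole filter, and a white Gaussian excitation. The function names, the 1/l amplitude roll-off, the filter coefficient, and the 0.05 noise gain are all hypothetical, and the sketch does not confine the noise to the band above the maximum voiced frequency.

```python
import math
import random


def num_harmonics(f0, max_voiced_freq):
    """Highest harmonic index L whose frequency L*f0 stays in the harmonic band."""
    return int(max_voiced_freq // f0)


def harmonic_part(amps, phases, f0, fs, n_samples):
    """Re{ sum_l a_l exp(j(l*w0*t + theta_l)) } for l = 0..L with constant f0.

    amps[l] and phases[l] belong to harmonic l; index 0 is the DC term.
    """
    w0 = 2.0 * math.pi * f0
    return [
        sum(a * math.cos(l * w0 * (n / fs) + th)
            for l, (a, th) in enumerate(zip(amps, phases)))
        for n in range(n_samples)
    ]


def noise_part(ar_coeffs, envelope, seed=0):
    """Time envelope w times all-pole-filtered white Gaussian noise.

    Filter recursion: y[n] = b[n] - sum_k ar_coeffs[k-1] * y[n-k].
    """
    rng = random.Random(seed)
    y = []
    for n in range(len(envelope)):
        acc = rng.gauss(0.0, 1.0)  # white noise sample b[n]
        for k, a_k in enumerate(ar_coeffs, start=1):
            if n >= k:
                acc -= a_k * y[n - k]
        y.append(acc)
    return [w * v for w, v in zip(envelope, y)]


# Hypothetical frame: 10 ms at 8 kHz, f0 = 100 Hz, harmonics up to 400 Hz.
fs, f0, n = 8000.0, 100.0, 80
L = num_harmonics(f0, 400.0)
amps = [0.0] + [1.0 / l for l in range(1, L + 1)]  # assumed 1/l roll-off
phases = [0.0] * (L + 1)
frame = [h + 0.05 * v for h, v in zip(
    harmonic_part(amps, phases, f0, fs, n),
    noise_part([-0.5], [1.0] * n))]
```

Setting the noise gain to zero gives the harmonic-only (H) case, while a nonzero gain gives the H+N case.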


HARMONIC PLUS NOISE MODEL (2/3)

Analysis


HARMONIC PLUS NOISE MODEL (3/3)

Synthesis


METHODOLOGY (1/3)

Synthesis with Hindi Phoneme Sets

Material: Recordings of speech and electroglottogram (EGG) for
• Isolated vowels
• Syllables
• Words
• Sentences

Processing: Analysis/synthesis of recorded material using HNM


METHODOLOGY (2/3)

Effect of Pitch Perturbation on Speech Quality

Material: 2-channel recording of vowels for male and female speakers
• Speech signal
• EGG from impedance glottograph

Processing:
– Estimation of pitch periods from
  • speech signal
  • EGG
– Analysis of vowels for HNM parameters
– Resynthesis, with 0 - 20 % perturbation in GCIs
– Assessment of quality of resynthesized vowels
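The perturbation step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the slides give only the 0 - 20 % range, so the uniform distribution, the choice of the gap to the previous GCI as the "local pitch period", and the name `perturb_gcis` are all assumptions.

```python
import random


def perturb_gcis(gcis, max_fraction, seed=0):
    """Randomly shift glottal closure instants (in seconds).

    Each GCI after the first is moved by a uniform random offset of at most
    max_fraction times the local pitch period, taken here as the gap to the
    previous unperturbed GCI. The first GCI is left in place.
    """
    rng = random.Random(seed)
    out = [gcis[0]]
    for prev, cur in zip(gcis, gcis[1:]):
        period = cur - prev
        out.append(cur + rng.uniform(-max_fraction, max_fraction) * period)
    return out


# Hypothetical 100 Hz vowel: GCIs every 10 ms, perturbed by up to 4 %.
clean = [i * 0.01 for i in range(10)]
jittered = perturb_gcis(clean, 0.04)
```

Sweeping `max_fraction` from 0.0 to 0.20 would cover the range listed above. The perturbed instants would then replace the original ones in pitch-synchronous resynthesis.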


METHODOLOGY (3/3)

Spectral Modifications

Material: Sustained vowels at different notes by male and female speakers

Processing:
• Study of F0 & formants in cardinal vowels
• Formant synthesis after interchanging the notes
• Scaling of HNM parameters by pitch-scaling factor
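One way to picture the scaling of harmonic amplitudes is to read new amplitudes off the original spectral envelope at the harmonics of the scaled pitch, optionally after warping the frequency axis. The sketch below is an assumption-laden illustration, not the method of the paper: the piecewise-linear envelope, the `warp` callback, and both function names are hypothetical, and phases are not handled.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the end values."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + frac * (ys[i] - ys[i - 1])


def pitch_scale_amplitudes(amps, f0, factor, max_voiced_freq, warp=lambda f: f):
    """Amplitudes for harmonics of factor*f0, read off the original envelope.

    amps[l-1] is the amplitude of harmonic l of f0 (l = 1, 2, ...).
    warp maps a target frequency to the frequency at which the original
    envelope is sampled: the identity keeps the envelope fixed, while
    lambda f: f / factor moves the envelope linearly with the pitch.
    """
    freqs = [f0 * l for l in range(1, len(amps) + 1)]
    new_f0 = factor * f0
    n_new = int(max_voiced_freq // new_f0)
    return [interp(warp(new_f0 * l), freqs, amps) for l in range(1, n_new + 1)]


# Hypothetical envelope: four harmonics of 100 Hz, pitch doubled.
amps = [1.0, 0.5, 0.25, 0.125]
fixed_envelope = pitch_scale_amplitudes(amps, 100.0, 2.0, 400.0)
linear_scaling = pitch_scale_amplitudes(amps, 100.0, 2.0, 400.0,
                                        warp=lambda f: f / 2.0)
```

A speaker dependent warping function, as described in the abstract, would be supplied through `warp` in place of these two extremes.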


RESULTS (1/7)

ANALYSIS/SYNTHESIS RESULTS

Synthesis with Hindi Phoneme Sets

• All vowels and VCV natural & intelligible if synthesized using harmonic part only, except /a∫a/ and /asa/, which require the noise part also.

• GCIs obtained from glottal signal (EGG) give better synthesis.


RESULTS (2/7)

Example 1: Synthesis of /ata/

[Examples: Recorded; Synth. (H); Synth. (H+N)]


RESULTS (3/7)

Example 2: Synthesis of /a∫a/

[Examples: Recorded; Synth. (H); Synth. (H+N)]


RESULTS (4/7)

Effect of Pitch Perturbation, Example: vowel /a/

[Examples: Recorded; synthesized with GCIs from speech and with GCIs from EGG, each without perturbation (Syn.) and with GCI perturbation < 4 %]


RESULTS (5/7)

Effect of Pitch Perturbation on Vowel Quality

GCI perturbation (% of local pitch period) for each quality level:

Quality                   GCIs from speech   GCIs from EGG
Acceptable                < 4 %              < 6 %
Noticeable degradation    4 - 8 %            6 - 10 %
Unacceptable              > 8 %              > 10 %
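The thresholds above can be written as a small lookup, for instance to label the output of a perturbation sweep. This is only a sketch: the function and dictionary names are hypothetical, and treating both range boundaries as belonging to "noticeable degradation" is an assumption, since the slide does not say how the boundaries are assigned.

```python
# Perturbation limits in percent of the local pitch period, per GCI source:
# (below this is acceptable, above this is unacceptable).
THRESHOLDS = {"speech": (4.0, 8.0), "egg": (6.0, 10.0)}


def vowel_quality(perturbation_pct, gci_source):
    """Map a GCI perturbation level to the quality category reported above."""
    acceptable_below, unacceptable_above = THRESHOLDS[gci_source]
    if perturbation_pct < acceptable_below:
        return "acceptable"
    if perturbation_pct <= unacceptable_above:
        return "noticeable degradation"
    return "unacceptable"


# Label a hypothetical sweep over the 0 - 20 % range for both GCI sources.
sweep = {src: [vowel_quality(p, src) for p in range(0, 21, 2)]
         for src in THRESHOLDS}
```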


RESULTS (6/7)

F0 and Formant Relations

• F1 monotonically increases with F0.

• Interchanging the notes results in unnatural output -> proper relation between F0 and formants necessary.

• Speaker dependent relationship between F0 and formants.


RESULTS (7/7)

Scaling of HNM parameters by speaker dependent scaling factor gives more natural output.

Quality         Scaling factor
Natural         < 1.5
Degradation     1.5 - 2.0
Unacceptable    > 2.0

[Examples: Recorded /a/ (male, 117 Hz); /a/ pitch-scaled by 2.1]


CONCLUSIONS

• HNM based synthesis provided good quality synthesis in Hindi.

• GCI perturbations > 4 % → quality degradation.

• GCIs from EGG → better output, indicating HNM’s sensitivity to pitch estimation errors.

• Modest pitch modification possible with linear frequency scaling of HNM parameters.



By

P.K. Lehana, P.C. Pandey

EE Dept, IIT Bombay, India, http://www.ee.iitb.ac.in