Speech Morphing 2 Info]

8/6/2019 Speech Morphing 2 Info]

1/24

8/19/2011 1

Voice Transformation :Voice Transformation :

Speech MorphingSpeech Morphing

Gidon Porat and Yizhar Lavner

SIPL Technion IITDecember 1 2002


2/24

8/19/2011 2

Project GoalsProject Goals

Gradually change a source speakers

voice, to sound like the voice of a target

speaker.

The inputs : two reference voice signals,

one for each speaker.

The output : N voice signals that gradually

change from source to target.

source targetinterp


3/24

8/19/2011 3

ApplicationsApplications

Multimedia and video entertainment: voiceMorphing , just like its face counterpart.

While seeing a face gradually changing from oneperson to anothers (like often done in videoclips) ,we could simultaneously hear his voicechanging as well.

Forensic voice identification by synthesis:

Identifying a suspect voice by creating voice-bank of different pitch, rate and timbre. Similarmethod was developed for face recognition toreplace police sketch artists.


4/24

8/19/2011 4

The ChallengesThe Challenges The source and target voice signals will

never be of the same length. A time

varying method is needed.

It is needed to change the source voicescharacteristics to those of the target

speaker : pitch,duration,spectralparameters.

Produce a natural sounding speechsignal.

source target


5/24

8/19/2011 5

The Challenges cont.The Challenges cont.Here are two identical words (shade) form source and target

The target

speakers word

lasts longer

then the source

speakers and

its shape is

quite different.


6/24

8/19/2011 6

Speech ModelingSpeech Modeling

A1 A2 A3 Ak

Sound transmission in the vocal tract can be modeled

as sound passing through concatenated loss less

acoustic tubes.

A known mathematical relation between the areas of

these tubes and the vocal tracts filter, will help in the

implementation of our algorithm


7/24

8/19/2011 7

Speech ModelingSpeech Modeling -- SynthesisSynthesis

The basic synthesis of digital speech is done by the discrete-

time system model.


8/24

8/19/2011 8

sample

align

interpolate

Prototype WaveformPrototype Waveform

InterpolationInterpolation

PWI is a speech coding

method, which is based on the

presentation of a speechsignal, or its residual error

function, by a 3-D surface.

The creation of such a surface

is described by 3 main steps.


9/24

8/19/2011 9

Surface Construction AlgorithmSurface Construction Algorithm


10/24

8/19/2011 10

The SolutionThe SolutionUse 3D surfaces that will capture each vocal

phonemes residual error signal

characteristics and interpolate betweenthe two speakers.

Unvoiced phonemes are not dealt with due

to complexity and the fact that they carry

little information about the speaker.


11/24

8/19/2011 11

AlgorithmAlgorithm Block DiagramBlock Diagram


12/24

8/19/2011 12

The Algorithm

+

Intermediate surface

Speaker A

Speaker B

The new error function, that will

be reconstructed from that

surface, will then be the input of

a new Vocal Tract Filter.

Once the surfaces for bothsource and target speakers are

created (for each phoneme) an

interpolated surface is created.


13/24

8/19/2011 13

After creating a surface for each

speakers phoneme, an intermediate

surface is created.

The WaveformThe Waveform IntermediateIntermediate

( , ) ( , ) (1 ) ( , )new s t u t u t u t J E F J E K J !

( , ) : { : [ 0 ] ; : [ 0 2 ] }( , ) : { : [ 0 ] ; : [ 0 2 ] }

( , ) : { : [ 0 ] ; : [ 0 2 ] }

/ ; /

s s

t t

n e w n e w

n e w s n e w t

u t t T u t t T

u t t T

T T T T

J J TJ J T

J J T

F K

! !


14/24

8/19/2011 14

The WaveformThe Waveform -- ReconstructionReconstructionThe new, intermediate error signal can be evaluated from the

new surface by the equation :

And that :

( ) ( , ( )), [0 : ]new new new newe t u t t t T J! !

0

'

0 '

2( ) ( )

( )

t

new

newt

t t dt p t

TJ J!

( )ne tJ

Assuming the pitch cycle changes slowly in time :

( ) ( ) (1 ) ( )new s t p t p t p t E F E K !


15/24

8/19/2011 15

AlgorithmAlgorithm Cont.Cont.

The areas of the new

tube model will be an

interpolation

between thesources and the

targets.

Once the new Areas

are computed the

LPC parameters and

V(z) can be

calculated, and the

signal can be

synthesized.1 2 3 4 5 6 7( , , , , , , )a a a a a a a


16/24

8/19/2011 16

New Voiced Phoneme CreationNew Voiced Phoneme Creation

Block DiagramBlock Diagram


17/24

8/19/2011 17

The transfer functionThe transfer functionThe factor could be invariant, and then the voice produced

will be an intermediate between the two speakers, or it

could vary in time, (from at t=0 to at t=T), yielding agradual change from one voice to the other.

In order for one to hear a linear change between the

sources and the targets voices, the coefficient, (the relative

part of ) has to vary in time nonlinearly with afast transition to a = 0.5 and a slow transitions around it.

E

( , )su t J


18/24

8/19/2011 18

0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0 7 0 0 0

- 0 . 0 3

- 0 . 0 2

- 0 . 0 1

0

0 . 0 1

0 . 0 2

0 . 0 3

0 . 0 4

The AlgorithmThe Algorithm cont.cont.The final and new speech signal is created by concatenating

the new vocal phonemes, in order, along with the

sources/targets unvoiced phonemes and silent periods.

1

silent

2voiced

3

unvoiced

4

voiced


19/24

8/19/2011 19

ConclusionConclusionThe utterances produced by the algorithm were

shown to consist of intermediate features of the two

speakers. Both intermediate sounds:

And gradually changing sounds were produced.


20/24

8/19/2011 20

Algorithms limitationsAlgorithms limitationsAlthough most of the speakersindividuality is concentrated in the

voiced speech segments,degradation could be noticed, wheninterpolating between two speechsignals that differ greatly in their

unvoiced speech segments, such asheavy breathing, long periods ofsilence, etc.


21/24

8/19/2011 21

What has been done in theWhat has been done in the

second part of this project?second part of this project? Basic implementation of reconstruction algorithm.

Interpolation between two surfaces.

Final implementation of the algorithm.

Fixes in the algorithm such as:

Maximization of the crosscoralation between

surfaces. Modifications to the morphing factor.


22/24

8/19/2011 22

Future work proposalsFuture work proposals The effect of the unvoiced/silent segments of

speech on voice individuality.

The effect of the Vocal Tracts formants shift

around the unit cycle on the humans ear, in

order to find a robust mapping of the

transform function alpha.


23/24

8/19/2011 23

AppendixAppendix -- EquationsEquations( , ) ( , ) (1 ) ( , )new s t alignedu t u t u t J E F J E K J !

( , ) ( , )t aligned t nu t u t J J J ! 2

0 0

( , ) ( , )

arg max { }

( , )e

ne

s t e

t

n

s t e

u t u t dtd

u u t

T

J

J

J J J J

J

J J

! !

!

( ) ( ) (1 ) ( )new s t p t p t p t E F E K !


24/24

8/19/2011 24

AppendixAppendix transfer function.transfer function.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Documents

Speech Morphing 2 Info]