8/6/2019 Speech Morphing 2 Info]
8/19/2011 1
Voice Transformation: Speech Morphing
Gidon Porat and Yizhar Lavner
SIPL, Technion IIT
December 1, 2002
Project Goals
Gradually change a source speaker's voice to sound like the voice of a target speaker.
The inputs: two reference voice signals, one for each speaker.
The output: N voice signals that gradually change from source to target.
[Figure: source, interpolated ("interp"), and target signals]
Applications
Multimedia and video entertainment: voice morphing, just like its face counterpart. While seeing a face gradually change from one person to another (as is often done in video clips), we could simultaneously hear the voice changing as well.
Forensic voice identification by synthesis: identifying a suspect's voice by creating a voice bank of different pitch, rate and timbre. A similar method was developed for face recognition to replace police sketch artists.
The Challenges
The source and target voice signals will never be of the same length. A time-varying method is needed.
The source voice's characteristics must be changed to those of the target speaker: pitch, duration, spectral parameters.
The result must be a natural-sounding speech signal.
[Figure: source and target signals]
The Challenges cont.
Here is the same word ("shade") spoken by the source and the target speakers.
The target speaker's word lasts longer than the source speaker's, and its shape is quite different.
Speech Modeling
Sound transmission in the vocal tract can be modeled as sound passing through concatenated lossless acoustic tubes.
[Figure: concatenated tubes with cross-sectional areas A1, A2, A3, ..., Ak]
A known mathematical relation between the areas of these tubes and the vocal tract's filter will help in the implementation of our algorithm.
Speech Modeling - Synthesis
The basic synthesis of digital speech is done by the discrete-time system model.
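As a minimal sketch of such a discrete-time model, the code below drives an all-pole filter with a periodic impulse train, which is one common way to model voiced speech. The function names, the single filter coefficient, and the pitch period are illustrative assumptions, not values taken from the project.

```python
def impulse_train(length, period):
    """Idealized voiced excitation: one unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def all_pole_synthesis(excitation, lpc, gain=1.0):
    """Filter the excitation through V(z) = gain / (1 + sum_k lpc[k-1] * z^-k)."""
    out = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, a_k in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a_k * out[n - k]  # feedback from past output samples
        out.append(acc)
    return out


# Hypothetical first-order filter: y[n] = x[n] + 0.5 * y[n-1]
excitation = impulse_train(length=8, period=4)
speech = all_pole_synthesis(excitation, lpc=[-0.5])
```

A real vocal tract filter would use a higher order, and the sign convention for the coefficients differs between references.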
Prototype Waveform Interpolation
PWI is a speech coding method based on the representation of a speech signal, or its residual error function, by a 3-D surface.
The creation of such a surface is described by 3 main steps: sample, align, interpolate.
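The three steps could be sketched roughly as follows, assuming the pitch marks (cycle boundaries) are already known. All function names are hypothetical. Linear resampling and alignment by circular cross-correlation are assumptions about how each step might be carried out, not the project's confirmed method.

```python
import math


def resample_cycle(cycle, n_phase):
    """Sample: linearly resample one pitch cycle (treated as periodic) to n_phase points."""
    length = len(cycle)
    out = []
    for k in range(n_phase):
        pos = k * length / n_phase
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * cycle[i % length] + frac * cycle[(i + 1) % length])
    return out


def best_shift(reference, cycle):
    """Align: circular shift of `cycle` that maximizes its correlation with `reference`."""
    n = len(cycle)

    def corr(shift):
        return sum(reference[k] * cycle[(k + shift) % n] for k in range(n))

    return max(range(n), key=corr)


def build_surface(signal, pitch_marks, n_phase=32):
    """Return a list of aligned prototype cycles: rows are time, columns are phase."""
    surface = []
    for start, end in zip(pitch_marks, pitch_marks[1:]):
        cycle = resample_cycle(signal[start:end], n_phase)
        if surface:
            s = best_shift(surface[-1], cycle)
            cycle = cycle[s:] + cycle[:s]
        surface.append(cycle)
    return surface


def surface_row(surface, pos):
    """Interpolate: blend the two neighbouring rows at fractional row index `pos`."""
    i = min(int(pos), len(surface) - 2)
    frac = pos - i
    return [(1 - frac) * a + frac * b for a, b in zip(surface[i], surface[i + 1])]


# Toy input: a sine with a 20-sample period and evenly spaced pitch marks.
signal = [math.sin(2 * math.pi * n / 20) for n in range(100)]
surface = build_surface(signal, pitch_marks=[0, 20, 40, 60, 80])
```

`surface_row` needs at least two rows in the surface; a real implementation would also have to detect the pitch marks.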
Surface Construction Algorithm
The Solution
Use 3D surfaces that capture the characteristics of each voiced phoneme's residual error signal, and interpolate between the two speakers.
Unvoiced phonemes are not dealt with, due to complexity and the fact that they carry little information about the speaker.
Algorithm Block Diagram
The Algorithm
Once the surfaces for both source and target speakers are created (for each phoneme), an interpolated surface is created.
[Figure: Speaker A surface + Speaker B surface = intermediate surface]
The new error function, reconstructed from that surface, will then be the input of a new vocal tract filter.
The Waveform - Intermediate
After creating a surface for each speaker's phoneme, an intermediate surface is created:
$$u_{new}(t, \phi) = \alpha \cdot u_s(\beta t, \phi) + (1 - \alpha) \cdot u_t(\gamma t, \phi)$$
where
$$u_s(t, \phi): \{\, t: [0, T_s];\ \phi: [0, 2\pi] \,\}$$
$$u_t(t, \phi): \{\, t: [0, T_t];\ \phi: [0, 2\pi] \,\}$$
$$u_{new}(t, \phi): \{\, t: [0, T_{new}];\ \phi: [0, 2\pi] \,\}$$
$$\beta = T_{new} / T_s; \quad \gamma = T_{new} / T_t$$
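A weighted average of two surfaces on a shared time grid can be sketched as below. The sketch assumes each surface is a list of rows (time) by columns (phase) with the same number of phase points, and it maps both surfaces linearly onto a common number of rows chosen by the caller. The function names and the choice of `n_rows` are illustrative assumptions.

```python
def sample_rows(surface, n_rows):
    """Resample a surface along time to n_rows rows by linear interpolation."""
    last = len(surface) - 1
    out = []
    for r in range(n_rows):
        pos = r * last / (n_rows - 1) if n_rows > 1 else 0.0
        i = min(int(pos), max(last - 1, 0))
        frac = pos - i
        nxt = surface[min(i + 1, last)]
        out.append([(1 - frac) * a + frac * b for a, b in zip(surface[i], nxt)])
    return out


def interpolate_surfaces(u_s, u_t, alpha, n_rows):
    """Blend source and target surfaces: alpha weights the source, (1 - alpha) the target."""
    rows_s = sample_rows(u_s, n_rows)
    rows_t = sample_rows(u_t, n_rows)
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(rs, rt)]
        for rs, rt in zip(rows_s, rows_t)
    ]


# Toy surfaces with different durations (2 rows vs. 3 rows), 2 phase points each.
u_s = [[0.0, 0.0], [2.0, 2.0]]
u_t = [[4.0, 4.0], [4.0, 4.0], [8.0, 8.0]]
u_new = interpolate_surfaces(u_s, u_t, alpha=0.5, n_rows=3)
```

With `alpha=1.0` the result is the time-resampled source surface, and with `alpha=0.0` it is the time-resampled target surface.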
The Waveform - Reconstruction
The new, intermediate error signal can be evaluated from the new surface by the equation:
$$e_{new}(t) = u_{new}(t, \phi_{new}(t)), \quad t = [0 : T_{new}]$$
where the phase $\phi_{new}(t)$ is:
$$\phi_{new}(t) = \phi_{new}(t_0) + \int_{t_0}^{t} \frac{2\pi}{p_{new}(t')} \, dt'$$
Assuming the pitch cycle changes slowly in time:
$$p_{new}(t) = \alpha \cdot p_s(\beta t) + (1 - \alpha) \cdot p_t(\gamma t)$$
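In discrete time, the reconstruction can be sketched by accumulating the phase sample by sample and reading the surface at the current time row and phase. The code below is a minimal illustration under several assumptions: the pitch period is given in samples by a caller-supplied function, the nearest earlier surface row is used along time, and values are linearly interpolated along phase. All names are hypothetical.

```python
import math


def reconstruct(surface, pitch_period, n_samples):
    """Evaluate e[n] = u(n, phi[n]), advancing phi by 2*pi / p[n] at every sample."""
    n_rows, n_phase = len(surface), len(surface[0])
    two_pi = 2 * math.pi
    out, phi = [], 0.0
    for n in range(n_samples):
        # Pick the surface row that covers this instant (no interpolation along time).
        row = surface[min(n * n_rows // n_samples, n_rows - 1)]
        # Map the wrapped phase onto the phase grid and interpolate linearly.
        pos = (phi % two_pi) / two_pi * n_phase
        k = int(pos)
        frac = pos - k
        out.append((1 - frac) * row[k % n_phase] + frac * row[(k + 1) % n_phase])
        phi += two_pi / pitch_period(n)
    return out


# Toy surface: four identical rows holding one cosine cycle sampled at 8 phase points.
cycle = [math.cos(2 * math.pi * k / 8) for k in range(8)]
surface = [cycle[:] for _ in range(4)]
error_signal = reconstruct(surface, pitch_period=lambda n: 8.0, n_samples=16)
```

With a constant pitch period equal to the number of phase points, the output simply repeats the stored cycle; a time-varying `pitch_period` stretches or compresses each cycle.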
Algorithm cont.
The areas of the new tube model will be an interpolation between the source's and the target's.
Once the new areas are computed, the LPC parameters and V(z) can be calculated, and the signal can be synthesized.
[Figure label: (a_1, a_2, a_3, a_4, a_5, a_6, a_7)]
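The path from interpolated areas to filter coefficients can be sketched with one common textbook convention: adjacent areas give reflection coefficients, and a step-up recursion turns those into polynomial coefficients of A(z) = 1 + sum of a_k z^-k. Sign and ordering conventions differ between references, and boundary conditions at the glottis and lips are ignored here, so this is an assumption-laden illustration with hypothetical function names.

```python
def interpolate_areas(areas_s, areas_t, alpha):
    """Weighted average of source and target tube areas (alpha weights the source)."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(areas_s, areas_t)]


def areas_to_reflection(areas):
    """Reflection coefficient at each junction: (A_next - A_cur) / (A_next + A_cur)."""
    return [(b - a) / (b + a) for a, b in zip(areas, areas[1:])]


def reflection_to_lpc(refl):
    """Step-up recursion: a_j <- a_j + k * a_(i-j) for j < i, then a_i = k."""
    lpc = []
    for k in refl:
        lpc = [a + k * b for a, b in zip(lpc, reversed(lpc))] + [k]
    return lpc


# Toy example with three tube sections per speaker (values are made up).
areas_new = interpolate_areas([1.0, 2.0, 1.0], [3.0, 2.0, 3.0], alpha=0.5)
lpc_new = reflection_to_lpc(areas_to_reflection(areas_new))
```

Averaging these made-up areas yields a uniform tube, so every reflection coefficient is zero and the resulting filter is flat.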
New Voiced Phoneme Creation - Block Diagram
The transfer function
The factor $\alpha$ could be invariant, and then the voice produced will be an intermediate between the two speakers, or it could vary in time (from $\alpha = 1$ at t = 0 to $\alpha = 0$ at t = T), yielding a gradual change from one voice to the other.
In order for one to hear a linear change between the source's and the target's voices, the coefficient $\alpha$ (the relative part of $u_s(t, \phi)$) has to vary in time nonlinearly, with a fast transition to $\alpha = 0.5$ and slow transitions around it.
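One curve with this qualitative shape is an odd-power polynomial centred on 0.5. The sketch below is purely illustrative: the actual mapping used in the project is not specified in the text, the function name and `sharpness` parameter are hypothetical, and the morph factor is assumed to be the source weight, running from 1 at t = 0 to 0 at t = T.

```python
def morph_factor(t, total, sharpness=3):
    """Nonlinear source weight: changes quickly near the ends, slowly around 0.5.

    `sharpness` must be a positive odd integer; sharpness=1 gives a linear ramp.
    """
    x = t / total                      # normalized time in [0, 1]
    return 0.5 - 0.5 * (2 * x - 1) ** sharpness


# Morph factor at 11 evenly spaced instants of a hypothetical 10-step morph.
schedule = [morph_factor(step, 10) for step in range(11)]
```

Compared with a linear ramp, this schedule moves toward 0.5 sooner and then lingers there, which is the behaviour described for the coefficient.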
The Algorithm cont.
The final, new speech signal is created by concatenating the new voiced phonemes, in order, along with the source's/target's unvoiced phonemes and silent periods.
[Figure: speech waveform divided into segments: 1 silent, 2 voiced, 3 unvoiced, 4 voiced]
Conclusion
The utterances produced by the algorithm were shown to consist of intermediate features of the two speakers. Both intermediate sounds and gradually changing sounds were produced.
Algorithm's limitations
Although most of the speaker's individuality is concentrated in the voiced speech segments, degradation could be noticed when interpolating between two speech signals that differ greatly in their unvoiced speech segments, such as heavy breathing, long periods of silence, etc.
What has been done in the second part of this project?
Basic implementation of the reconstruction algorithm.
Interpolation between two surfaces.
Final implementation of the algorithm.
Fixes in the algorithm, such as:
Maximization of the cross-correlation between surfaces.
Modifications to the morphing factor.
Future work proposals
The effect of the unvoiced/silent segments of speech on voice individuality.
The effect of the vocal tract's formant shift around the unit circle on the human ear, in order to find a robust mapping of the transfer function alpha.
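Appendix - Equations
$$u_{new}(t, \phi) = \alpha \cdot u_s(\beta t, \phi) + (1 - \alpha) \cdot u_{t,aligned}(\gamma t, \phi)$$
$$u_{t,aligned}(t, \phi) = u_t(t, \phi + \phi_n)$$
$$\phi_n = \arg\max_{\phi_e} \left\{ \frac{\int_{0}^{2\pi} \int_{0}^{T} u_s(t, \phi) \, u_t(t, \phi + \phi_e) \, dt \, d\phi}{\| u_s(t, \phi) \| \, \| u_t(t, \phi + \phi_e) \|} \right\}$$
$$p_{new}(t) = \alpha \cdot p_s(\beta t) + (1 - \alpha) \cdot p_t(\gamma t)$$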
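The alignment step, choosing the phase shift that maximizes the normalized cross-correlation between the two surfaces, can be sketched on a discrete grid as follows. The sketch assumes both surfaces have already been brought to the same number of rows and phase points, and it searches only integer shifts along the phase axis. The function names are hypothetical.

```python
import math


def shift_phase(surface, shift):
    """Circularly advance every row along the phase axis: new[k] = old[(k + shift) % n]."""
    return [row[shift:] + row[:shift] for row in surface]


def surface_norm(surface):
    """Euclidean norm over all surface samples."""
    return math.sqrt(sum(v * v for row in surface for v in row))


def best_phase_shift(u_s, u_t):
    """Integer phase shift of u_t that maximizes normalized cross-correlation with u_s."""
    n_phase = len(u_s[0])

    def ncc(shift):
        shifted = shift_phase(u_t, shift)
        num = sum(a * b for rs, rt in zip(u_s, shifted) for a, b in zip(rs, rt))
        den = surface_norm(u_s) * surface_norm(shifted)
        return num / den if den else 0.0

    return max(range(n_phase), key=ncc)


# Toy surfaces: the target is the source delayed by one phase bin.
u_s = [[1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
u_t = [[0.0, 1.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
shift = best_phase_shift(u_s, u_t)
u_t_aligned = shift_phase(u_t, shift)
```

Because a circular shift does not change the norm of a surface, the normalization matters mainly when the surface is not fully periodic in phase or when correlation scores are compared across different surface pairs.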
Appendix - Transfer Function
[Figure: transfer function plot; both axes run from 0 to 1]