Speech Morphing 2 Info]

Embed Size (px)

Citation preview

  • 8/6/2019 Speech Morphing 2 Info]

    1/24

    8/19/2011 1

    Voice Transformation :Voice Transformation :

    Speech MorphingSpeech Morphing

    Gidon Porat and Yizhar Lavner

    SIPL Technion IITDecember 1 2002

  • 8/6/2019 Speech Morphing 2 Info]

    2/24

    8/19/2011 2

    Project GoalsProject Goals

    Gradually change a source speakers

    voice, to sound like the voice of a target

    speaker.

    The inputs : two reference voice signals,

    one for each speaker.

    The output : N voice signals that gradually

    change from source to target.

    source targetinterp

  • 8/6/2019 Speech Morphing 2 Info]

    3/24

    8/19/2011 3

    ApplicationsApplications

    Multimedia and video entertainment: voiceMorphing , just like its face counterpart.

    While seeing a face gradually changing from oneperson to anothers (like often done in videoclips) ,we could simultaneously hear his voicechanging as well.

    Forensic voice identification by synthesis:

    Identifying a suspect voice by creating voice-bank of different pitch, rate and timbre. Similarmethod was developed for face recognition toreplace police sketch artists.

  • 8/6/2019 Speech Morphing 2 Info]

    4/24

    8/19/2011 4

    The ChallengesThe Challenges The source and target voice signals will

    never be of the same length. A time

    varying method is needed.

    It is needed to change the source voicescharacteristics to those of the target

    speaker : pitch,duration,spectralparameters.

    Produce a natural sounding speechsignal.

    source target

  • 8/6/2019 Speech Morphing 2 Info]

    5/24

    8/19/2011 5

    The Challenges cont.The Challenges cont.Here are two identical words (shade) form source and target

    The target

    speakers word

    lasts longer

    then the source

    speakers and

    its shape is

    quite different.

  • 8/6/2019 Speech Morphing 2 Info]

    6/24

    8/19/2011 6

    Speech ModelingSpeech Modeling

    A1 A2 A3 Ak

    Sound transmission in the vocal tract can be modeled

    as sound passing through concatenated loss less

    acoustic tubes.

    A known mathematical relation between the areas of

    these tubes and the vocal tracts filter, will help in the

    implementation of our algorithm

  • 8/6/2019 Speech Morphing 2 Info]

    7/24

    8/19/2011 7

    Speech ModelingSpeech Modeling -- SynthesisSynthesis

    The basic synthesis of digital speech is done by the discrete-

    time system model.

  • 8/6/2019 Speech Morphing 2 Info]

    8/24

    8/19/2011 8

    sample

    align

    interpolate

    Prototype WaveformPrototype Waveform

    InterpolationInterpolation

    PWI is a speech coding

    method, which is based on the

    presentation of a speechsignal, or its residual error

    function, by a 3-D surface.

    The creation of such a surface

    is described by 3 main steps.

  • 8/6/2019 Speech Morphing 2 Info]

    9/24

    8/19/2011 9

    Surface Construction AlgorithmSurface Construction Algorithm

  • 8/6/2019 Speech Morphing 2 Info]

    10/24

    8/19/2011 10

    The SolutionThe SolutionUse 3D surfaces that will capture each vocal

    phonemes residual error signal

    characteristics and interpolate betweenthe two speakers.

    Unvoiced phonemes are not dealt with due

    to complexity and the fact that they carry

    little information about the speaker.

  • 8/6/2019 Speech Morphing 2 Info]

    11/24

    8/19/2011 11

    AlgorithmAlgorithm Block DiagramBlock Diagram

  • 8/6/2019 Speech Morphing 2 Info]

    12/24

    8/19/2011 12

    The Algorithm

    +

    Intermediate surface

    Speaker A

    Speaker B

    The new error function, that will

    be reconstructed from that

    surface, will then be the input of

    a new Vocal Tract Filter.

    Once the surfaces for bothsource and target speakers are

    created (for each phoneme) an

    interpolated surface is created.

  • 8/6/2019 Speech Morphing 2 Info]

    13/24

    8/19/2011 13

    After creating a surface for each

    speakers phoneme, an intermediate

    surface is created.

    The WaveformThe Waveform IntermediateIntermediate

    ( , ) ( , ) (1 ) ( , )new s t u t u t u t J E F J E K J !

    ( , ) : { : [ 0 ] ; : [ 0 2 ] }( , ) : { : [ 0 ] ; : [ 0 2 ] }

    ( , ) : { : [ 0 ] ; : [ 0 2 ] }

    / ; /

    s s

    t t

    n e w n e w

    n e w s n e w t

    u t t T u t t T

    u t t T

    T T T T

    J J TJ J T

    J J T

    F K

    ! !

  • 8/6/2019 Speech Morphing 2 Info]

    14/24

    8/19/2011 14

    The WaveformThe Waveform -- ReconstructionReconstructionThe new, intermediate error signal can be evaluated from the

    new surface by the equation :

    And that :

    ( ) ( , ( )), [0 : ]new new new newe t u t t t T J! !

    0

    '

    0 '

    2( ) ( )

    ( )

    t

    new

    newt

    t t dt p t

    TJ J!

    ( )ne tJ

    Assuming the pitch cycle changes slowly in time :

    ( ) ( ) (1 ) ( )new s t p t p t p t E F E K !

  • 8/6/2019 Speech Morphing 2 Info]

    15/24

    8/19/2011 15

    AlgorithmAlgorithm Cont.Cont.

    The areas of the new

    tube model will be an

    interpolation

    between thesources and the

    targets.

    Once the new Areas

    are computed the

    LPC parameters and

    V(z) can be

    calculated, and the

    signal can be

    synthesized.1 2 3 4 5 6 7( , , , , , , )a a a a a a a

  • 8/6/2019 Speech Morphing 2 Info]

    16/24

    8/19/2011 16

    New Voiced Phoneme CreationNew Voiced Phoneme Creation

    Block DiagramBlock Diagram

  • 8/6/2019 Speech Morphing 2 Info]

    17/24

    8/19/2011 17

    The transfer functionThe transfer functionThe factor could be invariant, and then the voice produced

    will be an intermediate between the two speakers, or it

    could vary in time, (from at t=0 to at t=T), yielding agradual change from one voice to the other.

    In order for one to hear a linear change between the

    sources and the targets voices, the coefficient, (the relative

    part of ) has to vary in time nonlinearly with afast transition to a = 0.5 and a slow transitions around it.

    E

    ( , )su t J

  • 8/6/2019 Speech Morphing 2 Info]

    18/24

    8/19/2011 18

    0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0 7 0 0 0

    - 0 . 0 3

    - 0 . 0 2

    - 0 . 0 1

    0

    0 . 0 1

    0 . 0 2

    0 . 0 3

    0 . 0 4

    The AlgorithmThe Algorithm cont.cont.The final and new speech signal is created by concatenating

    the new vocal phonemes, in order, along with the

    sources/targets unvoiced phonemes and silent periods.

    1

    silent

    2voiced

    3

    unvoiced

    4

    voiced

  • 8/6/2019 Speech Morphing 2 Info]

    19/24

    8/19/2011 19

    ConclusionConclusionThe utterances produced by the algorithm were

    shown to consist of intermediate features of the two

    speakers. Both intermediate sounds:

    And gradually changing sounds were produced.

  • 8/6/2019 Speech Morphing 2 Info]

    20/24

    8/19/2011 20

    Algorithms limitationsAlgorithms limitationsAlthough most of the speakersindividuality is concentrated in the

    voiced speech segments,degradation could be noticed, wheninterpolating between two speechsignals that differ greatly in their

    unvoiced speech segments, such asheavy breathing, long periods ofsilence, etc.

  • 8/6/2019 Speech Morphing 2 Info]

    21/24

    8/19/2011 21

    What has been done in theWhat has been done in the

    second part of this project?second part of this project? Basic implementation of reconstruction algorithm.

    Interpolation between two surfaces.

    Final implementation of the algorithm.

    Fixes in the algorithm such as:

    Maximization of the crosscoralation between

    surfaces. Modifications to the morphing factor.

  • 8/6/2019 Speech Morphing 2 Info]

    22/24

    8/19/2011 22

    Future work proposalsFuture work proposals The effect of the unvoiced/silent segments of

    speech on voice individuality.

    The effect of the Vocal Tracts formants shift

    around the unit cycle on the humans ear, in

    order to find a robust mapping of the

    transform function alpha.

  • 8/6/2019 Speech Morphing 2 Info]

    23/24

    8/19/2011 23

    AppendixAppendix -- EquationsEquations( , ) ( , ) (1 ) ( , )new s t alignedu t u t u t J E F J E K J !

    ( , ) ( , )t aligned t nu t u t J J J ! 2

    0 0

    ( , ) ( , )

    arg max { }

    ( , )e

    ne

    s t e

    t

    n

    s t e

    u t u t dtd

    u u t

    T

    J

    J

    J J J J

    J

    J J

    ! !

    !

    ( ) ( ) (1 ) ( )new s t p t p t p t E F E K !

  • 8/6/2019 Speech Morphing 2 Info]

    24/24

    8/19/2011 24

    AppendixAppendix transfer function.transfer function.

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1