
Analysis and Synthesis of Pathological Vowels

Prospectus

Brian C. Gabelman

6/13/2003


OVERVIEW OF PRESENTATION

I. Background

II. Analysis of pathological voices

III. Synthesis of pathological voices

IV. Summary


What is this project?

BACKGROUND

Methods for modeling, analysis, and synthesis of pathological vowels incorporating novel approaches in:

- System identification
- Parameterization of non-periodic components (AM, FM, and noise)
- Synthesizer designs for realtime and offline use

What is a pathological vowel?

May be caused by physical or neural problems. Characterized by substantial and complex NONPERIODIC signal components.

Why do it?

- Create a non-subjective basis to compare pathological voices for:
  1. Improved diagnosis
  2. Tracking changes in a patient’s voice
- Generate voice samples with known levels of variations (noise, roughness, etc.) for:
  1. Evaluation of model parameters
  2. Evaluation of listener variability
  3. Evaluation of importance of levels of pathological features

What has been done before?

- A well-established theory exists for NORMAL voices.
- Recent studies of pathological voices employ perturbation of normal features plus additive noise.
- ES (external source) stimulation of the vocal tract to analyze formants (vowels) since 1942 for NORMAL voices.

For the theoretical/analytical aspect of the project, the hypothesis of the dissertation can be expressed in one sentence: “By means of FM and AM demodulation techniques, estimation of nonperiodic features of pathological vowels may be improved.”

ANALYSIS

SOURCE-FILTER MODEL OF SPEECH

[Diagram: GLOTTAL SOURCE WAVEFORM g(t), G(s) (time domain); VOCAL TRACT FREQ. RESP. f(t), F(s) (freq domain); RESULTING VOICE SIG. v(t), V(s) (time domain)]

g(t) conv. f(t) = v(t) (time domain)
G(s) x F(s) = V(s) (freq domain)
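The equivalence between the two forms of the source-filter relation can be checked numerically. The sketch below is only an illustration: it uses the DFT as a sampled stand-in for the s-domain product, and the arrays `g` and `f` are arbitrary placeholder signals, not measured glottal or vocal tract data.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal(64)            # placeholder for a sampled source g(t)
f = np.exp(-np.arange(32) / 4.0)       # placeholder for a vocal tract impulse response f(t)

# Time domain: v = g conv f
v_time = np.convolve(g, f)

# Frequency domain: V = G x F, zero-padded so that the circular
# product matches the linear convolution
n = len(g) + len(f) - 1
v_freq = np.fft.irfft(np.fft.rfft(g, n) * np.fft.rfft(f, n), n)

max_err = float(np.max(np.abs(v_time - v_freq)))
print(f"largest difference between the two routes: {max_err:.2e}")
```

Both routes give the same voice signal to within rounding error, which is why the source and filter cannot be told apart from the output alone.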

Steps for analysis:

PERIODIC ANALYSIS:

1. FORMANT DETERMINATION
Uses LP (linear prediction) to model the vocal tract as cascaded 2nd-order digital resonators. External source testing is shown to augment or replace LP for pathological vowels (Inv).

2. SOURCE MODELING
Uses inverse filtering and least squares optimization to fit the source waveform to a standard model (LF).

NONPERIODIC ANALYSIS:

3. ANALYSIS OF PITCH VARIATION
Uses high resolution pitch tracking to measure detailed nonperiodic frequency variation. Variations are segmented into low and high frequency FM with Gaussian form.

4. FM DEMODULATION
Pitch variations are removed from the original voice to achieve accurate noise estimation (step 7) (Inv.)

5. ANALYSIS OF POWER VARIATION
Uses power tracking to measure detailed nonperiodic loudness variations. Variations are segmented into low and high frequency AM with Gaussian form.

6. AM DEMODULATION
Power variations are removed from the original voice to achieve accurate noise estimation (step 7) (Inv.)

7. ASPIRATION NOISE
Frequency domain methods are used to separate the aspiration noise component. The noise is spectrally modeled. [Gus de Krom]
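Step 1 describes the vocal tract as cascaded 2nd-order digital resonators. A minimal sketch of such a cascade follows. It assumes the common unity-DC-gain resonator form (as used in Klatt-style formant synthesizers), a 10 kHz sampling rate, and made-up formant frequencies and bandwidths; none of these values come from the presentation.

```python
import numpy as np

def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] with unity gain at DC."""
    t = 1.0 / fs_hz
    c = -np.exp(-2.0 * np.pi * bw_hz * t)
    b = 2.0 * np.exp(-np.pi * bw_hz * t) * np.cos(2.0 * np.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def cascade_response(formants, fs_hz, freqs_hz):
    """Complex frequency response of the resonators connected in series."""
    z_inv = np.exp(-2j * np.pi * freqs_hz / fs_hz)
    h = np.ones_like(z_inv)
    for freq_hz, bw_hz in formants:
        a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
        h *= a / (1.0 - b * z_inv - c * z_inv ** 2)
    return h

fs = 10000.0                                                  # assumed sampling rate
formants = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]   # illustrative (Hz, Hz)
freqs = np.arange(0.0, 5000.0, 5.0)
mag_db = 20.0 * np.log10(np.abs(cascade_response(formants, fs, freqs)))

for freq_hz, _ in formants:
    print(f"{freq_hz:6.0f} Hz: {mag_db[int(freq_hz / 5)]:5.1f} dB")
```

Each resonator contributes one pole pair, so a p-th order LP model corresponds to p/2 such sections.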

ANALYSIS BY SYNTHESIS

[Block diagram. Blocks: COLLECTION OF VOICE SAMPLES, VOICE ANALYSIS, PARAMETERS (PITCH, NSR, FORMANTS, ...), VOICE SYNTHESIS, PERCEPTUAL COMPARISONS, VALIDATION, with a +/- comparison node]

ANALYSIS/SYNTHESIS MODEL OVERVIEW

[Block diagram. Blocks: SOURCE WAVEFORM, FM MODULATION, AM MODULATION, GAUSSIAN NOISE, SPECTRAL SHAPING, VOCAL TRACT, OUTPUT VOICE, with two summing nodes (+). Labels: PERIODIC, NONPERIODIC, SYNTHESIS]

OVERVIEW OF PROJECT OPERATIONS

[Flow diagram. Input: MIC SIGNAL. Processing blocks: LPC FORMANT ANALYSIS / MANUAL OPS, INVERSE FILTERING, LEAST SQR LF FIT, PITCH TRACKER, POWER TRACKER, RESAMPLE TO REMOVE TREMOR (appears twice), CEPSTRAL NOISE ANALYSIS, SYNTHESIZER. Signals and estimates: FORMANTS, RAW FLOW DERIVATIVE, FITTED LF SOURCE PULSE, PITCH TIME HIST, TREMOR TIME HIST, JITTER ESTIMATE, POWER TIME HIST, VOLUME TIME HIST, SHIMMER ESTIMATE, CONST. PITCH VOICE, CONST. POWER VOICE, SRC NOISE SPECTRUM, NSR ESTIMATE]

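PERIODIC ANALYSIS

LINEAR PREDICTION

Estimates the vocal tract as an all-pole filter by minimizing the error between actual and model-predicted signals.

[Block diagram: the source u(n) drives the UNKNOWN SYSTEM H(z) (vocal tract) to produce s(n); the PREDICTOR (model) produces s'(n); the error is e(n) = s(n) - s'(n)]

Unknown system (vocal tract):

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

Predictor (model):

$$\sum_{k=1}^{p} \alpha_k z^{-k}$$

Estimated inverse system:

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} \approx \frac{G}{H(z)}$$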
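A minimal sketch of linear prediction by least squares is shown below, in the spirit of the covariance method. The 2-pole "unknown system", its coefficients, and the white-noise excitation are assumptions chosen so that the true answer is known; a real voice has a quasi-periodic source and a higher model order.

```python
import numpy as np

def lp_covariance(s, p):
    """Least-squares predictor: s(n) ~ sum_k alpha_k * s(n-k), k = 1..p."""
    past = np.column_stack([s[p - k:len(s) - k] for k in range(1, p + 1)])
    target = s[p:]
    alpha, *_ = np.linalg.lstsq(past, target, rcond=None)
    error = target - past @ alpha          # e(n) = s(n) - s'(n)
    return alpha, error

# Assumed all-pole system: s(n) = 1.3 s(n-1) - 0.8 s(n-2) + u(n)
rng = np.random.default_rng(1)
a_true = np.array([1.3, -0.8])
u = rng.standard_normal(4000)              # hypothetical source u(n), G = 1
s = np.zeros_like(u)
for n in range(len(u)):
    s[n] = u[n]
    for k, a_k in enumerate(a_true, start=1):
        if n - k >= 0:
            s[n] += a_k * s[n - k]

alpha, e = lp_covariance(s, p=2)
print("true a_k:     ", a_true)
print("estimated a_k:", np.round(alpha, 3))
```

The error signal `e` is the output of the estimated inverse system A(z) applied to s(n), so it approximates the source; this is the inverse-filtering idea used later for source modeling.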

IDEALIZED LP RESULT

[Figure, three panels: (1) "ERROR SIGNAL OF COVARIANCE LPA = INVERSE FILTERED OUTPUT" vs. time, 0.03 to 0.06 s; (2) "LPC COV. PREDICTOR: LINE = ACTUAL OUT, DOT = PREDICTED" vs. time, 0.03 to 0.06 s; (3) "LPC COVARIANCE ROOTS. O = TRUE, X = LPC", imaginary part vs. real part, annotated "POLE REFLECTED INSIDE UNIT CIRCLE"]

Requires a priori knowledge of system. More difficult for pathological vowels.

SOURCE-FILTER AMBIGUITY

Source & filter are mixed in final voice. Unique LP solution may be difficult.

[Figure: panels labeled SOURCE TIME SERIES, VOCAL TRACT TRANSFER FUNCTION, VOICE TIME SERIES, and VOICE SPECTRUM for each of the following cases]

CASE 0: NORMAL SOURCE AND NORMAL VOCAL TRACT
CASE 1: “BREATHY” SOURCE AND NORMAL VOCAL TRACT
CASE 2: NORMAL SOURCE AND ABNORMAL VOCAL TRACT

FITTING RAW SOURCE TO LF

Once the inverse-filtered source has been established, it is fit to the LF model [Qi].

[Equation: piecewise definition of the LF flow derivative E(t), with segment limits involving t_e and t_c. Legible symbols include E_e, a sine term, α, ω, and ε; the individual terms are not recoverable from this copy.]

[Figure: flow U(t) = ∫E(t)dt and flow derivative 30*E(t) vs. T (SECONDS), 0 to 0.012 s. The derivative is divided into E1 (FIRST SEG.) and E2 (SECOND SEG.); marked points are tp, (tpk, Epk), (te, Ee), (t2, Ee/2), and tc]

NON-PERIODIC ANALYSIS

HIGH RESOLUTION PITCH TRACKING

Nonperiodic analysis begins with interpolating pitch tracking.

[Figure, two panels: VOICE SIGNAL vs. TIME (SEC), 0.018 to 0.036 s, with one "Measured Period" marked; PITCH TRACK, FREQ (HZ), about 255 to 275 Hz, vs. TIME (SEC), 0 to 1 s]

FM DEVIATION SEGREGATED TO LOW AND HIGH FREQUENCY

[Figure, two panels vs. time, 0 to 1 sec: FREQ (HZ), about 260 to 275 Hz; FREQ (HZ), about -4 to 4 Hz]

The pitch track is low/high-pass filtered to yield tremor and HFPV (High Frequency Pitch Variation).
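The split of a pitch track into tremor and HFPV can be sketched as follows. This is an illustration only: the moving-average low-pass filter, its 15-cycle window, and the synthetic pitch track (265 Hz mean, 3 Hz tremor, 0.5% Gaussian cycle-to-cycle variation) are all assumptions, not the filters or data used in the project.

```python
import numpy as np

def split_track(track, win):
    """Return (slow component, fast deviation in percent of the slow component)."""
    kernel = np.ones(win) / win
    padded = np.pad(track, win // 2, mode="edge")       # avoid edge droop
    slow = np.convolve(padded, kernel, mode="valid")[:len(track)]
    fast_pct = 100.0 * (track - slow) / slow
    return slow, fast_pct

rng = np.random.default_rng(2)
n_cycles = 265
t = np.arange(n_cycles) / 265.0                          # roughly one second of cycles
tremor_true = 265.0 + 4.0 * np.sin(2.0 * np.pi * 3.0 * t)
pitch = tremor_true * (1.0 + 0.005 * rng.standard_normal(n_cycles))

slow, hfpv_pct = split_track(pitch, win=15)
hfpv_sd = float(np.std(hfpv_pct))
print(f"HFPV standard deviation: {hfpv_sd:.2f} %")
```

The low-pass output plays the role of the tremor track, and the standard deviation of the high-pass residue is a single-number summary of HFPV.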


GAUSSIAN HFPV

[Figure: histogram of # OCCURRENCES vs. DELT FREQ (%), -1.5 to 1.5; plot title "TOTSKIP=0 FCUT=10 TOTMAN=0"]

Successful pitch tracking yields a Gaussian distribution in HFPV. The standard deviation is a convenient measure of HFPV.


FM DEMODULATION

The pitch track may be used to demodulate the original voice to obtain a version with almost no pitch variation; re-tracking verifies constant pitch (<0.1%).

[Figure: DELTA FREQ PERCENT, within ±0.1, vs. TIME (SEC), 0 to 1 s]
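One way to picture FM demodulation is as a time warp: the voice is resampled on a time axis along which the phase of the fundamental advances at a constant rate. The sketch below assumes the pitch track is known exactly and uses linear interpolation; the test signal, the 10 kHz sampling rate, and the tremor values are made up for illustration.

```python
import numpy as np

fs = 10000.0                                             # assumed sampling rate
t = np.arange(int(fs)) / fs                              # one second
f_inst = 265.0 + 4.0 * np.sin(2.0 * np.pi * 3.0 * t)     # assumed pitch track (Hz)
phase = 2.0 * np.pi * (np.cumsum(f_inst) - f_inst[0]) / fs
voice = np.sin(phase) + 0.3 * np.sin(2.0 * phase)        # stand-in for a voice signal

# Warped time axis on which the pitch is constant at f0
f0 = float(np.mean(f_inst))
warped_time = phase / (2.0 * np.pi * f0)

# Resample the voice at uniform steps of warped time
tau = np.arange(0.0, warped_time[-1], 1.0 / fs)
t_of_tau = np.interp(tau, warped_time, t)
flat = np.interp(t_of_tau, t, voice)

# Compare against an ideal constant-pitch version of the same signal
ideal = np.sin(2.0 * np.pi * f0 * tau) + 0.3 * np.sin(4.0 * np.pi * f0 * tau)
residual = float(np.max(np.abs(flat - ideal)))
print(f"max deviation from constant-pitch signal: {residual:.4f}")
```

In practice the pitch track comes from the tracker rather than from a known formula, so re-tracking the demodulated voice is the natural check.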

POWER TRACKING

Analogously to pitch tracking, voice power is tracked.

[Figure, two panels: SMOOTHED ABS ORIG VOICE, 0 to 5 x 10^4, vs. SAMPLE NUMBER, 0 to 500, annotated "PITCH TRACK FEATURES" and "ENVELOPE MINIMA"; POWER TRACK, POWER, 0.3 to 1, vs. SAMPLE #, 0 to 4 x 10^4]

POWER SEGREGATED TO LOW AND HIGH FREQUENCY

[Figure, two panels vs. TIME (SEC), 0 to 1 s: "ORIGINAL POWER (0) AND TREMOR (LINE)", POWER, up to 2 x 10^8; "SHIM% = 100*(POWER - LOWPASS POWER)/POWER", DELTA POWER PERCENT, -20 to 20]

The power track is low/high-pass filtered to yield low frequency power variations and high frequency shimmer.
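The shimmer formula shown in the plot title, SHIM% = 100*(POWER - LOWPASS POWER)/POWER, can be illustrated on a synthetic voice. The fixed 40-sample period, the per-cycle mean-square power measure, the moving-average low-pass filter, and the 2% amplitude variation are all assumptions for this sketch.

```python
import numpy as np

def cycle_power(x, period):
    """Mean-square power of each complete cycle (fixed period assumed)."""
    n_cycles = len(x) // period
    frames = x[:n_cycles * period].reshape(n_cycles, period)
    return np.mean(frames ** 2, axis=1)

def shimmer_percent(power, win):
    """SHIM% = 100 * (POWER - LOWPASS POWER) / POWER."""
    kernel = np.ones(win) / win
    padded = np.pad(power, win // 2, mode="edge")
    lowpass = np.convolve(padded, kernel, mode="valid")[:len(power)]
    return 100.0 * (power - lowpass) / power

rng = np.random.default_rng(3)
period = 40
gains = 1.0 + 0.02 * rng.standard_normal(300)            # assumed amplitude variation
cycle = np.sin(2.0 * np.pi * np.arange(period) / period)
voice = (gains[:, None] * cycle[None, :]).ravel()

power = cycle_power(voice, period)
shim = shimmer_percent(power, win=15)
shim_sd = float(np.std(shim))
print(f"shimmer standard deviation: {shim_sd:.2f} %")
```

Because power goes as amplitude squared, a 2% amplitude variation shows up as roughly a 4% power variation.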


GAUSSIAN POWER VARIATIONS

[Figure: histogram "SHIM% HIST. STAND DEV = 4.823%", # OCCURRENCES vs. DELT PWR (%), -20 to 15]

Shimmer also displays Gaussian variations.


AM DEMODULATION

[Figure: "VARIOUS MEASURES OF SIGNAL STRENGTH" (POWER, MAX AMPL., SUM ENERGY), 0.75 to 1, vs. SAMPLE #, 0 to 4 x 10^4]

The power track may be used to demodulate the original voice to obtain a version with almost no variation in strength; re-tracking verifies constant power.
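A minimal sketch of AM demodulation follows, assuming a fixed, known period and a per-cycle gain correction. A practical version would presumably interpolate the gain between cycles to avoid steps at cycle boundaries; the test signal is synthetic.

```python
import numpy as np

def cycle_power(x, period):
    """Mean-square power of each complete cycle (fixed period assumed)."""
    n_cycles = len(x) // period
    frames = x[:n_cycles * period].reshape(n_cycles, period)
    return np.mean(frames ** 2, axis=1)

def am_demodulate(x, period):
    """Scale every cycle so that its power equals the mean cycle power."""
    power = cycle_power(x, period)
    gain = np.sqrt(np.mean(power) / power)
    n = len(power) * period
    return x[:n] * np.repeat(gain, period)

rng = np.random.default_rng(5)
period = 40
slow = 0.1 * np.sin(2.0 * np.pi * np.arange(300) / 100.0)   # assumed slow power drift
gains = 1.0 + slow + 0.02 * rng.standard_normal(300)        # plus assumed shimmer
cycle = np.sin(2.0 * np.pi * np.arange(period) / period)
voice = (gains[:, None] * cycle[None, :]).ravel()

flat = am_demodulate(voice, period)

# Re-track the power to confirm that it is now constant
before = cycle_power(voice, period)
after = cycle_power(flat, period)
spread_before = float((before.max() - before.min()) / before.mean())
spread_after = float((after.max() - after.min()) / after.mean())
print(f"relative power spread: {spread_before:.3f} -> {spread_after:.1e}")
```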


ASPIRATION NOISE ANALYSIS

Aspiration noise is segregated via spectral techniques. Peaks in the FFT of the log of the FFT (cepstrum) represent periodic energy, and are filtered out with a comb filter (lifter). Results are used to calculate noise-to-signal ratio (NSR).

[Figures: PSD panels show LOG10 MAGNITUDE vs. FREQUENCY (HZ), 0 to 5000 Hz; cepstrum panels are plotted vs. TIME (SEC)]

Figure 2.30a. PSD of original voice.
Figure 2.30b. Cepstrum of original voice.
Figure 2.30c. Cepstrum (expanded scale).
Figure 2.30d. Comb-liftered cepstrum of 14c.
Figure 2.30e. Orig PSD, aspiration PSD, and vocal tract.
Figure 2.30f. Source aspiration PSD with vocal tract removed and fitted to 25-point piecewise-linear model.
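The cepstral steps can be sketched on a synthetic signal. Everything specific here is an assumption: the 10 kHz sampling rate, the 250 Hz harmonic test signal, the noise level, the 2 to 6 ms pitch search range, and the lifter width of ±2 samples. The inverse FFT is used for the second transform; for a symmetric log spectrum it differs from the forward FFT only by a scale factor. The sketch stops at the liftered spectrum and does not reproduce the NSR calculation.

```python
import numpy as np

fs = 10000                                   # assumed sampling rate (Hz)
f0 = 250.0                                   # assumed pitch: period of 40 samples
n = 2000
t = np.arange(n) / fs
rng = np.random.default_rng(4)
periodic = sum(np.sin(2.0 * np.pi * k * f0 * t) / k for k in range(1, 20))
voice = periodic + 0.01 * rng.standard_normal(n)

# Log magnitude spectrum, then cepstrum
log_mag = np.log(np.abs(np.fft.fft(voice * np.hanning(n))) + 1e-12)
cepstrum = np.real(np.fft.ifft(log_mag))

# The strongest cepstral peak in the search range marks the pitch period
lo_q, hi_q = 20, 60                          # 2 ms to 6 ms at 10 kHz
peak_q = lo_q + int(np.argmax(cepstrum[lo_q:hi_q]))

# Comb lifter: zero the cepstrum around every multiple of the pitch period
liftered = cepstrum.copy()
for q in range(peak_q, n // 2 + 1, peak_q):
    for center in (q, n - q):                # keep the cepstrum symmetric
        liftered[center - 2:center + 3] = 0.0

# Back to the log spectrum: harmonic peaks are suppressed
noise_log_mag = np.real(np.fft.fft(liftered))

h_bin = int(round(f0 * n / fs))              # bin of the first harmonic
print(f"pitch period found: {peak_q} samples")
print(f"log magnitude at f0: {log_mag[h_bin]:.2f} -> {noise_log_mag[h_bin]:.2f}")
```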

FM DEMODULATION IMPROVES NSR ACCURACY

Using FM demodulation improves resolution of spectral peaks of periodic components, thus allowing longer FFT windows and more accurate NSR determination.

[Figure: LOG10 MAGNITUDE vs. FREQUENCY (HZ), 0 to 800 Hz]

CHANGES IN NSR AFTER FM AND AM DEMODULATION

FM demodulation reduces NSR measures by up to 20 dB, yielding results closer to perceived levels. AM demodulation has much less effect.

[Figure: NSR (DB), -30 to 0, vs. CASE # (sorted by ascending original NSR); curves: ORIG, TREMOR REMOVED, ALL FM REMOVED]

PERIODIC ANALYSIS

EXTERNAL SOURCE (ES) ANALYSIS

Source-filter ambiguity may be resolved by augmenting the glottal source with a known external stimulus [Epps].

[Setup diagram. PC #1: STIMULUS GEN., STIMULUS SIGNAL, D/A, AMPLIFIER, XDUCER, ACOUSTIC CONDUIT, VOCAL TRACT (/a/). PC #2: DAQ, with MICROPHONE, two SIGNAL CONDITIONER blocks, two A/D channels (CH 1, CH 2), and MEM.]

Tube of length L with closed end: formants at F = C/4L, 3C/4L, 5C/4L, ...
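The tube formula is easy to evaluate. The sketch below assumes a sound speed of 343 m/s and a hypothetical tube length of 0.17 m, chosen only because it is close to a typical adult vocal tract; the presentation does not state the length of the tube that was used.

```python
def closed_tube_formants(length_m, count, c=343.0):
    """Resonances of a tube closed at one end: F = c/4L, 3c/4L, 5c/4L, ..."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]

formants = closed_tube_formants(0.17, 3)     # hypothetical 17 cm tube
for n, f in enumerate(formants, start=1):
    print(f"F{n} = {f:.0f} Hz")
```

The resonances are odd multiples of the first one, so a measured set of peaks at 1:3:5 ratios is a quick sanity check on the setup.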

ES VERIFICATION

A simple plastic tube model verified the ES experimental setup. Resonances occur at expected frequencies.

[Figure: "SPECTRUM WITH RAG, W/O RAG, & MAG OF T.F. (YEL)", MAGNITUDE (dB), -60 to 140, vs. FREQUENCY (Hz), 0 to 2 x 10^4; curves |HA(f)|, |HB(f)|, |V(f)|; TUBE FORMANTS marked]

ES: NORMAL /a/

LP & FFT analysis show consistent results with ES analysis for a normal vowel.

[Figure: "SPECTRA D1: VC LPA (DASH), VC FFT (SOLID) @ 0dB, EXTSRC TF (DOT) @ 20dB", -80 to 30 dB vs. 0 to 5000 Hz; curves LPA, FFT, EXT SRC T.F.’S; formants F1 to F4 marked]

ES: SIMULATED BREATHY /a/

LP & FFT analysis show poor resolution for F3 and F4 for a breathy /a/, while the ES resolution for F3 and F4 remains good.

[Figure: "D1BRTHY: VOICE LPA & FFT (BOT), EXTSRC TF (TOP)", MAGNITUDE (dB), -80 to 30, vs. FREQUENCY (Hz), 0 to 5000 Hz; curves LPA, FFT, EXT SRC T.F.’S; formants F1 to F4 marked]


SYNTHESIS

SYNTHESIS OF PATHOLOGICAL VOWELS

Synthesis is a critical step in the study of pathological vowels. It provides evidence of the success of analysis and modeling steps via immediate comparisons of original and synthetic voice.

Two synthesizers were implemented:

1. A realtime hardware-based synthesizer capable of providing instant response to changes in model parameters.

2. A software synthesizer implemented in MATLAB with extended features, convenient graphical interface, and ease of modification.

CURRENT REALTIME SYNTHESIZER FUNCTIONAL OVERVIEW

[Block diagram. Blocks and labels: SOURCE SELECTION (IMPL, KGLOT88, LF1, LF2, ARB), ARBITRARY SOURCE FILE, SOURCE CONTROLS (JITTER, SHIMMER, DIPLOPHONIA), AMP, ASP NOISE, summing node (+), 20 POLES ... 4 ZEROES, GAH, AGC, D/A, LOWPASS FILT., SPKR, CLK, X86 INTERRUPT, PARAMETER TIME VARIATION, STORE/RECALL CONTROL PARMS, RECORD WAV, OUTPUT SIGNAL FILE]

REALTIME SYNTHESIZER

- Implemented in native X86 assembly language
- Executes all code within a 100 µs cycle
- Overrides the PC OS to achieve determinism
- Employs dedicated clock, I/O, and control hardware implemented in a wire-wrap PCB adapter card

SYNTHESIS VALIDATION

The current model, analysis tools, and synthesizers yield a high level of fidelity in generation of synthetic pathological vowels. The system is currently employed at the UCLA Voicelab for NIH funded perceptual studies.

In order to objectively validate the analysis/synthesis process, the loop is closed by re-analyzing the synthetic time series to confirm parameter values.

Re-analysis also provides opportunity to observe interactions of nonperiodic components.
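The closed-loop check can be illustrated with a toy example: noise is added to a periodic signal at a set NSR, and the NSR is then re-measured from the result. Note the substitution: the measurement below uses simple period averaging rather than the cepstral method described earlier, and it assumes an exactly constant, known period. The waveform and the set levels are made up.

```python
import numpy as np

def add_noise_at_nsr(periodic, nsr_db, rng):
    """Add Gaussian noise so that noise power / signal power equals nsr_db."""
    sig_power = np.mean(periodic ** 2)
    noise_power = sig_power * 10.0 ** (nsr_db / 10.0)
    return periodic + np.sqrt(noise_power) * rng.standard_normal(len(periodic))

def measure_nsr_db(x, period):
    """Estimate NSR by treating the average cycle as the periodic part."""
    frames = x[:len(x) // period * period].reshape(-1, period)
    mean_cycle = frames.mean(axis=0)
    noise = frames - mean_cycle
    return 10.0 * np.log10(np.mean(noise ** 2) / np.mean(mean_cycle ** 2))

rng = np.random.default_rng(6)
period = 40
k = np.arange(period)
cycle = np.sin(2.0 * np.pi * k / period) + 0.3 * np.sin(4.0 * np.pi * k / period)
periodic = np.tile(cycle, 200)                       # 200 identical cycles

results = {}
for set_db in (-25.0, -15.0, -5.0):
    synthetic = add_noise_at_nsr(periodic, set_db, rng)
    results[set_db] = float(measure_nsr_db(synthetic, period))
    print(f"set {set_db:6.1f} dB, measured {results[set_db]:6.2f} dB")
```

Adding jitter or shimmer to `periodic` before the measurement is a simple way to see how other nonperiodic components leak into the noise estimate.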

[Figure: MEASURED NSR IN SYNTHETIC (DB) vs. A.N. SET NSR IN SYNTHESIZER, both -30 to 0; plot label NPB21]

Re-measured synthetic aspiration noise level agrees with level set in synthesizer.

Aspiration noise adds about 0.2% to HFPV measurements.

[Figure: JITTER MEASURED IN SYNTHETIC (%) vs. JITTER LEVEL SET IN SYNTHESIZER (%), both 0 to 2]

HFPV adds about 4 dB to NSR measurements.

[Figure: MEASURED NSR IN SYNTH (dB) vs. A.N. SET NSR IN SYNTHESIZER (dB), both -30 to 0]

Subjective analysis-by-synthesis experiments demonstrate the success of AM and FM demodulation in achieving accurate modeling of nonperiodic features. Listeners adjust synthetic aspiration noise to match the original.

Match improves with demodulation

[Figure: SABS ASPIRATION NOISE (DB) vs. CEPSTRAL NSR (DB), both -30 to 0. ORIGINAL VOICE (NO DEMOD): PEARSON = 0.51]

[Figure: SABS ASPIRATION NOISE (DB) vs. CEPSTRAL NSR (DB), both -30 to 0. TREMOR REMOVED: PEARSON = 0.71]

[Figure: SABS ASPIRATION NOISE (DB) vs. CEPSTRAL NSR (DB), both -30 to 0. ALL AM&FM REMOVED: PEARSON = 0.87]

SUMMARY / CONCLUSION

This study has achieved improved automatic, objective analysis and synthesis of speech within the specialization of pathological vowels. Specific accomplishments include:

- A unique, symmetric model for nonperiodic components as AM, FM and spectrally-shaped aspiration noise
- Improved accuracy of noise analysis via AM & FM demodulation
- Application of ES formant identification for pathological vowels
- Implementation of realtime and offline specialized high fidelity vowel synthesizers
