70
Computer Vision, Speech Communication & Signal Processing Group, Intelligent Robotics and Automation Laboratory National Technical University of Athens, Greece (NTUA) Robot Perception and Interaction Unit, Athena Research and Innovation Center (Athena RIC) Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics Petros Maragos 1 Summer School on Speech Signal Processing (S4P) DA-IICT, Gandhinagar, India, 9-11 Sept. 2018

Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Computer Vision, Speech Communication & Signal Processing Group,Intelligent Robotics and Automation Laboratory

National Technical University of Athens, Greece (NTUA)Robot Perception and Interaction Unit,

Athena Research and Innovation Center (Athena RIC)

Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics

Petros Maragos

1

Summer School on Speech Signal Processing (S4P)DA-IICT, Gandhinagar, India, 9-11 Sept. 2018

Page 2: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

2

Outline

Nonlinear Speech Processing Turbulence: Fractals, Chaotic Dynamics

Multiscale Fractal Dimensions of Speech Sounds Fractal Modulations for Fricative SoundsChaotic Dynamics of Speech SoundsAlgorithms for Speech Fractal & Chaos Analysis Application to Speech RecognitionApplication to Music Recognition

Page 3: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

)(ns

IMPULSETRAIN

GENERATOR

PITCH PERIOD

GLOTTALPULSE

MODELG(z)

X

RANDOMNOISE

GENERATORX

VOCALTRACTMODEL

V(z)

VOCAL TRACT

PARAMETERS

RADIATIONMODEL

R(z)

VA

NA

)(nu

VOICED/UNVOICED SWITCH

(Rabiner & Schafer, 1978)

Linear Source-Filter Model

Page 4: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Nonlinear Fluid Dynamic of the Vocal Tract(Kaiser 1993)

Page 5: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Physics of Speech Airflow• airflow variables: = air density; = pressure

= 3D air particle velocity

• governing equations:mass conservation (continuity eqn):

momentum conservation (Navier-Stokes eqn):

state equation:

• time-varying boundary conditions

u

0ut

2 13

u u u p g u ut

1.4 const.p

p

Page 6: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Speech Aerodynamics• Reynolds number:

• low viscosity μ high Reinertia forces viscous forces

• “aerodynamic” phenomena (Re >>1):air jet, rotational motion, separated airflow,boundary layers, vortices, turbulence

• experimental & theoretical evidence for nonlinear phenomena:

Teager (1970s–1980s), Kaiser (1983 – ), Thomas (1986),

McGowan (1988), Barney, Shadle & Davis (1999), ...

( ) ( )Re velocity scale length scale

Page 7: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Vortices• vorticity:• VORTEX is a flow region of similar• a vortex can be generated by:

– velocity gradients in boundary layers– separated air flow– curved geometry of vocal tract

• dynamics of vortex propagation:

vorticity twisting & stretchingdiffusion of vorticity

u

2u ut

u

2

Page 8: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Nonlinear Speech Processing

• Modulations

• Turbulence– Fractals– Chaos

Page 9: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Turbulence• flow state with broad-spectrum rapidly-varying (in space and

time) velocity and vorticity• transition to turbulence is easier for higher Re flows• eddies: vortices of a characteristic size

• Energy Cascade Theory (Richardson,1922)(multiscale hierarchy of eddies)

• 5/3 spectral law (Kolmogorov, 1941):

wavenumber energy dissipation rate

velocity wavenumber spectrum

2 3 5 3,S k r r k 2 /k

r ,S k r

Page 10: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Turbulence, Fractals and Chaos

• fractal geometry quantifies multiscale structuresin turbulence

• Kolmogorov’s 5/3 law

• we use fractal dimension to quantify “amount” of turbulence in speech

• chaos turbulence

2 3Var u x u x x x

Page 11: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Multiscale Fractal Dimension of Speech Spounds

0 0.5 1 1.5 2 2.5 3 3.5 41

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of

/ F /

0 5 10 15 20 25 30−400

0

400

TIME (millisec)

SP

EE

CH

SIG

NA

L: /

F /

0 5 10 15 20 25 30−800

0

800

TIME (millisec)

SP

EE

CH

SIG

NA

L: /

V /

0 0.5 1 1.5 2 2.5 3 3.5 41

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of

/ V /

0 0.5 1 1.5 2 2.5 3 3.5 41

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of

/ IY

/

0 5 10 15 20 25 30−3000

0

3000

TIME (millisec)

SP

EE

CH

SIG

NA

L: /

IY /

/f/ /iy//v/

[ P. Maragos & A. Potamianos, JASA 1999 ]

Page 12: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

−1

−0.5

0

0.5

1

−1

−0.5

0

0.5

1−1

−0.5

0

0.5

1

/ao/,DE=6, #1846

−1

−0.5

0

0.5

1

−1

−0.5

0

0.5

1−1

−0.5

0

0.5

1

/iy/,DE=5, #1068

Speech Attractors0 500 1000 1500

−1

−0.5

0

0.5

1

Time

X(t

)/ao/

0 200 400 600 800 1000−1

−0.5

0

0.5

1

Time

X(t

)

/iy/

−1.5−1

−0.50

0.51

−1.5

−1

−0.5

0

0.5

1−1.5

−1

−0.5

0

0.5

1

/k/,DE=6, #816

0 200 400 600 800−1

−0.5

0

0.5

1

Time

X(t

)

/k/

−1

−0.5

0

0.5

1

−1

−0.5

0

0.5

1−1

−0.5

0

0.5

1

/s/,DE=5, #829

0 200 400 600 800−1

−0.5

0

0.5

1

Time

X(t

)

/s/

[ Pitsikalis & Maragos, Speech Commun 2009 ]

Page 13: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Multiscale Fractal Dimensionsfor

Speech Sounds

Refs: • P. Maragos and A. Potamianos, “Fractal Dimensions of Speech Sounds: Computation and

Application to Automatic Speech Recognition”, Journal of Acoustical Society of America, March 1999.

• P. Maragos, “Fractal Signal Analysis Using Mathematical Morphology”, in Advances in Electronics and Electron Physics, vol.88, Academic Press, 1994.

Page 14: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

14

FRACTALS: Definitions• Mandelbrot’s definition

set is fractal Hausdorff dim topological dim

• Examples

• SignalsA function is a fractal if its graph

is a fractal set in is continuous

S ( ) HD S ( )TD S

1 2 T HD D 0 1 T HD D

2 3 T HD D

S =S =

S =fractal curvefractal surface

fractal dust

: vf ( )Gr f 1v

f [ ( )] 1T Hv D D Gr f v

Page 15: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

15

‘FRACTAL’ DIMENSIONS (OF SETS IN Rν)

Hausdorff dimension

Minkowski-Bouligand dimension

box counting dimension

similarity dimension

0 T H MB BCD D D D v£ £ £ = £

H SD D£

HD =

MBD =

BCD =

SD =

Page 16: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Morphological Measurement of Fractal Dimension

• Minkowski cover of curve

• Fractal (Minkowski-Bouligand) dimension

• Least-Squares line fit to data

: ( )Bz G

G rB z C r

1,2D

1;2B D

B B

A rA r area C r length of G r r

r

2log , log 1BA r r r D

Page 17: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Morphological (Flat & Weighted) FiltersDilation (Max-plus convolution):

Erosion (Min-plus correlation):

( )( ) max ( ) ( )yf g x f y g x y

( )( ) min ( ) ( )yf g x f y g y x

( )f g f g g

( )f g f g g

0 100 200 300

0

50

100

Sample Index

ORIGINAL SIGNAL

−10 0 100

10

Sample Index

PARABOLA PULSE

0 100 200 300−10

0

50

100

110DILATION BY FLAT & PARABOLIC SE

0 100 200 300−10

0

50

100

110EROSION BY FLAT & PARABOLIC SE

0 100 200 300−10

0

50

100

110OPENING BY FLAT & PARABOLIC SE

0 100 200 300−10

0

50

100

110CLOSING BY FLAT & PARABOLIC SE

Opening:

Closing:

Page 18: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

18

Minkowski Fractal Dimension of 1D Curveand Morphological Algorithm for 1D Signals

Page 19: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

11.21.41.61.8

2

TIME (in sec)

ZERO−CROSSINGSMS−AMPLITUDE

FRACTAL DIMENSION

/ SOOTHING /

SPEECH SIGNAL

ST Speech & Fractal Dimension

Page 20: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Multiscale Speech Fractal Dimension

• short-time speech signal

• signal graph

• fractal constant power law

• variable power law

• multiscale fractal “dimension”

(speech fractogram):

of

short-time speech segment

around time

,tS Tt 0

2 , : 0G t S t R t T

2 , as 0Darea G B C

DtMFD ,

2 Darea G B C

t

Page 21: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Multiscale Fractal Dimension of Speech Spounds

0 0.5 1 1.5 2 2.5 3 3.5 41

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of

/ F /

0 5 10 15 20 25 30−400

0

400

TIME (millisec)

SP

EE

CH

SIG

NA

L: /

F /

0 5 10 15 20 25 30−800

0

800

TIME (millisec)

SP

EE

CH

SIG

NA

L: /

V /

0 0.5 1 1.5 2 2.5 3 3.5 41

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of

/ V /

0 0.5 1 1.5 2 2.5 3 3.5 41

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of

/ IY

/

0 5 10 15 20 25 30−3000

0

3000

TIME (millisec)

SP

EE

CH

SIG

NA

L: /

IY /

/f/ /iy//v/

[ P. Maragos & A. Potamianos, JASA 1999 ]

Page 22: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

0 1 2 3 41

1.2

1.4

1.6

1.8

2

/AA/

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of /

AA

/

0 1 2 3 41

1.2

1.4

1.6

1.8

2

/B/

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of /

B/

0 1 2 3 41

1.2

1.4

1.6

1.8

2

/F/

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of /

F/

0 1 2 3 41

1.2

1.4

1.6

1.8

2

/R/

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of /

R/

0 1 2 3 41

1.2

1.4

1.6

1.8

2

/EN/

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of /

EN

/

0 1 2 3 41

1.2

1.4

1.6

1.8

2

/M/

SCALE (millisec)

FR

AC

TA

L D

IME

NS

ION

of /

M/

Mean and standard deviation (error bars) of the multiscale fractal dimension for the phonemes /aa/, /b/, /en/, /f/, /m/, /r/ from the TIMIT database (20 ms window, updated every 10 ms. Average over 200 phonemic instances.)

Page 23: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Mean MFD for /sh/, /zh/, /uh/, /t/, /d/

Page 24: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

81.2% 83.5% 84.5%

11,

,,,DD

CECE

, , ,E C E C

1 16 1 16

, , ,

,

E C E C

D D

Features

Models5-mixture Gaussians 85.6% 86.3%10-mixture Gaussians 88.6% 88.9%

},,,,,{

CECECE

},{},

,,,,{

DDCE

CECE

Word Percent Correct For the E-set Recognition Task (ISOLET Database, 5-Mixture Gaussians per HMM State)

Word Percent Correct for the E-set Recognition Task

Maragos & Potamianos, JASA 1999

Page 25: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Fractal Modulationsfor

Fricative Sounds

Ref: • A. G. Dimakis and P. Maragos, “Phase Modulated Resonances Modeled as Self-

Similar Processes With Application to Turbulent Sounds”, IEEE Transactions on Signal Processing, Nov. 2005.

Page 26: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

• An important class of statistically self-similar randomprocesses defined by their measured power spectra:

A truly enormous collection of natural phenomena exhibit 1/f-type spectral behavior over a wide frequency range:(frequency variations in quartz crystal oscillators, geophysical variations,heart rate variations, electronic device noises, network traffic flow andeconomic time series.)

• Most popular mathematical model for Gaussian 1/fprocesses: Fractional Brownian Motion (FBM)

1/f Noises

2

( )| |

S

Page 27: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

27

Noises• Stochastic processes with power spectrum • Filtering white noise with convolution kernel

(Fractional Integration)• Non – exponential autocorrelation• White noise• Pink noise• Fractal Brownian Motion• Brown noise• Black noise• Applications: electronics, geophysics, astronomy, music,

acoustics, optics, economics, traffic flows, communications, geometry of nature

1 f

1 f 2 1t

1t 0

1

1 3

2 2

Page 28: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Examples of FFT-based Synthesis of 1D FBM

Page 29: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

29

FBM Synthesis of Fractal Landscapes

D = 2.15

D = 2.5

D = 2.8

R. Voss, 1988

Page 30: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

1/f Speech Modulation Model

• Model a resonance of a random speech phoneme as a phase-modulated 1/f signal:

• Nonlinear phase signal P(t) modeled as 1/f randomprocess.

• Useful model for broad resonances often observed infricative voiced or unvoiced sounds and probably caused bynonlinear phenomena during speech production.

( )

( ) cos ( )c

t

S t A t P t

Page 31: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

• Isolate resonance: Bandpass filter the speech signal.• Demodulate filtered signal using ESA, obtain instant

frequency F(t), and median filter to reduce spikes.• Estimate phase modulation signal P(t) by integrating IF:

• Fit 1/fγ model to P(t). Methods tested include: Linear regression on Periodogram Estimation using variance of wavelet coefficients Maximum Likelihood estimation

Parameter Estimation in 1/f-PM

0( ) 2 ( ( ) )

tP t F F d

Page 32: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

1 2 3 4 5 6 7-20

-15

-10

-5

0

5

10

Scale m101

102

103

104

-300

-250

-200

-150

-100

-50

0

Frequency (Hz)

=2.99

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.052800

3000

3200

3400

3600

3800

4000

4200

4400

4600

4800

Time (sec)0 1000 2000 3000 4000 5000 6000 7000 8000

-180

-160

-140

-120

-100

-80

-60

Frequency (Hz)

/S/ phoneme experiment

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

Time (sec)

Speech signal

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05-2

-1

0

1

2

3

4

5

6

7

8

Time (sec)

Power Spectrum

Phase modulation P(t)

Instant Frequency

PSD of P(t) Var. of wavelet coefficients

Page 33: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

101

102

103

104

-350

-300

-250

-200

-150

-100

-50

0

50

Frequency (Hz)

=3.63

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

Time (sec)

/Z/ phoneme experimentSpeech signal

0 1000 2000 3000 4000 5000 6000 7000 8000

-180

-160

-140

-120

-100

-80

Frequency (Hz)

Power Spectrum

0 0.01 0.02 0.03 0.04 0.05 0.06 0.072400

2600

2800

3000

3200

3400

3600

3800

4000

4200

4400

Time (sec)

Instant Frequency

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07-20

-15

-10

-5

0

5

10

Time (sec)

1 2 3 4 5 6 7-20

-15

-10

-5

0

5

10

Scale m

Phase modulation P(t) PSD of P(t) Var. of wavelet coefficients

Page 34: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Chaotic Dynamics of Speech Sounds

Refs: • V. Pitsikalis and P. Maragos, “Filtered Dynamics and Fractal Dimensions for

Noisy Speech Recognition”, IEEE Signal Processing Letters, Nov. 2006.• V. Pitsikalis and P. Maragos, “Analysis and Classification of Speech Signals by

Generalized Fractal Dimension Features”, Speech Communication, Dec. 2009.• I. Kokkinos and P. Maragos, “Nonlinear Speech Analysis Using Models for

Chaotic Systems”, IEEE Transactions Speech and Audio Processing, Nov. 2005.

Page 35: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Embedding-Attractor Reconstruction

•Parameters to specify: , ET D

h G 1x t

1 2x t T

1x t T

0 200 400 600 800 1000 1200 1400 1600 1800 2000−10

−8

−6

−4

−2

0

2

4

6

8

10X−projection (Lorenz)

Time

X(t

) 1Y t 1x t 1x t T

1 1Ex t D T

−10−8−6−4−20246810

−10

−8

−6

−4

−2

0

2

4

6

8

10

−10

0

10

Reconstructed Lorenz Attractor,20000 iterations

X

Y

Z

−10 −8 −6 −4 −2 0 2 4 6 8 10−20

−10

0

10

20

0

5

10

15

20

25

30

Lorenz Attractor;σ=5 R=15 B=1;Δt=0.25; 20000 iterations

X

Y

Z

Page 36: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

• Nonlinear Dynamic System (Lorenz)

• Attractor

• 1D Projection

dxx y

dtdy

R x y x zdtdz

B z x ydt

−10 −8 −6 −4 −2 0 2 4 6 8 10−20

−10

0

10

20

0

5

10

15

20

25

30

Lorenz Attractor;σ=5 R=15 B=1;Δt=0.25; 20000 iterations

X

Z

Y

0 500 1000 1500 2000 2500 3000 3500 4000−10

−8

−6

−4

−2

0

2

4

6

8

10X−projection (Lorenz)

Time

X(t

)

Page 37: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Time Delay

• Average Mutual Information between

• “Optimum” Time Delay:

( ), ( )x t x t T

Pr ( ),Pr , log

Pr Prx t x t T

I T x t x t Tx t x t T

min arg min ( )optT

T I T

0 50 100 1500

1

2

3

4

5Lorenz System,σ=5 R=15 B=1,Δt=0.25,#10000

Time Delay T

Ave

rage

Mut

ual I

nfor

mat

ion

Page 38: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Embedding Dimension• Sufficient: • False Neighbors: from projection• True Neighbors: from dynamics• False Neighbors Criterion

• When % false neighbors =0,

Attractor is unfolded

2E AttractorD D

1 1,

( ) ( ) ( ) ( )( ) ( )

d d d di j

d d

y i y j y i y jR Threshold

y i y j

1 2 3 4 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4Lorenz System,σ=5 R=15 B=1,Δt=0.25,#2000

Embedding Dimension DE

% F

alse

Nei

ghbo

rs

Page 39: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

−1

−0.5

0

0.5

1

−1

−0.5

0

0.5

1−1

−0.5

0

0.5

1

/ao/,DE=6, #1846

−1

−0.5

0

0.5

1

−1

−0.5

0

0.5

1−1

−0.5

0

0.5

1

/iy/,DE=5, #1068

Speech Attractors0 500 1000 1500

−1

−0.5

0

0.5

1

Time

X(t

)/ao/

0 200 400 600 800 1000−1

−0.5

0

0.5

1

Time

X(t

)

/iy/

−1.5−1

−0.50

0.51

−1.5

−1

−0.5

0

0.5

1−1.5

−1

−0.5

0

0.5

1

/k/,DE=6, #816

0 200 400 600 800−1

−0.5

0

0.5

1

Time

X(t

)

/k/

−1

−0.5

0

0.5

1

−1

−0.5

0

0.5

1−1

−0.5

0

0.5

1

/s/,DE=5, #829

0 200 400 600 800−1

−0.5

0

0.5

1

Time

X(t

)

/s/

[ Pitsikalis & Maragos, Speech Commun 2009 ]

Page 40: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Correlation Dimension (Speech)

Correlation Dimension:

Correlation integral:

N: # of points, r: scale, x: set points,Η: Ηeavyside function

1

1,1

N

i ji j i

C N r H r x xN N

0

log ,lim lim

logC r N

C N rD

r

Page 41: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Correlation Dimension (Lorenz)

1

1,1

N

i ji j i

C N r H r x xN N

0

log ,lim lim

logC r N

C N rD

r

10−1

100

101

104

105

106

Lorenz System,σ=5 R=15 B=1,Δt=0.25,#4000

Cor

rela

tion

Inte

gral

Scale0.4 0.6 0.8 1 1.2 1.4 1.6 1.8

1.8

1.85

1.9

1.95

2

2.05

2.1

2.15

2.2

Scale

Loca

l Slo

pe

Lorenz System,σ=5 R=15 B=1,Δt=0.25,#4000

Page 42: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

10−3

10−2

10−1

100

101

100

101

102

103

104

105

106

107

averaging over 8 phonemes of type /ao/

scale

corr

elat

ion

inte

gral

plain averaging weighted averaging 10

−210

−110

010

110

−1

100

101

102

103

104

105

106

107

averaging over 15 phonemes of type /iy/

scale

corr

elat

ion

inte

gral

plain averaging weighted averaging

10−2

10−1

100

101

100

101

102

103

104

105

106

107

averaging over 13 phonemes of type /s/

scale

corr

elat

ion

inte

gral

plain averaging weighted averaging

10−3

10−2

10−1

100

101

100

101

102

103

104

105

106

averaging over 11 phonemes of type /k/

scale

corr

elat

ion

inte

gral

plain averaging weighted averaging

Correlation Integrals of Speech Sounds

/ao//iy/

/k//s/

Page 43: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Fractal Features

800 1000 1200 1400−1

−0.5

0

0.5

1

(ms)

N-d

CleanedEmbedding N-d

SignalLocal SVDspeech

signalFiltered Dynamics -Correlation Dimension (8)

Noisy Embedding Filtered Embedding

FDCD

Multiscale Fractal Dimension (6)MFDGeometrical

Filtering

−0.200.20.40.6−0.2 0 0.2 0.4

−0.2

0

0.2

0.4

−0.200.20.40.6−0.2 0 0.2 0.4

−0.2

0

0.2

0.4

0.1 0.2 0.30

0.05

0.1

0.15

Median Neighborhood Distance

Den

sity

FilteredNoisy

Neighborhood Distance Reduction

Projection

50 100 150 200 250 300 350 400

−600

−400

−200

0

200

400

600 Mproj−noisyMproj−cleanMproj−filt

Enhanced Speech

[ Pitsikalis & Maragos, IEEE SPL 2006 ]

Page 44: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Noisy Speech Database: Aurora 2

Task: Speaker Independent Recognition of Digit Sequences TI - Digits at 8kHz Training (8440 Utterances per scenario, 55M/55F) Clean (8kHz, G712) Multi-Condition (8kHz, G712) 4 Noises (artificial): subway, babble, car, exhibition 5 SNRs : 5, 10, 15, 20dB , clean

Testing, artificially added noise 7 SNRs: [-5, 0, 5, 10, 15, 20dB , clean] A: noises as in multi-cond train., G712 (28028 Utters) B: restaurant, street, airport, train station, G712 (28028 Utters) C: subway, street (MIRS) (14014 Utters)

Page 45: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Average Recognition Results on Aurora 2: plain CD vs FDCD

0

20

40

60

80

100

Wor

d A

ccur

acy

(%)

20dB 15dB 10dB 5dB 0dB Aver

SNR

+Plain CD +FDCD

Plain CD: Correlation Dimension without Dynamical Filtering

Page 46: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Average Recognition Results on Aurora 2: FDCD

0102030405060708090

100W

ord

Acc

urac

y (%

)

Clean 20dB 15dB 10dB 5dB 0dB Aver

SNR

Baseline +FDCD

Up to +40%

Page 47: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Average Recognition Results on Aurora 2: MFD

0102030405060708090

100W

ord

Accu

racy

clean 20 dB 15 dB 10 dB 5 dB 0 dB Ave.

SNRBaseline MFD

Up to +27%

Page 48: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Average Recognition Results on Aurora 2

2

12

22

32

42

52

62

72

Accuracy

Clean 20 15 10 5 0 average

SNR

Plain Fractal Features (Aurora 2)

CDFDCDMFD

Page 49: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Average Recognition Results on Aurora 2:Hybrid Features: Fractals and Modulations

3040

5060

7080

9010

0

Acc

urac

y

20 dB 10 dB 5 dB

SNR

Baseline +FMP +FDCD +FMP+FDCD

Up to +61%

[ Pitsikalis & Maragos, IEEE SPL 2006; Speech Commun 2009 ]

Page 50: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Lyapunov Exponents (L.E.s)

k-th Lyapunov Number:k-th Lyapunov Exponent:

1/

lim ( )nn

k n kL rln( )k kL

1 2 1 ek k D

Page 51: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Lyapunov Exponents (II)

Quantify signal predictability (orbits convergence-divergence rates in phase space)

Positive L.E. exponential divergenceNegative L.E. exponential convergence

Dissipative system sum of L.Es <0Chaotic system at least one L.E >0

Invariants of system dynamics useful for characterization /recognition purposes

Determine prediction horizon(upper bound of system predictability)

Page 52: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Prediction on Reconstructed Attractor(Kokkinos & Maragos, T-SAP 2005)

Goal: capture dynamics of MIMO systemfrom input-output pairs

Models tested: Local Polynomials Global Polynomials Radial Basis Function networks Takagi-Sugeno-Kang models Support Vector Machines

1 ( )n nX F X

1 ( )n nX X f

Page 53: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Computation of Lyapunov Exponents Consider an orbit Oseledec matrix:

i-th L.E. , is i-th eigenvalue of OSL Limitations:• Only approximation of Jacobian J of f is available

( F is an approximation to f )• Ill-conditioned nature of OSL

recursive QR decomposition technique• Limited data set local L.E.s

1 1lim ( ) ( ) ( ) ( )T T TN F N F F F NX X X X OSL J J J J

1 ( ), 1, 2, ,n nX X n N f

log( )i is is

Page 54: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Validation of Lyapunov Exponents

Inverse time sequencing of data • True exponents flip sign (divergence of nearby orbits

becomes convergence & vice versa)• False exponents remain negative

(artifact of embedding processno dependence on system dynamics)

Models that learn the data (and not the system dynamics) fail to give such results.

RBF nets, TSK-0, Global Polynomials ... failed SVM, TSK-1 succeeded

Page 55: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Applications to Speech Signals(Kokkinos & Maragos 2005)

Prediction – coding with global polynomials(smaller MSE than LPC with same # of params )

Speech analysis using Lyapunov exponents• Vowels have small positive L.E.s• Voiced fricatives have bigger positive L.E.s• Unvoiced fricatives have no validated L.E.s

(too noisy)• Stop sounds have no validated L.E.s

(non-stationary)• Non-validated L.E.s are still useful

Page 56: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Speech Results: validated L.E.s Phoneme: /aa/

Phoneme: /v/ 0 200 400 600 800 1000 1200

−1

−0.5

0

0.5

1

# of Jacobians used

Lyapunov Exponents of /aa/

Direct L.E.−Inverse L.E.

−0.50

0.51

−0.50

0.51

−0.5

0

0.5

1

X1

Reconstructed Attractor of /aa/

X2

X3

200 400 600 800 1000 1200−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1Original Signal: /aa/

Sample index

1D

Sig

na

l

200 400 600 800 1000

−0.5

0

0.5

1Original Signal: /v/

Sample index

1D

Sig

na

l

−0.50

0.51

−0.50

0.51

−0.5

0

0.5

1

X1

Reconstructed Attractor of /v/

X2

X3

200 400 600 800

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

# of Jacobians used

Lyapunov Exponents of /v/

Direct L.E.−Inverse L.E.

Page 57: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Speech Results: Non-validated L.E.s Phoneme: /sh/

Phoneme: /t/500 1000 1500

−1

−0.5

0

0.5

1Original Signal: /sh/

Sample index

1D S

igna

l

−1

0

1

−1

0

1−1

−0.5

0

0.5

1

X1

Reconstructed Attractor of /sh/

X2

X3

−0.50

0.51

−0.50

0.51

−0.5

0

0.5

1

X1

Reconstructed Attractor of /t/

X2

X3

200 400 600 800 1000

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1Original Signal: /t/

Sample index

1D S

igna

l

0 500 1000 1500

−2

−1

0

1

2

# of Jacobians used

Lyapunov Exponents of /sh/

Direct L.E.−Inverse L.E.

0 200 400 600 800 1000

−1.5

−1

−0.5

0

0.5

1

1.5

2

# of Jacobians used

Lyapunov Exponents of /t/

Direct L.E.−Inverse L.E.

Page 58: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Speech Lyapunov Exponents

[ Kokkinos & Maragos, IEEE T-SAP 2005 ]

Page 59: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Speech Sound ClassificationUsing only L.E.s (PCA projection of 3 first L.E.s):

x :Unvoiced Fric., o :Unvoiced Stop

x :Unvoiced Fric., o :Voiced Fric.

x :Vowel, o :Voiced Stop

x :Vowel, o :Unvoiced Fric.

> :Unvoiced Stop, o :Voiced Stop

x :Vowel, o :Unvoiced Stop

When combined with MFCC: (4 classes) ~12% smaller error using K-NN classifier

Page 60: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Other Works on Speech Fractals or Chaotic Dynamics

C. A. Pickover and A. Khorasani, ‘‘Fractal Characterization of Speech Waveform Graphs,’’ Computer Graphics 1986.

P. J. B. Jackson and C. H. Shadle, “Frication noise modulated by voicing, as revealed by pitch-scaled decomposition”, J. Acoust. Soc. Amer. 2000.

S. McLaughlin and P. Maragos, “Nonlinear Methods for Speech Analysis and Synthesis”, in Advances in Nonlinear Signal and Image Processing, edited by S. Marshall and G. L. Sicuranza, EURASIP Book Series on Signal Processing and Communications, Hindawi Publ. Corp., 2006, pp.103-140.

M. Zaki, J. N. Shah and H. A. Patil, “Effectiveness of Multiscale Fractal Dimension-based Phonetic Segmentation in Speech Synthesis for Low Resource Language”, in Proc. Int’l Conf. on Asian Language Processing (IALP) 2014.

K. López-de-Ipina, J. Solé-Casals, H. Eguiraun, J.B. Alonso, C.M. Travieso, A.Ezeiza, N Barroso, M. Ecay-Torres, P. Martinez-Lage, Blanca Beitia, “Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: A fractal dimension approach”, Computer Speech & Language 2015.

E. Tzinis, G. Paraskevopoulos, C. Baziotis, A. Potamianos, “Integrating Recurrence Dynamics for Speech Emotion Recognition”, in Proc. Interspeech 2018.

Page 61: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

61

Fractals and Music

Ref: • A. Zlatintsi and P. Maragos, “Multiscale Fractal Analysis of Musical Instrument Signals

with Application to Recognition”, IEEE Transactions on Audio, Speech and Language Processing, Apr. 2013.

Page 62: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

Multiscale Fractal Dimension of Music Sounds

0 1 2 3 4 53

4

5

6

7

8

LOG SCALE

LOG

AR

EA

BassBassoonBb ClarinetCelloFluteHornTuba

[ Zlatintsi & Maragos, T‐ASLP 2013 ]

Page 63: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

63

Morphological Covering MethodD 2 lim log[AB(s) / s2 ]

log(1 / s)

Double Bass steady state (solid line), its multiscale flat dilations and erosions at scales s=25,75, where B is a 3-sample symmetric horizontal segment with zero height.

P. Maragos and A. Potamianos. J. Acoust. Soc. Amer., 1999.

0 1 2 3 4 53

4

5

6

7

8

LOG SCALELO

G A

REA

BassBassoonBb ClarinetCelloFluteHornTuba

Multiscale Fractal Analysis of Musical Instrument Signals

log[AB(s)] vs log(s) for the seven analyzed instruments for the note C3 except for Bb Clarinet and Flute shown for C5 instead. Notethe difference in the slope for larger scales . (for 30ms analysis window).

Page 64: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

64

MFD Analysis for Steady State of the Note

Upright Bass Clarinet Cello

Flute Clarinet Oboe Piano

Mean MFD (middle line) and standard deviation (error bars) (for 30 ms analysis window, updated every 15 ms).

Page 65: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

65

MFD Analysis for Steady State of the Note

D = 1.42

D = 1.35

D = 1.65

D = 1.46

D = 1.37

D = 1.89

Horn Tuba Bassoon

Bb Clarinet Flute Bb Clarinet Flute

Page 66: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

66

MFD Analysis on Synthesized Signals

Strong dependence on the frequency

f=5Hz f=50Hz f=100Hz

f=200Hz

f=300Hz f=400Hz f=500Hz f=600Hz

Page 67: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

67

Analysis of MFD during the Attack

MFDs estimated for the 7 analyzed instruments attacks, averaged over the whole range (using 30 msanalysis windows).

Similar as for the steady state Higher D for small scales st and

more fragmentation.

Increased value of D(s = 1).

Clear distinction of D among some of the analyzed instruments.

Mean MFD and standard deviation of the attack and steady state of A3 for Cello (left images) and F4 for Flute (right images).

Page 68: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

68

MFD Variability of the Steady State for the Same Instrument over One Octave

Dependence on the acoustical frequency and the  MFD profile that increases rapidly for higher frequency sounds.

The instruments’ specific MFDs beholds the shape observed for the specific octave.

MFD of Bb Clarinet steady state notes, over one octave for one 30 ms analysis window.

Page 69: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

69

Experimental Evaluation

Double Bass, Bassoon & Tuba best recognized  Low discriminability between Bb Clarinet & Flute

Enhanced discriminability for Bassoon, Bb Clarinet and Horn Decreased for  Cello & Flute On average MFD+MFCC features improve the recognition over the baseline

Mean AccuracyΗΜΜ Ν=5 Μ=3

Example of the 13 logarithmically sampled points of the MFD, for Bb Clarinet (A3), forming the MFDLG feature vector.

Page 70: Nonlinear Aspects of Speech Production: Fractals and Chaotic …cvsp.cs.ntua.gr/interspeech2018/slides/PM_Nonlinear... · 2018-09-27 · Nonlinear Aspects of Speech Production: Fractals

70

Conclusions

Existence-Importance of nonlinear speech structure of turbulence type (fractals, chaotic dynamics)

Speech technology systems can benefit from including such nonlinear features

Find/extract robust nonlinear features of turbulence type Improve computational algorithms Fuse nonlinear with linear features Applications also to other sound signals, e.g. music

For more information, demos, and current results: http://cvsp.cs.ntua.gr and http://robotics.ntua.gr