Modeling of Mel Frequency Features for Non Stationary Noise

I. Andrianakis, P. R. White

Signal Processing and Control Group, Institute of Sound and Vibration Research, University of Southampton


Outline

Introduction.

Mel Frequency Log Spectrum and Cepstrum.

Distribution of the MFLS and MFC coefficients.

Physical Interpretation of the distributions.

Modeling of data with Gaussian Mixture Models and the EM algorithm.

Results.

Summary & Further work.

Introduction

When working with speech or noise, one often wishes to extract some salient features of the signals so that, instead of working with the whole data set, one can concentrate on a smaller set that conveys the most significant information.

Such features are the Mel Frequency Log Spectral and Cepstral Coefficients.

Their favourable property is that they focus mostly on low frequency components, where most of the car or train noise energy exists, while compacting the (usually lower-energy) higher frequencies.

We shall present some results from our research on the application of MFLSCs and MFCCs to noise signals and their modelling with Gaussian Mixture Models.

Mel Frequency Log Spectrum and Cepstrum

Noise → STFT → |.|^2 → Mel Frequency Filter Banks → Log( . ) → Mel Frequency Log Spectrum → DCT( . ) → Mel Frequency Cepstrum
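The processing chain on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `mel_filterbank` and `mel_features`, the analytic mel formula (2595 log10(1 + f/700)), the triangular filter shape, the 256-sample Hann window, the 8 kHz sampling rate and the unnormalised DCT-II are all choices made here and are not given in the slides.

```python
import numpy as np

def hz_to_mel(f_hz):
    # A common analytic approximation of the mel scale (assumed, not from the slides).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_features(x, fs, n_filters=20, n_fft=256, hop=128):
    """Return (Mel Log Spectrum, Mel Cepstrum), one row per analysis frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2            # STFT followed by |.|^2
    mel_energy = power @ mel_filterbank(n_filters, n_fft, fs).T  # mel filter banks
    mfls = np.log(mel_energy + 1e-12)                            # Log(.); small floor avoids log(0)
    k = np.arange(n_filters)
    dct_matrix = np.cos(np.pi * np.outer(k, k + 0.5) / n_filters)  # unnormalised DCT-II
    mfcc = mfls @ dct_matrix.T                                   # DCT(.)
    return mfls, mfcc

# Usage on synthetic white noise (a stand-in for a real recording).
rng = np.random.default_rng(0)
noise = rng.normal(size=8000)
mfls, mfcc = mel_features(noise, fs=8000)
print(mfls.shape, mfcc.shape)
```

With this unnormalised DCT, the first cepstral coefficient is simply the sum of the log filter bank outputs, so it tracks the overall level of the frame.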

Rationale Behind the Use of Mel Frequency Features

Mel frequency warping focuses on low frequencies (< 1 kHz), where the filter bank spacing is linear.

Energy above 1 kHz is compacted, as the filters have logarithmically increasing pass bands.

Suitable for representing ambient noise (e.g. in cars and trains) because the energy is concentrated in the lower frequencies.

Rationale Behind the Use of Mel Frequency Features (II)

Filter banks are closely spaced where the signal’s energy is higher.
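To make the spacing concrete, the sketch below lists where the centres of 20 mel-spaced filters would fall for a signal sampled at 8 kHz. The analytic mel formula, the helper name `filter_centres_hz`, the filter count and the sampling rate are assumptions chosen for illustration; this formula is only approximately linear below 1 kHz.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common analytic approximation of the mel scale (an assumption here).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def filter_centres_hz(n_filters, fs):
    """Centre frequencies of n_filters bands equally spaced in mel up to fs/2."""
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    return mel_to_hz(mel_points[1:-1])

centres = filter_centres_hz(n_filters=20, fs=8000)
spacing = np.diff(centres)                      # gap between neighbouring centres
below_1k = int(np.sum(centres < 1000.0))

print("filters centred below 1 kHz:", below_1k, "of", centres.size)
print("smallest / largest spacing [Hz]:", spacing.min().round(1), spacing.max().round(1))
```

Under these assumptions a disproportionate share of the filters sits in the lowest quarter of the band, and the gap between neighbouring centres grows steadily with frequency.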

Comparison With LPC

[Figure: PSD, 13 LPC spectrum and 20 Mel spectrum for train and car noise; x-axis: Frequency [Hz].]

Distribution of the Mel Frequency Coefficients

We are concerned with the form of the probability distribution of the Mel Frequency features, that is, the Mel Log Spectrum and the Mel Cepstrum.

In the following, we shall present the distribution of MF Log Spectrum Coefficients and MF Cepstral Coefficients for various types of signals.

We shall also try to give a physical explanation for the form of the distribution for each case.

‘Stationary’ Noise

[Figure: waveform of the car noise segment (Time [s], 0 to 120) and its spectrogram (Time vs. Frequency, 0 to 4000).]

This is a segment of car noise and its respective spectrogram.

The signal looks fairly stationary in its mean and variance, while the spectrogram shows that its frequency components do not vary with time either.

We shall proceed now to examine the distribution of its Mel Frequency Features.

Mel Log Spectrum

[Figure: Mel Log Spectrum over time (Time [s] vs. coefficient index, 1 to 20); kurtosis of the coefficients; histograms of coefficients 1, 5, 16 and 20.]

Below we can see the evolution with time of the previous signal’s Mel Log Spectrum, the kurtosis of its coefficients and some characteristic distributions.

The coefficients follow an almost Gaussian distribution.
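Kurtosis is the statistic used on these slides to judge how close each coefficient is to Gaussian. A minimal sketch of a per-coefficient computation follows; it assumes the excess-kurtosis convention (0 for a Gaussian), which the slides do not state explicitly, and the function name `excess_kurtosis` and the synthetic test data are illustrative only.

```python
import numpy as np

def excess_kurtosis(features):
    """Excess kurtosis of each column of a (frames x coefficients) array.

    Uses the convention m4 / m2**2 - 3, so a Gaussian gives roughly 0 and
    long-tailed distributions give positive values.
    """
    x = np.asarray(features, dtype=float)
    centred = x - x.mean(axis=0)
    m2 = np.mean(centred ** 2, axis=0)   # second central moment
    m4 = np.mean(centred ** 4, axis=0)   # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Synthetic stand-ins for two coefficient tracks.
rng = np.random.default_rng(0)
demo = np.column_stack([
    rng.normal(size=50_000),    # Gaussian column
    rng.laplace(size=50_000),   # long-tailed column
])
k = excess_kurtosis(demo)
print(k)
```

Applied to a matrix of Mel Log Spectrum or Mel Cepstrum frames, this returns one value per coefficient, which is the kind of curve shown in the kurtosis panels.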

Mel Cepstrum

[Figure: Mel Cepstrum over time (Time [s] vs. coefficient index, 1 to 20); kurtosis of the coefficients; histograms of coefficients 1, 2, 12 and 15.]

This is the evolution with time of the Mel Cepstrum, the kurtosis of its coefficients and some selected distributions.

The coefficients are again almost Gaussian. The high kurtosis for coefficients 1 and 2 is due to a few outliers.

Non-Stationary Noise

[Figure: waveform of the train noise segment (Time [s], 0 to 200) and its spectrogram (Time vs. Frequency, 0 to 4000).]

We shall proceed now to examine how the distributions vary in the case of Non-Stationary noise.

This is a segment of train noise, where a number of amplitude fluctuations occur due to events such as changing of rails and other trains passing by.

Mel Log Spectrum

[Figure: Mel Log Spectrum over time (Time [s] vs. coefficient index, 1 to 20); kurtosis of the coefficients; histograms of coefficients 1, 7, 11 and 19.]

The Mel Log Spectrum now varies with time, reflecting the different sound events. The kurtosis also increases for the higher coefficients.

The first few coefficients are close to Gaussian, but the higher ones develop longer tails.

Mel Cepstrum

[Figure: Mel Cepstrum over time (Time [s] vs. coefficient index, 1 to 20); kurtosis of the coefficients; histograms of coefficients 1, 2, 4 and 11.]

The sound events are now reflected in the first few Cepstrum coefficients.

Unlike the Log Spectrum, the first coefficients now have longer tails, while the higher ones tend to Gaussian.

Log Spectrum Distribution - Physical Interpretation

The lower Mel Log Spectrum coefficients represent the lower frequencies of the spectrum, where there is always noise energy present.

Thus, they assume consistently high values with few fluctuations, which makes their distributions close to Gaussian.

Higher coefficients assume high values only temporarily, due to non-stationary events.

This results in their distributions having longer tails.

When energy is present at high frequencies for prolonged periods, they can even be bimodal.

[Figure: Mel Log Spectrum over time (Time [s] vs. coefficient index, 1 to 20) with histograms of coefficients 1 and 19.]

Cepstrum Distribution - Physical Interpretation

The lower Cepstrum Coefficients reflect the fluctuations of the amplitude and of the spectral envelope.

As both of these vary in non-stationary signals, so do the lower MFCCs, resulting in distributions with long tails.

Higher coefficients, however, convey mostly information about harmonic components, which are not as dominant in the more broadband noise of trains and cars and are definitely not fast fluctuating.

[Figure: Mel Cepstrum over time (Time [s] vs. coefficient index, 1 to 20) with histograms of coefficients 1 and 11.]

Modelling the Data

The previous analysis showed that the distribution of Mel Log Spectrum and Mel Cepstrum coefficients deviates from the normal, especially in the case of non-stationary noise, which is of most interest.

In our attempt to model the coefficients successfully, we used Gaussian Mixture Models, which are capable of approximating irregularly shaped distributions.

An algorithm that allows us to fit mixtures of Gaussians to our data is the Expectation Maximization algorithm.

The Expectation Maximization Algorithm for Gaussian Mixture Models

We assume the probabilistic model:

$$p(\mathbf{x} \mid \Theta) = \sum_{i=1}^{M} \alpha_i \, p_i(\mathbf{x} \mid \theta_i)$$

where:

$$\theta_i = (\mu_i, \Sigma_i), \quad i = 1 \ldots M$$

We assume a latent random variable $y_j$ that determines the distribution that $x_j$ comes from.

We then find the expected value of the log likelihood $\log p(\mathbf{x}, \mathbf{y} \mid \Theta)$ with respect to $\mathbf{y}$, given $\mathbf{x}$ and an initial guess of the parameters $\Theta^{g}$.

That is:

$$E\left[\log p(\mathbf{x}, \mathbf{y} \mid \Theta) \mid \mathbf{x}, \Theta^{g}\right] = \int_{\mathbf{y}} \log p(\mathbf{x}, \mathbf{y} \mid \Theta) \, f(\mathbf{y} \mid \mathbf{x}, \Theta^{g}) \, d\mathbf{y}$$

The Expectation Maximization Algorithm for Gaussian Mixture Models (II)

This was the Expectation step. In the Maximization step we maximize the expected value with respect to $\Theta$, i.e.

$$\Theta^{i} = \arg\max_{\Theta} \left( E\left[\log p(\mathbf{x}, \mathbf{y} \mid \Theta) \mid \mathbf{x}, \Theta^{i-1}\right] \right)$$

The two steps are repeated until convergence.

For an excellent tutorial of EM see:

J. Bilmes, A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
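For a one-dimensional Gaussian mixture, both steps have closed forms, and the iteration can be sketched as below. This is an illustrative sketch rather than the code behind the slides: the function name `fit_gmm_1d`, the quantile-based initial guess, the variance floor, the stopping tolerance and the synthetic bimodal data are all assumptions made here.

```python
import numpy as np

def fit_gmm_1d(x, n_components, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances, log-likelihood)."""
    x = np.asarray(x, dtype=float)
    # Initial guess: means at evenly spread quantiles, global variance, equal weights.
    mu = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    var = np.full(n_components, x.var())
    alpha = np.full(n_components, 1.0 / n_components)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that sample j came from component i.
        log_pdf = -0.5 * (np.log(2.0 * np.pi * var) + (x[:, None] - mu) ** 2 / var)
        log_joint = np.log(alpha) + log_pdf                  # shape (N, M)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)    # log p(x_j | Theta)
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: closed-form maximisers of the expected log-likelihood.
        nk = resp.sum(axis=0)
        alpha = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9  # small floor
        ll = log_norm.sum()   # log-likelihood at the parameters used in this E-step
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return alpha, mu, var, ll

# Synthetic bimodal data standing in for a long-tailed coefficient histogram.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-4.0, 1.0, 3000), rng.normal(3.0, 1.5, 2000)])

_, _, _, ll_one = fit_gmm_1d(data, n_components=1)
weights, means, variances, ll_two = fit_gmm_1d(data, n_components=2)
print("log-likelihood, 1 vs 2 Gaussians:", ll_one, ll_two)
print("weights:", weights, "means:", means)
```

On bimodal data like this, the two-component fit reaches a clearly higher log-likelihood than the single Gaussian, which is the effect the mixture model is meant to capture.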

Fitting GMM to the Data

Here we present some results of fitting GMMs to various distributions.

[Figure: histograms of coefficient distributions with fitted GMMs; panels labelled Single Gaussian, Two Gaussians and Three Gaussians.]

Summary

Today we have discussed:

The distribution of the Mel Frequency Log Spectral and Cepstral Coefficients.

The form this assumes in the presence of non-stationary noise, along with a physical explanation.

How it can be modeled with Gaussian Mixture Models via the EM algorithm.

Finally, we showed some results of fitting GMMs to our data.

Further Work

Examine the distribution of Mel Frequency features for noisy speech and see how these are altered by the presence of different noise types.

Construct Optimal Estimators for clean speech Mel features, given the noisy ones and the noise models.

Use HMMs with Gaussian Mixture Models for accommodating the different noise states.