PowerPoint Presentation - Noise Reduction (my.fit.edu/~vkepuska/ece5525/Projects/Spring2005/Don McMann/Project1.2.pdf)

Noise Reduction

Two-Stage Mel-Warped Wiener Filter Approach

Intellectual Property

• Advanced front-end feature extraction algorithm

• ETSI ES 202 050 V1.1.3 (2003-11)

• European Telecommunications Standards Institute

• ETSI Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).

Noise Reduction

• Based on Wiener filter theory

• Noise reduction is performed in two stages.

• The input signal is de-noised in the first stage.

• The second stage applies dynamic noise reduction whose strength depends on the SNR of the signal processed by the first stage.

First Stage

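[Figure: first-stage block diagram: Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Mel IDCT → Apply Filter, plus a VADNest (VAD for noise estimation) block; the output goes to the second stage.]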

Second Stage

[Figure: second-stage block diagram: input from the first stage → Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Gain Factorization → Mel IDCT → Apply Filter → OFF (offset compensation) → Output.]

Buffering

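[Figure: Buffer 1 and Buffer 2, each with four frame slots (0 1 2 3). Before a shift they hold frames A B C D and E F G H; after the shift they hold B C D plus a new frame, and F G H. Labels mark the "De-noised (1st Stage)" frame and the "De-noised (output)" frame.]

• 1 frame = 80 samples

• 1 buffer = 4 frames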

Spectrum Estimation

• The input signal is divided into overlapping frames of N_in = 200 samples.

• A 25 ms frame length and a 10 ms frame shift (80 samples) are used, which corresponds to an 8 kHz sampling rate.

• Each frame s_in(n) is windowed with a Hanning window of length N_in, giving s_w(n).

Spectrum Estimation

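$$s_w(n) = s_{in}(n)\, w_{Hann}(n), \qquad w_{Hann}(n) = 0.5 - 0.5\cos\left(\frac{2\pi (n + 0.5)}{N_{in}}\right)$$

The windowed frame is then zero-padded from N_in up to N_FFT − 1, with N_FFT = 256:

$$s_{FFT}(n) = \begin{cases} s_w(n), & 0 \le n \le N_{in} - 1 \\ 0, & N_{in} \le n \le N_{FFT} - 1 \end{cases}$$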
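A minimal Python sketch of the framing, Hanning windowing and zero padding described above, assuming an 8 kHz sampling rate (so 200 and 80 samples correspond to 25 ms and 10 ms). The helper names `frames`, `hann_window` and `window_and_pad` are hypothetical and are not taken from the ETSI reference code.

```python
import math

N_IN = 200    # frame length: 25 ms at an assumed 8 kHz
SHIFT = 80    # frame shift: 10 ms at an assumed 8 kHz
N_FFT = 256   # FFT length after zero padding


def frames(signal, n_in=N_IN, shift=SHIFT):
    """Yield overlapping frames of n_in samples, advancing by shift samples."""
    for start in range(0, len(signal) - n_in + 1, shift):
        yield signal[start:start + n_in]


def hann_window(n_in=N_IN):
    """w_Hann(n) = 0.5 - 0.5*cos(2*pi*(n + 0.5)/N_in), 0 <= n <= N_in - 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / n_in) for n in range(n_in)]


def window_and_pad(s_in, n_fft=N_FFT):
    """Return s_FFT: the windowed frame s_w(n) followed by zeros up to n_fft samples."""
    if len(s_in) > n_fft:
        raise ValueError("frame is longer than the FFT length")
    w = hann_window(len(s_in))
    s_w = [x * wn for x, wn in zip(s_in, w)]
    return s_w + [0.0] * (n_fft - len(s_w))


# Usage: 800 samples (0.1 s) give (800 - 200) // 80 + 1 = 8 frames.
signal = [1.0] * 800
padded = [window_and_pad(f) for f in frames(signal)]
```

Each padded frame is 256 samples long, with the last 56 samples equal to zero.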

Spectrum Estimation

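• Frequency representation:

$$X(bin) = \mathrm{FFT}\{s_{FFT}(n)\}, \quad \text{where } bin \text{ is the frequency bin index}$$

• Power spectrum:

$$P(bin) = |X(bin)|^2, \quad 0 \le bin \le N_{FFT}/2$$

• Smoothing:

$$P_{in}(bin) = \frac{P(2\,bin) + P(2\,bin + 1)}{2}, \quad 0 \le bin < N_{FFT}/4, \qquad P_{in}(N_{FFT}/4) = P(N_{FFT}/2)$$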
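The power spectrum and smoothing steps can be sketched as follows. The sketch uses a naive DFT so that it needs only the standard library; an FFT routine would normally replace it. It also keeps the Nyquist bin, P_in(N_FFT/4) = P(N_FFT/2), so that N_FFT/4 + 1 = 65 smoothed bins result; this follows the ETSI AFE convention and should be treated as an assumption. Function names are hypothetical.

```python
import cmath
import math


def power_spectrum(s_fft):
    """P(bin) = |X(bin)|^2 for 0 <= bin <= N_FFT/2, with X the DFT of s_FFT.

    Naive O(N^2) DFT to stay dependency-free; use an FFT in practice.
    """
    n_fft = len(s_fft)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        x = sum(s * cmath.exp(-2j * math.pi * k * i / n_fft) for i, s in enumerate(s_fft))
        spectrum.append(abs(x) ** 2)
    return spectrum


def smooth(p):
    """P_in(bin) = (P(2*bin) + P(2*bin + 1)) / 2 for 0 <= bin < N_FFT/4,
    plus the (assumed) Nyquist bin P_in(N_FFT/4) = P(N_FFT/2)."""
    n_fft = 2 * (len(p) - 1)
    p_in = [(p[2 * b] + p[2 * b + 1]) / 2 for b in range(n_fft // 4)]
    p_in.append(p[n_fft // 2])
    return p_in


# Usage: a unit impulse has a flat spectrum, so every smoothed bin equals 1.
impulse = [1.0] + [0.0] * 255
p_in = smooth(power_spectrum(impulse))
```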

Power Spectral Density Mean

• Compute, for each P_in(bin), the mean over the last T_PSD = 2 frames:

$$P_{in\_PSD}(bin, t) = \frac{1}{T_{PSD}} \sum_{i=0}^{T_{PSD}-1} P_{in}(bin, t - i)$$
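A sketch of the PSD mean with T_PSD = 2 that keeps the last two smoothed spectra in a bounded deque. How the very first frame is handled (here it is averaged over the single available frame) is an assumption, and `PsdMean` is a hypothetical name.

```python
from collections import deque


class PsdMean:
    """Running mean of the last t_psd smoothed power spectra, bin by bin."""

    def __init__(self, t_psd=2):
        self.history = deque(maxlen=t_psd)  # oldest spectrum drops out automatically

    def update(self, p_in):
        """Add P_in(bin, t) and return P_in_PSD(bin, t)."""
        self.history.append(list(p_in))
        n = len(self.history)  # fewer than t_psd frames at start-up
        return [sum(frame[b] for frame in self.history) / n for b in range(len(p_in))]
```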

Wiener Filter Design

• A forgetting factor (weight) λNSE is computed for each frame:

If (t < 100 frames)

λNSE = 1 − 1/t

else λNSE = 0.99


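The first-stage noise spectrum estimate is updated based on the VAD flag.

If flag = 0 (non-speech frame t_n):

$$P_{noise}^{1/2}(bin, t_n) = \max\left(\lambda_{NSE}\, P_{noise}^{1/2}(bin, t_{n-1}) + (1 - \lambda_{NSE})\, P_{in\_PSD}(bin, t_n),\ e^{-10}\right)$$

If flag = 1 (speech frame t):

$$P_{noise}^{1/2}(bin, t) = P_{noise}^{1/2}(bin, t_n)$$

where t_n is the last non-speech frame.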
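The forgetting factor and the first-stage noise update can be sketched per bin as below. Assumptions: the frame counter t starts at 1 (so 1 − 1/t is defined), `psd_mean` is in the same amplitude domain as the square-root noise estimate, and the exp(−10) term is treated as a lower bound (a floor), which requires max(). Function names are hypothetical.

```python
import math

EPS_FLOOR = math.exp(-10)  # lower bound on the noise estimate


def lambda_nse(t):
    """Forgetting factor: 1 - 1/t for t < 100 frames, else 0.99 (t counted from 1)."""
    return 1.0 - 1.0 / t if t < 100 else 0.99


def update_noise_stage1(noise_sqrt, psd_mean, t, vad_flag):
    """Return the updated first-stage estimate P_noise^(1/2)(bin, .) for all bins."""
    if vad_flag:
        # Speech frame: hold the estimate from the last non-speech frame.
        return list(noise_sqrt)
    lam = lambda_nse(t)
    return [max(lam * n + (1.0 - lam) * p, EPS_FLOOR) for n, p in zip(noise_sqrt, psd_mean)]
```

With t = 1 the forgetting factor is 0, so the first non-speech frame initialises the estimate directly from the PSD mean.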


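The second-stage noise estimate is updated continuously (every frame):

If (t < 11)

$$P_{noise}(bin, t) = \lambda_{NSE}\, P_{noise}(bin, t - 1) + (1 - \lambda_{NSE})\, P_{in\_PSD}(bin, t)$$

else

$$update = 0.9 + 0.1\, \frac{P_{in\_PSD}(bin, t)}{P_{in\_PSD}(bin, t) + P_{noise}(bin, t - 1)} \left(1 + \frac{1}{1 + 0.1\, P_{in\_PSD}(bin, t) / P_{in\_PSD}(bin, t - 1)}\right)$$

$$P_{noise}(bin, t) = P_{noise}(bin, t - 1) \cdot update$$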
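A sketch of the second-stage noise update that transcribes the slide's formula, including its P_in_PSD(bin, t−1) term in the innermost ratio; λNSE is passed in by the caller. The function and argument names are hypothetical, and the sketch does not guard against a zero previous PSD value.

```python
def update_noise_stage2(noise_prev, psd, psd_prev, t, lam):
    """Return P_noise(bin, t) for all bins.

    noise_prev: P_noise(bin, t-1); psd: P_in_PSD(bin, t); psd_prev: P_in_PSD(bin, t-1).
    """
    if t < 11:
        # Start-up: plain recursive averaging with the forgetting factor.
        return [lam * n + (1.0 - lam) * p for n, p in zip(noise_prev, psd)]
    out = []
    for n, p, pp in zip(noise_prev, psd, psd_prev):
        # Multiplicative update, slightly above or below 1 depending on the ratios.
        update = 0.9 + 0.1 * p / (p + n) * (1.0 + 1.0 / (1.0 + 0.1 * p / pp))
        out.append(n * update)
    return out
```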


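The noiseless spectrum is estimated as:

$$P_{den}^{1/2}(bin, t) = 0.98\, P_{den}^{1/2}(bin, t - 1) + (1 - 0.98)\, T\left[P_{in\_PSD}(bin, t) - P_{noise}^{1/2}(bin, t)\right]$$

where the threshold function T is

$$T[z(bin, t)] = \begin{cases} z(bin, t), & \text{if } z(bin, t) > 0 \\ 0, & \text{otherwise} \end{cases}$$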
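The noiseless-spectrum recursion and its threshold function can be written as a per-bin sketch on square-root (amplitude) spectra. `update_den` and `threshold` are hypothetical names.

```python
BETA = 0.98  # smoothing weight from the slide


def threshold(z):
    """T[z] = z if z > 0, else 0 (half-wave rectification)."""
    return z if z > 0 else 0.0


def update_den(den_prev_sqrt, psd, noise_sqrt):
    """Return P_den^(1/2)(bin, t) from P_den^(1/2)(bin, t-1), the PSD mean and P_noise^(1/2)."""
    return [BETA * d + (1.0 - BETA) * threshold(p - n)
            for d, p, n in zip(den_prev_sqrt, psd, noise_sqrt)]
```

When the PSD mean falls below the noise estimate, the threshold contributes nothing and the estimate decays by the factor 0.98.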


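The a priori SNR is calculated as:

$$\eta(bin, t) = \frac{P_{den}(bin, t)}{P_{noise}(bin, t)}$$

The filter transfer function is:

$$H(bin, t) = \frac{\eta(bin, t)}{1 + \eta(bin, t)}$$

The filter transfer function is used to improve the noiseless signal estimate:

$$P_{den2}^{1/2}(bin, t) = H(bin, t)\, P_{in\_PSD}^{1/2}(bin, t)$$

The improved a priori SNR is:

$$\eta_2(bin, t) = \max\left(\frac{P_{den2}(bin, t)}{P_{noise}(bin, t)},\ \eta_{TH}^2\right), \qquad \eta_{TH} = -22\ \text{dB}$$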
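A scalar sketch of the remaining design steps for one bin: a priori SNR, Wiener gain, improved noiseless estimate and the floored improved SNR. Reading "−22 dB" as an amplitude ratio, η_TH = 10^(−22/20) ≈ 0.079, is an assumption, as is the hypothetical name `wiener_design`.

```python
import math

ETA_TH = 10 ** (-22 / 20)  # -22 dB read as an amplitude ratio (assumption)


def wiener_design(p_den, p_noise, p_in_psd):
    """Return (H, eta2) for one bin; all inputs are power values."""
    eta = p_den / p_noise                        # a priori SNR
    h = eta / (1.0 + eta)                        # Wiener filter gain
    p_den2 = (h * math.sqrt(p_in_psd)) ** 2      # improved noiseless estimate
    eta2 = max(p_den2 / p_noise, ETA_TH ** 2)    # floored improved a priori SNR
    return h, eta2
```

The floor keeps η₂ from reaching zero in bins where the noiseless estimate vanishes, which limits how deeply those bins are attenuated.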

Voice Activity Detection

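• VAD is used to detect noise (non-speech) frames.

• Long-term energy factor (LTE): if the frame count t < 10 (frame threshold), LTE = 1 − 1/t; else LTE = 0.97.

• Calculate the frame energy:

$$frameEn = 0.5 + \frac{16}{\ln 2} \ln\left(\frac{64 + \sum_{i=0}^{M-1} s_{in}(i)^2}{64}\right)$$

where M = 80 is the number of samples in the frame.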
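Frame energy as a sketch. The 16/ln 2 scaling follows the ETSI AFE formulation and is an assumption here; it puts frameEn on the scale implied by the fixed VAD thresholds (15, 20 and the floor of 80). The frame length M is simply the length of the list passed in.

```python
import math


def frame_energy(frame):
    """frameEn = 0.5 + (16 / ln 2) * ln((64 + sum of s_in(i)^2) / 64)."""
    energy = sum(s * s for s in frame)
    return 0.5 + 16.0 / math.log(2.0) * math.log((64.0 + energy) / 64.0)
```

A silent frame gives 0.5, and each doubling of (64 + energy) adds 16 units.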


• Use the frame energy to update the mean energy:

If (frameEn − meanEn) < 20 (SNR threshold) or t < 10:
    If (frameEn < meanEn) or (t < 10):
        meanEn = meanEn + (1 − LTE) × (frameEn − meanEn)
    Else:
        meanEn = meanEn + (1 − 0.99) × (frameEn − meanEn)
    If (meanEn < 80):
        meanEn = 80


• Is the current frame speech?

If t > 4:
    If (frameEn − meanEn) > 15:
        IT IS SPEECH
        nbSpeechFrame++
    Else:
        If nbSpeechFrame > 4:
            hangover = 15, nbSpeechFrame = 0
        If hangover != 0:
            hangover--
            IT IS SPEECH
        Else:
            IT IS NOT SPEECH
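The mean-energy update and the speech decision combine into a small state machine. In this sketch the initial mean energy of 0 is an assumption (the t < 10 rule replaces it on the first frame), frames with t ≤ 4 are assumed to be non-speech, and the hangover counter is decremented on each hangover frame so that it eventually expires. The class name `Vad` is hypothetical.

```python
class Vad:
    """Energy-based VAD sketch following the slide's thresholds."""

    def __init__(self):
        self.t = 0            # frame counter; first frame is t = 1
        self.mean_en = 0.0    # assumed initial value, replaced on the first frame
        self.nb_speech = 0    # count of speech frames since the last reset
        self.hangover = 0     # remaining hangover frames

    def step(self, frame_en):
        """Update the state with one frame energy; return True for speech."""
        self.t += 1
        t = self.t
        lte = 1.0 - 1.0 / t if t < 10 else 0.97  # long-term energy factor

        # Mean (background) energy update, gated by the SNR threshold of 20.
        if frame_en - self.mean_en < 20 or t < 10:
            if frame_en < self.mean_en or t < 10:
                self.mean_en += (1.0 - lte) * (frame_en - self.mean_en)
            else:
                self.mean_en += (1.0 - 0.99) * (frame_en - self.mean_en)
            self.mean_en = max(self.mean_en, 80.0)  # floor

        if t <= 4:
            return False
        if frame_en - self.mean_en > 15:
            self.nb_speech += 1
            return True
        if self.nb_speech > 4:
            self.hangover = 15
            self.nb_speech = 0
        if self.hangover != 0:
            self.hangover -= 1
            return True
        return False
```

Usage: after at least five speech frames, the next 15 low-energy frames are still reported as speech before the detector returns to non-speech.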

Mel Filter Bank

• The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale.

• The mel scale is a perceptual scale of pitches that listeners judge to be equally spaced from one another.
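A possible Mel-warping sketch: 25 center frequencies (k = 0..24, matching the index range of the Mel IDCT) spaced uniformly on the common mel formula mel(f) = 2595·log10(1 + f/700) from 0 Hz to f_samp/2, with each Mel coefficient computed as a triangular-weighted average of the 65 linear-frequency coefficients. The band layout, the weighting and the names are assumptions, not the exact ETSI filter bank.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(n_bands=25, f_samp=8000.0):
    """Center frequencies f_centr(k), uniformly spaced in mel from 0 Hz to f_samp/2."""
    top = hz_to_mel(f_samp / 2)
    return [mel_to_hz(top * k / (n_bands - 1)) for k in range(n_bands)]


def mel_warp(h_lin, f_samp=8000.0, n_bands=25):
    """Triangular-weighted average of linear-frequency coefficients per mel band."""
    n_bins = len(h_lin)
    bin_hz = [b * (f_samp / 2) / (n_bins - 1) for b in range(n_bins)]
    c = mel_centers(n_bands, f_samp)
    h_mel = []
    for k in range(n_bands):
        lo = c[k - 1] if k > 0 else c[k]             # edge bands are half-triangles
        hi = c[k + 1] if k < n_bands - 1 else c[k]
        num = den = 0.0
        for f, h in zip(bin_hz, h_lin):
            if lo < c[k] and lo <= f <= c[k]:
                w = (f - lo) / (c[k] - lo)           # rising edge
            elif c[k] < hi and c[k] <= f <= hi:
                w = (hi - f) / (hi - c[k])           # falling edge
            else:
                w = 0.0
            num += w * h
            den += w
        h_mel.append(num / den if den > 0 else 0.0)
    return h_mel
```

Because each band is a normalised weighted average, a flat linear-frequency response maps to a flat Mel response.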

Mel IDCT

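• The time-domain impulse response of the Wiener filter is computed from the Mel-warped Wiener filter coefficients using the Mel-warped inverse Discrete Cosine Transform:

$$h_{WF}(n) = \sum_{k=0}^{24} H_{2\_mel}(k)\, IDCT_{mel}(k, n), \quad 0 \le n \le 24$$

$$IDCT_{mel}(k, n) = \cos\left(\frac{2\pi n f_{centr}(k)}{f_{samp}}\right) df(k)$$

$$df(k) = \frac{f_{centr}(k + 1) - f_{centr}(k - 1)}{f_{samp}}$$

where f_centr(k) is the center frequency of Mel band k and f_samp is the sampling frequency.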
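The Mel IDCT can be transcribed directly. The slide gives df(k) only for interior bands; the one-sided differences used here for the first and last bands are an assumption, and `mel_idct` is a hypothetical name.

```python
import math


def mel_idct(h_mel, f_centr, f_samp=8000.0):
    """h_WF(n) = sum_k H_mel(k) * cos(2*pi*n*f_centr(k)/f_samp) * df(k), 0 <= n <= K-1."""
    K = len(h_mel)
    df = []
    for k in range(K):
        if k == 0:            # assumed one-sided difference at the lower edge
            df.append((f_centr[1] - f_centr[0]) / f_samp)
        elif k == K - 1:      # assumed one-sided difference at the upper edge
            df.append((f_centr[K - 1] - f_centr[K - 2]) / f_samp)
        else:                 # slide: (f_centr(k+1) - f_centr(k-1)) / f_samp
            df.append((f_centr[k + 1] - f_centr[k - 1]) / f_samp)
    return [
        sum(h_mel[k] * math.cos(2 * math.pi * n * f_centr[k] / f_samp) * df[k] for k in range(K))
        for n in range(K)
    ]
```

With centers running from 0 Hz to f_samp/2, the df(k) values sum to 1, so an all-pass response (all H = 1) yields h_WF(0) = 1.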

Gain Factorization

• Factorization of the Mel-warped Wiener filter coefficients is performed to control the aggressiveness of noise reduction in the second stage.

• The de-noised frame signal energy is calculated as:

$$E_{den}(t) = \sum_{bin=0}^{N_{FFT}/4} P_{den3}^{1/2}(bin, t)$$

Gain Factorization

• The noise energy of the current frame is estimated as:

$$E_{noise}(t) = \sum_{bin=0}^{N_{FFT}/4} P_{noise}^{1/2}(bin, t)$$

Gain Factorization

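• The smoothed SNR is evaluated using the three most recent de-noised frame energies and the corresponding noise energies:

$$Ratio = \frac{E_{den}(t)\, E_{den}(t - 1)\, E_{den}(t - 2)}{E_{noise}(t)\, E_{noise}(t - 1)\, E_{noise}(t - 2)}$$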

If (Ratio > 0.0001)

Then

SNRavg(t) = 6.67 × log10 (Ratio)

Else

SNRavg(t) = -33.3
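Band energies and the smoothed SNR can be sketched as below. Note that 6.67 ≈ 20/3 and −33.3 ≈ −100/3, so SNRavg is 20·log10 of the geometric mean of the three energy ratios. Function names are hypothetical.

```python
import math


def band_energy(p):
    """E(t) = sum over bins of P^(1/2)(bin, t)."""
    return sum(math.sqrt(x) for x in p)


def snr_avg(e_den3, e_noise3):
    """SNRavg(t) from [E(t), E(t-1), E(t-2)] of the de-noised signal and of the noise."""
    ratio = (e_den3[0] * e_den3[1] * e_den3[2]) / (e_noise3[0] * e_noise3[1] * e_noise3[2])
    if ratio > 0.0001:
        return 20.0 / 3.0 * math.log10(ratio)   # 6.67 * log10(Ratio)
    return -100.0 / 3.0                         # -33.3
```

For example, a de-noised energy ten times the noise energy in each of the three frames gives a ratio of 1000 and SNRavg = 20.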

Gain Factorization

• To decide the degree of aggressiveness, the SNR is tracked:

If {(SNRavg(t) − SNRlow-track(t−1)) < 10 or t < 10}

calculate λSNR(t)

SNRlow-track(t) = λSNR(t) × SNRlow-track(t−1) + (1 − λSNR(t)) × SNRavg(t)

Else

SNRlow-track(t) = SNRlow-track(t−1)

Gain Factorization

• Gain factorization applies more aggressive noise reduction to purely noisy frames and less aggressive noise reduction to frames containing speech.

• The aggression coefficient is 10% for speech + noise frames and 80% for noise-only frames.
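A sketch of the SNR tracking and of the gain factorization itself. The slides do not give the formula for λSNR(t), so it is passed in by the caller. Applying the aggression coefficient α as the blend H_GF(k) = (1 − α) + α·H_mel(k), with α = 0.1 or 0.8, is an assumption; names are hypothetical.

```python
ALPHA_SPEECH = 0.1  # aggression coefficient for speech + noise frames
ALPHA_NOISE = 0.8   # aggression coefficient for noise-only frames


def track_low_snr(snr_avg_t, low_prev, t, lam):
    """SNRlow-track(t) per the slide; lam stands in for the unspecified lambda_SNR(t)."""
    if snr_avg_t - low_prev < 10 or t < 10:
        return lam * low_prev + (1.0 - lam) * snr_avg_t
    return low_prev


def factorize(h_mel, alpha):
    """Blend each Mel coefficient toward 1 (no attenuation) by 1 - alpha (assumed form)."""
    return [(1.0 - alpha) + alpha * h for h in h_mel]
```

With α = 0.8 a fully suppressing coefficient (H = 0) is kept at 0.2, while with α = 0.1 it is only reduced to 0.9, which matches the intent of gentler processing on speech frames.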

Apply Filter

• The causal impulse response is obtained, truncated and weighted by a Hanning window.

• The input signal is filtered with the filter impulse response to produce the noise-reduced signal.
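A sketch of the filtering step under explicit assumptions: the zero-phase response h_WF(0..m) is mirrored into a causal filter of 2m + 1 taps (which truncates it when m < 24), weighted with the same Hanning formula used for framing, and applied as a direct-form FIR. The half length m is a free parameter here, not a value from the slides, and the function names are hypothetical.

```python
import math


def causal_filter(h_wf, half_len):
    """Mirror h_wf(0..half_len) into 2*half_len + 1 causal taps and apply a Hanning window."""
    length = 2 * half_len + 1
    taps = []
    for n in range(length):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / length)
        taps.append(h_wf[abs(n - half_len)] * w)  # h(-m..m) shifted to start at 0
    return taps


def fir_filter(x, taps):
    """y(n) = sum_j taps[j] * x(n - j), with x(n) = 0 for n < 0."""
    return [sum(taps[j] * x[n - j] for j in range(len(taps)) if n - j >= 0)
            for n in range(len(x))]
```

The causal shift delays the output by m samples; with m = 8, an impulse response that is 1 at n = 0 and 0 elsewhere simply delays the input by 8 samples.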

Offset Compensation

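• A filter is used to remove the DC offset over the frame length interval (80 samples):

$$s_{nr\_of}(n) = s_{nr}(n) - s_{nr}(n - 1) + \left(1 - \frac{1}{1024}\right) s_{nr\_of}(n - 1)$$

where s_nr is the noise-reduced signal and s_nr_of is the offset-compensated output.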
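The offset compensation recursion is a first-order DC-blocking filter. A direct sketch with zero initial state (an assumption) follows; `remove_dc` is a hypothetical name.

```python
def remove_dc(s_nr, alpha=1.0 - 1.0 / 1024.0):
    """s_nr_of(n) = s_nr(n) - s_nr(n-1) + alpha * s_nr_of(n-1), zero initial state."""
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in s_nr:
        y = x - prev_in + alpha * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

A constant input passes through on the first sample and then decays by the factor (1 − 1/1024) per sample, so after about 5000 samples less than 1% of the offset remains.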

Results

Noisy test file:

After de-noise:

Results

Footloose:

Not Footloose:

Results: why didn’t this work?

Hair dryer:

Still there?!?!:

Results

Hair dryer:

Gone:
