Noise Reduction
Two-Stage Mel-Warped Wiener Filter Approach
Intellectual Property
• Advanced front-end feature extraction algorithm
• ETSI ES 202 050 V1.1.3 (2003-11)
• European Telecommunications Standards Institute
• ETSI Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).
Noise Reduction
• Based on Wiener filter theory
• Noise reduction is performed in two stages
• Input signal is de-noised in the first stage.
• Second stage: dynamic noise reduction whose aggressiveness depends on the SNR of the first-stage output
First Stage
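Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Mel IDCT → Apply Filter → To Second Stage
VADNest (voice activity detection for noise estimation) supplies the speech/non-speech flag used in WF Design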
Second Stage
From First Stage → Spectrum Estimation → PSD Mean → WF Design → Mel Filter-Bank → Gain Factorization → Mel IDCT → Apply Filter → OFF (offset compensation) → Output
Buffering
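Buffer 1 (slots 0 1 2 3): A B C D, after one frame shift: B C D new
Buffer 2 (slots 0 1 2 3): E F G H, after one frame shift: F G H (slot 3 not labelled in the figure)
Figure labels: De-noised (1st Stage); De-noised (output), shown next to frame A
• 1 frame = 80 samples
• 1 buffer = 4 frames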
Spectrum Estimation
• The input signal is divided into overlapping frames of N_in = 200 samples.
• At the 8 kHz sampling rate this gives a 25 ms frame length and a 10 ms frame shift (80 samples).
• Each input frame s_in(n) is windowed with a Hanning window of length N_in, giving s_w(n).
Spectrum Estimation
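$s_w(n) = s_{in}(n)\, w_{Hann}(n), \quad 0 \le n \le N_{in} - 1$
where
$w_{Hann}(n) = 0.5 - 0.5 \cos\left(\frac{2\pi (n + 0.5)}{N_{in}}\right)$
$s_{FFT}(n) = \begin{cases} s_w(n), & 0 \le n \le N_{in} - 1 \\ 0, & N_{in} \le n \le N_{FFT} - 1 \end{cases}$
Zero-padding from N_in up to N_FFT − 1, with N_FFT = 256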
Spectrum Estimation
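• Frequency representation: $X_{FFT}(bin) = \mathrm{FFT}\{s_{FFT}(n)\}$, where bin is the frequency bin index
• Power spectrum: $P(bin) = |X_{FFT}(bin)|^2, \quad 0 \le bin \le N_{FFT}/2$
• Smoothing: $P_{in}(bin) = \frac{P(2\,bin) + P(2\,bin + 1)}{2}, \quad 0 \le bin < N_{FFT}/4$, and $P_{in}(N_{FFT}/4) = P(N_{FFT}/2)$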
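The spectrum-estimation steps can be sketched in pure Python as below. This is a minimal sketch, assuming 8 kHz input processed one 200-sample frame at a time; a direct DFT stands in for the FFT to keep it dependency-free, and the function names (`hann`, `spectrum_estimate`) are hypothetical.

```python
import cmath
import math

N_IN = 200   # frame length in samples (25 ms at 8 kHz)
N_FFT = 256  # FFT size after zero-padding


def hann(length):
    """Hanning window w(n) = 0.5 - 0.5*cos(2*pi*(n + 0.5)/length)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / length) for n in range(length)]


def spectrum_estimate(frame):
    """Return the smoothed power spectrum P_in (N_FFT/4 + 1 = 65 bins) of one frame."""
    if len(frame) != N_IN:
        raise ValueError("expected a frame of N_IN samples")
    # Window the frame, then zero-pad from N_IN up to N_FFT samples.
    s_fft = [x * w for x, w in zip(frame, hann(N_IN))] + [0.0] * (N_FFT - N_IN)
    # Power spectrum P(bin) = |X(bin)|^2 for bins 0..N_FFT/2, via a direct DFT.
    power = []
    for b in range(N_FFT // 2 + 1):
        x = sum(s_fft[n] * cmath.exp(-2j * math.pi * b * n / N_FFT) for n in range(N_FFT))
        power.append(abs(x) ** 2)
    # Average adjacent pairs of bins; the last bin keeps P(N_FFT/2).
    p_in = [(power[2 * b] + power[2 * b + 1]) / 2 for b in range(N_FFT // 4)]
    p_in.append(power[N_FFT // 2])
    return p_in
```

A 1 kHz tone sampled at 8 kHz falls in FFT bin 32, so after pairwise smoothing its peak lands in P_in bin 16. Replacing the direct DFT with an FFT routine changes only the speed.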
Power Spectral Density Mean
• For each bin, compute the mean of the magnitude P_in^{1/2}(bin) over the last T_PSD = 2 frames.
$P_{in\_PSD}(bin, t) = \frac{1}{T_{PSD}} \sum_{i=0}^{T_{PSD}-1} P_{in}^{1/2}(bin, t - i), \quad T_{PSD} = 2$
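A minimal sketch of the PSD mean, assuming the per-frame spectra P_in are kept in a list ordered oldest first. The helper name `psd_mean` is hypothetical, and averaging over fewer frames at start-up (before two frames exist) is an assumption.

```python
import math

T_PSD = 2  # number of frames averaged


def psd_mean(p_in_history):
    """Mean of sqrt(P_in) over the most recent T_PSD frames, bin by bin.

    p_in_history: list of per-frame power spectra, oldest first.
    """
    recent = p_in_history[-T_PSD:]  # assumption: fewer frames are used at start-up
    n_bins = len(recent[0])
    return [sum(math.sqrt(frame[b]) for frame in recent) / len(recent) for b in range(n_bins)]
```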
Wiener Filter Design
• A forgetting factor (weight) λNSE is computed for each frame t:
If (t < 100 frames)
λNSE = 1 − 1/t
else λNSE = 0.99
Wiener Filter Design
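The first-stage noise spectrum estimate is updated based on the VAD flag:
If flag = 0 (non-speech frame t_n):
$P_{noise}^{1/2}(bin, t_n) = \max\left(\lambda_{NSE}\, P_{noise}^{1/2}(bin, t_n - 1) + (1 - \lambda_{NSE})\, P_{in\_PSD}(bin, t_n),\ e^{-10}\right)$
If flag = 1 (speech frame t):
$P_{noise}^{1/2}(bin, t) = P_{noise}^{1/2}(bin, t_n)$ (estimate from the last non-speech frame t_n)
The max with e^{-10} acts as a floor that keeps the noise estimate strictly positive.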
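The forgetting factor and the first-stage noise update can be sketched as follows. This assumes frames are counted from t = 1 (so 1 − 1/t is defined), that the VAD flag is 1 for speech, and that e^{-10} is applied as a lower floor with `max`; the function names are hypothetical.

```python
import math

EPS = math.exp(-10)          # floor on the noise magnitude estimate
NB_FRAME_THRESHOLD = 100     # frames before lambda_NSE settles at 0.99


def forgetting_factor(t):
    """lambda_NSE for frame t, with t counted from 1."""
    return 1.0 - 1.0 / t if t < NB_FRAME_THRESHOLD else 0.99


def update_noise_stage1(noise_prev, psd_mean, t, vad_flag):
    """First-stage noise estimate P_noise^(1/2); frozen while the VAD reports speech."""
    if vad_flag == 1:
        return list(noise_prev)  # keep the estimate from the last non-speech frame
    lam = forgetting_factor(t)
    return [max(lam * n + (1.0 - lam) * p, EPS) for n, p in zip(noise_prev, psd_mean)]
```

Because λNSE = 0 at t = 1, the first non-speech frame initialises the estimate directly from P_in_PSD.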
Wiener Filter Design
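The second-stage noise estimate is updated every frame:
If (t < 11)
$P_{noise}(bin, t) = \lambda_{NSE}\, P_{noise}(bin, t - 1) + (1 - \lambda_{NSE})\, P_{in\_PSD}(bin, t)$
else
$update = 0.9 + 0.1\, \frac{P_{in\_PSD}(bin, t)}{P_{in\_PSD}(bin, t) + P_{noise}(bin, t - 1)} \left(1 + \frac{1}{1 + 0.1\, P_{in\_PSD}(bin, t) / P_{noise}(bin, t - 1)}\right)$
$P_{noise}(bin, t) = P_{noise}(bin, t - 1) \times update$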
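A sketch of the second-stage update, using the previous frame's noise estimate in both ratios of the update factor. It assumes the caller passes λNSE for the current frame and that the previous noise estimate is strictly positive; the function name is hypothetical.

```python
def update_noise_stage2(noise_prev, psd_mean, t, lam):
    """Second-stage noise estimate, updated on every frame regardless of the VAD.

    Assumes every value in noise_prev is > 0 (e.g. floored as in the first stage).
    """
    out = []
    for n, p in zip(noise_prev, psd_mean):
        if t < 11:
            # Start-up: plain recursive averaging.
            out.append(lam * n + (1.0 - lam) * p)
        else:
            # Multiplicative update: 0.9 when the input is silent, up to about 1.04 otherwise.
            update = 0.9 + 0.1 * p / (p + n) * (1.0 + 1.0 / (1.0 + 0.1 * p / n))
            out.append(n * update)
    return out
```

With zero input the factor is exactly 0.9, so the estimate decays by 10 % per frame; when the input is far above the noise the factor approaches 1, so the estimate stops growing instead of following speech.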
Wiener Filter Design
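The noiseless (de-noised) spectrum is estimated as:
$P_{den}^{1/2}(bin, t) = 0.98\, P_{den}^{1/2}(bin, t - 1) + (1 - 0.98)\, T\left[P_{in\_PSD}(bin, t) - P_{noise}^{1/2}(bin, t)\right]$
where the threshold function T is
$T[z(bin, t)] = \begin{cases} z(bin, t), & z(bin, t) > 0 \\ 0, & \text{otherwise} \end{cases}$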
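The smoothed noiseless estimate with its half-wave threshold can be sketched as below (helper names are hypothetical). The threshold T stops spectral subtraction from producing negative magnitudes.

```python
BETA = 0.98  # smoothing weight on the previous de-noised estimate


def threshold(z):
    """T[z]: pass positive values, clamp everything else to zero."""
    return z if z > 0 else 0.0


def update_den(den_prev, psd_mean, noise):
    """P_den^(1/2)(bin, t) from the previous estimate and the thresholded difference."""
    return [BETA * d + (1.0 - BETA) * threshold(p - n)
            for d, p, n in zip(den_prev, psd_mean, noise)]
```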
Wiener Filter Design
The a priori SNR is calculated as:
$\eta(bin, t) = \frac{P_{den}(bin, t)}{P_{noise}(bin, t)}$
The filter transfer function is
$H(bin, t) = \frac{\eta(bin, t)}{1 + \eta(bin, t)}$
Wiener Filter Design
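The filter transfer function is used to improve the noiseless signal estimate:
$P_{den2}^{1/2}(bin, t) = H(bin, t)\, P_{in\_PSD}(bin, t)$
The improved a priori SNR is:
$\eta_2(bin, t) = \max\left(\frac{P_{den2}(bin, t)}{P_{noise}(bin, t)},\ \eta_{TH}\right), \quad \eta_{TH} = -22\ \mathrm{dB}$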
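The a priori SNR, the Wiener gain and the improved SNR can be sketched per bin as below. This is a sketch under stated assumptions: the inputs are whatever spectra enter the ratios, the noise values are strictly positive, and the −22 dB floor is converted as an amplitude ratio, 10^(−22/20) ≈ 0.0794. The function name `wiener_gains` is hypothetical.

```python
ETA_TH = 10 ** (-22 / 20)  # assumed linear value of the -22 dB SNR floor


def wiener_gains(den, noise, psd_mean):
    """Return (H, eta2) per bin: first Wiener gain and improved, floored a priori SNR."""
    h, eta2 = [], []
    for d, n, p in zip(den, noise, psd_mean):
        eta = d / n                      # a priori SNR
        gain = eta / (1.0 + eta)         # Wiener transfer function H
        den2 = gain * p                  # improved noiseless estimate
        h.append(gain)
        eta2.append(max(den2 / n, ETA_TH))
    return h, eta2
```

The floor keeps the gain from collapsing to zero in bins with no estimated speech, which limits musical noise.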
Voice Activity Detection
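• VAD is used to detect noise (non-speech) frames
• Long-term energy factor: if t < 10 (frame-count threshold), LTE = 1 − 1/t; else LTE = 0.97
• Frame energy over the M = 80 samples of the current frame:
$frameEn = 0.5 + \frac{16}{\ln 2} \ln\left(\frac{64 + \sum_{i=0}^{M-1} s_{in}^2(i)}{64}\right)$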
Voice Activity Detection
• Use the frame energy to update the mean energy:
If (frameEn - meanEn) < 20 (SNR threshold) or t < 10:
    If (frameEn < meanEn) or (t < 10):
        meanEn = meanEn + (1 - LTE) * (frameEn - meanEn)
    Else:
        meanEn = meanEn + (1 - 0.99) * (frameEn - meanEn)
If (meanEn < 80):
    meanEn = 80
Voice Activity Detection
• Is the current frame speech?
If t > 4:
    If (frameEn - meanEn) > 15:
        IT IS SPEECH
        nbSpeechFrame++
    Else:
        If nbSpeechFrame > 4:
            hangover = 15
        nbSpeechFrame = 0
        If hangover != 0:
            hangover--
            IT IS SPEECH
        Else:
            IT IS NOT SPEECH
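The VAD rules can be collected into one small stateful class. This is a sketch, not the reference implementation: it assumes frames of 80 samples counted from t = 1, a frame energy scaled by 16/ln 2, a hangover counter that is decremented on each hangover frame so it eventually expires, and the mean-energy floor of 80 applied after every update. The class name `EnergyVad` is hypothetical.

```python
import math


class EnergyVad:
    """Frame-energy VAD with mean-energy tracking and hangover (sketch)."""

    def __init__(self):
        self.t = 0               # frame counter; the first frame is t = 1
        self.mean_en = 0.0       # replaced on the first frame, since LTE = 0 at t = 1
        self.nb_speech_frame = 0
        self.hangover = 0

    @staticmethod
    def frame_energy(frame):
        """frameEn = 0.5 + 16/ln(2) * ln((64 + sum of s^2) / 64)."""
        energy = sum(s * s for s in frame)
        return 0.5 + 16.0 / math.log(2.0) * math.log((64.0 + energy) / 64.0)

    def process(self, frame):
        """Return True if the frame is classified as speech."""
        self.t += 1
        t = self.t
        lte = 1.0 - 1.0 / t if t < 10 else 0.97
        en = self.frame_energy(frame)

        # Track the background (mean) energy; loud frames after start-up are skipped.
        if en - self.mean_en < 20 or t < 10:
            if en < self.mean_en or t < 10:
                self.mean_en += (1.0 - lte) * (en - self.mean_en)
            else:
                self.mean_en += (1.0 - 0.99) * (en - self.mean_en)
        if self.mean_en < 80:
            self.mean_en = 80.0

        # Speech decision with a 15-frame hangover after at least 5 speech frames.
        speech = False
        if t > 4:
            if en - self.mean_en > 15:
                speech = True
                self.nb_speech_frame += 1
            else:
                if self.nb_speech_frame > 4:
                    self.hangover = 15
                self.nb_speech_frame = 0
                if self.hangover != 0:
                    self.hangover -= 1
                    speech = True
        return speech
```

After a run of five or more speech frames, the next 15 quiet frames are still reported as speech, which protects word endings from the first-stage noise update.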
Mel Filter Bank
• The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale.
• The mel scale is a scale of pitches judged by listeners to be equally spaced from one another.
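A sketch of the linear-to-mel transformation follows. It assumes the widely used mel definition Mel(f) = 2595·log10(1 + f/700), 25 band centres equally spaced on the mel scale from 0 Hz to f_samp/2 (matching the k = 0..24 range of the Mel IDCT), and normalised triangular weights over the 65 linear bins. The exact weighting in the standard may differ, and all names here are hypothetical.

```python
import math


def mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_inv(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_centres(f_samp=8000.0, n_bands=25):
    """Centre frequencies equally spaced on the mel scale, from 0 Hz to f_samp/2."""
    top = mel(f_samp / 2)
    return [mel_inv(k * top / (n_bands - 1)) for k in range(n_bands)]


def mel_weights(centres, f_samp=8000.0, n_bins=65):
    """Triangular weights (each row normalised to sum 1) mapping linear bins to mel bands."""
    bin_hz = (f_samp / 2) / (n_bins - 1)
    weights = []
    for k, fc in enumerate(centres):
        lo = centres[k - 1] if k > 0 else fc
        hi = centres[k + 1] if k < len(centres) - 1 else fc
        row = []
        for i in range(n_bins):
            f = i * bin_hz
            if lo < f <= fc and fc > lo:
                row.append((f - lo) / (fc - lo))      # rising edge
            elif fc <= f < hi:
                row.append((hi - f) / (hi - fc))      # falling edge
            else:
                row.append(0.0)
        s = sum(row)
        weights.append([w / s for w in row] if s > 0 else row)
    return weights


def to_mel(h_lin, weights):
    """Weighted average of the linear-frequency gains inside each mel band."""
    return [sum(w * h for w, h in zip(row, h_lin)) for row in weights]
```

Normalising each triangle makes every mel coefficient a weighted average, so an all-pass linear filter (all gains 1) maps to all-pass mel coefficients.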
Mel IDCT
• The time-domain impulse response of the Wiener filter is computed from the Mel-warped Wiener filter coefficients using a Mel-warped inverse Discrete Cosine Transform (IDCT):
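$h_{WF}(n) = \sum_{k=0}^{24} H_{2,mel}(k)\, IDCT_{mel}(k, n), \quad 0 \le n \le 24$
$IDCT_{mel}(k, n) = \cos\left(\frac{2\pi n\, f_{centr}(k)}{f_{samp}}\right) df(k)$
$df(k) = \frac{f_{centr}(k + 1) - f_{centr}(k - 1)}{f_{samp}}, \quad 1 \le k \le 23$
where $H_{2,mel}(k)$ are the Mel-warped Wiener filter coefficients, $f_{centr}(k)$ is the centre frequency of Mel band k and $f_{samp}$ is the sampling frequency.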
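The Mel-warped IDCT can be sketched as below. At the two end bands, where k − 1 or k + 1 does not exist, this sketch assumes the one-sided widths df(0) = (f_centr(1) − f_centr(0))/f_samp and df(24) = (f_centr(24) − f_centr(23))/f_samp. The function names are hypothetical.

```python
import math


def mel_idct_matrix(centres, f_samp):
    """IDCT_mel(k, n) = cos(2*pi*n*f_centr(k)/f_samp) * df(k), for n, k = 0..K-1."""
    K = len(centres)
    df = []
    for k in range(K):
        lo = centres[max(k - 1, 0)]      # one-sided width at the first band (assumption)
        hi = centres[min(k + 1, K - 1)]  # one-sided width at the last band (assumption)
        df.append((hi - lo) / f_samp)
    return [[math.cos(2 * math.pi * n * centres[k] / f_samp) * df[k] for n in range(K)]
            for k in range(K)]


def mel_impulse_response(h_mel, centres, f_samp):
    """h_WF(n) = sum_k H_mel(k) * IDCT_mel(k, n), n = 0..K-1."""
    idct = mel_idct_matrix(centres, f_samp)
    K = len(h_mel)
    return [sum(h_mel[k] * idct[k][n] for k in range(K)) for n in range(K)]
```

As a sanity check, the widths df(k) sum to 2·(f_samp/2)/f_samp = 1, so an all-pass filter (every H_mel(k) = 1) gives h_WF(0) = 1; with equally spaced centres the other taps cancel, leaving a unit impulse.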
Gain Factorization
• Factorization of the Mel-warped Wiener filter coefficients is performed to control the aggressiveness of noise reduction in the second stage.
• The de-noised frame signal energy is calculated as:
$E_{den}(t) = \sum_{bin=0}^{64} P_{den3}^{1/2}(bin, t)$
Gain Factorization
• The noise energy of the current frame is estimated as:
$E_{noise}(t) = \sum_{bin=0}^{64} P_{noise}^{1/2}(bin, t)$
Gain Factorization
• The smoothed SNR is evaluated from the de-noised and noise energies of the current and two previous frames:
$Ratio = \frac{E_{den}(t)\, E_{den}(t - 1)\, E_{den}(t - 2)}{E_{noise}(t)\, E_{noise}(t - 1)\, E_{noise}(t - 2)}$
If (Ratio > 0.0001)
Then
SNRavg(t) = 6.67 × log10 (Ratio)
Else
SNRavg(t) = -33.3
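A sketch of the energy sums and the smoothed SNR. The constants 6.67 and −33.3 are written here as 20/3 and −100/3 (an assumption about their unrounded values). The factor 20/3 means SNRavg is the mean of the three per-frame SNRs in dB: (20/3)·log10(r1·r2·r3) = (1/3)·Σ 20·log10(ri). The function names are hypothetical.

```python
import math


def spectral_energy(spec_sqrt):
    """E(t): sum of P^(1/2)(bin, t) over the 65 bins 0..64."""
    return sum(spec_sqrt)


def snr_avg(e_den, e_noise):
    """Smoothed SNR in dB from energies of frames t, t-1, t-2 (index 0 = current)."""
    ratio = (e_den[0] * e_den[1] * e_den[2]) / (e_noise[0] * e_noise[1] * e_noise[2])
    if ratio > 0.0001:
        return 20.0 / 3.0 * math.log10(ratio)
    return -100.0 / 3.0
```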
Gain Factorization
• To decide the degree of aggression, the SNR is tracked:
If {(SNRavg(t) − SNRlow-track(t−1)) < 10 or t < 10}
    calculate λSNR(t)
    SNRlow-track(t) = λSNR(t) × SNRlow-track(t−1) + (1 − λSNR(t)) × SNRavg(t)
Else
    SNRlow-track(t) = SNRlow-track(t−1)
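The low-SNR tracker is a gated recursive average. The slides do not give the rule for λSNR(t), so in this sketch it is an argument the caller must supply; the function name is hypothetical.

```python
def track_low_snr(snr_avg_t, snr_low_prev, t, lam_snr):
    """SNR_low-track(t): follow SNRavg only when it is not far above the tracked level."""
    if snr_avg_t - snr_low_prev < 10 or t < 10:
        return lam_snr * snr_low_prev + (1.0 - lam_snr) * snr_avg_t
    return snr_low_prev  # large jumps (likely speech) leave the tracker unchanged
```

The gate makes the tracker follow the noise floor: frames whose SNR jumps 10 dB or more above it, typically speech, are ignored once the start-up period is over.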
Gain Factorization
• Gain factorization applies more aggressive noise reduction to purely noisy frames and less to frames containing speech.
• The aggression coefficient takes on a value of 10% for speech + noise frames and 80% for noise frames.
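The slides give only the two aggression values, not the formula that applies them. One simple way to realise an aggression coefficient α is to blend each Mel-warped gain with unity: H_GF(k) = (1 − α) + α·H(k). This blend is an assumption here, as are the boolean noise/speech input and the function name.

```python
def factorize_gains(h_mel, noise_frame):
    """Blend gains toward 1 (assumed form): alpha = 0.8 for noise, 0.1 for speech + noise."""
    alpha = 0.8 if noise_frame else 0.1
    return [(1.0 - alpha) + alpha * h for h in h_mel]
```

With α = 0.1 a gain of 0 becomes 0.9 (almost no extra suppression), while with α = 0.8 it becomes 0.2, so noise-only frames are attenuated much more strongly.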
Apply Filter
• The causal impulse response is obtained, truncated and weighted by a Hanning window.
• The input signal is filtered with the filter impulse response to produce the noise-reduced signal.
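A sketch of the filter-application step. It relies on h_WF(n) being a cosine sum and therefore even in n, so the response is mirrored around a centre tap and delayed to make it causal. It then truncates to an assumed length of 17 taps, applies a Hanning window, and filters a buffered block. The tap count, the zero samples outside the block and the function names are all assumptions.

```python
import math


def causal_filter(h, taps=17):
    """Mirror, truncate and window the impulse response h(0..K-1) to `taps` taps."""
    half = taps // 2
    # h is even in n (cosine sum), so h(-n) = h(n); delaying by `half` makes it causal.
    trunc = [h[abs(n - half)] for n in range(taps)]
    # Hanning window of length `taps`, same form as the analysis window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / taps) for n in range(taps)]
    return [c * w for c, w in zip(trunc, window)]


def apply_filter(x, h_win):
    """Filter the block x, removing the `half`-sample delay so y stays aligned with x."""
    half = len(h_win) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, c in enumerate(h_win):
            m = n + half - i  # uses up to `half` future samples from the buffer
            if 0 <= m < len(x):  # samples outside the block are treated as zero
                acc += c * x[m]
        y.append(acc)
    return y
```

The look-ahead of `half` samples is available because frames are buffered before filtering; an all-pass response (a unit impulse) passes the block through unchanged.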
Offset Compensation
• A filter is used to remove the DC offset over the frame length
interval (80 samples).
$s_{nr\_of}(n) = s_{nr}(n) - s_{nr}(n - 1) + (1 - 1/1024)\, s_{nr\_of}(n - 1)$
where s_nr is the noise-reduced signal and s_nr_of is the offset-compensated output.
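The offset-compensation filter is a first-order DC-blocking recursion, so it can be sketched in a few lines. Zero initial state is an assumption; a streaming version would carry `prev_in` and `prev_out` from one 80-sample frame to the next. The function name is hypothetical.

```python
def remove_dc(s_nr, prev_in=0.0, prev_out=0.0):
    """s_nr_of(n) = s_nr(n) - s_nr(n-1) + (1 - 1/1024) * s_nr_of(n-1)."""
    out = []
    for x in s_nr:
        y = x - prev_in + (1.0 - 1.0 / 1024.0) * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

A constant input produces an output that decays by a factor of 1023/1024 per sample, so a DC offset dies out after a few thousand samples while speech frequencies pass almost unchanged.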
Results
Noisy test file:
After de-noise:
Results
Footloose:
Not Footloose:
Results: why didn’t this work?
Hair dryer:
Still there?!?!:
Results
Hair dryer:
Gone: