Digital Audio Signal Processing Lecture-3 Noise Reduction Marc Moonen Dept. E.E./ESAT-STADIUS, KU Leuven [email protected] homes.esat.kuleuven.be/~moonen


Digital Audio Signal Processing Version 2015-2016 Lecture-3: Noise Reduction p. 2

Overview

• Spectral subtraction for single-microphone noise reduction
  – Single-microphone noise reduction problem
  – Spectral subtraction basics (= spectral filtering)
  – Features: gain functions, implementation, musical noise, …
  – Iterative Wiener filter based on speech signal model

• Multi-channel Wiener filter for multi-microphone noise reduction
  – Multi-microphone noise reduction problem
  – Multi-channel Wiener filter (= spectral + spatial filtering)

• Kalman filter based noise reduction
  – Kalman filters
  – Kalman filters for noise reduction


Single-Microphone Noise Reduction Problem

• Microphone signal is
  $$y[k] = s[k] + n[k]$$
  with $s[k]$ the desired signal contribution and $n[k]$ the noise contribution

• Goal: Estimate s[k] based on y[k]

• Applications: Speech enhancement in conferencing, handsfree telephony, hearing aids, …; digital audio restoration

• Will consider speech applications: s[k] = speech signal

[Figure: the desired signal s[k] and the noise signal(s) are picked up by the microphone as y[k]; the desired signal estimate $\hat{s}[k]$ is to be computed from y[k]]


Spectral Subtraction Methods: Basics

• The signal $y[k] = s[k] + n[k]$ is chopped into `frames’ (e.g. 10..20 msec); for each frame a frequency-domain representation is
  $$Y_i(\omega) = S_i(\omega) + N_i(\omega) \qquad \text{(i-th frame)}$$

• However, the speech signal is an on/off signal, hence
  – some frames have speech + noise, i.e. $Y_i(\omega) = S_i(\omega) + N_i(\omega)$
  – some frames have noise only, i.e. for frame $i \in \{$`noise-only’ frames$\}$: $Y_i(\omega) = 0 + N_i(\omega)$

• A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties, …)


Spectral Subtraction Methods: Basics

• Definition: $\mu(\omega)$ = average amplitude of noise spectrum
  $$\mu(\omega) = E\{|N_i(\omega)|\}$$

• Assumption: noise characteristics change slowly, hence estimate $\mu(\omega)$ by (long-time) averaging over (M) noise-only frames
  $$\hat{\mu}(\omega) = \frac{1}{M} \sum_{\text{noise-only frames}} |Y_i(\omega)|$$

• Estimate clean speech spectrum $S_i(\omega)$ (for each frame), using corrupted speech spectrum $Y_i(\omega)$ (for each frame, i.e. short-time estimate) + estimated $\hat{\mu}(\omega)$:
  $$\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$$
  based on `gain function’
  $$G_i(\omega) = f(\hat{\mu}(\omega), Y_i(\omega))$$
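The long-time noise estimate can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the frame spectra are already available as a 2-D array (frames × frequency bins) and that a speech detector has already flagged the noise-only frames; the function name and argument layout are hypothetical.

```python
import numpy as np

def estimate_noise_magnitude(frame_spectra, is_noise_only):
    """Average |Y_i(w)| over the frames flagged as noise-only.

    frame_spectra : complex array, shape (n_frames, n_bins)
    is_noise_only : boolean array, shape (n_frames,), output of some VAD
    """
    Y = np.asarray(frame_spectra)
    mask = np.asarray(is_noise_only, dtype=bool)
    if not mask.any():
        raise ValueError("at least one noise-only frame is needed")
    # mu_hat(w) = (1/M) * sum over noise-only frames of |Y_i(w)|
    return np.abs(Y[mask]).mean(axis=0)
```

The returned vector has one entry per frequency bin and plays the role of $\hat{\mu}(\omega)$ in the gain functions.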


Spectral Subtraction: Gain Functions



• Example 1: Ephraim-Malah Suppression Rule (EMSR)
  (the gain formula, which involves modified Bessel functions, is skipped here)

• This corresponds to an MMSE (*) estimation of the speech spectral amplitude $|S_i(\omega)|$ based on the observation $Y_i(\omega)$ (estimate equal to $E\{\, |S_i(\omega)| \;|\; Y_i(\omega) \,\}$), assuming Gaussian a priori distributions for $S_i(\omega)$ and $N_i(\omega)$ [Ephraim & Malah 1984].

• Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985].

(*) minimum mean squared error



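• Example 2: Magnitude Subtraction
  – Signal model:
    $$Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)}$$
  – Estimation of clean speech spectrum:
    $$\hat{S}_i(\omega) = \left[\, |Y_i(\omega)| - \hat{\mu}(\omega) \,\right] e^{j\theta_{y,i}(\omega)} = \underbrace{\left[ 1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|} \right]}_{G_i(\omega)} Y_i(\omega)$$
  – PS: half-wave rectification
    $$G_i(\omega) \leftarrow \max(0, G_i(\omega))$$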
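The magnitude subtraction gain with half-wave rectification can be written as a short NumPy function. This is a sketch only; the function name and the small constant `eps` (which guards against division by zero in empty bins) are assumptions, not part of the slides.

```python
import numpy as np

def magnitude_subtraction_gain(Y, mu_hat, eps=1e-12):
    """G_i(w) = max(0, 1 - mu_hat(w) / |Y_i(w)|) for every frame and bin.

    Y      : complex array of frame spectra, shape (n_frames, n_bins) or (n_bins,)
    mu_hat : estimated average noise magnitude per bin
    """
    magnitude = np.maximum(np.abs(Y), eps)   # avoid division by zero
    gain = 1.0 - mu_hat / magnitude          # plain magnitude subtraction
    return np.maximum(gain, 0.0)             # half-wave rectification
```

Multiplying the gain with `Y` keeps the noisy phase and only reduces the magnitude, as in the formula above.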



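• Example 3: Wiener Estimation
  – Linear MMSE estimation: find linear filter $G_i(\omega)$ to minimize the MSE
    $$E\{\, | S_i(\omega) - \underbrace{G_i(\omega)\, Y_i(\omega)}_{\hat{S}_i(\omega)} |^2 \,\}$$
  – Solution:
    $$G_i(\omega) = \frac{E\{S_i(\omega)\, Y_i^*(\omega)\}}{E\{Y_i(\omega)\, Y_i^*(\omega)\}} = \frac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)}$$
    (numerator: cross-correlation in i-th frame; denominator: auto-correlation in i-th frame)
    Assume speech s[k] and noise n[k] are uncorrelated, then...
    $$G_i(\omega) = \frac{P_{ss,i}(\omega)}{P_{yy,i}(\omega)} = \frac{P_{yy,i}(\omega) - P_{nn,i}(\omega)}{P_{yy,i}(\omega)} \approx \frac{|Y_i(\omega)|^2 - \hat{\mu}^2(\omega)}{|Y_i(\omega)|^2} = 1 - \frac{\hat{\mu}^2(\omega)}{|Y_i(\omega)|^2}$$
  – PS: half-wave rectification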
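The step hidden behind "uncorrelated, then..." can be spelled out as follows. This is a sketch of the standard argument, using only the signal model $Y_i = S_i + N_i$ and the assumption $E\{S_i(\omega) N_i^*(\omega)\} = 0$; the frequency argument is dropped for brevity.

```latex
\begin{aligned}
P_{sy,i} &= E\{S_i\,(S_i + N_i)^*\} = E\{|S_i|^2\} + E\{S_i N_i^*\} = P_{ss,i} \\
P_{yy,i} &= E\{|S_i + N_i|^2\} = P_{ss,i} + P_{nn,i} + 2\,\mathrm{Re}\,E\{S_i N_i^*\} = P_{ss,i} + P_{nn,i} \\
G_i &= \frac{P_{sy,i}}{P_{yy,i}} = \frac{P_{ss,i}}{P_{ss,i} + P_{nn,i}} = \frac{P_{yy,i} - P_{nn,i}}{P_{yy,i}}
\end{aligned}
```

Replacing $P_{yy,i}$ by the short-time estimate $|Y_i|^2$ and $P_{nn,i}$ by the long-time estimate $\hat{\mu}^2$ then gives a gain that can be computed from the noisy signal alone.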


Spectral Subtraction: Implementation

→ Short-time Fourier Transform (= uniform DFT-modulated analysis filter bank)
  $Y[n,i]$ = estimate for $Y(\omega_n)$ at time i (i-th frame)
  N = number of frequency bins (channels), n = 0..N-1
  D = downsampling factor
  K = frame length
  h[k] = length-K analysis window (= prototype filter)

→ frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2D..3D)

→ subband processing:
  $$\hat{S}[n,i] = G[n,i] \cdot Y[n,i]$$

→ synthesis bank: matched to analysis bank (see DSP-CIS)

[Figure: y[k] → short-time analysis → Y[n,i] → gain functions (using $\hat{\mu}[n,i]$) → $\hat{S}[n,i]$ → short-time synthesis → $\hat{s}[k]$]
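The analysis / gain / synthesis chain can be sketched with plain NumPy. The choices below are assumptions made for illustration only: a square-root periodic Hann window used for both analysis and synthesis, N = K bins, 50% overlap (D = K/2), magnitude subtraction as gain function, and the first few frames taken as noise-only instead of running a speech detector.

```python
import numpy as np

def stft(y, K=256, D=128):
    """Short-time analysis: length-K frames, hop D, sqrt-Hann window h[k]."""
    h = np.sqrt(np.hanning(K + 1)[:-1])            # periodic sqrt-Hann prototype
    n_frames = 1 + (len(y) - K) // D
    frames = np.stack([y[i * D:i * D + K] * h for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), h          # Y[i, n]: frame i, bin n

def istft(Y, h, D=128):
    """Short-time synthesis by weighted overlap-add, matched to stft()."""
    n_frames, K = Y.shape[0], len(h)
    out = np.zeros((n_frames - 1) * D + K)
    frames = np.fft.irfft(Y, n=K, axis=1) * h      # synthesis window = analysis window
    for i in range(n_frames):
        out[i * D:i * D + K] += frames[i]
    return out

def spectral_subtraction(y, n_noise_frames, K=256, D=128):
    """Magnitude subtraction; the first n_noise_frames frames are assumed noise-only."""
    Y, h = stft(y, K, D)
    mu_hat = np.abs(Y[:n_noise_frames]).mean(axis=0)           # long-time noise estimate
    G = np.maximum(1.0 - mu_hat / np.maximum(np.abs(Y), 1e-12), 0.0)
    return istft(G * Y, h, D)                                   # S_hat[n,i] = G[n,i] * Y[n,i]
```

With this window and hop, the squared windows of overlapping frames add up to one, so a unit gain reconstructs the input exactly except for the first and last D samples, which are covered by a single frame only.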


Spectral Subtraction: Musical Noise

• Audio demo: car noise

• Artifact: musical noise

What? Short-time estimates of $|Y_i(\omega)|$ fluctuate randomly in noise-only frames, resulting in random gains $G_i(\omega)$
→ statistical analysis shows that broadband noise is transformed into a signal composed of short-lived tones with randomly distributed frequencies (= musical noise)

[Figure: y[k] → magnitude subtraction → $\hat{s}[k]$]


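Spectral Subtraction: Musical Noise

Solutions?
- Magnitude averaging: replace $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames
  $$\hat{S}_i(\omega) = \underbrace{G_i(\omega)}_{\text{average}} \; \underbrace{Y_i(\omega)}_{\text{instantaneous}}$$
- EMSR (p7)
- Augment $G_i(\omega)$ with soft-decision VAD:
  $G_i(\omega) \rightarrow P(H_1 \,|\, Y_i(\omega)) \cdot G_i(\omega)$, with $P(H_1 \,|\, Y_i(\omega))$ the probability that speech is present, given the observation
- …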
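Magnitude averaging can be sketched as follows: the gain is computed from a moving average of $|Y_i(\omega)|$ over neighbouring frames and is then applied to the instantaneous spectrum. The centered window of odd length `L`, the edge handling and the function names are assumptions chosen for illustration.

```python
import numpy as np

def smooth_over_frames(magnitude, L=5):
    """Centered moving average over L frames (L odd), first/last frames replicated."""
    pad = L // 2
    padded = np.pad(magnitude, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(L) / L
    return np.apply_along_axis(
        lambda column: np.convolve(column, kernel, mode="valid"), 0, padded)

def gain_with_magnitude_averaging(Y, mu_hat, L=5):
    """Magnitude subtraction gain computed from locally averaged |Y_i(w)|."""
    averaged = smooth_over_frames(np.abs(Y), L)
    return np.maximum(1.0 - mu_hat / np.maximum(averaged, 1e-12), 0.0)
```

The enhanced spectrum is then `gain_with_magnitude_averaging(Y, mu_hat) * Y`. In noise-only frames the averaged magnitudes fluctuate less than the instantaneous ones, so the gains fluctuate less as well, which is what reduces the musical noise.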


Spectral Subtraction: Iterative Wiener Filter

Example of signal model-based spectral subtraction…

• Basic: Wiener filtering based spectral subtraction (p.9), with (improved) spectra estimation based on parametric models

• Procedure:
  1. Estimate parameters of a speech model from noisy signal y[k]
  2. Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, p. 9)
  3. Re-estimate parameters of speech model from the speech signal estimate
  4. Iterate 2 & 3



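[Figure: speech production model. The excitation u[k] is a pulse train (with a given pitch period) for voiced speech, or the output of a white noise generator for unvoiced speech; it is scaled by $g_s$ and passed through the all-pole filter $1 / (1 - \sum_{m=1}^{M} \alpha_m e^{-jm\omega})$ to produce the speech signal]

frequency domain:
$$S(\omega) = \frac{g_s}{1 - \sum_{m=1}^{M} \alpha_m e^{-jm\omega}}\, U(\omega)$$

time domain:
$$s[k] = \sum_{m=1}^{M} \alpha_m\, s[k-m] + g_s\, u[k]$$

$\boldsymbol{\alpha} = [\alpha_1 \; \ldots \; \alpha_M]^T$ = linear prediction parameters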
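The all-pole model can be made concrete with a small NumPy sketch that synthesizes a signal from the time-domain recursion and then re-estimates the linear prediction parameters. The autocorrelation (Yule-Walker) method is used here as one common choice; the slides do not prescribe a particular estimator, and the function names are hypothetical.

```python
import numpy as np

def synthesize_ar(alpha, g_s, u):
    """s[k] = sum_m alpha_m s[k-m] + g_s u[k], with zero initial conditions."""
    M = len(alpha)
    s = np.zeros(len(u))
    for k in range(len(u)):
        for m in range(1, min(M, k) + 1):
            s[k] += alpha[m - 1] * s[k - m]
        s[k] += g_s * u[k]
    return s

def estimate_ar_parameters(s, M):
    """Yule-Walker estimate of alpha_1..alpha_M and of the gain g_s."""
    s = np.asarray(s, dtype=float)
    # biased autocorrelation estimates r[0..M]
    r = np.array([np.dot(s[:len(s) - m], s[m:]) for m in range(M + 1)]) / len(s)
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    alpha = np.linalg.solve(R, r[1:])          # normal equations R alpha = r
    g_s = np.sqrt(r[0] - alpha @ r[1:])        # prediction error power = g_s^2
    return alpha, g_s
```

The gain estimate relies on the excitation u[k] having unit variance, so it fits the unvoiced (white noise) branch of the model most directly.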



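For each frame (vector) y[m] (i = iteration nr.):

1. Estimate $g_{s,i}$ and $\boldsymbol{\alpha}_i = [\alpha_{1,i} \; \ldots \; \alpha_{M,i}]^T$

2. Construct Wiener Filter (p.9)
   $$G(\omega) = \frac{P_{ss}(\omega)}{P_{ss}(\omega) + P_{nn}(\omega)}$$
   with:
   • $P_{nn}(\omega)$ estimated during noise-only periods
   • $$P_{ss}(\omega) = \frac{g_{s,i}^2}{\left| 1 - \sum_{m=1}^{M} \alpha_{m,i}\, e^{-jm\omega} \right|^2}$$

3. Filter speech frame y[m] to obtain $\hat{s}_i[m]$

Repeat until some error criterion is satisfied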
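Step 2 of the procedure, building the Wiener gain from the current model parameters, can be sketched as below. Only this step is shown; parameter estimation and the iteration loop are left out. The frequency grid on [0, π] and the function names are assumptions made for the example.

```python
import numpy as np

def ar_psd(alpha, g_s, n_freq=257):
    """P_ss(w) = g_s^2 / |1 - sum_m alpha_m exp(-j m w)|^2 on a grid over [0, pi]."""
    omega = np.linspace(0.0, np.pi, n_freq)
    m = np.arange(1, len(alpha) + 1)
    denominator = 1.0 - np.exp(-1j * np.outer(omega, m)) @ np.asarray(alpha, dtype=float)
    return g_s ** 2 / np.abs(denominator) ** 2

def wiener_gain(p_ss, p_nn):
    """G(w) = P_ss(w) / (P_ss(w) + P_nn(w))."""
    return p_ss / (p_ss + p_nn)
```

In an iterative scheme, `ar_psd` would be re-evaluated with the parameters re-estimated from the latest speech estimate, while `p_nn` stays fixed at its noise-only estimate.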




Multi-Microphone Noise Reduction Problem

[Figure: a speech source s[k] and noise source(s) n[k] are picked up by M microphones (M = number of microphones); each microphone signal consists of a speech part and a noise part, and (some) speech estimate $\hat{s}[k]$ is to be computed from the microphone signals]

Will estimate the speech part in microphone 1 (*) (**), i.e. $\hat{s}_1[k]$

(*) Estimating s[k] is more difficult, would include dereverberation...
(**) This is similar to the single-microphone model (p.3), where additional microphones (m=2..M) help to get a better estimate



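• Data model:
  $$\mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \begin{bmatrix} H_1(\omega) \\ H_2(\omega) \\ \vdots \\ H_M(\omega) \end{bmatrix} S(\omega) + \begin{bmatrix} N_1(\omega) \\ N_2(\omega) \\ \vdots \\ N_M(\omega) \end{bmatrix}$$
  $$\mathbf{Y}(\omega) = \mathbf{d}(\omega)\, S(\omega) + \mathbf{N}(\omega) = \mathbf{S}(\omega) + \mathbf{N}(\omega)$$

  See Lecture-2 on multi-path propagation, with q left out for conciseness.

  $H_m(\omega)$ is the complete transfer function from the speech source position to the m-th microphone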


Multi-Channel Wiener Filter (MWF)

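• Data model:
  $$\mathbf{Y}(\omega) = \underbrace{\mathbf{d}(\omega)\, S(\omega)}_{\mathbf{S}(\omega)} + \mathbf{N}(\omega)$$

• Will use linear filters to obtain speech estimate (as in Lecture-2)
  $$\hat{S}_1(\omega) = \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega) = \sum_{m=1}^{M} F_m^*(\omega)\, Y_m(\omega)$$

• Wiener filter (= linear MMSE approach)
  $$\min_{\mathbf{F}(\omega)} E\{\, | S_1(\omega) - \mathbf{F}^H(\omega)\, \mathbf{Y}(\omega) |^2 \,\}$$

  Note that (unlike in DSP-CIS) the `desired response’ signal $S_1(\omega)$ is unknown here (!), hence the solution will be `unusual’…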



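• Wiener filter solution is (see DSP-CIS)
  $$\mathbf{F}(\omega) = \underbrace{E\{\mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega)\}^{-1}}_{\text{autocorrelation}} \cdot \underbrace{E\{\mathbf{Y}(\omega)\, S_1^*(\omega)\}}_{\text{crosscorrelation}}$$
  $$= \ldots \quad (\text{with } E\{\mathbf{S}(\omega)\, \mathbf{N}^H(\omega)\} = 0)$$
  $$= E\{\mathbf{Y}(\omega)\, \mathbf{Y}^H(\omega)\}^{-1} \cdot \left( E\{\mathbf{Y}(\omega)\, Y_1^*(\omega)\} - E\{\mathbf{N}(\omega)\, N_1^*(\omega)\} \right)$$
  where $E\{\mathbf{Y}\mathbf{Y}^H\}$ and $E\{\mathbf{Y} Y_1^*\}$ are computed during speech+noise periods and $E\{\mathbf{N} N_1^*\}$ is computed during noise-only periods

  – All quantities can be computed!
  – Special case of this is the single-channel Wiener filter formula on p.9
  – In practice, use alternative to ‘subtraction’ operation (see literature)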
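For a single frequency bin, the MWF can be estimated from data by replacing the expectations with sample averages. The sketch below assumes the STFT coefficients of that bin are stacked as arrays of shape (frames × microphones), and that the split into speech+noise and noise-only frames has already been made by a voice activity detector; the function name is hypothetical.

```python
import numpy as np

def mwf(Y_speech_noise, N_noise_only):
    """MWF for one frequency bin from sample correlation estimates.

    Y_speech_noise : complex array (n_frames, M), frames with speech + noise
    N_noise_only   : complex array (n_frames, M), frames with noise only
    Returns F such that S1_hat = F^H Y.
    """
    # sample estimates of E{Y Y^H} and E{N N^H}
    Ryy = Y_speech_noise.T @ Y_speech_noise.conj() / len(Y_speech_noise)
    Rnn = N_noise_only.T @ N_noise_only.conj() / len(N_noise_only)
    # first columns are E{Y Y_1^*} and E{N N_1^*}
    return np.linalg.solve(Ryy, Ryy[:, 0] - Rnn[:, 0])
```

The estimate for a set of frames is obtained as `Y @ F.conj()`, which evaluates $\mathbf{F}^H \mathbf{Y}$ for every row. The plain subtraction of the two correlation estimates is the simplest option and is the operation for which the slide recommends alternatives in practice.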


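• Note that… MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in single-channel noise reduction)

  if
  $$\mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \underbrace{\mathbf{d}(\omega)}_{\text{steering vector}} S(\omega) + \underbrace{\mathbf{N}(\omega)}_{\text{noise vector}}$$
  $$E\{\mathbf{N}(\omega)\, \mathbf{N}^H(\omega)\} = \boldsymbol{\Phi}_{NN}(\omega)$$
  then…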



…then it can be shown that
$$\mathbf{F}(\omega) = \underbrace{\boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega)}_{①} \cdot \underbrace{\frac{E\{|S(\omega)|^2\}\, H_1^*(\omega)}{E\{|S(\omega)|^2\}\; \mathbf{d}^H(\omega)\, \boldsymbol{\Phi}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega) + 1}}_{② \;\text{(scalar)}}$$

① represents a spatial filtering (*)
  Compare to superdirective & delay-and-sum beamforming (Lecture-2)
  – Delay-and-sum beamforming maximizes array gain in a white noise field
  – Superdirective beamforming maximizes array gain in a diffuse noise field
  – MWF maximizes array gain in an unknown (!) noise field
  MWF is operated without invoking any prior knowledge (steering vector/noise field)! (the secret is in the voice activity detection… (explain))

② represents an additional `spectral post-filter’, i.e. single-channel Wiener estimate (p.9), applied to the output signal of the spatial filter (prove it!)

(*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency)
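The factorization into a spatial filter and a scalar post-filter can be checked numerically before attempting the proof. The sketch below uses exact (not estimated) statistics with arbitrary random values: `P_s` stands for the speech power $E\{|S(\omega)|^2\}$, `d[0]` for $H_1(\omega)$, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)        # steering vector
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_nn = B @ B.conj().T + np.eye(M)                              # Hermitian, positive definite
P_s = 2.5                                                        # speech power

# Direct Wiener solution: F = E{Y Y^H}^{-1} E{Y S_1^*}
Ryy = P_s * np.outer(d, d.conj()) + Phi_nn
F_direct = np.linalg.solve(Ryy, P_s * d * np.conj(d[0]))

# Factored form: spatial filter (1) times scalar post-filter (2)
w_spatial = np.linalg.solve(Phi_nn, d)                           # Phi_NN^{-1} d
quad = np.real(d.conj() @ w_spatial)                             # d^H Phi_NN^{-1} d
post_filter = P_s * np.conj(d[0]) / (P_s * quad + 1.0)
F_factored = w_spatial * post_filter

print(np.allclose(F_direct, F_factored))
```

The agreement follows from the matrix inversion lemma applied to the rank-1 update $\boldsymbol{\Phi}_{NN} + E\{|S|^2\}\, \mathbf{d}\mathbf{d}^H$, which is also the key step of the proof.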


Multi-Channel Wiener Filter: Implementation

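• Implementation with short-time Fourier transform: see p.10

• Implementation with time-domain linear filtering:

  [Figure: speech source s[k] and noise source(s) n[k] are picked up by M microphones; the microphone signals are filtered and summed to estimate the speech part $s_1[k]$ of the first microphone signal $y_1[k]$]

  $$\min_{\mathbf{f}} E\{\, | s_1[k] - \mathbf{y}^T[k]\, \mathbf{f} |^2 \,\}$$
  with stacked filter coefficients and stacked input vectors
  $$\mathbf{f} = \begin{bmatrix} \mathbf{f}_1^T & \mathbf{f}_2^T & \ldots & \mathbf{f}_M^T \end{bmatrix}^T \qquad \mathbf{y}[k] = \begin{bmatrix} \mathbf{y}_1^T[k] & \mathbf{y}_2^T[k] & \ldots & \mathbf{y}_M^T[k] \end{bmatrix}^T$$

  Solution is…
  $$\mathbf{f} = E\{\mathbf{y}[k]\, \mathbf{y}^T[k]\}^{-1} \cdot E\{\mathbf{y}[k]\, s_1[k]\}$$
  $$= E\{\mathbf{y}[k]\, \mathbf{y}^T[k]\}^{-1} \cdot \left( E\{\mathbf{y}[k]\, y_1[k]\} - E\{\mathbf{n}[k]\, n_1[k]\} \right)$$
  where $E\{\mathbf{y}\mathbf{y}^T\}$ and $E\{\mathbf{y}\, y_1\}$ are computed during speech+noise periods and $E\{\mathbf{n}\, n_1\}$ is computed during noise-only periods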
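The time-domain variant can be sketched in the same way, with each microphone contributing a length-L delay line to the stacked vector y[k]. The array layout (microphones × samples), the filter length `L` and the function names are assumptions made for this example; the segmentation into speech+noise and noise-only periods is again assumed to be given.

```python
import numpy as np

def stack_delay_lines(mics, L):
    """Rows are y[k]^T = [y_1[k] .. y_1[k-L+1], ..., y_M[k] .. y_M[k-L+1]], k = L-1 .. end."""
    n_samples = mics.shape[1]
    columns = [mic[L - 1 - l:n_samples - l] for mic in mics for l in range(L)]
    return np.stack(columns, axis=1)

def time_domain_mwf(y_speech_noise, n_noise_only, L):
    """Stacked filter f from sample correlations of two recorded segments.

    y_speech_noise, n_noise_only : real arrays of shape (M, n_samples)
    """
    Y = stack_delay_lines(y_speech_noise, L)
    N = stack_delay_lines(n_noise_only, L)
    Ryy = Y.T @ Y / len(Y)                     # estimate of E{y[k] y[k]^T}
    Rnn = N.T @ N / len(N)                     # estimate of E{n[k] n[k]^T}
    # y_1[k] and n_1[k] are the first entries of the stacked vectors
    return np.linalg.solve(Ryy, Ryy[:, 0] - Rnn[:, 0])
```

The speech estimate for a segment is `stack_delay_lines(y, L) @ f`, aligned with the samples from index L-1 onwards.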


Overview



• Kalman filter based noise reduction
  – Kalman filters: see Lecture-6
  – Kalman filters for noise reduction

(Continue only after having studied Lecture-6)


Kalman filter for Speech Enhancement

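• Assume AR model of speech and noise, with y = microphone signal and u[k], w[k] = zero mean, unit variance, white noise

• Equivalent state-space model is…
  $$\mathbf{x}[k+1] = \mathbf{A}\, \mathbf{x}[k] + \mathbf{v}[k]$$
  $$y[k] = \mathbf{c}^T \mathbf{x}[k]$$

  with:
  $$\mathbf{A} = \begin{bmatrix} \mathbf{A}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_n \end{bmatrix}; \qquad \mathbf{G} = \begin{bmatrix} \mathbf{g}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_n \end{bmatrix}; \qquad \mathbf{c}^T = [\, \underbrace{0 \;\ldots\; 0 \;\; 1}_{M} \;\; \underbrace{0 \;\ldots\; 0 \;\; 1}_{N} \,]$$
  $$\mathbf{v}[k] = \mathbf{G} \cdot \begin{bmatrix} u[k] & w[k] \end{bmatrix}^T$$
  $$\mathbf{g}_s = \begin{bmatrix} 0 & \ldots & 0 & g_s \end{bmatrix}^T; \qquad \mathbf{g}_n = \begin{bmatrix} 0 & \ldots & 0 & g_n \end{bmatrix}^T$$
  $$\mathbf{Q} = \mathbf{G} \cdot \mathbf{G}^T$$

  s[k] and n[k] are included in the state vector, hence can be estimated by the Kalman filter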
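The state-space model and the resulting Kalman filter can be sketched as follows for known model parameters. Several details are assumptions, since the slides leave them implicit: $\mathbf{A}_s$ and $\mathbf{A}_n$ are taken as companion matrices of the two AR models, the noise AR coefficients are called `beta`, each sub-state holds the most recent samples with the newest one last, and the filter is started from a zero state with identity covariance.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix for x[k] = sum_m coeffs[m-1] x[k-m] + excitation,
    acting on the state [x[k-p+1], ..., x[k]]."""
    p = len(coeffs)
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)                         # shift older samples up
    A[-1, :] = np.asarray(coeffs, dtype=float)[::-1]   # AR recursion for newest sample
    return A

def build_state_space(alpha, g_s, beta, g_n):
    """Block-diagonal A, observation vector c and Q = G G^T."""
    M, N = len(alpha), len(beta)
    A = np.zeros((M + N, M + N))
    A[:M, :M] = companion(alpha)
    A[M:, M:] = companion(beta)
    G = np.zeros((M + N, 2))
    G[M - 1, 0] = g_s                                  # u[k] drives s[k]
    G[M + N - 1, 1] = g_n                              # w[k] drives n[k]
    c = np.zeros(M + N)
    c[M - 1] = 1.0                                     # picks s[k]
    c[M + N - 1] = 1.0                                 # picks n[k]
    return A, c, G @ G.T

def kalman_speech_estimate(y, A, c, Q, speech_index):
    """Filtered estimate s_hat[k|k] from y[k] = c^T x[k] (no extra measurement noise)."""
    x = np.zeros(len(c))
    P = np.eye(len(c))
    s_hat = np.zeros(len(y))
    for k, y_k in enumerate(y):
        x = A @ x                                      # time update
        P = A @ P @ A.T + Q
        Pc = P @ c
        gain = Pc / (c @ Pc)                           # Kalman gain
        x = x + gain * (y_k - c @ x)                   # measurement update
        P = P - np.outer(gain, Pc)
        s_hat[k] = x[speech_index]
    return s_hat
```

With `A, c, Q = build_state_space(alpha, g_s, beta, g_n)`, the speech estimate is `kalman_speech_estimate(y, A, c, Q, speech_index=len(alpha) - 1)`; the noise estimate sits in the last state entry.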



Iterative algorithm (details omitted)

[Figure: y[k] → split signal in frames → estimate parameters ⇄ Kalman Filter (iterations) → reconstruct signal → $\hat{s}[k]$]

Disadvantages of the iterative approach:
• complexity
• delay



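Sequential algorithm (details omitted)

[Figure: the iteration index is replaced by the time index (no iterations). A State Estimator (Kalman Filter) computes $\hat{s}[k|k]$, $\hat{n}[k|k]$ from y[k], and a Parameters Estimator (Kalman Filter) updates the model parameter estimates, e.g. $\hat{g}_s[k|k]$, $\hat{g}_n[k|k]$; the two estimators are coupled through unit delays D, so that each one uses the other's estimates from time k−1]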


CONCLUSIONS

• Single-channel noise reduction
  – Basic system is spectral subtraction
  – Only spectral filtering, hence can only exploit differences in spectra between noise and speech signal:
    • noise reduction at expense of speech distortion
    • achievable noise reduction may be limited

• Multi-channel noise reduction
  – Basic system is MWF
  – Provides spectral + spatial filtering (links with beamforming!)

• Iterative Wiener filter & Kalman filtering
  – Signal model based approach
