Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 1
May 14th, 2006 ICASSP 2006, Toulouse, France 1
Defeating Ambient Noise:Practical Approaches for Noise Reduction and Suppression
Ivan TashevMicrosoft Research
Redmond, USA
May 14th, 2006 ICASSP 2006, Toulouse, France 2
IntroductionWhy signal enhancement is important:
Reducing the ambient noise from the captured audio signal is crucial for providing good sound in modern computing systems, critical for the needs of real time communication and speech recognition.
Tutorial goal:To present the key theoretical aspects and share our practical experience in the area of noise suppression and reduction for application in sound capture and processing systems.
Target audience:Engineers and researchers working in the area of audio signal processing planning or building audio systems for sound capturing.
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 2
May 14th, 2006 ICASSP 2006, Toulouse, France 3
Introduction (2)
Noise suppression as science and as art:It is a science, because uses mathematical models and hypotheses, it is repeatable, i.e. we get the same results with the same input dataIt is an art, because it is about human perception of the sound and requires evaluation from a human
For speech signals the process is part of more general term speech enhancement
May 14th, 2006 ICASSP 2006, Toulouse, France 4
Defeating ambient noise: tutorial agenda
BasicsNoise suppressionDirectional microphonesMicrophone arraysAdvanced techniquesFree joke and conclusions
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 3
May 14th, 2006 ICASSP 2006, Toulouse, France 5
Basics
Noise: definition and propertiesSignal: definition and propertiesNoise suppression and reduction, speech enhancementAudio processing in frequency domain: weighting, transformation, synthesisBandpass filtering
May 14th, 2006 ICASSP 2006, Toulouse, France 6
Basics: noise properties
Statistical model: Zero mean Gaussian random processRight: airplane noise PDF vs. Gaussian PDF
In frequency domain: White noise spectrumPink noise: 6 dB/oct decreaseColored noise – with given spectrum Hoth noise: typical room noise model
Temporal characteristics: Pseudo stationary compared to speechSpecific noises may be different: wind noise
Spatial characteristics: Ambient, isotropic: evenly distributed Point noise sources - jammers
0 1000 2000 3000 4000 5000 6000 7000 8000-10
-5
0
5
10
15
20
25
30
35
Frequency, Hz
Spc
tral D
ensi
ty, d
B
Hoth noise
-4 -3 -2 -1 0 1 2 3 40
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04Probability distribution function
Times standard deviation
Pro
babi
lity
NoiseGauss
White Inside A320 NYC Café
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 4
May 14th, 2006 ICASSP 2006, Toulouse, France 7
Basics: signal properties
In most of the cases the signal is speechStatistical model (in long term):
Zero mean random Gaussian (Laplace, Gamma) process
Frequency domain (in short term):Voiced – e.g. vowels (harmonic structure) Unvoiced – e.g. fricatives (noise type)
Temporal: Speech and nonspeech segments
Spatial: Point sound source (mouth or loudspeaker)
-4 -3 -2 -1 0 1 2 3 40
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4Probability distribution function
Times standard deviation
Pro
babi
lity
SpeechGaussLaplaceGamma
May 14th, 2006 ICASSP 2006, Toulouse, France 8
Basics: classification
Noise suppression: removing the noise based on statistical models of the noise and signal, spectral subtractionNoise reduction or cancellation: removing the noise based on knowledge or estimation of the corrupting signalSignal (speech) enhancement: more general term for any type of processing aiming improving some property of the signalActive noise cancellation: decreasing the noise level in certain area by sending opposite phase sound with loudspeakers – not discussed in this tutorial
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 5
May 14th, 2006 ICASSP 2006, Toulouse, France 9
Basics: processing flow
Processing in frequency domainAudio frames:
80-1024 samples, 5-25 ms
Frequency domain transformations:Fourier (FFT): symmetric spectra, zero Fs/2 bin, process the first halfMCLT (Malvar, 1992): shifts bins ½ frequency binOther: Hartley, wavelet, cepstra; no re-synthesis
May 14th, 2006 ICASSP 2006, Toulouse, France 10
Basics: processing flow (2)
Overall process (typical):Extract the frame
Weighting TransformProcessInverse transformSynthesis (overlap-add) using ½ of the previous frame
Move one half frame forward, repeat
+ +. . .
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 6
May 14th, 2006 ICASSP 2006, Toulouse, France 11
Basics: processing flow (3)
Weighting function:Keeps the spectral peaks less smearedCommonly used:
Bartlett (triangle)Hann or Hanning (cos-shaped)Modified Hann – sqrt(cos)-shaped, to be applied twice
If re-synthesis is not requiredNatural, Bartlett, Parsen: sinc, sinc2
and sinc4 in frequency domainMax-Fauque-Bertier (sinc): rectangular in frequency domainBlackman and further generalization as Taylor sequence
-0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.80
0.2
0.4
0.6
0.8
1
Weight windows
Time
Wei
ght
BartlettHannModified Hann
May 14th, 2006 ICASSP 2006, Toulouse, France 12
Basics: bandpass filtering
Bandpass filtering:Do not process frequency bins below and above certain frequencies – zero themTypical low limit: 100-300 Hz for speechTypical high limit: 0.45Fs, reduces aliasingDynamic bandpass filtering
Measure SNR per binAdjust the low and high slopesApply the filter
No kidding!Increases speech intelligibilitySaves artifacts and distortionsSaves efforts and some CPU time
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 7
May 14th, 2006 ICASSP 2006, Toulouse, France 13
Basics: summary
Noise and signal properties: statistical, frequency, temporal, and spatialSuppression vs. reduction vs. enhancement vs. cancellationProcessing in frequency domain
Break in 50% overlapping frames – most commonWeighting function is important, sqrt(cos)-shaped most commonOverlap-add processing
Bandpass filtering: increases intelligibility, reduces artifacts and saves efforts
May 14th, 2006 ICASSP 2006, Toulouse, France 14
Noise suppressionGain based noise suppressiona priori and a posteriori SNRSuppression rulesML and Decision Directed approach for a priori SNR estimationUncertain presence of signalVoice activity detectorsAccounting for the temporal characteristicsOverall architectureDemos
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 8
May 14th, 2006 ICASSP 2006, Toulouse, France 15
Noise suppression: gain based processing
Given signal xn(t) and noise dn(t) mixed in yn(t)Observed in frequency domain, n-th frame, k-th frequency bin: Yk = Xk + DkNoise suppression:
Gk – time varying, non-negative, real value gain (or suppression rule)The estimator keeps the same phase as Yk: under Gaussian assumptions the best phase estimator is observed phase
The goal of noise suppression is for each frame to estimate Gk vector optimal in certain way
( ) .kk k k k k
k
YX G Y G YY
= =
May 14th, 2006 ICASSP 2006, Toulouse, France 16
Noise suppression: a priori and a posteriori SNR
Signal and noise: statistically independent Gaussian processesSignals variances a priori and a posteriori SNRs
The suppression rule is now function of two parameters:
( ), ( ), ( )X D Yk k kλ λ λ
( , )k k kG ξ γ
2( )( )
( )D
Y kk
kγ
λ( )( )( )
X
D
kkk
λξλ
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 9
May 14th, 2006 ICASSP 2006, Toulouse, France 17
Noise suppression:suppression rules
Wiener (1945):
MMSE spectral amplitude estimator
DerivationGoal Solution
Problems:Musical noises in the pausesDistortion in the speech segments
2
2
( ) ( ) ( )( ) 1 ( )1 ( )( )
DY k k kG k kkY k
λ ξγξ
−= = − =
+
{ }2ˆk kX Xε ⎡ ⎤−⎣ ⎦
2
2 2
( ) ( )( ) ( ) ( ) ( )( ) 1 1 ( )( ) ( ) ( ) ( )
DXY YY DD D
YY YY
Y k kP k P k P k kG k kP k P k Y k Y k
λ λ γ−−= = = = − = −
Musical noisesand distortions
May 14th, 2006 ICASSP 2006, Toulouse, France 18
Noise suppression:suppression rules
McAulay/Malpass (1980):
ML spectral amplitude estimator
Ephraim/Malah (1984):
Introduce a priori SNR
MMSE short term spectral amplitude estimator
Where:
2
2
( ) ( )1 1( )2 2 ( )
DY k kG k
Y kλ−
= +
0 1(1 ) exp2 2 2 2
k k k kk k k
k
G I Iπν ν ν νν νγ
⎡ ⎤ −⎛ ⎞ ⎛ ⎞ ⎛ ⎞= + +⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎢ ⎥⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎣ ⎦
-40-20
020
40
-40-20
020
40-40
-30
-20
-10
0
10
ζ , dB
Ephraim and Malah suppression rule
γ, dB
Sup
pres
sion
Gai
n, d
B
( )1
kk
kk ξν γ
ξ+
-40-20
020
40
-40-20
020
40-40
-30
-20
-10
0
ζ , dB
Wiener suppression rule
γ, dB
Sup
pres
sion
Gai
n, d
B
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 10
May 14th, 2006 ICASSP 2006, Toulouse, France 19
Noise suppression:suppression rules (2)Ephraim/Malah (1985):
MMSE short term log spectral amplitude estimator
Computational complexity of Ephraim and Malahsuppression rulesEfficient alternatives, P. Wolfe/S. Godsill (2001):
Joint Maximum A Posteriori Spectral Amplitude EstimatorMaximum A Posteriori Spectral Amplitude Estimator
MMSE Spectral Power Estimator:
Gaussian noise and Gamma speech distributions,Martin (2002)
11
k kk
k k
G ξ νξ γ
⎛ ⎞+= ⎜ ⎟+ ⎝ ⎠
1.exp1 2
k
tk
kk
eG dttν
ξξ
∞ −⎛ ⎞= ⎜ ⎟⎜ ⎟+ ⎝ ⎠
∫
May 14th, 2006 ICASSP 2006, Toulouse, France 20
Noise suppression:a priori SNR estimation
a priori SNR estimation:
ML approximation:
Decision-directed (Ephraim/Malah, 1984):
Noise variation estimationRequires signal/noise classification of the audio frames/binsIn non-signal frames/bins update the noise model:
2( ) ( 1) ( )( ) (1 ) ( ) ( )n n nD Dk k Y kλ β λ β−= − +
2( ) ( )ˆ( )( )
D
D
Y k kk
kλ
ξλ
−=
2( 1)( )
( 1)
ˆ ( )ˆ( ) (1 )max 0, ( ) 1 , [0,1)
( )
nn
nD
X kk k
kξ α α γ α
λ
−
−⎡ ⎤= + − − ∈⎣ ⎦
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 11
May 14th, 2006 ICASSP 2006, Toulouse, France 21
Noise suppression:uncertain presence of signal
McAulay/Malpass (1980)Observation Yk = Xk + Dk holds only if we have signal presentedReal case:
Modified MMSE suppression rule:
1
0
,,
k kk
k
X D with signal state HY
D just noise state H+
=
{ } { }1 1| ( | ). | ,k k k k kE X Y P H Y E X Y H=
May 14th, 2006 ICASSP 2006, Toulouse, France 22
Noise suppression:voice activity detectors
Energy based, binary decision
Track minimal energy
For classification apply threshold (2.5-7 Emin)Can be done per frame or per bin
Probabilistic based (Sohn et. all., 1999)Compute likelihood ratio:Apply hang-over scheme Result: signal presence probability vector (per bin)
See Martin (2001) as well
( )( )
2 2( 1) ( 1) ( 1)min min min
( )min
2 2( 1) ( 1) ( 1)min min min
upn n n
n
n n ndown
E E Y Y ETE
E E Y E YT
τ
τ
− − −
− − −
+ − >=
+ − >
1 exp1 1
k kk
k k
γ ξξ ξ
⎧ ⎫Λ = ⎨ ⎬+ +⎩ ⎭
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 12
May 14th, 2006 ICASSP 2006, Toulouse, France 23
Noise suppression:using temporal properties
Suppression rule estimators use only the current frame: artifacts, distortionsTemporal gain smoothing
Direct smoothing:
HMM based:
Practical interpolation:
( ) ( 1)(1 )n nk k kG G Gβ β−= − +
( 1)( ) 01 11
( 1)00 10
nn k
k knk
a a GG Ga a G
−
−
+=+
( ) ( 1)n nk k kG G G−=
May 14th, 2006 ICASSP 2006, Toulouse, France 24
Noise suppression:overall architecture
Non-observable
signal x(t)
noise d(t)
Observable
SFFT
VADUpdatenoisemodel
Computesuppression
rule
Final estimator
phase
magnitude
Corruptedsignal y(t)
iSFFT
noise model
presence probability vector
suppression rule
x(t) estimation
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 13
May 14th, 2006 ICASSP 2006, Toulouse, France 25
Noise suppression:practical tips and tricks
Limit: Suppression gains: keep above -60 dBProbabilities: [1e-4,0.9999]
Smooth (in time and/or frequency):Noise modelsGains
Simplify:Do not use more complex models than necessarySimpler model with more precise or faster parameters estimation usually works better
May 14th, 2006 ICASSP 2006, Toulouse, France 26
Noise suppression:demonstrations
Input file Wiener MMSE SPEMcAulay/Malpass Ephraim/Malah
10.722.5-44.8-22.2MMSE SPE
13.225.0-47.0-22.0Ephraim-Malah
2.614.4-36.0-21.6McAulay-Malpass
18.230.2-52.3-22.1Wiener filtered
11.8-33.3-21.5Not processed
ImprovementSNRNoiseSignalAlgorithm
Note: All measurement units are dB
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 14
May 14th, 2006 ICASSP 2006, Toulouse, France 27
Noise suppression:summary
Noise suppression as time varying, real value, non-negative gain (or suppression rule) based operationa priori and a posteriori SNRs estimation is essential – the decision-directed approachSignal may or may not be present – voice activity detectors are criticalEstimation of precise noise model is with high importanceSmoothing in time improves listening results
May 14th, 2006 ICASSP 2006, Toulouse, France 28
Directional microphones
Microphone typesPressure gradient microphoneParameters for directional microphonesFirst order directional microphonesClassification and parametersBottom line
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 15
May 14th, 2006 ICASSP 2006, Toulouse, France 29
Directional microphones:microphone types
Microphone is a device that converts the air pressure to a electric signalMicrophone types:
Carbon – in first phonesCrystal – piezoelectric effect basedDynamic – inverted loudspeakerCondenser – measurement grade micsElectret – the most common today
May 14th, 2006 ICASSP 2006, Toulouse, France 30
Directional microphones:pressure gradient microphone
Pressure microphoneConverts pressure to electric signalCan be designed as diaphragm in closed capsuleAcoustical monopole
Pressure gradient microphoneConverts the pressure difference into electric signalCan be designed as diaphragm in a open capsuleAcoustical dipole
acoustical monopole,omnidirectional
acoustical dipole,directional
closedcapsule
diaphragm
opencapsule diaphragm
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 16
May 14th, 2006 ICASSP 2006, Toulouse, France 31
Directional microphones:pressure gradient microphone (2)
Directivity pattern of pressure gradient microphone
Has figure-8 directivity patternFrequency response: 6 dB/octslope towards low frequencies
sound0 deg
sound90 deg
0.05
0.1
0.15
0.2
30
210
60
240
90
270
120
300
150
330
180 0
Directivity pattern at 1000 Hz
0 1000 2000 3000 4000 5000 6000 7000 80000
0.2
0.4
0.6
0.8
1
1.2
1.4Frequency response at 0 deg
cos( )( , ) 1 exp( 2 )dU f j f θθ πν
= − −
d = 9 mmυ = 342 m/sf = 0-8000 Hzө= 0-360O
May 14th, 2006 ICASSP 2006, Toulouse, France 32
Directional microphones:first order microphones
First order microphone as combination of delayed τ and subtracted two signals from two microphones at distance dDirectivity pattern
cos( )( , ) 1 exp 2
( , ) (1 )cos( )Norm
dU f j f
U f
θθ π τν
θ α α θ
⎛ ⎞⎛ ⎞= − − +⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠≈ + −
Omnidirectional and directionalmicrophones
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 17
May 14th, 2006 ICASSP 2006, Toulouse, France 33
Directional microphones:classification
Zero at 90 deg, acoustic dipole4.80.00figure 8
Highest DI, zeros at ± 109 deg6.00.25hypercardioid
Highest front-to-back ratio, zeros at ±125 deg5.7~0.35supercardioid
Zero at 180 deg4.80.50cardioid
No directivity0.01.00omnidirectional
NoteDIαType
cardioid supercardioid hypercardioid figure 8 (dipole)
0.2
0.4
0.6
0.8
1
30
210
60
240
90
270
120
300
150
330
180 0
Directivity: cardioid
0.2
0.4
0.6
0.8
1
30
210
60
240
90
270
120
300
150
330
180 0
Directivity: supercardioid
0.2
0.4
0.6
0.8
1
30
210
60
240
90
270
120
300
150
330
180 0
Directivity: hypercardioid
0.2
0.4
0.6
0.8
1
30
210
60
240
90
270
120
300
150
330
180 0
Directivity: figure 8
May 14th, 2006 ICASSP 2006, Toulouse, France 34
Directional microphones:parameters
Directivity patternDirectivity index Sensitivity, -45 dBV/Pa typicalSNR, 60 dB typicalFrequency response: front/back
10 2
0 0
( , , )( ) 10.log1 ( , , )
4
T TP fDI f
d d P fπ π
ϕ θ
θ ϕ ϕ θπ
⎛ ⎞⎜ ⎟⎜ ⎟
= ⎜ ⎟⎜ ⎟⋅⎜ ⎟⎜ ⎟⎝ ⎠
∫ ∫2
0( , , ) ( , ) , constantP f U f cϕ θ ρ ρ= = =
( , )U f c
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 18
May 14th, 2006 ICASSP 2006, Toulouse, France 35
Directional microphones:summary
In the Noise suppression section we learned that 6 dB noise suppression is a good achievementAn cardioid microphone gives 4.8 dB noise reduction without distortions and artifactsIn real systems design using directional microphones is importantThe microphone directivity pattern is further denoted as U(f,c), f – frequency, c – look-up direction { , , }c θ ϕ ρ=
May 14th, 2006 ICASSP 2006, Toulouse, France 36
Microphone arrays
Definition and typesDelay-and-sum beamformerTerminologyTime-invariant beamformers, demoSound source localizationAdaptive beamformersSpatial filtering, demo
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 19
May 14th, 2006 ICASSP 2006, Toulouse, France 37
Microphone arrays:definition and types
Set of synchronously sampled microphonesTypes:
linear, planar, 3Dcompact and largeuniform, nonuniform and random spacingnear field and far field
Advantage: allow spatial filtering, reducing the noises and reverberationDisadvantage: require more microphones and more processing time
May 14th, 2006 ICASSP 2006, Toulouse, France 38
Microphone arrays:delay-and-sum beamformer
The most intuitive approachShift the signals to align them and sumAdvantages:
Simple and efficientProblems:
Variable directivityBig sidelobesLow efficiency
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 20
May 14th, 2006 ICASSP 2006, Toulouse, France 39
Microphone arrays:terminology
Beamforming: making the microphone array to listen to given look-up directionBeamsteering: electronically change the look-up direction the microphone array listens toNullsteering: suppressing the sounds coming from given directionSound source localization: techniques to detect, localize and track one or multiple sound sources using microphone array
May 14th, 2006 ICASSP 2006, Toulouse, France 40
Microphone arrays:general parameters
Generalized form:M – number of microphonesXi(f) – spectrum of i-th channelW(f,i) – weight coefficients matrixY(f) – output signal
Parameters:Directivity pattern B:
Main Response Axis – direction towards max sensitivity, look-up directionBeamwidth: area -3 dB around MRA
1
0( ) ( , ) ( )
M
ii
Y f W f i X f−
=
= ∑
2
( , ) ( ) ( , ),
( , ) ( , )
m
H
c pj f
m
B f W f D f
eD f U f cc p
πν
θ θ
θ
−−
= ⋅
=−
maxθ
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 21
May 14th, 2006 ICASSP 2006, Toulouse, France 41
Microphone arrays:general parameters (2)
Ambient noise gain: isotropic noise reduction
Non-correlated (sensor) noise gain
Total noise gain: combination of the two above
The beamformer design is to find weight matrix to satisfy certain criteria & constrains
2 2
02
1( ) ( , , )4CH f B f d d
ππ
π
ϕ θ θ ϕπ
+
−
= ∫ ∫
12
0( ) ( , )
M
Ni
H f W f i−
=
= ∑
( ) ( )2 2
2
( ). ( ) ( ). ( )( )
( )C C N N
C
H f N f H f N fH f
N f+
=
May 14th, 2006 ICASSP 2006, Toulouse, France 42
Microphone arrays:time invariant beamformerDesign criteria:
Max noise suppression: highly non-linearReplaced with directivity pattern matching – reducing the optimization dimensionsIsotropic noise assumption
Constrains:Unit gain and zero phase shift towards MRAFrequently: in the beamwidth area
Two controversial trends: decreasing the ambient noise gain increases the non-correlated noise gain. Optimum? – Minimize the total gain
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 22
May 14th, 2006 ICASSP 2006, Toulouse, France 43
Microphone arrays:time invariant beamformer (2)
Superidirective beamformer (Cox, 1986)
is the power spectral density matrix of the input signals assuming isotropic noiseConstrained LMS algorithm, antenna arrayAchieves maximum directivityChu, 1997; Elko, 2000
min( ) 1H HXXW
W W subject to W DΦ =
XXΦ
May 14th, 2006 ICASSP 2006, Toulouse, France 44
Microphone arrays:time invariant beamformer (3)
0 2000 4000 6000 80000
5
10
15
20
25
Frequency (Hz)
Dire
ctiv
ity (l
inea
r)
SuperdirectiveDelay-and-sum
0 2000 4000 6000 8000-60
-50
-40
-30
-20
-10
0
10
Frequency (Hz)
Whi
te n
oise
gai
n (d
B)
SuperdirectiveDelay-and-sum
-20
-10
0
90
270
180 0
Superdirective beamformer (f=3000 Hz)
-20
-10
0
90
270
180 0
Delay-and-sum beamformer (f=3000 Hz)
Comparison:Delay and sum and Superdirective array
Simulation:5 element linear array,3 cm distance
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 23
May 14th, 2006 ICASSP 2006, Toulouse, France 45
Microphone arrays:time invariant beamformer (4)
Design example (Tashev/Malvar, 2005)Four element linear arrayBeamwidth vs. Frequency vs. Total Noise GainDirectivity pattern vs. FrequencyDirectivity pattern in 3D for 1000 Hz
Demonstrations:a) Parallel recordingb) Real-time SSL
May 14th, 2006 ICASSP 2006, Toulouse, France 46
Microphone arrays:time invariant beamformer (5)Advantages:
No VAD required Stable, reliable, predictable, measurableGuaranteed parametersFast switching to different speakerLow CPU requirement
Real-world problems: Requires Sound Source Localizer to find and track the desired sound sourceSensor’s & equipment’s noises limit the performance Microphones manufacturing tolerances:
Calibration during manufacturingAuto calibration during use (Tashev, 2004)
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 24
May 14th, 2006 ICASSP 2006, Toulouse, France 47
Microphone arrays:source localization
Time delay estimates basedCross-correlation functionWeighting: ML, PHAT (Knap/Carter, 1976)Combining the pairs
Brandstein et. all., 1996Burchfield et. all., 2001 – uses optimization, works in 2DRui/Florencio, 2003 – sum or cross-correlation functions towards hypothesis
Beamsteering basedCompute the output energy of set of beamsFind the maximumDo interpolation for increased precisionVariant: two dimensional search
May 14th, 2006 ICASSP 2006, Toulouse, France 48
Microphone arrays:source localization (2)
Problems: noise and reverberation Post-processing the raw SSL results
Particle filteringKalman filteringReal-time clustering
Camera-assisted approachFace detection softwareFusion SSL and video data
•Real SSL results: raw, post-processed, snapped to 10 degrees beams.•Two persons talking at 6 and -38 degrees, distance 12 feet, conference room.•Four element linear array.
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 25
May 14th, 2006 ICASSP 2006, Toulouse, France 49
Microphone arrays:adaptive algorithms
Frost algorithm (Frost, 1972)
is the power spectral density matrix of the input signalsGradient descent optimization, i.e. constrained LMS algorithmDesigned for antenna array
min( ) 1H HXXW
W W subject to W DΦ =
XXΦ
May 14th, 2006 ICASSP 2006, Toulouse, France 50
Microphone arrays:adaptive algorithms (2)
Generalized Side Lobe Canceller (Griffiths/Jim, 1982)Time-invariant beamformerNulls are sharper than beamsBlocking matrix – place null towards the sound sourceAdaptive filters to minimize residual in the beamformer output
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 26
May 14th, 2006 ICASSP 2006, Toulouse, France 51
Microphone arrays:adaptive algorithms (3)
AdvantagesUse fully the geometry under the specific noiseVery good with point noise sources No calibration required
Real-world problemsHigher requirement for CPU, memoryMore complex for implementationSlower adaptation and switching to next sound sourceNon-predictable and non-guaranteed parametersSimilar to fixed beamformers performance with ambient type of noise
May 14th, 2006 ICASSP 2006, Toulouse, France 52
Microphone arrays:non-linear spatial filtering
Implemented as non-linear post-processorBased on Instantaneous Direction Of Arrival (IDOA) estimation per bin
where Compute the probability and apply in the same way as in noise suppression under uncertain presence of signal
[ ]1 2 1( ) ( ), ( ), , ( )Mf f f fδ δ δ −∆ …
1 1( ) arg( ( )) arg( ( ))j jf X f X fδ − = −-2
02
-20
2
-2
0
2
-π ≤ δ1 ≤ +π
Phase differences at 750 Hz
-π ≤ δ2 ≤ +π
-π ≤
δ3 ≤
+π
MeasuredComputed
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 27
May 14th, 2006 ICASSP 2006, Toulouse, France 53
Microphone arrays:non-linear spatial filtering (2)
Generalized suppression with spatial information and known look-up directionDemo:
Recording conditions:Human speaker at 0 degrees, 1.5 mRadio at -45 degrees, 2 mOffice: normal noise and reverberationFour element linear microphone array
Same audio recording, two sequences:video: direction-frequency-power; audio: one microphonevideo: direction-power for SSL; audio: array output
May 14th, 2006 ICASSP 2006, Toulouse, France 54
Microphone arrays:non-linear spatial filtering (3)
AdvantagesBetter directivity than time-invariant beamformerGood source separationLow CPU overhead
Real-world problemsRequires channel matching, i.e. calibrationNon-linear processing (artifacts, musical noises) direction-time-power
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 28
May 14th, 2006 ICASSP 2006, Toulouse, France 55
Advanced techniques
Adaptive noise reductionPsychoacoustic based noise suppressorNoise suppressor optimized for speech recognitionNoise suppression with speech modelSpatial noise suppression
May 14th, 2006 ICASSP 2006, Toulouse, France 56
Advanced techniques:adaptive noise reduction
Add a microphone to capture the noise signal (HDD in a laptop, engine in a car)Two inputs system:
voice + noise: y(t)=x(t)+h(t)*z(t)noise only: z(t)
Use LMS, RLS or NLMS adaptive filterDouble talk detector necessary if leakage of x(t) in z(t)
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 29
May 14th, 2006 ICASSP 2006, Toulouse, France 57
Advanced techniques:adaptive noise reduction (2)
Advantages:Linear! No musical noises or distortionsWorks with non-stationary noisesLow CPU requirement
Real-world issuesNeeds a second microphoneLimited applicability: when we can capture the noise only signalHas some audible residuals and artifacts
May 14th, 2006 ICASSP 2006, Toulouse, France 58
Advanced techniques:psychoacoustic noise suppressor
Concept:More energy removed -> more musical noises and distortionsMasking effects in frequency and time domains in human perception of soundWhy remove noises we can’t hear?
Real-life issuesNeeds MOS tests for evaluationDuplicates codec functionality – the new audio codecs use the same effect
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 30
May 14th, 2006 ICASSP 2006, Toulouse, France 59
Advanced techniques:noise suppressor for ASR
General idea: optimize parameterized suppression rule for best recognition rate (Tashev/Droppo/Acero, 2006)
More training data improves average recognition, harms clean speech recognitionRprop optimization algorithm: enhanced version of gradient descent Objective function: Maximum Mutual Information (MMI) from ASR, closely related to the recognition accuracyOptimization parameters: the suppression ruleStarting point: MMSE Spectral Power Estimator rule
Baseline: 99.5% clean, 52.5% averageStarting point: 96.9% clean, 74.9% averageAchieved optimal point: 99.0% clean, 77.7% average
May 14th, 2006 ICASSP 2006, Toulouse, France 60
Advanced techniques:noise suppressor for ASR (4)
MMSE SPE After 20 Iterations
-40-20
020
40
-40-20
020
40-40
-30
-20
-10
0
10
ζ , dB
Suppression Rule - start point
γ, dB
Sup
pres
sion
Gai
n, d
B
-40-20
020
40
-40-20
020
40-40
-30
-20
-10
0
ζ , dB
Suppression Rule - result point
γ, dB
Sup
pres
sion
Gai
n, d
B
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 31
May 14th, 2006 ICASSP 2006, Toulouse, France 61
Advanced techniques:using speech model
General idea:Detect and parse the speech signal: fricatives, vowels, glides, nasals, stopsMeasure the parametersSynthesize clean speech signal
Real-world issues:If we can do reliably the parsing – we solved the noise robust ASR problems ☺Even text-to-speech systems do not have very good pronunciation, doing this without language model is more difficult
May 14th, 2006 ICASSP 2006, Toulouse, France 62
Advanced techniques:using speech model (2)
Drucker (1968):Detect and parse the speech signal: fricatives, vowels, glides, nasals, stopsUse separate enhancing filters for each categoryHard decision for presence and class
McAulay/Malpass (1980) introduced soft decision rules and using several filters in parallelSome techniques:
Use the harmonic structure of vowels, time warping to make them flat, clean, un-warpUse vocal tract model for generating fricatives and other consonantsUsing language model (too specific)
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 32
May 14th, 2006 ICASSP 2006, Toulouse, France 63
Advanced techniques:spatial noise suppression
Microphone array for headset (Tashev/Seltzer/Acero, 2005)
3-element microphone arrayBone sensor for reliable VADWorking in IDOA space
Multidimensional generalization of classic noise suppression
Building position-dependent noise modelsApply suppression rule
[ ]1 2 1( ) ( ), ( ), , ( )Mf f f fδ δ δ −∆ …
-3 -2 -1 0 1 2 3-3
-2
-1
0
1
2
3
Phase differences for 1000 Hz
D12
D13
1 1( ) arg( ( )) arg( ( ))j jf X f X fδ − = −
May 14th, 2006 ICASSP 2006, Toulouse, France 64
Advanced techniques:spatial noise suppression (2)
0 2000 4000 6000 8000-10
-8
-6
-4
-2
0
2
Frequency, Hz
Mag
nitu
de, d
B
Mic1Mic2
•General architecture•Beamformerdirectivity•Diffraction around the head correction
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 33
May 14th, 2006 ICASSP 2006, Toulouse, France 65
Advanced techniques:spatial noise suppression (3)
Signal and noise variances
a priory and a posteriori SNR
Suppression rule
2
2
( | ) ( | )
( | ) ( | )
Y
D
f E Y f
f E D f
λ
λ
⎡ ⎤∆ ∆⎢ ⎥⎣ ⎦⎡ ⎤∆ ∆⎢ ⎥⎣ ⎦
2
( | ) ( | )( | ) (1 )max[0, ( | )], [0,1)( | )
( | )( | )
( | )
Y D
D
D
f ff ff
Y ff
f
λ λξ β β γ βλ
γλ
∆ − ∆∆ + − ∆ ∈∆
∆∆
∆
( | ) 1 ( | )( | )1 ( | ) ( | )
f fH ff f
ξ ϑξ γ
⎛ ⎞∆ + ∆∆ = ⎜ ⎟+ ∆ ∆⎝ ⎠
-20
2
-2
0
2
0
50
100
-π ≤ δ2 ≤ +π-π ≤ δ1 ≤ +π
Sig
nal v
aria
nce
-20
2
-2
0
2
0
5
10
-π ≤ δ2 ≤ +π-π ≤ δ1 ≤ +π
Noi
se v
aria
nce
May 14th, 2006 ICASSP 2006, Toulouse, France 66
Advanced techniques:spatial noise suppression (4)
SNR improvement, all units in dB
BM – best microphone,BF – beamformerNS – noise suppressorSR – spatial noise suppressor
Demo: parallel recording with BT headset
16.411.16.43.2Car, 90 dB22.817.512.37.2Café, 75 dB 34.729.422.525.2Office, 55 dB SRNSBFBM
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 34
May 14th, 2006 ICASSP 2006, Toulouse, France 67
Advanced techniques:summary
Improving further the noise suppression and reduction increases complexity, requires more information.The algorithms become more specialized: for car, for speech, for ASR, for specific noises.Use good judgment when use or design them:
Do I need this? How specific is the application?
Remember: more complex model with more parameters means slower computation and adaptation. Use with caution.Still very exciting new algorithms, solving problems unsolved so far.
May 14th, 2006 ICASSP 2006, Toulouse, France 68
Defeating ambient noise: final remarks
The art of noise suppression is to know when to stop.None of the methods is universal, use cascading and make sure not to destroy important properties.Build processing blocks, think the whole system: well balanced suppression across the processing chain.Noise suppression is about human perception: use your ears and MOS tests.
Defeating Ambient Noise - practical approaches
May 14th, 2006
ICASSP 2006, Toulouse, France 35
May 14th, 2006 ICASSP 2006, Toulouse, France 69
Finally
Thank you for choosing this tutorial!Thank you for the attention!
Questions?
Contact info: [email protected]://research.microsoft.com/users/ivantash