Dual Transfer Function GSC and Application to Joint Noise Reduction and Acoustic Echo Cancellation
Gal Reuven
Under the supervision of Sharon Gannot¹ and Israel Cohen²
¹School of Engineering, Bar-Ilan University, Ramat-Gan
²Department of Electrical Engineering, Technion, Haifa
February, 2006
DTF-GSC AND APP. TO JOINT NR AND AEC Motivation
Motivation
• Interference degrades:
– Intelligibility
– Speech compression quality
– Speech recognition rates
[Figure: a microphone array picks up the desired speech signal, a competing speech signal, a noise source, and ambient noise; the array signals feed a speech enhancement system]
Goal: speech enhancement by a joint interference and noise reduction system
Outline
• Problem presentation
• The DTF-GSC
• Estimation
• Performance analysis and experimental study
• Application: joint AEC and NR
– Cascade schemes
– ETF-GSC scheme
Problem Presentation
• M ≥ 3 microphones
• One desired speech signal
• One directional interference signal
• One directional/ambient noise signal
• Arbitrary acoustic transfer functions (ATFs)
[Figure: a microphone array picks up the desired speech signal, a competing speech signal, a noise source, and ambient noise; the array signals feed a speech enhancement system]
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * s_2(t) + n_m(t), \quad m = 1, \dots, M$$
Time Domain Presentation
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * s_2(t) + n_m(t), \quad m = 1, \dots, M$$
where
$a_m(t)$: the acoustic impulse response from the desired speech source to the $m$-th microphone
$b_m(t)$: the acoustic impulse response from the nonstationary interference source to the $m$-th microphone
$s_1(t)$: the desired speech signal
$s_2(t)$: the nonstationary interference signal
$n_m(t)$: the (directional or nondirectional) stationary noise signal at the $m$-th microphone
Frequency Domain Presentation
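STFT:
$$Z(t,e^{j\omega}) = A(e^{j\omega})\,S_1(t,e^{j\omega}) + B(e^{j\omega})\,S_2(t,e^{j\omega}) + N(t,e^{j\omega})$$
where
$$Z(t,e^{j\omega}) = \begin{bmatrix} Z_1(t,e^{j\omega}) & Z_2(t,e^{j\omega}) & \cdots & Z_M(t,e^{j\omega}) \end{bmatrix}^T$$
$$A(e^{j\omega}) = \begin{bmatrix} A_1(e^{j\omega}) & A_2(e^{j\omega}) & \cdots & A_M(e^{j\omega}) \end{bmatrix}^T$$
$$B(e^{j\omega}) = \begin{bmatrix} B_1(e^{j\omega}) & B_2(e^{j\omega}) & \cdots & B_M(e^{j\omega}) \end{bmatrix}^T$$
$$N(t,e^{j\omega}) = \begin{bmatrix} N_1(t,e^{j\omega}) & N_2(t,e^{j\omega}) & \cdots & N_M(t,e^{j\omega}) \end{bmatrix}^T$$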
Goal
• Reconstruct the desired speech signal in an environment that contains:
– Reverberation
– A competing speech signal (double talk)
– Stationary noise
• Applications
– Blind source separation (BSS)
– Acoustic echo cancellation (AEC)
• Methods
– Extend the TF-GSC so that it places a null in the interference direction
– Exploit the nonstationarity of the desired and interference signals
Dual Source Transfer-Function Generalized Sidelobe Canceller (DTF-GSC)
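[Figure: DTF-GSC block diagram. The microphone signals Z1, Z2, Z3, ..., ZM (all functions of (t, e^{jω})) feed the matched beamformer W0†, whose output is YMBF, and the blocking matrix H†, whose outputs are the reference signals U3, U4, ..., UM. The references are filtered by G3, G4, ..., GM and summed into YNC, which is subtracted from YMBF to give the output Y.]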
Method
Extend the TF-GSC to deal with nonstationary interference
• Matched beamformer (MBF)
Distortionless in the desired direction while blocking the interference direction
• Blocking matrix (BM)
Blocks both the desired and the interference directions
• Adaptive noise canceller (ANC)
Estimates the residual noise at the MBF output using the reference signals produced by the BM
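The three blocks can be read per STFT bin as $Y = Y_{\mathrm{MBF}} - Y_{\mathrm{NC}}$, with $Y_{\mathrm{MBF}} = W_0^\dagger Z$, reference signals $U = H^\dagger Z$, and $Y_{\mathrm{NC}} = G^\dagger U$. Below is a minimal NumPy sketch of that signal flow for a single bin; the function name `gsc_output` and the $G^\dagger U$ conjugation convention are assumptions for illustration (chosen to agree with the NLMS update used for the ANC), not code from the original work.

```python
import numpy as np


def gsc_output(z, w0, h, g):
    """One STFT bin of the DTF-GSC output Y = W0^H Z - G^H (H^H Z).

    z  : (M,)      microphone coefficients Z_1..Z_M at this bin
    w0 : (M,)      matched beamformer W0
    h  : (M, M-2)  blocking matrix H
    g  : (M-2,)    adaptive noise-canceller weights G_3..G_M
    Returns (Y, Y_MBF, U).
    """
    y_mbf = np.vdot(w0, z)       # W0^H Z; np.vdot conjugates its first argument
    u = h.conj().T @ z           # reference signals U_3..U_M = H^H Z
    y_nc = np.vdot(g, u)         # noise estimate Y_NC = G^H U
    return y_mbf - y_nc, y_mbf, u


# Toy example with M = 3 microphones (arbitrary values, not a real BM).
y, y_mbf, u = gsc_output(np.array([1.0, 2.0, 3.0]),
                         np.array([1.0, 0.0, 0.0]),
                         np.array([[1.0], [1.0], [1.0]]),
                         np.array([0.5]))
print(y, y_mbf, u)
```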
Matched beamformer
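ATF-ratio matched filter:
$$W_0(e^{j\omega}) = \frac{\dfrac{A(e^{j\omega})}{\|A(e^{j\omega})\|^2} - \rho(e^{j\omega})\,\dfrac{B(e^{j\omega})}{\|A(e^{j\omega})\|\,\|B(e^{j\omega})\|}}{1 - |\rho(e^{j\omega})|^2}\,F(e^{j\omega})$$
$$\rho(e^{j\omega}) \equiv \frac{B^\dagger(e^{j\omega})\,A(e^{j\omega})}{\|A(e^{j\omega})\|\,\|B(e^{j\omega})\|}$$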
It is easily verified that:
• $A^\dagger(e^{j\omega})\,W_0(e^{j\omega}) = F(e^{j\omega})$
• $B^\dagger(e^{j\omega})\,W_0(e^{j\omega}) = 0$
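A quick numerical check of the two constraints: the sketch builds $W_0$ from random ATF vectors and evaluates $A^\dagger W_0$ and $B^\dagger W_0$. The helper name `matched_beamformer` and the random test vectors are hypothetical; the formula is the one above, which is only defined when $A$ and $B$ are not parallel ($|\rho| < 1$).

```python
import numpy as np


def matched_beamformer(a, b, f=1.0):
    """MBF W0 at one bin: response F toward A, null toward B.

    a, b : (M,) ATF vectors A(e^{jw}) and B(e^{jw}); must not be parallel (|rho| < 1)
    f    : desired response F(e^{jw})
    """
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    rho = np.vdot(b, a) / (na * nb)                 # rho = B^H A / (||A|| ||B||)
    return (a / na**2 - rho * b / (na * nb)) / (1.0 - abs(rho) ** 2) * f


rng = np.random.default_rng(0)
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w0 = matched_beamformer(a, b, f=0.7)
print(np.vdot(a, w0))   # A^H W0, equal to F = 0.7 up to rounding
print(np.vdot(b, w0))   # B^H W0, equal to 0 up to rounding
```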
Blocking Matrix
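$$H(e^{j\omega}) = \begin{bmatrix} Q_3(e^{j\omega}) & Q_4(e^{j\omega}) & \cdots & Q_M(e^{j\omega}) \\ L_3(e^{j\omega}) & L_4(e^{j\omega}) & \cdots & L_M(e^{j\omega}) \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
$$Q_m(e^{j\omega}) = -\frac{\dfrac{A_2^*(e^{j\omega})}{A_1^*(e^{j\omega})}\,\dfrac{B_m^*(e^{j\omega})}{B_1^*(e^{j\omega})} - \dfrac{B_2^*(e^{j\omega})}{B_1^*(e^{j\omega})}\,\dfrac{A_m^*(e^{j\omega})}{A_1^*(e^{j\omega})}}{\dfrac{A_2^*(e^{j\omega})}{A_1^*(e^{j\omega})} - \dfrac{B_2^*(e^{j\omega})}{B_1^*(e^{j\omega})}}; \quad m = 3, \dots, M$$
$$L_m(e^{j\omega}) = -\frac{\dfrac{A_m^*(e^{j\omega})}{A_1^*(e^{j\omega})} - \dfrac{B_m^*(e^{j\omega})}{B_1^*(e^{j\omega})}}{\dfrac{A_2^*(e^{j\omega})}{A_1^*(e^{j\omega})} - \dfrac{B_2^*(e^{j\omega})}{B_1^*(e^{j\omega})}}; \quad m = 3, \dots, M$$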
Blocking Matrix
It is easily verified that:
• $A^\dagger(e^{j\omega})\,H(e^{j\omega}) = 0$
• $B^\dagger(e^{j\omega})\,H(e^{j\omega}) = 0$
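The same kind of check for the blocking matrix: a hypothetical helper `blocking_matrix` assembles $H$ from $Q_m$ and $L_m$ and confirms numerically that both $A^\dagger H$ and $B^\dagger H$ vanish. This is a sketch for one frequency bin with random ATFs, assuming $A_1, B_1 \neq 0$ and $A_2/A_1 \neq B_2/B_1$ so that the common denominator is nonzero.

```python
import numpy as np


def blocking_matrix(a, b):
    """M x (M-2) blocking matrix H at one bin, built from Q_m and L_m (m = 3..M)."""
    m_mics = a.shape[0]
    ac, bc = np.conj(a), np.conj(b)
    a2, b2 = ac[1] / ac[0], bc[1] / bc[0]        # A_2^*/A_1^*, B_2^*/B_1^*
    am, bm = ac[2:] / ac[0], bc[2:] / bc[0]      # A_m^*/A_1^*, B_m^*/B_1^*, m = 3..M
    den = a2 - b2
    h = np.zeros((m_mics, m_mics - 2), dtype=complex)
    h[0, :] = -(a2 * bm - b2 * am) / den         # first row: Q_3..Q_M
    h[1, :] = -(am - bm) / den                   # second row: L_3..L_M
    h[2:, :] = np.eye(m_mics - 2)                # identity block
    return h


rng = np.random.default_rng(1)
a = rng.standard_normal(5) + 1j * rng.standard_normal(5)
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)
h = blocking_matrix(a, b)
print(np.abs(np.conj(a) @ h).max(), np.abs(np.conj(b) @ h).max())  # both at rounding level
```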
Adaptive Noise Canceller
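Normalized LMS:
$$G_m(t+1,e^{j\omega}) = G_m(t,e^{j\omega}) + \mu\,\frac{U_m(t,e^{j\omega})\,Y^*(t,e^{j\omega})}{P_{\mathrm{est}}(t,e^{j\omega})}$$
$$G_m(t+1,e^{j\omega}) \xleftarrow{\;\mathrm{FIR}\;} G_m(t+1,e^{j\omega})$$
for $m = 3, \dots, M$, where the second step constrains the updated filter to a finite impulse response, and
$$P_{\mathrm{est}}(t,e^{j\omega}) = \eta\,P_{\mathrm{est}}(t-1,e^{j\omega}) + (1-\eta)\,\|Z(t,e^{j\omega})\|^2$$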
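A sketch of one ANC iteration over all bins of a frame, under stated assumptions: the filters live on a full $N$-point FFT grid, and the FIR constraint is implemented by transforming to the time domain, keeping the first `n_taps` coefficients (a hypothetical parameter), and transforming back. Function names are illustrative only.

```python
import numpy as np


def update_power(p_prev, z, eta):
    """P_est(t) = eta * P_est(t-1) + (1 - eta) * ||Z(t)||^2, per bin. z has shape (M, N)."""
    return eta * p_prev + (1.0 - eta) * np.sum(np.abs(z) ** 2, axis=0)


def nlms_fir_update(g, u, y, p_est, mu, n_taps):
    """One NLMS step for G_3..G_M on an N-point FFT grid, followed by an FIR constraint.

    g, u  : (M-2, N) filters G_m(t) and reference signals U_m(t)
    y     : (N,)     GSC output Y(t)
    p_est : (N,)     power normalization P_est(t)
    """
    g_new = g + mu * u * np.conj(y) / p_est      # broadcasts over the M-2 rows
    # Assumed FIR constraint: keep only the first n_taps time-domain coefficients.
    taps = np.fft.ifft(g_new, axis=1)
    taps[:, n_taps:] = 0.0
    return np.fft.fft(taps, axis=1)


rng = np.random.default_rng(2)
n_bins, n_refs = 8, 2
z = rng.standard_normal((4, n_bins)) + 1j * rng.standard_normal((4, n_bins))
u = rng.standard_normal((n_refs, n_bins)) + 1j * rng.standard_normal((n_refs, n_bins))
y = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
p_est = update_power(np.ones(n_bins), z, eta=0.9)
g = nlms_fir_update(np.zeros((n_refs, n_bins), dtype=complex), u, y, p_est, mu=0.1, n_taps=3)
```

Where the retained taps sit (causal only, as here, or a window that also allows noncausal taps) is a design choice the slide does not specify.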
Estimation
MBF components are computed in a two-step procedure:
• Estimate the ATF ratios $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ and $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ by exploiting nonstationarity
• Calculate $W_0(e^{j\omega})$
Estimation
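An unbiased estimate of $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ and $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ is obtained by applying LS to
$$\begin{bmatrix} \Phi^{(1)}_{z_m z_1}(e^{j\omega}) \\ \Phi^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \Phi^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \Phi^{(1)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \Phi^{(2)}_{z_1 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots \\ \Phi^{(K)}_{z_1 z_1}(e^{j\omega}) & 1 \end{bmatrix} \begin{bmatrix} H_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$
(a separate set of equations is used for $m = 2, \dots, M$).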
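A minimal sketch of the LS step for one bin and one microphone, assuming per-frame PSD estimates are already available (here the superscript $k = 1, \dots, K$ is taken to index the frames). The constant column absorbs the stationary term $\Phi_{u_m z_1}$; if $\Phi_{z_1 z_1}^{(k)}$ did not change from frame to frame, the two columns would be collinear and $H_m$ could not be separated from that term, which is why nonstationarity matters. Names and synthetic values are hypothetical.

```python
import numpy as np


def ls_ratio_estimate(phi_zm_z1, phi_z1_z1):
    """Least-squares fit of Phi_zmz1^(k) = H_m Phi_z1z1^(k) + Phi_umz1 + eps^(k), k = 1..K.

    phi_zm_z1, phi_z1_z1 : (K,) per-frame PSD estimates at one frequency bin.
    Returns (H_m estimate, Phi_umz1 estimate).
    """
    design = np.column_stack([phi_z1_z1, np.ones(len(phi_z1_z1))])
    theta, *_ = np.linalg.lstsq(design, phi_zm_z1, rcond=None)
    return theta[0], theta[1]


# Synthetic, noise-free check: the auto-PSD varies across frames (nonstationarity).
phi_11 = np.array([1.0, 2.5, 0.7, 4.0, 1.8])
h_true, c_true = 0.8 - 0.3j, 0.1 + 0.05j
h_hat, c_hat = ls_ratio_estimate(h_true * phi_11 + c_true, phi_11)
print(h_hat, c_hat)
```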
Estimation
BM components: the estimation method depends on the frame type.
• Only one speech signal is active: $A_m^*(e^{j\omega})/A_1^*(e^{j\omega})$ or $B_m^*(e^{j\omega})/B_1^*(e^{j\omega})$ is adapted, and $H(e^{j\omega})$ is calculated
• Double talk: $Q_m(e^{j\omega})$ and $L_m(e^{j\omega})$ are estimated directly by solving the following LS problem:
Estimation
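$$\begin{bmatrix} \Phi^{(1)}_{z_m z_1}(e^{j\omega}) \\ \Phi^{(2)}_{z_m z_1}(e^{j\omega}) \\ \vdots \\ \Phi^{(K)}_{z_m z_1}(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} \Phi^{(1)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(1)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \Phi^{(2)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(2)}_{z_2 z_1}(e^{j\omega}) & 1 \\ \vdots & \vdots & \vdots \\ \Phi^{(K)}_{z_1 z_1}(e^{j\omega}) & \Phi^{(K)}_{z_2 z_1}(e^{j\omega}) & 1 \end{bmatrix} \begin{bmatrix} -Q_m(e^{j\omega}) \\ -L_m(e^{j\omega}) \\ \Phi_{u_m z_1}(e^{j\omega}) \end{bmatrix} + \begin{bmatrix} \varepsilon^{(1)}_m(e^{j\omega}) \\ \varepsilon^{(2)}_m(e^{j\omega}) \\ \vdots \\ \varepsilon^{(K)}_m(e^{j\omega}) \end{bmatrix}$$
(a separate set of equations is used for $m = 3, \dots, M$)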
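The double-talk system has the same structure with one extra column. A hypothetical sketch that solves it for one bin and recovers $Q_m$, $L_m$ and $\Phi_{u_m z_1}$ from synthetic, noise-free PSDs (at least three frames are needed for three unknowns):

```python
import numpy as np


def ls_double_talk(phi_zm_z1, phi_z1_z1, phi_z2_z1):
    """LS fit of Phi_zmz1^(k) = -Q_m Phi_z1z1^(k) - L_m Phi_z2z1^(k) + Phi_umz1 + eps^(k).

    All inputs are (K,) per-frame PSD estimates at one bin, with K >= 3.
    Returns (Q_m, L_m, Phi_umz1) estimates.
    """
    design = np.column_stack([phi_z1_z1, phi_z2_z1, np.ones(len(phi_z1_z1))])
    theta, *_ = np.linalg.lstsq(design, phi_zm_z1, rcond=None)
    return -theta[0], -theta[1], theta[2]


rng = np.random.default_rng(3)
k = 8
phi_11 = rng.uniform(0.5, 4.0, k)                               # auto-PSD of z_1 per frame
phi_21 = rng.standard_normal(k) + 1j * rng.standard_normal(k)   # cross-PSD of z_2 and z_1
q_true, l_true, c_true = 0.4 + 0.2j, -0.7 + 0.1j, 0.05 - 0.02j
q, l, c = ls_double_talk(-q_true * phi_11 - l_true * phi_21 + c_true, phi_11, phi_21)
print(q, l, c)
```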
DTF-GSC Performance Analysis
• General expression for the output power spectral density:
$$\begin{aligned}
\Phi_{yy}(t,e^{j\omega}) ={}& W_0^\dagger \Phi_{ZZ} W_0 \\
&- W_0^\dagger \Phi_{NN} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{ZZ} W_0 \\
&- W_0^\dagger \Phi_{ZZ} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{NN} W_0 \\
&+ W_0^\dagger \Phi_{NN} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{ZZ} H \left(H^\dagger \Phi_{NN} H\right)^{-1} H^\dagger \Phi_{NN} W_0
\end{aligned}$$
(the arguments $(t,e^{j\omega})$ of $\Phi_{ZZ}$, $\Phi_{NN}$ and $(e^{j\omega})$ of $W_0$, $H$ are omitted for brevity)
• PSD depends on:
– Input signal PSD
– Noise signal PSD
– Signal ATF ratios
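One way to sanity-check the expression: it equals $(W_0 - HG)^\dagger \Phi_{ZZ} (W_0 - HG)$ with $G = (H^\dagger \Phi_{NN} H)^{-1} H^\dagger \Phi_{NN} W_0$, i.e. the GSC output when the noise canceller sits at its noise-only Wiener solution. The sketch below evaluates both forms on random Hermitian PSD matrices for a single bin; all names and values are illustrative.

```python
import numpy as np


def output_psd(w0, h, phi_zz, phi_nn):
    """Evaluate the four-term closed-form output PSD at one (t, e^{jw}) point."""
    hh = h.conj().T
    inv = np.linalg.inv(hh @ phi_nn @ h)                  # (H^H Phi_NN H)^{-1}
    t1 = w0.conj() @ phi_zz @ w0
    t2 = w0.conj() @ phi_nn @ h @ inv @ hh @ phi_zz @ w0
    t3 = w0.conj() @ phi_zz @ h @ inv @ hh @ phi_nn @ w0
    t4 = w0.conj() @ phi_nn @ h @ inv @ hh @ phi_zz @ h @ inv @ hh @ phi_nn @ w0
    return (t1 - t2 - t3 + t4).real


def random_psd_matrix(rng, m):
    x = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    return x @ x.conj().T + np.eye(m)                      # Hermitian, positive definite


rng = np.random.default_rng(4)
m = 5
phi_nn = random_psd_matrix(rng, m)
phi_zz = phi_nn + random_psd_matrix(rng, m)                # signal plus noise
w0 = rng.standard_normal(m) + 1j * rng.standard_normal(m)
h = rng.standard_normal((m, m - 2)) + 1j * rng.standard_normal((m, m - 2))

# Same quantity written as (W0 - H G)^H Phi_ZZ (W0 - H G) with the Wiener-type ANC G.
g = np.linalg.solve(h.conj().T @ phi_nn @ h, h.conj().T @ phi_nn @ w0)
w = w0 - h @ g
print(output_psd(w0, h, phi_zz, phi_nn), (w.conj() @ phi_zz @ w).real)
```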
DTF-GSC Performance Analysis
Output power spectral density
• Linear array of 10 microphones
• Delay-only ATFs for speech and noise
• Maintains the desired signal at θ = 90°
• Blocks directional noise from θ = 120°
• Blocks interference from θ = 60°
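To reproduce the flavor of this setup, the sketch below builds delay-only far-field ATFs for a 10-microphone uniform linear array and evaluates the MBF response over a few angles around 90° (desired) and 60° (interference). The microphone spacing `D = 0.05` m, the speed of sound, and the 1 kHz analysis frequency are assumptions, since the slide does not give them; blocking the noise at 120° is the job of the BM/ANC path and is not modeled here.

```python
import numpy as np

C = 343.0   # speed of sound [m/s] (assumed)
D = 0.05    # hypothetical microphone spacing [m]
M = 10      # linear array of 10 microphones, as on the slide


def steering(theta_deg, freq_hz):
    """Delay-only far-field ATF vector (microphone 1 as the phase reference)."""
    tau = np.arange(M) * D * np.cos(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * freq_hz * tau)


def matched_beamformer(a, b, f=1.0):
    """MBF with response F toward a and a null toward b (same formula as the MBF slide)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    rho = np.vdot(b, a) / (na * nb)
    return (a / na**2 - rho * b / (na * nb)) / (1.0 - abs(rho) ** 2) * f


freq = 1000.0  # assumed analysis frequency [Hz]
w0 = matched_beamformer(steering(90, freq), steering(60, freq))
for theta in (60, 75, 90, 105, 120):
    gain = abs(np.vdot(steering(theta, freq), w0))
    print(f"theta = {theta:3d} deg: |response| = {gain:.3e}")
```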
PSD deviation
$$\mathrm{DEV}(t,e^{j\omega}) = \frac{\Phi^{s_1}_{yy}(t,e^{j\omega})}{|F(e^{j\omega})|^2\,|A_1(e^{j\omega})|^2\,\Phi_{s_1 s_1}(t,e^{j\omega})}$$
• Linear array of 10 microphones
• Delay-only ATFs for speech
• Directional noise field
• Desired signal from θ = 90°
• Up to 4 dB distortion at frequencies below 3000 Hz
[Figure: PSD deviation, Φyy [dB] (about −10 to 2 dB) versus θ [deg] (87° to 93°) and frequency [Hz] (0 to 4000 Hz)]
Noise Reduction
$$\mathrm{NR}(t,e^{j\omega}) = \frac{\Phi^{n}_{yy}(t,e^{j\omega})}{|F(e^{j\omega})|^2\,|D_1(e^{j\omega})|^2\,\Phi_{nn}(t,e^{j\omega})}$$
• Linear array of 10 microphones
• Delay-only ATFs for speech
• Directional noise signal from θ = 120°
• 50 dB attenuation in the noise direction
[Figure: noise reduction, Φyy [dB] (−60 to 0 dB) versus frequency [Hz] (0 to 4000 Hz) and θ [deg] (115° to 125°)]
Interference Reduction
$$\mathrm{NIR}(t,e^{j\omega}) = \frac{\Phi^{s_2}_{yy}(t,e^{j\omega})}{|F(e^{j\omega})|^2\,|B_1(e^{j\omega})|^2\,\Phi_{s_2 s_2}(t,e^{j\omega})}$$
• Linear array of 10 microphones
• Delay-only ATFs for speech
• Directional noise field
• Interference signal from θ = 60°
• 50 dB attenuation in the interference direction
[Figure: interference reduction, Φyy [dB] (−60 to 0 dB) versus frequency [Hz] (0 to 4000 Hz) and θ [deg] (55° to 65°)]
Experimental study
• Speech signals
• Simulated ATFs in two noise fields:
– Directional noise
– Diffuse noise
• Sonograms
• Performance evaluation
Sonograms
[Figure: sonograms, panels (a) to (f); frequency [Hz] (0 to 4000 Hz) versus time [sec] (0 to 8 s), color scale 5 to 50]
Performance evaluation
Noise and interference reduction in:
• a directional noise field (top)
• a diffuse noise field (bottom)
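Noise field | Input S1NR | Input S1S2R | MBF output S1NR | MBF output S1S2R | BM output S1NR | BM output S2NR | DTF-GSC output S1NR | DTF-GSC output S1S2R
Directional | 11.3 | 2.3 | 13.8 | 16.9 | -3.9 | -4.5 | 34.6 | 12.7
Diffuse | 12.7 | 2.3 | 17.4 | 25 | -3.8 | -3.5 | 20.9 | 22.6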
Application: joint noise reduction and echo cancellation
• M ≥ 3 microphones
• One desired speech signal
• One competing speech signal (echo)
• One directional/ambient noise signal
• Arbitrary acoustic transfer functions (ATFs)
[Figure: a microphone array picks up the desired speech signal, the remote speech signal (echo), a noise source, and ambient noise; the array signals feed a speech enhancement system]
$$z_m(t) = a_m(t) * s_1(t) + b_m(t) * e(t) + n_m(t), \quad m = 1, \dots, M$$
Cascade schemes
• AEC-BF: multichannel AEC followed by a beamformer
– The beamformer inputs contain less echo
– The multichannel AEC performance deteriorates due to noise
• BF-AEC: beamformer followed by a single-channel AEC
– The AEC input contains less noise
– The beamformer suppresses echo, although the AEC performs better
– The AEC suffers from fast variations in the echo path caused by the beamformer
[Figure: AEC-BF scheme. The echo reference E is filtered by the echo cancellers GE1, GE2, ..., GEM, and the results are subtracted from Z1, Z2, ..., ZM to give ZAEC1, ZAEC2, ..., ZAECM. These feed a TF-GSC with matched beamformer W0† (output YMBF), blocking matrix H† (outputs U2, U3, ..., UM), and noise cancellers GN2, GN3, ..., GNM (output YNC); the system output is Y.]
[Figure: BF-AEC scheme. Z1, Z2, ..., ZM feed a TF-GSC with matched beamformer W0† (output YMBF), blocking matrix H† (outputs U2, U3, ..., UM), and noise cancellers GN2, GN3, ..., GNM (output YNC), giving the beamformer output YBF. A single echo canceller GE, driven by the echo reference E, is then subtracted from YBF to give the output Y.]
ETF-GSC scheme
• Matched beamformer (MBF)
– Maintains the desired signal
• Blocking unit (BU)
– Blocks both the desired and the echo signals
• Adaptive noise and echo canceller (ANEC)
– The noise canceller and the echo canceller work in parallel
– The echo reference signal is used to create additional interference reference signals for the ANEC
[Figure: ETF-GSC block diagram. Signals and blocks shown: microphone inputs Z1, Z2, Z3, ..., ZM; echo reference E; matched beamformer F0† (output YMBF); blocking unit H† (reference outputs U2, U3, ..., UM and U′2, U′3, ..., U′M); noise-canceller filters GN2, GN3, ..., GNM (output YNC); echo-canceller filters GE1, GE2, ..., GEM and GH2, GH3, ..., GHM (output YEC); system output Y.]
ETF-GSC scheme
• Estimation
– MBF estimation is done as in the TF-GSC (during single talk)
– BM estimation is done as in the TF-GSC (during single talk)
– The noise canceller adapts during noise-only frames
– The echo canceller adapts during echo frames
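The gated adaptation can be sketched for one bin as two NLMS filters sharing a single error signal, each updated only in its own frame type. This is a hypothetical illustration: the echo-derived references `v`, the frame labels (assumed to come from an external detector), and the normalization by reference power instead of $\|Z\|^2$ are assumptions, not details taken from the ETF-GSC slides.

```python
import numpy as np


def anec_step(y_mbf, u, v, g_n, g_e, frame_type, mu, eps=1e-8):
    """One bin of a parallel noise/echo canceller with gated NLMS adaptation.

    y_mbf      : matched-beamformer output (complex scalar)
    u          : noise reference signals from the blocking unit
    v          : echo-derived reference signals (hypothetical)
    frame_type : "noise" adapts g_n, "echo" adapts g_e; anything else freezes both
    """
    y = y_mbf - np.vdot(g_n, u) - np.vdot(g_e, v)   # both cancellers act in parallel
    if frame_type == "noise":
        g_n = g_n + mu * u * np.conj(y) / (np.vdot(u, u).real + eps)
    elif frame_type == "echo":
        g_e = g_e + mu * v * np.conj(y) / (np.vdot(v, v).real + eps)
    return y, g_n, g_e


rng = np.random.default_rng(6)
g_true = np.array([0.5 - 0.2j, 0.1 + 0.3j])   # hypothetical noise path to be cancelled
g_n, g_e = np.zeros(2, dtype=complex), np.zeros(1, dtype=complex)
v = np.zeros(1, dtype=complex)                 # no echo in this toy run
for _ in range(2000):
    u = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    y, g_n, g_e = anec_step(np.vdot(g_true, u), u, v, g_n, g_e, "noise", mu=0.5)
print(g_n)  # converges toward g_true
```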
Performance evaluation
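Input SNR | Input SER | Tested algorithm | Echo suppression: AEC | Echo suppression: BF | Echo suppression: Total | Noise reduction: Total
15 | 5 | AEC-BF | 18.5 | 2.1 | 20.6 | 23.5
15 | 5 | BF-AEC | 5.4 | 6.2 | 11.7 | 24.7
15 | 5 | ETF-GSC | – | – | 37.7 | 23.1
5 | 15 | AEC-BF | 3.9 | 0.4 | 4.4 | 23.3
5 | 15 | BF-AEC | 1.7 | 6.8 | 8.6 | 24.4
5 | 15 | ETF-GSC | – | – | 18.1 | 23.0
15 | 15 | AEC-BF | 12.1 | 1.0 | 13.1 | 23.9
15 | 15 | BF-AEC | 4.6 | 5.8 | 10.4 | 24.5
15 | 15 | ETF-GSC | – | – | 29.7 | 23.6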
Sonograms
[Figure: sonograms, panels (a) to (d); frequency [Hz] (0 to 4000 Hz) versus time [sec] (0 to 8 s), color scale 5 to 50]
Conclusions
• DTF-GSC algorithm
– GSC structure: modified MBF and BM
– New identification procedure for double-talk (DT) frames
– Application: the BSS problem of convolutive mixtures with additive noise
• DTF-GSC performance analysis
– General expression for the output power spectral density
– Expected deviation imposed on the desired signal
– Noise reduction
– Interference reduction
Conclusions
• ETF-GSC
– Joint echo cancellation and noise reduction in a reverberant environment
– TF-GSC-based solution: BU and ANEC blocks (reference signal incorporated)
– Performance evaluation (during DT) and comparison with cascade schemes
Future Research
• Dual nonstationary speech signals in the presence of echo and stationary noise
[Figure: a microphone array picks up the desired speech signal, a competing speech signal, an echo signal, a noise source, and ambient noise; the array signals feed a speech enhancement system]
Future Research
• Speech enhancement using the DTF-GSC and postfiltering
– Less noise reduction is obtained in a diffuse noise field
– Postfiltering: known methods, or methods using the noise reference signals
• DTF-GSC using Relative Transfer Function (RTF) system identification
– Weighted least-squares optimization criterion
– Smaller error variance and faster convergence
• Joint noise reduction and echo cancellation using the ETF-GSC and residual echo cancellation
– Residual echo arises from misadjusted AEC filters and finite filter length
– A linear prediction error filter removes the short-term correlation of the residual echo
– The whitened residual echo is cancelled by a noise reduction filter
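The whitening step can be illustrated with a standard autocorrelation-method linear predictor. This time-domain sketch on a synthetic AR(1) signal only illustrates what a prediction error filter does; the predictor order, the test signal, and the function name are assumptions, and the slide does not say how the residual-echo filter is designed.

```python
import numpy as np


def prediction_error_filter(x, order):
    """Whiten x with an LP error filter; coefficients from the autocorrelation method."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)]) / n
    # Normal equations R a = [r_1 .. r_p], with Toeplitz R[i, j] = r_|i-j|.
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1 : order + 1])
    # Prediction error e[t] = x[t] - sum_k a_k x[t-k], with zero initial conditions.
    e = np.convolve(x, np.concatenate(([1.0], -a)))[:n]
    return e, a


def lag1(s):
    """Normalized lag-1 correlation, a simple measure of short-term correlation."""
    return np.dot(s[1:], s[:-1]) / np.dot(s, s)


# Synthetic, strongly correlated signal: AR(1) with coefficient 0.9.
rng = np.random.default_rng(5)
w = rng.standard_normal(20000)
x = np.zeros_like(w)
for t in range(1, len(w)):
    x[t] = 0.9 * x[t - 1] + w[t]
e, a = prediction_error_filter(x, order=1)
print(a, lag1(x), lag1(e))
```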