12-235

On the Use of Time-Domain Widely Linear Filtering for Binaural Speech Enhancement

Joseph Szurley, Student Member, IEEE, Alexander Bertrand, Member, IEEE, and Marc Moonen, Fellow, IEEE

Abstract: Widely linear (WL) filtering has been shown to improve performance compared to linear filtering due to its ability to incorporate the non-circularity of the signal statistics. However, there has been some inconsistency in its application, specifically when constructing complex signals from real signals, which has recently been considered in the context of speech enhancement in binaural or stereo systems. This letter shows that the corresponding WL filtered output contains exactly the same information as the linear filter output, while increasing the computational complexity and memory requirements.

Index Terms: Widely linear filtering, binaural speech enhancement

I. INTRODUCTION

Recently, there has been a growing interest in applying widely linear (WL) filtering to speech enhancement [1], [2]. The benefit of using a WL filter compared to a linear filter stems from the fact that speech enhancement algorithms often operate in the frequency domain, which yields complex signals with non-circular statistics. With a linear filter, due to the circularity assumptions that are imposed, any non-circular second-order statistics are neglected, which could result in suboptimal solutions. Therefore, in order to fully exploit the non-circularity of the second-order statistics, a WL filter should be used.

In WL filtering, a complex signal is augmented with its conjugate, and a filter is derived from the corresponding compound signal. This can sometimes improve performance in a mean squared error (MSE) sense, but by no more than a factor of 2 [3]. In fact, for certain signal models, e.g., double white, it can be shown that the WL filter offers no benefit over the linear filter [4], [5].

In [6], [7], [8], [9], [10], WL filtering has been applied to speech enhancement and echo cancellation in binaural hearing aids or other stereo systems, where two real signals are combined to form a single complex signal, which is then used in a WL framework. In this paper, we show that while this formulation presents a novel approach, it cannot result in a performance gain, i.e., the corresponding WL filtered output contains exactly the same information as the output of the linear filter. We show this explicitly for the case of multi-channel Wiener filtering (MWF) and time-domain minimum variance distortionless response (MVDR) filtering, but a similar conclusion can be drawn for other types of filters (e.g., those used in [9], [10]). Furthermore, we demonstrate that, while this approach does not improve performance, it increases the computational complexity and memory requirements.

This research work was carried out at the ESAT Laboratory of KU Leuven, in the frame of KU Leuven Research Council CoE EF/05/006 Optimization in Engineering (OPTEC) and PFV/10/002 (OPTEC), Concerted Research Action GOA-MaNet, the Belgian Programme on Interuniversity Attraction Poles initiated by the Belgian Federal Science Policy Office: IUAP P7/19 Dynamical systems, control and optimization (DYSCO) 2012-2017, Research Project iMinds, Research Project FWO nr. G.0763.12 "Wireless Acoustic Sensor Networks for Extended Auditory Communication". Alexander Bertrand is supported by a Postdoctoral Fellowship of the Research Foundation Flanders (FWO). The scientific responsibility is assumed by its authors.

The authors are with the Department of Electrical Engineering ESAT-SCD / iMinds - Future Health Department, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium (e-mail: [email protected]; [email protected]; [email protected]).

II. SIGNAL MODEL

For the binaural speech enhancement problem considered, we assume that the system contains $2M$ microphones, which are distributed over two different $M$-microphone arrays or hearing aids. The microphone signals are given as

$$
y_{k,m}(t) = x_{k,m}(t) + v_{k,m}(t), \quad k = 0, 1, \quad m = 0, \dots, M-1, \tag{1}
$$

where $x_{k,m}(t)$ is the speech component and $v_{k,m}(t)$ is the additive noise component. We define the $ML$-dimensional stacked microphone signal vector of each array, $\mathbf{y}_k \in \mathbb{R}^{ML}$, as

$$
\mathbf{y}_k = \left[\mathbf{y}_{k,0}^T, \dots, \mathbf{y}_{k,M-1}^T\right]^T, \quad k = 0, 1, \tag{2}
$$

where $^T$ denotes the transpose and $\mathbf{y}_{k,m} \in \mathbb{R}^L$ is defined as

$$
\mathbf{y}_{k,m} = \begin{bmatrix} y_{k,m}(t) \\ \vdots \\ y_{k,m}(t-L+1) \end{bmatrix}, \quad k = 0, 1, \tag{3}
$$

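and the $2ML$-dimensional signal vector $\mathbf{y} \in \mathbb{R}^{2ML}$ is defined as

$$
\mathbf{y} = \left[\mathbf{y}_0^T \;\; \mathbf{y}_1^T\right]^T, \tag{4}
$$

where $\mathbf{x}$ and $\mathbf{v}$ are defined similarly.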
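As an illustration of (2)-(4), the following minimal NumPy sketch stacks the $L$ most recent samples of each microphone into the real vector $\mathbf{y}$. The array layout `sig[k, m, t]` and the function name `stacked_vector` are hypothetical choices made for this example only.

```python
import numpy as np

def stacked_vector(sig, t, L):
    """Build the 2ML-dimensional vector y of (4) at time index t.

    sig is assumed (hypothetical layout) to have shape (2, M, T) with
    sig[k, m, t] = y_{k,m}(t) for arrays k = 0, 1 and microphones m.
    """
    K, M, T = sig.shape
    if not (L - 1 <= t < T):
        raise ValueError("t must leave L samples of history inside the signal")
    # Samples t-L+1, ..., t, reversed so that each y_{k,m} reads
    # [y_{k,m}(t), ..., y_{k,m}(t-L+1)]^T as in (3).
    taps = sig[:, :, t - L + 1:t + 1][:, :, ::-1]
    # Row-major flattening stacks m inside k, giving y_k as in (2)
    # and y = [y_0^T, y_1^T]^T as in (4).
    return taps.reshape(K * M * L)

# Usage with a toy signal where sig[k, m, t] = 20k + 10m + t
M, L, T = 2, 3, 10
sig = np.arange(2 * M * T, dtype=float).reshape(2, M, T)
y = stacked_vector(sig, t=5, L=L)
# y[:3] holds y_{0,0}(5), y_{0,0}(4), y_{0,0}(3); y[M*L] holds y_{1,0}(5)
```

With this layout, the reference microphones of the two arrays sit at entries 0 and $ML$ of $\mathbf{y}$ (zero-based indexing).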

III. LINEAR FILTERING

In this section, we review two filtering techniques for speech enhancement, which will be compared to their WL counterparts in the following section.

A. Linear Multi-Channel Wiener Filter

The goal of the MWF in speech enhancement is to minimize the MSE between the desired speech component of a reference microphone signal and a linearly filtered version of the microphone signals. The linear MSE cost function at each array is given as

$$
\min_{\mathbf{w}_{k,\mathrm{MWF}}} \; E\left\{\left|d_k - \mathbf{w}_{k,\mathrm{MWF}}^T\mathbf{y}\right|^2\right\} \tag{5}
$$

where $d_k$ is the desired speech component and $E\{\cdot\}$ denotes the expected value. For ease of exposition, it is assumed that the first microphone of each array acts as the reference microphone, i.e., $d_0 = x_{0,0}$ and $d_1 = x_{1,0}$.

The solution at each array takes the form of the MWF [11],

$$
\mathbf{w}_{k,\mathrm{MWF}} = \mathbf{R}_{yy}^{-1}\mathbf{R}_{xx}\mathbf{e}_k \tag{6}
$$

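where $\mathbf{R}_{yy} = E\{\mathbf{y}\mathbf{y}^H\}$, $\mathbf{R}_{xx} = E\{\mathbf{x}\mathbf{x}^H\}$, and $\mathbf{e}_k$ is a vector with one entry equal to 1 and all other entries equal to 0, which selects the column of $\mathbf{R}_{xx}$ that corresponds to the reference microphone. The estimated desired speech component of each array is then given as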

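$$
\hat{d}_{k,\mathrm{MWF}} = \mathbf{w}_{k,\mathrm{MWF}}^T\mathbf{y} = \mathbf{e}_k^T\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{y}. \tag{7}
$$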
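A minimal NumPy sketch of (6) and (7) follows. It assumes the covariance matrices are already available; here they are synthetic, with speech and noise assumed uncorrelated so that $\mathbf{R}_{yy} = \mathbf{R}_{xx} + \mathbf{R}_{vv}$, whereas in practice they would have to be estimated. The helper name `linear_mwf` is hypothetical. Both filters come out of a single linear solve, since they differ only in the selected column of $\mathbf{R}_{xx}$.

```python
import numpy as np

def linear_mwf(R_yy, R_xx, M, L):
    """Linear MWFs (6) for both reference microphones.

    Returns a (2ML x 2) array whose columns are w_{0,MWF} and w_{1,MWF}.
    The references x_{0,0} and x_{1,0} are entries 0 and M*L of y in (4).
    """
    N = 2 * M * L
    E = np.zeros((N, 2))
    E[0, 0] = 1.0        # e_0
    E[M * L, 1] = 1.0    # e_1
    # R_yy^{-1} R_xx [e_0, e_1]: one solve with two right-hand sides
    return np.linalg.solve(R_yy, R_xx @ E)

# Synthetic, illustrative statistics (assumption: R_yy = R_xx + R_vv)
rng = np.random.default_rng(0)
M, L = 2, 3
N = 2 * M * L
A = rng.standard_normal((N, N)); R_xx = A @ A.T
B = rng.standard_normal((N, N)); R_yy = R_xx + B @ B.T + np.eye(N)

W = linear_mwf(R_yy, R_xx, M, L)
y = rng.standard_normal(N)            # one hypothetical snapshot of y
d_hat = W.T @ y                       # [d_{0,MWF}, d_{1,MWF}] as in (7)
```

Using `np.linalg.solve` instead of forming $\mathbf{R}_{yy}^{-1}$ explicitly is a common numerical choice; it yields the same filters.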

B. Linear Minimum Variance Distortionless Response Filter

It is known that (6) suppresses noise at the expense of distorting the speech. In order to avoid this, an MVDR filter can be used, which minimizes the output power while imposing a linear constraint to enforce a distortionless filter response.

It is noted here that a truly distortionless response can only be accomplished if $\mathbf{R}_{xx}$ has rank 1, which is why MVDR filtering is usually applied in the frequency domain, where this rank-1 model holds, as in the case of a single target speaker. In [8], MVDR filtering is proposed for time-domain speech enhancement. However, since the rank-1 assumption then generally does not hold, this MVDR approach is not strictly distortionless, in the sense of delivering an undistorted speech signal, despite its name.

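The MVDR cost function and linear constraint are given as [6]

$$
\min_{\mathbf{w}_{k,\mathrm{MVDR}}} \; \mathbf{w}_{k,\mathrm{MVDR}}^T\mathbf{R}_{yy}\mathbf{w}_{k,\mathrm{MVDR}} \quad \text{subject to} \quad \mathbf{w}_{k,\mathrm{MVDR}}^T\mathbf{a}_k = 1, \tag{8}
$$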

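where $\mathbf{a}_k$ is a response vector, which is a scaled version of $\mathbf{R}_{xx}\mathbf{e}_k$, i.e.,

$$
\mathbf{a}_k = \frac{\mathbf{R}_{xx}\mathbf{e}_k}{\mathbf{e}_k^T\mathbf{R}_{xx}\mathbf{e}_k}. \tag{9}
$$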

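The solution to (8) is given by [12]

$$
\mathbf{w}_{k,\mathrm{MVDR}} = \frac{\mathbf{R}_{yy}^{-1}\mathbf{a}_k}{\mathbf{a}_k^T\mathbf{R}_{yy}^{-1}\mathbf{a}_k}, \tag{10}
$$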

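and using the definition of the response vector (9), the MVDR filter is shown to be equivalent, up to a scaling factor $\alpha_k$, to the MWF (6), i.e.,

$$
\mathbf{w}_{k,\mathrm{MVDR}} = \alpha_k\mathbf{R}_{yy}^{-1}\mathbf{R}_{xx}\mathbf{e}_k, \tag{11}
$$

where

$$
\alpha_k = \frac{\mathbf{e}_k^T\mathbf{R}_{xx}\mathbf{e}_k}{\mathbf{e}_k^T\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{R}_{xx}\mathbf{e}_k}. \tag{12}
$$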

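The estimated desired speech component at each array is then given as

$$
\hat{d}_{k,\mathrm{MVDR}} = \alpha_k\mathbf{e}_k^T\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{y} = \alpha_k\hat{d}_{k,\mathrm{MWF}}. \tag{13}
$$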
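The equivalence (11)-(13) can be checked numerically. The sketch below assumes NumPy and synthetic covariances, and uses a hypothetical helper `mvdr_and_mwf`; it computes the MVDR filter directly from (9)-(10) and compares it with the MWF (6) scaled by $\alpha_k$ from (12), for both reference microphones.

```python
import numpy as np

def mvdr_and_mwf(R_yy, R_xx, idx):
    """Time-domain MVDR (10), MWF (6) and alpha_k (12) for reference entry idx."""
    e = np.zeros(R_xx.shape[0])
    e[idx] = 1.0
    a = R_xx @ e / (e @ R_xx @ e)                 # response vector (9)
    Ryy_inv_a = np.linalg.solve(R_yy, a)
    w_mvdr = Ryy_inv_a / (a @ Ryy_inv_a)          # MVDR solution (10)
    w_mwf = np.linalg.solve(R_yy, R_xx @ e)       # MWF solution (6)
    # (12): e^T R_xx R_yy^{-1} R_xx e equals e^T R_xx w_mwf
    alpha = (e @ R_xx @ e) / (e @ R_xx @ w_mwf)
    return w_mvdr, w_mwf, alpha

# Synthetic, illustrative statistics (assumption: R_yy = R_xx + R_vv)
rng = np.random.default_rng(0)
M, L = 2, 3
N = 2 * M * L
A = rng.standard_normal((N, N)); R_xx = A @ A.T
B = rng.standard_normal((N, N)); R_yy = R_xx + B @ B.T + np.eye(N)

# References x_{0,0} and x_{1,0} sit at entries 0 and M*L of y
results = {idx: mvdr_and_mwf(R_yy, R_xx, idx) for idx in (0, M * L)}
for idx, (w_mvdr, w_mwf, alpha) in results.items():
    print(idx, np.allclose(w_mvdr, alpha * w_mwf))   # True for both
```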

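[Figure: the targets $x_{0,0}$ and $x_{1,0}$, the $\mathbf{y}$-plane, the MWF estimates $\hat{d}_{0,\mathrm{MWF}} = d_0^A + d_0^B$ and $\hat{d}_{1,\mathrm{MWF}}$, and the stretched estimates $\alpha_0\hat{d}_{0,\mathrm{MWF}}$ and $\alpha_1\hat{d}_{1,\mathrm{MWF}}$.]

Fig. 1. Graphical representation of the relation between the linear MWF and linear MVDR solutions.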

The similarity between the MWF and the MVDR filter is shown graphically in Fig. 1. The linear MWF projects $x_{0,0}$ and $x_{1,0}$ orthogonally onto the $\mathbf{y}$-plane. The MWF estimate of $x_{0,0}$, again denoted by $\hat{d}_{0,\mathrm{MWF}}$, has a component $d_0^A$ along $x_{0,0}$ and a component $d_0^B$ orthogonal to $x_{0,0}$, where $\hat{d}_{0,\mathrm{MWF}} = d_0^A + d_0^B$. The MVDR estimate is then a stretched version of the MWF estimate (scaled by $\alpha_0$) such that the stretched component $\alpha_0 d_0^A$ coincides with $x_{0,0}$. The same process happens for the estimate of $x_{1,0}$.

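Since the MWF is known to distort the speech, and since (11) is equivalent to an MWF (up to a fixed scaling), we indeed find that the time-domain MVDR filter is not distortionless either. Essentially, the linear constraint in (8), with $\mathbf{a}_k$ given by (9), only ensures that the covariance between the MVDR filter output $\hat{d}_{k,\mathrm{MVDR}}$ and the desired signal $x_{k,0}$ is equal to the variance of $x_{k,0}$, i.e., $E\{x_{k,0}\hat{d}_{k,\mathrm{MVDR}}\} = E\{x_{k,0}^2\}$ [8], since $E\{\mathbf{y}x_{k,0}\} = \mathbf{R}_{xx}\mathbf{e}_k$ when speech and noise are uncorrelated.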
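The point that the constraint only fixes a covariance, not the speech waveform, can be illustrated with the NumPy sketch below (synthetic covariances; `speech_distortion` is a hypothetical helper). It evaluates $E\{x_{k,0}\hat{d}_{k,\mathrm{MVDR}}\}$, $E\{x_{k,0}^2\}$ and the speech-only error energy $E\{(x_{k,0} - \mathbf{w}_{k,\mathrm{MVDR}}^T\mathbf{x})^2\}$ for a generic full-rank $\mathbf{R}_{xx}$ and for an assumed rank-1 $\mathbf{R}_{xx} = \mathbf{s}\mathbf{s}^T$.

```python
import numpy as np

def speech_distortion(R_yy, R_xx, idx):
    """Return E{x d}, E{x^2} and E{(x - w^T x)^2} for the MVDR filter (10).

    x denotes the reference speech x_{k,0} at entry idx of the stacked vector.
    """
    e = np.zeros(R_xx.shape[0])
    e[idx] = 1.0
    a = R_xx @ e / (e @ R_xx @ e)                # response vector (9)
    w = np.linalg.solve(R_yy, a)
    w = w / (a @ w)                              # MVDR filter (10)
    cov_xd = w @ R_xx @ e                        # E{x_{k,0} d_{k,MVDR}}
    var_x = e @ R_xx @ e                         # E{x_{k,0}^2}
    dist = var_x - 2.0 * cov_xd + w @ R_xx @ w   # speech-only error energy
    return cov_xd, var_x, dist

rng = np.random.default_rng(1)
N = 12                                           # 2ML with M = 2, L = 3
B = rng.standard_normal((N, N)); R_vv = B @ B.T + np.eye(N)

A = rng.standard_normal((N, N))
R_full = A @ A.T                                 # generic, full-rank speech
s = rng.standard_normal(N)
R_rank1 = np.outer(s, s)                         # rank-1 speech model

full = speech_distortion(R_full + R_vv, R_full, 0)
rank1 = speech_distortion(R_rank1 + R_vv, R_rank1, 0)
# In both cases cov_xd equals var_x; only the rank-1 case has dist close to 0.
```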

Despite their theoretical equivalence, an adaptive implementation of (6) or (11) based on block processing may result in different output signals, due to time variations in $\alpha_k$, which is recomputed such that $E\{x_{k,0}\hat{d}_{k,\mathrm{MVDR}}\} = E\{x_{k,0}^2\}$ holds in each block. Finally, it is noted that the MWF is known to preserve the binaural cues of the speech [13]. Since the time-domain MVDR filter corresponds to an MWF with an additional scaling $\alpha_k$, which is different for $k = 0$ and $k = 1$, this will result in distortion of the binaural cues.
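The binaural-cue remark can be made explicit for the interaural level difference with a short derivation from (13). It only considers output powers (interaural phase and time differences are not covered) and assumes non-zero speech power at both reference microphones, so that $\alpha_0, \alpha_1 > 0$ as ratios of positive quadratic forms in (12). Since (13) scales the speech and noise components of each output alike,

$$
\frac{E\{\hat{d}_{0,\mathrm{MVDR}}^2\}}{E\{\hat{d}_{1,\mathrm{MVDR}}^2\}}
= \left(\frac{\alpha_0}{\alpha_1}\right)^2
\frac{E\{\hat{d}_{0,\mathrm{MWF}}^2\}}{E\{\hat{d}_{1,\mathrm{MWF}}^2\}},
$$

so, relative to the MWF outputs, the interaural level difference of the MVDR outputs is shifted by $20\log_{10}(\alpha_0/\alpha_1)$ dB, which is zero only when $\alpha_0 = \alpha_1$.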

IV. WIDELY LINEAR FILTERING

The derivations of the linear MWF (6) and MVDR filter (11) also hold for complex-valued signals, but then the transpose operator $^T$ should be replaced by the conjugate transpose $^H$ in every equation. However, complex signals also allow one to exploit the non-circularity of the signals if WL filtering techniques are used instead [2], [3], [4].

In [6], [7], [8], [9], a complex signal vector is artificially constructed from the real signals received at both arrays,

$$
\underline{\mathbf{y}} = \mathbf{y}_0 + j\mathbf{y}_1, \tag{14}
$$

where $j = \sqrt{-1}$, to be able to apply WL filtering techniques. WL filtering then amounts to using the original linear filters on an augmented $2ML$-dimensional signal vector, $\tilde{\mathbf{y}} \in \mathbb{C}^{2ML}$, defined as

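$$
\tilde{\mathbf{y}} = \begin{bmatrix} \underline{\mathbf{y}} \\ \underline{\mathbf{y}}^* \end{bmatrix}, \tag{15}
$$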

where $^*$ denotes complex conjugation and where $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{v}}$ are defined similarly. This augmented signal vector can easily be shown to be a transform of the original signal vector in (4),

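$$
\tilde{\mathbf{y}} = \begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}\mathbf{y}. \tag{16}
$$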

This transformation may then be used to show the equivalence between the estimated desired speech component obtained with the WL filters and the estimated desired speech components $\hat{d}_k$ ($k = 0, 1$) obtained with the linear filters when applied to real signals.
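The transform (16) is easy to verify numerically. The NumPy sketch below (function names hypothetical) builds $\tilde{\mathbf{y}}$ directly from (14)-(15) and compares it with the matrix form (16). It also checks that this matrix, written $\mathbf{T}$ here for brevity, satisfies $\mathbf{T}\mathbf{T}^H = 2\mathbf{I}$, so $\tilde{\mathbf{y}}$ and the real vector $\mathbf{y}$ determine each other uniquely.

```python
import numpy as np

def augmented_vector(y0, y1):
    """Augmented vector (15) built from the complex signal (14)."""
    y_c = y0 + 1j * y1                          # (14): y = y_0 + j y_1
    return np.concatenate([y_c, y_c.conj()])    # (15): [y; y*]

def transform_matrix(n):
    """The 2n x 2n matrix [[I, jI], [I, -jI]] appearing in (16)."""
    I = np.eye(n)
    return np.block([[I, 1j * I], [I, -1j * I]])

rng = np.random.default_rng(2)
n = 6                                           # n = ML (illustrative)
y0, y1 = rng.standard_normal(n), rng.standard_normal(n)
y = np.concatenate([y0, y1])                    # real stacked vector (4)

T = transform_matrix(n)
y_aug = augmented_vector(y0, y1)
print(np.allclose(y_aug, T @ y))                          # True
print(np.allclose(T @ T.conj().T, 2 * np.eye(2 * n)))     # True
```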

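A. Widely Linear Multi-Channel Wiener Filter

The WL-MWF, i.e., the WL counterpart of (6), is given as

$$
\tilde{\mathbf{w}}_{\mathrm{MWF}} = \mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0, \tag{17}
$$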

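where $\mathbf{R}_{\tilde{y}\tilde{y}} = E\{\tilde{\mathbf{y}}\tilde{\mathbf{y}}^H\}$ and $\mathbf{R}_{\tilde{x}\tilde{x}} = E\{\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H\}$. The estimated desired speech component using (17) is then given as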

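$$
\hat{d}_{\mathrm{MWF}} = \mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\tilde{\mathbf{y}}, \tag{18}
$$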

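which is expanded in (19) using (16). In (19), the inner matrix factors cancel, and the first row of the matrix in (16) equals $\mathbf{e}_0^T + j\mathbf{e}_1^T$, with $\mathbf{e}_0$ and $\mathbf{e}_1$ selecting the reference entries of the real vector $\mathbf{y}$ as in (6). Simplifying (19) accordingly, we see that the estimated desired speech component is given by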

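$$
\hat{d}_{\mathrm{MWF}} = \begin{bmatrix} 1 & j \end{bmatrix}\begin{bmatrix} \mathbf{e}_0^T \\ \mathbf{e}_1^T \end{bmatrix}\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{y} = \begin{bmatrix} 1 & j \end{bmatrix}\begin{bmatrix} \hat{d}_{0,\mathrm{MWF}} \\ \hat{d}_{1,\mathrm{MWF}} \end{bmatrix}. \tag{20}
$$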

The WL-MWF output then fully corresponds to the linear MWF outputs: its real part is $\hat{d}_{0,\mathrm{MWF}}$ and its imaginary part is $\hat{d}_{1,\mathrm{MWF}}$.
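The equivalence (20) can be checked numerically with the sketch below. It assumes NumPy and synthetic real covariances, and forms the augmented covariances as $\mathbf{R}_{\tilde{y}\tilde{y}} = \mathbf{T}\mathbf{R}_{yy}\mathbf{T}^H$ and $\mathbf{R}_{\tilde{x}\tilde{x}} = \mathbf{T}\mathbf{R}_{xx}\mathbf{T}^H$, where $\mathbf{T}$ is shorthand for the matrix in (16).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6                                   # n = ML (illustrative: M = 2, L = 3)
N = 2 * n                               # length of the real vector y in (4)

# Synthetic real covariances (assumption: R_yy = R_xx + R_vv)
A = rng.standard_normal((N, N)); R_xx = A @ A.T
B = rng.standard_normal((N, N)); R_yy = R_xx + B @ B.T + np.eye(N)

I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])   # matrix in (16)
Rx_aug = T @ R_xx @ T.conj().T              # R_{x~x~} = E{x~ x~^H}
Ry_aug = T @ R_yy @ T.conj().T              # R_{y~y~} = E{y~ y~^H}

y = rng.standard_normal(N)                  # one snapshot of y
y_aug = T @ y                               # augmented vector via (16)

# WL-MWF output (18): e_0^T R_{x~x~} R_{y~y~}^{-1} y~
d_wl = Rx_aug[0] @ np.linalg.solve(Ry_aug, y_aug)

# Linear MWF outputs (7); the references are entries 0 and n of y
z = np.linalg.solve(R_yy, y)
d0, d1 = R_xx[0] @ z, R_xx[n] @ z

print(np.isclose(d_wl, d0 + 1j * d1))       # True, as in (20)
```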

B. Widely Linear Minimum Variance Distortionless Response Filter

In [6], a WL response vector $\tilde{\mathbf{a}}$ is used in the WL-MVDR filter, given as

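$$
\tilde{\mathbf{a}} = \frac{\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0}{\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0}. \tag{21}
$$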

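The WL-MVDR filter of [6] may then be given as

$$
\tilde{\mathbf{w}}_{\mathrm{MVDR}} = \frac{\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\tilde{\mathbf{a}}}{\tilde{\mathbf{a}}^H\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\tilde{\mathbf{a}}}, \tag{22}
$$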

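and using the definition of the response vector (21), the WL-MVDR filter is shown to be equivalent, up to a scaling factor $\alpha$, to the WL-MWF (17), i.e.,

$$
\tilde{\mathbf{w}}_{\mathrm{MVDR}} = \alpha\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0, \tag{23}
$$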

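where

$$
\alpha = \frac{\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0}{\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0}. \tag{24}
$$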

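The estimated desired speech component using (22) is then given as

$$
\hat{d}_{\mathrm{MVDR}} = \alpha\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\tilde{\mathbf{y}}, \tag{25}
$$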

which admits the same expansion as in (19). Simplifying (25) in the same way, we see that the estimated desired speech component is given by

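$$
\begin{aligned}
\hat{d}_{\mathrm{MVDR}} &= \alpha\begin{bmatrix} 1 & j \end{bmatrix}\begin{bmatrix} \mathbf{e}_0^T \\ \mathbf{e}_1^T \end{bmatrix}\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{y} \\
&= \alpha\begin{bmatrix} 1 & j \end{bmatrix}\begin{bmatrix} \hat{d}_{0,\mathrm{MWF}} \\ \hat{d}_{1,\mathrm{MWF}} \end{bmatrix} \\
&= \begin{bmatrix} 1 & j \end{bmatrix}\begin{bmatrix} \frac{\alpha}{\alpha_0}\hat{d}_{0,\mathrm{MVDR}} \\ \frac{\alpha}{\alpha_1}\hat{d}_{1,\mathrm{MVDR}} \end{bmatrix}.
\end{aligned}
\tag{26}
$$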

The WL-MVDR output then corresponds to the linear MVDR outputs up to the real-valued scalings $\alpha/\alpha_0$ and $\alpha/\alpha_1$ (or to the linear MWF outputs up to a joint scaling with $\alpha$). It is noted that

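$$
\alpha = \frac{\mathbf{e}_0^T\mathbf{R}_{xx}\mathbf{e}_0 + \mathbf{e}_1^T\mathbf{R}_{xx}\mathbf{e}_1}{\mathbf{e}_0^T\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{R}_{xx}\mathbf{e}_0 + \mathbf{e}_1^T\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{R}_{xx}\mathbf{e}_1} \tag{27}
$$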

and so $\alpha$ can also be computed from quantities available in the linear filtering approach. Hence, both approaches yield exactly the same information.
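Similarly, (24), (26) and (27) can be cross-checked. The NumPy sketch below (synthetic covariances; $\mathbf{T}$ again denotes the matrix in (16)) computes $\alpha$ once from the augmented statistics via (24) and once from real-valued linear-filter quantities via (27), and compares the WL-MVDR output of (22) with $\alpha(\hat{d}_{0,\mathrm{MWF}} + j\hat{d}_{1,\mathrm{MWF}})$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6                                   # n = ML (illustrative)
N = 2 * n
A = rng.standard_normal((N, N)); R_xx = A @ A.T
B = rng.standard_normal((N, N)); R_yy = R_xx + B @ B.T + np.eye(N)

I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])   # matrix in (16)
Rx_aug = T @ R_xx @ T.conj().T
Ry_aug = T @ R_yy @ T.conj().T

# (24) from the augmented statistics; g = R_{x~x~} e_0, and
# e_0^T R_{x~x~} = g^H because R_{x~x~} is Hermitian
g = Rx_aug[:, 0]
alpha_wl = (g[0] / (g.conj() @ np.linalg.solve(Ry_aug, g))).real

# (27) from real-valued quantities of the linear approach only
P = R_xx @ np.linalg.solve(R_yy, R_xx)      # R_xx R_yy^{-1} R_xx
alpha_lin = (R_xx[0, 0] + R_xx[n, n]) / (P[0, 0] + P[n, n])

# WL-MVDR filter (21)-(22) and its output w^H y~
a = g / g[0]
w = np.linalg.solve(Ry_aug, a)
w = w / (a.conj() @ w)
y = rng.standard_normal(N)
d_wl = w.conj() @ (T @ y)

# (26): the same output from the two linear MWF outputs
z = np.linalg.solve(R_yy, y)
d_lin = alpha_lin * (R_xx[0] @ z + 1j * (R_xx[n] @ z))

print(np.isclose(alpha_wl, alpha_lin), np.isclose(d_wl, d_lin))   # True True
```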

The WL-MWF gives the same estimates for $x_{0,0}$ and $x_{1,0}$ as the linear MWF. The WL-MVDR is obtained by equally stretching the linear MWF solutions by $\alpha$. If the $d_0^A$ component is stretched into something longer than $x_{0,0}$, then $d_1^A$ is stretched into something shorter than $x_{1,0}$, and vice versa, because the two stretched components, $\alpha d_0^A$ and $\alpha d_1^A$, now only jointly satisfy one equation, i.e., $E\{x_{0,0}\,\alpha\hat{d}_{0,\mathrm{MWF}} + x_{1,0}\,\alpha\hat{d}_{1,\mathrm{MWF}}\} = E\{x_{0,0}^2 + x_{1,0}^2\}$. The relation between the linear MWF and the WL-MVDR can be depicted graphically as in Fig. 1, where both MWF estimates would then be stretched by the same factor $\alpha$ instead of by $\alpha_0$ and $\alpha_1$, respectively.

V. EQUIVALENCE OF THE MVDR SCALING FACTORS UNDER A RANK-1 MODEL

Originally, the MVDR approach was designed for scenarios with a rank-1 model for $\mathbf{R}_{xx}$ or $\mathbf{R}_{\tilde{x}\tilde{x}}$. We show that when such a rank-1 model is used for $\mathbf{R}_{xx}$ or $\mathbf{R}_{\tilde{x}\tilde{x}}$, the scaling factors of the linear MVDR and WL-MVDR filters are equivalent.

A. Linear MVDR Scaling Factor

The singular value decomposition (SVD) of the assumed rank-1 matrix $\mathbf{R}_{xx}$ is given as

$$
\mathbf{R}_{xx} = \mathbf{U}_x\boldsymbol{\Sigma}_x\mathbf{V}_x^T, \tag{28}
$$

where $\boldsymbol{\Sigma}_x = \mathrm{diag}(\sigma_x, 0, \dots, 0)$ and the elements of $\mathbf{U}_x$ and $\mathbf{V}_x$ are denoted by $u_{x,i,j}$ and $v_{x,i,j}$, respectively. Using this SVD of $\mathbf{R}_{xx}$, the numerator of (12) for $k = 0$ is shown to be

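$$
\mathbf{e}_0^T\mathbf{R}_{xx}\mathbf{e}_0 = \mathbf{e}_0^T\mathbf{U}_x\boldsymbol{\Sigma}_x\mathbf{V}_x^T\mathbf{e}_0 = \sigma_x u_{x,1,1}v_{x,1,1}. \tag{29}
$$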

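$$
\begin{aligned}
\hat{d}_{\mathrm{MWF}} = {} & \mathbf{e}_0^T\begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}\mathbf{R}_{xx}\begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}^H\left(\begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}^H\right)^{-1} \\
& \times \mathbf{R}_{yy}^{-1}\begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}\mathbf{y}
\end{aligned}
\tag{19}
$$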

For the denominator of (12), we express the SVD as

$$
\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{R}_{xx} = \mathbf{U}_x\boldsymbol{\Sigma}_{xy}\mathbf{V}_x^T, \tag{30}
$$

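where

$$
\boldsymbol{\Sigma}_{xy} = \boldsymbol{\Sigma}_x\mathbf{V}_x^T\mathbf{R}_{yy}^{-1}\mathbf{U}_x\boldsymbol{\Sigma}_x, \tag{31}
$$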

which is another diagonal matrix with a single non-zero element, i.e., $\boldsymbol{\Sigma}_{xy} = \mathrm{diag}(\sigma_{xy}, 0, \dots, 0)$. Therefore,

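$$
\mathbf{e}_0^T\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{R}_{xx}\mathbf{e}_0 = \mathbf{e}_0^T\mathbf{U}_x\boldsymbol{\Sigma}_{xy}\mathbf{V}_x^T\mathbf{e}_0 = \sigma_{xy}u_{x,1,1}v_{x,1,1}, \tag{32}
$$

and the scaling factor $\alpha_0$ can be shown to be equal to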

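$$
\alpha_0 = \frac{\mathbf{e}_0^T\mathbf{R}_{xx}\mathbf{e}_0}{\mathbf{e}_0^T\mathbf{R}_{xx}\mathbf{R}_{yy}^{-1}\mathbf{R}_{xx}\mathbf{e}_0} = \frac{\sigma_x}{\sigma_{xy}}, \tag{33}
$$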

which is also the same for the $k = 1$ array, i.e., $\alpha_0 = \alpha_1$.
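Under an assumed rank-1 model $\mathbf{R}_{xx} = \mathbf{s}\mathbf{s}^T$ (the vector $\mathbf{s}$ is hypothetical and drawn at random here), the NumPy sketch below computes $\sigma_x$ and $\sigma_{xy}$ from the SVD as in (28)-(31) and checks (33), i.e., $\alpha_0 = \alpha_1 = \sigma_x/\sigma_{xy}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6                                   # n = ML (illustrative)
N = 2 * n
s = rng.standard_normal(N)              # hypothetical vector spanning the speech subspace
R_xx = np.outer(s, s)                   # assumed rank-1 model
B = rng.standard_normal((N, N)); R_yy = R_xx + B @ B.T + np.eye(N)

# SVD (28); only the first singular value is non-zero (up to rounding)
U, sv, Vt = np.linalg.svd(R_xx)
sigma_x = sv[0]
# (31): the single non-zero entry sigma_x^2 v_1^T R_yy^{-1} u_1 of Sigma_xy
sigma_xy = sigma_x**2 * (Vt[0] @ np.linalg.solve(R_yy, U[:, 0]))

def alpha_k(idx):
    """Linear MVDR scaling factor (12) for the reference at entry idx."""
    r = R_xx[:, idx]                    # R_xx e_k
    return R_xx[idx, idx] / (r @ np.linalg.solve(R_yy, r))

alpha0, alpha1 = alpha_k(0), alpha_k(n)
print(np.isclose(alpha0, sigma_x / sigma_xy), np.isclose(alpha0, alpha1))  # True True
```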

B. Widely Linear MVDR Scaling Factor

In the WL case, the numerator of (24), using (16) and the SVD of $\mathbf{R}_{xx}$ in (28), is given as

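$$
\begin{aligned}
\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0 &= \mathbf{e}_0^T\begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}\mathbf{R}_{xx}\begin{bmatrix} \mathbf{I} & j\mathbf{I} \\ \mathbf{I} & -j\mathbf{I} \end{bmatrix}^H\mathbf{e}_0 \\
&= \begin{bmatrix} 1 & j \end{bmatrix}\begin{bmatrix} \mathbf{e}_0^T \\ \mathbf{e}_1^T \end{bmatrix}\mathbf{R}_{xx}\begin{bmatrix} \mathbf{e}_0 & \mathbf{e}_1 \end{bmatrix}\begin{bmatrix} 1 & j \end{bmatrix}^H.
\end{aligned}
\tag{34}
$$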

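Since $\mathbf{R}_{xx}$ is symmetric, the multiplication by the vectors $[1 \;\, j]$ and $[1 \;\, j]^H$ cancels the off-diagonal terms while summing the diagonal terms: for any $2 \times 2$ matrix $\mathbf{A}$, $[1 \;\, j]\,\mathbf{A}\,[1 \;\, j]^H = A_{11} + A_{22} + j(A_{21} - A_{12})$, and the imaginary part vanishes when $\mathbf{A}$ is symmetric. Therefore, (34) can be represented as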

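$$
\begin{aligned}
\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0 &= \mathrm{Tr}\left\{\begin{bmatrix} \mathbf{e}_0^T \\ \mathbf{e}_1^T \end{bmatrix}\mathbf{R}_{xx}\begin{bmatrix} \mathbf{e}_0 & \mathbf{e}_1 \end{bmatrix}\right\} \\
&= \mathrm{Tr}\left\{\begin{bmatrix} \mathbf{e}_0^T \\ \mathbf{e}_1^T \end{bmatrix}\mathbf{U}_x\boldsymbol{\Sigma}_x\mathbf{V}_x^T\begin{bmatrix} \mathbf{e}_0 & \mathbf{e}_1 \end{bmatrix}\right\} \\
&= \sigma_x\left(u_{x,1,1}v_{x,1,1} + u_{x,ML+1,1}v_{x,ML+1,1}\right).
\end{aligned}
\tag{35}
$$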

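The denominator is expanded in a similar fashion as

$$
\begin{aligned}
\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0 &= \mathrm{Tr}\left\{\begin{bmatrix} \mathbf{e}_0^T \\ \mathbf{e}_1^T \end{bmatrix}\mathbf{U}_x\boldsymbol{\Sigma}_{xy}\mathbf{V}_x^T\begin{bmatrix} \mathbf{e}_0 & \mathbf{e}_1 \end{bmatrix}\right\} \\
&= \sigma_{xy}\left(u_{x,1,1}v_{x,1,1} + u_{x,ML+1,1}v_{x,ML+1,1}\right).
\end{aligned}
\tag{36}
$$

The WL-MVDR scaling factor is therefore equivalent to the linear MVDR scaling factor (33),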

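$$
\alpha = \frac{\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0}{\mathbf{e}_0^T\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{R}_{\tilde{y}\tilde{y}}^{-1}\mathbf{R}_{\tilde{x}\tilde{x}}\mathbf{e}_0} = \frac{\sigma_x}{\sigma_{xy}}. \tag{37}
$$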
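The WL counterpart (37) can be checked in the same spirit. For $\mathbf{R}_{xx} = \mathbf{s}\mathbf{s}^T$, one can verify from (28)-(31) that $\sigma_x = \|\mathbf{s}\|^2$ and $\sigma_{xy} = \|\mathbf{s}\|^2\,\mathbf{s}^T\mathbf{R}_{yy}^{-1}\mathbf{s}$, so $\sigma_x/\sigma_{xy} = 1/(\mathbf{s}^T\mathbf{R}_{yy}^{-1}\mathbf{s})$. The NumPy sketch below (synthetic data, hypothetical $\mathbf{s}$) compares this value with $\alpha$ computed from the augmented statistics via (24).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6                                   # n = ML (illustrative)
N = 2 * n
s = rng.standard_normal(N)              # hypothetical rank-1 speech vector
R_xx = np.outer(s, s)                   # assumed rank-1 model
B = rng.standard_normal((N, N)); R_yy = R_xx + B @ B.T + np.eye(N)

# sigma_x / sigma_xy for R_xx = s s^T, i.e. the right-hand side of (37)
ratio = 1.0 / (s @ np.linalg.solve(R_yy, s))

# alpha from the augmented statistics, (24)
I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])   # matrix in (16)
Rx_aug = T @ R_xx @ T.conj().T
Ry_aug = T @ R_yy @ T.conj().T
g = Rx_aug[:, 0]                            # R_{x~x~} e_0
alpha_wl = (g[0] / (g.conj() @ np.linalg.solve(Ry_aug, g))).real

print(np.isclose(alpha_wl, ratio))          # True
```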

VI. COMPUTATIONAL COMPLEXITY

The WL filtering approach computes a single complex-valued filter of length $2ML$, while the linear filtering approach computes two real-valued filters of length $2ML$ (one for each desired signal $d_k$, $k = 0, 1$). However, complex arithmetic is roughly 4 times as expensive as real arithmetic (e.g., a complex multiplication requires four real multiplications). As a result, the WL filtering approach is actually twice as expensive as the linear filtering approach, with no increase in performance. Furthermore, the two real-valued linear filters can share many of their computations (e.g., the inversion of $\mathbf{R}_{yy}$), which makes the WL filtering approach more than twice as expensive as the linear filtering approach.
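The complexity argument can be made concrete with a rough count. The sketch below assumes that a complex-by-complex multiplication costs four real multiplications, ignores additions and any savings from exploiting symmetry, and counts (i) the real multiplications needed to apply the filters to one snapshot and (ii) the real numbers needed to store one full $2ML \times 2ML$ covariance matrix. The function names are hypothetical.

```python
def filtering_cost(M, L):
    """Real multiplications per output snapshot (naive count).

    Linear approach: two real filters of length 2ML applied to real y.
    WL approach: one complex filter of length 2ML applied to complex y~,
    assuming 4 real multiplications per complex multiplication.
    """
    n = 2 * M * L
    linear = 2 * n
    widely_linear = 4 * n
    return linear, widely_linear

def covariance_storage(M, L):
    """Real numbers stored for one full 2ML x 2ML covariance matrix."""
    n = 2 * M * L
    real_Ryy = n * n              # real-valued R_yy
    complex_Ryy_aug = 2 * n * n   # complex R_{y~y~}: real and imaginary parts
    return real_Ryy, complex_Ryy_aug

print(filtering_cost(2, 50))      # (400, 800)
print(covariance_storage(2, 50))  # (40000, 80000)
```

This count covers only the filtering step and the storage of one covariance matrix; as noted above, sharing the factorization of $\mathbf{R}_{yy}$ between the two linear filters widens the gap further.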

VII. CONCLUSIONS

An equivalence was shown between the estimated desired speech components obtained with time-domain linear and widely linear filters in binaural speech enhancement applications when only real signals are used. While the WL filters offer a novel way to represent the received real signals as a single complex signal, there is no added benefit in terms of speech enhancement. Moreover, using an artificially constructed complex signal increases both the memory requirements and the computational complexity of the system.

REFERENCES

[1] J. Benesty, J. Chen, and Y. A. Huang, "A widely linear distortionless filter for single-channel noise reduction," IEEE Signal Process. Lett., vol. 17, no. 5, pp. 469–472, May 2010.

[2] J. Benesty, J. Chen, and Y. A. Huang, "On widely linear Wiener and tradeoff filters for noise reduction," Speech Communication, vol. 52, no. 5, pp. 427–439, 2010.

[3] P. J. Schreier, L. L. Scharf, and C. T. Mullis, "A unified approach to performance comparisons between linear and widely linear processing," in Proc. IEEE Workshop on Statistical Signal Process., Sep. 2003, pp. 114–117.

[4] T. Adali, H. Li, and R. Aloysius, "On properties of the widely linear MSE filter and its LMS implementation," in Proc. 43rd Annu. Conf. on Inform. Sciences and Systems (CISS 09), Mar. 2009, pp. 876–881.

[5] B. Picinbono and P. Chevalier, "Widely linear estimation with complex data," IEEE Trans. Signal Process., vol. 43, no. 8, pp. 2030–2033, Aug. 1995.

[6] J. Chen and J. Benesty, "A time-domain widely linear MVDR filter for binaural noise reduction," in Proc. IEEE Workshop on Applicat. of Signal Process. to Audio and Acoust. (WASPAA 11), Oct. 2011, pp. 105–108.

[7] J. Benesty and J. Chen, "A multichannel widely linear approach to binaural noise reduction using an array of microphones," in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP 12), Mar. 2012, pp. 313–316.

[8] J. Benesty, J. Chen, and Y. A. Huang, "Binaural noise reduction in the time domain with a stereo setup," IEEE Trans. Audio, Speech, and Language Process., vol. 19, no. 8, pp. 2260–2272, Nov. 2011.

[9] J. Chen and J. Benesty, "On the time-domain widely linear LCMV filter for noise reduction with a stereo system," IEEE Trans. Audio, Speech, and Language Process., vol. 21, no. 7, pp. 1343–1354, 2013.

[10] C. Stanciu, J. Benesty, C. Paleologu, T. Gänsler, and S. Ciochină, "A widely linear model for stereophonic acoustic echo cancellation," Signal Processing, vol. 93, no. 2, pp. 511–516, 2013.

[11] S. Doclo and M. Moonen, "GSVD-based optimal filtering for single and multimicrophone speech enhancement," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2230–2244, Sep. 2002.

[12] E. A. P. Habets, J. Benesty, S. Gannot, and I. Cohen, "The MVDR beamformer for speech enhancement," in Speech Processing in Modern Communication, I. Cohen, J. Benesty, and S. Gannot, Eds., vol. 3 of Springer Topics in Signal Processing, pp. 225–254. Springer Berlin Heidelberg, 2010.

[13] B. Cornelis, S. Doclo, T. Van den Bogaert, M. Moonen, and J. Wouters, "Theoretical analysis of binaural multimicrophone noise reduction techniques," IEEE Trans. Audio, Speech, and Language Process., vol. 18, no. 2, pp. 342–355, Feb. 2010.
