IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 56, NO. 7, JULY 2008 3137

Improving the Tracking Capability of Adaptive Filters via Convex Combination

Magno T. M. Silva, Member, IEEE, and Vítor H. Nascimento, Member, IEEE

Abstract—As is well known, Hessian-based adaptive filters (such as the recursive least-squares algorithm (RLS) for supervised adaptive filtering, or the Shalvi–Weinstein algorithm (SWA) for blind equalization) converge much faster than gradient-based algorithms [such as the least-mean-squares algorithm (LMS) or the constant-modulus algorithm (CMA)]. However, when the problem is tracking a time-variant filter, the issue is not so clear-cut: there are environments for which each family presents better performance. Given this, we propose the use of a convex combination of algorithms of different families to obtain an algorithm with superior tracking capability. We show the potential of this combination and provide a unified theoretical model for the steady-state excess mean-square error for convex combinations of gradient- and Hessian-based algorithms, assuming a random-walk model for the parameter variations. The proposed model is valid for algorithms of the same or different families, and for supervised (LMS and RLS) or blind (CMA and SWA) algorithms.

Index Terms—Adaptive equalizers, adaptive filters, convex combination, least-mean-square (LMS) methods, recursive estimation, tracking, unsupervised learning.

I. INTRODUCTION

WHEN choosing an adaptive algorithm for a given application, one of the important points to be considered is the algorithm's ability to track variations in the statistics of the signals of interest. This is especially important in mobile communications [1], and in applications that demand long filters, such as acoustic echo cancellation [2].

There are two standard approaches to adaptive filtering: gradient-based, such as the least-mean-squares algorithm (LMS); and Hessian-based, such as the recursive least-squares algorithm (RLS). Of these two, the latter, given its use of estimates of the Hessian of the cost function being minimized, converges at a much faster rate than the former, as is well known [3], [4]. However, Eweda showed in [5] that for the tracking of time-variant parameters, LMS may in fact outperform RLS, depending on the statistics of the regressor and desired signals. A similar behavior was observed in blind algorithms for channel equalization: the gradient-based constant-modulus algorithm (CMA) [6], [7] has a considerably slower convergence than the Hessian-based Shalvi–Weinstein algorithm (SWA) [8]. Again,

Manuscript received July 2, 2007; revised January 9, 2008. This work was supported in part by CNPq under Grant 303361/2004-2 and in part by FAPESP under Grants 2004/15114-2 and 2008/00773-1. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jonathon Chambers.

The authors are with the Department of Electronic Systems Engineering, University of São Paulo, 05508-900 São Paulo-SP, Brazil (e-mail: magno@lps.usp.br; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2008.919105

as in the case of LMS and RLS, it was shown in [9] that the tracking capabilities of CMA and SWA depend heavily on the statistics of the input signals, and CMA may outperform SWA, depending on the environment.

In this paper, we use the observations of Eweda [5] and of [9], together with the convex combination of adaptive filters proposed in [10] and further extended and analyzed, respectively, in [11] and [12], to take advantage of the different tracking capabilities of LMS and RLS (resp., CMA and SWA) to arrive at supervised (resp., blind) algorithms with superior tracking performance.

The idea of combining the outputs of several different independently run adaptive algorithms to achieve better performance than that of a single filter is not new. It apparently was first proposed in [13] and later improved in [14] and [15]. Similar ideas have also been used in the information theory literature; see, e.g., [16]. The algorithms proposed in [13]–[15] are based on a Bayesian argument, and construct an overall (combined) filter through a linear combination of the outputs of several independent adaptive filters. The weights are the a posteriori probabilities that the underlying models used to describe each individual algorithm are "true." Since the weights add up to one, in a sense these first papers also proposed "convex" combinations of algorithms. The method of [12] is receiving more attention due to its relative simplicity and the proof that the combination is universal, i.e., the combined estimate is at least as good as the best of the component filters in steady-state, for stationary inputs.

Previous works on convex combinations of adaptive filters mostly restricted themselves to combinations of filters of the same family, i.e., two LMS [11], [12], [17], two RLS [11], or two CMA [18] filters with different step-sizes or forgetting factors. A combination of two filters based on different cost functions (but both gradient-based) was proposed in [19], combining normalized LMS and normalized sign-LMS to obtain an algorithm with improved robustness without the slow convergence behavior of sign-LMS. Combinations of Kalman or RLS filters were proposed in [14] and [15], using a different combination rule, which also allows for the use of other algorithms and for the use of filters with a different number of taps. It should be noted that theoretical models (approximations for the overall filter's steady-state excess mean-square error) are available in the literature only for the combination of two LMS algorithms [12], [17]. However, the possibility of extending these models to different combinations of algorithms was already indicated in [12] and [17]. The proof in [12] that the combination is universal also applies to different choices of algorithms.

The present paper extends previous results in four ways: 1) proposing the combination of supervised algorithms of different families to take advantage of their different tracking capabilities; 2) extending this result also to blind algorithms of different families; 3) providing theoretical models (in a unified way) for the steady-state mean-square error for combinations of filters of the same or different families, assuming a random-walk model for the parameter variations; and 4) providing theoretical models for combinations of blind algorithms of the same or different families. To the best of our knowledge, all of these are novel contributions. In particular, the models for the combinations of two RLS, two CMA, or two SWA filters are also new results. For combinations of filters of different families, the results presented here are more accurate than those we published as conference papers, in [20] (supervised filters) and [21] (blind filters). We also extend these previous results by providing a unified analysis, which is valid for combinations of filters of the same or of different families, and for supervised or blind algorithms. In this sense, the analysis provided here also recovers the results for the combination of two LMS filters, presented in [12]. Unlike this reference, here we use the traditional analysis method, in which one computes a recursion for the autocorrelation matrix of the weight-error vector of a filter, as opposed to the feedback method of [3]. In passing, we should add that the analysis for blind adaptive filters using the traditional method that we present here is also novel, and it gives the same results obtained using the feedback method for CMA in [22]–[24] and for SWA in [9].

1053-587X/$25.00 © 2008 IEEE

In the remainder of this section, we provide a few examples to motivate the combination of filters of two different families, both for supervised (combination of one RLS with one LMS) and for blind algorithms (combination of one SWA with one CMA).

A. Introductory Simulations

In the supervised case, we simulate the identification of a time-variant channel (Rayleigh fading channel) with five coefficients [3, p. 401]. The parameters that control the component adaptive filters and the combination algorithm are described in Section II. Fig. 1(a) shows curves of one realization of squared a priori errors for RLS (λ = 0.995), LMS (μ = 0.01), and their convex combination C-RLS-LMS (μ_a = 400, a⁺ = 4). To facilitate the visualization, the curves were filtered by a moving-average filter with 512 coefficients. The convex combination performs at least as well as the best of its components, slightly outperforming both of them in some situations. This behavior can be confirmed by the mixing parameter η(n) shown in Fig. 1(b). When η(n) ≈ 1, the combination performs close to RLS; when η(n) ≈ 0, it is close to LMS; and when 0 < η(n) < 1, the combination tends to be better than both independent filters.

For the same example, we show in Fig. 2 a comparison between C-RLS-LMS, the convex combination of two LMS filters (CLMS), and the robust variable step-size LMS (RVSS-LMS) of [25]. We observe that C-RLS-LMS presents better tracking performance than CLMS, and both are better than RVSS-LMS. Thus, the convex combination of one RLS with one LMS can be a better alternative for tracking performance.

Fig. 1. Time-variant channel identification. (a) Squared a priori errors for RLS (λ = 0.995), LMS (μ = 0.01), and their convex combination C-RLS-LMS (μ_a = 400, a⁺ = 4). (b) Evolution of the mixing parameter. Input: correlated Gaussian noise, AR-1 model with pole at 0.99; measurement noise with variance σ_v² = 10⁻ ; Rayleigh fading channel (five coefficients, symbol period T = 0.8 μs, maximum Doppler spread f_D = 50 Hz).

Fig. 2. Same example of Fig. 1. Squared a priori errors for C-RLS-LMS, CLMS (μ₁ = 0.01, μ₂ = 0.001, μ_a = 400, a⁺ = 4), and RVSS-LMS (step-size limits μ₁ and μ₂, initial step size (μ₁ + μ₂)/2, and forgetting constant 0.97).

In the blind equalization case, we consider a Rayleigh fading channel with fast variation (maximum Doppler spread f_D = 80 Hz) and three coefficients [3, p. 401]. Fig. 3(a) shows residual intersymbol interference (ISI) [8] curves for SWA (λ = 0.999), CMA (μ = 2 × 10⁻ ), and their convex combination (μ_a = 15, a⁺ = 4). The combination usually performs as well as the better of the two component equalizers, being slightly better than both of them in some situations. In this example, the adaptation of the mixing parameter was not fast enough to switch between filters on a few brief occasions, most notably at the end of the simulation. This happens because the adaptation rule for the mixing parameter η(n) needs some time to identify that a change is necessary. Fig. 3(b) shows the mixing parameter η(n), which confirms this behavior. When η(n) ≈ 1, the combination performs close to SWA; when η(n) ≈ 0, it is close to CMA; and when 0 < η(n) < 1, it can be better than both of its component equalizers.

Fig. 3. Blind equalization. Residual intersymbol interference curves for CMA (μ = 2 × 10⁻ ), SWA (λ = 0.999), and their convex combination (μ_a = 15, a⁺ = 4); 2-PAM (pulse amplitude modulation); Rayleigh fading channel (three coefficients, symbol period T = 0.8 μs, maximum Doppler spread f_D = 80 Hz); SNR = 30 dB; baud-rate equalizer with 11 coefficients.

B. Organization of the Paper

The paper is organized as follows. In the next section, we describe the convex combination of two adaptive filters, for both supervised (RLS and LMS) and blind (SWA and CMA) algorithms. In Section III, the tracking analyses are presented. Initially, in Section III-A, we summarize results for the tracking analysis of the LMS and RLS algorithms. Then, in Section III-B, we present the tracking analysis of CMA and SWA using the traditional method. Finally, in Section III-C, the tracking analysis of the considered convex combinations is provided. Comparisons between analytical and experimental results for the steady-state excess mean-square error are shown through simulations in Section IV. Section V provides a summary of the main conclusions of the paper.

II. PROBLEM FORMULATION

We focus on the convex combination of two algorithms of the following general class:

w_i(n) = w_i(n−1) + μ_i R̂_i⁻¹(n) u(n) e_i(n),   (1)

where the subscript i is associated with the first (i = 1) or second (i = 2) filter of the combination, w_i(n) represents the length-M coefficient column-vector, μ_i is a step-size, R̂_i⁻¹(n) is an M-by-M symmetric nonsingular matrix, u(n) is the input regressor column-vector, and e_i(n) is the estimation error. Many algorithms can be written as in (1), by proper choices of μ_i, e_i(n), and R̂_i⁻¹(n). In this paper, in order to simplify the arguments, we assume that all quantities are real.

TABLE IPARAMETERS OF THE CONSIDERED ALGORITHMS

In supervised adaptive filtering,

e_i(n) = d(n) − y_i(n),   (2)

where y_i(n) = u^T(n) w_i(n−1) is the output of the ith transversal filter, (·)^T denotes transposition, and d(n) is the desired response. In this case, a linear regression model holds, that is,

d(n) = u^T(n) w_o(n−1) + v(n),   (3)

with w_o(n−1) being the time-variant optimal solution and v(n) a zero-mean random process with variance σ_v², uncorrelated with u(n) [3, Sec. 6.2.1]. Here, E{·} denotes the expectation operation, and the sequences {u(n)} and {v(n)} are assumed stationary. We shall use the common assumption that v(n) is independent of u(n) (not only uncorrelated) [3].

In blind equalization, algorithms based on the constant-modulus cost function [6], [7] define e_i(n) as

e_i(n) = [r − y_i²(n)] y_i(n),   (4)

where r = E{s⁴(n)}/E{s²(n)} and s(n) represents the transmitted sequence. Due to the equivalence between the constant-modulus and Shalvi–Weinstein cost functions shown in [26], CMA and SWA seek to optimize the same criterion. Thus, although SWA was originally derived in [8] through the minimization of the SW cost function using empirical cumulants, it can also be interpreted as a constant-modulus-based algorithm.
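For instance, assuming the standard real-valued constant-modulus error implied by (4), e(n) = [r − y²(n)] y(n), the dispersion constant r and the error are straightforward to compute for PAM constellations (a sketch; the function names are ours):

```python
def dispersion_constant(symbols):
    # r = E{s^4(n)} / E{s^2(n)} for an equiprobable real constellation
    m2 = sum(s * s for s in symbols) / len(symbols)
    m4 = sum(s ** 4 for s in symbols) / len(symbols)
    return m4 / m2

def cm_error(y, r):
    # real-valued constant-modulus estimation error, cf. (4)
    return (r - y * y) * y

r_2pam = dispersion_constant([-1.0, 1.0])             # constant-modulus constellation: r = 1
r_4pam = dispersion_constant([-3.0, -1.0, 1.0, 3.0])  # non-constant modulus: r = 41/5
```

For a constant-modulus constellation such as 2-PAM, the error vanishes exactly when |y(n)| = 1, whereas for 4-PAM the error is zero only at |y(n)| = √r; this modulus variability is what plays the role of measurement noise in the analysis of Section III.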

The supervised LMS and RLS algorithms and the blind CMA and SWA employ the step-sizes μ_i, the estimation errors e_i(n), and the matrices R̂_i⁻¹(n) as in Table I. In this table, I is the M × M identity matrix, λ is a forgetting factor, and

γ = 3 E{s²(n)} − r.   (5)

For RLS and SWA, R̂_i⁻¹(n) is obtained via the matrix inversion lemma [3, Eq. (2.6.4)] applied to R̂_i(n), which is an estimate (with forgetting factor λ) of the autocorrelation matrix of the input signal, i.e., R = E{u(n) u^T(n)}. These matrices are related via

R̂_i(n) = λ R̂_i(n−1) + u(n) u^T(n).   (6)

Although we use the same notation μ for the LMS and CMA step-sizes in Table I, the step-size intervals which ensure the convergence and stability of these algorithms are different. For the LMS algorithm, this step-size interval is well known in the literature [3], [4], whereas for CMA, the derivation of this interval remains an open problem.

Fig. 4. Adaptive convex combination of two transversal filters for (a) supervised filtering and (b) blind equalization.
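As an illustration (our sketch, not code from the paper), LMS and RLS can be written as instances of the general class (1): LMS takes the matrix in (1) equal to the identity, while RLS propagates P(n) = R̂⁻¹(n) through the matrix inversion lemma applied to the estimate in (6). Variable names and parameter values are ours.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lms_step(w, u, d, mu):
    # instance of (1) with the matrix set to the identity
    e = d - dot(u, w)                          # a priori error, as in (2)
    return [wi + mu * ui * e for wi, ui in zip(w, u)]

def rls_step(w, P, u, d, lam):
    # instance of (1) with P(n) = inverse of R_hat(n), where
    # R_hat(n) = lam * R_hat(n-1) + u(n) u(n)^T, cf. (6)
    e = d - dot(u, w)
    Pu = [dot(row, u) for row in P]
    k = [x / (lam + dot(u, Pu)) for x in Pu]   # gain vector
    P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(len(w))]
         for i in range(len(w))]
    return [wi + ki * e for wi, ki in zip(w, k)], P

# Toy identification of a fixed 3-tap filter with +/-1 regressors:
rng = random.Random(3)
w_o = [0.5, -1.0, 0.25]
w_lms = [0.0, 0.0, 0.0]
w_rls = [0.0, 0.0, 0.0]
P = [[100.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for _ in range(500):
    u = [rng.choice([-1.0, 1.0]) for _ in range(3)]
    d = dot(u, w_o)                            # noise-free desired signal
    w_lms = lms_step(w_lms, u, d, mu=0.1)
    w_rls, P = rls_step(w_rls, P, u, d, lam=0.99)
```

In this noise-free, stationary toy run both weight vectors converge to w_o; the tracking differences that motivate the paper appear only with noise and time-variant optima, as analyzed in Section III.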

The convex combination of two adaptive filters proposed in [11], [12], and [18] is depicted in Fig. 4. Fig. 4(a) considers supervised filtering and can be used for different applications, such as system identification, adaptive equalization, echo or noise cancellation, etc. [3], [4]. Fig. 4(b) shows a simplified communications system with a convex combination of two blind equalizers. In this case, the signal s(n), assumed independent and identically distributed (i.i.d.) and non-Gaussian, is transmitted through an unknown channel, whose model is constituted by a finite-impulse response (FIR) filter and additive white Gaussian noise. From the received signal and the known statistical properties of the transmitted signal, the blind equalizer must mitigate the channel effects and recover the signal s(n − τ_d) for some delay τ_d.

We also assume that the equalization algorithms are implemented in T/2-fractionally spaced form, due to its inherent advantages (see, e.g., [22] and [27]–[29] and the references therein). This type of implementation is widely used in the literature since it ensures perfect equalization in a noise-free environment, under certain well-known conditions. For real data, perfect equalization is achieved when the overall channel-equalizer impulse response is of the form ±δ(n − τ_d), where τ_d is the equalization delay. In this case, the equalizer reaches the so-called zero-forcing solution. The two possibilities + or − occur due to the fact that constant-modulus-based algorithms do not solve phase ambiguities introduced by the channel. Such ambiguities can be corrected by using differential modulation [1]. Since both solutions give equally good results, we assume in the next section that the algorithm is initialized so that it converges to the + case. Since we are studying its steady-state performance, this does not imply a restriction in the applicability of our results.

In both schemes, the output of the overall filter is given by y(n) = η(n) y₁(n) + [1 − η(n)] y₂(n). The mixing parameter η(n) is modified via an auxiliary variable a(n) and a sigmoidal function [11], [12], that is,

η(n) = 1 / (1 + e^{−a(n)}),   (7)

with a(n) being updated as

a(n+1) = a(n) + μ_a e(n) [y₁(n) − y₂(n)] η(n) [1 − η(n)],   (8)

where

e(n) = d(n) − y(n)   (9)

for supervised adaptive filtering, and e(n) = [r − y²(n)] y(n) for blind equalization. Equation (8) was obtained in [11], [12], and [18], using a stochastic gradient rule to minimize the instantaneous MSE cost function for the supervised case and the constant-modulus cost function for the blind case. The auxiliary variable a(n) is used to keep η(n) in the interval [0, 1]. A drawback of this scheme is that a(n) virtually stops updating whenever η(n) is close to 0 or 1. To avoid this, [12] and [18] suggest that a(n) be restricted (by simple saturation) to lie inside a symmetric interval [−a⁺, a⁺]. Thus, a minimum level of adaptation is always guaranteed [12].

Following [12], in all our simulations we restrict a(n) to the above interval, but use a modified mixing variable η_mod(n), as described below:

η_mod(n) = 1, if a(n) > a⁺ − ε;
η_mod(n) = η(n), if |a(n)| ≤ a⁺ − ε;
η_mod(n) = 0, if a(n) < −(a⁺ − ε),

where ε is a small constant. This modification tends to improve the performance of the overall algorithm when one of the component filters performs substantially better than the other [12]. Note that the adaptation rule for a(n) still uses the original, unmodified η(n), but the overall output is now computed with η_mod(n).
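The mixing update (7)–(9) can be sketched in a few lines (our sketch; the names mu_a and a_plus are illustrative, and the update forms are the standard ones from [11], [12]):

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mixing_step(a, y1, y2, d, mu_a=10.0, a_plus=4.0):
    """One stochastic-gradient update of the auxiliary variable a(n)."""
    eta = sigmoid(a)                                 # mixing parameter, cf. (7)
    e = d - (eta * y1 + (1.0 - eta) * y2)            # overall error, cf. (9)
    a += mu_a * e * (y1 - y2) * eta * (1.0 - eta)    # cf. (8)
    return max(-a_plus, min(a_plus, a))              # saturate a(n) in [-a_plus, a_plus]

# Toy check: component 1 matches the desired signal exactly, component 2 is
# noisy, so eta(n) should be driven toward 1 (combination follows filter 1).
rng = random.Random(0)
a = 0.0
for _ in range(500):
    d = rng.gauss(0.0, 1.0)
    a = mixing_step(a, d, d + rng.gauss(0.0, 0.7), d)
eta_final = sigmoid(a)
```

Because a(n) is saturated rather than allowed to diverge, a minimum level of adaptation is kept, matching the discussion above; the modified variable η_mod(n) used for the output can be layered on top of this rule.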

III. TRACKING ANALYSIS

We assume that in a nonstationary environment, the variation in the optimal solution follows a random-walk model [3, p. 359], that is,

w_o(n) = w_o(n−1) + q(n).   (10)

In this model, q(n) is an i.i.d. vector with positive-definite autocorrelation matrix Q = E{q(n) q^T(n)}, independent of the initial conditions and of u(m) and v(m) for all m [3, Sec. 7.4]. In supervised filtering, q(n) is also assumed independent of the desired response d(m) for all m. In blind equalization, w_o(n) represents the zero-forcing solution and q(n) models the channel variation (see assumption A3 below).

One measure of the filter performance is given by the excess mean-square error (EMSE), defined as

ζ_i = lim_{n→∞} E{e²_{a,i}(n)},   (11)

where

e_{a,i}(n) = u^T(n) w̃_i(n−1)   (12)

and

w̃_i(n) = w_o(n) − w_i(n).   (13)

In steady-state, the a priori error of the overall scheme can be written as a function of the a priori errors of the component filters, i.e.,

e_a(n) = η(n) e_{a,1}(n) + [1 − η(n)] e_{a,2}(n),   (14)

where the component errors e_{a,i}(n), i = 1, 2, are defined as in (12). It is common in the literature to evaluate the EMSE as

ζ_i = Tr(R S_i),   (15)

where Tr(·) stands for the trace of a matrix and

S_i = lim_{n→∞} E{w̃_i(n) w̃_i^T(n)}.   (16)

This approach is based on the independence assumption between the regressor vector u(n) and the weight-error vector w̃_i(n−1). This condition is a part of the widely used independence assumptions in adaptive filter theory [4]. It was shown in [30], for instance, that for LMS-type algorithms and for sufficiently small step-sizes, the results obtained from such independence assumptions tend to match reasonably well the real filter performance. Furthermore, [31] argues that it is not necessary for u(n) to be independent of w̃_i(n−1), but only of the outer product w̃_i(n−1) w̃_i^T(n−1), which represents a weaker assumption, since the outer product does not contain phase information about w̃_i(n−1).

The main focus of the analysis that follows is the behavior of the algorithm in steady-state, i.e., after initial convergence of the coefficients. Although the optimum weights are time-variant, under the model assumed for their variation it is well known that the EMSE of an adaptive filter approaches a steady-state value; see, e.g., [3].

It was proved in [12] that the performance of convex combinations of adaptive filters is at least as good as that of the best of its components in stationary environments. In this paper, we make the assumption that the adaptation of η(n) is fast enough, so that, after initial convergence, the overall algorithm follows the best component filter at every instant. The simulations presented in Section I-A show that, even when the optimum coefficients change quite rapidly, this assumption is rarely violated. Note that the same assumption was implicitly used in [12].

A. Tracking Analysis of Supervised Algorithms

There have been several works in the literature on the steady-state tracking performance of supervised adaptive algorithms (see, e.g., [3]–[5] and [24] and the references therein). For sufficiently small μ and 1 − λ, analytical expressions for the EMSE of the LMS and RLS algorithms are given, respectively, by

ζ_LMS = (1/2) [μ σ_v² Tr(R) + μ⁻¹ Tr(Q)]   (17)

and

ζ_RLS = (1/2) [(1 − λ) σ_v² M + (1 − λ)⁻¹ Tr(QR)].   (18)
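A scalar (M = 1) Monte-Carlo sketch (ours, with illustrative parameter values) can check the LMS expression numerically, assuming the small-step-size form ζ_LMS ≈ (μ σ_v² Tr(R) + μ⁻¹ Tr(Q))/2 given in (17):

```python
import random

def lms_tracking_emse(mu=0.01, sigma_v=0.1, sigma_q=1e-3,
                      n_iter=60000, discard=5000, seed=1):
    """Scalar LMS tracking a random-walk optimum; returns the estimated EMSE."""
    rng = random.Random(seed)
    w_o, w = 0.0, 0.0
    acc, count = 0.0, 0
    for n in range(n_iter):
        u = 1.0 if rng.random() < 0.5 else -1.0   # binary regressor, Tr(R) = 1
        d = u * w_o + rng.gauss(0.0, sigma_v)     # regression model, cf. (3)
        e = d - u * w                             # a priori error
        if n >= discard:                          # skip the transient
            e_a = u * (w_o - w)                   # noise-free a priori error
            acc += e_a * e_a
            count += 1
        w += mu * u * e                           # LMS update
        w_o += rng.gauss(0.0, sigma_q)            # random walk, cf. (10)
    return acc / count

emse_sim = lms_tracking_emse()
# theory: 0.5 * (mu * sigma_v^2 * Tr(R) + sigma_q^2 / mu)
emse_theory = 0.5 * (0.01 * 0.1 ** 2 * 1.0 + 1e-3 ** 2 / 0.01)
```

With these values the gradient-noise and lag terms of the theoretical expression contribute equally, and the simulated EMSE should land close to the predicted value.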

The ratio between the minimum values of ζ for these algorithms is given as

ζ_RLS,min / ζ_LMS,min = √( M Tr(QR) / [Tr(R) Tr(Q)] ).   (19)

This ratio, obtained in [5], allows us to compare the tracking performance of LMS and RLS. Clearly, the result of such a comparison depends on the environment, i.e., there are situations where RLS has superior tracking capability compared to LMS, and vice versa [5]. This is highlighted by considering three different choices for the matrix Q [3]–[5].

1) Q is a multiple of I: the performance of LMS is similar to that of RLS.
2) Q is a multiple of R: LMS is superior.
3) Q is a multiple of R⁻¹: RLS is superior.
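The three cases can be checked numerically. Assuming the minimized EMSE expressions underlying (19), ζ_LMS,min = σ_v √(Tr(R) Tr(Q)) and ζ_RLS,min = σ_v √(M Tr(QR)) (as obtained in [5]), here is a short sketch with diagonal matrices and illustrative values:

```python
import math

def min_emse_lms(R_diag, Q_diag, sv2=1.0):
    # minimum over mu of (mu * sv2 * Tr(R) + Tr(Q) / mu) / 2
    return math.sqrt(sv2 * sum(R_diag) * sum(Q_diag))

def min_emse_rls(R_diag, Q_diag, sv2=1.0):
    # minimum over (1 - lambda) of ((1 - lam) * sv2 * M + Tr(QR) / (1 - lam)) / 2
    tr_qr = sum(q * r for q, r in zip(Q_diag, R_diag))
    return math.sqrt(sv2 * len(R_diag) * tr_qr)

R = [4.0, 2.0, 1.0, 0.5]   # eigenvalues of a diagonal R (illustrative)
ratios = {}
for name, Q in [("Q ~ I", [0.1] * 4),
                ("Q ~ R", [0.1 * r for r in R]),
                ("Q ~ R^-1", [0.1 / r for r in R])]:
    ratios[name] = min_emse_rls(R, Q) / min_emse_lms(R, Q)
```

By the Cauchy–Schwarz inequality, the sign of M Tr(QR) − Tr(R) Tr(Q) flips between the last two cases, reproducing items 1)–3) above: the ratio equals 1 for Q ∝ I, exceeds 1 for Q ∝ R (LMS superior), and is below 1 for Q ∝ R⁻¹ (RLS superior).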

B. Tracking Analysis of Blind Equalization Algorithms

Analytical expressions for the EMSE of blind equalization algorithms have been computed in the literature (see, e.g., [9], [22]–[24], and [32]). Using Lyapunov stability and averaging analysis, an approximate expression for the EMSE of CMA was obtained in [32]. Later, [22] and [23] focused on the CMA steady-state performance, using the feedback analysis. Considering still the feedback method, [9] analyzed the tracking of constant-modulus-based algorithms (including SWA) in a unified manner. In the sequel, we present an alternative analysis using the traditional method, i.e., we compute the EMSE of CMA and SWA via (15). In the remainder of this section, we suppress the subscript i, since we are interested in the analysis of each algorithm individually.

The steady-state analysis of blind algorithms of the form (1) is based on the following assumptions:

A1) s(n) is i.i.d., with E{s(n)} = 0, E{s³(n)} = 0, and E{s⁴(n)} < 3 E²{s²(n)}, i.e., s(n) is sub-Gaussian, and the constellation is symmetric, as is the case for most constellations used in digital communications [3], [8].

A2) The regressor vector u(n) and the weight-error vector w̃(n−1) are independent in steady-state. As mentioned before, this independence assumption is commonly used with good results for the analysis of adaptive algorithms [4].


A3) The signal-to-noise ratio at the input is high, so that u^T(n) w_o(n−1) ≈ s(n − τ_d), i.e., the optimum filter achieves perfect equalization. However, due to channel variation and gradient noise, the equalizer weight vector w(n−1) is not equal to w_o(n−1), even in steady-state. Using (13), (12), and the above approximation, we have y(n) = u^T(n) w(n−1) = u^T(n) w_o(n−1) − e_a(n), that is,

y(n) ≈ s(n − τ_d) − e_a(n).   (20)

This approximation was also used in the CMA steady-state analyses of [22, Sec. III-A] and [23]. As in the cited references, we assume that the filter parameters and initial condition are chosen so that the combined channel-equalizer response converges to +δ(n − τ_d). A similar analysis holds for the other optimum solution (convergence to a zero-forcing solution with negative sign).

Using A3, (4) can be rewritten as

e(n) = β(n) + γ(n) e_a(n) − 3 s(n − τ_d) e_a²(n) + e_a³(n),   (21)

where

γ(n) = 3 s²(n − τ_d) − r   (22)

and

β(n) = [r − s²(n − τ_d)] s(n − τ_d).   (23)

If e_a(n) is reasonably small in steady-state, terms depending on higher-order combinations of e_a(n) can be disregarded in (21), which leads to the approximation

e(n) ≈ β(n) + γ(n) e_a(n).   (24)

From (23) and A1, β(n) is an i.i.d. random variable, which satisfies E{β(n)} = 0 and

σ_β² = E{β²(n)} = r² E{s²(n)} − 2r E{s⁴(n)} + E{s⁶(n)}.   (25)

To proceed, we also assume that

A4) β(n) and γ(n) are independent of w̃(n−1) in steady-state. This essentially requires that the weight-error vector be insensitive, in steady-state, to the actually transmitted symbols s(n).

Subtracting both sides of (1) from w_o(n), using the model (10) and the approximation (24), we arrive at

w̃(n) = w̃(n−1) + q(n) − μ R̂⁻¹(n) u(n) [β(n) + γ(n) u^T(n) w̃(n−1)].   (26)

Remark: Note that (26) also holds for supervised filters, only in that case we would have γ(n) ≡ 1, and the measurement noise v(n) instead of β(n). In addition, β(n) is identically zero for constellations which do have constant modulus, so the variability in the modulus of s(n) (as measured by σ_β²) plays the role of measurement noise for the class of blind algorithms considered here.

Multiplying (26) by its transpose, taking the expectations of both sides, and using the fact that q(n) is independent of the initial conditions and of u(m), we obtain

(27)

To simplify (27), we remark that

R1) When R̂⁻¹(n) appears inside the expectations of (27), we simply replace it by its mean. Using (6), we have E{R̂⁻¹(n)} ≈ (1 − λ) R⁻¹. For λ ≈ 1, this is a reasonable steady-state approximation [3, Sec. 6.9.2].

R2) For channels with long impulse response, the following approximations are reasonable:

E{γ(n) u(n) u^T(n)} ≈ γ R   (28)

and

E{β²(n) u(n) u^T(n)} ≈ σ_β² R,   (29)

with γ and σ_β² defined in (5) and (25), respectively.

R3) In steady-state, the weight-error vector has zero mean, i.e., E{w̃(n)} = 0.

Remarks R2 and R3 are justified in detail in Appendix A.

Now, using assumptions A1-A4 and remarks R1-R3, we canevaluate the terms - of (27):

—Using A2, A4, R1, (28), and (16), can be approxi-mated by

(30)


—Analogously, we obtain for

(31)

—For CMA, since and , reduces to

(32)

Analogously for SWA, replacing by its mean and , we get

(33)

From (30)–(33), we observe that the first four terms on the right-hand side of (27) are linear in . Thus, assuming that the CMA step-size is sufficiently small and the SWA forgetting factor is sufficiently close to 1, the term for both algorithms can be neglected with respect to the first three terms on the right-hand side of (27).

—Using R1 and (29), we get

(34)

—The term can be rewritten as

(35)

Since in steady-state, the term is a null matrix with dimensions , i.e., . Using the same arguments, we can show that the terms , , and are also null matrices in steady-state.
From the previous results, (27) reduces to

(36)

Following a similar analysis for LMS [4], it can be shown that this recursion is stable for sufficiently small ; however, (36) cannot be used to find a range of that guarantees stability, due to the approximations made, particularly the discarding of (33) (we intend to pursue this matter elsewhere). For small , when , we obtain for CMA

(37)

Taking the trace of both sides of (37), we arrive at

(38)

Analogously for SWA, we have

(39)

TABLE II
ANALYTICAL EXPRESSIONS FOR THE EMSE OF THE CONSIDERED ALGORITHMS

Multiplying both sides of (39) by , and taking the trace, we obtain

(40)

These results coincide with those of [23] and [9], obtained with the feedback analysis for sufficiently small and . Furthermore, as shown in [9], the ratio between the minimum value of of SWA and CMA is equal to (19). This allows a direct extension to the blind context of the results comparing the tracking performance of the RLS and LMS algorithms.

In Table II, the analytical expressions for the EMSEs of the supervised and blind algorithms considered here are summarized for convenient reference. Comparing these expressions, one can observe that the EMSE of LMS and RLS can also be obtained, respectively, from the EMSE of CMA and SWA, by making and . Thus, there is an evident equivalence between LMS and CMA and between RLS and SWA, which reinforces the link between blind equalization algorithms and classical adaptive filtering of [33].
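For reference, the supervised entries of Table II correspond to the classical random-walk tracking results (see, e.g., [3], [4]). In standard notation, with LMS step-size $\mu$, RLS forgetting factor $\lambda$, filter length $M$, noise variance $\sigma_v^2$, regressor covariance $\mathbf{R}$, and random-walk increment covariance $\mathbf{Q}$, they read (a standard form, stated under the usual small-$\mu$ and $\lambda \to 1$ approximations; the exact Table II entries may differ in normalization):

```latex
\zeta_{\text{LMS}} \approx \frac{\mu\,\sigma_v^2\,\operatorname{Tr}(\mathbf{R})}{2}
  + \frac{\operatorname{Tr}(\mathbf{Q})}{2\mu},
\qquad
\zeta_{\text{RLS}} \approx \frac{(1-\lambda)\,M\,\sigma_v^2}{2}
  + \frac{\operatorname{Tr}(\mathbf{Q}\mathbf{R})}{2(1-\lambda)}.
```

The first term of each expression is the gradient-noise (estimation) contribution and the second the lag (tracking) contribution; since $\operatorname{Tr}(\mathbf{Q})$ and $\operatorname{Tr}(\mathbf{Q}\mathbf{R})$ weight $\mathbf{Q}$ differently, either algorithm can achieve the smaller minimum EMSE depending on $\mathbf{R}$ and $\mathbf{Q}$, which is precisely the motivation for combining the two families.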

C. Tracking Analysis of Convex Combinations

Using the same arguments as in [12, Sec. III], it is possible to show that convex combinations of algorithms of the form (1) are universal in the mean-square-error sense. Thus, considering, for example, the convex combination of one RLS and one LMS filter, when RLS outperforms LMS in steady-state, the behavior of the overall filter will be close to that of RLS and . On the other hand, when LMS is superior, . Moreover, there are situations where the combination will outperform both of them. In this case, the EMSE of the overall filter will be close to

(41)

where is the cross-EMSE, defined as

(42)

and , . Equation (41) was obtained in [12, Eq. (33)] for the combination of two LMS filters with


different step-sizes ( -LMS and -LMS). However, it is also valid for the convex combination of other algorithms of the form (1), such as the combination of two RLS filters with different forgetting factors ( -RLS and -RLS), one RLS and one LMS ( -RLS and -LMS), two CMAs ( -CMA and -CMA), two SWAs ( -SWA and -SWA), and one SWA and one CMA ( -SWA and -CMA). The EMSE of the overall filter is the minimum of the values calculated by the expressions for each component filter and (41).
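The combination scheme described above can be sketched for one RLS and one LMS component. This is a minimal stationary system-identification illustration, not the paper's experiment: the mixing parameter is adapted through a sigmoid of an auxiliary variable, following the scheme of [12], and all parameter values (step-sizes, forgetting factor, mixing step-size, saturation level) are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4                      # filter length (illustrative)
mu, lam_rls = 0.04, 0.92   # LMS step-size and RLS forgetting factor
mu_a, a_max = 100.0, 4.0   # mixing step-size and saturation (illustrative)

w_opt = rng.uniform(0, 1, M)            # unknown system
w1 = np.zeros(M); w2 = np.zeros(M)      # LMS and RLS weights
P = 100.0 * np.eye(M)                   # RLS inverse-correlation estimate
a = 0.0                                 # mixing variable, lam = sigmoid(a)

for n in range(2000):
    u = rng.standard_normal(M)
    d = u @ w_opt + 0.01 * rng.standard_normal()

    y1, y2 = u @ w1, u @ w2
    lam = 1.0 / (1.0 + np.exp(-a))
    y = lam * y1 + (1.0 - lam) * y2      # convex combination output
    e = d - y

    # each component adapts with its own error
    w1 += mu * (d - y1) * u              # LMS
    k = P @ u / (lam_rls + u @ P @ u)    # RLS gain
    w2 += k * (d - y2)
    P = (P - np.outer(k, u @ P)) / lam_rls

    # stochastic-gradient update of the mixing variable, as in [12]
    a += mu_a * e * (y1 - y2) * lam * (1.0 - lam)
    a = np.clip(a, -a_max, a_max)        # keep lam away from 0/1 lock-up

print(np.linalg.norm(w1 - w_opt), np.linalg.norm(w2 - w_opt))
```

Because each component filter runs on its own error, the combination never degrades the better filter; clipping the auxiliary variable is a common safeguard that prevents the mixing from saturating at 0 or 1 and freezing the adaptation.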

Thus, the tracking analysis of convex combinations of algorithms of the form (1) depends on the results of Table II and on the analytical expressions for the cross-EMSE. Using independence assumptions, such expressions can be obtained through the evaluation of

(43)

where

(44)

Although we are more interested in convex combinations of algorithms with different tracking capabilities, such as LMS with RLS or CMA with SWA, we also obtain analytical expressions for , considering the combinations of two RLSs, two CMAs, and two SWAs. Although our method also applies to the combination of two LMS filters, that case was already analyzed in [12] using the feedback method.

Using the linear regression model of (3) for the desired response , the error , defined in (2), can be rewritten as a function of the a priori error and of the disturbance , i.e.,

(45)

To make the performance analysis of the supervised convex combinations more tractable, we assume that
A5) the sequence is independent of for all and .
This assumption is widely used in the analysis of adaptive algorithms [3, p. 284], [12, Sec. II].

Comparing (45) to (24), the errors for supervised or blind filters satisfy

(46)

where and for a supervised algorithm, or and for a blind one. In both cases, , and and are assumed to be independent of in steady-state. Denoting and , R2 and A5 imply that

(47)

and

(48)

Subtracting both sides of (1) from , using the model (10), and replacing by (46), we arrive at

(49)

In order to obtain , we multiply (49) with by its transpose with and take the expectations of both sides. Then, assuming that is independent of the initial conditions and of , after some algebra, we get

(50)

Using the same arguments as in Section III-B, i.e., A1–A4/A5, (47), (48), R1, and R3, (50) can be simplified to

(51)

When , for the combination of -LMS with -LMS or -CMA with -CMA, we obtain

(52)

Taking the trace of both sides of (52), we arrive at

(53)

Similar expressions were also obtained using the feedback analysis in [12] for the combination of two LMS filters and in [21] for the combination of two CMAs.

For the combination of -RLS with -RLS or -SWA with -SWA, replacing , by their means, we obtain in steady-state

(54)


TABLE III
ANALYTICAL EXPRESSIONS FOR THE CROSS-EMSE OF THE CONSIDERED COMBINATIONS

Multiplying both sides of (54) by and taking the trace, we arrive at

(55)

Finally, for the combination of -RLS with -LMS or -SWA with -CMA, we have in steady-state

(56)

where

(57)

Post-multiplying both sides of (56) by and taking the trace, we obtain

(58)

The results of (53), (55), and (58) are summarized in Table III for all the convex combinations of adaptive algorithms considered here.

IV. SIMULATION RESULTS

To verify the behavior of the proposed scheme and the validity of the tracking analysis in the supervised case, we consider a system identification application. The initial optimal solution is formed with independent random values between 0 and 1, and is given by

The regressor is obtained from a process as , where is generated with a first-order autoregressive model, whose transfer function is . This model is fed with an i.i.d. Gaussian random process, whose variance is such that . Moreover, additive i.i.d. noise with variance is added to form the desired signal.

Fig. 5. (a) EMSE for �-RLS, �-LMS, and their convex combination. (b) Ensemble-average of �(n); � = 0.92, � = 0.04, � = 100, � = 4, c = 2×10, c = 2×10, b = 0.8; mean of 500 independent runs. In (a), the dashed lines represent the predicted values of � for the convex combination.
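The supervised experiment can be reproduced in outline as follows. This is a minimal sketch of our own, not the paper's exact setup: the filter length, noise variance, and Tr(Q) below are illustrative stand-ins (the exact values did not survive extraction); only the AR pole b = 0.8 and the LMS step-size 0.04 are taken from the figure captions, and a single LMS filter is tracked for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2)
M, b = 4, 0.8            # filter length (illustrative) and AR(1) pole b = 0.8
sigma_v2 = 1e-2          # measurement-noise variance (illustrative)
tr_Q = 1e-6              # Tr(Q) of the random-walk model (illustrative)
mu = 0.04                # LMS step-size, as in Fig. 5
runs, N = 50, 2500

emse = np.zeros(N)
for _ in range(runs):
    w_opt = rng.uniform(0, 1, M)   # initial optimal solution in [0, 1)
    w = np.zeros(M)
    u = np.zeros(M)
    x = 0.0
    for n in range(N):
        # unit-variance AR(1) regressor: x(n) = b x(n-1) + sqrt(1-b^2) s(n)
        x = b * x + np.sqrt(1.0 - b**2) * rng.standard_normal()
        u = np.roll(u, 1)
        u[0] = x
        d = u @ w_opt + np.sqrt(sigma_v2) * rng.standard_normal()
        ea = (w_opt - w) @ u                  # a priori error
        emse[n] += ea**2
        w += mu * (d - u @ w) * u             # LMS update
        # random-walk nonstationarity with covariance (tr_Q / M) I
        w_opt = w_opt + np.sqrt(tr_Q / M) * rng.standard_normal(M)
emse /= runs
print(10 * np.log10(emse[-500:].mean()))      # steady-state EMSE in dB
```

Averaging the squared a priori error over independent runs, as above, is the standard ensemble estimate of the EMSE used throughout this section.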

Fig. 5 shows the EMSE and estimated from the ensemble-average of 500 independent runs for -RLS ( ), -LMS ( ), and their convex combination ( , ). To facilitate the visualization, the EMSE curves were filtered by a moving-average filter with 32 coefficients. At every iterations, the nonstationary environment, represented by the matrix , is changed. During the first iterations, , LMS presents better tracking performance than RLS, and the combination performs close to LMS with . When the matrix becomes equal to , this behavior changes: although RLS is slightly better than LMS, the combination performs better than both of them and . For , the combination switches back to LMS and . Finally, for , the performance of RLS is better and the combination behaves close to it, with . The dashed lines in Fig. 5(a) show the predicted values of for the combination, which are in good agreement with the experimental results. Note also that the small disagreement between our model and the simulations for is due to an imprecision in the model for RLS.
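The 32-coefficient moving-average filtering used to display the EMSE curves can be sketched as follows (the function name is ours; this is purely a presentation step and does not affect the estimates):

```python
import numpy as np

def smooth(x, L=32):
    """Moving-average filter (L taps) used only for display of EMSE curves."""
    return np.convolve(x, np.ones(L) / L, mode="valid")

# a constant sequence is left unchanged (up to edge trimming)
print(smooth(np.ones(100)).shape)  # (69,)
```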

Fig. 6 shows the EMSE for different values of , considering theoretical and experimental results for -RLS ( ), -LMS ( ), and their convex combination ( , ). We use the ensemble-average of 100 independent runs and three different nonstationary environments: , , and . Good agreement between analysis and simulation can be observed for all kinds of


Fig. 6. EMSE for different values of c, considering theoretical and experimental results for the convex combination and theoretical results for LMS and RLS, with � = 0.92, � = 12.5, � = 0.04, � = 100, � = 4, b = 0.8; mean of 100 independent runs.

Fig. 7. EMSE for different values of c, considering theoretical and experimental results for the convex combination and theoretical results for �-RLS and �-RLS, with � = 0.92, � = 0.995, � = 12.5, � = 100, � = 4, b = 0.8; mean of 100 independent runs.

considered nonstationary environments. Note that the small disagreement between our model and the simulations observed in Fig. 6 for large is due to an imprecision in the model for RLS: this can be seen by comparing the theoretical and simulation curves for RLS alone.

In Fig. 7, we show the EMSE for different values of , considering theoretical and experimental results for -RLS ( ), -RLS ( ), and their convex combination ( , ). For the simulations, each point is an ensemble-average of 100 independent runs with . Again, we observe good agreement between theoretical and experimental EMSE. Such agreement also occurs for other kinds of nonstationary environments, with or .

Fig. 8. EMSE for different values of c, considering theoretical and experimental results for CMA, SWA, and their convex combination, with � = 0.99, � = 10, � = 0.1, � = 4; M = 4; 4-PAM; channel [0.1, 0.3, 1, −0.1, 0.5, 0.2]; T/2-fractionally spaced equalizer; mean of 100 independent runs.

In the blind equalization case, we assume 4-PAM (pulse amplitude modulation) with statistics , , , and channel coefficients¹ [0.1, 0.3, 1, −0.1, 0.5, 0.2] [22]. In the combinations, each component filter has coefficients as a T/2-fractionally spaced equalizer and is initialized with only one non-null element, in the second position.

Fig. 8 shows the EMSE for different values of , considering theoretical and experimental results for -SWA ( ), -CMA ( ), and their convex combination ( , ). Again, each point is an ensemble-average of 100 independent runs, and we present results for three different nonstationary environments: , , and . Although the agreement between analysis and simulation is not as good as in the supervised case, the predicted values model reasonably well the overall behavior of the algorithms, independently of the nonstationary environment. Note that a difference of a few dB is common in models for blind algorithms, due to the strong assumptions necessary for the analysis.

Figs. 9 and 10 show the EMSE for different values of , considering, respectively, theoretical and experimental results for -CMA, -CMA, and their convex combination with , and for -SWA, -SWA, and their convex combination with . We assume , , , , , , and the ensemble-average of 100 independent runs. The agreement between analysis and simulation for the combination of two SWAs is better than that for the combination of two CMAs. This occurs because the steady-state model of SWA shows better agreement with experimental results than that of CMA, as shown in Figs. 9 and 10. This behavior also happens for other kinds of nonstationary environments.

¹Although we assume channels with long impulse response to justify R2, our model agrees well with simulations even for a rather short channel such as this one. Good agreement was also observed with longer channels.


Fig. 9. EMSE for different values of c, considering theoretical and experimental results for �-CMA, �-CMA, and their convex combination, with � = 10, � = 10, � = 0.1, � = 4; M = 4; 4-PAM; channel [0.1, 0.3, 1, −0.1, 0.5, 0.2]; T/2-fractionally spaced equalizer; mean of 100 independent runs.

Fig. 10. EMSE for different values of c, considering theoretical and experimental results for �-SWA, �-SWA, and their convex combination, with � = 0.99, � = 0.999, � = 0.1, � = 4; M = 4; 4-PAM; channel [0.1, 0.3, 1, −0.1, 0.5, 0.2]; T/2-fractionally spaced equalizer; mean of 100 independent runs.

V. CONCLUSION

In this paper, we proposed the convex combination of filters of different families (gradient-based and Hessian-based) to achieve an overall filter with superior tracking performance. In addition, we presented a unified model for the convex combination of several different adaptive algorithms, both supervised and unsupervised (blind). Our models for the combination of two RLSs, two CMAs, two SWAs, one LMS with one RLS, and one CMA with one SWA are novel and show good agreement with simulations.

APPENDIX

REMARKS R2 AND R3

Considering an FIR channel with impulse response , its output in a noise-free environment is given by

In the analysis, we use . From (5), this approximation reduces to saying that

(59)

We check the validity of this approximation by evaluating both sides of (59). For the term on the right-hand side of (59), using the fact that is i.i.d., we arrive at

(60)

Analogously for the term on the left-hand side of (59), we obtain

(61)

Replacing by in (61), we get

(62)

The value of is nonnegative and depends on the constellation. For example, for the constant-modulus constellation of 2-PAM, for 4-PAM, and for 6-PAM. Assuming that is sufficiently large such that , (62) can be approximated by (60). In this case, (59) and (28) are satisfied.
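The constellation-dependent moments invoked here are elementary to compute. The sketch below is our own (unnormalized odd-integer M-PAM levels; the function name is an assumption): it returns E[a²], E[a⁴], and Var(a²), the last of which is zero exactly when the constellation has constant modulus.

```python
def pam_moments(levels):
    """Second and fourth moments of a uniform PAM alphabet,
    plus Var(a^2), which vanishes iff the modulus is constant."""
    m2 = sum(a * a for a in levels) / len(levels)
    m4 = sum(a**4 for a in levels) / len(levels)
    return m2, m4, m4 - m2 * m2

# 2-PAM has constant modulus, so Var(a^2) = 0
assert pam_moments([-1, 1])[2] == 0.0

# 4-PAM and 6-PAM do not
print(pam_moments([-3, -1, 1, 3]))          # (5.0, 41.0, 16.0)
print(pam_moments([-5, -3, -1, 1, 3, 5]))   # Var(a^2) > 0
```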

To show the validity of , we use a similar argument. Recalling that is an i.i.d. sequence, we get

(63)

Replacing in (63), we arrive at

(64)

Again, and are nonnegative and depend on the constellation. If the channel is such that the first term on the right-hand side of (64) can be disregarded with respect to the second term, (29) of R2 will be satisfied.


To show that in steady-state, we remark that

(65)

This is an exact relation. For , we use R1 to approximate . Then, taking the expectations of both sides of (26) and using A2, we obtain

(66)

from and using (65). Since , we get for sufficiently small .

REFERENCES

[1] J. Proakis, Digital Communications, 4th ed. New York: McGraw-Hill, 2001.

[2] C. Breining et al., "Acoustic echo control – An application of very-high-order adaptive filters," IEEE Signal Process. Mag., vol. 16, no. 4, pp. 42–69, Jul. 1999.

[3] A. H. Sayed, Fundamentals of Adaptive Filtering. New York: Wiley, 2003.

[4] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ: Prentice-Hall, 2001.

[5] E. Eweda, "Comparison of RLS, LMS and sign algorithms for tracking randomly time-varying channels," IEEE Trans. Signal Process., vol. 42, no. 11, pp. 2937–2944, Nov. 1994.

[6] D. N. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Trans. Commun., vol. COM-28, no. 11, pp. 1867–1875, Nov. 1980.

[7] J. R. Treichler and B. Agee, "A new approach to multipath correction of constant modulus signals," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-31, no. 2, pp. 459–472, Apr. 1983.

[8] O. Shalvi and E. Weinstein, "Super-exponential methods for blind deconvolution," IEEE Trans. Inf. Theory, vol. 39, no. 2, pp. 504–519, Mar. 1993.

[9] M. T. M. Silva and M. D. Miranda, "Tracking issues of some blind equalization algorithms," IEEE Signal Process. Lett., vol. 11, no. 9, pp. 760–763, Sep. 2004.

[10] M. Martínez-Ramón, J. Arenas-García, A. Navia-Vázquez, and A. R. Figueiras-Vidal, "An adaptive combination of adaptive filters for plant identification," in Proc. 14th Int. Conf. Digital Signal Processing (DSP'2002), 2002, vol. 2, pp. 1195–1198.

[11] J. Arenas-García, M. Martínez-Ramón, A. Navia-Vázquez, and A. R. Figueiras-Vidal, "Plant identification via adaptive combination of transversal filters," Signal Process., vol. 86, pp. 2430–2438, Sep. 2006.

[12] J. Arenas-García, A. R. Figueiras-Vidal, and A. H. Sayed, "Mean-square performance of a convex combination of two adaptive filters," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 1078–1090, Mar. 2006.

[13] P. Andersson, "Adaptive forgetting in recursive identification through multiple models," Int. J. Control, vol. 42, pp. 1175–1193, Nov. 1985.

[14] M. Niedzwiecki, "Identification of nonstationary stochastic systems using parallel estimation schemes," IEEE Trans. Autom. Control, vol. 35, no. 3, pp. 329–334, Mar. 1990.

[15] M. Niedzwiecki, "Multiple-model approach to finite memory adaptive filtering," IEEE Trans. Signal Process., vol. 40, no. 2, pp. 470–473, Feb. 1992.

[16] S. S. Kozat, A. C. Singer, and G. Zeitler, "Universal piecewise linear prediction via context trees," IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3730–3745, Jul. 2007.

[17] J. Arenas-García, A. R. Figueiras-Vidal, and A. H. Sayed, "Tracking properties of a convex combination of two adaptive filters," in Proc. IEEE 13th Workshop Statistical Signal Processing (SSP'05), 2005, pp. 109–114.

[18] J. Arenas-García and A. R. Figueiras-Vidal, "Improved blind equalization via adaptive combination of constant modulus algorithms," in Proc. ICASSP'06, 2006, vol. III, pp. 756–759.

[19] J. Arenas-García and A. R. Figueiras-Vidal, "Adaptive combination of normalized filters for robust system identification," Electron. Lett., vol. 41, no. 15, pp. 874–875, Jul. 2005.

[20] M. T. M. Silva and V. H. Nascimento, "Convex combination of adaptive filters with different tracking capabilities," in Proc. ICASSP'07, 2007, vol. III, pp. 925–928.

[21] M. T. M. Silva and V. H. Nascimento, "Convex combination of blind adaptive equalizers with different tracking capabilities," in Proc. ICASSP'07, 2007, vol. III, pp. 457–460.

[22] J. Mai and A. H. Sayed, "A feedback approach to the steady-state performance of fractionally spaced blind adaptive equalizers," IEEE Trans. Signal Process., vol. 48, no. 1, pp. 80–91, Jan. 2000.

[23] N. R. Yousef and A. H. Sayed, "A feedback analysis of the tracking performance of blind adaptive equalization algorithms," in Proc. IEEE Conf. Decision and Control, Dec. 1999, vol. 1, pp. 174–179.

[24] N. R. Yousef and A. H. Sayed, "A unified approach to the steady-state and tracking analyses of adaptive filters," IEEE Trans. Signal Process., vol. 49, no. 2, pp. 314–324, Feb. 2001.

[25] T. Aboulnasr and K. Mayyas, "A robust variable step-size LMS-type algorithm: Analysis and simulations," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 631–639, Mar. 1997.

[26] P. A. Regalia, "On the equivalence between the Godard and Shalvi–Weinstein schemes of blind equalization," Signal Process., vol. 73, pp. 185–190, 1999.

[27] J. R. Treichler, I. Fijalkow, and C. R. Johnson, Jr., "Fractionally spaced equalizers," IEEE Signal Process. Mag., vol. 13, no. 3, pp. 65–81, May 1996.

[28] Y. Li and Z. Ding, "Global convergence of fractionally spaced Godard (CMA) adaptive equalizers," IEEE Trans. Signal Process., vol. 44, no. 4, pp. 818–826, Apr. 1996.

[29] M. Mboup and P. A. Regalia, "A gradient search interpretation of the super-exponential algorithm," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2731–2734, Nov. 2000.

[30] J. E. Mazo, "On the independence theory of equalizer convergence," Bell Syst. Tech. J., vol. 58, pp. 963–993, May/Jun. 1979.

[31] J. Minkoff, "Comment: On the unnecessary assumption of statistical independence between reference signal and filter weights in feedforward adaptive systems," IEEE Trans. Signal Process., vol. 49, no. 5, p. 1109, May 2001.

[32] I. Fijalkow, C. E. Manlove, and C. R. Johnson, "Adaptive fractionally spaced blind CMA equalization: Excess MSE," IEEE Trans. Signal Process., vol. 46, no. 1, pp. 227–231, Jan. 1998.

[33] C. B. Papadias and D. T. M. Slock, "Normalized sliding window constant modulus and decision-directed algorithms: A link between blind equalization and classical adaptive filtering," IEEE Trans. Signal Process., vol. 45, no. 1, pp. 231–235, Jan. 1997.

Magno T. M. Silva (M'05) was born in São Sebastião do Paraíso, Brazil, in 1975. He received the B.S. degree in 1999, the M.S. degree in 2001, and the Ph.D. degree in 2005, all in electrical engineering, from Escola Politécnica, University of São Paulo, São Paulo, Brazil.

From February 2005 to July 2006, he was an Assistant Professor at Mackenzie Presbyterian University, São Paulo. He is currently an Assistant Professor in the Department of Electronic Systems Engineering, Escola Politécnica, University of São Paulo. His research interests are in linear and nonlinear adaptive filtering.


Vítor H. Nascimento (M'90) was born in São Paulo, Brazil. He received the B.S. and M.S. degrees in electrical engineering from the University of São Paulo in 1989 and 1992, respectively, and the Ph.D. degree from the University of California, Los Angeles, in 1999.

From 1990 to 1994, he was a Lecturer at the University of São Paulo, and in 1999 he joined the faculty at the same school, where he is now an Associate Professor. His research interests include adaptive filtering theory and applications, linear and nonlinear estimation, and applied linear algebra.

Dr. Nascimento received the IEEE Signal Processing Society (SPS) 2002 Best Paper Award in 2002. He served as an Associate Editor for the IEEE SIGNAL PROCESSING LETTERS from 2003 to 2005 and is currently an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING and the EURASIP Journal on Advances in Signal Processing, and a member of the IEEE SPS Signal Processing Theory and Methods Technical Committee.