
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 39, NO. 1, JANUARY 1993 129

Local Convergence of the Sato Blind Equalizer and Generalizations Under Practical Constraints

Zhi Ding, Member, IEEE, Rodney A. Kennedy, Member, IEEE, Brian D. O. Anderson, Fellow, IEEE, and C. Richard Johnson, Jr., Fellow, IEEE

Abstract- An early use of recursive identification in blind adaptive channel equalization is an algorithm developed by Sato. An important generalization of the Sato algorithm, with extensive analysis, appears in the work of Benveniste, Goursat, and Ruget. These generalized algorithms have been shown to possess a desirable global convergence property under two idealized conditions. The convergence properties of this class of blind algorithms are studied under practical constraints, common to a variety of channel equalization applications, that violate these idealized conditions. Results show that, in practice, when the equalizer is finite-dimensional and/or the input is discrete (as in digital communications), the equalizer parameters may converge to parameter settings that fail to achieve the objective of approximating the channel inverse. It is also shown that a center-spike initialization is insufficient to guarantee avoiding such ill-convergence. Simulations verify the analytical results.

Index Terms- Blind deconvolution, equalization, intersymbol interference, local convergence, stability, adaptive filtering, system identification.

I. INTRODUCTION

A. Literature Review

THE SATO adaptive algorithm was one of the first widely used recursive identification schemes for discrete time system inverse identification based on measuring the system output without explicit knowledge (i.e., direct measurements) of its input [1]. The only information concerning the system input utilized by the algorithm was knowledge of its statistical properties, e.g., the input probability distribution.

In the communication context, the system to be identified is usually the communication channel inverse transfer function (perhaps with delay) and the parameters being adapted frequently belong to a linear transversal equalizer. This type of adaptation (with no explicit knowledge of the input) is also known as "blind equalization" [2]. This blind equalization feature makes the analysis more difficult than many standard identification schemes which form prediction errors with both the system input and output signals.

Manuscript received December 12, 1990; revised October 17, 1991. R. A. Kennedy and B. D. O. Anderson are supported by the Australia Research Council, ANU Centre for Information Science Research and Australian Telecommunications and Electronics Research Board. C. R. Johnson, Jr. is supported by NSF Grant MIP-8921003. This work was presented in part at the ISSPA '90, Gold Coast, Australia, August 27-31, 1990 and in part at the CDC '90, Honolulu, HI, December 5-7, 1990.

Z. Ding is with the Department of Electrical Engineering, Auburn University, 200 Brown Hall, Auburn, AL 36849-5201.

R. A. Kennedy and B. D. O. Anderson are with the Department of Systems Engineering, Australian National University, G.P.O. Box 4, Canberra, ACT 2601, Australia.

C. R. Johnson, Jr. is with the School of Electrical Engineering, Cornell University, Ithaca, NY 14853.

IEEE Log Number 9203026.

There are two different classes of blind algorithms constituting generalizations of the Sato scheme, one by Godard [3] and Treichler et al. [4], [5], and one by Benveniste, Goursat and Ruget [6]. This latter class of generalized Sato algorithms is termed here BGR algorithms. Whereas the Godard class has smooth cost functions, the BGR algorithms have been particularly difficult to analyze because of a discontinuity appearing in the prediction error of the adaptation algorithm (to be described later). In contrast, the smooth Godard algorithms have been successfully analyzed recently [7], [8]. A major contribution of this work is in presenting an analysis for nontrivial case studies which reveals clearly some of the nonideal convergence properties of these generalized Sato algorithms in digital communication applications.

There exist applications of algorithms similar to the BGR algorithms in fields other than blind equalization, e.g., in blind deconvolution, which arises in geophysical signal processing. However, unlike in digital communications, where the driving input takes discrete values (e.g., the input is a stream of equally probable independent samples from an M-ary PAM data set), more general driving input distributions need to be studied. In fact, the most important result on the behavior of BGR algorithms was derived for classes of sub-Gaussian and super-Gaussian inputs, where the input is continuous instead of discrete [6]. Among many important findings in [6], it was shown that if the channel input is sub-Gaussian, then these algorithms exhibit ideal global convergence properties.

B. Contributions

Our investigations are largely intended to extend and complement the preliminary investigations by Sato [1] and the seminal work by Benveniste et al. [6]. Our description concentrates on the convergence behavior of the BGR algorithms in the adapted parameter space of the equalizer. This space is formed by parameters that are adapted directly using a gradient descent algorithm. Our results also complement and extend the work of Mazo [9] and Macchi and Eweda [10] on the Sato algorithm applied to a memoryless channel with binary input.

Desirable global convergence behavior of the BGR algorithms has been established under two ideal conditions [6]:

1) the transversal equalizer has a doubly infinite parametrization [11], [12]; and

2) the distribution of the channel input is continuous and sub-Gaussian (or super-Gaussian).

It is the (lack of) performance robustness in response to relaxation of these two conditions that we are concerned with. The first condition represents an idealization because practical linear transversal equalizers are causal and have only a finite number of adjustable parameters. The second condition implies the results are not immediately applicable to the most common PAM digital communication systems, which have discrete symbol distributions (a large class of practical applications). Table I illustrates the current knowledge gaps that our work addresses.

TABLE I
KNOWNS AND UNKNOWNS ABOUT THE SATO ALGORITHM

Parameterization | Distribution: Continuous Sub-Gaussian | Distribution: Discrete-Level PAM
Doubly Infinite  | Global Convergence [6]                | ?
Finite (FIR)     | ?                                     | Local Convergence for Ideal Channel [9], [10]; ? (Nonideal Channels)

In this paper, our aim is to answer the two unknown questions in Table I and extend the result of [9], [10] to nonideal channels which do require equalization. We demonstrate the indispensability of these two basic assumptions for the desired global convergence of the BGR algorithms. Our long term aim is to better understand the limitations that are imposed by using these sorts of algorithms on practical systems and to be able to predict, explain, and avoid the potential failure mechanisms of these algorithms. Alternatively, armed with such information we believe astute, judicious deployment of these algorithms is required in order for them to realize their somewhat remarkable convergence attributes. Our contributions regarding the class of BGR algorithms may be organized as follows:

1) theory describing the potential for ill-convergence to parameter settings that exhibit significant ISI (eye diagram closed) for a) a uniform (sub-Gaussian) input; and b) PAM inputs, when a finite impulse response equalizer is used (Section III);

2) theory describing the potential for ill-convergence when the input is PAM and the equalizer is taken as potentially doubly infinite (Section IV);

3) failure of the center-spike initialization strategy for PAM inputs (Sections IV and V);

4) simulations to support our analysis and to show robustness of the mechanisms to which misbehavior is attributed (Section V).

0018-9448/93$03.00 © 1993 IEEE

II. PROBLEM FORMULATION

A. System Description

The overall system under study is shown in Fig. 1. To the left we have a (bounded-input bounded-output) stable linear channel (not necessarily minimum phase) $H(q^{-1}) = \sum_{i=0}^{n} h_i q^{-i}$, where $n$ may be infinite. This channel distorts the input signal sequence $\{a_k\}$, which is assumed to be real and i.i.d., with symmetric distribution $\nu_a(\cdot)$. The objective of channel equalization is to undo the channel distortion, which in the noiseless situation is equivalent to identifying the channel inverse by using a linear transversal equalizer.

Fig. 1. Channel and equalizer.

The channel output is denoted by $x_k$. Connected in series with the channel we have a linear transversal equalizer with tap weights given by

$$\hat\theta(k) \triangleq [\,\cdots\ \hat\theta_{-2}(k)\ \ \hat\theta_{-1}(k)\ \ \hat\theta_{0}(k)\ \ \hat\theta_{1}(k)\ \ \hat\theta_{2}(k)\ \cdots\,]'. \tag{2.1}$$

As the notation suggests, the parameters are time-varying, and are actually adjusted according to an adaptive law. In addition, the equalizer system is first assumed noncausal, though causality and finite dimensionality will be imposed later. What distinguishes the type of identification under study here is that in the adaptation only $\{x_k\}$ can be measured and incorporated for learning. Explicit knowledge of the input $\{a_k\}$ is lacking. Finally, in Fig. 1, $z_k$ denotes the equalizer output, which is also measurable. When the linear channel $H(q^{-1})$ and the channel input are real, the objective of ideal adaptation is to adjust $\hat\theta(k)$ to achieve

$$z_k = +a_{k-\Delta} \quad \text{or} \quad z_k = -a_{k-\Delta} \quad \forall k,\ \Delta \in \mathbb{Z}, \tag{2.2}$$

where $\Delta$ is some fixed delay. Note that the ambiguity in the sign in (2.2) is unavoidable for symmetric $\nu_a(\cdot)$ unless additional a priori knowledge of the system is available. Because of the use of a linear equalizer structure (2.1), it is essential that the channel transfer function has no zero on or near the unit circle.

Naturally many channel inverses cannot be exactly modeled with a finite length transversal equalizer, especially when the channel is nonminimum phase. However, a suitable approximation to the inverse incorporating a delay is often quite satisfactory [6]. Since the objective in channel equalization is to achieve an open-eye combined system output such that the correct output can be secured through quantizers, the ideal objective can often be relaxed in practice, when only a finite number of parameters $\{\hat\theta_i\}$ and causal systems are implementable, to allow approximate equality in (2.2). Note that issues regarding the ability of a finite impulse response (FIR) filter to approximate a nonminimum phase inverse (with delay $\Delta$) have been detailed in [6] and are not crucial for our analysis. We are simply analyzing existing algorithms and studying their behavior. The development and justification of these algorithms and the precise qualification of conditions like (2.2) can be found in [6]. The question we are concerned with is: Does a (well-posed) BGR algorithm with finite equalizer parameters and/or under discrete PAM input always ensure convergence of the equalizer parameter settings, regardless of their initial values and the channel dynamics, yielding one of the desirable outcomes in (2.2) with sufficient accuracy?

B. Sato and BGR Blind Equalization Algorithms

The class of on-line, recursive, blind algorithms that we consider take the common form

$$\hat\theta(k+1) = \hat\theta(k) - \mu\,\psi(z_k)\,X_k, \tag{2.3}$$

where $\mu$ is a small stepsize, and

$$X_k \triangleq [\,\cdots\ x_{k+2}\ \ x_{k+1}\ \ x_k\ \ x_{k-1}\ \ x_{k-2}\ \cdots\,]'$$

is the regressor composed of channel output. The scalar $\psi(z_k)$ is a nonlinear prediction error generated from the equalizer output $z_k$. The nonlinearity is essential since higher order statistics are needed to solve the blind equalization problem. In [1], Sato gave the following form of the memoryless nonlinearity $\psi(\cdot)$:

$$\psi(z) \triangleq z - \gamma\,\mathrm{sgn}(z), \tag{2.4}$$

where the dispersion constant is given by

$$\gamma \triangleq \frac{E\{a_k^2\}}{E\{|a_k|\}}. \tag{2.5}$$

Benveniste et al. [6] generalized the Sato nonlinearity by extending its linear part into an odd function $\psi_b(\cdot)$, convex for $z \ge 0$, such that the prediction error nonlinearity becomes

$$\psi(z) \triangleq \psi_b(z) - \gamma_b\,\mathrm{sgn}(z), \qquad \gamma_b \triangleq \frac{E\{\psi_b(a_k)\,a_k\}}{E\{|a_k|\}}, \tag{2.6}$$

subject to $\psi_b(-z) = -\psi_b(z)$ and $\psi_b''(z) \ge 0$ for all $z > 0$. Each valid function $\psi_b(z)$ defines one algorithm in the BGR algorithm class. A BGR algorithm updates the equalizer parameters according to (2.3), and the Sato algorithm hence becomes a special case of the BGR algorithms with $\psi_b(z) = z$. The justification of (2.3) and (2.6) can be found in [6] and is not a concern here. Our interest is in gaining more insight into the behavior of (2.3) and (2.6). Some important features to note are:

1) the BGR algorithms utilize only measurements of signals $\{z_k\}$ and $\{x_k\}$, but not $\{a_k\}$;

2) the algorithms are very easy to implement; and

3) the nonlinearities (2.4) and (2.6) have a sgn(·) function and thus a discontinuity at the origin, which subsequently makes analysis difficult relative to the Godard algorithms [3].

Before we can describe a key analytical result and contribution by Benveniste et al. [6], we need to introduce the concepts of mean cost function and stochastic gradient descent minimization.
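To make the update rule concrete, the following is a minimal Python sketch of one Sato iteration. It assumes the usual Sato dispersion constant $\gamma = E\{a^2\}/E\{|a|\}$ (consistent with the value $\gamma = 1$ quoted later for binary input); the function names and the 4-level PAM alphabet are illustrative choices, not taken from the paper.

```python
import numpy as np

def dispersion_constant(symbols, probs=None):
    """gamma = E{a^2} / E{|a|} for a discrete alphabet (assumed form of the Sato constant)."""
    a = np.asarray(symbols, dtype=float)
    p = np.full(a.shape, 1.0 / a.size) if probs is None else np.asarray(probs, dtype=float)
    return float(np.sum(p * a**2) / np.sum(p * np.abs(a)))

def sato_error(z, gamma):
    """Sato nonlinearity psi(z) = z - gamma * sgn(z), as in (2.4)."""
    return z - gamma * np.sign(z)

def sato_step(theta, X, gamma, mu):
    """One update theta(k+1) = theta(k) - mu * psi(z_k) * X_k, as in (2.3).

    theta and X are finite-length vectors here (a truncated equalizer)."""
    z = float(X @ theta)                      # equalizer output z_k = X_k' theta(k)
    return theta - mu * sato_error(z, gamma) * X, z

gamma_bpsk = dispersion_constant([-1.0, 1.0])             # binary input
gamma_pam4 = dispersion_constant([-3.0, -1.0, 1.0, 3.0])  # hypothetical 4-level PAM
```

Note that the update uses only the regressor and the equalizer output; the transmitted symbols enter solely through the constant `gamma`.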

C. Mean Cost Function Minimization

In this section, we review two important notions that are standard. The first is that of a mean cost surface for which (2.3) may be viewed as a stochastic gradient descent strategy. The second is that of an equilibrium, that is, a parameter setting that, if frozen, gives an average update in (2.3) of zero (i.e., zero expectation, but not necessarily requiring the instantaneous update in (2.3) to be zero).

In connection with the BGR algorithms defined by (2.6) we introduce (modulo a constant term) a scalar cost function defined as

$$\Psi_b(z) \triangleq \int_0^z \psi(\sigma)\,d\sigma = \int_0^z \psi_b(\sigma)\,d\sigma - \gamma_b |z|. \tag{2.7}$$

Then, (2.3) can be seen as a gradient descent algorithm when it is rewritten using (2.7) as

$$\hat\theta(k+1) = \hat\theta(k) - \mu\,\frac{\partial \Psi_b(z_k)}{\partial \hat\theta}, \tag{2.8}$$

where $z_k = X_k'\hat\theta(k)$. The average or mean cost surface, as a function of the parameters, is then defined by

$$J(\hat\theta) \triangleq E\{\Psi_b(z_k)\}, \tag{2.9}$$

which is to be minimized through the parameter updating algorithm of (2.8). For the special case of the Sato algorithm, the scalar cost function is simply $\Psi(z) = (|z| - \gamma)^2/2$, while its mean cost surface becomes

$$J(\hat\theta) = \tfrac{1}{2} E\{(|z_k| - \gamma)^2\}. \tag{2.10}$$

The stationary points $\theta$ on the mean cost surface (2.9) are defined by

$$\frac{\partial}{\partial \hat\theta} E\{\Psi_b(z_k)\}\Big|_{\hat\theta = \theta} = E\Big\{\frac{\partial \Psi_b(z_k)}{\partial \hat\theta}\Big\}\Big|_{\hat\theta = \theta} = 0, \tag{2.11}$$

in which the commutativity follows because (2.7) has a piecewise continuous derivative. In turn, it is clear how the stationary points defined by (2.11) can be interpreted as being the parameter settings for which the average update in (2.8) is the zero vector. Given a stationary point $\theta$ derived from (2.11), uniform positive definiteness of the Hessian at $\theta$ provides a sufficient condition for its stability. Thus, if the Hessian matrix satisfies

$$\frac{\partial^2}{\partial \hat\theta^2} E\{\Psi_b(z_k)\}\Big|_{\hat\theta = \theta} \ge \eta I > 0 \tag{2.12}$$

for some scalar $\eta > 0$, then the stationary point $\theta$ is locally stable and corresponds to a local minimum of the mean cost (2.9). (For finite-dimensional $\hat\theta$, this is trivial; for infinite-dimensional $\hat\theta$, the interpretation of (2.12) is discussed further in Appendix C.) One cannot in this expression commute the outer differentiation and expectation operators because of the discontinuity of $\psi(z)$ at the origin, which makes subsequent analysis difficult.

Our approach to analyzing the convergence properties of a BGR algorithm can be summarized as follows: 1) determine the mean cost surface (2.9) or (2.10) that describes succinctly the global adaptation tendencies of the adaptation update; 2) locate and describe the set of stable (attractive) equilibria (2.11) and (2.12) in the equalizer parameter space; and 3) ascertain if these parameter values fulfill the desirable objective of forming an (approximate) inverse (2.2).
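As a worked check of the Sato cost quoted in this section, the integral in (2.7) can be evaluated directly for the Sato nonlinearity (2.4), i.e., with $\psi_b(z) = z$. The short derivation below is a sketch added for clarity; it confirms that the cost equals $(|z|-\gamma)^2/2$ up to an additive constant and that its gradient reproduces the update direction in (2.3).

```latex
\begin{align*}
\Psi(z) &= \int_0^z \bigl(\sigma - \gamma\,\mathrm{sgn}(\sigma)\bigr)\,d\sigma
         = \tfrac{1}{2}z^2 - \gamma\,|z|
         = \tfrac{1}{2}\bigl(|z| - \gamma\bigr)^2 - \tfrac{1}{2}\gamma^2, \\[4pt]
\frac{\partial}{\partial\hat\theta}\,\tfrac{1}{2}\bigl(|z_k| - \gamma\bigr)^2
        &= \bigl(|z_k| - \gamma\bigr)\,\mathrm{sgn}(z_k)\,X_k
         = \bigl(z_k - \gamma\,\mathrm{sgn}(z_k)\bigr)X_k
         = \psi(z_k)\,X_k, \qquad z_k \neq 0 .
\end{align*}
```

The constant $-\gamma^2/2$ does not affect the location or stability of the stationary points, which is why the cost is defined only modulo a constant term.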

D. Existing Results on BGR Algorithms

Among many important results in [6], one major contribution regarding the convergence of BGR algorithms may be summarized as follows.

Convergence Result: When the distribution $\nu_a(\cdot)$ of the input signal $a_k$ is sub-Gaussian, i.e., satisfies either: 1) $\nu_a(u) = Ke^{-g(u)}$, where $K$ is a constant and $g(u)$ is an even function such that both $g(u)$ and $g'(u)/u$ are strictly increasing on $\mathbb{R}^+$; or 2) the distribution is uniform on $[-\delta, +\delta]$, then the only stable equilibria of the BGR algorithms with doubly infinite parametrization as in (2.1) are those achieving the desired response (2.2).

The uniform distribution can be thought of as the limiting case of a sub-Gaussian distribution. Notice that, roughly speaking, sub-Gaussian denotes tail roll-off faster than a Gaussian distribution.

This result establishes a desirable global convergence property of the BGR algorithms. It means that

1) if the channel input has a sub-Gaussian distribution; and

2) if the channel is merely a constant with no dynamics, or if a noncausal, infinitely parametrized equalizer is updated for dynamical channels according to the BGR algorithm,

then the global convergence to the desired overall system performance can be assured. However, as recognized in [6], these two conditions cannot be met when applying BGR algorithms to PAM channel equalization. First, the channel input in PAM digital communication systems always has a discrete distribution, unlike sub-Gaussian distributions, which are necessarily continuous. Second, only a finite number of equalizer parameters can be updated in real adaptive algorithms. Thus, the key question arises as to whether or not the global stability result of BGR algorithms [6] is robust to violations of conditions 1) and 2). We are going to show, in the rest of this paper, that violation of either condition can result in the ill-convergence of the BGR algorithms to local minima of the corresponding mean cost where the objective (2.2) does not hold even approximately. For analytical convenience, we are going to focus mainly, but not exclusively, on a special case of BGR algorithms, namely the Sato algorithm (2.3). The results, however, are general.

III. BGR ALGORITHMS FOR FINITE-DIMENSIONAL EQUALIZERS

A. Convolution Matrix and Its Nullspace

The seemingly straightforward steps outlined in Section II-C for the analysis of convergence behavior can be difficult to follow in the $\hat\theta$ parameter space because of the impasse in obtaining the probability distribution of the channel output $x_k$ (see [13]). To simplify the analysis, it can be seen that the equalizer output $z_k$ can also be viewed as the response of a convolved (channel $\{h_i\}$ with equalizer $\{\hat\theta_i\}$) system to the input $a_k$. In other words,

$$z_k = \sum_{i=-\infty}^{\infty} \hat\theta_i\, x_{k-i} = \sum_{i=-\infty}^{\infty} (h \circledast \hat\theta)_i\, a_{k-i},$$

with $\circledast$ denoting convolution. In some of this section, we shall assume that the channel and equalizer have finite impulse responses, in which case the convolution has the same property. More generally, especially in the next section, we shall permit doubly-infinite impulse responses, with the requirement that they be in $\ell_1$. Notice that the convolution of two $\ell_1$ impulse responses is again in $\ell_1$; also, if an $\ell_2$ (or $\ell_\infty$) signal is the input to an $\ell_1$ impulse response system, the output is also an $\ell_2$ (or $\ell_\infty$) signal.

By defining two corresponding vectors

$$\tau(k) \triangleq [\,\cdots\ \tau_{-1}(k)\ \ \tau_0(k)\ \ \tau_1(k)\ \cdots\,]', \quad \tau_i(k) = (h \circledast \hat\theta(k))_i, \qquad A_k \triangleq [\,\cdots\ a_{k+1}\ \ a_k\ \ a_{k-1}\ \cdots\,]',$$

we have $z_k = \tau(k)'A_k$. Consequently, the mean cost $J$ can be analyzed in the $\tau$ parameter space.

The reason for introducing this "convolved" parameter space is that it affords, in certain instances, a more elegant analytical formulation, since the distribution of $a_k$ is known. In addition, the condition reflecting that ideal parameter values have been achieved, (2.2), or that they have been achieved approximately, has a simple interpretation in $\tau$ parameter space, namely that one coefficient in $\tau$ dominates the absolute sum of the remainder [3], which corresponds to an open eye diagram (with the dominant coefficient called the cursor). In short, some published work, e.g., [3], [6], [7], [14], is carried out in the convolved parametrization, which simplifies the analysis relative to working with $\hat\theta$. We now discuss whether such a parameter translation always preserves the convergence properties of the gradient descent algorithm.
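The equivalence between the equalizer-space and convolved-space views of the output, and the open-eye condition on the convolved coefficients, can be checked numerically. The sketch below is illustrative only: the channel taps, equalizer taps, and the helper name `eye_is_open` are hypothetical choices, and both filters are taken as causal FIR for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([1.0, -0.5, 0.25])        # hypothetical FIR channel taps
theta = np.array([0.9, 0.4])           # hypothetical causal equalizer taps
a = rng.choice([-1.0, 1.0], size=200)  # binary input sequence

x = np.convolve(a, h)                  # channel output
z_direct = np.convolve(x, theta)       # equalizer acting on the channel output
tau = np.convolve(h, theta)            # convolved (channel * equalizer) coefficients
z_combined = np.convolve(a, tau)       # same output, viewed in the convolved space

def eye_is_open(tau):
    """True if one coefficient exceeds the absolute sum of all the others."""
    mags = np.abs(np.asarray(tau, dtype=float))
    cursor = mags.max()
    return bool(cursor > mags.sum() - cursor)

print(np.allclose(z_direct, z_combined), eye_is_open(tau))
```

Because convolution is associative, the two outputs agree sample by sample; the open-eye test depends only on `tau`, which is why analysis in the convolved space is attractive when it is valid.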

The major existing result on the global convergence of BGR algorithms (including the Sato algorithm) is based on analyzing its convergence behavior in the convolved parameter space, where equilibria $\tau$ satisfy

$$\frac{\partial J}{\partial \tau} = E\{\psi(z_k)\,A_k\} = 0. \tag{3.1}$$

If the equalizer is an FIR filter that has a parameter vector of length $m$ and the channel is an FIR filter of length $n$, then a convolution matrix can be written [11], [12] as

$$\mathcal{H} \triangleq \begin{bmatrix} h_0 & h_1 & \cdots & h_{n-1} & 0 & \cdots & 0 \\ 0 & h_0 & h_1 & \cdots & h_{n-1} & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \ddots & \vdots \\ 0 & \cdots & 0 & h_0 & h_1 & \cdots & h_{n-1} \end{bmatrix}_{m \times (m+n-1)} \tag{3.2}$$

such that $\tau(k) = \mathcal{H}'\hat\theta(k)$, and $X_k = \mathcal{H}A_k$. By introducing the convolution matrix, we can obtain the following relationship [12]

$$\frac{\partial J}{\partial \hat\theta} = \mathcal{H}\,\frac{\partial J}{\partial \tau}. \tag{3.3}$$

This relationship simply means that

$$\frac{\partial J}{\partial \tau} = 0 \ \Longrightarrow\ \frac{\partial J}{\partial \hat\theta} = 0,$$

but the converse is not necessarily true. From (3.3), the equilibria of (2.11) should be all $\theta$ that cause

$$\frac{\partial J}{\partial \tau} \in \mathcal{N}(\mathcal{H}), \tag{3.4}$$

where $\mathcal{N}(\mathcal{H})$ is the nullspace of the convolution matrix $\mathcal{H}$. Clearly from this equation, equilibria (3.1) correspond to "trivial" solutions of (3.4) because (3.1) is trivially in the nullspace of $\mathcal{H}$. However, the convergence properties of the BGR algorithms in the convolved $\tau$ parameter space will not fully describe the convergence behavior of the algorithm (2.3) in the equalizer parameter space $\hat\theta$ unless the nullspace of $\mathcal{H}$ is trivial. As a result, the crucial question is whether or not the nullspace of $\mathcal{H}$ is always trivial.

It can be shown [11], [12] that for ISI channels with no singularity on the unit circle this nullspace $\mathcal{N}(\mathcal{H})$ is trivial if and only if the equalizer is infinitely parametrized and noncausal as in (2.1). Such is the situation considered in [6], [7], [14], where equalizers with doubly infinite parameter vectors are assumed. In fact, many important results of [6] were accomplished in the $\tau$ parameter space because having a doubly infinite parameter vector ensures a bijection between equilibria in the $\hat\theta$ and $\tau$ parameter spaces.

In [6], it was shown that BGR algorithms are globally convergent to ideal equilibria satisfying (2.2) for equalizers with doubly infinite parameter vectors under a sub-Gaussian channel input. It was further demonstrated that in practice, where finite truncation is made to the parameter vector $\hat\theta$, global equilibria will remain in the neighborhood of the ideal settings (2.2) [25]. However, when the equalizer is finitely parametrized the nullspace of $\mathcal{H}$ clearly becomes nontrivial, and therefore other equilibria satisfying (3.4) may exist beside the ones visible in $\tau$ parameter space. This situation is similar to the truncation effect discussed in [6]. As a result, we anticipate the local convergence of BGR algorithms with a finite number of parameters for certain channels where there exist some local minima $\theta$ satisfying both (3.4) and (2.12). Nevertheless, their existence remains to be established, and it is to this crucial question that we now direct our study.

B. Ill-Convergence of BGR Algorithm for Sub-Gaussian Input

In this part of the paper, we show how a finitely parametrized equalizer adjusted according to a BGR algorithm can fail to achieve global convergence even when the channel input is sub-Gaussian. We specifically consider the case of the Sato algorithm used with uniform, i.i.d., channel input, i.e., the probability distribution of the channel input $a_k$ is uniform over $[-\delta, \delta]$, for which the Sato algorithm constant is $\gamma = 2\delta/3$ from (2.5).

Instead of considering some complicated channel systems that tend to obscure the ill-convergence mechanism, we study the channel $H(q^{-1})$ (in Fig. 1) determined by the autoregression

$$x_k + \beta x_{k-1} = a_k, \qquad |\beta| < 1, \tag{3.5}$$

which implies a channel impulse response $h_i = (-\beta)^i$, $i \ge 0$. The channel may be alternatively expressed as a rational function in the delay operator $q^{-1}$ as $H(q^{-1}) \triangleq (1 + \beta q^{-1})^{-1}$, which is a first-order, minimum phase system. Note that a single tap equalizer (i.e., one that scales the system output) cannot satisfy the ideal objective (2.2) even approximately unless $\beta$ is very small. In contrast, the two tap equalizer

$$\hat\theta(k) \triangleq [\,\hat\theta_0(k)\ \ \hat\theta_1(k)\,]' \tag{3.6}$$

can achieve the ideal objective with zero delay, $z_k = a_k\ \forall k$, by setting $\hat\theta_0(k) = 1$ and $\hat\theta_1(k) = \beta$. Therefore, the ideal equilibrium is indeed achievable with the two tap equalizer, and our problem is well-posed and nonpathological. The crucial assumption different from the analysis of [6] is that the equalizer has (and only needs) a finite number of parameters to achieve the ideal response.

Local equilibria of the Sato algorithm are those points $\theta$ that (using (2.7)) satisfy

$$E\Big\{\Big(z_k - \frac{2\delta}{3}\,\mathrm{sgn}(z_k)\Big)X_k\Big\} = 0, \tag{3.7}$$

where $X_k = [\,x_k\ \ x_{k-1}\,]'$ and the equalizer output is $z_k = X_k'\hat\theta(k)$. While it is difficult or at least tedious to solve for all the possible equilibria satisfying (3.7), motivated by [8], we choose to search for local minima of the form

$$\theta \triangleq [\,0\ \ \theta_1\,]', \qquad \theta_1 \neq 0, \tag{3.8}$$

which means $X_k'\theta = x_{k-1}\theta_1$ in (3.7). This leads to the pair of scalar equations

$$E\Big\{\Big(\theta_1 x_{k-1} - \frac{2\delta}{3}\,\mathrm{sgn}(\theta_1 x_{k-1})\Big)x_k\Big\} = 0, \qquad E\Big\{\Big(\theta_1 x_{k-1} - \frac{2\delta}{3}\,\mathrm{sgn}(\theta_1 x_{k-1})\Big)x_{k-1}\Big\} = 0,$$

which with the aid of (3.5) and the independence between $x_{k-1}$ and $a_k$ reduce to the one equation in the unknown $\theta_1$:

$$E\Big\{\theta_1 x_{k-1}^2 - \frac{2\delta}{3}\,\mathrm{sgn}(\theta_1 x_{k-1})\,x_{k-1}\Big\} = 0.$$


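There are two solutions for this unknown given by

$$\theta_1 = \pm\frac{2\delta}{3}\,\frac{E\{|x_k|\}}{E\{x_k^2\}}. \tag{3.9}$$

Thus, there are two equilibria of the form (3.8) on either side of the origin in the two-dimensional $\hat\theta$-space. Such an equilibrium corresponds to merely a scaled delay and is ineffective as an equalizer.

The next step is to establish that (3.8) with (3.9) are a pair of stable equilibria rather than saddle points or maxima. The local stability is assured if the Hessian matrix

$$\frac{\partial^2}{\partial \hat\theta^2} E\{\Psi(z_k)\} = E\{X_k X_k'\} - \frac{2\delta}{3}\,\frac{\partial}{\partial \hat\theta} E\{\mathrm{sgn}(z_k)X_k\} \tag{3.10}$$

is positive definite when evaluated at the pair of equilibria $[\,0\ \ \theta_1\,]'$. In (3.10) the second term is nontrivial to evaluate. Clearly the first term is positive definite, but the second term has the potential to destroy this property in the overall Hessian.

In Appendix A, we prove the following crucial relationship

$$\frac{\partial}{\partial \hat\theta} E\{\mathrm{sgn}(z_k)X_k\}\Big|_{\hat\theta = [0\ \theta_1]'} = \frac{2\delta^2\,p_{x_k}(0)}{3\,|\theta_1|}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \tag{3.11}$$

where $p_{x_k}(0)$ is the probability density of the system output $x_k$ at zero, which is shown to satisfy the following useful inequality (see also Appendix A)

$$p_{x_k}(0) \le \frac{1}{2\delta}. \tag{3.12}$$

The relationship $E\{x_k x_{k-1}\} = -\beta E\{|x_k|^2\}$ follows from (3.5). Now assemble the pieces and we can write (3.10) as

$$\frac{\partial^2}{\partial \hat\theta^2} E\{\Psi(z_k)\}\Big|_{\hat\theta = [0\ \theta_1]'} = E\{x_k^2\}\begin{bmatrix} 1 - \dfrac{2\delta^2\,p_{x_k}(0)}{3E\{|x_k|\}} & -\beta \\ -\beta & 1 \end{bmatrix}$$

using (3.9) and (3.11). Thus, the positive definiteness of the Hessian (3.10) hinges on establishing that

$$1 - \beta^2 - \frac{2\delta^2\,p_{x_k}(0)}{3E\{|x_k|\}} > 0. \tag{3.13}$$

The expression on the left of (3.13) needs to be evaluated and, given the difficulty in explicitly computing $E\{|x_k|\}$, we will content ourselves with a lower bound (and this can be coupled with the upper bound in (3.12)). By letting $p_a(\cdot)$ denote the probability density of the input, simple application of Bayes' rule yields

$$E\{|x_k|\} = \int_{-\delta}^{+\delta} p_a(\nu)\,E\{|x_k| \mid a_k = \nu\}\,d\nu. \tag{3.14}$$

Now we note that from (3.5) and the independence of $\{a_k\}$

$$E\{|x_k| \mid a_k = \nu\} = E\{|\nu - \beta x_{k-1}|\} \ge |\nu - \beta E\{x_{k-1}\}| = |\nu|, \tag{3.15}$$

where we have used Jensen's inequality. Combining (3.14) and (3.15) gives

$$E\{|x_k|\} \ge \frac{1}{2\delta}\int_{-\delta}^{+\delta} |\eta|\,d\eta = \frac{\delta}{2}. \tag{3.16}$$

By using our approximate bounds on $p_{x_k}(0)$ and $E\{|x_k|\}$ we can see whether (3.13) holds nonetheless. We have, using (3.12) and (3.16), that

$$1 - \beta^2 - \frac{2\delta^2\,p_{x_k}(0)}{3E\{|x_k|\}} \ \ge\ 1 - \beta^2 - \frac{2\delta^2}{3}\cdot\frac{1}{2\delta}\cdot\frac{2}{\delta} = \frac{1}{3} - \beta^2,$$

showing (3.13) is valid for (at least) $|\beta| < 1/\sqrt{3} \approx 0.5774$. There are ways of tightening (3.15) to prove that for an even broader range of $\beta$ the equilibria in (3.8) with (3.9) are indeed minima.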

In summary, we have constructed an example based on an AR(1) channel with uniform, i.i.d., input for which a BGR (Sato) algorithm exhibits local minima. This pair of local equilibria result in the output

$$z_k = \theta_1 x_{k-1} = \theta_1 \sum_{i=0}^{\infty} (-\beta)^i\, a_{k-1-i},$$

which does not (even approximately) recover the original input. We have thus shown that the global convergence results of [6] do not generalize to equalizers with finite parameters. This confirms certain comments of [6] (Section IV) concerning the possibility of local convergence as a result of finitely truncating the infinite parameter vector. However, our results show another adverse effect of truncation in that the mapping between equilibria of two different spaces is no longer bijective. Reference [6] suggested the possible existence of local minima near the "crest lines" in $\tau$ parameter space like the ones in [9], [10], while the local equilibria we have just derived are not even equilibria in the $\tau$ parameter space [11], [12]. The local minima near the "crest lines" were also said to be rather shallow and negligible, while the local minima we have derived seem quite stable, as shown later by simulation evidence.
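The existence of the scaled-delay equilibrium can be checked by simulation. The following Monte Carlo sketch is an illustration under assumed values ($\delta = 1$, $\beta = 0.5$, a fixed random seed) that are not taken from the paper: it estimates the average Sato update $E\{\psi(z_k)X_k\}$ at a point of the form $[0\ \theta_1]'$, with $\theta_1 = \gamma E\{|x_k|\}/E\{x_k^2\}$ estimated from the samples, and at the ideal setting $[1\ \beta]'$.

```python
import numpy as np

rng = np.random.default_rng(1)
delta, beta, n = 1.0, 0.5, 200_000   # illustrative values, |beta| < 1
gamma = 2.0 * delta / 3.0            # Sato constant for uniform input on [-delta, delta]

# AR(1) channel: x_k = a_k - beta * x_{k-1}
a = rng.uniform(-delta, delta, size=n)
x = np.empty(n)
prev = 0.0
for k in range(n):
    prev = a[k] - beta * prev
    x[k] = prev

burn = 100                                   # discard the start-up transient
X = np.stack([x[burn + 1:], x[burn:-1]])     # rows: x_k and x_{k-1}

def mean_update(theta):
    """Sample average of psi(z_k) * X_k for the Sato nonlinearity."""
    z = theta @ X
    return (X * (z - gamma * np.sign(z))).mean(axis=1)

theta1 = gamma * np.mean(np.abs(X[1])) / np.mean(X[1] ** 2)
theta_delay = np.array([0.0, theta1])   # scaled-delay candidate equilibrium
theta_ideal = np.array([1.0, beta])     # exact channel inverse

print("theta1          :", theta1)
print("update at delay :", mean_update(theta_delay))
print("update at ideal :", mean_update(theta_ideal))
```

Both average updates should be close to zero (up to sampling noise), illustrating that the undesirable point is a genuine stationary point of the mean cost; this sketch does not by itself establish stability, which is what the Hessian argument addresses.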

C. Ill-Convergence of BGR Algorithms Under BPSK Input

In terms of channel equalization, our true interest lies in the use of BGR algorithms for PAM (or QAM) systems where the channel input is discretely distributed. We now examine the convergence of BGR algorithms used with finitely parametrized equalizers under PAM channel input.

To establish the existence of local minima for a BGR algorithm (Sato algorithm) under a PAM channel input, we use the same system set up as in the previous subsection.

Once again our channel is characterized by an autoregression (but we fix the pole position for later illustration)

$$x_k + 2^{-1/2}\,x_{k-1} = a_k, \tag{3.17}$$

where the channel input is now taken as Bernoulli, i.e., $a_k$ takes binary values $\pm 1$ (binary phase shift keying) with equal probability; this means the dispersion constant (2.5) is given by $\gamma = 1$.

Fig. 3. Mean cost contours in $\hat\theta \triangleq [\,\hat\theta_0\ \ \hat\theta_1\,]'$ space.

The particular value for $\beta$ above lets us easily determine the probability distribution of the system output $x_k$ for (3.17). This is computed in [13]. In [13], it is also shown that in general, for arbitrary $\beta$, the output distribution of the system (3.5) is of the Cantor type and thereby difficult to work with. It is only for special values of $\beta$ (e.g., $\beta = 2^{-1/K}$, $K \in \mathbb{Z}$) that one gets simple, easily characterized probability densities for $x_k$. For $\beta = 1/\sqrt{2}$, the density is given by

$$p_{x_k}(x) = \begin{cases} \dfrac{1}{4}, & \text{for } |x| \le 2 - \sqrt{2}, \\[6pt] \dfrac{2 + \sqrt{2} - |x|}{8\sqrt{2}}, & \text{for } 2 - \sqrt{2} < |x| \le 2 + \sqrt{2}, \\[6pt] 0, & \text{for } 2 + \sqrt{2} < |x|. \end{cases} \tag{3.18}$$

We have given the graph for this density in Fig. 2. Also for $\beta = 2^{-1/2}$, the AR(1) channel (3.17) generates a closed eye output and dynamic equalization is necessary to recover the channel input. As before, the two tap equalizer can ideally achieve the inverse of the channel and eliminate all intersymbol interference (ISI) by setting $\hat\theta(k) = [\,1\ \ \beta\,]'$. Hence, the problem is again well posed. From the density (3.18), we are able to explicitly compute the mean cost surface (2.10) in closed form. Observe

which follows from (3.17), the independence between $a_k$ and $x_{k-1}$, and various symmetries. Now, we can use (3.18) to derive explicit expressions for the mean cost surface in terms of the parameters $\hat\theta_0$ and $\hat\theta_1$, noting that the second-order statistics

$$E\{x_k^2\} = 2, \qquad E\{x_k x_{k-1}\} = -\sqrt{2}$$

follow from (3.17) and the i.i.d., Bernoulli input assumption. The complete expression for the surface is complicated. Instead, we give part of its definition, valid for $(\sqrt{2} - 1)\,|\hat\theta_1| > \cdots$:

$$J([\,\hat\theta_0\ \ \hat\theta_1\,]') = \hat\theta_0^2 + \hat\theta_1^2 - \sqrt{2}\,\hat\theta_0\hat\theta_1 + \cdots$$

Fig. 4. 3-D mean cost surface in $\hat\theta \triangleq [\,\hat\theta_0\ \ \hat\theta_1\,]'$ space.

The remaining regions of parameter space are also easily but tediously computed, and Fig. 3 gives the contour plot for a window of the parameter space containing the origin.

The surface represented in Fig. 3 has a 180° rotational symmetry. This figure indicates two locally stable minima, one at $[\,1\ \ \beta\,]'$ (desirable) and the other on the positive half of the $\hat\theta_0 = 0$ axis. There is also a pointy maximum at the origin that is easier to visualize in the mesh plot of the same surface shown in Fig. 4.

We now prove algebraically, rather than visually, that indeed there are a pair of locally stable equilibria (on the $\hat\theta_0 = 0$ axis) for this example which do not yield parameter values of the equalizer satisfying (2.2), or an approximation allowing an open eye output.

136

Let e 2 [0 &] be a candidate stable equilibrium, meaning it must satisfy (2.11). As before in the derivation of (3.9), except now the dispersion constant is y = 1 replacing y = 2613, we obtain the solution

employing (3.19). Using (3.18) one may verify

$$E\{|x_k|\} = 2\int_0^{\infty} x\,p_{x_k}(x)\,dx = 7/6.$$

Therefore, this establishes that

$$\theta \triangleq \pm[0\ \ 7/12]' \qquad (3.21)$$

are a pair of equilibria that are easily identified in Fig. 3. To establish their local stability, we evaluate the Hessian at (3.21). Most of the mathematical machinery required for this is relegated to Appendix B. From (2.7) and (2.12) we have

$$\frac{\partial}{\partial\theta}E\{\psi(z_k)X_k\} = E\{X_k X_k'\} - \gamma\,\frac{\partial}{\partial\theta}E\{\mathrm{sgn}(z_k)X_k\}, \qquad (3.22)$$


where $\gamma = 1$ in this case. The first term in (3.22) is easily determined from (3.19). The second term is evaluated in Appendix B and may be shown to be

Collecting together all the terms (3.18), (3.19), and (3.20), and checking (2.12), one can verify

Hence, both the equilibria (3.21) are stable minima. This fact is reasonably self evident from Fig. 3.
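The value $E\{|x_k|\} = 7/6$ and the location of the equilibria (3.21) can be checked numerically. The sketch below is only an illustration under stated assumptions: it truncates the AR(1) output to its first 18 terms, $x_k \approx \sum_{i<18}(-\beta)^i a_{k-i}$ with $\beta = 2^{-1/2}$ and $a_k = \pm 1$, enumerates all equiprobable sign patterns, and assumes that on the $\theta_0 = 0$ axis the zero-mean-update condition reduces to $\theta_1 E\{x_k^2\} = \gamma\,\mathrm{sgn}(\theta_1)E\{|x_k|\}$. The function name `truncated_channel_outputs` is hypothetical.

```python
import numpy as np

def truncated_channel_outputs(beta, n_terms):
    """All equally likely values of sum_{i<n_terms} (-beta)**i * a_i, a_i = +/-1."""
    values = np.zeros(1)
    for i in range(n_terms):
        c = (-beta) ** i
        # Each existing partial sum branches into "+c" and "-c".
        values = np.concatenate([values + c, values - c])
    return values

beta = 2.0 ** -0.5
x = truncated_channel_outputs(beta, n_terms=18)   # 2**18 equiprobable outcomes

mean_abs_x = np.mean(np.abs(x))    # approximates E{|x_k|}
mean_sq_x = np.mean(x ** 2)        # approximates E{x_k^2}
gamma = 1.0                        # dispersion constant for +/-1 input
theta1 = gamma * mean_abs_x / mean_sq_x   # assumed equilibrium condition (see lead-in)

print("E|x|   ~", mean_abs_x, " (text: 7/6)")
print("E x^2  ~", mean_sq_x, " (text: 2)")
print("theta1 ~", theta1, " (text: 7/12)")
```

The truncation error is bounded by the energy of the discarded tail, so the printed values should agree with 7/6, 2, and 7/12 to within a few parts in a thousand.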

In summary, we have established that the Sato algorithm (special case of a BGR algorithm) is prone to difficulties in the sense that there exist stable attraction points in parameter space that yield an equalizer output satisfying

$$z_k = \pm\tfrac{7}{12}\,x_{k-1} = \pm\tfrac{7}{12}\,\big(a_{k-1} - 2^{-1/2}a_{k-2} + 2^{-1}a_{k-3} - 2^{-3/2}a_{k-4} + \cdots\big).$$

Such an output is never of the form (2.2) and is therefore undesirable. Recall that this pathological behavior is achieved for a well-posed first-order minimum phase system and a special value of $\beta$. By noting the continuity of the mean cost with respect to the channel parameters, it can be shown (cf. [16]) that for small variations of $\beta$ near $1/\sqrt{2}$ our result remains valid.

IV. LOCAL CONVERGENCE UNDER PAM INPUTS

A. Background

In the previous section, we illustrated a mechanism whereby the BGR algorithms fail to have ideal convergence. This mechanism arises whenever a finite number of parameters are used in the transversal equalizer. Our objective in this section is to show that there exist other mechanisms whereby the BGR algorithms fail when the channel input is PAM. Unlike the previous section, the local minima to be constructed here are equilibria in the $\mathcal{T}$ parameter space. Results in the section show that the weakening of the sub-Gaussian condition in the theory presented in [6] to include the practically important discrete PAM constellations leads to undesirable local minima of the form to be described, even with doubly infinite equalizers. Benveniste et al. ([6], Remark 4, Section 3) had noted the potential existence of local equilibria in the case of Bernoulli inputs. Our results are generalizations of [9], [10], and [17].

B. The Ill-Convergence of the Sato Algorithm

Consider the original system setup in Section II where we have an equalizer with parameter vector $\theta(k)$. Assume the PAM channel input is i.i.d. and uniform over the set

$$\{\pm(M-1)d, \ldots, \pm 3d, \pm d\}. \qquad (4.1)$$

Somewhat analogously to results found in [3], [9], [17], we search for equilibria in the $\mathcal{T}$ parameter space satisfying (3.1) of the form

$$\bar{\mathcal{T}} = [\,\cdots\ 0\ \ t\ \ t\ \ t\ \cdots\ t\ \ 0\ \cdots\,]', \quad t \neq 0, \qquad (4.2)$$

where $N$, the number of identical nonzero components, is odd by construction. The $N$ nonzero components do not have to be adjacent; it is only for convenience of notation that we write (4.2) in this way. Note that for $N = 1$, the equilibria of (4.2) are the desired equilibria. For $N \ge 3$, considerable ISI remains.

An important issue in connection with the above candidate equilibrium is the existence of an equalizer parameter setting that when convolved with the channel attains (4.2). That such a channel and equalizer exist can be demonstrated by the following example which will be referred to later. Let the channel $H(q^{-1})$ be a stable autoregressive filter with $l$ poles, and let $N = 3$. Then an FIR equalizer having the following impulse response

attains (4.2). Therefore, for arbitrary autoregressive channels with $l$ poles, such parameter settings are attainable whenever the equalizer has $l + 3$ or more taps.
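The attainability argument can be illustrated numerically. The sketch below rests on an assumption, since the equalizer formula itself is not legible in this copy: for an all-pole channel $1/A(q^{-1})$ with $l$ poles, one natural FIR choice is $t\,(1 + q^{-1} + q^{-2})\,A(q^{-1})$, which has $l + 3$ taps. The coefficients in `a` are hypothetical (poles at 0.2 and 0.3), and `ar_impulse_response` is an illustrative helper.

```python
import numpy as np

def ar_impulse_response(a, length):
    """Impulse response of 1/A(q^-1), with A = a[0] + a[1] q^-1 + ... and a[0] = 1."""
    h = np.zeros(length)
    for k in range(length):
        acc = 1.0 if k == 0 else 0.0
        for i in range(1, min(k, len(a) - 1) + 1):
            acc -= a[i] * h[k - i]
        h[k] = acc
    return h

# Hypothetical stable AR(2) channel: A(q^-1) = (1 - 0.2 q^-1)(1 - 0.3 q^-1).
a = np.array([1.0, -0.5, 0.06])
t = 0.5

# Assumed equalizer: t * (1 + q^-1 + q^-2) * A(q^-1), i.e. l + 3 = 5 taps here.
theta = t * np.convolve(a, np.ones(3))

n = 40
h = ar_impulse_response(a, n)
combined = np.convolve(h, theta)[:n]   # first n samples of channel * equalizer

print("equalizer taps:", theta)
print("combined response (first 6 samples):", np.round(combined[:6], 12))
```

The combined response should show three equal samples of value `t` followed by zeros, which is the $N = 3$ instance of (4.2).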

Before showing our construction we make a small qualification. In the case of an IIR channel where $n$ in $H(q^{-1})$ is infinite (and subsequently $\mathcal{H}$ has an infinite number of columns), additional technical side conditions (e.g., channel stability) need to be imposed when considering the definitions


of local minima (3.1). In particular, we shall assume that $H(q^{-1})$ corresponds to an impulse response which is $l_1$ summable, and that $H(e^{j\omega})$ is nonzero for all $\omega \in [0, 2\pi)$ so that $H^{-1}(q^{-1})$ also has an $l_1$ impulse response. These impulse responses map $l_2$ sequences into $l_2$ sequences. Also, the associated matrices $\mathcal{H}'$ and $(\mathcal{H}')^{-1}$ map (infinite) vectors of finite euclidean length to vectors of finite euclidean length. Our analysis depends on the use of Fréchet differentials and derivatives. Stationary points are obtained by setting the (first) Fréchet derivative to zero, and a sufficient condition for stability is obtained by requiring a quadratic form using the second Fréchet derivative to be (uniformly) positive definite. Details involving the use of Fréchet differentials are relegated to Appendix C.

Considering in more detail the condition for the Sato algorithm to have equilibria in the $\mathcal{T}$ parameter space (3.1) at the impulse response $\bar{\mathcal{T}}$ of (4.2) leads to (see Appendix C)

$$E\{A_k A_k'\}\,\bar{\mathcal{T}} - \gamma\,\mathrm{sgn}(t)\,E\Big\{A_k\,\mathrm{sgn}\Big(\sum_{i=1}^{N} a_{k-i}\Big)\Big\} = 0. \qquad (4.4)$$

Let the input to the system $\{a_k\}$ be zero-mean and i.i.d., of variance $\sigma^2$, with dispersion $\gamma$. Then only the middle $N$ equations in (4.4) are nontrivial. In addition, because of the independence of the input sequence $a_k$, all $N$ equations are identical. By adding all $N$ equations together, we get

$$N\sigma^2 t - \gamma\,\mathrm{sgn}(t)\,E\Big\{\Big|\sum_{i=1}^{N} a_{k-i}\Big|\Big\} = 0,$$

from which we obtain

$$t = \pm\frac{E\big\{\big|\sum_{i=1}^{N} a_{k-i}\big|\big\}}{N\,E\{|a_k|\}}, \qquad (4.5)$$

where we have used (2.5) to substitute for $\gamma$. One can conclude that if (4.2) is attainable with the $t$ of (4.5) for some equalizer parameter setting, then it is an equilibrium of the Sato algorithm (although at this stage it is not clear that such an equalizer parameter setting exists). We now show that such an equilibrium is stable for the practically important PAM input signals.

Let $\bar{\theta}$ be such that $\bar{\mathcal{T}} = \mathcal{H}'\bar{\theta}$, and consider the Hessian of (2.12) evaluated at $\bar{\mathcal{T}}$:

(4.6)

Uniform positive definiteness of the Hessian provides a sufficient condition for stability of the equilibrium. Now for an $M$-level PAM input, $\{a_k\}$ takes i.i.d. values uniformly over the set (4.1), for which we now prove the following:

$$\Gamma \triangleq \frac{\partial}{\partial\mathcal{T}}E\{\mathrm{sgn}(A_k'\mathcal{T})A_k\}\Big|_{\mathcal{T}=\bar{\mathcal{T}}} = 0. \qquad (4.7)$$

We have the components of $\Gamma$,

However, since $N$ is odd, then $\big|\sum_{i=1}^{N} a_{k-i}\big| \ge d$. Once $\epsilon \le |t|(M-1)^{-1}$, we see

so that the limit in (4.8) is identically zero for all $i, j$. Therefore, the Hessian (4.6) is given by

$$\sigma^2\,\mathcal{H}\mathcal{H}' \ge \eta I$$

for some $\eta > 0$. We can in fact take $\eta = \sigma^2\big(\|\mathcal{H}^{-1}\|^2\big)^{-1}$, where the norm of $\mathcal{H}^{-1}$ is the induced norm when $\mathcal{H}^{-1}$ acts on vectors of finite euclidean norm. To see this, observe that for any finite euclidean norm vector $\alpha$, when $\beta = \mathcal{H}'\alpha$ there holds $\alpha = (\mathcal{H}')^{-1}\beta$ and so $\|\mathcal{H}^{-1}\|^2\,\beta'\beta \ge \alpha'\alpha$, or

$$\alpha'\mathcal{H}\mathcal{H}'\alpha \ge \|\mathcal{H}^{-1}\|^{-2}\,\alpha'\alpha.$$

We remark that in contrast to the positive definite character of the Hessian here, the Hessian at similar equilibria is indefinite for the Godard algorithm, indicating that equilibria of this type are not locally stable [14].

For a broad class of channels (combined with equalizers), e.g., an autoregressive channel with equalizer setting (4.3), this calculation establishes that undesirable stable local minima in the $\mathcal{T}$ parameter space of the Sato algorithm exist. In our analysis, we have chosen $\bar{\mathcal{T}}$ with $N$ successive components identical and nonzero. In fact, because of the symmetric distribution of $a_k$, all the previous results also hold if all the nonzero elements in $\bar{\mathcal{T}}$ have identical magnitudes but can have arbitrary positions and arbitrary signs.
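The equilibrium amplitude $t$ can be evaluated exactly for small cases. The following sketch assumes the Sato dispersion constant is $\gamma = E\{a_k^2\}/E\{|a_k|\}$ (consistent with the values $\gamma = 2\delta/3$ and $\gamma = 1$ quoted earlier), so that summing the $N$ identical equilibrium equations gives $|t| = E\{|\sum_i a_{k-i}|\}/(N\,E\{|a_k|\})$. It also reports the smallest value of $|\sum_i a_{k-i}|$, which the stability argument requires to be at least $d$ for odd $N$. The function names are hypothetical.

```python
from itertools import product

def pam_levels(M, d=1.0):
    """The M-level PAM alphabet {+-d, +-3d, ..., +-(M-1)d} (M even)."""
    return [d * m for m in range(-(M - 1), M, 2)]

def sato_equilibrium_amplitude(M, N, d=1.0):
    """Return (|t|, min |sum of N symbols|) by exhaustive enumeration."""
    levels = pam_levels(M, d)
    mean_abs_a = sum(abs(a) for a in levels) / len(levels)
    abs_sums = [abs(sum(combo)) for combo in product(levels, repeat=N)]
    mean_abs_sum = sum(abs_sums) / len(abs_sums)
    return mean_abs_sum / (N * mean_abs_a), min(abs_sums)

for M, N in [(2, 1), (2, 3), (4, 3), (2, 5)]:
    t, smallest = sato_equilibrium_amplitude(M, N)
    print(f"M={M}, N={N}: |t| = {t:.4f}, min|sum| = {smallest}")
```

For Bernoulli input ($M = 2$) and $N = 3$ this gives $|t| = 0.5$, which matches the magnitude of the undesirable combined responses discussed for the simulated channels later in the paper.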

C. Generalization to Other BGR Algorithms

In this section, we examine whether the misbehavior of the Sato algorithm can also occur in some other BGR algorithms that share certain important features with the Sato algorithm. We shall extract one such important feature that establishes the ill-convergence of those BGR algorithms. Our approach bears strong similarity with that of Verdú [17], which addressed possible locally stable equilibria but did not exclude saddle points. We give a complete demonstration that complements that in [17].


We once again focus our discussion on the cases when the channel input is Bernoulli and we specifically look for undesirable equilibria $\bar{\theta}$ that result in the impulse response of the combined system having the form

$$\bar{\mathcal{T}} \triangleq [\,\cdots\ 0\ \ t\ \ t\ \ t\ \ 0\ \cdots\,]', \quad t \neq 0. \qquad (4.9)$$

In other words, the impulse response vector has only three nonzero elements. (There are of course many channel equalizer combinations for which $\bar{\mathcal{T}}$ is attainable.) From (2.6), we know that an equilibrium of the BGR algorithms must satisfy the zero mean update condition of (2.11), i.e.,

where $\psi_b(\cdot)$ and $\gamma_b$ are defined in (2.6). Equilibria in the $\mathcal{T}$ parameter space at $\bar{\mathcal{T}}$ of (4.9) are those that satisfy

which is the BGR algorithm generalization of (4.4). That is, we require

Then because $a_k$ is zero-mean and i.i.d., this equation is trivially satisfied except for three elements in the vector $A_k$ at $a_k$, $a_{k-1}$, and $a_{k-2}$. In addition, all these three nontrivial equations are identical. Summing them up, we get

Since $\psi_b(\cdot)$ is odd and $t \neq 0$, multiplying both sides of the last equation by $\mathrm{sgn}(t)$ gives us

$$E\Big\{\psi_b\Big(|t|\sum_{j=0}^{2} a_{k-j}\Big)\sum_{j=0}^{2} a_{k-j}\Big\} = \gamma_b\,E\Big\{\Big|\sum_{j=0}^{2} a_{k-j}\Big|\Big\}. \qquad (4.11)$$

If there exists some $t$ that satisfies (4.11) (and this is studied next), then $\bar{\mathcal{T}}$ given by (4.9) is an equilibrium for the BGR algorithm.

To establish the existence of a minimum, one must use a second-order Fréchet derivative (see Appendix C). Here, the local stability condition is dealt with by considering the uniform positive definiteness of the Hessian

We have shown in the previous subsection that for PAM input and $N$ odd ($N = 3$ in this case), $\Gamma = 0$ as in (4.7). Thus, it follows that

Based on (4.11) and (4.12), we arrive at the following result (for proof, see Appendix D).

Ill-Convergence Result: If $\psi_b'(x) > 0$ for all $x \in [1/3, 3]$, then there exists a constant $t \in [1/3, 1]$ such that

$$\bar{\mathcal{T}} = [\,\cdots\ 0\ \ t\ \ t\ \ t\ \ 0\ \cdots\,]'$$

is a locally stable equilibrium for the BGR algorithm with error function $\psi(x) = \psi_b(x) - \gamma_b\,\mathrm{sgn}(x)$ under Bernoulli channel input.

The symmetric distribution of the input signal guarantees the identical distribution of a k and - a k . Therefore the theorem trivially extends to sets of stable equilibria generalizing (4.9) where the positions of the t's are arbitrary; also any subset of the t can be replaced by -t.

Notice that this local convergence condition is derived for the Bernoulli input and the special type of undesirable equilibria (4.9) and its natural generalizations. Similar results can be derived by increasing the level of PAM input. The importance of this result is that it shows BGR algorithms are not globally convergent when the channel input is PAM. For QAM channel input, the results are expected to be similar, as with the PAM to QAM extension for the Godard algorithm in [8].
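The Ill-Convergence Result lends itself to a small numerical illustration. The sketch below is not the authors' procedure; it assumes the explicit Bernoulli form of the equilibrium condition worked out in Appendix D, $\psi_b(3|t|) + \psi_b(|t|) = 2\gamma_b$ with $\gamma_b = \psi_b(1)$, and the two stability conditions $\psi_b'(t) > 0$ and $3\psi_b'(3t) + \psi_b'(t) > 0$ stated there. It locates $t$ in $[1/3, 1]$ by bisection for two example nonlinearities: $\psi_b(x) = x$ (the Sato case) and $\psi_b(x) = x^3$ (a hypothetical choice made only for illustration).

```python
def bgr_equilibrium_t(psi_b, lo=1.0 / 3.0, hi=1.0, iters=60):
    """Bisection for psi_b(3t) + psi_b(t) = 2*psi_b(1) on [1/3, 1].

    Assumes psi_b is increasing on [1/3, 3], so the left side is increasing in t.
    """
    gamma_b = psi_b(1.0)                       # Bernoulli dispersion constant
    g = lambda x: psi_b(3.0 * x) + psi_b(x) - 2.0 * gamma_b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def stability_quantities(dpsi_b, t):
    """The two quantities that must be positive for the 3x3 block to be positive definite."""
    return dpsi_b(t), 0.25 * (3.0 * dpsi_b(3.0 * t) + dpsi_b(t))

t_sato = bgr_equilibrium_t(lambda x: x)
t_cubic = bgr_equilibrium_t(lambda x: x ** 3)

print("Sato  psi_b(x) = x  :", t_sato, stability_quantities(lambda x: 1.0, t_sato))
print("cubic psi_b(x) = x^3:", t_cubic, stability_quantities(lambda x: 3.0 * x ** 2, t_cubic))
```

For the Sato case the root is $t = 0.5$; for the cubic example the condition reads $28t^3 = 2$, so $t = 14^{-1/3}$, and both stability quantities are positive in each case.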

V. SIMULATIONS AND INITIALIZATION STRATEGY

A. Uniform Input to AR(1) Channel

Fig. 5 shows the trajectories of the two parameters $\theta_0(k)$ and $\theta_1(k)$ of a first-order equalizer updated according to the Sato algorithm (2.3) for the AR(1) channel (3.5) with $\beta = 0.55$. There are various initializations with an i.i.d., uniform $[-1.5, +1.5]$ system input. Certain trajectories appear to hang in the vicinity of the vertical axis, symptomatic of a local minimum $[0\ \ \theta_1]'$, and other trajectories converge onto the desired parametrization $[1\ \ 0.55]'$, which is the global minimum. The step size $\mu = 0.01$ was chosen. This simulation evidence confirms our theoretical investigations of Section III-B.

B. Bernoulli Input to AR(1) Channels

Following the analysis of Section III-C, we changed the channel input $a_k$ into an i.i.d. Bernoulli input. The channel parameter $\beta$ was given the values 0.6 and 0.8. Figs. 6 and 7 illustrate the trajectories of the two parameter equalizer updated according to the Sato algorithm with $\mu = 0.01$. In both cases, even though $\beta$ is different from the analytically tractable value of $1/\sqrt{2}$, certain simulation trajectories appear to have converged to the vicinity of $[0\ \ \theta_1]'$ while others converged to the vicinity of the global minimum $[1\ \ \beta]'$.
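An experiment of this kind can be sketched in a few lines. The code below is an illustration rather than the authors' simulation: it assumes the Sato update has the usual stochastic-gradient form $\theta(k+1) = \theta(k) - \mu\,(z_k - \gamma\,\mathrm{sgn}(z_k))\,X_k$ with $X_k = [x_k\ \ x_{k-1}]'$ and $\gamma = 1$ for Bernoulli input, and it generates the channel output from $x_k + \beta x_{k-1} = a_k$. The function name, seed, and initial points are hypothetical.

```python
import numpy as np

def sato_two_tap(beta, theta_init, n_iter, mu=0.01, gamma=1.0, seed=0):
    """Run the (assumed) Sato update for a two-tap equalizer on an AR(1) channel."""
    rng = np.random.default_rng(seed)
    a = rng.choice([-1.0, 1.0], size=n_iter)       # i.i.d. Bernoulli input
    theta = np.array(theta_init, dtype=float)
    x_prev = 0.0
    for k in range(n_iter):
        x = a[k] - beta * x_prev                   # channel: x_k + beta*x_{k-1} = a_k
        regressor = np.array([x, x_prev])
        z = theta @ regressor                      # equalizer output
        theta -= mu * (z - gamma * np.sign(z)) * regressor
        x_prev = x
    return theta

beta = 0.6
for start in ([1.0, 0.0], [0.0, 0.5], [0.3, 0.7]):
    final = sato_two_tap(beta, start, n_iter=20000)
    print("start", start, "-> final", np.round(final, 3))
```

Which equilibrium a run ends near depends on the starting point and the noise realization, so no particular outcome is claimed here. One property is exact: started at $[1\ \ \beta]'$, the output equals $a_k$, the error term vanishes, and the parameters do not move.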


Fig. 5. Simulation of the Sato algorithm for AR(1) channel with $\beta = 0.55$ under uniform channel input. (The local minimum is marked on the plot.)

Fig. 6. Simulation trajectories of equalizer parameters under Bernoulli channel input for $\beta = 0.6$.

These simulations verify our analysis and our claims on its generalization in Section III-C. The equalizer parameters will converge to an equilibrium depending on their initial setting. Hence, in using Sato and BGR algorithms for which local convergence is possible, parameter initialization becomes crucial. We shall combine our next simulation study with a widely used initialization strategy.

C. Failure of Center-Spike Initialization

With the existence of locally stable equilibria for the Sato and BGR algorithms, the initial parameter setting provides a potential key to avoid ill-convergence. Many existing simulation successes in blind equalization (implicitly) initialized their algorithms for various reasons with a special "center-spike" style strategy. In this section, we show the failure of this type of initialization to avoid ill-convergence by the Sato algorithm on a class of constructed channels.

Sato in the first blind equalization paper [1] described the initial setting of the parameter vector that was used in the simulations, although its importance was not realized

Fig. 7. Simulation trajectories of equalizer parameters under Bernoulli channel input for $\beta = 0.8$.

(initially) by many researchers. In the simulation of [1], all the parameters are set to zero except for the center one. Apparent desired convergence was shown. This so-called center-spike initialization philosophy was later used by Godard [3] and Foschini [14] with different heuristic justifications. Godfrey and Rocca [18] also implemented their "Bussgang" algorithm with this special initialization. Many other simulated successes have been shown for various algorithms with this particular initialization scheme. These favorable results have made center-spike the most widely acknowledged initialization tactic for iterative blind equalization algorithms. A formal description based on [3] is given here.

Center-Spike Initialization: The initial parameter vector $\theta(0)$ of length $2N - 1$ should be set such that

$$\theta_i(0) = \begin{cases} 1, & i = N, \\ 0, & i \neq N, \end{cases}$$

where $N$ is the effective length of the channel inverse. We note here that it is crucial in center-spike initialization for the equalizer to have more than sufficient length. With only two equalizer parameters, our simulations in Section V-B do not allow the use of center-spike initialization. In fact, if the equalizer length is increased, then center-spike initialization can result in desired convergence for the examples in Section V-B. This suggests that some ad hoc solutions such as "good initialization" combined with "adequate" equalizer length may be helpful to reduce the chance of local convergence by blind equalization algorithms. Although what constitutes an "adequate length" requires further study, we do not dispute or challenge such claims. Our results are intended to bridge the gap between the theory and applications of blind equalization. By recognizing, instead of dismissing, the potential for local convergence by the Sato algorithm and its generalizations, more effective initialization tactics may be designed. In fact, our results can be seen as justification for the use of good initialization tactics.

Fig. 8. Ill-convergence of the center-spike Sato algorithm under Channel 1 (step size 5e-4). Iterations ×10^4. (a) Channel impulse response. (b) Ill-converging total ISI. (c) Final equalizer impulse response. (d) Final total impulse response.

While many successes have been reported using the center-spike method, we now show that, unfortunately, it does not always guarantee the global convergence of BGR algorithms. An example of failure will be constructed for the Sato algorithm under an equiprobable binary channel input signal. Because we have shown in Section IV that the Sato algorithm and some BGR algorithms can converge to equilibria of the form (4.2) for the combined system impulse response, we look specifically for channels such that the total impulse response from center-spike initialization is close to these locally stable, undesirable equilibria. We chose a nonminimum phase MA channel to have the transfer function

Channel 1: $0.5\,(1 + q^{-2} + q^{-3})$,

as shown in Fig. 8(a) with zeros $\{-0.6823,\ 0.3412 \pm i1.1615\}$. Since channels such as this require equalizers with longer parameter vectors for ISI removal, we choose the number of equalizer parameters to be 21 in our simulations. Because of the large number of parameters involved in channel equalization, we cannot rely on two-dimensional figures as in Sections V-A and V-B to show their effectiveness. Thus, we define the percentage ISI, $P$, in terms of the combined system impulse response $\{t_i\}$ as

which measures the severity of ISI of the combined channel and equalizer system. The combined system has an open eye if $P < 1$ and a closed eye if $P \ge 1$.
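The defining formula for $P$ is not legible in this copy, so the sketch below assumes a common form of the measure, $P = \big(\sum_i |t_i| - \max_i |t_i|\big)/\max_i |t_i|$, which is consistent with the open-eye threshold $P < 1$ and with the roughly 200% figure reported for the ill-converged system. It evaluates $P$ for Channel 1 combined with a 21-tap center-spike equalizer; the helper name `percentage_isi` is hypothetical.

```python
import numpy as np

def percentage_isi(t):
    """Assumed ISI measure: (sum|t_i| - max|t_i|) / max|t_i|."""
    mags = np.abs(np.asarray(t, dtype=float))
    peak = mags.max()
    return (mags.sum() - peak) / peak

channel_1 = 0.5 * np.array([1.0, 0.0, 1.0, 1.0])   # 0.5 (1 + q^-2 + q^-3)

equalizer = np.zeros(21)
equalizer[10] = 1.0                                 # center-spike initialization

total = np.convolve(channel_1, equalizer)           # combined impulse response
p_init = percentage_isi(total)
print("P at center-spike initialization:", p_init)
```

Under this assumed definition the initial combined response already has three equal taps of magnitude 0.5, so $P = 2$, i.e., 200%, and the initialization sits on a response of the form (4.2) with $t = 0.5$.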

Simulation results of Fig. 8(b)-(d) show the convergence of the Sato algorithm (with center-spike initialization) to the undesirable equilibrium of the equalizer

$$\theta = [\,0\ \ 0\ \cdots\ 0\ \ 1\ \ 0\ \ 0\ \cdots\ 0\,]',$$

which results in a closed-eye overall combined system whose $P$ (ISI) is approximately 200%.

Fig. 9. Ill-convergence of the center-spike Sato algorithm under Channel 2 (step size 5e-4, binary channel input). Iterations ×10^4. (a) Channel impulse response. (b) Ill-converging total ISI. (c) Final equalizer impulse response. (d) Final total impulse response.

To test the robustness of this ill-convergence example by the Sato algorithm, we choose a different MA channel with transfer function

Channel 2: $-0.7 + 0.3q^{-1} + 0.8q^{-2} - 0.6q^{-3}$,

which has zeros $\{-1.1715,\ 0.8 \pm i0.3026\}$. By analogy with the results for Channel 1, we might suspect that Channel 2 could have a stable undesirable equilibrium for the combined channel-equalizer transfer function near $[-0.5\ \ 0\ \ 0.5\ \ -0.5]'$. The simulation results of Fig. 9 illustrate that by using the Sato algorithm, again with center-spike initialization, the impulse response of the total combined system has converged to the neighborhood of the undesirable equilibrium $[-0.5\ \ 0\ \ 0.5\ \ -0.5]'$ given in Section IV. Because of the limited length of the equalizer, the exact equilibrium may not be reachable. Nevertheless, from the rather steady ISI in Fig. 9(b) and the final parameters of the total system shown in Fig. 9(d), we can see that under the center-spike initialization, the Sato algorithm still converged to the undesirable equilibrium. This ill-convergence for Channel 2 indicates the rather sizable region of attraction of the undesired equilibria given in Section IV and the robustness of our example of failure.
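The zeros of the two MA channels are easy to recompute. The sketch below treats a channel $h_0 + h_1 q^{-1} + h_2 q^{-2} + h_3 q^{-3}$ as the polynomial $h_0 z^3 + h_1 z^2 + h_2 z + h_3$ and uses `numpy.roots`; the leading coefficient of Channel 2 is taken as $-0.7$, which is an assumption about the partly illegible first coefficient. The printed roots can be compared with the values quoted in the text.

```python
import numpy as np

# Coefficients h0, h1, h2, h3 of h0 + h1 q^-1 + h2 q^-2 + h3 q^-3.
channel_1 = [0.5, 0.0, 0.5, 0.5]
channel_2 = [-0.7, 0.3, 0.8, -0.6]    # first coefficient assumed to be -0.7

zeros_1 = np.roots(channel_1)
zeros_2 = np.roots(channel_2)

for name, zeros in (("Channel 1", zeros_1), ("Channel 2", zeros_2)):
    print(name, "zeros:", np.round(zeros, 4))
    # A zero outside the unit circle means the channel is nonminimum phase.
    print(name, "largest zero magnitude:", round(float(np.max(np.abs(zeros))), 4))
```

With these coefficients the real zero of Channel 2 comes out near $-1.1715$, and both channels have at least one zero outside the unit circle.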

VI. CONCLUSION

We have studied the convergence properties of BGR algorithms in finite-dimensional equalizers. We have established the existence of local undesirable minima for both discrete inputs and a subclass of sub-Gaussian inputs. Ill-convergence may also be deduced using similar analytical methods for the infinite-dimensional equalizer case when the input is (discrete) PAM. The techniques that we have employed should also prove effective for the analysis of the closely related field of decision directed equalization for nontrivial channels with ISI.

We have indicated that there are two distinct mechanisms under which these BGR algorithms can fail. One mechanism stems from taking the channel input as discrete PAM signals. This case of ill-convergence is useful in constructing counterexamples to the center-spike initialization strategy that has been proposed as an effective means to avoid the potential for ill-convergence for various blind equalization algorithms, including Sato. The second mechanism of BGR algorithm failure stems from the finite length of the equalizer filter and does not rely on any restriction on the input driving process. In addition to analytical results, we have also provided simulation evidence of ill-convergence. These simulations verified that the domains of attraction of the local minima are quite large and that their existence is robust to perturbations of the channel parameters.

Our results, coupled with analogous ones for the popular Godard algorithms, highlight the need for systematic research on the initialization tactics for the existing blind equalization algorithms and the development of new blind algorithms, e.g., based on unimodal cost functions and parameter constraints [19]-[21].

APPENDIX A
CALCULATIONS FOR UNIFORM INPUT

Here, we wish to prove (3.11) when at equilibrium (3.8) with (3.9). Observe that for $\epsilon$ smaller than $|\theta_1|$, there holds

$$\mathrm{sgn}(\theta_1 x_{k-1}) = \mathrm{sgn}\big((\theta_1 + \epsilon)x_{k-1}\big)$$

and so at $\theta(k) = \theta = [0,\ \theta_1]'$,

Thus, the second column of (3.11) is indeed zero.

Now we consider the derivatives with respect to $\theta_0$, evaluated for $\theta_0 = 0$. Towards this end define

$$f(\epsilon) \triangleq E\{x_{k-1}\,\mathrm{sgn}(\epsilon x_k + \theta_1 x_{k-1})\} = E\{x_{k-1}\,\mathrm{sgn}\big((\theta_1 - \beta\epsilon)x_{k-1} + \epsilon a_k\big)\}$$

using the recursion (3.5). We are interested in computing $f'(\epsilon)$ at $\epsilon = 0$, but we need to incorporate our knowledge of the distribution of $\{a_k\}$, which is uniform on $[-\delta, +\delta]$, into the analysis. Let $\epsilon \ge 0$ be sufficiently small and, without loss of generality, restrict attention to $\theta_1 > 0$, and define

$$k(\epsilon) \triangleq \frac{\epsilon}{\theta_1 - \beta\epsilon} \ge 0.$$

Note that $\mathrm{sgn}\big((\theta_1 - \beta\epsilon)x_{k-1} + \epsilon a_k\big) = \mathrm{sgn}\big(x_{k-1} + k(\epsilon)a_k\big)$. Now we have, using Bayes' rule,

$$f(\epsilon) = E\{g(\epsilon, \eta)\} = \frac{1}{2\delta}\int_0^{\delta}\big(g(\epsilon, \eta) + g(\epsilon, -\eta)\big)\,d\eta,$$

where

$$g(\epsilon, \eta) \triangleq E\{x_{k-1}\,\mathrm{sgn}\big(x_{k-1} + k(\epsilon)a_k\big) \mid a_k = \eta\} = \Big\{\int_{-k(\epsilon)\eta}^{+\infty} - \int_{-\infty}^{-k(\epsilon)\eta}\Big\}\,x\,p_{x_k}(x)\,dx,$$

noting $a_k$ and $x_{k-1}$ are independent random variables. Then,

$$h(\epsilon, \eta) \triangleq g(\epsilon, \eta) + g(\epsilon, -\eta) = 4\int_{k(\epsilon)\eta}^{+\infty} x\,p_{x_k}(x)\,dx$$

once various symmetries are observed. Note that for any $\eta \in (0, \delta]$, $h(\epsilon, \eta)$ has a local maximum at $\epsilon = 0$. Hence $f(\epsilon)$ has a local maximum at $\epsilon = 0$, and so $f'(0^+) = 0$. By symmetry a similar statement will be true for $\epsilon \le 0$ and thus the desired derivative is zero. The conclusion is

which is the lower left term in (3.11). Now we study the upper left term in (3.11),

Since $x_k + \beta x_{k-1} = a_k$, it is enough to study the derivative of

$$m(\epsilon) \triangleq E\{a_k\,\mathrm{sgn}(\epsilon x_k + \theta_1 x_{k-1})\} = E\{a_k\,\mathrm{sgn}\big((\theta_1 - \beta\epsilon)x_{k-1} + \epsilon a_k\big)\}.$$

With $k(\epsilon) \ge 0$ as before ($\epsilon \ge 0$ sufficiently small), define

$$n(\epsilon, \eta) \triangleq E\{a_k\,\mathrm{sgn}\big(x_{k-1} + k(\epsilon)a_k\big) \mid a_k = \eta\},$$

so that we can express (much as before)

$$n(\epsilon, \eta) + n(\epsilon, -\eta) = 2\eta\Big\{\int_{-k(\epsilon)\eta}^{+\infty} - \int_{-\infty}^{-k(\epsilon)\eta}\Big\}\,p_{x_k}(x)\,dx = 4\eta\int_0^{k(\epsilon)\eta} p_{x_k}(x)\,dx.$$

Then, we can evaluate

which at $\epsilon = 0^+$ gives

Therefore,

which supplies the missing term of (3.11). (Working with $\epsilon < 0$, meaning also $h(\epsilon, \eta) \le 0$, and taking limits yields, not surprisingly, the same result.)

The remainder of this appendix is devoted to proving inequality (3.12). Let $p_{x_k}(x)$ denote the density of $x_k$ (the system output); then $\beta^{-1}p_{x_k}(\cdot/\beta)$ is the density for $\beta x_{k-1}$. Noting that $\beta x_{k-1}$ and $a_k$ are independent random variables implies from (3.17) that

$$p_{x_k}(x) = \int r(z - x)\,\frac{1}{\beta}\,p_{x_k}\Big(\frac{z}{\beta}\Big)\,dz = \frac{1}{2\delta}\int_{x-\delta}^{x+\delta}\frac{1}{\beta}\,p_{x_k}\Big(\frac{z}{\beta}\Big)\,dz,$$

where $r(\cdot)$ is the (uniform $[-\delta, +\delta]$) density of $a_k$. So

$$p_{x_k}(0) = \frac{1}{2\delta\beta}\int_{-\delta}^{+\delta} p_{x_k}\Big(\frac{z}{\beta}\Big)\,dz = \frac{1}{2\delta}\int_{-\delta/\beta}^{+\delta/\beta} p_{x_k}(y)\,dy \le \frac{1}{2\delta}.$$

APPENDIX B
CALCULATIONS FOR BERNOULLI INPUT

Familiarity with Appendix A is assumed here. We wish to prove the identity (3.23), for the special case of system (3.17) and equalizer (3.6). As in Appendix A, the derivatives with respect to $\theta_1$ equal zero and will not be repeated here. Our calculation is for a general $\beta$ but our primary illustration in the body of the paper, for clarity, is with $\beta = 1/\sqrt{2}$.

Recall that we are evaluating the Hessian at $\theta(k) = \theta = [0\ \ \theta_1]'$ corresponding to the equilibrium of interest (3.21). Applying the same definitions of $f(\epsilon)$ and $k(\epsilon)$ as in Appendix A, we have, noting $a_k$ and $x_{k-1}$ are independent random variables,

We need to compute $f'(\epsilon)$ at the origin. However, we note that this expression increases as $\epsilon \to 0$ and hence there is a maximum at $\epsilon = 0$ in the set $\epsilon \ge 0$. By symmetry, a similar statement will be true for $\epsilon \le 0$ and thus the derivative $f'(\epsilon)$ is zero. The conclusion is

which is the lower left term in (3.23). Now, we study

$$\frac{\partial}{\partial\theta_0}E\{\mathrm{sgn}(z_k)a_k\}.$$

Since $x_k + \beta x_{k-1} = a_k$, it is enough to study the derivative of

$$g(\epsilon) \triangleq E\{a_k\,\mathrm{sgn}(\epsilon x_k + \theta_1 x_{k-1})\} = E\{a_k\,\mathrm{sgn}\big((\theta_1 - \beta\epsilon)x_{k-1} + \epsilon a_k\big)\}.$$

The calculation mirrors that in Appendix A and we obtain (with $k(\epsilon)$ as in Appendix A)

$$g(\epsilon) = \int_{-k(\epsilon)}^{+k(\epsilon)} p_{x_k}(x)\,dx = 2\int_0^{k(\epsilon)} p_{x_k}(x)\,dx.$$

Given the smoothness of $p_{x_k}(\cdot)$ at the origin (for $\beta = 1/\sqrt{2}$ it is clear from (3.18)) we get

$$g'(\epsilon) = 2\,p_{x_k}\big(k(\epsilon)\big)\,k'(\epsilon),$$

which when evaluated at zero explains the remaining term of (3.23). The case $\epsilon \le 0$ gives the same answer.

APPENDIX C
SATO EQUILIBRIA FOR IIR CHANNELS

We are interested in the condition for equilibria in the $\mathcal{T}$ parameter space at $\bar{\mathcal{T}}$ of (4.2) analogous to (3.1), the difference being we need to define what is meant by a stationary point with respect to $\mathcal{T}$ regarded as a sequence rather than as a finite-dimensional vector. We define a convenient norm on the $\mathcal{T}$ parameter space, namely, the $l_1$-norm:

$$\|\mathcal{T}\| \triangleq \sum_i |t_i|.$$

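Notice that if the channel and equalizer separately have $l_1$-summable impulse responses, then $\mathcal{T}$ has this property. When $\mathcal{T}$ assumes the value in (4.2) we can think of it as an infinite vector with norm equal to $|t|N$. Define the functional

noting $\mathcal{T} \in l_1$ implies $|z_k| = |A_k'\mathcal{T}| < \infty$. This last identity is guaranteed by specifying the channel and equalizer to be stable in the sense $h, \theta \in l_1$ (BIBO stable). Then the condition for a stationary point at $\bar{\mathcal{T}}$ is

$$\delta f(\bar{\mathcal{T}}; \Delta\mathcal{T}) = 0, \quad \forall\,\Delta\mathcal{T} \in l_1,$$

where $\delta f(\mathcal{T}; \Delta\mathcal{T})$ is the Fréchet differential of $f(\mathcal{T})$ at $\mathcal{T}$ with increment $\Delta\mathcal{T}$, see [22]. The Fréchet differential is linear and continuous with respect to $\Delta\mathcal{T}$ and is defined through the following:

$$\lim_{\|\Delta\mathcal{T}\| \to 0}\frac{\|f(\mathcal{T} + \Delta\mathcal{T}) - f(\mathcal{T}) - \delta f(\mathcal{T}; \Delta\mathcal{T})\|}{\|\Delta\mathcal{T}\|} = 0,$$

where $\mathcal{T} \in l_1$ is fixed, and $\Delta\mathcal{T} \in l_1$ is arbitrary. For the functional $f(\mathcal{T})$ of interest the numerator of the last equation takes the form (after a little calculation)

$$\Big|E\big\{A_k'\mathcal{T}A_k'\Delta\mathcal{T} + \gamma|A_k'\mathcal{T}| - \gamma|A_k'\mathcal{T} + A_k'\Delta\mathcal{T}| + \tfrac{1}{2}(A_k'\Delta\mathcal{T})^2\big\} - \delta f(\mathcal{T}; \Delta\mathcal{T})\Big|,$$

where all the terms are bounded (note $|A_k'\mathcal{T}| \le (M-1)\,d\,\|\mathcal{T}\|$). Hence, we have

$$\delta f(\mathcal{T}; \Delta\mathcal{T}) = E\{A_k'\mathcal{T}A_k'\Delta\mathcal{T} - \gamma\,\mathrm{sgn}(A_k'\mathcal{T})A_k'\Delta\mathcal{T}\} = E\{z_k A_k' - \gamma\,\mathrm{sgn}(z_k)A_k'\}\,\Delta\mathcal{T}, \quad \mathcal{T} \neq 0,$$

where $z_k = A_k'\mathcal{T}$. So the condition for an equilibrium at $\bar{\mathcal{T}}$ of (4.2) is, see [22],

$$E\{z_k A_k - \gamma\,\mathrm{sgn}(z_k)A_k\}\Big|_{\mathcal{T} = \bar{\mathcal{T}}} = E\{A_k A_k'\}\,\bar{\mathcal{T}} - \gamma\,\mathrm{sgn}(t)\,E\Big\{A_k\,\mathrm{sgn}\Big(\sum_i a_{k-i}\Big)\Big\} = 0.$$

To deal with stability, one must introduce a second Fréchet derivative. From a Fréchet differential $\delta f(\mathcal{T}; \Delta\mathcal{T})$ we can identify the (first) Fréchet derivative by

$$\delta f(\mathcal{T}; \Delta\mathcal{T}) = f'(\mathcal{T})\,\Delta\mathcal{T}.$$

(Thus, with $\delta f(\mathcal{T}; \Delta\mathcal{T}) = E\{z_k A_k' - \gamma\,\mathrm{sgn}(z_k)A_k'\}\,\Delta\mathcal{T}$, we identify (for $\mathcal{T} \neq 0$), $f'(\mathcal{T}) = E\{z_k A_k' - \gamma\,\mathrm{sgn}(z_k)A_k'\}$.) Notice that $f'(\cdot)$ maps any $\mathcal{T} \neq 0$, $\mathcal{T} \in l_1$ into an operator $f'(\mathcal{T}): l_1 \to \mathbb{R}$, defined by $f'(\mathcal{T})\Delta\mathcal{T} = \delta f(\mathcal{T}; \Delta\mathcal{T})$. Having constructed $f'(\mathcal{T})$, we can now construct its Fréchet differential, $\delta f'(\mathcal{T}; \Delta\tilde{\mathcal{T}})$. This is defined by the requirement that it be linear and continuous and satisfy

We can also define the second Fréchet derivative as

$$\delta f'(\mathcal{T}; \Delta\tilde{\mathcal{T}}) = f''(\mathcal{T})\,\Delta\tilde{\mathcal{T}}$$

for $\Delta\tilde{\mathcal{T}} \in l_1$. Notice that $f''(\mathcal{T})\Delta\tilde{\mathcal{T}}$ or $\delta f'(\mathcal{T}; \Delta\tilde{\mathcal{T}})$ is itself a transformation from $l_1$ to $\mathbb{R}$; equivalently,

$$f''(\mathcal{T})\,\Delta\tilde{\mathcal{T}}\,\Delta\mathcal{T} = \delta f'(\mathcal{T}; \Delta\tilde{\mathcal{T}})\,\Delta\mathcal{T}$$

will be an element of $\mathbb{R}$ when $\Delta\mathcal{T} \in l_1$. Neglecting terms of order $\|\Delta\mathcal{T}\|^3$, one has, see [22],

$$f(\mathcal{T} + \Delta\mathcal{T}) = f(\mathcal{T}) + f'(\mathcal{T})\,\Delta\mathcal{T} + \tfrac{1}{2}f''(\mathcal{T})\,\Delta\mathcal{T}\,\Delta\mathcal{T}.$$

It is clear that a sufficient condition for $f(\cdot)$ to have a minimum at $\bar{\mathcal{T}}$ is that $f'(\bar{\mathcal{T}}) = 0$ and

$$f''(\bar{\mathcal{T}})\,\Delta\mathcal{T}\,\Delta\mathcal{T} > \eta\,\|\Delta\mathcal{T}\|^2$$

for some $\eta > 0$ and all $\Delta\mathcal{T}$. This is the last condition loosely described by (2.12), except that $\mathcal{T} = \mathcal{H}'\theta$, and the dependence is on $\theta$, rather than $\mathcal{T}$. It is convenient to rewrite the condition in terms of the (uniform) positive definiteness of a certain Hessian matrix.

APPENDIX D
PROOF OF ILL-CONVERGENCE RESULT

Since $a_k$ is Bernoulli and $\psi_b(\cdot)$ is odd, the condition for equilibrium (4.11) can be evaluated explicitly as

$$E\Big\{\psi_b\Big(|t|\sum_{j=0}^{2} a_{k-j}\Big)\sum_{j=0}^{2} a_{k-j}\Big\} = \psi_b(3|t|)\cdot 3\cdot\tfrac{2}{8} + \psi_b(|t|)\cdot\tfrac{6}{8} = \gamma_b\,E\Big\{\Big|\sum_{j=0}^{2} a_{k-j}\Big|\Big\} = \big(3\cdot\tfrac{2}{8} + 1\cdot\tfrac{6}{8}\big)\gamma_b,$$

i.e.,

$$\psi_b(3|t|) + \psi_b(|t|) = 2\gamma_b.$$

For Bernoulli $a_k$, we have from (2.6) that the dispersion reduces to $\gamma_b = \psi_b(1)$.

We now need to consider the Hessian (4.12) that gives

$$\frac{\partial}{\partial\theta}E\{\psi_b(z_k)X_k\}\Big|_{\theta = \bar{\theta}} = \mathcal{H}\,\Xi\,\mathcal{H}',$$

where $\Xi \triangleq E\{\psi_b'\big(t\,(a_k + a_{k-1} + a_{k-2})\big)A_k A_k'\}$, which has the following nonzero entries:

$$\Xi_{ii} = \tfrac{1}{4}\big(\psi_b'(3t) + 3\psi_b'(t)\big), \quad \forall\,i,$$

$$\Xi_{10} = \Xi_{01} = \Xi_{21} = \Xi_{12} = \Xi_{20} = \Xi_{02} = \tfrac{1}{4}\big(\psi_b'(3t) - \psi_b'(t)\big).$$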
We will now show that $\Xi$ can be found to be uniformly positive definite, i.e., $\Xi - \eta I$ for some positive $\eta$ is positive definite by choice of $t$. The conditions for $\Xi$ to be uniformly positive definite can be simplified into

1) $\psi_b'(3t) + 3\psi_b'(t) > 0$;

It is not difficult to see that all these three conditions can be met if and only if

$$\psi_b'(t) > 0 \quad \text{and} \quad 3\psi_b'(3t) + \psi_b'(t) > 0.$$

If $\psi_b'(x) > 0$ for all $x \in [1/3, 3]$, then $3\psi_b'(3x) + \psi_b'(x) > 0$ for all $x \in [1/3, 1]$ and the stability conditions are met for any $t \in [1/3, 1]$. In addition, from the previous equation, the function $\psi_b(3x) + \psi_b(x)$ is thus monotonically increasing for $x \in [1/3, 1]$. Thus, combining this, we have

Since $\psi_b(x)$ is twice differentiable (Section II-B) and is hence continuous, there must exist a $t \in [1/3, 1]$ such that the equilibrium condition

$$\psi_b(3|t|) + \psi_b(|t|) = 2\gamma_b$$

is satisfied and $\Xi$ is uniformly positive definite. Now to show that the Hessian (4.12) is also uniformly positive definite, let us recall the assumption that the transfer function $H(e^{j\omega})$ is nonzero for all $\omega \in [0, 2\pi)$. This is necessary and sufficient, when $\sum_i |h_i| < \infty$, for the operator $\mathcal{H}^{-1}$ to have an $l_1$ impulse response. Hence, the associated impulse responses map an $l_2$ sequence into $l_2$ sequences (in a bounded way). Likewise, the matrices $\mathcal{H}'$ and $(\mathcal{H}^{-1})'$ will map infinite vectors with finite $l_2$ norms into infinite vectors with finite $l_2$ norms. Hence, for $\alpha \in l_2$, we can write, with $\beta = \mathcal{H}'\alpha$,

Now, we have just shown that the further inequality

holds for some $\eta > 0$. Because $\alpha = (\mathcal{H}^{-1})'\beta$, it follows that $\|\mathcal{H}^{-1}\|^2\,\beta'\beta \ge \alpha'\alpha$ ($\|\cdot\|$ denotes the induced norm) or

This shows that the Hessian of (4.12) is uniformly positive definite.

Consequently, we have shown that there exists a stable equilibrium $\bar{\mathcal{T}}$ of the form (4.9) with $t \in [1/3, 1]$ such that the zero update condition (4.10) holds, together with the stability condition.

REFERENCES

[1] Y. Sato, “A method of self-recovering equalization for multi-level amplitude modulation,” IEEE Trans. Commun., vol. COM-23, pp. 679-682, June 1975.

[2] A. Benveniste and M. Goursat, “Blind equalizers,” IEEE Trans. Commun., vol. COM-32, pp. 871-882, Aug. 1984.

[3] D. N. Godard, “Self-recovering equalization and carrier tracking in two-dimensional data communication systems,” IEEE Trans. Commun., vol. COM-28, pp. 1867-1875, Nov. 1980.

[4] J. R. Treichler and M. G. Larimore, “New processing techniques based on the constant modulus adaptive algorithm,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 420-431, Apr. 1985.

[5] J. R. Treichler and B. G. Agee, “A new approach to multipath correction of constant modulus signals,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 459-472, Apr. 1983.

[6] A. Benveniste, M. Goursat, and G. Ruget, “Robust identification of a nonminimum phase system: Blind adjustment of a linear equalizer in data communications,” IEEE Trans. Automat. Contr., vol. AC-25, pp. 385-399, June 1980.

[7] O. Shalvi and E. Weinstein, “New criteria for blind deconvolution of nonminimum phase systems (channels),” IEEE Trans. Inform. Theory, vol. 36, pp. 312-321, Mar. 1990.

[8] Z. Ding, R. A. Kennedy, B. D. O. Anderson, and C. R. Johnson, Jr., “Ill-convergence of Godard blind equalizers in data communications,” IEEE Trans. Commun., vol. 39, pp. 1313-1328, Sept. 1991.

[9] J. E. Mazo, “Analysis of decision-directed equalizer convergence,” Bell Syst. Tech. J., vol. 59, pp. 1857-1876, Dec. 1980.

[10] O. Macchi and E. Eweda, “Convergence analysis of self-adaptive equalizers,” IEEE Trans. Inform. Theory, vol. IT-30, pp. 162-176, Mar. 1984.

[11] Z. Ding, C. R. Johnson, Jr., and R. A. Kennedy, “Local convergence of globally convergent blind adaptive equalization algorithms and initialization tactics,” in Proc. IEEE ICASSP-91, Toronto, Canada, 1991, pp. 1533-1536.

[12] Z. Ding, C. R. Johnson, Jr., and R. A. Kennedy, “On the (non)existence of undesirable equilibria of Godard blind equalizers,” IEEE Trans. Signal Processing, vol. 40, pp. 2425-2432, Oct. 1992.

[13] P. H. Wittke, W. S. Smith, and L. L. Campbell, “Infinite series of interference variables with Cantor-type distribution,” IEEE Trans. Inform. Theory, vol. 35, pp. 1428-1436, Nov. 1989.

[14] G. J. Foschini, “Equalizing without altering or detecting data,” AT&T Tech. J., vol. 64, pp. 1885-1911, Oct. 1985.

[15] A. Benveniste, M. Bonnet, M. Goursat, C. Macchi, and G. Ruget, “Identification d'un système à non minimum de phase par approximation stochastique,” Tech. Rep. No. 325, IRIA Laboria, Sept. 1978.

[16] Z. Ding, C. R. Johnson, Jr., and R. A. Kennedy, “Non-global convergence of blind recursive identifiers based on gradient descent of continuous cost functions,” in Proc. 29th IEEE Conf. Decision Control, Honolulu, HI, Dec. 1990, pp. 225-230.

[17] S. Verdú, “On the selection of memoryless adaptive laws for blind equalization in binary communications,” in Proc. 6th Int. Conf. Anal. Optimization Systems, Nice, France, June 1984, pp. 239-249.

[18] R. Godfrey and F. Rocca, “Zero memory nonlinear deconvolution,” Geophysical Prospecting, vol. 29, pp. 189-228, 1981.

[19] W. T. Rupprecht, “Adaptive equalization of binary NRZ-signals by means of peak value minimization,” in Proc. 7th Europ. Conf. Circuit Theory Design, Prague, 1985, pp. 352-355.

[20] W. A. Sethares, R. A. Kennedy, and Z. Gu, “An approach to blind equalization of non-minimum phase systems,” in Proc. IEEE ICASSP-91, Toronto, Canada, 1991, pp. 1529-1532.

[21] S. Vembu, S. Verdú, R. A. Kennedy, and W. A. Sethares, “Convex cost functions in blind equalization,” in Proc. 25th Conf. Inform. Sci., Systems, Baltimore, MD, Mar. 1991, pp. 792-797.

[22] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.
