
CHAPTER 1

INTRODUCTION AND BACKGROUND

Adaptive signal processing is a subject whose rapid development over the last thirty years has been made possible by extraordinary advances in the related fields of digital computing, digital signal processing, and high speed integrated circuit technology. One of the earliest publications in adaptive filtering was the paper published in 1959 by Widrow and Hoff [1.19] that first introduced the least mean squares (LMS) adaptive filtering algorithm. At the time that this paper was published, over ten years before the invention of the microprocessor, the state of the art in digital hardware was not sufficiently advanced for engineers to consider practical implementation of an adaptive filter in purely digital form. Indeed, the first experimental LMS filters were implemented as analog circuits with complicated arrangements of analog relays that performed the switching necessary to adjust the filter tap weights. Nevertheless, the simplicity of the LMS algorithm, and its robust performance in spite of the simplifying assumptions behind its derivation, attracted the attention of a generation of electrical engineers and formed the basis for intense research and development in adaptive filter architectures and algorithms that continues in force to the present day.

Much of the early interest in the LMS adaptive filter was for noise reduction in communication systems, and for automatic beam steering in adaptive array antennas. The adaptive line enhancer (ALE) was one of the early success stories of the LMS adaptive algorithm. The ALE is designed to remove broadband noise from a primary sinusoidal signal of interest by using the correlated properties of the sinusoid to separate it from additive broadband noise. The LMS adaptive filter was later used in telephone system applications as the central component in adaptive echo cancelers. Indeed, many generations of custom designed "adaptive echo cancelers on-a-chip" appeared in practice during the late 1970's and early 1980's. In recent years there has been new interest in implementing echo cancelers with programmable DSP chips rather than custom designed integrated circuits, but the LMS adaptive algorithm in various forms and with various modifications remains the standard in many practical systems today.

Since the subject of adaptive signal processing is itself quite old, and since there are currently many modern texts that cover the subject very well, the natural question that deserves an answer is: what is the need for yet another book on the subject? The simple answer to this question is that in recent years there has been extensive further development of new adaptive signal processing concepts that are not adequately covered in current books. For example, the finite impulse response (FIR) adaptive filter is very well covered, but there is much less reference material on infinite impulse response (IIR) adaptive filters. Similarly, while the LMS adaptive algorithm and its variants are treated at great length by many authors, the more powerful and computationally more complicated algorithms, such as the quasi-Newton and the conjugate gradient adaptive algorithms, are treated only in the more obscure research literature. The objective of this book is to bring forth from the research literature some of the more recently developed adaptive filtering concepts and to provide a consolidated treatment for many of these promising new ideas.

With the recent burst of new activities surrounding cellular telephone, digital television (HDTV), wireless communications, and digital multimedia commercial services, many new technical problems have emerged for which advanced adaptive signal processing may offer new and better solutions. For the purpose of discussions throughout this book, the term "advanced" will be used in regard to topics that are being put forth as important areas worthy of further research and development, and which cannot easily be found developed in current textbooks. In contrast, the term "conventional" will be used to refer to those topics in adaptive signal processing that are already well developed, and which are already covered in great depth in currently available books.

In keeping with the goal of concentrating on advanced topics, there will be no effort to provide a comprehensive background in the current state of the art with regard to conventional adaptive signal processing concepts. It is assumed that the reader is familiar with the current state of the art, so that familiar concepts such as LMS adaptive filters, FIR lattice filters, RLS algorithms, LMS sign algorithms, tap weight leakage, etc., will be introduced freely into discussions as needed without burdening the reader with tedious background material. Often conventional concepts are introduced as a starting point from which to discuss an advanced concept, and sometimes they are introduced for the purpose of comparing performance. In general the conventional concepts are richly referenced throughout the book so that the interested reader can easily locate important background material as needed.

1.1 Common Adaptive Concepts from Different Disciplines

In the course of identifying and developing important new topics in adaptive signal processing, a point of view that is adopted throughout this book is that adaptive signal processing does not stand alone as a self-contained discipline, but rather shares common ideas, terminology, theory, and algorithms with other engineering disciplines. For example, engineers and scientists working in the field of communications often deal with issues in adaptive equalization for the reduction of intersymbol interference in digital communication systems. Within the field of adaptive control some of the problems frequently encountered include closed loop feedback, parameter estimation, and system identification. In telecommunication signal processing the focus is more on echo cancellation, noise filtering, and linear predictive coding (LPC) for speech compression. In the field of integrated circuit design, major issues involve equation solvers, numerical integration methods, sparse matrices, and stiff differential equations. In numerical analysis the problems are typically concerned with optimization, existence and uniqueness of solutions, stability, and ill-conditioning. Given the diverse nature of these disciplines, it may seem strange that there is considerable commonality and synergism among them. However, extensive commonality does indeed exist among these areas, and we shall strive throughout this work to identify this commonality and to capitalize on theoretical concepts, analytical tools, and algorithmic implementations from all of these areas to push forward the state of the art wherever possible.

There are three technical disciplines in particular that interface directly with the subject of adaptive signal processing: i) digital signal processing, ii) automatic control, and iii) numerical analysis. Figure 1.1 lists some of the important terminology that is used in these three fields. In some cases, different terms are used in these fields to refer to the same basic concept. For example, the term coefficient update equation is typically used in signal processing to refer to the iterative relationship that describes how filter coefficients at the current iteration are generated from the available parameters and signals at previous iterations. In the field of adaptive control, a similar concept may be referred to as an adaptive control law. Furthermore, in numerical analysis the term numerical integration is used to refer to the process of iteratively generating the solution to a differential (difference) equation. While these terms have their own special meaning within these different disciplines, it may be useful for those conducting research in adaptive signal processing to recognize the similarity of concepts in these fields and to rely on synergistic results from one field to help solve similar problems in the others. If this interdisciplinary approach is to be successful, it is necessary for an individual to become sufficiently familiar with terminology in the different fields so as to be able to read and appreciate concepts and results in the different areas.

For example, tap weight leakage was introduced in the 1970's and is still used today in state-of-the-art echo cancelers to assure that the echo cancelers at the "near" and "far" ends of a telephone channel will remain well behaved when narrowband signaling information or half band full duplex binary data are transmitted over channels that are designed primarily for voice transmission. The problem in these situations is that during these periods of activity the cancelers are not properly trained at all of their operating frequencies. It is well known that channel noise in empty spectral bands can excite unwanted modes in the echo cancelers. Over a long period of time such a canceler may amplify the noise, causing a catastrophic instability in the end-to-end communication link known as "singing". However, in the field of automatic control the concept called persistent excitation is a well developed notion that is used in system identification problems to specify what mathematical criteria must be satisfied by the input signal to guarantee that all parameters in the model are properly identified. Borrowing the concept of persistent excitation from the field of automatic control can help adaptive filter designers to develop an echo canceler that will remain well behaved on channels that are subject to diverse channel signaling conditions.

Figure 1.1 Some terminology used in three synergistic fields.

Common terminology in Signal Processing: adaptive digital filters, transform domain filters, tapped delay line structures, lattice structures, finite impulse response, infinite impulse response, adaptive equalization, adaptive echo cancellation, LMS adaptive algorithms, recursive least squares, real time implementation, quantization errors, tap weight leakage, convergence factors.

Common terminology in Automatic Control: adaptive control systems, robust control systems, state space models, Kalman filters, system identification, parameter estimation, least squares estimation, estimation noise, measurement noise, robustness, frozen parameter model, stochastic control laws, Kalman gain, critical damping.

Common terminology in Numerical Analysis: optimization, Gauss-Newton, conjugate gradients, cost functions, Levinson algorithm, preconditioners, Toeplitz systems, ill-conditioned matrices, stiff systems, existence of solution, uniqueness of solution, parallel systems, eigenvalues, equation solvers, numerical integration, spectral operators, accelerators, stability, steepest descent, step size.

While there are remarkable similarities in many areas of signal processing, automatic control, and numerical analysis, there are also important differences in certain concepts that may initially seem identical. Kalman filtering, which was developed primarily by researchers from the field of automatic control, has been very successful for adaptive noise control in many industrial problems, and also for estimation and tracking in guided missile control systems. Indeed there is a strong similarity between the Kalman filter and certain types of adaptive noise canceling filters. Since the Kalman filter originated within the field of automatic control, the theory of the Kalman filter was developed in the context of a state space representation of the filter, which is a popular way for control theorists to mathematically describe an arbitrary linear system (filter). However, a state space description of a system is quite general, so it becomes apparent at the outset that a Kalman filter in its most general form has both poles and zeros, and is, therefore, an infinite impulse response (IIR) system in signal processing terminology. Comparative analyses have revealed further that the learning strategy used in the Kalman filter is similar to a recursive least squares (RLS) adaptive filtering algorithm. This type of comparative study is quite useful. It suggests to an adaptive filter designer that the Kalman filter is a comparatively sophisticated adaptive filter. In its general form it is an IIR system that is adjusted with an RLS adaptive algorithm [1.12]. This suggests that the Kalman filter may be subject to the potential instabilities of an IIR adaptive filter, as well as the well known numerical sensitivities of the RLS algorithm. In certain situations the designer may find that an adaptive FIR filter using a fast quasi-Newton algorithm is a more desirable solution to problems that are subject to large unknown disturbances. The important point of such a comparative study is not to promote any one solution over another, but rather to put all of the results from these different areas into a common framework in order to provide a richer set of possible solutions.

In certain cases there can be pitfalls in transferring results from one discipline to another. For example, in adaptive control it is customary to characterize the adaptive controller as a pole-zero (IIR) system realized with a direct form structure. Since the stability of the controller is always a primary concern in the design of adaptive control systems, a natural question that could be asked is why not start with a linear phase FIR structure for the controller? While this seems to be an attractive approach initially, a study of the problem readily reveals that the FIR structure is not compatible with the needs of an effective controller. While an FIR structure is capable of approximating IIR functions, a typical FIR response is accompanied by a large time delay. If N is odd, where N is the length of the FIR filter, then the delay of a linear phase structure is equal to (N-1)/2 time samples. Clearly if N is large, the time delay becomes large and may possibly lead to unstable conditions in the closed loop system. Therefore we must conclude that the FIR adaptive filter structure, which is the most popular structure for adaptive filters, is not well suited for adaptive controllers.

As another example, consider the problem of channel equalization in binary communication systems. Channel equalizers are standard components in high data rate state-of-the-art modems where there is a need to equalize the channel to eliminate intersymbol interference. Many channel equalizers in use today are FIR structures implemented with a simple tapped delay line and a relatively simple adaptive algorithm, such as the LMS algorithm. However, typical channels are modeled by all-zero functions that characterize the presence of a few dominant zeros in the spectrum [1.12]. If the channel itself is best characterized by an FIR model, it would follow that the most effective structure for an equalizer would be an all-pole structure whose poles could be adaptively placed to precisely cancel the zeros in the channel. Therefore, based on this reasoning, it would appear that an all-pole IIR adaptive filter would provide the best structure for an equalizer. But, unfortunately, many channels are not minimum phase systems, which means that the FIR channel model may have zeros outside of the unit circle. In such cases an IIR equalizer will become unstable as one of its poles moves outside the unit circle to cancel a corresponding zero. While the IIR structure initially appears to be an ideal structure for channel equalization, in the final analysis there are peculiar features of the problem that demand the assurance of guaranteed stability that the FIR equalizer provides.

The above discussion provides just a brief look into the pros and cons of cross-disciplinary studies in adaptive signal processing practice. Such comparative studies may not lead to new solutions directly, but they do provide a broader understanding of the theories and methodologies that are collectively available from these diverse fields. In some situations such studies can lead to new and better solutions. In other situations such studies provide a better understanding of specific problems in these different fields, and they provide a better appreciation of why so many different approaches to adaptive signal processing problems are important in practice.

1.2 Generic Applications of Adaptive Methods

There are four basic configurations in which adaptive filters are typically used to solve practical engineering problems. The following paragraphs briefly introduce these basic adaptive filter configurations and discuss peculiar features of each. Throughout the book the input to the adaptive system will be denoted as a discrete time sequence x(n), whereas the output of the adaptive system will be denoted as y(n). The present value and the N-1 past values of the input signal are usually expressed as an input vector x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T so that matrix and vector notation can be used to simplify the adaptive filter equations. Every adaptive system requires a training signal d(n) that is used to drive the adaptation process toward a particular desired solution. The output error is formed as the difference between the desired signal and the filter output, i.e., e(n) = d(n) - y(n), and the ideal cost function is defined as the statistical mean squared error (MSE):

J(n) = E[ |e(n)|^2 ].    (1.1)

In practice it is necessary to work with a realistic approximation to the expected value on the right side of equation (1.1). One such approximation that is used in many adaptive algorithms is a weighted average of all squared errors that fall within a sliding window of length L, i.e.,

J(n) = (1/L) \sum_{i=0}^{L-1} λ^i |e(n-i)|^2,    (1.2)

where 0 < λ ≤ 1 is a "forgetting factor" that places smaller emphasis on samples that are farther back in time from the current time index. The block length L is an arbitrary parameter whose value is typically chosen to suit the constraints of a particular application. It is sometimes desired to give the cost function an infinite memory, in which case the upper limit on the summation in equation (1.2) is set equal to the running time index n, and the (1/L) factor in front of the summation is eliminated. The ideal cost function of equation (1.1) is often referred to as the stochastic mean squared error, whereas the form of the cost function in equation (1.2) is called the deterministic mean squared error.
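To make the distinction concrete, the following short Python sketch evaluates the deterministic cost of equation (1.2) on a sample error sequence. The window length L, the forgetting factor λ, and the synthetic error sequence are illustrative choices made only for this sketch; they are not values prescribed by the text.

```python
import numpy as np

def sliding_window_cost(e, n, L=32, lam=0.95):
    """Deterministic weighted cost of equation (1.2):
    J(n) = (1/L) * sum_{i=0}^{L-1} lam**i * |e(n-i)|**2."""
    window = e[n - L + 1:n + 1][::-1]      # e(n), e(n-1), ..., e(n-L+1)
    weights = lam ** np.arange(L)          # lam**i places less weight on older errors
    return np.dot(weights, np.abs(window) ** 2) / L

# Illustrative error sequence: noise whose power drops halfway through,
# mimicking an adaptive filter that is converging.
rng = np.random.default_rng(0)
e = np.concatenate([rng.normal(0, 1.0, 500), rng.normal(0, 0.1, 500)])

print("J(400) =", sliding_window_cost(e, 400))   # close to the early error power (about 1.0)
print("J(900) =", sliding_window_cost(e, 900))   # close to the late error power (about 0.01)
```

Because λ^i de-emphasizes older samples, J(n) tracks the recent error power, which is why this form of the cost function is useful when the filter must follow time varying signal statistics.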

1.2.1 System Identification Configuration

An adaptive filter is said to be used in the system identification configuration when both the adaptive filter and an unknown system are excited by the same input signal x(n), the system outputs are compared to form the error signal e(n) = d(n) - y(n), and the parameters of the adaptive filter are iteratively adjusted to minimize some specified function of the error e(n). In the system identification configuration, the desired signal is produced as the output of an unknown plant whose input is accessible for excitation. When the minimum of the cost function is achieved and the adaptive filter parameters have converged to stable values, the adaptive filter provides a model of the unknown system in the sense that the adaptive process has formed the best approximation it can in the MSE sense using the structure imposed by the adaptive system. The converged coefficients provide good estimates of the model parameters.

In order for the adaptive system to form a good model of the unknown system at all frequencies, it is important that the input signal have sufficiently rich spectral content. For example, if the adaptive filter is an FIR filter structure with N adjustable coefficients, the input signal must contain at least N distinct frequency components in order to uniquely determine the set of coefficients that minimizes the MSE. A white noise input signal is ideal because it excites all frequencies with equal power. A broadband colored noise input will also provide a good excitation signal in the sense of driving the adaptive filter to the minimum MSE solution, although in general the convergence rate of the learning process will be slower than for white noise inputs because the frequencies that are excited with small power levels will converge slowly. As we shall see throughout this book, many adaptive algorithms attempt to normalize (or whiten) the input power spectrum in order to improve the convergence rate of the learning process.

Figure 1.2 System identification configuration.

The system identification configuration is a fundamental adaptive filtering concept that underlies many applications of adaptive filters. The major attraction of the system identification configuration is that the training signal is automatically generated as the output of the unknown system. The disadvantage is that the input of the unknown system must be accessible to be excited by an externally applied input noise signal. In some applications obtaining a model of the unknown system is the desired result, and the accuracy of the adaptive coefficients is a primary concern. In other applications it is not necessary that the unknown system be identified explicitly; rather, the adaptive filter is required to model the unknown system only well enough to generate accurate estimates of its output signal. This leads to the next configuration, called adaptive noise cancellation, which is really a variation on the fundamental theme of system identification.
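The system identification configuration of Figure 1.2 can be summarized in a few lines of Python. The sketch below is only an illustration: the "unknown system" is an arbitrary four-tap FIR filter, the excitation is white noise, and the coefficients are adjusted with the simple LMS update that is reviewed in Section 1.5.1; the step size and data length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
h_unknown = np.array([1.0, -0.5, 0.25, 0.1])   # hypothetical unknown FIR system
N, mu = 4, 0.05                                # adaptive filter length and LMS step size (illustrative)
w = np.zeros(N)                                # adaptive filter coefficients

x = rng.normal(size=5000)                      # white noise excitation (rich spectral content)
d = np.convolve(x, h_unknown)[:len(x)]         # desired signal = output of the unknown system

for n in range(N - 1, len(x)):
    x_vec = x[n - N + 1:n + 1][::-1]           # input vector [x(n), x(n-1), ..., x(n-N+1)]
    y = w @ x_vec                              # adaptive filter output y(n)
    e = d[n] - y                               # error e(n) = d(n) - y(n)
    w = w + mu * e * x_vec                     # LMS coefficient update (Section 1.5.1)

print("converged coefficients:", np.round(w, 3))   # approach h_unknown
```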

1.2.2 Adaptive Noise Canceling Configuration

Figure 1.3 Noise canceling configuration.

A block diagram for an adaptive noise canceler is shown in Figure 1.3, where it is seen that the unknown system in this configuration is not shown explicitly, nor is it desired to identify the implicit unknown system in a direct way. The primary signal is assumed to be the sum of an information bearing signal s(n) and an additive noise component N_0(n), which is uncorrelated with s(n). The primary signal is used to train the adaptive noise canceler, so that d(n) = s(n) + N_0(n) and the error signal becomes e(n) = d(n) - y(n) = s(n) + N_0(n) - y(n). The reference signal, which is used as the input to the adaptive filter, should be a reference noise N_1(n) that is uncorrelated with s(n), but which is correlated in an unknown way with N_0(n). The adaptive filter forms an estimate of N_0(n) and subtracts this estimate from the primary input signal, thereby forming a good estimate of the information signal at the output of the noise canceler. Note that

E[ |e(n)|^2 ] = E[ |s(n)|^2 ] + E[ |N_0(n) - y(n)|^2 ],

so that minimizing E[ |e(n)|^2 ] will also minimize E[ |N_0(n) - y(n)|^2 ], because the first term depends only on the information signal s(n) and its mean squared value cannot be affected by the adaptive filter as long as s(n) and N_1(n) are uncorrelated. After the adaptive filter converges, y(n) becomes the best estimate of N_0(n) according to the MSE criterion.

Since the unknown system in the adaptive noise canceling configuration is implicit, there is no need for access to its input in this configuration. However, it is necessary to find a suitable reference signal that does not contain any significant amount of the information signal s(n). If the reference contains even small levels of s(n), then some part of the primary signal s(n) will be canceled and the overall signal-to-noise ratio will degrade.
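A minimal Python sketch of the noise canceling configuration of Figure 1.3 is given below. The information signal, the reference noise, and the path that relates N_1(n) to N_0(n) are all hypothetical choices made for the sketch, and the coefficients are again adapted with a simple LMS update.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(4000)
s = np.sin(2 * np.pi * 0.01 * t)                     # information signal s(n)
N1 = rng.normal(size=len(t))                         # reference noise N_1(n), uncorrelated with s(n)
N0 = np.convolve(N1, [0.8, -0.4, 0.2])[:len(t)]      # N_0(n): related to N_1(n) through an unknown path
d = s + N0                                           # primary input d(n) = s(n) + N_0(n)

N, mu = 8, 0.01
w = np.zeros(N)
e = np.zeros(len(t))                                 # e(n) is the canceler output (estimate of s(n))
for n in range(N - 1, len(t)):
    x_vec = N1[n - N + 1:n + 1][::-1]                # the reference noise drives the adaptive filter
    y = w @ x_vec                                    # y(n): estimate of N_0(n)
    e[n] = d[n] - y                                  # e(n) = s(n) + N_0(n) - y(n)
    w = w + mu * e[n] * x_vec                        # LMS update

print("noise power before:", round(float(np.var(N0)), 3),
      " after cancellation:", round(float(np.var(e[2000:] - s[2000:])), 3))
```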

1.2.3 Adaptive Linear Prediction Configuration

Adaptive linear prediction is a very important and well developed subject that spans many different areas of engineering. A block diagram of this configuration is shown in Figure 1.4. In this configuration the input vector is delayed, usually by one time sample, and the delayed input vector x(n-1) = [x(n-1), x(n-2), ..., x(n-N)]^T is then used to predict x(n), the current value of the input. The prediction error is given by e(n) = d(n) - y(n) = x(n) - y(n). Sometimes the entire system of Figure 1.4 from the input x(n) to output 1 is considered to be a single complete system, in which case it is referred to as a prediction error filter. Whenever the mean squared prediction error is minimized, e(n) becomes uncorrelated with x(n), while y(n) remains highly correlated with x(n). Therefore, since d(n) = y(n) + e(n), the prediction filter decomposes the input signal into two components, one that is uncorrelated with the input and one that is highly correlated with the input. In this sense the linear predictor is a type of correlation filter.

Figure 1.4 Linear prediction configuration.

Note that two distinct outputs, output 1 and output 2, are labeled in Figure 1.4 to give access to both the correlated and uncorrelated components. Output 1 is used in applications such as adaptive linear predictive coding (LPC) for speech analysis and synthesis, and in adaptive differential pulse code modulation (ADPCM) for speech (and image) waveform compression. Since the prediction error is the difference between the actual value of x(n) and its predicted value y(n), the dynamic range needed for accurately encoding e(n) is usually much smaller than that of x(n) itself. This is the fundamental mechanism by which a linear prediction filter is able to compress waveforms. Alternately, output 2 produces a filtered version of x(n) with the uncorrelated noise component removed. When used in this mode the adaptive linear predictor becomes a line enhancer, which is capable of removing broadband noise from a narrowband information signal, a function frequently needed in communication systems.

The linear predictor described above is called a forward prediction filter because it uses the N "past" samples contained in x(n-1) to predict the "future" sample x(n). In this case the prediction error is called the forward prediction error, denoted by e_f(n), and the overall filter from the input to output 1 is called a forward prediction error filter. The linear prediction problem can also be formulated as backward linear prediction, in which case the filter is used to estimate the sample x(n-N) from the N "future" samples contained in x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T. In this case the prediction filter is called the backward prediction filter, the prediction error is called the backward prediction error, denoted e_b(n), and the overall filter is called the backward prediction error filter. A combination of forward and backward prediction is used in the conventional adaptive lattice filters, where the uncorrelated properties of the prediction errors lead to excellent learning characteristics [1.12, 1.13]. The adaptive lattice structure is discussed later in this chapter, and is revisited in Chapter 5 as one of the attractive adaptive polynomial filter structures.
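The sketch below illustrates the forward prediction configuration of Figure 1.4 operating as an adaptive line enhancer: a one-step predictor is adapted with the LMS update, output 1 carries the prediction error, and output 2 carries the enhanced narrowband component. The signal model, predictor length, and step size are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(8000)
x = np.sin(2 * np.pi * 0.05 * t) + 0.5 * rng.normal(size=len(t))   # sinusoid plus broadband noise

N, mu = 16, 0.002
w = np.zeros(N)
e = np.zeros(len(x))        # output 1: forward prediction error e_f(n)
y = np.zeros(len(x))        # output 2: predicted (correlated) component
for n in range(N, len(x)):
    x_del = x[n - N:n][::-1]          # delayed input vector [x(n-1), x(n-2), ..., x(n-N)]
    y[n] = w @ x_del                  # prediction of x(n) from its past
    e[n] = x[n] - y[n]                # e_f(n) = x(n) - y(n)
    w = w + mu * e[n] * x_del         # LMS update of the predictor

# After convergence, y(n) retains the narrowband sinusoid (line enhancer behavior)
# while most of the broadband noise remains in e(n).
print("input power:", round(float(np.var(x[4000:])), 3),
      "  prediction error power:", round(float(np.var(e[4000:])), 3))
```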


1.2.4 Inverse System Configuration

The fourth adaptive filtering configuration is the inverse system configuration shown in Figure 1.5. In this configuration the adaptive filter is placed in series with an unknown system and the output y(n) is driven by the adaptive algorithm to form the best MSE approximation to a delayed copy of the input signal. When the adaptive filter reaches convergence, the series combination of the unknown and adaptive systems forms an overall frequency response that approximates a pure delay, i.e., the overall system approximates a flat magnitude response and a linear phase characteristic across the usable bandwidth of the excited spectrum. In this case the adaptive filter estimates H^{-1}(jω), where H(jω) is the frequency response of the unknown system. The inverse system configuration is the basis for adaptive equalization, in which non-ideal communication channels are equalized in order to reduce dispersion and to eliminate intersymbol interference in high speed digital communications. The adaptive equalizer forms an essential component of most state-of-the-art modems today, where the equalization function is required to maintain acceptable bit error rates when binary information is transmitted across narrowband (4 kHz) telephone channels. Equalizers have also been used to equalize the dispersive channel that a computer faces when transferring high speed digital data to a magnetic recording medium (disk or tape). It has been shown that properly designed equalizers will permit symbols to be more densely written on the magnetic recording medium due to the reduction in intersymbol interference. This methodology has attracted the attention of many disk manufacturers due to its ability to effectively increase the capacity of the disk.

The training of an adaptive equalizer in the inverse system configuration raises a number of problems that are unique to this configuration. Note that by the nature of the network configuration, the input to the adaptive filter has already been filtered by the unknown system. Hence in most situations the input to the equalizer cannot be a white noise signal, and depending on the severity of the channel imperfections, the equalizer may experience trouble converging quickly. In a communication system, the transmitter and the receiver are typically located at separate physical locations, so it may not be a simple matter to provide a training signal that is an exact delayed copy of the transmitted waveform. For this reason, channel equalizers are often trained during prescribed "hand shaking" intervals, during which time a pseudorandom binary sequence with known spectral characteristics is transmitted. Once the equalizer has converged to equalize the present characteristics of the unknown channel, the parameters of the equalizer are frozen and held at their converged values during the data transfers that follow.

Figure 1.5 Inverse system configuration.

Due to the difficulty in obtaining a suitable training reference, there has been a great deal of interest in combining certain blind equalization schemes with a decision feedback equalizer. In these cases the blind equalization technique is used to bring the equalizer into the neighborhood of proper convergence, at which point the scheme is switched over to the decision feedback algorithm, which works very well as long as the equalizer remains in the neighborhood of its optimum solution. An important family of blind equalization techniques, the Constant Modulus Algorithm (CMA), is discussed further in Chapter 2 for use in this type of application.
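The inverse system (equalizer) configuration of Figure 1.5 can be sketched as follows. The channel, the equalizer length, the training delay, and the step size are hypothetical values chosen for illustration; the training signal is simply a delayed copy of the transmitted binary sequence, as would be available during a hand shaking interval.

```python
import numpy as np

rng = np.random.default_rng(4)
symbols = rng.choice([-1.0, 1.0], size=6000)          # pseudorandom binary training sequence
channel = np.array([1.0, 0.4, -0.2])                  # hypothetical dispersive (minimum phase) channel
x = np.convolve(symbols, channel)[:len(symbols)]      # received signal = input to the equalizer

N, mu, delay = 11, 0.01, 5                            # equalizer length, step size, training delay (illustrative)
w = np.zeros(N)
for n in range(N - 1, len(x)):
    x_vec = x[n - N + 1:n + 1][::-1]                  # [x(n), x(n-1), ..., x(n-N+1)]
    y = w @ x_vec                                     # equalizer output y(n)
    e = symbols[n - delay] - y                        # training signal: delayed copy of the transmitted sequence
    w = w + mu * e * x_vec                            # LMS update

# The cascade of channel and converged equalizer should approximate a pure delay of `delay` samples.
cascade = np.convolve(channel, w)
print("cascade response:", np.round(cascade, 2))
```

For the minimum phase channel assumed here the converged FIR equalizer approximates the channel inverse well; as discussed above, a non-minimum phase channel can only be equalized approximately by an FIR structure, with an additional delay.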

1.3 Performance Measures in Adaptive Systems

For all adaptive systems it is important to establish performance measures that not only tell the designer how well a particular adaptive system is functioning, but also provide comparative performance evaluations for various filter structures and adaptive algorithms to aid in the proper choice of a good solution within the constraints of the application. Although there are many possible performance measures that are used freely throughout the adaptive filtering literature, the following list of measures will be used throughout this book to provide performance evaluation. Note that many of these criteria are not independent. For example, convergence rate and computational complexity are closely related properties. A proper evaluation of these criteria involves a tradeoff study that should be done in the context of a particular application.

1.3.1 Convergence Rate

The convergence rate of an adaptive learning process is a very important performance criterion that must be evaluated within the requirements of a particular application. While much of this book deals with filter structures and adaptive algorithms that guarantee near optimal convergence rates, the reader should not be left with the mistaken impression that "faster convergence" necessarily implies a "better solution". The increased cost of faster convergence is only worth the allocation of additional resources if faster convergence is needed for high frequency operation. In low frequency applications such as adaptive echo cancellation on standard voice quality telephone channels, and in adaptive techniques for audio band noise cancellation to improve room acoustics, the slow and simple solution provided by the conventional LMS algorithm is perfectly adequate. However, with the increased use of adaptive techniques for mobile radio, cellular telephone, and digital television (HDTV), it is clear that future applications will place increased demands on high frequency operation where rapid convergence may be a deciding factor in the success of an adaptive solution. Also, in two dimensional adaptive filtering of images and video sequences it will be necessary for adaptive filters to track rapidly varying signal statistics. Therefore, a major emphasis in the following chapters of the book is on the development of new and better algorithms that provide rapid convergence for high frequency applications, while maintaining a computational burden that is manageable within the system resources.

1.3.2 Minimum Mean Square Error and Misadjustment

The minimum MSE and the filter's deviation around the minimum MSE (misadjustment) are direct measures of how well an adaptive system is able to perform its task of identifying an unknown system, eliminating uncorrelated noise, or predicting future behavior of a particular signal. The minimum MSE depends on many factors, some of which are within the ability of a designer to control. In particular the minimum MSE depends on gradient noise, coefficient sensitivities of the filter structure, sensitivities to numerical quantization, the order of the adaptive system, and the magnitude of measurement noise, to name only a few. Throughout this book the minimum MSE and its corresponding misadjustment will be treated as important performance measures.

1.3.3 Parameter Estimation Accuracy

The accuracy of the adaptive parameters after convergence of the adaptive system is particularly important in system identification applications where the estimated coefficients are to be used as a model of the unknown system. In general, an adaptive system with low coefficient sensitivities will result in the most accurate parameter estimates. For example, if an IIR adaptive filter is used to identify an unknown transfer function and to obtain modeling parameters, it has been demonstrated that a parallel form realization, which is known to be a low coefficient sensitivity structure, will result in very accurate coefficient estimates [1.28, 1.37]. In many other applications the accuracy of the filter coefficients is not in itself a fundamental requirement; rather, their accuracy is important only to the extent that it leads to an acceptable minimum MSE condition.

1.3.4 Computational Complexity

Low computational complexity is particularly important for real time applications where the algorithm is to be implemented in custom designed VLSI hardware, or when a solution is to be programmed in real time using one of the commercially available DSP chips. Since the hardware speed and accuracy of a commercial DSP chip are beyond the control of the adaptive filter designer, it is often necessary for the designer to carefully choose an adaptive algorithm that performs well enough to meet performance objectives while being computationally simple enough to meet the timing constraints of real time operation. In contrast to the demands of real time operation, if data is to be processed off-line using stored data files, then a relatively sophisticated algorithm can be chosen to guarantee the best possible quality in the result. Furthermore, if the processing is to be done on a general purpose computer, then the complexity of the algorithm and the accuracy of the parameters are of little concern, since most general purpose computers provide more than enough capability in these respects.

In some applications, computational complexity can be translated into realistic power requirements, so in applications where limited power is available a lower complexity algorithm would be desirable. For example, it may be impossible to meet real time processing requirements in certain applications by using only one DSP chip, while the requirements may be easily met with two DSP chips working together. But two chips means extra power consumption. In such cases a limitation on available power (in space applications or deep water environments) may motivate a designer to choose a simpler adaptive algorithm so that minimal hardware can be specified.

1.3.5 Stability

Any time an adaptive system is considered for solving a practical problem, the question of stability arises as a fundamental concern. Since any adaptive algorithm is a form of closed loop feedback control system, there is always a potential for instability and subsequent divergence of the adaptive system. In general, adaptive filters based on the FIR structure are inherently stable, so as long as the various step sizes and gain constants are chosen conservatively, there should be little worry about the occurrence of instabilities in real time performance.

However, the use of IIR structures raises a serious concern about stability. If the poles of the adaptive filter are driven too far outside the unit circle during the adaptation process, the adaptive algorithm itself may become unstable and the entire learning process may diverge. This problem is worrisome because it can be shown experimentally that many IIR adaptive filters will achieve fastest convergence by allowing their poles to wander outside the unit circle, only to be drawn back toward a stable solution as the adaptive process converges. Sometimes placing restrictions on the poles to keep them within the unit circle by reflection techniques will stifle the learning process, resulting in slow convergence. This is an area of adaptive filtering which is not well understood at the current time.

Another concern that relates to stability is to assure that an adaptive system remains persistently excited so that all internal modes are driven toward convergence and coefficients do not wander aimlessly throughout the parameter space, with possibly disastrous consequences caused by eventual register overflow. In most adaptive equalizers, persistent excitation is guaranteed by the design of the pseudorandom binary sequence that is used to excite the combined channel and equalizer. However, other applications may not be so easily controlled. For example, an adaptive filter used in LPC speech compression works well as long as speech is indeed present on the channel. If there are unvoiced periods during the transmission when single tones, or other narrowband signaling information, are sent over the channel, it may be necessary to protect the adaptive system from diverging during these periods. A possible solution is to use a voice detector on the channel, so that when the voice signals disappear during quiet periods the adaptive filters are frozen and cannot make misdirected adjustments that could lead to instability.

1.3.6 Robustness

Robustness is an important performance criterion that is often difficult to measure in a quantitative manner. There are two important issues: i) robustness with respect to external noise disturbances, and ii) robustness with respect to algorithmic ill-conditioning and arithmetic quantization noise. Much of the effort in the following chapters is devoted to the development of adaptive algorithms that remain well conditioned regardless of the signal characteristics, and which remain well behaved numerically. Less attention is devoted to the issue of robustness with respect to external noise disturbances, although this type of robustness is generally obtained when proper attention is given to stability and persistent excitation.

1.4 The Minimum Mean Squared Error Solution

The set of adaptive filter coefficients that minimizes the MSE criterion, as specified by the cost function in equation (1.1), is known as the Wiener solution. An important classical result from the field of optimal filtering theory states that if it exists, the Wiener solution is given by the following set of linear algebraic equations, known as the Wiener-Hopf equations [1.12],

R_x w_opt = p,    (1.3)

where R_x = E[x(n) x^T(n)] is the autocorrelation matrix of the input signal, p = E[d(n) x(n)] is the cross-correlation of the desired signal with the input, and w_opt is the solution vector which contains the optimal coefficients.

It is important to note that many sources approach the derivations from a deterministic point of view using the method of least squares. While it is beyond the scope of the present work to go into this topic in detail, it is important to highlight the similarity between deterministic least squares methods and the stochastic result of equation (1.3) [see 1.12 for complete details]. In the method of least squares, a time averaged autocorrelation function φ(j,k) and a time averaged cross-correlation function θ(-k) are defined as

φ(j,k) = \sum_{i=0}^{N-1} x(i-j) x*(i-k),    0 ≤ j, k ≤ N-1,    (1.4)

and

θ(-k) = \sum_{i=0}^{N-1} x(i-k) d*(i),    0 ≤ k ≤ N-1.    (1.5)

Then if Φ = [φ(j,k)] and θ = [θ(0), θ(-1), ..., θ(-N+1)]^T, the optimal least squares solution can be expressed by an alternate set of linear algebraic equations known as the normal equations:

Φ w_opt = θ.    (1.6)

The Wiener-Hopf equations and the normal equations are equivalent results for the stochastic and deterministic formulations of the optimal filtering problem, respectively. In each case the autocorrelation matrix should be Toeplitz and positive semidefinite, and it must be nonsingular in order to guarantee a unique solution. Also, in both cases the eigenvalue spread and numerical conditioning of the autocorrelation matrix have a great effect on the convergence rate of the adaptive learning process.
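As an illustration of equation (1.3), the Python sketch below forms time-averaged estimates of R_x and p from a block of data and solves the resulting linear system for w_opt. The test system and data lengths are illustrative assumptions, not taken from the text.

```python
import numpy as np

def wiener_solution(x, d, N):
    """Estimate R_x and p from data and solve the Wiener-Hopf equations (1.3)."""
    X = np.array([x[n - N + 1:n + 1][::-1] for n in range(N - 1, len(x))])  # rows are input vectors x(n)
    D = d[N - 1:len(x)]
    R = X.T @ X / len(D)            # time-averaged estimate of R_x = E[x(n) x^T(n)]
    p = X.T @ D / len(D)            # time-averaged estimate of p = E[d(n) x(n)]
    return np.linalg.solve(R, p)    # w_opt = R_x^{-1} p

rng = np.random.default_rng(5)
x = rng.normal(size=20000)
h = np.array([0.6, -0.3, 0.1])                  # hypothetical system that generated d(n)
d = np.convolve(x, h)[:len(x)]
print("w_opt =", np.round(wiener_solution(x, d, N=3), 3))   # recovers h in this noiseless example
```

In practice the estimated autocorrelation matrix must be well conditioned for the solve to be reliable, which anticipates the remarks below on eigenvalue spread.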

An effective tool for investigating the performance of adaptive filters is the mean squared error surface, which is an N-dimensional function giving the error as a function of the N filter parameters. Assuming for this discussion that all signals are real valued, an expression for the MSE surface is [1.43]

J(w(n)) = E[d^2(n)] - 2 p^T w(n) + w^T(n) R_x w(n).    (1.7)

For a given value of the adaptive parameters, the error is measured by exercising the adaptive filter with the adjustable coefficients fixed at that value, and comparing y(n) to d(n). This process is repeated until the entire error surface is created. The optimal values for the filter parameters are found by differentiating (1.7) with respect to the variable coefficient vector w(n) and setting the result to zero. This procedure results in the optimal solution as given by equation (1.3).

The object of an adaptive filter is to search this error surface and locate the values of w(n) which yield the minimal error. The properties of (1.7) affect the performance of the optimization procedure used by the adaptive filter. For FIR adaptive filters, the error surface is quadratic and convex with a unique minimizing solution. The shape of the error surface about a minimizing value is determined by the autocorrelation matrix R_x of the input signal. The eigenvalues of R_x determine the axes of the ellipses of the error surface, and the eigenvectors of R_x set its orientation in N-space. Further error surface analyses are presented in Section 1.5 for the LMS and the transform domain adaptive optimization strategies.
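The quadratic nature of the FIR error surface can be illustrated directly. In the sketch below the autocorrelation matrix, the cross-correlation vector, and the desired signal power are illustrative values chosen to form a consistent two-tap problem; the surface is evaluated through the quadratic form J(w) = J_min + (w - w_opt)^T R_x (w - w_opt).

```python
import numpy as np

# Hypothetical two-tap example; R, p, and the desired-signal power are illustrative values.
R = np.array([[1.0, 0.7],
              [0.7, 1.0]])               # autocorrelation matrix of a colored input
p = np.array([0.71, 0.65])               # cross-correlation between d(n) and x(n)
sigma_d2 = 0.65                          # E[d^2(n)]

w_opt = np.linalg.solve(R, p)            # Wiener solution, equation (1.3)
J_min = sigma_d2 - p @ w_opt             # minimum mean squared error

def mse_surface(w):
    """Quadratic error surface J(w) = J_min + (w - w_opt)^T R (w - w_opt)."""
    z = w - w_opt
    return J_min + z @ R @ z

print("w_opt =", np.round(w_opt, 3), "  J_min =", round(float(J_min), 3))
print("eigenvalues of R (set the ellipse axes):", np.round(np.linalg.eigvalsh(R), 3))
print("J at w = [0, 0]:", round(float(mse_surface(np.array([0.0, 0.0]))), 3))
```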

In the general adaptive filtering problem the error surface is searched by an iterative algorithm which has the form

w(n) = w(n-1) + F{e(n), x(n)},    (1.8)

where F{e(n), x(n)} denotes the update increment, which is usually a function of the error signal e(n) and the input signal x(n). For the LMS algorithm,

F{e(n), x(n)} = μ e(n) x(n).    (1.9)

The performance of the LMS algorithm may be analyzed by examining the mean behavior of the error [1.43]. By expressing the error in terms of a set of axes centered about the solution w_opt, the decay of E[e(n)] follows a geometric series generated by a term of the form

(I - μ R_x).    (1.10)

These "modes" of convergence are determined by the statistics of the input signal.Expression (1.10), after an axes rotation in the coefficient spaces, becomes

(I - μ Λ).    (1.11)

The entries of the diagonal matrix Λ are the eigenvalues of R_x. Each mode is then determined by one of the N eigenvalues of R_x.

For the adaptive filter to converge, the step size parameter μ must lie between zero and 1/λ_max. Since the rate of convergence will be controlled by the largest mode, convergence is fastest when all modes are equal, or λ_min = λ_max. This is the case for white input signals. This also corresponds to an error surface with circular contours. Slow rates of convergence are observed for highly correlated input data. The eigenvalues of R_x for these signals are highly disparate and the associated error surface is elliptical.

Many forms of equation (1.8) have been proposed to reduce or eliminate the dependence of the convergence rate of the LMS algorithm upon the input signal statistics. The goal is to cancel the matrix R_x from (1.10), decoupling and normalizing the modes of convergence. This is typically accomplished by "orthogonalization" - incorporating the inverse of some approximation of the autocorrelation matrix into (1.8). The modal relations then become dominated by

\hat{R}_x^{-1}(n) R_x.    (1.12)

As the approximation \hat{R}_x(n) improves, (1.12) becomes diagonal, and, in the case of exact representation, all the modes become uncoupled and have unity eigenvalues. (This implies convergence in one step.) Orthogonalization is a process which attempts to permit the LMS algorithm to behave as if the input signal were a white noise signal.
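The effect of eigenvalue disparity on the modes of convergence, and the way orthogonalization removes it, can be seen numerically in the sketch below. The colored input autocorrelation matrix is an illustrative choice.

```python
import numpy as np

# Illustrative autocorrelation matrix of a strongly colored input.
R = np.array([[1.0, 0.9, 0.8],
              [0.9, 1.0, 0.9],
              [0.8, 0.9, 1.0]])
lam = np.linalg.eigvalsh(R)
mu = 1.0 / lam.max()                       # step size at the edge of the stable range

print("eigenvalue spread:", round(float(lam.max() / lam.min()), 1))
print("LMS modes |1 - mu*lambda_i|:", np.round(np.abs(1 - mu * lam), 3))

# Orthogonalization: with an exact estimate R_hat of R_x, the matrix R_hat^{-1} R_x
# is the identity, so every mode has a unity eigenvalue and converges at the same rate.
R_hat = R                                   # exact representation, for this sketch only
modal = np.linalg.inv(R_hat) @ R
print("eigenvalues of R_hat^{-1} R_x:", np.round(np.linalg.eigvals(modal).real, 3))
```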

1.5 Adaptive Algorithms for FIR Systems

A direct form FIR digital filter structure is shown in Figure 1.6. The structure requires N-1 delays, N multiplications, and N-1 additions to be performed for each output sample that is produced. The amount of hardware (as well as power) required to implement the direct form structure depends on the degree of hardware multiplexing that can be tolerated within the speed demands of the application.

Figure 1.6 The direct form adaptive filter structure.

A fully parallel implementation consisting of N delay registers, N multipliers, and a tree of two-input adders would be needed for very high frequency applications. At the other end of the performance spectrum, a sequential implementation consisting of a length N circular delay line and a single time multiplexed multiplier and accumulation adder would provide the cheapest (and slowest) implementation. This latter structure would be characteristic of a filter that is implemented in software on one of the many commercially available DSP chips. An important point is that regardless of the hardware complexity that results from any particular implementation, the computational complexity of the filter is determined by the requirements of the algorithm, and as such remains invariant with respect to different hardware structures. In particular, the computational complexity of the direct form FIR filter is O[N], since N multiplications and (N-1) additions must be performed at each iteration. When designing an adaptive filter, it seems reasonable to seek an adaptive algorithm whose order of complexity is no greater than the order of complexity of the basic filter structure itself. This goal is achieved by the LMS algorithm, which is the major contributing factor to the enormous success of that algorithm. If this goal is not achieved, and if the computational complexity of the adaptive algorithm is greater than that of the filter, the filter will sit idle for part of each iteration, waiting for the updating of its parameters to be completed for that iteration. Extending this principle to 2-D adaptive filters implies that a desirable adaptive algorithm in two dimensions would have an order of complexity of O[N^2], since a 2-D FIR direct form filter has O[N^2] complexity inherent in its basic structure. We will see later in Chapter 3 that the 2-D LMS algorithm and the 2-D McClellan transformation adaptive filter both meet this requirement, although the more powerful algorithms such as the RLS, quasi-Newton, and Preconditioned Conjugate Gradient algorithms have computational demands that significantly exceed this fundamental design goal.

1.5.1 The LMS Algorithm

Since the conventional LMS algorithm for direct form FIR structures is well developed in the current literature, it will not be necessary to give a comprehensive treatment of it here. The brief summary presented below will help to set the stage for further discussions throughout the book, where the LMS algorithm is used as a baseline against which more advanced algorithms and structures are compared. The transform domain adaptive filter that is discussed in Section 1.5.2 is a generalization of the original LMS FIR structure, in which a linear transformation is performed on the input signal in order to improve the learning characteristics of the original LMS filter. If the linear transformation matrix is replaced by an identity matrix in the transform domain structure, the conventional LMS filter results. In Section 1.5.2 the transform domain adaptive filter is treated in considerable detail because it forms a central concept needed in the development of fault tolerant adaptive filters in Chapter 4. It also plays an important role in the nonlinear Volterra filters discussed in Chapter 5.

1.5.1.1 The LMS Gradient Approximation

The LMS algorithm is well known to be an approximation to the steepest descent optimization strategy. The fact that the entire field of adaptive signal processing began with an elementary principle from optimization theory suggests that our search for more advanced adaptive algorithms should prosper by incorporating other results that may be well known in the field of optimization, but perhaps are not yet fully exploited in adaptive filtering practice. This point of view will recur throughout this book, as many concepts are borrowed from the field of optimization and modified for adaptive filtering as needed. Some of the borrowed ideas that appear in later chapters include the Gauss-Newton, quasi-Newton, Preconditioned Conjugate Gradient, and accelerated optimization strategies. These learning strategies are sometimes combined with block processing techniques from signal processing to achieve computational efficiencies that are attractive for real time applications.

For a length N FIR filter with the input expressed as a column vector x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T, the filter output y(n) is easily expressed as

y(n) = w^T(n) x(n),    (1.13)

where w(n) = [w_0(n), w_1(n), ..., w_{N-1}(n)]^T is the time varying vector of filter coefficients (tap weights), and the superscript "T" denotes vector transpose. As discussed previously, the output error is formed as the difference between the filter output and a training signal d(n), i.e., e(n) = d(n) - y(n). Strategies for obtaining an appropriate d(n) vary from one application to another. In many cases the availability of a suitable training signal determines whether an adaptive filtering solution will be successful in a particular application. The ideal cost function is defined by the mean squared error (MSE) criterion, equation (1.1). The LMS algorithm is derived by approximating the ideal cost function by the simplest form of equation (1.2), where only one term (i = 0) is used in the summation, resulting in J_LMS(n) = |e(n)|^2. While the LMS algorithm seems to make a rather crude approximation at the very beginning, the approximation results in an unbiased estimator. In many applications the LMS algorithm is quite robust and is able to converge rapidly to a small neighborhood of the Wiener solution.

The steepest descent optimization strategy is given by

w(n+1) = w(n) - μ ∇E[e^2](n),    (1.14)

where ∇E[e^2](n) is the gradient of the cost function with respect to the coefficient vector w(n). When the gradient is formed using the LMS cost function J_LMS(n) = |e(n)|^2, the conventional LMS algorithm results:

w(n+1) = w(n) + μ e(n) x(n),
e(n) = d(n) - y(n),              (1.15)
y(n) = x^T(n) w(n).

(Note: Many sources include a "2" before the μ factor in equation (1.15) because this factor arises during the derivation of (1.15) from (1.14). In this discussion we assume this factor is absorbed into μ, so it will not appear explicitly.) Since the LMS algorithm is so well documented in the literature and so frequently used in practice, we will not present any further derivation or analysis of it here. (The interested reader is referred to [1.3].) However, a few observations at this time will be useful later when other algorithms are compared to the LMS as a baseline design.
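For reference, the sketch below is a direct Python transcription of equations (1.13) through (1.15); the test system, step size, and data length used to exercise it are illustrative choices.

```python
import numpy as np

def lms(x, d, N, mu):
    """Direct form FIR LMS filter, a transcription of equations (1.13)-(1.15).
    Returns the final coefficient vector w and the error sequence e."""
    w = np.zeros(N)
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        x_vec = x[n - N + 1:n + 1][::-1]   # x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T
        y = w @ x_vec                      # y(n) = w^T(n) x(n)               (1.13)
        e[n] = d[n] - y                    # e(n) = d(n) - y(n)
        w = w + mu * e[n] * x_vec          # w(n+1) = w(n) + mu e(n) x(n)     (1.15)
    return w, e

# Quick check against a known FIR system (illustrative values).
rng = np.random.default_rng(6)
x = rng.normal(size=4000)
d = np.convolve(x, [0.5, 0.3, -0.2])[:len(x)]
w, e = lms(x, d, N=3, mu=0.05)
print("w =", np.round(w, 3), "  final error power =", round(float(np.mean(e[-500:] ** 2)), 6))
```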

A first observation is that the order of complexity of the LMS algorithm is O[N]. For the sake of this discussion assume that all of the signals and filter variables are real-valued. The filter itself requires N multiplications and N-1 additions to produce y(n) at each value of n. The coefficient update algorithm requires 2N multiplications and N additions, resulting in a total computational burden of 3N multiplications and 2N-1 additions per iteration. Since N is generally much larger than the factor of three, the order of complexity of this algorithm is clearly O[N]. Furthermore, since the order of complexity of the filter is the same as that of the coefficient update computation, the design goal stated earlier of achieving a complexity for the learning algorithm that is no greater than that of the filter itself has been achieved with the LMS algorithm.

A second observation is that the cost function given by equation (1.2) is the same one used to develop the RLS algorithm. This implies that the LMS algorithm is a simplified version of the RLS algorithm, where averages are replaced by single instantaneous terms. While this observation is probably of little use in practice, from a research point of view it is interesting that the two algorithms have a common starting point.

A third observation, which will be clarified by further discussion in Section 1.5.3, is that the (power normalized) LMS algorithm is also a simplified form of the transform domain adaptive filter which results by setting the transform matrix to the identity matrix.

A fourth observation, which will also become clearer from the discussion in Section 1.5.3, is that the LMS algorithm is also a simplified form of the Gauss-Newton optimization strategy, which introduces second order statistics (the input autocorrelation function) to accelerate the rate of convergence. In order to obtain the LMS algorithm from the Gauss-Newton algorithm, two distinct approximations must be made: the gradient must be approximated by the instantaneous error squared, and the inverse of the input autocorrelation matrix must be crudely approximated by the identity matrix.

These observations suggest that many of the seemingly distinct adaptive filtering algorithms that appear scattered about in the literature are indeed closely related, and can be considered to be members of a family whose hereditary characteristics have their origins in Gauss-Newton optimization theory. The different members of this family inherit their individual characteristics from approximations that are made on the pure Gauss-Newton algorithm at various stages of their derivations. However, after the individual derivations are complete and each algorithm is packaged in its own algorithmic form, the algorithms look considerably different from one another. Unless a conscious effort is made to reveal their commonality, the fact that they have evolved from common roots may be entirely obscured.

1.5.1.2 Convergence Properties of the LMS Adaptive Filter

It is well established in the existing literature [1.3] that the convergence behavior of the LMS algorithm, as applied to a direct form FIR filter structure, is controlled by the autocorrelation matrix Rx of the input process, where

Rx ≡ E[x*(n) x^T(n)].    (1.16)

(The * in equation (1.16) denotes complex conjugate to account for the general case of complex input signals, although throughout most of the following discussions it will be assumed that x(n) and d(n) are both real-valued signals.) The autocorrelation matrix Rx is usually positive definite, which is one of the conditions necessary to guarantee convergence to the Wiener solution. Another necessary condition for convergence is 0 < μ < 1/λmax, where λmax is the largest eigenvalue of Rx. It is also well established that the convergence of this algorithm is directly related to the eigenvalue spread of Rx. The eigenvalue spread is measured by the condition number of Rx, defined as κ = λmax/λmin, where λmin is the minimum eigenvalue of Rx. Ideal conditioning occurs when κ = 1 (white noise); as this ratio increases, slower convergence results. The eigenvalue spread (condition number) depends on the spectral distribution of the input signal, and can be shown to be related to the maximum and minimum values of the input power spectrum [1.13]. From this line of reasoning it becomes clear that white noise is the ideal input signal for rapidly training an LMS adaptive filter. The adaptive process becomes slower and requires more computation for input signals that are more severely colored.

Convergence properties are reflected in the geometry of the MSE surface, which is simply the mean squared output error E[e²(n)] expressed as a function of the N adaptive filter coefficients in (N+1)-space. An expression for the error surface of the direct form filter is [1.5]

J(z) = Jmin + z*^T Rx z,    (1.17)

with Rx defined in (1.16) and z ≡ w − wopt, where wopt is the vector of optimum filter coefficients in the sense of minimizing the mean squared error (wopt is the Wiener solution [1.44]). An example of an error surface for a simple two-tap filter is shown in Figure 1.7. In this example x(n) was specified to be a colored noise input signal with an autocorrelation matrix

Rx = [1.0  0.9]
     [0.9  1.0].

Figure 1.7 shows three equal-error contours on the three dimensional surface. The term z*^T Rx z in equation (1.17) is a quadratic form that describes the bowl shape of the FIR error surface. When Rx is positive definite, the equal-error contours of the surface are hyperellipses (N dimensional ellipses) centered at the origin of the coefficient parameter space. Furthermore, the principal axes of these hyperellipses are the eigenvectors of Rx, and their lengths are proportional to the eigenvalues of Rx [1.2, 1.3]. Since the convergence rate of the LMS algorithm is inversely related to the ratio of the maximum to the minimum eigenvalues of Rx, large eccentricity of the equal-error contours implies slow convergence of the adaptive system. In the case of an ideal white noise input, Rx has a single eigenvalue of multiplicity N, so that the equal-error contours are hyperspheres.

Figure 1.7 A 2-D error surface with a colored input signal.

1.5.2 The Transform Domain Adaptive Filter

One of the earliest works on transform domain adaptive filtering was a publication in 1978 by Dentino et al. [1.5], in which the concept of adaptive filtering in the frequency domain was proposed. Many publications have since appeared that served to further develop the theory and to expand the current knowledge of performance characteristics for this class of adaptive filters. In addition to the discrete Fourier transform (DFT), other orthogonal transforms such as the discrete cosine transform (DCT) and the Walsh Hadamard transform (WHT) can also be effectively used as a means to improve the LMS algorithm without adding too much additional computational complexity. For this reason, the more general term transform domain adaptive filtering is used throughout the book to mean that the input signal is preprocessed by decomposing the input vector into orthogonal components, which are in turn used as inputs to a parallel bank of simpler adaptive subfilters. With an orthogonal transformation, the adaptation actually takes place in the transform domain, as it is possible to show that the adjustable parameters are indeed related to an equivalent set of time domain filter coefficients by means of the same transformation that is used for the real time processing [1.26].

1.5.2.1 Orthogonalization and power normalization

The transform domain adaptive filter (TDAF) structure is shown in Figure 1.8. The input x(n) and the desired signal d(n) are assumed to be zero mean and jointly stationary. The input to the filter is a vector of N current and past input samples, defined in the previous section and denoted as x(n). This vector is processed by a unitary transform, such as the DFT. Once the filter order N is fixed, the transform is simply an N x N matrix T, which is in general complex, with orthonormal rows. The transformed outputs form a vector v(n) which is given by

v(n) = [v0(n), v1(n), ..., vN−1(n)]^T = T x(n).    (1.18)

With an adaptive tap vector defined as

W(n) = [W0(n), W1(n), ..., WN−1(n)]^T,    (1.19)

the filter output is given by

y(n) = W^T(n) v(n) = W^T(n) T x(n).    (1.20)

The instantaneous output error

e(n) = d(n) − y(n)    (1.21)

is then formed and used to update the adaptive filter taps using a modified form of the LMS algorithm [1.26]:

W(n+1) = W(n) + μ e(n) Λ⁻² v*(n),    (1.22)

where

Λ² = diag[σ1², σ2², ..., σN²].

As before, the superscript asterisk in (1.22) indicates complex conjugation to account for the most general case in which the transform is complex. Also, the use of the upper case coefficient vector in equation (1.22) denotes that W(n) is a transform domain variable. The power estimates σi² can be developed on-line by computing an exponentially weighted average of past samples according to

σi²(n) = α σi²(n−1) + (1 − α) |vi(n)|².    (1.23)

Figure 1.8 The transform domain adaptive filter structure.

If σi² becomes too small due to an insufficient amount of energy in the i-th channel, the update mechanism becomes ill-conditioned due to a very large effective step size. In some cases the process will become unstable and register overflow will cause the adaptation to catastrophically fail. So the algorithm given by (1.22) should have the update mechanism disabled for the i-th orthogonal channel if σi² falls below a critical threshold.

Sometimes the transform domain algorithm is stabilized by adding small positive constants ε to the diagonal elements of Λ² according to

Λ̃² = diag[σ1² + ε, σ2² + ε, ..., σN² + ε].    (1.24)

Then Λ̃² is used in place of Λ² in equation (1.22). For most input signals σi² >> ε, and the inclusion of the stabilization factors is transparent to the performance of the algorithm. However, whenever σi² ≈ ε, the stabilization terms begin to have a significant effect. Within this operating region the power in the channels will not be uniformly normalized and the convergence rate of the filter will begin to degrade.
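
The following NumPy sketch shows one iteration of the update (1.22) with the power recursion (1.23) and the ε-stabilized normalization of (1.24); the DCT is used here purely as an example of a fixed orthonormal transform T, and the parameter names mu, alpha, and eps are illustrative assumptions.

    import numpy as np
    from scipy.fft import dct

    def tdaf_step(W, sigma2, x_vec, d_n, mu, alpha, eps):
        """One iteration of a transform domain LMS filter (real signals, DCT as T)."""
        v = dct(x_vec, norm='ortho')                   # v(n) = T x(n)
        y = np.dot(W, v)                               # y(n) = W^T(n) v(n)
        e = d_n - y                                    # output error e(n)
        sigma2 = alpha * sigma2 + (1 - alpha) * v**2   # channel power estimates, eq. (1.23)
        W = W + mu * e * v / (sigma2 + eps)            # power normalized update, eqs. (1.22), (1.24)
        return W, sigma2, e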

The motivation for using the TDAF adaptive system instead of a simpler LMS based system is to achieve rapid convergence of the filter's coefficients when the input signal is not white. In the following section this convergence rate improvement of the TDAF will be explained geometrically. Also, in Chapter 4 the TDAF structure will be used as the starting point in the design of fault tolerant adaptive filters. In that context it will be assumed that all of the advantages of the TDAF structure discussed here for convergence rate improvement will be retained there, while simultaneously providing an appropriate transform structure for achieving a fault tolerant design.

1.5.2.2 Convergence properties of the TDAF

In this section a description of the convergence rate improvement of the TDAF is developed in terms of the characteristics of the mean squared error surface. From [1.26] we have Rv = T* Rx T^T, so that for the transform structure without power normalization equation (1.17) becomes

J(z) = Jmin + z*^T (T* Rx T^T) z.    (1.25)

The difference between (1.17) and (1.25) is the presence of T in the quadratic term of (1.25). When T is a unitary matrix, its presence in (1.25) gives a rotation and/or a reflection of the surface. The eccentricity of the surface is unaffected by the transform, so the convergence rate of the system is unchanged by the transformation alone.

However, the signal power levels at the adaptive coefficients are changed by the transformation. Consider the intersection of the equal-error contours with the rotated axes: letting z = [0 ... zi ... 0]^T, with zi in the i-th position, equation (1.25) becomes

J(z) − Jmin = σi² |zi|².    (1.26)

If the equal-error contours are hyperspheres (the ideal case), then for a fixed value of the error J(n), (1.26) must give |zi| = |zj| for all i and j, since all points on a hypersphere are equidistant from the origin. When the filter input is not white, this will not hold in general. But since the power levels σi² are easily estimated, the rotated axes can be scaled to have this property. Let z = Λ⁻¹ẑ, where Λ is defined in (1.22). Then the error surface of the TDAF, with transform T and including power normalization, is given by

J(ẑ) = Jmin + ẑ*^T Λ⁻¹ T* Rx T^T Λ⁻¹ ẑ.    (1.27)

The main diagonal entries of Λ⁻¹ T* Rx T^T Λ⁻¹ are all equal to one, so (1.26) becomes J(ẑ) − Jmin = |ẑi|², which has the property described above.

Thus the action of the TDAF system is to rotate the axes of the filter coefficient space using a unitary rotation matrix T, and to then scale these axes so that the error surface contours become approximately hyperspherical at the points where they can be easily observed, i.e., the points of intersection with the new (rotated) axes. Usually the actual eccentricity of the error surface contours is reduced by this scaling, and faster convergence is obtained.

As a second example, transform domain processing is now added to the previous example, as illustrated in Figures 1.9 and 1.10. The error surface of Figure 1.9 was created by using the (arbitrary) transform

T = [ 0.866  0.500]
    [−0.500  0.866]

on the error surface shown in Figure 1.7, which produces a clockwise rotation of the ellipsoidal contours so that the major and minor axes more closely align with the coordinate axes than they did without the transform. Power normalization was then applied using the normalization matrix Λ⁻¹ shown in Figure 1.10, which represents the transformed and power normalized error surface. Note that the elliptical contours after transform domain processing are nearly circular in shape, and in fact they would have been perfectly circular if the rotation of Figure 1.9 had brought the contours into precise alignment with the coordinate axes. Perfect alignment did not occur in this example because T was not able to perfectly diagonalize the input autocorrelation matrix for this particular x(n). Since T is a fixed transform in the TDAF structure, it clearly cannot properly diagonalize Rx for an arbitrary x(n), and hence the surface rotation (orthogonalization) will be less than perfect for most input signals.

Figure 1.9 Error surface for the TDAF with transform T; for this example T Rx T^T = [1.779 0.45; 0.45 0.221].

Figure 1.10 Error surface with transform and power normalization; here Λ⁻¹ T Rx T^T Λ⁻¹ = [1.0 0.718; 0.718 1.0].

It should be noted here that a well known conventional algorithm called Recursive Least Squares (RLS) is known to achieve near optimum convergence rates by forming an estimate of Rx⁻¹, the inverse of the autocorrelation matrix. This type of algorithm automatically adjusts to whiten any input signal, and also varies over time if the input signal is a nonstationary process. Unfortunately, the computation required for the RLS algorithm is large and is not easily carried out in real time within the resource limitations of many practical applications. The RLS algorithm falls into the general class of quasi-Newton optimization techniques which are thoroughly treated in numerous places throughout the literature.

There are two different ways to interpret the mechanism that brings about the improved convergence rates achieved through transform domain processing. The first point of view considers the combined operations of orthogonalization and power normalization to be the effective transformation Λ⁻¹T, an interpretation that is implied by equation (1.27). This line of thinking leads to an understanding of the transformed error surfaces as illustrated by example in Figures 1.9 and 1.10, and leads to the logical conclusion that the faster learning rate is due to the conventional LMS algorithm operating on an improved error surface that has been rendered more properly oriented and more symmetrical via the transformation. While this point of view is useful in understanding the principles of transform domain processing, it is not generally implementable from a practical point of view. This is because for an arbitrary input signal, the power normalization factors that constitute the Λ⁻¹ part of the input transformation are not known a priori, and must be estimated after T is used to decompose the input signal into orthogonal channels.
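
As a numerical check of this example, the short script below forms Rx, applies the rotation T of Figure 1.9, and then power normalizes; it reproduces (to within rounding) the matrices quoted in the figure captions above and shows that the eigenvalue spread is reduced, though not all the way to unity. The script is only an illustration of the preceding discussion.

    import numpy as np

    Rx = np.array([[1.0, 0.9],
                   [0.9, 1.0]])
    T = np.array([[0.866,  0.500],
                  [-0.500, 0.866]])              # the rotation used in Figure 1.9

    Rv = T @ Rx @ T.T                            # approx [[1.78, 0.45], [0.45, 0.22]]
    Lam_inv = np.diag(1.0 / np.sqrt(np.diag(Rv)))
    Rn = Lam_inv @ Rv @ Lam_inv                  # approx [[1.0, 0.72], [0.72, 1.0]]

    for name, M in [("Rx", Rx), ("T Rx T^T", Rv), ("normalized", Rn)]:
        lam = np.linalg.eigvalsh(M)
        print(name, "eigenvalue spread =", lam.max() / lam.min())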

The second point of view interprets the transform domain equations as operating on the transformed error surface (without power normalization) with a modified LMS algorithm where the step sizes are adjusted differently in the various channels according to μ(n) = μΛ⁻², where μ(n) = [μi,i(n)] is a diagonal matrix that contains the step size for the i-th channel at location (i,i). The dependence of the μi,i(n)'s on the iteration (time) index n acknowledges that the step sizes are a function of the power normalization factors, which are updated in real time as part of the on-line algorithm. This suggests that the TDAF should be able to track nonstationary input statistics within the limited abilities of the transformation T to orthogonalize the input and within the accuracy limits of the power normalization factors. Furthermore, when the input signal is white all of the σi²'s are identical and each is equal to the power in the input signal. In this case the TDAF with power normalization becomes the conventional normalized LMS algorithm [1.13].

It is straightforward to show mathematically that the above two points of view are indeed identical [1.23]. Let v̂(n) ≡ Λ⁻¹ T x(n) = Λ⁻¹ v(n), and let the filter tap vector be denoted ŵ(n) when the matrix Λ⁻¹T is treated as the effective transformation. For the resulting filter to have the same response as the filter in Figure 1.8 we must have

v̂^T(n) ŵ(n) = y(n) = v^T(n) W(n) = v̂^T(n) Λ W(n),  ∀ y(n),    (1.28)

which implies that W(n) = Λ⁻¹ ŵ(n). If the tap vector ŵ(n) is updated using the LMS algorithm, then

W(n+1) = Λ⁻¹ ŵ(n+1) = Λ⁻¹ [ŵ(n) + μ e(n) v̂*(n)]
        = Λ⁻¹ ŵ(n) + μ e(n) Λ⁻¹ v̂*(n)
        = W(n) + μ e(n) Λ⁻² v*(n),    (1.29)

which is precisely the algorithm (1.22). This analysis demonstrates that the two interpretations are consistent, and that they are in fact alternate ways to explain the fundamentals of transform domain processing.

1.5.2.3 Discussion

It is clear from the above development that the power estimates σi² are the optimum scale factors, as opposed to some other statistic of the channel signals. Also, it is significant to note that no convergence rate improvement can be realized without power normalization. This is the same conclusion reached by Lee and Un [1.16] when they analyzed the frequency domain LMS algorithm with a constant convergence factor, i.e., when power normalization was omitted. From the error surface description of the TDAF's operation, it is seen that an optimal transform rotates the axes of the hyperellipsoidal equal-error contours. The prescribed power normalization scheme then gives the ideal hyperspherical contours, and the convergence rate becomes the same as if the input were white. The optimal transform is composed of the orthonormal eigenvectors of the input autocorrelation matrix, and is known in the literature as the Karhunen-Loève Transform (KLT). The KLT is signal dependent and usually cannot be easily computed in real time. It is interesting to note, however, that real signals have real orthogonal KLT's, which suggests the use of real transforms in the TDAF (as opposed to complex transforms such as the DFT).

Since the optimal transform for the TDAF is signal dependent, a universally optimal fixed parameter transform can never be found. It is also clear that once the filter order has been chosen, any unitary matrix of correct dimensions is a possible choice for the transform; there is no need to restrict attention to classes of known transforms. In fact, if a prototype input power spectrum is available, its KLT can be constructed and used. One factor that must be considered in choosing a transform for real time applications is computational complexity. In this respect, real transforms are superior to complex ones, transforms with fast algorithms are superior to those without, and transforms whose elements are all powers-of-two are attractive since only additions and shifts are needed to compute them. Throughout the literature the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the Walsh Hadamard transform (WHT) have received considerable attention as possible candidates for use in the TDAF [1.21]. In spite of the fact that the DFT is a complex transform and not computationally optimal from that point of view, it is often used in practice because of the availability of efficient FFT algorithms. In Chapter 4 the TDAF with the DFT will be further analyzed as a platform for incorporating fault tolerance into an adaptive filter design.
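
Constructing the KLT for a prototype spectrum amounts to an eigendecomposition of the corresponding autocorrelation matrix, as the sketch below illustrates; the AR(1)-style prototype lags r(i) = 0.9^i and the filter length are assumed purely for illustration.

    import numpy as np
    from scipy.linalg import toeplitz

    def klt_from_lags(r):
        """Return the KLT whose rows are the orthonormal eigenvectors of Rx."""
        Rx = toeplitz(r)                        # symmetric Toeplitz autocorrelation matrix
        eigvals, eigvecs = np.linalg.eigh(Rx)   # real symmetric matrix -> real orthogonal eigenvectors
        return eigvecs.T                        # T such that T Rx T^T is diagonal

    r = 0.9 ** np.arange(8)                     # assumed prototype autocorrelation lags
    T = klt_from_lags(r)
    Rv = T @ toeplitz(r) @ T.T                  # diagonal up to round-off: channels fully orthogonalized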

1.5.3 Quasi-Newton Adaptive Algorithms

The dependence of the adaptive system's convergence rate on the input power spectrum can be reduced by using second order statistics via the Gauss-Newton method [1.20]. The Gauss-Newton algorithm is well known in the field of optimization as one of the basic accelerated search techniques. In recent years it has also appeared in various forms in publications on adaptive filtering. In this section a brief introduction to quasi-Newton adaptive filtering methods is presented, and a fast quasi-Newton algorithm is described in considerable detail for 1-D FIR adaptive filters. This algorithm is highlighted in this discussion because it forms a basis for the fast 2-D quasi-Newton algorithm that is developed later in Chapter 3.

The basic Gauss-Newton coefficient update algorithm for an FIR adaptive filter is given by

w(n+1) = w(n) − μ H(n) ∇E[e²](n),    (1.30)

where H(n) is the inverse of the Hessian matrix and ∇E[e²](n) is the gradient of the cost function at iteration n. For an FIR adaptive filter with a stationary input the inverse Hessian is proportional to Rx⁻¹. If the gradient is estimated with the instantaneous error squared, as in the LMS algorithm, the result is

w(n+1) = w(n) + μ e(n) R̂x⁻¹(n) x(n),    (1.31)

where R̂x⁻¹(n) is an estimate of Rx⁻¹ that varies as a function of the index n. Equation (1.31) characterizes the quasi-Newton LMS algorithm. Note that (1.30) is the starting point for the development of many practical adaptive algorithms that can be obtained by making approximations to one or both of the Hessian and the gradient. Therefore we typically refer to all such algorithms derived from (1.30) as the family of quasi-Newton algorithms.
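
A minimal sketch of one iteration of (1.31) is given below; the exponentially weighted outer-product estimate of Rx and the direct linear solve are generic illustrative choices (the efficiency issues they raise are exactly the subject of the following paragraphs), and the forgetting factor lam is an assumed parameter.

    import numpy as np

    def qn_lms_step(w, R_est, x_vec, d_n, mu, lam=0.99):
        """One quasi-Newton LMS iteration, eq. (1.31), with a generic Rx estimate."""
        R_est = lam * R_est + (1 - lam) * np.outer(x_vec, x_vec)   # crude autocorrelation estimate
        e = d_n - np.dot(w, x_vec)                                  # output error e(n)
        g = np.linalg.solve(R_est, x_vec)                           # R_est^{-1} x(n), without forming the inverse
        w = w + mu * e * g                                          # coefficient update, eq. (1.31)
        return w, R_est, e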

The autocorrelation estimate R̂x(n) is constructed from data received up to time step n. It must then be inverted for use in (1.31). This is in general an O[N³] operation, which must be performed for every iteration of the algorithm. However, the use of certain autocorrelation estimators allows more economical matrix inversion techniques to be applied. Using this approach, the conventional sequential regression algorithm [1.10, 1.43] and the recursive least squares (RLS) algorithm [1.11, 1.12] achieve quasi-Newton implementations with a computational requirement of only O[N²].

The RLS algorithm is probably the best known member of the class of quasi-Newton algorithms. The drawback that has prevented its widespread use in real time signal processing is its O[N²] computational requirement, which is still too high for many applications (and is an order of magnitude higher than the order of complexity of the FIR filter itself). This problem appeared to have been solved by the formulation of O[N] versions of the RLS algorithm. Unfortunately, these more efficient forms of the RLS tend to be numerically ill-conditioned. They are often unstable in finite precision implementations, especially in low signal-to-noise applications or where the input signal is highly colored [1.1, 1.3]. This behavior is caused by the accumulation of finite precision errors in internal variables of the algorithm, and is essentially the same source of numerical instabilities that occur in the basic O[N²] RLS algorithm, although the problem is greater in the O[N] case since these algorithms typically have a larger number of coupled internal recursions. Considerable work has been reported to stabilize the O[N²] RLS algorithm, and to produce a numerically robust O[N] RLS algorithm [1.9, 1.17, 1.18, 1.24].

1.5.3.1 A fast quasi-Newton algorithm

The quasi-Newton algorithms discussed above achieve reduced computation through the use of particular autocorrelation estimators which lend themselves to efficient matrix inversion techniques. This section reviews a particular quasi-Newton algorithm that was developed to provide a numerically robust O[N] algorithm [1.21, 1.23]. Considerable detail is given for this particular 1-D algorithm in order to provide sufficient background for the 2-D extension of this algorithm that is treated in Chapter 3.

To derive the O[N] fast quasi-Newton (FQN) algorithm, a different autocorrelation matrix estimate is used, which permits the use of more robust and efficient computation techniques. Assuming stationarity, the autocorrelation matrix Rx has a high degree of structure; it is symmetric and Toeplitz, and thus has only N free parameters, the elements of the first row. This structure can be imposed on the autocorrelation estimate, since this incorporates prior knowledge of the autocorrelation into the estimation process. The estimation problem then becomes that of estimating the N autocorrelation lags ri, i = 0, ..., N−1, which comprise the first row of Rx. The autocorrelation estimate is also required to be positive definite, to ensure the stability of the adaptive update process.

The standard positive semidefinite autocorrelation lag estimator for a block of data is given by [1.29]

ri = (1/(M+1)) Σ_{k=i}^{M} x(k−i) x(k),    (1.32)

where x(k), k = 0, ..., M, is a block of real data samples, and i ranges from 0 to M. However, the preferred form of the estimation equation for use in an adaptive system, from an implementation standpoint, is an exponentially weighted recursion. Thus (1.32) must be expressed in an exponentially weighted recursive form, without destroying its positive semidefiniteness property. Consider the form of the sum in equation (1.32): it is the (deterministic) correlation of the data sequence x(k), k = 0, ..., M, with itself. Thus ri, i = 0, ..., M, is the deterministic autocorrelation sequence of the sequence x(k). (Note that ri must also be defined for i = −M, ..., −1, according to the requirement that ri = r−i.) In fact, the deterministic autocorrelation for any sequence is positive semidefinite. The goal of exponential weighting, in a general sense, is to weight recent data most heavily and to forget old data by using progressively smaller weighting factors. To construct an exponentially weighted, positive definite autocorrelation estimate, we must weight the data first, then form its deterministic autocorrelation, to guarantee positive semidefiniteness. At time step n, the available data is x(k), k = 0, ..., n. If these samples are exponentially weighted using √α, the result is α^((n−k)/2) x(k), k = 0, ..., n. Using (1.32) and assuming n > N−1, we then have

ri(n) = Σ_{k=i}^{n} [α^((n−k+i)/2) x(k−i)] [α^((n−k)/2) x(k)]

      = α Σ_{k=i}^{n−1} α^(n−1−k) α^(i/2) x(k−i) x(k) + α^(i/2) x(n−i) x(n)    (1.33)

      = α ri(n−1) + α^(i/2) x(n−i) x(n)

for i = 0, ..., N−1.

A normalization term is omitted in (1.33), and initialization is ignored. With regard to the latter point, the simplest way to consistently generate ri(n) for 0 ≤ n ≤ N−1 is to assume that x(n) = 0 for n < 0, set ri(−1) = 0 for all i, and then use the above recursion. A small positive constant δ may be added to r0(n) to ensure positive definiteness of the estimated autocorrelation matrix.
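
The lag recursion (1.33) translates directly into code; the sketch below updates all N lag estimates from the sliding input vector, and the variable names (and the choice of the weighting factor alpha) are illustrative assumptions.

    import numpy as np

    def update_lags(r, x_vec, alpha):
        """Exponentially weighted autocorrelation lag update, eq. (1.33).

        r     : lag estimates r_i(n-1), i = 0, ..., N-1
        x_vec : sliding input vector [x(n), x(n-1), ..., x(n-N+1)]
        """
        i = np.arange(len(r))
        return alpha * r + (alpha ** (i / 2.0)) * x_vec * x_vec[0]   # alpha r_i(n-1) + alpha^(i/2) x(n-i) x(n)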

With this choice of an autocorrelation matrix estimate, a quasi-Newton algorithm is determined. Thus the fast quasi-Newton (FQN) algorithm is given by (1.31) and (1.33), where R̂x(n) is understood to be the Toeplitz symmetric matrix whose first row consists of the autocorrelation lag estimates ri(n), i = 0, ..., N−1, generated by (1.33). The step size μ for the FQN algorithm is given by

μ(n) = 1 / (ε + x^T(n) R̂x⁻¹(n−1) x(n)).    (1.34)

This step size is used in other quasi-Newton algorithms [1.10, 1.11], and seems nearly optimal. The parameter ε is intended to be small relative to the average value of x^T(n) R̂x⁻¹(n−1) x(n). Then the normalization term omitted from (1.33), which is a function of α but not of i, cancels out of the coefficient update, since R̂x⁻¹(n) appears in both the numerator and the denominator. Thus the normalization can be safely ignored.

1.5.3.2 Efficient implementation of the FQN algorithm

Although the FQN algorithm may appear to be an arbitrary member of the quasi-Newton class, it is distinguished by the fact that it can be implemented very efficiently. The greatest computational burden involved in the implementation of a general case quasi-Newton algorithm is in the inversion of the autocorrelation matrix estimate for use in the coefficient update. The unique feature of the FQN algorithm is that its autocorrelation matrix estimate is Toeplitz and symmetric. This structure permits the use of special techniques for processing the autocorrelation estimate.

The first of these techniques is the well known Levinson recursion for solving systems of equations [1.28, 1.32]. Since the Levinson algorithm is a well known numerical method, its details will not be presented here, although the interested reader can find more details in terms of the adaptive filtering application in reference [1.23]. The inputs to the Levinson recursion are a Toeplitz matrix R̂x(n) and a vector x(n). The output is the product R̂x⁻¹(n)x(n), rather than the inverse matrix R̂x⁻¹(n) itself. But this is precisely the term needed to implement the coefficient update (1.31). The total amount of computation required for one Levinson recursion is known to be (2N² − N) multiplications/divisions and (2N² − 3N + 1) additions and subtractions. Thus the Levinson recursion provides an O[N²] implementation of the FQN algorithm. If a major improvement in the computational requirement is to be achieved, the Levinson recursion obviously cannot be executed at every iteration of the adaptive algorithm.
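
In practice a Levinson-type solve is available as a library routine; the sketch below uses SciPy's Toeplitz solver to obtain the product R̂x⁻¹(n)x(n) from the N lag estimates in O[N²] operations. This is a convenience substitute for the hand-coded Levinson recursion discussed in [1.23], not the specific implementation used there, and the small diagonal loading delta is an assumed safeguard.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def solve_Rinv_x(r, x_vec, delta=1e-6):
        """Compute R_hat^{-1}(n) x(n), with R_hat symmetric Toeplitz built from lags r."""
        c = np.asarray(r, dtype=float).copy()
        c[0] += delta                       # small positive constant on r_0 for positive definiteness
        return solve_toeplitz(c, x_vec)     # Levinson-type O(N^2) solve of a Toeplitz system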

Recall that the ideal input to an adaptive system is required to be stationary. In practice, the input may vary slowly relative to the adaptive algorithm's rate of convergence. The convergence times of practical adaptive algorithms are usually at least several multiples of the adaptive filter length; convergence in N iterations is the theoretical optimum. Given this fundamental limit on the allowable rate of variation of the input statistics, it is clear that an input correlation estimate which is accurate at time n will remain accurate for approximately N to 2N time steps thereafter. This observation suggests that the inverse autocorrelation matrix estimate R̂x⁻¹(n), which becomes available at time n, can be reasonably used in the coefficient update for the following N time steps. That is, the autocorrelation estimate used in (1.31) can be held constant during each block of N time steps, and refreshed only at the beginning of each block. Little or no degradation of the algorithm's performance should result, since the input autocorrelation does not change significantly over a period of N time steps. The importance of this modification of the FQN algorithm is that a matrix inversion is now required only once every N time steps. Using the Levinson recursion, the average amount of computation required is now O[N²]/N, or O[N]. Note that the autocorrelation lag estimates (1.33) should still be updated at each time step, so that an accurate autocorrelation estimate is maintained, although the coefficient update process will use only every N-th estimate. The order of the computation achieved by this scheme could also be achieved using blocks whose lengths were multiples or submultiples of N. These alternatives could be used to reduce computation further in applications where the input varies very slowly, or to improve performance with relatively rapidly varying inputs, respectively.

There is one thing missing in the implementation discussed above. The Levinson recursion is to be applied at time n (the beginning of a new block) to generate the term R̂x⁻¹(n)x(n). The inverse autocorrelation estimate R̂x⁻¹(n) is then to be used to compute the next N coefficient vectors w(n+1), w(n+2), ..., w(n+N). These computations require the vectors R̂x⁻¹(n)x(n), R̂x⁻¹(n)x(n+1), ..., R̂x⁻¹(n)x(n+N−1). But the Levinson recursion generates only the first of these vectors, and R̂x⁻¹(n) is not available to compute the others (even if R̂x⁻¹(n) were available, the O[N²] matrix-vector multiplications would be too costly). The special structure of the FQN algorithm's autocorrelation matrix estimate again allows an efficient solution to this problem. Given R̂x⁻¹(n)x(k), and a new input sample x(k+1), it is mathematically possible to compute R̂x⁻¹(n)x(k+1) with O[N] computational complexity. Note that the sliding property of the tapped delay line (FIR) input vector is needed here, i.e., this approach cannot be used with filter structures that do not have sliding input vectors.

An efficient technique for performing the computation described above, referred to as the MKC algorithm after its inventors, is described in detail in [1.21]. The MKC algorithm uses byproducts from the Levinson inversion of the autocorrelation matrix estimate, and it is in fact closely related to the Levinson recursion. The required operations for the MKC algorithm are 3N multiplications/divisions and 3N additions/subtractions. The MKC algorithm will also recursively generate the term x^T(n) R̂x⁻¹(n) x(n) for use in the step size equation, at a cost of only 2 multiplications and 3 additions/subtractions (these are included in the total computation stated above). Thus this is an efficient technique that can be used to recursively generate the vectors R̂x⁻¹(n)x(n+1), R̂x⁻¹(n)x(n+2), ..., R̂x⁻¹(n)x(n+N−1) needed to update the filter coefficients during each block of N time steps.

This completes the basic outline of an O[N] implementation for the FQN algorithm. Both the Levinson and MKC techniques rely on the symmetry properties of the autocorrelation matrix estimate developed for the FQN, and the latter also exploits the sliding property of the input vector in an FIR filter structure. It should be noted that the idea of inverting the autocorrelation estimate only once per block of N samples cannot be profitably applied to the RLS algorithm. Inversion of the RLS autocorrelation matrix estimate is an O[N³] operation when it is not performed recursively on each autocorrelation estimate, and hence nothing equivalent to the MKC algorithm is available to generate the intermediate gain vectors.

The fast FQN implementation described above achieves an average computation of O[N] operations per time step by performing the O[N²] Levinson recursion only once every N time steps. Thus, this implementation appears to have an O[N²] bottleneck, since once every N time steps an O[N²] complexity step is needed for the Levinson recursion. However, it was shown in [1.20] that the calculations of the Levinson recursion can also be distributed over the N time steps, with only O[N] computations done on any one time step. Essentially the Levinson recursion is computed partially at each step, so the final calculation of R̂x⁻¹(n)x(n+1) is completed just when the input sample x(n+1) becomes available. Therefore the final overall result is an O[N] fast quasi-Newton algorithm that circumvents many of the numerical sensitivities of the conventional RLS algorithm. The interested reader is referred to [1.21, 1.23] for the complete details of the FQN algorithm, as well as an in-depth experimental verification of its performance.

The discussion of the FQN algorithm presented in this section is not intended to establish the superiority of one quasi-Newton algorithm over another, nor to promote the usage of any particular algorithm in a given application. Rather, the discussion is intended to illustrate how useful results from signal processing, optimization theory, and numerical analysis can be combined to develop new adaptive algorithms that offer tradeoffs in computational complexity and robust performance.

The FQN algorithm is extended to the case of 2-D FIR adaptive filters in Chapter 3. While the simple Toeplitz property for the autocorrelation matrix is lost in the 2-D case, the 2-D autocorrelation is found to have a Toeplitz-block-Toeplitz structure that can be exploited to achieve a 2-D FQN algorithm that is similar to the FQN for the 1-D case.

1.5.4 Adaptive Lattice Algorithms

Another FIR adaptive filter structure that has received a great amount of attention in the literature is the adaptive FIR lattice filter [1.12, 1.13]. In fact, there is so much literature on the analysis and performance of adaptive lattice filters that we will make no attempt to give this subject a comprehensive treatment here. But it is important for us to establish its position in the hierarchy of available adaptive filter structures, and to summarize its well known performance characteristics in comparison to the algorithms treated in this book.

The basic analog lattice filter has existed in the circuit theory literature for decades, where it is known as a low sensitivity filter structure that is closely related to the low sensitivity lossless ladder structures frequently preferred by analog filter designers. During the 1970's both the ladder and lattice structures were investigated for their suitability in (non-adaptive) conventional digital filters. Since these structures have low coefficient sensitivities, it was expected that they could achieve accurate filter characteristics with minimal coefficient word length, and that their overall performance would be more robust to arithmetic quantization noise. While the digital lattice filters do indeed provide many of these advantages, they did not receive too much attention for conventional (fixed coefficient) digital filters, probably because the simplicity of the FIR direct form structure was more attractive in most low frequency applications.

In the mid 1970's the lattices were "rediscovered" for use in adaptive filters. In this time frame Texas Instruments, Inc. produced a speech synthesis chip that was used in the popular children's toy of the day, called "Speak 'n Spell". The Speak 'n Spell speech synthesizer used a rather advanced speech synthesis technique based on the adaptive lattice predictor. The reasons why Texas Instruments' engineers selected the adaptive lattice had mostly to do with the fact that it provided a cheap and accurate solution that fit the requirements of the speech synthesis problem very well. The lattice's low sensitivity properties permitted short word lengths to be used, and the modular structure of the lattice predictor allowed a single multiply-add computational element to be multiplexed to form a higher order lattice from a single low order module.

As an adaptive filter, the FIR lattice structure provides a faster convergence rate than the direct FIR structure. The basic lattice structure, shown in Figure 1.11, is also called a lattice joint process estimator (JPE) because there are two distinct estimation processes taking place simultaneously within the structure. The top portion of the structure, the lattice predictor, tracks the input signal by means of linear prediction, attempting to maintain orthogonality between the input signal and each of the M+1 backward prediction errors. The bottom portion of the structure is simply a linear combiner that functions much like a direct form structure, forming a linear combination of the bi(n)'s while adjusting the wi(n)'s to approximate a training signal.

Figure 1.11 The FIR lattice adaptive filter structure.

Many different algorithms have been used to adjust both the ki(n)'s
and the wi(n)'s, including the LMS, the direct least squares, and the recursive least squares algorithms [1.12].

The JPE structure is similar to the transform domain adaptive filter except that the fixed transform of the TDAF is replaced with a lattice prediction error filter (PEF). The lattice portion of the structure is an implementation of a PEF, which is made up of a cascade of similar sections (one reason for TI's choice in the Speak 'n Spell chip design). The ki's are adaptively adjusted parameters called the reflection coefficients. The lower output of the i-th section, the signal labeled bi(n) in Figure 1.11, is the i-th order backward prediction error signal, i.e., it is the prediction error that results from the structure implicitly forming an i-th order linear prediction of the input signal x(n). The upper signals labeled fi(n) are the corresponding forward prediction errors. When the top portion of the structure, i.e., the lattice predictor, is properly converged, the set of backward prediction errors {bi(n)} forms an orthogonal basis for the input signal space. Therefore, one can interpret the lattice predictor as an "adaptive linear transformation" that decomposes the input signal x(n) into M+1 orthogonal components, which are then applied to the linear combiner to form a complete adaptive filter. When these signals are used with the power normalized LMS algorithm, the best possible convergence rate can be achieved.
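
To make the data flow of Figure 1.11 concrete, the sketch below propagates one input sample through an M-section lattice with fixed reflection coefficients and then forms the joint process output from the backward errors. The sign convention shown is one of several equivalent forms in the literature, and the variable names are illustrative assumptions.

    import numpy as np

    def lattice_jpe_sample(x_n, k, w, b_prev):
        """One time step of a lattice joint process estimator with fixed parameters.

        x_n    : current input sample x(n)
        k      : reflection coefficients k_1, ..., k_M
        w      : linear combiner weights w_0, ..., w_M
        b_prev : backward errors b_0(n-1), ..., b_M(n-1) from the previous step
        """
        M = len(k)
        f = np.zeros(M + 1)
        b = np.zeros(M + 1)
        f[0] = b[0] = x_n                                # zeroth order errors equal the input
        for i in range(1, M + 1):
            f[i] = f[i - 1] + k[i - 1] * b_prev[i - 1]   # forward prediction error recursion
            b[i] = b_prev[i - 1] + k[i - 1] * f[i - 1]   # backward prediction error recursion
        y_n = np.dot(w, b)                               # linear combiner output y(n)
        return y_n, b                                    # b becomes b_prev at the next time step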

The previously mentioned orthogonality property of the backward prediction errors holds only when the reflection coefficients within each section of the PEF are fixed at their appropriate values. These values are determined by the statistics of the input process. Since the lattice reflection coefficients are adaptive, they can adjust to produce orthogonal backward prediction errors regardless of the statistics of the input. The rate at which this occurs appears to be independent of the eigenvalue ratio of the input autocorrelation [1.12]. In this respect the lattice JPE has a more general usage than the TDAF. It should also be noted that while the reflection coefficients are still adapting to the proper values, the backward error signals are not necessarily orthogonal, and the best possible convergence rate will not be achieved at these times.

In many regards the FIR adaptive lattice filter is an ideal structure, one that offers very good learning characteristics while requiring modest hardware resources and providing a modular architecture that is attractive to designers. However, it does have some drawbacks. Notice that just for the implementation of the lattice predictor itself (not including the parameter update computations) the structure requires 2M multiplications, and the linear combiner requires M+1, so the overall order of complexity is O[M]. But note that the number of parameters used to define the parameter space is 2M+1, as compared to M+1 for an equivalent LMS filter. This means that the lattice filter actually operates in a parameter space whose dimensionality is approximately twice that of the LMS filter. This implies that the lattice requires considerably more computation for the basic filter operation itself, so there will be more arithmetic quantization noise generated within the filter, and more parameter estimation error (gradient noise) injected into various nodes of the lattice network.

A consequence of this is that the lattice filters tend to suffer from higher noise floors in the output error when compared to LMS filters. Both analytical and computer generated experimental results have indicated that the increased noise floor of the lattice structure is due mostly to noise that originates in the reflection coefficients. When experiments were performed on known input signals, with the reflection coefficients calculated from the input statistics and loaded into the lattice filter as fixed parameters, it was found that the linear combiner performed very much like the TDAF and no unusually high noise floor occurred in the output error. However, when the reflection coefficients were left to adjust on their own in combination with the coefficients of the linear combiner, the noise floor in the output noticeably increased. It is an important observation that when the linear combiner coefficients were fixed at their optimal values and the reflection coefficients were adaptively adjusted, the noise floor still noticeably increased, thereby verifying that the source of the increased output noise is the estimation noise in the reflection coefficients. It appears that this estimation noise in the reflection coefficients is the primary source of increased misadjustment of the lattice filters, and that this is an inherent characteristic of a "fully adaptive" lattice structure. These observations suggest that in stationary environments the lattice structure may not be the best choice, since even with stationarity there will always be some degree of estimation noise associated with the adaptive reflection coefficients. But in many nonstationary applications where the input statistics are slowly varying, the lattice predictor may be able to track the changing signal environment better than a fixed transform LMS filter.

There have been numerous attempts to extend the adaptive lattice structure into two dimensions, all of which have met with less than total success [1.31, 1.33, 1.34]. In 2-D the size of the parameter space grows even more rapidly with filter order than in the 1-D case, and the problem of high noise floors and large misadjustments in the output error becomes even more serious. Perhaps the subject of 2-D adaptive lattice filters will yield better results to future research efforts. But as of today, results in this area have been disappointing, to say the least, leaving us with a current belief that for two dimensional adaptive signal processing, the alternative 2-D structures and adaptive algorithms that are presented in Chapter 3 have a much greater chance for successful application.

1.6 Adaptive Algorithms for IIR Systems

The choice of an infinite impulse response (IIR) adaptive filter structure is often motivated by a desire to reduce the computational burden of high-order adaptive FIR filters. The presence of feedback generates an impulse response having large support with significant nonzero magnitude using substantially fewer parameters than an equivalent FIR adaptive filter. This "parsimony principle" has fueled an interest in IIR adaptive filters which has yet to lead to their widespread use in practice.

There are several reasons why the class of IIR adaptive filters has not received the same level of attention and success as the FIR class:

• The primary concern with IIRs is their potential to become unstable due to the movement of poles outside the unit circle during the adaptive learning process. Initially this may not seem like so much of a problem when an IIR is used in a system identification or noise canceling configuration, since presumably any unknown system that is interesting enough that we should want to identify it would most certainly be a stable system. The problem is that even though the filter is initialized at a stable point and will arrive at a stable solution after convergence, there is no assurance that the filter will remain stable at all points along the pole trajectory. Examples can be found where the optimal pole trajectory starts in a stable region and ends in a stable region of the z-plane, but which passes outside the unit circle in order to take a "short cut" to the stable optimal solution. In these cases, if the poles are artificially constrained to remain inside the unit circle at every step along the pole trajectory, the step sizes must be kept small and the convergence rate may suffer a great deal. Therefore, a serious problem that must be solved is what to do about stability after a convenient monitoring mechanism is put in place. For example, a great deal of work has been done with the parallel form IIR structure realized as a parallel interconnection of second order sections [1.28, 1.38]. The pole locations are easy to monitor in each of the second order sections, but deciding what to do about temporary instability during the adaptation process is a more difficult problem that has not received too much attention.

• Due to the feedback in an IIR structure there is an interplay between the movement of the poles and the movement of the zeros, with the net result that most pole-zero IIRs are rather slow to converge. This means that even though an IIR may have many fewer coefficients than an equivalent FIR filter, and therefore requires fewer arithmetic operations per iteration, the IIR may require more iterations to reach convergence. If one uses the "total amount of computation to reach convergence" as a measure of performance, it is not difficult to find examples where the IIR takes so many iterations to converge that, even though it requires many fewer operations per iteration, the total number of operations required to converge is greater than it is for the equivalent FIR filter. However, all-pole IIRs tend to converge more rapidly. It is the interaction between moving poles and zeros that tends to slow adaptive learning the most.

• Due to the nonlinear dependence of the output error on the denominator coefficients in an adaptive IIR filter, the resulting MSE surface is not a quadratic surface, and it may in fact contain local minima that will cause premature termination for gradient search techniques.

• Due to internal feedback, IIR adaptive filters are much more sensitive to arithmetic quantization effects than their FIR counterparts. This is particularly troublesome for real time filters implemented in custom designed chips, or for implementations with fixed-point DSP chips. Many IIR adaptive algorithms that work well on large word length floating-point computers will fail miserably when implemented with a short word length fixed-point constraint.

In spite of these serious problems, the class of IIR filters remains of great interest for its potential to solve problems which require the synthesis of very long impulse responses. Although there is considerable literature on IIR adaptive filters, up to now this class has appeared largely in research papers [1.6, 1.7, 1.28, 1.36, 1.38] or has had limited treatment where it appears in textbooks [1.2, 1.19, 1.25, 1.42, 1.43]. Recently, a book has been published by Regalia [1.30] that is entirely devoted to the subject of IIR adaptive filtering in signal processing and control.

The topic of IIR adaptive filters is revisited in several places throughout later chapters of this book. Section 2.4 develops the use of the Preconditioned Conjugate Gradient Method in an effort to improve the learning characteristics of adaptive IIR filters. Then later in Section 3.3 the adaptive IIR problem is considered in two dimensions. A fast quasi-Newton 2-D algorithm is developed that parallels the 1-D FQN algorithm that was discussed earlier in this chapter in Section 1.5.3. Below we introduce three approaches to the IIR adaptive filtering problem and provide background that will help the reader with the material that follows in later chapters.

1.6.1 The IIR LMS Adaptive Algorithm

The MSE approximation that led to the conventional LMS algorithm for FIR filters has also been applied to the general class of IIR filters [1.43]. Recall that a direct form IIR digital filter is characterized by a difference equation,

y(n) = Σ_{k=0}^{Nb} bk x(n−k) + Σ_{k=1}^{Na} ak y(n−k),    (1.34)

where the bk's are the coefficients that define the zeros of the filter and the ak's define the poles. The LMS adaptive algorithm for IIR filters is derived in a similar manner as in the FIR case, although the recursive relation of equation (1.34) is used instead of the convolutional sum to characterize the input-output relationship of the filter. The IIR derivation is more complicated because the recursive terms on the right side of (1.34) depend on past values of the filter coefficients. We will not undertake the complete derivation here because the same derivation is carried out in more detail in Section 3.3 for 2-D IIR filters. However, we will make a few comments about the derivation and discuss the consequences of the feedback that exists in the IIR structure.

If the derivatives of the squared error function are calculated using the chain rule, so that all first order dependencies are taken into account, the result is

∇E[e²] ≈ [∂(e²(n))/∂ak , ∂(e²(n))/∂bk] = [2e(n) ∂e(n)/∂ak , 2e(n) ∂e(n)/∂bk]
        = [−2e(n) ∂y(n)/∂ak , −2e(n) ∂y(n)/∂bk],

where

∂y(n)/∂bk = x(n−k) + Σ_{j=1}^{Na} aj(n) ∂y(n−j)/∂bk ,   k = 0, ..., Nb    (1.35a)

and

∂y(n)/∂ak = y(n−k) + Σ_{j=1}^{Na} aj(n) ∂y(n−j)/∂ak ,   k = 1, ..., Na.    (1.35b)

This procedure does not generate a closed form expression for the gradient as it did in the FIR case, but it does result in a mechanism by which the gradient can be generated recursively using equations (1.35). Note that equation (1.35a) consists of Nb+1 "all-pole" filters, where the input to the k-th filter is simply the input signal shifted by k time steps; similarly, equation (1.35b) represents a similar bank of filters, but where the input to the k-th filter is the shifted output signal y(n−k). Strictly speaking, equations (1.35) require Na+Nb+1 distinct filters, one for each of the gradient terms. However, in practice it is usually assumed that the filter coefficients are slowly varying in comparison to the signal frequency content, so that within a shift of Na or Nb time samples, the filters can be treated as though they are time-invariant. With this assumption all of the gradient components represented by (1.35a) can be generated by a single all-pole gradient filter whose input is x(n).

Similarly, the gradient terms in (1.35b) can be produced with a second filter that is identical to the first, but which uses y(n) as the input signal. Then time shifted outputs from these filters will serve as accurate approximations to the true gradient components represented by equations (1.35). This assumption of "stationarity" in the input and output gradient filters is justified by imposing slow parameter variation by choosing a small step size [1.37]. In the adaptive filtering literature, the filter represented by equation (1.35a) is typically called the input gradient filter, whereas the one represented by (1.35b) is the output gradient filter. Similar concepts are found in the fields of automatic control and analog circuit theory, where such filters are typically called input and output sensitivity filters.
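
A compact sketch of the resulting output error IIR LMS algorithm, using the two simplified (time invariant) gradient filters just described, is given below; stability monitoring of the adapted poles is deliberately omitted, and the function and variable names are illustrative assumptions.

    import numpy as np

    def iir_lms(x, d, Na, Nb, mu):
        """Output error IIR LMS with single input and output gradient filters."""
        a = np.zeros(Na)             # feedback (pole) coefficients a_1, ..., a_Na
        b = np.zeros(Nb + 1)         # feedforward (zero) coefficients b_0, ..., b_Nb
        y  = np.zeros(len(x))        # adaptive filter output
        xf = np.zeros(len(x))        # input gradient filter output
        yf = np.zeros(len(x))        # output gradient filter output
        e  = np.zeros(len(x))
        for n in range(len(x)):
            # filter output, eq. (1.34)
            y[n] = sum(b[k] * x[n - k] for k in range(Nb + 1) if n - k >= 0) \
                 + sum(a[k - 1] * y[n - k] for k in range(1, Na + 1) if n - k >= 0)
            e[n] = d[n] - y[n]
            # all-pole gradient filters driven by x(n) and y(n), eqs. (1.35)
            xf[n] = x[n] + sum(a[j - 1] * xf[n - j] for j in range(1, Na + 1) if n - j >= 0)
            yf[n] = y[n] + sum(a[j - 1] * yf[n - j] for j in range(1, Na + 1) if n - j >= 0)
            # coefficient updates using time shifted gradient filter outputs
            for k in range(Nb + 1):
                if n - k >= 0:
                    b[k] += mu * e[n] * xf[n - k]
            for k in range(1, Na + 1):
                if n - k >= 0:
                    a[k - 1] += mu * e[n] * yf[n - k]
        return a, b, e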

It is known that the use of the output error in the formulation of the cost function prevents bias in the solution due to noise in the desired signal. However, the effect of this recursion is to make the problem nonlinear in terms of the coefficient parameters. The current filter parameters now depend upon previous filter coefficients, which are time-varying. This leads to MSE surfaces that are typically not quadratic in nature. There are many examples in the literature for which the MSE surface demonstrates one or more local minima, in addition to the global minimum [a good summary is found in 1.7]. In these cases the LMS algorithm, being a gradient search technique, may converge to a local minimum, resulting in poor performance when used in practical applications. Therefore it can be concluded that, as applied to IIR adaptive filters, the LMS algorithm is prone to difficulties. In general, practicing engineers are reluctant to use it in practical applications due to its uncertain performance.

1.6.2 Equation Error Algorithm

The Least Mean Square Equation Error (LMSEE) method improves upon the unsatisfactory performance of the output error algorithms by post-filtering the output error e(n) to produce an equation error e'(n), as shown in Figure 1.12 [1.19]. The optimization is then performed on a modified cost function that uses e'(n) rather than e(n). Intuitively, the post filtering removes the recursion from the algorithm and produces a better conditioned error surface in the equation error space. Note that the post filter is an all-zero time-varying filter whose characteristic is determined by A(z,n), the instantaneous denominator of the adaptive filter.

To demonstrate how this algorithm works, consider the following relationships, which are easily derived from the block diagram of Figure 1.12:

    e(n) = e'(n) - Σ_{j=1..Na} a_j(n) e'(n-j)                                   (1.36a)

[Figure: the input x(n) drives both the unknown system B(z)/A(z), producing d(n), and the adaptive filter B(z,n)/A(z,n), producing y(n); the output error e'(n) = d(n) - y(n) is passed through the all-zero post-filter A(z,n) to form the equation error e(n).]

Figure 1.12 Block diagram for the equation error strategy.


Expanding e'(n) = d(n) - y(n) gives

    e(n) = d(n) - y(n) - Σ_{j=1..Na} a_j(n) [d(n-j) - y(n-j)]                   (1.36b)

    e(n) = d(n) - Σ_{j=1..Na} a_j(n) d(n-j) - [ y(n) - Σ_{j=1..Na} a_j(n) y(n-j) ]   (1.36c)

Substituting the adaptive filter recursion y(n) = Σ_{j=1..Na} a_j(n) y(n-j) + Σ_{j=1..Nb} b_j(n) x(n-j), with the coefficients treated as frozen, into (1.36c) gives

    e(n) = d(n) - [ Σ_{j=1..Na} a_j(n) d(n-j) + Σ_{j=1..Nb} b_j(n) x(n-j) ]          (1.36d)

From (1.36d) it can be seen that e(n) is linear in the a_j(n)'s and b_j(n)'s as long as d(n) and x(n) are independent of the adaptive coefficients. Since x(n) is an independent input signal and since d(n) is the output of the unknown system, it is clear that neither of these signals depends on the adaptive coefficients. Furthermore, the term within the brackets of equation (1.36d) is very similar to the output of the adaptive filter, except that rather than using the actual adaptive filter output y(n) in the recursion relation, the output d(n) of the unknown system is used in the recursion. But y(n) is close to d(n), especially when the filter is in the neighborhood of convergence, so its use in the recursion should be approximately correct.
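Because the equation error is linear in the coefficients, the resulting update has the same form as the ordinary LMS recursion. The short Python sketch below illustrates the idea under the model of (1.36d); the function name, variable names, and loop organization are assumptions made for illustration, not the authors' implementation.

import numpy as np

def lmsee(x, d, Na, Nb, mu):
    """Sketch of the LMSEE update based on (1.36d); x and d are numpy arrays
    of equal length (input and desired signals), mu is the step size."""
    a = np.zeros(Na)                      # denominator estimates a_1 .. a_Na
    b = np.zeros(Nb)                      # numerator estimates  b_1 .. b_Nb
    e = np.zeros(len(x))                  # equation error sequence
    for n in range(max(Na, Nb), len(x)):
        d_past = d[n - Na:n][::-1]        # d(n-1), ..., d(n-Na)
        x_past = x[n - Nb:n][::-1]        # x(n-1), ..., x(n-Nb)
        e[n] = d[n] - (np.dot(a, d_past) + np.dot(b, x_past))
        # e(n) is linear in the coefficients, so the gradient of e^2(n) is just
        # -2 e(n) times the regressor, and the update is an ordinary LMS step.
        a += mu * e[n] * d_past
        b += mu * e[n] * x_past
    return a, b, e

Note that the regressor is built from d(n-j) and x(n-j) only; no recursion through the adaptive filter output appears, which is precisely why the error surface in the equation error space is quadratic.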

By optimizing the equation error, the LMSEE algorithm transforms the potentially nasty output error surface into a quadratic equation error surface, so that good optimization performance results with a gradient algorithm. The disadvantage of this approach is that the minimization of the equation error is not the same as the minimization of the output error, so the solution that the LMSEE algorithm finds is not necessarily the same as the Wiener solution for the original problem. Many examples have been published in the literature to illustrate that the LMSEE algorithm will sometimes produce biased solutions [1.6].

1.6.3 Output Error - Equation Error Hybrid Algorithms

In 1986 Fan and Jenkins [1.6, 1.7] developed a family of new IIR adaptive filtering algorithms that were designed to incorporate the desirable properties of both the LMS output error and the LMSEE algorithms. The basic concept is shown in Figure 1.13 as it is applied in a system identification configuration. Since it is the LMSEE post filter that causes the solution to be biased, an all-pole filter with the characteristic 1/A(z,n) is used as a prefilter in order to cancel the effects of the post filter as the adaptive system reaches a converged condition. While the system is adapting, all adapting parts of the system are initially rapidly time varying, so the pre- and post-filters of Figure 1.13 will not cancel. During this phase of learning the algorithm exhibits behavior like the LMSEE algorithm. However, as the system moves toward convergence, the rate of change of the parameters slows down


[Figure: block diagram containing the all-pole prefilter 1/A(z,n), the adaptive filter B(z,n)/A(z,n), the all-zero post-filter A(z,n), and the unknown system B(z)/A(z), with input x(n), adaptive filter output y(n), unknown system output d(n), and error signals e'(n) and e"(n).]

Figure 1.13 Block diagram of the output error-equation error hybrid strategy.

considerably, and the adaptive elements of Figure 1.13 begin to behave more like time-invariant filters. Eventually the pre- and post-filters approximately cancel each other, and the behavior of the system takes on more of the characteristics of the LMS output error algorithm. Reference [1.6] presents an analytical proof showing that under the ideal conditions of a white noise input, matched orders of the unknown and adaptive filters, and no measurement noise, the hybrid algorithm is globally convergent in spite of the existence of local minima on the MSE surface in the output error space. Indeed, it has been shown in [1.6, 1.7] that this algorithm is able to "hill climb" out of local minima in the hybrid error space (as defined by e"(n)), ultimately reaching the global minimum and the desired Wiener solution, which is identical to the one that would have been obtained by the LMS output error algorithm. Due to the complexity of the algorithm the general result could not be proved for colored input signals, but computer experiments suggest that the algorithm also works reasonably well for colored noise inputs, as well as for cases where the unknown system and the adaptive filter are unmatched in order. Two different forms of the algorithm were developed for different applications: one is called the adaptive filtering mode, while the second is called the system identification mode [1.7].
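The cancellation argument can be illustrated numerically: once the coefficients stop varying, the all-pole prefilter 1/A(z) and the all-zero post-filter A(z) are exact inverses of each other. The following Python fragment (an illustration only, with an arbitrarily chosen stable A(z), not code from the references) demonstrates this with scipy.signal.lfilter.

import numpy as np
from scipy.signal import lfilter

a = np.array([1.0, -0.5, 0.25])     # coefficients of an arbitrary stable A(z)
x = np.random.randn(1000)           # any test signal

pre = lfilter([1.0], a, x)          # all-pole prefilter 1/A(z)
post = lfilter(a, [1.0], pre)       # all-zero post-filter A(z)

print(np.max(np.abs(post - x)))     # essentially zero: the two filters cancel

During adaptation the two filters use different (time-varying) coefficient snapshots, so the cancellation is only approximate, which is exactly why the early-phase behavior resembles the LMSEE algorithm while the converged behavior resembles the output error algorithm.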

Although this hybrid algorithm is a functional combination of two well-known algorithms that seems to perform well under diverse conditions, it exhibits several features that render it less than completely satisfactory. The first is that the hybrid algorithm is slow to converge. Intuitively this is explainable because the adaptive elements appear in several blocks in different forms. Another troublesome feature, particularly in the system identification mode, is that both A(z,n) and 1/A(z,n) are used. If this element has either a large or a small gain, there is considerable chance that the internal signals will require a large dynamic range, which could result in numerical instability. A third property is a problem with arithmetic quantization error, relative to the large dynamic range requirement of the structure.


Experiments with this algorithm on a Texas Instruments TMS320-series fixed-point processor revealed that the algorithm may stop adapting prematurely when small signals in the system are quantized to zero [1.15]. Recently an investigation has produced some encouraging results on the use of accelerated optimization algorithms to achieve better numerical conditioning, which should lead to overall better performance [1.39].

1.7 New Horizons in Adaptive Signal Processing

This chapter presented a brief overview of the current state-of-the-art in adaptive signal processing, with particular effort devoted to developing terminology and providing background that is essential for the following chapters of the book. The interdisciplinary nature of the field of adaptive signal processing was emphasized in the hope that a broad view of the subject will serve the reader well in terms of integrating well known and very powerful results from the fields of signal processing, automatic control, and numerical analysis. We are now prepared to embark on a study of four related adaptive signal processing areas that will take us beyond the conventional topics that were introduced in this chapter.

Chapter 2 presents a study of Advanced Algorithms for 1-D Adaptive Filtering. The chapter starts with some conventional concepts for 1-D adaptive filters, which are then extended beyond the scope of the current state-of-the-art. The first topic in Chapter 2 involves new data re-using algorithms for 1-D FIR adaptive filters. The idea of re-using input data to cycle the internal parameter update calculation at a higher rate than the input data rate is not a new one, but the study leads to several new data re-using algorithms which strive to introduce minimal correlation effects when re-using old data. The new data re-using algorithms are based on the LMS algorithm, from which they inherit O[N] computational simplicity. The second topic in Chapter 2 is the use of pseudorandom modulation as a preconditioning technique to maintain persistent excitation, which will improve convergence rates in adaptive filters that must operate with less than ideal input signals. The idea of conditioning data sequences by scrambling in order to better train adaptive equalizers has been known for some time, and it does indeed represent a state-of-the-art technique for high speed modems designed to operate on commercial binary communication channels. However, the notion of employing direct sequence spread spectrum techniques to spread the spectral energy of ill-conditioned narrowband input signals is a novel concept. PR modulation techniques can be used with both FIR and IIR adaptive filters, although it was discovered that these techniques are not completely general because they require access to certain signals that may not be directly accessible in some applications. The last two sections of Chapter 2 deal with acceleration and preconditioning techniques that fall within the general category of quasi-Newton optimization strategies. A considerable amount of attention is given to the Preconditioned Conjugate Gradient Method for both the FIR and IIR cases. These optimization strategies are combined with block processing algorithms in an effort to gain computational efficiency.


Chapter 3 presents a comprehensive treatment of Structures and Algorithms for Two-Dimensional Adaptive Signal Processing. Much of the material in Chapter 3 will initially appear to be the successful extension of well known 1-D techniques to the 2-D case. While this is true in many cases, the computational complexity in two dimensions grows so rapidly with filter size that, while these extensions work from a numerical analysis point of view, they simply require too much computation to be considered useful in practical applications. Chapter 3 includes material on both FIR and IIR 2-D adaptive filter structures. Surprisingly, many of the experiments performed with 2-D IIR filters worked well, and the filters remained well behaved in spite of potential pitfalls for instability. The appendix at the end of the book is a continuation of the material from Chapter 3, in which a rather difficult error surface study is undertaken for 2-D IIR filters in an effort to determine whether Stearns' conjecture is valid in 2-D. All indications are that the conjecture does hold up in two dimensions, although the complexity of the mathematical analysis precluded a proof for the general 2-D case.

Chapter 4 introduces the concept of Adaptive Fault Tolerance (AFT), which is probably the most novel of the special topics treated in the book. The basic concept of AFT is straightforward: if a hardware failure occurs in a real-time adaptive system, the malfunction will certainly cause the output error to increase. However, the filter responds to an increase in the output error by readjusting all of its fault-free parameters in order to bring the error back down to a minimum value. It is demonstrated that if hardware redundancy is properly designed into the adaptive system, adaptive fault tolerance can indeed be made to work properly for certain classes of hardware failures. This is an exciting research area that has a great deal of future potential. At this time the theory of adaptive fault tolerance is still in its early stages, being limited mostly by the simplistic set of fault models that have been incorporated into the current designs. Adaptive fault tolerance is developed in Chapter 4 for FIR filters using the LMS algorithm. Some of these ideas have been successfully extended to 2-D FIR filters, as well as to 1-D IIR filters, but the scope of this book does not permit the inclusion of the most recent results on this topic.

Finally, Chapter 5 presents a consolidated treatment of many scattered results on Polynomial Adaptive Filters based largely on the Volterra model. The material in this chapter is largely tutorial, as it closely tracks many of the developments in the references and tries to place recent developments into a cohesive framework. Much of Chapter 5 is easy reading; however, the sections on the Recursive Least Squares Lattice and the QR Decomposition Based Least Squares Lattice become long and quite complicated. Polynomial adaptive filter theory is still in its infancy, so there appear to be great opportunities for conducting future research in this area.

1.8 Notation and Conventions

Most of the terminology and notation used in this book is defined within the context of the discussion where it is used. In general, the input signal to an adaptive


system is denoted by the sequence x(n), while the corresponding output is denoted by y(n). In most instances the desired response, or the training signal, is denoted as d(n). For FIR filters the coefficients are generally ordered as a vector that is denoted by w(n). In order to refer to the coefficients of the numerator and denominator terms of an IIR digital filter, the vector notation a(n) is typically used for the denominator coefficients, while b(n) refers to the numerator coefficients. Sometimes the numerator and denominator coefficients are treated as elements of a single vector, in which case they are combined and represented by w(n) = [aT(n), bT(n)]T. The notation J(n) is used for the cost function, E[ . ] denotes a statistical expected value, and O[ . ] denotes order of complexity, according to conventional usage.

Boldface variables are used to denote vectors or matrices. There is no attempt to distinguish notationally between vectors and matrices; rather, the point of view assumed is that a vector is simply an N x 1 matrix. The dimensionality of all matrices and vectors should be obvious from the context in which they are used.

The autocorrelation function appears repeatedly throughout the entire book, where it is always denoted by the matrix R. Often the autocorrelation will be left without a subscript if its meaning is clear within the context of its use. At other times it will be subscripted with a single variable, such as Rx, to emphasize that the correlation matrix refers to the specific variable x(n). In rare cases, and in particular when autocorrelation and crosscorrelation matrices are discussed simultaneously for IIR filters, the autocorrelation and crosscorrelation matrices will be double subscripted to keep their identities clear, i.e., the input autocorrelation matrix is denoted as Rxx(n), the output autocorrelation as Ryy(n), and the input-output crosscorrelation as Rxy(n).
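As a small illustration of these conventions (the specific sizes and variable names below are arbitrary choices, not part of the text), the combined coefficient vector and a sample estimate of Rxx might be formed as follows in Python.

import numpy as np

Na, Nb = 2, 3
a = np.zeros(Na)                        # denominator coefficient vector a(n)
b = np.zeros(Nb)                        # numerator coefficient vector b(n)
w = np.concatenate((a, b))              # w(n) = [aT(n), bT(n)]T

N, L = 5000, 4                          # data length and correlation matrix size
x = np.random.randn(N)
X = np.array([x[n - L + 1:n + 1][::-1] for n in range(L - 1, N)])
Rxx = (X.T @ X) / X.shape[0]            # sample estimate of Rxx = E[x(n) xT(n)]
print(w.shape, Rxx.shape)               # (5,) and (4, 4)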

In many places throughout the text we speak of the transfer function of an adaptive system, denoting it as H(z,n). We also refer to the poles and zeros of adaptive filters, and in general we borrow liberally from the concepts, definitions, and terminology of linear time invariant systems to describe in intuitive terms the behavior of adaptive systems, which are truly time varying. We realize very well that many of these concepts, which are the standard tools for analyzing and describing time invariant systems, are not strictly well defined for time varying systems. The justification for our liberal usage of these concepts is that we are always working with an underlying assumption that the adaptation rate of the system is slow relative to the range of frequency content of the signals that pass through and are processed by these systems. We are invoking the frozen parameter model from the field of automatic control. The advantage gained by leaning heavily on our well developed knowledge of linear time invariant systems is well worth any criticism we may endure for being slightly inaccurate with some of the notions. Our goal is to develop an understanding of difficult concepts and to press forth with mathematical analysis of intractable problems as best we can. Perhaps after we have developed a deeper understanding of the advanced concepts and operating principles of adaptive systems, we will be able to return to the task at hand and fine tune the mathematics with rigor and precision.


A final word about referencing: each chapter contains its own set of references immediately at the end of the chapter. In general the citations in a given chapter refer to the list at the end of that chapter. The only exception to this rule is that we have placed many of the fundamental references of the field in Chapter 1 so that all later chapters can refer back to the Chapter 1 reference list, thereby eliminating many redundant listings. However, except for referring back to Chapter 1, we do not cross-reference the other chapters for fear of creating too much confusion. If two of the later chapters need to reference the same source, and if that article is not a Chapter 1 reference, then it is simply listed with each chapter.

References

[1.1] H. Ardalan and S. T. Alexander, "Fixed-point roundoff error analysis of the exponentially windowed RLS algorithm for time-varying systems," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, no. 6, pp. 770-783, June 1987.

[1.2] M. G. Bellanger, Adaptive Digital Filters and Signal Analysis, Marcel Dekker, New York and Basel, 1987.

[1.3] J. M. Cioffi, "Limited-precision effects in adaptive filtering," IEEE Trans. Circuits Syst., vol. CAS-34, no. 7, pp. 821-833, July 1987.

[1.4] B. Cowan and P. Grant, Adaptive Filters, Prentice-Hall, Englewood Cliffs, NJ, 1987.

[1.5] M. Dentino, J. McCool, and B. Widrow, "Adaptive filtering in the frequency domain," Proc. IEEE, vol. 66, pp. 1658-1659, Dec. 1978.

[1.6] H. Fan, "New adaptive IIR filtering algorithms," Ph.D. dissertation, University of Illinois at Urbana-Champaign, Urbana, IL, 1986.

[1.7] H. Fan and W. K. Jenkins, "A new adaptive IIR filter," IEEE Trans. Circuits Syst., vol. CAS-33, no. 10, pp. 939-947, October 1986.

[1.8] P. L. Feintuch, "An adaptive recursive LMS filter," Proc. IEEE, vol. 64, no. 11, pp. 1622-1624, Nov. 1976.

[1.9] A. A. Giordano and F. M. Hsu, Least Squares Estimation with Applications to Digital Signal Processing, Wiley and Sons, New York, 1985.

[1.10] R. D. Gitlin and F. R. Magee, Jr., "Self-orthogonalizing adaptive equalization algorithms," IEEE Trans. Commun., vol. COM-25, no. 7, pp. 666-672, July 1977.


[1.11] G. C. Goodwin and K. S. Sin, Adaptive Filtering, Prediction, and Control, Prentice-Hall, Englewood Cliffs, NJ, 1984.

[1.12] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1991.

[1.13] M. L. Honig and D. G. Messerschmitt, Adaptive Filters: Structures, Algorithms, and Applications, Kluwer Academic Press, Boston, MA, 1984.

[1.14] A. W. Hull, "Orthogonalization techniques for adaptive filters," Ph.D. dissertation, University of Illinois, Urbana-Champaign, IL, 1994.

[1.15] M. N. Kloos, "The investigation of several adaptive filtering algorithms for telecommunications echo cancellation implemented in TMS32010 fixed point assembly code," M.S. thesis, Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, 1988.

[1.16] J. C. Lee and C. K. Un, "Performance of transform domain LMS adaptive filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 499-510, June 1986.

[1.17] D. W. Lin, "On digital implementations of the fast Kalman algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, no. 5, pp. 998-1005, Oct. 1984.

[1.18] F. Ling, D. Manolakis, and J. G. Proakis, "A recursive modified Gram-Schmidt algorithm for least squares estimation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 4, pp. 829-836, Aug. 1986.

[1.19] L. Ljung and T. Soderstrom, Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.

[1.20] D. G. Luenberger, Linear and Nonlinear Programming, second ed., Addison-Wesley Publishing Co., Reading, MA, 1984.

[1.21] D. F. Marshall, "Computationally efficient techniques for rapid convergence of adaptive digital filters," Ph.D. dissertation, University of Illinois, Urbana-Champaign, IL, 1988.

[1.22] D. F. Marshall, W. K. Jenkins, and J. J. Murphy, "The use of orthogonal transforms for improving performance of adaptive filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-36, no. 4, pp. 474-484, April 1989.

[1.23] D. F. Marshall and W. K. Jenkins, "A fast quasi-Newton adaptive filteringalgorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-40,no. 7, pp. 1652-1662, July 1992.

[1.24] J. G. McWhirter, "Recursive least-squares minimization using a systolic array," Proc. SPIE, Int. Soc. Opt. Eng., vol. 431, pp. 105-112, August 1983.

[1.25] B. Mulgrew and C. Cowan, Adaptive Filters and Equalizers, KluwerAcademic Publishers, Boston, 1988.

[1.26] S. S. Narayan, A. M. Peterson, and M. J. Narasima, "Transform domainLMS algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol.ASSP-34, pp. 499-510, June 1986.

[1.27] M. Nayeri, "Improvements in adaptive filtering theory and application,"Ph.D. dissertation, Univ. of Illinois at Urbana-Champaign, Urbana, IL,1988.

[1.28] M. Nayeri and W. K. Jenkins, "Alternate realizations to adaptive IIR filters and properties of their performance surfaces," IEEE Trans. Circuits Syst., vol. CAS-36, no. 4, pp. 485-496, April 1989.

[1.29] A. Papoulis, Probability, Random Variables , and Stochastic Processes,second ed., McGraw-Hill, New York, 1984.

[1.30] P. A. Regalia, Adaptive IIR Filtering in Signal Processing and Control,Marcel Dekker, Inc., New York, 1995.

[1.31] B. A. Schnaufer, "Development and analysis of an adaptive two-dimensional joint process estimator," M.S. thesis, Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, 1990.

[1.32] B. A. Schnaufer, "Practical techniques for rapid and reliable real-time adaptive filtering," Ph.D. dissertation, University of Illinois at Urbana-Champaign, Urbana, IL, 1995.

[1.33] A. M. Sequeira, "Adaptive two dimensional RLS algorithms," M.S. thesis, Department of Electrical and Computer Engineering, Naval Postgraduate School, Monterey, CA, March 1989.


[1.34] A. M. Sequeira and C. W. Therrien , "A new 2-D fast RLS algorithm,"Proceedings of the 1990 Conference on Acoustics, Speech, and SignalProcessing, Albuquerque, NM, April 1990.

[1.35] J. M. Shapiro, "Algorithms and systolic architectures for real-timemultidimensional adaptive filtering of frequency domain multiplexed videosignals," Ph.D. dissertation, Mass. Inst. Tech., Cambridge, MA, 1990.

[1.36] J. J. Shynk, "A complex adaptive algorithm for IIR filtering," IEEE Trans.Acoust. Sp. Sig. Proc., vol. ASSP-34, no. 5, pp. 1342-1344, Oct. 1986.

[1.37] J. J. Shynk, "Adaptive IIR filtering using parallel-form realizations," IEEETrans. Acoust. Sp. Sig. Proc., vol. ASSP-37, no. 4, pp. 519-533, Apr.1989.

[1.38] J. Shynk, "Adaptive IIR filtering," IEEE ASSP Magazine, April 1989.

[1.39] R. Soni, "Fast converging adaptive IIR algorithms," M.S. thesis, Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, 1995.

[1.40] J. C. Strait, "A two-dimensional adaptive digital filter based on the McClellan transformation," M.S. thesis, Univ. of Illinois at Urbana-Champaign, Urbana, IL, 1989.

[1.41] J. C. Strait, "Structures and algorithms for two-dimensional adaptive signal processing," Ph.D. dissertation, Univ. of Illinois at Urbana-Champaign, Urbana, IL, 1995.

[1.42] J. Treichler, C. R. Johnson, Jr., and M. Larimore, Theory and Design ofAdaptive Filters. New York: Wiley, 1987.

[1.43] B. Widrow and S. D. Stearns, Adaptive Signal Processing. EnglewoodCliffs, NJ: Prentice-Hall, 1985.

[1.44] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary TimeSeries, Wiley and Sons, New York, 1949.