ELEG-636: Statistical Signal Processing
Gonzalo R. Arce
Department of Electrical and Computer Engineering, University of Delaware
Spring 2010
Course Objectives & Structure

Objective: Given a discrete-time sequence {x(n)}, develop
- Statistical and spectral signal representations
- Filtering, prediction, and system identification algorithms
- Optimization methods that are statistical and adaptive

Course Structure:
- Weekly lectures [notes: www.ece.udel.edu/~arce]
- Periodic homework (theory & Matlab implementations) [15%]
- Midterm & Final examinations [85%]

Textbook: Haykin, Adaptive Filter Theory.
Broad applications in communications, imaging, and sensors. Emerging applications in:
- Brain-imaging techniques
- Brain-machine interfaces
- Implantable devices

Neurofeedback presents real-time physiological signals from MRIs in a visual or auditory form to provide information about brain activity. These signals are used to train the patient to alter neural activity in a desired direction. Traditionally, feedback using EEGs or other mechanisms has not focused on the brain because the resolution is not good enough.
Adaptive Optimization and Filtering Methods

Motivation
Adaptive optimization and filtering methods are appropriate, advantageous, or necessary when:
- Signal statistics are not known a priori and must be "learned" from observed or representative samples
- Signal statistics evolve over time
- Time or computational restrictions dictate that simple, if repetitive, operations be employed rather than solving more complex, closed-form expressions

The following algorithms are considered:
- Steepest Descent (SD) – deterministic
- Least Mean Squares (LMS) – stochastic
- Recursive Least Squares (RLS) – deterministic
Steepest Descent

Definition (Steepest Descent (SD))
Steepest descent, also known as gradient descent, is an iterative technique for finding a local minimum of a function.
Approach: Given an arbitrary starting point, the current location (value) is moved in steps proportional to the negative of the gradient at the current point.

- SD is an old, deterministic method that is the basis for stochastic gradient-based methods
- SD is a feedback approach to finding a local minimum of an error performance surface
- The error surface must be known a priori
- In the MSE case, SD converges to the optimal solution, w₀ = R⁻¹p, without inverting a matrix

Question: Why in the MSE case does this converge to the global minimum rather than a local minimum?
Example
Consider a well-structured cost function with a single minimum. The optimization proceeds as follows:
[Contour plot showing the evolution of the optimization]
Example
Consider a gradient ascent example in which there are multiple minima/maxima:
[Surface plot showing the multiple minima and maxima; contour plot illustrating that the final result depends on the starting value]
To derive the approach, consider the FIR case:
- {x(n)} – the WSS input samples
- {d(n)} – the WSS desired output
- d̂(n) – the estimate of the desired signal, given by

d̂(n) = w^H(n)x(n)

where
x(n) = [x(n), x(n−1), ···, x(n−M+1)]^T   [observation vector]
w(n) = [w₀(n), w₁(n), ···, w_{M−1}(n)]^T   [time-indexed filter coefficients]
Then, similarly to previously considered cases,

e(n) = d(n) − d̂(n) = d(n) − w^H(n)x(n)

and the MSE at time n is

J(n) = E{|e(n)|²} = σ_d² − w^H(n)p − p^H w(n) + w^H(n)Rw(n)

where
- σ_d² – variance of the desired signal
- p – cross-correlation between x(n) and d(n)
- R – correlation matrix of x(n)

Note: The weight vector and cost function are time indexed (functions of time).
When w(n) is set to the (optimal) Wiener solution,

w(n) = w₀ = R⁻¹p

and

J(n) = J_min = σ_d² − p^H w₀

Use the method of steepest descent to iteratively find w₀. The optimal result is achieved since the cost function is a second-order polynomial with a single unique minimum.
Example
Let M = 2. The MSE is a bowl-shaped surface, which is a function of the 2-D weight vector w(n). A contour of the MSE is given as:

[Surface plot of J(w) over (w₁, w₂); contour plot showing the negative gradient −∇J(w), with components −∂J/∂w₁ and −∂J/∂w₂]

Imagine dropping a marble at any point on the bowl-shaped surface. The ball will reach the minimum point by going through the path of steepest descent.
Observation: Set the direction of the filter update as −∇J(n). Resulting update:

w(n+1) = w(n) + ½µ[−∇J(n)]

or, since ∇J(n) = −2p + 2Rw(n),

w(n+1) = w(n) + µ[p − Rw(n)],   n = 0, 1, 2, ···

where w(0) = 0 (or another appropriate value) and µ is the step size.
Observation: SD uses feedback, which makes it possible for the system to be unstable.
- Bounds on the step size guaranteeing stability can be determined with respect to the eigenvalues of R (Widrow, 1970)
Convergence Analysis

Define the error vector for the tap weights as

c(n) = w(n) − w₀

Then, using p = Rw₀ in the update,

w(n+1) = w(n) + µ[p − Rw(n)]
       = w(n) + µ[Rw₀ − Rw(n)]
       = w(n) − µRc(n)

and subtracting w₀ from both sides,

w(n+1) − w₀ = w(n) − w₀ − µRc(n)
⇒ c(n+1) = c(n) − µRc(n) = [I − µR]c(n)
Using the unitary similarity transform

R = QΛQ^H

we have

c(n+1) = [I − µR]c(n) = [I − µQΛQ^H]c(n)
⇒ Q^H c(n+1) = [Q^H − µQ^H QΛQ^H]c(n) = [I − µΛ]Q^H c(n)   (∗)

Define the transformed coefficients as

v(n) = Q^H c(n) = Q^H(w(n) − w₀)

Then (∗) becomes

v(n+1) = [I − µΛ]v(n)
Consider the initial condition of v(n):

v(0) = Q^H(w(0) − w₀) = −Q^H w₀   [if w(0) = 0]

Consider the kth term (mode) in

v(n+1) = [I − µΛ]v(n)

- Note [I − µΛ] is diagonal
- Thus all modes are updated independently
- The update for the kth term can be written as

v_k(n+1) = (1 − µλ_k)v_k(n),   k = 1, 2, ···, M

or, using recursion,

v_k(n) = (1 − µλ_k)ⁿ v_k(0)
Observation: Convergence to the optimal solution requires

lim_{n→∞} w(n) = w₀
⇒ lim_{n→∞} c(n) = lim_{n→∞} w(n) − w₀ = 0
⇒ lim_{n→∞} v(n) = lim_{n→∞} Q^H c(n) = 0
⇒ lim_{n→∞} v_k(n) = 0,   k = 1, 2, ···, M   (∗)

Result: According to the recursion

v_k(n) = (1 − µλ_k)ⁿ v_k(0)

the limit in (∗) holds if and only if

|1 − µλ_k| < 1 for all k

Thus, since the eigenvalues are nonnegative, 0 < µλ_max < 2, or

0 < µ < 2/λ_max
Observation: The kth mode has geometric decay:

v_k(n) = (1 − µλ_k)ⁿ v_k(0)

The rate of decay is characterized by the time it takes to decay to e⁻¹ of the initial value. Let τ_k denote this time for the kth mode:

v_k(τ_k) = (1 − µλ_k)^{τ_k} v_k(0) = e⁻¹ v_k(0)
⇒ e⁻¹ = (1 − µλ_k)^{τ_k}
⇒ τ_k = −1/ln(1 − µλ_k) ≈ 1/(µλ_k)   for µ ≪ 1

Result: The overall rate of decay is bounded by

−1/ln(1 − µλ_max) ≤ τ ≤ −1/ln(1 − µλ_min)
Example
Consider the typical behavior of a single mode.
Error Analysis
Recall that

J(n) = J_min + (w(n) − w₀)^H R (w(n) − w₀)
     = J_min + (w(n) − w₀)^H QΛQ^H (w(n) − w₀)
     = J_min + v^H(n)Λv(n)
     = J_min + Σ_{k=1}^{M} λ_k |v_k(n)|²   [substituting v_k(n) = (1 − µλ_k)ⁿ v_k(0)]
     = J_min + Σ_{k=1}^{M} λ_k (1 − µλ_k)^{2n} |v_k(0)|²

Result: If 0 < µ < 2/λ_max, then

lim_{n→∞} J(n) = J_min
Example: Predictor

Example
Consider a two-tap predictor for real-valued input.
Analyze the effects of the following cases:
- Varying the eigenvalue spread χ(R) = λ_max/λ_min while keeping µ fixed
- Varying µ while keeping the eigenvalue spread χ(R) fixed
SD loci plots (with J(n) contours shown) as a function of [v₁(n), v₂(n)] for step size µ = 0.3:
- Eigenvalue spread χ(R) = 1.22: small eigenvalue spread ⇒ modes converge at a similar rate
- Eigenvalue spread χ(R) = 3: moderate eigenvalue spread ⇒ modes converge at moderately similar rates
SD loci plots (with J(n) contours shown) as a function of [v₁(n), v₂(n)] for step size µ = 0.3:
- Eigenvalue spread χ(R) = 10: large eigenvalue spread ⇒ modes converge at different rates
- Eigenvalue spread χ(R) = 100: very large eigenvalue spread ⇒ modes converge at very different rates; principal-direction convergence is fastest
SD loci plots (with J(n) contours shown) as a function of [w₁(n), w₂(n)] for step size µ = 0.3:
- Eigenvalue spread χ(R) = 1.22: small eigenvalue spread ⇒ modes converge at a similar rate
- Eigenvalue spread χ(R) = 3: moderate eigenvalue spread ⇒ modes converge at moderately similar rates
SD loci plots (with J(n) contours shown) as a function of [w₁(n), w₂(n)] for step size µ = 0.3:
- Eigenvalue spread χ(R) = 10: large eigenvalue spread ⇒ modes converge at different rates
- Eigenvalue spread χ(R) = 100: very large eigenvalue spread ⇒ modes converge at very different rates; principal-direction convergence is fastest
Learning curves of the steepest-descent algorithm with step-size parameter µ = 0.3 and varying eigenvalue spread.
SD loci plots (with J(n) contours shown) as a function of [v₁(n), v₂(n)] with χ(R) = 10 and varying step sizes:
- Step size µ = 0.3: over-damped ⇒ slow convergence
- Step size µ = 1: under-damped ⇒ fast (erratic) convergence
SD loci plots (with J(n) contours shown) as a function of [w₁(n), w₂(n)] with χ(R) = 10 and varying step sizes:
- Step size µ = 0.3: over-damped ⇒ slow convergence
- Step size µ = 1: under-damped ⇒ fast (erratic) convergence
Example
Consider a system identification problem:

[Block diagram: the input {x(n)} drives both the unknown system, producing d(n), and the adaptive filter w(n), producing d̂(n); the error e(n) = d(n) − d̂(n) drives the adaptation]

Suppose M = 2 and

R_x = [1 0.8; 0.8 1],   p = [0.8; 0.5]
From eigen-analysis we have

λ₁ = 1.8, λ₂ = 0.2 ⇒ µ < 2/1.8

Also,

q₁ = (1/√2)[1; 1],   q₂ = (1/√2)[1; −1]

and

Q = (1/√2)[1 1; 1 −1]

Also,

w₀ = R⁻¹p = [1.11; −0.389]
Thus

v(n) = Q^H[w(n) − w₀]

Noting that

v(0) = −Q^H w₀ = −(1/√2)[1 1; 1 −1][1.11; −0.389] = [0.51; 1.06]

we have

v₁(n) = (1 − µ(1.8))ⁿ · 0.51
v₂(n) = (1 − µ(0.2))ⁿ · 1.06
SD convergence properties for two µ values:
- Step size µ = 0.5: over-damped ⇒ slow convergence
- Step size µ = 1: under-damped ⇒ fast (erratic) convergence
Least Mean Squares (LMS)

Definition (Least Mean Squares (LMS) Algorithm)
Motivation: The error performance surface used by the SD method is not always known a priori.
Solution: Use estimated values. We will use the following instantaneous estimates:

R̂(n) = x(n)x^H(n)
p̂(n) = x(n)d*(n)

Result: The estimates are random variables, and thus this leads to a stochastic optimization.

Historical Note: Invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.
Recall the SD update

w(n+1) = w(n) + ½µ[−∇J(n)]

where the gradient of the error surface at w(n) was shown to be

∇J(n) = −2p + 2Rw(n)

Using the instantaneous estimates,

∇̂J(n) = −2x(n)d*(n) + 2x(n)x^H(n)w(n)
       = −2x(n)[d*(n) − x^H(n)w(n)]
       = −2x(n)[d*(n) − d̂*(n)]
       = −2x(n)e*(n)

where e*(n) is the complex conjugate of the estimation error.
Utilizing ∇̂J(n) = −2x(n)e*(n) in the update:

w(n+1) = w(n) + ½µ[−∇̂J(n)] = w(n) + µx(n)e*(n)   [LMS update]

- The LMS algorithm belongs to the family of stochastic gradient algorithms
- The update is extremely simple
- Although the instantaneous estimates may have large variance, the LMS algorithm is recursive and effectively averages these estimates
- The simplicity and good performance of the LMS algorithm make it the benchmark against which other optimization algorithms are judged; a minimal sketch follows
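A minimal implementation sketch of the LMS recursion for an M-tap filter (function and variable names here are illustrative, not from the course notes):

```matlab
% LMS adaptive filter (sketch). x: input column vector, d: desired, M: taps.
function [w, e] = lms_sketch(x, d, M, mu)
    N = length(x);
    w = zeros(M, 1);                     % w(0) = 0
    e = zeros(N, 1);
    for n = M:N
        xn   = x(n:-1:n-M+1);            % x(n) = [x(n), ..., x(n-M+1)]^T
        e(n) = d(n) - w' * xn;           % e(n) = d(n) - w^H(n) x(n)
        w    = w + mu * xn * conj(e(n)); % w(n+1) = w(n) + mu x(n) e*(n)
    end
end
```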
Convergence Analysis

Independence Theorem
The following conditions hold:
1. The vectors x(1), x(2), ···, x(n) are statistically independent
2. x(n) is independent of d(1), d(2), ···, d(n−1)
3. d(n) is statistically dependent on x(n), but is independent of d(1), d(2), ···, d(n−1)
4. x(n) and d(n) are mutually Gaussian

- The independence theorem is invoked in the LMS algorithm analysis
- The independence theorem is justified in some cases, e.g., beamforming, where we receive independent vector observations
- In other cases it is not well justified, but it allows the analysis to proceed (i.e., when all else fails, invoke simplifying assumptions)
We will invoke the independence theorem to show that w(n) converges to the optimal solution in the mean:

lim_{n→∞} E{w(n)} = w₀

To prove this, evaluate the update:

w(n+1) = w(n) + µx(n)e*(n)
⇒ w(n+1) − w₀ = w(n) − w₀ + µx(n)e*(n)
⇒ c(n+1) = c(n) + µx(n)(d*(n) − x^H(n)w(n))
          = c(n) + µx(n)d*(n) − µx(n)x^H(n)[w(n) − w₀ + w₀]
          = c(n) + µx(n)d*(n) − µx(n)x^H(n)c(n) − µx(n)x^H(n)w₀
          = [I − µx(n)x^H(n)]c(n) + µx(n)[d*(n) − x^H(n)w₀]
          = [I − µx(n)x^H(n)]c(n) + µx(n)e₀*(n)
Take the expectation of the update, noting that
- w(n) is based on past inputs and desired values
- ⇒ w(n), and consequently c(n), are independent of x(n) (independence theorem)

Thus

c(n+1) = [I − µx(n)x^H(n)]c(n) + µx(n)e₀*(n)
⇒ E{c(n+1)} = (I − µR)E{c(n)} + µE{x(n)e₀*(n)}   [second term = 0, why?]
            = (I − µR)E{c(n)}

Using arguments similar to the SD case, we have

lim_{n→∞} E{c(n)} = 0   if 0 < µ < 2/λ_max

or equivalently

lim_{n→∞} E{w(n)} = w₀   if 0 < µ < 2/λ_max
Noting that Σ_{i=1}^{M} λ_i = trace[R],

⇒ λ_max ≤ trace[R] = M r(0) = M σ_x²

Thus a more conservative bound (and one easier to determine) is

0 < µ < 2/(M σ_x²)

Convergence in the mean,

lim_{n→∞} E{w(n)} = w₀

is a weak condition that says nothing about the variance, which may even grow. A stronger condition is convergence in the mean square, which says

lim_{n→∞} E{|c(n)|²} = constant
Proving convergence in the mean square is equivalent to showing that

lim_{n→∞} J(n) = lim_{n→∞} E{|e(n)|²} = constant

To evaluate the limit, write e(n) as

e(n) = d(n) − d̂(n) = d(n) − w^H(n)x(n)
     = d(n) − w₀^H x(n) − [w^H(n) − w₀^H]x(n)
     = e₀(n) − c^H(n)x(n)

Thus

J(n) = E{|e(n)|²} = E{(e₀(n) − c^H(n)x(n))(e₀*(n) − x^H(n)c(n))}
     = J_min + E{c^H(n)x(n)x^H(n)c(n)}   [cross terms → 0, why?]
     = J_min + J_ex(n)

where J_ex(n) is the excess MSE.
Since J_ex(n) is a scalar,

J_ex(n) = E{c^H(n)x(n)x^H(n)c(n)}
        = E{trace[c^H(n)x(n)x^H(n)c(n)]}
        = E{trace[x(n)x^H(n)c(n)c^H(n)]}
        = trace[E{x(n)x^H(n)c(n)c^H(n)}]

Invoking the independence theorem,

J_ex(n) = trace[E{x(n)x^H(n)}E{c(n)c^H(n)}] = trace[RK(n)]

where K(n) = E{c(n)c^H(n)}.
Thus

J(n) = J_min + J_ex(n) = J_min + trace[RK(n)]

Recall

Q^H R Q = Λ, or R = QΛQ^H

Set

S(n) ≜ Q^H K(n) Q

where S(n) need not be diagonal. Then

K(n) = QQ^H K(n) QQ^H   [since Q⁻¹ = Q^H]
     = QS(n)Q^H
Utilizing R = QΛQ^H and K(n) = QS(n)Q^H in the excess error expression:

J_ex(n) = trace[RK(n)]
        = trace[QΛQ^H QS(n)Q^H]
        = trace[QΛS(n)Q^H]
        = trace[Q^H QΛS(n)]
        = trace[ΛS(n)]

Since Λ is diagonal,

J_ex(n) = trace[ΛS(n)] = Σ_{i=1}^{M} λ_i s_i(n)

where s₁(n), s₂(n), ···, s_M(n) are the diagonal elements of S(n).
The previously derived recursion

E{c(n+1)} = (I − µR)E{c(n)}

can be modified to yield a recursion on S(n):

S(n+1) = (I − µΛ)S(n)(I − µΛ) + µ²J_min Λ

which for the diagonal elements is

s_i(n+1) = (1 − µλ_i)² s_i(n) + µ²J_min λ_i,   i = 1, 2, ···, M

Suppose J_ex(n) converges; then s_i(n+1) = s_i(n)

⇒ s_i(n) = (1 − µλ_i)² s_i(n) + µ²J_min λ_i
⇒ s_i(n) = µ²J_min λ_i/[1 − (1 − µλ_i)²] = µ²J_min λ_i/(2µλ_i − µ²λ_i²) = µJ_min/(2 − µλ_i),   i = 1, 2, ···, M
Consider again

J_ex(n) = trace[ΛS(n)] = Σ_{i=1}^{M} λ_i s_i(n)

Taking the limit and utilizing s_i(n) = µJ_min/(2 − µλ_i),

lim_{n→∞} J_ex(n) = J_min Σ_{i=1}^{M} µλ_i/(2 − µλ_i)

The LMS misadjustment is defined as

MA = lim_{n→∞} J_ex(n)/J_min = Σ_{i=1}^{M} µλ_i/(2 − µλ_i)

Note: A misadjustment of 10% or less is generally considered acceptable.
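For a sense of scale (illustrative numbers, not from the slides): a single-tap filter with µλ₁ = 0.1 has MA = 0.1/(2 − 0.1) ≈ 5.3%, comfortably within the 10% guideline, while µλ₁ = 0.4 gives MA = 0.4/1.6 = 25%, which would normally be rejected.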
Example: First-Order Predictor

Example
This is a one-tap predictor:

x̂(n) = w(n)x(n−1)

Take the underlying process to be a real first-order AR process:

x(n) = −a x(n−1) + v(n)

The weight update is

w(n+1) = w(n) + µx(n−1)e(n)   [LMS update for observation x(n−1)]
       = w(n) + µx(n−1)[x(n) − w(n)x(n−1)]
Since

x(n) = −a x(n−1) + v(n)   [AR model]

and

x̂(n) = w(n)x(n−1)   [one-tap predictor]

⇒ w₀ = −a

Note that

E{x(n−1)e₀(n)} = E{x(n−1)v(n)} = 0

which proves the optimality. Set µ = 0.05 and consider two cases:

  a        σ_x²
−0.99      0.93627
 0.99      0.995

A simulation sketch follows.
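A minimal sketch of this experiment (the driving-noise variance is an assumption, chosen so the process variance matches the listed σ_x²):

```matlab
% One-tap LMS predictor on an AR(1) process x(n) = -a*x(n-1) + v(n)
a = -0.99; mu = 0.05; N = 5000;
sigv2 = (1 - a^2) * 0.93627;   % assumed: noise variance matching sigma_x^2
v = sqrt(sigv2) * randn(N, 1);
x = filter(1, [1 a], v);       % AR(1) synthesis
w = 0;
for n = 2:N
    e = x(n) - w * x(n-1);     % prediction error
    w = w + mu * x(n-1) * e;   % LMS update; E{w(n)} -> w0 = -a
end
```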
Figure: Transient behavior of the adaptive first-order predictor weight w(n) for µ = 0.05.
Figure: Transient behavior of the adaptive first-order predictor squared error for µ = 0.05.
Figure: Mean-squared error learning curves for an adaptive first-order predictor with varying step-size parameter µ.
Consider the expected trajectory of w(n). Recall

w(n+1) = w(n) + µx(n−1)e(n)
       = w(n) + µx(n−1)[x(n) − w(n)x(n−1)]
       = [1 − µx(n−1)x(n−1)]w(n) + µx(n−1)x(n)

In this example, x(n) = −a x(n−1) + v(n). Substituting in:

w(n+1) = [1 − µx(n−1)x(n−1)]w(n) + µx(n−1)[−a x(n−1) + v(n)]
       = [1 − µx(n−1)x(n−1)]w(n) − µa x(n−1)x(n−1) + µx(n−1)v(n)

Taking the expectation and invoking the independence theorem,

E{w(n+1)} = (1 − µσ_x²)E{w(n)} − µσ_x² a
Figure: Comparison of experimental results with theory, based on w(n).
Next, derive a theoretical expression for J(n). Note that the initial value of J(n) is

J(0) = E{(x(0) − w(0)x(−1))²} = E{(x(0))²} = σ_x²

and the final value is

J(∞) = J_min + J_ex
     = E{(x(n) − w₀x(n−1))²} + J_ex
     = E{(v(n))²} + J_ex
     = σ_v² + J_min µλ₁/(2 − µλ₁)

Note λ₁ = σ_x². Thus,

J(∞) = σ_v² + σ_v²[µσ_x²/(2 − µσ_x²)] = σ_v²[1 + µσ_x²/(2 − µσ_x²)]
And if µ is small,

J(∞) = σ_v²[1 + µσ_x²/(2 − µσ_x²)] ≈ σ_v²[1 + µσ_x²/2]

Putting all the components together:

J(n) = [σ_x² − σ_v²(1 + (µ/2)σ_x²)]·(1 − µσ_x²)^{2n} + σ_v²(1 + (µ/2)σ_x²)

where the first bracketed factor is J(0) − J(∞), the term (1 − µσ_x²)^{2n} decays to 0, and the last term is J(∞).

Also, the time constant is

τ = −1/[2 ln(1 − µλ₁)] = −1/[2 ln(1 − µσ_x²)] ≈ 1/(2µσ_x²)
Figure: Comparison of experimental results with theory for the adaptive predictor, based on the mean-square error for µ = 0.001.
Example: Adaptive Equalization

Objective: Pass a known signal through an unknown channel and adapt a filter to invert the effects the channel and noise have on the signal.
The signal is a Bernoulli sequence:

x_n = +1 with probability 1/2, −1 with probability 1/2

The additive noise is ~ N(0, 0.001). The channel has a raised-cosine response:

h_n = (1/2)[1 + cos((2π/W)(n − 2))] for n = 1, 2, 3;  0 otherwise

⇒ W controls the eigenvalue spread χ(R)
⇒ h_n is symmetric about n = 2 and thus introduces a delay of 2
We will use an M = 11 tap filter, which is symmetric about n = 5 ⇒ it introduces a delay of 5.
Thus an overall delay of Δ = 5 + 2 = 7 is added to the system; a simulation sketch follows.
Adaptive Optimization (Steepest Descent, LMS, RLS Algorithms) Example: Adaptive Equalization
Channel response and filter response:
Figure: (a) Impulse response of the channel; (b) impulse response of the optimum transversal equalizer.
Consider three W values. Note that the step size is bounded by the W = 3.5 case:

µ ≤ 2/(M r(0)) = 2/(11 · 1.3022) = 0.14

Choose µ = 0.075 in all cases.
Figure: Learning curves of the LMS algorithm for an adaptive equalizer with number of taps M = 11, step-size parameter µ = 0.075, and varying eigenvalue spread χ(R).
Ensemble-average impulse response of the adaptive equalizer (after 1000 iterations) for each of four different eigenvalue spreads.
Figure: Learning curves of the LMS algorithm for an adaptive equalizer with the number of taps M = 11, fixed eigenvalue spread, and varying step-size parameter µ.
Example: LMS Directionality

Example
Directionality of the LMS algorithm:
- The speed of convergence of the LMS algorithm is faster in certain directions in the weight space
- If the convergence is in the appropriate direction, the convergence can be accelerated by increased eigenvalue spread

To investigate this phenomenon, consider the deterministic signal

x(n) = A₁cos(ω₁n) + A₂cos(ω₂n)

Even though it is deterministic, a correlation matrix can be determined:

R = (1/2)[A₁² + A₂²,  A₁²cos(ω₁) + A₂²cos(ω₂);  A₁²cos(ω₁) + A₂²cos(ω₂),  A₁² + A₂²]
Determining the eigenvalues and eigenvectors yields

λ₁ = (1/2)A₁²(1 + cos(ω₁)) + (1/2)A₂²(1 + cos(ω₂))
λ₂ = (1/2)A₁²(1 − cos(ω₁)) + (1/2)A₂²(1 − cos(ω₂))

and

q₁ = [1; 1],   q₂ = [−1; 1]

Case 1: A₁ = 1, A₂ = 0.5, ω₁ = 1.2, ω₂ = 0.1

x_a(n) = cos(1.2n) + 0.5cos(0.1n) and χ(R) = 2.9

Case 2: A₁ = 1, A₂ = 0.5, ω₁ = 0.6, ω₂ = 0.23

x_b(n) = cos(0.6n) + 0.5cos(0.23n) and χ(R) = 12.9
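A quick numerical check of the two spreads:

```matlab
% Eigenvalue spread chi(R) = lambda1/lambda2 for the sinusoidal input
chi = @(A1, A2, w1, w2) (A1^2*(1+cos(w1)) + A2^2*(1+cos(w2))) / ...
                        (A1^2*(1-cos(w1)) + A2^2*(1-cos(w2)));
chi_a = chi(1, 0.5, 1.2, 0.1);   % case 1: ~2.9
chi_b = chi(1, 0.5, 0.6, 0.23);  % case 2: ~12.9
```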
Since p is undefined, set p = λᵢqᵢ. Then, since p = Rw₀, we see (two cases):

p = λ₁q₁ ⇒ Rw₀ = λ₁q₁ ⇒ w₀ = q₁ = [1; 1]
p = λ₂q₂ ⇒ Rw₀ = λ₂q₂ ⇒ w₀ = q₂ = [−1; 1]

Utilize 200 iterations of the algorithm.
- Consider first the minimum eigenfilter, w₀ = q₂ = [−1; 1]
- Consider second the maximum eigenfilter, w₀ = q₁ = [1; 1]
Convergence of the LMS algorithm, for a deterministic sinusoidal process, along the "slow" eigenvector (i.e., the minimum eigenfilter):
[Left: input x_a(n), χ(R) = 2.9; right: input x_b(n), χ(R) = 12.9]
Convergence of the LMS algorithm, for a deterministic sinusoidal process, along the "fast" eigenvector (i.e., the maximum eigenfilter):
[Left: input x_a(n), χ(R) = 2.9; right: input x_b(n), χ(R) = 12.9]
Normalized LMS Algorithm

Observation: The LMS correction is proportional to µx(n)e*(n):

w(n+1) = w(n) + µx(n)e*(n)

- If x(n) is large, the LMS update suffers from gradient noise amplification
- The normalized LMS (NLMS) algorithm seeks to avoid gradient noise amplification

The step size is made time varying, µ(n), and optimized to minimize the next-step error:

w(n+1) = w(n) + ½µ(n)[−∇J(n)] = w(n) + µ(n)[p − Rw(n)]

Choose µ(n) such that w(n+1) produces the minimum MSE,

J(n+1) = E{|e(n+1)|²}
Let ∇(n) ≜ ∇J(n) and note

e(n+1) = d(n+1) − w^H(n+1)x(n+1)

Objective: Choose µ(n) such that it minimizes J(n+1).
The optimal step size, µ₀(n), will be a function of R and ∇(n) ⇒ use instantaneous estimates of these values.

To determine µ₀(n), expand J(n+1):

J(n+1) = E{e(n+1)e*(n+1)}
       = E{(d(n+1) − w^H(n+1)x(n+1))(d*(n+1) − x^H(n+1)w(n+1))}
       = σ_d² − w^H(n+1)p − p^H w(n+1) + w^H(n+1)Rw(n+1)
Now use the fact that w(n+1) = w(n) − ½µ(n)∇(n):

J(n+1) = σ_d² − [w(n) − ½µ(n)∇(n)]^H p − p^H[w(n) − ½µ(n)∇(n)]
         + [w(n) − ½µ(n)∇(n)]^H R [w(n) − ½µ(n)∇(n)]

where the quadratic term expands as

w^H(n)Rw(n) − ½µ(n)w^H(n)R∇(n) − ½µ(n)∇^H(n)Rw(n) + ¼µ²(n)∇^H(n)R∇(n)
J(n+1) = σ_d² − [w(n) − ½µ(n)∇(n)]^H p − p^H[w(n) − ½µ(n)∇(n)]
         + w^H(n)Rw(n) − ½µ(n)w^H(n)R∇(n) − ½µ(n)∇^H(n)Rw(n)
         + ¼µ²(n)∇^H(n)R∇(n)

Differentiating with respect to µ(n),

∂J(n+1)/∂µ(n) = ½∇^H(n)p + ½p^H∇(n) − ½w^H(n)R∇(n) − ½∇^H(n)Rw(n) + ½µ(n)∇^H(n)R∇(n)   (∗)
Setting (∗) equal to 0:

µ₀(n)∇^H(n)R∇(n) = w^H(n)R∇(n) − p^H∇(n) + ∇^H(n)Rw(n) − ∇^H(n)p

⇒ µ₀(n) = [w^H(n)R∇(n) − p^H∇(n) + ∇^H(n)Rw(n) − ∇^H(n)p]/[∇^H(n)R∇(n)]
        = {[w^H(n)R − p^H]∇(n) + ∇^H(n)[Rw(n) − p]}/[∇^H(n)R∇(n)]
        = {[Rw(n) − p]^H∇(n) + ∇^H(n)[Rw(n) − p]}/[∇^H(n)R∇(n)]
        = [½∇^H(n)∇(n) + ½∇^H(n)∇(n)]/[∇^H(n)R∇(n)]
        = ∇^H(n)∇(n)/[∇^H(n)R∇(n)]
Using the instantaneous estimates

R̂ = x(n)x^H(n) and p̂ = x(n)d*(n)

⇒ ∇(n) = 2[R̂w(n) − p̂]
       = 2[x(n)x^H(n)w(n) − x(n)d*(n)]
       = 2[x(n)(d̂*(n) − d*(n))]
       = −2x(n)e*(n)

Thus

µ₀(n) = ∇^H(n)∇(n)/[∇^H(n)R∇(n)]
      = [4x^H(n)x(n)|e(n)|²]/[4|e(n)|²(x^H(n)x(n))²]
      = 1/(x^H(n)x(n)) = 1/||x(n)||²
Result: The NLMS update is

w(n+1) = w(n) + [µ/||x(n)||²]x(n)e*(n)

where the time-varying step is µ(n) = µ/||x(n)||²; the constant µ is introduced to scale the update.
To avoid problems when ||x(n)||² ≈ 0, we add an offset:

w(n+1) = w(n) + [µ/(a + ||x(n)||²)]x(n)e*(n)

where a > 0. A sketch follows.
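A minimal NLMS sketch mirroring the LMS sketch given earlier (the offset a and all names are illustrative):

```matlab
% Normalized LMS (sketch); converges for 0 < mu < 2, with a > 0 for safety
function [w, e] = nlms_sketch(x, d, M, mu, a)
    N = length(x);
    w = zeros(M, 1);
    e = zeros(N, 1);
    for n = M:N
        xn   = x(n:-1:n-M+1);
        e(n) = d(n) - w' * xn;
        w    = w + (mu / (a + xn' * xn)) * xn * conj(e(n)); % normalized step
    end
end
```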
NLMS Convergence

Objective: Analyze the NLMS convergence:

w(n+1) = w(n) + [µ/||x(n)||²]x(n)e*(n)

Substituting e(n) = d(n) − w^H(n)x(n),

w(n+1) = w(n) + [µ/||x(n)||²]x(n)[d*(n) − x^H(n)w(n)]
       = [I − µx(n)x^H(n)/||x(n)||²]w(n) + µx(n)d*(n)/||x(n)||²
Objective: Compare the NLMS and LMS algorithms.

NLMS:  w(n+1) = [I − µx(n)x^H(n)/||x(n)||²]w(n) + µx(n)d*(n)/||x(n)||²
LMS:   w(n+1) = [I − µx(n)x^H(n)]w(n) + µx(n)d*(n)

By observation, we see the following corresponding terms:

LMS              NLMS
µ                µ
x(n)x^H(n)       x(n)x^H(n)/||x(n)||²
x(n)d*(n)        x(n)d*(n)/||x(n)||²
LMS case:

0 < µ < 2/trace[E{x(n)x^H(n)}] = 2/trace[R]

guarantees stability. By analogy,

0 < µ < 2/trace[E{x(n)x^H(n)/||x(n)||²}]

guarantees stability of the NLMS.
To analyze the bound, make the following approximation:

E{x(n)x^H(n)/||x(n)||²} ≈ E{x(n)x^H(n)}/E{||x(n)||²}

Then

trace[E{x(n)x^H(n)/||x(n)||²}] ≈ trace[E{x(n)x^H(n)}]/E{||x(n)||²}
                               = E{trace[x(n)x^H(n)]}/E{||x(n)||²}
                               = E{trace[x^H(n)x(n)]}/E{||x(n)||²}
                               = E{||x(n)||²}/E{||x(n)||²} = 1
Thus

0 < µ < 2/trace[E{x(n)x^H(n)/||x(n)||²}] = 2

Final Result: The NLMS update

w(n+1) = w(n) + µ[x(n)/||x(n)||²]e*(n)

will converge if 0 < µ < 2.
Note:
- The NLMS has a simpler convergence criterion than the LMS
- The NLMS generally converges faster than the LMS algorithm
ELEG 636 - Statistical Signal Processing
Gonzalo R. Arce
Variants of the LMS algorithm
Department of Electrical and Computer Engineering, University of Delaware
Newark, DE 19716
Fall 2013
Standard LMS Algorithm
FIR filters:

y(n) = w₀(n)u(n) + w₁(n)u(n−1) + ... + w_{M−1}(n)u(n−M+1)
     = Σ_{k=0}^{M−1} w_k(n)u(n−k) = w(n)^T u(n),   n = 0, 1, ..., ∞

Error between the filter output y(n) and a desired signal d(n):

e(n) = d(n) − y(n) = d(n) − w(n)^T u(n)

Update the filter parameters according to

w(n+1) = w(n) + µu(n)e(n)
1. Normalized LMS Algorithm
Modify at time n the parameter vector from w(n) to w(n+1), where λ will result from the constraint

d(n) = Σ_{i=0}^{M−1} w_i(n+1)u(n−i)

In order to add an extra degree of freedom to the adaptation strategy, a constant µ controlling the step size is introduced:

w_j(n+1) = w_j(n) + µ·[1/Σ_{i=0}^{M−1} u(n−i)²]·e(n)u(n−j) = w_j(n) + [µ/||u(n)||²]e(n)u(n−j)
To overcome possible numerical difficulties when ||u(n)|| is close to zero, a constant a > 0 is used:

w_j(n+1) = w_j(n) + [µ/(a + ||u(n)||²)]e(n)u(n−j)

This is the update used in the Normalized LMS algorithm.

The Normalized LMS algorithm converges if

0 < µ < 2
Comparison of LMS and NLMS
- The LMS was run with three different step sizes: µ = [0.075; 0.025; 0.0075].
- The NLMS was run with three different step sizes: µ = [1.0; 0.5; 0.1].
- The larger the step size, the faster the convergence. The smaller the step size, the better the steady-state square error.
- LMS with µ = 0.0075 and NLMS with µ = 0.1 achieved a similar average steady-state square error. However, NLMS was faster.
- LMS with µ = 0.075 and NLMS with µ = 1.0 had a similar convergence speed. However, NLMS achieved a lower steady-state average square error.
- Conclusion: NLMS offers better trade-offs than LMS.
- The computational complexity of NLMS is slightly higher than that of LMS.
2. LMS Algorithm with Time-Variable Adaptation Step

Heuristics: combine the benefits of two different situations:
- The convergence time constant is small for large µ.
- The mean-square error in steady state is low for small µ.

The initial adaptation step µ is kept large, then it is monotonically reduced: µ(n) = 1/(n+c).
Disadvantage for non-stationary data: the algorithm will not react to changes in the optimum solution for large values of n.

Variable Step algorithm:

w(n+1) = w(n) + M(n)u(n)e(n)

where M(n) = diag(µ₀(n), µ₁(n), ..., µ_{M−1}(n)).

Each filter parameter w_i(n) is updated using an independent adaptation step µ_i(n); a sketch of the scalar special case follows.
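A sketch of the scalar decaying-step special case, µ(n) = µ₀/(n+c) (the per-tap matrix version would replace the scalar with a vector of per-coefficient steps):

```matlab
% LMS with a time-variable (decaying) adaptation step (sketch)
function w = vslms_sketch(u, d, M, mu0, c)
    N = length(u);
    w = zeros(M, 1);
    for n = M:N
        un = u(n:-1:n-M+1);
        e  = d(n) - w' * un;
        w  = w + (mu0 / (n + c)) * un * e; % step shrinks as n grows
    end
end
```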
Comparison of LMS and variable-step LMS, µ(n) = µ₀/(0.01n + c), with c = [10; 20; 50].
3. Sign Algorithms

In high-speed communication time is critical, so faster adaptation processes are needed.

sgn(a) = 1 if a > 0;  0 if a = 0;  −1 if a < 0

The Sign algorithm (other names: pilot LMS, or Sign Error):

w(n+1) = w(n) + µu(n)sgn(e(n))

The Clipped LMS (or Signed Regressor):

w(n+1) = w(n) + µ sgn(u(n))e(n)

The Zero-Forcing LMS (or Sign-Sign):

w(n+1) = w(n) + µ sgn(u(n))sgn(e(n))

The Sign algorithm can be derived as an LMS algorithm for minimizing the mean absolute error (MAE) criterion

J(w) = E[|e(n)|] = E[|d(n) − w^T u(n)|]
Properties of Sign Algorithms

- Fast computation: if µ is constrained to the form µ = 2^m, only shifting and addition operations are required.
- Drawback: the update mechanism is degraded, compared to the LMS algorithm, by the crude quantization of the gradient estimates:
  - The steady-state error will increase
  - The convergence rate decreases
- The fastest of them, Sign-Sign, is used in the CCITT ADPCM standard for the 32,000 bps system.
Comparison of LMS and Sign LMS

The Sign LMS algorithm should be operated at smaller step sizes to obtain behavior similar to the standard LMS algorithm.
4. Linear Smoothing of LMS Gradient Estimates

Lowpass filtering the noisy gradient: rename the noisy gradient as

g(n) = ∇_w J = −2e(n)u(n),   i.e.,   g_i(n) = −2e(n)u(n−i)

Passing the signals g_i(n) through lowpass filters will prevent large fluctuations of direction during the adaptation process:

b_i(n) = LPF(g_i(n))

The updating process will use the filtered noisy gradient:

w(n+1) = w(n) − µb(n)

The following versions are well known:

Averaged LMS algorithm: the LPF is the filter with impulse response h(m) = 1/N, m = 0, 1, ..., N−1. Then

w(n+1) = w(n) + (µ/N) Σ_{j=n−N+1}^{n} e(j)u(j)
Momentum LMS algorithm: the LPF is a first-order IIR filter with h(0) = 1−γ, h(1) = γh(0), h(2) = γ²h(0), ...; then

b_i(n) = LPF(g_i(n)) = γb_i(n−1) + (1−γ)g_i(n)
b(n) = γb(n−1) + (1−γ)g(n)

The resulting algorithm can be written as a second-order recursion:

w(n+1) = w(n) − µb(n)
γw(n) = γw(n−1) − γµb(n−1)
w(n+1) − γw(n) = w(n) − γw(n−1) − µb(n) + γµb(n−1)
w(n+1) − γw(n) = w(n) − γw(n−1) − µb(n) + γµb(n−1)
w(n+1) = w(n) + γ(w(n) − w(n−1)) − µ(b(n) − γb(n−1))
w(n+1) = w(n) + γ(w(n) − w(n−1)) − µ(1−γ)g(n)
w(n+1) = w(n) + γ(w(n) − w(n−1)) + 2µ(1−γ)e(n)u(n)

that is,

w(n+1) − w(n) = γ(w(n) − w(n−1)) + 2µ(1−γ)e(n)u(n)

Drawback: the convergence rate may decrease.
Advantage: the momentum term keeps the algorithm active even in regions close to the minimum; a sketch follows.
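A sketch of the momentum recursion in its second-order form (γ and all names illustrative):

```matlab
% Momentum LMS (sketch):
% w(n+1) = w(n) + gamma*(w(n)-w(n-1)) + 2*mu*(1-gamma)*e(n)*u(n)
function w = momentum_lms_sketch(u, d, M, mu, gamma)
    N = length(u);
    w = zeros(M, 1); w_prev = zeros(M, 1);
    for n = M:N
        un = u(n:-1:n-M+1);
        e  = d(n) - w' * un;
        w_next = w + gamma*(w - w_prev) + 2*mu*(1 - gamma)*e*un;
        w_prev = w;
        w = w_next;
    end
end
```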
5. Nonlinear Smoothing of LMS Gradient Estimates

- Impulsive interference in either d(n) or u(n) drastically degrades LMS performance.
- Remedy: smooth the noisy gradient components using a nonlinear filter.

The Median LMS Algorithm
The adaptation equation can be implemented as

w_i(n+1) = w_i(n) + µ·med(e(n)u(n−i), e(n−1)u(n−1−i), ..., e(n−N)u(n−N−i))

- The smoothing effect in an impulsive noise environment is very strong.
- If the environment is not impulsive, the performance of Median LMS is comparable with that of LMS.
- The convergence rate is slower than in LMS.
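A sketch of the median-smoothed update over a sliding window of Nwin gradient samples (the window length and all names are illustrative):

```matlab
% Median LMS (sketch): each tap is updated with the median of recent e*u products
function w = median_lms_sketch(u, d, M, mu, Nwin)
    N = length(u);
    w = zeros(M, 1);
    G = zeros(M, Nwin);              % sliding window of samples e(j)*u(j-i)
    for n = M:N
        un = u(n:-1:n-M+1);
        e  = d(n) - w' * un;
        G  = [e*un, G(:, 1:end-1)];  % newest column first
        w  = w + mu * median(G, 2);  % per-tap median smoothing
    end
end
```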