8/3/2019 Estimation Theory Presentation
1/66
Estimation Theory
Alireza Karimi
Laboratoire d'Automatique, MEC2 397, email: [email protected]
Spring 2011
(Introduction) Estimation Theory Spring 2011 1 / 66
Course Objective
Extract information from noisy signals.

Parameter Estimation Problem: Given a set of measured data

{x[0], x[1], . . . , x[N − 1]}

which depends on an unknown parameter vector θ, determine an estimator

θ̂ = g(x[0], x[1], . . . , x[N − 1])

where g is some function.

Applications: Image processing, communications, biomedicine, system identification, state estimation in control, etc.
References
Main reference:

Fundamentals of Statistical Signal Processing: Estimation Theory, by Steven M. Kay, Prentice-Hall, 1993 (available in Library de La Fontaine, RLC). We cover Chapters 1 to 14, skipping Chapter 5 and Chapter 9.

Other references:

Lessons in Estimation Theory for Signal Processing, Communications and Control, by Jerry M. Mendel, Prentice-Hall, 1995.

Probability, Random Processes and Estimation Theory for Engineers, by Henry Stark and John W. Woods, Prentice-Hall, 1986.
Review of Probability and Random Variables
Random Variables
Random Variable: A rule X(ζ) that assigns a real value to every element ζ of a sample space Ω is called a RV. So X is not really a variable that varies randomly, but a function whose domain is Ω and whose range is some subset of the real line.

Example: Consider the experiment of throwing a coin twice. The sample space (the possible outcomes) is:

Ω = {HH, HT, TH, TT}

We can define a random variable X such that

X(HH) = 1, X(HT) = 1.1, X(TH) = 1.6, X(TT) = 1.8

The random variable X assigns to each event (e.g. E = {HT, TH}) a subset of the real line (in this case B = {1.1, 1.6}).
Probability Distribution Function
For any real x, the event {ζ | X(ζ) ≤ x} is an important event. The probability of this event,

Pr[{ζ | X(ζ) ≤ x}] = P_X(x),

is called the probability distribution function of X.

Example: For the random variable defined earlier, we have:

P_X(1.5) = Pr[{ζ | X(ζ) ≤ 1.5}] = Pr[{HH, HT}] = 0.5

P_X(x) can be computed for all x ∈ R. It is clear that 0 ≤ P_X(x) ≤ 1.

Remarks:

For the same experiment (throwing a coin twice) we could define another random variable that would lead to a different P_X(x).

In most engineering problems the sample space is a subset of the real line, so X(ζ) = ζ and P_X(x) is a continuous function of x.
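The distribution function of the coin-toss example can be checked by brute force over the sample space. A minimal sketch, using the outcome values defined in the example above:

```python
# Distribution function of the coin-toss RV from the example:
# each of the four equally likely outcomes is mapped to a real value.
X = {"HH": 1.0, "HT": 1.1, "TH": 1.6, "TT": 1.8}

def P_X(x):
    """P_X(x) = Pr[{zeta | X(zeta) <= x}] for a fair coin thrown twice."""
    # Each outcome has probability 1/4, so count favourable outcomes.
    return sum(1 for zeta in X if X[zeta] <= x) / len(X)

print(P_X(1.5))  # Pr[{HH, HT}] = 0.5, matching the slide
```

Evaluating P_X on a grid of x values reproduces the staircase shape typical of a discrete RV.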
Probability Density Function (PDF)
The probability density function, if it exists, is given by:

p_X(x) = dP_X(x)/dx

When we deal with a single random variable the subscripts are removed:

p(x) = dP(x)/dx

Properties:

(i) ∫_{−∞}^{∞} p(x) dx = P(∞) − P(−∞) = 1

(ii) Pr[{ζ | X(ζ) ≤ x}] = Pr[X ≤ x] = P(x) = ∫_{−∞}^{x} p(ξ) dξ

(iii) Pr[x1 < X ≤ x2] = ∫_{x1}^{x2} p(x) dx
Some other common PDF
Chi-square χ²_n: p(x) = [1/(2^{n/2} Γ(n/2))] x^{n/2−1} exp(−x/2) for x > 0, and p(x) = 0 for x < 0

Exponential (λ > 0): p(x) = (1/λ) exp(−x/λ) u(x)

Rayleigh (σ > 0): p(x) = (x/σ²) exp(−x²/(2σ²)) u(x)

Uniform (b > a): p(x) = 1/(b − a) for a < x < b, and 0 otherwise

where Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt and u(x) is the unit step function.
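Each of these densities must integrate to one. A small numerical sanity check of the chi-square and Rayleigh formulas above (the parameter values n = 4 and σ = 2 are arbitrary choices for this sketch):

```python
import numpy as np
from math import gamma

# Densities as written on the slide.
def chi2_pdf(x, n):
    return np.where(x > 0, x**(n/2 - 1) * np.exp(-x/2) / (2**(n/2) * gamma(n/2)), 0.0)

def rayleigh_pdf(x, sigma):
    return np.where(x >= 0, (x / sigma**2) * np.exp(-x**2 / (2 * sigma**2)), 0.0)

# Midpoint Riemann sum on (0, 80); the tails beyond 80 are negligible here.
dx = 1e-3
x = np.arange(dx/2, 80.0, dx)
area_chi2 = np.sum(chi2_pdf(x, 4) * dx)
area_ray = np.sum(rayleigh_pdf(x, 2.0) * dx)
assert abs(area_chi2 - 1.0) < 1e-3 and abs(area_ray - 1.0) < 1e-3
```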
Joint, Marginal and Conditional PDF
Joint PDF: Consider two random variables X and Y; then:

Pr[x1 < X ≤ x2 and y1 < Y ≤ y2] = ∫_{x1}^{x2} ∫_{y1}^{y2} p(x, y) dy dx

Marginal PDF:

p(x) = ∫_{−∞}^{∞} p(x, y) dy and p(y) = ∫_{−∞}^{∞} p(x, y) dx

Conditional PDF: p(x|y) is defined as the PDF of X conditioned on knowing the value of Y.

Bayes Formula: Consider two RVs defined on the same probability space; then:

p(x, y) = p(x|y) p(y) = p(y|x) p(x) or p(x|y) = p(x, y) / p(y)
Independent Random Variables
Two RVs X and Y are independent if and only if:

p(x, y) = p(x) p(y)

A direct conclusion is that:

p(x|y) = p(x, y) / p(y) = p(x) p(y) / p(y) = p(x) and p(y|x) = p(y)

which means conditioning does not change the PDF.

Remark: For a jointly Gaussian PDF the contours of constant density are ellipses centered at (μx, μy). For independent X and Y the major (or minor) axis is parallel to the x or y axis.
Expected Value of a Random Variable
The expected value, if it exists, of a random variable X with PDF p(x) is defined by:

E(X) = ∫_{−∞}^{∞} x p(x) dx

Some properties of the expected value:

E{X + Y} = E{X} + E{Y}
E{aX} = a E{X}

The expected value of Y = g(X) can be computed by:

E(Y) = ∫_{−∞}^{∞} g(x) p(x) dx

Conditional expectation: The conditional expectation of X, given that a specific value of Y has occurred, is:

E(X|Y = y) = ∫_{−∞}^{∞} x p(x|y) dx
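The formula E(Y) = ∫ g(x) p(x) dx can be checked by Monte Carlo: the sample mean of g(X) converges to the integral. A minimal sketch with g(x) = x² and X standard normal, for which E(X²) = 1:

```python
import numpy as np

# Monte-Carlo estimate of E[g(X)] for g(x) = x^2, X ~ N(0, 1).
# The exact value is the second moment of a standard normal, i.e. 1.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
est = np.mean(x**2)
assert abs(est - 1.0) < 0.01
```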
Moments of a Random Variable
The r-th moment of X is defined as:

E(X^r) = ∫_{−∞}^{∞} x^r p(x) dx

The first moment of X is its expected value or the mean (μ = E(X)).

Moments of Gaussian RVs: A Gaussian RV X ~ N(μ, σ²) has moments of all orders in closed form:

E(X) = μ
E(X²) = μ² + σ²
E(X³) = μ³ + 3μσ²
E(X⁴) = μ⁴ + 6μ²σ² + 3σ⁴
E(X⁵) = μ⁵ + 10μ³σ² + 15μσ⁴
E(X⁶) = μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶
Central Moments
The r-th central moment of X is defined as:

E[(X − μ)^r] = ∫_{−∞}^{∞} (x − μ)^r p(x) dx

The second central moment (variance) is denoted by σ² or var(X).

Central moments of Gaussian RVs:

E[(X − μ)^r] = 0 if r is odd, and σ^r (r − 1)!! if r is even

where n!! denotes the double factorial, that is, the product of every odd number from n down to 1.
Some properties of Gaussian RVs
If X ~ N(μx, σx²) then Z = (X − μx)/σx ~ N(0, 1).

If Z ~ N(0, 1) then X = σx Z + μx ~ N(μx, σx²).

If X ~ N(μx, σx²) then Z = aX + b ~ N(aμx + b, a²σx²).

If X ~ N(μx, σx²) and Y ~ N(μy, σy²) are two independent RVs, then

aX + bY ~ N(aμx + bμy, a²σx² + b²σy²)

The sum of squares of n independent RVs with standard normal distribution N(0, 1) has a χ²_n distribution with n degrees of freedom. For large values of n, χ²_n converges to N(n, 2n).

The Euclidean norm √(X² + Y²) of two independent RVs with standard normal distribution has the Rayleigh distribution.
Covariance
For two RVs X and Y, the covariance is defined as

σxy = E[(X − μx)(Y − μy)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μx)(y − μy) p(x, y) dx dy

If X and Y are zero mean then σxy = E{XY}.

var(X + Y) = σx² + σy² + 2σxy
var(aX) = a²σx²

Important formula: The relation between the variance and the mean of X is given by

σ² = E[(X − μ)²] = E(X²) − 2μE(X) + μ² = E(X²) − μ²

The variance is the mean of the square minus the square of the mean.
Independence, Uncorrelatedness and Orthogonality
If σxy = 0, then X and Y are uncorrelated and

E{XY} = E{X} E{Y}

X and Y are called orthogonal if E{XY} = 0.

If X and Y are independent then they are uncorrelated:

p(x, y) = p(x) p(y) ⟹ E{XY} = E{X} E{Y}

Uncorrelatedness does not imply independence. For example, if X is a normal RV with zero mean and Y = X², then X and Y are clearly dependent, but

σxy = E{XY} − E{X} E{Y} = E{X³} − 0 = 0

Correlation only captures the linear dependence between two RVs, so it is weaker than independence.

For jointly Gaussian RVs, independence is equivalent to being uncorrelated.
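The counterexample above (dependent but uncorrelated RVs) is easy to verify by simulation. A sketch with X standard normal and Y = X², where the sample covariance should be near zero:

```python
import numpy as np

# X ~ N(0,1) and Y = X^2 are dependent (Y is a function of X), yet
# uncorrelated: sigma_xy = E{X^3} - E{X}E{X^2} = 0 - 0 = 0.
rng = np.random.default_rng(1)
x = rng.standard_normal(2_000_000)
y = x**2
cov = np.mean(x * y) - np.mean(x) * np.mean(y)  # sample covariance ~ 0
assert abs(cov) < 0.01
```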
Random Vectors
Random Vector: a vector of random variables¹:

x = [x1, x2, . . . , xn]ᵀ

Expectation Vector: μx = E(x) = [E(x1), E(x2), . . . , E(xn)]ᵀ

Covariance Matrix: Cx = E[(x − μx)(x − μx)ᵀ]

Cx is an n × n symmetric matrix which is assumed to be positive definite and so invertible. The elements of this matrix are [Cx]ij = E{[xi − E(xi)][xj − E(xj)]}. If the random variables are uncorrelated then Cx is a diagonal matrix.

Multivariate Gaussian PDF:

p(x) = [1/√((2π)^n det(Cx))] exp[−(1/2)(x − μx)ᵀ Cx⁻¹ (x − μx)]

¹In some books (including our main reference) there is no distinction between a random variable X and its specific value x. From now on we adopt the notation of our reference.
Random Processes
Discrete Random Process: x[n] is a sequence of random variables defined for every integer n.

Mean value: defined as E(x[n]) = μx[n].

Autocorrelation Function (ACF): defined as

rxx[k, n] = E(x[n] x[n + k])

Wide Sense Stationary (WSS): x[n] is WSS if its mean and its autocorrelation function (ACF) do not depend on n.

Autocovariance function: defined as

cxx[k] = E[(x[n] − μx)(x[n + k] − μx)] = rxx[k] − μx²

Cross-correlation Function (CCF): defined as

rxy[k] = E(x[n] y[n + k])

Cross-covariance function: defined as

cxy[k] = E[(x[n] − μx)(y[n + k] − μy)] = rxy[k] − μx μy
Discrete White Noise
Some properties of the ACF and CCF:

rxx[0] ≥ |rxx[k]|, rxx[−k] = rxx[k], rxy[−k] = ryx[k]

Power Spectral Density: The Fourier transform of the ACF and CCF gives the auto-PSD and cross-PSD:

Pxx(f) = Σ_{k=−∞}^{∞} rxx[k] exp(−j2πfk)

Pxy(f) = Σ_{k=−∞}^{∞} rxy[k] exp(−j2πfk)

Discrete White Noise: a discrete random process with zero mean and rxx[k] = σ²δ[k], where δ[k] is the Kronecker impulse function. The PSD of white noise becomes Pxx(f) = σ² and is completely flat with frequency.
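The defining property rxx[k] = σ²δ[k] can be seen in the sample ACF of a simulated white sequence: the lag-0 estimate is near σ² and the other lags are near zero (the flat-PSD signature). A minimal sketch:

```python
import numpy as np

# Sample ACF of discrete white noise with variance sigma^2 = 2.25.
rng = np.random.default_rng(2)
sigma = 1.5
x = sigma * rng.standard_normal(100_000)

def acf(x, k):
    """Biased sample estimate of r_xx[k] = E(x[n] x[n+k]) for zero-mean x."""
    n = len(x)
    return np.dot(x[:n - k], x[k:]) / n

assert abs(acf(x, 0) - sigma**2) < 0.1   # lag 0: ~sigma^2
assert abs(acf(x, 5)) < 0.05             # other lags: ~0
```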
Introduction and Minimum Variance Unbiased Estimation
The Mathematical Estimation Problem
Parameter Estimation Problem: Given a set of measured data

x = {x[0], x[1], . . . , x[N − 1]}

which depends on an unknown parameter vector θ, determine an estimator

θ̂ = g(x[0], x[1], . . . , x[N − 1])

where g is some function.

The first step is to find the PDF of the data as a function of θ: p(x; θ).

Example: Consider the problem of a DC level in white Gaussian noise with one observed data point, x[0] = θ + w[0], where w[0] has the PDF N(0, σ²). Then the PDF of x[0] is:

p(x[0]; θ) = [1/√(2πσ²)] exp[−(1/(2σ²))(x[0] − θ)²]
The Mathematical Estimation Problem
Example: Consider a data sequence that can be modeled as a linear trend in white Gaussian noise:

x[n] = A + Bn + w[n], n = 0, 1, . . . , N − 1

Suppose that w[n] ~ N(0, σ²) and is uncorrelated with all the other samples. Letting θ = [A B]ᵀ and x = [x[0], x[1], . . . , x[N − 1]]ᵀ, the PDF is:

p(x; θ) = Π_{n=0}^{N−1} p(x[n]; θ) = [1/(2πσ²)^{N/2}] exp[−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)²]

The quality of any estimator for this problem is tied to the assumptions of the data model: in this example, the linear trend and the WGN PDF assumption.
The Mathematical Estimation Problem
Classical versus Bayesian estimation

If we assume θ is deterministic, we have a classical estimation problem. The following methods will be studied: MVU, MLE, BLUE, LSE.

If we assume θ is a random variable with a known PDF, we have a Bayesian estimation problem. In this case the data are described by the joint PDF

p(x, θ) = p(x|θ) p(θ)

where p(θ) summarizes our knowledge about θ before any data are observed, and p(x|θ) summarizes the knowledge provided by the data x conditioned on knowing θ. The following methods will be studied: MMSE, MAP, Kalman Filter.
Assessing Estimator Performance
Consider the problem of estimating a DC level A in uncorrelated noise:

x[n] = A + w[n], n = 0, 1, . . . , N − 1

Consider the following estimators:

Â1 = (1/N) Σ_{n=0}^{N−1} x[n]
Â2 = x[0]

Suppose that A = 1, Â1 = 0.95 and Â2 = 0.98. Which estimator is better?

An estimator is a random variable, so its performance can only be described by its PDF or statistically (e.g. by Monte-Carlo simulation).
Unbiased Estimators
An estimator that on average yields the true value is unbiased. Mathematically:

E(θ̂) − θ = 0 for a < θ < b

Let's compute the expectation of the two estimators Â1 and Â2:

E(Â1) = (1/N) Σ_{n=0}^{N−1} E(x[n]) = (1/N) Σ_{n=0}^{N−1} E(A + w[n]) = (1/N) Σ_{n=0}^{N−1} (A + 0) = A

E(Â2) = E(x[0]) = E(A + w[0]) = A + 0 = A

Both estimators are unbiased. Which one is better? Now let's compute the variance of the two estimators:

var(Â1) = var[(1/N) Σ_{n=0}^{N−1} x[n]] = (1/N²) Σ_{n=0}^{N−1} var(x[n]) = (1/N²) Nσ² = σ²/N

var(Â2) = var(x[0]) = σ² > var(Â1)
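The comparison of Â1 and Â2 can be made concrete by Monte-Carlo simulation, as the previous slide suggests. A sketch (A, σ and N are values chosen for illustration): both estimators average to A, but the sample mean has far smaller spread.

```python
import numpy as np

# Monte-Carlo comparison of A1_hat = sample mean vs A2_hat = x[0]
# for x[n] = A + w[n], w[n] white with variance sigma^2.
rng = np.random.default_rng(3)
A, sigma, N, trials = 1.0, 1.0, 50, 20_000
x = A + sigma * rng.standard_normal((trials, N))
A1 = x.mean(axis=1)   # var ~ sigma^2 / N = 0.02
A2 = x[:, 0]          # var ~ sigma^2 = 1

assert abs(A1.mean() - A) < 0.01 and abs(A2.mean() - A) < 0.03  # both unbiased
assert A1.var() < A2.var()                                      # A1 wins on variance
```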
Unbiased Estimators
Remark: When several unbiased estimators of the same parameter from independent sets of data are available, i.e. θ̂1, θ̂2, . . . , θ̂n, a better estimator can be obtained by averaging:

θ̂ = (1/n) Σ_{i=1}^{n} θ̂i ⟹ E(θ̂) = θ

Assuming that the estimators have the same variance, we have:

var(θ̂) = (1/n²) Σ_{i=1}^{n} var(θ̂i) = (1/n²) n var(θ̂i) = var(θ̂i)/n

By increasing n, the variance decreases (if n → ∞, θ̂ → θ).

This is not the case for biased estimators, no matter how many estimators are averaged.
Minimum Variance Criterion
The most natural criterion for estimation is the Mean Square Error (MSE):

mse(θ̂) = E[(θ̂ − θ)²]

Unfortunately, minimizing the MSE generally leads to unrealizable estimators (the estimator depends on θ):

mse(θ̂) = E{[θ̂ − E(θ̂) + E(θ̂) − θ]²} = E{[θ̂ − E(θ̂) + b(θ)]²}

where b(θ) = E(θ̂) − θ is defined as the bias of the estimator. Therefore:

mse(θ̂) = E{[θ̂ − E(θ̂)]²} + 2b(θ)E[θ̂ − E(θ̂)] + b²(θ) = var(θ̂) + b²(θ)

Instead of minimizing the MSE we can minimize the variance of the unbiased estimators: the Minimum Variance Unbiased Estimator.
Minimum Variance Unbiased Estimator
Existence of the MVU Estimator: In general the MVU estimator does not always exist. There may be no unbiased estimator, or none of the unbiased estimators has uniformly minimum variance.

Finding the MVU Estimator: There is no known procedure that always leads to the MVU estimator. Three existing approaches are:

1 Determine the Cramer-Rao lower bound (CRLB) and check whether some estimator satisfies it.

2 Apply the Rao-Blackwell-Lehmann-Scheffe theorem (we will skip it).

3 Restrict attention to linear unbiased estimators.
Cramer-Rao Lower Bound
The CRLB is a lower bound on the variance of any unbiased estimator:

var(θ̂) ≥ CRLB(θ)

Note that the CRLB is a function of θ.

It tells us the best performance that can be achieved (useful in feasibility studies and in comparison with other estimators).

It may lead us to compute the MVU estimator.
Theorem (scalar case)
Assume that the PDF p(x; θ) satisfies the regularity condition

E[∂ ln p(x; θ)/∂θ] = 0 for all θ

Then the variance of any unbiased estimator satisfies

var(θ̂) ≥ [−E(∂² ln p(x; θ)/∂θ²)]⁻¹

An unbiased estimator that attains the CRLB can be found iff:

∂ ln p(x; θ)/∂θ = I(θ)(g(x) − θ)

for some functions g(x) and I(θ). The estimator is θ̂ = g(x) and the minimum variance is 1/I(θ).
Example: Consider x[0] = A + w[0] with w[0] ~ N(0, σ²).

p(x[0]; A) = [1/√(2πσ²)] exp[−(1/(2σ²))(x[0] − A)²]

ln p(x[0]; A) = −ln √(2πσ²) − (1/(2σ²))(x[0] − A)²

Then

∂ ln p(x[0]; A)/∂A = (1/σ²)(x[0] − A)

∂² ln p(x[0]; A)/∂A² = −1/σ²

According to the theorem:

var(Â) ≥ σ², I(A) = 1/σ², and Â = g(x[0]) = x[0]
Example: Consider multiple observations of a DC level in WGN:

x[n] = A + w[n], n = 0, 1, . . . , N − 1, with w[n] ~ N(0, σ²)

p(x; A) = [1/(2πσ²)^{N/2}] exp[−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²]

Then

∂ ln p(x; A)/∂A = ∂/∂A [−ln((2πσ²)^{N/2}) − (1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²]

= (1/σ²) Σ_{n=0}^{N−1} (x[n] − A) = (N/σ²) [(1/N) Σ_{n=0}^{N−1} x[n] − A]

According to the theorem:

var(Â) ≥ σ²/N, I(A) = N/σ², and Â = g(x) = (1/N) Σ_{n=0}^{N−1} x[n]
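For this example the sample mean actually attains the bound, which a Monte-Carlo experiment confirms: its empirical variance matches σ²/N. A sketch (parameter values chosen for illustration):

```python
import numpy as np

# Empirical variance of the sample mean vs the CRLB sigma^2/N
# for x[n] = A + w[n], w[n] ~ N(0, sigma^2).
rng = np.random.default_rng(4)
A, sigma, N, trials = 2.0, 1.0, 100, 50_000
x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)

crlb = sigma**2 / N                      # = 0.01
assert abs(A_hat.var() - crlb) < 0.001   # variance ~ CRLB: the estimator is efficient
```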
Transformation of Parameters
If it is desired to estimate α = g(θ), then the CRLB is:

var(α̂) ≥ [−E(∂² ln p(x; θ)/∂θ²)]⁻¹ (∂g/∂θ)²

Example: Compute the CRLB for estimating the power (A²) of a DC level in noise:

var(Â²) ≥ (σ²/N)(2A)² = 4A²σ²/N

Definition

Efficient estimator: An unbiased estimator that attains the CRLB is said to be efficient.

Example: Knowing that x̄ = (1/N) Σ_{n=0}^{N−1} x[n] is an efficient estimator for A, is x̄² an efficient estimator for A²?
Solution: Knowing that x̄ ~ N(A, σ²/N), we have:

E(x̄²) = E²(x̄) + var(x̄) = A² + σ²/N ≠ A²

So the estimator Â² = x̄² is not even unbiased. Let's look at the variance of this estimator:

var(x̄²) = E(x̄⁴) − E²(x̄²)

but we have from the moments of Gaussian RVs (slide 15):

E(x̄⁴) = A⁴ + 6A²(σ²/N) + 3(σ²/N)²

Therefore:

var(x̄²) = A⁴ + 6A²σ²/N + 3σ⁴/N² − (A² + σ²/N)² = 4A²σ²/N + 2σ⁴/N²
Remarks:

The estimator Â² = x̄² is biased and not efficient.

As N → ∞, the bias goes to zero and the variance of the estimator approaches the CRLB. Such estimators are called asymptotically efficient.

General Remarks:

If g(θ) = aθ + b is an affine function of θ, then ĝ(θ) = g(θ̂) is an efficient estimator. First, it is unbiased: E(aθ̂ + b) = aθ + b = g(θ); moreover:

var(ĝ(θ)) ≥ (∂g/∂θ)² var(θ̂) = a² var(θ̂)

but var(ĝ(θ)) = var(aθ̂ + b) = a² var(θ̂), so the CRLB is achieved.

If g(θ) is a nonlinear function of θ and θ̂ is an efficient estimator, then g(θ̂) is an asymptotically efficient estimator.
Theorem (Vector Parameter)
Assume that the PDF p(x; θ) satisfies the regularity condition

E[∂ ln p(x; θ)/∂θ] = 0 for all θ

Then the covariance matrix of any unbiased estimator θ̂ satisfies

C_θ̂ − I⁻¹(θ) ⪰ 0

where ⪰ 0 means that the matrix is positive semidefinite. I(θ) is called the Fisher information matrix and is given by:

[I(θ)]ij = −E[∂² ln p(x; θ)/∂θi ∂θj]

An unbiased estimator that attains the CRLB can be found iff:

∂ ln p(x; θ)/∂θ = I(θ)(g(x) − θ)
CRLB Extension to Vector Parameter
Example: Consider a DC level in WGN with both A and σ² unknown. Compute the CRLB for estimation of θ = [A σ²]ᵀ.

ln p(x; θ) = −(N/2) ln 2π − (N/2) ln σ² − (1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²

The Fisher information matrix is:

I(θ) = −E [ ∂²ln p/∂A²      ∂²ln p/∂A∂σ²
            ∂²ln p/∂σ²∂A    ∂²ln p/∂(σ²)² ]
     = [ N/σ²   0
         0      N/(2σ⁴) ]

The matrix is diagonal (just for this example) and can be easily inverted to yield:

var(Â) ≥ σ²/N, var(σ̂²) ≥ 2σ⁴/N

Is there any unbiased estimator that achieves these bounds?
Transformation of Parameters
If it is desired to estimate α = g(θ), and the CRLB for the covariance of θ̂ is I⁻¹(θ), then:

C_α̂ ⪰ (∂g(θ)/∂θ) [−E(∂² ln p(x; θ)/∂θ²)]⁻¹ (∂g(θ)/∂θ)ᵀ

Example: Consider a DC level in WGN with A and σ² unknown. Compute the CRLB for estimation of the signal-to-noise ratio α = A²/σ². We have θ = [A σ²]ᵀ and α = g(θ) = θ1²/θ2, so the Jacobian is:

∂g(θ)/∂θ = [∂g(θ)/∂θ1, ∂g(θ)/∂θ2] = [2A/σ², −A²/σ⁴]

So the CRLB is:

var(α̂) ≥ [2A/σ², −A²/σ⁴] [ σ²/N   0
                            0      2σ⁴/N ] [2A/σ², −A²/σ⁴]ᵀ = (4α + 2α²)/N
Linear Models with WGN
If the N observed data samples can be modeled as

x = Hθ + w

where

x = N × 1 observation vector
H = N × p observation matrix (known, rank p)
θ = p × 1 vector of parameters to be estimated
w = N × 1 noise vector with PDF N(0, σ²I)

compute the CRLB and the MVU estimator that achieves this bound.

Step 1: Compute ln p(x; θ).

Step 2: Compute I(θ) = −E[∂² ln p(x; θ)/∂θ²] and the covariance matrix of θ̂: C_θ̂ = I⁻¹(θ).

Step 3: Find the MVU estimator g(x) by factoring

∂ ln p(x; θ)/∂θ = I(θ)[g(x) − θ]
Step 1: ln p(x; θ) = −ln[(2πσ²)^{N/2}] − (1/(2σ²))(x − Hθ)ᵀ(x − Hθ)

Step 2:

∂ ln p(x; θ)/∂θ = −(1/(2σ²)) ∂/∂θ [xᵀx − 2xᵀHθ + θᵀHᵀHθ] = (1/σ²)[Hᵀx − HᵀHθ]

Then I(θ) = −E[∂² ln p(x; θ)/∂θ²] = (1/σ²) HᵀH

Step 3: Find the MVU estimator g(x) by factoring

∂ ln p(x; θ)/∂θ = I(θ)[g(x) − θ] = (HᵀH/σ²)[(HᵀH)⁻¹Hᵀx − θ]

Therefore:

θ̂ = g(x) = (HᵀH)⁻¹Hᵀx, C_θ̂ = I⁻¹(θ) = σ²(HᵀH)⁻¹
For a linear model with WGN represented by x = Hθ + w, the MVU estimator is:

θ̂ = (HᵀH)⁻¹Hᵀx

This estimator is efficient and attains the CRLB.

That the estimator is unbiased is easily seen:

E(θ̂) = (HᵀH)⁻¹Hᵀ E(Hθ + w) = θ

The statistical performance of θ̂ is completely specified: θ̂ is a linear transformation of the Gaussian vector x and hence has a Gaussian distribution,

θ̂ ~ N(θ, σ²(HᵀH)⁻¹)
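The formula θ̂ = (HᵀH)⁻¹Hᵀx is a one-liner in practice. A sketch on a line-fit model A + Bn (the true parameter values and noise level are arbitrary choices for this illustration):

```python
import numpy as np

# MVU estimator theta_hat = (H^T H)^{-1} H^T x for x = H theta + w with WGN.
# Model: x[n] = A + B*n + w[n] (a first-order polynomial trend).
rng = np.random.default_rng(5)
N, sigma = 200, 0.5
theta = np.array([1.0, 0.3])                      # true [A, B], chosen for the sketch
H = np.column_stack([np.ones(N), np.arange(N)])   # N x 2 observation matrix
x = H @ theta + sigma * rng.standard_normal(N)

# Solve the normal equations rather than forming an explicit inverse.
theta_hat = np.linalg.solve(H.T @ H, H.T @ x)
assert abs(theta_hat[0] - theta[0]) < 0.3 and abs(theta_hat[1] - theta[1]) < 0.01
```

Using `np.linalg.solve` (or `lstsq`) on the normal equations is numerically preferable to computing (HᵀH)⁻¹ explicitly.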
Example (Curve Fitting)
Consider fitting the data x[n] by a p-th order polynomial function of n:

x[n] = θ0 + θ1 n + θ2 n² + · · · + θp n^p + w[n]

We have N data samples, so:

x = [x[0], x[1], . . . , x[N − 1]]ᵀ
w = [w[0], w[1], . . . , w[N − 1]]ᵀ
θ = [θ0, θ1, . . . , θp]ᵀ

and x = Hθ + w, where H is the N × (p + 1) matrix:

H = [ 1   0     0       · · ·  0
      1   1     1       · · ·  1
      1   2     4       · · ·  2^p
      ⋮   ⋮     ⋮       ⋱      ⋮
      1   N−1   (N−1)²  · · ·  (N−1)^p ]

Hence the MVU estimator is: θ̂ = (HᵀH)⁻¹Hᵀx
Example (Fourier Analysis)
Consider the Fourier analysis of the data x[n]:

x[n] = Σ_{k=1}^{M} ak cos(2πkn/N) + Σ_{k=1}^{M} bk sin(2πkn/N) + w[n]

so we have θ = [a1, a2, . . . , aM, b1, b2, . . . , bM]ᵀ and x = Hθ + w, where:

H = [h_{a1}, h_{a2}, . . . , h_{aM}, h_{b1}, h_{b2}, . . . , h_{bM}]

with

h_{ak} = [cos(2πk·0/N), cos(2πk·1/N), . . . , cos(2πk(N−1)/N)]ᵀ
h_{bk} = [sin(2πk·0/N), sin(2πk·1/N), . . . , sin(2πk(N−1)/N)]ᵀ

Hence the MVU estimate of the Fourier coefficients is: θ̂ = (HᵀH)⁻¹Hᵀx
After simplification (noting that (HᵀH)⁻¹ = (2/N) I), we have:

θ̂ = (2/N) [(h_{a1})ᵀx, . . . , (h_{aM})ᵀx, (h_{b1})ᵀx, . . . , (h_{bM})ᵀx]ᵀ

which is the same as the standard solution:

âk = (2/N) Σ_{n=0}^{N−1} x[n] cos(2πkn/N), b̂k = (2/N) Σ_{n=0}^{N−1} x[n] sin(2πkn/N)

From the properties of linear models, the estimates are unbiased.

The covariance matrix is:

C_θ̂ = σ²(HᵀH)⁻¹ = (2σ²/N) I

Note that θ̂ is Gaussian and C_θ̂ is diagonal (the amplitude estimates are independent).
Example (System Identification)
Consider identification of a Finite Impulse Response (FIR) model h[k], k = 0, 1, . . . , p − 1, with input u[n] and output x[n] provided for n = 0, 1, . . . , N − 1:

x[n] = Σ_{k=0}^{p−1} h[k] u[n − k] + w[n], n = 0, 1, . . . , N − 1

The FIR model can be represented by the linear model x = Hθ + w, where

θ = [h[0], h[1], . . . , h[p − 1]]ᵀ (p × 1)

H = [ u[0]     0        · · ·  0
      u[1]     u[0]     · · ·  0
      ⋮        ⋮        ⋱      ⋮
      u[N−1]   u[N−2]   · · ·  u[N−p] ]  (N × p)

The MVU estimate is θ̂ = (HᵀH)⁻¹Hᵀx with C_θ̂ = σ²(HᵀH)⁻¹.
Linear Models with Colored Gaussian Noise
Determine the MVU estimator for the linear model x = Hθ + w, where w is colored Gaussian noise with w ~ N(0, C).

Whitening approach: Since C is positive definite, its inverse can be factored as C⁻¹ = DᵀD, where D is an invertible matrix. This matrix acts as a whitening transformation for w:

E[(Dw)(Dw)ᵀ] = E(DwwᵀDᵀ) = DCDᵀ = D(DᵀD)⁻¹Dᵀ = DD⁻¹D⁻ᵀDᵀ = I

Now transform the linear model x = Hθ + w to:

x′ = Dx = DHθ + Dw = H′θ + w′

where w′ = Dw ~ N(0, I) is white, and compute the MVU estimator as:

θ̂ = (H′ᵀH′)⁻¹H′ᵀx′ = (HᵀDᵀDH)⁻¹HᵀDᵀDx

so we have:

θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x with C_θ̂ = (H′ᵀH′)⁻¹ = (HᵀC⁻¹H)⁻¹
Linear Models with known components
Consider a linear model x = Hθ + s + w, where s is a known signal. To determine the MVU estimator, let x′ = x − s, so that x′ = Hθ + w is a standard linear model. The MVU estimator is:

θ̂ = (HᵀH)⁻¹Hᵀ(x − s) with C_θ̂ = σ²(HᵀH)⁻¹

Example: Consider a DC level and an exponential in WGN: x[n] = A + r^n + w[n], where r is known. Then we have:

[x[0], x[1], . . . , x[N−1]]ᵀ = [1, 1, . . . , 1]ᵀ A + [1, r, . . . , r^{N−1}]ᵀ + [w[0], w[1], . . . , w[N−1]]ᵀ

The MVU estimator is:

Â = (HᵀH)⁻¹Hᵀ(x − s) = (1/N) Σ_{n=0}^{N−1} (x[n] − r^n) with var(Â) = σ²/N
Best Linear Unbiased Estimators (BLUE)
Problems with finding the MVU estimator:

The MVU estimator may not exist or may be impossible to find.

The PDF of the data may be unknown.

BLUE is a suboptimal estimator that:

restricts estimates to be linear in the data: θ̂ = Ax

restricts estimates to be unbiased: E(θ̂) = A E(x) = θ

minimizes the variance of the estimates;

needs only the mean and the variance of the data (not the PDF). As a result, in general, the PDF of the estimates cannot be computed.

Remark: The unbiasedness restriction implies a linear model for the data. However, BLUE may still be used if the data are transformed suitably or the model is linearized.
Finding the BLUE (Scalar Case)
1 Choose a linear estimator for the observed data x[n], n = 0, 1, . . . , N − 1:

θ̂ = Σ_{n=0}^{N−1} a_n x[n] = aᵀx where a = [a0, a1, . . . , a_{N−1}]ᵀ

2 Restrict the estimate to be unbiased:

E(θ̂) = Σ_{n=0}^{N−1} a_n E(x[n]) = θ

3 Minimize the variance:

var(θ̂) = E{[θ̂ − E(θ̂)]²} = E{[aᵀx − aᵀE(x)]²} = E{aᵀ[x − E(x)][x − E(x)]ᵀa} = aᵀCa
Consider the problem of amplitude estimation of known signals in noise:

x[n] = θ s[n] + w[n]

1 Choose a linear estimator: θ̂ = Σ_{n=0}^{N−1} a_n x[n] = aᵀx

2 Restrict the estimate to be unbiased: E(θ̂) = aᵀE(x) = θ aᵀs = θ, so aᵀs = 1, where s = [s[0], s[1], . . . , s[N − 1]]ᵀ

3 Minimize aᵀCa subject to aᵀs = 1.

The constrained optimization can be solved using Lagrange multipliers:

minimize J = aᵀCa + λ(aᵀs − 1)

The optimal solution is:

θ̂ = sᵀC⁻¹x / (sᵀC⁻¹s) and var(θ̂) = 1 / (sᵀC⁻¹s)
Finding the BLUE (Vector Case)
Theorem (Gauss-Markov)

If the data are of the general linear model form

x = Hθ + w

where w is a noise vector with zero mean and covariance C (the PDF of w is otherwise arbitrary), then the BLUE of θ is:

θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x

and the covariance matrix of θ̂ is

C_θ̂ = (HᵀC⁻¹H)⁻¹

Remark: If the noise is Gaussian then the BLUE is the MVU estimator.
Example: Consider the problem of a DC level in white noise: x[n] = A + w[n], where w[n] has an unspecified PDF with var(w[n]) = σn².

We have θ = A and H = 1 = [1, 1, . . . , 1]ᵀ. The covariance matrix is:

C = diag(σ0², σ1², . . . , σ_{N−1}²), C⁻¹ = diag(1/σ0², 1/σ1², . . . , 1/σ_{N−1}²)

and hence the BLUE is:

Â = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x = [Σ_{n=0}^{N−1} 1/σn²]⁻¹ Σ_{n=0}^{N−1} x[n]/σn²

and the minimum variance is:

var(Â) = (HᵀC⁻¹H)⁻¹ = [Σ_{n=0}^{N−1} 1/σn²]⁻¹
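The BLUE for this example is a weighted mean with weights 1/σn², which downweights the noisy samples. A Monte-Carlo sketch comparing it against the unweighted sample mean (the σn values here are arbitrary choices for the illustration):

```python
import numpy as np

# BLUE for a DC level with unequal noise variances:
# A_hat = [sum 1/sigma_n^2]^{-1} * sum x[n]/sigma_n^2.
rng = np.random.default_rng(6)
A, N, trials = 1.0, 5, 100_000
sig = np.array([0.1, 0.5, 1.0, 2.0, 4.0])    # sigma_n, chosen for the sketch
x = A + sig * rng.standard_normal((trials, N))

blue = (x / sig**2).sum(axis=1) / (1.0 / sig**2).sum()
mean = x.mean(axis=1)                        # unweighted competitor

assert abs(blue.mean() - A) < 0.005          # unbiased
assert blue.var() < mean.var()               # BLUE achieves the smaller variance
```

With these σn, the BLUE variance [Σ 1/σn²]⁻¹ ≈ 0.0095, far below the ≈ 0.85 of the plain average, because the plain average is dominated by the noisiest samples.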
Maximum Likelihood Estimation
Problems: The MVU estimator often does not exist or cannot be found. BLUE is restricted to linear models.

The Maximum Likelihood Estimator (MLE):

can always be applied if the PDF is known;

is optimal for large data size;

is computationally complex and may require numerical methods.

Basic Idea: Choose the parameter value that makes the observed data the most likely data to have been observed.

Likelihood Function: the PDF p(x; θ) when θ is regarded as a variable (not a parameter).

ML Estimate: the value of θ that maximizes the likelihood function.

Procedure: Form the log-likelihood function ln p(x; θ); differentiate w.r.t. θ, set to zero, and solve for θ̂.
Example: Consider a DC level in WGN with unknown variance, x[n] = A + w[n]. Suppose that A > 0 and σ² = A. The PDF is:

p(x; A) = [1/(2πA)^{N/2}] exp[−(1/(2A)) Σ_{n=0}^{N−1} (x[n] − A)²]

Taking the derivative of the log-likelihood function, we have:

∂ ln p(x; A)/∂A = −N/(2A) + (1/A) Σ_{n=0}^{N−1} (x[n] − A) + (1/(2A²)) Σ_{n=0}^{N−1} (x[n] − A)²

What is the CRLB? Does an MVU estimator exist?

The MLE is found by setting the above equation to zero:

Â = −1/2 + √[(1/N) Σ_{n=0}^{N−1} x²[n] + 1/4]
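The likelihood equation for this example reduces to Â² + Â = (1/N)Σx²[n], whose positive root is the closed form above. A quick numerical sketch checking that the MLE is consistent (the true A and N are arbitrary choices):

```python
import numpy as np

# MLE A_hat = -1/2 + sqrt(mean(x^2) + 1/4) for x[n] = A + w[n], w[n] ~ N(0, A).
rng = np.random.default_rng(7)
A, N = 3.0, 1_000_000
x = A + np.sqrt(A) * rng.standard_normal(N)   # noise variance equals A

m2 = np.mean(x**2)                 # converges to E(x^2) = A^2 + A as N grows
A_hat = -0.5 + np.sqrt(m2 + 0.25)  # positive root of A^2 + A = m2
assert abs(A_hat - A) < 0.01       # close to the true value for large N
```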
Properties: The MLE may be biased and is not necessarily an efficient estimator. However:

The MLE is asymptotically unbiased: lim_{N→∞} E(θ̂) = θ.

The MLE asymptotically attains the CRLB.

The MLE asymptotically has a Gaussian PDF: θ̂ ~a N(θ, I⁻¹(θ)).

If an MVU estimator exists, then the ML procedure will find it.

Example: Consider a DC level in WGN with known variance σ². Then

p(x; A) = [1/(2πσ²)^{N/2}] exp[−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²]

∂ ln p(x; A)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A) = 0 ⟹ Σ_{n=0}^{N−1} x[n] − NA = 0

which leads to Â = x̄ = (1/N) Σ_{n=0}^{N−1} x[n]
MLE for Transformed Parameters (Invariance Property)

Find the MLE for α = g(θ) when the PDF p(x; θ) is given:

1 If α = g(θ) is a one-to-one function, then

α̂ = arg max_α p(x; g⁻¹(α)) = g(θ̂)

Example: Consider a DC level in WGN and find the MLE of α = exp(A). Since g(θ) is a one-to-one function, α̂ = exp(x̄).

2 If α = g(θ) is not a one-to-one function, then the modified likelihood is used:

pT(x; α) = max_{θ: α=g(θ)} p(x; θ) and α̂ = arg max_α pT(x; α)

Example: Consider a DC level in WGN and find the MLE of α = A². Since g(θ) is not a one-to-one function:

α̂ = arg max_{α≥0} max{p(x; √α), p(x; −√α)} = [arg max_A p(x; A)]² = x̄²
Example: DC level in WGN with unknown variance. The vector parameter θ = [A σ²]ᵀ should be estimated. We have:

∂ ln p(x; θ)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A)

∂ ln p(x; θ)/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ_{n=0}^{N−1} (x[n] − A)²

which leads to the following MLE:

Â = x̄, σ̂² = (1/N) Σ_{n=0}^{N−1} (x[n] − x̄)²

Asymptotic properties of the MLE: Under some regularity conditions, the MLE is asymptotically normally distributed, θ̂ ~a N(θ, I⁻¹(θ)), even if the PDF of x is not Gaussian.
MLE for General Gaussian Case
Consider the general Gaussian case where x ~ N(μ(θ), C(θ)). The partial derivative of the log-PDF is:

∂ ln p(x; θ)/∂θk = −(1/2) tr[C⁻¹(θ) ∂C(θ)/∂θk] + [∂μ(θ)/∂θk]ᵀ C⁻¹(θ)(x − μ(θ)) − (1/2)(x − μ(θ))ᵀ [∂C⁻¹(θ)/∂θk] (x − μ(θ))

for k = 1, . . . , p.

By setting the above equations equal to zero, the MLE can be found.

A particular case is when C is known (the first and third terms become zero).

In addition, if μ(θ) is linear in θ, the general linear model is obtained.
MLE for General Linear Models
Consider the general linear model x = Hθ + w, where w is a noise vector with PDF N(0, C):

p(x; θ) = [1/√((2π)^N det(C))] exp[−(1/2)(x − Hθ)ᵀC⁻¹(x − Hθ)]

Taking the derivative of ln p(x; θ) leads to:

∂ ln p(x; θ)/∂θ = HᵀC⁻¹(x − Hθ)

Then

HᵀC⁻¹(x − Hθ) = 0 ⟹ θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x

which is the same as the MVU estimator. The PDF of θ̂ is:

θ̂ ~ N(θ, (HᵀC⁻¹H)⁻¹)
MLE (Numerical Method)
Newton-Raphson: A closed-form estimator cannot always be computed by maximizing the likelihood function. However, the maximum can be computed by numerical methods such as the iterative Newton-Raphson algorithm:

θ_{k+1} = θ_k − [∂² ln p(x; θ)/∂θ∂θᵀ]⁻¹ ∂ ln p(x; θ)/∂θ |_{θ=θ_k}

Remarks:

The Hessian can be replaced by the negative of its expectation, the Fisher information matrix I(θ).

The method can suffer from convergence problems (local maxima). Typically, for large data length, the log-likelihood function becomes more quadratic near the maximum and the algorithm will produce the MLE.
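The iteration can be sketched on the earlier DC-level example with σ² = A, where setting the score to zero reduces to solving A² + A = (1/N)Σx²[n]. A scalar Newton-Raphson sketch (initial guess and data parameters are arbitrary choices), which should land on the closed-form root:

```python
import numpy as np

# Newton-Raphson on the scalar likelihood equation f(A) = A^2 + A - m2 = 0
# for x[n] = A + w[n], w[n] ~ N(0, A) (the unknown-variance DC-level example).
rng = np.random.default_rng(8)
A_true, N = 2.0, 200_000
x = A_true + np.sqrt(A_true) * rng.standard_normal(N)
m2 = np.mean(x**2)

A = 1.0                                   # initial guess
for _ in range(20):
    A = A - (A**2 + A - m2) / (2*A + 1)   # Newton step: f / f'

closed_form = -0.5 + np.sqrt(m2 + 0.25)
assert abs(A - closed_form) < 1e-10       # converges to the closed-form MLE
assert abs(A - A_true) < 0.05             # and is near the true value for large N
```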
Least Squares Estimation