Estimation Theory Presentation



    Estimation Theory

    Alireza Karimi

Laboratoire d'Automatique, MEC2 397, email: [email protected]

    Spring 2011


    Course Objective

Extract information from noisy signals.

Parameter Estimation Problem : Given a set of measured data

{x[0], x[1], . . . , x[N-1]}

which depends on an unknown parameter vector θ, determine an estimator

θ̂ = g(x[0], x[1], . . . , x[N-1])

where g is some function.

Applications : Image processing, communications, biomedicine, system identification, state estimation in control, etc.


    References

    Main reference :

Fundamentals of Statistical Signal Processing: Estimation Theory

by Steven M. Kay, Prentice-Hall, 1993 (available in the Library de La Fontaine, RLC). We cover Chapters 1 to 14, skipping Chapter 5 and Chapter 9.

Other references :

Lessons in Estimation Theory for Signal Processing, Communications and Control. By Jerry M. Mendel, Prentice-Hall, 1995.

Probability, Random Processes and Estimation Theory for Engineers. By Henry Stark and John W. Woods, Prentice-Hall, 1986.


    Review of Probability and Random Variables


    Random Variables

Random Variable : A rule X(ω) that assigns to every element ω of a sample space Ω a real value is called a RV. So X is not really a variable that varies randomly but a function whose domain is Ω and whose range is some subset of the real line.

    Example : Consider the experiment of throwing a coin twice. The sample

    space (the possible outcomes) is :

Ω = {HH, HT, TH, TT}

    We can define a random variable X such that

    X(HH) = 1, X(HT) = 1.1, X(TH) = 1.6, X(TT) = 1.8

Random variable X assigns to each event (e.g. E = {HT, TH}) a subset of the real line (in this case B = {1.1, 1.6}).


    Probability Distribution Function

For any element ω in Ω, the event {ω | X(ω) ≤ x} is an important event. The probability of this event,

Pr[{ω | X(ω) ≤ x}] = PX(x)

is called the probability distribution function of X.

Example : For the random variable defined earlier, we have :

PX(1.5) = Pr[{ω | X(ω) ≤ 1.5}] = Pr[{HH, HT}] = 0.5

PX(x) can be computed for all x ∈ R. It is clear that 0 ≤ PX(x) ≤ 1.

Remark : For the same experiment (throwing a coin twice) we could define another random variable that would lead to a different PX(x).

In most engineering problems the sample space is a subset of the real line, so X(ω) = ω and PX(x) is a continuous function of x.


    Probability Density Function (PDF)

The Probability Density Function, if it exists, is given by :

pX(x) = dPX(x)/dx

When we deal with a single random variable the subscripts are removed :

p(x) = dP(x)/dx

Properties :

(i) ∫_{-∞}^{∞} p(x) dx = P(∞) - P(-∞) = 1

(ii) Pr[{ω | X(ω) ≤ x}] = Pr[X ≤ x] = P(x) = ∫_{-∞}^{x} p(ξ) dξ

(iii) Pr[x1 < X ≤ x2] = ∫_{x1}^{x2} p(x) dx


    Some other common PDF

Chi-square χ²_n : p(x) = [1/(2^{n/2} Γ(n/2))] x^{n/2-1} exp(-x/2) for x > 0, and p(x) = 0 for x < 0

Exponential (λ > 0) : p(x) = (1/λ) exp(-x/λ) u(x)

Rayleigh (σ > 0) : p(x) = (x/σ²) exp(-x²/(2σ²)) u(x)

Uniform (b > a) : p(x) = 1/(b-a) for a < x < b, and 0 otherwise

where Γ(z) = ∫_0^∞ t^{z-1} e^{-t} dt and u(x) is the unit step function.


    Joint, Marginal and Conditional PDF

Joint PDF : Consider two random variables X and Y; then :

Pr[x1 < X ≤ x2 and y1 < Y ≤ y2] = ∫_{x1}^{x2} ∫_{y1}^{y2} p(x, y) dy dx

Marginal PDF :

p(x) = ∫_{-∞}^{∞} p(x, y) dy   and   p(y) = ∫_{-∞}^{∞} p(x, y) dx

Conditional PDF : p(x|y) is defined as the PDF of X conditioned on knowing the value of Y.

Bayes Formula : Consider two RVs defined on the same probability space; then we have :

p(x, y) = p(x|y)p(y) = p(y|x)p(x)   or   p(x|y) = p(x, y)/p(y)


    Independent Random Variables

Two RVs X and Y are independent if and only if :

p(x, y) = p(x)p(y)

A direct conclusion is that :

p(x|y) = p(x, y)/p(y) = p(x)p(y)/p(y) = p(x)   and   p(y|x) = p(y)

which means conditioning does not change the PDF.

Remark : For a joint Gaussian PDF the contours of constant density are ellipses centered at (µx, µy). For independent X and Y the major (or minor) axis is parallel to the x or y axis.


    Expected Value of a Random Variable

The expected value, if it exists, of a random variable X with PDF p(x) is defined by :

E(X) = ∫_{-∞}^{∞} x p(x) dx

Some properties of the expected value :

E{X + Y} = E{X} + E{Y}

E{aX} = aE{X}

The expected value of Y = g(X) can be computed by :

E(Y) = ∫_{-∞}^{∞} g(x) p(x) dx

Conditional expectation : The conditional expectation of X given that a specific value of Y has occurred is :

E(X|Y) = ∫_{-∞}^{∞} x p(x|y) dx


    Moments of a Random Variable

The rth moment of X is defined as :

E(X^r) = ∫_{-∞}^{∞} x^r p(x) dx

The first moment of X is its expected value or the mean (µ = E(X)).

Moments of Gaussian RVs : A Gaussian RV with distribution N(µ, σ²) has moments of all orders in closed form :

E(X) = µ        E(X²) = µ² + σ²

E(X³) = µ³ + 3µσ²

E(X⁴) = µ⁴ + 6µ²σ² + 3σ⁴

E(X⁵) = µ⁵ + 10µ³σ² + 15µσ⁴

E(X⁶) = µ⁶ + 15µ⁴σ² + 45µ²σ⁴ + 15σ⁶
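These closed-form moments are easy to sanity-check numerically. A minimal sketch (not from the slides; numpy and the particular values of µ and σ are assumptions) comparing empirical moments of simulated Gaussian samples with the formulas above :

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 1.5, 2.0                      # arbitrary example values
    x = mu + sigma * rng.standard_normal(1_000_000)

    # closed-form moments of N(mu, sigma^2) from the slide
    closed_form = {
        2: mu**2 + sigma**2,
        3: mu**3 + 3*mu*sigma**2,
        4: mu**4 + 6*mu**2*sigma**2 + 3*sigma**4,
    }
    for r, target in closed_form.items():
        print(r, np.mean(x**r), target)       # empirical moment vs. closed form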


    Central Moments

The rth central moment of X is defined as :

E[(X - µ)^r] = ∫_{-∞}^{∞} (x - µ)^r p(x) dx

The second central moment (the variance) is denoted by σ² or var(X).

Central Moments of Gaussian RVs :

E[(X - µ)^r] = 0 if r is odd, and E[(X - µ)^r] = σ^r (r - 1)!! if r is even

where n!! denotes the double factorial, that is the product of every odd number from n down to 1.


    Some properties of Gaussian RVs

If X ~ N(µx, σx²) then Z = (X - µx)/σx ~ N(0, 1).

If Z ~ N(0, 1) then X = σx Z + µx ~ N(µx, σx²).

If X ~ N(µx, σx²) then Z = aX + b ~ N(aµx + b, a²σx²).

If X ~ N(µx, σx²) and Y ~ N(µy, σy²) are two independent RVs, then

aX + bY ~ N(aµx + bµy, a²σx² + b²σy²)

The sum of squares of n independent RVs with standard normal distribution N(0, 1) has a χ²_n distribution with n degrees of freedom. For large values of n, χ²_n converges to N(n, 2n).

The Euclidean norm √(X² + Y²) of two independent RVs with standard normal distribution has the Rayleigh distribution.


    Covariance

For two RVs X and Y, the covariance is defined as

σxy = E[(X - µx)(Y - µy)] = ∫∫ (x - µx)(y - µy) p(x, y) dx dy

If X and Y are zero mean then σxy = E{XY}.

var(X + Y) = σx² + σy² + 2σxy

var(aX) = a²σx²

Important formula : The relation between the variance and the mean of X is given by

σ² = E[(X - µ)²] = E(X²) - 2µE(X) + µ² = E(X²) - µ²

The variance is the mean of the square minus the square of the mean.


    Independence, Uncorrelatedness and Orthogonality

If σxy = 0, then X and Y are uncorrelated and

E{XY} = E{X}E{Y}

X and Y are called orthogonal if E{XY} = 0.

If X and Y are independent then they are uncorrelated :

p(x, y) = p(x)p(y)  ⇒  E{XY} = E{X}E{Y}

Uncorrelatedness does not imply independence. For example, if X is a normal RV with zero mean and Y = X², we have p(y|x) ≠ p(y) but

σxy = E{XY} - E{X}E{Y} = E{X³} - 0 = 0

Correlation only captures the linear dependence between two RVs, so it is weaker than independence.

For jointly Gaussian RVs, independence is equivalent to being uncorrelated.


    Random Vectors

Random Vector : is a vector of random variables¹ :

x = [x1, x2, . . . , xn]ᵀ

Expectation Vector : µx = E(x) = [E(x1), E(x2), . . . , E(xn)]ᵀ

Covariance Matrix : Cx = E[(x - µx)(x - µx)ᵀ]

Cx is an n × n symmetric matrix which is assumed to be positive definite and so invertible.

The elements of this matrix are : [Cx]ij = E{[xi - E(xi)][xj - E(xj)]}.

If the random variables are uncorrelated then Cx is a diagonal matrix.

Multivariate Gaussian PDF :

p(x) = [1/√((2π)ⁿ det(Cx))] exp(-(1/2)(x - µx)ᵀ Cx⁻¹ (x - µx))

¹In some books (including our main reference) there is no distinction between the random variable X and its specific value x. From now on we adopt the notation of our reference.


    Random Processes

Discrete Random Process : x[n] is a sequence of random variables defined for every integer n.

Mean value : is defined as E(x[n]) = µx[n].

Autocorrelation Function (ACF) : is defined as

rxx[k, n] = E(x[n] x[n + k])

Wide Sense Stationary (WSS) : x[n] is WSS if its mean and its autocorrelation function (ACF) do not depend on n.

Autocovariance function : is defined as

cxx[k] = E[(x[n] - µx)(x[n + k] - µx)] = rxx[k] - µx²

Cross-correlation Function (CCF) : is defined as

rxy[k] = E(x[n] y[n + k])

Cross-covariance function : is defined as

cxy[k] = E[(x[n] - µx)(y[n + k] - µy)] = rxy[k] - µx µy


    Discrete White Noise

Some properties of the ACF and CCF :

rxx[0] ≥ |rxx[k]|,   rxx[-k] = rxx[k],   rxy[-k] = ryx[k]

Power Spectral Density : The Fourier transform of the ACF and CCF gives the Auto-PSD and Cross-PSD :

Pxx(f) = Σ_{k=-∞}^{∞} rxx[k] exp(-j2πfk)

Pxy(f) = Σ_{k=-∞}^{∞} rxy[k] exp(-j2πfk)

Discrete White Noise : is a discrete random process with zero mean and rxx[k] = σ²δ[k], where δ[k] is the Kronecker impulse function. The PSD of white noise becomes Pxx(f) = σ² and is completely flat with frequency.
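As a quick illustration (not part of the slides; numpy and the chosen σ² and N are assumptions), the sample ACF of simulated white noise is close to σ²δ[k] and its periodogram is roughly flat :

    import numpy as np

    rng = np.random.default_rng(1)
    sigma2, N = 2.0, 4096
    w = np.sqrt(sigma2) * rng.standard_normal(N)

    # sample ACF for a few lags: close to sigma^2 at k = 0 and close to 0 otherwise
    acf = [np.mean(w[:N-k] * w[k:]) for k in range(5)]
    print(np.round(acf, 3))

    # periodogram as a crude PSD estimate: roughly flat at the level sigma^2
    psd = np.abs(np.fft.rfft(w))**2 / N
    print(psd.mean())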


Introduction and Minimum Variance Unbiased Estimation


    The Mathematical Estimation Problem

Parameter Estimation Problem : Given a set of measured data

x = {x[0], x[1], . . . , x[N-1]}

which depends on an unknown parameter vector θ, determine an estimator

θ̂ = g(x[0], x[1], . . . , x[N-1])

where g is some function.

The first step is to find the PDF of the data as a function of θ : p(x; θ).

Example : Consider the problem of a DC level in white Gaussian noise with a single observed data point x[0] = θ + w[0], where w[0] has the PDF N(0, σ²). Then the PDF of x[0] is :

p(x[0]; θ) = [1/√(2πσ²)] exp(-[1/(2σ²)](x[0] - θ)²)


    The Mathematical Estimation Problem

Example : Consider a data sequence that can be modeled with a linear trend in white Gaussian noise

x[n] = A + Bn + w[n],   n = 0, 1, . . . , N-1

Suppose that w[n] ~ N(0, σ²) and is uncorrelated with all the other samples. Letting θ = [A B]ᵀ and x = [x[0], x[1], . . . , x[N-1]]ᵀ, the PDF is :

p(x; θ) = Π_{n=0}^{N-1} p(x[n]; θ) = [1/(2πσ²)^{N/2}] exp(-[1/(2σ²)] Σ_{n=0}^{N-1} (x[n] - A - Bn)²)

The quality of any estimator for this problem is related to the assumptions on the data model; in this example, the linear trend and the WGN PDF assumption.


    The Mathematical Estimation Problem

    Classical versus Bayesian estimation

If we assume θ is deterministic we have a classical estimation problem. The following methods will be studied : MVU, MLE, BLUE, LSE.

If we assume θ is a random variable with a known PDF, then we have a Bayesian estimation problem. In this case the data are described by the joint PDF

p(x, θ) = p(x|θ)p(θ)

where p(θ) summarizes our knowledge about θ before any data is observed and p(x|θ) summarizes the knowledge provided by the data x conditioned on knowing θ. The following methods will be studied : MMSE, MAP, Kalman Filter.


    Assessing Estimator Performance

Consider the problem of estimating a DC level A in uncorrelated noise :

x[n] = A + w[n],   n = 0, 1, . . . , N-1

Consider the following estimators :

Â1 = (1/N) Σ_{n=0}^{N-1} x[n]

Â2 = x[0]

Suppose that A = 1, Â1 = 0.95 and Â2 = 0.98. Which estimator is better ?

An estimator is a random variable, so its performance can only be described by its PDF or statistically (e.g. by Monte-Carlo simulation).
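A Monte-Carlo experiment of the kind mentioned above can be sketched in a few lines (numpy and the chosen values of A, σ and N are assumptions, not part of the slides) :

    import numpy as np

    rng = np.random.default_rng(2)
    A, sigma, N, trials = 1.0, 1.0, 50, 10_000

    x = A + sigma * rng.standard_normal((trials, N))
    A1 = x.mean(axis=1)    # sample-mean estimator
    A2 = x[:, 0]           # single-sample estimator

    # both are unbiased, but the sample mean has a much smaller variance
    print(A1.mean(), A1.var())    # close to A and to sigma^2 / N
    print(A2.mean(), A2.var())    # close to A and to sigma^2

This anticipates the comparison that follows : the spread of Â1 around A is far smaller than that of Â2.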


    Unbiased Estimators

An estimator that on the average yields the true value is unbiased. Mathematically,

E(θ̂) - θ = 0   for a < θ < b

Let's compute the expectation of the two estimators Â1 and Â2 :

E(Â1) = (1/N) Σ_{n=0}^{N-1} E(x[n]) = (1/N) Σ_{n=0}^{N-1} E(A + w[n]) = (1/N) Σ_{n=0}^{N-1} (A + 0) = A

E(Â2) = E(x[0]) = E(A + w[0]) = A + 0 = A

Both estimators are unbiased. Which one is better ?

Now, let's compute the variance of the two estimators :

var(Â1) = var( (1/N) Σ_{n=0}^{N-1} x[n] ) = (1/N²) Σ_{n=0}^{N-1} var(x[n]) = (1/N²) Nσ² = σ²/N

var(Â2) = var(x[0]) = σ² > var(Â1)


    Unbiased Estimators

Remark : When several unbiased estimators of the same parameter obtained from independent sets of data are available, i.e., θ̂1, θ̂2, . . . , θ̂n, a better estimator can be obtained by averaging :

θ̂ = (1/n) Σ_{i=1}^{n} θ̂i,   E(θ̂) = θ

Assuming that the estimators have the same variance, we have :

var(θ̂) = (1/n²) Σ_{i=1}^{n} var(θ̂i) = (1/n²) n var(θ̂i) = var(θ̂i)/n

By increasing n, the variance will decrease (as n → ∞, θ̂ → θ).

This is not the case for biased estimators, no matter how many estimators are averaged.


    Minimum Variance Criterion

The most logical criterion for estimation is the Mean Square Error (MSE) :

mse(θ̂) = E[(θ̂ - θ)²]

Unfortunately this type of criterion leads to unrealizable estimators (the estimator will depend on θ) :

mse(θ̂) = E{[θ̂ - E(θ̂) + E(θ̂) - θ]²} = E{[θ̂ - E(θ̂) + b(θ)]²}

where b(θ) = E(θ̂) - θ is defined as the bias of the estimator. Therefore :

mse(θ̂) = E{[θ̂ - E(θ̂)]²} + 2b(θ)E[θ̂ - E(θ̂)] + b²(θ) = var(θ̂) + b²(θ)

Instead of minimizing the MSE we can minimize the variance of the unbiased estimators :

Minimum Variance Unbiased Estimator


Minimum Variance Unbiased Estimator

Existence of the MVU Estimator : In general the MVU estimator does not always exist. There may be no unbiased estimator, or none of the unbiased estimators has a uniformly minimum variance.

Finding the MVU Estimator : There is no known procedure which always leads to the MVU estimator. Three existing approaches are :

1 Determine the Cramer-Rao lower bound (CRLB) and check to see if some estimator satisfies it.

2 Apply the Rao-Blackwell-Lehmann-Scheffe theorem (we will skip it).

3 Restrict to linear unbiased estimators.


    Cramer-Rao Lower Bound


The CRLB is a lower bound on the variance of any unbiased estimator :

var(θ̂) ≥ CRLB(θ)

Note that the CRLB is a function of θ.

It tells us what is the best performance that can be achieved (useful in feasibility studies and in comparison with other estimators).

It may lead us to compute the MVU estimator.


    Theorem (scalar case)

Assume that the PDF p(x; θ) satisfies the regularity condition

E[ ∂ ln p(x; θ)/∂θ ] = 0   for all θ

Then the variance of any unbiased estimator θ̂ satisfies

var(θ̂) ≥ 1 / ( -E[ ∂² ln p(x; θ)/∂θ² ] )

An unbiased estimator that attains the CRLB can be found iff :

∂ ln p(x; θ)/∂θ = I(θ)(g(x) - θ)

for some functions g(x) and I(θ). The estimator is θ̂ = g(x) and the minimum variance is 1/I(θ).


Example : Consider x[0] = A + w[0] with w[0] ~ N(0, σ²).

p(x[0]; A) = [1/√(2πσ²)] exp(-[1/(2σ²)](x[0] - A)²)

ln p(x[0]; A) = -ln √(2πσ²) - [1/(2σ²)](x[0] - A)²

Then

∂ ln p(x[0]; A)/∂A = (1/σ²)(x[0] - A)

∂² ln p(x[0]; A)/∂A² = -1/σ²

According to the Theorem :

var(Â) ≥ σ²,   I(A) = 1/σ²,   and   Â = g(x[0]) = x[0]


Example : Consider multiple observations of a DC level in WGN :

x[n] = A + w[n],   n = 0, 1, . . . , N-1   with   w[n] ~ N(0, σ²)

p(x; A) = [1/(2πσ²)^{N/2}] exp(-[1/(2σ²)] Σ_{n=0}^{N-1} (x[n] - A)²)

Then

∂ ln p(x; A)/∂A = ∂/∂A { -ln[(2πσ²)^{N/2}] - [1/(2σ²)] Σ_{n=0}^{N-1} (x[n] - A)² }

= (1/σ²) Σ_{n=0}^{N-1} (x[n] - A) = (N/σ²) [ (1/N) Σ_{n=0}^{N-1} x[n] - A ]

According to the Theorem :

var(Â) ≥ σ²/N,   I(A) = N/σ²,   and   Â = g(x) = (1/N) Σ_{n=0}^{N-1} x[n]
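The bound var(Â) ≥ σ²/N is easy to verify empirically for the sample-mean estimator. A short sketch (numpy assumed; the values of A, σ and N are arbitrary) comparing the Monte-Carlo variance with the CRLB :

    import numpy as np

    rng = np.random.default_rng(3)
    A, sigma, N, trials = 3.0, 0.5, 100, 20_000

    x = A + sigma * rng.standard_normal((trials, N))
    A_hat = x.mean(axis=1)        # the efficient estimator g(x)

    print(A_hat.var())            # Monte-Carlo variance of the estimator
    print(sigma**2 / N)           # CRLB = sigma^2 / N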

    Transformation of Parameters


If it is desired to estimate α = g(θ), then the CRLB is :

var(α̂) ≥ (∂g/∂θ)² / ( -E[ ∂² ln p(x; θ)/∂θ² ] )

Example : Compute the CRLB for estimation of the power (A²) of a DC level in noise :

var(Â²) ≥ (σ²/N)(2A)² = 4A²σ²/N

Definition

Efficient estimator : An unbiased estimator that attains the CRLB is said to be efficient.

Example : Knowing that x̄ = (1/N) Σ_{n=0}^{N-1} x[n] is an efficient estimator for A, is x̄² an efficient estimator for A² ?


Solution : Knowing that x̄ ~ N(A, σ²/N), we have :

E(x̄²) = E²(x̄) + var(x̄) = A² + σ²/N ≠ A²

So the estimator Â² = x̄² is not even unbiased.

Let's look at the variance of this estimator :

var(x̄²) = E(x̄⁴) - E²(x̄²)

but we have, from the moments of Gaussian RVs (slide 15) :

E(x̄⁴) = A⁴ + 6A²(σ²/N) + 3(σ²/N)²

Therefore :

var(x̄²) = A⁴ + 6A²σ²/N + 3σ⁴/N² - (A² + σ²/N)² = 4A²σ²/N + 2σ⁴/N²
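A numerical check of this bias and variance computation (numpy assumed; A, σ and N are chosen arbitrarily) :

    import numpy as np

    rng = np.random.default_rng(4)
    A, sigma, N, trials = 2.0, 1.0, 20, 200_000

    xbar = (A + sigma * rng.standard_normal((trials, N))).mean(axis=1)
    est = xbar**2                                    # estimator of A^2

    print(est.mean(), A**2 + sigma**2/N)                      # biased: mean is A^2 + sigma^2/N
    print(est.var(), 4*A**2*sigma**2/N + 2*sigma**4/N**2)     # variance formula from the slide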


Remarks :

The estimator Â² = x̄² is biased and not efficient.

As N → ∞ the bias goes to zero and the variance of the estimator approaches the CRLB. Estimators of this type are called asymptotically efficient.

General Remarks :

If g(θ) = aθ + b is an affine function of θ, then ĝ = g(θ̂) is an efficient estimator. First, it is unbiased : E(aθ̂ + b) = aθ + b = g(θ); moreover :

var(g(θ̂)) ≥ (∂g/∂θ)² var(θ̂) = a² var(θ̂)

but var(g(θ̂)) = var(aθ̂ + b) = a² var(θ̂), so the CRLB is achieved.

If g(θ) is a nonlinear function of θ and θ̂ is an efficient estimator, then g(θ̂) is an asymptotically efficient estimator.


    Theorem (Vector Parameter)

Assume that the PDF p(x; θ) satisfies the regularity condition

E[ ∂ ln p(x; θ)/∂θ ] = 0   for all θ

Then the covariance matrix of any unbiased estimator θ̂ satisfies

C_θ̂ - I⁻¹(θ) ≥ 0

where ≥ 0 means that the matrix is positive semidefinite. I(θ) is called the Fisher information matrix and is given by :

[I(θ)]ij = -E[ ∂² ln p(x; θ)/∂θi ∂θj ]

An unbiased estimator that attains the CRLB can be found iff :

∂ ln p(x; θ)/∂θ = I(θ)(g(x) - θ)


    CRLB Extension to Vector Parameter


Example : Consider a DC level in WGN with A and σ² unknown. Compute the CRLB for estimation of θ = [A σ²]ᵀ.

ln p(x; θ) = -(N/2) ln 2π - (N/2) ln σ² - [1/(2σ²)] Σ_{n=0}^{N-1} (x[n] - A)²

The Fisher information matrix is :

I(θ) = -E [ ∂² ln p(x;θ)/∂A²       ∂² ln p(x;θ)/∂A∂σ²  ]   =   [ N/σ²     0        ]
           [ ∂² ln p(x;θ)/∂σ²∂A    ∂² ln p(x;θ)/∂(σ²)² ]       [ 0        N/(2σ⁴)  ]

The matrix is diagonal (just for this example) and can be easily inverted to yield :

var(Â) ≥ σ²/N        var(σ̂²) ≥ 2σ⁴/N

Is there any unbiased estimator that achieves these bounds ?


    Transformation of Parameters


If it is desired to estimate α = g(θ), and the CRLB for the covariance of θ̂ is I⁻¹(θ), then :

C_α̂ - [∂g(θ)/∂θ] I⁻¹(θ) [∂g(θ)/∂θ]ᵀ ≥ 0

Example : Consider a DC level in WGN with A and σ² unknown. Compute the CRLB for estimation of the signal-to-noise ratio α = A²/σ².

We have θ = [A σ²]ᵀ and α = g(θ) = θ1²/θ2, so the Jacobian is :

∂g(θ)/∂θ = [ ∂g(θ)/∂θ1   ∂g(θ)/∂θ2 ] = [ 2A/σ²   -A²/σ⁴ ]

So the CRLB is :

var(α̂) ≥ [ 2A/σ²   -A²/σ⁴ ] diag(σ²/N, 2σ⁴/N) [ 2A/σ²   -A²/σ⁴ ]ᵀ = (4α + 2α²)/N


    Linear Models with WGN


If the N observed data samples can be modeled as

x = Hθ + w

where

x = N × 1 observation vector
H = N × p observation matrix (known, rank p)
θ = p × 1 vector of parameters to be estimated
w = N × 1 noise vector with PDF N(0, σ²I)

compute the CRLB and the MVU estimator that achieves this bound.

Step 1 : Compute ln p(x; θ).

Step 2 : Compute I(θ) = -E[ ∂² ln p(x; θ)/∂θ² ] and the covariance matrix of θ̂ : C_θ̂ = I⁻¹(θ).

Step 3 : Find the MVU estimator g(x) by factoring

∂ ln p(x; θ)/∂θ = I(θ)[g(x) - θ]


Step 1 : ln p(x; θ) = -ln[(2πσ²)^{N/2}] - [1/(2σ²)](x - Hθ)ᵀ(x - Hθ).

Step 2 :

∂ ln p(x; θ)/∂θ = -[1/(2σ²)] ∂/∂θ [ xᵀx - 2xᵀHθ + θᵀHᵀHθ ] = (1/σ²)[ Hᵀx - HᵀHθ ]

Then I(θ) = -E[ ∂² ln p(x; θ)/∂θ² ] = (1/σ²) HᵀH

Step 3 : Find the MVU estimator g(x) by factoring

∂ ln p(x; θ)/∂θ = I(θ)[g(x) - θ] = (HᵀH/σ²)[ (HᵀH)⁻¹Hᵀx - θ ]

Therefore :

θ̂ = g(x) = (HᵀH)⁻¹Hᵀx   and   C_θ̂ = I⁻¹(θ) = σ²(HᵀH)⁻¹
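A minimal numerical sketch of this estimator (numpy assumed; H, θ and σ are made-up example values). Solving the normal equations with solve or lstsq is preferable to forming the explicit inverse :

    import numpy as np

    rng = np.random.default_rng(5)
    N, sigma = 200, 0.3
    theta_true = np.array([1.0, -2.0])                  # example parameter vector
    H = np.column_stack([np.ones(N), np.arange(N)])     # e.g. DC level plus linear trend
    x = H @ theta_true + sigma * rng.standard_normal(N)

    # MVU estimator for the linear model with WGN: theta_hat = (H^T H)^{-1} H^T x
    theta_hat = np.linalg.solve(H.T @ H, H.T @ x)
    C_theta = sigma**2 * np.linalg.inv(H.T @ H)         # covariance of the estimate

    print(theta_hat)
    print(np.sqrt(np.diag(C_theta)))                    # standard deviation of each estimate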


For a linear model with WGN represented by x = Hθ + w, the MVU estimator is :

θ̂ = (HᵀH)⁻¹Hᵀx

This estimator is efficient and attains the CRLB.

That the estimator is unbiased can be seen easily from :

E(θ̂) = (HᵀH)⁻¹Hᵀ E(Hθ + w) = θ

The statistical performance of θ̂ is completely specified because θ̂ is a linear transformation of the Gaussian vector x and hence has a Gaussian distribution : θ̂ ~ N(θ, σ²(HᵀH)⁻¹)


    Example (Curve Fitting)


Consider fitting the data x[n] by a p-th order polynomial function of n :

x[n] = θ0 + θ1 n + θ2 n² + · · · + θp n^p + w[n]

We have N data samples, so :

x = [x[0], x[1], . . . , x[N-1]]ᵀ
w = [w[0], w[1], . . . , w[N-1]]ᵀ
θ = [θ0, θ1, . . . , θp]ᵀ

so x = Hθ + w, where H is the N × (p+1) matrix

H = [ 1    0      0       . . .   0
      1    1      1       . . .   1
      1    2      4       . . .   2^p
      .    .      .       . . .   .
      1    N-1   (N-1)²   . . .   (N-1)^p ]

Hence the MVU estimator is : θ̂ = (HᵀH)⁻¹Hᵀx
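A sketch of this Vandermonde-type H and the resulting polynomial fit (numpy assumed; the order p, the coefficients and the noise level are illustrative) :

    import numpy as np

    rng = np.random.default_rng(6)
    N, p = 100, 2
    n = np.arange(N)
    theta_true = np.array([0.5, 0.1, -0.002])           # example coefficients theta_0..theta_p
    x = np.polyval(theta_true[::-1], n) + 0.2 * rng.standard_normal(N)

    # H[n, k] = n^k for k = 0..p, i.e. the matrix on the slide
    H = np.vander(n, p + 1, increasing=True).astype(float)

    theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)   # same solution as (H^T H)^{-1} H^T x
    print(theta_hat)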

    Example (Fourier Analysis)


Consider the Fourier analysis of the data x[n] :

x[n] = Σ_{k=1}^{M} ak cos(2πkn/N) + Σ_{k=1}^{M} bk sin(2πkn/N) + w[n]

so we have θ = [a1, a2, . . . , aM, b1, b2, . . . , bM]ᵀ and x = Hθ + w, where :

H = [ h_{a1}, h_{a2}, . . . , h_{aM}, h_{b1}, h_{b2}, . . . , h_{bM} ]

with the column vectors

h_{ak} = [ cos(2πkn/N) ]_{n=0,...,N-1}   and   h_{bk} = [ sin(2πkn/N) ]_{n=0,...,N-1}

Hence the MVU estimate of the Fourier coefficients is : θ̂ = (HᵀH)⁻¹Hᵀx


After simplification (noting that (HᵀH)⁻¹ = (2/N) I), we have :

θ̂ = (2/N) [ (h_{a1})ᵀx, . . . , (h_{aM})ᵀx, (h_{b1})ᵀx, . . . , (h_{bM})ᵀx ]ᵀ

which is the same as the standard solution :

âk = (2/N) Σ_{n=0}^{N-1} x[n] cos(2πkn/N),   b̂k = (2/N) Σ_{n=0}^{N-1} x[n] sin(2πkn/N)

From the properties of linear models the estimates are unbiased.

The covariance matrix is :

C_θ̂ = σ²(HᵀH)⁻¹ = (2σ²/N) I

Note that θ̂ is Gaussian and C_θ̂ is diagonal (the amplitude estimates are independent).


    Example (System Identification)


Consider identification of a Finite Impulse Response (FIR) model h[k], k = 0, 1, . . . , p-1, with input u[n] and output x[n] provided for n = 0, 1, . . . , N-1 :

x[n] = Σ_{k=0}^{p-1} h[k] u[n-k] + w[n],   n = 0, 1, . . . , N-1

The FIR model can be represented by the linear model x = Hθ + w, where

θ = [ h[0], h[1], . . . , h[p-1] ]ᵀ   (p × 1)

H = [ u[0]      0         . . .   0
      u[1]      u[0]      . . .   0
      .         .         . . .   .
      u[N-1]    u[N-2]    . . .   u[N-p] ]   (N × p)

The MVU estimate is θ̂ = (HᵀH)⁻¹Hᵀx with C_θ̂ = σ²(HᵀH)⁻¹.
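A small identification sketch along these lines (numpy assumed; the impulse response, input and noise level are invented for illustration) :

    import numpy as np

    rng = np.random.default_rng(8)
    N, p = 500, 4
    h_true = np.array([1.0, 0.5, -0.3, 0.1])            # example FIR coefficients
    u = rng.standard_normal(N)                           # known input
    x = np.convolve(u, h_true)[:N] + 0.05 * rng.standard_normal(N)

    # build the N x p convolution matrix with H[n, k] = u[n-k] (zero for n < k)
    H = np.zeros((N, p))
    for k in range(p):
        H[k:, k] = u[:N - k]

    h_hat = np.linalg.solve(H.T @ H, H.T @ x)            # MVU / least-squares estimate
    print(np.round(h_hat, 3))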


    Linear Models with Colored Gaussian Noise


Determine the MVU estimator for the linear model x = Hθ + w with w a colored Gaussian noise, w ~ N(0, C).

Whitening approach : Since C is positive definite, its inverse can be factored as C⁻¹ = DᵀD, where D is an invertible matrix. This matrix acts as a whitening transformation for w :

E[(Dw)(Dw)ᵀ] = E(DwwᵀDᵀ) = DCDᵀ = D D⁻¹ D⁻ᵀ Dᵀ = I

Now if we transform the linear model x = Hθ + w to

x' = Dx = DHθ + Dw = H'θ + w'

where w' = Dw ~ N(0, I) is white, we can compute the MVU estimator as :

θ̂ = (H'ᵀH')⁻¹H'ᵀx' = (HᵀDᵀDH)⁻¹HᵀDᵀDx

so, we have :

θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x   with   C_θ̂ = (H'ᵀH')⁻¹ = (HᵀC⁻¹H)⁻¹
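A sketch of this prewhitening computation (numpy assumed; the colored-noise covariance used here is an arbitrary example) :

    import numpy as np

    rng = np.random.default_rng(9)
    N = 300
    H = np.column_stack([np.ones(N), np.arange(N) / N])
    theta_true = np.array([1.0, 2.0])

    # example colored-noise covariance: C[i, j] = 0.9 ** |i - j|
    idx = np.arange(N)
    C = 0.9 ** np.abs(idx[:, None] - idx[None, :])
    w = np.linalg.cholesky(C) @ rng.standard_normal(N)   # noise with covariance C
    x = H @ theta_true + w

    # MVU estimator for colored noise: theta_hat = (H^T C^-1 H)^-1 H^T C^-1 x
    Ci_H = np.linalg.solve(C, H)
    Ci_x = np.linalg.solve(C, x)
    theta_hat = np.linalg.solve(H.T @ Ci_H, H.T @ Ci_x)
    print(theta_hat)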


    Linear Models with known components


Consider a linear model x = Hθ + s + w, where s is a known signal. To determine the MVU estimator let x' = x - s, so that x' = Hθ + w is a standard linear model. The MVU estimator is :

θ̂ = (HᵀH)⁻¹Hᵀ(x - s)   with   C_θ̂ = σ²(HᵀH)⁻¹

Example : Consider a DC level and an exponential in WGN : x[n] = A + r^n + w[n], where r is known. Then we have :

[x[0], x[1], . . . , x[N-1]]ᵀ = [1, 1, . . . , 1]ᵀ A + [1, r, . . . , r^{N-1}]ᵀ + [w[0], w[1], . . . , w[N-1]]ᵀ

so H = [1, 1, . . . , 1]ᵀ and s = [1, r, . . . , r^{N-1}]ᵀ. The MVU estimator is :

Â = (HᵀH)⁻¹Hᵀ(x - s) = (1/N) Σ_{n=0}^{N-1} (x[n] - r^n)   with   var(Â) = σ²/N


    Best Linear Unbiased Estimators (BLUE)


Problems with finding the MVU estimator :

The MVU estimator does not always exist or may be impossible to find.

The PDF of the data may be unknown.

BLUE is a suboptimal estimator that :

restricts the estimate to be linear in the data : θ̂ = Ax

restricts the estimate to be unbiased : E(θ̂) = A E(x) = θ

minimizes the variance of the estimate ;

needs only the mean and the variance of the data (not the PDF). As a result, in general, the PDF of the estimate cannot be computed.

Remark : The unbiasedness restriction implies a linear model for the data. However, BLUE may still be used if the data are transformed suitably or the model is linearized.


    Finding the BLUE (Scalar Case)


1 Choose a linear estimator for the observed data x[n], n = 0, 1, . . . , N-1 :

θ̂ = Σ_{n=0}^{N-1} an x[n] = aᵀx   where a = [a0, a1, . . . , a_{N-1}]ᵀ

2 Restrict the estimate to be unbiased :

E(θ̂) = Σ_{n=0}^{N-1} an E(x[n]) = θ

3 Minimize the variance

var(θ̂) = E{[θ̂ - E(θ̂)]²} = E{[aᵀx - aᵀE(x)]²} = E{aᵀ[x - E(x)][x - E(x)]ᵀa} = aᵀCa


Consider the problem of amplitude estimation of a known signal in noise :

x[n] = θ s[n] + w[n]

1 Choose a linear estimator : θ̂ = Σ_{n=0}^{N-1} an x[n] = aᵀx

2 Restrict the estimate to be unbiased : E(θ̂) = aᵀE(x) = θ aᵀs = θ

so aᵀs = 1, where s = [s[0], s[1], . . . , s[N-1]]ᵀ

3 Minimize aᵀCa subject to aᵀs = 1.

The constrained optimization can be solved using Lagrange multipliers :

Minimize J = aᵀCa + λ(aᵀs - 1)

The optimal solution is :

θ̂ = sᵀC⁻¹x / (sᵀC⁻¹s)   and   var(θ̂) = 1 / (sᵀC⁻¹s)

    Finding the BLUE (Vector Case)


Theorem (Gauss-Markov)

If the data are of the general linear model form

x = Hθ + w

where w is a noise vector with zero mean and covariance C (the PDF of w is arbitrary), then the BLUE of θ is :

θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x

and the covariance matrix of θ̂ is

C_θ̂ = (HᵀC⁻¹H)⁻¹

Remark : If the noise is Gaussian then the BLUE is the MVU estimator.


Example : Consider the problem of a DC level in white noise : x[n] = A + w[n], where w[n] has an unspecified PDF with var(w[n]) = σn².

We have θ = A and H = 1 = [1, 1, . . . , 1]ᵀ. The covariance matrix is :

C = diag(σ0², σ1², . . . , σ_{N-1}²)   and   C⁻¹ = diag(1/σ0², 1/σ1², . . . , 1/σ_{N-1}²)

and hence the BLUE is :

Â = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x = [ Σ_{n=0}^{N-1} 1/σn² ]⁻¹ Σ_{n=0}^{N-1} x[n]/σn²

and the minimum covariance is :

var(Â) = (HᵀC⁻¹H)⁻¹ = [ Σ_{n=0}^{N-1} 1/σn² ]⁻¹


    Maximum Likelihood Estimation


Problems : The MVU estimator often does not exist or cannot be found. The BLUE is restricted to linear models.

Maximum Likelihood Estimator (MLE) :

can always be applied if the PDF is known ;

is optimal for large data sizes ;

is computationally complex and may require numerical methods.

Basic Idea : Choose the parameter value that makes the observed data the most likely data to have been observed.

Likelihood Function : is the PDF p(x; θ) when θ is regarded as a variable (not a parameter).

ML Estimate : is the value of θ that maximizes the likelihood function.

Procedure : Form the log-likelihood function ln p(x; θ) ; differentiate with respect to θ ; set the derivative to zero and solve for θ̂.


Example : Consider a DC level in WGN with unknown variance, x[n] = A + w[n]. Suppose that A > 0 and σ² = A. The PDF is :

p(x; A) = [1/(2πA)^{N/2}] exp(-[1/(2A)] Σ_{n=0}^{N-1} (x[n] - A)²)

Taking the derivative of the log-likelihood function, we have :

∂ ln p(x; A)/∂A = -N/(2A) + (1/A) Σ_{n=0}^{N-1} (x[n] - A) + [1/(2A²)] Σ_{n=0}^{N-1} (x[n] - A)²

What is the CRLB ? Does an MVU estimator exist ?

The MLE can be found by setting the above equation to zero :

Â = -1/2 + √( (1/N) Σ_{n=0}^{N-1} x²[n] + 1/4 )
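A quick simulation of this estimator (numpy assumed; the value of A and the sample size are arbitrary) :

    import numpy as np

    rng = np.random.default_rng(10)
    A, N, trials = 2.0, 1000, 5000

    x = A + np.sqrt(A) * rng.standard_normal((trials, N))    # WGN with variance A
    A_mle = -0.5 + np.sqrt(np.mean(x**2, axis=1) + 0.25)     # the MLE derived above

    print(A_mle.mean())   # close to A for large N (asymptotically unbiased)
    print(A_mle.var())    # approaches the CRLB as N grows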


Properties : The MLE may be biased and is not necessarily an efficient estimator. However :

The MLE is asymptotically unbiased, meaning that lim_{N→∞} E(θ̂) = θ.

The MLE asymptotically attains the CRLB.

The MLE asymptotically has a Gaussian PDF : θ̂ ~ N(θ, I⁻¹(θ)).

If an MVU estimator exists, then the ML procedure will find it.

Example : Consider a DC level in WGN with known variance σ². Then

p(x; A) = [1/(2πσ²)^{N/2}] exp(-[1/(2σ²)] Σ_{n=0}^{N-1} (x[n] - A)²)

∂ ln p(x; A)/∂A = (1/σ²) Σ_{n=0}^{N-1} (x[n] - A) = 0   ⇒   Σ_{n=0}^{N-1} x[n] - NA = 0

which leads to Â = x̄ = (1/N) Σ_{n=0}^{N-1} x[n]


MLE for Transformed Parameters (Invariance Property)

Find the MLE of α = g(θ) when the PDF p(x; θ) is given :


1 If α = g(θ) is a one-to-one function, then

α̂ = arg max_α p(x; g⁻¹(α)) = g(θ̂)

Example : Consider a DC level in WGN and find the MLE of α = exp(A). Since g(θ) is a one-to-one function, α̂ = exp(x̄).

2 If α = g(θ) is not a one-to-one function, then

pT(x; α) = max_{θ : α = g(θ)} p(x; θ)   and   α̂ = arg max_α pT(x; α)

Example : Consider a DC level in WGN and find the MLE of α = A². Since g(θ) is not a one-to-one function :

α̂ = arg max_{α ≥ 0} max{ p(x; √α), p(x; -√α) } = ( arg max_A p(x; A) )² = x̄²


Example : DC level in WGN with unknown variance. The vector parameter θ = [A σ²]ᵀ should be estimated. We have :

∂ ln p(x; θ)/∂A = (1/σ²) Σ_{n=0}^{N-1} (x[n] - A)

∂ ln p(x; θ)/∂σ² = -N/(2σ²) + [1/(2σ⁴)] Σ_{n=0}^{N-1} (x[n] - A)²

which leads to the following MLE :

Â = x̄,   σ̂² = (1/N) Σ_{n=0}^{N-1} (x[n] - x̄)²

Asymptotic properties of the MLE : Under some regularity conditions, the MLE is asymptotically normally distributed, θ̂ ~ N(θ, I⁻¹(θ)), even if the PDF of x is not Gaussian.


    MLE for General Gaussian Case


Consider the general Gaussian case where x ~ N(µ(θ), C(θ)). The partial derivative of the log-PDF is :

∂ ln p(x; θ)/∂θk = -(1/2) tr[ C⁻¹(θ) ∂C(θ)/∂θk ]
                   + [ ∂µ(θ)/∂θk ]ᵀ C⁻¹(θ) (x - µ(θ))
                   - (1/2) (x - µ(θ))ᵀ [ ∂C⁻¹(θ)/∂θk ] (x - µ(θ))

for k = 1, . . . , p.

By setting the above equations equal to zero, the MLE can be found.

A particular case is when C is known (the first and third terms become zero).

In addition, if µ(θ) is linear in θ, the general linear model is obtained.


    MLE for General Linear Models


Consider the general linear model x = Hθ + w, where w is a noise vector with PDF N(0, C) :

p(x; θ) = [1/√((2π)^N det(C))] exp(-(1/2)(x - Hθ)ᵀC⁻¹(x - Hθ))

Taking the derivative of ln p(x; θ) leads to :

∂ ln p(x; θ)/∂θ = HᵀC⁻¹(x - Hθ)

Then

HᵀC⁻¹(x - Hθ) = 0   ⇒   θ̂ = (HᵀC⁻¹H)⁻¹HᵀC⁻¹x

which is the same as the MVU estimator. The PDF of θ̂ is :

θ̂ ~ N(θ, (HᵀC⁻¹H)⁻¹)

    MLE (Numerical Method)


Newton-Raphson : A closed-form estimator cannot always be computed by maximizing the likelihood function. However, the maximum can be computed by numerical methods like the iterative Newton-Raphson algorithm :

θ_{k+1} = θ_k - [ ∂² ln p(x; θ)/∂θ∂θᵀ ]⁻¹ [ ∂ ln p(x; θ)/∂θ ]   evaluated at θ = θ_k

Remarks :

The Hessian can be replaced by the negative of its expectation, the Fisher information matrix I(θ).

This method suffers from convergence problems (local maxima). Typically, for large data lengths, the log-likelihood function becomes more quadratic and the algorithm will produce the MLE.


    Least Squares Estimation
