
SGN-2607 Statistical Signal Processing

• Lectures: Heikki Huttunen; Heikki.Huttunen@tut.fi
  • Tuesdays 12-14, TB222/TB215, starting 8.1.2011.

• Exercises: Muhammad Farhan
  • Thursdays at 14-16, TB220 (first time 17.1)
  • Fridays at 10-12, TB220 (first time 18.1)

• Requirements: homework and the final exam
  • The homework assignment will be given later.
  • Late policy: late homework is not accepted.

• Weekly exercises give bonus points; check the website.
• Website: http://www.cs.tut.fi/courses/SGN-2607/



• Course literature: Kay S. M. (1993), Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall.

• Lecture slides will be available after each lecture at:
  http://www.cs.tut.fi/courses/SGN-2607/slides/lecture1.pdf
  http://www.cs.tut.fi/courses/SGN-2607/slides/lecture2.pdf
  ...
  http://www.cs.tut.fi/courses/SGN-2607/slides/lecture16.pdf

• The slides are probably enough for passing the course.


Passing the course

Requirements for passing the course are:
• Mandatory Matlab assignment.
• Passed exam. Grading 0-5; max. 30 pts:
  • 0-14 pts → 0
  • 15-17 pts → 1
  • 18-20 pts → 2
  • 21-23 pts → 3
  • 24-26 pts → 4
  • 27-30 pts → 5

Additionally, the exercises add a bonus to the exam grade (total 0-4 pts).


Introduction - estimation

• Our goal is to estimate the values of a group of parameters from data.

• Examples: radar, sonar, speech, image analysis, biomedicine, communications, control, seismology, etc.

• Parameter estimation: Given an N-point data set x = {x[0], x[1], . . . , x[N − 1]} which depends on an unknown parameter θ ∈ R, we wish to design an estimator for θ:

$$\hat{\theta} = g(x[0], x[1], \ldots, x[N-1]).$$

• The question is how to determine a good model and its parameters.


Introductory Example – Straight line

• Suppose we have the following time series and would like to approximate the relationship of the two coordinates.

[Figure: scatter plot of the data; x-axis 0 to 100, y-axis −20 to 100.]

• The relationship looks linear, so we could assume the following model:

$$y[n] = a\,x[n] + b + w[n],$$

with a ∈ R and b ∈ R unknown and w[n] ∼ N(0, σ²).


Introductory Example – Straight line (cont.)

• Each pair of a and b represents one line.
• Which line of the three would best describe the data set? Or some other line?

[Figure: the data with three candidate lines; x-axis 0 to 100, y-axis −20 to 100.]


Introductory Example – Straight line (cont.)

• It can be shown that the best solution (in the MVUE sense; to be defined later) is given by

$$\hat{a} = -\frac{6}{N(N+1)} \sum_{n=0}^{N-1} y(n) + \frac{12}{N(N^2-1)} \sum_{n=0}^{N-1} x(n)\,y(n)$$

$$\hat{b} = \frac{2(2N-1)}{N(N+1)} \sum_{n=0}^{N-1} y(n) - \frac{6}{N(N+1)} \sum_{n=0}^{N-1} x(n)\,y(n).$$

• Or, as we will later learn, in an easy matrix form:

$$\hat{\boldsymbol{\theta}} = \begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{x}$$
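As a concrete illustration, here is a minimal NumPy sketch of the matrix-form solution on synthetic data. The true parameters, noise level, and data below are assumptions for the demo, not from the slides; the data vector is called y to match the line model, and np.linalg.lstsq is used as a numerically stable way to solve the same normal equations rather than forming the inverse explicitly.

```python
import numpy as np

# Minimal sketch of theta_hat = (H^T H)^{-1} H^T y for y[n] = a*x[n] + b + w[n].
rng = np.random.default_rng(0)
N = 100
x = np.arange(N)                          # x[n] = n, as in the example
y = 0.9 * x + 3.0 + rng.normal(0, 5, N)   # assumed true a = 0.9, b = 3, sigma = 5

# Observation matrix: each row is [x[n], 1].
H = np.column_stack([x, np.ones(N)])

# Least-squares solution; equivalent to (H^T H)^{-1} H^T y but better conditioned.
a_hat, b_hat = np.linalg.lstsq(H, y, rcond=None)[0]
print(a_hat, b_hat)   # close to 0.9 and 3.0
```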


Introductory Example 2 – Sinusoid

• Consider transmitting the sinusoid below.

[Figure: clean sinusoid; x-axis 0 to 180, y-axis −4 to 4.]


Introductory Example 2 – Sinusoid (cont.)

• However, when the data is received, it is corrupted by noise, and the received samples look like below.

[Figure: noisy received samples; x-axis 0 to 200, y-axis −3 to 3.]


Introductory Example 2 – Sinusoid (cont.)

• In this case, the problem is to find good values for A, f₀ and φ in the following model:

$$x[n] = A\cos(2\pi f_0 n + \phi) + w[n],$$

with w[n] ∼ N(0, σ²).


Introductory Example 2 – Sinusoid (cont.)

• We will learn that the maximum likelihood estimators (MLE) of the parameters A, f₀ and φ are given by

$$\hat{f}_0 = \text{the value of } f \text{ that maximizes } \left| \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i f n} \right|,$$

$$\hat{A} = \frac{2}{N} \left| \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i \hat{f}_0 n} \right|,$$

$$\hat{\phi} = \arctan \frac{-\sum_{n=0}^{N-1} x(n) \sin(2\pi \hat{f}_0 n)}{\sum_{n=0}^{N-1} x(n) \cos(2\pi \hat{f}_0 n)}.$$
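A minimal NumPy sketch of these estimators on synthetic data: the maximization over f is done on a dense grid using a zero-padded FFT, and arctan2 replaces arctan to resolve the quadrant of the phase. All constants (N, noise level, true parameter values) are assumptions for the demo.

```python
import numpy as np

# Sketch of the (approximate) MLEs for A, f0, phi from a noisy sinusoid.
rng = np.random.default_rng(0)
N = 200
f0_true, A_true, phi_true = 0.04, 1.0, 0.4
n = np.arange(N)
x = A_true * np.cos(2 * np.pi * f0_true * n + phi_true) + rng.normal(0, 0.5, N)

# f0_hat maximizes |sum x[n] exp(-2*pi*i*f*n)|; zero-padding refines the grid.
K = 16 * N
X = np.fft.fft(x, K)
k = np.argmax(np.abs(X[:K // 2]))
f0_hat = k / K

A_hat = 2 / N * np.abs(np.sum(x * np.exp(-2j * np.pi * f0_hat * n)))
phi_hat = np.arctan2(-np.sum(x * np.sin(2 * np.pi * f0_hat * n)),
                     np.sum(x * np.cos(2 * np.pi * f0_hat * n)))
print(f0_hat, A_hat, phi_hat)   # close to 0.04, 1.0, 0.4
```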


Introductory Example 2 – Sinusoid (cont.)

• It turns out that the sinusoidal parameter estimation is very successful:

[Figure: original sinusoid (blue), noisy samples (green circles), and estimated sinusoid (red); x-axis 0 to 200, y-axis −4 to 4.]

• The blue curve is the original sinusoid, and the red curve is the one estimated from the green circles.


Introductory Example 2 – Sinusoid (cont.)

• In this case, the estimates are $\hat{f}_0 = 0.0399$, $\hat\phi = 0.3871$ and $\hat{A} = 1.1698$, while the true values in the simulation were f₀ = 0.04, φ = 0.4 and A = 1.

• However, the results are different for each realization of the noise w[n].

• Thus, we are not very interested in an individual case, but rather in the distributions of the estimates:
  • What are the expectations $E[\hat{f}_0]$, $E[\hat\phi]$ and $E[\hat{A}]$?
  • What are their respective variances?
  • Could there be a better formula that would yield a smaller variance?
  • If yes, how do we discover the better estimators?


Example of the variance of an estimator

• Consider the estimation of the mean of the following measurement data:

[Figure: measurement data; x-axis 0 to 200, y-axis 0 to 10.]

• Now we are looking for an estimator of A in the model

$$x[n] = A + w[n],$$

with w[n] ∼ N(0, σ²), where σ² is also unknown.


Example of the variance of an estimator (cont.)

• A natural estimator of A is the sample mean:

$$\hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} x[n].$$

• Alternatively, one might propose to use only the first sample as such:

$$\check{A} = x[0].$$

• How can we justify that the first one is better?


Example of the variance of an estimator (cont.)

• This can be done by comparing the variances of the estimator outputs. Since the noise samples are independent, the variance of the sum is the sum of the variances:

$$\mathrm{var}(\hat{A}) = \mathrm{var}\left( \frac{1}{N} \sum_{n=0}^{N-1} x[n] \right) = \frac{1}{N^2} \sum_{n=0}^{N-1} \mathrm{var}(x[n]) = \frac{1}{N^2} N\sigma^2 = \frac{\sigma^2}{N}.$$

$$\mathrm{var}(\check{A}) = \mathrm{var}(x[0]) = \sigma^2.$$
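A quick Monte Carlo check of these two variances; the constants (A = 5, σ = 1, N = 50) and the number of trials are assumptions for the demo.

```python
import numpy as np

# Empirical variances of the sample mean vs. the first-sample estimator.
rng = np.random.default_rng(0)
A, sigma, N, trials = 5.0, 1.0, 50, 100_000

x = A + sigma * rng.standard_normal((trials, N))
est_mean = x.mean(axis=1)    # estimator 1: sample mean
est_first = x[:, 0]          # estimator 2: first sample only

print(est_mean.var(), sigma**2 / N)   # both ~ 0.02
print(est_first.var(), sigma**2)      # both ~ 1.0
```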


Example of the variance of an estimator (cont.)

• Histograms of the estimates over 100 data realizations:

[Figure: histograms of the two estimators over the 100 realizations. Estimator 1 (sample mean): small variance. Estimator 2 (first sample): large variance.]


Example of the variance of an estimator (cont.)

• Compared to the first-sample estimator $\check{A} = x[0]$, the variance of the sample mean $\hat{A}$ is only one Nth.

• Could there also be better estimators than $\hat{A}$?
• Is the estimator variance the only thing to consider, or are there other important properties as well?
• Note that the picture alone is not enough. The use of computer simulations for assessing estimation performance, although quite valuable for gaining insight and motivating conjectures, is never conclusive.

• Soon we will describe the estimator performance mathematically.


Classical vs. Bayesian

• There are two estimation approaches: classical estimation and Bayesian estimation.

• Classical estimation thinks of θ as a deterministic but unknown parameter.
  • What is the DC level in this measurement?

• Bayesian estimation thinks of θ as a random variable, whose realization is behind the data.
  • The DC level of this measurement is somewhere between 3 and 5 ⇒ p(θ) = U[3, 5].
  • Bayes rule: $p(\theta \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid \theta)\, p(\theta)}{p(\mathbf{x})}$.

• First we will concentrate on the deterministic case.


Properties of an estimator

• Commonly, the following properties of estimators are considered:

• Bias: $E[\hat\theta] - \theta$.

• Variance: $\mathrm{var}(\hat\theta) = E[(\hat\theta - E[\hat\theta])^2] = E[\hat\theta^2] - E[\hat\theta]^2$.

• Mean square error (MSE): $E[(\hat\theta - \theta)^2]$.


Properties of an estimator (cont.)

• The MSE consists of bias and variance. Add and subtract $E[\hat\theta]^2$ and regroup (θ is deterministic, so $E[\theta\hat\theta] = \theta E[\hat\theta]$):

$$\begin{aligned}
\mathrm{MSE} &= E[(\hat\theta - \theta)^2] \\
&= E[\hat\theta^2] - 2\theta E[\hat\theta] + \theta^2 \\
&= E[\hat\theta^2] \overbrace{- E[\hat\theta]^2 + E[\hat\theta]^2}^{=0} - 2\theta E[\hat\theta] + \theta^2 \\
&= \underbrace{E[\hat\theta^2] - E[\hat\theta]^2}_{\text{variance}} + \underbrace{(E[\hat\theta] - \theta)^2}_{\text{bias squared}} \\
&= \mathrm{var}(\hat\theta) + \mathrm{bias}^2(\hat\theta)
\end{aligned}$$
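The decomposition is easy to verify numerically. The sketch below uses a deliberately biased estimator (0.9 times the sample mean); the model and all constants are assumptions for the demo.

```python
import numpy as np

# Empirical check of MSE = variance + bias^2.
rng = np.random.default_rng(0)
A, sigma, N, trials = 5.0, 1.0, 20, 200_000

x = A + sigma * rng.standard_normal((trials, N))
theta_hat = 0.9 * x.mean(axis=1)          # biased estimator of A

mse = np.mean((theta_hat - A) ** 2)
var = theta_hat.var()
bias2 = (theta_hat.mean() - A) ** 2
print(mse, var + bias2)                   # agree up to Monte Carlo error
```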


Properties of an estimator (cont.)

• Additional properties:
  • Convergence or consistency: the estimator becomes better as the number of observations increases.
  • Efficiency: the estimator variance is as small as possible.
  • Asymptotic efficiency: the estimator becomes efficient as N → ∞.
  • Asymptotic normality: $\sqrt{N}(\hat\theta_N - \theta)$ approaches the normal distribution with zero mean and variance $C_\theta$.


Properties of an estimator (cont.)

• It is common to distinguish between small sample and large sample (asymptotic) properties.
  • Small sample properties hold for any (finite) sample size: unbiasedness, efficiency.
  • Large sample (asymptotic) properties hold as the sample size tends to infinity and are easier to study: asymptotic unbiasedness, asymptotic efficiency, consistency, asymptotic normality.


Unbiasedness

• Unbiasedness: on average, an estimator gives correct estimates.

• An estimator is unbiased for a parameter θ if

$$E(\hat\theta) = \theta, \quad \text{when } \theta \text{ is deterministic},$$
$$E(\hat\theta) = E(\theta), \quad \text{when } \theta \text{ is random}.$$


Unbiasedness (cont.)

• An unbiased estimator may not always exist, or it may be difficult to compute.

• The property of unbiasedness is not necessarily invariant under functional transformations.
  • E.g., for the model x[n] = A + w[n] with w[n] ∼ N(0, σ²), the sample mean

$$\hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$$

is an unbiased estimator of A. However, $\hat{A}^2$ is not an unbiased estimator of A²: indeed, $E[\hat{A}^2] = \mathrm{var}(\hat{A}) + E[\hat{A}]^2 = A^2 + \sigma^2/N$.
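The bias of $\hat{A}^2$ is easy to see in a simulation; the constants below (A = 2, σ = 1, N = 10) are assumptions for the demo.

```python
import numpy as np

# The square of the unbiased sample mean is a biased estimator of A^2:
# E[A_hat^2] = A^2 + sigma^2/N.
rng = np.random.default_rng(0)
A, sigma, N, trials = 2.0, 1.0, 10, 500_000

x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)

print(np.mean(A_hat ** 2))        # ~ 4.1, not 4.0
print(A ** 2 + sigma ** 2 / N)    # theoretical value 4.1
```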


Efficiency

• The variance measures the performance of an estimator.

• The efficiency can be established by comparing the variance of $\hat\theta$ with the smallest error variance that can be attained by any unbiased estimator.

• If $\mathrm{var}(\hat\theta_1) < \mathrm{var}(\hat\theta_2)$ and $E(\hat\theta_1) = E(\hat\theta_2)$, the estimator $\hat\theta_1$ is more efficient than $\hat\theta_2$.


Efficiency (cont.)

• An estimator that is unbiased and has the smallest variance of all unbiased estimators is desirable.

• If this estimator also reaches the theoretical bound (the Cramer-Rao lower bound), it is called an efficient estimator.

• The relative efficiency of two estimators T₁ and T₂ based on N observations is $\dfrac{\mathrm{var}(T_2(N))}{\mathrm{var}(T_1(N))}$.

• E.g., the efficiency of the median relative to the mean in the case of a normally distributed population is 2/π.
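The 2/π figure (an asymptotic result) can be checked by Monte Carlo; the sample size and trial count below are assumptions for the demo.

```python
import numpy as np

# Efficiency of the median relative to the mean for Gaussian data:
# var(mean)/var(median) should approach 2/pi ~ 0.64 for large N.
rng = np.random.default_rng(0)
N, trials = 1001, 20_000

x = rng.standard_normal((trials, N))
var_mean = x.mean(axis=1).var()
var_median = np.median(x, axis=1).var()

print(var_mean / var_median, 2 / np.pi)
```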


Example

• Below are four estimators of A in the model x[n] = A + w[n], w[n] ∼ N(0, σ²), when the true A = 5.

[Figure: four histograms of estimates, each on the range 0 to 10: (1) efficient and unbiased estimator, (2) unbiased but not efficient estimator, (3) efficient but biased estimator, (4) neither efficient nor unbiased estimator.]
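The slides do not say which four estimators produced the histograms, but one plausible (hypothetical) set reproducing the four cases is sketched below: the sample mean, the sample median, a shrunk mean, and a shifted median. All constants are assumptions.

```python
import numpy as np

# Hypothetical estimators illustrating the four bias/variance combinations.
rng = np.random.default_rng(0)
A, sigma, N, trials = 5.0, 1.0, 50, 50_000
x = A + sigma * rng.standard_normal((trials, N))

est = {
    "efficient and unbiased":     x.mean(axis=1),           # sample mean
    "unbiased, not efficient":    np.median(x, axis=1),     # sample median
    "low variance but biased":    0.8 * x.mean(axis=1),     # shrunk mean
    "neither":                    np.median(x, axis=1) + 1, # shifted median
}
for name, e in est.items():
    print(f"{name}: mean={e.mean():.3f}, var={e.var():.4f}")
```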


Notation

• The observations x and the parameter vector θ are related by a model

$$\mathbf{x} = g(\boldsymbol\theta) + \mathbf{w},$$

where g(·) is a function of θ.
  • Example: estimation of the phase of a sinusoidal signal, x[n] = A cos(2πf₀n + φ) + w[n], where θ = φ is the unknown parameter.

• An important special case is the linear model:

$$\mathbf{x} = \mathbf{H}\boldsymbol\theta + \mathbf{w},$$

where H is a known observation matrix.


Notation (cont.)

• Example: estimation of the amplitude and mean of a sinusoidal signal, x[n] = A cos(2πf₀n + φ) + B + w[n], where θ = [A B]ᵀ are the unknown parameters.

• This can be formulated as x = Hθ + w, where

$$\mathbf{x} = \mathbf{H}\boldsymbol\theta + \mathbf{w} =
\begin{pmatrix}
\cos(\phi) & 1 \\
\cos(2\pi f_0 + \phi) & 1 \\
\cos(4\pi f_0 + \phi) & 1 \\
\cos(6\pi f_0 + \phi) & 1 \\
\vdots & \vdots \\
\cos(2(N-1)\pi f_0 + \phi) & 1
\end{pmatrix}
\begin{pmatrix} A \\ B \end{pmatrix}
+
\begin{pmatrix} w[0] \\ w[1] \\ w[2] \\ \vdots \\ w[N-1] \end{pmatrix}$$
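A minimal sketch of building this H and solving the linear model, assuming f₀ and φ are known; the true values of A and B, the noise level, and N below are assumptions for the demo.

```python
import numpy as np

# Build H for x[n] = A*cos(2*pi*f0*n + phi) + B + w[n] and solve for [A, B].
rng = np.random.default_rng(0)
N, f0, phi = 200, 0.05, 0.3
A_true, B_true = 1.5, 4.0

n = np.arange(N)
x = A_true * np.cos(2 * np.pi * f0 * n + phi) + B_true + rng.normal(0, 0.5, N)

# Each row of H is [cos(2*pi*f0*n + phi), 1].
H = np.column_stack([np.cos(2 * np.pi * f0 * n + phi), np.ones(N)])
theta_hat = np.linalg.lstsq(H, x, rcond=None)[0]
print(theta_hat)   # ~ [1.5, 4.0]
```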


Notation (cont.)

• The vector parameter θ is unknown, and it is estimated using x and H and, possibly, other a priori information.

• An estimate of θ is denoted by $\hat{\boldsymbol\theta}$.

• The estimation error is $\tilde{\boldsymbol\theta} = \boldsymbol\theta - \hat{\boldsymbol\theta}$.

• The estimation model is $\hat{\mathbf{x}} = \mathbf{H}\hat{\boldsymbol\theta}$. It is assumed that the zero-mean random noise cannot be modeled.

• The prediction error is defined by $\tilde{\mathbf{x}} = \mathbf{x} - \hat{\mathbf{x}}$, which satisfies $\tilde{\mathbf{x}} = \mathbf{H}\tilde{\boldsymbol\theta} + \mathbf{w}$.


MSE as criterion

• One might think that the MSE is a good criterion for estimator design.

• However, this is not usually true.
• Example. Consider estimation of the DC level A in white Gaussian noise:

$$x[n] = A + w[n].$$

• Suppose we would like to improve the sample mean estimator to get a smaller MSE. Let

$$\hat{A} = a \cdot \frac{1}{N} \sum_{n=0}^{N-1} x[n]$$

be a modified sample mean estimator.
• We will try to find the a which results in the minimum MSE.


MSE as criterion (cont.)

• $E(\hat{A}) = aA$ and $\mathrm{var}(\hat{A}) = a^2\sigma^2/N$, so we have

$$\mathrm{MSE}(\hat{A}) = \underbrace{\frac{a^2\sigma^2}{N}}_{\text{variance}} + \underbrace{(a-1)^2 A^2}_{\text{bias squared}}.$$

Differentiating the MSE with respect to a gives

$$\frac{d\,\mathrm{MSE}}{da} = \frac{2a\sigma^2}{N} + 2(a-1)A^2,$$


MSE as criterion (cont.)

and setting it to zero yields

$$\frac{2a\sigma^2}{N} + 2(a-1)A^2 = 0,$$

or

$$a\left( \frac{2\sigma^2}{N} + 2A^2 \right) = 2A^2.$$

Solving for a gives the minimum MSE solution:

$$a_{\mathrm{opt}} = \frac{A^2}{A^2 + \sigma^2/N}.$$
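A quick numerical check that the MSE curve indeed bottoms out at this a; the constants (A = 5, σ = 2, N = 10) are assumptions for the demo.

```python
import numpy as np

# MSE(a) = a^2*sigma^2/N + (a-1)^2*A^2 is minimized at a_opt.
A, sigma, N = 5.0, 2.0, 10

a = np.linspace(0, 1.5, 10_001)
mse = a**2 * sigma**2 / N + (a - 1)**2 * A**2

a_opt = A**2 / (A**2 + sigma**2 / N)
print(a[np.argmin(mse)], a_opt)   # both ~ 0.984
```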


MSE as criterion (cont.)

• Unfortunately, the minimum MSE estimator depends on the true value of A. Since A is unknown, this estimator is not realizable.

• The reason why finding the MMSE estimator failed is that the true parameter value A appears in the bias term.

• Typically, having the true parameter in the bias term leads to unrealizable estimators. This is one reason why we want to study only unbiased estimators and try to minimize their variance.

• Such estimators are called Minimum Variance UnbiasedEstimators (MVU estimators).


Minimum Variance Unbiased Estimator

• Generally, the minimum MSE estimator is not realizable.
• When θ is an unknown deterministic parameter, the MSE is typically minimized by a biased estimator.
• The minimum variance alone is not a good criterion. Note that the estimator variance $E[\hat\theta^2] - E[\hat\theta]^2$ does not depend on the true parameter value θ. For example, the estimator $\hat\theta \equiv 0$ would minimize the variance no matter what the data is.
• It is better to use the unbiasedness constraint and find the minimum variance estimator.


Minimum Variance Unbiased Estimator (cont.)

• (Or allow a small bias and use the MSE criterion. Allowing a small bias may decrease the MSE significantly; this is advanced stuff.)

• Of all unbiased estimators, the minimum variance unbiased estimator (MVUE) is also the one minimizing the MSE.


Example.

• Consider estimating the DC level in zero-mean white Gaussian noise,

$$x[n] = A + w[n], \quad n = 0, \ldots, N-1,$$

where A is the parameter to be estimated and w[n] is WGN.

• It is easy to see that the estimator $\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$ is an unbiased estimator of A:

$$E(\hat{A}) = E\left[ \frac{1}{N} \sum_{n=0}^{N-1} x[n] \right] = \frac{1}{N} \sum_{n=0}^{N-1} E[x[n]] = \frac{1}{N} N A = A.$$


Example. (cont.)

• On the other hand, the estimator $\check{A} = \frac{1}{2N}\sum_{n=0}^{N-1} x[n]$ is biased, with

$$E(\check{A}) = E\left[ \frac{1}{2N} \sum_{n=0}^{N-1} x[n] \right] = \frac{1}{2N} \sum_{n=0}^{N-1} E[x[n]] = \frac{1}{2N} N A = \frac{A}{2}.$$


Existence of Minimum Variance Unbiased Estimator

• The MVUE does not always exist.
• It is also possible that a single MVUE does not exist for all θ.
• If the form of the PDF changes with θ, then it should be expected that the estimator changes with θ.


Finding the MVUE: three approaches

1 The so-called Cramer-Rao lower bound (CRLB) gives the theoretical minimum variance for any unbiased estimator; no unbiased estimator can have a variance below it.
  • This gives one method for finding the MVUE: invent (guess) an estimator you think is the MVUE. See if its variance equals the CRLB. If yes, it is also the MVUE, because no other estimator can be better. If not, try another estimator.
  • The obvious drawback is that finding this estimator is pure luck. Also, it may well be that the MVUE does not exist or does not attain the CRLB (and there is no way you will know it).


Finding the MVUE: three approaches (cont.)

2 Apply the Rao-Blackwell-Lehmann-Scheffe theorem, which we will study in Chapter 5.
  • This method finds a so-called complete sufficient statistic T(x) of θ.
  • Then the MVUE is the conditional expectation $E[\check\theta \mid T(\mathbf{x})]$, where $\check\theta$ is any unbiased estimator of θ.

3 Further restrict the class of estimators to be not only unbiased but also linear. Then find the minimum variance estimator within this restricted class.
  • The linearization makes the work easier, but may not produce the MVUE.