06/19/14 EC3410.SuFY14/MPF - Section II 1
II - Random Processes - Applications to Signal & Information Processing
• [p. 3] Random signal/sequence definition
• [p. 6] Signal mean, variance, autocorrelation & autocovariance sequence, normalized cross-correlation sequence
• [p. 16] Statistical characterization of random signals
– I.I.D. random process – Stationarity – Wide-sense stationarity (wss) – Jointly wide-sense stationarity (jointly wss) – Correlation & cross-correlation for stationary RPs – Signal average – Ergodicity – Concept of white noise, colored noise, Bernoulli process, random walk
• [p. 51] Application: MA processes: definitions and pdf properties
• [p. 57] Random process properties
• [p. 66] Multiple random processes: joint properties
• [p. 72] Application to data analysis – how to assess signal stationarity
• [p. 76] Application to data analysis – how to check the IID assumption
– Autocorrelation – Lag plot
• [p. 89] Application: target range detection
• [p. 91] Introduction to the spectrogram
• [p. 95] Application: gas furnace reaction time
• [p. 98] Application: evaluating correlation between random signals
• [p. 101] Application: evaluating correlation status between random signals
• [p. 102] Application: detection of the periodicity of stationary signals in noisy environments
• [p. 104] Correlation matrix properties for a stationary process
• [p. 108] How to estimate correlation lags; biased/unbiased estimator issues
• [p. 116] Frequency domain description for a stationary process
– Power spectral density (PSD) definition & properties
• [p. 127] Principal Component Analysis (PCA, DKLT)
– Applications to biometrics (face recognition) – Applications to network traffic flow anomaly detection
• [p. 158] Appendices
• [p. 184] References
Examples
• [p. 7] Example 1 • [p. 9] Example 2 • [p. 11] Example 3 • [p. 28] Example 4 • [p. 33] Example 5 • [p. 35] Example 6 • [p. 37] Example 7 • [p. 39] Example 8 • [p. 50] Example 9 • [p. 58] Example 10 • [p. 60] Example 11 • [p. 70] Example 12 • [p. 75] Example 13; Pack2Data1 • [p. 87] Example 14 • [p. 88] Example 15; Pack2Data3 • [p. 94] Example 16; Pack2Data2 • [p. 98] Example 17 • [p. 101] Example 18; Pack2Data6 • [p. 102] Example 19 • [p. 103] Example 20; Pack2Data4 • [p. 107] Example 21 • [p. 114] Example 22 • [p. 117] Example 23 • [p. 119] Example 24 • [p. 120] Example 25 • [p. 123] Example 26
Random Signal/Sequence - definitions
A RP is a mapping function that attributes a function x(t) = x(t,ξ) (continuous-signal case) or x(n) = x(nTs,ξ) (discrete-signal case) to each outcome ξ of the random experiment.
[Figure: sample realizations x(t,ξ1), x(t,ξ2), x(t,ξ3) of a continuous random signal/process, and x(n,ξ1), x(n,ξ2), x(n,ξ3) of a discrete random signal/process]
• Consider the sequence x(n) = x(n,ξ): for a fixed n, x(n) is a Random Variable (RV)
• x(n) can be infinite dimensional
• x(n,ξ) for a fixed outcome ξ: called a realization/trial of the random process
Random Signal/Sequence - definitions, cont’
[Figure: three realizations x(n,ξ1), x(n,ξ2), x(n,ξ3) of the discrete random signal]
Example (discrete random signal): x(n,ξ) = ξ cos(πn/10), where ξ ~ U[0,1].
Signal mean value (ensemble average): mx(n) = E{x(n)}

Signal variance:
σx²(n) = E{|x(n) − mx(n)|²} = E{|x(n)|²} − |mx(n)|²

Discrete signal case: lag k = n2 − n1 between samples x[n1] and x[n2] (Note: dimensionless!)
Continuous signal case: time lag τ = t2 − t1 (sec) between samples x(t1) and x(t2)
Example 1: For x(n,ξ) = ξ cos(πn/5), where ξ ~ U[0,1], compute the process mean and variance.
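The course uses MATLAB elsewhere; as a cross-check, a small Python/NumPy sketch (seed and trial count are arbitrary choices) can estimate the ensemble mean and variance of Example 1 and compare them with the closed-form values E[ξ] cos(πn/5) = 0.5 cos(πn/5) and var(ξ) cos²(πn/5) = cos²(πn/5)/12:

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(50)                                 # time indices
xi = rng.uniform(0.0, 1.0, size=(100_000, 1))     # one xi ~ U[0,1] per trial
x = xi * np.cos(np.pi * n / 5)                    # ensemble of realizations, shape (trials, 50)

m_hat = x.mean(axis=0)                            # ensemble mean at each n
v_hat = x.var(axis=0)                             # ensemble variance at each n

m_theory = 0.5 * np.cos(np.pi * n / 5)            # E[xi] = 1/2
v_theory = (1 / 12) * np.cos(np.pi * n / 5) ** 2  # var(xi) = 1/12
```

With 100,000 trials the ensemble estimates track the closed-form curves closely at every n.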
Signal autocorrelation sequence (lag k = n2 − n1):

Rx(n1,n2) = Rxx(n1,n2) = E{x(n1) x*(n2)}

The autocorrelation measures the dependency between values of the process at two different times. It allows one to evaluate: 1) how quickly a random signal changes with respect to time, 2) the amount of memory a signal may have, 3) whether the process has a periodic component and what the expected frequency might be, etc.
Example 2: Let x(n) be a real-valued process defined as x(n,ξ) = ξ, where ξ is a RV with mean 0 and variance σx². Compute Rx(n1,n2).
Example 3: For x(n,ξ) = cos(πn/5 + ξ), where ξ ~ U[0,2π], compute Rx(n1,n2).
Signal autocovariance function (removes the impact of the process mean):

Cx(n1,n2) = Cxx(n1,n2) = E{(x(n1) − mx(n1))(x(n2) − mx(n2))*} = Rx(n1,n2) − mx(n1) mx*(n2)

Signal normalized correlation function (removes the impact of the process mean and normalizes the maximum value to 1):

ρx(n1,n2) = Cx(n1,n2) / (σx(n1) σx(n2)),  with |ρx(n1,n2)| ≤ 1 !!
Signal cross-correlation function:

Rxy(n1,n2) = E{x(n1) y*(n2)}

• Measures the dependency between values of two processes at two different times.
• Allows one to evaluate whether two processes are related in some linear fashion, or how well their dependence can be approximated by a linear relationship.
• Will NOT evaluate nonlinear dependence (as with the correlation coefficient defined earlier for random variables).
• Warning: correlation does NOT imply causation.
Signal cross-covariance function:

Cxy(n1,n2) = Rxy(n1,n2) − mx(n1) my*(n2)

• Similar to the cross-correlation function: measures the dependency between values of two processes at two different times, but also
• Removes the impact of the mean values.
Note: unless there is a good reason to keep the signal means, it is best to remove them, i.e., to use covariance-based expressions!
Normalized cross-correlation function:

ρxy(n1,n2) = Cxy(n1,n2) / (σx(n1) σy(n2)),  with |ρxy(n1,n2)| ≤ 1 !!
Statistical Characterization of Random Signals
• Random signals are characterized by the joint distribution (or density) of their samples:
Fx(x1, x2, …, xk; n1, …, nk) = Pr[x(n1) ≤ x1, …, x(nk) ≤ xk]
• F(·) is highly complex to compute: difficult or impossible to obtain in practice.
Independent, Identically Distributed (I.I.D.) Random Process:
A Random Process is said to be:
• An independent process (i.e., independent of itself at earlier and/or later times) if for any time indices nk:
fx(x1, x2, …, xk; n1, …, nk) = f1(x1; n1) ⋯ fk(xk; nk)
• An IID process if, in addition, the RVs obtained for all time indices have the same pdf fx(x).
Note: I.I.D. processes have no memory (no future value depends on past values); they can be viewed as building blocks for more realistic random signals.
• Mean of an I.I.D. Process: mx(n) = E{x(n)} = mx, a constant independent of n (all samples share the same pdf).
Independent, Identically Distributed (I.I.D.) RP, cont'

Autocovariance of an IID process:
Cx(n1,n2) = E{(x(n1) − mx)(x(n2) − mx)*}
  = E{x(n1) − mx} E{x(n2) − mx}* = 0,  n1 ≠ n2  (by independence)
  = E{|x(n1) − mx|²} = σx²,  n1 = n2
i.e., Cx(n1,n2) = σx² δ(n1 − n2)

Autocorrelation of an IID process:
Rx(n1,n2) = Cx(n1,n2) + mx mx* = σx² δ(n1 − n2) + |mx|²
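A quick numerical sanity check of Cx(n1,n2) = σx² δ(n1 − n2) for an IID sequence; a Python/NumPy sketch (the choice σ = 2, the lag 5, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=200_000)   # IID Gaussian samples, sigma^2 = 4
xc = x - x.mean()                        # remove the (estimated) mean

def autocov(xc, k):
    # biased sample autocovariance at lag k
    return np.dot(xc[:len(xc) - k], xc[k:]) / len(xc)

c0 = autocov(xc, 0)   # should be close to sigma^2 = 4
c5 = autocov(xc, 5)   # should be close to 0 for an IID process
```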
x(n,ξ) = ξ + 0.05n + w(n), with ξ ~ N(0,1), w(n) ~ N(0,1)
I.I.D. process?
[Figure: ensemble average E[x(n,ξ)] versus time; the mean grows linearly over the first 50 samples]
x(n,ξ) = ξ + 0.05n + w(n), with ξ ~ N(0,1), w(n) ~ N(0,1)
I.I.D. process?
[Figure: four trials of the RP x(n) = ξ + 0.05n + w(n) versus time]
Data Analysis Application – What does the I.I.D. assumption mean when talking about a finite-time trial of the random signal? From [7, p. 17]
• IID is a property of the RP, not of the single trial.
• Saying that a signal is IID means we can consider the collected signal set {xi}, i = 1,…,N, as obtained from a sequence of random variables {Xi}, i = 1,…,N, where the RVs are independent and have the same pdf.
• We will see later how the assumption can be verified for data.
• Do we need the I.I.D. assumption? No, but it is very convenient and greatly simplifies CI derivations. Non-I.I.D. examples are discussed in [7, Sect. 3].
Stationarity Concept:

Definition: a RP is said to be stationary if any joint density or distribution function depends only on the spacing between samples, not on where in the sequence the samples occur:
fx(x1, …, xN; n1, …, nN) = fx(x1, …, xN; n1+k, …, nN+k) for any k and any joint pdf

• If x(n) is stationary for order N = 1: fx(x; n) = fx(x; n+k) ⇒ the pdf is identical for all time indices n.
• If x(n) is stationary for all orders N = 1, 2, …, x(n) is said to be strict-sense stationary.
• Stationary up to order 2 → called wide-sense stationary (WSS).
Stationarity of order N=1 - Physical interpretation

The experiment is performed P times, which leads to P time sequences x(n,ξ1), …, x(n,ξP).
How to compute Fx(x1; n1) = Pr[x(n1) ≤ x1] (the probability that the functions x(n,ξ) do not exceed x1 at time n1):
• Select values for x1 and n1
• Count the number of trials K for which x(n1) ≤ x1
• Fx(x1; n1) = Pr[x(n1) ≤ x1] = K/P
Stationarity of order 1 means Fx(x1; n1) = Fx(x1; n2) = K/P, i.e., Pr[x(n1) ≤ x1] = Pr[x(n2) ≤ x1].
[Figure: P realizations x(n,ξ1), …, x(n,ξP) with the threshold x1 marked at times n1 and n2] [11]
Stationarity of order N=2 - Physical interpretation

The experiment is performed P times, which leads to P time sequences.
How to compute Fx(x1, x2; n1, n2) = Pr[x(n1) ≤ x1, x(n2) ≤ x2] (the probability that the functions x(n,ξ) do not exceed x1 at time n1 and x2 at time n2):
• Select values for x1, x2, n1, n2
• Count the number of trials K for which x(n1) ≤ x1 and x(n2) ≤ x2
• Fx(x1, x2; n1, n2) = K/P
Stationarity of order 2 means Fx(x1, x2; n1, n2) = Fx(x1, x2; n1+N, n2+N), i.e., Pr[x(n1) ≤ x1, x(n2) ≤ x2] = Pr[x(n1+N) ≤ x1, x(n2+N) ≤ x2].
[Figure: P realizations with the thresholds x1 at time n1 and x2 at time n2 marked] [11]
Wide-Sense Stationarity Concept

Definition: a random signal x(n) is called wide-sense stationary (WSS) if
(1) the mean is a constant independent of n: E{x(n)} = mx(n) = mx
(2) the autocorrelation depends only on the distance k = n1 − n2:
Rx(n1,n2) = Rx(n1 − n2) = Rx(k),  with Rx(k) = E{x(n) x*(n − k)}

Consequences:
(1) The correlation sequence is defined with one index only, Rx(k), which measures the amount of "predictability" of the RP (linked to the memory present in the process).
(2) The variance is a constant independent of n:
σx²(n) = E{|x(n) − mx|²} = Rx(0) − |mx|² = σx²
Wide-Sense Stationarity Concept, cont'

(3) the autocovariance also depends only on the time lag k = n1 − n2:
Cx(n1,n2) = Cx(n1 − n2); selecting k = n1 − n2:
Cx(k) = E{(x(n) − mx)(x(n − k) − mx)*} = Rx(k) − mx mx*
Cx(k) = Rx(k) − |mx|²
Correlation/Covariance Function Properties for wss x(n)

(1) Conjugate symmetry: Rx(k) = Rx*(−k) and Cx(k) = Cx*(−k)
(2) Positive semi-definite property: for any N and any set of coefficients {a(n)}, n = 1, …, N, we have
Σ_{n1=1}^{N} Σ_{n2=1}^{N} a(n1) Rx(n1 − n2) a*(n2) ≥ 0  ← useful consequence
(3) Rx(k) is maximum at k = 0 and Rx(0) > 0
(can we have Rx(0) = 0? does (3) hold for Cx(k)?)
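Property (2) can be checked numerically for a candidate correlation sequence: build the Toeplitz matrix R[i,j] = Rx(i − j) and verify that its eigenvalues are non-negative. A Python/NumPy sketch using the (assumed valid) sequence Rx(k) = a^|k| with a = 0.7, a hypothetical AR(1)-like choice:

```python
import numpy as np

a = 0.7
N = 8
# Toeplitz correlation matrix: R[i, j] = Rx(i - j) = a^|i-j|
R = a ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# property (1), conjugate symmetry (real-valued here, so plain symmetry)
sym_ok = np.allclose(R, R.T)

# property (2), positive semi-definiteness: smallest eigenvalue must be >= 0
eigmin = np.linalg.eigvalsh(R).min()
```

A sequence that fails this eigenvalue test cannot be the autocorrelation of any wss process.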
Example 4 - A RP consists of 4 possible sample functions occurring with equal likelihood:
x1(n) = 1, x2(n) = −1, x3(n) = cos(0.2πn), x4(n) = sin(0.2πn)
1) Find the mean and correlation function. 2) Is the RP wss?
Wide-Sense Stationarity, cont'

• The coherence (normalized covariance, also called normalized correlation coefficient) function for a wss process is defined as
ρx(k) = Cx(k) / σx²,  with |ρx(k)| ≤ 1 !!
It measures the predictability of a RP (easier to judge than by using Rx(k)) as it is a bounded quantity.
Wide-Sense Stationarity, cont'

Definition: x(n) and y(n) are said to be w.s. jointly stationary if:
1) x(n) and y(n) are each wss
2) Rxy(n1, n0) = Rxy(n1 − n0)

Consequence: when x(n) and y(n) are w.s. jointly stationary, with k = n1 − n0:
Rxy(n1, n0) = E{x(n1) y*(n0)} = Rxy(k) = E{x(n) y*(n − k)}
Cxy(n1, n0) = Cxy(k) = E{(x(n) − mx)(y(n − k) − my)*} = Rxy(k) − mx my*
Wide-Sense Stationarity, cont'

• Cross-correlation/covariance properties:
Rxy(k) = R*yx(−k)
Cxy(k) = C*yx(−k)
• The normalized cross-covariance is defined as
ρxy(k) = Cxy(k) / (σx σy),  with |ρxy(k)| ≤ 1 !!
It measures the amount of common information between 2 RPs delayed with respect to each other by time lag k. This concept is used as the basis for radar detection schemes (more later…).
Example 5: x(n,ξ) = exp[j(πn/5 + ξ)], where ξ ~ U[0,2π];
y(n,ξ′) = exp[j(πn/5 + ξ′)], where ξ′ ~ U[0,π].
1) Compute Rxy(n1,n2), assuming ξ and ξ′ are independent. 2) Are the processes jointly wss?
Example 6
Assume you are given the zero-mean wss random processes x(n) and y(n) defined as y(n) = x(n − D) + w(n), where w(n) is zero-mean, wss, and independent of x(n). Compute Rxy(k).
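The cross-correlation peak in a setup like Example 6 is what radar-style delay estimators exploit: the estimated Rxy(k) peaks at the lag corresponding to the delay D. A Python/NumPy sketch (the delay, noise level, and seed are arbitrary; a circular shift stands in for the true delay so the edges wrap around):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 7                                   # "unknown" delay to recover
x = rng.normal(size=5000)               # zero-mean reference signal
w = 0.5 * rng.normal(size=5000)         # additive noise
y = np.roll(x, D) + w                   # y(n) = x(n - D) + w(n), circular approximation

# cross-correlation estimate over candidate lags; its peak location estimates D
lags = np.arange(-20, 21)
r = np.array([np.dot(y, np.roll(x, k)) / len(x) for k in lags])
D_hat = lags[np.argmax(r)]
```

The peak lag D_hat recovers the delay even with the noise present.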
Ergodicity:

• In many applications only one realization of a RP is available.
• In general, one single member doesn't provide information about the statistics of the process, except when the process is stationary and ergodic: then statistical information CAN be derived from one realization of the RP, i.e., from time averages.

Signal (time) average:
⟨x(n,ξ)⟩ = lim_{N→∞} 1/(2N+1) Σ_{n=−N}^{N} x(n,ξ)

Def: a RP is called ergodic if all ensemble averages = all corresponding time averages.
Def: a RP is said to be ergodic in the mean if: E[x(n,ξ)] = ⟨x(n,ξ)⟩
Def: a wss RP is said to be ergodic in correlation at lag k if:
Rx(k) = lim_{N→∞} 1/(2N+1) Σ_{n=−N}^{N} x(n,ξ) x*(n − k,ξ)
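The distinction between time and ensemble averages can be illustrated numerically; a Python/NumPy sketch (constants and seed are arbitrary choices): an IID-noise-plus-dc-level process is ergodic in the mean, while a "frozen" dc process x(n,ξ) = ξ is stationary but not ergodic in the mean:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100_000

# ergodic in the mean: IID noise around a dc level of 3 --
# the time average of ONE realization converges to the ensemble mean E{x(n)} = 3
x = 3.0 + rng.normal(size=N)
time_avg = x.mean()

# stationary but NOT ergodic: dc process x(n, xi) = xi with xi ~ N(0,1);
# each realization is constant, so its time average equals xi, not E{x(n)} = 0
xi = rng.normal()
dc = np.full(N, xi)
dc_time_avg = dc.mean()
```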
Ergodicity, cont'

A process can be stationary and NOT ergodic.
Example 7: Assume a RP x(n,ξ) which is a dc voltage waveform where the pdf of the voltage is given by U[0,5].
1) Plot several possible trials of the RP. 2) Is the process wss? 3) Is the process ergodic in the mean?
Example 8 - Consider the RP x(t) shown below. Check whether the process is 1) ergodic in the mean, 2) ergodic in correlation, 3) wss.
x1(t) = K,  with probability P1 = 1/2
x2(t) = −K, with probability P2 = 1/2
RP Example - White noise

Definition: A random sequence w(n) is called a white noise process with mean 0 and variance σw² iff E{w(n)} = 0 and Rw(k) = σw² δ(k).

Notes:
1) All frequencies contribute the same amount (as in the case of white light, hence the name "white noise").
2) There is NO constraint on the pdf. If the pdf of w(n) is Gaussian, it is called "white Gaussian noise".
3) In communication systems applications, thermal noise at a receiver is defined as the process w(n) with autocorrelation Rw(k) = (N0/2) δ(k), where N0 = kT, k = 1.38×10⁻²³ Joules/kelvin (Boltzmann's constant), and T is the receiver noise temperature in kelvin.
x=randn(1200,1);
[rx,lags]=xcorr(x,50,'biased');
figure
subplot(211),plot(x),title('White Gaussian noise')
xlabel('Sample number')
subplot(212),plot(lags,rx)

Assume the process is ergodic. Why do we need ergodicity?
RP Example - Colored noise

Definition: A non-periodic random noise sequence w(n) is called a colored noise process if Rw(k) is not zero for some k ≠ 0 and lim_{k→∞} Rw(k) = 0.

Notes:
1) All frequencies do NOT contribute the same amount (as was the case for white noise).
2) There is NO constraint on the pdf. If the pdf of w(n) is Gaussian, it is called "colored Gaussian noise".
3) Colored noise can easily be generated by passing white noise through a filter.
x=randn(1200,1);
h=(1/30.)*ones(30,1);
y=filter(h,1,x); % basic averaging filter
[ry,lags]=xcorr(y,50,'biased');
subplot(211),plot(y),title('Colored Gaussian noise')
xlabel('Sample number')
subplot(212),plot(lags,ry)
title('Correlation sequence'),xlabel('Lag number')

Assume the process is ergodic. How can we check whether the correlation plot makes sense?
RP Example - Bernoulli Process (a binary sequence with independent samples)

x[n] = +1 with probability P, −1 with probability (1 − P); for P = 1/2 the process is called binary white noise.

• Probabilistic description (independent samples):
Pr(x(0) = 1, x(1) = 1, x(2) = −1) = P · P · (1 − P) = P²(1 − P)
• Mean: mx = (1)P + (−1)(1 − P) = 2P − 1
• Variance: σx² = E{x²(n)} − mx² = 1 − (2P − 1)² = 4P(1 − P)
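The mean and variance of the Bernoulli process follow from E{x(n)} = (1)P + (−1)(1 − P) = 2P − 1 and σx² = 1 − (2P − 1)² = 4P(1 − P). A Python/NumPy sketch (the choice P = 0.3 and the seed are arbitrary) confirms both numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
P = 0.3
# Bernoulli process: +1 with probability P, -1 with probability 1 - P
x = np.where(rng.random(500_000) < P, 1, -1)

m_hat = x.mean()   # theory: 2P - 1 = -0.4
v_hat = x.var()    # theory: 4P(1 - P) = 0.84
```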
Random Walk Random Process

• Consider a sequence of I.I.D. RVs {Xi}
• Define:
S(n) = Σ_{k=0}^{n} X(k),  n = 0, 1, …
S(n) = S(n−1) + X(n)  ← sum process
M(n) = (1/n) S(n)     ← mean process
• The process S(n) is called a simple random walk when Xi = ±1 (Bernoulli RVs).
• When P = 1/2 and Xi = ±1 (i.e., for a Bernoulli process): discrete Wiener process.
It turns out that E[S(n)] = 0 and var(S(n)) = n + 1.
Is this a wss process?
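The claim that the variance of the walk grows linearly with time (so the process cannot be wss) is easy to confirm by simulating many trials; a Python/NumPy sketch (trial count, walk length, checked times, and seed are arbitrary; here the variance equals the number of ±1 steps summed):

```python
import numpy as np

rng = np.random.default_rng(4)
# 20000 independent walks of 100 Bernoulli +/-1 steps, P = 1/2
steps = np.where(rng.random((20_000, 100)) < 0.5, 1, -1)
S = steps.cumsum(axis=1)          # S after 1, 2, ..., 100 steps, per trial

# ensemble variance grows linearly with time -> NOT wss
v10 = S[:, 9].var()               # after 10 steps: var ~ 10
v80 = S[:, 79].var()              # after 80 steps: var ~ 80
```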
RP Example - Random Walk, cont'
Sequence of I.I.D. RVs {Xi} and S(n) = X1 + X2 + … + Xn, n = 1, 2, …

• Property: S(n) has independent increments in non-overlapping time intervals; for n1 < n2 < n3:
S(n2) − S(n1) = X_{n1+1} + … + X_{n2}
S(n3) − S(n2) = X_{n2+1} + … + X_{n3}
Random Walk, cont' - General Character
• Tends to have long runs of positive and negative values.
• The length of the runs increases with increasing time; the local behavior remains the same.

s = rand(1,10000);
r = cumsum(((s > 0.5)*2) - 1);

Random walk applications are found in:
– Economics: to model share prices
– Physics: to model the random movement of molecules in liquids and gases
– Vision science: to describe eye movements
– Psychology: to explain the relation between the time needed to make a decision and the probability that a certain decision will be made
Example 9 - You are given the simple random walk process S(n). Compute P[S(n) = +1] after 3 steps.
RP Example – Moving Average (MA) Random Process

x(n) = Σ_{p=0}^{N} a_p s(n − p),  where s(n) is zero-mean white noise and ergodic

• Compute mx(n) and Rx(n0,n1). • Is x(n) wss?
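As a numerical companion to the questions above, a Python/NumPy sketch (the coefficients, sequence length, and seed are hypothetical choices) compares estimated correlation lags of an MA process against the finite-support structure Rx(k) = σs² Σp a_p a_{p+k}, which vanishes for |k| > N:

```python
import numpy as np

rng = np.random.default_rng(5)
a = np.array([1.0, 0.5, 0.3])            # hypothetical MA coefficients a_0..a_N
s = rng.normal(size=400_000)             # zero-mean, unit-variance white noise
x = np.convolve(s, a, mode='valid')      # x(n) = sum_p a_p s(n - p)

def r_hat(x, k):
    # biased sample autocorrelation at lag k (x is zero-mean by construction)
    return np.dot(x[:len(x) - k], x[k:]) / len(x)

# expected structure: sum_p a_p a_{p+k} for k <= N, exactly 0 beyond lag N
r_theory = [np.dot(a[:len(a) - k], a[k:]) for k in range(4)]
r_est = [r_hat(x, k) for k in range(4)]
```

The estimates match the finite-support pattern, consistent with x(n) being wss.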
Data Analysis Application – pdf properties of MA Random Processes

x(n) = Σ_{p=0}^{N} a_p s(n − p),  where s(n) is zero-mean white noise

Can you say anything about the pdf of x(n)?
Recall Lindeberg's Central Limit Theorem (CLT) - NotePack 1
Describes the limiting behavior of the distribution function of a sum of independent random variables with finite means and variances (Feller's condition); note that "identically distributed" is no longer required.

sn = Σ_{i=1}^{N} xi,  with msn = E[sn] = Σ_{i=1}^{N} mxi  and  σsn² = Σ_{i=1}^{N} var[xi] = Σ_{i=1}^{N} σxi²

Provided max_{k=1,…,N} σxk² / σsn² → 0 as n → ∞, then sn ~ N(msn, σsn²) when n is large [notation N(a,b): mean a, variance b].

These results can be applied to filter outputs…
Random Process Properties

• If x(n) is periodic: x(n) = x(n + N)
• Mean: mx(n) = E{x(n)} = E{x(n + pN)} = mx(n + pN)
• Correlation/covariance for a wss RP: Rx(n1,n2) = Rx(n1 − n2), with
Rx(k) = Rx(k + pN)
Cx(k) = Cx(k + pN)

The mean and correlation/covariance functions of a periodic process are also periodic, with the same period.
Example 10
1) x(n) = A exp(j(ωn + θ)), θ ~ U[0,2π]. Compute Rx(k) and mx(n).
2) x(n) = A cos(ωn + θ), θ ~ U[0,2π]. Compute Rx(k) and mx(n).
Example 11: y(n) = s(n) + w(n), where s(n) = A cos(ωn + θ), θ ~ U[0,2π], w(n) is zero-mean white wss noise, and w(n) and s(n) are independent. Compute Ry(k) and my(n).
Uncorrelated Random Process

- A RP is said to be uncorrelated if
Cx(n1,n2) = E{(x(n1) − mx(n1))(x(n2) − mx(n2))*}
  = E{|x(n1) − mx(n1)|²} = σx²(n1),  n1 = n2
  = 0,  n1 ≠ n2
i.e., Cx(n1,n2) = σx²(n1) δ(n1 − n2),
or equivalently if
Rx(n1,n2) = Cx(n1,n2) + mx(n1) mx*(n2) = σx²(n1) δ(n1 − n2) + mx(n1) mx*(n2)
Uncorrelated wss Random Process

- A wss RP is said to be uncorrelated if
Cx(k) = E{(x(n) − mx)(x(n − k) − mx)*}
  = E{|x(n) − mx|²} = σx²,  k = 0
  = 0,  k ≠ 0
i.e., Cx(k) = σx² δ(k),
or equivalently
Rx(k) = Cx(k) + |mx|² = σx² δ(k) + |mx|²
Cyclostationary Process

- A RP is said to be wide-sense (w.s.) cyclostationary if ∃ N such that
mx(n) = mx(n + N), ∀n
Rx(n1,n2) = Rx(n1 + N, n2 + N)

Signal statistics vary periodically with time, which leads to correlation between areas of the signal spectrum. Note: the signal itself is NOT necessarily periodic.

Examples of a w.s. cyclostationary process:
* DSB-AM signal: x(n) = A(n) cos(ω0 n), with A(n) a stationary RP and ω0 a constant
* OFDM signals
Cyclostationary properties, cont'

• The cyclostationary property is taken advantage of in cognitive radio (CR) detection applications:
- pilot symbols used in OFDM applications exhibit periodic behavior, resulting in cyclostationary signal behavior
- noise usually doesn't exhibit periodic behavior
- the difference between signal and noise behavior is exploited to extract OFDM signal characteristics.
Multiple Random Processes Joint Properties

• Two RPs x(n) and y(n) are said to be statistically independent (of each other) if for all time indices n1 and n2:
fxy(x, y; n1, n2) = fx(x; n1) fy(y; n2)
or equivalently, for each choice of n1 and n2, the RVs x(n1) and y(n2) are independent.
• Two RPs x(n) and y(n) are said to be uncorrelated (of each other) if for all values n1 and n2:
Rxy(n1,n2) = E{x(n1) y*(n2)} = E{x(n1)} E{y*(n2)}
which is equivalent to:
Cxy(n1,n2) = E{(x(n1) − mx(n1))(y(n2) − my(n2))*} = 0
• Two RPs x(n) and y(n) independent of each other ⇒ uncorrelated.
Multiple Random Processes Joint Properties, cont'

• Two RPs x(n) and y(n) are said to be jointly Gaussian RPs if for any choice of ni and mi, the random vectors [x(n1), …, x(nn)] and [y(m1), …, y(mn)] are jointly Gaussian.
• If x(n) and y(n) are jointly Gaussian and uncorrelated RPs ⇒ independent.
• Two RPs x(n) and y(n) are said to be orthogonal if for all values n1 and n2:
Rxy(n1,n2) = E{x(n1) y*(n2)} = 0
Multiple wss Random Processes Joint Properties

• Two wss RPs x(n) and y(n) are said to be statistically independent (of each other) if for all time lags k and times n:
fxy(x, y; n, n + k) = fx(x; n) fy(y; n + k)
or equivalently, for each choice of n and k, the RVs x(n) and y(n + k) are independent.
• Two wss RPs x(n) and y(n) are said to be uncorrelated (of each other) if for all time lag values k:
Rxy(k) = E{x(n) y*(n − k)} = E{x(n)} E{y*(n − k)} = mx my*
which is equivalent to:
Cxy(k) = E{(x(n) − mx)(y(n − k) − my)*} = 0
• Two wss RPs x(n) and y(n) are said to be orthogonal if for all time lags k:
Rxy(k) = E{x(n) y*(n − k)} = 0
Multiple wss Random Processes Joint Properties, cont'

Recall that:
• 2 wss RPs x(n) and y(n) are said to be uncorrelated (of each other) if for all time lag values k: Rxy(k) = mx my*, or Cxy(k) = 0
• 2 wss RPs x(n) and y(n) are said to be orthogonal if for all time lags k: Rxy(k) = 0

Consequences:
• 2 wss RPs x(n) and y(n) independent of each other ⇒ uncorrelated (the converse holds only when both RPs are Gaussian)
• 2 wss RPs x(n) and y(n) orthogonal with at least one RP zero-mean ⇒ uncorrelated
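The chain "independent ⇒ uncorrelated" (and, when one member is zero-mean, orthogonal as well) can be spot-checked numerically; a Python/NumPy sketch with two independently generated wss sequences (the means, lengths, lag, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200_000
k = 3

x = rng.normal(2.0, 1.0, size=N)   # wss, mean m_x = 2
y = rng.normal(0.0, 1.0, size=N)   # wss, zero mean, generated independently of x

# independent => R_xy(k) = E{x(n) y*(n-k)} -> m_x * m_y, which is 0 here
r_xy = np.dot(x[k:], y[:N - k]) / (N - k)

# C_xy(k) = R_xy(k) - m_x m_y* -> 0 (uncorrelated)
c_xy = r_xy - x.mean() * y.mean()
```

Since m_y = 0, the same estimate also shows the pair is (approximately) orthogonal.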
Example 12: let x(n) and y(n) be RPs generated as x(n) = αn, y(n) = α²n, with α ~ N(0,1).
1) Find the means mx(n), my(n).
2) Are x(n), y(n) wss?
3) Are x(n), y(n) uncorrelated RPs? Uncorrelated of each other?
4) Are x(n), y(n) independent RPs? Independent of each other?
Data Analysis Application – How do we assess whether data is stationary?

1) Consider the environment that produced it.
2) Check whether basic properties of the signal change with time or not: compute and track changes in mx(t) and varx(t). If changes occur, the process is not wss.

How do we decide there is a change? (visually or via statistical tests)
- Visually
- Statistical tests: two-sample tests for equal means and equal variances over small non-overlapping data blocks (independence between samples is required for the tests)
- The tests can be implemented over short-time windows in MATLAB using:
• ttest2.m (uses the t-distribution)
• vartest2.m (uses the F-distribution)
- Requires the selection of a level of significance α (usually picked around 5 to 10%).
- The tests are sensitive to block lengths (useful only when the data set is large enough…).
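Alongside MATLAB's ttest2/vartest2, a bare-bones way to spot nonstationarity is to track block statistics directly; a Python/NumPy sketch (block length, drift rate, and seed are arbitrary choices) comparing a stationary signal with one whose mean drifts:

```python
import numpy as np

def block_stats(x, L):
    """Split x into non-overlapping blocks of length L; return per-block means and variances."""
    nb = len(x) // L
    b = x[:nb * L].reshape(nb, L)
    return b.mean(axis=1), b.var(axis=1, ddof=1)

rng = np.random.default_rng(8)
stat = rng.normal(size=6000)                               # wss: constant mean/variance
trend = rng.normal(size=6000) + 0.001 * np.arange(6000)    # mean drifts -> not wss

m1, _ = block_stats(stat, 1000)
m2, _ = block_stats(trend, 1000)

spread_stat = m1.max() - m1.min()     # small for a wss signal
spread_trend = m2.max() - m2.min()    # large when the mean drifts
```

A formal version would feed consecutive blocks to a two-sample test, as the slides do in MATLAB.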
Example 13: You collected data from 2 thermal sensors X and Y. The data collected for each is contained in the matrix DATA = [X, Y]. Can each data set be considered wss? (Pack2Data1.mat)
MATLAB hint: the function Pack2Example13Template.m provides a shell code to compute short-term statistics defined over overlapping data segments. Use it if you find it useful.
Data Analysis Application – How do we know if the I.I.D. assumption is valid?

1) Inspect the normalized correlation plot.

EXAMPLE [Ref 7, Ex. 2.18]: CPU DATA. Execution times for n = 7632 consecutive requests are measured and displayed on the upper left panel. Initial testing indicates the data appears stationary and roughly normal, so the autocorrelation function can be used to test independence.
• The plot on the lower left panel shows a strong correlation ⇒ the data is not independent.
• Assume you are interested in extracting an IID sequence out of this data. How would you do so? Try sub-sampling.
[Figure: data x(n) and its normalized correlation ρx(k) - random sub-sampling example, [Ref. 7]]
Data Analysis Application – How do we know if the I.I.D. assumption is valid?, cont'

EXAMPLE [Ref 7, Ex. 2.18], cont': the CPU DATA example is not IID.
• How is sub-sampling implemented?
- Basic N-level sub-sampling may be implemented by picking every Nth sample. However, this may result in aliasing in some cases (why is that? Hint: think about what decimating does to the signal in the frequency domain; see the plots on the next page).
- A better approach introduces randomness in the picking task. The sub-sampled data is obtained with the following random sub-sampling scheme: for every index i = 1,…,n, decide with probability p = 1/2 whether the point is kept. This gives the second plot on the figure. Then repeat the process. This gives sub-sampled data with p = 1/2 down to 1/2⁷ = 1/128.
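The random sub-sampling scheme described above is only a few lines of code; a Python/NumPy sketch (the moving-average test signal, keep probability, number of thinning rounds, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(9)

def random_subsample(x, p=0.5, rng=rng):
    """Keep each sample independently with probability p (random sub-sampling)."""
    keep = rng.random(len(x)) < p
    return x[keep]

# correlated test data: moving average of white noise (correlated up to lag 3)
w = rng.normal(size=100_000)
x = np.convolve(w, np.ones(4) / 4, mode='valid')

def rho1(x):
    # lag-1 normalized correlation estimate
    xc = x - x.mean()
    return np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)

r_before = rho1(x)          # strongly correlated (~0.75 for this length-4 average)
y = x.copy()
for _ in range(6):          # repeated halving: overall p = 1/2**6 = 1/64
    y = random_subsample(y)
r_after = rho1(y)           # close to 0 after thinning
```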
• Comparisons between deterministic/random sub-sampling: y(n) = x(n) + 0.5x(n−1) + 0.3x(n−2)
- Original signal spectrum and normalized correlation: correlation between samples shows up to lag 2.
- Spectrum of the signal down-sampled by picking every other sample, and resulting normalized correlation: correlation between samples shows at lag 1.
- Spectrum of the signal down-sampled by randomly picking every other sample on average, and resulting normalized correlation: correlation between samples shows at lag 1.
- Spectrum of the signal down-sampled by randomly picking every 4th sample on average, and resulting normalized correlation: very weak correlation between samples shows at lag 1.
Conclusion: the degree of correlation between samples has decreased by picking every other sample. However, this may not always be the case; see the next example.
• Comparisons between deterministic/random sub-sampling, cont'
- Original signal spectrum and normalized correlation: long-term correlation between samples shows.
- Spectrum of the signal down-sampled by picking every other sample, and resulting normalized correlation: correlation between samples shows.
- Spectrum of the signal down-sampled by randomly picking every other sample on average, and resulting normalized correlation: decreased correlation between samples shows.
- Spectrum of the signal down-sampled by randomly picking every 4th sample on average, and resulting normalized correlation: correlation between samples has significantly decreased.
Conclusion: the degree of correlation between samples decreases when every other sample is picked in a random fashion, while it does NOT when samples are picked in a regular fashion.
• Comparisons between deterministic/random sub-sampling, cont’
Conclusion: If you wish to extract an IID sequence out of a correlated sequence, the best (safest) approach is to sub-sample in a random fashion to avoid potential aliasing effects.
[Figure: data x(n) and its normalized correlation ρx(k) after random sub-sampling, [Ref. 7]; all ρx(k) within the CI bounds!]
- Recall that the normalized correlation sequence of a white sequence x(n) is ρx(k) = δ(k).
Question: when can ρx(k) be considered equal to 0? ⇒ use the CI concept.
• Result: when x(n) is IID with pdf ~ N(0,1), the estimated ρx(k), k ≠ 0, is distributed as N(0, 1/N).
• For a 95% CI, zα/2 = 1.96 ⇒ |ρx(k)| < 1.96/√N.
• How do we evaluate whether the transformed sequence is IID?
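The CI rule above turns into a simple whiteness check: estimate ρx(k) for k ≠ 0 and count how many lags fall inside ±1.96/√N (roughly 95% should, for IID data). A Python/NumPy sketch on a known-IID sequence (N, number of lags, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(10)
N = 2000
x = rng.normal(size=N)                    # IID N(0,1) test sequence

xc = x - x.mean()
den = np.dot(xc, xc)
# normalized correlation estimates for lags 1..20
rho = np.array([np.dot(xc[:N - k], xc[k:]) / den for k in range(1, 21)])

bound = 1.96 / np.sqrt(N)                 # 95% CI bound for rho_hat(k), k != 0
frac_inside = np.mean(np.abs(rho) < bound)   # expect about 95% of lags inside
```

For correlated data, many lags fall outside the bound and the IID hypothesis is rejected.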
EXAMPLE cont' [Ref 7, Ex. 2.18]: The previous figure shows that the data loses correlation when the sampling probability is p = 1/64 (turning point test for the sub-sampled data with p = 1/64). The sub-sampled data has 114 points, and the 95% CI obtained for the estimated mean of the sub-sampled data is [65.5, 71.7]. The 95% confidence interval that would be obtained if we (wrongly) assumed the original data to be IID is [69.2, 69.9]. The IID assumption grossly underestimates the CI because the data is correlated.
Is sub-sampling always the solution to removing correlation? Unfortunately not always! [Ex 2.19, Ref 7] shows the number of bytes transferred over an Ethernet LAN (360,000 points); it illustrates long-range dependent data.
Above CI upper limit
How do we know the iid assumption is valid ? Cont’
2) Inspect the “lag plot”
Def: plot x(n) versus x(n + lag) for different values of "lag". The lag plot checks whether a data set or time series is independent or not. Random data does not exhibit any identifiable structure in the lag plot; non-random structure in the lag plot indicates that the underlying data may be correlated in some fashion.
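A sketch of the quantity a lag plot displays (Python; the AR(1) coefficient and lags are illustrative): for IID data the (x(n), x(n + lag)) pairs show no structure at any lag, while a correlated AR(1) sequence shows strong structure at small lags.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 20_000

def lag_pairs(x, lag):
    """The (x(n), x(n+lag)) pairs that a lag plot displays."""
    return x[:-lag], x[lag:]

iid = rng.standard_normal(M)          # no structure expected
ar = np.zeros(M)                      # correlated: x(n) = 0.9 x(n-1) + v(n)
v = rng.standard_normal(M)
for n in range(1, M):
    ar[n] = 0.9 * ar[n - 1] + v[n]

for lag in (1, 4):
    r_iid = np.corrcoef(*lag_pairs(iid, lag))[0, 1]
    r_ar = np.corrcoef(*lag_pairs(ar, lag))[0, 1]
    print(f"lag {lag}: IID corr {r_iid:+.3f}, AR(1) corr {r_ar:+.3f}")
```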
[Ref 8]
Example 14: Evaluate the "lag plot" obtained for a random walk sequence

X_n = Σ_{k=1}^{n} S_k = X_{n−1} + S_n
Example 15: You are given measurements collected by sensors x, y and z (Pack2Data3.mat).
1) Using the correlation function xcorr.m, evaluate whether the measurements obtained for each sensor are correlated or not.
Hint: 1) use [xcor,lags]=xcorr(x,maxlag,'coeff'); this will ensure you can plot the correlation coefficients for a specified range of lags from −maxlag to +maxlag, with the correlation normalized so that Rx(0) = 1; 2) use a relatively small number of lags, around 20 or less to start, so that you can see what happens around lag 0.
2) Using the matlab function lagplot.m, plot lag plots for user-specified lags for sequences x, y and z. Evaluate whether the measurements obtained for each sensor are correlated. You can start with lags 2, 4, 8 and higher.
3) Estimate the maximum lag at which the data is correlated for each sensor.
4) Assume you want to generate an uncorrelated sequence out of y. Explain how random subsampling can be used; implement random subsampling of y with factors equal to 2, 4, 8. Explain how you check that the data extracted out of y is uncorrelated.
5) Repeat 4) for sequence z.
MATLAB note: the function Pack2Example15Template.m provides a shell code to compute a random sub-sampled sequence for various subsampling amounts. Use it if you find it useful.
Application – Radar Target Detection: cross-correlation application

Target: assume y(n) = x(n − N)

R_yx(k) = E{y(n) x*(n − k)} =
Assume y(n) = x(n − N)

R_xy(k) = E{x(n) y*(n − k)} =
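A numeric sketch of the range-detection idea (Python; the delay, echo attenuation and noise level are made up for illustration): the cross-correlation of the received signal with the transmitted one peaks at the round-trip delay N.

```python
import numpy as np

rng = np.random.default_rng(0)
N_delay = 40                              # round-trip delay in samples
x = rng.standard_normal(500)              # transmitted signal
y = np.zeros(x.size + N_delay)
y[N_delay:] = 0.5 * x                     # attenuated echo, delayed by N_delay
y += 0.2 * rng.standard_normal(y.size)    # receiver noise

# R_yx(k): cross-correlate the received signal with the transmitted one
R = np.correlate(y, x, mode="full")
lags = np.arange(-(x.size - 1), y.size)
est_delay = lags[R.argmax()]
print("estimated delay:", est_delay)
```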
Brief introduction to the Sliding Window FT (spectrogram)
[Figure: a sliding window w[m] moves along the signal; each windowed segment is mapped by an FT to one column of the time (x-axis) vs. frequency (y-axis) plane.]

Usually the window is incremented by a fraction of its length (25 to 75% overlap).
[Figure: linear chirp – time domain (top) and its spectrogram (bottom); normalized frequency axis, fs = 2.]
Spectrogram, cont’
Example 16: Assume you send the chirp signal x(t). You turn your receiver on at the time you send x(t) and leave it on until you receive y(t). Assume the sampling frequency is equal to 1 Hz.
You have two scenarios to investigate: high and low SNR received signals obtained by sending x(t). The received signal in the high SNR case is yhigh(t); the received signal in the low SNR case is ylow(t). Plot the spectrograms for x(t), yhigh(t), and ylow(t) using the matlab function spectrogram.m.
• A good set of starting values for the spectrogram: window length 32; overlap 16; nfft = 2048.
• Use Fs = 1 and the 'yaxis' option so that the spectrogram is plotted with time on the x-axis.
Estimate the target distance in number of samples for both cases (x, ylow & yhigh in Pack2Data2.mat).
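The spectrogram computation itself is just the sliding-window FT of the previous slides. A bare-bones numpy version (the MATLAB spectrogram.m call above is the course tool; the window/hop/nfft values and the frequency-stepping test signal here are illustrative, not the Pack2Data2 data):

```python
import numpy as np

def stft(x, nwin=32, hop=16, nfft=256):
    """Sliding-window FT: one FFT magnitude column per windowed segment."""
    w = np.hanning(nwin)
    cols = []
    for start in range(0, len(x) - nwin + 1, hop):
        seg = x[start:start + nwin] * w
        cols.append(np.abs(np.fft.rfft(seg, nfft)))
    return np.array(cols).T          # rows: frequency bins, cols: time frames

n = np.arange(1024)
# frequency steps from 0.1 to 0.4 cycles/sample halfway through
x = np.where(n < 512, np.sin(2 * np.pi * 0.1 * n), np.sin(2 * np.pi * 0.4 * n))
S = stft(x)
freqs = np.fft.rfftfreq(256, d=1.0)      # Fs = 1 as in the example
f_early = freqs[S[:, 2].argmax()]        # dominant frequency near the start
f_late = freqs[S[:, -3].argmax()]        # dominant frequency near the end
print(f"early: {f_early:.3f} cycles/sample, late: {f_late:.3f} cycles/sample")
```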
Application: Gas furnace reaction time – cross-covariance/cross-correlation function application
Example: x1(t) represents the input gas feed rate for a gas furnace; x2(t) represents the % of CO2 in the outlet gas.
Goal: evaluate how fast the furnace responds to changes in the gas feed rate.
[Box-Jenkins data]
Compute:  ρ_{x2x1}(k) = C_{x2x1}(k) / (σ_{x2} σ_{x1})
Min at lag=5
Question: What is the significance of the minimum ?
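The computation can be mimicked numerically (Python sketch; a synthetic input/output pair with a made-up 5-sample delay and sign inversion stands in for the Box-Jenkins data): the extreme of the normalized cross-covariance marks the response delay.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
x1 = rng.standard_normal(N)               # stand-in for the gas feed rate
x2 = np.zeros(N)
x2[5:] = -0.8 * x1[:-5]                   # inverted response, 5 samples later
x2 += 0.1 * rng.standard_normal(N)        # measurement noise

x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
lags = np.arange(15)
# biased normalized cross-covariance estimate at non-negative lags
rho = np.array([x2c[k:] @ x1c[:N - k] for k in lags]) / (N * x1c.std() * x2c.std())
k_ext = lags[np.abs(rho).argmax()]
print(f"extreme at lag {k_ext}, rho = {rho[k_ext]:+.2f}")
```

The minimum (large negative value) at lag 5 says the output reacts, with inverted sign, 5 samples after an input change.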
Example 17: You are given 2 independent ergodic random signals s1(n) and s2(n) generated as IID RVs with pdfs ~ U[−0.5, 0.5], and the following pairs of ergodic random signals derived from them.
Assume you generate 10000 data values of s1 and s2. Scatter plots and histogram information are shown on the next page.
1) Comment on the correlation status between y1(n) & y2(n), z1(n) & z2(n), v1(n) & v2(n), v1(n) & y1(n), v2(n) & y1(n).
2) Provide justification regarding the difference between the pdf behaviors of the pairs y1(n) & y2(n) and z1(n) & z2(n).
y1(n), y2(n): transformations of s1(n) and s2(n), respectively
z1(n), z2(n): transformations of s1(n) and s2(n), respectively
v1(n): derived from s1(n − 10); v2(n): derived from s2(n − 5)
[Figure: estimated normalized correlation sequences vs. lag no. (lags −20 to 20): Ry1,y2, Rz1,z2, Rv1,v2, Rv1,y1, Rv2,y1.]
Example 18: You are given the ergodic random signals s1(n), s2(n), s3(n). Extract 1) information regarding their density characteristics, and 2) whether and how they may be related to each other. (Pack2Data6.mat)
Application: detection of the periodicity of a signal in noisy environments

Property: if the process x(n) is aperiodic and zero-mean, then lim_{k→∞} Rx(k) = 0.

Example 19: Assume we have a sinusoidal signal x(n) with uniform random phase φ embedded in wss zero-mean white noise w(n) with variance σ² (signal and noise uncorrelated). The correlation sequence may be used to get information on the properties of the periodic signal.

Period N = ?
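A numeric illustration (Python; the period, data length and noise level are arbitrary choices): the autocorrelation of a noisy sinusoid keeps peaking at multiples of the period, while the white-noise contribution is confined to lag 0.

```python
import numpy as np

rng = np.random.default_rng(0)
N, period = 50_000, 20
n = np.arange(N)
phi = rng.uniform(0, 2 * np.pi)
# sinusoid with random phase buried in unit-variance white noise
x = np.sin(2 * np.pi * n / period + phi) + rng.standard_normal(N)

# biased autocorrelation estimate; the noise only contributes near lag 0
R = np.array([x[:N - k] @ x[k:] for k in range(41)]) / N

# look for the correlation peak between half a period and two periods
est_period = 10 + R[10:35].argmax()
print("estimated period:", est_period)
```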
Example 20: You are given measurements collected from 2 underwater sensors y1 and y2, which contain tone(s) embedded in noise (Pack2Data4.mat). Assume the sampling frequency is equal to 1 Hz.
1) Evaluate whether you can extract periodicity information on y1 and y2 using correlation information.
2) Evaluate whether the noise is white or not by computing the spectral estimates |Y1(e^{jω})|² & |Y2(e^{jω})|².
Note: spectral estimates may be computed using
h = spectrum.welch; % Select the frequency estimation scheme
hpsd=psd(h,y,'Fs',1); % Calculate the frequency information; assume sampling freq = 1 Hz
plot(hpsd) % plot frequency information from 0 to fs/2
3) Compute the frequency information for y1 and y2 from the spectral estimates computed in 2), and derive periodicity information on y1 and y2.
4) Compare the information obtained with both approaches. List advantages/limitations of both approaches.
Correlation Matrix Properties for a Stationary Process
Recall:
Correlation Matrix for a Stationary Process x(n) stationary ⇒ Rx(n1, n0) = Rx(n1 – n0)
Assume you have the 2-dimensional random vector x = [x(0), x(1)]^T

Rx = E{x x^H} = E{ [x(0), x(1)]^T [x*(0), x*(1)] } = [ Rx(0)   Rx*(1)
                                                      Rx(1)   Rx(0)  ]
Assume x = [x(0), x(1)]T
Correlation Matrix for a Periodic RP
Rx(n) = Rx(n + N) N: period
Correlation Matrix Properties
(1) Rx is Hermitian
(2) Rx is positive semi-definite, i.e., λi(Rx) ≥ 0
(3) Rx has an eigendecomposition of the form Rx = U Λ U^H where: * U is a unitary eigenvector matrix (UU^H = U^H U = I) * Λ is a diagonal eigenvalue matrix (4) The eigenvectors are orthogonal to each other
Assume x = [x(0), …, x(N − 1)]^T

(5) tr(Rx) = Σ_i λ_i
Example 21:

Is Rx = [ 3  1
          1  3 ]

a valid correlation matrix for the wss process x(n)?
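A numeric check of the requirements for Example 21 (Python sketch): a valid correlation matrix of a wss process must be Hermitian with a constant diagonal (Toeplitz), and positive semi-definite (non-negative eigenvalues).

```python
import numpy as np

Rx = np.array([[3.0, 1.0],
               [1.0, 3.0]])

hermitian = np.allclose(Rx, Rx.conj().T)
eigvals = np.linalg.eigvalsh(Rx)              # ascending order
psd = bool(np.all(eigvals >= 0))              # positive semi-definite?
toeplitz = Rx[0, 0] == Rx[1, 1]               # Rx(0) along the diagonal
bounded = abs(Rx[1, 0]) <= Rx[0, 0]           # |Rx(1)| <= Rx(0)
print("Hermitian:", hermitian, "| eigenvalues:", eigvals,
      "| PSD:", psd, "| Toeplitz:", toeplitz, "| bounded:", bounded)
```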
How to compute correlation estimates

• For discrete and real data: x = [x(0), …, x(N−1)]^T
(Assume x(t) is known from t = 0 to t = T0, & ergodic (why?); t in seconds, lag k dimensionless.)

Unbiased estimator:

R̂x(k) = R̂x(kTs) = (1/(N − k)) Σ_{i=0}^{N−k−1} x(i) x(i + k)

• Quality of estimate? → find the mean and variance of R̂x(k)

(1) E[R̂x(k)] = (1/(N − k)) Σ_{i=0}^{N−k−1} E[x(i) x(i + k)]
How to compute correlation estimates, cont'

(2) Var[R̂x(k)] ≅ (N/(N − k)²) Σ_{i=−∞}^{∞} [ Rx²(i) + Rx(i + k) Rx(i − k) ],  when N >> k
How to compute correlation estimates, cont'

Alternate Estimator: Biased Estimator

R̄x(k) = R̄x(kTs) = (1/N) Σ_{i=0}^{N−k−1} x(i) x(i + k)

Quality of estimate:

(1) E[R̄x(k)] = ((N − k)/N) Rx(k)

(2) Var[R̄x(k)] ≅ (1/N) Σ_{i=−∞}^{∞} [ Rx²(i) + Rx(i + k) Rx(i − k) ],  k > 0
Biased/unbiased discrete correlation estimator summary

Unbiased estimator:
R̂x(k) = (1/(N − k)) Σ_{i=0}^{N−k−1} x(i) x(i + k)
E[R̂x(k)] = Rx(k)  (unbiased for all k)
Var[R̂x(k)] ∝ N/(N − k)² → grows without bound as k → N

Biased estimator:
R̄x(k) = (1/N) Σ_{i=0}^{N−k−1} x(i) x(i + k)
E[R̄x(k)] = ((N − k)/N) Rx(k); bias of R̄x(k) → 0 and E[R̄x(k)] → Rx(k) as N → ∞
Var[R̄x(k)] ∝ 1/N → 0 as N → ∞
Example 22: Comparing theoretical and estimated correlation sequences

Assume that you are given a wss ergodic RP generated as s(n) = a s(n−1) + v(n), where v(n) is Gaussian zero-mean white noise ~ N(0,1), with |a| < 1.
1) Compute the theoretical correlation expression Rs(k).
The figure on the next page plots 1) the theoretical correlation values, 2) the estimated biased correlation values assuming N = 50 data points are available for s(n), and 3) the estimated biased correlation values assuming N = 10000 data points are available for s(n) and a = 0.8.
2) Comment on the differences.
Frequency Domain Description of Stationary Processes
Power spectral density (PSD)
Sx(e^{jω}) = F_T{Rx(k)} = Σ_{k=−∞}^{∞} Rx(k) e^{−jωk}

Rx(k) = IFT{Sx(e^{jω})} = (1/2π) ∫_{2π} Sx(e^{jω}) e^{jωk} dω

Digital frequency ω defined for 0 ≤ ω ≤ 2π; covers the range [0, 2π] or [−π, π], etc.
Example 23: find the PSD of zero-mean w.s.s. x(n) with Rx(k) = a^{|k|}, |a| < 1
The PSD has three key properties:

(1) P1: The PSD Sx(e^{jω}) is real-valued and periodic with period 2π for any x(n) (see details in Appendix D); if x(n) is real then Sx(e^{jω}) is even.

(2) P2: The PSD is non-negative, i.e., Sx(e^{jω}) ≥ 0 → see page 159, text [Therrien]

(3) P3: The area under Sx(e^{jω}) is non-negative and equals the power of x(n): (1/2π) ∫_{2π} Sx(e^{jω}) dω = Rx(0) = E{|x(n)|²}
Example 24: White noise. Find the PSD of white noise x(n) with Rx(k) = σ² δ(k)

Application to communication systems: Rx(k) = (N0/2) δ(k)
Example 25: Harmonic Process
• Definition: a harmonic process is defined as:
where M, {Ak}, {ωk} are constants
{φk} are pairwise independent RVs uniformly distributed over [0, 2π]
• Compute E{x(n)}, Rx(k), Sx(ejω)
x(n) = Σ_{k=1}^{M} A_k cos(ω_k n + φ_k),  ω_k ≠ 0
Example 26:

x(n) = cos(0.1πn + θ1) + 2 sin(1.5πn + θ2),  θ1, θ2 ~ U[0, 2π], independent

Compute Rx(k), Sx(e^{jω})
Summary of Properties for Stationary x(n)

Definitions:
Mean: mx = E{x(n)}
Correlation: Rx(k) = E{x(n) x*(n − k)}
Covariance: Cx(k) = E{(x(n) − mx)(x(n − k) − mx)*}
Cross-correlation: Rxy(k) = E{x(n) y*(n − k)}
Cross-covariance: Cxy(k) = E{(x(n) − mx)(y(n − k) − my)*}
PSD / Cross-PSD: Sx(e^{jω}) = Σ_k Rx(k) e^{−jωk};  Sxy(e^{jω}) = Σ_k Rxy(k) e^{−jωk}

Inter-relations:
Cx(k) = Rx(k) − |mx|²
Cxy(k) = Rxy(k) − mx my*

Properties:
Autocorrelation: Rx(k) = Rx*(−k);  Rx(0) ≥ |Rx(k)| ∀ k;  Rx is NND
PSD: Sx(e^{jω}) ≥ 0;  Sx(e^{jω}) = Sx(e^{−jω}) for x(n) real
Properties

Cross-correlation:
Rxy(k) = Ryx*(−k)
|Rxy(k)|² ≤ Rx(0) Ry(0)
ρxy(k) = Cxy(k) / (σx σy),  |ρxy(k)| ≤ 1

Cross-PSD:
Sxy(e^{jω}) = FT[Rxy(k)]
Principal Component Analysis (PCA)
( Discrete Karhunen-Loeve Transform (DKLT))
• In many practical applications, it is beneficial to represent a random sequence x with a linearly equivalent sequence w consisting of uncorrelated components (such a sequence w is called the innovation representation).
• In such cases, each component of the uncorrelated sequence w can be viewed as adding new information to the previous components. • Applications exist in compression, classification, etc...
PCA transformation may be used to perform dimension reduction while preserving as much variance from the original space as possible
How to transform x into the innovation representation ?
Assume x=[x(0), … x(N − 1)]T is zero-mean. If x is not zero-mean, remove the mean before proceeding further. Questions: (1) What does it mean for w to be uncorrelated ? (2) What does “represent a random sequence x with a linearly equivalent sequence w” mean ?
Define the linear transformation as A = U^H, where Cx = U Λ U^H with

U = [u1 | … | uN]: eigenvector matrix,  Λ = diag(λ1, …, λN): eigenvalue matrix

y = U^H x  ⇒  Cy = U^H Cx U = Λ = diag(λ1, …, λN)  ⇒ the components of y are uncorrelated.

The U^H transformation diagonalizes the covariance matrix.

x can be recovered from y by x = U y, i.e., x can be rewritten as:

x = [u1 | … | uN] y = Σ_{i=1}^{N} yi ui

See derivation in Appendix E.
What if I want to compress information ?

x̂ = Σ_{i=1}^{M} yi ui,  M ≤ N

Recall:  x = Σ_{i=1}^{N} yi ui = Σ_{i=1}^{M} yi ui + Σ_{i=M+1}^{N} yi ui = x̂ + e_M

We want e_M as small in norm as possible:

E_M = E{e_M^H e_M} = Σ_{i=M+1}^{N} E{|yi|²} = Σ_{i=M+1}^{N} λi

How can I pick {ui}, i = 1, …, M, so that the error between x and x̂ is minimum ?

Derivation shown in Appendix F
E_M = E{e_M^H e_M} = Σ_{i=M+1}^{N} λi

To minimize the loss E_M, put in e_M the eigenvectors associated with the smallest eigenvalues, and put in x̂ the eigenvectors associated with the largest eigenvalues.
PCA Applications
PCA is used in
- Speech / image coding,
- Communications
- Networking
- Data compression
- Electronic warfare
Because it allows for a lower dimensional representation of the data
- Classification
Because it works! Ref: [4]
[Ref 13]
[Figure: the same phenomenon recorded redundantly by sensors a, b, c.]
- Redundancy in collected information?
- How do we reduce dimension?
PCA Applications, cont’
[Ref 13]
Common information/redundancy between sensors
PCA Applications, cont’
PCA Applications, cont’
Example: Assume we have N samples of 2-dimensional data of type x=[x1,x2].
Ref: [4]
[Diagram: x(n) → PCA → Reduced Basis Selection Scheme → "Inverse" PCA → x̂(n)]
[Ref 13]
PCA Applications, cont’
Caution: It is possible for PCA to fail ….
Application to Biometrics
Database
Enrollment subsystem
Biometric reader
Feature Extractor
Authentication subsystem
Biometric reader
Feature Extractor
Biometric Matcher
Match or No Match
1010010…
Template
1010010…
Template
[Ref. 5]
Identification (1:N)
Biometric reader
Biometric Matcher
Identification vs. Verification
Database
Verification (1:1)
Biometric reader
Biometric Matcher
ID
Database
This person is Emily Dawson
Match
I am Emily Dawson
[Ref. 5]
Face recognition overall procedure
Image captured
Face extracted
Discriminating feature parameters extracted
Extract Feature parameters combined to characterize
each individual + Select classifier type
Training (Design) Stage Testing Stage
Test algorithm with new data (testing set)
Application to Face Recognition – the eigenface (PCA) implementation

Re-organize each image into a vector (several options).
- An image with 64×64 pixels results in a feature vector of length 4096!
- Need to reduce dimension to simplify the problem.
Display Decision
Database Collection (N people)
Camera
Cropped Image Files
...
Dimension reduction &
Features extraction
Subject-specific Feature Generation
Create feature space
Training Phase
Test Subject Camera features extraction
Compare and Classify
Testing Phase
PCA CX=UΛUH
Training Images: each line is a N-dim vector representing a face
Average image vector
Zero-mean training images
Testing Images: each line is a N-dim vector representing a face
Testing Images subtracted from
averaged image vector
M principal components in “face space”
Eigenfaces: each column is an eigenvector of length N; select M<N columns
M principal components in “face space”
First Phase
Third Phase
Class-specific references
Class-specific Centroid calculation
Classification
Second Phase
PCA Recognizer overview
Database Collection (N people)
Camera
Cropped Image Files
...
Class-specific Feature Generation
Create feature space
Dimension reduction &
Features extraction
• First Phase: Training – Extract relevant features & Select classifier
Define “class centroids” for class-specific
references
Goals: (1) Project Images into a smaller dimensional space to reduce computational load & (2) Keep discriminating class information
Feature Space
Dimension reduction Projection
... ... • Create Projection matrix A from covariance matrix
• Project data onto smaller dimensional feature space
C1
C2 C3
• Second Phase: Training - Identify class features
• Third Phase: Testing - Project test data onto feature space & compare against class centroids
C1 C2
C3 Feature Space
?
Decision: ( Class C2 )
[Figure: original faces and eigenfaces; each face is expressed as a weighted sum of eigenfaces, face = K1·(eigenface 1) + K2·(eigenface 2) + … + KN·(eigenface N); a compressed representation keeps only the first few terms, e.g. K1·(eigenface 1) + K2·(eigenface 2).]
[Ref. 6]
PCA - Application to Traffic Monitoring & Network Anomaly Detection
[Ref. 9,10]
Interested in finding out whether • Network is under attack
• There is a sudden change in traffic patterns
• There is an equipment outage
• There is something never seen before
In general, unsupervised methods for reliable detection and classification may be preferred, as they do not require as much a-priori information as supervised schemes do. (Flip side: they sometimes may not perform as well….)
• Use Origin-Destination (OD) Network-wide traffic flow data
• Tested on Abilene (precedes Internet2, bef. 2007) academic network – Network connecting 200 US universities – 11 points of presence (PoPs) – Spanned the continental US – Upgraded in 2007 (Internet2)
• OD flow: measures IP-level flow entering and exiting the network at
a given PoP Seattle
Sunnyvale
Los Angeles Denver
Houston
Chicago New York
Washington
Atlanta
Kansas City Indianapolis
[Ref. 12]
[Ref. 10]
• Represent overall network behavior by the set of all OD traffic flows
• Information carried by various OD flows may be related
• High dimensional problem how can we reduce the dimensionality ?
• Can we use OD flows to detect traffic anomalies ?
• Collect the OD traffic flow obtained for each possible combination of origin & destination (5 min increments over one week)
• Combine in the OD traffic flow matrix X
• Use PCA (done via the SVD decomposition) to decompose the information into a set of eigenvectors (called "eigenflows") and associated eigenvalues
X: OD traffic flow matrix (n×m)
U: eigenvector matrix (n×n)
Λ: eigenvalue matrix with diagonal entries λ1, …, λm
Three main types of eigenflows:
• Deterministic components (eigenflows associated with the largest eigenvalues)
• Spiky components (eigenflows associated with unusual events)
• Noisy components (eigenflows associated with the smaller eigenvalues)
[Ref. 10]
Only a few large eigenvalues
Overall traffic may be modeled with few dimensions only
[Ref. 10]
OD flow reconstructed in terms of three types of eigenflows
[Ref. 10]
• Identify unusual traffic behavior by separating traffic into two components
(1) Usual traffic: represented by the eigenflows associated with the k largest eigenvalues (k small, usually less than 10), where most of the energy resides; represented by the space S1 spanned by the top k eigenflows; define PS1: projection onto S1.
(2) Unusual traffic: represented by the eigenflows not taken into account in the usual traffic; represented by the space S2 spanned by the remaining n − k eigenflows.
• Project the OD flow traffic y onto S1 and S2 for all OD flows: normal traffic = PS1 y; residual traffic = y − PS1 y.
• A traffic anomaly is detected when there is a sudden change in the residual traffic.
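A toy sketch of this detector (Python; the synthetic "OD flow" matrix, the subspace dimension k = 1 and the injected spike are invented for illustration): project a measurement onto the principal subspace and flag a jump in the residual norm.

```python
import numpy as np

rng = np.random.default_rng(1)
t, p = 200, 8
# toy traffic: every flow follows one common diurnal pattern + noise
pattern = np.sin(2 * np.pi * np.arange(t) / 50)
X = np.outer(pattern, rng.uniform(1.0, 3.0, p)) + 0.05 * rng.standard_normal((t, p))
X -= X.mean(axis=0)

# S1 = span of the top-k principal directions of the flows
_, _, Vt = np.linalg.svd(X, full_matrices=False)
k = 1
P_S1 = Vt[:k].T @ Vt[:k]                      # projection matrix onto S1

def residual_norm(y):
    """Norm of the part of y outside the usual-traffic subspace S1."""
    return np.linalg.norm(y - P_S1 @ y)

y = X[100].copy()
y_anom = y.copy()
y_anom[3] += 5.0                              # sudden surge on one flow
print(residual_norm(y), residual_norm(y_anom))
```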
total traffic(t) = Σ_{i=1}^{nb of OD flows} ODflow(i, t)

Residual traffic(t) = total traffic(t) − usual traffic(t)

[Ref. 10]
Section II – Random Processes
Appendix A
How to compute correlation matrix estimates
How to compute correlation matrix estimates

Discrete data: x = [x(0), …, x(N−1)]^T. Compute the correlation matrix based on the N data points. Maximum correlation matrix dimension?

Define the matrix X whose columns are successively delayed, zero-padded copies of the data vector:

X = [ x(0)      0      …     0
      x(1)    x(0)     …     0
       ⋮       x(1)     ⋱     ⋮
     x(N−1)    ⋮       …    x(0)
       0     x(N−1)    …     ⋮
       ⋮       ⋮        ⋱     ⋮
       0       0       …   x(N−1) ]
X^H X(1, 1) = x*(0) x(0) + x*(1) x(1) + … + x*(N−1) x(N−1)

X^H X: each entry of the product of the conjugate-transposed data matrix with X is a sum of lagged products x*(i) x(i + k), so X^H X (suitably normalized) provides the correlation matrix estimate.
Section II – Random Processes
Appendix B
How to assess data stationarity
T-test for equality of means
[Ref: http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm]

Assume the sample means and variances of the two segments are m1, m2 and σ1², σ2².

• The statistical test is defined as:
H0: m1 = m2
H1: m1 ≠ m2

• The test statistic is:

T = (m1 − m2) / √(σ1²/N1 + σ2²/N2)

• If equal variances are assumed, T reduces to:

T = (m1 − m2) / [ √( ((N1 − 1)σ1² + (N2 − 1)σ2²) / (N1 + N2 − 2) ) · √(1/N1 + 1/N2) ]

• T has a t distribution with υ degrees of freedom, where:

υ = (σ1²/N1 + σ2²/N2)² / [ (σ1²/N1)²/(N1 − 1) + (σ2²/N2)²/(N2 − 1) ]

If equal variances are assumed, then υ = N1 + N2 − 2.

• Reject the hypothesis that the two means are equal with (1 − α) confidence if:

T < t(υ, 1 − α/2) = −t(υ, α/2)   or   T > t(υ, α/2)

• The test is implemented in MATLAB using ttest2.m
F-test for equality of variances
[Ref: http://www.itl.nist.gov/div898/handbook/eda/section3/eda359.htm]

• The statistical test is defined as:
H0: σ1 = σ2
H1: σ1 ≠ σ2

• Similar derivation as for the mean, requiring the use of the F distribution; hence the name "two-sample F-test for equal variances".
• The test is implemented in MATLAB using vartest2.m
Section II – Random Processes
Appendix C
Detection of the number of periodic tones embedded in a noisy signal using the correlation matrix information
Detection of the number of stationary tones – noise-free case

x(n) = A exp(j(ωn + θ)),  θ ~ U[0, 2π]

Compute the 2-dimensional correlation matrix of x and its rank.

Linear algebra results:
(1) Rank of a square matrix = number of non-zero eigenvalues
(2) Given the correlation matrix

Rx^(3) = [ Rx(0)   Rx*(1)   Rx*(2)
           Rx(1)   Rx(0)    Rx*(1)
           Rx(2)   Rx(1)    Rx(0) ]

• if rank(Rx^(2)) = 1, then rank(Rx^(3)) = 1
• if rank(Rx^(2)) = 2, rank(Rx^(3)) can be 2 or 3
• if rank(Rx^(2)) = 1, Rx^(2) is called "rank deficient"; its maximum rank = 2
Rx^(2) = [ Rx(0)   Rx*(1)
           Rx(1)   Rx(0) ]    with Rx(k) = E{x(n) x*(n − k)}
y(n)=A exp (j(ωn + θ))+w(n), θ ~ U[0, 2π], w[n] white noise with pdf~N(0,K), θ & w(n) independent
• Compute the 2-dimensional correlation matrix of y and its rank. • How can the information be used to detect the number of tones?
Detection of number of stationary tones – noisy case
Eigenvalues of the signal covariance matrix
[Figure: eigenvalue spreads for the noise-free and noisy cases.]
Example C1: You are given the set of noise-free measurements contained in xa and the noisy measurements computed from xa contained in ya. You are also given noisy data contained in yb (data in Pack2Data5.mat).
1) For data xa and ya: pick the covariance matrix dimension N = 10, N = 30, N = 100, i.e., pick maxlag, as defined below, equal to 10, 30, and 100, and plot the eigenvalues. Explain how and why the specific selection of N affects the ability to detect the number of tones. Estimate the number of complex tones.
2) Estimate the number of complex tones contained in yb.
The eigenvalues of the N-dim covariance (or correlation) matrix of the zero-meaned data x, where N is defined by maxlag below (to be selected by the user; the value is problem dependent), may be computed and plotted as follows:

maxlag=20;
[xc,xlags]=xcov(x, maxlag, 'biased'); % lag 0 is at index maxlag+1
xc1=xc(maxlag+1:end);
XC=toeplitz(xc1);
[v,lam]=eig(XC);
lamda=flipud(diag(lam));
subplot(211),stem(lamda) % plot eigenvalues by decreasing values
subplot(212),periodogram(x,[],[],1),ylabel('dB'),title('Periodogram') % plot PSD
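The same eigenvalue-counting idea in Python (numpy only; the frequencies, amplitudes, matrix dimension and noise level are invented): with two well-separated complex tones, the M-dimensional Toeplitz correlation matrix shows exactly two dominant eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 4000, 10
n = np.arange(L)
# two complex tones embedded in complex white noise
x = (np.exp(1j * 0.3 * np.pi * n) + np.exp(1j * 0.7 * np.pi * n)
     + 0.1 * (rng.standard_normal(L) + 1j * rng.standard_normal(L)))

# Hermitian Toeplitz correlation matrix estimate of dimension M
r = np.array([np.mean(x[k:] * np.conj(x[:L - k])) for k in range(M)])
R = np.empty((M, M), dtype=complex)
for i in range(M):
    for j in range(M):
        R[i, j] = r[i - j] if i >= j else np.conj(r[j - i])

lam = np.linalg.eigvalsh(R)[::-1]           # descending eigenvalues
n_tones = int(np.sum(lam > 0.1 * lam[0]))   # dominant eigenvalues = tones
print("estimated number of complex tones:", n_tones)
```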
Section II – Random Processes Appendix D
Proof that the PSD is periodic with a
period equal to 2π
Property: Let x[n] be a stationary RP; then Sx(e^{jω}) is periodic with period 2π.

Proof – simplest case: Rx(k) = e^{jω0 k}  ↔  Sx(e^{jω}) = Σ_i 2π δ(ω − ω0 − 2πi)

We can prove Sx(e^{jω}) = F_T{Rx(k)} by proving Rx(k) = IFT{Sx(e^{jω})}.
IFT{Sx(e^{jω})} = (1/2π) ∫_{2π} Sx(e^{jω}) e^{jωk} dω
= (1/2π) ∫_{2π} [ Σ_i 2π δ(ω − ω0 − 2πi) ] e^{jωk} dω
= Σ_i ∫_{2π} δ(ω − ω0 − 2πi) e^{jωk} dω
(inside [0, 2π] only the i = 0 term δ(ω − ω0) is present)
= e^{jω0 k} = Rx(k)
If Rx(k) = Σ_i a_i e^{jωi k}, then

Sx(e^{jω}) = Σ_i a_i [ Σ_p 2π δ(ω − ωi − 2πp) ]

sometimes written as 2π Σ_i a_i δ(e^{jω} − e^{jωi})
To prove that x = U y = [u1 | … | uN] y may be rewritten as x = Σ_{i=1}^{N} yi ui:
x = [u1 | … | uN] [y1, y2, …, yN]^T

⇓

x1 = u11 y1 + u12 y2 + … + u1N yN
x2 = u21 y1 + u22 y2 + … + u2N yN
⋮
xN = uN1 y1 + uN2 y2 + … + uNN yN

⇒ x = u1 y1 + u2 y2 + … + uN yN = Σ_{i=1}^{N} yi ui
Section II – Random Processes Appendix F
PCA derivation - How to minimize the
error quantity using the eigendecomposition of the covariance
matrix
Face Recognition Application for 64×64 images – Summary

Database: S images, each 64×64 = 4096 pixels.
Images get reshaped as column vectors x1, …, xS, each of dimension N = 4096×1.
Combine the S vectors and compute Cx = E[(x − mx)(x − mx)^H]; Cx has dimension N×N = 4096×4096.
PCA: find the N eigenvectors/eigenvalues [U, Λ].
Compression step: keep the P "top" eigenvectors, P << N.
Projection step: project {xi} to get {yi}, i = 1, …, S:

yi = [ u1^H ; … ; uP^H ] xi,  i = 1, …, S   (P << N = 4096)

Identify each subject's characteristics via the subject-specific {yi}'s (select class centroid & spread information).
Store in the database:
• Centroid information for each subject (P-dimensional)
• The P eigenvectors (P×4096)
Section II – Random Processes: References

[1] Discrete Random Signals and Statistical Signal Processing, C. Therrien, Prentice Hall, 1992
[2] Statistical and Adaptive Signal Processing, D. Manolakis, V. Ingle & S. Kogon, Artech House, 2005
[3] R. Gutierrez-Osuna, course notes for CPSC 689: Statistical Classification and Clustering, http://courses.cs.tamu.edu/rgutier/cpsc689_f05/
[4] M. Á. Carreira-Perpiñán, Continuous Latent Variable Models for Dimensionality Reduction and Sequential Data Reconstruction, PhD thesis, University of Sheffield, UK, 2001, http://www.cse.ogi.edu/~miguel/papers.html
[5] "Biometrics: Faces and Identity Verification in a Networked World," presentation for CSI7163/ELG5121, D. Chow, M. Samuel
[6] A. Drygajlo, Biometrics, Speech Processing and Biometrics Group, Signal Processing Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), http://scgwww.epfl.ch/courses
[7] J.-Y. Le Boudec, Performance Evaluation of Computer and Communication Systems, EPFL, http://perfevalepfl.ch/lectureNotes.htm
[8] http://www.itl.nist.gov/div898/handbook/eda
[9] E. D. Kolaczyk, "'Whole-Network' Methods for Traffic Analysis and Anomaly Detection," www.sytacom.mcgill.ca/eng/15_MITACS/KolaczykMITACS08.pdf
[10] N. Feamster, lecture notes for CS7260 (Internetworking Architectures and Protocols), http://www.cc.gatech.edu/classes/AY2006/cs7260_spring/syllabus.html#Schedule
[11] W. Cham, Foundation Course on Probability, Random Variable and Random Processes
[12] http://www.internet2.edu/2004AR/abilene_map_large.cfm
[13] J. Shlens, “A Tutorial on Principal Component Analysis,” ver. 3.0.1,
http://www.snl.salk.edu/~shlens/