
Lecture 4 - Spectral Estimation

The Discrete Fourier Transform

The Discrete Fourier Transform (DFT) is the equivalent of the continuous Fourier Transform for signals known only at N instants separated by sample times T (i.e. a finite sequence of data). Let f(t) be the continuous signal which is the source of the data, and let the N samples be denoted f[0], f[1], f[2], ..., f[k], ..., f[N-1]. The Fourier Transform of the original signal, f(t), would be

$$F(j\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$$

We could regard each sample f[k] as an impulse having area f[k]. Then, since the integrand exists only at the sample points:

$$F(j\omega) = \int_{0}^{(N-1)T} f(t)\, e^{-j\omega t}\, dt = f[0]e^{-j0} + f[1]e^{-j\omega T} + \ldots + f[k]e^{-j\omega kT} + \ldots + f[N-1]e^{-j\omega (N-1)T}$$

i.e.

$$F(j\omega) = \sum_{k=0}^{N-1} f[k]\, e^{-j\omega kT}$$

We could in principle evaluate this for any ω, but with only N data points to start with, only N final outputs will be significant.

You may remember that the continuous Fourier transform could be evaluated over a finite interval (usually the fundamental period T₀) rather than from -∞ to +∞ if the waveform was periodic. Similarly, since there are only a finite number of input data points, the DFT treats the data as if it were periodic (i.e. f(N) to f(2N-1) is the same as f(0) to f(N-1)). Hence the sequence shown below in Fig. 4.1(a) is considered to be one period of the periodic sequence in plot (b).

Figure 4.1: (a) Sequence of N = 10 samples. (b) Implicit periodicity in DFT.

Since the operation treats the data as if it were periodic, we evaluate the DFT equation for the fundamental frequency (one cycle per sequence: 1/(NT) Hz, 2π/(NT) rad/sec) and its harmonics (not forgetting the d.c. component, or average, at ω = 0), i.e. we set

$$\omega = 0,\ \frac{2\pi}{NT},\ \frac{2\pi}{NT}\times 2,\ \ldots,\ \frac{2\pi}{NT}\times n,\ \ldots,\ \frac{2\pi}{NT}\times (N-1)$$

or, in general,

$$F[n] = \sum_{k=0}^{N-1} f[k]\, e^{-j\frac{2\pi}{N}nk} \qquad (n = 0 : N-1)$$

F[n] is the Discrete Fourier Transform of the sequence f[k]. We may write this equation in matrix form as:

$$\begin{pmatrix} F[0] \\ F[1] \\ F[2] \\ \vdots \\ F[N-1] \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 & \ldots & 1 \\ 1 & W & W^2 & W^3 & \ldots & W^{N-1} \\ 1 & W^2 & W^4 & W^6 & \ldots & W^{N-2} \\ 1 & W^3 & W^6 & W^9 & \ldots & W^{N-3} \\ \vdots & & & & & \\ 1 & W^{N-1} & W^{N-2} & W^{N-3} & \ldots & W \end{pmatrix} \begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ \vdots \\ f[N-1] \end{pmatrix}$$

where W = exp(-j2π/N) and W^N = W^{2N} = ... = 1 (which is why reduced powers such as W^{N-2} appear in the lower rows).
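
To make the matrix form concrete, here is a minimal NumPy sketch (NumPy and the helper name dft_matrix are my additions, not part of the notes) that builds the W matrix and applies it to the four-point sequence used in the example below:

    import numpy as np

    def dft_matrix(N):
        # Entry (n, k) is W**(n*k), with W = exp(-j*2*pi/N)
        W = np.exp(-2j * np.pi / N)
        n, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
        return W ** (n * k)

    f = np.array([8.0, 4.0, 8.0, 0.0])   # the four-point example used below
    F = dft_matrix(len(f)) @ f
    print(np.round(F, 10))               # approx. [20, -4j, 12, 4j]
    assert np.allclose(F, np.fft.fft(f)) # cross-check against the built-in FFT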

DFT – example

Let the continuous signal be

$$f(t) = \underbrace{5}_{\text{d.c.}} + \underbrace{2\cos(2\pi t - 90^{\circ})}_{1\,\text{Hz}} + \underbrace{3\cos 4\pi t}_{2\,\text{Hz}}$$

Figure 4.2: Example signal for DFT.

Let us sample f(t) at 4 times per second (i.e. f_s = 4 Hz) from t = 0 to t = 3/4.

The values of the discrete samples are given by putting t = kT_s = k/4:

$$f[k] = 5 + 2\cos\left(\frac{\pi}{2}k - 90^{\circ}\right) + 3\cos\pi k$$

i.e. f[0] = 8, f[1] = 4, f[2] = 8, f[3] = 0 (N = 4).

Therefore

$$F[n] = \sum_{k=0}^{3} f[k]\, e^{-j\frac{\pi}{2}nk} = \sum_{k=0}^{3} f[k](-j)^{nk}$$

$$\begin{pmatrix} F[0] \\ F[1] \\ F[2] \\ F[3] \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix} \begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ f[3] \end{pmatrix} = \begin{pmatrix} 20 \\ -j4 \\ 12 \\ j4 \end{pmatrix}$$

The magnitude of the DFT coefficients is shown below in Fig. 4.3.

Figure 4.3: DFT of four-point sequence.

Inverse Discrete Fourier Transform

The inverse transform of

$$F[n] = \sum_{k=0}^{N-1} f[k]\, e^{-j\frac{2\pi}{N}nk}$$

is

$$f[k] = \frac{1}{N} \sum_{n=0}^{N-1} F[n]\, e^{+j\frac{2\pi}{N}nk}$$

i.e. the inverse matrix is 1/N times the complex conjugate of the original (symmetric) matrix.
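
A quick numerical check of this claim (a sketch assuming NumPy; the forward matrix is rebuilt inline so the snippet stands alone):

    import numpy as np

    N = 4
    W = np.exp(-2j * np.pi / N)
    n, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    Wmat = W ** (n * k)                  # forward DFT matrix

    f = np.array([8.0, 4.0, 8.0, 0.0])
    F = Wmat @ f
    # inverse = (1/N) times the complex conjugate of the (symmetric) matrix
    f_back = np.conj(Wmat) @ F / N
    assert np.allclose(f_back, f)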

Note that the F[n] coefficients are complex. We can assume that the f[k] values are real (this is the simplest case; there are situations, e.g. radar, in which two inputs at each k are treated as a complex pair, since they are the outputs from 0° and 90° demodulators).

In the process of taking the inverse transform the terms F[n] and F[N-n] (remember that the spectrum is symmetrical about N/2) combine to produce 2 frequency components, only one of which is considered to be valid: the one at the lower of the two frequencies, n/(NT) Hz, where n ≤ N/2; the higher-frequency component is at an "aliasing frequency" (n > N/2).

From the inverse transform formula, the contribution to f[k] of F[n] and F[N-n] is:

$$f_n[k] = \frac{1}{N}\left\{F[n]e^{j\frac{2\pi}{N}nk} + F[N-n]e^{j\frac{2\pi}{N}(N-n)k}\right\} \qquad (4.2)$$

For all f[k] real,

$$F[N-n] = \sum_{k=0}^{N-1} f[k]\, e^{-j\frac{2\pi}{N}(N-n)k}$$

But

$$e^{-j\frac{2\pi}{N}(N-n)k} = \underbrace{e^{-j2\pi k}}_{1\ \text{for all}\ k}\, e^{+j\frac{2\pi}{N}nk} = e^{+j\frac{2\pi}{N}nk}$$

i.e. F[N-n] = F*[n] (the complex conjugate).

Substituting into the equation for f_n[k] above gives

$$f_n[k] = \frac{1}{N}\left\{F[n]e^{j\frac{2\pi}{N}nk} + F^*[n]e^{-j\frac{2\pi}{N}nk}\right\} \qquad \text{since } e^{j2\pi k} = 1$$

i.e.

$$f_n[k] = \frac{2}{N}\left\{\mathrm{Re}\{F[n]\}\cos\frac{2\pi}{N}nk - \mathrm{Im}\{F[n]\}\sin\frac{2\pi}{N}nk\right\}$$

or

$$f_n[k] = \frac{2}{N}|F[n]|\cos\left\{\left(\frac{2\pi}{NT}n\right)kT + \arg(F[n])\right\}$$

i.e. a sampled sinewave at n/(NT) Hz (2πn/(NT) rad/sec), of magnitude (2/N)|F[n]|.

For the special case of n = 0, F[0] = Σ_k f[k] (i.e. the sum of all samples), and the contribution of F[0] to f[k] is f₀[k] = (1/N)F[0] = average of f[k] = d.c. component.
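
As a numerical sanity check of the formula for f_n[k] (a sketch assuming NumPy, using the four-point example):

    import numpy as np

    f = np.array([8.0, 4.0, 8.0, 0.0])
    N = len(f)
    F = np.fft.fft(f)
    k = np.arange(N)

    n = 1   # the conjugate pair F[1], F[3]
    fn = (2 / N) * np.abs(F[n]) * np.cos(2 * np.pi * n * k / N + np.angle(F[n]))
    print(fn.round(12))   # [0, 2, 0, -2], i.e. 2*cos(pi*k/2 - 90 deg)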

Interpretation of example

1. F[0] = 20 implies a d.c. value of (1/N)F[0] = 20/4 = 5 (as expected).

2. F[1] = -j4 = F*[3] implies a fundamental component of peak amplitude (2/N)|F[1]| = (2/4) × 4 = 2, with phase given by arg F[1] = -90°, i.e.

$$2\cos\left(\frac{2\pi}{NT}kT - 90^{\circ}\right) = 2\cos\left(\frac{\pi}{2}k - 90^{\circ}\right) \quad \text{(as expected)}$$

3. F[2] = 12 (n = N/2; there is no other N-n component here), and this implies a component

$$f_2[k] = \frac{1}{N}F[2]e^{j\frac{2\pi}{N}\cdot 2k} = \frac{1}{4}F[2]e^{j\pi k} = 3\cos\pi k \quad \text{(as expected, since } \sin\pi k = 0 \text{ for all } k\text{)}$$

Thus, the conventional way of displaying a spectrum is not as shown in Fig. 4.3 but as shown in Fig. 4.4 (obviously, the information content is the same):

  • 0 1 2 30

    1

    2

    3

    4

    5

    6

    f (Hz)

    |F[n

    ]|

    sqrt(2)

    3/sqrt(2)

    Figure 4.4: DFT of four point signal.

    85

In typical applications, N is much greater than 4; for example, for N = 1024, F[n] has 1024 components, but components 513 to 1023 are the complex conjugates of components 511 down to 1. This leaves F[0]/1024 as the d.c. component, (2/1024)|F[1]|/√2 up to (2/1024)|F[511]|/√2 as the complete a.c. components (expressed as r.m.s. values), and (1/1024)F[512]/√2 as the cosine-only component at the highest distinguishable frequency (n = N/2).

Most computer programmes evaluate |F[n]|/N (or |F[n]|²/N for the power spectral density), which gives the correct "shape" for the spectrum, except for the values at n = 0 and n = N/2.
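
A sketch of a single-sided amplitude spectrum in that convention (assuming NumPy; the helper name and the exact handling of the n = 0 and n = N/2 bins are my own choices, for even N and real input):

    import numpy as np

    def one_sided_amplitude(x):
        """|F[n]|/N, doubled for 0 < n < N/2 (assumes len(x) even, x real)."""
        N = len(x)
        A = np.abs(np.fft.fft(x)) / N
        half = A[: N // 2 + 1].copy()
        half[1 : N // 2] *= 2        # interior bins combine with their conjugates
        return half                  # n = 0 and n = N/2 are left as |F|/N

    x = np.array([8.0, 4.0, 8.0, 0.0])
    print(one_sided_amplitude(x))    # [5, 2, 3]: d.c., 1 Hz and 2 Hz amplitudes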

4.1 Discrete Fourier Transform Errors

To what degree does the DFT approximate the Fourier transform of the function underlying the data? Clearly the DFT is only an approximation, since it provides only for a finite set of frequencies. But how correct are these discrete values themselves? There are two main types of DFT error: aliasing and "leakage".

4.1.1 Aliasing

This is another manifestation of the phenomenon which we have now encountered several times. If the initial samples are not sufficiently closely spaced to represent high-frequency components present in the underlying function, then the DFT values will be corrupted by aliasing. As before, the solution is either to increase the sampling rate (if possible) or to pre-filter the signal in order to minimise its high-frequency spectral content.
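
As a two-line illustration (a sketch assuming NumPy): sampled at 4 Hz, a 3 Hz tone produces exactly the same sample values as a 1 Hz tone.

    import numpy as np

    fs = 4.0                          # sampling rate (Hz)
    t = np.arange(8) / fs             # 8 sample instants
    # a 3 Hz cosine hits exactly the same sample values as a 1 Hz cosine
    print(np.allclose(np.cos(2 * np.pi * 3 * t), np.cos(2 * np.pi * 1 * t)))  # True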

4.1.2 Leakage

Recall that the continuous Fourier transform of a periodic waveform requires the integration to be performed over the interval -∞ to +∞, or over an integer number of cycles of the waveform. If we attempt to compute the DFT over a non-integer number of cycles of the input signal, then we might expect the transform to be corrupted in some way. This is indeed the case, as will now be shown.

Consider the case of an input signal which is a sinusoid with a fractional number of cycles in the N data samples. The DFT for this case (for n = 0 to n = N/2) is shown below in Fig. 4.5.

Figure 4.5: Leakage.

We might have expected the DFT to give an output at just the quantised frequencies either side of the true frequency. This certainly does happen, but we also find non-zero outputs at all other frequencies. This smearing effect, which is known as leakage, arises because we are effectively calculating the Fourier series for the waveform in Fig. 4.6, which has major discontinuities, hence other frequency components.

Figure 4.6: Leakage. The repeating waveform has discontinuities.

Most sequences of real data are much more complicated than the sinusoidal sequences that we have so far considered, and so it will not be possible to avoid introducing discontinuities when using a finite number of points from the sequence in order to calculate the DFT. The solution is to use one of the window functions which we encountered in the design of FIR filters (e.g. the Hamming or Hanning windows). These window functions taper the samples towards zero values at both endpoints, and so there is no discontinuity (or very little, in the case of the Hanning window) with a hypothetical next period. Hence the leakage of spectral content away from its correct location is much reduced, as in Fig. 4.7.

Figure 4.7: Leakage is reduced using a Hanning window.
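
A minimal sketch of this effect (assuming NumPy; the 4.5-cycles-per-record tone is an arbitrary choice to force a non-integer number of cycles):

    import numpy as np

    N = 64
    k = np.arange(N)
    x = np.sin(2 * np.pi * 4.5 * k / N)   # 4.5 cycles -> discontinuity -> leakage
    w = np.hanning(N)                     # Hanning (Hann) taper

    X_rect = np.abs(np.fft.fft(x))
    X_hann = np.abs(np.fft.fft(x * w))
    # away from the bins near 4-5, the windowed spectrum is much cleaner
    print(X_rect[15:20].round(3))
    print(X_hann[15:20].round(3))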

Stochastic Models

4.2 Introduction

We now discuss autocorrelation and autoregressive processes; that is, the correlation between successive values of a time series and the linear relations between them. We also show how these models can be used for spectral estimation.

4.3 Autocorrelation

Given a time series x_t we can produce a lagged version of the time series, x_{t-T}, which lags the original by T samples. We can then calculate the covariance between the two signals

$$\sigma_{xx}(T) = \frac{1}{N-1}\sum_{t=1}^{N}(x_{t-T} - \mu_x)(x_t - \mu_x) \qquad (4.3)$$

where μ_x is the signal mean and there are N samples. We can then plot σ_xx(T) as a function of T; this is known as the autocovariance function. The autocorrelation function is a normalised version of the autocovariance

$$r_{xx}(T) = \frac{\sigma_{xx}(T)}{\sigma_{xx}(0)} \qquad (4.4)$$

Note that σ_xx(0) = σ_x², so r_xx(0) = 1. Also, because σ_xy = σ_yx, we have r_xx(T) = r_xx(-T); the autocorrelation (and autocovariance) are symmetric, or even, functions. Figure 4.8 shows a signal and a lagged version of it, and Figure 4.9 shows the autocorrelation function.

Figure 4.8: Signal x_t (top) and x_{t+5} (bottom). The bottom trace leads the top trace by 5 samples; or we may say it lags the top by -5 samples.

Figure 4.9: Autocorrelation function for x_t. Notice the negative correlation at lag 20 and positive correlation at lag 40. Can you see from Figure 4.8 why these should occur?
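
In code, equations (4.3) and (4.4) might be computed as follows (a sketch assuming NumPy; the sum is taken over the overlapping samples only):

    import numpy as np

    def autocorrelation(x, max_lag):
        """r_xx(T) = sigma_xx(T) / sigma_xx(0), for T = 0..max_lag."""
        x = x - x.mean()                       # remove mu_x
        N = len(x)
        cov = np.array([np.sum(x[T:] * x[: N - T]) / (N - 1)
                        for T in range(max_lag + 1)])
        return cov / cov[0]                    # normalise so r_xx(0) = 1

    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * np.arange(200) / 40) + 0.3 * rng.standard_normal(200)
    r = autocorrelation(x, 60)
    print(r[0], r[20].round(2), r[40].round(2))  # 1.0, negative, positive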

4.4 Autoregressive models

An autoregressive (AR) model predicts the value of a time series from previous values. A pth-order AR model is defined as

$$x_t = \sum_{i=1}^{p} x_{t-i}a_i + e_t \qquad (4.5)$$

where a_i are the AR coefficients and e_t is the prediction error. These errors are assumed to be Gaussian with zero mean and variance σ_e². It is also possible to include an extra parameter a_0 to soak up the mean value of the time series. Alternatively, we can first subtract the mean from the data and then apply the zero-mean AR model described above. We would also subtract any trend from the data (such as a linear or exponential increase), as the AR model assumes stationarity.

The above expression shows the relation for a single time step. To show the relation for all time steps we can use matrix notation. We can write the AR model in matrix form by making use of the embedding matrix, M, and by writing the signal and AR coefficients as vectors. We now illustrate this for p = 4. This gives

$$M = \begin{bmatrix} x_4 & x_3 & x_2 & x_1 \\ x_5 & x_4 & x_3 & x_2 \\ \vdots & \vdots & \vdots & \vdots \\ x_{N-1} & x_{N-2} & x_{N-3} & x_{N-4} \end{bmatrix} \qquad (4.6)$$

We can also write the AR coefficients as a vector a = [a₁, a₂, a₃, a₄]ᵀ, the errors as a vector e = [e₅, e₆, ..., e_N]ᵀ and the signal itself as a vector X = [x₅, x₆, ..., x_N]ᵀ. This gives

$$\begin{bmatrix} x_5 \\ x_6 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} x_4 & x_3 & x_2 & x_1 \\ x_5 & x_4 & x_3 & x_2 \\ \vdots & \vdots & \vdots & \vdots \\ x_{N-1} & x_{N-2} & x_{N-3} & x_{N-4} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} + \begin{bmatrix} e_5 \\ e_6 \\ \vdots \\ e_N \end{bmatrix} \qquad (4.7)$$

which can be compactly written as

$$X = Ma + e \qquad (4.8)$$

The AR model is therefore a special case of the multivariate regression model. The AR coefficients can therefore be computed from the equation

$$\hat{a} = (M^T M)^{-1} M^T X \qquad (4.9)$$

The AR predictions can then be computed as the vector

$$\hat{X} = M\hat{a} \qquad (4.10)$$

and the error vector is then e = X - X̂. The variance of the noise is then calculated as the variance of the error vector.

To illustrate this process we analyse our data set using an AR(4) model. The AR coefficients were estimated to be

$$\hat{a} = [1.46, -1.08, 0.60, -0.186]^T \qquad (4.11)$$

and the AR predictions are shown in Figure 4.10. The noise variance was estimated to be σ_e² = 0.079, which corresponds to a standard deviation of 0.28. The variance of the original time series was 0.3882, giving a signal-to-noise ratio of (0.3882 - 0.079)/0.079 = 3.91.
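
Equations (4.6) to (4.10) translate directly into code. A minimal sketch (assuming NumPy; fit_ar is a hypothetical helper name, and the synthetic signal merely stands in for the notes' data set, which is not reproduced here):

    import numpy as np

    def fit_ar(x, p):
        """Least-squares AR(p) fit via the embedding matrix M (eqs 4.6-4.9)."""
        N = len(x)
        # row t of M holds [x_{t-1}, x_{t-2}, ..., x_{t-p}]
        M = np.column_stack([x[p - 1 - i : N - 1 - i] for i in range(p)])
        X = x[p:]
        a_hat, *_ = np.linalg.lstsq(M, X, rcond=None)  # solves (M^T M)^{-1} M^T X
        e = X - M @ a_hat                              # prediction errors
        return a_hat, e.var()

    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * np.arange(200) / 25) + 0.2 * rng.standard_normal(200)
    a_hat, noise_var = fit_ar(x, 4)
    print(a_hat, noise_var)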

Figure 4.10: (a) Original signal (solid line), X, and predictions (dotted line), X̂, from an AR(4) model and (b) the prediction errors, e. Notice that the variance of the errors is much less than that of the original signal.

4.4.1 Relation to autocorrelation

The autoregressive model can be written as

$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \ldots + a_p x_{t-p} + e_t \qquad (4.12)$$

If we multiply both sides by x_{t-k} we get

$$x_t x_{t-k} = a_1 x_{t-1} x_{t-k} + a_2 x_{t-2} x_{t-k} + \ldots + a_p x_{t-p} x_{t-k} + e_t x_{t-k} \qquad (4.13)$$

If we now sum over t, divide by N - 1, and assume that the signal is zero mean (if it isn't, we can easily make it so by subtracting the mean value from every sample), the above equation can be rewritten in terms of covariances at different lags

$$\sigma_{xx}(k) = a_1\sigma_{xx}(k-1) + a_2\sigma_{xx}(k-2) + \ldots + a_p\sigma_{xx}(k-p) + \sigma_{e,x} \qquad (4.14)$$

where the last term σ_{e,x} is the covariance between the noise and the signal. But as the noise is assumed to be independent of the signal, σ_{e,x} = 0. If we now divide every term by the signal variance we get a relation between the correlations at different lags

$$r_{xx}(k) = a_1 r_{xx}(k-1) + a_2 r_{xx}(k-2) + \ldots + a_p r_{xx}(k-p) \qquad (4.15)$$

This holds for all lags. For an AR(p) model we can write this relation out for the first p lags. For p = 4,

$$\begin{bmatrix} r_{xx}(1) \\ r_{xx}(2) \\ r_{xx}(3) \\ r_{xx}(4) \end{bmatrix} = \begin{bmatrix} r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) & r_{xx}(-3) \\ r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) & r_{xx}(-2) \\ r_{xx}(2) & r_{xx}(1) & r_{xx}(0) & r_{xx}(-1) \\ r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & r_{xx}(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} \qquad (4.16)$$

which can be compactly written as

$$r = Ra \qquad (4.17)$$

where r is the autocorrelation vector and R is the autocorrelation matrix. The above equations are known, after their discoverers, as the Yule-Walker relations. They provide another way to estimate AR coefficients:

$$a = R^{-1}r \qquad (4.18)$$

This leads to a more efficient algorithm than the general method for multivariate linear regression (equation 4.9), because we can exploit the structure in the autocorrelation matrix. By noting that r_xx(k) = r_xx(-k), we can rewrite the correlation matrix as

$$R = \begin{bmatrix} 1 & r_{xx}(1) & r_{xx}(2) & r_{xx}(3) \\ r_{xx}(1) & 1 & r_{xx}(1) & r_{xx}(2) \\ r_{xx}(2) & r_{xx}(1) & 1 & r_{xx}(1) \\ r_{xx}(3) & r_{xx}(2) & r_{xx}(1) & 1 \end{bmatrix} \qquad (4.19)$$

Because this matrix is both symmetric and a Toeplitz matrix (the terms along any diagonal are the same), we can use a recursive estimation technique known as the Levinson-Durbin algorithm.
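
A sketch of the Yule-Walker route (assuming NumPy and SciPy; scipy.linalg.solve_toeplitz implements Levinson-Durbin recursion for Toeplitz systems, so the solve costs O(p²) rather than O(p³)):

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def fit_ar_yule_walker(x, p):
        """Solve R a = r (eq. 4.17), exploiting the Toeplitz structure of R."""
        x = x - x.mean()
        r = np.array([np.sum(x[k:] * x[: len(x) - k]) for k in range(p + 1)])
        r = r / r[0]                          # r_xx(0..p), with r_xx(0) = 1
        # first column of R is [r(0), ..., r(p-1)]; right-hand side is r(1..p)
        return solve_toeplitz(r[:p], r[1 : p + 1])

For well-behaved data these estimates should be close to the least-squares â obtained from equation (4.9).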

4.5 Moving Average Models

A Moving Average (MA) model of order q is defined as

$$x_t = \sum_{i=0}^{q} b_i e_{t-i} \qquad (4.20)$$

where e_t is Gaussian random noise with zero mean and variance σ_e². MA models are a type of FIR filter. They can be combined with AR models to give Autoregressive Moving Average (ARMA) models

$$x_t = \sum_{i=1}^{p} a_i x_{t-i} + \sum_{i=0}^{q} b_i e_{t-i} \qquad (4.21)$$

described as ARMA(p, q) models, which are a type of IIR filter.

Usually, however, FIR and IIR filters have a set of fixed coefficients which have been chosen to give the filter particular frequency characteristics. In MA or ARMA modelling the coefficients are tuned to a particular time series, so as to capture the spectral characteristics of the underlying process.
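
Since an MA model is an FIR filter driven by noise, simulating one is essentially a one-liner (a sketch assuming NumPy; the b coefficients are arbitrary illustrative values):

    import numpy as np

    rng = np.random.default_rng(0)
    e = rng.standard_normal(500)             # zero-mean Gaussian noise e_t
    b = np.array([1.0, 0.5, 0.2])            # illustrative MA(2) coefficients b_0..b_2
    x = np.convolve(e, b, mode="full")[: len(e)]  # x_t = sum_i b_i * e_{t-i}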

4.6 Spectral Estimation using AR models

Autoregressive models can also be used for spectral estimation. An AR(p) model predicts the next value in a time series as a linear combination of the p previous values

$$x_t = -\sum_{k=1}^{p} a_k x_{t-k} + e_t \qquad (4.22)$$

where a_k are the AR coefficients and e_t is IID Gaussian noise with zero mean and variance σ². Note the sudden minus sign: this sign convention is arbitrary, and is adopted only to make the terms in the spectral equations below positive.

The above equation can be analysed using the z-transform. This allows the equation to be written as

$$X(z)\left(1 + \sum_{k=1}^{p} a_k z^{-k}\right) = E(z) \qquad (4.23)$$

It can then be rewritten for X(z) as

$$X(z) = \frac{E(z)}{1 + \sum_{k=1}^{p} a_k z^{-k}} \qquad (4.24)$$

Taking z = exp(jωT_s), where ω is frequency and T_s is the sampling period, and noting that the power in the noise process is σ_e²T_s, we can see that the frequency-domain characteristics of an AR model are given by

$$P(\omega) = \frac{\sigma_e^2 T_s}{\left|1 + \sum_{k=1}^{p} a_k \exp(-jk\omega T_s)\right|^2} \qquad (4.25)$$
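
Equation (4.25) can be evaluated on any frequency grid. A minimal sketch (assuming NumPy; ar_spectrum is a hypothetical name, and a follows the sign convention of equation 4.22):

    import numpy as np

    def ar_spectrum(a, sigma2_e, Ts, freqs_hz):
        """P(w) = sigma_e^2 * Ts / |1 + sum_k a_k exp(-j*k*w*Ts)|^2 (eq. 4.25)."""
        w = 2 * np.pi * np.asarray(freqs_hz)          # rad/s
        k = np.arange(1, len(a) + 1)
        denom = np.abs(1 + np.exp(-1j * np.outer(w * Ts, k)) @ a) ** 2
        return sigma2_e * Ts / denom

    # power on an arbitrarily fine grid, up to the 50 Hz Nyquist limit for Ts = 0.01
    P = ar_spectrum(np.array([-1.3, 0.8]), sigma2_e=0.1, Ts=0.01,
                    freqs_hz=np.linspace(0, 50, 1000))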

An AR(p) model can provide spectral estimates with p/2 peaks; therefore, if you know how many peaks you are looking for in the spectrum, you can choose the AR model order accordingly. Alternatively, AR model-order estimation methods should automatically provide the appropriate level of smoothing of the estimated spectrum.

AR spectral estimation has two distinct advantages over methods based on the Fourier transform: (i) power can be estimated over a continuous range of frequencies (not just at fixed intervals), and (ii) the power estimates have less variance.

Figure 4.11: Power spectral estimates of two sine waves in additive noise using (a) the Discrete Fourier transform method and (b) autoregressive spectral estimation.