Chapter 5
Spectral Representations
Prerequisites
• Knowledge of complex numbers.
• Some idea of what the covariance of a complex random variable is (we define it below).
• Some idea of what a Fourier transform is.
Objectives
• Know the definition of the spectral density.
• The spectral density is always non-negative, and this gives a way of checking that a sequence
is actually non-negative definite (i.e. is an autocovariance).
• The DFT of a second order stationary time series is almost uncorrelated.
• The spectral density of an ARMA time series, and how the roots of the characteristic
polynomial of an AR may influence the spectral density function.
• There is no need to understand the proofs of either Bochner's (generalised) theorem or
the spectral representation theorem, just know what these theorems say. However, you
should know the proof of Bochner's theorem in the simple case that $\sum_{r}|r\,c(r)|<\infty$.
5.1 Some Fourier background
The background given here is extremely sketchy (to say the least); for a more thorough
treatment of Fourier analysis in time series, the reader is referred, for example, to Priestley
(1983), Chapter 4 and Fuller (1995), Chapter 3. For a good (nonstatistical) overview of
the discrete Fourier transform and how it relates to the Fourier transform, the reader is
referred to Briggs and Henson (1997).
(i) Discrete Fourier transforms of finite sequences
It is straightforward to show (by using the property $\sum_{j=1}^{n}\exp(i2\pi jk/n)=0$ for $k\neq 0$)
that if
\[
d_k = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}x_j\exp(i2\pi jk/n),
\]
then $\{x_r\}$ can be recovered by inverting this transformation,
\[
x_r = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}d_k\exp(-i2\pi rk/n).
\]
(A small numerical check of this transform pair is given after item (iii) below.)
(ii) Fourier sums and integrals
Of course the above only has meaning when $\{x_k\}$ is a finite sequence. However, suppose
that $\{x_k\}$ is a sequence which belongs to $\ell_2$ (that is $\sum_k x_k^2<\infty$); then we can define
the function
\[
f(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{k=-\infty}^{\infty}x_k\exp(ik\omega),
\]
where $\int_0^{2\pi}f(\omega)^2\,d\omega = \sum_k x_k^2$, and we can recover $\{x_k\}$ from $f(\omega)$ through
\[
x_k = \frac{1}{\sqrt{2\pi}}\int_0^{2\pi}f(\omega)\exp(-ik\omega)\,d\omega.
\]
(iii) Convolutions. Let us suppose that $\sum_k|a_k|^2<\infty$ and $\sum_k|b_k|^2<\infty$, and define
the Fourier transforms of the sequences $\{a_k\}$ and $\{b_k\}$ as $A(\omega)=\frac{1}{\sqrt{2\pi}}\sum_k a_k\exp(ik\omega)$ and
$B(\omega)=\frac{1}{\sqrt{2\pi}}\sum_k b_k\exp(ik\omega)$ respectively. Then
\[
\sum_{j=-\infty}^{\infty}a_jb_{k-j} = \int_0^{2\pi}A(\omega)B(\omega)\exp(-ik\omega)\,d\omega
\]
and
\[
\sum_{j=-\infty}^{\infty}a_jb_j\exp(ij\omega) = \int_0^{2\pi}A(\lambda)B(\omega-\lambda)\,d\lambda. \tag{5.1}
\]
The proof of (5.1) follows from
\[
\sum_{j=-\infty}^{\infty}a_jb_j\exp(ij\omega)
= \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\int_0^{2\pi}\!\!\int_0^{2\pi}A(\lambda_1)B(\lambda_2)\exp(-ij(\lambda_1+\lambda_2))\exp(ij\omega)\,d\lambda_1d\lambda_2 \tag{5.2}
\]
\[
= \int_0^{2\pi}\!\!\int_0^{2\pi}A(\lambda_1)B(\lambda_2)\underbrace{\frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\exp(ij(\omega-\lambda_1-\lambda_2))}_{=\delta(\omega-\lambda_1-\lambda_2)}\,d\lambda_1d\lambda_2 \tag{5.3}
\]
\[
= \int_0^{2\pi}A(\lambda)B(\omega-\lambda)\,d\lambda. \tag{5.4}
\]
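As promised in item (i), here is a quick numerical check of the discrete transform pair $d_k = n^{-1/2}\sum_j x_j e^{i2\pi jk/n}$ and $x_r = n^{-1/2}\sum_k d_k e^{-i2\pi rk/n}$. This is only a minimal sketch (the $1/\sqrt{n}$ normalisation is applied by hand, since it differs from the default FFT normalisation):

```python
import numpy as np

n = 8
x = np.random.randn(n)
idx = np.arange(1, n + 1)                       # j and k both run over 1, ..., n

# d_k = n^{-1/2} sum_{j=1}^n x_j exp(i 2 pi j k / n)
d = np.array([np.sum(x * np.exp(1j * 2 * np.pi * idx * k / n)) / np.sqrt(n) for k in idx])

# x_r = n^{-1/2} sum_{k=1}^n d_k exp(-i 2 pi r k / n) recovers the original sequence
x_rec = np.array([np.sum(d * np.exp(-1j * 2 * np.pi * idx * r / n)) / np.sqrt(n) for r in idx])

print(np.allclose(x_rec.real, x))               # True: the inversion formula holds
```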
5.2 Motivation
We first give a taster of the spectral representation (before we dive into its rather
theoretical derivation) and explain why it may be useful. Let us suppose $\{X_t\}$ is a stationary time
series, where we observe $\{X_t\}_{t=1}^{n}$. We recall from Section 2.3.4 that $R_n=\textrm{var}(\underline{X}_n)$ is a Toeplitz
matrix. It is well known that $R_n^{-1/2}\underline{X}_n$ is an uncorrelated sequence, but in reality this serves
little purpose because we would have to know $R_n^{-1/2}$ (thus the correlation structure) in order
to uncorrelate $\underline{X}_n$. However, for a second order stationary time series something curious and
remarkable happens: the DFT almost uncorrelates the time series. The implication of this
is extremely useful in time series, and we shall be using this transform in estimation in a later
chapter. But first we will summarise why this result is true. As you may have guessed, it all
boils down to the fact that the covariance is a Toeplitz matrix which can be approximated
by (two) circulant matrices.
We start by defining the Fourier transform of $\{X_t\}_{t=1}^{n}$,
\[
J_n(\omega_j) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_t\exp\Big(i\frac{2\pi jt}{n}\Big),
\]
where $\omega_j=2\pi j/n$ are often called the fundamental, or Fourier, frequencies. Since the DFT is
a linear transformation we can go backwards and forwards between the DFT and the data
itself because
\[
X_t = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}J_n(\omega_k)\exp\Big(-i\frac{2\pi kt}{n}\Big).
\]
We recall that often in statistics we transform the data because the transformed data may be
easier to analyse. The same applies to $\{X_t\}$ and $\{J_n(\omega_k)\}_k$: because there is
a one-to-one correspondence between the two, it is sometimes easier to do inference on
the DFT than on the data itself; we explain why this may be so below. Note that $\{X_t\}$ are
the observations over time, whereas $J_n(\omega)$ detects periodicities. Note also that because
$J_n(\omega_k)=\overline{J_n(\omega_{n-k})}$, all the information about $\{X_t\}$ is contained in $\{J_n(\omega_k)\}_{k=1}^{n/2}$.
Let $\underline{X}_n' = (X_n,\ldots,X_1)$. Using the notation introduced in Section 2.3.4, our objective is
to show that $E_n\underline{X}_n$ is almost an uncorrelated sequence (recall that $(E_n)_{s,t}=\omega_n^{(s-1)(t-1)}=
\exp(\frac{2i\pi(s-1)(t-1)}{n})$ is the Fourier matrix; please note that notation is abused here, with
$\omega$ denoting both the frequency and the complex exponential). There are various ways
the uncorrelatedness property can be shown (we summarise it in a lemma below), but
here we demonstrate this property using the circulant matrix approximation. We start by
considering the variance of $E_n\underline{X}_n$,
\[
\textrm{var}(E_n\underline{X}_n) = E_nR_n\bar{E}_n,
\]
and our aim is to show that it is almost diagonal. We first recall that if $R_n$ were a circulant
matrix, then $E_n\underline{X}_n$ would be uncorrelated. This is not the case, but the upper right hand
side and the lower left hand side of $R_n$ can be approximated by circulant matrices; this is the
trick in showing the `near' uncorrelatedness. Studying $R_n$,
\[
R_n = \begin{pmatrix}
c(0) & c(1) & c(2) & \ldots & c(n-1)\\
c(1) & c(0) & c(1) & \ldots & c(n-2)\\
\vdots & \vdots & \ddots & & \vdots\\
c(n-1) & c(n-2) & \ldots & c(1) & c(0)
\end{pmatrix},
\]
we observe that it can be written as the sum of two circulant matrices, plus some error that
we will bound. That is, we define the two circulant matrices
\[
C_{1n} = \begin{pmatrix}
c(0) & c(1) & c(2) & \ldots & c(n-1)\\
c(n-1) & c(0) & c(1) & \ldots & c(n-2)\\
\vdots & \vdots & \ddots & & \vdots\\
c(1) & c(2) & \ldots & c(n-1) & c(0)
\end{pmatrix}
\]
and
\[
C_{2n} = \begin{pmatrix}
0 & c(n-1) & c(n-2) & \ldots & c(1)\\
c(1) & 0 & c(n-1) & \ldots & c(2)\\
c(2) & c(1) & 0 & \ldots & c(3)\\
\vdots & \vdots & \ddots & & \vdots\\
c(n-1) & c(n-2) & \ldots & c(1) & 0
\end{pmatrix}.
\]
We observe that the upper right hand sides of $C_{1n}$ and $R_n$ match and the lower left hand
sides of $C_{2n}$ and $R_n$ match. As the above are circulant, their eigenvector matrices are $E_n$ and
its inverse $\bar{E}_n = E_n^{-1}$. The eigenvalues of $C_{1n}$ and $C_{2n}$ are
\[
\textrm{diag}\Big(\sum_{j=0}^{n-1}c(j),\ \sum_{j=0}^{n-1}c(j)\omega_n^{j},\ \ldots,\ \sum_{j=0}^{n-1}c(j)\omega_n^{(n-1)j}\Big)
\]
and
\[
\textrm{diag}\Big(\sum_{j=1}^{n-1}c(n-j),\ \sum_{j=1}^{n-1}c(n-j)\omega_n^{j},\ \ldots,\ \sum_{j=1}^{n-1}c(n-j)\omega_n^{(n-1)j}\Big)
= \textrm{diag}\Big(\sum_{j=1}^{n-1}c(j),\ \sum_{j=1}^{n-1}c(j)\omega_n^{-j},\ \ldots,\ \sum_{j=1}^{n-1}c(j)\omega_n^{-(n-1)j}\Big),
\]
respectively. More succinctly, the $k$th eigenvalues of $C_{1n}$ and $C_{2n}$ are $\lambda_{k,1}=\sum_{j=0}^{n-1}c(j)\omega_n^{j(k-1)}$
and $\lambda_{k,2}=\sum_{j=1}^{n-1}c(j)\omega_n^{-j(k-1)}$. This may not look like much, but together they give us the so-called
spectral density function.
We now show that under the condition $\sum_r|rc(r)|<\infty$ we have
\[
E_nR_n\bar{E}_n - E_n\big(C_{1n}+C_{2n}\big)\bar{E}_n = O\Big(\frac{1}{n}\Big)I, \tag{5.5}
\]
where $I$ is the $n\times n$ matrix of ones (that is, every entry of the difference is $O(n^{-1})$). To bound
the above we consider the difference element by element. Since the upper right hand sides of
$C_{1n}$ and $R_n$ match and the lower left hand sides of $C_{2n}$ and $R_n$ match, the $(s,t)$th element of
the difference satisfies
\[
\big|e_sR_n\bar{e}_t' - e_sC_{1n}\bar{e}_t' - e_sC_{2n}\bar{e}_t'\big| \leq \frac{2}{n}\sum_{r=1}^{n-1}|rc(r)| = O\Big(\frac{1}{n}\Big),
\]
where $e_s$ denotes the $s$th row of $E_n$.
Thus we have shown (5.5). Therefore, since $E_n$ is the eigenvector matrix of $C_{1n}$ and $C_{2n}$,
altogether we have
\[
E_n\big(C_{1n}+C_{2n}\big)\bar{E}_n = \textrm{diag}\Big(f_n(0),\ f_n\big(\tfrac{2\pi}{n}\big),\ \ldots,\ f_n\big(\tfrac{2\pi(n-1)}{n}\big)\Big),
\]
where $f_n(\omega)=\sum_{r=-(n-1)}^{n-1}c(r)\exp(ir\omega)$. Altogether this gives
\[
\textrm{var}(E_n\underline{X}_n) = E_nR_n\bar{E}_n =
\begin{pmatrix}
f_n(0) & 0 & \ldots & 0\\
0 & f_n(\tfrac{2\pi}{n}) & \ldots & 0\\
\vdots & & \ddots & \vdots\\
0 & \ldots & 0 & f_n(\tfrac{2\pi(n-1)}{n})
\end{pmatrix}
+ O\Big(\frac{1}{n}\Big)
\begin{pmatrix}
1 & 1 & \ldots & 1\\
1 & 1 & \ldots & 1\\
\vdots & & \ddots & \vdots\\
1 & 1 & \ldots & 1
\end{pmatrix}.
\]
Some caution is needed with the above, since $f_n(\omega)$ is not necessarily positive; however, it
will be for a large enough $n$.
We summarise the above result in the following lemma, where we also give an alternative
proof.
Lemma 5.2.1 Suppose $\sum_r|rc(r)|<\infty$. Then we have
\[
\textrm{cov}\Big(J_n\big(\tfrac{2\pi k_1}{n}\big),J_n\big(\tfrac{2\pi k_2}{n}\big)\Big)
= \begin{cases} f_n\big(\tfrac{2\pi k_1}{n}\big)+O(\tfrac{1}{n}) & k_1=k_2\\ O(\tfrac{1}{n}) & k_1\neq k_2\end{cases}
= \begin{cases} 2\pi f\big(\tfrac{2\pi k_1}{n}\big)+O(\tfrac{1}{n}) & k_1=k_2\\ O(\tfrac{1}{n}) & k_1\neq k_2,\end{cases}
\]
where $f(\omega)=\frac{1}{2\pi}\sum_{j=-\infty}^{\infty}c(j)\exp(ij\omega)$ (the factor $2\pi$ arises because $f_n$ is defined without the
$\frac{1}{2\pi}$ normalisation).
PROOF. The proof immediately follows from the approximation by the two circulant matrices
above. However, a more hands-on proof is to just calculate $\textrm{cov}(J_n(\tfrac{2\pi k_1}{n}),J_n(\tfrac{2\pi k_2}{n}))$. We note
that for complex random variables $\textrm{cov}(A,B)=\textrm{E}(A\bar{B})-\textrm{E}(A)\textrm{E}(\bar{B})$, thus we have
\[
\textrm{cov}\Big(J_n\big(\tfrac{2\pi k_1}{n}\big),J_n\big(\tfrac{2\pi k_2}{n}\big)\Big)
= \frac{1}{n}\sum_{t,\tau=1}^{n}\textrm{cov}(X_t,X_\tau)\exp\Big(\frac{i(tk_1-\tau k_2)2\pi}{n}\Big)
\]
\[
= \frac{1}{n}\sum_{r=-(n-1)}^{n-1}c(r)\exp\Big(\frac{ir2\pi k_1}{n}\Big)\sum_{t=1}^{n-|r|}\exp\Big(\frac{2\pi it(k_1-k_2)}{n}\Big)
= \frac{1}{n}\sum_{r=-(n-1)}^{n-1}c(r)\exp\Big(\frac{ir2\pi k_1}{n}\Big)\sum_{t=1}^{n}\exp\Big(\frac{2\pi it(k_1-k_2)}{n}\Big)+O\Big(\frac{1}{n}\Big),
\]
where the error incurred in completing the inner sum is of order $\frac{1}{n}\sum_r|rc(r)|=O(\frac{1}{n})$.
Thus we obtain the first result. To prove the second part, we note that $f_n(\omega)\rightarrow 2\pi f(\omega)$ (see
the section below for precise details). $\Box$
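Lemma 5.2.1 is easy to check by simulation. The sketch below estimates $\textrm{cov}(J_n(\omega_{k_1}),J_n(\omega_{k_2}))$ by Monte Carlo for an AR(1) model (the model and its parameter values are arbitrary choices used purely for illustration) and compares the diagonal entries with $2\pi f(\omega_k)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, nrep, phi, sigma = 64, 5000, 0.7, 1.0

t = np.arange(1, n + 1)
ks = np.array([3, 4, 10])                                        # a few Fourier frequencies omega_k = 2*pi*k/n
E = np.exp(1j * 2 * np.pi * np.outer(ks, t) / n) / np.sqrt(n)    # rows give J_n(omega_k) = n^{-1/2} sum_t X_t e^{i t omega_k}

J = np.zeros((nrep, len(ks)), dtype=complex)
for rep in range(nrep):
    eps = rng.normal(scale=sigma, size=n + 200)
    x = np.zeros(n + 200)
    for s in range(1, n + 200):
        x[s] = phi * x[s - 1] + eps[s]                           # AR(1) with a burn-in of 200
    J[rep] = E @ x[200:]

C = (J.T @ J.conj()) / nrep                                      # C[a, b] ~ cov(J_n(omega_{k_a}), J_n(omega_{k_b}))
f = sigma**2 / (2 * np.pi * np.abs(1 - phi * np.exp(1j * 2 * np.pi * ks / n))**2)
print(np.round(np.abs(C), 2))                                    # nearly diagonal
print(np.round(2 * np.pi * f, 2))                                # close to the diagonal entries
```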
Now let us return to the representation of $X_t$ in terms of the DFTs,
\[
X_t = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}J_n(\omega_k)\exp(-it\omega_k). \tag{5.6}
\]
We see from Lemma 5.2.1 that we can write $X_t$ as a weighted sum of the almost uncorrelated
frequency components $J_n(\omega_k)$; this is a heuristic justification of what is often called the spectral
representation theorem. Moreover, taking the covariance of $X_t$ and again using Lemma 5.2.1, we
see that
\[
\textrm{cov}(X_t,X_\tau) \approx \frac{1}{n}\sum_{k=1}^{n}\textrm{var}(J_n(\omega_k))\exp(-i(t-\tau)\omega_k); \tag{5.7}
\]
this is essentially a heuristic justification of Bochner's theorem (or Herglotz's theorem). The
two theorems mentioned above will be described in detail in the following sections.
Of course the entire time series $\{X_t\}$ will have infinite length (and in general it will not
belong to $\ell_2$), so it is natural to ask whether the above results can be generalised to $\{X_t\}$.
The answer is yes, by replacing the sum in (5.7) by an integral to obtain
\[
c(t-\tau) = \int_0^{2\pi}\exp(i(t-\tau)\omega)f(\omega)\,d\omega = \int_0^{2\pi}\exp(i(t-\tau)\omega)\,dF(\omega).
\]
Comparing the above with (5.7), we observe that $\textrm{var}(J_n(\omega_k))$ is nonnegative, thus $f(\cdot)$ should be
nonnegative. Moreover, its integral $F(\omega)=\int_0^{\omega}f(\lambda)\,d\lambda$ must be nonnegative and
nondecreasing.
Let us return to (5.6); we can rewrite it as
\[
X_t = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}J_n(\omega_k)\exp(-it\omega_k) = \sum_{k=1}^{n}\exp(-it\omega_k)\big(S_n(\omega_k)-S_n(\omega_{k-1})\big),
\]
where $S_n(\omega_k)=\frac{1}{\sqrt{n}}\sum_{j=1}^{k}J_n(\omega_j)$. Note that the increments of $S_n(\omega_k)$ are (almost) orthogonal. We
show below that extending the above to the entire time series yields the representation
\[
X_t = \int \exp(it\omega)\,dZ(\omega),
\]
where $Z(\omega)$ is a right continuous orthogonal increment process (that is, $\textrm{E}\big[(Z(\omega_1)-Z(\omega_2))\overline{(Z(\omega_3)-Z(\omega_4))}\big]=0$ when the intervals $[\omega_1,\omega_2]$ and $[\omega_3,\omega_4]$ do not overlap) and $\textrm{E}(|Z(\omega)|^2)=F(\omega)$.
The above result holds for both linear and nonlinear time series; however, in the case that
$X_t$ has a linear representation
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j},
\]
then $X_t$ has the particular form
\[
X_t = \int A(\omega)\exp(it\omega)\,dZ(\omega), \tag{5.8}
\]
where $A(\omega)=\sum_{j=-\infty}^{\infty}\psi_j\exp(ij\omega)$ and $Z(\omega)$ is an orthogonal increment process whose increments
in addition satisfy $\textrm{E}(|dZ(\omega)|^2)\propto d\omega$, i.e. the variance of the increments does not vary over
frequency (this variation has been absorbed into $A(\omega)$, since $F'(\omega)\propto|A(\omega)|^2$). This result should
not come as a surprise, as we have used a close approximation of it in Section 2.3.4. In particular,
we recall that for an AR($p$) process
\[
X_t \approx \sum_{s=1}^{n}\varepsilon_s\frac{1}{n}\sum_{r=0}^{n-1}\lambda_r^{-1}\omega_n^{rs}\omega_n^{-rt},
\]
where $\lambda_r^{-1}=(1-\sum_{j=1}^{p}\phi_j\exp(ij\omega_r))^{-1}$. Rearranging the above gives
\[
X_t \approx \frac{1}{\sqrt{n}}\sum_{r=0}^{n-1}\lambda_r^{-1}\omega_n^{-rt}\,\frac{1}{\sqrt{n}}\sum_{s=1}^{n}\varepsilon_s\omega_n^{rs}
= \frac{1}{\sqrt{n}}\sum_{r=0}^{n-1}\underbrace{\frac{1}{1-\sum_{j=1}^{p}\phi_j\exp(ij\omega_r)}}_{A(\omega_r)}\,J_\varepsilon(\omega_r),
\]
where $\omega_r=\frac{2\pi r}{n}$. It will be shown that the DFT of the innovations, $J_\varepsilon(\omega_r)$, has a constant
variance over all frequencies. Therefore, comparing (5.8) with the above, we see that they are
very similar.
We prove the above results in the following section.
We mention that a more detailed discussion of spectral analysis in time series is given in
Priestley (1983), Chapters 4 and 6, Brockwell and Davis (1998), Chapters 4 and 10, Fuller
(1995), Chapter 3, and Shumway and Stoffer (2006), Chapter 4. Many of these references
also discuss tests for periodicity (see also Quinn and Hannan (2001) for the estimation of
frequencies).
5.3 The spectral density and distribution
Be warned, it takes time to digest these results and to appreciate them. On first reading
they may well bore your socks off; however, like a good wine, cheese or apple, they take time
to digest and appreciate.
Some preliminaries
We first state a theorem which is very useful for checking positive definiteness of a sequence.
See Brockwell and Davis (1998), Corollary 4.3.2 or Fuller (1995), Theorem 3.1.9.
To prove part of the result we use the fact that if a sequence $\{a_k\}\in\ell_2$, then
$g(\omega)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}a_k\exp(ik\omega)\in L_2$ (by Parseval's theorem) and $a_k=\int_0^{2\pi}g(\omega)\exp(-ik\omega)\,d\omega$,
together with the following result.
Lemma 5.3.1 Suppose $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$. Then we have
\[
\frac{1}{n}\sum_{k=-(n-1)}^{n-1}|kc(k)| \rightarrow 0
\]
as $n\rightarrow\infty$. Moreover, if $\sum_{k=-\infty}^{\infty}|kc(k)|<\infty$, then $\frac{1}{n}\sum_{k=-(n-1)}^{n-1}|kc(k)|=O(\frac{1}{n})$.
PROOF. The proof is straightforward in the case that $\sum_{k=-\infty}^{\infty}|kc(k)|<\infty$ (the second
assertion); in this case $\sum_{k=-(n-1)}^{n-1}\frac{|k|}{n}|c(k)|=O(\frac{1}{n})$. The proof is slightly more tricky in the
case that we only have $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$. First we note that since $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$,
for every $\varepsilon>0$ there exists an $N_\varepsilon$ such that for all $n\geq N_\varepsilon$, $\sum_{|k|\geq n}|c(k)|<\varepsilon$. Let us suppose
that $n>N_\varepsilon$; then we have the bound
\[
\frac{1}{n}\sum_{k=-(n-1)}^{n-1}|kc(k)|
\leq \frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{N_\varepsilon-1}|kc(k)| + \frac{1}{n}\sum_{N_\varepsilon\leq|k|\leq n-1}|kc(k)|
\leq \frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{N_\varepsilon-1}|kc(k)| + \varepsilon,
\]
since $|k|/n\leq 1$ in the second sum. Hence, keeping $N_\varepsilon$ fixed, we see that
$\frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{N_\varepsilon-1}|kc(k)|\rightarrow 0$ as $n\rightarrow\infty$. Since this is true for every $\varepsilon$ (with different
thresholds $N_\varepsilon$), we obtain the required result. $\Box$
5.3.1 The spectral density
The Fourier transform of the covariance
\[
f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)
\]
is called the spectral density. The eagle-eyed amongst you will notice that it is (up to the factor
$\frac{1}{2\pi}$) the limit of the eigenvalues (the spectral representation) of the two circulant matrices
considered in the previous section, which may be one explanation as to why it is called the
spectral density (the reason for it being called a density will become clear from the theorem
below).
Theorem 5.3.1 (The spectral density) Suppose the coefficients $\{c(k)\}$ are absolutely summable
(that is $\sum_k|c(k)|<\infty$). Then the sequence $\{c(k)\}$ is nonnegative definite if and only if the
function $f(\omega)$, where
\[
f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega),
\]
is nonnegative. Moreover,
\[
c(k) = \int_0^{2\pi}\exp(-ik\omega)f(\omega)\,d\omega. \tag{5.9}
\]
It is worth noting that $f$ is called the spectral density corresponding to the covariances $\{c(k)\}$.
PROOF. We first show that if $\{c(k)\}$ is a non-negative definite sequence, then $f(\omega)$ is a
nonnegative function. We recall that since $\{c(k)\}$ is non-negative definite, for any sequence
$\underline{x}=(x_1,\ldots,x_n)$ (real or complex) we have $\sum_{s,t=1}^{n}x_sc(s-t)\bar{x}_t\geq 0$ (where $\bar{x}_t$ is the complex
conjugate of $x_t$). Now we consider the above for the particular case $\underline{x}=(\exp(i\omega),\ldots,\exp(in\omega))$.
Define the function
\[
f_n(\omega) = \frac{1}{2\pi n}\sum_{s,t=1}^{n}\exp(is\omega)c(s-t)\exp(-it\omega).
\]
Thus by definition $f_n(\omega)\geq 0$. We note that $f_n(\omega)$ can be rewritten as
\[
f_n(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\Big(\frac{n-|k|}{n}\Big)c(k)\exp(ik\omega).
\]
Comparing $f(\omega)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)$ with $f_n(\omega)$ we see that
\[
\big|f(\omega)-f_n(\omega)\big| \leq \frac{1}{2\pi}\Big|\sum_{|k|\geq n}c(k)\exp(ik\omega)\Big| + \frac{1}{2\pi}\Big|\sum_{k=-(n-1)}^{n-1}\frac{|k|}{n}c(k)\exp(ik\omega)\Big| := I_n + II_n.
\]
Now, since $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$ it is clear that $I_n\rightarrow 0$ as $n\rightarrow\infty$, and using Lemma 5.3.1 we have
$II_n\rightarrow 0$ as $n\rightarrow\infty$. Altogether the above implies
\[
\big|f(\omega)-f_n(\omega)\big| \rightarrow 0 \quad\textrm{as } n\rightarrow\infty. \tag{5.10}
\]
Since $f_n(\omega)$ is a nonnegative function for every $n$, the limit $f$ must be nonnegative (if $f(\omega)<0$ at
some $\omega$, then $f_n(\omega)$ would also have to be negative for all large enough $n$, which is not true).
Therefore we have shown that if $\{c(k)\}$ is a nonnegative definite sequence, then $f(\omega)$ is a
nonnegative function.
We now show that if $f(\omega)$, defined by $\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)$, is a nonnegative function,
then $\{c(k)\}$ is a nonnegative definite sequence. We first note that because $\{c(k)\}\in\ell_1$ it is also in
$\ell_2$, hence we have $c(k)=\int_0^{2\pi}f(\omega)\exp(-ik\omega)\,d\omega$. Now we have
\[
\sum_{s,t=1}^{n}x_sc(s-t)\bar{x}_t = \int_0^{2\pi}f(\omega)\Big(\sum_{s,t=1}^{n}x_s\exp(i(s-t)\omega)\bar{x}_t\Big)d\omega
= \int_0^{2\pi}f(\omega)\Big|\sum_{s=1}^{n}x_s\exp(is\omega)\Big|^2d\omega \geq 0.
\]
Hence we obtain the desired result. $\Box$
The above theorem is very useful. It basically gives a simple way to check whether a
sequence {c(k)} is non-negative definite or not (hence whether it is a covariance function -
recall Theorem 1.1.1).
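In practice this check is easy to carry out numerically: evaluate $\frac{1}{2\pi}\sum_k c(k)e^{ik\omega}$ on a fine grid of frequencies and inspect its sign. Below is a minimal sketch of this idea (the two candidate sequences are arbitrary choices used purely for illustration):

```python
import numpy as np

def spectral_density(c, omega):
    """f(omega) = (1/2pi) sum_k c(k) exp(i k omega) for a symmetric sequence,
    where c = [c(0), c(1), ..., c(K)] and c(-k) = c(k)."""
    k = np.arange(1, len(c))
    return (c[0] + 2 * np.sum(c[1:, None] * np.cos(np.outer(k, omega)), axis=0)) / (2 * np.pi)

omega = np.linspace(0, 2 * np.pi, 1000)
c_good = 0.5 ** np.arange(26)                 # c(k) = 0.5^|k|: a valid autocovariance
c_bad = np.array([1.0, 0.9, 0.9])             # c(0) = 1, c(1) = c(2) = 0.9, zero otherwise

print(spectral_density(c_good, omega).min())  # >= 0, so the sequence is non-negative definite
print(spectral_density(c_bad, omega).min())   # < 0, so it cannot be an autocovariance
```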
Example 5.3.1 Consider the empirical covariances (here we give an alternative proof of
Remark 4.1.1) defined in Chapter 4,
\[
c_n(k) = \begin{cases}\frac{1}{n}\sum_{t=1}^{n-|k|}X_tX_{t+|k|} & |k|\leq n-1\\ 0 & \textrm{otherwise;}\end{cases}
\]
we show below that $\{c_n(k)\}$ is a non-negative definite sequence. Thus, using Lemma 1.1.1, there
exists a stationary time series $\{Z_t\}$ which has the covariance $c_n(k)$.
To show that the sequence is non-negative definite we use Theorem 5.3.1, whose
main consequence is that a sequence is non-negative definite if its Fourier transform is
non-negative. We now apply this result. The Fourier transform of $\{c_n(k)\}$ is
\[
\sum_{k=-(n-1)}^{n-1}\exp(ik\omega)c_n(k) = \frac{1}{n}\sum_{k=-(n-1)}^{n-1}\exp(ik\omega)\sum_{t=1}^{n-|k|}X_tX_{t+|k|}
= \frac{1}{n}\Big|\sum_{t=1}^{n}X_t\exp(it\omega)\Big|^2 \geq 0.
\]
Since the above is non-negative, $\{c_n(k)\}$ is a non-negative definite sequence.
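The algebraic identity used in the last display, $\sum_{|k|\leq n-1}c_n(k)e^{ik\omega}=\frac{1}{n}|\sum_{t=1}^{n}X_te^{it\omega}|^2$, is also easy to verify numerically; a small sketch (the data and the frequency are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.standard_normal(n)
omega = 0.7                                       # any frequency will do

# empirical autocovariances (without mean correction, as in the example)
chat = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(n)])

lhs = chat[0] + 2 * np.sum(chat[1:] * np.cos(np.arange(1, n) * omega))
rhs = np.abs(np.sum(x * np.exp(1j * np.arange(1, n + 1) * omega)))**2 / n
print(np.isclose(lhs, rhs))                       # True
```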
We now state a useful result which relates the largest and smallest eigenvalues of the
variance matrix of a stationary process to the largest and smallest values of its spectral
density.
Lemma 5.3.2 Suppose that $\{X_k\}$ is a stationary process with covariance function $\{c(k)\}$ and
spectral density $f(\omega)$. Let $R_n=\textrm{var}(\underline{X}_n)$, where $\underline{X}_n=(X_1,\ldots,X_n)$. Suppose
$\inf_\omega f(\omega)\geq m>0$ and $\sup_\omega f(\omega)\leq M<\infty$. Then for all $n$ we have
\[
\lambda_{\min}(R_n) \geq 2\pi\inf_\omega f(\omega) \quad\textrm{and}\quad \lambda_{\max}(R_n)\leq 2\pi\sup_\omega f(\omega)
\]
(the factor $2\pi$ arises from the normalisation $f(\omega)=\frac{1}{2\pi}\sum_kc(k)e^{ik\omega}$).
PROOF. Let $\underline{e}_1$ be the eigenvector of $R_n$ corresponding to the smallest eigenvalue. Then,
using $c(s-t)=\int f(\omega)\exp(i(s-t)\omega)\,d\omega$, we have
\[
\lambda_{\min}(R_n) = \underline{e}_1'R_n\underline{e}_1 = \sum_{s,t=1}^{n}e_{s,1}c(s-t)e_{t,1}
= \int_0^{2\pi} f(\omega)\sum_{s,t=1}^{n}e_{s,1}\exp(i(s-t)\omega)e_{t,1}\,d\omega
= \int_0^{2\pi}f(\omega)\Big|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)\Big|^2d\omega
\geq \inf_\omega f(\omega)\int_0^{2\pi}\Big|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)\Big|^2d\omega = 2\pi\inf_\omega f(\omega),
\]
since $\int_0^{2\pi}|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)|^2d\omega = 2\pi\sum_se_{s,1}^2 = 2\pi$. Using a similar argument we can show that
$\lambda_{\max}(R_n)\leq 2\pi\sup_\omega f(\omega)$. $\Box$
A consequence of the above result is that if a spectral density is bounded from above and
bounded away from zero, then $R_n$ is non-singular and has a bounded spectral norm (uniformly in $n$).
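These bounds are easy to see numerically; the sketch below builds the Toeplitz covariance matrix of an AR(1) process (the model and parameter values are arbitrary choices for illustration) and compares its eigenvalues with $2\pi\inf_\omega f(\omega)$ and $2\pi\sup_\omega f(\omega)$:

```python
import numpy as np

phi, sigma2, n = 0.6, 1.0, 200
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Rn = sigma2 * phi ** lags / (1 - phi**2)          # Toeplitz matrix with c(k) = sigma^2 phi^|k| / (1 - phi^2)

eig = np.linalg.eigvalsh(Rn)
lower = sigma2 / (1 + phi)**2                     # 2*pi*inf f = sigma^2 / (1 + phi)^2 for this AR(1)
upper = sigma2 / (1 - phi)**2                     # 2*pi*sup f = sigma^2 / (1 - phi)^2
print(lower <= eig.min(), eig.max() <= upper)     # True True
```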
Lemma 5.3.3 Suppose the covariances $c(k)$ decay to zero as $k\rightarrow\infty$. Then for all $n$,
$R_n=\textrm{var}(\underline{X}_n)$ is a non-singular matrix (note we do not require the stronger condition that the
covariances are absolutely summable).
PROOF. See Brockwell and Davis (1998), Proposition 5.1.1. $\Box$
5.3.2 The spectral distribution and Bochner’s theorem
Theorem 5.3.1 only holds when the sequence $\{c(k)\}$ is absolutely summable. Of course this
may not always be the case. An example of an `extreme' case is the time series $X_t=Z$ for all $t$.
Clearly this is a stationary time series and its covariance is $c(k)=\textrm{var}(Z)$ for all $k$. In this
case the autocovariance sequence is not absolutely summable, hence the representation
of the covariance in Theorem 5.3.1 cannot be applied. The reason is that the
Fourier transform of a constant sequence is not well defined (since it belongs to neither
$\ell_1$ nor $\ell_2$).
However, we now show that Theorem 5.3.1 can be generalised to include all non-negative
definite sequences and stationary processes, by considering the spectral distribution rather
than the spectral density (we use the integral $\int g(x)\,dF(x)$; a definition is given in the
Appendix).
Theorem 5.3.2 A sequence $\{c(k)\}$ is a non-negative definite sequence if and only if
\[
c(k) = \int_0^{2\pi}\exp(ik\omega)\,dF(\omega), \tag{5.11}
\]
where $F(\omega)$ is a right-continuous (this means that $F(x+h)\rightarrow F(x)$ as $h\downarrow 0$),
non-decreasing, non-negative bounded function on $[0,2\pi]$ (hence it has all the properties
of a distribution and can be considered as a distribution; it is usually called the spectral
distribution). This representation is unique.
PROOF. We first show that if $\{c(k)\}$ is a non-negative definite sequence, then we can write
$c(k)=\int_0^{2\pi}\exp(ik\omega)\,dF(\omega)$, where $F(\omega)$ is a distribution function. Had $\{c(k)\}$ been absolutely
summable, then we could use Theorem 5.3.1 to write $c(k)=\int_0^{2\pi}\exp(ik\omega)\,dF(\omega)$, where
$F(\omega)=\int_0^{\omega}f(\lambda)\,d\lambda$ and $f(\lambda)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\lambda)$. By Theorem 5.3.1 we know
that $f(\lambda)$ is nonnegative, hence $F(\omega)$ is a distribution, and we have the result.
In the case that $\{c(k)\}$ is not absolutely summable we cannot use this approach, but we can
adapt some of the ideas used to prove Theorem 5.3.1. As in the proof of Theorem 5.3.1,
define the nonnegative function
\[
f_n(\omega) = \frac{1}{2\pi n}\sum_{s,t=1}^{n}\exp(is\omega)c(s-t)\exp(-it\omega)
= \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\Big(\frac{n-|k|}{n}\Big)c(k)\exp(ik\omega).
\]
When $\{c(k)\}$ is not absolutely summable, the limit of $f_n(\omega)$ may no longer be well
defined. To avoid dealing with functions which may have awkward limits, we
consider instead their integrals, which we show are always distribution functions.
Let us define the function $F_n(\omega)$ whose derivative is $f_n(\omega)$, that is
\[
F_n(\omega) = \int_0^{\omega}f_n(\lambda)\,d\lambda, \qquad 0\leq\omega\leq 2\pi.
\]
Since $f_n(\lambda)$ is nonnegative, $F_n$ is a nondecreasing function, and it is bounded ($F_n(2\pi)=
\int_0^{2\pi}f_n(\lambda)\,d\lambda = c(0)$). Hence $F_n$ satisfies all the properties of a distribution and can be treated
as a distribution function. Now it is clear that for every $k$ we have
\[
\int_0^{2\pi}\exp(ik\omega)\,dF_n(\omega) = \begin{cases}\big(1-\frac{|k|}{n}\big)c(k) & |k|\leq n\\ 0 & \textrm{otherwise.}\end{cases} \tag{5.12}
\]
If we let $d_{n,k}=\int_0^{2\pi}\exp(ik\omega)\,dF_n(\omega)$, we see that for every $k$, $\{d_{n,k}\}_n$ is a Cauchy
sequence, thus $d_{n,k}\rightarrow c(k)$ as $n\rightarrow\infty$. But we should ask what this tells us about the limit of
the underlying distributions $\{F_n\}$. It would appear that the distributions $\{F_n\}$ should
converge (weakly) to a function $F$, and this function should also be a distribution function
(if the $\{F_n\}$ are all nondecreasing, then their limit must be nondecreasing). This
turns out to be the case, by applying Helly's theorem (see Appendix). Roughly speaking, it
states that given a sequence of distributions $\{G_n\}$ which are uniformly bounded, there exists
a distribution $G$ which is the limit of a subsequence $\{G_{n_i}\}$; hence for every $h\in L_2$ we
have
\[
\int h(\omega)\,dG_{n_i}(\omega)\rightarrow\int h(\omega)\,dG(\omega).
\]
We now apply this result to the sequence $\{F_n\}$. We observe that the distributions
$\{F_n\}$ are all uniformly bounded (by $c(0)$). Therefore, applying Helly's theorem, there must
exist a distribution $F$ which is the limit of a subsequence of $\{F_n\}$, that is for every $h\in L_2$
we have
\[
\int h(\omega)\,dF_{n_i}(\omega)\rightarrow\int h(\omega)\,dF(\omega), \qquad i\rightarrow\infty,
\]
for some subsequence $\{F_{n_i}\}$. Below we show that the above limit holds not only along a
subsequence but along the whole sequence $\{F_n\}$. We observe that $\{\exp(ik\omega)\}$ is a basis of $L_2$
and that, for every $k$, the sequence $\{\int_0^{2\pi}\exp(ik\omega)\,dF_n(\omega)\}_n$ converges to $c(k)$. Therefore for all
$h\in L_2$ we have
\[
\int h(\omega)\,dF_n(\omega)\rightarrow\int h(\omega)\,dF(\omega), \qquad n\rightarrow\infty.
\]
In particular, taking $h(\omega)=\exp(ik\omega)$, we have
\[
\int\exp(ik\omega)\,dF_n(\omega)\rightarrow\int\exp(ik\omega)\,dF(\omega), \qquad n\rightarrow\infty.
\]
Since $\int\exp(ik\omega)\,dF_n(\omega)\rightarrow c(k)$ and $\int\exp(ik\omega)\,dF_n(\omega)\rightarrow\int\exp(ik\omega)\,dF(\omega)$, we have
$c(k)=\int\exp(ik\omega)\,dF(\omega)$, where $F$ is a distribution.
Conversely, to show that $\{c(k)\}$ is a non-negative definite sequence when $c(k)=\int\exp(ik\omega)\,dF(\omega)$
for some distribution $F$, we use the same method as in the proof of Theorem 5.3.1. $\Box$
Example 5.3.2 Using the above we can construct the spectral distribution for the (rather
silly) time series $X_t=Z$. Let $F(\omega)=0$ for $\omega<0$ and $F(\omega)=\textrm{var}(Z)$ for $\omega\geq 0$ (hence $F$
is a step function). Then we have
\[
\textrm{cov}(X_0,X_k) = \textrm{var}(Z) = \int\exp(ik\omega)\,dF(\omega).
\]
5.4 The spectral representation theorem
We now state the spectral representation theorem and give a rough outline of the proof.
Theorem 5.4.1 If $\{X_t\}$ is a second order stationary time series with mean zero and spectral
distribution $F(\omega)$, then there exists a right continuous, orthogonal increment process $\{Z(\omega)\}$
(that is, $\textrm{E}\big[(Z(\omega_1)-Z(\omega_2))\overline{(Z(\omega_3)-Z(\omega_4))}\big]=0$ when the intervals $[\omega_1,\omega_2]$ and
$[\omega_3,\omega_4]$ do not overlap) such that
\[
X_t = \int_0^{2\pi}\exp(it\omega)\,dZ(\omega), \tag{5.13}
\]
where for $\omega_1\geq\omega_2$, $\textrm{E}|Z(\omega_1)-Z(\omega_2)|^2 = F(\omega_1)-F(\omega_2)$ (noting that $F(0)=0$). (One
example of a right continuous, orthogonal increment process is Brownian motion, though
this is just one example, and usually $Z(\omega)$ will be far more general than Brownian motion.)
Heuristically we see that (5.13) is the decomposition of $X_t$ in terms of frequencies whose
amplitudes are orthogonal. In other words, $X_t$ is decomposed in terms of the frequencies $\exp(it\omega)$,
which have the orthogonal amplitudes $dZ(\omega)\approx Z(\omega+\delta)-Z(\omega)$.
Remark 5.4.1 Note that so far we have not defined the integral on the right hand side of
(5.13); this is known as a stochastic integral. Unlike for many deterministic functions (functions
whose derivative exists), one cannot really write $dZ(\omega)\approx Z'(\omega)\,d\omega$, because usually a
typical realisation of $Z(\omega)$ will not be smooth enough to differentiate. For example, it is
well known that Brownian motion is quite `rough': a typical realisation of Brownian motion
satisfies $|B(t_1,\omega)-B(t_2,\omega)|\leq K(\omega)|t_1-t_2|^{\gamma}$, where $\omega$ here denotes the realisation and
$\gamma\leq 1/2$, but in general $\gamma$ will not be larger. The integral $\int g(\omega)\,dZ(\omega)$ is well defined if it is defined
as the limit (in the mean squared sense) of discrete sums. In other words, let
$Z_n(\omega)=\sum_{k=1}^{n}Z(\omega_k)I_{(\omega_{k-1},\omega_k]}(\omega)$ and
\[
\int g(\omega)\,dZ_n(\omega) = \sum_{k=1}^{n}g(\omega_k)\{Z(\omega_k)-Z(\omega_{k-1})\};
\]
then $\int g(\omega)\,dZ(\omega)$ is the mean squared limit of $\{\int g(\omega)\,dZ_n(\omega)\}_n$, that is
$\textrm{E}[\int g(\omega)\,dZ(\omega)-\int g(\omega)\,dZ_n(\omega)]^2\rightarrow 0$. For a more precise explanation, see Priestley (1983),
Sections 3.6.3 and 4.11 and Brockwell and Davis (1998), Section 4.7.
A very elegant explanation of the different proofs of the spectral representation theorem
is given in Priestley (1983), Section 4.11. We now give a rough outline of the proof using
the functional theory approach.
PROOF of the Spectral Representation Theorem. To prove the result we will
define two Hilbert spaces $H_1$ and $H_2$, where $H_1$ contains deterministic functions and $H_2$
contains random variables. We will define what is known as an isomorphism (a one-to-one
mapping which preserves the norm and is linear) between these two spaces.
Let $H_1$ be the set of all functions $g$ with $\int_0^{2\pi}|g(\omega)|^2\,dF(\omega)<\infty$ (where $F$ is the spectral
distribution of the second order stationary time series $\{X_t\}$), and define the
inner product on $H_1$ to be
\[
\langle f,g\rangle = \int_0^{2\pi}f(x)\overline{g(x)}\,dF(x). \tag{5.14}
\]
We first note that the functions $\{\exp(ik\omega)\}$ belong to $H_1$; moreover they span $H_1$. Hence
if $g\in H_1$, then there exist coefficients $\{a_j\}$ such that $g(\omega)=\sum_ja_j\exp(ij\omega)$. Let $H_2$ be
the space spanned by the second order stationary time series $\{X_t\}$, hence $H_2=\textrm{sp}(\{X_t\})$ (it is
necessary to take the closure of this space, but we will not do so here), and the inner product
is the covariance $\textrm{cov}(Y,X)$.
Now let us define the linear mapping $T:H_1\rightarrow H_2$,
\[
T\Big(\sum_{j=1}^{n}a_j\exp(ij\omega)\Big) = \sum_{j=1}^{n}a_jX_j, \tag{5.15}
\]
for any $n$ (it is necessary to show that this can be extended to infinite sums, but we will not do so
here). We need to show that $T$ defines an isomorphism. We first observe that this mapping
preserves the inner product. That is, suppose $f,g\in H_1$; then there exist $\{f_j\}$ and $\{g_j\}$ such
that $f(\omega)=\sum_jf_j\exp(ij\omega)$ and $g(\omega)=\sum_jg_j\exp(ij\omega)$. Hence by the definition of $T$ in (5.15)
we have
\[
\langle Tf,Tg\rangle = \textrm{cov}\Big(\sum_jf_jX_j,\sum_jg_jX_j\Big) = \sum_{j_1,j_2}f_{j_1}\bar{g}_{j_2}\textrm{cov}(X_{j_1},X_{j_2})
= \int_0^{2\pi}\Big(\sum_{j_1,j_2}f_{j_1}\bar{g}_{j_2}\exp(i(j_1-j_2)\omega)\Big)\,dF(\omega)
= \int_0^{2\pi}f(\omega)\overline{g(\omega)}\,dF(\omega) = \langle f,g\rangle, \tag{5.16}
\]
where the third equality is due to Bochner's theorem (see Theorem 5.3.2). Hence $\langle Tf,Tg\rangle=
\langle f,g\rangle$, so the inner product is preserved. To show that it is a one-to-one mapping see Brockwell
and Davis (1998), Section 4.7. Altogether this means that $T$ defines an isomorphism
between $H_1$ and $H_2$; therefore every function in $H_1$ has a corresponding random
variable in $H_2$ which displays similar properties.
Since for all $\omega\in[0,2\pi]$ the indicator function $I_{[0,\omega]}(x)\in H_1$, we can define the random
function $\{Z(\omega);0\leq\omega\leq 2\pi\}$ by $T(I_{[0,\omega]})=Z(\omega)$. Now, since the mapping $T$ is linear, we
observe that $T(I_{[\omega_1,\omega_2]})=Z(\omega_2)-Z(\omega_1)$ (for $\omega_1\leq\omega_2$). Moreover, since $T$ preserves the inner
product, we have for any non-intersecting intervals $[\omega_1,\omega_2]$ and $[\omega_3,\omega_4]$ that
\[
\textrm{E}\big[(Z(\omega_2)-Z(\omega_1))\overline{(Z(\omega_4)-Z(\omega_3))}\big]
= \langle T(I_{[\omega_1,\omega_2]}),T(I_{[\omega_3,\omega_4]})\rangle = \langle I_{[\omega_1,\omega_2]},I_{[\omega_3,\omega_4]}\rangle
= \int I_{[\omega_1,\omega_2]}(x)I_{[\omega_3,\omega_4]}(x)\,dF(x) = 0.
\]
Therefore by construction $\{Z(\omega);0\leq\omega\leq 2\pi\}$ is an orthogonal increment process, with
\[
\textrm{E}\big|Z(\omega_2)-Z(\omega_1)\big|^2 = \langle T(I_{[\omega_1,\omega_2]}),T(I_{[\omega_1,\omega_2]})\rangle = \langle I_{[\omega_1,\omega_2]},I_{[\omega_1,\omega_2]}\rangle
= \int_{\omega_1}^{\omega_2}dF(\omega) = F(\omega_2)-F(\omega_1).
\]
Having defined the two isomorphic spaces, the random function $\{Z(\omega);0\leq\omega\leq 2\pi\}$ and the
indicator functions $I_{[0,\omega]}(x)$, which have orthogonal increments, we can now prove the
result. We note that for any function $g\in L_2$ we can write
\[
g(\omega) = \int_0^{2\pi}g(s)\,dI_{[\omega,2\pi]}(s)
\]
(thus $dI_{[\omega,2\pi]}(s)=\delta_\omega(s)\,ds$, where $\delta_\omega(s)$ is the Dirac delta function). We now consider the
special case $g_t(\omega)=\exp(it\omega)$ and apply the isomorphism $T$ to it (recalling from (5.15) that
$T(\exp(it\cdot))=X_t$):
\[
T(\exp(it\cdot)) = T\Big(\int_0^{2\pi}\exp(its)\,dI_{[\cdot,2\pi]}(s)\Big) = \int_0^{2\pi}\exp(its)\,dT\big(I_{[\cdot,2\pi]}(s)\big),
\]
where the mapping goes inside the integral due to the linearity of the isomorphism. Now
we observe that $I_{[\omega,2\pi]}(s)=I_{[0,s]}(\omega)$, thus by the definition of $\{Z(\omega);0\leq\omega\leq 2\pi\}$ we have
$T(I_{[0,s]}(\cdot))=Z(s)$. Substituting this into the above gives
\[
X_t = \int_0^{2\pi}\exp(its)\,dZ(s),
\]
which gives the required result. $\Box$
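The discrete sums of Remark 5.4.1 also give a direct way to generate a stationary sequence from its spectral density on a computer: partition $[0,2\pi]$, draw independent (orthogonal) increments with $\textrm{E}|dZ(\omega_k)|^2=f(\omega_k)\,\Delta\omega$, and sum. The sketch below does this for an AR(1) spectral density (the model, parameter values and grid size are arbitrary illustrative choices); the resulting complex-valued sequence has autocovariances $\textrm{E}(X_{t+k}\overline{X_t})\approx c(k)$:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, sigma2 = 0.6, 1.0
m = 2048                                          # number of grid cells on [0, 2*pi]
omega = (np.arange(m) + 0.5) * 2 * np.pi / m
f = sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(1j * omega))**2)   # AR(1) spectral density

# orthogonal increments: independent complex Gaussians with E|dZ(omega_k)|^2 = f(omega_k)*2*pi/m
sd = np.sqrt(f * (2 * np.pi / m) / 2)
dZ = rng.normal(scale=sd) + 1j * rng.normal(scale=sd)

t = np.arange(1, 1001)
X = np.exp(1j * np.outer(t, omega)) @ dZ          # X_t = sum_k exp(i t omega_k) dZ(omega_k)

for k in range(3):                                # compare with c(k) = sigma2 * phi^k / (1 - phi^2)
    chat = np.mean(X[k:] * np.conj(X[:len(X) - k])).real
    print(k, round(chat, 2), round(sigma2 * phi**k / (1 - phi**2), 2))   # roughly equal
```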
It is worth taking a step back from the proof to see where the assumption of stationarity
crept in. By Bochner's theorem we have that
\[
c(t-\tau) = \int\exp(i(t-\tau)\omega)\,dF(\omega);
\]
thus we use $F$ to define the space $H_1$, the mapping $T$ (through $\{\exp(ik\omega)\}_k$), the inner
product and thus the isomorphism. But the exponential functions $\{\exp(ik\omega)\}$ and $F$ do not
necessarily play a unique role. It was the construction of the orthogonal random functions
$\{Z(\omega)\}$ that was instrumental. The main idea of the proof was that there are functions
$\{\phi_k(\omega)\}$ and a distribution $H$ such that all the covariances of the stochastic process $\{X_t\}$
can be written as
\[
\textrm{E}(X_t\bar{X}_\tau) = c(t,\tau) = \int_0^{2\pi}\phi_t(\omega)\overline{\phi_\tau(\omega)}\,dH(\omega),
\]
where $H$ is a measure. As long as the above representation exists, we can define
two spaces $H_1$ and $H_2$, where $\{\phi_k\}$ is the basis of the functional space $H_1$, which
contains all functions $f$ such that $\int|f(\omega)|^2\,dH(\omega)<\infty$, and $H_2$ is the random space defined by
$\textrm{sp}(\ldots,X_{-1},X_0,X_1,\ldots)$. From here we can define an isomorphism $T:H_1\rightarrow H_2$, where for
all functions $f(\omega)=\sum_kf_k\phi_k(\omega)\in H_1$,
\[
T(f) = \sum_kf_kX_k \in H_2.
\]
An important example is $T(\phi_k)=X_k$. Now, by using the same arguments as in the
proof above, we have
\[
X_t = \int\phi_t(\omega)\,dZ(\omega),
\]
where $\{Z(\omega)\}$ are orthogonal random functions and $\textrm{E}|Z(\omega)|^2=H(\omega)$. We state this result
in the theorem below (see Priestley (1983), Section 4.11).
Theorem 5.4.2 (General orthogonal expansions) Let $\{X_t\}$ be a time series (not necessarily
second order stationary) with covariances $\{\textrm{E}(X_t\bar{X}_s)=c(t,s)\}$. If there exists a
sequence of functions $\{\phi_k(\cdot)\}$ which satisfy for all $k$
\[
\int_0^{2\pi}|\phi_k(\omega)|^2\,dH(\omega) < \infty
\]
and the covariance admits the representation
\[
c(t,s) = \int_0^{2\pi}\phi_t(\omega)\overline{\phi_s(\omega)}\,dH(\omega), \tag{5.17}
\]
where $H$ is a distribution, then for all $t$ we have the representation
\[
X_t = \int\phi_t(\omega)\,dZ(\omega), \tag{5.18}
\]
where $\{Z(\omega)\}$ are orthogonal random functions and $\textrm{E}|Z(\omega)|^2=H(\omega)$. Conversely, if
$X_t$ has the representation (5.18), then $c(s,t)$ admits the representation (5.17).
Remark 5.4.2 We mention that the above representation applies to both stationary and
nonstationary time series. What makes the exponential functions $\{\exp(ik\omega)\}$ special is that if a
process is stationary then the representation of $c(k):=\textrm{cov}(X_t,X_{t+k})$ in terms of exponentials
is guaranteed:
\[
c(k) = \int_0^{2\pi}\exp(-ik\omega)\,dF(\omega). \tag{5.19}
\]
Therefore there always exists an orthogonal random function $\{Z(\omega)\}$ such that
\[
X_t = \int\exp(-it\omega)\,dZ(\omega).
\]
Indeed, whenever the exponential basis is used in the definition of either the covariance or
the process $\{X_t\}$, the resulting process will always be second order stationary.
We mention that it is not always guaranteed that for an arbitrary basis $\{\phi_t\}$ we can represent the
covariance as in (5.17). However, (5.18) is a very useful starting point for characterising
nonstationary processes.
5.5 The spectral densities of MA, AR and ARMA models
We now obtain the spectral density function of an MA($\infty$) process. Using this we can easily
obtain the spectral density of ARMA processes. Let us suppose that $\{X_t\}$ satisfies the
representation
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}, \tag{5.20}
\]
where $\{\varepsilon_t\}$ are iid random variables with mean zero and variance $\sigma^2$, and $\sum_{j=-\infty}^{\infty}|\psi_j|<\infty$.
We recall that the covariance of the above is
\[
c(k) = \textrm{E}(X_tX_{t+k}) = \sigma^2\sum_{j=-\infty}^{\infty}\psi_j\psi_{j+k}. \tag{5.21}
\]
Since $\sum_{j=-\infty}^{\infty}|\psi_j|<\infty$, it can be seen that
\[
\sum_k|c(k)| \leq \sigma^2\sum_k\sum_{j=-\infty}^{\infty}|\psi_j|\cdot|\psi_{j+k}| < \infty.
\]
Hence, by Theorem 5.3.1, the spectral density function of $\{X_t\}$ is well defined. There
are several ways to derive the spectral density of $\{X_t\}$: we can either use (5.21) and
$f(\omega)=\frac{1}{2\pi}\sum_kc(k)\exp(ik\omega)$, or obtain the spectral representation of $\{X_t\}$ and derive $f(\omega)$ from it.
We prove the result using the latter method.
Since $\{\varepsilon_t\}$ are iid random variables with $\textrm{E}(\varepsilon_t)=0$ and $\textrm{E}(\varepsilon_t^2)=\sigma^2$, their spectral density
is the constant $f_\varepsilon(\omega)=\frac{\sigma^2}{2\pi}$, and by Theorem 5.4.1 there exists an orthogonal increment
process $\{Z(\omega)\}$ such that
\[
\varepsilon_t = \int_0^{2\pi}\exp(it\omega)\,dZ(\omega),
\]
with $\textrm{E}(dZ(\omega_1)\overline{dZ(\omega_2)})=0$ for $\omega_1\neq\omega_2$ and $\textrm{E}(|dZ(\omega)|^2)=f_\varepsilon(\omega)\,d\omega=\frac{\sigma^2}{2\pi}\,d\omega$; that is, the
variance of the increments does not depend on frequency.
Using the above we can obtain the spectral representation of $\{X_t\}$:
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j} = \int_0^{2\pi}\Big(\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)\Big)\exp(it\omega)\,dZ(\omega).
\]
Hence
\[
X_t = \int_0^{2\pi}A(\omega)\exp(it\omega)\,dZ(\omega), \tag{5.22}
\]
where $A(\omega)=\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)$, noting that this is the spectral representation of $X_t$.
Definition 5.5.1 (The Cramér representation) The representation in (5.22) of a stationary
process,
\[
X_t = \int_0^{2\pi}A(\omega)\exp(it\omega)\,dZ(\omega),
\]
where $\{Z(\omega):0\leq\omega\leq 2\pi\}$ is an orthogonal increment process, is usually called the Cramér
representation of a stationary process.
Multiplying (5.22) by $\overline{X_{t+k}}$ and taking expectations gives
\[
\textrm{E}(X_t\overline{X_{t+k}}) = c(k) = \int_0^{2\pi}\!\!\int_0^{2\pi}A(\omega_1)\overline{A(\omega_2)}\exp(it\omega_1-i(t+k)\omega_2)\,\textrm{E}\big(dZ(\omega_1)\overline{dZ(\omega_2)}\big).
\]
Due to the orthogonality of $\{Z(\omega)\}$ we have $\textrm{E}(dZ(\omega_1)\overline{dZ(\omega_2)})=0$ unless $\omega_1=\omega_2$; altogether
this gives
\[
\textrm{E}(X_t\overline{X_{t+k}}) = c(k) = \int_0^{2\pi}|A(\omega)|^2\exp(-ik\omega)\,\textrm{E}(|dZ(\omega)|^2)
= \int_0^{2\pi}\frac{\sigma^2}{2\pi}|A(\omega)|^2\exp(-ik\omega)\,d\omega.
\]
Comparing the above with (5.9), we see that the spectral density is $f(\omega)=\frac{\sigma^2}{2\pi}|A(\omega)|^2$. Therefore
the spectral density function corresponding to the MA($\infty$) process defined in (5.20) is
\[
f(\omega) = \frac{\sigma^2}{2\pi}\Big|\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)\Big|^2.
\]
Example 5.5.1 Let us suppose that $\{X_t\}$ is a stationary ARMA($p,q$) time series (not
necessarily invertible or causal), where
\[
X_t - \sum_{j=1}^{p}\phi_jX_{t-j} = \varepsilon_t + \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}
\]
and $\{\varepsilon_t\}$ are iid random variables with $\textrm{E}(\varepsilon_t)=0$ and $\textrm{E}(\varepsilon_t^2)=\sigma^2$. Then the spectral density of
$\{X_t\}$ is
\[
f(\omega) = \frac{\sigma^2}{2\pi}\,\frac{|1+\sum_{j=1}^{q}\theta_j\exp(ij\omega)|^2}{|1-\sum_{j=1}^{p}\phi_j\exp(ij\omega)|^2}.
\]
We note that because the ARMA spectral density is a ratio of trigonometric polynomials, it is known as a
rational spectral density.
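The rational form is straightforward to evaluate numerically, and (by Lemma 5.2.1) the averaged periodogram $|J_n(\omega_k)|^2/(2\pi)$ of simulated data should be close to it. A rough sketch for an ARMA(1,1), with arbitrary parameter values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
phi, theta, sigma = 0.5, 0.4, 1.0
n, nrep = 256, 500

omega = 2 * np.pi * np.arange(1, n // 2) / n
f = (sigma**2 / (2 * np.pi)) * np.abs(1 + theta * np.exp(1j * omega))**2 \
    / np.abs(1 - phi * np.exp(1j * omega))**2

# average the periodogram |J_n(omega_k)|^2 / (2*pi) over independent replications
I = np.zeros_like(omega)
for _ in range(nrep):
    eps = rng.normal(scale=sigma, size=n + 100)
    x = np.zeros(n + 100)
    for t in range(1, n + 100):
        x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]   # ARMA(1,1) with a burn-in
    J = np.fft.fft(x[100:]) / np.sqrt(n)
    I += np.abs(J[1 : n // 2])**2 / (2 * np.pi)
I /= nrep

print(np.round(f[:5], 3))
print(np.round(I[:5], 3))        # roughly equal to f at the first few Fourier frequencies
```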
Remark 5.5.1 The roots of the characteristic polynomial of an AR process influence the location
of the peaks in its spectral density function. To see why, consider
the AR(2) model
\[
X_t = \phi_1X_{t-1} + \phi_2X_{t-2} + \varepsilon_t,
\]
where $\{\varepsilon_t\}$ are iid random variables with zero mean and $\textrm{E}(\varepsilon_t^2)=\sigma^2$. Suppose the roots of the
characteristic polynomial $\phi(B)=1-\phi_1B-\phi_2B^2$ lie outside the unit circle and are complex
conjugates, so that $\phi(B)=(1-re^{i\theta}B)(1-re^{-i\theta}B)$ with $0<r<1$. Then the spectral density function is
\[
f(\omega) = \frac{\sigma^2}{2\pi}\,\frac{1}{|1-r\exp(i(\theta-\omega))|^2\,|1-r\exp(i(-\theta-\omega))|^2}
= \frac{\sigma^2}{2\pi}\,\frac{1}{[1+r^2-2r\cos(\theta-\omega)][1+r^2-2r\cos(\theta+\omega)]}.
\]
Since the factor $1+r^2-2r\cos(\theta-\omega)$ is smallest when $\omega=\theta$, $f(\omega)$ has a peak close to $\omega=\theta$ (if
$r$ were negative the peak would instead lie near $\omega=\theta-\pi$). Thus the peaks in $f(\omega)$ correspond to
the pseudo-periodicities of the time series and of its covariance structure (as one would expect).
How pronounced these peaks are depends on how close $r$ is to one: the closer $r$ is to one, the
larger the peak. We can generalise the above argument to higher order autoregressive models,
in which case there may be multiple peaks. In fact, this suggests that the larger the number of
peaks, the higher the order of the AR model that should be fitted.
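A quick numerical illustration: choose $r$ and $\theta$ (arbitrary values below), set $\phi_1=2r\cos\theta$ and $\phi_2=-r^2$ so that the factorisation above holds, and locate the maximum of $f(\omega)$:

```python
import numpy as np

r, theta, sigma2 = 0.95, 2 * np.pi / 5, 1.0      # inverse roots r*exp(+/- i*theta)
phi1, phi2 = 2 * r * np.cos(theta), -r**2        # 1 - phi1*B - phi2*B^2 = (1 - r e^{i theta} B)(1 - r e^{-i theta} B)

omega = np.linspace(0.0, np.pi, 10000)
f = sigma2 / (2 * np.pi) / np.abs(1 - phi1 * np.exp(1j * omega) - phi2 * np.exp(2j * omega))**2

print(round(omega[np.argmax(f)], 3), round(theta, 3))   # the peak sits close to theta, and sharpens as r -> 1
```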
5.5.1 Approximation of the spectral density by AR and MA spectral densities
In this section we show that the spectral density
\[
f(\omega) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c(r)\exp(ir\omega)
\]
can be approximated to any accuracy by the spectral density of an AR($p$) or MA($q$) process.
One might try to do this by truncating the infinite number of covariances to a finite number; however,
this does not necessarily lead to a non-negative spectral density. Instead we consider a slight variant
and define the Cesàro sum
\[
f_m(\omega) = \frac{1}{2\pi}\sum_{r=-m}^{m}\Big(1-\frac{|r|}{m}\Big)c(r)\exp(ir\omega)
= \frac{1}{2\pi}\int_0^{2\pi}f(\lambda)F_m(\omega-\lambda)\,d\lambda, \tag{5.23}
\]
where $F_m(\lambda)=\frac{1}{m}\sum_{r=0}^{m-1}D_r(\lambda)$ and $D_r(\lambda)=\sum_{j=-r}^{r}\exp(ij\lambda)$ (the Fejér and Dirichlet
kernels respectively). It can be shown that $F_m$ is a positive kernel, therefore since $f$ is
nonnegative, $f_m$ is nonnegative and $\{(1-\frac{|r|}{m})c(r)\}_r$ is a non-negative definite sequence. Moreover, it
can be shown that
\[
\sup_\omega|f_m(\omega)-f(\omega)| \rightarrow 0 \quad\textrm{as } m\rightarrow\infty; \tag{5.24}
\]
thus for a large enough $m$, $f_m(\omega)$ will be within any prescribed $\delta$ of the spectral density $f$. Using this we
can prove the results below; first, a small numerical illustration of (5.23) and (5.24).
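The sketch below (with an AR(1)-type autocovariance $c(r)=\rho^{|r|}$, an arbitrary choice) compares a sharp truncation of $\sum_rc(r)e^{ir\omega}$ with the Cesàro sum: for small $m$ the sharp truncation can dip below zero, the Cesàro sum never does, and its sup-error decreases with $m$:

```python
import numpy as np

rho = 0.9
c = lambda k: rho ** np.abs(k)                    # c(k) = rho^|k|, a valid autocovariance
omega = np.linspace(0, 2 * np.pi, 2000)
f = (1 - rho**2) / (2 * np.pi * np.abs(1 - rho * np.exp(1j * omega))**2)   # its spectral density

def approx(m, cesaro):
    """(1/2pi) sum_{|r|<=m} w_r c(r) e^{ir omega}, with flat or Cesaro (Fejer) weights."""
    k = np.arange(-m, m + 1)
    w = 1 - np.abs(k) / m if cesaro else np.ones(2 * m + 1)
    return (w * c(k)) @ np.cos(np.outer(k, omega)) / (2 * np.pi)

for m in (11, 21, 81):
    plain, fejer = approx(m, False), approx(m, True)
    print(m, round(plain.min(), 4), round(fejer.min(), 4), round(np.max(np.abs(fejer - f)), 4))
    # the plain truncation can be negative for small m; the Cesaro sum is always >= 0
```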
Lemma 5.5.1 Suppose $\sum_r|c(r)|<\infty$ and $f$ is the spectral density of the covariances.
Then for every $\delta>0$ there exists an $m$ such that $|f(\omega)-f_m(\omega)|<\delta$ and $f_m(\omega)=\sigma^2|\psi(\omega)|^2$,
where $\psi(\omega)=\sum_{j=0}^{m}\psi_j\exp(ij\omega)$. Thus we can approximate the spectral density $f$ by the
spectral density of an MA process.
PROOF. We show that there exists an MA($m$) process which has spectral density $f_m(\omega)$, where
$f_m$ is defined in (5.23); by (5.24) this gives the result.
To prove the result we note that if $a(r)=a(-r)$, then $\sum_{r=-m}^{m}a(r)\exp(ir\omega)$ can be
factorised as
\[
\sum_{r=-m}^{m}a(r)\exp(ir\omega) = \exp(-im\omega)\sum_{r=0}^{2m}a(r-m)\exp(ir\omega)
= C\exp(-im\omega)\prod_{j=1}^{m}(1-\lambda_j\exp(i\omega))(1-\lambda_j^{-1}\exp(i\omega))
\]
\[
= C(-1)^m\prod_{j=1}^{m}\lambda_j\prod_{j=1}^{m}(1-\lambda_j^{-1}\exp(-i\omega))(1-\lambda_j^{-1}\exp(i\omega)),
\]
for some finite constant $C$, since the roots of such a symmetric polynomial come in pairs
$\lambda_j,\lambda_j^{-1}$. Using the above (with $a(r)=(1-\frac{|r|}{m})c(r)$) we have
\[
f_m(\omega) = K\Big(\prod_{j=1}^{m}(1-\lambda_j^{-1}\exp(i\omega))\Big)\Big(\prod_{j=1}^{m}(1-\lambda_j^{-1}\exp(-i\omega))\Big) := A(\omega)A(-\omega);
\]
thus $f_m(\omega)$ is the spectral density of an MA($m$) process. $\Box$
Lemma 5.5.2 Suppose $\sum_r|c(r)|<\infty$ and $f$ is the spectral density of the covariances.
Then for every $\delta>0$ there exists an $m$ such that $|f(\omega)-g_m(\omega)|<\delta$ and $g_m(\omega)=\sigma^2|\phi(\omega)|^{-2}$,
where $\phi(\omega)=\sum_{j=0}^{m}\phi_j\exp(ij\omega)$. Thus we can approximate the spectral density $f$ by the
spectral density of an AR process.
PROOF. We observe that
\[
\big|f(\omega)-g_m(\omega)\big| = f(\omega)\big|g_m(\omega)^{-1}-f(\omega)^{-1}\big|g_m(\omega).
\]
We can now apply the same arguments used to prove Lemma 5.5.1 to obtain the result. $\Box$
5.6 Higher order spectra
We recall that the covariance is a measure of linear dependence between two random variables.
Higher order cumulants are a measure of higher order dependence. For example, the third
order cumulant of the zero mean random variables $X_1,X_2,X_3$ is
\[
\textrm{cum}(X_1,X_2,X_3) = \textrm{E}(X_1X_2X_3),
\]
and the fourth order cumulant of the zero mean random variables $X_1,X_2,X_3,X_4$ is
\[
\textrm{cum}(X_1,X_2,X_3,X_4) = \textrm{E}(X_1X_2X_3X_4) - \textrm{E}(X_1X_2)\textrm{E}(X_3X_4) - \textrm{E}(X_1X_3)\textrm{E}(X_2X_4) - \textrm{E}(X_1X_4)\textrm{E}(X_2X_3).
\]
From the definition we see that if $X_1,X_2,X_3,X_4$ are independent then $\textrm{cum}(X_1,X_2,X_3)=0$
and $\textrm{cum}(X_1,X_2,X_3,X_4)=0$.
Moreover, if $X_1,X_2,X_3,X_4$ are jointly Gaussian random variables then $\textrm{cum}(X_1,X_2,X_3)=0$
and $\textrm{cum}(X_1,X_2,X_3,X_4)=0$; indeed all cumulants of order higher than two are zero. This
comes from the fact that cumulants are the coefficients in the power series expansion of the
logarithm of the moment generating function.
Since the spectral density is the Fourier transform of the covariance, it is natural to ask
whether one can define higher order spectra as the Fourier transforms of the higher order
cumulants. This turns out to be the case, and the higher order spectra have several interesting
properties.
Let us suppose that $\{X_t\}$ is a stationary time series (notice that we are now assuming strict
stationarity and not just second order stationarity). Let $\kappa_3(t,s)=\textrm{cum}(X_0,X_t,X_s)$,
$\kappa_4(t,s,r)=\textrm{cum}(X_0,X_t,X_s,X_r)$ and $\kappa_q(t_1,\ldots,t_{q-1})=\textrm{cum}(X_0,X_{t_1},\ldots,X_{t_{q-1}})$ (noting that,
like the covariance, the higher order cumulants are invariant to shifts in time). The third, fourth and
$q$th order spectra are defined as
\[
f_3(\omega_1,\omega_2) = \sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}\kappa_3(s,t)\exp(is\omega_1+it\omega_2),
\]
\[
f_4(\omega_1,\omega_2,\omega_3) = \sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}\sum_{r=-\infty}^{\infty}\kappa_4(s,t,r)\exp(is\omega_1+it\omega_2+ir\omega_3),
\]
\[
f_q(\omega_1,\omega_2,\ldots,\omega_{q-1}) = \sum_{t_1,\ldots,t_{q-1}=-\infty}^{\infty}\kappa_q(t_1,t_2,\ldots,t_{q-1})\exp(it_1\omega_1+it_2\omega_2+\ldots+it_{q-1}\omega_{q-1}).
\]
Example 5.6.1 (Third and fourth order spectra of a linear process) Let us suppose
that $\{X_t\}$ satisfies
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j},
\]
where $\sum_{j=-\infty}^{\infty}|\psi_j|<\infty$, $\textrm{E}(\varepsilon_t)=0$ and $\textrm{E}(\varepsilon_t^4)<\infty$. Let $A(\omega)=\sum_{j=-\infty}^{\infty}\psi_j\exp(ij\omega)$. Then
it is straightforward to show that (with the un-normalised convention used in this section, i.e.
without the $\frac{1}{2\pi}$ factors of Section 5.5)
\[
f(\omega) = \sigma^2|A(\omega)|^2,
\]
\[
f_3(\omega_1,\omega_2) = \kappa_3A(\omega_1)A(\omega_2)A(-\omega_1-\omega_2),
\]
\[
f_4(\omega_1,\omega_2,\omega_3) = \kappa_4A(\omega_1)A(\omega_2)A(\omega_3)A(-\omega_1-\omega_2-\omega_3),
\]
where $\kappa_3=\textrm{cum}(\varepsilon_t,\varepsilon_t,\varepsilon_t)$ and $\kappa_4=\textrm{cum}(\varepsilon_t,\varepsilon_t,\varepsilon_t,\varepsilon_t)$.
We see from the example that, unlike the spectral density, the higher order spectra are
not necessarily positive or even real.
A review of higher order spectra can be found in Brillinger (2001). Higher order spectra
have several applications, especially for nonlinear processes; see Subba Rao and Gabr (1984).
We will consider one such application in a later chapter.
Using the definition of the higher order spectrum we can now generalise Lemma 5.2.1 to
higher order cumulants (see Brillinger (2001), Theorem 4.3.4).
Proposition 5.6.1 Suppose $\{X_t\}$ is a strictly stationary time series, where for all $1\leq i\leq q-1$ we
have $\sum_{t_1,\ldots,t_{q-1}=-\infty}^{\infty}|(1+|t_i|)\kappa_q(t_1,\ldots,t_{q-1})|<\infty$. Then we have
\[
\textrm{cum}\big(J_n(\omega_{k_1}),\ldots,J_n(\omega_{k_q})\big)
= \frac{1}{n^{q/2}}f_q(\omega_{k_2},\ldots,\omega_{k_q})\sum_{j=1}^{n}\exp\big(ij(\omega_{k_1}+\ldots+\omega_{k_q})\big) + O\Big(\frac{1}{n^{q/2}}\Big)
\]
\[
= \begin{cases}\dfrac{1}{n^{(q-2)/2}}f_q(\omega_{k_2},\ldots,\omega_{k_q}) + O\big(\tfrac{1}{n^{q/2}}\big) & \textrm{if }\sum_{i=1}^{q}k_i = nm \textrm{ for some integer } m,\\[6pt]
O\big(\tfrac{1}{n^{q/2}}\big) & \textrm{otherwise,}\end{cases}
\]
where $\omega_{k_i}=\frac{2\pi k_i}{n}$.
5.7 Extensions
5.7.1 The spectral density of a time series with randomly missing observations
Let us suppose that $\{X_t\}$ is a second order stationary time series with mean zero. However, $\{X_t\}$
is not observed at every time point; there are missing observations, and we only observe $X_t$ at the
times $\{\tau_k\}_k$, that is we observe $\{X_{\tau_k}\}$. The question is how to deal with this type of
data. One method was suggested in ?. He suggested that the missingness mechanism be modelled
stochastically. That is, define the random process $\{Y_t\}$ which only takes the
values $\{0,1\}$, where $Y_t=1$ if $X_t$ is observed and $Y_t=0$ if $X_t$ is not observed. Thus we
observe $\{X_tY_t\}_t$ and also $\{Y_t\}$ (which records the time points at which the process is observed).
He also suggests modelling $\{Y_t\}$ as a stationary process which is independent of $\{X_t\}$ (thus
the missingness mechanism and the time series are independent).
The spectral densities of $\{X_tY_t\}$, $\{X_t\}$ and $\{Y_t\}$ have an interesting relationship, which
can be exploited to estimate the spectral density of $\{X_t\}$ given estimators of the spectral
densities of $\{X_tY_t\}$ and $\{Y_t\}$ (both of which, we recall, are observed). We first note that since
$\{X_t\}$ and $\{Y_t\}$ are stationary, $\{X_tY_t\}$ is stationary. Furthermore, by the independence of
$\{X_t\}$ and $\{Y_t\}$ and since $\textrm{E}(X_t)=0$,
\[
\textrm{cov}(X_tY_t,X_\tau Y_\tau) = \textrm{E}(X_tX_\tau)\textrm{E}(Y_tY_\tau) = c_X(t-\tau)\big[c_Y(t-\tau)+\mu_Y^2\big],
\]
where $\mu_Y=\textrm{E}(Y_t)$ (note that the term $\mu_Y^2c_X(t-\tau)$ does not vanish, since $\{Y_t\}$ is not a zero
mean process). Thus the spectral density of $\{X_tY_t\}$ is
\[
f_{XY}(\omega) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty}\textrm{cov}(X_0Y_0,X_rY_r)\exp(ir\omega)
= \frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c_X(r)\big[c_Y(r)+\mu_Y^2\big]\exp(ir\omega)
= \mu_Y^2f_X(\omega) + \int f_X(\lambda)f_Y(\omega-\lambda)\,d\lambda,
\]
where $f_X(\omega)=\frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c_X(r)\exp(ir\omega)$ and $f_Y(\omega)=\frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c_Y(r)\exp(ir\omega)$ are the
spectral densities of the underlying process and of the missingness process.
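The relation $\textrm{cov}(X_tY_t,X_{t+r}Y_{t+r})=c_X(r)[c_Y(r)+\mu_Y^2]$ is easy to check by simulation. A minimal sketch, assuming (purely for illustration) that $\{X_t\}$ is an AR(1) process and that $\{Y_t\}$ is an iid Bernoulli($p$) missingness indicator independent of $\{X_t\}$:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, p, n = 0.6, 0.7, 200_000               # AR(1) coefficient, observation probability, sample size

eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]          # the (zero mean) process of interest
y = rng.binomial(1, p, size=n).astype(float)  # Y_t = 1 if X_t is observed
w = x * y                                   # the amplitude-modulated series X_t * Y_t

cX = lambda r: phi**r / (1 - phi**2)        # c_X(r) for this AR(1)
cY = lambda r: p * (1 - p) if r == 0 else 0.0
for r in (0, 1, 2):
    emp = np.mean(w[r:] * w[:n - r])        # sample version of cov(X_t Y_t, X_{t+r} Y_{t+r})
    print(r, round(emp, 3), round(cX(r) * (cY(r) + p**2), 3))   # the two columns roughly agree
```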