Chapter 5
Spectral Representations
Prerequisites
• Knowledge of complex numbers.
• Some idea of what the covariance of a complex random variable is (we define it below).
• Some idea of what a Fourier transform is.
Objectives
• Know the definition of the spectral density.
• The spectral density is always non-negative, and this gives a way of checking that a sequence
is actually non-negative definite (i.e. is an autocovariance).
• The DFT of a second order stationary time series is almost uncorrelated.
• The spectral density of an ARMA time series, and how the roots of the characteristic
polynomial of an AR may influence the spectral density function.
• There is no need to understand the proofs of either Bochner's (generalised) theorem or
the spectral representation theorem, just know what these theorems say. However, you
should know the proof of Bochner's theorem in the simple case that $\sum_{r}|r\,c(r)|<\infty$.
5.1 Some Fourier background
The background given here is extremely sketchy (to say the least); for a more thorough
treatment of Fourier analysis in time series, the reader is referred, for example, to Priestley
(1983), Chapter 4 and Fuller (1995), Chapter 3. For a good (nonstatistical) overview of
the discrete Fourier transform and how it relates to the Fourier transform, the reader is
referred to Briggs and Henson (1997).
(i) Discrete Fourier transforms of finite sequences
It is straightforward to show (by using the property $\sum_{j=1}^{n}\exp(i2\pi jk/n)=0$ for $k\neq 0$)
that if
\[
d_k = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}x_j\exp(i2\pi jk/n),
\]
then $\{x_r\}$ can be recovered by inverting this transformation,
\[
x_r = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}d_k\exp(-i2\pi rk/n).
\]
(A small numerical check of this transform pair is given after item (iii) below.)
(ii) Fourier sums and integrals
Of course the above only has meaning when $\{x_k\}$ is a finite sequence. However, suppose
that $\{x_k\}$ is a sequence which belongs to $\ell_2$ (that is $\sum_k x_k^2<\infty$); then we can define
the function
\[
f(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{k=-\infty}^{\infty}x_k\exp(ik\omega),
\]
where $\int_0^{2\pi}f(\omega)^2\,d\omega = \sum_k x_k^2$, and we can recover $\{x_k\}$ from $f(\omega)$ through
\[
x_k = \frac{1}{\sqrt{2\pi}}\int_0^{2\pi}f(\omega)\exp(-ik\omega)\,d\omega.
\]
(iii) Convolutions. Let us suppose that $\sum_k|a_k|^2<\infty$ and $\sum_k|b_k|^2<\infty$, and define
the Fourier transforms of the sequences $\{a_k\}$ and $\{b_k\}$ as $A(\omega)=\frac{1}{\sqrt{2\pi}}\sum_k a_k\exp(ik\omega)$ and
$B(\omega)=\frac{1}{\sqrt{2\pi}}\sum_k b_k\exp(ik\omega)$ respectively. Then
\[
\sum_{j=-\infty}^{\infty}a_jb_{k-j} = \int_0^{2\pi}A(\omega)B(\omega)\exp(-ik\omega)\,d\omega
\]
and
\[
\sum_{j=-\infty}^{\infty}a_jb_j\exp(ij\omega) = \int_0^{2\pi}A(\lambda)B(\omega-\lambda)\,d\lambda. \tag{5.1}
\]
The proof of (5.1) follows from
\[
\sum_{j=-\infty}^{\infty}a_jb_j\exp(ij\omega)
= \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\int_0^{2\pi}\!\!\int_0^{2\pi}A(\lambda_1)B(\lambda_2)\exp(-ij(\lambda_1+\lambda_2))\exp(ij\omega)\,d\lambda_1d\lambda_2 \tag{5.2}
\]
\[
= \int_0^{2\pi}\!\!\int_0^{2\pi}A(\lambda_1)B(\lambda_2)\underbrace{\frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\exp(ij(\omega-\lambda_1-\lambda_2))}_{=\delta(\omega-\lambda_1-\lambda_2)}\,d\lambda_1d\lambda_2 \tag{5.3}
\]
\[
= \int_0^{2\pi}A(\lambda)B(\omega-\lambda)\,d\lambda. \tag{5.4}
\]
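As promised in item (i), here is a quick numerical check of the discrete transform pair $d_k = n^{-1/2}\sum_j x_j e^{i2\pi jk/n}$ and $x_r = n^{-1/2}\sum_k d_k e^{-i2\pi rk/n}$. This is only a minimal sketch (the $1/\sqrt{n}$ normalisation is applied by hand, since it differs from the default FFT normalisation):

```python
import numpy as np

n = 8
x = np.random.randn(n)
idx = np.arange(1, n + 1)                       # j and k both run over 1, ..., n

# d_k = n^{-1/2} sum_{j=1}^n x_j exp(i 2 pi j k / n)
d = np.array([np.sum(x * np.exp(1j * 2 * np.pi * idx * k / n)) / np.sqrt(n) for k in idx])

# x_r = n^{-1/2} sum_{k=1}^n d_k exp(-i 2 pi r k / n) recovers the original sequence
x_rec = np.array([np.sum(d * np.exp(-1j * 2 * np.pi * idx * r / n)) / np.sqrt(n) for r in idx])

print(np.allclose(x_rec.real, x))               # True: the inversion formula holds
```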
5.2 Motivation
We first give a taster of the spectral representation (before we dive into its rather
theoretical derivation) and explain why it may be useful. Let us suppose $\{X_t\}$ is a stationary time
series, where we observe $\{X_t\}_{t=1}^{n}$. We recall from Section 2.3.4 that $R_n=\textrm{var}(\underline{X}_n)$ is a Toeplitz
matrix. It is well known that $R_n^{-1/2}\underline{X}_n$ is an uncorrelated sequence, but in reality this serves
little purpose because we would have to know $R_n^{-1/2}$ (thus the correlation structure) in order
to uncorrelate $\underline{X}_n$. However, for a second order stationary time series something curious and
remarkable happens: the DFT almost uncorrelates the time series. The implication of this
is extremely useful in time series, and we shall be using this transform in estimation in a later
chapter. But first we will summarise why this result is true. As you may have guessed, it all
boils down to the fact that the covariance is a Toeplitz matrix which can be approximated
by (two) circulant matrices.
We start by defining the Fourier transform of $\{X_t\}_{t=1}^{n}$,
\[
J_n(\omega_j) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_t\exp\Big(i\frac{2\pi jt}{n}\Big),
\]
where $\omega_j=2\pi j/n$ are often called the fundamental, or Fourier, frequencies. Since the DFT is
a linear transformation we can go backwards and forwards between the DFT and the data
itself because
\[
X_t = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}J_n(\omega_k)\exp\Big(-i\frac{2\pi kt}{n}\Big).
\]
We recall that often in statistics we transform the data because the transformed data may be
easier to analyse. The same applies to $\{X_t\}$ and $\{J_n(\omega_k)\}_k$: because there is
a one-to-one correspondence between the two, it is sometimes easier to do inference on
the DFT than on the data itself; we explain why this may be so below. Note that $\{X_t\}$ are
the observations over time, whereas $J_n(\omega)$ detects periodicities. Note also that because
$J_n(\omega_k)=\overline{J_n(\omega_{n-k})}$, all the information about $\{X_t\}$ is contained in $\{J_n(\omega_k)\}_{k=1}^{n/2}$.
Let $\underline{X}_n' = (X_n,\ldots,X_1)$. Using the notation introduced in Section 2.3.4, our objective is
to show that $E_n\underline{X}_n$ is almost an uncorrelated sequence (recall that $(E_n)_{s,t}=\omega_n^{(s-1)(t-1)}=
\exp(\frac{2i\pi(s-1)(t-1)}{n})$ is the Fourier matrix; please note that notation is abused here, with
$\omega$ denoting both the frequency and the complex exponential). There are various ways
the uncorrelatedness property can be shown (we summarise it in a lemma below), but
here we demonstrate this property using the circulant matrix approximation. We start by
considering the variance of $E_n\underline{X}_n$,
\[
\textrm{var}(E_n\underline{X}_n) = E_nR_n\bar{E}_n,
\]
and our aim is to show that it is almost diagonal. We first recall that if $R_n$ were a circulant
matrix, then $E_n\underline{X}_n$ would be uncorrelated. This is not the case, but the upper right hand
side and the lower left hand side of $R_n$ can be approximated by circulant matrices; this is the
trick in showing the `near' uncorrelatedness. Studying $R_n$,
\[
R_n = \begin{pmatrix}
c(0) & c(1) & c(2) & \ldots & c(n-1)\\
c(1) & c(0) & c(1) & \ldots & c(n-2)\\
\vdots & \vdots & \ddots & & \vdots\\
c(n-1) & c(n-2) & \ldots & c(1) & c(0)
\end{pmatrix},
\]
we observe that it can be written as the sum of two circulant matrices, plus some error that
we will bound. That is, we define the two circulant matrices
\[
C_{1n} = \begin{pmatrix}
c(0) & c(1) & c(2) & \ldots & c(n-1)\\
c(n-1) & c(0) & c(1) & \ldots & c(n-2)\\
\vdots & \vdots & \ddots & & \vdots\\
c(1) & c(2) & \ldots & c(n-1) & c(0)
\end{pmatrix}
\]
and
\[
C_{2n} = \begin{pmatrix}
0 & c(n-1) & c(n-2) & \ldots & c(1)\\
c(1) & 0 & c(n-1) & \ldots & c(2)\\
c(2) & c(1) & 0 & \ldots & c(3)\\
\vdots & \vdots & \ddots & & \vdots\\
c(n-1) & c(n-2) & \ldots & c(1) & 0
\end{pmatrix}.
\]
We observe that the upper right hand sides of $C_{1n}$ and $R_n$ match and the lower left hand
sides of $C_{2n}$ and $R_n$ match. As the above are circulant, their eigenvector matrices are $E_n$ and
its inverse $\bar{E}_n = E_n^{-1}$. The eigenvalues of $C_{1n}$ and $C_{2n}$ are
\[
\textrm{diag}\Big(\sum_{j=0}^{n-1}c(j),\ \sum_{j=0}^{n-1}c(j)\omega_n^{j},\ \ldots,\ \sum_{j=0}^{n-1}c(j)\omega_n^{(n-1)j}\Big)
\]
and
\[
\textrm{diag}\Big(\sum_{j=1}^{n-1}c(n-j),\ \sum_{j=1}^{n-1}c(n-j)\omega_n^{j},\ \ldots,\ \sum_{j=1}^{n-1}c(n-j)\omega_n^{(n-1)j}\Big)
= \textrm{diag}\Big(\sum_{j=1}^{n-1}c(j),\ \sum_{j=1}^{n-1}c(j)\omega_n^{-j},\ \ldots,\ \sum_{j=1}^{n-1}c(j)\omega_n^{-(n-1)j}\Big),
\]
respectively. More succinctly, the $k$th eigenvalues of $C_{1n}$ and $C_{2n}$ are $\lambda_{k,1}=\sum_{j=0}^{n-1}c(j)\omega_n^{j(k-1)}$
and $\lambda_{k,2}=\sum_{j=1}^{n-1}c(j)\omega_n^{-j(k-1)}$. This may not look like much, but together they give us the so-called
spectral density function.
We now show that under the condition $\sum_r|rc(r)|<\infty$ we have
\[
E_nR_n\bar{E}_n - E_n\big(C_{1n}+C_{2n}\big)\bar{E}_n = O\Big(\frac{1}{n}\Big)I, \tag{5.5}
\]
where $I$ is the $n\times n$ matrix of ones (that is, every entry of the difference is $O(n^{-1})$). To bound
the above we consider the difference element by element. Since the upper right hand sides of
$C_{1n}$ and $R_n$ match and the lower left hand sides of $C_{2n}$ and $R_n$ match, the $(s,t)$th element of
the difference satisfies
\[
\big|e_sR_n\bar{e}_t' - e_sC_{1n}\bar{e}_t' - e_sC_{2n}\bar{e}_t'\big| \leq \frac{2}{n}\sum_{r=1}^{n-1}|rc(r)| = O\Big(\frac{1}{n}\Big),
\]
where $e_s$ denotes the $s$th row of $E_n$.
Thus we have shown (5.5). Therefore, since $E_n$ is the eigenvector matrix of $C_{1n}$ and $C_{2n}$,
altogether we have
\[
E_n\big(C_{1n}+C_{2n}\big)\bar{E}_n = \textrm{diag}\Big(f_n(0),\ f_n\big(\tfrac{2\pi}{n}\big),\ \ldots,\ f_n\big(\tfrac{2\pi(n-1)}{n}\big)\Big),
\]
where $f_n(\omega)=\sum_{r=-(n-1)}^{n-1}c(r)\exp(ir\omega)$. Altogether this gives
\[
\textrm{var}(E_n\underline{X}_n) = E_nR_n\bar{E}_n =
\begin{pmatrix}
f_n(0) & 0 & \ldots & 0\\
0 & f_n(\tfrac{2\pi}{n}) & \ldots & 0\\
\vdots & & \ddots & \vdots\\
0 & \ldots & 0 & f_n(\tfrac{2\pi(n-1)}{n})
\end{pmatrix}
+ O\Big(\frac{1}{n}\Big)
\begin{pmatrix}
1 & 1 & \ldots & 1\\
1 & 1 & \ldots & 1\\
\vdots & & \ddots & \vdots\\
1 & 1 & \ldots & 1
\end{pmatrix}.
\]
Some caution is needed with the above, since $f_n(\omega)$ is not necessarily positive; however, it
will be for a large enough $n$.
We summarise the above result in the following lemma, where we also give an alternative
proof.
Lemma 5.2.1 Suppose $\sum_r|rc(r)|<\infty$. Then we have
\[
\textrm{cov}\Big(J_n\big(\tfrac{2\pi k_1}{n}\big),J_n\big(\tfrac{2\pi k_2}{n}\big)\Big)
= \begin{cases} f_n\big(\tfrac{2\pi k_1}{n}\big)+O(\tfrac{1}{n}) & k_1=k_2\\ O(\tfrac{1}{n}) & k_1\neq k_2\end{cases}
= \begin{cases} 2\pi f\big(\tfrac{2\pi k_1}{n}\big)+O(\tfrac{1}{n}) & k_1=k_2\\ O(\tfrac{1}{n}) & k_1\neq k_2,\end{cases}
\]
where $f(\omega)=\frac{1}{2\pi}\sum_{j=-\infty}^{\infty}c(j)\exp(ij\omega)$ (the factor $2\pi$ arises because $f_n$ is defined without the
$\frac{1}{2\pi}$ normalisation).
PROOF. The proof immediately follows from the approximation by the two circulant matrices
above. However, a more hands-on proof is to just calculate $\textrm{cov}(J_n(\tfrac{2\pi k_1}{n}),J_n(\tfrac{2\pi k_2}{n}))$. We note
that for complex random variables $\textrm{cov}(A,B)=\textrm{E}(A\bar{B})-\textrm{E}(A)\textrm{E}(\bar{B})$, thus we have
\[
\textrm{cov}\Big(J_n\big(\tfrac{2\pi k_1}{n}\big),J_n\big(\tfrac{2\pi k_2}{n}\big)\Big)
= \frac{1}{n}\sum_{t,\tau=1}^{n}\textrm{cov}(X_t,X_\tau)\exp\Big(\frac{i(tk_1-\tau k_2)2\pi}{n}\Big)
\]
\[
= \frac{1}{n}\sum_{r=-(n-1)}^{n-1}c(r)\exp\Big(\frac{ir2\pi k_1}{n}\Big)\sum_{t=1}^{n-|r|}\exp\Big(\frac{2\pi it(k_1-k_2)}{n}\Big)
= \frac{1}{n}\sum_{r=-(n-1)}^{n-1}c(r)\exp\Big(\frac{ir2\pi k_1}{n}\Big)\sum_{t=1}^{n}\exp\Big(\frac{2\pi it(k_1-k_2)}{n}\Big)+O\Big(\frac{1}{n}\Big),
\]
where the error incurred in completing the inner sum is of order $\frac{1}{n}\sum_r|rc(r)|=O(\frac{1}{n})$.
Thus we obtain the first result. To prove the second part, we note that $f_n(\omega)\rightarrow 2\pi f(\omega)$ (see
the section below for precise details). $\Box$
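Lemma 5.2.1 is easy to check by simulation. The sketch below estimates $\textrm{cov}(J_n(\omega_{k_1}),J_n(\omega_{k_2}))$ by Monte Carlo for an AR(1) model (the model and its parameter values are arbitrary choices used purely for illustration) and compares the diagonal entries with $2\pi f(\omega_k)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, nrep, phi, sigma = 64, 5000, 0.7, 1.0

t = np.arange(1, n + 1)
ks = np.array([3, 4, 10])                                        # a few Fourier frequencies omega_k = 2*pi*k/n
E = np.exp(1j * 2 * np.pi * np.outer(ks, t) / n) / np.sqrt(n)    # rows give J_n(omega_k) = n^{-1/2} sum_t X_t e^{i t omega_k}

J = np.zeros((nrep, len(ks)), dtype=complex)
for rep in range(nrep):
    eps = rng.normal(scale=sigma, size=n + 200)
    x = np.zeros(n + 200)
    for s in range(1, n + 200):
        x[s] = phi * x[s - 1] + eps[s]                           # AR(1) with a burn-in of 200
    J[rep] = E @ x[200:]

C = (J.T @ J.conj()) / nrep                                      # C[a, b] ~ cov(J_n(omega_{k_a}), J_n(omega_{k_b}))
f = sigma**2 / (2 * np.pi * np.abs(1 - phi * np.exp(1j * 2 * np.pi * ks / n))**2)
print(np.round(np.abs(C), 2))                                    # nearly diagonal
print(np.round(2 * np.pi * f, 2))                                # close to the diagonal entries
```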
Now let us return to the representation of $X_t$ in terms of the DFTs,
\[
X_t = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}J_n(\omega_k)\exp(-it\omega_k). \tag{5.6}
\]
We see from Lemma 5.2.1 that we can write $X_t$ as a weighted sum of the almost uncorrelated
frequency components $J_n(\omega_k)$; this is a heuristic justification of what is often called the spectral
representation theorem. Moreover, taking the covariance of $X_t$ and again using Lemma 5.2.1, we
see that
\[
\textrm{cov}(X_t,X_\tau) \approx \frac{1}{n}\sum_{k=1}^{n}\textrm{var}(J_n(\omega_k))\exp(-i(t-\tau)\omega_k); \tag{5.7}
\]
this is essentially a heuristic justification of Bochner's theorem (or Herglotz's theorem). The
two theorems mentioned above will be described in detail in the following sections.
Of course the entire time series $\{X_t\}$ will have infinite length (and in general it will not
belong to $\ell_2$), so it is natural to ask whether the above results can be generalised to $\{X_t\}$.
The answer is yes, by replacing the sum in (5.7) by an integral to obtain
\[
c(t-\tau) = \int_0^{2\pi}\exp(i(t-\tau)\omega)f(\omega)\,d\omega = \int_0^{2\pi}\exp(i(t-\tau)\omega)\,dF(\omega).
\]
Comparing the above with (5.7), we observe that $\textrm{var}(J_n(\omega_k))$ is nonnegative, thus $f(\cdot)$ should be
nonnegative. Moreover, its integral $F(\omega)=\int_0^{\omega}f(\lambda)\,d\lambda$ must be nonnegative and
nondecreasing.
Let us return to (5.6); we can rewrite it as
\[
X_t = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}J_n(\omega_k)\exp(-it\omega_k) = \sum_{k=1}^{n}\exp(-it\omega_k)\big(S_n(\omega_k)-S_n(\omega_{k-1})\big),
\]
where $S_n(\omega_k)=\frac{1}{\sqrt{n}}\sum_{j=1}^{k}J_n(\omega_j)$. Note that the increments of $S_n(\omega_k)$ are (almost) orthogonal. We
show below that extending the above to the entire time series yields the representation
\[
X_t = \int \exp(it\omega)\,dZ(\omega),
\]
where $Z(\omega)$ is a right continuous orthogonal increment process (that is, $\textrm{E}\big[(Z(\omega_1)-Z(\omega_2))\overline{(Z(\omega_3)-Z(\omega_4))}\big]=0$ when the intervals $[\omega_1,\omega_2]$ and $[\omega_3,\omega_4]$ do not overlap) and $\textrm{E}(|Z(\omega)|^2)=F(\omega)$.
The above result holds for both linear and nonlinear time series; however, in the case that
$X_t$ has a linear representation
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j},
\]
then $X_t$ has the particular form
\[
X_t = \int A(\omega)\exp(it\omega)\,dZ(\omega), \tag{5.8}
\]
where $A(\omega)=\sum_{j=-\infty}^{\infty}\psi_j\exp(ij\omega)$ and $Z(\omega)$ is an orthogonal increment process whose increments
in addition satisfy $\textrm{E}(|dZ(\omega)|^2)\propto d\omega$, i.e. the variance of the increments does not vary over
frequency (this variation has been absorbed into $A(\omega)$, since $F'(\omega)\propto|A(\omega)|^2$). This result should
not come as a surprise, as we have used a close approximation of it in Section 2.3.4. In particular,
we recall that for an AR($p$) process
\[
X_t \approx \sum_{s=1}^{n}\varepsilon_s\frac{1}{n}\sum_{r=0}^{n-1}\lambda_r^{-1}\omega_n^{rs}\omega_n^{-rt},
\]
where $\lambda_r^{-1}=(1-\sum_{j=1}^{p}\phi_j\exp(ij\omega_r))^{-1}$. Rearranging the above gives
\[
X_t \approx \frac{1}{\sqrt{n}}\sum_{r=0}^{n-1}\lambda_r^{-1}\omega_n^{-rt}\,\frac{1}{\sqrt{n}}\sum_{s=1}^{n}\varepsilon_s\omega_n^{rs}
= \frac{1}{\sqrt{n}}\sum_{r=0}^{n-1}\underbrace{\frac{1}{1-\sum_{j=1}^{p}\phi_j\exp(ij\omega_r)}}_{A(\omega_r)}\,J_\varepsilon(\omega_r),
\]
where $\omega_r=\frac{2\pi r}{n}$. It will be shown that the DFT of the innovations, $J_\varepsilon(\omega_r)$, has a constant
variance over all frequencies. Therefore, comparing (5.8) with the above, we see that they are
very similar.
We prove the above results in the following section.
We mention that a more detailed discussion of spectral analysis in time series is given in
Priestley (1983), Chapters 4 and 6, Brockwell and Davis (1998), Chapters 4 and 10, Fuller
(1995), Chapter 3, and Shumway and Stoffer (2006), Chapter 4. Many of these references
also discuss tests for periodicity (see also Quinn and Hannan (2001) for the estimation of
frequencies).
5.3 The spectral density and distribution
Be warned, it takes time to digest these results and to appreciate them. On first reading
they may well bore your socks off; however, like a good wine, cheese or apple, they take time
to digest and appreciate.
Some preliminaries
We first state a theorem which is very useful for checking positive definiteness of a sequence.
See Brockwell and Davis (1998), Corollary 4.3.2 or Fuller (1995), Theorem 3.1.9.
To prove part of the result we use the fact that if a sequence $\{a_k\}\in\ell_2$, then
$g(\omega)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}a_k\exp(ik\omega)\in L_2$ (by Parseval's theorem) and $a_k=\int_0^{2\pi}g(\omega)\exp(-ik\omega)\,d\omega$,
together with the following result.
Lemma 5.3.1 Suppose $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$. Then we have
\[
\frac{1}{n}\sum_{k=-(n-1)}^{n-1}|kc(k)| \rightarrow 0
\]
as $n\rightarrow\infty$. Moreover, if $\sum_{k=-\infty}^{\infty}|kc(k)|<\infty$, then $\frac{1}{n}\sum_{k=-(n-1)}^{n-1}|kc(k)|=O(\frac{1}{n})$.
PROOF. The proof is straightforward in the case that $\sum_{k=-\infty}^{\infty}|kc(k)|<\infty$ (the second
assertion); in this case $\sum_{k=-(n-1)}^{n-1}\frac{|k|}{n}|c(k)|=O(\frac{1}{n})$. The proof is slightly more tricky in the
case that we only have $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$. First we note that since $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$,
for every $\varepsilon>0$ there exists an $N_\varepsilon$ such that for all $n\geq N_\varepsilon$, $\sum_{|k|\geq n}|c(k)|<\varepsilon$. Let us suppose
that $n>N_\varepsilon$; then we have the bound
\[
\frac{1}{n}\sum_{k=-(n-1)}^{n-1}|kc(k)|
\leq \frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{N_\varepsilon-1}|kc(k)| + \frac{1}{n}\sum_{N_\varepsilon\leq|k|\leq n-1}|kc(k)|
\leq \frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{N_\varepsilon-1}|kc(k)| + \varepsilon,
\]
since $|k|/n\leq 1$ in the second sum. Hence, keeping $N_\varepsilon$ fixed, we see that
$\frac{1}{n}\sum_{k=-(N_\varepsilon-1)}^{N_\varepsilon-1}|kc(k)|\rightarrow 0$ as $n\rightarrow\infty$. Since this is true for every $\varepsilon$ (with different
thresholds $N_\varepsilon$), we obtain the required result. $\Box$
5.3.1 The spectral density
The Fourier transform of the covariance
\[
f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)
\]
is called the spectral density. The eagle-eyed amongst you will notice that it is (up to the factor
$\frac{1}{2\pi}$) the limit of the eigenvalues (the spectral representation) of the two circulant matrices
considered in the previous section, which may be one explanation as to why it is called the
spectral density (the reason for it being called a density will become clear from the theorem
below).
Theorem 5.3.1 (The spectral density) Suppose the coefficients $\{c(k)\}$ are absolutely summable
(that is $\sum_k|c(k)|<\infty$). Then the sequence $\{c(k)\}$ is nonnegative definite if and only if the
function $f(\omega)$, where
\[
f(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega),
\]
is nonnegative. Moreover,
\[
c(k) = \int_0^{2\pi}\exp(-ik\omega)f(\omega)\,d\omega. \tag{5.9}
\]
It is worth noting that $f$ is called the spectral density corresponding to the covariances $\{c(k)\}$.
PROOF. We first show that if $\{c(k)\}$ is a non-negative definite sequence, then $f(\omega)$ is a
nonnegative function. We recall that since $\{c(k)\}$ is non-negative definite, for any sequence
$\underline{x}=(x_1,\ldots,x_n)$ (real or complex) we have $\sum_{s,t=1}^{n}x_sc(s-t)\bar{x}_t\geq 0$ (where $\bar{x}_t$ is the complex
conjugate of $x_t$). Now we consider the above for the particular case $\underline{x}=(\exp(i\omega),\ldots,\exp(in\omega))$.
Define the function
\[
f_n(\omega) = \frac{1}{2\pi n}\sum_{s,t=1}^{n}\exp(is\omega)c(s-t)\exp(-it\omega).
\]
Thus by definition $f_n(\omega)\geq 0$. We note that $f_n(\omega)$ can be rewritten as
\[
f_n(\omega) = \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\Big(\frac{n-|k|}{n}\Big)c(k)\exp(ik\omega).
\]
Comparing $f(\omega)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)$ with $f_n(\omega)$ we see that
\[
\big|f(\omega)-f_n(\omega)\big| \leq \frac{1}{2\pi}\Big|\sum_{|k|\geq n}c(k)\exp(ik\omega)\Big| + \frac{1}{2\pi}\Big|\sum_{k=-(n-1)}^{n-1}\frac{|k|}{n}c(k)\exp(ik\omega)\Big| := I_n + II_n.
\]
Now, since $\sum_{k=-\infty}^{\infty}|c(k)|<\infty$ it is clear that $I_n\rightarrow 0$ as $n\rightarrow\infty$, and using Lemma 5.3.1 we have
$II_n\rightarrow 0$ as $n\rightarrow\infty$. Altogether the above implies
\[
\big|f(\omega)-f_n(\omega)\big| \rightarrow 0 \quad\textrm{as } n\rightarrow\infty. \tag{5.10}
\]
Since $f_n(\omega)$ is a nonnegative function for every $n$, the limit $f$ must be nonnegative (if $f(\omega)<0$ at
some $\omega$, then $f_n(\omega)$ would also have to be negative for all large enough $n$, which is not true).
Therefore we have shown that if $\{c(k)\}$ is a nonnegative definite sequence, then $f(\omega)$ is a
nonnegative function.
We now show that if $f(\omega)$, defined by $\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\omega)$, is a nonnegative function,
then $\{c(k)\}$ is a nonnegative definite sequence. We first note that because $\{c(k)\}\in\ell_1$ it is also in
$\ell_2$, hence we have $c(k)=\int_0^{2\pi}f(\omega)\exp(-ik\omega)\,d\omega$. Now we have
\[
\sum_{s,t=1}^{n}x_sc(s-t)\bar{x}_t = \int_0^{2\pi}f(\omega)\Big(\sum_{s,t=1}^{n}x_s\exp(i(s-t)\omega)\bar{x}_t\Big)d\omega
= \int_0^{2\pi}f(\omega)\Big|\sum_{s=1}^{n}x_s\exp(is\omega)\Big|^2d\omega \geq 0.
\]
Hence we obtain the desired result. $\Box$
The above theorem is very useful. It basically gives a simple way to check whether a
sequence {c(k)} is non-negative definite or not (hence whether it is a covariance function -
recall Theorem 1.1.1).
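In practice this check is easy to carry out numerically: evaluate $\frac{1}{2\pi}\sum_k c(k)e^{ik\omega}$ on a fine grid of frequencies and inspect its sign. Below is a minimal sketch of this idea (the two candidate sequences are arbitrary choices used purely for illustration):

```python
import numpy as np

def spectral_density(c, omega):
    """f(omega) = (1/2pi) sum_k c(k) exp(i k omega) for a symmetric sequence,
    where c = [c(0), c(1), ..., c(K)] and c(-k) = c(k)."""
    k = np.arange(1, len(c))
    return (c[0] + 2 * np.sum(c[1:, None] * np.cos(np.outer(k, omega)), axis=0)) / (2 * np.pi)

omega = np.linspace(0, 2 * np.pi, 1000)
c_good = 0.5 ** np.arange(26)                 # c(k) = 0.5^|k|: a valid autocovariance
c_bad = np.array([1.0, 0.9, 0.9])             # c(0) = 1, c(1) = c(2) = 0.9, zero otherwise

print(spectral_density(c_good, omega).min())  # >= 0, so the sequence is non-negative definite
print(spectral_density(c_bad, omega).min())   # < 0, so it cannot be an autocovariance
```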
Example 5.3.1 Consider the empirical covariances (here we give an alternative proof of
Remark 4.1.1) defined in Chapter 4,
\[
c_n(k) = \begin{cases}\frac{1}{n}\sum_{t=1}^{n-|k|}X_tX_{t+|k|} & |k|\leq n-1\\ 0 & \textrm{otherwise;}\end{cases}
\]
we show below that $\{c_n(k)\}$ is a non-negative definite sequence. Thus, using Lemma 1.1.1, there
exists a stationary time series $\{Z_t\}$ which has the covariance $c_n(k)$.
To show that the sequence is non-negative definite we use Theorem 5.3.1, whose
main consequence is that a sequence is non-negative definite if its Fourier transform is
non-negative. We now apply this result. The Fourier transform of $\{c_n(k)\}$ is
\[
\sum_{k=-(n-1)}^{n-1}\exp(ik\omega)c_n(k) = \frac{1}{n}\sum_{k=-(n-1)}^{n-1}\exp(ik\omega)\sum_{t=1}^{n-|k|}X_tX_{t+|k|}
= \frac{1}{n}\Big|\sum_{t=1}^{n}X_t\exp(it\omega)\Big|^2 \geq 0.
\]
Since the above is non-negative, $\{c_n(k)\}$ is a non-negative definite sequence.
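The algebraic identity used in the last display, $\sum_{|k|\leq n-1}c_n(k)e^{ik\omega}=\frac{1}{n}|\sum_{t=1}^{n}X_te^{it\omega}|^2$, is also easy to verify numerically; a small sketch (the data and the frequency are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.standard_normal(n)
omega = 0.7                                       # any frequency will do

# empirical autocovariances (without mean correction, as in the example)
chat = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(n)])

lhs = chat[0] + 2 * np.sum(chat[1:] * np.cos(np.arange(1, n) * omega))
rhs = np.abs(np.sum(x * np.exp(1j * np.arange(1, n + 1) * omega)))**2 / n
print(np.isclose(lhs, rhs))                       # True
```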
We now state a useful result which relates the largest and smallest eigenvalues of the
variance matrix of a stationary process to the largest and smallest values of its spectral
density.
Lemma 5.3.2 Suppose that $\{X_k\}$ is a stationary process with covariance function $\{c(k)\}$ and
spectral density $f(\omega)$. Let $R_n=\textrm{var}(\underline{X}_n)$, where $\underline{X}_n=(X_1,\ldots,X_n)$. Suppose
$\inf_\omega f(\omega)\geq m>0$ and $\sup_\omega f(\omega)\leq M<\infty$. Then for all $n$ we have
\[
\lambda_{\min}(R_n) \geq 2\pi\inf_\omega f(\omega) \quad\textrm{and}\quad \lambda_{\max}(R_n)\leq 2\pi\sup_\omega f(\omega)
\]
(the factor $2\pi$ arises from the normalisation $f(\omega)=\frac{1}{2\pi}\sum_kc(k)e^{ik\omega}$).
PROOF. Let $\underline{e}_1$ be the eigenvector of $R_n$ corresponding to the smallest eigenvalue. Then,
using $c(s-t)=\int f(\omega)\exp(i(s-t)\omega)\,d\omega$, we have
\[
\lambda_{\min}(R_n) = \underline{e}_1'R_n\underline{e}_1 = \sum_{s,t=1}^{n}e_{s,1}c(s-t)e_{t,1}
= \int_0^{2\pi} f(\omega)\sum_{s,t=1}^{n}e_{s,1}\exp(i(s-t)\omega)e_{t,1}\,d\omega
= \int_0^{2\pi}f(\omega)\Big|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)\Big|^2d\omega
\geq \inf_\omega f(\omega)\int_0^{2\pi}\Big|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)\Big|^2d\omega = 2\pi\inf_\omega f(\omega),
\]
since $\int_0^{2\pi}|\sum_{s=1}^{n}e_{s,1}\exp(is\omega)|^2d\omega = 2\pi\sum_se_{s,1}^2 = 2\pi$. Using a similar argument we can show that
$\lambda_{\max}(R_n)\leq 2\pi\sup_\omega f(\omega)$. $\Box$
A consequence of the above result is that if a spectral density is bounded from above and
bounded away from zero, then $R_n$ is non-singular and has a bounded spectral norm (uniformly in $n$).
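These bounds are easy to see numerically; the sketch below builds the Toeplitz covariance matrix of an AR(1) process (the model and parameter values are arbitrary choices for illustration) and compares its eigenvalues with $2\pi\inf_\omega f(\omega)$ and $2\pi\sup_\omega f(\omega)$:

```python
import numpy as np

phi, sigma2, n = 0.6, 1.0, 200
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Rn = sigma2 * phi ** lags / (1 - phi**2)          # Toeplitz matrix with c(k) = sigma^2 phi^|k| / (1 - phi^2)

eig = np.linalg.eigvalsh(Rn)
lower = sigma2 / (1 + phi)**2                     # 2*pi*inf f = sigma^2 / (1 + phi)^2 for this AR(1)
upper = sigma2 / (1 - phi)**2                     # 2*pi*sup f = sigma^2 / (1 - phi)^2
print(lower <= eig.min(), eig.max() <= upper)     # True True
```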
Lemma 5.3.3 Suppose the covariances $c(k)$ decay to zero as $k\rightarrow\infty$. Then for all $n$,
$R_n=\textrm{var}(\underline{X}_n)$ is a non-singular matrix (note we do not require the stronger condition that the
covariances are absolutely summable).
PROOF. See Brockwell and Davis (1998), Proposition 5.1.1. $\Box$
5.3.2 The spectral distribution and Bochner’s theorem
Theorem 5.3.1 only holds when the sequence $\{c(k)\}$ is absolutely summable. Of course this
may not always be the case. An example of an `extreme' case is the time series $X_t=Z$ for all $t$.
Clearly this is a stationary time series and its covariance is $c(k)=\textrm{var}(Z)$ for all $k$. In this
case the autocovariance sequence is not absolutely summable, hence the representation
of the covariance in Theorem 5.3.1 cannot be applied. The reason is that the
Fourier transform of a constant sequence is not well defined (since it belongs to neither
$\ell_1$ nor $\ell_2$).
However, we now show that Theorem 5.3.1 can be generalised to include all non-negative
definite sequences and stationary processes, by considering the spectral distribution rather
than the spectral density (we use the integral $\int g(x)\,dF(x)$; a definition is given in the
Appendix).
Theorem 5.3.2 A sequence $\{c(k)\}$ is a non-negative definite sequence if and only if
\[
c(k) = \int_0^{2\pi}\exp(ik\omega)\,dF(\omega), \tag{5.11}
\]
where $F(\omega)$ is a right-continuous (this means that $F(x+h)\rightarrow F(x)$ as $h\downarrow 0$),
non-decreasing, non-negative bounded function on $[0,2\pi]$ (hence it has all the properties
of a distribution and can be considered as a distribution; it is usually called the spectral
distribution). This representation is unique.
PROOF. We first show that if $\{c(k)\}$ is a non-negative definite sequence, then we can write
$c(k)=\int_0^{2\pi}\exp(ik\omega)\,dF(\omega)$, where $F(\omega)$ is a distribution function. Had $\{c(k)\}$ been absolutely
summable, then we could use Theorem 5.3.1 to write $c(k)=\int_0^{2\pi}\exp(ik\omega)\,dF(\omega)$, where
$F(\omega)=\int_0^{\omega}f(\lambda)\,d\lambda$ and $f(\lambda)=\frac{1}{2\pi}\sum_{k=-\infty}^{\infty}c(k)\exp(ik\lambda)$. By Theorem 5.3.1 we know
that $f(\lambda)$ is nonnegative, hence $F(\omega)$ is a distribution, and we have the result.
In the case that $\{c(k)\}$ is not absolutely summable we cannot use this approach, but we can
adapt some of the ideas used to prove Theorem 5.3.1. As in the proof of Theorem 5.3.1,
define the nonnegative function
\[
f_n(\omega) = \frac{1}{2\pi n}\sum_{s,t=1}^{n}\exp(is\omega)c(s-t)\exp(-it\omega)
= \frac{1}{2\pi}\sum_{k=-(n-1)}^{n-1}\Big(\frac{n-|k|}{n}\Big)c(k)\exp(ik\omega).
\]
When $\{c(k)\}$ is not absolutely summable, the limit of $f_n(\omega)$ may no longer be well
defined. To avoid dealing with functions which may have awkward limits, we
consider instead their integrals, which we show are always distribution functions.
Let us define the function $F_n(\omega)$ whose derivative is $f_n(\omega)$, that is
\[
F_n(\omega) = \int_0^{\omega}f_n(\lambda)\,d\lambda, \qquad 0\leq\omega\leq 2\pi.
\]
Since $f_n(\lambda)$ is nonnegative, $F_n$ is a nondecreasing function, and it is bounded ($F_n(2\pi)=
\int_0^{2\pi}f_n(\lambda)\,d\lambda = c(0)$). Hence $F_n$ satisfies all the properties of a distribution and can be treated
as a distribution function. Now it is clear that for every $k$ we have
\[
\int_0^{2\pi}\exp(ik\omega)\,dF_n(\omega) = \begin{cases}\big(1-\frac{|k|}{n}\big)c(k) & |k|\leq n\\ 0 & \textrm{otherwise.}\end{cases} \tag{5.12}
\]
If we let $d_{n,k}=\int_0^{2\pi}\exp(ik\omega)\,dF_n(\omega)$, we see that for every $k$, $\{d_{n,k}\}_n$ is a Cauchy
sequence, thus $d_{n,k}\rightarrow c(k)$ as $n\rightarrow\infty$. But we should ask what this tells us about the limit of
the underlying distributions $\{F_n\}$. It would appear that the distributions $\{F_n\}$ should
converge (weakly) to a function $F$, and this function should also be a distribution function
(if the $\{F_n\}$ are all nondecreasing, then their limit must be nondecreasing). This
turns out to be the case, by applying Helly's theorem (see Appendix). Roughly speaking, it
states that given a sequence of distributions $\{G_n\}$ which are uniformly bounded, there exists
a distribution $G$ which is the limit of a subsequence $\{G_{n_i}\}$; hence for every $h\in L_2$ we
have
\[
\int h(\omega)\,dG_{n_i}(\omega)\rightarrow\int h(\omega)\,dG(\omega).
\]
We now apply this result to the sequence $\{F_n\}$. We observe that the distributions
$\{F_n\}$ are all uniformly bounded (by $c(0)$). Therefore, applying Helly's theorem, there must
exist a distribution $F$ which is the limit of a subsequence of $\{F_n\}$, that is for every $h\in L_2$
we have
\[
\int h(\omega)\,dF_{n_i}(\omega)\rightarrow\int h(\omega)\,dF(\omega), \qquad i\rightarrow\infty,
\]
for some subsequence $\{F_{n_i}\}$. Below we show that the above limit holds not only along a
subsequence but along the whole sequence $\{F_n\}$. We observe that $\{\exp(ik\omega)\}$ is a basis of $L_2$
and that, for every $k$, the sequence $\{\int_0^{2\pi}\exp(ik\omega)\,dF_n(\omega)\}_n$ converges to $c(k)$. Therefore for all
$h\in L_2$ we have
\[
\int h(\omega)\,dF_n(\omega)\rightarrow\int h(\omega)\,dF(\omega), \qquad n\rightarrow\infty.
\]
In particular, taking $h(\omega)=\exp(ik\omega)$, we have
\[
\int\exp(ik\omega)\,dF_n(\omega)\rightarrow\int\exp(ik\omega)\,dF(\omega), \qquad n\rightarrow\infty.
\]
Since $\int\exp(ik\omega)\,dF_n(\omega)\rightarrow c(k)$ and $\int\exp(ik\omega)\,dF_n(\omega)\rightarrow\int\exp(ik\omega)\,dF(\omega)$, we have
$c(k)=\int\exp(ik\omega)\,dF(\omega)$, where $F$ is a distribution.
Conversely, to show that $\{c(k)\}$ is a non-negative definite sequence when $c(k)=\int\exp(ik\omega)\,dF(\omega)$
for some distribution $F$, we use the same method as in the proof of Theorem 5.3.1. $\Box$
Example 5.3.2 Using the above we can construct the spectral distribution for the (rather
silly) time series $X_t=Z$. Let $F(\omega)=0$ for $\omega<0$ and $F(\omega)=\textrm{var}(Z)$ for $\omega\geq 0$ (hence $F$
is a step function). Then we have
\[
\textrm{cov}(X_0,X_k) = \textrm{var}(Z) = \int\exp(ik\omega)\,dF(\omega).
\]
5.4 The spectral representation theorem
We now state the spectral representation theorem and give a rough outline of the proof.
Theorem 5.4.1 If $\{X_t\}$ is a second order stationary time series with mean zero and spectral
distribution $F(\omega)$, then there exists a right continuous, orthogonal increment process $\{Z(\omega)\}$
(that is, $\textrm{E}\big[(Z(\omega_1)-Z(\omega_2))\overline{(Z(\omega_3)-Z(\omega_4))}\big]=0$ when the intervals $[\omega_1,\omega_2]$ and
$[\omega_3,\omega_4]$ do not overlap) such that
\[
X_t = \int_0^{2\pi}\exp(it\omega)\,dZ(\omega), \tag{5.13}
\]
where for $\omega_1\geq\omega_2$, $\textrm{E}|Z(\omega_1)-Z(\omega_2)|^2 = F(\omega_1)-F(\omega_2)$ (noting that $F(0)=0$). (One
example of a right continuous, orthogonal increment process is Brownian motion, though
this is just one example, and usually $Z(\omega)$ will be far more general than Brownian motion.)
Heuristically we see that (5.13) is the decomposition of $X_t$ in terms of frequencies whose
amplitudes are orthogonal. In other words, $X_t$ is decomposed in terms of the frequencies $\exp(it\omega)$,
which have the orthogonal amplitudes $dZ(\omega)\approx Z(\omega+\delta)-Z(\omega)$.
Remark 5.4.1 Note that so far we have not defined the integral on the right hand side of
(5.13); this is known as a stochastic integral. Unlike for many deterministic functions (functions
whose derivative exists), one cannot really write $dZ(\omega)\approx Z'(\omega)\,d\omega$, because usually a
typical realisation of $Z(\omega)$ will not be smooth enough to differentiate. For example, it is
well known that Brownian motion is quite `rough': a typical realisation of Brownian motion
satisfies $|B(t_1,\omega)-B(t_2,\omega)|\leq K(\omega)|t_1-t_2|^{\gamma}$, where $\omega$ here denotes the realisation and
$\gamma\leq 1/2$, but in general $\gamma$ will not be larger. The integral $\int g(\omega)\,dZ(\omega)$ is well defined if it is defined
as the limit (in the mean squared sense) of discrete sums. In other words, let
$Z_n(\omega)=\sum_{k=1}^{n}Z(\omega_k)I_{(\omega_{k-1},\omega_k]}(\omega)$ and
\[
\int g(\omega)\,dZ_n(\omega) = \sum_{k=1}^{n}g(\omega_k)\{Z(\omega_k)-Z(\omega_{k-1})\};
\]
then $\int g(\omega)\,dZ(\omega)$ is the mean squared limit of $\{\int g(\omega)\,dZ_n(\omega)\}_n$, that is
$\textrm{E}[\int g(\omega)\,dZ(\omega)-\int g(\omega)\,dZ_n(\omega)]^2\rightarrow 0$. For a more precise explanation, see Priestley (1983),
Sections 3.6.3 and 4.11 and Brockwell and Davis (1998), Section 4.7.
A very elegant explanation of the different proofs of the spectral representation theorem
is given in Priestley (1983), Section 4.11. We now give a rough outline of the proof using
the functional theory approach.
PROOF of the Spectral Representation Theorem. To prove the result we will
define two Hilbert spaces $H_1$ and $H_2$, where $H_1$ contains deterministic functions and $H_2$
contains random variables. We will define what is known as an isomorphism (a one-to-one
mapping which preserves the norm and is linear) between these two spaces.
Let $H_1$ be the set of all functions $g$ with $\int_0^{2\pi}|g(\omega)|^2\,dF(\omega)<\infty$ (where $F$ is the spectral
distribution of the second order stationary time series $\{X_t\}$), and define the
inner product on $H_1$ to be
\[
\langle f,g\rangle = \int_0^{2\pi}f(x)\overline{g(x)}\,dF(x). \tag{5.14}
\]
We first note that the functions $\{\exp(ik\omega)\}$ belong to $H_1$; moreover they span $H_1$. Hence
if $g\in H_1$, then there exist coefficients $\{a_j\}$ such that $g(\omega)=\sum_ja_j\exp(ij\omega)$. Let $H_2$ be
the space spanned by the second order stationary time series $\{X_t\}$, hence $H_2=\textrm{sp}(\{X_t\})$ (it is
necessary to take the closure of this space, but we will not do so here), and the inner product
is the covariance $\textrm{cov}(Y,X)$.
Now let us define the linear mapping $T:H_1\rightarrow H_2$,
\[
T\Big(\sum_{j=1}^{n}a_j\exp(ij\omega)\Big) = \sum_{j=1}^{n}a_jX_j, \tag{5.15}
\]
for any $n$ (it is necessary to show that this can be extended to infinite sums, but we will not do so
here). We need to show that $T$ defines an isomorphism. We first observe that this mapping
preserves the inner product. That is, suppose $f,g\in H_1$; then there exist $\{f_j\}$ and $\{g_j\}$ such
that $f(\omega)=\sum_jf_j\exp(ij\omega)$ and $g(\omega)=\sum_jg_j\exp(ij\omega)$. Hence by the definition of $T$ in (5.15)
we have
\[
\langle Tf,Tg\rangle = \textrm{cov}\Big(\sum_jf_jX_j,\sum_jg_jX_j\Big) = \sum_{j_1,j_2}f_{j_1}\bar{g}_{j_2}\textrm{cov}(X_{j_1},X_{j_2})
= \int_0^{2\pi}\Big(\sum_{j_1,j_2}f_{j_1}\bar{g}_{j_2}\exp(i(j_1-j_2)\omega)\Big)\,dF(\omega)
= \int_0^{2\pi}f(\omega)\overline{g(\omega)}\,dF(\omega) = \langle f,g\rangle, \tag{5.16}
\]
where the third equality is due to Bochner's theorem (see Theorem 5.3.2). Hence $\langle Tf,Tg\rangle=
\langle f,g\rangle$, so the inner product is preserved. To show that it is a one-to-one mapping see Brockwell
and Davis (1998), Section 4.7. Altogether this means that $T$ defines an isomorphism
between $H_1$ and $H_2$; therefore every function in $H_1$ has a corresponding random
variable in $H_2$ which displays similar properties.
Since for all $\omega\in[0,2\pi]$ the indicator function $I_{[0,\omega]}(x)\in H_1$, we can define the random
function $\{Z(\omega);0\leq\omega\leq 2\pi\}$ by $T(I_{[0,\omega]})=Z(\omega)$. Now, since the mapping $T$ is linear, we
observe that $T(I_{[\omega_1,\omega_2]})=Z(\omega_2)-Z(\omega_1)$ (for $\omega_1\leq\omega_2$). Moreover, since $T$ preserves the inner
product, we have for any non-intersecting intervals $[\omega_1,\omega_2]$ and $[\omega_3,\omega_4]$ that
\[
\textrm{E}\big[(Z(\omega_2)-Z(\omega_1))\overline{(Z(\omega_4)-Z(\omega_3))}\big]
= \langle T(I_{[\omega_1,\omega_2]}),T(I_{[\omega_3,\omega_4]})\rangle = \langle I_{[\omega_1,\omega_2]},I_{[\omega_3,\omega_4]}\rangle
= \int I_{[\omega_1,\omega_2]}(x)I_{[\omega_3,\omega_4]}(x)\,dF(x) = 0.
\]
Therefore by construction $\{Z(\omega);0\leq\omega\leq 2\pi\}$ is an orthogonal increment process, with
\[
\textrm{E}\big|Z(\omega_2)-Z(\omega_1)\big|^2 = \langle T(I_{[\omega_1,\omega_2]}),T(I_{[\omega_1,\omega_2]})\rangle = \langle I_{[\omega_1,\omega_2]},I_{[\omega_1,\omega_2]}\rangle
= \int_{\omega_1}^{\omega_2}dF(\omega) = F(\omega_2)-F(\omega_1).
\]
Having defined the two isomorphic spaces, the random function $\{Z(\omega);0\leq\omega\leq 2\pi\}$ and the
indicator functions $I_{[0,\omega]}(x)$, which have orthogonal increments, we can now prove the
result. We note that for any function $g\in L_2$ we can write
\[
g(\omega) = \int_0^{2\pi}g(s)\,dI_{[\omega,2\pi]}(s)
\]
(thus $dI_{[\omega,2\pi]}(s)=\delta_\omega(s)\,ds$, where $\delta_\omega(s)$ is the Dirac delta function). We now consider the
special case $g_t(\omega)=\exp(it\omega)$ and apply the isomorphism $T$ to it (recalling from (5.15) that
$T(\exp(it\cdot))=X_t$):
\[
T(\exp(it\cdot)) = T\Big(\int_0^{2\pi}\exp(its)\,dI_{[\cdot,2\pi]}(s)\Big) = \int_0^{2\pi}\exp(its)\,dT\big(I_{[\cdot,2\pi]}(s)\big),
\]
where the mapping goes inside the integral due to the linearity of the isomorphism. Now
we observe that $I_{[\omega,2\pi]}(s)=I_{[0,s]}(\omega)$, thus by the definition of $\{Z(\omega);0\leq\omega\leq 2\pi\}$ we have
$T(I_{[0,s]}(\cdot))=Z(s)$. Substituting this into the above gives
\[
X_t = \int_0^{2\pi}\exp(its)\,dZ(s),
\]
which gives the required result. $\Box$
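The discrete sums of Remark 5.4.1 also give a direct way to generate a stationary sequence from its spectral density on a computer: partition $[0,2\pi]$, draw independent (orthogonal) increments with $\textrm{E}|dZ(\omega_k)|^2=f(\omega_k)\,\Delta\omega$, and sum. The sketch below does this for an AR(1) spectral density (the model, parameter values and grid size are arbitrary illustrative choices); the resulting complex-valued sequence has autocovariances $\textrm{E}(X_{t+k}\overline{X_t})\approx c(k)$:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, sigma2 = 0.6, 1.0
m = 2048                                          # number of grid cells on [0, 2*pi]
omega = (np.arange(m) + 0.5) * 2 * np.pi / m
f = sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(1j * omega))**2)   # AR(1) spectral density

# orthogonal increments: independent complex Gaussians with E|dZ(omega_k)|^2 = f(omega_k)*2*pi/m
sd = np.sqrt(f * (2 * np.pi / m) / 2)
dZ = rng.normal(scale=sd) + 1j * rng.normal(scale=sd)

t = np.arange(1, 1001)
X = np.exp(1j * np.outer(t, omega)) @ dZ          # X_t = sum_k exp(i t omega_k) dZ(omega_k)

for k in range(3):                                # compare with c(k) = sigma2 * phi^k / (1 - phi^2)
    chat = np.mean(X[k:] * np.conj(X[:len(X) - k])).real
    print(k, round(chat, 2), round(sigma2 * phi**k / (1 - phi**2), 2))   # roughly equal
```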
It is worth taking a step back from the proof to see where the assumption of stationarity
crept in. By Bochner's theorem we have that
\[
c(t-\tau) = \int\exp(i(t-\tau)\omega)\,dF(\omega);
\]
thus we use $F$ to define the space $H_1$, the mapping $T$ (through $\{\exp(ik\omega)\}_k$), the inner
product and thus the isomorphism. But the exponential functions $\{\exp(ik\omega)\}$ and $F$ do not
necessarily play a unique role. It was the construction of the orthogonal random functions
$\{Z(\omega)\}$ that was instrumental. The main idea of the proof was that there are functions
$\{\phi_k(\omega)\}$ and a distribution $H$ such that all the covariances of the stochastic process $\{X_t\}$
can be written as
\[
\textrm{E}(X_t\bar{X}_\tau) = c(t,\tau) = \int_0^{2\pi}\phi_t(\omega)\overline{\phi_\tau(\omega)}\,dH(\omega),
\]
where $H$ is a measure. As long as the above representation exists, we can define
two spaces $H_1$ and $H_2$, where $\{\phi_k\}$ is the basis of the functional space $H_1$, which
contains all functions $f$ such that $\int|f(\omega)|^2\,dH(\omega)<\infty$, and $H_2$ is the random space defined by
$\textrm{sp}(\ldots,X_{-1},X_0,X_1,\ldots)$. From here we can define an isomorphism $T:H_1\rightarrow H_2$, where for
all functions $f(\omega)=\sum_kf_k\phi_k(\omega)\in H_1$,
\[
T(f) = \sum_kf_kX_k \in H_2.
\]
An important example is $T(\phi_k)=X_k$. Now, by using the same arguments as in the
proof above, we have
\[
X_t = \int\phi_t(\omega)\,dZ(\omega),
\]
where $\{Z(\omega)\}$ are orthogonal random functions and $\textrm{E}|Z(\omega)|^2=H(\omega)$. We state this result
in the theorem below (see Priestley (1983), Section 4.11).
Theorem 5.4.2 (General orthogonal expansions) Let $\{X_t\}$ be a time series (not necessarily
second order stationary) with covariances $\{\textrm{E}(X_t\bar{X}_s)=c(t,s)\}$. If there exists a
sequence of functions $\{\phi_k(\cdot)\}$ which satisfy for all $k$
\[
\int_0^{2\pi}|\phi_k(\omega)|^2\,dH(\omega) < \infty
\]
and the covariance admits the representation
\[
c(t,s) = \int_0^{2\pi}\phi_t(\omega)\overline{\phi_s(\omega)}\,dH(\omega), \tag{5.17}
\]
where $H$ is a distribution, then for all $t$ we have the representation
\[
X_t = \int\phi_t(\omega)\,dZ(\omega), \tag{5.18}
\]
where $\{Z(\omega)\}$ are orthogonal random functions and $\textrm{E}|Z(\omega)|^2=H(\omega)$. Conversely, if
$X_t$ has the representation (5.18), then $c(s,t)$ admits the representation (5.17).
Remark 5.4.2 We mention that the above representation applies to both stationary and
nonstationary time series. What makes the exponential functions $\{\exp(ik\omega)\}$ special is that if a
process is stationary then the representation of $c(k):=\textrm{cov}(X_t,X_{t+k})$ in terms of exponentials
is guaranteed:
\[
c(k) = \int_0^{2\pi}\exp(-ik\omega)\,dF(\omega). \tag{5.19}
\]
Therefore there always exists an orthogonal random function $\{Z(\omega)\}$ such that
\[
X_t = \int\exp(-it\omega)\,dZ(\omega).
\]
Indeed, whenever the exponential basis is used in the definition of either the covariance or
the process $\{X_t\}$, the resulting process will always be second order stationary.
We mention that it is not always guaranteed that for an arbitrary basis $\{\phi_t\}$ we can represent the
covariance as in (5.17). However, (5.18) is a very useful starting point for characterising
nonstationary processes.
5.5 The spectral densities of MA, AR and ARMA models
We now obtain the spectral density function of an MA($\infty$) process. Using this we can easily
obtain the spectral density of ARMA processes. Let us suppose that $\{X_t\}$ satisfies the
representation
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}, \tag{5.20}
\]
where $\{\varepsilon_t\}$ are iid random variables with mean zero and variance $\sigma^2$, and $\sum_{j=-\infty}^{\infty}|\psi_j|<\infty$.
We recall that the covariance of the above is
\[
c(k) = \textrm{E}(X_tX_{t+k}) = \sigma^2\sum_{j=-\infty}^{\infty}\psi_j\psi_{j+k}. \tag{5.21}
\]
Since $\sum_{j=-\infty}^{\infty}|\psi_j|<\infty$, it can be seen that
\[
\sum_k|c(k)| \leq \sigma^2\sum_k\sum_{j=-\infty}^{\infty}|\psi_j|\cdot|\psi_{j+k}| < \infty.
\]
Hence, by Theorem 5.3.1, the spectral density function of $\{X_t\}$ is well defined. There
are several ways to derive the spectral density of $\{X_t\}$: we can either use (5.21) and
$f(\omega)=\frac{1}{2\pi}\sum_kc(k)\exp(ik\omega)$, or obtain the spectral representation of $\{X_t\}$ and derive $f(\omega)$ from it.
We prove the result using the latter method.
Since $\{\varepsilon_t\}$ are iid random variables with $\textrm{E}(\varepsilon_t)=0$ and $\textrm{E}(\varepsilon_t^2)=\sigma^2$, their spectral density
is the constant $f_\varepsilon(\omega)=\frac{\sigma^2}{2\pi}$, and by Theorem 5.4.1 there exists an orthogonal increment
process $\{Z(\omega)\}$ such that
\[
\varepsilon_t = \int_0^{2\pi}\exp(it\omega)\,dZ(\omega),
\]
with $\textrm{E}(dZ(\omega_1)\overline{dZ(\omega_2)})=0$ for $\omega_1\neq\omega_2$ and $\textrm{E}(|dZ(\omega)|^2)=f_\varepsilon(\omega)\,d\omega=\frac{\sigma^2}{2\pi}\,d\omega$; that is, the
variance of the increments does not depend on frequency.
Using the above we can obtain the spectral representation of $\{X_t\}$:
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j} = \int_0^{2\pi}\Big(\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)\Big)\exp(it\omega)\,dZ(\omega).
\]
Hence
\[
X_t = \int_0^{2\pi}A(\omega)\exp(it\omega)\,dZ(\omega), \tag{5.22}
\]
where $A(\omega)=\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)$, noting that this is the spectral representation of $X_t$.
Definition 5.5.1 (The Cramér representation) The representation in (5.22) of a stationary
process,
\[
X_t = \int_0^{2\pi}A(\omega)\exp(it\omega)\,dZ(\omega),
\]
where $\{Z(\omega):0\leq\omega\leq 2\pi\}$ is an orthogonal increment process, is usually called the Cramér
representation of a stationary process.
Multiplying (5.22) by $\overline{X_{t+k}}$ and taking expectations gives
\[
\textrm{E}(X_t\overline{X_{t+k}}) = c(k) = \int_0^{2\pi}\!\!\int_0^{2\pi}A(\omega_1)\overline{A(\omega_2)}\exp(it\omega_1-i(t+k)\omega_2)\,\textrm{E}\big(dZ(\omega_1)\overline{dZ(\omega_2)}\big).
\]
Due to the orthogonality of $\{Z(\omega)\}$ we have $\textrm{E}(dZ(\omega_1)\overline{dZ(\omega_2)})=0$ unless $\omega_1=\omega_2$; altogether
this gives
\[
\textrm{E}(X_t\overline{X_{t+k}}) = c(k) = \int_0^{2\pi}|A(\omega)|^2\exp(-ik\omega)\,\textrm{E}(|dZ(\omega)|^2)
= \int_0^{2\pi}\frac{\sigma^2}{2\pi}|A(\omega)|^2\exp(-ik\omega)\,d\omega.
\]
Comparing the above with (5.9), we see that the spectral density is $f(\omega)=\frac{\sigma^2}{2\pi}|A(\omega)|^2$. Therefore
the spectral density function corresponding to the MA($\infty$) process defined in (5.20) is
\[
f(\omega) = \frac{\sigma^2}{2\pi}\Big|\sum_{j=-\infty}^{\infty}\psi_j\exp(-ij\omega)\Big|^2.
\]
Example 5.5.1 Let us suppose that $\{X_t\}$ is a stationary ARMA($p,q$) time series (not
necessarily invertible or causal), where
\[
X_t - \sum_{j=1}^{p}\phi_jX_{t-j} = \varepsilon_t + \sum_{j=1}^{q}\theta_j\varepsilon_{t-j}
\]
and $\{\varepsilon_t\}$ are iid random variables with $\textrm{E}(\varepsilon_t)=0$ and $\textrm{E}(\varepsilon_t^2)=\sigma^2$. Then the spectral density of
$\{X_t\}$ is
\[
f(\omega) = \frac{\sigma^2}{2\pi}\,\frac{|1+\sum_{j=1}^{q}\theta_j\exp(ij\omega)|^2}{|1-\sum_{j=1}^{p}\phi_j\exp(ij\omega)|^2}.
\]
We note that because the ARMA spectral density is a ratio of trigonometric polynomials, it is known as a
rational spectral density.
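The rational form is straightforward to evaluate numerically, and (by Lemma 5.2.1) the averaged periodogram $|J_n(\omega_k)|^2/(2\pi)$ of simulated data should be close to it. A rough sketch for an ARMA(1,1), with arbitrary parameter values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
phi, theta, sigma = 0.5, 0.4, 1.0
n, nrep = 256, 500

omega = 2 * np.pi * np.arange(1, n // 2) / n
f = (sigma**2 / (2 * np.pi)) * np.abs(1 + theta * np.exp(1j * omega))**2 \
    / np.abs(1 - phi * np.exp(1j * omega))**2

# average the periodogram |J_n(omega_k)|^2 / (2*pi) over independent replications
I = np.zeros_like(omega)
for _ in range(nrep):
    eps = rng.normal(scale=sigma, size=n + 100)
    x = np.zeros(n + 100)
    for t in range(1, n + 100):
        x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]   # ARMA(1,1) with a burn-in
    J = np.fft.fft(x[100:]) / np.sqrt(n)
    I += np.abs(J[1 : n // 2])**2 / (2 * np.pi)
I /= nrep

print(np.round(f[:5], 3))
print(np.round(I[:5], 3))        # roughly equal to f at the first few Fourier frequencies
```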
Remark 5.5.1 The roots of the characteristic polynomial of an AR process influence the location
of the peaks in its spectral density function. To see why, consider
the AR(2) model
\[
X_t = \phi_1X_{t-1} + \phi_2X_{t-2} + \varepsilon_t,
\]
where $\{\varepsilon_t\}$ are iid random variables with zero mean and $\textrm{E}(\varepsilon_t^2)=\sigma^2$. Suppose the roots of the
characteristic polynomial $\phi(B)=1-\phi_1B-\phi_2B^2$ lie outside the unit circle and are complex
conjugates, so that $\phi(B)=(1-re^{i\theta}B)(1-re^{-i\theta}B)$ with $0<r<1$. Then the spectral density function is
\[
f(\omega) = \frac{\sigma^2}{2\pi}\,\frac{1}{|1-r\exp(i(\theta-\omega))|^2\,|1-r\exp(i(-\theta-\omega))|^2}
= \frac{\sigma^2}{2\pi}\,\frac{1}{[1+r^2-2r\cos(\theta-\omega)][1+r^2-2r\cos(\theta+\omega)]}.
\]
Since the factor $1+r^2-2r\cos(\theta-\omega)$ is smallest when $\omega=\theta$, $f(\omega)$ has a peak close to $\omega=\theta$ (if
$r$ were negative the peak would instead lie near $\omega=\theta-\pi$). Thus the peaks in $f(\omega)$ correspond to
the pseudo-periodicities of the time series and of its covariance structure (as one would expect).
How pronounced these peaks are depends on how close $r$ is to one: the closer $r$ is to one, the
larger the peak. We can generalise the above argument to higher order autoregressive models,
in which case there may be multiple peaks. In fact, this suggests that the larger the number of
peaks, the higher the order of the AR model that should be fitted.
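A quick numerical illustration: choose $r$ and $\theta$ (arbitrary values below), set $\phi_1=2r\cos\theta$ and $\phi_2=-r^2$ so that the factorisation above holds, and locate the maximum of $f(\omega)$:

```python
import numpy as np

r, theta, sigma2 = 0.95, 2 * np.pi / 5, 1.0      # inverse roots r*exp(+/- i*theta)
phi1, phi2 = 2 * r * np.cos(theta), -r**2        # 1 - phi1*B - phi2*B^2 = (1 - r e^{i theta} B)(1 - r e^{-i theta} B)

omega = np.linspace(0.0, np.pi, 10000)
f = sigma2 / (2 * np.pi) / np.abs(1 - phi1 * np.exp(1j * omega) - phi2 * np.exp(2j * omega))**2

print(round(omega[np.argmax(f)], 3), round(theta, 3))   # the peak sits close to theta, and sharpens as r -> 1
```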
5.5.1 Approximation of the spectral density by AR and MA spectral densities
In this section we show that the spectral density
\[
f(\omega) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c(r)\exp(ir\omega)
\]
can be approximated to any accuracy by the spectral density of an AR($p$) or MA($q$) process.
One might try to do this by truncating the infinite number of covariances to a finite number; however,
this does not necessarily lead to a non-negative spectral density. Instead we consider a slight variant
and define the Cesàro sum
\[
f_m(\omega) = \frac{1}{2\pi}\sum_{r=-m}^{m}\Big(1-\frac{|r|}{m}\Big)c(r)\exp(ir\omega)
= \frac{1}{2\pi}\int_0^{2\pi}f(\lambda)F_m(\omega-\lambda)\,d\lambda, \tag{5.23}
\]
where $F_m(\lambda)=\frac{1}{m}\sum_{r=0}^{m-1}D_r(\lambda)$ and $D_r(\lambda)=\sum_{j=-r}^{r}\exp(ij\lambda)$ (the Fejér and Dirichlet
kernels respectively). It can be shown that $F_m$ is a positive kernel, therefore since $f$ is
nonnegative, $f_m$ is nonnegative and $\{(1-\frac{|r|}{m})c(r)\}_r$ is a non-negative definite sequence. Moreover, it
can be shown that
\[
\sup_\omega|f_m(\omega)-f(\omega)| \rightarrow 0 \quad\textrm{as } m\rightarrow\infty; \tag{5.24}
\]
thus for a large enough $m$, $f_m(\omega)$ will be within any prescribed $\delta$ of the spectral density $f$. Using this we
can prove the results below; first, a small numerical illustration of (5.23) and (5.24).
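The sketch below (with an AR(1)-type autocovariance $c(r)=\rho^{|r|}$, an arbitrary choice) compares a sharp truncation of $\sum_rc(r)e^{ir\omega}$ with the Cesàro sum: for small $m$ the sharp truncation can dip below zero, the Cesàro sum never does, and its sup-error decreases with $m$:

```python
import numpy as np

rho = 0.9
c = lambda k: rho ** np.abs(k)                    # c(k) = rho^|k|, a valid autocovariance
omega = np.linspace(0, 2 * np.pi, 2000)
f = (1 - rho**2) / (2 * np.pi * np.abs(1 - rho * np.exp(1j * omega))**2)   # its spectral density

def approx(m, cesaro):
    """(1/2pi) sum_{|r|<=m} w_r c(r) e^{ir omega}, with flat or Cesaro (Fejer) weights."""
    k = np.arange(-m, m + 1)
    w = 1 - np.abs(k) / m if cesaro else np.ones(2 * m + 1)
    return (w * c(k)) @ np.cos(np.outer(k, omega)) / (2 * np.pi)

for m in (11, 21, 81):
    plain, fejer = approx(m, False), approx(m, True)
    print(m, round(plain.min(), 4), round(fejer.min(), 4), round(np.max(np.abs(fejer - f)), 4))
    # the plain truncation can be negative for small m; the Cesaro sum is always >= 0
```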
Lemma 5.5.1 Suppose $\sum_r|c(r)|<\infty$ and $f$ is the spectral density of the covariances.
Then for every $\delta>0$ there exists an $m$ such that $|f(\omega)-f_m(\omega)|<\delta$ and $f_m(\omega)=\sigma^2|\psi(\omega)|^2$,
where $\psi(\omega)=\sum_{j=0}^{m}\psi_j\exp(ij\omega)$. Thus we can approximate the spectral density $f$ by the
spectral density of an MA process.
PROOF. We show that there exists an MA($m$) process which has spectral density $f_m(\omega)$, where
$f_m$ is defined in (5.23); by (5.24) this gives the result.
To prove the result we note that if $a(r)=a(-r)$, then $\sum_{r=-m}^{m}a(r)\exp(ir\omega)$ can be
factorised as
\[
\sum_{r=-m}^{m}a(r)\exp(ir\omega) = \exp(-im\omega)\sum_{r=0}^{2m}a(r-m)\exp(ir\omega)
= C\exp(-im\omega)\prod_{j=1}^{m}(1-\lambda_j\exp(i\omega))(1-\lambda_j^{-1}\exp(i\omega))
\]
\[
= C(-1)^m\prod_{j=1}^{m}\lambda_j\prod_{j=1}^{m}(1-\lambda_j^{-1}\exp(-i\omega))(1-\lambda_j^{-1}\exp(i\omega)),
\]
for some finite constant $C$, since the roots of such a symmetric polynomial come in pairs
$\lambda_j,\lambda_j^{-1}$. Using the above (with $a(r)=(1-\frac{|r|}{m})c(r)$) we have
\[
f_m(\omega) = K\Big(\prod_{j=1}^{m}(1-\lambda_j^{-1}\exp(i\omega))\Big)\Big(\prod_{j=1}^{m}(1-\lambda_j^{-1}\exp(-i\omega))\Big) := A(\omega)A(-\omega);
\]
thus $f_m(\omega)$ is the spectral density of an MA($m$) process. $\Box$
Lemma 5.5.2 Suppose $\sum_r|c(r)|<\infty$ and $f$ is the spectral density of the covariances.
Then for every $\delta>0$ there exists an $m$ such that $|f(\omega)-g_m(\omega)|<\delta$ and $g_m(\omega)=\sigma^2|\phi(\omega)|^{-2}$,
where $\phi(\omega)=\sum_{j=0}^{m}\phi_j\exp(ij\omega)$. Thus we can approximate the spectral density $f$ by the
spectral density of an AR process.
PROOF. We observe that
\[
\big|f(\omega)-g_m(\omega)\big| = f(\omega)\big|g_m(\omega)^{-1}-f(\omega)^{-1}\big|g_m(\omega).
\]
We can now apply the same arguments used to prove Lemma 5.5.1 to obtain the result. $\Box$
5.6 Higher order spectra
We recall that the covariance is a measure of linear dependence between two random variables.
Higher order cumulants are a measure of higher order dependence. For example, the third
order cumulant of the zero mean random variables $X_1,X_2,X_3$ is
\[
\textrm{cum}(X_1,X_2,X_3) = \textrm{E}(X_1X_2X_3),
\]
and the fourth order cumulant of the zero mean random variables $X_1,X_2,X_3,X_4$ is
\[
\textrm{cum}(X_1,X_2,X_3,X_4) = \textrm{E}(X_1X_2X_3X_4) - \textrm{E}(X_1X_2)\textrm{E}(X_3X_4) - \textrm{E}(X_1X_3)\textrm{E}(X_2X_4) - \textrm{E}(X_1X_4)\textrm{E}(X_2X_3).
\]
From the definition we see that if $X_1,X_2,X_3,X_4$ are independent then $\textrm{cum}(X_1,X_2,X_3)=0$
and $\textrm{cum}(X_1,X_2,X_3,X_4)=0$.
Moreover, if $X_1,X_2,X_3,X_4$ are jointly Gaussian random variables then $\textrm{cum}(X_1,X_2,X_3)=0$
and $\textrm{cum}(X_1,X_2,X_3,X_4)=0$; indeed all cumulants of order higher than two are zero. This
comes from the fact that cumulants are the coefficients in the power series expansion of the
logarithm of the moment generating function.
Since the spectral density is the Fourier transform of the covariance, it is natural to ask
whether one can define higher order spectra as the Fourier transforms of the higher order
cumulants. This turns out to be the case, and the higher order spectra have several interesting
properties.
Let us suppose that $\{X_t\}$ is a stationary time series (notice that we are now assuming strict
stationarity and not just second order stationarity). Let $\kappa_3(t,s)=\textrm{cum}(X_0,X_t,X_s)$,
$\kappa_4(t,s,r)=\textrm{cum}(X_0,X_t,X_s,X_r)$ and $\kappa_q(t_1,\ldots,t_{q-1})=\textrm{cum}(X_0,X_{t_1},\ldots,X_{t_{q-1}})$ (noting that,
like the covariance, the higher order cumulants are invariant to shifts in time). The third, fourth and
$q$th order spectra are defined as
\[
f_3(\omega_1,\omega_2) = \sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}\kappa_3(s,t)\exp(is\omega_1+it\omega_2),
\]
\[
f_4(\omega_1,\omega_2,\omega_3) = \sum_{s=-\infty}^{\infty}\sum_{t=-\infty}^{\infty}\sum_{r=-\infty}^{\infty}\kappa_4(s,t,r)\exp(is\omega_1+it\omega_2+ir\omega_3),
\]
\[
f_q(\omega_1,\omega_2,\ldots,\omega_{q-1}) = \sum_{t_1,\ldots,t_{q-1}=-\infty}^{\infty}\kappa_q(t_1,t_2,\ldots,t_{q-1})\exp(it_1\omega_1+it_2\omega_2+\ldots+it_{q-1}\omega_{q-1}).
\]
Example 5.6.1 (Third and fourth order spectra of a linear process) Let us suppose
that $\{X_t\}$ satisfies
\[
X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j},
\]
where $\sum_{j=-\infty}^{\infty}|\psi_j|<\infty$, $\textrm{E}(\varepsilon_t)=0$ and $\textrm{E}(\varepsilon_t^4)<\infty$. Let $A(\omega)=\sum_{j=-\infty}^{\infty}\psi_j\exp(ij\omega)$. Then
it is straightforward to show that (with the un-normalised convention used in this section, i.e.
without the $\frac{1}{2\pi}$ factors of Section 5.5)
\[
f(\omega) = \sigma^2|A(\omega)|^2,
\]
\[
f_3(\omega_1,\omega_2) = \kappa_3A(\omega_1)A(\omega_2)A(-\omega_1-\omega_2),
\]
\[
f_4(\omega_1,\omega_2,\omega_3) = \kappa_4A(\omega_1)A(\omega_2)A(\omega_3)A(-\omega_1-\omega_2-\omega_3),
\]
where $\kappa_3=\textrm{cum}(\varepsilon_t,\varepsilon_t,\varepsilon_t)$ and $\kappa_4=\textrm{cum}(\varepsilon_t,\varepsilon_t,\varepsilon_t,\varepsilon_t)$.
We see from the example that, unlike the spectral density, the higher order spectra are
not necessarily positive or even real.
A review of higher order spectra can be found in Brillinger (2001). Higher order spectra
have several applications, especially for nonlinear processes; see Subba Rao and Gabr (1984).
We will consider one such application in a later chapter.
Using the definition of the higher order spectrum we can now generalise Lemma 5.2.1 to
higher order cumulants (see Brillinger (2001), Theorem 4.3.4).
Proposition 5.6.1 Suppose $\{X_t\}$ is a strictly stationary time series, where for all $1\leq i\leq q-1$ we
have $\sum_{t_1,\ldots,t_{q-1}=-\infty}^{\infty}|(1+|t_i|)\kappa_q(t_1,\ldots,t_{q-1})|<\infty$. Then we have
\[
\textrm{cum}\big(J_n(\omega_{k_1}),\ldots,J_n(\omega_{k_q})\big)
= \frac{1}{n^{q/2}}f_q(\omega_{k_2},\ldots,\omega_{k_q})\sum_{j=1}^{n}\exp\big(ij(\omega_{k_1}+\ldots+\omega_{k_q})\big) + O\Big(\frac{1}{n^{q/2}}\Big)
\]
\[
= \begin{cases}\dfrac{1}{n^{(q-2)/2}}f_q(\omega_{k_2},\ldots,\omega_{k_q}) + O\big(\tfrac{1}{n^{q/2}}\big) & \textrm{if }\sum_{i=1}^{q}k_i = nm \textrm{ for some integer } m,\\[6pt]
O\big(\tfrac{1}{n^{q/2}}\big) & \textrm{otherwise,}\end{cases}
\]
where $\omega_{k_i}=\frac{2\pi k_i}{n}$.
5.7 Extensions
5.7.1 The spectral density of a time series with randomly missing observations
Let us suppose that $\{X_t\}$ is a second order stationary time series with mean zero. However, $\{X_t\}$
is not observed at every time point; there are missing observations, and we only observe $X_t$ at the
times $\{\tau_k\}_k$, that is we observe $\{X_{\tau_k}\}$. The question is how to deal with this type of
data. One method was suggested in ?. He suggested that the missingness mechanism be modelled
stochastically. That is, define the random process $\{Y_t\}$ which only takes the
values $\{0,1\}$, where $Y_t=1$ if $X_t$ is observed and $Y_t=0$ if $X_t$ is not observed. Thus we
observe $\{X_tY_t\}_t$ and also $\{Y_t\}$ (which records the time points at which the process is observed).
He also suggests modelling $\{Y_t\}$ as a stationary process which is independent of $\{X_t\}$ (thus
the missingness mechanism and the time series are independent).
The spectral densities of $\{X_tY_t\}$, $\{X_t\}$ and $\{Y_t\}$ have an interesting relationship, which
can be exploited to estimate the spectral density of $\{X_t\}$ given estimators of the spectral
densities of $\{X_tY_t\}$ and $\{Y_t\}$ (both of which, we recall, are observed). We first note that since
$\{X_t\}$ and $\{Y_t\}$ are stationary, $\{X_tY_t\}$ is stationary. Furthermore, by the independence of
$\{X_t\}$ and $\{Y_t\}$ and since $\textrm{E}(X_t)=0$,
\[
\textrm{cov}(X_tY_t,X_\tau Y_\tau) = \textrm{E}(X_tX_\tau)\textrm{E}(Y_tY_\tau) = c_X(t-\tau)\big[c_Y(t-\tau)+\mu_Y^2\big],
\]
where $\mu_Y=\textrm{E}(Y_t)$ (note that the term $\mu_Y^2c_X(t-\tau)$ does not vanish, since $\{Y_t\}$ is not a zero
mean process). Thus the spectral density of $\{X_tY_t\}$ is
\[
f_{XY}(\omega) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty}\textrm{cov}(X_0Y_0,X_rY_r)\exp(ir\omega)
= \frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c_X(r)\big[c_Y(r)+\mu_Y^2\big]\exp(ir\omega)
= \mu_Y^2f_X(\omega) + \int f_X(\lambda)f_Y(\omega-\lambda)\,d\lambda,
\]
where $f_X(\omega)=\frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c_X(r)\exp(ir\omega)$ and $f_Y(\omega)=\frac{1}{2\pi}\sum_{r=-\infty}^{\infty}c_Y(r)\exp(ir\omega)$ are the
spectral densities of the underlying process and of the missingness process.
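The relation $\textrm{cov}(X_tY_t,X_{t+r}Y_{t+r})=c_X(r)[c_Y(r)+\mu_Y^2]$ is easy to check by simulation. A minimal sketch, assuming (purely for illustration) that $\{X_t\}$ is an AR(1) process and that $\{Y_t\}$ is an iid Bernoulli($p$) missingness indicator independent of $\{X_t\}$:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, p, n = 0.6, 0.7, 200_000               # AR(1) coefficient, observation probability, sample size

eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]          # the (zero mean) process of interest
y = rng.binomial(1, p, size=n).astype(float)  # Y_t = 1 if X_t is observed
w = x * y                                   # the amplitude-modulated series X_t * Y_t

cX = lambda r: phi**r / (1 - phi**2)        # c_X(r) for this AR(1)
cY = lambda r: p * (1 - p) if r == 0 else 0.0
for r in (0, 1, 2):
    emp = np.mean(w[r:] * w[:n - r])        # sample version of cov(X_t Y_t, X_{t+r} Y_{t+r})
    print(r, round(emp, 3), round(cX(r) * (cY(r) + p**2), 3))   # the two columns roughly agree
```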