Time Series: Theory and Methods (Springer Series in Statistics)
-
Upload
others
-
View
1
-
Download
0
Embed Size (px)
Citation preview
Springer Series in Statistics
I. Olkin, N. Wermuth, S. Zeger
Peter J. Brockwell Richard A. Davis
Time Series: Theory and Methods Second Edition
Springer
Peter J. Brockwell Department of Statistics Colorado State
University Fort Collins, CO 8052 3 USA
Richard A. Davis Department of Statistics Columbia University New
York, NY 10027 USA
Mathematical Subject Classification: 62-01, 62M10
Library of Congress Cataloging-in-Publication Data Brockwell, Peter
J.
Time series: theory and methods I Peter J. Brockwell, Richard A.
Davis. p. em. -(Springer series in statistics)
"Second edition"-Pref. Includes bibliographical references and
index. ISBN 0-387-97429-6 (USA).-ISBN 3-540-97429-6 (EUR.) I.
Time-series analysis. I. Davis, Richard A. II. Title. III.
Series.
QA280.B76 1991 90-25821 519.5'5-dc20
ISBN 1-4419-0319-8 Printed on a.cid-free paper. ISBN
978-1-4419-0319-8 (soft cover) © 2006 Springer Science +Business
Media, LLC All rights reserved. This work may not be translated or
copied in whole or in part without the written permission of the
publisher (Springer Science+ Business Media, LLC, 233 Spring
Street, New York, NY 10013, USA), except for brief excerpts in
connection with reviews or scholarly analysis. Use in connection
with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed is forbidden. The use
in this publication of trade names, trademarks, service marks, and
similar terms, even if they are not identified as such, is not to
be taken as a n expres.sion of opinion as to whether or not they
are subject to proprietary rights.
Printed in the United States of America.
15 14 13
Preface to the Second Edition
This edition contains a large number of additions and corrections
scattered throughout the text, including the incorporation of a new
chapter on state-space models. The companion diskette for the IBM
PC has expanded into the software package I TSM: An Interactive
Time Series Modelling Package for the PC, which includes a manual
and can be ordered from Springer-Verlag. *
We are indebted to many readers who have used the book and programs
and made suggestions for improvements. Unfortunately there is not
enough space to acknowledge all who have contributed in this way;
however, spcial mention must be made of our prize-winning
fault-finders, Sid Resnick and F. Pukelsheim. Special mention
should also be made of Anthony Brockwell, whose advice and support
on computing matters was invaluable in the preparation of the new
diskettes. We have been fortunate to work on the new edition in the
excellent environments provided by the University of Melbourne and
Colorado State University. We thank Duane Boes particularly for his
support and encouragement throughout, and the Australian Research
Council and National Science Foundation for their support of
research related to the new material. We are also indebted to
Springer-Verlag for their constant support and assistance in
preparing the second edition.
Fort Collins, Colorado November, 1 990
P.J. BROCKWELL
R.A. DAVIS
* ITSM: An Interactive Time Series Modelling Package for the PC by
P.J. Brockwell and R.A. Davis. ISBN: 0-387-97482-2; 1991.
viii Preface to the Second Edition
Note added in the eighth printing: The computer programs referred
to in the text
have now been superseded by the package ITSM2000, the student
version of which
accompanies our other text, Introduction to Time Series and
Forecasting, also
published by Springer-Verlag. Enquiries regarding purchase of the
professional
version of this package should be sent to pjbrockwell
@cs.com.
Preface to the First Edition
We have attempted in this book to give a systematic account of
linear time series models and their application to the modelling
and prediction of data collected sequentially in time. The aim is
to provide specific techniques for handling data and at the same
time to provide a thorough understanding of the mathematical basis
for the techniques. Both time and frequency domain methods are
discussed but the book is written in such a way that either
approach could be emphasized. The book is intended to be a text for
graduate students in statistics, mathematics, engineering, and the
natural or social sciences. It has been used both at the M.S.
level, emphasizing the more practical aspects of modelling, and at
the Ph.D. level, where the detailed mathematical derivations of the
deeper results can be included.
Distinctive features of the book are the extensive use of
elementary Hilbert space methods and recursive prediction
techniques based on innovations, use of the exact Gaussian
likelihood and AIC for inference, a thorough treatment of the
asymptotic behavior of the maximum likelihood estimators of the
coefficients of univariate ARMA models, extensive illustrations of
the tech niques by means of numerical examples, and a large number
of problems for the reader. The companion diskette contains
programs written for the IBM PC, which can be used to apply the
methods described in the text. Data sets can be found in the
Appendix, and a more extensive collection (including most of those
used for the examples in Chapters 1 , 9, 10, 1 1 and 1 2) is on the
diskette. Simulated ARMA series can easily be generated and filed
using the program PEST. Valuable sources of additional time-series
data are the collections of Makridakis et al. ( 1984) and Working
Paper 109 ( 1984) of Scientific Computing Associates, DeKalb,
Illinois.
Most of the material in the book is by now well-established in the
time series literature and we have therefore not attempted to give
credit for all the
X Preface to the First Edition
results discussed. Our indebtedness to the authors of some of the
well-known existing books on time series, in particular Anderson,
Box and Jenkins, Fuller, Grenander and Rosenblatt,, Hannan,
Koopmans and Priestley will however be apparent. We were also
fortunate to have access to notes on time series by W. Dunsmuir. To
these and to the many other sources that have influenced our
presentation of the subject we express our thanks.
Recursive techniques based on the Kalman filter and state-space
represen tations of ARMA processes have played an important role
in many recent developments in time series analysis. In particular
the Gaussian likelihood of a time series can be expressed very
simply in terms of the one-step linear predictors and their mean
squared errors, both of which can be computed recursively using a
Kalman filter. Instead of using a state-space representation for
recursive prediction we utilize the innovations representation of
an arbi trary Gaussian time series in order to compute best linear
predictors and exact Gaussian likelihoods. This approach, developed
by Rissanen and Barbosa, Kailath, Ansley and others, expresses the
value of the series at time t in terms of the one-step prediction
errors up to that time. This representation provides insight into
the structure of the time series itself as well as leading to
simple algorithms for simulation, prediction and likelihood
calculation.
These algorithms are used in the parameter estimation program
(PEST) found on the companion diskette. Given a data set of up to
2300 observations, the program can be used to find preliminary,
least squares and maximum Gaussian likelihood estimators of the
parameters of any prescribed ARIMA model for the data, and to
predict future values. It can also be used to simulate values of an
ARMA process and to compute and plot its theoretical auto
covariance and spectral density functions. Data can be plotted,
differenced, deseasonalized and detrended. The program will also
plot the sample auto correlation and partial autocorrelation
functions of both the data itself and the residuals after
model-fitting. The other time-series programs are SPEC, which
computes spectral estimates for univariate or bivariate series
based on the periodogram, and TRANS, which can be used either to
compute and plot the sample cross-correlation function of two
series, or to perform least squares estimation of the coefficients
in a transfer function model relating the second series to the
first (see Section 1 2.2). Also included on the diskette is a
screen editing program (WORD6), which can be used to create
arbitrary data files, and a collection of data files, some of which
are analyzed in the book. Instructions for the use of these
programs are contained in the file HELP on the diskette.
For a one-semester course on time-domain analysis and modelling at
the M.S. level, we have used the following sections of the
book:
1 . 1 - 1 .6; 2. 1 -2.7; 3 . 1 -3.5; 5. 1-5.5; 7. 1 , 7.2; 8 .
1-8.9; 9. 1 -9.6
(with brief reference to Sections 4.2 and 4.4). The prerequisite
for this course is a knowledge of probability and statistics at the
level ofthe book Introducti on to the Theory of Stati sti cs by
Mood, Graybill and Boes.
Preface to the First Edition XI
For a second semester, emphasizing frequency-domain analysis and
multi variate series, we have used
4. 1 -4.4, 4.6-4. 10; 10. 1 - 10.7; 1 1 . 1 - 1 1 .7; selections
from Chap. 1 2.
At the M.S. level it has not been possible (or desirable) to go
into the mathe matical derivation of all the results used,
particularly those in the starred sections, which require a
stronger background in mathematical analysis and measure theory.
Such a background is assumed in all of the starred sections and
problems.
For Ph.D. students the book has been used as the basis for a more
theoretical one-semester course covering the starred sections from
Chapters 4 through 1 1 and parts of Chapter 1 2. The prerequisite
for this course is a knowledge of measure-theoretic
probability.
We are greatly indebted to E.J. Hannan, R.H. Jones, S.l. Resnick,
S.Tavare and D. Tj0stheim, whose comments on drafts of Chapters 1-8
led to sub stantial improvements. The book arose out of courses
taught in the statistics department at Colorado State University
and benefitted from the comments of many students. The development
of the computer programs would not have been possible without the
outstanding work of Joe Mandarino, the architect of the computer
program PEST, and Anthony Brockwell, who contributed WORD6,
graphics subroutines and general computing expertise. We are
indebted also to the National Science Foundation for support for
the research related to the book, and one of us (P.J.B.) to Kuwait
University for providing an excellent environment in which to work
on the early chapters. For permis sion to use the optimization
program UNC22MIN we thank R. Schnabel of the University of Colorado
computer science department. Finally we thank Pam Brockwell, whose
contributions to the manuscript went far beyond those of typist,
and the editors of Springer-Verlag, who showed great patience and
cooperation in the final production of the book.
Fort Collins, Colorado October 1 986
P.J. BROCKWELL
R.A. DAVIS
CHAPTER I
Stationary Time Series § 1 . 1 Examples of Time Series § 1 .2
Stochastic Processes § 1 . 3 Stationarity and Strict Stationarity §
1 .4 The Estimation and Elimination of Trend and Seasonal
Components § 1 . 5 The Autocovariance Function of a Stationary
Process § 1 .6 The Multivariate Normal Distribution §1 .7*
Applications of Kolmogorov's Theorem
Problems
CHAPTER 2 Hilbert Spaces §2. 1 Inner-Product Spaces and Their
Properties §2.2 Hilbert Spaces §2.3 The Projection Theorem §2.4
Orthonormal Sets §2.5 Projection in IR" §2.6 Linear Regression and
the General Linear Model §2.7 Mean Square Convergence, Conditional
Expectation and Best
Linear Prediction in L 2(!1, :F, P) §2.8 Fourier Series §2.9
Hilbert Space Isomorphisms §2. 10* The Completeness of L2(Q, .?, P)
§2. 1 1 * Complementary Results for Fourier Series
Problems
62 65 67 68 69 73
XIV Contents
CHAPTER 3
Stationary ARMA Processes 77 §3.1 Causal and Invertible ARMA
Processes 77 §3.2 Moving Average Processes of Infinite Order 89
§3.3 Computing the Autocovariance Function of an ARMA(p, q) Process
9 1 §3.4 The Partial AutOCfimelation Function 98 §3.5 The
Autocovariance Generating Function 103 §3.6* Homogeneous Linear
Difference Equations with
Constant Coefficients 105 Problems 1 10
CHAPTER 4
The Spectral Representation of a Stationary Process 1 14 §4. 1
Complex-Valued Stationary Time Series 1 14 §4.2 The Spectral
Distribution of a Linear Combination of Sinusoids 1 16 §4.3
Herglotz's Theorem 1 1 7 §4.4 Spectral Densities and ARMA Processes
1 22 §4.5* Circulants and Their Eigenvalues 1 33 §4.6* Orthogonal
Increment Processes on [ -n, n] 1 38 §4.7* Integration with Respect
to an Orthogonal Increment Process 140 §4.8* The Spectral
Representation 143 §4.9* Inversion Formulae 1 50 §4. 1 0*
Time-Invariant Linear Filters 1 52 §4. 1 1 * Properties of the
Fourier Approximation h" to J(v. wJ 1 57
Problems 1 59
CHAPTER 5
Prediction of Stationary Processes 1 66 §5. 1 The Prediction
Equations in the Time Domain 1 66 §5.2 Recursive Methods for
Computing Best Linear Predictors 1 69 §5.3 Recursive Prediction of
an ARMA(p, q) Process 1 75 §5.4 Prediction of a Stationary Gaussian
Process; Prediction Bounds 1 82 §5.5 Prediction of a Causal
Invertible ARMA Process in
§5.6* §5.7* §5.8*
Prediction in the Frequency Domain The Wold Decomposition
Kolmogorov's Formula Problems
CHAPTER 6*
Asymptotic Theory §6. 1 Convergence in Probability §6.2 Convergence
in r'h Mean, r > 0 §6.3 Convergence in Distribution §6.4 Central
Limit Theorems and Related Results
Problems
1 82 1 85 1 87 1 9 1 1 92
1 98 1 98 202 204 209 2 1 5
Contents
CHAPTER 7
Estimation of the Mean and the Autocovariance Function §7. 1 §7.2
§7.3*
Estimation of J1 Estimation of y( ·) and p( · ) Derivation of the
Asymptotic Distributions Problems
CHAPTER 8
2 1 8 2 1 8 220 225 236
Estimation for ARMA Models 238 §8. 1 The Yule-Walker Equations and
Parameter Estimation for
Autoregressive Processes 239 §8.2 Preliminary Estimation for
Autoregressive Processes Using the
Durbin-Levinson Algorithm 241 §8.3 Preliminary Estimation for
Moving Average Processes Using the
Innovations Algorithm 245 §8.4 Preliminary Estimation for ARMA(p,
q) Processes 250 §8.5 Remarks on Asymptotic Efficiency 253 §8.6
Recursive Calculation of the Likelihood of an Arbitrary
Zero-Mean Gaussian Process 254 §8.7 Maximum Likelihood and Least
Squares Estimation for
ARMA Processes 256 §8.8 Asymptotic Properties of the Maximum
Likelihood Estimators 258 §8.9 Confidence Intervals for the
Parameters of a Causal Invertible
ARMA Process 260 §8. 1 0* Asymptotic Behavior of the Yule-Walker
Estimates 262 §8. 1 1 * Asymptotic Normality of Parameter
Estimators 265
Problems 269
CHAPTER 9
Model Building and Forecasting with ARIMA Processes §9. 1 ARIMA
Models for Non-Stationary Time Series §9.2 Identification
Techniques §9.3 Order Selection §9.4 Diagnostic Checking §9.5
Forecasting ARIMA Models §9.6 Seasonal ARIMA Models
Problems
CHAPTER 10
Inference for the Spectrum of a Stationary Process §10 . 1 The
Periodogram §10.2 Testing for the Presence of Hidden Periodicities
§ 10.3 Asymptotic Properties of the Periodogram § 10.4 Smoothing
the Periodogram § 10.5 Confidence Intervals for the Spectrum § 10.6
Autoregressive, Maximum Entropy, Moving Average and
Maximum Likelihood ARMA Spectral Estimators § 10.7 The Fast Fourier
Transform (FFT) Algorithm
273 274 284 301 306 3 14 320 326
330 3 3 1 334 342 350 362
365 373
XVI
§ 10.8* Derivation of the Asymptotic Behavior of the Maximum
Likelihood and Least Squares Estimators of the Coefficients of an
ARMA Process Problems
CHAPTER II Multivariate Time Series § 1 1 . 1 Second Order
Properties of Multivariate Time Series §1 1 .2 Estimation of the
Mean and Covariance Function § 1 1 .3 Multivariate ARMA Processes §
1 1 .4 Best Linear Predictors of Second Order Random Vectors § 1 1
. 5 Estimation for Multivariate ARMA Processes § 1 1 .6 The Cross
Spectrum § 1 1 .7 Estimating the Cross Spectrum §1 1 .8* The
Spectral Representation of a Multivariate Stationary
Time Series Problems
CHAPTER 12 State-Space Models and the Kalman Recursions § 12. 1
State-Space Models § 12.2 The Kalman Recursions § 12.3 State-Space
Models with Missing Observations § 12.4 Controllability and
Observability § 12.5 Recursive Bayesian State Estimation
Problems
CHAPTER 13 Further Topics §13. 1 Transfer Function Modelling § 13.2
Long Memory Processes § 1 3.3 Linear Processes with Infinite
Variance § 13.4 Threshold Models
Problems
Contents
401 402 405 4 1 7 421 430 434 443
454 459
506 506 520 535 545 552
555 561 567
Stationary Time Series
In this chapter we introduce some basic ideas of time series
analysis and stochastic processes. Of particular importance are the
concepts of stationarity and the autocovariance and sample
autocovariance functions. Some standard techniques are described
for the estimation and removal of trend and season ality (of known
period) from an observed series. These are illustrated with
reference to the data sets in Section 1 . 1 . Most of the topics
covered in this chapter will be developed more fully in later
sections of the book. The reader who is not already familiar with
random vectors and multivariate analysis should first read Section
1 .6 where a concise account of the required background is given.
Notice our convention that an n-dimensional random vector is
assumed (unless specified otherwise) to be a column vector X = (X
1, X 2, . . . , X nY of random variables. If S is an arbitrary set
then we shall use the notation sn to denote both the set of
n-component column vectors with components in S and the set of
n-component row vectors with components in S.
§ 1 . 1 Examples of Time Series
A time series is a set of observations x,, each one being recorded
at a specified time t. A discrete-time series (the type to which
this book is primarily devoted) is one in which the set T0 of times
at which observations are made is a discrete set, as is the case
for example when observations are made at fixed time intervals.
Continuous-time series are obtained when observations are recorded
continuously over some time interval, e.g. when T0 = [0, 1 ] . We
shall use the notation x(t) rather than x, if we wish to indicate
specifically that observations are recorded continuously.
2 1 . Stationary Time Series
EXAMPLE l.l.l (Current Through a Resistor). If a sinusoidal voltage
v(t) = a cos( vt + 8) is applied to a resistor of resistance r and
the current recorded continuously we obtain a continuous time
series
x(t) = r-1acos(vt + 8).
If observations are made only at times 1 , 2, . . . , the resulting
time series will be discrete. Time series of this particularly
simple type will play a fundamental role in our later study of
stationary time series.
0 . 5
- 1 5
- 2 0 1 0 2 0 30 40 50 60 7 0 8 0 9 0 1 00
Figure 1 . 1 . 1 00 observations of the series x(t) = cos(.2t +
n/3).
§ 1 . 1 . Examples of Time Series 3
EXAMPLE 1 . 1 .2 (Population x, of the U.S.A., 1 790- 1980).
x, x,
1 790 3,929,2 14 1 890 62,979,766 1 800 5,308,483 1 900 76,2 1 2, 1
68 1 8 10 7,239,88 1 19 10 92,228,496 1 820 9,638,453 1 920 106,021
,537 1830 1 2,860,702 1930 1 23,202,624 1 840 1 7,063,353 1940 1
32, 1 64,569 1 850 23, 1 9 1 ,876 1950 1 5 1 ,325,798 1 860 3 1
,443,321 1960 1 79,323, 1 75 1 870 38,558,371 1 970 203,302,03 1 1
880 50, 1 89,209 1980 226,545,805
2 6 0
1 00
8 0
6 0
40
40
0 1 78 0 1 83 0 1 8 80 1 9 3 0 1 9 80
Figure 1 .2. Population of the U.S.A. at ten-year intervals, 1 790-
1980 (U.S. Bureau of the Census).
4
EXAMPLE 1 . 1 .3 (Strikes in the U.S.A., 1 95 1 - 1980).
Ill 1J c 0 Ill 0 J: f.--
6
5
4 3
1951 1952 1953 1954 1955 1956 1957 1958 1 959 1960 1961 1962 1963 1
964 1965
x,
4737 1966 5 1 1 7 1 967 5091 1 968 3468 1969 4320 1970 3825 1 97 1
3673 1 972 3694 1973 3708 1974 3333 1 975 3367 1976 36 14 1 977
3362 1978 3655 1979 3963 1980
I. Stationary Time Series
x,
4405 4595 5045 5700 57 1 6 5 1 38 5010 5353 6074 503 1 5648 5506
4230 4827 3885
2 +--,-.-,-, 1950 1955 1 9 6 0 1 9 6 5 1 9 7 0 1 9 7 5 1980
Figure 1 .3 . Strikes in the U.S.A., 1 95 1 - 1 980 (Bureau of
Labor Statistics, U.S. Labor Department).
§I. I. Examples of Time Series
EXAMPLE 1 . 1 .4 (All Star Baseball Games, 1 933- 1980).
t- 1900 33 34 x , - I - I
t- 1900 49 50 x, - I I
t- 1900 65 66 x,
t =no game.
- 2
- 3
rp
if the National League won in year t, if the American League won in
year t.
37 38 39 40 4 1 42 43 44 - I - I I - I - I - I 53 54 55 56 57 58 59
60
I - I I I - I - I * *
69 70 7 1 72 73 74 75 76 - I
9*\
6 1 62 63 * * I
77 78 79 I
1 9 3 0 1 9 3 5 1 9 40 1945 1 9 5 0 1 9 5 5 1 9 6 0 1 9 6 5 1 9 7 0
1 975 1 9 8 0
Figure 1 .4. Results x,, Example 1 . 1 .4, of All-star baseball
games, 1933- 1980.
48 - I
64 I
6 I. Stationary Time Series
EXAMPLE 1 . 1 .5 (Wolfer Sunspot Numbers, 1 770- 1 869).
1 770 1 77 1 1 772 1 773 1 774 1 775 1 776 1 777 1 778 1 779 1 780
1781 1 782 1 783 1 784 1 785 1 786 1 787 1 788 1 789
101 82 66 35 3 1 7
20 92
1 54 1 25 85 68 38 23 10 24 83
1 32 1 3 1 1 1 8
1 790 1 79 1 1 792 1 793 1 794 1 795 1 796 1 797 1 798 1 799 1 800
1 80 1 1 802 1 803 1 804 1 805 1 806 1 807 1 808 1 809
90 67 60 47 41 2 1 16 6 4 7
14 34 45 43 48 42 28 10 8 2
1 8 1 0 1 8 1 1 1 8 1 2 1 8 1 3 1 8 1 4 1 8 1 5 1 8 1 6 1 8 1 7 1 8
1 8 1 8 19 1 820 1 82 1 1 822 1 823 1 824 1 825 1 826 1 827 1 828 1
829
0
5 12 14 35 46 4 1 30 24 1 6 7 4 2 8
1 7 36 50 62 67
1830 1 8 3 1 1 832 1 833 1 834 1 835 1 836 1 837 1 838 1 839 1 840
1 84 1 1 842 1 843 1 844 1 845 1 846 1 847 1 848 1 849
7 1 48 28 8
1 3 57
1 22 1 38 103 86 63 37 24 1 1 1 5 40 62 98
1 24 96
1 850 1 85 1 1 852 1 853 1 854 1 855 1 856 1 857 1 858 1 859 1 860
1 86 1 1 862 1 863 1 864 1 865 1 866 1 867 1 868 1 869
66 64 54 39 2 1 7 4
23 55 94 96 77 59 44 47 30 1 6 7
37 74
1 30
80 70
6 0
30 2 0
1 0 0
1 7 7 0 1 7 8 0 1 7 9 0 1 800 1 8 1 0 1 8 2 0 1830 1 8 4 0 1 85 0 1
8 6 0 1 8 7 0
Figure 1 .5. The Wolfer sunspot numbers, 1 770- 1 869.
§ 1 . 1 . Examples of Time Series
EXAMPLE 1 . 1 .6 (Monthly Accidental Deaths in the U.S.A., 1 973-1
978).
UJ "() c
1 1
1 0
8
7
Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec.
0
1973 1 974
9007 7750 8 1 06 698 1 8928 8038 9 1 37 8422
1 00 1 7 87 14 1 0826 95 1 2 1 13 1 7 1 0 1 20 1 0744 9823 97 1 3
8743 9938 9 1 29 9 1 6 1 87 10 8927 8680
1 2 2 4
1 975 1976 1 977 1 978
8 1 62 77 1 7 7792 7836 7306 7461 6957 6892 8 1 24 7776 7726 779 1
7870 7925 8 1 06 8 1 29 9387 8634 8890 9 1 1 5 9556 8945 9299
9434
10093 1 0078 10625 1 0484 9620 9 1 79 9302 9827 8285 8037 83 14 9 1
1 0 8433 8488 8850 9070 8 1 60 7874 8265 8633 8034 8647 8796
9240
3 6 4 8 6 0
7
7 2
Figure 1.6. Monthly accidental deaths in the U.S.A., 1 973- 1 978
(National Safety Council).
8 I. Stationary Time Series
These examples are of course but a few of the multitude of time
series to be found in the fields of engineering, science, sociology
and economics. Our purpose in this book is to study the techniques
which have been developed for drawing inferences from such series.
Before we can do this however, it is necessary to set up a
hypothetical mathematical model to represent the data. Having
chosen a model (or family of models) it then becomes possible to
estimate parameters, check for goodness of fit to the data and
possibly to use the fitted model to enhance our understanding of
the mechanism generating the series. Once a satisfactory model has
been developed, it may be used in a variety of ways depending on
the particular field of application. The applications include
separation (filtering) of noise from signals, prediction of future
values of a series and the control of future values.
The six examples given show some rather striking differences which
are apparent if one examines the graphs in Figures 1 . 1 - 1 .6.
The first gives rise to a smooth sinusoidal graph oscillating about
a constant level, the second to a roughly exponentially increasing
graph, the third to a graph which fluctuates erratically about a
nearly constant or slowly rising level, and the fourth to an
erratic series of minus ones and ones. The fifth graph appears to
have a strong cyclic component with period about 1 1 years and the
last has a pronounced seasonal component with period 12.
In the next section we shall discuss the general problem of
constructing mathematical models for such data.
§ 1.2 Stochastic Processes
The first step in the analysis of a time series is the selection of
a suitable mathematical model (or class of models) for the data. To
allow for the possibly unpredictable nature of future observations
it is natural to suppose that each observation x, is a realized
value of a certain random variable X,. The time series { x" t E T0
} is then a realization of the family of random variables {X, , t E
T0 } . These considerations suggest modelling the data as a
realization (or part of a realization) of a stochastic process {X,,
t E T} where T 2 T0 . To clarify these ideas we need to define
precisely what is meant by a stochastic process and its
realizations. In later sections we shall restrict attention to
special classes of processes which are particularly useful for
modelling many of the time series which are encountered in
practice.
Definition 1.2.1 (Stochastic Process). A stochastic process is a
family of random variables {X, , t E T} defined on a probability
space (Q, ff, P).
Remark 1. In time series analysis the index (or parameter) set Tis
a set of time points, very often {0, ± 1 , ± 2, . . . } , { 1 , 2,
3, . . . }, [0, oo ) or ( - oo, oo ). Stochastic processes in which
Tis not a subset of IR are also of importance. For example in
geophysics stochastic processes with T the surface of a sphere are
used to
§ 1 .2. Stochastic Processes 9
represent variables indexed by their location on the earth's
surface. In this book however the index set T will always be a
subset of IR.
Recalling the definition of a random variable we note that for each
fixed t E T, X, is in fact a function X,( . ) on the set n. On the
other hand, for each fixed wEn, X.(w) is a function on T.
Definition 1.2.2 (Realizations of a Stochastic Process). The
functions {X.(w), w E!l} on T are known as the realizations or
sample-paths of the process {X,, t E T} .
Remark 2. We shall frequently use the term time series to mean both
the data and the process of which it is a realization.
The following examples illustrate the realizations of some specific
stochastic processes. The first two could be considered as possible
models for the time series of Examples 1 . 1 . 1 and 1 . 1 .4
respectively.
ExAMPLE 1 .2. 1 (Sinusoid with Random Phase and Amplitude). Let A
and 0 be independent random variables with A :;:::: 0 and 0
distributed uniformly on (0, 2n). A stochastic process { X(t), t E
IR} can then be defined in terms of A and 0 for any given v :;::::
0 and r > 0 by
X, = r-1 A cos(vt + 0), ( 1 .2. 1 )
o r more explicitly,
X,(w) = r-1 A (w)cos(vt + 0(w)), ( 1 .2.2)
where w is an element of the probability space n on which A and 0
are defined. The realizations of the process defined by 1 .2.2 are
the functions of t
obtained by fixing w, i.e. functions of the form
x (t) = r-1 a cos(vt + (}).
The time series plotted in Figure 1 . 1 is one such
realization.
EXAMPLE 1 .2.2 (A Binary Process). Let {X,, t = 1, 2, . . . } be a
sequence of independent random variables for each of which
P(X, = 1 ) = P(X, = - 1) = l ( 1 .2.3)
In this case it is not so obvious as in Example 1 .2. 1 that there
exists a probability space (Q, ff, P) with random variables X 1 ,
X2 , . • • defined on n having the required joint distributions,
i.e. such that
( 1 .2.4)
for every n-tuple (i 1 , . . . , in) of 1 's and - 1 's. The
existence of such a process is however guaranteed by Kolmogorov's
theorem which is stated below and discussed further in Section 1
.7.
10 1 . Stationary Time Series
The time series obtained by tossing a penny repeatedly and scoring
+ 1 for each head, - I for each tail is usually modelled as a
realization of the process defined by ( 1 .2.4 ). Each realization
of this process is a sequence of 1 's and - 1 's.
A priori we might well consider this process as a model for the All
Star baseball games, Example 1 . 1 .4. However even a cursory
inspection of the results from 1 963 onwards casts serious doubt on
the hypothesis P(X, = 1) = t·
ExAMPLE 1 .2.3 (Random Walk). The simple symmetric random walk {S,
t = 0, I, 2, . . . } is defined in terms of Example 1 .2.2 by S0 =
0 and
t 1 . ( 1 .2.5)
The general random walk is defined in the same way on replacing X1
, X2 , . • •
by a sequence of independently and identically distributed random
variables whose distribution is not constrained to satisfy ( 1
.2.3). The existence of such an independent sequence is again
guaranteed by Kolmogorov's theorem (see Problem 1 . 1 8).
ExAMPLE 1 .2.4 (Branching Processes). There is a large class of
processes, known as branching processes, which in their most
general form have been applied with considerable success to the
modelling of population growth (see for example lagers ( 1 976)).
The simplest such process is the Bienayme Galton-Watson process
defined by the equations X0 = x (the population size in generation
zero) and
t = 0, 1, 2, 0 0 0 ' ( 1 .2.6)
where Z,,j, t = 0, I , . . . , j = 1 , 2, are independently and
identically distributed non-negative integer-valued random
variables, Z,,j, representing the number of offspring of the ph
individual born in generation t.
In the first example we were able to define X,(w) quite explicitly
for each t and w. Very frequently however we may wish (or be
forced) to specify instead the collection of all joint
distributions of all finite-dimensional vectors (X, , , X,2, . . .
, X,J, t = (t1, . . . , t") E T", n E {I , 2, . . . }. In such a
case we need to be sure that a stochastic process (see Definition 1
.2. 1 ) with the specified distributions really does exist.
Kolmogorov's theorem, which we state here and discuss further in
Section 1.7 , guarantees that this is true under minimal conditions
on the specified distribution functions. Our statement of Kolmo
gorov's theorem is simplified slightly by the assumption (Remark 1)
that T is a subset of IR and hence a linearly ordered set. If T
were not so ordered an additional "permutation" condition would be
required (a statement and proof of the theorem for arbitrary T can
be found in numerous books on probability theory, for example
Lamperti, 1 966).
§ 1 .3. Stationarity and Strict Stationarity 1 1
Definition 1.2.3 (The Distribution Functions of a Stochastic
Process {X ' t E Tc !R}). Let 5be the set of all vectors {t = (t 1
, . . . , tn)' E Tn: t 1 < t2 < · · · < tn , n = 1 , 2, .
. . }. Then the (finite-dimensional) distribution functions of {X '
t E T} are the functions { F1 ( • ), t E 5} defined for t = (t 1 ,
• • • , tn)' by
Theorem 1.2.1 (Kolmogorov's Theorem). The probabi li ty di stri
buti on functi ons { F1( • ), t E 5} are the di stri buti on functi
ons of some stochasti c process if and only if for any n E { 1 , 2,
. . . }, t = (t 1, . . . , tn)' E 5 and 1 :-:::; i :-:::; n,
lim F1(x) = F1<;>(x(i )) ( 1 .2.8)
where t (i ) and x(i ) are the (n - I )- component vectors obtai
ned by d eleti ng the i'h components oft and x respecti vely.
If (M · ) is the characteristic function corresponding to F1( • ),
i.e.
tP1(u) = l e ;u·x F. (d x 1 , . , . , d xn), J n U = (u 1 , . . . ,
un)' E !Rn,
then ( 1 .2.8) can be restated in the equivalent form,
lim tP1(u) = tPt(i)(u(i )), ui-+0
( 1 .2.9)
where u (i) is the (n - I )-component vector obtained by deleting
the i 1h component of u.
Condition ( 1 .2.8) is simply the "consistency" requirement that
each function F1( · ) should have marginal distributions which
coincide with the specified lower dimensional distribution
functions.
§ 1 .3 Stationarity and Strict Stationarity
When dealing with a finite number of random variables, it is often
useful to compute the covariance matrix (see Section 1 .6) in order
to gain insight into the dependence between them. For a time series
{X1, t E T} we need to extend the concept of covariance matrix to
deal with infinite collections of random variables. The
autocovariance function provides us with the required
extension.
Definition 1.3.1 (The Autocovariance Function). If {X, , t E T} is
a process such that Var(X1) < oo for each t E T, then the
autocovariance function Yx( · , · ) of { X1 } is defined by
Yx(r, s) = Cov(X,X.) = E[(X, - EX,) (Xs - EX5)], r, s E T. ( 1 .3 .
1 )
1 2 I . Stationary Time Series
Definition 1.3.2 (Stationarity). The time series { X0 t E Z } ,
with index set Z = {0, ± 1 , ± 2, . . . }, is said to be stationary
if
and
(i) E I X1 1 2 < oo for all t E Z,
(ii) EX1 = m for all t E £',
(iii) Yx(r, s) = Yx(r + t, s + t) for all r, s, t E £'.
Remark I. Stationarity as just defined is frequently referred to in
the literature as weak stationarity, covariance stationarity,
stationarity in the wide sense or second-order stationarity. For us
however the term stationarity, without further qualification, will
always refer to the properties specified by Definition 1
.3.2.
Remark 2. If { X1, t E Z } is stationary then Yx(r, s) = Yx(r - s,
0) for all r, s E £'. I t i s therefore convenient to redefine the
autocovariance function of a stationary process as the function of
just one variable,
Yx(h) = Yx(h, 0) = Cov(Xr+h > X1) for all t, h E £'.
The function YxC ) will be referred to as the autocovariance
function of { X1} and Yx(h) as its value at "lag" h. The
autocorrelation function (acf) of { X1} is defined analogously as
the function whose value at lag h is
Px(h) = Yx(h)!Yx(O) = Corr(Xr+h> X1) for all t, h E 7L.
Remark 3. It will be noticed that we have defined stationarity only
in the case when T = Z. It is not difficult to define stationarity
using a more general index set, but for our purposes this will not
be necessary. If we wish to model a set of data { X1, t E T c Z }
as a realization of a stationary process, we can always consider it
to be part of a realization of a stationary process { X1, t E Z }
.
Another important and frequently used notion of stationarity i s
introduced in the following definition.
Definition 1.3.3 (Strict Stationarity). The time series { X0 t E Z
} is said to be strictly stationary if the joint distributions
of(X1,, • • • , X1J and (X1, +h, . . . , Xr.+h)' are the same for
all positive integers k and for all t 1, . . . , tk , h E£'.
Strict stationarity means intuitively that the graphs over two
equal-length time intervals of a realization of the time series
should exhibit similar statistical characteristics. For example,
the proportion of ordinates not exceeding a given level x should be
roughly the same for both intervals.
Remark 4. Definition 1 .3 .3 is equivalent to the statement that (X
1, • • • , Xk)' and (X l+h' . . . , Xk+h)' have the same joint
distribution for all positive integers k and integers h.
§ 1 .3. Stationarity and Strict Stationarity 13
The Relation Between Stationarity and Strict Stationarity
If { X1 } is strictly stationary it immediately follows, on taking
k = 1 in Definition 1 .3 .3, that X1 has the same distribution for
each t E 7!.. . If E IX1I 2 < oo this implies in particular that
EX1 and Var(X1) are both constant. Moreover, taking k = 2 in
Definition 1 .3.3, we find that Xt+h and X1 have the same joint
distribution and hence the same covariance for all h E 7!... Thus a
strictly stationary process with finite second moments is
stationary.
The converse of the previous statement is not true. For example if
{X1 } is a sequence of independent random variables such that X1 is
exponentially distributed with mean one when t is odd and normally
distributed with mean one and variance one when t is even, then
{X1} is stationary with Yx(O) = 1 and Yx(h) = 0 for h =F 0. However
since X 1 and X 2 have different distributions, { X1 } cannot be
strictly stationary.
There is one important case however in which stationarity does
imply strict stationarity.
Definition 1 .3.4 (Gaussian Time Series). The process { X1 } is a
Gaussian time series if and only if the distribution functions of {
X1} are all multivariate normal.
If { Xn t E 7!.. } is a stationary Gaussian process then { X1 } is
strictly stationary, since for all n E { 1 , 2, . . . } and for all
h, t 1 , t2 , • • • E Z, the random vectors (X1,, . . • , X1} and
(X1, +h• . . . , X1"+h)' have the same mean and covariance matrix,
and hence the same distribution.
ExAMPLE 1 .3 . 1 . Let X1 = A cos(8t) + B sin(8t) where A and B are
two uncor related random variables with zero means and unit
variances with 8 E [ -n, n]. This time series is stationary
since
Cov(Xr+h• X1) = Cov(A cos(8(t + h)) + B sin(8(t + h)), A cos(8t) +
B sin(8t))
= cos(8t)cos(8(t + h)) + sin(8t)sin(8(t + h))
which is independent of t.
EXAMPLE 1 .3.2. Starting with an independent and identically
distributed sequence of zero-mean random variables Z1 with finite
variance ai, define XI = zl + ezt-1· Then the autocovariance
function of XI is given by
Cov(Xt+h• XI) = Cov(Zt+h + ezt+h-1> zl + ezt-1 ) { ( 1 + 82 )al
if h = 0, = 8al if h = ± 1 ,
0 if I h i > 1 ,
14 I. Stationary Time Series
and hence { X1 } is stationary. In fact it can be shown that { X1 }
is strictly stationary (see Problem 1 . 1 ).
EXAMPLE 1 .3 .3. Let
{Y, x-- ¥,+ 1 if t is even, if t is odd.
where { Y, } is a stationary time series. Although Cov(Xr+h• X1) =
yy(h), { X1 } is not stationary for it does not have a constant
mean.
EXAMPLE 1 .3.4. Referring to Example 1 .2.3, let st be the random
walk S1 = X 1 + X 2 + · · · + X, where X 1, X 2 , . . . , are
independent and identically distributed with mean zero and variance
(J2 . For h > 0, ( t+h t )
Cov(Sr+h• S1) = Cov ; X;, j
Xj
and thus st is not stationary.
Stationary processes play a crucial role in the analysis of time
series. Of course many observed time series (see Section 1 . 1 )
are decidedly non stationary in appearance. Frequently such data
sets can be transformed by the techniques described in Section 1 .4
into series which can reasonably be modelled as realizations of
some stationary process. The theory of stationary processes
(developed in later chapters) is then used for the analysis,
fitting and prediction of the resulting series. In all of this the
autocovariance function is a primary tool. Its properties will be
discussed in Section 1 .5 .
§ 1 .4 The Estimation and Elimination of Trend and Seasonal
Components
The first step in the analysis of any time series is to plot the
data. If there are apparent discontinuities in the series, such as
a sudden change of level, it may be advisable to analyze the series
by first breaking it into homogeneous segments. If there are
outlying observations, they should be studied carefully to check
whether there is any justification for discarding them (as for
example if an observation has been recorded of some other process
by mistake). Inspection of a graph may also suggest the possibility
of representing the data as a realization of the process (the
"classical decomposition" model),
§ 1 .4. The Estimation and Elimination of Trend and Seasonal
Components 1 5
X, = m, + s, + r;, ( 1 .4. 1 )
where m , i s a slowly changing function known as a "trend
component", s, i s a function with known period d referred to as a
"seasonal component", and r; is a "random noise component" which is
stationary in the sense of Definition 1 .3.2. If the seasonal and
noise fluctuations appear to increase with the level of the process
then a preliminary transformation of the data is often used to make
the transformed data compatible with the model ( 1 .4. 1) . See for
example the airline passenger data, Figure 9.7, and the transformed
data, Figure 9.8, obtained by applying a logarithmic
transformation. In this section we shall discuss some useful
techniques for identifying the components in ( 1 .4. 1) .
Our aim is to estimate and extract the deterministic components m,
and s, in the hope that the residual or noise component r; will
turn out to be a stationary random process. We can then use the
theory of such processes to find a satisfactory probabilistic model
for the process {I; } , to analyze its properties, and to use it in
conjunction with m, and s, for purposes of prediction and control
of {X, } .
An alternative approach, developed extensively by Box and Jenkins (
1970), is to apply difference operators repeatedly to the data { x,
} until the differenced observations resemble a realization of some
stationary process {Wr}. We can then use the theory of stationary
processes for the modelling, analysis and prediction of { Wr } and
hence of the original process. The various stages of this procedure
will be discussed in detail in Chapters 8 and 9.
The two approaches to trend and seasonality removal, (a) by
estimation of m, and s, in ( 1 .4. 1 ) and (b) by differencing the
data { x, } , will now be illustrated with reference to the data
presented in Section 1 . 1 .
Elimination of a Trend i n the Absence of Seasonality
In the absence of a seasonal component the model ( 1 .4. 1 )
becomes
t = 1, . . . , n ( 1 .4.2)
where, without loss of generality, we can assume that EI; =
0.
Method 1 (Least Squares Estimation of m, ). In this procedure we
attempt to fit a parametric family of functions, e.g.
( 1 .4.3)
to the data by choosing the parameters, in this illustration a0, a
1 and a2 , to minimize ,L, (x, - m,f .
Fitting a function of the form ( 1 .4.3) to the population data of
Figure 1 .2, 1 790 :::::; t :::::; 1 980 gives the estimated
parameter values,
llo = 2.0979 1 1 X 1 0 1 0,
a1 = -2.334962 x 107,
2 6 0
2 4 0
2 2 0
2 0 0
1 8 0
1 60 0
2- 1 20
4 0
2 0
0 1 78 0 1 8 3 0 188 0 1 93 0 1 98 0
Figure 1 .7. Population of the U.S.A., 1 790- 1980, showing the
parabola fitted by least squares.
and a2 = 6.49859 1 x 1 03.
A graph of the fitted function is shown with the original data in
Figure 1 .7. The estimated values of the noise process 1;, 1 790 $;
t $; 1 980, are the residuals obtained by subtraction of m t = ao +
a! t + llzt2 from xt.
The trend component m1 furnishes us with a natural predictor of
future values of X1• For example if we estimate ¥1990 by its mean
value (i.e. zero) we obtain the estimate,
m1990 = 2.484 x 1 08,
for the population of the U.S.A. in 1 990. However if the residuals
{ Yr} are highly correlated we may be able to use their values to
give a better estimate of ¥1990 and hence of X 1990 .
Method 2 (Smoothing by Means of a Moving Average). Let q be a non
negative integer and consider the two-sided moving average,
q w, = (2q + 1 )-1 L xt+j• ( 1 .4.4)
j=-q of the process { X1 } defined by ( 1 .4.2). Then for q + 1 $;
t $; n - q,
q q w, = (2q + 1 )-
l 2: mt+j + (2q + 1 )-l 2: Yr+j j=-q j=-q ( 1 .4.5)
§ 1.4. The Estimation and Elimination of Trend and Seasonal
Components 17
assuming that m, is approximately linear over the interval [t - q,
t + q] and that the average of the error terms over this interval
is close to zero.
The moving average thus provides us with the estimates q
m, = (2q + W1 L x,+ j, q + 1 ::; t ::; n - q. ( 1 .4.6) j=-q Since
X, is not observed for t ::; 0 or t > n we cannot use ( 1 .4.6)
for t ::; q or t > n- q. The program SMOOTH deals with this
problem by defining X,:= X 1 for t < 1 and X,:= X n for t >
n. The results of applying this program to the strike data of
Figure 1 . 3 are shown in Figure 1 .8. The estimated noise terms,
Y, = X, - m" are shown in Figure 1 .9. As expected, they show no
apparent trend.
For any fixed a E [0, 1], the one-sided moving averages m,, t = 1 ,
. . . , n,
defined by the recursions, m, = aX, + ( 1 - a)m,_1, t = 2, . . . ,
n, ( 1 .4.7)
and ( 1 .4.8)
can also be computed using the program SMOOTH. Application of ( 1
.4.7) and ( 1 .4.8) is often referred to as exponential smoothing,
since it follows from these recursions that, for t :;:o: 2, m, = Li
a(l - a'jX,_ i + (1 - a)'- 1 X 1 , a weighted moving average of X,,
X,_ 1, • • . , with weights decreasing expo nentially (except for
the last one).
It is useful to think of { m,} in ( 1 .4.6) as a process obtained
from {X,} by application of a linear operator or linear filter, m,
= L-co ajx,+ j with
6
t:, 4
2 +- 1950 1955 1960 1965 1970 1975 1980
Figure 1 .8. Simple 5-term moving average m, of the strike data
from Figure 1 .3.
18 I . Stationary Time Series
1 ,-------------------------------------------------, 0.9 0.8 0.7
0.6 0.5 0.4 0.3
'-;;' 0.2 -g 0.1 0 04-4---+-4-+------r--_,-+--+- :J 0 -0.1 .r: t:.,
-0.2
-0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -0.9
-1 1950 1955 1960 1965 1970 1975 1980
Figure 1 .9. Residuals, Y, = x, - m,, after subtracting the 5-term
moving average from the strike data.
weights aj = (2q + 1 )-1 , - q s j s q, and aj = 0, Ul > q. This
particular filter is a "low-pass" filter since it takes the data {
x,} and removes from it the rapidly fluctuating (or high frequency)
component { Y,}, to leave the slowly varying estimated trend term {
m,} (see Figure 1 . 1 0).
{x,} Linear filter
Figure 1 . 1 0. Smoothing with a low-pass linear filter.
The particular filter ( 1 .4.6) is only one of many which could be
used for smoothing. For large q, provided (2q + 1 )- 1 2J=-q Y,+i
0, it will not only attenuate noise but at the same time will allow
linear trend functions m, =
at + b, to pass without distortion. However we must beware of
choosing q to be too large since if m, is not linear, the filtered
process, although smooth, will not be a good estimate of m,. By
clever choice of the weights { aj} it is possible to design a
filter which will not only be effective in attenuating noise from
the data, but which will also allow a larger class of trend
functions (for example all polynomials of degree less than or equal
to 3) to pass undistorted through the filter. The Spencer 1 5-point
moving average for example has weights
ai = 0, I ii > 7,
§ 1 .4. The Estimation and Elimination of Trend and Seasonal
Components 1 9
with
I i i :,;; 7, and
[a0 , a1 , ... , a7 ] = 3i0 [74, 67, 46, 2 1, 3, - 5, - 6, - 3]
.
Applied to the process ( 1 .4.3) with m, = at3 + bt2 + ct + d , it
gives 7 7 7
L a;Xt+i = L a;mt+i + L a; Yr+i i=-7 i= -7 i=- 7
7
= mo
( 1 .4.9)
where the last step depends on the assumed form of m, (Problem 1
.2). Further details regarding this and other smoothing filters can
be found in Kendall and Stuart, Volume 3, Chapter 46.
Method 3 (Differencing to Generate Stationary Data). Instead of
attempting to remove the noise by smoothing as in Method 2, we now
attempt to eliminate the trend term by differencing. We define the
first difference operator V by
VX, =X,- Xt-1 = ( 1 - B)X0
where B is the backward shift operator,
BX, = X,-1·
( 1 .4. 1 0)
( 1 .4. 1 1 )
Powers of the operators B and V are defined in the obvious way,
i.e. Bj(X,) = X,_j and Vj(X,) = V(Vj-1(X,)),j 1 with V0(X,) =X,.
Polynomials in B and V are manipulated in precisely the same way as
polynomial functions of real variables. For example
=X,- 2X,_1 + X,_z.
If the operator V is applied to a linear trend function m1 = at +
b, then we obtain the constant function Vm, = a. In the same way
any polynomial trend of degree k can be reduced to a constant by
application of the operator Vk
(Problem 1 .4). Starting therefore with the model X, = m, + Yr
where m, = LJ=o alj and
Yr is stationary with mean zero, we obtain
VkX, = k! ak + VkYr,
a stationary process with mean k!ak. These considerations suggest
the possibility, given any sequence { x,} of data, of applying the
operator V repeatedly until we find a sequence {Vkx,} which can
plausibly be modelled as a realization of a stationary process. It
is often found in practice that the
20 I. Stationary Time Series
2 0
1 5
1 0
- 5
- 1 0
- 1 5
- 20 1 78 0 1 8 3 0 1 880 1 9 3 0 1 980
Figure 1 . 1 1 . The twice-differenced series derived from the
population data of Figure 1 .2.
order k of differencing required is quite small, frequently one or
two. (This depends on the fact that many functions can be well
approximated, on an interval of finite length, by a polynomial of
reasonably low degree.)
Applying this technique to the twenty population values { xn , n =
1 , . . . , 20} of Figure 1 .2 we find that two differencing
operations are sufficient to produce a series with no apparent
trend. The differenced data, V2 xn = xn - 2xn- t + xn- z, are
plotted in Figure 1 . 1 1 . Notice that the magnitude of the
fluctuations in V2 xn increase with the value of xn - This effect
can be suppressed by first taking natural logarithms, Yn = In xn,
and then applying the operator V2 to the series { Yn } · (See also
Section 9.2 (a).)
Elimination of both Trend and Seasonality
The methods described for the removal of trend can be adapted in a
natural way to eliminate both trend and seasonality in the general
model
( 1 .4. 1 2)
where El; = 0, st+d = S1 and I1=t si = 0. We illustrate these
methods, with reference to the accident data of Example 1 . 1 .6
(Figure 1 .6) for which the period d of the seasonal component is
clearly 1 2.
It will be convenient in Method 1 to index the data by year and
month. Thus xi.k >j = 1, . . . , 6, k = 1, . . . , 1 2 will
denote the number ofaccidental deaths
§ 1 .4. The Estimation and Elimination of Trend and Seasonal
Components 2 1
reported for the k th month of the/h year, ( 1 972 + j). I n other
words we define
j = 1 ' . . . ' 6, k = 1 ' . . . ' 1 2.
Method Sl (The Small Trend Method). If the trend is small (as in
the accident data) it is not unreasonable to suppose that the trend
term is constant, say mi, for the /h year. Since :Lf,:1 sk = 0, we
are led to the natural unbiased estimate
1 1 2 mj = 1 2 I xj. k > k=l
while for sk > k = 1 , . . . , 1 2 we have the estimates,
1 6 .sk = -6 I (xj.k - mJ,
j= l
( 1 .4. 14)
which automatically satisfy the requirement that :LI,:1 sk = 0. The
estimated error term for month k of the /h year is of course
Y k = x. k - m - .sk ), j, J ' j = 1 , . . . , 6, k = 1 , . . . , 1
2. ( 1 .4. 1 5)
The generalization of( 1 .4 . 1 3)-( 1 .4. 1 5) to data with
seasonality having a period other than 1 2 should be
apparent.
In Figures 1 . 1 2, 1 . 1 3 and 1 . 1 4 we have plotted
respectively the detrended observations xj, k - mi, the estimated
seasonal components sk > and the de-
Vl u c
- 1
- 2
0 1 2 2 4 3 6 48 6 0 7 2
Figure 1 . 1 2. Monthly accidental deaths from Figure 1 .6 after
subtracting the trend estimated by Method S l .
22
- 1
- 2
0 1 2 2 4 3 6 48 60 7 2
Figure 1 . 1 3 . The seasonal component o f the monthly accidental
deaths, estimated by Method S l .
Vl D c
- 1
- 2
0 1 2 2 4 3 6 4 8 6 0 7 2
Figure 1 . 1 4. The detrended and deseasonalized monthly accidental
deaths (Method S l).
§ 1 .4. The Estimation and Elimination of Trend and Seasonal
Components 23
(/) u c 0 (/) :J 0 .J:
t:.
7
0 1 2 2 4 3 6 48 60 7 2
Figure 1 . 1 5. Comparison of the moving average and piecewise
constant estimates of trend for the monthly accidental
deaths.
trended, deseasonalized observations .k = xi. k - mi - sk . The
latter have no apparent trend or seasonality.
Method S2 (Moving Average Estimation). The following technique is
preferable to Method S 1 since it does not rely on the assumption
that mr is nearly constant over each cycle. It is the basis for the
"classical decomposition" option in the time series identification
section of the program PEST.
Suppose we have observations {x 1 , . . . , x.} . The trend is
first estimated by applying a moving average filter specially
chosen to eliminate the seasonal component and to dampen the noise.
If the period d is even, say d = 2q, then we use
mt = (0.5Xr-q + Xr-q+l + ' ' ' + Xr+q- 1 + 0.5Xr+q)/d, ( 1 .4. 1
6)
q < t s n - q.
If the period is odd, say d = 2q + 1, then we use the simple moving
average ( 1 .4.6).
In Figure 1 . 1 5 we show the trend estimate mn 6 < t s 66, for
the accidental deaths data obtained from ( 1 .4. 1 6). Also shown
is the piecewise constant estimate obtained from Method S l .
The second step is to estimate the seasonal component. For each k =
1 , . . . , d we compute the average wk of the deviations { (xk+ id
- mk+ id) : q < k + jd s n - q}. Since these average deviations
do not necessarily sum to zero, we
24 1 . Stationary Time Series
Table 1 . 1 . Estimated Seasonal Components for the Accidental
Deaths Data k 2 3 4 5 6 7 8 9 10 1 1
., (Method S 1) - 744 - 1 504 - 724 - 523 338 808 1 665 96 1 - 87
197 - 32 1
., (Method S2) - 804 - 1 522 - 737 - 526 343 746 1 680 987 - 109
258 - 259
estimate the seasonal component sk as
k = 1 , . . . , d, ( 1 .4. 1 7)
and sk = sk-d• k > d. The deseasonalized data is then defined to
be the original series with the
estimated seasonal component removed, i.e.
d, = x, - s, t = 1 , . . . , n. ( 1 .4. 1 8)
Finally we reestimate the trend from { d, } either by applying a
moving average filter as described earlier for non-seasonal data,
or by fitting a polynomial to the series { d, } . The program PEST
allows the options of fitting a linear or quadratic trend m,. The
estimated noise terms are then
5; = x, - m, - s,, t = 1 , . . . , n.
The results of applying Methods S l and S2 to the accidental deaths
data are quite similar, since in this case the piecewise constant
and moving average estimates of m, are reasonably close (see Figure
1 . 1 5).
A comparison of the estimates of sk > k = 1 , . . . , 1 2,
obtained by Methods S 1 and S2 is made in Table 1 . 1 .
Method S3 (Differencing a t Lag d). The technique of differencing
which we applied earlier to non-seasonal data can be adapted to
deal with seasonality of period d by introducing the lag-d
difference operator vd defined by
( 1 .4. 1 9)
(This operator should not be confused with the operator Vd = ( 1 -
B)d defined
earlier.) Applying the operator Vd to the model,
X, = m, + s, + Y,,
where { s, } has period d, we obtain
which gives a decomposition of the difference vdxt into a trend
component (m, - m,_d) and a noise term ( Y, - Y,-d). The trend, m,
- m,_d, can then be eliminated using the methods already described,
for example by application of some power of the operator V.
Figure 1 . 1 6 shows the result of applying the operator V1 2 to
the accidental
1 2
Vl 1J c
- 1
- 2
0 1 2 2 4 3 6 48 6 0 7 2
Figure 1. 16. The differenced series {V 1 2 x,, t = 1 3, . . . ,
72} derived from the monthly accidental deaths {x, , t = ! , . . .
, 72}.
deaths data. The seasonal component evident in Figure 1 .6 is
absent from the graph of V 1 2 x, 1 3 :s:; t :s:; 72. There still
appears to be a non-decreasing trend however. If we now apply the
operator V to V1 2x, and plot the resulting differences VV1 2x,, t
= 14, . . . , 72, we obtain the graph shown in Figure 1 . 1 7,
which has no apparent trend or seasonal component. In Chapter 9 we
shall show that the differenced series can in fact be well
represented by a stationary time series model.
In this section we have discussed a variety of methods for
estimating and/or removing trend and seasonality. The particular
method chosen for any given data set will depend on a number of
factors including whether or not estimates of the components of the
series are required and whether or not it appears that the data
contains a seasonal component which does not vary with time. The
program PEST allows two options, one which decomposes the series as
described in Method S2, and the other which proceeds by successive
differencing of the data as in Methods 3 and S3.
§1.5 The Autocovariance Function of a Stationary Process
In this section we study the properties of the autocovariance
function intro duced in Section 1 .3.
26
- 1
- 2
0 1 2 2 4 3 6 48 60 7 2
Figure 1 . 1 7. The differenced series {VV1 2 x,, t = 14, . . . ,
72} derived from the monthly accidental deaths {x, , t = 1, . . . ,
72}.
Proposition 1 .5.1 (Elementary Properties). If y( · ) is the
autocovariance function of a stationary process {X, t E Z} ,
then
y(O) :;::.: 0,
l y (h) l :::;; y(O) for all h E Z,
and y( · ) is even, i.e.
y (h) = y( - h) for all h E Z.
( 1 .5. 1 )
( 1 .5 .2)
( 1 .5.3)
PROOF. The first property is a statement of the obvious fact that
Var(X,) :;::>: 0, the second is an immediate consequence of the
Cauchy-Schwarz inequality,
and the third is established by observing that
y( - h) = Cov(X,_h , X,) = Cov(X, X,+h) = y(h). D
Autocovariance functions also have the more subtle property of non
negative definiteness.
Definition 1 .5.1 (Non-Negative Definiteness). A real-valued
function on the integers, K : Z --> IR, is said to be
non-negative definite if and only if
§1 .5. The Autocovariance Function of a Stationary Process
n L a;K(t; - ti)ai 0 i,j= l
27
( 1 .5 .4)
for all positive integers n and for all vectors a = (a 1 , . . . ,
anY E !Rn and t = (t 1 , . . . , tnY E zn or if and only if Li.i =
1 a;K(i - j)ai 0 for all such n and a.
Theorem 1 .5.1 (Characterization of Autocovariance Functions). A
real-valued function defined on the integers is the autocovariance
function of a stationary time series if and only if it is even and
non-negative definite.
PROOF. To show that the autocovariance function y( · ) of any
stationary time series {X, } is non-negative definite, we simply
observe that if a = (a1 , • • • , anY E !Rn, t = (t 1 , . . . ,
tn)' E zn, and Z1 = (X,, - EX,, , . . . , X,., - EX,J', then
= a'rna n
= L a;y(t; - ti)ai, i,j=l
where rn = [y(t; - ti)]i.i= l is the covariance matrix of (X, , , .
. . , X,). To establish the converse, let K : Z --> IR be an
even non-negative definite
function. We need to show that there exists a stationary process
with K( · ) as its autocovariance function, and for this we shall
use Kolmogorov's theorem. For each positive integer n and each t =
(t 1' . . . ' tnY E zn such that t 1 < t2 < · · · < tn ,
let F1 be the distribution function on !Rn with characteristic
function
tP1(u) = exp( - u'Ku/2),
where u = (u 1 , . . . , unY E !Rn and K = [K(t; - ti)]i.i=I ·
Since K is non-negative definite, the matrix K is also non-negative
definite and consequently tPt is the characteristic function of an
n-variate normal distribution with mean zero and covariance matrix
K (see Section 1 .6). Clearly, in the notation of Theorem 1 .2. 1
,
tPt< ;>(u(i)) = lim tP1(u) for each t E Y, uc-·""0
i.e. the distribution functions F1 are consistent, and so by
Kolmogorov's theorem there exists a time series {X, } with
distribution functions F1 and characteristic functions tP1, t E Y.
In particular the joint distribution of X; and Xi is bivariate
normal with mean 0 and covariance matrix
[ K(O) K(i - j)J K(i - j) K(O) '
which shows that Cov(X;, XJ = K(i - j) as required. D
28 I . Stationary Time Series
Remark l . As shown in the proof of Theorem 1 .5. 1 , for every
autocovariance function y( · ), there exists a stationary Gaussian
time series with y( · ) as its autocovariance function.
Remark :Z. To verify that a given function is non-negative definite
it is sometimes simpler to specify a stationary process with the
given autocovariance function than to check Definition 1 .5. 1 .
For example the function K(h) = cos(Bh), h E Z, is the
autocovariance function of the process in Example 1 .3 . 1 and is
therefore non-negative definite. Direct verification by means of
Definition 1 .5 . 1 however is more difficult. Another simple
criterion for checking non-negative definite ness is Herglotz's
theorem, which will be proved in Section 4.3.
Remark 3. An autocorrelation function p( ·) has all the properties
of an autocovariance function and satisfies the additional
condition p(O) = 1 .
ExAMPLE 1 . 5 . 1 . Let us show that the real-valued function on Z,
{1 if h = 0, K(h) = p if h = ± 1 ,
0 otherwise,
is an autocovariance function if and only if I P I t. If I p I i
then K ( · ) i s the autocovariance function of the process defined
in
Example 1 .3.2 with (J2 = ( 1 + B2r1 and e = (2pr1 ( 1 ± j1 - 4p2
). If p > !, K = [K(i - j)J7,j=t and a is the n-component vector
a =
( 1 , - 1 , 1 , - 1 , . . . )', then
a'Ka = n - 2(n - 1)p < 0 for n > 2pj(2p - 1 ),
which shows that K( · ) is not non-negative definite and therefore,
by Theorem 1 .5 . 1 is not an autocovariance function.
If p < -i, the same argument using the n-component vector a = (
1 , 1 , 1 , . . . )' again shows that K( · ) is not non-negative
definite.
The Sample Autocovariance Function of an Observed Series
From the observations {x 1 , x2 , . . . , xn } of a stationary time
series {Xr } we frequently wish to estimate the autocovariance
function y( · ) of the underlying process { Xr } in order to gain
information concerning its dependence structure. This is an
important step towards constructing an appropriate mathematical
model for the data. The estimate of y( · ) which we shall use is
the sample autocovariance function.
Definition 1 .5.2. The sample autocovariance function of { x 1 , .
. . , xn } is defined by
§1 .5. The Autocovariance Function of a Stationary Process
n-h P(h) := n-1 I (xj+h - x) (xj - x),
j= 1 0 :<::; h < n,
29
and y(h) = y( - h), - n < h :-:::; 0, where .X is the sample
mean .X = n-1 I'i=1 xi. Remark 4. The divisor n is used rather than
(n - h) since this ensures that the matrix f" := [y(i - j)J7. j= 1
is non-negative definite (see Section 7.2).
Remark 5. The sample autocorrelation function is defined in terms
of the sample autocovariance function as
p(h) := y(h)/Y(O), l h l < n.
The corresponding matrix Rn := [p(i - j)J7. i=1 is then also
non-negative definite.
Remark 6. The large-sample properties of the estimators y(h) and
p(h) are discussed in Chapter 7.
EXAMPLE 1 .5.2. Figure 1 . 1 8(a) shows 300 simulated observations
of the series X, = Z, + 8Z,_1 of Example 1 .3 .2 with 8 = 0.95 and
Z, N(O, 1 ). Figure 1 . 1 8 (b) shows the corresponding sample
autocorrelation function at lags 0, . . . , 40. Notice the
similarity between p( · ) and the function p( · ) computed as
described in Example 1 .3.2 (p(h) = 1 for h = 0, .4993 for h = ± 1
, 0 otherwise).
EXAMPLE 1 .5 .3 . Figures 1 . 1 9(a) and 1 . 1 9(b) show simulated
observations and the corresponding sample autocorrelation function
for the process X, =
Z, + 8Z,_1 , this time with 8 = - 0.95 and Z, N(O, 1 ). The
similarity between p( · ) and p( · ) is again apparent.
Remark 7. Notice that the realization of Example 1 .5.2 is less
rapidly fluctuating than that of Example 1 .5.3. This is to be
expected from the two autocorrelation functions. Positive
autocorrelation at lag 1 reflects a tendency for successive
observations to lie on the same side of the mean, while negative
autocorrelation at lag I reflects a tendency for successive
observations to lie on opposite sides of the mean. Other properties
of the sample-paths are also reflected in the autocorrelation (and
sample autocorrelation) functions. For example the sample
autocorrelation function of the Wolfer sunspot series (Figure 1
.20) reflects the roughly periodic behaviour of the data (Figure 1
.5).
Remark 8. The sample autocovariance and autocorrelation functions
can be computed for any data set {x 1 , . . . , xn } and are not
restricted to realizations of a stationary process. For data
containing a trend, I P{h) l will exhibit slow decay as h
increases, and for data with a substantial deterministic periodic
component, p(h) will exhibit similar behaviour with the same
periodicity. Thus p( · ) can be useful as an indicator of
non-stationarity (see also Section 9. 1 ).
30
5
4
3
2
0
- 1
- 2
- 3
- 4
- 5 0 50
1 0 . 9 0 .8 0 . 7 0 . 6 0 . 5 0 . 4 0 . 3 0 . 2 0 . 1
0 - 0 . 1 - 0 . 2 - 0 . 3 - 0 . 4 - 0 . 5 - 0 6 - 0 . 7 - 0 .8 - 0
. 9
- 1 0
3 0 4 0
Figure 1 . 1 8. (a) 300 observations of the series X, = Z, + .95Z,
_ 1 , Example 1 .5.2. (b) The sample autocorrelation function p(h),
0 :s; h :s; 40.
§ 1 .5. The Autocovariance Function of a Stationary Process
0
- 1
- 2
- 3
- 4
- 5
1 0 . 9 0 . 8 0 . 7 0 . 6 0 . 5 0 . 4 0 . 3 0 2 0 . 1
0 - 0 . 1 - 0 . 2 - 0 . 3 - 0 . 4 - 0 5 - 0 . 6 - 0 . 7 - 0 . 8 - 0
. 9
- 1
(a)
20
(b)
3 0 4 0
Figure 1 . 1 9. (a) 300 observations of the series X, = Z, - .95Z,
_ 1 , Example 1 .5.3. (b) The sample autocorrelation function p(h),
0 h 40.
32 I . Stationary Time Series
1 0 . 9 0 . 8 0 . 7 0 . 6 0 . 5 0 . 4 0 . 3 0 . 2 0 . 1
0 - 0 . 1 - 0 . 2 - 0 . 3 - 0 . 4 - 0 . 5 - 0 . 6 - 0 . 7 - 0 . 8 -
0 . 9
- 1 c 1 0 2 0 3 0 4 0
Figure 1.20. The sample autocorrelation function of the Wolfer
sunspot numbers (see Figure 1 .5).
§ 1 .6 The Multivariate Normal Distribution
An n-dimensional random vector is a column vector, X = (X 1 , . . .
, X.)', each of whose components is a random variable. If E I X; !
< oo for each i, then we define the mean or expected value of X
to be the column vector,
( 1 .6. 1 )
I n the same way we define the expected value of any array whose
elements are random variables (e.g. a matrix of random variables)
to be the same array with each random variable replaced by its
expected value (assuming each expectation exists).
If X = (X 1 , . . . , X.)' and Y = ( Y1 , . . . , Ym)' are random
vectors such that E I X; / 2 < oo, i = 1 , . . . , n, and E l ¥;
1 2 < oo, i = 1 , . . . , m, we define the co variance matrix
of X and Y to be the matrix,
xv = Cov(X, Y) = E [(X - EX) (Y - EY)' ]
= E(XY') - (EX) (EY)'. ( 1 .6.2)
The (i,j)-element ofxv is the covariance, Cov(X;, lj) = E(X; lj) -
E(X;)E(lj). In the special case when Y = X, Cov(X, Y) reduces to
the covariance matrix of X.
§1 .6. The Multivariate Normal Distribution 33
Proposition 1 .6.1 . If a is an m-component column vector, B is an
m x n matrix and X = (X1 , • • . , Xn)' where E I X; I 2 < oo, i
= 1, . . . , n, then the random vector,
Y = a + BX, ( 1 .6.3)
has mean EY = a + BEX, ( 1 .6.4)
and covariance matrix,
Lyy = BLxx B'. ( 1 .6.5)
PROOF. Problem 1 . 1 5.
Proposition 1 .6.2. The covariance matrix Lxx is symmetric and
non-negative definite, i.e. b' Lxx b 2': 0 for all b = (b1 , . . .
, bnY E n.
PROOF. The symmetry of Lxx is apparent from the definition. To
prove non negative definiteness let b = (b1 , • • . , bn)' be an
arbitrary vector in n. Then by Proposition 1 .6. 1
b'Lxxb = Var(b'X) ;:o: 0. ( 1 .6.6) 0
Proposition 1 .6.3. Any symmetric, non-negative definite n x n
matrix L can be written in the form
L = PAP', ( 1 .6.7)
where P is an orthogonal matrix (i.e. P' = p-1 ) and A is a
diagonal matrix A = diag()" 1 , . . . , ).n) in which A1 , • . . ,
An are the eigenvalues (all non-negative) of L.
PROOF. This proposition is a standard result from matrix theory and
for a proof we refer the reader to Graybill ( 1 983). We observe
here only that if P;, i = 1, . . . , n, is a set of orthonormal
right eigenvectors of L corresponding to the eigenvalues )" 1 , • •
• , An respectively, then P may be chosen as the n x n matrix whose
i'h column is p;, i = 1 , . . . , n. D
Remark 1. Using the factorization ( 1 .6. 7) and the fact that det
P = det P' = 1 , we immediately obtain the result,
det L = Jc1 )"2 . . . A.. -
Definition 1 .6.1 (The Multivariate Normal Distribution). The
random vector Y = ( Y1 , . . . , Y,)' is said to be multivariate
normal, or to have a multivariate normal distribution, if and only
if there exist a column vector a, a matrix B and a random vector X
= (X1 , • • • , Xm)' with independent standard normal
34 1 . Stationary Time Series
components, such that
Y = a + BX. ( 1 .6.8)
Remark 2. The components X1 , • • • , Xm of X in ( 1 .6.8) must
have the joint density
X = (x 1 , . . . , Xm)' E !Rm, ( 1 .6.9)
and corresponding characteristic function,
</lx(u) = Eeiu'X = exp (- t u]/2) , j= 1
Remark 3. It is clear from the definition that if Y has a
multivariate normal distribution and if D is any k x n matrix and c
any k x 1 vector, then Z = c + DY is a k-component multivariate
normal random vector.
Remark 4. If Y is multivariate normal with representation ( 1
.6.8), then by Proposition 1 .6. 1 , EY = a and YY = BB'.
Proposition 1 .6.4. If Y = (Y1 , . . . , Y,)' is a multivariate
normal random vector such that EY = J1 and YY = . then the
characteristic function of Y is
<jly(u) = exp(iu' Jl - !u'u),
If det > 0 then Y has the density,
fy(y) = (2n)-ni2 (det )-1i2 exp [ -!(Y - J1)'-1 (y - J1}] .
PROOF. If Y i s multivariate normal with representation ( 1 .6.8)
then
<jly(u) = E exp [iu'(a + BX)] = exp(iu'a)E exp(iu'BX).
( 1 .6. 1 1 )
( 1 .6. 1 2)
Using ( 1 .6. 1 0) with u (E !Rm) replaced by B'u (u E !Rn) in
order to evaluate the last term, we obtain
<f>v(u) = exp(iu'a)exp( -!u'BB'u),
which reduces to ( 1 .6. 1 1 ) by Remark 4. If det > 0, then by
Proposition 1 .6.3 we have the factorization,
= PAP',
where PP' = In, the n x n identity matrix, A = diag(A.1 , . . . ,
A.n) and each ).i > 0. If we define A -112 = diag(A.(112, . . .
, A.;;-112 ) and
-1;2 = PA -112P',
then it is easy to check that -112-112 = ln. From Proposition 1 .6.
1 and Remark 3 we conclude that the random vector
§1 .6. The Multivariate Normal Distribution 35
( 1 .6. 1 3)
is multivariate normal with EZ = 0 and I:zz = ln . Application of
the result ( 1 .6. 1 1 ) now shows that Z has the characteristic
function lftz(u) = exp( -u'u/2), whence it follows that Z has the
probability density ( 1 .6.9) with m = n. In view of the relation (
1 .6. 1 3), the density of Y is given by
fv(Y) = j det I:-l/2 1fz(I:-If2 (y - f.l))
= (det I:)-112 (2n)-nf2 exp[ -!(Y - f.l)'I:-1 (y - f.l)]
as required. D
Remark 5. The transformation ( 1 .6. 1 3) which maps Y into a
vector of inde pendent standard normal random variables is clearly
a generalization of the transformation Z = u-1 ( ¥ - /1) which
standardizes a single normal random variable with mean 11 and
variance u2.
Remark 6. Given any vector f.l E IRn and any symmetric non-negative
definite n x n matrix I:, there exists a multivariate normal random
vector with mean f.l and covariance matrix I:. To construct such a
random vector from a vector X = (X 1 , . . . , XnY with independent
standard normal components we simply choose a = f.l and B = 1: 112
in ( 1 .6.8), where 1:112, in the terminology of Proposition 1
.6.3, is the matrix PA112P' with A112 = diag().il2, . . . , 212
).
Remark 7. Proposition 1 .6.4 shows that a multivariate normal
distribution is uniquely determined by its mean and covariance
matrix. If Y is multivariate normal, EY = f.l and I:yy = I:, we
shall therefore say that Y has the multi variate normal
distribution with mean f.l and covariance matrix I:, or more
succinctly,
Y N(f.l, I:).
ExAMPLE 1 .6. 1 (The Bivariate Normal Distribution). The random
vector Y = ( ¥1 , Y2 )' is bivariate normal with mean f.l = (f1 1 ,
f12 )' and covariance matrix
( 1 .6. 14)
if and only if Y has the characteristic function (from ( 1 .6. 1 1
) )
lftv(u) = exp[i(u 1 f11 + U2f12 ) - !Cui uf + 2u1 u2pU1 (J2 +
uui)]. ( 1 .6. 1 5)
The parameters (J 1 , u 2 and p are the standard deviations and
correlation of the components Y1 and ¥2 . Since every symmetric
non-negative definite 2 x 2 matrix can be written in the form ( 1
.6. 14), it follows that every bivariate normal random vector has a
characteristic function of the form ( 1 .6. 1 5). If u1 i= 0, u2 i=
0 and - 1 < p < 1 then I: has an inverse,
36 I . Stationary Time Series
( 1 .6. 1 6)
and so by ( 1 .6. 1 2), Y has the probability density,
( 1 .6. 1 7)
Proposition 1.6.5. The random vector Y = ( Y1 , . • . , Yn)' is
multivariate normal with mean 11 and covariance matrix L if and
only if for each a = (a 1 , . . • , an)' E IRn, a'Y has a
univariate normal distribution with mean a'11 and variance
a'La.
PROOF. The necessity of the condition has already been established.
To prove the sufficiency we shall show that Y has the appropriate
characteristic function. For any a E IRn we are assuming that a' Y
N(a'Jl, a'La), or equivalently that
E exp(ita'Y) = exp(ita' 11 - 1t2a'La). ( 1 .6. 1 8)
Setting t = 1 in ( 1 .6. 1 8) we obtain the required characteristic
function ofY, viz.
E exp(ia'Y) = exp(ia'Jl - 1a'La). 0
.
Correspondingly we can write the mean and covariance matrix of Y
as
11 = [:::] and L = [L t t L 1 2] 11 L2 1 L2 2
where 11(i) = £yU> and Lii = E(Y<i> - ,.u>) (yu> -
Jl(j))'.
Proposition 1.6.6.
(i) y( l J and Y(2) are independent if and only if L 1 2 = 0. (ii)
If det L2 2 > 0 then the conditional distribution of Y(l l given
y<2> is
N(J1< 1 > + L 1 2 Lzi(Y<2 > - 11<2 >), L t t - L
1 2Lzi L2 t l·
PROOF. (i) If y( l > and Y<2> are independent, then
L 1 2 = E(Y< t > - 11( 1 l )E(Y<2> - 11(2>)' = 0.
Conversely if L 1 2 = 0 then the characteristic function lfov(u),
as specified by
§ I . 7. * Applications of Kolmogorov's Theorem
Proposition 1 .6.4, factorizes into
tPy(U) = tPyi 1 >(U( l ))tPy<2>(U(Z)), establishing the
independence of y< I > and Y(2).
(ii) If we define
then clearly
so that X and y<z> are independent by (i). Using the relation
( 1 .6. 1 9) we can express the conditional characteristic function
of y< l > given Y(2) as
E(exp(iu'Y< 1 > ) j Y<2 > ) = E(exp[iu'X + iu'(J1< 1
> + L 1 2L2 (Y<2> - Jl<2>))] jY<2> )
= exp[iu'(J1< 1 > + L 1 2 L2 (Y<2> - J1<2 >))] E
exp(iu'X j Y<2>),
where the last line is obtained by taking a factor dependent only
on y<z> outside the conditional expectation. Now since X and
Y(2) are independent,
E(exp(iu'X) j Y<2 > ) = E exp(iu' X) = exp[ -u'(L 1 1 - L 1
2Lz L2 du],
so
E(exp(iu'Y< 1 > ) j Y(2) )
= exp[iu' (Jl< 1 > + L 1 2 L2 (Y<2> - Jl<2 >)) -
u'(L l l - L l zLz Lz l )u] ,
completing the proof. D
ExAMPLE 1 .6.2. For the bivariate normal random vector Y discussed
in Example 1 .6. 1 we immediately deduce from Proposition 1 .6.6
that Y1 and Y2 are independent if and only if p(J1 (J2 = 0. If (J1
> 0, (J2 > 0 and p > 0 then conditional on Y2 , Y1 is
normal with mean
E(Y1 l Yz ) = !1 1 + P(J1 (Jz - 1 ( Yz - Jlz ),
and variance
§ 1 .7* Applications of Kolmogorov's Theorem
In this section we illustrate the use of Theorem 1 .2. 1 to
establish the existence of two important processes, Brownian motion
and the Poisson process.
Definition 1 .7.1 (Standard Brownian Motion). Standard Brownian
motion starting at level zero is a process { B(t), t ?: 0}
satisfying the conditions
38 l. Stationary Time Series
(a) B(O) = 0, (b) B(t2 ) - B(t 1 ), B(t3 ) - B(t2 ), . . . , B(tn)
- B(tn_1 ), are independent for every
n E { 3, 4, . . . } and every t = (t 1 , . . . , tn)' such that 0 t
1 < t 2 < · · · < tn , (c) B(t) - B(s) N(O, t - s) for t
;;::: s.
To establish the existence of such a process we observe that
conditions (a), (b) and (c) are satisfied if and only if, for every
t = (t 1 , . . . , tn)' such that 0 t 1 < · · · < tn , the
characteristic function of (B(t 1 ), . . . , B(tn)) is
1Pt{u) = E exp[iu 1 B(td + · · · + iunB(tn)J
= E exp[iu1 i1 1 + iu2 (i1 1 + l12) + · · · + iun(L1 1 + · · · +
L1n)]
(where L1i = B(ti) - B(ti_1 ), j ;;::: 1 , and t0 = 0) ( 1 .7 . 1
)
= E exp[ii11 (u 1 + · · · + Un) + il12 (u2 + · · · + un) + · · · +
il1nunJ
= exp [- I (ui + · · · + un)2 (ti - ti_1 )J . 2 j=1
It is trivial to check that the characteristic functions (M · )
satisfy the consistency condition ( 1 .2.9) and so by Kolmogorov's
theorem there exists a process with characteristic functions r/J1(
· ), or equivalently with the properties (a), (b) and (c).
Definition 1 .7.2 (Brownian Motion with Drift). Brownian motion
with drift Jl, variance parameter rJ2 and initial level x is
process { Y(t), t ;;::: 0} where
Y(t) = X + J.Lt + rJB(t),
and B(t) is standard Brownian motion.
The existence of Brownian motion with drift follows at once from
that of standard Brownian motion.
Definition 1 .7.3 (Poisson Process). A Poisson process with mean
rate A. ( > 0) is a process { N(t), t ;;::: 0} satisfying the
conditions
(a) N(O) = 0, (b) N(t2 ) - N(td, N(t3 ) - N(t2 ), • • . , N(tn) -
N(tn_1 ), are independent for
every n E { 3,4, . . . } and every t = (t 1 , . . . , tn)' such
that O t1 < t2 < · · · < tn , (c) N(t) - N(s) has the
Poisson distribution with mean A.(t - s) for t ;;::: s.
The proof of the existence of a Poisson process follows precisely
the same steps as the proof of the existence of standard Brownian
motion. For the Poisson process however the characteristic function
of the increment L1i = N(ti) - N(ti-d is
E exp(iul1i) = exp { - A.(ti - ti_d ( l - e;") } .
In fact the same proof establishes the existence of a process
{Z(t), t ;;::: 0}
Problems 39
satisfying conditions (a) and (b) of Definition 1 .7 . 1 provided
the increments L1i = Z(ti) - Z(tj- t ) have characteristic function
of the form
Problems
1 . 1. Suppose that X, = Z, + IJZ,_ 1 , t = 1, 2, . . . , where Z0,
Z1 , Z2 , . . . , are independent random variables, each with
moment generating function E exp().Z;) = m(A). (a) Express the
joint moment generating function E exp(L71 A;XJ in terms of
the function m( · ). (b) Deduce from (a) that {X, } is strictly
stationary.
1 .2. (a) Show that a l inear filter { aJ passes an arbitrary
polynomial of degree k without distortion, i.e.
for all k'h degree polynomials m, = c0 + c1t + · · · + ckt k , if
and only if
(b) Show that the Spencer 1 5-point moving average filter { aJ does
not distort a cubic trend.
1 .3. Suppose that m, = c0 + c 1 t + c2 t2, t = 0, ± 1, . . . . (a)
Show that
2 3 m, = I, aimt+i = I bimr+i ' i= -2 i= - 3
t = 0, ± 1, . . . ,
where a2 = a_ 2 = - f.;, a 1 = a_ 1 = H, a0 = H, and b3 = b_ 3 =
-fr, b2 = b_2 = -fr, b1 = b_! = fr, bo = k
(b) Suppose that X, = m, + Z, where { Z,, t = 0, ± 1 , . . . } is
an independent se quence of normal random variables, each with
mean 0 and variance u2• Let U, = If-2 a;Xr+i and V, = If- 3 b;Xr+i
· (i) Find the means and variances of U, and V,.
(ii) Find the correlations between U, and U,+1 and between V, and
V,+t · (iii) Which of the two filtered series { U, } and { V, }
would you expect to be
smoother in appearance?
1 .4. If m, = Ifo ckt\ t = 0, ± 1 , . . . , show that Vm, is a
polynomial of degree (p - 1 ) i n t and hence that VP+l m , =
0.
1 .5. Design a symmetric moving average filter which eliminates
seasonal components with period 3 and which at the same time passes
quadratic trend functions without distortion.
40 1 . Stationary Time Series
1 .6. (a) Use the programs WORD6 and PEST to plot the series with
values {x 1 , . . . , x30 } given by
1-10 1 486 474 434 441 435 401 414 414 386 405 1 1-20 4 1 1 389 414
426 4 10 441 459 449 486 5 10 2 1-30 506 549 579 5 8 1 630 666 674
729 77 1 785
This series is the sum of a quadratic trend and a period-three
seasonal component.
(b) Apply the filter found in Problem 1 . 5 to the preceding series
and plot the result. Comment on the result.
1 .7. Let Z,, t = 0, ± 1, . . . , be independent normal random
variables each with mean 0 and variance a2 and let a, b and c be
constants. Which, if any, of the following processes are
stationary? For each stationary process specify the mean and
autocovariance function. (a) X, = a + bZ, + cZ,_ 1 , (b) X, = a +
bZ0 , (c) X, = Z1 cos(ct) + Z2 sin(ct), (d) X, = Z0 cos(ct), (e) X,
= Z,cos(ct) + Z,_ , sin(ct), (f) X, = Z,Z,_ 1 .
1 .8. Let { Y, } be a stationary process with mean zero and let a
and b be constants. (a) If X, = a + bt + s, + Y, where s, is a
seasonal component with period 1 2,
show that VV1 2X, = ( 1 - 8) ( 1 - B1 2 )X, is stationary. (b) If
X, = (a + bt)s, + Y, where s, is again a seasonal component with
period
1 2, show that V2 X, = ( 1 - 81 2) ( 1 - B1 2 )X, is
stationary.
1 .9. Use the program PEST to analyze the accidental deaths data by
"classical de composition". (a) Plot the data. (b) Find estimates
s, t = 1, . . . , 1 2, for the classical decomposition model,
X, = m, + s, + Y, , where s, = s,+ 1 2 , z)1 s, = 0 and E Y, = 0.
(c) Plot the deseasonalized data, X, - s, t = 1, . . . , 72. (d)
Fit a parabola by least squares to the deseasonalized data and use
it as your
estimate m, of m, .
(e) Plot the residuals Y, = X, - m, - ., , t = 1, . . . , 72. (f)
Compute the sample autocorrelation function of the residuals p(h),
h =
0, . . . , 20. (g) Use your fitted model to predict X,, t = 73, . .
. , 84 (using predicted noise
values of zero).
1 . 1 0. Let X, = a + bt + Y,, where { Y,, t = 0, ± 1, . . . } is
an independent and identically distributed sequence of random
variables with mean 0 and variance a2, and a and b are constants.
Define
q W, = (2q + W' l: X,+j · j= -q
Compute the mean and autocovariance function of { W,} . Notice that
although { W, } is not stationary, its autocovariance function y(t
+ h, t) = Cov(W,+h, W,) does not depend on t. Plot the
autocorrelation function p(h) = Corr(W,+h, W,). Discuss your
results in relation to the smoothing of a time series.
Problems 4 1
1 . 1 1 . If {X,} and { Y,} are uncorrelated stationary sequences,
i.e. if X, and Y, are uncorrelated for every s and t, show that {X,
+ Y,} is stationary with auto covariance function equal to the sum
of the autocovariance functions of {X,} and { Y, } .
1 . 1 2. Which, i f any, of the following functions defined on the
integers i s the autocovariance function of a stationary time
series? { 1 if h = 0, (a) f(h) =
1 /h if h =I 0. (b) f(h) = ( 1 )ihl
nh nh (c) f(h) = 1 + cos T + cos 4 {1 if h = 0,
(e) f(h) = .4 if h = ± 1 , 0 otherwise.
nh nh (d) f(h) = 1 + cos - cos -
2 4 {1 if h = 0, (f) .f(h) = .6 if h = ± 1 ,
0 otherwise.
1 . 1 3 . Let {S, t = 0, 1 , 2, . . . } be the random walk with
constant drift f1, defined by S0 = 0 and
s, = 11 + s,_ 1 + x, t = 1 , 2, . . . '
where X 1 , X 2 , . . . are independent and identically distributed
random variables with mean 0 and variance (J2 . Compute the mean of
S, and the autocovariance function of the process { S, } . Show
that {VS, } is stationary and compute its mean and autocovariance
function.
1 . 1 4. If X, = a + bt, t = 1 , 2, . . . , n, where a and b are
constants, show that the sample autocorrelations have the property
p(k) --> 1 as n --> oo for each fixed k.
1 . 1 5. Prove Proposition 1 . 6. 1 .
1 . 1 6. (a) If Z N(O, 1 ) show that Z2 has moment generating
function Ee'z' = ( 1 2tf112 for t < !, thus showing that Z2 has
the chi-squared distribution with 1 degree of freedom.
(b) If Z1 , . . . , Z" are independent N(O, 1) random variables,
prove that Zl + · · · + z; has the chi-squared distribution with n
degrees of freedom by showing that its moment generating function
is equal to (1 2tf"12 for t < !.
(c) Suppose that X = (X1 , . . . , X")' N(Jl, L) with L
non-singular. Using ( 1 .6. 1 3), show that (X Jl)'L-1 (X Jl) has
the chi-squared distribution with n degrees of freedom.
1 . 1 7. If X = (X 1 , . . . , X")' is a random vector with
covariance matrix L, show that L is singular if and only if there
exists a non-zero vector b = (b1 , • . . , b")' E IR" such that
Var(b'X) = 0.
1 . 1 8.* Let F be any distribution function, let T be the index
set T = { 1 , 2, 3, . . . } and let Y be as in Definition 1 .2.3.
Show that the functions F1, t E Y, defined by
F,, ... 1Jx 1 , • • • , xn) := F(x J l · · · F(xn), X 1 , . . . ,
Xn E !R, constitute a family of distribution functions, consistent
in the sense of ( 1 .2.8). By Kolmogorov's theorem this establishes
that there exists a sequence of inde pendent random variables {X 1
, X 2, . • . } defined on some probability space and such that P(X;
:o:; x) = F(x) for all i and for all x E IR.
CHAPTER 2
Hilbert Spaces
Although it is possible to study time series analysis without
explicit use of Hilbert space terminology and techniques, there are
great advantages to be gained from a Hilbert space formulation.
These are largely derived from our familiarity with two- and
three-dimensional Euclidean geometry and in par ticular with the
concepts of orthogonality