Time Series: Theory and Methods (Springer Series in Statistics)

Springer Series in Statistics
I. Olkin, N. Wermuth, S. Zeger
Peter J. Brockwell Richard A. Davis
Time Series: Theory and Methods Second Edition
Springer
Peter J. Brockwell, Department of Statistics, Colorado State University, Fort Collins, CO 80523, USA
Richard A. Davis Department of Statistics Columbia University New York, NY 10027 USA
Mathematical Subject Classification: 62-01, 62M10
Library of Congress Cataloging-in-Publication Data
Brockwell, Peter J.
Time series: theory and methods / Peter J. Brockwell, Richard A. Davis. p. cm. - (Springer series in statistics)
"Second edition"-Pref. Includes bibliographical references and index. ISBN 0-387-97429-6 (USA). ISBN 3-540-97429-6 (EUR.) 1. Time-series analysis. I. Davis, Richard A. II. Title. III. Series.
QA280.B76 1991    519.5'5-dc20    90-25821
ISBN 1-4419-0319-8    Printed on acid-free paper.    ISBN 978-1-4419-0319-8 (soft cover)
© 2006 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed in the United States of America.
Preface to the Second Edition
This edition contains a large number of additions and corrections scattered throughout the text, including the incorporation of a new chapter on state-space models. The companion diskette for the IBM PC has expanded into the software package ITSM: An Interactive Time Series Modelling Package for the PC, which includes a manual and can be ordered from Springer-Verlag.*
We are indebted to many readers who have used the book and programs and made suggestions for improvements. Unfortunately there is not enough space to acknowledge all who have contributed in this way; however, special mention must be made of our prize-winning fault-finders, Sid Resnick and F. Pukelsheim. Special mention should also be made of Anthony Brockwell, whose advice and support on computing matters was invaluable in the preparation of the new diskettes. We have been fortunate to work on the new edition in the excellent environments provided by the University of Melbourne and Colorado State University. We thank Duane Boes particularly for his support and encouragement throughout, and the Australian Research Council and National Science Foundation for their support of research related to the new material. We are also indebted to Springer-Verlag for their constant support and assistance in preparing the second edition.
Fort Collins, Colorado November, 1990
P.J. BROCKWELL
R.A. DAVIS
* ITSM: An Interactive Time Series Modelling Package for the PC by P.J. Brockwell and R.A. Davis. ISBN: 0-387-97482-2; 1991.
Note added in the eighth printing: The computer programs referred to in the text have now been superseded by the package ITSM2000, the student version of which accompanies our other text, Introduction to Time Series and Forecasting, also published by Springer-Verlag. Enquiries regarding purchase of the professional version of this package should be sent to pjbrockwell@cs.com.
Preface to the First Edition
We have attempted in this book to give a systematic account of linear time series models and their application to the modelling and prediction of data collected sequentially in time. The aim is to provide specific techniques for handling data and at the same time to provide a thorough understanding of the mathematical basis for the techniques. Both time and frequency domain methods are discussed but the book is written in such a way that either approach could be emphasized. The book is intended to be a text for graduate students in statistics, mathematics, engineering, and the natural or social sciences. It has been used both at the M.S. level, emphasizing the more practical aspects of modelling, and at the Ph.D. level, where the detailed mathematical derivations of the deeper results can be included.
Distinctive features of the book are the extensive use of elementary Hilbert space methods and recursive prediction techniques based on innovations, use of the exact Gaussian likelihood and AIC for inference, a thorough treatment of the asymptotic behavior of the maximum likelihood estimators of the coefficients of univariate ARMA models, extensive illustrations of the techniques by means of numerical examples, and a large number of problems for the reader. The companion diskette contains programs written for the IBM PC, which can be used to apply the methods described in the text. Data sets can be found in the Appendix, and a more extensive collection (including most of those used for the examples in Chapters 1, 9, 10, 11 and 12) is on the diskette. Simulated ARMA series can easily be generated and filed using the program PEST. Valuable sources of additional time-series data are the collections of Makridakis et al. (1984) and Working Paper 109 (1984) of Scientific Computing Associates, DeKalb, Illinois.
Most of the material in the book is by now well-established in the time series literature and we have therefore not attempted to give credit for all the
results discussed. Our indebtedness to the authors of some of the well-known existing books on time series, in particular Anderson, Box and Jenkins, Fuller, Grenander and Rosenblatt, Hannan, Koopmans and Priestley will however be apparent. We were also fortunate to have access to notes on time series by W. Dunsmuir. To these and to the many other sources that have influenced our presentation of the subject we express our thanks.
Recursive techniques based on the Kalman filter and state-space representations of ARMA processes have played an important role in many recent developments in time series analysis. In particular the Gaussian likelihood of a time series can be expressed very simply in terms of the one-step linear predictors and their mean squared errors, both of which can be computed recursively using a Kalman filter. Instead of using a state-space representation for recursive prediction we utilize the innovations representation of an arbitrary Gaussian time series in order to compute best linear predictors and exact Gaussian likelihoods. This approach, developed by Rissanen and Barbosa, Kailath, Ansley and others, expresses the value of the series at time t in terms of the one-step prediction errors up to that time. This representation provides insight into the structure of the time series itself as well as leading to simple algorithms for simulation, prediction and likelihood calculation.
These algorithms are used in the parameter estimation program (PEST) found on the companion diskette. Given a data set of up to 2300 observations, the program can be used to find preliminary, least squares and maximum Gaussian likelihood estimators of the parameters of any prescribed ARIMA model for the data, and to predict future values. It can also be used to simulate values of an ARMA process and to compute and plot its theoretical autocovariance and spectral density functions. Data can be plotted, differenced, deseasonalized and detrended. The program will also plot the sample autocorrelation and partial autocorrelation functions of both the data itself and the residuals after model-fitting. The other time-series programs are SPEC, which computes spectral estimates for univariate or bivariate series based on the periodogram, and TRANS, which can be used either to compute and plot the sample cross-correlation function of two series, or to perform least squares estimation of the coefficients in a transfer function model relating the second series to the first (see Section 12.2). Also included on the diskette is a screen editing program (WORD6), which can be used to create arbitrary data files, and a collection of data files, some of which are analyzed in the book. Instructions for the use of these programs are contained in the file HELP on the diskette.
For a one-semester course on time-domain analysis and modelling at the M.S. level, we have used the following sections of the book:
1.1-1.6; 2.1-2.7; 3.1-3.5; 5.1-5.5; 7.1, 7.2; 8.1-8.9; 9.1-9.6
(with brief reference to Sections 4.2 and 4.4). The prerequisite for this course is a knowledge of probability and statistics at the level of the book Introduction to the Theory of Statistics by Mood, Graybill and Boes.
For a second semester, emphasizing frequency-domain analysis and multivariate series, we have used
4.1-4.4, 4.6-4.10; 10.1-10.7; 11.1-11.7; selections from Chap. 12.
At the M.S. level it has not been possible (or desirable) to go into the mathematical derivation of all the results used, particularly those in the starred sections, which require a stronger background in mathematical analysis and measure theory. Such a background is assumed in all of the starred sections and problems.
For Ph.D. students the book has been used as the basis for a more theoretical one-semester course covering the starred sections from Chapters 4 through 11 and parts of Chapter 12. The prerequisite for this course is a knowledge of measure-theoretic probability.
We are greatly indebted to E.J. Hannan, R.H. Jones, S.I. Resnick, S. Tavaré and D. Tjøstheim, whose comments on drafts of Chapters 1-8 led to substantial improvements. The book arose out of courses taught in the statistics department at Colorado State University and benefitted from the comments of many students. The development of the computer programs would not have been possible without the outstanding work of Joe Mandarino, the architect of the computer program PEST, and Anthony Brockwell, who contributed WORD6, graphics subroutines and general computing expertise. We are indebted also to the National Science Foundation for support for the research related to the book, and one of us (P.J.B.) to Kuwait University for providing an excellent environment in which to work on the early chapters. For permission to use the optimization program UNC22MIN we thank R. Schnabel of the University of Colorado computer science department. Finally we thank Pam Brockwell, whose contributions to the manuscript went far beyond those of typist, and the editors of Springer-Verlag, who showed great patience and cooperation in the final production of the book.
Fort Collins, Colorado October 1986
P.J. BROCKWELL
R.A. DAVIS
Contents

CHAPTER 1
Stationary Time Series
§1.1 Examples of Time Series
§1.2 Stochastic Processes
§1.3 Stationarity and Strict Stationarity
§1.4 The Estimation and Elimination of Trend and Seasonal Components
§1.5 The Autocovariance Function of a Stationary Process
§1.6 The Multivariate Normal Distribution
§1.7* Applications of Kolmogorov's Theorem
Problems

CHAPTER 2
Hilbert Spaces
§2.1 Inner-Product Spaces and Their Properties
§2.2 Hilbert Spaces
§2.3 The Projection Theorem
§2.4 Orthonormal Sets
§2.5 Projection in ℝ^n
§2.6 Linear Regression and the General Linear Model
§2.7 Mean Square Convergence, Conditional Expectation and Best Linear Prediction in L²(Ω, ℱ, P)
§2.8 Fourier Series
§2.9 Hilbert Space Isomorphisms
§2.10* The Completeness of L²(Ω, ℱ, P)
§2.11* Complementary Results for Fourier Series
Problems

CHAPTER 3
Stationary ARMA Processes  77
§3.1 Causal and Invertible ARMA Processes  77
§3.2 Moving Average Processes of Infinite Order  89
§3.3 Computing the Autocovariance Function of an ARMA(p, q) Process  91
§3.4 The Partial Autocorrelation Function  98
§3.5 The Autocovariance Generating Function  103
§3.6* Homogeneous Linear Difference Equations with Constant Coefficients  105
Problems  110

CHAPTER 4
The Spectral Representation of a Stationary Process  114
§4.1 Complex-Valued Stationary Time Series  114
§4.2 The Spectral Distribution of a Linear Combination of Sinusoids  116
§4.3 Herglotz's Theorem  117
§4.4 Spectral Densities and ARMA Processes  122
§4.5* Circulants and Their Eigenvalues  133
§4.6* Orthogonal Increment Processes on [−π, π]  138
§4.7* Integration with Respect to an Orthogonal Increment Process  140
§4.8* The Spectral Representation  143
§4.9* Inversion Formulae  150
§4.10* Time-Invariant Linear Filters  152
§4.11* Properties of the Fourier Approximation h_n to I_(ν,ω]  157
Problems  159

CHAPTER 5
Prediction of Stationary Processes  166
§5.1 The Prediction Equations in the Time Domain  166
§5.2 Recursive Methods for Computing Best Linear Predictors  169
§5.3 Recursive Prediction of an ARMA(p, q) Process  175
§5.4 Prediction of a Stationary Gaussian Process; Prediction Bounds  182
§5.5 Prediction of a Causal Invertible ARMA Process in Terms of X_j, −∞ < j ≤ n  182
§5.6* Prediction in the Frequency Domain  185
§5.7* The Wold Decomposition  187
§5.8* Kolmogorov's Formula  191
Problems  192

CHAPTER 6*
Asymptotic Theory  198
§6.1 Convergence in Probability  198
§6.2 Convergence in r-th Mean, r > 0  202
§6.3 Convergence in Distribution  204
§6.4 Central Limit Theorems and Related Results  209
Problems  215

CHAPTER 7
Estimation of the Mean and the Autocovariance Function  218
§7.1 Estimation of μ  218
§7.2 Estimation of γ(·) and ρ(·)  220
§7.3* Derivation of the Asymptotic Distributions  225
Problems  236

CHAPTER 8
Estimation for ARMA Models  238
§8.1 The Yule-Walker Equations and Parameter Estimation for Autoregressive Processes  239
§8.2 Preliminary Estimation for Autoregressive Processes Using the Durbin-Levinson Algorithm  241
§8.3 Preliminary Estimation for Moving Average Processes Using the Innovations Algorithm  245
§8.4 Preliminary Estimation for ARMA(p, q) Processes  250
§8.5 Remarks on Asymptotic Efficiency  253
§8.6 Recursive Calculation of the Likelihood of an Arbitrary Zero-Mean Gaussian Process  254
§8.7 Maximum Likelihood and Least Squares Estimation for ARMA Processes  256
§8.8 Asymptotic Properties of the Maximum Likelihood Estimators  258
§8.9 Confidence Intervals for the Parameters of a Causal Invertible ARMA Process  260
§8.10* Asymptotic Behavior of the Yule-Walker Estimates  262
§8.11* Asymptotic Normality of Parameter Estimators  265
Problems  269

CHAPTER 9
Model Building and Forecasting with ARIMA Processes  273
§9.1 ARIMA Models for Non-Stationary Time Series  274
§9.2 Identification Techniques  284
§9.3 Order Selection  301
§9.4 Diagnostic Checking  306
§9.5 Forecasting ARIMA Models  314
§9.6 Seasonal ARIMA Models  320
Problems  326

CHAPTER 10
Inference for the Spectrum of a Stationary Process  330
§10.1 The Periodogram  331
§10.2 Testing for the Presence of Hidden Periodicities  334
§10.3 Asymptotic Properties of the Periodogram  342
§10.4 Smoothing the Periodogram  350
§10.5 Confidence Intervals for the Spectrum  362
§10.6 Autoregressive, Maximum Entropy, Moving Average and Maximum Likelihood ARMA Spectral Estimators  365
§10.7 The Fast Fourier Transform (FFT) Algorithm  373
§10.8* Derivation of the Asymptotic Behavior of the Maximum Likelihood and Least Squares Estimators of the Coefficients of an ARMA Process
Problems

CHAPTER 11
Multivariate Time Series  401
§11.1 Second Order Properties of Multivariate Time Series  402
§11.2 Estimation of the Mean and Covariance Function  405
§11.3 Multivariate ARMA Processes  417
§11.4 Best Linear Predictors of Second Order Random Vectors  421
§11.5 Estimation for Multivariate ARMA Processes  430
§11.6 The Cross Spectrum  434
§11.7 Estimating the Cross Spectrum  443
§11.8* The Spectral Representation of a Multivariate Stationary Time Series  454
Problems  459

CHAPTER 12
State-Space Models and the Kalman Recursions
§12.1 State-Space Models
§12.2 The Kalman Recursions
§12.3 State-Space Models with Missing Observations
§12.4 Controllability and Observability
§12.5 Recursive Bayesian State Estimation
Problems

CHAPTER 13
Further Topics
§13.1 Transfer Function Modelling
§13.2 Long Memory Processes
§13.3 Linear Processes with Infinite Variance
§13.4 Threshold Models
Problems

CHAPTER 1
Stationary Time Series
In this chapter we introduce some basic ideas of time series analysis and stochastic processes. Of particular importance are the concepts of stationarity and the autocovariance and sample autocovariance functions. Some standard techniques are described for the estimation and removal of trend and seasonality (of known period) from an observed series. These are illustrated with reference to the data sets in Section 1.1. Most of the topics covered in this chapter will be developed more fully in later sections of the book. The reader who is not already familiar with random vectors and multivariate analysis should first read Section 1.6 where a concise account of the required background is given. Notice our convention that an n-dimensional random vector is assumed (unless specified otherwise) to be a column vector X = (X_1, X_2, ..., X_n)' of random variables. If S is an arbitrary set then we shall use the notation S^n to denote both the set of n-component column vectors with components in S and the set of n-component row vectors with components in S.

§1.1 Examples of Time Series

A time series is a set of observations x_t, each one being recorded at a specified time t. A discrete-time series (the type to which this book is primarily devoted) is one in which the set T_0 of times at which observations are made is a discrete set, as is the case for example when observations are made at fixed time intervals. Continuous-time series are obtained when observations are recorded continuously over some time interval, e.g. when T_0 = [0, 1]. We shall use the notation x(t) rather than x_t if we wish to indicate specifically that observations are recorded continuously.
EXAMPLE 1.1.1 (Current Through a Resistor). If a sinusoidal voltage v(t) = a cos(νt + θ) is applied to a resistor of resistance r and the current recorded continuously we obtain a continuous time series

x(t) = r^{-1} a cos(νt + θ).

If observations are made only at times 1, 2, ..., the resulting time series will be discrete. Time series of this particularly simple type will play a fundamental role in our later study of stationary time series.

Figure 1.1. 100 observations of the series x_t = cos(0.2t + π/3).
EXAMPLE 1.1.2 (Population x_t of the U.S.A., 1790-1980).

t       x_t             t       x_t
1790     3,929,214      1890    62,979,766
1800     5,308,483      1900    76,212,168
1810     7,239,881      1910    92,228,496
1820     9,638,453      1920   106,021,537
1830    12,860,702      1930   123,202,624
1840    17,063,353      1940   132,164,569
1850    23,191,876      1950   151,325,798
1860    31,443,321      1960   179,323,175
1870    38,558,371      1970   203,302,031
1880    50,189,209      1980   226,545,805

Figure 1.2. Population of the U.S.A. at ten-year intervals, 1790-1980 (U.S. Bureau of the Census).
EXAMPLE 1.1.3 (Strikes in the U.S.A., 1951-1980).

t       x_t     t       x_t
1951    4737    1966    4405
1952    5117    1967    4595
1953    5091    1968    5045
1954    3468    1969    5700
1955    4320    1970    5716
1956    3825    1971    5138
1957    3673    1972    5010
1958    3694    1973    5353
1959    3708    1974    6074
1960    3333    1975    5031
1961    3367    1976    5648
1962    3614    1977    5506
1963    3362    1978    4230
1964    3655    1979    4827
1965    3963    1980    3885

Figure 1.3. Strikes in the U.S.A., 1951-1980 (Bureau of Labor Statistics, U.S. Labor Department).
EXAMPLE 1.1.4 (All-Star Baseball Games, 1933-1980). For t = 1933, ..., 1980, let

x_t = 1 if the National League won in year t,
x_t = -1 if the American League won in year t,

with * denoting a year in which no game was played.

[Table of results x_t, t = 1933, ..., 1980; the values are plotted in Figure 1.4.]

Figure 1.4. Results x_t, Example 1.1.4, of All-Star baseball games, 1933-1980.
EXAMPLE 1.1.5 (Wolfer Sunspot Numbers, 1770-1869).

t      x_t    t      x_t    t      x_t    t      x_t    t      x_t
1770   101    1790    90    1810     0    1830    71    1850    66
1771    82    1791    67    1811     1    1831    48    1851    64
1772    66    1792    60    1812     5    1832    28    1852    54
1773    35    1793    47    1813    12    1833     8    1853    39
1774    31    1794    41    1814    14    1834    13    1854    21
1775     7    1795    21    1815    35    1835    57    1855     7
1776    20    1796    16    1816    46    1836   122    1856     4
1777    92    1797     6    1817    41    1837   138    1857    23
1778   154    1798     4    1818    30    1838   103    1858    55
1779   125    1799     7    1819    24    1839    86    1859    94
1780    85    1800    14    1820    16    1840    63    1860    96
1781    68    1801    34    1821     7    1841    37    1861    77
1782    38    1802    45    1822     4    1842    24    1862    59
1783    23    1803    43    1823     2    1843    11    1863    44
1784    10    1804    48    1824     8    1844    15    1864    47
1785    24    1805    42    1825    17    1845    40    1865    30
1786    83    1806    28    1826    36    1846    62    1866    16
1787   132    1807    10    1827    50    1847    98    1867     7
1788   131    1808     8    1828    62    1848   124    1868    37
1789   118    1809     2    1829    67    1849    96    1869    74

Figure 1.5. The Wolfer sunspot numbers, 1770-1869.
EXAMPLE 1.1.6 (Monthly Accidental Deaths in the U.S.A., 1973-1978).

        1973    1974    1975    1976    1977    1978
Jan.    9007    7750    8162    7717    7792    7836
Feb.    8106    6981    7306    7461    6957    6892
Mar.    8928    8038    8124    7776    7726    7791
Apr.    9137    8422    7870    7925    8106    8129
May    10017    8714    9387    8634    8890    9115
Jun.   10826    9512    9556    8945    9299    9434
Jul.   11317   10120   10093   10078   10625   10484
Aug.   10744    9823    9620    9179    9302    9827
Sep.    9713    8743    8285    8037    8314    9110
Oct.    9938    9129    8433    8488    8850    9070
Nov.    9161    8710    8160    7874    8265    8633
Dec.    8927    8680    8034    8647    8796    9240

Figure 1.6. Monthly accidental deaths in the U.S.A., 1973-1978 (National Safety Council).
These examples are of course but a few of the multitude of time series to be found in the fields of engineering, science, sociology and economics. Our purpose in this book is to study the techniques which have been developed for drawing inferences from such series. Before we can do this however, it is necessary to set up a hypothetical mathematical model to represent the data. Having chosen a model (or family of models) it then becomes possible to estimate parameters, check for goodness of fit to the data and possibly to use the fitted model to enhance our understanding of the mechanism generating the series. Once a satisfactory model has been developed, it may be used in a variety of ways depending on the particular field of application. The applications include separation (filtering) of noise from signals, prediction of future values of a series and the control of future values.
The six examples given show some rather striking differences which are apparent if one examines the graphs in Figures 1.1-1.6. The first gives rise to a smooth sinusoidal graph oscillating about a constant level, the second to a roughly exponentially increasing graph, the third to a graph which fluctuates erratically about a nearly constant or slowly rising level, and the fourth to an erratic series of minus ones and ones. The fifth graph appears to have a strong cyclic component with period about 11 years and the last has a pronounced seasonal component with period 12.
In the next section we shall discuss the general problem of constructing mathematical models for such data.
§1.2 Stochastic Processes

The first step in the analysis of a time series is the selection of a suitable mathematical model (or class of models) for the data. To allow for the possibly unpredictable nature of future observations it is natural to suppose that each observation x_t is a realized value of a certain random variable X_t. The time series {x_t, t ∈ T_0} is then a realization of the family of random variables {X_t, t ∈ T_0}. These considerations suggest modelling the data as a realization (or part of a realization) of a stochastic process {X_t, t ∈ T} where T ⊇ T_0. To clarify these ideas we need to define precisely what is meant by a stochastic process and its realizations. In later sections we shall restrict attention to special classes of processes which are particularly useful for modelling many of the time series which are encountered in practice.

Definition 1.2.1 (Stochastic Process). A stochastic process is a family of random variables {X_t, t ∈ T} defined on a probability space (Ω, ℱ, P).
Remark 1. In time series analysis the index (or parameter) set T is a set of time points, very often {0, ±1, ±2, ...}, {1, 2, 3, ...}, [0, ∞) or (−∞, ∞). Stochastic processes in which T is not a subset of ℝ are also of importance. For example in geophysics stochastic processes with T the surface of a sphere are used to represent variables indexed by their location on the earth's surface. In this book however the index set T will always be a subset of ℝ.

Recalling the definition of a random variable we note that for each fixed t ∈ T, X_t is in fact a function X_t(·) on the set Ω. On the other hand, for each fixed ω ∈ Ω, X_.(ω) is a function on T.

Definition 1.2.2 (Realizations of a Stochastic Process). The functions {X_.(ω), ω ∈ Ω} on T are known as the realizations or sample-paths of the process {X_t, t ∈ T}.
Remark 2. We shall frequently use the term time series to mean both the data and the process of which it is a realization.
The following examples illustrate the realizations of some specific stochastic processes. The first two could be considered as possible models for the time series of Examples 1.1.1 and 1.1.4 respectively.

EXAMPLE 1.2.1 (Sinusoid with Random Phase and Amplitude). Let A and Θ be independent random variables with A ≥ 0 and Θ distributed uniformly on (0, 2π). A stochastic process {X(t), t ∈ ℝ} can then be defined in terms of A and Θ for any given ν ≥ 0 and r > 0 by

X_t = r^{-1} A cos(νt + Θ),    (1.2.1)

or more explicitly,

X_t(ω) = r^{-1} A(ω) cos(νt + Θ(ω)),    (1.2.2)

where ω is an element of the probability space Ω on which A and Θ are defined. The realizations of the process defined by (1.2.2) are the functions of t obtained by fixing ω, i.e. functions of the form

x(t) = r^{-1} a cos(νt + θ).

The time series plotted in Figure 1.1 is one such realization.
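A realization of (1.2.2) is obtained by drawing A and Θ once and then evaluating a deterministic function of t. A minimal NumPy sketch (purely illustrative and not part of the original text; the exponential law chosen for A is an arbitrary non-negative distribution, since the example only requires A ≥ 0) is:

```python
import numpy as np

rng = np.random.default_rng(0)

def sinusoid_realization(n=100, r=1.0, nu=0.2, rng=rng):
    """One sample path of X_t = r**-1 * A * cos(nu*t + Theta), t = 1, ..., n,
    with A and Theta drawn once for the whole path."""
    A = rng.exponential(1.0)             # any non-negative amplitude distribution
    theta = rng.uniform(0.0, 2 * np.pi)  # phase uniform on (0, 2*pi)
    t = np.arange(1, n + 1)
    return A * np.cos(nu * t + theta) / r

print(sinusoid_realization()[:5])
```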
EXAMPLE 1.2.2 (A Binary Process). Let {X_t, t = 1, 2, ...} be a sequence of independent random variables for each of which

P(X_t = 1) = P(X_t = −1) = 1/2.    (1.2.3)

In this case it is not so obvious as in Example 1.2.1 that there exists a probability space (Ω, ℱ, P) with random variables X_1, X_2, ... defined on Ω having the required joint distributions, i.e. such that

P(X_1 = i_1, ..., X_n = i_n) = 2^{-n}    (1.2.4)

for every n-tuple (i_1, ..., i_n) of 1's and −1's. The existence of such a process is however guaranteed by Kolmogorov's theorem which is stated below and discussed further in Section 1.7.
The time series obtained by tossing a penny repeatedly and scoring +1 for each head, −1 for each tail is usually modelled as a realization of the process defined by (1.2.4). Each realization of this process is a sequence of 1's and −1's.
A priori we might well consider this process as a model for the All-Star baseball games, Example 1.1.4. However even a cursory inspection of the results from 1963 onwards casts serious doubt on the hypothesis P(X_t = 1) = 1/2.
EXAMPLE 1.2.3 (Random Walk). The simple symmetric random walk {S_t, t = 0, 1, 2, ...} is defined in terms of Example 1.2.2 by S_0 = 0 and

S_t = X_1 + X_2 + ... + X_t,  t ≥ 1.    (1.2.5)

The general random walk is defined in the same way on replacing X_1, X_2, ... by a sequence of independently and identically distributed random variables whose distribution is not constrained to satisfy (1.2.3). The existence of such an independent sequence is again guaranteed by Kolmogorov's theorem (see Problem 1.18).
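A simulated sample path of the simple symmetric random walk (1.2.5) can be produced in a few lines; the following sketch (illustrative only, assuming NumPy) uses exactly the step distribution (1.2.3).

```python
import numpy as np

rng = np.random.default_rng(1)

def simple_random_walk(n, rng=rng):
    """Return S_0, S_1, ..., S_n with S_0 = 0 and i.i.d. steps
    satisfying P(X_t = 1) = P(X_t = -1) = 1/2."""
    steps = rng.choice([-1, 1], size=n)
    return np.concatenate(([0], np.cumsum(steps)))

print(simple_random_walk(10))
```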
EXAMPLE 1.2.4 (Branching Processes). There is a large class of processes, known as branching processes, which in their most general form have been applied with considerable success to the modelling of population growth (see for example Jagers (1976)). The simplest such process is the Bienaymé-Galton-Watson process defined by the equations X_0 = x (the population size in generation zero) and

X_{t+1} = Z_{t,1} + Z_{t,2} + ... + Z_{t,X_t},  t = 0, 1, 2, ...,    (1.2.6)

where Z_{t,j}, t = 0, 1, ..., j = 1, 2, ..., are independently and identically distributed non-negative integer-valued random variables, Z_{t,j} representing the number of offspring of the j-th individual born in generation t.
In the first example we were able to define X_t(ω) quite explicitly for each t and ω. Very frequently however we may wish (or be forced) to specify instead the collection of all joint distributions of all finite-dimensional vectors (X_{t_1}, X_{t_2}, ..., X_{t_n}), t = (t_1, ..., t_n)' ∈ T^n, n ∈ {1, 2, ...}. In such a case we need to be sure that a stochastic process (see Definition 1.2.1) with the specified distributions really does exist. Kolmogorov's theorem, which we state here and discuss further in Section 1.7, guarantees that this is true under minimal conditions on the specified distribution functions. Our statement of Kolmogorov's theorem is simplified slightly by the assumption (Remark 1) that T is a subset of ℝ and hence a linearly ordered set. If T were not so ordered an additional "permutation" condition would be required (a statement and proof of the theorem for arbitrary T can be found in numerous books on probability theory, for example Lamperti, 1966).
Definition 1.2.3 (The Distribution Functions of a Stochastic Process {X_t, t ∈ T ⊆ ℝ}). Let 𝒯 be the set of all vectors {t = (t_1, ..., t_n)' ∈ T^n: t_1 < t_2 < ... < t_n, n = 1, 2, ...}. Then the (finite-dimensional) distribution functions of {X_t, t ∈ T} are the functions {F_t(·), t ∈ 𝒯} defined for t = (t_1, ..., t_n)' by

F_t(x) = P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n),  x = (x_1, ..., x_n)' ∈ ℝ^n.    (1.2.7)

Theorem 1.2.1 (Kolmogorov's Theorem). The probability distribution functions {F_t(·), t ∈ 𝒯} are the distribution functions of some stochastic process if and only if for any n ∈ {1, 2, ...}, t = (t_1, ..., t_n)' ∈ 𝒯 and 1 ≤ i ≤ n,

lim_{x_i → ∞} F_t(x) = F_{t(i)}(x(i)),    (1.2.8)

where t(i) and x(i) are the (n − 1)-component vectors obtained by deleting the i-th components of t and x respectively.
If φ_t(·) is the characteristic function corresponding to F_t(·), i.e.

φ_t(u) = ∫_{ℝ^n} exp(i u'x) F_t(dx_1, ..., dx_n),  u = (u_1, ..., u_n)' ∈ ℝ^n,

then (1.2.8) can be restated in the equivalent form,

lim_{u_i → 0} φ_t(u) = φ_{t(i)}(u(i)),    (1.2.9)

where u(i) is the (n − 1)-component vector obtained by deleting the i-th component of u.
Condition (1.2.8) is simply the "consistency" requirement that each function F_t(·) should have marginal distributions which coincide with the specified lower dimensional distribution functions.
§1.3 Stationarity and Strict Stationarity

When dealing with a finite number of random variables, it is often useful to compute the covariance matrix (see Section 1.6) in order to gain insight into the dependence between them. For a time series {X_t, t ∈ T} we need to extend the concept of covariance matrix to deal with infinite collections of random variables. The autocovariance function provides us with the required extension.

Definition 1.3.1 (The Autocovariance Function). If {X_t, t ∈ T} is a process such that Var(X_t) < ∞ for each t ∈ T, then the autocovariance function γ_X(·, ·) of {X_t} is defined by

γ_X(r, s) = Cov(X_r, X_s) = E[(X_r − EX_r)(X_s − EX_s)],  r, s ∈ T.    (1.3.1)
Definition 1.3.2 (Stationarity). The time series {X_t, t ∈ ℤ}, with index set ℤ = {0, ±1, ±2, ...}, is said to be stationary if

(i) E|X_t|² < ∞ for all t ∈ ℤ,
(ii) EX_t = m for all t ∈ ℤ, and
(iii) γ_X(r, s) = γ_X(r + t, s + t) for all r, s, t ∈ ℤ.

Remark 1. Stationarity as just defined is frequently referred to in the literature as weak stationarity, covariance stationarity, stationarity in the wide sense or second-order stationarity. For us however the term stationarity, without further qualification, will always refer to the properties specified by Definition 1.3.2.

Remark 2. If {X_t, t ∈ ℤ} is stationary then γ_X(r, s) = γ_X(r − s, 0) for all r, s ∈ ℤ. It is therefore convenient to redefine the autocovariance function of a stationary process as the function of just one variable,

γ_X(h) ≡ γ_X(h, 0) = Cov(X_{t+h}, X_t) for all t, h ∈ ℤ.

The function γ_X(·) will be referred to as the autocovariance function of {X_t} and γ_X(h) as its value at "lag" h. The autocorrelation function (acf) of {X_t} is defined analogously as the function whose value at lag h is

ρ_X(h) = γ_X(h)/γ_X(0) = Corr(X_{t+h}, X_t) for all t, h ∈ ℤ.

Remark 3. It will be noticed that we have defined stationarity only in the case when T = ℤ. It is not difficult to define stationarity using a more general index set, but for our purposes this will not be necessary. If we wish to model a set of data {x_t, t ∈ T ⊂ ℤ} as a realization of a stationary process, we can always consider it to be part of a realization of a stationary process {X_t, t ∈ ℤ}.
Another important and frequently used notion of stationarity is introduced in the following definition.

Definition 1.3.3 (Strict Stationarity). The time series {X_t, t ∈ ℤ} is said to be strictly stationary if the joint distributions of (X_{t_1}, ..., X_{t_k})' and (X_{t_1+h}, ..., X_{t_k+h})' are the same for all positive integers k and for all t_1, ..., t_k, h ∈ ℤ.
Strict stationarity means intuitively that the graphs over two equal-length time intervals of a realization of the time series should exhibit similar statistical characteristics. For example, the proportion of ordinates not exceeding a given level x should be roughly the same for both intervals.
Remark 4. Definition 1.3.3 is equivalent to the statement that (X_1, ..., X_k)' and (X_{1+h}, ..., X_{k+h})' have the same joint distribution for all positive integers k and integers h.
The Relation Between Stationarity and Strict Stationarity
If {X_t} is strictly stationary it immediately follows, on taking k = 1 in Definition 1.3.3, that X_t has the same distribution for each t ∈ ℤ. If E|X_t|² < ∞ this implies in particular that EX_t and Var(X_t) are both constant. Moreover, taking k = 2 in Definition 1.3.3, we find that X_{t+h} and X_t have the same joint distribution and hence the same covariance for all h ∈ ℤ. Thus a strictly stationary process with finite second moments is stationary.
The converse of the previous statement is not true. For example if {X_t} is a sequence of independent random variables such that X_t is exponentially distributed with mean one when t is odd and normally distributed with mean one and variance one when t is even, then {X_t} is stationary with γ_X(0) = 1 and γ_X(h) = 0 for h ≠ 0. However since X_1 and X_2 have different distributions, {X_t} cannot be strictly stationary.
There is one important case however in which stationarity does imply strict stationarity.
Definition 1.3.4 (Gaussian Time Series). The process {X_t} is a Gaussian time series if and only if the distribution functions of {X_t} are all multivariate normal.
If {X_t, t ∈ ℤ} is a stationary Gaussian process then {X_t} is strictly stationary, since for all n ∈ {1, 2, ...} and for all h, t_1, t_2, ... ∈ ℤ, the random vectors (X_{t_1}, ..., X_{t_n})' and (X_{t_1+h}, ..., X_{t_n+h})' have the same mean and covariance matrix, and hence the same distribution.
EXAMPLE 1.3.1. Let X_t = A cos(θt) + B sin(θt) where A and B are two uncorrelated random variables with zero means and unit variances and θ ∈ [−π, π]. This time series is stationary since

Cov(X_{t+h}, X_t) = Cov(A cos(θ(t + h)) + B sin(θ(t + h)), A cos(θt) + B sin(θt))
                  = cos(θt)cos(θ(t + h)) + sin(θt)sin(θ(t + h))
                  = cos(θh),

which is independent of t.

EXAMPLE 1.3.2. Starting with an independent and identically distributed sequence of zero-mean random variables Z_t with finite variance σ², define X_t = Z_t + θZ_{t−1}. Then the autocovariance function of X_t is given by

Cov(X_{t+h}, X_t) = Cov(Z_{t+h} + θZ_{t+h−1}, Z_t + θZ_{t−1})
                  = (1 + θ²)σ²  if h = 0,
                  = θσ²         if h = ±1,
                  = 0           if |h| > 1,
and hence {X_t} is stationary. In fact it can be shown that {X_t} is strictly stationary (see Problem 1.1).
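These moment calculations are easy to confirm numerically. The sketch below (an illustration added here, not part of the text) simulates X_t = Z_t + θZ_{t−1} with Gaussian noise and compares sample covariances at lags 0, 1 and 2 with (1 + θ²)σ², θσ² and 0.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma, n = 0.6, 1.0, 200_000

z = rng.normal(0.0, sigma, size=n + 1)   # i.i.d. zero-mean noise Z_t
x = z[1:] + theta * z[:-1]               # X_t = Z_t + theta * Z_{t-1}

def lag_cov(x, h):
    """Sample covariance between X_{t+h} and X_t."""
    return np.cov(x[h:], x[:len(x) - h])[0, 1] if h > 0 else np.var(x)

print(lag_cov(x, 0), (1 + theta**2) * sigma**2)  # close to (1 + theta^2) * sigma^2
print(lag_cov(x, 1), theta * sigma**2)           # close to theta * sigma^2
print(lag_cov(x, 2), 0.0)                        # close to 0
```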
EXAMPLE 1.3.3. Let

X_t = Y_t      if t is even,
X_t = Y_t + 1  if t is odd,

where {Y_t} is a stationary time series. Although Cov(X_{t+h}, X_t) = γ_Y(h), {X_t} is not stationary since it does not have a constant mean.

EXAMPLE 1.3.4. Referring to Example 1.2.3, let S_t be the random walk S_t = X_1 + X_2 + ... + X_t, where X_1, X_2, ... are independent and identically distributed with mean zero and variance σ². For h > 0,

Cov(S_{t+h}, S_t) = Cov(Σ_{i=1}^{t+h} X_i, Σ_{j=1}^{t} X_j) = tσ²,

which depends on t, and thus S_t is not stationary.

Stationary processes play a crucial role in the analysis of time series. Of course many observed time series (see Section 1.1) are decidedly non-stationary in appearance. Frequently such data sets can be transformed by the techniques described in Section 1.4 into series which can reasonably be modelled as realizations of some stationary process. The theory of stationary processes (developed in later chapters) is then used for the analysis, fitting and prediction of the resulting series. In all of this the autocovariance function is a primary tool. Its properties will be discussed in Section 1.5.

§1.4 The Estimation and Elimination of Trend and Seasonal Components

The first step in the analysis of any time series is to plot the data. If there are apparent discontinuities in the series, such as a sudden change of level, it may be advisable to analyze the series by first breaking it into homogeneous segments. If there are outlying observations, they should be studied carefully to check whether there is any justification for discarding them (as for example if an observation has been recorded from some other process by mistake). Inspection of a graph may also suggest the possibility of representing the data as a realization of the process (the "classical decomposition" model),
X_t = m_t + s_t + Y_t,    (1.4.1)

where m_t is a slowly changing function known as a "trend component", s_t is a function with known period d referred to as a "seasonal component", and Y_t is a "random noise component" which is stationary in the sense of Definition 1.3.2. If the seasonal and noise fluctuations appear to increase with the level of the process then a preliminary transformation of the data is often used to make the transformed data compatible with the model (1.4.1). See for example the airline passenger data, Figure 9.7, and the transformed data, Figure 9.8, obtained by applying a logarithmic transformation. In this section we shall discuss some useful techniques for identifying the components in (1.4.1).
Our aim is to estimate and extract the deterministic components m_t and s_t in the hope that the residual or noise component Y_t will turn out to be a stationary random process. We can then use the theory of such processes to find a satisfactory probabilistic model for the process {Y_t}, to analyze its properties, and to use it in conjunction with m_t and s_t for purposes of prediction and control of {X_t}.
An alternative approach, developed extensively by Box and Jenkins (1970), is to apply difference operators repeatedly to the data {x_t} until the differenced observations resemble a realization of some stationary process {W_t}. We can then use the theory of stationary processes for the modelling, analysis and prediction of {W_t} and hence of the original process. The various stages of this procedure will be discussed in detail in Chapters 8 and 9.
The two approaches to trend and seasonality removal, (a) by estimation of m_t and s_t in (1.4.1) and (b) by differencing the data {x_t}, will now be illustrated with reference to the data presented in Section 1.1.

Elimination of a Trend in the Absence of Seasonality

In the absence of a seasonal component the model (1.4.1) becomes

X_t = m_t + Y_t,  t = 1, ..., n,    (1.4.2)

where, without loss of generality, we can assume that EY_t = 0.

Method 1 (Least Squares Estimation of m_t). In this procedure we attempt to fit a parametric family of functions, e.g.

m_t = a_0 + a_1 t + a_2 t²,    (1.4.3)

to the data by choosing the parameters, in this illustration a_0, a_1 and a_2, to minimize Σ_t (x_t − m_t)².
Fitting a function of the form (1.4.3) to the population data of Figure 1.2, 1790 ≤ t ≤ 1980, gives the estimated parameter values

â_0 = 2.097911 × 10^10,
â_1 = −2.334962 × 10^7,
â_2 = 6.498591 × 10^3.

Figure 1.7. Population of the U.S.A., 1790-1980, showing the parabola fitted by least squares.

A graph of the fitted function is shown with the original data in Figure 1.7. The estimated values of the noise process Y_t, 1790 ≤ t ≤ 1980, are the residuals obtained by subtraction of m̂_t = â_0 + â_1 t + â_2 t² from x_t.
The trend component m̂_t furnishes us with a natural predictor of future values of X_t. For example if we estimate Y_1990 by its mean value (i.e. zero) we obtain the estimate

m̂_1990 = 2.484 × 10^8

for the population of the U.S.A. in 1990. However if the residuals {Ŷ_t} are highly correlated we may be able to use their values to give a better estimate of Y_1990 and hence of X_1990.
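Method 1 is ordinary least squares regression of x_t on (1, t, t²). The following NumPy sketch (illustrative, not the PEST program) fits (1.4.3) to the census data of Example 1.1.2, using the calendar year as the time index t as in the text; the coefficients should agree with the values quoted above up to the numerical conditioning of the raw-year design matrix.

```python
import numpy as np

# Population of the U.S.A. at ten-year intervals, 1790-1980 (Example 1.1.2).
years = np.arange(1790, 1990, 10)
pop = np.array([3929214, 5308483, 7239881, 9638453, 12860702, 17063353,
                23191876, 31443321, 38558371, 50189209, 62979766, 76212168,
                92228496, 106021537, 123202624, 132164569, 151325798,
                179323175, 203302031, 226545805], dtype=float)

# Least squares fit of m_t = a0 + a1*t + a2*t**2, equation (1.4.3).
X = np.vander(years.astype(float), 3, increasing=True)   # columns 1, t, t^2
a0, a1, a2 = np.linalg.lstsq(X, pop, rcond=None)[0]
print(a0, a1, a2)                                         # compare with the quoted values

# Predictor of X_1990 obtained by estimating Y_1990 by zero.
print(a0 + a1 * 1990 + a2 * 1990**2)                      # roughly 2.48e8
```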
Method 2 (Smoothing by Means of a Moving Average). Let q be a non-negative integer and consider the two-sided moving average

W_t = (2q + 1)^{-1} Σ_{j=−q}^{q} X_{t+j}    (1.4.4)

of the process {X_t} defined by (1.4.2). Then for q + 1 ≤ t ≤ n − q,

W_t = (2q + 1)^{-1} Σ_{j=−q}^{q} m_{t+j} + (2q + 1)^{-1} Σ_{j=−q}^{q} Y_{t+j} ≈ m_t,    (1.4.5)

assuming that m_t is approximately linear over the interval [t − q, t + q] and that the average of the error terms over this interval is close to zero.
The moving average thus provides us with the estimates

m̂_t = (2q + 1)^{-1} Σ_{j=−q}^{q} X_{t+j},  q + 1 ≤ t ≤ n − q.    (1.4.6)

Since X_t is not observed for t ≤ 0 or t > n we cannot use (1.4.6) for t ≤ q or t > n − q. The program SMOOTH deals with this problem by defining X_t := X_1 for t < 1 and X_t := X_n for t > n. The results of applying this program to the strike data of Figure 1.3 are shown in Figure 1.8. The estimated noise terms, Ŷ_t = X_t − m̂_t, are shown in Figure 1.9. As expected, they show no apparent trend.
For any fixed a ∈ [0, 1], the one-sided moving averages m̂_t, t = 1, ..., n, defined by the recursions

m̂_t = aX_t + (1 − a)m̂_{t−1},  t = 2, ..., n,    (1.4.7)

and

m̂_1 = X_1,    (1.4.8)

can also be computed using the program SMOOTH. Application of (1.4.7) and (1.4.8) is often referred to as exponential smoothing, since it follows from these recursions that, for t ≥ 2, m̂_t = Σ_{j=0}^{t−2} a(1 − a)^j X_{t−j} + (1 − a)^{t−1} X_1, a weighted moving average of X_t, X_{t−1}, ..., with weights decreasing exponentially (except for the last one).
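Both smoothing operations of Method 2, the centred moving average (1.4.6) with the endpoint convention used by SMOOTH and the exponential smoothing recursions (1.4.7)-(1.4.8), can be sketched as follows. This is an illustration rather than the SMOOTH program itself.

```python
import numpy as np

def moving_average_trend(x, q):
    """Estimate m_t by the (2q+1)-term moving average (1.4.6), extending the
    series by x_1 on the left and x_n on the right as SMOOTH does."""
    x = np.asarray(x, dtype=float)
    padded = np.concatenate((np.full(q, x[0]), x, np.full(q, x[-1])))
    window = np.ones(2 * q + 1) / (2 * q + 1)
    return np.convolve(padded, window, mode="valid")     # one estimate per observation

def exponential_smoothing(x, a):
    """One-sided smoothing (1.4.7)-(1.4.8): m_1 = x_1, m_t = a*x_t + (1-a)*m_{t-1}."""
    m = np.empty(len(x))
    m[0] = x[0]
    for t in range(1, len(x)):
        m[t] = a * x[t] + (1 - a) * m[t - 1]
    return m

# Strike data of Example 1.1.3, 1951-1980.
strikes = np.array([4737, 5117, 5091, 3468, 4320, 3825, 3673, 3694, 3708, 3333,
                    3367, 3614, 3362, 3655, 3963, 4405, 4595, 5045, 5700, 5716,
                    5138, 5010, 5353, 6074, 5031, 5648, 5506, 4230, 4827, 3885],
                   dtype=float)
print(moving_average_trend(strikes, q=2)[:5])   # 5-term moving average, cf. Figure 1.8
print(exponential_smoothing(strikes, a=0.4)[:5])
```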
Figure 1.8. Simple 5-term moving average m̂_t of the strike data from Figure 1.3.

Figure 1.9. Residuals, Ŷ_t = x_t − m̂_t, after subtracting the 5-term moving average from the strike data.

It is useful to think of {m̂_t} in (1.4.6) as a process obtained from {X_t} by application of a linear operator or linear filter, m̂_t = Σ_{j=−∞}^{∞} a_j X_{t+j}, with weights a_j = (2q + 1)^{-1} for −q ≤ j ≤ q and a_j = 0 for |j| > q. This particular filter is a "low-pass" filter since it takes the data {x_t} and removes from it the rapidly fluctuating (or high frequency) component {Ŷ_t}, to leave the slowly varying estimated trend term {m̂_t} (see Figure 1.10).

Figure 1.10. Smoothing with a low-pass linear filter.

The particular filter (1.4.6) is only one of many which could be used for smoothing. For large q, provided (2q + 1)^{-1} Σ_{j=−q}^{q} Y_{t+j} ≈ 0, it will not only attenuate noise but at the same time will allow linear trend functions m_t = at + b to pass without distortion. However we must beware of choosing q to be too large since, if m_t is not linear, the filtered process, although smooth, will not be a good estimate of m_t. By clever choice of the weights {a_j} it is possible to design a filter which will not only be effective in attenuating noise from the data, but which will also allow a larger class of trend functions (for example all polynomials of degree less than or equal to 3) to pass undistorted through the filter. The Spencer 15-point moving average for example has weights
a_j = 0 for |j| > 7,

with

a_j = a_{−j} for |j| ≤ 7, and

[a_0, a_1, ..., a_7] = (1/320)[74, 67, 46, 21, 3, −5, −6, −3].

Applied to the process (1.4.2) with m_t = at³ + bt² + ct + d, it gives

Σ_{i=−7}^{7} a_i X_{t+i} = Σ_{i=−7}^{7} a_i m_{t+i} + Σ_{i=−7}^{7} a_i Y_{t+i} ≈ Σ_{i=−7}^{7} a_i m_{t+i} = m_t,    (1.4.9)

where the last step depends on the assumed form of m_t (Problem 1.2). Further details regarding this and other smoothing filters can be found in Kendall and Stuart, Volume 3, Chapter 46.
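The Spencer 15-point moving average is a symmetric convolution, so it can be applied directly once the weight vector is assembled from the half-weights quoted above. The sketch below is illustrative only; it also verifies numerically that a cubic trend passes through the filter undistorted, as in (1.4.9).

```python
import numpy as np

# Half-weights [a_0, a_1, ..., a_7] of the Spencer 15-point moving average.
half = np.array([74, 67, 46, 21, 3, -5, -6, -3]) / 320.0
spencer = np.concatenate((half[:0:-1], half))   # a_{-7}, ..., a_0, ..., a_7
assert abs(spencer.sum() - 1.0) < 1e-12         # the weights sum to one

def spencer_filter(x):
    """Apply the Spencer 15-point filter; estimates are returned for t = 8, ..., n-7."""
    return np.convolve(np.asarray(x, dtype=float), spencer, mode="valid")

t = np.arange(1, 101)
m = 0.001 * t**3 - 0.05 * t**2 + t + 2.0             # a cubic trend
print(np.max(np.abs(spencer_filter(m) - m[7:-7])))   # essentially zero: cubics pass undistorted
```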
Method 3 (Differencing to Generate Stationary Data). Instead of attempting to remove the noise by smoothing as in Method 2, we now attempt to eliminate the trend term by differencing. We define the first difference operator ∇ by

∇X_t = X_t − X_{t−1} = (1 − B)X_t,    (1.4.10)

where B is the backward shift operator,

BX_t = X_{t−1}.    (1.4.11)

Powers of the operators B and ∇ are defined in the obvious way, i.e. B^j(X_t) = X_{t−j} and ∇^j(X_t) = ∇(∇^{j−1}(X_t)), j ≥ 1, with ∇^0(X_t) = X_t. Polynomials in B and ∇ are manipulated in precisely the same way as polynomial functions of real variables. For example

∇²X_t = ∇(∇X_t) = (1 − B)(1 − B)X_t = (1 − 2B + B²)X_t = X_t − 2X_{t−1} + X_{t−2}.

If the operator ∇ is applied to a linear trend function m_t = at + b, then we obtain the constant function ∇m_t = a. In the same way any polynomial trend of degree k can be reduced to a constant by application of the operator ∇^k (Problem 1.4). Starting therefore with the model X_t = m_t + Y_t, where m_t = Σ_{j=0}^{k} a_j t^j and Y_t is stationary with mean zero, we obtain

∇^k X_t = k! a_k + ∇^k Y_t,

a stationary process with mean k! a_k. These considerations suggest the possibility, given any sequence {x_t} of data, of applying the operator ∇ repeatedly until we find a sequence {∇^k x_t} which can plausibly be modelled as a realization of a stationary process.

Figure 1.11. The twice-differenced series derived from the population data of Figure 1.2.

It is often found in practice that the
order k of differencing required is quite small, frequently one or two. (This depends on the fact that many functions can be well approximated, on an interval of finite length, by a polynomial of reasonably low degree.)
Applying this technique to the twenty population values {x_n, n = 1, ..., 20} of Figure 1.2 we find that two differencing operations are sufficient to produce a series with no apparent trend. The differenced data, ∇²x_n = x_n − 2x_{n−1} + x_{n−2}, are plotted in Figure 1.11. Notice that the magnitude of the fluctuations in ∇²x_n increases with the value of x_n. This effect can be suppressed by first taking natural logarithms, y_n = ln x_n, and then applying the operator ∇² to the series {y_n}. (See also Section 9.2(a).)
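Repeated differencing is a one-line operation in NumPy; the following illustrative sketch forms ∇²x_n for the census data and the same operation after the logarithmic transformation y_n = ln x_n mentioned above.

```python
import numpy as np

pop = np.array([3929214, 5308483, 7239881, 9638453, 12860702, 17063353,
                23191876, 31443321, 38558371, 50189209, 62979766, 76212168,
                92228496, 106021537, 123202624, 132164569, 151325798,
                179323175, 203302031, 226545805], dtype=float)

d2 = np.diff(pop, n=2)              # second differences, as plotted in Figure 1.11
d2_log = np.diff(np.log(pop), n=2)  # second differences of y_n = ln(x_n)
print(d2[:4])
print(d2_log[:4])
```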
Elimination of both Trend and Seasonality
The methods described for the removal of trend can be adapted in a natural way to eliminate both trend and seasonality in the general model

X_t = m_t + s_t + Y_t,    (1.4.12)

where EY_t = 0, s_{t+d} = s_t and Σ_{j=1}^{d} s_j = 0. We illustrate these methods with reference to the accident data of Example 1.1.6 (Figure 1.6), for which the period d of the seasonal component is clearly 12.
It will be convenient in Method S1 to index the data by year and month. Thus x_{j,k}, j = 1, ..., 6, k = 1, ..., 12, will denote the number of accidental deaths reported for the k-th month of the j-th year, (1972 + j). In other words we define

x_{j,k} = x_{k+12(j−1)},  j = 1, ..., 6, k = 1, ..., 12.

Method S1 (The Small Trend Method). If the trend is small (as in the accident data) it is not unreasonable to suppose that the trend term is constant, say m_j, for the j-th year. Since Σ_{k=1}^{12} s_k = 0, we are led to the natural unbiased estimate

m̂_j = (1/12) Σ_{k=1}^{12} x_{j,k},    (1.4.13)

while for s_k, k = 1, ..., 12, we have the estimates

ŝ_k = (1/6) Σ_{j=1}^{6} (x_{j,k} − m̂_j),    (1.4.14)

which automatically satisfy the requirement that Σ_{k=1}^{12} ŝ_k = 0. The estimated error term for month k of the j-th year is of course

Ŷ_{j,k} = x_{j,k} − m̂_j − ŝ_k,  j = 1, ..., 6, k = 1, ..., 12.    (1.4.15)

The generalization of (1.4.13)-(1.4.15) to data with seasonality having a period other than 12 should be apparent.
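Method S1 amounts to row and column averaging of the 6 × 12 array x_{j,k}. The sketch below (an illustration of (1.4.13)-(1.4.15), not code from the book) uses the accidental deaths of Example 1.1.6 arranged year by year.

```python
import numpy as np

# x[j, k]: accidental deaths in month k+1 of year 1973+j (Example 1.1.6).
x = np.array([
    [9007, 8106, 8928, 9137, 10017, 10826, 11317, 10744, 9713, 9938, 9161, 8927],
    [7750, 6981, 8038, 8422,  8714,  9512, 10120,  9823, 8743, 9129, 8710, 8680],
    [8162, 7306, 8124, 7870,  9387,  9556, 10093,  9620, 8285, 8433, 8160, 8034],
    [7717, 7461, 7776, 7925,  8634,  8945, 10078,  9179, 8037, 8488, 7874, 8647],
    [7792, 6957, 7726, 8106,  8890,  9299, 10625,  9302, 8314, 8850, 8265, 8796],
    [7836, 6892, 7791, 8129,  9115,  9434, 10484,  9827, 9110, 9070, 8633, 9240],
], dtype=float)

m_hat = x.mean(axis=1)                      # (1.4.13): yearly means
s_hat = (x - m_hat[:, None]).mean(axis=0)   # (1.4.14): seasonal components, sum to zero
y_hat = x - m_hat[:, None] - s_hat          # (1.4.15): estimated noise terms
print(np.round(s_hat))                      # compare with the Method S1 row of Table 1.1
```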
In Figures 1.12, 1.13 and 1.14 we have plotted respectively the detrended observations x_{j,k} − m̂_j, the estimated seasonal components ŝ_k, and the detrended, deseasonalized observations Ŷ_{j,k} = x_{j,k} − m̂_j − ŝ_k. The latter have no apparent trend or seasonality.

Figure 1.12. Monthly accidental deaths from Figure 1.6 after subtracting the trend estimated by Method S1.

Figure 1.13. The seasonal component of the monthly accidental deaths, estimated by Method S1.

Figure 1.14. The detrended and deseasonalized monthly accidental deaths (Method S1).

Figure 1.15. Comparison of the moving average and piecewise constant estimates of trend for the monthly accidental deaths.
Method S2 (Moving Average Estimation). The following technique is preferable to Method S1 since it does not rely on the assumption that m_t is nearly constant over each cycle. It is the basis for the "classical decomposition" option in the time series identification section of the program PEST.
Suppose we have observations {x_1, ..., x_n}. The trend is first estimated by applying a moving average filter specially chosen to eliminate the seasonal component and to dampen the noise. If the period d is even, say d = 2q, then we use

m̂_t = (0.5x_{t−q} + x_{t−q+1} + ... + x_{t+q−1} + 0.5x_{t+q})/d,  q < t ≤ n − q.    (1.4.16)

If the period is odd, say d = 2q + 1, then we use the simple moving average (1.4.6).
In Figure 1.15 we show the trend estimate m̂_t, 6 < t ≤ 66, for the accidental deaths data obtained from (1.4.16). Also shown is the piecewise constant estimate obtained from Method S1.
The second step is to estimate the seasonal component. For each k = 1, ..., d we compute the average w_k of the deviations {(x_{k+jd} − m̂_{k+jd}): q < k + jd ≤ n − q}. Since these average deviations do not necessarily sum to zero, we estimate the seasonal component s_k as

ŝ_k = w_k − (1/d) Σ_{i=1}^{d} w_i,  k = 1, ..., d,    (1.4.17)

and ŝ_k = ŝ_{k−d}, k > d.
The deseasonalized data is then defined to be the original series with the estimated seasonal component removed, i.e.

d_t = x_t − ŝ_t,  t = 1, ..., n.    (1.4.18)

Finally we reestimate the trend from {d_t} either by applying a moving average filter as described earlier for non-seasonal data, or by fitting a polynomial to the series {d_t}. The program PEST allows the options of fitting a linear or quadratic trend m̂_t. The estimated noise terms are then

Ŷ_t = x_t − m̂_t − ŝ_t,  t = 1, ..., n.

The results of applying Methods S1 and S2 to the accidental deaths data are quite similar, since in this case the piecewise constant and moving average estimates of m_t are reasonably close (see Figure 1.15).
A comparison of the estimates of ŝ_k, k = 1, ..., 12, obtained by Methods S1 and S2 is made in Table 1.1.

Table 1.1. Estimated Seasonal Components for the Accidental Deaths Data

k                    1      2     3     4    5    6     7    8     9   10    11
ŝ_k (Method S1)   −744  −1504  −724  −523  338  808  1665  961   −87  197  −321
ŝ_k (Method S2)   −804  −1522  −737  −526  343  746  1680  987  −109  258  −259
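The classical decomposition of Method S2 can be sketched in a few lines (an illustration, not the PEST option itself): estimate the trend with the centred filter (1.4.16), average the deviations month by month, centre them as in (1.4.17), and deseasonalize.

```python
import numpy as np

def classical_decomposition(x, d):
    """Method S2 for even period d = 2q. Returns the trend estimates m_hat
    for q < t <= n-q and the d seasonal components s_hat (which sum to zero)."""
    x = np.asarray(x, dtype=float)
    n, q = len(x), d // 2
    w = np.r_[0.5, np.ones(d - 1), 0.5] / d        # centred filter (1.4.16)
    m_hat = np.convolve(x, w, mode="valid")        # estimates for t = q+1, ..., n-q
    deviations = [[] for _ in range(d)]
    for t in range(q, n - q):                      # 0-based index t is time t+1
        deviations[t % d].append(x[t] - m_hat[t - q])
    w_k = np.array([np.mean(v) for v in deviations])
    s_hat = w_k - w_k.mean()                       # (1.4.17)
    return m_hat, s_hat

# Synthetic monthly series with a linear trend and period-12 seasonality.
t = np.arange(1, 121)
x = 10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 12)
x = x + np.random.default_rng(4).normal(0, 0.5, size=t.size)
m_hat, s_hat = classical_decomposition(x, d=12)
print(np.round(s_hat, 2), round(s_hat.sum(), 10))  # seasonal estimates; they sum to zero
```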
Method S3 (Differencing at Lag d). The technique of differencing which we applied earlier to non-seasonal data can be adapted to deal with seasonality of period d by introducing the lag-d difference operator ∇_d defined by

∇_d X_t = X_t − X_{t−d} = (1 − B^d)X_t.    (1.4.19)

(This operator should not be confused with the operator ∇^d = (1 − B)^d defined earlier.) Applying the operator ∇_d to the model

X_t = m_t + s_t + Y_t,

where {s_t} has period d, we obtain

∇_d X_t = m_t − m_{t−d} + Y_t − Y_{t−d},

which gives a decomposition of the difference ∇_d X_t into a trend component (m_t − m_{t−d}) and a noise term (Y_t − Y_{t−d}). The trend, m_t − m_{t−d}, can then be eliminated using the methods already described, for example by application of some power of the operator ∇.

Figure 1.16. The differenced series {∇_12 x_t, t = 13, ..., 72} derived from the monthly accidental deaths {x_t, t = 1, ..., 72}.

Figure 1.17. The differenced series {∇∇_12 x_t, t = 14, ..., 72} derived from the monthly accidental deaths {x_t, t = 1, ..., 72}.

Figure 1.16 shows the result of applying the operator ∇_12 to the accidental deaths data. The seasonal component evident in Figure 1.6 is absent from the graph of ∇_12 x_t, 13 ≤ t ≤ 72. There still appears to be a non-decreasing trend however. If we now apply the operator ∇ to ∇_12 x_t and plot the resulting differences ∇∇_12 x_t, t = 14, ..., 72, we obtain the graph shown in Figure 1.17, which has no apparent trend or seasonal component. In Chapter 9 we shall show that the differenced series can in fact be well represented by a stationary time series model.
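In code, Method S3 is the lag-d difference ∇_d x_t = x_t − x_{t−d}, followed by an ordinary difference if a trend remains. A brief illustrative sketch follows; note that a linear trend survives lag-d differencing only as a constant.

```python
import numpy as np

def lag_diff(x, d):
    """Lag-d difference (1.4.19): returns x_t - x_{t-d} for t = d+1, ..., n."""
    x = np.asarray(x, dtype=float)
    return x[d:] - x[:-d]

x = np.arange(1, 25, dtype=float)         # a purely linear series
print(lag_diff(x, 12))                    # constant 12's: the trend survives as a constant
print(lag_diff(lag_diff(x, 12), 1))       # zeros: one further difference removes it

# For the accidental deaths x_t, t = 1, ..., 72, of Example 1.1.6:
# season_free = lag_diff(x, 12)           # {nabla_12 x_t, t = 13, ..., 72}, Figure 1.16
# detrended = lag_diff(season_free, 1)    # {nabla nabla_12 x_t, t = 14, ..., 72}, Figure 1.17
```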
In this section we have discussed a variety of methods for estimating and/or removing trend and seasonality. The particular method chosen for any given data set will depend on a number of factors including whether or not estimates of the components of the series are required and whether or not it appears that the data contains a seasonal component which does not vary with time. The program PEST allows two options, one which decomposes the series as described in Method S2, and the other which proceeds by successive differencing of the data as in Methods 3 and S3.
§1.5 The Autocovariance Function of a Stationary Process
In this section we study the properties of the autocovariance function introduced in Section 1.3.
Proposition 1.5.1 (Elementary Properties). If γ(·) is the autocovariance function of a stationary process {X_t, t ∈ ℤ}, then

γ(0) ≥ 0,    (1.5.1)

|γ(h)| ≤ γ(0) for all h ∈ ℤ,    (1.5.2)

and γ(·) is even, i.e.

γ(h) = γ(−h) for all h ∈ ℤ.    (1.5.3)

PROOF. The first property is a statement of the obvious fact that Var(X_t) ≥ 0, the second is an immediate consequence of the Cauchy-Schwarz inequality,

|Cov(X_{t+h}, X_t)| ≤ [Var(X_{t+h}) Var(X_t)]^{1/2},

and the third is established by observing that

γ(−h) = Cov(X_{t−h}, X_t) = Cov(X_t, X_{t+h}) = γ(h).  □
Autocovariance functions also have the more subtle property of non-negative definiteness.

Definition 1.5.1 (Non-Negative Definiteness). A real-valued function on the integers, K: ℤ → ℝ, is said to be non-negative definite if and only if

Σ_{i,j=1}^{n} a_i K(t_i − t_j) a_j ≥ 0    (1.5.4)

for all positive integers n and for all vectors a = (a_1, ..., a_n)' ∈ ℝ^n and t = (t_1, ..., t_n)' ∈ ℤ^n, or if and only if Σ_{i,j=1}^{n} a_i K(i − j) a_j ≥ 0 for all such n and a.

Theorem 1.5.1 (Characterization of Autocovariance Functions). A real-valued function defined on the integers is the autocovariance function of a stationary time series if and only if it is even and non-negative definite.
PROOF. To show that the autocovariance function y( · ) of any stationary time series {X, } is non-negative definite, we simply observe that if a = (a1 , • • • , anY E !Rn, t = (t 1 , . . . , tn)' E zn, and Z1 = (X,, - EX,, , . . . , X,., - EX,J', then
= a'rna n
= L a;y(t; - ti)ai, i,j=l
where rn = [y(t; - ti)]i.i= l is the covariance matrix of (X, , , . . . , X,). To establish the converse, let K : Z --> IR be an even non-negative definite
function. We need to show that there exists a stationary process with K( · ) as its autocovariance function, and for this we shall use Kolmogorov's theorem. For each positive integer n and each t = (t 1' . . . ' tnY E zn such that t 1 < t2 < · · · < tn , let F1 be the distribution function on !Rn with characteristic function
tP1(u) = exp( - u'Ku/2),
where u = (u 1 , . . . , unY E !Rn and K = [K(t; - ti)]i.i=I · Since K is non-negative definite, the matrix K is also non-negative definite and consequently tPt is the characteristic function of an n-variate normal distribution with mean zero and covariance matrix K (see Section 1 .6). Clearly, in the notation of Theorem 1 .2. 1 ,
tPt< ;>(u(i)) = lim tP1(u) for each t E Y, uc-·""0
i.e. the distribution functions F1 are consistent, and so by Kolmogorov's theorem there exists a time series {X, } with distribution functions F1 and characteristic functions tP1, t E Y. In particular the joint distribution of X; and Xi is bivariate normal with mean 0 and covariance matrix
[ K(O) K(i - j)J K(i - j) K(O) '
which shows that Cov(X;, XJ = K(i - j) as required. D
Remark 1. As shown in the proof of Theorem 1.5.1, for every autocovariance function γ(·), there exists a stationary Gaussian time series with γ(·) as its autocovariance function.

Remark 2. To verify that a given function is non-negative definite it is sometimes simpler to specify a stationary process with the given autocovariance function than to check Definition 1.5.1. For example the function κ(h) = cos(θh), h ∈ ℤ, is the autocovariance function of the process in Example 1.3.1 and is therefore non-negative definite. Direct verification by means of Definition 1.5.1 however is more difficult. Another simple criterion for checking non-negative definiteness is Herglotz's theorem, which will be proved in Section 4.3.

Remark 3. An autocorrelation function ρ(·) has all the properties of an autocovariance function and satisfies the additional condition ρ(0) = 1.
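To supplement Remark 2, a quick (non-rigorous) numerical check of non-negative definiteness is to form the matrix K_n = [κ(i - j)]_{i,j=1}^n for several values of n and confirm that its smallest eigenvalue is non-negative. The Python sketch below is an illustration only, not part of the text; the value θ = 0.7 is an arbitrary choice.

```python
import numpy as np
from scipy.linalg import toeplitz

def kappa(h, theta=0.7):
    # The function of Remark 2: kappa(h) = cos(theta * h).
    return np.cos(theta * np.asarray(h, dtype=float))

def min_eigenvalue(kappa_fn, n):
    """Smallest eigenvalue of K_n = [kappa(i - j)], i, j = 1, ..., n."""
    K = toeplitz(kappa_fn(np.arange(n)))
    return np.linalg.eigvalsh(K).min()

# Non-negative (up to rounding error) for every n, consistent with
# kappa(h) = cos(theta*h) being a genuine autocovariance function.
for n in (5, 20, 100):
    print(n, min_eigenvalue(kappa, n))
```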
EXAMPLE 1.5.1. Let us show that the real-valued function on ℤ,

κ(h) = 1 if h = 0,  ρ if h = ±1,  0 otherwise,

is an autocovariance function if and only if |ρ| ≤ ½.
If |ρ| ≤ ½ then κ(·) is the autocovariance function of the process defined in Example 1.3.2 with σ² = (1 + θ²)^{-1} and θ = (2ρ)^{-1}(1 ± √(1 - 4ρ²)).
If ρ > ½, K = [κ(i - j)]_{i,j=1}^n and a is the n-component vector a = (1, -1, 1, -1, ...)', then

a'Ka = n - 2(n - 1)ρ < 0 for n > 2ρ/(2ρ - 1),

which shows that κ(·) is not non-negative definite and therefore, by Theorem 1.5.1, is not an autocovariance function.
If ρ < -½, the same argument using the n-component vector a = (1, 1, 1, ...)' again shows that κ(·) is not non-negative definite.
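The argument of Example 1.5.1 is easy to check numerically. The following Python sketch (an illustration only; the value ρ = 0.6 is an arbitrary choice greater than ½) evaluates the quadratic form a'Ka for the alternating vector a and confirms that it equals n - 2(n - 1)ρ and is negative once n > 2ρ/(2ρ - 1).

```python
import numpy as np
from scipy.linalg import toeplitz

def K_matrix(n, rho):
    # K = [kappa(i - j)] with kappa(0) = 1, kappa(+-1) = rho, 0 otherwise.
    first_col = np.zeros(n)
    first_col[0], first_col[1] = 1.0, rho
    return toeplitz(first_col)

rho = 0.6                                   # any rho > 1/2
n = int(2 * rho / (2 * rho - 1)) + 1        # smallest n exceeding 2*rho/(2*rho - 1)
a = np.array([(-1.0) ** j for j in range(n)])   # alternating vector (1, -1, 1, ...)

quad_form = a @ K_matrix(n, rho) @ a
print(quad_form, n - 2 * (n - 1) * rho)     # equal, and negative: K is not n.n.d.
```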
The Sample Autocovariance Function of an Observed Series
From the observations {x_1, x_2, ..., x_n} of a stationary time series {X_t} we frequently wish to estimate the autocovariance function γ(·) of the underlying process {X_t} in order to gain information concerning its dependence structure. This is an important step towards constructing an appropriate mathematical model for the data. The estimate of γ(·) which we shall use is the sample autocovariance function.

Definition 1.5.2. The sample autocovariance function of {x_1, ..., x_n} is defined by
γ̂(h) := n^{-1} Σ_{j=1}^{n-h} (x_{j+h} - x̄)(x_j - x̄),   0 ≤ h < n,
and γ̂(h) = γ̂(-h), -n < h ≤ 0, where x̄ is the sample mean x̄ = n^{-1} Σ_{i=1}^n x_i.

Remark 4. The divisor n is used rather than (n - h) since this ensures that the matrix Γ̂_n := [γ̂(i - j)]_{i,j=1}^n is non-negative definite (see Section 7.2).

Remark 5. The sample autocorrelation function is defined in terms of the sample autocovariance function as

ρ̂(h) := γ̂(h)/γ̂(0),   |h| < n.

The corresponding matrix R̂_n := [ρ̂(i - j)]_{i,j=1}^n is then also non-negative definite.

Remark 6. The large-sample properties of the estimators γ̂(h) and ρ̂(h) are discussed in Chapter 7.
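Definition 1.5.2 and Remark 5 translate directly into code. The following Python sketch (an illustration, not the PEST implementation) computes γ̂(h) with the divisor n of Remark 4 and ρ̂(h) = γ̂(h)/γ̂(0).

```python
import numpy as np

def sample_acvf(x, h):
    """gamma_hat(h) of Definition 1.5.2, with divisor n (Remark 4)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = abs(h)                       # gamma_hat(h) = gamma_hat(-h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

def sample_acf(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)
```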
EXAMPLE 1.5.2. Figure 1.18(a) shows 300 simulated observations of the series X_t = Z_t + θZ_{t-1} of Example 1.3.2 with θ = 0.95 and Z_t ∼ N(0, 1). Figure 1.18(b) shows the corresponding sample autocorrelation function at lags 0, ..., 40. Notice the similarity between ρ̂(·) and the function ρ(·) computed as described in Example 1.3.2 (ρ(h) = 1 for h = 0, .4993 for h = ±1, 0 otherwise).

EXAMPLE 1.5.3. Figures 1.19(a) and 1.19(b) show simulated observations and the corresponding sample autocorrelation function for the process X_t = Z_t + θZ_{t-1}, this time with θ = -0.95 and Z_t ∼ N(0, 1). The similarity between ρ̂(·) and ρ(·) is again apparent.
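The experiment of Examples 1.5.2 and 1.5.3 is easy to reproduce. The sketch below is illustrative only; it reuses the sample_acf function from the previous sketch and a fixed random seed, simulates 300 observations of X_t = Z_t + θZ_{t-1} for θ = ±0.95, and compares ρ̂(1) with the model value θ/(1 + θ²) = ±.4993.

```python
import numpy as np

rng = np.random.default_rng(0)

for theta in (0.95, -0.95):
    z = rng.standard_normal(301)           # Z_0, Z_1, ..., Z_300 ~ N(0, 1)
    x = z[1:] + theta * z[:-1]             # X_t = Z_t + theta*Z_{t-1}, 300 values
    rho1_model = theta / (1 + theta ** 2)  # = +-0.4993 for theta = +-0.95
    rho1_sample = sample_acf(x, 1)         # sample_acf from the previous sketch
    print(theta, round(rho1_model, 4), round(rho1_sample, 4))
```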
Remark 7. Notice that the realization of Example 1.5.2 is less rapidly fluctuating than that of Example 1.5.3. This is to be expected from the two autocorrelation functions. Positive autocorrelation at lag 1 reflects a tendency for successive observations to lie on the same side of the mean, while negative autocorrelation at lag 1 reflects a tendency for successive observations to lie on opposite sides of the mean. Other properties of the sample paths are also reflected in the autocorrelation (and sample autocorrelation) functions. For example the sample autocorrelation function of the Wolfer sunspot series (Figure 1.20) reflects the roughly periodic behaviour of the data (Figure 1.5).
Remark 8. The sample autocovariance and autocorrelation functions can be computed for any data set {x_1, ..., x_n} and are not restricted to realizations of a stationary process. For data containing a trend, |ρ̂(h)| will exhibit slow decay as h increases, and for data with a substantial deterministic periodic component, ρ̂(h) will exhibit similar behaviour with the same periodicity. Thus ρ̂(·) can be useful as an indicator of non-stationarity (see also Section 9.1).
Figure 1.18. (a) 300 observations of the series X_t = Z_t + .95Z_{t-1}, Example 1.5.2. (b) The sample autocorrelation function ρ̂(h), 0 ≤ h ≤ 40.
Figure 1.19. (a) 300 observations of the series X_t = Z_t - .95Z_{t-1}, Example 1.5.3. (b) The sample autocorrelation function ρ̂(h), 0 ≤ h ≤ 40.
Figure 1.20. The sample autocorrelation function of the Wolfer sunspot numbers (see Figure 1.5).
§1.6 The Multivariate Normal Distribution
An n-dimensional random vector is a column vector, X = (X_1, ..., X_n)', each of whose components is a random variable. If E|X_i| < ∞ for each i, then we define the mean or expected value of X to be the column vector,

EX = (EX_1, ..., EX_n)'.   (1.6.1)

In the same way we define the expected value of any array whose elements are random variables (e.g. a matrix of random variables) to be the same array with each random variable replaced by its expected value (assuming each expectation exists).

If X = (X_1, ..., X_n)' and Y = (Y_1, ..., Y_m)' are random vectors such that E|X_i|² < ∞, i = 1, ..., n, and E|Y_i|² < ∞, i = 1, ..., m, we define the covariance matrix of X and Y to be the matrix,

Σ_XY = Cov(X, Y) = E[(X - EX)(Y - EY)']
     = E(XY') - (EX)(EY)'.   (1.6.2)

The (i, j)-element of Σ_XY is the covariance, Cov(X_i, Y_j) = E(X_i Y_j) - E(X_i)E(Y_j). In the special case when Y = X, Cov(X, Y) reduces to the covariance matrix of X.
Proposition 1.6.1. If a is an m-component column vector, B is an m × n matrix and X = (X_1, ..., X_n)' where E|X_i|² < ∞, i = 1, ..., n, then the random vector,

Y = a + BX,   (1.6.3)

has mean

EY = a + BEX,   (1.6.4)

and covariance matrix,

Σ_YY = B Σ_XX B'.   (1.6.5)

PROOF. Problem 1.15.
Proposition 1.6.2. The covariance matrix Σ_XX is symmetric and non-negative definite, i.e. b'Σ_XX b ≥ 0 for all b = (b_1, ..., b_n)' ∈ ℝ^n.

PROOF. The symmetry of Σ_XX is apparent from the definition. To prove non-negative definiteness let b = (b_1, ..., b_n)' be an arbitrary vector in ℝ^n. Then by Proposition 1.6.1,

b'Σ_XX b = Var(b'X) ≥ 0.   (1.6.6)   □
Proposition 1.6.3. Any symmetric, non-negative definite n × n matrix Σ can be written in the form

Σ = PΛP',   (1.6.7)

where P is an orthogonal matrix (i.e. P' = P^{-1}) and Λ is a diagonal matrix Λ = diag(λ_1, ..., λ_n) in which λ_1, ..., λ_n are the eigenvalues (all non-negative) of Σ.

PROOF. This proposition is a standard result from matrix theory and for a proof we refer the reader to Graybill (1983). We observe here only that if p_i, i = 1, ..., n, is a set of orthonormal right eigenvectors of Σ corresponding to the eigenvalues λ_1, ..., λ_n respectively, then P may be chosen as the n × n matrix whose i-th column is p_i, i = 1, ..., n.   □
Remark 1. Using the factorization (1.6.7) and the fact that det P · det P' = 1, we immediately obtain the result,

det Σ = λ_1 λ_2 ··· λ_n.
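Proposition 1.6.3 and Remark 1 can be illustrated numerically with a symmetric eigendecomposition. In the Python sketch below (the particular matrix Sigma is an arbitrary example, not from the text) the factorization Σ = PΛP', the orthogonality of P and the identity det Σ = λ_1···λ_n are all verified.

```python
import numpy as np

Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])       # an example symmetric n.n.d. matrix

lam, P = np.linalg.eigh(Sigma)            # Sigma = P diag(lam) P'  (Proposition 1.6.3)

print(np.allclose(Sigma, P @ np.diag(lam) @ P.T))      # factorization (1.6.7)
print(np.allclose(P @ P.T, np.eye(3)))                 # P orthogonal: P' = P^{-1}
print(np.isclose(np.linalg.det(Sigma), np.prod(lam)))  # det Sigma = lam_1 ... lam_n
```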
Definition 1.6.1 (The Multivariate Normal Distribution). The random vector Y = (Y_1, ..., Y_n)' is said to be multivariate normal, or to have a multivariate normal distribution, if and only if there exist a column vector a, a matrix B and a random vector X = (X_1, ..., X_m)' with independent standard normal
components, such that
Y = a + BX.   (1.6.8)

Remark 2. The components X_1, ..., X_m of X in (1.6.8) must have the joint density

f_X(x) = (2π)^{-m/2} exp(-x'x/2),   x = (x_1, ..., x_m)' ∈ ℝ^m,   (1.6.9)

and corresponding characteristic function,

φ_X(u) = E e^{iu'X} = exp(-Σ_{j=1}^m u_j²/2),   u = (u_1, ..., u_m)' ∈ ℝ^m.   (1.6.10)

Remark 3. It is clear from the definition that if Y has a multivariate normal distribution and if D is any k × n matrix and c any k × 1 vector, then Z = c + DY is a k-component multivariate normal random vector.

Remark 4. If Y is multivariate normal with representation (1.6.8), then by Proposition 1.6.1, EY = a and Σ_YY = BB'.
Proposition 1.6.4. If Y = (Y_1, ..., Y_n)' is a multivariate normal random vector such that EY = μ and Σ_YY = Σ, then the characteristic function of Y is

φ_Y(u) = exp(iu'μ - ½u'Σu),   u = (u_1, ..., u_n)' ∈ ℝ^n.   (1.6.11)

If det Σ > 0 then Y has the density,

f_Y(y) = (2π)^{-n/2}(det Σ)^{-1/2} exp[-½(y - μ)'Σ^{-1}(y - μ)].   (1.6.12)

PROOF. If Y is multivariate normal with representation (1.6.8) then

φ_Y(u) = E exp[iu'(a + BX)] = exp(iu'a) E exp(iu'BX).
Using (1.6.10) with u (∈ ℝ^m) replaced by B'u (u ∈ ℝ^n) in order to evaluate the last term, we obtain

φ_Y(u) = exp(iu'a) exp(-½u'BB'u),

which reduces to (1.6.11) by Remark 4. If det Σ > 0, then by Proposition 1.6.3 we have the factorization,

Σ = PΛP',

where PP' = I_n, the n × n identity matrix, Λ = diag(λ_1, ..., λ_n) and each λ_i > 0. If we define Λ^{-1/2} = diag(λ_1^{-1/2}, ..., λ_n^{-1/2}) and

Σ^{-1/2} = PΛ^{-1/2}P',

then it is easy to check that Σ^{-1/2}ΣΣ^{-1/2} = I_n. From Proposition 1.6.1 and Remark 3 we conclude that the random vector
Z = Σ^{-1/2}(Y - μ)   (1.6.13)

is multivariate normal with EZ = 0 and Σ_ZZ = I_n. Application of the result (1.6.11) now shows that Z has the characteristic function φ_Z(u) = exp(-u'u/2), whence it follows that Z has the probability density (1.6.9) with m = n. In view of the relation (1.6.13), the density of Y is given by

f_Y(y) = |det Σ^{-1/2}| f_Z(Σ^{-1/2}(y - μ))
       = (det Σ)^{-1/2}(2π)^{-n/2} exp[-½(y - μ)'Σ^{-1}(y - μ)]

as required.   □
Remark 5. The transformation (1.6.13) which maps Y into a vector of independent standard normal random variables is clearly a generalization of the transformation Z = σ^{-1}(Y - μ) which standardizes a single normal random variable with mean μ and variance σ².
Remark 6. Given any vector μ ∈ ℝ^n and any symmetric non-negative definite n × n matrix Σ, there exists a multivariate normal random vector with mean μ and covariance matrix Σ. To construct such a random vector from a vector X = (X_1, ..., X_n)' with independent standard normal components we simply choose a = μ and B = Σ^{1/2} in (1.6.8), where Σ^{1/2}, in the terminology of Proposition 1.6.3, is the matrix PΛ^{1/2}P' with Λ^{1/2} = diag(λ_1^{1/2}, ..., λ_n^{1/2}).
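Remark 6 amounts to a recipe for simulating N(μ, Σ). The following Python sketch (illustrative only; the particular μ and Σ are arbitrary) forms Σ^{1/2} = PΛ^{1/2}P' from the eigendecomposition and sets Y = μ + Σ^{1/2}X with X a vector of independent standard normal components.

```python
import numpy as np

def mvn_sample(mu, Sigma, size, rng=None):
    """Draw `size` samples of Y = mu + Sigma^{1/2} X, as in Remark 6 and (1.6.8)."""
    if rng is None:
        rng = np.random.default_rng()
    lam, P = np.linalg.eigh(Sigma)               # Sigma = P diag(lam) P'
    lam = np.clip(lam, 0.0, None)                # guard tiny negative rounding errors
    sqrt_Sigma = P @ np.diag(np.sqrt(lam)) @ P.T # Sigma^{1/2} = P Lambda^{1/2} P'
    X = rng.standard_normal((size, len(mu)))     # independent N(0, 1) components
    return np.asarray(mu) + X @ sqrt_Sigma       # sqrt_Sigma is symmetric

# Example: the sample covariance of a large draw should be close to Sigma.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
Y = mvn_sample(mu, Sigma, size=100_000)
print(np.cov(Y, rowvar=False))
```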
Remark 7. Proposition 1.6.4 shows that a multivariate normal distribution is uniquely determined by its mean and covariance matrix. If Y is multivariate normal, EY = μ and Σ_YY = Σ, we shall therefore say that Y has the multivariate normal distribution with mean μ and covariance matrix Σ, or more succinctly,

Y ∼ N(μ, Σ).
EXAMPLE 1.6.1 (The Bivariate Normal Distribution). The random vector Y = (Y_1, Y_2)' is bivariate normal with mean μ = (μ_1, μ_2)' and covariance matrix

Σ = [ σ_1²       ρσ_1σ_2 ]
    [ ρσ_1σ_2    σ_2²    ]   (1.6.14)

if and only if Y has the characteristic function (from (1.6.11))

φ_Y(u) = exp[i(u_1μ_1 + u_2μ_2) - ½(u_1²σ_1² + 2u_1u_2ρσ_1σ_2 + u_2²σ_2²)].   (1.6.15)

The parameters σ_1, σ_2 and ρ are the standard deviations and correlation of the components Y_1 and Y_2. Since every symmetric non-negative definite 2 × 2 matrix can be written in the form (1.6.14), it follows that every bivariate normal random vector has a characteristic function of the form (1.6.15). If σ_1 ≠ 0, σ_2 ≠ 0 and -1 < ρ < 1 then Σ has an inverse,

Σ^{-1} = (1 - ρ²)^{-1} [ σ_1^{-2}              -ρσ_1^{-1}σ_2^{-1} ]
                       [ -ρσ_1^{-1}σ_2^{-1}    σ_2^{-2}           ],   (1.6.16)

and so by (1.6.12), Y has the probability density,

f_Y(y) = [2πσ_1σ_2(1 - ρ²)^{1/2}]^{-1}
         × exp{-[((y_1 - μ_1)/σ_1)² - 2ρ((y_1 - μ_1)/σ_1)((y_2 - μ_2)/σ_2) + ((y_2 - μ_2)/σ_2)²] / [2(1 - ρ²)]}.   (1.6.17)
Proposition 1.6.5. The random vector Y = (Y_1, ..., Y_n)' is multivariate normal with mean μ and covariance matrix Σ if and only if for each a = (a_1, ..., a_n)' ∈ ℝ^n, a'Y has a univariate normal distribution with mean a'μ and variance a'Σa.

PROOF. The necessity of the condition has already been established. To prove the sufficiency we shall show that Y has the appropriate characteristic function. For any a ∈ ℝ^n we are assuming that a'Y ∼ N(a'μ, a'Σa), or equivalently that

E exp(ita'Y) = exp(ita'μ - ½t²a'Σa).   (1.6.18)

Setting t = 1 in (1.6.18) we obtain the required characteristic function of Y, viz.

E exp(ia'Y) = exp(ia'μ - ½a'Σa).   □
Suppose now that Y is partitioned as Y = (Y^{(1)'}, Y^{(2)'})', where Y^{(1)} = (Y_1, ..., Y_k)' and Y^{(2)} = (Y_{k+1}, ..., Y_n)'. Correspondingly we can write the mean and covariance matrix of Y as

μ = [ μ^{(1)} ]   and   Σ = [ Σ_11   Σ_12 ]
    [ μ^{(2)} ]             [ Σ_21   Σ_22 ],

where μ^{(i)} = EY^{(i)} and Σ_ij = E(Y^{(i)} - μ^{(i)})(Y^{(j)} - μ^{(j)})'.
Proposition 1.6.6.
(i) Y^{(1)} and Y^{(2)} are independent if and only if Σ_12 = 0.
(ii) If det Σ_22 > 0 then the conditional distribution of Y^{(1)} given Y^{(2)} is

N(μ^{(1)} + Σ_12 Σ_22^{-1}(Y^{(2)} - μ^{(2)}),  Σ_11 - Σ_12 Σ_22^{-1} Σ_21).
PROOF. (i) If Y^{(1)} and Y^{(2)} are independent, then

Σ_12 = E(Y^{(1)} - μ^{(1)}) E(Y^{(2)} - μ^{(2)})' = 0.

Conversely if Σ_12 = 0 then the characteristic function φ_Y(u), as specified by Proposition 1.6.4, factorizes into

φ_Y(u) = φ_{Y^{(1)}}(u^{(1)}) φ_{Y^{(2)}}(u^{(2)}),

establishing the independence of Y^{(1)} and Y^{(2)}.
(ii) If we define

X = Y^{(1)} - μ^{(1)} - Σ_12 Σ_22^{-1}(Y^{(2)} - μ^{(2)}),   (1.6.19)

then clearly

Cov(X, Y^{(2)}) = Σ_12 - Σ_12 Σ_22^{-1} Σ_22 = 0,
so that X and Y^{(2)} are independent by (i). Using the relation (1.6.19) we can express the conditional characteristic function of Y^{(1)} given Y^{(2)} as

E(exp(iu'Y^{(1)}) | Y^{(2)}) = E(exp[iu'X + iu'(μ^{(1)} + Σ_12 Σ_22^{-1}(Y^{(2)} - μ^{(2)}))] | Y^{(2)})
  = exp[iu'(μ^{(1)} + Σ_12 Σ_22^{-1}(Y^{(2)} - μ^{(2)}))] E(exp(iu'X) | Y^{(2)}),

where the last line is obtained by taking a factor dependent only on Y^{(2)} outside the conditional expectation. Now since X and Y^{(2)} are independent,

E(exp(iu'X) | Y^{(2)}) = E exp(iu'X) = exp[-½u'(Σ_11 - Σ_12 Σ_22^{-1} Σ_21)u],

so

E(exp(iu'Y^{(1)}) | Y^{(2)})
  = exp[iu'(μ^{(1)} + Σ_12 Σ_22^{-1}(Y^{(2)} - μ^{(2)})) - ½u'(Σ_11 - Σ_12 Σ_22^{-1} Σ_21)u],

completing the proof.   □
EXAMPLE 1.6.2. For the bivariate normal random vector Y discussed in Example 1.6.1 we immediately deduce from Proposition 1.6.6 that Y_1 and Y_2 are independent if and only if ρσ_1σ_2 = 0. If σ_1 > 0, σ_2 > 0 and ρ > 0 then conditional on Y_2, Y_1 is normal with mean

E(Y_1 | Y_2) = μ_1 + ρσ_1σ_2^{-1}(Y_2 - μ_2),

and variance

Var(Y_1 | Y_2) = σ_1²(1 - ρ²).
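The conditional distribution formulas of Proposition 1.6.6(ii) and Example 1.6.2 can be compared directly. The Python sketch below (the parameter values are arbitrary) computes the conditional mean and variance of Y_1 given Y_2 = y_2 both from the block formula and from the closed form of Example 1.6.2.

```python
import numpy as np

mu1, mu2 = 1.0, -0.5
sigma1, sigma2, rho = 2.0, 1.5, 0.6
y2 = 0.3                                   # observed value of Y_2

Sigma11 = np.array([[sigma1 ** 2]])
Sigma12 = np.array([[rho * sigma1 * sigma2]])
Sigma22 = np.array([[sigma2 ** 2]])

# Proposition 1.6.6(ii): N(mu1 + S12 S22^{-1}(y2 - mu2), S11 - S12 S22^{-1} S21)
cond_mean = mu1 + (Sigma12 @ np.linalg.inv(Sigma22) @ np.array([[y2 - mu2]]))[0, 0]
cond_var = (Sigma11 - Sigma12 @ np.linalg.inv(Sigma22) @ Sigma12.T)[0, 0]

# Closed form of Example 1.6.2.
print(cond_mean, mu1 + rho * sigma1 / sigma2 * (y2 - mu2))
print(cond_var, sigma1 ** 2 * (1 - rho ** 2))
```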
§1.7* Applications of Kolmogorov's Theorem

In this section we illustrate the use of Theorem 1.2.1 to establish the existence of two important processes, Brownian motion and the Poisson process.

Definition 1.7.1 (Standard Brownian Motion). Standard Brownian motion starting at level zero is a process {B(t), t ≥ 0} satisfying the conditions
(a) B(0) = 0,
(b) B(t_2) - B(t_1), B(t_3) - B(t_2), ..., B(t_n) - B(t_{n-1}) are independent for every n ∈ {3, 4, ...} and every t = (t_1, ..., t_n)' such that 0 ≤ t_1 < t_2 < ··· < t_n,
(c) B(t) - B(s) ∼ N(0, t - s) for t ≥ s.

To establish the existence of such a process we observe that conditions (a), (b) and (c) are satisfied if and only if, for every t = (t_1, ..., t_n)' such that 0 ≤ t_1 < ··· < t_n, the characteristic function of (B(t_1), ..., B(t_n))' is

φ_t(u) = E exp[iu_1 B(t_1) + ··· + iu_n B(t_n)]
       = E exp[iu_1 Δ_1 + iu_2(Δ_1 + Δ_2) + ··· + iu_n(Δ_1 + ··· + Δ_n)]
         (where Δ_j = B(t_j) - B(t_{j-1}), j ≥ 1, and t_0 = 0)   (1.7.1)
       = E exp[iΔ_1(u_1 + ··· + u_n) + iΔ_2(u_2 + ··· + u_n) + ··· + iΔ_n u_n]
       = exp[-½ Σ_{j=1}^n (u_j + ··· + u_n)²(t_j - t_{j-1})].

It is trivial to check that the characteristic functions φ_t(·) satisfy the consistency condition (1.2.9) and so by Kolmogorov's theorem there exists a process with characteristic functions φ_t(·), or equivalently with the properties (a), (b) and (c).
Definition 1.7.2 (Brownian Motion with Drift). Brownian motion with drift μ, variance parameter σ² and initial level x is a process {Y(t), t ≥ 0} where

Y(t) = x + μt + σB(t),
and B(t) is standard Brownian motion.
The existence of Brownian motion with drift follows at once from that of standard Brownian motion.
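Although the existence argument above is purely measure-theoretic, the same independent-increment structure gives an immediate simulation recipe. The Python sketch below (illustrative only; the drift, variance parameter and time grid are arbitrary choices) builds a path of Y(t) = x + μt + σB(t) by cumulating independent N(0, t_j - t_{j-1}) increments.

```python
import numpy as np

def brownian_with_drift(times, mu=0.5, sigma=1.0, x0=0.0, rng=None):
    """Simulate Y(t) = x0 + mu*t + sigma*B(t) at the given increasing times > 0."""
    if rng is None:
        rng = np.random.default_rng()
    times = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate(([0.0], times)))             # t_j - t_{j-1}, t_0 = 0
    increments = rng.standard_normal(len(dt)) * np.sqrt(dt)  # B(t_j) - B(t_{j-1})
    B = np.cumsum(increments)                                # standard Brownian motion
    return x0 + mu * times + sigma * B

times = np.linspace(0.01, 5.0, 500)
path = brownian_with_drift(times)
```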
Definition 1.7.3 (Poisson Process). A Poisson process with mean rate λ (> 0) is a process {N(t), t ≥ 0} satisfying the conditions

(a) N(0) = 0,
(b) N(t_2) - N(t_1), N(t_3) - N(t_2), ..., N(t_n) - N(t_{n-1}) are independent for every n ∈ {3, 4, ...} and every t = (t_1, ..., t_n)' such that 0 ≤ t_1 < t_2 < ··· < t_n,
(c) N(t) - N(s) has the Poisson distribution with mean λ(t - s) for t ≥ s.

The proof of the existence of a Poisson process follows precisely the same steps as the proof of the existence of standard Brownian motion. For the Poisson process however the characteristic function of the increment Δ_j = N(t_j) - N(t_{j-1}) is

E exp(iuΔ_j) = exp{-λ(t_j - t_{j-1})(1 - e^{iu})}.
In fact the same proof establishes the existence of a process {Z(t), t ≥ 0} satisfying conditions (a) and (b) of Definition 1.7.1 provided the increments Δ_j = Z(t_j) - Z(t_{j-1}) have characteristic functions of the form

E exp(iuΔ_j) = exp{(t_j - t_{j-1}) ξ(u)}

for some function ξ(·) not depending on j (for standard Brownian motion ξ(u) = -u²/2 and for the Poisson process ξ(u) = -λ(1 - e^{iu})).
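A Poisson process can be simulated in exactly the same way as Brownian motion, by cumulating independent Poisson(λ(t_j - t_{j-1})) increments. The Python sketch below is illustrative only; the rate λ = 2 and the time grid are arbitrary choices.

```python
import numpy as np

def poisson_process(times, lam=2.0, rng=None):
    """Simulate N(t) at the given increasing times > 0, with N(0) = 0 and rate lam."""
    if rng is None:
        rng = np.random.default_rng()
    times = np.asarray(times, dtype=float)
    dt = np.diff(np.concatenate(([0.0], times)))   # t_j - t_{j-1}, t_0 = 0
    increments = rng.poisson(lam * dt)             # N(t_j) - N(t_{j-1})
    return np.cumsum(increments)

times = np.linspace(0.1, 10.0, 100)
counts = poisson_process(times)                    # non-decreasing integer path
```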
Problems
1.1. Suppose that X_t = Z_t + θZ_{t-1}, t = 1, 2, ..., where Z_0, Z_1, Z_2, ... are independent random variables, each with moment generating function E exp(λZ_i) = m(λ).
(a) Express the joint moment generating function E exp(Σ_{i=1}^n λ_i X_i) in terms of the function m(·).
(b) Deduce from (a) that {X_t} is strictly stationary.
1.2. (a) Show that a linear filter {a_j} passes an arbitrary polynomial of degree k without distortion, i.e.

Σ_j a_j m_{t+j} = m_t

for all k-th degree polynomials m_t = c_0 + c_1 t + ··· + c_k t^k, if and only if

Σ_j a_j = 1 and Σ_j j^r a_j = 0 for r = 1, ..., k.

(b) Show that the Spencer 15-point moving average filter {a_j} does not distort a cubic trend.
1.3. Suppose that m_t = c_0 + c_1 t + c_2 t², t = 0, ±1, ....
(a) Show that

m_t = Σ_{i=-2}^{2} a_i m_{t+i} = Σ_{i=-3}^{3} b_i m_{t+i},   t = 0, ±1, ...,
where a_2 = a_{-2} = -3/35, a_1 = a_{-1} = 12/35, a_0 = 17/35, and b_3 = b_{-3} = -2/21, b_2 = b_{-2} = 3/21, b_1 = b_{-1} = 6/21, b_0 = 7/21.
(b) Suppose that X_t = m_t + Z_t where {Z_t, t = 0, ±1, ...} is an independent sequence of normal random variables, each with mean 0 and variance σ². Let U_t = Σ_{i=-2}^{2} a_i X_{t+i} and V_t = Σ_{i=-3}^{3} b_i X_{t+i}.
(i) Find the means and variances of U_t and V_t.
(ii) Find the correlations between U_t and U_{t+1} and between V_t and V_{t+1}.
(iii) Which of the two filtered series {U_t} and {V_t} would you expect to be smoother in appearance?
1.4. If m_t = Σ_{k=0}^{p} c_k t^k, t = 0, ±1, ..., show that ∇m_t is a polynomial of degree (p - 1) in t and hence that ∇^{p+1} m_t = 0.

1.5. Design a symmetric moving average filter which eliminates seasonal components with period 3 and which at the same time passes quadratic trend functions without distortion.
1.6. (a) Use the programs WORD6 and PEST to plot the series with values {x_1, ..., x_30} given by

 1-10:  486 474 434 441 435 401 414 414 386 405
11-20:  411 389 414 426 410 441 459 449 486 510
21-30:  506 549 579 581 630 666 674 729 771 785

This series is the sum of a quadratic trend and a period-three seasonal component.
(b) Apply the filter found in Problem 1.5 to the preceding series and plot the result. Comment on the result.
1.7. Let Z_t, t = 0, ±1, ..., be independent normal random variables each with mean 0 and variance σ², and let a, b and c be constants. Which, if any, of the following processes are stationary? For each stationary process specify the mean and autocovariance function.
(a) X_t = a + bZ_t + cZ_{t-1},
(b) X_t = a + bZ_0,
(c) X_t = Z_1 cos(ct) + Z_2 sin(ct),
(d) X_t = Z_0 cos(ct),
(e) X_t = Z_t cos(ct) + Z_{t-1} sin(ct),
(f) X_t = Z_t Z_{t-1}.
1.8. Let {Y_t} be a stationary process with mean zero and let a and b be constants.
(a) If X_t = a + bt + s_t + Y_t where s_t is a seasonal component with period 12, show that ∇∇_12 X_t = (1 - B)(1 - B^{12})X_t is stationary.
(b) If X_t = (a + bt)s_t + Y_t where s_t is again a seasonal component with period 12, show that ∇_12² X_t = (1 - B^{12})(1 - B^{12})X_t is stationary.
1.9. Use the program PEST to analyze the accidental deaths data by "classical decomposition".
(a) Plot the data.
(b) Find estimates ŝ_t, t = 1, ..., 12, for the classical decomposition model, X_t = m_t + s_t + Y_t, where s_t = s_{t+12}, Σ_{t=1}^{12} s_t = 0 and EY_t = 0.
(c) Plot the deseasonalized data, X_t - ŝ_t, t = 1, ..., 72.
(d) Fit a parabola by least squares to the deseasonalized data and use it as your estimate m̂_t of m_t.
(e) Plot the residuals Ŷ_t = X_t - m̂_t - ŝ_t, t = 1, ..., 72.
(f) Compute the sample autocorrelation function of the residuals ρ̂(h), h = 0, ..., 20.
(g) Use your fitted model to predict X_t, t = 73, ..., 84 (using predicted noise values of zero).
1.10. Let X_t = a + bt + Y_t, where {Y_t, t = 0, ±1, ...} is an independent and identically distributed sequence of random variables with mean 0 and variance σ², and a and b are constants. Define

W_t = (2q + 1)^{-1} Σ_{j=-q}^{q} X_{t+j}.

Compute the mean and autocovariance function of {W_t}. Notice that although {W_t} is not stationary, its autocovariance function γ(t + h, t) = Cov(W_{t+h}, W_t) does not depend on t. Plot the autocorrelation function ρ(h) = Corr(W_{t+h}, W_t). Discuss your results in relation to the smoothing of a time series.
1.11. If {X_t} and {Y_t} are uncorrelated stationary sequences, i.e. if X_s and Y_t are uncorrelated for every s and t, show that {X_t + Y_t} is stationary with autocovariance function equal to the sum of the autocovariance functions of {X_t} and {Y_t}.
1.12. Which, if any, of the following functions defined on the integers is the autocovariance function of a stationary time series?
(a) f(h) = 1 if h = 0, 1/h if h ≠ 0.
(b) f(h) = (-1)^{|h|}.
(c) f(h) = 1 + cos(πh/2) + cos(πh/4).
(d) f(h) = 1 + cos(πh/2) - cos(πh/4).
(e) f(h) = 1 if h = 0, .4 if h = ±1, 0 otherwise.
(f) f(h) = 1 if h = 0, .6 if h = ±1, 0 otherwise.
1.13. Let {S_t, t = 0, 1, 2, ...} be the random walk with constant drift μ, defined by S_0 = 0 and

S_t = μ + S_{t-1} + X_t,   t = 1, 2, ...,

where X_1, X_2, ... are independent and identically distributed random variables with mean 0 and variance σ². Compute the mean of S_t and the autocovariance function of the process {S_t}. Show that {∇S_t} is stationary and compute its mean and autocovariance function.
1.14. If X_t = a + bt, t = 1, 2, ..., n, where a and b are constants, show that the sample autocorrelations have the property ρ̂(k) → 1 as n → ∞ for each fixed k.

1.15. Prove Proposition 1.6.1.
1.16. (a) If Z ∼ N(0, 1) show that Z² has moment generating function E e^{tZ²} = (1 - 2t)^{-1/2} for t < ½, thus showing that Z² has the chi-squared distribution with 1 degree of freedom.
(b) If Z_1, ..., Z_n are independent N(0, 1) random variables, prove that Z_1² + ··· + Z_n² has the chi-squared distribution with n degrees of freedom by showing that its moment generating function is equal to (1 - 2t)^{-n/2} for t < ½.
(c) Suppose that X = (X_1, ..., X_n)' ∼ N(μ, Σ) with Σ non-singular. Using (1.6.13), show that (X - μ)'Σ^{-1}(X - μ) has the chi-squared distribution with n degrees of freedom.
1.17. If X = (X_1, ..., X_n)' is a random vector with covariance matrix Σ, show that Σ is singular if and only if there exists a non-zero vector b = (b_1, ..., b_n)' ∈ ℝ^n such that Var(b'X) = 0.

1.18.* Let F be any distribution function, let T be the index set T = {1, 2, 3, ...} and let 𝒯 be as in Definition 1.2.3. Show that the functions F_t, t ∈ 𝒯, defined by

F_{t_1,...,t_n}(x_1, ..., x_n) := F(x_1) ··· F(x_n),   x_1, ..., x_n ∈ ℝ,

constitute a family of distribution functions, consistent in the sense of (1.2.8). By Kolmogorov's theorem this establishes that there exists a sequence of independent random variables {X_1, X_2, ...} defined on some probability space and such that P(X_i ≤ x) = F(x) for all i and for all x ∈ ℝ.
CHAPTER 2
Hilbert Spaces
Although it is possible to study time series analysis without explicit use of Hilbert space terminology and techniques, there are great advantages to be gained from a Hilbert space formulation. These are largely derived from our familiarity with two- and three-dimensional Euclidean geometry and in particular with the concepts of orthogonality