
David R. Brillinger Time Series Data Analysis and Theory 2001


Time Series


SIAM's Classics in Applied Mathematics series consists of books that were previously allowed to go out of print. These books are republished by SIAM as a professional service because they continue to be important resources for mathematical scientists.

Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison

Herbert B. Keller, California Institute of Technology

Andrzej Z. Manitius, George Mason University

Ingram Olkin, Stanford University

Stanley Richardson, University of Edinburgh

Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht

Classics in Applied Mathematics

C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences

Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods

James M. Ortega, Numerical Analysis: A Second Course

Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques

F. H. Clarke, Optimization and Nonsmooth Analysis

George F. Carrier and Carl E. Pearson, Ordinary Differential Equations

Leo Breiman, Probability

R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding

Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences

Olvi L. Mangasarian, Nonlinear Programming

*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart

Richard Bellman, Introduction to Matrix Analysis

U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations

K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations

Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems

J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations

Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability

*First time in print.


Classics in Applied Mathematics (continued)

Cornelius Lanczos, Linear Differential Operators

Richard Bellman, Introduction to Matrix Analysis, Second Edition

Beresford N. Parlett, The Symmetric Eigenvalue Problem

Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow

Peter W. M. John, Statistical Design and Analysis of Experiments

Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition

Emanuel Parzen, Stochastic Processes

Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design

Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology

James A. Murdock, Perturbations: Theory and Methods

Ivar Ekeland and Roger Témam, Convex Analysis and Variational Problems

Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II

J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables

David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications

F. Natterer, The Mathematics of Computerized Tomography

Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging

R. Wong, Asymptotic Approximations of Integrals

O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation

David R. Brillinger, Time Series: Data Analysis and Theory


Time Series
Data Analysis and Theory

David R. Brillinger
University of California at Berkeley

Berkeley, California

siam
Society for Industrial and Applied Mathematics
Philadelphia


Copyright © 2001 by the Society for Industrial and Applied Mathematics.

This SIAM edition is an unabridged republication of the work first published by Holden-Day, Inc., San Francisco, 1981.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data

Brillinger, David R.
Time series: data analysis and theory / David R. Brillinger.
p. cm. -- (Classics in applied mathematics ; 36)
"This SIAM edition is an unabridged republication of the work first published by Holden-Day, Inc., San Francisco, 1981" -- T.p. verso.
ISBN 0-89871-501-6 (pbk.)
1. Time-series analysis. 2. Fourier transformations. I. Title. II. Series.
QA280 .B74 2001
519.5'5--dc21
2001034170

Figure 1.1.3 reprinted with permission from E. W. Carpenter, "Explosion Seismology," Science, 147:363-373, 22 January 1965. Copyright 1965 by the American Association for the Advancement of Science.

siam is a registered trademark.


To My Family


CONTENTS

Preface to the Classics Edition xiii
Preface to the Expanded Edition xvii
Preface to the First Edition xix

1 The Nature of Time Series and Their Frequency Analysis 1

1.1 Introduction 1
1.2 A Reason for Harmonic Analysis 7
1.3 Mixing 8
1.4 Historical Development 9
1.5 The Uses of the Frequency Analysis 10
1.6 Inference on Time Series 12
1.7 Exercises 13

2 Foundations 16

2.1 Introduction 16
2.2 Stochastics 17
2.3 Cumulants 19
2.4 Stationarity 22
2.5 Second-Order Spectra 23
2.6 Cumulant Spectra of Order k 25
2.7 Filters 27
2.8 Invariance Properties of Cumulant Spectra 34
2.9 Examples of Stationary Time Series 35
2.10 Examples of Cumulant Spectra 39
2.11 The Functional and Stochastic Approaches to Time Series Analysis 41
2.12 Trends 43
2.13 Exercises 44


3 Analytic Properties of Fourier Transforms and Complex Matrices 49

3.1 Introduction 49
3.2 Fourier Series 49
3.3 Convergence Factors 52
3.4 Finite Fourier Transforms and Their Properties 60
3.5 The Fast Fourier Transform 64
3.6 Applications of Discrete Fourier Transforms 67
3.7 Complex Matrices and Their Extremal Values 70
3.8 Functions of Fourier Transforms 75
3.9 Spectral Representations in the Functional Approach to Time Series 80
3.10 Exercises 82

4 Stochastic Properties of Finite Fourier Transforms 88

4.1 Introduction 88
4.2 The Complex Normal Distribution 89
4.3 Stochastic Properties of the Finite Fourier Transform 90
4.4 Asymptotic Distribution of the Finite Fourier Transform 94
4.5 Probability 1 Bounds 98
4.6 The Cramér Representation 100
4.7 Principal Component Analysis and its Relation to the Cramér Representation 106
4.8 Exercises 109

5 The Estimation of Power Spectra 116

5.1 Power Spectra and Their Interpretation 116
5.2 The Periodogram 120
5.3 Further Aspects of the Periodogram 128
5.4 The Smoothed Periodogram 131
5.5 A General Class of Spectral Estimates 142
5.6 A Class of Consistent Estimates 146
5.7 Confidence Intervals 151
5.8 Bias and Prefiltering 154
5.9 Alternate Estimates 160
5.10 Estimating the Spectral Measure and Autocovariance Function 166
5.11 Departures from Assumptions 172
5.12 The Uses of Power Spectrum Analysis 179
5.13 Exercises 181


6 Analysis of A Linear Time Invariant Relation Between A Stochastic Series and Several Deterministic Series 186

6.1 Introduction 186
6.2 Least Squares and Regression Theory 188
6.3 Heuristic Construction of Estimates 192
6.4 A Form of Asymptotic Distribution 194
6.5 Expected Values of Estimates of the Transfer Function and Error Spectrum 196
6.6 Asymptotic Covariances of the Proposed Estimates 200
6.7 Asymptotic Normality of the Estimates 203
6.8 Estimating the Impulse Response 204
6.9 Confidence Regions 206
6.10 A Worked Example 209
6.11 Further Considerations 219
6.12 A Comparison of Three Estimates of the Impulse Response 223
6.13 Uses of the Proposed Technique 225
6.14 Exercises 227

7 Estimating The Second-Order Spectra of Vector-Valued Series 232

7.1 The Spectral Density Matrix and its Interpretation 232
7.2 Second-Order Periodograms 235
7.3 Estimating the Spectral Density Matrix by Smoothing 242
7.4 Consistent Estimates of the Spectral Density Matrix 247
7.5 Construction of Confidence Limits 252
7.6 The Estimation of Related Parameters 254
7.7 Further Considerations in the Estimation of Second-Order Spectra 260
7.8 A Worked Example 267
7.9 The Analysis of Series Collected in an Experimental Design 276
7.10 Exercises 279

8 Analysis of A Linear Time Invariant Relation Between Two Vector-Valued Stochastic Series 286

8.1 Introduction 286
8.2 Analogous Multivariate Results 287
8.3 Determination of an Optimum Linear Filter 295
8.4 Heuristic Interpretation of Parameters and Construction of Estimates 299
8.5 A Limiting Distribution for Estimates 304
8.6 A Class of Consistent Estimates 306
8.7 Second-Order Asymptotic Moments of the Estimates 309


8.8 Asymptotic Distribution of the Estimates 313
8.9 Confidence Regions for the Proposed Estimates 314
8.10 Estimation of the Filter Coefficients 317
8.11 Probability 1 Bounds 321
8.12 Further Considerations 322
8.13 Alternate Forms of Estimates 325
8.14 A Worked Example 330
8.15 Uses of the Analysis of this Chapter 331
8.16 Exercises 332

9 Principal Components in The Frequency Domain 337

9.1 Introduction 337
9.2 Principal Component Analysis of Vector-Valued Variates 339
9.3 The Principal Component Series 344
9.4 The Construction of Estimates and Asymptotic Properties 348
9.5 Further Aspects of Principal Components 353
9.6 A Worked Example 355
9.7 Exercises 364

10 The Canonical Analysis of Time Series 367

10.1 Introduction 367
10.2 The Canonical Analysis of Vector-Valued Variates 368
10.3 The Canonical Variate Series 379
10.4 The Construction of Estimates and Asymptotic Properties 384
10.5 Further Aspects of Canonical Variates 388
10.6 Exercises 390

Proofs of Theorems 392

References 461

Notation Index 488

Author Index 490

Subject Index 496

Addendum: Fourier Analysis of Stationary Processes 501


PREFACE TO THE CLASSICS EDITION

"One can FT anything—often meaningfully."—John W. Tukey

John Tukey made this remark after my book had been published, but it is surely the motif of the work of the book. In fact the preface of the original book states that

The reader will note that the various statistics presented are immediate functions of the discrete Fourier transforms of the observed values of the time series. Perhaps this is what characterizes the work of this book. The discrete Fourier transform is given such prominence because it has important empirical and mathematical properties. Also, following the work of Cooley and Tukey (1965), it may be computed rapidly.

The book was finished in mid 1972. The field has moved on from its place then. Some of the areas of particular development include the following.

I. Limit theorems for empirical Fourier transforms.
Many of the techniques based on the Fourier transform of a stretch of time series are founded on limit or approximation theorems. Examples may be found in Brillinger (1983). There have been developments to more abstract types of processes: see, for example, Brillinger (1982, 1991). One particular type of development concerns distributions with long tails; see Freedman and Lane (1981). Another type of extension concerns series that have so-called long memory. The large sample distribution of the Fourier transform values in this case is developed in Rosenblatt (1981), Yajima (1989), and Pham and Guegan (1994).

II. Tapering.
The idea of introducing convergence factors into a Fourier approximation has a long history. In the time series case, this is known as tapering. Surprising properties continue to be found; see Dahlhaus (1985).

III. Finite-dimensional parameter estimation.
Dzhaparidze (1986) develops in detail Whittle's method of Gaussian or approximate likelihood estimation. Brillinger (1985) generalizes this to the third-order case in the tool of bispectral fitting. Terdik (1999) develops theoretical properties of that procedure.

IV. Computation.
Time series researchers were astonished in the early 1980s to learn that the fast Fourier transform algorithms had been anticipated many years earlier by K. F. Gauss. The story is told in Heideman et al. (1985). There have since been extensions to the cases of a prime number of observations (see Anderson and Dillon (1996)) and to the case of unequally spaced time points (see Nguyen and Liu (1999)).

V. General methods and examples.
A number of applications to particular physical circumstances have been made of Fourier inference; see the paper by Brillinger (1999) and the book by Bloomfield (2000).

D. R. B.
Berkeley, California
December 2000

ANDERSON, C., and DILLON, M. (1996). "Rapid computation of the discrete Fourier transform." SIAM J. Sci. Comput. 17:913-919.

BLOOMFIELD, P. (2000). Fourier Analysis of Time Series: An Introduction. Second Edition. New York: Wiley.

BRILLINGER, D. R. (1982). "Asymptotic normality of finite Fourier transforms of stationary generalized processes." J. Multivariate Anal. 12:64-71.

BRILLINGER, D. R. (1983). "The finite Fourier transform of a stationary process." In Time Series in the Frequency Domain, Handbook of Statist. 3, Eds. D. R. Brillinger and P. R. Krishnaiah, pp. 21-37. Amsterdam: Elsevier.


BRILLINGER, D. R. (1985). "Fourier inference: some methods for the analysis of array and nongaussian series data." Water Resources Bulletin. 21:743-756.

BRILLINGER, D. R. (1991). "Some asymptotics of finite Fourier transforms of a stationary p-adic process." J. Combin. Inform. System Sci. 16:155-169.

BRILLINGER, D. R. (1999). "Some examples of empirical Fourier analysis in scientific problems." In Asymptotics, Nonparametrics and Time Series, Ed. S. Ghosh, pp. 1-36. New York: Dekker.

DAHLHAUS, R. (1985). "A functional limit theorem for tapered empirical spectral functions." Stochastic Process. Appl. 19:135-149.

DZHAPARIDZE, K. (1986). Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series. New York: Springer.

FREEDMAN, D., and LANE, D. (1981). "The empirical distribution of the Fourier coefficients of a sequence of independent, identically distributed long-tailed random variables." Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 58:21-40.

HEIDEMAN, M. T., JOHNSON, D. H., and BURRUS, C. S. (1985). "Gauss and the history of the fast Fourier transform." Arch. Hist. Exact Sci. 34:265-277.

NGUYEN, N., and LIU, Q. H. (1999). "The regular Fourier matrices and nonuniform fast Fourier transforms." SIAM J. Sci. Comput. 21:283-293.

PHAM, D. T., and GUEGAN, D. (1994). "Asymptotic normality of the discrete Fourier transform of long memory series." Statist. Probab. Lett. 21:299-309.

ROSENBLATT, M. (1981). "Limit theorems for Fourier transforms of functionals of Gaussian sequences." Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 55:123-132.

TERDIK, G. (1999). Bilinear Stochastic Models and Related Problems of Nonlinear Time Series Analysis: A Frequency Domain Approach, Lecture Notes in Statist. 142. New York: Springer.

YAJIMA, Y. (1989). "A central limit theorem of Fourier transforms of strongly dependent stationary processes." J. Time Ser. Anal. 10:375-383.


PREFACE TO THE EXPANDED EDITION

The 1975 edition of Time Series: Data Analysis and Theory has been expanded to include the survey paper "Fourier Analysis of Stationary Processes." The intention of the first edition was to develop the many important properties and uses of the discrete Fourier transforms of the observed values of the time series. The Addendum indicates the extension of the results to continuous series, spatial series, point processes and random Schwartz distributions. Extensions to higher-order spectra and nonlinear systems are also suggested.

The Preface to the 1975 edition promised a Volume Two devoted to the aforementioned extensions. The author found that there was so much existing material, and developments were taking place so rapidly in those areas, that whole volumes could be devoted to each. He chose to concentrate on research, rather than exposition.

From the letters that he has received the author is convinced that his intentions with the first edition have been successfully realized. He thanks those who wrote for doing so.

D. R. B.


PREFACE TO THE FIRST EDITION

The initial basis of this work was a series of lectures that I presented to the members of Department 1215 of Bell Telephone Laboratories, Murray Hill, New Jersey, during the summer of 1967. Ram Gnanadesikan of that Department encouraged me to write the lectures up in a formal manner. Many of the worked examples that are included were prepared that summer at the Laboratories using their GE 645 computer and associated graphical devices.

The lectures were given again, in a more elementary and heuristic manner, to graduate students in Statistics at the University of California, Berkeley, during the Winter and Spring Quarters of 1968 and later to graduate students in Statistics and Econometrics at the London School of Economics during the Lent Term, 1969. The final manuscript was completed in mid 1972. It is hoped that the references provided are near complete for the years before then.

I feel that the book will prove useful as a text for graduate level courses in time series analysis and also as a reference book for research workers interested in the frequency analysis of time series. Throughout, I have tried to set down precise definitions and assumptions whenever possible. This undertaking has the advantage of providing a firm foundation from which to reach for real-world applications. The results presented are generally far from the best possible; however, they have the advantage of flowing from a single important mixing condition that is set down early and gives continuity to the book.

Because exact results are simply not available, many of the theorems of the work are asymptotic in nature. The applied worker need not be put off by this. These theorems have been set down in the spirit that the indicated asymptotic moments and distributions may provide reasonable approximations to the desired finite sample results. Unfortunately not too much work has gone into checking the accuracy of the asymptotic results, but some references are given.

The reader will note that the various statistics presented are immediate functions of the discrete Fourier transforms of the observed values of the time series. Perhaps this is what characterizes the work of this book. The discrete Fourier transform is given such prominence because it has important empirical and mathematical properties. Also, following the work of Cooley and Tukey (1965), it may be computed rapidly. The definitions, procedures, techniques, and statistics discussed are, in many cases, simple extensions of existing multiple regression and multivariate analysis techniques. This pleasant state of affairs is indicative of the widely pervasive nature of the important statistical and data analytic procedures.

The work is split into two volumes. This volume is, in general, devoted to aspects of the linear analysis of stationary vector-valued time series. Volume Two, still in preparation, is concerned with nonlinear analysis and the extension of the results of this volume to stationary vector-valued continuous series, spatial series, and vector-valued point processes.

Dr. Colin Mallows of Bell Telephone Laboratories provided the author with detailed comments on a draft of this volume. Professor Ingram Olkin of Stanford University also commented on the earlier chapters of that draft. Mr. Jostein Lillestöl read through the galleys. Their suggestions were most helpful.

I learned time series analysis from John W. Tukey. I thank him now for all the help and encouragement he has provided.

D.R.B.

Page 22: David R. Brillinger Time Series Data Analysis and Theory 2001

1

THE NATURE OF TIME SERIES
AND THEIR FREQUENCY ANALYSIS

1.1 INTRODUCTION

In this work we will be concerned with the examination of r vector-valued functions

X(t) = [X_1(t), ..., X_r(t)]'

where X_j(t), j = 1, ..., r is real-valued and t takes on the values 0, ±1, ±2, .... Such an entity of measurements will be referred to as an r vector-valued time series. The index t will often refer to the time of recording of the measurements.

An example of a vector-valued time series is the collection of mean monthly temperatures recorded at scattered locations. Figure 1.1.1 gives such a series for the locations listed in Table 1.1.1. Figure 1.1.2 indicates the positions of these locations. Such data may be found in World Weather Records (1965). This series was provided by J. M. Craddock, Meteorological Office, Bracknell. Another example of a vector-valued time series is the set of signals recorded by an array of seismometers in the aftermath of an earthquake or nuclear explosion. These signals are discussed in Keen et al. (1965) and Carpenter (1965). Figure 1.1.3 presents an example of such a record.


Figure 1.1.1 Monthly mean temperatures in °C at 14 stations for the years 1920-1930.


Table 1.1.1 Stations and Time Periods of Temperature Data Used in Worked Examples

Index  City        Period Available
1      Vienna      1780-1950
2      Berlin      1769-1950
3      Copenhagen  1798-1950
4      Prague      1775-1939
5      Stockholm   1756-1960
6      Budapest    1780-1947
7      De Bilt     1711-1960
8      Edinburgh   1764-1959
9      Greenwich   1763-1962
10     New Haven   1780-1950
11     Basel       1755-1957
12     Breslau     1792-1950
13     Vilna       1781-1938
14     Trondheim   1761-1946

Figure 1.1.2 Locations of the temperature stations (except New Haven, U.S.A.).


These examples are taken from the physical sciences; however, the social sciences also lead to the consideration of vector-valued time series. Figure 1.1.4 is a plot of exports from the United Kingdom separated by destination during the period 1958-1968. The techniques discussed in this work will sometimes be useful in the analysis of such a series although the results obtained are not generally conclusive due to a scarcity of data and departure from assumptions.

An inspection of the figures suggests that the individual component series are quite strongly interrelated. Much of our concern in this work will center on examining interrelations of component series. In addition there are situations in which we are interested in a single series on its own. For example, Singleton and Poulter (1967) were concerned with the call of a male killer whale and Godfrey (1965) was concerned with the quantity of cash held within the Federal Reserve System for the purpose of meeting interbank check-handling obligations each month. Figure 1.1.5 is a graph of the annual mean sunspot numbers for the period 1760-1965; see Waldmeier (1961). This series has often been considered by statisticians; see Yule (1927), Whittle (1954), Brillinger and Rosenblatt (1967b). Generally speaking it will be enough to consider single component series as particular cases of vector-valued series corresponding to r = 1. However, it is typically much more informative if we carry out a vector analysis, and it is wise to search out series related to any single series and to include them in the analysis.

Figure 1.1.3 Signals recorded by an array of seismometers at the time of an event.

Figure 1.1.4 Value of United Kingdom exports by destination for 1958-1968.

Page 28: David R. Brillinger Time Series Data Analysis and Theory 2001

1.2 A REASON FOR HARMONIC ANALYSIS

Figure 1.1.5 Annual mean sunspot numbers for 1760-1965.

1.2 A REASON FOR HARMONIC ANALYSIS

The principal mathematical methodology we will employ in our analysis of time series is harmonic analysis. This is because of our decision to restrict consideration to series resulting from experiments not tied to a specific time origin or, in other words, experiments invariant with respect to translations of time. This implies, for example, that the proportion of the values X(t), t > u, falling in some interval I, should be approximately the same as the proportion of the values X(t), t > u + v, falling in I for all v.

The typical physical experiment appears to possess, in large part, this sort of time invariance. Whether a physicist commenced to measure the force of gravity one day or the next does not seem to matter for most purposes. A cursory examination of the series of the previous section suggests: the temperature series of Figure 1.1.1 are reasonably stable in time; portions of the seismic series appear stable; the export series do not appear stationary; and the sunspot series appear possibly so. The behavior of the export series is typical of that of many socioeconomic series. Since people learn from the past and hence alter their behavior, series relating to them are not generally time invariant. Later we will discuss methods that may allow removing a stationary component from a nonstationary series; however, the techniques of this work are principally directed toward the analysis of series stable in time.

The requirement of elementary behavior under translations in time has certain analytic implications. Let f(t) be a real or complex-valued function defined for t = 0, ±1, .... If we require

f(t + u) = f(t)    for t, u = 0, ±1, ±2, ...    (1.2.1)

then clearly f(t) is constant. We must therefore be less stringent than expression (1.2.1) in searching for functions behaving simply under time translations. Let us require instead

f(t + u) = C_u f(t)    for t, u = 0, ±1, ±2, ....    (1.2.2)

Setting u = 1 and proceeding recursively gives

f(t) = C_1^t f(0).    (1.2.3)

In either case, if we write C_1 = exp{a}, a real or complex, then we see that the general solution of expression (1.2.2) may be written

f(t) = f(0) exp{at}    (1.2.4)

and that C_u = exp{au}. The bounded solutions of expression (1.2.2) are seen to occur for a = iλ, λ real, where i = √(-1). In summary, if we look for functions behaving simply with respect to time translation, then we are led to the sinusoids exp{iλt}, λ real; the parameter λ is called the frequency of the sinusoid. If in fact

f(t) = Σ_j c_j exp{iλ_j t}    (1.2.5)

then

f(t + u) = Σ_j c_j' exp{iλ_j t}    (1.2.6)

with c_j' = c_j exp{iλ_j u}. In other words, if a function of interest is a sum of cosinusoids, then its behavior under translations is also easily described. We have, therefore, in the case of experiments leading to results that are deterministic functions, been led to functions that can be developed in the manner of (1.2.6). The study of such functions is the concern of harmonic or Fourier analysis; see Bochner (1959), Zygmund (1959), Hewitt and Ross (1963), Wiener (1933), Edwards (1967).

In Section 2.7 we will see that an important class of operations on time series, filters, is also most easily described and investigated through harmonic analysis.

With experiments that result in random or stochastic functions, X(t), time invariance leads us to investigate the class of experiments such that {X(t_1), ..., X(t_k)} has the same probability structure as {X(t_1 + u), ..., X(t_k + u)} for all u and t_1, ..., t_k. The results of such experiments are called stationary stochastic processes; see Doob (1953), Wold (1938), and Khintchine (1934).

1.3 MIXING

A second important requirement that we will place upon the time series that we consider is that they have a short span of dependence. That is, the measurements X(t) and X(s) are becoming unrelated or statistically independent of each other as t - s → ∞.

This requirement will later be set down in a formal manner with Assumptions 2.6.1 and 2.6.2(l). It allows us to define relevant population parameters and implies that various estimates of interest are asymptotically Gaussian in the manner of the central limit theorem.

Many series that are reasonably stationary appear to satisfy this sort of requirement; possibly because as time progresses they are subjected to random shocks, unrelated to what has gone before, and these random shocks eventually form the prime content of the series.

A requirement that a time series have a weak memory is generally referred to as a mixing assumption; see Rosenblatt (1956b).

1.4 HISTORICAL DEVELOPMENT

The basic tool that we will employ, in the analysis of time series, is the finite Fourier transform of an observed section of the series.

The taking of the Fourier transform of an empirical function was proposed as a means of searching for hidden periodicities in Stokes (1879). Schuster (1894), (1897), (1900), (1906a), (1906b), in order to avoid the annoyance of considering relative phases, proposed the consideration of the modulus-squared of the finite Fourier transform. He called this statistic the periodogram. His motivation was also the search for hidden periodicities.
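Schuster's statistic is immediate to compute today. The following sketch (Python with numpy, added here for illustration; the (2πT)⁻¹ normalization is an assumption anticipating the convention adopted in Chapter 5) forms the periodogram of a series containing a hidden periodicity:

```python
import numpy as np

def periodogram(x):
    """Schuster's periodogram: the modulus-squared of the finite
    Fourier transform d(2*pi*s/T), s = 0, ..., T-1, here normalized
    by (2*pi*T) following the convention used later in the book."""
    T = len(x)
    d = np.fft.fft(x)                 # finite Fourier transform of the stretch
    return np.abs(d) ** 2 / (2 * np.pi * T)

# A series with a hidden periodicity at Fourier frequency 2*pi*10/T,
# buried in noise.
rng = np.random.default_rng(0)
T = 256
t = np.arange(T)
x = np.cos(2 * np.pi * 10 * t / T) + 0.5 * rng.standard_normal(T)
I = periodogram(x)                    # peaks at s = 10 (and its mirror T - 10)
```

The peak at s = 10 stands far above the noise level, which is how the periodogram reveals a hidden periodicity without regard to its phase.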

The consideration of the periodogram for general stationary processes was initiated by Slutsky (1929, 1934). He developed many of the statistical properties of the periodogram under a normal assumption and a mixing assumption. Concurrently Wiener (1930) was proposing a very general form of harmonic analysis for time series and beginning a study of vector processes.

The use of harmonic analysis as a tool for the search of hidden periodicities was eventually replaced by its much more important use for inquiring into relations between series; see Wiener (1949) and Press and Tukey (1956). An important statistic in this case is the cross-periodogram, a product of the finite Fourier transforms of two series. It is inherent in Wiener (1930) and Goodman (1957); the term cross-periodogram appears in Whittle (1953).
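The cross-periodogram can be sketched in the same modern notation: one series' finite Fourier transform times the complex conjugate of the other's (an illustration; the normalization mirrors the previous sketch and is an assumption, not a formula from this chapter):

```python
import numpy as np

def cross_periodogram(x, y):
    """A product of the finite Fourier transforms of two series:
    d_x(lambda) times the complex conjugate of d_y(lambda),
    normalized by (2*pi*T) as for the ordinary periodogram."""
    T = len(x)
    return np.fft.fft(x) * np.conj(np.fft.fft(y)) / (2 * np.pi * T)

# Two series sharing a sinusoid at Fourier frequency 2*pi*10/T, the
# second lagged by 3 time units.
T = 256
t = np.arange(T)
x = np.cos(2 * np.pi * 10 * t / T)
y = np.cos(2 * np.pi * 10 * (t - 3) / T)
Ixy = cross_periodogram(x, y)
# |Ixy| is large only at s = 10 (and its mirror); the argument of
# Ixy[10] equals the phase shift 2*pi*10*3/T induced by the lag.
```

The modulus flags frequencies the two series share, while the phase carries the lead-lag relation between them, which is what makes the statistic useful for inquiring into relations between series.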

The periodogram and cross-periodogram are second-order statistics and thus are especially important in the consideration of Gaussian processes. Higher order analogs are required for the consideration of various aspects of non-Gaussian series. The third-order periodogram, a product of three finite Fourier transforms, appears in Rosenblatt and Van Ness (1965), and the kth order periodogram, a product of k finite Fourier transforms, in Brillinger and Rosenblatt (1967a, b).

Page 31: David R. Brillinger Time Series Data Analysis and Theory 2001

10 NATURE OF TIME SERIES AND THEIR FREQUENCY ANALYSIS

The instability of periodogram-type statistics is immediately apparent when they are calculated from empirical functions; see Kendall (1946), Wold (1965), and Chapter 5 of this text. This instability led Daniell (1946) to propose a numerical smoothing of the periodogram which has now become basic to most forms of frequency analysis.
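Daniell's proposal can be sketched in a few lines (illustrative Python; the rectangular weights and the circular treatment of the frequency axis are simplifying assumptions, not the only possible choices):

```python
import numpy as np

def daniell_smooth(I, m):
    """Average the periodogram I over 2*m + 1 neighbouring Fourier
    frequencies, wrapping around circularly at the ends."""
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    padded = np.concatenate([I[-m:], I, I[:m]])
    return np.convolve(padded, kernel, mode="valid")

rng = np.random.default_rng(2)
x = rng.normal(size=1024)                          # white noise
I = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * 1024)
S = daniell_smooth(I, 8)
print(len(S) == len(I), S.var() < I.var())         # smoothing stabilizes
```

The smoothed values fluctuate far less about the true (flat) spectrum than the raw periodogram ordinates do, which is the point of the device.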

Papers and books, historically important in the development of the mathematical foundations of the harmonic analysis of time series, include: Slutsky (1929), Wiener (1930), Khintchine (1934), Wold (1938), Kolmogorov (1941a, b), Cramér (1942), Blanc-Lapierre and Fortet (1953), and Grenander (1951a).

Papers and books, historically important in the development of the empirical harmonic analysis of time series, include: Schuster (1894, 1898), Tukey (1949), Bartlett (1948), Blackman and Tukey (1958), Grenander and Rosenblatt (1957), Bartlett (1966), Hannan (1960), Stumpff (1937), and Chapman and Bartels (1951).

Wold (1965) is a bibliography of papers on time series analysis. Burkhardt (1904) and Wiener (1938) supply a summary of the very early work. Simpson (1966) and Robinson (1967) provide many computer programs useful in analyzing time series.

1.5 THE USES OF THE FREQUENCY ANALYSIS

This section contains a brief survey of some of the fields in which spectral analysis has been employed. There are three principal reasons for using spectral analysis in the cases to be presented: (i) to provide useful descriptive statistics, (ii) as a diagnostic tool to indicate which further analyses might be relevant, and (iii) to check postulated theoretical models. Generally, the success experienced with the technique seems to vary directly with the length of series available for analysis.

Physics If the spectral analysis of time series is viewed as the study of the individual frequency components of some time series of interest, then the first serious application of this technique may be regarded as having occurred in 1664 when Newton broke sunlight into its component parts by passing it through a prism. From this experiment has grown the subject of spectroscopy (Meggers (1946), McGucken (1970), and Kuhn (1962)), in which there is investigation of the distribution of the energy of a radiation field as a function of frequency. (This function will later be called a power spectrum.) Physicists have applied spectroscopy to identifying chemical elements, to determining the direction and rate of movement of celestial bodies, and to testing general relativity. The spectrum is an important parameter in the description of color; see Wright (1958).


The frequency analysis of light is discussed in detail in Born and Wolf (1959); see also Schuster (1904), Wiener (1953), Jennison (1961), and Sears (1949).

Power spectra have been used frequently in the fields of turbulence and fluid mechanics; see Meecham and Siegel (1964), Kampé de Fériet (1954), Hopf (1952), Burgers (1948), Friedlander and Topper (1961), and Batchelor (1960). Here one typically sets up a model leading to a theoretical power spectrum and checks it empirically. Early references are given in Wiener (1930).

Electrical Engineering Electrical engineers have long been concerned with the problem of measuring the power in various frequency bands of some electromagnetic signal of interest. For example, see Pupin (1894), Wegel and Moore (1924), and Van der Pol (1930). Later, the invention of radar gave stimulus to the problem of signal detection, and frequency analysis proved a useful tool in its investigation; see Wiener (1949), Lee and Wiesner (1950), and Solodovnikov (1960). Frequency analysis is now firmly involved in the areas of coding, information theory, and communications; see Gabor (1946), Middleton (1960), and Pinsker (1964). In many of these problems, Maxwell's equations lead to an underlying model of some use.

Acoustics Frequency analysis has proved itself important in the field of acoustics. Here the power spectrum has generally played the role of a descriptive statistic. For example, see Crandall and Sacia (1924), Beranek (1954), and Majewski and Hollien (1967). An important device in this connection is the sound spectrograph, which permits the display of time-dependent spectra; see Fehr and McGahan (1967). Another interesting device is described in Noll (1964).

Geophysics Tukey (1965a) has given a detailed description and bibliography of the uses of frequency analysis in geophysics; see also Tukey (1965b), Kinosita (1964), Sato (1964), Smith et al. (1967), Labrouste (1934), Munk and MacDonald (1960), Ocean Wave Spectra (1963), Haubrich and MacKenzie (1965), and various authors (1966). A recent dramatic example involves the investigation of the structure of the moon by the frequency analysis of seismic signals resulting from man-made impacts on the moon; see Latham et al. (1970).

Other Engineering Harmonic analysis has been employed in many areas of engineering other than electrical: for example, in aeronautical engineering, Press and Tukey (1956), Takeda (1964); in naval engineering, Yamanouchi (1961), Kawashima (1964); in hydraulics, Nakamura and Murakami (1964); and in mechanical engineering, Nakamura (1964), Kaneshige (1964), Crandall (1958), Crandall (1963). Civil engineers find spectral techniques useful in understanding the responses of buildings to earthquakes.


Medicine A variety of medical data is collected in the form of time series; for example, electroencephalograms and electrocardiograms. References to the frequency analysis of such data include: Alberts et al. (1965), Bertrand and Lacape (1943), Gibbs and Grass (1947), Suhara and Suzuki (1964), and Yuzuriha (1960). The correlation analysis of EEG's is discussed in Barlow (1967); see also Wiener (1957, 1958).

Economics Two books, Granger (1964) and Fishman (1969), have appeared on the application of frequency analysis to economic time series. Other references include: Beveridge (1921), Beveridge (1922), Nerlove (1964), Cootner (1964), Fishman and Kiviat (1967), Burley (1969), and Brillinger and Hatanaka (1970). Bispectral analysis is employed in Godfrey (1965).

Biology Frequency analysis has been used to investigate the circadian rhythm present in the behavior of certain plants and animals; for example, see Aschoff (1965), Chance et al. (1967), and Richter (1967). Frequency analysis is also useful in constructing models for human hearing; see Mathews (1963).

Psychology A frequency analysis of data resulting from psychological tests is carried out in Abelson (1953).

Numerical Analysis Spectral analysis has been used to investigate the independence properties of pseudorandom numbers generated by various recursive schemes; see Jagerman (1963) and Coveyou and MacPherson (1967).

1.6 INFERENCE ON TIME SERIES

The purpose of this section is to record the following fact that the reader will soon note for himself in proceeding through this work: the theory and techniques employed in the discussion of time series statistics are entirely elementary. The basic means of constructing estimates is the method of moments. Asymptotic theory is heavily relied upon to provide justifications. Much of what is presented is a second-order theory and is therefore most suitable for Gaussian processes. Sufficient statistics, maximum likelihood statistics, and other important concepts of statistical inference are only barely mentioned.

A few attempts have been made to bring the concepts and methods of current statistical theory to bear on stationary time series; see Bartlett (1966), Grenander (1950), Slepian (1954), and Whittle (1952). Likelihood ratios have been considered in Striebel (1959), Parzen (1963), and Gikhman and Skorokhod (1966). General frameworks for time series analysis have been described in Rao (1963), Stigum (1967), and Rao (1966); see also Hajek (1962), Whittle (1961), and Arato (1961).


It should be pointed out that historically there have been two rather distinct approaches to the analysis of time series: the frequency or harmonic approach and the time domain approach. This work is concerned with the former, while the latter is exemplified by the work of Mann and Wald (1943), Quenouille (1957), Durbin (1960), Whittle (1963), and Box and Jenkins (1970). The differences between these two analyses are discussed in Wold (1963). With the appearance of the fast Fourier transform algorithm, however, it may be more efficient to carry out computations in the frequency domain even when the time domain approach is adopted; see Section 3.6, for example.

1.7 EXERCISES

1.7.1 If f(·) is complex valued and f(t_1 + u_1, . . . , t_k + u_k) = C_{u_1,...,u_k} f(t_1, . . . , t_k) for t_j, u_j = 0, ±1, ±2, . . . , j = 1, . . . , k, prove that f(t_1, . . . , t_k) = f(0, . . . , 0) exp{Σ_j a_j t_j} for some a_1, . . . , a_k. See Aczel (1969).

1.7.2 If f(t) is complex valued, continuous, and f(t + u) = C_u f(t) for -∞ < t, u < ∞, prove that f(t) = f(0) exp{at} for some a.

1.7.3 If f(t) is r vector-valued, with complex components, and f(t + u) = C_u f(t) for t, u = 0, ±1, ±2, . . . and C_u an r × r matrix function, prove that f(t) = C_1^t f(0) if Det{f(0), . . . , f(r - 1)} ≠ 0. See Doeblin (1938) and Kirchener (1967).

1.7.4 Let W(α), -∞ < α < ∞, be an absolutely integrable function satisfying ∫ W(α) dα = 1. Let f(α), -∞ < α < ∞, be a bounded function continuous at α = λ. Show that ε^{-1} ∫ W[ε^{-1}(λ - α)] dα = 1 and that ε^{-1} ∫ W[ε^{-1}(λ - α)] f(α) dα → f(λ) as ε → 0.

1.7.5 Prove that for


1.7.6 Let X_1, . . . , X_r be independent random variables with EX_j = μ and var X_j = σ_j^2. Consider linear combinations Y = Σ_j a_j X_j with Σ_j a_j = 1. We have EY = μ. Prove that var Y is minimized by the choice a_j = σ_j^{-2} / Σ_k σ_k^{-2}, j = 1, . . . , r.
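The minimization in Exercise 1.7.6 is easy to verify numerically; a sketch with illustrative variances, comparing the inverse-variance weights against random competitors satisfying the same constraint:

```python
import numpy as np

# var Y = sum_j a_j**2 * s2_j for independent X_j with variances s2_j,
# subject to the constraint sum_j a_j = 1
s2 = np.array([1.0, 4.0, 0.25])     # illustrative variances

def var_Y(a):
    return float(np.sum(a ** 2 * s2))

a_opt = (1 / s2) / np.sum(1 / s2)   # inverse-variance weights
rng = np.random.default_rng(3)
for _ in range(1000):               # random weight vectors summing to 1
    a = rng.normal(size=3)
    a /= a.sum()
    assert var_Y(a_opt) <= var_Y(a) + 1e-12
print(round(var_Y(a_opt), 4))       # the minimum is 1 / sum(1/s2) = 0.1905
```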

1.7.7 Prove that Σ_{t=0}^{T-1} exp{i(2πst)/T} = T if s = 0, ±T, ±2T, . . . and = 0 for other integral values of s.
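The orthogonality relation of Exercise 1.7.7 underlies the discrete Fourier computations of later chapters, and a direct numerical check is immediate (illustrative Python; T = 8 is an arbitrary choice):

```python
import numpy as np

T = 8
for s in range(-2 * T, 2 * T + 1):
    total = sum(np.exp(2j * np.pi * s * t / T) for t in range(T))
    expected = T if s % T == 0 else 0
    assert abs(total - expected) < 1e-9
print("sum over t of exp(i*2*pi*s*t/T) equals T when T divides s, else 0")
```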

1.7.8 If X is a real-valued random variable with finite second moment and θ is real valued, prove that E(X - θ)^2 = var X + (EX - θ)^2.

1.7.9 Let l denote the space of two-sided sequences x = {x_t, t = 0, ±1, ±2, . . . }. Let 𝒜 denote an operation on l that is linear [𝒜(αx + βy) = α𝒜x + β𝒜y for α, β scalars and x, y ∈ l] and time invariant [𝒜y = Y if 𝒜x = X, y_t = x_{t+u}, Y_t = X_{t+u} for some u = 0, ±1, ±2, . . . ]. Prove that there exists a function A(λ) such that (𝒜x)_t = A(λ)x_t if x_t = exp{iλt}.

1.7.10 Consider a sequence c_0, c_1, c_2, . . . , its partial sums S_T = Σ_{t=0}^{T} c_t, and the Cesàro means

σ_T = (S_0 + · · · + S_T)/(T + 1).

If S_T → S, prove that σ_T → S (as T → ∞); see Knopp (1948).

1.7.11 Let (X, Y) be a vector-valued random variable with Y real-valued and EY^2 < ∞. Prove that the φ(X) with Eφ(X)^2 < ∞ that minimizes E[Y - φ(X)]^2 is given by φ(X) = E{Y | X}.

1.7.12 Show that for n = 1, 2, . . .

and from this show



1.7.13 Show that the identity

Σ_{k=m}^{n} u_k v_k = Σ_{k=m}^{n-1} U_k (v_k - v_{k+1}) + U_n v_n - U_{m-1} v_m

holds, where 0 ≤ m ≤ n, U_k = u_0 + · · · + u_k (k ≥ 0), U_{-1} = 0. (Abel's transformation)

1.7.14 (a) Let f(x), 0 ≤ x ≤ 1, be integrable and have an integrable derivative f^{(1)}(x). Show that

with [y] denoting the integral part of y.

(b) Let f^{(k)}(x), k = 0, 1, 2, . . . denote the kth derivative of f(x). Suppose f^{(k)}(x), 0 ≤ x ≤ 1, is integrable for k = 0, 1, 2, . . . , K. Show that

where B_k(y) denotes the kth Bernoulli polynomial. (Euler-MacLaurin)


2

FOUNDATIONS

2.1 INTRODUCTION

In this chapter we present portions of both the stochastic and deterministic approaches to the foundations of time series analysis. The assumptions made in either approach will be seen to lead to the definition of similar parameters of interest, and implications for practice are generally the same. In fact it will be shown that the two approaches are equivalent in a certain sense. An important part of this chapter will be to develop the invariance properties of the parameters of interest for a class of transformations of the series called filters. Proofs of the theorems and lemmas are given at the end of the book.

The notation that will be adopted throughout this text includes boldface letters A, B which denote matrices. If a matrix A has entries A_jk we sometimes indicate it by [A_jk]. Given an r × s matrix A, its s × r transpose is denoted by Aᵀ, and the matrix whose entries are the complex conjugates of those of A is denoted by Ā. Det A denotes the determinant of the square matrix A; the trace of A is indicated by tr A. |A| denotes the sum of the absolute values of the entries of A, and I the identity matrix. An r vector is an r × 1 matrix.

We denote the expected value of a random variable X by EX generally, and sometimes by ave X; this will reduce the possibility of confusion in certain expressions. We denote the variance of X by var X. If (X, Y) is a bivariate random variable, we denote the covariance of X with Y by cov {X, Y}. We signify the correlation of X with Y by cor {X, Y}.


If z is a complex number, we indicate its real part by Re z and its imaginary part by Im z. We therefore have the representation z = Re z + i Im z. We denote the modulus of z, [(Re z)^2 + (Im z)^2]^{1/2}, by |z| and its argument, tan^{-1} {Im z / Re z}, by arg z. If x and y are real numbers, we will write x ≡ y (mod a) when the difference x - y is an integral multiple of a.

The following functions will prove useful in our work: the Kronecker delta

δ{a} = 1 if a = 0, = 0 otherwise,

and the Kronecker comb

η{a} = 1 if a ≡ 0 (mod 2π), = 0 otherwise.

Likewise the following generalized functions will be useful: the Dirac delta function δ(α), -∞ < α < ∞, with the property

∫ δ(α) f(α) dα = f(0)

for all functions f(α) continuous at 0, and the Dirac comb

η(α) = Σ_{j=-∞}^{∞} δ(α - 2πj)   (2.1.6)

for -∞ < α < ∞, with the property

∫ η(α) f(α) dα = Σ_{j=-∞}^{∞} f(2πj)

for all suitable functions f(α). These last functions are discussed in Lighthill (1958), Papoulis (1962), and Edwards (1967). Exercise 1.7.4 suggests that ε^{-1}W(ε^{-1}α), for small ε, provides an approximate Dirac delta function.
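The approximate identity just mentioned can be seen numerically; in the sketch below W is taken to be a Gaussian kernel and f(α) = cos α, both purely illustrative choices:

```python
import numpy as np

def W(a):                                   # absolutely integrable, integral 1
    return np.exp(-a ** 2 / 2) / np.sqrt(2 * np.pi)

f, lam = np.cos, 0.7
a = np.linspace(-50.0, 50.0, 400001)
da = a[1] - a[0]
for eps in (1.0, 0.1, 0.01):
    # (1/eps) * integral of W((lam - a)/eps) * f(a) da, by Riemann sum
    val = np.sum(W((lam - a) / eps) * f(a)) * da / eps
    print(eps, round(val, 4))
# as eps decreases, val approaches f(lam) = cos(0.7) = 0.7648...
```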

2.2 STOCHASTICS

On occasion it may make sense to think of a particular r vector-valued time series X(t) as being a member of an ensemble of vector time series which are generated by some random scheme. We can denote such an ensemble by


{X(t, θ); θ ∈ Θ and t = 0, ±1, ±2, . . . } where θ denotes a random variable taking values in Θ. If X(t, θ) is a measurable function of θ, then X(t, θ) is a random variable and we can talk of its finite dimensional distributions given by relations such as

and we can consider functionals such as

and

if the integrals involved exist. Once a θ has been generated (in accordance with its probability distribution), the function X(t, θ), with θ fixed, will be described as a realization, trajectory, or sample path of the time series.

Since there will generally be no need to include θ specifically as an argument in X(t, θ), we will henceforth denote X(t, θ) by X(t). X(t) will be called a time series, stochastic process, or random function.

The interested reader may refer to Cramér and Leadbetter (1967), Yaglom (1962), or Doob (1953) for more details of the probabilistic foundations of time series. The function c_a(t), defined in (2.2.2), is called the mean function of the time series X_a(t). The function c_aa(t_1, t_2), as derived from (2.2.4), is called the (auto)covariance function of X_a(t), and c_ab(t_1, t_2), defined in (2.2.4), is called the cross-covariance function of X_a(t) with X_b(t). c_a(t) will exist if and only if ave |X_a(t)| < ∞. By the Schwarz inequality we have

is called the (auto)correlation function of X_a(t) and

is called the cross-correlation function of X_a(t_1) with X_b(t_2). We will say that the series X_a(t) and X_b(t) are orthogonal if c_ab(t_1, t_2) = 0 for all t_1, t_2.


2.3 CUMULANTS

Consider for the present an r variate random variable (Y_1, . . . , Y_r) with ave |Y_j|^r < ∞, j = 1, . . . , r, where the Y_j are real or complex.

Definition 2.3.1 The rth order joint cumulant, cum (Y_1, . . . , Y_r), of (Y_1, . . . , Y_r) is given by

cum (Y_1, . . . , Y_r) = Σ (-1)^{p-1} (p - 1)! (ave Π_{j ∈ ν_1} Y_j) · · · (ave Π_{j ∈ ν_p} Y_j),

where the summation extends over all partitions (ν_1, . . . , ν_p), p = 1, . . . , r, of (1, . . . , r).

An important special case of this definition occurs when Y_j = Y, j = 1, . . . , r. The definition then gives the cumulant of order r of a univariate random variable.

Theorem 2.3.1 cum (Y_1, . . . , Y_r) is given by the coefficient of (i)^r t_1 · · · t_r in the Taylor series expansion of log (ave exp{i Σ_{j=1}^{r} Y_j t_j}) about the origin.

This last is sometimes taken as the definition of cum (Y_1, . . . , Y_r). Properties of cum (Y_1, . . . , Y_r) include:

(i) cum (a_1 Y_1, . . . , a_r Y_r) = a_1 · · · a_r cum (Y_1, . . . , Y_r) for a_1, . . . , a_r constant;

(ii) cum (Y_1, . . . , Y_r) is symmetric in its arguments;

(iii) if any group of the Y's are independent of the remaining Y's, then cum (Y_1, . . . , Y_r) = 0;

(iv) for the random variable (Z_1, Y_1, . . . , Y_r), cum (Y_1 + Z_1, Y_2, . . . , Y_r) = cum (Y_1, Y_2, . . . , Y_r) + cum (Z_1, Y_2, . . . , Y_r);

(v) for μ constant and r = 2, 3, . . . , cum (Y_1 + μ, Y_2, . . . , Y_r) = cum (Y_1, Y_2, . . . , Y_r);

(vi) if the random variables (Y_1, . . . , Y_r) and (Z_1, . . . , Z_r) are independent, then cum (Y_1 + Z_1, . . . , Y_r + Z_r) = cum (Y_1, . . . , Y_r) + cum (Z_1, . . . , Z_r);

(vii) cum Y_j = EY_j for j = 1, . . . , r;

(viii) cum (Y_j, Y_j) = var Y_j for j = 1, . . . , r;

(ix) cum (Y_j, Y_k) = cov (Y_j, Y_k) for j, k = 1, . . . , r.


Cumulants will provide us with a means of defining parameters of interest, with useful measures of the joint statistical dependence of random variables (see (iii) above), and with a convenient tool for proving theorems. Cumulants have also been called semi-invariants and are discussed in Dressel (1940), Kendall and Stuart (1958), and Leonov and Shiryaev (1959).

A standard normal variate has characteristic function exp{-t^2/2}. It follows from the theorem therefore that its cumulants of order greater than 2 are 0. Also, from (iii), all the joint cumulants of a collection of independent variates will be 0. Now a general multivariate normal is defined to be a vector of linear combinations of independent normal variates. It now follows from (i) and (vi) that all the cumulants of order greater than 2 are 0 for a multivariate normal.

We will have frequent occasion to discuss the joint cumulants of polynomial functions of random variables. Before presenting expressions for the joint cumulants of such variates, we introduce some terminology due to Leonov and Shiryaev (1959). Consider a (not necessarily rectangular) two-way table

(1, 1) . . . (1, J_1)
(2, 1) . . . (2, J_2)
. . .
(I, 1) . . . (I, J_I)          (Table 2.3.4)

and a partition P_1 ∪ P_2 ∪ · · · ∪ P_M of its entries. We shall say that sets P_m', P_m'' of the partition hook if there exist (i_1, j_1) ∈ P_m' and (i_2, j_2) ∈ P_m'' such that i_1 = i_2. We shall say that the sets P_m' and P_m'' communicate if there exists a sequence of sets P_{m_1} = P_m', P_{m_2}, . . . , P_{m_N} = P_m'' such that P_{m_n} and P_{m_{n+1}} hook for n = 1, 2, . . . , N - 1. A partition is said to be indecomposable if all sets communicate. If the rows of Table 2.3.4 are denoted R_1, . . . , R_I, then a partition P_1 · · · P_M is indecomposable if and only if there exist no sets P_{m_1}, . . . , P_{m_N} (N < M) and rows R_{i_1}, . . . , R_{i_J} (J < I) with

P_{m_1} ∪ · · · ∪ P_{m_N} = R_{i_1} ∪ · · · ∪ R_{i_J}.

The next lemma indicates a result relating to indecomposable partitions.

Lemma 2.3.1 Consider a partition P_1 · · · P_M, M > 1, of Table 2.3.4. Given elements r_{ij}, s_m; j = 1, . . . , J_i; i = 1, . . . , I; m = 1, . . . , M; define the function φ(r_{ij}) = s_m if (i, j) ∈ P_m. The partition is indecomposable if and only if the φ(r_{ij_1}) - φ(r_{ij_2}); 1 ≤ j_1, j_2 ≤ J_i, i = 1, . . . , I generate all the elements of the set {s_m - s_m'; 1 ≤ m, m' ≤ M} by additions and subtractions. Alternately, given elements t_i, i = 1, . . . , I, define the function ψ(r_{ij}) = t_i; j = 1, . . . , J_i; i = 1, . . . , I. The partition is indecomposable if and only if the ψ(r_{ij}) - ψ(r_{i'j'}); (i, j), (i', j') ∈ P_m; m = 1, . . . , M generate all the elements of the set {t_i - t_{i'}; 1 ≤ i, i' ≤ I} by addition and subtraction.

We remark that the set {t_i - t_{i'}; 1 ≤ i, i' ≤ I} is generated by I - 1 independent differences, such as t_1 - t_2, . . . , t_{I-1} - t_I. It follows that when the partition is indecomposable, we may find I - 1 independent differences among the ψ(r_{ij}) - ψ(r_{i'j'}); (i, j), (i', j') ∈ P_m; m = 1, . . . , M.

Theorem 2.3.2 Consider a two-way array of random variables X_{ij}; j = 1, . . . , J_i; i = 1, . . . , I. Consider the I random variables

Y_i = Π_{j=1}^{J_i} X_{ij},   i = 1, . . . , I.

The joint cumulant cum (Y_1, . . . , Y_I) is then given by

Σ_ν cum (X_{ij}; (i, j) ∈ ν_1) · · · cum (X_{ij}; (i, j) ∈ ν_p),

where the summation is over all indecomposable partitions ν = ν_1 ∪ · · · ∪ ν_p of Table 2.3.4.

This theorem is a particular case of a result of Leonov and Shiryaev (1959).

We briefly mention an example of the use of this theorem. Let (X_1, . . . , X_4) be a 4-variate normal random variable. Its cumulants of order greater than 2 will be 0. Suppose we wish cov {X_1 X_2, X_3 X_4}. Following the details of Theorem 2.3.2 we see that

This is a case of a result of Isserlis (1918).

We end this section with a definition extending that of the mean function and autocovariance function given in Section 2.2. Given the r vector-valued time series X(t), t = 0, ±1, . . . with components X_a(t), a = 1, . . . , r, and E|X_a(t)|^k < ∞, we define

c_{a_1,...,a_k}(t_1, . . . , t_k) = cum (X_{a_1}(t_1), . . . , X_{a_k}(t_k))

for a_1, . . . , a_k = 1, . . . , r and t_1, . . . , t_k = 0, ±1, . . . . Such a function will be called a joint cumulant function of order k of the series X(t), t = 0, ±1, . . . .
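The example above, cov {X_1 X_2, X_3 X_4} for a 4-variate normal, may be checked by simulation. For a zero-mean normal the indecomposable-partition count reduces to cov {X_1 X_2, X_3 X_4} = c_13 c_24 + c_14 c_23, and the sketch below (the particular covariance matrix is an arbitrary illustrative choice) verifies this by Monte Carlo:

```python
import numpy as np

# zero-mean 4-variate normal with covariance matrix C (illustrative values)
C = np.array([[1.0, 0.2, 0.4, 0.1],
              [0.2, 1.0, 0.1, 0.3],
              [0.4, 0.1, 1.0, 0.2],
              [0.1, 0.3, 0.2, 1.0]])
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), C, size=500_000)

lhs = np.cov(X[:, 0] * X[:, 1], X[:, 2] * X[:, 3])[0, 1]   # Monte Carlo estimate
rhs = C[0, 2] * C[1, 3] + C[0, 3] * C[1, 2]                # c13*c24 + c14*c23
print(round(rhs, 2))          # 0.4*0.3 + 0.1*0.1 = 0.13
print(abs(lhs - rhs) < 0.02)  # agreement up to sampling error
```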

2.4 STATIONARITY

An r vector-valued time series X(t), t = 0, ±1, . . . is called strictly stationary when the whole family of its finite dimensional distributions is invariant under a common translation of the time arguments or, equivalently, when the joint distribution of X_{a_1}(t_1 + t), . . . , X_{a_k}(t_k + t) does not depend on t for t, t_1, . . . , t_k = 0, ±1, . . . and a_1, . . . , a_k = 1, . . . , r, k = 1, 2, . . . .

Examples of strictly stationary series include a series of independent identically distributed r vector-valued variates, ε(t), t = 0, ±1, . . . , and a series that is a deterministic function of such variates, as

More examples of strictly stationary series will be given later.

In this section, and throughout this text, the time domain of the series is assumed to be t = 0, ±1, . . . . We remark that if I is any finite stretch of integers, then a series X(t), t ∈ I, that is relatively stationary over I, may be extended to be strictly stationary over all the integers. (The stationary extension of series defined and relatively stationary over an interval is considered in Parthasarathy and Varadhan (1964).) The important thing, from the standpoint of practice, is that the series be approximately stationary over the time period of observation.

An r vector-valued series X(t), t = 0, ±1, . . . is called second-order stationary or wide-sense stationary if

E|X_a(t)|^2 < ∞ and c_ab(t_1 + t, t_2 + t) = c_ab(t_1, t_2)

for t, t_1, t_2 = 0, ±1, . . . and a, b = 1, . . . , r. We note that a strictly stationary series with finite second-order moments is second-order stationary.

On occasion we write the covariance function, of a second-order stationary series, in an unsymmetric form as

c_ab(u) = c_ab(t + u, t) = cov {X_a(t + u), X_b(t)}

for t, u = 0, ±1, . . . and a, b = 1, . . . , r. We indicate the r × r matrix-valued function with entries c_ab(u) by c_XX(u) and refer to it as the autocovariance function of the series X(t), t = 0, ±1, . . . . If we extend the definition of cov to vector-valued random variables X, Y by writing

cov {X, Y} = E{(X - EX)(Y - EY)ᵀ},

then we may define the autocovariance function of the series X(t) by

c_XX(u) = cov {X(t + u), X(t)}

for t, u = 0, ±1, . . . in the second-order stationary case.

If the vector-valued series X(t), t = 0, ±1, . . . is strictly stationary with E|X_j(t)|^k < ∞, j = 1, . . . , r, then

c_{a_1,...,a_k}(t_1 + u, . . . , t_k + u) = c_{a_1,...,a_k}(t_1, . . . , t_k)

for t_1, . . . , t_k, u = 0, ±1, . . . . In this case we will sometimes use the asymmetric notation

c_{a_1,...,a_k}(t_1 - t_k, . . . , t_{k-1} - t_k) for c_{a_1,...,a_k}(t_1, . . . , t_k)

to remove the redundancy. This assumption of finite moments need not cause concern, for in practice all series available for analysis appear to be strictly bounded, |X_j(t)| < C, j = 1, . . . , r, for some finite C, and so all moments exist.

2.5 SECOND-ORDER SPECTRA

Suppose that the series X(t), t = 0, ±1, . . . is stationary and that, following the discussion of Section 1.3, its span of dependence is small in the sense that X_a(t) and X_b(t + u) are becoming increasingly less dependent as |u| → ∞ for a, b = 1, . . . , r. It is then reasonable to postulate that

Σ_{u=-∞}^{∞} |c_ab(u)| < ∞,   a, b = 1, . . . , r.   (2.5.1)

In this case we define the second-order spectrum of the series X_a(t) with the series X_b(t) by

f_ab(λ) = (2π)^{-1} Σ_{u=-∞}^{∞} c_ab(u) exp{-iλu},   -∞ < λ < ∞.   (2.5.2)

Under the condition (2.5.1), f_ab(λ) is bounded and uniformly continuous. The fact that the components of X(t) are real-valued implies that

f_ab(-λ) = f̄_ab(λ).

Also an examination of expression (2.5.2) shows that f_ab(λ) has period 2π with respect to λ.

The real-valued parameter λ appearing in (2.5.2) is called the radian or angular frequency per unit time or, more briefly, the frequency. If b = a, then f_aa(λ) is called the power spectrum of the series X_a(t) at frequency λ. If b ≠ a, then f_ab(λ) is called the cross-spectrum of the series X_a(t) with the series X_b(t)

Page 45: David R. Brillinger Time Series Data Analysis and Theory 2001

24 FOUNDATIONS

at frequency λ. We note that if X_a(t) = X_b(t), t = 0, ±1, . . . with probability 1, then f_ab(λ), the cross-spectrum, is in fact the power spectrum f_aa(λ). Re f_ab(λ) is called the co-spectrum and Im f_ab(λ) is called the quadrature spectrum. φ_ab(λ) = arg f_ab(λ) is called the phase spectrum, while |f_ab(λ)| is called the amplitude spectrum.

Suppose that the autocovariance functions c_ab(u), u = 0, ±1, . . . are collected together into the matrix-valued function c_XX(u), u = 0, ±1, . . . having c_ab(u) as the entry in the ath row and bth column. Suppose likewise that the second-order spectra f_ab(λ), -∞ < λ < ∞, are collected together into the matrix-valued function f_XX(λ), -∞ < λ < ∞, having f_ab(λ) as the entry in the ath row and bth column. Then the definition (2.5.2) may be written

f_XX(λ) = (2π)^{-1} Σ_{u=-∞}^{∞} c_XX(u) exp{-iλu},   -∞ < λ < ∞.   (2.5.4)

The r × r matrix-valued function f_XX(λ), -∞ < λ < ∞, is called the spectral density matrix of the series X(t), t = 0, ±1, . . . . Under the condition (2.5.1), the relation (2.5.4) may be inverted to obtain the representation

c_XX(u) = ∫_{-π}^{π} exp{iλu} f_XX(λ) dλ,   u = 0, ±1, . . . .   (2.5.5)

In Theorem 2.5.1 we shall see that the matrix f_XX(λ) is Hermitian, non-negative definite; that is, f_XX(λ) = f̄_XX(λ)ᵀ and ᾱᵀ f_XX(λ) α ≥ 0 for all r vectors α with complex entries.

Theorem 2.5.1 Let X(t), t = 0, ±1, . . . be a vector-valued series with autocovariance function c_XX(u) = cov {X(t + u), X(t)}, t, u = 0, ±1, . . . satisfying

Σ_{u=-∞}^{∞} |c_XX(u)| < ∞.

Then the spectral density matrix

f_XX(λ) = (2π)^{-1} Σ_{u=-∞}^{∞} c_XX(u) exp{-iλu},   -∞ < λ < ∞,

is Hermitian, non-negative definite.

In the case r = 1, this implies that the power spectrum is real and non-negative.

In the light of this theorem and the symmetry and periodicity properties indicated above, a power spectrum may be displayed as a non-negative function on the interval [0, π]. We will discuss the properties of power spectra in detail in Chapter 5.
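As a concrete illustration of these relations, consider the moving average X(t) = ε(t) + θε(t - 1) with the ε(t) independent, mean 0, variance 1 (an illustrative model; its autocovariances are c(0) = 1 + θ^2, c(±1) = θ, and c(u) = 0 otherwise). The sketch below forms the power spectrum from these autocovariances, checks its non-negativity, and recovers c(0) and c(1) by numerical inversion:

```python
import numpy as np

theta = -0.8
lam = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
# f(lam) = (2*pi)**-1 * sum_u c(u) * exp(-i*lam*u)
#        = (2*pi)**-1 * (1 + theta**2 + 2*theta*cos(lam))
f = (1 + theta ** 2 + 2 * theta * np.cos(lam)) / (2 * np.pi)

print(f.min() >= 0)        # True: the power spectrum is non-negative

# inverting the transform recovers the autocovariances c(0) and c(1)
d = lam[1] - lam[0]
c0 = np.sum(f) * d
c1 = np.sum(f * np.exp(1j * lam)).real * d
print(round(c0, 6), round(c1, 6))   # 1.64 and -0.8
```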

Page 46: David R. Brillinger Time Series Data Analysis and Theory 2001

2.6 CUMULANT SPECTRA OF ORDER k 25

In the case that the vector-valued series X(t), t = 0, ±1, . . . has finite second-order moments, but does not necessarily satisfy some mixing condition of the character of (2.5.1), we can still obtain a spectral representation of the nature of (2.5.5). Specifically we have the following:

Theorem 2.5.2 Let X(t), t = 0, ±1, . . . be a vector-valued series that is second-order stationary with finite autocovariance function c_XX(u) = cov {X(t + u), X(t)} for t, u = 0, ±1, . . . . Then there exists an r × r matrix-valued function F_XX(λ), -π < λ ≤ π, whose entries are of bounded variation and whose increments are non-negative definite, such that

c_XX(u) = ∫_{-π}^{π} exp{iλu} dF_XX(λ).   (2.5.8)

The representation (2.5.8) was obtained by Herglotz (1911) in the real-valued case and by Cramér (1942) in the vector-valued case.

The function F_XX(λ) is called the spectral measure of the series X(t), t = 0, ±1, . . . . In the case that (2.5.1) holds, it is given by

F_XX(λ) = ∫_{-π}^{λ} f_XX(α) dα.

2.6 CUMULANT SPECTRA OF ORDER k

Suppose that the series X(t), t = 0, ±1, . . . is stationary and that its span of dependence is small enough that

Σ_{u_1,...,u_{k-1}} |c_{a_1,...,a_k}(u_1, . . . , u_{k-1})| < ∞.   (2.6.1)

In this case, we define the kth order cumulant spectrum, f_{a_1,...,a_k}(λ_1, . . . , λ_{k-1}), by

f_{a_1,...,a_k}(λ_1, . . . , λ_{k-1}) = (2π)^{-(k-1)} Σ_{u_1,...,u_{k-1}} c_{a_1,...,a_k}(u_1, . . . , u_{k-1}) exp{-i Σ_{j=1}^{k-1} λ_j u_j}   (2.6.2)

for -∞ < λ_j < ∞, a_1, . . . , a_k = 1, . . . , r, k = 2, 3, . . . . We will extend the definition (2.6.2) to the case k = 1 by setting f_a = c_a = EX_a(t), a = 1, . . . , r. We will sometimes add a symbolic argument λ_k to the function of (2.6.2), writing f_{a_1,...,a_k}(λ_1, . . . , λ_k), in order to maintain symmetry. λ_k may be taken to be related to the other λ_j by Σ_{j=1}^{k} λ_j ≡ 0 (mod 2π).

We note that/ai,...,afc(Xr,.. . , X*) is generally complex-valued. It is alsobounded and uniformly continuous in the manifold ]£* X, = 0 (mod 2?r).We have the inverse relation

and in symmetric form

where

is the Dirac comb of (2.1.6).We will frequently assume that our series satisfy

Assumption 2.6.1 X(t) is a strictly stationary r vector-valued series with components X_j(t), j = 1, . . . , r, all of whose moments exist, and satisfying (2.6.1) for a_1, . . . , a_k = 1, . . . , r and k = 2, 3, . . . .

We note that all cumulant spectra, of all orders, exist for series satisfying Assumption 2.6.1. In the case of a Gaussian process, it amounts to nothing more than Σ_u |c_ab(u)| < ∞, a, b = 1, . . . , r.

Cumulant spectra are defined and discussed in Shiryaev (1960), Leonov (1964), Brillinger (1965), and Brillinger and Rosenblatt (1967a, b). The idea of carrying out a Fourier analysis of the higher moments of a time series occurs in Blanc-Lapierre and Fortet (1953).

The third-order spectrum of a single series has been called the bispectrum; see Tukey (1959) and Hasselmann, Munk and MacDonald (1963). The fourth-order spectrum has been called the trispectrum.

On occasion we will find the following assumption useful.

Assumption 2.6.2(l) Given the r vector-valued stationary process X(t) with components X_j(t), j = 1, . . . , r, there is an l ≥ 0 with

Σ_{u_1,...,u_{k-1}} |u_j|^l |c_{a_1,...,a_k}(u_1, . . . , u_{k-1})| < ∞   (2.6.6)

for j = 1, . . . , k - 1 and any k tuple a_1, . . . , a_k when k = 2, 3, . . . .

Page 48: David R. Brillinger Time Series Data Analysis and Theory 2001

2.7 FILTERS 27

This assumption implies, for l > 0, that well-separated (in time) values of the process are even less dependent than implied by Assumption 2.6.1, the extent of dependence depending directly on l. Equation (2.6.6) implies that f_{a_1,...,a_k}(λ_1, . . . , λ_k) has bounded and uniformly continuous derivatives of order ≤ l.

If instead of expressions (2.6.1) or (2.6.6) we assume only ave |X_a(t)|^k < ∞, a = 1, . . . , r, then the f_{a_1,...,a_k}(λ_1, . . . , λ_k) appearing in (2.6.4) are Schwartz distributions of order ≤ 2. These distributions, or generalized functions, are found in Schwartz (1957, 1959). In the case k = 2, Theorem 2.5.2 shows they are measures.

Several times in later chapters we will require a stronger assumption than the commonly used Assumption 2.6.1. It is the following:

Assumption 2.6.3 The r vector-valued series X(t), t = 0, ±1, . . . satisfies Assumption 2.6.1. Also if

then

for z in a neighborhood of 0.

This assumption will allow us to obtain probability 1 bounds for various statistics of interest. If X(t), t = 0, ±1, . . . is Gaussian, all that is required is that the covariance function be summable. Exercise 2.13.36 indicates the form of the assumption for another example of interest.

2.7 FILTERS

In the analysis of time series we often have occasion to apply some manipulatory operation. An important class of operations consists of those that are linear and time invariant. Specifically, consider an operation whose domain consists of r vector-valued series X(t), t = 0, ±1, . . . and whose range consists of s vector-valued series Y(t), t = 0, ±1, . . . . We write

to indicate the action of the operation. The operation is linear if for series X_1(t), X_2(t), t = 0, ±1, . . . in its domain and for constants α_1, α_2 we have


Next for given u let T^u X(t), t = 0, ±1, … denote the series X(t + u), t = 0, ±1, … . The operation a is time invariant if

a[T^u X](t) = T^u a[X](t),  t, u = 0, ±1, … .     (2.7.3)

We may now set down the definition: an operation a carrying r vector-valued series into s vector-valued series and possessing the properties (2.7.2) and (2.7.3) is called an s × r linear filter.

The domain of an s × r linear filter may include r × r matrix-valued functions U(t), t = 0, ±1, … . Denote the columns of U(t) by Uj(t), j = 1, …, r; we then define

a[U](t) = [a[U1](t) ⋯ a[Ur](t)].     (2.7.4)

The range of this extended operation is seen to consist of s × r matrix-valued functions.

An important property of filters is that they transform cosinusoids into cosinusoids. In particular we have

Lemma 2.7.1  Let a be a linear time invariant operation whose domain includes the r × r matrix-valued series

exp{iλt} I,  t = 0, ±1, …; −∞ < λ < ∞,

where I is the r × r identity matrix. Then there is an s × r matrix A(λ) such that

a[exp{iλ·} I](t) = exp{iλt} A(λ).     (2.7.5)

In other words a linear time invariant operation carries complex exponentials of frequency λ over into complex exponentials of the same frequency λ. The function A(λ) is called the transfer function of the operation. We see that A(λ + 2π) = A(λ).

An important class of s × r linear filters takes the form

Y(t) = Σ_u a(t − u) X(u),     (2.7.7)

t = 0, ±1, …, where X(t) is an r vector-valued series, Y(t) is an s vector-valued series, and a(u), u = 0, ±1, … is a sequence of s × r matrices satisfying

Σ_u |a_jk(u)| < ∞,  j = 1, …, s; k = 1, …, r.     (2.7.8)


We call such a filter an s × r summable filter and denote it by {a(u)}. The transfer function of the filter (2.7.7) is seen to be given by

A(λ) = Σ_u a(u) exp{−iλu}.     (2.7.9)

It is a uniformly continuous function of λ in view of (2.7.8). The function a(u), u = 0, ±1, … is called the impulse response of the filter in view of the fact that if the domain of the filter is extended to include r × r matrix-valued series and we take the input series to be the impulse

U(t) = I for t = 0,  = 0 for t ≠ 0,     (2.7.10)

then the output series is a(t), t = 0, ±1, … .

An s × r filter {a(u)} is said to be realizable if a(u) = 0 for u = −1, −2, −3, … . From (2.7.7) we see that such a filter has the form

Y(t) = Σ_{u ≤ t} a(t − u) X(u)     (2.7.11)

and so Y(t) only involves the values of the X series for present and past times. In this case the domain of A(λ) may be extended to be the region −∞ < Re λ < ∞, Im λ ≤ 0.
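The defining relation (2.7.7) and the action of the transfer function described in Lemma 2.7.1 can be checked numerically. The following is an illustrative sketch in Python with numpy; the particular impulse response is invented for the example and is not taken from the text.

```python
import numpy as np

# Illustrative sketch (not from the text): a 1 x 1 summable filter {a(u)}
# applied by Y(t) = sum_u a(u) X(t - u), together with its transfer function
# A(lambda) = sum_u a(u) exp(-i lambda u).  Feeding in the complex
# exponential X(t) = exp(i lambda t) should return A(lambda) exp(i lambda t).

a = {-1: 0.25, 0: 0.5, 1: 0.25}          # impulse response; finite, so summable

def filter_series(X, a):
    """Y(t) = sum_u a(u) X(t - u) for t where all needed X values exist."""
    umin, umax = min(a), max(a)
    T = len(X)
    return np.array([sum(a[u] * X[t - u] for u in a)
                     for t in range(-umin, T - umax)])

def transfer(a, lam):
    return sum(a[u] * np.exp(-1j * lam * u) for u in a)

lam = 0.7
t = np.arange(50)
X = np.exp(1j * lam * t)
Y = filter_series(X, a)
# Output equals A(lam) * exp(i lam t) at the retained time points t = 1..48.
expected = transfer(a, lam) * np.exp(1j * lam * t[1:49])
print(np.allclose(Y, expected))
```

The check illustrates that the complex exponential is an eigenfunction of the filtering operation, with eigenvalue A(λ).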

On occasion we may wish to apply a succession of filters to the same series. In this connection we have

Lemma 2.7.2  If {a1(t)} and {a2(t)} are s × r summable filters with transfer functions A1(λ), A2(λ), respectively, then {a1(t) + a2(t)} is an s × r summable filter with transfer function A1(λ) + A2(λ).

If {b1(t)} is an r × q summable filter with transfer function B1(λ) and {b2(t)} is an s × r summable filter with transfer function B2(λ), then {b2 * b1(t)}, the filter resulting from applying first {b1(t)} followed by {b2(t)}, is an s × q summable filter with transfer function B2(λ)B1(λ).

The second half of this lemma demonstrates the advantage of considering transfer functions as well as the time domain coefficients of a filter. The convolution expression

b2 * b1(t) = Σ_u b2(t − u) b1(u)     (2.7.12)

takes the form of a multiplication in the frequency domain.

Let {a(t)} be an r × r summable filter. If an r × r filter {b(t)} exists such that

Σ_u b(t − u) a(u) = I for t = 0,  = 0 for t ≠ 0,     (2.7.13)

then {a(t)} is said to be nonsingular. The filter {b(t)} is called the inverse of {a(t)}. It exists if the matrix A(λ) is nonsingular for −∞ < λ < ∞; its transfer function is A(λ)^−1.

On occasion we will refer to an l summable filter. This is a summable filter satisfying the condition

Σ_u |u|^l |a_jk(u)| < ∞,  j = 1, …, s; k = 1, …, r.     (2.7.14)
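Lemma 2.7.2's rule — convolution of coefficients corresponds to multiplication of transfer functions — is easy to verify numerically. A minimal Python/numpy sketch follows; the two coefficient sequences are invented for the illustration.

```python
import numpy as np

# A small numerical check of Lemma 2.7.2 in the scalar case (not from the
# text): the convolution b2 * b1(t) = sum_u b2(t - u) b1(u) has transfer
# function B2(lambda) B1(lambda).

b1 = np.array([1.0, -0.5, 0.25])          # coefficients of {b1(u)}, u = 0, 1, 2
b2 = np.array([0.5, 0.5])                 # coefficients of {b2(u)}, u = 0, 1

conv = np.convolve(b2, b1)                # coefficients of {b2 * b1(t)}

def transfer(coef, lam):
    u = np.arange(len(coef))
    return np.sum(coef * np.exp(-1j * lam * u))

for lam in [0.0, 0.3, 1.1, np.pi / 2]:
    lhs = transfer(conv, lam)
    rhs = transfer(b2, lam) * transfer(b1, lam)
    assert np.isclose(lhs, rhs)
print("transfer function of the composed filter is B2 * B1")
```

For finite filters the identity holds exactly, since both sides are the same polynomial in exp(−iλ).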

Two examples of l summable filters follow. The operation indicated by

Y(t) = (2M + 1)^−1 Σ_{u=−M}^{M} X(t + u)

is an l summable filter, for all l, with coefficients

a(u) = (2M + 1)^−1 for |u| ≤ M,  = 0 otherwise,

and transfer function

A(λ) = sin[(2M + 1)λ/2] / [(2M + 1) sin(λ/2)].

We will see the shape of this transfer function in Section 3.2. For M not too small, A(λ) is a function with its mass concentrated in the neighborhood of the frequencies λ = 0 (mod 2π). The general effect of this filter will be to smooth functions to which it is applied.
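The concentration of the running-mean transfer function near λ = 0 can be seen directly by computing it. The sketch below (Python/numpy) compares the direct sum Σ a(u) exp(−iλu) with the closed Dirichlet-kernel form; the choice M = 10 is arbitrary.

```python
import numpy as np

# Sketch: the simple running mean of length 2M + 1 has transfer function
# sin((2M+1) lambda / 2) / ((2M+1) sin(lambda / 2)), equal to 1 at lambda = 0
# and small away from 0 (mod 2 pi) -- the signature of a smoothing filter.

M = 10
u = np.arange(-M, M + 1)
a = np.full(2 * M + 1, 1.0 / (2 * M + 1))    # a(u) = 1/(2M+1), |u| <= M

def A(lam):
    return np.real(np.sum(a * np.exp(-1j * lam * u)))

def dirichlet(lam):
    return np.sin((2 * M + 1) * lam / 2) / ((2 * M + 1) * np.sin(lam / 2))

lams = np.linspace(0.1, np.pi, 50)
print(np.allclose([A(l) for l in lams], dirichlet(lams)))   # closed form matches
print(A(0.0))                                               # mass 1 at frequency 0
```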

Likewise the operation indicated by

is an l summable filter, for all l, with coefficients

and transfer function

This transfer function has most of its mass in the neighborhood of the frequencies λ = ±π, ±3π, … . The effect of this filter will be to remove the slowly varying part of a function and retain the rapidly varying part.

We will often be applying filters to stochastic series. In this connection we have


Lemma 2.7.3  If X(t) is a stationary r vector-valued series with E|X(t)| < ∞, and {a(t)} is an s × r summable filter, then

Y(t) = Σ_u a(t − u) X(u),

t = 0, ±1, … exists with probability 1 and is an s vector-valued stationary series. If E|X(t)|^k < ∞, k > 0, then E|Y(t)|^k < ∞.

An important use of this lemma is in the derivation of additional stationary time series from stationary time series already under discussion. For example, if ε(t) is a sequence of independent identically distributed r vector variates and {a(t)} is an s × r filter, then the s vector-valued series

Y(t) = Σ_u a(t − u) ε(u)

is a strictly stationary series. It is called a linear process.

Sometimes we will want to deal with a linear time invariant operation whose transfer function A(λ) is not necessarily the Fourier transform of an absolutely summable sequence. In the case that A(λ) satisfies condition (2.7.23), it is possible to define the output of such a filter as a limit in mean square. Specifically we have

Theorem 2.7.1  Let X(t), t = 0, ±1, … be an r vector-valued series with absolutely summable autocovariance function. Let A(λ) be an s × r matrix-valued function satisfying (2.7.23). Set

a(u) = (2π)^−1 ∫_{−π}^{π} A(α) exp{iuα} dα,

u = 0, ±1, … . Then

Y(t) = lim_{T→∞} Σ_{u=−T}^{T} a(t − u) X(u)

exists for t = 0, ±1, …, the limit being in mean square.

Results of this character are discussed in Rosenberg (1964) for the case in which the conditions of Theorem 2.5.2 are satisfied plus
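A linear process is easy to simulate, and its second-order structure can be compared with theory. The sketch below (Python/numpy; the coefficients and noise variance are invented for the illustration) builds a scalar linear process from i.i.d. noise and checks the standard autocovariance formula c_YY(u) = σ² Σ_v a(v) a(v + u).

```python
import numpy as np

# Illustrative sketch (not from the text): a scalar linear process
# Y(t) = sum_u a(t - u) e(u) built from i.i.d. noise.  Its autocovariance is
# c_YY(u) = sigma^2 sum_v a(v) a(v + u), compared here with a long-run
# sample estimate.

rng = np.random.default_rng(0)
a = np.array([1.0, 0.6, 0.3, 0.1])        # a(0), ..., a(3); zero elsewhere
sigma = 1.0
T = 200_000

eps = rng.normal(0.0, sigma, T)
Y = np.convolve(eps, a, mode="valid")     # Y(t) = sum_{u=0}^{3} a(u) e(t - u)

def sample_autocov(Y, u):
    Y0 = Y - Y.mean()
    return np.mean(Y0[:len(Y0) - u] * Y0[u:])

theory = [sigma**2 * np.sum(a[:len(a) - u] * a[u:]) for u in range(4)]
sample = [sample_autocov(Y, u) for u in range(4)]
print(np.max(np.abs(np.array(theory) - np.array(sample))))
```

With a long realization the sample autocovariances sit close to the theoretical values, illustrating the strict stationarity of the construction.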


Two 1 × 1 filters satisfying (2.7.23) will be of particular importance in our work. A 1 × 1 filter {a(u)} is said to be a band-pass filter, centered at the frequency λ0 and with band-width 2Δ, if its transfer function has the form

A(λ) = 1 for |λ ± λ0| ≤ Δ,  = 0 otherwise,     (2.7.26)

in the domain −π < λ < π. Typically Δ is small. If λ0 = 0, the filter is called a low-pass filter. In the case that

X(t) = Σ_j Rj cos(λj t + φj)     (2.7.27)

for constants Rj, φj, λj and the transfer function A(λ) is given by (2.7.26), we see that the filtered series is given by

Y(t) = Σ_j Rj cos(λj t + φj),

with the summation extending over j such that |λj ± λ0| ≤ Δ. In other words, components whose frequencies are near λ0 remain unaffected, whereas other components are removed.

A second useful 1 × 1 filter is the Hilbert transform. Its transfer function is purely imaginary and given by −i sgn λ, that is

A(λ) = −i sgn λ,  −π < λ < π.     (2.7.29)

If the series X(t), t = 0, ±1, … is given by (2.7.27), then the series resulting from the application of the filter with transfer function (2.7.29) is

Σ_j Rj sin(λj t + φj),

taking 0 < λj < π. The series that is the Hilbert transform of a series X(t) will be denoted X^H(t), t = 0, ±1, … .
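The action of the transfer function −i sgn λ can be applied directly in the frequency domain. The following sketch (Python/numpy, using the DFT; the implementation route is an illustration, not the text's construction) checks the claim that the filter carries cos(λt + φ) into sin(λt + φ).

```python
import numpy as np

# Sketch of the Hilbert transform filter (transfer function -i sgn(lambda))
# applied in the frequency domain via the DFT.  For 0 < lambda < pi it should
# carry cos(lambda t + phi) into sin(lambda t + phi).

def hilbert_transform(X):
    T = len(X)
    freqs = np.fft.fftfreq(T)                  # frequencies in cycles/unit time
    H = -1j * np.sign(freqs)                   # -i sgn(lambda), with H(0) = 0
    return np.real(np.fft.ifft(H * np.fft.fft(X)))

T = 400
t = np.arange(T)
lam = 2 * np.pi * 20 / T                       # a Fourier frequency, 0 < lam < pi
phi = 0.4
X = np.cos(lam * t + phi)
XH = hilbert_transform(X)
print(np.allclose(XH, np.sin(lam * t + phi)))  # True
```

Using an exact Fourier frequency makes the check exact up to rounding; for general frequencies there would be leakage at the ends of the stretch.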

Lemma 2.7.4 indicates how the procedure of complex demodulation (see Tukey (1961)) may be used to obtain a band-pass filter centered at a general frequency λ0 and the corresponding Hilbert transform from a low-pass filter.

In complex demodulation we first form the pair of real-valued series

X(t) cos λ0t,  X(t) sin λ0t,     (2.7.31)

for t = 0, ±1, … and then the pair of series

W1(t) = Σ_u a(t − u) X(u) cos λ0u,  W2(t) = Σ_u a(t − u) X(u) sin λ0u,     (2.7.32)

where {a(t)} is a low-pass filter. The series W1(t), W2(t), −∞ < t < ∞, are called the complex demodulates of the series X(t), −∞ < t < ∞. Because {a(t)} is a low-pass filter, they will typically be substantially smoother than the series X(t), −∞ < t < ∞. If we further form the series

V1(t) = W1(t) cos λ0t + W2(t) sin λ0t,  V2(t) = W1(t) sin λ0t − W2(t) cos λ0t,     (2.7.33)

for −∞ < t < ∞, then the following lemma shows that the series V1(t) is essentially a band-pass filtered version of the series X(t), while the series V2(t) is essentially a band-pass filtered version of the series X^H(t).

Lemma 2.7.4  Let {a(t)} be a filter with transfer function A(λ), −∞ < λ < ∞. The operation carrying the series X(t), −∞ < t < ∞, into the series V1(t) of (2.7.33) is linear and time invariant with transfer function

[A(λ − λ0) + A(λ + λ0)]/2.     (2.7.34)

The operation carrying the series X(t) into V2(t) of (2.7.33) is linear and time invariant with transfer function

[A(λ − λ0) − A(λ + λ0)]/(2i).     (2.7.35)

In the case that A(λ) is given by

A(λ) = 1 for |λ| ≤ Δ,  = 0 otherwise,     (2.7.36)

for −π < λ < π and Δ small, functions (2.7.34) and (2.7.35) are seen to have the forms

1/2 for |λ ± λ0| ≤ Δ,  = 0 otherwise,     (2.7.37)

−π < λ, λ0 < π, and

−(i/2) sgn λ for |λ ± λ0| ≤ Δ,  = 0 otherwise,     (2.7.38)

for −π < λ, λ0 < π. Bunimovitch (1949), Oswald (1956), Dugundji (1958), and Deutsch (1962) discuss the interpretation and use of the output of such filters.
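Complex demodulation is straightforward to try numerically. The sketch below (Python/numpy) makes several assumptions for the illustration: the series is scalar, the low-pass filter is a simple running mean, and V1, V2 are formed as in (2.7.33); with that convention the band-pass transfer function has height 1/2, so 2V1 should essentially reproduce the component of X near frequency λ0 and 2V2 the corresponding Hilbert-transformed component.

```python
import numpy as np

# Numerical sketch of complex demodulation (scalar series, running-mean
# low-pass filter; the signal, noise level, and bandwidth are invented).
# 2 V1(t) should track the component of X(t) near frequency lam0.

rng = np.random.default_rng(1)
T = 2000
t = np.arange(T)
lam0 = 1.0
target = np.cos(lam0 * t + 0.3)                 # the component near lam0
X = target + np.cos(0.1 * t) + 0.2 * rng.normal(size=T)

M = 30
kernel = np.full(2 * M + 1, 1.0 / (2 * M + 1))  # running-mean low-pass filter

W1 = np.convolve(X * np.cos(lam0 * t), kernel, mode="same")  # complex demodulates
W2 = np.convolve(X * np.sin(lam0 * t), kernel, mode="same")

V1 = W1 * np.cos(lam0 * t) + W2 * np.sin(lam0 * t)
V2 = W1 * np.sin(lam0 * t) - W2 * np.cos(lam0 * t)

# Compare away from the ends, where the moving average is distorted.
err = np.mean((2 * V1[M:-M] - target[M:-M]) ** 2)
print(err)
```

The mean squared error is small compared with the unit variance of the extracted component; the low-frequency interferer and the noise are largely removed.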


2.8 INVARIANCE PROPERTIES OF CUMULANT SPECTRA

The principal parameters involved in our discussion of the frequency analysis of stationary time series are the cumulant spectra. At the same time we will often be applying filters to the series or it will be the case that some filtering operation has already been applied. It is therefore important that we understand the effect of a filter on the cumulant spectra of stationary series. The effect is of an elementary algebraic nature.

Theorem 2.8.1  Let X(t) be an r vector series satisfying Assumption 2.6.1 and Y(t) = Σ_u a(t − u)X(u), where {a(t)} is an s × r summable filter. Then Y(t) satisfies Assumption 2.6.1. Its cumulant spectra

are given by

Some cases of this theorem are of particular importance.

Example 2.8.1  Let X(t) and Y(t) be real-valued with power spectra f_XX(λ), f_YY(λ), respectively; then

f_YY(λ) = |A(λ)|² f_XX(λ).

Example 2.8.2  Let f_XX(λ) and f_YY(λ) signify the r × r and s × s matrices of second-order spectra of X(t) and Y(t), respectively. Then

f_YY(λ) = A(λ) f_XX(λ) Ā(λ)^T,     (2.8.4)

where A(λ) is the transfer function of the filter. If s = 1, then the power spectrum of Y(t) is given by (2.8.4) with A(λ) a 1 × r row vector. As power spectra are non-negative, we may conclude from (2.8.4) that

Σ_{j,k} Aj f_{XjXk}(λ) Āk ≥ 0

for all (complex) A1, …, Ar, and so obtain the result of Theorem 2.5.1 — that the matrix f_XX(λ) is non-negative definite — from the case s = 1.

Example 2.8.3  If X(t), Y(t), t = 0, ±1, … are both r vector-valued with Y related to X through

then the cumulant spectra of Y(t) are given by

where Bj(λ) denotes the transfer function of the filter {bj(u)}.

Later we will see that Examples 2.8.1 and 2.8.3 provide convenient means of interpreting the power spectrum, cross-spectrum, and higher order cumulant spectra.

2.9 EXAMPLES OF STATIONARY TIME SERIES

The definition of, and several elementary examples of, a stationary time series was presented in Section 2.4. As stationary series are the basic entities of our analysis, it is of value to have as many examples as possible.

Example 2.9.1 (A Pure Noise Series)  Let ε(t), t = 0, ±1, … be a sequence of independent, identically distributed r vector-valued random variables. Such a series clearly forms a stationary time series.

Example 2.9.2 (Linear Process)  Let ε(t), t = 0, ±1, … be the r vector-valued pure noise series of the previous example. Let

X(t) = Σ_u a(t − u) ε(u),     (2.9.1)

where {a(u)} is an s × r summable filter. Following Lemma 2.7.3, this series is a stationary s vector-valued series.

If only a finite number of the a(u) in expression (2.9.1) are nonzero, then the series X(t) is referred to as a moving average process. If a(0), a(m) ≠ 0 and a(u) = 0 for u > m and u < 0, the process is said to be of order m.

Example 2.9.3 (Cosinusoid)  Suppose that X(t) is an r vector-valued series with components

Xa(t) = Ra cos(ωa t + φa),  a = 1, …, r,

where R1, …, Rr are constant, φ1, …, φr−1 are uniform on (−π,π], and φ1 + ⋯ + φr = 0. This series is stationary, because if any finite collection of values is considered and then the time points are all shifted by t, their structure is unchanged.
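The moving average process of Example 2.9.2 has a simple second-order signature: its autocovariance vanishes beyond lag m (compare Exercise 2.13.26). A short sketch (Python/numpy; coefficients invented for the example):

```python
import numpy as np

# Sketch (not from the text): for a moving average process of order m,
# X(t) = sum_{u=0}^{m} a(u) e(t - u) with noise variance sigma^2, the
# autocovariance is c_XX(u) = sigma^2 sum_v a(v) a(v + u) for |u| <= m and
# 0 for |u| > m.

a = np.array([1.0, -0.4, 0.2])       # m = 2: a(0), a(1), a(2)
sigma2 = 1.0
m = len(a) - 1

def autocov(u):
    u = abs(u)
    if u > m:
        return 0.0
    return sigma2 * float(np.sum(a[: len(a) - u] * a[u:]))

print([autocov(u) for u in range(5)])   # the last two entries are 0.0
```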


Example 2.9.4 (Stationary Gaussian Series)  An r vector time series X(t), t = 0, ±1, ±2, … is a Gaussian series if all of its finite dimensional distributions are multivariate Gaussian (normal). If EX(t) = μ and EX(t)X(u)^T = R(t − u) for all t, u, then X(t) is stationary in this case, for the series is determined by its first- and second-order moment properties.

We note that if X(t) is a stationary r vector Gaussian series, then

Y(t) = Σ_u a(t − u) X(u)

for an s × r filter {a(t)} is a stationary s vector Gaussian series.

Extensive discussions of stationary Gaussian series are found in Blanc-Lapierre and Fortet (1965), Loève (1963), and Cramér and Leadbetter (1967).

Example 2.9.5 (Stationary Markov Processes)  An r vector time series X(t), t = 0, ±1, ±2, … is said to be an r vector Markov process if the conditional probability

Prob{X(t) ≤ X | X(s1) = x1, …, X(sn) = xn, X(s) = x}     (2.9.4)

(for any s1 < s2 < ⋯ < sn < s < t) is equal to the conditional probability

Prob{X(t) ≤ X | X(s) = x} = P(s,x,t,X).     (2.9.5)

The function P(s,x,t,X) is called the transition probability function. It and an initial probability Prob{X(0) ≤ X0} completely determine the probability law of the process. Extensive discussions of Markov processes and in particular stationary Markov processes may be found in Doob (1953), Dynkin (1960), Loève (1963), and Feller (1966).

A particularly important example is that of the Gaussian stationary Markov process. In the real-valued case, its autocorrelation function takes a simple form.

Lemma 2.9.1  If X(t), t = 0, ±1, ±2, … is a nondegenerate real-valued, Gaussian, stationary Markov process, then its autocovariance function is given by c_XX(0)ρ^|u|, for some ρ, −1 < ρ < 1.

Another class of examples of real-valued stationary Markov processes is given in Wong (1963). Bernstein (1938) considers the generation of Markov processes as solutions of stochastic difference and differential equations.
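Lemma 2.9.1 can be illustrated numerically by realizing the Gaussian stationary Markov process as a first-order autoregressive scheme (an assumption made for the sketch, consistent with the discussion that follows): X(t) = ρX(t − 1) + ε(t) with Gaussian pure noise.

```python
import numpy as np

# Numerical sketch of Lemma 2.9.1: simulate X(t) = rho X(t-1) + e(t) with
# Gaussian noise and check that the autocovariance behaves as c_XX(0) rho^|u|.

rng = np.random.default_rng(2)
rho = 0.7
T = 100_000
e = rng.normal(size=T)
X = np.empty(T)
X[0] = e[0] / np.sqrt(1 - rho**2)       # start in the stationary distribution
for t in range(1, T):
    X[t] = rho * X[t - 1] + e[t]

def sample_autocov(X, u):
    X0 = X - X.mean()
    return np.mean(X0[:len(X0) - u] * X0[u:])

c0 = sample_autocov(X, 0)               # theoretical value 1 / (1 - rho^2)
ratios = [sample_autocov(X, u) / c0 for u in range(1, 5)]
print(ratios)                           # approximately rho, rho^2, rho^3, rho^4
```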

An example of a stationary Markov r vector process is provided by X(t), the solution of

X(t) = aX(t − 1) + ε(t),     (2.9.6)

where ε(t) is an r vector pure noise series and a an r × r matrix with all eigenvalues less than 1 in absolute value.

Example 2.9.6 (Autoregressive Schemes)  Equation (2.9.6) leads us to consider r vector processes X(t) that are generated by schemes of the form

X(t) + a(1)X(t − 1) + ⋯ + a(m)X(t − m) = ε(t),     (2.9.7)

where ε(t) is an r vector pure noise series and a(1), …, a(m) are r × r matrices. If the roots of Det A(z) = 0 lie outside the unit circle, where

A(z) = I + a(1)z + ⋯ + a(m)z^m,     (2.9.8)

it can be shown (Section 3.8) that (2.9.7) has a stationary solution. Such an X(t) is referred to as an r vector-valued autoregressive process of order m.
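The root condition for a stationary solution is easy to check mechanically. A scalar-case sketch in Python/numpy (the helper name and example coefficients are invented for the illustration):

```python
import numpy as np

# Sketch (scalar case): for the autoregressive scheme
# X(t) + a(1) X(t-1) + ... + a(m) X(t-m) = e(t), a stationary solution exists
# when the roots of A(z) = 1 + a(1) z + ... + a(m) z^m lie outside the unit
# circle.  numpy.roots expects coefficients from the highest power down.

def ar_is_stationary(a):
    """a = [a(1), ..., a(m)]; check that all roots of A(z) satisfy |z| > 1."""
    coeffs = list(reversed([1.0] + list(a)))     # a(m), ..., a(1), 1
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([-0.5]))          # X(t) = 0.5 X(t-1) + e(t): stationary
print(ar_is_stationary([-1.1]))          # X(t) = 1.1 X(t-1) + e(t): not
```

In the scalar first-order case the condition reduces to |a(1)| < 1, in agreement with the eigenvalue condition stated after (2.9.6).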

Example 2.9.7 (Mixed Moving Average and Autoregressive Process)  On occasion we combine the moving average and autoregressive schemes. Consider the r vector-valued process X(t) satisfying

X(t) + a(1)X(t − 1) + ⋯ + a(m)X(t − m) = b(1)ε(t − 1) + ⋯ + b(n)ε(t − n),     (2.9.9)

where ε(t) is an s vector-valued pure noise series, a(j), j = 1, …, m are r × r matrices, and b(k), k = 1, …, n are r × s matrices. If a stationary X(t), satisfying expression (2.9.9), exists it is referred to as a mixed moving average autoregressive process of order (m,n).

If the roots of

Det A(z) = 0

lie outside the unit circle, then an X(t) satisfying (2.9.9) is in fact a linear process

X(t) = Σ_u c(t − u) ε(u),

where C(λ) = A(λ)^−1 B(λ); see Section 3.8.

Example 2.9.8 (Functions of Stationary Series)  If we have a stationary series (such as a pure noise series) already at hand and we form time invariant measurable functions of that series, then we have generated another stationary series. For example, suppose X(t) is a stationary series and Y(t) = Σ_u a(t − u)X(u) for some s × r filter {a(u)}. We have seen (Lemma 2.7.3) that under regularity conditions Y(t) is also stationary. Alternatively we can form a Y(t) through nonlinear functions as by

for some measurable f[x1,x2]; see Rosenblatt (1964). In fact, in a real sense, all stationary functions are of the form of expression (2.9.12), f possibly having an infinite number of arguments. Any stationary time series, defined on a probability space, can be put in the form

where U is a transformation that preserves probabilities and θ lies in the probability space; see Doob (1953) p. 509. We can often take θ in the unit interval; see Choksi (1966).

Unfortunately relations such as (2.9.12) and (2.9.13) generally are not easy to work with. Consequently investigators (Wiener (1958), Balakrishnan (1964), Shiryaev (1960), McShane (1963), and Meecham and Siegel (1964)) have turned to series generated by nonlinear relations of the form

in the hope of obtaining more reasonable results. Nisio (1960, 1961) has investigated Y(t) of the above form for the case in which X(t) is a pure noise Gaussian series. Meecham (1969) is concerned with the case where Y(t) is nearly Gaussian.

We will refer to expansions of the form of expression (2.9.14) as Volterra functional expansions; see Volterra (1959) and Brillinger (1970a).

In connection with Y(t) generated by expression (2.9.14), we have

Theorem 2.9.1  If the series X(t), t = 0, ±1, … satisfies Assumption 2.6.1 and

with the aj absolutely summable and L < ∞, then the series Y(t), t = 0, ±1, … also satisfies Assumption 2.6.1.

We see, for example, that the series X(t)², t = 0, ±1, … satisfies Assumption 2.6.1 when the series X(t) does. The theorem generalizes to r vector-valued series and in that case provides an extension of Lemma 2.7.3.

Example 2.9.9 (Solutions of Stochastic Difference and Differential Equations)  We note that literature is developing on stationary processes that satisfy random difference and differential equations; see, for example, Kampé de Fériet (1965), Ito and Nisio (1964), and Mortensen (1969).
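A truncated Volterra functional expansion — a linear term plus a quadratic term in the lagged values of X — can be evaluated directly. The sketch below (Python/numpy) is an illustration of the general form; the kernels a1, a2 and the two-lag truncation are invented for the example, not taken from the text.

```python
import numpy as np

# Illustrative sketch: a truncated Volterra expansion
# Y(t) = sum_u a1(u) X(t-u) + sum_{u,v} a2(u, v) X(t-u) X(t-v),
# evaluated over a finite set of lags.

rng = np.random.default_rng(3)
T = 50
X = rng.normal(size=T)

a1 = np.array([0.8, 0.2])                    # linear kernel, lags 0 and 1
a2 = np.array([[0.1, 0.05], [0.05, 0.0]])    # quadratic kernel over lags (u, v)

def volterra(X, a1, a2):
    p = len(a1)
    n = len(X)
    Y = np.empty(n - p + 1)
    for i, t in enumerate(range(p - 1, n)):
        past = X[t - np.arange(p)]           # X(t), X(t-1), ..., X(t-p+1)
        Y[i] = a1 @ past + past @ a2 @ past  # linear plus quadratic term
    return Y

Y = volterra(X, a1, a2)
print(Y[:3])
```

The special case a1 = 0, a2 concentrated at (0,0) gives the series X(t)² mentioned after Theorem 2.9.1.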


In certain cases (see Ito and Nisio (1964)) the solution of a stochastic equation may be expressed in the form (2.9.14).

Example 2.9.10 (Solutions of Volterra Functional Relations)  On occasion, we may be given Y(t) and wish to define X(t) as a series satisfying expression (2.9.14). This provides a model for frequency demultiplication and the appearance of lower order harmonics.

2.10 EXAMPLES OF CUMULANT SPECTRA

In this section we present a number of examples of cumulant spectra of order k for a number of r vector-valued stationary time series of interest.

Example 2.10.1 (A Pure Noise Series)  Suppose that ε(t) is an r vector pure noise series with components εa(t), a = 1, …, r. Let

K_{a1…ak} = cum{ε_{a1}(t), …, ε_{ak}(t)}

exist; then c_{a1…ak}(u1, …, uk−1) = K_{a1…ak} δ{u1} ⋯ δ{uk−1}, where δ{u} is the Kronecker delta. We see directly that

f_{a1…ak}(λ1, …, λk−1) = (2π)^{−k+1} K_{a1…ak}.

Example 2.10.2 (A Linear Process)  Suppose that

X(t) = Σ_u a(t − u) ε(u),

where {a(t)} is an s × r filter and ε(t) an r vector pure noise series. From Theorem 2.8.1 we have

The result of this example may be combined with that of the previous example to obtain the spectra of moving average and autoregressive processes.

Example 2.10.3 (Stationary Gaussian Series)  The characteristic function of a multivariate Gaussian variable, with mean vector μ and variance-covariance matrix Σ, is given by

exp{i θ^T μ − ½ θ^T Σ θ}.

We see from this that all cumulant functions of order greater than 2 must vanish for a Gaussian series and therefore all cumulant spectra of order greater than 2 also vanish for such a series.


We see that cumulant spectra of order greater than 2, in some sense, measure the non-normality of a series.

Example 2.10.4 (Cosinusoids)  Suppose that X(t) is an r vector process with components Xa(t) = Ra cos(ωa t + φa), a = 1, …, r, where Ra is constant, ω1 + ⋯ + ωr = 0 (mod 2π), and φ1, …, φr−1 are independent and uniform on (−π,π] while φ1 + ⋯ + φr = 0 (mod 2π). X(t) is stationary. We note that the members of any proper subset of φ1, …, φr are independent of each other and so joint cumulants involving such proper subsets vanish. Therefore

cum{X1(t1), …, Xr(tr)} = ave{X1(t1) × ⋯ × Xr(tr)}.

This is a function of t1 − tr, …, tr−1 − tr as ω1 + ⋯ + ωr = 0 (mod 2π). We have

and so

η(λ) was defined by (2.1.6); see also Exercise 2.13.33. In the case that r = 1, the power spectrum of the series X(t) = R cos(ωt + φ) is seen to be

f_XX(λ) = (R²/4)[η(λ − ω) + η(λ + ω)].

It has peaks at the frequencies λ = ±ω (mod 2π). This provides one of the reasons for calling λ the frequency. We see that ω/(2π) is the number of complete cycles the cosinusoid cos(ωt + φ) passes through when t increases by one unit. For this reason λ/(2π) is called the frequency in cycles per unit time. Its reciprocal, 2π/λ, is called the period. λ itself is the angular frequency in radians per unit time.

Example 2.10.5 (Volterra Functional Expansions)  We return to Example 2.9.8 and have

Theorem 2.10.1  Let Y(t), t = 0, ±1, … be given by (2.9.15) where Σ |aj(u1, …, uj)| < ∞, and


Then the kth order cumulant spectrum of the series Y(t), t = 0, ±1, … is given by

where the outer sum is over all the indecomposable partitions {P1, …, PM}, M = 1, 2, … of Table 2.3.4.

We have used the symmetrical notation for the cumulant spectra in Equation (2.10.10). Theorem 2 in Shiryaev (1960) provides a related result.

2.11 THE FUNCTIONAL AND STOCHASTIC APPROACHES TO TIME SERIES ANALYSIS

Currently two different approaches are adopted by workers in time series: the stochastic approach and the functional approach. The former, generally adopted by probabilists and statisticians (Doob (1953) and Cramér and Leadbetter (1967)), is that described in Section 2.2. A given time series is regarded as being selected stochastically from an ensemble of possible series. We have a set Θ of r vector functions θ(t). After defining a probability measure on Θ, we obtain a random function X(t,θ), whose samples are the given functions θ(t). Alternatively, given X(t), we can set up an index θ = X(·) and take Θ to be the set of all θ. We then may set X(t,θ) = X(t,X(·)). In any case we find ourselves dealing with measure theory and probability spaces.

In the second approach, a given r vector time series is interpreted as a mathematical function and the basic ensemble of time functions takes the form {X(t,v) = X(t + v) | v = 0, ±1, ±2, …}, where X(t) is the given r vector function. This approach is taken in Wiener (1930), for example, and is called generalized harmonic analysis.

The distinction, from the point of view of the theoretician, is the different mathematical tools required and the different limiting processes involved.

Suppose that X(t) has components Xa(t), a = 1, …, r. In the functional approach we assume that limits of the form


exist. A form of stationarity obtains as

independently of v for v = 0, ±1, ±2, . . . . We now define a cross-covariance function by

If

we can define a second-order spectrum f_ab(λ) as in Section 2.5.

Suppose that the functions Xa(t), a = 1, …, r are such that

(i) for given real x1, …, xk and t1, …, tk the proportions, F^{(S,T)}_{a1…ak}(x1, …, xk; t1, …, tk), of t's in the interval [−S,T) such that

tend to a limit F_{a1…ak}(x1, …, xk; t1, …, tk) (at points of continuity of this function) as S, T → ∞, and

(ii) a compactness assumption such as

is satisfied for all S, T and some u > 0.

In this case the F_{a1…ak}(x1, …, xk; t1, …, tk) provide a consistent and symmetric family of finite dimensional distributions and so can be associated with some stochastic process by the Kolmogorov extension theorem; see Doob (1953). The limit in (i) depends only on the differences t1 − tk, …, tk−1 − tk and so the associated process is strictly stationary. If in (ii) we have u ≥ k and X(t) is the associated stationary process, then

and the association makes sense. X(t) will satisfy Assumption 2.6.1 if the cumulant-type functions derived from X(t) satisfy (2.6.1).


In other words, if the function (of the functional approach) satisfies certain regularity conditions, then there is a strictly stationary process whose analysis is equivalent.

Conversely: if X(t) is ergodic (metrically transitive), then with probability 1 any sample path satisfies the required limiting properties and can be taken as the basis for a functional approach.¹

In conclusion, we have

Theorem 2.11.1  If an r vector function satisfies (i) and (ii) above, then a stationary stochastic process can be associated with it having the same limiting properties. Alternatively, if a stationary process is ergodic, then with probability 1 any of its sample paths can be taken as the basis for a functional approach.

These two approaches are directly comparable to the two approaches to statistics through kollectivs (Von Mises (1964)) and measurable functions (Doob (1953)); see also Von Mises and Doob (1941).

The condition that X(t) be ergodic is not overly restrictive for our purposes since it is ergodic when it satisfies Assumption 2.6.1 and is determined by its moments; see Leonov (1960). We note that a general stationary process is a mixture of ergodic processes (Rozanov (1967)), and the associated process obtained by the above procedure will correspond to some component of the mixture. The limits in (i) will exist with probability 1; however, they will generally be random variables.

Wold (1948) discusses relations between the functional and stochastic approaches in the case of second-order moments.

We note that the limits required in expressions (2.11.1) and (2.11.2) follow under certain conditions from the existence of the limits in (i); see Wintner (1932).

We will return to a discussion of the functional approach to time series analysis in Section 3.9.

2.12 TRENDS

One simple form of departure from the assumption of stationarity is that the series X(t), t = 0, ±1, … has the form

X(t) = m(t) + ε(t),     (2.12.1)

¹X(t) is ergodic if for any real-valued f[x] with ave |f[X(t)]| < ∞, with probability 1,

See Cramér and Leadbetter (1967), Wiener et al. (1967), Halmos (1956), Billingsley (1965), and Hopf (1937).


where the series ε(t), t = 0, ±1, … is stationary, while m(t), t = 0, ±1, … is a nonconstant deterministic function. If, in addition, m(t) does not satisfy conditions of the character of those of Section 2.11, then a harmonic analysis of X(t) is not directly available. Our method of analysis of such series will be to try to isolate the effects of m(t) and ε(t) for separate analysis.

If the function m(t), t = 0, ±1, … varies slowly, it will be referred to as a trend. Many series occurring in practice appear to possess such a trend component. The series of United Kingdom exports graphed in Figure 1.1.4 appear to have this characteristic. In Section 5.11 we will discuss the estimation of trend functions of simple form.

2.13 EXERCISES

2.13.1 Let X(t) = cos(λt + θ), where θ has a uniform distribution on (−π,π]. Determine the finite dimensional distributions of the process, the mean function cX(t), and the autocovariance function cXX(t1,t2).

2.13.2 If (Y1, …, Yr) is an r variate chance quantity for which cum(Y_{j1}, …, Y_{js}) exists, j1, …, js = 1, …, r, and Zk = Σ_j a_{kj}Yj, k = 1, …, s, prove that

cum(Z_{k1}, …, Z_{ks}) = Σ_{j1} ⋯ Σ_{js} a_{k1 j1} ⋯ a_{ks js} cum(Y_{j1}, …, Y_{js}),  k1, …, ks = 1, …, s.

2.13.3 Denote cum(Y1[m1 times], …, Yr[mr times]) and cum(Z1[n1 times], …, Zs[ns times]) by K_{m1…mr}(Y) and K_{n1…ns}(Z), respectively, and let K^[m](Y) and K^[n](Z), m = m1 + ⋯ + mr, n = n1 + ⋯ + ns, denote the vectors with these components. Denote the transformation of 2.13.2 by Z = AY, where A is an s × r matrix. Prove that K^[n](Z) = A^[n] K^[n](Y), where A^[n] is the nth symmetric Kronecker power of A; see Hua (1963) pp. 10, 100.

2.13.4 Determine the transfer function of the filter of (2.9.6).

2.13.5 Show that the power spectrum of the (wide sense stationary) series X(t) = R cos(ωt + φ), where R is a constant, ω is a random variable with continuous density function f(ω), and φ is an independent uniform variate on (−π,π], is given by

2.13.6 Prove that the transfer function of the r × r filter indicated by

has off-diagonal elements 0 and diagonal elements sin[(2N + 1)λ/2]/[(2N + 1) sin(λ/2)].

2.13.7 Suppose Y1(t) = Σ_u a11(t − u)X1(u) and Y2(t) = Σ_u a22(t − u)X2(u), where {X1(t), X2(t)} satisfies Assumption 2.6.1, and suppose the transfer functions A11(λ), A22(λ) are not 0. Denote the second-order spectra of {X1(t), X2(t)} by f_jk(λ) and those of {Y1(t), Y2(t)} by g_jk(λ), j, k = 1, 2. Prove that

2.13.8 Prove that δ(x) = lim_{W→∞} W F(Wx), where ∫ |F(x)| dx < ∞ and ∫ F(x) dx = 1.

2.13.9 If X(t), Y(t) are statistically independent r vector series with cumulant spectra f_{a1…ak}(λ1, …, λk) and g_{a1…ak}(λ1, …, λk), respectively, prove that the cumulant spectra of X(t) + Y(t) are given by f_{a1…ak}(λ1, …, λk) + g_{a1…ak}(λ1, …, λk).

2.13.10 If X(t) and a(t) are real-valued, Y(t) = Σ_u a(t − u)X(u), and X(t) has cumulant spectra f_{X…X}(λ1, …, λk), prove that the cumulant spectra of Y(t) are given by A(λ1) ⋯ A(λk) f_{X…X}(λ1, …, λk).

2.13.11 Prove that f_{a1…ak}(λ1, …, λk) = f̄_{a1…ak}(−λ1, …, −λk), the bar denoting the complex conjugate, for an r vector series with real-valued components.

2.13.12 If X(t) is a stationary Gaussian Markov r vector process with

prove that

and cXX(u) = [cXX(−u)]^T, u < 0.

2.13.13 Prove that the power spectrum of a real-valued stationary Gaussian Markov process has the form σ²/[2π(1 + ρ² − 2ρ cos λ)], −π < λ ≤ π, −1 < ρ < 1.

2.13.14 Give an example to indicate that X(t) of Section 2.11 is not necessarily ergodic.

2.13.15 Let X^(N)(t), t = 0, ±1, …; N = 1, 2, … be a sequence of series satisfying Assumption 2.6.1. Suppose

for t, u1, …, uk−1 = 0, ±1, …; N = 1, 2, …, where

Suppose, as N → ∞, all the finite dimensional distributions of the process X^(N)(t), t = 0, ±1, … tend in distribution to those of a process X(t), t = 0, ±1, … . Show that

2.13.16 Show that the transfer function of the filter

vanishes at λ = ±ω. Discuss the effect of this filter on the series


2.13.17 Let*(/) = 1 for (2j - I)2 ^ f ^ (2/)2 and

Let^r(r) = -1 for

Prove that X(i) satisfies the conditions of Section 2.11 and determine theassociated stochastic process.

2.13.18 Let AX/) = R cos (w/ + <p) where R, w, and <p are constants. Prove thatX(fsatisfies the conditions of Section 2.11 and determine the associatedstochastic process.

2.13.19 Let X(t\ t = 0, ±1,... and Y(t\ / = 0, ±1,... be independent series withmean 0 and power spectra fxx(\\ /rr(X) respectively. Show that the powerspectrum of the series X(t)Y(t), t = 0, ±1, . . . is

2.13.20 Let X(t\ t = 0, ±1, . . . be a Gaussian series with mean 0 and powerspectrum fxx(X). Show that the power spectrum of the series AX/)2, t - 0,±1, . . . is

2.13.21 If AX/) is a real-valued series satisfying Assumption 2.6.1, prove, directly,that [AX/)]2 also satisfies Assumption 2.6.1 and determine its cumulantspectra.

2.13.22 If X(/) satisfies Assumption 2.6.2(/), Y(r) = £)„ a(/ - «)X(w) for a(w) ans X r filter with 2 |M|'|a7fc(«)| < <» ,7 = 1 , . . . , s, k ~ 1, . . . , r for some /,then Y(r) satisfies Assumption 2.6.2(/).

2.13.23 An s X r filter a(w) is said to have rank t if A(X) has rank t for each X.Prove that in this case a(w) is equivalent in effect to applying first a / X rfilter then an s X t filter.

2.13.24 If X(/) = ^u"oa(' ~ u)t(u) with t(/) an r vector pure noise series and|a(w)} an s X r summable filter, prove that fx*(X) may be written in theform $(e'x) 4»(e'A)T where 4>(z) is an 5 X r matrix valued function withcomponents analytic in the disc z\ ^ 1.

2.13.25 If X(t) = ^Jto0^ ~ «)e(«) with e(w) a real-valued pure noise series and2a(w)2 < oo, prove that the kih order cumulant spectrum, 7x..*(Xi,..., X*)has the form $(e'xO- • -$(e'x*), Xi -\ \-\k = 0(mod 2?r), with *(z)analytic in the disc |z| ^ 1.

2.13.26 If X(t) is a moving average process of order m, prove that c_XX(u) = 0 for |u| > m.

2.13.27 If we adopt the functional approach to time series analysis, demonstrate that Y(t) = Σ_u a(t − u)X(u), Σ_u |a(u)| < ∞, defines a filter. Indicate the relation between the spectra of Y(t) and those of X(t).



2.13.28 Show that Y₁(t), Y₂(t) of (2.7.33) come from X(t) through filters with coefficients {a(u) cos λ₀u}, {a(u) sin λ₀u}, respectively.

2.13.29 Prove that δ(ax) = |a|⁻¹ δ(x).

2.13.30 Let X(t) be a stationary r vector-valued series with X_j(t) = ρ_j X_j(t − 1) + ε_j(t), |ρ_j| < 1, j = 1, ..., r, where ε(t) is an r vector pure noise series. Prove that

where T = min(t₁, ..., t_k), a = (a₁, ..., a_k) and θ_{a₁···a_k} = cum{ε_{a₁}(t), ..., ε_{a_k}(t)}.

2.13.31 Let Φ(T), T = 1, 2, ... be a sequence of positive numbers with the properties Φ(T) → ∞ and Φ(T + 1)/Φ(T) → 1 as T → ∞. Let X(t), t = 0, ±1, ... be an r vector-valued function with the property

for u = 0, ±1, .... Show that there exists an r × r matrix-valued function G_XX(λ), −π < λ ≤ π, such that

Hint: Define I_XX^(T)(λ) as in the proof of Theorem 2.5.2. See Bochner (1959) p. 329, and Grenander (1954). Prove that ∫_{−π}^{π} A(α) I_XX^(T)(α) dα → ∫_{−π}^{π} A(α) dG_XX(α) for A(α) continuous on [−π, π] also.

2.13.32 Let X(t), t = 0, ±1, ... be a vector-valued stationary series with cumulant spectra f_{a₁···a_k}(λ₁, ..., λ_k). Evaluate the cumulant spectra of the time-reversed series X(−t), t = 0, ±1, ....

2.13.33 Show that

for −∞ < λ < ∞. (The Poisson Summation Formula; Edwards (1967).)

2.13.34 Show that the function c_XX(u) = 1 for |u| ≤ m and c_XX(u) = 0 otherwise cannot be an autocovariance function.

2.13.35 Let X_j(t), t = 0, ±1, ...; j = 0, ..., J − 1 be J independent realizations of a stationary process. Let

Show that Y(t), t = 0, ±1, ... is a stationary series. Show that its power spectrum is f_XX(λJ).



2.13.36 In the case that X(t), t = 0, ±1, ... is a linear process with

show that Assumption 2.6.3 is satisfied provided, for z in a neighborhood of 0,

2.13.37 A filter is called stable if it carries a bounded input series into a bounded output series. Show that a summable filter is stable.

2.13.38 Let x(t), t = 0, ±1, ... be an autoregressive process of order 1. Let ε(t), t = 0, ±1, ... be a pure noise series. Let X(t) = x(t) + ε(t). Show that the series X(t), t = 0, ±1, ... is a mixed moving average autoregressive process of order (1, 1).

2.13.39 State and prove an extension of Theorem 2.9.1 in which both the series X(t) and Y(t) are vector-valued.


3

ANALYTIC PROPERTIES OF FOURIER TRANSFORMS AND COMPLEX MATRICES

3.1 INTRODUCTION

The principal analytic tool that we will employ with time series is the Fourier transform. In this chapter we present those portions of Fourier analysis that will be required for our discussion. All the functions considered in this chapter will be fixed, rather than stochastic. Stochastic properties of Fourier transforms will be considered in the next chapter.

Among the topics discussed here are the following: the degree of approximation of a function by the partial sums of its Fourier series; the improvement of this approximation by the insertion of convergence factors; the Fourier transform of a finite set of values; the rapid numerical evaluation of Fourier transforms; the spectrum of a matrix and its relation to the approximation of one matrix by another of reduced rank; mathematical properties of functions of Fourier transforms; and finally the spectral or harmonic representation of functions possessing a generalized harmonic analysis.

We begin by considering the Fourier series of a given function A(λ).

3.2 FOURIER SERIES

Let A(λ), −∞ < λ < ∞, be a complex-valued function of period 2π such that




The Fourier coefficients of A(\) are given by

The function

is plotted in Figure 3.2.1 for the values n = 1, 3, 5, 10. We note that it fluctuates in sign and that it is concentrated in the neighborhood of α = 0, becoming more concentrated as n increases. Also

and the Fourier series of A(\) is then given by

There is an extensive literature concerning Fourier series and Fourier coefficients; see Zygmund (1959) and Edwards (1967), for example. Much of this literature is concerned with the behavior of the partial sums

In this work we will have frequent occasion to examine the nearness of the A^(n)(λ) to A(λ) for large n. Begin by noting, from expression (3.2.2) and Exercise 1.7.5, that

from Exercise 1.7.5. In consequence of these properties of the function (3.2.6) we see that A^(n)(λ) is a weighted average of the function A(λ − α), with weight concentrated in the neighborhood of α = 0. We would expect A^(n)(λ) to be near A(λ) for large n, if the function A(α) is not too irregular. In fact we can show that A^(n)(λ) tends to A(λ) as n → ∞ if, for example, A(α) is of bounded variation; see Edwards (1967) p. 150.

Under supplementary regularity conditions we can measure the rapidity of approach of A^(n)(λ) to A(λ) as n → ∞. Suppose



Figure 3.2.1 Plot of D_n(α) = sin((n + ½)α) / (2π sin ½α).



This condition is tied up with the degree of smoothness of A(α). Under it A(α) has bounded continuous derivatives of order ≤ k. We have therefore

and so

In summary, the degree of approximation of A(λ) by A^(n)(λ) is intimately related to the smoothness of A(λ).

We warn the reader that A^(n)(λ) need not necessarily approach A(λ) as n → ∞ even in the case that A(λ) is a bounded continuous function of λ; see Edwards (1967) p. 150, for example. However, the relationship between the two functions is well illustrated by (3.2.5). The behavior of A(λ) − A^(n)(λ) is especially disturbed in the neighborhood of discontinuities of A(λ). Gibbs' phenomenon involving the nondiminishing overshooting of a functional value can occur; see Hamming (1962) p. 295 or Edwards (1967) p. 172.
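To make the overshoot concrete, the partial sums may be evaluated numerically. The sketch below is illustrative only: the square-wave A(λ) (equal to 1 for |λ| < π/2 and 0 elsewhere on (−π, π]), its coefficients a(u) = sin(uπ/2)/(πu), and the evaluation grid are assumptions of ours, not taken from the text.

```python
import math

# Square-wave target: A(lam) = 1 for |lam| < pi/2, 0 for pi/2 < |lam| <= pi
# (period 2*pi).  Its Fourier coefficients:
def a(u):
    return 0.5 if u == 0 else math.sin(u * math.pi / 2) / (math.pi * u)

def partial_sum(lam, n):
    # A^(n)(lam) = sum over |u| <= n of a(u) exp(-i u lam); real since a(u) = a(-u)
    return a(0) + sum(2 * a(u) * math.cos(u * lam) for u in range(1, n + 1))

# Examine the partial sums just inside the discontinuity at lam = pi/2
grid = [math.pi / 2 - k * 0.001 for k in range(1, 400)]
for n in (10, 50, 200):
    peak = max(partial_sum(lam, n) for lam in grid)
    print(n, round(peak, 4))
```

The printed peak hovers near 1.09 for each n: raising n narrows the region of overshoot but does not shrink its height, which is the nondiminishing behavior described above.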

3.3 CONVERGENCE FACTORS

Fejér (1900, 1904) recognized that the partial sums of a Fourier series might be poor approximations of a function of interest even if the function were continuous. He therefore proposed that instead of the partial sum (3.2.4) we consider the sum

Using expression (3.2.2) and Exercise 1.7.12 we see that (3.3.1) may be written

The function

We will make use of the Landau o, O notations, writing a_n = o(β_n) when a_n/β_n → 0 as n → ∞ and writing a_n = O(β_n) when |a_n/β_n| is bounded for sufficiently large n.



is plotted in Figure 3.3.1 for n = 2, 4, 6, 11. It is seen to be non-negative, concentrated in the neighborhood of α = 0 and, following Exercise 1.7.12, such that

It is blunter than the function (3.2.6) of the previous section and has fewer ripples. This greater regularity leads to the convergence of (3.3.2) to A(λ) in the case that A(α) is a continuous function, in contrast to the behavior of (3.2.5); see Edwards (1967) p. 87. The insertion of the factors 1 − |u|/n in expression (3.3.1) has expanded the class of functions that may be reasonably represented by trigonometric series.

Figure 3.3.1 Plot of sin²(½nα) / (2πn sin² ½α).
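The contrast between the two kernels can be checked numerically from the form H^(n)(λ) = (2π)⁻¹ Σ_{|u|≤n} h(u/n) e^{−iuλ} suggested by (3.3.6). In the sketch below the choice n = 20 and the evaluation grid are illustrative assumptions; h ≡ 1 reproduces the Dirichlet-type function (3.2.6) and h(x) = 1 − |x| the Fejér kernel.

```python
import math

def H(h, n, lam):
    # H^(n)(lam) = (2*pi)^(-1) * sum over |u| <= n of h(u/n) exp(-i u lam)
    return (h(0.0) + sum(2 * h(u / n) * math.cos(u * lam)
                         for u in range(1, n + 1))) / (2 * math.pi)

dirichlet = lambda x: 1.0          # no convergence factors
fejer = lambda x: 1.0 - abs(x)     # the factors 1 - |u|/n

n = 20
lams = [k * math.pi / 200 for k in range(201)]
print(min(H(dirichlet, n, lam) for lam in lams))  # dips below zero
print(min(H(fejer, n, lam) for lam in lams))      # stays non-negative
```

The untapered kernel fluctuates in sign, while the Fejér kernel is non-negative, in line with the discussion above.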


In general we may consider expressions of the form

for some function h(x) with h(0) = 1 and h(x) = 0 for |x| > 1. The multiplier, h(u/n), appearing in (3.3.5) is called a convergence factor; see, for example, Moore (1966). If we set

(3.3.5) may be written

indicating that (3.3.5) is a weighted average of the function of interest. A wide variety of convergence factors h(u/n) have been proposed. Some of these are listed in Table 3.3.1 along with their associated H^(n)(λ). The typical shape of h(u/n) involves a maximum of 1 at u = 0, followed by a steady decrease to 0 as |u| increases to n. Convergence factors have also been called data windows and tapers; see Tukey (1967).

The typical form of H^(n)(λ) is that of a blocky weight function in the neighborhood of 0 that becomes more concentrated as n → ∞. In fact it follows from (3.3.6) that

as we would expect. An examination of expression (3.3.7) suggests that for some purposes we may wish to choose H^(n)(λ) to be non-negative. The second and third entries of Table 3.3.1 possess this property. The function H^(n)(λ) has been called a frequency window and a kernel. From (3.3.7) we see that the nearness of (3.3.5) to A(λ) relates to the degree of concentration of the function H^(n)(α) about α = 0. Various measures of this concentration, or bandwidth, have been proposed. Press and Tukey (1956) suggested the half-power width given by α_L − α_U, where α_L and α_U are the first positive and negative α such that H^(n)(α) = H^(n)(0)/2. Grenander (1951) suggested the measure

This is the mean-squared error about 0 if H^(n)(α) is considered as a probability distribution on (−π, π). Parzen (1961) has suggested the measure



Table 3.3.1 Some Particular Convergence Factors

Authors:

Dirichlet [Edwards (1967)]
Fejér, Bartlett [Edwards (1967), Parzen (1963)]
de la Vallée-Poussin, Jackson, Parzen [Akhiezer (1956), Parzen (1961)]
Hamming, Tukey [Blackman and Tukey (1958)]
Bohman [Bohman (1960)]
Poisson [Edwards (1967)]
Riemann, Lanczos [Edwards (1967), Lanczos (1956)]
Gauss, Weierstrass [Akhiezer (1956)]
Cauchy, Abel, Poisson [Akhiezer (1956)]
Riesz, Bochner, Parzen [Bochner (1936), Parzen (1961)]
Tukey [Tukey (1967)]



This is the width of the rectangle of the same maximum height and area as H^(n)(α).

A measure that is particularly easy to handle is

Its properties include: if h(u) has second derivative h''(0) at u = 0, then

showing a connection with Grenander's measure (3.3.9). Alternately, if the kernel being employed is the convolution of kernels G^(n)(α), H^(n)(α), then we can show that

for large n. Finally, if

exists for some q > 0, as Parzen (1961) assumes, then

Table 3.3.2 gives the values of β_n^H and 1/H^(n)(0) for the kernels of Table 3.3.1. The entries of this table give an indication of the relative asymptotic concentration of the various kernels.

The following theorem gives an alternate means of examining the asymptotic degree of approximation.

Theorem 3.3.1 Suppose A(λ) has bounded derivatives of order ≤ P. Suppose

with

for some finite K; then

Expression (3.3.11) gives a useful indication of the manner in which the nearness of (3.3.5) to A(λ) depends on the convergence factors employed. If possible it should be arranged that

be 0 for p = 1, 2, .... If h(x) = h(−x), then this is the case for odd values of p. The requirement for even p is equivalent to requiring that h(x) be very flat near x = 0. The last function of Table 3.3.1 is notable in this respect.

In fact, the optimum h(u/n) will depend on the particular A(λ) of interest. A considerable mathematical theory has been developed concerning the best approximation of functions by trigonometric polynomials; see Akhiezer (1956) or Timan (1963), for example. Bohman (1960) and Akaike (1968) were concerned with the development of convergence factors appropriate for a broad class of functions; see also Timan (1962), Shapiro (1969), Hoff (1970), Butzer and Nessel (1971).

Wilkins (1948) indicates asymptotic expansions of the form of (3.3.18) that are valid under less restrictive conditions.

Table 3.3.2 Bandwidths of the Kernels

Kernel:

Dirichlet
Fejér
de la Vallée-Poussin
Hamming
Bohman
Poisson
Riemann
Gauss
Cauchy
Riesz
Tukey


As an application of the discussion of this section we now turn to the problem of filter design. Suppose that we wish to determine time domain coefficients a(u), u = 0, ±1, ... of a filter with prespecified transfer function A(λ). The relation between a(u) and A(λ) is given by the expressions

The filter has the form

if X(t), t = 0, ±1, ... is the initial series. Generally a(u) does not vanish for large |u| and only a finite stretch of the X(t) series is available. These facts lead to difficulty in applying (3.3.22). We can consider the problem of determining a finite length filter with transfer function near A(λ). This may be formalized as the problem of determining multipliers h(u/n) so that

is near A(λ). This is the problem discussed above. The preceding discussion leads us to consider a filter of, for example, the form

for some convergence factors h(u/n).

Suppose that we wish to approximate a low-pass filter with cut-off frequency Ω < π; that is, the desired transfer function is

−π < λ < π. This filter has coefficients
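A numerical sketch of this design procedure follows, with an assumed cut-off Ω = π/4 and the Fejér factors 1 − |u|/n; both choices are ours, for illustration, and the coefficients a(u) = sin(Ωu)/(πu) (with a(0) = Ω/π) are those of the ideal low-pass transfer function.

```python
import math

Omega = math.pi / 4   # assumed cut-off frequency, for illustration only

def a(u):
    # Coefficients of the ideal low-pass transfer function
    return Omega / math.pi if u == 0 else math.sin(Omega * u) / (math.pi * u)

def realized(lam, n, h):
    # Transfer function of the finite filter with factors h(u/n) inserted
    return a(0) + sum(2 * h(u / n) * a(u) * math.cos(u * lam) for u in range(1, n + 1))

fejer = lambda x: 1.0 - abs(x)
n = 64
print(round(realized(0.0, n, fejer), 3))           # deep in the pass band, near 1
print(round(realized(math.pi / 2, n, fejer), 3))   # deep in the stop band, near 0
```

The finite tapered filter is close to the ideal transfer function away from the cut-off, illustrating how the convergence factors control the quality of the finite-length approximation.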



Figure 3.3.2 Transfer function of ideal Hilbert transform and approximations with various factors, n = 7.



Suppose alternately that we would like to realize numerically the Hilbert transform introduced in Section 2.7. Its transfer function is

The filter coefficients are therefore

Suppose n is odd. We are led therefore to consider filters of the form

Figure 3.3.2 indicates the imaginary part of the ideal A(λ) of (3.3.27) for 0 < λ < π/2 and the imaginary part of the A(λ) achieved by (3.3.29) with n = 7 for a variety of the convergence factors of Table 3.3.1. Because of the symmetries of the functions involved we need present only the functions for this restricted frequency range. The importance of inserting convergence factors is well demonstrated by these diagrams. We also see the manner in which different convergence factors can affect the result.
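Numerically, the filter coefficients work out to a(u) = (1 − cos πu)/(πu), that is 2/(πu) for odd u and 0 for even u (our computation from the ±i transfer function). The sketch below evaluates the imaginary part of the realized transfer function for n = 7; the Hamming-type factor formula 0.54 + 0.46 cos πx is an assumption for illustration.

```python
import math

def a(u):
    # Hilbert-transform coefficients: a(u) = (1 - cos(pi u))/(pi u)
    return 0.0 if u % 2 == 0 else 2.0 / (math.pi * u)

def imag_part(lam, n, h):
    # Imaginary part of sum over |u| <= n of h(u/n) a(u) exp(-i u lam);
    # the ideal value is -1 for 0 < lam < pi
    return -2.0 * sum(h(u / n) * a(u) * math.sin(u * lam) for u in range(1, n + 1, 2))

hamming = lambda x: 0.54 + 0.46 * math.cos(math.pi * x)  # assumed Hamming-type factors
for lam in (0.5, 1.0, 1.5):
    print(round(imag_part(lam, 7, hamming), 3))
```

Even with only n = 7 the tapered filter sits within roughly ten percent of the ideal value −1 over the interior of (0, π), in the spirit of Figure 3.3.2.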

References concerned with the design of digital filters include: Kuo and Kaiser (1966), Wood (1968) and No. 3 of the IEEE Trans. Audio Electro. (1968). Goodman (1960) was concerned with the numerical realization of a Hilbert transform and of band-pass filters. In Section 3.6 we will discuss a means of rapidly evaluating the filtered series Y(t). Parzen (1963) discusses a variety of the topics of this section.

3.4 FINITE FOURIER TRANSFORMS AND THEIR PROPERTIES

Given the sequence a(u), u = 0, ±1, ... our work in the previous sections has led us to consider expressions of the form

For fixed n, such an expression is called a finite Fourier transform of the sequence a(u), u = 0, ±1, ..., ±n. Such transforms will constitute the essential statistics of our analysis of time series.


Before proceeding further, it is worthwhile to alter our notation slightly and to consider the general case of a vector-valued sequence. Specifically we consider an r vector-valued sequence X(0), X(1), ..., X(T − 1), whose domain is 0, 1, ..., T − 1 rather than −n, −n + 1, ..., −1, 0, 1, ..., n. We define the finite Fourier transform of this sequence to be

In the case that T = 2n + 1, n being an integer, we may write

and thus we see that the only essential difference between definitions of the form (3.4.1) and the form (3.4.2) is a multiplier of modulus 1. Which definition is more convenient depends on the situation being discussed.

Among the properties of the definition (3.4.2) we note

Also, if the components of X(t) are real-valued, then

These two properties imply that, in the case of real-valued components, the principal domain of d_X^(T)(λ) may be taken to be 0 ≤ λ ≤ π. Continuing, we note that if X(t), Y(t), t = 0, ..., T − 1 are given and if α and β are scalars, then

On occasion we may wish to relate the finite Fourier transform of the convolution of two sequences to the Fourier transforms of the two sequences themselves. We have

Lemma 3.4.1 Let X(t), t = 0, ±1, ... be r vector-valued and uniformly bounded. Let a(t), t = 0, ±1, ... be s × r matrix-valued and such that

Set

Then there is a finite K such that



where

We see that the finite Fourier transform of a filtered series is approximately the product of the transfer function of the filter and the finite Fourier transform of the series. This result will later provide us with a useful means of realizing the digital filtering of a series of interest. See also Lemma 6.3.1.

We now indicate a few examples of finite Fourier transforms. For these cases it is simplest to take the symmetric definition

Example 1 (Constant) Suppose X(t) = 1, t = 0, ±1, ...; then expression (3.4.10) equals

This function was plotted in Figure 3.2.1 for 0 < λ < π. Notice that it has peaks at λ = 0, ±2π, ....

Example 2 (Cosinusoid) Suppose X(t) = exp{iωt}, t = 0, ±1, ... with ω real-valued; then (3.4.10) equals

This is the transform of Example 1 translated along by ω units. It has peaks at λ = ω, ω ± 2π, ....

Example 3 (Trigonometric Polynomial) Suppose X(t) = Σ_k ρ_k exp{iω_k t}. Clearly, from what has gone before, (3.4.10) equals

an expression with large amplitude at λ = ω_k ± 2πl, l = 0, ±1, ....

Example 4 (Monomials) Suppose X(t) = t^k, t = 0, ±1, ..., k a positive integer. Expression (3.4.10) becomes



This transform behaves like the derivatives of the transform of Example 1. Notice that it is concentrated in the neighborhood of λ = 0, ±2π, ... for large n.

A polynomial Σ_k a_k t^k will behave as a linear combination of functions of the form (3.4.14).

Example 5 (Monomial Amplitude Cosinusoid) Suppose X(t) = t^k exp{iωt}; then (3.4.10) is

This is the function of Example 4 translated along by ω frequency units.

The general nature of these results is the following: the Fourier transform of a function X(t) is concentrated in amplitude near λ = 0, ±2π, ... if X(t) is constant or slowly changing with t. It is concentrated near λ = ω, ω ± 2π, ... if X(t) is a cosinusoid of frequency ω or is a cosinusoid of frequency ω multiplied by a polynomial in t.

The transform (3.4.2) may be inverted by the integration

Alternatively it is seen to be inverted by the sum

The T r vectors d_X^(T)(2πs/T), s = 0, ..., T − 1, are sometimes referred to as the discrete Fourier transform of X(t), t = 0, ..., T − 1. We will discuss its numerical evaluation and properties in the next two sections.

The discrete Fourier transform may be written in matrix form. Let 𝒳 denote the r × T matrix whose columns are X(0), ..., X(T − 1) successively. Let 𝒟 denote the r × T matrix whose columns are d_X^(T)(2πs/T), s = 0, ..., T − 1. Let ℱ denote the T × T matrix with exp{−i(2πst/T)} in row s + 1 and column t + 1 for s, t = 0, ..., T − 1. Then we see that we have
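For a small case the matrix formulation can be verified directly; the data vector below is an arbitrary illustration with r = 1 and T = 4.

```python
import cmath

T = 4
# F has exp{-i 2 pi s t / T} in row s+1, column t+1
F = [[cmath.exp(-2j * cmath.pi * s * t / T) for t in range(T)] for s in range(T)]

# For a scalar (r = 1) sequence, the discrete Fourier transform is the product X F
X = [1.0, 2.0, 0.0, -1.0]   # arbitrary illustrative data
D = [sum(X[t] * F[s][t] for t in range(T)) for s in range(T)]
print(D)

# T^(-1/2) F is unitary: distinct rows of F are orthogonal
inner = sum(F[1][t] * F[3][t].conjugate() for t in range(T))
print(abs(inner))
```

The inner product of distinct rows vanishes, anticipating the remark in Section 3.7 that T^(−1/2)ℱ is unitary.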



The cases T = 1, 2, 3, 4 are seen to correspond to the respective matrices

General discussions of discrete and finite Fourier transforms are given in Stumpff (1937), Whittaker and Robinson (1944), Schoenberg (1950), and Cooley, Lewis, and Welch (1967). Further properties are given in the exercises at the end of this chapter.

3.5 THE FAST FOURIER TRANSFORM

In this book the discrete Fourier transform will be the basic entity from which statistics of interest will be formed. It is therefore important to be able to calculate readily the discrete Fourier transform of a given set of numbers X(t), 0 ≤ t ≤ T − 1.

We have

and note that T² complex multiplications are required if we calculate the discrete Fourier transform directly from its definition. If T is composite (the product of several integers), then elementary procedures to reduce the required number of multiplications have often been employed; see Cooley et al (1967). Recently formal algorithms, which reduce the required number of multiplications to what must be a near minimum, have appeared; see Good (1958), Cooley and Tukey (1965), Gentleman and Sande (1966), Cooley et al (1967), Bergland (1967), Bingham et al (1967) and Brigham and Morrow (1967). For a formulation in terms of a composition series of a finite group see Posner (1968) and Cairns (1971).



We now indicate the form of these Fast Fourier Transform Algorithms beginning with two elementary cases. The underlying idea is to reduce the calculation of the discrete Fourier transform of a long stretch of data to the calculation of successive Fourier transforms of shorter sets of data. We begin with

Theorem 3.5.1 Let T = T₁T₂, where T₁ and T₂ are integers; then

We note that j₁T₂ + j₂ runs through all integers j, 0 ≤ j ≤ T − 1 for 0 ≤ j₁ ≤ T₁ − 1 and 0 ≤ j₂ ≤ T₂ − 1. We note that (T₁ + T₂)T₁T₂ complex multiplications are required in (3.5.2) to perform discrete Fourier transforms of orders T₁ and T₂. Certain additional operations will be required to insert the terms exp{−i2πT⁻¹j₂t₁}.

A different algorithm is provided by the following theorem in which we let X(t) denote the period T extension of X(0), ..., X(T − 1).

Theorem 3.5.2 Let T = T₁T₂, where T₁ and T₂ are relatively prime integers; then for j ≡ j₁ (mod T₁), j ≡ j₂ (mod T₂), 0 ≤ j₁ ≤ T₁ − 1, 0 ≤ j₂ ≤ T₂ − 1

The number of complex multiplications required is again (T₁ + T₂)T₁T₂. In this case we must determine, for each j, the j₁ and j₂ above and use this information to select the appropriate Fourier coefficient. Notice that the exp{−i2πT⁻¹j₂t₁} terms of (3.5.2) are absent and that the result is symmetric in T₁ and T₂. Good (1971) contrasts the two Fast Fourier Algorithms.

When we turn to the case in which T = T₁ ··· T_k, for general k, with T₁, ..., T_k integers, the extension of Theorem 3.5.1 is apparent. In (3.5.2), T₂ is now composite and so the inner Fourier transform, with respect to t₂, may be written in iterated form (in the form of (3.5.2) itself). Continuing in this way it is seen that the d_X^(T)(2πT⁻¹j), j = 0, ..., T − 1 may be derived by k successive discrete Fourier transforms of orders T₁, ..., T_k in turn. The number of complex multiplications required is (T₁ + ··· + T_k)T. Specific formulas for this case may be found in Bingham et al (1967).

The generalization of Theorem 3.5.2 is as follows:

Theorem 3.5.3 Let T = T₁ ··· T_k, where T₁, ..., T_k are relatively prime in pairs. Let j ≡ j_l (mod T_l), 0 ≤ j_l ≤ T_l − 1, l = 1, ..., k; then

(X(t) is here the periodic extension, with period T, of X(t).)

By way of explanation of this result, we note that the numbers t₁T/T₁ + ··· + t_kT/T_k, when reduced mod T, run through all integers t, 0 ≤ t ≤ T − 1 for 0 ≤ t₁ ≤ T₁ − 1, ..., 0 ≤ t_k ≤ T_k − 1. For each j we must determine the j₁, ..., j_k above, and select the appropriate Fourier coefficient from those that have been calculated. This may be done by setting up a table of the residues of j, 0 ≤ j ≤ T − 1.

The number of complex multiplications indicated in Theorem 3.5.3 is also (T₁ + ··· + T_k)T. We see that we will obtain the greatest saving if the T_j are small. If T = 2^n, we see that essentially 2T log₂ T multiplications are needed. At the end of Section 3.4, we gave the discrete Fourier transform for the cases T = 1, 2, 3, 4. Examination of the results shows that fewer than the indicated number of operations may be required, the cases T = 4 and T = 8 being particularly important. Additional gains can be achieved by taking note of the real nature of the X(t) or by transforming more than one series; see Cooley et al (1967) and Exercise 3.10.30.

It often occurs that T is not highly composite and one is not interested in the values of d_X^(T)(λ) at frequencies of the form 2πj/T, j = 0, ..., T − 1. If this is so, we can add S − T zeros to the X(t) values, choosing S > T to be highly composite. The transform d_X^(T)(λ) is now obtained for λ = 2πj/S, j = 0, 1, ..., S − 1.

Quite clearly we can combine the technique of Theorem 3.5.3, where the factors of T are relatively prime, with the previously indicated procedure for dealing with general factors. The number of extra multiplications by cosinusoids may be reduced in this way. See Hamming (1962), p. 74, for the case T = 12. A FORTRAN program for the mixed radix Fast Fourier Transform may be found in Singleton (1969).

In conclusion we remark that the Fast Fourier Transform is primarily an efficient numerical algorithm. Its use or nonuse does not affect the basis of statistical inference. Its effect has been to radically alter the calculations of empirical time series analysis.
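A minimal sketch of the iterated scheme in the radix-2 case T = 2^n, verified against the direct T² evaluation; the test data are arbitrary.

```python
import cmath

def dft(x):
    # Direct evaluation: T^2 complex multiplications
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * j * t / T) for t in range(T))
            for j in range(T)]

def fft(x):
    # T = T1*T2 with T1 = 2, applied recursively: roughly 2*T*log2(T)
    # multiplications when T = 2**n
    T = len(x)
    if T == 1:
        return x[:]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * T
    for j in range(T // 2):
        w = cmath.exp(-2j * cmath.pi * j / T) * odd[j]  # the inserted twiddle terms
        out[j] = even[j] + w
        out[j + T // 2] = even[j] - w
    return out

x = [complex(t * t % 5, 0) for t in range(16)]   # arbitrary test data, T = 16
err = max(abs(p - q) for p, q in zip(fft(x), dft(x)))
print(err < 1e-9)
```

The two evaluations agree to rounding error, while the recursive form performs far fewer multiplications for large T.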


3.6 APPLICATIONS OF DISCRETE FOURIER TRANSFORMS

Suppose the values X(t), Y(t), t = 0, ..., T − 1 are available. We will sometimes require the convolution

If

then we quickly see that the convolution (3.6.1) is the coefficient of exp{−iλu} in the trigonometric polynomial d_X^(T)(λ) d_Y^(T)(λ). It is therefore given by

This occurrence suggests that we may be able to compute (3.6.1) by means of a discrete Fourier transform and so take advantage of the Fast Fourier Transform Algorithm. In fact we have

Lemma 3.6.1 Given X(t), Y(t), t = 0, ..., T − 1 and an integer S > T, the convolution (3.6.1) is given by

In general (3.6.4) equals

We may obtain the desired values of the convolution from (3.6.4) by taking S large enough. If S is taken to be highly composite then the discrete Fourier transforms required in the direct evaluation of (3.6.4) may be rapidly calculated by means of the Fast Fourier Transform Algorithm of the previous section. Consequently the convolution (3.6.1) may well be more rapidly computed by this procedure rather than by using its definition (3.6.1) directly. This fact was noted by Sande; see Gentleman and Sande (1966), and also Stockham (1966). From (3.6.5) we see that for S − T < |u| ≤ T − 1,


expression (3.6.4) gives (3.6.1) plus some additional terms. For moderate values of |u| it will approximately equal (3.6.1). It can be obtained for all u by taking S ≥ 2T.

One situation in which one might require the convolution (3.6.1) is in the estimation of the moment function m₁₂(u) = E[X₁(t + u) X₂(t)] for some stationary bivariate series. An unbiased estimate of m₁₂(u) is provided by

an expression of the form of (3.6.1). Exercise 3.10.7 indicates how the result of Lemma 3.6.1 might be modified to construct an estimate of c₁₂(u) = cov{X₁(t + u), X₂(t)}.

Another situation in which the result of Lemma 3.6.1 proves useful is in the calculation of the filtered values of Section 3.3:

given the values X(t), t = 0, ..., T − 1. Suppose the transfer function of the filter {a(u)} is A(λ). Then Lemmas 3.4.1 and 3.6.1 suggest that we form

These values should be near the desired filtered values. In fact by direct substitution we see that (3.6.8) equals

and so if a(u) falls off rapidly as |u| → ∞ and 0 ≤ t ≤ T − 1, expression (3.6.8) should be near (3.6.7). If S is taken to be highly composite then the calculations indicated in (3.6.8) may be reduced by means of the Fast Fourier Transform. We might introduce convergence factors.

We remark that Lemma 3.6.1 has the following extension:

Lemma 3.6.2 Given X_j(t), t = 0, ..., T − 1, j = 1, ..., r and an integer S > T, the expression

equals
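A numerical reading of this computation follows, with S = 2T so that no wrap-around terms contaminate the displayed lags. The series values and the inversion sum (1/S) Σ_s d_X(2πs/S) conj d_Y(2πs/S) exp(iu2πs/S) are our illustrative assumptions; a fast transform would replace the direct sums when S is highly composite.

```python
import cmath

def d(x, lam):
    # Zero-padded finite Fourier transform: sum over t of x(t) exp(-i lam t)
    return sum(x[t] * cmath.exp(-1j * lam * t) for t in range(len(x)))

X = [1.0, 2.0, -1.0, 0.5, 3.0]   # arbitrary illustrative series
Y = [0.5, -1.0, 2.0, 1.0, -0.5]
T = len(X)
S = 2 * T                        # S >= 2T avoids wrap-around at the displayed lags

def conv_fft(u):
    # (1/S) sum over s of d_X(2 pi s/S) conj(d_Y(2 pi s/S)) exp(i u 2 pi s/S)
    total = 0j
    for s in range(S):
        lam = 2 * cmath.pi * s / S
        total += d(X, lam) * d(Y, lam).conjugate() * cmath.exp(1j * u * lam)
    return (total / S).real

def conv_direct(u):
    # sum over t of X(t+u) Y(t), keeping both indices in range
    return sum(X[t + u] * Y[t] for t in range(T) if 0 <= t + u < T)

print(all(abs(conv_fft(u) - conv_direct(u)) < 1e-9 for u in range(-(T - 1), T)))
```

With S < 2T the same sum would return the convolution plus the aliased extra terms mentioned above for the largest lags.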



We conclude this section by indicating some uses of the finite Fourier transform. Suppose

then

following Example 2 of Section 3.4. By inspection we see that the amplitude of expression (3.6.13) is large for λ near ±ω and not otherwise, −π < λ < π. In consequence the finite Fourier transform (3.6.13) should prove useful in detecting the frequency of a cosinusoid of unknown frequency. This use was proposed in Stokes (1879).

We remark that if X(t) contains two unknown frequencies, say

then we may have difficulty resolving ω₁ and ω₂ if they are close to one another, for

This function will not have obvious peaks in amplitude at λ = ±ω₁, ±ω₂ if ω₁ and ω₂ are so close together that the ripples of the D_n functions interfere with one another. This difficulty may be reduced by tapering the X(t) series prior to forming the Fourier transform. Specifically consider

in the case of (3.6.14), where we have made use of (3.3.6). If the convergence factors h(u/n) are selected so that H^(n)(λ) is concentrated in some interval, say |λ| < Δ/n, then the amplitude of (3.6.16) should have obvious peaks if |ω₁ − ω₂| > 2Δ/n.
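The benefit of tapering can be sketched numerically. Below, a single cosinusoid lying between Fourier frequencies is transformed with and without Hanning-type factors; the frequencies and the factor formula 0.5 − 0.5 cos 2πs are assumptions for illustration. The ripples of the untapered D_n function leak substantial amplitude into distant frequencies, while the tapered transform concentrates near ω.

```python
import cmath, math

def transform(x, taper):
    # |finite Fourier transform| with multipliers taper(t/(n-1)) inserted
    n = len(x)
    def at(lam):
        return abs(sum(taper(t / (n - 1)) * x[t] * cmath.exp(-1j * lam * t)
                       for t in range(n)))
    return at

n = 128
omega = 2 * math.pi * 10.5 / n   # deliberately between Fourier frequencies (assumed)
x = [math.cos(omega * t) for t in range(n)]

boxcar = lambda s: 1.0
hanning = lambda s: 0.5 - 0.5 * math.cos(2 * math.pi * s)  # assumed taper formula

ratios = {}
for name, taper in (("boxcar", boxcar), ("hanning", hanning)):
    at = transform(x, taper)
    peak = at(omega)
    leak = max(at(2 * math.pi * k / n) for k in (35, 40, 45, 50))
    ratios[name] = leak / peak
print(ratios)
```

The tapered transform's relative leakage into distant frequencies is orders of magnitude smaller, which is why nearby peaks that the D_n ripples would mask can become visible after tapering.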

Other uses of the finite Fourier transform include: the evaluation of the latent values of a matrix of interest, see Lanczos (1955); the estimation of the mixing distribution of a compound distribution, see Medgyessy (1961); and the determination of the cumulative distribution function of a random variable from the characteristic function, see Bohman (1960).



3.7 COMPLEX MATRICES AND THEIR EXTREMAL VALUES

We turn to a consideration of matrices whose entries are complex numbers and remark that the spectral density matrix introduced in Section 2.5 is an example of such a matrix. Begin with several definitions. If Z = [Z_jk] is a J × K matrix with the complex number Z_jk in the jth row and kth column, then we define Z̄ = [Z̄_jk] to be the matrix whose entries are the complex conjugates of the entries of Z. Let Z^τ = [Z_kj] denote the transpose of Z. We then say that Z is Hermitian if Z̄^τ = Z. If Z is J × J Hermitian then we say that Z is non-negative definite if

for all complex scalars α_j, j = 1, ..., J. A square matrix Z is unitary if Z⁻¹ = Z̄^τ or equivalently Z Z̄^τ = I with I the identity matrix. The complex number μ is called a latent value or latent root of the J × J matrix Z if

where I is the identity of the same dimension as Z. Because Det(Z − μI) is a polynomial of order J in μ, the equation (3.7.2) has at most J distinct roots. It is a classic result (MacDuffee (1946)) that corresponding to any latent value μ there is always a J vector α such that

Such an α is called a latent vector of Z. If Z is Hermitian, then its latent values are real-valued; see MacDuffee (1946). We denote the jth largest of these by μ_j or μ_j(Z) for j = 1, ..., J. The corresponding latent vector is denoted by α_j or α_j(Z). The collection of latent values of a square matrix is called its spectrum. We will shortly discuss the connection between this spectrum and the previously defined second-order spectrum of a stationary series.

Given a matrix Z, we note that the matrices Z Z̄^τ, Z̄^τ Z are always Hermitian and non-negative definite. Also, following Theorem 2.5.1, we note that if X(t), t = 0, ±1, ... is an r vector-valued stationary series with absolutely summable covariance function, then f_XX(λ), its spectral density matrix, is Hermitian and non-negative definite. We remark that if

is the matrix of the discrete Fourier transform considered in Section 3.4, then the matrix T^{−1/2}ℱ is unitary. Its latent values are given in Exercise 3.10.12.


It is sometimes useful to be able to reduce computations involving complex matrices to computations involving only real matrices. Lemma 3.7.1 below gives an important isomorphism between complex matrices and real matrices. We first set down the notation; if Z = [Z_jk] with Z_jk = Re Z_jk + i Im Z_jk, then Re Z = [Re Z_jk] and Im Z = [Im Z_jk].

Lemma 3.7.1 To any J × K matrix Z with complex entries there corresponds a (2J) × (2K) matrix Z^R with real entries such that

(i) if Z = X + Y, then Z^R = X^R + Y^R
(ii) if Z = XY, then Z^R = X^R Y^R
(iii) if Y = Z^{-1}, then Y^R = (Z^R)^{-1}
(iv) Det Z^R = |Det Z|^2
(v) if Z is Hermitian, then Z^R is symmetric
(vi) if Z is unitary, then Z^R is orthogonal
(vii) if the latent values and vectors of Z are μ_j, α_j, j = 1, ..., J, then those of Z^R are, respectively, μ_j, μ̄_j and the vectors [α_j^τ, −iα_j^τ]^τ, [ᾱ_j^τ, iᾱ_j^τ]^τ.

In fact the correspondence of this lemma may be taken to be

    Z^R = [ Re Z   −Im Z ]
          [ Im Z    Re Z ]

providing the dimensions of the matrices appearing throughout the lemma are appropriate. It is discussed in Wedderburn (1934), Lanczos (1956), Bellman (1960), Brenner (1961), Good (1963), and Goodman (1963). The correspondence is exceedingly useful for carrying out numerical computations involving matrices with complex-valued entries. However, Ehrlich (1970) suggests that we should stick to complex arithmetic when convenient.

Latent vectors and values are important in the construction of representations of matrices by more elementary matrices. In the case of a Hermitian matrix we have

Theorem 3.7.1 If H is a J × J Hermitian matrix, then

    H = Σ_{j=1}^{J} μ_j U_j Ū_j^τ

where μ_j is the jth latent value of H and U_j is the corresponding latent vector.
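Lemma 3.7.1 is easy to check numerically. The sketch below (NumPy; the helper name `real_rep` is ours, not the book's) builds the real (2J) × (2K) representation from the blocks Re Z, −Im Z, Im Z, Re Z and verifies several of the listed properties:

```python
import numpy as np

def real_rep(Z):
    """Map a complex J x K matrix Z to the real (2J) x (2K) matrix
    built from the blocks [[Re Z, -Im Z], [Im Z, Re Z]]."""
    return np.block([[Z.real, -Z.imag], [Z.imag, Z.real]])

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# (ii) products correspond
assert np.allclose(real_rep(X @ Y), real_rep(X) @ real_rep(Y))

# (iv) determinant of the real representation is |Det Z|^2
assert np.isclose(np.linalg.det(real_rep(X)), abs(np.linalg.det(X)) ** 2)

# (v) a Hermitian Z gives a symmetric real representation
H = X + X.conj().T
assert np.allclose(real_rep(H), real_rep(H).T)

# (vii) as a set, the latent values of the real representation are
# those of Z together with their complex conjugates
mu = np.linalg.eigvals(X)
mu_R = np.linalg.eigvals(real_rep(X))
assert np.allclose(np.sort_complex(np.round(np.concatenate([mu, mu.conj()]), 8)),
                   np.sort_complex(np.round(mu_R, 8)))
```

Property (vii) is checked as a set identity, after rounding, since the two eigenvalue routines return the values in different orders.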

The theorem has the following:

Corollary 3.7.1 If H is J × J Hermitian, then it may be written H = UMŪ^τ where M = diag{μ_j; j = 1, ..., J} and U = [U_1 ⋯ U_J] is unitary. Also if H is non-negative definite, then μ_j ≥ 0, j = 1, ..., J.
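Corollary 3.7.1 is exactly what `numpy.linalg.eigh` computes for a Hermitian matrix; a minimal check (our illustration, with eigh's ascending ordering rather than the book's descending convention):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = A + A.conj().T            # a Hermitian matrix

mu, U = np.linalg.eigh(H)     # real latent values, unitary latent vectors

# U is unitary
assert np.allclose(U.conj().T @ U, np.eye(4))

# the representation H = U M (conjugate transpose of U) of Corollary 3.7.1
assert np.allclose(U @ np.diag(mu) @ U.conj().T, H)
```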

This theorem is sometimes known as the Spectral Theorem. In the case of matrices of arbitrary dimension we have

Theorem 3.7.2 If Z is J × K, then

    Z = Σ_j μ_j U_j V̄_j^τ     (3.7.9)

where μ_j^2 is the jth latent value of ZZ̄^τ (or Z̄^τZ), U_j is the jth latent vector of ZZ̄^τ and V_j is the jth latent vector of Z̄^τZ and it is understood μ_j ≥ 0.

The theorem has the following:

Corollary 3.7.2 If Z is J × K, then it may be written Z = UMV̄^τ where the J × K matrix M = diag{μ_j; j = 1, ..., J}, the J × J matrix U = [U_1 ⋯ U_J] is unitary and the K × K matrix V = [V_1 ⋯ V_K] is also unitary.

This theorem is given in Autonne (1915). Structure theorems for matrices are discussed in Wedderburn (1934) and Hua (1963); see also Schwerdtfeger (1960). The representation Z = UMV̄^τ is called the singular value decomposition of Z. A computer program for it is given in Businger and Golub (1969).
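In modern libraries the decomposition of Corollary 3.7.2 is available directly; a sketch (NumPy, with a random matrix of our choosing) relating the singular values to the latent values of ZZ̄^τ:

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

U, mu, Vh = np.linalg.svd(Z)      # Vh plays the role of the conjugate transpose of V

# the mu_j^2 are the latent values of Z Z-bar-transpose
assert np.allclose(np.sort(mu ** 2)[::-1],
                   np.sort(np.linalg.eigvalsh(Z @ Z.conj().T))[::-1])

# reconstruct Z = U M Vh, with M the 3 x 5 "diagonal" matrix of Corollary 3.7.2
M = np.zeros((3, 5))
np.fill_diagonal(M, mu)
assert np.allclose(U @ M @ Vh, Z)
```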

An important class of matrices, in the subject of time series analysis, is the class of finite Toeplitz matrices. We say that a matrix C = [C_jk] is finite Toeplitz if C_jk depends only on j − k, that is, C_jk = c(j − k) for some function c(·). These matrices are discussed in Widom (1965) where other references may be found. Finite Toeplitz matrices are important in time series analysis for the following reason: if X(t), t = 0, ±1, ... is a real-valued stationary series with autocovariance function c_XX(u), u = 0, ±1, ..., then the covariance matrix of the stretch X(t), t = 0, ..., T − 1 is a finite Toeplitz matrix with c_XX(j − k) in the jth row and kth column.
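A small illustration (our own MA(1) example, not from the text) of how an autocovariance function generates a finite Toeplitz covariance matrix:

```python
import numpy as np

# autocovariance of the MA(1) series X(t) = e(t) + 0.5 e(t-1), Var e(t) = 1
# (illustrative choice): c(0) = 1.25, c(+-1) = 0.5, c(u) = 0 otherwise
def c(u):
    return {0: 1.25, 1: 0.5, -1: 0.5}.get(u, 0.0)

T = 6
C = np.array([[c(j - k) for k in range(T)] for j in range(T)])

# C is finite Toeplitz: its (j, k) entry depends only on j - k
for j in range(T):
    for k in range(T):
        assert C[j, k] == c(j - k)

# it is also a valid covariance matrix: symmetric, non-negative definite
assert np.allclose(C, C.T)
assert np.linalg.eigvalsh(C).min() >= 0
```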

We will sometimes be interested in the latent roots and vectors of the covariance matrix of X(t), t = 0, ..., T − 1 for a stationary X(t). Various approximate results are available concerning these in the case of large T. Before indicating certain of these we first introduce an important class of finite Toeplitz matrices. A square matrix Z = [Z_jk] is said to be a circulant of order T if Z_jk = z(k − j) for some function z(·) of period T, that is,

In connection with the latent values and vectors of a circulant we have

Theorem 3.7.3 Let Z = [z(k − j)] be a T × T circulant matrix; then its latent values are given by

and the corresponding latent vectors by

respectively.

The latent values are seen to provide the discrete Fourier transform of the sequence z(t), t = 0, ..., T − 1. The matrix of latent vectors is proportional to the matrix F of Section 3.4. Theorem 3.7.3 may be found in Aitken (1954), Schoenberg (1950), Hamburger and Grimshaw (1951) p. 94, Good (1950), and Whittle (1951).
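Theorem 3.7.3 can be confirmed numerically; in the sketch below (NumPy, our construction) the latent values of a circulant coincide, as a set, with the discrete Fourier transform of z(t):

```python
import numpy as np

T = 8
rng = np.random.default_rng(3)
z = rng.standard_normal(T)              # one period of z(.)

# circulant of order T: Z[j, k] = z((k - j) mod T)
Z = np.array([[z[(k - j) % T] for k in range(T)] for j in range(T)])

# the latent values are the discrete Fourier transform of z
dft = np.array([sum(z[t] * np.exp(-2j * np.pi * s * t / T) for t in range(T))
                for s in range(T)])
eig = np.linalg.eigvals(Z)
assert np.allclose(np.sort_complex(np.round(dft, 8)),
                   np.sort_complex(np.round(eig, 8)))

# and the Fourier vector with entries exp(-i 2 pi s t / T) is a latent vector
s = 3
e_s = np.exp(-2j * np.pi * s * np.arange(T) / T)
assert np.allclose(Z @ e_s, dft[s] * e_s)
```

The sign convention in the exponent may differ from the vectors (3.7.12) of the text; for real z the set of latent values is unaffected.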

Let us return to the discussion of a general square finite Toeplitz matrix C = [c(j − k)], j, k = 1, ..., T. Consider the related circulant matrix Z whose kth entry in the first row is c(1 − k) + c(1 − k + T), where we consider c(T) = 0. Following Theorem 3.7.3 the latent values of Z are

giving a discrete Fourier transform of the c(u), u = 0, ±1, ..., ±(T − 1). Let F_T denote the T × T matrix whose columns are the vectors (3.7.12). Let M_T denote the diagonal matrix with corresponding entries μ_k(Z); then


and we may consider approximating C by F_T M_T F̄_T^τ = Z. We have

giving us a bound on the difference between C and Z. This bound may be used to place bounds on the differences between the latent roots and vectors of C and Z. For example the Wielandt-Hoffman Theorem (Wilkinson (1965)) indicates that there is an ordering μ_{i_1}(C), ..., μ_{i_T}(C) of the latent roots μ_1(C), ..., μ_T(C) of C such that

If

then the latent roots of C are tending to be distributed like the values of the discrete Fourier transform of c(u), u = 0, ±1, ..., ±(T − 1) as T → ∞. A variety of results of this nature may be found in Grenander and Szegő (1958); see also Exercise 3.10.14. This sort of result indicates a connection between the power spectrum of a stationary time series (defined as the Fourier transform of its autocovariance function) and the spectrum (defined to be the collection of latent values) of the covariance matrix of long stretches of the series. We return to this in Section 4.7.

Results concerning the difference between the latent vectors of C and those of Z may be found in Gavurin (1957) and Davis and Kahan (1969). We remark that the above discussion may be extended to the case of vector-valued time series and block Toeplitz matrices; see Exercise 3.10.15.

The representation (3.7.9) is important in the approximation of a matrix by another matrix of reduced rank. We have the following:

Theorem 3.7.4 Let Z be J × K. Among J × K matrices A of rank L ≤ J, K,

    Σ_j Σ_k |Z_jk − A_jk|^2

is minimized by

    A = Σ_{j=1}^{L} μ_j U_j V̄_j^τ

where μ_j, U_j, V_j are given in Theorem 3.7.2. The minimum achieved is Σ_j μ_{j+L}^2.
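Theorem 3.7.4 is the truncated singular value decomposition; a NumPy sketch (random example of our choosing) checking the minimized sum of squares:

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.standard_normal((6, 4))
L = 2

U, mu, Vh = np.linalg.svd(Z)

# best rank-L approximant: keep the L largest terms of the representation
A = U[:, :L] @ np.diag(mu[:L]) @ Vh[:L, :]
assert np.linalg.matrix_rank(A) == L

# the squared error is the sum of the squared discarded singular values
err = np.sum(np.abs(Z - A) ** 2)
assert np.isclose(err, np.sum(mu[L:] ** 2))

# no other rank-L matrix does better -- try a random competitor
B = rng.standard_normal((6, L)) @ rng.standard_normal((L, 4))
assert np.sum(np.abs(Z - B) ** 2) >= err
```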

We see that we construct A from the terms in (3.7.9) corresponding to the L largest μ_j; see Okamoto (1969) for the case of real symmetric Z and A.

Corollary 3.7.4 The above choice of A also minimizes

for A of rank L ≤ J, K. The minimum achieved is μ_{L+1}.

Results of the form of this corollary are given in Eckart and Young (1936), Kramer and Mathews (1956), and Rao (1965) for the case of real Z, A.

3.8 FUNCTIONS OF FOURIER TRANSFORMS

Let X(t), t = 0, ±1, ... be a vector-valued time series of interest. In order to discuss the statistical properties of certain series resulting from the application of operators to the series X(t), we must now develop several analytic results concerning functions of Fourier transforms. We begin with the following:

Definition 3.8.1 Let C denote the space of complex numbers. A complex-valued function f(z) defined for z = (z_1, ..., z_n) ∈ D, an open subset of C^n, is holomorphic in D if each point w = (w_1, ..., w_n) ∈ D is contained in an open neighborhood U such that f(z) has a convergent power series expansion

    f(z) = Σ_{k_1, ..., k_n ≥ 0} a_{k_1 ⋯ k_n} (z_1 − w_1)^{k_1} ⋯ (z_n − w_n)^{k_n}

for all z ∈ U.

A result that is sometimes useful in determining holomorphic functions is provided by

Theorem 3.8.1 Suppose F_j(y_1, ..., y_m; z_1, ..., z_n), j = 1, ..., m are holomorphic functions of m + n variables in a neighborhood of (u_1, ..., u_m; v_1, ..., v_n) ∈ C^{m+n}. If F_j(u_1, ..., u_m; v_1, ..., v_n) = 0, j = 1, ..., m, while the determinant of the Jacobian matrix


is nonzero at (u_1, ..., u_m; v_1, ..., v_n), then the equations

    F_j(y_1, ..., y_m; z_1, ..., z_n) = 0,  j = 1, ..., m

have a unique solution y_j = y_j(z_1, ..., z_n), j = 1, ..., m which is holomorphic in a neighborhood of (v_1, ..., v_n).

This theorem may be found in Bochner and Martin (1948) p. 39. It implies, for example, that the zeros of a polynomial are holomorphic functions of the coefficients of the polynomial in a region where the polynomial has distinct roots. It implies a fortiori that the latent values of a matrix are holomorphic functions of the elements of the matrix in a region of distinct latent values; see Exercise 3.10.19.

Let V_+(l), l ≥ 0, denote the space of functions z(λ), −∞ < λ < ∞, that are Fourier transforms of the form

    z(λ) = Σ_{u=0}^{∞} a(u) exp{−iuλ}

with the a(u) real-valued and satisfying

    Σ_{u=0}^{∞} |u|^l |a(u)| < ∞.     (3.8.5)

Under the condition (3.8.5) the domain of z(λ) may be extended to consist of complex λ with −∞ < Re λ < ∞, Im λ ≤ 0. We then have

Theorem 3.8.2 If z_j(λ) belongs to V_+(l), j = 1, ..., n and f(z_1, ..., z_n) is a holomorphic function in a neighborhood of the range of values {z_1(λ), ..., z_n(λ)}; −∞ < Re λ < ∞, Im λ ≤ 0, then f(z_1(λ), ..., z_n(λ)) also belongs to V_+(l).

This theorem may be deduced from results in Gelfand et al (1964). The first theorems of this nature were given by Wiener (1933) and Levy (1933).

As an example of the use of this theorem consider the following: let {a(u)}, u = 0, 1, 2, ... be an r × r realizable l-summable filter with transfer function A(λ) satisfying Det A(λ) ≠ 0, −∞ < Re λ < ∞, Im λ ≤ 0. This last condition implies that the entries of A(λ)^{-1} are holomorphic functions of the entries of A(λ) in a neighborhood of the range of A(λ); see Exercise 3.10.37. An application of Theorem 3.8.2 indicates that the entries of B(λ) = A(λ)^{-1} are in V_+(l) and so B(λ) is the transfer function of an r × r realizable l-summable filter {b(u)}, u = 0, 1, 2, .... In particular we see that if X(t), t = 0, ±1, ... is a stationary r vector-valued series with E|X(t)| < ∞, then the relation


    Y(t) = Σ_{u=0}^{∞} a(u)X(t − u)

may, with probability 1, be inverted to give

    X(t) = Σ_{u=0}^{∞} b(u)Y(t − u)

for some b(u), u = 0, 1, 2, ... with

    Σ_{u=0}^{∞} |u|^l |b(u)| < ∞.

We remark that the condition Det A(λ) ≠ 0, −∞ < Re λ < ∞, Im λ ≤ 0 is equivalent to the condition that Det[Σ_{u=0}^{∞} a(u)z^u] has no roots in the unit disc |z| ≤ 1. In the case that Y(t) = ε(t), a pure noise series with finite mean, the above reasoning indicates that if

    Det[I + a(1)z + ⋯ + a(m)z^m]     (3.8.10)

has no roots in the unit disc, then the autoregressive scheme

    X(t) + a(1)X(t − 1) + ⋯ + a(m)X(t − m) = ε(t)     (3.8.11)

has, with probability 1, a stationary solution of the form

    X(t) = Σ_{u=0}^{∞} b(u)ε(t − u)

with

    Σ_{u=0}^{∞} |u|^l |b(u)| < ∞

for all l ≥ 0.

An alternate set of results of the above nature is sometimes useful. We set down
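The root condition on (3.8.10) is easy to check for a given scheme; a sketch for a scalar AR(2) with illustrative coefficients of our choosing:

```python
import numpy as np

# AR(2) scheme X(t) + a1 X(t-1) + a2 X(t-2) = e(t); coefficients are ours
a1, a2 = -0.5, 0.06

# roots of 1 + a1 z + a2 z^2 (the scalar, r = 1, case of (3.8.10));
# numpy.roots takes coefficients ordered by descending power
roots = np.roots([a2, a1, 1.0])

# no roots in the unit disc |z| <= 1, so a stationary solution exists
assert all(abs(z) > 1 for z in roots)
```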

Definition 3.8.2 A complex-valued function f(z) defined for z = (z_1, ..., z_n) ∈ D, an open subset of C^n, is real holomorphic in D if each point w = (w_1, ..., w_n) ∈ D is contained in an open neighborhood U such that f(z) has a convergent power series expansion

for all z ∈ U.


We next introduce V(l), l ≥ 0, the space of functions z(λ), −∞ < λ < ∞, that are Fourier transforms of the form

    z(λ) = Σ_{u=−∞}^{∞} a(u) exp{−iuλ}

with the a(u) real-valued and satisfying

    Σ_{u=−∞}^{∞} |u|^l |a(u)| < ∞.

We then have the following:

Theorem 3.8.3 If z_j(λ) belongs to V(l), j = 1, ..., n and f(z_1, ..., z_n) is a real holomorphic function in a neighborhood of the range of values {z_1(λ), ..., z_n(λ); −∞ < λ < ∞}, then f(z_1(λ), ..., z_n(λ)) also belongs to V(l).

This theorem again follows from the work of Gelfand et al (1964). Comparing this theorem with Theorem 3.8.2, we note that the required domain of regularity of f(·) is smaller here and its values are allowed to be more general.

As an application of this theorem: let {a(u)}, u = 0, ±1, ±2, ... be an r × r l-summable filter with transfer function A(λ) satisfying Det A(λ) ≠ 0, −∞ < λ < ∞. Then there exists an l-summable filter {b(u)}, u = 0, ±1, ... with transfer function B(λ) = A(λ)^{-1}. Or with the same notation, there exists an l-summable filter {c(u)}, u = 0, ±1, ... with transfer function C(λ) = (A(λ)Ā(λ))^{-1}.

As an example of the joint use of Theorems 3.8.2 and 3.8.3 we mention the following result useful in the linear prediction of real-valued stationary series.

Theorem 3.8.4 Let X(t), t = 0, ±1, ... be a real-valued series with mean 0 and cov{X(t + u), X(t)} = c_XX(u), t, u = 0, ±1, .... Suppose

    Σ_{u=−∞}^{∞} |c_XX(u)| < ∞.

Suppose f_XX(λ) ≠ 0, −∞ < λ < ∞. Then we may write

where the series

has mean 0 and autocovariance function c_εε(u) = δ{u}. The coefficients satisfy

The {a(u)}, {b(u)} required here are determined somewhat indirectly. If

then we see that it is necessary to have

As (3.8.17) holds and f_XX(λ) does not vanish, we may write

with

and

following Theorem 3.8.3. Expression (3.8.24) suggests defining

The corresponding {a(u)}, {b(u)} satisfy expression (3.8.20) following Theorem 3.8.2.

Theorems 3.8.2 and 3.8.3 have previously been used in a time series context in Hannan (1963). Arens and Calderon (1955) and Gelfand et al (1964) are general references to the theorems. Baxter (1963) develops an inequality, using these procedures, that may be useful in bounding the error of finite approximations to certain Fourier transforms.


3.9 SPECTRAL REPRESENTATIONS IN THE FUNCTIONAL APPROACH TO TIME SERIES

In Section 2.7 we saw that the effect of linear time invariant operations on a time series X(t), t = 0, ±1, ... was easily illustrated if the series could be written as a sum of cosinusoids, that is, if for example

    X(t) = Σ_j z(j) exp{iλ_j t}     (3.9.1)

the z(j) being r vectors. In this section we consider representations of a series X(t) that have the nature of expression (3.9.1), but apply to a broader class of time series: such representations will be called spectral representations. They have the general form

    X(t) = ∫ exp{iλt} dZ_X(λ)

for some r vector-valued Z_X(λ). We begin with

Theorem 3.9.1 Let X(t), t = 0, ±1, ... be an r vector-valued function such that

and

exists for t, u = 0, ±1, .... Then the following limit exists,

Also there exists an r vector-valued Z_X(λ; s), −π < λ ≤ π, s = 0, ±1, ... such that

in the sense that

Z_X(λ; s) also satisfies

The matrix G_XX(λ) of (3.9.4) may be seen to be bounded, non-negative definite, nondecreasing as a function of λ, 0 ≤ λ ≤ π, and such that G_XX(−λ) = Ḡ_XX(λ)^τ. Exercise 2.13.31 indicates a related result.

Expression (3.9.5) provides a representation for X(t + s) as a sum of cosinusoids of differing phases and amplitudes. Suppose that {a(u)}, u = 0, ±1, ... is a filter whose coefficients vanish for sufficiently large |u|. Let A(λ) denote the transfer function of this filter. Then if we set

we see that the filtered series has the representation

The cosinusoids making up X(t + s) have become multiplied by the transfer function of the filter.

A version of Theorem 3.9.1 is given in Bass (1962a, b); however, the theorem itself follows from a representation theorem of Wold (1948).

An alternate form of spectral representation was given by Wiener (1930) and a discrete vector-valued version of his result is provided by

Theorem 3.9.2 Let X(t), t = 0, ±1, ... be an r vector-valued function such that

then there exists an r vector-valued Z_X(λ), −π < λ ≤ π, with Z_X(π) − Z_X(−π) = X(0), such that

Expression (3.9.12) holds in the sense of the formal integration by parts

The function Z_X(λ) satisfies

A theorem of Wiener (1933), p. 138, applies to show that expression (3.9.11) holds if

If X(t) also satisfies expression (3.9.3) and G_XX(λ) is given by (3.9.4), then

at points of continuity of G_XX(λ), 0 ≤ λ ≤ π.

Expression (3.9.12) may clearly be used to illustrate the effect of linear filters on the series X(t).

Yet another means of obtaining a spectral representation for a fixed series X(t), t = 0, ±1, ... is to make use of the theory of Schwartz distributions; see Schwartz (1957, 1959) and Edwards (1967) Chap. 12. We will obtain a spectral representation for a stochastic series in Section 4.6. Bertrandias (1960, 1961) also considers the case of fixed series as does Heninger (1970).

3.10 EXERCISES

3.10.1 Suppose A(λ) = 1 for |λ ± ω| < Δ with Δ small and A(λ) = 0 otherwise for −π < λ ≤ π. Show that

3.10.2 Let A(λ) denote the transfer function of a filter. Show that the filter leaves polynomials of degree k invariant if and only if A(0) = 1, A^{(j)}(0) = 0, 1 ≤ j ≤ k. (Here A^{(j)}(λ) denotes the jth derivative.) See Schoenberg (1946) and Brillinger (1965a).

3.10.3 If

with |H(α)| ≤ K(1 + |α|)^{-2}, show that H^{(T)}(λ) of (3.3.6) is given by

3.10.4 If F denotes the matrix with exp{−i2π(j − 1)(k − 1)/T} in row j, column k, 1 ≤ j, k ≤ T, show that FF̄^τ = TI and F^4 = T^2 I.
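A quick numerical check of Exercise 3.10.4 (using 0-based indices in place of the exercise's 1-based ones):

```python
import numpy as np

T = 5
jj, kk = np.meshgrid(np.arange(T), np.arange(T), indexing="ij")
F = np.exp(-2j * np.pi * jj * kk / T)        # entries exp{-i 2 pi j k / T}

# F times its conjugate transpose is T I
assert np.allclose(F @ F.conj().T, T * np.eye(T))

# F^4 = T^2 I (F^2 is T times a flip permutation, which squares to I)
assert np.allclose(np.linalg.matrix_power(F, 4), T * T * np.eye(T))
```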

3.10.5 If D_n(λ) is given by expression (3.2.6), prove that (2n + 1)^{-1} D_n(λ) tends to η{λ}/2π as n → ∞.

3.10.6 Prove that expression (3.4.14) tends to

3.10.7 Let c_X^{(T)}, c_Y^{(T)} denote the means of the values X(t), Y(t), t = 0, ..., T − 1. Show that

3.10.8 Let x_j^{(T)}(t), t = 0, ±1, ... denote the period T extension of x_j(t), t = 0, ..., T − 1 for j = 1, ..., r. Show that the expression

3.10.9 Let

Show that d_Y^{(T)}(λ) = A(λ)d_X^{(T)}(λ), −∞ < λ < ∞.

3.10.10 Let Π^{(T)}(u_1, ..., u_{r−1}) denote expression (3.6.10). Show that

is given by

3.10.11 If W = Z^{-1}, show that

    Re W = {Re Z + (Im Z)(Re Z)^{-1}(Im Z)}^{-1}
    Im W = −(Re W)(Im Z)(Re Z)^{-1}.


3.10.12 Let F denote the matrix of Exercise 3.10.4. Show that its latent values are T^{1/2}, −iT^{1/2}, −T^{1/2}, iT^{1/2} with multiplicities [T/4] + 1, [(T + 1)/4], [(T + 2)/4], [(T + 3)/4] − 1 respectively. (Here [N] denotes the integral part of N.) See Lewis (1939).

3.10.13 If the Hermitian matrix Z has latent values μ_1, ..., μ_J and corresponding latent vectors U_1, ..., U_J, prove that the matrix Z − μ_1 U_1 Ū_1^τ has latent values 0, μ_2, ..., μ_J and latent vectors U_1, ..., U_J. Show how this result may be used to reduce the calculations required in determining the latent values and vectors of Z from those of Z^R.

3.10.14 Use the inequality (3.7.16) to prove the following theorem; let

−∞ < λ < ∞ with Σ_u |u| |c(u)|^2 < ∞. Let C^{(T)} = [c(j − k)], j, k = 1, ..., T. If F[·] is a function with a uniformly bounded derivative on the range of f(λ), −∞ < λ < ∞, then

Theorems of this sort are given in Grenander and Szegő (1958).

3.10.15 A (Tr) × (Tr) matrix Z is said to be a block circulant if it is made up of r × r matrices Z_jk = z(k − j) for some r × r matrix-valued function z(·) of period T. Prove that the latent values of Z are given by the latent values of

and the corresponding latent vectors by

where u_j^k are the latent vectors of (*); see Friedman (1961). Indicate how this result may be used to determine the inverse of a block circulant matrix.

3.10.16 Let Z be a J × J Hermitian matrix. Show that

for x a J vector and D a matrix of rank ≤ j − 1 that has J rows. This is the Courant–Fischer Theorem and may be found in Bellman (1960).

3.10.17 If the pure noise series ε(t), t = 0, ±1, ... has moments of all orders, prove that the autoregressive scheme (3.8.11) has, with probability 1, a solution X(t) satisfying Assumption 2.6.1 provided that the polynomial (3.8.10) has no roots in the unit disc.


3.10.18 If A, B are r × r complex matrices and F(B; A) = BA, prove that the determinant of the Jacobian ∂F/∂B is given by (Det A)^r; see Deemer and Olkin (1951) and Khatri (1965a).

3.10.19 Let Z be an r × r complex matrix with distinct latent values μ_j, j = 1, ..., r. Prove that the μ_j are holomorphic functions of the entries of Z. Hint: Note that the μ_j are the solutions of the equation Det(Z − μI) = 0 and use Theorem 3.8.1; see Portmann (1960).

3.10.20 Let Z_0 be an r × r complex matrix with distinct latent values. Show that there exists a nonsingular Q whose entries are holomorphic functions of the entries of Z for all Z in a neighborhood of Z_0 and such that Q^{-1}ZQ is a diagonal matrix in the neighborhood; see Portmann (1960).

3.10.21 If Z_0, Z of Exercise 3.10.20 are Hermitian, then the columns of Q are orthogonal. Conclude that a unitary matrix, U, whose entries are real holomorphic functions of the entries of Z may be determined so that Ū^τZU is a diagonal matrix.

3.10.22 If {a(u)}, u = 0, 1, 2, ... is an r × r realizable filter and {b(u)} its inverse exists, prove that the b(u), u = 0, 1, ... are given by: a(0)b(0) = I, a(0)b(1) + a(1)b(0) = 0, a(0)b(2) + a(1)b(1) + a(2)b(0) = 0, ....
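The recursion of Exercise 3.10.22 can be coded directly; a scalar (r = 1) sketch, with function name and coefficients of our choosing:

```python
import numpy as np

a = [1.0, -0.5, 0.06]                 # a(u) = 0 for u > 2; illustrative values

def inverse_filter(a, n):
    """First n coefficients b(u) of the inverse filter, from
    a(0)b(0) = 1 and sum_{v <= u} a(v) b(u - v) = 0 for u >= 1."""
    b = [1.0 / a[0]]
    for u in range(1, n):
        s = sum(a[v] * b[u - v] for v in range(1, min(u, len(a) - 1) + 1))
        b.append(-s / a[0])
    return b

b = inverse_filter(a, 30)

# convolving a with b returns the identity filter (1, 0, 0, ...)
conv = np.convolve(a, b)[:30]
assert np.isclose(conv[0], 1.0)
assert np.allclose(conv[1:], 0.0, atol=1e-12)
```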

3.10.23 Prove Exercise 2.13.22 using the results of Section 3.8.

3.10.24 Let ρ(S) be a monotonically increasing function such that lim_{S→∞} ρ(S + 1)/ρ(S) = 1. Let X(t), t = 0, ±1, ... be a function such that

exists for t, u = 0, ±1, .... Indicate the form that Theorem 3.9.1 takes for such an X(t).

3.10.25 Adopt the notation of Theorem 3.9.1. If the moments m_{a_1...a_k}(u_1, ..., u_{k−1}) of expression (2.11.9) exist and are given by the Fourier–Stieltjes transforms of the functions M_{a_1...a_k}(λ_1, ..., λ_{k−1}), −π < λ_j ≤ π, prove that

3.10.26 Let Z be a J × J Hermitian matrix with ordered latent vectors x_1, ..., x_J. Show that

where the maximum is over x orthogonal to x_1, ..., x_{j−1}. Equality occurs for x = x_j.


3.10.27 Let A be an r × r Hermitian matrix with latent roots and vectors μ_j, V_j, j = 1, ..., r. Given φ mapping the real line into itself, the r × r matrix-valued function φ(A) is defined by

Show that φ(A)* = φ(A*).

3.10.28 Show that there exist constants K, L such that

for

3.10.29 Suppose the conditions of Theorem 3.3.1 are satisfied and in addition the Pth derivative of A(α) is continuous at α = λ. Show that the last expression of (3.3.18) may be replaced by

3.10.30 Let real-valued data X(t), Y(t), t = 0, ..., T − 1 be given. Set Z(t) = X(t) + iY(t). Show that

This exercise indicates how the Fourier transforms of two real-valued sets of data may be found with one application of a Fourier transform to a complex-valued set of data; Bingham et al (1967).
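The device of Exercise 3.10.30 can be sketched in NumPy (our rendering of the standard trick; it assumes the identities d_X(s) = (d_Z(s) + conj(d_Z(−s)))/2 and d_Y(s) = (d_Z(s) − conj(d_Z(−s)))/(2i), which follow from the conjugate symmetry of real-data transforms):

```python
import numpy as np

T = 16
rng = np.random.default_rng(5)
X = rng.standard_normal(T)
Y = rng.standard_normal(T)

# one complex transform of Z(t) = X(t) + i Y(t) ...
dZ = np.fft.fft(X + 1j * Y)

# ... yields both real-data transforms
dZ_rev = np.conj(dZ[(-np.arange(T)) % T])   # conj(d_Z(-s)), indices mod T
dX = (dZ + dZ_rev) / 2
dY = (dZ - dZ_rev) / 2j

assert np.allclose(dX, np.fft.fft(X))
assert np.allclose(dY, np.fft.fft(Y))
```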

3.10.31 Prove that for S an integer

when a(u), A(λ) are related as in (3.2.2).

3.10.32 If α is an r vector and Z is an r × r Hermitian matrix, show that

where


3.10.33 With the notation of Corollary 3.7.2, set M^+ = diag{μ_j^+; j = 1, ..., J} where μ^+ = 1/μ if μ ≠ 0, μ^+ = 0 if μ = 0. Then the K × J matrix Z^+ = VM^+Ū^τ is called the generalized inverse of Z. Show that
(a) ZZ^+Z = Z
(b) Z^+ZZ^+ = Z^+
(c) (ZZ^+)ˉ^τ = ZZ^+
(d) (Z^+Z)ˉ^τ = Z^+Z.
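The matrix Z^+ of Exercise 3.10.33 is the Moore–Penrose inverse, computed by `numpy.linalg.pinv`; a check of the four properties (with conjugate transposes in (c), (d) for complex Z):

```python
import numpy as np

rng = np.random.default_rng(6)
Z = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
Zp = np.linalg.pinv(Z)                         # the generalized inverse

assert np.allclose(Z @ Zp @ Z, Z)              # (a)
assert np.allclose(Zp @ Z @ Zp, Zp)            # (b)
assert np.allclose((Z @ Zp).conj().T, Z @ Zp)  # (c)
assert np.allclose((Zp @ Z).conj().T, Zp @ Z)  # (d)
```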

3.10.34 Show for S ≤ T that

3.10.35 If A^{(n)}(λ) is given by expression (3.2.4), show for m ≤ n that

3.10.36 Use the singular value decomposition to show that a J × K matrix A of rank L may be written A = BC, where B is J × L and C is L × K.

3.10.37 Let Z_0 be an r × r matrix with Det Z_0 ≠ 0. Show that the entries of Z^{-1} are holomorphic functions of the entries of Z in a neighborhood of Z_0.


4

STOCHASTIC PROPERTIES OF FINITE FOURIER TRANSFORMS

4.1 INTRODUCTION

Consider an r vector-valued sequence X(t), t = 0, ±1, .... In the previous chapter we considered various properties of the finite Fourier transform

in the case that X(t) was a fixed, nonstochastic function. In this chapter we present a variety of properties of d_X^{(T)}(λ) in the case that X(t), t = 0, ±1, ... is a stationary time series. We will also consider asymptotic distributions, probability 1 bounds, and behavior under convolution, as well as develop the Cramér representation of X(t).

In previous chapters we have seen that Fourier transforms possess a wealth of valuable mathematical properties. For example, in Chapter 3 we saw that the discrete Fourier transform has the important numerical property of being rapidly computable by the Fast Fourier Transform Algorithm, while in this chapter we will see that it has useful and elementary statistical properties. For all of the reasons previously given, the Fourier transform is an obvious entity on which to base an analysis of a time series of interest.

However, before developing stochastic properties of the transform (4.1.1) we first define two types of complex-valued random variables. These variables will prove important in our development of the distributions of various time series statistics.



4.2 THE COMPLEX NORMAL DISTRIBUTION

If X is an r vector-valued random variable having real-valued components and having a multivariate normal distribution with mean μ_X and covariance matrix Σ_XX, write: X is N_r(μ_X, Σ_XX). Throughout this text we will often have to consider r vector-valued random variables X whose individual components are complex-valued. If, for such an X, the 2r vector-valued variate with real components

    [ Re X ]
    [ Im X ]

is distributed as

    N_2r( [ Re μ_X ] ,  (1/2) [ Re Σ_XX   −Im Σ_XX ] )
          [ Im μ_X ]          [ Im Σ_XX    Re Σ_XX ]

for some r vector μ_X and r × r Hermitian non-negative definite Σ_XX, we will write: X is N_r^c(μ_X, Σ_XX). Then X is complex multivariate normal with mean μ_X and covariance matrix Σ_XX.

We remark that within the class of complex vector-valued random variables whose real and imaginary parts have a joint multivariate normal distribution, the complex multivariate normals have the property that if (4.2.4) is diagonal, then the components of X are statistically independent; see Exercise 4.8.1. Various properties of the complex multivariate normal are given in Wooding (1956), Goodman (1963), James (1964), and in Exercises 4.8.1 to 4.8.3. We mention the properties: if Σ_XX is nonsingular, then the probability element of X is given by

for −∞ < Re x_j, Im x_j < ∞. And in the case r = 1, if X is N_1^c(μ_X, σ_XX), then Re X and Im X are independent N_1(Re μ_X, σ_XX/2) and N_1(Im μ_X, σ_XX/2), respectively.
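A simulation sketch of the r = 1 remark (sample size and parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma = 200_000, 2.0
mu = 1.0 + 0.5j

# X is N1^c(mu, sigma): Re X, Im X independent N(Re mu, sigma/2), N(Im mu, sigma/2)
X = mu + np.sqrt(sigma / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

assert abs(X.real.mean() - 1.0) < 0.02
assert abs(X.imag.mean() - 0.5) < 0.02
assert abs(X.real.var() - sigma / 2) < 0.02
assert abs(X.imag.var() - sigma / 2) < 0.02

# E |X - mu|^2 = sigma, the complex variance
assert abs(np.mean(np.abs(X - mu) ** 2) - sigma) < 0.03
```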

Turning to a different class of variates, suppose X_1, ..., X_n are independent N_r(0, Σ_XX) variates. Then the r × r matrix-valued random variable

    W = Σ_{j=1}^{n} X_j X_j^τ

is said to have a Wishart distribution of dimension r and degrees of freedom n. We write: W is W_r(n, Σ_XX). If on the other hand X_1, ..., X_n are independent N_r^c(0, Σ_XX) variates, then the r × r matrix-valued random variable

    W = Σ_{j=1}^{n} X_j X̄_j^τ

is said to have a complex Wishart distribution of dimension r and degrees of freedom n. In this case we write: W is W_r^c(n, Σ_XX). The complex Wishart distribution was introduced in Goodman (1963). Various of its properties are given in Exercises 4.8.4 to 4.8.8 and in Srivastava (1965), Gupta (1965), Kabe (1966, 1968), Saxena (1969), and Miller (1968, 1969). Its density function may be seen to be given by

for n ≥ r and W ≥ 0. The complex Wishart distribution will be useful in the development of approximations to the distributions of estimates of spectral density matrices.

In later sections of this text, we will require the concept of a sequence of variates being asymptotically normal. We will say that the r vector-valued sequence ζ_T, T = 1, 2, ... is asymptotically N_r(μ_T, Σ_T) if the sequence Σ_T^{-1/2}(ζ_T − μ_T) tends, in distribution, to N_r(0, I). We will also say that the r vector-valued sequence ζ_T, T = 1, 2, ... is asymptotically N_r^c(μ_T, Σ_T) if the sequence Σ_T^{-1/2}(ζ_T − μ_T) tends, in distribution, to N_r^c(0, I).

4.3 STOCHASTIC PROPERTIES OF THE FINITE FOURIER TRANSFORM

Consider the r vector-valued stationary series X(t), t = 0, ±1, .... In this section we will develop asymptotic expressions for the cumulants of the finite Fourier transform of an observed stretch of the series. In Section 3.3 we saw that certain benefits could result from the insertion of convergence factors into the direct definition of the finite Fourier transform. Now let us


begin by inserting convergence factors here and then deducing the results for the simple Fourier transform as a particular case. We begin with

Assumption 4.3.1 h(u), −∞ < u < ∞, is bounded, is of bounded variation, and vanishes for |u| > 1.

Suppose h_a(u) satisfies this assumption for a = 1, ..., r. The finite Fourier transform we consider is defined by

In the present context we will refer to the function h_a(t/T) as a taper or data window. The transform involves at most the values X(t), t = 0, ±1, ..., ±(T − 1) of the series. If h_a(u) = 0 for u < 0, then it involves only the values X(t), t = 0, ..., T − 1. This means that the asymptotic results we develop apply to either one-sided or two-sided statistics. If a segment of the series is missing, within the time period of observation, then the data available may be handled directly by taking h_a(t/T) to vanish throughout the missing segment. If the component series are observed over different time intervals, this is handled by having the h_a(t/T) nonzero over different time intervals.

Set

and if it is possible to apply the Poisson summation formula (Edwards (1967) p. 173), then we may write

The discussion of convergence factors in Section 3.3 suggests that H_{a_1...a_k}(λ) will have substantial magnitude only for λ near 0. This implies that the function (4.3.2) will have substantial magnitude only for λ near some multiple of 2π.

If


We repeat the definition

and if

we also repeat the definition

Now we have

Theorem 4.3.1 Let X(t), t = 0, ±1, ... be a stationary r vector-valued series satisfying (4.3.6). Suppose h_a(u), −∞ < u < ∞, satisfies Assumption 4.3.1 for a = 1, ..., r. Then

The error term is uniform in λ_1, ..., λ_k.

If λ_1 + ⋯ + λ_k ≡ 0 (mod 2π), then

If λ_1 + ⋯ + λ_k ≢ 0 (mod 2π), then the cumulant will be of reduced order. Expression (4.3.9) suggests that we can base an estimate of the cumulant spectrum (4.3.7) on the d_{a_1}^{(T)}(λ_1), ..., d_{a_k}^{(T)}(λ_k) with λ_1 + ⋯ + λ_k ≡ 0 (mod 2π).

There are circumstances in which the error term of (4.3.8) is of smaller order of magnitude than o(T). Suppose, in place of (4.3.6), we have

then we can prove


Theorem 4.3.2 Let X(t), t = 0, ±1, ... be a stationary r vector-valued series satisfying (4.3.10). Suppose h_a(u), −∞ < u < ∞, satisfies Assumption 4.3.1 for a = 1, ..., r. Then

The error term is uniform in λ_1, ..., λ_k.

Qualitatively the results of Theorem 4.3.1 are the same as those of Theorem 4.3.2. However, this theorem suggests to us that decreasing the span of dependence of the series, as is the effect of expression (4.3.10) over (4.3.6), reduces the size of the asymptotic error term. Exercise 4.8.14 indicates that the error term may be further reduced by choosing the h_a(u) to have Fourier transforms rapidly falling off to 0 as |λ| increases.

The convergence factor

is of special interest. In this case the Fourier transform is

Also, from expression (4.3.2),

The function Δ^{(T)}(λ) has the properties: Δ^{(T)}(λ) = T for λ ≡ 0 (mod 2π), and Δ^{(T)}(2πs/T) = 0 for s an integer with s ≢ 0 (mod T). Also |Δ^{(T)}(λ)| ≤ 1/|sin ½λ| and so Δ^{(T)}(λ) is of reduced magnitude for λ not near a multiple of 2π. Expression (4.3.11) here takes the form

This joint cumulant has substantial magnitude for λ_1 + ⋯ + λ_k near some multiple of 2π. Note that the first term on the right side of expression (4.3.15) vanishes for λ_j = 2πs_j/T, s_j an integer, if s_1 + ⋯ + s_k ≢ 0 (mod T).

Page 115: David R. Brillinger Time Series Data Analysis and Theory 2001

94 STOCHASTIC PROPERTIES OF FINITE FOURIER TRANSFORMS

Expression (4.3.15) was developed in Brillinger and Rosenblatt (1967a); other references to this type of material include: Davis (1953), Root and Pitcher (1955), and Kawata (1960, 1966). Exercise 4.8.21 suggests that, on occasion, it may be more efficient to carry out the tapering through computations in the frequency domain.
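The reduced magnitude of the kernel away from multiples of 2π, and the faster fall-off produced by a smooth convergence factor, can be probed numerically. The following sketch is an editorial illustration, not from the text; the cosine-bell taper and the length T = 256 are arbitrary choices made for the demonstration.

```python
import numpy as np

T = 256
t = np.arange(T)
lam = 0.5                       # a frequency well away from 0 (mod 2 pi)

# untapered kernel Delta^{(T)}(lam) = sum_t exp(-i lam t)
H_boxcar = abs(np.sum(np.exp(-1j * lam * t)))

# cosine-bell taper h(t/T) = (1 - cos(2 pi t/T))/2, a smooth convergence factor
h = 0.5 * (1.0 - np.cos(2.0 * np.pi * t / T))
H_tapered = abs(np.sum(h * np.exp(-1j * lam * t)))

# the transform of the smooth taper falls off far more rapidly away from 0
print(H_tapered < H_boxcar)     # True
```

The same comparison at λ = 0 would show both kernels of order T; the gain from tapering appears only away from the multiples of 2π, in line with the remarks above.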

4.4 ASYMPTOTIC DISTRIBUTION OF THE FINITE FOURIER TRANSFORM

In the previous section we developed asymptotic expressions for the joint cumulants of the finite Fourier transforms of a stationary time series. In this section we use these expressions to develop the limiting distribution of the transform. We set c_X = EX(t) and have

Theorem 4.4.1  Let X(t), t = 0, ±1, . . . be an r vector-valued series satisfying Assumption 2.6.1. Let s_j(T) be an integer with λ_j(T) = 2πs_j(T)/T → λ_j as T → ∞ for j = 1, . . . , J. Suppose 2λ_j(T), λ_j(T) ± λ_k(T) ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Let

Then d_X^{(T)}(λ_j(T)), j = 1, . . . , J are asymptotically independent N_r^c(0, 2πT f_XX(λ_j)) variates respectively. Also if λ = 0, ±2π, . . . , d_X^{(T)}(λ) is asymptotically N_r(Tc_X, 2πT f_XX(λ)) independently of the previous variates and if λ = ±π, ±3π, . . . , d_X^{(T)}(λ) is asymptotically N_r(0, 2πT f_XX(λ)) independently of the previous variates.

In the case λ = 0,

and the theorem is seen to provide a central limit theorem for the series X(t). Other central limit theorems for stationary series are given in Rosenblatt (1956, 1961), Leonov and Shiryaev (1960), Iosifescu and Theodorescu (1969) p. 22, and Philipp (1969). The asymptotic normality of the Fourier coefficients themselves is investigated in Kawata (1965, 1966).

If the conditions of the theorem are satisfied and λ_j = λ, j = 1, . . . , J, then we see that the d_X^{(T)}(λ_j(T)), j = 1, . . . , J are approximately a sample of size J from N_r^c(0, 2πT f_XX(λ)). This last remark will prove useful later in the development of estimates of f_XX(λ) and in the suggesting of approximate distributions for a variety of statistics of interest.

Page 116: David R. Brillinger Time Series Data Analysis and Theory 2001

If the series X(t), t = 0, ±1, . . . is tapered prior to evaluating its finite Fourier transform, then an alternate form of central limit theorem is available. It is

Theorem 4.4.2  Let X(t), t = 0, ±1, . . . be an r vector-valued series satisfying Assumption 2.6.1. Suppose 2λ_j, λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Let

where h_a(t) satisfies Assumption 4.3.1, a = 1, . . . , r. Then the d_X^{(T)}(λ_j), λ_j ≢ 0 (mod 2π), j = 1, . . . , J are asymptotically independent N_r^c(0, 2πT[H_{ab}(0) f_{ab}(λ_j)]) variates. Also if λ = 0, ±2π, . . . , d_X^{(T)}(λ) is asymptotically N_r(T[c_a H_a(0)], 2πT[H_{ab}(0) f_{ab}(λ)]) independently of the previous variates and if λ = ±π, ±3π, . . . , d_X^{(T)}(λ) is asymptotically N_r(0, 2πT[H_{ab}(0) f_{ab}(λ)]) independently of the previous variates.

If the same taper h(t) is applied to each of the components of X(t), then we see that the asymptotic covariance matrix of d_X^{(T)}(λ) has the form

Under additional regularity conditions on the h_a(t), a = 1, . . . , r, we can obtain a theorem pertaining to sequences λ_j(T) of frequencies tending to limits λ_j, j = 1, . . . , J; see Brillinger (1970) and Exercise 4.8.20. The corresponding d_X^{(T)}(λ_j(T)) will be asymptotically independent provided the λ_j(T), λ_k(T) are not too near each other (mod 2π), for 1 ≤ j < k ≤ J. Exercise 4.8.23 gives the asymptotic behavior of Fourier transforms based on disjoint stretches of data.

Suppose that X(t), t = 0, ±1, . . . is a real-valued stationary series whose power spectrum f_XX(λ) is near constant, equal to σ²/(2π) say, −∞ < λ < ∞. From Theorem 4.4.1 we might expect the values d_X^{(T)}(2πs/T), s = 1, . . . , (T − 1)/2 to be approximately independent N_1^c(0, Tσ²) variates and a fortiori the values Re d_X^{(T)}(2πs/T), Im d_X^{(T)}(2πs/T), s = 1, . . . , (T − 1)/2 to be approximately independent N_1(0, Tσ²/2) variates. We turn to a partial empirical examination of this conclusion.
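Before turning to data, the conclusion can be checked on simulated white noise. The sketch below is an editorial illustration (assuming NumPy; the seed and T = 1024 are arbitrary): it compares the sample variances of the real and imaginary parts of the discrete Fourier transform with the value Tσ²/2 suggested by Theorem 4.4.1, and checks that the two parts are nearly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma = 1024, 1.0
x = rng.normal(0.0, sigma, T)        # white noise: flat spectrum sigma^2/(2 pi)
d = np.fft.fft(x)                    # d_X^{(T)}(2 pi s/T), s = 0, ..., T - 1
s = np.arange(1, T // 2)             # frequencies away from 0 (mod pi)
re, im = d[s].real, d[s].imag

# the parts should be approximately independent N(0, T sigma^2 / 2) variates
print(np.var(re) / (T * sigma**2 / 2))     # near 1
print(np.var(im) / (T * sigma**2 / 2))     # near 1
print(abs(np.corrcoef(re, im)[0, 1]))      # near 0
```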

Consider the series Y(t), t = 0, 1, . . . of mean monthly temperatures in Vienna for the period 1780–1950; this series, partially plotted in Figure 1.1.1, has a strong yearly periodic component. In an attempt to obtain a series with near constant power spectrum, we have reduced this periodic component by subtracting from each monthly value the average of the values for the same month across the whole stretch of data. Specifically we have formed the series

for j = 0, . . . , 11 and k = 0, 1, . . . . We then evaluated the Fourier transform d_X^{(T)}(2πs/T), s = 1, . . . , (T − 1)/2 taking T = 2048 = 2^11 so that the Fast Fourier Transform Algorithm could be used.

Figures 4.4.1 and 4.4.2 are normal probability plots of the values

respectively. The construction of such plots is described in Chernoff and Lieberman (1954). The estimated power spectrum of this series, given in Section 7.8, falls off slowly as λ increases and is approximately constant. If each of the variates has the same marginal normal distribution, the values should lie near straight lines. The plots obtained are essentially straight lines, with slight tailing off at the ends, suggesting that the conclusions of Theorem 4.4.1 are reasonable, at least for this series of values.

Figure 4.4.1  Normal probability plot of real part of discrete Fourier transform of seasonally adjusted Vienna mean monthly temperatures 1780–1950.

Page 118: David R. Brillinger Time Series Data Analysis and Theory 2001

4.4 ASYMPTOTIC DISTRIBUTION OF THE FINITE FOURIER TRANSFORM 97

Figure 4.4.2  Normal probability plot of imaginary part of discrete Fourier transform of seasonally adjusted Vienna mean monthly temperatures 1780–1950.
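A normal probability plot pairs the ordered sample with expected normal quantiles. The sketch below is an editorial illustration of the construction; the plotting positions (i + ½)/n are one common convention and not necessarily the exact recipe of Chernoff and Lieberman (1954).

```python
from statistics import NormalDist

def normal_probability_plot_points(values):
    """Pair the ordered sample with expected normal quantiles.

    If the values are a sample from a single normal distribution,
    the returned (quantile, ordered value) pairs lie near a straight line.
    """
    n = len(values)
    nd = NormalDist()
    ordered = sorted(values)
    quantiles = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]  # plotting positions
    return list(zip(quantiles, ordered))

# sanity check: an exactly "normal" set of quantiles lies exactly on the line
sample = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
pts = normal_probability_plot_points(sample)
print(all(abs(q - v) < 1e-9 for q, v in pts))  # True
```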

The theorems in this section may provide a justification for the remark, often made in the communications theory literature, that the output of a narrow band-pass filter is approximately Gaussian; see Rosenblatt (1961). Consider the following narrow band-pass transfer function centered at λ_0

If a series X(t), t = 0, ±1, . . . is taken as input to this filter, then expression (3.6.8) of the previous chapter indicates that the output series of the filter will be approximately

Here s is the integral part of Tλ_0/(2π) and so 2πs/T ≈ λ_0. Theorem 4.4.1 now suggests that the variate (4.4.8) is asymptotically N(0, 4πT⁻¹f_XX(λ_0)) in the case λ_0 ≢ 0 (mod π). Related references to this result are Leonov and Shiryaev (1960), Picinbono (1959), and Rosenblatt (1956c).

Page 119: David R. Brillinger Time Series Data Analysis and Theory 2001

98 STOCHASTIC PROPERTIES OF FINITE FOURIER TRANSFORMS

Exercise 4.8.23 contains the useful result that finite Fourier transforms based on successive stretches of data are asymptotically independent and identically distributed in certain circumstances.

4.5 PROBABILITY 1 BOUNDS

It is sometimes useful to have a bound on the fluctuations of a finite Fourier transform

as a function of frequency λ and sample size T. In this connection we mention

Theorem 4.5.1  Let the real-valued series X(t), t = 0, ±1, . . . satisfy Assumption 2.6.3 and have mean 0. Let h(t) satisfy Assumption 4.3.1. Let d_X^{(T)}(λ) be given by (4.5.1). Then

with probability 1.

This means that for K > 1, there is probability 1 that only finitely many of the events

occur.

We see from expression (4.5.2) that under the indicated conditions, the Fourier transform has a rate of growth at most of order (T log T)^{1/2}. If X(t) is bounded by a constant, M say, then we have the elementary inequality

giving a growth rate of order T. On the other hand, if we consider |d_X^{(T)}(λ)| for a single frequency λ, we are in the realm of the law of the iterated logarithm; see Maruyama (1949), Parthasarathy (1960), Philipp (1967), Iosifescu (1968), and Iosifescu and Theodorescu (1969). This law leads to a rate of growth of order (T log log T)^{1/2}; other results of the nature of (4.5.2) are given in Salem and Zygmund (1956), Whittle (1959), and Kahane (1968).

Page 120: David R. Brillinger Time Series Data Analysis and Theory 2001

An immediate implication of Theorem 4.5.1 is that, under the stated regularity conditions,

with probability 1 as T → ∞. In particular, taking λ = 0 we see

with probability 1 as T → ∞; this last is the strong law of large numbers. Results similar to this are given in Wiener and Wintner (1941).

Turning to the development of a different class of asymptotic results, suppose the s vector-valued series Y(t), t = 0, ±1, . . . is a filtered version of X(t), say

for some s × r matrix-valued filter {a(u)}. On occasion we will be interested in relating the finite Fourier transform of Y(t) to that of X(t). Lemma 3.4.1 indicates that if X(t), t = 0, ±1, . . . is bounded and

where

and

then there is a finite K such that

In the case that X(t), t = 0, ±1, . . . is an r vector-valued stochastic series we have

Theorem 4.5.2  Let the r vector-valued X(t), t = 0, ±1, . . . satisfy Assumption 2.6.3 and have mean 0. Let Y(t) be given by (4.5.7) where {a(u)} satisfies condition (4.5.8), then there is a finite L such that

with probability 1.

Expression (4.5.12) indicates a possible rate of growth for

to be of order (log T)^{1/2}. In Theorem 4.5.3 we will see that this rate of growth may be reduced to the order T^{−1/2}(log T)^{1/2} if the series are tapered prior to evaluating the Fourier transform.

Theorem 4.5.3  Let the r vector-valued X(t), t = 0, ±1, . . . satisfy Assumption 2.6.3 and have mean 0. Let Y(t) be given by (4.5.7) where {a(u)} satisfies (4.5.8). Let

where h(u) = 0 for |u| ≥ 1 and has a uniformly bounded derivative. Then there is a finite L such that

with probability 1 as T → ∞.

In the case that X(t), t = 0, ±1, . . . is a series of independent variates, Y(t) given by (4.5.7) is a linear process. Expressions (4.5.12) and (4.5.14) suggest how we can learn about the sampling properties of the Fourier transform of a linear process from the sampling properties of the Fourier transform of a series of independent variates. This simplification was adopted by Bartlett (1966) Section 9.2.

On certain occasions it may be of interest to have a cruder bound on the growth of sup_λ |d_X^{(T)}(λ)| when the series X(t), t = 0, ±1, . . . satisfies the weaker Assumption 2.6.1.

Theorem 4.5.4  Let the real-valued series X(t), t = 0, ±1, . . . satisfy Assumption 2.6.1 and have mean 0. Let h(t) satisfy Assumption 4.3.1 and let d_X^{(T)}(λ) be given by (4.5.1). Then for any ε > 0,
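The orders of growth discussed in this section can be illustrated by simulation. In the sketch below (an editorial illustration, with Gaussian white noise standing in for the general series; the seed and lengths are arbitrary) the maximum of |d_X^{(T)}(λ)| over the Fourier frequencies stays within a modest multiple of (T log T)^{1/2}, and the case λ = 0 exhibits the strong law of large numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

def max_transform(T):
    """max over the Fourier frequencies of |d_X^{(T)}(lambda)| for white noise."""
    x = rng.normal(0.0, 1.0, T)
    return np.max(np.abs(np.fft.fft(x)))

for T in (2**10, 2**14):
    ratio = max_transform(T) / np.sqrt(T * np.log(T))
    print(T, ratio)              # the ratio stays bounded as T grows

# taking lambda = 0: the sample mean of T values tends to 0,
# the strong law of large numbers implied by the section's bounds
x = rng.normal(0.0, 1.0, 2**16)
print(abs(np.mean(x)))           # small
```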

4.6 THE CRAMER REPRESENTATION

In Section 3.9 we developed two spectral representations for series involved in the functional approach to time series while in this section we indicate a spectral representation in the stochastic approach. The representation is due to Cramér (1942).

Page 122: David R. Brillinger Time Series Data Analysis and Theory 2001

Suppose X(t), t = 0, ±1, . . . is an r vector-valued series. Consider the tapering function

giving the finite Fourier transform

This transform will provide the basis for the representation. Set

Define

We see

if we understand

to be the period 2π extension of the Dirac delta function. We may now state

Theorem 4.6.1  Let X(t), t = 0, ±1, . . . satisfy Assumption 2.6.1. Let Z_X^{(T)}(λ), −∞ < λ < ∞, be given by (4.6.4). Then there exists Z_X(λ), −∞ < λ < ∞, such that Z_X^{(T)}(λ) tends to Z_X(λ) in mean of order ν, for any ν > 0. Also Z_X(λ + 2π) = Z_X(λ), Z_X(λ) = conj Z_X(−λ) and

for a_1, . . . , a_k = 1, . . . , r, k = 2, 3, . . . .

We may rewrite (4.6.7) in differential notation as

Expression (4.6.8) indicates that

Page 123: David R. Brillinger Time Series Data Analysis and Theory 2001

102 STOCHASTIC PROPERTIES OF FINITE FOURIER TRANSFORMS

where f_XX(λ) denotes the spectral density matrix of the series X(t). The increments of Z_X(λ) are orthogonal unless λ ≡ μ (mod 2π). Also joint cumulants of increments are negligible unless Σ_j λ_j ≡ 0 (mod 2π). The increments of Z_X(λ) mimic the behavior of d_X^{(T)}(λ) as given in Section 4.3.

In Theorem 4.6.2 we will need to consider a stochastic integral of the form

If

this integral exists when defined as

See Cramér and Leadbetter (1967) Section 5.3. We may now state the Cramér representation of the series X(t), t = 0, ±1, . . . .

Theorem 4.6.2 Under the conditions of Theorem 4.6.1

with probability 1, where Z_X(λ) satisfies the properties indicated in Theorem 4.6.1.

It is sometimes convenient to rewrite the representation (4.6.13) in a form involving variates with real-valued components. To this end set

These satisfy

and

If we make the substitutions

then from expression (4.6.8) we see that

where the summations extend over ε, η = ±1. In the case k = 2, these relations give

In differential notation the latter may be written

The Cramér representation (4.6.13) takes the form

in these new terms.

The Cramér representation is especially useful for indicating the effect of operations on series of interest. For example, consider the filtered series

where the series X(t) has Cramér representation (4.6.13). If

with

then

As an example of an application of (4.6.27) we remark that it, together with (4.6.9), gives the direct relation

of Section 2.8.

Suppose the filter is a band-pass filter with transfer function, −π < λ ≤ π,

applied to each coordinate of the series X(t), t = 0, ±1, . . . . Suppose, as we may, that the Cramér representation of X(t) is written

Then the band-pass filtered series may be written

for small Δ. The effect of band-pass filtering is seen to be the lifting, from the Cramér representation, of cosinusoids of frequency near ±ω. For small Δ this series, Y(t), is sometimes called the component of frequency ω of X(t) and is denoted by X(t, ω), suppressing the dependence on Δ. By considering a bank of exhaustive and mutually exclusive band-pass filters with transfer functions such as

j = 0, 1, . . . , J where (2J + 1)Δ = 2π, we see that a series X(t), t = 0, ±1, . . . may be thought of as the sum of its individual frequency components,

We will see, later in this work, that many useful statistical procedures have the character of elementary procedures applied to the separate frequency components of a series of interest.
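In discrete form the bank-of-filters decomposition can be imitated with the discrete Fourier transform. The sketch below is an editorial illustration (the partition into six bands and the series length are arbitrary): it splits the Fourier frequencies into disjoint bands, lifts each band to form a component series, and verifies that the components sum back to the original series.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 240
x = rng.normal(size=T)
d = np.fft.fft(x)

# exhaustive, mutually exclusive band-pass filters: partition the
# Fourier frequencies into disjoint bands and lift each band separately
edges = np.linspace(0, T, 7, dtype=int)   # 6 disjoint frequency bands
components = []
for a, b in zip(edges[:-1], edges[1:]):
    mask = np.zeros(T)
    mask[a:b] = 1.0
    components.append(np.fft.ifft(d * mask).real)

# the series is the sum of its individual frequency components
reconstruction = np.sum(components, axis=0)
print(np.allclose(reconstruction, x))     # True
```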

Let us next consider the effect of forming the Hilbert transform of each component of X(t), t = 0, ±1, . . . . The transfer function of the Hilbert transform is

If we write the Cramér representation of X(t) in the form

then we quickly see

The cosinusoids of the representation have been shifted through a phase angle of π/2. In the case of X(t, ω), the component of frequency ω in the series X(t), we see from (4.6.36) that

and so

for example. Function (4.6.38) provides us with another interpretation of the differential dZ_X(ω) appearing in the Cramér representation.

Next let us consider the covariance matrix of the 2r vector-valued series

Elementary calculations show that it is given by

in the case ω ≢ 0 (mod π) and by

in the case ω ≡ 0 (mod π). These results provide us with a useful interpretation of the real and imaginary parts of the spectral density matrix of a series of interest.

As another example of the use of the Cramér representation let us see what form a finite Fourier transform takes in terms of it. Suppose

for some tapering function h(u). By direct substitution we see that

where

From what we have seen in previous discussions of tapering, for large values of T, the function H^{(T)}(λ − α) is concentrated in the neighborhood of λ ≡ α (mod 2π). Therefore, from (4.6.43), for large values of T, d_X^{(T)}(λ) is essentially getting at dZ_X(λ). As a final remark we mention that (4.6.8) and (4.6.43) imply

exactly. The latter may usefully be compared with the asymptotic expression (4.3.8).

In fact Cramér (1942) developed the representation (4.6.13) under the conditions of Theorem 2.5.2. In this more general case, the function Z_X(λ) satisfies

where F_XX(λ) denotes the r × r matrix-valued function whose existence was indicated in Theorem 2.5.2. The integral representation now holds in an integral in mean square sense only; the proof of Theorem 3.9.1 may be modified to provide this result.

4.7 PRINCIPAL COMPONENT ANALYSIS AND ITS RELATION TO THE CRAMER REPRESENTATION

Let Y be a J vector-valued random variable with covariance matrix Σ_YY. If the components of Y are intercorrelated and J > 2 or 3, then it is often difficult to understand the essential statistical nature of Y. Consider, therefore, the problem of obtaining a variate ζ more elementary than Y, yet containing most of the statistical information in Y. We will require ζ to have the form

for some K × J matrix A with K < J. And we will formalize the requirement that ζ contains much of the statistical information in Y by requiring that it minimize

We have

Theorem 4.7.1  Let Y be a J vector-valued variate with covariance matrix Σ_YY. The K × J matrix A that minimizes (4.7.2) for ζ of the form (4.7.1) is given by

where μ_j is the jth latent root of Σ_YY. The extremal B, ζ are given by

where U_j is the jth latent vector of Σ_YY, j = 1, . . . , J. The minimum achieved is

The individual components of ζ are called the principal components of Y. They are seen to have the form ζ_j = U_j^τ Y and to satisfy

The variate ζ, therefore, has a more elementary statistical structure than Y.

The theorem has led us to consider approximating the J vector-valued Y by

and its jth component by

The error caused by replacing Y by (4.7.7) is seen from (4.7.4) to depend on the magnitude of the latent roots with j > K. If K = J then the error is 0 and expression (4.7.8) is seen to provide a representation for Y in terms of uncorrelated variates ζ_j.

Principal components will be discussed in greater detail in Chapter 9. They were introduced by Hotelling (1933). Theorem 4.7.1 is essentially due to Kramer and Mathews (1956) and Rao (1964, 1965).
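The content of Theorem 4.7.1 is easily checked numerically. The following sketch is an editorial illustration (the 4 × 4 covariance matrix and sample size are arbitrary): it forms ζ_j = U_j^τ Y from the latent vectors of Σ_YY and verifies that the principal components are essentially uncorrelated with variances equal to the latent roots.

```python
import numpy as np

rng = np.random.default_rng(3)

# a correlated 4 vector-valued variate Y with covariance matrix Sigma_YY
B = rng.normal(size=(4, 4))
Sigma = B @ B.T
Y = rng.multivariate_normal(np.zeros(4), Sigma, size=20000)

# latent roots mu_j and latent vectors U_j of Sigma_YY
mu, U = np.linalg.eigh(Sigma)          # ascending order
zeta = Y @ U                           # principal components zeta_j = U_j^T Y

C = np.cov(zeta.T)
print(np.round(C, 2))                  # near diagonal: the zeta_j are uncorrelated
print(np.allclose(np.diag(C), mu, rtol=0.1))  # variances near the latent roots
```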

We now turn to the case in which the variate Y refers to a stretch of values X(t), t = −T, . . . , T of some real-valued stationary time series. In this case

Following the discussion above, the principal components of the variate (4.7.9) will be based upon the latent vectors of the matrix (4.7.10). This matrix is finite Toeplitz and so, from Section 3.7, its latent values and vectors are approximately

and

respectively, s = −T, . . . , T. The principal components of (4.7.9) are therefore approximately

If we refer back to expression (4.6.2), then we see that (4.7.13) is d_X^{(T)}(2πs/(2T + 1)), the finite Fourier transform on which the Cramér representation was based and which we have proposed be taken as a basic statistic in computations with an observed stretch of series.
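The approximation of the latent values of the Toeplitz matrix may also be checked numerically. The sketch below is an editorial illustration (a moving-average autocovariance is used for concreteness): it compares the latent values of the autocovariance matrix of a stretch of N = 2T + 1 values with the values 2πf_XX(2πs/N).

```python
import numpy as np

# autocovariance of a moving average X(t) = e(t) + theta e(t-1), var e(t) = 1:
# c_XX(0) = 1 + theta^2, c_XX(+/-1) = theta, c_XX(u) = 0 otherwise
theta = 0.6
c = {0: 1 + theta**2, 1: theta, -1: theta}
N = 401                                # stretch of length 2T + 1 = N
C = np.array([[c.get(j - k, 0.0) for k in range(N)] for j in range(N)])

eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]

# power spectrum f_XX(lam) = (1 + theta^2 + 2 theta cos lam) / (2 pi)
lam = 2 * np.pi * np.arange(N) / N
f = (1 + theta**2 + 2 * theta * np.cos(lam)) / (2 * np.pi)
approx = np.sort(2 * np.pi * f)[::-1]

# the latent values are approximately 2 pi f_XX(2 pi s / N)
print(np.max(np.abs(eigvals - approx)))  # small
```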

Suppose X(t), t = 0, ±1, . . . has autocovariance function c_XX(u), u = 0, ±1, . . . , so

Following Theorem 4.7.1 we are led to approximate X(t) by

if

in some sense. Expression (4.7.15) is seen to be the Cramér representation of X(t). The Cramér representation therefore results from a limit of a principal component analysis of X(t), t = 0, ±1, . . . .

Craddock (1965) carried out an empirical principal component analysis of a covariance matrix resulting from a stretch of time series values. The principal components he obtained have the cosinusoidal nature of (4.7.13).

The collection of latent values of a matrix is sometimes referred to as its spectrum, which in the case of matrix (4.7.10) are seen, from (4.7.11), to equal approximately 2πf_XX(2πs/(2T + 1)), s = −T, . . . , T, where f_XX(λ) is the power spectrum of the series X(t), t = 0, ±1, . . . . We have therefore been led to an immediate relation between two different sorts of spectra.

4.8 EXERCISES

4.8.1 Let Y = U + iV where U and V are jointly multivariate normal. Let EY = μ and E(Y − μ)(Y − μ)^τ = 0. Prove that the individual components of Y are statistically independent if E(Y − μ) conj(Y − μ)^τ is diagonal.

4.8.2 If Y is N_r^c(0, Σ), prove that AY is N_s^c(0, AΣ conj(A)^τ) for any s × r matrix A. Conclude that if the entries of Y are independent N_1^c(0, σ²) variates and if A is r × r unitary, then the entries of AY are also independent N_1^c(0, σ²). Also conclude that the marginal distributions of a multivariate complex normal are complex normal.

4.8.3 If X is N_r^c(μ, Σ) and Im Σ = 0, show that Re X and Im X are statistically independent.

4.8.4 If W is distributed as W_r^c(n, Σ), prove that W_jj is distributed as Σ_jj χ²_{2n}/2.

4.8.5 If W is distributed as W_r^c(n, Σ), prove that EW = nΣ. Also prove that

E{(W_jk − nΣ_jk)(W_lm − nΣ_lm)} = nΣ_jm Σ_lk.

where the summation in expression (4.7.14) is over s corresponding to the K largest values of (4.7.11). If we take K = 2T + 1 and let T → ∞, then we would expect the value (4.7.14) to be very near X(t). In fact, (4.7.14) tends to

4.8.6 If W is distributed as W_r^c(n, Σ), prove that Σ^{−1/2}WΣ^{−1/2} is distributed as W_r^c(n, I) if Σ is nonsingular.

4.8.7 Let Y be distributed as N_n^c(μ, σ²I) and let

if j = k where (·) denotes a j × j permanent; see Goodman and Dubman (1969).

4.8.10 Let X(t), t = 0, ±1, . . . be a stationary process with finite moments and satisfying X(t + T) = X(t), t = 0, ±1, . . . for some positive integer T. (Such an X(t) is called a circular process.) Prove that

4.8.11 If X(t), t = 0, ±1, . . . is a stationary Gaussian series, prove that for k > 2

4.8.12 Let X(t), t = 0, ±1, . . . be an r vector-valued pure noise series with

where A_k is Hermitian of rank n_k. A necessary and sufficient condition for the forms conj(Y)^τ A_k Y to be distributed independently with conj(Y)^τ A_k Y distributed as (noncentral chi-squared) σ²χ′²_{2n_k}(2 conj(μ)^τ A_k μ/σ²)/2 is that n_1 + ⋯ + n_K = n; see Brillinger (1973).

4.8.8 Let W be distributed as W_r^c(n, Σ). Let it be partitioned into

and equals

with W_11 r × r and W_22 s × s. Suppose that Σ is similarly partitioned. Prove that W_22 − W_21 W_11^{−1} W_12 is distributed as

If Σ_12 = 0, prove that W_21 W_11^{−1} W_12 is distributed as W_s^c(r, Σ_22) and is independent of W_22 − W_21 W_11^{−1} W_12.

4.8.9 Let Y be N_r^c(0, Σ). Prove that

If d_a^{(T)}(λ) is given by expression (4.3.13), prove that

4.8.13 Let X(t), t = 0, ±1, . . . be an r vector-valued stationary series satisfying expression (4.3.6). Let T = min_j T_j. If d_a^{(T_a)}(λ) is given by expression (4.3.13), show that

4.8.14 Suppose the conditions of Theorem 4.3.2 are satisfied. Suppose that H_a(λ) of (4.3.3) satisfies

for some finite K where ν > 2, a = 1, . . . , r. Then prove that

4.8.15 Let X(t), t = 0, ±1, . . . be an r vector-valued series. Suppose the stretch of values X(t), t = 0, . . . , T − 1 is given. Prove that d_X^{(T)}(2πs/T), s = 0, . . . , T/2 is a sufficient statistic.

4.8.16 Under the conditions of Theorem 4.6.1, prove that Z_X(λ) is continuous in mean of order ν for any ν > 0.

4.8.17 Let Y be a J vector-valued random variable with covariance matrix Σ_YY. Determine the linear combination α^τY, with α^τα = 1, that has maximum variance.

4.8.18 Making use of Exercise 3.10.15, generalize the discussion of Section 4.7 to the case of vector-valued series.

4.8.19 Under the conditions of Theorem 4.4.2, prove that if λ ≢ 0 (mod π), then arg{d_a^{(T)}(λ)} tends in distribution to a uniform variate on the interval (0, 2π) as T → ∞. What is the distribution if λ ≡ 0 (mod π)?

4.8.20 Suppose the conditions of Theorem 4.4.2 are satisfied. Suppose H_a(λ) of (4.3.3) satisfies

for some finite K where ν > 2, a = 1, . . . , r. Suppose λ_j(T) → λ_j as T → ∞ with min_l T|λ_j(T) − 2πl|, min_l T|λ_j(T) ± λ_k(T) − 2πl| → ∞ as T → ∞, 1 ≤ j < k ≤ J. Prove that the conclusions of Theorem 4.4.2 apply to d_X^{(T)}(λ_j(T)), j = 1, . . . , J.

4.8.21 Let d_X^{(T)}(λ) = Σ_{t=0}^{T−1} X(t) exp{−iλt}. Show that

where H_a^{(T)}(λ) is given by (4.3.2).

4.8.22 Let X(t), t = 0, . . . , T − 1 be a sequence of independent normal variates with mean 0 and variance σ². Let

(a) Show that d_X^{(T)}(2πs/T) is distributed as N_1^c(0, Tσ²) for s an integer with 2πs/T ≢ 0 (mod π).
(b) Indicate the distribution of

(c) Indicate the distribution of arg

4.8.23 Let X(t), t = 0, ±1, . . . be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1. Let

for −∞ < λ < ∞; l = 0, . . . , L − 1; a = 1, . . . , r. Show that d_X^{(V)}(λ, l) = [d_a^{(V)}(λ, l)], l = 0, . . . , L − 1 are asymptotically independent

variates if λ ≢ 0 (mod π) and asymptotically N_r(0, 2πV[H_{ab}(0) f_{ab}(λ)]) variates if λ = ±π, ±3π, . . . , as V → ∞. Hint: This result follows directly from Theorem 4.4.2 with X(t) and the h_a(u) suitably redefined.

4.8.24 If X is N_r^c(0, Σ) and A is r × r Hermitian, show that conj(X)^τAX is distributed as Σ_{j=1}^{r} μ_j(u_j² + v_j²) where μ_1, . . . , μ_r are the latent values of

and u_1, . . . , u_r, v_1, . . . , v_r are independent N(0, 1) variates.

4.8.25 Let X_1, . . . , X_n be independent N_r^c(μ, Σ) variates. Show that μ̂ = Σ_j X_j/n and Σ̂ = Σ_j (X_j − μ̂) conj(X_j − μ̂)^τ/n are the maximum likelihood estimates of μ and Σ; see Giri (1965).

4.8.26 Show that a W_r^c(n, Σ) variate with Im Σ = 0 may be represented as ½{W_11 + W_22 + i(W_12 − W_21)} where

Conclude that the real part of such a complex Wishart is distributed as ½W_r(2n, Σ).

4.8.27 Use the density function (4.2.9) to show that W = W_r^c(n, I) may be represented as (X + iY) conj(X + iY)^τ where X, Y are lower triangular, X_jk, Y_jk, 1 ≤ k < j ≤ r are independent N_1(0, 1) variates and X_jj², Y_jj² are independent χ²_{n−j+1} variates.

4.8.28 Under the conditions of Exercise 4.8.25, show that δ² = conj(μ̂)^τΣ̂^{−1}μ̂ is distributed as χ′²_{2r}(2n conj(μ)^τΣ^{−1}μ)/χ²_{2(n−r)} where χ′²_{2r}(δ) denotes a noncentral chi-squared variate with 2r degrees of freedom and noncentrality parameter δ and χ²_{2(n−r)} denotes an independent central chi-squared with 2(n − r) degrees of freedom; see Giri (1965).

4.8.29 Under the conditions of Theorem 4.4.2, show that the asymptotic distribution of d_X^{(T)}(λ) is unaffected by the omission of any finite number of the X(t).

4.8.30 Let W be distributed as W_r^c(n, Σ). Show that

where the summation is over all permutations, P, of the set {1, 2, . . . , k} with the property that P leaves no proper subset of {1, 2, . . . , k} invariant. Show that the number of such permutations is (k − 1)!

4.8.31 Let W be distributed as W_r(n, Σ). Show that

{all possible interchanges a_j ↔ b_j for j = 1, . . . , k − 1}

where P is as in the previous exercise. Show that the number of terms summed in all is 2^{k−1}(k − 1)!

4.8.32 Let W be distributed as W_2(n, Σ). Show that the density function of x = W_12 is given by

where K_ν is the modified Bessel function of the second kind and order ν; see Pearson et al. (1929) and Wishart and Bartlett (1932).

4.8.33 Let W be distributed as

(a) Show that the density of x = W_12 (with respect to Re x, Im x) is given by

(b) Show that the density of y = Re W_12 is given by

where α = Re ρ, β = Im ρ.
(c) Show that the density of z = Im W_12 is given by

(d) Show that the density of φ = arg W_12 is given by

where φ_0 = arg ρ.
(e) Show that the density of w = |W_12| is given by

where I_0 is the modified Bessel function of the first kind and order 0.

All of the densities of this exercise were derived in Goodman (1957).

4.8.34 Let X(t), t = 0, ±1, . . . be a stationary Gaussian series with mean 0 and power spectrum f_XX(λ), −∞ < λ < ∞. Show that the Cramér representation may be written

where B(λ), 0 ≤ λ ≤ π, is a complex Brownian motion process satisfying cov{B(λ), B(μ)} = min{λ, μ} and B(−λ) = conj B(λ).

4.8.35 Suppose the series Y(t), t = 0, ±1, . . . is given by (2.9.15). Show that it has the form

in terms of the Cramér representation of the series X(t), t = 0, ±1, . . . .

4.8.36 (a) Let W be W_r(n, Σ) and α, β, γ, δ be r vectors. Show that

(b) Let W be W_r^c(n, Σ) and α, β, γ, δ complex r vectors. Show that

4.8.37 Let X(t), t = 0, ±1, . . . be a 0 mean, real-valued series satisfying Assumption 2.6.1. Let u be a non-negative integer. Then Theorem 2.9.1 indicates that the series Y(t) = X(t + u)X(t) also satisfies Assumption 2.6.1. Use this and Theorem 4.4.1 to show that

is asymptotically normal with mean c_XX(u) and variance

5

THE ESTIMATION OF POWER SPECTRA

5.1 POWER SPECTRA AND THEIR INTERPRETATION

Let X(t), t = 0, ±1, . . . be a real-valued time series with mean function

and autocovariance function

Suppose that the autocovariance function satisfies

then the power spectrum of the series X(t), t = 0, ±1, . . . at frequency λ is defined to be the Fourier transform

We have seen in Section 2.5 that this power spectrum is non-negative, even, and of period 2π with respect to λ. This evenness and periodicity means that we may take the interval [0, π] as the fundamental domain of definition of f_XX(λ) if we wish.

Under the condition (5.1.3), f_XX(λ) is a bounded, uniformly continuous function. Also the relation (5.1.4) may be inverted and the autocovariance function c_XX(u) expressed as

In particular setting u = 0 gives

In Sections 2.8 and 4.6 we saw that the power spectrum transforms in an elementary manner if the series is filtered in a linear time invariant way. Specifically, suppose the series Y(t), t = 0, ±1, . . . results when the series X(t), t = 0, ±1, . . . is passed through a filter having transfer function A(λ), −∞ < λ < ∞. Then, from Example 2.8.1, the power spectrum of the series Y(t), t = 0, ±1, . . . is given by

We may use expressions (5.1.6) and (5.1.7) to see that

Expression (5.1.8) suggests one possible means of interpreting the power spectrum. Suppose we take, for −π < α ≤ π and Δ small,

and then extend A(α) outside of the interval (−π, π] periodically. This transfer function corresponds to a filter proportional to a band-pass filter; see Section 2.7. The output series Y(t), t = 0, ±1, . . . is therefore proportional to X(t, λ), the component of frequency λ in the series X(t), t = 0, ±1, . . . ; see Section 4.6. From expressions (5.1.8) and (5.1.9) we now see that

This means that/A-A-(X) may be interpreted as proportional to the variance ofX(t,\), the component of frequency X in the series AX/), t = 0, ±1, . . . . In-cidentally, we remark that

This equals 0 if X is farther than A from 0, ±27r,. . ., and so


Now if Y(t) is taken to be the voltage applied across the terminals of the simple electric circuit of Figure 5.1.1 containing a resistance of R = 1 ohm, then the instantaneous power dissipated is Y(t)². An examination of (5.1.12) now indicates that f_XX(λ) may be interpreted as the expected amount of power dissipated in a certain electric circuit by the component in X(t) of frequency λ. This example is the reason that f_XX(λ) is often referred to as a "power" spectrum.

Figure 5.1.1 An elementary electric circuit with voltage Y(t) applied at time t.

Roberts and Bishop (1965) have discussed a simple vibratory system for illustrating the value of the power spectrum. It consists of a cylindrical brass tube with a jet of air blown across an open end; this input may be thought of as X(t). The output signal is the pressure at the closed end of the tube, while the transfer function of this system is sketched in Figure 5.1.2. The peaks in the figure occur at frequencies

where l = length of the tube, c = velocity of sound, and n = 1, 2, .... The output of this system will have pressure proportional to

Figure 5.1.2 Approximate form of transfer function of system consisting of brass tube with air blown across one end.


where the λ_n are given by (5.1.13). A microphone at the closed end allows one to hear the output.

We conclude this section by presenting some examples of autocovariance functions and the corresponding power spectra, which are given in Figure 5.1.3. For example, if c_XX(u) is concentrated near u = 0, then f_XX(λ) is near constant. If c_XX(u) falls off slowly as u increases, then f_XX(λ) is concentrated near λ = 0, ±2π, .... If c_XX(u) oscillates about 0 as u increases, then f_XX(λ) has substantial mass away from λ = 0 (mod 2π).

We now turn to the development of an estimate of f_XX(λ), −∞ < λ < ∞, and a variety of statistical properties of the proposed estimate. For additional discussion the reader may wish to consult certain of the following

Figure 5.1.3 Selected autocovariances, c_XX(u), and associated power spectra, f_XX(λ).


review papers concerning the estimation of power spectra: Tukey (1959a,b), Jenkins (1961), Parzen (1961), Priestley (1962a), Bingham et al. (1967), and Cooley et al. (1970).

5.2 THE PERIODOGRAM

Suppose X(t), t = 0, ±1, ... is a stationary series with mean function c_X and power spectrum f_XX(λ), −∞ < λ < ∞. Suppose also that the values X(0), ..., X(T − 1) are available and we are interested in estimating f_XX(λ). Then we first compute the finite Fourier transform

d_X^{(T)}(λ) = Σ_{t=0}^{T−1} X(t) exp(−iλt).

These distributions suggest a consideration of the statistic

I_XX^{(T)}(λ) = (2πT)⁻¹ |d_X^{(T)}(λ)|²   (5.2.3)

as an estimate of f_XX(λ) in the case λ ≢ 0 (mod 2π).

The statistic I_XX^{(T)}(λ) of (5.2.3) is called the second-order periodogram, or more briefly the periodogram, of the values X(0), ..., X(T − 1). It was introduced by Schuster (1898) as a tool for the identification of hidden periodicities because in the case

I_XX^{(T)}(λ) has peaks at the frequencies λ ≡ ±ω_j (mod 2π). We note that I_XX^{(T)}(λ), given by (5.2.3), has the same symmetry, non-negativity, and periodicity properties as f_XX(λ).

Figure 5.2.1 is a plot of monthly rainfall in England for the period 1920 to 1930; the finite Fourier transform, d_X^{(T)}(λ), of values for 1789-1959 was calculated using the Fast Fourier Transform Algorithm. The periodogram I_XX^{(T)}(λ) was then calculated and is given as Figure 5.2.2. It is seen to be a rather irregular function of λ. This irregularity is also apparent in Figures

Following Theorem 4.4.2, this variate is asymptotically


5.2.3 and 5.2.4, which are the lower and upper 100 periodogram ordinates of the series of mean monthly sunspot numbers (see Figure 1.1.5 for mean annual numbers). Other examples of periodograms are given in Wold (1965). In each of these examples I_XX^{(T)}(λ) is a very irregular function of λ despite the fact that f_XX(λ) is probably a regular function of λ. It appears that I_XX^{(T)}(λ) is an inefficient estimate of f_XX(λ) and so we turn to a consideration of alternate estimates. First, we will present some theorems relating to the statistical behavior of I_XX^{(T)}(λ) in an attempt to understand the source of the irregularity and so construct better estimates.
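The periodogram at the frequencies 2πs/T is conveniently computed with the Fast Fourier Transform. A minimal sketch (the cosine-plus-noise series and all names are illustrative, not from the text): for a series with a hidden periodicity the periodogram peaks at that frequency, while the remaining ordinates fluctuate irregularly, much as in Figures 5.2.2 to 5.2.4.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1024
t = np.arange(T)
omega = 2 * np.pi * 100 / T                  # hidden periodicity at s = 100
x = np.cos(omega * t) + 0.5 * rng.standard_normal(T)

# Finite Fourier transform and periodogram I_XX(lam) = |d(lam)|^2 / (2*pi*T)
d = np.fft.fft(x)
I = np.abs(d) ** 2 / (2 * np.pi * T)

s_peak = int(np.argmax(I[1 : T // 2])) + 1   # search s = 1, ..., T/2 - 1
print(s_peak)
```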

First consider the expected value of the periodogram. We have

Figure 5.2.2 Periodogram of composite rainfall series of England and Wales for the years 1789-1959. (Logarithmic plot.)

Figure 5.2.1 Composite index of rainfall for England and Wales for the years 1920-1930.


Figure 5.2.3 Low frequency portion of log₁₀ periodogram of monthly mean sunspot numbers for the years 1750-1965.

Figure 5.2.4 High frequency portion of log₁₀ periodogram of monthly mean sunspot numbers for the years 1750-1965.

Theorem 5.2.1 Let X(t), t = 0, ±1, ... be a time series with EX(t) = c_X, cov{X(t + u), X(t)} = c_XX(u), t, u = 0, ±1, .... Suppose

Σ_u |c_XX(u)| < ∞;

then


In the case that λ ≢ 0 (mod 2π), the final term in (5.2.6) is reduced in size and we see that EI_XX^{(T)}(λ) is essentially a weighted average of the power spectrum of interest with weight concentrated in the neighborhood of λ. In the limit we have

Corollary 5.2.1 Under the conditions of the theorem, I_XX^{(T)}(λ) is an asymptotically unbiased estimate of f_XX(λ) if λ ≢ 0 (mod 2π).

The next theorem gives a bound for the asymptotic bias of I_XX^{(T)}(λ).

Theorem 5.2.2 Under the conditions of Theorem 5.2.1 and if

Σ_u |u| |c_XX(u)| < ∞,

we have

The O(T⁻¹) term is uniform in λ.

We remark that in the case λ = 2πs/T, s an integer ≢ 0 (mod T), the second term on the right side of expressions (5.2.6) and (5.2.8) drops out, leading to useful simple results. Now a consideration of I_XX^{(T)}(λ) only for frequencies of the form 2πs/T, s an integer ≢ 0 (mod T), amounts to a consideration of the periodogram of the sample values with their mean

removed, because we have the identity

for s ≢ 0 (mod T). In view of the fact that the basic definition of a power spectrum is based on covariances and so is mean invariant, the restricted consideration of I_XX^{(T)}(2πs/T), s an integer ≢ 0 (mod T), seems reasonable. We will return to this case in Theorem 5.2.4 below.

We have seen in Sections 3.3 and 4.6 that advantages result from tapering observed values prior to computing their Fourier transform. We now turn to the construction of a modified periodogram that is appropriate for a series of tapered values. Suppose that we have formed

d_X^{(T)}(λ) = Σ_{t=0}^{T−1} h(t/T) X(t) exp(−iλt)


for some taper h(u) satisfying Assumption 4.3.1. Then Theorem 4.4.2 suggests that the distribution of d_X^{(T)}(λ) may be approximated by

in the case λ ≢ 0 (mod π). This suggests that we might consider the statistic

I_XX^{(T)}(λ) = |d_X^{(T)}(λ)|² / (2π Σ_{t=0}^{T−1} h(t/T)²)   (5.2.13)

as an estimate of f_XX(λ) in the tapered case. We have replaced T ∫ h(t)² dt by the sum of the squares of the taper coefficients as this is easily computed. Suppose we set

and

If it is possible to apply the Poisson summation formula, then these two are connected by

and H^{(T)}(λ) is seen to have substantial magnitude for large T only if λ ≡ 0 (mod 2π). This observation will help us in interpreting expression (5.2.17). We can now state

Theorem 5.2.3 Let X(t), t = 0, ±1, ... be a real-valued series satisfying the conditions of Theorem 5.2.1. Let h(u) satisfy Assumption 4.3.1. Let I_XX^{(T)}(λ) be given by (5.2.13). Then

In the case that λ ≢ 0 (mod 2π), the final term in (5.2.17) will be of reduced magnitude. The first term on the right side of (5.2.17) is seen to be a weighted average of the power spectrum of interest with weight concentrated in the neighborhood of λ and relative weight determined by the taper. This expression is usefully compared with expression (5.2.6) corresponding


to the nontapered case. If f_XX(α) has a substantial peak for α in the neighborhood of λ, then the expected value of I_XX^{(T)}(λ), given by (5.2.6) or (5.2.17), can differ quite substantially from f_XX(λ). The advantage of employing a taper is now apparent. It can be taken to have a shape to reduce the effect of neighboring peaks.
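The leakage-reduction effect of tapering can be seen numerically. The sketch below is a hypothetical illustration (the Hanning-type taper and the test frequencies are assumed, not taken from the text): it compares the ordinary and tapered periodograms of a cosine whose frequency falls between Fourier frequencies, where leakage is worst.

```python
import numpy as np

T = 256
t = np.arange(T)
lam0 = 2 * np.pi * 20.5 / T            # falls between Fourier frequencies
x = np.cos(lam0 * t)

# Hanning-type taper h(t/T); an assumed, standard choice of taper
h = 0.5 * (1 - np.cos(2 * np.pi * (t + 0.5) / T))

def tapered_periodogram(y, taper):
    # (5.2.13)-style statistic: |sum_t h(t/T) X(t) e^{-i lam t}|^2
    # divided by 2*pi times the sum of squared taper coefficients
    d = np.fft.fft(taper * y)
    return np.abs(d) ** 2 / (2 * np.pi * np.sum(taper ** 2))

I_plain = tapered_periodogram(x, np.ones(T))   # no taper
I_taper = tapered_periodogram(x, h)

# Leakage well away from the peak: the taper suppresses it drastically
far = slice(60, 120)
leak_plain = I_plain[far].max()
leak_taper = I_taper[far].max()
print(leak_taper < leak_plain)
```

Both versions place the peak near the true frequency; the difference is in the mass leaked to distant frequencies, which is exactly the effect discussed above.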

Continuing our investigation of the statistical properties of the periodogram as an estimate of the power spectrum, we find that Theorem 5.2.4 describes the covariance structure of I_XX^{(T)}(λ) in the nontapered case when λ is of the special form 2πs/T, s an integer.

Theorem 5.2.4 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let I_XX^{(T)}(λ) be given by expression (5.2.3). Let r, s be integers with r, s, r ± s ≢ 0 (mod T). Let μ = 2πr/T, λ = 2πs/T. Then

Given ε > 0, the O(T⁻¹) term is uniform in λ, μ deviating from all multiples of 2π by at least ε.

The O(T⁻¹) terms are uniform in λ, μ of the indicated form.

In connection with the conditions of this theorem, we remark that I_XX^{(T)}(2πr/T) = I_XX^{(T)}(2πs/T) if r + s or r − s ≡ 0 (mod T), so the estimates are then identical.

This theorem has a crucial implication for statistical practice. It suggests that no matter how large T is taken, the variance of I_XX^{(T)}(λ) will tend to remain at the level f_XX(λ)². If an estimate with a variance smaller than this is desired, it is not to be obtained by simply increasing the sample length and continuing to use the periodogram. The theorem also suggests a reason for the irregularity of Figures 5.2.2 to 5.2.4, namely, adjacent periodogram ordinates are seen to have small covariance relative to their variances. In fact we will see in Theorem 5.2.6 that distinct periodogram ordinates are asymptotically independent.

Theorem 5.2.5 describes the asymptotic covariance structure of the periodogram when λ is not necessarily of the form 2πs/T.

Theorem 5.2.5 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let I_XX^{(T)}(λ) be given by expression (5.2.3). Suppose λ, μ ≢ 0 (mod 2π). Then


for −∞ < λ < ∞. Then I_XX^{(T)}(λ_j^{(T)}), j = 1, ..., J, are asymptotically independent f_XX(λ_j)χ₂²/2 variates. Also if λ = ±π, ±3π, ..., I_XX^{(T)}(λ) is asymptotically f_XX(λ)χ₁², independently of the previous variates.

In Theorem 5.2.6, χ_ν² denotes a chi-squared variate with ν degrees of freedom. The particular case of χ₂²/2 is an exponential variate with mean 1.

A practical implication of the theorem is that it may prove reasonable to approximate the distribution of a periodogram ordinate, I_XX^{(T)}(λ), by a multiple of a χ₂² variate. Some empirical evidence for this assertion is provided by Figure 5.2.5, which is a two degree of freedom chi-squared probability plot of the values I_XX^{(T)}(2πs/T), s = T/4, ..., T/2, for the series of mean monthly sunspot numbers. We have chosen these particular values of s because Figures 5.2.4 and 5.4.3 suggest that f_XX(λ) is approximately constant for the corresponding frequency interval. If the values graphed in a two degree of freedom chi-squared probability plot actually have a distribution that is a multiple of χ₂², then the points plotted should tend to fall along a straight line. There is substantial evidence of this happening in Figure 5.2.5. Such plots are described in Wilk et al. (1962).
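A small simulation consistent with Theorem 5.2.6 (illustrative only; white Gaussian noise is the assumed series): the scaled periodogram ordinates should behave like χ₂²/2, that is, exponential with mean 1, and their ordered values should line up with exponential quantiles much as the points in Figure 5.2.5 fall along a straight line.

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma2 = 4096, 2.0
x = np.sqrt(sigma2) * rng.standard_normal(T)   # white noise, f_XX = sigma2/(2*pi)

I = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * T)
ords = I[1 : T // 2]                  # ordinates at 2*pi*s/T, s = 1, ..., T/2 - 1

scaled = ords / (sigma2 / (2 * np.pi))   # approximately chi_2^2/2 = Exp(1)

# Crude analogue of the probability plot: empirical vs exponential quantiles
p = (np.arange(1, ords.size + 1) - 0.5) / ords.size
q_exp = -np.log1p(-p)
corr = float(np.corrcoef(np.sort(scaled), q_exp)[0, 1])
print(round(float(scaled.mean()), 3), round(corr, 3))
```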

Theorem 5.2.6 reinforces the suggestion, made in the discussion of Theorem 5.2.4, that the periodogram might prove an ineffective estimate of the power spectrum. For large T its distribution is approximately that of a multiple of a chi-squared variate with two degrees of freedom and hence is very unstable. In Section 5.4 we will turn to the problem of constructing estimates that are reasonably stable.


We remark that expression (5.2.20) is more informative than (5.2.19) in that it indicates the transition of cov{I_XX^{(T)}(λ), I_XX^{(T)}(μ)} into var I_XX^{(T)}(λ) as μ → λ. It also suggests the reason for the reduced covariance in the case that λ, μ have the particular forms 2πs/T, 2πr/T with s, r integers.

We now complete our investigation of the elementary asymptotic properties of the periodogram by indicating its asymptotic distribution under regularity conditions. Theorem 4.4.1 indicated the asymptotic normality of d_X^{(T)}(λ) for λ of the form 2πs/T, s an integer. An immediate application of this theorem gives

Theorem 5.2.6 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let s_j(T) be an integer with λ_j^{(T)} = 2πs_j(T)/T tending to λ_j as T → ∞ for j = 1, ..., J. Suppose 2λ_j^{(T)}, λ_j^{(T)} ± λ_k^{(T)} ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J and T = 1, 2, .... Let


Figure 5.2.5 χ₂² probability plot of the upper 500 periodogram ordinates of monthly mean sunspot numbers for the years 1750-1965.

The mean and variance of the asymptotic distribution of I_XX^{(T)}(2πs/T) are seen to be consistent with the large sample mean and variance of I_XX^{(T)}(2πs/T) given by expressions (5.2.8) and (5.2.18), respectively.

Theorem 5.2.6 does not describe the asymptotic distribution of I_XX^{(T)}(λ) when λ ≡ 0 (mod 2π). Theorem 4.4.1 indicates that the asymptotic distribution is f_XX(λ)χ₁² when EX(t) = c_X = 0. In the case that c_X ≠ 0, Theorem 4.4.1 suggests approximating the large sample distribution by f_XX(λ)χ'₁², where χ'₁² denotes a noncentral chi-squared variate with one degree of freedom and noncentrality parameter |c_X|²T/(2πf_XX(λ)).

Turning to the tapered case we have

Theorem 5.2.7 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Suppose 2λ_j, λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Let h(u) satisfy Assumption 4.3.1. Let


for −∞ < λ < ∞. Then I_XX^{(T)}(λ_j), j = 1, ..., J, are asymptotically independent f_XX(λ_j)χ₂²/2 variates. Also if λ = ±π, ±3π, ..., I_XX^{(T)}(λ) is asymptotically f_XX(λ)χ₁², independently of the previous variates.

With the definition and limiting procedure adopted, the limiting distribution of I_XX^{(T)}(λ) is the same whether or not the series has been tapered. The hope is, however, that in large samples the tapered estimate will have less bias. A result of extending Theorem 5.2.5 to tapered values in the case of 0 mean is

Theorem 5.2.8 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1) and having mean 0. Let h(u) satisfy Assumption 4.3.1 and let I_XX^{(T)}(λ) be given by (5.2.22). Then

5.3 FURTHER ASPECTS OF THE PERIODOGRAM

The power spectrum, f_XX(λ), of the series X(t), t = 0, ±1, ... was defined by

Here

The extent of dependence of I_XX^{(T)}(λ) and I_XX^{(T)}(μ) is seen to fall off as the function H₂^{(T)} falls off.

Bartlett (1950, 1966) developed expressions for the mean and covariance of the periodogram under regularity conditions; he also suggested approximating its distribution by a multiple of a chi-squared with two degrees of freedom. Other references to the material of this section include: Slutsky (1934), Grenander and Rosenblatt (1957), Kawata (1959), Hannan (1960), Akaike (1962b), Walker (1965), and Olshen (1967).

for −∞ < λ, μ < ∞. The error term is uniform in λ, μ.


where c_XX(u), u = 0, ±1, ... is the autocovariance function of the series. This suggests an alternate means of estimating f_XX(λ). We could first estimate c_XX(u) by an expression of the form

where

and then, taking note of (5.3.1), estimate f_XX(λ) by

If we substitute expression (5.3.2) into (5.3.4), we see that this estimate takes the form

that is, the periodogram of the deviations of the observed values from their mean. We noted in the discussion of Theorem 5.2.2 that

for λ of the form 2πs/T, s ≢ 0 (mod T), and so Theorems 5.2.4 and 5.2.6 in fact relate to estimates of the form (5.3.5).
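This mean invariance can be verified directly: at the frequencies 2πs/T with s ≢ 0 (mod T), subtracting a constant (in particular the sample mean) leaves the periodogram unchanged, since Σ_t exp(−2πist/T) = 0 for such s. A small numerical check (illustrative names and values):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 128
x = 5.0 + rng.standard_normal(T)      # a series with a nonzero mean

I_raw = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * T)
I_dev = np.abs(np.fft.fft(x - x.mean())) ** 2 / (2 * np.pi * T)

# Identical for s = 1, ..., T-1; only the s = 0 ordinate is affected
same = bool(np.allclose(I_raw[1:], I_dev[1:]))
print(same, I_raw[0] > I_dev[0])
```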

In the tapered case where

we see directly that

suggesting the consideration of the 0 mean statistic

where


in the tapered case. The expected value of this statistic is indicated in Exercise 5.13.22.

We have therefore been led to consider the Fourier transform

based on mean-corrected values. We remark that, in terms of the Cramér representation of Section 4.6, this last may be written

showing the reduction of frequency components for λ near 0, ±2π, .... In the light of this discussion it now seems appropriate to base spectral estimates on the modified periodogram

Turning to another aspect of the periodogram, we have seen that the periodogram ordinates I_XX^{(T)}(λ_j), j = 1, ..., J, are asymptotically independent for distinct λ_j, j = 1, ..., J. In Theorem 5.3.1 we will see that periodogram ordinates of the same frequency, but based on different stretches of data, are also asymptotically independent.

Theorem 5.3.1 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let h(u) satisfy Assumption 4.3.1 and vanish for u < 0. Let

for −∞ < λ < ∞, l = 0, ..., L − 1. Then, as T → ∞, the periodogram ordinates of the L stretches, l = 0, ..., L − 1, are asymptotically independent f_XX(λ)χ₂²/2 variates if λ ≢ 0 (mod π), and asymptotically independent f_XX(λ)χ₁² variates if λ = ±π, ±3π, ....

This result will suggest a useful means of constructing spectral estimates later. It is interesting to note that we can obtain asymptotically independent periodogram values either by splitting the data into separate segments, as we do here, or by evaluating them at neighboring frequencies, as in Theorem 5.2.7.
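Theorem 5.3.1 underlies segment averaging as a route to stable estimates (the Bartlett-style construction; the sketch below illustrates the idea and is not the book's notation): split the record into L disjoint stretches, form a periodogram on each, and average over stretches.

```python
import numpy as np

rng = np.random.default_rng(5)
L, V = 16, 256                       # L disjoint stretches, each of length V
x = rng.standard_normal(L * V)       # white noise, f_XX = 1/(2*pi)

segs = x.reshape(L, V)
I_seg = np.abs(np.fft.fft(segs, axis=1)) ** 2 / (2 * np.pi * V)
f_bar = I_seg.mean(axis=0)           # average over (near-)independent stretches

f_true = 1.0 / (2 * np.pi)
err_single = np.abs(I_seg[0, 1 : V // 2] - f_true).mean()
err_avg = np.abs(f_bar[1 : V // 2] - f_true).mean()
print(err_avg < err_single)
```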

We conclude this section by indicating several probability 1 results relating to the periodogram. We begin by giving an almost sure bound for I_XX^{(T)}(λ) as a function of λ and T.


Theorem 5.3.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.3 and having mean 0. Let h(u) satisfy Assumption 4.3.1. Let

Then

with probability 1.

In words, the rate of growth of the periodogram is at most of order log T, uniformly in λ, under the indicated conditions. A practical implication of this is that the maximum deviation of I_XX^{(T)}(λ) from f_XX(λ), as a function of λ, becomes arbitrarily large as T → ∞. This is yet another indication of the fact that I_XX^{(T)}(λ) is often an inappropriate estimate of f_XX(λ).

We now briefly investigate the effect of a linear time invariant operation on the periodogram. Suppose

for some filter {a(u)} satisfying

and having transfer function A(λ). Theorem 4.5.2 indicated that under regularity conditions

almost surely. Elementary algebra then indicates that

with probability 1, the error term being uniform in λ. In words, the effect of filtering on a periodogram is, approximately, multiplication by the modulus squared of the transfer function of the filter. This parallels the effect of filtering on the power spectrum as given in expression (5.1.7).

5.4 THE SMOOTHED PERIODOGRAM

In this section we make our first serious proposal for an estimate of the power spectrum. The discussion following Theorem 5.2.4 indicated that a critical disadvantage of the periodogram as an estimate of the power spectrum, f_XX(λ), was that its variance was approximately f_XX(λ)², under reasonable regularity conditions, even when based on a lengthy stretch of data. On many occasions we require an estimate of greater precision than this and feel that it must exist. In fact, Theorem 5.2.6 suggests a means of constructing an improved estimate.

Suppose s(T) is an integer with 2πs(T)/T near λ. Then Theorem 5.2.6 indicates that the (2m + 1) adjacent periodogram ordinates I_XX^{(T)}(2π[s(T) + j]/T), j = 0, ±1, ..., ±m, are approximately independent f_XX(λ)χ₂²/2 variates, if 2[s(T) + j] ≢ 0 (mod T), j = 0, ±1, ..., ±m. These values may therefore provide (2m + 1) approximately independent estimates of f_XX(λ), which suggests an estimate having the form

f_XX^{(T)}(λ) = (2m + 1)⁻¹ Σ_{j=−m}^{m} I_XX^{(T)}(2π[s(T) + j]/T),   (5.4.1)

that is, a simple average of the periodogram ordinates in the neighborhood of λ. A further examination of Theorem 5.2.6 suggests the consideration of the estimate

if λ = 0, ±2π, ±4π, ..., or if λ = ±π, ±3π, ... and T is even, and

if λ = ±π, ±3π, ... and T is odd.

The estimate given by expressions (5.4.1) to (5.4.3) is seen to have the same non-negativity, periodicity, and symmetry properties as f_XX(λ) itself. It is based on the values d_X^{(T)}(2πs/T), s = 0, ..., T − 1, and so may be rapidly computed by the Fast Fourier Transform Algorithm if T happens to be highly composite. We will investigate its statistical properties shortly.
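A sketch of the estimate (5.4.1) (illustrative; white noise is used so that the true spectrum is the constant σ²/(2π), and the helper name is assumed), showing the variance reduction by roughly a factor 2m + 1:

```python
import numpy as np

rng = np.random.default_rng(4)
T, m = 4096, 10
x = rng.standard_normal(T)                     # white noise, f_XX = 1/(2*pi)

I = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * T)

def smoothed(I, s, m):
    # (5.4.1)-style estimate: mean of the 2m+1 ordinates nearest 2*pi*s/T
    j = (s + np.arange(-m, m + 1)) % I.size
    return I[j].mean()

f_true = 1.0 / (2 * np.pi)
s_grid = np.arange(50, T // 2 - 50)
f_hat = np.array([smoothed(I, s, m) for s in s_grid])

# Smoothing cuts the variance by roughly a factor of 2m + 1 = 21
ratio = I[s_grid].var() / f_hat.var()
print(round(float(ratio), 1))
```

Repeating the computation for a succession of m values reproduces the trade-off seen in Figure 5.4.3: larger m gives smoother, stabler curves at the cost of bias near peaks.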

In preparation for Theorem 5.4.1 set

the Fejér kernel of Section 3.3. Then set


and set

Figure 5.4.1 Plot of the kernel A_T^m(λ) for T = 11 and m = 0, 1, 2, 3.

for


Figure 5.4.2 Plot of the kernel B_T^m(λ) for T = 11 and m = 1, 2, 3.

Taking note of the properties of F_T(λ), indicated in Section 3.3, we see that A_T^m(λ), B_T^m(λ), and C_T^m(λ) are non-negative, have unit integral over the interval (−π, π), and have period 2π. They are concentrated principally in the interval (−2πm/T, 2πm/T) for −π < λ < π. A_T^m(λ) is plotted in Figure 5.4.1 for the values T = 11, m = 0, 1, 2, 3. It is seen to have an approximate rectangular shape, as was to be expected from the definition (5.4.5). B_T^m(λ) is plotted in Figure 5.4.2 for the values T = 11, m = 1, 2, 3. It is seen to have a shape similar to that of A_T^m(λ) except that in the immediate neighborhood of the origin it is near 0.

Turning to an investigation of the expected value of f_XX^{(T)}(λ) we have

Theorem 5.4.1 Let X(t), t = 0, ±1, ... be a real-valued series with EX(t) = c_X and cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose


Let f_XX^{(T)}(λ) be given by (5.4.1) to (5.4.3). Then

The expected value of f_XX^{(T)}(λ) is a weighted average of the power spectrum of interest, f_XX(α), with weight concentrated in a band of width 4πm/T about λ in the case λ ≢ 0 (mod 2π). In the case λ ≡ 0 (mod 2π), Ef_XX^{(T)}(λ) remains a weighted average of f_XX(α) with weight concentrated in the neighborhood of λ, with the difference that values of f_XX(α) in the immediate neighborhood of 0 are partially excluded. The latter is a reflection of the difficulty resulting from not knowing EX(t). If m is not too large compared to T and f_XX(α) is smooth, then Ef_XX^{(T)}(λ) can be expected to be near f_XX(λ) in both cases. A comparison of expressions (5.2.6) and (5.4.8) suggests that the bias of f_XX^{(T)}(λ) will generally be greater than that of I_XX^{(T)}(λ), as the integral extends over a greater essential range in the former case. We will make detailed remarks concerning the question of bias later.

The theorem has the following:

Corollary 5.4.1 Suppose in addition that λ − 2πs(T)/T = O(T⁻¹), m is constant with respect to T, and

and T is odd.

then

In the limit, f_XX^{(T)}(λ) is an asymptotically unbiased estimate of f_XX(λ).

In summary, with regard to its first moment, f_XX^{(T)}(λ) seems a reasonable estimate of f_XX(λ) provided that m is not too large with respect to T. The estimate seems reasonable in the case λ ≡ 0 (mod 2π) even if EX(t) is unknown. Turning to the second-order moment structure of this estimate we have

Theorem 5.4.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let f_XX^{(T)}(λ) be given by (5.4.1) to (5.4.3) with


In the case λ ≢ 0 (mod π), the effect of averaging 2m + 1 adjacent periodogram ordinates has been to produce an estimate whose asymptotic variance is 1/(2m + 1) times that of the periodogram. One might therefore contemplate choosing a value of m so large that an acceptable level of stability in the estimate is achieved. However, following the discussion of Theorem 5.4.1, note that the bias of the estimate f_XX^{(T)}(λ) may well increase as m is increased, and thus some compromise value for m will have to be selected.

The variance of f_XX^{(T)}(λ) in the case λ ≡ 0 (mod π) is seen to be approximately double that in the λ ≢ 0 (mod π) case. This reflects the fact that the estimate in the former case is based on approximately half as many independent statistics. The asymptotic distribution of f_XX^{(T)}(λ) under certain regularity conditions is indicated in the following:

Theorem 5.4.3 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let f_XX^{(T)}(λ) be given by (5.4.1) to (5.4.3) with 2πs(T)/T → λ as T → ∞. Suppose λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Then f_XX^{(T)}(λ₁), ..., f_XX^{(T)}(λ_J) are asymptotically independent, with f_XX^{(T)}(λ) asymptotically f_XX(λ)χ²_{4m+2}/(4m + 2) if λ ≢ 0 (mod π), and asymptotically f_XX(λ)χ²_{2m}/(2m) if λ ≡ 0 (mod π).

This theorem will prove especially useful when it comes time to suggest approximate confidence limits for f_XX(λ).
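Theorem 5.4.3 yields approximate limits: if f̂ is approximately f_XX(λ)χ²_{4m+2}/(4m + 2), then (4m + 2)f̂/χ²_{4m+2; 1−α/2} ≤ f_XX(λ) ≤ (4m + 2)f̂/χ²_{4m+2; α/2} with approximate probability 1 − α. A sketch of this computation (the Wilson-Hilferty quantile approximation is used here only to stay dependency-free; it is not from the text, and the function names are assumed):

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile(p, nu):
    # Wilson-Hilferty approximation to the chi-squared quantile (assumed helper)
    z = NormalDist().inv_cdf(p)
    return nu * (1.0 - 2.0 / (9.0 * nu) + z * np.sqrt(2.0 / (9.0 * nu))) ** 3

def spectrum_ci(f_hat, m, alpha=0.05):
    # Theorem 5.4.3: f_hat is approximately f_XX(lam) * chi2_{4m+2}/(4m+2)
    # for lam not a multiple of pi, giving the interval below
    nu = 4 * m + 2
    return (nu * f_hat / chi2_quantile(1 - alpha / 2, nu),
            nu * f_hat / chi2_quantile(alpha / 2, nu))

lo, hi = spectrum_ci(1.0, m=10)
print(round(float(lo), 2), round(float(hi), 2))
```

On a logarithmic plot such as Figure 5.4.3 the interval has constant width, which is one reason spectra are usually displayed on a log scale.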

Figure 5.4.3 presents the logarithm base 10 of f_XX^{(T)}(λ), given by (5.4.1) to (5.4.3), for the series of monthly sunspot numbers whose periodogram was given in Figures 5.2.3 and 5.2.4. The statistic f_XX^{(T)}(λ) is calculated for 0 ≤ λ ≤ π, m = 2, 5, 10, 20, and the growing stability of the estimate as m increases is immediately apparent. The figures suggest that f_XX(λ) has a lot of mass in the neighborhood of 0. This in turn suggests that neighboring values of the series tend to cluster together. An examination of the series itself (Figure 1.1.5) confirms this remark. The periodogram and the plots corresponding to m = 2, 5, 10 suggest a possible peak in the spectrum in the neighborhood of the frequency .015π. This frequency corresponds to the

Also


λ − 2πs(T)/T = O(T⁻¹). Suppose λ ± μ ≢ 0 (mod 2π) and that m does not depend on T. Then


Figure 5.4.3 Log₁₀ f_XX^{(T)}(λ) for monthly mean sunspot numbers for the years 1750-1965 with 2m + 1 periodogram ordinates averaged.


eleven-year solar cycle suggested by Schwabe in 1843; see Newton (1958). This peak has disappeared in the case m = 20, indicating that the bias of the estimate has become appreciable. Because this peak is of special interest, we have plotted f_XX^{(T)}(λ) in the case m = 2 on an expanded scale in Figure 5.4.4. In this figure there is an indication of a peak near the frequency .030π that is the first harmonic of .015π.

Figures 5.4.5 to 5.4.8 present the spectral estimate f_XX^{(T)}(λ) for the series of mean monthly rainfall whose periodogram was given as Figure 5.2.2. The statistic is calculated for m = 2, 5, 7, 10. Once again the increasing stability of the estimate as m increases is apparent. The substantial peak

Figure 5.4.4 Low frequency portion of log₁₀ f_XX^{(T)}(λ) for monthly mean sunspot numbers for the years 1750-1965 with five periodogram ordinates averaged.


Figure 5.4.5 f_XX^{(T)}(λ) of composite rainfall series of England and Wales for the years 1789-1959 with five periodogram ordinates averaged. (Logarithmic plot.)

Figure 5.4.6 f_XX^{(T)}(λ) of composite rainfall series of England and Wales for the years 1789-1959 with eleven periodogram ordinates averaged. (Logarithmic plot.)


Figure 5.4.7 f_XX^{(T)}(λ) of composite rainfall series of England and Wales for the years 1789-1959 with fifteen periodogram ordinates averaged. (Logarithmic plot.)

Figure 5.4.8 f_XX^{(T)}(λ) of composite rainfall series of England and Wales for the years 1789-1959 with twenty-one periodogram ordinates averaged. (Logarithmic plot.)


in the figures occurs at a frequency of one cycle per year, as would be expected in the light of the seasonal nature of the series. For other values of λ, f_XX^{(T)}(λ) is near constant, suggesting that the series is made up approximately of an annual component superimposed on a pure noise series.

Figure 5.4.9 presents some empirical evidence related to the validity of Theorem 5.4.3. It is a χ₃₀² probability plot of the values f_XX^{(T)}(2πs/T), T = 2592, s = T/4, ..., (T/2) − 1, for the series of monthly sunspots. f_XX^{(T)}(λ) has been formed by smoothing 15 adjacent periodogram ordinates. If f_XX(λ) is near constant for π/2 < λ < π, as the estimated spectra suggest, and it is reasonable to approximate the distribution of f_XX^{(T)}(λ) by a multiple of χ₃₀², as Theorem 5.4.3 suggests, then the plotted values should tend to fall along a straight line. In fact the bulk of the points plotted in Figure 5.4.9 appear to do this. However, there is definite curvature for the rightmost points. The direction of this curvature suggests that the actual distribution may have a shorter right-hand tail than that of a multiple of χ₃₀².

Figure 5.4.9 χ₃₀² probability plot of the upper 500 power spectrum estimates, when fifteen periodograms are averaged, of monthly mean sunspot numbers for the years 1750-1965.


It is important to note how informative it is to have calculated f_XX^{(T)}(λ) not just for a single value of m, but rather for a succession of values. The figures for small values of m help in locating any nearly periodic components and their frequencies, while the figures for large values of m give exceedingly smooth curves that could prove useful in model fitting. In the case that the values I_XX^{(T)}(2πs/T), s = 1, 2, ... are available (perhaps calculated using the Fast Fourier Transform), it is an elementary matter to prepare estimates for a succession of values of m.

The suggestion that an improved spectral estimate might be obtained by smoothing the periodogram was made by Daniell (1946); see also Bartlett (1948b, 1966), the paper by Jones (1965), and the letter by Tick (1966). Bartlett (1950) made use of the χ² distribution for smoothed periodogram estimates.

5.5 A GENERAL CLASS OF SPECTRAL ESTIMATES

The spectral estimate of the previous section weights all periodogram ordinates in the neighborhood of λ equally. If f_XX(α) is near constant for α near λ, then this is undoubtedly a reasonable procedure; however, if f_XX(α) varies to an extent, then it is perhaps more reasonable to weight periodogram ordinates in the immediate neighborhood of λ more heavily than those at a distance. We proceed to construct an estimate that allows differential weighting.

Let W_j, j = 0, ±1, ..., ±m be weights satisfying

Let s(T) be an integer such that 2πs(T)/T is near λ and 2[s(T) + j] ≢ 0 (mod T), j = 0, ±1, ..., ±m. Consider the estimate

f_XX^{(T)}(λ) = Σ_{j=−m}^{m} W_j I_XX^{(T)}(2π[s(T) + j]/T)   (5.5.2)

where


Because of the shape of Fr(X), both /4r(X) and #r(X) will be weight functionsprincipally concentrated in the interval ( — 2Trm/T,2irm/T) for — x < X < TT.BT(\) will differ from /4r(X) in having negligible mass for —2-rr/T < X< 2ir/T. In the case of equal weights, AT(\) and fir(X) are rectangular inshape. In the general case, the shape of /4r(X) will mimic that of WjJ = — m,. . . , 0, . . . , m.

Turning to an investigation of the properties of this estimate, we begin with

Theorem 5.5.1 Let X(t), t = 0, ±1, ... be a real-valued series with EX(t) = c_X and cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose

Set


In order to discuss the expected value of this estimate we must define certain functions. Set

Let f_XX^(T)(λ) be given by (5.5.2) to (5.5.4); then

and T is even

and T is odd.

The expected value of the estimate (5.5.2) to (5.5.4) differs from that of the estimate of Section 5.4 in the nature of the weighted average of the power spectrum f_XX(α). Because we can affect the character of the weighted average by the choice of the W_j, we may well be able to produce an estimate with less bias than that of Section 5.4 in the case that f_XX(α) varies in the neighborhood of λ.

Corollary 5.5.1 Suppose, in addition to the assumptions of the theorem, λ − 2πs(T)/T = O(T⁻¹) and

then

In the limit, f_XX^(T)(λ) is an asymptotically unbiased estimate of f_XX(λ).

Turning to the second-order moment structure, we have

Theorem 5.5.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let f_XX^(T)(λ) be given by (5.5.2) to (5.5.4) with λ − 2πs(T)/T = O(T⁻¹). Suppose λ ± μ ≢ 0 (mod 2π); then

Also

The variance of the estimate is seen to be proportional to Σ_j W_j² for large T. We remark that

and so Σ_j W_j² is minimized, subject to Σ_j W_j = 1, by setting

It follows that the large sample variance of f_XX^(T)(λ) is minimized by taking it to be the estimate of Section 5.4. Following the discussion after Theorem


In the case W_j = 1/(2m + 1), this leads us back to the approximation suggested by Theorem 5.4.3. The approximation of the distribution of

and

or

and


5.5.1, it may well be the case that the estimate of Section 5.4 has greater bias than the estimate (5.5.2) involving well-chosen W_j.

Turning to an investigation of limiting distributions we have

Theorem 5.5.3 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let f_XX^(T)(λ) be given by (5.5.2) to (5.5.4) with 2πs_j(T)/T → λ_j as T → ∞. Suppose λ_j ± λ_k ≢ 0 (mod 2π) for 1 ≤ j < k ≤ J. Then f_XX^(T)(λ_1), ..., f_XX^(T)(λ_J) are asymptotically independent, with f_XX^(T)(λ_j) asymptotically

The different chi-squared variates appearing are statistically independent.

The asymptotic distribution of f_XX^(T)(λ) is seen to be that of a weighted combination of independent chi-squared variates. It may prove difficult to use this as an approximating distribution in practice; however, a standard statistical procedure (see Satterthwaite (1941) and Box (1954)) is to approximate the distribution of such a variate by a multiple, θχ_ν², of a chi-squared whose mean and degrees of freedom are determined by equating first- and second-order moments. Here we are led to set


power spectral estimates by a multiple of a chi-squared was suggested by Tukey (1949). Other approximations are considered in Freiberger and Grenander (1959), Slepian (1958), and Grenander et al. (1959).
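The moment matching may be sketched numerically (an editorial illustration assuming only weights W_j that sum to 1): equating the mean and variance of Σ_j W_j f χ₂²/2 with those of θχ_ν² gives ν = 2/Σ_j W_j² and θ = f/ν.

```python
def equivalent_dof(W):
    """Equivalent degrees of freedom nu = 2 / sum_j W_j**2 obtained by
    equating first- and second-order moments (Satterthwaite (1941), Box (1954))."""
    return 2.0 / sum(w * w for w in W)

# equal weights over 2m + 1 ordinates recover nu = 2(2m + 1), the degrees of
# freedom of the Section 5.4 estimate; any unequal weighting gives fewer
m = 2
nu_equal = equivalent_dof([1.0 / (2 * m + 1)] * (2 * m + 1))
nu_triangular = equivalent_dof([w / 9.0 for w in (1, 2, 3, 2, 1)])
```

The triangular weighting above, which downweights distant ordinates, pays for its reduced bias with a smaller equivalent degrees of freedom than the equal weighting.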

In this section we have obtained a flexible estimate of the power spectrum by introducing a variable weighting scheme for periodogram ordinates. We have considered asymptotic properties of the estimate under a limiting procedure involving the weighting of a constant number, 2m + 1, of periodogram ordinates as T → ∞. For some purposes this procedure may suggest valid large sample approximations; for other purposes it might be better to allow m to increase with T. We turn to an investigation of this alternate limiting procedure in the next section; it will lead us to estimates that are asymptotically normal and consistent.

It is possible to employ different weights W_j or different m in separate intervals of the frequency domain if the character of f_XX(λ) differs in those intervals.

5.6 A CLASS OF CONSISTENT ESTIMATES

The class of estimates we consider in this section has the form

where W^(T)(α), −∞ < α < ∞, T = 1, 2, ... is a family of weight functions of period 2π whose mass is arranged so that estimate (5.6.1) essentially involves a weighting of 2m_T + 1 periodogram ordinates in the neighborhood of λ. In order to obtain an estimate of diminishing variance as T → ∞, we will therefore require m_T → ∞, in contrast to the constant m of Section 5.5. Also the range of frequencies involved in the estimate (5.6.1) is 2π(2m_T + 1)/T, and so, in order to obtain an asymptotically unbiased estimate, we will require m_T/T → 0 as T → ∞. The estimate inherits the smoothness properties of W^(T)(α).

A convenient manner in which to construct the weight function W^(T)(α) appearing in the estimate (5.6.1), and having the properties referred to, is to consider a sequence of scale parameters B_T, T = 1, 2, ... with the properties B_T > 0, B_T → 0, B_T T → ∞ as T → ∞ and to set

where W(β), −∞ < β < ∞, is a fixed function satisfying


Assumption 5.6.1 W(β), −∞ < β < ∞, is real-valued, even, of bounded variation

that help to explain its character. For large T, in view of (5.6.3), the sum of the weights appearing in (5.6.1) should be near 1. The worker may wish to alter (5.6.1) so the sum of the weights is exactly 1. This alteration will have no effect on the asymptotic expressions given below.
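The construction of the discrete weights from W and B_T, including the renormalization just described, may be sketched as follows (an editorial illustration; the quadratic kernel W below is an illustrative choice only):

```python
import math

def discrete_weights(T, B, lam,
                     W=lambda b: 0.75 * (1.0 - b * b) if abs(b) < 1.0 else 0.0):
    """Weights proportional to W^{(T)}(lam - 2 pi s / T) with
    W^{(T)}(alpha) = B**-1 * W(alpha / B), renormalized to sum exactly to 1;
    the renormalization leaves the asymptotic expressions unaffected."""
    w = {}
    for s in range(T):
        # reduce lam - 2 pi s / T to (-pi, pi], respecting the period 2 pi
        a = (lam - 2.0 * math.pi * s / T + math.pi) % (2.0 * math.pi) - math.pi
        v = (2.0 * math.pi / (B * T)) * W(a / B)
        if v > 0.0:
            w[s] = v
    total = sum(w.values())
    return {s: v / total for s, v in w.items()}

w = discrete_weights(T=512, B=0.1, lam=1.0)
```

With B_T = 0.1 the weights cover the ordinates whose frequencies fall within B_T of λ, roughly B_T·T/π of them.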

Turning to a large sample investigation of the mean of f_XX^(T)(λ) we have

Theorem 5.6.1 Let X(t), t = 0, ±1, ... be a real-valued series with EX(t) = c_X and cov{X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, .... Suppose

Let f_XX^(T)(λ) be given by (5.6.1) where W(β) satisfies Assumption 5.6.1. Then

and

If we choose W(β) to be 0 for |β| > 2π, then we see that the estimate (5.6.1) involves the weighting of the 2B_T T + 1 periodogram ordinates whose frequencies fall in the interval (λ − 2πB_T, λ + 2πB_T). In terms of the introduction to this section, the identification m_T = B_T T is made.

Because W^(T)(α) has period 2π, the same will be true of f_XX^(T)(λ). Likewise, because W^(T)(−α) = W^(T)(α), we will have f_XX^(T)(−λ) = f_XX^(T)(λ). The estimate (5.6.1) is not necessarily non-negative under Assumption 5.6.1; however if, in addition, we assume W(β) ≥ 0, then it will be the case that f_XX^(T)(λ) ≥ 0. Because of (5.6.3), ∫_{−π}^{π} W^(T)(α) dα = 1.

In view of (5.6.2) we may set down the following alternate expressions for f_XX^(T)(λ):

The error terms are uniform in λ.


In Corollary 5.6.2 below we make use of the function

This is a periodic extension of the Kronecker delta function


The expected value of f_XX^(T)(λ) is seen to be a weighted average of the function f_XX(α), −∞ < α < ∞, with weight concentrated in an interval containing λ and of length proportional to B_T. We now have

Corollary 5.6.1 Under the conditions of the theorem and if B_T → 0 as T → ∞, f_XX^(T)(λ) is an asymptotically unbiased estimate of f_XX(λ); that is,

The property of being asymptotically unbiased was also possessed by the estimate of Section 5.5. Turning to second-order large sample properties of the estimate we have

Theorem 5.6.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.2(1). Let f_XX^(T)(λ) be given by (5.6.1) where W(β) satisfies Assumption 5.6.1. Then

Corollary 5.6.2 Under the conditions of Theorem 5.6.2 and if B_T T → ∞ as T → ∞,


In the case of μ = λ, this corollary indicates

In either case the variance of f_XX^(T)(λ) is tending to 0 as B_T T → ∞. In Corollary 5.6.1 we saw that Ef_XX^(T)(λ) → f_XX(λ) as T → ∞ if B_T → 0. Therefore the estimate (5.6.1) has the property

under the conditions of Theorem 5.6.2 and if B_T → 0, B_T T → ∞ as T → ∞. Such an estimate is called consistent in mean square.
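Since the large-sample variance is proportional to the sum of squared smoothing weights, the consistency can be illustrated numerically. Below, with the illustrative rate B_T = T^(−1/5) (an editorial choice, not one prescribed in the text), B_T → 0 while B_T·T → ∞, and the variance factor Σ_s W_s² shrinks as T grows:

```python
import math

def variance_factor(T, B, lam=1.0):
    """Sum of squared normalized smoothing weights, proportional to the
    large-sample variance of the estimate; quadratic kernel for illustration."""
    w = []
    for s in range(T):
        # reduce lam - 2 pi s / T to (-pi, pi]
        a = (lam - 2.0 * math.pi * s / T + math.pi) % (2.0 * math.pi) - math.pi
        v = max(0.0, 1.0 - (a / B) ** 2)
        if v > 0.0:
            w.append(v)
    total = sum(w)
    return sum((v / total) ** 2 for v in w)

factors = {T: variance_factor(T, T ** -0.2) for T in (256, 1024, 4096)}
# the factor decreases as B_T * T grows, even though B_T itself shrinks
```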

Notice that in expression (5.6.13) we have a doubling of variance at λ = 0, ±π, ±2π, .... Expression (5.6.9) is much more informative in this connection. It indicates that the transition between the usual asymptotic behavior and that at λ = 0, ±π, ±2π, ... takes place in intervals about these points whose length is of the order of magnitude of B_T.

We see from (5.6.12) that f_XX^(T)(λ), f_XX^(T)(μ) are asymptotically uncorrelated as T → ∞, provided λ − μ, λ + μ ≢ 0 (mod 2π). Turning to the asymptotic distribution of f_XX^(T)(λ), we have

Theorem 5.6.3 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let f_XX^(T)(λ) be given by (5.6.1) with W(β) satisfying Assumption 5.6.1. Suppose f_XX(λ_j) ≠ 0, j = 1, ..., J. Then f_XX^(T)(λ_1), ..., f_XX^(T)(λ_J) are asymptotically normal with covariance structure given by (5.6.12) as T → ∞ with B_T T → ∞, B_T → 0.

The estimate considered in Section 5.4 had an asymptotic distribution proportional to chi-squared under the assumption that we were smoothing a fixed number of periodogram ordinates. Here the number of periodogram ordinates being smoothed is increasing to ∞ with T and so it is not surprising that an asymptotic normal distribution results. One interesting implication of the theorem is that f_XX^(T)(λ) and f_XX^(T)(μ) are asymptotically independent if λ ± μ ≢ 0 (mod 2π). The theorem has the following:

Corollary 5.6.3 Under the conditions of Theorem 5.6.3 and if f_XX(λ) ≠ 0, log₁₀ f_XX^(T)(λ) is asymptotically normal with

var


This corollary suggests that the variance of log f_XX^(T)(λ) may not depend too strongly on the magnitude of f_XX(λ), nor generally on λ, for large T. Therefore, it is probably more sensible to plot the statistic log f_XX^(T)(λ), rather than f_XX^(T)(λ) itself. In fact this has been the standard engineering practice and is what is done for the various estimated spectra of this chapter.

Consistent estimates of the power spectrum were obtained by Grenander and Rosenblatt (1957) and Parzen (1957, 1958). The asymptotic mean and variance were considered by these authors and by Blackman and Tukey (1958). Asymptotic normality has been demonstrated by Rosenblatt (1959), Brillinger (1965b, 1968), Brillinger and Rosenblatt (1967a), Hannan (1970), and Anderson (1971) under various conditions. Jones (1962a) is also of interest.

In the case that the data have been tapered prior to forming a power spectral estimate, Theorem 5.6.3 takes the form

Theorem 5.6.4 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let h(t), −∞ < t < ∞, be a taper satisfying Assumption 4.3.1. Let W(α), −∞ < α < ∞, satisfy Assumption 5.6.1. Set

where

Set

Let B_T → 0, B_T T → ∞ as T → ∞. Then f_XX^(T)(λ_1), ..., f_XX^(T)(λ_J) are asymptotically jointly normal with

and


By comparison with expression (5.6.12), the limiting variance of the tapered estimate is seen to differ from that of the untapered estimate by the factor

By Schwarz's inequality this factor is ≥ 1. In the case where we employ a cosine taper extending over the first and last 10 percent of the data, its value is 1.116. It is hoped that in many situations the bias of the tapered estimate will be reduced so substantially as to more than compensate for this increase in variance. Table 3.3.1 gives some useful tapers.
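In the limit, the factor in question is ∫h(u)⁴ du / (∫h(u)² du)², and the value 1.116 quoted for the 10 percent split-cosine taper can be checked numerically (an editorial sketch, not a computation from the text):

```python
import math

def cosine_taper(u, p=0.1):
    """Split-cosine taper applied over the first and last fraction p of the data."""
    if u < p:
        return 0.5 * (1.0 - math.cos(math.pi * u / p))
    if u > 1.0 - p:
        return 0.5 * (1.0 - math.cos(math.pi * (1.0 - u) / p))
    return 1.0

def variance_inflation(h, n=20000):
    """Midpoint-rule evaluation of int h^4 du / (int h^2 du)^2."""
    hs = [h((t + 0.5) / n) for t in range(n)]
    s2 = sum(v * v for v in hs) / n
    s4 = sum(v ** 4 for v in hs) / n
    return s4 / (s2 * s2)

factor = variance_inflation(cosine_taper)  # approximately 1.116
```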

5.7 CONFIDENCE INTERVALS

In order to communicate an indication of the possible nearness of an estimate to a parameter, it is often desirable to provide a confidence interval for the parameter based on the estimate. The asymptotic distributions determined in the previous sections for the various spectral estimates may be used in this connection. We first set down some notation. Let z(α), χ_ν²(α) denote numbers such that

and

where z is a standard normal variate and χ_ν² a chi-squared variate with ν degrees of freedom.

Consider first the estimate of Section 5.4,

If we take logarithms, this interval becomes

for 2πs(T)/T near λ ≢ 0 (mod π). Theorem 5.4.3 suggests approximating its distribution by f_XX(λ)χ²_{4m+2}/(4m + 2). This leads to the following 100γ percent confidence interval for f_XX(λ):


The degrees of freedom and multipliers of chi-squared will be altered in the case λ ≡ 0 (mod π) in accordance with the details of Theorem 5.4.3.
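The interval (5.7.5) may be sketched numerically. Assuming ν·f̂/f_XX(λ) is approximately χ_ν² with ν = 4m + 2, and substituting the Wilson–Hilferty approximation for the chi-squared quantile (an approximation introduced here for self-containedness, not one used in the text):

```python
import math
from statistics import NormalDist

def chi2_quantile(p, nu):
    """Wilson-Hilferty approximation:
    chi2_nu(p) ~ nu * (1 - 2/(9 nu) + z_p * sqrt(2/(9 nu)))**3."""
    z = NormalDist().inv_cdf(p)
    return nu * (1.0 - 2.0 / (9.0 * nu) + z * math.sqrt(2.0 / (9.0 * nu))) ** 3

def confidence_interval(f_hat, m, gamma=0.95):
    """100*gamma percent interval for f_XX(lam) from nu * f_hat / f ~ chi2_nu,
    with nu = 4m + 2 (the Section 5.4 estimate, lam not congruent to 0 mod pi)."""
    nu = 4 * m + 2
    a = (1.0 - gamma) / 2.0
    return (nu * f_hat / chi2_quantile(1.0 - a, nu),
            nu * f_hat / chi2_quantile(a, nu))

lo, hi = confidence_interval(f_hat=1.0, m=2)
```

For m = 2 (ν = 10) the 95 percent interval around f̂ = 1 runs from roughly 0.49 to 3.1, an indication of how wide such intervals are for small m.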

In Figure 5.7.1 we have set 95 percent limits around the estimate, corresponding to m = 2, of Figure 5.4.4. We have inserted these limits in two manners. In the upper half of Figure 5.7.1 we have proceeded in accordance with expression (5.7.5). In the lower half, we have set the limits around a strongly smoothed spectral estimate; this procedure has the advantage of causing certain peaks to stand out.

In Section 5.5, we considered the estimate

involving a variable weighting of periodogram ordinates. Its asymptotic distribution was found to be that of a weighted sum of exponential variates. This last is generally not a convenient distribution to work with; however, in the discussion of Theorem 5.5.3 it was suggested it be approximated by f_XX(λ)χ_ν²/ν where

in the case λ ≢ 0 (mod π). Taking this value of ν, we are led to the following 100γ percent confidence interval for log f_XX(λ):

If W_j = 1/(2m + 1), j = 0, ±1, ..., ±m, then the interval (5.7.8) is the same as the interval (5.7.5).

If ν is large, then log₁₀(χ_ν²/ν) is approximately normal with mean 0 and variance 2(0.4343)²/ν. The interval (5.7.8) is therefore approximately

Interval (5.7.9) leads us directly into the approximation suggested by the results of Section 5.6. The estimate considered there had the form


Figure 5.7.1 Two manners of setting 95 percent confidence limits about the power spectrum estimate of Figure 5.4.4.


the interval (5.7.11) is in essential agreement with the interval (5.7.9). The intervals (5.7.9) and (5.7.11) are relevant to the case λ ≢ 0 (mod π). If λ ≡ 0 (mod π), then the variance of the estimate is approximately doubled, indicating we should broaden the intervals by a factor of √2.

In the case that we believe f_XX(α) to be a very smooth function in some interval about λ, an ad hoc procedure is also available. We may estimate the variance of f_XX^(T)(λ) from the variation of f_XX^(T)(α) in the neighborhood of λ. For example, this might prove a reasonable procedure for the frequencies π/2 < λ < π in the case of the series of monthly mean sunspot numbers analyzed previously.

The confidence intervals constructed in this section apply to the spectral estimate at a single frequency λ. A proportion 1 − γ of the values may be expected to fall outside the limits. On occasion we may wish a confidence region valid for the whole frequency range. Woodroofe and Van Ness (1967) determined the asymptotic distribution of the variate

where N_T → ∞ as T → ∞. An approximate confidence region for f_XX(λ), 0 < λ < π, might be determined from this asymptotic distribution.

5.8 BIAS AND PREFILTERING

In this section we will carry out a more detailed analysis of the bias of the proposed estimates of a power spectrum. We will indicate how an elementary operation, called prefiltering, can often be used to reduce this bias. We begin by considering the periodogram of a series of tapered values. For convenience, assume EX(t) = 0, although the general conclusions reached will be relevant to the nonzero mean case as well.

Because

Corollary 5.6.3 leads us to set down the following 100γ percent confidence interval for log₁₀ f_XX(λ):


Let

where h(u) is a tapering function vanishing for u < 0, u > 1. The periodogram here is taken to be

If we define the kernel

where

then Theorem 5.2.3 indicates that

We proceed to examine this expected value in greater detail. Set

As we might expect from Section 3.3, we have

Theorem 5.8.1 Let X(t), t = 0, ±1, ... be a real-valued series with 0 mean and autocovariance function satisfying

Let the tapering function h(u) be such that k^(T)(u) given by (5.8.6) satisfies

Let I_XX^(T)(λ) be given by (5.8.2); then

where f_XX^(p)(λ) is the pth derivative of f_XX(λ). The error term is uniform in λ.


From its definition, k^(T)(u) = k^(T)(−u) and so the k_p in (5.8.8) are 0 for odd p. The dominant bias term appearing in (5.8.9) is therefore

This term is seen to depend on both the kernel employed and the spectrum being estimated. We will want to choose a taper so that k₂ is small. In fact, if we use the definition (3.3.11) of bandwidth, then the bandwidth of the kernel K^(T)(α) is √|k₂|/T, also implying the desirability of small |k₂|. The bandwidth is an important parameter in determining the extent of bias. In real terms, the student will have difficulty in distinguishing (or resolving) peaks in the spectrum closer than √|k₂|/T apart. This was apparent to an extent from Theorem 5.2.8, which indicated that the statistics I_XX^(T)(λ) and I_XX^(T)(μ) were highly dependent for λ near μ. Expressions (5.8.9) and (5.8.10) do indicate that the bias will be reduced in the case that f_XX(α) is near constant in a neighborhood of λ; this remark will prove the basis for the operation of prefiltering to be discussed later.

Suppose next that the estimate

with 2πs(T)/T near λ and

is considered. Because

the remarks following Theorem 5.8.1 are again relevant and imply that the bias of (5.8.11) will be reduced in the case that k₂ is small or f_XX(α) is near constant. An alternate way to look at this is to note, from (5.8.5), that

where the kernel

appearing in (5.8.14) has the shape of a function taking the value W_j for α near 2πj/T, j = 0, ±1, ..., ±m. In crude terms, this kernel extends over an interval m times broader than that of K^(T)(α) and so, if f_XX(α) is not constant,


the bias of (5.8.11) may be expected to be greater than that of I_XX^(T)(λ). It will generally be difficult to resolve peaks in f_XX(λ) nearer than m√|k₂|/T with the statistic (5.8.11). The smoothing with weights W_j has caused a loss in resolution relative to the estimate I_XX^(T)(λ). It must be remembered, however, that the smoothing was introduced to increase the stability of the estimate and it is hoped that the smoothed estimate will be better in some overall sense.

We now turn to a more detailed investigation of the consistent estimate introduced in Section 5.6. This estimate is given by

with I_XX^(T)(λ) given by (5.8.2) and W^(T)(α) given by (5.6.2).

Theorem 5.8.2 Let X(t), t = 0, ±1, ... be a real-valued series with EX(t) = 0 and autocovariance function satisfying

for some P ≥ 1. Let the tapering function h(u) be such that k^(T)(u) of (5.8.6) satisfies (5.8.8) for |u| ≤ T. Let f_XX^(T)(λ) be given by (5.8.16) where W(α) satisfies Assumption 5.6.1. Then

The error terms are uniform in λ.

From expression (5.8.18) we see that advantages accrue from tapering in this case as well. Expression (5.8.18) indicates that the expected value is given, approximately, by a weighted average with kernel W^(T)(α) of the power spectrum of interest. The bandwidth of this kernel is

and so is of order O(B_T). In Corollary 5.8.2 we set


Now if f_XX(α) is constant, equal to f_XX, then (5.8.24) equals f_XX exactly. This suggests that the nearer f_XX(α) is to being constant, the smaller the bias. Suppose that the series X(t), t = 0, ±1, ... is passed through a filter with transfer function A(λ). Denote the filtered series by Y(t), t = 0, ±1, .... From Example 2.8.1, the power spectrum of this series is given by

with inverse relation

Let f_YY^(T)(λ) be an estimate of the power spectrum of the series Y(t). Relation (5.8.26) suggests the consideration of the statistic


Corollary 5.8.2 Suppose in addition to the conditions of Theorem 5.8.2

then

Because W(β) = W(−β), the terms in (5.8.22) with p odd drop out. We see that the bias, up to order B_T^(P−1), may be eliminated by selecting a W(β) such that W_p = 0 for p = 1, ..., P − 1. Clearly such a W(β) must take on negative values somewhere, leading to complications in some situations. If P = 3, then (5.8.22) becomes

Now from expression (5.8.19) the bandwidth of the kernel W^(T)(α) is essentially B_T √W₂, and once again the bias is seen to depend directly on both the bandwidth of the kernel and the smoothness of f_XX(α) for α near λ.

The discussion of Section 3.3 gives some help with the question of which kernel W^(T)(α) to employ in the smoothing of the periodogram. Luckily this question can be made academic in large part by a judicious filtering of the data prior to estimating the power spectrum. We have seen that Ef_XX^(T)(λ) is essentially given by


and proceed as above. In the case that the series X(t) is approximately an autoregressive scheme of order m (see Section 2.9), this must be a near optimum procedure. It seems to work well in other cases also.

A procedure of similar character, but not requiring any filtering of the data, is available: if the series Y(t) were obtained from the series X(t) by filtering with transfer function A(λ), then, following (5.3.20), we have

then form the filtered series

and so

Had A(λ) been chosen so that |A(α)|² f_XX(α) were constant, then (5.8.28) would equal f_XX(λ) exactly. This result suggests that in a case where f_XX(α) is not near constant, we should attempt to find a filter, with transfer function A(λ), such that the filtered series Y(t) has near constant power spectrum; then we should estimate this near constant power spectrum from a stretch of the series Y(t); and finally, we take |A(λ)|⁻² f_YY^(T)(λ) as an estimate of f_XX(λ). This procedure is called spectral estimation by prefiltering or prewhitening; it was proposed in Press and Tukey (1956). Typically the filter has been determined by ad hoc methods; however, one general procedure has been proposed by Parzen and Tukey: determine the filter by fitting an autoregressive scheme to the data. Specifically, for some m, determine a^(T)(1), ..., a^(T)(m) to minimize


as an estimate of f_XX(λ). Following the discussion above, the expected value of this estimate essentially equals


The discussion above now suggests the following estimate of f_XX(λ),

A similar situation holds if the ordinate I_XX^(T)(2πS/T) is dropped from the estimate. Since the values d_X^(T)(2πs/T), s = 0, ..., S − 1, S + 1, ..., T/2 are unaffected by whether or not a multiple of the series exp{±i2πSt/T}, t = 0, ..., T − 1 is subtracted, dropping I_XX^(T)(2πS/T) is equivalent to forming the periodogram of the values X(t), t = 0, ..., T − 1 with the best fitting sinusoid of frequency 2πS/T removed. The idea of avoiding certain frequencies in the smoothing of the periodogram appears in Priestley (1962b), Bartlett (1967), and Brillinger and Rosenblatt (1967b).

Akaike (1962a) discusses certain aspects of prefiltering. We sometimes have a good understanding of the character of the filter function, A(λ), used in a prefiltering and so are content to examine the estimated spectrum of the filtered series Y(t) and not bother to divide it by |A(λ)|².
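A minimal numerical sketch of the prewhitening cycle (an editorial illustration; the helper name and the autoregressive order m = 1 are assumptions made here for brevity): fit a(1) by least squares, filter, smooth the periodogram of the filtered series, and recolour by |A(λ)|⁻².

```python
import cmath
import math
import random

def prewhitened_estimate(x, lam, m_smooth=10):
    """Fit Y(t) = X(t) - a X(t-1) by least squares, smooth the periodogram
    of Y near lam, and divide by |A(lam)|^2 with A(lam) = 1 - a exp(-i lam)."""
    T = len(x)
    a = sum(x[t] * x[t - 1] for t in range(1, T)) / sum(v * v for v in x[:-1])
    y = [x[t] - a * x[t - 1] for t in range(1, T)]
    Ty = len(y)
    s0 = round(lam * Ty / (2.0 * math.pi))
    f_yy = 0.0
    for j in range(-m_smooth, m_smooth + 1):
        s = (s0 + j) % Ty
        d = sum(y[t] * cmath.exp(-2j * math.pi * s * t / Ty) for t in range(Ty))
        f_yy += abs(d) ** 2 / (2.0 * math.pi * Ty)
    f_yy /= 2 * m_smooth + 1
    return f_yy / abs(1.0 - a * cmath.exp(-1j * lam)) ** 2

# first-order autoregressive data: f_XX(lam) = (2 pi)**-1 |1 - 0.6 e^{-i lam}|**-2
random.seed(2)
x = [0.0]
for _ in range(511):
    x.append(0.6 * x[-1] + random.gauss(0.0, 1.0))
f_hat = prewhitened_estimate(x, lam=1.0)
f_true = 1.0 / (2.0 * math.pi * abs(1.0 - 0.6 * cmath.exp(-1j)) ** 2)
```

Because the filtered series is near white, the smoothed estimate of its spectrum suffers little bias, and the recoloured value f_hat tracks f_true.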

5.9 ALTERNATE ESTIMATES

Up until this point, the spectral estimates discussed have had the character of a weighted average of periodogram values at the particular frequencies 2πs/T, s = 0, ..., T − 1. This estimate is useful because these particular periodogram values may be rapidly calculated using the Fast Fourier Transform Algorithm of Section 3.5 if T is highly composite and, in addition, their joint large sample statistical behavior is elementary; see Theorems 4.4.1 and 5.2.6. In this section we turn to the consideration of certain other estimates.

where

where the function A(α) has been chosen in the hope that |A(α)|² f_XX(α) is near constant. This estimate is based directly on the discrete Fourier transform of the values X(t), t = 0, ..., T − 1 and is seen to involve the smoothing of weighted periodogram ordinates. In an extreme situation where f_XX(α) appears to have a high peak near 2πS/T near λ, we may wish to take A(2πS/T) = 0. The sum in (5.8.33) now excludes the periodogram ordinate I_XX^(T)(2πS/T) altogether. We remark that the ordinate I_XX^(T)(0) is already missing from the estimate (5.8.16). Following the discussion of Theorem 5.2.2, this is equivalent to forming the periodogram of the mean-adjusted values


The estimate considered in Section 5.6 has the specific form

where

If the discrete average in (5.9.1) is replaced by a continuous one, this estimate becomes

Now

If this is substituted into (5.9.2), then that estimate takes the form

where

The estimate (5.9.5) is of the general form investigated by Grenander (1951a), Grenander and Rosenblatt (1957), and Parzen (1957); it contains as particular cases the early estimates of Bartlett (1948b), Hamming and Tukey (1949), and Bartlett (1950). Estimate (5.9.5) was generally employed until the Fast Fourier Transform Algorithm came into common use.
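The lag-window form may be sketched as follows (an editorial illustration, with the triangular Bartlett lag window as the illustrative choice of weighting of the sample autocovariances):

```python
import math

def lag_window_estimate(x, lam, M):
    """(2 pi)**-1 sum_{|u| < M} (1 - |u|/M) c_XX(u) cos(u lam): the sample
    autocovariances weighted by a triangular window of span M."""
    T = len(x)
    xbar = sum(x) / T
    def c(u):
        return sum((x[t + u] - xbar) * (x[t] - xbar) for t in range(T - u)) / T
    f = c(0)
    for u in range(1, M):
        f += 2.0 * (1.0 - u / M) * c(u) * math.cos(u * lam)
    return f / (2.0 * math.pi)

x = [math.cos(0.5 * t) for t in range(200)]  # deterministic test series
f_peak = lag_window_estimate(x, 0.5, M=30)   # near the frequency present
f_off = lag_window_estimate(x, 2.5, M=30)    # away from it
```

Because the triangular window has a non-negative transform (the Fejér kernel), this particular estimate is non-negative.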

In fact the estimates (5.9.1) and (5.9.2) are very much of the same character, as well as nearly equal. For example, Exercise 5.13.15 shows that (5.9.5) may be written as the following discrete average of periodogram values

for any integer S ≥ 2T − 1; see also Parzen (1957). The expression (5.9.7) requires twice as many periodogram values as (5.9.1). In the case that S is


−π < α, λ < π, Δ small, and

for some finite L and −∞ < λ < ∞.

It is seen that, in the case that B_T does not tend to 0 too quickly, the asymptotic behavior of the two estimates is essentially identical.

The discussion of the interpretation of power spectra given in Section 5.1 suggests a spectral estimate. Specifically, let A(α) denote the transfer function of a band-pass filter with the properties


highly composite, it may be rapidly computed by the Fast Fourier Transform of the series

or by computing c_XX^(T)(u), u = 0, ±1, ... using a Fast Fourier Transform as described in Exercise 3.10.7, and then evaluating expression (5.9.5), again using a Fast Fourier Transform. In the reverse direction, Exercise 5.13.15 shows that the estimate (5.9.1) may be written as the following continuous average of periodogram values

where

A uniform bound for the difference between the two estimates is provided by

Theorem 5.9.1 Let W(α), −∞ < α < ∞, satisfy Assumption 5.6.1 and have a bounded derivative. Then


The estimate (5.9.15) therefore has similar form to estimate (5.4.1). In Theorem 5.3.1 we saw that periodogram ordinates of the same frequency, λ ≢ 0 (mod π), but based on different stretches of data were asymptotically independent f_XX(λ)χ₂²/2 variates. This result suggests that

and so is approximately equal to

and approximately equals 0 otherwise. Using Parseval's formula

If d_{X(·,λ)}^(T)(2πs/T) denotes the discrete Fourier transform of the filtered values X(t, λ), t = 0, ..., T − 1 and 2πs(T)/T is near λ, then

in the case λ ≢ 0 (mod 2π). In fact, it appears that this last is the first spectral estimate used in practice; see Pupin (1894), Wegel and Moore (1924), and Blanc-Lapierre and Fortet (1953). It is the one generally employed in real-time or analog situations. Turning to a discussion of its character, we begin by supposing that

This suggests the consideration of the estimate

The construction of filters with such properties was discussed in Sections 2.7, 3.3, and 3.6. If X(t, λ), t = 0, ±1, ... denotes the output series of such a filter, then


we construct a spectral estimate by averaging the periodograms of different stretches of data. In fact we have

Theorem 5.9.2 Let X(t), t = 0, ±1, ... be a real-valued series satisfying Assumption 2.6.1. Let

where T = LV. Then f_XX^(T)(λ) is asymptotically f_XX(λ)χ²_{2L}/(2L) if λ ≢ 0 (mod π) and asymptotically f_XX(λ)χ²_L/L if λ = ±π, ±3π, ... as V → ∞.

Bartlett (1948b, 1950) proposed the estimate (5.9.21); it is also discussed in Welch (1967) and Cooley, Lewis, and Welch (1970). This estimate has the advantage of requiring fewer calculations than other estimates, especially when V is highly composite. In addition it allows us to examine the assumption of stationarity. Welch (1967) proposes the use of periodograms based on overlapping stretches of data. Akcasu (1961) and Welch (1961) considered spectral estimates based on the Fourier transform of the data. The result of this theorem may be used to construct approximate confidence limits for f_XX(λ), if we think of the I_XX^(V)(λ, l), l = 0, ..., L − 1 as L independent estimates of f_XX(λ).
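Estimate (5.9.21) may be sketched as follows (an editorial illustration; the helper name is assumed, and a direct discrete Fourier transform stands in for the FFT): split the record into L disjoint stretches of length V and average their periodograms at frequency 2πs/V.

```python
import cmath
import math
import random

def segment_average(x, V, s):
    """Average of the L = len(x)//V disjoint-segment periodograms at 2 pi s / V."""
    L = len(x) // V
    total = 0.0
    for l in range(L):
        seg = x[l * V:(l + 1) * V]
        d = sum(seg[t] * cmath.exp(-2j * math.pi * s * t / V) for t in range(V))
        total += abs(d) ** 2 / (2.0 * math.pi * V)
    return total / L

random.seed(3)
x = [random.gauss(0.0, 1.0) for _ in range(512)]
V = 64  # L = 8 segments
est = [segment_average(x, V, s) for s in range(V)]
```

Averaging L segment periodograms trades frequency resolution (ordinates now spaced 2π/V apart) for stability (approximately 2L degrees of freedom at each ordinate).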

In the previous section it was suggested that an autoregressive scheme be fitted to the data in the course of estimating a power spectrum. Parzen (1964) suggested that we estimate the spectrum of the residual series Y(t) for a succession of values m and, when that estimate becomes nearly flat, take

as the estimate of f_XX(λ), where A^(T)(λ) is the transfer function of the filter carrying the series over to the residual series. This procedure is clearly related to prefiltering. Certain of its statistical properties are considered in Kromer (1969), Akaike (1969a), and Section 8.10.

In the course of the work of this chapter we have seen the important manner in which the band-width parameter m, or B_T, affects the statistical behavior of the estimate. In fact, if we carried out some prewhitening of the


This is essentially the estimate (5.9.21) if J = V. In the case H(x) = x⁻¹, the estimate takes the form


series, the shape of the weight function appearing in the estimate appears unimportant. What is important is its band-width. We have expected the student to determine m, or B_T, from the desired statistical stability. If the desired stability was not clear, a succession of band-widths were to be employed. Leppink (1970) proposes that we estimate B_T from the data and indicates an estimate; see also Pickands (1970). Daniels (1962) and Akaike (1968b) suggest procedures for modifying the estimate.

In the case that X(t), t = 0, ±1, ... is a 0 mean Gaussian series, an estimate of f_XX(λ), based solely on the values

(where sgn X = 1 if X > 0, sgn X = −1 if X < 0) was proposed by Goldstein; see Rodemich (1966); it is discussed in Hinich (1967), McNeil (1967), and Brillinger (1968). Rodemich (1966) also considered the problem of constructing estimates of f_XX(λ) from the values of X(t) grouped in a general way.

Estimates have been constructed by Jones (1962b) and Parzen (1963a) for the case in which certain values X(t), t = 0, ..., T − 1 are missing in a systematic manner. Brillinger (1972) considers estimation for the case in which the values X(τ₁), ..., X(τ_n) are available, τ₁, ..., τ_n being the times of events of some point process. Akaike (1960) examines the effect of observing X(t) for t near the values 0, 1, ..., T − 1 rather than exactly at these values; this has been called jittered sampling.

Pisarenko (1972) has proposed a flexible class of nonlinear estimates. Letthe data be split into L segments. Let cxxm(u,l), u = 0, ±1, . . . ; / = 0,. . . , L — \ denote the autocovariance estimate of segment /. Let HJ(T), U/r),j = 1 , . . . , /denote the latent roots and vectors of [Ir1 J)/ cxx(T}(J — A:,/);j, k = 1, . . . , J]. Pisarenko suggests the following estimate offxx(X),

where H(x), 0 < x < », is a strictly monotonic function with inverse h(.).He was motivated by the definition 3.10.27 of a function of a matrix.In the case H(x) = x, the estimate (5.9.24) may be written


with [Cjk(T)] the inverse of the matrix whose latent values were computed. The estimate (5.9.26) was suggested by Capon (1969) as having high resolution. Pisarenko (1972) argues that if J, L → ∞ as T → ∞, and if the series is normal, then the estimate (5.9.24) will be asymptotically normal with variance

Capon and Goodman (1970) suggest approximating the distribution of (5.9.26) by fXX(λ)χ²2(L−J+1)/(2L) if λ ≢ 0 (mod π) and by fXX(λ)χ²L−J+1/L if λ = ±π, ±3π, . . . .

Sometimes we are interested in fitting a parametric model for the power spectrum. A useful general means of doing this was proposed in Whittle (1951, 1952a, 1961). Some particular models are considered in Box and Jenkins (1970).

5.10 ESTIMATING THE SPECTRAL MEASURE AND AUTOCOVARIANCE FUNCTION

Let X(t), t = 0, ±1, . . . denote a real-valued series with autocovariance function cXX(u), u = 0, ±1, . . . and spectral density fXX(λ), −∞ < λ < ∞. There are a variety of situations in which we would like to estimate the spectral measure

introduced in Section 2.5. There are also situations in which we would like to estimate the autocovariance function

itself, and situations in which we would like to estimate a broad-band spectral average of the form

W(α) being a weight function of period 2π concentrated near α ≡ 0 (mod 2π). The parameters (5.10.1), (5.10.2), and (5.10.3) are all seen to be particular cases of the general form


For this reason we turn to a brief investigation of estimates of the parameter (5.10.4) for given A(α). This problem was considered by Parzen (1957).

As a first estimate we consider the statistic

where IXX(T)(λ), −∞ < λ < ∞, is the periodogram of a stretch of values X(t), t = 0, . . . , T − 1. Taking a discrete average at the points 2πs/T allows a possible use of the Fast Fourier Transform Algorithm in the course of the calculations.
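Such a discrete average over the points 2πs/T can indeed be computed with the FFT. The following Python sketch (synthetic white-noise data, the constant weight A(α) = 1, and all variable names are illustrative, not from the text) checks it against the time-domain quantity it should reproduce, the sample mean square:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 512
X = rng.standard_normal(T)          # a stretch of values X(0), ..., X(T-1)

# Periodogram at the Fourier frequencies 2*pi*s/T, s = 0, ..., T-1
d = np.fft.fft(X)                   # discrete Fourier transform of the stretch
I = np.abs(d) ** 2 / (2 * np.pi * T)

# Discrete average (2*pi/T) * sum_s A(2*pi*s/T) I(2*pi*s/T); with the
# constant weight A(a) = 1 this estimates the integral of f_XX over
# (0, 2*pi], i.e. c_XX(0), and by Parseval it equals the sample mean square
A = np.ones(T)
J = (2 * np.pi / T) * np.sum(A * I)

print(J, np.mean(X ** 2))
```

By Parseval's relation the two printed values agree exactly (up to rounding error), which may be used as a numerical check in the spirit of Exercise 5.13.2.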

Setting

otherwise

amounts to proposing

as an estimate of FXX(λ). Taking

we see from Exercise 3.10.8 that we are proposing the circular autocovariance function

as an estimate of cXX(u). (Here X̃(t), t = 0, ±1, . . . is the period T extension of X(t), t = 0, . . . , T − 1.) Taking

leads to our considering a spectral estimate of the form
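The circular autocovariance proposed above is, by Exercise 3.10.8, the inverse discrete Fourier transform of the periodogram, so it too may be computed with the FFT. A small Python check on synthetic data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 256
X = rng.standard_normal(T)

# Direct circular autocovariance, using the period-T extension of X(t)
def circ_acov(X, u):
    T = len(X)
    return sum(X[(t + u) % T] * X[t] for t in range(T)) / T

# The same quantity as the inverse discrete Fourier transform of
# |d^(T)(2*pi*s/T)|^2 / T, where d^(T) is the Fourier transform of the data
d = np.fft.fft(X)
c_fft = np.real(np.fft.ifft(np.abs(d) ** 2)) / T

print(np.allclose([circ_acov(X, u) for u in range(8)], c_fft[:8]))
```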

The statistic of Exercise 5.13.31 is sometimes used to test the hypothesis that a stationary Gaussian series has power spectrum fXX(λ). We have


Theorem 5.10.1 Let X(t), t = 0, ±1, . . . be a real-valued series satisfying Assumption 2.6.2(1). Let Aj(α), 0 ≤ α < 2π, be bounded and of bounded variation for j = 1, . . . , J. Then

Also

Finally J(T)(Aj), j = 1, . . . , J are asymptotically jointly normal with the above first- and second-order moment structure.

From expression (5.10.12) we see that J(T)(Aj) is an asymptotically unbiased estimate of J(Aj). From the fact that its variance tends to 0 as T → ∞, we see that it is also a consistent estimate.

In the case of estimating the spectral measure FXX(λ), taking A(α) to be (5.10.6), expression (5.10.13) gives

In the case of estimating the autocovariance function cXX(u), where A(α) is given by (5.10.8), expression (5.10.13) gives


In the case of the broad-band spectral estimate of (5.10.10), expression (5.10.13) gives

In this case of a constant weight function, the spectral estimates fXX(T)(λ) and fXX(T)(ν) are not asymptotically independent, as was the case for the weight functions considered earlier.

If estimates of fXX(α) and fXXXX(α, β, γ) are computed, then we may substitute them into expression (5.10.13) to obtain an estimate of var J(T)(Aj). This estimate may be used, together with the asymptotic normality, to construct approximate confidence limits for the parameter.

In some situations the student may prefer to use the following estimate involving a continuous weighting,

where cXX(T)(u) is the sample autocovariance function of (5.9.4) and

For example, if A(α) = exp{iuα}, this gives the sample autocovariance function cXX(T)(u) itself, in contrast to the circular form obtained before.

The estimate (5.10.17) does not differ too much from the estimate (5.10.5). We have

Theorem 5.10.2 Let A(α), 0 ≤ α ≤ 2π, be bounded and of bounded variation. Let X(t), t = 0, ±1, . . . satisfy Assumption 2.6.2(1). Then

We see that the two estimates will be close for large T and that their asymptotic distribution will be the same.


The spectral measure estimate FXX(T)(λ), given by (5.10.7), is sometimes useful for detecting periodic components in a series and for examining the plausibility of a proposed model, especially that of pure noise. In Figure 5.10.1 we give FXX(T)(λ)/FXX(T)(π), 0 < λ ≤ π, for the series of mean monthly sunspot numbers. The periodogram of this series was given in Section 5.2. The figure shows an exceedingly rapid increase at the lowest frequencies, followed by a steady increase to the value 1. We remark that if fXX(λ) were constant in a frequency band, then the increase of FXX(λ) would be linear in that frequency band. This does not appear to occur in Figure 5.10.1 except, possibly, at frequencies above π/2.

Figure 5.10.1 Plot of FXX(T)(λ)/FXX(T)(π) for monthly mean sunspot numbers for the years 1750-1965.
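The remark that FXX(λ) increases linearly over bands where the spectrum is constant can be checked by simulation. A Python sketch with synthetic pure-noise data (the tolerance and all names are ours, not the text's):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2048
X = rng.standard_normal(T)           # pure noise: f_XX is constant

# Periodogram at the frequencies 2*pi*s/T with 0 < 2*pi*s/T <= pi
I = np.abs(np.fft.fft(X)) ** 2 / (2 * np.pi * T)
s = np.arange(1, T // 2 + 1)

# Partial sums of periodogram ordinates estimate F_XX(lambda); the
# normalization F_XX(lambda)/F_XX(pi) cancels the 2*pi/T factor
F = np.cumsum(I[s])
F_norm = F / F[-1]

# For a constant spectrum the normalized measure at frequency lambda
# should be close to lambda/pi, i.e. the plot is close to a straight line
lam = 2 * np.pi * s / T
max_dev = np.max(np.abs(F_norm - lam / np.pi))
print(max_dev)
```

For the sunspot series of Figure 5.10.1 the corresponding deviation from linearity is large, reflecting the concentration of power at low frequencies.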

The sample autocovariance function cXX(T)(u), u = 0, ±1, . . . , of a series stretch also is often useful for examining the structure of a series. In Figures 5.10.2 and 5.10.3 we present portions of cXX(T)(u) for the series of mean annual and mean monthly sunspot numbers, respectively. The most apparent character of these figures is the substantial correlation of values of the series that are multiples of approximately 10 years apart. The kink near lag 0 in Figure 5.10.3 suggests that measurement error is present in this data.

Figure 5.10.2 The autocovariance estimate cXX(T)(u) for annual mean sunspot numbers for the years 1750-1965.

Figure 5.10.3 The autocovariance estimate cXX(T)(u) for monthly mean sunspot numbers for the years 1750-1965.

Asymptotic properties of estimates of the autocovariance function were considered by Slutsky (1934) in the case of a 0 mean Gaussian series. Bartlett (1946) developed the asymptotic second-order moment structure in the case of a 0 mean linear process. Asymptotic normality was considered in Walker (1954), Lomnicki and Zaremba (1957b, 1959), Parzen (1957), Rosenblatt (1962), and Anderson and Walker (1964). Akaike (1962a) remarked that it might sometimes be reasonable to consider cXX(T)(u), u = 0, ±1, . . . as a second-order stationary time series with power spectrum 2πT−1 fXX(λ)². This corresponds to retaining only the second term on the right in (5.10.15). Brillinger (1969c) indicated two forms of convergence with probability 1 and discussed the weak convergence of the estimate to a Gaussian process.

5.11 DEPARTURES FROM ASSUMPTIONS

In this section we discuss the effects of certain elementary departures from the assumptions adopted so far in this chapter. Among the important assumptions adopted are

and

where cXX(u) = cov{X(t + u), X(t)} for t, u = 0, ±1, . . . .

We first discuss a situation in which expression (5.11.2) is not satisfied.

Suppose that the series under consideration is

with Rj, ωj constants, φj uniform on (−π, π), j = 1, . . . , J, and the series ε(t) satisfying Assumption 2.6.1. The autocovariance function of the series (5.11.3) is quickly seen to be

and so condition (5.11.2) is not satisfied. We note that the spectral measure FXX(λ), whose existence was demonstrated in Theorem 2.5.2, is given by

in this case where


and fεε(λ) denotes the spectral density of the series ε(t), t = 0, ±1, . . . . The generalized derivative of expression (5.11.5) is

0 ≤ λ ≤ π, δ(λ) being the Dirac delta function. The function (5.11.7) has infinite peaks at the frequencies ωj, j = 1, . . . , J, superimposed on a bounded continuous function fεε(λ). A series of the character of expression (5.11.3) is said to be of mixed spectrum.

Turning to the analysis of such a series, we note from expression (5.11.3) that

where Δ(T)(λ) is given in expression (4.3.14). Now the function Δ(T)(λ) has large amplitude only for λ ≡ 0 (mod 2π). This means that

while

for |λ ± ωj| > δ/T, −π < λ ≤ π. The result (5.11.9) suggests that we may estimate the ωj by examining the periodogram IXX(T)(λ) for substantial peaks. At such a peak we might estimate Rj by

This is essentially the procedure suggested by Schuster (1898) of using the periodogram as a tool for discovering hidden periodicities in a series. The result (5.11.10) suggests that we may estimate fεε(λ) by smoothing the periodogram IXX(T)(λ), avoiding the ordinates at frequencies in the immediate neighborhoods of the ωj. If ν periodogram ordinates IXX(T)(2πs/T) (s an integer) are involved in a simple averaging to form an estimate, then it follows from Theorem 4.4.1 and expression (5.11.10) that this estimate will be asymptotically fXX(λ)χ²2ν/(2ν) in the case λ ≢ 0 (mod π), with similar results in the case λ ≡ 0 (mod π).

Bartlett (1967) and Brillinger and Rosenblatt (1967b) discuss the above simple modification of periodogram smoothing that avoids peaks. It is clearly related to the technique of prefiltering discussed in Section 5.8.


Other references include Hannan (1961b), Priestley (1962b, 1964), and Nicholls (1967). Albert (1964), Whittle (1952b), Hext (1966), and Walker (1971) consider the problem of constructing more precise estimates of the ωj of (5.11.3).

We next turn to a situation in which the condition of constant mean (5.11.1) is violated. Suppose, in the manner of the trend model of Section 2.12,

for t = 0, ±1, . . . with φ1(t), . . . , φJ(t) known fixed functions, θ1, . . . , θJ being unknown constants, and ε(t), t = 0, ±1, . . . being an unobservable 0 mean series satisfying Assumption 2.6.1. This sort of model was considered in Grenander (1954). One means of handling it is to determine the least squares estimates θ1(T), . . . , θJ(T) of θ1, . . . , θJ by minimizing

and then to estimate fεε(λ) from the residual series

t = 0, . . . , T − 1. We proceed to an investigation of the asymptotic properties of such a procedure. We set down an assumption concerning the functions φ1(t), . . . , φJ(t).

Assumption 5.11.1 Given the real-valued functions φj(t), t = 0, ±1, . . . , j = 1, . . . , J, there exists a sequence NT, T = 1, 2, . . . with the properties NT → ∞, NT+1/NT → 1 as T → ∞ such that

As examples of functions satisfying this assumption we mention


for constant Rj, ωj, φj, j = 1, . . . , J. We see directly that

taking NT = T. Other examples are given in Grenander (1954).

We suppose that mφjφk(u) is taken as the entry in row j and column k of the J × J matrix mφφ(u), j, k = 1, . . . , J. It follows from Exercise 2.13.31 that there exists a J × J matrix-valued function Gφφ(λ), −π < λ ≤ π, whose entries are of bounded variation, such that

for u = 0, ±1, . . . . We may now state

Theorem 5.11.1 Let ε(t), t = 0, ±1, . . . be a real-valued series satisfying Assumption 2.6.2(1), having 0 mean and power spectrum fεε(λ), −∞ < λ < ∞. Let φj(t), j = 1, . . . , J, t = 0, ±1, . . . satisfy Assumption 5.11.1 with mφφ(0) nonsingular. Let X(t) be given by (5.11.12) for some constants θ1, . . . , θJ. Let θ1(T), . . . , θJ(T) be the least squares estimates of θ1, . . . , θJ. Let e(t) be given by (5.11.14), and

where W(T)(α) = Σj W(BT−1[α + 2πj]) and W(α) satisfies Assumption 5.6.1. Then the variate θ(T) = [θ1(T) · · · θJ(T)] has mean θ = [θ1 · · · θJ]. Its covariance matrix satisfies

and it is asymptotically normal. If BTT → ∞ as T → ∞, then the variate fee(T)(λ) is asymptotically independent of θ(T) with mean

Its covariance function satisfies

and finite collections of estimates fee(T)(λ1), . . . , fee(T)(λK) are asymptotically jointly normal.


Under the limiting procedure adopted, the asymptotic behavior of fee(T)(λ) is seen to be the same as that of an estimate fεε(T)(λ) based directly on the series ε(t), t = 0, ±1, . . . . We have already seen this in the case of a series of unknown mean, corresponding to J = 1, φ1(t) = 1, θ1 = cX, θ1(T) = cX(T) = T−1 Σt=0T−1 X(t). The theorem has the following:

Corollary 5.11.1 Under the conditions of the theorem, θ(T) and fee(T)(λ) are consistent estimates of θ and fεε(λ), respectively.

Other results of the character of this theorem will be presented in Chapter 6. Papers related to this problem include Grenander (1954), Rosenblatt (1956a), and Hannan (1968). Koopmans (1966) is also of interest. A common empirical procedure is to base a spectral estimate on the series of first differences, X(t) − X(t − 1), t = 1, . . . , T − 1. This has the effect of removing a linear trend directly. (See Exercise 3.10.2.)
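That first differencing removes a linear trend directly may be seen in a few lines. A Python sketch with a synthetic trend-plus-noise series (the coefficients and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
t = np.arange(T)
X = 2.0 + 0.05 * t + rng.standard_normal(T)   # noise plus a linear trend

# First differences X(t) - X(t-1), t = 1, ..., T-1: the trend
# contributes only the constant 0.05 to each difference
e = np.diff(X)

# A straight line fitted to the differenced series has essentially
# zero slope, while the original series has slope near 0.05
slope_e = np.polyfit(np.arange(T - 1), e, 1)[0]
slope_X = np.polyfit(t, X, 1)[0]
print(slope_X, slope_e)
```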

A departure of much more serious consequence than those considered so far in this section is one in which cov{X(t + u), X(t)} depends on both t and u. Strictly speaking, a power spectrum is no longer well defined; see Loynes (1968). However, in the case that cov{X(t + u), X(t)} depends only weakly on t we can sometimes garner important information from a stretch of series by spectral-type calculations. We can proceed by forming spectral estimates of the types considered in this chapter, but based on segments of the data, rather than all the data, for which the assumption of stationarity does not appear to be too seriously violated. The particular spectral estimate which seems especially well suited to such calculations is the one constructed by averaging the squared output of a bank of band-pass filters (see Section 5.9). Papers discussing this approach include Priestley (1965) and Brillinger and Hatanaka (1969).

A departure from assumptions of an entirely different character is the following: suppose the series X(t) is defined for all real numbers t, −∞ < t < ∞. (Until this point we have considered X(t) defined for t = 0, ±1, . . . .) Suppose

is defined for −∞ < t, u < ∞ and satisfies

then both


−∞ < λ < ∞, are defined. The function fXX(λ), −∞ < λ < ∞, is called the power spectrum of the discrete series X(t), t = 0, ±1, . . . , whereas gXX(λ), −∞ < λ < ∞, is called the power spectrum of the continuous series X(t), −∞ < t < ∞. The spectrum gXX(λ) may be seen to have very much the same character, behavior, and interpretation as fXX(λ). The two spectra fXX(λ) and gXX(λ) are intimately related because from (5.11.26) we have

for u = 0, ±1, . . . . From (5.11.25)

giving

We see from expression (5.11.29) that a frequency λ in the discrete series X(t), t = 0, ±1, . . . relates to the frequencies λ, λ ± 2π, . . . of the continuous series X(t), −∞ < t < ∞. As fXX(λ) = fXX(−λ), it also relates to the frequencies −λ, −λ ± 2π, . . . . For this reason the frequencies

and

have been called aliases by Tukey. It will be impossible to distinguish their individual character by means of fXX(λ) alone. As an example of the meaning of this, consider the series

−∞ < t < ∞, where φ is uniform on (−π, π). Considering this continuous series we have

−∞ < u < ∞, and from the definition (5.11.26)


This function has infinite peaks at λ = ±ω. Now considering the function (5.11.32) for t = 0, ±1, . . . , we have from (2.10.8) or (5.11.29)

This function has infinite peaks at the frequencies λ = ±ω + 2πj, j = 0, ±1, . . . and so ω cannot be determined directly, but only said to be one of these frequencies. An implication, for practice, of this discussion is that if a power spectral estimate fXX(T)(λ) is computed for 0 ≤ λ ≤ π, and is found to have a peak at the frequency ω, we cannot be sure which of the frequencies ±ω + 2πj, j = 0, ±1, . . . might be leading to the peak.

An example of such an occurrence with which the author was once concerned is the following: data were to be taken periodically on the number of electrons entering a conical horn on a spinning satellite of the Explorer series. The electron field being measured was highly directional in character and so the data could be expected to contain a substantial periodic component whose period was that of the satellite's rotation. It was planned that the satellite would rotate at such a rate, and the data would be taken with such a time interval, that the frequency of rotation of the satellite would fall in the interval 0 < λ < π. Unfortunately the satellite ended up spinning substantially more rapidly than planned and so the frequency of rotation fell outside the interval 0 < λ < π. The spectrum of the data was estimated and found to contain a substantial peak. It then had to be decided which of the aliased frequencies was the relevant one. This was possible on this occasion because optical information was available suggesting a crude value for the frequency.

Sometimes a prefiltering of the data can be carried out to reduce the difficulties of interpretation caused by aliasing. Suppose that the continuous time series X(t), −∞ < t < ∞, is band-pass filtered to a band of the sort [−πj − π, −πj], [πj, πj + π] prior to recording values at the times t = 0, ±1, . . . . In this case we see from (5.11.29) that

and the interpretation is consequently simplified.

We conclude this section by indicating some of the effects of sampling the series X(t), −∞ < t < ∞, at a general time spacing h > 0. The values recorded for analysis in the time interval [0, T) are now X(uh), u = 0, . . . , U − 1, where U = T/h. If the series is stationary with cov{X(uh), X(0)} = cXX(uh), u = 0, ±1, . . . and


we define the power spectrum fXX(λ), −∞ < λ < ∞, by

and have the inverse relation

The power spectrum fXX(λ) is seen to have period 2π/h. As fXX(−λ) = fXX(λ), its fundamental domain may be taken to be the interval [0, π/h]. The expression (5.11.29) is replaced by

The upper limit of the interval [0, π/h], namely π/h, is called the Nyquist frequency or folding frequency. If the series X(t), −∞ < t < ∞, possesses no components with frequency greater than the Nyquist frequency, then

for |λ| ≤ π/h and no aliasing complications arise.

When we come to estimate fXX(λ) from the stretch X(uh), u = 0, . . . , U − 1, T = Uh, we define

and proceed by smoothing IXX(T)(λ), for example.

The problem of aliasing was alluded to in the discussion of Beveridge (1922). Discussions of it were given in Press and Tukey (1956) and Blackman and Tukey (1958).

5.12 THE USES OF POWER SPECTRUM ANALYSIS

In Chapter 1 of this work we documented some of the various fields of applied research wherein the frequency analysis of time series had proven useful. In this section we indicate some examples of particular uses of the power spectrum.

A Descriptive Statistic Given a stretch of data, X(t), t = 0, . . . , T − 1, the function fXX(T)(λ) is often computed simply as a descriptive statistic. It condenses the data, but not too harshly. For stationary series its approximate sampling properties are elementary. Its form is often more elementary than that of the original record. It has been computed in the hope that an underlying mechanism generating the data will be suggested to the experimenter. Wiener (1957, 1958) discusses electroencephalograms in this manner. The spectrum fXX(T)(λ) has been calculated as a direct measure of the power, in watts, of the various frequency components of an electric signal; see Bode (1945) for example. In the study of color (see Wright (1958)), the power spectrum is estimated as a key characteristic of the color of an object.

We have seen that the power spectrum behaves in an elementary manner when a series is filtered. This has led Nerlove (1964) and Godfrey and Karreman (1967) to use it to display the effect of various procedures that have been proposed for the seasonal adjustment of economic time series. Cartwright (1967) used it to display the effect of tidal filters.

Some further references, taken from a variety of fields, include Condit and Grum (1964), Haubrich (1965), Yamanouchi (1961), Manwell and Simon (1966), and Plageman et al. (1969).

Informal Testing and Discrimination The use of power spectra for testing and discrimination has followed their use as descriptive statistics. In the study of color, workers have noted that the spectra of objects of different color do seem to vary in a systematic way; see Wright (1958). Carpenter (1965) and Bullard (1966) question whether earthquakes and explosions have substantially different power spectra in the hope that these two could be discriminated on the basis of spectra calculated from observed seismograms. Also, the spectra derived from the EEGs of healthy and neurologically ill patients have been compared in the hope of developing a diagnostic tool; see Bertrand and Lacape (1943), Wiener (1957), Yuzuriha (1960), Suhara and Suzuki (1964), Alberts et al. (1965), and Barlow (1967).

We have seen that the power spectrum of a white noise series is constant. The power spectrum has, therefore, been used on occasion as an informal test statistic for pure noise; see Granger and Morgenstern (1963) and Press and Tukey (1956), for example. It is especially useful if the alternative is some other form of stationary behavior. A common assumption of relationship between two series is that, up to an additive pure noise, one comes about from the other in some functional manner. After a functional form has been fit, its aptness can be measured by seeing how flat the estimated power spectrum of the residuals is; see also Macdonald and Ward (1963). The magnitude of this residual spectrum gives us a measure of the goodness of fit achieved. Frequency bands of poor fit are directly apparent.

A number of papers have examined economic theories through the spectrum; for example, Granger and Elliot (1968), Howrey (1968), and Sargent (1968).


Estimation Power spectra are of use in the estimation of parameters of interest. Sometimes we have an underlying model that leads to a functional form for the spectrum involving unknown parameters. The parameters may then be estimated from the experimental spectrum; see Whittle (1951, 1952a, 1961) and Ibragimov (1967). Many unknown parameters are involved if we want to fit a linear process to the data; see Ricker (1940) and Robinson (1967b). The spectrum is of use here. The shift of a peak in an observed spectrum from its standard position is used by astronomers to determine the direction of motion of a celestial body; see Bracewell (1965). The motion of a peak in a spectrum calculated on successive occasions was used by Munk and Snodgrass (1957) to determine the apparent presence of a storm in the Indian Ocean.

Search for Hidden Periodicities The original problem, leading to the definition of the second-order periodogram, was that of measuring the frequency of a (possibly) periodic phenomenon; see Schuster (1898). Peaks in fXX(T)(λ) do spring into immediate view and their broadness gives a measure of the accuracy of determination of the underlying frequency. The determination of the dominant frequency of brain waves is an important step in the analysis of a patient with possible cerebral problems; see Gibbs and Grass (1947). Bryson and Dutton (1961) searched for the period of sunspots in tree-ring records.
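The use of a periodogram peak to locate a hidden frequency (compare Exercise 5.13.18) can be illustrated in Python; the frequency, amplitude, and data below are synthetic, chosen only for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 1024
t = np.arange(T)
omega = 2 * np.pi * 60 / T          # a "hidden" frequency in (0, pi)
X = 2.0 * np.cos(omega * t + 0.3) + rng.standard_normal(T)

# Periodogram over 0 < lambda < pi; the largest ordinate locates the
# frequency of the periodic component
I = np.abs(np.fft.fft(X)) ** 2 / (2 * np.pi * T)
s = np.arange(1, T // 2)
lam_hat = 2 * np.pi * s[np.argmax(I[s])] / T

print(lam_hat, omega)
```

The broadness of the peak, of order 2π/T here, indicates the precision with which the frequency is determined.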

Smoothing and Prediction The accurate measurement of power spectra is an important stage on the way to determining Kolmogorov-Wiener smoothing and predicting formulas; see Kolmogorov (1941a), Wiener (1949), and Whittle (1963a). The problems of signal enhancement and of construction of optimum transmission forms for signals of harmonic nature (for example, human speech) fall into this area.

5.13 EXERCISES

5.13.1 If Y(t) = Σu a(t − u)X(u), while Ŷ(t) = Σu a(t − u)X̂(u), where X(t) is stationary with mean 0, prove that

while


and hence, if IXX(T)(2πj/T) is smoothed across its whole domain, the value obtained is cXX(T)(0). (This result may be employed as a check on the numerical accuracy of the computations.) Indicate a similar result concerning cXX(T)(u).

5.13.3 Let X(t), t = 0, ±1, ±2, . . . be a real-valued second-order stationary process with absolutely summable autocovariance function cXX(u) and power spectrum fXX(λ), −π < λ ≤ π. If fXX(λ) ≠ 0, show that there exists a summable filter b(u) such that the series E(t) = Σu b(t − u)X(u) has constant power spectrum. Hint: Take the transfer function of b(u) to be [fXX(λ)]−1/2 and use Theorem 3.8.3.
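The prewhitening construction of this exercise can be illustrated for a first-order autoregressive series, whose spectrum is known in closed form: there the transfer function [fXX(λ)]−1/2 is realized, up to a constant factor, by the filter b(0) = 1, b(1) = −a, with the coefficient a assumed known (a Python sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 4096
a = 0.8

# AR(1) series X(t) = a X(t-1) + eps(t): its spectrum is proportional
# to |1 - a exp(-i lam)|^(-2), far from constant
X = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    X[t] = a * X[t - 1] + eps[t]

# Filter with transfer function proportional to f_XX(lam)^(-1/2):
# E(t) = X(t) - a X(t-1), which recovers (essentially) white noise
E = X[1:] - a * X[:-1]

# Compare flatness of the two periodograms via band averages
def band_flatness(Y, bands=8):
    I = np.abs(np.fft.fft(Y)) ** 2 / (2 * np.pi * len(Y))
    n = len(I) // 2
    means = np.array([I[1 + k * n // bands: 1 + (k + 1) * n // bands].mean()
                      for k in range(bands)])
    return means.std() / means.mean()

flat_X = band_flatness(X)
flat_E = band_flatness(E)
print(flat_X, flat_E)               # the filtered series is far flatter
```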

5.13.4 Prove that exp{−IXX(T)(λ)/fXX(λ)} tends, in distribution, to a uniform variate on (0, 1) as T → ∞ under the conditions of Theorem 5.2.7.

5.13.5 Under the conditions of Theorem 5.2.7, prove that the statistics (πT)−1[Re dX(T)(λ)]² and (πT)−1[Im dX(T)(λ)]² tend, in distribution, to independent fXX(λ)χ1² variates.

5.13.6 Let JXX(T)(λ) be the smaller of the two statistics of the previous exercise and KXX(T)(λ) the larger. Under the previous conditions, prove that the interval [JXX(T)(λ), KXX(T)(λ)] provides an approximate 42 percent confidence interval for fXX(λ). See Durbin's discussion of Hannan (1967b).

5.13.7 Prove that the result of Theorem 5.2.6 is exact, rather than asymptotic, if X(0), . . . , X(T − 1) are mean 0, variance σ² independent normal variates.

5.13.8 Let W(α) = 0 for α < A, α > B, B > A, A, B finite. Prove that the asymptotic variance given in (5.6.13) is minimized by setting W(α) = (B − A)−1 for A ≤ α ≤ B.

5.13.9 Prove, under regularity conditions, that the periodogram is a consistent estimate of fXX(λ) if fXX(λ) = 0.

5.13.10 Let Y(t) be a series with power spectrum fYY(λ) and Z(t) an independent series with power spectrum fZZ(λ). Let X(t) = Y(t) for 0 ≤ t ≤ T/2 and X(t) = Z(t) for T/2 < t ≤ T − 1. Determine the approximate statistical properties of IXX(T)(λ).

integer.

5.13.14 Under the conditions of Theorem 5.6.2 prove that

5.13.2 Prove that


If A(α) has a bounded first derivative, then it equals O(1) uniformly in λ.

5.13.18 Suppose X(t) = R cos(αt + φ) + ε(t), t = 0, . . . , T − 1 where the ε(t) are independent N(0, σ²) variates. Show that the maximum likelihood estimate of α is approximately the value of λ that maximizes IXX(T)(λ); see Walker (1969).

5.13.19 Let X(t), t = 0, ±1, . . . be real-valued, satisfy Assumption 2.6.2(1), and have mean 0. Let W(α) satisfy Assumption 5.6.1. If BTT → ∞ as T → ∞ show that

5.13.15 (a) Prove that


where DT−1(α) is given by (5.9.10). Hint: Use expression (3.2.5).
(b) Prove that

5.13.17 Under Assumption 2.6.1, prove that if

then

5.13.20 Let X(t), t = 0, ±1, . . . be a real-valued series satisfying Assumption 2.6.1. Let cX(T) = T−1 Σt=0T−1 X(t). Show that √T(cX(T) − cX) is asymptotically independent of √T(cXX(T)(u) − cXX(u)), which is asymptotically normal with mean 0 and variance

5.13.16 Prove that


5.13.21 Under the conditions of Theorem 5.6.3, show that √T(cX(T) − cX) and √(BTT)[fXX(T)(λ) − EfXX(T)(λ)] are asymptotically independent and normal.

5.13.22 Show that the expected value of the modified periodogram (5.3.13) is given by

tends in distribution to a Student's t with 2m degrees of freedom. (This result may be used to set approximate confidence limits for cX.)

5.13.26 Let X(t), t = 0, ±1, . . . be a series with EX(t) = 0 and cov{X(t + u), X(t)} = cXX(u) for t, u = 0, ±1, . . . . Suppose

and tends to fXX(λ) for λ ≢ 0 (mod 2π).
5.13.23 Under the conditions of Theorem 5.2.6 prove that

for some finite K. Conclude that supλ |fXX(T)(λ) − EfXX(T)(λ)| tends to 0 in probability if BTT → ∞.

5.13.25 Under the conditions of Theorem 5.4.3 show that

Show that

Show that there exists a finite K such that

for u = 0, ±1, . . . and T = 1, 2, . . . .
5.13.27 Let the real-valued series X(t), t = 0, ±1, . . . be generated by the autoregressive scheme

5.13.24 Let fXX(T)(λ) be given by (5.6.1) with W(α) bounded. Suppose


for t = 0, ±1, . . . where ε(t) is a series of independent identically distributed random variables with mean 0 and finite fourth-order moment. Suppose all the roots of the equation

is asymptotically normal with EGXX(T)(λ) = λ + O(T−1) and

5.13.32 Use Exercise 2.13.31 to show that (5.11.20) may be written

satisfy |z| > 1. Let a(T)(1), . . . , a(T)(m) be the least squares estimates of a(1), . . . , a(m). They minimize

Show that √T[a(T)(1) − a(1), . . . , a(T)(m) − a(m)] tends in distribution to Nm(0, [cXX(j − k)]−1) as T → ∞. This result is due to Mann and Wald (1943).

5.13.28 If a function g(x) on [0, 1] satisfies |g(x) − g(y)| ≤ G|x − y|α for some α, 0 < α ≤ 1, show that

5.13.29 Show that fXX(T)(λ), given by (5.6.1), satisfies

5.13.30 Under the conditions of Theorem 5.2.1 and cX = 0, show that

5.13.31 Under the conditions of Theorem 5.10.1, show that

5.13.33 With the notation of Section 5.11, show that fXX(λ) ≥ gXX(λ) for all λ.


6

ANALYSIS OF A LINEAR TIME INVARIANT RELATION BETWEEN A STOCHASTIC SERIES AND SEVERAL DETERMINISTIC SERIES

6.1 INTRODUCTION

Let Y(t), ε(t), t = 0, ±1, . . . be real-valued stochastic series and let X(t), t = 0, ±1, . . . be an r vector-valued fixed series. Then suppose μ is a constant and that {a(u)} is a 1 × r filter. Hence, in this chapter we shall be concerned with the investigation of relations that have the form

We will assume that the error series ε(t) is stationary with 0 mean and power spectrum fεε(λ). This power spectrum is called the error spectrum; it is seen to measure the extent to which the series Y(t) is determinable from the series X(t) by linear filtering. We will assume throughout this text that values of the dependent series Y(t) and values of the independent series X(t) are available for t = 0, . . . , T − 1. Because Eε(t) = 0,

That is, the expected value of Y(t) is a filtered version of X(t). Note from relation (6.1.2) that the series Y(t) is not generally stationary. However, for k > 1,

and so the cumulants of Y(t) of order greater than 1 are stationary.


The transfer function of the filter {a(u)} is given by

Let us consider the behavior of this transfer function with respect to filterings of the series Y(t), X(t). Let {b(u)} be an r × r filter with inverse {c(u)}. Let {d(u)} be a 1 × 1 filter. Set

then

Set

and

The relation (6.1.1) now yields

where

That is, the relation between the filtered series Y₁(t), X₁(t), ε₁(t) has the same form as the relation (6.1.1). In terms of transfer functions, (6.1.11) may be written

or

We see that the transfer function relating Y(t) to X(t) may be determined from the transfer function relating Y₁(t) to X₁(t) provided the required inverses exist. We note in passing that similar relations exist even if Y₁(t) of (6.1.7) involves X through a term

for some 1 × r filter {e(u)}. These remarks will be especially important when we come to the problem of prefiltering the series prior to estimating A(λ).

Throughout this chapter we will consider the case of deterministic X(t) and real-valued stochastic Y(t). Brillinger (1969a) considers the model

t = 0, ±1, . . . where X(t) is deterministic and Y(t), ε(t) are s vector-valued. In Chapter 8 the model (6.1.15) is considered with X(t) stochastic.

6.2 LEAST SQUARES AND REGRESSION THEORY

Two classical theorems form the basis of least squares and linear regression theory. The first is the Gauss-Markov Theorem or

Theorem 6.2.1 Let

where ε is a 1 × n matrix of random variables with Eε = 0, Eε^T ε = σ²I, a is a 1 × k matrix of unknown parameters, and X is a k × n matrix of known values. Then

is minimized, for choice of a, by â = YX^T(XX^T)^{-1} if XX^T is nonsingular. The minimum achieved is Y(I − X^T(XX^T)^{-1}X)Y^T. Also Eâ = a, the covariance matrix of â is given by E(â − a)^T(â − a) = σ²(XX^T)^{-1}, and if σ̂² = (n − k)^{-1} Y(I − X^T(XX^T)^{-1}X)Y^T then Eσ̂² = σ². In addition, â is the minimum variance linear unbiased estimate of a.

These results may be found in Kendall and Stuart (1961) Chapter 19, for example. The least squares estimate of a is â. Turning to distributional aspects of the above â and σ̂² we have

Theorem 6.2.2 If, in addition to the conditions of Theorem 6.2.1, the n components of ε have independent normal distributions, then â^T is N_k(a^T, σ²(XX^T)^{-1}), and σ̂² is σ²χ²_{n−k}/(n − k) independent of â.
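The closed forms of Theorems 6.2.1 and 6.2.2 are easy to check numerically. The sketch below (Python with NumPy, in the row-vector convention above; the function and variable names are ours, not the book's) computes â = YX^T(XX^T)^{-1} and σ̂² = (n − k)^{-1} Y(I − X^T(XX^T)^{-1}X)Y^T on simulated data:

```python
import numpy as np

def least_squares_rows(Y, X):
    """Least squares in the row-vector convention of Theorem 6.2.1:
    Y is 1 x n, X is k x n, and a-hat = Y X^T (X X^T)^{-1}."""
    n = Y.shape[1]
    k = X.shape[0]
    XXT_inv = np.linalg.inv(X @ X.T)           # (X X^T)^{-1}, assumed nonsingular
    a_hat = Y @ X.T @ XXT_inv                  # 1 x k estimate
    resid = Y @ (np.eye(n) - X.T @ XXT_inv @ X) @ Y.T
    sigma2_hat = resid.item() / (n - k)        # unbiased for sigma^2
    return a_hat, sigma2_hat

# simulated check: Y = a X + noise
rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(k, n))
a = np.array([[1.0, -2.0, 0.5]])
Y = a @ X + 0.1 * rng.normal(size=(1, n))
a_hat, s2 = least_squares_rows(Y, X)
```

Any standard least squares routine returns the same â; the closed form is only attractive when XX^T is well conditioned.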


It follows directly from Theorem 6.2.2 that

is noncentral F, degrees of freedom k over n − k and noncentrality parameter aXX^T a^T/σ². We see that the hypothesis a = 0 may be tested by noting that (6.2.3) has a central F_{k,n−k} distribution when the hypothesis holds. A related statistic is

the squared sample multiple correlation coefficient. It may be seen that 0 ≤ R_YX² ≤ 1. Also from (6.2.3) we see that

and so its distribution is determinable directly from the noncentral F.

Suppose â_j and a_j denote the jth entries of â and a respectively and c_jj denotes the jth diagonal entry of (XX^T)^{-1}. Then confidence intervals for a_j may be derived through the pivotal quantity,

which has a t_{n−k} distribution.

These results apply to real-valued random variables and parameters. In fact, in the majority of cases of concern to us in time series analysis we will require extensions to the complex-valued case. We have

Theorem 6.2.3 Let

where ε is a 1 × n matrix of complex-valued random variables with Eε = 0, Eε^T ε = 0, Eε̄^T ε = σ²I, a is a 1 × k matrix of unknown complex-valued parameters, X is a k × n matrix with known complex-valued entries and Y is a 1 × n matrix of known complex-valued entries. Then

is minimized, for choice of a, by â = YX^T(XX^T)^{-1} if XX^T is nonsingular. The minimum achieved is Y(I − X^T(XX^T)^{-1}X)Y^T. Also Eâ = a, E(â − a)^T(â − a) = 0 and E(â − a)*(â − a) = (XX^T)^{-1}σ². If σ̂² = (n − k)^{-1} Y(I − X^T(XX^T)^{-1}X)Y^T, then Eσ̂² = σ².

Turning to distributional aspects, we have


Theorem 6.2.4 If, in addition to the conditions of Theorem 6.2.3, the components of ε have independent N₁^C(0, σ²) distributions, then â^T is N_k^C(a^T, (XX^T)^{-1}σ²) and σ̂² is σ²χ²_{2(n−k)}/[2(n − k)] independent of â.

We may conclude from this theorem that

is noncentral F, degrees of freedom 2k over 2(n − k) and noncentrality parameter aXX^T a^T/σ². This statistic could be used to test the hypothesis a = 0. A related statistic is

the squared sample complex multiple correlation coefficient. It may be seen directly that 0 ≤ |R_YX|² ≤ 1. Also from (6.2.10) we see that

and so its distribution is determinable directly from the noncentral F. Theorems 6.2.3 and 6.2.4 above are indicated in Akaike (1965). Under the conditions of Theorem 6.2.4, Khatri (1965a) has shown that â and (n − k)σ̂²/n are the maximum likelihood estimates of a and σ².

An important use of the estimate â is in predicting the expected value of y₀, the variate associated with a given x₀. In this connection we have

Theorem 6.2.5 Suppose the conditions of Theorem 6.2.4 are satisfied.Suppose also

where ε₀ is independent of ε of (6.2.7). Let ŷ₀ = âx₀; then ŷ₀ is distributed as N₁^C(ax₀, σ²x₀^T(XX^T)^{-1}x₀) and is independent of σ̂².

On a variety of occasions we will wish to construct confidence regions for the entries of a of expression (6.2.7). In the real-valued case we saw that confidence intervals could be constructed through the quantity (6.2.6) which has a t distribution under the conditions of Theorem 6.2.2. In the present case complications arise because a has complex-valued entries.

Let â_j and a_j denote the jth entries of â and a respectively. Let c_jj denote the jth diagonal entry of (XX^T)^{-1}. Let w_j denote

The quantity


has the form v^{-1/2}z where z is N₁^C(0,1) and v is independently χ²_{2(n−k)}/[2(n − k)]. Therefore

has an F_{2,2(n−k)} distribution. A 100β percent confidence region for Re a_j, Im a_j may thus be determined from the inequality

where F(β) denotes the upper 100β percent point of the F distribution. We note that this region has the form of a circle centered at Re â_j, Im â_j.

On other occasions it may be more relevant to set confidence intervals for |a_j| and arg a_j. One means of doing this is to derive a region algebraically from expression (6.2.16). Let

then (6.2.16) is, approximately, equivalent to the region

This region was presented in Goodman (1957) and Akaike and Yamanouchi(1962).

The region (6.2.18) is only approximate. An exact 100γ percent interval for |a_j| may be determined by noting that

is noncentral F with degrees of freedom 2 and 2(n − k) and noncentrality parameter |a_j|²/2. Tables for the power of the F test (see Pearson and Hartley (1951)) can now be used to construct an exact confidence interval for |a_j|. The charts in Fox (1956) may also be used.

Alternatively we could use expression (6.2.15) to construct an approximate 100γ percent confidence interval by approximating its distribution by a central F with degrees of freedom

and 2(n − k). This approximation to the noncentral F is given in Abramowitz and Stegun (1964); see also Laubscher (1960).

In the case of φ_j = arg a_j we can determine an exact 100δ percent confidence interval by noting that


has a t_{2(n−k)} distribution. It is interesting to note that this procedure is related to the Creasy-Fieller problem; see Fieller (1954) and Halperin (1967). The two exact confidence procedures suggested above are given in Groves and Hannan (1968).

If simultaneous intervals are required for several of the entries of a, then one can proceed through the complex generalization of the multivariate t. See Dunnett and Sobel (1954), Gupta (1963a), Kshirsagar (1961), and Dickey (1967) for a discussion of the multivariate t distribution. However, we content ourselves with defining the complex t distribution. Let z be distributed as N₁^C(0,1) and independently let s² be distributed as χ²_n/n. Then t = z/s has a complex t distribution with n degrees of freedom. If u = Re t, v = Im t, then its density is given by

−∞ < u, v < ∞. A related reference is Hoyt (1947).

6.3 HEURISTIC CONSTRUCTION OF ESTIMATES

We can now construct estimates of the parameters of interest. Set

The model (6.1.1) then takes the form

The values X(t), t = 0, . . . , T − 1 are available and therefore we can calculate the finite Fourier transform

In the present situation, it is an r vector-valued statistic. Define

The approximate relation between d_Y^(T)(λ) and d_X^(T)(λ) is given by

Lemma 6.3.1 Suppose that |X(t)| ≤ M, t = 0, ±1, . . . and that


and

if −∞ < α < ∞ with |α − λ| ≤ LT^{-1}.

Let s(T) be an integer with 2πs(T)/T near λ. Suppose T is large. From expression (6.3.6)

for s = 0, ±1, . . . , ±m say. If ε(t) satisfies Assumption 2.6.1, then following Theorem 4.4.1 the quantities d_ε^(T)(2π[s(T) + s]/T), s = 0, ±1, . . . , ±m are approximately N₁^C(0, 2πTf_εε(λ)) variates. Relation (6.3.7) is seen to have the form of a multiple regression relation involving complex-valued variates. Noting Theorem 6.2.3, we define

Suppose that the r × r matrix f_XX^(T)(λ) is nonsingular. We now estimate A(λ) by

and f_εε(λ) by

Theorem 6.2.4 suggests the approximating distributions N_r^C(A(λ)^T, (2m + 1)^{-1} f_εε(λ) f_XX^(T)(λ)^{-1}) for A^(T)(λ)^T and [2(2m + 1 − r)]^{-1} f_εε(λ) χ²_{2(2m+1−r)} for g_εε^(T)(λ).

In the next sections we generalize the estimates (6.3.12) and (6.3.13) andmake precise the suggested approximate distributions.
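The construction just described can be sketched directly (Python/NumPy; function and variable names are ours, and the periodogram normalization I = (2πT)^{-1}|d|² is assumed from earlier chapters). The (cross-)periodograms are averaged over the 2m + 1 Fourier frequencies nearest λ, and Â(λ) and g_εε^(T)(λ) are formed in the manner of (6.3.12) and (6.3.13):

```python
import numpy as np

def transfer_function_estimate(Y, X, lam, m):
    """Heuristic Section 6.3 estimates for scalar Y and r vector-valued X:
    average the periodograms over 2m+1 neighbouring frequencies, then set
    A-hat = f_YX f_XX^{-1} and g-hat = f_YY - f_YX f_XX^{-1} f_XY."""
    T = len(Y)
    r = X.shape[1]
    dY = np.fft.fft(Y)                        # d_Y^(T) at frequencies 2*pi*s/T
    dX = np.fft.fft(X, axis=0)                # T x r
    s0 = int(round(lam * T / (2 * np.pi)))    # s(T): 2*pi*s(T)/T near lambda
    fYX = np.zeros((1, r), dtype=complex)
    fXX = np.zeros((r, r), dtype=complex)
    fYY = 0.0
    for s in range(s0 - m, s0 + m + 1):       # the 2m+1 nearest frequencies
        sm = s % T
        fYX += dY[sm] * dX[sm].conj()[None, :] / (2 * np.pi * T)
        fXX += np.outer(dX[sm], dX[sm].conj()) / (2 * np.pi * T)
        fYY += abs(dY[sm]) ** 2 / (2 * np.pi * T)
    fYX /= 2 * m + 1; fXX /= 2 * m + 1; fYY /= 2 * m + 1
    A_hat = fYX @ np.linalg.inv(fXX)
    g_hat = (fYY - (A_hat @ fYX.conj().T).item()).real
    return A_hat, g_hat

# check on Y(t) = 2 X(t) + small noise, so A(lambda) = 2 at every lambda
rng = np.random.default_rng(1)
T = 1024
X = rng.normal(size=(T, 1))
Y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=T)
A_hat, g_hat = transfer_function_estimate(Y, X, lam=np.pi / 2, m=20)
```

Since g_hat is the Schur complement of an averaged non-negative definite matrix, it is automatically non-negative, as an error spectrum estimate should be.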

As an estimate of μ we take


where c_Y^(T) and c_X^(T) are the sample means of the given Y and X values. We will find it convenient to use the statistic μ^(T) + A^(T)(0)c_X^(T) = c_Y^(T) in the statement of certain theorems below.

The heuristic approach given above is suggested in Akaike (1964, 1965), Duncan and Jones (1966), and Brillinger (1969a).

6.4 A FORM OF ASYMPTOTIC DISTRIBUTION

In this section we determine the asymptotic distribution of a class of elementary estimates, suggested by the heuristic arguments of Section 6.3, for the parameters A(λ) and f_εε(λ). The form and statistical properties of these estimates will depend on whether or not λ ≡ 0 (mod π). Consider three cases:

Case A  λ satisfies λ ≢ 0 (mod π)
Case B  λ satisfies λ ≡ 0 (mod 2π), or λ = ±π, ±3π, . . . and T is even
Case C  λ satisfies λ = ±π, ±3π, . . . and T is odd.

Suppose s(T) is an integer with 2πs(T)/T near λ. (We will later require 2πs(T)/T → λ as T → ∞.) Suppose m is a non-negative integer. Let I_YX^(T)(λ) be given by (6.3.8). Define

and

with similar definitions for f_YY^(T)(λ) and f_XX^(T)(λ). These estimates are based on the discrete Fourier transforms of the data and so may be computed by a Fast Fourier Transform Algorithm.
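As a sketch of that remark (Python/NumPy; the names are ours): at the frequencies 2πs/T, s = 0, . . . , T − 1, the finite Fourier transform d_X^(T)(λ) = Σ_{t=0}^{T−1} X(t) e^{−iλt} coincides with the discrete Fourier transform, so all T values are obtained in one FFT pass per component:

```python
import numpy as np

def finite_fourier_transform(X):
    """d_X^(T)(2*pi*s/T) = sum_t X(t) exp(-2*pi*i*s*t/T), s = 0, ..., T-1.
    X is a T x r array holding an r vector-valued series; the transform is
    taken down each column at once."""
    return np.fft.fft(np.asarray(X, dtype=float), axis=0)

# small check against the defining sum
T = 8
t = np.arange(T)
X = np.column_stack([np.cos(2 * np.pi * t / T), np.sin(2 * np.pi * t / T)])
d = finite_fourier_transform(X)
```

NumPy's `fft` uses exactly the e^{−iλt} sign convention of the text, so no conjugation is needed.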

As estimates of A(λ), f_εε(λ), μ we take


and finally take

as an estimate of μ. A theorem indicating the behavior of the mean of A^(T)(λ) in the present situation is

Theorem 6.4.1 Let ε(t), t = 0, ±1, . . . satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, . . . be uniformly bounded. Let Y(t) be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let A^(T)(λ) be given by (6.4.4) where f_YX^(T)(λ) is given by (6.4.1). Then

in Case A where

with C(m,r) a constant given by

for finite K. There are similar expressions in Cases B and C.

We note from expression (6.4.8) that EA^(T)(λ) is principally a matrix weighted average of the transfer function A(α). In addition, the expression suggests that the larger f_XX^(T)(λ) is, the smaller the departure from the weighted average will be. From Theorem 6.4.1 we may conclude

Corollary 6.4.1 Under the conditions of Theorem 6.4.1 and if ||f_XX^(T)(λ)^{-1}|| is bounded as T → ∞, A^(T)(λ) is asymptotically unbiased.

Turning to an investigation of asymptotic distributions we have

Theorem 6.4.2 Suppose the conditions of Theorem 6.4.1 are satisfied. Suppose also that f_XX^(T)(λ) is nonsingular for T sufficiently large and that 2πs(T)/T → λ as T → ∞. Then A^(T)(λ)^T is asymptotically N_r^C(A(λ)^T, (2m + 1)^{-1} f_εε(λ) f_XX^(T)(λ)^{-1}) in Case A, and is asymptotically N_r(A(λ)^T, (2m)^{-1} f_εε(λ) f_XX^(T)(λ)^{-1}) in Case B and Case C. Also g_εε^(T)(λ) tends to


f_εε(λ)χ²_{2(2m+1−r)}/[2(2m + 1 − r)] in Case A and to f_εε(λ)χ²_{2m−r}/(2m − r) in Case B and Case C. The limiting normal and χ² distributions are independent. Finally μ^(T) + A^(T)(0)c_X^(T) is asymptotically N₁(μ + A(0)c_X^(T), 2πT^{-1}f_εε(0)) independently of A^(T)(λ), g_εε^(T)(λ), −∞ < λ < ∞.

In the case λ ≢ 0 (mod π), Theorem 6.4.2 suggests the approximation

In the case λ ≡ 0 (mod π), the theorem suggests the approximate variance 2f_εε(λ)²/(2m − r).

The limiting distributions of the estimates of the gain and phase may be determined through

Corollary 6.4.2 Under the conditions of Theorem 6.4.2, functions of A^(T)(λ), g_εε^(T)(λ), μ^(T) + A^(T)(0)c_X^(T) tend in distribution to the same functions based on the limiting variates of the theorem.

In Section 6.9 we will use Theorem 6.4.2 and its corollary to set up confidence regions for the parameters of interest. The statistic

is often of special interest as it provides a measure of the strength of a linear time invariant relation between the series Y(t), t = 0, ±1, . . . and the series X(t), t = 0, ±1, . . . . Its large sample distribution is indicated by

Theorem 6.4.3 Suppose the conditions of Theorem 6.4.1 are satisfied and suppose |R_YX^(T)(λ)|² is given by (6.4.11). Then, in Case A,

as T → ∞ where F is a noncentral F with degrees of freedom 2r over 2(2m + 1 − r) and noncentrality parameter A(λ)f_XX^(T)(λ)A(λ)^T/f_εε(λ).

We will return to a discussion of this statistic in Chapter 8. The notation o_{a.s.}(1) means that the term tends to 0 with probability 1.

6.5 EXPECTED VALUES OF ESTIMATES OF THE TRANSFER FUNCTION AND ERROR SPECTRUM

We now turn to an investigation of the expected values of estimates of slightly more general form than those in the previous section. Suppose that


we are interested in estimating the parameters of the model (6.1.1) given the values X(t), Y(t), t = 0, . . . , T − 1. Let I_YX^(T)(λ) be given by (6.3.8) with similar definitions for I_YY^(T)(λ) and I_XX^(T)(λ). We will base our estimates on these statistics in the manner of (6.3.10); however, we will make our estimates more flexible by including a variable weighting of the terms in expression (6.3.10). Specifically, let W(α) be a weight function satisfying

Assumption 6.5.1 W(α), −∞ < α < ∞, is bounded, even, non-negative, equal to 0 for |α| > π and such that

The principal restrictions introduced here on W(α), over those of Assumption 5.6.1, are the non-negativity and finite support.

In order to reflect the notion that the weight function should become more concentrated as the sample size T tends to ∞, we introduce a band-width parameter B_T that depends on T. Also in order that our estimate possess required symmetries we extend the weight function periodically. We therefore define

The mass of W^(T)(α) is concentrated in intervals of width 2πB_T about α ≡ 0 (mod 2π) as T → ∞.
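As a small illustration (Python; the uniform kernel below is one admissible choice, not the book's), the periodized, rescaled weight function is a short finite sum, since W vanishes outside [−π, π]:

```python
import math

def W(alpha):
    """A weight function meeting Assumption 6.5.1: bounded, even,
    non-negative, vanishing for |alpha| > pi, with integral 1
    (the uniform kernel is used purely for illustration)."""
    return 1.0 / (2.0 * math.pi) if abs(alpha) <= math.pi else 0.0

def W_T(alpha, B_T):
    """Periodized, rescaled kernel
    W^(T)(alpha) = B_T^{-1} sum_j W(B_T^{-1}(alpha + 2*pi*j)).
    Only finitely many j contribute because W has support [-pi, pi]."""
    jmax = int(math.ceil(B_T)) + 1
    return sum(W((alpha + 2.0 * math.pi * j) / B_T)
               for j in range(-jmax, jmax + 1)) / B_T
```

With B_T = 0.1, say, the mass sits in an interval of width 2πB_T about each multiple of 2π, the period is 2π, and the integral over one period remains 1, as the construction requires.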

We now define

We see that W^(T)(α) is non-negative

and if B_T → 0 as T → ∞, then for T sufficiently large

As estimates of A(λ), f_εε(λ), μ we now take


and

respectively. If m is large, then definitions (6.5.9) and (6.4.5) are essentially equivalent.

Also A^(T)(λ) and g_εε^(T)(λ) have period 2π, while g_εε^(T)(λ) is non-negative and symmetric about 0. Finally μ^(T) is real-valued as is its corresponding population parameter μ.

A statistic that will appear in our later investigations is |R_YX^(T)(λ)|², given by

It will be seen to be a form of multiple correlation coefficient and may be seen to be bounded by 0 and 1, and it will appear in an essential manner in estimates of the variances of our statistics.

Because of the conditions placed on W(α), we see that

where the error terms are uniform in λ.

We make one important assumption concerning the sequence of fixed (as opposed to random) values X(t), and that is

Assumption 6.5.2 X(t), t = 0, ±1, . . . is uniformly bounded and if f_XX^(T)(λ) is given by (6.5.5), then there is a finite K such that

for all λ and T sufficiently large.

Turning to the investigation of the large sample behavior of A^(T)(λ) we have

Theorem 6.5.1 Let ε(t), t = 0, ±1, . . . satisfy Assumption 2.6.2(1), X(t), t = 0, ±1, . . . satisfy Assumption 6.5.2. Let Y(t), t = 0, ±1, . . . be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let W(α) satisfy Assumption 6.5.1. Let A^(T)(λ) be given by (6.5.8); then


We see that the expected value of A^(T)(λ) is essentially a (matrix) weighted average of the population function A(α) with weight concentrated in a neighborhood, of width 2πB_T, of λ. Because it is a matrix weighted average, an entanglement of the various components of A(α) has been introduced. If we wish to reduce the asymptotic bias, we should try to arrange for A(α) to be near constant in the neighborhood of λ. The weights in (6.5.14) depend on the values X(t), t = 0, . . . , T − 1. It would be advantageous to make I_XX^(T)(α) near constant as well and such that off-diagonal elements are near 0. The final expression of (6.5.14) suggests that the asymptotic bias of A^(T)(λ) is generally of the order of the band-width B_T. We have

Corollary 6.5.1 Under the conditions of Theorem 6.5.1 and if B_T → 0 as T → ∞, A^(T)(λ) is an asymptotically unbiased estimate of A(λ).

Let the entries of A(λ) and A^(T)(λ) be denoted by A_j(λ) and A_j^(T)(λ), j = 1, . . . , r, respectively. On occasion we may be interested in the real-valued gains

and the real-valued phases

These may be estimated by

and

Theorem 6.5.2 Under the conditions of Theorem 6.5.1,

and if A_j(λ) ≠ 0, then

(In this theorem, ave denotes an expected value derived in a term by term manner from a Taylor expansion; see Brillinger and Tukey (1964).)


Corollary 6.5.2 Under the conditions of Theorem 6.5.2 and if B_T → 0, B_T T → ∞ as T → ∞, G_j^(T)(λ) is an asymptotically unbiased estimate of G_j(λ).

Turning to the case of g_εε^(T)(λ), our estimate of the error spectrum, we have

Theorem 6.5.3 Under the conditions of Theorem 6.5.1,

This result may be compared instructively with expression (5.8.22) in thecase P = 1. In the limit we have

Corollary 6.5.3 Under the conditions of Theorem 6.5.3 and if B_T → 0, B_T T → ∞ as T → ∞, g_εε^(T)(λ) is an asymptotically unbiased estimate of f_εε(λ).

In the case of μ^(T) we may prove

Theorem 6.5.4 Under the conditions of Theorem 6.5.1,

From Theorem 6.5.4 follows

Corollary 6.5.4 Under the conditions of Theorem 6.5.4 and if B_T → 0 as T → ∞, μ^(T) is an asymptotically unbiased estimate of μ.

6.6 ASYMPTOTIC COVARIANCES OF THE PROPOSED ESTIMATES

In order to be able to assess the precision of our estimates we require the form of their second-order moments. A statistic that will appear in these moments is defined by

This statistic has the same form as f_XX^(T)(λ) given by expression (6.5.5) except that the weight function W(α) has been replaced by W(α)². Typically the latter is more concentrated; however, in the case that W(α) = (2π)^{-1} for |α| ≤ π


In a variety of cases it may prove reasonable to approximate h_XX^(T)(λ) by f_XX^(T)(λ). This has the advantage of reducing the number of computations required. Note that if f_XX^(T)(λ) is bounded, then the same is true for h_XX^(T)(λ). Thus we may now state

Theorem 6.6.1 Let ε(t), t = 0, ±1, . . . satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, . . . satisfy Assumption 6.5.2. Let Y(t), t = 0, ±1, . . . be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let W(α) satisfy Assumption 6.5.1. If B_T → 0 as T → ∞, then

In the case that (6.6.2) holds, the second expression of (6.6.3) has the form

an expression that may be estimated by

We see from expression (6.6.3) that the asymptotic variance of A^(T)(λ) is of order B_T^{-1}T^{-1} and so we have

Corollary 6.6.1 Under the conditions of the theorem and if B_T T → ∞ as T → ∞, A^(T)(λ) is a consistent estimate of A(λ).

We also note, from (6.6.3), that A^(T)(λ) and A^(T)(μ) are asymptotically uncorrelated for λ ≢ μ (mod 2π).

In practice we will record real-valued statistics. The asymptotic covariance structure of Re A^(T)(λ), Im A^(T)(λ) is given in Exercise 6.14.22. Alternatively we may record G_j^(T)(λ), φ_j^(T)(λ) and so we now investigate their asymptotic covariances. We define Ψ_jk^(T)(λ) to be the entry in the jth row and kth column of the matrix


Theorem 6.6.2 Under the conditions of Theorem 6.6.1 and if A_j(λ), A_k(λ) ≠ 0,

for j, k = 1, . . . , r.

Note that the asymptotic covariance structure of log G_j^(T)(λ) is the same as that of φ_j^(T)(λ) except in the cases λ ≡ 0 (mod π). We can construct estimates of the covariances of Theorem 6.6.2 by substituting estimates for the unknowns A_j(λ), f_εε(λ). We note that log G_j^(T)(λ) and φ_k^(T)(μ) are asymptotically uncorrelated for all j, k and λ, μ.

Turning to the investigation of g_εε^(T)(λ), we have

Theorem 6.6.3 Under the conditions of Theorem 6.6.1,

and

Expressions (6.6.11) and (6.6.12) should be compared with expressions (5.6.12) and (5.6.15). We see that under the indicated limiting processes the asymptotic behavior of the second-order moments of g_εε^(T)(λ) is the same as if g_εε^(T)(λ) were a power spectral estimate based directly on the values ε(t), t = 0, . . . , T − 1.

In the case of μ^(T) + A^(T)(0)c_X^(T) = c_Y^(T) we have

Corollary 6.6.3 Under the conditions of Theorem 6.6.3 and if B_T T → ∞ as T → ∞,

In the limit we have,


Theorem 6.6.4 Under the conditions of Theorem 6.6.1,

We may use expression (6.6.15) below to obtain an expression for the large sample variance of μ^(T). (See Exercise 6.14.31.) This variance tends to 0 as T → ∞ and so we have

Corollary 6.6.4 Under the conditions of the theorem and if B_T T → ∞ as T → ∞, μ^(T) is a consistent estimate of μ.

In the case of the joint behavior of A^(T)(λ), g_εε^(T)(λ), μ^(T) + A^(T)(0)c_X^(T) we have

Theorem 6.6.5 Under the conditions of Theorem 6.6.1

and

We see that g_εε^(T)(μ) is asymptotically uncorrelated with both A^(T)(λ) and μ^(T) + A^(T)(0)c_X^(T). Also A^(T)(λ) and μ^(T) + A^(T)(0)c_X^(T) are asymptotically uncorrelated.

In the case of the gains and phases we have

Theorem 6.6.6 Under the conditions of Theorem 6.6.1,

and

6.7 ASYMPTOTIC NORMALITY OF THE ESTIMATES

We next turn to an investigation of the asymptotic distributions of the estimates A^(T)(λ), g_εε^(T)(λ), μ^(T) under the limiting condition B_T T → ∞ as T → ∞.

Theorem 6.7.1 Let ε(t), t = 0, ±1, . . . satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, . . . satisfy Assumption 6.5.2. Let Y(t), t = 0, ±1, . . . be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let


W(α) satisfy Assumption 6.5.1. If B_T → 0, B_T T → ∞ as T → ∞, then A^(T)(λ₁), g_εε^(T)(λ₁), . . . , A^(T)(λ_J), g_εε^(T)(λ_J) are asymptotically jointly normal with covariance structure given by (6.6.3), (6.6.11), and (6.6.14). Finally μ^(T) + A^(T)(0)c_X^(T) is asymptotically independent of these variates with variance (6.6.13).

We see from expression (6.6.14) that A^(T)(λ) and g_εε^(T)(μ) are asymptotically independent for all λ, μ under the above conditions. From (6.6.3) we see that A^(T)(λ) and A^(T)(μ) are asymptotically independent if λ − μ ≢ 0 (mod 2π). From Exercise 6.14.22 we see Re A^(T)(λ) and Im A^(T)(λ) are asymptotically independent. All of these instances of asymptotic independence are in accord with the intuitive suggestions of Theorem 6.2.4.

The theorem indicates that A^(T)(λ) is asymptotically

if λ ≢ 0 (mod π) where Ψ^(T)(λ) is given by (6.6.6). This result will later be used to set confidence regions for A(λ).

Following a theorem of Mann and Wald (1943a), we may conclude

Corollary 6.7.1 Under the conditions of Theorem 6.7.1, log_e G_j^(T)(λ), φ_j^(T)(λ), g_εε^(T)(λ), c_Y^(T) = μ^(T) + A^(T)(0)c_X^(T) are asymptotically normal with covariance structure given by (6.6.7) to (6.6.10), (6.6.13), and (6.6.17) to (6.6.20) for j = 1, . . . , r.

We note in particular that, under the indicated conditions, log G_j^(T)(λ) and φ_j^(T)(λ) are asymptotically independent.

The asymptotic distribution of A^(T)(λ) given in this theorem is the same as that of Theorem 6.4.2 once one makes the identification

The asymptotic distribution of g_εε^(T)(λ) is consistent with that of Theorem 6.4.2 in the case that (6.7.2) is large, for a χ² variate with a large number of degrees of freedom is near normal.

6.8 ESTIMATING THE IMPULSE RESPONSE

In previous sections we have considered the problem of estimating the transfer function A(λ). We now consider the problem of estimating the corresponding impulse response function {a(u)}. In terms of A(λ) it is given by


Let A^(T)(λ) be an estimate of A(λ) of the form considered previously. Let P_T be a sequence of positive integers tending to ∞ with T. As an estimate of a(u) we consider

Note that because of the symmetry properties of A^(T)(λ), the range of summation in expression (6.8.2) may be reduced to 0 ≤ p ≤ (P_T − 1)/2 in terms of Im A^(T), Re A^(T). Also, the estimate has period P_T and so, for example,

We may prove

Theorem 6.8.1 Let ε(t), t = 0, ±1, . . . satisfy Assumption 2.6.1 and have mean 0. Let X(t), t = 0, ±1, . . . satisfy Assumption 6.5.2. Let Y(t) be given by (6.1.1) where {a(u)} satisfies Σ |u| |a(u)| < ∞. Let W(α) satisfy Assumption 6.5.1. Let a^(T)(u) be given by (6.8.2); then

We see that for large P_T and small B_T the expected value of the suggested estimate is primarily the desired a(u). A consequence of the theorem is

Corollary 6.8.1 Under the conditions of Theorem 6.8.1 and if B_T → 0, P_T → ∞ as T → ∞, a^(T)(u) is asymptotically unbiased.
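Since the estimate (6.8.2) is a finite Fourier inversion of A^(T) sampled at the P_T frequencies 2πp/P_T, it too can be computed by the FFT. A sketch (Python/NumPy; the illustrative filter is ours, not the book's):

```python
import numpy as np

def impulse_response_estimate(A_samples):
    """a-hat(u) = P_T^{-1} sum_p A-hat(2*pi*p/P_T) exp(i*2*pi*p*u/P_T),
    u = 0, ..., P_T - 1: an inverse DFT of the sampled transfer function
    estimate.  The result has period P_T, so u = -1 appears in slot P_T - 1."""
    return np.fft.ifft(np.asarray(A_samples))

# check on a known transfer function A(lambda) = 2 + exp(-i*lambda),
# whose impulse response is a(0) = 2, a(1) = 1, a(u) = 0 otherwise
P_T = 16
lam = 2 * np.pi * np.arange(P_T) / P_T
a_hat = impulse_response_estimate(2 + np.exp(-1j * lam))
```

NumPy's `ifft` carries exactly the P_T^{-1} normalization and e^{+i2πpu/P_T} sign appearing in the estimate.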

Next we turn to an investigation of the second-order moments of a^(T)(u). We have previously defined

and we now define

This expression is bounded under the conditions we have set down. We now have


Theorem 6.8.2 Under the assumptions of Theorem 6.8.1 and if B_T ≥ P_T^{-1}, B_T → 0 as T → ∞,

Note from (6.8.6) that the asymptotic covariance matrix of a^(T)(u) does not depend on u. Also the asymptotic covariance matrix of a^(T)(u) with a^(T)(v) depends on the difference u − v, so in some sense the process a^(T)(u), u = 0, ±1, . . . may be considered to be a covariance stationary time series. In the limit we have

Corollary 6.8.2 Under the conditions of Theorem 6.8.2 and if P_T B_T T → ∞ as T → ∞, a^(T)(u) is a consistent estimate of a(u).

Turning to the joint behavior of a^(T)(u) and g_εε^(T)(λ) we have

Theorem 6.8.3 Under the conditions of Theorem 6.8.1,

We see that a^(T)(u), g_εε^(T)(λ) are asymptotically uncorrelated for all u, λ. In the case of the limiting distribution we may prove

Theorem 6.8.4 Under the conditions of Theorem 6.8.1 and if P_T B_T → 0 as T → ∞, a^(T)(u₁), . . . , a^(T)(u_J), g_εε^(T)(λ₁), . . . , g_εε^(T)(λ_K) are asymptotically normal with covariance structure given by (6.6.10), (6.8.7), and (6.8.8).

In Theorem 6.8.2 we required B_T ≥ P_T^{-1}. From expression (6.8.7) we see that we should take P_T B_T as large as possible. Setting P_T = B_T^{-1} seems a sensible procedure, for the asymptotic variance of a^(T)(u) is then of order T^{-1}. However, in this case we are unable to identify its principal term from expression (6.8.7). In the case that P_T B_T → 0, the first term in (6.8.7) is the dominant one. Finally we may contrast the asymptotic order of this variance with that of A^(T)(λ) which was B_T^{-1}T^{-1}.

6.9 CONFIDENCE REGIONS

The confidence regions that will be proposed in this section will be based on the asymptotic distributions obtained in Section 6.4. They will be constructed so as to be consistent with the asymptotic distributions of Section 6.7.


Suppose estimates A^(T)(λ), μ^(T), g_εε^(T)(λ), a^(T)(u) have been constructed in the manner of Section 6.5 using a weight function W(α). A comparison of the asymptotic distributions obtained for A^(T)(λ) in Theorems 6.4.2 and 6.7.1 suggests that we set

Following Theorem 6.7.1 we then approximate the distribution of A^(T)(λ)^T by

and by

At the same time the distribution of g_εε^(T)(λ) is approximated by an independent

and by

A 100/3 percent confidence interval for/,,(X) is therefore provided by

in Case A, with similar intervals in Cases B and C. A confidence interval for log f_εε(λ) is algebraically deducible from (6.9.6).
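A sketch of the Case A computation (Python; the function names are ours, and the Wilson-Hilferty approximation stands in for chi-squared tables): treating ν g_εε^(T)(λ)/f_εε(λ) as χ²_ν with ν = 2(2m + 1 − r) gives the interval [ν g/χ²_ν((1 + β)/2), ν g/χ²_ν((1 − β)/2)]:

```python
from statistics import NormalDist

def chi2_ppf(p, nu):
    """Wilson-Hilferty approximation to the chi-squared quantile;
    adequate for the moderately large degrees of freedom arising here."""
    z = NormalDist().inv_cdf(p)
    return nu * (1 - 2 / (9 * nu) + z * (2 / (9 * nu)) ** 0.5) ** 3

def error_spectrum_interval(g_hat, m, r, beta=0.95):
    """Approximate 100*beta percent confidence interval for f_ee(lambda),
    Case A, from the pivot nu * g_hat / f_ee ~ chi-squared on
    nu = 2(2m + 1 - r) degrees of freedom."""
    nu = 2 * (2 * m + 1 - r)
    return (nu * g_hat / chi2_ppf((1 + beta) / 2, nu),
            nu * g_hat / chi2_ppf((1 - beta) / 2, nu))

lo, hi = error_spectrum_interval(g_hat=1.0, m=10, r=1)   # nu = 40
```

For g_hat = 1 and ν = 40 this gives roughly (0.67, 1.64); the interval scales linearly in g_hat, which is what makes the log f_εε(λ) interval an additive one.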

If we let c_jj denote the jth diagonal entry of

and w_j denote c_jj g_εε^(T)(λ), then following the discussion of Section 6.2 a 100β percent confidence region for Re A_j(λ), Im A_j(λ) may be determined from the inequality

This region is considered in Akaike (1965) and Groves and Hannan (1968). If a 100β percent simultaneous confidence region is desired for all the A_j(λ), j = 1, . . . , r, then following Exercise 6.14.17 we can consider the region


If we set

then the region (6.9.9) is approximately equivalent to the region

giving a simultaneous confidence region for the real-valued gains and phases. Regions of these forms are considered in Goodman (1965) and Bendat and Piersol (1966). The exact procedures based on (6.2.19) and (6.2.21) may also be of use in constructing separate intervals for |A_j(λ)| or φ_j(λ). They involve approximating the distribution of

by a noncentral F with degrees of freedom 2, 2(2m + 1 − r) and noncentrality |A_j(λ)|²/2, and approximating the distribution of

by a t_{2(2m+1−r)} distribution and then finding intervals by algebra. On occasion we might be interested in examining the hypothesis A(λ) = 0.

This may be carried out by means of analogs of the statistics (6.2.9) and (6.2.10), namely

and

In the case A(λ) = 0, (6.9.14) is distributed asymptotically as F_{2r,2(2m+1−r)} and the latter statistic as

respectively. We now turn to the problem of setting confidence limits for the entries of a(u). The investigations of Section 6.8 suggest the evaluation of the statistic


Let A_jj^(T) signify the jth diagonal entry of A^(T). Theorem 6.8.4 now suggests

6.10 A WORKED EXAMPLE

As a first example we investigate relations between the series, B(f), ofmonthly mean temperatures in Berlin and the series, V(t), of monthly meantemperatures in Vienna. Because these series have such definite annual varia-tion we first adjust them seasonally. We do this by evaluating the mean valuefor each month along the course of each series and then subtracting thatmean value from the corresponding month values. If Y(t) denotes the ad-justed series for Berlin, then it is given by

Figure 6.10.1 Seasonally adjusted series of monthly mean temperatures in °C at Berlin forthe years 1920-1930.

Figure 6.10.2 Seasonally adjusted series of monthly mean temperatures in °C at Viennafor the years 1920-1930.

as an approximate 100)8 percent confidence interval for aj(u).Simultaneous regions for O/,(MI), .. ., O//M/) may be constructed from

(6.9.18) using Bonferroni's inequality; see Miller (1966).

j = 0,. . ., 11; k = 0,.. . , K - 1 and K = T/12. LetX(t) likewise denotethe series of adjusted values for Vienna. These series are given in Figures6.10.1 and 2 for 1920-1930. The original series are given in Figure 1.1.1.
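The month-by-month adjustment just described can be sketched directly; a minimal illustration on synthetic data (the function name and the example series are ours, not the book's):

```python
import numpy as np

def seasonally_adjust(x, period=12):
    """Subtract each calendar month's mean from that month's observations,
    i.e. Y(12k + j) = B(12k + j) - mean over k of B(12k + j)."""
    x = np.asarray(x, dtype=float)
    adjusted = np.empty_like(x)
    for j in range(period):                    # j = 0, ..., 11 indexes the month
        idx = np.arange(j, len(x), period)
        adjusted[idx] = x[idx] - x[idx].mean()
    return adjusted

# A pure annual cycle is removed entirely:
t = np.arange(120)                             # ten years of monthly values
series = 10 + 5 * np.cos(2 * np.pi * t / 12)
adj = seasonally_adjust(series)
print(np.allclose(adj, 0.0))                   # → True
```

Applied to the Berlin and Vienna records, the same operation produces the Y(t) and X(t) of the text.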


The period for which we take these temperature series is 1780–1950. We determine the various statistics in the manner of Section 6.4. In fact we take T = 2048 and so are able to evaluate the required discrete Fourier transforms by means of the Fast Fourier Transform Algorithm. In forming the statistics f_YY^(T)(λ), f_YX^(T)(λ), f_XX^(T)(λ) we take m = 10.

The results of the calculations are recorded in a series of figures. Figure 6.10.3 is a plot of log10 f_YY^(T)(λ) and log10 g_εε^(T)(λ), the first being the upper curve. If we use expressions (5.6.15) and (6.6.12) we find that the asymptotic standard errors of these values are both .095 for λ ≢ 0 (mod π). Figure 6.10.4 is a plot of Re A^(T)(λ) which fluctuates around the value .85; Figure 6.10.5 is a plot of Im A^(T)(λ) which fluctuates around 0; Figure 6.10.6 is a plot of G^(T)(λ) which fluctuates around .9; Figure 6.10.7 is a plot of φ^(T)(λ) which fluctuates around 0; Figure 6.10.8 is a plot of |R_YX^(T)(λ)|² which fluctuates around .7. Remember that this statistic is a measure of the degree to which Y is determinable from X in a linear manner. Figure 6.10.9 is a plot of a^(T)(u) for |u| ≤ 50. Following (6.8.7) the asymptotic standard error

Figure 6.10.3 An estimate of the power spectrum of Berlin temperatures and an estimate of the error spectrum after fitting Vienna temperatures for the years 1780–1950.


Figure 6.10.4 Re A^(T)(λ), an estimate of the real part of the transfer function for fitting Berlin temperatures by Vienna temperatures.

of this statistic is .009. The value of a^(T)(0) is .85. The other values are not significantly different from 0.

Our calculations appear to suggest the relation

where the power spectrum of ε(t) has the form of the lower curve in Figure 6.10.3. We fitted the instantaneous relation by least squares and found the simple regression coefficient of Y(t) on X(t) to be .81. If we assume the ε(t) are independent and identically distributed, then the estimated standard error of this last is .015. The estimated error variance is 1.57.
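The statistics plotted in these figures can be imitated in outline: smooth the periodograms and cross-periodogram over 2m + 1 adjacent Fourier frequencies to get f_YY^(T), f_XX^(T), f_YX^(T), then form the transfer function estimate A^(T)(λ) = f_YX^(T)(λ)/f_XX^(T)(λ), its gain |A^(T)(λ)| and phase arg A^(T)(λ), and the coherence |R_YX^(T)(λ)|² = |f_YX^(T)|²/(f_YY^(T) f_XX^(T)). A rough sketch for scalar series (uniform smoothing rather than the book's exact weighting; all names are ours):

```python
import numpy as np

def cross_spectral_estimates(y, x, m=10):
    """Second-order spectral estimates for two scalar series: smooth the
    periodograms and cross-periodogram over 2m+1 Fourier frequencies and
    return (gain, phase, coherence) of the fit of y on x."""
    T = len(x)
    dy = np.fft.fft(y - np.mean(y))
    dx = np.fft.fft(x - np.mean(x))
    Iyy = np.abs(dy) ** 2 / (2 * np.pi * T)          # periodogram of y
    Ixx = np.abs(dx) ** 2 / (2 * np.pi * T)          # periodogram of x
    Iyx = dy * np.conj(dx) / (2 * np.pi * T)         # cross-periodogram

    def smooth(I):                                   # circular running mean
        k = np.ones(2 * m + 1) / (2 * m + 1)
        return np.convolve(np.concatenate([I[-m:], I, I[:m]]), k, "valid")

    fyy, fxx, fyx = smooth(Iyy), smooth(Ixx), smooth(Iyx)
    A = fyx / fxx                                    # transfer function estimate
    coherence = np.abs(fyx) ** 2 / (fyy * fxx)       # |R_YX|^2
    return np.abs(A), np.angle(A), coherence

# With Y(t) = .85 X(t) + noise the gain hovers near .85, the phase near 0,
# and the coherence near 1, mimicking Figures 6.10.4-6.10.8 in outline.
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
y = 0.85 * x + 0.1 * rng.standard_normal(2048)
gain, phase, coherence = cross_spectral_estimates(y, x, m=10)
```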

As a second example of the techniques of this chapter we present the results of a frequency regression of the series of monthly mean temperatures recorded at Greenwich on the monthly mean temperatures recorded at the thirteen other locations listed in Table 1.1.1. We prefilter these series by removing monthly means and a linear trend. Figure 1.1.1 presents the original data here.

Figure 6.10.5 Im A^(T)(λ), an estimate of the imaginary part of the transfer function for fitting Berlin temperatures by Vienna temperatures.

We form estimates in the manner of (6.4.1) to (6.4.5) with m = 57. The Fourier transforms required for these calculations were computed using a Fast Fourier Transform Algorithm with T = 2048. Now: Figure 6.10.10 presents G_j^(T)(λ), φ_j^(T)(λ) for j = 1, …, 13; Figure 6.10.11 presents log10 g_εε^(T)(λ); Figure 6.10.12 presents |R_YX^(T)(λ)|² as defined by (6.4.11). The power spectrum of Greenwich is estimated in Figure 7.8.8.

Table 6.10.1 gives the results of an instantaneous multiple regression of the Greenwich series on the other thirteen series. The estimated error variance of this analysis is .269. The squared coefficient of multiple correlation of the analysis is .858.

The estimated gains, G_j^(T)(λ), appear to fluctuate about horizontal levels as functions of λ. The highest levels correspond to Edinburgh, Basle, and De Bilt respectively. From Table 6.10.1 these are the stations having the


Figure 6.10.6 G^(T)(λ), an estimate of the amplitude of the transfer function for fitting Berlin temperatures by Vienna temperatures.

Figure 6.10.7 φ^(T)(λ), an estimate of the phase of the transfer function for fitting Berlin temperatures by Vienna temperatures.

Figure 6.10.8 |R_YX^(T)(λ)|², an estimate of the coherence of Berlin and Vienna temperatures for the years 1780–1950.

Figure 6.10.9 a^(T)(u), an estimate of the filter coefficients for fitting Berlin temperatures by Vienna temperatures.


Figure 6.10.10 Estimated gains and phases for fitting seasonally adjusted Greenwich monthly mean temperatures by similar temperatures at thirteen other stations for the years 1780–1950.


Figure 6.10.11 log10 g_εε^(T)(λ), the logarithm of the estimated error spectrum for fitting Greenwich temperatures by those at thirteen other stations.

Figure 6.10.12 |R_YX^(T)(λ)|², an estimate of the multiple coherence of Greenwich temperatures with those at thirteen other stations.


Table 6.10.1 Regression Coefficients of Greenwich on Other Stations

Location      Sample Regression Coefficient   Estimated Standard Error
Vienna                  -.071                          .021
Berlin                  -.125                          .023
Copenhagen               .152                          .022
Prague                  -.040                          .010
Stockholm               -.041                          .016
Budapest                -.048                          .019
De Bilt                  .469                          .022
Edinburgh                .305                          .014
New Haven                .053                          .009
Basle                    .338                          .016
Breslau                  .030                          .017
Vilna                   -.024                          .009
Trondheim               -.010                          .013

largest sample regression coefficients but in the order De Bilt, Basle, and Edinburgh. The estimated phases φ_j^(T)(λ), corresponding to these stations, are each near constant at 0, suggesting there is no phase lead or lag and that the relationship is instantaneous for these monthly values. As the estimated gains of the other stations decrease, the estimated phase function is seen to become more erratic. This was to be expected in view of expression (6.6.9) for the asymptotic variance of the phase estimate. Also, the estimated gain for New Haven, Conn., is least; this was to be expected in view of its great distance from Greenwich.

The estimated multiple coherence, |R_YX^(T)(λ)|², is seen to be near constant at the level .87. This is close to the value .858 obtained in the instantaneous multiple regression analysis. Finally, the estimated error spectrum, g_εε^(T)(λ), is seen to fall off steadily as λ increases.

6.11 FURTHER CONSIDERATIONS

We turn to an investigation of the nature of the dependence of the various results we have obtained on the independent series X(t). We first consider the bias of A^(T)(λ). From expressions (6.4.8) and (6.5.14) we see that the expected value of A^(T)(λ) is primarily a matrix weighted average of A(α) with weights depending on f_XX^(T)(α). From the form of the expressions (6.4.8) and (6.5.14) it would be advantageous if we could arrange that the function f_XX^(T)(α) be near constant in α and have off-diagonal terms near 0. Near 0 off-diagonal terms reduce the entanglement of the components of A(α). Continuing, an examination of the error term in (6.4.8) suggests that the weighted average term will dominate in the case that ||f_XX^(T)(λ)^{-1}|| is small, that is, f_XX^(T)(λ) is far from being singular.


Next we consider the asymptotic second-order properties of A^(T)(λ). Expression (6.6.4) and the results of Theorem 6.4.2 indicate that in order to reduce the asymptotic variances of the entries of A^(T)(λ), if it is possible, we should select an X(t), t = 0, ±1, … such that the diagonal entries of f_XX^(T)(λ)^{-1} are small. Suppose that f_jj^(T)(λ), j = 1, …, r, the diagonal entries of f_XX^(T)(λ), are given. Exercise 6.14.18 suggests that approximately

and that equality is achieved in the case that the off-diagonal elements are 0. We are again led to try to arrange that the off-diagonal elements of f_XX^(T)(λ) be near 0 and that its diagonal elements be large.

An additional advantage accrues from near 0 off-diagonal elements. From (6.6.4) we see that if they are near 0, then the statistics A_j^(T)(λ), A_k^(T)(λ), 1 ≤ j < k ≤ r, will be nearly uncorrelated and asymptotically nearly independent. Their interpretation and approximate properties will be more elementary.

In order to obtain reasonable estimates of A^(T)(λ), −∞ < λ < ∞, we have been led to seek an X(t), t = 0, ±1, … such that f_XX^(T)(α) is near constant in α, has off-diagonal terms near 0, and has large diagonal terms. We will see later that a choice of X(t) likely to lead to such an f_XX^(T)(α) is a realization of a pure noise process having independent components with large variance.

On the bulk of occasions we will be presented with X(t), t = 0, ±1, … as a fait accompli; however as we have seen in Section 6.1 we can alter certain of the properties of X(t) by a filtering. We could evaluate

t = 0, …, T − 1 for some r × r filter {c(u)} and then estimate the transfer function A_1(λ) relating Y(t) to X_1(t), t = 0, ±1, … . Let this estimate be A_1^(T)(λ). As an estimate of A(λ) we now consider

From (6.1.10) and (6.5.14)

suggesting that we should seek a filter C(λ) such that A(λ)C(λ)^{-1} does not vary much with λ. Applying such an operation is called prefiltering. It can be absolutely essential even in simple situations.
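One elementary prefilter, taken up next, is a pure delay: guess the lag v, shift X by it, and proceed with the spectral calculations. The suggested way of guessing v, maximizing the magnitude of the sample cross-covariance, can be sketched as follows (names ours):

```python
import numpy as np

def best_lag(y, x, max_lag=50):
    """Return the lag v maximizing |c_YX(v)|, the magnitude of the sample
    cross-covariance between y(t) and x(t - v)."""
    y = np.asarray(y, float) - np.mean(y)
    x = np.asarray(x, float) - np.mean(x)
    T = len(x)
    def ccov(v):
        if v >= 0:
            return np.dot(y[v:], x[:T - v]) / T
        return np.dot(y[:T + v], x[-v:]) / T
    return max(range(-max_lag, max_lag + 1), key=lambda v: abs(ccov(v)))

# y(t) = x(t - 7): the estimated delay is 7, and X1(t) = X(t - 7) would be
# the prefiltered series carried into the spectral calculations.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = np.roll(x, 7)
v = best_lag(y, x)
```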


Consider a common relationship in which Y(t) is essentially a delayed version of X(t); specifically suppose

for some v. In this case

and so for example

In the case that v is large, cos 2πsv/T fluctuates rapidly in sign as s varies. Because of the smoothing involved, expression (6.11.7) will be near 0, rather than the desired value.

In terms of the previous discussion, we are led to prefilter the data using the transfer function C(λ) = exp{−iλv}, that is, to carry out the spectral calculations with the series X_1(t) = X(t − v) instead of X(t). We would then estimate A(λ) here by exp{−iλv} f_YX1^(T)(λ)/f_X1X1^(T)(λ). This procedure was suggested by Darzell and Pearson (1960) and Yamanouchi (1961); see Akaike and Yamanouchi (1962). In practice the lag v must be guessed at before performing these calculations. One suggestion is to take the lag that maximizes the magnitude of the cross-covariance of the series Y(t) and X(t).

In Section 7.7 we will discuss a useful procedure for prefiltering in the case of vector-valued X(t). It is based on using least squares to fit a preliminary time domain model and then carrying out a spectral analysis of the residuals of the fit.

We have so far considered means of improving the estimate A^(T)(λ). The other estimates g_εε^(T)(λ), μ^(T), a^(T)(u) are based on this estimate in an intimate manner. We would therefore expect any improvement in A^(T)(λ) to result in an improvement of these additional statistics. In general terms we feel that the nearer the relation between Y(t) and X(t), t = 0, ±1, … is to multiple regression of Y(t) on X(t) with pure noise errors, the better the estimates will be. All prior knowledge should be used to shift the relation to one near this form.

A few comments on the computation of the statistics appear in order. The estimates have been based directly on the discrete Fourier transforms of the series involved. This was done to make their sampling properties more elementary. However, it will clearly make sense to save on computations by evaluating these Fourier transforms using the Fast Fourier Transform Algorithm. Another important simplification results from noting that the estimates of Section 6.4 can be determined directly from a standard multiple regression analysis involving real-valued variates. Consider the case λ ≢ 0 (mod π). Following the discussion of Section 6.3, the model (6.1.1) leads to the approximate relation

s = 0, ±1, …, ±m for 2πs(T)/T near λ. In terms of real-valued quantities this may be written

s = 0, ±1, …, ±m. Because the values Re d_ε^(T)(2π[s(T) + s]/T), Im d_ε^(T)(2π[s(T) + s]/T), s = 0, ±1, …, ±m are approximately uncorrelated variates with variance πT f_εε(λ), (6.11.10) has the approximate form of a multiple regression analysis with regression coefficient matrix

and error variance πT f_εε(λ). Estimates of the parameters of interest will therefore drop out of a multiple regression analysis taking the Y matrix as


and the X matrix as


Estimates in the case λ ≡ 0 (mod π) follow in a similar manner.

We remark that the model (6.1.1) is of use even in the case that X(t), t = 0, ±1, … is not vector valued. For example if one wishes to investigate the possibility of a nonlinear relation between real-valued series Y(t) and X(t), t = 0, ±1, … one can consider setting X_1(t) = X(t), X_2(t) = X(t)², and so forth in (6.1.1).

6.12 A COMPARISON OF THREE ESTIMATES OF THE IMPULSE RESPONSE

Suppose the model of this chapter takes the more elementary form

for some finite m, n. In this case the dependence of Y(t) on the series X(t), t = 0, ±1, … is of finite duration only. We turn to a comparison of three plausible estimates of the coefficients a(−m), …, a(0), …, a(n) that now suggest themselves. These are the estimate of Section 6.8, a least squares estimate, and an asymptotically efficient linear estimate.

We begin by noting that it is enough to consider a model of the simple form

for (6.12.1) may be rewritten

which is of the form of (6.12.2) with expanded dimensions.
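Of the three candidate estimates being compared, the least squares one can be sketched most directly: regress Y(t) on a constant and on lagged values of X(t). A minimal numerical illustration for scalar series under the finite-duration model Y(t) = μ + Σ_{u=−m}^{n} a(u)X(t − u) + ε(t) (function and variable names are ours, not the book's notation):

```python
import numpy as np

def ls_impulse_response(y, x, m, n):
    """Least squares estimate of mu and a(-m), ..., a(n) in
    Y(t) = mu + sum_{u=-m}^{n} a(u) X(t - u) + eps(t), scalar series."""
    T = len(y)
    rows = [[x[t - u] for u in range(-m, n + 1)] for t in range(n, T - m)]
    Z = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(Z, y[n:T - m], rcond=None)
    return coef[0], coef[1:]          # (mu_hat, a_hat for u = -m, ..., n)

# Exact recovery when the model holds without error:
rng = np.random.default_rng(3)
x = rng.standard_normal(400)
y = 0.5 + 1.5 * np.roll(x, 1)        # Y(t) = .5 + 1.5 X(t - 1)
mu_hat, a_hat = ls_impulse_response(y, x, m=1, n=2)
```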


The estimate of a of (6.12.2) suggested by the material of Section 6.8 is

From (6.6.4)

and so the covariance matrix of a_1^(T) is approximately

The particular form of the model (6.12.2) suggests that we should also consider the least squares estimate found by minimizing

with respect to μ and a. This estimate is

We may approximate this estimate by

Using (6.12.5), the covariance matrix of (6.12.9) will be approximately

We note that both of the estimates (6.12.4) and (6.12.9) are weighted averages of the A^(T)(2πp/P) values. This suggests that we should consider, as a further estimate, the best linear combination of these values. Now Exercise 6.14.11 and expression (6.12.5) indicate that this is given approximately by

with approximate covariance matrix

In view of the source of a_3^(T), the matrix differences (6.12.6) − (6.12.12) and (6.12.10) − (6.12.12) will both be non-negative definite. In the case that g_εε^(T)(λ) is near constant, as would be the case were the error series ε(t) white noise, and T not too small, formulas (6.12.9) and (6.12.11) indicate that the least squares estimate a_2^(T) will be near the "efficient" estimate a_3^(T). In the case that f_XX^(T)(λ)^{-1} g_εε^(T)(λ) is near constant, formulas (6.12.4) and (6.12.11) indicate that the estimate a_1^(T) will be near the estimate a_3^(T).

Hannan (1963b, 1967a, 1970) discusses the estimates a_1^(T), a_3^(T) in the case of stochastic X(t), t = 0, ±1, … . Grenander and Rosenblatt (1957), Rosenblatt (1959), and Hannan (1970) discuss the estimates a_2^(T), a_3^(T) in the case of fixed X(t), t = 0, ±1, … .

6.13 USES OF THE PROPOSED TECHNIQUE

The statistics of the present chapter have been calculated by many researchers in different situations. These workers found themselves considering a series Y(t), t = 0, ±1, … which appeared to be coming about from a series X(t), t = 0, ±1, … in a linear time invariant manner. The latter is the principal implication of the model (6.1.1). These researchers calculated various of the statistics A^(T)(λ), G^(T)(λ), φ^(T)(λ), g_εε^(T)(λ), |R_YX^(T)(λ)|², a^(T)(u).

An important area of application has been in the field of geophysics. Robinson (1967a) discusses the plausibility of a linear time invariant model relating a seismic disturbance X(t), with Y(t) its recorded form at some station. Tukey (1959c) relates a seismic record at one station with the seismic record at another station. Other references to applications in seismology include: Haubrich and MacKenzie (1965) and Pisarenko (1970). Turning to the field of oceanography, Hamon and Hannan (1963) and Groves and Hannan (1968) consider relations between sea level and pressure and wind stress at several stations. Groves and Zetler (1964) relate sea levels at San Francisco with those at Honolulu. Munk and Cartwright (1966) take X(t) to be a theoretically specified mathematical function while Y(t) is the series of tidal height. Kawashima (1964) considers the behavior of a boat on an ocean via cross-spectral analysis. Turning to the field of meteorology, Panofsky (1967) presents the results of spectral calculations for a variety of series including wind velocity and temperature. Madden (1964) considers certain electromagnetic data. Rodriguez-Iturbe and Yevjevich (1968) take Y(t) to be rainfall recorded at a number of stations in the U.S.A. and X(t) to be

relative sunspot numbers. Brillinger (1969a) takes Y(t) to be monthly rainfall in Santa Fe, New Mexico, and X(t) to be monthly relative sunspot numbers.

Lee (1960) presents arguments to suggest that many electronic circuits behave in a linear time invariant manner. Akaike and Kaneshige (1964) take Y(t) to be the output of a nonlinear circuit, X(t) to be the input of the circuit, and evaluate certain of the statistics discussed in this chapter.

Goodman et al (1961) discuss industrial applications of the techniques of this chapter as do Jenkins (1963), Nakamura (1964), and Nakamura and Murakami (1964). Takeda (1964) uses cross-spectral analysis in an investigation of aircraft behavior.

As examples of applications in economics we mention the books by Granger (1964) and Fishman (1969). Nerlove (1964) uses cross-spectral analysis to investigate the effectiveness of various seasonal adjustment procedures. Naylor et al (1967) examine the properties of a model of the textile industry.

Results discussed by Khatri (1965b) may be used to construct a test of the hypothesis Im A(λ) = 0, −∞ < λ < ∞, that is a(u) = a(−u), u = 0, ±1, … . The latter would occur if the relation between Y(t) and X(t) were time reversible.

A number of interesting physical problems lead to a consideration of integral equations of the form

to be solved for f(t), given g(t), β, b(t). A common means of solution is to set down a discrete approximation to the equation, such as

which is then solved by matrix inversion. Suppose that we rewrite expression (6.13.2) in the form

with the series ε(t) indicating the error resulting from having made a discrete approximation, with X(0) = β + b(0), X(u) = b(u), u ≠ 0, and Y(t) = g(t). The system (6.13.3), which we have been considering throughout this chapter, suggests that another way of solving the system (6.13.1) is to use cross-spectral analysis and to take a^(T)(u), given by (6.8.2), as an approximation to the desired f(t).
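Numerically, this route to the integral equation amounts to a deconvolution: divide the Fourier transform of the output by that of the kernel β δ + b. A toy circular-convolution sketch (the ridge term and all names are our additions, guarding against division by near-zero transfer values):

```python
import numpy as np

def deconvolve(g, kernel, ridge=1e-8):
    """Recover f from g = f * kernel (circular convolution) by Fourier
    division; the small ridge term guards against near-zero transfer values."""
    G, K = np.fft.fft(g), np.fft.fft(kernel)
    F = G * np.conj(K) / (np.abs(K) ** 2 + ridge)
    return np.real(np.fft.ifft(F))

# Convolve a known f with a kernel playing the role of beta*delta + b,
# then recover it:
T = 128
f = np.zeros(T); f[3] = 1.0; f[10] = -0.5
kernel = np.zeros(T); kernel[0] = 2.0; kernel[1] = 0.7
g = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(kernel)))
f_hat = deconvolve(g, kernel)
```

The cross-spectral estimate a^(T)(u) of the text plays the same role while additionally smoothing, which matters when g is observed with error.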

So far in this chapter we have placed principal emphasis on the estimation of A(λ) and a(u). However, we next mention a situation wherein the parameter of greatest interest is the error spectrum f_εε(λ). The model that has been under consideration is

with Σ_u |u| |a(u)| < ∞. Suppose that we think of a(t), t = 0, ±1, … as representing a transient signal of brief duration. Suppose we define

Expression (6.13.4) now takes the simpler form

The observed series, Y(t), is the sum of a series of interest, μ + ε(t), and a possibly undesirable transient series a(t). The procedures of this chapter provide a means of constructing an estimate of f_εε(λ), the power spectrum of interest. We simply form g_εε^(T)(λ), taking the observed values Y(t), t = 0, …, T − 1 and X(t) as given by (6.13.5). This estimate should be sensible even when brief undesirable transients get superimposed on the series of interest. In the case that the asymptotic procedure of Section 6.4 is adopted, the distribution of g_εε^(T)(λ) is approximately a multiple of a chi-squared with 4m degrees of freedom. This is to be compared with the 4m + 2 degrees of freedom the direct estimate f_εε^(T)(λ) would have. Clearly not too much stability has been lost, in return for the gained robustness of the estimate.

6.14 EXERCISES

6.14.1 Let the conditions of Theorem 6.2.1 be satisfied, but with Eεε^τ = σ²I replaced by Eεε^τ = Σ. Show that Eâ = a as before, but now

6.14.2 Let the conditions of Theorem 6.2.1 be satisfied, but with Eεε^τ = σ²I replaced by Eεε^τ = σ²V. Prove that

is minimized by

Show that b is unbiased with covariance matrix σ²(XV^{-1}X^τ)^{-1}. Show that the least squares estimate â = YX^τ(XX^τ)^{-1} remains unbiased, but has covariance matrix σ²(XX^τ)^{-1}XVX^τ(XX^τ)^{-1}.

6.14.3 In the notation of Theorem 6.2.3, prove that the unbiased, minimum variance linear estimate of α^τa, for α a k vector, is given by α^τâ.


6.14.4 Let w be a complex-valued random variable. Prove that

var |w| ≤ cov{w, w}.

6.14.5 In the notation of Theorem 6.2.3, let |R|² = âXX^τâ^τ/YY^τ. Prove that 0 ≤ |R|² ≤ 1. Under the conditions of Theorem 6.2.4 prove that (n − k)|R|²/[k(1 − |R|²)] is distributed as noncentral F, degrees of freedom 2k and 2(n − k) and noncentrality parameter aXX^τa^τ/σ².

6.14.6 Under the conditions of either Theorem 6.4.2 or Theorem 6.7.1, prove that φ^(T)(λ) is asymptotically uniform on (0, 2π] if A(λ) = 0.

6.14.7 Prove that the following is a consistent definition of asymptotic normality. A sequence of r vector-valued random variables X_n is asymptotically normal with mean θ_n + Ψ_nγ and covariance matrix Σ_n = Ψ_nΣΨ_n^τ if Ψ_n^{-1}(X_n − θ_n) tends, in distribution, to N(γ, Σ) where θ_n is a sequence of r vectors and Ψ_n a sequence of nonsingular r × r matrices.

6.14.8 With the notation of Exercise 6.14.2, show that (XX^τ)^{-1}XVX^τ(XX^τ)^{-1} ≥ (XV^{-1}X^τ)^{-1}. [A ≥ B here means A − B is non-negative definite.]

6.14.9 Show that f_YX^(T)(λ) of (6.5.6) may be written in the form

where w^(T)(u) is given by

and c_YX^(T)(u) is given by

6.14.10 Let Ŷ(t), t = 0, ±1, … denote the series whose finite Fourier transform is d_Y^(T)(λ) − A^(T)(λ)d_X^(T)(λ). Prove that f_ŶŶ^(T)(λ) = g_εε^(T)(λ), that is, the estimate of the error spectrum may be considered to be a power spectrum estimate based on a series of residuals.

6.14.11 Let Y_j, j = 1, …, J be 1 × r matrix-valued random variables with EY_j = β, E{(Y_j − β)^τ(Y_k − β)} = δ{j − k}V_j for 1 ≤ j ≤ k ≤ J. Prove that the best linear unbiased estimate of β is given by

Hint: Use Exercises 6.14.2 and 6.14.8. Exercise 1.7.6 is the case r = 1. Show that E{(β̂ − β)^τ(β̂ − β)} = [Σ_j V_j^{-1}]^{-1}.

6.14.12 Suppose the conditions of Theorem 6.2.4 hold. Prove that if two columns of the matrix X are orthogonal, then the corresponding entries of â are statistically independent.


6.14.13 Demonstrate that g_εε^(T)(λ), the estimate (6.4.5) of the error spectrum, is non-negative.

6.14.14 If |R_YX^(T)(λ)|² is defined by (6.4.11), show that it may be interpreted as the proportion of the sample power spectrum of the Y(t) values explained by the X(t) values.

6.14.15 Show that the statistics A^(T)(λ), g_εε^(T)(λ) do not depend on the values of the sample means c_X^(T), c_Y^(T).

6.14.16 Prove that f_YY^(T)(λ) ≥ g_εε^(T)(λ) with the definitions of Section 6.4.

6.14.17 Let α be a k vector. Under the conditions of Theorem 6.2.4 show that

provides a 100β percent multiple confidence region for all linear combinations of the entries of a. (This region is a complex analog of the Scheffé region; see Miller (1966) p. 49.)

6.14.18 We adopt the notation of Theorem 6.2.3. Let X_j denote the jth row of X and let X_jX_j^τ = C_j, j = 1, …, k with C_1, …, C_k given. Prove that

and that the minimum is achieved when X_jX_k^τ = 0, k ≠ j, that is, when the rows of X are orthogonal. (For the real case of this result see Rao (1965) p. 194.)

6.14.19 Let w be a N_1^C(μ, σ²) variate and let R = |w|, ρ = |μ|, θ = arg w, φ = arg μ. Prove that the density function of R is

where I_0(x) is the 0 order Bessel function of the first kind. Prove

for v > 0, where 1F1(a; b; x) is the confluent hypergeometric function. Evaluate ER if ρ = 0. Also prove the density function of θ is

See Middleton (1960) p. 417.

6.14.20 Let

where ε is an s × n matrix whose columns are independent N_s^C(0, Σ) variates, a is an s × r matrix of unknown complex parameters, x is an r × n matrix of known complex entries and y is an s × n matrix of known complex variates. Let

Prove that vec â is N_{rs}^C(vec a, Σ ⊗ (xx^τ)^{-1}) and Σ̂ is independent of â and distributed as (n − r)^{-1}W_s^C(n − r, Σ). The operations vec and ⊗ are defined in Section 8.2.

6.14.21 Let x and y be given s × n and r × n matrices respectively with complex entries. For given c × s C, r × u U, c × u T, show that the s × r a, constrained by CaU = T, that minimizes

is given by

where â = yx^τ(xx^τ)^{-1}, provided the indicated inverses exist.

6.14.22 Under the conditions of Theorem 6.6.1 prove that

and

6.14.23 Under the conditions of Theorem 6.4.2 prove that

tends to (2m + 1)^{-1}f_εε(λ)χ²_{2r} independently of g_εε^(T)(λ) for λ ≢ 0 (mod π). Develop a corresponding result for the case of λ ≡ 0 (mod π).

6.14.24 Suppose that in the formation of (6.4.2) one takes m = T − 1. Prove that the resulting A^(T)(λ) is

where c_Y^(T) and c_X^(T) are the sample means of the Y and X values. Relate this result to the multiple regression coefficient of Y(t) on X(t).

6.14.25 Under the conditions of Theorem 6.4.2 and if A(λ) = 0, prove that for λ ≢ 0 (mod π),

tends in distribution to

where F has an F distribution with degrees of freedom 2(2m + 1 − r) and 2r.


6.14.26 Suppose that Y(t), ε(t), t = 0, ±1, … are s vector-valued stochastic series. Let μ denote an s vector and a(t) denote an s × r matrix-valued function. Let X(t), t = 0, ±1, … denote an r vector-valued fixed series. Suppose

Develop estimates A^(T)(λ) of the transfer function of {a(u)} and g_εε^(T)(λ) of the spectral density matrix of ε(t); see Brillinger (1969a).

6.14.27 Suppose that f_XX^(T)(λ) tends to f_XX(λ) uniformly in λ as T → ∞ and suppose that ||f_XX(λ)||, ||f_XX(λ)^{-1}|| < K, −∞ < λ < ∞ for finite K. Prove that Assumption 6.5.2 is satisfied.

6.14.28 Prove that f_XX^(T)(λ) as defined by (6.5.5) is non-negative definite if W(α) ≥ 0. Also prove that g_εε^(T)(λ) given by (6.5.9) is non-negative under this condition.

6.14.29 Let X_1(t) = Σ b(t − u)X(u) where {b(u)} is a summable r × r filter with transfer function B(λ). Suppose that B(λ) is nonsingular, −∞ < λ < ∞. Prove that X_1(t), t = 0, ±1, … satisfies Assumption 6.5.2 if X(t), t = 0, ±1, … satisfies Assumption 6.5.2.

6.14.30 Suppose Y(t) and X(t) are related as in (6.1.1). Suppose that X_j(t) is increased to X_j(t) + exp{iλt} and the remaining components of X(t) are held fixed. Discuss how this procedure is useful in the interpretation of A_j(λ).

6.14.31 Under the conditions of Theorem 6.6.1 show that

6.14.32 Suppose that we consider the full model (6.12.3), rather than the simpler form (6.12.2). Let [a_j^(T)(−m) ⋯ a_j^(T)(n)], j = 1, 2, 3, be the analogs here of the estimates a_1^(T), a_2^(T), a_3^(T) of Section 6.12. Show that the covariances cov{a_j^(T)(u), a_j^(T)(v)}, j = 1, 2, 3, are approximately B_T^{-1}T^{-1}2π ∫ W(α)² dα times


7

ESTIMATING THE SECOND-ORDER SPECTRA OF VECTOR-VALUED SERIES

7.1 THE SPECTRAL DENSITY MATRIX AND ITS INTERPRETATION

In this chapter we extend the results of Chapter 5 to cover the case of the joint behavior of second-order statistics based on various components of a vector-valued stationary time series.

Let X(t), t = 0, ±1, … be an r vector-valued series with component series X_a(t), t = 0, ±1, … for a = 1, …, r. Suppose

Indicate the individual entries of c_X by c_a, a = 1, …, r so c_a = EX_a(t) is the mean of the series X_a(t), t = 0, ±1, … . Denote the entry in row a, column b of c_XX(u) by c_ab(u), a, b = 1, …, r, so c_ab(u) is the cross-covariance function of the series X_a(t) with the series X_b(t). Note that

Supposing


we may define f_XX(λ), the spectral density matrix at frequency λ of the series X(t), t = 0, ±1, … by

The definition of the spectral density matrix may be inverted to obtain

f_ab(λ), the entry in row a and column b of f_XX(λ), is seen to be the power spectrum of the series X_a(t) if a = b and to be the cross-spectrum of the series X_a(t) with the series X_b(t) if a ≠ b. f_XX(λ) has period 2π with respect to λ. Also, because the entries of c_XX(u) are real-valued

from (7.1.3). The matrix f_XX(λ) is Hermitian from the last expression. These properties mean that the basic domain of definition of f_XX(λ) may be taken to be the interval [0, π]. We have already seen in Theorem 2.5.1 that f_XX(λ) is non-negative definite, f_XX(λ) ≥ 0, for −∞ < λ < ∞, extending the result that the power spectrum of a real-valued series is non-negative.

Example 2.8.2 shows the effect of filtering on the spectral density matrix. Suppose

for an s × r matrix-valued filter with transfer function

then the spectral density matrix of the series Y(t) is given by

Expressions (7.1.9) and (7.1.10) imply that the covariance matrix of the s vector-valued variate Y(t) is given by

With a goal of obtaining an interpretation of f_XX(λ) we consider the implication of this result for the 2r vector-valued filter with transfer function

and = 0 for all other essentially different frequencies. (For (7.1.12) we are using the definition of Theorem 2.7.1 of the filter.) If Δ is small, the output of this filter is the 2r vector-valued series

involving the component of frequency λ discussed in Section 4.6. By inspection, expression (7.1.11) takes the approximate form

and the approximate form

Both approximations lead to the useful interpretation of Re f_XX(λ) as proportional to the covariance matrix of X(t, λ) (the component of frequency λ in X(t)), and the interpretation of Im f_XX(λ) as proportional to the cross-covariance matrix of X(t, λ) with its Hilbert transform. Re f_ab(λ), the co-spectrum of X_a(t) with X_b(t), is proportional to the covariance of the component of frequency λ in the series X_a(t) with the corresponding component in the series X_b(t). Im f_ab(λ), the quadrature spectrum, is proportional to the covariance of the Hilbert transform of the component of frequency λ in the series X_a(t) with the component of frequency λ in the series X_b(t). Being covariances, both of these are measures of degree of linear relationship.
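In sample terms the co-spectrum and quadrature spectrum are the real and imaginary parts of a cross-spectral estimate. A small sketch using the raw cross-periodogram of two scalar series (our conventions; texts differ on the sign attached to the quadrature spectrum):

```python
import numpy as np

def co_quad_periodogram(xa, xb):
    """Real and imaginary parts of the raw cross-periodogram of two scalar
    series: sample versions of the co- and quadrature spectra.
    (Sign conventions for the quadrature spectrum differ between texts.)"""
    T = len(xa)
    da = np.fft.fft(xa - np.mean(xa))
    db = np.fft.fft(xb - np.mean(xb))
    Iab = da * np.conj(db) / (2 * np.pi * T)
    return np.real(Iab), np.imag(Iab)

# When xb is a quarter-cycle shift of xa at one frequency, the cross power
# at that frequency sits in the quadrature part, not the co-spectrum:
t = np.arange(256)
xa = np.cos(2 * np.pi * 8 * t / 256)
xb = np.sin(2 * np.pi * 8 * t / 256)
co, quad = co_quad_periodogram(xa, xb)
```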

When interpreting the spectral density matrix f_XX(λ), it is also useful to recall the second-order properties of the Cramér representation. In Theorem 4.6.2 we saw that X(t) could be represented as



where the function Z_X(λ) is stochastic with the property

where η(·) is the 2π periodic extension of the Dirac delta function. From (7.1.17) it is apparent that f_XX(λ) may be interpreted as being proportional to the covariance matrix of the complex-valued differential dZ_X(λ). Both interpretations will later suggest plausible estimates for f_XX(λ).

7.2 SECOND-ORDER PERIODOGRAMS

Suppose that the stretch X(t), t = 0, …, T − 1 of T consecutive values of an r vector-valued series is available for analysis and the series is stationary with mean function c_X and spectral density matrix f_XX(λ), −∞ < λ < ∞. Suppose also we are interested in estimating f_XX(λ). Consider basing an estimate on the finite Fourier transform

where h_a(t) is a tapering function vanishing for |t| sufficiently large, a = 1, …, r. Following Theorem 4.4.2, this variate is asymptotically

where

and

These distributions suggest consideration of the statistic

as an estimate of f_XX(λ) in the case λ ≠ 0, ±2π, …. The entries of I_XX^(T)(λ) are the second-order periodograms of the tapered values h_a(t/T)X_a(t), t = 0, ±1, …. This statistic is seen to have the same symmetry and periodicity properties as f_XX(λ). In connection with it we have
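A minimal numpy sketch of such a matrix of second-order periodograms, assuming a single common taper h for all components (the book allows a separate h_a per component); the (2π Σ_t h(t/T)²)⁻¹ standardization reduces to (2πT)⁻¹ in the untapered case:

```python
import numpy as np

def matrix_periodogram(X, h=None):
    """Matrix of second-order periodograms at the Fourier frequencies
    lambda_k = 2*pi*k/T, for an array X of shape (T, r); a sketch of the
    statistic described above, with a common taper h (or none)."""
    T, r = X.shape
    h = np.ones(T) if h is None else h
    d = np.fft.fft(h[:, None] * X, axis=0)   # finite Fourier transforms, r columns
    # outer products d(lambda_k) d(lambda_k)^* with taper standardization
    return np.einsum('ka,kb->kab', d, d.conj()) / (2.0 * np.pi * np.dot(h, h))

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 2))
I = matrix_periodogram(X)
```

Each I[k] is a rank-one Hermitian non-negative definite r × r matrix, mirroring the symmetry properties of f_XX(λ) noted above.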


The character of the tapering function h_a(t/T) is such that its Fourier transform H_a^(T)(λ) is concentrated in the neighborhood of the frequencies λ = 0, ±2π, … for large T. It follows that in the case λ ≠ 0 (mod 2π), the final term in (7.2.7) will be of reduced magnitude for T large. The first term on the right side of (7.2.7) is seen to be a weighted average of the cross-spectrum f_ab of interest, with weight concentrated in the neighborhood of λ and with relative weight determined by the tapers. In the limit we have

Corollary 7.2.1 Under the conditions of Theorem 7.2.1 and if ∫ h_a(u)h_b(u) du ≠ 0 for a, b = 1, …, r

The estimate is asymptotically unbiased if λ ≠ 0 (mod 2π) or if c_X = 0. If c_a, c_b are far from 0, then substantial bias may be present in the estimate I_XX^(T)(λ), as shown by the term in c_a, c_b of (7.2.7). This effect may be reduced by subtracting an estimate of the mean of X(t) before forming the finite Fourier transform. We could consider the statistics

with

236 ESTIMATING SECOND-ORDER SPECTRA

Theorem 7.2.1 Let X(t), t = 0, ±1, … be an r vector-valued series with mean function EX(t) = c_X and cross-covariance function cov {X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, …. Suppose

Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1 for a = 1, …, r. Let I_XX^(T)(λ) be given by (7.2.5). Then



and then the estimate

The asymptotic form of the covariance of two entries of I_XX^(T)(λ) in the case that the series has mean 0 is indicated by

Theorem 7.2.2 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.2(1). Let h_a(u), a = 1, …, r satisfy Assumption 4.3.1. Let I_XX^(T)(λ) be given by (7.2.5). Then

where the error term is bounded by K₁|H_a^(T)(λ)| |H_a^(T)(μ)| + K₂|H_a^(T)(λ)| + K₃|H_a^(T)(μ)| + K₄ for constants K₁, …, K₄ and a = a₁, a₂, b₁, b₂, −∞ < λ, μ < ∞.

The statistical dependence of the two periodogram entries is seen to fall off as the functions H_ab^(T) fall off. In the limit the theorem becomes

Corollary 7.2.2 Under the conditions of Theorem 7.2.2

In the case of untapered values, h_a(u) = 1 for 0 ≤ u < 1, and = 0 otherwise, Exercise 7.10.14 shows that we have

for frequencies λ, μ of the form 2πr/T, 2πs/T where r, s are integers with r, s ≠ 0 (mod T).

We complete the present discussion of the asymptotic properties of the matrix of second-order periodograms by indicating its asymptotic distribution.

Theorem 7.2.3 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(t), a = 1, …, r satisfy Assumption 4.3.1. Let



I_XX^(T)(λ) be given by (7.2.5). Suppose 2λ_j, λ_j ± λ_k ≠ 0 (mod 2π) for 1 ≤ j < k ≤ J. Then I_XX^(T)(λ_j), j = 1, …, J are asymptotically independent W_r^C(1, f_XX(λ_j)) variates. Also if λ = ±π, ±3π, …, then I_XX^(T)(λ) is asymptotically W_r(1, f_XX(λ)) independently of the previous variates.

The Wishart distribution was given in Section 4.2 with its density function and various properties. The limiting distribution of this theorem is seen to involve f_XX(λ) in a direct manner. However, being a Wishart with just 1 degree of freedom, the distribution is well spread out about f_XX(λ). Therefore I_XX^(T)(λ) cannot be considered a reasonable estimate.

It is interesting to note that the limiting distributions of Theorem 7.2.3 do not involve the particular tapering functions employed. In the limit the taper used does not matter; however, as expression (7.2.7) shows, the taper does affect the large sample bias before we actually get to the limit. Consequently, if there may be peaks close together in f_XX(λ), we should taper the data to improve the resolution.

The frequencies considered in Theorem 7.2.3 did not depend on T. The following theorem considers the asymptotic distribution in the case of a number of frequencies tending to λ as T → ∞. We revert to the untapered case in

Theorem 7.2.4 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let

for −∞ < λ < ∞. Let s_j(T) be an integer with λ_j(T) = 2πs_j(T)/T tending to λ_j as T → ∞ for j = 1, …, J. Suppose 2λ_j(T), λ_j(T) ± λ_k(T) ≠ 0 (mod 2π) for 1 ≤ j < k ≤ J and T sufficiently large. Then I_XX^(T)(λ_j(T)), j = 1, …, J are asymptotically independent W_r^C(1, f_XX(λ_j)) variates, j = 1, …, J. Also if λ = ±π, ±3π, …, I_XX^(T)(λ) is asymptotically W_r(1, f_XX(λ)) independently of the previous variates.

The most important case of this theorem occurs when λ_j = λ for j = 1, …, J. The theorem then indicates a source of J asymptotically independent estimates of f_XX(λ). The conclusions of this theorem were very much to be expected in light of Theorem 4.4.1, which indicated that the Σ_t X(t) exp{−itλ_j(T)}, j = 1, …, J, are asymptotically independent N_r^C(0, 2πT f_XX(λ_j)) variates.

In order to avoid technical details we have made Theorem 7.2.4 refer to the untapered case. Exercise 4.8.20 and Brillinger (1970b) present results



applying to frequencies depending on T as well as to the tapered case. The essential requirement for asymptotic independence indicated by them is that λ_j(T) − λ_k(T), 1 ≤ j < k ≤ J, do not tend to 0 too quickly.

In particular, Theorems 7.2.3 and 7.2.4 give the marginal distributions previously determined in Chapter 5 for a periodogram I_aa^(T)(λ).

The following theorem shows how we may construct L asymptotically independent estimates of f_XX(λ) in the case that the data have been tapered. We split the data into L disjoint segments of V observations, then taper and form a periodogram for each stretch.
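The splitting scheme just described can be sketched as follows; np.hanning is a hypothetical stand-in for a taper satisfying Assumption 4.3.1:

```python
import numpy as np

def segment_periodograms(X, L):
    """Matrix periodograms from L disjoint stretches of V = T // L
    observations, each stretch tapered and transformed separately (a
    sketch of the construction above; the cosine bell np.hanning is an
    assumed taper choice)."""
    T, r = X.shape
    V = T // L
    h = np.hanning(V)
    scale = 2.0 * np.pi * np.dot(h, h)
    d = np.stack([np.fft.fft(h[:, None] * X[l * V:(l + 1) * V], axis=0)
                  for l in range(L)])                      # (L, V, r)
    return np.einsum('lka,lkb->lkab', d, d.conj()) / scale

rng = np.random.default_rng(4)
X = rng.standard_normal((400, 2))
I_seg = segment_periodograms(X, L=8)     # one (V, r, r) periodogram array per stretch
```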

Theorem 7.2.5 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1 and vanish for u < 0, u ≥ 1. Let

Figure 7.2.1 Periodogram of seasonally adjusted monthly mean temperatures at Berlin for the years 1780-1950. (Logarithmic plot.)

l = 0, …, L − 1 where

Then the I_XX^(V)(λ, l), l = 0, …, L − 1 are asymptotically independent



W_r^C(1, f_XX(λ)) variates if λ ≠ 0 (mod π) and asymptotically independent W_r(1, f_XX(λ)) variates if λ = ±π, ±3π, …, as V → ∞.

Once again the limiting distribution is seen not to involve the tapers employed; however, the tapers certainly appeared in the standardization of d_X^(V)(λ, l) to form I_XX^(V)(λ, l).

Goodman (1963) introduced the complex Wishart distribution as an approximation for the distribution of spectral estimates in the case of vector-valued series. Brillinger (1969c) developed W_r^C(1, f_XX(λ)) as the limiting distribution of the matrix of second-order periodograms.

In Figures 7.2.1 to 7.2.5 we give the periodograms and cross-periodogram for a bivariate series of interest. The series X₁(t) is the seasonally adjusted series of mean monthly temperatures for Berlin (1780-1950). The series X₂(t) is the seasonally adjusted series of mean monthly temperatures for Vienna (1780-1950). Figures 7.2.1 and 7.2.2 give I₁₁^(T)(λ), I₂₂^(T)(λ), the periodograms of the series. The cross-periodogram is illustrated in the remaining figures, which give Re I₁₂^(T)(λ), Im I₁₂^(T)(λ), arg I₁₂^(T)(λ) in turn. All of the figures are erratic, a characteristic consistent with Theorem 7.2.3, which suggested that second-order periodograms were not generally reasonable estimates of second-order spectra.

Figure 7.2.2 Periodogram of seasonally adjusted monthly mean temperatures at Vienna for the years 1780-1950. (Logarithmic plot.)


Figure 7.2.3 Real part of the cross-periodogram of temperatures at Berlin with those at Vienna.

Figure 7.2.4 Imaginary part of the cross-periodogram of temperatures at Berlin with those at Vienna.



Figure 7.2.5 Phase of the cross-periodogram of temperatures at Berlin with those at Vienna.

7.3 ESTIMATING THE SPECTRAL DENSITY MATRIX BY SMOOTHING

Theorem 7.2.4 suggests a means of constructing an estimate of f_XX(λ) with a degree of flexibility. If

then, from that theorem, for s(T) an integer with 2πs(T)/T near λ ≠ 0 (mod π), the distribution of the variates I_XX^(T)(2π[s(T) + s]/T), s = 0, ±1, …, ±m may be approximated by that of 2m + 1 independent W_r^C(1, f_XX(λ)) variates. The preceding suggests the consideration of the estimate

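A sketch of such an averaged estimate in the untapered, mean-corrected case, assuming the simple equal-weight average of the 2m + 1 nearest periodogram ordinates:

```python
import numpy as np

def smoothed_spectral_matrix(X, s, m):
    """Estimate of f_XX at 2*pi*s/T as the mean of the 2m+1 matrix
    periodograms at 2*pi*(s+j)/T, j = -m..m (a sketch of the averaging
    scheme described above; untapered, mean-corrected)."""
    T, r = X.shape
    d = np.fft.fft(X - X.mean(axis=0), axis=0)
    I = np.einsum('ka,kb->kab', d, d.conj()) / (2.0 * np.pi * T)
    idx = (s + np.arange(-m, m + 1)) % T
    return I[idx].mean(axis=0)

rng = np.random.default_rng(2)
X = rng.standard_normal((512, 2))
f_hat = smoothed_spectral_matrix(X, s=40, m=5)
# the estimate inherits the Hermitian property of f_XX(lambda)
assert np.allclose(f_hat, f_hat.conj().T)
```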



A further examination of the results of the theorem suggests the form

and the form

The estimate given by (7.3.2) to (7.3.4) is seen to have the same symmetry and periodicity properties as f_XX(λ) and to be based on the values d_X^(T)(2πs/T), s ≠ 0 (mod T), of the discrete Fourier transform. In connection with it we have

Theorem 7.3.1 Let X(t), t = 0, ±1, … be an r vector-valued series with mean function c_X and cross-covariance function c_XX(u) = cov{X(t + u), X(t)} for t, u = 0, ±1, …. Suppose

Let f_XX^(T)(λ) be given by (7.3.2) to (7.3.4). Then

and


and

The functions A_Tm(α), B_Tm(α), C_Tm(α) are non-negative weight functions. The first has peaks at α = 0, ±2π, ±4π, … and has width there of approximately 4πm/T. The second and third are also concentrated in intervals of approximate width 4πm/T about the frequencies α = 0, ±2π, …; however, they dip at these particular frequencies. They are graphed in Figure 5.4.1 for T = 11. In any case, E f_XX^(T)(λ) should be near the desired f_XX(λ) in the case that f_XX(α) is near constant in a band of width 4πm/T about λ. In the limit we have

Corollary 7.3.1 Under the conditions of Theorem 7.3.1 and if 2πs(T)/T → λ as T → ∞

The estimate is asymptotically unbiased, as is clearly desirable. We next turn to a consideration of second-order properties.

Theorem 7.3.2 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.2(1). Let f_XX^(T)(λ) be given by (7.3.2) to (7.3.4) with λ − 2πs(T)/T = O(T⁻¹). Then


where



The second-order moments are seen to fall off in magnitude as m increases. By choice of m, the statistician has a means of reducing the asymptotic variability of the estimates to a desired level. The statistics are seen to be asymptotically uncorrelated in the case that λ ± μ ≠ 0 (mod 2π). In addition, expression (7.3.13) has a singularity at the frequencies λ, μ = 0, ±π, ±2π, …. This results from two things: not knowing the mean c_X, and the fact that f_XX(λ) is real at these particular frequencies. We remark that the estimate f_XX^(T)(λ) is not consistent under the conditions of this theorem. However, in the next section we will develop a consistent estimate.

Turning to the development of a large sample approximation to the distribution of f_XX^(T)(λ), we may consider

Theorem 7.3.3 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let f_XX^(T)(λ) be given by (7.3.2) to (7.3.4) with 2πs(T)/T → λ as T → ∞. Then f_XX^(T)(λ) is asymptotically distributed as (2m + 1)⁻¹ W_r^C(2m + 1, f_XX(λ)) if λ ≠ 0 (mod π) and as (2m)⁻¹ W_r(2m, f_XX(λ)) if λ = 0 (mod π). Also f_XX^(T)(λ_j), j = 1, …, J are asymptotically independent if λ_j ± λ_k ≠ 0 (mod 2π) for 1 ≤ j < k ≤ J.

Asymptotically, the marginal distributions of the diagonal entries of f_XX^(T)(λ) are seen to be those obtained previously. The diagonal elements f_aa^(T)(λ) are asymptotically the scaled chi-squared variates of Theorem 5.4.3. The standardized off-diagonal elements asymptotically have the densities of Exercise 7.10.15.

The approximation of the distribution of f_XX^(T)(λ) by a complex Wishart distribution was suggested by Goodman (1963). Wahba (1968) considers the approximation in the case of a Gaussian series and m → ∞. Brillinger (1969c) considers the present case with mean 0.

Theorems of the same character as Theorems 7.3.1 to 7.3.3 may be developed in the case of tapered values if we proceed by splitting the data into L disjoint segments of V observations. Specifically we set

Following Theorem 7.2.5, the estimates I_XX^(V)(λ, l), l = 0, …, L − 1 are asymptotically independent W_r^C(1, f_XX(λ)) variates if λ ≠ 0 (mod π) and W_r(1, f_XX(λ)) variates if λ = ±π, ±3π, …. This suggests a consideration of the estimate

where h_a(u), −∞ < u < ∞, vanishes for u < 0, u > 1. Next we set
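A sketch of the resulting segment-averaged estimate, again with np.hanning standing in (as an assumption) for a taper satisfying Assumption 4.3.1:

```python
import numpy as np

def f_hat_LV(X, L):
    """Segment-averaged spectral matrix estimate: the mean of L disjoint
    tapered segment periodograms, a sketch of the estimate described
    above (common taper assumed for all components)."""
    T, r = X.shape
    V = T // L
    h = np.hanning(V)
    scale = 2.0 * np.pi * np.dot(h, h)
    acc = np.zeros((V, r, r), dtype=complex)
    for l in range(L):
        d = np.fft.fft(h[:, None] * X[l * V:(l + 1) * V], axis=0)
        acc += np.einsum('ka,kb->kab', d, d.conj()) / scale
    return acc / L

rng = np.random.default_rng(5)
X = rng.standard_normal((600, 2))
f_hat = f_hat_LV(X, L=6)     # one estimate per segment Fourier frequency
```

Averaging over L segments reduces the second-order moments by the factor 1/L, at the cost of coarser frequency resolution 2π/V.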



In connection with the above we have

Theorem 7.3.4 Suppose the conditions of Theorem 7.3.1 are satisfied. Suppose also the functions h_a(u), a = 1, …, r satisfy Assumption 4.3.1, vanish for u < 0, u ≥ 1, and satisfy ∫ h_a(u)h_b(u) du ≠ 0. Let f_XX^(LV)(λ) be given by (7.3.16). Then

This theorem is an immediate consequence of Theorem 7.2.1 and its corollary. It is interesting to note that the weighted average of f_ab appearing in expression (7.3.17) is concentrated in an interval of width proportional to V⁻¹.

Theorem 7.3.5 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let h_a(u), a = 1, …, r satisfy Assumption 4.3.1, vanish for u < 0, u ≥ 1, and be such that ∫ h_a(u)h_b(u) du ≠ 0. Let f_XX^(LV)(λ) be given by (7.3.16). Then

The second-order moments are here reduced from those of (7.2.13) by the factor 1/L. The statistician may choose L large enough for his purposes in many cases. Finally we have

Theorem 7.3.6 Under the conditions of Theorem 7.2.5 and if f_XX^(LV)(λ) is given by (7.3.16), f_XX^(LV)(λ) is asymptotically L⁻¹ W_r^C(L, f_XX(λ)) if λ ≠ 0 (mod π) and asymptotically L⁻¹ W_r(L, f_XX(λ)) if λ = ±π, ±3π, … as V → ∞.

Again the Wishart distribution is suggested as an approximation for the distribution of an estimate of the spectral density matrix. One difficulty with


We shall form an estimate of f_ab(λ) by taking a weighted average of this statistic, concentrating weight in a neighborhood of λ having width O(B_T), where B_T is a bandwidth parameter tending to 0 as T → ∞.

Let W_ab(α), −∞ < α < ∞, be a weight function satisfying

is computed. The corresponding second-order periodograms are then given by


the above estimation procedure is that it does not provide an estimate in the case of λ = 0 (mod 2π). An estimate for this case may possibly be obtained by extrapolating estimates at nearby frequencies. Note also the estimate of Exercise 7.10.23.

Exercise 7.10.24 indicates the asymptotic distribution of the estimate

involving an unequal weighting of periodogram values.

7.4 CONSISTENT ESTIMATES OF THE SPECTRAL DENSITY MATRIX

The estimates of the previous section were not generally consistent; that is, f_XX^(T)(λ) typically did not tend in probability to f_XX(λ) as T → ∞. However, the estimates did involve a parameter (m or L) that affected their asymptotic variability. A consideration of the specific results obtained suggests that if we were to allow this parameter to tend to ∞ as T → ∞, then we might obtain a consistent estimate. In this section we shall see that this is in fact the case. The results to be obtained will be important not so much for the specific computations to be carried out as for their suggestion of alternate plausible large sample approximations for the moments and distribution of the estimate.

Suppose the stretch X(t), t = 0, …, T − 1 of an r vector-valued series is available for analysis. Suppose the discrete Fourier transform



Let B_T, T = 1, 2, … be a bounded sequence of non-negative scale parameters. As an estimate of f_ab(λ) consider

where

This estimate has the same symmetry and periodicity properties as does f_XX(λ) in the case that the functions W_ab(α) are even, W_ab(−α) = W_ab(α). In addition, if the matrix [W_ab(α)] is non-negative definite for all α, then f_XX^(T)(λ) will be non-negative definite, as is f_XX(λ); see Exercise 7.10.26. We now set down

Theorem 7.4.1 Let X(t), t = 0, ±1, … be an r vector-valued series with mean function EX(t) = c_X and covariance function cov {X(t + u), X(t)} = c_XX(u) for t, u = 0, ±1, …. Suppose

Let f_ab^(T)(λ) be given by (7.4.5) where W_ab(α) satisfies Assumption 5.6.1, a, b = 1, …, r. Then

In view of the 2π period of I_ab^(T)(α), the estimate may be written

The estimate (7.4.4) is seen to weight periodogram values heavily at frequencies within O(B_T) of λ. This suggests that we will later require B_T → 0 as T → ∞.
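The weighting scheme can be sketched as follows; the Gaussian kernel used here is a hypothetical choice standing in for the book's general weight function W_ab, renormalized over the Fourier frequencies:

```python
import numpy as np

def kernel_estimate(I, lam, B):
    """Weighted average of matrix periodogram ordinates with weight
    proportional to W((lam - 2*pi*s/T)/B), concentrating weight within
    O(B) of lam (a sketch; the Gaussian W is an assumed kernel)."""
    T = I.shape[0]
    freqs = 2.0 * np.pi * np.arange(T) / T
    diff = (lam - freqs + np.pi) % (2.0 * np.pi) - np.pi   # 2*pi-periodic distance
    w = np.exp(-0.5 * (diff / B) ** 2)
    w /= w.sum()
    return np.einsum('k,kab->ab', w, I)

rng = np.random.default_rng(6)
X = rng.standard_normal((256, 2))
d = np.fft.fft(X - X.mean(axis=0), axis=0)
I = np.einsum('ka,kb->kab', d, d.conj()) / (2.0 * np.pi * 256)
f_hat = kernel_estimate(I, lam=1.0, B=0.2)
```

Shrinking B narrows the averaging band (less bias from curvature of f_ab) while widening it lowers the variance, the tradeoff developed below.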

As an estimate of f_XX(λ) we now take



If in addition

then

The error term is uniform in λ.

Expressions (7.4.9) and (7.4.11) show that the expected value of the proposed estimate is a weighted average of f_ab(α), −∞ < α < ∞, with weight concentrated in a band of width O(B_T) about λ. In the case that B_T → 0 as T → ∞, the estimate is asymptotically unbiased. We may proceed as in Theorem 3.3.1 to develop the asymptotic bias of the estimate (7.4.5) as a function of B_T. Specifically we have

Theorem 7.4.2 Let f_ab(λ) have bounded derivatives of order ≤ P. Suppose

If P = 3, the above theorems and the fact that W(−α) = W(α) give

From this and expression (7.4.13), we see that in connection with the bias of the estimate f_ab^(T)(λ) it is desirable that f_ab(α) be near constant in the neighborhood of λ, that B_T be small, and that ∫ α^p W(α) dα, p = 2, 4, …, be small. The next theorem will show that we cannot take B_T too small if we wish the estimate to be consistent.



Theorem 7.4.3 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.2(1). Let W_ab(α), −∞ < α < ∞, satisfy Assumption 5.6.1, a, b = 1, …, r. Let f_ab^(T)(λ) be given by (7.4.5). Let B_T T → ∞. Then

for a₁, a₂, b₁, b₂ = 1, …, r. The error term is uniform in λ, μ.

Given the character of the W^(T) functions, this covariance is seen to have greatest magnitude for λ ± μ ≡ 0 (mod 2π). The averages in (7.4.15) are approximately concentrated in a band of width O(B_T) about λ, μ, and so the covariance approximately equals

In the limit we have

Corollary 7.4.3 Under the conditions of Theorem 7.4.3 and if B_T → 0, B_T T → ∞ as T → ∞

We see that the second-order moments are O(B_T⁻¹T⁻¹) and so tend to 0 as T → ∞. We have already seen that the estimate is asymptotically unbiased. It therefore follows that it is consistent. We see that estimates evaluated at frequencies λ, μ with λ ± μ ≠ 0 (mod 2π) are asymptotically uncorrelated.



The first statement of expression (7.4.15) may be used to give an expression for the large sample covariance in the case where B_T = 2π/T. Suppose W_ab(α) vanishes for |α| sufficiently large and λ = 2πs(T)/T with s(T) an integer. For large T, the estimate (7.4.4) then takes the form

The estimate (7.3.2) had this form with W_ab(s) = T/[2π(2m + 1)] for |s| ≤ m. Expression (7.4.16) may be seen to give the following approximate form for the covariance here

The results of Theorem 5.5.2 are particular cases of (7.4.19).

Expression (7.4.17) may be combined with expression (7.4.14) to obtain a form for the large sample mean squared error of f_ab^(T)(λ). Specifically, if λ ≠ 0 (mod π) it is

Exercise 7.10.30 indicates that B_T should be taken to fall off as T^(−1/5) if we wish to minimize this asymptotic mean squared error.
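The T^(−1/5) rate can be recovered from the stated orders of the bias and variance; c₁ and c₂ below are hypothetical constants depending on f_ab and W:

```python
import numpy as np

# With squared bias of order B**4 and variance of order 1/(B*T), the
# asymptotic mean squared error c1*B**4 + c2/(B*T) is minimized where its
# derivative in B vanishes:
#   4*c1*B**3 - c2/(B**2 * T) = 0   =>   B = (c2 / (4*c1*T)) ** (1/5),
# i.e. B proportional to T**(-1/5).
def optimal_B(c1, c2, T):
    return (c2 / (4.0 * c1 * T)) ** 0.2

ratio = optimal_B(1.0, 1.0, 10.0 ** 10) / optimal_B(1.0, 1.0, 10.0 ** 5)
assert np.isclose(ratio, 0.1)   # tenfold drop per 10**5-fold increase in T
```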

Turning to the asymptotic distribution itself, we have

Theorem 7.4.4 Suppose the conditions of Theorem 7.4.1 and Assumption 2.6.1 are satisfied. Then f_XX^(T)(λ₁), …, f_XX^(T)(λ_J) are asymptotically jointly normal with covariance structure given by (7.4.17) as T → ∞ with B_T T → ∞, B_T → 0.

An examination of expression (7.4.17) shows that the estimates f_XX^(T)(λ), f_XX^(T)(μ) are asymptotically independent if λ ± μ ≠ 0 (mod 2π). In the case that λ = 0 (mod π), the estimate f_XX^(T)(λ) is real-valued and its limiting distribution is seen to be real normal.

In Section 7.3, taking an estimate to be the average of 2m + 1 periodogram ordinates, we obtained a Wishart with 2m + 1 degrees of freedom as the limiting distribution. That result is consistent with the result just obtained in Theorem 7.4.4. The estimate (7.4.4) is essentially a weighted average of periodogram ordinates at frequencies within O(B_T) of λ. There are O(B_T T) such ordinates, in contrast with the previous 2m + 1. Now the


Having formed an estimate in the manner of (7.4.4) or (7.4.18), we may consider approximating the distribution of that estimate by (2m + 1)⁻¹ W_r^C(2m + 1, f_XX(λ)) if λ ≠ 0 (mod π) and by (2m)⁻¹ W_r(2m, f_XX(λ)) if λ = 0 (mod π), taking 2m + 1 to be given by (7.4.21).

Rosenblatt (1959) discussed the asymptotic first- and second-order moment structure and the joint asymptotic distribution of consistent estimates of second-order spectra. Parzen (1967c) was also concerned with the asymptotic theory and certain empirical aspects. We end this section by remarking that we will develop the asymptotic distribution of spectral estimates based on tapered data in Section 7.7.


Wishart is approximately normal for large degrees of freedom. As we have assumed B_T T → ∞, the two approximations are essentially the same. We may set up a formal equivalence between the approximations. Suppose the same weight function is used in all the estimates, W_ab(α) = W(α) for a, b = 1, …, r. Comparing expression (7.4.16) with expression (7.3.13) suggests the identification

7.5 CONSTRUCTION OF CONFIDENCE LIMITS

Having determined certain limiting distributions for estimates, f_ab^(T)(λ), of second-order spectra, we turn to a discussion of the use of these distributions in setting confidence limits for the parameter f_ab(λ). We begin with the estimate of Section 7.3. In the case of λ ≠ 0 (mod π), the estimate is given by

for s(T) an integer with 2πs(T)/T near λ. Its consideration resulted from Theorem 7.2.4, which suggested that the variates

might be considered to be 2m + 1 independent estimates of f_ab(λ). Having a number of approximately independent estimates of a parameter of interest, a means of setting approximate confidence limits is clear. Consider, for example, the case of θ = Re f_ab(λ). Set


by a Student's t distribution with 2m degrees of freedom. This leads to the following approximate 100β percent confidence interval for θ = Re f_ab(λ)

where t_ν(γ) denotes the 100γ percentile of Student's t distribution with ν degrees of freedom. In the case of λ = 0 (mod π) we again proceed from Theorem 7.2.4.
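The interval just described can be sketched as follows; the θ̂ ± t·s/√n form is the standard construction assumed here, and the Student t percentile is supplied by the user (e.g. from tables):

```python
import numpy as np

def t_interval(estimates, t_pct):
    """Approximate confidence interval for theta = Re f_ab(lambda) from
    2m+1 approximately independent periodogram-based estimates, using a
    Student t percentile with 2m degrees of freedom (a sketch of the
    interval above)."""
    y = np.asarray(estimates, dtype=float)
    n = y.size
    theta_hat = y.mean()
    se = y.std(ddof=1) / np.sqrt(n)
    return theta_hat - t_pct * se, theta_hat + t_pct * se

# five estimates, so 4 degrees of freedom; 2.776 is t_4(.975)
lo, hi = t_interval([0.9, 1.1, 1.0, 1.2, 0.8], 2.776)
assert lo < 1.0 < hi
```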

By setting


Our estimate of θ is now

Set

Even when the basic variates θ̂_s are not normal, it has often proved reasonable statistical practice (see Chap. 31 in Kendall and Stuart (1961)) to approximate the distribution of a variate such as

for s = 0, ±1, …, ±m we may likewise obtain an approximate confidence interval for the quad-spectrum, Im f_ab(λ).

A closely related means of setting approximate confidence limits follows from Theorem 7.2.5. Here the statistics I_ab^(V)(λ, l), l = 1, …, L for λ ≠ 0 (mod 2π) provide L approximately independent estimates of f_ab(λ). Proceeding as above, we set θ = Re f_ab(λ),

and set



We then approximate the distribution of

by a Student's t distribution with L − 1 degrees of freedom and thence obtain the desired limits. Similar steps lead to approximate limits in the case of the quad-spectrum, Im f_ab(λ).

The results of Theorem 7.4.4 and Exercise 7.10.8 suggest a different means of proceeding. Suppose λ ≠ 0 (mod π) and that the estimate f_ab^(T)(λ) is given by (7.4.4). Then the exercise suggests that the distribution of Re f_ab^(T)(λ) is approximately normal with mean Re f_ab(λ) and variance

Expression (7.5.13) can be estimated by

and the following approximate 100β percent confidence interval can be set down

where z(γ) denotes the 100γ percent point of the distribution of a standard normal variate. We may obtain an approximate interval for the quad-spectrum Im f_ab(λ) in a similar manner.

Finally, we note that the approximations suggested in Freiberger (1963) may prove useful in constructing confidence intervals for Re f_ab(λ), Im f_ab(λ). Rosenblatt (1960) and Gyires (1961) relate to these approximations.

7.6 THE ESTIMATION OF RELATED PARAMETERS

Let X(t), t = 0, ±1, … denote an r vector-valued stationary series with covariance function c_XX(u), u = 0, ±1, … and spectral density matrix f_XX(λ), −∞ < λ < ∞. Sometimes we are interested in estimating parameters of the process having the form

for some function A(α) and a, b = 1, …, r. Examples of such a parameter include the covariance functions


Finally, J_ab^(T)(A_j), j = 1, …, J; a, b = 1, …, r are asymptotically jointly normal with the above first- and second-order moment structure.

From Theorem 7.6.1, we see that J_ab^(T)(A_j) is an asymptotically unbiased and consistent estimate of J_ab(A_j). It is based on the discrete Fourier transform and so can possibly be computed taking advantage of the Fast Fourier


and the spectral measures

a, b = 1, …, r. If I_ab^(T)(λ) indicates a periodogram of a stretch of data,

then an obvious estimate of J_ab(A) is provided by

In connection with this estimate we have

Theorem 7.6.1 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.1. Let A_j(α), 0 ≤ α ≤ 2π, be of bounded variation for j = 1, …, J. Then



Transform Algorithm. Were Assumption 2.6.2(1) adopted, the error terms would be O(T⁻¹), O(T⁻²) in the manner of Theorem 5.10.1.

In the case of the estimate

of the spectral measure, F_ab(λ), corresponding to A(α) = 1 for 0 ≤ α ≤ λ and = 0 otherwise, expression (7.6.7) gives

is also asymptotically normal with the covariance structure (7.6.11). It will sometimes be useful to consider the parameters

for 0 ≤ λ, μ ≤ π; a₁, b₁, a₂, b₂ = 1, …, r. We will return to the discussion of the convergence of F_ab^(T)(λ) later in this section. In the case of the estimate

of c_ab(u), corresponding to A(α) = exp{iuα} and with X̃(t) denoting the T-periodic extension of the sequence X(0), …, X(T − 1), expression (7.6.7) gives
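Because of the T-periodic extension, this covariance estimate is circular and may be computed by inverting the discrete Fourier transform of the matrix periodogram; a mean-corrected numpy sketch:

```python
import numpy as np

def circular_autocov(X):
    """Covariance estimates at all lags u, using the T-periodic extension
    of X(0), ..., X(T-1): the inverse DFT of the matrix periodogram,
    corresponding to the choice A(alpha) = exp(i*u*alpha) (sketch of the
    circular estimate discussed above)."""
    Xc = X - X.mean(axis=0)
    T = Xc.shape[0]
    d = np.fft.fft(Xc, axis=0)
    S = np.einsum('ka,kb->kab', d, d.conj()) / T
    return np.fft.ifft(S, axis=0).real      # entry [u] estimates c_ab at lag u

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
c = circular_autocov(X)
# lag-0 diagonal recovers the sample variances (divisor T)
assert np.allclose(c[0, 0, 0], ((X[:, 0] - X[:, 0].mean()) ** 2).mean())
```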

Exercise 7.10.36 shows that the autocovariance estimate


In the case that the spectral estimates are of the form considered in Section 7.4 we have

Theorem 7.6.2 Under the conditions of Theorem 7.4.3 and if R_ab^(T)(λ) is given by (7.6.14)


−∞ < λ < ∞; 1 ≤ a < b ≤ r. R_ab(λ) is called the coherency of the series X_a(t) with the series X_b(t) at frequency λ. Its modulus squared, |R_ab(λ)|², is called the coherence of the series X_a(t) with the series X_b(t) at frequency λ. The interpretation of the parameter R_ab(λ) will be considered in Chapter 8. It is a complex-valued analog of the coefficient of correlation. We may estimate it by

for a, b, c, d = 1, …, r. Also the variates R_ab^(T)(λ), a, b = 1, …, r are asymptotically jointly normal with covariance structure indicated by expression (7.6.16), where we have written R_ab for R_ab(λ), a, b = 1, …, r.
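A plug-in sketch of coherency and coherence computed from a smoothed spectral matrix estimate (any Hermitian non-negative definite f_hat will do; the demo matrix below is constructed for illustration):

```python
import numpy as np

def coherency(f_hat):
    """Plug-in estimate of the coherency matrix R_ab = f_ab/sqrt(f_aa*f_bb)
    and the coherence |R_ab|**2, computed from a smoothed spectral matrix
    estimate f_hat of shape (r, r) (sketch of the estimate described
    above)."""
    d = np.sqrt(np.real(np.diag(f_hat)))
    R = f_hat / np.outer(d, d)
    return R, np.abs(R) ** 2

# a valid (Hermitian, non-negative definite) spectral matrix for the demo
A = np.array([[1.0 + 0.0j, 0.0], [0.5 + 0.2j, 1.0]])
f = A @ A.conj().T
R, coh = coherency(f)
```

For a non-negative definite f_hat the Cauchy-Schwarz inequality keeps each coherence between 0 and 1, mirroring a squared correlation coefficient.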

The asymptotic covariance structure of estimated correlation coefficients is presented in Pearson and Filon (1898), Hall (1927), and Hsu (1949) for the case of vector-valued variates with real components. We could clearly develop an alternate form of limiting distribution taking the estimate and limiting Wishart distributions of Theorem 7.3.3. This distribution is given by Fisher (1962) for the case of vector-valued variates with real components. The theorem has this useful corollary:

Corollary 7.6.2 Under the conditions of Theorem 7.6.2,

and, for given J, the variates R_ab^(T)(λ₁), …, R_ab^(T)(λ_J) are asymptotically jointly normal with covariance structure given by (7.6.18) for 1 ≤ a < b ≤ r.



In Section 8.5 we will discuss further aspects of the asymptotic distribution of |R_ab^(T)(λ)|², and in Section 8.9 we will discuss the construction of approximate confidence intervals for |R_ab(λ)|.

Let D[0, π] signify the space of right continuous functions having left-hand limits. This space can be endowed with a metric which makes it complete and separable; see Billingsley (1968), Chap. 3. Let D_C^{r×r}[0, π] denote the space of r × r matrix-valued functions whose entries are complex-valued functions that are right continuous and have left-hand limits. This space is isomorphic with D^{2r²}[0, π] and may be endowed with a metric making it complete and separable. Continuing, if P_T, T = 1, 2, … denotes a sequence of probability measures on D_C^{r×r}[0, π], we shall say that the sequence converges weakly to a probability measure P on D_C^{r×r}[0, π] if

as T → ∞ for all real-valued bounded continuous functions h on D_C^{r×r}[0, π]. In this circumstance, if P_T is determined by the random element X_T and P is determined by the random element X, we shall also say that the sequence X_T, T = 1, 2, … converges in distribution to X.

The random function F_XX^(T)(λ), 0 ≤ λ ≤ π, clearly lies in D_C^{r×r}[0, π], as does the function √T[F_XX^(T)(λ) − F_XX(λ)]. We may now state

Theorem 7.6.3 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.2(1). Let Fxx(T)(λ) be given by (7.6.8). Then the sequence of processes {√T[Fxx(T)(λ) − Fxx(λ)]; 0 ≤ λ ≤ π} converges in distribution to an r × r matrix-valued Gaussian process {Y(λ); 0 ≤ λ ≤ π} with mean 0 and

for 0 ≤ λ, μ ≤ π and a1, a2, b1, b2 = 1, ..., r.

We may use the results of Chapter 4 in Cramér and Leadbetter (1967) to see that the sample paths of the limit process {Y(λ); 0 ≤ λ ≤ π} are continuous with probability 1. In the case that the series X(t), t = 0, ±1, ... is Gaussian, the fourth-order spectra are identically 0 and the covariance function (7.6.20) is simplified. In this case, by setting A1(α) = 1 for μ1 ≤ α ≤ λ1, A2(α) = 1 for μ2 ≤ α ≤ λ2, and both = 0 otherwise, we see from (7.6.7) that


7.6 THE ESTIMATION OF RELATED PARAMETERS 259

That is, the limiting Gaussian process has independent increments.

A key implication of Theorem 7.6.3 is that if h is a function on Dr×r[0,π] whose set of discontinuities has probability 0 with respect to the process {Y(λ); 0 ≤ λ ≤ π}, then h(√T[Fxx(T)(·) − Fxx(·)]) converges in distribution to h(Y(·)); see Billingsley (1968) p. 31. The metric for Dr×r[0,π] used above is often not convenient. Luckily, as the limit process of the theorem is continuous, a result of M. L. Straf applies to indicate that if h is continuous in the norm

tends in distribution to

where Yaa(λ) is a Gaussian process with 0 mean and

It may be shown that the process

converges in distribution to a 0 mean Gaussian process with covariance function (7.6.20).

If r = 1 and the series X(t), t = 0, ±1, ... is a 0 mean linear process, then Grenander and Rosenblatt (1957) demonstrated the weak convergence of the process

and the h(√T[Fxx(T)(·) − Fxx(·)]) are (measurable) random variables, then h(√T[Fxx(T)(·) − Fxx(·)]) converges in distribution to h(Y(·)). For example this implies that

The estimate considered in the theorem has the disadvantage of being discontinuous even though the corresponding population parameter is continuous and indeed differentiable. A continuous estimate is provided by


They also considered the weak convergence of the process

where fxx(T)(λ) is an estimate of the spectral density involving a weight function. The case of a 0 mean Gaussian process with square integrable spectral density was considered by Ibragimov (1963) and Malevich (1964, 1965). MacNeil (1971) considered the case of a 0 mean bivariate Gaussian process. Brillinger (1969c) considers the case of a 0 mean r vector-valued process satisfying Assumption 2.6.2(1) and shows convergence in a finer topology. Clevenson (1970) considered the weak convergence of the discontinuous process of the theorem in the case of a 0 mean Gaussian series.

7.7 FURTHER CONSIDERATIONS IN THE ESTIMATION OF SECOND-ORDER SPECTRA

We begin this section by developing the asymptotic distribution of a consistent estimate of the spectral density matrix based on tapered data. Suppose that we wish to estimate the spectral density matrix, fxx(λ), of an r vector-valued series X(t), t = 0, ±1, ... with mean function cX. Let ha(u), −∞ < u < ∞, denote a tapering function satisfying Assumption 4.3.1 for a = 1, ..., r. Suppose the tapered values ha(t/T)Xa(t), t = 0, ±1, ... are available for analysis. Suppose the mean function is estimated by

where

Let

We will base our estimate of fxx(λ) on the Fourier transforms of mean-adjusted tapered values, specifically on


is an estimate of the cross-covariance function cab(u).

Suppose Wab(α), −∞ < α < ∞, a, b = 1, ..., r are weight functions satisfying ∫ Wab(α)dα = 1. In the present case involving arbitrary tapering functions, no particular advantage accrues from a smoothing of the periodogram values at the particular frequencies 2πs/T, s = 0, ±1, .... For this reason we consider the following estimate involving a continuous weighting

where

From (7.7.3) we see that expression (7.7.5) may be written

where

Following the discussion of Section 7.2, we next form the second-order periodograms


where the values BT, T = 1, 2, ... are positive and bounded. Using expression (7.7.7) we see that (7.7.9) may be written

where

We will require this function to satisfy the following:

Assumption 7.7.1 The function w(u), −∞ < u < ∞, is real-valued, bounded, symmetric, w(0) = 1, and such that


Following Schwarz's inequality this is ≥ 1, and so the limiting variance is increased by tapering. However, the hope is that there has been a sufficient reduction in bias to compensate for any increase in variance. We also have

Corollary 7.7.1 Under the conditions of Theorem 7.7.1, and if BT → 0 as T → ∞, the estimate is asymptotically unbiased.

Historically, the first cross-spectral estimate widely considered had the form (7.7.10) (see Goodman (1957) and Rosenblatt (1959)), although tapering was not generally employed. Its asymptotic properties are seen to be


Exercise 3.10.7 shows the estimate (7.7.10) may be computed using a Fast Fourier Transform. We have

Theorem 7.7.1 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.2(1). Let ha(u), −∞ < u < ∞, satisfy Assumption 4.3.1 for a = 1, ..., r and be such that ∫ ha(u)hb(u)du ≠ 0. Let wab(u), −∞ < u < ∞, satisfy Assumption 7.7.1 for a, b = 1, ..., r. Let BT T → ∞ as T → ∞. Then

Also

Finally, the variates fab(T)(λ1), ..., fab(T)(λJ) are asymptotically normal with

the above covariance structure.

A comparison of expressions (7.7.14) and (7.4.17) shows that, asymptotically, the effect of tapering is to multiply the limiting variance by a factor

This factor equals 1 in the case of no tapering, that is, ha(t) = 1 for 0 ≤ t < 1 and = 0 for other t. In the case that the same tapering function is used for all series, that is, ha(t) = h(t) for a = 1, ..., r, the factor becomes
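The displayed factor was lost in extraction; for a common taper h on [0, 1] it takes the standard form ∫ h(t)⁴dt / (∫ h(t)²dt)², which is 1 for the flat taper. A minimal numerical sketch (function names are ours, not the book's):

```python
import numpy as np

def taper_variance_factor(h, n=100_000):
    """Evaluate  integral h(t)^4 dt / (integral h(t)^2 dt)^2  on [0, 1]
    by the midpoint rule -- the variance-inflation factor for a common
    taper h (assumed standard form; the book's display was lost)."""
    t = (np.arange(n) + 0.5) / n          # midpoint rule abscissae
    ht = h(t)
    return np.mean(ht ** 4) / np.mean(ht ** 2) ** 2

flat = taper_variance_factor(lambda t: np.ones_like(t))            # no taper
hann = taper_variance_factor(lambda t: 0.5 * (1 - np.cos(2 * np.pi * t)))

print(flat)   # 1.0 -- no variance inflation without tapering
print(hann)   # ~1.944 (= 35/18) for a Hanning-type cosine taper
```

The Hanning value 35/18 ≈ 1.94 illustrates the remark above: tapering roughly doubles the limiting variance here, the price paid for the reduction in bias.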


or the average of the two. These estimation procedures have the advantage of allowing us to investigate whether or not the structure of the series is slowly evolving in time; see Brillinger and Hatanaka (1969). This type of estimate was suggested in Blanc-Lapierre and Fortet (1953). One useful means of forming the required series is through the technique of complex demodulation; see Section 2.7 and Brillinger (1964b).
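Complex demodulation at a frequency λ0 amounts to multiplying the series by e^(−iλ0 t) and low-pass filtering the product. A rough sketch, with a simple moving average standing in for the low-pass filters of Section 2.7 (all names are illustrative):

```python
import numpy as np

def complex_demodulate(x, lam0, half_width=25):
    """Sketch of complex demodulation at frequency lam0 (radians/sample):
    multiply by exp(-1j*lam0*t), then low-pass with a moving average
    (a crude stand-in for a proper low-pass filter)."""
    t = np.arange(len(x))
    demod = x * np.exp(-1j * lam0 * t)
    kernel = np.ones(2 * half_width + 1) / (2 * half_width + 1)
    return np.convolve(demod, kernel, mode="same")

# cosine at lam0 with slowly growing amplitude; the modulus of the result
# tracks half the instantaneous amplitude (the cosine splits into components
# at +lam0 and -lam0, and the filter retains only the one near zero)
t = np.arange(2000)
lam0 = 2 * np.pi * 0.05
x = (1 + t / 2000) * np.cos(lam0 * t)
y = complex_demodulate(x, lam0)
```

The slowly varying modulus and argument of y are exactly the sort of "required series" referred to above for examining evolving structure.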

Brillinger (1968) was concerned with estimating the cross-spectrum of a 0 mean bivariate Gaussian series from the values sgn X1(t), sgn X2(t), t = 0, ..., T − 1. The asymptotic distribution of the estimate (7.7.10), without tapering, was derived.

On some occasions we may wish a measure of the extent to which fab(T)(λ) may deviate from its expected value simultaneously as a function of λ and T. We begin by examining the behavior of the second-order periodograms. Theorem 4.5.1 indicated that for a 0 mean series and under regularity conditions

with probability 1. This gives us directly

Theorem 7.7.2 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.3 and having mean 0. Let ha(u), −∞ < u < ∞, satisfy Assumption 4.3.1, a = 1, ..., r. Let Ixx(T)(λ) be given by (7.2.5). Then

with probability 1 for a, b = 1, ..., r.


essentially the same as those of the estimate of Section 7.4. It is investigated in Akaike and Yamanouchi (1962), Jenkins (1963a), Murthy (1963), and Granger (1964). Freiberger (1963) considers approximations to its distribution in the case that the series is bivariate Gaussian.

The discussion of Section 7.1 suggests an alternate class of estimates of the second-order spectrum fab(λ). Let Ya(t) denote the series resulting from band-pass filtering the series Xa(t) with a filter having transfer function A(α) = 1 for |α ± λ| < Δ, and = 0 otherwise, −π < α, λ ≤ π. Consider estimating Re fab(λ) by

or the average of the two. Consider estimating Im fab(λ) by


Whittle (1959) determined a bound for the second-order periodogram that held in probability; see also Walker (1965). Parthasarathy (1960) found a probability 1 bound for the case of a single periodogram ordinate; he found that a single ordinate could grow at the rate log log T, rather than the log T of (7.7.20). Before turning to an investigation of the behavior of fab(T)(λ) − E fab(T)(λ) we set down a further assumption of the character of Assumption 2.6.3 concerning the series X(t), t = 0, ±1, ....

Assumption 7.7.2 X(t), t = 0, ±1, ... is an r vector-valued series satisfying Assumption 2.6.1. Also, with Cn given by (2.6.7),

This is finite for |C2 z| < 1 and so Assumption 7.7.2 is satisfied in this case so long as Assumption 2.6.1 is satisfied. We may now set down

Theorem 7.7.3 X(t), t = 0, ±1, ... is an r vector-valued series satisfying Assumption 7.7.2. ha(u), −∞ < u < ∞, satisfies Assumption 4.3.1 for a = 1, ..., r. The wab(u), −∞ < u < ∞, satisfy Assumption 7.7.1 and vanish for |u| sufficiently large, a, b = 1, ..., r. fab(T)(λ) is given by (7.7.10). Let η > 0 be given and such that ΣT BTη < ∞. Then

Ihn" sup |/.*«-)(x) - £/«6(r)(A)|(ftT/log l/fir)1/2

T->co X

for z in a neighborhood of 0. In (7.7.21) the inner summation is over all indecomposable partitions ν = (ν1, ..., νP) of the table

with νp having np > 1 elements, p = 1, ..., P.

In the case of a Gaussian series, Cn = 0 for n > 2 and the series of (7.7.21) becomes

with probability 1 for a, b = 1,. . . , r.


If Σ |u| |cab(u)| < ∞ and ∫ |α| |Wab(α)|dα < ∞, then Theorem 3.3.1 and expression (7.7.13) show that

and so we can say

with probability 1, the error terms being uniform in λ.

We see that in the case of Theorem 7.7.3, fab(T)(λ) is a strongly consistent estimate of fab(λ). Woodroofe and Van Ness (1967) showed, under regularity conditions including X(t) being a linear process, that

in probability. The data are not tapered here. They also investigated the limiting distribution of the maximum deviation.

The following cruder result may be developed under the weaker Assumption 2.6.1:

Theorem 7.7.4 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let ha(u), −∞ < u < ∞, satisfy Assumption 4.3.1. Let wab(u), −∞ < u < ∞, satisfy Assumption 7.7.1 and vanish for |u| sufficiently large, a, b = 1, ..., r. Let fab(T)(λ) be given by (7.7.10). Let BT T → ∞, BT → 0 as T → ∞. Then for any ε > 0,

in probability as T → ∞. If, in addition, ΣT BTm < ∞ for some m > 0, then the event (7.7.28) occurs with probability 1 as T → ∞.

In Theorem 7.7.4 the multiplier (BT T / log(1/BT))^(1/2) of (7.7.27) has been replaced by the smaller (BT T)^(1/2) BTε.

If we wish to use the estimate (7.4.5) and are content with a result concerning the maximum over a discrete set of points, we have

Theorem 7.7.5 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let Wab(α), −∞ < α < ∞, satisfy Assumption 5.6.1. Let fab(T)(λ) be given by (7.4.5). Let BT → 0, BT T → ∞ as T → ∞. Then for any ε > 0

in probability as T → ∞. If, in addition, ΣT BTm < ∞ for some m > 0, then the event (7.7.29) occurs with probability 1 as T → ∞.


In Section 5.8 we discussed the importance of prefiltering a stretch of data prior to forming a spectral estimate. Expression (7.7.13) again makes this clear. The expected value of fab(T)(λ) is not generally fab(λ); rather it is a weighted average of fab(α), −∞ < α < ∞, with weight concentrated in the neighborhood of λ. If fab(α) has any substantial peaks or valleys, the weighted average could be far from fab(λ). In practice it appears to be the case that cross-spectra vary more substantially than power spectra. Consider a commonly occurring situation in which a series X2(t) is essentially a delayed version of a series X1(t), for example

Here, if v has any appreciable magnitude at all, the function f21(λ) will be rapidly altering in sign as λ varies. Any weighted average of it, such as (7.7.13), will be near 0. We could well be led to conclude that there was no relation between the series, when in fact there was a strong linear relation. Akaike (1962) has suggested that a situation of this character be handled by delaying the series X1(t) by approximately v time units. That is, we analyze the series [X1(t − v*), X2(t)], t = 0, ±1, ... with v* near v, instead of the original stretch of series. This is a form of prefiltering. Akaike suggests that in practice one might determine v* as the lag where |c21(T)(u)| is greatest. If the estimated delay is anywhere near v at all, the cross-spectrum being estimated now should be a much less rapidly varying function.
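Akaike's alignment suggestion is easy to sketch numerically: choose v* as the lag maximizing the sample cross-covariance magnitude, then shift one series before estimating the cross-spectrum. A minimal version (function name and details are ours):

```python
import numpy as np

def estimate_delay(x1, x2, max_lag):
    """Akaike's alignment suggestion, sketched: choose v* as the lag u at
    which the sample cross-covariance |c21(u)| = |cov(X2(t+u), X1(t))| is
    greatest; one then analyzes [X1(t - v*), X2(t)] rather than the raw pair."""
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    n = len(x1)

    def c21(u):                      # sample covariance of x2(t+u) with x1(t)
        if u >= 0:
            return np.dot(x2[u:], x1[:n - u]) / n
        return np.dot(x2[:n + u], x1[-u:]) / n

    lags = np.arange(-max_lag, max_lag + 1)
    return lags[np.argmax([abs(c21(u)) for u in lags])]

rng = np.random.default_rng(0)
x1 = rng.standard_normal(1000)
x2 = np.concatenate([np.zeros(7), x1[:-7]])   # X2 is X1 delayed by 7
print(estimate_delay(x1, x2, max_lag=20))     # 7 (the true delay)
```

After realignment by the estimated v*, the cross-spectrum of the shifted pair is slowly varying and far better suited to the smoothing above.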

In Section 5.8 it was suggested that a prewhitening filter be determined by fitting an autoregressive model to a time series of interest. Nettheim (1966) has suggested an analogous procedure in the estimation of a cross-spectrum. We fit a model such as

by least squares and estimate the cross-spectrum of the residuals with X1(t). In the full r vector-valued situation we could determine r vectors a(T)(1), ..., a(T)(m) to minimize

t = 0, ±1, ... for constants a, v and ε(t) an error series orthogonal to the series X1(t). Then the cross-spectrum is given by


We then form fεε(T)(λ), a spectral estimate based on the residuals

It follows that the population parameter and corresponding estimate will be essentially the same for all the frequencies

7.8 A WORKED EXAMPLE

For an example of the estimate developed in Section 7.3 we return to the series considered in Section 7.2. There X1(t) was the seasonally adjusted series of mean monthly temperatures for Berlin (1780 to 1950) and X2(t) was the seasonally adjusted series of mean monthly temperatures for Vienna (1780 to 1950). The periodograms and cross-periodogram for this data were given in Figures 7.2.1 to 7.2.4.
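The estimates of this section average 2m + 1 = 21 adjacent periodogram ordinates. A sketch of a cross-spectral estimate of this general smoothed-periodogram form (a simplified, untapered version of our own, not the book's exact expression (7.3.2)):

```python
import numpy as np

def smoothed_cross_spectrum(x1, x2, m=10):
    """Cross-spectral estimate: average 2m + 1 adjacent cross-periodogram
    ordinates (m = 10 gives 21-ordinate averaging, as in Figures 7.8.1-7.8.4).
    Returns estimates of f12 at the frequencies 2*pi*s/T, s = 0, ..., T-1."""
    T = len(x1)
    d1 = np.fft.fft(x1 - np.mean(x1))
    d2 = np.fft.fft(x2 - np.mean(x2))
    I12 = d1 * np.conj(d2) / (2 * np.pi * T)            # cross-periodogram
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    padded = np.concatenate([I12[-m:], I12, I12[:m]])   # wrap around ends
    return np.convolve(padded, kernel, mode="valid")
```

Its real and imaginary parts are the co- and quad-spectral estimates plotted in Figures 7.8.3 and 7.8.4.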

Figures 7.8.1 to 7.8.4 of the present section give f11(T)(λ), f22(T)(λ), Re f12(T)(λ), Im f12(T)(λ) using estimates of the form (5.4.1) and (7.3.2) with m = 10. If we consider log10 power spectral estimates, expression (5.6.15) suggests that the standard errors are both approximately .095. It is interesting to contrast the forms of Re f12(T)(λ) and Im f12(T)(λ); Re f12(T)(λ) is

and then estimate fxx(λ) by

where

Generally it is wise to use prior knowledge to suggest a statistical model for a series of interest, to fit the model, and then to compute a spectral estimate based on the residuals.
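The autoregressive prewhitening step described above can be sketched as a least-squares fit followed by a residual series; spectra estimated from the residuals are afterwards recoloured. All names and normalizations below are our own simplified illustration, not Nettheim's exact procedure:

```python
import numpy as np

def ar_prewhiten(x, order):
    """Least-squares AR(order) fit, returning coefficients a and residuals
    e(t) = x(t) - sum_j a_j x(t - j).  Spectra estimated from e(t) would be
    recoloured by |1 - sum_j a_j exp(-1j*j*lam)|^(-2)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    X = np.column_stack([x[order - j:n - j] for j in range(1, order + 1)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, y - X @ a

rng = np.random.default_rng(1)
e = rng.standard_normal(5000)
x = np.zeros(5000)
for s in range(1, 5000):                  # AR(1): x(t) = 0.8 x(t-1) + e(t)
    x[s] = 0.8 * x[s - 1] + e[s]
a, resid = ar_prewhiten(x, order=1)
print(a[0])                               # close to 0.8
```

The residual series has a nearly flat spectrum, so the bias from the weighted averaging in (7.7.13) is much reduced before recolouring.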

Nothing much remains to be said about the complication of aliasing after the discussion of Section 5.11. We simply note that the population parameter fxx(λ) and its estimates both possess the periodicity and symmetry properties

If possible we should band-pass filter the series prior to digitization in order to essentially eliminate any frequency components that might cause confusion in the interpretation of the spectral estimate.


Figure 7.8.2 f22(T)(λ) for seasonally adjusted monthly mean temperatures at Vienna for the years 1780-1950 with 21 periodogram ordinates averaged. (Logarithmic plot.)

Figure 7.8.1 f11(T)(λ) for seasonally adjusted monthly mean temperatures at Berlin for the years 1780-1950 with 21 periodogram ordinates averaged. (Logarithmic plot.)


Figure 7.8.3 Re f12(T)(λ), estimate of the cospectrum of Berlin and Vienna temperatures for the years 1780-1950 with 21 periodogram ordinates averaged.

Figure 7.8.4 Im f12(T)(λ), estimate of the quadspectrum of Berlin and Vienna temperatures for the years 1780-1950 with 21 periodogram ordinates averaged.


Figure 7.8.5 c11(T)(u), estimate of the autocovariance function of Berlin temperatures.
Figure 7.8.6 c22(T)(u), estimate of the autocovariance function of Vienna temperatures.

Figure 7.8.7 c12(T)(u), estimate of the crosscovariance function of Berlin and Vienna temperatures.

everywhere positive, of appreciable magnitude at several frequencies and approximately constant otherwise, while Im f12(T)(λ) simply fluctuates a little about the value 0, suggesting that Im f12(λ) = 0. Other statistics for this example were given in Section 6.10.

For completeness we also give estimates of the auto- and cross-covariance functions of these two series. Figure 7.8.5 is an estimate of the autocovariance function of the series of Berlin mean monthly temperatures, with


seasonal effects removed. Likewise Figure 7.8.6 is an estimate of the autocovariance function of the Vienna series. Figure 7.8.7 is the function c12(T)(u) for u = 0, ±1, ....

Figure 7.8.8 Logarithm of estimated power spectrum of seasonally adjusted monthly mean temperatures at various stations, with 115 periodogram ordinates averaged.


Table 7.8.1 Covariance Matrix of the Temperature Series

Vienna 4.272
Berlin 3.438 4.333
Copenhagen 2.312 2.962 2.939
Prague 3.986 3.756 2.635 6.030
Stockholm 2.056 2.950 3.052 2.325 4.386
Budapest 3.808 3.132 2.047 3.558 1.843 4.040
De Bilt 2.665 3.209 2.315 2.960 2.170 2.261 3.073
Edinburgh .941 1.482 1.349 1.182 1.418 .627 1.509 2.050
New Haven .045 .288 .520 .076 .672 .009 .206 .404 2.939
Basel 3.099 3.051 1.946 3.212 1.576 2.776 2.747 1.179 .178 3.694
Breslau 3.868 4.227 2.868 4.100 2.805 3.646 3.053 1.139 .165 3.123 5.095
Vilna 3.126 3.623 2.795 3.152 3.349 2.993 2.392 .712 .057 1.962 3.911 6.502
Trondheim 1.230 2.165 2.358 1.496 3.312 .984 1.656 1.429 .594 .884 1.801 2.185 3.949
Greenwich 1.805 2.255 1.658 2.005 1.570 1.450 2.300 1.564 .440 2.310 2.059 1.261 1.255 2.355


Table 7.8.2 Sample Correlation Matrix of the Seasonally Adjusted Series

1 2 3 4 5 6 7 8 9 10 11 12 13

1 Vienna
2 Berlin .80
3 Copenhagen .65 .83
4 Prague .79 .73 .63
5 Stockholm .48 .68 .85 .45
6 Budapest .92 .75 .59 .72 .44
7 De Bilt .74 .88 .77 .69 .59 .64
8 Edinburgh .32 .50 .56 .34 .48 .22 .61
9 New Haven .01 .08 .18 .02 .19 .00 .07 .16

10 Basel .78 .76 .59 .68 .39 .72 .82 .43 .05
11 Breslau .83 .90 .74 .74 .59 .80 .77 .36 .04 .72
12 Vilna .59 .68 .64 .50 .63 .58 .54 .20 .01 .40 .68
13 Trondheim .30 .52 .69 .31 .80 .25 .48 .50 .17 .23 .40 .43
14 Greenwich .57 .71 .67 .53 .49 .47 .86 .72 .17 .78 .59 .32 .41

1 2 3 4 5 6 7 8 9 10 11 12 13

All of these figures are consistent with a hypothesis of an instantaneous relation between the two series. (Instantaneous here means small time lead or lag relative to an interval of one month, because the data is monthly.)

As a full vector-valued example we consider the series of mean monthly temperatures recorded at the stations listed in Table 1.1.1. The series were initially seasonally adjusted by removing monthly means. Table 7.8.1 gives cxx(T)(0), the estimated 0 lag autocovariance matrix. Table 7.8.2 gives the 0 lag correlations of the series. Except for the New Haven series, the series are seen to be quite intercorrelated.

The spectral density matrix was estimated through a statistic of the form (7.3.2) with m = 57. Because there are so many second-order spectra we do not present all the estimates. Figure 7.8.8 gives the log10 of the estimated power spectra. These are all seen to have essentially the same shape. Figure 7.8.9 gives the sample coherences, |R1j(T)(λ)|², taking X1(t) to be the Greenwich series and letting j run across the remaining series. The horizontal line in each of the diagrams corresponds to the 0 lag correlation squared. The plots are seen to be vaguely constant, fluctuating about the horizontal line in each case. This last is suggestive of instantaneous dependence of the series, for if cab(u) = 0 for u ≠ 0, then |Rab(λ)|² = |cab(0)|²/[caa(0)cbb(0)] for −∞ < λ < ∞. The correlation is seen to be greatest for the De Bilt series followed by Basel. The correlation is lowest for New Haven, Conn., on the opposite side of the Atlantic.
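The sample squared coherence plotted in Figure 7.8.9 has the form |f1j(T)(λ)|²/[f11(T)(λ) fjj(T)(λ)]. A simplified sketch using smoothed periodograms (our own untapered version; the normalizing constants cancel in the ratio):

```python
import numpy as np

def squared_coherence(x1, x2, m):
    """Sample squared coherence |R12(lam)|^2 = |f12|^2 / (f11 f22), with
    each second-order spectrum estimated by averaging 2m + 1 adjacent
    periodogram ordinates.  The 1/(2*pi*T) factors cancel and are omitted."""
    d1 = np.fft.fft(x1 - np.mean(x1))
    d2 = np.fft.fft(x2 - np.mean(x2))
    kernel = np.ones(2 * m + 1) / (2 * m + 1)

    def smooth(I):                       # circular moving average
        return np.convolve(np.concatenate([I[-m:], I, I[:m]]),
                           kernel, mode="valid")

    f11 = smooth(np.abs(d1) ** 2)
    f22 = smooth(np.abs(d2) ** 2)
    f12 = smooth(d1 * np.conj(d2))
    return np.abs(f12) ** 2 / (f11 * f22)
```

For unrelated series the statistic fluctuates about roughly 1/(2m + 1); for strongly and instantaneously related series it sits near the squared 0 lag correlation, the behavior seen in the figure.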


Figure 7.8.9 Estimated coherences of seasonally adjusted Greenwich monthly mean temperatures with similar temperatures at 13 other stations for the years 1780-1950.


Figure 9.6.1 gives log10 of the estimated power spectra for an estimate of the form (7.3.2) with m = 25. These curves are more variable, as is to be expected from the sampling theory developed in this chapter.


where μ is a constant; where α(t), t = 0, ±1, ... is a 0 mean stationary series with power spectrum fαα(λ); where the series βj(t), t = 0, ±1, ..., j = 1, ..., J are 0 mean stationary series each with power spectrum fββ(λ), −∞ < λ < ∞; and where the series εjk(t), t = 0, ±1, ...; k = 1, ..., K; j = 1, ..., J are 0 mean stationary series each with power spectrum fεε(λ), −∞ < λ < ∞. The parameter μ relates to the mean thickness of the sheeting. The series α(t), t = 0, ±1, ... is common to all the sheets and the series βj(t), t = 0, ±1, ... relates to the effect of the jth batch, if such an individual effect exists. It is common to all sheets selected from batch j. The series εjk(t), t = 0, ±1, ... is an error series. Taking note of the language of the random effects model of experimental design (Scheffé (1959)) we might call fαα(λ), fββ(λ), fεε(λ) components of the power spectrum at frequency λ. The spectrum fββ(λ) might be called the between batch power spectrum at frequency λ and fεε(λ) the within batch power spectrum at frequency λ.

Under the above assumptions, we note that EXjk(t) = μ, t = 0, ±1, ... and the series have power spectra and cross-spectra as follows:

and

The coherency between series corresponding to sheets selected from the same batch is seen to be


7.9 THE ANALYSIS OF SERIES COLLECTED IN AN EXPERIMENTAL DESIGN

On occasion the subscripts a = 1, ..., r of an r vector-valued series X(t) = [Xa(t)], t = 0, ±1, ... may have an inherent structure of their own, as in the case where the series have been collected in an experimental design. Consider for example the case of a balanced one-way classification, where K of the series fall into each of J classes. Here we would probably denote the series by Xjk(t), t = 0, ±1, ...; k = 1, ..., K; j = 1, ..., J with r = JK. Such series would arise if we were making up J batches of sheeting and drawing K pieces of sheeting from each batch. If we were interested in the uniformity of the sheeting we could let t refer to position from an origin along a cross-section of the sheeting and let Xjk(t) denote the thickness at position t on sheet k selected from batch j. A model that might come to mind for this situation is


This might be called the intraclass coherency at frequency λ. The coherency between the series corresponding to sheets selected from different batches is seen to be fαα(λ)/[fαα(λ) + fββ(λ) + fεε(λ)].

We might be interested in a measure of the extent to which sheets from the same batch are related at frequency λ. One such measure is the coherency (7.9.5). In the extreme case of α(t), βj(t) identically 0, this measure is identically 0. In another extreme case where εjk(t) is identically 0, this measure is 1. We turn to the problem of estimating fαα(λ), fββ(λ), and fεε(λ).

From the model (7.9.1) we see that

where

with similar definitions for dα(T), dβj(T), dεjk(T). From Theorem 4.4.2 the variate

dα(T)(λ) is approximately NC(0, 2πT fαα(λ)); the variates dβj(T)(λ), j = 1, ..., J are approximately independent NC(0, 2πT fββ(λ)) variates for λ ≢ 0 (mod π), while the variates dεjk(T)(λ), k = 1, ..., K, j = 1, ..., J are approximately independent NC(0, 2πT fεε(λ)) variates for λ ≢ 0 (mod π). The model (7.9.6) therefore has the approximate form of the random effects model of analysis of variance in a balanced one-way classification; see Scheffé (1959). This suggests that we evaluate the statistic

and then estimate fεε(λ) by

We estimate Kfββ(λ) + fεε(λ) by

and finally estimate JKfαα(λ) + Kfββ(λ) + fεε(λ) by

in the case that λ ≢ 0 (mod 2π).
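Since the displays (7.9.8) to (7.9.11) were lost in extraction, the following sketch reconstructs the one-way decomposition only from the text's description: the within mean square estimates fεε(λ), and the between mean square estimates Kfββ(λ) + fεε(λ). All names and normalizations are ours:

```python
import numpy as np

def spectral_anova(X, s):
    """One-way random-effects decomposition at frequency 2*pi*s/T for data
    X of shape (J, K, T): K sheets in each of J batches.  Returns
    (within, between); E(within) ~ f_ee(lam), E(between) ~ K f_bb + f_ee."""
    J, K, T = X.shape
    d = np.fft.fft(X, axis=2)[:, :, s] / np.sqrt(2 * np.pi * T)
    dj = d.mean(axis=1)                       # batch means d_j.
    dd = d.mean()                             # grand mean d_..
    within = np.sum(np.abs(d - dj[:, None]) ** 2) / (J * (K - 1))
    between = K * np.sum(np.abs(dj - dd) ** 2) / (J - 1)
    return within, between
```

Differencing the two mean squares and dividing by K then gives an estimate of the between batch spectrum fββ(λ), in parallel with the variance-component estimates of the ordinary random effects model.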

Theorem 7.9.1 Let JK series Xjk(t), t = 0, ±1, ...; k = 1, ..., K; j = 1, ..., J be given of the form (7.9.1) where μ is a constant, where α(t),


βj(t), εjk(t), t = 0, ±1, ...; k = 1, ..., K; j = 1, ..., J are independent 0 mean series satisfying Assumption 2.6.1 and having power spectra fαα(λ), fββ(λ), fεε(λ) respectively. Let Iεε(T)(λ), KIββ(T)(λ) + Iεε(T)(λ), JKIαα(T)(λ) + KIββ(T)(λ) + Iεε(T)(λ) be (7.9.9) to (7.9.11). Then if λ ≢ 0 (mod π), these statistics are asymptotically independent fεε(λ)χ²_{2J(K−1)}/[2J(K−1)], [Kfββ(λ) + fεε(λ)]χ²_{2J}/(2J), [JKfαα(λ) + Kfββ(λ) + fεε(λ)]χ²_{2}/2. Also for sl(T) an integer with λl(T) = 2πsl(T)/T → λl as T → ∞ and with 2λl(T), λl(T) ± λm(T) ≢ 0 (mod 2π) for 1 ≤ l < m ≤ L for T sufficiently large, the statistics Iεε(T)(λl(T)), KIββ(T)(λl(T)) + Iεε(T)(λl(T)), JKIαα(T)(λl(T)) + KIββ(T)(λl(T)) + Iεε(T)(λl(T)), l = 1, ..., L are asymptotically independent.

It follows from Theorem 7.9.1 that the estimate

of fββ(λ) will be distributed asymptotically as the difference of two independent chi-squared variates. It also follows that the ratio

will be distributed asymptotically as

as T → ∞. This last result may be used to set approximate confidence intervals for the ratio of power spectra fββ(λ)/fεε(λ).

We have seen previously that advantages accrue from the smoothing of periodogram type statistics. The same is true in the present context. For s(T) an integer with 2πs(T)/T near λ ≢ 0 (mod 2π) consider the statistics

and

It follows from Theorem 7.9.1 that these will be asymptotically distributed as independent fεε(λ)χ²_{2J(K−1)(2m+1)}/[2J(K−1)(2m+1)] and [Kfββ(λ) + fεε(λ)]χ²_{2J(2m+1)}/[2J(2m+1)] respectively.

The discussion of this section may clearly be extended to apply to time series collected in more complicated experimental designs. The calculations and asymptotic distributions will parallel those going along with a normal


j = 1, ..., N, where s(t) is a fixed unknown signal and nj(t) a random noise series. He suggests the consideration of F ratios computed in the frequency domain. Brillinger (1973) considers the model (7.9.1) also in the case that the series α(t), βj(t) are fixed and in the case that a transient series is present.


random effects model for the design concerned. Shumway (1971) considered the model

7.10 EXERCISES

7.10.1 Given the series [X1(t), X2(t)], t = 0, ±1, ... with absolutely summable cross-covariance function c12(u) = cov[X1(t + u), X2(t)], t, u = 0, ±1, ..., show that f21(λ) = f12(−λ).

7.10.2 Under the conditions of the previous exercise, show that the co-spectrum of the series X1H(t), the Hilbert transform of X1(t), with the series X2(t) is the quad-spectrum of the series X1(t) with the series X2(t).

7.10.3 Under the conditions of the first exercise, show that f12(λ), −∞ < λ < ∞, is real-valued in the case that c12(u) = c21(u).

7.10.4 Suppose the auto- and cross-covariance functions of the stationary series [X1(t), X2(t)], t = 0, ±1, ... are absolutely summable. Use the identity

to prove that

7.10.7 Let [X1(t), X2(t)], t = 0, ±1, ... be a stationary series.
(a) If Y1(t) = X1(t) + X2(t), Y2(t) = X1(t) − X2(t), show how the co-spectrum, Re f12(λ), may be estimated from the power spectra of Y1(t) and Y2(t).
(b) If Y1(t) = X1(t + 1) − X1(t − 1), and Y2(t) = X2(t), show how the quad-spectrum, Im f12(λ), may be estimated from the co-spectrum of Y1(t) and Y2(t).
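Part (a) rests on the Fourier-transform identity |d1 + d2|² − |d1 − d2|² = 4 Re(d1 conj(d2)), so Re f12 = [fY1Y1 − fY2Y2]/4; this can be verified numerically at the periodogram level (helper names are ours):

```python
import numpy as np

def periodogram(x):
    T = len(x)
    d = np.fft.fft(x - np.mean(x))
    return np.abs(d) ** 2 / (2 * np.pi * T)

def cross_periodogram(x1, x2):
    T = len(x1)
    d1 = np.fft.fft(x1 - np.mean(x1))
    d2 = np.fft.fft(x2 - np.mean(x2))
    return d1 * np.conj(d2) / (2 * np.pi * T)

rng = np.random.default_rng(2)
x1 = rng.standard_normal(1024)
x2 = 0.5 * x1 + rng.standard_normal(1024)

# With Y1 = X1 + X2 and Y2 = X1 - X2:  Re I12 = (I_Y1Y1 - I_Y2Y2) / 4
lhs = cross_periodogram(x1, x2).real
rhs = (periodogram(x1 + x2) - periodogram(x1 - x2)) / 4
print(np.allclose(lhs, rhs))   # True
```

The same identity carries over to any linear smoothing of these periodograms, which is what the exercise's co-spectral estimate amounts to.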


7.10.8 Under the conditions of Theorem 7.4.4 show that

is asymptotically bivariate normal with variances

and covariance

is a plausible estimate of f12(0).
7.10.13 Show that the results of Theorems 7.2.3, 7.2.4, 7.2.5, and 7.3.3 are exact rather than asymptotic when [X1(t), X2(t)], t = 0, ±1, ... is a sequence of independent identically distributed bivariate normals.

for l = 1, ..., L. Show that the c12(V)(l), l = 1, ..., L are asymptotically independent

7.10.12 Let [X1(t), X2(t)], t = 0, ±1, ... be a bivariate series satisfying Assumption 2.6.1. Let

variates. Conclude that with T = LV

7.10.9 Under the conditions of Theorem 7.4.4, show that |f12(T)(λ)|, |f12(T)(μ)| are asymptotically bivariate normal with covariance structure

7.10.10 Under the conditions of Theorem 7.4.4, show that φ12(T)(λ) = arg f12(T)(λ), φ12(T)(μ) = arg f12(T)(μ) are asymptotically bivariate normal with covariance structure given by

7.10.11 Under the condition Σ |c12(u)| < ∞, show that the expected value of the estimate f12(T)(λ) based on the mean-corrected values X1(t) − c1(T), X2(t) − c2(T) is

Page 302: David R. Brillinger Time Series Data Analysis and Theory 2001

7.10 EXERCISES 281

7.10.14 Let the series [X1(t), X2(t)], t = 0, ±1, ... satisfy Assumption 2.6.2(1) and have mean 0. Then

where there are finite K, L such that

7.10.15 Suppose the conditions of Theorem 7.3.3 are satisfied. Let ρ = fab(λ)/√(faa(λ)fbb(λ)), a ≠ b. Then x = fab(T)(λ)/√(faa(T)(λ)fbb(T)(λ)) is asymptotically distributed with density function

and density function

Hint: Use Exercise 4.8.33.
7.10.16 Let cxx(u), u = 0, ±1, ... denote the autocovariance matrix of the stationary r vector-valued series X(t), t = 0, ±1, .... Show that the matrix cxx(0) − cxx(u)ᵀ cxx(0)⁻¹ cxx(u) is non-negative definite for u = 0, ±1, ....

Suppose Det fxx(λ) ≠ 0, −∞ < λ < ∞. Show that there exists a summable r × r filter {a(u)} such that the series

7.10.20 Under the conditions of Theorem 4.5.2, show that there exists a finite L such that with probability 1

7.10.19 Let X(t), t = 0, ±1, ... be the vector-valued series of Example 2.9.7. Show that

7.10.17 Let fxx(λ), −∞ < λ < ∞, denote the spectral density matrix of the stationary r vector-valued series X(t), t = 0, ±1, .... Show that Im fxx(λ) = 0 for λ ≡ 0 (mod π).

7.10.18 Let the autocovariance function of Exercise 7.10.16 satisfy


7.10.21 Show that Theorem 7.2.1 takes the form

in the case of untapered data.

7.10.23 Suppose the conditions of Theorem 7.2.5 are satisfied. Set

is asymptotically (L − 1)⁻¹ WrC(L − 1, fxx(λ)) if λ ≢ 0 (mod π) and

asymptotically (L − 1)⁻¹ Wr(L − 1, fxx(λ)) if λ ≡ 0 (mod π).
7.10.24 Consider the estimate

where 2πs(T)/T → λ ≢ 0 (mod π) and Σs Ws = 1. Under the conditions of Theorem 7.3.3, show that fxx(T)(λ) is distributed asymptotically as

where the Ws, s = 0, ±1, ..., ±m are independent WrC(1, fxx(λ)) variates. Indicate the mean and covariance matrix of the limiting distribution.


where B(α) is Brownian motion on [0, π].

7.10.34 If the series X(t), t = 0, ±1, ... is a real-valued white noise process with variance σ² and fourth cumulant κ4, show that the limit process of Theorem 7.6.3 has covariance function

7.10.25 Suppose the estimate

is used in the case of T even and X = TT. Under the conditions of Theorem7.3.3, show that it is asymptotically (2m + 1)-' Wr(2m + l,fxx(ir)).

7.10.26 Show that the estimate (7.4.5) is non-negative definite if the matrix [ Wab(a)]is non-negative definite for — oo < a < ». Hint: Use Schur's resultthat [AabBah] is non-negative if [Aab], [Bab] are; see Bellman (1960) p. 94.

7.10.27 Show that the matrix [ fab(T)(X)] of estimates (7.7.9) is non-negative definite

if the matrix [Wab(a)} is non-negative definite, — <» < a < «>, and ifha(u) = h(u) for a = 1, . . . , r.

7.10.28 Under the conditions of Theorem 7.3.2, show that fab(T)(X) is consistent if

/-a(X)or/M(X) = 0.

7.10.29 Under the conditions of Theorem 7.3.3, show that V?Wr) - c*) andfxx(T)(Q) are asymptotically independent Nr(Q,2trfxx(fy) and (2m)~l

Wr(2mfXx(ty) respectively. If

conclude that 32/r is asymptotically Fr>2m. This result may be used to con-struct approximate confidence regions for C*.

7.10.30 Under the conditions of Theorem 7.4.3, show that in order to minimize themean-squared error E\fab

(1">(\) — fab(\)\2 asymptotically, one should have

BT = O(r~1/5); see Bartlett (1966) p. 316.

7.10.31 Under the conditions of Theorem 7.6.2, prove that Rab(T)Q^ and RCd(T)(X)

are asymptotically independent if Rab, Rac, Rad, Rbc, /?*</, Rca are 0.

7.10.32 Show that in the case that the series X(/), / = 0, ±1,... is not necessarilyGaussian, the covariance (7.6.21) equals

7.10.33 If the series AX/), / = 0, ±1, ... is stationary, real-valued and Gaussian,prove that the covariance structure of the limit process of Theorem 7.6.3is the same as that of

7.10 exrhvuyh 283



7.10.35 If X(t), t = 0, ±1, ... is a real-valued linear process, under the conditions of Theorem 7.6.3 show that

7.10.36 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Show that c_ab^(T)(u) given by (7.6.10) and c_ab^(T)(u) given by (7.6.12) have the same limiting normal distributions. See also Exercise 4.8.37.

7.10.37 Let

for −∞ < λ < ∞; a, b = 1, ..., r. Show that the matrix I_XX^(T)(λ) = [I_ab^(T)(λ)] is non-negative definite.

7.10.38 With the notation of Section 7.9, show that the following identity holds

l = 0, ..., L − 1; a, b = 1, ..., r. Under the conditions of Theorem 7.6.1 show that J_ab^(T)(λ, l), l = 0, ..., L − 1 are asymptotically independent normal with mean ∫ A(α) f_ab(α) dα as T → ∞. This result may be used to set approximate confidence limits for J_ab(A).

7.10.39

and

7.10.40 Let the series X(t), t = 0, ±1, ... satisfy Assumption 2.6.2(1). Show that expression (7.2.14) holds with the O(T^{-1}), O(T^{-2}) terms uniform in r, s ≢ 0 (mod T).

7.10.41 Use the results of the previous exercise to show that, under the conditions of Theorem 7.4.3,

converges weakly to a Gaussian process whose covariance function does not involve the fourth-order spectrum of the series X(t).


7.10.42 Let X(t), t = 0, ±1, ... satisfy Assumption 2.6.2(1). Let A(α) be of bounded variation. Let W(α) satisfy Assumption 6.4.1. Suppose P_T → ∞, with P_T B_T ≤ 1, P_T B_T T → ∞ as T → ∞. Let

Show that J_ab^(P)(A) is asymptotically normal with

Hint: Use the previous exercise.



8

ANALYSIS OF A LINEAR TIME INVARIANT RELATION BETWEEN TWO VECTOR-VALUED STOCHASTIC SERIES

8.1 INTRODUCTION

Consider an (r + s) vector-valued stationary series

t = 0, ±1, ..., with X(t) r vector-valued and Y(t) s vector-valued. We assume the series (8.1.1) satisfies Assumption 2.6.1 and we define the means

the covariances

and the second-order spectral densities


Page 308: David R. Brillinger Time Series Data Analysis and Theory 2001


The problem we investigate in this chapter is the selection of an s vector μ and an s × r filter {a(u)} such that the value

is near the value Y(t) in some sense. We develop statistical properties of estimates of the desired μ, a(u) based on a sample of values X(t), Y(t), t = 0, ..., T − 1. The problems considered in this chapter differ from those of Chapter 6 in that the independent series, X(t), t = 0, ±1, ..., is taken to be stochastic rather than fixed.

In the next section we review a variety of results concerning analogous multivariate problems.

8.2 ANALOGOUS MULTIVARIATE RESULTS

We remind the reader of the ordering for Hermitian matrices given by

if the matrix A − B is non-negative definite. This ordering is discussed in Bellman (1960), Gelfand (1961), and Siotani (1967), for example. The inequality (8.2.1) implies, among other things, that

and

where μ_j(A), μ_j(B) denote the jth largest latent values of A, B, respectively.

In the theorem below, when we talk of minimizing a Hermitian matrix-valued function A(θ) with respect to θ, we mean finding the value θ₀ such that

for all θ. A(θ₀) is called the minimum value of A(θ). We note that if θ₀ minimizes A(θ), then from (8.2.2) to (8.2.5) it also minimizes simultaneously the functionals Det A(θ), tr A(θ), A_jj(θ), and μ_j(A(θ)).

We next introduce some additional notation. Let Z be an arbitrary matrix with columns Z₁, ..., Z_J. We use the notation

Page 309: David R. Brillinger Time Series Data Analysis and Theory 2001


for the column vector obtained from Z by placing its columns under one another successively. Given arbitrary matrices U, V we define their Kronecker product, U ⊗ V, to be the block matrix

if V is J × K. An important relation connecting the two notations of this paragraph is

if the dimensions of the matrices that appear are appropriate; see Exercise 8.16.26. Neudecker (1968) and Nissen (1968) discuss statistical applications of these definitions.
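The relation referred to is presumably the standard identity vec(UZV) = (Vᵀ ⊗ U) vec Z, where vec stacks the columns of a matrix under one another. A small numerical check, with illustrative matrix sizes:

```python
import numpy as np

# Numerical check of vec(UZV) = (V' (x) U) vec Z; sizes are illustrative.
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 4))
Z = rng.normal(size=(4, 2))
V = rng.normal(size=(2, 5))

def vec(M):
    # column-major flattening: columns placed under one another
    return M.reshape(-1, order="F")

lhs = vec(U @ Z @ V)
rhs = np.kron(V.T, U) @ vec(Z)
assert np.allclose(lhs, rhs)
```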

We now turn to the consideration of (r + s) vector-valued random variables of the form

with X, r vector-valued and Y, s vector-valued. Suppose the variate (8.2.10) has mean

and covariance matrix

Consider the problem of choosing the s vector μ and s × r matrix a to minimize the s × s Hermitian matrix

We have

Theorem 8.2.1 Let an (r + s) vector-valued variate of the form (8.2.10), with mean (8.2.11) and covariance matrix (8.2.12), be given. Suppose Σ_XX is nonsingular. Then μ and a minimizing (8.2.13) are given by

and



The minimum achieved is

We call a, given by (8.2.15), the regression coefficient of Y on X. The variate

is called the best linear predictor of Y based on X. From Theorem 8.2.1, we see that the μ and a values given also minimize the determinant, trace, diagonal entries, and latent values of the matrix (8.2.13). References to this theorem include: Whittle (1963a) Chap. 4, Goldberger (1964) p. 280, Rao (1965), and Khatri (1967). In the case s = 1, the square of the correlation coefficient of Y with the best linear predictor of Y is called the squared coefficient of multiple correlation. It is given by

In the case of vector-valued Y, the matrix Σ_YY^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Σ_YY^{-1/2} has been proposed. It will appear in our discussion of canonical correlations given in Chapter 10. Real-valued functions of it, such as trace and determinant, will sometimes be of use. The matrix appears in Khatri (1964). Tate (1966) makes remarks concerning multivariate analogs of the correlation coefficient; see also Williams (1967) and Hotelling (1936).

We may define an error variate by

This variate represents the residual after approximating Y by the best linear function of X. The covariance matrix of ε is given by

that is, the matrix (8.2.16). The covariance of ε_j with ε_k is called the partial covariance of Y_j with Y_k. It measures the linear relation of Y_j with Y_k after the linear effects of X have been removed. Similarly the correlation coefficient of ε_j with ε_k is called the partial correlation of Y_j with Y_k. These parameters are discussed in Kendall and Stuart (1961) Chap. 27, and Morrison (1967) Chap. 3.
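A minimal numerical sketch of these definitions; the covariance blocks below are invented for the illustration:

```python
import numpy as np

# Illustrative covariance blocks for (X, Y) with r = 2, s = 2.
Sxx = np.array([[2.0, 0.5], [0.5, 1.0]])
Syx = np.array([[0.8, 0.2], [0.3, 0.4]])        # Sigma_YX
Syy = np.array([[1.5, 0.6], [0.6, 1.2]])

a = Syx @ np.linalg.inv(Sxx)                    # regression coefficient of Y on X
Serr = Syy - Syx @ np.linalg.inv(Sxx) @ Syx.T   # covariance of the error variate

# partial correlation of Y_1 with Y_2 after the linear effects of X are removed
partial_r = Serr[0, 1] / np.sqrt(Serr[0, 0] * Serr[1, 1])
assert -1.0 <= partial_r <= 1.0
```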

In the case that the variate (8.2.10) has a multivariate normal distribution, the predictor suggested by Theorem 8.2.1 is best within a larger class of predictors.

Theorem 8.2.2 Suppose the variate (8.2.10) is multivariate normal with mean (8.2.11) and covariance matrix (8.2.12). Suppose Σ_XX is nonsingular.

Page 311: David R. Brillinger Time Series Data Analysis and Theory 2001

The s vector-valued function φ(X), with E{φ(X)^τ φ(X)} < ∞, that minimizes

is given by

The minimum achieved is

In the case that the variate has a normal distribution, the conditional distribution of Y given X is

and so we see that the partial correlation of Y_j with Y_k is the conditional correlation of Y_j with Y_k given X.

We turn to some details of the estimation of the parameters of the above theorems. Suppose that a sample of values

j = 1, ..., n of the variate of Theorem 8.2.1 are available. For convenience assume μ_X = 0 and μ_Y = 0. Define the r × n matrix x and the s × n matrix y by

We may estimate the covariance matrix (8.2.12) by

and

The regression coefficient of Y on X may be estimated by

and the error matrix (8.2.20) may be estimated by

The reason for the divisor (n − r) rather than n will become apparent in the course of the statement of the next theorem. We have


Theorem 8.2.3 Suppose the values (8.2.24), j = 1, ..., n, are a sample from a multivariate normal distribution with mean 0 and covariance matrix (8.2.12). Let â be given by (8.2.28) and Σ̂_εε by (8.2.29). Then for any (rs) vector α

is distributed as

and if n → ∞, â is asymptotically normal with these moments. Also Σ̂_εε is independent of â and distributed as (n − r)^{-1} W_s(n − r, Σ_εε). In the case s = 1, R̂_YX² = Σ̂_YX Σ̂_XX^{-1} Σ̂_XY / Σ̂_YY has density function

The function appearing in (8.2.32) is a generalized hypergeometric function; see Abramowitz and Stegun (1964). Percentage points and moments of R̂_YX² are given in Amos and Koopmans (1962), Ezekiel and Fox (1959) and Kramer (1963). Olkin and Pratt (1958) construct an unbiased estimate of R_YX². The distributions of further statistics may be determined from the fact that the matrix

is distributed as

The distribution of â is given in Kshirsagar (1961). Its density function is proportional to

This is a form of multivariate t distribution; see Dickey (1967).

Estimates of the partial correlations may be based on the entries of Σ̂_εε in a manner paralleling their definition. For example an estimate of the partial correlation of Y_j and Y_k with X held linearly constant is

with [Σ̂_εε]_jk denoting the entry in row j, column k of Σ̂_εε.
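A simulated-sample sketch of the estimates just described; the true coefficient matrix and noise level are invented for the illustration:

```python
import numpy as np

# Simulated sample: moment-matrix estimate of the regression coefficient
# and the error matrix with divisor (n - r); a_true is hypothetical.
rng = np.random.default_rng(1)
n, r, s = 500, 2, 2
x = rng.normal(size=(r, n))                       # r x n matrix of X values, mean 0
a_true = np.array([[1.0, -0.5], [0.3, 0.8]])
y = a_true @ x + 0.5 * rng.normal(size=(s, n))    # s x n matrix of Y values

a_hat = (y @ x.T) @ np.linalg.inv(x @ x.T)        # estimated regression coefficient
resid = y - a_hat @ x
See = resid @ resid.T / (n - r)                   # error covariance estimate
assert np.allclose(a_hat, a_true, atol=0.2)
```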




From the distribution of Σ̂_εε given in Theorem 8.2.3, we see that this expression is distributed as the sample correlation coefficient of ε_j with ε_k based on n − r observations. The density function of its square will be given by expression (8.2.32) with R_YX², R̂_YX², n, r replaced by R²_{Y_jY_k·X}, R̂²_{Y_jY_k·X}, n − r, 1, respectively. The large sample variance of this R̂² is approximately 4R²[1 − R²]²/n. The distribution of correlation coefficients developed in Fisher (1962) may be modified to obtain the joint distribution of all the partial correlations. The asymptotic joint covariance structure may be deduced from the results of Pearson and Filon (1898), Hall (1927), and Hsu (1949). Further results and approximations to the distributions of estimates of squared correlation coefficients are given in Kendall and Stuart (1961) p. 341, Gajjar (1967), Hodgson (1968), Alexander and Vok (1963), Giri (1965), and Gurland (1966).

There are complex variate analogs of the preceding theorems. For example:

Theorem 8.2.4 Let the (r + s) vector-valued variate

have complex entries, mean 0 and be such that

and

Suppose Σ_XX is nonsingular. Then the μ and a minimizing

are given by

The minimum achieved is

We call a, given by (8.2.40), the complex regression coefficient of Y on X. It is a consequence that the indicated μ, a also minimize the determinant, trace, and diagonal entries of (8.2.39). In the case s = 1 the minimum

Page 314: David R. Brillinger Time Series Data Analysis and Theory 2001

(8.2.41) may be written

where we define

This parameter is clearly an extension to the complex-valued case of the squared coefficient of multiple correlation. Because the minimum (8.2.41) must lie between Σ_YY and 0, it follows that 0 ≤ |R_YX|² ≤ 1, the value 1 occurring when the minimum is 0. On occasion we may wish to partition |R_YX|² into

and

where we have Σ_YX = Re Σ_YX + i Im Σ_YX. These expressions are measures of the degree of linear relation of Y with Re X and Im X respectively.

Returning now to the case of vector-valued Y, a direct measure of the degree of approximation of Y by a linear function of X is provided by the error variate

which has mean 0 and is such that

and

Analogs of the partial covariance and partial correlation may be based on the matrix (8.2.47) in an immediate manner.

Suppose now that a sample of values

of the variate of Theorem 8.2.4 are available. Define matrices x and y as in (8.2.25) and (8.2.26). We are led to construct the statistics




and

which leads us to

Theorem 8.2.5 Suppose values of the form (8.2.49), j = 1, ..., n, are a sample from a complex multivariate normal distribution with mean 0 and covariance matrix (8.2.37). Let â be given by (8.2.51) and Σ̂_εε by (8.2.52). Then for any (rs) vector α

and if n → ∞, vec â is asymptotically N_rs^c(vec a, n^{-1} Σ_εε ⊗ Σ_XX^{-1}). Continuing, Σ̂_εε is independent of â and distributed as (n − r)^{-1} W_s^c(n − r, Σ_εε).

Finally in the case s = 1 the density function of |R̂_YX|² = Σ̂_YX Σ̂_XX^{-1} Σ̂_XY / Σ̂_YY is

We note that the distribution of |R̂_YX|² in the complex case is identical with the real case distribution having twice the sample size and twice the X dimension. The heuristic approach described in Section 8.4 will suggest the reason for this occurrence. A useful consequence is that we may use tables and results derived for the real case. The density function (8.2.55) is given in Goodman (1963); see also James (1964) expression (112), and Khatri (1965a). In the case |R_YX|² = 0, expression (8.2.55) becomes

is distributed as

This is the same as the null distribution of (6.2.10) derived under the assumption of fixed X. Percentage points in this case may therefore be derived from F percentage points as they were in Chapter 6. Amos and Koopmans (1962) and Groves and Hannan (1968) provide a variety of non-null percentage points for |R̂_YX|².

Confidence regions for the entries of â may be constructed from expression (8.2.53) in the manner of Section 6.2.

By analogy with (8.2.34) the density function of â will be proportional to

Wahba (1966) determined this density in the case s = 1.

Sometimes it is of interest to consider the following complex analogs of the partial correlations

A natural estimate is provided by

We see from the distribution of Σ̂_εε given in Theorem 8.2.5 that this last is distributed as the sample complex correlation coefficient of ε_j with ε_k based on n − r observations. Its modulus-square will have density function (8.2.55) with the replacement of R_YX, R̂_YX, n, r by R_{Y_jY_k·X}, R̂_{Y_jY_k·X}, n − r, 1, respectively. The asymptotic covariances of pairs of these estimates may be deduced from expression (7.6.16).

8.3 DETERMINATION OF AN OPTIMUM LINEAR FILTER

We return to the notation of Section 8.1 and the problem of determining an s vector, μ, and an s × r filter, {a(u)}, so that

is close to Y(t). Suppose we measure closeness by the s × s Hermitian matrix

We then have



Theorem 8.3.1 Consider an (r + s) vector-valued second-order stationary time series of the form (8.1.1) with mean (8.1.2) and autocovariance function (8.1.3). Suppose c_XX(u), c_YY(u) are absolutely summable and suppose f_XX(λ), given by (8.1.4), is nonsingular, −∞ < λ < ∞. Then the μ and a(u) that minimize (8.3.2) are given by

and

where

The filter {a(u)} is absolutely summable. The minimum achieved is

A(λ), given by expression (8.3.5), is the transfer function of the s × r filter achieving the indicated minimum. We call A(λ) the complex regression coefficient of Y(t) on X(t) at frequency λ.
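A numerical sketch of these formulas at a single frequency; the spectral values below are invented but form a valid non-negative definite cross-spectral matrix:

```python
import numpy as np

# A(lambda) = f_YX(lambda) f_XX(lambda)^{-1}, the error spectrum, and the
# multiple coherence, at one frequency; all values illustrative.
fxx = np.array([[1.0, 0.2 + 0.1j], [0.2 - 0.1j, 0.8]])   # r = 2, Hermitian
fyx = np.array([[0.5 - 0.2j, 0.3 + 0.1j]])               # s = 1
fyy = np.array([[0.9]])

A = fyx @ np.linalg.inv(fxx)                             # best linear filter, as in (8.3.5)
ferr = fyy - fyx @ np.linalg.inv(fxx) @ fyx.conj().T     # error spectrum
coh = 1.0 - (ferr[0, 0] / fyy[0, 0]).real                # multiple coherence |R_YX(lambda)|^2
assert 0.0 < coh < 1.0 and ferr[0, 0].real > 0.0
```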

The s vector-valued series

where μ and a(u) are given in Theorem 8.3.1, is called the error series. It is seen to have 0 mean and spectral density matrix

is called the error spectrum. We may write it in the form

and thus we are led to measure the linear association of Y(t) with X(t) by the s × s matrix

In the case that s = 1, (8.3.10) is called the multiple coherence of Y(t) with X(t) at frequency λ. We denote it by |R_YX(λ)|² and write


and see that |R_YX(λ)|² = 0 corresponds to the incoherent case in which X(t) does not reduce the error variance. The value |R_YX(λ)|² = 1 corresponds to the perfectly coherent case in which the error series is reduced to 0. The coefficient of multiple coherence was defined in Goodman (1963); see also Koopmans (1964a,b).

(In the case r, s = 1, we define the coherency R_YX(λ) = f_YX(λ)/[f_XX(λ) f_YY(λ)]^{1/2}.) The multiple coherence satisfies the inequalities

(see Exercise 8.16.35) and measures the extent to which the real-valued Y(t) is determinable from the r vector-valued X(t) by linear time invariant operations.

Returning to the case of general s, we call the cross-spectrum between the ath and bth components of the error series, ε_a(t) and ε_b(t), the partial cross-spectrum of Y_a(t) with Y_b(t) after removing the linear effects of X(t). It is given by

We write

−∞ < λ < ∞. We call the coherency of these components the partial coherency of Y_a(t) with Y_b(t) after removing the linear effects of X(t). It is given by

These last parameters are of use in determining the extent to which an apparent time invariant linear relation between the series Y_a(t) and Y_b(t) is due to the linear relation of each to a series X(t); see Gersch (1972). We can likewise define the partial complex regression coefficient of Y_a(t) on Y_b(t) after removing the linear effects of X(t) to be

As would have been expected from the situation in the real variate case, it turns out that expression (8.3.16) is the entry corresponding to Y_b(t) in the matrix-valued complex regression coefficient of Y_a(t) on the (r + 1) vector-valued series

This gives us an interpretation for the individual entries of a matrix-valued complex regression coefficient.



The above parameters of the partial cross-spectral analysis of time series were introduced by Tick (1963) and Wonnacott; see Granger (1964) p. xiii. They are studied further in Koopmans (1964b), Goodman (1965), Akaike (1965), Parzen (1967c), and Jenkins and Watts (1968).

As an example of the values of these various parameters consider the model

where X(t) is r vector-valued, stationary with spectral density matrix f_XX(λ); ε(t) is s vector-valued, stationary, mean 0, with spectral density matrix f_εε(λ), and independent of X(t) at all lags; μ is an s vector; and {a(u)} is an absolutely summable s × r matrix-valued filter. We quickly see that the complex regression coefficient of Y(t) on X(t) is given by

Also

and so

In the case that the series (8.1.1) is Gaussian a direct interpretation may be placed on μ and a(u) of Theorem 8.3.1. We have

Theorem 8.3.2 Under the conditions of Theorem 8.3.1 and if the series (8.1.1) is Gaussian, μ and a(u) of (8.3.3) and (8.3.4) are given by

Also

General references to the previous development, in the case r, s = 1, include: Wiener (1949), Solodovnikov (1950), Koopmans (1964a), and Blackman (1965). There are a variety of connections between the approach of this section and that of Chapter 6. The principal difference in assumption is that the series X(t) is now stochastic rather than fixed. The model of Chapter 6 was

with μ constant, a(u) a summable filter, and ε(t) a 0 mean error series. Exercise 8.16.33 is to show that such a model holds under the conditions of Theorem 8.3.1.

We end this section with an example of the application of Theorem 8.3.1. Suppose that η(t) and Y(t) are independent s vector-valued, 0 mean stationary series. Suppose that the series X(t) is given by

The series Y(t) may be thought of as a signal immersed in a noise series η(t). Suppose that we wish to approximate Y(t) by a filtered version of X(t). The spectral density matrix of X(t) and Y(t) is given by

Following expression (8.3.5) the transfer function of the best linear filter for determining Y(t) from X(t) is given by

This A(λ) is called the matched filter for the signal Y(t) in the noise η(t). We see its general character is one of not passing the frequency components of X(t) in frequency intervals where f_ηη(λ) is very large relative to f_YY(λ), while the components are passed virtually unaltered in intervals where f_ηη(λ) is small relative to f_YY(λ). In the case s = 1, the parameter f_YY(λ)/f_ηη(λ) is called the signal to noise ratio at frequency λ.

8.4 HEURISTIC INTERPRETATION OF PARAMETERS AND CONSTRUCTION OF ESTIMATES

Suppose the series (8.4.1),

t = 0, ±1, ..., satisfies Assumption 2.6.1 and suppose its values are available for t = 0, ..., T − 1. We evaluate the finite Fourier transform of these values

−∞ < λ < ∞. Following Theorem 4.4.2, for large T, this variate will be distributed approximately as



Referring to the discussion of Theorem 8.2.4, we now see that A(λ), the complex regression coefficient of Y(t) on X(t) at frequency λ, may be interpreted, approximately, as the complex regression coefficient of d_Y^(T)(λ) on d_X^(T)(λ). It is therefore of use in the prediction of the value of d_Y^(T)(λ) from that of d_X^(T)(λ) in a linear manner. The error spectrum, f_εε(λ), is approximately proportional to the covariance matrix of the error variate of this prediction problem. Likewise the partial complex regression coefficient of Y_a(t) on Y_b(t) after removing the linear effects of X(t) is nearly the complex regression coefficient of d_{Y_a}^(T)(λ) on d_{Y_b}^(T)(λ) after removing the linear effects of d_X^(T)(λ). Continuing, suppose s = 1. We see that |R_YX(λ)|², the multiple coherence of Y(t) with X(t) at frequency λ, may, following the discussion of Theorem 8.2.4, be interpreted as the complex analog of the squared coefficient of multiple correlation of d_Y^(T)(λ) with d_X^(T)(λ). Finally the partial coherency of Y_a(t) with Y_b(t) after removing the linear effects of X(t) may be interpreted as the complex analog of the partial correlation of d_{Y_a}^(T)(λ) with d_{Y_b}^(T)(λ) after removing the linear effects of d_X^(T)(λ). In the case that the series (8.4.1) is Gaussian these partial parameters will be approximately conditional parameters given the value d_X^(T)(λ).

Similar interpretations may be given in the case λ ≡ 0 (mod π). Real-valued statistics and distributions will be involved in this case.

Let us next turn to the construction of estimates of the various parameters. Suppose s(T) is an integer with 2πs(T)/T near λ, where we take λ ≢ 0 (mod π). Following Theorem 4.4.1, the values

s = 0, ±1, ..., ±m will be approximately independent realizations of the variate (8.4.3). Following the discussion of Theorem 8.2.5, specifically expression (8.2.50), we can consider forming the statistics

and

in turn, the latter two being estimates of A(λ), f_εε(λ), respectively. Theorem 8.2.5 suggests approximations to the distributions of these statistics. In



Section 8.6 we will make the definition (8.4.5) more flexible by including weights in the summation.

Heuristic approaches to the linear analysis of multivariate series are given in Tick (1963), Akaike (1965), and Groves and Hannan (1968). A discussion of the parameters and estimates is given in Fishman (1969).
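A minimal sketch of the estimates just described, assuming the simple unweighted average of periodogram ordinates at the 2m + 1 frequencies nearest λ; the series and filter below are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T, m = 4096, 8
x = rng.normal(size=T)
# hypothetical linear time invariant relation plus independent noise
y = np.convolve(x, [0.5, 1.0, 0.5], mode="same") + 0.3 * rng.normal(size=T)

dx = np.fft.fft(x)            # finite Fourier transforms d_X^(T), d_Y^(T)
dy = np.fft.fft(y)

s0 = 200                      # centre index: lambda = 2*pi*s0/T
sl = slice(s0 - m, s0 + m + 1)
fxx = np.mean(np.abs(dx[sl]) ** 2) / (2 * np.pi * T)
fyy = np.mean(np.abs(dy[sl]) ** 2) / (2 * np.pi * T)
fyx = np.mean(dy[sl] * np.conj(dx[sl])) / (2 * np.pi * T)

A_hat = fyx / fxx                            # estimate of A(lambda)
coh_hat = np.abs(fyx) ** 2 / (fxx * fyy)     # estimated multiple coherence
assert 0.0 <= coh_hat <= 1.0                 # Cauchy-Schwarz bound
```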

We may also provide an interpretation of the parameters of Section 8.3 by means of the frequency components X(t,λ), Y(t,λ), t = 0, ±1, ... and their Hilbert transforms X^H(t,λ), Y^H(t,λ), t = 0, ±1, ... . From the discussion of Section 7.1 we see that the covariance matrix of the variate

is, approximately, proportional to

Now

and so

We now see that Re A(λ) may be interpreted as the coefficient of X(t,λ) in the regression of Y(t,λ) on

Likewise we see that Im A(λ) may be interpreted as the coefficient of X^H(t,λ) in the same regression.

The covariance matrix of the error variate of this regression analysis is


We see, therefore, that the real parts of the partial coherencies may be interpreted as partial correlations involved in the regression of Y(t,λ) on the variate (8.4.12). Similar considerations indicate that the imaginary parts may be interpreted as partial correlations of the regression of Y^H(t,λ) on (8.4.12).

If s = 1, then the squared coefficient of multiple correlation of the regression of Y(t) on the variate (8.4.12) is

We see that the coefficient of multiple coherence may be interpreted as the squared coefficient of multiple correlation of Y(t) with expression (8.4.12).

We end this section with a discussion of some useful parameters. The entries of A(λ) are generally complex-valued. In practice we may wish to deal with the real-valued Re A_ab(λ), Im A_ab(λ), or the real-valued modulus G_ab(λ) = |A_ab(λ)| and argument φ_ab(λ) = arg A_ab(λ). Consider the case r, s = 1. G(λ) = |A(λ)| is called the gain of Y(t) over X(t) at frequency λ. The function G(λ) is non-negative and we see that

and

If

then

Expression (8.4.18) suggests the source of the term gain. We see that the amplitude of the component of frequency λ in X(t) is multiplied by G(λ) in the case of Y(t).

In the example Y(t) = aX(t − u), we see that

The gain here has the nature of the absolute value of a regression coefficient and is constant with respect to λ.

The function φ(λ) = arg A(λ) is called the phase between Y(t) and X(t) at



frequency λ. The fundamental range of values of φ(λ) is the interval (−π,π]. Because f_XX(λ) ≥ 0, φ(λ) is given by

We see

and so φ(0) = 0. Also

Suppose

In terms of the Cramér representations

and so φ(λ) may be interpreted as the angle between the component of frequency λ in X(t) and the corresponding component in Y(t).

If, for example, Y(t) = aX(t − u), we see

Figure 8.4.1 φ(λ), phase angle corresponding to delay of u time units when a > 0.

Figure 8.4.2 φ(λ), phase angle corresponding to delay of u time units when a < 0.
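The delay example can be checked numerically; the values of a and u below are illustrative:

```python
import numpy as np

# Y(t) = a X(t - u): transfer function A(lambda) = a exp(-i lambda u),
# gain |a|, phase -lambda u (for a > 0), group delay u.
a, u = 2.0, 3
lam = np.linspace(0.01, np.pi - 0.01, 200)
A = a * np.exp(-1j * lam * u)

gain = np.abs(A)
phase = np.unwrap(np.angle(A))           # continuous version of arg A(lambda)
group_delay = -np.gradient(phase, lam)   # -d(phase)/d(lambda)

assert np.allclose(gain, a)              # constant gain |a|
assert np.allclose(group_delay, u)       # delay recovered at every frequency
```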

These two functions are plotted in Figures 8.4.1 and 8.4.2 respectively, taking (−π,π] as the fundamental range of values for φ(λ).

On occasion the function

is more easily interpreted. It is called the group delay of Y(t) over X(t) at frequency λ.

In the case of the example, we see that the group delay is u for all values of a. That is, it is the amount that Y(t) is delayed with respect to X(t).

We note that the group delay is defined uniquely, whereas φ(λ) is defined only up to an arbitrary multiple of 2π.

8.5 A LIMITING DISTRIBUTION FOR ESTIMATES

In this section we determine the limiting distribution of the estimates constructed in the previous section under the conditions T → ∞, but m is fixed. Let

Set

and

and so

Let m be a non-negative integer and s(T), T = 1, 2, ... a sequence of integers with 2πs(T)/T → λ as T → ∞. In the manner of Section 7.3, set



We now construct the estimates

where

We notice that if m is large, then C(m,r) ≈ 1 and definition (8.5.6) is simplified. We also form

We now state,

Theorem 8.5.1 Let the (r + s) vector-valued series (8.5.1) satisfy Assumption 2.6.1 and have spectral density matrix (8.5.2). Let (8.5.2) be estimated by (8.5.4) where m, s(T) are integers with 2πs(T)/T → λ as T → ∞. Let

be distributed as (2m + 1)^{-1} W_{r+s}^c(2m + 1, f_ZZ(λ)) if λ ≢ 0 (mod π), as (2m)^{-1} W_{r+s}(2m, f_ZZ(λ)) if λ ≡ 0 (mod π). Then A^(T)(λ), g_εε^(T)(λ) tend in distribution to W_YX W_XX^{-1}, W_εε = C(m,r)[W_YY − W_YX W_XX^{-1} W_XY], respectively. Also R̂_{Y_jY_k·X}(λ) tends to W_{ε_jε_k}/[W_{ε_jε_j} W_{ε_kε_k}]^{1/2}, j, k = 1, ..., s and if s = 1, |R̂_YX^(T)(λ)|² tends to W_YX W_XX^{-1} W_XY / W_YY.

The density function of the limiting distribution of A^(T)(λ) is deducible from (8.2.57) and (8.2.34). This was given in Wahba (1966) for the case s = 1, λ ≢ 0 (mod π). A more useful result comes from noting that for any (rs) vector α

has the limiting distribution χ²_{2(2m+1−r)} in the case λ ≢ 0 (mod π). Similar results hold in the case λ ≡ 0 (mod π).

We conclude from Exercise 4.8.8 that under the conditions of Theorem 8.5.1, g_εε^(T)(λ) is asymptotically (2m + 1 − r)^{-1} W_s^c(2m + 1 − r, f_εε(λ)) if



λ ≢ 0 (mod π), asymptotically (2m − r)^{-1} W_s(2m − r, f_εε(λ)) if λ ≡ 0 (mod π). It is also asymptotically independent of A^(T)(λ). We note, from Theorem 7.3.3, that the asymptotic distribution of g_εε^(T)(λ) has the nature of the asymptotic distribution of a spectral estimate based directly on the values ε(t), t = 0, ..., T − 1 with the parameter 2m in that case replaced by 2m − r in the present case.

The partial coherencies R̂_{Y_aY_b·X}(λ), a, b = 1, ..., s are based directly on the matrix g_εε^(T)(λ). We conclude from the above remarks that under the conditions of Theorem 8.5.1, their asymptotic distribution will be that of unconditional coherencies with the parameter 2m replaced by 2m − r. In the case of vector-valued normal variates this result was noted by Fisher (1924). The distribution for a single R̂_{Y_aY_b·X}(λ) is given by (8.2.32) and (8.2.55) with r = 1.

Turning to the asymptotic distribution of the coefficient of multiple coherence in the case s = 1, set |R_YX|² = |R_YX(λ)|², |R̂_YX|² = |R̂_YX^(T)(λ)|². Then the limiting distribution of |R̂_YX^(T)(λ)|² will be given by (8.2.55) with n = 2m + 1, if λ ≢ 0 (mod π), by (8.2.32) with n = 2m, if λ ≡ 0 (mod π).

Goodman (1963) suggested the above limiting distribution for the coherence. See also Goodman (1965), Khatri (1965), and Groves and Hannan (1968). Enochson and Goodman (1965) investigate the accuracy of approximating the distribution of tanh^{-1} |R̂_YX^(T)(λ)| by a normal distribution with mean

and variance 1/[2(2m − r)]. The approximation seems reasonable.
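A sketch of the resulting approximate confidence interval for the coherence modulus, ignoring the bias term in the mean quoted above (the numbers are illustrative):

```python
import numpy as np

# Approximate 95% interval for |R_YX(lambda)| from the normal approximation
# to arctanh|R_hat| with variance 1/(2(2m - r)); bias in the mean ignored.
m, r = 10, 2
R_hat = 0.8
z = np.arctanh(R_hat)
half_width = 1.96 / np.sqrt(2 * (2 * m - r))
lo, hi = np.tanh(z - half_width), np.tanh(z + half_width)
assert 0.0 < lo < R_hat < hi < 1.0
```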

8.6 A CLASS OF CONSISTENT ESTIMATES

In this section we develop a general class of estimates of the parameters that have been defined in Section 8.3. Suppose the values

t = 0, ..., T − 1 are available. Define d_X^(T)(λ), d_Y^(T)(λ), −∞ < λ < ∞, in the manner of (8.4.2). Define the matrix of cross-periodograms

−∞ < λ < ∞, with similar definitions for I_XX^(T)(λ), I_YY^(T)(λ). Let W(α) be a weight function satisfying Assumption 5.4.1.
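The cross-periodogram matrix can be sketched as follows, assuming the 2πT normalization used for periodograms earlier in the book; the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1024
Z = rng.normal(size=(3, T))          # rows are the components of the series

d = np.fft.fft(Z, axis=1)            # finite Fourier transform of each component
k = 37                               # frequency lambda = 2*pi*k/T
I = np.outer(d[:, k], np.conj(d[:, k])) / (2 * np.pi * T)

# the matrix of second-order periodograms is Hermitian, rank one, and
# non-negative definite (compare Exercise 7.10.37)
assert np.allclose(I, I.conj().T)
assert np.linalg.eigvalsh(I).min() > -1e-10
```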

We now estimate


the matrix of second-order spectra by

having taken note of the heuristic estimate (8.4.5). We estimate $\mathbf{A}(\lambda)$ by

The typical entry, $A_{ab}(\lambda)$, of $\mathbf{A}(\lambda)$ is generally complex-valued. On occasion we may wish to consider its amplitude $G_{ab}(\lambda)$ and its argument $\phi_{ab}(\lambda)$. Based on this estimate we take

and

for $a = 1, \ldots, s$ and $b = 1, \ldots, r$. We estimate the error spectral density matrix $\mathbf{f}_{\varepsilon\varepsilon}(\lambda)$ by

We estimate the partial coherency $R_{Y_aY_b\cdot X}(\lambda)$ by

In the case $s = 1$ we estimate $|R_{YX}(\lambda)|^2$, the multiple coherence of $Y(t)$ with $\mathbf{X}(t)$, by

$-\infty < \lambda < \infty$. The various estimates are seen to be sample analogs of corresponding population definitions.
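For the scalar case $r = s = 1$ the chain of sample analogs above can be sketched in code. The function names, the flat smoothing weights, and the wrap-around treatment of frequencies are illustrative assumptions; the chapter's general weight function $W(\alpha)$ and bandwidth $B_T$ are replaced here by a simple average of $2m + 1$ adjacent periodogram ordinates.

```python
import numpy as np

def smoothed_cross_spectra(x, y, m):
    """Estimate second-order spectra by averaging 2m + 1 adjacent
    periodogram ordinates (a flat-weight stand-in for the general
    weight function W of Section 8.6)."""
    T = len(y)
    dx = np.fft.fft(x)          # discrete Fourier transforms
    dy = np.fft.fft(y)
    Ixx = (dx * np.conj(dx)) / (2 * np.pi * T)   # periodograms
    Iyy = (dy * np.conj(dy)) / (2 * np.pi * T)
    Iyx = (dy * np.conj(dx)) / (2 * np.pi * T)   # cross-periodogram
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    # tile so the average wraps around at the ends of the frequency range
    smooth = lambda I: np.convolve(np.tile(I, 3), kernel, 'same')[T:2 * T]
    return smooth(Ixx), smooth(Iyy), smooth(Iyx)

def regression_statistics(x, y, m):
    """Sample analogs of the population quantities of Section 8.3."""
    fxx, fyy, fyx = smoothed_cross_spectra(x, y, m)
    A = fyx / fxx                      # complex regression coefficient
    G = np.abs(A)                      # gain estimate
    phi = np.angle(A)                  # phase estimate
    coh = np.abs(fyx) ** 2 / (fxx.real * fyy.real)   # coherence estimate
    g_err = fyy.real * (1.0 - coh)     # error spectrum estimate
    return A, G, phi, coh, g_err
```

With a noiseless proportional pair $Y(t) = 2X(t)$ the gain estimate is 2 and the coherence 1 at every frequency, which is a convenient sanity check on an implementation.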

Turning to the asymptotic first-order moments of the various statistics we have

Theorem 8.6.1 Let the $(r + s)$ vector-valued series (8.6.1) satisfy Assumption 2.6.2(1) and have spectral density matrix (8.6.3). Suppose $\mathbf{f}_{XX}(\lambda)$ is nonsingular. Let $W(\alpha)$ satisfy Assumption 5.6.1. Suppose the statistics $\mathbf{A}^{(T)}(\lambda)$, $\phi_{ab}^{(T)}(\lambda)$, $G_{ab}^{(T)}(\lambda)$, $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda)$, $R_{Y_aY_b\cdot X}^{(T)}(\lambda)$ are given by (8.6.5) to (8.6.9). Then, if $B_T \to 0$, $B_T T \to \infty$ as $T \to \infty$,


and

We see that, in each case, the asymptotic means of the various statistics are nonlinear matrix weighted averages of the population values of interest. The asymptotic bias will therefore depend on how near constant these averages are in the neighborhood of $\lambda$. In the limit we have

Corollary 8.6.1 Under the conditions of Theorem 8.6.1

and in the case s = 1

The various estimates are asymptotically unbiased in an extended sense. We can develop expansions in powers of $B_T$ of the asymptotic means; see Exercise 8.16.25. The important thing that we note from such expressions is


that the nearer the derivatives of the population second-order spectra are to 0, the less the asymptotic bias. Nettheim (1966) expanded in powers of $B_T^{-1}T^{-1}$ in the Gaussian case.

Estimates of the parameters under consideration were investigated in Goodman (1965), Akaike (1965), Wahba (1966), Parzen (1967), and Jenkins and Watts (1968). The case $r, s = 1$ was considered in Goodman (1957), Tukey (1959a,b), Akaike and Yamanouchi (1962), Jenkins (1963a,b), Akaike (1964), Granger (1964), and Parzen (1964).

8.7 SECOND-ORDER ASYMPTOTIC MOMENTS OF THE ESTIMATES

We now turn to the development of certain second-order properties of the statistics of the previous section.

Theorem 8.7.1 Under the conditions of Theorem 8.6.1, and if $\mathbf{f}_{XX}(\alpha)$ is nonsingular in a neighborhood of $\lambda$ or $\mu$, then

To consider various aspects of these results, let $\boldsymbol{\Psi}(\lambda)$ denote the matrix $\mathbf{f}_{XX}(\lambda)^{-1}$; then from (8.7.1) and the perturbation expansions given in Exercise 8.16.24, we conclude


for $a, c = 1, \ldots, s$; $b, d = 1, \ldots, r$. Let us use the notation $X_b'$ to denote the set of $X_d$, $d = 1, \ldots, r$, excluding $X_b$. Then we have from Exercise 8.16.37

We also have

and so

From the standpoint of variability we see, from (8.7.10), that the estimate $A_{ab}^{(T)}(\lambda)$ will be best if the multiple coherence of $Y_a(t)$ with $\mathbf{X}(t)$ is near 1 and if the multiple coherence of $X_b(t)$ with $X_1(t), \ldots, X_{b-1}(t), X_{b+1}(t), \ldots, X_r(t)$ is near 0.

Turning to a consideration of the estimated gain and phase we first note the relations

and

We have from expressions (8.7.6) to (8.7.8), (8.7.11), and (8.7.13) the following:

and

We see that the variability of $\log G_{ab}^{(T)}(\lambda)$, $\phi_{ab}^{(T)}(\lambda)$ will be small if the partial coherence of $Y_a(t)$ with $X_b(t)$, after removing the linear effects of $X_1(t), \ldots, X_{b-1}(t), X_{b+1}(t), \ldots, X_r(t)$, is near 1. In the case that $r, s = 1$, the


partial coherence in expressions (8.7.14) and (8.7.15) is replaced by the bivariate coherence $|R_{YX}(\lambda)|^2$.

We note that if $\lambda \pm \mu \not\equiv 0 \pmod{2\pi}$, then the asymptotic covariance structure of the log gain and phase is identical.

Turning to the estimated error spectral density matrix we note, from (8.7.2) and (7.4.17), that the second-order asymptotic behavior of $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda)$ is exactly the same as if it were a direct spectral estimate $\mathbf{f}_{\varepsilon\varepsilon}^{(T)}(\lambda)$ based on the values $\boldsymbol{\varepsilon}(t)$, $t = 0, \ldots, T - 1$.

We note from (8.7.3) that the asymptotic behavior of estimated partial coherencies is the same as that of the estimated coherencies of an $s$ vector-valued series whose population coherencies are the partial coherencies $R_{Y_aY_b\cdot X}(\lambda)$, $a, b = 1, \ldots, s$. Taking $a = c$, $b = d$ we may deduce from (8.7.3), in the manner of Corollary 7.6.2, that

whose behavior is indicated in Table 8.7.1 and Figure 8.7.1. We see that values of $|R|$ near 0 are not changed much, while values near 1 are greatly increased. Now

Figure 8.7.1 Graph of the transformation $y = \tanh^{-1} x$.

The asymptotic covariance structure of $|R_{Y_aY_b\cdot X}^{(T)}(\lambda)|^2$ is seen to be the same for all values of $s, r$. An examination of expression (8.7.16) suggests the consideration of the variance stabilizing transformation


Table 8.7.1 Values of the Hyperbolic Tangent

  x      tanh⁻¹ x
 .00      .0000
 .05      .0500
 .10      .1003
 .15      .1511
 .20      .2027
 .25      .2554
 .30      .3095
 .35      .3654
 .40      .4236
 .45      .4847
 .50      .5493
 .55      .6184
 .60      .6931
 .65      .7753
 .70      .8673
 .75      .9730
 .80     1.0986
 .85     1.2562
 .90     1.4722
 .95     1.8318
1.00        ∞
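The entries of Table 8.7.1 can be regenerated directly; a minimal sketch using numpy's `arctanh`, which is the $\tanh^{-1}$ transformation under discussion:

```python
import numpy as np

# Regenerate Table 8.7.1: tanh^{-1} x for x = .00, .05, ..., .95,
# rounded to four decimal places as in the printed table.
x = np.round(np.arange(0.0, 1.0, 0.05), 2)
table = {float(xi): round(float(np.arctanh(xi)), 4) for xi in x}
# The transformation diverges as x -> 1, so the final printed entry
# (x = 1.00) is unbounded.
```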

In the case $s = 1$, the partial coherence is the multiple coherence $|R_{YX}(\lambda)|^2$, and its estimate becomes the estimate $|R_{YX}^{(T)}(\lambda)|^2$. It follows that expressions (8.7.18) and (8.7.19) are valid for $|R_{YX}^{(T)}(\lambda)|^2$ as well. Enochson and Goodman (1965) have investigated the effect of this transformation and have suggested the approximations


Were the estimate of Section 8.5 employed, then $n = 2m + 1$. Parzen (1967) derived the asymptotic mean and variance of $A_{ab}^{(T)}(\lambda)$, $\log G_{ab}^{(T)}(\lambda)$, and $\phi_{ab}^{(T)}(\lambda)$ in the case $s = 1$. Jenkins and Watts (1968), pp. 484, 492, indicated the asymptotic covariance structure of $\mathbf{A}^{(T)}(\lambda)$ and $|R_{YX}^{(T)}(\lambda)|^2$. In the case $r, s = 1$ Jenkins (1963a) derived the asymptotic variances of the phase, gain, and coherence.

8.8 ASYMPTOTIC DISTRIBUTION OF THE ESTIMATES

We now indicate limiting distributions for the statistics of interest. We begin with

Theorem 8.8.1 Under the conditions of Theorem 8.6.1, and if $\mathbf{f}_{XX}(\lambda(l))$ is nonsingular for $l = 1, \ldots, L$, the estimates $\mathbf{A}^{(T)}(\lambda(l))$, $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda(l))$, $R_{Y_aY_b\cdot X}^{(T)}(\lambda(l))$, $a, b = 1, \ldots, s$ are asymptotically normally distributed with covariance structure given by (8.7.1) to (8.7.3). $\mathbf{A}^{(T)}(\lambda)$ and $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda)$ are asymptotically independent.

This theorem will be of use in constructing confidence regions of interest. We conclude from Theorem 8.8.1 and expression (8.7.1) that if $\lambda \not\equiv 0 \pmod{\pi}$, then vec $\mathbf{A}^{(T)}(\lambda)$ is asymptotically

where

It follows from Exercise 4.8.2 that the individual entries of $\mathbf{A}^{(T)}(\lambda)$ will be asymptotically complex normal, as conjectured in Parzen (1967). Theorem 8.8.1 has the following:

Corollary 8.8.1 Under the conditions of Theorem 8.8.1, functions of $\mathbf{A}^{(T)}(\lambda)$, $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda)$, $R_{Y_aY_b\cdot X}^{(T)}(\lambda)$ with nonsingular first derivative will be asymptotically normal.

In particular, we may conclude that $\log G_{ab}^{(T)}(\lambda)$ will be asymptotically normal with variance

$\phi_{ab}^{(T)}(\lambda)$ will be asymptotically normal with variance

and $\log G_{ab}^{(T)}(\lambda)$, $\phi_{ab}^{(T)}(\lambda)$ will be asymptotically independent, $a = 1, \ldots, s$ and $b = 1, \ldots, r$. Also $\tanh^{-1} |R_{Y_aY_b\cdot X}^{(T)}(\lambda)|$ will be asymptotically normal with variance

$a, b = 1, \ldots, s$; and if $s = 1$, $\tanh^{-1} |R_{YX}^{(T)}(\lambda)|$ will be asymptotically normal with variance (8.8.5) as well. Experience with variance stabilizing transformations (see Kendall and Stuart (1966) p. 93) suggests that the transformed variate may be more nearly normal than the untransformed one. We will use the transformed variate to set confidence intervals for the population coherence in the next section.

We note that the limiting distribution of $\mathbf{A}^{(T)}(\lambda)$ given in Theorem 8.8.1 is consistent with that of Theorem 8.5.1 for large $m$, if we make the identification

The distributions of the other variates are also consistent, since the Wishart distribution is near the normal when the degrees of freedom are large.

8.9 CONFIDENCE REGIONS FOR THE PROPOSED ESTIMATES

The asymptotic distributions derived in the previous section may be used to construct confidence regions for the parameters of interest. Throughout this section we make the identification (8.8.6).

We begin by constructing an approximate confidence region for $A_{ab}(\lambda)$. Suppose $\lambda \not\equiv 0 \pmod{\pi}$. Expression (8.5.11) leads us to approximate the distribution of

by $F_{2;2(2m+1-r)}$, where $\boldsymbol{\Psi}^{(T)}(\lambda) = \mathbf{f}_{XX}^{(T)}(\lambda)^{-1}$. This approximation may be manipulated in the manner of Section 6.9 to obtain a confidence region for either $\{\mathrm{Re}\ A_{ab}(\lambda), \mathrm{Im}\ A_{ab}(\lambda)\}$ or $\{\log G_{ab}(\lambda), \phi_{ab}(\lambda)\}$. In the case $\lambda \equiv 0 \pmod{\pi}$ we approximate the distribution of (8.9.1) by $F_{1;2m-r}$.

If we let $\mathbf{A}_a^{(T)}(\lambda)$, $\mathbf{A}_a(\lambda)$ denote the $a$th row of $\mathbf{A}^{(T)}(\lambda)$, $\mathbf{A}(\lambda)$ respectively, then a confidence region for $\mathbf{A}_a(\lambda)$ may be obtained by approximating the distribution of

by $F_{2r;2(2m+1-r)}$ in the case $\lambda \not\equiv 0 \pmod{\pi}$. Exercise 6.14.17 indicates a means to construct approximate multiple confidence regions for all linear combinations of the entries of $\mathbf{A}_a(\lambda)$. This leads us to a consideration of the $100\beta$ percent region of the form

$b = 1, \ldots, r$ in the case $\lambda \not\equiv 0 \pmod{\pi}$. This last may be converted directly into a simultaneous region for $\phi_{ab}(\lambda)$, $\log G_{ab}(\lambda)$, $b = 1, \ldots, r$ in the manner of expression (6.9.11).

Figure 8.9.1 Confidence intervals of size 80 percent for the coherence, indexed by the number of periodograms averaged.

Turning to a consideration of $\mathbf{f}_{\varepsilon\varepsilon}(\lambda)$ we note that the parameters $f_{\varepsilon_a\varepsilon_b}(\lambda)$, $1 \leq a \leq b \leq s$ are algebraically equivalent to the parameters $f_{\varepsilon_a\varepsilon_a}(\lambda)$, $a = 1, \ldots, s$; $R_{Y_aY_b\cdot X}(\lambda)$, $1 \leq a < b \leq s$. We will indicate confidence intervals for these.

Theorem 8.5.1 leads us to approximate the distribution of $g_{\varepsilon_a\varepsilon_a}^{(T)}(\lambda)/f_{\varepsilon_a\varepsilon_a}(\lambda)$


by $\chi^2_{2(2m+1-r)}/\{2(2m + 1 - r)\}$ if $\lambda \not\equiv 0 \pmod{\pi}$, and by $\chi^2_{2m-r}/\{2m - r\}$ if $\lambda \equiv 0 \pmod{\pi}$. Confidence intervals for $f_{\varepsilon_a\varepsilon_a}(\lambda)$ may be obtained from these approximations in the manner of expression (5.7.5).

In the case of a single $R_{Y_aY_b\cdot X}(\lambda)$, Theorem 8.8.1 leads us to consider the $100(1 - \alpha)$ percent confidence interval

Figure 8.9.2 Confidence intervals of size 90 percent for the coherence, indexed by the number of periodograms averaged.


Alternately we could consult the tables of Amos and Koopmans (1962) for the distribution of the complex analog of the coefficient of correlation with the sample size reduced by $r$, or use Figures 8.9.1 and 8.9.2 prepared from that reference.

In the case of a multiple coherence, we can consider the approximate $100(1 - \alpha)$ percent confidence interval

Alternately we could consult the tables of Alexander and Vok (1963). The setting of confidence regions of the sort considered in this section is carried out in Goodman (1965), Enochson and Goodman (1965), Akaike (1965), and Groves and Hannan (1968). In the case $|R_{YX}(\lambda)|^2 = 0$, $\lambda \not\equiv 0 \pmod{\pi}$, the approximate $100\alpha$ percent point of $|R_{YX}^{(T)}(\lambda)|^2$ is given by the elementary expression $1 - (1 - \alpha)^{1/2m}$; see Exercise 8.16.22.
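The elementary null-coherence point $1 - (1 - \alpha)^{1/2m}$ quoted above is trivial to compute; a small sketch (the grouping of the typeset exponent as $1/(2m)$ is an assumption about the printed fraction):

```python
def null_coherence_point(alpha, m):
    """Approximate 100*alpha percent point of the estimated coherence
    |R_YX^(T)(lambda)|^2 when the population coherence is 0 and lambda
    is not a multiple of pi, with 2m + 1 periodogram ordinates averaged.
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / (2 * m))
```

Estimated coherences below this point at most frequencies are consistent with zero population coherence at the corresponding level.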

8.10 ESTIMATION OF THE FILTER COEFFICIENTS

Suppose that the (r + s) vector-valued series (8.1.1) satisfies

$t = 0, \pm 1, \ldots$ where $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$ is a stationary series independent of the series $\mathbf{X}(t)$. Theorem 8.3.1 leads us to consider the time domain coefficients

where $\mathbf{A}(\lambda) = \mathbf{f}_{YX}(\lambda)\mathbf{f}_{XX}(\lambda)^{-1}$. Suppose now that $\mathbf{A}^{(T)}(\lambda)$ is an estimate of $\mathbf{A}(\lambda)$ of the form considered previously in this chapter. We can consider estimating $\mathbf{a}(u)$ by the statistic

where $P_T$ is a sequence of integers tending to $\infty$ as $T \to \infty$. We would expect the distribution of $\mathbf{a}^{(T)}(u)$ to be centered near


After the discussion following Theorem 7.4.2, the latter will be near

in the case that the population parameters $\mathbf{f}_{YX}(\alpha)$, $\mathbf{f}_{XX}(\alpha)$ do not vary much in intervals of length $O(B_T)$. Expression (8.10.5) may be written

which is near the desired $\mathbf{a}(u)$ in the case that the filter coefficients fall off to 0 sufficiently rapidly. These remarks suggest that, if anything, the procedure of prefiltering will be especially necessary in the present context.

Turning next to second-order moment considerations, expression (8.7.1) suggests that

provided $P_T$ is not too large. In fact we have

Theorem 8.10.1 Let the $(r + s)$ vector-valued series (8.1.1) satisfy (8.10.1) where the series $\mathbf{X}(t)$, $\boldsymbol{\varepsilon}(t)$ satisfy Assumption 2.6.2(1) and are independent. Suppose $\mathbf{f}_{XX}(\lambda)$ is nonsingular and has a bounded second derivative. Let $W(\alpha)$ satisfy Assumption 6.4.1. Let $\mathbf{A}^{(T)}(\lambda)$ be given by (8.6.5) and $\mathbf{a}^{(T)}(u)$ by (8.10.3) for $u = 0, \pm 1, \ldots$. Suppose $P_T \to \infty$ with $P_T B_T \leq 1$, $P_T^{1+\varepsilon} B_T^{-1} T^{-1} \to 0$ for an $\varepsilon > 0$. Then $\mathbf{a}^{(T)}(u_1), \ldots, \mathbf{a}^{(T)}(u_J)$ are asymptotically jointly normal with means given by (8.10.4) and covariances given by (8.10.7).
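The passage from $\mathbf{A}^{(T)}(\lambda)$ to time-domain coefficients can be sketched for the scalar case as an inverse discrete Fourier transform over the $P_T$ Fourier frequencies. The function name and the assumption that (8.10.3) uses the usual inverse-DFT normalization are illustrative, not taken from the (illegible) display:

```python
import numpy as np

def filter_coefficients(A_vals, max_lag):
    """Recover time-domain filter coefficients a^(T)(u) from values of
    A^(T)(lambda) at the P Fourier frequencies 2*pi*p/P, p = 0,...,P-1,
    by the inverse discrete Fourier transform."""
    P = len(A_vals)
    a = np.fft.ifft(A_vals)   # a[u] = P^{-1} sum_p A_p exp(i 2 pi p u / P)
    lags = np.arange(-max_lag, max_lag + 1)
    # negative lags wrap around to the top of the array, as usual for the DFT
    return lags, np.real_if_close(a[lags % P])
```

For a pure unit delay, $A(\lambda) = e^{-i\lambda}$, the recovered coefficients are 1 at lag 1 and 0 at the other lags, illustrating why the estimate concentrates near the true $\mathbf{a}(u)$ when the filter coefficients fall off rapidly.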

We note that, to first order, the asymptotic covariance matrix of vec $\mathbf{a}^{(T)}(u)$ does not depend on $u$. We may consider estimating it by

where $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda)$ is given by (8.6.8). If we let $\boldsymbol{\Psi}^{(T)}(\lambda)$ denote $\mathbf{f}_{XX}^{(T)}(\lambda)^{-1}$ and set


then we can set down the following approximate $100(1 - \alpha)$ percent confidence interval for $a_{jk}(u)$:

If one sets $P_T = B_T^{-1}$, then the asymptotic variance is of order $T^{-1}$. Hannan (1967a) considered the estimation of the $\mathbf{a}(u)$ in the case that $\mathbf{a}(u) = 0$ for $u$ sufficiently large and for a linear process error series $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$. Wahba (1966, 1969) considers the Gaussian case with fixed $P$.

It is of interest to consider also least squares estimates of the $\mathbf{a}(u)$, $u = 0, \pm 1, \ldots$ obtained by minimizing the sum of squares

for some $p, q \geq 0$. We approach the investigation of these estimates through a consideration of the model

for $t = 0, \pm 1, \ldots$. Here we assume that $\boldsymbol{\gamma}$ is an unknown $s$ vector; $\mathbf{a}$ is an unknown $s \times r$ matrix; $\mathbf{X}(t)$, $t = 0, \pm 1, \ldots$ is an observable stationary $r$ vector-valued series; and $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$ an unobservable 0 mean stationary $s$ vector-valued error series having spectral density matrix $\mathbf{f}_{\varepsilon\varepsilon}(\lambda)$, $-\infty < \lambda < \infty$. The series $\mathbf{Y}(t)$, $t = 0, \pm 1, \ldots$ is assumed observable. Given a stretch of values

we consider the problem of estimating $\boldsymbol{\gamma}$, $\mathbf{a}$, and $\mathbf{f}_{\varepsilon\varepsilon}(\lambda)$, $-\infty < \lambda < \infty$. The model (8.10.12) is broader than might be thought on initial reflection. For example consider a model

for $t = 0, \pm 1, \ldots$ where $\mathbf{X}(t)$, $t = 0, \pm 1, \ldots$ is a stationary $r'$ vector-valued series and the series $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$ is an independent stationary series. This model may be rewritten in the form (8.10.12) with the definitions

and

for $t = 0, \pm 1, \ldots$. These last matrices have the dimensions $s \times r'(p + q - 1)$ and $r'(p + q - 1) \times 1$ respectively. A particular case of the model (8.10.14) is the autoregressive scheme

in which $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$ is a 0 mean white noise process. The results below may therefore be used to obtain estimates and the asymptotic properties of those estimates for the models (8.10.14) and (8.10.17).

Given the stretch of values (8.10.13), the least squares estimates $\boldsymbol{\gamma}^{(T)}$, $\mathbf{a}^{(T)}$ of $\boldsymbol{\gamma}$ and $\mathbf{a}$ are given by

and

As an estimate of $\mathbf{f}_{\varepsilon\varepsilon}(\lambda)$, we could consider

where $\boldsymbol{\varepsilon}^{(T)}(t)$ is the residual series given by

In connection with these estimates we have

Theorem 8.10.2 Let the $s$ vector-valued series $\mathbf{Y}(t)$, $t = 0, \pm 1, \ldots$ satisfy (8.10.12) where $\mathbf{X}(t)$, $t = 0, \pm 1, \ldots$ is a 0 mean $r$ vector-valued series satisfying Assumption 2.6.1, having autocovariance function $\mathbf{c}_{XX}(u)$, $u = 0, \pm 1, \ldots$ and spectral density matrix $\mathbf{f}_{XX}(\lambda)$, $-\infty < \lambda < \infty$; $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$ is an independent $s$ vector-valued series satisfying Assumption 2.6.1, having spectral density matrix $\mathbf{f}_{\varepsilon\varepsilon}(\lambda) = [f_{ab}(\lambda)]$; and $\boldsymbol{\gamma}$, $\mathbf{a}$ are $s \times 1$ and $s \times r$ matrices. Let $\boldsymbol{\gamma}^{(T)}$, $\mathbf{a}^{(T)}$ be given by (8.10.18) and (8.10.19). Let $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda) = [g_{ab}^{(T)}(\lambda)]$ be given by (8.10.20) where $W(\alpha)$, $-\infty < \alpha < \infty$, satisfies Assumption 5.6.1 and $B_T T \to \infty$ as $T \to \infty$. Then $\boldsymbol{\gamma}^{(T)}$ is asymptotically $N_s(\boldsymbol{\gamma}, 2\pi T^{-1}\mathbf{f}_{\varepsilon\varepsilon}(0))$; vec $\mathbf{a}^{(T)}$ is asymptotically independent $N_{rs}\bigl(\mathrm{vec}\ \mathbf{a},\ 2\pi T^{-1}\int \mathbf{f}_{\varepsilon\varepsilon}(\alpha) \otimes [\mathbf{c}_{XX}(0)^{-1}\mathbf{f}_{XX}(\alpha)\mathbf{c}_{XX}(0)^{-1}]\, d\alpha\bigr)$. Also $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda)$ is asymptotically independent normal with

and

The asymptotic distribution of $\mathbf{g}_{\varepsilon\varepsilon}^{(T)}(\lambda)$ is seen to be the same as that of $\mathbf{f}_{\varepsilon\varepsilon}^{(T)}(\lambda)$, the variate based directly on the error series $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$. In the case of the model (8.10.14) the limiting distributions are seen to involve the parameters

and

In the case that $\boldsymbol{\varepsilon}(t)$, $t = 0, \pm 1, \ldots$ is a white noise series with spectral density matrix $\mathbf{f}_{\varepsilon\varepsilon}(\lambda) = (2\pi)^{-1}\boldsymbol{\Sigma}$, $-\infty < \lambda < \infty$, Theorem 8.10.2 indicates that vec $[\mathbf{a}^{(T)}(-p), \ldots, \mathbf{a}^{(T)}(q)]$ is asymptotically normal with mean vec $[\mathbf{a}(-p), \ldots, \mathbf{a}(q)]$ and covariance matrix $T^{-1}\boldsymbol{\Sigma} \otimes \mathbf{c}_{XX}(0)^{-1}$. This gives the asymptotic distribution of the least squares estimates of the parameters of an autoregressive scheme. We considered corresponding results in the case of fixed $\mathbf{X}(t)$ in Section 6.12. We could also have here considered an analog of the "best" linear estimate (6.12.11).

8.11 PROBABILITY 1 BOUNDS

In Section 7.7 we derived a probability 1 bound for the deviations of a spectral estimate from its expected value as $T \to \infty$. That result may be used to develop a bound for the deviation of $\mathbf{A}^{(T)}(\lambda)$ from $\{E\mathbf{f}_{YX}^{(T)}(\lambda)\}\{E\mathbf{f}_{XX}^{(T)}(\lambda)\}^{-1}$. We may also bound the deviation of $\mathbf{A}^{(T)}(\lambda)$ from $\mathbf{A}(\lambda)$, and of the other statistics considered from their corresponding population parameters. Specifically, we have

Theorem 8.11.1 Let the $(r + s)$ vector-valued series (8.1.1) satisfy Assumption 2.6.1. Let the conditions of Theorem 8.6.1 be satisfied. Let $D_T = (B_T T)^{1/2} B_T^{\varepsilon}$ for some $\varepsilon > 0$. Suppose $\sum_T B_T^m < \infty$ for some $m > 0$. Then

almost surely as $T \to \infty$. In addition

almost surely as $T \to \infty$ for $-\infty < \lambda < \infty$, $j = 1, \ldots, s$; $k = 1, \ldots, r$. The error terms are uniform in $\lambda$.

We conclude from this theorem that if $B_T$, $D_T^{-1} \to 0$ as $T \to \infty$, then the various statistics are strongly consistent estimates of their corresponding population parameters.

8.12 FURTHER CONSIDERATIONS

The statistics discussed in this chapter are generally complex-valued. Thus, if we have computer programs that handle complex-valued quantities there will be no difficulty. However, since this is often not the case, it is worth noting that the statistics may all be evaluated using programs based on real-valued quantities. For example, consider the estimate of the complex regression coefficient:

This gives

if one uses the operation of Section 3.7. Taking the first $s$ rows of

gives

a set of equations that involves only real-valued quantities. The principal complication introduced by this reduction is a doubling of the dimension of the $X$ variate. Exercise 3.10.11 indicates an identity that we could use in an alternate approach to equation (8.12.1).

Likewise we may set down sample parallels of expressions (8.4.13) and (8.4.14) to determine the error spectral density, partial coherency, and multiple coherence statistics.

We next mention that there are interesting frequency domain analogs of the important problems of errors in variables and of systems of simultaneous equations.

Suppose that a series $Y(t)$, $t = 0, \pm 1, \ldots$ is given by

where the $r$ vector-valued series $\boldsymbol{\mathcal{X}}(t)$, $t = 0, \pm 1, \ldots$ is not observed directly and where $\varepsilon(t)$, $t = 0, \pm 1, \ldots$ is an error series independent of the series $\boldsymbol{\mathcal{X}}(t)$. Suppose, however, that the series

is observed, where $\boldsymbol{\eta}(t)$, $t = 0, \pm 1, \ldots$ is an error series independent of $\boldsymbol{\mathcal{X}}(t)$. The problem of estimating $\boldsymbol{\gamma}$, $\{\mathbf{a}(u)\}$ in a situation such as this is a problem of errors in variables. Considerable literature exists concerning this problem for series not serially correlated; see Durbin (1954), Kendall and Stuart (1961), for example. If the series involved are stationary then we may write

with the variates approximately uncorrelated for distinct $s$. Because of this weak correlation we can now consider applying the various classical procedures for approaching the problem of errors in variables. The solution of the problem (8.12.4-5) will involve a separate errors in variables solution for each of a number of frequencies $\lambda$ lying in $[0, \pi]$.

Perhaps the nicest results occur when an $r$ vector-valued instrumental series $\mathbf{Z}(t)$, $t = 0, \pm 1, \ldots$ is available for analysis as well as the series $Y(t)$, $\mathbf{X}(t)$. This is a series that is correlated with the series $\boldsymbol{\mathcal{X}}(t)$, $t = 0, \pm 1, \ldots$, but uncorrelated with the series $\varepsilon(t)$ and $\boldsymbol{\eta}(t)$. In the stationary case we have, from expressions (8.12.5) and (8.12.4),

The statistic


now suggests itself as an estimate of $\mathbf{A}(\lambda)$. Hannan (1963a) and Parzen (1967b) are references related to this procedure. Akaike (1966) suggests a procedure useful when the series $\boldsymbol{\eta}(t)$ is Gaussian, but the series $\boldsymbol{\mathcal{X}}(t)$ is not.
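For scalar series, an instrumental-variable style statistic of this kind can be sketched as the ratio of smoothed cross-periodograms of $(Y, Z)$ and $(X, Z)$. That the statistic of the text has exactly this form is an assumption (the display is illegible in this scan); it is the natural frequency-domain analog of the classical instrumental-variable estimate:

```python
import numpy as np

def iv_transfer_estimate(x, y, z, m):
    """Instrumental-variable style estimate of A(lambda) for scalar
    series: ratio of smoothed cross-periodograms f_YZ / f_XZ.  The flat
    smoothing weights and wrap-around treatment are illustrative."""
    T = len(y)
    dx, dy, dz = np.fft.fft(x), np.fft.fft(y), np.fft.fft(z)
    kernel = np.ones(2 * m + 1) / (2 * m + 1)

    def smooth(I):
        # average 2m + 1 adjacent ordinates, wrapping at the ends
        return np.convolve(np.tile(I, 3), kernel, 'same')[T:2 * T]

    return smooth(dy * np.conj(dz)) / smooth(dx * np.conj(dz))
```

Because only cross-spectra with the instrument enter, measurement noise in $X$ that is uncorrelated with $Z$ does not bias the ratio, mirroring the classical argument.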

A variety of models in econometrics lead to systems of simultaneous equations taking the form

where $\mathbf{Y}(t)$, $\boldsymbol{\varepsilon}(t)$ are $s$ vector-valued series and $\mathbf{Z}(t)$ is an $r$ vector-valued series independent of the series $\boldsymbol{\varepsilon}(t)$; see Malinvaud (1964). A model of the form (8.12.10) is called a structural equation system. It is exceedingly general, becoming, for example, an autoregressive scheme in one case and a linear system

with the series $\mathbf{X}(t)$, $\boldsymbol{\varepsilon}(t)$ correlated in another. This correlation may be due to the presence of feedback loops in the system. The econometrician is often interested in the estimation of the coefficients of a single equation of the system (8.12.10), and a variety of procedures for doing this have now been proposed (Malinvaud (1964)) in the case that the series are not serially correlated.

In the stationary case we can consider setting down the expression

for $2\pi s/T$ near $\lambda$, with the variates approximately uncorrelated for distinct $s$. It is now apparent that complex analogs of the various econometric estimation procedures may be applied to the system (8.12.12) in order to estimate coefficients of interest. The character of this procedure involves analyzing a system of simultaneous equations separately in a number of narrow frequency bands. Brillinger and Hatanaka (1969) set down the system (8.12.10) and recommend a frequency analysis of it. Akaike (1969) and Priestley (1969) consider the problem of estimation in a system when feedback is present.

In fact, as Durbin (1954) remarks, the errors in variables model (8.12.4) and (8.12.5) with instrumental series $\mathbf{Z}(t)$ may be considered within the simultaneous equation framework. We simply write the model in the form

and look on the pair $Y(t)$, $\mathbf{X}(t)$ as being $\mathbf{Y}(t)$ of (8.12.10).


8.13 ALTERNATE FORMS OF ESTIMATES

The estimates that we have constructed of the gain, phase, and coherence have in each case been the sample analog of the population definition. For example, we defined

and then constructed the estimate

On some occasions it may prove advantageous not to proceed in such a direct manner.

For example, expressions (8.6.11) and (8.6.13) indicate that asymptotic bias occurs for $G^{(T)}(\lambda)$ if the spectra $f_{YX}(\alpha)$ and $f_{XX}(\alpha)$ are not flat for $\alpha$ near $\lambda$. This suggests that if possible we should prefilter $X(t)$ and $Y(t)$ to obtain series for which the second-order spectra are near constant. The gain relating these filtered series should then be estimated and an estimate of $G(\lambda)$ constructed.

In another vein, expression (8.7.14) indicated that

This suggests that in situations where $|R_{YX}(\lambda)|^2$ is near constant with respect to $\lambda$, we could consider carrying out a further smoothing and estimate $\log G(\lambda)$ by

for some $N$, $\Delta_T$, where it is supposed that $G^{(T)}(\alpha)$ has been constructed in the manner of Section 8.6.

We note in passing the possibility, suggested by (8.4.18), of estimating $G(\lambda)^2$ by

Exercise 8.16.12 indicates that this is not generally a reasonable procedure. We have proposed


as an estimate of the phase, $\phi(\lambda)$. Expression (8.6.12) indicates that ave $\phi^{(T)}(\lambda)$ is principally a nonlinear average of the phase with unequal weights. This occurrence leads us, when possible, to prefilter the series prior to estimating the phase in order to obtain a flatter cross-spectrum.

Alternately we could consider nonlinear estimates that are not as affected by variation in weights. For example, we could consider an estimate of the form

or of the form

The fact that the phase angle is only defined up to an arbitrary multiple of $2\pi$ means we must be careful in the determination of the value of $\arg f_{YX}^{(T)}(\lambda + n\Delta_T)$ when forming (8.13.8).

This indetermination also leads to complications in the pictorial display of $\phi^{(T)}(\lambda)$. If either $\phi(\lambda)$ is changing rapidly or var $\phi^{(T)}(\lambda)$ is large, then an extremely erratic picture can result. For example, Figure 7.2.5 is a plot of the estimated phase angle between the series of seasonally adjusted mean monthly temperatures at Berlin and Vienna determined from the cross-periodogram. It is difficult to interpret this graph because when the phase takes a small jump of the form $\pi - \varepsilon$ to $\pi + \varepsilon$, $\phi^{(T)}(\lambda)$ when plotted in the range $(-\pi, \pi]$ moves from $\pi - \varepsilon$ to $-\pi + \varepsilon$. One means of reducing the impact of this effect is to plot each phase twice, taking its two values in the interval $(-2\pi, 2\pi]$. If the true phase is near $\pi$, then an especially improved picture is obtained. For example, Figure 8.13.1 is the estimated phase when 15 periodograms are averaged between seasonally adjusted monthly Berlin temperatures and the negative of seasonally adjusted monthly Vienna temperatures, taking the range of $\phi^{(T)}(\lambda)$ to be $(-\pi, \pi]$. If this range is increased to $(-2\pi, 2\pi]$, as suggested, Figure 8.13.2 results. J. W. Tukey has proposed making a plot on the range $[0, \pi]$ using different symbols or lines for phases whose principal values are in $[0, \pi]$ from those whose values are in $(\pi, 2\pi]$. If this is done for the Berlin-Vienna data, then Figure 8.13.3 is obtained.

Figure 8.13.1 $\phi^{(T)}(\lambda)$, the estimated phase angle between seasonally adjusted mean monthly Berlin temperatures and the negative of seasonally adjusted mean monthly Vienna temperatures. (15 periodograms averaged in estimation.)

Figure 8.13.2 Another manner of plotting the data of Figure 8.13.1. (The range of $\phi^{(T)}(\lambda)$ is taken to be $[-2\pi, 2\pi]$.)

Figure 8.13.3 Another manner of plotting the data of Figure 8.13.1. (The heavy line corresponds to $\phi^{(T)}(\lambda)$ in $[\pi, 2\pi]$.)

Figure 8.13.4 $|R_{YX}^{(T)}(\lambda)|^2$, estimated coherence of seasonally adjusted mean monthly Berlin temperatures and seasonally adjusted mean monthly Vienna temperatures. (15 periodograms averaged in estimation. Frequency in cycles per month.)

Another procedure is to plot an estimate of the group delay, expression (8.4.27); then the difficulty over arbitrary multiples of $2\pi$ does not arise. Generally speaking it appears to be the case that the best form of plot depends on the $\phi(\lambda)$ at hand.
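The double-plotting device is simple to implement: each principal-value phase is paired with its second representative in $(-2\pi, 2\pi]$, so that a wrap-around jump near $\pm\pi$ appears as a continuous curve on one of the two branches. A minimal sketch:

```python
import numpy as np

def doubled_phases(phi):
    """Return, for each phase value in (-pi, pi], its two representatives
    in (-2*pi, 2*pi] -- the double-plotting device described above for
    reducing spurious wrap-around jumps near +/- pi."""
    phi = np.asarray(phi, dtype=float)
    partner = np.where(phi > 0.0, phi - 2.0 * np.pi, phi + 2.0 * np.pi)
    return np.stack([phi, partner])
```

A jump of the principal value from $\pi - \varepsilon$ to $-\pi + \varepsilon$ appears as the continuous pair $\pi - \varepsilon$, $\pi + \varepsilon$ on one of the two plotted branches.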

We next turn to alternate estimates of the coherence. The bias of $|R_{YX}^{(T)}(\lambda)|^2$ may be reduced if we carry out a prefiltering of the series, estimate the coherence of the filtered series, and then algebraically deduce an estimate of the desired coherence.

Alternatively we can take note of the variance stabilizing properties of the $\tanh^{-1}$ transformation and, by analogy with expression (8.13.4), consider as an estimate of $\tanh^{-1} |R_{YX}(\lambda)|$:

Figure 8.13.5 Coherence estimate based on the form (8.13.9) with $m = 5$, $N = 2$ for the Berlin and Vienna temperature series.

We note that the effect of the $\tanh^{-1}$ transformation is to increase values of $|R_{YX}^{(T)}(\alpha)|$ that are near 1 while retaining the values near 0. High coherences are therefore weighted more heavily if we form (8.13.9). Figure 8.13.4 is a plot of $|R_{YX}^{(T)}(\lambda)|^2$, for the previously mentioned Berlin and Vienna series, based on second-order spectra of the form (8.5.4) with $m = 7$. Figure 8.13.5 results from expression (8.13.9), basing $|R_{YX}^{(T)}(\alpha)|$ on second-order spectra of the form (8.5.4) with $m = 5$ and then taking $N = 2$. The estimates in the two pictures therefore have comparable bandwidth and stability. It is apparent that the peaks of Figure 8.13.5 are less jagged than those of Figure 8.13.4. The nonlinear combination of correlation coefficients is considered in Fisher and Mackenzie (1922). See also Rao (1965) p. 365.
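An estimate in the spirit of (8.13.9) can be sketched as follows: average $\tanh^{-1}|R|$ over $2N + 1$ neighboring frequency ordinates and transform back. The flat weights and the wrap-around treatment of the ends are illustrative assumptions:

```python
import numpy as np

def tanh_smoothed_coherence(R_abs, N):
    """Smooth a coherence-modulus estimate |R| on the arctanh scale:
    average tanh^{-1}|R| over 2N + 1 neighboring frequency ordinates,
    then transform back with tanh."""
    z = np.arctanh(np.clip(np.asarray(R_abs, dtype=float), 0.0, 1.0 - 1e-12))
    L = len(z)
    kernel = np.ones(2 * N + 1) / (2 * N + 1)
    # tile so the average wraps around at the ends of the frequency range
    z_bar = np.convolve(np.tile(z, 3), kernel, 'same')[L:2 * L]
    return np.tanh(z_bar)
```

Because $\tanh^{-1}$ stretches values near 1, high coherences dominate the average, which is the weighting behavior noted in the text.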

Tick (1967) argues that it may well be the case that $|R_{YX}(\alpha)|^2$ is near constant, whereas $f_{YX}(\alpha)$ is not. (This would be the case if $Y(t) = X(t - u)$ for large $u$.) He is then led to propose estimates of the form

and he also proposes estimates of the form

in the case that $|R_{YX}(\alpha)|^2$ is near constant, but the second-order spectra are not.

Jones (1969) considered the maximum likelihood estimation of $|R_{YX}(\lambda)|^2$ from the marginal distribution of $f_{XX}^{(T)}(\lambda)$, $f_{YY}^{(T)}(\lambda)$, and $|f_{YX}^{(T)}(\lambda)|^2$, deriving the latter from the limiting distribution of Theorem 8.5.1.

The importance of using some form of prefiltering, prior to the estimation of the parameters of this chapter, cannot be overemphasized. We saw, in Section 7.7, the need to do this when we estimated the cross-spectrum of two series. A fortiori we should do it when estimating the complex regression coefficient, coherency, and error spectrum. Akaike and Yamanouchi (1962) and Tick (1967) put forth compelling reasons for prefiltering. In particular there appear to be a variety of physical examples in which straightforward data processing leads to a coherency estimate that is near 0 when for physical reasons the population value is not. Techniques of prefiltering are discussed in Section 7.7, the simplest being to lag one series relative to the other.

8.14 A WORKED EXAMPLE

As a worked example of the suggested calculations in the case $r, s = 1$ we refer the reader to the Berlin and Vienna monthly temperature series previously considered in Chapters 6 and 7. The spectra and cross-spectrum of this series are presented as Figures 7.8.1 to 7.8.4. The estimates are equal to those in expression (8.5.4) with $m = 10$. Figure 6.10.3 gives $g_{\varepsilon\varepsilon}^{(T)}(\lambda)$, Figure 6.10.4 gives Re $A^{(T)}(\lambda)$, Figure 6.10.5 gives Im $A^{(T)}(\lambda)$, Figure 6.10.6 gives $G^{(T)}(\lambda)$, Figure 6.10.7 gives $\phi^{(T)}(\lambda)$, and Figure 6.10.8 gives $|R_{YX}^{(T)}(\lambda)|^2$. Finally Figure 6.10.9 gives $a^{(T)}(u)$. The estimated standard errors of these various statistics are given in Section 6.10.

As a worked example in the case r = 13 and s = 1 we refer the reader back to Section 6.10 where the results of a frequency analysis of the sort under study are presented: the series Y(t) refers to the seasonally adjusted monthly mean temperatures at Greenwich, England and X(t) refers to seasonally adjusted mean monthly temperatures at 13 other stations. Figure 6.10.10 gives the gains, G_a^(T)(λ), and phases, φ_a^(T)(λ). Figure 6.10.11 gives the error spectrum, log₁₀ g_εε^(T)(λ). Figure 6.10.12 gives the multiple coherence, |R_YX^(T)(λ)|².

8.15 USES OF THE ANALYSIS OF THIS CHAPTER

The uses that the techniques of this chapter have been put to are intimately entwined with the uses of the analysis of Chapter 6. We have already noted that many of the statistics of the present chapter are the same as statistics of Chapter 6; however, the principal difference in assumption between the chapters is that in the present chapter the series X(t), t = 0, ±1, ... is taken as stochastic, whereas in Chapter 6 it was taken as fixed. In consequence, the statistical properties developed in this chapter refer to averages across the space of all realizations of X(t), whereas those of Chapter 6 refer to the particular realization at hand.

One area in which researchers have tended to assume X(t) stochastic is the statistical theory of filtering and prediction. See Wiener (1949), Solodovnikov (1960), Lee (1960), Whittle (1963a), and Robinson (1967b) for example. The optimum predictors developed work best across the space of all realizations of X(t) that may come to hand, and statistical properties of empirical predictors refer to this broad population.

The reader may look back to Section 6.10 for a listing of situations in which the various statistics of this chapter have been calculated. In fact the authors of the papers listed typically introduced the statistics in terms of stochastic X(t). Brillinger and Hatanaka (1970) and Gersch (1972) estimate partial coherences and spectra.

The choice of whether to make X(t) fixed or stochastic is clearly tied up with the choice of population to which we wish to extend inferences based on a given sample. Luckily, as we have seen, the practical details of the two situations are not too different if the sample is large.


332 TWO VECTOR-VALUED STOCHASTIC SERIES

8.16 EXERCISES

8.16.1 Under the conditions of Theorem 8.2.2 and if s = 1, prove that φ(X) = E{Y | X} is the function with finite second moment having maximum correlation with Y; see Rao (1965) p. 221, and Brillinger (1966a).

8.16.2 Under the conditions of Theorem 8.2.2, prove that the conditional distribution of Y given X is multivariate normal with mean (8.2.14) and covariance matrix (8.2.16).

8.16.3 Let A_YX(λ) denote the complex regression coefficient of Y(t) on the series X(t) and A_XY(λ) denote the complex regression coefficient of X(t) on the series Y(t) in the case s, r = 1. Show that

A_XY(λ) A_YX(λ) = |R_YX(λ)|².

Hence, note that A_XY(λ) = A_YX(λ)⁻¹ only if the coherence between X(t) and Y(t) is 1.

8.16.4 If A(λ), the complex regression coefficient of Y(t) on the series X(t), is constant for all λ, show that it is equal to the ordinary regression coefficient of Y(t) on X(t).

8.16.5 Let ρ(t), t = 0, ±1, ... be a white noise process, that is, a second-order stationary process with constant power spectrum. Suppose X(t) = Σ_u b(t − u)ρ(u), Y(t) = Σ_u c(t − u)ρ(u). Determine A(λ), φ(λ), G(λ), R_YX(λ), and |R_YX(λ)|².

8.16.6 Under the conditions of Theorem 8.3.1 and if s = 1, prove that |R_YX(λ)|² = 1, −∞ < λ < ∞, if and only if Y(t) is a linear filtered version of X(t).

8.16.7 Under the conditions of Theorem 8.3.1 and if s, r = 1, determine the coherency between Y(t) and its best linear predictor based on X(t). Also determine the coherency between the error series ε(t) and X(t).

8.16.8 Under the conditions of Theorem 8.3.1 and if s, r = 1, prove that R_YX(λ), |R_YX(λ)|² are the Fourier transforms of absolutely summable functions if f_XX(λ), f_YY(λ) ≠ 0, −∞ < λ < ∞.

8.16.9 If Y(t) = X^H(t), the Hilbert transform of X(t), prove that φ(λ) = π/2. Find φ(λ) if Y(t) = X^H(t − u) for some integer u.

in the case r, s = 1.

8.16.11 Prove |I_YX^(T)(λ)|² = I_XX^(T)(λ) I_YY^(T)(λ), and so |I_YX^(T)(λ)|²/[I_XX^(T)(λ) I_YY^(T)(λ)] is not a reasonable estimate of |R_YX(λ)|² in the case r, s = 1.
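The identity in this exercise is easy to check numerically; the following sketch (ours, with illustrative names) computes raw periodograms of two unrelated series and confirms that the implied "coherence" is identically 1 — which is why some smoothing across frequencies is essential.

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal(256), rng.standard_normal(256)  # any two series

X, Y = np.fft.rfft(x), np.fft.rfft(y)
Iyx = Y * np.conj(X)                      # cross-periodogram (unnormalized)
Ixx, Iyy = np.abs(X) ** 2, np.abs(Y) ** 2

# |I_YX|^2 equals I_XX * I_YY identically, so coherence computed from raw
# periodograms is 1 at every frequency, whatever the true coherence.
periodogram_coherence = np.abs(Iyx) ** 2 / (Ixx * Iyy)
```

Here the two series are independent, so the population coherence is 0; the unsmoothed estimate is nonetheless 1 everywhere.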

8.16.12 Prove that

and so [I_YY^(T)(λ)/I_XX^(T)(λ)]^{1/2} is not generally a reasonable estimate of G(λ) in the case r, s = 1.



8.16.13 Discuss the reason why X(t) and Y(t) may have coherence 1 and yet it is not the case that |R_YX^(T)(λ)|² = 1.

8.16.14 Suppose that we estimate the spectral density matrix, f_ZZ(λ), by the second expression of (8.5.4) with m = T − 1. Show that

Discuss the effect on these expressions if Y(t) had previously been lagged u time units with respect to X(t).

8.16.15 Under the conditions of Section 8.6, if W(α) ≥ 0, and if r, s = 1, prove that |R_YX^(T)(λ)|² ≤ 1.

8.16.16 Under the conditions of Theorem 8.7.1 and r, s = 1, prove that

8.16.17 Under the conditions of Theorem 8.7.1, and r, s = 1 except that f_YX(λ) = 0, show that

8.16.18 Under the conditions of Theorem 8.8.1, and r, s = 1 except that f_YX(λ) = 0, show that φ^(T)(λ) is asymptotically uniformly distributed on (−π, π].

8.16.19 Develop a sample analog of the error series ε(t) and of expression (8.3.8).

8.16.20 Under the conditions of Theorem 8.7.1 and r, s = 1, show that

8.16.21 Under the conditions of Theorem 8.2.3, show that the conditional variance of the sample squared coefficient of multiple correlation given the X values is approximately


8.16.25 Let the bivariate time series [X(t), Y(t)] satisfy Assumption 2.6.2(3). Let W(α) satisfy Assumption 5.6.1 and (5.8.21) with P = 3. Suppose the remaining conditions of Theorem 8.6.1 are satisfied; then


Contrast this with the unconditional value 4R_YX²(1 − R_YX²)/n; see Hooper (1958).

8.16.22 For the random variable whose density is given by expression (8.2.56), show that E|R̂_YX|² = r/n and

for 0 < x < 1; see Abramowitz and Stegun (1964) p. 944. In the case r = 1, this leads to the simple expression x = 1 − (1 − α)^{1/(n−1)} for the 100α percent point of |R̂_YX|².

8.16.23 For a real-valued series Y(t) and a vector-valued series X(t), show that the multiple coherence is unaltered by nonsingular linear filtering of the series separately.

8.16.24 Show that the following perturbation expansions are valid for small α, β,

where f″ denotes the second derivative.


8.16.29 Prove that the partial correlation of Y₁ with Y₂ after removing the linear effects of X does not involve any covariances based on Y_j, j > 2.

8.16.30 Prove that a given by (8.2.15) maximizes the squared vector correlation coefficient

8.16.33 Under the conditions of Theorem 8.3.1, prove that there exist γ, absolutely summable {a(u)}, and a second-order stationary series ε(t) that is orthogonal to X(t) and has absolutely summable autocovariance function, such that Y(t) = γ + Σ_u a(t − u)X(u) + ε(t).

8.16.34 Let the series of Theorem 8.3.1 be an m dependent process, that is, such that values of the process more than m time units apart are statistically independent. Show that a(u) = 0 for |u| > m.

8.16.35 Under the conditions of Theorem 8.3.1, prove that |R_{Y_aX}(λ)|² ≤ 1. If s = 1, prove that |R_YX(λ)|² ≤ 1.

8.16.36 Prove that in the case s = 1

8.16.26 Prove that

if the dimensions of the matrices are appropriate.

8.16.27 In connection with the matrix just after (8.2.18) prove that

8.16.28 Given the error variate (8.2.19), under the conditions of Theorem 8.2.1, prove:

8.16.31 Under the conditions of Theorem 8.2.1 and if the s × s matrix Γ ≥ 0, determine γ and a that minimize

8.16.32 Let X(t), t = 0, ±1, ... be an r vector-valued autoregressive process of order m. Prove that the partial covariance function


vanishes for u > m.



8.16.37 Show that the inverse of the matrix (8.2.47) of partial covariances is the s × s lower diagonal block of the inverse of the covariance matrix (8.2.37).

8.16.38 If s = 1, determine the coherency between Y(t) and the best linear predictor based on the series X(t), t = 0, ±1, ... .

8.16.39 Prove that

8.16.40 Let ρ_YX(0)² denote the instantaneous squared multiple correlation of Y(t) with X(t). Show that

8.16.41 Under the conditions of Theorem 8.3.2, prove that the conditional spectral density matrix of Y(t) given the series X(t), t = 0, ±1, ... is

8.16.42 Suppose the weight function W(α) used in forming the estimate (8.6.4) is non-negative. Show that |R̂_YX(λ)|², |R_YX^(T)(λ)|² ≤ 1.

8.16.43 Suppose the conditions of Theorem 8.5.1 are satisfied. Suppose f_{Y_aX_b·X}(λ) = 0. Show that the asymptotic distribution of φ_{ab}^(T)(λ) is the uniform distribution on (−π, π).

8.16.44 Let the conditions of Theorem 8.3.1 be satisfied. Show that the complex regression coefficient of the real-valued series Y_a(t) on the series X(t) is the same as the a-th row of the complex regression coefficient of the s vector-valued series Y(t) on the series X(t), for a = 1, ..., s. Discuss the implications of this result.

8.16.45 Under the conditions of Theorem 8.2.1, show that a = Σ_YX Σ_XX⁻¹ maximizes Σ_{Y,aX}(Σ_{aX,aX})⁻¹ Σ_{aX,Y}.

8.16.46 Let W be distributed as W_r^C(n, Σ). Show that vec W has covariance matrix

n Σ^τ ⊗ Σ.

8.16.47 (a) If W is distributed as W_r(n, Σ), show that

(b) If W is distributed as W_r^C(n, Σ), show that

See Wahba (1966).


9

PRINCIPAL COMPONENTS IN THE FREQUENCY DOMAIN

9.1 INTRODUCTION

In the previous chapter we considered the problem of approximating a stationary series by a linear filtered version of another stationary series. In this chapter we investigate the problem of approximating a series by a filtered version of itself, but restraining the filter to have reduced rank.

Specifically, consider the r vector-valued series X(t), t = 0, ±1, ... with mean

absolutely summable autocovariance function

and spectral density matrix

Suppose we are interested in transmitting the values of the X(t) series from one location to another; however, only q ≤ r channels are available for the transmission. Imagine forming the series


with {b(u)} a q × r matrix-valued filter, transmitting the series ζ(t) over the q available channels and then, on receipt of this series, forming

as an estimate of X(t), for some r vector-valued γ and r × q filter {c(u)}. In this chapter we will be concerned with the choice of γ and the filters {b(u)}, {c(u)} so that X*(t) is near X(t).

The relation between X*(t) − γ and X(t) is of linear time invariant form with transfer function

where B(λ), C(λ) indicate the transfer functions of {b(u)}, {c(u)} respectively. We now see that the problem posed is that of determining an r × r matrix A(λ) of reduced rank so that the difference

is small.

We might view the problem as that of determining a q vector-valued series ζ(t) that contains much of the information in X(t). Here, we note that Bowley (1920) once remarked "Index numbers are used to measure the change in some quantity which we cannot observe directly, which we know to have a definite influence on many other quantities which we can so observe, tending to increase all, or diminish all, while this influence is concealed by the action of many causes affecting the separate quantities in various ways." Perhaps ζ(t) above plays the role of an index number series following some hidden series influencing X(t). As we have described in its derivation, the above series ζ(t) is the q vector-valued series that is best for getting back X(t) through linear time invariant operations.

Alternatively suppose we define the error series ε(t) by

and then write

Then X(t) is represented as a filtered version of a series ζ(t) of reduced dimension plus an error series. A situation in which we might wish to set down such a model is the following: let ζ(t) represent the impulse series of q earthquakes occurring simultaneously at various locations; let X(t) represent the signals received by r seismometers; and let {c(u)} represent the transmission effects of the earth on the earthquakes. Seismologists are interested in investigating the series ζ(t), t = 0, ±1, ...; see for example Ricker (1940) and Robinson (1967b).



An underlying thread of these problems is the approximation of a series of interest by a related series of lower dimension. In Section 9.2 we review some aspects of the classical principal component analysis of vector-valued variates.

9.2 PRINCIPAL COMPONENT ANALYSIS OF VECTOR-VALUED VARIATES

Let X be an r vector-valued random variable with mean μ_X and covariance matrix Σ_XX. Consider the problem of determining the r vector γ, the q × r matrix B, and the r × q matrix C to minimize simultaneously all the latent roots of the symmetric matrix

When we determine these values it will follow, as we mentioned in Section 8.2, that they also minimize monotonic functions of the latent roots of (9.2.1) such as trace, determinant, and diagonal entries.

Because any r × r matrix A of rank q ≤ r may be written in the form CB, with B q × r and C r × q (Exercise 3.10.36), we are also determining A of rank ≤ q to minimize the latent values of

We now state

Theorem 9.2.1 Let X be an r vector-valued variate with EX = μ_X, E{(X − μ_X)(X − μ_X)^τ} = Σ_XX. The r × 1 γ, q × r B, and r × q C that minimize simultaneously all latent values of (9.2.1) are given by

where V_j is the j-th latent vector of Σ_XX, j = 1, ..., r. If μ_j indicates the corresponding latent root, then the matrix (9.2.1) corresponding to these values is

and


Theorem 9.2.1 is a particular case of one proved by Okamoto and Kanazawa (1968); see also Okamoto (1969). The fact that the above B, C, γ minimize the trace of (9.2.1) was proved by Kramer and Mathews (1956), Rao (1964, 1965), and Darroch (1965).

The variate

is called the j-th principal component of X, j = 1, ..., r. In connection with the principal components we have

Corollary 9.2.1 Under the conditions of Theorem 9.2.1

The principal components of X are seen to provide linear combinations of the entries of X that are uncorrelated. We could have characterized the j-th principal component as the linear combination ζ_j = α^τX, with α^τα = 1, which has maximum variance and is uncorrelated with ζ_k, k < j (see Hotelling (1933), Anderson (1957) Chap. 11, Rao (1964, 1965), and Morrison (1967) Chap. 7); however, the above approach fits in better with our later work.

We next review details of the estimation of the above parameters. For convenience assume μ_X = 0; then γ of expression (9.2.5) is 0. Suppose that a sample of values X_j, j = 1, ..., n of the variate of Theorem 9.2.1 is available. Define the r × n matrix x by

Estimate the covariance matrix Σ_XX by

We may now estimate μ_j by μ̂_j, the j-th largest latent root of Σ̂_XX, and estimate V_j by V̂_j, the corresponding latent vector of Σ̂_XX. We have

Theorem 9.2.2 Suppose the values X_j, j = 1, ..., n are a sample from N_r(0, Σ_XX). Suppose the latent roots μ_j, j = 1, ..., r of Σ_XX are distinct. Then the variate {μ̂_j, V̂_j; j = 1, ..., r} is asymptotically normal with {μ̂_j; j = 1, ..., r} asymptotically independent of {V̂_j; j = 1, ..., r}. The asymptotic moments are given by


log_e here denotes the natural logarithm. James (1964) has derived the exact distribution of μ̂₁, ..., μ̂_r under the conditions of the theorem. This distribution turns out to depend only on μ₁, ..., μ_r. James has also obtained asymptotic expressions for the likelihood function of μ̂₁, ..., μ̂_r more detailed than that indicated by the theorem; see James (1964), Anderson (1965), and James (1966). Dempster (1969), p. 303, indicates the exact distribution of vectors dual to V̂₁, ..., V̂_r. Tumura (1965) derives a distribution equivalent to that of V̂₁, ..., V̂_r. Chambers (1967) indicates further cumulants of the asymptotic distribution for distributions having finite moments. These cumulants may be used to construct Cornish-Fisher approximations to the distributions. Because the μ̂_j have the approximate form of sample variances, it may prove reasonable to approximate their distributions by scaled χ² distributions, for example, to take μ̂_j to be μ_j χ²_n/n. Madansky and Olkin (1969) indicate approximate confidence bounds for the collection μ₁, ..., μ_r; see also Mallows (1961). We could clearly use Tukey's jack-knife procedure (Brillinger (1964c, 1966b)) to obtain approximate confidence regions for the latent roots and vectors.
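The estimation scheme just described — form Σ̂_XX from the r × n data matrix and take its ordered latent roots and vectors — can be sketched in a few lines (an illustrative numpy fragment, not the book's code; the standard error uses the Gaussian approximation var μ̂_j ≈ 2μ_j²/n suggested by the scaled-χ² discussion above).

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 4, 2000
Sigma = np.diag([4.0, 2.0, 1.0, 0.5])                  # population Sigma_XX, distinct roots
x = rng.multivariate_normal(np.zeros(r), Sigma, n).T   # r x n data matrix (mean 0)

Sigma_hat = x @ x.T / n                    # sample covariance matrix
mu_hat, V_hat = np.linalg.eigh(Sigma_hat)  # latent roots and vectors
order = np.argsort(mu_hat)[::-1]           # order the roots mu_1 >= ... >= mu_r
mu_hat, V_hat = mu_hat[order], V_hat[:, order]

# scaled chi-square approximation: var(mu_hat_j) ~ 2 * mu_j^2 / n
approx_sd = np.sqrt(2.0) * mu_hat / np.sqrt(n)
```

With distinct population roots, μ̂_j settles near μ_j and V̂_j near ±V_j, in line with Theorem 9.2.2.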

Sugiyama (1966) determines the distribution of the largest root and corresponding vector. Krishnaiah and Waikar (1970) give the joint distribution of several roots. Golub (1969) discusses the computations involved in the present situation. Izenman (1972) finds the asymptotic distribution of


This theorem was derived by Girshick (1939). Anderson (1963) developed the limiting distribution in the case that the latent roots of Σ_XX are not all distinct. Expression (9.2.13) implies the useful result

in the normal case.

In our work with time series we will require complex variate analogs of the above results. We begin with



Theorem 9.2.3 Let X be an r vector-valued variate with EX = μ_X, E{(X − μ_X)(X̄ − μ̄_X)^τ} = Σ_XX, E{(X − μ_X)(X − μ_X)^τ} = 0. The r × 1 γ, q × r B, and r × q C that simultaneously minimize all the latent values of

where V_j is the j-th latent vector of Σ_XX, j = 1, ..., r. If μ_j denotes the corresponding latent root, then the extreme value of (9.2.17) is

We note that as the matrix Σ_XX is Hermitian non-negative definite, the μ_j will be non-negative. The degree of approximation achieved depends directly on how near the μ_j, j > q, are to 0. Note that we have been led to approximate X by

where

We have previously seen a related result in Theorem 4.7.1.

Theorem 9.2.3 leads us to consider the variates ζ_j = V̄_j^τX, j = 1, ..., r.

These are called the principal components of X. In the case that X is N_r^C(0, Σ_XX), we see that ζ₁, ..., ζ_r are independent N₁^C(0, μ_j), j = 1, ..., r, variates.

Now we will estimate these parameters. Let X_j, j = 1, ..., n be a sample from N_r^C(0, Σ_XX) and define x by expression (9.2.9). Then we estimate Σ_XX by

This matrix has a complex Wishart distribution. We signify its latent roots

are given by

and



and vectors by μ̂_j, V̂_j respectively, j = 1, ..., r. The matrix Σ̂_XX is Hermitian non-negative definite; therefore the μ̂_j will be non-negative. We have

Theorem 9.2.4 Suppose the values X₁, ..., X_n are a sample from N_r^C(0, Σ_XX). Suppose the latent roots of Σ_XX are distinct. Then the variate {μ̂_j, V̂_j; j = 1, ..., r} is asymptotically normal with {μ̂_j; j = 1, ..., r} asymptotically independent of {V̂_j; j = 1, ..., r}. The asymptotic moments are given by

and

Theorem 9.2.4 results from two facts: the indicated latent roots and vectors are differentiable functions of the entries of Σ̂_XX, and Σ̂_XX is asymptotically normal as n → ∞; see Gupta (1965).

We see from expression (9.2.27) that

Also by analogy with the real-valued case we might consider approximating the distribution of μ̂_j by

The approximation in expression (9.2.31) would be especially good if the off-diagonal elements of Σ_XX were small, and if the diagonal elements were quite different. James (1964) has given the exact distribution of μ̂₁, ..., μ̂_r in the complex normal case. Expression (9.2.29) with j = k indicates that the asymptotic distribution of the V̂_j is complex normal. Also from (9.2.28) we see that the sampling variability of the V̂_j will be high if some of the μ_j are nearly equal.
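A small simulation along these lines (ours; the names are illustrative) draws a complex normal sample, forms the Hermitian estimate Σ̂_XX, and extracts its latent roots and vectors; the roots come out real and non-negative, and each latent vector is determined only up to a factor of unit modulus.

```python
import numpy as np

rng = np.random.default_rng(3)
r, n = 3, 1000
roots = np.array([3.0, 1.0, 0.3])   # population latent roots of Sigma_XX

# complex normal sample with E X Xbar' = diag(roots) and E X X' = 0
z = (rng.standard_normal((r, n)) + 1j * rng.standard_normal((r, n))) / np.sqrt(2)
x = np.sqrt(roots)[:, None] * z

Sigma_hat = x @ x.conj().T / n              # Hermitian non-negative definite
mu_hat, V_hat = np.linalg.eigh(Sigma_hat)   # real roots, unitary vectors
mu_hat, V_hat = mu_hat[::-1], V_hat[:, ::-1]   # decreasing order
```

Replacing any column V̂_j by α V̂_j with |α| = 1 leaves Σ̂_XX's eigen-decomposition unchanged — the interpretation difficulty noted at the end of this chapter.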


9.3 THE PRINCIPAL COMPONENT SERIES

We return to the problem of determining the r vector γ, the q × r filter {b(u)} and the r × q filter {c(u)} so that if

then the r vector-valued series

is small. If we measure the size of this series by

we have

Theorem 9.3.1 Let X(t), t = 0, ±1, ... be an r vector-valued second-order stationary series with mean c_X, absolutely summable autocovariance function c_XX(u), and spectral density matrix f_XX(λ), −∞ < λ < ∞. Then the γ, {b(u)}, {c(u)} that minimize (9.3.3) are given by

and

where

and

Here V_j(λ) denotes the j-th latent vector of f_XX(λ), j = 1, ..., r. If μ_j(λ) denotes the corresponding latent root, j = 1, ..., r, then the minimum obtained is



which has rank ≤ q. Now let the series X(t), t = 0, ±1, ... have Cramér representation

then the series ζ(t) corresponding to the extremal choice has the form

with B(λ) given by (9.3.7). The j-th component, ζ_j(t), is given by

This series is called the j-th principal component series of X(t). In connection with the principal component series we have

Theorem 9.3.2 Under the conditions of Theorem 9.3.1, the j-th principal component series, ζ_j(t), has power spectrum μ_j(λ), −∞ < λ < ∞. Also ζ_j(t) and ζ_k(t), j ≠ k, have 0 coherency at all frequencies.

The series ζ(t) has spectral density matrix

Let X*(t), t = 0, ±1, ... denote the best approximant series as given in Theorem 9.3.1. We define the error series by

In terms of the Cramér representation this series has the form

ε(t) = ∫ exp{iλt} [I − C(λ)B(λ)] dZ_X(λ).



We see that ε(t) has mean 0 and spectral density matrix

The latent roots and vectors of this matrix are not generally related in any elementary manner to those of f_XX(λ). However, one case in which there is a convenient relation is when the matrix D(λ) is unitary. In this case

The degree of approximation of X(t) by X*(t) is therefore directly related to how near the μ_j(λ), j > q, are to 0, −∞ < λ < ∞. We also see that both the cross-spectral matrix between ε(t) and ζ(t) and the cross-spectral matrix between ε(t) and X*(t) are identically 0.

We next mention a few algebraic properties of the principal component series. Because

we have

while

Also because

we see

and

Unfortunately the principal component series do not generally transform in an elementary manner when the series X(t) is filtered. Specifically, suppose

for some r × r filter {d(u)} with transfer function D(λ). The spectral density matrix of the series Y(t) is

while



We may derive certain regularity properties of the filters {b(u)}, {c(u)} of Theorem 9.3.1 under additional conditions. We have

Theorem 9.3.3 Suppose the conditions of Theorem 9.3.1 are satisfied. Also, suppose

for some P ≥ 0, and suppose that the latent roots of f_XX(λ) are distinct. Then {b(u)} and {c(u)} given in Theorem 9.3.1 satisfy

and

In qualitative terms, the weaker the time dependence of the series X(t), the more rapidly the filter coefficients fall off to 0 as |u| → ∞. With reference to the covariance functions of the principal component series and the error series we have

Corollary 9.3.3 Under the conditions of Theorem 9.3.3

and

The principal component series might have been introduced in an alternate manner to that of Theorem 9.3.1. We have

Theorem 9.3.4 Suppose the conditions of Theorem 9.3.1 are satisfied. ζ_j(t), t = 0, ±1, ... given by (9.3.13) is the real-valued series of the form

(with the 1 × r B_j(λ) satisfying B_j(λ)B̄_j(λ)^τ = 1) that has maximum variance and coherency 0 with ζ_k(t), k < j, j = 1, ..., r. The maximum variance achieved by ζ_j(t) is



This approach was adopted in Brillinger (1964a) and Goodman (1967); it provides a recursive, rather than direct, definition of the principal component series.

The principal component series satisfy stronger optimality properties of the nature of those of Theorem 9.2.3. For convenience, assume EX(t) = 0 in the theorem below.

Theorem 9.3.5 Let X(t), t = 0, ±1, ... be an r vector-valued series with mean 0, absolutely summable autocovariance function, and spectral density matrix f_XX(λ), −∞ < λ < ∞. Then the q × r {b(u)} and r × q {c(u)} that minimize the j-th latent root of the spectral density matrix of the series

where

are given by (9.3.5), (9.3.6). The j-th extremal latent root is μ_{q+j}(λ).

The latent roots and vectors of spectral density matrices appear in the work of Wiener (1930), Whittle (1953), Pinsker (1964), Koopmans (1964b), and Rozanov (1967). Another related result is Lemma 11, Dunford and Schwartz (1963) p. 1341.

9.4 THE CONSTRUCTION OF ESTIMATES AND ASYMPTOTIC PROPERTIES

Suppose that we have a stretch, X(t), t = 0, ..., T − 1, of an r vector-valued series X(t) with spectral density matrix f_XX(λ), and we wish to construct estimates of the latent roots and vectors μ_j(λ), V_j(λ), j = 1, ..., r of this matrix. An obvious way of proceeding is to construct an estimate f_XX^(T)(λ) of the spectral density matrix and to estimate μ_j(λ), V_j(λ) by the corresponding latent root and vector of f_XX^(T)(λ), j = 1, ..., r. We turn to an investigation of certain of the statistical properties of estimates constructed in this way.

In Chapter 7 we discussed procedures for forming estimates of a spectral density matrix and the asymptotic properties of these estimates. One estimate discussed had the form

where I_XX^(T)(α) was the matrix of second-order periodograms



W(α) being concentrated in the neighborhood of α = 0 and B_T, T = 1, 2, ... a sequence of non-negative bandwidth parameters. We may now state

Theorem 9.4.1 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.2(1). Let ν_j^(T)(λ), U_j^(T)(λ), j = 1, ..., r be the latent roots and vectors of the matrix

and W^(T)(α) was a weight function of the form

Theorem 9.4.1 suggests that for large values of B_T T, the distributions of the latent roots and vectors μ_j^(T)(λ), V_j^(T)(λ) will be centered at the corresponding latent roots and vectors of the matrix average (9.4.4). If in addition B_T → 0 as T → ∞, then clearly

and

The latent roots and vectors of (9.4.4) will be near the desired μ_j(λ), V_j(λ) in the case that f_XX(α), −∞ < α < ∞, is near constant. This suggests once again the importance of prefiltering the data in order to obtain near constant spectra prior to estimating parameters of interest. Some aspects of the relation between ν_j^(T)(λ), U_j^(T)(λ) and μ_j(λ), V_j(λ) are indicated in the following:

Let f_XX^(T)(λ) be given by (9.4.1) where W(α) satisfies Assumption 5.6.1. Let μ_j^(T)(λ), V_j^(T)(λ), j = 1, ..., r, be the latent roots and vectors of f_XX^(T)(λ). If B_T T → ∞ as T → ∞, then

If, in addition, the latent roots of f_XX(λ) are distinct, then

and


Theorems 9.4.1 and 9.4.2 indicate that the asymptotic biases of the estimates μ_j^(T)(λ), V_j^(T)(λ) depend in an intimate manner on the bandwidth B_T appearing in the weight function W^(T)(α) and on the smoothness of the population spectral density f_XX(α) for α in the neighborhood of λ.
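The estimation procedure of this section — smooth the periodogram matrices to obtain f_XX^(T)(λ), then take latent roots and vectors frequency by frequency — might be sketched as follows (a sketch under assumptions, not the book's algorithm verbatim: a flat weight function and our own function names).

```python
import numpy as np

def spectral_pca(x, m):
    """Latent roots/vectors of a smoothed-periodogram spectral matrix estimate.

    x: r x T data matrix; m: number of adjacent Fourier frequencies averaged
    on each side of each frequency (flat weight function W)."""
    r, T = x.shape
    d = np.fft.fft(x, axis=1)                                    # DFTs of the r series
    I = np.einsum("aj,bj->jab", d, d.conj()) / (2 * np.pi * T)   # periodogram matrices
    k = np.ones(2 * m + 1) / (2 * m + 1)
    f_hat = np.empty_like(I)
    for a in range(r):
        for b in range(r):
            # circular smoothing across frequencies via tiling
            f_hat[:, a, b] = np.convolve(
                np.tile(I[:, a, b], 3), k, mode="same")[T:2 * T]
    mu, V = np.linalg.eigh(f_hat)            # per-frequency Hermitian eigenanalysis
    return mu[:, ::-1], V[:, :, ::-1]        # roots in decreasing order
```

Since f̂_XX^(T)(λ) is Hermitian non-negative definite, the roots returned are real and (up to rounding) non-negative at every frequency, matching Theorem 9.3.2's interpretation of μ_j(λ) as a power spectrum.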

Turning to an investigation of the asymptotic distribution of the μ_j^(T)(λ), V_j^(T)(λ) we have

Theorem 9.4.3 Under the conditions of Theorem 9.4.1 and if the latent roots of f_XX(λ_m) are distinct, m = 1, ..., M, the variates μ_j^(T)(λ_m), V_j^(T)(λ_m), j = 1, ..., r, m = 1, ..., M are asymptotically jointly normal with asymptotic covariance structure


Theorem 9.4.2 Let the r × r spectral density matrix f_XX(λ) be given by

where

and

Suppose the latent roots μ_j(λ), j = 1, ..., r, of f_XX(λ) are distinct. Let B_T → 0 as T → ∞; then

Let W^(T)(α) be given by (9.4.3) where W(α) = W(−α) and


The limiting expressions appearing in Theorem 9.4.3 parallel those of Theorems 9.2.2 and 9.2.4. The asymptotic independence indicated for variates at frequencies λ_m, λ_n with λ_m ± λ_n ≢ 0 (mod 2π) was expected due to the corresponding asymptotic independence of f_XX^(T)(λ_m), f_XX^(T)(λ_n). The asymptotic independence of the different latent roots and vectors was perhaps unexpected.

Expression (9.4.15) implies that

var log₁₀ μ_j^(T)(λ) ≈ B_T⁻¹T⁻¹(log₁₀ e)² 2π ∫ W(α)² dα   if λ ≢ 0 (mod π)
                     ≈ B_T⁻¹T⁻¹(log₁₀ e)² 4π ∫ W(α)² dα   if λ ≡ 0 (mod π).     (9.4.18)

This last is of identical character with the corresponding result, (5.6.15), for the variance of the logarithm of a power spectrum estimate. It was anticipated due to the interpretation, given in Theorem 9.3.2, of μ_j(λ) as the power spectrum of the j-th principal component series. Expression (9.4.18) suggests that we should take log μ_j^(T)(λ) as the basic statistic rather than μ_j^(T)(λ).
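As an illustration of a confidence interval on this log scale (our sketch, assuming a flat weight function averaging 2m + 1 periodogram ordinates, for which B_T⁻¹T⁻¹2π∫W(α)² dα reduces to 1/(2m + 1)):

```python
import math

def log10_root_ci(mu_hat, m, z=1.96):
    """Approximate normal confidence interval for log10 of a spectral
    latent root, based on the variance formula (9.4.18).

    Assumes flat weights over 2m + 1 periodogram ordinates, so that
    var log10(mu_hat) ~ (log10 e)^2 / (2m + 1) away from 0 mod pi."""
    se = math.log10(math.e) / math.sqrt(2 * m + 1)
    c = math.log10(mu_hat)
    return c - z * se, c + z * se
```

Note the interval width depends on the smoothing span alone, not on the level of the root — the usual advantage of working with the logarithm of a spectrum estimate.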

An alternate form of limiting distribution results if we consider the spectral estimate of Section 7.3


and



In Theorem 7.3.3 we saw that this estimate was distributed asymptotically as (2m + 1)⁻¹W_r^C(2m + 1, f_XX(λ)), (2m)⁻¹W_r(2m, f_XX(λ)), (2m)⁻¹W_r(2m, f_XX(λ)) as T → ∞ in the three cases. This result leads us directly to

Theorem 9.4.4 Let X(t), t = 0, ±1, ... be an r vector-valued series satisfying Assumption 2.6.1. Let m be fixed and [2πs(T)/T] → λ as T → ∞. Let μ_j^(T)(λ), V_j^(T)(λ), j = 1, ..., r be the latent roots and vectors of the matrix (9.4.19). Then they tend, in distribution, to the latent roots and vectors of a (2m + 1)⁻¹W_r^C(2m + 1, f_XX(λ)) variate if λ ≢ 0 (mod π) and of a (2m)⁻¹W_r(2m, f_XX(λ)) variate if λ ≡ 0 (mod π). Estimates at frequencies λ_n, n = 1, ..., N with λ_n ± λ_n′ ≢ 0 (mod 2π) are asymptotically independent.

The distribution of the latent roots of matrices with real or complex Wishart distributions has been given in James (1964).

The distributions obtained in Theorems 9.4.3 and 9.4.4 are not inconsistent. If, as in Sections 5.7 and 7.4, we make the identification


and m is large, then, as Theorems 9.2.2 and 9.2.4 imply, the latent roots and vectors are approximately normal with the appropriate first- and second-order moment structure.

The results developed in this section may be used to set approximate confidence limits for the μ_j(λ), V_{pj}(λ), j, p = 1, ..., r. For example, the result of Theorem 9.4.3 and the discussion of Section 5.7 suggest the following approximate 100γ percent confidence interval for log₁₀ μ_j(λ):

At the same time the result of Exercise 9.7.5 suggests that it might prove reasonable to approximate the distribution of



This approximation might then be used to determine confidence regions for

in the manner of Section 6.2. Much of the material of this section was presented in Brillinger (1969d).

9.5 FURTHER ASPECTS OF PRINCIPAL COMPONENTS

The principal component series introduced in Section 9.3 may be interpreted in terms of the usual principal components of multivariate analysis. Given the r vector-valued stationary series X(t), t = 0, ±1, ... with spectral density matrix f_XX(λ), let X(t,λ) denote the component of frequency λ of X(t) (see Section 4.6). Then, see Sections 4.6 and 7.1, the 2r vector-valued variate, with real-valued entries

has covariance matrix proportional to

A standard principal component analysis of the variate (9.5.1) would lead us to consider the latent roots and vectors of (9.5.2). From Lemma 3.7.1 these are given by

j = 1, ..., r where μ_j(λ), V_j(λ), j = 1, ..., r are the latent roots and vectors of f_XX(λ) and appear in Theorem 9.3.1. We see therefore that a frequency domain principal component analysis of a stationary series X(t) is a standard principal component analysis carried out on the individual frequency components of X(t) and their Hilbert transforms.

A variety of uses suggest themselves for the sort of procedures discussed in Section 9.3. To begin, as in the introduction of this chapter, we may be interested in transmitting an r vector-valued series over a reduced number, q < r, of communication channels. Theorem 9.3.1 indicates one solution to this problem. Alternately we may be interested in examining a succession of real-valued series providing the information in a series of interest in a useful manner. This is often the case when the value of r is large. Theorem 9.3.4 suggests the consideration of the series corresponding to the largest latent root, followed by the consideration of the series corresponding to the second largest latent root, and so on in such a situation.


t = 0, ±1, ... where the q vector-valued series ζ(t), t = 0, ±1, ... represents q "hidden" factor series and the r × q filter {c(u)} represents the loadings of the factors. We may wish to determine the ζ(t), t = 0, ±1, ... as being the essence of X(t) in some sense. The procedures of Section 9.3 suggest one means of doing this. In the case that the series are not autocorrelated, the procedure reduces to the factor analysis used so often by psychometricians; see Horst (1966). They generally interpret the individual principal components and try to make the interpretation easier by rotating (or transforming linearly) the most important components. In the present time series situation, the problem of interpretation is greatly complicated by the fact that if V_j(λ) is a standardized latent vector corresponding to a latent root μ_j(λ), then so is a_j(λ)V_j(λ) for any a_j(λ) with modulus 1.

Another complication that arises relates to the fact that the latent roots and vectors of a spectral density matrix are not invariant under linear filtering of the series. Hence, the series with greater variability end up weighted more heavily in the principal components. If the series are not recorded in comparable scales, difficulties arise. One means of reducing these complications is to carry out the computations on the estimated matrix of coherencies, [R_jk^(T)(λ)], rather than on the matrix of spectral densities.
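The effect can be illustrated numerically. The sketch below (numpy; the matrix f is a hypothetical spectral density matrix at a single frequency, not data from the book) rescales one series and checks that the latent roots of f change while the coherency matrix, with entries f_jk / sqrt(f_jj f_kk), does not:

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical 3 x 3 spectral density matrix at one frequency:
# Hermitian positive definite, built as A A* from a random complex A.
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
f = A @ A.conj().T

def coherency(f):
    """Matrix of coherencies R_jk = f_jk / sqrt(f_jj f_kk)."""
    d = np.sqrt(np.real(np.diag(f)))
    return f / np.outer(d, d)

# Rescale the first series (a change of recording units): f -> D f D.
D = np.diag([10.0, 1.0, 1.0])
f_scaled = D @ f @ D

# The latent roots of f change under the rescaling ...
print(np.linalg.eigvalsh(f))
print(np.linalg.eigvalsh(f_scaled))
# ... but the coherency matrix, hence its latent roots, is unchanged.
print(np.allclose(coherency(f), coherency(f_scaled)))  # True
```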

We conclude this section by reminding the reader that we saw, in Section 4.7, that the Cramér representation resulted from a form of principal component analysis carried out in the time domain. Other time domain principal component analyses appear in the work of Craddock (1965), Hannan (1961a), Stone (1947), Yaglom (1965), and Craddock and Flood (1969).


At the other extreme, we may consider the series corresponding to the smallest latent roots. Suppose we feel the series X(t), t = 0, ±1, ... may satisfy some linear time invariant identity of the form

where b(u) is 1 × r and unknown and K is constant. Thus

and it is reasonable to take b(u) to correspond to the rth principal component series derived from the smallest latent roots. This is an extension of a suggestion of Bartlett (1948a) concerning the multivariate case.

On another occasion, we may be concerned with some form of factor analytic model such as


9.6 A WORKED EXAMPLE

We consider the estimation of the coefficients of the principal component series for the 14 vector-valued series of monthly mean temperatures at the one American and 13 European stations indicated in Chapter 1. In the discussion of Theorem 9.4.2 we saw that the estimates μ_j^(T)(λ), V_j^(T)(λ) could be substantially biased if the spectral density matrix was far from constant with respect to λ. For this reason the series were prefiltered initially by removing the seasonal effects. Figure 9.6.1 presents estimates of the power spectra of the seasonally adjusted series, taking an estimate of the form (9.4.19) with m = 25.
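An estimate of the kind used here, averaging 2m + 1 periodogram ordinates about each frequency, can be sketched as follows (numpy; the white-noise input and the function name spectral_matrix_estimate are illustrative assumptions, not the book's computations):

```python
import numpy as np

def spectral_matrix_estimate(X, m):
    """Estimate f_XX at the frequencies 2*pi*s/T by averaging the
    2m+1 neighbouring periodogram matrices; X has shape (T, r)."""
    T, r = X.shape
    X = X - X.mean(axis=0)               # remove sample means (crude prefilter)
    d = np.fft.fft(X, axis=0)            # finite Fourier transforms, shape (T, r)
    # Periodogram matrices I(s) = d(s) d(s)* / (2 pi T)
    I = np.einsum('sj,sk->sjk', d, d.conj()) / (2 * np.pi * T)
    f = np.empty_like(I)
    for s in range(T):
        idx = np.arange(s - m, s + m + 1) % T   # 2m+1 ordinates around s
        f[s] = I[idx].mean(axis=0)
    return f

# Illustrative use on white noise: the estimate should lie near the
# constant matrix Sigma / (2 pi) = I / (2 pi).
rng = np.random.default_rng(1)
X = rng.standard_normal((512, 3))
f = spectral_matrix_estimate(X, m=25)
```

Each f[s] is an average of rank-one Hermitian matrices, so the estimate is automatically Hermitian and non-negative definite at every frequency.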

Figure 9.6.2 gives log10 μ_j^(T)(λ), j = 1, ..., 14. The μ_j^(T)(λ) are the latent roots of the estimated spectral density matrix f_XX^(T)(λ). In fact, because of the unavailability of a computer program evaluating the latent roots and vectors of a complex Hermitian matrix, the μ_j^(T)(λ) and the V_j^(T)(λ) were derived from the following matrix with real-valued entries
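The device referred to is the real representation of a complex Hermitian matrix: writing f = A + iB with A symmetric and B skew-symmetric, the real symmetric 2r × 2r matrix [[A, -B], [B, A]] has the latent roots of f, each appearing twice, with latent vectors built from the real and imaginary parts of those of f. A sketch (numpy; the matrix is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
r = 4
# A hypothetical Hermitian "spectral density" matrix f = A + iB.
Z = rng.standard_normal((r, r)) + 1j * rng.standard_normal((r, r))
f = (Z + Z.conj().T) / 2                      # Hermitian
A, B = f.real, f.imag                         # A symmetric, B skew-symmetric

# Real symmetric 2r x 2r representation [[A, -B], [B, A]]:
# if f v = mu v with v = x + iy, then M [x; y] = mu [x; y].
M = np.block([[A, -B], [B, A]])

mu_complex = np.linalg.eigvalsh(f)            # r latent roots of f
mu_real = np.linalg.eigvalsh(M)               # the same roots, each doubled
print(np.allclose(np.repeat(mu_complex, 2), mu_real))  # True
```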

Figures 9.6.3 and 9.6.4 give the estimated gain and phase, |V_pj^(T)(λ)| and arg V_pj^(T)(λ), for the first two principal components. For the first component, the gains are surprisingly constant with respect to λ. They are not near 0 except in the case of New Haven. The phases take on values near 0 or π/2, simultaneously for most series. In interpreting the latter we must remember the fact that the latent vectors are determined only up to an arbitrary multiplier of modulus 1. This is why 4 dots stand out in most of the plots. It appears that the first principal component series is essentially proportional to the average of the 13 European series, with no time lags involved. The gains and phases of the second component series are seen to be much more erratic and not at all easy to interpret. The gain for New Haven is noticeably large for λ near 0. The discussion at the end of Section 9.4 and Exercise 9.7.7 suggest two possible means of constructing approximate standard errors for the estimates.

Table 9.6.1 gives log10 of the latent values of the matrix c_XX^(T)(0) of Table

making use of Lemma 3.7.1. The curves of Figure 9.6.2 are seen to fall off as λ increases in much the same manner as the power spectra appearing in Figure 9.6.1. Following expressions (9.4.18) and (9.4.20), the standard error of these estimates is approximately


Figure 9.6.1 Logarithm of estimated power spectrum of seasonally adjusted monthly mean temperatures at various stations with 51 periodogram ordinates averaged.

7.8.1. Table 9.6.2 gives the corresponding latent vectors. In view of the apparent character of the first principal component series, suggested above, it


Figure 9.6.2 Logarithm of estimate of the power spectrum for the principal component series.

makes sense to consider these quantities. An examination of Table 9.6.2 suggests that the first vector corresponds to a simple average of the 13


Figure 9.6.3 Estimated gains and phases for the first principal component series.


Figure 9.6.4 Estimated gains and phases for the second principal component series.


series obtained from the 14 by excluding New Haven. The second vector appears to correspond to New Haven primarily.


9.7 EXERCISES

9.7.1 Let μ_j(λ), j = 1, ..., r denote the latent roots of the r × r non-negative definite matrix f_XX(λ). Let μ_j^(T)(λ), j = 1, ..., r denote the latent roots of the matrix

9.7.2 Suppose the conditions of Theorem 9.3.1 are satisfied. Suppose c_XX(u) = 0 for u ≠ 0. Show that the {b(u)}, {c(u)} given in the theorem satisfy b(u), c(u) = 0 for u ≠ 0.

9.7.3 Under the conditions of Theorem 9.3.1, show that the coherency of the series X_j(t) and ζ_k(t) is

9.7.4 For the variates of Theorems 9.2.2, 9.2.4 show that E μ̂_j = μ_j + O(n^{-1/2}),

9.7.5 Under the conditions of Theorem 9.4.3, show that V_pj^(T)(λ) is asymptotically N^c(V_pj(λ), σ_T²) where

9.7.6 Suppose that the data is tapered, with tapering function h(t/T), prior to calculating the estimates of Theorem 9.4.3. Under the conditions of that


Table 9.6.1 log10 Latent Values of the Temperature Series

1.591, 1.025, .852, .781, .369, .267, .164, .009, -.121, -.276, -.345, -.511, -.520, -.670


Table 9.6.2 Latent Vectors of the Temperature Series

[14 × 14 array of latent vector entries; the column alignment is not recoverable from this scan.]


theorem, show that the asymptotic covariances (9.4.15) and (9.4.17) are multiplied by ∫ h(t)⁴ dt [∫ h(t)² dt]^{-2}.

9.7.7 Under the conditions of Theorem 9.4.3, show that log |V_pj^(T)(λ)| and arg {V_pj^(T)(λ)} are asymptotically distributed as independent N(log |V_pj(λ)|, (½)σ_T² |V_pj(λ)|^{-2}) and N(arg {V_pj(λ)}, (½)σ_T² |V_pj(λ)|^{-2}) variates respectively, where σ_T² is given in Exercise 9.7.5.

9.7.8 (a) Show that if in the estimate (9.4.19) we smooth across the whole frequency domain, the proposed analysis reduces to a standard principal component analysis of the sample covariance matrix c_XX^(T)(0).

(b) Let the series X(t), t = 0, ±1, ... be Gaussian and satisfy Assumption 2.6.2(1). Let μ_j, V_j, j = 1, ..., r denote the latent roots and vectors of c_XX(0). Suppose the roots are distinct. Let μ̂_j, V̂_j, j = 1, ..., r indicate the latent roots and vectors of c_XX^(T)(0). Use (7.6.11) and the expansions of the proof of Theorem 9.2.4 to show that the μ̂_j, V̂_j, j = 1, ..., r are asymptotically jointly normal with


10

THE CANONICAL ANALYSIS OF TIME SERIES

10.1 INTRODUCTION

In this chapter we consider the problem of approximating one stationary time series by a filtered version of a second series where the filter employed has reduced rank. Specifically consider the (r + s) vector-valued stationary series

t = 0, ±1, ... with X(t) r vector-valued and Y(t) s vector-valued. Suppose we are interested in reducing the series X(t) to be q vector-valued forming, for example, the series

t = 0, ±1, ... with {b(u)} a q × r matrix-valued filter, and suppose we wish to do this so that the s vector-valued series

is near Y(t) for some s vector μ and s × q filter {c(u)}. If the series Y(t) were identical with X(t), then we would have the problem discussed in the previous chapter whose solution led to a principal component analysis of the spectral density matrix. If q = min(r,s), then we are not requiring any real reduction in dimension and we have the multiple regression problem discussed in Chapter 8.


The relation connecting Y*(t) − μ to X(t) is linear and time invariant with transfer function

where B(λ) and C(λ) are the transfer functions of {b(u)} and {c(u)}, respectively. Note that under the indicated requirements the matrix A(λ) has rank ≤ q. Conversely if it were known that A(λ) had rank ≤ q, then we could find a q × r B(λ) and an s × q C(λ) so that relation (10.1.4) holds. The problem indicated is approximating Y(t) by a filtered version of X(t) where the filter employed has rank ≤ q.

In the next section we discuss an analog of this problem for vector-valued variates. A general reference to the work of this chapter is Brillinger (1969d).

10.2 THE CANONICAL ANALYSIS OF VECTOR-VALUED VARIATES

Let

be an (r + s) vector-valued variate with X r vector-valued and Y s vector-valued. Suppose the mean of (10.2.1) is

and its covariance matrix is

Consider the problem of determining the s vector μ, the q × r matrix B and the s × q matrix C so that the variate

is small. Let us measure the size of this variate by the real number

for some symmetric positive definite Γ. We have

Theorem 10.2.1 Let an (r + s) vector-valued variate of the form (10.2.1) with mean (10.2.2) and covariance matrix (10.2.3) be given. Suppose Σ_XX, Γ


are nonsingular. Then the s × 1 μ, q × r B and s × q C, q ≤ r, s, that minimize (10.2.5) are given by

where V_j is the jth latent vector of Γ^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Γ^{-1/2}, j = 1, ..., s. If μ_j denotes the corresponding latent root, j = 1, ..., s, then the minimum obtained is

and

with

and

is

The case Γ = I is of particular importance. Then we are led to evaluate the latent roots and vectors of the matrix Σ_YX Σ_XX^{-1} Σ_XY. If μ_j and V_j denote these, then the covariance matrix of the error series


If we take q = r, then we are led to the multiple regression results of Theorem 8.2.1. If s = r and Y = X, then we are led to the principal component results of Theorem 9.2.1. A result related to Theorem 10.2.1 is given in Rao (1965) p. 505.
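For Γ = I the construction of the theorem can be sketched numerically: the best rank-q coefficient is the full regression coefficient projected on the leading q latent vectors of Σ_YX Σ_XX^{-1} Σ_XY. The data and dimensions below are illustrative assumptions (numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, s = 4000, 5, 4
X = rng.standard_normal((n, r))
# Y generated through a rank-2 coefficient plus noise (illustrative).
A_true = rng.standard_normal((s, 2)) @ rng.standard_normal((2, r))
Y = X @ A_true.T + 0.1 * rng.standard_normal((n, s))

Sxx = X.T @ X / n
Sxy = X.T @ Y / n
Syx = Sxy.T

def reduced_rank_coef(q):
    """Rank-q coefficient minimizing the sample mean of |Y - A X|^2
    (Gamma = I): project the full regression coefficient on the
    leading q latent vectors of Syx Sxx^{-1} Sxy."""
    M = Syx @ np.linalg.solve(Sxx, Sxy)       # symmetric s x s
    w, V = np.linalg.eigh(M)                  # ascending latent roots
    Vq = V[:, ::-1][:, :q]                    # leading q latent vectors
    return Vq @ Vq.T @ Syx @ np.linalg.inv(Sxx)

def mse(A):
    E = Y - X @ A.T
    return np.mean(np.sum(E ** 2, axis=1))

# The achieved error is non-increasing in q and, with a rank-2 signal,
# already near the full-rank error at q = 2.
errs = [mse(reduced_rank_coef(q)) for q in range(1, s + 1)]
```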

A closely related problem to this theorem is that of determining the q × 1 μ, q × r D, and q × s E so that the q vector-valued variate

is small. This problem leads us to

Theorem 10.2.2 Let an (r + s) vector-valued variate of the form (10.2.1) with mean (10.2.2) and covariance matrix (10.2.3) be given. Suppose Σ_XX and Σ_YY are nonsingular. The q × 1 μ, q × r D, and q × s E with E Σ_YY Eᵀ = I, D Σ_XX Dᵀ = I that minimize

are given by

and

where V_j denotes the jth latent vector of Σ_YY^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Σ_YY^{-1/2}, and U_j denotes the jth latent vector of Σ_XX^{-1/2} Σ_XY Σ_YY^{-1} Σ_YX Σ_XX^{-1/2}. If μ_j denotes the jth latent root of either matrix, then the minimum achieved is

We see that the covariance matrix of the variate


is given by

with α_j and β_j proportional to Σ_XX^{-1/2} U_j and Σ_YY^{-1/2} V_j respectively. The coefficients of the canonical variates satisfy

We note that the standardization α_jᵀ Σ_XX α_j = 1, β_kᵀ Σ_YY β_k = 1 is sometimes adopted. However, sampling properties of the empirical variates are simplified by adopting (10.2.25). We define

This result leads us to define the canonical variates

We standardize them so that

and

Corollary 10.2.2 Under the conditions of Theorem 10.2.2

and


The value ρ_j = μ_j^{1/2} is called the jth canonical correlation in view of (10.2.28). We note that the variates introduced in this theorem could alternately have been deduced by setting Γ = Σ_YY in Theorem 10.2.1.
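A numerical sketch of these definitions (numpy; the simulated data are illustrative): the squared canonical correlations are the latent roots of Σ_YY^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Σ_YY^{-1/2}, and ρ_1 agrees with the empirical correlation of the first pair of canonical variates.

```python
import numpy as np
from numpy.linalg import inv, eigh

rng = np.random.default_rng(4)
n, r, s = 5000, 3, 2
X = rng.standard_normal((n, r))
# Y correlated with X through a hypothetical linear map plus noise.
Y = X @ rng.standard_normal((r, s)) + rng.standard_normal((n, s))

Sxx = X.T @ X / n; Syy = Y.T @ Y / n
Sxy = X.T @ Y / n; Syx = Sxy.T

# Symmetric square root of Syy^{-1} via its eigendecomposition.
w, U = eigh(Syy)
Syy_inv_half = U @ np.diag(w ** -0.5) @ U.T

M = Syy_inv_half @ Syx @ inv(Sxx) @ Sxy @ Syy_inv_half
mu, V = eigh(M)                               # ascending latent roots
rho = np.sqrt(np.clip(mu[::-1], 0, None))     # canonical correlations

# First pair of canonical variates and their empirical correlation.
beta = Syy_inv_half @ V[:, -1]                # Y-side coefficients
alpha = inv(Sxx) @ Sxy @ beta                 # X-side coefficients (up to scale)
zeta, omega = X @ alpha, Y @ beta
corr = np.corrcoef(zeta, omega)[0, 1]
print(abs(corr) - rho[0])                     # close to 0
```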

Canonical variates were introduced by Hotelling (1936) as linear combinations of the entries of X and Y that have extremal correlation. Related references include: Obukhov (1938, 1940), Anderson (1957), Morrison (1967), Rao (1965), and Kendall and Stuart (1966). In the case that the variate (10.2.1) is Gaussian, the first canonical variate is extremal within a broader class of variates; see Lancaster (1966). Canonical variates are useful: in studying relations between two vector-valued variates (Hotelling (1936)), in discriminating between several populations (Glahn (1968), Dempster (1969) p. 186, and Kshirsagar (1971)), in searching for common factors (Rao (1965) p. 496), in predicting variables from other variables (Dempster (1969) p. 176, Glahn (1968)), and in the analysis of systems of linear equations (Hooper (1959) and Hannan (1967c)).

Let us consider certain aspects of the estimation of the above parameters. Assume, for convenience, that μ_X and μ_Y = 0. Suppose that a sample of values

j = 1, ..., n of the variate of Theorem 10.2.2 is available. As an estimate of (10.2.3) we take

We then determine estimates of μ_j, α_j, β_j from the equations

and

with the standardizations

Below we set


in order to obtain

Theorem 10.2.3 Suppose the values (10.2.30) are a sample of size n from

Suppose r ≥ s and suppose the latent roots μ_j, j = 1, ..., s are distinct. Then the variate {μ̂_j, α̂_j, β̂_j; j = 1, ..., s} is asymptotically normal with {μ̂_j; j = 1, ..., s} asymptotically independent of {α̂_j, β̂_j; j = 1, ..., s}. The asymptotic moments are given by


The asymptotic variances of the statistics may now be estimated by substituting estimates for the parameters appearing in expressions (10.2.41) to (10.2.44). In the case of μ̂_j we note that

and so it is simpler to consider the transformed variate tanh^{-1} μ̂_j^{1/2}. In practice it is probably most sensible to estimate the asymptotic second-order moments by means of the jack-knife procedure; see Brillinger (1964c, 1966b).
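A sketch of the delete-one jack-knife applied on the tanh^{-1} scale (numpy; the helper canon_corr1, the simulated data, and the 1.96 normal quantile are assumptions of this illustration, not the book's procedure):

```python
import numpy as np

def canon_corr1(X, Y):
    """First sample canonical correlation of X (n x r) and Y (n x s)."""
    n = len(X)
    Sxx = X.T @ X / n; Syy = Y.T @ Y / n; Sxy = X.T @ Y / n
    w, U = np.linalg.eigh(Syy)
    B = U @ np.diag(w ** -0.5) @ U.T          # Syy^{-1/2}
    M = B @ Sxy.T @ np.linalg.solve(Sxx, Sxy) @ B
    return np.sqrt(np.linalg.eigvalsh(M)[-1])

rng = np.random.default_rng(5)
n = 200
X = rng.standard_normal((n, 2))
Y = 0.8 * X[:, :1] + 0.6 * rng.standard_normal((n, 1))   # true rho = 0.8

z_full = np.arctanh(canon_corr1(X, Y))
# Delete one observation at a time and recompute on the tanh^{-1} scale.
z_i = np.array([np.arctanh(canon_corr1(np.delete(X, i, 0), np.delete(Y, i, 0)))
                for i in range(n)])
pseudo = n * z_full - (n - 1) * z_i           # jack-knife pseudo-values
se = pseudo.std(ddof=1) / np.sqrt(n)
# Approximate 95 percent interval, mapped back to the correlation scale.
ci = np.tanh([pseudo.mean() - 1.96 * se, pseudo.mean() + 1.96 * se])
```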

If s = 1 we note that the canonical correlation squared, ρ_1² = μ_1, is the squared coefficient of multiple correlation discussed in Section 8.2.

The asymptotic covariance of μ̂_j with μ̂_k was derived in Hotelling (1936); Hsu (1941) found the asymptotic distribution; Lawley (1959) derived higher cumulants; Chambers (1966) derived further terms in the expansion of the asymptotic means; Dempster (1966) considered the problem of bias reduction; Hooper (1958) derived the asymptotic covariance structure under an assumption of fixed X_j, j = 1, ..., n; the exact distribution of the sample canonical correlations depends only upon the population canonical correlations and has been given in Constantine (1963) and James (1964). The distribution of the vectors was found in Tumura (1965). Golub (1969) discusses the computations involved. Izenman (1972) finds the asymptotic distribution of an estimate for CB of (10.2.4) in the normal case.

We will require complex analogs of the previous results. Consider the (r + s) vector-valued variate

with complex entries. Suppose it has mean

covariance matrix

and

We then have

Theorem 10.2.4 Let an (r + s) vector-valued variate of the form (10.2.46) with mean (10.2.47) and covariance matrix (10.2.48) be given. Suppose Σ_XX, Γ are nonsingular with Γ > 0. Then the s × 1 μ, q × r B and s × q C, q ≤ r, s, that minimize

are given by

and

where V_j is the jth latent vector of Γ^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Γ^{-1/2}. If μ_j indicates the corresponding latent root, the minimum obtained is

are given by

and

We note that the V_j are arbitrary to the extent of a complex multiplier of modulus 1. Next we have

Theorem 10.2.5 Let an (r + s) vector-valued variate of the form (10.2.46) with mean (10.2.47) and covariance matrix (10.2.48) be given. Suppose Σ_XX, Σ_YY are nonsingular. Then the q × 1 μ, q × r D, q × s E with E Σ_YY E' = I, D Σ_XX D' = I, that minimize


where V_j signifies the jth latent vector of Σ_YY^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Σ_YY^{-1/2}, and U_j signifies the jth latent vector of Σ_XX^{-1/2} Σ_XY Σ_YY^{-1} Σ_YX Σ_XX^{-1/2}.

As in the real case, we are led to consider the variates

where α_j and β_j are proportional to Σ_XX^{-1/2} U_j and Σ_YY^{-1/2} V_j, respectively. We standardize them so that

Thus we have

Corollary 10.2.5 Under the conditions of Theorem 10.2.5

If we let μ_j denote the jth latent root of Σ_YY^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Σ_YY^{-1/2} then it appears as

for j = 1, ..., min(r,s). We call the variates ζ_j, ω_j, j = 1, ..., min(r,s) the jth pair of canonical variates. The coefficient ρ_j = μ_j^{1/2} ≥ 0 is called the jth canonical correlation coefficient. We set ρ_j = 0 for j > min(r,s) and we take a determination of α_j and β_j so that


Canonical variates of complex-valued random variables appear in Pinsker(1964) p. 134.

Suppose now μ_X and μ_Y = 0 and a sample of values

j = 1, ..., n of the variate of Theorem 10.2.5 is available. As an estimate of expression (10.2.48) we take

We then determine estimates of μ_j, α_j, β_j from the equations

and

with the normalizations

and

In Theorem 10.2.6 we set

and

Theorem 10.2.6 Suppose the values (10.2.69) are a sample of size n from


Suppose r ≥ s and suppose the latent roots μ_j, j = 1, ..., s are distinct. Then the variate {μ̂_j, α̂_j, β̂_j; j = 1, ..., s} is asymptotically normal with {μ̂_j; j = 1, ..., s} asymptotically independent of {α̂_j, β̂_j; j = 1, ..., s}. The asymptotic moments are given by

We note that the asymptotic distribution of the variate


is complex normal, j = 1, ..., s. We also note that

James (1964) gives the exact distribution of the μ̂_j, j = 1, ..., s in this complex case.

10.3 THE CANONICAL VARIATE SERIES

Consider the problem, referred to in the introduction of this chapter, of determining the s vector μ, the q × r filter {b(u)} and the s × q filter {c(u)} so that if

then

is near Y(t), t = 0, ±1, .... Suppose we measure the degree of nearness by

which may be written

in the case that EY(t) = EY*(t). We have

Theorem 10.3.1 Let

t = 0, ±1, ... be an (r + s) vector-valued, second-order stationary series with mean

absolutely summable autocovariance function and spectral density matrix

−∞ < λ < ∞. Suppose f_XX(λ) is nonsingular. Then, for given q ≤ r, s, the


Here V_j(λ) denotes the jth latent vector of the matrix f_YX(λ) f_XX(λ)^{-1} f_XY(λ), j = 1, ..., s. If μ_j(λ) denotes the corresponding latent root, j = 1, ..., s, then the minimum achieved is


s × 1 μ, q × r {b(u)} and s × q {c(u)} that minimize (10.3.3) are given by

and

where

and

The previous theorem has led us to consider the latent roots and vectors of certain matrices based on the spectral density matrix of a given series. Theorem 10.3.1 is seen to provide a generalization of Theorems 8.3.1 and 9.3.1 which correspond to taking q = s and Y(t) = X(t) with probability 1, respectively.

We see that the error series

has mean 0 and spectral density matrix


for −∞ < λ < ∞; this spectral density matrix is the sum of two parts of different character. The first part

appeared in Section 8.3 as the error spectral density matrix resulting from regressing Y(t) on the series X(t), t = 0, ±1, .... It represents a lower bound beyond which no improvement in degree of approximation is possible by choice of q, and also measures the inherent degree of linear approximation of Y(t) by the series X(t), t = 0, ±1, .... The second part

will be small, for given q, in the case that the latent roots μ_j(λ), j > q, are small. As a function of q it decreases with increasing q and becomes 0 when q ≥ r or s.

The criterion (10.3.3) has the property of weighting the various components of Y(t) equally. This may not be desirable in the case that the different components have substantially unequal variances or a complicated correlation structure. For some purposes it may be more reasonable to minimize a criterion such as

In this case we have

Corollary 10.3.1 Under the conditions of Theorem 10.3.1, expression (10.3.18) is minimized by the {b(u)} and {c(u)} of the theorem now based on V_j(λ), j = 1, ..., s, the latent vectors of f_YY(λ)^{-1/2} f_YX(λ) f_XX(λ)^{-1} f_XY(λ) f_YY(λ)^{-1/2}.

The procedure suggested by this corollary has the advantage of being invariant under nonsingular filtering of the series involved; see Exercise 10.6.5. The latent vectors of the matrix of this corollary essentially appear in the following:

Theorem 10.3.2 Suppose the conditions of Theorem 10.3.1 are satisfied. The real-valued series ζ_j(t), η_j(t), t = 0, ±1, ... of the form

and


with the standardizations A̅_j(α)ᵀ A_j(α) = 1, B̅_j(α)ᵀ B_j(α) = 1, having maximum coherence, |R_{ζ_j η_j}(λ)|², and coherence 0 with the series ζ_k(t), η_k(t), t = 0, ±1, ..., k < j, j = 1, ..., min(r,s), are given by the solutions of the equations:

and

j = 1, ..., min(r,s), where μ_1(λ) ≥ μ_2(λ) ≥ ···. The maximum coherence achieved is μ_j(λ), j = 1, ..., min(r,s).

The solutions of the equations (10.3.21) and (10.3.22) are intimately connected to the latent roots and vectors of the matrix of Corollary 10.3.1, which satisfy

This last gives

and

allowing us to identify μ_j(λ) as ρ_j(λ) and to take A_j(λ) and B_j(λ) proportional to f_XX(λ)^{-1} f_XY(λ) f_YY(λ)^{-1/2} V_j(λ) and f_YY(λ)^{-1/2} V_j(λ), respectively.

Theorem 10.3.2 has the advantage, over Corollary 10.3.1, of treating the series X(t) and Y(t) symmetrically. The pair ζ_j(t) and η_j(t), t = 0, ±1, ... of the theorem is called the jth pair of canonical series. Their coherence, μ_j(λ), is called the jth canonical coherence. They could also have been introduced through an analog of Theorem 10.2.4.
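At a fixed frequency the canonical coherences are thus the latent roots of f_YY(λ)^{-1/2} f_YX(λ) f_XX(λ)^{-1} f_XY(λ) f_YY(λ)^{-1/2}. A sketch for a single hypothetical spectral density matrix (numpy):

```python
import numpy as np

rng = np.random.default_rng(6)
r, s = 3, 2
# A hypothetical (r+s) x (r+s) spectral density matrix at one frequency:
# Hermitian positive definite, built as Z Z* from a random complex Z.
Z = rng.standard_normal((r + s, r + s)) + 1j * rng.standard_normal((r + s, r + s))
f = Z @ Z.conj().T
fxx, fxy = f[:r, :r], f[:r, r:]
fyx, fyy = f[r:, :r], f[r:, r:]

def inv_half(h):
    """Hermitian inverse square root via the eigendecomposition."""
    w, U = np.linalg.eigh(h)
    return U @ np.diag(w ** -0.5) @ U.conj().T

B = inv_half(fyy)
M = B @ fyx @ np.linalg.solve(fxx, fxy) @ B   # Hermitian, roots in [0, 1]
coh = np.linalg.eigvalsh(M)[::-1]             # canonical coherences, largest first
print(coh)
```

Since the full matrix f is non-negative definite, I − M is also non-negative definite, which is why the canonical coherences fall in [0, 1].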

In the case that the autocovariance functions involved fall off rapidly as |u| → ∞, the filter coefficients appearing will similarly fall off. Specifically we have

Theorem 10.3.3 Suppose the conditions of Theorem 10.3.1 are satisfiedand in addition


and

for some P ≥ 0 and suppose that the latent roots of f_YX(λ) f_XX(λ)^{-1} f_XY(λ) are distinct. Then the b(u), c(u) given in Theorem 10.3.1 satisfy

and

Likewise the autocovariance function of the error series ε(t), t = 0, ±1, ... satisfies

The following theorem provides a related result sometimes useful in simplifying the structure of a series under consideration.

Theorem 10.3.4 Suppose the conditions of Theorem 10.3.1 are satisfied and in addition

and

for some P ≥ 0 and suppose the latent roots μ_1(λ), ..., μ_s(λ) of f_YY(λ)^{-1/2} f_YX(λ) f_XX(λ)^{-1} f_XY(λ) f_YY(λ)^{-1/2} are distinct and nonzero. Then there exist r × r and s × s filters {a(u)} and {b(u)} satisfying

and

such that the series


has spectral density matrix

Pinsker (1964) indicates that we can filter a stationary series in order toobtain a spectral density matrix of the form of (10.3.36).

10.4 THE CONSTRUCTION OF ESTIMATES AND ASYMPTOTIC PROPERTIES

Suppose that we have a stretch

of the (r + s) vector-valued stationary series

with spectral density matrix

and that we wish to construct estimates of the latent roots and transfer functions, μ_j(λ), A_j(λ), B_j(λ), j = 1, 2, ..., described in Theorem 10.3.2. An obvious means in which to proceed is to construct an estimate


of the matrix (10.4.3) and then to determine estimates as solutions of theequations

Now let us investigate the statistical properties of estimates constructed in this way.

Suppose we take

Theorem 10.4.1 Let the (r + s) vector-valued series (10.4.2) satisfy Assumption 2.6.2(1). Let ν_j^(T)(λ), R_j^(T)(λ), S_j^(T)(λ) be the solutions of the system of equations:

with the standardizations

as the estimate (10.4.4) where

and

for some weight function W(α). Then we have

and


where

Let (10.4.4) be given by (10.4.8) where W(α) satisfies Assumption 5.6.1. Let μ_j^(T)(λ), A_j^(T)(λ), B_j^(T)(λ) be given by (10.4.5) and (10.4.6). If B_T T → ∞ as T → ∞, then

If, in addition, the latent roots of f_YY(λ)^{-1/2} f_YX(λ) f_XX(λ)^{-1} f_XY(λ) f_YY(λ)^{-1/2} are distinct, then

and

and

Theorem 10.4.1 suggests the importance of prefiltering. The distributions of μ_j^(T)(λ), A_j^(T)(λ), B_j^(T)(λ) are centered at the solutions of the equations (10.4.11) and (10.4.12). These equations will be near the desired (10.3.21) and (10.3.22) only when the weighted average (10.4.13) is near (10.4.3). The latter is more likely to be the case when the series have been prefiltered adequately.

Turning to an investigation of the asymptotic distribution of μ_j^(T)(λ) and A_j^(T)(λ), B_j^(T)(λ) we have

Theorem 10.4.2 Under the conditions of Theorem 10.4.1 and if the μ_j(λ_m), j = 1, ..., min(r,s) are distinct for m = 1, ..., M, the variates μ_j^(T)(λ_m), A_j^(T)(λ_m), B_j^(T)(λ_m), j = 1, 2, ..., m = 1, ..., M are asymptotically jointly normal with asymptotic covariance structure


and, suppressing the dependence of population parameters on λ_m,

with analogous expressions for cov {A_j^(T)(λ_m), B_k^(T)(λ_n)}, cov {B_j^(T)(λ_m), B_k^(T)(λ_n)} deducible from (10.2.84) to (10.2.87).

Expression (10.4.20) implies that

in addition to which tanh^{-1} √μ_j^(T)(λ) will be asymptotically normal. These results may be used to construct approximate confidence limits for the canonical coherences.

An alternate form of limiting distribution results if we consider the spectral estimate (8.5.4) corresponding to a simple average of a fixed number of periodogram ordinates.

Theorem 10.4.3 Let the (r + s) vector-valued series (10.4.2) satisfy Assumption 2.6.1 and have spectral density matrix (10.4.3). Let this matrix be estimated by (8.5.4) where m, s(T) are integers with 2πs(T)/T → λ as T → ∞. Then let

be distributed as (2m + 1)^{-1} W_{r+s}^c(2m + 1, (10.4.3)) if λ ≢ 0 (mod π), and as (2m)^{-1} W_{r+s}(2m, (10.4.3)) if λ ≡ 0 (mod π). Then, as T → ∞, μ_j^(T)(λ), A_j^(T)(λ), B_j^(T)(λ) tend in distribution to the distribution of μ̂_j, Â_j, B̂_j, the solutions of the equations

and


388 THE CANONICAL ANALYSIS OF TIME SERIES

Constantine (1963) and James (1964) give the distribution of the $\mu_j$, $j = 1, 2, \ldots$.

The distributions obtained in Theorems 10.4.2 and 10.4.3 are not inconsistent. If, as in Section 5.7, we make the identification

and $m$ is large, then, as Theorems 10.2.3 and 10.2.6 imply, the $\mu_j^{(T)}(\lambda)$, $A_j^{(T)}(\lambda)$, and $B_j^{(T)}(\lambda)$ are asymptotically normal with the appropriate first- and second-order moment structure.

10.5 FURTHER ASPECTS OF CANONICAL VARIATES

We begin by interpreting the canonical series introduced in this chapter in terms of the usual canonical variates of vector-valued variables with real-valued components. Let $\mathbf{X}(t,\lambda)$ and $\mathbf{Y}(t,\lambda)$ signify the components of frequency $\lambda$ of the series $\mathbf{X}(t)$, $t = 0, \pm 1, \ldots$ and $\mathbf{Y}(t)$, $t = 0, \pm 1, \ldots$, respectively. Then (see Sections 4.6, 7.1) the $2(r+s)$ vector-valued variate

has covariance matrix proportional to the matrix in (10.5.2). A standard canonical correlation analysis of the variate (10.5.1) would thus lead us to consider latent roots and vectors based on (10.5.2), specifically the roots and vectors of

Following Lemma 3.7.1, these are essentially the roots and vectors of

Page 410: David R. Brillinger Time Series Data Analysis and Theory 2001

10.5 FURTHER ASPECTS OF CANONICAL VARIATES 389

$f_{YY}^{-1/2} f_{YX} f_{XX}^{-1} f_{XY} f_{YY}^{-1/2}$. In summary, we saw that a frequency domain canonical analysis of the series

may be considered to be a standard canonical correlation analysis carried out on the individual frequency components of the series $\mathbf{X}(t)$ and $\mathbf{Y}(t)$ and their Hilbert transforms.
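For computation, the canonical coherences at a single frequency can be obtained directly from a partitioned spectral density matrix; a sketch (the partition convention, with the $r$ components of $\mathbf{X}$ listed first, is an assumption about the layout of the input):

```python
import numpy as np

def canonical_coherences(f, r):
    """Canonical coherences at one frequency from an (r+s) x (r+s)
    spectral density matrix f, partitioned with the r X-components first.
    Returns the latent roots of
    f_YY^{-1/2} f_YX f_XX^{-1} f_XY f_YY^{-1/2}, which are real and lie
    in [0, 1] because the matrix is Hermitian and non-negative."""
    fXX, fXY = f[:r, :r], f[:r, r:]
    fYX, fYY = f[r:, :r], f[r:, r:]
    # Hermitian inverse square root of f_YY
    w, V = np.linalg.eigh(fYY)
    fYY_isqrt = (V / np.sqrt(w)) @ V.conj().T
    M = fYY_isqrt @ fYX @ np.linalg.solve(fXX, fXY) @ fYY_isqrt
    mu = np.linalg.eigvalsh(M)[::-1]     # descending order
    return np.clip(mu.real, 0.0, 1.0)
```

In the scalar case $r = s = 1$ the single root reduces to the ordinary coherence; for instance a cross-spectrum of modulus 0.6 with unit autospectra yields a canonical coherence of 0.36.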

Alternately, we can view the variates appearing in Theorem 10.4.3 as resulting from a canonical correlation analysis carried out on complex-valued variates of the sort considered in Theorem 10.2.6. Specifically, Theorem 4.4.1 suggests that for $s(T)$ an integer with $2\pi s(T)/T \simeq \lambda \not\equiv 0 \pmod{\pi}$, the values

are approximately a sample of size (2m +1) from

The discussion preceding Theorem 10.2.6 now leads us to the calculation of variates of the sort considered in Theorem 10.4.3.

We remark that the student who has available a computer program for the canonical correlation analysis of real-valued quantities may make use of the real-valued correspondence discussed above in order to compute estimates of the coefficients, rather than writing a new program specific to the complex case.
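The real-valued correspondence rests on the standard isomorphism sending a complex matrix $\mathbf{A} + i\mathbf{B}$ to the real matrix with blocks $\mathbf{A}, -\mathbf{B}$ over $\mathbf{B}, \mathbf{A}$; each root of the complex problem then appears twice in the real analysis. A sketch of the check (the particular Hermitian matrix below is illustrative only):

```python
import numpy as np

def realify(Z):
    """Map the complex matrix Z = A + iB to the real matrix [[A, -B], [B, A]]."""
    A, B = Z.real, Z.imag
    return np.block([[A, -B], [B, A]])

# a Hermitian matrix standing in for the canonical-coherence matrix
M = np.array([[0.5, 0.2 + 0.1j],
              [0.2 - 0.1j, 0.3]])
mu = np.linalg.eigvalsh(M)                # roots of the complex problem
mu_real = np.linalg.eigvalsh(realify(M))  # real-case roots, each doubled
```

The real matrix is symmetric whenever `M` is Hermitian, and its spectrum is that of `M` with every root repeated, which is why a real-valued canonical correlation program recovers the complex-case quantities.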

Further statistics we may wish to calculate in the present context include:

and

for $u = 0, \pm 1, \ldots$, where $A_j^{(T)}(\lambda)$, $B_j^{(T)}(\lambda)$ are given by the solutions of (10.4.5) and (10.4.6). These statistics are estimates of the time domain coefficients of the canonical series.
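In practice such inverse transforms are approximated by a discrete sum over a frequency grid; a minimal sketch of recovering time-domain coefficients from values of a frequency-domain coefficient function on the grid $\lambda_k = 2\pi k/K$ (the grid choice and the function name are illustrative, not the book's notation):

```python
import numpy as np

def filter_coefficients(A_vals, n_lags):
    """Approximate b(u) = (2*pi)^{-1} * integral A(lambda) e^{i u lambda} dlambda
    by a discrete sum over lambda_k = 2*pi*k/K, k = 0..K-1 (A_vals = A(lambda_k)).
    Returns lags u = -n_lags, ..., 0, ..., n_lags."""
    b = np.fft.ifft(A_vals)        # b[u] = K^{-1} sum_k A_k exp(i 2 pi k u / K)
    # negative lags are read from the top of the inverse transform
    return np.concatenate([b[-n_lags:], b[:n_lags + 1]])
```

As a check, a pure unit delay $A(\lambda) = e^{-i\lambda}$ produces a coefficient sequence that is 1 at lag $u = 1$ and 0 elsewhere.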

By analogy with what is done in multivariate analysis, we may wish to form certain real-valued measures of the association of the series $\mathbf{X}(t)$ and $\mathbf{Y}(t)$, such as Wilks' $\Lambda$ statistic

Page 411: David R. Brillinger Time Series Data Analysis and Theory 2001

390 THE CANONICAL ANALYSIS OF TIME SERIES

the vector alienation coefficient, or the vector correlation coefficient

Sample estimates of these coefficients would be of use in estimating the degree of association of the series $\mathbf{X}(t)$ and $\mathbf{Y}(t)$ at frequency $\lambda$.
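In the classical real-valued case such summaries are simple functions of the canonical correlations; a sketch under the assumption that the frequency-domain analogues take the corresponding product form in the canonical coherences $\mu_j(\lambda)$ (the displayed definitions are not legible in this scan, so the product form is an assumption):

```python
import numpy as np

def association_summaries(mu):
    """Real-valued association measures built from the canonical
    coherences mu_j at one frequency, assuming the classical forms:
    alienation = prod_j (1 - mu_j), and its complement."""
    mu = np.asarray(mu, dtype=float)
    alienation = float(np.prod(1.0 - mu))
    return alienation, 1.0 - alienation
```

With coherences 0.36 and 0.1 the alienation coefficient is $0.64 \times 0.9 = 0.576$, and the two summaries always add to one by construction.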

Miyata (1970) includes an example of an empirical canonical analysis of some oceanographic series.

10.6 EXERCISES

10.6.1 Show that if in Theorem 10.2.1 we set $q = r$, then we obtain the multiple regression results of Theorem 8.2.1.

10.6.2 If $\Gamma$ is taken to be $\Sigma_{YY}$ in Theorem 10.2.1, show that the criterion (10.2.5) is invariant under nonsingular linear transformations of $\mathbf{Y}$.

10.6.3 Under the conditions of Theorem 10.3.1, prove that $\mu_1(\lambda) = |R_{YX}(\lambda)|^2$ if $s = 1$.

10.6.4 Under the conditions of Theorem 10.3.2, prove $|\mu_j(\lambda)| \le 1$.

10.6.5 Under the conditions of Theorem 10.3.1, prove the canonical coherences $\mu_j(\lambda)$, $j = 1, 2, \ldots$ are invariant under nonsingular filterings of the series $\mathbf{X}(t)$, $t = 0, \pm 1, \ldots$ or the series $\mathbf{Y}(t)$, $t = 0, \pm 1, \ldots$.

10.6.6 Suppose the conditions of Theorem 10.3.1 are satisfied. Also, $\mathbf{c}_{XX}(u)$, $\mathbf{c}_{XY}(u)$, $\mathbf{c}_{YY}(u) = 0$ for $u \neq 0$. Then $\{\mathbf{b}(u)\}$, $\{\mathbf{c}(u)\}$ given in the theorem satisfy $\mathbf{b}(u)$, $\mathbf{c}(u) = 0$ for $u \neq 0$.

10.6.7 Demonstrate that the coherence $|R_{YX}(\lambda)|^2$ can be interpreted as the largest squared canonical correlation of the variate $\{X(t,\lambda), X(t,\lambda)^H\}$ with the variate $\{Y(t,\lambda), Y(t,\lambda)^H\}$.

10.6.8 Prove that if in the estimate (8.5.4), used in Theorem 10.4.3, we smooth across the whole frequency domain, then the proposed analysis reduces to a standard canonical correlation analysis of the sample covariance matrix

10.6.9 Suppose the data are tapered with tapering function $h(t/T)$ prior to calculating the estimates of Theorem 10.4.2. Under the conditions of the theorem, prove that the asymptotic covariances appearing become multiplied by $\int h(t)^4\,dt \big/ \left[\int h(t)^2\,dt\right]^2$.

10.6.10 Suppose that there exist $J$ groups of $r$ vector-valued observations with $K$ observations in each group. Suppose the vectors have complex entries and

Page 412: David R. Brillinger Time Series Data Analysis and Theory 2001

10.6 EXERCISES 391

(a) Show that the linear discriminant functions, $\beta^T \mathbf{Y}$, providing the extrema of the ratio $\beta^T \mathbf{S}_1 \beta / \beta^T \mathbf{S}_0 \beta$ (of between to within group sums of squares) are solutions of the determinantal equation

for some $\nu$.

(b) Define a $J-1$ vector-valued indicator variable $\mathbf{X} = [X_j]$ with $X_j = 1$ if $\mathbf{Y}$ is in the $j$th group and equal to 0 otherwise, $j = 1, \ldots, J-1$. Show that the analysis above is equivalent to a canonical correlation analysis of the values. See Glahn (1968).

(c) Indicate extensions of these results to the case of stationary time series $\mathbf{Y}_{jk}(t)$, $t = 0, \pm 1, \ldots$.
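The equivalence in part (b) is easy to check numerically in the classical (non-time-series) setting: the squared canonical correlations between the observations and the group indicators equal $\nu_j/(1+\nu_j)$, where the $\nu_j$ are the discriminant roots of the within-inverse times between matrix. A sketch on simulated data (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
J, K, r = 3, 50, 2                      # J groups, K observations each, r variables
means = rng.normal(size=(J, r))
Y = np.vstack([means[j] + rng.normal(size=(K, r)) for j in range(J)])
G = np.repeat(np.arange(J), K)
X = np.eye(J)[G][:, :J - 1]             # J-1 indicator columns

Yc, Xc = Y - Y.mean(0), X - X.mean(0)
Syy, Sxx, Syx = Yc.T @ Yc, Xc.T @ Xc, Yc.T @ Xc
# squared canonical correlations of Y with the indicators
rho2 = np.sort(np.linalg.eigvals(
    np.linalg.solve(Syy, Syx @ np.linalg.solve(Sxx, Syx.T))).real)

gm = np.array([Y[G == j].mean(0) for j in range(J)])
B = K * (gm - Y.mean(0)).T @ (gm - Y.mean(0))   # between-group SSCP
W = Syy - B                                      # within-group SSCP
nu = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)
```

Because the regression of `Y` on the centered indicators has the group means as fitted values, `rho2` equals `nu / (1 + nu)` exactly, which is the algebraic content of the exercise.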


PROOFS OF THEOREMS

PROOFS FOR CHAPTER 2

Proof of Theorem 2.3.1 Directly by identification of the indicated coefficient.

Proof of Lemma 2.3.1 First the "only if" part. If the partition is not indecomposable, then following (2.3.5), the $\phi(r_{ij_1}) - \phi(r_{ij_2})$; $1 \le j_1 \le j_2 \le J_i$; $i = i_1, \ldots, i_Q$ generate only the values $s_{m'} - s_{m''}$; $m', m'' = m_1, \ldots, m_N$. There is no way the values $s_{m'} - s_{m''}$; $m' = m_1, \ldots, m_N$ and $m'' \neq m_1, \ldots, m_N$ can be generated.

Next the "if" part. Suppose the $\phi(r_{ij_1}) - \phi(r_{ij_2})$; $1 \le j_1 \le j_2 \le J_i$; $i = 1, \ldots, I$ generate the $s_{m_1} - s_{m_2}$, $1 \le m_1 \neq m_2 \le M$. It follows that each pair $P_{m'}$, $P_{m''}$ of the partition communicate, otherwise $s_{m'} - s_{m''}$ would not have been generated. The indecomposability is thus shown.

We next demonstrate the alternate formulation. If the partition is not indecomposable, then following (2.3.5), the $\phi(r_{ij}) - \phi(r_{i'j'})$; $(i,j), (i',j') \in P_m$; $m = m_1, \ldots, m_N$ generate only the values $t_i - t_{i'}$; $i, i' = i_1, \ldots, i_Q$. There is no way the values $t_i - t_{i'}$, $i = i_1, \ldots, i_Q$; $i' \neq i_1, \ldots, i_Q$ can be generated.

On the other hand, if the $\phi(r_{ij}) - \phi(r_{i'j'})$ generate all the $t_i - t_{i'}$, then there must be some sequence of the $P_m$ beginning with an $i$ and ending with an $i'$, and so all sets communicate.

Proof of Theorem 2.3.2 We will proceed by induction on $j$. From Theorem 2.3.1 we see that

$E\{Y_1 \cdots Y_j\} = \sum_{\nu} \prod_{m} \mathrm{cum}(Y_i;\, i \in \nu_m) \qquad (*)$


with $\sum'$ extending over partitions with $p \ge 2$. We see that the terms coming from decomposable partitions are subtracted out in the above expression, yielding the stated result.

Proof of Theorem 2.5.1 We will write $\mathbf{A} \ge 0$ to mean that the matrix $\mathbf{A}$ is non-negative definite. The fact that $\mathbf{f}_{XX}(\lambda)$ is Hermitian follows from (2.5.7) and $\mathbf{c}_{XX}(u)^T = \mathbf{c}_{XX}(-u)$.

Next suppose EX(t) = 0 as we may. Consider

where $C_\nu = \mathrm{cum}(X_{a_1}, \ldots, X_{a_m})$ when $\nu = (a_1, \ldots, a_m)$ (the $a$'s being pairs of integers) and the sum extends over all partitions of the set $\{(m,n) \mid m = 1, \ldots, J \text{ and } n = 1, \ldots, k_m\}$.

From (*) and (**) we see


where $D_\mu = \mathrm{cum}(Y_{a_1}, \ldots, Y_{a_m})$ when $\mu = (a_1, \ldots, a_m)$ and the sum extends over all partitions $(\mu_1, \ldots, \mu_P)$ of $(1, \ldots, j)$. Also

that is, it is a Cesàro mean of the series for $\mathbf{f}_{XX}(\lambda)$. By assumption this series is convergent, and so Exercise 1.7.10 implies

and the latter must therefore be $\ge 0$.

Proof of Theorem 2.5.2 Suppose $E\mathbf{X}(t) = 0$. Set

By construction $\mathbf{I}_{XX}^{(T)}(\lambda) \ge 0$ and so therefore $E\mathbf{I}_{XX}^{(T)}(\lambda) \ge 0$. We have


lies between 0 and $\mathbf{c}_{XX}(0)$. By an extension of Helly's selection theorem this sequence will contain a subsequence converging weakly to a matrix-valued measure at all continuity points of the limit. Suppose the limit of such a convergent subsequence, $\mathbf{F}_{XX}^{(T)}(\lambda)$, is $\mathbf{F}_{XX}(\lambda)$. By approximating the integrals involved by finite sums we can see that


for $u = 0, \pm 1, \ldots, \pm T$. Now the sequence of matrix-valued measures

In addition, from (**) it tends to $\mathbf{c}_{XX}(u)$. This gives (2.5.8). Expression (2.5.9) follows from the usual inversion formula for Fourier-Stieltjes transforms. The increments of $\mathbf{F}_{XX}(\lambda)$ are $\ge 0$ by construction.

Proof of Lemma 2.7.1 We have

using the properties of linearity and time invariance. Setting $t = 0$ it follows that

and we have (2.7.6) with $\mathbf{A}(\lambda) = \mathfrak{A}[e](0)$.

Proof of Lemma 2.7.2 The properties indicated are standard results concerning the Fourier transforms of absolutely summable sequences; see Zygmund (1968), for example.

Proof of Lemma 2.7.3 We note that

It follows that

and so $\sum_u \mathbf{a}(t-u)\mathbf{X}(u)$ is finite with probability 1. The stationarity of $\mathbf{Y}(t)$ follows from the fact that the operation is time invariant.

Continuing, we have

completing the proof. •


for some finite $K$. The latter tends to 0 as $T, T' \to \infty$ in view of (2.7.23). The sequence $\mathbf{Y}_T(t)$, $T = 1, 2, \ldots$ is therefore Cauchy and so the limit (2.7.25) exists by completeness.

Proof of Lemma 2.7.4 Set


Proof of Theorem 2.7.1 Set

showing that the operations are indeed linear and time invariant. Now

from which (2.7.34) and (2.7.35) follow.

Proof of Theorem 2.8.1 The series $\mathbf{X}(t)$ is $r$ vector-valued, strictly stationary, with cumulant spectra $f_{a_1 \cdots a_k}(\lambda_1, \ldots, \lambda_{k-1})$. $\mathbf{Y}(t) = \sum_{u=-\infty}^{\infty} \mathbf{a}(t-u)\mathbf{X}(u)$, where the $\mathbf{a}(u)$ are the coefficients of an $s \times r$ filter and $\sum_{u=-\infty}^{\infty} |a_{ij}(u)| < \infty$, $i = 1, \ldots, s$, $j = 1, \ldots, r$. From Lemma 2.7.3, $\mathbf{Y}(t)$ is also strictly stationary and its cumulant functions exist. Indicate these by


and the interchange of averaging and summation needed in going from (*) to (**) above is justified. Now, $b_k(v_1, \ldots, v_{k-1})$ is a sum of convolutions of absolutely summable functions and is, therefore, absolutely summable. We see that $\mathbf{Y}(t)$ satisfies Assumption 2.6.1. Expression (2.8.1) follows on taking the Fourier transform of the cumulant function of $\mathbf{Y}(t)$ and noting that it is a sum of convolutions.

Proof of Lemma 2.9.1 If $X(t)$ is Markov and Gaussian, then $X(s+t) - E\{X(s+t) \mid X(s)\}$, $t > 0$, $s \ge 0$, is independent of $X(0)$. Therefore, $\mathrm{cov}\{X(0), [X(s+t) - E\{X(s+t) \mid X(s)\}]\} = 0$ and since $E\{X(s+t) \mid X(s)\} = K + X(s)\,c_{XX}(t)/c_{XX}(0)$, $K$ a constant, we have

In view of the absolute summability of the $a_{ij}(u)$, $a_{Y,jk}(u_1, \ldots, u_{k-1})$ is absolutely summable by Fubini's theorem. Therefore


From this we see that

The proof is completed by noting that $c_{XX}(t) = c_{XX}(-t)$.

Proof of Theorem 2.9.1 We begin by noting that under the stated assumptions $Y(t)$ exists, with probability 1, since

Consider

where


The cumulant involving the $X$'s is, from Theorem 2.3.2, the sum over indecomposable partitions of products of joint cumulants in the $X(t_i - u_{ij})$, say

we see that the cumulants of the series Y(t) are absolutely summable.

Proof of Theorem 2.10.1 We anticipate our development somewhat in this proof. In Section 4.6 we will see that we can write

Because the series is stationary, the cumulants will be functions of the differences $t_i - t_{i'} - u_{ij} + u_{i'j'}$. Following Lemma 2.3.1, $I - 1$ of the differences $t_i - t_{i'}$ will be independent. Suppose that these are $t_1 - t_I, \ldots, t_{I-1} - t_I$.

Setting $t_I = 0$ we now see that

where $g$ is absolutely summable as a function of its arguments. Making the change of variables

where

Substituting into (2.9.15) shows that

Now using Theorem 2.3.2 and the expressions set down above gives the desired expression (2.10.10).


PROOFS FOR CHAPTER 3

Proof of Theorem 3.3.1 We quickly see the first relation of (3.3.18). It may be rewritten as

from which the second part of (3.3.18) follows.

Proof of Lemma 3.4.1 We have

where $|\varepsilon^{(T)}(\lambda)| \le L \sum_u |\mathbf{a}(u)| \cdot |u|$ for some finite $L$ because the components of $\mathbf{X}(t)$ are bounded.

Proof of Theorem 3.5.1 The theorem follows directly from the substitutions $j = j_1 T_2 + j_2$, $t = t_1 + t_2 T_1$ and the fact that $\exp\{-i 2\pi k\} = 1$ for $k$ an integer.

Proof of Theorem 3.5.2 See the proof of Theorem 3.5.3.

Proof of Theorem 3.5.3 We first note that the integers

when reduced mod $T$ run through the integers $t$, $0 \le t \le T-1$. We see that there are $T_1 \cdots T_k = T$ possible values for (*), each of which is an integer. Suppose that two of these, when reduced mod $T$, are equal, that is

for some integer $l$. This means

The left side of this equation is not divisible by $T_1$, whereas the right side is. We have a contradiction, and so the values (*) are identical with the integers $t = 0, \ldots, T-1$. The theorem now follows on substituting

and reducing mod T.
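The counting argument can be illustrated for two mutually prime factors; a sketch assuming the index form $t \equiv t_1 T_2 + t_2 T_1 \pmod T$ (the displayed form (*) is not legible in this scan, so this particular mapping is an assumption):

```python
# With T = T1*T2 and T1, T2 relatively prime, the values
# t1*T2 + t2*T1 (mod T) are all distinct, exactly as in the proof:
# equality mod T would force t1 = t1' (mod T1) and t2 = t2' (mod T2).
T1, T2 = 4, 9
T = T1 * T2
vals = sorted((t1 * T2 + t2 * T1) % T
              for t1 in range(T1) for t2 in range(T2))
```

Sorting the 36 values produces exactly $0, 1, \ldots, T-1$, confirming the bijection that underlies the mixed-radix (prime-factor) Fourier transform.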


has rank at most $j + L - 1$. By inspection we see that this minimum is achieved by the matrix $\mathbf{A}$ of (3.7.19), completing the proof.

Proof of Theorem 3.8.1 See Bochner and Martin (1948) p. 39.


Proof of Lemma 3.6.1 See the proof of Lemma 3.6.2.

Proof of Lemma 3.6.2 If we make the substitution

then expression (3.6.11) becomes

This last gives (3.6.5) in the case $r = 2$. It is also seen to give (3.6.10) in the case $|u_j| \le S - T$.

Proof of Lemma 3.7.1 The results of this lemma follow directly once the correspondence (3.7.7) has been set up.

Proof of Theorem 3.7.1 See Bellman (1960).

Proof of Theorem 3.7.2 The matrix $\overline{\mathbf{Z}}^T \mathbf{Z}$ is non-negative definite and Hermitian. It therefore has latent values $\mu_j^2$ for some $\mu_j \ge 0$. Take $\mathbf{V}$ to be the associated matrix of latent vectors satisfying $\overline{\mathbf{Z}}^T \mathbf{Z}\mathbf{V} = \mathbf{V}\mathbf{D}$ where $\mathbf{D} = \mathrm{diag}\{\mu_j^2\}$. Let $\mathbf{M} = \mathrm{diag}\{\mu_j\}$ be $s \times r$. Take $\mathbf{U}$ such that $\mathbf{U}\mathbf{M} = \mathbf{Z}\mathbf{V}$. We see that $\mathbf{U}$ is unitary and composed of the latent vectors of $\mathbf{Z}\overline{\mathbf{Z}}^T$. The proof is now complete.

Proof of Theorem 3.7.3 Let $\mu_k$ denote expression (3.7.11) and $\alpha_k$ denote expression (3.7.12). We quickly check that $\mathbf{Z}\alpha_k = \mu_k \alpha_k$.

Proof of Theorem 3.7.4 Set $\mathbf{B} = \mathbf{Z} - \mathbf{A}$. By the Courant-Fischer theorem (see Bellman (1960) and Exercise 3.10.16),

where $\mathbf{D}$ is any $(j-1) \times J$ matrix and $\mathbf{x}$ is any $J$ vector. Therefore

because the matrix


where $E(\lambda)$ is a spectral family of projection operators on $H$. This family has the properties: $E(\lambda)E(\mu) = E(\mu)E(\lambda) = E(\min\{\lambda, \mu\})$, $E(-\pi) = 0$, $E(\pi) = I$, and $E(\lambda)$ is continuous from the right. Also, for $Y_1(t)$, $Y_2(t)$ in $H$, $\langle E(\lambda)Y_1, Y_2\rangle$ is of bounded variation and


Proof of Theorem 3.8.2 The space $K^+(l)$ is a commutative normed ring; see Gelfand et al. (1964). The space $\mathfrak{M}$ of maximal ideals of this ring is homeomorphic with the strip $-\pi < \mathrm{Re}\,\lambda \le \pi$, $\mathrm{Im}\,\lambda \le 0$, in such a way that if $M \in \mathfrak{M}$ and $\lambda$ are corresponding elements then

These are the functions of $K^+(l)$. The stated result now follows from Theorem 1, Gelfand et al. (1964) p. 82. The result may be proved directly also.

Proof of Theorem 3.8.3 The space $V(l)$ is a commutative normed ring. Its space of maximal ideals is homeomorphic with the interval $(-\pi, \pi]$ through the correspondence

for $M$ in the space of maximal ideals. The $x(M)$ are the elements of $V(l)$. The theorem now follows from Theorem 1, Gelfand et al. (1964) p. 82.

Proof of Theorem 3.9.1 Consider the space consisting of finite linear combinations of $X_j(t+s)$, $j = 1, \ldots, r$ and $t = 0, \pm 1, \ldots$. An inner product may be introduced into this space by the definition

The space is then a pre-Hilbert space. It may be completed to obtain a Hilbert space $H$. There is a unitary operator, $\mathcal{U}$, on $H$ such that

Following Stone's theorem (see Riesz and Nagy (1955)) the operator $\mathcal{U}$ has a spectral representation

If we define $Z_j(\lambda; s) = E(\lambda)X_j(s)$ we see that


in the sense of (3.9.6). Also from (*) above

Bochner's theorem now indicates that $G_{jk}(\lambda)$ defined by (3.9.4) is given by $\langle Z_j(\lambda; s), Z_k(\lambda; s)\rangle$. The remaining parts of the theorem follow from the properties of $E(\lambda)$.

Proof of Theorem 3.9.2 Set

In view of (3.9.11), there exists Z*(X) such that

Now take an equivalent version of Z(X) with the property= X(0). We see that

and

where nu*(u) is given by (3.9.3). Now because

the uniqueness theorem for Fourier-Stieltjes transforms gives (3.9.15).


PROOFS FOR CHAPTER 4

Before proving Theorems 4.3.1 and 4.3.2 we first set down some lemmas.

Lemma P4.1 If $h_a(u)$ satisfies Assumption 4.3.1 and if $h_a^{(T)}(t) = h_a(t/T)$ for $a = 1, \ldots, r$, then

for some finite K.

Proof The expression in question is

for some finite $L$. Suppose for convenience $u_a > 0$. (The other cases are handled similarly.) The expression is now

as desired.

Lemma P4.2 The cumulant of interest in Theorems 4.3.1 and 4.3.2 is given by

where S = 2(T - 1) and

for some finite K.

Proof The cumulant has the form


Using Lemma P4.1 this equals

where $\varepsilon_T$ has the indicated bound.

Lemma P4.3 Under the condition (4.3.6), $\varepsilon_T = o(T)$ as $T \to \infty$.

Proof

Now $T^{-1}(|u_1| + \cdots + |u_{k-1}|) \to 0$ as $T \to \infty$. Because of (4.3.6) we may now use the dominated convergence theorem to see that $T^{-1}\varepsilon_T \to 0$ as $T \to \infty$.

Lemma P4.4 Under the condition (4.3.10), $\varepsilon_T = O(1)$.

Proof Immediate.

Proof of Theorem 4.3.1 Immediate from Lemmas P4.2, P4.3 and the fact that

Proof of Theorem 4.3.2 Immediate from Lemmas P4.2, P4.4 and the fact that

since (4.3.10) holds.

The following lemma will be needed in the course of the proof of Theorem 4.4.1.

Lemma P4.5 Let $\mathbf{Y}^{(T)}$, $T = 1, 2, \ldots$ be a sequence of $r$ vector-valued random variables, with complex components, and such that all cumulants of the variate $[Y_1^{(T)}, \overline{Y_1^{(T)}}, \ldots, Y_r^{(T)}, \overline{Y_r^{(T)}}]$ exist and tend to the corresponding cumulants of a variate $[Y_1, \overline{Y_1}, \ldots, Y_r, \overline{Y_r}]$ that is determined by its moments. Then $\mathbf{Y}^{(T)}$ tends in distribution to a variate having components $Y_1, \ldots, Y_r$.


Proof All convergent subsequences of the sequence of cdf's of $\mathbf{Y}^{(T)}$ tend to cdf's with the given moments. By assumption there is only one cdf with these moments and we have the indicated result.

Proof of Theorem 4.4.1 We begin by noting that

We therefore see that the first cumulant of $d_a^{(T)}(\lambda_j^{(T)})$ behaves in the manner required by the theorem.

Next we note, from Theorem 4.3.1, that

The latter tends to 0 if $\lambda_j^{(T)} \pm \lambda_k^{(T)} \not\equiv 0 \pmod{2\pi}$. It tends to $2\pi f_{ab}(\pm\lambda_j)$ if $\lambda_j^{(T)} \equiv \pm\lambda_k^{(T)} \pmod{2\pi}$. This indicates that the second-order cumulant behavior required by the theorem holds.

Finally, again from Theorem 4.3.1,

This last tends to 0 as $T \to \infty$ if $k > 2$ because $\Delta^{(T)}(\cdot)$ is $O(T)$. Putting the above results together, we see that the cumulants of the

variates at issue, and the conjugates of those variates, tend to the cumulants of a normal distribution. The conclusion of the theorem now follows from the lemma since the normal distribution is determined by its moments.

Before proving Theorem 4.4.2 we must state a lemma:

Lemma P4.6 Let $h_a(t)$ satisfy Assumption 4.3.1, $a = 1, \ldots, r$ and let $H_a^{(T)}(\lambda)$ be given by (4.3.2). Then if $\lambda \not\equiv 0 \pmod{2\pi}$

for some finite $K$.

Proof Suppose, for convenience, that $h(t/T)$ is nonzero only if $0 \le t < T$. Using Exercise 1.7.13, we see that


if we use the lemma required in the proof of Theorem 4.3.2.

Proof of Theorem 4.4.2 We proceed as in the proof of Theorem 4.4.1.

using Lemma P4.6 and Lemma P4.1. Next, from Theorem 4.3.1,

This tends to 0 if $\lambda_j \pm \lambda_k \not\equiv 0 \pmod{2\pi}$ following Lemma P4.6. It tends to

$2\pi \left\{\int h_a(t) h_b(t)\, dt\right\} f_{ab}(\pm\lambda_j) = 2\pi H_{ab}(0) f_{ab}(\pm\lambda_j)$ if $\pm\lambda_j \equiv \pm\lambda_k \pmod{2\pi}$.

Finally

This tends to 0 for $k > 2$ as $H_a^{(T)}(\lambda) = O(T)$ and the proof of the theorem

follows as before.

To prove Theorem 4.5.1, we proceed via a sequence of lemmas. Set

$H_2 = \int h(t)^2\,dt$, $\sigma_T^2 = \mathrm{var}\,\mathrm{Re}\,d_X^{(T)}(\lambda) = \tfrac{1}{4}\int |H^{(T)}(\lambda - \alpha) + H^{(T)}(-\lambda - \alpha)|^2 f_{XX}(\alpha)\,d\alpha$.

Lemma P4.7 Under the conditions of Theorem 4.5.1, for given $\lambda$ and $\varepsilon$, and $a$ sufficiently small,

Proof From the first expression of the proof of Lemma P4.2, we see

where $L = \sup |h(u)|$ and $C_k$ is given by (2.6.7). Therefore

The indicated expression now follows on taking a sufficiently small.


Corollary Under the conditions of Lemma P4.7.

Lemma P4.8 Let $\lambda_r = 2\pi r/R$, $r = 0, \ldots, R-1$ for some integer $R > 6\pi T$. Then

Proof This follows immediately from Lemma 2.1 of Woodroofe and Van Ness (1967); see also Theorem 7.28, Zygmund (1968) Chap. 10.

Lemma P4.9 Under the conditions of Theorem 4.5.1

Proof The indicated expected value is

giving the result because the sum runs over $R = \exp\{\log R\}$ points and

Lemma P4.10 Given $\varepsilon, \delta > 0$, let $a^2 = 2(1+\varepsilon)(2+\delta)\,T(\log T)\,H_2 \sup_\lambda f_{XX}(\lambda)$. Under the conditions of Theorem 4.5.1,

for some $K$.

Proof The probability is $\le 2\exp\{\log R\}\exp\{-a\alpha\}$

This last is $\le KT^{-1-\delta}$ after the indicated choice of $a$.


Corollary Under the conditions of Theorem 4.5.1

with probability 1.

Proof From the Borel-Cantelli lemma (see Loève (1965)) and the fact that $\varepsilon$, $\delta$ above are arbitrary.

Proof of Theorem 4.5.1 We can develop a corollary, similar to the last one, for $\mathrm{Im}\,d_X^{(T)}(\lambda)$. The theorem then follows from the fact that

Proof of Theorem 4.5.2 We prove Theorem 4.5.3 below. The proof of Theorem 4.5.2 is similar with the key inequality of the first lemma below replaced by

To prove Theorem 4.5.3, we proceed via a sequence of lemmas.

Lemma P4.11 Suppose $h(u)$ has a uniformly bounded derivative and finite support. Let

Then

Therefore

In absolute value this is

for some

Proof


for some finite $M$ with $L$ denoting a bound for the derivative of $h(u)$.

Lemma P4.12 For a sufficiently small there is a finite L such that

Proof From the previous lemma

for |«| sufficiently small and some finite L.

Lemma P4.13 Let $\lambda_r = 2\pi r/R$, $r = 0, \ldots, R-1$ for some integer $R > 12\pi T$; then there is a finite $N$ such that

for some $K$. Now $E^{(T)}(\lambda)$ may be written

The first term here is a trigonometric polynomial of order $2T$. From Lemma 2.1 of Woodroofe and Van Ness (1967) we therefore have

The latter and (*) now give the indicated inequality.

Lemma P4.14 Under the conditions of Theorem 4.5.3,


Proof Immediate from Lemma P4.12 and the fact that the sup runs over $R$ points.

Lemma P4.15 Given $\delta > 0$, let $a^2 = 4L(2+\delta)\log T/T$; then under the conditions of Theorem 4.5.3,

for some finite $K$.

Proof Set $R = T\log T$ and

The probability is then

Corollary Under the conditions of Theorem 4.5.3

for some finite K with probability 1.

Proof of Theorem 4.5.3 The result follows from Theorem 4.5.1, the previous corollary and Lemma P4.13.

Proof of Theorem 4.5.4 Exercise 3.10.34(b) gives

Let $k$ be a positive integer. Hölder's inequality gives

for some finite $K$ following Exercise 3.10.28. It follows from Theorem 2.3.2 and (*) in the proof of Lemma P4.7 that

for some finite M and so


for some finite N. This gives

As the series in $T$ so bounded converges for $k$ sufficiently large, we have the result of the theorem.

To prove Theorem 4.6.1, we first indicate a lemma.

Lemma P4.16 Suppose $\mathbf{X}(t)$, $t = 0, \pm 1, \ldots$ satisfies Assumption 2.6.1. Let

then

Proof The cumulant may be written as $(2\pi)^{-k}$ times

if we substitute for the cumulant function. The limit indicated in the lemma now results once we note that

where $\eta(\cdot)$ is the periodic extension of the Dirac delta function; see Exercise 2.13.33.

Proof The cumulant is a sum of terms of the form


In view of the lemma above these all tend to $\pm$ the same limit. The sum therefore tends to 0.

Corollary

Proof The moment may be written as a sum of cumulants of the form of those appearing in the previous corollary. Each of these cumulants tends to 0, giving the result.

Proof of Theorem 4.6.1 From the last corollary above we see that the sequence $Z_a^{(T)}(\lambda)$, $T = 1, 2, \ldots$ is a Cauchy sequence in the space $L_\nu$ for any $\nu > 0$. Because this space is complete, the sequence has a limit $Z_a(\lambda)$ in the space.

To complete the proof we note that expression (4.6.7) follows from Lemma P4.16 above.

Proof of Theorem 4.6.2 Set

From this we see that

Also

then


In a similar manner we may show that

From these last two we see

and so

with probability 1, $t = 0, \pm 1, \ldots$, giving the desired representation.

Proof of Theorem 4.7.1 We may write

The latter is clearly minimized with respect to B by setting

Now

Following Corollary 3.7.4 this is minimized by setting

in the notation of the theorem. This gives the required result.

PROOFS FOR CHAPTER 5

Proof of Theorem 5.2.1 We have

Expression (5.2.6) now follows after the substitution


Proof of Theorem 5.2.2 Proceeding as in the proof of Theorem 4.3.2, we see that (5.2.7) implies

and (5.2.8) follows from (*) immediately above.

Proof of Theorem 5.2.3 We begin by noting that

Next we have

Expression (5.2.17) now follows from the fact that

Proof of Theorem 5.2.4 See the proof of Theorem 5.2.5 given immediately below.

Proof of Theorem 5.2.5 From Theorem 4.3.2 we have

giving the indicated result.

We now set down a result that will be needed in the next proof and other proofs throughout this work.

Theorem P5.1 Let the sequence of $r$ vector-valued random variables $\mathbf{X}_T$, $T = 1, 2, \ldots$ tend in distribution to the distribution of a random variable $\mathbf{X}$. Let $g : R^r \to R^s$ be an $s$ vector-valued measurable function whose


discontinuities have $\mathbf{X}$ probability 0. Then the sequence of $s$ vector-valued variables $g(\mathbf{X}_T)$, $T = 1, 2, \ldots$ tends in distribution to the distribution of $g(\mathbf{X})$.

Proof See Mann and Wald (1943a) and Theorem 5.1 of Billingsley (1968).

A related theorem that will also be needed later is

Theorem P5.2 Let the sequence of $r$ vector-valued random variables $\sqrt{T}(\mathbf{Y}_T - \boldsymbol{\mu})$, $T = 1, 2, \ldots$ tend in distribution to $N(\mathbf{0}, \boldsymbol{\Sigma})$. Let $g : R^r \to R^s$ be an $s$ vector-valued function differentiable in a neighborhood of $\boldsymbol{\mu}$ and having $s \times r$ Jacobian matrix $\mathbf{J}$ at $\boldsymbol{\mu}$. Then $\sqrt{T}(g(\mathbf{Y}_T) - g(\boldsymbol{\mu}))$ tends in distribution to $N(\mathbf{0}, \mathbf{J}\boldsymbol{\Sigma}\mathbf{J}^T)$ as $T \to \infty$.

Proof See Mann and Wald (1943) and Rao (1965) p. 321.

Corollary P5.2 (The Real-Valued Case) Let $\sqrt{T}(Y_T - \mu)$, $T = 1, 2, \ldots$ tend in distribution to $N(0, \sigma^2)$. Let $g : R \to R$ have derivative $g'$ in a neighborhood of $\mu$. Then $\sqrt{T}(g(Y_T) - g(\mu)) \to N(0, [g'(\mu)]^2 \sigma^2)$ as $T \to \infty$.
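Corollary P5.2 is the scalar delta method, and it is easily checked by simulation; a sketch in which the choice $g = \log$, the sample size, and the replication count are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
T, mu, sigma, reps = 400, 2.0, 1.0, 20000
# sample means of T iid N(mu, sigma^2) draws, replicated many times
Ybar = mu + sigma * rng.standard_normal((reps, T)).mean(axis=1)
# g = log, so g'(mu) = 1/mu and the limit variance is sigma^2/mu^2 = 0.25
z = np.sqrt(T) * (np.log(Ybar) - np.log(mu))
```

The empirical variance of `z` settles near $[g'(\mu)]^2\sigma^2 = 0.25$ and its mean near 0, in line with the corollary.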

Proof of Theorem 5.2.6 Theorem 4.4.1 indicates that $\mathrm{Re}\,d_X^{(T)}(\lambda_j^{(T)})$, $\mathrm{Im}\,d_X^{(T)}(\lambda_j^{(T)})$ are asymptotically independent $N(0, \pi T f_{XX}(\lambda_j))$ variates. It follows from Theorem P5.1 that

is asymptotically $f_{XX}(\lambda_j)\chi_2^2/2$. The asymptotic independence for different values of $j$ follows in the same manner from the asymptotic independence of the $d_X^{(T)}(\lambda_j^{(T)})$, $j = 1, \ldots, J$.

Proof of Theorem 5.2.7 This theorem follows from Theorem 4.4.2 as Theorem 5.2.6 followed from Theorem 4.4.1.

Proof of Theorem 5.2.8 From Theorem 4.3.2

The indicated result follows as

Proof of Theorem 5.3.1 This theorem is an immediate consequence of Exercise 4.8.23.

Proof of Theorem 5.3.2 Follows directly from Theorem 4.5.1 and the definition of $\mathbf{I}_{XX}^{(T)}(\lambda)$.

Proof of Theorem 5.4.1 This theorem follows directly from expression (5.2.6) of Theorem 5.2.1 and the definitions of $A_m^{(T)}(\lambda)$, $B_m^{(T)}(\lambda)$, $C_m^{(T)}(\lambda)$.


The corollary follows from Theorem 5.2.2.

Proof of Theorem 5.4.2 This follows from Theorem 5.2.4.

Proof of Theorem 5.4.3 Follows from Theorem 5.2.6.

Proof of Theorem 5.5.1 This theorem follows from expression (5.2.6) of Theorem 5.2.1 and the definitions of $A^{(T)}(\lambda)$, $B^{(T)}(\lambda)$, $C^{(T)}(\lambda)$. The corollary follows from Theorem 5.2.2.

Proof of Theorem 5.5.2 From Theorem 5.2.4.

Proof of Theorem 5.5.3 From Theorem 5.2.6 and Theorem P5.1.

The following lemma will be required in the course of the proofs of several theorems.

Lemma P5.1 If a function g(x) has finite total variation, V, on [0,1], then

Proof See Pólya and Szegő (1925) p. 37; a related reference is Cargo (1966). If $g$ is differentiable, the right side may be replaced by $\int |g'(x)|\,dx/n$.

Further results are given as Exercises 1.7.14 and 5.13.28.
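Lemma P5.1 bounds the error of an equispaced-sum approximation to an integral by $V/n$; a quick numerical check with $g(x) = x^2$, which has total variation 1 on $[0,1]$:

```python
import numpy as np

n = 100
g = lambda x: x ** 2                      # total variation V = 1 on [0, 1]
riemann = np.mean(g((np.arange(n) + 1) / n))
err = abs(riemann - 1.0 / 3.0)            # exact integral is 1/3
```

Here the error is about $1/(2n)$, comfortably inside the $V/n$ bound of the lemma.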

Proof of Theorem 5.6.1 The first expression in (5.6.7) follows directly from expression (5.2.8) and the definition (5.6.1).

If we use the lemma above to approximate the sum appearing by an integral, then we see that

giving the final expression in (5.6.7).

Proof of Theorem 5.6.2 Using Theorem 5.2.5, the indicated covariance is given by

giving the indicated first expression. The second expression follows from this on replacing the sum by an integral making use of Lemma P5.1.

Proof of Theorem 5.6.3 See the proof of Theorem 7.4.4.

Proof of Corollary 5.6.3 This follows from Theorem 5.6.3 and Corollary P5.2.


Proof of Theorem 5.8.2 The first expression in (5.8.18) follows directly from the definition of $f^{(T)}(\lambda)$ and expression (5.8.9). The second expression of (5.8.18) follows from the first, neglecting terms after the first, and Lemma P5.1.

Proof of Corollary 5.8.2 This follows after we substitute the Taylor expansion


Proof of Theorem 5.6.4 See the proof of Theorem 7.7.1.

Proof of Theorem 5.8.1

in view of (5.8.7) and because $|b^{(T)}(u)| \le 1$. This in turn equals

giving the desired result because

into the second expression of (5.8.18).

Proof of Theorem 5.9.1 We write $X'$ for $X - c_X^{(T)}$ below. Now

The indicated result now follows as


giving

and finally

Proof of Theorem 5.9.2 Follows directly from Theorem 5.3.1.

Proof of Theorem 5.10.1 From Theorem 5.2.2

for $s \not\equiv 0 \pmod T$ and $s$ an integer. This gives the first part of (5.10.12). The second part follows from Lemma P5.1.

Continuing, from Theorem 4.3.2


Taking note of the linear restrictions introduced by the $\Delta^{(T)}$ functions, we see that the dominant term in this cumulant is of order $T^{-L+1}$.

Now, when the variates $T^{1/2} J^{(T)}(A_j)$, $j = 1, \ldots, J$, are considered, we see that their joint cumulants of order greater than 2 all tend to 0. It follows that these variates are asymptotically normal.

Proof of Theorem 5.10.2 We proceed as in the proof of Theorem 5.9.1.

Proof of Theorem 5.11.1 In order to avoid cumbersome algebraic detail, we present a proof only in the case $J = 1$. The general $J$ case follows in a similar manner.

The model is

the inner sum being over all indecomposable partitions of the table

giving expression (5.10.13). Turning to the higher order cumulants we have


and the least squares estimate

Because Eε(t) = 0, we see from the latter expression that Eθ^(T) = 0. Also

It follows by the bounded convergence criterion that

At the same time

and so

as indicated in (5.11.20). In the case of higher order cumulants we see

in view of the second condition of Assumption 5.11.1. It follows that

as T → ∞ for L > 2, and so θ^(T) is asymptotically normal as indicated in the statement of the theorem.

We next consider the statistical behavior of f_εε^(T)(λ). As

we have


Now

showing that the asymptotic distribution of f_εε^(T)(λ) is the same as that of the estimate based on the unobservable series ε(t) given in Theorem 5.6.3. (o_p(1) denotes a variate tending to 0 in probability.)

The asymptotic independence of θ^(T) and f_εε^(T)(λ_1), …, f_εε^(T)(λ_K) follows from a consideration of joint asymptotic cumulants.

PROOFS FOR CHAPTER 6

Proofs of Theorems 6.2.1 and 6.2.2 These are classical results. Proofs may be found in Chapter 19 of Kendall and Stuart (1961), for example.

Proofs of Theorems 6.2.3 and 6.2.4 These results follow from Theorems 6.2.1 and 6.2.2 when we rewrite (6.2.7) in the form

This is a model of the form considered in those theorems.

for some finite N. Therefore

for some finite N' and so

giving (5.11.21) from Theorem 5.6.1. It also follows from these inequalities that

for some finite M, M′, while


Proof of Theorem 6.2.5 This follows directly from the properties of â indicated in Theorem 6.2.4.

Proof of Lemma 6.3.1 We have

where, because the components of X(t) are bounded, ε^(T)(β) is uniformly bounded. The last part follows directly.

In the proofs below we will require the following lemma.

Lemma P6.1 Given a 1 × M matrix P and an r × M matrix Q we have

Proof We begin by noting the matrix form of Schwarz's inequality

(This follows from the minimum achieved in Theorem 6.2.1.) This implies

Now

and the result follows.
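Lemma P6.1's proof rests on the matrix form of Schwarz's inequality. In the simplest case r = 1 it reduces to the scalar Cauchy-Schwarz bound, which can be checked numerically (an added illustration; the vectors chosen are arbitrary):

```python
# Hedged illustration of the matrix Schwarz inequality behind Lemma P6.1,
# in the simplest case r = 1: for complex row vectors p and q,
#   p q* (q q*)^{-1} q p*  <=  p p*,
# i.e. the squared projection of p onto q is at most |p|^2.
def inner(u, v):
    # Hermitian inner product: sum of u_j * conjugate(v_j)
    return sum(a * b.conjugate() for a, b in zip(u, v))

p = [1 + 2j, -0.5j, 3.0, 2 - 1j]
q = [0.5, 1 + 1j, -2.0, 0.25j]

proj = abs(inner(p, q)) ** 2 / inner(q, q).real   # p q*(q q*)^{-1} q p*
assert proj <= inner(p, p).real + 1e-12           # Schwarz bound holds
```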

Proof of Theorem 6.4.1 Because Eε(t) = 0, we have EA^(T)(λ) = A(λ)

where following Lemma P6.1

from which (6.4.9) follows.

Before proving Theorem 6.4.2 we state a lemma which is a slight extension of a result of Billingsley (1966).


Lemma P6.2 Let Z^(T) be a sequence of q vectors tending in distribution to N_q^C(0, I) as T → ∞. Let U^(T) be a sequence of q × q unitary matrices. Then U^(T)Z^(T) also tends in distribution to N_q^C(0, I).

Proof Consider any subsequence of Z^(T), say Z^(T′). Because the group of unitary matrices is compact (see Weyl (1946)), U^(T′) has a convergent subsequence, say U^(T″), tending to U. Now, by Theorem P5.1, U^(T″)Z^(T″) tends in distribution to U N_q^C(0, I) = N_q^C(0, I). Therefore any subsequence of U^(T)Z^(T) has a subsequence tending in distribution to N_q^C(0, I), and so U^(T)Z^(T) must tend to N_q^C(0, I).
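The algebra behind Lemma P6.2 is that a unitary transform leaves the identity covariance unchanged: cov(UZ) = U cov(Z) U* = I. A check of this computation for one fixed 2 × 2 unitary matrix (an added illustration, not from the text):

```python
# cov(UZ) = U cov(Z) U* = U I U* = I when U is unitary; we verify
# U U* = I for a fixed 2x2 unitary matrix in pure Python.
import cmath, math

w = cmath.exp(2j * math.pi / 3)
U = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [w / math.sqrt(2), -w / math.sqrt(2)]]

def conj_transpose(M):
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

UUstar = matmul(U, conj_transpose(U))   # equals U cov(Z) U* when cov(Z) = I
for i in range(2):
    for j in range(2):
        expected = 1.0 if i == j else 0.0
        assert abs(UUstar[i][j] - expected) < 1e-12
```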

Proof of Theorem 6.4.2 Consider X of Case A to begin. From Lemma 6.3.1

Let U^(T) indicate a (2m + 1) × (2m + 1) unitary matrix whose first r columns are the matrix U_1^(T) = D_X^τ[D_X D_X^τ]^(−1/2). Write U^(T) = [U_1^(T) U_2^(T)].

Applying U^(T) to the matrix equation above gives

The first r columns of the latter give

The remaining give

Because U^(T) is unitary we have

s = 0, ±1, …, ±m, where the error term O(1) is uniform in s because A(λ) has a uniformly bounded first derivative and ‖d_X^(T)(α)‖ = O(T). (The equations above may be compared with (6.3.7).) Now let D_Y denote the 1 × (2m + 1) matrix whose columns are the values (2πT)^(−1/2) d_Y^(T)(2π[s(T) + s]/T), s = 0, ±1, …, ±m, with a similar definition for D_X and D_ε. The equations above now take the form


and so

where O_p(1) denotes a variate bounded in probability.

Now Theorem 4.4.1 applies, indicating that because the series ε(t), t = 0, ±1, … satisfies Assumption 2.6.1, D_ε tends to N_{2m+1}^C(0, f_εε(λ)I). Therefore f_εε(λ)^(−1/2) D_ε tends to N_{2m+1}^C(0, I). Lemma P6.2 applies, indicating that f_εε(λ)^(−1/2)(D_ε U^(T))^τ also tends to N_{2m+1}^C(0, I), and so (D_ε U^(T))^τ tends to N_{2m+1}^C(0, f_εε(λ)I). The indicated asymptotic behavior of A^(T)(λ) and g_εε^(T)(λ) now follows from the representations obtained for them above.

If X is of Case B or Case C, then the above form of argument goes through with the unitary matrix replaced by an orthogonal one. The behavior of μ^(T) follows from its dependence on A^(T)(0).

We need the following lemma.

Lemma P6.3 (Skorokhod (1956)) Let V^(T), T = 1, 2, … be a sequence of vector-valued random variables tending in distribution to a random variable V. Then, moving to an equivalent probability structure, we may write

where Z is N_q^C(0, I) and so

and U^(T)Z is N_q^C(0, I) for all T.

Proof of Theorem 6.4.3 The last lemma shows that we may write

where the ζ_s are independent N_1^C(0, 2πT f_εε(λ)) variates. Let λ_s = 2π[s(T) + s]/T. We may make the substitution d_Y^(T)(λ_s) = A(λ)d_X^(T)(λ_s) + ζ_s + o_{a.s.}(√T). We have the sum of squares identity,

The terms appearing are quadratic forms of ranks 2m + 1, r, 2m + 1 − r respectively in the ζ_s, plus terms tending to 0 with probability 1. Exercise 4.8.7 applies to indicate that the first term on the right here may be written

This lemma provides us with another proof of Lemma P6.2. We may write


f_εε(λ)χ²_{2r}(2πT A(λ)f_XX^(T)(λ)A(λ)^τ/f_εε(λ))/2 + o_{a.s.}(1), while the second term may be written f_εε(λ)χ²_{2(2m+1−r)}/2 + o_{a.s.}(1), with the χ² variates independent. Expression (6.4.12) now follows by elementary algebra.

Proof of Theorem 6.5.1 Let R(t) = Σ_u a(t − u)X(u). Now, because Eε(t) = 0, we have

Let d_R^(T)(β) = A(β)d_X^(T)(β) + ε^(T)(β). From Lemma 6.3.1, ε^(T)(β) is uniformly bounded. By substitution we therefore have

where, following Lemma P6.1,

for some finite K, where we use the facts that ε^(T)(β) is bounded and that W(β) is non-negative. The first part of (6.5.14) now follows from Assumption 6.5.2. Turning to the second part: suppose 0 ≤ λ < 2π. The region in which W^(T) is nonzero is |λ − (2πs/T)| ≤ B_T π. In this region, A(2πs/T) = A(λ) + O(B_T) because under the given assumptions A(β) has a uniformly bounded first derivative. The proof of the theorem is now completed by the substitution of this last into the first expression of (6.5.14).

Proof of Theorem 6.5.2 To begin we note that

giving the first part of (6.5.19). The second part follows from (6.5.14) and the fact that |a + ε| = |a| + O(ε).

To prove the first parts of (6.5.20) and (6.5.21) we use the Taylor series expansions

from (6.6.3), and so

Because the series ε(t), t = 0, ±1, … is unobservable, these variates are unobservable. However, we will see that the statistics of interest are elementary functions of these variates. Continuing to set up notation, let [d_X^(T)(λ)]_k denote the kth entry of d_X^(T)(λ), with a similar notation for the entries of f_XX^(T)(λ). We have

Lemma P6.4 If f_XX^(T)(λ) is uniformly bounded, then


taking ζ + ε = A_j^(T)(λ), ζ = EA_j^(T)(λ), and using (6.6.3). To prove the second parts, we again use these expansions; however, this time with ζ + ε = EA_j^(T)(λ), ζ = A_j(λ), and using (6.5.14).

Before developing the remaining proofs of this section we must first set down some notation and prove some lemmas. We define

The error term is uniform in λ.

Proof We have

from which the result follows.

Lemma P6.5 If f_XX^(T)(λ) is uniformly bounded, then

The error term is uniform in λ, μ.


Proof By virtue of Schwarz's inequality the absolute value of the expression at issue is

giving the desired result.

Lemma P6.6 Under the conditions of Theorem 6.5.1,

Proof We begin by noting that

if we use Lemma 6.3.1. Because

The cumulant in question is given by


The cumulant appearing in the last expression has principal term

where Σ_j n_j = 2N and t_1, …, t_{2N} is a permutation of (s_1, −s_1, …, s_N, −s_N) corresponding to an indecomposable partition. We have p ≤ n. We now use Lemmas P6.4, P6.5 to eliminate the summations on q and r and see that the principal term in A is

giving the indicated result for L + M > 1. The other expressions follow in a similar manner.

These estimates of the order of the joint cumulants are sufficient for certain purposes; however, they are too crude in the second-order case. In greater detail in that case we have,

Lemma P6.7 Under the conditions of Theorem 6.5.1,


Proof Consider the second of these expressions. Following the first expression of the proof of Lemma P6.4, the required covariance is

The other covariances also follow from the expression of Lemma P6.4, the fact that f_εε(λ) has a uniformly bounded derivative, and the fact that the support of W^(T)(α) is |α| ≤ B_T π.

In the lemma below we let C_T = B_T + T^(−1/2).

Lemma P6.8 Let R(t) = Σ_u a(t − u)X(u). Under the assumptions of Theorem 6.5.1,

Proof We derived the first expression in the course of the proof of (6.5.14). The second is immediate. For the third we note

from which the indicated result follows. For the next


for finite K and L following Assumption 6.5.2. For the final statement we note that

and the result follows from the earlier expressions of the lemma.

Proof of Theorem 6.5.3 From Lemma P6.8, we see that

from Lemma P6.7. From Theorem 5.6.1, we see that

and we have the indicated result.

Proof of Theorem 6.5.4 From (6.3.2) and Lemma 6.3.1 we see

This gives

Therefore

using (6.5.14). The result now follows because, under the indicated boundedness of X(t), t = 0, ±1, …, c_X^(T) is uniformly bounded.

Proof of Theorem 6.6.1 Directly from the definition of A^(T)(λ), we see that

and (6.6.3) follows from the first expression of Lemma P6.7.

Proof of Theorem 6.6.2 As in the proof of Theorem 6.5.2, we have the Taylor expansions

The desired covariances now follow from these expansions and (6.6.3).


Proof of Theorem 6.6.3 From Lemma P6.8 we see

From Lemma P6.6 we see that the remainder term is

The indicated result now follows from (5.6.12).

Proof of Theorem 6.6.4 From (6.3.2) and Lemma 6.3.1 we see

Expression (6.6.13) now follows from Theorem 4.3.1.

Proof of Theorem 6.6.5 The first covariance required is

if we use the representation of Lemma P6.8 and Lemma P6.6. The second covariance follows from the representation of μ^(T) given in the proof of Theorem 6.6.4 and from Lemmas P6.6 and P6.8. The final covariance follows likewise.

Proof of Theorem 6.7.1 We prove the first part of this theorem by evaluating joint cumulants of order greater than 2 of A^(T)(λ), g_εε^(T)(λ) and proving that, when appropriately standardized, these joint cumulants tend to 0. From Lemma P6.6 we see that

and these each tend to 0 as T → ∞. The second part of the theorem follows similarly by evaluating joint cumulants.

Proof of Theorem 6.8.1 From (6.8.2) we see that

where the error terms are uniform. This gives the first part of (6.8.4); the second part follows algebraically.

Proof of Theorem 6.8.2 We begin by examining (6.6.3) and noting that


in this case. Now from (6.8.2) we see that

for p ≠ q, 1 ≤ p, q ≤ P_T − 1, because B_T ≤ P_T^(−1), and so

from (6.6.3), giving (6.8.7).

Proof of Theorem 6.8.3 This follows as did the proof of Theorem 6.8.2; however, we use (6.6.14) rather than (6.6.3).

Proof of Theorem 6.8.4 We prove that the standardized joint cumulants of order greater than 2 of the variates of the theorem tend to 0. We have

where we use Lemma P6.6 and also the remark at the end of its proof to eliminate one of the summations on p. The cumulant is seen to tend to 0 because P_T B_T → 0 as T → ∞.

PROOFS FOR CHAPTER 7

Proof of Theorem 7.2.1

and so

We also have


from the Cramér representation. It follows from this that

Finally from Parseval's formula

and we have (7.2.7).

Proof of Corollary 7.2.1 Suppose h_a(u) = 0 for u < 0. (The general case follows by writing h_a as a function vanishing for u < 0 plus a function vanishing for u ≥ 0.) Now

using the Abel transformation of Exercise 1.7.13. If V_a denotes the variation of h_a(u), we see that

At the same time

(*) and (**) show that the term in c_a c_b tends to 0 as T → ∞ if λ ≢ 0 (mod 2π) or if c_a or c_b = 0. Next consider

We split the region of integration here into the regions |α| < δ and |α| ≥ δ. In the first region, |f_ab(λ − α) − f_ab(λ)| may be made arbitrarily small by choice of δ as f_ab is continuous. Also there


In the second region, f_ab(λ − α) − f_ab(λ) is bounded and

from (*). It therefore follows from (**) that (***) tends to 0 as T → ∞.

Proof of Theorem 7.2.2 From Theorem 4.3.2

This gives the required result once we note that H_a^(T)(λ), H_b^(T)(λ) = O(T).
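The Parseval formula invoked in the proof of Theorem 7.2.1 above has a discrete counterpart for the finite Fourier transform, which can be verified numerically (an added illustration; the data vector is arbitrary):

```python
# Numerical check of the discrete Parseval relation: for
#   d(2*pi*s/T) = sum_t x[t] * exp(-2*pi*i*s*t/T),
# one has  sum_t |x[t]|^2 = (1/T) * sum_s |d(2*pi*s/T)|^2.
import cmath

x = [1.0, -2.5, 0.5, 3.0, -1.0, 0.25]
T = len(x)

def dft(x, s):
    T = len(x)
    return sum(x[t] * cmath.exp(-2j * cmath.pi * s * t / T) for t in range(T))

lhs = sum(v * v for v in x)
rhs = sum(abs(dft(x, s)) ** 2 for s in range(T)) / T
assert abs(lhs - rhs) < 1e-9
```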

Proof of Corollary 7.2.2 We simply consider in turn the cases λ ± π ≡ 0 (mod 2π) and λ ± π ≢ 0 (mod 2π).

Proof of Theorem 7.2.3 Theorem 4.4.2 indicates that d_X^(T)(λ_1), …, d_X^(T)(λ_J) are asymptotically independent N_r^C(0, 2πT[H_ab(0)f_ab(λ_j)]) variates. Theorem P5.1 now indicates that

j = 1, …, J, are asymptotically independent W_r^C(1, f_XX(λ_j)) variates. The conclusions of the theorem now follow as

Proof of Theorem 7.2.4 This follows from Theorem 4.4.1 as Theorem 7.2.3 followed from Theorem 4.4.2.

Proof of Theorem 7.2.5 This follows directly from Exercise 4.8.23 and Theorem P5.1.

Proof of Theorem 7.3.1 From Exercise 7.10.21,

for r an integer ≢ 0 (mod T). If λ ≢ 0 (mod π), this gives

Ef_XX^(T)(λ)

for (2πr/T) − λ = O(T^(−1)), gives (7.3.13) in the case λ ≢ 0 (mod π), as 2m + 1 terms of the estimates match up, while the other terms have covariance O(T^(−1)). Turning to the case λ ≡ 0 (mod π), from Exercise 7.10.22(b) and the fact that m terms match up, the covariance is given by


giving (7.3.6). If λ ≡ 0 (mod 2π), or λ = ±π, ±3π, … with T even, then

giving (7.3.7). If λ = ±π, ±3π, … with T odd, then

giving (7.3.8).

Proof of Corollary 7.3.1 As f_XX(α) is a uniformly continuous function of α, expression (*) of the above proof tends to f_XX(λ) as T → ∞ if 2πr/T → λ. This gives the indicated result.

Proof of Theorem 7.3.2 If r, s are integers with 2πr/T, 2πs/T ≢ 0 (mod 2π), Exercise 7.10.22(a) gives

This together with the fact that

and we check that this can be written in the manner (7.3.13).

Proof of Theorem 7.3.3 This theorem follows directly from Theorem 7.2.4 and Theorem P5.1.

Proof of Theorem 7.3.4 This follows directly from Theorem 7.2.1 and its corollary.

Proof of Theorem 7.3.5 The pseudo tapers

for l, m = 0, …, L − 1. The general expression of the proof of Theorem 7.2.2, with appropriate redefinition, now shows that

This now gives (7.3.18).

Proof of Theorem 7.3.6 Follows directly from Theorem 7.2.5 and Theorem P5.1.
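The distributional conclusions of Theorems 7.2.3 and 7.3.3, that periodogram ordinates at distinct frequencies behave like independent exponential (complex Wishart) variates with mean f(λ), can be illustrated by simulation. This is an added sketch; the Gaussian white-noise setting, seed, and tolerance are assumptions, not the book's:

```python
# For Gaussian white noise with variance sigma^2, the periodogram
#   I(2*pi*s/T) = |d(2*pi*s/T)|^2 / (2*pi*T)
# has approximate mean f(lambda) = sigma^2 / (2*pi) at every frequency.
import cmath, math, random

random.seed(0)
T, sigma = 512, 2.0
x = [random.gauss(0.0, sigma) for _ in range(T)]

def periodogram(x, s):
    T = len(x)
    d = sum(x[t] * cmath.exp(-2j * math.pi * s * t / T) for t in range(T))
    return abs(d) ** 2 / (2 * math.pi * T)

ords = [periodogram(x, s) for s in range(1, T // 2)]
mean_I = sum(ords) / len(ords)
f_true = sigma ** 2 / (2 * math.pi)          # flat spectrum of white noise
assert abs(mean_I - f_true) / f_true < 0.5   # sample mean near f(lambda)
```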

The second term on the right side here may be made arbitrarily small by splitting the range of summation into a segment where |(2πs/T) − λ| < δ implies |f_ab(2πs/T) − f_ab(λ)| < ε, and a remainder where Σ W^(T)(λ − (2πs/T)) tends to 0 and |f_ab(2πs/T) − f_ab(λ)| is bounded. This completes the proof of (7.4.9).
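The estimates treated in Theorems 7.4.1 to 7.4.4 are weighted averages of periodogram ordinates over neighbouring frequencies. A minimal sketch of such a smoothed estimate (the rectangular weight and white-noise input are illustrative assumptions, not the book's choices):

```python
# Sketch of a smoothed-periodogram spectral estimate of the form
#   f^(T)(lambda) = (2*pi/T) * sum_s W^(T)(lambda - 2*pi*s/T) I(2*pi*s/T),
# here with a simple rectangular weight over 2m+1 neighbouring ordinates.
import cmath, math, random

random.seed(1)
T, m, sigma = 256, 8, 1.0
x = [random.gauss(0.0, sigma) for _ in range(T)]

def periodogram(x, s):
    T = len(x)
    d = sum(x[t] * cmath.exp(-2j * math.pi * s * t / T) for t in range(T))
    return abs(d) ** 2 / (2 * math.pi * T)

def smoothed_estimate(x, s0, m):
    # average the 2m+1 periodogram ordinates nearest frequency 2*pi*s0/T
    T = len(x)
    return sum(periodogram(x, (s0 + j) % T) for j in range(-m, m + 1)) / (2 * m + 1)

est = smoothed_estimate(x, T // 4, m)
f_true = sigma ** 2 / (2 * math.pi)       # white noise: flat spectrum
assert 0.2 * f_true < est < 5 * f_true    # crude: variance ~ f^2/(2m+1)
```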


have the property

Proof of Theorem 7.4.1 From Theorem 7.2.1

for s = 1, …, T − 1. This gives the first part of (7.4.9). Beginning the proof of the second part, the right side of the above expression has the form

where o(1) is uniform in s, from Exercise 1.7.10. Using this we see


Turning to (7.4.11), from Theorem 4.3.2

with the error term uniform in s. This gives the first part of (7.4.11). The second follows from Lemma P5.1.

Proof of Theorem 7.4.2 By Taylor series expansion of f_ab(λ − B_T α) as a function of α.

Proof of Theorem 7.4.3 From expression (7.2.14)

r, s = 1, …, T − 1, with the error term uniform in r, s. This gives

giving the first part of (7.4.15). The second part follows from Lemma P5.1.

Proof of Corollary 7.4.3 This follows directly from the final part of expression (7.4.15).

Proof of Theorem 7.4.4 We have already investigated the asymptotic first- and second-order moment structure of the estimates. We will complete the proof of asymptotic joint normality by showing that all standardized joint cumulants of order greater than 2 tend to 0 as T → ∞ under the indicated conditions.

We have


In the discussion below set r_{k1} = s_k, r_{k2} = −s_k, k = 1, …, K. Also neglect the subscripts a_1, …, a_K, b_1, …, b_K as they play no essential role. From Theorems 2.3.2 and 4.3.2 it follows that the cumulant in this last expression is given by

and n_i denotes the number of elements in ν_i. The cumulant (*) therefore has the form

where the summation extends over all indecomposable partitions ν = {ν_1, …, ν_P} of the table

The effect of the Δ^(T) functions is to introduce q linear restraints if q < K and q − 1 if q = K. We write this number as q − [q/K]. (Here [ ] denotes "integral part.") It follows that (*) is of order

It follows that

is of order B_T^(−K/2+1) T^(−K/2+1) and so tends to 0 as T → ∞ for K > 2. The desired result now follows from Lemma P4.5.

Proof of Theorem 7.6.1 From Theorem 4.3.1

with the error term uniform in s. This gives the first part of (7.6.6) directly. The second part follows from Lemma P5.1. Continuing, from Theorem 4.3.2

the inner sum being over all indecomposable partitions of the table


giving expression (7.6.7).

Turning to the higher order cumulants we have, neglecting subscripts,


Taking note of the linear restrictions introduced by the Δ^(T) functions, we see that the dominant term in this cumulant is of order T^(−L+1).

Now, when the variates T^(1/2) J_ab^(T)(A_j), j = 1, …, J, a, b = 1, …, r are considered, we see that their joint cumulants of order greater than 2 all tend to 0. It now follows from Lemma P4.5 that the variates are asymptotically normal as indicated.
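The Taylor series device used next, in the proof of Theorem 7.6.2, is the familiar delta method: for smooth g, var g(Z) ≈ g′(EZ)² var Z when var Z is small. A simulation sketch under assumed inputs (g = log, exponential data; not the book's argument):

```python
# Delta-method illustration: Z is an average of n independent exponentials
# with mean mu, so var(Z) = mu^2/n, and the prediction for var(log Z) is
#   g'(mu)^2 * var(Z) = (1/mu)^2 * mu^2/n = 1/n.
import math, random

random.seed(2)
n, mu, reps = 400, 3.0, 2000
samples = []
for _ in range(reps):
    z = sum(random.expovariate(1.0 / mu) for _ in range(n)) / n
    samples.append(math.log(z))

mean_l = sum(samples) / len(samples)
var_l = sum((v - mean_l) ** 2 for v in samples) / len(samples)
delta_pred = 1.0 / n                     # delta-method variance prediction
assert abs(var_l - delta_pred) / delta_pred < 0.2
```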

Proof of Theorem 7.6.2 We use the Taylor series expansion

to derive (7.6.15) and (7.6.16) from (7.4.13) and (7.4.17), using theorems of Brillinger and Tukey (1964). The indicated asymptotic normality follows from Theorem 7.4.4 and Theorem P5.2.

Proof of Theorem 7.6.3 We have already seen in Theorem 7.6.1 that the finite dimensional distributions converge as required. We also see that

uniformly in λ, and so it is enough to consider the process Y^(T)(λ) = √T[F_XX^(T)(λ) − EF_XX^(T)(λ)], 0 ≤ λ ≤ π. We have therefore to show that the sequence of probability measures is tight. It follows from Problem 6, p. 41 of Billingsley (1968) that we need show tightness only for the marginal probability distributions. Following Theorem 15.6 of Billingsley (1968) this will be the case if

We see directly that

From the proof of Theorem 7.6.1 we see that all the second-order moments of the variates Y_ab^(T)(λ) − Y_ab^(T)(λ_1), Y_ab^(T)(λ_2) − Y_ab^(T)(λ), and their conjugates are ≤ L|λ_2 − λ_1| for some finite L. We have therefore only to consider


these domains of summation are disjoint, the cumulants on the right side are of reduced order; in fact expression (*) is seen to be

when EU, EV = 0. This gives the desired result.

Before proving Theorem 7.7.1 we remark that, as the estimate is translation invariant, we may act as if EX(t) = 0. We set down some lemmas showing that mean correction has no asymptotic effect in the case that EX(t) = 0.

Lemma P7.1 Let X(t), t = 0, ±1, … be an r vector-valued series satisfying Assumption 2.6.2(1) and having mean 0. Let h_a(u), −∞ < u < ∞, satisfy Assumption 4.3.1, a = 1, …, r. Let c_ab^(T)(u) be given by (7.7.8) and

Then

uniformly in u.

Now from Theorems 5.2.3 and 5.2.8, as c_a = 0,

Also from the arguments of those theorems


uniformly in u. It follows that

giving the desired result.

Lemma P7.2 Suppose the conditions of the theorem are satisfied. Suppose EX(t) = 0 and

then

uniformly in λ.

Proof This follows directly from Lemma P7.1 and the fact that

Proof of Theorem 7.7.1 Lemma P7.2 shows that the asymptotics of f_ab^(T)(λ) are essentially the same as those of g_ab^(T)(λ). We begin by considering Eg_ab^(T)(λ). Now

where

From Theorem 4.3.2

and so

giving (7.7.13).

Next, from Theorem 7.2.2


We next show that

uniformly in α. As

we may write (**) as

where from Lemma P4.1

for some finite H. A similar result holds for the second term of the integral. The covariance being evaluated thus has the form

and the desired (7.7.14) follows.

Finally, we consider the magnitude of the joint cumulants of order K. We neglect the subscripts a, b henceforth. We have


where the summation is over all indecomposable partitions ν = (ν_1, …, ν_P) of the table

As the partition is indecomposable, in each set ν_p of the partition we may find an element t_p*, so that none of the t_j − t_p*, j ∈ ν_p, p = 1, …, P is a t_{2l−1} − t_{2l}, l = 1, 2, …, L. Define 2L − P new variables u_1, …, u_{2L−P} as the nonzero t_j − t_p*. The cumulant (*) is now bounded by

In the next to last expression, C_n is given by (2.6.7) and n_j denotes the number of elements in the jth set of the partition ν. We see that the standardized joint cumulant

cum {(B_T T)^(1/2) g^(T)(λ_1), …, (B_T T)^(1/2) g^(T)(λ_L)}

for L > 2, tends to 0 as T → ∞. This means that the variates g_ab^(T)(λ_1), …, g_ab^(T)(λ_K) are asymptotically normal with the moment structure of the

for some finite M, where α_1, …, α_{2L} are selected from 1, …, 2L and β_1, …, β_{2L} are selected from 1, …, P. Defining φ(t_j) = t_p*, j ∈ ν_p, we apply Lemma 2.3.1 to see that there are P − 1 linearly independent differences among the t_{β_1}* − t_{β_2}*, …. For convenience suppose these are the first P − 1 such differences. Making a final change of variables

we see that the cumulant (*) is bounded by


theorem. From Lemma P7.2 the same is true of the f^(T) and we have the theorem.

Proof of Corollary 7.7.1 Immediate from (7.7.13).

Proof of Theorem 7.7.2 Follows directly from Theorem 4.5.1.

Proof of Theorem 7.7.3 We prove this theorem by means of a sequence of lemmas paralleling those used in the proof of Theorem 4.5.1. Following Lemma P7.2 it is enough to consider g_ab^(T)(λ) corresponding to the 0 mean case. In the lemmas below we use the notation

Lemma P7.3 Under the conditions of Theorem 7.7.3, for given λ, ε and α sufficiently small

Proof In the course of the proof of Theorem 7.7.1 we saw that

for some finite M. Therefore

The indicated expression now follows from (7.7.21) on taking |α| sufficiently small and the fact that, from (7.7.14),

In the discussion below let

Corollary Under the conditions of Theorem 7.7.3, for given β

for T sufficiently large.

Lemma P7.4 Let λ_r = 2πr/R, r = 0, …, R − 1, for some integer R; then

for some finite K.

From Exercise 3.10.28 the final integral here is O(α^(1/2k)). From the proof of Theorem 7.7.1, E|f_ab^(T)(α) − Ef_ab^(T)(α)|^(2k) = O(B_T^(−k)T^(−k)). This gives a corresponding bound for E[sup_λ |f_ab^(T)(λ) − Ef_ab^(T)(λ)|]^(2k). Taking k sufficiently large gives the two results of the theorem.

Proof of Theorem 7.7.5 We have


Proof We first note that because w(u) is 0 for sufficiently large |u|, g_ab^(T)(λ) is an entire function of order ≤ KB_T^(−1). The inequality of Lemma P7.4 now follows in the manner of Corollary 2.1 in Woodroofe and Van Ness (1967), using Bernstein's inequality for entire functions of finite order (see Timan (1963)).

Lemma P7.5 For T sufficiently large

for some finite N.

The proof of the theorem is now completed by developing similar lemmas for Im g_ab^(T)(λ) and applying the Borel-Cantelli lemma.

Proof of Theorem 7.7.4 Suppose w_ab(u) = 0 for |u| > 1. Then f_ab^(T)(λ) is a trigonometric polynomial of degree B_T^(−1) = n. Exercise 3.10.35(b) gives

for positive integers k. From the proof of Theorem 7.4.4

uniformly in λ. This gives

PROOFS FOR CHAPTER 8

Proof of Theorem 8.2.1 We may write


Taking k sufficiently large gives the two results of the theorem.

Proof of Theorem 7.9.1 From Lemma P6.3 and Theorem 4.4.2 we may write

where η is N_1^C(0, f_αα(λ)), the θ_j, j = 1, …, J are independent N_1^C(0, f_ββ(λ)), and the ζ_jk, j = 1, …, J, k = 1, …, K are independent N_1^C(0, f_εε(λ)). It follows that

By evaluating covariances, we see that the ζ_jk − ζ_j·, the ζ_j· − ζ_·· + θ_j − θ_·, and the ζ_·· + θ_· + η are statistically independent. This implies that the statistics of the theorem are asymptotically independent.

We have the identity

Exercise 4.8.7 applies, indicating that Σ |ζ_jk − ζ_j·|² is distributed as f_εε(λ)χ²_{2J(K−1)}/2. We also have the identity

and Exercise 4.8.7 again applies to indicate that Σ |ζ_j· − ζ_·· + θ_j − θ_·|² is distributed as [f_ββ(λ) + K^(−1)f_εε(λ)]χ²_{2(J−1)}/2. Finally, |ζ_·· + θ_· + η|² is distributed as [f_αα(λ) + J^(−1)f_ββ(λ) + J^(−1)K^(−1)f_εε(λ)]χ²_2/2. This completes the proof of the theorem.
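The sum-of-squares identities used in this proof are exact algebraic decompositions; the two-way version can be checked numerically for complex data (an added illustration; the data are arbitrary):

```python
# Two-way sum-of-squares identity: with z.. the grand mean and z_j. the
# row means of complex numbers z_jk,
#   sum_jk |z_jk|^2
#     = sum_jk |z_jk - z_j.|^2 + K*sum_j |z_j. - z..|^2 + J*K*|z..|^2
J, K = 3, 4
z = [[complex(j + 0.5 * k, j - k + 0.25) for k in range(K)] for j in range(J)]

row = [sum(z[j]) / K for j in range(J)]   # z_j.
grand = sum(row) / J                      # z..

total = sum(abs(z[j][k]) ** 2 for j in range(J) for k in range(K))
within = sum(abs(z[j][k] - row[j]) ** 2 for j in range(J) for k in range(K))
between = K * sum(abs(row[j] - grand) ** 2 for j in range(J))
mean_sq = J * K * abs(grand) ** 2
assert abs(total - (within + between + mean_sq)) < 1e-9
```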


with equality achieved by the choices (8.2.14) and (8.2.15).

Before proving Theorem 8.2.2, we state a lemma of independent interest.

Lemma P8.1 Suppose the conditions of Theorem 8.2.1 are satisfied. The s vector-valued function φ(λ), with Eφ(λ)^τ φ̄(λ) < ∞, minimizing

is given by the conditional expected value

Proof We may write (*) as

with equality achieved by the indicated φ(λ).

Proof of Theorem 8.2.2 If the variate (8.2.10) is normal, it is a classical result, given in Anderson (1957) for example, that

and the theorem follows from Lemma P8.1.

Proof of Theorem 8.2.3 We prove Theorem 8.2.5 below; this theorem follows in a similar manner.

Proof of Theorem 8.2.4 This follows directly, as did the proof of Theorem 8.2.1.

Proof of Theorem 8.2.5 Let x, y denote the matrices (8.2.25) and (8.2.26) respectively. We may write

with a = Σ_YX Σ_XX^(−1) and e = y − ax. The columns of e are independent N_s^C(0, Σ_εε) variates. Also e is independent of x.
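The least squares estimate â = y xᵀ(x xᵀ)⁻¹ used in this proof satisfies the normal equations, so the residuals are orthogonal to the regressors. A small real-valued sketch (illustrative data, with the 2 × 2 inverse computed by hand; not the book's notation):

```python
# Verify the normal equations (y - a_hat x) x^T = 0 for least squares.
x = [[1.0, 2.0, 3.0, 4.0],
     [1.0, -1.0, 1.0, -1.0]]          # 2 x 4 regressor matrix
y = [[2.0, 3.0, 7.0, 6.0]]            # 1 x 4 response matrix

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

xxT = matmul(x, transpose(x))
det = xxT[0][0] * xxT[1][1] - xxT[0][1] * xxT[1][0]
inv = [[xxT[1][1] / det, -xxT[0][1] / det],
       [-xxT[1][0] / det, xxT[0][0] / det]]
a_hat = matmul(matmul(y, transpose(x)), inv)      # 1 x 2 coefficients

fitted = matmul(a_hat, x)
resid = [y[0][j] - fitted[0][j] for j in range(4)]
for row in x:                                     # residuals orthogonal to x
    assert abs(sum(resid[j] * row[j] for j in range(4))) < 1e-9
```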

For fixed x it therefore follows from Exercise 6.12.20 that vec(â − a) is distributed as N_rs^C(0, Σ_εε ⊗ (x x^τ)^(−1)) and Σ̂_εε is independently (n − r)^(−1) W_s^C(n − r, Σ_εε). For fixed x the distribution of (8.2.53) is therefore as indicated.

As this distribution does not depend on x, it is also the unconditional distribution. Next E{â | x} = a and so Eâ = a as desired. Also


As E(x x^τ)^(−1) = (n − r)^(−1)Σ_XX^(−1) (see Exercise 8.16.47) and the conditional covariance term vanishes, we have (8.2.54). The asymptotic normality of â follows from the joint asymptotic normality of the entries of y x^τ and x x^τ and the fact that â is a differentiable function of those entries, using Theorem P5.2.

It remains to demonstrate the independence of â and Σ̂_εε. In terms of probability density functions we may write

and the proof is completed.

Proof of Theorem 8.3.1 Let A(λ) be the transfer function of {a(u)}. We shall see that it is well defined. We may write expression (8.3.2) in the form

with equality under the choices (8.3.3) and (8.3.5).

The fact that A(λ) given by (8.3.5) is the Fourier transform of an absolutely summable function follows from Theorem 3.8.3 and the fact that f_XX(λ) is nonsingular, −∞ < λ < ∞.

Proof of Theorem 8.3.2 We have seen that we can write

with Eε(t) = 0, cov{X(t + u), ε(t)} = 0 for all u. Because the series are jointly normal this 0 covariance implies that X(t + u) and ε(t) are statistically independent for all u. We have, therefore,

giving the required (8.3.21) and (8.3.22).

Proof of Theorem 8.5.1 This follows from Theorem 7.3.3 and Theorem P5.1.

Proof of Theorem 8.6.1 Under the indicated assumptions it follows from Theorem 7.4.1 that

Covariances, to first asymptotic order, coming out of the perturbation expansions, will therefore be the same as those based on the variate (*). From Theorem 8.2.5 we can now say that

as the limiting distribution of Theorem 8.2.5 is complex normal. Also here

here. From Theorem 7.4.3 we can say that

here.

In the case that λ + μ ≡ 0 (mod 2π) and λ ≢ 0 (mod 2π) we can say


The statistics A^(T), φ_jk^(T), G_jk^(T), R_jk·X^(T), |R_YX^(T)|² are each differentiable functions of f_XX^(T)(λ), f_XY^(T)(λ), f_YY^(T)(λ). The indicated expressions now follow from a theorem of Brillinger and Tukey (1964).

Proof of Corollary 8.6.1 This follows directly from the expressions (8.6.11) to (8.6.15) and the convergence theorem of Exercise 1.7.4.

Proof of Theorem 8.7.1 A^(T)(λ), g_εε^(T)(λ), R_YX^(T)(λ), |R_YX^(T)(λ)|² are all differentiable functions of the entries of f_XX^(T)(λ), f_YX^(T)(λ), f_YY^(T)(λ) and so perturbation expansions such as

may be set down and used with Theorem 7.4.3 to deduce the indicated asymptotic covariances. In fact it is much more convenient to take advantage of the results of Section 8.2 to deduce the form of the covariances.

We begin by noting, from Corollary 7.4.3, that the covariances of variates at frequencies λ, μ are o(B_T^(−1)T^(−1)) unless λ − μ ≡ 0 or λ + μ ≡ 0 (mod 2π).

Suppose λ − μ ≡ 0 (mod 2π) and λ ≢ 0 (mod 2π). The asymptotic covariance structure of

is seen to be the same as that of

where


In the case that λ, μ ≡ 0 (mod 2π), the statistics are real-valued and we must make use of Theorem 8.2.3 instead. We see that here

This completes the development of expressions (8.7.1) and (8.7.2). Expressions (8.7.3) and (8.7.4) follow from Theorems 8.2.5 and 7.6.2.

Proof of Theorem 8.8.1 This follows from the remarks made at the beginning of the proof of Theorem 8.7.1, Theorem 7.4.4 and Theorem P5.2.

The asymptotic independence of A^(T) and g_εε^(T) follows from their negligible covariance indicated in Theorem 8.2.5.

Before proving Theorem 8.10.1 it will be convenient to set down some notation and a lemma. If λ_p = 2πp/P_T, p = 0, …, P_T − 1, we define

We can now state

Lemma P8.1 Under the conditions of Theorem 8.10.1,

for any δ > 0.

Proof We have the identity

The norm of the right side here is bounded by

and is O_p(P_T B_T^(−1/2) T^(−1/2)) uniformly in p. This gives the lemma.

for any ε > 0. It follows that

if γ ≠ β with γ ≠ 0.

From Theorem 7.7.5

will be made up of two parts, a term involving only second-order spectraand a term involving fourth-order spectra.

From our investigation of A^(T)(λ), we can say that the contribution of the term in second-order spectra to cov{vec ĉ_p, vec ĉ_q} is asymptotically


Proof of Theorem 8.10.1  We must investigate the asymptotic behavior of the

We begin by noting that E vec ζ_p = 0. Next, because P_T B_T ≤ 1 and W(α) vanishes for |α| > π, Exercise 7.10.41 takes the form

It follows that the covariance matrix of the variate

Suppose we denote the term in cov {vec ζ_p, vec ζ_q} that involves fourth-order spectra by (2π/T)V_pq. Because of the model Y(t) = μ + Σ_u a(t − u)X(u) + ε(t), with the series ε(t) independent of the series X(t), the corresponding terms in cov {vec α_p, vec β_q} and cov {vec β_p, vec α_q} will be

It follows that their contribution to cov {vec ζ_p, vec ζ_q} will be

as A(λ_p) ≈ A_p B_p^{-1}. We may deduce from all this that


Exercise 7.10.42 may next be invoked to conclude that P_T^{-1} Σ_p exp {iλ_p u} vec ζ_p is asymptotically normal. Putting this together we have the desired result.

Proof of Theorem 8.10.2 By substitution

therefore

This last may be rewritten

From Exercise 7.10.36, c_εX^(T)(0) is asymptotically normal with mean 0 and

and so

This gives the indicated asymptotic distribution for vec a^(T) as c_XX^(T)(0)^{-1} tends to c_XX(0)^{-1} in probability. Because EX(t) = 0, c_X^(T) = o_p(1) and (*) shows that √T(μ^(T) − μ) = √T c_ε^(T) + o_p(1), giving the indicated limiting distribution for μ^(T) from Theorem 4.4.1. The asymptotic independence of a^(T) and μ^(T) follows from the asymptotic independence of c_ε^(T) and c_εX^(T)(0):

Continuing


It follows that

PROOFS FOR CHAPTER 9

Proof of Theorem 9.2.1  We prove Theorem 9.2.3 below; Theorem 9.2.1 follows in a similar manner.

Proof of Theorem 9.2.2  We prove Theorem 9.2.4 below; Theorem 9.2.2 follows in a similar manner.

Proof of Theorem 9.2.3  We prove that the jth latent root of (9.2.17) is ≥ μ_{j+q}, with equality achieved by the indicated μ, B, C.

In view of the previously determined asymptotic distributions of f_εX^(T)(λ), f_Xε^(T)(λ) we have from the last expression

giving the indicated asymptotic distribution for f_εε^(T)(λ).

Before proving Theorem 8.11.1 we first set down a lemma.

Lemma P8.2  Let X_T, T = 1, 2, …, be a sequence of vector-valued random variables, γ a constant vector and a_T, T = 1, 2, …, a sequence of constants tending to 0 with T. Suppose

with probability 1. Let f(x) have a continuous first derivative in a neighborhood of γ with |f′(γ)| ≠ 0. Then

with probability 1.

Proof  With probability 1, X_T will be in the indicated neighborhood of γ for all large T. Take it as being there. Next, because f(x) has a first derivative, we have

for some ζ in the neighborhood. Because of the continuity of f′(x), f′(ζ) becomes arbitrarily close to f′(γ) as T → ∞ and we have the indicated result.
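The expansion used in this argument is the elementary mean value form, sketched here in the scalar case with ζ a point between X_T and γ:

```latex
% Mean value expansion behind Lemma P8.2 (scalar case):
% \zeta lies on the segment joining X_T and \gamma.
f(X_T) - f(\gamma) = (X_T - \gamma)\, f'(\zeta),
\qquad\text{so}\qquad
a_T^{-1}\bigl[f(X_T) - f(\gamma)\bigr] = f'(\zeta)\, a_T^{-1}(X_T - \gamma).
```

The continuity of f′ then lets f′(ζ) be replaced by f′(γ) in the limit, which is the content of the lemma.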

Proof of Theorem 8.11.1  The theorem follows from Lemma P8.2, Theorem 7.7.3, and Theorem 7.4.2.


From Theorem 8.2.4

because the matrix has rank q + j − 1 at most. We quickly check that

the indicated μ, B, C lead to a matrix (9.2.17) of the form (9.2.21). Now equality in the above inequalities is achieved by the indicated choices because the jth latent root of (9.2.21) is μ_{q+j}.

We have here presented a complex version of the arguments of Okamoto and Kanazawa (1968).

Proof of Theorem 9.2.4 We have the Taylor series expansions

See Wilkinson (1965) p. 68. We see that the estimate of Σ_XX is asymptotically normal with mean Σ_XX and

This implies the useful result of Exercise 4.8.36(b)

for r vectors α, β, γ, δ. The indicated asymptotic moments now follow

where

The matrix D has rank ≤ q. Now

where L is (i − 1) × r. This is


directly from (*) and (**) using these expressions. For example,

We see that the second term is minimized if we minimize

where V_j(λ) is the jth latent vector of f_XX(λ)^{1/2} and a fortiori of f_XX(λ). The indicated B(λ), C(λ) are now seen to achieve the desired minimization.

Proof of Theorem 9.3.2  The cross-spectrum of ζ_j(t) with ζ_k(t) is given by

giving the indicated results.

Proof of Theorem 9.3.3  Because the latent roots of f_XX(λ) are simple for all λ, its latent roots and vectors will be real holomorphic functions of its entries; see Exercises 3.10.19 to 3.10.21.

giving the indicated covariances because, as the V_j are latent vectors,

The asymptotic normality follows from the asymptotic normality of f_XX^(T) and Theorem P5.2.

Proof of Theorem 9.3.1  We may write (9.3.3) as

where A(a) = C(a)B(a). We may make the first term 0 by setting

for each a with A(a) of rank ≤ q. From Theorem 3.7.4 we see that we should take


Expressions (9.3.29) and (9.3.30) now follow from Theorem 3.8.3. Expressions (9.3.31) and (9.3.32) follow directly from these and from expression (9.3.28).

Proof of Theorem 9.3.4  The desired B_j(λ) must be some linear combination of the V_k(λ)^τ, k = 1, …, say

expression (9.4.5) now follows. Expressions (9.4.6) and (9.4.7) result from the following Taylor series expansions set down in the course of the proof of Theorem 9.2.4:

The desired series is orthogonal to ζ_k(t), k < j, and so it must have G_{jk}(λ) = 0 for k < j. The variance of (9.3.33) may be written

with Σ_k |G_{jk}(λ)|² = 1. This variance is clearly maximized by taking

and we have the result.

Proof of Theorem 9.3.5  The spectral density matrix of (9.3.35) is given by

where A(λ) = C(λ)B(λ). We see from Theorem 9.2.3 that the latent roots of the latter are minimized by the indicated B(λ), C(λ).

Proof of Theorem 9.4.1  From the Wielandt-Hoffman theorem (see Wilkinson (1965))
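The Wielandt-Hoffman inequality, stated here for reference for Hermitian matrices A and B with ordered latent roots α_1 ≥ … ≥ α_r and β_1 ≥ … ≥ β_r, reads:

```latex
% Wielandt-Hoffman: perturbing a Hermitian matrix moves its ordered
% latent roots by no more than the Frobenius norm of the perturbation.
\sum_{j=1}^{r} (\alpha_j - \beta_j)^2
  \;\le\; \operatorname{tr}\bigl\{(A - B)(A - B)^{*}\bigr\}.
```

Taking A to be the estimated spectral density matrix and B its population counterpart bounds the deviations of the estimated latent roots by quantities whose asymptotic behavior is supplied by Theorems 7.4.1 and 7.4.3.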

Also from Theorems 7.4.1 and 7.4.3

As


Proof of Theorem 9.4.2  Expressions (9.4.13) and (9.4.14) follow from the following expressions given in the proof of Theorem 9.2.4:

under the given conditions.

Proof of Theorem 9.4.3  This follows from the expressions of the proof of Theorem 9.4.1 in the manner of the proof of Theorem 9.2.4.

Proof of Theorem 9.4.4  The latent roots and vectors of a matrix are continuous functions of its entries. This theorem consequently follows from Theorem 7.3.3 and Theorem P5.1.

The minimum achieved is seen to be as stated.


and the result (7.4.13)

PROOFS FOR CHAPTER 10

Proof of Theorem 10.2.1 Let A = CB, and write (10.2.5) as

From Theorem 3.7.4 this is minimized by setting

or


with U^τU = I. Now, the latent roots that appear are maximized by taking the columns of U to be the first q latent vectors of Σ_YY^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Σ_YY^{-1/2}; see Bellman (1960) p. 117. The theorem follows directly.

Proof of Theorem 10.2.3  This follows as does the proof of Theorem 10.2.6 given below.

Proof of Theorem 10.2.4  This follows as did the proof of Theorem 10.2.1.

Proof of Theorem 10.2.5  This follows as did the proof of Theorem 10.2.2.

Proof of Theorem 10.2.6  Let Δ_XX denote the difference between the estimate of Σ_XX and Σ_XX itself, with similar definitions for Δ_XY, Δ_YY. Proceeding in the manner of Wilkinson (1965) p. 68 or Dempster (1966) we have the expansions


Proof of Theorem 10.2.2  First take E as fixed. Then Theorem 10.2.1 indicates that the minimum with respect to μ and D is

Let U = E Σ_XX^{1/2}; then write

where

and

Using the expression developed in the course of the proof of Theorem 9.2.4 we see that

if j = k, l = m, and equals 0 otherwise. Similarly

if j = m, l = k, and equals 0 otherwise.


Continuing

if j = m, l = k, and so on. The expansions above and these moments now give the indicated first- and second-order asymptotic moments. The asymptotic normality follows from the asymptotic normality of the estimates of Σ_XX, Σ_XY, and Σ_YY and the fact that the latent roots and vectors are differentiable functions of these matrices through Theorem P5.2.

Proof of Theorem 10.3.1 The expression (10.3.3) may be written

and we see that we should choose μ so that EY(t) = EY*(t). Now

It therefore follows from Corollary 3.7.4 that expression (10.3.3) is minimized by the indicated B(a) and C(a).

Proof of Corollary 10.3.1  This result follows from an application of Theorem 10.3.1 to the transformed variate

noting, for example, that

for this series.

Proof of Theorem 10.3.2  We are interested in the coherence

having defined


for B_j(λ) orthogonal to V_1(λ), …, V_{j−1}(λ), the first j − 1 latent vectors of f_YY^{-1/2} f_YX f_XX^{-1} f_XY f_YY^{-1/2}, by Exercise 3.10.26. Expression (10.3.25) indicates that B_j(λ) is as indicated in the theorem; that A_j(λ) achieves equality follows by inspection.

Proof of Theorem 10.3.3  Because the latent roots of f_YX f_XX^{-1} f_XY are simple for all λ, its latent roots and vectors are real holomorphic functions of the entries; see Exercises 3.10.19 to 3.10.21. Expressions (10.3.28) and (10.3.29) now follow from Theorem 3.8.3. Expression (10.3.30) follows from (10.3.26) to (10.3.29).

Proof of Theorem 10.3.4  Because the latent roots of f_YY^{-1/2} f_YX f_XX^{-1} f_XY f_YY^{-1/2} are simple for all λ, its latent roots and vectors are real holomorphic functions of its entries; see Exercises 3.10.19 to 3.10.21. Expressions (10.3.33) and (10.3.34) now follow from Theorem 3.8.3. That the spectral density is (10.3.36) either follows from Theorem 10.3.1 or by direct computation.

Proof of Theorem 10.4.1  This follows as did the proof of Theorem 9.4.1 with the exception that the perturbation expansions of the proof of Theorem 10.2.6 are now used.

Proof of Theorem 10.4.2  This follows from the above perturbation expansions in the manner of the proof of Theorem 10.2.6.

Proof of Theorem 10.4.3  The μ_j, A_j, and B_j are continuous functions of the entries of (10.4.25). The theorem consequently follows from Theorem 7.3.3 and Theorem P5.1.

By Schwarz's inequality the coherency is
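The step being taken is the standard application of the Schwarz inequality to the cross-spectrum; in the notation of the earlier chapters it may be sketched as:

```latex
% Schwarz inequality bounds the magnitude of the coherency by 1:
|R_{YX}(\lambda)|^{2}
  = \frac{|f_{YX}(\lambda)|^{2}}{f_{YY}(\lambda)\, f_{XX}(\lambda)}
  \;\le\; 1.
```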


REFERENCES

ABELSON, R. (1953). Spectral analysis and the study of individual differences. Ph.D. Thesis, Princeton University.
ABRAMOWITZ, M., and STEGUN, I. A. (1964). Handbook of Mathematical Functions. Washington: National Bureau of Standards.
ACZÉL, J. (1969). On Applications and Theory of Functional Equations. Basel: Birkhäuser.
AITKEN, A. C. (1954). Determinants and Matrices. London: Oliver and Boyd.
AKAIKE, H. (1960). "Effect of timing-error on the power spectrum of sampled data." Ann. Inst. Statist. Math. 11:145-165.
AKAIKE, H. (1962a). "Undamped oscillation of the sample autocovariance function and the effect of prewhitening operation." Ann. Inst. Statist. Math. 13:127-144.
AKAIKE, H. (1962b). "On the design of lag windows for the estimation of spectra." Ann. Inst. Statist. Math. 14:1-21.
AKAIKE, H. (1964). "Statistical measurement of frequency response function." Ann. Inst. Statist. Math., Supp. III. 15:5-17.
AKAIKE, H. (1965). "On the statistical estimation of the frequency response function of a system having multiple input." Ann. Inst. Statist. Math. 17:185-210.
AKAIKE, H. (1966). "On the use of a non-Gaussian process in the identification of a linear dynamic system." Ann. Inst. Statist. Math. 18:269-276.
AKAIKE, H. (1968a). "Low pass filter design." Ann. Inst. Statist. Math. 20:271-298.
AKAIKE, H. (1968b). "On the use of an index of bias in the estimation of power spectra." Ann. Inst. Statist. Math. 20:55-69.
AKAIKE, H. (1969a). "A method of statistical investigation of discrete time parameter linear systems." Ann. Inst. Statist. Math. 21:225-242.
AKAIKE, H. (1969b). "Fitting autoregressive models for prediction." Ann. Inst. Statist. Math. 21:243-247.


AKAIKE, H., and KANESHIGE, I. (1964). "An analysis of statistical response of backlash." Ann. Inst. Statist. Math., Supp. III. 15:99-102.
AKAIKE, H., and YAMANOUCHI, Y. (1962). "On the statistical estimation of frequency response function." Ann. Inst. Statist. Math. 14:23-56.
AKCASU, A. Z. (1961). "Measurement of noise power spectra by Fourier analysis." J. Appl. Physics. 32:565-568.
AKHIEZER, N. I. (1956). Theory of Approximation. New York: Ungar.
ALBERT, A. (1964). "On estimating the frequency of a sinusoid in the presence of noise." Ann. Math. Statist. 35:1403.
ALBERTS, W. W., WRIGHT, L. E., and FEINSTEIN, B. (1965). "Physiological mechanisms of tremor and rigidity in Parkinsonism." Confin. Neurol. 26:318-327.
ALEXANDER, M. J., and VOK, C. A. (1963). Tables of the cumulative distribution of sample multiple coherence. Res. Rep. 63-67. Rocketdyne Division, North American Aviation Inc.
AMOS, D. E., and KOOPMANS, L. H. (1962). Tables of the distribution of the coefficient of coherence for stationary bivariate Gaussian processes. Sandia Corporation Monograph SCR-483.
ANDERSON, G. A. (1965). "An asymptotic expansion for the distribution of the latent roots of the estimated covariance matrix." Ann. Math. Statist. 36:1153-1173.
ANDERSON, T. W. (1957). An Introduction to Multivariate Statistical Analysis. New York: Wiley.
ANDERSON, T. W. (1963). "Asymptotic theory for principal component analysis." Ann. Math. Statist. 34:122-148.
ANDERSON, T. W. (1971). Statistical Analysis of Time Series. New York: Wiley.
ANDERSON, T. W., and WALKER, A. M. (1964). "On the asymptotic distribution of the autocorrelations of a sample from a linear stochastic process." Ann. Math. Statist. 35:1296-1303.
ARATO, M. (1961). "Sufficient statistics of stationary Gaussian processes." Theory Prob. Appl. 6:199-201.
ARENS, R., and CALDERÓN, A. P. (1955). "Analytic functions of several Banach algebra elements." Ann. Math. 62:204-216.
ASCHOFF, J. (1965). Circadian Clocks. Amsterdam: North Holland.
AUTONNE, L. (1915). "Sur les matrices hypohermitiennes et sur les matrices unitaires." Ann. Univ. Lyon. 38:1-77.
BALAKRISHNAN, A. V. (1964). "A general theory of nonlinear estimation problems in control systems." J. Math. Anal. App. 8:4-30.
BARLOW, J. S. (1967). "Correlation analysis of EEG-tremor relationships in man." In Recent Advances in Clinical Neurophysiology, Electroenceph. Clin. Neurophysiol., Suppl. 25:167-177.
BARTLETT, M. S. (1946). "On the theoretical specification of sampling properties of auto-correlated time series." J. Roy. Statist. Soc., Suppl. 8:27-41.
BARTLETT, M. S. (1948a). "A note on the statistical estimation of supply and demand relations from time series." Econometrica. 16:323-329.
BARTLETT, M. S. (1948b). "Smoothing periodograms from time series with continuous spectra." Nature. 161:686-687.


BARTLETT, M. S. (1950). "Periodogram analysis and continuous spectra." Biometrika. 37:1-16.
BARTLETT, M. S. (1966). An Introduction to Stochastic Processes, 2nd ed. Cambridge: Cambridge Univ. Press.
BARTLETT, M. S. (1967). "Some remarks on the analysis of time series." Biometrika. 50:25-38.
BASS, J. (1962a). "Transformées de Fourier des fonctions pseudo-aléatoires." C. R. Acad. Sci. 254:3072.
BASS, J. (1962b). Les Fonctions Pseudo-aléatoires. Paris: Gauthier-Villars.
BATCHELOR, G. K. (1960). The Theory of Homogeneous Turbulence. Cambridge: Cambridge Univ. Press.
BAXTER, G. (1963). "A norm inequality for a finite section Wiener-Hopf equation." Ill. J. Math. 7:97-103.
BELLMAN, R. (1960). Introduction to Matrix Analysis. New York: McGraw-Hill.
BENDAT, J. S., and PIERSOL, A. (1966). Measurement and Analysis of Random Data. New York: Wiley.
BERANEK, L. L. (1954). Acoustics. New York: McGraw-Hill.
BERGLAND, G. D. (1967). "The fast Fourier transform recursive equations for arbitrary length records." Math. Comp. 21:236-238.
BERNSTEIN, S. (1938). "Équations différentielles stochastiques." Act. Sci. Ind. 738:5-31.
BERTRAND, J., and LACAPE, R. S. (1943). Théorie de l'Électro-encéphalogramme. Paris: G. Doin.
BERTRANDIAS, J. B. (1960). "Sur le produit de deux fonctions pseudo-aléatoires." C. R. Acad. Sci. 250:263.
BERTRANDIAS, J. B. (1961). "Sur l'analyse harmonique généralisée des fonctions pseudo-aléatoires." C. R. Acad. Sci. 253:2829.
BEVERIDGE, W. H. (1921). "Weather and harvest cycles." Econ. J. 31:429.
BEVERIDGE, W. H. (1922). "Wheat prices and rainfall in Western Europe." J. Roy. Statist. Soc. 85:412-459.
BILLINGSLEY, P. (1965). Ergodic Theory and Information. New York: Wiley.
BILLINGSLEY, P. (1966). "Convergence of types in k-space." Zeit. Wahrschein. 5:175-179.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. New York: Wiley.
BINGHAM, C., GODFREY, M. D., and TUKEY, J. W. (1967). "Modern techniques in power spectrum estimation." IEEE Trans. Audio Electroacoust. AU-15:56-66.
BLACKMAN, R. B. (1965). Linear Data Smoothing and Prediction in Theory and Practice. Reading, Mass.: Addison-Wesley.
BLACKMAN, R. B., and TUKEY, J. W. (1958). "The measurement of power spectra from the point of view of communications engineering." Bell Syst. Tech. J. 37:183-282, 485-569.
BLANC-LAPIERRE, A., and FORTET, R. (1953). Théorie des Fonctions Aléatoires. Paris: Masson.
BLANC-LAPIERRE, A., and FORTET, R. (1965). Theory of Random Functions. New York: Gordon and Breach. Translation of 1953 French edition.


BOCHNER, S. (1936). "Summation of multiple Fourier series by spherical means." Trans. Amer. Math. Soc. 40:175-207.
BOCHNER, S. (1959). Lectures on Fourier Integrals. Princeton: Princeton Univ. Press.
BOCHNER, S., and MARTIN, W. T. (1948). Several Complex Variables. Princeton: Princeton Univ. Press.
BODE, H. W. (1945). Network Analysis and Feedback Amplifier Design. New York: Van Nostrand.
BOHMAN, H. (1960). "Approximate Fourier analysis of distribution functions." Ark. Mat. 4:99-157.
BORN, M., and WOLF, E. (1959). Principles of Optics. London: Pergamon.
BOWLEY, A. L. (1920). Elements of Statistics. London: King.
BOX, G. E. P. (1954). "Some theorems on quadratic forms applied in the study of analysis of variance problems." Ann. Math. Statist. 25:290-302.
BOX, G. E. P., and JENKINS, G. M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.
BRACEWELL, R. (1965). The Fourier Transform and its Applications. New York: McGraw-Hill.
BRENNER, J. L. (1961). "Expanded matrices from matrices with complex elements." SIAM Review. 3:165-166.
BRIGHAM, E. O., and MORROW, R. E. (1967). "The fast Fourier transform." IEEE Spectrum. 4:63-70.
BRILLINGER, D. R. (1964a). "The generalization of the techniques of factor analysis, canonical correlation and principal components to stationary time series." Invited paper at Royal Statistical Society Conference in Cardiff, Wales. Sept. 29-Oct. 1.
BRILLINGER, D. R. (1964b). "A technique for estimating the spectral density matrix of two signals." Proc. I.E.E.E. 52:103-104.
BRILLINGER, D. R. (1964c). "The asymptotic behavior of Tukey's general method of setting approximate confidence limits (the jackknife) when applied to maximum likelihood estimates." Rev. Inter. Statist. Inst. 32:202-206.
BRILLINGER, D. R. (1965a). "A property of low-pass filters." SIAM Review. 7:65-67.
BRILLINGER, D. R. (1965b). "An introduction to polyspectra." Ann. Math. Statist. 36:1351-1374.
BRILLINGER, D. R. (1966a). "An extremal property of the conditional expectation." Biometrika. 53:594-595.
BRILLINGER, D. R. (1966b). "The application of the jackknife to the analysis of sample surveys." Commentary. 8:74-80.
BRILLINGER, D. R. (1968). "Estimation of the cross-spectrum of a stationary bivariate Gaussian process from its zeros." J. Roy. Statist. Soc., B. 30:145-159.
BRILLINGER, D. R. (1969a). "A search for a relationship between monthly sunspot numbers and certain climatic series." Bull. ISI. 43:293-306.
BRILLINGER, D. R. (1969b). "The calculation of cumulants via conditioning." Ann. Inst. Statist. Math. 21:215-218.
BRILLINGER, D. R. (1969c). "Asymptotic properties of spectral estimates of second-order." Biometrika. 56:375-390.


BRILLINGER, D. R. (1969d). "The canonical analysis of stationary time series." In Multivariate Analysis-II, Ed. P. R. Krishnaiah, pp. 331-350. New York: Academic.
BRILLINGER, D. R. (1970a). "The identification of polynomial systems by means of higher order spectra." J. Sound Vib. 12:301-313.
BRILLINGER, D. R. (1970b). "The frequency analysis of relations between stationary spatial series." Proc. Twelfth Bien. Sem. Canadian Math. Congr., Ed. R. Pyke, pp. 39-81. Montreal: Can. Math. Congr.
BRILLINGER, D. R. (1972). "The spectral analysis of stationary interval functions." In Proc. Seventh Berkeley Symp. Prob. Statist., Eds. L. LeCam, J. Neyman, and E. L. Scott, pp. 483-513. Berkeley: Univ. of California Press.
BRILLINGER, D. R. (1973). "The analysis of time series collected in an experimental design." In Multivariate Analysis-III, Ed. P. R. Krishnaiah, pp. 241-256. New York: Academic.
BRILLINGER, D. R., and HATANAKA, M. (1969). "An harmonic analysis of nonstationary multivariate economic processes." Econometrica. 35:131-141.
BRILLINGER, D. R., and HATANAKA, M. (1970). "A permanent income hypothesis relating to the aggregate demand for money (an application of spectral and moving spectral analysis)." Economic Studies Quart. 21:44-71.
BRILLINGER, D. R., and ROSENBLATT, M. (1967a). "Asymptotic theory of k-th order spectra." In Spectral Analysis of Time Series, Ed. B. Harris, pp. 153-188. New York: Wiley.
BRILLINGER, D. R., and ROSENBLATT, M. (1967b). "Computation and interpretation of k-th order spectra." In Spectral Analysis of Time Series, Ed. B. Harris, pp. 189-232. New York: Wiley.
BRILLINGER, D. R., and TUKEY, J. W. (1964). Asymptotic variances, moments, cumulants and other average values. Unpublished manuscript.
BRYSON, R. A., and DUTTON, J. A. (1961). "Some aspects of the variance spectra of tree rings and varves." Ann. New York Acad. Sci. 95:580-604.
BULLARD, E. (1966). "The detection of underground explosions." Sci. Am. 215:19.
BUNIMOVITCH, V. I. (1949). "The fluctuation process as a vibration with random amplitude and phase." J. Tech. Phys. (USSR) 19:1237-1259.
BURGERS, J. M. (1948). "Spectral analysis of an irregular function." Proc. Acad. Sci. Amsterdam. 51:1073.
BURKHARDT, H. (1904). "Trigonometrische Reihen und Integrale." Enzykl. Math. Wiss. 2:825-1354.
BURLEY, S. P. (1969). "A spectral analysis of the Australian business cycle." Austral. Econ. Papers. 8:193-128.
BUSINGER, P. A., and GOLUB, G. H. (1969). "Singular value decomposition of a complex matrix." Comm. ACM. 12:564-565.
BUTZER, P. L., and NESSEL, R. J. (1971). Fourier Analysis and Approximations, Vol. 1. New York: Academic.
CAIRNS, T. W. (1971). "On the fast Fourier transform on a finite Abelian group." IEEE Trans. Computers. C-20:569-571.
CAPON, J. (1969). "High resolution frequency wavenumber spectral analysis." Proc. I.E.E.E. 57:1408-1418.


CAPON, J., and GOODMAN, N. R. (1970). "Probability distributions for estimators of the frequency wavenumber spectrum." Proc. I.E.E.E. 58:1785-1786.
CARGO, G. T. (1966). "Some extensions of the integral test." Amer. Math. Monthly. 73:521-525.
CARPENTER, E. W. (1965). "Explosions seismology." Science. 147:363-373.
CARTWRIGHT, D. E. (1967). "Time series analysis of tides and similar motions of the sea surface." J. Appl. Prob. 4:103-112.
CHAMBERS, J. M. (1966). Some methods of asymptotic approximation in multivariate statistical analysis. Ph.D. Thesis, Harvard University.
CHAMBERS, J. M. (1967). "On methods of asymptotic approximation for multivariate distributions." Biometrika. 54:367-384.
CHANCE, B., PYE, K., and HIGGINS, J. (1967). "Waveform generation by enzymatic oscillators." IEEE Spectrum. 4:79-86.
CHAPMAN, S., and BARTELS, J. (1951). Geomagnetism, Vol. 2. Oxford: Oxford Univ. Press.
CHERNOFF, H., and LIEBERMAN, G. J. (1954). "Use of normal probability paper." J. Amer. Statist. Assoc. 49:778-785.
CHOKSI, J. R. (1966). "Unitary operators induced by measure preserving transformations." J. Math. and Mech. 16:83-100.
CHOW, G. C. (1966). "A theorem on least squares and vector correlation in multivariate linear regression." J. Amer. Statist. Assoc. 61:413-414.
CLEVENSON, M. L. (1970). Asymptotically efficient estimates of the parameters of a moving average time series. Ph.D. Thesis, Stanford University.
CONDIT, H. R., and GRUM, F. (1964). "Spectral energy distribution of daylight." J. Optical Soc. Amer. 54:937-944.
CONSTANTINE, A. G. (1963). "Some noncentral distributions in multivariate analysis." Ann. Math. Statist. 34:1270-1285.
COOLEY, J. W., LEWIS, P. A. W., and WELCH, P. D. (1967a). "Historical notes on the fast Fourier transform." IEEE Trans. Audio Electroacoust. AU-15:76-79.
COOLEY, J. W., LEWIS, P. A. W., and WELCH, P. D. (1967b). The fast Fourier transform algorithm and its applications. IBM Memorandum RC 1743.
COOLEY, J. W., LEWIS, P. A. W., and WELCH, P. D. (1970). "The application of the Fast Fourier Transform Algorithm to the estimation of spectra and cross-spectra." J. Sound Vib. 12:339-352.
COOLEY, J. W., and TUKEY, J. W. (1965). "An algorithm for the machine calculation of complex Fourier series." Math. Comp. 19:297-301.
COOTNER, P. H. (1964). The Random Character of Stock Market Prices. Cambridge: MIT Press.
COVEYOU, R. R., and MACPHERSON, R. D. (1967). "Fourier analysis of uniform random number generators." J. Assoc. Comp. Mach. 14:100-119.
CRADDOCK, J. M. (1965). "The analysis of meteorological time series for use in forecasting." Statistician. 15:167-190.
CRADDOCK, J. M., and FLOOD, C. R. (1969). "Eigenvectors for representing the 500 mb geopotential surface over the Northern Hemisphere." Quart. J. Roy. Met. Soc. 95:576-593.


CRAMÉR, H. (1939). "On the representation of functions by certain Fourier integrals." Trans. Amer. Math. Soc. 46:191-201.
CRAMÉR, H. (1942). "On harmonic analysis in certain functional spaces." Arkiv Math. Astr. Fysik. 28:1-7.
CRAMÉR, H., and LEADBETTER, M. R. (1967). Stationary and Related Stochastic Processes. New York: Wiley.
CRANDALL, I. B. (1958). Random Vibration, I. Cambridge: MIT Press.
CRANDALL, I. B. (1963). Random Vibration, II. Cambridge: MIT Press.
CRANDALL, I. B., and SACIA, C. F. (1924). "A dynamical study of the vowel sounds." Bell Syst. Tech. J. 3:232-237.
DANIELL, P. J. (1946). "Discussion of paper by M. S. Bartlett." J. Roy. Statist. Soc., Suppl. 8:27.
DANIELS, H. E. (1962). "The estimation of spectral densities." J. Roy. Statist. Soc., B. 24:185-198.
DARROCH, J. N. (1965). "An optimal property of principal components." Ann. Math. Statist. 36:1579-1582.
DARZELL, J. F., and PIERSON, W. J., Jr. (1960). The apparent loss of coherency in vector Gaussian processes due to computational procedures with applications to ship motions and random seas. Report of Dept. of Meteorology and Oceanography, New York University.
DAVIS, C., and KAHAN, W. M. (1969). "Some new bounds on perturbation of subspaces." Bull. Amer. Math. Soc. 75:863-868.
DAVIS, R. C. (1953). "On the Fourier expansion of stationary random processes." Proc. Amer. Math. Soc. 24:564-569.
DEEMER, W. L., and OLKIN, I. (1951). "The Jacobians of certain matrix transformations." Biometrika. 38:345-367.
DEMPSTER, A. P. (1966). "Estimation in multivariate analysis." In Multivariate Analysis, Ed. P. R. Krishnaiah, pp. 315-334. New York: Academic.
DEMPSTER, A. P. (1969). Continuous Multivariate Analysis. Reading: Addison-Wesley.
DEUTSCH, R. (1962). Nonlinear Transformations of Random Processes. Englewood Cliffs: Prentice-Hall.
DICKEY, J. M. (1967). "Matricvariate generalizations of the multivariate t distributions and the inverted multivariate t distribution." Ann. Math. Statist. 38:511-519.
DOEBLIN, W. (1938). "Sur l'équation matricielle A(t + s) = A(t)A(s) et ses applications aux probabilités en chaîne." Bull. Sci. Math. 62:21-32.
DOOB, J. L. (1953). Stochastic Processes. New York: Wiley.
DRAPER, N. R., and SMITH, H. (1966). Applied Regression Analysis. New York: Wiley.
DRESSEL, P. L. (1940). "Semi-invariants and their estimates." Ann. Math. Statist. 11:33-57.
DUGUNDJI, J. (1958). "Envelopes and pre-envelopes of real waveforms." IRE Trans. Inf. Theory. IT-4:53-57.
DUNCAN, D. B., and JONES, R. H. (1966). "Multiple regression with stationary errors." J. Amer. Statist. Assoc. 61:917-928.


DUNFORD, N., and SCHWARTZ, J. T. (1963). Linear Operators, Part II. New York: Wiley, Interscience.
DUNNETT, C. W., and SOBEL, M. (1954). "A bivariate generalization of Student's t-distribution, with tables for certain special cases." Biometrika. 41:153-169.
DURBIN, J. (1954). "Errors in variables." Rev. Inter. Statist. Inst. 22:23-32.
DURBIN, J. (1960). "Estimation of parameters in time series regression models." J. Roy. Statist. Soc., B. 22:139-153.
DYNKIN, E. B. (1960). Theory of Markov Processes. London: Pergamon.
ECKART, C., and YOUNG, G. (1936). "On the approximation of one matrix by another of lower rank." Psychometrika. 1:211-218.
ECONOMIC TRENDS (1968). No. 178. London: Central Statistical Office.
EDWARDS, R. E. (1967). Fourier Series: A Modern Introduction, Vols. I, II. New York: Holt, Rinehart and Winston.
EHRLICH, L. W. (1970). "Complex matrix inversion versus real." Comm. A.C.M. 13:561-562.
ENOCHSON, L. D., and GOODMAN, N. R. (1965). Gaussian approximations to the distribution of sample coherence. Tech. Rep. AFFDL-TR-65-57, Wright-Patterson Air Force Base.
EZEKIEL, M. A., and FOX, C. A. (1959). Methods of Correlation and Regression Analysis. New York: Wiley.
FEHR, U., and MCGAHAN, L. C. (1967). "Analog systems for analyzing infrasonic signals monitored in field experimentation." J. Acoust. Soc. Amer. 42:1001-1007.
FEJÉR, L. (1900). "Sur les fonctions bornées et intégrables." C. R. Acad. Sci. (Paris) 131:984-987.
FEJÉR, L. (1904). "Untersuchungen über Fouriersche Reihen." Math. Ann. 58:501-569.
FELLER, W. (1966). Introduction to Probability Theory and its Applications, Vol. 2. New York: Wiley.
FIELLER, E. C. (1954). "Some problems in interval estimation." J. Roy. Statist. Soc., B. 16:175-185.
FISHER, R. A. (1928). "The general sampling distribution of the multiple correlation coefficient." Proc. Roy. Soc. 121:654-673.
FISHER, R. A. (1962). "The simultaneous distribution of correlation coefficients." Sankhya A. 24:1-8.
FISHER, R. A., and MACKENZIE, W. A. (1922). "The correlation of weekly rainfall" (with discussion). J. Roy. Met. Soc. 48:234-245.
FISHMAN, G. S. (1969). Spectral Methods in Econometrics. Cambridge: Harvard Univ. Press.
FISHMAN, G. S., and KIVIAT, P. J. (1967). "Spectral analysis of time series generated by simulation models." Management Science. 13:525-557.
FOX, M. (1956). "Charts of the power of the F-test." Ann. Math. Statist. 27:484-497.
FREIBERGER, W. (1963). "Approximate distributions of cross-spectral estimates for Gaussian processes." In Time Series Analysis, Ed. M. Rosenblatt, pp. 244-259. New York: Wiley.


FREIBERGER, W., and GRENANDER, U. (1959). "Approximate distributions of noise power measurements." Quart. Appl. Math. 17:271-283.

FRIEDLANDER, S. K., and TOPPER, L. (1961). Turbulence; Classic Papers on Statistical Theory. New York: Wiley Interscience.

FRIEDMAN, B. (1961). "Eigenvalues of composite matrices." Proc. Camb. Philos. Soc. 57:37-49.

GABOR, D. (1946). "Theory of communication." J. Inst. Elec. Engrs. 93:429-457.

GAJJAR, A. V. (1967). "Limiting distributions of certain transformations of multiple correlation coefficient." Metron. 26:189-193.

GAVURIN, M. K. (1957). "Approximate determination of eigenvalues and the theory of perturbations." Uspehi Mat. Nauk. 12:173-175.

GELFAND, I., RAIKOV, D., and SHILOV, G. (1964). Commutative Normed Rings. New York: Chelsea.

GENTLEMAN, W. M., and SANDE, G. (1966). "Fast Fourier transforms — for fun and profit." AFIPS 1966 Fall Joint Computer Conference. 28:563-578. Washington: Spartan.

GERSCH, W. (1972). "Causality or driving in electrophysiological signal analysis." J. Math. Bioscience. 14:177-196.

GIBBS, F. A., and GRASS, A. M. (1947). "Frequency analysis of electroencephalograms." Science. 105:132-134.

GIKMAN, I. I., and SKOROKHOD, A. V. (1966). "On the densities of probability measures in function spaces." Russian Math. Surveys. 21:83-156.

GINZBURG, J. P. (1964). "The factorization of analytic matrix functions." Soviet Math. 5:1510-1514.

GIRI, N. (1965). "On the complex analogues of T² and R² tests." Ann. Math. Statist. 36:664-670.

GIRSHICK, M. A. (1939). "On the sampling theory of roots of determinantal equations." Ann. Math. Statist. 10:203-224.

GLAHN, H. R. (1968). "Canonical correlation and its relationship to discriminant analysis and multiple regression." J. Atmos. Sci. 25:23-31.

GODFREY, M. D. (1965). "An exploratory study of the bispectrum of an economic time series." Applied Statistics. 14:48-69.

GODFREY, M. D., and KARREMAN, H. F. (1967). "A spectrum analysis of seasonal adjustment." In Essays in Mathematical Economics, Ed. M. Shubik, pp. 367-421. Princeton: Princeton Univ. Press.

GOLDBERGER, A. S. (1964). Econometric Theory. New York: Wiley.

GOLUB, G. H. (1969). "Matrix decompositions and statistical calculations." In Statistical Computation, Eds. R. C. Milton, J. A. Nelder, pp. 365-397. New York: Academic.

GOOD, I. J. (1950). "On the inversion of circulant matrices." Biometrika. 37:185-186.

GOOD, I. J. (1958). "The interaction algorithm and practical Fourier series." J. Roy. Stat. Soc., B. 20:361-372. Addendum (1960), 22:372-375.

GOOD, I. J. (1963). "Weighted covariance for detecting the direction of a Gaussian source." In Time Series Analysis, Ed. M. Rosenblatt, pp. 447-470. New York: Wiley.

GOOD, I. J. (1971). "The relationship between two fast Fourier transforms." IEEE Trans. Computers. C-20:310-317.

GOODMAN, N. R. (1957). On the joint estimation of the spectra, cospectrum and quadrature spectrum of a two-dimensional stationary Gaussian process. Ph.D. Thesis, Princeton University.

GOODMAN, N. R. (1960). "Measuring amplitude and phase." J. Franklin Inst. 270:437-450.

GOODMAN, N. R. (1963). "Statistical analysis based upon a certain multivariate complex Gaussian distribution (an introduction)." Ann. Math. Statist. 34:152-177.

GOODMAN, N. R. (1965). Measurement of matrix frequency response functions and multiple coherence functions. Research and Technology Division, AFSC, AFFDL TR 65-56, Wright-Patterson AFB, Ohio.

GOODMAN, N. R. (1967). Eigenvalues and eigenvectors of spectral density matrices. Seismic Data Lab. Report 179.

GOODMAN, N. R., and DUBMAN, M. R. (1969). "Theory of time-varying spectral analysis and complex Wishart matrix processes." In Multivariate Analysis II, Ed. P. R. Krishnaiah, pp. 351-366. New York: Academic.

GOODMAN, N. R., KATZ, S., KRAMER, B. H., and KUO, M. T. (1961). "Frequency response from stationary noise: two case histories." Technometrics. 3:245-268.

GORMAN, D., and ZABORSZKY, J. (1966). "Functional expansion in state space and the s domain." IEEE Trans. Aut. Control. AC-11:498-505.

GRANGER, C. W. J. (1964). Spectral Analysis of Economic Time Series. Princeton: Princeton Univ. Press.

GRANGER, C. W. J., and ELLIOTT, C. M. (1968). "A fresh look at wheat prices and markets in the eighteenth century." Economic History Review. 20:257-265.

GRANGER, C. W. J., and HUGHES, A. O. (1968). "Spectral analysis of short series — a simulation study." J. Roy. Statist. Soc., A. 131:83-99.

GRANGER, C. W. J., and MORGENSTERN, O. (1963). "Spectral analysis of stock market prices." Kyklos. 16:1-27.

GRENANDER, U. (1950). "Stochastic processes and statistical inference." Ark. Mat. 1:195-277.

GRENANDER, U. (1951a). "On empirical spectral analysis of stochastic processes." Ark. Mat. 1:503-531.

GRENANDER, U. (1951b). "On Toeplitz forms and stationary processes." Ark. Mat. 1:551-571.

GRENANDER, U. (1954). "On the estimation of regression coefficients in the case of an autocorrelated disturbance." Ann. Math. Statist. 25:252-272.

GRENANDER, U., POLLAK, H. O., and SLEPIAN, D. (1959). "The distribution of quadratic forms in normal variates: a small sample theory with applications to spectral analysis." J. Soc. Indust. Appl. Math. 7:374-401.

GRENANDER, U., and ROSENBLATT, M. (1953). "Statistical spectral analysis of time series arising from stochastic processes." Ann. Math. Stat. 24:537-558.

GRENANDER, U., and ROSENBLATT, M. (1957). Statistical Analysis of Stationary Time Series. New York: Wiley.

GRENANDER, U., and SZEGO, G. (1958). Toeplitz Forms and Their Applications. Berkeley: Univ. of Cal. Press.

GROVES, G. W., and HANNAN, E. J. (1968). "Time series regression of sea level on weather." Rev. Geophysics. 6:129-174.

GROVES, G. W., and ZETLER, B. D. (1964). "The cross-spectrum of sea level at San Francisco and Honolulu." J. Marine Res. 22:269-275.

GUPTA, R. P. (1965). "Asymptotic theory for principal component analysis in the complex case." J. Indian Statist. Assoc. 3:97-106.

GUPTA, S. S. (1963a). "Probability integrals of multivariate normal and multivariate t." Ann. Math. Statist. 34:792-828.

GUPTA, S. S. (1963b). "Bibliography on the multivariate normal integrals and related topics." Ann. Math. Statist. 34:829-838.

GURLAND, J. (1966). "Further consideration of the distribution of the multiple correlation coefficient." Ann. Math. Statist. 37:1418.

GYIRES, B. (1961). "Über die Spuren der verallgemeinerten Toeplitzschen Matrizen." Publ. Math. Debrecen. 8:93-116.

HAJEK, J. (1962). "On linear statistical problems in stochastic processes." Czech. Math. J. 12:404-443.

HALL, P. (1927). "Multiple and partial correlation coefficients." Biometrika. 19:100-109.

HALMOS, P. R. (1956). Lectures in Ergodic Theory. Tokyo: Math. Soc. Japan.

HALPERIN, M. (1967). "A generalisation of Fieller's theorem to the ratio of complex parameters." J. Roy. Statist. Soc., B. 29:126-131.

HAMBURGER, H., and GRIMSHAW, M. E. (1951). Linear Transformations in n-dimensional Vector Space. Cambridge: Cambridge Univ. Press.

HAMMING, R. W. (1962). Numerical Methods for Scientists and Engineers. New York: McGraw-Hill.

HAMMING, R. W., and TUKEY, J. W. (1949). Measuring noise color. Bell Telephone Laboratories Memorandum.

HAMON, B. V., and HANNAN, E. J. (1963). "Estimating relations between time series." J. Geophys. Res. 68:6033-6041.

HANNAN, E. J. (1960). Time Series Analysis. London: Methuen.

HANNAN, E. J. (1961a). "The general theory of canonical correlation and its relation to functional analysis." J. Aust. Math. Soc. 2:229-242.

HANNAN, E. J. (1961b). "Testing for a jump in the spectral function." J. Roy. Statist. Soc., B. 23:394-404.

HANNAN, E. J. (1963a). "Regression for time series with errors of measurement." Biometrika. 50:293-302.

HANNAN, E. J. (1963b). "Regression for time series." In Time Series Analysis, Ed. M. Rosenblatt, pp. 17-37. New York: Wiley.

HANNAN, E. J. (1965). "The estimation of relationships involving distributed lags." Econometrica. 33:206-224.

HANNAN, E. J. (1967a). "The estimation of a lagged regression relation." Biometrika. 54:409-418.

HANNAN, E. J. (1967b). "Fourier methods and random processes." Bull. Inter. Statist. Inst. 42:475-494.

HANNAN, E. J. (1967c). "Canonical correlation and multiple equation systems in economics." Econometrica. 35:123-138.

HANNAN, E. J. (1968). "Least squares efficiency for vector time series." J. Roy. Statist. Soc., B. 30:490-498.

HANNAN, E. J. (1970). Multiple Time Series. New York: Wiley.

HASSELMAN, K., MUNK, W., and MACDONALD, G. (1963). "Bispectrum of ocean waves." In Time Series Analysis, Ed. M. Rosenblatt, pp. 125-139. New York: Wiley.

HAUBRICH, R. A. (1965). "Earth noise, 5 to 500 millicycles per second. 1. Spectral stationarity, normality, nonlinearity." J. Geophys. Res. 70:1415-1427.

HAUBRICH, R. A., and MACKENZIE, G. S. (1965). "Earth noise, 5 to 500 millicycles per second. 2. Reaction of the earth to ocean and atmosphere." J. Geophys. Res. 70:1429-1440.

HENNINGER, J. (1970). "Functions of bounded mean square and generalized Fourier-Stieltjes transforms." Can. J. Math. 22:1016-1034.

HERGLOTZ, G. (1911). "Über Potenzreihen mit positivem reellem Teil im Einheitskreis." Sitzgsber. Sachs Akad. Wiss. 63:501-511.

HEWITT, E., and ROSS, K. A. (1963). Abstract Harmonic Analysis. Berlin: Springer.

HEXT, G. R. (1966). A new approach to time series with mixed spectra. Ph.D. Thesis, Stanford University.

HINICH, M. (1967). "Estimation of spectra after hard clipping of Gaussian processes." Technometrics. 9:391-400.

HODGSON, V. (1968). "On the sampling distribution of the multiple correlation coefficient." Ann. Math. Statist. 39:307.

HOFF, J. C. (1970). "Approximation with kernels of finite oscillations, I. Convergence." J. Approx. Theory. 3:213-228.

HOOPER, J. W. (1958). "The sampling variance of correlation coefficients under assumptions of fixed and mixed variates." Biometrika. 45:471-477.

HOOPER, J. W. (1959). "Simultaneous equations and canonical correlation theory." Econometrica. 27:245-256.

HOPF, E. (1937). Ergodentheorie. Berlin: Springer.

HOPF, E. (1952). "Statistical hydromechanics and functional calculus." J. Rat. Mech. Anal. 1:87-123.

HORST, P. (1965). Factor Analysis of Data Matrices. New York: Holt, Rinehart and Winston.

HOTELLING, H. (1933). "Analysis of a complex of statistical variables into principal components." J. Educ. Psych. 24:417-441, 498-520.

HOTELLING, H. (1936). "Relations between two sets of variates." Biometrika. 28:321-377.

HOWREY, E. P. (1968). "A spectrum analysis of the long-swing hypothesis." Int. Econ. Rev. 9:228-252.

HOYT, R. S. (1947). "Probability functions for the modulus and angle of the normal complex variate." Bell System Tech. J. 26:318-359.

HSU, P. L. (1941). "On the limiting distribution of canonical correlations." Biometrika. 33:38-45.

HSU, P. L. (1949). "The limiting distribution of functions of sample means and application to testing hypotheses." In Proc. Berkeley Symp. Math. Statist. Prob., Ed. J. Neyman, pp. 359-401. Berkeley: Univ. of Cal. Press.

HUA, L. K. (1963). Harmonic Analysis of Functions of Several Variables in Classical Domains. Providence: American Math. Society.

IBRAGIMOV, I. A. (1963). "On estimation of the spectral function of a stationary Gaussian process." Theory Prob. Appl. 8:366-401.

IBRAGIMOV, I. A. (1967). "On maximum likelihood estimation of parameters of the spectral density of stationary time series." Theory Prob. Appl. 12:115-119.

IOSIFESCU, M. (1968). "The law of the iterated logarithm for a class of dependent random variables." Theory Prob. Appl. 13:304-313.

IOSIFESCU, M., and THEODORESCU, R. (1969). Random Processes and Learning. Berlin: Springer.

ISSERLIS, L. (1918). "On a formula for the product moment coefficient of any order of a normal frequency distribution in any number of variables." Biometrika. 12:134-139.

ITO, K., and NISIO, M. (1964). "On stationary solutions of a stochastic differential equation." J. Math. Kyoto. 4:1-75.

IZENMAN, A. J. (1972). Reduced rank regression for the multivariate linear model. Ph.D. Thesis, University of California, Berkeley.

JAGERMAN, D. L. (1963). "The autocorrelation function of a sequence uniformly distributed modulo 1." Ann. Math. Statist. 34:1243-1252.

JAMES, A. T. (1964). "Distributions of matrix variates and latent roots derived from normal samples." Ann. Math. Statist. 35:475-501.

JAMES, A. T. (1966). "Inference on latent roots by calculation of hypergeometric functions of matrix argument." In Multivariate Analysis, Ed. P. R. Krishnaiah, pp. 209-235. New York: Academic.

JENKINS, G. M. (1961). "General considerations in the analysis of spectra." Technometrics. 3:133-166.

JENKINS, G. M. (1963a). "Cross-spectral analysis and the estimation of linear open loop transfer functions." In Time Series Analysis, Ed. M. Rosenblatt, pp. 267-278. New York: Wiley.

JENKINS, G. M. (1963b). "An example of the estimation of a linear open-loop transfer function." Technometrics. 5:227-245.

JENKINS, G. M., and WATTS, D. G. (1968). Spectrum Analysis and Its Applications. San Francisco: Holden-Day.

JENNISON, R. C. (1961). Fourier Transforms and Convolutions for the Experimentalist. London: Pergamon.

JONES, R. H. (1962a). "Spectral estimates and their distributions, II." Skand. Aktuartidskr. 45:135-153.

JONES, R. H. (1962b). "Spectral analysis with regularly missed observations." Ann. Math. Statist. 33:455-461.

JONES, R. H. (1965). "A reappraisal of the periodogram in spectral analysis." Technometrics. 7:531-542.

JONES, R. H. (1969). "Phase free estimation of coherence." Ann. Math. Statist. 40:540-548.

KABE, D. G. (1966). "Complex analogues of some classical non-central multivariate distributions." Austral. J. Statist. 8:99-103.

KABE, D. G. (1968a). "On the distribution of the regression coefficient matrix of a normal distribution." Austral. J. Statist. 10:21-23.

KABE, D. G. (1968b). "Some aspects of analysis of variance and covariance theory for a certain multivariate complex Gaussian distribution." Metrika. 13:86-97.

KAHANE, J. (1968). Some Random Series of Functions. Lexington: Heath.

KAMPE de FERIET, J. (1954). "Introduction to the statistical theory of turbulence." J. Soc. Ind. Appl. Math. 2:244-271.

KAMPE de FERIET, J. (1965). "Random integrals of differential equations." In Lectures on Modern Mathematics, Ed. T. L. Saaty, 3:277-321. New York: Wiley.

KANESHIGE, I. (1964). "Frequency response of an automobile engine mounting." Ann. Inst. Stat. Math., Suppl. 3:49-58.

KAWASHIMA, R. (1964). "On the response function for the rolling motion of a fishing boat on ocean waves." Ann. Inst. Stat. Math., Suppl. 3:33-40.

KAWATA, T. (1959). "Some convergence theorems for stationary stochastic processes." Ann. Math. Statist. 30:1192-1214.

KAWATA, T. (1960). "The Fourier series of some stochastic processes." Japanese J. Math. 29:16-25.

KAWATA, T. (1965). "Sur la série de Fourier d'un processus stochastique stationnaire." C. R. Acad. Sci. (Paris). 260:5453-5455.

KAWATA, T. (1966). "On the Fourier series of a stationary stochastic process." Zeit. Wahrschein. 6:224-245.

KEEN, C. G., MONTGOMERY, J., MOWAT, W. M. H., and PLATT, D. C. (1965). "British seismometer array recording systems." J. Br. Instn. Radio Engrs. 30:219.

KENDALL, M. (1946). Contributions to the Study of Oscillatory Time Series. Cambridge: Cambridge Univ. Press.

KENDALL, M. G., and STUART, A. (1958). The Advanced Theory of Statistics, Vol. I. London: Griffin.

KENDALL, M. G., and STUART, A. (1961). The Advanced Theory of Statistics, Vol. II. London: Griffin.

KENDALL, M. G., and STUART, A. (1968). The Advanced Theory of Statistics, Vol. III. London: Griffin.

KHATRI, C. G. (1964). "Distribution of the 'generalised' multiple correlation matrix in the dual case." Ann. Math. Statist. 35:1801-1806.

KHATRI, C. G. (1965a). "Classical statistical analysis based on a certain multivariate complex Gaussian distribution." Ann. Math. Statist. 36:98-114.

KHATRI, C. G. (1965b). "A test for reality of a covariance matrix in a certain complex Gaussian distribution." Ann. Math. Statist. 36:115-119.

KHATRI, C. G. (1967). "A theorem on least squares in multivariate linear regression." J. Amer. Statist. Assoc. 62:1494-1495.

KHINTCHINE, A. (1934). "Korrelationstheorie der stationären Prozesse." Math. Annalen. 109:604-615.

KINOSITA, K. (1964). "On the behaviour of tsunami in a tidal river." Ann. Inst. Stat. Math., Suppl. 3:78-88.

KIRCHENER, R. B. (1967). "An explicit formula for exp At." Amer. Math. Monthly. 74:1200-1203.

KNOPP, K. (1948). Theory and Application of Infinite Series. New York: Hafner.

KOLMOGOROV, A. N. (1941a). "Interpolation und Extrapolation von stationären zufälligen Folgen." Bull. Acad. Sci. de l'U.R.S.S. 5:3-14.

KOLMOGOROV, A. N. (1941b). "Stationary sequences in Hilbert space." (In Russian.) Bull. Moscow State U. Math. 2:1-40. [Reprinted in Spanish in Trab. Estad. 4:55-73, 243-270.]

KOOPMANS, L. H. (1964a). "On the coefficient of coherence for weakly stationary stochastic processes." Ann. Math. Statist. 35:532-549.

KOOPMANS, L. H. (1964b). "On the multivariate analysis of weakly stationary stochastic processes." Ann. Math. Statist. 35:1765-1780.

KOOPMANS, L. H. (1966). "A note on the estimation of amplitude spectra for stochastic processes with quasi-linear residuals." J. Amer. Statist. Assoc. 61:397-402.

KRAMER, H. P., and MATHEWS, M. V. (1956). "A linear coding for transmitting a set of correlated signals." IRE Trans. Inf. Theo. IT-2:41-46.

KRAMER, K. H. (1963). "Tables for constructing confidence limits on the multiple correlation coefficient." J. Amer. Statist. Assoc. 58:1082-1085.

KRISHNAIAH, P. R., and WAIKAR, V. B. (1970). Exact joint distributions of few roots of a class of random matrices. Report ARL 70-0345. Aerospace Res. Labs.

KROMER, R. E. (1969). Asymptotic properties of the autoregressive spectral estimator. Ph.D. Thesis, Stanford University.

KSHIRSAGAR, A. M. (1961). "Some extensions of the multivariate t-distribution and the multivariate generalization of the distribution of the regression coefficient." Proc. Camb. Philos. Soc. 57:80-85.

KSHIRSAGAR, A. M. (1971). "Goodness of fit of a discriminant function from the vector space of dummy variables." J. Roy. Statist. Soc., B. 33:111-116.

KUHN, H. G. (1962). Atomic Spectra. London: Longmans.

KUO, F. F., and KAISER, J. F. (1966). System Analysis by Digital Computer. New York: Wiley.

LABROUSTE, M. H. (1934). "L'analyse des seismogrammes." Memorial des Sciences Physiques, Vol. 26. Paris: Gauthier-Villars.

LAMPERTI, J. (1962). "On convergence of stochastic processes." Trans. Amer. Math. Soc. 104:430-435.

LANCASTER, H. O. (1966). "Kolmogorov's remark on the Hotelling canonical correlations." Biometrika. 53:585-588.

LANCZOS, C. (1955). "Spectroscopic eigenvalue analysis." J. Wash. Acad. Sci. 45:315-323.

LANCZOS, C. (1956). Applied Analysis. Englewood Cliffs: Prentice-Hall.

LATHAM, G., et al. (1970). "Seismic data from man-made impacts on the moon." Science. 170:620-626.

LAUBSCHER, N. F. (1960). "Normalizing the noncentral t and F distributions." Ann. Math. Statist. 31:1105-1112.

LAWLEY, D. N. (1959). "Tests of significance in canonical analysis." Biometrika. 46:59-66.

LEE, Y. W. (1960). Statistical Theory of Communication. New York: Wiley.

LEE, Y. W., and WIESNER, J. B. (1950). "Correlation functions and communication applications." Electronics. 23:86-92.

LEONOV, V. P. (1960). "The use of the characteristic functional and semi-invariants in the ergodic theory of stationary processes." Soviet Math. 1:878-881.

LEONOV, V. P. (1964). Some Applications of Higher-order Semi-invariants to the Theory of Stationary Random Processes (in Russian). Moscow: Izdatilstvo Nauka.

LEONOV, V. P., and SHIRYAEV, A. N. (1959). "On a method of calculation of semi-invariants." Theor. Prob. Appl. 4:319-329.

LEONOV, V. P., and SHIRYAEV, A. N. (1960). "Some problems in the spectral theory of higher moments, II." Theory Prob. Appl. 5:460-464.

LEPPINK, G. J. (1970). "Efficient estimators in spectral analysis." Proc. Twelfth Biennial Seminar Can. Math. Cong., Ed. R. Pyke, pp. 83-87. Montreal: Can. Math. Cong.

LÉVY, P. (1933). "Sur la convergence absolue des séries de Fourier." C. R. Acad. Sci. Paris. 196:463-464.

LEWIS, F. A. (1939). "Problem 3824." Amer. Math. Monthly. 46:304-305.

LIGHTHILL, M. J. (1958). An Introduction to Fourier Analysis and Generalized Functions. Cambridge: Cambridge Univ. Press.

LOEVE, M. (1963). Probability Theory. Princeton: Van Nostrand.

LOMNICKI, Z. A., and ZAREMBA, S. K. (1957a). "On estimating the spectral density function of a stochastic process." J. Roy. Statist. Soc., B. 19:13-37.

LOMNICKI, Z. A., and ZAREMBA, S. K. (1957b). "On some moments and distributions occurring in the theory of linear stochastic processes, I." Mh. Math. 61:318-358.

LOMNICKI, Z. A., and ZAREMBA, S. K. (1959). "On some moments and distributions occurring in the theory of linear stochastic processes, II." Mh. Math. 63:128-168.

LOYNES, R. M. (1968). "On the concept of the spectrum for non-stationary processes." J. Roy. Statist. Soc., B. 30:1-30.

MACDONALD, N. J., and WARD, F. (1963). "The prediction of geomagnetic disturbance indices. 1. The elimination of internally predictable variations." J. Geophys. Res. 68:3351-3373.

MACDUFFEE, C. C. (1946). The Theory of Matrices. New York: Chelsea.

MACNEIL, I. B. (1971). "Limit processes for co-spectral and quadrature spectral distribution functions." Ann. Math. Statist. 42:81-96.

MADANSKY, A., and OLKIN, I. (1969). "Approximate confidence regions for constraint parameters." In Multivariate Analysis — II, Ed. P. R. Krishnaiah, pp. 261-286. New York: Academic.

MADDEN, T. (1964). "Spectral, cross-spectral and bispectral analysis of low frequency electromagnetic data." Natural Electromagnetic Phenomena Below 30 kc/s, Ed. D. F. Bleil, pp. 429-450. New York: Wiley.

MAJEWSKI, W., and HOLLIEN, H. (1967). "Formant frequency regions of Polish vowels." J. Acoust. Soc. Amer. 42:1031-1037.

MALEVICH, T. L. (1964). "The asymptotic behavior of an estimate for the spectral function of a stationary Gaussian process." Theory Prob. Appl. 9:350-353.

MALEVICH, T. L. (1965). "Some properties of the estimators of the spectrum of a stationary process." Theory Prob. Appl. 10:447-465.

MALINVAUD, E. (1964). Statistical Methods of Econometrics. Amsterdam: North-Holland.

MALLOWS, C. L. (1961). "Latent vectors of random symmetric matrices." Biometrika. 48:133-149.

MANN, H. B., and WALD, A. (1943a). "On stochastic limit and order relationships." Ann. Math. Statist. 14:217-226.

MANN, H. B., and WALD, A. (1943b). "On the statistical treatment of linear stochastic difference equations." Econometrica. 11:173-220.

MANWELL, T., and SIMON, M. (1966). "Spectral density of the possibly random fluctuations of 3C 273." Nature. 212:1224-1225.

MARUYAMA, G. (1949). "The harmonic analysis of stationary stochastic processes." Mem. Fac. Sci. Kyusyu Univ. Ser. A. 4:45-106.

MATHEWS, M. V. (1963). "Signal detection models for human auditory perception." In Time Series Analysis, Ed. M. Rosenblatt, pp. 349-361. New York: Wiley.

MCGUCKEN, W. (1970). Nineteenth Century Spectroscopy. Baltimore: Johns Hopkins.

MCNEIL, D. R. (1967). "Estimating the covariance and spectral density functions from a clipped stationary time series." J. Roy. Statist. Soc., B. 29:180-195.

MCSHANE, E. J. (1963). "Integrals devised for special purposes." Bull. Amer. Math. Soc. 69:597-627.

MEDGYESSY, P. (1961). Decomposition of Superpositions of Distribution Functions. Budapest: Hungar. Acad. Sci.

MEECHAM, W. C. (1969). "Stochastic representation of nearly-Gaussian nonlinear processes." J. Statist. Physics. 1:25-40.

MEECHAM, W. C., and SIEGEL, A. (1964). "Wiener-Hermite expansion in model turbulence at large Reynolds numbers." Physics Fluids. 7:1178-1190.

MEGGERS, W. F. (1946). "Spectroscopy, past, present and future." J. Opt. Soc. Amer. 36:431-448.

MIDDLETON, D. (1960). Statistical Communication Theory. New York: McGraw-Hill.

MILLER, K. S. (1968). "Moments of complex Gaussian processes." Proc. IEEE. 56:83-84.

MILLER, K. S. (1969). "Complex Gaussian processes." SIAM Rev. 11:544-567.

MILLER, R. G. (1966). Simultaneous Statistical Inference. New York: McGraw-Hill.

MIYATA, M. (1970). "Complex generalization of canonical correlation and its application to sea level study." J. Marine Res. 28:202-214.

MOORE, C. N. (1966). Summable Series and Convergence Factors. New York: Dover.

MORAN, J. M., et al. (1968). "The 18-cm flux of the unresolved component of 3C 273." Astrophysical J. 151:L99-L101.

MORRISON, D. F. (1967). Multivariate Statistical Methods. New York: McGraw-Hill.

MORTENSEN, R. E. (1969). "Mathematical problems of modeling stochastic non-linear dynamic systems." J. Statist. Physics. 1:271-296.

MUNK, W. H., and CARTWRIGHT, D. E. (1966). "Tidal spectroscopy and prediction." Phil. Trans., A. 259:533-581.

MUNK, W. H., and MACDONALD, G. J. F. (1960). The Rotation of the Earth. Cambridge: Cambridge Univ. Press.

MUNK, W. H., and SNODGRASS, F. E. (1957). "Measurements of southern swell at Guadalupe Island." Deep-Sea Research. 4:272-286.

MURTHY, V. K. (1963). "Estimation of the cross-spectrum." Ann. Math. Statist. 34:1012-1021.

NAKAMURA, I. (1964). "Relation between superelevation and car rolling." Ann. Inst. Stat. Math., Suppl. 3:41-48.

NAKAMURA, H., and MURAKAMI, S. (1964). "Resonance characteristic of the hydraulic system of a water power plant." Ann. Inst. Stat. Math., Suppl. 3:65-70.

NAYLOR, T. H., WALLACE, W. H., and SASSER, W. E. (1967). "A computer simulation model of the textile industry." J. Amer. Stat. Assoc. 62:1338-1364.

NERLOVE, M. (1964). "Spectral analysis of seasonal adjustment procedures." Econometrica. 32:241-286.

NETTHEIM, N. (1966). The estimation of coherence. Technical Report, Statistics Department, Stanford University.

NEUDECKER, H. (1968). "The Kronecker matrix product and some of its applications in econometrics." Statistica Neerlandica. 22:69-82.

NEWTON, H. W. (1958). The Face of the Sun. London: Penguin.

NICHOLLS, D. F. (1967). "Estimation of the spectral density function when testing for a jump in the spectrum." Austral. J. Statist. 9:103-108.

NISIO, M. (1960). "On polynomial approximation for strictly stationary processes." J. Math. Soc. Japan. 12:207-226.

NISIO, M. (1961). "Remarks on the canonical representation of strictly stationary processes." J. Math. Kyoto. 1:129-146.

NISSEN, D. H. (1968). "A note on the variance of a matrix." Econometrica. 36:603-604.

NOLL, A. M. (1964). "Short-time spectrum and 'cepstrum' techniques for vocal-pitch detection." J. Acoust. Soc. Amer. 36:296-302.

OBUKHOV, A. M. (1938). "Normally correlated vectors." Izv. Akad. Nauk SSR. Section on Mathematics. 3:339-370.

OBUKHOV, A. M. (1940). "Correlation theory of vectors." Uchen. Zap. Moscow State Univ. Mathematics Section. 45:73-92.

OCEAN WAVE SPECTRA (1963). National Academy of Sciences. Englewood Cliffs: Prentice-Hall.

OKAMOTO, M. (1969). "Optimality of principal components." In Multivariate Analysis — II, Ed. P. R. Krishnaiah, pp. 673-686. New York: Academic.

OKAMOTO, M., and KANAZAWA, M. (1968). "Minimization of eigenvalues of a matrix and optimality of principal components." Ann. Math. Statist. 39:859-863.

OLKIN, I., and PRATT, J. W. (1958). "Unbiased estimation of certain correlation coefficients." Ann. Math. Statist. 29:201-210.

OLSHEN, R. A. (1967). "Asymptotic properties of the periodogram of a discrete stationary process." J. Appl. Prob. 4:508-528.

OSWALD, J. R. V. (1956). "Theory of analytic bandlimited signals applied to carrier systems." IRE Trans. Circuit Theory. CT-3:244-251.

PANOFSKY, H. A. (1967). "Meteorological applications of cross-spectrum analysis." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 109-132. New York: Wiley.

PAPOULIS, A. (1962). The Fourier Integral and its Applications. New York: McGraw-Hill.

PARTHASARATHY, K. R. (1960). "On the estimation of the spectrum of a stationary stochastic process." Ann. Math. Statist. 31:568-573.

PARTHASARATHY, K. R., and VARADHAN, S. R. S. (1964). "Extension of stationary stochastic processes." Theory Prob. Appl. 9:65-71.

PARZEN, E. (1957). "On consistent estimates of the spectrum of a stationary time series." Ann. Math. Statist. 28:329-348.

PARZEN, E. (1958). "On asymptotically efficient consistent estimates of the spectral density function of a stationary time series." J. Roy. Statist. Soc., B. 20:303-322.

PARZEN, E. (1961). "Mathematical considerations in the estimation of spectra." Technometrics. 3:167-190.

PARZEN, E. (1963a). "On spectral analysis with missing observations and amplitude modulation." Sankhya. A. 25:180-189.

PARZEN, E. (1963b). "Notes on Fourier analysis and spectral windows." Included in Parzen (1967a).

PARZEN, E. (1963c). "Probability density functionals and reproducing kernel Hilbert spaces." In Time Series Analysis, Ed. M. Rosenblatt, pp. 155-169. New York: Wiley.

PARZEN, E. (1964). "An approach to empirical time series analysis." Radio Science. 68D:937-951.

PARZEN, E. (1967a). Time Series Analysis Papers. San Francisco: Holden-Day.

PARZEN, E. (1967b). "Time series analysis for models of signals plus white noise." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 233-257. New York: Wiley.

PARZEN, E. (1967c). "On empirical multiple time series analysis." In Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1, Eds. L. Le Cam and J. Neyman, pp. 305-340. Berkeley: Univ. of Cal. Press.

PARZEN, E. (1969). "Multiple time series modelling." In Multivariate Analysis — II, Ed. P. R. Krishnaiah, pp. 389-409. New York: Academic.

PEARSON, E. S., and HARTLEY, H. O. (1951). "Charts of the power function for analysis of variance tests derived from the non-central F distribution." Biometrika. 38:112-130.

PEARSON, K., and FILON, L. N. G. (1898). "Mathematical contributions to the theory of evolution. IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation." Phil. Trans., A. 191:229-311.

PEARSON, K., JEFFERY, G. B., and ELDERTON, E. M. (1929). "On the coefficient of the first product moment coefficient in samples drawn from an indefinitely large normal population." Biometrika. 21:164-201.

PHILIPP, W. (1967). "Das Gesetz vom iterierten Logarithmus für stark mischende stationäre Prozesse." Zeit. Wahrschein. 8:204-209.

PHILIPP, W. (1969). "The central limit problem for mixing sequences of random variables." Z. Wahrschein. verw. Gebiet. 12:155-171.

PICINBONO, B. (1959). "Tendance vers le caractère gaussien par filtrage sélectif." C. R. Acad. Sci. Paris. 248:2280.

PICKANDS, J. (1970). "Spectral estimation with random truncation." Ann. Math. Statist. 41:44-58.

PINSKER, M. S. (1964). Information and Information Stability of Random Variables and Processes. San Francisco: Holden-Day.

PISARENKO, V. F. (1970). "Statistical estimates of amplitude and phase corrections." Geophys. J. Roy. Astron. Soc. 20:89-98.

PISARENKO, V. F. (1972). "On the estimation of spectra by means of non-linear functions of the covariance matrix." Geophys. J. Roy. Astron. Soc. 28:511-531.

PLAGEMANN, S. H., FELDMAN, V. A., and GRIBBIN, J. R. (1969). "Power spectrum analysis of the emission-line redshift distribution of quasi-stellar and related objects." Nature. 224:875-876.

POLYA, G., and SZEGO, G. (1925). Aufgaben und Lehrsätze aus der Analysis I. Berlin: Springer.

PORTMANN, W. O. (1960). "Hausdorff-analytic functions of matrices." Proc. Amer. Math. Soc. 11:97-101.

POSNER, E. C. (1968). "Combinatorial structures in planetary reconnaissance." In Error Correcting Codes, Ed. H. B. Mann, pp. 15-47. New York: Wiley.

PRESS, H., and TUKEY, J. W. (1956). Power spectral methods of analysis and their application to problems in airplane dynamics. Bell Telephone System Monograph 2606.

PRIESTLEY, M. B. (1962a). "Basic considerations in the estimation of spectra." Technometrics. 4:551-564.

PRIESTLEY, M. B. (1962b). "The analysis of stationary processes with mixed spectra." J. Roy. Statist. Soc., B. 24:511-529.

PRIESTLEY, M. B. (1964). "Estimation of the spectral density function in the presence of harmonic components." J. Roy. Statist. Soc., B. 26:123-132.

PRIESTLEY, M. B. (1965). "Evolutionary spectra and non-stationary processes." J. Roy. Statist. Soc., B. 27:204-237.

PRIESTLEY, M. B. (1969). "Estimation of transfer functions in closed loop stochastic systems." Automatica. 5:623-632.

PUPIN, M. I. (1894). "Resonance analysis of alternating and polyphase currents." Trans. A.I.E.E. 9:523.

QUENOUILLE, M. H. (1957). The Analysis of Multiple Time Series. London: Griffin.

RAO, C. R. (1964). "The use and interpretation of principal component analysis in applied research." Sankhya, A. 26:329-358.

RAO, C. R. (1965). Linear Statistical Inference and Its Applications. New York: Wiley.

RAO, M. M. (1960). "Estimation by periodogram." Trabajos Estadistica. 11:123-137.

RAO, M. M. (1963). "Inference in stochastic processes. I." Teor. Verojatnost. i Primenen. 8:282-298.

RAO, M. M. (1966). "Inference in stochastic processes, II." Zeit. Wahrschein. 5:317-335.

RAO, S. T. (1967). "On the cross-periodogram of a stationary Gaussian vector process." Ann. Math. Statist. 38:593-597.

RICHTER, C. P. (1967). "Biological clocks in medicine and psychiatry." Proc. Nat. Acad. Sci. 46:1506-1530.

RICKER, N. (1940). "The form and nature of seismic waves and the structure of seismograms." Geophysics. 5:348-366.

RIESZ, F., and NAGY, B. Sz. (1955). Lessons in Functional Analysis. New York: Ungar.

ROBERTS, J. B., and BISHOP, R. E. D. (1965). "A simple illustration of spectral density analysis." J. Sound Vib. 2:37-41.

ROBINSON, E. A. (1967a). Multichannel Time Series Analysis with Digital Computer Programs. San Francisco: Holden-Day.

ROBINSON, E. A. (1967b). Statistical Communication and Detection with Special Reference to Digital Data Processing of Radar and Seismic Signals. London: Griffin.

RODEMICH, E. R. (1966). "Spectral estimates using nonlinear functions." Ann. Math. Statist. 37:1237-1256.

RODRIGUEZ-ITURBE, I., and YEVJEVICH, V. (1968). The investigation of relationship between hydrologic time series and sunspot numbers. Hydrology Paper No. 26. Fort Collins: Colorado State University.

ROOT, W. L., and PITCHER, T. S. (1955). "On the Fourier expansion of random functions." Ann. Math. Statist. 26:313-318.

ROSENBERG, M. (1964). "The square-integrability of matrix-valued functions with respect to a non-negative Hermitian measure." Duke Math. J. 31:291-298.

ROSENBLATT, M. (1956a). "On estimation of regression coefficients of a vector-valued time series with a stationary disturbance." Ann. Math. Statist. 27:99-121.

ROSENBLATT, M. (1956b). "On some regression problems in time series analysis." Proc. Third Berkeley Symp. Math. Statist. Prob., Vol. 1, Ed. J. Neyman, pp. 165-186. Berkeley: Univ. of Cal. Press.

ROSENBLATT, M. (1956c). "A central limit theorem and a strong mixing condition." Proc. Nat. Acad. Sci. (U.S.A.). 42:43-47.

ROSENBLATT, M. (1959). "Statistical analysis of stochastic processes with stationary residuals." In Probability and Statistics, Ed. U. Grenander, pp. 246-275. New York: Wiley.

ROSENBLATT, M. (1960). "Asymptotic distribution of the eigenvalues of block Toeplitz matrices." Bull. Amer. Math. Soc. 66:320-321.

ROSENBLATT, M. (1961). "Some comments on narrow band-pass filters." Quart. Appl. Math. 18:387-393.

ROSENBLATT, M. (1962). "Asymptotic behavior of eigenvalues of Toeplitz forms." J. Math. Mech. 11:941-950.

ROSENBLATT, M. (1964). "Some nonlinear problems arising in the study of random processes." Radio Science. 68D:933-936.

ROSENBLATT, M., and VAN NESS, J. S. (1965). "Estimation of the bispectrum." Ann. Math. Statist. 36:1120-1136.

ROZANOV, Yu. A. (1967). Stationary Random Processes. San Francisco: Holden-Day.

SALEM, R., and ZYGMUND, A. (1956). "A note on random trigonometric polynomials." In Proc. Third Berkeley Symp. Math. Statist. Prob., Ed. J. Neyman, pp. 243-246. Berkeley: Univ. of Cal. Press.

SARGENT, T. J. (1968). "Interest rates in the nineteen-fifties." Rev. Econ. Stat. 50:164-172.

SATO, H. (1964). "The measurement of transfer characteristic of ground-structure systems using micro tremor." Ann. Inst. Stat. Math., Suppl. 3:71-78.

SATTERTHWAITE, F. E. (1941). "Synthesis of variance." Psychometrika. 6:309-316.

SAXENA, A. K. (1969). "Classification into two multivariate complex normal distributions with different covariance matrices." J. Ind. Statist. Assoc. 7:158-161.

SCHEFFE, H. (1959). The Analysis of Variance. New York: Wiley.

SCHOENBERG, I. J. (1946). "Contributions to the problem of approximation of equidistant data by analytic functions." Quart. Appl. Math. 4:45-87, 112-141.

SCHOENBERG, I. J. (1950). "The finite Fourier series and elementary geometry." Amer. Math. Monthly. 57:390-404.

SCHUSTER, A. (1894). "On interference phenomena." Phil. Mag. 37:509-545.

SCHUSTER, A. (1897). "On lunar and solar periodicities of earthquakes." Proc. Roy. Soc. 61:455-465.

SCHUSTER, A. (1898). "On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena." Terr. Magn. 3:13-41.

SCHUSTER, A. (1900). "The periodogram of magnetic declination as obtained from the records of the Greenwich Observatory during the years 1871-1895." Camb. Phil. Trans. 18:107-135.

SCHUSTER, A. (1904). The Theory of Optics. London: Cambridge Univ. Press.

SCHUSTER, A. (1906a). "The periodogram and its optical analogy." Proc. Roy. Soc. 77:137-140.

SCHUSTER, A. (1906b). "On the periodicities of sunspots." Philos. Trans. Roy. Soc., A. 206:69-100.

SCHWARTZ, L. (1957). Theorie des Distributions, Vol. I. Paris: Hermann.

SCHWARTZ, L. (1959). Theorie des Distributions, Vol. II. Paris: Hermann.

SCHWERDTFEGER, H. (1960). "Direct proof of Lanczos's decomposition theorem." Amer. Math. Mon. 67:856-860.

SEARS, F. W. (1949). Optics. Reading: Addison-Wesley.

SHAPIRO, H. S. (1969). Smoothing and Approximation of Functions. New York: Van Nostrand.

SHIRYAEV, A. N. (1960). "Some problems in the spectral theory of higher-order moments, I." Theor. Prob. Appl. 5:265-284.

SHIRYAEV, A. N. (1963). "On conditions for ergodicity of stationary processes in terms of higher order moments." Theory Prob. Appl. 8:436-439.

SHUMWAY, R. H. (1971). "On detecting a signal in N stationarily correlated noise series." Technometrics. 13:499-519.

SIMPSON, S. M. (1966). Time Series Computations in FORTRAN and FAP. Reading: Addison-Wesley.

SINGLETON, R. C. (1969). "An algorithm for computing the mixed radix fast Fourier transform." IEEE Trans. Audio Elec. AU-17:93-103.

SINGLETON, R. C., and POULTER, T. C. (1967). "Spectral analysis of the call of the male killer whale." IEEE Trans. on Audio and Electroacoustics. AU-15:104-113.

SIOTANI, M. (1967). "Some applications of Loewner's ordering of symmetric matrices." Ann. Inst. Statist. Math. 19:245-259.

SKOROKHOD, A. V. (1956). "Limit theorems for stochastic processes." Theory Prob. Appl. 1:261-290.

SLEPIAN, D. (1954). "Estimation of signal parameters in the presence of noise." Trans. I.R.E. PGIT-3:82-87.

SLEPIAN, D. (1958). "Fluctuations of random noise power." Bell Syst. Tech. J. 37:163-184.

SLUTSKY, E. (1929). "Sur l'extension de la théorie des périodogrammes aux suites des quantités dépendantes." Comptes Rendus. 189:722-733.

SLUTSKY, E. (1934). "Alcuni applicazioni di coefficienti di Fourier al analizo di sequenze eventuali coherenti stazionarii." Giorn. d. Istituto Italiano degli Attuari. 5:435-482.

SMITH, E. J., HOLZER, R. E., MCLEOD, M. G., and RUSSELL, C. T. (1967). "Magnetic noise in the magnetosheath in the frequency range 3-300 Hz." J. Geophys. Res. 72:4803-4813.

SOLODOVNIKOV, V. V. (1960). Introduction to the Statistical Dynamics of Automatic Control Systems. New York: Dover.

SRIVASTAVA, M. S. (1965). "On the complex Wishart distribution." Ann. Math. Statist. 36:313-315.

STIGUM, B. P. (1967). "A decision theoretic approach to time series analysis." Ann. Inst. Statist. Math. 19:207-243.

STOCKHAM, T. G., Jr. (1966). "High speed convolution and correlation." Proc. Spring Joint Comput. Conf. 28:229-233.

STOKES, G. G. (1879). Proc. Roy. Soc. 122:303.

STONE, R. (1947). "On the interdependence of blocks of transactions." J. Roy. Statist. Soc., B. 9:1-32.

STRIEBEL, C. (1959). "Densities for stochastic processes." Ann. Math. Statist. 30:559-567.

STUMPFF, K. (1937). Grundlagen und Methoden der Periodenforschung. Berlin: Springer.

STUMPFF, K. (1939). Tafeln und Aufgaben zur Harmonischen Analyse und Periodogrammrechnung. Berlin: Springer.

SUGIYAMA, G. (1966). "On the distribution of the largest latent root and corresponding latent vector for principal component analysis." Ann. Math. Statist. 37:995-1001.

SUHARA, K., and SUZUKI, H. (1964). "Some results of EEG analysis by analog type analyzers and finer examinations by a digital computer." Ann. Inst. Statist. Math., Suppl. 3:89-98.

TAKEDA, S. (1964). "Experimental studies on the airplane response to the side gusts." Ann. Inst. Statist. Math., Suppl. 3:59-64.

TATE, R. F. (1966). "Conditional-normal regression models." J. Amer. Statist. Assoc. 61:477-489.

TICK, L. J. (1963). "Conditional spectra, linear systems and coherency." In Time Series Analysis, Ed. M. Rosenblatt, pp. 197-203. New York: Wiley.

TICK, L. J. (1966). "Letter to the Editor." Technometrics. 8:559-561.

TICK, L. J. (1967). "Estimation of coherency." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 133-152. New York: Wiley.

TIMAN, M. F. (1962). "Some linear summation processes for the summation of Fourier series and best approximation." Soviet Math. 3:1102-1105.

TIMAN, A. F. (1963). Theory of Approximation of Functions of a Real Variable. New York: Macmillan.

TUKEY, J. W. (1949). "The sampling theory of power spectrum estimates." Proc. on Applications of Autocorrelation Analysis to Physical Problems. NAVEXOS-P-735, pp. 47-67. Washington, D.C.: Office of Naval Research, Dept. of the Navy.

TUKEY, J. W. (1959a). "An introduction to the measurement of spectra." In Probability and Statistics, Ed. U. Grenander, pp. 300-330. New York: Wiley.

TUKEY, J. W. (1959b). "The estimation of power spectra and related quantities." In On Numerical Approximation, pp. 389-411. Madison: Univ. of Wisconsin Press.

TUKEY, J. W. (1959c). "Equalization and pulse shaping techniques applied to the determination of initial sense of Rayleigh waves." In The Need of Fundamental Research in Seismology, Appendix 9, pp. 60-129. Washington: U.S. Department of State.

TUKEY, J. W. (1961). "Discussion, emphasizing the connection between analysis of variance and spectrum analysis." Technometrics. 3:1-29.

TUKEY, J. W. (1965a). "Uses of numerical spectrum analysis in geophysics." Bull. I.S.I. 35 Session. 267-307.

TUKEY, J. W. (1965b). "Data analysis and the frontiers of geophysics." Science. 148:1283-1289.

TUKEY, J. W. (1967). "An introduction to the calculations of numerical spectrum analysis." In Advanced Seminar on Spectral Analysis of Time Series, Ed. B. Harris, pp. 25-46. New York: Wiley.

TUMURA, Y. (1965). "The distributions of latent roots and vectors." TRU Mathematics. 1:1-16.

VAN DER POL, B. (1930). "Frequency modulation." Proc. Inst. Radio Eng. 18:227.

VARIOUS AUTHORS (1966). "A discussion on recent advances in the technique of seismic recording and analysis." Proc. Roy. Soc. 290:288-476.

VOLTERRA, V. (1959). Theory of Functionals and of Integral and Integro-differential Equations. New York: Dover.

VON MISES, R. (1964). Mathematical Theory of Probability and Statistics. New York: Academic.

VON MISES, R., and DOOB, J. L. (1941). "Discussion of papers on probability theory." Ann. Math. Statist. 12:215-217.

WAHBA, G. (1966). Cross spectral distribution theory for mixed spectra and estimation of prediction filter coefficients. Ph.D. Thesis, Stanford University.

WAHBA, G. (1968). "On the distribution of some statistics useful in the analysis of jointly stationary time series." Ann. Math. Statist. 39:1849-1862.

WAHBA, G. (1969). "Estimation of the coefficients in a distributed lag model." Econometrica. 37:398-407.

WALDMEIR, M. (1961). The Sunspot Activity in the Years 1610-1960. Zurich: Schulthess.

WALKER, A. M. (1954). "The asymptotic distribution of serial correlation coefficients for autoregressive processes with dependent residuals." Proc. Camb. Philos. Soc. 50:60-64.

WALKER, A. M. (1965). "Some asymptotic results for the periodogram of a stationary time series." J. Austral. Math. Soc. 5:107-128.

WALKER, A. M. (1971). "On the estimation of a harmonic component in a time series with stationary residuals." Biometrika. 58:21-36.

WEDDERBURN, J. H. M. (1934). Lectures on Matrices. New York: Amer. Math. Soc.

WEGEL, R. L., and MOORE, C. R. (1924). "An electrical frequency analyzer." Bell Syst. Tech. J. 3:299-323.

WELCH, P. D. (1961). "A direct digital method of power spectrum estimation." IBM J. Res. Dev. 5:141-156.

WELCH, P. D. (1967). "The use of the fast Fourier transform for estimation of spectra: a method based on time averaging over short, modified periodograms." IEEE Trans. Electr. Acoust. AU-15:70.

WEYL, H. (1946). Classical Groups. Princeton: Princeton Univ. Press.

WHITTAKER, E. T., and ROBINSON, G. (1944). The Calculus of Observations. Cambridge: Cambridge Univ. Press.

WHITTLE, P. (1951). Hypothesis Testing in Time Series Analysis. Uppsala: Almqvist.

WHITTLE, P. (1952a). "Some results in time series analysis." Skand. Aktuar. 35:48-60.

WHITTLE, P. (1952b). "The simultaneous estimation of a time series' harmonic and covariance structure." Trab. Estad. 3:43-57.

WHITTLE, P. (1953). "The analysis of multiple stationary time series." J. Roy. Statist. Soc., B. 15:125-139.

WHITTLE, P. (1954). "A statistical investigation of sunspot observations with special reference to H. Alfven's sunspot model." Astrophys. J. 120:251-260.

WHITTLE, P. (1959). "Sur la distribution du maximum d'un polynôme trigonométrique à coefficients aléatoires." Colloques Internationaux du Centre National de la Recherche Scientifique. 87:173-184.

WHITTLE, P. (1961). "Gaussian estimation in stationary time series." Bull. Int. Statist. Inst. 39:105-130.

WHITTLE, P. (1963a). Prediction and Regulation. London: English Universities Press.

WHITTLE, P. (1963b). "On the fitting of multivariate auto-regressions and the approximate canonical factorization of a spectral density matrix." Biometrika. 50:129-134.

WIDOM, H. (1965). "Toeplitz matrices." In Studies in Real and Complex Analysis, Ed. I. I. Hirschman, Jr., pp. 179-209. Englewood Cliffs: Prentice-Hall.

WIENER, N. (1930). "Generalized harmonic analysis." Acta Math. 55:117-258.

WIENER, N. (1933). The Fourier Integral and Certain of its Applications. Cambridge: Cambridge Univ. Press.

WIENER, N. (1938). "The historical background of harmonic analysis." Amer. Math. Soc. Semicentennial Pub. 2:56-68.

WIENER, N. (1949). The Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications. New York: Wiley.

WIENER, N. (1953). "Optics and the theory of stochastic processes." J. Opt. Soc. Amer. 43:225-228.

WIENER, N. (1957). "Rhythms in physiology with particular reference to encephalography." Proc. Rud. Virchow Med. Soc. in New York. 16:109-124.

WIENER, N. (1958). Non-linear Problems in Random Theory. Cambridge: MIT Press.

WIENER, N., SIEGEL, A., RANKIN, B., and MARTIN, W. T. (1967). Differential Space, Quantum Systems and Prediction. Cambridge: MIT Press.

WIENER, N., and WINTNER, A. (1941). "On the ergodic dynamics of almost periodic systems." Amer. J. Math. 63:794-824.

WILK, M. B., GNANADESIKAN, R., and HUYETT, M. J. (1962). "Probability plots for the gamma distribution." Technometrics. 4:1-20.

WILKINS, J. E. (1948). "A note on the general summability of functions." Ann. Math. 49:189-199.

WILKINSON, J. H. (1965). The Algebraic Eigenvalue Problem. Oxford: Oxford Univ. Press.

WILLIAMS, E. J. (1967). "The analysis of association among many variates." J. Roy. Statist. Soc., B. 29:199-242.

WINTNER, A. (1932). "Remarks on the ergodic theorem of Birkhoff." Proc. Nat. Acad. Sci. (U.S.A.). 18:248-251.

WISHART, J. (1931). "The mean and second moment coefficient of the multiple correlation coefficient in samples from a normal population." Biometrika. 22:353-361.

WISHART, J., and BARTLETT, M. S. (1932). "The distribution of second order moment statistics in a normal system." Proc. Camb. Philos. Soc. 28:455-459.

WOLD, H. O. A. (1948). "On prediction in stationary time series." Ann. Math. Statist. 19:558-567.

WOLD, H. O. A. (1954). A Study in the Analysis of Stationary Time Series, 2nd ed. Uppsala: Almqvist and Wiksells.

WOLD, H. O. A. (1963). "Forecasting by the chain principle." In Time Series Analysis, Ed. M. Rosenblatt, pp. 471-497. New York: Wiley.

WOLD, H. O. A. (1965). Bibliography on Time Series and Stochastic Processes. London: Oliver and Boyd.

WONG, E. (1964). "The construction of a class of stationary Markov processes." Proc. Symp. Applied Math. 16:264-276. Providence: Amer. Math. Soc.

WOOD, L. C. (1968). "A review of digital pass filtering." Rev. Geophysics. 6:73-98.

WOODING, R. A. (1956). "The multivariate distribution of complex normal variates." Biometrika. 43:212-215.

WOODROOFE, M. B., and VAN NESS, J. W. (1967). "The maximum deviation of sample spectral densities." Ann. Math. Statist. 38:1558-1570.

WORLD WEATHER RECORDS. Smithsonian Miscellaneous Collections, Vol. 79 (1927), Vol. 90 (1934), Vol. 105 (1947). Washington: Smithsonian Inst.

WORLD WEATHER RECORDS. 1941-1950 (1959) and 1951-1960 (1965). U.S. Weather Bureau, Washington, D.C.

WRIGHT, W. D. (1906). The Measurement of Colour. New York: Macmillan.

YAGLOM, A. M. (1962). An Introduction to the Theory of Stationary Random Functions. Englewood Cliffs: Prentice-Hall.

YAGLOM, A. M. (1965). "Stationary Gaussian processes satisfying the strong mixing condition and best predictable functional." In Bernoulli, Bayes, Laplace, Ed. J. Neyman and L. M. LeCam, pp. 241-252. New York: Springer.

YAMANOUCHI, Y. (1961). "On the analysis of the ship oscillations among waves—I, II, III." J. Soc. Naval Arch. (Japan). 109:169-183; 110:19-29; 111:103-115.

YULE, G. U. (1927). "On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers." Phil. Trans. Roy. Soc., A. 226:267-298.

YUZURIHA, T. (1960). "The autocorrelation curves of schizophrenic brain waves and the power spectra." Psych. Neurol. Jap. 62:911-924.

ZYGMUND, A. (1959). Trigonometric Series. Cambridge: Cambridge Univ. Press.

ZYGMUND, A. (1968). Trigonometric Series, Vols. I, II. Cambridge: Cambridge Univ. Press.

NOTATION INDEX

a(«), 29
a<r>(M), 317
A(X), 29, 296
A<T>(X), 300, 305, 307, 323
/MX), 143
AT m(\), 132, 244
aveA', 16
ave, 199
arg z, 17

Br, 146
flr(X), 143
flr^X), 133, 244

c0, 22
ca<T>, 236, 260
c*, 94, 116, 232
CJT(T), 83, 123, 129, 150, 160, 183, 184
ca6(«), 22, 42
cjtjt(w), 23, 24, 116, 232
cxx<r>(«), 161, 182, 256
cjrjr(r)(«), 167, 168
cxx<r)(«,0, 165
co6<T>(H), 256
Cr(X), 143
Cj-^CX), 133, 244
cov, 203
cov {*, K}, 16, 19, 90
cov {X,Y), 22
cor (X, Y\, 16
cum (Ki, ..., Kr), 19
c«, ...«t(/i, ..., r*_i), 23, 92
c,,. ..«t(/I, ..., /*), 21

D«(a), 50, 162
D[0,7r], 258
dx(T)(X), 61, 91, 120, 123, 235
^(X,/), 239
Det A, 16

EX, 16

./^JVxCX), 297
/», ...at(Xi, ..., Xfc_!), 25, 92
/., ...a*(Xi ... Xt), 26
/., 25
/oi(X), 23
£b(r)(X), 248, 261
/xx(X), 24, 116, 233
fxx(T)(\), 132, 142, 146, 147, 150, 164, 242, 243, 248, 282
/rx(r)(X), 194
/xxl"l(X), 155
/ra6<T)(X), 256
Fjrx(X), 25, 166
/=ix(T)(X), 167, 168
Fm;.W), 191

G(X), 302, 325
^x(X), 177
g«.(r)(X), 195, 300

//<r>(X), 124
#2<r>(X), 128
H.™..0jt(X), 91

1, 8
I, 16
/xx(T)(X), 120, 182
Ixx<r>(X), 235
/xx<F>(X,/), 164
Ixx<V)(X,/), 239
Im z, 17
Im Z, 71

y^>(/4), 167
yab(/0, 254
JahW(A), 255
Jt^'>(M), 155
/f<r>(«), 155

Hm, 98, 131

wafc(H), 41, 68, 174
mxx(«), 47, 80, 175
mxx(T)(u), 115, 181
M<l», S), 89
WV, S), 89

oGU 52
CK/U 52
Opd), 423
on.,.(l), 196

&YX2, 289
£rx2, 189, 291
|/?rx|2, 293
|£yx|2, 294
/?oi(X), 256
/?ab^(X), 257
^KAr(X), 297
|*rx(X)|2, 296
|-Ryx(r)(X)|2, 196, 198, 305, 307
Re z, 17
*Kay6.x(X), 297
Re Z, 71

sgn X, 32, 165

T, translation, 28
b, 291
/^T), 253
t,c, 294, 305
tr A, 16

V(0, 78
V+(/), 76
vec, 230, 287
var X, 16, 19
va7, 149, 202

W"T>(a), 146
Jfofc(rK«), 248
^r(«,S), 90
Wrc(fl,£), 90

X(r,o.), 104, 163

«/£), 70
/3,,w, 56
«(«), 17
«l«l, >7
A(r>(«), 86, 93
nM, 17
^(a), 17, 26, 47, 101
M,<Z), 70, 84, 85, 287
£.2900(A), 86
tf«6(X), 24

0(X), 303
x,2, 126
x-2(«), 151
xV, 113, 127

(g), 230, 288
>, 228, 287
^, approximately congruent to, 120, 124
', transpose, 16, 70
-, complex conjugate, 16, 70
| |, absolute value of a matrix, 16
[jk], matrix, 16, 70
*, convolution, 29
+, generalized inverse, 87
=, congruent to, 17
[ ], integral part, 15, 84
R, real matrix, 71
| |, modulus, 17
|| ||, matrix norm, 74
*, Hilbert transform, 32, 105
^, periodic extension, 65, 66, 167
~, associated process, 42

AUTHOR INDEX

Abel, N. H., 15, 55
Abelson, R., 12
Abramowitz, M., 191, 291, 334
Aczel, J., 13
Aitken, A. C., 73
Akaike, H., 57, 128, 160, 164, 165, 172, 190, 191, 194, 207, 221, 226, 263, 266, 298, 301, 309, 317, 324, 330
Akcasu, A. Z., 164
Akhiezer, N. I., 55, 57
Albert, A., 174
Alberts, W. W., 12, 180
Alexander, M. J., 292, 317
Amos, D. E., 291, 295, 317
Anderson, G. A., 341
Anderson, T. W., 150, 340, 341, 372
Arato, M., 12
Arens, R., 79
Aschoff, J., 12
Autonne, L., 72
Balakrishnan, A. V., 38
Barlow, J. S., 12, 180
Bartels, J., 10
Bartlett, M. S., 10, 12, 55, 100, 113, 128, 142, 160, 161, 164, 170, 173, 283, 354
Bass, J., 81
Batchelor, G. K., 11
Baxter, G., 79
Bellman, R., 71, 84, 283, 287, 399, 458
Bendat, J. S., 208
Beranek, L. L., 11
Bergland, G. D., 64
Bernstein, S., 36, 445
Bertrand, J., 12, 180
Bertrandias, J. B., 82
Bessel, F. W., 113, 114, 229
Beveridge, W. H., 12, 179
Billingsley, P., 43, 258, 259, 421, 439
Bingham, C., 64, 66, 86, 120
Bishop, R. E. D., 118
Blackman, R. B., 55, 150, 179, 298
Blanc-Lapierre, A., 10, 26, 36, 163, 263
Bochner, S., 8, 47, 55, 76, 399, 401
Bode, H. W., 180
Bohman, H., 55, 57, 69
Bonferroni, C. E., 209
Borel, E., 406
Born, M., 11
Bowley, A. L., 338
Box, G. E. P., 13, 145, 166
Bracewell, R., 181
Brenner, J. L., 71
Brigham, E. O., 64
Brillinger, D. R., 5, 9, 12, 26, 38, 82, 94, 95, 110, 150, 160, 165, 172, 173, 176, 188, 194, 199, 226, 231, 238, 240, 245, 260, 263, 279, 324, 332, 341, 343, 348, 353, 368, 439, 449
Bryson, R. A., 181

Bullard, E., 180
Bunimovitch, V. I., 33
Burgers, J. M., 11
Burkhardt, H., 10
Burley, S. P., 12
Businger, P. A., 72
Butzer, P. L., 57
Cairns, T. W., 64
Calderon, A. P., 79
Cantelli, F. P., 407
Capon, J., 166
Cargo, G. T., 415
Carpenter, E. W., 1, 180
Cartwright, D. E., 180, 225
Cauchy, A. L., 55, 57, 395, 411
Chambers, J. M., 341, 374
Chance, B., 12
Chapman, S., 10
Chernoff, H., 96
Choksi, J. R., 38
Clevenson, M. L., 260
Condit, H. R., 180
Constantine, A. G., 374, 388
Cooley, J. W., 64, 66, 120, 164
Cootner, P. H., 12
Cornish, E. A., 341
Courant, R., 84, 399
Coveyou, R. R., 12
Craddock, J. M., 1, 109, 354
Cramer, H., 10, 18, 25, 41, 43, 100, 102-106, 108, 109, 114, 234, 258, 345, 354, 432
Crandall, I. B., 11
Creasy, M. A., 192
Daniell, P. J., 10, 142
Daniels, H. E., 165
Darroch, J. N., 340
Darzell, J. F., 221
Davis, C., 74, 94
Davis, R. C., 94
Deemer, W. L., 85
Dempster, A. P., 341, 372, 374, 458
Deutsch, R., 33
Dickey, J. M., 192, 291
Dirac, P. A. M., 17, 26, 101, 173, 235, 410
Dirichlet, P. G. L., 55, 57
Doeblin, W., 13
Doob, J. L., 8, 18, 38, 41, 42, 43
Dressel, P. L., 20
Dubman, M. R., 110
Dugundji, J., 33
Duncan, D. B., 194
Dunford, N., 348
Dunnett, C. W., 192
Durbin, J., 13, 182, 323, 324
Dutton, J. A., 181
Dynkin, E. B., 36
Eckart, C., 75
Edwards, R. E., 8, 17, 47, 50, 52, 53, 55, 82, 91
Ehrlich, L. W., 71
Elderton, E. M., 113
Elliot, C. M., 180
Enochson, L. D., 120, 306, 312, 317
Euler, L., 15
Ezekiel, M. A., 291
Fehr, U., 11
Feinstein, B., 12
Fejer, L., 52, 55, 57
Feldman, V. A., 180
Feller, W., 36
Fieller, E. C., 192
Filon, L. N. G., 257, 292
Fischer, E., 84, 399
Fisher, R. A., 257, 292, 330, 341
Fishman, G. S., 12, 226, 301
Flood, C. R., 355
Fortet, R., 10, 26, 36, 163, 263
Fourier, J. J., 8, 9, 31, 49, 50, 52, 60-70, 73-75, 78, 79, 88, 91, 93-101, 105, 123, 130, 132, 142, 160-163, 167, 194, 210, 212, 221, 222, 228, 235, 247, 255, 260, 262, 299, 332, 394, 396, 401
Fox, C., 291
Fox, M., 191
Freiberger, W., 254, 263
Friedlander, S. K., 11
Friedman, B., 84
Fubini, G., 396
Gabor, D., 11
Gajjar, A. V., 292
Gauss, C. F., 55, 57, 188
Gavurin, M. K., 74
Gelfand, I., 76, 78, 79, 287, 400
Gentleman, W. M., 64, 67
Gersch, W., 297
Gibbs, F. A., 12, 52, 181
Gikman, I. I., 12

Giri, N., 112, 113, 292
Girshick, M. A., 341
Glahn, H. R., 372, 391
Gnanadesikan, R., 126
Godfrey, M. D., 5, 180
Goldberger, A. S., 289
Goldstein, R. M., 165
Golub, G. H., 72, 341, 374
Good, I. J., 64, 65, 71, 73
Goodman, N. R., 9, 60, 71, 89, 90, 110, 114, 166, 191, 208, 226, 240, 245, 262, 294, 297, 298, 306, 309, 312, 317, 348
Granger, C. W. J., 12, 180, 226, 263, 298, 309
Grass, A. M., 12, 181
Grenander, U., 10, 12, 47, 54, 56, 84, 128, 146, 150, 161, 174, 175, 176, 225, 229
Gribbon, J. R., 180
Grimshaw, M. E., 73
Groves, G. W., 192, 207, 225, 295, 301, 306, 317
Grum, F., 180
Gupta, R. P., 343
Gupta, S. S., 90, 192, 343
Gurland, J., 292
Gyires, B., 254
Hajek, J., 12
Hall, P., 257, 292
Halmos, P. R., 43
Halperin, M., 192
Hamburger, H., 73
Hamming, R. W., 52, 55, 57, 66, 161
Hamon, B. V., 225
Hannan, E. J., 10, 79, 128, 150, 174, 176, 182, 192, 207, 225, 295, 301, 306, 317, 319, 324, 354, 372
Hartley, H. O., 191
Hasselman, K., 26
Hatanaka, M., 12, 176, 263, 324
Haubrich, R. A., 11, 180, 225
Helly, E., 394
Heninger, J., 82
Herglotz, G., 25
Hewitt, E., 8
Hext, G. R., 174
Higgins, J., 12
Hilbert, D., 32, 60, 104, 234, 301, 353, 389, 400
Hinich, M., 165
Hodgson, V., 292
Hoff, J. C., 57
Hoffman, K., 74, 456
Holder, O., 409
Hollien, H., 11
Holzer, R. E., 11
Hooper, J. W., 334, 372, 374
Hopf, E., 11, 43
Horst, P., 354
Hotelling, H., 108, 289, 340, 372, 374
Howrey, E. P., 180
Hoyt, R. S., 192
Hsu, P. L., 257, 292, 374
Hua, L. K., 44, 72
Huyett, M. J., 126
Ibragimov, I. A., 181, 260
Iosifescu, M., 94, 98
Isserlis, L., 21
Ito, K., 38, 39
Izenman, A. J., 341, 374
Jackson, D., 55
Jagerman, D. L., 12
James, A. T., 89, 294, 341, 343, 352, 374, 379, 388
Jeffery, G. B., 113
Jenkins, G. M., 13, 120, 166, 226, 263, 298, 309, 313
Jennison, R. C., 11
Jones, R. H., 142, 150, 165, 194, 330
Kabe, D. G., 90
Kahan, W. M., 74
Kahane, J., 98
Kaiser, J. F., 60
Kampe de Feriet, J., 11, 38
Kanazawa, M., 340, 454
Kaneshige, I., 11, 226
Karreman, H. F., 180
Katz, S., 226
Kawashima, R., 11, 225
Kawata, T., 94, 128
Keen, C. G., 1
Kendall, M. G., 10, 20, 188, 253, 289, 292, 314, 323, 372, 420
Khatri, C. G., 85, 190, 226, 289, 294, 306
Khintchine, A., 8, 10
Kinosita, K., 11
Kirchener, R. B., 13
Kiviat, P. J., 12
Knopp, K., 14

Kolmogorov, A. N., 10, 42, 181
Koopmans, L. H., 176, 291, 295, 297, 298, 317, 348
Kramer, B. H., 226
Kramer, H. P., 75, 108, 340
Kramer, K. H., 291
Krishnaiah, P. R., 341
Kromer, R. E., 164
Kronecker, L., 17, 44, 148, 288
Kshirsagar, A. M., 192, 291, 372
Kuhn, H. G., 10
Kuo, F. F., 60
Kuo, M. T., 226
Labrouste, M. H., 11
Lacape, R. S., 12, 180
Lancaster, H. O., 372
Lanczos, C., 55, 69, 71
Landau, E., 52
Latham, G., 11
Laubscher, N. F., 191
Lawley, D. N., 374
Leadbetter, M. R., 18, 41, 43, 102, 258
Lee, Y. W., 11, 226, 331
Leonov, V. P., 20, 21, 26, 43, 94, 97
Leppink, G. J., 165
Levy, P., 76
Lewis, F. A., 84
Lewis, P. A. W., 64, 164
Lieberman, G. J., 96
Lighthill, M. J., 17
Loeve, M., 36, 407
Lomnicki, Z. A., 172
Loynes, R. M., 176
MacDonald, G. J. F., 11, 26
MacDonald, N. J., 180
MacDuffee, C. C., 70
MacKenzie, G. S., 11
MacKenzie, W. A., 11, 225, 330
MacLaurin, C., 15
MacNeil, I. B., 260
MacPherson, R. D., 12
Madansky, A., 341
Madden, T., 225
Majewski, W., 11
Malevich, T. L., 260
Malinvaud, E., 324
Mallows, C. L., 341
Mann, H. B., 13, 185, 204, 414
Manwell, T., 180
Markov, A. A., 36, 45, 188, 396
Martin, W. T., 43, 76, 399
Maruyama, G., 98
Mathews, M. V., 12, 75, 108, 340
Maxwell, J. C., 11
McGahan, L. C., 11
McGucken, W., 10
McLeod, M. G., 11
McNeil, D. R., 165
McShane, E. J., 38
Medgyessy, P., 69
Meecham, W. C., 11, 38
Meggers, W. F., 10
Middleton, D., 11, 229
Miller, K. S., 90
Miller, R. G., 209, 229
Miyata, M., 390
Montgomery, J., 1
Moore, C. N., 11, 163
Moore, C. R., 54
Morgenstern, O., 180
Morrison, D. F., 289, 340, 372
Morrow, R. E., 64
Mortensen, R. E., 38
Mowat, W. M. H., 1
Munk, W. H., 11, 26, 181, 225
Murakimi, S., 11, 226
Murthy, V. K., 263
Nagy, B. Sz., 400
Nakamura, H., 11, 226
Naylor, T. H., 226
Nerlove, M., 12, 180, 226
Nessel, R. J., 57
Nettheim, N., 266, 309
Neudecker, H., 288
Newton, H. W., 138
Newton, I., 10
Nicholls, D. F., 174
Nisio, M., 38, 39
Nissen, D. H., 288
Noll, A. M., 11
Nyquist, H., 179
Obukhov, A. M., 372
Okamoto, M., 75, 340, 454
Olkin, I., 85, 291, 341
Olshen, R. A., 128
Oswald, J. R. V., 33
Panofsky, H. A., 225
Papoulis, A., 17

Parseval-Deschenes, M. A., 163, 432
Parthasarathy, K. R., 22, 98, 264
Parzen, E., 12, 54-56, 60, 120, 150, 159, 161, 164, 165, 167, 252, 298, 309, 313, 324
Pearson, E. S., 191
Pearson, K., 113, 257, 292
Philipp, W., 94, 98
Picinbono, B., 97
Pickands, J., 165
Piersol, A., 208
Pierson, W. J., Jr., 221
Pinsker, M. S., 11, 348, 377, 384
Pisarenko, V. F., 165, 166, 225
Pitcher, T. S., 94
Plagemann, S. H., 180
Platt, D. C., 1
Poisson, S. D., 47, 55, 57, 91, 124
Pollak, H. O., 146
Polya, G., 415
Portmann, W. O., 85
Posner, E. C., 64
Poulter, T. C., 5
Pratt, J. W., 291
Press, H., 9, 11, 54, 159, 179, 180
Priestley, M. B., 160, 174, 176, 324
Pupin, M. I., 11, 163
Pye, K., 12
Quenouille, M. H., 13
Raikov, D., 76, 78, 79, 287, 400
Rankin, B., 43
Rao, C. R., 75, 108, 229, 289, 330, 332, 340, 372, 414
Rao, M. M., 12
Richter, C. P., 12
Ricker, N., 181, 338
Riemann, B., 55, 57
Riesz, F., 55, 57, 400
Roberts, J. B., 118
Robinson, E. A., 10, 181, 225, 331, 338
Robinson, G., 64
Rodemich, E. R., 165
Rodriguez-Iturbe, I., 225
Root, W. L., 94
Rosenberg, M., 31
Rosenblatt, M., 5, 9, 26, 38, 94, 97, 128, 150, 160, 161, 173, 176, 225, 252, 254, 259, 262
Ross, K. A., 8
Rozanov, Yu. A., 43, 348
Russell, C. T., 11
Sacia, C. F., 11
Salem, R., 98
Sande, G., 64, 67
Sargent, T. J., 180
Sasser, W. E., 226
Sato, H., 11
Satterthwaite, F. E., 145
Saxena, A. K., 90
Scheffe, H., 229, 276, 277
Schoenberg, I. J., 64, 73, 82
Schur, I., 283
Schuster, A., 9, 10, 11, 173, 181
Schwabe, H. S., 138
Schwartz, J. T., 348
Schwartz, L., 27, 82
Schwarz, H. A., 18, 151, 262, 421, 426, 460
Schwerdtfeger, H., 72
Sears, F. W., 11
Shapiro, H. S., 57
Shilov, G., 76, 78, 79, 287, 400
Shiryaev, A. N., 20, 21, 26, 38, 41, 94, 97
Shumway, R. H., 279
Siegel, A., 11, 38
Simon, M., 180
Simpson, S. M., 10
Singleton, R. C., 5, 66
Siotani, M., 287
Skorokhod, A. V., 12, 423
Slepian, D., 12, 146
Slutsky, E., 9, 10, 128, 170
Smith, E. J., 11
Snodgrass, F. E., 181
Sobel, M., 192
Solodovnikov, V. V., 11, 298, 331
Srivastava, M. S., 90
Stegun, I. A., 191, 291, 334
Stieltjes, T. J., 394, 401
Stigum, B. P., 12
Stockham, T. G., Jr., 67
Stokes, G. G., 9, 69
Stone, R., 354, 400
Straf, M. L., 259
Striebel, C., 12
Stuart, A., 20, 188, 253, 289, 292, 314, 323, 372, 420
Student, 253, 254
Stumpff, K., 10, 64
Sugiyama, G., 341

Page 516: David R. Brillinger Time Series Data Analysis and Theory 2001

AUTHOR INDEX

Suhara, K., 12, 180
Suzuki, H., 12, 180
Szegő, G., 84, 415

Takeda, S., 11, 226
Tate, R. F., 289
Taylor, B., 199, 416, 424, 439
Theodorescu, R., 94, 98
Tick, L. J., 142, 298, 301, 330
Timan, A. F., 57, 445
Toeplitz, O., 72, 73, 74, 108
Topper, L., 11
Tukey, J. W., viii, 9, 10, 11, 26, 32, 54, 55, 57, 65, 120, 146, 150, 159, 161, 177, 179, 180, 199, 225, 309, 329, 341, 439, 449
Tumura, Y., 341, 374

de la Vallée Poussin, C., 55, 57
Van der Pol, B., 11
Van Ness, J. S., 9, 154, 265, 406, 408, 445
Varadhan, S. R. S., 22
Vok, C. A., 292, 317
Volterra, V., 38
Von Mises, R., 43

Wahba, G., 245, 295, 305, 309, 319, 336
Waikar, V. B., 341
Wald, A., 13, 185, 204, 414
Waldmeier, M., 5
Walker, A. M., 128, 172, 174, 183, 264
Wallace, W. H., 226
Ward, F., 180
Watts, D. G., 298, 309, 313
Wedderburn, J. H. M., 71, 72
Wegel, R. L., 11, 163
Weierstrass, K., 55
Welch, P. D., 64, 164
Weyl, H., 422
Whittaker, E. T., 64
Whittle, P., 5, 9, 12, 13, 73, 98, 166, 174, 181, 264, 289, 331, 348
Wielandt, H., 74, 456
Wiener, N., 8, 9, 10, 11, 12, 38, 41, 43, 76, 81, 82, 99, 180, 181, 298, 331, 348
Wiesner, J. B., 11
Wilk, M. B., 126
Wilkins, J. E., 57
Wilkinson, J. H., 74, 456, 458
Wilks, S. S., 389
Williams, E. J., 289
Wintner, A., 43, 99
Wishart, J., 90, 113, 238, 240, 245, 246, 251, 252, 342, 352
Wold, H. O. A., 8, 10, 13, 43, 81, 121
Wolfe, E., 11
Wong, E., 36
Wonnacott, T., 298
Wood, L. C., 60
Wooding, R. A., 89
Woodroofe, M. B., 154, 265, 406, 408, 445
Wright, W. D., 10, 180
Wright, W. W., 12

Yaglom, A. M., 18, 354
Yamanouchi, Y., 11, 180, 191, 221, 263, 309, 330
Yevjevich, V., 225
Young, G., 75
Yule, G. U., 5
Yuzuriha, T., 12, 180

Zaremba, S. K., 172
Zetler, B. D., 225
Zygmund, A., 8, 50, 98, 394, 406


SUBJECT INDEX

Acoustics, 11
Adjustment, seasonal, 180, 209
Algorithm, Fast Fourier, 13, 65, 67, 88, 132, 160, 167, 212, 222, 255
Alias, 177
Aliasing, 267
Analysis, canonical, 368, 391
  cross-spectral, 225, 226
  factor, 354
  Fourier, 8, 26
  frequency, 10, 11, 12, 34, 179
  generalized harmonic, 41
  harmonic, 7, 8, 10
  multiple regression, 222
  power spectral, 179
  principal component, 366, 367
  regression, 301
  spectral, 10
Approach, functional, 41, 43, 80, 100
  stochastic, 41, 43, 100

Argument, 17

Bandwidth, 32, 54, 57, 157, 158, 164, 165, 350

Bias, 154, 158
Biology, 12
Bispectrum, 26

Classification, balanced one-way, 276
Coefficient, canonical correlation, 376
  complex regression, 292, 296, 300, 322, 330, 332, 336
  filter, 317
  Fourier, 50
  partial complex regression, 297
  regression, 289
  squared sample multiple correlation, 189
  vector alienation, 390
  vector correlation, 335, 390
Coherence, 214, 257, 275, 325, 329, 332, 333, 382
  canonical, 382, 390
  multiple, 219, 296, 302, 307, 310, 312, 317, 331, 334
  partial, 311, 312
Coherency, 257, 297, 330, 347, 364
  intraclass, 277
  partial, 297, 300, 302, 306, 311
Color, 180
Comb, Dirac, 17, 26, 101
  Kronecker, 17
Communicate, 20
Component, frequency, 104, 117, 353
  principal, 106, 107, 108, 337, 339, 340, 342
Consistent, 149, 168, 176, 182
Convergence, in distribution, 258
  weak, 258




Convolution, 61, 67
Correlation, 16, 289, 330, 332
  canonical, 289, 372
  conditional, 290
  multiple, 289, 293, 300, 302, 333, 336
  partial, 289, 291, 293, 295, 302, 335
Cosinusoid, 8, 28, 35, 40, 62, 81, 104
Covariance, 16
  partial, 289, 293, 335, 336
Cross-periodogram, 9, 306, 327
Cross-spectrum, 23, 233
  partial, 297
Cumulant, 19, 341

Decomposition, singular value, 72, 87
Delay, group, 304, 329
Delta, Dirac, 17, 100
  Kronecker, 17, 148
Demodulate, complex, 33
Demodulation, complex, 32, 47
Design, experimental, 276
  filter, 58
Determinant, 16
Discrimination, 180, 391
Distribution, asymptotic, 88
  complex normal, 89, 109
  complex Wishart, 90, 342
  finite dimensional, 18
  normal, 89, 332
  Schwartz, 82
  Student's t, 253
  uniform, 336
  Wishart, 90, 314
Domain, frequency, 13, 94, 337
  time, 13

Economics, 12
Electroencephalogram (EEG), 180
Engineering, electrical, 11
Equation, integral, 226
  stochastic difference, 38
  stochastic differential, 38
Equations, simultaneous, 323, 324
Errors, in variables, 323
Estimate, best linear, 321
  consistent, 146, 306
  least squares, 174, 185, 188, 321
  maximum likelihood, 112, 183, 190, 330
  nonlinear, 326
  spectral, 130, 142
  spectral measure, 170
Estimation, spectral by prefiltering, 159
Expansion, perturbation, 334
  power series, 75, 77
  Taylor, 199
  Volterra functional, 38, 40
Expected value, 16
Extension, period T, 65, 66, 83

Factor, convergence, 52, 54, 55, 90
Filter, 8, 16, 27, 337, 344
  band-pass, 32, 97, 104, 117, 162, 176
  digital, 60
  inverse, 30
  linear, 28
  low-pass, 32, 58
  summable, 30
  matched, 299
  nonsingular, 30
  optimum linear, 295, 299
  realizable, 29, 78, 85
  stable, 48
  summable, 29
Formula, Euler-MacLaurin, 15
  Poisson summation, 47, 91, 124
FORTRAN, 66
Frequency, 8, 40
  angular, 23
  folding, 179
  Nyquist, 179
  radian, 23
  unknown, 69

Function, autocorrelation, 18
  autocovariance, 18, 22, 119, 166
  Bessel, 113, 114, 229
  characteristic, 39, 69
  circular autocovariance, 167
  cross-correlation, 18
  cross-covariance, 18, 221, 232
  fixed, 88
  generalized, 17, 27
  holomorphic, 75, 85
  hypergeometric, 229
  joint cumulant, 21
  linear discriminant, 391
  matrix-valued, 86
  mean, 18
  measurable, 43
  random, 8, 18
  real holomorphic, 77, 85
  sample autocovariance, 169
  stochastic, 8



  transfer, 28, 187, 196, 345
  transition probability, 36

Gain, 302, 310, 325, 358, 361
Geophysics, 11, 225
Group, finite, 64
Harmonic, first, 138
Hook, 20
Identity, 16
Inequality, Schwarz, 18, 151, 262
Inference, 12
Integral, stochastic, 102
Interval, confidence, 151
Inverse, generalized, 87
Isomorphism, 71
Kernel, 54, 155
  Fejér, 132
Law, iterated logarithm, 98
Law of large numbers, 99
Least squares, 188, 221
Limits, confidence, 252, 352, 387
Loop, feed-back, 324

Matrix, 16, 70
  block circulant, 84
  circulant, 73
  complex, 70
  conditional spectral density, 336
  error spectral density, 307, 381
  finite Toeplitz, 72, 108
  Hermitian, 70, 71, 287
  Jacobian, 75, 85
  non-negative definite, 70, 287
  spectral density, 24, 233, 242, 247, 333
  unitary, 70, 71, 346
Mean, Cesàro, 14
  sample, 83
Measure, probability, 41
  spectral, 25, 166, 168
Medicine, 12
Meteorology, 225
Mixing, 8, 9
Model, parametric, 166
Modulus, 17
Monomial, 62, 63
Motion, Brownian, 283

Normal, asymptotically, 90, 228, 340
  complex multivariate, 89, 313
Notation, Landau, 52
Numbers, sunspot, 5, 127, 137, 138, 141, 153, 170, 171
Oceanography, 225, 390
Operation, linear, 27
  time invariant, 28, 80
Order (m, M), 37
Orthogonal, 18

Part, integral, 84
Partition, indecomposable, 20
Path, sample, 18
Periodicity, hidden, 9, 173, 181
Periodogram, 9, 120, 128
  cross-, 9
  kth order, 9
  second-order, 120, 235
  smoothed, 131
  third-order, 9
Permanent, 110
Phase, 302, 310, 325, 358, 361
Phenomenon, Gibbs', 52
Physics, 10
Plot, chi-squared probability, 126, 141
  normal probability, 96, 97
Polynomial, Bernoulli, 15
  trigonometric, 57, 62
Power, instantaneous, 118
Prediction, 300
  linear, 78, 181, 331
Predictor, best linear, 289, 331, 332, 336
Prefiltering, 154, 220, 318, 329, 330, 349
Prewhitening (prefiltering), 159, 266
Probability, initial, 36
Probability 1, 43, 98
Procedure, jack-knife, 374
Process, autoregressive (autoregressive scheme), 335
  Brownian motion, 114
  circular, 110
  ergodic, 43, 45
  Gaussian, 9, 284
  linear, 31, 35, 39, 100, 319
  m dependent, 335
  Markov, 36, 45
  mixed moving average and autoregressive, 37
  point, 165
  white noise, 332



Product, Kronecker, 288
Program, computer, 66, 322, 389
Psychology, 12
Psychometrician, 354
Rainfall, English and Welsh, 121, 122, 139, 140

Rank, 46
Ratio, signal to noise, 299
Realization, 18, 331
Region, confidence, 154, 206, 314
  multiple confidence, 229
Regression, 188, 367
Representation, Cramér, 100, 102, 106, 234, 345, 354
  spectral, 25, 80, 81
Resolution, 166
Response, impulse, 29, 204, 223
Root, latent (latent value), 69, 70, 76, 84, 107, 165, 339-341, 343, 366, 378

Sampling, jittered, 165
Scheme, autoregressive, 37, 77, 84, 159, 164, 184, 320, 321, 324
Seismology, 1, 225, 338
Semi-invariant (cumulant), 20
Series, canonical variate, 379, 382
  continuous, 177
  dependent, 186
  deterministic, 186
  discrete, 177
  error, 186, 345, 369
  fixed, 82, 186, 231
  Fourier, 49
  Gaussian, 36, 110, 165, 283, 298, 324, 366
  independent, 186
  index number, 338
  instrumental, 323, 324
  principal component, 344-348, 351, 353, 357
  pure noise, 35, 39, 141, 180
  residual, 174
  stationary, 35
  stationary Gaussian, 36, 39, 167
  stochastic, 82, 186
  time, 1, 18
  trigonometric, 53
  white noise (white noise process), 321, 332
Signal, 299
Sinusoid, 8
Smoothing, 181
Spacing, general time, 178
Spectrum, 70, 74, 109
  amplitude, 24
  co-, 24, 234, 279
  cross-, 23, 233
  cumulant, 25, 34, 39, 92
  error, 186, 196, 227, 296, 300, 330
  mixed, 173
  phase, 24
  power, 11, 23, 74, 116-119, 177, 179
  quadrature, 24, 234, 279
  residual, 180
  second-order, 23, 232, 260

Stationary, 8, 22
  second-order, 22
  strictly, 22, 42
  wide sense, 22
Statistic, descriptive, 10, 179
  sufficient, 12, 111
  Wilks' Λ, 389
Stochastic process, 8, 18
Stochastics, 17
Sum, partial, 52
System, structural equation, 324

Taper, 54, 91, 124, 150, 156, 364
  cosine, 151
Temperatures
  Basel, 2, 271-275, 355-365
  Berlin, 2, 209-219, 239-242, 267-275, 327-330, 355-365
  Breslau, 2, 271-275, 355-365
  Budapest, 2, 271-275, 355-365
  Copenhagen, 2, 271-275, 355-365
  De Bilt, 2, 271-275, 355-365
  Edinburgh, 2, 271-275, 355-365
  Greenwich, 2, 271-275, 355-365
  New Haven, 2, 271-275, 355-365
  Prague, 2, 271-275, 355-365
  Stockholm, 2, 271-275, 355-365
  Trondheim, 2, 271-275, 355-365
  Vienna, 2, 95-97, 209-219, 240-242, 268-275, 327-330, 355-365
  Vilna, 2, 271-275, 355-365
Testing, informal, 180
Theorem, central limit, 94, 95
  Gauss-Markov, 188
  Kolmogorov extension, 42



  Spectral, 72
  Wielandt-Hoffman, 74

Trajectory, 18
Transform, discrete Fourier, 63, 67, 70, 73, 221
  fast Fourier, 64, 68, 120, 142, 162, 262
  finite Fourier, 9, 60, 69, 88, 90, 94, 105, 235, 299
  Fourier, 49, 75, 123, 164, 332
  Fourier-Stieltjes, 85
  Hilbert, 32, 59, 60, 104, 234, 301, 353, 389
Transformation, Abel, 15
  variance stabilizing, 150, 311, 314, 329
Transient, 227
Transitive, metrically, 43
Transpose, 16, 70
Trend, 43, 44, 174, 176
Trispectrum, 26
Turbulence, 11
Value, extremal, 70
  latent, 69, 70, 76, 84, 107, 287, 342
Variance, 16
Variate, canonical, 371, 372, 376, 377, 388
  chi-squared, 126, 145
  complex t, 192
  error, 289, 301
  exponential, 126
  F, 189
  multivariate t, 192, 291
  noncentral chi-squared, 127
  noncentral F, 196, 228
  normal, 20, 112, 289, 332
  t, 184
  uniform, 111, 182, 333
Vector, 16
  latent, 70, 165, 339, 340, 342, 344, 366
Window, data, 54, 91
  frequency, 54


ADDENDUM
Fourier Analysis of Stationary Processes

Reprinted with permission from Proceedings of the IEEE, Volume 62, No. 12, December 1974. Copyright © 1974 by The Institute of Electrical and Electronics Engineers, Inc.

Abstract - This paper begins with a description of some of the important procedures of the Fourier analysis of real-valued stationary discrete time series. These procedures include the estimation of the power spectrum, the fitting of finite parameter models, and the identification of linear time invariant systems. Among the results emphasized is the one that the large sample statistical properties of the Fourier transform are simpler than those of the series itself. The procedures are next generalized to apply to the cases of vector-valued series, multidimensional time series or spatial series, point processes, random measures, and finally to stationary random Schwartz distributions. It is seen that the relevant Fourier transforms are evaluated by different formulas in these further cases, but that the same constructions are carried out after their evaluation and the same statistical results hold. Such generalizations are of interest because of current work in the fields of picture processing and pulse-code modulation.

This invited paper is one of a series planned on topics of general interest - The Editor.

Manuscript received June 7, 1974; revised August 13, 1974. This paper was prepared while the author was a Miller Research Professor and was supported by NSF under Grant GP-31411.

The author is with the Department of Statistics, University of California, Berkeley, Calif. 94720.

I. INTRODUCTION

THE Fourier analysis of data has a long history, dating back to Stokes [1] and Schuster [2], for example. It has been done by means of arithmetical formulas (Whittaker and Robinson [3], Cooley and Tukey [4]), by means of a mechanical device (Michelson [5]), and by means of real-time filters (Newton [6], Pupin [7]). It has been carried out on discrete data, such as monthly rainfall in the Ohio valley (Moore [8]), on continuous data, such as radiated light (Michelson [5]), on vector-valued data, such as vertical and horizontal components of wind speed (Panofsky and McCormick [9]), on spatial data, such as satellite photographs (Leese and Epstein [10]), on point processes, such as the times at which vehicles pass a position on a road (Bartlett [11]), and on point processes in space, such as the positions of pine trees in a field (Bartlett [12]). It has even been carried out on the logarithm of a Fourier transform (Oppenheim et al. [13]) and on the logarithm of a power spectrum estimate (Bogert et al. [14]).

The summary statistic examined has been: the Fourier transform itself (Stokes [1]), the modulus of the transform (Schuster [2]), the smoothed modulus squared (Bartlett [15]), the smoothed product of two transforms (Jones [16]), and the smoothed product of three transforms (Hasselman et al. [17]).

The summary statistics are evaluated in an attempt to measure population parameters of interest. Foremost among these parameters is the power spectrum. This parameter was initially defined for real-valued time phenomena (Wiener [18]). In recent years it has been defined and shown useful for spatial series, point processes, and random measures as well. Our development in this paper is such that the definitions set down and mathematics employed are virtually the same for all of these cases.

Our method of approach to the topic is to present first an extensive discussion of the Fourier analysis of real-valued discrete-time series, emphasizing those aspects that extend directly to the cases of vector-valued series, of continuous spatial series, of point processes, and finally of random distributions. We then present extensions to the processes just indicated. Throughout, we indicate aspects of the analysis that are peculiar to the particular process under consideration. We also mention higher order spectra and nonlinear systems. Wold [19] provides a bibliography of papers on time series analysis written prior to 1960. Brillinger [20] presents a detailed description of the Fourier analysis of vector-valued discrete-time series.

We now indicate several reasons that suggest why Fourier analysis has proved so useful in the analysis of time series.

II. WHY THE FOURIER TRANSFORM?

Several arguments can be advanced as to why the Fourier transform has proved so useful in the analysis of empirical functions. For one thing, many experiments of interest have the property that their essential character is not changed by moderate translations in time or space. Random functions produced by such experiments are called stationary. (A definition of this term is given later.) Let us begin by looking for a class of functions that behave simply under translation. If, for example, we wish

f(t + u) = C_u f(t),    for all t, u,

Page 524: David R. Brillinger Time Series Data Analysis and Theory 2001

with C_1 ≠ 0, then by recursion f(t) = C_1^t f(0) for t > 0, and so f(t) = f(0) exp {αt} for α = ln C_1. If f(t) is to be bounded, then α = iλ, for i = √−1 and λ real. We have been led to the functions exp {iλt}. Fourier analysis is concerned with such functions and their linear combinations.

On the other hand, we might note that many of the operations we would like to apply to empirical functions are linear and translation invariant, that is, such that if X_1(t) → Y_1(t) and X_2(t) → Y_2(t) then α_1 X_1(t) + α_2 X_2(t) → α_1 Y_1(t) + α_2 Y_2(t), and if X(t) → Y(t) then X(t − u) → Y(t − u). Such operations are called linear filters. It follows from these conditions that if X(t) = exp {iλt} → Y_λ(t) then

X(t + u) = exp {iλu} X(t) → exp {iλu} Y_λ(t) = Y_λ(t + u).

Setting u = t, t = 0 gives Y_λ(t) = exp {iλt} Y_λ(0). In summary, exp {iλt}, the complex exponential of frequency λ, is carried over into a simple multiple of itself by a linear filter. A(λ) = Y_λ(0) is called the transfer function of the filter. If the function X(t) is a Fourier transform, X(t) = ∫ exp {iαt} x(α) dα, then from the linearity (and some continuity) X(t) → ∫ exp {iαt} A(α) x(α) dα. We see that the effect of a linear filter is easily described for a function that is a Fourier transform.

In the following sections, we will see another reason for dealing with the Fourier transforms of empirical functions, namely, in the case that the functions are realizations of a stationary process, the large sample statistical properties of the transforms are simpler than the properties of the functions themselves.

Finally, we mention that with the discovery of fast Fourier transform algorithms (Cooley and Tukey [4]), the transforms may often be computed exceedingly rapidly.
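The transfer-function property of this section can be checked numerically. The following is a minimal sketch (not part of the original text), assuming numpy; the three filter coefficients are arbitrary, and circular convolution is used so the identity is exact for a periodic input:

```python
import numpy as np

# An arbitrary short filter: Y(t) = sum_u a(u) X(t - u).
a = np.array([0.5, 0.3, 0.2])

def apply_filter(x):
    # Circular convolution; exact for inputs whose period divides len(x).
    y = np.zeros(len(x), dtype=complex)
    for u, coef in enumerate(a):
        y += coef * np.roll(x, u)
    return y

def A(lam):
    # Transfer function A(lam) = sum_u a(u) exp(-i*lam*u).
    return np.sum(a * np.exp(-1j * lam * np.arange(len(a))))

T = 64
lam = 2 * np.pi * 5 / T
t = np.arange(T)
x = np.exp(1j * lam * t)            # complex exponential of frequency lam

# The filter carries exp(i*lam*t) into A(lam) * exp(i*lam*t):
ok = np.allclose(apply_filter(x), A(lam) * x)
```

Any other frequency of the form 2πs/T would serve equally well here.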

III. STATIONARY REAL-VALUED DISCRETE-TIME SERIES

Suppose that we are interested in analyzing T real-valued measurements made at the equispaced times t = 0, ⋯, T − 1. Suppose that we are prepared to model these measurements by the corresponding values of a realization of a stationary discrete-time series X(t), t = 0, ±1, ±2, ⋯. Important parameters of such a series include its mean

c_X = E X(t),

giving the average level about which the values of the series are distributed, and its autocovariance function

c_XX(u) = cov {X(t + u), X(t)},    u = 0, ±1, ±2, ⋯,





providing a measure of the degree of dependence of values of the process |u| time units apart. (These parameters do not depend on t because of the assumed stationarity of the series.) In many cases of interest the series is mixing, that is, such that values well separated in time are only weakly dependent in a formal statistical sense to be described later. Suppose, in particular, that c_XX(u) → 0 sufficiently rapidly as |u| → ∞ for

f_XX(λ) = (2π)^{−1} Σ_u c_XX(u) exp {−iλu},    −∞ < λ < ∞    (3)

to be defined. The parameter f_XX(λ) is called the power spectrum of the series X(t) at frequency λ. It is symmetric about 0 and has period 2π. The definition (3) may be inverted to obtain the representation

c_XX(u) = ∫_{−π}^{π} exp {iλu} f_XX(λ) dλ    (4)

of the autocovariance function in terms of the power spectrum.

If the series X(t) is passed through the linear filter Y(t) = Σ_u a(t − u) X(u) with well-defined transfer function A(λ) = Σ_u a(u) exp {−iλu}, then we can check that

c_YY(u) = Σ_v Σ_w a(v) a(w) c_XX(u − v + w)    (5)

and, by taking Fourier transforms, that

f_YY(λ) = |A(λ)|² f_XX(λ)    (6)

under some regularity conditions. Expression (6), the frequency domain description of linear filtering, is seen to be much nicer than (5), the time-domain description.

Expressions (4) and (6) may be combined to obtain an interpretation of the power spectrum at frequency λ. Suppose that we consider a narrow band-pass filter at frequency λ having transfer function

A(α) = 1 for |α − λ| ≤ Δ or |α + λ| ≤ Δ, and A(α) = 0 otherwise,

with Δ small. Then the variance of the output series Y(t) of the filter is given by

var Y(t) = ∫ |A(α)|² f_XX(α) dα ≈ 4Δ f_XX(λ).    (7)

In words, the power spectrum of the series X(t) at frequency λ is proportional to the variance of the output of a narrow band-pass filter of frequency λ. In the case that λ ≠ 0, ±2π, ±4π, ⋯ the mean of the output series is 0 and the variance of the output series is the same as its mean-squared value. Expression (7) shows incidentally that the power spectrum is nonnegative.

We mention, in connection with the representation (4), that Khintchine [21] shows that for X(t) a stationary discrete time series with finite second order moments, we necessarily have

c_XX(u) = ∫_{−π}^{π} exp {iλu} dF_XX(λ)    (8)

where F_XX(λ) is a monotonic nondecreasing function. F_XX(λ) is called the spectral measure. Its derivative is the power spectrum. Going along with (8), Cramér [22] demonstrated that the series itself has a Fourier representation

X(t) = ∫_{−π}^{π} exp {iλt} dZ_X(λ)    (9)

where Z_X(λ) is a random function with the properties

E dZ_X(λ) = η(λ) c_X dλ    (10)

cov {dZ_X(λ), dZ_X(μ)} = η(λ − μ) f_XX(λ) dλ dμ.    (11)

(In these last expressions, if δ(λ) is the Dirac delta function then η(λ) = Σ_j δ(λ − 2πj) is the Kronecker comb. Also, expression (11) concerns the covariance of two complex-valued variates; such a covariance is defined by cov {X, Y} = E{(X − EX)(Y − EY)*}, the asterisk denoting the complex conjugate.) Expression (9) writes the series X(t) as a Fourier transform. We can see that if the series X(t) is passed through a linear filter with transfer function A(λ), then the output series has Fourier representation

Y(t) = ∫_{−π}^{π} exp {iλt} A(λ) dZ_X(λ).

These remarks show that the finite Fourier transform may be interpreted as, essentially, the result of narrow band-pass filtering the series.

Before presenting a second interpretation, we first remark that the sample covariance of pairs of values X(t), Y(t), t = 0, 1, ⋯, T − 1 is given by T^{−1} Σ_t X(t) Y(t), when the Y(t) values have 0 mean. This quantity is a measure of the degree of linear relationship of the X(t) and Y(t) values. The finite Fourier transform is essentially, then, the sample covariance between the X(t) values and the complex exponential of frequency λ. It provides some measure of the degree of linear relationship of the series X(t) and phenomena of exact frequency λ.

In Section XV, we will see that the first and second-order relations (10), (11) may be extended to kth order relations with the definition of kth order spectra.

IV. THE FINITE FOURIER TRANSFORM

Let the values of the series X(t) be available for t = 0, 1, 2, ⋯, T − 1, where T is an integer. The finite Fourier transform of this stretch of series is defined to be

d_X^(T)(λ) = Σ_{t=0}^{T−1} X(t) exp {−iλt}.    (12)

A number of interpretations may be given for this variate. For example, suppose we take a linear filter with transfer function concentrated at the frequency λ, namely A(α) = δ(α − λ). The corresponding time domain coefficients of this filter are

a(u) = (2π)^{−1} exp {iλu}.

The output of this filter is the series

(2π)^{−1} Σ_u exp {iλ(t − u)} X(u) = (2π)^{−1} exp {iλt} d_X^(T)(λ),

the last equality holding when X(u) is taken to be 0 outside t = 0, ⋯, T − 1.



In the case that λ = 0, the finite Fourier transform (12) is the sample sum. The central limit theorem indicates conditions under which a sum of random variables is asymptotically normal as the sample size grows to ∞. Likewise, there are theorems indicating that d_X^(T)(λ) is asymptotically normal as T → ∞. Before indicating some aspects of these theorems we set down a definition. A complex-valued variate w is called complex normal with mean 0 and variance σ² when its real and imaginary parts are independent normal variates with mean 0 and variance σ²/2. The density function of w is proportional to exp {−|w|²/σ²}. The variate |w|² is exponential with mean σ² in this case.

In the case that the series X(t) is stationary, with finite second-order moments, and mixing (that is, well-separated values are only weakly dependent), the finite Fourier transform has the following useful asymptotic properties as T → ∞:

a) d_X^(T)(0) − T c_X is asymptotically normal with mean 0 and variance 2πT f_XX(0);

b) for λ ≠ 0, ±π, ±2π, ⋯, d_X^(T)(λ) is asymptotically complex normal with mean 0 and variance 2πT f_XX(λ);

c) for s_j(T), j = 1, ⋯, J, integers with λ_j(T) = 2π s_j(T)/T → λ ≠ 0, ±π, ±2π, ⋯, the variates d_X^(T)(λ_1(T)), ⋯, d_X^(T)(λ_J(T)) are asymptotically independent complex normals with mean 0 and variance 2πT f_XX(λ);

d) for λ ≠ 0, ±π, ±2π, ⋯ and U = T/J an integer, the variates

d_j^(U)(λ) = Σ_{t=jU}^{(j+1)U−1} X(t) exp {−iλt},    j = 0, ⋯, J − 1,

are asymptotically independent complex normals with mean 0 and variance 2πU f_XX(λ).

These results are developed in Brillinger [20]. Related results are given in Section XV and proved in the Appendix. Other references include: Leonov and Shiryaev [23], Picinbono [24], Rosenblatt [25], Brillinger [26], Hannan and Thomson [27]. We have seen that exp {iλt} d_X^(T)(λ) may be interpreted as the result of narrow band-pass filtering the series X(t). It follows that the preceding result b) is consistent with the "engineering folk" theorem to the effect that narrow band-pass noise is approximately Gaussian.
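At the particular frequencies λ = 2πs/T, s an integer, the finite Fourier transform (12) coincides with the sth coefficient of the discrete Fourier transform, which is what the fast algorithms compute. A minimal sketch of that identity (not from the text; it assumes numpy and an arbitrary noise stretch):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 256
x = rng.standard_normal(T)

# Finite Fourier transform (12): d(lam) = sum_{t=0}^{T-1} X(t) exp(-i*lam*t).
def d(lam):
    t = np.arange(T)
    return np.sum(x * np.exp(-1j * lam * t))

# At the frequencies lam = 2*pi*s/T this is exactly the s-th FFT coefficient.
dft = np.fft.fft(x)
s = 17
agree = np.allclose(d(2 * np.pi * s / T), dft[s])
```

At λ = 0 the transform reduces to the sample sum, as noted above.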

Result a) suggests estimating the mean c_X by

ĉ_X = T^{−1} d_X^(T)(0) = T^{−1} Σ_{t=0}^{T−1} X(t)

and approximating the distribution of this estimate by a normal distribution with mean c_X and variance 2π f_XX(0)/T. Result b) suggests estimating the power spectrum f_XX(λ) by the periodogram

I_XX^(T)(λ) = (2πT)^{−1} |d_X^(T)(λ)|²    (13)

in the case λ ≠ 0, ±2π, ⋯. We will say more about this statistic later. It is interesting to note, from c) and d), that asymptotically independent statistics with mean 0 and variance proportional to the power spectrum at frequency λ may be obtained by either computing the Fourier transform at particular distinct frequencies near λ or by computing them at the frequency λ but based on different time domains. We warn the reader that the results a)-d) are asymptotic. They are to be evaluated in the sense that they might prove reasonable approximations in practice when the domain of observation is large and when values of the series well separated in the domain are only weakly dependent.

On a variety of occasions we will taper the data before computing its Fourier transform. This means that we take a data window φ^(T)(t), vanishing for t < 0, t > T − 1, and compute the transform

d_X^(T)(λ) = Σ_t φ^(T)(t) X(t) exp {−iλt}    (14)

for selected values of λ. One intention of tapering is to reduce the interference of neighboring frequency components. If

Φ^(T)(α) = Σ_t φ^(T)(t) exp {−iαt}    (15)

then the Cramér representation (9) shows that (14) may be written

∫ Φ^(T)(λ − α) dZ_X(α).

From what we have just said, we will want to choose φ^(T)(t) so that Φ^(T)(α) is concentrated near α = 0, ±2π, ⋯. (One convenient choice of φ^(T)(t) takes the form φ(t/T) where φ(u) = 0 for u < 0, u > 1.) The asymptotic effect of tapering may be seen to be to replace the variance in b) by 2π Σ_t φ^(T)(t)² f_XX(λ).

Hannan and Thomson [27] investigate the asymptotic distribution of the Fourier transform of tapered data in a case where f_XX(λ) depends on T in a particular manner. The hope is to obtain better approximations to the distribution.
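The leakage-reduction effect of tapering can be sketched numerically. The fragment below (an illustration, not from the text; it assumes numpy) compares the window transform (15) of the untapered "boxcar" with that of a full cosine-bell data window, one instance of the form φ(t/T), at a frequency between Fourier frequencies:

```python
import numpy as np

T = 128
t = np.arange(T)
# Untapered data window: phi(t) = 1 on 0..T-1 (the "boxcar").
box = np.ones(T)
# A cosine-bell data window of the form phi(t/T):
bell = 0.5 - 0.5 * np.cos(2 * np.pi * (t + 0.5) / T)

def Phi(phi, alpha):
    # Magnitude of the window transform (15).
    return abs(np.sum(phi * np.exp(-1j * alpha * t)))

# Relative leakage 10.5 Fourier frequencies away from alpha = 0:
alpha = 10.5 * 2 * np.pi / T
leak_box = Phi(box, alpha) / Phi(box, 0.0)
leak_bell = Phi(bell, alpha) / Phi(bell, 0.0)
```

The tapered window concentrates its transform near α = 0 and leaks far less at distant frequencies, at the cost of the variance change noted above.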



V. ESTIMATION OF THE POWER SPECTRUM

In the previous section, we mentioned the periodogram, I_XX^(T)(λ), as a possible estimate of the power spectrum f_XX(λ) in the case that λ ≠ 0, ±2π, ⋯. If result b) holds true, then I_XX^(T)(λ), being a continuous function of d_X^(T)(λ), will be distributed asymptotically as |w|², where w is a complex normal variate with mean 0 and variance f_XX(λ). That is, I_XX^(T)(λ) will be distributed asymptotically as an exponential variate with mean f_XX(λ). From the practical standpoint this is interesting, but not satisfactory. It suggests that no matter how large the sample size T is, the variate I_XX^(T)(λ) will tend to be distributed about f_XX(λ) with an appreciable scatter. Luckily, results c) and d) suggest means around this difficulty. Following c), the variates I_XX^(T)(λ_j(T)), j = 1, ⋯, J, are distributed asymptotically as independent exponential variates with mean f_XX(λ). Their average

f̂_XX(λ) = J^{−1} Σ_{j=1}^{J} I_XX^(T)(λ_j(T))    (16)

will be distributed asymptotically as the average of J independent exponential variates having mean f_XX(λ). That is, it will be distributed as

f_XX(λ) χ²_{2J} / (2J)    (17)

where χ²_{2J} denotes a chi-squared variate with 2J degrees of freedom. The variance of the variate (17) is

U T^{−1} f_XX(λ)²    (18)

if U = T/J. By choice of J the experimenter can seek to obtain an estimate of which the sampling fluctuations are small enough for his needs. From the standpoint of practice, it seems to be useful to compute the estimate (16) for a number of values of J. This allows us to tailor the choice of J to the situation at hand and even to use different values of J for different frequency ranges. Result d) suggests our consideration of the estimate

f̂_XX(λ) = J^{−1} Σ_{j=0}^{J−1} (2πU)^{−1} |d_j^(U)(λ)|².    (19)

It too will have the asymptotic distribution (17) with variance (18).
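The averaged-periodogram estimate (16) can be sketched as follows (an illustration, not from the text; it assumes numpy and takes pure noise, so the true spectrum is the constant 1/(2π) and the quality of the estimate is easy to judge):

```python
import numpy as np

rng = np.random.default_rng(1)
T, J = 4096, 16
x = rng.standard_normal(T)        # pure noise: f_XX(lam) = 1/(2*pi)

# Periodogram ordinates (13) at the Fourier frequencies 2*pi*s/T:
I = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * T)

# Estimate (16): average J periodogram values at distinct frequencies
# near the frequency of interest (here near 2*pi*600/T).
s0 = 600
f_hat = I[s0: s0 + J].mean()

f_true = 1 / (2 * np.pi)          # flat spectrum of unit-variance noise
```

With J = 16 the estimate is distributed approximately as f_XX(λ) χ²₃₂/32, so its scatter about the true value is already modest; a single periodogram ordinate would scatter with the full exponential spread.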

We must note that it is not sensible to take J in (16) and (19) arbitrarily large as the preceding arguments might have suggested. It may be seen from (15) that

E I_XX^(T)(λ) = ∫ F^(T)(λ − α) f_XX(α) dα + a term in c_X    (20)

where

F^(T)(α) = (2πT)^{−1} |Σ_{t=0}^{T−1} exp {−iαt}|²

is the Fejér kernel. This kernel, or frequency window, is nonnegative, integrates to 1, and has most of its mass in the interval (−2π/T, 2π/T). The term in c_X may be neglected for λ ≠ 0, ±2π, ⋯ and T large. From (16) and (20) we now see that

E f̂_XX(λ) ≈ J^{−1} Σ_{j=1}^{J} ∫ F^(T)(λ_j(T) − α) f_XX(α) dα.    (21)

If we are averaging J periodogram values at frequencies 2π/T apart and centered at λ, then the bandwidth of the kernel of (21) will be approximately 4πJ/T. If J is large and f_XX(α) varies substantially in the interval −2πJ/T < α − λ < 2πJ/T, then the value of (21) can be very far from the desired f_XX(λ). In practice we will seek to have J large so that the estimate is reasonably stable, but not so large that it has appreciable bias. This same remark applies to the estimate (19). Parzen [28] constructed a class of estimates such that E f̂_XX(λ) → f_XX(λ) and var f̂_XX(λ) → 0. These estimates have an asymptotic distribution that is normal, rather than χ², Rosenblatt [29]. Using the notation preceding, these estimates correspond to having J depend on T in such a way that J_T → ∞, but J_T/T → 0 as T → ∞.

Estimates of the power spectrum have proved useful: i) as simple descriptive statistics, ii) in informal testing and discrimination, iii) in the estimation of unknown parameters, and iv) in the search for hidden periodicities. As an example of i), we mention their use in the description of the color of an object, Wright [30]. In connection with ii) we mention the estimation of the spectrum of the seismic record of an event in an attempt to see if the event was an earthquake or a nuclear explosion, Carpenter [31], Lampert et al. [32]. In case iii), we mention that Munk and MacDonald [33] derived estimates of the fundamental parameters of the rotation of the Earth from the periodogram. Turning to iv), we remind the reader that the original problem that led to the definition of the power spectrum was that of the search for hidden periodicities. As a



modern example, we mention the examination of spectral estimates for the periods of the fundamental vibrations of the Earth, MacDonald and Ness [34].

VI. OTHER ESTIMATES OF THE POWER SPECTRUM

We begin by mentioning minor modifications that can be made to the estimates of Section V. The periodograms of (16) may be computed at frequencies other than those of the form 2πs/T, s an integer, and they may be weighted unequally. The periodograms of the estimate (19) may be based on overlapping stretches of data. The asymptotic distributions are not so simple when these modifications are made, but the estimate is often improved. The estimate (19) has another interpretation. We saw in Section IV that exp {iλt} d_X^(U)(λ) might be interpreted as the output of a narrow band-pass filter centered at λ. This suggests that (19) is essentially the first power spectral estimate widely employed in practice, the average of the squared output of a narrow band-pass filter (Wegel and Moore [35]). We next turn to a discussion of some spectral estimates of quite different character.

We saw in Section III that if the series X(t) was passed through a linear filter with transfer function A(λ), then the output series Y(t) had power spectrum given by f_YY(λ) = |A(λ)|² f_XX(λ). In Section V, we saw that the estimates (16), (19) could have substantial bias were there appreciable variation in the value of the population power spectrum. These remarks suggest a means of constructing an improved estimate, namely: we use our knowledge of the situation at hand to devise a filter, with transfer function A(λ), such that the output series Y(t) has spectrum nearer to being constant. We then estimate the power spectrum of the filtered series in the manner of Section V and take |A(λ)|^{−2} f̂_YY(λ) as our estimate of f_XX(λ). This procedure is called spectral estimation by prewhitening and is due to Tukey (see Panofsky and McCormick [9]). We mention that in many situations we will be content to just examine f̂_YY(λ). This would be necessary were A(λ) = 0.

One useful means of determining an A(λ) is to fit an autoregressive scheme to the data by least squares. That is, for some K, choose â(1), …, â(K) to minimize

where the summation extends over the available data. In this case Â(λ) = 1 + â(1) exp{-iλ} + … + â(K) exp{-iλK}. An algorithm for efficient computation of the â(u) is given in Wiener [36, p. 136]. This procedure should prove especially effective when the series X(t) is near to being an autoregressive

Page 533: David R. Brillinger Time Series Data Analysis and Theory 2001

512 ADDENDUM

scheme of order K. Related procedures are discussed in Grenander and Rosenblatt [37, p. 270], Parzen [38], Lacoss [39], and Burg [40]. Berk [41] discusses the asymptotic distribution of the estimate |Â(λ)|^{-2} (2πT)^{-1} Σ_t |X(t) + â(1)X(t - 1) + … + â(K)X(t - K)|². Its asymptotic variance is shown to be (18) with U = 2K.
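The least-squares autoregressive fit just described can be sketched as follows; plain numpy.linalg.lstsq is used here in place of the efficient recursion cited from Wiener [36], and the function name and frequency grid are illustrative.

```python
import numpy as np

def ar_spectrum(x, K, lams):
    """Fit X(t) + a(1)X(t-1) + ... + a(K)X(t-K) ~ residual by least
    squares, then return the autoregressive spectral estimate
    sigma2 / (2*pi*|A(lam)|^2), A(lam) = 1 + sum_u a(u) e^{-i lam u}."""
    T = len(x)
    # columns are X(t-1), ..., X(t-K) for t = K, ..., T-1
    Xmat = np.column_stack([x[K - u:T - u] for u in range(1, K + 1)])
    y = x[K:]
    # minimizing sum (X(t) + sum a(u) X(t-u))^2 is lstsq against -X(t)
    coef, *_ = np.linalg.lstsq(Xmat, -y, rcond=None)   # a(1), ..., a(K)
    resid = y + Xmat @ coef
    sigma2 = np.mean(resid ** 2)
    u = np.arange(1, K + 1)
    A = 1 + np.exp(-1j * np.outer(lams, u)) @ coef     # A(lam) on the grid
    return sigma2 / (2 * np.pi * np.abs(A) ** 2)
```

For a series near an autoregressive scheme of order K this yields a smooth spectral estimate, in line with the remark above.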

Pisarenko [42] has proposed a broad class of estimates including the high resolution estimate of Capon [43] as a particular case. Suppose Σ̂ is an estimate of the covariance matrix of the variate

determined from the sample values X(0), …, X(T - 1). Suppose μ̂_u, α̂_u, u = 1, …, U, are the latent roots and vectors of Σ̂. Suppose H(μ), 0 < μ < ∞, is a strictly monotonic function with inverse h(·). Pisarenko proposed the estimate

He presents an argument indicating that the asymptotic variance of this estimate is also (18). The hope is that it is less biased. Its character is that of a nonlinear average of periodogram values, in contrast to the simple average of (16) and (19). The estimates (16) and (19) essentially correspond to the case H(μ) = μ. The high resolution estimate of Capon [43] corresponds to H(μ) = μ^{-1}.

The autoregressive estimate, the high-resolution estimate, and the Pisarenko estimates are not likely to be better than an ordinary spectral estimate involving steps of prewhitening, tapering, naive spectral estimation, and recoloring. They are probably better than a naive spectral estimate for a series that is a sum of sine waves and noise.

VII. FINITE PARAMETER MODELS

Sometimes a situation arises in which we feel that the form of the power spectrum is known except for the value of a finite dimensional parameter θ. For example, existing theory may suggest that the series X(t) is generated by the mixed moving average autoregressive scheme

where U, V are nonnegative integers and ε(t) is a series of


FOURIER ANALYSIS OF STATIONARY PROCESSES 513

independent variates with mean 0 and variance σ². The power spectrum of this series is

with θ = (σ², a(1), …, a(K), b(1), …, b(L)). A number of procedures have been suggested for estimating the parameters of the model (23), see Hannan [44] and Anderson [45], for example.

The following procedure is useful in situations more general than the above. It is a slight modification of a procedure of Whittle [46]. Choose as an estimate of θ the value that maximizes

We may carry out the maximization of (25) by a number of computer algorithms, see the discussion in Chambers [47]. In [48] we used the method of scoring. Other papers investigating estimates of this type are Whittle [49], Walker [50], and Dzaparidze [51]. Expression (25) is the likelihood corresponding to the assumption that the periodogram values I_XX^{(T)}(2πs/T), 0 < s < T/2, are independent exponential variates with means f_XX(2πs/T; θ), 0 < s < T/2, respectively. Under regularity conditions we can show that the resulting estimate, θ̂, is asymptotically normal with mean θ and covariance matrix 2πT^{-1}A^{-1}(A + B)A^{-1}, where, if ∇f_XX(λ; θ) is the gradient vector with respect to θ and f_XXXX is the fourth-order cumulant spectrum (see Section XV),
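As a concrete sketch of maximizing (25), the criterion can be evaluated on a parameter grid for a one-parameter family of spectra. The AR(1) spectral form σ²/(2π|1 - a e^{-iλ}|²) and the crude grid search below are illustrative assumptions, not the scoring algorithm of [48].

```python
import numpy as np

def whittle_ar1(x):
    """Minimize over theta the sum, over 0 < s < T/2, of
        log f(2*pi*s/T; theta) + I(2*pi*s/T) / f(2*pi*s/T; theta),
    i.e. the negative of the Whittle log likelihood (25), for the
    illustrative AR(1) spectrum sigma2 / (2*pi*|1 - a e^{-i lam}|^2)."""
    T = len(x)
    d = np.fft.rfft(x - x.mean())
    s = np.arange(1, (T - 1) // 2 + 1)          # frequencies 0 < s < T/2
    I = (np.abs(d[s]) ** 2) / (2 * np.pi * T)   # periodogram values
    lam = 2 * np.pi * s / T
    best = None
    for a in np.linspace(-0.95, 0.95, 191):     # crude grid search over a
        denom = np.abs(1 - a * np.exp(-1j * lam)) ** 2
        sigma2 = np.mean(I * 2 * np.pi * denom)  # sigma2 profiled out in closed form
        f = sigma2 / (2 * np.pi * denom)
        nll = np.sum(np.log(f) + I / f)
        if best is None or nll < best[0]:
            best = (nll, a, sigma2)
    return best[1], best[2]                      # (a_hat, sigma2_hat)
```

The innovation variance is profiled out analytically at each grid point, so only the autoregressive coefficient is searched over.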

The power spectrum itself may now be estimated by f_XX(λ; θ̂). This estimate will be asymptotically normal with mean f_XX(λ; θ) and variance 2πT^{-1} ∇f_XX(λ; θ)^T A^{-1}(A + B) A^{-1} ∇f_XX(λ; θ), following the preceding asymptotic normal distribution for θ̂. In the case that we model the series by an


autoregressive scheme and proceed in the same way, the estimate f_XX(λ; θ̂) has the character of the autoregressive estimate of the previous section.

VIII. LINEAR MODELS

In some circumstances we may find ourselves considering a linear time invariant model of the form

where the values X(t), S(t), t = 0, 1, …, T - 1 are given, ε(t) is an unknown stationary error series with mean 0 and power spectrum f_εε(λ), the a(u) are unknown coefficients, μ is an unknown parameter, and S(t) is a fixed function. For example, we might consider the linear trend model

with μ and α unknown, and be interested in estimating f_εε(λ). Or we might have taken S(t) to be the input series to a linear filter with unknown impulse-response function a(u), u = 0, ±1, …, in an attempt to identify the system, that is, to estimate the transfer function A(λ) = Σ a(u) exp{-iλu} and the a(u). The model (26) for the series X(t) differs in an important way from the previous models of this paper. The series X(t) is not generally stationary, because EX(t) = μ + Σ_u a(t - u)S(u).

Estimates of the preceding parameters may be constructed as follows: define

with similar definitions for d_S^{(T)}(λ), d_ε^{(T)}(λ). Then (26) leads to the approximate relationship

for j = 1, …, J. Following b) of Section IV, the d_ε^{(T)}(λ^j(T)) are, for large T, approximately independent complex normal variates with mean 0 and variance 2πT f_εε(λ). The approximate model (28) is seen to take the form of a linear regression. The results of linear least-squares theory now suggest our consideration of the estimates,


for some integer P. In some circumstances it may be appropriate to taper the data prior to computing the Fourier transform. In others it might make sense to base the Fourier transforms on disjoint stretches of data in the manner of d) of Section IV.

Under regularity conditions the estimate Â^{(T)}(λ) may be shown to be asymptotically complex normal with mean A(λ) and variance J^{-1} f_εε(λ) f̂_SS^{(T)}(λ)^{-1} (see [20]). The degree of fit of the model (26) at frequency λ may be measured by the sample coherence function


and

where

with similar definitions for f̂_SX^{(T)}, f̂_XX^{(T)}, f̂_SS^{(T)}. The impulse response could be estimated by an expression such as

satisfying

This function provides a time series analog of the squared coefficient of correlation of two variates (see Koopmans [52]).
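For scalar series, the estimates Â(λ) = f̂_SX(λ)/f̂_SS(λ) and the sample coherence can be sketched as below; the boxcar smoothing of periodogram ordinates and the span m are illustrative choices introduced here.

```python
import numpy as np

def transfer_estimates(s, x, m=10):
    """Sketch of the frequency-domain regression estimates:
    A_hat(lam) = f_sx(lam) / f_ss(lam), and the sample coherence
    |f_sx|^2 / (f_ss * f_xx), with cross-periodograms averaged over
    2m+1 neighbouring Fourier frequencies."""
    T = len(s)
    ds = np.fft.rfft(s - np.mean(s))
    dx = np.fft.rfft(x - np.mean(x))
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    smooth = lambda v: np.convolve(v, kernel, mode="same")
    f_ss = smooth(np.abs(ds) ** 2 / (2 * np.pi * T))
    f_xx = smooth(np.abs(dx) ** 2 / (2 * np.pi * T))
    f_sx = smooth(np.conj(ds) * dx / (2 * np.pi * T))
    A = f_sx / f_ss                               # transfer function estimate
    coh = np.abs(f_sx) ** 2 / (f_ss * f_xx)       # sample coherence
    return A, coh
```

By the Cauchy-Schwarz inequality the smoothed coherence lies between 0 and 1, and it approaches 1 at frequencies where X(t) is nearly a filtered version of S(t).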

The procedure of prefiltering is often essential in the estimation of the parameters of the model (26). Consider a common relationship in which the series X(t) is essentially a delayed version of the series S(t), namely

for some v. In this case

and


If v is large, the complex exponential fluctuates rapidly about 0 as j changes, and the first term on the right-hand side of (30) may be near 0 instead of the desired α exp{-iλv} f_SS^{(T)}(λ). A useful prefiltering for this situation is to estimate v by v̂, the lag that maximizes the magnitude of the sample cross-covariance function, and then to carry out the spectral computations on the data X(t), S(t - v̂), see Akaike and Yamanouchi [53] and Tick [54]. In general, one should prefilter the X(t) series or the S(t) series or both, so that the relationship between the filtered series is as near to being instantaneous as is possible.
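The delay-estimation step just described can be sketched directly: scan candidate lags for the maximum magnitude of the sample cross-covariance.

```python
import numpy as np

def estimate_delay(x, s, max_lag):
    """Estimate the delay v by the lag maximizing the magnitude of the
    sample cross-covariance between X(t) and S(t - v)."""
    x = x - x.mean()
    s = s - s.mean()

    def ccov(v):
        # average of x[t] * s[t - v] over the overlapping stretch
        if v >= 0:
            a, b = x[v:], s[:len(s) - v]
        else:
            a, b = x[:v], s[-v:]
        return np.dot(a, b) / len(x)

    return max(range(-max_lag, max_lag + 1), key=lambda v: abs(ccov(v)))
```

The spectral computations would then be carried out on X(t) and the realigned series S(t - v̂).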

The most important use of the calculations we have described is in the identification of linear systems. It used to be the case that the transfer function of a linear system was estimated by probing the system with pure sine waves in a succession of experiments. Expression (29) shows, however, that we can estimate the transfer function, for all λ, by simply employing a single input series S(t) such that f̂_SS^{(T)}(λ) ≠ 0.

In some situations we may have reason to believe that the system (26) is realizable, that is, a(u) = 0 for u < 0. The factorization techniques of Wiener [36] may be paralleled on the data in order to obtain estimates of A(λ), a(u) appropriate to this case, see Bhansali [55]. In Section IX, we will discuss a model like (26), but for the case of stochastic S(t).

Another useful linear model is

with φ₁(t), …, φ_K(t) given functions and θ₁, …, θ_K unknown. The estimation of these unknowns and f_εε(λ) is considered in Hannan [44] and Anderson [45]. This model allows us to handle trends and seasonal effects.

Yet another useful model is

with μ, ρ₁, θ₁, α₁, …, ρ_K, θ_K, α_K unknown. The estimation of these unknowns and f_εε(λ) is considered in Whittle [49]. It allows us to handle hidden periodicities.

IX. VECTOR-VALUED CONTINUOUS SPATIAL SERIES

In this section we move on from a consideration of real-valued discrete time series to series with a more complicated domain, namely p-dimensional Euclidean space, and with a more complicated range, namely r-dimensional Euclidean space. This step will allow us to consider data such as that received by an array of antennas or seismometers, or picture, TV, holographic, or turbulent-field data.

Provided we set down our notation judiciously, the changes


involved are not dramatic. The notation that we shall adopt includes the following: boldface letters such as X, a, A will denote vectors and matrices. A^T will denote the transpose of a matrix A, tr A will denote its trace, and det A will denote its determinant. EX will denote the vector whose entries are the expected values of the corresponding entries of the vector-valued variate X. cov{X, Y} = E{(X - EX)(Y - EY)^T} will denote the covariance matrix of the two vector-valued variates X, Y (that may have complex entries). t, u, λ will lie in p-dimensional Euclidean space, R^p, with

The limits of integrals will be from -∞ to ∞, unless indicated otherwise.

We will proceed by paralleling the development of Sections III and IV. Suppose that we are interested in analyzing measurements made simultaneously on r series of interest at location t, for all locations in some subset of the hypercube 0 ≤ t₁, …, t_p ≤ T. Suppose that we are prepared to model the measurements by the corresponding values of a realization of an r vector-valued stationary continuous spatial series X(t), t ∈ R^p. We define the mean

the autocovariance function

and the spectral density matrix

in the case that the integral exists. (The integral will exist when well-separated values of the series are sufficiently weakly dependent.) The inverse of the relationship (31) is


Let

As in Section III, expressions (32) and (33) may be combined to see that the entry in row j, column k of the matrix f_XX(λ) may be interpreted as the covariance of the series resulting from passing the jth and kth components of X(t) through narrow band-pass filters with transfer functions A(α) = δ(α - λ).

The series has a Cramér representation

where Z_X(λ) is an r vector-valued random function with the properties

Let

be a linear filter carrying the r vector-valued series X(t) into the s vector-valued series Y(t). Let

denote the transfer function of this filter. Then the spectral density matrix of the series Y(t) may be seen to be

If Y(t) is the filtered version of X(t), then it has the Cramér representation

We turn to a discussion of useful computations when values of the series X(t) are available for t in some subset of the hypercube 0 ≤ t₁, …, t_p ≤ T. Let φ^{(T)}(t) be a data window whose support (that is, the region of locations where φ^{(T)}(t) ≠ 0) is the region of observation of X(t). (We might take φ^{(T)}(t) of the form φ(t/T) where φ(t) = 0 outside 0 ≤ t₁, …, t_p ≤ 1.) We consider the Fourier transform

based on the observed sample values.


Before indicating an approximate large sample distribution for d_X^{(T)}(λ), we must first define the complex multivariate normal distribution and the complex Wishart distribution. We say that a vector-valued variate X, with complex entries, is multivariate complex normal with mean 0 and covariance matrix Σ when it has probability density proportional to exp{-X̄^T Σ^{-1} X}. We shall say that a matrix-valued variate is complex Wishart with n degrees of freedom and parameter Σ when it has the form X₁X̄₁^T + … + X_nX̄_n^T, where X₁, …, X_n are independent multivariate complex normal variates with mean 0 and covariance matrix Σ. In the one-dimensional case, the complex Wishart with n degrees of freedom is a multiple of a chi-squared variate with 2n degrees of freedom.

In the case that well-separated values of the series X(t) are only weakly dependent, the d_X^{(T)}(λ) have useful asymptotic properties as T → ∞. These include:

a') d_X^{(T)}(0) is asymptotically multivariate normal with mean ∫φ^{(T)}(t) dt c_X and covariance matrix (2π)^p ∫φ^{(T)}(t)² dt f_XX(0);

b') for λ ≠ 0, d_X^{(T)}(λ) is asymptotically multivariate complex normal with mean 0 and covariance matrix

c') for λ^j(T) → λ ≠ 0, with the differences λ^j(T) - λ^k(T), 1 ≤ j < k ≤ J, not tending to 0 too rapidly, the variates d_X^{(T)}(λ^1(T)), …, d_X^{(T)}(λ^J(T)) are asymptotically independent multivariate complex normal with mean 0 and covariance matrix

d') Fourier transforms based on values of X(t) over disjoint domains, with data windows φ_j^{(T)}(t), j = 1, …, J, are asymptotically independent multivariate complex normal with mean 0 and respective covariance matrices (2π)^p ∫φ_j^{(T)}(t)² dt f_XX(λ), j = 1, …, J.

Specific conditions under which these results hold are given in Section XV. A proof is given in the Appendix.

Results a'), b') are forms of the central limit theorem. In result d') the Fourier transforms are based on values of X(t) over disjoint domains. It is interesting to note, from c') and d'), that asymptotically independent statistics may be obtained by either taking the Fourier transform at distinct frequencies or at the same frequency, but over disjoint domains.

Result a') suggests estimating the mean c_X by

Result b') suggests the consideration of the periodogram matrix

as an estimate of f_XX(λ) when λ ≠ 0. From b') its asymptotic distribution is complex Wishart with 1 degree of freedom and parameter f_XX(λ). This estimate is often inappropriate because of its instability and singularity. Result c') suggests the consideration of the estimate

where J is chosen large enough to obtain acceptable stability, but not so large that the estimate becomes overly biased. From c') the asymptotic distribution of the estimate (37) is complex Wishart with J degrees of freedom and parameter f_XX(λ). In the case J = 1 this asymptotic distribution is that of f_XX(λ)χ²₂/2. Result d') suggests the consideration of the periodogram matrices

j = 1, …, J, as estimates of f_XX(λ), λ ≠ 0. The estimate

will have as asymptotic distribution J^{-1} times a complex Wishart with J degrees of freedom and parameter f_XX(λ), following result d'). We could clearly modify the estimates (37), (39) by using a finer spacing of frequencies and by averaging periodograms based on data over nondisjoint domains. The exact asymptotic distributions will not be so simple in these cases.

The method of fitting finite parameter models, described in


Section VII, extends directly to this vector-valued situation. Result b') suggests the replacement of the likelihood function (25) by

in this new case, for some large values S₁, …, S_p such that there is little power left beyond the cutoff frequency (2πS₁/T, …, 2πS_p/T). Suppose that θ̂ is the value of θ leading to the maximum of (40). Under regularity conditions, we can show that θ̂ is asymptotically normal with mean θ and covariance matrix 2πT^{-1}A^{-1}(A + B)A^{-1} where, if A_jk, B_jk are

the entries in row j, column k of A, B,

with c_ab,j(α) the entry in row a, column b of

In a number of situations we find ourselves led to consider an (r + s) vector-valued series,

satisfying a linear model of the form

for some s vector μ and some s × r matrix-valued function a(u). The model says that the average level of the series X(t) at position t, given the series S(t), is a linear filtered version of the series S(t). If (41) is a stationary series and if A(λ) is the transfer function of the filter a(u), then (42) implies


If we define the error series ε(t) by

then the degree of fit of the model (42) may be measured by the error spectral density

The relationships (43)-(45) suggest the estimates

respectively. The asymptotic distributions of these statistics are given in [26].

If there is a possibility that the matrix f̂_SS^{(T)}(λ) might become nearly singular, then we would be better off replacing the estimate (46) by a frequency domain analog of the ridge regression estimate (Hoerl and Kennard [56], Hunt [57]), such as

for some k > 0 and I the identity matrix. This estimate introduces further bias, over what was already present, but it is hoped that its increased stability more than accounts for this. In some circumstances we might choose k to depend on λ and to be matrix-valued.
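At a single frequency λ the ridge-regularized estimate admits a one-line sketch; the value of k below is an arbitrary illustrative choice, and the matrices stand for the smoothed spectral estimates at that frequency.

```python
import numpy as np

def ridge_transfer(f_sx, f_ss, k=0.05):
    """Frequency-domain ridge analog of (46) at one frequency:
    A_hat(lam) = f_sx(lam) @ inv(f_ss(lam) + k I),
    guarding against a nearly singular f_ss(lam)."""
    r = f_ss.shape[0]
    return f_sx @ np.linalg.inv(f_ss + k * np.eye(r))
```

As k → 0 this reduces to the ordinary estimate (46); larger k trades bias for stability, as noted above.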

X. ADDITIONAL RESULTS IN THE SPATIAL SERIES CASE

The results of the previous section have not taken any essential notice of the fact that the argument t of the random function under consideration is multidimensional. We now indicate some new results pertinent to the multidimensional character.

In some situations, we may be prepared to assume that the series X(t), t ∈ R^p, is isotropic, that is, the autocovariance function c_XX(u) = cov{X(t + u), X(t)} is a function of |u| only. In this case the spectral density matrix f_XX(λ) is also rotationally symmetric, depending only on |λ|. In fact (see Bochner and Chandrasekharan [58, p. 69])


where J_k(·) is the Bessel function of the first kind of order k. The relationship (50) may be inverted as follows,

The simplified character of f_XX(λ) in the isotropic case makes its estimation and display much simpler. We can estimate it by an expression such as

where the λ^j(T) are distinct, but with |λ^j(T)| near |λ|. There are many more λ^j(T) with |λ^j(T)| near |λ| than there are λ^j(T) with λ^j(T) near λ. It follows that we generally obtain a much better estimate of the spectrum in this case than in the general case. Also, the number of λ^j(T) with |λ^j(T)| near |λ| increases as |λ| increases. It follows that the estimate formed will generally be more stable for the frequencies with |λ| large. Examples of power spectra estimated in this manner may be found in Mannos [59].

Another thing that can occur in the general p-dimensional case is the definition of marginal processes and marginal spectra. We are presently considering processes X(t₁, …, t_p). Suppose that for some n, 1 ≤ n < p, we are interested in the process with t_{n+1}, …, t_p fixed, say at 0, …, 0. By inspection we see that the marginal process X(t₁, …, t_n, 0, …, 0) has autocovariance function c_XX(u₁, …, u_n, 0, …, 0). The spectral density matrix of the marginal process is, therefore,

We see that we obtain the spectral density of the marginal process by integrating the complete spectral density. The same remark applies to the Cramér representation of the marginal process.


Vector-valued series with multidimensional domain are discussed in Hannan [44] and Brillinger [26].

XI. ADDITIONAL RESULTS IN THE VECTOR CASE

In the case that the series X(t) is r vector-valued with r > 1, we can describe analogs of the classical procedures of multivariate analysis including, for example: i) partial correlation, ii) principal component analysis, iii) canonical correlation analysis, iv) cluster analysis, v) discriminant analysis, vi) multivariate analysis of variance, and vii) simultaneous equations. These analogs proceed from c') or d') of the earlier section. The procedures listed are often developed for samples from multivariate normal distributions. We obtain the time series procedure by identifying the d_X^{(T)}(λ^j(T)), j = 1, …, J, or d_j^{(T)}(λ), j = 0, …, J - 1, with independent multivariate complex normals having mean 0 and covariance matrix (2π)^p ∫φ^{(T)}(t)² dt f_XX(λ), and substituting into the formulas developed for the classical situation. For example, stationary time series analogs of correlation coefficients are provided by the

coherency at frequency λ of the jth component with the kth component of X(t), where f_jk(λ) is the entry in row j, column k of f_XX(λ) and d_j^{(T)}(λ) is the entry in row j of d_X^{(T)}(λ), for j, k = 1, …, r. The parameter R_jk(λ) satisfies 0 ≤ |R_jk(λ)| ≤ 1 and is seen to provide a measure of the degree of linear relationship of the series X_j(t) with the series X_k(t) at frequency λ. Its modulus squared, |R_jk(λ)|², is called the coherence. It may be estimated by

where f̂_jk^{(T)}(λ) is an estimate of f_jk(λ).

As time series papers on corresponding multivariate topics, we mention in case i) Tick [60], Granger [61], Goodman [62], Bendat and Piersol [63], Groves and Hannan [64], and Gersch [65]; in case ii) Goodman [66], Brillinger [67], [20], and Priestley et al. [68]; in case iii) Brillinger [67], [20], Miyata [69], and Priestley et al. [68]; in case iv) Ligett [70]; in case v) Brillinger [20]; in case vi) Brillinger [71]; and in case vii) Brillinger and Hatanaka [72], and Hannan and Terrell [73].

Instead of reviewing each of the time series analogs, we content ourselves by indicating a form of discriminant analysis that can be carried out in the time series situation. Suppose that a segment of the r vector-valued series X(t) is available and that its spectral density matrix may be any one of f_i(λ), i = 1, …, I. Suppose that we wish to construct a rule for assigning X(t) to one of the f_i(λ).

In the case of a variate U coming from one of I multivariate normal populations with mean 0 and covariance matrix Σ_i, i = 1, …, I, a common discrimination procedure is to define a discriminant score

for the ith population and then to assign the observation U to the population for which the discriminant score has the highest value (see Rao [74, p. 488]). The discriminant score is essentially the logarithm of the probability density of the ith population.

Result b') suggests a time series analog for this procedure. If the spectral density of the series X(t) is f_i(λ), the log density of d_X^{(T)}(λ) is essentially

This provides a discriminant score for each frequency λ. A more stable score would be provided by the smoothed version

with f̂_XX^{(T)}(λ) given by (37) or (39). These scores could be plotted against λ for i = 1, …, I in order to carry out the required discrimination. In the case that the f_i(λ) are unknown, their values could be replaced by estimates in (52).
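The smoothed discriminant score can be sketched at one frequency as follows. The score is written here, up to constants, as -log det f_i(λ) - tr{f_i(λ)^{-1} f̂_XX^{(T)}(λ)}, with the smoothed spectral estimate taken as given; the function names are illustrative.

```python
import numpy as np

def discriminant_score(f_hat, f_i):
    """Score, up to constants, for population i at one frequency:
    -log det f_i(lam) - tr(f_i(lam)^{-1} f_hat(lam)),
    where f_hat is a smoothed spectral estimate such as (37) or (39)."""
    inv = np.linalg.inv(f_i)
    return -np.log(np.linalg.det(f_i).real) - np.trace(inv @ f_hat).real

def classify(f_hat, populations):
    """Assign the series to the population with the highest score."""
    scores = [discriminant_score(f_hat, f) for f in populations]
    return int(np.argmax(scores))
```

In practice these scores would be computed and plotted across frequencies λ, as the text suggests, rather than at a single frequency.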

XII. ADDITIONAL RESULTS IN THE CONTINUOUS CASE

In Section IX, we changed to a continuous domain in contrast to the discrete domain we began with in Section III. In many problems, we must deal with both sorts of domains, because while the phenomenon of interest may correspond to a continuous domain, observational and computational considerations may force us to deal with the values of the process for a discrete domain. This occurrence gives rise to the complication of aliasing. Let Z denote the set of integers, Z = {0, ±1, …}. Suppose X(t), t ∈ R^p, is a stationary continuous spatial series with spectral density matrix f_XX(λ) and Cramér representation


Suppose X(t) is observable only for t ∈ Z^p. For these values of t

This is the Cramér representation of a discrete series with spectral density matrix

We see that if the series X(t) is observable only for t ∈ Z^p, then there is no way of untangling the frequencies

These frequencies are called the aliases of the fundamental frequency λ.

XIII. STATIONARY POINT PROCESSES

A variety of problems, such as those of traffic systems, queues, nerve pulses, shot noise, impulse noise, and the microscopic theory of gases, lead us to data that have the character of times or positions in space at which certain events have occurred. We turn now to indicating how the formulas we have presented so far in this paper must be modified to apply to data of this new character.

Suppose that we are recording the positions in p-dimensional Euclidean space at which events of r distinct types occur. For j = 1, …, r let X_j(t) = X_j(t₁, …, t_p) denote the number of events of the jth type that occur in the hypercube (0, t₁] × … × (0, t_p]. Let dX_j(t) denote the number that occur in the small hypercube (t₁, t₁ + dt₁] × … × (t_p, t_p + dt_p]. Suppose that the joint distributions of variates such as dX(t^1), …, dX(t^k) are unaffected by simple translation of t^1, …, t^k; we then say that X(t) is a stationary point process.

Stationary point process analogs of definitions set down previously include

c_X is called the mean intensity of the process,


This last refers to an (r + s) vector-valued point process. It says that the instantaneous intensity of the series X(t) at position t, given the location of all the points of the process S(u), is a linear translation-invariant function of the process S(u). The locations of the points of X(t) are affected by where the points of S(u) are located. We may define here a stationary random measure dε(t) by

The change in going from the case of spatial series to the case of point processes is seen to be the replacement of X(t) dt by dX(t). In the case that well-separated increments of the process are only weakly dependent, the results a')-d') of Section IX hold without further redefinition.

We next indicate some statistics that it is useful to calculate when the process X(t) has been observed over some region. The Fourier transform is now

for the data window φ^{(T)}(t) whose support corresponds to the domain of observation. If r = 1 and points occur at the positions τ₁, τ₂, …, then this last has the form

We may compute Fourier transforms for different domains, in which case we define


References to the theory of stationary point processes include: Cox and Lewis [75], Brillinger [76], Daley and Vere-Jones [77], and Fisher [78]. We remark that the material of this section applies equally to the case in which dX(t) is a general stationary random measure; for example, with p, r = 1, we might take dX(t) to be the amount of energy released by earthquakes in the time interval (t, t + dt]. In the next section we indicate some results that do take note of the specific character of a point process.
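For r = 1, the Fourier transform of point-process data described above reduces, for a flat data window over the observation interval, to a sum of complex exponentials at the observed positions τ_j; a minimal sketch:

```python
import numpy as np

def point_process_dft(tau, lams):
    """d(lam) = sum_j exp(-i * lam * tau_j): Fourier transform of a
    realization of a point process observed with a flat data window,
    evaluated at the frequencies in lams."""
    tau = np.asarray(tau, dtype=float)
    return np.exp(-1j * np.outer(lams, tau)).sum(axis=1)
```

Periodogram-type statistics for point processes are then built from |d(λ)|² exactly as in the spatial-series case.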

XIV. NEW THINGS IN THE POINT PROCESS CASE

In the case of a point process, the parameters c_X, c_XX(u) have interpretations further to their definitions (53), (54). Suppose that the process is orderly, that is, the probability that a small region contains more than one point is very small. Then, for small dt

c_j dt = E dX_j(t) = Pr[there is an event of type j in (t, t + dt]].

It follows that c_j may be interpreted as the intensity with which points of type j are occurring. Likewise, for u ≠ 0

It follows that

In the case that the processes X_j(t) and X_k(t) are independent, expression (62) is equal to c_j c_k du.

If the derivative c_jk(u) = dC_jk(u)/du exists for u ≠ 0, it is called the cross-covariance density of the two processes in the case j ≠ k and the autocovariance density in the case j = k. For many processes

and so the power spectrum of the process X_j(t) is given by

For a Poisson process, c_jj(u) = 0 and so f_jj(λ) = (2π)^{-p} c_j. The parameter (2π)^p f_XX(0)/c_X is useful in the classification of real-valued point processes. From 1)


It follows that, for large T, (2π)^p f_XX(0)/c_X is the ratio of the variance of the number of points in the hypercube (0, T]^p for the process X(t) to the variance of the number of points in the same hypercube for a Poisson process with the same intensity c_X. For this reason we say that the process X(t) is over-dispersed or clustered if the ratio is greater than 1 and under-dispersed if the ratio is less than 1.

The estimation procedure described in Section VII for models with a finite number of parameters is especially useful in the point process case as, typically, convenient time domain estimation procedures do not exist at all. Results of applying such a procedure are indicated in [79].

XV. STATIONARY RANDOM SCHWARTZ DISTRIBUTIONS

In this section, we present the theory of Schwartz distributions (or generalized functions) needed to develop properties of the Fourier transforms of random Schwartz distributions. These last are important as they contain the processes discussed so far in this paper as particular cases. In addition they contain other interesting processes as particular cases, such as processes whose components are a combination of the processes discussed so far, and such as the processes with stationary increments that are useful in the study of turbulence, see Yaglom [80]. A further advantage of this abstract approach is that the assumptions needed to develop results are cut back to essentials. References to the theory of Schwartz distributions include Schwartz [81] and Papoulis [82].

Let D denote the space of infinitely differentiable functions on R^p with compact support. Let S denote the space of infinitely differentiable functions on R^p with rapid decrease, that is, such that if φ^{(q)}(t) denotes a derivative of order q, then

A continuous linear functional on D is called a Schwartz distribution or generalized function. The Dirac delta function that we have been using throughout the paper is an example. A continuous linear functional on S is called a tempered distribution.

Suppose now that a random experiment is being carried out, the possible results of which are continuous linear maps X from D to L²(P), the space of square integrable functions for a probability measure P. Suppose that r of these maps are collected into an r vector, X(φ). We call X(φ) an r vector-valued random Schwartz distribution. It is possible to talk about


things such as EX(φ) and cov{X(φ), X(ψ)} in this case. An important family of transformations on D consists of the shifts S^u defined by S^u φ(t) = φ(t + u), t, u ∈ R^p. The random Schwartz distribution is called wide-sense stationary when

for all u ∈ R^p and φ, ψ ∈ D. It is called strictly stationary when all the distributions of finite numbers of values are invariant under the shifts.

Let us denote the convolution of two functions φ, ψ ∈ D by

and the Fourier transform of a function in S by the corresponding capital letter;

then we can set down the following theorem.

Theorem 1 (Ito [83], Yaglom [80]): If X(φ), φ ∈ D, is a wide-sense stationary random Schwartz distribution, then

and

where c_X is an r vector, c_XX(·) is an r × r matrix of tempered distributions, F_XX(λ) is a nonnegative matrix-valued measure satisfying

for some nonnegative integer k, and finally Z_X(λ) is a random function satisfying


The spatial series of Section IX is a random Schwartz distribution corresponding to the functional

for φ ∈ D. The representations indicated in that section may be deduced from the results of Theorem 1. It may be shown that the k of (67) may be taken to be 0 for this case.

The stationary point process of Section XII is likewise a random Schwartz distribution corresponding to the functional

X(φ) = Σ_j φ(τ_j)

for φ ∈ D. The representations of Section XII may be deduced from Theorem 1. It may be shown that k of (67) may be taken to be 2 for this case.

Gelfand and Vilenkin [84] is a general reference to the theory of random Schwartz distributions. Theorem 1 is proved there.

A linear model that extends those of (42) and (58) to the present situation is one in which the (r + s) vector-valued stationary random Schwartz distribution

suggesting that the system may be identified if the spectral density may be estimated. We next set down a mixing assumption before constructing such an estimate and determining its asymptotic properties.

Given k variates X_1, ···, X_k, let cum {X_1, ···, X_k} denote their joint cumulant or semi-invariant. Cumulants are defined

satisfies

In the case that the spectral measure is differentiable, this last implies that


and discussed in Kendall and Stuart [85] and Brillinger [20]. They are the elementary functions of the moments of the variates that vanish when the variates are independent. As such, they provide measures of the degree of dependence of variates. We will make use of

Assumption 1: X(φ) is a stationary random Schwartz distribution with the property that for φ_1, ···, φ_k ∈ D and

for some finite m_1, ···, m_{k−1}, L. In the case that the spectral measure F_XX(λ) is differentiable,

relation (65) corresponds to the case k = 2 of (72). The character of Assumption 1 is one of limiting the size of the cumulants of the functionals of the process X(φ). It will be shown in the Appendix that it is a form of weak dependence requirement for functionals of the process that are far apart in t. The function f_{a_1···a_k}(λ^1, ···, λ^{k−1}) appearing in (72) is called a cumulant spectrum of order k; see Brillinger [86] and the references therein. From (66) we see that it is also given by

The fact that it depends on only k − 1 arguments results from the assumed stationarity of the process.
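The moment relations defining cumulants can be checked with a short numerical sketch (not from the paper; the exponential variates and sample size are invented): the third joint cumulant, written out via the moment relations, vanishes for independent variates and recovers the marginal third cumulant when all slots coincide.

```python
import numpy as np

def cum2(x, y):
    """Joint cumulant of order 2 (the covariance)."""
    return np.mean(x * y) - np.mean(x) * np.mean(y)

def cum3(x, y, z):
    """Joint cumulant of order 3, written out via the moment relations."""
    mx, my, mz = np.mean(x), np.mean(y), np.mean(z)
    return (np.mean(x * y * z)
            - mx * cum2(y, z) - my * cum2(x, z) - mz * cum2(x, y)
            - mx * my * mz)

rng = np.random.default_rng(0)
u, v, w = rng.exponential(1.0, (3, 200_000))

# Independent variates: the joint cumulant is zero up to sampling error.
print(cum3(u, v, w))
# All three slots equal: recovers the third cumulant of Exp(1), namely 2.
print(cum3(u, u, u))
```

The first value is near 0 and the second near 2, illustrating that cumulants vanish for independent variates and so measure dependence.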

Let φ^(T)(t) = φ(t/T) with φ ∈ D. As an analog of the Fourier transforms of Sections IX and XII we now define

for the stationary random Schwartz distribution X(φ). We can now state the following theorem.

Theorem 2: If Assumption 1 is satisfied, if d_X^(T)(λ) is given by (74), and if T |λ^j(T) − λ^k(T)| → ∞ for 1 ≤ j < k ≤ J, then 1)-4) of Section IX hold.

This theorem is proved in the Appendix. It provides a justification for the estimation procedures suggested in the paper


and for the large sample approximations suggested for the distributions of the estimates.

We end this section by mentioning that a point process with events at positions τ_k, k = 1, 2, ···, may be represented by the generalized function

Σ_k δ(t − τ_k),

that the sampled function of Section III may be represented by the generalized function

and that a point process with associated variate S may be represented by

see Beutler and Leneman [87]. Matheron [92] discusses the use of random Schwartz distributions in the smoothing of maps.
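The delta-function representation lends itself to direct computation. As a sketch (the rate, interval length, and frequency grid are invented for illustration), the Fourier transform of Σ_k δ(t − τ_k) is d(λ) = Σ_k exp(−iλτ_k), and for a Poisson process the resulting periodogram fluctuates about a flat level:

```python
import numpy as np

# A Poisson process on [0, T]; the rate and frequency grid are invented.
rng = np.random.default_rng(1)
rate, T = 5.0, 2000.0
tau = np.sort(rng.uniform(0.0, T, rng.poisson(rate * T)))

# Fourier transform of the generalized function sum_k delta(t - tau_k):
# d(lambda) = sum_k exp(-i lambda tau_k).
lams = 0.5 * np.arange(1, 201)
d = np.exp(-1j * np.outer(lams, tau)).sum(axis=1)

# The periodogram fluctuates about the flat Poisson spectrum rate / (2 pi).
I = np.abs(d) ** 2 / (2 * np.pi * T)
print(np.mean(I), rate / (2 * np.pi))
```

The average periodogram ordinate is close to rate/(2π), the constant second-order spectrum of a Poisson process.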

XVI. HIGHER ORDER SPECTRA AND NONLINEAR SYSTEMS

In the previous section we have introduced the higher order

cumulant spectra of stationary random Schwartz distributions. In this section we will briefly discuss the use of such spectra and how they may be estimated.

In the case that the process under consideration is Gaussian, the cumulant spectra of order greater than two are identically 0. In the non-Gaussian case, the higher order spectra provide us with important information concerning the distribution of the process. For example, were the process real-valued Poisson on the line with intensity c_N, then the cumulant spectrum of order k would be constant, equal to c_N(2π)^{1−k}. Were the process the result of passing a series of independent identically distributed variates through a filter with transfer function A(λ), then the cumulant spectrum of order k would be proportional to

A(λ^1) ··· A(λ^{k−1}) A(−λ^1 − ··· − λ^{k−1}).

Such hypotheses might be checked by estimating higher cumulant spectra.

An important use of higher order spectra is in the identification of polynomial systems such as those discussed in Wiener [88], Brillinger [86], and Halme [89]. Tick [90] shows that if S(t) is a stationary real-valued Gaussian series, if ε(t) is an independent stationary series, and if the series X(t) is given by


Suppose that no proper subset of λ^1, ···, λ^k sums to 0. It then follows from the principal relation connecting moments and cumulants that

and f_{SSX}(λ, μ) is a third-order cumulant spectrum. It follows that both the linear transfer function A(λ) and the bitransfer function B(λ, μ) of the system may be estimated, from estimates of second- and third-order spectra, following the probing of the system by a single Gaussian series. References to the identification of systems of order greater than 2, and to the case of non-Gaussian S(t), are given in [86].
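The second-order half of this program is easy to exhibit numerically. In the sketch below (not from the paper: the filter coefficients 0.8 and 0.5, the noise level, and the record sizes are invented), a Gaussian probing series is passed through a known linear filter plus noise, and A(λ) is estimated as the ratio of the estimated cross-spectrum to the estimated power spectrum:

```python
import numpy as np

# A hypothetical linear system with additive noise; the coefficients 0.8
# and 0.5, the noise level, and the record sizes are invented. The filter
# is circular within each record, so its transfer function is exactly
# A(lambda) = 0.8 + 0.5 exp(-i lambda).
rng = np.random.default_rng(4)
n_seg, N = 400, 256
S = rng.standard_normal((n_seg, N))             # Gaussian probing series
X = (0.8 * S + 0.5 * np.roll(S, 1, axis=1)
     + 0.3 * rng.standard_normal((n_seg, N)))   # output plus noise

dS = np.fft.fft(S, axis=1)
dX = np.fft.fft(X, axis=1)
f_SS = np.mean(np.abs(dS) ** 2, axis=0)         # estimated power spectrum of S
f_XS = np.mean(dX * np.conj(dS), axis=0)        # estimated cross-spectrum
A_hat = f_XS / f_SS                             # transfer function estimate

lam = 2 * np.pi * np.arange(N) / N
A_true = 0.8 + 0.5 * np.exp(-1j * lam)
print(np.max(np.abs(A_hat - A_true)))           # small: system identified
```

Estimating the bitransfer function B(λ, μ) would proceed analogously, with the cross-bispectrum in place of the cross-spectrum.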

We turn to the problem of constructing an estimate of a kth order cumulant spectrum. In the course of the proof of Theorem 2 given in the Appendix, we will see that

then

where


as a naive estimate of the spectrum f_{a_1···a_k}(λ^1, ···, λ^{k−1}), provided that no proper subset of λ^1, ···, λ^{k−1} sums to 0. From what we have seen in the case k = 2, this estimate will be unstable. It follows that we should in fact construct an estimate by smoothing the periodogram (76) over (k − 1)-tuples of frequencies in the neighborhood of λ^1, ···, λ^{k−1}, but such that no proper subset of the (k − 1)-tuple sums to 0. Details of this construction are given in Brillinger and Rosenblatt [91] for the discrete time case. We could equally well have constructed an estimate using the Fourier transforms d_X^(T)(λ, j) based on disjoint domains.

APPENDIX

We begin by providing a motivation for Assumption 1 of Section XV. Suppose that

is continuous in each of its arguments. Being a continuous multilinear functional, it can be written

where c_{a_1···a_k} is a Schwartz distribution on D(R^{pk}), from the Schwartz nuclear theorem. If the process is stationary, this distribution satisfies

It follows that it has the form

provided λ^1 + ··· + λ^k = 0. This last suggests the use of the kth order periodogram


when the supports of φ_1, ···, φ_{k−1} are farther away from that of φ_k than some number ρ. This means that the distribution C has compact support. By the Schwartz-Paley-Wiener theorem, C is, therefore, the Fourier transform of a function of slow growth, say f_{a_1···a_k}(λ^1, ···, λ^{k−1}), and we may write the relation (72). In the case that values of the process X(φ) at a distance from each other are only weakly dependent, we can expect the cumulant to be small and the representation (72) to hold with (73) satisfied.

Proof of Theorem 2: We see from (66) and (73)

for φ ∈ D(R^{pk}), where C is a distribution on D(R^{p(k−1)}). Now consider the case in which the process X(φ) has the

property that

It follows from this last that the standardized joint cumulants of order greater than 2 tend to 0, and so the Fourier transforms are asymptotically normal.


REFERENCES

[1] G. G. Stokes, "Note on searching for periodicities," Proc. Roy. Soc., vol. 29, p. 122, 1879.
[2] A. Schuster, "The periodogram of magnetic declination," Cambridge Phil. Soc., vol. 18, p. 18, 1899.
[3] E. T. Whittaker and G. Robinson, The Calculus of Observations. Cambridge, England: Cambridge Univ. Press, 1944.
[4] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1965.
[5] A. A. Michelson, Light Waves and Their Uses. Chicago, Ill.: Univ. Chicago Press, 1907.
[6] I. Newton, Opticks. London, England: W. Innys, 1730.
[7] M. I. Pupin, "Resonance analysis of alternating and polyphase currents," Trans. AIEE, vol. 9, p. 523, 1894.
[8] H. L. Moore, Economic Cycles: Their Law and Cause. New York: Macmillan, 1914.
[9] H. A. Panofsky and R. A. McCormick, "Properties of spectra of atmospheric turbulence at 100 metres," Quart. J. Roy. Meteorol. Soc., vol. 80, pp. 546-564, 1954.
[10] J. A. Leese and E. S. Epstein, "Application of two-dimensional spectral analysis to the quantification of satellite cloud photographs," J. Appl. Meteorol., vol. 2, pp. 629-644, 1963.
[11] M. S. Bartlett, "The spectral analysis of point processes," J. Roy. Stat. Soc., vol. B 25, pp. 264-296, 1963.
[12] M. S. Bartlett, "The spectral analysis of two dimensional point processes," Biometrika, vol. 51, pp. 299-311, 1964.
[13] A. V. Oppenheim, R. W. Schafer, and T. G. Stockham, Jr., "Nonlinear filtering of multiplied and convolved signals," Proc. IEEE, vol. 56, pp. 1264-1291, 1968.
[14] B. P. Bogert, M. J. Healy, and J. W. Tukey, "The quefrency alanysis of time series for echoes: cepstrum, pseudo-covariance, cross-cepstrum and saphe cracking," in Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, pp. 209-243, 1963.
[15] M. S. Bartlett, "Periodogram analysis and continuous spectra," Biometrika, vol. 37, pp. 1-16, 1950.
[16] R. H. Jones, "A reappraisal of the periodogram in spectral analysis," Technometrics, vol. 7, pp. 531-542, 1965.
[17] K. Hasselmann, W. Munk, and G. J. F. MacDonald, "Bispectra of ocean waves," in Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, pp. 125-139, 1963.
[18] N. Wiener, "Generalized harmonic analysis," Acta Math., vol. 55, pp. 117-258, 1930.
[19] H. Wold, Bibliography on Time Series and Stochastic Processes. London, England: Oliver and Boyd, 1965.
[20] D. R. Brillinger, Time Series: Data Analysis and Theory. New York: Holt, Rinehart and Winston, 1974.
[21] A. Ya. Khintchine, "Korrelationstheorie der stationären stochastischen Prozesse," Math. Ann., vol. 109, pp. 604-615, 1934.
[22] H. Cramér, "On harmonic analysis in certain functional spaces," Ark. Mat. Astron. Fys., vol. 28B, pp. 1-7, 1942.
[23] V. P. Leonov and A. N. Shiryaev, "Some problems in the spectral theory of higher moments, II," Theory Prob. Appl. (USSR), vol. 5, pp. 460-464, 1960.
[24] B. Picinbono, "Tendance vers le caractère gaussien par filtrage sélectif," C. R. Acad. Sci. Paris, vol. 248, p. 2280, 1959.
[25] M. Rosenblatt, "Some comments on narrow band-pass filters," Quart. Appl. Math., vol. 18, pp. 387-393, 1961.
[26] D. R. Brillinger, "The frequency analysis of relations between stationary spatial series," in Proc. 12th Biennial Seminar of the Canadian Math. Congress, R. Pyke, Ed. Montreal, P.Q., Canada: Can. Math. Congr., pp. 39-81, 1970.
[27] E. J. Hannan and P. J. Thomson, "Spectral inference over narrow bands," J. Appl. Prob., vol. 8, pp. 157-169, 1971.
[28] E. Parzen, "On consistent estimates of the spectrum of a stationary time series," Ann. Math. Statist., vol. 28, pp. 329-348, 1957.
[29] M. Rosenblatt, "Statistical analysis of stochastic processes with stationary residuals," in Probability and Statistics, U. Grenander, Ed. New York: Wiley, pp. 246-275, 1959.
[30] W. D. Wright, The Measurement of Color. New York: Macmillan, 1958.
[31] E. W. Carpenter, "Explosions seismology," Science, vol. 147, pp. 363-373, 1967.
[32] D. G. Lambert, E. A. Flinn, and C. B. Archambeau, "A comparative study of the elastic wave radiation from earthquakes and underground explosions," Geophys. J. Roy. Astron. Soc., vol. 29, pp. 403-432, 1972.
[33] W. H. Munk and G. J. F. MacDonald, Rotation of the Earth. Cambridge, England: Cambridge Univ. Press, 1960.
[34] G. J. F. MacDonald and N. Ness, "A study of the free oscillations of the Earth," J. Geophys. Res., vol. 66, pp. 1865-1911, 1961.
[35] R. L. Wegel and C. R. Moore, "An electrical frequency analyzer," Bell Syst. Tech. J., vol. 3, pp. 299-323, 1924.
[36] N. Wiener, Time Series. Cambridge, Mass.: M.I.T. Press, 1964.
[37] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series. New York: Wiley, 1957.
[38] E. Parzen, "An approach to empirical time series analysis," Radio Sci., vol. 68D, pp. 551-565, 1964.
[39] R. T. Lacoss, "Data adaptive spectral analysis methods," Geophysics, vol. 36, pp. 661-675, 1971.
[40] J. P. Burg, "The relationship between maximum entropy spectra and maximum likelihood spectra," Geophysics, vol. 37, pp. 375-376, 1972.
[41] K. N. Berk, "Consistent autoregressive spectral estimates," Ann. Stat., vol. 2, pp. 489-502, 1974.
[42] V. E. Pisarenko, "On the estimation of spectra by means of nonlinear functions of the covariance matrix," Geophys. J. Roy. Astron. Soc., vol. 28, pp. 511-531, 1972.
[43] J. Capon, "Investigation of long-period noise at the large aperture seismic array," J. Geophys. Res., vol. 74, pp. 3182-3194, 1969.
[44] E. J. Hannan, Multiple Time Series. New York: Wiley, 1970.
[45] T. W. Anderson, The Statistical Analysis of Time Series. New York: Wiley, 1971.
[46] P. Whittle, "Estimation and information in stationary time series," Ark. Mat. Astron. Fys., vol. 2, pp. 423-434, 1953.
[47] J. M. Chambers, "Fitting nonlinear models: numerical techniques," Biometrika, vol. 60, pp. 1-14, 1973.
[48] D. R. Brillinger, "An empirical investigation of the Chandler wobble and two proposed excitation processes," Bull. Int. Stat. Inst., vol. 39, pp. 413-434, 1973.
[49] P. Whittle, "Gaussian estimation in stationary time series," Bull. Int. Stat. Inst., vol. 33, pp. 105-130, 1961.
[50] A. M. Walker, "Asymptotic properties of least-squares estimates of parameters of the spectrum of a stationary nondeterministic time-series," J. Australian Math. Soc., vol. 4, pp. 363-384, 1964.
[51] K. O. Dzaparidze, "A new method in estimating spectrum parameters of a stationary regular time series," Teor. Veroyat. Ee Primen., vol. 19, p. 130, 1974.
[52] L. H. Koopmans, "On the coefficient of coherence for weakly stationary stochastic processes," Ann. Math. Stat., vol. 35, pp. 532-549, 1964.
[53] H. Akaike and Y. Yamanouchi, "On the statistical estimation of frequency response function," Ann. Inst. Stat. Math., vol. 14, pp. 23-56, 1962.
[54] L. J. Tick, "Estimation of coherency," in Advanced Seminar on Spectral Analysis of Time Series, B. Harris, Ed. New York: Wiley, pp. 133-152, 1967.
[55] R. J. Bhansali, "Estimation of the Wiener filter," in Contributed Papers, 39th Session Int. Stat. Inst., vol. 1, pp. 82-88, 1973.
[56] A. E. Hoerl and R. W. Kennard, "Ridge regression: biased estimation for nonorthogonal problems," Technometrics, vol. 12, pp. 55-67, 1970.
[57] B. R. Hunt, "Biased estimation for nonparametric identification of linear systems," Math. Biosci., vol. 10, pp. 215-237, 1971.
[58] S. Bochner and K. Chandrasekharan, Fourier Transforms. Princeton, N.J.: Princeton Univ. Press, 1949.
[59] J. Mannos, "A class of fidelity criteria for the encoding of visual images," Ph.D. dissertation, Univ. California, Berkeley, 1972.
[60] L. J. Tick, "Conditional spectra, linear systems and coherency," in Time Series Analysis, M. Rosenblatt, Ed. New York: Wiley, pp. 197-203, 1963.
[61] C. W. J. Granger, Spectral Analysis of Economic Time Series. Princeton, N.J.: Princeton Univ. Press, 1964.
[62] N. R. Goodman, "Measurement of matrix frequency response functions and multiple coherence functions," Air Force Dynamics Lab., Wright-Patterson AFB, Ohio, Tech. Rep. AFFDL-TR-65-56, 1965.
[63] J. S. Bendat and A. Piersol, Measurement and Analysis of Random Data. New York: Wiley, 1966.
[64] G. W. Groves and E. J. Hannan, "Time series regression of sea level on weather," Rev. Geophys., vol. 6, pp. 129-174, 1968.
[65] W. Gersch, "Causality or driving in electrophysiological signal analysis," Math. Biosci., vol. 14, pp. 177-196, 1972.
[66] N. R. Goodman, "Eigenvalues and eigenvectors of spectral density matrices," Seismic Data Lab., Teledyne, Inc., Tech. Rep. 179, 1967.
[67] D. R. Brillinger, "The canonical analysis of time series," in Multivariate Analysis—II, P. R. Krishnaiah, Ed. New York: Academic, pp. 331-350, 1970.
[68] M. B. Priestley, T. Subba Rao, and H. Tong, "Identification of the structure of multivariable stochastic systems," in Multivariate Analysis—III, P. R. Krishnaiah, Ed. New York: Academic, pp. 351-368, 1973.
[69] M. Miyata, "Complex generalization of canonical correlation and its application to a sea-level study," J. Marine Res., vol. 28, pp. 202-214, 1970.
[70] W. S. Liggett, Jr., "Passive sonar: fitting models to multiple time series," paper presented at the NATO Advanced Study Institute on Signal Processing, Loughborough, U.K., 1972.
[71] D. R. Brillinger, "The analysis of time series collected in an experimental design," in Multivariate Analysis—III, P. R. Krishnaiah, Ed. New York: Academic, pp. 241-256, 1973.
[72] D. R. Brillinger and M. Hatanaka, "An harmonic analysis of nonstationary multivariate economic processes," Econometrica, vol. 35, pp. 131-141, 1969.
[73] E. J. Hannan and R. D. Terrell, "Multiple equation systems with stationary errors," Econometrica, vol. 41, pp. 299-320, 1973.
[74] C. R. Rao, Linear Statistical Inference and Its Applications. New York: Wiley, 1965.
[75] D. R. Cox and P. A. W. Lewis, The Statistical Analysis of Series of Events. London, England: Methuen, 1966.
[76] D. R. Brillinger, "The spectral analysis of stationary interval functions," in Proc. 6th Berkeley Symp. Math. Stat. Prob., vol. 1, L. M. Le Cam, J. Neyman, and E. L. Scott, Eds. Berkeley, Calif.: Univ. California Press, pp. 483-513, 1972.
[77] D. J. Daley and D. Vere-Jones, "A summary of the theory of point processes," in Stochastic Point Processes, P. A. W. Lewis, Ed. New York: Wiley, pp. 299-383, 1972.
[78] L. Fisher, "A survey of the mathematical theory of multidimensional point processes," in Stochastic Point Processes, P. A. W. Lewis, Ed. New York: Wiley, pp. 468-513, 1972.
[79] A. G. Hawkes and L. Adamopoulos, "Cluster models for earthquakes—regional comparisons," Bull. Int. Stat. Inst., vol. 39, pp. 454-460, 1973.
[80] A. M. Yaglom, "Some classes of random fields in n-dimensional space related to stationary random processes," Theory Prob. Appl. (USSR), vol. 2, pp. 273-322, 1959.
[81] L. Schwartz, Théorie des Distributions, vols. 1, 2. Paris, France: Hermann, 1957.
[82] A. Papoulis, The Fourier Integral and Its Applications. New York: McGraw-Hill, 1962.
[83] K. Ito, "Stationary random distributions," Mem. Coll. Sci. Univ. Kyoto A, vol. 28, pp. 209-223, 1954.
[84] I. M. Gelfand and N. Ya. Vilenkin, Generalized Functions, vol. 4. New York: Academic, 1964.
[85] M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, vol. 1. London, England: Griffin, 1958.
[86] D. R. Brillinger, "The identification of polynomial systems by means of higher-order spectra," J. Sound Vibration, vol. 12, pp. 301-313, 1970.
[87] F. J. Beutler and O. A. Z. Leneman, "On the statistics of random pulse processes," Inform. Contr., vol. 18, pp. 326-341, 1971.
[88] N. Wiener, Nonlinear Problems in Random Theory. Cambridge, Mass.: M.I.T. Press, 1958.
[89] A. Halme, "Polynomial operators for nonlinear systems analysis," Acta Polytech. Scandinavica, no. 24, 1972.
[90] L. J. Tick, "The estimation of the transfer functions of quadratic systems," Technometrics, vol. 3, pp. 563-567, 1961.
[91] D. R. Brillinger and M. Rosenblatt, "Asymptotic theory of kth order spectra," in Advanced Seminar on the Spectral Analysis of Time Series, B. Harris, Ed. New York: Wiley, pp. 153-188, 1967.
[92] G. Matheron, "The intrinsic random functions and their applications," Adv. Appl. Prob., vol. 5, pp. 439-468, 1973.