TEXAS A&M UNIVERSITY
COLLEGE STATION, TEXAS 77843-3143
INSTITUTE OF STATISTICS
AUTOREGRESSIVE SPECTRAL ESTIMATION
by Emanuel Parzen
Institute of Statistics
Texas A&M University
Technical Report No. N-28
June 1982
Texas A&M Research Foundation Project No. 4226T
"Multiple Time Series Modeling and
Time Series Theoretic Statistical Methods"
Sponsored by the Office of Naval Research
Approved for public release; distribution unlimited.
Unclassified

REPORT DOCUMENTATION PAGE

Report Number: N-28
Title: AUTOREGRESSIVE SPECTRAL ESTIMATION
Type of Report: Technical
Author: Emanuel Parzen
Contract or Grant Numbers: ONR N00014-82-MP-2001; ARO DAAG29-80-C-0070
Performing Organization: Texas A&M University, Institute of Statistics, College Station, TX 77843
Report Date: June 1982
Security Classification: Unclassified
Distribution Statement: Approved for public release; distribution unlimited.
Key Words: Time series analysis, autoregressive spectral density, maximum entropy, empirical spectral analysis.
Abstract: Autoregressive spectral estimation is one of the new techniques for spectral analysis developed in the last two decades. Its theory and applications are extremely extensive. This article aims to provide an overview (rather than a detailed account) of the main ideas. A comprehensive bibliography guides the reader to articles in which case studies and comparisons of autoregressive spectral estimators are described.
AUTOREGRESSIVE SPECTRAL ESTIMATION
by Emanuel Parzen
Institute of Statistics, Texas A&M University
0. Introduction
The problem of spectral analysis of time series is clearly
of great interest to the many applied scientists who use
spectral analysis in their scientific research. It should be
of great interest to statisticians because it embodies
prototypes of two of the great problems of modern statistics:
functional inference and modeling. A problem of statistical
inference usually assumes three ingredients: a sample of
observations, a parameter which indexes the family of possible
joint probability densities of the sample, and a formula for
the probability density of the sample
f(sample|parameter).
Classical statistical inference assumes the parameter is a
finite dimensional parameter θ = (θ1,...,θk). Functional
inference assumes the parameter is a function, such as f(w),
0 ≤ w ≤ 1.
The parameter estimation problem seeks to form optimal
estimators (denoted θ̂) of the parameter. A typical model
This research was supported by the Office of Naval Research (Contract N00014-81-MP-10001, ARO DAAG29-80-C-0070).
identification problem seeks to find the smallest number of
significantly non-zero components θj of the parameter θ.
Estimation of a function often has similar features to
model identification, since a function can be parametrized
exactly by a countable infinity of parameters. However in
practice one can only efficiently estimate a finite number of
parameters. Therefore to estimate a function one must use
the smallest finite number of parameters which provide an
adequate approximation of the function.
The goals of functional inference and model identification
are in my view best pursued simultaneously. One seeks methods
of statistical inference which are finite-parametric and
non-(or infinite-) parametric. One achieves this goal by
using finite parameter models (which theory indicates might
be exact methods) in ways that enable them to be interpreted
also as approximating models.
Autoregressive spectral estimation is one of the new
techniques for spectral analysis developed in the last two
decades. Its theory and applications are extremely extensive.
This article aims to provide an overview (rather than a detailed
account) of the main ideas. A comprehensive bibliography guides
the reader to articles in which case studies and comparisons of
autoregressive spectral estimators are described.
The spectral density is defined in section 1. Infinite
order AR and MA representations of a stationary time series
are introduced in section 2. Entropy as a motivation for
autoregressive schemes is discussed in section 3. Alternate
parametrizations of an autoregressive scheme are outlined in
section 4. Algorithms for computing the coefficients of
autoregressive spectral densities are stated in section 5.
Criteria for determining the orders of autoregressive schemes
are mentioned in section 6. Suggestions for empirical
spectral analysis are outlined in section 7. The final
section provides a guide to the literature of autoregressive
spectral estimation by listing the articles which correspond
to some important developments.
1. Correlations and Spectral Functions of a Stationary Time
Series
The theory of time series discusses separately discrete
parameter time series {Y(t), t=0, ±1, ±2,...} and continuous
parameter time series {Y(t), -∞ < t < ∞}. Only the former case is
discussed in this article. Discrete parameter series usually
arise by observing a continuous parameter time series at equi-
spaced times. The frequency variable w of a pure harmonic
(1) Y(t) = A cos 2πwt + B sin 2πwt
observed at t = 0, ±1,... can be assumed to vary in either
-0.5 ≤ w ≤ 0.5 or 0 ≤ w ≤ 1. The first interval is usually
adopted, but the second interval will be adopted in this
article because it is more convenient for developing
isomorphisms between spectral analysis and non-parametric data
modeling using quantile and density-quantile functions [see
Parzen (1979)].
The definitions and notation we adopt for the functions
used to describe a zero mean stationary Gaussian discrete
parameter time series Y(t), t=0, ±1,... are as follows.
A "time domain" specification of the probability law of
Y(.) is provided by the covariance function
(2) R(v) = E[Y(t)Y(t+v)], v=0, ±1, ±2,...,
or by the variance R(0) and the correlation function
(3) p(v) = R(v) ÷ R(0) = Corr[Y(t), Y(t+v)].
To define a spectral (frequency) domain specification of
the probability law of Y(·) we first assume summability of
R(·) and p(·). The Fourier transforms of R(v) and p(v) are
called the power spectrum S(w) and spectral density function f(w)
respectively, and are defined by
(4) S(w) = Σ_{v=-∞}^{∞} e^{-2πivw} R(v), 0 ≤ w ≤ 1;
(5) f(w) = Σ_{v=-∞}^{∞} e^{-2πivw} p(v), 0 ≤ w ≤ 1.
The spectral distribution function is defined by
(6) F(w) = ∫_0^w f(w') dw', 0 ≤ w ≤ 1.
For data analysis one actually computes a modified spectral
distribution function denoted F+(w) and defined for 0 ≤ w ≤ 0.5:
(7) F+(w) = 2F(w), 0 ≤ w ≤ 0.5.
A correlation function p(v) has the mathematical property of
being a positive definite function which, without assuming
summability, guarantees the existence of: (1) a spectral
distribution function F(w), and (2) the spectral representation
of the correlation function p(v) given by
(8) p(v) = ∫_0^1 e^{2πivw} dF(w) = ∫_0^1 cos 2πvw dF(w).
The notion of an ergodic time series is not given a rigorous
definition in this article but its intuitive meaning is important
for us. We call a time series ergodic when the parameters of
its probability law possess consistent estimators (and thus can
be determined with probability one, given a sample of infinite
length). An example of a non-ergodic stationary Gaussian zero
mean time series is
Y(t) = A cos 2πwt + B sin 2πwt
where A and B are independent N(0,σ²) random variables. One
can infer the values of A and B in the sample observed, but
one cannot infer the value of σ². This time series has
correlation function
(9) p(v) = cos 2πwv, v=0, ±1, ±2,...,
and does not possess a spectral density.
Parzen (1982) proposes that it is useful in practice to
distinguish qualitatively between three types of time series,
which we call
no memory: white noise,
short memory: stationary and ergodic,
long memory: non-stationary or non-ergodic.
A no-memory or white noise time series is a stationary
Gaussian time series satisfying either of the equivalent
conditions:
p(v) = 0 for v > 0;
(10)
f(w) = 1, 0 ≤ w ≤ 1.
A short memory time series is a stationary time series
possessing a summable correlation function p(v) and a spectral
density f(w) which is bounded above and below in the sense
that the dynamic range of f(w),
(11) DR(f) = max_{0≤w≤1} f(w) ÷ min_{0≤w≤1} f(w),
satisfies 1 ≤ DR(f) < ∞.
A long memory time series is one which is neither no
memory nor short memory; alternatively, a long memory time
series is one which is non-stationary or non-ergodic. It
usually has components representing cycles or trends. An
example of a long memory time series is (1) where A and B
are independent N(0,σ²) random variables.
For a short memory time series one can define the inverse-
correlation function
(12) pi(v) = ∫_0^1 e^{2πivw} f^{-1}(w) dw ÷ ∫_0^1 f^{-1}(w) dw,
and the cepstral-correlation function
(13) γ(v) = ∫_0^1 e^{2πivw} log f(w) dw.
It should be noted that the inverse-correlation function is
positive definite. However the cepstral-correlation function
is not. These new types of correlation functions are
introduced because they may provide more parsimonious
parameterizations in the sense that they decay to 0 faster
than does the correlation function. Statistical inference
from a sample of the probability law of a time series often
achieves greatest statistical efficiency by using the most
parsimonious parametrizations. Thus to form estimators f̂(w)
of the spectral density f(w) from a raw estimator f̃(w),
greater precision may be attained by first forming estimators
{f^{-1}(w)}^ and {log f(w)}^ of the inverse or logarithm of the
spectral density. Autoregressive spectral estimation may be
regarded as an approach to estimating f(w) by first
estimating f^{-1}(w).
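The definitions above can be illustrated numerically. The sketch below assumes (as an example, not from this report) the AR(1)-type correlation function p(v) = φ^|v| with φ = 0.5; it forms the spectral density (5) on a frequency grid and recovers the lag-one inverse-correlation (12) and cepstral-correlation (13), whose exact values for this model are -φ/(1+φ²) = -0.4 and φ = 0.5.

```python
import numpy as np

# Assumed example: short memory correlations p(v) = phi^|v| (an AR(1)).
phi = 0.5
Q = 4096
w = np.arange(Q) / Q                      # grid on 0 <= w < 1
v = np.arange(-200, 201)                  # p(v) is summable; truncate the sum
p = phi ** np.abs(v)

# (5)  f(w) = sum_v e^{-2 pi i v w} p(v)   (real, since p is symmetric)
f = np.real(np.exp(-2j * np.pi * np.outer(w, v)) @ p)

# (12) lag-one inverse-correlation: ratio of Fourier coefficients of 1/f
pi1 = float(np.real(np.exp(2j * np.pi * w) @ (1 / f)) / np.sum(1 / f))

# (13) lag-one cepstral-correlation: Fourier coefficient of log f
gamma1 = float(np.real(np.mean(np.exp(2j * np.pi * w) * np.log(f))))

print(round(pi1, 3), round(gamma1, 3))   # -0.4 0.5
```

Note how the inverse-correlation dies off after one lag for this model, illustrating the parsimony argument made in the text.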
2. Filter Models of a Short Memory Stationary Time Series
A short memory zero mean Gaussian stationary time series
Y(t) has a property of fundamental importance: it can be
linearly transformed to a white noise time series, denoted
Yν(t) or ε(t), by a filter that is invertible. The definition
of Yν(t) is provided by the theory of minimum mean square error
prediction.
The definitions and notation of prediction theory that
we adopt are as follows:
(1) Yμ,m(t) = E[Y(t)|Y(t-1),...,Y(t-m)]
denotes the memory m one-step ahead predictor while
(2) Yν,m(t) = Y(t) - Yμ,m(t)
is the prediction error, and
(3) σ²m = E[|Yν,m(t)|²] ÷ E[|Y(t)|²]
is the mean square prediction error measured in units of the
variance R(0) of Y(t). Corresponding infinite memory notation
is
(4) Yμ(t) = E[Y(t)|Y(t-1), Y(t-2),...],
(5) Yν(t) = Y(t) - Yμ(t),
(6) σ²∞ = E[|Yν(t)|²] ÷ E[|Y(t)|²].
Explicit formulas for the foregoing predictors,
prediction errors, and mean square prediction errors can be
obtained from the correlation function p(v). The autoregressive
coefficients αm(1),...,αm(m) of order m are defined by
(7) -Yμ,m(t) = αm(1) Y(t-1) + ... + αm(m) Y(t-m),
(8) Yν,m(t) = Y(t) + αm(1) Y(t-1) + ... + αm(m) Y(t-m).
A predictor is characterized by the condition that the
prediction error is orthogonal (normal) to the predictor
variables:
(9) E[Yν,m(t) Y(t-k)] = 0, k=1,...,m.
By substituting (8) into (9) one obtains the famous Yule-Walker
equations, defining αm(0) = 1:
(10) Σ_{j=0}^{m} αm(j) p(j-k) = 0, k=1,...,m.
One obtains σ²m by
(11) σ²m = E[Yν,m(t) Y(t)] ÷ E[|Y(t)|²] = Σ_{j=0}^{m} αm(j) p(j).
For a short memory time series, these equations also hold with m = ∞.
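The Yule-Walker equations (10) and their by-product (11) can be sketched in a few lines; the AR(1) correlation function p(v) = 0.5^v used here is an assumed example chosen so that the exact answer is known.

```python
import numpy as np

def yule_walker(p, m):
    """Return (alpha, sigma2) with alpha[0] = 1 and
    sum_{j=0}^{m} alpha[j] p(j-k) = 0 for k = 1, ..., m   (equation (10))."""
    P = np.array([[p[abs(j - k)] for j in range(1, m + 1)]
                  for k in range(1, m + 1)])            # m x m Toeplitz block
    rhs = -np.array(p[1:m + 1])                         # moves the j = 0 terms over
    alpha = np.concatenate(([1.0], np.linalg.solve(P, rhs)))
    sigma2 = sum(alpha[j] * p[j] for j in range(m + 1))  # equation (11)
    return alpha, sigma2

p = [0.5 ** v for v in range(4)]          # assumed AR(1) correlations
alpha, sigma2 = yule_walker(p, 3)
# Exact AR(1) answer: alpha = (1, -0.5, 0, 0), sigma2 = 0.75
print(np.round(alpha, 6), round(float(sigma2), 6))
```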
The time series of infinite memory prediction errors
Yν(t) is always a white noise series called the innovations.
It provides a transformation of the time series Y(t) to a
white noise time series Yν(t), which we write
(12) Y(t) + α∞(1) Y(t-1) + α∞(2) Y(t-2) + ... = Yν(t),
and call an AR(∞) or infinite order autoregressive scheme for
Y(t). An MA(∞) or infinite order moving average scheme
representation is
(13) Y(t) = Yν(t) + β∞(1) Yν(t-1) + ...,
whose coefficients β∞(k) can be determined recursively from
α∞(j) by β∞(0) = 1, and for k > 0
(14) β∞(k) + Σ_{j=1}^{k} α∞(j) β∞(k-j) = 0.
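Recursion (14) is immediate to implement. The sketch below assumes the AR(1) filter with coefficients (1, -0.5) as an example; for Y(t) - 0.5 Y(t-1) = ε(t) the moving average coefficients are β(k) = 0.5^k.

```python
def ma_from_ar(alpha, n):
    """Return [beta(0), ..., beta(n-1)] from recursion (14);
    alpha[0] must equal 1."""
    beta = [1.0]
    for k in range(1, n):
        top = min(k, len(alpha) - 1)                 # alpha(j) = 0 past its order
        beta.append(-sum(alpha[j] * beta[k - j] for j in range(1, top + 1)))
    return beta

# Assumed AR(1) example: beta(k) = 0.5^k
print(ma_from_ar([1.0, -0.5], 5))   # [1.0, 0.5, 0.25, 0.125, 0.0625]
```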
The AR(∞) and MA(∞) representations have important
implications for spectral analysis since they provide formulas
for the spectral density function f(w) alternative to the
formula that f(w) is the Fourier transform of p(v). One
can show that
(15) f(w) = σ²∞ |h∞(e^{2πiw})|²,
(16) f^{-1}(w) = σ∞^{-2} |g∞(e^{2πiw})|²,
defining
(17) h∞(z) = Σ_{j=0}^{∞} β∞(j) z^j, g∞(z) = Σ_{k=0}^{∞} α∞(k) z^k.
These infinite series converge in general in mean square on
the unit interval. In order to guarantee pointwise convergence
at each w in 0 ≤ w ≤ 1 one must make an additional smoothness
condition such as a Lipschitz condition on f(w), which is implied
in turn by the condition
(18) Σ_v |v| |p(v)| < ∞.
In spectral analysis we usually assume at least the existence
of a continuous second derivative, which is implied by the
condition
(19) Σ_v |v|² |p(v)| < ∞.
The notion of a time series Y(.) being an autoregressive
scheme of order p, denoted AR(p), can be defined in terms of
prediction theory as follows: Y(.) is an AR(p) if and only if
the memory p prediction error series Yν,p(·) is a white noise time
series and αp(p) ≠ 0. The spectral density of Yν,p(·) can be
expressed in terms of the autoregressive transfer function of
order p
(20) gp(z) = αp(0) + αp(1) z + ... + αp(p) z^p
by
(21) fν,p(w) = |gp(e^{2πiw})|² f(w).
If the time series Y(.) is in fact AR(p), then its spectral
density equals the function
(22) fp(w) = σ²p |gp(e^{2πiw})|^{-2},
which we call, in general, the approximating autoregressive
spectral density of order p. A time series Y(.) can be
regarded as approximated by an AR(p) if
(23) fν,p(w) ÷ σ²p = f(w) ÷ fp(w)
can be regarded as not "significantly" different from the
constant 1. In this way a test of the hypothesis that a time
series Y(-) is AR(p) can be converted to a test of the hypothesis
that the prediction error time series is white noise.
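The approximating autoregressive spectral density (22) is straightforward to evaluate on a grid. In the sketch below, the order-1 coefficients α = (1, -0.5) and σ² = 0.75 are the exact values for an AR(1) with correlations p(v) = 0.5^|v|, assumed here for illustration; the checks confirm that fp(w) integrates to p(0) = 1 and that its lag-one Fourier coefficient recovers p(1) = 0.5.

```python
import numpy as np

def ar_spectral_density(alpha, sigma2, w):
    """Equation (22): f_p(w) = sigma2 |g_p(e^{2 pi i w})|^{-2}."""
    powers = np.outer(w, np.arange(len(alpha)))
    g = np.exp(2j * np.pi * powers) @ np.asarray(alpha)   # g_p(e^{2 pi i w})
    return sigma2 / np.abs(g) ** 2

w = np.arange(512) / 512
f1 = ar_spectral_density([1.0, -0.5], 0.75, w)

# Consistency checks against the correlations of the assumed AR(1)
print(round(float(np.mean(f1)), 4),
      round(float(np.real(np.mean(np.exp(2j * np.pi * w) * f1))), 4))   # 1.0 0.5
```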
The sequence of approximating autoregressive spectral
densities fm(w), m=1,2,... may be shown to converge as m tends
to ∞ to f(w) at each w in 0 ≤ w ≤ 1 under suitable conditions (see
especially Nevai (1979)). Sufficient conditions are that f(w)
has finite dynamic range (and therefore is bounded above and
below) and has a continuous derivative. When an estimator,
denoted f̂m(w), of fm(w) is used as an estimator of f(w), one has
to take into account two kinds of errors, called respectively
bias and variance. Bias is a measure of the deterministic
difference between fm(w) and f(w), while variance is a measure
of the stochastic distance between f̂m(w) and fm(w). As m
increases bias decreases while variance increases. This is an
example of the fundamental problem of empirical spectral
analysis, which is how to achieve an optimal balance between
bias and variance. When one uses autoregressive spectral
estimation, this problem reduces to a question of determining
the order m of the approximating autoregressive scheme, which
is discussed in section 6.
It should be noted that basic references for the
mathematical properties (especially convergence) of autoregressive
transfer functions gm(z) are Geronimus (1960) and Grenander and
Szego (1958).
3. Entropy Interpretation of Autoregressive Spectral Densities
The use of autoregressive spectral densities as exact
models, and as approximating models, for true spectral densities
is often questioned by sceptical statisticians on the ground
that their use in general is ad hoc and without theoretical
justification. A possible answer to this criticism is provided
by entropy concepts. This section reviews these concepts in order
to state their application to spectral estimation.
The notion of entropy in statistics is usually first defined
for a discrete distribution with probability mass function p(x).
The entropy of this distribution, denoted H(p), is defined by
(1) H(p) = -Σ_x p(x) log p(x).
For the distribution of a continuous real valued random variable
X, with probability density function f(x), entropy is defined
(analogously or formally) by
(2) H(f) = -∫_{-∞}^{∞} f(x) log f(x) dx.
A concept closely related to entropy is information
divergence 1(f;g) between two probability densities f(x) and
g(x), defined by
(3) I(f;g) = ∫_{-∞}^{∞} {-log (g(x) ÷ f(x))} f(x) dx.
15
One should note that I(f;g) equals minus the generalized
entropy H(f|g) defined by
(4) H(f|g) = ∫_{-∞}^{∞} {log (g(x) ÷ f(x))} f(x) dx.
Another fundamental concept is cross-entropy defined by
(5) H(f;g) = ∫_{-∞}^{∞} {-log g(x)} f(x) dx.
Note that H(f) = H(f;f).
Information divergence is expressed in terms of cross-
entropy and entropy by
(6) I(f;g) = H(f;g) - H(f).
Important Information Inequality:
(7) I(f;g) ≥ 0
with equality if and only if f = g; consequently
(8) H(f) ≤ H(f;g).
Proof: Apply Jensen's inequality which states for an
arbitrary function h(x)
(9) ∫_{-∞}^{∞} {log h(x)} f(x) dx ≤ log ∫_{-∞}^{∞} h(x) f(x) dx
with equality if and only if h(x) = 1 for almost all x with
respect to the probability density f(x).
Some applications of entropy in probability and statistical
modeling are as follows.
The method of maximum likelihood parameter estimation can
be described abstractly as follows. One introduces a parametric
family of probability densities fθ(x), indexed by a vector
parameter θ = (θ1,...,θk). Suppose there is a true parameter
value θ̄ in the sense that the true probability density
f(x) = fθ̄(x). Then θ̄ satisfies
(10) H(f) = H(f;fθ̄) = min_θ H(f;fθ).
To estimate θ̄ from data, one forms an estimator Ĥ(f;fθ) of
H(f;fθ) and defines an estimator θ̂ of θ̄ by
(11) Ĥ(f;fθ̂) = min_θ Ĥ(f;fθ).
The estimator Ĥ(f;fθ) could be of the form
(12) Ĥ(f;fθ) = H(f̃;fθ)
for a suitable raw estimator f̃(x) of f(x).
The parametric families of probability densities fθ(x)
are often derived axiomatically using a maximum entropy principle.
Theorem: Fix k functions Tj(x), j=1,2,...,k, and k real
numbers τ1, τ2,...,τk such that there exist probability
densities f(x) satisfying
(13) ∫_{-∞}^{∞} Tj(x) f(x) dx = τj, j=1,...,k.
The density with maximum entropy H(f) among these densities
is of the form
(14) log fθ(x) = Σ_{j=1}^{k} θj Tj(x) - ψ(θ1,...,θk),
where
(15) ψ(θ1,...,θk) = log ∫ dx exp {Σ_{j=1}^{k} θj Tj(x)},
and θ1,...,θk are chosen to satisfy
(16) ∫_{-∞}^{∞} Tj(x) fθ(x) dx = τj, j=1,...,k.
Proof: The theorem may be proved by calculus of variations
arguments. A quick proof is to verify that for any f(x)
satisfying the moment constraints (13)
(17) H(f;fθ) = ψ(θ1,...,θk) - Σ_{j=1}^{k} θj τj = H(fθ),
and therefore
(18) H(f) ≤ H(f;fθ) = H(fθ).
Thus the maximum entropy is achieved by fθ(x).
Natural exponential models. A parametric family of probability
densities fθ(x) is said to obey a natural exponential model
when it is of the form (14). Thus natural exponential models
are maximum entropy probability densities.
To extend entropy concepts to short memory stationary zero
mean Gaussian time series, define the information divergence
for a sample {Y(t), t=1,2,...,T} as a function of the true
probability density f of the sample, and a model g for f. We
define
(19) I(f;g) = lim_{T→∞} I_T(f;g),
(20) I_T(f;g) = T^{-1} E_f [-log {g(Y(1),...,Y(T)) ÷ f(Y(1),...,Y(T))}].
It should be noted that we are using the notation f and g with
a variety of meanings. For a Gaussian zero mean stationary time
series, the probability density of the sample is specified by
the spectral densities f(w) of the true distribution and g(w)
of the model. The arguments of the information divergence
I(f;g) indicate spectral densities in the following discussion.
Pinsker (1963) derives the following very important formula:
(21) I(f;g) = (1/2) ∫_0^1 {f(w)/g(w) - log (f(w)/g(w)) - 1} dw.
Since u - log u - 1 ≥ 0 for all u > 0, I has two of the properties
of a distance: I(f;g) ≥ 0, I(f;f) = 0. However I does not
satisfy the triangle inequality.
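Pinsker's formula (21) and these two properties can be checked numerically. The sketch below evaluates the integral as a grid average for two AR(1) spectral densities; the densities are assumed examples, not taken from the report.

```python
import numpy as np

w = (np.arange(2048) + 0.5) / 2048      # frequency grid on (0, 1)

def ar1_density(phi):
    # spectral density of an AR(1) with correlations phi^|v|
    return (1 - phi ** 2) / np.abs(1 - phi * np.exp(-2j * np.pi * w)) ** 2

def divergence(f, g):
    """Equation (21), with the integral approximated by a grid mean."""
    u = f / g
    return 0.5 * float(np.mean(u - np.log(u) - 1))

f, g = ar1_density(0.5), ar1_density(0.3)
print(divergence(f, f) == 0.0, divergence(f, g) > 0.0)   # True True
```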
We define the cross-entropy of spectral density functions
f(w) and g(w) by
(22) H(f;g) = (1/2) ∫_0^1 {log g(w) + f(w)/g(w)} dw.
The entropy of f is
(23) H(f) = H(f;f) = (1/2) ∫_0^1 {log f(w) + 1} dw.
Information divergence can be expressed
(24) I(f;g) = H(f;g) - H(f).
Hence
(25) H(f) ≤ H(f;g).
An approximating autoregressive spectral density of order
m, denoted f̄m(w), to a spectral density f(w) is defined by
(26) H(f;f̄m) = min_{fm} H(f;fm),
where the minimization is over all fm of the form
(27) fm(w) = σ²m |gm(e^{2πiw})|^{-2},
(28) gm(z) = 1 + αm(1) z + ... + αm(m) z^m.
One may verify that
(29) H(f;fm) = (1/2) {log σ²m + σm^{-2} ∫_0^1 |gm(e^{2πiw})|² f(w) dw}.
The coefficients σ̄²m, ᾱm(1),...,ᾱm(m) of the minimum cross-
entropy approximating autoregressive spectral density satisfy
(30) σ̄²m = ∫_0^1 |ḡm(e^{2πiw})|² f(w) dw = Σ_{j=0}^{m} ᾱm(j) p(j),
(31) ∫_0^1 ḡm(e^{2πiw}) e^{-2πikw} f(w) dw = Σ_{j=0}^{m} ᾱm(j) p(j-k) = 0, k=1,2,...,m.
Further
(32) H(f;f̄m) = (1/2) {log σ̄²m + 1}.
The autoregressive spectral density f̄m(w) can be derived
axiomatically using a maximum entropy principle.
Theorem: The spectral density with maximum entropy among
all spectral densities f(w) satisfying the constraints
(33) ∫_0^1 e^{2πiwj} f(w) dw = p(j), j=1,2,...,m,
for m specified correlation coefficients p(1),...,p(m) is
f̄m(w).
Proof: It may be verified that f̄m(w) satisfies the
constraints (33), and (32) holds for any f(w) satisfying (33).
Since
(34) H(f) ≤ H(f;f̄m) = H(f̄m),
it follows that f̄m has maximum entropy among all spectral
densities satisfying the constraints (33).
The maximum entropy principle provides a motivation or
justification for the use of autoregressive spectral estimators.
However the maximum entropy principle provides no insight into
how to identify an optimal order m, or even what are the effects
of different methods of estimating the parameters σ²m,
αm(1),...,αm(m). It provides no guidance for how to learn from
the data whether the time series is non-stationary (long memory)
or stationary (short memory), or whether the best time series
model is AR, MA, or ARMA. It is a principle for deriving
probability models, rather than statistically fitting models to
data. Further, the maximum entropy principle justifies auto-
regressive estimators only for short memory time series.
Autoregressive estimators are justified for long memory time
series by the fact that a pure harmonic Y(t) = A cos 2πwt
+ B sin 2πwt satisfies Y(t) - φ Y(t-1) + Y(t-2) = 0, where φ = 2 cos 2πw.
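The closing identity is easy to verify numerically: whatever the amplitudes, the harmonic is annihilated by the second order difference operator with φ = 2 cos 2πw. The amplitudes A, B and frequency w0 below are arbitrary assumed values.

```python
import numpy as np

A, B, w0 = 1.3, -0.7, 0.12
phi = 2.0 * np.cos(2 * np.pi * w0)

def Y(t):
    # pure harmonic observed at integer times
    return A * np.cos(2 * np.pi * w0 * t) + B * np.sin(2 * np.pi * w0 * t)

t = np.arange(2, 50)
residual = Y(t) - phi * Y(t - 1) + Y(t - 2)   # should vanish identically
print(bool(np.max(np.abs(residual)) < 1e-12))   # True
```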
4. Parametrizations of Autoregressive Spectral Estimators
There are many ways for forming autoregressive spectral
estimators, because there are four equivalent ways of parametrizing
them.
A. Autoregressive coefficients. Consider coefficients
αm(1),...,αm(m) such that gm(z) = 1 + αm(1) z + ... + αm(m) z^m satisfies
gm(z) ≠ 0 for complex z such that |z| ≤ 1. We call gm(z) a
minimum phase filter transfer function. The autoregressive
coefficients determine σ²m by
σm^{-2} = ∫_0^1 |gm(e^{2πiw})|^{-2} dw.
One computes the correlation coefficients p(1),...,p(m) by
p(j) = ∫_0^1 exp(2πiwj) σ²m |gm(e^{2πiw})|^{-2} dw.
B. Correlations. A set of m coefficients p(1),...,p(m)
such that the matrix [with p(0) = 1, p(-v) = p(v)]

p(0)     p(-1)    . . .  p(1-m)
p(1)     p(0)     . . .  p(2-m)
  .        .                .
p(m-1)   p(m-2)   . . .  p(0)

is positive definite are correlation coefficients of a time series.
They determine autoregressive coefficients by solving the Yule-
Walker equations [with αm(0) = 1]
Σ_{j=0}^{m} αm(j) p(j-k) = 0, k=1,...,m.
One computes σ²m by σ²m = Σ_{j=0}^{m} αm(j) p(j).
C. Partial correlations. Consider coefficients
π(1),...,π(m) satisfying |π(j)| < 1, j=1,2,...,m. They represent
partial correlation coefficients defined theoretically by:
π(j) = Corr[Y(t), Y(t-j) | Y(t-1),...,Y(t-j+1)].
Partial correlation coefficients determine autoregressive
coefficients and residual variances by a recursive algorithm
called the Levinson-Durbin recursion [see Levinson (1947) and
Durbin (1960)]: for k=1,
α1(1) = -π(1), σ²1 = 1 - π²(1),
while for k=2,3,...,m,
αk(k) = -π(k) [starting from correlations, π(k) = σk-1^{-2} Σ_{j=0}^{k-1} αk-1(j) p(k-j)],
σ²k = σ²k-1 {1 - π²(k)},
αk(j) = αk-1(j) - π(k) αk-1(k-j), j=1,...,k-1.
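The Levinson-Durbin recursion can be sketched as follows, starting from correlations p(0)=1, p(1), ..., p(m); the AR(1) correlations p(v) = 0.5^v are an assumed example with known exact answer.

```python
import numpy as np

def levinson_durbin(p, m):
    """Return (alpha, sigma2, partials) by the recursion above."""
    pi_ = np.zeros(m + 1)
    pi_[1] = p[1]
    alpha = np.array([1.0, -pi_[1]])          # alpha_1(0), alpha_1(1)
    sigma2 = 1.0 - pi_[1] ** 2
    for k in range(2, m + 1):
        # pi(k) = sigma2_{k-1}^{-1} sum_{j=0}^{k-1} alpha_{k-1}(j) p(k-j)
        pi_[k] = sum(alpha[j] * p[k - j] for j in range(k)) / sigma2
        new = np.concatenate((alpha, [0.0]))
        # alpha_k(j) = alpha_{k-1}(j) - pi(k) alpha_{k-1}(k-j); alpha_k(k) = -pi(k)
        new[1:] -= pi_[k] * alpha[::-1]
        alpha, sigma2 = new, sigma2 * (1.0 - pi_[k] ** 2)
    return alpha, sigma2, pi_[1:]

p = [0.5 ** v for v in range(4)]              # assumed AR(1) correlations
alpha, sigma2, pcs = levinson_durbin(p, 3)
# Exact AR(1): partials (0.5, 0, 0), alpha = (1, -0.5, 0, 0), sigma2 = 0.75
print(np.round(alpha, 6), round(float(sigma2), 6), np.round(pcs, 6))
```

The vanishing of the partial correlations past lag 1 is exactly how an AR order reveals itself in parametrization C.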
Autoregressive coefficients determine partial correlation
coefficients by the inverse recursion [see Barndorff-Nielsen and Schou
(1973)]: for k = m, m-1,...,2, with π(k) = -αk(k),
αk-1(j) = {1 - π²(k)}^{-1} {αk(j) + π(k) αk(k-j)}, j=1,...,k-1.
D. Residual variances. Consider coefficients
σ²1, σ²2,...,σ²m satisfying
1 > σ²1 > σ²2 > . . . > σ²m > 0,
and m coefficients representing signs sign π(1),...,sign π(m). The
σ²'s represent residual variances; they determine partial
correlation coefficients by a formula noted by Dickinson (1978):
π(k) = sign π(k) {1 - σ²k ÷ σ²k-1}^{1/2}, with σ²0 = 1.
5. Empirical Autoregressive Spectral Estimation
Given a sample {Y(t), t=1,2,...,T} of a zero mean Gaussian
stationary time series whose spectral density f(w) is to be
estimated by an autoregressive spectral density estimator
f̂m(w) = σ̂²m |ĝm(e^{2πiw})|^{-2},
ĝm(z) = 1 + α̂m(1) z + ... + α̂m(m) z^m,
we define the order identification problem to be the choice of
order m, and the parameter estimation problem to be the choice
of algorithm for computing the coefficients α̂m(1),...,α̂m(m) and
the residual variance σ̂²m.
For a sample Y(l),...,Y(T) of a zero mean Gaussian
stationary time series, an approximation for the joint
probability density function fθ(Y(1),...,Y(T)) indexed by a
parameter θ is obtained as follows. We assume that the time
series Y(t) has been divided by {R(0)}^{1/2} so that its covariance
function equals its correlation function. Then
-2 log fθ(Y(1),...,Y(T)) = log {(2π)^T det K_T} + Y*_T K_T^{-1} Y_T,
where * denotes complex conjugate transpose, Y*_T = (Y(1),...,Y(T)),
K_T = E[Y_T Y*_T] is a covariance matrix with (s,t)-element equal
to pθ(s-t). The subscripts θ on pθ(v) and fθ(w) indicate that
they are functions of the parameters θ, which are to be estimated.
The covariance matrix K_T is a Toeplitz matrix. Asymptotically,
as T tends to ∞, all T by T Toeplitz matrices have the same
eigenvectors exp(-2πitj/T), j=0,1,...,T-1. The corresponding
eigenvalues are fθ(j/T). An approximation for the likelihood
function frequently adopted is therefore
-T^{-1} log fθ(Y(1),...,Y(T)) = (1/2) log 2π + (1/2) ∫_0^1 {log fθ(w) + f̃(w)/fθ(w)} dw
= (1/2) log 2π + H(f̃;fθ),
where f̃(w) is the sample spectral density defined by
f̃(w) = |Σ_{t=1}^{T} Y(t) exp(-2πitw)|² ÷ Σ_{t=1}^{T} Y²(t).
Maximum likelihood estimators θ̂ are asymptotically equivalent
to the estimators θ minimizing the sample cross-entropy
H(f̃;fθ).
If the parametric model fθ(w) is assumed to be the spectral
density of an AR(p), then estimators σ̂²p, α̂p(1),...,α̂p(p) of the
coefficients satisfy Yule-Walker equations corresponding to the
sample correlation function
p̃(v) = ∫_0^1 e^{2πivw} f̃(w) dw
= Σ_{t=1}^{T-v} Y(t) Y(t+v) ÷ Σ_{t=1}^{T} Y²(t).
The sample correlation function p̃(v) can be computed, using the
fast Fourier transform, by
p̃(v) = Q^{-1} Σ_{k=0}^{Q-1} exp(2πikv/Q) f̃(k/Q),
which holds for 0 ≤ v ≤ Q - T.
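The FFT route to the sample correlations can be sketched as follows: zero-pad Y to length Q, form the sample spectral density at the grid frequencies k/Q, and inverse-transform. The white-noise input series is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
Y = rng.standard_normal(T)
Y = Y - Y.mean()                              # detrend by the sample mean

Q = 512                                       # choose Q - T >= largest lag wanted
d = np.fft.fft(Y, n=Q)                        # sum_t Y(t) exp(-2 pi i t k / Q)
f_tilde = (np.abs(d) ** 2) / np.sum(Y ** 2)   # sample spectral density at k/Q
rho = np.real(np.fft.ifft(f_tilde))           # (1/Q) sum_k e^{2 pi i k v/Q} f(k/Q)

# Direct check of rho(v) = sum_t Y(t) Y(t+v) / sum_t Y(t)^2 at lag 3
direct = np.dot(Y[:-3], Y[3:]) / np.sum(Y ** 2)
print(abs(rho[3] - direct) < 1e-10)           # True
```

Because Q - T = 312 here, the circular transform agrees with the linear sum for all lags up to 312, which is the content of the restriction 0 ≤ v ≤ Q - T.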
It should be noted that we are assuming the time series
Y(t) to be zero mean, or more generally to have been detrended
by subtraction of μ̂(t), an estimator of the mean value
function μ(t) = E[Y(t)]. When μ(t) = μ, a constant, we take
μ̂ = Ȳ. When μ(t) is a function with period d (as might be
the case with d=12 for monthly time series) one might take
for μ̂(t) the mean of Y(s) for all s = t modulo d.
By recursively solving the Yule-Walker equations one can
determine sequences of (1) estimated residual variances
σ̂²1 ≥ σ̂²2 ≥ ...,
(2) estimated partial correlations
π̂(1), π̂(2),...,
(3) estimated autoregressive coefficients
α̂m(0) = 1, α̂m(1),...,α̂m(m),
(4) autoregressive spectral density estimators
f̂m(w) = σ̂²m |Σ_{j=0}^{m} α̂m(j) exp(2πijw)|^{-2},
(5) residual spectral densities
f̃ν,m(w) = f̃(w) ÷ f̂m(w).
By forming a smoothed version f̄ν,m(w) of f̃ν,m(w) one can obtain a
final estimator f̂(w) of the unknown spectral density:
f̂(w) = f̂m(w) f̄ν,m(w).
When f̃ν,m(w) is tested for white noise, and found not to be
significantly different from white noise, then
f̂(w) = f̂m(w),
and the autoregressive spectral density estimator is the final
estimator.
The important question of criteria for choosing the orders
of approximating spectral densities is discussed in the next
section.
Computing estimators of autoregressive coefficients by
solving Yule-Walker equations is called stationary autoregression
because the autoregressive coefficients obtained are guaranteed
to correspond to a stationary time series. When σ̂²m in the
foregoing analysis is tending to approximate 0, we consider
the time series to be long memory; experimental evidence
indicates that more reliable estimators of the spectral density,
and also of the autoregressive coefficients, are provided by
least-squares autoregression, which solves the normal equations
"A oo . A(,f m2K (0,0) ... K(0,m) I 0
K (1,0 .. .MA A 0
K(m,0) ... K(m,m) am(m) 0
for a suitable estimator K(ij) of
K(ij) = E(Y(t-i) Y(t-j)]
Possible estimators (for i,j = 0,1,...,m) are: the least squares
forward algorithm
K̂(i,j) = (T-m)^{-1} Σ_{t=0}^{T-m-1} Y(t+m-i) Y(t+m-j),
or the least squares forward and backward algorithm
K̂(i,j) = {2(T-m)}^{-1} Σ_{t=0}^{T-m-1} {Y(t+m-i) Y(t+m-j) + Y(t+i) Y(t+j)}.
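Least squares autoregression can be sketched by forming the forward and backward estimates K̂(i,j) and solving the normal equations directly. The simulated AR(2) series, its coefficients, and the sample size are assumed examples chosen so the estimates have a known target.

```python
import numpy as np

rng = np.random.default_rng(1)
T, m = 500, 2
Y = np.zeros(T)
e = rng.standard_normal(T)
for t in range(2, T):                     # Y(t) = 0.9 Y(t-1) - 0.4 Y(t-2) + e(t)
    Y[t] = 0.9 * Y[t - 1] - 0.4 * Y[t - 2] + e[t]

def K_hat(i, j):
    """Forward and backward estimate of E[Y(t-i) Y(t-j)]."""
    fwd = np.dot(Y[m - i:T - i], Y[m - j:T - j])
    bwd = np.dot(Y[i:T - m + i], Y[j:T - m + j])
    return (fwd + bwd) / (2 * (T - m))

K = np.array([[K_hat(i, j) for j in range(m + 1)] for i in range(m + 1)])
# Rows 1..m of the normal equations give the coefficients; row 0 gives sigma2
alpha = np.linalg.solve(K[1:, 1:], -K[1:, 0])   # alpha_m(1), ..., alpha_m(m)
sigma2 = K[0, 0] + K[0, 1:] @ alpha

# Estimates should be near alpha = (-0.9, 0.4) and sigma2 near Var e(t) = 1
print(np.round(alpha, 2), round(float(sigma2), 2))
```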
When several harmonics are present in the data, whose
frequencies are close together, least squares autoregressive
coefficient estimators are more effective than Yule-Walker
autoregressive coefficient estimators in providing autoregressive
spectral estimators which exhibit the split peaks one would like
to see in the estimated spectral density.
An important and popular algorithm for estimation of AR
coefficients was introduced by Burg in 1967 [see Burg (1967),
(1968)]. For references to descriptions of Burg's algorithm,
see Kay and Marple (1981).
6. Autoregressive Order Determination
The problem of determining the orders of approximating
autoregressive schemes is an important example of the problem
of estimating a function by using the smallest finite number of
parameters which provide an adequate approximation of the
function. The true spectral density is denoted f(w) or
f∞(w). An approximation f̄m(w) is defined by assuming a family
of densities f_{θ1,...,θm}(w) which are functions of m scalar
parameters θ1,...,θm. The parameter values θ̄1,...,θ̄m which minimize
the cross-entropy H(f;f_{θ1,...,θm}) define a best approximating
spectral density f̄m(w) = f_{θ̄1,...,θ̄m}(w). An estimator of f̄m(w)
is f̂m(w) = f_{θ̂1,...,θ̂m}(w), where θ̂1,...,θ̂m minimize Ĥ(f;f_{θ1,...,θm}).
To evaluate the properties of f̂m(w) as an estimator of f∞(w),
one must distinguish two kinds of error. The model approximation or
bias error is
B(m) = I(f∞; f̄m).
The parameter estimation error or variance is
V(m,T) = E I(f̄m; f̂m).
As m tends to ∞, B(m) tends to 0 and V(m,T) tends to ∞. The
optimal value of m minimizes E I(f∞; f̂m) as measured by
C(m) = B(m) + V(m,T).
In practice, one forms an estimator Ĉ(m) of the terms in C(m)
which depend on m.
One calls C(m) a criterion function for order determination.
It should be plotted, and interpreted as a function, not just
examined for its minimum value. It is useful to define a
best value of m (at which C(m) is minimized) and a second
best value of m (at which C(m) achieves its lowest relative
minimum).
One also has to define a value C(O) of the criterion function
at m=O. If
C(m) > C(0) for m=1,2,...
then the optimum order is 0, and the time series is considered
to be not significantly different from white noise. Further
research is required on the properties of order determining
criteria as tests for white noise.
Tests for white noise provide an alternative approach to
order determination since an autoregressive estimator fm(w) is
regarded as an adequate fit (or smoother) if the residual
spectral density f̃(w) ÷ f̂m(w) is not significantly different
from the sample spectral density of white noise.
A widely used order determining criterion is that
introduced by Akaike [see Akaike (1974)]. It should be
33
emphasized that Akaike's criterion had a different conceptual
basis than the one outlined above; it seeks to determine the
order of an exact autoregressive scheme which the time series is
assumed to obey. Then one can raise the objection against it that
it does not consistently estimate the order, which is done by a
criterion due to Hannan and Quinn (1979). Our point of view is
that the approximating autoregressive scheme need only have the
property that f(w) fm(w) is just barely not significantly
different from the sample spectral density of white noise.
Akaike's order determining criterion AIC is defined by

    AIC(m) = log σ̂²_m + 2m/T.

Possible definitions for AIC(0) are 0 or -1/T.
The Hannan and Quinn criterion is

    AICHQ(m) = log σ̂²_m + 2m log log T / T.
Parzen (1974), (1977) introduced an approximating
autoregressive order criterion called CAT (criterion autoregressive
transfer function), defined by

    CAT(m) = (1/T) Σ_{j=1}^{m} (1/σ̄²_j) - 1/σ̄²_m ,    σ̄²_j = (T/(T-j)) σ̂²_j ,

    CAT(0) = -(1 + 1/T).
In practice CAT and AIC lead in many examples to exactly
the same orders. It appears reassuring that quite different
conceptual foundations can lead to similar conclusions in
practice.
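In modern notation, the three criteria can be computed directly from the estimated innovation variances. The following Python sketch is purely illustrative (the function name, the numpy vectorization, and the convention that sigma2[0] = 1 for a series normalized to unit variance are our choices, not the report's):

```python
import numpy as np

def order_criteria(sigma2, T):
    """Given estimated innovation variances sigma2[m] for AR orders
    m = 0, 1, ..., M (with sigma2[0] = 1 for a normalized series),
    compute the AIC, Hannan-Quinn, and CAT order criteria."""
    M = len(sigma2) - 1
    m = np.arange(M + 1)
    aic = np.log(sigma2) + 2.0 * m / T
    hq = np.log(sigma2) + 2.0 * m * np.log(np.log(T)) / T
    # CAT uses the bias-adjusted variances sig2bar_j = T*sigma2_j/(T-j)
    sig2bar = T * sigma2 / (T - m)
    cat = np.empty(M + 1)
    cat[0] = -(1.0 + 1.0 / T)
    for k in range(1, M + 1):
        cat[k] = np.sum(1.0 / sig2bar[1:k+1]) / T - 1.0 / sig2bar[k]
    return aic, hq, cat
```

Plotting all three curves against m, rather than only reading off their minima, follows the report's advice on interpreting criterion functions.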
7. Suggestions for Empirical Spectral Analysis
The basic aim of spectral analysis is to obtain an estimated
spectral density which does not introduce spurious spectral
peaks, and which resolves close spectral peaks. To arrive at the
final form of spectral estimator in an applied problem,
autoregressive spectral estimators can be used to identify the
memory type of a time series (long, short, or no memory) and
the type of the whitening filter of a short memory time series
(AR, MA, or ARMA). An empirical time series spectral analysis
should involve the following stages.
A. Pre-processing. To analyze a time series sample
Y(t), t=1,...,T, one proceeds in stages which often involve
the subtraction or elimination of strong effects in order to
see more clearly weaker patterns in the time series structure.
The aim of pre-processing is to transform Y(.) to a new time
series Ỹ(.) which is short memory. Some basic pre-processing
operations are memoryless transformations (such as square root
and logarithm), detrending, "high pass" filtering, and differencing.
One usually subtracts out the sample mean Ȳ = (1/T) Σ_{t=1}^{T} Y(t);
the time series actually processed is then Y(t) - Ȳ.
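The pre-processing stage might be sketched as follows (an illustrative composition of the operations named above; the function and flag names are ours):

```python
import numpy as np

def preprocess(y, log_transform=False, difference=False):
    """Illustrative pre-processing: optional memoryless log transform,
    optional first differencing, then subtraction of the sample mean."""
    y = np.asarray(y, dtype=float)
    if log_transform:
        y = np.log(y)          # variance-stabilizing; requires y > 0
    if difference:
        y = np.diff(y)         # removes a linear trend
    return y - y.mean()        # series actually processed is Y(t) - Ybar
```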
B. Sample Fourier Transform by Data Windowing, Extending
with Zeroes, and Fast Fourier Transform. Let Y(t) denote a
pre-processed time series. The first step in the analysis could
be to compute successive autoregressive schemes using operations
only in the time domain. An alternative first step is the
computation of the sample Fourier transform

(1)    φ(w) = Σ_{t=1}^{T} Y(t) exp(-2πiwt),    w = k/Q, k = 0, 1, ..., Q-1,

at an equi-spaced grid of frequencies in 0 ≤ w < 1. We call Q the
spectral computation number. One should always choose Q > T, and we
recommend Q ≥ 2T. Prior to computing φ(w), one extends the
length of the time series by adding zeroes to it; φ(w) can then
be computed using the Fast Fourier Transform.
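A sketch of the zero-padded transform using a modern FFT routine (numpy's indexing starts the series at t = 0 rather than t = 1, which changes φ(w) only by a phase factor; the next-power-of-two choice of Q is our convention, subject to Q ≥ 2T):

```python
import numpy as np

def sample_fourier_transform(y, factor=2):
    """Sample Fourier transform phi(w) at w = k/Q, k = 0,...,Q-1,
    with spectral computation number Q >= factor*T obtained by
    extending the series with zeroes (Q is taken as the next power
    of two at least factor*T, a common FFT-friendly choice)."""
    T = len(y)
    Q = 1 << int(np.ceil(np.log2(factor * T)))
    return np.fft.fft(y, n=Q), Q   # fft zero-pads y to length Q
```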
If the time series may be long memory, one should compute
in addition a sample "tapered" or "data windowed" Fourier
transform

(2)    φ̃(w) = Σ_{t=1}^{T} Y(t) W(t/T) exp(-2πiwt).
C. Sample Spectral Density. The sample spectral density
f̂(w) is obtained essentially by squaring and normalizing the
sample Fourier transform:

(3)    f̂(w) = |φ(w)|² / Σ_{t=1}^{T} Y²(t),    w = k/Q, k = 0, 1, ..., Q-1.
D. Sample Correlation Function. The sample correlation
function ρ̂(v) is computed (using the Fast Fourier Transform).
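Steps C and D can be combined in one pass, since the inverse FFT of |φ(w)|² recovers the lagged products (a sketch; the normalization by the sum of squares follows equation (3) above, and requires Q ≥ 2T so that the circular sums do not wrap around):

```python
import numpy as np

def sample_spectral_density_and_correlations(y, Q):
    """Sample spectral density f(k/Q) = |phi(k/Q)|^2 / sum(y^2),
    and sample correlations rho(v) recovered by inverse FFT."""
    y = np.asarray(y, dtype=float)
    phi = np.fft.fft(y, n=Q)
    f = np.abs(phi) ** 2 / np.sum(y ** 2)    # normalized, f >= 0
    # The inverse FFT of |phi|^2 gives circular sums of y(t)*y(t+v);
    # with Q >= 2*len(y) there is no wrap-around, so these are the
    # plain lagged products, and division by sum(y^2) yields rho(v).
    rho = np.real(np.fft.ifft(np.abs(phi) ** 2)) / np.sum(y ** 2)
    return f, rho
```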
E. Autoregressive Analysis. The Yule-Walker equations are
solved to estimate innovation variances σ̂²_m, to which order
determining criteria (AIC, CAT) are applied to determine optimal
orders m̂ and also to test for white noise. The value of σ̂²_m̂ and
the dynamic range of the autoregressive spectral estimator
f̂_m̂(w) are used to determine the memory type of the time series.
Two orders (called the best m̂ and second best m̂(2)) are
determined as candidates for optimal orders, corresponding to the
absolute minimum and lowest relative minimum of the criterion
function.
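A sketch of the recursive solution of the Yule-Walker equations (the Levinson-Durbin recursion cited in Section 8). The sign convention Y(t) = a_1 Y(t-1) + ... + a_m Y(t-m) + ε(t) and the normalization σ̂²_0 = 1 (variances relative to the series variance) are our assumptions:

```python
import numpy as np

def levinson_durbin(rho, M):
    """Solve the Yule-Walker equations recursively.  rho[v] are
    sample correlations with rho[0] = 1.  Returns the AR(M)
    coefficients and innovation variances sigma2[m], m = 0,...,M,
    relative to the series variance."""
    sigma2 = np.empty(M + 1)
    sigma2[0] = 1.0
    a = np.zeros(M + 1)            # a[1..m] hold current coefficients
    for m in range(1, M + 1):
        # partial correlation (reflection coefficient) at order m
        k = (rho[m] - np.dot(a[1:m], rho[m-1:0:-1])) / sigma2[m-1]
        a_new = a.copy()
        a_new[m] = k
        a_new[1:m] = a[1:m] - k * a[m-1:0:-1]
        a = a_new
        sigma2[m] = sigma2[m-1] * (1.0 - k * k)
    return a[1:], sigma2
```

The innovation variances sigma2 produced here are exactly the inputs the order-determining criteria of Section 6 require, and the AR spectral estimator follows as f̂_m(w) = σ̂²_m / |1 - Σ_j a_j exp(-2πiwj)|².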
F. ARMA Analysis. When a time series is classified as
short memory, an approximating AR scheme of order 4m̂ is
inverted to form MA(∞) coefficients, which are used to estimate
the covariance matrix of the lagged values Y(t-j) and the estimated
innovations. A subset regression procedure is then used to
determine a "best fitting" ARMA scheme, and the corresponding
ARMA spectral density estimator. One will then be able to identify
moving average schemes and ARMA schemes which are barely invertible
and require a long AR scheme for adequate approximation. The long
AR spectral estimator introduces spurious spectral peaks when
compared to the MA or ARMA estimator.
G. Non-stationary Autoregression. When a time series is
classified as long memory, more accurate estimators of
autoregressive coefficients are provided by minimizing a least
squares criterion or by Burg estimators. When several harmonics
whose frequencies are close together are present in the data,
least squares autoregressive coefficient estimators are more
effective than Yule-Walker autoregressive coefficient estimators
in providing autoregressive spectral estimators which exhibit
the separated peaks one would like to see in the estimated
spectral density.
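Burg's estimators, mentioned above as an alternative to Yule-Walker, can be sketched as follows (a minimal single-channel version; the coefficient sign convention matches the Levinson recursion above, and the function name is ours):

```python
import numpy as np

def burg(y, M):
    """Burg's method: estimate AR(M) coefficients by minimizing the
    sum of forward and backward prediction error energies at each
    stage, without first forming sample correlations."""
    y = np.asarray(y, dtype=float)
    f = y.copy()                   # forward prediction errors
    b = y.copy()                   # backward prediction errors
    a = np.zeros(M)
    sigma2 = np.dot(y, y) / len(y)
    for m in range(M):
        fm, bm = f[1:], b[:-1]
        # reflection coefficient minimizing forward+backward energy
        k = 2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        a_prev = a[:m].copy()
        a[m] = k
        a[:m] = a_prev - k * a_prev[::-1]   # Levinson-type update
        f, b = fm - k * bm, bm - k * fm     # propagate errors
        sigma2 *= (1.0 - k * k)
    return a, sigma2
```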
37
H. Long Memory analysis. In the long memory case, one may
want to represent Y(t) as S(t) + N(t), a long memory signal plus
short memory noise. An approach to this problem may be provided
by treating the sample spectral density values f(k/Q) as a data
batch to be studied by non-parametric data modeling methods using
quantile functions [see Parzen (1979)]. The details of such
methods are under development.
I. Nonparametric Kernel Spectral Density Estimator. An
estimator f̂(w) of the spectral density is called parametric
when it corresponds to a parametric model for the time series
(such as an AR or ARMA model), and non-parametric otherwise. A
general form of non-parametric estimator is the kernel estimator

    f̂(w) = Σ_{|v| ≤ M} k(v/M) ρ̂(v) exp(-2πiwv),    0 ≤ w ≤ 1.

The problem of determining an optimum truncation point M has no
general solution; one approach is to choose M = 4m̂ to obtain a
preliminary smoothing of the sample spectral density.
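A sketch of the kernel (lag window) estimator, with the Bartlett window k(u) = 1 - |u| chosen purely for illustration (the report does not fix a particular kernel):

```python
import numpy as np

def kernel_spectral_estimate(rho, M, Q):
    """Nonparametric lag-window spectral estimator
    f(w) = sum_{|v|<=M} k(v/M) rho(v) exp(-2*pi*i*w*v),
    with the Bartlett (triangular) window k(u) = 1 - |u|,
    evaluated at w = j/Q, j = 0,...,Q-1."""
    v = np.arange(-M, M + 1)
    k = 1.0 - np.abs(v) / M                     # Bartlett lag window
    rho_v = np.array([rho[abs(x)] for x in v])  # uses rho(-v) = rho(v)
    w = np.arange(Q) / Q
    # evaluate the finite Fourier sum directly at each frequency
    E = np.exp(-2j * np.pi * np.outer(w, v))
    return np.real(E @ (k * rho_v))
```

For white noise (ρ̂(v) = 0 for v ≠ 0) the estimate is flat, which is the benchmark against which the residual spectral density is compared in the white-noise tests of Section 6.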
J. Inverse Correlations and Cepstral Correlations.
Estimators of the inverse correlations ρi(v) and the cepstral
correlations γ(v) are computed and used to form nonparametric
kernel estimators of f^{-1}(w) and log f(w), which may provide
additional insights into the peaks and troughs to be given
significance in the final estimator of the spectrum.
Extensive comparisons of different methods of spectral
estimation are given in Pagano (1980), Priestley and Beamish
(1981), and Kay and Marple (1981). It seems clear that
autoregressive spectral estimators can give superior results when
properly used. One should: determine two best orders; compute
autoregressive coefficients both by the Yule-Walker equations and
by least squares, since when the time series is long memory
autoregressive spectral estimators are most accurate when based on
least squares estimators of the autoregressive coefficients; and
use approximating autoregressive schemes to determine whether an
ARMA scheme fits better.
The end of the search for the perfect spectral estimator
seems attainable if one does not think of spectral estimation
as a non-parametric procedure which can be conducted
independently of model identification.
8. A Bibliography of Autoregressive Spectral Estimation
The bibliography aims to provide a comprehensive list of
the publications in English which are directly concerned with
developing the theory and methods of autoregressive spectral
estimation.
This section lists some of the publications which
contributed to the development of AR spectral estimation.
Yule (1927) introduces autoregressive schemes to model
disturbed periodicities as an alternative to Schuster
periodogram analysis and its spurious periodicities;
Yule-Walker (1931) equations relate autoregressive
coefficients and correlations of a stationary time
series.
Wold (1938) introduces infinite order autoregressive and
moving average representations of a stationary time
series; rigorous conditions are given by Akutowicz (1957).
Mann and Wald (1943) derive asymptotic distribution of
estimators of autoregressive coefficients.
Levinson (1947) - Durbin (1960) derive recursive methods
of solving Yule-Walker equations which subsequently
lead to fast algorithms for calculation of high order
AR schemes.
Whittle (1954) seems to be the first to use autoregressive
schemes to estimate a spectral density. He used a low
order model in a case where high order models are
indicated by order determining criterion [Akaike
(1974), p. 720].
Grenander and Rosenblatt (1957) criticize attempts to apply
low order autoregressive schemes, and develop theory
of non-parametric spectral density estimation, as do
Bartlett, Parzen, and Tukey and Blackman.
Parzen (1964), Schaerf (1964), and Parzen (1968)
discuss autoregressive spectral estimation as a
method for empirical time series analysis; no theory
is given.
Burg (1967), (1968) publishes his pioneering work on MEM
(maximum entropy method of spectral estimation) and
his method of calculating their coefficients.
Akaike (1969), (1970) derives asymptotic variance formulas
for autoregressive spectral estimators, and states
FPE (final prediction error) criterion for order
determination; precursor of FPE in Davisson (1965).
Parzen (1969) derives heuristically a formula for the
asymptotic variance of AR spectral estimators,
confirmed by Kromer (1969) and Berk (1974); an order
determining criterion is proposed.
Kromer (1969) in an unpublished Ph.D. thesis presents first
rigorous analysis of asymptotic distribution of
autoregressive spectral estimators, especially their
bias; consistency is proved only in an iterated
limit mode of convergence.
Berk (1974) provides first proof of consistency of
autoregressive spectral estimators.
Carmichael (1976) in an unpublished Ph.D. thesis provides
alternative proof of consistency of autoregressive
estimators, and extends technique to general problems
of density estimation.
Akaike (1973), (1974), (1977) introduces AIC for model
order criterion and relates it to entropy maximization
principles.
Parzen (1974), (1977) introduces CAT for AR order
determination based on concept of finite parameter AR
schemes as approximations to infinite parameter AR
schemes.
Hannan and Quinn (1979) derive a modification of AIC which
provides consistent estimators of the AR order, when
exact model is assumed to be a finite order AR.
Huzii (1977), Shibata (1977), and Bhansali (1980), discuss
rigorously the convergence of AR spectral estimators
and inverse correlations.
Childers (1978) and Haykin (1979) contain very useful
collections of papers.
Pagano (1980), Beamish and Priestley (1981), and Kay and
Marple (1981) provide illuminating reviews of AR
spectral estimators and comparisons with alternative
methods.
BIBLIOGRAPHY
Akaike, H. (1969) Fitting autoregressive models for prediction, Ann. Inst. Statist. Math., 21, 243-247.
________ (1969) Power spectrum estimation through autoregression model fitting, Ann. Inst. Statist. Math., 21, 407-419.
________ (1970) A fundamental relation between predictor identification and power spectrum estimation, Ann. Inst. Statist. Math., 22, 219-223.
________ (1970) Statistical predictor identification, Ann. Inst. Statist. Math., 22, 203-217.
________ (1973) Information theory and an extension of the maximum likelihood principle, 2nd Int. Symp. on Information Theory, B. N. Petrov and F. Csaki, eds., Budapest: Akademiai Kiado, 267-281.
________ (1974) A new look at the statistical model identification, IEEE Trans. Autom. Contr., AC-19, 716-723.
________ (1977) On entropy maximization principle, Applications of Statistics, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 27-41.
________ (1979) A Bayesian extension of the minimum AIC procedure of autoregressive model fitting, Biometrika, 66, 237-242.
Akutowicz, E. J. (1957) On an explicit formula in linear least squares prediction, Math. Scand., 5, 261-266.
Baggeroer, A. B. (1976) Confidence intervals for regression (MEM) spectral estimates, IEEE Trans. Inform. Theory, IT-22, 534-545.
Barndorff-Nielsen, O. and Schou, G. (1973) On the parametrization of autoregressive models by partial autocorrelations, J. Multivariate Analysis, 3, 408-419.
Beamish, M. and Priestley, M. B. (1981) A study of autoregressive and window spectral estimation, Appl. Statist., 30, 41-58.
Berk, K. N. (1974) Consistent autoregressive spectral estimates, Ann. Statist., 2, 489-502.
Bhansali, R. J. (1980) Autoregressive and window estimates of the inverse correlation function, Biometrika, 67, 551-566.
Bhansali, R. J. and Downham, D. Y. (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion, Biometrika, 64, 547-551.
Brillinger, D. R. (1981) Time Series: Data Analysis and Theory, Expanded Edition, Holden-Day: San Francisco [see p. 512].
Burg, J. P. (1967) Maximum entropy spectral analysis, Proc. 37th Meeting, Society of Exploration Geophysicists (Oklahoma City, OK), Oct. 31, 1967.
________ (1968) A new analysis technique for time series data, NATO Advanced Study Institute on Signal Processing, Enschede, The Netherlands.
Carmichael, J. P. (1976) The Autoregressive Method, unpublished Ph.D. thesis, Statistical Science Division, State University of New York at Buffalo.
________ (1978) Consistency of the autoregressive method of density estimation, Technical Report, Statistical Science Division, State University of New York at Buffalo.
Chatfield, C. (1979) Inverse autocorrelations, J. R. Statist. Soc. A, 142, 363-377.
Childers, D. G. (1978) Modern Spectrum Analysis, New York: IEEE Press.
Cleveland, W. S. (1972) The inverse autocorrelations of a time series and their applications, Technometrics, 14, 277-293.
Davisson, L. D. (1965) The prediction error of a stationary Gaussian time series of unknown covariance, IEEE Trans. on Information Theory, IT-11, 527-532.
Dickinson, Bradley W. (1978) Autoregressive estimation using residual energy ratios, IEEE Transactions on Information Theory, IT-24, 503-506.
Durbin, J. (1960) The fitting of time series models, Rev. Inst. Int. de Stat., 28, 233-244.
Durrani, T. S. and Arslanian, A. S. (1982) Windows associated with high resolution spectral estimators, submitted for publication.
Fuller, W. A., Hasza, D. P., and Goebel, J. J. (1981) Estimation of the parameters of stochastic difference equations, Annals of Statistics, 9, 531-543.
Geronimus, Y. L. (1960) Polynomials Orthogonal on a Circle and Interval, New York: Pergamon Press.
Gersch, W. (1970) Spectral analysis of EEG's by autoregressive decomposition of time series, Math. Biosci., 7, 205-222.
Gersch, W. and Sharpe, D. R. (1973) Estimation of power spectra with finite-order autoregressive models, IEEE Trans. Automat. Contr., AC-18, 367-369.
Grandell, J., Hamrud, M., and Toll, P. (1980) A remark on the correspondence between the maximum entropy method and the autoregressive models, IEEE Trans. Inform. Theory, IT-26, 750-751.
Grenander, U. and Rosenblatt, M. (1957) Statistical Analysis of Stationary Time Series, New York: Wiley.
Grenander, U. and Szego, G. (1958) Toeplitz Forms and their Applications, Berkeley: University of California Press.
Griffiths, L. J. and Prieto-Diaz, R. (1977) Spectral analysis of natural seismic events using autoregressive techniques, IEEE Trans. Geosci. Electron., GE-15, 13-25.
Hannan, E. J. (1970) Multiple Time Series, Wiley: New York [see p. 334].
Hannan, E. J. and Quinn, B. G. (1979) The determination of the order of an autoregression, Journal of the Royal Statistical Society, Series B, 41, 190-195.
Haykin, S. S., ed. (1979) Nonlinear Methods of Spectral Analysis, New York: Springer-Verlag.
Hsu, F. M. and Giordano, A. A. (1977) Line tracking using autoregressive spectral estimates, IEEE Trans. Acoustics, Speech, Signal Process., ASSP-25, 510-519.
Huzii, M. (1977) On a spectral estimate obtained by an autoregressive model fitting, Ann. Inst. Statist. Math., 29, 415-431.
Jones, R. H. (1974) Identification and autoregression spectrum estimation, IEEE Trans. Automat. Contr., AC-19, 894-898.
________ (1975) Fitting autoregressions, J. Amer. Statist. Assoc., 70, 590-592.
________ (1976) Autoregression order selection, Geophysics, 41, 771-773.
Kaveh, M. and Cooper, G. R. (1976) An empirical investigation of the properties of the autoregressive spectral estimator, IEEE Trans. Inform. Theory, IT-22, 313-323.
Kay, S. M. (1978) Improvement of autoregressive spectral estimates in the presence of noise, Rec. 1978 Int. Conf. Acoustics, Speech, and Signal Processing, 357-360.
Kay, S. M. and Marple, S. L., Jr. (1979) Sources of and remedies for spectral line splitting in autoregressive spectrum analysis, Rec. 1979 Int. Conf. Acoustics, Speech, and Signal Processing.
Kay, S. M. (1979) The effects of noise on the autoregressive spectral estimator, IEEE Trans. Acoust., Speech, Signal Process., ASSP-27, 478-485.
________ (1980) Noise compensation for autoregressive spectral estimates, IEEE Trans. Acoust., Speech, Signal Process., ASSP-28, 292-303.
Kay, S. M. and Marple, Stanley L., Jr. (1981) Spectrum analysis -- a modern perspective, Proceedings of the IEEE, 69, 1380-1419.
Kromer, R. E. (1969) Asymptotic properties of the autoregressive spectral estimator, Ph.D. thesis, Statistics Dept., Stanford University.
Lacoss, R. T. (1971) Data adaptive spectral analysis methods, Geophysics, 36, 661-675.
Landers, T. E. and Lacoss, R. T. (1977) Some geophysical applications of autoregressive spectral estimates, IEEE Trans. Geosci. Electron., GE-15, 26-32.
Levinson, N. (1946) The Wiener (root mean square) error criterion in filter design and prediction, J. Math. Phys., 25, 261-278.
Mann, H. B. and Wald, A. (1943) On the statistical treatment of stochastic difference equations, Econometrica, 11, 173-220.
Marple, S. L., Jr. (1980) A new autoregressive spectrum analysis algorithm, IEEE Trans. Acoust., Speech, Signal Process., ASSP-28, 441-454.
McClave, J. (1975) Subset autoregression, Technometrics, 17, 213-220.
Morf, M., Vieira, A., Lee, D. T. L., and Kailath, T. (1978) Recursive multichannel maximum entropy spectral estimation, IEEE Trans. on Geoscience Electronics, GE-16, 85-94.
Nevai, Paul G. (1979) An asymptotic formula for the derivatives of orthogonal polynomials, SIAM J. Math. Anal., 10, 472-477.
Pagano, M. (1980) Some recent advances in autoregressive processes, Directions in Time Series, ed. D. R. Brillinger and G. C. Tiao, Institute of Mathematical Statistics, 280-302 (comments by H. T. Davis).
Parzen, E. (1964) An approach to empirical time series analysis, J. Res. Nat. Bur. Standards, 68D, 937-951.
________ (1967) Time Series Analysis Papers, Holden-Day: San Francisco. Includes Parzen (1964).
________ (1967) The role of spectral analysis in time series analysis, Review of the International Statistical Institute, 35, 125-141.
________ (1968) Statistical spectral analysis (single channel case) in 1968, Proceedings of the NATO Advanced Study Institute on Signal Processing, Enschede, The Netherlands.
________ (1969) Multiple time series modelling, Multivariate Analysis II, P. R. Krishnaiah, ed., New York: Academic Press, 389-409.
________ (1974) Some recent advances in time series modeling, IEEE Trans. Automat. Contr., AC-19, 723-730.
________ (1977) Multiple time series: determining the order of approximating autoregressive schemes, Multivariate Analysis IV, P. Krishnaiah, ed., North-Holland: Amsterdam, 287-295.
________ (1979) Forecasting and whitening filter estimation, TIMS Studies in the Management Sciences, 12, 149-165.
________ (1979) Nonparametric statistical data modeling, Journal of the American Statistical Assoc., 74, 105-131.
________ (1980) Time series modeling, spectral analysis, and forecasting, Directions in Time Series, ed. D. R. Brillinger and G. C. Tiao, Institute of Mathematical Statistics.
________ (1982) Time series model identification and prediction variance horizon, Applied Time Series Analysis II, ed. D. Findley, Academic Press: New York, 415-447.
Pinsker, M. (1963) Information and Information Stability of Random Variables, Holden-Day: San Francisco.
Priestley, M. B. (1981) Spectral Analysis and Time Series, London: Academic Press.
Rice, J. (1979) On the estimation of the parameters of a power spectrum, J. Multivariate Analysis, 9, 378-392.
Sakai, H. (1979) Statistical properties of AR spectral analysis, IEEE Trans. Acoust., Speech, Signal Processing, ASSP-27, 402-409.
Schaerf, Mirella C. (1964) Estimation of the covariance and autoregressive structure of a stationary time series, Ph.D. thesis, Statistics Dept., Stanford University.
Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion, Biometrika, 63, 117-126.
________ (1980) Asymptotically efficient selection of the order of the model for estimating parameters of a linear process, Ann. Statist., 8, 147-164.
________ (1981) An optimal autoregression spectral estimate, Annals of Statistics, 9, 300-306.
Shore, J. E. (1981) Minimum cross-entropy spectral analysis, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-29, 230-237.
Shore, J. E. and Johnson, R. W. (1980) Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Transactions on Information Theory, IT-26, 26-37.
Smylie, D. E., Clarke, G. K. C., and Ulrych, T. J. (1973) Analysis of irregularities in the earth's rotation, in Methods in Computational Physics, vol. 13, B. A. Bolt, ed., New York: Academic Press, 391-430.
Thomson, D. J. (1977) Spectrum estimation techniques for characterization and development of WT4 waveguide - I, Bell Syst. Tech. J., 56, 1769-1815.
Tong, H. (1977) More on autoregressive model fitting with noisy data by Akaike's information criterion, IEEE Trans. Inform. Theory, IT-23, 404-410.
Ulrych, T. J. and Bishop, T. N. (1975) Maximum entropy spectral analysis and autoregressive decomposition, Rev. Geophysics Space Phys., 13, 183-200.
Ulrych, T. J. and Clayton, R. W. (1976) Time series modelling and maximum entropy, Phys. Earth Planetary Interiors, 12, 188-200.
Van den Bos, A. (1971) Alternative interpretation of maximum entropy spectral analysis, IEEE Trans. Inform. Theory, IT-17, 493-494.
Walker, G. (1931) On periodicity in series of related terms, Proc. Roy. Soc. London, Series A, 131, 518-532.
Whittle, P. (1954) The statistical analysis of a seiche record, J. Marine Res., 13, 76-100.
________ (1963) On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix, Biometrika, 50, 129-134.
Wiener, N. (1949) The Extrapolation, Interpolation, and Smoothing of Stationary Time Series, New York: Wiley. Includes Levinson (1947).
Wiggins, R. A. and Robinson, E. A. (1965) Recursive solution to the multichannel filtering problem, Journal of Geophysical Research, 70, 1885-1891.
Wold, H. (1938) A Study in the Analysis of Stationary Time Series, Stockholm: Almqvist and Wiksell.
Yule, G. U. (1927) On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers, Philosophical Trans. Royal Soc. London, Series A, 226, 267-298.