TEXAS A&M UNIVERSITY
COLLEGE STATION, TEXAS 77843-3143
INSTITUTE OF STATISTICS
AUTOREGRESSIVE SPECTRAL ESTIMATION
by Emanuel Parzen
Institute of Statistics
Texas A&M University
Technical Report No. N-28
June 1982
Texas A&M Research Foundation Project No. 4226T
"Multiple Time Series Modeling and
Time Series Theoretic Statistical Methods"
Sponsored by the Office of Naval Research
Approved for public release; distribution unlimited.
Unclassified

REPORT DOCUMENTATION PAGE

Report Number: N-28
Title: AUTOREGRESSIVE SPECTRAL ESTIMATION
Type of Report: Technical
Author: Emanuel Parzen
Contract or Grant Numbers: ONR N00014-82-MP-2001; ARO DAAG29-80-C-0070
Performing Organization: Texas A&M University, Institute of Statistics, College Station, TX 77843
Report Date: June 1982
Security Classification: Unclassified
Distribution Statement: Approved for public release; distribution unlimited.
Key Words: Time series analysis, autoregressive spectral density, maximum entropy, empirical spectral analysis.
Abstract: Autoregressive spectral estimation is one of the new techniques for spectral analysis developed in the last two decades. Its theory and applications are extremely extensive. This article aims to provide an overview (rather than a detailed account) of the main ideas. A comprehensive bibliography guides the reader to articles in which case studies and comparisons of autoregressive spectral estimators are described.
AUTOREGRESSIVE SPECTRAL ESTIMATION
by Emanuel Parzen
Institute of Statistics, Texas A&M University
0. Introduction
The problem of spectral analysis of time series is clearly
of great interest to the many applied scientists who use
spectral analysis in their scientific research. It should be
of great interest to statisticians because it embodies
prototypes of two of the great problems of modern statistics:
functional inference and modeling. A problem of statistical
inference usually assumes three ingredients: a sample of
observations, a parameter which indexes the family of possible
joint probability densities of the sample, and a formula for
the probability density of the sample
f(sample|parameter).
Classical statistical inference assumes the parameter is a
finite dimensional parameter θ = (θ1,...,θk). Functional
inference assumes the parameter is a function, such as f(w),
0 ≤ w ≤ 1.
The parameter estimation problem seeks to form optimal
estimators (denoted θ̂) of the parameter. A typical model
This research was supported by the Office of Naval Research (Contract N00014-81-MP-10001, ARO DAAG29-80-C-0070).
identification problem seeks to find the smallest number of
significantly non-zero components θj of the parameter θ.
Estimation of a function often has similar features to
model identification, since a function can be parametrized
exactly by a countable infinity of parameters. However in
practice one can only efficiently estimate a finite number of
parameters. Therefore to estimate a function one must use
the smallest finite number of parameters which provide an
adequate approximation of the function.
The goals of functional inference and model identification
are in my view best pursued simultaneously. One seeks methods
of statistical inference which are finite-parametric and
non-(or infinite-) parametric. One achieves this goal by
using finite parameter models (which theory indicates might
be exact methods) in ways that enable them to be interpreted
also as approximating models.
Autoregressive spectral estimation is one of the new
techniques for spectral analysis developed in the last two
decades. Its theory and applications are extremely extensive.
This article aims to provide an overview (rather than a detailed
account) of the main ideas. A comprehensive bibliography guides
the reader to articles in which case studies and comparisons of
autoregressive spectral estimators are described.
The spectral density is defined in section 1. Infinite
order AR and MA representations of a stationary time series
are introduced in section 2. Entropy as a motivation for
autoregressive schemes is discussed in section 3. Alternate
parametrizations of an autoregressive scheme are outlined in
section 4. Algorithms for computing the coefficients of
autoregressive spectral densities are stated in section 5.
Criteria for determining the orders of autoregressive schemes
are mentioned in section 6. Suggestions for empirical
spectral analysis are outlined in section 7. The final
section provides a guide to the literature of autoregressive
spectral estimation by listing the articles which correspond
to some important developments.
1. Correlations and Spectral Functions of a Stationary Time
Series
The theory of time series discusses separately discrete
parameter time series {Y(t), t=0, ±1, ±2,...} and continuous
parameter time series {Y(t), -∞ < t < ∞}. Only the former case is
discussed in this article. Discrete parameter series usually
arise by observing a continuous parameter time series at equi-
spaced times. The frequency variable w of a pure harmonic
(1) Y(t) = A cos 2πwt + B sin 2πwt
observed at t = 0, ±1,... can be assumed to vary in either
-0.5 ≤ w ≤ 0.5 or 0 ≤ w ≤ 1. The first interval is usually
adopted, but the second interval will be adopted in this
article because it is more convenient for developing
isomorphisms between spectral analysis and non-parametric data
modeling using quantile and density-quantile functions [see
Parzen (1979)].
The definitions and notation we adopt for the functions
used to describe a zero mean stationary Gaussian discrete
parameter time series Y(t), t=0, ±1,... are as follows.
A "time domain" specification of the probability law of
Y(.) is provided by the covariance function
(2) R(v) = E[Y(t)Y(t+v)], v=0, ±1, ±2,...,
or by the variance R(0) and the correlation function
(3) p(v) = R(v) ÷ R(0) = Corr[Y(t), Y(t+v)].
To define a spectral (frequency) domain specification of
the probability law of Y(·) we first assume summability of
R(·) and p(·). The Fourier transforms of R(v) and p(v) are
called the power spectrum S(w) and spectral density function f(w)
respectively, and are defined by
(4) S(w) = Σ_{v=-∞}^{∞} e^{-2πivw} R(v), 0 ≤ w ≤ 1;
(5) f(w) = Σ_{v=-∞}^{∞} e^{-2πivw} p(v), 0 ≤ w ≤ 1.
The spectral distribution function is defined by
(6) F(w) = ∫_0^w f(w') dw', 0 ≤ w ≤ 1.
For data analysis one actually computes a modified spectral
distribution function denoted F+(w) and defined for 0 ≤ w ≤ 0.5:
(7) F+(w) = 2F(w), 0 ≤ w ≤ 0.5.
A correlation function p(v) has the mathematical property of
being a positive definite function which, without assuming
summability, guarantees the existence of: (1) a spectral
distribution function F(w), and (2) the spectral representation
of the correlation function p(v) given by
(8) p(v) = ∫_0^1 e^{2πivw} dF(w) = ∫_0^1 cos 2πvw dF(w).
The notion of an ergodic time series is not given a rigorous
definition in this article but its intuitive meaning is important
for us. We call a time series ergodic when the parameters of
its probability law possess consistent estimators (and thus can
be determined with probability one, given a sample of infinite
length). An example of a non-ergodic stationary Gaussian zero
mean time series is
Y(t) = A cos 2πwt + B sin 2πwt
where A and B are independent N(0,σ²) random variables. One
can infer the values of A and B in the sample observed, but
one cannot infer the value of σ². This time series has
correlation function
(9) p(v) = cos 2πwv, v=0, ±1, ±2,...,
and does not possess a spectral density.
Parzen (1982) proposes that it is useful in practice to
distinguish qualitatively between three types of time series,
which we call
no memory: white noise,
short memory: stationary and ergodic,
long memory: non-stationary or non-ergodic.
A no-memory or white noise time series is a stationary
Gaussian time series satisfying either of the equivalent
conditions:
p(v) = 0 for v > 0;
(10)
f(w) = 1, 0 ≤ w ≤ 1.
A short memory time series is a stationary time series
possessing a summable correlation function p(v) and a spectral
density f(w) which is bounded above and below in the sense
that the dynamic range of f(w),
(11) DR(f) = max_{0≤w≤1} f(w) ÷ min_{0≤w≤1} f(w),
satisfies 1 ≤ DR(f) < ∞.
A long memory time series is one which is neither no
memory nor short memory; alternatively, a long memory time
series is one which is non-stationary or non-ergodic. It
usually has components representing cycles or trends. An
example of a long memory time series is (1) where A and B
are independent N(0,σ²) random variables.
For a short memory time series one can define the inverse-
correlation function
(12) pi(v) = ∫_0^1 e^{2πivw} f^{-1}(w) dw ÷ ∫_0^1 f^{-1}(w) dw,
and the cepstral-correlation function
(13) γ(v) = ∫_0^1 e^{2πivw} log f(w) dw.
It should be noted that the inverse-correlation function is
positive definite. However the cepstral-correlation function
is not. These new types of correlation functions are
introduced because they may provide more parsimonious
parameterizations in the sense that they decay to 0 faster
than does the correlation function. Statistical inference
from a sample of the probability law of a time series often
achieves greatest statistical efficiency by using the most
parsimonious parametrizations. Thus to form estimators f̂(w)
of the spectral density f(w) from a raw estimator f̃(w),
greater precision may be attained by first forming estimators
{f^{-1}(w)}^ and {log f(w)}^ of the inverse or logarithm of the
spectral density. Autoregressive spectral estimation may be
regarded as an approach to estimating f(w) by first
estimating f^{-1}(w).
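The definitions above can be illustrated numerically. The sketch below assumes (as an example, not from this report) the AR(1)-type correlation function p(v) = φ^|v| with φ = 0.5; it forms the spectral density (5) on a frequency grid and recovers the lag-one inverse-correlation (12) and cepstral-correlation (13), whose exact values for this model are -φ/(1+φ²) = -0.4 and φ = 0.5.

```python
import numpy as np

# Assumed example: short memory correlations p(v) = phi^|v| (an AR(1)).
phi = 0.5
Q = 4096
w = np.arange(Q) / Q                      # grid on 0 <= w < 1
v = np.arange(-200, 201)                  # p(v) is summable; truncate the sum
p = phi ** np.abs(v)

# (5)  f(w) = sum_v e^{-2 pi i v w} p(v)   (real, since p is symmetric)
f = np.real(np.exp(-2j * np.pi * np.outer(w, v)) @ p)

# (12) lag-one inverse-correlation: ratio of Fourier coefficients of 1/f
pi1 = float(np.real(np.exp(2j * np.pi * w) @ (1 / f)) / np.sum(1 / f))

# (13) lag-one cepstral-correlation: Fourier coefficient of log f
gamma1 = float(np.real(np.mean(np.exp(2j * np.pi * w) * np.log(f))))

print(round(pi1, 3), round(gamma1, 3))   # -0.4 0.5
```

Note how the inverse-correlation dies off after one lag for this model, illustrating the parsimony argument made in the text.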
2. Filter Models of a Short Memory Stationary Time Series
A short memory zero mean Gaussian stationary time series
Y(t) has a property of fundamental importance: it can be
linearly transformed to a white noise time series, denoted
Yν(t) or ε(t), by a filter that is invertible. The definition
of Yν(t) is provided by the theory of minimum mean square error
prediction.
The definitions and notation of prediction theory that
we adopt are as follows:
(1) Yμ,m(t) = E[Y(t)|Y(t-1),...,Y(t-m)]
denotes the memory m one-step ahead predictor while
(2) Yν,m(t) = Y(t) - Yμ,m(t)
is the prediction error, and
(3) σ²m = E[|Yν,m(t)|²] ÷ E[|Y(t)|²]
is the mean square prediction error measured in units of the
variance R(0) of Y(t). Corresponding infinite memory notation
is
(4) Yμ(t) = E[Y(t)|Y(t-1), Y(t-2),...],
(5) Yν(t) = Y(t) - Yμ(t),
(6) σ²∞ = E[|Yν(t)|²] ÷ E[|Y(t)|²].
Explicit formulas for the foregoing predictors,
prediction errors, and mean square prediction errors can be
obtained from the correlation function p(v). The autoregressive
coefficients αm(1),...,αm(m) of order m are defined by
(7) -Yμ,m(t) = αm(1) Y(t-1) + ... + αm(m) Y(t-m),
(8) Yν,m(t) = Y(t) + αm(1) Y(t-1) + ... + αm(m) Y(t-m).
A predictor is characterized by the condition that the
prediction error is orthogonal (normal) to the predictor
variables:
(9) E[Yν,m(t) Y(t-k)] = 0, k=1,...,m.
By substituting (8) into (9) one obtains the famous Yule-Walker
equations, defining αm(0) = 1:
(10) Σ_{j=0}^{m} αm(j) p(j-k) = 0, k=1,...,m.
One obtains σ²m by
(11) σ²m = E[Yν,m(t) Y(t)] ÷ E[|Y(t)|²] = Σ_{j=0}^{m} αm(j) p(j).
For a short memory time series, these equations also hold with m = ∞.
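The Yule-Walker equations (10) and their by-product (11) can be sketched in a few lines; the AR(1) correlation function p(v) = 0.5^v used here is an assumed example chosen so that the exact answer is known.

```python
import numpy as np

def yule_walker(p, m):
    """Return (alpha, sigma2) with alpha[0] = 1 and
    sum_{j=0}^{m} alpha[j] p(j-k) = 0 for k = 1, ..., m   (equation (10))."""
    P = np.array([[p[abs(j - k)] for j in range(1, m + 1)]
                  for k in range(1, m + 1)])            # m x m Toeplitz block
    rhs = -np.array(p[1:m + 1])                         # moves the j = 0 terms over
    alpha = np.concatenate(([1.0], np.linalg.solve(P, rhs)))
    sigma2 = sum(alpha[j] * p[j] for j in range(m + 1))  # equation (11)
    return alpha, sigma2

p = [0.5 ** v for v in range(4)]          # assumed AR(1) correlations
alpha, sigma2 = yule_walker(p, 3)
# Exact AR(1) answer: alpha = (1, -0.5, 0, 0), sigma2 = 0.75
print(np.round(alpha, 6), round(float(sigma2), 6))
```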
The time series of infinite memory prediction errors
Yν(t) is always a white noise series called the innovations.
It provides a transformation of the time series Y(t) to a
white noise time series Yν(t), which we write
(12) Y(t) + α∞(1) Y(t-1) + α∞(2) Y(t-2) + ... = Yν(t),
and call an AR(∞) or infinite order autoregressive scheme for
Y(t). An MA(∞) or infinite order moving average scheme
representation is
(13) Y(t) = Yν(t) + β∞(1) Yν(t-1) + ...,
whose coefficients β∞(k) can be determined recursively from
α∞(j) by β∞(0) = 1, and for k > 0
(14) β∞(k) + Σ_{j=1}^{k} α∞(j) β∞(k-j) = 0.
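Recursion (14) is immediate to implement. The sketch below assumes the AR(1) filter with coefficients (1, -0.5) as an example; for Y(t) - 0.5 Y(t-1) = ε(t) the moving average coefficients are β(k) = 0.5^k.

```python
def ma_from_ar(alpha, n):
    """Return [beta(0), ..., beta(n-1)] from recursion (14);
    alpha[0] must equal 1."""
    beta = [1.0]
    for k in range(1, n):
        top = min(k, len(alpha) - 1)                 # alpha(j) = 0 past its order
        beta.append(-sum(alpha[j] * beta[k - j] for j in range(1, top + 1)))
    return beta

# Assumed AR(1) example: beta(k) = 0.5^k
print(ma_from_ar([1.0, -0.5], 5))   # [1.0, 0.5, 0.25, 0.125, 0.0625]
```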
The AR(∞) and MA(∞) representations have important
implications for spectral analysis since they provide formulas
for the spectral density function f(w) alternative to the
formula that f(w) is the Fourier transform of p(v). One
can show that
(15) f(w) = σ²∞ |h∞(e^{2πiw})|²,
(16) f^{-1}(w) = σ∞^{-2} |g∞(e^{2πiw})|²,
defining
(17) h∞(z) = Σ_{j=0}^{∞} β∞(j) z^j, g∞(z) = Σ_{k=0}^{∞} α∞(k) z^k.
These infinite series converge in general in mean square on
the unit interval. In order to guarantee pointwise convergence
at each w in 0 ≤ w ≤ 1 one must make an additional smoothness
condition such as a Lipschitz condition on f(w), which is implied
in turn by the condition
(18) Σ_v |v| |p(v)| < ∞.
In spectral analysis we usually assume at least the existence
of a continuous second derivative, which is implied by the
condition
(19) Σ_v |v|² |p(v)| < ∞.
The notion of a time series Y(.) being an autoregressive
scheme of order p, denoted AR(p), can be defined in terms of
prediction theory as follows: Y(.) is an AR(p) if and only if
the memory p prediction error series Yν,p(·) is a white noise time
series and αp(p) ≠ 0. The spectral density of Yν,p(·) can be
expressed in terms of the autoregressive transfer function of
order p
(20) gp(z) = αp(0) + αp(1) z + ... + αp(p) z^p
by
(21) fν,p(w) = |gp(e^{2πiw})|² f(w).
If the time series Y(.) is in fact AR(p), then its spectral
density equals the function
(22) fp(w) = σ²p |gp(e^{2πiw})|^{-2},
which we call, in general, the approximating autoregressive
spectral density of order p. A time series Y(.) can be
regarded as approximated by an AR(p) if
(23) fν,p(w) ÷ σ²p = f(w) ÷ fp(w)
can be regarded as not "significantly" different from the
constant 1. In this way a test of the hypothesis that a time
series Y(-) is AR(p) can be converted to a test of the hypothesis
that the prediction error time series is white noise.
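The approximating autoregressive spectral density (22) is straightforward to evaluate on a grid. In the sketch below, the order-1 coefficients α = (1, -0.5) and σ² = 0.75 are the exact values for an AR(1) with correlations p(v) = 0.5^|v|, assumed here for illustration; the checks confirm that fp(w) integrates to p(0) = 1 and that its lag-one Fourier coefficient recovers p(1) = 0.5.

```python
import numpy as np

def ar_spectral_density(alpha, sigma2, w):
    """Equation (22): f_p(w) = sigma2 |g_p(e^{2 pi i w})|^{-2}."""
    powers = np.outer(w, np.arange(len(alpha)))
    g = np.exp(2j * np.pi * powers) @ np.asarray(alpha)   # g_p(e^{2 pi i w})
    return sigma2 / np.abs(g) ** 2

w = np.arange(512) / 512
f1 = ar_spectral_density([1.0, -0.5], 0.75, w)

# Consistency checks against the correlations of the assumed AR(1)
print(round(float(np.mean(f1)), 4),
      round(float(np.real(np.mean(np.exp(2j * np.pi * w) * f1))), 4))   # 1.0 0.5
```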
The sequence of approximating autoregressive spectral
densities fm(w), m=1,2,... may be shown to converge as m tends
to ∞ to f(w) at each w in 0 ≤ w ≤ 1 under suitable conditions (see
especially Nevai (1979)). Sufficient conditions are that f(w)
has finite dynamic range (and therefore is bounded above and
below) and has a continuous derivative. When an estimator,
denoted f̂m(w), of fm(w) is used as an estimator of f(w), one has
to take into account two kinds of errors, called respectively
bias and variance. Bias is a measure of the deterministic
difference between fm(w) and f(w), while variance is a measure
of the stochastic distance between f̂m(w) and fm(w). As m
increases bias decreases while variance increases. This is an
example of the fundamental problem of empirical spectral
analysis, which is how to achieve an optimal balance between
bias and variance. When one uses autoregressive spectral
estimation, this problem reduces to a question of determining
the order m of the approximating autoregressive scheme, which
is discussed in section 6.
It should be noted that basic references for the
mathematical properties (especially convergence) of autoregressive
transfer functions gm(z) are Geronimus (1960) and Grenander and
Szego (1958).
3. Entropy Interpretation of Autoregressive Spectral Densities
The use of autoregressive spectral densities as exact
models, and as approximating models, for true spectral densities
is often questioned by sceptical statisticians on the ground
that their use in general is ad hoc and without theoretical
justification. A possible answer to this criticism is provided
by entropy concepts. This section reviews these concepts in order
to state their application to spectral estimation.
The notion of entropy in statistics is usually first defined
for a discrete distribution with probability mass function p(x).
The entropy of this distribution, denoted H(p), is defined by
(1) H(p) = -Σ_x p(x) log p(x).
For the distribution of a continuous real valued random variable
X, with probability density function f(x), entropy is defined
(analogously or formally) by
(2) H(f) = -∫_{-∞}^{∞} f(x) log f(x) dx.
A concept closely related to entropy is information
divergence 1(f;g) between two probability densities f(x) and
g(x), defined by
(3) I(f;g) = ∫_{-∞}^{∞} {-log (g(x) ÷ f(x))} f(x) dx.
15
One should note that I(f;g) equals minus the generalized
entropy H(f|g) defined by
(4) H(f|g) = ∫_{-∞}^{∞} {log (g(x) ÷ f(x))} f(x) dx.
Another fundamental concept is cross-entropy defined by
(5) H(f;g) = ∫_{-∞}^{∞} {-log g(x)} f(x) dx.
Note that H(f) = H(f;f).
Information divergence is expressed in terms of cross-
entropy and entropy by
(6) I(f;g) = H(f;g) - H(f).
Important Information Inequality:
(7) I(f;g) ≥ 0
with equality if and only if f = g; consequently
(8) H(f) ≤ H(f;g).
Proof: Apply Jensen's inequality which states for an
arbitrary function h(x)
(9) ∫_{-∞}^{∞} {log h(x)} f(x) dx ≤ log ∫_{-∞}^{∞} h(x) f(x) dx
with equality if and only if h(x) = 1 for almost all x with
respect to the probability density f(x).
Some applications of entropy in probability and statistical
modeling are as follows.
The method of maximum likelihood parameter estimation can
be described abstractly as follows. One introduces a parametric
family of probability densities fθ(x), indexed by a vector
parameter θ = (θ1,...,θk). Suppose there is a true parameter
value θ̄ in the sense that the true probability density
f(x) = fθ̄(x). Then θ̄ satisfies
(10) H(f) = H(f;fθ̄) = min_θ H(f;fθ).
To estimate θ̄ from data, one forms an estimator Ĥ(f;fθ) of
H(f;fθ) and defines an estimator θ̂ of θ̄ by
(11) Ĥ(f;fθ̂) = min_θ Ĥ(f;fθ).
The estimator Ĥ(f;fθ) could be of the form
(12) Ĥ(f;fθ) = H(f̃;fθ)
for a suitable raw estimator f̃(x) of f(x).
The parametric families of probability densities fθ(x)
are often derived axiomatically using a maximum entropy principle.
Theorem: Fix k functions Tj(x), j=1,2,...,k, and k real
numbers τ1, τ2,...,τk such that there exist probability
densities f(x) satisfying
(13) ∫_{-∞}^{∞} Tj(x) f(x) dx = τj, j=1,...,k.
The density with maximum entropy H(f) among these densities
is of the form
(14) log fθ(x) = Σ_{j=1}^{k} θj Tj(x) - ψ(θ1,...,θk),
where
(15) ψ(θ1,...,θk) = log ∫ dx exp {Σ_{j=1}^{k} θj Tj(x)},
and θ1,...,θk are chosen to satisfy
(16) ∫_{-∞}^{∞} Tj(x) fθ(x) dx = τj, j=1,...,k.
Proof: The theorem may be proved by calculus of variations
arguments. A quick proof is to verify that for any f(x)
satisfying the moment constraints (13)
(17) H(f;fθ) = ψ(θ1,...,θk) - Σ_{j=1}^{k} θj τj = H(fθ),
and therefore
(18) H(f) ≤ H(f;fθ) = H(fθ).
Thus the maximum entropy is achieved by fθ(x).
Natural exponential models. A parametric family of probability
densities fθ(x) is said to obey a natural exponential model
when it is of the form (14). Thus natural exponential models
are maximum entropy probability densities.
To extend entropy concepts to short memory stationary zero
mean Gaussian time series, define the information divergence
for a sample {Y(t), t=1,2,...,T} as a function of the true
probability density f of the sample, and a model g for f. We
define
(19) I(f;g) = lim_{T→∞} I_T(f;g),
(20) I_T(f;g) = T^{-1} E_f [-log {g(Y(1),...,Y(T)) ÷ f(Y(1),...,Y(T))}].
It should be noted that we are using the notation f and g with
a variety of meanings. For a Gaussian zero mean stationary time
series, the probability density of the sample is specified by
the spectral densities f(w) of the true distribution and g(w)
of the model. The arguments of the information divergence
I(f;g) indicate spectral densities in the following discussion.
Pinsker (1963) derives the following very important formula:
(21) I(f;g) = (1/2) ∫_0^1 {f(w)/g(w) - log (f(w)/g(w)) - 1} dw.
Since u - log u - 1 ≥ 0 for all u > 0, I has two of the properties
of a distance: I(f;g) ≥ 0, I(f;f) = 0. However I does not
satisfy the triangle inequality.
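Pinsker's formula (21) and these two properties can be checked numerically. The sketch below evaluates the integral as a grid average for two AR(1) spectral densities; the densities are assumed examples, not taken from the report.

```python
import numpy as np

w = (np.arange(2048) + 0.5) / 2048      # frequency grid on (0, 1)

def ar1_density(phi):
    # spectral density of an AR(1) with correlations phi^|v|
    return (1 - phi ** 2) / np.abs(1 - phi * np.exp(-2j * np.pi * w)) ** 2

def divergence(f, g):
    """Equation (21), with the integral approximated by a grid mean."""
    u = f / g
    return 0.5 * float(np.mean(u - np.log(u) - 1))

f, g = ar1_density(0.5), ar1_density(0.3)
print(divergence(f, f) == 0.0, divergence(f, g) > 0.0)   # True True
```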
We define the cross-entropy of spectral density functions
f(w) and g(w) by
(22) H(f;g) = (1/2) ∫_0^1 {log g(w) + f(w)/g(w)} dw.
The entropy of f is
(23) H(f) = H(f;f) = (1/2) ∫_0^1 {log f(w) + 1} dw.
Information divergence can be expressed
(24) I(f;g) = H(f;g) - H(f).
Hence
(25) H(f) ≤ H(f;g).
An approximating autoregressive spectral density of order
m, denoted f̄m(w), to a spectral density f(w) is defined by
(26) H(f;f̄m) = min_{fm} H(f;fm),
where the minimization is over all fm of the form
(27) fm(w) = σ²m |gm(e^{2πiw})|^{-2},
(28) gm(z) = 1 + αm(1) z + ... + αm(m) z^m.
One may verify that
(29) H(f;fm) = (1/2) {log σ²m + σm^{-2} ∫_0^1 |gm(e^{2πiw})|² f(w) dw}.
The coefficients σ̄²m, ᾱm(1),...,ᾱm(m) of the minimum cross-
entropy approximating autoregressive spectral density satisfy
(30) σ̄²m = ∫_0^1 |ḡm(e^{2πiw})|² f(w) dw = Σ_{j=0}^{m} ᾱm(j) p(j),
(31) ∫_0^1 ḡm(e^{2πiw}) e^{-2πikw} f(w) dw = Σ_{j=0}^{m} ᾱm(j) p(j-k) = 0, k=1,2,...,m.
Further
(32) H(f;f̄m) = (1/2) {log σ̄²m + 1}.
The autoregressive spectral density f̄m(w) can be derived
axiomatically using a maximum entropy principle.
Theorem: The spectral density with maximum entropy among
all spectral densities f(w) satisfying the constraints
(33) ∫_0^1 e^{2πiwj} f(w) dw = p(j), j=1,2,...,m,
for m specified correlation coefficients p(1),...,p(m) is
f̄m(w).
Proof: It may be verified that f̄m(w) satisfies the
constraints (33), and (32) holds for any f(w) satisfying (33).
Since
(34) H(f) ≤ H(f;f̄m) = H(f̄m),
it follows that f̄m has maximum entropy among all spectral
densities satisfying the constraints (33).
The maximum entropy principle provides a motivation or
justification for the use of autoregressive spectral estimators.
However the maximum entropy principle provides no insight into
how to identify an optimal order m, or even what are the effects
of different methods of estimating the parameters σ²m,
αm(1),...,αm(m). It provides no guidance for how to learn from
the data whether the time series is non-stationary (long memory)
or stationary (short memory), or whether the best time series
model is AR, MA, or ARMA. It is a principle for deriving
probability models, rather than statistically fitting models to
data. Further, the maximum entropy principle justifies auto-
regressive estimators only for short memory time series.
Autoregressive estimators are justified for long memory time
series by the fact that a pure harmonic Y(t) = A cos 2πwt
+ B sin 2πwt satisfies Y(t) - φ Y(t-1) + Y(t-2) = 0, where φ = 2 cos 2πw.
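The closing identity is easy to verify numerically: whatever the amplitudes, the harmonic is annihilated by the second order difference operator with φ = 2 cos 2πw. The amplitudes A, B and frequency w0 below are arbitrary assumed values.

```python
import numpy as np

A, B, w0 = 1.3, -0.7, 0.12
phi = 2.0 * np.cos(2 * np.pi * w0)

def Y(t):
    # pure harmonic observed at integer times
    return A * np.cos(2 * np.pi * w0 * t) + B * np.sin(2 * np.pi * w0 * t)

t = np.arange(2, 50)
residual = Y(t) - phi * Y(t - 1) + Y(t - 2)   # should vanish identically
print(bool(np.max(np.abs(residual)) < 1e-12))   # True
```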
4. Parametrizations of Autoregressive Spectral Estimators
There are many ways for forming autoregressive spectral
estimators, because there are four equivalent ways of parametrizing
them.
A. Autoregressive coefficients. Consider coefficients
αm(1),...,αm(m) such that gm(z) = 1 + αm(1) z + ... + αm(m) z^m satisfies
gm(z) ≠ 0 for complex z such that |z| ≤ 1. We call gm(z) a
minimum phase filter transfer function. The autoregressive
coefficients determine σ²m by
σm^{-2} = ∫_0^1 |gm(e^{2πiw})|^{-2} dw.
One computes the correlation coefficients p(1),...,p(m) by
p(j) = ∫_0^1 exp(2πiwj) σ²m |gm(e^{2πiw})|^{-2} dw.
B. Correlations. A set of m coefficients p(1),...,p(m)
such that the matrix [with p(0) = 1, p(-v) = p(v)]

p(0)     p(-1)    . . .  p(1-m)
p(1)     p(0)     . . .  p(2-m)
  .        .                .
p(m-1)   p(m-2)   . . .  p(0)

is positive definite are correlation coefficients of a time series.
They determine autoregressive coefficients by solving the Yule-
Walker equations [with αm(0) = 1]
Σ_{j=0}^{m} αm(j) p(j-k) = 0, k=1,...,m.
One computes σ²m by σ²m = Σ_{j=0}^{m} αm(j) p(j).
C. Partial correlations. Consider coefficients
π(1),...,π(m) satisfying |π(j)| < 1, j=1,2,...,m. They represent
partial correlation coefficients defined theoretically by:
π(j) = Corr[Y(t), Y(t-j) | Y(t-1),...,Y(t-j+1)].
Partial correlation coefficients determine autoregressive
coefficients and residual variances by a recursive algorithm
called the Levinson-Durbin recursion [see Levinson (1947) and
Durbin (1960)]: for k=1,
α1(1) = -π(1), σ²1 = 1 - π²(1),
while for k=2,3,...,m,
αk(k) = -π(k) [starting from correlations, π(k) = σk-1^{-2} Σ_{j=0}^{k-1} αk-1(j) p(k-j)],
σ²k = σ²k-1 {1 - π²(k)},
αk(j) = αk-1(j) - π(k) αk-1(k-j), j=1,...,k-1.
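The Levinson-Durbin recursion can be sketched as follows, starting from correlations p(0)=1, p(1), ..., p(m); the AR(1) correlations p(v) = 0.5^v are an assumed example with known exact answer.

```python
import numpy as np

def levinson_durbin(p, m):
    """Return (alpha, sigma2, partials) by the recursion above."""
    pi_ = np.zeros(m + 1)
    pi_[1] = p[1]
    alpha = np.array([1.0, -pi_[1]])          # alpha_1(0), alpha_1(1)
    sigma2 = 1.0 - pi_[1] ** 2
    for k in range(2, m + 1):
        # pi(k) = sigma2_{k-1}^{-1} sum_{j=0}^{k-1} alpha_{k-1}(j) p(k-j)
        pi_[k] = sum(alpha[j] * p[k - j] for j in range(k)) / sigma2
        new = np.concatenate((alpha, [0.0]))
        # alpha_k(j) = alpha_{k-1}(j) - pi(k) alpha_{k-1}(k-j); alpha_k(k) = -pi(k)
        new[1:] -= pi_[k] * alpha[::-1]
        alpha, sigma2 = new, sigma2 * (1.0 - pi_[k] ** 2)
    return alpha, sigma2, pi_[1:]

p = [0.5 ** v for v in range(4)]              # assumed AR(1) correlations
alpha, sigma2, pcs = levinson_durbin(p, 3)
# Exact AR(1): partials (0.5, 0, 0), alpha = (1, -0.5, 0, 0), sigma2 = 0.75
print(np.round(alpha, 6), round(float(sigma2), 6), np.round(pcs, 6))
```

The vanishing of the partial correlations past lag 1 is exactly how an AR order reveals itself in parametrization C.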
Autoregressive coefficients determine partial correlation
coefficients by the inverse recursion [see Barndorff-Nielsen and Schou
(1973)]: for k = m, m-1,...,2, with π(k) = -αk(k),
αk-1(j) = {1 - π²(k)}^{-1} {αk(j) + π(k) αk(k-j)}, j=1,...,k-1.
D. Residual variances. Consider coefficients
σ²1, σ²2,...,σ²m satisfying
1 > σ²1 > σ²2 > . . . > σ²m > 0,
and m coefficients representing signs sign π(1),...,sign π(m). The
σ²'s represent residual variances; they determine partial
correlation coefficients by a formula noted by Dickinson (1978):
π(k) = sign π(k) {1 - σ²k ÷ σ²k-1}^{1/2}, with σ²0 = 1.
5. Empirical Autoregressive Spectral Estimation
Given a sample {Y(t), t=1,2,...,T} of a zero mean Gaussian
stationary time series whose spectral density f(w) is to be
estimated by an autoregressive spectral density estimator
f̂m(w) = σ̂²m |ĝm(e^{2πiw})|^{-2},
ĝm(z) = 1 + α̂m(1) z + ... + α̂m(m) z^m,
we define the order identification problem to be the choice of
order m, and the parameter estimation problem to be the choice
of algorithm for computing the coefficients α̂m(1),...,α̂m(m) and
the residual variance σ̂²m.
For a sample Y(l),...,Y(T) of a zero mean Gaussian
stationary time series, an approximation for the joint
probability density function fθ(Y(1),...,Y(T)) indexed by a
parameter θ is obtained as follows. We assume that the time
series Y(t) has been divided by {R(0)}^{1/2} so that its covariance
function equals its correlation function. Then
-2 log fθ(Y(1),...,Y(T)) = log {(2π)^T det K_T} + Y*_T K_T^{-1} Y_T,
where * denotes complex conjugate transpose, Y*_T = (Y(1),...,Y(T)),
K_T = E[Y_T Y*_T] is a covariance matrix with (s,t)-element equal
to pθ(s-t). The subscripts θ on pθ(v) and fθ(w) indicate that
they are functions of the parameters θ, which are to be estimated.
The covariance matrix K_T is a Toeplitz matrix. Asymptotically,
as T tends to ∞, all T by T Toeplitz matrices have the same
eigenvectors exp(-2πitj/T), j=0,1,...,T-1. The corresponding
eigenvalues are fθ(j/T). An approximation for the likelihood
function frequently adopted is therefore
-T^{-1} log fθ(Y(1),...,Y(T)) = (1/2) log 2π + (1/2) ∫_0^1 {log fθ(w) + f̃(w)/fθ(w)} dw
= (1/2) log 2π + H(f̃;fθ),
where f̃(w) is the sample spectral density defined by
f̃(w) = |Σ_{t=1}^{T} Y(t) exp(-2πitw)|² ÷ Σ_{t=1}^{T} Y²(t).
Maximum likelihood estimators θ̂ are asymptotically equivalent
to the estimators θ minimizing the sample cross-entropy
H(f̃;fθ).
If the parametric model fθ(w) is assumed to be the spectral
density of an AR(p), then estimators σ̂²p, α̂p(1),...,α̂p(p) of the
coefficients satisfy Yule-Walker equations corresponding to the
sample correlation function
p̃(v) = ∫_0^1 e^{2πivw} f̃(w) dw
= Σ_{t=1}^{T-v} Y(t) Y(t+v) ÷ Σ_{t=1}^{T} Y²(t).
The sample correlation function p̃(v) can be computed, using the
fast Fourier transform, by
p̃(v) = Q^{-1} Σ_{k=0}^{Q-1} exp(2πikv/Q) f̃(k/Q),
which holds for 0 ≤ v ≤ Q - T.
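The FFT route to the sample correlations can be sketched as follows: zero-pad Y to length Q, form the sample spectral density at the grid frequencies k/Q, and inverse-transform. The white-noise input series is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
Y = rng.standard_normal(T)
Y = Y - Y.mean()                              # detrend by the sample mean

Q = 512                                       # choose Q - T >= largest lag wanted
d = np.fft.fft(Y, n=Q)                        # sum_t Y(t) exp(-2 pi i t k / Q)
f_tilde = (np.abs(d) ** 2) / np.sum(Y ** 2)   # sample spectral density at k/Q
rho = np.real(np.fft.ifft(f_tilde))           # (1/Q) sum_k e^{2 pi i k v/Q} f(k/Q)

# Direct check of rho(v) = sum_t Y(t) Y(t+v) / sum_t Y(t)^2 at lag 3
direct = np.dot(Y[:-3], Y[3:]) / np.sum(Y ** 2)
print(abs(rho[3] - direct) < 1e-10)           # True
```

Because Q - T = 312 here, the circular transform agrees with the linear sum for all lags up to 312, which is the content of the restriction 0 ≤ v ≤ Q - T.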
It should be noted that we are assuming the time series
Y(t) to be zero mean, or more generally to have been detrended
by subtraction of μ̂(t), an estimator of the mean value
function μ(t) = E[Y(t)]. When μ(t) = μ, a constant, we take
μ̂ = Ȳ. When μ(t) is a function with period d (as might be
the case with d=12 for monthly time series) one might take
for μ̂(t) the mean of Y(s) for all s = t modulo d.
By recursively solving the Yule-Walker equations one can
determine sequences of (1) estimated residual variances
σ̂²1 ≥ σ̂²2 ≥ ...,
(2) estimated partial correlations
π̂(1), π̂(2),...,
(3) estimated autoregressive coefficients
α̂m(0) = 1, α̂m(1),...,α̂m(m),
(4) autoregressive spectral density estimators
f̂m(w) = σ̂²m |Σ_{j=0}^{m} α̂m(j) exp(2πijw)|^{-2},
(5) residual spectral densities
f̃ν,m(w) = f̃(w) ÷ f̂m(w).
By forming a smoothed version f̄ν,m(w) of f̃ν,m(w) one can obtain a
final estimator f̂(w) of the unknown spectral density:
f̂(w) = f̂m(w) f̄ν,m(w).
When f̃ν,m(w) is tested for white noise, and found not to be
significantly different from white noise, then
f̂(w) = f̂m(w),
and the autoregressive spectral density estimator is the final
estimator.
The important question of criteria for choosing the orders
of approximating spectral densities is discussed in the next
section.
Computing estimators of autoregressive coefficients by
solving Yule-Walker equations is called stationary autoregression
because the autoregressive coefficients obtained are guaranteed
to correspond to a stationary time series. When σ̂²m in the
foregoing analysis is tending to approximate 0, we consider
the time series to be long memory; experimental evidence
indicates that more reliable estimators of the spectral density,
and also of the autoregressive coefficients, are provided by
least-squares autoregression, which solves the normal equations
"A oo . A(,f m2K (0,0) ... K(0,m) I 0
K (1,0 .. .MA A 0
K(m,0) ... K(m,m) am(m) 0
for a suitable estimator K(ij) of
K(ij) = E(Y(t-i) Y(t-j)]
Possible estimators (for i,j = 0,1,...,m) are: the least squares
forward algorithm
K̂(i,j) = (T-m)^{-1} Σ_{t=0}^{T-m-1} Y(t+m-i) Y(t+m-j),
or the least squares forward and backward algorithm
K̂(i,j) = {2(T-m)}^{-1} Σ_{t=0}^{T-m-1} {Y(t+m-i) Y(t+m-j) + Y(t+i) Y(t+j)}.
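Least squares autoregression can be sketched by forming the forward and backward estimates K̂(i,j) and solving the normal equations directly. The simulated AR(2) series, its coefficients, and the sample size are assumed examples chosen so the estimates have a known target.

```python
import numpy as np

rng = np.random.default_rng(1)
T, m = 500, 2
Y = np.zeros(T)
e = rng.standard_normal(T)
for t in range(2, T):                     # Y(t) = 0.9 Y(t-1) - 0.4 Y(t-2) + e(t)
    Y[t] = 0.9 * Y[t - 1] - 0.4 * Y[t - 2] + e[t]

def K_hat(i, j):
    """Forward and backward estimate of E[Y(t-i) Y(t-j)]."""
    fwd = np.dot(Y[m - i:T - i], Y[m - j:T - j])
    bwd = np.dot(Y[i:T - m + i], Y[j:T - m + j])
    return (fwd + bwd) / (2 * (T - m))

K = np.array([[K_hat(i, j) for j in range(m + 1)] for i in range(m + 1)])
# Rows 1..m of the normal equations give the coefficients; row 0 gives sigma2
alpha = np.linalg.solve(K[1:, 1:], -K[1:, 0])   # alpha_m(1), ..., alpha_m(m)
sigma2 = K[0, 0] + K[0, 1:] @ alpha

# Estimates should be near alpha = (-0.9, 0.4) and sigma2 near Var e(t) = 1
print(np.round(alpha, 2), round(float(sigma2), 2))
```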
When several harmonics are present in the data, whose
frequencies are close together, least squares autoregressive
coefficient estimators are more effective than Yule-Walker
autoregressive coefficient estimators in providing autoregressive
spectral estimators which exhibit the split peaks one would like
to see in the estimated spectral density.
An important and popular algorithm for estimation of AR
coefficients was introduced by Burg in 1967 [see Burg (1967),
(1968)]. For references to descriptions of Burg's algorithm,
see Kay and Marple (1981).
6. Autoregressive Order Determination
The problem of determining the orders of approximating
autoregressive schemes is an important example of the problem
of estimating a function by using the smallest finite number of
parameters which provide an adequate approximation of the
function. The true spectral density is denoted f(w) or
f∞(w). An approximation f̄m(w) is defined by assuming a family
of densities f_{θ1,...,θm}(w) which are functions of m scalar
parameters θ1,...,θm. The parameter values θ̄1,...,θ̄m which minimize
the cross-entropy H(f;f_{θ1,...,θm}) define a best approximating
spectral density f̄m(w) = f_{θ̄1,...,θ̄m}(w). An estimator of f̄m(w)
is f̂m(w) = f_{θ̂1,...,θ̂m}(w), where θ̂1,...,θ̂m minimize Ĥ(f;f_{θ1,...,θm}).
To evaluate the properties of f̂m(w) as an estimator of f∞(w),
one must distinguish two kinds of error. The model approximation or
bias error is
B(m) = I(f∞; f̄m).
The parameter estimation error or variance is
V(m,T) = E I(f̄m; f̂m).
As m tends to ∞, B(m) tends to 0 and V(m,T) tends to ∞. The
optimal value of m minimizes E I(f∞; f̂m) as measured by
C(m) = B(m) + V(m,T).
In practice, one forms an estimator Ĉ(m) of the terms in C(m)
which depend on m.
One calls C(m) a criterion function for order determination.
It should be plotted, and interpreted as a function, not just
examined for its minimum value. It is useful to define a
best value of m (at which C(m) is minimized) and a second
best value of m (at which C(m) achieves its lowest relative
minimum).
One also has to define a value C(O) of the criterion function
at m=O. If
C(m) > C(0) for m=1,2,...
then the optimum order is 0, and the time series is considered
to be not significantly different from white noise. Further
research is required on the properties of order determining
criteria as tests for white noise.
Tests for white noise provide an alternative approach to
order determination since an autoregressive estimator fm(w) is
regarded as an adequate fit (or smoother) if the residual
spectral density f̃(w) ÷ f̂m(w) is not significantly different
from the sample spectral density of white noise.
A widely used order determining criterion is that
introduced by Akaike [see Akaike (1974)]. It should be
33
emphasized that Akaike's criterion had a different conceptual
basis than the one outlined above; it seeks to determine the
order of an exact autoregressive scheme which the time series is
assumed to obey. Then one can raise the objection against it that
it does not consistently estimate the order, which is done by a
criterion due to Hannan and Quinn (1979). Our point of view is
that the approximating autoregressive scheme need only have the
property that f(w) fm(w) is just barely not significantly
different from the sample spectral density of white noise.
Akaike's order determining criterion AIC is defined by

    AIC(m) = log σ̂²_m + 2m/T.

Possible definitions for AIC(0) are 0 or -1/T.
The Hannan and Quinn criterion is

    AICHQ(m) = log σ̂²_m + 2m log log T / T.
Parzen (1974), (1977) introduced an approximating
autoregressive order criterion called CAT (criterion autoregressive
transfer function), defined by

    CAT(m) = (1/T) Σ_{j=1}^{m} (1/σ̄²_j) - 1/σ̄²_m ,    σ̄²_j = (T/(T-j)) σ̂²_j ,

    CAT(0) = -(1 + 1/T).
In practice CAT and AIC lead in many examples to exactly
the same orders. It appears reassuring that quite different
conceptual foundations can lead to similar conclusions in
practice.
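In modern notation, the three criteria can be computed directly from the estimated innovation variances. The following Python sketch is purely illustrative (the function name, the numpy vectorization, and the convention that sigma2[0] = 1 for a series normalized to unit variance are our choices, not the report's):

```python
import numpy as np

def order_criteria(sigma2, T):
    """Given estimated innovation variances sigma2[m] for AR orders
    m = 0, 1, ..., M (with sigma2[0] = 1 for a normalized series),
    compute the AIC, Hannan-Quinn, and CAT order criteria."""
    M = len(sigma2) - 1
    m = np.arange(M + 1)
    aic = np.log(sigma2) + 2.0 * m / T
    hq = np.log(sigma2) + 2.0 * m * np.log(np.log(T)) / T
    # CAT uses the bias-adjusted variances sig2bar_j = T*sigma2_j/(T-j)
    sig2bar = T * sigma2 / (T - m)
    cat = np.empty(M + 1)
    cat[0] = -(1.0 + 1.0 / T)
    for k in range(1, M + 1):
        cat[k] = np.sum(1.0 / sig2bar[1:k+1]) / T - 1.0 / sig2bar[k]
    return aic, hq, cat
```

Plotting all three curves against m, rather than only reading off their minima, follows the report's advice on interpreting criterion functions.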
7. Suggestions for Empirical Spectral Analysis
The basic aim of spectral analysis is to obtain an estimated
spectral density which does not introduce spurious spectral
peaks, and which resolves close spectral peaks. To arrive at the
final form of spectral estimator in an applied problem,
autoregressive spectral estimators can be used to identify the
memory type of a time series (long, short, or no memory) and
the type of the whitening filter of a short memory time series
(AR, MA, or ARMA). An empirical time series spectral analysis
should involve the following stages.
A. Pre-processing. To analyze a time series sample
Y(t), t=1,...,T, one proceeds in stages which often involve
the subtraction or elimination of strong effects in order to
see more clearly weaker patterns in the time series structure.
The aim of pre-processing is to transform Y(.) to a new time
series Ỹ(.) which is short memory. Some basic pre-processing
operations are memoryless transformations (such as square root
and logarithm), detrending, "high pass" filtering, and differencing.
One usually subtracts out the sample mean Ȳ = (1/T) Σ_{t=1}^{T} Y(t);
the time series actually processed is then Y(t) - Ȳ.
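The pre-processing stage might be sketched as follows (an illustrative composition of the operations named above; the function and flag names are ours):

```python
import numpy as np

def preprocess(y, log_transform=False, difference=False):
    """Illustrative pre-processing: optional memoryless log transform,
    optional first differencing, then subtraction of the sample mean."""
    y = np.asarray(y, dtype=float)
    if log_transform:
        y = np.log(y)          # variance-stabilizing; requires y > 0
    if difference:
        y = np.diff(y)         # removes a linear trend
    return y - y.mean()        # series actually processed is Y(t) - Ybar
```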
B. Sample Fourier Transform by Data Windowing, Extending
with Zeroes, and Fast Fourier Transform. Let Y(t) denote a
pre-processed time series. The first step in the analysis could
be to compute successive autoregressive schemes using operations
only in the time domain. An alternative first step is the
computation of the sample Fourier transform

(1)    φ(w) = Σ_{t=1}^{T} Y(t) exp(-2πiwt),    w = k/Q, k = 0, 1, ..., Q-1,

at an equi-spaced grid of frequencies in 0 ≤ w < 1. We call Q the
spectral computation number. One should always choose Q > T, and we
recommend Q ≥ 2T. Prior to computing φ(w), one extends the
length of the time series by adding zeroes to it; φ(w) can then
be computed using the Fast Fourier Transform.
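A sketch of the zero-padded transform using a modern FFT routine (numpy's indexing starts the series at t = 0 rather than t = 1, which changes φ(w) only by a phase factor; the next-power-of-two choice of Q is our convention, subject to Q ≥ 2T):

```python
import numpy as np

def sample_fourier_transform(y, factor=2):
    """Sample Fourier transform phi(w) at w = k/Q, k = 0,...,Q-1,
    with spectral computation number Q >= factor*T obtained by
    extending the series with zeroes (Q is taken as the next power
    of two at least factor*T, a common FFT-friendly choice)."""
    T = len(y)
    Q = 1 << int(np.ceil(np.log2(factor * T)))
    return np.fft.fft(y, n=Q), Q   # fft zero-pads y to length Q
```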
If the time series may be long memory, one should compute
in addition a sample "tapered" or "data windowed" Fourier
transform

(2)    φ̃(w) = Σ_{t=1}^{T} Y(t) W(t/T) exp(-2πiwt).
C. Sample Spectral Density. The sample spectral density
f̂(w) is obtained essentially by squaring and normalizing the
sample Fourier transform:

(3)    f̂(w) = |φ(w)|² / Σ_{t=1}^{T} Y²(t),    w = k/Q, k = 0, 1, ..., Q-1.
D. Sample Correlation Function. The sample correlation
function ρ̂(v) is computed (using the Fast Fourier Transform).
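Steps C and D can be combined in one pass, since the inverse FFT of |φ(w)|² recovers the lagged products (a sketch; the normalization by the sum of squares follows equation (3) above, and requires Q ≥ 2T so that the circular sums do not wrap around):

```python
import numpy as np

def sample_spectral_density_and_correlations(y, Q):
    """Sample spectral density f(k/Q) = |phi(k/Q)|^2 / sum(y^2),
    and sample correlations rho(v) recovered by inverse FFT."""
    y = np.asarray(y, dtype=float)
    phi = np.fft.fft(y, n=Q)
    f = np.abs(phi) ** 2 / np.sum(y ** 2)    # normalized, f >= 0
    # The inverse FFT of |phi|^2 gives circular sums of y(t)*y(t+v);
    # with Q >= 2*len(y) there is no wrap-around, so these are the
    # plain lagged products, and division by sum(y^2) yields rho(v).
    rho = np.real(np.fft.ifft(np.abs(phi) ** 2)) / np.sum(y ** 2)
    return f, rho
```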
E. Autoregressive Analysis. The Yule-Walker equations are
solved to estimate innovation variances σ̂²_m, to which order
determining criteria (AIC, CAT) are applied to determine optimal
orders m̂ and also to test for white noise. The value of σ̂²_m̂ and
the dynamic range of the autoregressive spectral estimator
f̂_m̂(w) are used to determine the memory type of the time series.
Two orders (called the best m̂ and second best m̂(2)) are
determined as candidates for optimal orders, corresponding to the
absolute minimum and lowest relative minimum of the criterion
function.
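A sketch of the recursive solution of the Yule-Walker equations (the Levinson-Durbin recursion cited in Section 8). The sign convention Y(t) = a_1 Y(t-1) + ... + a_m Y(t-m) + ε(t) and the normalization σ̂²_0 = 1 (variances relative to the series variance) are our assumptions:

```python
import numpy as np

def levinson_durbin(rho, M):
    """Solve the Yule-Walker equations recursively.  rho[v] are
    sample correlations with rho[0] = 1.  Returns the AR(M)
    coefficients and innovation variances sigma2[m], m = 0,...,M,
    relative to the series variance."""
    sigma2 = np.empty(M + 1)
    sigma2[0] = 1.0
    a = np.zeros(M + 1)            # a[1..m] hold current coefficients
    for m in range(1, M + 1):
        # partial correlation (reflection coefficient) at order m
        k = (rho[m] - np.dot(a[1:m], rho[m-1:0:-1])) / sigma2[m-1]
        a_new = a.copy()
        a_new[m] = k
        a_new[1:m] = a[1:m] - k * a[m-1:0:-1]
        a = a_new
        sigma2[m] = sigma2[m-1] * (1.0 - k * k)
    return a[1:], sigma2
```

The innovation variances sigma2 produced here are exactly the inputs the order-determining criteria of Section 6 require, and the AR spectral estimator follows as f̂_m(w) = σ̂²_m / |1 - Σ_j a_j exp(-2πiwj)|².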
F. ARMA Analysis. When a time series is classified as
short memory, an approximating AR scheme of order 4m̂ is
inverted to form MA(∞) coefficients, which are used to estimate
the covariance matrix of the lagged values Y(t-j) and the estimated
innovations. A subset regression procedure is then used to
determine a "best fitting" ARMA scheme, and the corresponding
ARMA spectral density estimator. One will then be able to identify
moving average schemes and ARMA schemes which are barely invertible
and require a long AR scheme for adequate approximation. The long
AR spectral estimator introduces spurious spectral peaks when
compared to the MA or ARMA estimator.
G. Non-stationary Autoregression. When a time series is
classified as long memory, more accurate estimators of
autoregressive coefficients are provided by minimizing a least
squares criterion or by Burg estimators. When several harmonics
whose frequencies are close together are present in the data,
least squares autoregressive coefficient estimators are more
effective than Yule-Walker autoregressive coefficient estimators
in providing autoregressive spectral estimators which exhibit
the separated peaks one would like to see in the estimated
spectral density.
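Burg's estimators, mentioned above as an alternative to Yule-Walker, can be sketched as follows (a minimal single-channel version; the coefficient sign convention matches the Levinson recursion above, and the function name is ours):

```python
import numpy as np

def burg(y, M):
    """Burg's method: estimate AR(M) coefficients by minimizing the
    sum of forward and backward prediction error energies at each
    stage, without first forming sample correlations."""
    y = np.asarray(y, dtype=float)
    f = y.copy()                   # forward prediction errors
    b = y.copy()                   # backward prediction errors
    a = np.zeros(M)
    sigma2 = np.dot(y, y) / len(y)
    for m in range(M):
        fm, bm = f[1:], b[:-1]
        # reflection coefficient minimizing forward+backward energy
        k = 2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        a_prev = a[:m].copy()
        a[m] = k
        a[:m] = a_prev - k * a_prev[::-1]   # Levinson-type update
        f, b = fm - k * bm, bm - k * fm     # propagate errors
        sigma2 *= (1.0 - k * k)
    return a, sigma2
```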
37
H. Long Memory analysis. In the long memory case, one may
want to represent Y(t) as S(t) + N(t), a long memory signal plus
short memory noise. An approach to this problem may be provided
by treating the sample spectral density values f(k/Q) as a data
batch to be studied by non-parametric data modeling methods using
quantile functions [see Parzen (1979)]. The details of such
methods are under development.
I. Nonparametric Kernel Spectral Density Estimator. An
estimator f̂(w) of the spectral density is called parametric
when it corresponds to a parametric model for the time series
(such as an AR or ARMA model), and non-parametric otherwise. A
general form of non-parametric estimator is the kernel estimator

    f̂(w) = Σ_{|v| ≤ M} k(v/M) ρ̂(v) exp(-2πiwv),    0 ≤ w ≤ 1.

The problem of determining an optimum truncation point M has no
general solution; one approach is to choose M = 4m̂ to obtain a
preliminary smoothing of the sample spectral density.
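A sketch of the kernel (lag window) estimator, with the Bartlett window k(u) = 1 - |u| chosen purely for illustration (the report does not fix a particular kernel):

```python
import numpy as np

def kernel_spectral_estimate(rho, M, Q):
    """Nonparametric lag-window spectral estimator
    f(w) = sum_{|v|<=M} k(v/M) rho(v) exp(-2*pi*i*w*v),
    with the Bartlett (triangular) window k(u) = 1 - |u|,
    evaluated at w = j/Q, j = 0,...,Q-1."""
    v = np.arange(-M, M + 1)
    k = 1.0 - np.abs(v) / M                     # Bartlett lag window
    rho_v = np.array([rho[abs(x)] for x in v])  # uses rho(-v) = rho(v)
    w = np.arange(Q) / Q
    # evaluate the finite Fourier sum directly at each frequency
    E = np.exp(-2j * np.pi * np.outer(w, v))
    return np.real(E @ (k * rho_v))
```

For white noise (ρ̂(v) = 0 for v ≠ 0) the estimate is flat, which is the benchmark against which the residual spectral density is compared in the white-noise tests of Section 6.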
J. Inverse Correlations and Cepstral Correlations.
Estimators of the inverse correlations ρi(v) and the cepstral
correlations γ(v) are computed and used to form nonparametric
kernel estimators of f^{-1}(w) and log f(w), which may provide
additional insights into the peaks and troughs to be given
significance in the final estimator of the spectrum.
Extensive comparisons of different methods of spectral
estimation are given in Pagano (1980), Priestley and Beamish
(1981), and Kay and Marple (1981). It seems clear that
autoregressive spectral estimators can give superior results when
properly used. One should: determine two best orders; compute
autoregressive coefficients both by the Yule-Walker equations and
by least squares, since when the time series is long memory
autoregressive spectral estimators are most accurate when based on
least squares estimators of the autoregressive coefficients; and
use approximating autoregressive schemes to determine whether an
ARMA scheme fits better.
The end of the search for the perfect spectral estimator
seems attainable if one does not think of spectral estimation
as a non-parametric procedure which can be conducted
independently of model identification.
8. A Bibliography of Autoregressive Spectral Estimation
The bibliography aims to provide a comprehensive list of
the publications in English which are directly concerned with
developing the theory and methods of autoregressive spectral
estimation.
This section lists some of the publications which
contributed to the development of AR spectral estimation.
Yule (1927) introduces autoregressive schemes to model
disturbed periodicities as an alternative to Schuster
periodogram analysis and its spurious periodicities;
Yule-Walker (1931) equations relate autoregressive
coefficients and correlations of a stationary time
series.
Wold (1938) introduces infinite order autoregressive and
moving average representations of a stationary time
series; rigorous conditions are given by Akutowicz (1957).
Mann and Wald (1943) derive asymptotic distribution of
estimators of autoregressive coefficients.
Levinson (1947) - Durbin (1960) derive recursive methods
of solving Yule-Walker equations which subsequently
lead to fast algorithms for calculation of high order
AR schemes.
Whittle (1954) seems to be the first to use autoregressive
schemes to estimate a spectral density. He used a low
order model in a case where high order models are
indicated by order determining criterion [Akaike
(1974), p. 720].
Grenander and Rosenblatt (1957) criticize attempts to apply
low order autoregressive schemes, and develop theory
of non-parametric spectral density estimation, as do
Bartlett, Parzen, and Tukey and Blackman.
Parzen (1964), Schaerf (1964), and Parzen (1968)
discuss autoregressive spectral estimation as a
method for empirical time series analysis; no theory
is given.
Burg (1967), (1968) publishes his pioneering work on MEM
(maximum entropy method of spectral estimation) and
his method of calculating their coefficients.
Akaike (1969), (1970) derives asymptotic variance formulas
for autoregressive spectral estimators, and states
FPE (final prediction error) criterion for order
determination; precursor of FPE in Davisson (1965).
Parzen (1969) derives heuristically a formula for the
asymptotic variance of AR spectral estimators,
confirmed by Kromer (1969) and Berk (1974); an order
determining criterion is proposed.
Kromer (1969) in an unpublished Ph.D. thesis presents first
rigorous analysis of asymptotic distribution of
autoregressive spectral estimators, especially their
bias; consistency is proved only in an iterated
limit mode of convergence.
Berk (1974) provides first proof of consistency of
autoregressive spectral estimators.
Carmichael (1976) in an unpublished Ph.D. thesis provides
alternative proof of consistency of autoregressive
estimators, and extends technique to general problems
of density estimation.
Akaike (1973), (1974), (1977) introduces AIC for model
order criterion and relates it to entropy maximization
principles.
Parzen (1974), (1977) introduces CAT for AR order
determination based on concept of finite parameter AR
schemes as approximations to infinite parameter AR
schemes.
Hannan and Quinn (1979) derive a modification of AIC which
provides consistent estimators of the AR order, when
exact model is assumed to be a finite order AR.
Huzii (1977), Shibata (1977), and Bhansali (1980), discuss
rigorously the convergence of AR spectral estimators
and inverse correlations.
Childers (1978) and Haykin (1979) contain very useful
collections of papers.
Pagano (1980), Beamish and Priestley (1981), and Kay and
Marple (1981) provide illuminating reviews of AR
spectral estimators and comparisons with alternative
methods.
BIBLIOGRAPHY
Akaike, H. (1969) Fitting autoregressive models for prediction, Ann. Inst. Statist. Math., 21, 243-247.
________ (1969) Power spectrum estimation through autoregression model fitting, Ann. Inst. Statist. Math., 21, 407-419.
________ (1970) A fundamental relation between predictor identification and power spectrum estimation, Ann. Inst. Statist. Math., 22, 219-223.
________ (1970) Statistical predictor identification, Ann. Inst. Statist. Math., 22, 203-217.
________ (1973) Information theory and an extension of the maximum likelihood principle, 2nd Int. Symp. on Information Theory, B. N. Petrov and F. Csaki, eds., Budapest: Akademiai Kiado, 267-281.
________ (1974) A new look at the statistical model identification, IEEE Trans. Autom. Contr., AC-19, 716-723.
________ (1977) On entropy maximization principle, Applications of Statistics, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 27-41.
________ (1979) A Bayesian extension of the minimum AIC procedure of autoregressive model fitting, Biometrika, 66, 237-242.
Akutowicz, E. J. (1957) On an explicit formula in linear least squares prediction, Math. Scand., 5, 261-266.
Baggeroer, A. B. (1976) Confidence intervals for regression (MEM) spectral estimates, IEEE Trans. Inform. Theory, IT-22, 534-545.
Barndorff-Nielsen, O. and Schou, G. (1973) On the parametrization of autoregressive models by partial autocorrelations, J. Multivariate Analysis, 3, 408-419.
Beamish, M. and Priestley, M. B. (1981) A study of autoregressive and window spectral estimation, Appl. Statist., 30, 41-58.
Berk, K. N. (1974) Consistent autoregressive spectral estimates, Ann. Statist., 2, 489-502.
Bhansali, R. J. (1980) Autoregressive and window estimates of the inverse correlation function, Biometrika, 67, 551-566.
Bhansali, R. J. and Downham, D. Y. (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike's FPE criterion, Biometrika, 64, 547-551.
Brillinger, D. R. (1981) Time Series: Data Analysis and Theory, Expanded Edition, Holden-Day: San Francisco [see p. 512].
Burg, J. P. (1967) Maximum entropy spectral analysis, Proc. 37th Meeting, Society of Exploration Geophysicists (Oklahoma City, OK), Oct. 31, 1967.
________ (1968) A new analysis technique for time series data, NATO Advanced Study Institute on Signal Processing, Enschede, The Netherlands.
Carmichael, J. P. (1976) The Autoregressive Method, unpublished Ph.D. thesis, Statistical Science Division, State University of New York at Buffalo.
________ (1978) Consistency of the autoregressive method of density estimation, Technical Report, Statistical Science Division, State University of New York at Buffalo.
Chatfield, C. (1979) Inverse autocorrelations, J. R. Statist. Soc. A, 142, 363-377.
Childers, D. G. (1978) Modern Spectrum Analysis, New York: IEEE Press.
Cleveland, W. S. (1972) The inverse autocorrelations of a time series and their applications, Technometrics, 14, 277-293.
Davisson, L. D. (1965) The prediction error of a stationary Gaussian time series of unknown covariance, IEEE Trans. on Information Theory, IT-11, 527-532.
Dickinson, Bradley W. (1978) Autoregressive estimation using residual energy ratios, IEEE Transactions on Information Theory, IT-24, 503-506.
Durbin, J. (1960) The fitting of time series models, Rev. Inst. Int. de Stat., 28, 233-244.
Durrani, T. S. and Arslanian, A. S. (1982) Windows associated with high resolution spectral estimators, submitted for publication.
Fuller, W. A., Hasza, D. P., and Goebel, J. J. (1981) Estimation of the parameters of stochastic difference equations, Annals of Statistics, 9, 531-543.
Geronimus, Y. L. (1960) Polynomials Orthogonal on a Circle and Interval, New York: Pergamon Press.
Gersch, W. (1970) Spectral analysis of EEG's by autoregressive decomposition of time series, Math. Biosci., 7, 205-222.
Gersch, W. and Sharpe, D. R. (1973) Estimation of power spectra with finite-order autoregressive models, IEEE Trans. Automat. Contr., AC-18, 367-369.
Grandell, J., Hamrud, M., and Toll, P. (1980) A remark on the correspondence between the maximum entropy method and the autoregressive models, IEEE Trans. Inform. Theory, IT-26, 750-751.
Grenander, U. and Rosenblatt, M. (1957) Statistical Analysis of Stationary Time Series, New York: Wiley.
Grenander, U. and Szego, G. (1958) Toeplitz Forms and their Applications, Berkeley: University of California Press.
Griffiths, L. J. and Prieto-Diaz, R. (1977) Spectral analysis of natural seismic events using autoregressive techniques, IEEE Trans. Geosci. Electron., GE-15, 13-25.
Hannan, E. J. (1970) Multiple Time Series, Wiley: New York [see p. 334].
Hannan, E. J. and Quinn, B. G. (1979) The determination of the order of an autoregression, Journal of the Royal Statistical Society, Series B, 41, 190-195.
Haykin, S. S., ed. (1979) Nonlinear Methods of Spectral Analysis, New York: Springer-Verlag.
Hsu, F. M. and Giordano, A. A. (1977) Line tracking using autoregressive spectral estimates, IEEE Trans. Acoustics, Speech, Signal Process., ASSP-25, 510-519.
Huzii, M. (1977) On a spectral estimate obtained by an autoregressive model fitting, Ann. Inst. Statist. Math., 29, 415-431.
Jones, R. H. (1974) Identification and autoregression spectrum estimation, IEEE Trans. Automat. Contr., AC-19, 894-898.
________ (1975) Fitting autoregressions, J. Amer. Statist. Assoc., 70, 590-592.
________ (1976) Autoregression order selection, Geophysics, 41, 771-773.
Kaveh, M. and Cooper, G. R. (1976) An empirical investigation of the properties of the autoregressive spectral estimator, IEEE Trans. Inform. Theory, IT-22, 313-323.
Kay, S. M. (1978) Improvement of autoregressive spectral estimates in the presence of noise, Rec. 1978 Int. Conf. Acoustics, Speech, and Signal Processing, 357-360.
Kay, S. M. and Marple, S. L., Jr. (1979) Sources of and remedies for spectral line splitting in autoregressive spectrum analysis, Rec. 1979 Int. Conf. Acoustics, Speech, and Signal Processing.
Kay, S. M. (1979) The effects of noise on the autoregressive spectral estimator, IEEE Trans. Acoust., Speech, Signal Process., ASSP-27, 478-485.
________ (1980) Noise compensation for autoregressive spectral estimates, IEEE Trans. Acoust., Speech, Signal Process., ASSP-28, 292-303.
Kay, S. M. and Marple, Stanley L., Jr. (1981) Spectrum analysis -- a modern perspective, Proceedings of the IEEE, 69, 1380-1419.
Kromer, R. E. (1969) Asymptotic properties of the autoregressive spectral estimator, Ph.D. thesis, Statistics Dept., Stanford University.
Lacoss, R. T. (1971) Data adaptive spectral analysis methods, Geophysics, 36, 661-675.
Landers, T. E. and Lacoss, R. T. (1977) Some geophysical applications of autoregressive spectral estimates, IEEE Trans. Geosci. Electron., GE-15, 26-32.
Levinson, N. (1946) The Wiener (root mean square) error criterion in filter design and prediction, J. Math. Phys., 25, 261-278.
Mann, H. B. and Wald, A. (1943) On the statistical treatment of stochastic difference equations, Econometrica, 11, 173-220.
Marple, S. L., Jr. (1980) A new autoregressive spectrum analysis algorithm, IEEE Trans. Acoust., Speech, Signal Process., ASSP-28, 441-454.
McClave, J. (1975) Subset autoregression, Technometrics, 17, 213-220.
Morf, M., Vieira, A., Lee, D. T. L., and Kailath, T. (1978) Recursive multichannel maximum entropy spectral estimation, IEEE Trans. on Geoscience Electronics, GE-16, 85-94.
Nevai, Paul G. (1979) An asymptotic formula for the derivatives of orthogonal polynomials, SIAM J. Math. Anal., 10, 472-477.
Pagano, M. (1980) Some recent advances in autoregressive processes, Directions in Time Series, ed. D. R. Brillinger and G. C. Tiao, Institute of Mathematical Statistics, 280-302 (comments by H. T. Davis).
Parzen, E. (1964) An approach to empirical time series analysis, J. Res. Nat. Bur. Standards, 68D, 937-951.
________ (1967) Time Series Analysis Papers, Holden-Day: San Francisco. Includes Parzen (1964).
________ (1967) The role of spectral analysis in time series analysis, Review of the International Statistical Institute, 35, 125-141.
________ (1968) Statistical spectral analysis (single channel case) in 1968, Proceedings of the NATO Advanced Study Institute on Signal Processing, Enschede, The Netherlands.
________ (1969) Multiple time series modelling, Multivariate Analysis II, P. R. Krishnaiah, ed., New York: Academic Press, 389-409.
________ (1974) Some recent advances in time series modeling, IEEE Trans. Automat. Contr., AC-19, 723-730.
________ (1977) Multiple time series: determining the order of approximating autoregressive schemes, Multivariate Analysis IV, P. Krishnaiah, ed., North-Holland: Amsterdam, 287-295.
________ (1979) Forecasting and whitening filter estimation, TIMS Studies in the Management Sciences, 12, 149-165.
________ (1979) Nonparametric statistical data modeling, Journal of the American Statistical Assoc., 74, 105-131.
________ (1980) Time series modeling, spectral analysis, and forecasting, Directions in Time Series, ed. D. R. Brillinger and G. C. Tiao, Institute of Mathematical Statistics.
________ (1982) Time series model identification and prediction variance horizon, Applied Time Series Analysis II, ed. D. Findley, Academic Press: New York, 415-447.
Pinsker, M. (1963) Information and Information Stability of Random Variables, Holden-Day: San Francisco.
Priestley, M. B. (1981) Spectral Analysis and Time Series, London: Academic Press.
Rice, J. (1979) On the estimation of the parameters of a power spectrum, J. Multivariate Analysis, 9, 378-392.
Sakai, H. (1979) Statistical properties of AR spectral analysis, IEEE Trans. Acoust., Speech, Signal Processing, ASSP-27, 402-409.
Schaerf, Mirella C. (1964) Estimation of the covariance and autoregressive structure of a stationary time series, Ph.D. thesis, Statistics Dept., Stanford University.
Shibata, R. (1976) Selection of the order of an autoregressive model by Akaike's information criterion, Biometrika, 63, 117-126.
________ (1980) Asymptotically efficient selection of the order of the model for estimating parameters of a linear process, Ann. Statist., 8, 147-164.
________ (1981) An optimal autoregression spectral estimate, Annals of Statistics, 9, 300-306.
Shore, J. E. (1981) Minimum cross-entropy spectral analysis, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-29, 230-237.
Shore, J. E. and Johnson, R. W. (1980) Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy, IEEE Transactions on Information Theory, IT-26, 26-37.
Smylie, D. E., Clarke, G. K. C., and Ulrych, T. J. (1973) Analysis of irregularities in the earth's rotation, in Methods in Computational Physics, vol. 13, B. A. Bolt, ed., New York: Academic Press, 391-430.
Thomson, D. J. (1977) Spectrum estimation techniques for characterization and development of WT4 waveguide - I, Bell Syst. Tech. J., 56, 1769-1815.
Tong, H. (1977) More on autoregressive model fitting with noisy data by Akaike's information criterion, IEEE Trans. Inform. Theory, IT-23, 404-410.
Ulrych, T. J. and Bishop, T. N. (1975) Maximum entropy spectral analysis and autoregressive decomposition, Rev. Geophysics Space Phys., 13, 183-200.
Ulrych, T. J. and Clayton, R. W. (1976) Time series modelling and maximum entropy, Phys. Earth Planetary Interiors, 12, 188-200.
Van den Bos, A. (1971) Alternative interpretation of maximum entropy spectral analysis, IEEE Trans. Inform. Theory, IT-17, 493-494.
Walker, G. (1931) On periodicity in series of related terms, Proc. Roy. Soc. London, Series A, 131, 518-532.
Whittle, P. (1954) The statistical analysis of a seiche record, J. Marine Res., 13, 76-100.
________ (1963) On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix, Biometrika, 50, 129-134.
Wiener, N. (1949) The Extrapolation, Interpolation, and Smoothing of Stationary Time Series, New York: Wiley. Includes Levinson (1947).
Wiggins, R. A. and Robinson, E. A. (1965) Recursive solution to the multichannel filtering problem, Journal of Geophysical Research, 70, 1885-1891.
Wold, H. (1938) A Study in the Analysis of Stationary Time Series, Stockholm: Almqvist and Wiksell.
Yule, G. U. (1927) On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers, Philosophical Trans. Royal Soc. London, Series A, 226, 267-298.