
Systems & Control Letters 15 (1990) 417-423, North-Holland

On the martingale approximation of the estimation error of ARMA parameters

László Gerencsér *

Computer Vision and Robotics Laboratory, McGill Research Centre for Intelligent Machines, McGill University, Montréal, Quebec, Canada H3A 2A7

Received 1 July 1990; revised 30 September 1990

Abstract: The aim of this paper is to prove a theorem which is instrumental in verifying Rissanen's tail condition for the estimation error of the parameters of a Gaussian ARMA process. We get an improved error bound for the martingale approximation of the estimation error for a wide class of ARMA processes.

Keywords: ARMA process; prediction error estimation; strong approximation.

1. Introduction

This paper has been motivated by a problem formulated in [17] (cf. also [18]) as follows: Let $\hat\theta_N$ be the maximum-likelihood estimator of the parameter vector $\theta^*$ of a Gaussian ARMA(p, q) process based on $N$ samples, where $\theta^*$ is assumed to belong to an appropriate compact domain $D^* \subset \mathbf R^{p+q}$. It was conjectured that for any $c > 0$ the following inequality holds:

$$\sum_{N=1}^{\infty} \sup_{\theta^* \in D^*} P\big( N^{1/2} |\hat\theta_N - \theta^*| > c \log N \big) < \infty. \qquad (\mathrm{TC})$$

Under the hypothesis that the conjecture is true, the Rissanen-Shannon inequality is applicable to stationary Gaussian ARMA processes, and a lower bound for the mean cumulated prediction error was obtained, which reflects the cost of parameter uncertainty (cf. [10]).

Rissanen's conjecture was proved in [5]. The aim of this paper is to present a significantly simplified proof of a result on the martingale approximation of the parameter estimation error, which is instrumental in the proof of Rissanen's conjecture. The proof is a careful reexamination of a standard technique (linearization around the estimator $\hat\theta_N$) combined with recently published inequalities for a class of mixing processes [6]. Thus we shall get a very sharp bound on the error term of a standard martingale approximation of $\hat\theta_N - \theta^*$, and many asymptotic properties of $\hat\theta_N - \theta^*$ can be derived from those of martingales. Thus, for example, central limit theorems (CLT's) and laws of the iterated logarithm (LIL's) are easily obtained. Prior to the result of this paper very little had been known about the fine asymptotics of $\hat\theta_N - \theta^*$. In [23] Taniguchi presents an Edgeworth expansion of $\hat\theta_N - \theta^*$, but his result is not applicable to settle Rissanen's conjecture. However, the result of the present paper combined with the result given in [9] does give a positive answer to Rissanen's conjecture.

The results of the paper are easily extended to multivariable finite-dimensional linear stochastic systems (cf. [7]), and to continuous-time systems driven by a diffusion term (cf. [8]). However, these extensions are not always obvious, since an important uniqueness theorem of Åström and Söderström has no general multivariable analogue. Some partial results have been given in [20] and [24]. Uniqueness is essential in the first part of the proof of Lemma 2.3.

Now we specify the notation and technical conditions for the present paper. Let $(y_n)$, $n = 0, \pm 1, \pm 2, \ldots$, be a second-order stationary ARMA(p, q) process satisfying the following difference equation:

* On leave from the Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest.

$$y_n + a_1^* y_{n-1} + \cdots + a_p^* y_{n-p} = e_n + c_1^* e_{n-1} + \cdots + c_q^* e_{n-q}. \qquad (1.1)$$

0167-6911/90/$03.50 © 1990 - Elsevier Science Publishers B.V. (North-Holland)
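As a concrete illustration of (1.1) (our own sketch, not part of the paper), the difference equation can be simulated directly with zero initial values; the function name `simulate_arma` and the example parameters are assumptions for illustration:

```python
import numpy as np

def simulate_arma(a, c, e):
    """Simulate y_n + a_1*y_{n-1} + ... + a_p*y_{n-p}
       = e_n + c_1*e_{n-1} + ... + c_q*e_{n-q},
    i.e. equation (1.1) with a_0 = c_0 = 1 and zero initial values."""
    p, q = len(a), len(c)
    y = np.zeros(len(e))
    for n in range(len(e)):
        ar = sum(a[i] * y[n - 1 - i] for i in range(min(p, n)))
        ma = e[n] + sum(c[j] * e[n - 1 - j] for j in range(min(q, n)))
        y[n] = ma - ar
    return y

# ARMA(1,1) with a_1 = -0.5, c_1 = 0.3 (both polynomials stable, cf. Condition 1.1)
rng = np.random.default_rng(0)
e = rng.standard_normal(1000)
y = simulate_arma([-0.5], [0.3], e)
```

The zero initial values match the convention used for the prediction-error filter later in the paper; a stationary realization would instead require starting the recursion in the remote past or drawing the initial state from the stationary distribution.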


Let $A^*$, $C^*$ be polynomials of the backward shift operator. Then (1.1) is sometimes written in the shorthand notation $A^* y = C^* e$. Define

$$A^*(z^{-1}) = \sum_{i=0}^{p} a_i^* z^{-i}, \qquad C^*(z^{-1}) = \sum_{i=0}^{q} c_i^* z^{-i}.$$

Condition 1.1. $A^*(z^{-1})$ and $C^*(z^{-1})$ have all their roots strictly inside the unit circle, i.e. $A^*(z^{-1})$ and $C^*(z^{-1})$ are asymptotically stable. Moreover, we assume that they are relatively prime and $a_0^* = c_0^* = 1$.

Condition 1.2. $(e_n)$ is a discrete-time, second-order stationary L-mixing martingale-difference process with respect to a pair of families of σ-algebras $(\mathcal F_n, \mathcal F_n^+)$, $n = 0, \pm 1, \pm 2, \ldots$, such that
$$E(e_n \mid \mathcal F_{n-1}) = 0, \qquad E(e_n^2 \mid \mathcal F_{n-1}) = \sigma^{*2} = \mathrm{const.} \quad \text{a.s.}$$

The concept of L-mixing, together with the conditions imposed on $\mathcal F_n$, $\mathcal F_n^+$, is described in the Appendix. A detailed exposition is given in [6].

Let $G \subset \mathbf R^{p+q}$ denote the set of $\theta$'s such that the corresponding polynomials $A(z^{-1})$ and $C(z^{-1})$ are stable. $G$ is an open set. Let $D^*$ and $D$ be compact domains such that $\theta^* \in D^* \subset \operatorname{int} D$ and $D \subset G$. Here $\operatorname{int} D$ denotes the interior of $D$.

To estimate the unknown parameters $a_i^*$, $c_j^*$, $i = 1, \ldots, p$, $j = 1, \ldots, q$, and the unknown variance $\sigma^{*2}$, we use the prediction-error method, which works as follows. Take an arbitrary $\theta \in D$ and define an estimated prediction-error process $(\varepsilon_n)$, $n \ge 0$, by the equation $\varepsilon = (A/C)y$ with initial values $\varepsilon_n = y_n = 0$ for $n < 0$. Let the coefficients of $A(z^{-1})$ and $C(z^{-1})$ be denoted by $a_i$ and $c_j$, respectively, and set

$$\theta = (a_1, \ldots, a_p, c_1, \ldots, c_q)^{\mathrm T}.$$

To stress the dependence of $(\varepsilon_n)$ on $\theta$ and $\theta^*$ we shall write $\varepsilon_n = \varepsilon_n(\theta, \theta^*)$. Then the cost function associated with the prediction-error method is given by

$$V_N(\theta, \theta^*) = \tfrac{1}{2} \sum_{n=1}^{N} \varepsilon_n^2(\theta, \theta^*),$$

and the estimate $\hat\theta_N$ of $\theta^*$ is defined as the solution of the equation

$$\frac{\partial}{\partial\theta} V_N(\theta, \theta^*) = V_{\theta N}(\theta, \theta^*) = 0. \qquad (1.2)$$

(Here differentiation is taken both in the almost sure and in the M-sense. For the definition of the latter cf. the Appendix.) More exactly, $\hat\theta_N$ is a random vector such that $\hat\theta_N \in D$ for all $\omega$, and if the equation (1.2) has a unique solution in $D$, then $\hat\theta_N$ is equal to this solution. By measurable selection such a random variable does exist.
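The filter $\varepsilon = (A/C)y$ and the cost $V_N$ can be sketched in a few lines (an illustrative reconstruction under the zero-initial-value convention above; names such as `prediction_errors` are ours, not the paper's). When $y$ itself is generated from $\theta^*$ with zero initial values, the filter at $\theta = \theta^*$ recovers the driving noise exactly:

```python
import numpy as np

def prediction_errors(theta, y, p, q):
    """eps = (A/C) y with eps_n = y_n = 0 for n < 0, i.e.
       eps_n = y_n + sum_i a_i*y_{n-i} - sum_j c_j*eps_{n-j}."""
    a, c = theta[:p], theta[p:]
    eps = np.zeros(len(y))
    for n in range(len(y)):
        eps[n] = (y[n]
                  + sum(a[i] * y[n - 1 - i] for i in range(min(p, n)))
                  - sum(c[j] * eps[n - 1 - j] for j in range(min(q, n))))
    return eps

def cost(theta, y, p, q):
    """Prediction-error cost V_N = (1/2) * sum of squared prediction errors."""
    return 0.5 * np.sum(prediction_errors(theta, y, p, q) ** 2)

# Generate y from theta* = (a_1, c_1) = (-0.5, 0.3) with zero initial values,
# then filter it back at the true parameter.
rng = np.random.default_rng(1)
e = rng.standard_normal(500)
y = np.zeros(500)
for n in range(500):
    y[n] = e[n] + (0.3 * e[n - 1] + 0.5 * y[n - 1] if n >= 1 else 0.0)
eps = prediction_errors(np.array([-0.5, 0.3]), y, 1, 1)
```

In practice $\hat\theta_N$ would be found by minimizing `cost` (equivalently, solving (1.2)) over the stability domain $D$ with a numerical optimizer; the sketch only exhibits the filter that the estimator is built on.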

It is easy to see that $\varepsilon_n(\theta, \theta^*)$ is a smooth function of $\theta$ for all $\omega$, and hence (1.2) can be written as

$$\sum_{n=1}^{N} \varepsilon_{\theta n}(\theta, \theta^*)\, \varepsilon_n(\theta, \theta^*) = 0. \qquad (1.3)$$

Let us introduce the asymptotic cost function defined by
$$W(\theta, \theta^*) = \lim_{n \to \infty} \tfrac{1}{2} E\varepsilon_n^2(\theta, \theta^*).$$

The function $W(\theta, \theta^*)$ is smooth in the interior of $D$ and we have
$$W_\theta(\theta^*, \theta^*) = 0 \quad\text{and}\quad R^* \triangleq W_{\theta\theta}(\theta^*, \theta^*) > 0,$$

i.e. $R^*$ is positive definite. It is well known that $N^{1/2}(\hat\theta_N - \theta^*)$ has the asymptotic distribution $N(0, \sigma^{*2}(R^*)^{-1})$. Various forms of the CLT are given e.g. in [1,3,4,11,12,15,16,21]. However, the rate of convergence to the normal law has not been investigated, except in [23], where an asymptotic expansion of the empirical distribution is given.

2. The martingale approximation of $\hat\theta_N - \theta^*$

Theorem 2.1. Under Conditions 1.1 and 1.2 we have
$$\hat\theta_N - \theta^* = -(R^*)^{-1} \frac{1}{N} \sum_{n=1}^{N} \varepsilon_{\theta n}(\theta^*, \theta^*)\, e_n + r_N \qquad (2.1)$$
where $r_N = O_M(N^{-1})$, i.e. we have for all $1 \le q < \infty$,
$$\sup_N N\, E^{1/q} |r_N|^q < \infty. \qquad (2.2)$$


It is easy to see from the proof that (2.2) holds uniformly in $\theta^* \in D^*$. The power of the above theorem lies in the fact that the analysis of the estimation error is reduced to the analysis of a martingale. Since the error term is controlled in a convenient way, many statements for martingales (such as the CLT or the LIL) carry over to the process $\hat\theta_N - \theta^*$. Also the verification of the tail condition (TC) for $\hat\theta_N - \theta^*$ is trivially reduced to the verification of a similar tail condition for the dominant term in (2.1). The latter problem can be solved in a straightforward way in the Gaussian case using the theorem below (cf. [5,9]):

Theorem 2.2. Let $(e_n)$ be a Gaussian white noise process and let
$$\mathcal F_n = \sigma\{e_i : i \le n\}, \qquad \mathcal F_n^+ = \sigma\{e_i : i > n\}.$$
Let $(f_n)$ be an $\mathbf R^p$-valued L-mixing process with respect to $(\mathcal F_n, \mathcal F_n^+)$, such that $Ef_n f_n^{\mathrm T} = R^*$ a.s. for all $n$. Then for all $\varepsilon > 0$,
$$\sum_{n=1}^{N} f_n e_n = \xi_N + O_M(N^{2/5+\varepsilon})$$
where $\xi_N \sim N(0, N R^*)$.
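As a rough numerical sanity check of the covariance claim (our own experiment, not from the paper), take the scalar L-mixing choice $f_n = e_{n-1}$, for which $R^* = Ef_n^2 = 1$, and estimate the variance of $N^{-1/2} \sum_{n=1}^N f_n e_n$ by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(2)
N, reps = 500, 2000
sums = np.empty(reps)
for k in range(reps):
    e = rng.standard_normal(N + 1)      # Gaussian white noise e_0, ..., e_N
    f = e[:-1]                          # f_n = e_{n-1}: L-mixing, E f_n^2 = R* = 1
    sums[k] = f @ e[1:]                 # sum_{n=1}^{N} f_n e_n (a martingale)
var_hat = np.var(sums / np.sqrt(N))     # should be close to R* = 1
```

The experiment only checks the normal limit $N(0, NR^*)$ of the dominant term; the $O_M(N^{2/5+\varepsilon})$ rate of the strong approximation is, of course, not visible from a variance estimate.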

We start the proof of Theorem 2.1 with a lemma.

Lemma 2.3. For any $d > 0$ the equation (1.2) has a unique solution in $D$ such that it is also in the sphere $\{|\theta - \theta^*| < d\}$, with probability at least $1 - O(N^{-s})$ for any $s > 0$, where the constant in the error term $O(N^{-s}) = C N^{-s}$ depends only on $d$ and $s$.

Proof. We show first that the probability that a solution exists outside the sphere $\{\theta : |\theta - \theta^*| < d\}$ is less than $O(N^{-s})$ with any $s > 0$. Indeed, the equation $W_\theta(\theta, \theta^*) = 0$ has a single solution $\theta = \theta^*$ in $D$ (cf. [2]), thus for any $d > 0$ we have
$$d' \triangleq \inf\big\{ |W_\theta(\theta, \theta^*)| : \theta \in D,\ \theta^* \in D^*,\ |\theta - \theta^*| \ge d \big\} > 0$$

since $W_\theta(\theta, \theta^*)$ is continuous in $(\theta, \theta^*)$ and $D \times D^*$ is compact. Therefore, if a solution of (1.2) exists outside the sphere $|\theta - \theta^*| < d$, then for
$$\delta V_{\theta N} = \sup_{\theta \in D,\ \theta^* \in D^*} \left| \frac{1}{N} V_{\theta N}(\theta, \theta^*) - W_\theta(\theta, \theta^*) \right|$$
we have the inequality $\delta V_{\theta N} \ge d'$. But the process

$$u_n(\theta, \theta^*) = \varepsilon_{\theta n}(\theta, \theta^*)\, \varepsilon_n(\theta, \theta^*) - W_\theta(\theta, \theta^*)$$
is an L-mixing process uniformly in $(\theta, \theta^*)$, and the same holds for the process $(u_{\theta n}(\theta, \theta^*))$. Moreover we have
$$c_n = E u_n(\theta, \theta^*) = O(\alpha^n)$$
with some $\alpha$ such that $0 < \alpha < 1$. Indeed, if the initial values $\varepsilon_0(\theta, \theta^*)$, $\varepsilon_{\theta 0}(\theta, \theta^*)$ had the stationary distribution, then we would have $c_n = 0$. On the other hand, the effects of the nonstationary initial values $\varepsilon_0(\theta, \theta^*) = 0$ and $\varepsilon_{\theta 0}(\theta, \theta^*) = 0$ decay exponentially. Hence by Theorem 3.3 we have $\delta V_{\theta N} = O_M(N^{-1/2})$, therefore
$$P(\delta V_{\theta N} \ge d') = O(N^{-s})$$
with any $s$ by Markov's inequality, and thus the statement at the beginning of the proof follows.

Let us now consider the random variable
$$\delta V_{\theta\theta N} = \sup_{\theta \in D,\ \theta^* \in D^*} \left| \frac{1}{N} V_{\theta\theta N}(\theta, \theta^*) - W_{\theta\theta}(\theta, \theta^*) \right|.$$
By the same argument as above we have
$$P(\delta V_{\theta\theta N} > d'') = O(N^{-s})$$
for any $d'' > 0$, and hence for the event
$$A_N = \{ \delta V_{\theta N} < d',\ \delta V_{\theta\theta N} < d'' \}$$
we have $P(A_N) \ge 1 - O(N^{-s})$ with any $s > 0$. But on $A_N$ the equation (1.2) has a unique solution whenever $d'$ and $d''$ are sufficiently small. Indeed, the equation $W_\theta(\theta, \theta^*) = 0$ has a unique solution $\theta = \theta^*$ in $D$ by [2], and hence the existence of a unique solution of (1.2) can easily be derived from the implicit function theorem (cf. Lemma 3.4). Thus the lemma has been proved.

Let us now consider equation (1.2) and write it as
$$0 = V_{\theta N}(\hat\theta_N, \theta^*) = V_{\theta N}(\theta^*, \theta^*) + \overline V_{\theta\theta N}\,(\hat\theta_N - \theta^*) \qquad (2.3)$$


where
$$\overline V_{\theta\theta N} = \int_0^1 V_{\theta\theta N}\big((1-\lambda)\theta^* + \lambda \hat\theta_N,\ \theta^*\big)\, \mathrm d\lambda.$$

Lemma 2.4. We have $\hat\theta_N - \theta^* = O_M(N^{-1/2})$.

Proof. First note that $V_{\theta N}(\theta^*, \theta^*) = O_M(N^{1/2})$ by Burkholder's inequality for martingales (cf. e.g. Theorem 3.3.6 in [22]), since $\varepsilon_n(\theta^*, \theta^*) = e_n + O_M(\alpha^n)$ with some $|\alpha| < 1$. Let us now investigate $\overline V_{\theta\theta N}$. Define
$$\overline W_{\theta\theta N} = \int_0^1 W_{\theta\theta}\big((1-\lambda)\theta^* + \lambda \hat\theta_N,\ \theta^*\big)\, \mathrm d\lambda. \qquad (2.4)$$

Obviously $\overline W_{\theta\theta N} \ge cI$ with some positive $c$ on $A_N$ if $d$ is sufficiently small. Indeed, since $W$ is smooth we have for $0 \le \lambda \le 1$,
$$\big\| W_{\theta\theta}\big((1-\lambda)\theta^* + \lambda\hat\theta_N,\ \theta^*\big) - W_{\theta\theta}(\theta^*, \theta^*) \big\| \le C\,|\hat\theta_N - \theta^*| \le C d, \qquad (2.5)$$
where $C$ is a constant depending on the system parameters and $\sigma^2 = Ee_n^2$. Hence, if $d$ is sufficiently small, then the positive definiteness of $W_{\theta\theta}(\theta^*, \theta^*)$ and (2.4) imply that $\overline W_{\theta\theta N} \ge cI$ with some positive $c$. Since on $A_N$ we have

$$\left\| \frac{1}{N} \overline V_{\theta\theta N} - \overline W_{\theta\theta N} \right\| \le d'',$$
it follows that if $d''$ is sufficiently small, then
$$\lambda_{\min}\!\left( \frac{1}{N} \overline V_{\theta\theta N} \right) \ge c > 0 \qquad (2.6)$$
on $A_N$, where in general $\lambda_{\min}(B)$ denotes the minimal eigenvalue of the matrix $B$. Hence $\|\overline V_{\theta\theta N}^{-1}\| \le C N^{-1}$ on $A_N$ with some nonrandom constant $C$, and we get from (2.3),

$$\chi_{A_N}(\hat\theta_N - \theta^*) = O_M(N^{-1/2}). \qquad (2.7)$$

Combining this inequality with the previous inequality $P(A_N^{\mathrm c}) = O(N^{-s})$, where $A_N^{\mathrm c}$ denotes the complement of $A_N$, and using the fact that $|\hat\theta_N - \theta^*|$ is bounded, we get for any $s > 0$,
$$\chi_{A_N^{\mathrm c}}(\hat\theta_N - \theta^*) = O_M(N^{-s}). \qquad (2.8)$$
Adding this equality to (2.7) we get the lemma.

Now we can complete the proof of Theorem 2.1 as follows. Using the result of the lemma we can improve the inequality (2.5) by writing $O_M(N^{-1/2})$ on the right-hand side. Thus we get, after integration with respect to $\lambda$, that
$$\big\| \overline W_{\theta\theta N} - W_{\theta\theta}(\theta^*, \theta^*) \big\| = O_M(N^{-1/2}). \qquad (2.9)$$

On the other hand, the inequality $\delta V_{\theta\theta N} = O_M(N^{-1/2})$ implies that
$$\frac{1}{N} \overline V_{\theta\theta N} - \overline W_{\theta\theta N} = O_M(N^{-1/2}). \qquad (2.10)$$

Hence we finally get
$$\frac{1}{N} \overline V_{\theta\theta N} - W_{\theta\theta}(\theta^*, \theta^*) = O_M(N^{-1/2}). \qquad (2.11)$$

Let us now focus on the event $A_N$, where we have the inequality (2.6). A simple rearrangement shows that (2.6) and (2.11) imply
$$\chi_{A_N}\left( \overline V_{\theta\theta N}^{-1} - \frac{1}{N} W_{\theta\theta}^{-1}(\theta^*, \theta^*) \right) = O_M(N^{-3/2}). \qquad (2.12)$$

Now we can get our final estimate for $\hat\theta_N - \theta^*$ by substituting (2.12) into (2.3), to obtain
$$\begin{aligned}
\chi_{A_N}(\hat\theta_N - \theta^*) &= -\chi_{A_N}\, \overline V_{\theta\theta N}^{-1}\, V_{\theta N}(\theta^*, \theta^*) \\
&= -\chi_{A_N}\left( \frac{1}{N} W_{\theta\theta}^{-1}(\theta^*, \theta^*) + O_M(N^{-3/2}) \right) V_{\theta N}(\theta^*, \theta^*) \\
&= -\chi_{A_N}\, W_{\theta\theta}^{-1}(\theta^*, \theta^*)\, \frac{1}{N} V_{\theta N}(\theta^*, \theta^*) + O_M(N^{-1}) \\
&= -W_{\theta\theta}^{-1}(\theta^*, \theta^*)\, \frac{1}{N} V_{\theta N}(\theta^*, \theta^*) + O_M(N^{-1}) \\
&= -(R^*)^{-1}\, \frac{1}{N} \sum_{n=1}^{N} \varepsilon_{\theta n}(\theta^*, \theta^*)\, e_n + O_M(N^{-1}). \qquad (2.13)
\end{aligned}$$
The last-but-one equality is obtained by taking into account that $1 - \chi_{A_N} = O_M(N^{-s})$ with any $s > 0$ and that the expression in the first term multiplied by $\chi_{A_N}$ is $O_M(N^{-1/2})$ (hence also $O_M(1)$). Finally, adding the equality (2.8) to (2.13), we get the proposition of the theorem.


3. Appendix: Some previous results on L-mixing processes

We summarize a few results published in [6] and used in this paper. The set of real numbers will be denoted by $\mathbf R$, the $p$-dimensional Euclidean space by $\mathbf R^p$. Let $D \subset \mathbf R^p$ be a compact domain and let the stochastic process $(x_n(\theta))$ be defined on $\mathbf N \times D$, where $\mathbf N$ denotes the set of natural numbers.

Definition 3.1. We say that $(x_n(\theta))$ is M-bounded if for all $1 \le q < \infty$,
$$M_q(x) = \sup_{n \ge 0,\ \theta \in D} E^{1/q} |x_n(\theta)|^q < \infty.$$

We say that a sequence of random variables $x_n$ tends to a random variable $x$ in the M-sense if for all $q \ge 1$ we have
$$\lim_{n \to \infty} E^{1/q} |x_n - x|^q = 0.$$
Similarly we can define differentiation in the M-sense. A basic notion in our discussion is a kind of mixing, which appeared in a different form in [14], where it was called 'exponential stability'. See also [15,19].

Let $(\mathcal F_n)$, $n \ge 0$, be a family of monotone increasing σ-algebras, and $(\mathcal F_n^+)$, $n \ge 0$, a monotone decreasing family of σ-algebras. We assume that for all $n \ge 0$, $\mathcal F_n$ and $\mathcal F_n^+$ are independent. For $n < 0$ we set $\mathcal F_n^+ = \mathcal F_0^+$. A typical example is provided by the σ-algebras
$$\mathcal F_n = \sigma\{e_i : i \le n\}, \qquad \mathcal F_n^+ = \sigma\{e_i : i > n\},$$
where $(e_i)$ is an i.i.d. sequence of random variables.

Definition 3.2. A stochastic process $(x_n(\theta))$, $n \ge 0$, is L-mixing with respect to $(\mathcal F_n, \mathcal F_n^+)$ uniformly in $\theta$ if it is $\mathcal F_n$-progressively measurable, M-bounded, and, with $\tau$ being a positive integer and
$$\gamma_q(\tau, x) = \gamma_q(\tau) = \sup_{n \ge \tau,\ \theta \in D} E^{1/q} \big| x_n(\theta) - E\big(x_n(\theta) \mid \mathcal F_{n-\tau}^+\big) \big|^q,$$
we have for any $1 \le q < \infty$,
$$\Gamma_q = \Gamma_q(x) = \sum_{\tau=1}^{\infty} \gamma_q(\tau) < \infty.$$

Example. Discrete-time stationary Gaussian ARMA processes are L-mixing. (This can be seen using a state-space representation.)
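A minimal sketch of the state-space reasoning (ours, not from [6]): writing the autoregressive part in companion form, the stability required by Condition 1.1 makes the companion matrix's spectral radius strictly less than one, so the influence of the remote past decays geometrically — the mechanism behind L-mixing. The helper names below are assumptions for illustration:

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of z^p + a_1 z^{p-1} + ... + a_p; its
    eigenvalues are exactly the roots of that polynomial."""
    p = len(coeffs)
    m = np.zeros((p, p))
    m[0, :] = -np.asarray(coeffs, dtype=float)
    if p > 1:
        m[1:, :-1] = np.eye(p - 1)   # sub-diagonal shift structure
    return m

def spectral_radius(m):
    return max(abs(np.linalg.eigvals(m)))

# For the AR part 1 - 0.5 z^{-1} the associated root is 0.5, inside
# the unit circle, so powers of the companion matrix decay like 0.5^n.
rho = spectral_radius(companion([-0.5]))
```

With spectral radius $\rho < 1$ the state transition matrix satisfies $\|F^n\| = O(\rho'^n)$ for any $\rho' \in (\rho, 1)$, which yields the exponentially decaying $\gamma_q(\tau)$ required by Definition 3.2.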

Theorem 3.1 (cf. Theorem 1.1 in [6]). Let $(u_n)$, $n \ge 0$, be an L-mixing process with $Eu_n = 0$ for all $n$, and let $(f_n)$ be a deterministic sequence. Then we have for all $1 \le m < \infty$,
$$E^{1/2m} \left| \sum_{n=1}^{N} f_n u_n \right|^{2m} \le C_m \left( \sum_{n=1}^{N} f_n^2 \right)^{1/2} M_{2m}^{1/2}(u)\, \Gamma_{2m}^{1/2}(u),$$
where $C_m = 2(2m-1)^{1/2}$.

Define
$$\Delta x / \Delta^\alpha \theta = |x_n(\theta + h) - x_n(\theta)| / |h|^\alpha$$
for $n \ge 0$, $\theta, \theta + h \in D$, with $0 < \alpha \le 1$.

Definition 3.3. The stochastic process $x_n(\theta)$ is M-Hölder-continuous in $\theta$ with exponent $\alpha$ if the process $\Delta x/\Delta^\alpha\theta$ is M-bounded, i.e. if for all $1 \le q < \infty$ we have
$$M_q(\Delta x/\Delta^\alpha\theta) = \sup_{\substack{n \ge 0 \\ \theta,\,\theta+h \in D}} E^{1/q} |x_n(\theta + h) - x_n(\theta)|^q / |h|^\alpha < \infty.$$

Example. If $(x_n(\theta))$ is absolutely continuous with respect to $\theta$ a.s. and the gradient process $(x_{\theta n}(\theta))$ is M-bounded, then $(x_n(\theta))$ is M-Hölder-continuous with $\alpha = 1$; in other words, $(x_n(\theta))$ is M-Lipschitz-continuous.

Let us consider the case when $(x_n(\theta))$ is a stochastic process which is measurable, separable, M-bounded and M-Hölder-continuous in $\theta$ with exponent $\alpha$ for $\theta \in D$. By Kolmogorov's theorem (cf. e.g. Theorem 19, Appendix I of [13]), the realizations of $(x_n(\theta))$ are continuous in $\theta$ with probability 1, and hence we can define for almost all $\omega$,
$$x_n^* = \max_{\theta \in D_0} |x_n(\theta)|,$$


where $D_0 \subset \operatorname{int} D$ is a compact domain. As the realizations of $x_n(\theta)$ are continuous, $x_n^*$ is measurable with respect to $\mathcal F$, that is, $x_n^*$ is a random variable. We shall estimate its moments.

Theorem 3.2 (cf. Theorem 3.4 in [6]). Assume that $(x_n(\theta))$ is a stochastic process which is measurable, separable, M-bounded and M-Hölder-continuous in $\theta$ with exponent $\alpha$ for $\theta \in D$. Let $x^*$ be the random variable defined above. Then we have for all positive integers $q$ and $s > p/\alpha$,
$$M_q(x^*) \le C\big( M_{qs}(x) + M_{qs}(\Delta x/\Delta^\alpha\theta) \big),$$
where $C$ depends only on $p$, $q$, $s$, $\alpha$ and $D_0$, $D$.

Combining Theorems 3.1 and 3.2 we get the following theorem when $f_n \equiv 1$ and $\alpha = 1$.

Theorem 3.3. Let $u_n(\theta)$ be an L-mixing process uniformly in $\theta \in D$ such that $Eu_n(\theta) = 0$ for all $n \ge 0$, $\theta \in D$, and assume that $\Delta u/\Delta\theta$ is also L-mixing, uniformly in $\theta$, $\theta + h \in D$. Then
$$\sup_{\theta \in D_0} \left| \frac{1}{N} \sum_{n=1}^{N} u_n(\theta) \right| = O_M(N^{-1/2}).$$

Lemma 3.4. Let $D_0$ and $D$ be as above. Let $W_\theta(\theta)$, $\delta W_\theta(\theta)$, $\theta \in D \subset \mathbf R^p$, be $\mathbf R^p$-valued continuously differentiable functions, let $W_\theta(\theta^*) = 0$ for some $\theta^* \in D_0$, and let $W_{\theta\theta}(\theta^*)$ be nonsingular. Then for any $d > 0$ there exist positive numbers $d'$, $d''$ such that
$$|\delta W_\theta(\theta)| < d' \quad\text{and}\quad \|\delta W_{\theta\theta}(\theta)\| < d''$$
for all $\theta \in D_0$ implies that the equation $W_\theta(\theta) + \delta W_\theta(\theta) = 0$ has exactly one solution in a neighbourhood of radius $d$ of $\theta^*$.

The proof is obtained by application of the implicit function theorem to the equation $W_\theta(\theta) + a\,\delta W_\theta(\theta) = 0$ with $0 \le a \le 1$.

Acknowledgements

This research was supported in part by the Natural Sciences and Engineering Research Council under Grant 01329, and by the Hungarian Academy of Sciences under the research project "The Mathematics of Control Theory", while the author was working in the Computer and Automation Institute in Budapest.

The author wishes to thank Jimmy Baikovicius, Karim Nassiri-Toussi and Zsuzsanna Vágó for their careful reading of the manuscript, and Mindle Levitt and Solomon Seifu for their considerable amount of work in the preparation of this document.

References

[1] T.W. Anderson, The Statistical Analysis of Time Series (Wiley, New York, 1971).

[2] K.J. Åström and T. Söderström, Uniqueness of the maximum-likelihood estimates of the parameters of an ARMA model, IEEE Trans. Automat. Control 19 (1974) 769-773.

[3] P.E. Caines, Linear Stochastic Systems (Wiley, New York, 1988).

[4] W. Dunsmuir and E.J. Hannan, Vector linear time series models, Adv. Appl. Probab. 8 (1976) 339-364.

[5] L. Gerencsér, On the normal approximation of the maximum-likelihood estimator of ARMA parameters, Report WP 49, Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest (1985). Revised as: Verification of Rissanen's tail condition for the parameter estimator of a Gaussian ARMA process, Report TR-CIM-89-6, McGill Research Center for Intelligent Machines (1989).

[6] L. Gerencsér, On a class of mixing processes, Stochastics 26 (1989) 165-191.

[7] L. Gerencsér, Some new results in the theory of recursive identification, Proc. of the 28th IEEE CDC, Vol. 1 (1989) 242-248.

[8] L. Gerencsér, Strong approximation theorems for estimator processes in continuous time, in: I. Berkes, E. Csáki and P. Révész, Eds., Limit Theorems in Probability and Statistics, Colloquia Mathematica Societatis János Bolyai (North-Holland, Amsterdam, 1990, to appear).

[9] L. Gerencsér, Strong approximation theorems for stochastic integrals, submitted for publication (1990).

[10] L. Gerencsér and J. Rissanen, A prediction bound for Gaussian ARMA processes, Proc. of the 25th CDC, Athens, Vol. 3 (1986) 1487-1490.

[11] E.J. Hannan and M. Deistler, The Statistical Theory of Linear Systems (Wiley, New York, 1988).

[12] E.J. Hannan and L. Kavalieris, Multivariate linear time series models, Adv. Appl. Probab. 16 (1984) 492-561.

[13] I.A. Ibragimov and R.Z. Khasminskii, Statistical Estimation. Asymptotic Theory (Springer-Verlag, Berlin-New York, 1981).

[14] L. Ljung, On consistency and identifiability, Mathematical Programming Study 5 (1976) 169-190.

[15] L. Ljung and P.E. Caines, Asymptotic normality of prediction error estimation for approximate system models, Stochastics 3 (1979) 29-46.

[16] L. Ljung, System Identification: Theory for the User (Prentice-Hall, Englewood Cliffs, NJ, 1987).

[17] J. Rissanen, Stochastic complexity and predictive modeling, Annals of Statistics 14 (1986) 1080-1100.

[18] J. Rissanen, Stochastic Complexity in Statistical Inquiry (World Scientific, Singapore, 1989).

[19] J. Rissanen and P.E. Caines, The strong consistency of maximum likelihood estimators for ARMA processes, Ann. Statist. 7 (1979) 297-315.

[20] T. Söderström and P. Stoica, Uniqueness of prediction error estimates of multivariable moving average models, Automatica 18 (1982) 617-620.

[21] T. Söderström and P. Stoica, System Identification (Prentice-Hall, Hemel Hempstead, 1989).

[22] W.F. Stout, Almost Sure Convergence (Academic Press, New York, 1974).

[23] M. Taniguchi, Validity of Edgeworth expansions of minimum contrast estimates for Gaussian ARMA processes, J. Multivariate Anal. 18 (1986) 1-31.

[24] Zs. Vágó and L. Gerencsér, Uniqueness of the maximum-likelihood estimates of the Kalman-gain matrix of a state space model, Proc. of the IFAC/IFORS Conference on Dynamic Modelling of National Economies, Budapest (1985).