
Statistics & Probability Letters 12 (1991) 201-207

North-Holland

September 1991

Strong approximation of vector-valued stochastic integrals

László Gerencsér * — Department of Electrical Engineering, McGill University, Montréal, Qué., Canada H3A 2A7

Received June 1990

Revised January 1991

Abstract: We present a strong approximation technique which is useful for developing fine asymptotic results for parameter estimator processes of linear stochastic systems. Among other applications, Rissanen's tail-condition has been verified for Gaussian ARMA processes using the results of this paper.

Keywords: Strong approximation, central limit theorem, tail-probabilities, estimation theory, linear stochastic systems.

1. Introduction

An important technical device in the asymptotic theory of the statistics of stochastic processes is a central limit theorem for stochastic integrals (cf. Kutoyants, 1980), which can be formulated as follows. Let $(w_t)$, $t \ge 0$, be a real-valued standard Wiener process and let $\mathcal{F}_t = \sigma\{w_s: s \le t\}$ be the $\sigma$-algebra generated by $(w_t)$ up to time $t$. Furthermore, let $(f_t)$ be an $\mathcal{F}_t$-adapted $\mathbb{R}^p$-valued stochastic process such that $E\int_0^T |f_t|^2\,dt < \infty$ for all finite $T > 0$. Now if for some $T_0 > 0$ we have

$$\frac{1}{T_0} \int_0^{T_0} f_t f_t^*\,dt = S \quad \text{a.s.} \tag{1.1}$$

where $f_t^*$ denotes the transpose of $f_t$ and $S$ is a constant $p \times p$ matrix, then

$$T_0^{-1/2} \int_0^{T_0} f_t\,dw_t \sim \mathcal{N}(0, S), \tag{1.2}$$

i.e. the distribution of the stochastic integral on the left-hand side is a $p$-dimensional normal distribution with zero mean and covariance matrix $S$.

(This research was partially supported by the Natural Sciences and Engineering Research Council of Canada under grant 01329. * On leave from the Computer and Automation Institute of the Hungarian Academy of Sciences, Budapest.)

If (1.1) is not satisfied and $p = 1$, then $f_t f_t^* = |f_t|^2$ and we can define a stopping time $\tau = \tau(T_0)$ such that $T_0^{-1}\int_0^{\tau(T_0)} |f_t|^2\,dt = S$ whenever $\int_0^\infty |f_t|^2\,dt = \infty$. Hence $T_0^{-1/2}\int_0^{\tau(T_0)} f_t\,dw_t \sim \mathcal{N}(0, S)$, and the deviation of the stochastic integral (1.2) from normality can explicitly be expressed as

$$T_0^{-1/2} \int_0^\infty \big( \chi_{[0,T_0]}(t) - \chi_{[0,\tau(T_0)]}(t) \big)\, f_t\,dw_t,$$

where $\chi_{[a,b]}$ denotes the characteristic function of the interval $[a, b]$. Unfortunately, this argument falls apart if $p > 1$. The purpose of this paper is to present an alternative construction for the strong approximation of (1.2) by a normally distributed random variable which is valid for any $p$ (cf. Theorem 1.1). This theorem provides an invaluable tool for analysing the tail-probabilities of stochastic integrals.
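As a quick numerical illustration of (1.2), the following minimal sketch (my own example, not from the paper; all names and parameters are assumptions) takes the deterministic integrand $f_t = (\cos t, \sin t)^*$, for which (1.1) holds with $S = \operatorname{diag}(\tfrac12, \tfrac12)$ whenever $T_0$ is a multiple of $\pi$, and checks that the normalized Euler sums have covariance close to $S$:

```python
# Monte Carlo sketch of (1.2): f_t = (cos t, sin t)^T and T0 a multiple of pi,
# so (1.1) holds with S = diag(1/2, 1/2) and T0^{-1/2} int_0^{T0} f_t dw_t
# should be approximately N(0, S).
import numpy as np

rng = np.random.default_rng(0)
T0 = 20 * np.pi                     # a multiple of pi, so (1.1) holds exactly
n_steps, n_paths = 20_000, 5_000
dt = T0 / n_steps
t = np.arange(n_steps) * dt
f = np.stack([np.cos(t), np.sin(t)])            # shape (2, n_steps)

dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))  # Wiener increments
I = dw @ f.T                                    # int_0^{T0} f_t dw_t per path
Z = I / np.sqrt(T0)                             # normalized as in (1.2)

print(np.cov(Z.T))                              # approximately diag(0.5, 0.5)
```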

First we need a few definitions. The set of real numbers will be denoted by $\mathbb{R}$, and $k$-dimensional Euclidean space by $\mathbb{R}^k$. Let $(x_t)$ be an $\mathbb{R}^k$-valued stochastic process.


Definition 1.1. We say that $(x_t)$, $t \ge 0$, is M-bounded if for all $1 \le q < \infty$,

$$M_q(x) := \sup_{t \ge 0} E^{1/q} |x_t|^q < \infty.$$

The definition extends to discrete time in the obvious way. We shall also write $x_t = O_M(1)$. Similarly, if $(c_t)$ is a family of positive numbers, we write $x_t = O_M(c_t)$ if $x_t/c_t = O_M(1)$.

Let us now consider a family of monotone increasing $\sigma$-algebras $(\mathcal{F}_t)$, $t \ge 0$, and a family of monotone decreasing $\sigma$-algebras $(\mathcal{F}_t^+)$, $t \ge 0$, such that $\mathcal{F}_t$ and $\mathcal{F}_t^+$ are independent for all $t$ and $\mathcal{F}_t^+ = \bigcap_{\epsilon > 0} \sigma\big(\bigcup_{s \ge \epsilon} \mathcal{F}_{t+s}^+\big)$ for all $t$. For $t < 0$ we set $\mathcal{F}_t^+ = \mathcal{F}_0^+$. A typical example is provided by the $\sigma$-algebras

$$\mathcal{F}_t = \sigma\{w_s: s \le t\}, \qquad \mathcal{F}_t^+ = \sigma\{w_s - w_t: s \ge t\}.$$

Definition 1.2. We say that a stochastic process $(x_t)$, $t \ge 0$, is L-mixing with respect to $(\mathcal{F}_t, \mathcal{F}_t^+)$ if it is $(\mathcal{F}_t)$-progressively measurable, M-bounded, and if, setting for $q \ge 1$, $\tau > 0$,

$$\gamma_q(\tau, x) = \gamma_q(\tau) = \sup_{t \ge \tau} E^{1/q} \big| x_t - E(x_t \mid \mathcal{F}_{t-\tau}^+) \big|^q,$$

then we have

$$\Gamma_q = \Gamma_q(x) = \int_0^\infty \gamma_q(\tau)\,d\tau < \infty.$$
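As a worked illustration (my own example, not from the paper), consider the stationary Ornstein-Uhlenbeck process driven by a two-sided Wiener process, $x_t = \int_{-\infty}^{t} e^{-\alpha(t-s)}\,dw_s$ with $\alpha > 0$. With $\mathcal{F}_t^+ = \sigma\{w_s - w_t: s \ge t\}$ we can write, for $\tau > 0$,

$$x_t = e^{-\alpha\tau} x_{t-\tau} + \int_{t-\tau}^{t} e^{-\alpha(t-s)}\,dw_s,$$

where the integral is $\mathcal{F}_{t-\tau}^+$-measurable and $x_{t-\tau}$ is independent of $\mathcal{F}_{t-\tau}^+$ with zero mean, so that $E(x_t \mid \mathcal{F}_{t-\tau}^+)$ equals the integral term. Hence

$$\gamma_q(\tau, x) = e^{-\alpha\tau} M_q(x), \qquad \Gamma_q(x) = M_q(x)/\alpha < \infty,$$

and $(x_t)$ is L-mixing.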

An important technical tool in the theory of L-mixing processes is a moment inequality similar to Burkholder's inequality.

Theorem A. Let $(u_t)$, $t \ge 0$, be an L-mixing process with $Eu_t = 0$ for all $t$. Let $(f_t)$ be a deterministic function in $L_2[0, T]$. Then we have for all $1 \le m < \infty$,

$$E^{1/2m} \left| \int_0^T f_t u_t\,dt \right|^{2m} \le C_m \left( \int_0^T f_t^2\,dt \right)^{1/2} M_{2m}^{1/2}(u)\,\Gamma_{2m}^{1/2}(u),$$

where $C_m$ depends only on $m$. □


This result is obtained by combining an earlier moment inequality (Gerencsér, 1989b, Theorem 1.1) for integrals of the type $\int_0^T f_t u_t\,dt$ with fixed $T$ and a continuous-time extension of a basic technique developed in Móricz (1974) (cf. Gerencsér, 1989b, Theorem 5.1).

In the case of discrete time, let $(\mathcal{F}_n)$, $n \ge 0$, be a family of monotone increasing $\sigma$-algebras, and $(\mathcal{F}_n^+)$, $n \ge 0$, a monotone decreasing family of $\sigma$-algebras. We assume that for all $n \ge 0$, $\mathcal{F}_n$ and $\mathcal{F}_n^+$ are independent. For $n \le 0$ we set $\mathcal{F}_n^+ = \mathcal{F}_0^+$. A typical example is provided by the $\sigma$-algebras

$$\mathcal{F}_n = \sigma\{e_i: i \le n\}, \qquad \mathcal{F}_n^+ = \sigma\{e_i: i > n\},$$

where $(e_i)$ is a sequence of independent, identically distributed random variables.

Definition 1.3. A stochastic process $(x_n)$, $n \ge 0$, is L-mixing with respect to $(\mathcal{F}_n, \mathcal{F}_n^+)$ if it is $\mathcal{F}_n$-progressively measurable, M-bounded and, with $\tau$ a positive integer and $q \ge 1$,

$$\gamma_q(\tau, x) = \sup_{n \ge \tau} E^{1/q} \big| x_n - E(x_n \mid \mathcal{F}_{n-\tau}^+) \big|^q,$$

we have

$$\Gamma_q = \Gamma_q(x) = \sum_{\tau=1}^{\infty} \gamma_q(\tau, x) < \infty.$$

Example. Discrete-time stationary Gaussian ARMA processes are L-mixing. (This can be seen using a state-space representation; the sketch below illustrates the AR(1) special case numerically.)
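The following minimal sketch (illustrative; the setup is my own, not the paper's) checks this for the AR(1) special case $x_n = a x_{n-1} + e_n$ with i.i.d. $\mathcal{N}(0,1)$ innovations. Here $E(x_n \mid \mathcal{F}_{n-\tau}^+) = \sum_{k=0}^{\tau-1} a^k e_{n-k}$, so $\gamma_2(\tau, x) = |a|^\tau/\sqrt{1-a^2}$ decays geometrically and $\Gamma_2(x) < \infty$:

```python
# Check the geometric decay of gamma_2(tau) for the AR(1) process
# x_n = a*x_{n-1} + e_n, a special case of a stationary Gaussian ARMA process.
import numpy as np

rng = np.random.default_rng(1)
a, n, burn = 0.8, 200_000, 1_000
e = rng.normal(size=n + burn)

x = np.zeros(n + burn)                 # simulate the AR(1) recursion
for i in range(1, n + burn):
    x[i] = a * x[i - 1] + e[i]
x, e = x[burn:], e[burn:]

for tau in (1, 2, 4, 8):
    # conditional expectation given F^+_{n-tau}: the last tau innovations
    x_hat = sum((a ** k) * np.roll(e, k) for k in range(tau))
    resid = (x - x_hat)[tau:]          # drop np.roll wrap-around at the edge
    mc = np.sqrt(np.mean(resid ** 2))
    exact = abs(a) ** tau / np.sqrt(1 - a ** 2)
    print(f"tau={tau}: Monte Carlo gamma_2 ~ {mc:.4f}, exact {exact:.4f}")
```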

A discrete-time version of Theorem A (which can easily be derived from it) is the following:

Theorem B. Let $(u_n)$, $n \ge 0$, be an L-mixing process with $Eu_n = 0$ for all $n$. Let $(f_n)$ be a deterministic sequence. Then we have for all $1 \le m < \infty$,

$$E^{1/2m} \left| \sum_{n=1}^{N} f_n u_n \right|^{2m} \le C_m \left( \sum_{n=1}^{N} f_n^2 \right)^{1/2} M_{2m}^{1/2}(u)\,\Gamma_{2m}^{1/2}(u),$$

where $C_m$ depends only on $m$. □ In particular, taking $f_n \equiv 1$ gives $\sum_{n=1}^{N} u_n = O_M(N^{1/2})$, a bound that will be used repeatedly below.


The main result of the paper is the following theorem:

Theorem 1.1. Let $(f_t)$ be a $p$-dimensional L-mixing process with respect to $(\mathcal{F}_t, \mathcal{F}_t^+)$ such that

$$E f_t f_t^* = S \tag{1.3}$$

for all $t \ge 0$. Then for any $\epsilon > 0$,

$$I_T = \int_0^T f_t\,dw_t = \xi_T + O_M\big(T^{1/4+\epsilon}\big), \tag{1.4}$$

where $\xi_T \sim \mathcal{N}(0, TS)$.

This theorem was instrumental in deriving a characterization of $\int_0^t f_s\,dw_s$ as a stochastic process, given in the following theorem.

Theorem C (Gerencsér, 1991). Let $(f_t)$ be as above and assume that for any $m \ge 1$ and $q \ge 1$ we have $\gamma_q(\tau, f) = O(\tau^{-m})$. Further assume that the underlying probability space is sufficiently rich. Then for any $\epsilon > 0$ we have

$$I_T = \int_0^T f_t\,dw_t = \overline{W}_T + O_M\big(T^{2/5+\epsilon}\big), \tag{1.5}$$

where $(\overline{W}_t)$ is a $p$-dimensional Wiener process with covariance matrix $S$. □

The interesting point of this theorem is that a scalar-valued Wiener process can approximately be 'disintegrated' into a vector-valued Wiener process using stochastic integration.

Theorem 1.1 is a minor extension of Theorem 1.2 below. It is this discrete-time version that we shall actually prove, since it is the version that has been used directly in system identification (cf. Gerencsér, 1989a). The additional step needed in continuous time will be explained at the end of the next section.

Let $(e_n)$ be a real-valued Gaussian white noise, i.e. an independent sequence of $\mathcal{N}(0, 1)$ random variables. Set

$$\mathcal{F}_n = \sigma\{e_i: i \le n\}, \qquad \mathcal{F}_n^+ = \sigma\{e_i: i \ge n+1\}.$$

With this notation we have the following theorem.

Theorem 1.2. Let $(f_n)$, $n \ge 0$, be a $p$-dimensional predictable, i.e. $\mathcal{F}_{n-1}$-measurable, L-mixing process with respect to $(\mathcal{F}_n, \mathcal{F}_n^+)$ such that

$$E f_n f_n^* = S \tag{1.6}$$

for all $n \ge 0$. Then for any $\epsilon > 0$,

$$I_N = \sum_{n=1}^{N} f_n e_n = \xi_N + O_M\big(N^{1/4+\epsilon}\big), \tag{1.7}$$

where $\xi_N \sim \mathcal{N}(0, NS)$. More exactly, the right-hand side of (1.7) can be written as $\xi_N + r_N$, where for $r_N$ the following estimate holds: for any $q \ge 1$ we have $E^{1/q}|r_N|^q \le C N^{1/4+\epsilon}$, where $C$ is an a priori constant in the sense that $C$ depends on $q$, $p$ and on $M_{2q}(f^i)$ and $\Gamma_{2q}(f^i)$, $i = 1, \ldots, p$, but does not depend on the specific form of $(f_n)$.

Theorem 1.2 was first proved in a weaker form in Gerencsér (1985) and later improved in Gerencsér (1989a). In the latter paper the projections of $I_N$ onto various directions were approximated by a normally distributed random variable. That result was used to settle a conjecture of Rissanen on the tail-probabilities of the estimators of ARMA parameters in the Gaussian case (cf. Rissanen, 1986; and also Rissanen, 1989). Theorem 1.2 provides a good estimate for the tail probabilities of $I_N$; namely, having Theorem 1.2 we can easily derive the crucial inequality: for any $C > 0$,

$$\sum_{N=1}^{\infty} P\big( |N^{-1/2} I_N| > C \log N \big) < \infty.$$
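One way to fill in this step (a sketch of mine, not the paper's computation verbatim): write $N^{-1/2}I_N = N^{-1/2}\xi_N + N^{-1/2}r_N$ with $\epsilon < \tfrac14$ fixed. Since $N^{-1/2}\xi_N \sim \mathcal{N}(0, S)$, Gaussian tail bounds give

$$P\big( |N^{-1/2}\xi_N| > \tfrac{C}{2}\log N \big) \le c_1 \exp\big( -c_2 \log^2 N \big),$$

which is summable in $N$; for the remainder, Markov's inequality and the moment bound of Theorem 1.2 give

$$P\big( |N^{-1/2}r_N| > \tfrac{C}{2}\log N \big) \le \frac{E|r_N|^q}{\big(\tfrac{C}{2}N^{1/2}\log N\big)^q} \le C'\,\frac{N^{-q(1/4-\epsilon)}}{(\log N)^q},$$

which is summable as soon as $q(\tfrac14 - \epsilon) > 1$.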

It should be noted that general martingale limit theorems, such as those given in Hall and Heyde (1980) or in the more recent surveys of Bolthausen (1982) and Haeusler (1988), are not suitable for settling Rissanen's conjecture, since they do not provide good estimates of the tail-probabilities of martingales.

2. The proofs

The proof of Theorem 1.2 is based on the following well-known fact: if $f_n$ is predictable and

$$\sum_{n=1}^{N} f_n f_n^* = NS \quad \text{a.s.} \tag{2.1}$$


with some constant matrix $S \in \mathbb{R}^{p \times p}$, then

$$\bar I_N = \sum_{n=1}^{N} f_n e_n \sim \mathcal{N}(0, NS).$$

The proof of this proposition can be carried out by a characteristic function argument.
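A sketch of that argument (my reconstruction, modulo integrability details): since $f_n$ is predictable and $e_n \sim \mathcal{N}(0,1)$ is independent of $\mathcal{F}_{n-1}$, we have $E\big(e^{iu^* f_n e_n} \mid \mathcal{F}_{n-1}\big) = e^{-\frac12 u^* f_n f_n^* u}$ for every $u \in \mathbb{R}^p$. Hence

$$M_m = \exp\Big( iu^* \sum_{n=1}^{m} f_n e_n + \tfrac12\, u^* \Big( \sum_{n=1}^{m} f_n f_n^* \Big) u \Big)$$

is a martingale with $EM_N = 1$; by (2.1), $M_N = \exp\big( iu^* \bar I_N + \tfrac{N}{2} u^* S u \big)$, and therefore $E\,e^{iu^* \bar I_N} = e^{-\frac{N}{2} u^* S u}$, the characteristic function of $\mathcal{N}(0, NS)$.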

We can reduce the general case to the case $S = I$ in two steps: first we show that $S$ can be assumed to be nonsingular, then by a proper scaling we get $S = I$. Indeed, let $T$ be an orthogonal transformation such that

$$TST^* = D = \operatorname{diag}(\lambda_i), \quad i = 1, \ldots, p.$$

Assume that $\lambda_1, \ldots, \lambda_r > 0$ and $\lambda_{r+1} = \cdots = \lambda_p = 0$, and split $D$ and $T$ as follows:

$$D = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad T = \begin{pmatrix} T_1 \\ T_2 \end{pmatrix},$$

where $D_1$ is $r \times r$ and $T_1$ is $r \times p$. Then the covariance matrix of the process $(Tf_n)$ will be $D$; hence we have

$$E\, T_2 f_n f_n^* T_2^* = 0,$$

and this implies $T_2 f_n = 0$ a.s. Thus if we can prove the theorem for the process $T_1 f_n$ replacing $f_n$, the general case will follow. Now if $D$ is nonsingular, then we can replace $f_n$ by $D^{-1/2} T f_n$, and $S$ will be replaced by $I$.
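A minimal numerical sketch of this reduction (function and variable names are mine, not the paper's):

```python
# Reduction S -> I: diagonalize S = T* D T with T orthogonal, drop the zero
# eigenvalues, and rescale by D1^{-1/2}, as in the argument above.
import numpy as np

def whiten(f, S, tol=1e-12):
    """Map each column f_n to D1^{-1/2} T1 f_n, making its covariance I_r."""
    lam, V = np.linalg.eigh(S)          # S = V diag(lam) V*
    keep = lam > tol                    # indices i with lambda_i > 0
    T1 = V[:, keep].T                   # r x p block of the orthogonal T
    return np.diag(1.0 / np.sqrt(lam[keep])) @ T1 @ f

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 2))
S = A @ A.T                             # a singular (rank-2) 3x3 covariance
f = A @ rng.normal(size=(2, 100_000))   # E f f* = S
print(np.cov(whiten(f, S)))             # approximately the 2x2 identity
```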

Let N’ = N - [N1/2+E], E > 0, and define a predictable stopping time T by

i

I ITi 7= min m: sup 1 c fif: - 4, (

lgm<N’-1 n=l ;,/=I ,....P

> N1/2+E’ \

where 0 < E’ < E and set r = N’ if no m indicated above exists. We define a new predictable process f, such that j, = f, for n < r and

$$\sum_{n=1}^{N} \tilde f_n \tilde f_n^* = NI. \tag{2.2}$$

The last condition is equivalent to the condition

$$\sum_{n=1}^{\tau-1} f_n^i f_n^j + \sum_{n=\tau}^{N} \tilde f_n^i \tilde f_n^j = \delta_{ij} N,$$

from which we get, after rearrangement,

$$\sum_{n=\tau}^{N} \tilde f_n^i \tilde f_n^j = \delta_{ij}(N - \tau + 1) - \sum_{n=1}^{\tau-1} \big( f_n^i f_n^j - \delta_{ij} \big) =: R^{ij}. \tag{2.3}$$

By the definition of $\tau$ we get for $i = j$ the estimates

$$R^{ii} \ge [N^{1/2+\epsilon}] - N^{1/2+\epsilon'} \ge \tfrac12 N^{1/2+\epsilon}$$

for large enough $N$. On the other hand, for $i \ne j$ we have

$$|R^{ij}| \le N^{1/2+\epsilon'}.$$

It follows that for sufficiently large $N$ the matrix $(R^{ij})$ is diagonally dominant (since $\epsilon' < \epsilon$), and thus $R = (R^{ij})$, being symmetric with positive diagonal entries, is positive definite. Equation (2.3) can be considered as an equation for $\tilde f_n$, and since $R$ is positive definite it can be 'solved' for $\tilde f_n$. Also, it is easy to see that $R$ is $\mathcal{F}_{\tau-1}$-measurable, i.e. for any Borel set $A \subset \mathbb{R}^{p \times p}$ and $1 \le n \le N$ the set $\{R \in A,\ \tau \le n\}$ is $\mathcal{F}_{n-1}$-measurable.

To ensure proper measurability of $\tilde f_n$ we give a constructive solution. Let $G$ denote the unique positive definite square root of $R$ and let $g^i = (g_1^i, \ldots, g_p^i)$ denote the $i$th row of $G$. Then define

$$\tilde f_{\tau+r-1} = g^r \quad \text{for } r = 1, \ldots, p, \qquad \tilde f_{\tau+r-1} = 0 \quad \text{for } r > p.$$

Obviously $\tilde f_n$ satisfies (2.3), since $\sum_{r=1}^{p} g^r (g^r)^* = G^* G = G^2 = R$, and thus it also satisfies the equivalent condition (2.2). The construction given above can also be represented in the following more abstract form, which we shall need later: for $n \ge \tau$ we have $\tilde f_n = \phi(n - \tau, R)$, where $\phi$ is a well-defined deterministic function of the scalar $n - \tau$ and of the matrix $R$.
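A minimal numerical sketch of this constructive step (illustrative names, not the paper's code):

```python
# Given the positive definite matrix R from (2.3), take its unique positive
# definite square root G = R^{1/2}; the rows g^1, ..., g^p of G then satisfy
# sum_r g^r (g^r)* = G*G = G^2 = R, which is exactly condition (2.3).
import numpy as np

def continuation_rows(R):
    lam, V = np.linalg.eigh(R)                  # R = V diag(lam) V*, lam > 0
    G = V @ np.diag(np.sqrt(lam)) @ V.T         # unique p.d. square root of R
    return [G[i] for i in range(G.shape[0])]    # g^i = i-th row of G

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
R = B @ B.T + 4 * np.eye(4)                     # a stand-in positive definite R
rows = continuation_rows(R)
print(np.allclose(sum(np.outer(g, g) for g in rows), R))   # True
```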

Now we show that $\tilde f_n$ is predictable. Indeed, for any $1 \le n \le N$ and any Borel set $A$ in $\mathbb{R}^p$ we have

$$\{\tilde f_n \in A\} = \Omega_1 \cup \Omega_2,$$


where

$$\Omega_1 = \{\tilde f_n \in A,\ \tau > n\}, \qquad \Omega_2 = \{\tilde f_n \in A,\ \tau \le n\}.$$

The definition of $\tau$ and the predictability of $f_n$ imply that the events $\{\tau \le n\}$ and $\{\tau > n\}$ are $\mathcal{F}_{n-1}$-measurable. For $\Omega_1$ we have the alternative form

$$\Omega_1 = \{f_n \in A\} \cap \{\tau > n\},$$

which is clearly $\mathcal{F}_{n-1}$-measurable. To complete the proof it is sufficient to show that $\Omega_2$ is $\mathcal{F}_{n-1}$-measurable. But we have $\tilde f_n = \phi(n - \tau, R)$ for $n \ge \tau$, where $\phi$ is deterministic; hence, by the measurability property of $R$ stated above, the set $\{\phi(n - \tau, R) \in A\} \cap \{\tau \le n\}$ is $\mathcal{F}_{n-1}$-measurable. This proves that $\tilde f_n$ is predictable.

Thus we get that

$$\tilde I_N = \sum_{n=1}^{N} \tilde f_n e_n \sim \mathcal{N}(0, NI).$$

Let us now compare $I_N$ and $\tilde I_N$. We have

$$I_N - \tilde I_N = \sum_{n=1}^{N} f_n e_n - \sum_{n=1}^{N} \tilde f_n e_n = \sum_{n=\tau}^{N} f_n e_n - \sum_{n=\tau}^{N} \tilde f_n e_n.$$

We shall first estimate the moments of $\sum_{n=\tau}^{N} f_n e_n$. First consider the event $\{\tau < N'\}$. Since $f_n^i f_n^j - \delta_{ij}$ is a (matrix-valued) zero-mean L-mixing process, we have by Theorem B that

$$\sup_{1 \le m \le N'-1} \Big| \sum_{n=1}^{m} \big( f_n^i f_n^j - \delta_{ij} \big) \Big| = O_M\big(N^{1/2}\big).$$

More exactly, we have for any $q \ge 2$,

$$E^{1/q} \sup_{1 \le m \le N'-1} \Big| \sum_{n=1}^{m} \big( f_n^i f_n^j - \delta_{ij} \big) \Big|^q \le C_q N^{1/2} M_q^{1/2}(h^{ij})\,\Gamma_q^{1/2}(h^{ij}), \tag{2.4}$$

where $h_n^{ij} = f_n^i f_n^j - \delta_{ij}$. Using the trivial inequality $E^{1/q}|\xi - E\xi|^q \le 2E^{1/q}|\xi|^q$ we get that

$$M_q(h^{ij}) \le 2M_q(f^i f^j) \le 2M_{2q}(f^i)\,M_{2q}(f^j). \tag{2.5}$$

On the other hand, the definition of $\Gamma_q(h^{ij})$ immediately yields

$$\Gamma_q(h^{ij}) = \Gamma_q(f^i f^j) \le M_{2q}(f^i)\,\Gamma_{2q}(f^j) + \Gamma_{2q}(f^i)\,M_{2q}(f^j). \tag{2.6}$$

Now if we take the supremum over $(i, j)$ (in addition to the supremum over $m$) in (2.4), then we get

$$E^{1/q} \sup_{\substack{1 \le m \le N'-1 \\ i,j=1,\ldots,p}} \Big| \sum_{n=1}^{m} \big( f_n^i f_n^j - \delta_{ij} \big) \Big|^q \le C_q N^{1/2} \sum_{i,j=1}^{p} M_q^{1/2}(h^{ij})\,\Gamma_q^{1/2}(h^{ij}),$$

the last inequality being a consequence of the Cauchy-Schwarz inequality; by (2.5) and (2.6) the right-hand side can be bounded in terms of $M_{2q}(f^i)$ and $\Gamma_{2q}(f^i)$ alone. Thus finally we have for $q \ge 2$,

$$E^{1/q} \sup_{\substack{1 \le m \le N'-1 \\ i,j=1,\ldots,p}} \Big| \sum_{n=1}^{m} \big( f_n^i f_n^j - \delta_{ij} \big) \Big|^q \le C N^{1/2},$$

where $C$ is an 'a priori constant' in the sense that its value is completely determined by $q$, $p$, $M_{2q}(f^i)$, $\Gamma_{2q}(f^i)$, $i = 1, \ldots, p$, and it is independent of the specific form of the $f_n$'s.

Hence by Markov's inequality we get

$$P(\tau < N') = P\Big( \sup_{\substack{1 \le m \le N'-1 \\ i,j=1,\ldots,p}} \Big| \sum_{n=1}^{m} \big( f_n^i f_n^j - \delta_{ij} \big) \Big| > N^{1/2+\epsilon'} \Big) \le C N^{-r} \tag{2.7}$$

for any $r > 0$, where $C$ is an a priori constant. Equivalently, we can write $\chi_{\tau < N'} = O_M(N^{-r})$ for any $r > 0$, and this is an a priori estimate.

By Burkholder's inequality we have for $q \ge 1$,

$$E^{1/2q} \Big| \sum_{n=\tau}^{N} \chi_{\tau<N'}\, f_n e_n \Big|^{2q} \le C_q\, E^{1/2q} \Big( \sum_{n=\tau}^{N} \chi_{\tau<N'}\, |f_n|^2 \Big)^{q}. \tag{2.8}$$


Using the Cauchy-Schwarz inequality for the last term in (2.8) we get that

$$\sum_{n=\tau}^{N} \chi_{\tau<N'}\, f_n e_n = O_M\big(N^{-r}\big)$$

for any $r > 0$, and this is an a priori estimate. On the set $\{\tau = N'\}$ we use the estimate

$$E^{1/2q} \Big| \sum_{n=N'}^{N} f_n e_n \Big|^{2q} = O\big( (N - N')^{1/2} \big) = O\big( N^{1/4+\epsilon/2} \big),$$

which follows from Burkholder's inequality. Thus we finally get the a priori estimate

$$\sum_{n=\tau}^{N} f_n e_n = O_M\big( N^{1/4+\epsilon/2} \big). \tag{2.9}$$
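To spell out the Cauchy-Schwarz step above (my reconstruction of the omitted computation): by M-boundedness, $E\big(\sum_{n=1}^{N}|f_n|^2\big)^{2q} \le N^{2q} M_{4q}^{4q}(f)$, hence

$$E\Big[ \chi_{\tau<N'} \Big( \sum_{n=1}^{N} |f_n|^2 \Big)^{q} \Big] \le P(\tau < N')^{1/2}\, \Big( E\Big( \sum_{n=1}^{N} |f_n|^2 \Big)^{2q} \Big)^{1/2} \le \big( C N^{-r'} \big)^{1/2}\, N^{q} M_{4q}^{2q}(f),$$

and since $r'$ in (2.7) can be taken arbitrarily large, combining this with (2.8) yields the $O_M(N^{-r})$ bound above.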

We show that a similar estimate holds when $f_n$ is replaced by $\tilde f_n$. On the set $\{\tau < N'\}$ we apply Burkholder's inequality as in the first inequality of (2.8), use condition (2.2), and then use the inequality (2.7) to get the a priori estimate

$$\sum_{n=\tau}^{N} \chi_{\tau<N'}\, \tilde f_n e_n = O_M\big(N^{-r}\big)$$

for any $r > 0$.

On the set { r = N’ } we proceed similarly: we apply Burkholder’s inequality and use the fact that on {r=N’} we have

“=7 i=l n=7 i=l

GP(N 1/2+e + N1/2+E’

>

by (2.3) and the definition of 7, thus giving the a priori estimate for q 2 1,

E'/2q g xr=N,fnen

2q

< cNl/4+~/2

n=N’

Thus we get that (2.9) holds with $\tilde f_n$ replacing $f_n$, and finally we get the a priori estimate

$$I_N = \tilde I_N + O_M\big( N^{1/4+\epsilon} \big),$$

which proves the theorem. □

where T’ = T - T1/2+E and 0 < e’ < E. If the set of s’s indicated above is empty, then we set r = T’.

The continuation of (f,) denoted as (f,) for t > T can be defined as a piecewise constant function: f:+, = g: for r - 1 < s < r, where gi is the same as in the discrete time case, and we set f‘+, = 0 for s > p. We can also write for t 2 7: x = +([t - r] +

1, R). The rest of the proof is identical with that of the discrete time case. q

Acknowledgements

The author wishes to thank Jimmy Baikovicius, Karim Nassiri-Toussi and Zsuzsanna Vágó for their careful reading of the manuscript, P. Révész for providing two references, Mindle Levitt and Solomon Seifu for their considerable amount of work in the preparation of this paper, and an anonymous referee for providing many useful remarks.

References

Bolthausen, E. (1982), Exact convergence rates in some martingale central limit theorems, Ann. Probab. 10, 672-688.

Gerencsér, L. (1985), On the normal approximation of the maximum-likelihood estimator of ARMA parameters, WP49, Comput. Autom. Inst. Hungarian Acad. Sci. (Budapest).

Gerencsér, L. (1989a), Verification of Rissanen's tail-condition for the parameter estimator of Gaussian ARMA processes, TR-CIM-89-6, McGill Research Center for Intelligent Machines (Montreal).

Gerencsér, L. (1989b), On a class of mixing processes, Stochastics 26, 165-191.

Gerencsér, L. (1991), Strong approximation theorems for estimator processes in continuous time, in: I. Berkes, E. Csáki and P. Révész, eds., Limit Theorems (North-Holland, Amsterdam), to appear.

Gerencsér, L. and J. Rissanen (1986), A prediction bound for Gaussian ARMA processes, Proc. of the 25th CDC (Athens) 3, 1487-1490.

Hall, P. and C.C. Heyde (1980), Martingale Limit Theory and its Application (Academic Press, New York).

Haeusler, E. (1988), On the rate of convergence in the central limit theorem for martingales with discrete and continuous time, Ann. Probab. 16, 275-299.

Kutoyants, Yu.A. (1975), On a problem of testing hypotheses and of the asymptotic normality of stochastic integrals, Theory Probab. Appl. 20, 385-393. [In Russian.]

Kutoyants, Yu.A. (1980), Estimation of Stochastic Processes (Armenian Academy of Sciences, Yerevan; English translation: Heldermann, Berlin, 1984).

Móricz, F. (1974), Moment inequalities and the strong laws of large numbers, Z. Wahrsch. Verw. Gebiete 35, 299-314.

Rissanen, J. (1986), Stochastic complexity and predictive modeling, Ann. Statist. 14, 1080-1100.

Rissanen, J. (1989), Stochastic Complexity in Statistical Inquiry (World Scientific, Singapore).
