A Central Limit Theorem for Local Martingales with Applications to the
Analysis of Longitudinal Data
S. A. MURPHY
September 20, 1996
Department of Statistics
Pennsylvania State University
SUMMARY
A functional central limit theorem for a local square integrable martingale with persistent discontinuities is given. By persistent discontinuities, it is meant that the martingale has jumps which do not vanish asymptotically. This central limit theorem is motivated by problems in the analysis of longitudinal and life history data.
Running Headline: A Central Limit Theorem for Martingales
Key words: Longitudinal Data, Event History Analysis, Non-Classical Central Limit Theorem, Martingale
Research supported by NSF grant DMS-9307255 and partially carried out during the author's visit
with the Econometrics Dept., Free University, Amsterdam.
1. INTRODUCTION
Apart from the work by Gill (1982) and the paper by Liptser and Shiryaev (1983), later reformulated in the book by Jacod and Shiryaev (1987), very little recent work has been done on non-classical central limit theorems. The central limit theorem given here is based on the latter two works, but the conditions given are amenable to applications in life history/longitudinal data analysis. Life history/longitudinal data typically involve observation of entities or individuals over a period of time. Even though this type of data may be thought of as the observation of stochastic processes, the statistical analysis is quite different. The analysis of life history data is based on the observation of several or many stochastic processes, each over a short time period, instead of observation of very few (or one) stochastic processes over a long period of time. Therefore the asymptotics given here will be for the number of individuals/processes increasing without bound.
Both longitudinal and life history data can be expressed as observations of marked point processes (Arjas and Haara, 1992). The event times of the point process are the times at which one collects information on the individuals, and the marks are the information collected. Asymptotic results for estimators and test statistics are based on a central limit theorem for an estimating equation. The estimating equation may be based on the derivative of the log of the full or partial likelihood. This derivative forms a local square integrable martingale under integrability conditions (Andersen et al., 1993). More generally, estimating equations can be constructed by parametrizing aspects of the conditional distribution of the information collected at a time point given the past. These estimating equations are integrals with respect to the marked point process and under integrability conditions form locally square integrable martingales (Murphy and Li, 1993). Central limit theorems (Rebolledo; see Andersen et al., 1993) for continuous time martingales assume that the intensity of the jumps of the martingale is (asymptotically) continuous. However, it is easy to envision the situation in which one plans to make measurements on each individual at regular intervals (e.g. every 3 months) but some individuals appear earlier or later for measurements. The time at which the individual appears could depend on past history; i.e., an appointment for a sicker patient may be scheduled earlier due to health concerns (doctor's care or patient self-selection; Grüger, Kay and Schumacher, 1991). The asymptotic analysis should allow for both measurements taken at random times and clumping of measurements (at the 3 month intervals). Additionally, individuals may be lost to follow-up or censored. The theorem presented in the next section will also allow for dependence between individuals which is due to the censoring mechanism and time dependent covariates.
The first theorem is given for a local square integrable martingale. Next this theorem is specialized to a central limit theorem for integrals with respect to a marked point process. Lastly, motivating applications are discussed. All of the proofs are in the appendix.
2. A CENTRAL LIMIT THEOREM
The first theorem is the most general given here and is for a d-dimensional local square integrable martingale, M_n with M_n(0) = 0, defined on a stochastic basis (Ω^n, F^n, {F^n_t}_{t∈R_+}, P_n). Associated with M_n is a marked point process which counts the jumps of M_n, ΔM_n, and records the sizes of the jumps as follows:

μ_n(dx, dt) = Σ_{s: ΔM_n(s)≠0} δ_{(ΔM_n(s), s)}(dx, dt)   (x ∈ R^d),

where δ_u is a probability measure giving mass 1 to the point u. The marked point process has a predictable compensator, ν_n(dx, dt). For the precise definition of the predictable sigma field, F^n_p, and other terminology see Jacod and Shiryaev (Chapter 2, 1987). Using this marked point process one can decompose M_n into a continuous local square integrable martingale, M^c_n, plus the compensated jumps:

M_n(·) = M^c_n(·) + ∫_0^· ∫ x (μ_n(dx, dt) − ν_n(dx, dt)).

It is always possible to write ν_n as ν_n(dx, dt) = K_n(dx, t) λ_n(dt), where K_n(dx, s) is a transition function from (R_+ × Ω^n, F^n_p) to (R^d, B(R^d)) and λ_n is a predictable nondecreasing process. Let J_n be a subset of the discontinuities of λ_n. These will be the persistent jumps which will contribute to the fixed jumps of the limiting Gaussian process. The jumps of λ_n which are not contained in J_n will be assumed to be asymptotically negligible.
The accumulation of information necessary for asymptotics on the continuous part (and the asymptotically negligible jumps) of M_n is formed by summing, over ever smaller intervals in time, uncorrelated increments of M_n. Liptser and Shiryaev avoid the details of how one might accumulate information on the persistent jumps by requiring that the conditional distribution (given the past) of the jump sizes approach a normal distribution sufficiently fast (condition R in Liptser and Shiryaev, 1983).
In applications, it is necessary to give some thought to how one would accumulate information at the persistent jumps. In the discussion following Corollary 1, the jump sizes correspond to the derivative of a log density. The density is parametrized by a regression of the response observed at the jump time on covariates. In regression one often assumes that, given the covariates, the responses are independent. Denoting the covariates by the variable ζ, a general assumption, which mimics the above regression assumption, is that the conditional distribution of the size of a jump given that the jump occurred and the past is the mixture of a convolution; i.e., for t ∈ J_n, assume that

K_n(dx, t) = ∫ I{x ≠ 0} (*_i F_{n,i})(dx, ζ, t) G_n(dζ, t).   (1)

The above product (*_i F_{n,i}) denotes the convolution of probability transition functions, F_{n,i}'s, each on R^d. For ζ of dimension n × p, G_n is a probability transition function from (R_+ × Ω^n, F^n_p) to (R^{np}, B(R^{np})) and F_{n,i} is a probability transition function from (R_+ × Ω^n × R^{np}, F^n_p ∨ B(R^{np})) to (R^d, B(R^d)).
Note that for t a discontinuity point of λ_n, the assumption of M_n(0) = 0 implies that ∫ x K_n(dx, t) = 0 a.e. P_n. Make the further assumption that

∫ x (*_i F_{n,i})(dx, ζ, t) = 0   (2)

a.e. (G_n(dζ, t) dP_n). Define σ²_n(t) = ∫ x xᵀ ν_n(dx, {t}) and ⟨M^c_n⟩ to be the predictable variation matrix of M^c_n. Without loss of generality, each component of the vector M_n is assumed to belong to the space of right continuous, left hand-limited functions on [0, ∞), denoted by D[0, ∞). Let M be a Gaussian martingale on D[0, ∞)^d with E(M Mᵀ) = Γ.
Theorem 1

For each t ∈ D, D a subset of R_+, consider the following assumptions.

1) Asymptotic negligibility.

a) For all ε > 0,

∫_0^t I{s ∉ J_n} ∫ x xᵀ I{||x|| > ε} ν_n(dx, ds) →^P 0.

b) For all ε > 0,

Σ_{s∈J_n, s≤t} ∫ Σ_i ∫ x xᵀ I{||x|| > ε} F_{n,i}(dx, ζ, s) G_n(dζ, s) λ_n({s}) →^P 0.

2) Convergence of the variance.

a)

⟨M^c_n⟩(t) + ∫_0^t ∫ x xᵀ ν_n(dx, ds) →^P Γ(t).

b) For all l, j,

Σ_{s∈J_n, s≤t} ∫ |σ²_n(s)_{lj} − Σ_i ∫ x_l x_j F_{n,i}(dx, ζ, s)| G_n(dζ, s) λ_n({s}) →^P 0,

and

Σ_{s∈J_n, s≤t} |λ_n({s}) − 1| →^P 0.   (*)

3) Tightness in D[0, ∞).

Σ_{s≤t} (Σ_{j=1}^d σ²_n(s)_{jj})² →^P Σ_{s≤t} (Σ_{j=1}^d ΔΓ(s)_{jj})².

Under conditions 1) and 2), the finite dimensional distributions of M_n in D converge to the finite dimensional distributions of M. If D is dense on the real line then the additional assumption of 3) implies that M_n converges weakly with respect to the Skorohod metric on D[0, ∞) to M.
Remarks:

(1) Intuitively, the joint conditional distribution of a jump at a time t ∈ J_n and the jump size is

∫_ζ (*_i F_{n,i})(dx, ζ, t) G_n(dζ, t) λ_n({t}) + δ_0(dx)(1 − λ_n({t})).   (3)

The assumption 2b) is present because the distribution (3) is, for fixed time t, a mixture of a convolution. If ∫_ζ (*_i F_{n,i})(dx, ζ, t) G_n(dζ, t) is a convolution then the first part of 2b) is implied by the second part of 2b), assumption (*) on the persistent jumps. Furthermore, if for fixed t, (3) can be written as a convolution then assumption 2b) need not be made.

(2) In applications the above result may be more useful when the weak convergence is in supremum norm on the space of bounded functions on a compact interval, l∞([0, τ]). The space l∞([0, τ]) is discussed by van der Vaart and Wellner (1993). This can be achieved by employing a different version of 3) above. The more restrictive version essentially requires a priori knowledge of the locations of the persistent jumps (points of discontinuity of Γ). Replace 3) by:
3′) Tightness in l∞([0, τ]). Suppose D = [0, τ] and in addition, for each s a discontinuity point of Γ and each j, we have

σ²_n(s)_{jj} →^P ΔΓ(s)_{jj}.

Then there exists a version of M which is a tight Borel measurable Gaussian process in l∞([0, τ]) and M_n converges weakly to M.
(3) The Gaussian martingale M possesses the following properties. For a fixed countable set of times t, Γ(t) − Γ(t−) = ΔΓ(t) will be nonzero, symmetric and nonnegative definite. These are the fixed times of discontinuity of M. Outside of these fixed times of discontinuity, almost all paths of M are continuous. Additionally, ΔM(t) = M(t) − M(t−) is multivariate normal with mean zero and variance-covariance matrix ΔΓ(t). The covariance of M_k(t) and M_l(s), for t less than s, is the entry in the kth row, lth column of Γ(t). For more properties of a Gaussian martingale see Jacod and Shiryaev (pg. 111, 1987).
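The role of condition (*) and of the fixed jump ΔΓ can be illustrated with a small simulation. This is a hypothetical sketch, not from the paper: every individual is measured at a scheduled time s = 1, so the jump probability λ_n({1}) equals 1, and each contributes a mean-zero response with variance 2; the normalized jump ΔM_n(1) = n^{−1/2} Σ_i ε_i then has variance ΔΓ(1) = 2 and is approximately normal, a discontinuity that persists in the limit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 4000
var_eps = 2.0  # variance of each individual's response at the scheduled time

# Each replication: n individuals all appear at the scheduled time s = 1
# (condition (*): the conditional jump probability is 1) and contribute
# mean-zero responses; the martingale's jump is the normalized sum.
a = np.sqrt(3.0 * var_eps)  # Uniform(-a, a) has variance a^2 / 3 = var_eps
eps = rng.uniform(-a, a, size=(reps, n))
jumps = eps.sum(axis=1) / np.sqrt(n)

# The empirical jump variance should be close to DeltaGamma(1) = 2, and the
# jump distribution approximately normal even though each summand is uniform.
print(jumps.var())
```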
Now specialize to local square integrable martingales which can be expressed as integrals with respect to a nonexplosive marked point process. The marked point process, N_n, is defined on the stochastic basis (Ω^n, F^n, {F^n_t}_{t∈R_+}, P_n) and is a composite of n marked point processes, so that the event times of N_n, say {T_j}_{j≥1}, are the ordered event times of the individual marked point processes. Part of the mark at each T_j is an indicator of which of the individual marked point processes jumped at that time. Denote the mark at time T_j by (X_j, Z_j, δ_j). In applications, X_j is the matrix of responses and Z_j is the matrix of covariates collected at time T_j. So the mark is a matrix of real components and it has row dimension n; if the ith marked point process has a jump at T_j, δ_{ji} is set to one and the ith row of (X_j, Z_j) corresponds to the mark of the ith process. Otherwise, the ith row of (X_j, Z_j) is set to the empty set and δ_{ji} is set to zero. Denote the compensator of N_n by ν_n. If the filtration F is the internal filtration, then ν_n, written as a measure, is given by

ν_n(dx, dz, dδ, ds) = P[X_j ∈ dx, Z_j ∈ dz, δ_j ∈ dδ, T_j ∈ ds | T_j ≥ s, F_{T_{j−1}}] on T_{j−1} < s ≤ T_j, j ≥ 1,

where ds denotes the interval [s, s + ds). For {H_i}_{i≥1}, each a d-dimensional deterministic function, and {γ^i_s}_{i≥1} predictable, consider

M_n(·) = ∫_0^· ∫ n^{−1/2} Σ_{i=1}^n H_i(x_i, z_i, γ^i_s, s) δ_i N_n(dx, dz, dδ, ds).

To ensure that M_n is a local square integrable martingale, assume that the compensator of M_n is zero and that ∫_0^t ∫ (Σ_{i=1}^n H_i(x_i, z_i, γ^i_s, s) δ_i)² ν_n(dx, dz, dδ, ds) is locally integrable (Jacod and Shiryaev, pg. 73, 1987). After the corollary is a discussion of how equations of the form of M_n arise as estimating equations in survival analysis and in the analysis of life history data.
The marked point process of the jumps of M_n is

μ_n(du, dt) = ∫ δ_{{n^{−1/2} Σ_i H_i(x_i, z_i, γ^i_s, s) δ_i}}(du) I{n^{−1/2} Σ_i H_i(x_i, z_i, γ^i_s, s) δ_i ≠ 0} N_n(dx, dz, dδ, ds).

Of course the compensator of μ_n follows the same formula as the above but with ν_n in place of N_n. Assume (1) with ζ = (z, δ) and

F′_{n,i}(dv, ζ, s) = ∫ I{n^{−1/2} H_i(x, z_i, γ^i_s, s) δ_i ∈ dv} F_{n,i}(dx, z_i, s).

The distribution function F_{n,i} is allowed to be a function of the past; that is, F_{n,i} is a probability transition function from (R_+ × Ω^n × R^p, F^n_p ∨ B(R^p)) to (R^k, B(R^k)), where p is the column dimension of Z and k is the column dimension of X. Assume that ∫ H_i(x, z_i, γ^i_s, s) F_{n,i}(dx, z_i, s) = 0, which implies (2). Define ν^{(i)}_n (G^{(i)}_n) to be ν_n (G_n) integrated over all δ, (x_l, z_l, l ≠ i). Let J_n be the set of all discontinuities of λ_n. This set certainly includes the discontinuities of ν_n. If t ∈ J_n then ν^{(i)}_n(dx, dz, {t}) = F_{n,i}(dx, z, t) G^{(i)}_n(dz, t) λ_n({t}) and

σ²_n(t)_{lj} = n^{−1} Σ_{i=1}^n ∫_{x,z} H_i(x, z, γ^i_t, t)_l H_i(x, z, γ^i_t, t)_j F_{n,i}(dx, z, t) G^{(i)}_n(dz, t) λ_n({t}).

Assume that at the continuity points of λ_n, at most one of the individual marked point processes can jump; that is, for i ≠ j,

∫_{x,z,δ,s} I{s ∉ J_n} δ_i δ_j ν_n(dx, dz, dδ, ds) = 0.   (4)

The assumptions of Theorem 1 simplify nicely.
Corollary 1

For each t ∈ D consider the following assumptions.

1) Asymptotic negligibility.

For all ε > 0,

n^{−1} Σ_{i=1}^n ∫_0^t ∫ H_i(x, z, γ^i_s, s) H_i(x, z, γ^i_s, s)ᵀ I{||H_i(x, z, γ^i_s, s)|| > ε √n} ν^{(i)}_n(dx, dz, ds) →^P 0.

2) Convergence of the variance.

a)

n^{−1} Σ_{i=1}^n ∫_0^t ∫ H_i(x, z, γ^i_s, s) H_i(x, z, γ^i_s, s)ᵀ ν^{(i)}_n(dx, dz, ds) →^P Γ(t).

b) For all l, j,

Σ_{s≤t} ∫ |σ²_n(s)_{lj} − n^{−1} Σ_{i=1}^n ∫ H_i(x, z, γ^i_s, s)_l H_i(x, z, γ^i_s, s)_j F_{n,i}(dx, z_i, s) δ_i| G_n(dz, dδ, s) λ_n({s}) →^P 0,

and (*).

3) Tightness.

Σ_{s≤t} (Σ_{j=1}^d σ²_n(s)_{jj})² →^P Σ_{s≤t} (Σ_{j=1}^d ΔΓ(s)_{jj})².

3′) Tightness in l∞([0, τ]). Suppose that for each s a discontinuity point of Γ and each j, we have

σ²_n(s)_{jj} →^P ΔΓ(s)_{jj}.

Under conditions 1) and 2), the finite dimensional distributions of M_n in D converge to the finite dimensional distributions of M. If D is dense on the real line then the additional assumption of 3) implies that M_n converges weakly with respect to the Skorohod metric on D[0, ∞) to M. If D = [0, τ] and conditions 1), 2) and 3′) hold, there exists a version of M which is a tight Borel measurable process in l∞([0, τ]) and M_n converges weakly to M.
As before, the assumption 2b) is only necessary because the conditional distribution of X given the time of the jump and the past is a mixture of a convolution of distributions. If the distributions of the rows of (X, Z, δ) are independent given the time of the jump and the past, then the first part of 2b) is implied by the assumption (*) on the persistent jumps. Furthermore, if the joint conditional distribution of the rows of (X, Z, δ) and jump time given the past can be expressed as a product of distributions for fixed s ((3) is a convolution), then assumption 2b) need not be assumed.

A marked point process can be used to model life history data (Arjas and Haara, 1992), in which case the event times of the process indicate the ordering in which observations are collected. In many applications T_j ≡ j. The mark at time T_j is the information collected. A simplified version is as follows. In contrast to survival analysis there are two indicators of observability. First, an indicator of censoring, Δ_j, an n × 1 vector with ith entry equal to one if the ith subject leaves the study at "time" T_j or is presently absent from the study and zero otherwise. Second, an indicator of measurement time, δ_j, an n × 1 vector with ith entry equal to one if the ith subject contributes both a covariate Z and a response X at "time" T_j and zero otherwise. Subject i can only contribute a response at time T_j if Δ_{j−1,i} = 0. Then (X_j, Z_j, δ_j, Δ_j) is the mark at time T_j.
The marked point process approach is best illustrated using the terminology of a longitudinal study. This illustration is a generalization of the work by Scheike (1994). In a longitudinal study of n individuals, T_j would be the time of the jth event (either an appointment or a censoring time); if individual i contributes a measurement of X at time T_j then δ_{ji} = 1, otherwise δ_{ji} = 0. Interest lies in regressing X on Z and possibly other variables measured prior to the present event time. These other variables can include the present time, and measurements of X and Z made at previous appointments. Since some appointments may be scheduled a priori at regular intervals (e.g. every three months), several individuals can contribute measurements at any one time. The times of the other appointments or censorings may depend on previous measurements of X and Z. Let U^{(j−1)} = ((T_l, X_l, Z_l, δ_l, Δ_l), l < j; T_j, Z_j, δ_j) and define F_{T_j} = σ{(T_l, X_l, Z_l, δ_l, Δ_l), l ≤ j}. Let I be an arbitrary subset of the integers 1 through n. Assume that responses from different individuals present at an appointment time are conditionally independent. That is, the conditional independence assumption is: on Π_{i∈I} δ_{ji} Π_{i∉I}(1 − δ_{ji}) = 1,

P[X_{ji} ≤ x_i, i ∈ I | U^{(j−1)}] = Π_{i∈I} F_{n,i}(x_i, Z_{ji}, γ^i_t, t)|_{t=T_j},

where for each l and t ∈ (T_{l−1}, T_l], γ^i_t is a function of the variables in F_{T_{l−1}} (i.e. γ^i is predictable). This assumption means that on Π_{i∈I} δ_{ji} Π_{i∉I}(1 − δ_{ji}) = 1, the X_{ji}, i ∈ I, are conditionally independent with distribution functions F_{n,i}, i ∈ I. The conditional distribution F_{n,i} can be parametrized to reflect the effect of the covariates on X_j. A partial likelihood for the response X is

Π_{j≥1} Π_{i=1}^n (F_{n,i}(dX_{ji}, Z_{ji}, γ^i_{T_j}, T_j))^{δ_{ji}}.

By modeling only F_{n,i}, we model only the probabilistic evolution of X through time and not the evolution of the other variables, Z and T.
If it is possible to completely model the density of F_{n,i}, say f_{n,i}, as a function of a parameter β, then a partial likelihood analysis would be based on the partial likelihood score. That is, the local martingale M_n is given by putting H_i = (∂/∂β) ln(f_{n,i}). On the other hand, if one is willing to parametrize at most the conditional mean and conditional variance of X_j, then an analysis based on the projected partial score is still possible. Suppose that the conditional mean and variance of F_{n,i}(·, Z_{ji}, γ^i_{T_j}, T_j) are given by μ_{ji} = μ_i(Z_{ji}, γ^i_{T_j}, T_j, β) and V_{ji} = V_i(Z_{ji}, γ^i_{T_j}, T_j, β), respectively. The projected partial score is given by

Σ_{j≥1} Σ_{i=1}^n (∂/∂β) μ_{ji} V^{−1}_{ji} (X_{ji} − μ_{ji}) δ_{ji}

(Murphy and Li, 1993). Here, put H_i(x, z, γ^i_t, t) = (∂/∂β) μ_i(z, γ^i_t, t, β) V^{−1}_i(z, γ^i_t, t, β) (x − μ_i(z, γ^i_t, t, β)) to form the local martingale M_n. Scheike (1994) considers examples in which there are no Z's, so that the mean and variance of X_j are functions of γ^i_{T_j} only.
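As a concrete numerical sketch of the projected partial score (hypothetical data, not from the paper), take a scalar response with conditional mean μ_{ji} = β Z_{ji} and variance V_{ji} = 1, so H_i reduces to z(x − βz); the score then sums δ_{ji} Z_{ji}(X_{ji} − β Z_{ji}) over appointments and individuals, and its root is a weighted least-squares estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_appts = 200, 5
beta_true = 1.5

# Hypothetical longitudinal data: at appointment j, individual i has covariate
# z_ji and response x_ji with conditional mean beta * z_ji and variance 1;
# delta_ji = 1 marks that a measurement was actually taken.
z = rng.normal(size=(n_appts, n))
delta = rng.binomial(1, 0.8, size=(n_appts, n))
x = beta_true * z + rng.normal(size=(n_appts, n))

def projected_partial_score(beta):
    # sum_j sum_i (d/dbeta mu_ji) V_ji^{-1} (x_ji - mu_ji) delta_ji,
    # with mu = beta * z, d/dbeta mu = z, and V = 1
    return np.sum(delta * z * (x - beta * z))

# Setting the score to zero gives a closed-form root in this special case.
beta_hat = np.sum(delta * z * x) / np.sum(delta * z * z)
```

With richer mean and variance models the root would be found numerically, but the estimating-equation structure is the same.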
3. EXAMPLES
1. Conditionally Independent Marked Point Processes

A marked point process which satisfies (1) and (4) can be formed by combining conditionally independent nonexplosive point processes. Conditionally independent point processes share with the multivariate point process the property that at time points which are continuity points of the compensator only one of the component processes is allowed to jump. Suppose that on a given stochastic basis there exist n marked point processes, N_i, i ≥ 1, each with marks (X_j, Z_j) ∈ E (the mark space), j ≥ 1, and locally integrable compensators ν_i, i ≥ 1. Define the composite marked point process N_n to have event times T_j at the ordered event times of the N_i's, and the ith row of the mark at T_j to be the empty set if N_i does not jump at T_j and to be the mark from N_i if N_i does jump at T_j. In addition, add to the mark at time T_j the n × 1 vector δ_j, with ith row equal to one if N_i jumps at T_j and zero otherwise. So the mark at time T_j is (X_j, Z_j, δ_j), which has n rows, one for each of the N_i's. The conditional independence assumption is:

Σ_{s≤·} Π_{i∈S} N_i(A_i, {s}) − ∫_0^· Π_{i∈S} ν_i(A_i, ds)

is a local martingale for all S, subsets of {1, 2, …, n}, and A_i measurable subsets of the mark space E. This means (use the monotone class theorem) that the compensator of N_n is

ν_n(dx, dz, dδ, ds) = I{δ_· ≥ 1} Π_{i=1}^n ν_i(dx_i, dz_i, ds)^{δ_i} ((1 − ν_i(E, ds)) δ_∅(dx_i, dz_i))^{1−δ_i}.

The dot in the place of the index for δ indicates that δ is summed over the index. Note that at the continuity points (in s) of ν_n only one of the N_i can jump; that is, (4) holds. So conditionally independent marked point processes behave like a multivariate point process at the continuity points of ν_n. Write ν_i(dx_i, dz_i, ds) as F_i(dx_i, z_i, s) G_i(dz_i, s) λ_i(ds). Then

ν_n(dx, dz, dδ, ds) = I{δ_· ≥ 1} Π_{i=1}^n (F_i(dx_i, z_i, s) G_i(dz_i, s) λ_i(ds))^{δ_i} ((1 − λ_i(ds)) δ_∅(dx_i, dz_i))^{1−δ_i}

and ν^{(i)}_n(dx, dz, ds) = F_i(dx, z, s) G_i(dz, s) λ_i(ds).

To write ν_n in the form specified by (1), put

F′_{n,i}(dv, ζ, s) = ∫ I{n^{−1/2} H_i(x, z_i, γ^i_s, s) δ_i ∈ dv} F_{n,i}(dx, z_i, s) G_i(dz_i, s),

put

G_n(dζ, s) = I{δ_· ≥ 1} Π_{i=1}^n λ_i(ds)^{δ_i} (1 − λ_i(ds))^{1−δ_i} / (1 − Π_{i=1}^n (1 − λ_i(ds)))

and put λ_n(ds) = 1 − Π_{i=1}^n (1 − λ_i(ds)).

If ∫_0^t ∫ (Σ_{i=1}^n H_i(x_i, z_i, γ^i_s, s) δ_i)² ν_n(dx, dz, dδ, ds) < ∞ holds for all t, then M_n (defined in the previous section) is a locally square integrable martingale, and since (3) can be written as a convolution for fixed s, only assumptions 1), 2a) and 3) of the corollary need be verified in order to prove weak convergence for M_n. Assumptions 1) and 2a) are identical to the assumptions (2.5.1) and (2.5.3) of Rebolledo's theorem as stated by Andersen et al. (1993). The only additional assumption here is 3), concerning the persistent jumps.
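The formulas for λ_n and G_n at a fixed discontinuity time amount to a few lines of arithmetic, which can be checked directly. In this sketch the jump probabilities are hypothetical and the helper `G_n` is illustrative only: λ_n({s}) is the probability that at least one component process jumps, and G_n renormalizes the independent-Bernoulli pattern probabilities to the patterns with δ_· ≥ 1.

```python
import itertools

import numpy as np

# Hypothetical per-process conditional jump probabilities lambda_i({s}) at a
# fixed time s.
lam = np.array([0.3, 0.5, 0.2])

# lambda_n({s}) = 1 - prod_i (1 - lambda_i({s})): probability that at least
# one component process jumps at s.
lam_n = 1.0 - np.prod(1.0 - lam)

def G_n(delta):
    # I{delta_. >= 1} prod_i lam_i^{delta_i} (1 - lam_i)^{1 - delta_i},
    # renormalized by lambda_n({s}).
    delta = np.asarray(delta)
    if delta.sum() < 1:
        return 0.0
    return np.prod(np.where(delta == 1, lam, 1.0 - lam)) / lam_n

# G_n is a probability distribution over the jump patterns with at least one
# jump, so the renormalized masses sum to one.
total = sum(G_n(d) for d in itertools.product([0, 1], repeat=3))
```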
2. Nelson-Aalen Estimator for Cumulative Hazards with Jumps

Suppose that for each n there exist mutually independent X_{1n}, X_{2n}, …, X_{nn} and U_{1n}, U_{2n}, …, U_{nn}, where each X_{in} has distribution function F and integrated hazard rate A(·) = ∫_0^· (1 − F(u−))^{−1} dF(u), and each U_{in} has distribution function G. In the following we consider estimation of A. Let N^X_i(t) = I{X_{in} ≤ t}, N^U_i(t) = I{U_{in} ≤ t} and Y_i(t) = I{X_{in} ∧ U_{in} ≥ t} for all t ∈ R_+. The filtration F is the internal filtration of the (N^X_i, N^U_i)'s. These processes can be represented by a marked point process with event times at each of the U_{in} and X_{in} and marks indicating which of the U_{in} and X_{in} occur at each event time. Then F is the internal filtration for this process and Jacod's formula for the compensator (Andersen et al., pg. 96, 1993) yields

∫_0^· Π_{i∈S} (I{X_{in} ≥ u} dA(u))

as the predictable compensator of Σ_{s≤·} Π_{i∈S} N^X_i({s}) for S any subset of {1, …, n}. Furthermore, since I{U_{in} ≥ u} is predictable,

Σ_{s≤·} Π_{i∈S} I{U_{in} ≥ s} N^X_i({s}) − ∫_0^· Π_{i∈S} (Y_i(u) dA(u))

forms a zero mean martingale. Put N_i(t) = ∫_0^t I{U_{in} ≥ s} dN^X_i(s) and J(s) = I{Y_·(s) > 0} (the dot in place of the index denotes the sum over that index). Note that the above form of the compensator for Σ_{s≤·} Π_{i∈S} N_i({s}) implies the conditional independence of the N_i as defined in the last section. Define N_n to be the point process with event times T_j at the ordered event times of the N_i's and at time T_j the mark δ_j, with ith row of δ_j equal to 1 if N_i jumps at time T_j and zero otherwise. The conditional independence property implies that the compensator of N_n is given by

ν_n(dδ, ds) = I{δ_· ≥ 1} Π_{i=1}^n (Y_i(s) A(ds))^{δ_i} (1 − Y_i(s) A(ds))^{1−δ_i}.
We are unable to observe the (N^X_i, N^U_i)'s; rather, we observe only the (N_i, Y_i)'s. The Nelson-Aalen estimator is defined for t such that A(t) is finite and is given by Â(t) = ∫_0^t J(s) Y_·(s)^{−1} dN_·(s) (in the integrand, interpret 0/0 as 0). The form of the compensator given in the last paragraph implies that the compensator of Â is ∫_0^t J(s) dA(s). An asymptotic analysis of the Nelson-Aalen estimator is based on

X_n(t) = √n (Â(t) − ∫_0^t J(s) dA(s))

and on √n ∫_0^t (J(s) − 1) dA(s). Suppose that for a specified T, A(T) < ∞ and G(T−) < 1. The Glivenko-Cantelli theorem implies that sup_{0≤t≤T} |n^{−1} Y_·(t) − (1 − F(t−))(1 − G(t−))| converges in probability to zero, which further implies that sup_{0≤t≤T} |√n ∫_0^t (J(s) − 1) dA(s)| goes to zero in probability.

Since J(s) Y_·(s)^{−1} is bounded above by 1, X_n is a square integrable martingale (see Andersen et al., 1993, pg. 181) and Theorem 1 can be used to derive the asymptotic distribution of X_n on [0, T]. Put M_n(t) = X_n(t ∧ T) and let J_n be the set of discontinuities of A. The marked point process μ_n recording the jumps of M_n is given by

μ_n(dx, dt) = I{t ≤ T} ∫_δ I{x ≠ 0} δ_{√n J(t) Y_·(t)^{−1} (δ_· − Y_·(t) ΔA(t))}(dx) N_n(dδ, dt) + I{t ≤ T} I{x ≠ 0} δ_{−√n J(t) ΔA(t)}(dx) (1 − ∫_δ N_n(dδ, dt)).

The compensator of μ_n, ν_n, is given by the same formula as above but with ν_n in place of N_n. For t a continuity point of A, ν_n(dx, dt) simplifies considerably to

ν_n(dx, dt) = I{x ≠ 0} Y_·(t) δ_{√n J(t)/Y_·(t)}(dx) A(dt).

Otherwise, note that F′_{n,i}(·, t) is the distribution of √n (B − Y_i(t) ΔA(t)) J(t)/Y_·(t), where B is a Bernoulli with success probability Y_i(t) ΔA(t), and λ_n(dt) places mass one at each point of {s: ΔA(s) ≠ 0}. This means that (3) can be written as a convolution and assumption 2b) is not necessary. Assumptions 1a), 1b) and 2a) all follow by the Glivenko-Cantelli theorem applied to n^{−1} Y_· and the assumption that (1 − F(T−))(1 − G(T−)) > 0. The expression in assumption 1a) is

∫_0^{t∧T} n J(s)/Y_·(s) I{J(s)/Y_·(s) > ε/√n} A^c(ds).

And to verify assumption 1b), note that Σ_{s∈J_n, s≤t∧T} Σ_i ∫ x² I{|x| > ε} F′_{n,i}(dx, s) λ_n({s}) equals

Σ_{s∈J_n, s≤t∧T} n J(s)/Y_·(s) (1 − A({s}))² A({s}) I{J(s)/Y_·(s) (1 − A({s})) > ε/√n} + Σ_{s∈J_n, s≤t∧T} n J(s)/Y_·(s) A({s})² (1 − A({s})) I{J(s)/Y_·(s) A({s}) > ε/√n}.

Since ∫_0^{t∧T} n J(s)/Y_·(s) (1 − A({s})) dA(s) converges in probability to ∫_0^{t∧T} ((1 − F(s−))(1 − G(s−)))^{−1} (1 − A({s})) dA(s), assumption 2a) holds. This is sufficient for finite dimensional convergence to a Gaussian martingale with covariance function

Γ(t) = ∫_0^{t∧T} ((1 − F(s−))(1 − G(s−)))^{−1} (1 − A({s})) dA(s).

Functional weak convergence using the Skorohod metric follows also, but this problem is nice enough that it is possible to prove functional convergence on the space of bounded functions on [0, T], called l∞[0, T]. All that is necessary is to prove asymptotic tightness. It is sufficient to prove that for each η, ε > 0 there exists a finite partition of [0, T], say 0 = t_1 < … < t_k = T, such that

lim sup_n P(max_i sup_{t∈[t_i, t_{i+1})} |M_n(t) − M_n(t_i)| > ε) < η

(van der Vaart and Wellner, 1993). This is easily enough done by choosing the partition to contain the larger jump points of A and using Lenglart's inequality (see Andersen et al., 1993). Gill (1980) proved a similar result for the Kaplan-Meier estimator of F by inserting an interval at the jump points of A. This theorem yields a quick proof in that setting also.
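The estimator of this example is easy to compute directly. The sketch below uses hypothetical data, not data from the paper: geometric lifetimes, so F is discrete and A has a persistent jump of size A({k}) = p at every integer k, with geometric censoring as well.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
p, q = 0.2, 0.1  # discrete hazards of death and of censoring

# Geometric lifetimes X and censoring times U on {1, 2, ...}; the cumulative
# hazard A jumps by p at each integer.
X = rng.geometric(p, size=n)
U = rng.geometric(q, size=n)
T = np.minimum(X, U)
observed = X <= U  # N_i jumps at X_in only when U_in >= X_in

# Nelson-Aalen: A_hat(t) = sum_{s <= t} dN.(s) / Y.(s), with 0/0 = 0.
times = np.arange(1, 6)
increments = []
for s in times:
    at_risk = np.sum(T >= s)              # Y.(s)
    deaths = np.sum((X == s) & observed)  # dN.(s)
    increments.append(deaths / at_risk if at_risk > 0 else 0.0)
A_hat = np.cumsum(increments)

# Each increment estimates the jump A({k}) = p = 0.2.
```

The jumps of Â at the integers do not shrink as n grows; they are exactly the persistent jumps that Theorem 1 accommodates and classical continuity-based CLTs exclude.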
3. The Proportional Hazards Model

The marked point process N_n is the composite of an n-variate multivariate counting process. The event times T_j are the ordered event times of the component counting processes and the mark at time T_j is simply δ_j, where δ_{ji} = 1 if the ith counting process jumps at time T_j and is zero otherwise. The ith counting process has intensity Y_i(t) e^{βᵀ Z_i(t)} λ(t) dt, where the covariate process Z_i is locally bounded and predictable and the baseline intensity λ is locally integrable. The process Y_i is also predictable and is the censoring process; that is, Y_i is one as long as it is possible to observe a jump of the ith process and zero thereafter. Since the components of a multivariate counting process cannot have common jumps, the component counting processes are conditionally independent. This implies that the marked point process has compensator

ν_n(dδ, dt) = I{δ_· ≥ 1} Π_{i=1}^n (Y_i(t) e^{βᵀ Z_i(t)})^{δ_i} λ(t) dt.

Note that ν^{(i)}_n is the intensity of the ith counting process. The derivative with respect to β of the natural log of Cox's partial likelihood (Andersen et al., 1993, pg. 483) is given by

∫ Σ_{i=1}^n δ_i (Z_i(t) − Σ_l Y_l(t) e^{βᵀ Z_l(t)} Z_l(t) / Σ_l Y_l(t) e^{βᵀ Z_l(t)}) N_n(dδ, dt).

The above is a local square integrable martingale, since the Z_i's are locally bounded. Define M_n by setting

H_i = Z_i(t) − Σ_l Y_l(t) e^{βᵀ Z_l(t)} Z_l(t) / Σ_l Y_l(t) e^{βᵀ Z_l(t)}.

Since λ_n is continuous, only assumptions 1) and 2a) need to be proved. These are the conditions of Rebolledo's theorem as stated in Andersen et al. (1993, pg. 83). This model can be developed as in the previous example above (see Andersen et al., 1993).
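The partial score above can be evaluated directly. A small sketch under hypothetical data (scalar covariate, exponential lifetimes with hazard e^{βZ_i}, uniform censoring); the bisection search is a crude stand-in for the Newton step usually used:

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta_true = 300, 0.7

# Hypothetical data: subject i has hazard exp(beta * Z_i); the at-risk
# process is Y_i(t) = I{T_i >= t}.
Z = rng.normal(size=n)
X = rng.exponential(1.0 / np.exp(beta_true * Z))  # lifetimes
U = rng.uniform(0.0, 2.0, size=n)                 # censoring times
T = np.minimum(X, U)
event = X <= U

def partial_score(beta):
    # Sum over observed event times of
    #   Z_i - sum_l Y_l(t) e^{beta Z_l} Z_l / sum_l Y_l(t) e^{beta Z_l}
    score = 0.0
    for t, z in zip(T[event], Z[event]):
        at_risk = T >= t
        w = np.exp(beta * Z[at_risk])
        score += z - np.sum(w * Z[at_risk]) / np.sum(w)
    return score

# The score is decreasing in beta, so bisection over [0, 2] locates its root.
lo, hi = 0.0, 2.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if partial_score(mid) > 0 else (lo, mid)
beta_hat = 0.5 * (lo + hi)
```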
5. APPENDIX
Proof of Theorem 1

For simplicity, the proof of convergence of the finite dimensional distributions is given for a one-dimensional local square integrable martingale, M_n, with M_n(0) = 0. The Cramér-Wold device can then be used to extend the result to higher dimensions. The superscripts c, d will denote the continuous, respectively discrete, parts of the martingale and compensators.

Intuitively, the conditional distribution of M_n(dt) given the past is a convolution of a continuous Gaussian increment M^c_n(dt), with mean zero and variance ⟨M^c_n⟩(dt), and a random variable M^d_n(dt) which assumes the value x − ∫ y ν_n(dy, dt) according to ν_n(dx, dt) and the value −∫ y ν_n(dy, dt) with probability 1 − ν_n(R, dt). Since M_n(0) = 0 implies that ∫ x ν_n(dx, {t}) is zero a.s., M^d_n({t}) will have conditional distribution ν_n(dx, {t}) + δ_0(dx)(1 − ν_n(R, {t})) (recall that δ_0 is a probability measure giving mass 1 to the point 0). The characteristic function (conditional on the past) of the random variable M^d_n(dt) is given by

e^{−iu ∫ x ν_n(dx, dt)} (1 + ∫ (e^{iux} − 1) ν_n(dx, dt));

its conditional mean is ∫ x ν_n(dx, dt) and its conditional variance is ∫ x² ν_n(dx, dt). This intuition helps one understand why the following proof works.
Theorem VIII.1.18 on page 418 of Jacod and Shiryaev (1987) gives a product integral which acts much as a characteristic function when the limiting process is a process of independent increments (as is the case here). This product integral is the product of the conditional characteristic functions of M_n(dt). Define the process of locally bounded variation

A_n(t; u) = −(1/2) u² ⟨M^c_n⟩(t) + ∫_0^t ∫ (e^{iux} − 1 − iux) ν_n(dx, ds).

Then the product integral is

G_n(t; u) = Π_{s≤t} (1 + A_n(ds; u)).

Since A_n is one dimensional and ∫ x ν_n(dx, {t}) is zero, G_n simplifies to

G_n(t; u) = exp(−(1/2) u² ⟨M^c_n⟩(t)) Π_{s≤t} (1 + ∫ (e^{iux} − 1) ν_n(dx, ds)) e^{−iu ∫ x ν^c_n(dx, ds)}

(Andersen et al., pg. 90, 1993). Theorem VIII.1.18 states that if G_n(t; u) converges in probability to exp(−(1/2) u² Γ(t)) for all u ∈ R and for all t ∈ D, then the finite dimensional distributions in D of M_n converge to the finite dimensional distributions in D of a Gaussian martingale with covariance function Γ.
Set Φ_n = exp(−(1/2) u² ⟨M^c_n⟩(t) − (1/2) u² ∫_0^t ∫ x² ν_n(dx, ds)) and Ψ_n = exp(∫_0^t ∫ (e^{iux} − 1 − iux + (1/2) u² x²) ν^c_n(dx, ds)). Of course, Φ_n converges in probability to exp(−(1/2) u² Γ(t)) by assumption 2a), and in the next paragraph it is proved that Ψ_n converges to one in probability. Consider

G_n(t; u) − Φ_n Ψ_n = exp(−(1/2) u² ⟨M^c_n⟩(t) + ∫_0^t ∫ (e^{iux} − 1 − iux) ν^c_n(dx, ds)) × [Π_{s≤t} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t} ∫ x² ν_n(dx, {s}))].

The term before the square brackets can be shown to be bounded in probability by using the results on Φ_n and Ψ_n. All that needs to be done is to show that Ψ_n goes to one in probability and that the above term in square brackets goes to zero in probability.
Since there is a constant C for which |e^{iux} − 1 − iux + u² x²/2| ≤ C (|x|³ ∧ x²), the exponent in Ψ_n can be bounded in absolute value by C ∫_0^t ∫ (|x|³ ∧ x²) ν^c_n(dx, ds). For ε small,

∫_0^t ∫ (|x|³ ∧ x²) ν^c_n(dx, ds) ≤ ∫_0^t ∫ x² I{|x| > ε} ν^c_n(dx, ds) + ε ∫_0^t ∫ x² I{|x| ≤ ε} ν^c_n(dx, ds).

But this goes to zero in probability because of assumption 1a), assumption 2a) and the fact that ε may be taken arbitrarily small.
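The constant C in the bound |e^{iux} − 1 − iux + u²x²/2| ≤ C(|x|³ ∧ x²) can be made explicit: the standard remainder estimates for the complex exponential give |e^{iy} − 1 − iy + y²/2| ≤ min(|y|³/6, y²) for all real y, so with y = ux one may take, for example, C = max(|u|³/6, u²). A quick grid check at u = 1:

```python
import numpy as np

u = 1.0
x = np.linspace(-50.0, 50.0, 200001)
y = u * x

# Third-order Taylor remainder of exp(iy).
lhs = np.abs(np.exp(1j * y) - 1.0 - 1j * y + y**2 / 2.0)
# Remainder bounds: |y|^3/6 is sharp for small arguments, y^2 for large ones.
rhs = np.minimum(np.abs(y) ** 3 / 6.0, y**2)

ok = np.all(lhs <= rhs + 1e-12)  # small slack for floating-point cancellation
```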
All that is left is to deal with the jumps of M_n. Intuitively, the term

Π_{s≤t} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t} ∫ x² ν_n(dx, {s}))

is just the difference between the conditional characteristic function of the jumps and the characteristic function of a normal random variable with the same variance as the jump.

Both terms above can be written as two products, the first over the persistent jumps (in J_n) and the second over the asymptotically negligible jumps (jumps not in J_n). Since A₁A₂ − B₁B₂ = (A₁ − B₁)A₂ + B₁(A₂ − B₂), all that is necessary is to prove that

Π_{s≤t, s∉J_n} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t, s∉J_n} ∫ x² ν_n(dx, {s})) →^P 0   (5)

and

Π_{s≤t, s∈J_n} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t, s∈J_n} ∫ x² ν_n(dx, {s})) →^P 0.   (6)

The assumptions (1), 1b) and 2b) are only used in proving (6) for the persistent jumps.
First (5) is proven, as it is the easier. Using Taylor series expansions, it is possible to prove that for any $\epsilon$,
$$\big|e^{iux} - 1 - iux + u^2x^2/2\,I\{|x|\le\epsilon\}\big| \le |ux|^3/6\,I\{|x|\le\epsilon\} + (ux)^2/2\,I\{|x|>\epsilon\}.$$
The notation to follow is made simpler by denoting $E_{n,s}$ as expectation with respect to the distribution $\nu_n(dx,\{s\}) + \delta_0(dx)\big(1-\nu_n(\Re,\{s\})\big)$. Note that this distribution has mean zero and variance $\sigma_n^2(s)$. Let $\{\epsilon_n\}_{n\ge 1}$ be a sequence converging to zero slowly enough that 1a) holds with $\epsilon_n$ in place of $\epsilon$. Using the last inequality given above, one gets
$$E_{n,s}(e^{iux}) - 1 + u^2/2\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) = \theta_{n,s}\Big(|u|^3\epsilon_n/6\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) + u^2/2\,E_{n,s}\big(x^2 I\{|x|>\epsilon_n\}\big)\Big)$$
for a $\theta_{n,s}$ which is bounded in absolute value by one. Denote
$$-u^2/2\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) + \theta_{n,s}\Big(|u|^3\epsilon_n/6\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) + u^2/2\,E_{n,s}\big(x^2 I\{|x|>\epsilon_n\}\big)\Big)$$
by $\psi_{n,s}$. These terms $\psi_{n,s}$, $s\notin J_n$, possess the following nice properties:
1) $\sup_{s\notin J_n,\; s\le t}|\psi_{n,s}|$ converges to zero in probability, and
2) $\sum_{s\notin J_n,\; s\le t}|\psi_{n,s}|$ is bounded in probability.
These two properties follow from assumptions 1a) and 2a); the proof follows that in Chung (1974, p. 200).
Rewrite equation (5) as
$$\Bigg[\exp\Big(\sum_{s\notin J_n,\; s\le t}\ln\big(E_{n,s}e^{iux}\big) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)\Big) - 1\Bigg]\, e^{-\frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)}.$$
Since $\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)$ is bounded in probability, it is sufficient, in order to finish the proof of (5), to show that
$$\sum_{s\notin J_n,\; s\le t}\ln\big(E_{n,s}e^{iux}\big) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s) = \sum_{s\notin J_n,\; s\le t}\ln\big(1+\psi_{n,s}\big) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)$$
converges to zero in probability. Since $\sup_{s\notin J_n,\; s\le t}|\psi_{n,s}|$ converges to zero in probability, $|\ln(1+\psi_{n,s})-\psi_{n,s}| = \big|\sum_{m=2}^\infty \frac{(-1)^{m-1}}{m}\psi_{n,s}^m\big|$ can be bounded above by $\frac{|\psi_{n,s}|^2}{2}\sum_{m=2}^\infty (1/2)^{m-2} \le |\psi_{n,s}|^2$ on a set of probability going to one. Now,
$$\Big|\sum_{s\notin J_n,\; s\le t}\ln(1+\psi_{n,s}) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)\Big| \le \Big|\sum_{s\notin J_n,\; s\le t}\big(\ln(1+\psi_{n,s})-\psi_{n,s}\big)\Big| + \Big|\sum_{s\notin J_n,\; s\le t}\big(\psi_{n,s}+\tfrac{1}{2}u^2\sigma_n^2(s)\big)\Big|.$$
But $\big|\sum_{s\notin J_n,\; s\le t}\big(\ln(1+\psi_{n,s})-\psi_{n,s}\big)\big| \le \sum_{s\notin J_n,\; s\le t}|\psi_{n,s}|^2$ on a set of probability converging to one, and therefore converges to zero in probability. The term $\big|\sum_{s\notin J_n,\; s\le t}\big(\psi_{n,s}+\tfrac{1}{2}u^2\sigma_n^2(s)\big)\big|$ can also be shown to converge to zero, using assumptions 1a) and 2a).
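The logarithm estimate used above is the standard bound $|\ln(1+z)-z| \le |z|^2$ for complex $|z| \le 1/2$, obtained by summing the tail of the series. A quick numerical check (illustration only):

```python
import cmath
import random

# Check |ln(1+z) - z| <= |z|^2 for complex z with |z| <= 1/2,
# the estimate applied to ln(1 + psi_{n,s}) once sup |psi_{n,s}| is small.
random.seed(0)
for _ in range(10_000):
    r = random.uniform(0.0, 0.5)
    phi = random.uniform(0.0, 2.0 * cmath.pi)
    z = r * cmath.exp(1j * phi)
    assert abs(cmath.log(1 + z) - z) <= abs(z) ** 2 + 1e-12
print("log estimate verified")
```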
Now condition (6), for the persistent jumps, must be proved. Assumptions (1), 1b) and 2b) are sufficient. As in the above paragraph, one need only prove that $\sum_{s\in J_n,\; s\le t}\big(\ln\big[E_{n,s}(e^{iux})\big] + u^2\sigma_n^2(s)/2\big)$ goes to zero in probability. Note that for $s\in J_n$ the expectation $E_{n,s}$ is with respect to the distribution
$$\int\big(\ast_i F_{n,i}\big)(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\}) + \delta_0(dx)\big(1-\gamma_n(\{s\})\big)$$
and
$$\sigma_n^2(s) = \int\sum_i\int x^2\,F_{n,i}(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\}).$$
Further, recall that $\sum_i\int x\,F_{n,i}(dx,\eta,s) = 0$ for almost all $\eta$. If necessary, redefine the $F_{n,i}(\cdot,\eta,s)$ so that $\int x\,F_{n,i}(dx,\eta,s) = 0$. Using the same Taylor series argument as in the proof of (5), one gets $\int e^{iux}\,F_{n,i}(dx,\eta,s) - 1 = \psi_{n,s}(\eta,i)$ for
$$\psi_{n,s}(\eta,i) = -u^2/2\int x^2 I\{|x|\le\epsilon_n\}\,F_{n,i}(dx,\eta,s) + \theta_{n,s}^i\Big(|u|^3\epsilon_n/6\int x^2 I\{|x|\le\epsilon_n\}\,F_{n,i}(dx,\eta,s) + u^2/2\int x^2 I\{|x|>\epsilon_n\}\,F_{n,i}(dx,\eta,s)\Big)$$
where $\theta_{n,s}^i$ is bounded in absolute value by one. If $\big(\prod_i F_{n,i}\big)G_n$ can also be written as a product distribution, then the above integrals should be with respect to the $i$th distribution in the product. Or, if the expectation $E_{n,s}$ can be written as an expectation with respect to a product distribution, then the integrals in the Taylor series should be with respect to each distribution in the product. Then steps similar to the following can be taken to prove that $\sum_{s\in J_n,\; s\le t}\big|\ln\big[e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux})\big]\big|$ goes to zero in probability. Indeed, if $E_{n,s}$ can be written as an expectation with respect to a product distribution, then each of the integrals in the above expansion should be replaced by the corresponding distribution in the product, and assumptions 2b) and (*) are then no longer necessary.
All that is necessary to finish the proof is to show that $\sum_{s\in J_n,\; s\le t}\big|e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux}) - 1\big|$ goes to zero in probability. This is the case since, on the set where $\sum_{s\in J_n,\; s\le t}\big|e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux}) - 1\big| < 1/2$, the sum $\sum_{s\in J_n,\; s\le t}\big|\ln\big[e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux})\big]\big|$ is bounded above by twice $\sum_{s\in J_n,\; s\le t}\big|e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux}) - 1\big|$. This in turn is bounded above by
$$\sum_{s\in J_n,\; s\le t}\Big|\int\Big(\exp\Big\{\sum_i \ln\big(1+\psi_{n,s}(\eta,i)\big) + \tfrac{1}{2}u^2\sigma_n^2(s)\Big\} - 1\Big)\,G_n(d\eta,s)\,\gamma_n(\{s\})\Big| + \sum_{s\in J_n,\; s\le t}\big|\gamma_n(\{s\})-1\big|. \qquad (7)$$
The last term goes to zero in probability by assumption (*). If $|e^{g_1(v)+g_2(v)}-1| \le C$ for all $v$, $A$ is a set and $L$ is a positive measure, then the integral satisfies
$$\Big|\int\big(e^{g_1(v)+g_2(v)}-1\big)\,L(dv)\Big| \le C\,L\Big(\big\{|g_1(v)|I\{v\in A\}>\epsilon\big\}\cup\big\{|g_2(v)|>\epsilon\big\}\cup A^c\Big) + 2\int\big|g_1(v)I\{v\in A\}+g_2(v)\big|\,I\big\{|g_1(v)|I\{v\in A\}<\epsilon,\; |g_2(v)|<\epsilon,\; v\in A\big\}\,L(dv)$$
for $\epsilon$ sufficiently small. This in turn is bounded above by
$$C/\epsilon\int\big|g_1(v)I\{v\in A\}\big|\,L(dv) + C/\epsilon\int\big|g_2(v)\big|\,L(dv) + C\,L(A^c) + 2\int\big|g_1(v)I\{v\in A\}\big|\,L(dv) + 2\int\big|g_2(v)\big|\,L(dv).$$
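The factor 2 in the second term rests on the elementary estimate $|e^z-1| \le 2|z|$ for complex $|z| \le 1$, applied with $z = g_1(v)I\{v\in A\} + g_2(v)$, which is smaller than $2\epsilon$ on the indicated set. A quick numerical check of the estimate (illustration only):

```python
import cmath
import random

# Check |e^z - 1| <= 2|z| for complex |z| <= 1, the estimate behind the
# factor 2 in the inequality above.
random.seed(1)
for _ in range(10_000):
    r = random.uniform(0.0, 1.0)
    phi = random.uniform(0.0, 2.0 * cmath.pi)
    z = r * cmath.exp(1j * phi)
    assert abs(cmath.exp(z) - 1) <= 2.0 * abs(z) + 1e-12
print("exponential estimate verified")
```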
To apply this inequality to the first term in (7), let $\epsilon' > 0$ be arbitrary and set $g_1 = \sum_i\ln\big(1+\psi_{n,s}(\eta,i)\big) - \sum_i\psi_{n,s}(\eta,i)$, $A = \{\sup_i|\psi_{n,s}(\eta,i)|\le\epsilon'\}$, $L(d\eta) = G_n(d\eta,s)\,\gamma_n(\{s\})$, $C = 2e^{u^2\sigma_n^2(s)/2}$ and $g_2 = \sum_i\psi_{n,s}(\eta,i) + u^2\sigma_n^2(s)/2$. To finish the proof, the following must converge to zero in probability:
1) $\sum_{s\in J_n,\; s\le t}\int\big|\sum_i\ln\big(1+\psi_{n,s}(\eta,i)\big) - \sum_i\psi_{n,s}(\eta,i)\big|\,I\{\sup_i|\psi_{n,s}(\eta,i)|\le\epsilon'\}\,G_n(d\eta,s)\,\gamma_n(\{s\})$;
2) $\sum_{s\in J_n,\; s\le t}\int\big|\sum_i\psi_{n,s}(\eta,i) + u^2\sigma_n^2(s)/2\big|\,G_n(d\eta,s)\,\gamma_n(\{s\})$; and
3) $\sum_{s\in J_n,\; s\le t} G_n\big(\{\sup_i|\psi_{n,s}(\eta,i)|>\epsilon'\},s\big)\,\gamma_n(\{s\})$.
First consider 1). It is bounded above by $\sum_{s\in J_n,\; s\le t}\int\sum_i|\psi_{n,s}(\eta,i)|^2\,I\{\sup_i|\psi_{n,s}(\eta,i)|\le\epsilon'\}\,G_n(d\eta,s)\,\gamma_n(\{s\})$, which is equal to $O_p(1)\,\epsilon'\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$. Since $\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$ is bounded in probability as $n$ increases, 1) can be made as small as desired by reducing the value of $\epsilon'$. Next, using the definition of $\psi_{n,s}(\eta,i)$, note that 2) is bounded above by the sum of
$$u^2/2\sum_{s\in J_n,\; s\le t}\int\Big|\sum_i\int x^2\,F_{n,i}(dx,\eta,s) - \sigma_n^2(s)\Big|\,G_n(d\eta,s)\,\gamma_n(\{s\}),$$
$$u^2/2\sum_{s\in J_n,\; s\le t}\int\sum_i\int x^2 I\{|x|>\epsilon_n\}\,F_{n,i}(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\})$$
and $O_p(1)\,\epsilon_n\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$. All three terms converge to zero in probability. All that is left is 3). It is bounded above by a constant times
$$1/\epsilon'\sum_{s\in J_n,\; s\le t}\int\sum_i\int x^2 I\{|x|>\epsilon_n\}\,F_{n,i}(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\})$$
plus a constant times $(1/\epsilon')\,\epsilon_n^2\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$. These two terms also converge to zero in probability.
To prove tightness in the Skorohod metric, assume that $D$ is dense in $[0,\tau]$. Theorem VI.5.17 of Jacod and Shiryaev (1987) implies that it is sufficient to show that the predictable variation of $M_n$ converges in the Skorohod metric. That is, there exists a nondecreasing deterministic function $\Gamma$ for which
1) $\langle M_n\rangle_{jj}(t) \xrightarrow{P} \Gamma_{jj}(t)$ for $j = 1,\ldots,d$, and
2) $\sum_{j=1}^d\sum_{s\le t}\big(\Delta\langle M_n\rangle_{jj}(s)\big)^2 \xrightarrow{P} \sum_{j=1}^d\sum_{s\le t}\big(\Delta\Gamma_{jj}(s)\big)^2$,
for all $t\in D$. The predictable variation of $M_n$ is given by $\langle M_n\rangle(\cdot) = \langle M_n^c\rangle(\cdot) + \int_0^{\cdot}\int xx^T\,\nu_n(dx,ds)$. So 1) is assumption 2a) and 2) is assumption 3).

To prove tightness in $\ell^\infty([0,\tau])$, we must prove that for any $\epsilon, \delta > 0$ there exists a partition $0 = t_1 < t_2 < \cdots < t_k = \tau$ for which
$$\limsup_n P\Big(\sup_i\sup_{t\in[t_i,t_{i+1})}\big|M_n(t)-M_n(t_i)\big| > \epsilon\Big) < \delta.$$
For a reference, see van der Vaart and Wellner (1993, chapter 1, section 4). Choose the partition to contain all of the (finitely many) jumps of $\Gamma$ which are of size greater than $\epsilon/4$. In addition, choose the partition fine enough that $\Gamma(t_{i+1}-) - \Gamma(t_i) < \epsilon/4$. To show that $\sum_i P\big(\sup_{t\in[t_i,t_{i+1})}|M_n(t)-M_n(t_i)| > \epsilon\big) < \delta$, employ Lenglart's inequality as proved in Jacod and Shiryaev (1987, p. 35). Assumption 3') is used in proving that $\langle M_n\rangle$ converges uniformly to $\Gamma$.
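As a reminder (a hedged restatement; see Jacod and Shiryaev, 1987, Lemma I.3.30, for the exact hypotheses), Lenglart's inequality in the form typically applied to a locally square integrable martingale $M$ reads:

```latex
P\Big(\sup_{s\le t}|M(s)| \ge \epsilon\Big)
  \;\le\; \frac{1}{\epsilon^2}\Big(a + E\big[\sup_{s\le t}\Delta\langle M\rangle(s)\big]\Big)
  \;+\; P\big(\langle M\rangle(t) \ge a\big),
\qquad \epsilon,\, a > 0.
```

It is applied here with $M = M_n(\cdot) - M_n(t_i)$ on each $[t_i, t_{i+1})$; the partition choice, together with the uniform convergence of $\langle M_n\rangle$ to $\Gamma$, controls both terms on the right.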
Proof of Corollary 1

Since (4) is assumed, 2) and 3) of Theorem 1 translate directly into assumptions 2) and 3) of this corollary. The equation in assumption 1b) of Theorem 1 is
$$n^{-1}\sum_{i=1}^n\sum_{s\in J_n,\; s\le t}\int H_i(x,z,\eta_s^i,s)\,H_i(x,z,\eta_s^i,s)^T\,I\big\{\big\|H_i(x,z,\eta_s^i,s)\big\| > \epsilon\sqrt{n}\big\}\,F_{n,i}(dx,z,s)\,G_{n\{i\}}(dz,s)\,\gamma_n(\{s\}).$$
Since there are no non-persistent jumps, the equation in assumption 1a) is
$$\int_0^t\int xx^T\,I\{\|x\|>\epsilon\}\,\nu_n^c(dx,ds).$$
The above can be expressed in terms of $\mu_n$:
$$\int_0^t\int n^{-1}\Big(\sum_{i=1}^n H(x_i,z_i,\eta_s^i,s)\,\delta_i\Big)\Big(\sum_{i=1}^n H(x_i,z_i,\eta_s^i,s)\,\delta_i\Big)^T\,I\Big\{\Big\|\sum_{i=1}^n H(x_i,z_i,\eta_s^i,s)\,\delta_i\Big\| > \epsilon\sqrt{n}\Big\}\,\mu_n(dx,dz,d\delta,ds).$$
Recall that no two of the $\delta_i$'s can be one at the continuity points of $\mu_n$, so that the summation over $i$ comes out of the squared term and of the indicator, resulting in assumption 1) of the corollary.
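The step of pulling the sum out of the square rests on the following identity: if at most one of the $\delta_i \in \{0,1\}$ is nonzero, then $\delta_i\delta_j = 0$ for $i\ne j$ and $\delta_i^2 = \delta_i$, so (writing $H_i$ as shorthand for $H(x_i,z_i,\eta_s^i,s)$)

```latex
\Big(\sum_{i=1}^n H_i\,\delta_i\Big)\Big(\sum_{j=1}^n H_j\,\delta_j\Big)^T
 = \sum_{i=1}^n H_iH_i^T\,\delta_i,
\qquad
I\Big\{\Big\|\sum_{i=1}^n H_i\,\delta_i\Big\| > \epsilon\sqrt{n}\Big\}
 = \sum_{i=1}^n \delta_i\, I\big\{\|H_i\| > \epsilon\sqrt{n}\big\},
```

both equalities holding pointwise on the event that at most one $\delta_i$ equals one.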
5. REFERENCES

Andersen, P.K., Ø. Borgan, R.D. Gill, and N. Keiding (1993). Statistical models based on counting processes. Springer Verlag, New York.

Arjas, E. and P. Haara (1992). Observation scheme and likelihood. Scand. J. Statist. 19, 111-132.

Chung, K.L. (1974). A course in probability theory. Academic Press, Inc., San Diego.

Gill, R.D. (1980). Censoring and stochastic integrals. Mathematical Centre Tract 124. Mathematisch Centrum, Amsterdam.

Grüger, J., R. Kay, and M. Schumacher (1991). The validity of inferences based on incomplete observations in disease state models. Biometrics 47, 595-605.

Jacod, J. and A.N. Shiryaev (1987). Limit theorems for stochastic processes. Springer Verlag, New York.

Liptser, R.S. and A.N. Shiryaev (1983). On the invariance principle for semi-martingales: the "nonclassical" case. Theory Prob. Appl. 28, 1-34.

Murphy, S.A. and B. Li (1993). Projected partial likelihood and its application to longitudinal data. To appear in Biometrika.

Scheike, T.H. (1994). Parametric regression for longitudinal data with counting process measurement times. Scand. J. Statist. 21, 245-264.

van der Vaart, A.W. and J.A. Wellner (1993). Weak convergence and empirical processes. IMS Lecture Notes-Monograph Series (to appear).
Susan A. Murphy
Department of Statistics, Pennsylvania State University,
326 Classroom Building, University Park, PA 16802, U.S.A.