A Central Limit Theorem for Local Martingales with Applications to the
Analysis of Longitudinal Data
S. A. MURPHY
September 20, 1996
Department of Statistics
Pennsylvania State University
SUMMARY
A functional central limit theorem for a local square integrable martingale with persistent discontinuities is given. By persistent discontinuities, it is meant that the martingale has jumps which do not vanish asymptotically. This central limit theorem is motivated by problems in the analysis of longitudinal and life history data.
Running Headline: A Central Limit Theorem for Martingales
Key words: Longitudinal Data, Event History Analysis, Non-Classical Central Limit Theorem, Martingale
Research supported by NSF grant DMS-9307255 and partially carried out during the author's visit
with the Econometrics Dept., Free University, Amsterdam.
1. INTRODUCTION
Apart from the work by Gill (1982) and the paper by Liptser and Shiryaev (1983), later reformulated in the book by Jacod and Shiryaev (1987), very little recent work has been done on non-classical central limit theorems. The central limit theorem given here is based on the latter two works, but the conditions given are amenable to applications in life history/longitudinal data analysis. Life history/longitudinal data typically involve observation of entities or individuals over a period of time. Even though this type of data may be thought of as the observation of stochastic processes, the statistical analysis is quite different. The analysis of life history data is based on the observation of several or many stochastic processes, each over a short time period, instead of observation of very few (or one) stochastic processes over a long period of time. Therefore the asymptotics given here will be for the number of individuals/processes increasing without bound.
Both longitudinal and life history data can be expressed as observations of marked point processes (Arjas and Haara, 1992). The event times of the point process are the times at which one collects information on the individuals, and the marks are the information collected. Asymptotic results for estimators and test statistics are based on a central limit theorem for an estimating equation. The estimating equation may be based on the derivative of the log of the full or partial likelihood. This derivative forms a local square integrable martingale under integrability conditions (Andersen et al., 1993). More generally, estimating equations can be constructed by parametrizing aspects of the conditional distribution of the information collected at a time point given the past. These estimating equations are integrals with respect to the marked point process and under integrability conditions form locally square integrable martingales (Murphy and Li, 1993). Central limit theorems (Rebolledo; see Andersen et al., 1993) for continuous time martingales assume that the intensity of the jumps of the martingale is (asymptotically) continuous. However, it is easy to envision the situation in which one plans to make measurements on each individual at regular intervals (e.g. every 3 months) but some individuals appear earlier or later for measurements. The time at which the individual appears could depend on past history; i.e., an appointment for a sicker patient may be scheduled earlier due to health concerns (doctor's care or patient self-selection; Grüger, Kay and Schumacher, 1991). The asymptotic analysis should allow for both measurements taken at random times and clumping of measurements (at the 3 month intervals). Additionally, individuals may be lost to follow-up or censored. The theorem presented in the next section will also allow for dependence between individuals which is due to the censoring mechanism and time dependent covariates.
The first theorem is given for a local square integrable martingale. Next this theorem is specialized to a central limit theorem for integrals with respect to a marked point process. Lastly, motivating applications are discussed. All of the proofs are in the appendix.
2. A CENTRAL LIMIT THEOREM
The first theorem is the most general given here and is for a d-dimensional local square integrable martingale, M_n with M_n(0) = 0, defined on a stochastic basis (Ω^n, F^n, {F^n_t}_{t∈R_+}, P_n). Associated with M_n is a marked point process which counts the jumps of M_n, ΔM_n, and records the sizes of the jumps as follows:

μ_n(dx, dt) = Σ_{s: ΔM_n(s)≠0} δ_{(ΔM_n(s), s)}(dx, dt)   (x ∈ R^d),

where δ_u is a probability measure giving mass 1 to the point u. The marked point process has a predictable compensator, ν_n(dx, dt). For the precise definition of the predictable sigma field, F^n_p, and other terminology see Jacod and Shiryaev (Chapter 2, 1987). Using this marked point process one can decompose M_n into a continuous local square integrable martingale, M^c_n, plus the compensated jumps:

M_n(·) = M^c_n(·) + ∫_0^· ∫ x (μ_n(dx, dt) − ν_n(dx, dt)).

It is always possible to write ν_n as ν_n(dx, dt) = K_n(dx, t) λ_n(dt), where K_n(dx, s) is a transition function from (R_+ × Ω^n, F^n_p) to (R^d, B(R^d)) and λ_n is a predictable nondecreasing process. Let J_n be a subset of the discontinuities of λ_n. These will be the persistent jumps which will contribute to the fixed jumps of the limiting Gaussian process. The jumps of λ_n which are not contained in J_n will be assumed to be asymptotically negligible.
The accumulation of information necessary for asymptotics on the continuous part (and the asymptotically negligible jumps) of M_n is formed by summing, over ever smaller intervals in time, uncorrelated increments of M_n. Liptser and Shiryaev avoid the details of how one might accumulate information on the persistent jumps by requiring that the conditional distribution (given the past) of the jump sizes approach a normal distribution sufficiently fast (condition R in Liptser and Shiryaev, 1983).
In applications, it is necessary to give some thought to how one would accumulate information at the persistent jumps. In the discussion following Corollary 1, the jump sizes correspond to the derivative of a log density. The density is parametrized by a regression of the response observed at the jump time on covariates. In regression one often assumes that, given the covariates, the responses are independent. Denoting the covariates by the variable ζ, a general assumption, which mimics the above regression assumption, is that the conditional distribution of the size of a jump given that the jump occurred and the past is the mixture of a convolution; i.e., for t ∈ J_n, assume that

K_n(dx, t) = ∫ I{x ≠ 0} (*_i F_{n,i})(dx, ζ, t) G_n(dζ, t).   (1)

The above product (*_i F_{n,i}) denotes the convolution of probability transition functions, F_{n,i}'s, each on R^d. For ζ of dimension n × p, G_n is a probability transition function from (R_+ × Ω^n, F^n_p) to (R^{np}, B(R^{np})) and F_{n,i} is a probability transition function from (R_+ × Ω^n × R^{np}, F^n_p ∨ B(R^{np})) to (R^d, B(R^d)).
Note that for t a discontinuity point of λ_n, the assumption of M_n(0) = 0 implies that ∫ x K_n(dx, t) = 0 a.e. P_n. Make the further assumption that

∫ x (*_i F_{n,i})(dx, ζ, t) = 0   (2)

a.e. (G_n(dζ, t) dP_n). Define σ²_n(t) = ∫ x xᵀ ν_n(dx, {t}) and ⟨M^c_n⟩ to be the predictable variation matrix of M^c_n. Without loss of generality, each component of the vector M_n is assumed to belong to the space of right continuous, left hand-limited functions on [0, ∞), denoted by D[0, ∞). Let M be a Gaussian martingale on D[0, ∞)^d with E(M Mᵀ) = Γ.
Theorem 1

For each t ∈ D, D a subset of R_+, consider the following assumptions.

1) Asymptotic negligibility.

a) For all ε > 0,

∫_0^t I{s ∉ J_n} ∫ x xᵀ I{||x|| > ε} ν_n(dx, ds) →^P 0.

b) For all ε > 0,

Σ_{s∈J_n, s≤t} ∫ Σ_i ∫ x xᵀ I{||x|| > ε} F_{n,i}(dx, ζ, s) G_n(dζ, s) λ_n({s}) →^P 0.

2) Convergence of the variance.

a)

⟨M^c_n⟩(t) + ∫_0^t ∫ x xᵀ ν_n(dx, ds) →^P Γ(t).

b) For all l, j,

Σ_{s∈J_n, s≤t} ∫ |σ²_n(s)_{lj} − Σ_i ∫ x_l x_j F_{n,i}(dx, ζ, s)| G_n(dζ, s) λ_n({s}) →^P 0,

and

Σ_{s∈J_n, s≤t} |λ_n({s}) − 1| →^P 0.   (*)

3) Tightness in D[0, ∞).

Σ_{s≤t} (Σ_{j=1}^d σ²_n(s)_{jj})² →^P Σ_{s≤t} (Σ_{j=1}^d ΔΓ(s)_{jj})².

Under conditions 1) and 2), the finite dimensional distributions of M_n in D converge to the finite dimensional distributions of M. If D is dense on the real line then the additional assumption of 3) implies that M_n converges weakly with respect to the Skorohod metric on D[0, ∞) to M.
Remarks:

(1) Intuitively, the joint conditional distribution of a jump at a time t ∈ J_n and the jump size is

∫_ζ (*_i F_{n,i})(dx, ζ, t) G_n(dζ, t) λ_n({t}) + δ_0(dx)(1 − λ_n({t})).   (3)

The assumption 2b) is present because the distribution (3) is, for fixed time t, a mixture of a convolution. If ∫_ζ (*_i F_{n,i})(dx, ζ, t) G_n(dζ, t) is a convolution then the first part of 2b) is implied by the second part of 2b), assumption (*) on the persistent jumps. Furthermore, if for fixed t, (3) can be written as a convolution then assumption 2b) need not be made.

(2) In applications the above result may be more useful when the weak convergence is in supremum norm on the space of bounded functions on a compact interval, l∞([0, τ]). The space l∞([0, τ]) is discussed by van der Vaart and Wellner (1993). This can be achieved by employing a different version of 3) above. The more restrictive version essentially requires a priori knowledge of the locations of the persistent jumps (points of discontinuity of Γ). Replace 3) by:
3′) Tightness in l∞([0, τ]). Suppose D = [0, τ] and in addition, for each s a discontinuity point of Γ and each j, we have

σ²_n(s)_{jj} →^P ΔΓ(s)_{jj}.

Then there exists a version of M which is a tight Borel measurable Gaussian process in l∞([0, τ]) and M_n converges weakly to M.
(3) The Gaussian martingale M possesses the following properties. For a fixed countable set of times t, Γ(t) − Γ(t−) = ΔΓ(t) will be nonzero, symmetric and nonnegative definite. These are the fixed times of discontinuity of M. Outside of these fixed times of discontinuity, almost all paths of M are continuous. Additionally, ΔM(t) = M(t) − M(t−) is multivariate normal with mean zero and variance-covariance matrix ΔΓ(t). The covariance of M_k(t) and M_l(s), for t less than s, is the entry in the kth row, lth column of Γ(t). For more properties of a Gaussian martingale see Jacod and Shiryaev (pg. 111, 1987).
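The role of condition (*) and of the fixed jump ΔΓ can be illustrated with a small simulation. This is a hypothetical sketch, not from the paper: every individual is measured at a scheduled time s = 1, so the jump probability λ_n({1}) equals 1, and each contributes a mean-zero response with variance 2; the normalized jump ΔM_n(1) = n^{−1/2} Σ_i ε_i then has variance ΔΓ(1) = 2 and is approximately normal, a discontinuity that persists in the limit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 4000
var_eps = 2.0  # variance of each individual's response at the scheduled time

# Each replication: n individuals all appear at the scheduled time s = 1
# (condition (*): the conditional jump probability is 1) and contribute
# mean-zero responses; the martingale's jump is the normalized sum.
a = np.sqrt(3.0 * var_eps)  # Uniform(-a, a) has variance a^2 / 3 = var_eps
eps = rng.uniform(-a, a, size=(reps, n))
jumps = eps.sum(axis=1) / np.sqrt(n)

# The empirical jump variance should be close to DeltaGamma(1) = 2, and the
# jump distribution approximately normal even though each summand is uniform.
print(jumps.var())
```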
Now specialize to local square integrable martingales which can be expressed as integrals with respect to a nonexplosive marked point process. The marked point process, N_n, is defined on the stochastic basis (Ω^n, F^n, {F^n_t}_{t∈R_+}, P_n) and is a composite of n marked point processes, so that the event times of N_n, say {T_j}_{j≥1}, are the ordered event times of the individual marked point processes. Part of the mark at each T_j is an indicator of which of the individual marked point processes jumped at that time. Denote the mark at time T_j by (X_j, Z_j, δ_j). In applications, X_j is the matrix of responses and Z_j is the matrix of covariates collected at time T_j. So the mark is a matrix of real components and it has row dimension n; if the ith marked point process has a jump at T_j, δ_{ji} is set to one and the ith row of (X_j, Z_j) corresponds to the mark of the ith process. Otherwise, the ith row of (X_j, Z_j) is set to the empty set and δ_{ji} is set to zero. Denote the compensator of N_n by ν_n. If the filtration F is the internal filtration, then ν_n, written as a measure, is given by

ν_n(dx, dz, dδ, ds) = P[X_j ∈ dx, Z_j ∈ dz, δ_j ∈ dδ, T_j ∈ ds | T_j ≥ s, F_{T_{j−1}}] on T_{j−1} < s ≤ T_j, j ≥ 1,

where ds denotes the interval [s, s + ds). For {H_i}_{i≥1}, each a d-dimensional deterministic function, and {γ^i_s}_{i≥1} predictable, consider

M_n(·) = ∫_0^· ∫ n^{−1/2} Σ_{i=1}^n H_i(x_i, z_i, γ^i_s, s) δ_i N_n(dx, dz, dδ, ds).

To ensure that M_n is a local square integrable martingale, assume that the compensator of M_n is zero and that ∫_0^t ∫ (Σ_{i=1}^n H_i(x_i, z_i, γ^i_s, s) δ_i)² ν_n(dx, dz, dδ, ds) is locally integrable (Jacod and Shiryaev, pg. 73, 1987). After the corollary is a discussion of how equations of the form of M_n arise as estimating equations in survival analysis and in the analysis of life history data.
The marked point process of the jumps of M_n is

μ_n(du, dt) = ∫ δ_{{n^{−1/2} Σ_i H_i(x_i, z_i, γ^i_s, s) δ_i}}(du) I{n^{−1/2} Σ_i H_i(x_i, z_i, γ^i_s, s) δ_i ≠ 0} N_n(dx, dz, dδ, ds).

Of course the compensator of μ_n follows the same formula as the above but with ν_n in place of N_n. Assume (1) with ζ = (z, δ) and

F′_{n,i}(dv, ζ, s) = ∫ I{n^{−1/2} H_i(x, z_i, γ^i_s, s) δ_i ∈ dv} F_{n,i}(dx, z_i, s).

The distribution function F_{n,i} is allowed to be a function of the past; that is, F_{n,i} is a probability transition function from (R_+ × Ω^n × R^p, F^n_p ∨ B(R^p)) to (R^k, B(R^k)), where p is the column dimension of Z and k is the column dimension of X. Assume that ∫ H_i(x, z_i, γ^i_s, s) F_{n,i}(dx, z_i, s) = 0, which implies (2). Define ν^{(i)}_n (G^{(i)}_n) to be ν_n (G_n) integrated over all δ, (x_l, z_l, l ≠ i). Let J_n be the set of all discontinuities of λ_n. This set certainly includes the discontinuities of ν_n. If t ∈ J_n then ν^{(i)}_n(dx, dz, {t}) = F_{n,i}(dx, z, t) G^{(i)}_n(dz, t) λ_n({t}) and

σ²_n(t)_{lj} = n^{−1} Σ_{i=1}^n ∫_{x,z} H_i(x, z, γ^i_t, t)_l H_i(x, z, γ^i_t, t)_j F_{n,i}(dx, z, t) G^{(i)}_n(dz, t) λ_n({t}).

Assume that at the continuity points of λ_n, at most one of the individual marked point processes can jump; that is, for i ≠ j,

∫_{x,z,δ,s} I{s ∉ J_n} δ_i δ_j ν_n(dx, dz, dδ, ds) = 0.   (4)

The assumptions of Theorem 1 simplify nicely.
Corollary 1

For each t ∈ D consider the following assumptions.

1) Asymptotic negligibility.

For all ε > 0,

n^{−1} Σ_{i=1}^n ∫_0^t ∫ H_i(x, z, γ^i_s, s) H_i(x, z, γ^i_s, s)ᵀ I{||H_i(x, z, γ^i_s, s)|| > ε √n} ν^{(i)}_n(dx, dz, ds) →^P 0.

2) Convergence of the variance.

a)

n^{−1} Σ_{i=1}^n ∫_0^t ∫ H_i(x, z, γ^i_s, s) H_i(x, z, γ^i_s, s)ᵀ ν^{(i)}_n(dx, dz, ds) →^P Γ(t).

b) For all l, j,

Σ_{s≤t} ∫ |σ²_n(s)_{lj} − n^{−1} Σ_{i=1}^n ∫ H_i(x, z, γ^i_s, s)_l H_i(x, z, γ^i_s, s)_j F_{n,i}(dx, z_i, s) δ_i| G_n(dz, dδ, s) λ_n({s}) →^P 0,

and (*).

3) Tightness.

Σ_{s≤t} (Σ_{j=1}^d σ²_n(s)_{jj})² →^P Σ_{s≤t} (Σ_{j=1}^d ΔΓ(s)_{jj})².

3′) Tightness in l∞([0, τ]). Suppose that for each s a discontinuity point of Γ and each j, we have

σ²_n(s)_{jj} →^P ΔΓ(s)_{jj}.

Under conditions 1) and 2), the finite dimensional distributions of M_n in D converge to the finite dimensional distributions of M. If D is dense on the real line then the additional assumption of 3) implies that M_n converges weakly with respect to the Skorohod metric on D[0, ∞) to M. If D = [0, τ] and conditions 1), 2) and 3′) hold, there exists a version of M which is a tight Borel measurable process in l∞([0, τ]) and M_n converges weakly to M.
As before, the assumption 2b) is only necessary because the conditional distribution of X given the time of the jump and the past is a mixture of a convolution of distributions. If the distributions of the rows of (X, Z, δ) are independent given the time of the jump and the past, then the first part of 2b) is implied by the assumption (*) on the persistent jumps. Furthermore, if the joint conditional distribution of the rows of (X, Z, δ) and jump time given the past can be expressed as a product of distributions for fixed s ((3) is a convolution), then assumption 2b) need not be assumed.

A marked point process can be used to model life history data (Arjas and Haara, 1992), in which case the event times of the process indicate the ordering in which observations are collected. In many applications T_j ≡ j. The mark at time T_j is the information collected. A simplified version is as follows. In contrast to survival analysis there are two indicators of observability. First, an indicator of censoring, Δ_j, an n × 1 vector with ith entry equal to one if the ith subject leaves the study at "time" T_j or is presently absent from the study and zero otherwise. Second, an indicator of measurement time, δ_j, an n × 1 vector with ith entry equal to one if the ith subject contributes both a covariate Z and a response X at "time" T_j and zero otherwise. Subject i can only contribute a response at time T_j if Δ_{j−1,i} = 0. Then (X_j, Z_j, δ_j, Δ_j) is the mark at time T_j.
The marked point process approach is best illustrated using the terminology of a longitudinal study. This illustration is a generalization of the work by Scheike (1994). In a longitudinal study of n individuals, T_j would be the time of the jth event (either an appointment or a censoring time); if individual i contributes a measurement of X at time T_j then δ_{ji} = 1, otherwise δ_{ji} = 0. Interest lies in regressing X on Z and possibly other variables measured prior to the present event time. These other variables can include the present time, and measurements of X and Z made at previous appointments. Since some appointments may be scheduled a priori at regular intervals (e.g. every three months), several individuals can contribute measurements at any one time. The times of the other appointments or censorings may depend on previous measurements of X and Z. Let U^{(j−1)} = ((T_l, X_l, Z_l, δ_l, Δ_l), l < j; T_j, Z_j, δ_j) and define F_{T_j} = σ{(T_l, X_l, Z_l, δ_l, Δ_l), l ≤ j}. Let I be an arbitrary subset of the integers 1 through n. Assume that responses from different individuals present at an appointment time are conditionally independent. That is, the conditional independence assumption is: on Π_{i∈I} δ_{ji} Π_{i∉I}(1 − δ_{ji}) = 1,

P[X_{ji} ≤ x_i, i ∈ I | U^{(j−1)}] = Π_{i∈I} F_{n,i}(x_i, Z_{ji}, γ^i_t, t)|_{t=T_j},

where for each l and t ∈ (T_{l−1}, T_l], γ^i_t is a function of the variables in F_{T_{l−1}} (i.e. γ^i is predictable). This assumption means that on Π_{i∈I} δ_{ji} Π_{i∉I}(1 − δ_{ji}) = 1, the X_{ji}, i ∈ I, are conditionally independent with distribution functions F_{n,i}, i ∈ I. The conditional distribution F_{n,i} can be parametrized to reflect the effect of the covariates on X_j. A partial likelihood for the response X is

Π_{j≥1} Π_{i=1}^n (F_{n,i}(dX_{ji}, Z_{ji}, γ^i_{T_j}, T_j))^{δ_{ji}}.

By modeling only F_{n,i}, we model only the probabilistic evolution of X through time and not the evolution of the other variables, Z and T.
If it is possible to completely model the density of F_{n,i}, say f_{n,i}, as a function of a parameter β, then a partial likelihood analysis would be based on the partial likelihood score. That is, the local martingale M_n is given by putting H_i = (∂/∂β) ln(f_{n,i}). On the other hand, if one is willing to parametrize at most the conditional mean and conditional variance of X_j, then an analysis based on the projected partial score is still possible. Suppose that the conditional mean and variance of F_{n,i}(·, Z_{ji}, γ^i_{T_j}, T_j) are given by μ_{ji} = μ_i(Z_{ji}, γ^i_{T_j}, T_j, β) and V_{ji} = V_i(Z_{ji}, γ^i_{T_j}, T_j, β), respectively. The projected partial score is given by

Σ_{j≥1} Σ_{i=1}^n (∂/∂β) μ_{ji} V^{−1}_{ji} (X_{ji} − μ_{ji}) δ_{ji}

(Murphy and Li, 1993). Here, put H_i(x, z, γ^i_t, t) = (∂/∂β) μ_i(z, γ^i_t, t, β) V^{−1}_i(z, γ^i_t, t, β) (x − μ_i(z, γ^i_t, t, β)) to form the local martingale M_n. Scheike (1994) considers examples in which there are no Z's, so that the mean and variance of X_j are functions of γ^i_{T_j} only.
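As a concrete numerical sketch of the projected partial score (hypothetical data, not from the paper), take a scalar response with conditional mean μ_{ji} = β Z_{ji} and variance V_{ji} = 1, so H_i reduces to z(x − βz); the score then sums δ_{ji} Z_{ji}(X_{ji} − β Z_{ji}) over appointments and individuals, and its root is a weighted least-squares estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_appts = 200, 5
beta_true = 1.5

# Hypothetical longitudinal data: at appointment j, individual i has covariate
# z_ji and response x_ji with conditional mean beta * z_ji and variance 1;
# delta_ji = 1 marks that a measurement was actually taken.
z = rng.normal(size=(n_appts, n))
delta = rng.binomial(1, 0.8, size=(n_appts, n))
x = beta_true * z + rng.normal(size=(n_appts, n))

def projected_partial_score(beta):
    # sum_j sum_i (d/dbeta mu_ji) V_ji^{-1} (x_ji - mu_ji) delta_ji,
    # with mu = beta * z, d/dbeta mu = z, and V = 1
    return np.sum(delta * z * (x - beta * z))

# Setting the score to zero gives a closed-form root in this special case.
beta_hat = np.sum(delta * z * x) / np.sum(delta * z * z)
```

With richer mean and variance models the root would be found numerically, but the estimating-equation structure is the same.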
3. EXAMPLES
1. Conditionally Independent Marked Point Processes

A marked point process which satisfies (1) and (4) can be formed by combining conditionally independent nonexplosive point processes. Conditionally independent point processes share with the multivariate point process the property that at time points which are continuity points of the compensator only one of the component processes is allowed to jump. Suppose that on a given stochastic basis there exist n marked point processes, N_i, i ≥ 1, each with marks (X_j, Z_j) ∈ E (the mark space), j ≥ 1, and locally integrable compensators ν_i, i ≥ 1. Define the composite marked point process N_n to have event times T_j at the ordered event times of the N_i's, and the ith row of the mark at T_j to be the empty set if N_i does not jump at T_j and to be the mark from N_i if N_i does jump at T_j. In addition, add to the mark at time T_j the n × 1 vector δ_j, with ith row equal to one if N_i jumps at T_j and zero otherwise. So the mark at time T_j is (X_j, Z_j, δ_j), which has n rows, one for each of the N_i's. The conditional independence assumption is:

Σ_{s≤·} Π_{i∈S} N_i(A_i, {s}) − ∫_0^· Π_{i∈S} ν_i(A_i, ds)

is a local martingale for all S, subsets of {1, 2, …, n}, and A_i measurable subsets of the mark space E. This means (use the monotone class theorem) that the compensator of N_n is

ν_n(dx, dz, dδ, ds) = I{δ_· ≥ 1} Π_{i=1}^n ν_i(dx_i, dz_i, ds)^{δ_i} ((1 − ν_i(E, ds)) δ_∅(dx_i, dz_i))^{1−δ_i}.

The dot in the place of the index for δ indicates that δ is summed over the index. Note that at the continuity points (in s) of ν_n only one of the N_i can jump; that is, (4) holds. So conditionally independent marked point processes behave like a multivariate point process at the continuity points of ν_n. Write ν_i(dx_i, dz_i, ds) as F_i(dx_i, z_i, s) G_i(dz_i, s) λ_i(ds). Then

ν_n(dx, dz, dδ, ds) = I{δ_· ≥ 1} Π_{i=1}^n (F_i(dx_i, z_i, s) G_i(dz_i, s) λ_i(ds))^{δ_i} ((1 − λ_i(ds)) δ_∅(dx_i, dz_i))^{1−δ_i}

and ν^{(i)}_n(dx, dz, ds) = F_i(dx, z, s) G_i(dz, s) λ_i(ds).

To write ν_n in the form specified by (1), put

F′_{n,i}(dv, ζ, s) = ∫ I{n^{−1/2} H_i(x, z_i, γ^i_s, s) δ_i ∈ dv} F_{n,i}(dx, z_i, s) G_i(dz_i, s),

put

G_n(dζ, s) = I{δ_· ≥ 1} Π_{i=1}^n λ_i(ds)^{δ_i} (1 − λ_i(ds))^{1−δ_i} / (1 − Π_{i=1}^n (1 − λ_i(ds)))

and put λ_n(ds) = 1 − Π_{i=1}^n (1 − λ_i(ds)).

If ∫_0^t ∫ (Σ_{i=1}^n H_i(x_i, z_i, γ^i_s, s) δ_i)² ν_n(dx, dz, dδ, ds) < ∞ holds for all t, then M_n (defined in the previous section) is a locally square integrable martingale, and since (3) can be written as a convolution for fixed s, only assumptions 1), 2a) and 3) of the corollary need be verified in order to prove weak convergence for M_n. Assumptions 1) and 2a) are identical to the assumptions (2.5.1) and (2.5.3) of Rebolledo's theorem as stated by Andersen et al. (1993). The only additional assumption here is 3), concerning the persistent jumps.
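The formulas for λ_n and G_n at a fixed discontinuity time amount to a few lines of arithmetic, which can be checked directly. In this sketch the jump probabilities are hypothetical and the helper `G_n` is illustrative only: λ_n({s}) is the probability that at least one component process jumps, and G_n renormalizes the independent-Bernoulli pattern probabilities to the patterns with δ_· ≥ 1.

```python
import itertools

import numpy as np

# Hypothetical per-process conditional jump probabilities lambda_i({s}) at a
# fixed time s.
lam = np.array([0.3, 0.5, 0.2])

# lambda_n({s}) = 1 - prod_i (1 - lambda_i({s})): probability that at least
# one component process jumps at s.
lam_n = 1.0 - np.prod(1.0 - lam)

def G_n(delta):
    # I{delta_. >= 1} prod_i lam_i^{delta_i} (1 - lam_i)^{1 - delta_i},
    # renormalized by lambda_n({s}).
    delta = np.asarray(delta)
    if delta.sum() < 1:
        return 0.0
    return np.prod(np.where(delta == 1, lam, 1.0 - lam)) / lam_n

# G_n is a probability distribution over the jump patterns with at least one
# jump, so the renormalized masses sum to one.
total = sum(G_n(d) for d in itertools.product([0, 1], repeat=3))
```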
2. Nelson-Aalen Estimator for Cumulative Hazards with Jumps

Suppose that for each n there exist mutually independent X_{1n}, X_{2n}, …, X_{nn} and U_{1n}, U_{2n}, …, U_{nn}, where each X_{in} has distribution function F and integrated hazard rate A(·) = ∫_0^· (1 − F(u−))^{−1} dF(u), and each U_{in} has distribution function G. In the following we consider estimation of A. Let N^X_i(t) = I{X_{in} ≤ t}, N^U_i(t) = I{U_{in} ≤ t} and Y_i(t) = I{X_{in} ∧ U_{in} ≥ t} for all t ∈ R_+. The filtration F is the internal filtration of the (N^X_i, N^U_i)'s. These processes can be represented by a marked point process with event times at each of the U_{in} and X_{in} and marks indicating which of the U_{in} and X_{in} occur at each event time. Then F is the internal filtration for this process and Jacod's formula for the compensator (Andersen et al., pg. 96, 1993) yields

∫_0^· Π_{i∈S} (I{X_{in} ≥ u} dA(u))

as the predictable compensator of Σ_{s≤·} Π_{i∈S} N^X_i({s}) for S any subset of {1, …, n}. Furthermore, since I{U_{in} ≥ u} is predictable,

Σ_{s≤·} Π_{i∈S} I{U_{in} ≥ s} N^X_i({s}) − ∫_0^· Π_{i∈S} (Y_i(u) dA(u))

forms a zero mean martingale. Put N_i(t) = ∫_0^t I{U_{in} ≥ s} dN^X_i(s) and J(s) = I{Y_·(s) > 0} (the dot in place of the index denotes the sum over that index). Note that the above form of the compensator for Σ_{s≤·} Π_{i∈S} N_i({s}) implies the conditional independence of the N_i as defined in the last section. Define N_n to be the point process with event times T_j at the ordered event times of the N_i's and at time T_j the mark δ_j, with ith row of δ_j equal to 1 if N_i jumps at time T_j and zero otherwise. The conditional independence property implies that the compensator of N_n is given by

ν_n(dδ, ds) = I{δ_· ≥ 1} Π_{i=1}^n (Y_i(s) A(ds))^{δ_i} (1 − Y_i(s) A(ds))^{1−δ_i}.
We are unable to observe the (N^X_i, N^U_i)'s; rather, we observe only the (N_i, Y_i)'s. The Nelson-Aalen estimator is defined for t such that A(t) is finite and is given by Â(t) = ∫_0^t J(s) Y_·(s)^{−1} dN_·(s) (in the integrand, interpret 0/0 as 0). The form of the compensator given in the last paragraph implies that the compensator of Â is ∫_0^t J(s) dA(s). An asymptotic analysis of the Nelson-Aalen estimator is based on

X_n(t) = √n (Â(t) − ∫_0^t J(s) dA(s))

and on √n ∫_0^t (J(s) − 1) dA(s). Suppose that for a specified T, A(T) < ∞ and G(T−) < 1. The Glivenko-Cantelli theorem implies that sup_{0≤t≤T} |n^{−1} Y_·(t) − (1 − F(t−))(1 − G(t−))| converges in probability to zero, which further implies that sup_{0≤t≤T} |√n ∫_0^t (J(s) − 1) dA(s)| goes to zero in probability.

Since J(s) Y_·(s)^{−1} is bounded above by 1, X_n is a square integrable martingale (see Andersen et al., 1993, pg. 181) and Theorem 1 can be used to derive the asymptotic distribution of X_n on [0, T]. Put M_n(t) = X_n(t ∧ T) and let J_n be the set of discontinuities of A. The marked point process μ_n recording the jumps of M_n is given by

μ_n(dx, dt) = I{t ≤ T} ∫_δ I{x ≠ 0} δ_{√n J(t) Y_·(t)^{−1} (δ_· − Y_·(t) ΔA(t))}(dx) N_n(dδ, dt) + I{t ≤ T} I{x ≠ 0} δ_{−√n J(t) ΔA(t)}(dx) (1 − ∫_δ N_n(dδ, dt)).

The compensator of μ_n, ν_n, is given by the same formula as above but with ν_n in place of N_n. For t a continuity point of A, ν_n(dx, dt) simplifies considerably to

ν_n(dx, dt) = I{x ≠ 0} Y_·(t) δ_{√n J(t)/Y_·(t)}(dx) A(dt).

Otherwise, note that F′_{n,i}(·, t) is the distribution of √n (B − Y_i(t) ΔA(t)) J(t)/Y_·(t), where B is a Bernoulli with success probability Y_i(t) ΔA(t), and λ_n(dt) places mass one at each point of {s: ΔA(s) ≠ 0}. This means that (3) can be written as a convolution and assumption 2b) is not necessary. Assumptions 1a), 1b) and 2a) all follow by the Glivenko-Cantelli theorem applied to n^{−1} Y_· and the assumption that (1 − F(T−))(1 − G(T−)) > 0. The expression in assumption 1a) is

∫_0^{t∧T} n J(s)/Y_·(s) I{J(s)/Y_·(s) > ε/√n} A^c(ds).

And to verify assumption 1b), note that Σ_{s∈J_n, s≤t∧T} Σ_i ∫ x² I{|x| > ε} F′_{n,i}(dx, s) λ_n({s}) equals

Σ_{s∈J_n, s≤t∧T} n J(s)/Y_·(s) (1 − A({s}))² A({s}) I{J(s)/Y_·(s) (1 − A({s})) > ε/√n} + Σ_{s∈J_n, s≤t∧T} n J(s)/Y_·(s) A({s})² (1 − A({s})) I{J(s)/Y_·(s) A({s}) > ε/√n}.

Since ∫_0^{t∧T} n J(s)/Y_·(s) (1 − A({s})) dA(s) converges in probability to ∫_0^{t∧T} ((1 − F(s−))(1 − G(s−)))^{−1} (1 − A({s})) dA(s), assumption 2a) holds. This is sufficient for finite dimensional convergence to a Gaussian martingale with covariance function

Γ(t) = ∫_0^{t∧T} ((1 − F(s−))(1 − G(s−)))^{−1} (1 − A({s})) dA(s).

Functional weak convergence using the Skorohod metric follows also, but this problem is nice enough that it is possible to prove functional convergence on the space of bounded functions on [0, T], called l∞[0, T]. All that is necessary is to prove asymptotic tightness. It is sufficient to prove that for each η, ε > 0 there exists a finite partition of [0, T], say 0 = t_1 < … < t_k = T, such that

lim sup_n P(max_i sup_{t∈[t_i, t_{i+1})} |M_n(t) − M_n(t_i)| > ε) < η

(van der Vaart and Wellner, 1993). This is easily enough done by choosing the partition to contain the larger jump points of A and using Lenglart's inequality (see Andersen et al., 1993). Gill (1980) proved a similar result for the Kaplan-Meier estimator of F by inserting an interval at the jump points of A. This theorem yields a quick proof in that setting also.
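The estimator of this example is easy to compute directly. The sketch below uses hypothetical data, not data from the paper: geometric lifetimes, so F is discrete and A has a persistent jump of size A({k}) = p at every integer k, with geometric censoring as well.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
p, q = 0.2, 0.1  # discrete hazards of death and of censoring

# Geometric lifetimes X and censoring times U on {1, 2, ...}; the cumulative
# hazard A jumps by p at each integer.
X = rng.geometric(p, size=n)
U = rng.geometric(q, size=n)
T = np.minimum(X, U)
observed = X <= U  # N_i jumps at X_in only when U_in >= X_in

# Nelson-Aalen: A_hat(t) = sum_{s <= t} dN.(s) / Y.(s), with 0/0 = 0.
times = np.arange(1, 6)
increments = []
for s in times:
    at_risk = np.sum(T >= s)              # Y.(s)
    deaths = np.sum((X == s) & observed)  # dN.(s)
    increments.append(deaths / at_risk if at_risk > 0 else 0.0)
A_hat = np.cumsum(increments)

# Each increment estimates the jump A({k}) = p = 0.2.
```

The jumps of Â at the integers do not shrink as n grows; they are exactly the persistent jumps that Theorem 1 accommodates and classical continuity-based CLTs exclude.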
3. The Proportional Hazards Model

The marked point process N_n is the composite of an n-variate multivariate counting process. The event times T_j are the ordered event times of the component counting processes and the mark at time T_j is simply δ_j, where δ_{ji} = 1 if the ith counting process jumps at time T_j and is zero otherwise. The ith counting process has intensity Y_i(t) e^{βᵀ Z_i(t)} λ(t) dt, where the covariate process Z_i is locally bounded and predictable and the baseline intensity λ is locally integrable. The process Y_i is also predictable and is the censoring process; that is, Y_i is one as long as it is possible to observe a jump of the ith process and zero thereafter. Since the components of a multivariate counting process cannot have common jumps, the component counting processes are conditionally independent. This implies that the marked point process has compensator

ν_n(dδ, dt) = I{δ_· ≥ 1} Π_{i=1}^n (Y_i(t) e^{βᵀ Z_i(t)})^{δ_i} λ(t) dt.

Note that ν^{(i)}_n is the intensity of the ith counting process. The derivative with respect to β of the natural log of Cox's partial likelihood (Andersen et al., 1993, pg. 483) is given by

∫ Σ_{i=1}^n δ_i (Z_i(t) − Σ_l Y_l(t) e^{βᵀ Z_l(t)} Z_l(t) / Σ_l Y_l(t) e^{βᵀ Z_l(t)}) N_n(dδ, dt).

The above is a local square integrable martingale, since the Z_i's are locally bounded. Define M_n by setting

H_i = Z_i(t) − Σ_l Y_l(t) e^{βᵀ Z_l(t)} Z_l(t) / Σ_l Y_l(t) e^{βᵀ Z_l(t)}.

Since λ_n is continuous, only assumptions 1) and 2a) need to be proved. These are the conditions of Rebolledo's theorem as stated in Andersen et al. (1993, pg. 83). This model can be developed as in the previous example above (see Andersen et al., 1993).
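The partial score above can be evaluated directly. A small sketch under hypothetical data (scalar covariate, exponential lifetimes with hazard e^{βZ_i}, uniform censoring); the bisection search is a crude stand-in for the Newton step usually used:

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta_true = 300, 0.7

# Hypothetical data: subject i has hazard exp(beta * Z_i); the at-risk
# process is Y_i(t) = I{T_i >= t}.
Z = rng.normal(size=n)
X = rng.exponential(1.0 / np.exp(beta_true * Z))  # lifetimes
U = rng.uniform(0.0, 2.0, size=n)                 # censoring times
T = np.minimum(X, U)
event = X <= U

def partial_score(beta):
    # Sum over observed event times of
    #   Z_i - sum_l Y_l(t) e^{beta Z_l} Z_l / sum_l Y_l(t) e^{beta Z_l}
    score = 0.0
    for t, z in zip(T[event], Z[event]):
        at_risk = T >= t
        w = np.exp(beta * Z[at_risk])
        score += z - np.sum(w * Z[at_risk]) / np.sum(w)
    return score

# The score is decreasing in beta, so bisection over [0, 2] locates its root.
lo, hi = 0.0, 2.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if partial_score(mid) > 0 else (lo, mid)
beta_hat = 0.5 * (lo + hi)
```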
5. APPENDIX
Proof of Theorem 1

For simplicity, the proof of convergence of the finite dimensional distributions is given for a one-dimensional local square integrable martingale, M_n, with M_n(0) = 0. The Cramér-Wold device can then be used to extend the result to higher dimensions. The superscripts c, d will denote the continuous, respectively discrete, parts of the martingale and compensators.

Intuitively, the conditional distribution of M_n(dt) given the past is a convolution of a continuous Gaussian increment M^c_n(dt), with mean zero and variance ⟨M^c_n⟩(dt), and a random variable M^d_n(dt) which assumes the value x − ∫ y ν_n(dy, dt) according to ν_n(dx, dt) and the value −∫ y ν_n(dy, dt) with probability 1 − ν_n(R, dt). Since M_n(0) = 0 implies that ∫ x ν_n(dx, {t}) is zero a.s., M^d_n({t}) will have conditional distribution ν_n(dx, {t}) + δ_0(dx)(1 − ν_n(R, {t})) (recall that δ_0 is a probability measure giving mass 1 to the point 0). The characteristic function (conditional on the past) of the random variable M^d_n(dt) is given by

e^{−iu ∫ x ν_n(dx, dt)} (1 + ∫ (e^{iux} − 1) ν_n(dx, dt));

its conditional mean is ∫ x ν_n(dx, dt) and its conditional variance is ∫ x² ν_n(dx, dt). This intuition helps one understand why the following proof works.
Theorem VIII.1.18 on page 418 of Jacod and Shiryaev (1987) gives a product integral which acts much as a characteristic function when the limiting process is a process of independent increments (as is the case here). This product integral is the product of the conditional characteristic functions of M_n(dt). Define the process of locally bounded variation

A_n(t; u) = −(1/2) u² ⟨M^c_n⟩(t) + ∫_0^t ∫ (e^{iux} − 1 − iux) ν_n(dx, ds).

Then the product integral is

G_n(t; u) = Π_{s≤t} (1 + A_n(ds; u)).

Since A_n is one dimensional and ∫ x ν_n(dx, {t}) is zero, G_n simplifies to

G_n(t; u) = exp(−(1/2) u² ⟨M^c_n⟩(t)) Π_{s≤t} (1 + ∫ (e^{iux} − 1) ν_n(dx, ds)) e^{−iu ∫ x ν^c_n(dx, ds)}

(Andersen et al., pg. 90, 1993). Theorem VIII.1.18 states that if G_n(t; u) converges in probability to exp(−(1/2) u² Γ(t)) for all u ∈ R and for all t ∈ D, then the finite dimensional distributions in D of M_n converge to the finite dimensional distributions in D of a Gaussian martingale with covariance function Γ.
Set Φ_n = exp(−(1/2) u² ⟨M^c_n⟩(t) − (1/2) u² ∫_0^t ∫ x² ν_n(dx, ds)) and Ψ_n = exp(∫_0^t ∫ (e^{iux} − 1 − iux + (1/2) u² x²) ν^c_n(dx, ds)). Of course, Φ_n converges in probability to exp(−(1/2) u² Γ(t)) by assumption 2a), and in the next paragraph it is proved that Ψ_n converges to one in probability. Consider

G_n(t; u) − Φ_n Ψ_n = exp(−(1/2) u² ⟨M^c_n⟩(t) + ∫_0^t ∫ (e^{iux} − 1 − iux) ν^c_n(dx, ds)) × [Π_{s≤t} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t} ∫ x² ν_n(dx, {s}))].

The term before the square brackets can be shown to be bounded in probability by using the results on Φ_n and Ψ_n. All that needs to be done is to show that Ψ_n goes to one in probability and that the above term in square brackets goes to zero in probability.
Since there is a constant C for which |e^{iux} − 1 − iux + u² x²/2| ≤ C (|x|³ ∧ x²), the exponent in Ψ_n can be bounded in absolute value by C ∫_0^t ∫ (|x|³ ∧ x²) ν^c_n(dx, ds). For ε small,

∫_0^t ∫ (|x|³ ∧ x²) ν^c_n(dx, ds) ≤ ∫_0^t ∫ x² I{|x| > ε} ν^c_n(dx, ds) + ε ∫_0^t ∫ x² I{|x| ≤ ε} ν^c_n(dx, ds).

But this goes to zero in probability because of assumption 1a), assumption 2a) and the fact that ε may be taken arbitrarily small.
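The constant C in the bound |e^{iux} − 1 − iux + u²x²/2| ≤ C(|x|³ ∧ x²) can be made explicit: the standard remainder estimates for the complex exponential give |e^{iy} − 1 − iy + y²/2| ≤ min(|y|³/6, y²) for all real y, so with y = ux one may take, for example, C = max(|u|³/6, u²). A quick grid check at u = 1:

```python
import numpy as np

u = 1.0
x = np.linspace(-50.0, 50.0, 200001)
y = u * x

# Third-order Taylor remainder of exp(iy).
lhs = np.abs(np.exp(1j * y) - 1.0 - 1j * y + y**2 / 2.0)
# Remainder bounds: |y|^3/6 is sharp for small arguments, y^2 for large ones.
rhs = np.minimum(np.abs(y) ** 3 / 6.0, y**2)

ok = np.all(lhs <= rhs + 1e-12)  # small slack for floating-point cancellation
```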
All that is left is to deal with the jumps of M_n. Intuitively, the term

Π_{s≤t} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t} ∫ x² ν_n(dx, {s}))

is just the difference between the conditional characteristic function of the jumps and the characteristic function of a normal random variable with the same variance as the jump.

Both terms above can be written as two products, the first over the persistent jumps (in J_n) and the second over the asymptotically negligible jumps (jumps not in J_n). Since A₁A₂ − B₁B₂ = (A₁ − B₁)A₂ + B₁(A₂ − B₂), all that is necessary is to prove that

Π_{s≤t, s∉J_n} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t, s∉J_n} ∫ x² ν_n(dx, {s})) →^P 0   (5)

and

Π_{s≤t, s∈J_n} (1 + ∫ (e^{iux} − 1) ν_n(dx, {s})) − exp(−(1/2) u² Σ_{s≤t, s∈J_n} ∫ x² ν_n(dx, {s})) →^P 0.   (6)

The assumptions (1), 1b) and 2b) are only used in proving (6) for the persistent jumps.
First (5) is proven, as it is the easier. Using Taylor series expansions, it is possible to prove that for any $\epsilon$,
$$\big|e^{iux} - 1 - iux + u^2x^2/2\,I\{|x|\le\epsilon\}\big| \le |ux|^3/6\,I\{|x|\le\epsilon\} + (ux)^2/2\,I\{|x|>\epsilon\}.$$
The notation to follow is made simpler by denoting $E_{n,s}$ as expectation with respect to the distribution $\nu_n(dx,\{s\}) + \delta_0(dx)\big(1-\nu_n(\Re,\{s\})\big)$. Note that this distribution has mean zero and variance $\sigma_n^2(s)$. Let $\{\epsilon_n\}_{n\ge 1}$ be a sequence converging to zero slowly enough that 1a) holds with $\epsilon_n$ in place of $\epsilon$. Using the last inequality given above, one gets
$$E_{n,s}(e^{iux}) - 1 + u^2/2\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) = \theta_{n,s}\Big(|u|^3\epsilon_n/6\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) + u^2/2\,E_{n,s}\big(x^2 I\{|x|>\epsilon_n\}\big)\Big)$$
for a $\theta_{n,s}$ which is bounded in absolute value by one. Denote
$$-u^2/2\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) + \theta_{n,s}\Big(|u|^3\epsilon_n/6\,E_{n,s}\big(x^2 I\{|x|\le\epsilon_n\}\big) + u^2/2\,E_{n,s}\big(x^2 I\{|x|>\epsilon_n\}\big)\Big)$$
by $\psi_{n,s}$. These terms $\psi_{n,s}$, $s\notin J_n$, possess the following nice properties:
1) $\sup_{s\notin J_n,\; s\le t}|\psi_{n,s}|$ converges to zero in probability, and
2) $\sum_{s\notin J_n,\; s\le t}|\psi_{n,s}|$ is bounded in probability.
These two properties follow from assumptions 1a) and 2a); the proof follows that in Chung (1974, p. 200).
Rewrite equation (5) as
$$\Bigg[\exp\Big(\sum_{s\notin J_n,\; s\le t}\ln\big(E_{n,s}e^{iux}\big) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)\Big) - 1\Bigg]\, e^{-\frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)}.$$
Since $\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)$ is bounded in probability, it is sufficient, in order to finish the proof of (5), to show that
$$\sum_{s\notin J_n,\; s\le t}\ln\big(E_{n,s}e^{iux}\big) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s) = \sum_{s\notin J_n,\; s\le t}\ln\big(1+\psi_{n,s}\big) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)$$
converges to zero in probability. Since $\sup_{s\notin J_n,\; s\le t}|\psi_{n,s}|$ converges to zero in probability, $|\ln(1+\psi_{n,s})-\psi_{n,s}| = \big|\sum_{m=2}^\infty \frac{(-1)^{m-1}}{m}\psi_{n,s}^m\big|$ can be bounded above by $\frac{|\psi_{n,s}|^2}{2}\sum_{m=2}^\infty (1/2)^{m-2} \le |\psi_{n,s}|^2$ on a set of probability going to one. Now,
$$\Big|\sum_{s\notin J_n,\; s\le t}\ln(1+\psi_{n,s}) + \frac{1}{2}u^2\sum_{s\notin J_n,\; s\le t}\sigma_n^2(s)\Big| \le \Big|\sum_{s\notin J_n,\; s\le t}\big(\ln(1+\psi_{n,s})-\psi_{n,s}\big)\Big| + \Big|\sum_{s\notin J_n,\; s\le t}\big(\psi_{n,s}+\tfrac{1}{2}u^2\sigma_n^2(s)\big)\Big|.$$
But $\big|\sum_{s\notin J_n,\; s\le t}\big(\ln(1+\psi_{n,s})-\psi_{n,s}\big)\big| \le \sum_{s\notin J_n,\; s\le t}|\psi_{n,s}|^2$ on a set of probability converging to one, and therefore converges to zero in probability. The term $\big|\sum_{s\notin J_n,\; s\le t}\big(\psi_{n,s}+\tfrac{1}{2}u^2\sigma_n^2(s)\big)\big|$ can also be shown to converge to zero, using assumptions 1a) and 2a).
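The logarithm estimate used above is the standard bound $|\ln(1+z)-z| \le |z|^2$ for complex $|z| \le 1/2$, obtained by summing the tail of the series. A quick numerical check (illustration only):

```python
import cmath
import random

# Check |ln(1+z) - z| <= |z|^2 for complex z with |z| <= 1/2,
# the estimate applied to ln(1 + psi_{n,s}) once sup |psi_{n,s}| is small.
random.seed(0)
for _ in range(10_000):
    r = random.uniform(0.0, 0.5)
    phi = random.uniform(0.0, 2.0 * cmath.pi)
    z = r * cmath.exp(1j * phi)
    assert abs(cmath.log(1 + z) - z) <= abs(z) ** 2 + 1e-12
print("log estimate verified")
```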
Now condition (6), for the persistent jumps, must be proved. Assumptions (1), 1b) and 2b) are sufficient. As in the above paragraph, one need only prove that $\sum_{s\in J_n,\; s\le t}\big(\ln\big[E_{n,s}(e^{iux})\big] + u^2\sigma_n^2(s)/2\big)$ goes to zero in probability. Note that for $s\in J_n$ the expectation $E_{n,s}$ is with respect to the distribution
$$\int\big(\ast_i F_{n,i}\big)(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\}) + \delta_0(dx)\big(1-\gamma_n(\{s\})\big)$$
and
$$\sigma_n^2(s) = \int\sum_i\int x^2\,F_{n,i}(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\}).$$
Further, recall that $\sum_i\int x\,F_{n,i}(dx,\eta,s) = 0$ for almost all $\eta$. If necessary, redefine the $F_{n,i}(\cdot,\eta,s)$ so that $\int x\,F_{n,i}(dx,\eta,s) = 0$. Using the same Taylor series argument as in the proof of (5), one gets $\int e^{iux}\,F_{n,i}(dx,\eta,s) - 1 = \psi_{n,s}(\eta,i)$ for
$$\psi_{n,s}(\eta,i) = -u^2/2\int x^2 I\{|x|\le\epsilon_n\}\,F_{n,i}(dx,\eta,s) + \theta_{n,s}^i\Big(|u|^3\epsilon_n/6\int x^2 I\{|x|\le\epsilon_n\}\,F_{n,i}(dx,\eta,s) + u^2/2\int x^2 I\{|x|>\epsilon_n\}\,F_{n,i}(dx,\eta,s)\Big)$$
where $\theta_{n,s}^i$ is bounded in absolute value by one. If $\big(\prod_i F_{n,i}\big)G_n$ can also be written as a product distribution, then the above integrals should be with respect to the $i$th distribution in the product. Or, if the expectation $E_{n,s}$ can be written as an expectation with respect to a product distribution, then the integrals in the Taylor series should be with respect to each distribution in the product. Then steps similar to the following can be taken to prove that $\sum_{s\in J_n,\; s\le t}\big|\ln\big[e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux})\big]\big|$ goes to zero in probability. Indeed, if $E_{n,s}$ can be written as an expectation with respect to a product distribution, then each of the integrals in the above expansion should be replaced by the corresponding distribution in the product, and assumptions 2b) and (*) are then no longer necessary.
All that is necessary to finish the proof is to show that $\sum_{s\in J_n,\; s\le t}\big|e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux}) - 1\big|$ goes to zero in probability. This is the case since, on the set where $\sum_{s\in J_n,\; s\le t}\big|e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux}) - 1\big| < 1/2$, the sum $\sum_{s\in J_n,\; s\le t}\big|\ln\big[e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux})\big]\big|$ is bounded above by twice $\sum_{s\in J_n,\; s\le t}\big|e^{u^2\sigma_n^2(s)/2}E_{n,s}(e^{iux}) - 1\big|$. This in turn is bounded above by
$$\sum_{s\in J_n,\; s\le t}\Big|\int\Big(\exp\Big\{\sum_i \ln\big(1+\psi_{n,s}(\eta,i)\big) + \tfrac{1}{2}u^2\sigma_n^2(s)\Big\} - 1\Big)\,G_n(d\eta,s)\,\gamma_n(\{s\})\Big| + \sum_{s\in J_n,\; s\le t}\big|\gamma_n(\{s\})-1\big|. \qquad (7)$$
The last term goes to zero in probability by assumption (*). If $|e^{g_1(v)+g_2(v)}-1| \le C$ for all $v$, $A$ is a set and $L$ is a positive measure, then the integral satisfies
$$\Big|\int\big(e^{g_1(v)+g_2(v)}-1\big)\,L(dv)\Big| \le C\,L\Big(\big\{|g_1(v)|I\{v\in A\}>\epsilon\big\}\cup\big\{|g_2(v)|>\epsilon\big\}\cup A^c\Big) + 2\int\big|g_1(v)I\{v\in A\}+g_2(v)\big|\,I\big\{|g_1(v)|I\{v\in A\}<\epsilon,\; |g_2(v)|<\epsilon,\; v\in A\big\}\,L(dv)$$
for $\epsilon$ sufficiently small. This in turn is bounded above by
$$C/\epsilon\int\big|g_1(v)I\{v\in A\}\big|\,L(dv) + C/\epsilon\int\big|g_2(v)\big|\,L(dv) + C\,L(A^c) + 2\int\big|g_1(v)I\{v\in A\}\big|\,L(dv) + 2\int\big|g_2(v)\big|\,L(dv).$$
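The factor 2 in the second term rests on the elementary estimate $|e^z-1| \le 2|z|$ for complex $|z| \le 1$, applied with $z = g_1(v)I\{v\in A\} + g_2(v)$, which is smaller than $2\epsilon$ on the indicated set. A quick numerical check of the estimate (illustration only):

```python
import cmath
import random

# Check |e^z - 1| <= 2|z| for complex |z| <= 1, the estimate behind the
# factor 2 in the inequality above.
random.seed(1)
for _ in range(10_000):
    r = random.uniform(0.0, 1.0)
    phi = random.uniform(0.0, 2.0 * cmath.pi)
    z = r * cmath.exp(1j * phi)
    assert abs(cmath.exp(z) - 1) <= 2.0 * abs(z) + 1e-12
print("exponential estimate verified")
```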
To apply this inequality to the first term in (7), let $\epsilon' > 0$ be arbitrary and set $g_1 = \sum_i\ln\big(1+\psi_{n,s}(\eta,i)\big) - \sum_i\psi_{n,s}(\eta,i)$, $A = \{\sup_i|\psi_{n,s}(\eta,i)|\le\epsilon'\}$, $L(d\eta) = G_n(d\eta,s)\,\gamma_n(\{s\})$, $C = 2e^{u^2\sigma_n^2(s)/2}$ and $g_2 = \sum_i\psi_{n,s}(\eta,i) + u^2\sigma_n^2(s)/2$. To finish the proof, the following must converge to zero in probability:
1) $\sum_{s\in J_n,\; s\le t}\int\big|\sum_i\ln\big(1+\psi_{n,s}(\eta,i)\big) - \sum_i\psi_{n,s}(\eta,i)\big|\,I\{\sup_i|\psi_{n,s}(\eta,i)|\le\epsilon'\}\,G_n(d\eta,s)\,\gamma_n(\{s\})$;
2) $\sum_{s\in J_n,\; s\le t}\int\big|\sum_i\psi_{n,s}(\eta,i) + u^2\sigma_n^2(s)/2\big|\,G_n(d\eta,s)\,\gamma_n(\{s\})$; and
3) $\sum_{s\in J_n,\; s\le t} G_n\big(\{\sup_i|\psi_{n,s}(\eta,i)|>\epsilon'\},s\big)\,\gamma_n(\{s\})$.
First consider 1). It is bounded above by $\sum_{s\in J_n,\; s\le t}\int\sum_i|\psi_{n,s}(\eta,i)|^2\,I\{\sup_i|\psi_{n,s}(\eta,i)|\le\epsilon'\}\,G_n(d\eta,s)\,\gamma_n(\{s\})$, which is equal to $O_p(1)\,\epsilon'\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$. Since $\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$ is bounded in probability as $n$ increases, 1) can be made as small as desired by reducing the value of $\epsilon'$. Next, using the definition of $\psi_{n,s}(\eta,i)$, note that 2) is bounded above by the sum of
$$u^2/2\sum_{s\in J_n,\; s\le t}\int\Big|\sum_i\int x^2\,F_{n,i}(dx,\eta,s) - \sigma_n^2(s)\Big|\,G_n(d\eta,s)\,\gamma_n(\{s\}),$$
$$u^2/2\sum_{s\in J_n,\; s\le t}\int\sum_i\int x^2 I\{|x|>\epsilon_n\}\,F_{n,i}(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\})$$
and $O_p(1)\,\epsilon_n\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$. All three terms converge to zero in probability. All that is left is 3). It is bounded above by a constant times
$$1/\epsilon'\sum_{s\in J_n,\; s\le t}\int\sum_i\int x^2 I\{|x|>\epsilon_n\}\,F_{n,i}(dx,\eta,s)\,G_n(d\eta,s)\,\gamma_n(\{s\})$$
plus a constant times $(1/\epsilon')\,\epsilon_n^2\sum_{s\in J_n,\; s\le t}\sigma_n^2(s)$. These two terms also converge to zero in probability.
To prove tightness in the Skorohod metric, assume that $D$ is dense in $[0,\tau]$. Theorem VI.5.17 of Jacod and Shiryaev (1987) implies that it is sufficient to show that the predictable variation of $M_n$ converges in the Skorohod metric. That is, there exists a nondecreasing deterministic function $\Gamma$ for which
1) $\langle M_n\rangle_{jj}(t) \xrightarrow{P} \Gamma_{jj}(t)$ for $j = 1,\ldots,d$, and
2) $\sum_{j=1}^d\sum_{s\le t}\big(\Delta\langle M_n\rangle_{jj}(s)\big)^2 \xrightarrow{P} \sum_{j=1}^d\sum_{s\le t}\big(\Delta\Gamma_{jj}(s)\big)^2$,
for all $t\in D$. The predictable variation of $M_n$ is given by $\langle M_n\rangle(\cdot) = \langle M_n^c\rangle(\cdot) + \int_0^{\cdot}\int xx^T\,\nu_n(dx,ds)$. So 1) is assumption 2a) and 2) is assumption 3).

To prove tightness in $\ell^\infty([0,\tau])$, we must prove that for any $\epsilon, \delta > 0$ there exists a partition $0 = t_1 < t_2 < \cdots < t_k = \tau$ for which
$$\limsup_n P\Big(\sup_i\sup_{t\in[t_i,t_{i+1})}\big|M_n(t)-M_n(t_i)\big| > \epsilon\Big) < \delta.$$
For a reference, see van der Vaart and Wellner (1993, chapter 1, section 4). Choose the partition to contain all of the (finitely many) jumps of $\Gamma$ which are of size greater than $\epsilon/4$. In addition, choose the partition fine enough that $\Gamma(t_{i+1}-) - \Gamma(t_i) < \epsilon/4$. To show that $\sum_i P\big(\sup_{t\in[t_i,t_{i+1})}|M_n(t)-M_n(t_i)| > \epsilon\big) < \delta$, employ Lenglart's inequality as proved in Jacod and Shiryaev (1987, p. 35). Assumption 3') is used in proving that $\langle M_n\rangle$ converges uniformly to $\Gamma$.
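As a reminder (a hedged restatement; see Jacod and Shiryaev, 1987, Lemma I.3.30, for the exact hypotheses), Lenglart's inequality in the form typically applied to a locally square integrable martingale $M$ reads:

```latex
P\Big(\sup_{s\le t}|M(s)| \ge \epsilon\Big)
  \;\le\; \frac{1}{\epsilon^2}\Big(a + E\big[\sup_{s\le t}\Delta\langle M\rangle(s)\big]\Big)
  \;+\; P\big(\langle M\rangle(t) \ge a\big),
\qquad \epsilon,\, a > 0.
```

It is applied here with $M = M_n(\cdot) - M_n(t_i)$ on each $[t_i, t_{i+1})$; the partition choice, together with the uniform convergence of $\langle M_n\rangle$ to $\Gamma$, controls both terms on the right.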
Proof of Corollary 1

Since (4) is assumed, 2) and 3) of Theorem 1 translate directly into assumptions 2) and 3) of this corollary. The equation in assumption 1b) of Theorem 1 is
$$n^{-1}\sum_{i=1}^n\sum_{s\in J_n,\; s\le t}\int H_i(x,z,\eta_s^i,s)\,H_i(x,z,\eta_s^i,s)^T\,I\big\{\big\|H_i(x,z,\eta_s^i,s)\big\| > \epsilon\sqrt{n}\big\}\,F_{n,i}(dx,z,s)\,G_{n\{i\}}(dz,s)\,\gamma_n(\{s\}).$$
Since there are no non-persistent jumps, the equation in assumption 1a) is
$$\int_0^t\int xx^T\,I\{\|x\|>\epsilon\}\,\nu_n^c(dx,ds).$$
The above can be expressed in terms of $\mu_n$:
$$\int_0^t\int n^{-1}\Big(\sum_{i=1}^n H(x_i,z_i,\eta_s^i,s)\,\delta_i\Big)\Big(\sum_{i=1}^n H(x_i,z_i,\eta_s^i,s)\,\delta_i\Big)^T\,I\Big\{\Big\|\sum_{i=1}^n H(x_i,z_i,\eta_s^i,s)\,\delta_i\Big\| > \epsilon\sqrt{n}\Big\}\,\mu_n(dx,dz,d\delta,ds).$$
Recall that no two of the $\delta_i$'s can be one at the continuity points of $\mu_n$, so that the summation over $i$ comes out of the squared term and of the indicator, resulting in assumption 1) of the corollary.
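The step of pulling the sum out of the square rests on the following identity: if at most one of the $\delta_i \in \{0,1\}$ is nonzero, then $\delta_i\delta_j = 0$ for $i\ne j$ and $\delta_i^2 = \delta_i$, so (writing $H_i$ as shorthand for $H(x_i,z_i,\eta_s^i,s)$)

```latex
\Big(\sum_{i=1}^n H_i\,\delta_i\Big)\Big(\sum_{j=1}^n H_j\,\delta_j\Big)^T
 = \sum_{i=1}^n H_iH_i^T\,\delta_i,
\qquad
I\Big\{\Big\|\sum_{i=1}^n H_i\,\delta_i\Big\| > \epsilon\sqrt{n}\Big\}
 = \sum_{i=1}^n \delta_i\, I\big\{\|H_i\| > \epsilon\sqrt{n}\big\},
```

both equalities holding pointwise on the event that at most one $\delta_i$ equals one.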
5. REFERENCES

Andersen, P.K., Ø. Borgan, R.D. Gill, and N. Keiding (1993). Statistical models based on counting processes. Springer Verlag, New York.

Arjas, E. and P. Haara (1992). Observation scheme and likelihood. Scand. J. Statist. 19, 111-132.

Chung, K.L. (1974). A course in probability theory. Academic Press, Inc., San Diego.

Gill, R.D. (1980). Censoring and stochastic integrals. Mathematical Centre Tract 124. Mathematisch Centrum, Amsterdam.

Grüger, J., R. Kay, and M. Schumacher (1991). The validity of inferences based on incomplete observations in disease state models. Biometrics 47, 595-605.

Jacod, J. and A.N. Shiryaev (1987). Limit theorems for stochastic processes. Springer Verlag, New York.

Liptser, R.S. and A.N. Shiryaev (1983). On the invariance principle for semi-martingales: the "nonclassical" case. Theory Prob. Appl. 28, 1-34.

Murphy, S.A. and B. Li (1993). Projected partial likelihood and its application to longitudinal data. To appear in Biometrika.

Scheike, T.H. (1994). Parametric regression for longitudinal data with counting process measurement times. Scand. J. Statist. 21, 245-264.

van der Vaart, A.W. and J.A. Wellner (1993). Weak convergence and empirical processes. IMS Lecture Notes-Monograph Series (to appear).
Susan A. Murphy
Department of Statistics, Pennsylvania State University,
326 Classroom Building, University Park, PA 16802, U.S.A.