GENERALIZED LONGITUDINAL DATA ANALYSIS, WITH
APPLICATION TO EVALUATING HOSPITAL UTILIZATION BASED
ON ADMINISTRATIVE DATABASE
Shih-Wa Celes Ying
B.Sc. in Statistics and Actuarial Science, Simon Fraser University, 2004
A PROJECT SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in the School
of
Statistics and Actuarial Science
© Shih-Wa Celes Ying 2006
SIMON FRASER UNIVERSITY
Summer 2006
All rights reserved. This work may not be
reproduced in whole or in part, by photocopy
or other means, without the permission of the author.
APPROVAL
Name: Shih-Wa Celes Ying
Degree: Master of Science
Title of project: Generalized Longitudinal Data Analysis, with Application
to Evaluating Hospital Utilization based on Administrative
Database
Examining Committee: Dr. Richard Lockhart
Chair
Dr. X. Joan Hu Senior Supervisor Simon Fraser University
Dr. Rachel Altman Supervisor Simon Fraser University
Dr. John Spinelli External Examiner BC Cancer Agency and Simon Fraser University
Date Approved: 3,14 I$. 1006
SIMON FRASER UNIVERSITY LIBRARY
DECLARATION OF PARTIAL COPYRIGHT LICENCE
The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.
The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.
The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.
It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.
Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.
The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.
Simon Fraser University Library Burnaby, BC, Canada
SIMON FRASER UNIVERSITY LIBRARY
STATEMENT OF ETHICS APPROVAL
The author, whose name appears on the title page of this work, has obtained, for the research described in this work, either:
(a) Human research ethics approval from the Simon Fraser University Office of Research Ethics,
(b) Advance approval of the animal care protocol from the University Animal Care Committee of Simon Fraser University;
or has conducted the research
(c) as a co-investigator, in a research project approved in advance,
(d) as a member of a course approved in advance for minimal risk human research, by the Office of Research Ethics.
A copy of the approval letter has been filed at the Theses Office of the University Library at the time of submission of this thesis or project.
The original application for approval and letter of approval are filed with the relevant offices. Inquiries may be directed to those authorities.
Simon Fraser University Library Burnaby, BC, Canada
Abstract
There are many practical situations where subjects can experience recurrence of an event,
the event has non-negligible duration, and both the rate of the event occurrences and the
accumulative event duration are of particular interest. Well-developed methods for recurrent
events analysis do not take into account the event duration, which could lead to undesirable
inferences in such situations. Motivated partly by the research project with BC Cancer
Agency to evaluate the hospital utilization of young cancer survivors, we develop a method
to analyze recurrent event data with adjustment for event duration. Our methodology
can be viewed as an extension of the well-established approaches for recurrent events. We
also propose an approach to fitting semiparametric models for a general response process,
which includes counting process as a special case. Data from the cancer project are used
throughout the thesis to illustrate our formulation and approaches.
Keywords: counting process, data collected over time, event-duration, semiparametric
regression model
To my father and my mother
Acknowledgements
It is never enough for me to thank my Supervisor, X. Joan Hu, who guided me through the
program with patience and encouragement, helped me build up my confidence and reach for
high standards. Without her time devoted to it, this thesis would not have been completed.
I would like to acknowledge the insightful comments and suggestions made by my thesis
committee members Rachel Altman and John Spinelli.
It brings me great pleasure to record my thanks to John Spinelli and Mary McBride for
providing me such a nice opportunity to work at BC Cancer Agency under their supervision,
which partly motivated my research interest and made the completion of my thesis
possible. Special thanks go to Gurbakhshash Singh and Maria Lorenzi, who helped me and
gave me excellent advice along the way.
I am indebted to Professors Charmaine Dean, Richard Lockhart, and Tim Swartz, who
have taught me statistics in different fields. I express my appreciation to my fellow graduate
students Lihui, Pritam, Jean, Eric, Darcy, Suman, Amy, Farouk, and Darby, who have been
helpful and kind to me through my graduate study in Simon Fraser University.
To my friends, Chunfang Lin, Lucy Liu, Lisa Wang, and Lijing Xie, thank you for
listening to me, sharing with me, and being with me. My life could not be so colorful
without you.
Finally, I am grateful to my family for care, support and encouragement. To my par-
ents, I cannot thank enough for enlightening my life. To my brother, Leon, thank you for
supporting me and making me happier.
Contents
Approval
Abstract
Dedication
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 General Framework
1.3 Overview
2 Childhood Cancer Survivorship Project
2.1 Data Sources
2.2 Childhood Cancer Hospitalization Data
2.2.1 Potential Risk Factors
2.2.2 Data Formulation and Construction
3 Analysis of Recurrent Events
3.1 Review of Cox Proportional Hazards Model
3.2 Extensions of Cox PH Model for Recurrent Events
3.2.1 Notation
3.2.2 Various Models
3.2.3 Estimation Procedures
3.2.4 Variance Estimation
3.3 Analysis of Childhood Cancer Data I
3.3.1 Time-to-Event Formulation
3.3.2 Analysis Results: Hospital Admissions
3.3.3 Model Checking
3.4 Extensions of Cox PH Model for Frequency of Events Adjusted for Event Duration
3.4.1 Adjustment for Event Duration I
3.4.2 Adjustment for Event Duration II
3.5 Analysis of Childhood Cancer Data II
3.5.1 Comparison of Analysis Results: Recurrent Events versus Event-Duration Adjustment
4 Analysis of Generalized Longitudinal Data
4.1 Analysis of Hospital Duration Using Counting Process Formulation
4.1.1 R Data Format for Hospital Duration
4.1.2 Analysis Results for Day in Hospital Using R Built-in Function
4.2 General Response Processes
4.2.1 Methodology Inheritance
4.3 Analysis of Childhood Cancer Data III
4.3.1 Analysis Results: Prevalence of Hospitalization
4.3.2 Analysis Results: Hospital Duration
4.4 Nonparametric Estimation
4.4.1 Estimation Based on Observed Sample Mean (OSM)
4.4.2 Analysis Results Based on OSM for Childhood Cancer Data
5 Final Remarks
Appendices
A Estimation Results Using R Built-in Function
B Data Input in C Program
C Estimation Results Based on C Program
Bibliography
List of Figures
1.1 Response Processes with Discrete Time and Continuous Time
3.1 Model Flow-Chart
3.2 Strata Flow-Chart
3.3 Estimates Based on Andersen-Gill Model vs. Nelson-Aalen Estimates
3.4 Estimates Based on Prentice-Williams-Peterson Model vs. Nelson-Aalen Estimates
4.1 Hypothetic Examples
4.2 Observed Sample Mean (OSM) with Three Types of Response of Interest
5.1 Graphical Display of Multi-State Transitions
List of Tables
2.1 Cross-Classification of Gender by Diagnosis Period
2.2 Frequency of Cancer Diagnoses
3.1 Original Data Format
3.2 Time-to-Event Interval Format
3.3 Estimates of $\beta$ in Model (4) and Its Special Cases
3.4 Estimates of $\beta$ in Model (5) and Its Special Cases
3.5 Time-to-Event Interval Format Adjusted for Event Duration
3.6 Estimates of $\beta$ in Model (4) and Its Special Cases with Additional Time-Dependent Covariate under Event-Duration Adjustment
3.7 Estimates of $\beta$ in Model (5) and Its Special Cases with Additional Time-Dependent Covariate under Event-Duration Adjustment
3.8 Estimates of $\beta$ in Model (4) and Its Special Cases under Event-Duration Adjustment
3.9 Estimates of $\beta$ in Model (5) and Its Special Cases under Event-Duration Adjustment
4.1 Hypothetic Data for an Arbitrary Individual
4.2 Time-to-Event Interval Format for Hospital Duration
4.3 Definition of Stratification to Five Strata
4.4 Estimates of $\beta$ for Hospital Duration Based on Day in Hospital
A.1 Estimates of $\beta$ for Hospital Admissions without Event-Duration Adjustment: Model (4) Related Estimates with Robust Standard Errors
A.2 Estimates of $\beta$ for Hospital Admissions without Event-Duration Adjustment: Model (5) Related Estimates with Robust Standard Errors
A.3 Estimates of $\beta$ for Hospital Admissions with Additional Time-Dependent Covariate under Event-Duration Adjustment: Model (4) Related Estimates with Robust Standard Errors
A.4 Estimates of $\beta$ for Hospital Admissions with Additional Time-Dependent Covariate under Event-Duration Adjustment: Model (5) Related Estimates with Robust Standard Errors
A.5 Estimates of $\beta$ for Hospital Admissions under Event-Duration Adjustment: Model (4) Related Estimates with Robust Standard Errors
A.6 Estimates of $\beta$ for Hospital Admissions under Event-Duration Adjustment: Model (5) Related Estimates with Robust Standard Errors
A.7 Estimates of $\beta$ for Hospital Duration Based on Day in Hospital with Estimated Robust Standard Errors
B.1 Input Model Specification and Response Process in C Program
C.1 C Program Output for $\beta$ Estimates with Several Models
C.2 C Program Output for $\beta$ Estimates with Several Models under Event-Duration Adjustment
C.3 C Program Output for $\beta$ Estimates in Model (5a) for Hospital Status
C.4 C Program Output for $\beta$ Estimates with Several Models for Hospital Duration
Chapter 1
Introduction
1.1 Motivation
Health services play a crucial role in people's everyday life, and it is important to evaluate
health utilization, such as hospitalization, especially for those who are diagnosed with serious
diseases, say cancer. Risk factor identification for health services in a research project could
eventually be translated and delivered to policymakers and care providers for improvement
of the nation's quality of life.
The well-known Cox proportional hazards model (Cox, 1972) can be applied directly
to study risk factors for the time to the first hospitalization. To analyze recurrent hospital
admissions, we can apply well-developed methods for recurrent events. In particular, Andersen
and Gill (1982) assume that the event counts over time satisfy a Poisson assumption and
propose an estimation approach accordingly, whereas Prentice, Williams and Peterson
(1981) consider a stratified proportional intensity type of model, which allows us to study
covariate effects within each stratum as well as the overall effects. Other intermediate models
are considered by combining the simplicity of the Andersen and Gill (AG) model and the
stratified property of the Prentice, Williams and Peterson (PWP) model to further investigate
some scientific questions of interest.
Those conventional methods for recurrent event data analysis deal with "point" events;
that is, an event happens only at a time point, and the subject is again at risk for
the next occurrence of the event right after that time point. This situation is very common
for recurrent events such as repeated asthma attacks, which occur instantaneously with
negligible duration. However, this is not the case for, for example, hospital admissions
of childhood cancer patients. Complex treatments and careful surveillance are needed after
patients are admitted into hospital, hence patients usually stay in hospital for a period of time.
It is impossible for a patient to have another hospital admission during the time he/she is
in hospital. For this type of event, conventional methods need to be modified to take event
duration into account.
In addition to the use of hospital admissions as an evaluation of health utilization,
other quantities such as prevalence of hospitalization and hospital duration are appealing
to investigators as well. We therefore extend the response process from a counting process
to a general response process to gain the flexibility and possibility of evaluating health
utilization from different aspects.
This thesis reviews available methods in recurrent events analysis, and presents an
extension with adjustment for event duration. In addition, we consider different types of
response, i.e., generalized longitudinal data, not restricting ourselves to count data.
1.2 General Framework
The notation used throughout the rest of the thesis is briefly introduced here. We denote the
response process by $\{N(t) : 0 < t < \infty\}$ to emphasize that the events of interest are counts, where
$N(t)$ is the cumulative number of failures/events prior to time $t$ for an arbitrary subject. We
later extend the response process to a general process and denote it by $\{X(t) : 0 < t < \infty\}$,
which includes the counting process as a special case. The following special cases serve
as examples of $\{X(t) : t > 0\}$. A graphical illustration of these response processes is
displayed in Figure 1.1.
Example 1. Survival Process of Time to the First Hospital Admission: Let $T > 0$ be
the time to an event of interest (i.e., the first hospital admission) and define the response
process as $X(t) = I(T \le t)$. That is,
$$X(t) = \begin{cases} 1 & \text{if } T \le t, \\ 0 & \text{otherwise.} \end{cases}$$
The continuous-time panel of Figure 1.1 (a) shows the survival process.
Example 2. Counting Process as Accumulative Counts of Hospitalization Over Time:
Let the event of interest be hospital admission and define the response process $X(t)$ to be the
number of hospital admissions up to time $t$. Figure 1.1 (b) illustrates the counting process.
Example 3. Alternating Binary Process as an Indicator of Hospital Status Over Time:
Using two states to capture the hospitalization status of an individual (i.e., whether the patient
is in hospital or not), define the response process as an indicator of the patient's hospital status,
$X(t) = I(\text{in hospital})$. That is,
$$X(t) = \begin{cases} 1 & \text{if the patient is in hospital,} \\ 0 & \text{otherwise.} \end{cases}$$
See Figure 1.1 (c) for an illustration of the alternating binary process.
Example 4. Response Process as Cumulative Hospital Duration: Whether time is measured
on a discrete or a continuous scale, accumulating the alternating binary process
over time gives the response process for hospital duration. Figure 1.1 (d) shows the response
process for hospital duration over time.
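The four response processes in Examples 1 to 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the made-up stays, and the convention that a patient is in hospital on $[\text{admission}, \text{discharge})$ are assumptions introduced here, not part of the thesis.

```python
from bisect import bisect_right

def survival_process(t, admissions):
    """Example 1: X(t) = I(T <= t), with T the first admission time."""
    return 1 if admissions and min(admissions) <= t else 0

def counting_process(t, admissions):
    """Example 2: number of admissions up to and including time t (assumed convention)."""
    return bisect_right(sorted(admissions), t)

def in_hospital(t, stays):
    """Example 3: 1 if t lies in some [admission, discharge) stay, else 0."""
    return 1 if any(a <= t < d for a, d in stays) else 0

def cumulative_duration(t, stays):
    """Example 4: total time spent in hospital up to time t."""
    return sum(max(0, min(t, d) - a) for a, d in stays)

# Hypothetical subject: (admission day, discharge day) on the subject's own clock.
stays = [(78, 100), (254, 257), (288, 314)]
admissions = [a for a, _ in stays]

for t in (50, 90, 300):
    print(t,
          survival_process(t, admissions),
          counting_process(t, admissions),
          in_hospital(t, stays),
          cumulative_duration(t, stays))
```

The cumulative duration is simply the alternating binary process summed (or integrated) over time, which is why the same list of stays drives both functions.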
A censoring indicator, $Y(t) = I(C \ge t)$, where $C$ is the censoring time, is introduced to
indicate whether the subject is still in the study at time $t$ of his/her follow-up period.
Suppose there are $p$ factors of interest, and let $Z(t) = (Z_1(t), \ldots, Z_p(t))'$ denote the vector
of covariates available at time $t > 0$. The corresponding covariate process
is denoted by $\{Z(t) : t > 0\}$. We model the intensity function, the instantaneous rate of
an event given the covariate process and the response history up to the current time, and estimate the
parameters in the models to make inferences and to address scientific questions of interest.
1.3 Overview
Adjustment for event duration and the extension of the response process are motivated partly
by the research project with BC Cancer Agency to evaluate the hospital utilization of childhood
cancer survivors. This thesis studies data collected over time, with particular attention
to the estimation of covariate effects. It is organized as follows. Chapter 2 provides the
background of the childhood cancer data and the available hospital records, based on an ongoing
research project at BC Cancer Agency. These data are analyzed for illustration using
the approaches proposed in the thesis as well as the well-established approaches. Chapter
3 reviews the Cox proportional hazards model, the counting process formulation, and the multiplicative
risk models that have been proposed in the literature to address recurrent events.
Extensions of existing Cox regression models are considered, and estimation procedures for
the parameters as well as variance estimation are then provided. Chapter 4 extends the
approaches for the counting process to a general response process. Inheritance and modification
of the methodology from Chapter 3 are presented. Chapter 5 gives conclusions and
provides remarks on further investigations.
Chapter 2
Childhood Cancer Survivorship
Project
Although historical records show that only 1% of all cancers in Canada are diagnosed in the age range
0 to 19 years, childhood cancer is among the top three leading causes of death for
children aged 1 to 14. Recently, successful improvements in treatment have increased
survival to almost 80%, compared to the mid 90's, when there was only a little hope of being
cured (Hewitt, Weiner and Simone, 2003). As a consequence, the population of childhood
cancer survivors has grown dramatically. The primary goal of a health services research
project with BC Cancer Agency is to address issues related to assessment of long-term
resource needs and development of strategies to improve access and effectiveness of care.
To accomplish this, statistical methods are required to analyze health services utilization
for the young cancer survivors.
2.1 Data Sources
For the growing population of cancer survivors, it is important to describe the prevalence
and patterns of health services utilization as well as the relative risk of hospitalization
associated with several factors.
The data sources are based on the availability of population-based files of vital events,
cancer, and health care utilization in Canada. Health risks and utilization
are analyzed using linked health files that originate from the BC Vital Statistics
Agency, BC Cancer Registry, and BC Ministry of Health. These source files are administered
by the Linked Health Database of the Center for Health Services and Policy Research
(CHSPR) at the University of British Columbia.
An unselected, geographically defined patient group, a virtually complete set of treatment
data, and individual-level utilization data are available with sufficient completeness,
detail and reliability to examine health risks and utilization. Approximately 90% of
health services are covered by provincial health insurance (Health Insurance BC, or Medical
Services Plan), which includes all medically required services for each individual from
1986 to the present. Because of the person-based, longitudinal nature of the data files, the
administrative datasets used here contain more information on severe late effects and
health services utilization than studies using self-reports or medical records.
2.2 Childhood Cancer Hospitalization Data
Residents of the province of British Columbia who were diagnosed with cancer at age 0
to 19 between 1970 and 1995 and survived at least 5 years after diagnosis are included in
the childhood cancer survivor cohort. The study is restricted to these five-year survivors
in order to evaluate their long-term resource needs, as this
group is at a high risk of complications of their cancer and its treatment.
Since the database of hospitalization records exists only from 1986 onwards, we consider
only childhood cancer survivors who were diagnosed with cancer from 1981 to 1995, so
that their health services information can be fully assessed from their 5-year survival
date. The resulting study group for the analysis of hospital utilization
consists of 1375 childhood cancer survivors.
2.2.1 Potential Risk Factors
Several potential factors are associated with health services utilization. They include disease
factors such as initial diagnosis and treatment, demographic factors such as gender, socioeconomic
status (SES) and time since diagnosis, and location of services (geographic region,
urban/rural). Of those potential factors, four covariates will be used in the thesis for
illustrative purposes: cancer diagnosis, age, gender, and diagnosis period.
The cancer diagnosis group is defined according to the International Classification of
Childhood Cancer (ICCC), which is based on the third edition of the International Classification
of Diseases for Oncology (ICD-O-3) published in 2000. There are seven groups
of interest: leukemia (ICCC I), lymphoma (ICCC II), central nervous system
(CNS: ICCC III), kidney (ICCC VI), bone (ICCC VIII or IX), carcinoma (ICCC X or
XI), and others (ICCC IV, V, VII, and XII).
The individual's age at diagnosis is calculated as (diagnosis date - birth date)/365.25
and, in the following analysis, is standardized by subtracting the mean age at cancer
diagnosis, 9.51 years. For gender, we select female as the baseline, so the
coding scheme is 0 for female (baseline) and 1 for male.
Diagnosis period is the calendar period of diagnosis. Note that the whole group contains
survivors diagnosed with cancer between 1981 and 1995 inclusive. We further
categorize the diagnosis period into 3 categories: (1981, 1985), (1986, 1990), (1991, 1995),
and choose the earliest calendar period as the baseline.
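The covariate coding described above can be sketched as follows. This is a minimal illustration, not the thesis's own code: the function names are hypothetical, and the numeric labels 1, 2, 3 for the three diagnosis periods (with 1 the earliest, baseline period) are an assumption.

```python
from datetime import date

MEAN_AGE_AT_DIAGNOSIS = 9.51  # cohort mean age at diagnosis in years, from the text

def standardized_age(birth, diagnosis, mean_age=MEAN_AGE_AT_DIAGNOSIS):
    """(diagnosis date - birth date) / 365.25, centred at the cohort mean."""
    age = (diagnosis - birth).days / 365.25
    return age - mean_age

def gender_code(gender):
    """0 for female (baseline), 1 for male."""
    return {"female": 0, "male": 1}[gender]

def diagnosis_period(year):
    """1, 2, 3 for 1981-1985, 1986-1990, 1991-1995 (labels assumed)."""
    if not 1981 <= year <= 1995:
        raise ValueError("diagnosis year outside the 1981-1995 cohort")
    return (year - 1981) // 5 + 1

# Hypothetical subject: a male born 1980-01-01 and diagnosed 1990-01-01.
birth, diagnosis = date(1980, 1, 1), date(1990, 1, 1)
print(round(standardized_age(birth, diagnosis), 3),
      gender_code("male"),
      diagnosis_period(diagnosis.year))
```

Centring age at the cohort mean leaves the age coefficient unchanged and only shifts the reference point, so that the baseline corresponds to a subject of average age at diagnosis.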
The information on these covariates of interest for the 1375 study subjects is
stored in the data in the following form:

Study.ID  iccc  std.age  gender  calendar
xxx01     3     2.144    0       3
xxx02     10    10.243   1       1
xxx03     4     8.806    1       1

where the variables (from left to right) correspond to the unique identification number of the childhood
cancer survivor, diagnosis group of cancer, standardized age at diagnosis, gender, and diagnosis
period respectively.
Table 2.1: Cross-Classification of Gender by Diagnosis Period

Diagnosis Period (calendar)  (1981, 1985)  (1986, 1990)  (1991, 1995)  Total
(gender) 214
Total                        376           463           536           1375

Table 2.2: Frequency of Cancer Diagnoses

Cancer Diagnosis  Leukemia  Carcinoma  Lymphoma  CNS  Kidney  Bone  Other  Total
Frequency         337       237        230       259  71      146   95     1375
2.2.2 Data Formulation and Construction
In cohort studies, the follow-up time is essential for the analysis. In the analysis of hospital
utilization, the study period for each individual is defined as follows: the study starts
5 years after the initial diagnosis, and all hospitalization
information, such as admission and discharge dates, is available for each individual during
the follow-up until the end of the study (December 31, 2000), the date of death, or the cancellation
of BC Insurance registry, whichever is earliest.
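As a hedged sketch of this definition, the follow-up window for one subject could be computed as below. The function and argument names are hypothetical, and interpreting "5 years after diagnosis" as five calendar years (with 29 February mapped to 28 February) is an assumption made here for illustration.

```python
from datetime import date

STUDY_END = date(2000, 12, 31)  # end of the study, as stated in the text

def add_years(d, years):
    """Shift a date by whole calendar years (29 February maps to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def follow_up_window(diagnosis, death=None, coverage_cancelled=None):
    """Return (start, end): start is 5 years after diagnosis; end is the
    earliest of study end, death, and cancellation of insurance coverage."""
    start = add_years(diagnosis, 5)
    candidates = [STUDY_END] + [d for d in (death, coverage_cancelled) if d]
    return start, min(candidates)

# Hypothetical subject whose insurance coverage ended before the study did.
start, end = follow_up_window(date(1990, 3, 15),
                              coverage_cancelled=date(1998, 6, 30))
print(start, end, (end - start).days)
```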
Because an individual may have more than one hospital admission, and hence more than
one discharge from hospital, over the follow-up period, each subject has
as many rows in the data set as needed to capture his/her dates of hospital admission
along with the discharge dates. The data format is as follows:
Study.ID  day.entry  day.exit  days.adm  days.rel
xxx01     0          151       NA        NA
xxx02     0          915       NA        NA
xxx03     0          1210      78        100
xxx03     0          1210      254       257
xxx03     0          1210      288       314
where the variables (from left to right) capture the unique identification number of the childhood
cancer survivor, beginning of study period, end of study period, admission time, and discharge
time respectively. We apply a Julian date format and set the internal clock for each subject
so that day.entry always starts at zero. Other time-dependent information is shifted
correspondingly.
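The shift to a subject-specific internal clock can be illustrated with the following sketch. The calendar dates are made up, and the function name and the use of `None` for a missing admission are assumptions; only the column layout follows the data format shown above.

```python
from datetime import date

def to_internal_clock(entry, exit_, stays):
    """Express one subject's dates as days since study entry.

    Returns rows (day.entry, day.exit, days.adm, days.rel); a subject with
    no hospital stay gets a single row with None in the last two fields.
    """
    day_exit = (exit_ - entry).days
    if not stays:
        return [(0, day_exit, None, None)]
    return [(0, day_exit, (adm - entry).days, (rel - entry).days)
            for adm, rel in stays]

# Hypothetical calendar dates for one subject with two hospital stays.
entry, exit_ = date(1996, 1, 1), date(1999, 4, 25)
stays = [(date(1996, 3, 19), date(1996, 4, 10)),
         (date(1996, 9, 11), date(1996, 9, 14))]

rows = to_internal_clock(entry, exit_, stays)
for row in rows:
    print(*row)
```

Because every subject's clock starts at zero, follow-up times are directly comparable across subjects regardless of calendar date of entry.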
By combining the hospital service utilization database with disease and demographic
information, we obtain complete information on long-term outcomes and the risk factors,
which meets the purpose of the analysis of hospital utilization for childhood cancer
survivors.
Chapter 3
Analysis of Recurrent Events
In this chapter, we begin with a brief review of Cox proportional hazards (PH) model (Cox,
1972). We then study extensions of the model for dealing with recurrent event data, often
referred to as Cox's regression models, some of which have been discussed in the literature
and some have not. Procedures for estimating the parameters in the models are presented.
Motivated by the consideration to address hospital stay in the BC Cancer research project,
we propose approaches to analyzing recurrent events with adjustment for event duration.
Analyses of the data described in Chapter 2 are given as an illustration for the models and
for applying the estimation procedures.
3.1 Review of Cox Proportional Hazards Model
Assume there are $n$ subjects in a study and we are interested in the effects of several
covariates on the survival of these subjects. Let $T_i$ be the time from study entry to the
event of interest for subject $i$, $C_i$ be the censoring time of the event for subject $i$, and $U_i$ be
the corresponding observation time, that is, $U_i = \min(T_i, C_i)$. Let $\delta_i = I(T_i \le C_i)$ be the
indicator variable that takes the value 1 if we observe the lifetime $T_i$ and 0 if not. Denote by
$h(t \mid Z)$ the hazard function of $T$ conditional on the covariate vector $Z$, and suppose
$$h(t \mid Z) = h_0(t)\, g(Z; \beta),$$
where the nonnegative function $h_0(t)$ is the baseline hazard function, left unspecified, and
$\beta$ is a vector of unknown parameters or coefficients corresponding to $Z$. Under this
multiplicative hazards model, the factorization implies that, for two subjects $i$ and $j$,
$$\frac{h(t \mid Z_i)}{h(t \mid Z_j)} = \frac{g(Z_i; \beta)}{g(Z_j; \beta)}.$$
This indicates that the model assumes that the hazards of two subjects with fixed covariate vectors
are proportional over time. A common choice for $g(Z; \beta)$ is $e^{\beta' Z}$, which yields the hazard
function $h(t \mid Z) = h_0(t)\, e^{\beta' Z}$, that is, Cox's proportional hazards model.
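As a short worked step, consider two subjects with fixed covariate vectors, labelled $Z_i$ and $Z_j$ here for illustration. Under the exponential choice of $g$, the baseline hazard cancels in the ratio:

```latex
\frac{h(t \mid Z_i)}{h(t \mid Z_j)}
  = \frac{h_0(t)\, e^{\beta' Z_i}}{h_0(t)\, e^{\beta' Z_j}}
  = \exp\{\beta'(Z_i - Z_j)\}
```

The ratio does not depend on $t$. In particular, if the two subjects differ only in a binary covariate such as gender (coded 0 and 1), the hazard ratio is the exponential of the corresponding coefficient.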
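Assume $n$ iid observations, say $(u_i, \delta_i, Z_i)$, $i = 1, 2, \ldots, n$. The likelihood function
$L(\beta, h_0)$ can be decomposed as the product of $L_1(\beta)$ and $L_2(\beta, h_0)$. Cox (1972) proposes to
make inferences about $\beta$ based on the partial likelihood $L_1(\beta)$, which is not a likelihood in
the usual sense (Klein and Moeschberger, 2003). The partial likelihood suggested by Cox
(1975) has the following form:
$$L_1(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp\{\beta' Z_i(u_i)\}}{\sum_{j \in R(u_i)} \exp\{\beta' Z_j(u_i)\}} \right]^{\delta_i},$$
where the risk set, $R(u_i) = \{j : u_j \ge u_i\}$, is the set of subjects that are alive and at risk
just prior to time $u_i$, and $Z_j(u_i)$ is observed on $[0, u_i]$ for subject $j$. It can be shown that
the resulting estimator for $\beta$ is consistent and asymptotically normal.
The logarithm of the above partial likelihood has the form
$$l(\beta) = \sum_{i=1}^{n} \delta_i \left[ \beta' Z_i(u_i) - \log \sum_{j \in R(u_i)} \exp\{\beta' Z_j(u_i)\} \right].$$
Let $\hat{\beta}$ be the value that maximizes the partial likelihood function. Breslow (1972) suggests
the estimator
$$\hat{H}_0(t) = \sum_{i : u_i \le t} \frac{\delta_i}{\sum_{j \in R(u_i)} \exp\{\hat{\beta}' Z_j(u_i)\}}$$
for the underlying cumulative hazard $H_0(t) = \int_0^t h_0(s)\,ds$.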
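To make the partial likelihood and the Breslow-type estimator concrete, here is a minimal pure-Python sketch. It assumes fixed (time-independent) covariates, no tied event times, and a risk set of the form $\{j : u_j \ge u_i\}$; the function names and the toy data are hypothetical and are not taken from the thesis or from any library.

```python
import math

def _risk_sum(i, beta, u, Z):
    """Sum of exp(beta'Z_j) over the risk set {j : u[j] >= u[i]}."""
    return sum(
        math.exp(sum(b * z for b, z in zip(beta, Z[j])))
        for j in range(len(u))
        if u[j] >= u[i]
    )

def log_partial_likelihood(beta, u, delta, Z):
    """Cox log partial likelihood; only observed events (delta = 1) contribute."""
    total = 0.0
    for i in range(len(u)):
        if delta[i] == 1:
            eta_i = sum(b * z for b, z in zip(beta, Z[i]))
            total += eta_i - math.log(_risk_sum(i, beta, u, Z))
    return total

def breslow_cumulative_hazard(t, beta, u, delta, Z):
    """Breslow-type estimate of H0(t), evaluated at the supplied beta."""
    return sum(
        1.0 / _risk_sum(i, beta, u, Z)
        for i in range(len(u))
        if delta[i] == 1 and u[i] <= t
    )

# Toy data: observation times, event indicators, one binary covariate.
u = [2.0, 3.0, 5.0, 7.0]
delta = [1, 0, 1, 1]
Z = [[1.0], [0.0], [1.0], [0.0]]

print(log_partial_likelihood([0.0], u, delta, Z))
print(breslow_cumulative_hazard(6.0, [0.0], u, delta, Z))
```

In practice the coefficient passed to the cumulative hazard function would be the maximizer of the log partial likelihood, found numerically (for instance by Newton-Raphson); the sketch leaves that optimization step out.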
3.2 Extensions of Cox PH Model for Recurrent Events
In studies of survivorship of a certain disease, the observed outcome of interest occurs
only once during the study period. Multiple event data arise in situations where
each subject can experience more than one event, such as relapses of a disease,
repeated asthma attacks, or occurrences of tumors. Several Cox regression models have
been proposed to handle multiple event data. Of particular interest are the models of Andersen and Gill
(1982) and Prentice, Williams and Peterson (1981). Here we present a general form of the
Cox regression models and relate it to the models discussed in the literature.
3.2.1 Notation
We first provide two sets of notation, one is under the usual survival analysis setting and the
other is under the counting process setting, which is commonly used in the recent literature
for recurrent events.
Notation based on Survival Analysis Setting
We generalize the notation used for Cox PH models (Cox, 1972) with one more subscript
to capture multiple events. Let $T_{ik}$ be the total time of the $k$th event for the $i$th subject, and
$C_{ik}$ be the censoring time of the $k$th event for the $i$th subject. Define $U_{ik}$ to be the observation
time, that is, $U_{ik} = \min(T_{ik}, C_{ik})$, and let $\delta_{ik} = I(T_{ik} \le C_{ik})$ be an indicator of an observed $k$th
failure time for subject $i$. $Z_{ik} = (Z_{1ik}, \ldots, Z_{pik})'$ is the covariate vector for the $i$th subject
with respect to the $k$th event, and $Z_i = (Z_{i1}', \ldots, Z_{iK}')'$ denotes the covariate vector for the
$i$th subject, where $K$ is the maximum number of events within a subject. $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p \times 1$ vector of unknown parameters. Denote $h_k(t \mid Z_i(t))$ as the hazard function for the
$k$th event of the $i$th subject at time $t$.
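In general, the hazard function at time $t$ for a subject is defined as the instantaneous
probability of failure at time $t$ given the survivorship prior to time $t$ and the covariates:
$$h(t \mid Z(t)) = \lim_{\Delta t \to 0} \Pr(t \le T < t + \Delta t \mid T \ge t, Z(t)) / \Delta t. \qquad (1)$$
Note that the Cox PH model for the $k$th event time $T_k$ is $h_k(t \mid Z_i(t)) = h_{0,k}(t) \exp\{\beta' Z_i(t)\}$.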
Notation based on Counting Process Setting
Let $N_i(t)$ be the cumulative number of failures for subject $i$ prior to time $t$ and let $\mathcal{N}_i(t) =
\{N_i(u) : 0 < u \le t\}$ be the path up to $t$ of the counting process for subject $i$. It is important
to notice that the information given by $\mathcal{N}_i(t)$ is equivalent to the random failure times $0 <
T_{i1} < T_{i2} < \cdots < T_{iN_i(t)} \le t$. Let $Z_i(t) = \{Z_{i1}(t), \ldots, Z_{ip}(t)\}$ denote a vector of covariates,
for subject $i$, available at time $t \ge 0$. Denote $\mathcal{Z}_i(t) = \{Z_i(u) : u \le t\}$ as the corresponding
covariate process up to time $t$. Let $\Delta N_i(t)$ be the number of failures over the small interval
$[t, t + \Delta t)$. The intensity function at time $t$ is defined to be the instantaneous rate of failure
at time $t$ given the covariate and counting process up to time $t$:
$$\lambda(t \mid \mathcal{N}_i(t-), \mathcal{Z}_i(t)) = \lim_{\Delta t \to 0} \Pr(t \le T_{i,N_i(t-)+1} < t + \Delta t \mid \mathcal{N}_i(t-), \mathcal{Z}_i(t)) / \Delta t$$
$$= \lim_{\Delta t \to 0} \Pr(\Delta N_i(t) = 1 \mid \mathcal{N}_i(t-), \mathcal{Z}_i(t)) / \Delta t. \qquad (2)$$
There are certain relationships between the hazard function under the survival definition (1) and
the intensity function under the counting process definition (2). However, their interpretations
differ, especially when dealing with multiple events. In order to
show the equivalence we need not only to restrict the order of the multiple events such
that $T_{i1} < T_{i2} < \cdots < T_{iK} < t$ in the survival setting, but also to assume that
only the most recent history information is required under the counting process setting. That
is,
As a result, we use intensity function for the following modeling mechanism in the counting
process setting.
3.2.2 Various Models
Suppose there are $n$ study subjects and the data we observe can be expressed as a set of iid
realizations of response and censoring processes and covariates, that is, $\{(N_i(\cdot), Z_i(\cdot), C_i) :
i = 1, 2, \ldots, n\}$, where $N_i(t)$ and $Z_i(t)$ are available for $t \le C_i$. Assume conditional independence
between the counting process and censoring given the covariates, i.e., $N(\cdot) \perp C$ given
$Z(\cdot)$. Define the at-risk indicator $Y_i(t) = I(C_i > t)$, where $I(A) = 1$ if $A$ is true and zero
otherwise.
For simplicity, we shall drop the subscript $i$ for an arbitrary individual. Let $\mathcal{H}(t) =
\{N(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\}$ be the history information just before time $t$. We
model the intensity function (2) as
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t; S(t)) \exp\{\beta(t; S(t))' Z(t)\}, \qquad (3)$$
where $S(t)$ is a function of $\mathcal{H}(t)$. A special case for $S(t)$ is $S(t) = N(t-)$, the number of
preceding events. The intensity function (3) is the product of an arbitrary function of time
and an exponential function of covariates, hence it can be viewed as an extension of the Cox
proportional hazards model (Cox, 1972).
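There are several models that can be postulated from (3) by permitting the shape of
the intensity function to depend on, say, the number of preceding failures and possibly on
other characteristics of $\{N(t), Z(t)\}$. This suggests two semiparametric specifications of the
intensity function (3), namely
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\alpha' S(t) + \beta(t)' Z(t) + \gamma' S(t) Z(t)\} \qquad (4)$$
and
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta_k(t)' Z(t)\} \quad \text{for } S(t) \in A_k,\ k = 1, 2, \ldots, \qquad (5)$$
where $\beta(t)$ in (4) is known up to a finite number of parameters. For example, $\beta(t) =
\beta_0 + \beta_1 \ln t$. Similarly, $\beta_k(t) = \beta_{0k} + \beta_{1k} \ln t$ in (5).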
Model (4) specifies $\lambda_0(t; S(t))$ in (3) in a proportional form, and $\beta(t; S(t))$ in (3) is
specified as a linear function of $S(t)$. The summary of the history information $S(t)$ could
be any random process under these models. It is possible to assess how the baseline function
depends on $S(t)$ and how the effect of the covariate $Z(t)$ depends on $S(t)$ under the models.
Moreover, all of the unknowns, the baseline cumulative intensity function and parameters,
can be estimated, similarly to estimating the unknowns in the Cox proportional hazards
model described in section 3.1.
Model (4) lets $S(t)$ contribute to the diverse shapes of the intensity function by treating it
as a covariate. Model (5) does this differently, by using $S(t)$ as a
stratification variable. To make this clear, model (5) can be written in detail as follows:
$$\lambda(t \mid \mathcal{H}(t)) = \begin{cases} \lambda_{01}(t) \exp\{\beta_1(t)' Z(t)\} & \text{if } S(t) \in A_1 \\ \lambda_{02}(t) \exp\{\beta_2(t)' Z(t)\} & \text{if } S(t) \in A_2 \\ \quad \vdots \end{cases}$$
The discreteness of the stratification variable $S(t)$ allows us to estimate the stratum-specific
baseline intensity, $\lambda_0(t; S(t)) = \lambda_{0k}(t)$ if $S(t) \in A_k$, as well as the stratum-specific regression
coefficients, $\beta(t; S(t)) = \beta_k(t)$ if $S(t) \in A_k$, based on information related to that particular
stratum only. In addition, the arbitrary stratum-specific baseline intensities and regression
coefficients are left unspecified regarding the effect of $S(t)$ on the intensity function under
model (5).
Now consider some special cases under model (4) by successive restrictions. First, we
assume the covariate effect does not depend on $S(t)$ by setting $\gamma = 0$ in (4), and the resulting
model has the intensity function
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\alpha' S(t) + \beta(t)' Z(t)\}. \qquad (4a)$$
We can further consider a time-independent covariate effect by assuming $\beta(t) = \beta$, that is,
the covariate effect does not change over time. The reduced model has the following form
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\alpha' S(t) + \beta' Z(t)\}. \qquad (4b)$$
A semi-parametric intensity function proposed by Andersen and Gill (1982), also known as
the AG model, has the most restricted form:
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\beta' Z(t)\}. \qquad (4c)$$
This intensity function is obtained by assuming the conditional intensity function is independent
of the history of the counting process, $N(\cdot)$, which is the reduced model under
(4b) with $\alpha = 0$, and thus the process $N(\cdot)$ is Poisson. The AG model (4c) assumes
independent increments, that is, the numbers of failures in non-overlapping time intervals
are independent of each other given the time-invariant covariates. In other words, under the
AG model, the risk of recurrent failure for a subject follows the usual proportional hazards
assumption and is unaffected by previous failures that the subject has experienced.
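The successive restrictions leading from model (4) to the AG model can be illustrated with a minimal sketch. It assumes scalar $S(t)$ and $Z(t)$, a time-constant covariate effect, and a constant baseline value; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def log_relative_intensity(s, z, alpha=0.0, beta=0.0, gamma=0.0):
    """Linear predictor alpha*S + beta*Z + gamma*S*Z for scalar S(t) and Z(t)."""
    return alpha * s + beta * z + gamma * s * z

def intensity(baseline, s, z, alpha=0.0, beta=0.0, gamma=0.0):
    """Baseline value times the exponential of the linear predictor."""
    return baseline * math.exp(log_relative_intensity(s, z, alpha, beta, gamma))

# Hypothetical values: two preceding events (s=2) and a binary covariate z=1.
full_model = intensity(0.01, s=2, z=1, alpha=0.3, beta=-0.5, gamma=0.1)  # model (4), beta(t) = beta
model_4b = intensity(0.01, s=2, z=1, alpha=0.3, beta=-0.5)               # gamma = 0
ag_model = intensity(0.01, s=2, z=1, beta=-0.5)                          # alpha = gamma = 0
```

Setting a parameter to zero reproduces the corresponding reduced model, which is why the three calls differ only in the arguments they omit.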
Similarly, there are some special cases that can be postulated from (5). Under the
assumption of time-independent covariate effects in (5), the resultant model is the well-known
PWP model, proposed by Prentice, Williams and Peterson (1981). This stratified
proportional hazards type of regression model specifies the intensity function as follows,
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta_k' Z(t)\} \quad \text{for } S(t) \in A_k,\ k = 1, 2, \ldots, \qquad (5a)$$
where $S(t) = S\{N(t), Z(t), t\}$ is the stratification variable that may change as a function of
time for a given subject. This model not only allows completely arbitrary baseline intensity
functions, $\lambda_{0k}(\cdot) > 0$, but also permits the regression coefficients to differ across strata
in a time-independent fashion. That is, the PWP model assumes time-independent
stratum-specific regression coefficients. A special case for the stratification
variable mentioned by Prentice et al. is $S(t) = N(t) + 1$, for which a subject moves to
stratum $k$ immediately following his/her $(k-1)$th failure and remains there until the $k$th
failure or until censorship takes place. In this situation, (5a) permits the baseline intensity
functions to depend arbitrarily on the number of preceding failures for the subject, and hence
gives event-specific baseline hazards.
Furthermore, assume the regression coefficients are the same across strata under the
PWP model (5a), that is, $\beta_k = \beta$ for $\forall k = 1, 2, \ldots$; the corresponding reduced
stratified intensity function then has the form
$$\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta' Z(t)\} \quad \text{for } S(t) \in A_k,\ k = 1, 2, \ldots \qquad (5b)$$
Finally, starting from the intensity function (5b), a further restriction on the stratum-specific
baseline intensities through the introduction of a common baseline intensity, such that
$\lambda_{0k}(t) = \lambda_0(t)$ for $k = 1, 2, \ldots$, leads to the AG model (4c).
Generalized Cox Proportional Regression Model: $\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t; S(t)) \exp\{\beta(t; S(t))' Z(t)\}$ (3)
Model Type 2: $\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta_k(t)' Z(t)\}$ for $S(t) \in A_k$, $k = 1, 2, \ldots$ (5)
PWP Model: $\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta_k' Z(t)\}$ (5a)
Figure 3.1: Flow-chart for various models
3.2.3 Estimation Procedures
We now consider estimation of the parameters and the cumulative baseline function in the
intensity functions specified in the previous section.
Consider the estimation of the regression coefficients, $\theta = (\alpha, \beta, \gamma)$, in model (4). The
partial score function can be obtained by differentiating the log partial likelihood (Cox,
1975) with respect to $\theta$ and it is
where $W_i(u, \theta) = \alpha' S_i(u) + \beta(u)' Z_i(u) + \gamma' S_i(u) Z_i(u)$. Solving $PS_{(4)}(\theta) = 0$ gives the
partial likelihood estimator of $\theta$, denoted by $\hat{\theta}$. Further, the Breslow type estimator of the
cumulative baseline function $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$ is
where $Y_i(u)$ is the at-risk indicator.
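As an illustration of a Breslow type estimator on counting-process data, the following is a minimal sketch, not the thesis implementation. It assumes each row is a hypothetical tuple (start, stop, event, w), where w is the fitted linear predictor for that interval (constant within the row), and that a row is at risk at time u when start < u <= stop. At each distinct event time the number of events is divided by the sum of exp(w) over the rows at risk.

```python
import math

def breslow_cumulative_baseline(rows, t):
    """Breslow-type estimate of the cumulative baseline intensity at time t.

    rows: list of (start, stop, event, w) tuples in counting-process format.
    """
    event_times = sorted({stop for (_, stop, event, _) in rows
                          if event == 1 and stop <= t})
    total = 0.0
    for u in event_times:
        # Number of events observed exactly at time u.
        d = sum(1 for (_, stop, event, _) in rows if event == 1 and stop == u)
        # Weighted size of the risk set at time u.
        denom = sum(math.exp(w) for (start, stop, _, w) in rows
                    if start < u <= stop)
        total += d / denom
    return total

# Two hypothetical subjects, each with one event and follow-up to day 5.
rows_null = [(0, 2, 1, 0.0), (2, 5, 0, 0.0), (0, 3, 1, 0.0), (3, 5, 0, 0.0)]
estimate_at_5 = breslow_cumulative_baseline(rows_null, 5)
```

With all linear predictors equal to zero the estimate reduces to the Nelson-Aalen form, which gives a simple check of the code.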
Similar to Andersen and Gill (1982), it may be shown that the partial score function is
an unbiased estimating function of the parameter $\theta$ under model (4). In addition, properties
of stochastic processes and martingale theory could be applied to show the consistency of
the estimators $\hat{\Lambda}_0(\cdot)$ and $\hat{\theta}$, asymptotic normality of $\hat{\theta}$, and weak convergence of $\hat{\Lambda}_0(\cdot)$.
The estimation procedure for special cases under model (4) can directly apply $PS_{(4)}(\theta)$
(6.1) and $\hat{\Lambda}_0(t)$ (6.2) with little modification of the vector of regression coefficients $\theta$, such
that $\theta = (\alpha, \beta)$, $\theta = (\alpha, \beta)$, and $\theta = \beta$ for models (4a), (4b), and (4c) respectively. The
estimator of the cumulative baseline function under model (4c) is the well-known Breslow (1972,
1974) estimator.
Due to the stratum-specific property of model (5) and its special cases, the estimation
procedure only requires slight adjustment from $PS_{(4)}(\theta)$ (6.1) and $\hat{\Lambda}_0(t)$ (6.2) by introducing
the at-risk indicator subject to the particular stratum. Define $Y_{ik}(t) = I(C_i > t$ and
$S_i(t) \in A_k)$, which is the indicator of whether individual $i$ is at risk of having stratum $k$
failures. The partial score function under model (5) has the following form
One can obtain the stratum-specific estimates of the regression coefficients $\beta_k$ by solving
the equation $PS_{(5)}(\beta_k) = 0$ for all $k$. Substituting $\hat{\beta}_k$ into the corresponding estimator of
the stratum-specific cumulative baseline function gives
an estimator of the stratum-specific cumulative baseline function $\Lambda_{0k}(\cdot)$. Similar to model
(4), the estimators under model (5) enjoy consistency and asymptotic normality, which
could be shown through the application of martingale theory.
For special case (5a), $PS_{(5)}(\beta_k)$ (7.1) and $\hat{\Lambda}_{0k}(t)$ (7.2) can be directly applied by replacing
$\beta_k(t)$ with the time-independent stratum-specific regression coefficient $\beta_k$. In order
to obtain the overall estimator for each covariate effect under model (5b), we obtain the
following partial score function
Solving the equation $PS_{(5b)}(\beta) = 0$, we obtain the overall estimator $\hat{\beta}$ under the restricted
PWP model (5b).
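The role of the stratum-specific at-risk indicator can be sketched for a single scalar, time-invariant covariate with a common coefficient across strata, in the spirit of model (5b). This is an illustrative sketch under stated assumptions rather than the estimating function used in the thesis: rows are hypothetical tuples (start, stop, event, z, stratum), and a row is in the risk set of an event at time u only if it belongs to the same stratum and start < u <= stop. Each event contributes its covariate value minus the weighted risk-set average.

```python
import math

def partial_score(rows, beta):
    """Stratified partial score for one scalar covariate with a common beta.

    rows: list of (start, stop, event, z, stratum) tuples.
    """
    score = 0.0
    for (_, u, event, z_i, k) in rows:
        if event != 1:
            continue
        # Covariate values of rows at risk at time u within stratum k.
        risk = [z for (start, stop, _, z, kk) in rows
                if kk == k and start < u <= stop]
        weights = [math.exp(beta * z) for z in risk]
        z_bar = sum(w * z for w, z in zip(weights, risk)) / sum(weights)
        score += z_i - z_bar
    return score

# Subject A (z=1) has an event at day 2, subject B (z=0) at day 4; follow-up ends at day 6.
stratified = [(0, 2, 1, 1, 1), (2, 6, 0, 1, 2), (0, 4, 1, 0, 1), (4, 6, 0, 0, 2)]
# Same data with a single stratum, which corresponds to a common baseline as in the AG model.
unstratified = [(a, b, e, z, 1) for (a, b, e, z, _) in stratified]
```

In this toy data the score at beta = 0 differs between the two layouts because subject A, having moved to stratum 2, no longer belongs to the risk set of subject B's first event.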
3.2.4 Variance Estimation
Functions (6.1), (7.1), and (7.3) in the previous section share the
common structure of the Cox partial score function. To unify the construction of robust
variance estimation, we use $PS(\cdot)$, $\theta$, and $Z_i$ to denote the partial score function, parameters
of interest, and covariates for subject $i$ respectively.
Borrowing the structure of the Cox partial score function, we have seen that the maximum
partial likelihood estimate $\hat{\theta}$ is the unique solution to $PS(\theta) = 0$. The derivation
of robust variance estimation starts from a first order Taylor series expansion of the partial
score function:
Thus
Assuming the limit of the first term on the right hand side exists almost surely, we obtain
the variance of $\hat{\theta}$ by finding the variance of the above equation
Let $\mathcal{H}(t) = \{N(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\}$ denote the history information just
before time $t$, and
$$\bar{Z}(t) = \frac{\sum_{i=1}^n Y_i(t) Z_i \exp(\theta' Z_i)}{\sum_{j=1}^n Y_j(t) \exp(\theta' Z_j)},$$
where $Y_i(t)$ is the at-risk indicator subject to censoring. Focusing on $\mathrm{Var}(PS(\theta_0))$,
where $dM_i(t) = dN_i(t) - \exp(\theta' Z_i)\,d\Lambda_0(t)$, and $d\hat{M}_i(t) = dN_i(t) - \exp(\hat{\theta}' Z_i)\,d\hat{\Lambda}_0(t)$,
where $\hat{B}_i = \int_0^\infty [Z_i - \bar{Z}(t)] Y_i(t)\,d\hat{M}_i(t)$, and $\sum_{i=1}^n \hat{B}_i = \hat{B}$.
By substituting equation (7.5) into (7.4), we obtain the variance estimator
It has been shown that this robust variance estimator (7.6) can be applied to a general
response process (Hu, Sun and Wei, 2003), and it is equivalent to the so-called "sandwich"
estimator presented by Lin and Wei (1989) when the response process is restricted to a
counting process.
3.3 Analysis of Childhood Cancer Data I
Recall the childhood cancer data: the study group for analysis of hospital utilization consists
of 1375 individuals meeting the following criteria for inclusion: resident in BC at time of
diagnosis, diagnosed with cancer between 0 and 19 years of age from 1981 to 1995, and
survived at least five years from diagnosis.
3.3.1 Time-to-Event Formulation
To focus on evaluating the relative risk of hospitalization among several factors, we apply
the various models described in section 3.2.2 to the data by considering a hospital admission
as the event of interest. We create a data set in interval form, following the counting process
style, so that we treat the data as time-ordered outcomes (Therneau and Grambsch, 2000).
For illustration, recall that each study subject has the original data format capturing the
hospitalization information such as admission time and discharge time in Table 3.1. The
appropriate data format needed for model fitting is the interval formulation in Table 3.2.
Here start and stop capture the time (in days) from the beginning of study to the event of
interest or end of study. The variable event is an indicator, which is one if an event happens
and zero otherwise. The variable enum indicates the stratum that each event is assigned to. It
is defined as $N(t) + 1$, which is a special stratification variable suggested by Prentice, Williams
and Peterson (1981). However, this definition causes some computational difficulty for the
data, and some adjustments are required to determine the summary of history information
$S(t)$ used in our model fitting.
Table 3.1: Original Data Format
Study.ID   day.entry   day.exit   days.adm   days.rel
xxx01      0           c1         t1         v1
xxx01      0           c1         t2         v2
Table 3.2: Time-to-Event Interval Format
Study.ID   start   stop   event   enum
xxx01      0       t1     1       1
xxx01      t1      t2     1       2
xxx01      t2      c1     0       2
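The reshaping from one row per admission to one row per at-risk interval can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name is hypothetical, admission days are assumed to be strictly positive and measured from study entry, and instead of enum the sketch records n_prev, the number of preceding admissions, from which a stratification variable can be derived.

```python
def to_intervals(admission_days, end_day):
    """Return (start, stop, event, n_prev) rows for one subject.

    admission_days: days of hospital admission, measured from study entry.
    end_day: end of follow-up (censoring day).
    """
    rows = []
    start = 0
    n_prev = 0
    for t in sorted(admission_days):
        if t > end_day:
            break  # admissions after the end of follow-up are not observed
        rows.append((start, t, 1, n_prev))
        start = t
        n_prev += 1
    if start < end_day:
        # Final interval, censored at the end of follow-up.
        rows.append((start, end_day, 0, n_prev))
    return rows

# Hypothetical subject admitted on days 30 and 100 and followed to day 365.
example_rows = to_intervals([30, 100], 365)
```

Each admission closes one interval with event = 1 and opens the next, so a subject with m observed admissions contributes at most m + 1 rows.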
With the definition of stratification, $S(t) = N(t) + 1$, all 1375 childhood cancer survivors
were at risk in stratum 1, of whom 561 had a first hospital admission and so entered stratum
2. Of these, 309 were observed to have a second admission and so entered stratum 3.
This process is illustrated in Figure 3.2, and the observation is continued until one patient
experienced the thirty-eighth admission.
Figure 3.2: Flow-chart for strata where the stratification scheme used is suggested by Prentice, Williams and Peterson. The number of individuals is shown in each stratum
Because only a small number of patients contribute to certain strata, which causes computational
difficulties (i.e., the iteration method does not converge), we pool the strata with three to
five preceding admissions into one stratum. Similarly, we group the strata with more than
five preceding admissions into one stratum. That is,
$$S(t) = \begin{cases} 1 & \text{if } N(t) = 0 \text{ or enum} = 1 \\ 2 & \text{if } N(t) = 1 \text{ or enum} = 2 \\ 3 & \text{if } N(t) = 2 \text{ or enum} = 3 \\ 4 & \text{if } N(t) \in [3, 5] \text{ or enum} = 4, 5, \text{ or } 6 \\ 5 & \text{if } N(t) > 5 \text{ or enum} \ge 7. \end{cases}$$
In this data format, we model time to an event, which is the time to a hospital admission,
and the multiple rows arise through the recurrence of hospital admission.
With the combination of covariate information and the interval type of data format, we then fit
the models using the statistical software package R¹ for our data analysis.
¹ The R Project for Statistical Computing. http://www.r-project.org
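The pooling rule for the stratification variable can be written as a small function. This is a sketch with a hypothetical function name; it takes the number of preceding admissions $N(t)$ as input and returns the pooled stratum.

```python
def pooled_stratum(n_prev):
    """Map the number of preceding admissions N(t) to the pooled stratum S(t)."""
    if n_prev < 0:
        raise ValueError("number of preceding admissions must be non-negative")
    if n_prev <= 2:
        return n_prev + 1  # strata 1, 2, 3 are kept as they are
    if n_prev <= 5:
        return 4           # three to five preceding admissions are pooled
    return 5               # more than five preceding admissions are pooled

strata = [pooled_stratum(n) for n in range(8)]
```

Applied to the counts 0 through 7, the mapping keeps the first three strata distinct and collapses the rest into two pooled strata.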
Because the covariates are time-invariant, we apply model (4) with $\beta(t) = \beta$, model (4b), the AG
model (4c), the PWP model (5a), and model (5b) to the childhood cancer data to determine
the relative risks of hospital admission under various factor effects.
3.3.2 Analysis Results: Hospital Admissions
Table 3.3 gives the results of fitting model (4) and its special cases with time-invariant
covariates. Estimates of standard errors are shown in brackets (Table A.1 in Appendix A
provides the corresponding estimates with robust standard errors).
The restricted model (4) with $\beta(t) = \beta$ has a (partial-) likelihood-ratio test (PLRT)
statistic of 2840 with 54 df. Because the treatment by strata interaction
term is non-significant, with a PLRT statistic of 27.2 with 24 df (p-value = 0.293), the body
of Table 3.3 under model (4) reports the estimated coefficients from the fit that excludes the
treatment by strata interaction term.
The significance of the estimated risk factor by stratum interactions
($\gamma$) indicates the importance of viewing each hospital admission separately, especially
when their sign and magnitude change across strata.
This restricted model (4) shows that stratification makes a positive contribution to hospital
admission risk for male survivors and for patients diagnosed in 1985 to 1990 (calendar 2),
relative to females and to early cancer diagnosis in 1980 to 1985 (baseline) respectively,
because those estimates are positive across strata. Since the estimated
factor by stratum interactions take diverse values, there is no consistent relationship across
strata between the factors of interest and the risk of hospital admission.
The significance of the stratification effect in models (4a)&(4b) conveys similar
information: the number of previous hospital admissions plays an important role in the
relative risk of hospitalization among factors of interest, as mentioned previously for the
restricted model (4). In addition, the increasing values of the estimated coefficients for the
stratification variable imply that the greater the number of previous hospitalizations, the higher
the risk of hospital admission.
² * indicates significance at the $\alpha$ level of 0.05
The estimates under models (4a)&(4b) and the AG model (4c) have the same sign but
different magnitude due to the extra stratification variable included in model (4b). Both
models suggest a decreased hospitalization risk associated with later diagnosis period versus
early diagnosis, as well as for males versus females. Among different types of cancer, only bone
and other types of cancer patients have a higher relative risk of hospital admission compared
to leukemia patients. No significant difference in hospital admission risk was found
among different ages at diagnosis.
Table 3.4 gives the results of fitting model (5a) with a separate coefficient for each stratum
and model (5b) with the restriction of a common coefficient across strata (Table A.2 in Appendix
A provides the corresponding estimates with robust standard errors).
The coefficient of -0.519 in the second row of stratum 1 indicates an estimated relative
risk for first hospital admission of $\exp(-0.519) = 0.595$, associated with late cancer diagnosis
(calendar 3) versus early cancer diagnosis (baseline).
Proceeding across each column in Table 3.4 under model (5a), one sees the estimated
association between factors versus their baseline according to the second, the third, the
fourth to the sixth, and more than the sixth admission. For instance, the coefficient in
column 3 of model (5a) arises from the comparison of intensity functions among childhood
survivors who have experienced exactly two hospital admissions and who are at the same
total time since the beginning of the study period. This comparison gives a significant² estimated
relative risk of hospital admission of $\exp(-0.684) = 0.505$ associated with lymphoma versus
leukemia (baseline) patients.
The estimated gender effect steadily increases across strata 1, 2, 3, and 5. In stratum 4,
there are 232 males and 280 females; 184/232 of males had a hospital admission as compared
to 214/280 of female survivors. This could be the reason that the estimated coefficient for
gender changes sign in stratum 4 and leads to a relative risk of $\exp(0.168) = 1.183$
associated with male versus female survivors.
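The relative risks quoted above are obtained by exponentiating the estimated coefficients. The short sketch below reproduces that arithmetic; the function name is hypothetical and the coefficients are the three values quoted in the text.

```python
import math

def relative_risk(coefficient):
    """Relative risk implied by a log-linear regression coefficient."""
    return math.exp(coefficient)

# Coefficients quoted in the text for Table 3.4.
quoted = {"calendar 3, stratum 1": -0.519,
          "lymphoma, stratum 3": -0.684,
          "gender, stratum 4": 0.168}
risks = {label: round(relative_risk(b), 3) for label, b in quoted.items()}
```

A negative coefficient gives a relative risk below one (lower risk than baseline) and a positive coefficient gives a relative risk above one.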
Because admissions are pooled in strata 4 and 5, we interpret the trend of relative risks
under various factors based on the first three strata. For diagnosis period, these analyses
indicate a decreased hospitalization risk associated with late cancer diagnosis versus early
diagnosis as the frequency of admission increases. The risk of hospital admission
for males compared to females increases as the number of admissions increases; however, male survivors
have a lower risk of hospitalization overall. For different types of cancer, CNS and kidney
patients have a decreased risk of hospital admission compared to leukemia (baseline) patients as
the frequency of hospital admission increases. However, of these two types of cancer,
only kidney cancer survivors have an overall lower relative risk than leukemia survivors. It is
difficult to interpret or draw conclusions for the remaining factors of interest since there is no
clear pattern.
Although stratification under the PWP model, model (5a), allows us to examine the patterns
of covariate effects across strata, it is difficult to suggest any strategy to health providers
for improvement. For example, Table 3.4 shows that the gender effect is highly significant
in stratum 1 under model (5a), which indicates that males have a lower risk of hospital admission
compared to females before their first hospital admission. This significance declines as the
number of admissions increases. In addition, the gender effect does not stand out for the
overall effect under model (5b) even though it is significant at the alpha level of 0.05. These
numerical results enable us to see such a change in the gender effect, but they do not by
themselves suggest a strategy to prevent male patients' risk of hospital admission from increasing
relative to female patients.
Model comparisons in terms of model fitting arise naturally from the various models
we have considered. Based on the structure and assumptions of each model, the baseline
model ($M_0$: $\forall \beta = 0$), the AG model ($M_1$: model (4c)), the PWP restricted model ($M_2$: $\beta_k = \beta$ for
$\forall k$), and the PWP stratum-specific model ($M_3$: model (5a)) are nested. That is,
$$M_0 \subseteq M_1: \text{AG model (4c)} \subseteq M_2: \text{model (5b)} \subseteq M_3: \text{model (5a)}.$$
It is possible to compare $M_2$ and $M_3$ using the (partial) likelihood ratio test based on their
partial likelihood functions (PLRT). The (partial-) likelihood-ratio statistic for testing the
null hypothesis that the baseline model ($M_0$) holds against the PWP stratum-specific model
($M_3$: model (5a)) can be written as
$$\log \mathcal{PLR}_{M_0 \text{ vs. } M_3} = \sum_k \log \mathcal{PLR}_{M_0 \text{ vs. } M_{3(k)}},$$
where $\log \mathcal{PLR}_{M_0 \text{ vs. } M_{3(k)}} = -2 \log\{\mathcal{PL}_{M_0} / \mathcal{PL}_{M_{3(k)}}\}$ is the (partial-) likelihood-ratio
statistic for testing the null hypothesis that the baseline model holds against the PWP model
restricted to stratum $k$. The direct sum over strata for obtaining $\log \mathcal{PLR}_{M_0 \text{ vs. } M_3}$ is due to
the property of the PWP model, which is a conditional model. It can be easily seen from (7.3)
without restricting $\beta_k = \beta$ for $\forall k$. Similarly, one can obtain $\log \mathcal{PLR}_{M_0 \text{ vs. } M_2}$, which is the
PLRT shown in the output tables under model (5b). To test the null hypothesis that $M_2$:
model (5b) holds against $M_3$: model (5a), we need to calculate the (partial-) likelihood-ratio
statistic $\log \mathcal{PLR}_{M_2 \text{ vs. } M_3}$. Based on the output from Table 3.4,
This (partial-) likelihood-ratio test statistic has a value of 131 with 40 df, which gives a
p-value less than 0.001 under a chi-squared distribution. This result indicates that model
(5a) provides a better fit.
A comparison between $M_1$: AG model and $M_2$: model (5b) is not straightforward because
the (partial-) likelihood-ratio test statistic cannot be used for testing model fit.
The reason is that these two models differ in their nonparametric parts. Recall that a common
baseline intensity, $\lambda_0(\cdot)$, is assumed under the Andersen and Gill (1982) model, whereas the
Prentice, Williams and Peterson (1981) model allows stratum-specific baseline intensities, $\lambda_{0k}(\cdot)$.
A full likelihood function is required for the use of a likelihood-ratio statistic to compare
model fits. However, comparisons between model (4) and its special cases are feasible using the
(partial-) likelihood-ratio test to address model reduction. The (partial-) likelihood-ratio
statistic for testing the null hypothesis that model (4a)&(4b) holds against model (4) gives
a value of 110 with 16 df, which provides a p-value less than 0.001 under a chi-squared
distribution. This shows that there is strong evidence that the effect of the interaction terms
is nonzero. Therefore, the full model, model (4), significantly improves fit.
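The chi-squared p-values quoted in this section can be checked with a short sketch. It relies on the closed-form upper tail of the chi-squared distribution that holds when the degrees of freedom are even, which covers the 24, 40 and 16 df tests reported here; the function name is hypothetical, and odd degrees of freedom would need a general incomplete-gamma routine instead.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

p_interaction = chi2_sf_even_df(27.2, 24)   # treatment by strata interaction
p_5b_vs_5a = chi2_sf_even_df(131.0, 40)     # model (5b) against model (5a)
p_4ab_vs_4 = chi2_sf_even_df(110.0, 16)     # model (4a)&(4b) against model (4)
```

The first value should land close to the reported 0.293, with any small difference attributable to rounding of the reported statistic, and the other two fall well below 0.001.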
3.3.3 Model Checking
An important step after model fitting is to assess whether the model is appropriate for the
data.
The intensity function of all models described previously is based on some extension of
Cox's proportional hazards model (Cox, 1972), whose intensity function is a product of an
arbitrary function of time and an exponential function of covariates. As a consequence, it is
prudent to assess the proportionality assumption for all those Cox type models.
There are many tests of proportional hazards proposed in the literature. For example,
Schoenfeld residuals can be plotted and visualized as proportional hazards diagnostics.
Instead of applying such a formally developed method, we use a simple idea for model checking
based on a nonparametric approach.
Since nonparametric estimates do not make assumptions about the form of the cumulative
hazard function, this conceptual advantage can be used to validate the proportionality
assumption. One of the well known nonparametric estimates of the cumulative
intensity function, also known as the empirical cumulative hazard function, was proposed by
Nelson (1969) and Aalen (1972) and is commonly known as the Nelson-Aalen (NA) estimate. It
has the following form:
$$\hat{\Lambda}_{NA}(t) = \int_0^t \frac{\sum_{i=1}^n Y_i(u)\,dN_i(u)}{\sum_{i=1}^n Y_i(u)},$$
where $Y_i(u)$ is the at-risk indicator and $dN_i(u)$ is the increment $N_i\{(u + du)-\} - N_i(u-)$ of
$N_i$ over the small interval $[u, u + du)$. Comparing the NA estimate to any of the estimated
cumulative baseline functions under the models described in section 3.2.3, we see that those
estimates, with the covariates fixed at baseline level, have a form similar to the NA estimate.
As a result, the idea for checking the proportionality assumption is to plot the estimate of
the cumulative baseline intensity function under those semi-parametric models beside the
Nelson-Aalen estimate for each factor level separately and visually check the closeness of
the two curves.
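The Nelson-Aalen estimate used in this check can be computed directly from interval-format data. The following is a minimal sketch under stated assumptions: rows are hypothetical (start, stop, event) tuples restricted to one factor level, and a row is at risk at time u when start < u <= stop.

```python
def nelson_aalen(rows, t):
    """Nelson-Aalen estimate of the cumulative intensity at time t.

    rows: list of (start, stop, event) tuples in counting-process format.
    """
    event_times = sorted({stop for (_, stop, event) in rows
                          if event == 1 and stop <= t})
    total = 0.0
    for u in event_times:
        events = sum(1 for (_, stop, event) in rows if event == 1 and stop == u)
        at_risk = sum(1 for (start, stop, _) in rows if start < u <= stop)
        total += events / at_risk
    return total

# Two hypothetical subjects with one admission each (days 2 and 3), followed to day 5.
na_rows = [(0, 2, 1), (2, 5, 0), (0, 3, 1), (3, 5, 0)]
curve = [(t, nelson_aalen(na_rows, t)) for t in (1, 2, 3, 5)]
```

Evaluating the function on a grid of times gives the step curve that would be plotted beside the model-based cumulative baseline estimate.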
For illustration, we simply select the diagnosis period (i.e., calendar) as the only covariate
under the AG model (4c) and the PWP model (5a).
In Figure 3.3, the solid curve is the baseline estimate under the AG model for each
diagnosis period. The corresponding NA estimate for the cumulative intensity function is
obtained using partial information: the data used are restricted to the particular
diagnosis period. We can apply the same technique to the PWP model in each stratum.
From Figure 3.4, the NA estimate and the PWP model estimate are clearly close,
with an exception in stratum 3. The reason may be the small number of events
occurring in that stratum. Nevertheless, the proportionality assumption seems to be valid
overall for the baseline diagnosis period.
[Figure 3.3: plots not recoverable; axis label: Frequency of hospitalization]
[Figure 3.4: plots not recoverable; axis label: Frequency of hospitalization]
3.4 Extensions of Cox PH Model for Frequency of Events
Adjusted for Event Duration
In previous sections we did not take event duration into account. For example, a hospital
admission is considered a one day event in the childhood cancer data. However, many
cancer-related treatments such as chemotherapy, radiation therapy, surgery, and bone
marrow transplantation usually last more than one day.
For the analysis of hospital admissions using methods for recurrent events, the event
is an admission to the hospital. While in the hospital, a patient is not at risk of the next
hospital admission. That is, during the time of an admission the individual is not at risk
of having another event as he/she is already admitted. This individual returns to being at
risk the first day after he/she is released from hospital.
Why do we in general need to consider how long the event lasts? The simple answer
follows the assumption that no other events could happen while the current event is hap-
pening. In order to take event duration into account we need additional information about
the end of event duration.
If $N_i(u)$ is the cumulative number of started-events for subject $i$ prior to time $u$, we define
$V_i(u)$ to be the cumulative number of corresponding ended-events for subject $i$ prior to time
$u$. Then for an arbitrary subject, we have the paths up to $t$ of the counting processes, $\mathcal{N}(t)$
and $\mathcal{V}(t)$, for started-events and ended-events respectively, where $\mathcal{N}(t) = \{N(u) : 0 < u \le t\}$
and $\mathcal{V}(t) = \{V(u) : 0 < u \le t\}$.
3.4.1 Adjustment for Event Duration I
The Cox-type models mentioned in section 3.2.2 are extended as follows. First, notice that
the history information needs to be updated by including ended-event information. Let
$\mathcal{H}^*(t) = \{N(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\} \cup \{V(u) : 0 < u < t\}$ be the updated
history information just before time $t$. Models described in section 3.2.2 are then
where $S(t)$ is a function of $\mathcal{H}^*(t)$. Similarly for the special cases under models (4.1) and
(5.1). The second thing that needs to be adjusted is the at-risk indicator; recall that no event
can happen within an on-going event duration. Define $Y^*(t) = I[N(t-) = V(t-)]$ as
the at-risk indicator, so that a subject is not at risk of having the next event while the current
event lasts.
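The duration-adjusted at-risk indicator can be sketched as follows, assuming integer day counts and a hypothetical function name. $N(t-)$ and $V(t-)$ are taken to be the numbers of admissions and discharges strictly before day t.

```python
def at_risk(t, admission_days, discharge_days):
    """Return 1 if the subject is at risk of a new admission on day t, else 0.

    A subject is at risk when every admission started before day t has also
    ended before day t, that is, N(t-) = V(t-).
    """
    started = sum(1 for a in admission_days if a < t)
    ended = sum(1 for d in discharge_days if d < t)
    return 1 if started == ended else 0

# Hypothetical subject admitted on day 10 and discharged on day 14.
status = [at_risk(t, [10], [14]) for t in (10, 12, 14, 15)]
```

In this example the subject is at risk on day 10 (when the admission occurs), not at risk during the stay on days 12 and 14, and at risk again on day 15, the first day after release.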
3.4.2 Adjustment for Event Duration II
In addition to the at-risk adjustment due to the logic of duration, the importance of event
duration also arises from the potential factor of predisposing a subject to a greater number
of events. One can include previous event duration as a time-dependent covariate and obtain
the estimates through the model fitting.
To analyze the hospital admissions with adjustment for hospital stays, we transform
the data format from the original format in Table 3.1 to the interval format. Table 3.5
shows the interval format after event duration adjustment in addition to the inclusion of
time-dependent covariate, dur, to capture the previous event duration.
Table 3.5: Time-to-Event Interval Format Adjusted for Event Duration
Study.ID   start    stop   event   enum   dur
xxx01      0        t1     1       1      0
xxx01      v1 + 1   t2     1       2      v1 - t1 + 1
xxx01      v2 + 1   c1     0       2      v2 - t2 + 1
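The duration-adjusted layout can be produced with a small extension of the interval construction. This is an illustrative sketch with hypothetical names: stays are (admission day, discharge day) pairs assumed to be non-overlapping and within follow-up, each at-risk interval restarts on the day after discharge, and dur carries the length of the previous stay. As before, the sketch records the number of preceding admissions rather than enum.

```python
def to_intervals_with_duration(stays, end_day):
    """Return (start, stop, event, n_prev, dur) rows for one subject.

    stays: list of (admission_day, discharge_day) pairs.
    dur is the length in days of the previous hospital stay (0 before the first).
    """
    rows = []
    start, n_prev, dur = 0, 0, 0
    for admit, discharge in sorted(stays):
        rows.append((start, admit, 1, n_prev, dur))
        start = discharge + 1            # at risk again the day after release
        dur = discharge - admit + 1      # length of the stay just completed
        n_prev += 1
    if start < end_day:
        rows.append((start, end_day, 0, n_prev, dur))
    return rows

# Hypothetical subject: a five-day stay from day 30 and a one-day stay on day 100.
adjusted_rows = to_intervals_with_duration([(30, 34), (100, 100)], 365)
```

The days spent in hospital are removed from the at-risk time, and the previous stay length enters each subsequent row as a time-dependent covariate.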
3.5 Analysis of Childhood Cancer Data II
Firstly, we implement all models mentioned in section 3.3.2 with adjustment for event
duration and inclusion of one extra time-dependent covariate. Then we compare the effect of
event-duration adjustment to the analysis results in section 3.3.2 by dropping the additional
time-dependent covariate, dur.
Tables 3.6 and 3.7 display the results of model fitting (Tables A.3 and A.4 in Appendix
A provide the corresponding estimates with robust standard errors). From Table 3.6, we
observe several significant factor-by-stratum interaction terms under Model (4) and a highly
significant stratification effect under Model (4a)&(4b), which again emphasizes the importance of
the number of preceding hospital admissions. Discarding this stratification effect has a direct
influence on the relative risk of hospital admission for CNS patients.
The opposite signs of the estimates for CNS under AG Model (4c) and Model (4a)&(4b)
indicate that, without/with considering the number of preceding hospital
admissions, childhood survivors diagnosed with CNS have a relatively lower/higher risk of
hospitalization compared to those diagnosed with leukemia (baseline). Moreover, the previous
hospital duration has a positive effect on the relative risk of admission. In other words,
patients who stay in hospital longer are more likely to be admitted to hospital again.
There is no clear pattern in the trend of relative risks versus the frequency of admission
in Table 3.7. We observe that both lymphoma and kidney patients have a lower risk of
hospital admission compared to leukemia patients no matter how many times they have
been admitted to hospital; these effects are not significant, however.
To test the null hypothesis that model (5b) holds against model (5a), the (partial-)
likelihood-ratio statistic gives a value of 164.4 with 43 df (p-value < 0.001), which implies
that the PWP model, model (5a), provides a better fit. Comparisons between model (4) and its
special cases³ show strong evidence that the effect of the interaction terms is nonzero; hence
the model including interactions significantly improves the fit.
- - -
² * indicates significance at the $\alpha$ level of 0.05
³ p-value < 0.001 based on the (partial-) likelihood-ratio statistic of 142 with 19 df for testing the null hypothesis that model (4a)&(4b) holds against model (4)
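The reported p-values can be checked with a short Python sketch, assuming the usual chi-square reference distribution for the (partial-) likelihood-ratio statistic. The helper `chi2_sf` is a hypothetical name; it uses the standard finite-series expressions for the chi-square upper tail with integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df > x)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1)), with chi = sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


# Statistics quoted in the text: 164.4 on 43 df and 142 on 19 df.
p_5b_vs_5a = chi2_sf(164.4, 43)
p_4ab_vs_4 = chi2_sf(142, 19)
print(p_5b_vs_5a < 0.001, p_4ab_vs_4 < 0.001)
```

Both tail probabilities fall far below 0.001, in line with the conclusions stated above.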
3.5.1 Comparison of Analysis Results: Recurrent Events versus Event-
Duration Adjustment
To compare the analyses with and without event duration, we fit the same models
as in section 3.3.2 under the adjusted data format without the additional time-dependent
covariate, dur. The same significance level of 0.05 was chosen for a consistent comparison. Tables
3.8 and 3.9 display the results of model fitting (Tables A.5 and A.6 in Appendix A provide
the corresponding estimates with robust standard errors).
In model (4.1) or (4) and its special cases, there is no noticeable difference in the fitted
results. However, some prominent differences appear in the stratum-specific PWP models.
Overall, the fitted results after adjustment for event duration under the PWP model provide
estimates that are larger in magnitude, with slightly higher standard errors, compared to the
analysis without event-duration adjustment. This result may be due to the combination
of the adjusted risk set, the restricted data used, and the small number of events happening in each
stratum.
The estimates of hospital admission risk for diagnosis period in strata 2 and 4 change
sign compared to those without event-duration adjustment (Table 3.4),
which results in the opposite interpretation of the relative risk of hospital admission associated with
late cancer diagnosis (calendar 2) versus early cancer diagnosis (baseline). The estimates
are not significant², however.
Model fitting results⁴,⁵ are consistent with those without event-duration adjustment.
That is, the stratum-specific PWP model (5a) provides a better fit compared to the
restricted PWP model (5b). No model reduction is suggested for model (4).
² * indicates significance at the $\alpha$ level of 0.05
⁴ The (partial-) likelihood-ratio statistic for testing the null hypothesis that model (5b) holds against model (5a) gives a value of 142 with 40 df (p-value < 0.001)
⁵ The (partial-) likelihood-ratio statistic for testing the null hypothesis that model (4a)&(4b) holds against model (4) gives a value of 120 with 16 df (p-value < 0.001)
Chapter 4
Analysis of Generalized
Longitudinal Data
There are basically three aspects of clinical interest in recurrent events: (1) time to an
event or times to multiple events, which address the risk of having an event; (2) event frequency,
which focuses on the prevalence or duration of the event; and (3) multi-states, which provide
the transition probabilities into and out of states determined by event type.
In the previous chapter we applied several well-known models that have been proposed
in the literature, focusing on estimation of relative risk. Hence, those models are based
on the aspect of time to events. This chapter is a direct extension of the previous chapter, in
which the response process is a counting process. Because the response is an event frequency, a
counting process formulation is very natural for assessing the relative risk of hospital admissions.
To incorporate different quantities of interest such as hospital status or duration, we are
motivated to extend the analysis with a counting process to an arbitrary response process.
Applying the same probability models considered in Chapter 3 with an arbitrary
response process results in a different model interpretation due to the change from modeling
intensity functions to modeling marginal or conditional moment functions. The estimating
procedures for this extension are no longer based on the partial likelihood function (Cox, 1975),
but on unbiased estimation functions (EF).
4.1 Analysis of Hospital Duration Using Counting Process
Formulation
The original data (admission time and discharge time) of the childhood cancer
survivors are recorded as dates. The hospital duration itself is therefore viewed as discrete,
with days as the unit, instead of as continuous time. Without more detailed information
(for example, time recorded in hours or even smaller units), it is reasonable to calculate
hospital duration in terms of the number of days that patients stay in hospital. Therefore,
we consider the response process as a counting process in the usual manner, in which we count
the number of days patients stay in hospital. One can see that it is a direct application of
the approach presented in Chapter 3: the number of hospital admissions is replaced with
the number of days in hospital.
4.1.1 R Data Format for Hospital Duration
To take advantage of the existing built-in function in R (coxph), we construct the required
time-to-event interval format described in Section 3.3.1 by treating a day in hospital as an
event of interest.
Consider the following hypothetical example displayed in Table 4.1: an arbitrary individual
who has been admitted to hospital twice within the study period and stayed in hospital
for 2 days and 3 days respectively. The total of 5 days in hospital for this
individual contributes 5 time-to-event intervals (5 rows) with event indicator equal to
1, and the last row captures the time interval to censoring with event indicator equal
to 0. Table 4.2 shows the required data construction for this hypothetical example in order
to obtain the estimates with direct use of the built-in function in R.
Table 4.1: Hypothetical Data for an Arbitrary Individual
Study.ID   day.entry   day.exit   days.adm   days.rel
xxx01      0           c1         t1         t1 + 1
xxx01      0           c1         t2         t2 + 2
Table 4.2: Time-to-Event Interval Format for Hospital Duration
Study.ID   start    stop     event   enum
xxx01      0        t1       1       1
xxx01      t1       t1 + 1   1       2
xxx01      t1 + 1   t2       1       3
xxx01      t2       t2 + 1   1       4
xxx01      t2 + 1   t2 + 2   1       5
xxx01      t2 + 2   c1       0       5
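The expansion from admission and release days to the rows of Table 4.2 can be sketched in Python as below. The function name `hospital_day_intervals`, the tuple layout, and the numeric days are hypothetical; the sketch assumes admission days are positive integers and that each stay covers every day from admission to release inclusive. As in the table, the censored row repeats the enum of the last day in hospital.

```python
def hospital_day_intervals(subject_id, admit_days, release_days, censor_day):
    """Return (id, start, stop, event, enum) rows with one event row per day in hospital."""
    # Every calendar day spent in hospital, in chronological order.
    days = [d for a, r in zip(admit_days, release_days) for d in range(a, r + 1)]
    rows, start = [], 0
    for k, day in enumerate(days, start=1):
        rows.append((subject_id, start, day, 1, k))  # k-th day in hospital is the event
        start = day
    if start < censor_day:
        # Remaining follow-up until censoring, with no further event.
        rows.append((subject_id, start, censor_day, 0, max(len(days), 1)))
    return rows


# Hypothetical values t1 = 10 (2-day stay), t2 = 30 (3-day stay), c1 = 50.
for row in hospital_day_intervals("xxx01", [10, 30], [11, 32], 50):
    print(row)
```

With these values the sketch yields five event rows followed by one censored row, matching the pattern of Table 4.2.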
4.1.2 Analysis Results for Day in Hospital Using R Built-in Function
We apply the time-to-event interval format to the cohort of 1375 patients in the study. One
challenge of directly using the built-in function in R for the analysis is the construction
of this required time-to-event interval format. In addition, the large number of events (i.e.,
the number of days a patient stayed in hospital) for each individual and the large sample size
make the time-to-event interval format very memory-intensive, which is a disadvantage
when memory is limited.
Let N(t) be the cumulative number of days an arbitrary patient stayed in hospital up to
time t. A commonly used definition of strata, S(t) = N(t) + 1, makes some models displayed
in Figure 3.1 impractical: one patient contributed a maximum of 392 days
within his/her study period, hence there would be 392 strata to consider according to
this stratum definition. However, the AG and PWP models with an overall effect remain
reasonable for investigating the relative risks of hospital duration under several factor effects.
The estimates are provided in the last two columns of Table 4.4.
In order to apply the models used in analyzing hospital admission, we need to pool
days in hospital to create a reasonable number of strata and avoid non-convergence of the
algorithm. Table 4.3 summarizes the definition of the 5 strata, a number chosen to match
the number of strata used for the response of hospital admission (section 3.3.1).
The stratification scheme presented in Table 4.3 creates strata that
contain approximately equal numbers of events. That is, with a total of 11620 days accumulated
over the follow-up periods of 1375 patients, each stratum is created so that it contains roughly
2324 (11620/5) cumulative days over the 1375 individuals. For instance, Table 4.3 shows that
stratum 2 includes those patients who have stayed in hospital between 7 and 19 days
inclusive, and they contribute a total of 2280 days over the corresponding period.
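One way to pool days into strata with roughly equal numbers of events is sketched below in Python. The thesis does not state the exact rule behind its cut points, so the rule used here (close a stratum as soon as the running event count reaches the next multiple of total/n_strata) is an assumption, and the function name `equal_event_strata` and the toy data are hypothetical.

```python
def equal_event_strata(total_days, n_strata):
    """Return the largest enum value in each stratum.

    total_days[i] is the total number of days subject i spent in hospital,
    so the k-th day in hospital (enum = k) is an event for every subject
    with at least k days.
    """
    max_enum = max(total_days)
    events_at = [sum(1 for d in total_days if d >= k) for k in range(1, max_enum + 1)]
    target = sum(events_at) / n_strata
    bounds, cumulative = [], 0
    for k, events in enumerate(events_at, start=1):
        cumulative += events
        # Close the current stratum once the running total reaches its share.
        if len(bounds) < n_strata - 1 and cumulative >= target * (len(bounds) + 1):
            bounds.append(k)
    bounds.append(max_enum)  # the last stratum runs to the largest enum observed
    return bounds


# Toy data: six subjects with 1, 1, 2, 2, 3 and 6 days in hospital (15 events in total).
bounds = equal_event_strata([1, 1, 2, 2, 3, 6], 3)
print(bounds)
```

In the toy data the three strata end at enum 1, 2 and 6 and hold 6, 4 and 5 events respectively.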
The estimated results are displayed in Table 4.4, where estimates of standard error
are displayed in brackets (Table A.7 in Appendix A provides the corresponding estimates with
robust standard errors). The last column of the estimates under Model (5b), i.e., the PWP
overall effect model, is based on the commonly used definition of strata, S(t) = N(t) + 1.
As expected under the overall effect in Model (5b), the estimated values differ from
one definition of stratification to another. A surprising effect of changing the stratification
definition is the opposite interpretation of the relative risk of hospital duration for cancer
patients with treatment diagnosis of kidney or other who were diagnosed with cancer
between 1985 and 1990. Stratification according to the commonly used definition shows
that patients diagnosed with cancer between 1985 and 1990 have a lower risk of staying
in hospital for another day than those who were diagnosed in the early 1980s. Also, patients
with treatment diagnosis of kidney or other are less likely to stay in hospital compared to
leukemia patients. In contrast, the positive estimates result in the opposite interpretation
under the definition of stratification according to roughly equal numbers of events within
each of the five strata.
Table 4.3: Definition of Stratification to Five Strata
Stratum   1 (enum 6)   2 (enum 19)   3 (enum 42)   4 (enum 84)   5 (enum 392)   Total
Events    2277         2280          2342          2357          2364           11620
N(t)      [0, 7)       [7, 20)       [20, 43)      [43, 85)      [85, 392)
The positive estimates for the strata effect under Model (4a)&(4b) indicate that the
stratification is important to the estimation of the relative risk for hospital duration. Looking
closely at the covariate effects across strata, we examine the estimated results from stratum to
stratum under Model (5a). A consistent reduction in risk of hospital duration, in terms of
the sign of the estimates across strata, appears for the late cancer diagnosis period (calendar 3)
and for carcinoma patients. No other consistent results can be seen in Table 4.4. Among
the different types of cancer, one can see a decreased risk of staying in hospital associated with
carcinoma patients versus leukemia patients as the cumulative days in hospital increase.
In general, it is difficult to make inference due to the uncertain stratification scheme and
the limited, unclear patterns.
Model comparisons under the 5-strata stratification scheme show that model (5a) provides
a better fit. The (partial-) likelihood-ratio statistic for testing the null hypothesis that
model (5b) holds against model (5a) has a value of 2268 with 40 df (p-value < 0.001).
4.2 General Response Processes
In order to incorporate other responses of interest under a unified framework, a
generalization of the response process is desirable. Denote by $\{X(t) : t \ge 0\}$ the response process
in general. We specify it according to admission frequency, hospital status, and hospital
duration respectively as follows:
1. X(t) = the number of admissions up to time t
2. X(t) = 1 if the subject is in hospital at time t; 0 otherwise
3. X(t) = overall time in hospital up to time t
As a hypothetical example, consider an arbitrary subject admitted to hospital four
times over his or her follow-up period. The hospitalization record includes four admission
times (T1, T2, T3, and T4), as well as the corresponding discharge times (V1, V2, V3, and
V4). Figure 4.1 displays the response process under different quantities related to hospital
utilization.
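The three response processes can be evaluated from a subject's admission and discharge days with a short Python sketch. The function names and the example days are hypothetical; time is assumed to be measured in whole days, with a stay covering every day from admission to discharge inclusive.

```python
def admissions_up_to(t, admit_days):
    """Response 1: number of admissions up to and including day t."""
    return sum(1 for a in admit_days if a <= t)


def in_hospital(t, admit_days, discharge_days):
    """Response 2: 1 if the subject is in hospital on day t, otherwise 0."""
    return int(any(a <= t <= v for a, v in zip(admit_days, discharge_days)))


def days_in_hospital_up_to(t, admit_days, discharge_days):
    """Response 3: cumulative number of days in hospital up to and including day t."""
    return sum(min(t, v) - a + 1 for a, v in zip(admit_days, discharge_days) if a <= t)


# Hypothetical subject: admitted on days 3 and 10, discharged on days 4 and 12.
T, V = [3, 10], [4, 12]
for t in (2, 4, 5, 11, 20):
    print(t, admissions_up_to(t, T), in_hospital(t, T, V), days_in_hospital_up_to(t, T, V))
```

The same record therefore produces a step function that jumps at admissions, a 0-1 process that alternates at admissions and discharges, and a process that grows only while the subject is in hospital.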
Because the information collected for hospital records is based on a time unit of a single
day, the response process of hospital duration can be viewed as a counting process, as we
considered in the previous section; that is, the hospital duration is equivalent to the number
of days an individual stays in hospital up to time t. The general setting of the response process
X(t) is not restricted to any particular time unit. However, it is not sensible to break the
time unit into many small pieces and hope that its discreteness captures as much information as
continuous time does.
Figure 4.1: Response process for hospital admission, status, and duration
4.2.1 Methodology Inheritance
The flexibility of the response process X(t) makes it possible to analyze hospital
utilization from different perspectives simultaneously. We directly replace the response
process N(t) described in chapter 3 for analyzing event frequency by the arbitrary response
process X(t) to analyze prevalence of hospitalization and hospital duration.
All of the Cox-based models presented in section 3.2.2 and the associated
partial-likelihood-based estimation procedures are applicable once the counting process is
replaced by the arbitrary response process X(t). However, the model interpretation would be
quite different since the response process is no longer restricted to a counting process.
A type of Cox regression model has the following form
Its two main postulated models:
where S(t) is a function of the history information, $\mathcal{H}(t) = \{X(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\}$.
Depending on the quantity of interest for evaluating hospital utilization, the
stratification variable S(t) = X(t) + 1 proposed by Prentice, Williams, and Peterson (1981)
might require some modification. As mentioned in the previous section, it is inefficient to apply
this commonly used definition to hospital duration because of the enormous number of
days that a patient could stay in hospital. Instead of stratifying directly according to the number
of days stayed in hospital, one option is to cluster several days together so that each stratum
contains a roughly equal number of days. However, stratification according
to equal numbers of days cannot distinguish between the continuation of a hospital
stay from one admission and hospital duration that starts with another
admission. In other words, this stratification definition assumes that there is no difference
between a patient who was admitted to hospital twice and a patient who was admitted once but
stayed in hospital for two days. To address the impact of another admission on hospital
duration, we consider the same stratification scheme as we used for stratifying hospital
admissions to create a total of five strata (section 3.3.1). That is, we group hospital duration
associated with the first, second, third, fourth to sixth, and seventh and later hospital admissions
into the corresponding strata, regardless of how large and how varied the number of
events is across strata.
The estimation procedures under the counting process setting for Cox-type models
are based on partial score functions, which arise from differentiating the
logarithm of the partial likelihood functions (Cox, 1975). With the continuous nature
of time for the response of hospital duration and the non-counting response process for
hospital status, our estimation procedures are more appealing. However, the estimation
procedures can be inherited from chapter 3 to deal with an arbitrary response process, and
we refer to the resulting functions as estimation functions (EF). In both cases, the pseudo Cox likelihood can be
used for estimation of the parameters of interest. The estimating equations for models (4.2)
and (5.2) are displayed as follows, where $\theta$ includes all the parameters of interest and $\beta_k$ denotes the stratum-specific regression coefficients.
The estimates are obtained by solving the estimation equation $EF(\theta) = 0$ in the usual
manner, and similarly for the special cases generated from models (4.2) and (5.2). We apply
the robust variance estimation provided in section 3.2.4 by replacing the partial score function,
$PS(\theta)$, with the estimation function, $EF(\theta)$.
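where $B_i = \int [Z_i - A(t)]\, Y_i(t)\, d\hat{M}_i(t)$, with $d\hat{M}_i(t) = dX_i(t) - \exp(\theta' Z_i)\, d\hat{\Lambda}_0(t)$,
$\hat{\Lambda}_0(t)$ being the Breslow estimator of $\Lambda_0(t)$, and
$A(t) = \frac{\sum_{i=1}^{n} Y_i(t)\, Z_i \exp(\theta' Z_i)}{\sum_{j=1}^{n} Y_j(t) \exp(\theta' Z_j)}$.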
4.3 Analysis of Childhood Cancer Data III
With the extension of the response process, the R built-in function (coxph) is no longer
applicable and hence programming in C is required. The algorithm used to obtain the
parameter estimates and the robust variance is based on the methodology developed in the
previous section. Appendix B introduces a user-friendly function for analyzing hospital
utilization based on hospital admission, status, and duration under the various models
considered in this thesis. Some estimated results are displayed in Appendix C.
One can see that the estimates obtained through the C program and from the R output are
comparable for analyzing hospital admission. However, the value of the partial likelihood
function evaluated at the estimates from the C program is larger than that evaluated at the
estimates obtained from R; for this comparison, we substitute the estimates from R into the C program.
Based on the criterion of maximizing the partial likelihood function, the estimates
obtained from the C program are preferable.
4.3.1 Analysis Results: Prevalence of Hospitalization
Because the response for hospital status is dichotomous, the response process is also known as
an alternating binary process. While running the C program, we observed that the algorithm
is sensitive to the initial values, so its convergence is uncertain. As a
consequence, there is no guarantee of obtaining estimates, and the process is time-consuming.
Using root-finding algorithms other than the Newton-Raphson algorithm may improve
the efficiency and stability of the C program.
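To illustrate the kind of safeguard that can reduce sensitivity to initial values, the Python sketch below adds step-halving to a scalar Newton-Raphson iteration. It is not the C program of this thesis: the function `safeguarded_newton` is hypothetical, and the arctangent is used only as a stand-in estimating function for which the plain Newton-Raphson step overshoots when started at 2.

```python
import math


def safeguarded_newton(score, dscore, theta0, tol=1e-10, max_iter=100):
    """Solve score(theta) = 0 by Newton-Raphson, halving any step that
    fails to reduce |score|."""
    theta = theta0
    for _ in range(max_iter):
        u = score(theta)
        if abs(u) < tol:
            return theta
        step = -u / dscore(theta)  # plain Newton-Raphson step
        while abs(score(theta + step)) >= abs(u):
            step /= 2              # shrink the step until the score improves
            if abs(step) < 1e-16:
                raise RuntimeError("step-halving failed to reduce the score")
        theta += step
    raise RuntimeError("no convergence within max_iter iterations")


# Stand-in estimating function: atan(theta), with root 0 and derivative 1 / (1 + theta^2).
root = safeguarded_newton(math.atan, lambda t: 1.0 / (1.0 + t * t), theta0=2.0)
print(root)
```

Step-halving is only one option; bracketing methods such as bisection trade speed for guaranteed convergence in the scalar case.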
Besides possible updates of the C program to improve the convergence of the algorithm,
the ease of replacing the response process leads to a pitfall in model fitting. Recall that all
the models we considered are semi-parametric Cox-based models; that is, the assumption
of proportional hazards is crucial. Therefore, it is necessary to check the proportionality
assumption for an alternating binary response, such as hospital status.
It is possible to obtain estimated results using the C program with a more stable
root-finding algorithm if one is interested in the semi-parametric models considered so far. To
avoid relying on the model assumption, a nonparametric approach is introduced in
section 4.4 for evaluating hospital utilization, including the prevalence of hospitalization.
4.3.2 Analysis Results: Hospital Duration
We apply the C program to evaluate hospital utilization based on hospital duration. The
stratification scheme used in the C program follows the number of hospital admissions, which
distinguishes duration accumulated after a new admission from the continuation of an ongoing
stay, a distinction that the scheme used for day in hospital (section 4.1.2) cannot make. Estimated
results with robust standard errors displayed in brackets are provided in Table C.4 (Appendix C).
The significance of the stratification effect from model (4a)&(4b) emphasizes the
importance of investigating the relative risk of hospital duration across strata. Also,
an increasing value of the estimated coefficient for the stratification variable implies a higher risk of
hospital duration for patients with more admissions.
We closely examine the estimates under Model (5a) across stratum 1, stratum 2, and
stratum 3 because these strata are incremented by one hospital admission, whereas the last
two strata (strata 4 and 5) pool several admissions together. One can see an
increasing risk of hospital duration for males compared to females as the number of admissions
increases. Similar trends appear for age at diagnosis. There are decreased risks of
hospital duration for patients with treatment diagnosis of bone and carcinoma compared to
leukemia patients as the number of admissions increases. This result indicates that bone
and carcinoma patients have shorter hospital stays in their later admissions relative to
leukemia patients. No clear trend appears for the remaining factors of interest.
The estimated results for hospital duration under the counting process (day in hospital
in section 4.1) and the extended response process (Table C.4) are similar under the Andersen
and Gill (AG) model due to its independent increment assumption. That is, different
stratification schemes do not affect the estimating procedure. These two sets of estimates
are very close in magnitude except for treatment diagnosis of lymphoma, but the effects
are consistent in indicating a lower risk of hospital duration compared to leukemia
patients.
4.4 Nonparametric Estimation
Instead of estimating the relative risk of having event(s) or focusing on factor effects using
semi-parametric approaches, a fundamental objective from a clinical perspective is estimation of
the mean function to gain overall information, such as how well a drug affects patients'
condition on average, the average number of hospital admissions for cancer patients up
to a certain time, and so on. Estimation of mean functions has been developed recently,
and methods are increasingly published in the literature. We introduce a nonparametric
estimator based on the sample mean of the available data.
4.4.1 Estimation Based on Observed Sample Mean (OSM)
Let $X(\cdot)$ denote an arbitrary response process $\{X(t) : t \ge 0\}$. We are interested in estimating
the mean function, $\mu(\cdot) = E\{X(\cdot)\}$, based on a set of independent and partially observed
realizations of the response process. We introduce an observation process (0-1 process),
$\delta(\cdot) = \{\delta(t) : t \ge 0\}$, to indicate whether $X(\cdot)$ is observed or not. That is, $\delta(t) = 1$ indicates that the
response process $X(\cdot)$ is observed at time t.
We consider censored survival data to illustrate the above formulation.
Let T > 0 be the time to an event/failure, define the response process as $X(t) = I(T > t)$,
and assume the corresponding observation process is subject to censoring, that is,
$\delta(t) = I(C > t)$, where the random variable C denotes the censoring time and $I(\cdot)$ is an
indicator function. In this setting, the mean function, $\mu(\cdot)$, of the survival process is the
survival function $\mu(t) = \Pr(T > t)$.
Hu and Lagakos (2006) introduce the observed sample mean (OSM) process $\hat{\mu}(\cdot)$ as a
natural estimator of $\mu(\cdot)$ and derive its properties. The observed sample mean process is
defined as follows:
$\hat{\mu}(t) = \sum_{i=1}^{n} \delta_i(t) X_i(t) \big/ \sum_{i=1}^{n} \delta_i(t)$
for all t for which $\sum_{i=1}^{n} \delta_i(t) > 0$, where the subscript i indexes the study subjects, i = 1, ..., n.
The theorem proved by Hu and Lagakos (2006) establishes the uniform consistency and
weak convergence of this sample mean estimator.
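A minimal Python sketch of the observed sample mean on a common time grid is given below: at each time point it averages the responses of the subjects who are still observed. The function name `observed_sample_mean`, the list-of-lists data layout, and the toy data are assumptions made for illustration; returning None where no subject is observed reflects that the estimator is defined only where at least one subject is observed.

```python
def observed_sample_mean(responses, observed):
    """Average the observed responses at each time point.

    responses[i][k] is X_i(t_k); observed[i][k] is 1 if subject i is
    observed at t_k and 0 otherwise.
    """
    estimates = []
    for k in range(len(responses[0])):
        n_observed = sum(d[k] for d in observed)
        if n_observed == 0:
            estimates.append(None)  # estimator undefined when nobody is observed
            continue
        total = sum(x[k] * d[k] for x, d in zip(responses, observed))
        estimates.append(total / n_observed)
    return estimates


# Toy data: three subjects on a grid of four time points; the second and third
# subjects are censored early and nobody is observed at the last time point.
X = [[0, 1, 2, 2], [0, 0, 1, 1], [1, 1, 1, 1]]
delta = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
osm = observed_sample_mean(X, delta)
print(osm)
```

Because the denominator shrinks as subjects are censored, estimates late in follow-up rest on few subjects and can be unstable.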
4.4.2 Analysis Results Based on OSM for Childhood Cancer Data
To address hospital utilization of the childhood cancer survivors, recall that three
response processes are of particular interest. Let X(t) be the cumulative number of
hospital admissions, hospital status, or duration, and form the corresponding response process
$\{X(t) : t \ge 0\}$. We choose the observation process, $\{\delta(t) : t \ge 0\}$, subject to censoring or
termination of the study.
Based on the data of the response and observation processes, we estimate the mean
function using the OSM estimator. The estimates are plotted in Figure 4.2. One can see an
increasing trend over time for both the mean number of admissions and the mean
hospital duration, which indicates that childhood cancer survivors are likely to be hospitalized
during any time period. The prevalence of hospitalization over time reveals a consistent
phenomenon. On the other hand, using the lowess smoother (Cleveland, 1979), a
decreasing trend in the prevalence of hospitalization can be seen in the scatter plot. The roughly
decreasing prevalence suggests a possible decline in the cancer survivors' time in hospital.
The apparent decreasing trend in hospital prevalence could result from an invalid
assumption of non-informative censoring. The relatively high estimated values near the end of
the study might be due to the small number of survivors whose information is available
in the late study period.
[Figure 4.2 panels: Number of Hospital Admissions; Prevalence of Hospitalization; Hospital Duration. Horizontal axis: follow-up time (in days), 0 to 5000.]
Figure 4.2: Observed Sample Mean (OSM) with hospital admission, status, and duration. NOTE: The estimates in the left column are connected by lines, while the estimates in the right column are shown as scatter plots with a lowess smoother
Chapter 5
Final Remarks
We applied several approaches for recurrent events analysis that have been proposed in the
literature to evaluate hospital utilization of childhood cancer survivors. While working
on it, we noticed that each hospital admission is followed by a hospital stay and that a subject
cannot be admitted to hospital during this period. This motivated us to consider
adjustment for event duration using the available discharge records. It is desirable to exclude
a subject from the group at risk of admission until his/her discharge
from hospital. In addition, to incorporate different aspects of clinical interest, we extended
recurrent data analysis to generalized longitudinal data analysis. With the flexibility of
our methodology, we are able to analyze hospital admission, status, and duration under a
unified framework.
Further work includes theoretical and simulation studies of the proposed methods.
Model checking, consideration of different time scales, implementation of additional models
to address other scientific questions of interest, and case-control analysis are also worth
further investigation.
The childhood cancer data we used to illustrate our formulation and approaches
can be considered a subset of the ongoing research project at the BC Cancer Agency. In
practice, there are more potential risk factors that investigators are interested in. It may
be computationally challenging to include all the potential risk factors. However, important factors
might be overlooked if not all potential factors are included.
For the analysis of recurrent events in longitudinal data, there are several possible time
scales to consider. Two natural time scales for recurrent event data are time-to-event and
time-between-event. We have considered the recurrent events in the time-to-event fashion;
the estimation procedures are also applicable to recurrent gap times, also known as
time-between-event or inter-event times.
Lin and Ying (1994) proposed the additive risk model as an alternative to the
proportional hazards model. The additive risk model specifies that the hazard function is the sum
of, rather than the product of, the baseline hazard function and the regression function
of covariates. Their proposed techniques for estimating parameters resemble the
partial-likelihood-based methods for the proportional hazards model, and their estimator of the
regression parameters has a closed form. They suggested fitting both multiplicative and additive
risk models to the same data set, as these risk models inform two different aspects of the
association between risk factors and hospital utilization: the former pertains to the risk
ratios whereas the latter pertains to the risk differences.
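For reference, the two hazard specifications contrasted above can be written side by side. The notation ($\lambda_0$ for the baseline hazard, $\beta$ for the regression parameters, $Z(t)$ for the covariates) follows common usage and is assumed rather than taken from an equation in this thesis.

```latex
% Multiplicative (proportional hazards) model
\lambda(t \mid Z) = \lambda_0(t)\,\exp\{\beta' Z(t)\}

% Additive risk model (Lin and Ying, 1994)
\lambda(t \mid Z) = \lambda_0(t) + \beta' Z(t)
```

In the multiplicative form, $\exp(\beta_j)$ is the hazard ratio for a one-unit increase in the j-th covariate; in the additive form, $\beta_j$ is the corresponding hazard difference.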
Another possibility for future work is to focus on another aspect of clinical interest by
applying a multi-state model that provides the transition probabilities into and out of states. For
this particular health utilization project, possible states of interest can be determined by (1)
the status of hospitalization, (2) ordinal hospital admissions or discharges, (3) the combination
of ordinal hospital admissions and corresponding discharges, and (4) all of the above together
with other states of interest. Figure 5.1 displays possible transitions between the states of
interest mentioned. As one can expect, the multi-state model is very informative for making
inference about a patient's health condition as well as for assessing hospital utilization.
Comparisons between cases and controls are of particular interest from a clinical perspective.
The case-control comparison can help scientific investigators draw conclusions about the
level of increased health utilization that childhood cancer patients need to improve their
quality of life. It can be easily implemented by introducing an additional covariate to
indicate the treatment (case) group and the control group. It is important to select covariates that
suit one's interest; however, the choice of covariates for
the case-control comparison is sometimes not clear to the investigators.
Assessing hospital resources in the province of British Columbia, Canada is feasible
because medical insurance (the Medical Service Plan, currently named Health Insurance BC)
is required for each resident in BC. Although rare, a gap in insurance coverage
is possible even for those suffering from serious diseases, such as cancer patients. The
complication then lies in the observation process. Instead of assuming
continuous insurance coverage for the observation process subject to censoring, taking the
status of insurance coverage into account together with the censoring information may be an
option. Moreover, from an insurance company or government perspective, it is useful to
know the frequency of hospitalization and the associated cost that customers
claim in order to set up assistance programs. To make inference about this, the state of being in or out
of insurance coverage can be added to the multi-state model, as displayed in Figure 5.1 (4).
[Figure 5.1 panels: (1) In Hospital, Out of Hospital; (2) 1st admission, 2nd admission, 3rd admission, and so on; (3) admissions alternating with discharges (2nd admission, 2nd discharge, and so on); (4) Under MSP Coverage (In Hospital, Out of Hospital), Out of MSP Coverage.]
Figure 5.1: Illustration of multi-state model with states determined by (1) the status of hospitalization, (2) ordinal hospital admissions (or discharges), (3) combination of ordinal hospital admissions and corresponding discharges, and (4) status of hospitalization and status of health insurance coverage
Appendix A
Estimation Results Using R
Built-in Function
This appendix provides all of the estimates obtained by the R built-in function with
robust standard errors. It includes tables for the analysis of hospital admissions without
event-duration adjustment (Table A.1 & Table A.2); the analysis of hospital admissions with an
additional time-dependent covariate under event-duration adjustment (Table A.3 & Table
A.4); and the analysis of hospital admissions with event-duration adjustment (Table A.5 &
Table A.6). In addition to the analysis of hospital admissions, the analysis results for day in hospital
presented in Chapter 4 are also obtained with the R built-in function; Table A.7 provides
the estimates with robust standard errors.
Because different standard errors are used, the significant covariate effects are expected to
differ from those obtained without robust standard errors, even though the alpha
level is set to 0.05 in both cases.
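As a small worked illustration of why the choice of standard error changes which effects are flagged, the sketch below assumes that significance is judged with a two-sided Wald test against the standard normal distribution; this appendix does not state the test explicitly, so that rule, and the function names wald_z and significant_at_05, are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-sided standard normal critical value for an alpha level of 0.05. */
#define Z_CRIT_05 1.959964

/* Wald statistic: the estimate divided by its (robust) standard error. */
double wald_z(double estimate, double std_error)
{
    assert(std_error > 0.0);
    return estimate / std_error;
}

/* True when |z| exceeds the critical value, i.e. significant at 0.05. */
bool significant_at_05(double estimate, double std_error)
{
    double z = wald_z(estimate, std_error);
    double abs_z = z < 0.0 ? -z : z;
    return abs_z > Z_CRIT_05;
}
```

Under this rule, an estimate of -0.596 with robust standard error 0.115 gives z of about -5.18 and is flagged, whereas 0.065 with standard error 0.118 gives z of about 0.55 and is not. The same estimate paired with a larger or smaller standard error can therefore move across the 0.05 threshold.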
[Table A.2: table body not recoverable. Column headings: Models; Stratum; Number of Events; calendar 2; calendar 3; lymphoma; CNS; other; kidney; bone; carcinoma; gender; PLRT; df; p-value.]
Table A.2: Estimates of β under Model (5) and its special cases for hospital admissions without event-duration adjustment. NOTE: Estimates of robust standard error are displayed in brackets.
Models            Model (5a)
Stratum           1 (enum 6)       2 (enum 19)      3 (enum 42)      4 (enum 84)      5 (enum 392)
Number of Events  2277             2280             2342             2357             2364
calendar 2        *-0.319 (0.097)  -0.176 (0.196)   0.276 (0.273)    -0.226 (0.283)   *1.140 (0.429)
calendar 3        *-0.691 (0.135)  -0.595 (0.314)   -0.296 (0.359)   *-1.069 (0.539)  *-1.227 (0.470)
std.age           *0.021 (0.009)   0.030 (0.016)    0.009 (0.029)    -0.029 (0.024)   *0.070 (0.017)
lymphoma          -0.190 (0.153)   -0.451 (0.286)   -0.165 (0.520)   0.559 (0.599)    -1.415 (1.182)
CNS               0.172 (0.34)     0.084 (0.245)    -0.138 (0.315)   0.363 (0.327)    *1.509 (0.422)
other             -0.201 (0.255)   -0.045 (0.508)   0.065 (0.443)    *-0.??? (0.387)  *1.555 (0.448)
kidney            -0.344 (0.222)   -0.060 (0.827)   -0.??? (0.644)   0.293 (0.606)    *0.794 (0.367)
bone              0.289 (0.157)    0.270 (0.336)    0.471 (0.?95)    -0.510 (0.366)   0.680 (0.438)
carcinoma         -0.044 (0.156)   -0.160 (0.276)   -0.609 (0.535)   *-1.303 (0.409)  *-1.792 (0.377)
gender            *-0.389 (0.087)  -0.026 (0.189)   -0.085 (0.250)   0.337 (0.290)    -0.206 (0.288)
PLRT              320              184              247              1070             1321
df                10               10               10               10               10
p-value           < 0.001          < 0.001          < 0.001          < 0.001          < 0.001

Models            Model (4a)&(4b)  Model (5b)       Model (4c)       Model (5b)
Stratum                            overall effect                    overall effect
Number of Events                   11620                             11620
calendar 2        0.065 (0.118)    0.057 (0.126)    -0.256 (0.200)   -0.074 (0.076)
calendar 3        *-0.596 (0.115)  *-0.525 (0.120)  *-1.142 (0.244)  -0.473 (0.095)
std.age           0.008 (0.010)    0.008 (0.009)    0.026 (0.021)    *0.013 (0.006)
lymphoma          -0.160 (0.175)   -0.097 (0.174)   *-0.914 (0.299)  -0.144 (0.113)
CNS               *0.332 (0.166)   *0.378 (0.169)   0.138 (0.264)    0.??? (0.094)
other             0.163 (0.177)    0.127 (0.185)    0.101 (0.443)    -0.075 (0.140)
kidney            0.073 (0.233)    0.002 (0.?38)    -0.305 (0.562)   -0.116 (0.223)
bone              0.240 (0.162)    0.246 (0.160)    0.487 (0.345)    0.104 (0.117)
carcinoma         *-0.381 (0.184)  *-0.383 (0.195)  *-0.6?? (0.290)  -0.1?3 (0.117)
gender            0.021 (0.106)    -0.035 (0.107)   -0.049 (0.178)   -0.019 (0.0?6)
enum 19           *3.061 (0.116)
enum (?)          *3.951 (0.142)
enum (?)          *4.442 (0.166)
enum (?)          *5.039 (0.199)
PLRT              37389            874              [illegible]      188
df                14               10               10               10
p-value           < 0.001          < 0.001          < 0.001          < 0.001

Table A.7: Estimation results of hospital duration, Day in Hospital, using the R built-in function for several models (estimates of robust standard error are displayed in brackets). NOTE: Andersen and Gill model, model (4c), is unaffected by the different definitions of stratification. The last column of the table displays the estimating results for the overall effect of Prentice, Williams and Peterson model, model (5b), under a commonly used definition of strata according to the number of days stayed in hospital. * indicates significance at the α level of 0.05. A ? marks a character that is illegible in the source.
Appendix B
Data Input in C Program
Because of the variety of models, types of response process, and event-duration adjustments
considered in this thesis project, the C program is coded to handle this variety through
user input. We require input from the user including the dimension of the parameters under the
specific model (Dimension), the response process for analyzing hospital utilization
(Response Type), the model specification (Model Type), the identification of the stratum for
stratum-specific or overall estimates (Stratum), and whether to adjust for event duration
(Adjustment). Table B.1 displays the possible combinations we considered in this thesis
project.
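The five inputs can be pictured as a single record that is checked before any estimation starts. The following is only a sketch: the type AnalysisInput, its field names, and the function input_is_valid are hypothetical and are not taken from the thesis program, and the admissible ranges are those listed in Table B.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the five user inputs described above. */
typedef struct {
    int dimension;      /* number of regression parameters, must be positive */
    int response_type;  /* 1 = admissions, 2 = status, 3 = duration */
    int model_type;     /* 1..5: model (4), (4a)&(4b), (4c), (5a), (5b) */
    int stratum;        /* 0 = overall effect, 1..5 = stratum-specific */
    int adjustment;     /* 0 = none, 1 = adjusted, 2 = adjusted with extra covariate */
} AnalysisInput;

/* Check every field against the categorical values listed in Table B.1. */
bool input_is_valid(const AnalysisInput *in)
{
    assert(in != NULL);
    return in->dimension > 0
        && in->response_type >= 1 && in->response_type <= 3
        && in->model_type    >= 1 && in->model_type    <= 5
        && in->stratum       >= 0 && in->stratum       <= 5
        && in->adjustment    >= 0 && in->adjustment    <= 2;
}
```

Validating the whole record in one place keeps the estimation routines free of repeated range checks; a Dimension of 10 in a call would be an illustrative value only.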
Dimension and Model Type are model-dependent. Dimension is useful when we consider the
same model but include the previous hospital duration as
one extra time-dependent covariate, so that we can also obtain the estimate for this extra
covariate.
Detailed categorical values for each input are provided for hospital admissions; some
modification of Stratum^a is necessary when considering other responses such as hospital status
and duration, as mentioned in Section 4.2.
^a Modification required for Stratum under hospital status and duration
Input            Value   Meaning
Response Type    1       Hospital Admissions
                 2       Hospital Status or Prevalence of Hospitalization
                 3       Hospital Duration
Model Type       1       model (4)
                 2       model (4a)&(4b)
                 3       model (4c)
                 4       model (5a)
                 5       model (5b)
Stratum^a        0       overall effect
                 1       N(t-) = 0
                 2       N(t-) = 1
                 3       N(t-) = 2
                 4       N(t-) = 3 to 5
                 5       N(t-) > 5
Adjustment       0       no event-duration adjustment
                 1       event-duration adjustment
                 2       event-duration adjustment with extra time-dependent covariate
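For hospital admissions, the Stratum coding above is a direct function of N(t-), the number of admissions before time t. A minimal sketch of that mapping follows; the function name stratum_from_prior_admissions is hypothetical.

```c
#include <assert.h>

/* Map N(t-), the number of admissions before time t, to strata 1..5. */
int stratum_from_prior_admissions(int n_prior)
{
    assert(n_prior >= 0);
    if (n_prior <= 2)
        return n_prior + 1;  /* N(t-) = 0, 1, 2 -> strata 1, 2, 3 */
    if (n_prior <= 5)
        return 4;            /* N(t-) = 3 to 5  -> stratum 4 */
    return 5;                /* N(t-) > 5       -> stratum 5 */
}
```

Stratum 0 (overall effect) is a user choice rather than a value derived from N(t-), so it is deliberately not produced by this function.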
Appendix C
Estimation Results Based on C
Program
We apply some of the models proposed in Chapter 3 by providing the corresponding user input to
the C program described in Appendix B. Of those models, we choose model (4a)&(4b), model
(4c), model (5a), and model (5b) for illustration. In addition to the model specification, we
also provide the estimates for hospital admissions without and with event-duration adjustment
so that we can compare the results with those obtained using R.
For the models just mentioned, there are some problems obtaining estimates for
hospital status because of sensitivity to the selection of initial values. Therefore, we only provide
estimates under model (5a) for hospital status.
The estimation results are provided in Table C.1 and Table C.2 for hospital admissions
without and with event-duration adjustment, Table C.3 for hospital status, and Table C.4 for
hospital duration. Estimates of robust standard error are displayed in brackets for hospital
status and hospital duration, but not for hospital admissions.
Stratum           1 (enum 1)       2 (enum 2)       3 (enum ?)       4 (enum 5)       5 (enum 7+)
calendar 2        -0.263 (0.317)   0.207 (0.573)    0.954 (1.355)    0.215 (0.758)    -0.567 (0.759)
[Rows for calendar 3, std.age, lymphoma, CNS, other, kidney, bone, carcinoma, and gender are not recoverable.]

Table C.3: Estimated results of hospital status based on C program for Model (5a), i.e., the estimates are obtained through the specific user input of Response Type = 2; Model Type = 4; Stratum = 1, 2, 3, 4, 5; and Adjustment = 0; with corresponding Dimension and Stratum. Estimates of robust standard error are displayed in brackets.
Bibliography
Andersen, P.K. and Gill, R.D. (1982). 'Cox's regression model for counting processes: a
large sample study'. Annals of Statistics 10:1100-1120.
Cleveland, W.S. (1979). 'Robust Locally Weighted Regression and Smoothing Scatterplots'.
Journal of the American Statistical Association 74:829-836.
Cleveland, W.S. (1981). 'LOWESS: A program for smoothing scatterplots by robust locally
weighted regression'. The American Statistician 35:54.
Cox, D.R. (1972). 'Regression Models and Life-Tables'. Journal of the Royal Statistical
Society. Series B (Methodological) 34, No. 2:187-220.
Cox, D.R. (1975). 'Partial Likelihood'. Biometrika 62, No. 2:269-276.
Finkelstein, D.M., Schoenfeld, D.A., and Stamenovic, E. (1997). 'Analysis of multiple failure
time data from an AIDS clinical trial'. Statistics in Medicine 16:951-961.
Hewitt, M., Weiner, S.L., and Simone, J.V. (2003). Childhood Cancer Survivorship:
Improving Care and Quality of Life. The National Academies Press, Washington, D.C.
Hu, X.J., and Lagakos, S.W. (2006). 'Nonparametric estimation of the mean function of a
stochastic process with missing observations (Revision for publication)'.
Hu, X.J., Sun, J., and Wei, L.J. (2003). 'Regression Parameter Estimation from Panel
Counts'. Scandinavian Journal of Statistics 30:25-43.
Kelly, P.J. and Lim, L.L-Y (2000). 'Survival analysis for recurrent event data: an application
to childhood infectious diseases'. Statistics in Medicine 19:13-33.
Klein, J.P. and Moeschberger, M.L. (2003). Survival Analysis: Techniques for Censored
and Truncated Data. Springer (Statistics for Biology and Health), New York.
Lawless, J.F. (2002). Statistical Models and Methods for Lifetime Data. John Wiley & Sons
Canada (Wiley series in probability and statistics).
Lee, E.W., Wei, L.J. and Amato, D.A. (1992). 'Cox-type regression analysis for large
numbers of small groups of correlated failure time observations'. Survival Analysis: State
of the Art (Phaenomenologica Series) 39:237-247.
Lin, D.Y. and Wei, L.J. (1989). 'The Robust Inference for the Cox Proportional Hazards
Model'. Journal of the American Statistical Association 84:1074-1078.
Lin, D.Y. and Ying, Z. (1994). 'Semiparametric Analysis of the Additive Risk Model'.
Biometrika 81:61-71.
Metcalfe, C., Thompson, S.G., and White, I.R. (2005). 'Analyzing the duration of recurrent
events in clinical trials: A comparison of approaches using data from the UK700 trial of
psychiatric cases management'. Contemporary Clinical Trials 26:443-458.
Prentice, R.L., Williams, B.J., and Peterson, A.V. (1981). 'On the regression analysis of
multivariate failure time data'. Biometrika 68:373-379.
Therneau, T.M. and Grambsch, P.M. (2000). Modeling Survival Data: Extending the Cox
Model. Springer (Statistics for Biology and Health), New York.
Wei, L.J. and Glidden, D.V. (1997). 'An overview of statistical methods for multiple failure
time data in clinical trials'. Statistics in Medicine 16:833-839.
Wei, L.J., Lin, D.Y. and Weissfeld, L. (1989). 'Regression analysis of multivariate incomplete
failure time data by modeling marginal distributions'. Journal of the American Statistical
Association 84:1065-1073.