
GENERALIZED LONGITUDINAL DATA ANALYSIS, WITH

APPLICATION TO EVALUATING HOSPITAL UTILIZATION BASED

ON ADMINISTRATIVE DATABASE

Shih-Wa Celes Ying

B.Sc. in Statistics and Actuarial Science, Simon Fraser University, 2004

A PROJECT SUBMITTED IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in the School of Statistics and Actuarial Science

© Shih-Wa Celes Ying 2006

SIMON FRASER UNIVERSITY

Summer 2006

All rights reserved. This work may not be

reproduced in whole or in part, by photocopy

or other means, without the permission of the author.

APPROVAL

Name: Shih-Wa Celes Ying

Degree: Master of Science

Title of project: Generalized Longitudinal Data Analysis, with Application

to Evaluating Hospital Utilization based on Administrative

Database

Examining Committee: Dr. Richard Lockhart

Chair

Dr. X. Joan Hu Senior Supervisor Simon Fraser University

Dr. Rachel Altman Supervisor Simon Fraser University

Dr. John Spinelli External Examiner BC Cancer Agency and Simon Fraser University

Date Approved: 3,14 I$. 1006

SIMON FRASER UNIVERSITY LIBRARY

DECLARATION OF PARTIAL COPYRIGHT LICENCE

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library Burnaby, BC, Canada


STATEMENT OF ETHICS APPROVAL

The author, whose name appears on the title page of this work, has obtained, for the research described in this work, either:

(a) Human research ethics approval from the Simon Fraser University Office of Research Ethics,

(b) Advance approval of the animal care protocol from the University Animal Care Committee of Simon Fraser University;

or has conducted the research

(c) as a co-investigator, in a research project approved in advance,

(d) as a member of a course approved in advance for minimal risk human research, by the Office of Research Ethics.

A copy of the approval letter has been filed at the Theses Office of the University Library at the time of submission of this thesis or project.

The original application for approval and letter of approval are filed with the relevant offices. Inquiries may be directed to those authorities.


Abstract

There are many practical situations in which subjects can experience recurrences of an event, the event has non-negligible duration, and both the rate of event occurrence and the cumulative event duration are of particular interest. Well-developed methods for recurrent event analysis do not take event duration into account, which can lead to misleading inferences in such situations. Motivated partly by a research project with the BC Cancer Agency to evaluate the hospital utilization of young cancer survivors, we develop a method to analyze recurrent event data with adjustment for event duration. Our methodology can be viewed as an extension of the well-established approaches for recurrent events. We also propose an approach to fitting semiparametric models for a general response process, which includes the counting process as a special case. Data from the cancer project are used throughout the thesis to illustrate our formulation and approaches.

Keywords: counting process, data collected over time, event-duration, semiparametric

regression model

To my father and my mother

Acknowledgements

It is never enough for me to thank my supervisor, X. Joan Hu, who guided me through the program with patience and encouragement and helped me build my confidence and reach for high standards. Without the time she devoted to it, this thesis would not have been completed.

I would like to acknowledge the insightful comments and suggestions made by my thesis committee members, Rachel Altman and John Spinelli.

It brings me great pleasure to record my thanks to John Spinelli and Mary McBride for providing me with such a nice opportunity to work at the BC Cancer Agency under their supervision, which partly motivated my research interest and made the completion of my thesis possible. Special thanks go to Gurbakhshash Singh and Maria Lorenzi, who helped me and gave me excellent advice along the way.

I am indebted to Professors Charmaine Dean, Richard Lockhart, and Tim Swartz, who have taught me statistics in different fields. I express my appreciation to my fellow graduate students Lihui, Pritam, Jean, Eric, Darcy, Suman, Amy, Farouk, and Darby, who have been helpful and kind to me throughout my graduate study at Simon Fraser University.

To my friends, Chunfang Lin, Lucy Liu, Lisa Wang, and Lijing Xie, thank you for listening to me, sharing with me, and being with me. My life would not be so colorful without you.

Finally, I am grateful to my family for their care, support and encouragement. I cannot thank my parents enough for enlightening my life. To my brother, Leon, thank you for supporting me and making me happier.

Contents

Approval
Abstract
Dedication
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 General Framework
1.3 Overview
2 Childhood Cancer Survivorship Project
2.1 Data Sources
2.2 Childhood Cancer Hospitalization Data
2.2.1 Potential Risk Factors
2.2.2 Data Formulation and Construction
3 Analysis of Recurrent Events
3.1 Review of Cox Proportional Hazards Model
3.2 Extensions of Cox PH Model for Recurrent Events
3.2.1 Notation
3.2.2 Various Models
3.2.3 Estimation Procedures
3.2.4 Variance Estimation
3.3 Analysis of Childhood Cancer Data I
3.3.1 Time-to-Event Formulation
3.3.2 Analysis Results: Hospital Admissions
3.3.3 Model Checking
3.4 Extensions of Cox PH Model for Frequency of Events Adjusted for Event Duration
3.4.1 Adjustment for Event Duration I
3.4.2 Adjustment for Event Duration II
3.5 Analysis of Childhood Cancer Data II
3.5.1 Comparison of Analysis Results: Recurrent Events versus Event-Duration Adjustment
4 Analysis of Generalized Longitudinal Data
4.1 Analysis of Hospital Duration Using Counting Process Formulation
4.1.1 R Data Format for Hospital Duration
4.1.2 Analysis Results for Day in Hospital Using R Built-in Function
4.2 General Response Processes
4.2.1 Methodology Inheritance
4.3 Analysis of Childhood Cancer Data III
4.3.1 Analysis Results: Prevalence of Hospitalization
4.3.2 Analysis Results: Hospital Duration
4.4 Nonparametric Estimation
4.4.1 Estimation Based on Observed Sample Mean (OSM)
4.4.2 Analysis Results Based on OSM for Childhood Cancer Data
5 Final Remarks
Appendices
A Estimation Results Using R Built-in Function
B Data Input in C Program
C Estimation Results Based on C Program
Bibliography

List of Figures

1.1 Response Processes with Discrete Time and Continuous Time
3.1 Model Flow-Chart
3.2 Strata Flow-Chart
3.3 Estimates Based on Andersen-Gill Model vs. Nelson-Aalen Estimates
3.4 Estimates Based on Prentice-Williams-Peterson Model vs. Nelson-Aalen Estimates
4.1 Hypothetic Examples
4.2 Observed Sample Mean (OSM) with Three Types of Response of Interest
5.1 Graphical Display of Multi-State Transitions

List of Tables

2.1 Cross-Classification of Gender by Diagnosis Period
2.2 Frequency of Cancer Diagnoses
3.1 Original Data Format
3.2 Time-to-Event Interval Format
3.3 Estimates of β in Model (4) and Its Special Cases
3.4 Estimates of β in Model (5) and Its Special Cases
3.5 Time-to-Event Interval Format Adjusted for Event Duration
3.6 Estimates of β in Model (4) and Its Special Cases with Additional Time-Dependent Covariate under Event-Duration Adjustment
3.7 Estimates of β in Model (5) and Its Special Cases with Additional Time-Dependent Covariate under Event-Duration Adjustment
3.8 Estimates of β in Model (4) and Its Special Cases under Event-Duration Adjustment
3.9 Estimates of β in Model (5) and Its Special Cases under Event-Duration Adjustment
4.1 Hypothetic Data for an Arbitrary Individual
4.2 Time-to-Event Interval Format for Hospital Duration
4.3 Definition of Stratification to Five Strata
4.4 Estimates of β for Hospital Duration Based on Day in Hospital
A.1 Estimates of β for Hospital Admissions without Event-Duration Adjustment: Model (4) Related Estimates with Robust Standard Errors
A.2 Estimates of β for Hospital Admissions without Event-Duration Adjustment: Model (5) Related Estimates with Robust Standard Errors
A.3 Estimates of β for Hospital Admissions with Additional Time-Dependent Covariate under Event-Duration Adjustment: Model (4) Related Estimates with Robust Standard Errors
A.4 Estimates of β for Hospital Admissions with Additional Time-Dependent Covariate under Event-Duration Adjustment: Model (5) Related Estimates with Robust Standard Errors
A.5 Estimates of β for Hospital Admissions under Event-Duration Adjustment: Model (4) Related Estimates with Robust Standard Errors
A.6 Estimates of β for Hospital Admissions under Event-Duration Adjustment: Model (5) Related Estimates with Robust Standard Errors
A.7 Estimates of β for Hospital Duration Based on Day in Hospital with Estimated Robust Standard Errors
B.1 Input Model Specification and Response Process in C Program
C.1 C Program Output for β Estimates with Several Models
C.2 C Program Output for β Estimates with Several Models under Event-Duration Adjustment
C.3 C Program Output for β Estimates in Model (5a) for Hospital Status
C.4 C Program Output for β Estimates with Several Models for Hospital Duration

Chapter 1

Introduction

1.1 Motivation

Health services play a crucial role in people's everyday lives, and it is important to evaluate health services utilization, such as hospitalization, especially for people diagnosed with serious diseases such as cancer. Risk factors for health services use that are identified in a research project can eventually be communicated to policymakers and care providers to help improve the nation's quality of life.

The well-known Cox proportional hazards model (Cox, 1972) can be applied directly to study risk factors for the time to first hospitalization. To analyze recurrent hospital admissions, we can apply well-developed methods for recurrent events. In particular, Andersen and Gill (1982) assume that the event counts over time follow a Poisson-type process and propose an estimation approach accordingly. In contrast, Prentice, Williams and Peterson (1981) consider a stratified proportional intensity model, which allows covariate effects to be studied within each stratum as well as overall. Other intermediate models, which combine the simplicity of the Andersen and Gill (AG) model with the stratification of the Prentice, Williams and Peterson (PWP) model, are also considered to further investigate scientific questions of interest.

These conventional methods for recurrent event data deal with "point" events: an event happens at a single time point, and the subject is again at risk for the next occurrence immediately after that point. This situation is very common for recurrent events such as asthma attacks, which occur essentially instantaneously. This is not the case, however, for hospital admissions of childhood cancer patients. Complex treatments and careful surveillance are needed after patients are admitted to hospital, so patients usually stay in hospital for a period of time, and a patient cannot have another hospital admission while he/she is in hospital. For this type of event, conventional methods need to be modified to take event duration into account.

In addition to hospital admissions as a measure of health services utilization, other quantities such as the prevalence of hospitalization and hospital duration are of interest to investigators as well. We therefore extend the response process from a counting process to a general response process, which makes it possible to evaluate health services utilization from different aspects.

This thesis reviews available methods in recurrent events analysis, and presents an

extension with adjustment for event duration. In addition, we consider different types of

response, i.e., generalized longitudinal data, not restricting ourselves to count data.

1.2 General Framework

The notation used throughout the rest of the thesis is briefly introduced here. We denote the response process as $\{N(t): 0 < t < \infty\}$ to emphasize that the events of interest are counts, where $N(t)$ is the cumulative number of failures/events up to time $t$ for an arbitrary subject. We later extend the response process to a general process, denoted $\{X(t): 0 < t < \infty\}$, which includes the counting process as a special case. The following examples give some special cases of $\{X(t): t > 0\}$. A graphical illustration of these response processes is displayed in Figure 1.1.

Example 1. Survival Process of Time to the First Hospital Admission: Let $T > 0$ be the time to the event of interest (i.e., the first hospital admission) and define the response process as $X(t) = I(T \le t)$. That is,

$$X(t) = \begin{cases} 1 & \text{if } T \le t, \\ 0 & \text{otherwise.} \end{cases}$$

Figure 1.1(a) shows the survival process in continuous time.

Example 2. Counting Process as Cumulative Counts of Hospitalization Over Time: Let the event of interest be hospital admission and define the response process $X(t)$ to be the number of hospital admissions up to time $t$. Figure 1.1(b) illustrates the counting process.

Example 3. Alternating Binary Process as an Indicator of Hospital Status Over Time: Using two states to capture the hospitalization status of an individual (i.e., whether or not the patient is in hospital), define the response process as the indicator of the patient's hospital status, $X(t) = I(\text{in hospital at time } t)$. That is,

$$X(t) = \begin{cases} 1 & \text{if the patient is in hospital at time } t, \\ 0 & \text{otherwise.} \end{cases}$$

See Figure 1.1(c) for an illustration of the alternating binary process.

Example 4. Response Process as Cumulative Hospital Duration: Whether time is treated as discrete or continuous, accumulating the alternating binary process of Example 3 over time gives the response process for hospital duration; in continuous time, $X(t) = \int_0^t I(\text{in hospital at time } s)\,ds$. Figure 1.1(d) shows the cumulative hospital duration over time.
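To make Examples 1 to 4 concrete, the sketch below evaluates each response process at a given time for one subject. It is a minimal illustration, not the thesis's code: the function names are hypothetical, each hospital stay is assumed to be an (admission day, discharge day) pair on the subject's own clock, and the patient is assumed to be in hospital on the half-open interval [admission, discharge).

```python
from typing import List, Tuple

Stay = Tuple[float, float]  # (admission day, discharge day): assumed layout


def survival_process(stays: List[Stay], t: float) -> int:
    """Example 1: X(t) = I(T <= t), where T is the first admission time."""
    return int(bool(stays) and min(a for a, _ in stays) <= t)


def counting_process(stays: List[Stay], t: float) -> int:
    """Example 2: number of admissions up to and including time t."""
    return sum(1 for a, _ in stays if a <= t)


def hospital_status(stays: List[Stay], t: float) -> int:
    """Example 3: X(t) = I(in hospital at time t), stays taken as [adm, rel)."""
    return int(any(a <= t < r for a, r in stays))


def cumulative_duration(stays: List[Stay], t: float) -> float:
    """Example 4: total time spent in hospital during [0, t]."""
    return sum(min(r, t) - a for a, r in stays if a < t)
```

For example, with the stays (78, 100), (254, 257) and (288, 314) recorded for subject xxx03 in Section 2.2.2, the four processes at day 260 take the values 1, 2, 0 and 25.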

A censoring indicator, $Y(t) = I(C \ge t)$, where $C$ is the censoring time, is introduced to indicate whether the subject is still under observation at time $t$, that is, has not yet dropped out of the study.

Suppose there are $p$ factors of interest, and let $Z(t) = (Z_1(t), \ldots, Z_p(t))'$ denote the vector of covariates available at time $t > 0$. The corresponding covariate process up to time $t$ is denoted by $\{Z(u): 0 < u \le t\}$. We model the intensity function, the instantaneous rate of an event given the covariate process and the response history up to that time, and estimate the parameters in the models to make inferences and to address scientific questions of interest.



1.3 Overview

Adjustment for event duration and the extension of the response process are motivated partly by the research project with the BC Cancer Agency to evaluate the hospital utilization of childhood cancer survivors. This thesis studies data collected over time, with particular attention to the estimation of covariate effects. It is organized as follows. Chapter 2 provides background on the childhood cancer data and the available hospital records from an ongoing research project at the BC Cancer Agency. These data are analyzed for illustration using both the approaches proposed in the thesis and the well-established approaches. Chapter 3 reviews the Cox proportional hazards model, the counting process formulation, and the multiplicative risk models that have been proposed in the literature for recurrent events. Extensions of existing Cox regression models are considered, and procedures for estimating the parameters and their variances are then provided. Chapter 4 extends the approaches for the counting process to a general response process and shows how the methodology of Chapter 3 is inherited and modified. Chapter 5 gives conclusions and remarks on further investigations.

Chapter 2

Childhood Cancer Survivorship Project

Although historical records show that only 1% of all cancers in Canada are diagnosed at ages 0 to 19 years, childhood cancer is among the top three leading causes of death for children aged 1 to 14. Recent improvements in treatment have increased survival to almost 80%, compared with the mid 90's, when there was little hope of being cured (Hewitt, Weiner and Simone, 2003). As a consequence, the population of childhood cancer survivors has grown dramatically. The primary goal of a health services research project with the BC Cancer Agency is to address issues related to the assessment of long-term resource needs and the development of strategies to improve access to, and the effectiveness of, care. To accomplish this, statistical methods are required to analyze health services utilization among young cancer survivors.

2.1 Data Sources

Given the growing population of cancer survivors, it is important to describe the prevalence and patterns of health services utilization as well as the relative risk of hospitalization associated with several factors.

The data sources rely on the availability of population-based files of vital events, cancer, and health care utilization in Canada. The analyses of health risks and utilization use linked health files that originate from the BC Vital Statistics Agency, the BC Cancer Registry, and the BC Ministry of Health. These source files are administered by the Linked Health Database of the Center for Health Services and Policy Research (CHSPR) at the University of British Columbia.

An unselected, geographically defined patient group, a virtually complete set of treatment data, and individual-level utilization data are available with sufficient completeness, detail and reliability to examine health risks and utilization. Approximately 90% of health services are covered by provincial health insurance (Health Insurance BC, or Medical Services Plan), which includes all medically required services for each individual from 1986 to the present. Because of the person-based, longitudinal nature of the data files, the administrative datasets used here contain more information on severe late effects and health services utilization than studies based on self-reports or medical records.

2.2 Childhood Cancer Hospitalization Data

Residents of the province of British Columbia who were diagnosed with cancer at ages 0 to 19 between 1970 and 1995 and survived at least 5 years after diagnosis are included in the childhood cancer survivor cohort. Only survivors who lived at least five years after diagnosis are included because the study evaluates their long-term resource needs, and this group is at high risk of complications from their cancer and its treatment.

Since hospitalization records exist only from 1986 onwards, we restrict the study to childhood cancer survivors diagnosed from 1981 to 1995, so that their health services information can be fully assessed from their 5-year survival date. The resulting study group for the analysis of hospital utilization consists of 1375 childhood cancer survivors.

2.2.1 Potential Risk Factors

Several potential factors are associated with health services utilization. They include disease factors such as initial diagnosis and treatment; demographic factors such as gender, socioeconomic status (SES) and time since diagnosis; and location of services (geographic region, urban/rural). Of these potential factors, four covariates are used in the thesis for illustrative purposes: cancer diagnosis, age, gender, and diagnosis period.

The cancer diagnosis group is defined according to the International Classification of Childhood Cancer (ICCC), which is based on the third edition of the International Classification of Diseases for Oncology (ICD-O-3), published in 2000. Seven groups are of interest: leukemia (ICCC I), lymphoma (ICCC II), central nervous system (CNS: ICCC III), kidney (ICCC VI), bone (ICCC VIII or IX), carcinoma (ICCC X or XI), and others (ICCC IV, V, VII, and XII).

The individual's age at diagnosis is calculated as (diagnosis date - birth date)/365.25 years and, in the following analysis, is standardized by subtracting the mean age at diagnosis, 9.51 years. For gender, female is the baseline: gender is coded 0 for female and 1 for male.
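The arithmetic for these two covariates can be sketched as follows. The helper names and the "F"/"M" input codes are assumptions for illustration; only the 365.25-day year, the centring at 9.51 years, and the 0/1 gender coding come from the text.

```python
from datetime import date

MEAN_AGE_AT_DIAGNOSIS = 9.51  # cohort mean age at diagnosis reported in the text


def age_at_diagnosis(birth: date, diagnosis: date) -> float:
    """(diagnosis date - birth date) / 365.25, in years."""
    return (diagnosis - birth).days / 365.25


def std_age(birth: date, diagnosis: date,
            mean_age: float = MEAN_AGE_AT_DIAGNOSIS) -> float:
    """Age at diagnosis centred at the cohort mean (the std.age covariate)."""
    return age_at_diagnosis(birth, diagnosis) - mean_age


def gender_code(sex: str) -> int:
    """0 for female (baseline), 1 for male; the 'F'/'M' codes are assumed."""
    return {"F": 0, "M": 1}[sex]
```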

Diagnosis period is the calendar period of diagnosis. Note that the whole group contains survivors diagnosed with cancer between 1981 and 1995 inclusive. We further categorize the diagnosis period into three categories, (1981, 1985), (1986, 1990), and (1991, 1995), and choose the earliest calendar period as the baseline.

The information on these covariates for all 1375 study subjects is stored in the following form:

Study.ID   iccc   std.age   gender   calendar
xxx01         3     2.144        0          3
xxx02        10    10.243        1          1
xxx03         4     8.806        1          1

where the variables (from left to right) are the unique identification number of each childhood survivor, the diagnosis group of cancer, the standardized age at diagnosis, gender, and diagnosis period.
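A record in this form could be turned into model covariates roughly as below. This is a hypothetical sketch: it assumes that the `iccc` column stores the ICCC main group as an integer from 1 to 12 and that `calendar` takes the values 1, 2 and 3 for the three diagnosis periods. The grouping into seven diagnosis categories follows the list given above, and the earliest period is the baseline.

```python
from typing import Dict

# Seven diagnosis groups built from ICCC main groups, as listed in the text.
ICCC_GROUP = {
    1: "leukemia", 2: "lymphoma", 3: "CNS", 6: "kidney",
    8: "bone", 9: "bone", 10: "carcinoma", 11: "carcinoma",
    4: "other", 5: "other", 7: "other", 12: "other",
}


def covariates(line: str) -> Dict[str, object]:
    """Parse 'Study.ID iccc std.age gender calendar' into model covariates."""
    study_id, iccc, std_age, gender, calendar = line.split()
    period = int(calendar)
    return {
        "id": study_id,
        "group": ICCC_GROUP[int(iccc)],
        "std_age": float(std_age),
        "male": int(gender),                    # 0 = female (baseline)
        # Indicator variables with (1981, 1985) as the baseline period.
        "period_1986_1990": int(period == 2),
        "period_1991_1995": int(period == 3),
    }
```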


Table 2.1: Cross-Classification of Gender by Diagnosis Period

Diagnosis Period (calendar)   (1981, 1985)   (1986, 1990)   (1991, 1995)   Total
Total                                  376            463            536    1375

Table 2.2: Frequency of Cancer Diagnoses

Cancer Diagnosis   Leukemia   Carcinoma   Lymphoma   CNS   Kidney   Bone   Other   Total
Frequency               337         237        230   259       71    146      95    1375

2.2.2 Data Formulation and Construction

In cohort studies, the follow-up time is essential for the analysis. In the analysis of hospital

utilization, the study period for the group is defined as follows: the start date of the

study is the time measured starting from 5 years after initial diagnosis. All hospitalization

information, such as admission and discharge dates, is available for each individual during

the follow-up until the end of the study (December 31, 2000), death date, or cancellation

of BC Insurance registry, whichever is the earliest.
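The follow-up window can be computed as in this sketch. The names are hypothetical, and the start is assumed to be the exact fifth anniversary of diagnosis (with February 29 mapped to February 28); the end is the earliest of the study end date, death, and cancellation of registration, as stated above. With day.entry set to zero, the returned length in days plays the role of day.exit in the data format described next.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2000, 12, 31)  # end of study stated in the text


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; February 29 falls back to February 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def follow_up_window(diagnosis: date, death: Optional[date] = None,
                     cancellation: Optional[date] = None) -> Tuple[date, date, int]:
    """Start, end and length (days) of follow-up for one survivor."""
    start = add_years(diagnosis, 5)                          # 5-year survival date
    ends = [STUDY_END] + [d for d in (death, cancellation) if d is not None]
    end = min(ends)                                          # whichever comes first
    return start, end, (end - start).days
```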

Because each individual may have more than one hospital admission, and hence more than one discharge, over the follow-up period, a subject has as many rows in the data set as needed to capture his/her dates of hospital admission and discharge. The data format is as follows:

Study.ID   day.entry   day.exit   days.adm   days.rel
xxx01              0        151         NA         NA
xxx02              0        915         NA         NA
xxx03              0       1210         78        100
xxx03              0       1210        254        257
xxx03              0       1210        288        314

where the variables (from left to right) are the unique identification number of each childhood survivor, the beginning of the study period, the end of the study period, the admission time, and the discharge time. We use a Julian date format and set the internal clock for each subject so that day.entry always starts at zero; other time-dependent information is shifted correspondingly.
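The multi-row format above can be converted into counting-process style (start, stop, event) intervals. The sketch below is an illustration under stated assumptions rather than the thesis's procedure, and `risk_intervals` is a hypothetical name: with `adjust=False` each admission is treated as a point event, and with `adjust=True` the days between admission and discharge are removed from the at-risk time, reflecting the fact that a patient cannot be readmitted while in hospital.

```python
from typing import List, Tuple

Interval = Tuple[int, int, int]  # (start, stop, event indicator)


def risk_intervals(exit_day: int, stays: List[Tuple[int, int]],
                   adjust: bool) -> List[Interval]:
    """At-risk intervals for one subject; stays are (days.adm, days.rel)."""
    rows: List[Interval] = []
    start = 0                                   # day.entry is always zero
    for adm, rel in sorted(stays):
        rows.append((start, adm, 1))            # at risk until this admission
        start = rel if adjust else adm          # adjusted: resume at discharge
    if start < exit_day:
        rows.append((start, exit_day, 0))       # censored at day.exit
    return rows
```

For subject xxx03 (day.exit 1210, stays (78, 100), (254, 257) and (288, 314)), the adjusted intervals are (0, 78, 1), (100, 254, 1), (257, 288, 1) and (314, 1210, 0).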

By combining the hospital service utilization database with disease and demographic information, we obtain complete long-term outcome data and risk factor information suited to the analysis of hospital utilization among childhood cancer survivors.

Chapter 3

Analysis of Recurrent Events

In this chapter, we begin with a brief review of the Cox proportional hazards (PH) model (Cox, 1972). We then study extensions of the model for recurrent event data, often referred to as Cox regression models, some of which have been discussed in the literature and some of which have not. Procedures for estimating the parameters in the models are presented. Motivated by the need to account for hospital stays in the BC Cancer Agency research project, we propose approaches to analyzing recurrent events with adjustment for event duration. Analyses of the data described in Chapter 2 illustrate the models and the estimation procedures.

3.1 Review of Cox Proportional Hazards Model

Assume there are $n$ subjects in a study and we are interested in the effects of several covariates on the survival of these subjects. Let $T_i$ be the time from study entry to the event of interest for subject $i$, $C_i$ be the censoring time for subject $i$, and $U_i$ be the corresponding observation time, that is, $U_i = \min(T_i, C_i)$. Let $\delta_i = I(T_i \le C_i)$ be the indicator variable that takes the value 1 if the lifetime $T_i$ is observed and 0 otherwise. Denote by $h(t \mid Z)$ the hazard function of $T$ conditional on covariate $Z$, and suppose

$$h(t \mid Z) = h_0(t)\, g(Z; \beta),$$

where the nonnegative function $h_0(t)$ is the baseline hazard function, left unspecified, and $\beta$ is a vector of unknown parameters or coefficients corresponding to $Z$. Under this multiplicative hazard model, the factorization implies

$$\frac{h(t \mid Z_1)}{h(t \mid Z_2)} = \frac{g(Z_1; \beta)}{g(Z_2; \beta)},$$

which does not depend on $t$. That is, the model assumes that the hazards of two subjects with fixed covariate vectors are proportional over time. A common choice for $g(Z; \beta)$ is $e^{\beta' Z}$, which yields the hazard function $h(t \mid Z) = h_0(t)\, e^{\beta' Z}$, Cox's proportional hazards model.
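A quick numerical check of the proportional hazards property: whatever the baseline hazard, the ratio of the hazards for covariate values 1 and 0 equals $e^{\beta}$ at every time point. The Weibull baseline and the value $\beta = 0.7$ below are arbitrary choices for illustration only.

```python
import math


def weibull_baseline(t: float, shape: float = 1.5, scale: float = 2.0) -> float:
    """An arbitrary baseline hazard h0(t), chosen only for illustration."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def cox_hazard(t: float, z: float, beta: float) -> float:
    """h(t | z) = h0(t) * exp(beta * z) for a scalar covariate z."""
    return weibull_baseline(t) * math.exp(beta * z)


BETA = 0.7
# Hazard ratio for z = 1 versus z = 0 at several times; h0(t) cancels out.
ratios = [cox_hazard(t, 1, BETA) / cox_hazard(t, 0, BETA) for t in (0.5, 1.0, 5.0)]
```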

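Assume $n$ iid observations, say $(u_i, \delta_i, Z_i)$, $i = 1, 2, \ldots, n$. With $H_0(t) = \int_0^t h_0(s)\,ds$, the likelihood function can be written as

$$L(\beta, h_0) = \prod_{i=1}^{n} \left\{h_0(u_i)\, e^{\beta' Z_i}\right\}^{\delta_i} \exp\left\{-H_0(u_i)\, e^{\beta' Z_i}\right\},$$

which can be decomposed as the product of $L_1(\beta)$ and $L_2(\beta, h_0)$. Cox (1972) proposes to make inferences about $\beta$ based on the partial likelihood $L_1(\beta)$, which is not a likelihood in the usual sense (Klein and Moeschberger, 2003). The partial likelihood suggested by Cox (1975) has the form

$$L_1(\beta) = \prod_{i=1}^{n} \left[\frac{\exp\{\beta' Z_i(u_i)\}}{\sum_{j \in R(u_i)} \exp\{\beta' Z_j(u_i)\}}\right]^{\delta_i},$$

where the risk set $R(u_i) = \{j : u_j \ge u_i\}$ is the set of subjects who are alive and at risk just prior to time $u_i$, and $Z_j(u_i)$ is observed on $[0, u_i]$ for subject $j$. It can be shown that the resulting estimator of $\beta$ is consistent and asymptotically normal.

The logarithm of the partial likelihood has the form

$$\ell(\beta) = \sum_{i=1}^{n} \delta_i \left[\beta' Z_i(u_i) - \log \sum_{j \in R(u_i)} \exp\{\beta' Z_j(u_i)\}\right].$$

Let $\hat\beta$ be the value that maximizes the partial likelihood. Breslow (1972) suggests the estimator

$$\hat H_0(t) = \sum_{i:\, u_i \le t} \frac{\delta_i}{\sum_{j \in R(u_i)} \exp\{\hat\beta' Z_j(u_i)\}}$$

for the underlying cumulative hazard $H_0(t) = \int_0^t h_0(s)\,ds$.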
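The partial likelihood and the Breslow estimator can be sketched for a single time-fixed covariate as follows. This is an illustrative implementation under stated assumptions (no tied event times, Newton-Raphson maximization starting at $\beta = 0$, made-up data in `DATA`); it is not the software used in the thesis. With $\beta = 0$ the Breslow estimator reduces to the Nelson-Aalen estimator.

```python
import math
from typing import List, Tuple

Obs = Tuple[float, int, float]  # (u_i, delta_i, z_i)

# Made-up data: observation time, event indicator, binary covariate.
DATA: List[Obs] = [(2, 1, 0), (3, 1, 1), (5, 0, 0), (6, 1, 1), (8, 1, 0), (9, 0, 1)]


def partial_loglik(beta: float, data: List[Obs]) -> Tuple[float, float, float]:
    """Log partial likelihood, score and observed information at beta."""
    loglik = score = info = 0.0
    for u_i, d_i, z_i in data:
        if not d_i:
            continue                                   # censored: no factor
        risk = [z for u, _, z in data if u >= u_i]     # risk set R(u_i)
        w = [math.exp(beta * z) for z in risk]
        s0 = sum(w)
        s1 = sum(wj * z for wj, z in zip(w, risk))
        s2 = sum(wj * z * z for wj, z in zip(w, risk))
        loglik += beta * z_i - math.log(s0)
        score += z_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit_cox(data: List[Obs], tol: float = 1e-10, max_iter: int = 50) -> float:
    """Newton-Raphson maximization of the log partial likelihood."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = partial_loglik(beta, data)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def breslow(t: float, beta: float, data: List[Obs]) -> float:
    """Breslow estimate of the cumulative baseline hazard H0(t)."""
    total = 0.0
    for u_i, d_i, _ in data:
        if d_i and u_i <= t:
            total += 1.0 / sum(math.exp(beta * z) for u, _, z in data if u >= u_i)
    return total


beta_hat = fit_cox(DATA)
```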


3.2 Extensions of Cox PH Model for Recurrent Events

In many survival studies, the outcome of interest can occur only once during the study period. Multiple event data arise when each subject can experience more than one event, such as relapses of a disease like repeated asthma attacks, or occurrences of tumors. Several Cox regression models have been proposed to handle multiple event data. Of particular interest are the models of Andersen and Gill (1982) and Prentice, Williams and Peterson (1981). We present here a general form of the Cox regression models and relate it to the models discussed in the literature.

3.2.1 Notation

We first provide two sets of notation: one under the usual survival analysis setting and the other under the counting process setting, which is commonly used in the recent literature on recurrent events.

Notation based on Survival Analysis Setting

We generalize the notation used for Cox PH models (Cox, 1972) with one more subscript to capture multiple events. Let $T_{ik}$ be the total time (from study entry) to the $k$th event for the $i$th subject and $C_{ik}$ be the censoring time of the $k$th event for the $i$th subject. Define the observation time $U_{ik} = \min(T_{ik}, C_{ik})$, and let $\delta_{ik} = I(T_{ik} \le C_{ik})$ indicate that the $k$th failure time for subject $i$ is observed. $Z_{ik} = (Z_{1ik}, \ldots, Z_{pik})'$ is the covariate vector for the $i$th subject with respect to the $k$th event, and $Z_i = (Z_{i1}', \ldots, Z_{iK}')$ denotes the covariate vector for the $i$th subject, where $K$ is the maximum number of events within a subject. $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p \times 1$ vector of unknown parameters. Denote by $h_k(t \mid Z_i(t))$ the hazard function for the $k$th event of the $i$th subject at time $t$.

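In general, the hazard function at time $t$ for a subject is defined as the instantaneous rate of failure at time $t$ given survival up to time $t$ and the covariates:

$$h_k(t \mid Z_i(t)) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T_{ik} < t + \Delta t \mid T_{ik} \ge t, Z_i(t))}{\Delta t}. \quad (1)$$

Note that the Cox PH model for the $k$th event time $T_{ik}$ is $h_k(t \mid Z_i(t)) = h_{0k}(t)\exp\{\beta' Z_i(t)\}$.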


Notation based on Counting Process Setting

Let $N_i(t)$ be the cumulative number of failures for subject $i$ by time $t$, and let $\mathcal{N}_i(t) = \{N_i(u) : 0 < u \le t\}$ be the path up to $t$ of the counting process for subject $i$. Note that the information given by $\mathcal{N}_i(t)$ is equivalent to the random failure times $0 < T_{i1} < T_{i2} < \cdots < T_{iN_i(t)} \le t$. Let $Z_i(t) = (Z_{i1}(t), \ldots, Z_{ip}(t))'$ denote a vector of covariates for subject $i$, available at time $t \ge 0$, and denote by $\mathcal{Z}_i(t) = \{Z_i(u) : u \le t\}$ the corresponding covariate process up to time $t$. Let $\Delta N_i(t)$ be the number of failures over the small interval $[t, t + \Delta t)$. The intensity function at time $t$ is defined to be the instantaneous rate of failure at time $t$ given the covariate and counting processes up to time $t$:

$$\begin{aligned}
\lambda(t \mid \mathcal{N}_i(t-), \mathcal{Z}_i(t)) &= \lim_{\Delta t \to 0} \Pr(t \le T_{i, N_i(t-)+1} < t + \Delta t \mid \mathcal{N}_i(t-), \mathcal{Z}_i(t)) / \Delta t \\
&= \lim_{\Delta t \to 0} \Pr(\Delta N_i(t) = 1 \mid \mathcal{N}_i(t-), \mathcal{Z}_i(t)) / \Delta t. \quad (2)
\end{aligned}$$

The hazard function under the survival definition (1) and the intensity function under the counting process definition (2) are related, but their interpretations differ, especially when dealing with multiple events. To show their equivalence, we need not only to restrict the order of the multiple events so that $T_{i1} < T_{i2} < \cdots < T_{iK} < t$ in the survival setting, but also to assume that only the most recent history information is required in the counting process setting. That is,

As a result, we use the intensity function in the counting process setting for the models that follow.

3.2.2 Various Models

Suppose there are $n$ study subjects and the observed data can be expressed as a set of iid realizations of response and censoring processes and covariates, that is, $\{(N_i(\cdot), Z_i(\cdot), C_i) : i = 1, 2, \ldots, n\}$, where $N_i(t)$ and $Z_i(t)$ are available for $t \le C_i$. Assume conditional independence between the counting process and censoring given the covariates, i.e., $N(\cdot) \perp C$ given $Z(\cdot)$. Define the at-risk indicator $Y_i(t) = I(C_i > t)$, where $I(A) = 1$ if $A$ is true and zero otherwise.
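As an illustration of these counting-process quantities for a single subject, $N(t)$, $N(t-)$ and the at-risk indicator $Y(t) = I(C > t)$ can be computed as below. This is a sketch: the function names are hypothetical, and the event times are assumed sorted and already restricted to $t \le C$.

```python
from bisect import bisect_left, bisect_right


def n_at(t, event_times):
    """N(t): number of observed events in (0, t]; event_times must be sorted."""
    return bisect_right(event_times, t)


def n_before(t, event_times):
    """N(t-): number of events strictly before t."""
    return bisect_left(event_times, t)


def at_risk(t, censor_time):
    """Y(t) = I(C > t), the at-risk indicator defined in the text."""
    return 1 if censor_time > t else 0
```

$N(t)$ and $N(t-)$ differ only at an event time, which is why the intensity (2) conditions on $N(t-)$.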

For simplicity, we drop the subscript $i$ for an arbitrary individual. Let $\mathcal{H}(t) = \{N(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\}$ be the history information just before time $t$. We model the intensity function (2) as

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t; S(t)) \exp\{\beta(t; S(t))' Z(t)\}, \quad (3)$$

where $S(t)$ is a function of $\mathcal{H}(t)$. A special case is $S(t) = N(t-)$, the number of preceding events. The intensity function (3) is the product of an arbitrary function of time and an exponential function of covariates; hence it can be viewed as an extension of the Cox proportional hazards model (Cox, 1972).

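Several models can be postulated from (3) by permitting the shape of the intensity function to depend on, say, the number of preceding failures and possibly on other characteristics of $\{\mathcal{N}(t), \mathcal{Z}(t)\}$. This suggests two semiparametric specifications of the intensity function (3), namely

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\alpha' S(t) + \beta(t)' Z(t) + \gamma' S(t) Z(t)\}, \quad (4)$$

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta_k(t)' Z(t)\} \quad \text{for } S(t) \in A_k,\ k = 1, 2, \ldots, \quad (5)$$

where $\beta(t)$ in (4) is known up to a finite number of parameters, for example $\beta(t) = \beta_0 + \beta_1 \ln t$. Similarly, $\beta_k(t) = \beta_{0k} + \beta_{1k} \ln t$ in (5).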

Model (4) specifies $\lambda_0(t; S(t))$ in (3) in a proportional form and $\beta(t; S(t))$ in (3) as a linear function of $S(t)$. The summary of the history information, $S(t)$, could be any random process under these models. Under these models it is possible to assess how the baseline function depends on $S(t)$ and how the effect of the covariate $Z(t)$ depends on $S(t)$. Moreover, all of the unknowns, the baseline cumulative intensity function and the parameters, can be estimated in the same way as the unknowns in the Cox proportional hazards model described in Section 3.1.
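The following sketch evaluates the intensity of model (4) for scalar $S(t)$ and $Z(t)$. It assumes the exponent $\alpha S(t) + \beta(t) Z(t) + \gamma S(t) Z(t)$, which matches the term $W_i(u, \theta)$ used in Section 3.2.3, together with the illustrative form $\beta(t) = \beta_0 + \beta_1 \ln t$. The function name and the constant baseline in the usage are hypothetical.

```python
import math


def intensity_model4(t, s, z, baseline, alpha, beta0, beta1, gamma):
    """lambda(t | H(t)) = lambda_0(t) * exp{alpha*S + beta(t)*Z + gamma*S*Z}.

    s, z:     scalar S(t) and Z(t) at time t (t > 0)
    baseline: callable returning lambda_0(t)
    beta(t) is taken as beta0 + beta1 * ln(t), the example form in the text.
    """
    beta_t = beta0 + beta1 * math.log(t)
    return baseline(t) * math.exp(alpha * s + beta_t * z + gamma * s * z)


# Usage with a hypothetical constant baseline intensity of 0.5
print(intensity_model4(2.0, 1, 1, lambda t: 0.5, 0.1, 0.2, 0.0, 0.3))
```

Setting `gamma = 0` gives (4a), additionally setting `beta1 = 0` gives (4b), and additionally setting `alpha = 0` gives the AG model (4c).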

Model (4) accounts for the effect of $S(t)$ on the shape of the intensity function by treating $S(t)$ as a covariate. Model (5) does so differently, by using $S(t)$ as a stratification variable. To make this clear, model (5) can be written in detail as follows:

$$\lambda(t \mid \mathcal{H}(t)) = \begin{cases} \lambda_{01}(t) \exp\{\beta_1(t)' Z(t)\} & \text{if } S(t) \in A_1, \\ \lambda_{02}(t) \exp\{\beta_2(t)' Z(t)\} & \text{if } S(t) \in A_2, \\ \quad\vdots & \quad\vdots \end{cases}$$

The discreteness of the stratification variable $S(t)$ allows us to estimate a stratum-specific baseline intensity, $\lambda_0(t; S(t)) = \lambda_{0k}(t)$ if $S(t) \in A_k$, as well as stratum-specific regression coefficients, $\beta(t; S(t)) = \beta_k(t)$ if $S(t) \in A_k$, using only the information from that particular stratum. In addition, model (5) leaves the stratum-specific baseline intensities and regression coefficients arbitrary, so it imposes no structure on how $S(t)$ affects the intensity function.

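Now consider some special cases of model (4) obtained by successive restrictions. First, we assume the covariate effect does not depend on $S(t)$ by setting $\gamma = 0$ in (4); the resulting model has the intensity function

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\alpha' S(t) + \beta(t)' Z(t)\}. \quad (4a)$$

We can further consider a time-independent covariate effect by assuming $\beta(t) = \beta$, that is, the covariate effect does not change over time. The reduced model has the form

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\alpha' S(t) + \beta' Z(t)\}. \quad (4b)$$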

The semiparametric intensity function proposed by Andersen and Gill (1982), also known as the AG model, has the most restricted form:

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_0(t) \exp\{\beta' Z(t)\}. \quad (4c)$$

This intensity function is obtained by assuming that the conditional intensity function is independent of the history of the counting process, $N(\cdot)$; it is the reduced model of (4b) with $\alpha = 0$, and thus the process $N(\cdot)$ is Poisson. The AG model (4c) assumes independent increments, that is, the numbers of failures in non-overlapping time intervals are independent of each other given the time-invariant covariates. In other words, under the AG model, the risk of recurrent failure for a subject follows the usual proportional hazards assumption and is unaffected by the previous failures the subject has experienced.

Similarly, several special cases can be postulated from (5). Under the assumption of time-independent covariate effects in (5), the resulting model is the well-known PWP model proposed by Prentice, Williams and Peterson (1981). This stratified proportional hazards type of regression model specifies the intensity function as follows:

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta_k' Z(t)\} \quad \text{for } S(t) \in A_k,\ k = 1, 2, \ldots, \quad (5a)$$

where $S(t) = S\{N(t), Z(t), t\}$ is the stratification variable, which may change as a function of time for a given subject. This model not only allows completely arbitrary baseline intensity functions, $\lambda_{0k}(\cdot) > 0$, but also permits the regression coefficients to differ across strata in a time-independent fashion. That is, the PWP model assumes time-independent, stratum-specific regression coefficients. A special case of the stratification variable mentioned by Prentice et al. is $S(t) = N(t) + 1$, for which a subject moves to stratum $k$ immediately following his/her $(k-1)$th failure and remains there until the $k$th failure or until censoring takes place. In this situation, (5a) permits the baseline intensity functions to depend arbitrarily on the number of preceding failures for the subject; hence the $\lambda_{0k}(\cdot)$ are referred to as event-specific baseline hazards.
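The stratum assignment $S(t) = N(t) + 1$ and the PWP intensity (5a) can be sketched as follows. The sketch assumes one scalar covariate, the event times of a single subject, and baseline intensities supplied as functions; `pwp_stratum` and `pwp_intensity` are hypothetical names. $N$ is counted strictly before $t$, so a subject stays in stratum $k$ up to and including the time of the $k$th event.

```python
import math
from bisect import bisect_left


def pwp_stratum(t, event_times):
    """Stratum S(t) = N(t) + 1, with N evaluated just before t.

    A subject enters stratum k right after the (k-1)th event and stays there
    up to and including the time of the kth event.
    """
    return bisect_left(sorted(event_times), t) + 1


def pwp_intensity(t, z, event_times, baselines, betas):
    """Model (5a): lambda_{0k}(t) * exp(beta_k * Z) with k = S(t).

    baselines: dict mapping stratum k to a callable lambda_{0k}(t)
    betas:     dict mapping stratum k to the coefficient beta_k
    """
    k = pwp_stratum(t, event_times)
    return baselines[k](t) * math.exp(betas[k] * z)
```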

Furthermore, assume the regression coefficients are the same across strata under the PWP model (5a), that is, $\beta_k = \beta$ for all $k = 1, 2, \ldots$. The corresponding reduced stratified intensity function has the form

$$\lambda(t \mid \mathcal{H}(t)) = \lambda_{0k}(t) \exp\{\beta' Z(t)\} \quad \text{for } S(t) \in A_k,\ k = 1, 2, \ldots \quad (5b)$$

A further restriction of the stratum-specific baseline intensities in (5b) to a common baseline intensity, $\lambda_{0k}(t) = \lambda_0(t)$ for $k = 1, 2, \ldots$, leads to the AG model (4c).


Figure 3.1: Flow-chart for various models (boxes include the Generalized Cox Proportional Regression Model (3), Model Type 2 (5), and the PWP Model (5a)).


3.2.3 Estimation Procedures

We now consider estimation of the parameters and the cumulative baseline function in the

intensity functions specified in the previous section.

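Consider the estimation of the regression coefficients, $\theta = (\alpha, \beta(\cdot), \gamma)$, in model (4). The partial score function can be obtained by differentiating the log partial likelihood (Cox, 1975) with respect to $\theta$ and it is

$$PS_{(4)}(\theta) = \sum_{i=1}^n \int_0^\infty \left\{ \frac{\partial W_i(u,\theta)}{\partial\theta} - \frac{\sum_{j=1}^n Y_j(u) \exp\{W_j(u,\theta)\}\, \partial W_j(u,\theta)/\partial\theta}{\sum_{j=1}^n Y_j(u) \exp\{W_j(u,\theta)\}} \right\} dN_i(u), \quad (6.1)$$

where $W_i(u, \theta) = \alpha' S_i(u) + \beta(u)' Z_i(u) + \gamma' S_i(u) Z_i(u)$. Solving $PS_{(4)}(\theta) = 0$ gives the partial likelihood estimator of $\theta$, denoted by $\hat\theta$. Further, the Breslow type estimator of the cumulative baseline function $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$ is

$$\hat\Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n dN_i(u)}{\sum_{j=1}^n Y_j(u) \exp\{W_j(u, \hat\theta)\}}, \quad (6.2)$$

where $Y_i(u)$ is the at-risk indicator.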
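For the AG model (4c) with one covariate, solving the partial score equation and computing the Breslow-type estimator (6.2) can be sketched as below. This is an illustrative Newton-Raphson implementation, not the software used in the analysis. It assumes data in counting-process form, with one row `(start, stop, event, z)` per at-risk interval $(start, stop]$, Breslow handling of ties, and a finite maximizer; all function names are hypothetical.

```python
import math


def risk_sums(beta, rows, t):
    """S0, S1, S2 over intervals at risk at t, i.e. start < t <= stop."""
    s0 = s1 = s2 = 0.0
    for start, stop, _, z in rows:
        if start < t <= stop:
            w = math.exp(beta * z)
            s0 += w
            s1 += w * z
            s2 += w * z * z
    return s0, s1, s2


def ag_fit(rows, tol=1e-10, max_iter=50):
    """Newton-Raphson root of the partial score for the AG model (4c)."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for _, t, event, z_i in rows:
            if event != 1:
                continue
            s0, s1, s2 = risk_sums(beta, rows, t)
            zbar = s1 / s0
            score += z_i - zbar              # observed minus expected covariate
            info += s2 / s0 - zbar * zbar    # risk-set variance of Z
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def breslow_ag(t, beta, rows):
    """Breslow-type estimate of Lambda_0(t) = int_0^t lambda_0(u) du."""
    return sum(1.0 / risk_sums(beta, rows, u)[0]
               for _, u, event, _ in rows if event == 1 and u <= t)
```

Each failure time contributes the difference between the failing subject's covariate and the risk-set average, so the Newton step uses the risk-set variance of $Z$ as the observed information.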

Similar to Andersen and Gill (1982), it may be shown that the partial score function is an unbiased estimating function of the parameter $\theta$ under model (4). In addition, properties of stochastic processes and martingale theory can be applied to show the consistency of the estimators $\hat\Lambda_0(\cdot)$ and $\hat\theta$, the asymptotic normality of $\hat\theta$, and the weak convergence of $\hat\Lambda_0(\cdot)$.

The estimation procedure for the special cases of model (4) can directly apply $PS_{(4)}(\theta)$ (6.1) and $\hat\Lambda_0(t)$ (6.2), with the vector of regression coefficients modified to $\theta = (\alpha, \beta(\cdot))$, $\theta = (\alpha, \beta)$, and $\theta = \beta$ for models (4a), (4b), and (4c), respectively. The estimator of the cumulative baseline function under model (4c) is the well-known Breslow (1972, 1974) estimator.

Because of the stratum-specific property of model (5) and its special cases, the estimation procedure requires only a slight adjustment of $PS_{(4)}(\theta)$ (6.1) and $\hat\Lambda_0(t)$ (6.2): the at-risk indicator is restricted to the particular stratum. Define $Y_{ik}(t) = I(C_i > t \text{ and } S_i(t) \in A_k)$, the indicator of whether individual $i$ is at risk of a stratum-$k$ failure. The partial score function under model (5) has the following form

One can obtain the stratum-specific estimates of the regression coefficients, $\hat\beta_k$, by solving the equation $PS_{(5)}(\beta_k) = 0$ for all $k$. Substituting $\hat\beta_k$ into the corresponding estimator of the stratum-specific cumulative baseline function gives

an estimator of the stratum-specific cumulative baseline function $\Lambda_{0k}(\cdot)$. As for model (4), the estimators under model (5) are consistent and asymptotically normal, which can be shown through the application of martingale theory.

For the special case (5a), $PS_{(5)}(\beta_k)$ (7.1) and $\hat\Lambda_{0k}(t)$ (7.2) can be directly applied by replacing $\beta_k(t)$ with the time-independent stratum-specific regression coefficient $\beta_k$. To obtain the overall estimator of each covariate effect under model (5b), we use the following partial score function

Solving the equation $PS_{(5b)}(\beta) = 0$, we obtain the overall estimator $\hat\beta$ under the restricted PWP model (5b).
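For model (5b) the only change from an AG fit is that each failure is compared with intervals at risk in its own stratum, i.e. $Y_{ik}(t)$ replaces $Y_i(t)$, while a single $\beta$ is shared. A minimal sketch, assuming a scalar time-fixed covariate, rows `(start, stop, event, z, stratum)` in counting-process form, and Breslow ties; the names are hypothetical:

```python
import math


def stratified_score_info(beta, rows):
    """Partial score and information for model (5b).

    rows: (start, stop, event, z, stratum). A common beta is shared by all
    strata, but each failure is compared only with intervals at risk in its
    own stratum, i.e. risk sets use Y_ik(t) rather than Y_i(t).
    """
    score = info = 0.0
    for _, t, event, z_i, k in rows:
        if event != 1:
            continue
        s0 = s1 = s2 = 0.0
        for start, stop, _, z_j, k_j in rows:
            if k_j == k and start < t <= stop:
                w = math.exp(beta * z_j)
                s0 += w
                s1 += w * z_j
                s2 += w * z_j * z_j
        zbar = s1 / s0
        score += z_i - zbar
        info += s2 / s0 - zbar * zbar
    return score, info


def fit_common_beta(rows, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of PS_(5b)(beta) = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = stratified_score_info(beta, rows)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

With a single stratum this reduces to the AG fit. Because the partial likelihood factorizes over strata, the stratum-specific estimates $\hat\beta_k$ of model (5a) can be obtained by calling `fit_common_beta` on each stratum's rows separately.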

3.2.4 Variance Estimation

Functions (6.1), (7.1), and (7.3) in the previous section share the common structure of the Cox partial score function. To unify the construction of the robust variance estimator, we use $PS(\cdot)$, $\theta$, and $Z_i$ to denote the partial score function, the parameters of interest, and the covariates for subject $i$, respectively.

Using the structure of the Cox partial score function, we have seen that the maximum partial likelihood estimate $\hat\theta$ is the unique solution to $PS(\theta) = 0$. The derivation of the robust variance estimator starts from a first-order Taylor series expansion of the partial score function:

Thus

Assuming the limit of the first term on the right-hand side exists almost surely, we obtain the variance of $\hat\theta$ by taking the variance of both sides of the above equation

Let $\mathcal{H}(t) = \{N(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\}$ denote the history information just before time $t$, and

$$\bar{Z}(t) = \frac{\sum_{i=1}^n Y_i(t) Z_i \exp(\theta' Z_i)}{\sum_{j=1}^n Y_j(t) \exp(\theta' Z_j)},$$

where $Y_i(t)$ is the at-risk indicator subject to censoring. Focusing on $\mathrm{Var}(PS(\theta_0))$,

where $dM_i(t) = dN_i(t) - \exp(\theta' Z_i)\,d\Lambda_0(t)$ and $d\hat{M}_i(t) = dN_i(t) - \exp(\hat\theta' Z_i)\,d\hat\Lambda_0(t)$,

where $B_i = \int_0^\infty [Z_i - \bar{Z}(t)]\, Y_i(t)\, d\hat{M}_i(t)$ and $\sum_{i=1}^n B_i = B$.

By substituting equation (7.5) into (7.4), we obtain the variance estimator

It has been shown that this robust variance estimator (7.6) can be applied to a general response process (Hu, Sun and Wei, 2003), and that it is equivalent to the so-called "sandwich" estimator of Lin and Wei (1989) when the response process is restricted to a counting process.
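A minimal sketch of the robust (sandwich) variance for a scalar AG-type model. It assumes counting-process rows `(subject_id, start, stop, event, z)` with a time-fixed covariate, Breslow ties, and evaluation at a supplied `beta` (in practice $\hat\theta$); the function name is hypothetical. Each subject's $B_i$ collects contributions from all of that subject's intervals, which is what makes the estimator robust to within-subject dependence among recurrent events.

```python
import math
from collections import defaultdict


def robust_variance(beta, rows):
    """Return (naive variance 1/I, robust variance sum(B_i^2)/I^2, B_i by subject).

    rows: (subject_id, start, stop, event, z); interval (start, stop] is at risk.
    B_i = int [Z_i - Zbar(t)] dM_i(t), with
    dM_i(t) = dN_i(t) - Y_i(t) exp(beta * Z_i) dLambda_0(t) (Breslow increment).
    """
    event_times = sorted({stop for _, _, stop, event, _ in rows if event == 1})
    info = 0.0
    b = defaultdict(float)
    for t in event_times:
        risk = [(sid, z) for sid, start, stop, _, z in rows if start < t <= stop]
        s0 = sum(math.exp(beta * z) for _, z in risk)
        s1 = sum(z * math.exp(beta * z) for _, z in risk)
        s2 = sum(z * z * math.exp(beta * z) for _, z in risk)
        zbar = s1 / s0
        failures = [(sid, z) for sid, _, stop, event, z in rows
                    if event == 1 and stop == t]
        d_lambda = len(failures) / s0          # Breslow increment dLambda_0(t)
        info += len(failures) * (s2 / s0 - zbar * zbar)
        for sid, z in failures:                # dN_i(t) part
            b[sid] += z - zbar
        for sid, z in risk:                    # compensator part
            b[sid] -= (z - zbar) * math.exp(beta * z) * d_lambda
    meat = sum(v * v for v in b.values())
    return 1.0 / info, meat / (info * info), dict(b)
```

The first returned value is the model-based variance $I^{-1}$; the second is the sandwich form $I^{-1} (\sum_i B_i^2) I^{-1}$ for a scalar parameter.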


3.3 Analysis of Childhood Cancer Data I

Recall that, in the childhood cancer data, the study group for the analysis of hospital utilization consists of 1375 individuals meeting the following inclusion criteria: resident in BC at the time of diagnosis, diagnosed with cancer between 0 and 19 years of age from 1981 to 1995, and survived at least five years from diagnosis.

3.3.1 Time-to-Event Formulation

To evaluate the relative risk of hospitalization associated with several factors, we apply the models described in Section 3.2.2 to the data, treating a hospital admission as the event of interest. We create a data set in the interval (counting process) form so that the data are treated as time-ordered outcomes (Therneau and Grambsch, 2000). For illustration, recall that each study subject has hospitalization information, such as admission time and discharge time, recorded in the original data format shown in Table 3.1. The data format needed for model fitting is the interval formulation in Table 3.2. Here start and stop give the time (in days) from the beginning of the study to the event of interest or the end of the study. The variable event is an indicator, which is one if an event happens and zero otherwise. The variable enum indicates the stratum to which each event is assigned. It is defined as $N(t) + 1$, the special stratification variable suggested by Prentice, Williams and Peterson (1981). However, this definition causes computational difficulty for these data, and some adjustment is required to determine the summary of history information $S(t)$ used in our model fitting.

Table 3.1: Original Data Format

Study.ID   day.entry   day.exit   days.adm   days.rel
xxx01      0           c1         t1         v1
xxx01      0           c1         t2         v2

Table 3.2: Time-to-Event Interval Format

Study.ID   start   stop   event   enum
xxx01      0       t1     1       1
xxx01      t1      t2     1       2
xxx01      t2      c1     0       2
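The conversion from the layout of Table 3.1 to that of Table 3.2 can be sketched as below. The function name is hypothetical; entry is assumed to be day 0, and `exit_day` plays the role of c1. The sketch ignores hospital stays and assigns enum by the definition enum = N(t) + 1.

```python
def to_interval_format(study_id, exit_day, admission_days):
    """Convert one subject's admission days to counting-process rows.

    Returns (Study.ID, start, stop, event, enum) tuples: each admission closes an
    interval with event = 1, and a final censored interval (event = 0) runs to the
    end of follow-up. enum = number of earlier admissions + 1, i.e. N(t) + 1.
    Hospital stays are ignored here (each admission is a one-day event).
    """
    rows, start = [], 0
    for k, day in enumerate(sorted(admission_days), start=1):
        rows.append((study_id, start, day, 1, k))
        start = day
    if start < exit_day:
        rows.append((study_id, start, exit_day, 0, len(admission_days) + 1))
    return rows


print(to_interval_format("xxx01", 100, [10, 40]))
```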


With the stratification $S(t) = N(t) + 1$, all 1375 childhood cancer survivors were at risk in stratum 1, of whom 561 had a first hospital admission and so entered stratum 2. Of these, 309 were observed to have a second admission and so entered stratum 3. This process is illustrated in Figure 3.2; it continues until one patient experienced a thirty-eighth admission.

Figure 3.2: Flow-chart for strata under the stratification scheme suggested by Prentice, Williams and Peterson. The number of individuals is shown in each stratum.

Because few patients contribute to certain strata, the iterative fitting method does not converge. We therefore pool admissions between the third and the fifth, inclusive, into one stratum, and similarly group admissions beyond the fifth into one stratum. That is,

$$S(t) = \begin{cases} 1 & \text{if } N(t) = 0 \text{ or enum} = 1, \\ 2 & \text{if } N(t) = 1 \text{ or enum} = 2, \\ 3 & \text{if } N(t) = 2 \text{ or enum} = 3, \\ 4 & \text{if } N(t) \in [3, 5] \text{ or enum} = 4, 5, \text{ or } 6, \\ 5 & \text{if } N(t) > 5 \text{ or enum} \ge 7. \end{cases}$$

In this data format, we model the time to an event, which is the time to a hospital admission, and multiple rows arise from recurrent hospital admissions. Combining the covariate information with the interval data format, we then fit the models using the statistical software package R.¹

¹ The R Project for Statistical Computing. http://www.r-project.org
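The pooled stratification can be written as a simple mapping from enum to $S(t)$. A sketch with a hypothetical function name:

```python
def pooled_stratum(enum):
    """Map the PWP event number enum = N(t) + 1 to the pooled stratum S(t)."""
    if enum <= 3:
        return enum   # strata 1 to 3 are kept as they are
    if enum <= 6:
        return 4      # enum 4, 5, 6 (N(t) = 3 to 5) are pooled
    return 5          # enum >= 7 (N(t) > 5) are pooled


print([pooled_stratum(e) for e in range(1, 10)])
```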


Because the covariates are time-invariant, we apply model (4) with $\beta(t) = \beta$, model (4b), the AG model (4c), the PWP model (5a), and model (5b) to the childhood cancer data to determine the relative risks of hospital admission associated with various factors.

3.3.2 Analysis Results: Hospital Admissions

Table 3.3 gives the results of fitting model (4) and its special cases with time-invariant covariates. Estimates of standard errors are shown in brackets (Table A.1 in Appendix A provides the corresponding estimates with robust standard errors).

The restricted model (4) with $\beta(t) = \beta$ has a (partial-) likelihood-ratio test (PLRT) statistic of 2840 with 54 df. Because the treatment by strata interaction term is not significant (PLRT statistic of 27.2 with 24 df, p-value = 0.293), the entries of Table 3.3 under model (4) are the coefficients estimated without the treatment by strata interaction term.

The significance of the estimated risk factor by stratum interaction terms ($\gamma$) indicates the importance of viewing each hospital admission separately, especially when their sign and magnitude change across strata.

This restricted model (4) shows that stratification makes a positive contribution to hospital admission risk for male survivors and for patients diagnosed in 1985 to 1990 (calendar 2), relative to female survivors and to early cancer diagnosis in 1980 to 1985 (baseline), respectively, since those estimates are positive across strata. Because the estimated factor by stratum interactions vary widely, there is no consistent relationship across strata between the factors of interest and the risk of hospital admission.

The significance of the stratification effect in models (4a)&(4b) conveys similar information: the number of previous hospital admissions plays an important role in the relative risk of hospitalization associated with the factors of interest, as noted for the restricted model (4). In addition, the increasing estimated coefficients for the stratification variable imply that the greater the number of hospitalizations, the higher the risk of hospital admission.

* indicates significance at the $\alpha = 0.05$ level.


The estimates under models (4a)&(4b) and the AG model (4c) have the same sign but different magnitudes, owing to the extra stratification variable included in model (4b). Both models suggest a decreased hospitalization risk associated with a later versus early diagnosis period, as well as for males versus females. Among the different types of cancer, only bone and other types of cancer patients have a higher relative risk of hospital admission compared to leukemia patients. No significant difference in hospital admission risk was found among different ages at diagnosis.

Table 3.4 gives the results of fitting model (5a) with a separate coefficient for each stratum and model (5b) with a common coefficient across strata (Table A.2 in Appendix A provides the corresponding estimates with robust standard errors).

The coefficient of -0.519 in the second row of stratum 1 indicates an estimated relative

risk for first hospital admission of exp(-0.519) = 0.595, associated with late cancer diagnosis

(calendar 3) versus early cancer diagnosis (baseline).

Proceeding across the columns of Table 3.4 under model (5a), one sees the estimated association between each factor and its baseline for the second, the third, the fourth to the sixth, and beyond the sixth admission. For instance, the coefficient in column 3 of model (5a) arises from comparing intensity functions among childhood survivors who have experienced exactly two hospital admissions and who are at the same total time since the beginning of the study period. This comparison gives a significant estimated relative risk of hospital admission of exp(-0.684) = 0.505 for lymphoma versus leukemia (baseline) patients.

The estimated gender effect increases steadily across strata 1, 2, 3, and 5. In stratum 4, there are 232 males and 280 females; 184/232 of the males had a hospital admission, compared with 214/280 of the female survivors. This may explain why the estimated coefficient for gender changes sign in stratum 4, giving a relative risk of exp(0.168) = 1.183 for male versus female survivors.
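The relative risks quoted above are exponentiated coefficients. A short Python check of the arithmetic (the dictionary labels are descriptive only, and the coefficients are the ones quoted in the text):

```python
import math


def relative_risk(coef):
    """Estimated relative risk exp(coefficient) from a Cox-type fit."""
    return math.exp(coef)


# Coefficients quoted in the text for Table 3.4, model (5a)
examples = {
    "calendar 3 vs baseline, stratum 1": -0.519,
    "lymphoma vs leukemia, stratum 3": -0.684,
    "male vs female, stratum 4": 0.168,
}
for label, coef in examples.items():
    print(f"{label}: {relative_risk(coef):.3f}")

# Crude admission proportions in stratum 4 quoted in the text
print(f"males: {184 / 232:.3f}, females: {214 / 280:.3f}")
```

Running it prints 0.595, 0.505 and 1.183, matching the text, followed by crude admission proportions of 0.793 for males and 0.764 for females in stratum 4.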

Because admissions are pooled in strata 4 and 5, we interpret the trend of relative risks for the various factors based on the first three strata. For diagnosis period, these analyses indicate a decreasing hospitalization risk associated with late versus early cancer diagnosis as the frequency of admission increases. The risk of hospital admission for males compared to females increases with the number of admissions; however, male survivors have a lower risk of hospitalization overall. For the different types of cancer, CNS and kidney patients have a decreasing risk of hospital admission compared to leukemia (baseline) patients as the frequency of hospital admission increases. However, of these two types of cancer, only kidney cancer survivors have an overall lower relative risk than leukemia survivors. It is difficult to interpret or draw conclusions for the remaining factors of interest since there is no clear pattern.

* indicates significance at the $\alpha = 0.05$ level.

Although stratification under the PWP model, model (5a), allows us to examine the patterns of covariate effects across strata, it is difficult to suggest any strategy for improvement to health providers. For example, Table 3.4 shows that the gender effect is highly significant in stratum 1 under model (5a), which indicates that males have a lower risk of hospital admission than females before their first hospital admission. This significance declines as the number of admissions increases. In addition, the gender effect does not stand out as an overall effect under model (5b), even though it is significant at the 0.05 level. These numerical results reveal how the gender effect changes, but it is unclear what strategy could prevent male patients' risk of hospital admission from increasing relative to that of female patients.

Model comparisons in terms of goodness of fit arise naturally given the various models we have considered. Based on the structure and assumptions of each model, the baseline model ($M_0: \beta = 0$), the AG model ($M_1$: model (4c)), the restricted PWP model ($M_2: \beta_k = \beta$ for all $k$), and the stratum-specific PWP model ($M_3$: model (5a)) are nested. That is,

$$M_0 \subseteq M_1: \text{AG model (4c)} \subseteq M_2: \text{model (5b)} \subseteq M_3: \text{model (5a)}.$$

It is possible to compare $M_2$ and $M_3$ using the (partial-) likelihood-ratio test (PLRT) based on their partial likelihood functions. The (partial-) likelihood-ratio statistic for testing the null hypothesis that the baseline model ($M_0$) holds against the stratum-specific PWP model ($M_3$: model (5a)) can be written as

$$\mathrm{PLR}_{M_0 \text{ vs } M_3} = \sum_k \mathrm{PLR}_{M_0 \text{ vs } M_3(k)},$$

where $\mathrm{PLR}_{M_0 \text{ vs } M_3(k)} = -2 \log\{PL_{M_0} / PL_{M_3(k)}\}$ is the (partial-) likelihood-ratio statistic for testing the null hypothesis that the baseline model holds against the PWP model restricted to stratum $k$. The direct sum over strata for $\mathrm{PLR}_{M_0 \text{ vs } M_3}$ follows from the PWP model being a conditional model; this can be seen from (7.3) without the restriction $\beta_k = \beta$ for all $k$. Similarly, one can obtain $\mathrm{PLR}_{M_0 \text{ vs } M_2}$, which is the PLRT shown in the output tables under model (5b). To test the null hypothesis that $M_2$: model (5b) holds against $M_3$: model (5a), we need to calculate the (partial-) likelihood-ratio statistic $\mathrm{PLR}_{M_2 \text{ vs } M_3}$. Based on the output from Table 3.4,

$$\mathrm{PLR}_{M_2 \text{ vs } M_3} = \mathrm{PLR}_{M_0 \text{ vs } M_3} - \mathrm{PLR}_{M_0 \text{ vs } M_2}.$$

This (partial-) likelihood-ratio test statistic has a value of 131 with 40 df, which gives a p-value less than 0.001 under a chi-squared distribution. This result indicates that model (5a) provides a better fit.

A comparison between $M_1$: the AG model and $M_2$: model (5b) is not straightforward, because the (partial-) likelihood-ratio test statistic cannot be used to compare their fits. The reason is that these two models differ in their nonparametric parts. Recall that a common baseline intensity, $\lambda_0(\cdot)$, is assumed under the Andersen and Gill (1982) model, whereas the Prentice, Williams and Peterson (1981) model allows stratum-specific baseline intensities, $\lambda_{0k}(\cdot)$. A full likelihood function would be required to use the likelihood-ratio statistic to compare these model fits. However, comparisons between model (4) and its special cases are feasible using the (partial-) likelihood-ratio test to address model reduction. The (partial-) likelihood-ratio statistic for testing the null hypothesis that models (4a)&(4b) hold against model (4) has a value of 110 with 16 df, which gives a p-value less than 0.001 under a chi-squared distribution. This is strong evidence that the effect of the interaction terms is nonzero. Therefore, the full model, model (4), significantly improves the fit.
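The p-values of the (partial-) likelihood-ratio tests in this section can be checked without a statistics library. Every degrees-of-freedom value reported here is even, and for even $\nu$ the chi-squared survival function has the closed form $\Pr(\chi^2_\nu > x) = e^{-x/2} \sum_{k=0}^{\nu/2-1} (x/2)^k / k!$. A sketch (the function name is hypothetical):

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-squared with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!,
    i.e. a Poisson(x/2) distribution function evaluated at df/2 - 1.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# PLRT statistics quoted in the text: (statistic, df)
for stat, df in [(27.2, 24), (131, 40), (110, 16)]:
    print(stat, df, chi2_sf_even_df(stat, df))
```

The first statistic gives a p-value of about 0.295, close to the reported 0.293; the other two are far below 0.001.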




3.3.3 Model Checking

An important step after model fitting is to assess whether the model is appropriate for the data.

The intensity functions of all models described previously are extensions of Cox's proportional hazards model (Cox, 1972), whose intensity function is the product of an arbitrary function of time and an exponential function of covariates. As a consequence, it is prudent to assess the proportionality assumption for all of these Cox-type models.

Many tests of proportional hazards have been proposed in the literature. For example, Schoenfeld residuals can be plotted and inspected as proportional hazards diagnostics. Instead of applying this formally developed method, we use a simple model check based on a nonparametric approach.

Since nonparametric estimates make no assumptions about the form of the cumulative hazard function, this conceptual advantage can be used to validate the proportionality assumption. A well-known nonparametric estimator of the cumulative intensity function, also known as the empirical cumulative hazard function, was proposed by Nelson (1969) and Aalen (1972) and is commonly called the Nelson-Aalen (NA) estimate. It has the following form:

$$\hat\Lambda(t) = \int_0^t \frac{\sum_{i=1}^n dN_i(u)}{\sum_{i=1}^n Y_i(u)},$$

where $Y_i(u)$ is the at-risk indicator and $dN_i(u)$ is the increment $N_i\{(u + du)-\} - N_i(u-)$ of $N_i$ over the small interval $[u, u + du)$. Comparing the NA estimate with the estimated cumulative baseline functions of the models described in Section 3.2.3 shows that, with the covariates fixed at their baseline levels, those estimates are similar to the NA estimate.

As a result, the idea for checking the proportionality assumption is to plot the estimated cumulative baseline intensity function under each semiparametric model alongside the Nelson-Aalen estimate, separately for each factor level, and to check visually how close the two curves are.
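A minimal sketch of the Nelson-Aalen estimate for recurrent-event data in counting-process form, with rows `(start, stop, event)` and a hypothetical function name. Each increment equals the number of events at a time divided by the number of intervals at risk there:

```python
def nelson_aalen(t, rows):
    """Nelson-Aalen estimate of the cumulative intensity at time t.

    rows: (start, stop, event) in counting-process form, interval (start, stop]
    at risk. At each distinct event time u <= t the estimate increases by
    (number of events at u) / (number of intervals at risk at u).
    """
    event_times = sorted({stop for _, stop, event in rows if event == 1 and stop <= t})
    total = 0.0
    for u in event_times:
        d = sum(1 for _, stop, event in rows if event == 1 and stop == u)
        y = sum(1 for start, stop, _ in rows if start < u <= stop)
        total += d / y
    return total
```

For the proportionality check described above, this would be computed on the rows of a single diagnosis period (or a single stratum, for the PWP model) and plotted next to the corresponding model-based baseline estimate.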

For illustration, we simply select the diagnosis period (i.e., calendar) as the only covariate under the AG model (4c) and the PWP model (5a).

In Figure 3.3, the solid curve is the baseline estimate under the AG model for each diagnosis period. The corresponding NA estimate of the cumulative intensity function is obtained using partial information: the data used are restricted to the particular diagnosis period. We can apply the same technique to the PWP model in each stratum. Figure 3.4 shows that the NA estimate and the PWP model estimate are close, with the exception of stratum 3, possibly because few events occur in that stratum. Nevertheless, the proportionality assumption appears to be valid overall for the baseline diagnosis period.


[Plots: vertical axis "Frequency of Hospitalization"]


3.4 Extensions of Cox PH Model for Frequency of Events Adjusted for Event Duration

In previous sections we did not take event duration into account. For example, a hospital admission is treated as a one-day event in the childhood cancer data. However, many cancer-related treatments, such as chemotherapy, radiation therapy, surgery, and bone marrow transplantation, usually last more than one day.

For the analysis of hospital admissions using methods for recurrent events, the event is an admission to the hospital. While in the hospital, a patient is not at risk of the next hospital admission: during an admission the individual cannot have another event because he/she is already admitted. The individual returns to being at risk on the first day after he/she is released from hospital.

Why, in general, do we need to consider how long an event lasts? The simple answer is that we assume no other event can happen while the current event is ongoing. To take event duration into account, we need additional information about the end of each event.

If $N_i(u)$ is the cumulative number of started events for subject $i$ prior to time $u$, we define $V_i(u)$ to be the cumulative number of corresponding ended events for subject $i$ prior to time $u$. Then, for an arbitrary subject, we have the paths up to $t$ of the counting processes for started events and ended events, $\mathcal{N}(t) = \{N(u) : 0 < u \le t\}$ and $\mathcal{V}(t) = \{V(u) : 0 < u \le t\}$, respectively.

3.4.1 Adjustment for Event Duration I

The Cox-type models mentioned in Section 3.2.2 are extended as follows. First, the history information needs to be updated to include ended-event information. Let $\mathcal{H}^*(t) = \{N(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\} \cup \{V(u) : 0 < u < t\}$ be the updated history information just before time $t$. The models described in Section 3.2.2 are then

where $S(t)$ is a function of $\mathcal{H}^*(t)$, and similarly for the special cases of models (4.1) and (5.1). The second adjustment concerns the at-risk indicator: recall that no event can happen during an ongoing event. Define $Y^*(t) = I[N(t-) = V(t-)]$ as the at-risk indicator, so that a subject is not at risk of the next event while the current event lasts.
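The duration-adjusted at-risk indicator can be sketched as follows for one subject, assuming admission and release days are recorded as integers. The extra factor $I(C > t)$ for censoring is an assumption of this sketch, carried over from the at-risk indicator of Section 3.2.2; the function name is hypothetical.

```python
from bisect import bisect_left


def at_risk_duration(t, admissions, releases, censor_time):
    """Duration-adjusted at-risk indicator.

    Implements Y*(t) = I[N(t-) = V(t-)], i.e. no admission is ongoing just
    before t, and (an added assumption) multiplies it by I(C > t).
    admissions, releases: days of started and ended events for one subject.
    """
    started = bisect_left(sorted(admissions), t)   # N(t-)
    ended = bisect_left(sorted(releases), t)       # V(t-)
    return int(started == ended and censor_time > t)
```

With an admission on day 10 and release on day 14, the subject is not at risk on days 11 to 14 and is at risk again from day 15, the first day after release.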

3.4.2 Adjustment for Event Duration II

In addition to the at-risk adjustment required by the logic of event duration, event duration matters because it may itself predispose a subject to a greater number of events. One can include the previous event duration as a time-dependent covariate and estimate its effect through model fitting.

To analyze the hospital admissions with adjustment for hospital stays, we transform the data from the original format in Table 3.1 to the interval format. Table 3.5 shows the interval format after the event duration adjustment, together with the time-dependent covariate dur, which captures the previous event duration.

Table 3.5: Time-to-Event Interval Format Adjusted for Event Duration

Study.ID   start    stop   event   enum   dur
xxx01      0        t1     1       1      0
xxx01      v1 + 1   t2     1       2      v1 - t1 + 1
xxx01      v2 + 1   C1     0       2      v2 - t2 + 1
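The construction of Table 3.5 from admission days t_k, discharge days v_k, and the censoring day C1 can be sketched as follows. The function and argument names are hypothetical; the sketch only reproduces the pattern visible in the table: a subject re-enters the risk set the day after discharge, dur is the length of the previous stay counted inclusively (v - t + 1), and the censored row repeats the last event number.

```python
def duration_adjusted_rows(study_id, admissions, discharges, censor):
    """Rows (Study.ID, start, stop, event, enum, dur) following the layout of Table 3.5.

    admissions[k] and discharges[k] are the admission and discharge days of the
    (k+1)-th stay; censor is the end of follow-up.
    """
    rows = []
    start, prev_dur = 0, 0
    for k, (t, v) in enumerate(zip(admissions, discharges), start=1):
        rows.append((study_id, start, t, 1, k, prev_dur))
        start = v + 1              # back at risk from the day after discharge
        prev_dur = v - t + 1       # length of this stay in days, counted inclusively
    # Censored interval; enum repeats the last event number, as in Table 3.5.
    rows.append((study_id, start, censor, 0, len(admissions), prev_dur))
    return rows


for row in duration_adjusted_rows("xxx01", [10, 20], [11, 22], 30):
    print(row)
```

With hypothetical values t1 = 10, v1 = 11, t2 = 20, v2 = 22, and C1 = 30, the rows are (0, 10, 1, 1, 0), (12, 20, 1, 2, 2), and (23, 30, 0, 2, 3).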


3.5 Analysis of Childhood Cancer Data II

First, we fit all models from section 3.3.2 with adjustment for event duration and with one extra time-dependent covariate, dur. We then assess the effect of the event-duration adjustment by dropping dur and comparing the results with those in section 3.3.2.

Tables 3.6 and 3.7 display the results of model fitting (Tables A.3 and A.4 in Appendix A provide the corresponding estimates with robust standard errors). Table 3.6 shows several significant factor-by-stratum interaction terms under Model (4) and a highly significant stratification effect under Models (4a) and (4b), which again emphasizes the importance of the number of preceding hospital admissions. Discarding this stratification effect directly changes the estimated relative risk of hospital admission for CNS patients: the estimates for CNS have opposite signs under AG Model (4c) and Models (4a) and (4b), indicating that, without (with) accounting for the number of preceding hospital admissions, childhood cancer survivors with a CNS diagnosis have a lower (higher) risk of hospitalization than those diagnosed with leukemia (baseline). Moreover, the previous hospital duration has a positive effect on the relative risk of admission; in other words, patients who stay in hospital longer are more likely to be admitted again.

Table 3.7 shows no clear trend in the relative risks with the number of admissions. Both lymphoma and kidney patients have a lower risk of hospital admission than leukemia patients regardless of how many times they have been admitted, although these effects are not significant.

For testing the null hypothesis that model (5b) holds against model (5a), the (partial) likelihood-ratio statistic is 164.4 on 43 df (p-value < 0.001), implying that the PWP model (5a) provides a better fit. Comparisons between model (4) and its special cases³ show strong evidence that the interaction effects are nonzero, so including the interactions significantly improves the fit.
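The p-values quoted for these (partial) likelihood-ratio tests are upper-tail probabilities of a chi-squared distribution with the stated degrees of freedom. A minimal, dependency-free sketch of that tail probability is given below, using the standard series and continued-fraction expansions of the regularized incomplete gamma function; the function names are hypothetical, and in practice one would call R's `pchisq(stat, df, lower.tail = FALSE)` or `scipy.stats.chi2.sf(stat, df)`.

```python
import math


def _lower_gamma_series(a, x, eps=1e-14, max_iter=500):
    """Regularized lower incomplete gamma P(a, x) by its power series (used for x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(max_iter):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a, x, eps=1e-14, tiny=1e-300, max_iter=500):
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction (x >= a + 1)."""
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(stat, df):
    """Upper-tail probability Pr(chi-squared with df degrees of freedom > stat)."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


print(chi2_sf(164.4, 43) < 0.001, chi2_sf(142, 19) < 0.001)   # True True
```

The continued fraction is used far in the tail so that very small p-values are computed directly rather than as 1 minus a number close to 1.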

² * indicates significance at the α = 0.05 level.

³ p-value < 0.001, based on the (partial) likelihood-ratio statistic of 142 on 19 df for testing the null hypothesis that models (4a) and (4b) hold against model (4).




3.5.1 Comparison of Analysis Results: Recurrent Events versus Event-Duration Adjustment

To compare the analyses with and without event duration, we fit the same models as in section 3.3.2 to the adjusted data format, without the additional time-dependent covariate dur. The same significance level of 0.05 is used for a consistent comparison. Tables 3.8 and 3.9 display the results of model fitting (Tables A.5 and A.6 in Appendix A provide the corresponding estimates with robust standard errors).

Comparing model (4.1) with model (4), and their special cases, we see no noticeable difference in the fitted results. However, some prominent differences appear in the stratum-specific PWP models. Overall, after adjustment for event duration, the PWP model gives estimates of larger magnitude and slightly larger standard errors than the analysis without event-duration adjustment. This may be due to the combination of the adjusted risk set, the restricted data used, and the small number of events in each stratum.

The estimates of hospital admission risk for the diagnosis period in strata 2 and 4 change sign relative to those without event-duration adjustment (Table 3.4), which reverses the interpretation of the relative risk of hospital admission associated with late cancer diagnosis (calendar 2) versus early cancer diagnosis (baseline). These estimates are not significant², however.

The model comparison results⁴,⁵ are consistent with those without event-duration adjustment: the stratum-specific PWP model (5a) provides a better fit than the restricted PWP model (5b), and no model reduction is suggested for model (4).


⁴ The (partial) likelihood-ratio statistic for testing the null hypothesis that model (5b) holds against model (5a) is 142 on 40 df (p-value < 0.001).

⁵ The (partial) likelihood-ratio statistic for testing the null hypothesis that models (4a) and (4b) hold against model (4) is 120 on 16 df (p-value < 0.001).



Chapter 4

Analysis of Generalized Longitudinal Data

There are basically three aspects of clinical interest in recurrent events: (1) time to an event or times to multiple events, which address the risk of having an event; (2) event frequency, which focuses on the prevalence or duration of the event; and (3) multiple states, which provide the transition probabilities into and out of states determined by event type.

In the previous chapter we applied several well-known models from the literature, focusing on estimation of relative risk; those models therefore address the time-to-event aspect. This chapter directly extends the previous one, in which the response process is a counting process. Because a counting process records event frequency, it is a natural formulation for assessing the relative risk of hospital admissions. To incorporate other quantities of interest, such as hospital status or duration, we extend the analysis from a counting process to an arbitrary response process.

Applying the probability models of Chapter 3 to an arbitrary response process changes their interpretation, because we move from modeling intensity functions to modeling marginal or conditional moment functions. The estimation procedures for this extension are therefore no longer based on the partial likelihood function (Cox, 1975), but on unbiased estimating functions.


4.1 Analysis of Hospital Duration Using Counting Process Formulation

The original data for the childhood cancer survivors (admission and discharge times) are recorded as dates, so hospital duration is discrete, with days as the unit, rather than continuous. Without more detailed information, such as times recorded in hours or finer units, it is reasonable to measure hospital duration as the number of days a patient stays in hospital. We therefore treat the response as a counting process in the usual manner, counting the number of days patients stay in hospital. This is a direct application of the approach presented in Chapter 3, with the number of hospital admissions replaced by the number of days in hospital.

4.1.1 R Data Format for Hospital Duration

To take advantage of the existing R function coxph, we construct the time-to-event interval format described in section 3.3.1, treating each day in hospital as an event of interest.

Consider the hypothetical example in Table 4.1: an individual admitted to hospital twice within the study period, who stayed 2 days and 3 days, respectively. These 5 days in hospital generate 5 time-to-event intervals (5 rows) with event indicator equal to 1, and a last row covers the interval up to censoring, with event indicator equal to 0. Table 4.2 shows the data construction this example requires in order to obtain estimates directly with the R function.

Table 4.1: Hypothetical Data for an Arbitrary Individual

Study.ID   day.entry   day.exit   days.adm   days.rel
xxx01      0           C1         t1         t1 + 1
xxx01      0           C1         t2         t2 + 2


Table 4.2: Time-to-Event Interval Format for Hospital Duration

Study.ID   start    stop     event   enum
xxx01      0        t1       1       1
xxx01      t1       t1 + 1   1       2
xxx01      t1 + 1   t2       1       3
xxx01      t2       t2 + 1   1       4
xxx01      t2 + 1   t2 + 2   1       5
xxx01      t2 + 2   C1       0       5
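The expansion from Table 4.1 to Table 4.2 can be sketched as below: each day from days.adm to days.rel inclusive becomes one event row whose start is the previous event day. The function name and the list-of-pairs input are hypothetical; the output columns follow Table 4.2.

```python
def day_level_rows(study_id, stays, censor):
    """Expand stays into one event row per hospital day, following Table 4.2.

    stays: (days.adm, days.rel) pairs in time order, both days counted as in hospital.
    Returns rows (Study.ID, start, stop, event, enum).
    """
    rows = []
    start, enum = 0, 0
    for adm, rel in stays:
        for day in range(adm, rel + 1):   # each day from admission to release
            enum += 1
            rows.append((study_id, start, day, 1, enum))
            start = day                   # next interval opens at this event day
    rows.append((study_id, start, censor, 0, enum))  # censored final interval
    return rows


# Hypothetical values t1 = 10, t2 = 20, C1 = 30 for the individual of Table 4.1.
for row in day_level_rows("xxx01", [(10, 11), (20, 22)], 30):
    print(row)
```

Under this layout each patient contributes one row per hospital day plus one censored row, so the 11620 days accumulated by the 1375 patients correspond to 12995 rows.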

4.1.2 Analysis Results for Day in Hospital Using R Built-in Function

We apply the time-to-event interval format to the cohort of 1375 patients in the study. One challenge of using the R function directly is constructing this time-to-event interval format. In addition, the large number of events per individual (i.e., the number of days a patient stayed in hospital), combined with the large sample size, makes the interval format require a great deal of memory, an inefficient use of finite memory space.

Let N(t) be the cumulative number of days an arbitrary patient has stayed in hospital up to time t. The commonly used definition of strata, S(t) = N(t) + 1, makes some models displayed in Figure 3.1 impractical: one patient contributes as many as 392 days within his or her study period, so this definition would produce 392 strata. The AG and PWP models with an overall effect, however, remain reasonable for investigating the relative risks of hospital duration under several factors. Their estimates are provided in the last two columns of Table 4.4.

To apply the models used in analyzing hospital admissions, we pool days in hospital into a reasonable number of strata, which also avoids non-convergence of the fitting algorithm. Table 4.3 summarizes the definition of 5 strata, chosen to match the number of strata used for the hospital admission response (section 3.3.1).

The stratification scheme in Table 4.3 is constructed so that the strata contain approximately equal numbers of events. That is, with a total of 11620 days accumulated over the follow-up periods of the 1375 patients, each stratum contains roughly 2324 (11620/5) cumulative days. For instance, stratum 2 covers cumulative days in hospital 7 through 19, inclusive, and contains a total of 2280 days.
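Assigning a cumulative day count (enum) to one of the five strata of Table 4.3 reduces to a lookup against the lower bounds 7, 20, 43, and 85. The text states only that the strata hold approximately equal numbers of events, so the greedy rule in `equal_count_cutpoints` is an assumed way of choosing such bounds, not necessarily the procedure behind Table 4.3; both function names are hypothetical.

```python
from bisect import bisect_right


def equal_count_cutpoints(days_per_enum, n_strata):
    """Greedy lower bounds for strata 2..n_strata holding roughly equal numbers of days.

    days_per_enum maps a cumulative day count (enum) to the number of hospital
    days with that count, pooled over patients. A stratum is closed as soon as
    the running total reaches the next multiple of total / n_strata.
    """
    total = sum(days_per_enum.values())
    targets = [j * total / n_strata for j in range(1, n_strata)]
    enums = sorted(days_per_enum)
    cuts, running, j = [], 0, 0
    for k, enum in enumerate(enums[:-1]):      # never cut after the largest enum
        running += days_per_enum[enum]
        while j < len(targets) and running >= targets[j]:
            nxt = enums[k + 1]                 # next stratum starts at the following enum
            if not cuts or cuts[-1] != nxt:
                cuts.append(nxt)
            j += 1
    return cuts


def stratum_of(enum, cuts):
    """1-based stratum of a cumulative day count, given the sorted lower bounds in cuts."""
    return bisect_right(cuts, enum) + 1


table_4_3_cuts = [7, 20, 43, 85]
print([stratum_of(e, table_4_3_cuts) for e in (6, 7, 19, 20, 84, 85, 392)])
# [1, 2, 2, 3, 4, 5, 5]
```

When one enum value carries enough days to pass two targets at once, the greedy rule returns fewer cut points, and hence fewer strata, rather than an empty stratum.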

The estimated results are displayed in Table 4.4, with estimated standard errors in brackets (Table A.7 in Appendix A provides the corresponding estimates with robust standard errors). The last column, containing the estimates under Model (5b), i.e., the PWP overall-effect model, is based on the commonly used definition of strata, S(t) = N(t) + 1.

As expected under the overall effect in Model (5b), the estimated values differ from one stratification definition to another. A surprising consequence of changing the definition is that the interpretation of the relative risk of hospital duration reverses for two covariates: a treatment diagnosis of kidney or other, and a cancer diagnosis between 1985 and 1990. Under the commonly used definition of stratification, patients diagnosed with cancer between 1985 and 1990 have a lower risk of staying in hospital for another day than those diagnosed in the early 1980s, and patients with a treatment diagnosis of kidney or other are less likely to stay in hospital than leukemia patients. Under the stratification with roughly equal numbers of events in each of the five strata, the corresponding estimates are positive, giving the opposite interpretation.

Table 4.3: Definition of Stratification to Five Strata

Stratum   enum (cumulative days)   Events
1         [0, 7)                   2277
2         [7, 20)                  2280
3         [20, 43)                 2342
4         [43, 85)                 2357
5         [85, 392]                2364
Total                              11620

The positive estimates for the stratum effect under Models (4a) and (4b) indicate that stratification is important for estimating the relative risk of hospital duration. Looking closely at the covariate effects across strata under Model (5a), we find that the estimates vary from stratum to stratum. A consistent reduction in the risk of hospital duration, in terms of the sign of the estimates across strata, appears for the late cancer diagnosis period (calendar 3) and for carcinoma patients. No other consistent results can be seen in Table 4.4. Among the cancer types, carcinoma patients show a decreasing risk of staying in hospital relative to leukemia patients as the cumulative days in hospital increase. In general, inference is difficult because of the uncertain stratification scheme and the limited, unclear patterns.


Model comparisons under the five-strata stratification scheme show that model (5a) provides a better fit: the (partial) likelihood-ratio statistic for testing the null hypothesis that model (5b) holds against model (5a) is 2268 on 40 df (p-value < 0.001).

4.2 General Response Processes

To incorporate other responses of interest in a unified framework, it is desirable to generalize the response process. Let {X(t) : t ≥ 0} denote a general response process. We specify it for admission frequency, hospital status, and hospital duration, respectively, as follows:

1. X(t) = the number of admissions up to time t;

2. X(t) = 1 if the subject is in hospital at time t, and X(t) = 0 otherwise;

3. X(t) = the overall time in hospital up to time t.

As a hypothetical example, consider a subject admitted to hospital four times during his or her follow-up period; the hospitalization record contains four admission times (T1, T2, T3, and T4) and the corresponding discharge times (V1, V2, V3, and V4). Figure 4.1 displays the response process for each of these quantities related to hospital utilization.
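The three response processes of this section can be computed from admission times T_k and discharge times V_k as in the sketch below. It works in continuous time and assumes, for illustration, that a subject is in hospital on [T_k, V_k), i.e., from admission up to but not including discharge; the function names and the example times are hypothetical.

```python
def n_admissions(t, adm):
    """Admission frequency: number of admission times T_k <= t."""
    return sum(1 for a in adm if a <= t)


def in_hospital(t, adm, dis):
    """Hospital status: 1 if T_k <= t < V_k for some stay k, otherwise 0."""
    return int(any(a <= t < v for a, v in zip(adm, dis)))


def time_in_hospital(t, adm, dis):
    """Hospital duration: total time spent in hospital during [0, t]."""
    return sum(max(0.0, min(t, v) - a) for a, v in zip(adm, dis))


# Hypothetical four-stay history: admissions T1..T4, discharges V1..V4.
T = [1.0, 5.0, 12.0, 20.0]
V = [3.0, 8.0, 15.0, 21.0]
for t in (4.0, 6.0, 25.0):
    print(t, n_admissions(t, T), in_hospital(t, T, V), time_in_hospital(t, T, V))
```

At time 25 this history gives 4 admissions, status 0, and a duration of 9.0 (stays of 2, 3, 3, and 1 time units).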

Because hospital records are collected with a time unit of a single day, the response process for hospital duration can be viewed as a counting process, as in the previous section: the hospital duration up to time t equals the number of days an individual has stayed in hospital up to t. The general response process X(t) is not restricted to any particular time unit. However, it is not sensible to break time into many small units and expect the discretized process to capture as much information as continuous time does.

[Figure 4.1 panel titles include Hospital Status and Hospital Duration.]

Figure 4.1: Response process for hospital admission, status, and duration

4.2.1 Methodology Inheritance

The flexibility of the response process X(t) makes it possible to analyze hospital utilization from different perspectives simultaneously. We directly replace the response process N(t), used in Chapter 3 to analyze event frequency, by the arbitrary response process X(t) to analyze the prevalence of hospitalization and hospital duration.

All of the Cox-based models presented in section 3.2.2, and the associated partial-likelihood-based estimation procedures, remain applicable once the response process is replaced by the arbitrary response process X(t). However, the model interpretation is quite different, since the response process is no longer restricted to a counting process. A Cox-type regression model then has the following form


Its two main postulated models are:

where S(t) is a function of the history information $\mathcal{H}(t) = \{X(u) : 0 < u < t\} \cup \{Z(u) : 0 < u \le t\}$. Depending on the quantity used to evaluate hospital utilization, the stratification variable S(t) = X(t) + 1 proposed by Prentice, Williams, and Peterson (1981) may require some modification. As mentioned in the previous section, applying this commonly used definition to hospital duration is inefficient because of the enormous number of days a patient could stay in hospital. Instead, one option is to stratify directly by the number of days in hospital, pooling several days together so that each stratum contains roughly equal numbers of days. However, stratification by equal numbers of days cannot distinguish days that continue a single stay from days that begin with a new admission. In other words, this definition assumes there is no difference between a patient admitted twice and a patient admitted once who stayed in hospital for two days. To address the impact of a new admission on hospital duration, we use the same stratification scheme as for hospital admissions, giving a total of five strata (section 3.3.1). That is, we group the hospital duration associated with the first, second, third, fourth to sixth, and seventh and later hospital admissions into the corresponding strata, regardless of how large and varied the numbers of events are across strata.
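The admission-based scheme can be sketched as a mapping from admission number to stratum, with every hospital day inheriting the stratum of the admission that began its stay. The function names are hypothetical.

```python
def admission_stratum(k):
    """Stratum of the k-th admission: 1, 2, 3, then 4 for the 4th-6th, 5 for the 7th onward."""
    if k <= 3:
        return k
    return 4 if k <= 6 else 5


def day_strata(stay_lengths):
    """Stratum of every hospital day, inherited from the admission that began its stay."""
    labels = []
    for k, length in enumerate(stay_lengths, start=1):
        labels.extend([admission_stratum(k)] * length)
    return labels


print(day_strata([2, 3]))   # the two stays of Table 4.1 -> [1, 1, 2, 2, 2]
```

For the two stays of Table 4.1, the third day falls in stratum 2 because it begins a new admission, whereas pooling by cumulative day count alone would place it in the same stratum as the first two days.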

Under the counting process setting, the estimation procedures for Cox-type models are based on partial score functions obtained by differentiating the logarithm of the partial likelihood function (Cox, 1975). Because hospital duration is measured in continuous time and hospital status is not a counting process, estimating functions are the more appealing basis for estimation here. The estimation procedures of Chapter 3 carry over to an arbitrary response process, and we refer to the resulting functions as estimating functions (EF). In both cases, the pseudo Cox likelihood can be used to estimate the parameters of interest. The estimating equations for models (4.2) and (5.2) are displayed as follows, where θ includes all the parameters of interest and β_k denotes the stratum-specific regression coefficients.

The estimates are obtained by solving the estimating equation EF(θ) = 0 in the usual manner, and similarly for the special cases of models (4.2) and (5.2). We apply the robust variance estimation of section 3.2.4, replacing the partial score function PS(θ) with the estimating function EF(θ).

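Here $B_i = \int [Z_i - A(t)]\, Y_i(t)\, d\hat{M}_i(t)$, where $d\hat{M}_i(t) = dX_i(t) - \exp(\theta' Z_i)\, d\hat{\Lambda}_0(t)$, $\hat{\Lambda}_0(t)$ is the Breslow estimator of $\Lambda_0(t)$, and

$$A(t) = \frac{\sum_{i=1}^{n} Y_i(t)\, Z_i \exp(\theta' Z_i)}{\sum_{j=1}^{n} Y_j(t) \exp(\theta' Z_j)}.$$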
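To make the estimating function concrete, the sketch below handles the simplest case: one time-fixed scalar covariate, no strata (an AG-type version), and a discrete daily grid. The estimating function is taken as the sum over days and subjects of Y_i(t)[Z_i - A(t)] dX_i(t), which equals the sum of the B_i above; it is solved by Newton-Raphson, and a sandwich variance, the sum of B_i squared divided by the squared derivative, is computed. The array layout Y[i][t], dX[i][t], the function names, and the exact sandwich form are assumptions for illustration; this is not the thesis's C program.

```python
import math


def _risk_moments(theta, z, Y, t):
    """Return S0, A(t), and the weighted variance of z among subjects at risk at day t."""
    w = [Y[i][t] * math.exp(theta * z[i]) for i in range(len(z))]
    s0 = sum(w)
    if s0 == 0.0:
        return 0.0, 0.0, 0.0
    a = sum(wi * zi for wi, zi in zip(w, z)) / s0
    v = sum(wi * zi * zi for wi, zi in zip(w, z)) / s0 - a * a
    return s0, a, v


def fit_theta(z, Y, dX, theta=0.0, tol=1e-10, max_iter=50):
    """Solve U(theta) = sum_t sum_i Y_i(t) [z_i - A(t)] dX_i(t) = 0 by Newton-Raphson.

    Assumes z varies among at-risk subjects on days with increments, so the
    derivative of U is nonzero.
    """
    n, T = len(z), len(Y[0])
    for _ in range(max_iter):
        u, slope = 0.0, 0.0
        for t in range(T):
            s0, a, v = _risk_moments(theta, z, Y, t)
            if s0 == 0.0:
                continue
            for i in range(n):
                u += Y[i][t] * (z[i] - a) * dX[i][t]
                slope -= Y[i][t] * dX[i][t] * v    # dU/dtheta contribution
        step = u / slope
        theta -= step
        if abs(step) < tol:
            break
    return theta


def robust_variance(z, Y, dX, theta):
    """Sandwich variance: sum of B_i^2 divided by the squared derivative of U."""
    n, T = len(z), len(Y[0])
    B = [0.0] * n
    info = 0.0
    for t in range(T):
        s0, a, v = _risk_moments(theta, z, Y, t)
        if s0 == 0.0:
            continue
        dx_risk = sum(Y[i][t] * dX[i][t] for i in range(n))
        d_lambda0 = dx_risk / s0                  # Breslow-type increment of Lambda_0
        info += dx_risk * v
        for i in range(n):
            dM = dX[i][t] - math.exp(theta * z[i]) * d_lambda0
            B[i] += (z[i] - a) * Y[i][t] * dM
    return sum(b * b for b in B) / info ** 2


# Hypothetical data: subject with z = 1 always has twice the increments of z = 0.
z = [0.0, 1.0]
Y = [[1, 1, 1], [1, 1, 1]]
dX = [[0.5, 0.25, 1.0], [1.0, 0.5, 2.0]]
theta_hat = fit_theta(z, Y, dX)
print(theta_hat, robust_variance(z, Y, dX, theta_hat))
```

The increments need not be integers, which is the point of the extension; in this toy example the solution is theta = log 2, since the covariate exactly doubles the increments on every day.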


4.3 Analysis of Childhood Cancer Data III

With the extended response process, the R function coxph is no longer applicable, so programming in C is required. The algorithm used to obtain the parameter estimates and robust variances is based on the methodology developed in the previous section. Appendix B introduces a user-friendly function for analyzing hospital utilization based on hospital admission, status, and duration under the various models considered in this thesis. Some estimated results are displayed in Appendix C.

The estimates obtained from the C program and from R are comparable for the analysis of hospital admissions. However, when we substitute the R estimates into the C program, the partial likelihood evaluated at the C program's estimates is larger than that evaluated at the R estimates. Under the criterion of maximizing the partial likelihood function, the estimates from the C program are therefore preferable.

4.3.1 Analysis Results: Prevalence of Hospitalization

Because hospital status is a dichotomous response, its response process is also known as an alternating binary process. When running the C program, we observed that convergence of the algorithm is sensitive to the initial values. As a consequence, estimates are not guaranteed, and the fitting process is time-consuming. Using root-finding algorithms other than the Newton-Raphson algorithm may improve the efficiency and stability of the C program.
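As one illustration of a root-finding method other than plain Newton-Raphson, the sketch below combines Newton steps with bisection: a step is accepted only if it stays inside a bracket known to contain a root, otherwise the bracket is halved. It handles a single scalar parameter, whereas the estimating equations here involve a parameter vector, so it shows the safeguarding idea rather than a drop-in replacement for the C program; the function name is hypothetical.

```python
import math


def safeguarded_newton(f, df, lo, hi, x0=None, tol=1e-12, max_iter=200):
    """Find a root of f in [lo, hi] with Newton steps, falling back to bisection.

    Requires f(lo) and f(hi) to have opposite signs. A Newton step is accepted
    only if it lands strictly inside the current bracket; otherwise the bracket
    midpoint is used, so the iterate cannot run away.
    """
    flo, fhi = f(lo), f(hi)
    if flo == 0.0:
        return lo
    if fhi == 0.0:
        return hi
    if (flo < 0.0) == (fhi < 0.0):
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    x = x0 if x0 is not None and lo < x0 < hi else 0.5 * (lo + hi)
    for _ in range(max_iter):
        fx = f(x)
        if fx == 0.0:
            return x
        if (fx < 0.0) == (flo < 0.0):   # keep the sign change inside [lo, hi]
            lo, flo = x, fx
        else:
            hi = x
        d = df(x)
        x_new = x - fx / d if d != 0.0 else None
        if x_new is None or not (lo < x_new < hi):
            x_new = 0.5 * (lo + hi)      # bisection fallback
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Plain Newton-Raphson diverges for arctan(x) started at 1.5; the safeguarded version does not.
print(safeguarded_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), -10.0, 10.0, x0=1.5))
```

The bracket costs one extra pair of function evaluations at the start, but every later iterate is guaranteed to stay in the region where a root is known to lie, which addresses the sensitivity to initial values described above.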

Beyond possible updates to the C program to improve convergence, the ease of replacing the response process hides a pitfall in model fitting. Recall that all the models we considered are semiparametric Cox-based models, so the proportional hazards assumption is crucial. It is therefore necessary to check the proportionality assumption for an alternating binary response such as hospital status.

If one is interested in the semiparametric models considered so far, it is possible to obtain estimates using the C program with a more stable root-finding algorithm. To remove concerns about the model assumptions, a nonparametric approach is introduced in section 4.4 for evaluating hospital utilization, including the prevalence of hospitalization.

4.3.2 Analysis Results: Hospital Duration

We apply the C program to evaluate hospital utilization based on hospital duration. The C program stratifies by the number of hospital admissions, which distinguishes hospital duration accumulated after a new admission from the continuation of a single stay, a distinction the day-in-hospital stratification of section 4.1.2 does not make. Estimated results, with robust standard errors in brackets, are provided in Table C.4 (Appendix C).

The significance of the stratification effect in Models (4a) and (4b) emphasizes the importance of investigating the relative risk of hospital duration across strata. In addition, the increasing estimated coefficients for the stratification variable imply a higher risk of hospital duration for patients with more admissions.

We closely examine the estimates under Model (5a) across strata 1, 2, and 3, because these strata increase by one hospital admission each, whereas the last two strata (strata 4 and 5) pool several admissions together. The risk of hospital duration for males relative to females increases as the number of admissions increases, and similar trends appear for age at diagnosis. Patients with a treatment diagnosis of bone or carcinoma show decreasing risks of hospital duration relative to leukemia patients as the number of admissions increases, which indicates that bone and carcinoma patients have shorter hospital stays on later admissions than leukemia patients. No clear trend appears for the remaining factors of interest.

The estimated results for hospital duration under the counting process (day in hospital, section 4.1) and under the extended response process (Table C.4) are similar under the Andersen and Gill (AG) model because of its independent-increments assumption; that is, the different stratification schemes do not affect its estimation procedure. The two sets of estimates are very close in magnitude except for the treatment diagnosis of lymphoma, and even there both indicate a lower risk of hospital duration than for leukemia patients.


4.4 Nonparametric Estimation

Instead of estimating the relative risk of events or focusing on factor effects with semiparametric approaches, a fundamental clinical objective is to estimate the mean function to obtain overall information, such as how well a drug affects patients' condition on average, or the average number of hospital admissions for cancer patients up to a certain time. Methods for estimating mean functions have been developed only recently, and the literature on them is growing. We introduce a nonparametric estimator based on the sample mean of the available data.

4.4.1 Estimation Based on Observed Sample Mean (OSM)

Let X(·) denote an arbitrary response process {X(t) : t ≥ 0}. We are interested in estimating the mean function μ(·) = E{X(·)} based on a set of independent, partially observed realizations of the response process. We introduce an observation process (a 0-1 process), δ(·) = {δ(t) : t ≥ 0}, to indicate whether X(·) is observed: δ(t) = 1 indicates that the response process X(·) is observed at time t.

To illustrate this formulation, consider censored survival data. Let T > 0 be the time to an event or failure, define the response process as X(t) = I(T > t), and let the corresponding observation process be subject to censoring, δ(t) = I(C > t), where the random variable C denotes the censoring time and I(·) is the indicator function. In this setting, the mean function of the survival process is the survival function, μ(t) = Pr(T > t).

Hu and Lagakos (2006) introduce the observed sample mean (OSM) process $\hat{\mu}(\cdot)$ as a natural estimator of μ(·) and derive its properties. The observed sample mean process is defined as

$$\hat{\mu}(t) = \frac{\sum_{i=1}^{n} \delta_i(t)\, X_i(t)}{\sum_{i=1}^{n} \delta_i(t)}$$

for all t for which $\sum_{i=1}^{n} \delta_i(t) > 0$, where the subscript i indexes study subjects, i = 1, ..., n. The theorem proved by Hu and Lagakos (2006) establishes the uniform consistency and weak convergence of this sample mean estimator.
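The OSM estimator averages the responses of the subjects observed at time t. The sketch below takes each subject's response and observation processes as Python callables (a hypothetical interface) and uses the censored-survival illustration above with made-up failure and censoring times.

```python
def observed_sample_mean(t, X, delta):
    """OSM estimate at t: average of X_i(t) over subjects with delta_i(t) = 1.

    X and delta are lists of callables, one per subject. Returns None when no
    subject is observed at t, where the estimator is undefined.
    """
    observed = [x(t) for x, d in zip(X, delta) if d(t) == 1]
    if not observed:
        return None
    return sum(observed) / len(observed)


# Censored survival illustration with hypothetical data: X_i(t) = I(T_i > t), delta_i(t) = I(C_i > t).
T = [2.0, 5.0, 7.0]     # failure times
C = [10.0, 4.0, 10.0]   # censoring times
X = [lambda t, Ti=Ti: int(Ti > t) for Ti in T]
delta = [lambda t, Ci=Ci: int(Ci > t) for Ci in C]
print([observed_sample_mean(t, X, delta) for t in (3.0, 6.0, 11.0)])
```

At t = 3 all three subjects are observed and two are still event-free, giving 2/3; at t = 6 the second subject is censored, so the average is taken over the remaining two (0.5); at t = 11 nobody is observed and the estimate is undefined.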


4.4.2 Analysis Results Based on OSM for Childhood Cancer Data

To address hospital utilization of the childhood cancer survivors, recall that three response processes are of particular interest. Let X(t) be the cumulative number of hospital admissions, the hospital status, or the hospital duration at time t, forming the corresponding response process {X(t) : t ≥ 0}. We take the observation process {δ(t) : t ≥ 0} to be subject to censoring or termination of the study.

Based on the data for the response and observation processes, we estimate the mean functions with the OSM estimator; the estimates are plotted in Figure 4.2. Both the mean number of admissions and the mean hospital duration increase over time, which indicates that childhood cancer survivors are likely to be hospitalized during any time period. The prevalence of hospitalization over time is consistent with this. On the other hand, the lowess smoother (Cleveland, 1979) reveals a decreasing trend in the scatter plot of the prevalence of hospitalization. This roughly decreasing prevalence suggests a possible decline in the time cancer survivors spend in hospital, although the decreasing trend could also result from a violation of the non-informative censoring assumption. The relatively high estimates near the end of the study may be due to the small number of survivors whose information is available late in the study period.
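The lowess curves in the right column of Figure 4.2 come from locally weighted regression. The sketch below is a simplified version of Cleveland's smoother: a single pass of tricube-weighted local linear fits, without the robustness iterations that Cleveland's procedure and R's lowess() apply by default. The function name and the default span of 2/3 are assumptions for illustration.

```python
import math


def lowess_fit(x, y, frac=2.0 / 3.0):
    """Tricube-weighted local linear fit at each x (single pass, no robustness iterations)."""
    n = len(x)
    r = min(n, max(3, math.ceil(frac * n)))           # neighbourhood size
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[r - 1]   # distance to the r-th nearest point
        w = []
        for xi in x:
            u = abs(xi - x0) / h if h > 0 else 0.0
            w.append((1.0 - u ** 3) ** 3 if u < 1.0 else 0.0)
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))     # local line evaluated at x0
    return fitted


xs = list(range(10))
print(lowess_fit(xs, [2 * v + 1 for v in xs]))   # a straight line is reproduced exactly
```

Because each local fit is a weighted least-squares line, the smoother reproduces linear data exactly; for the prevalence estimates it traces the local average level over follow-up time.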

[Figure 4.2 panels: Number of Hospital Admissions, Prevalence of Hospitalization, and Hospital Duration, each plotted against follow-up time (in days, 0 to 5000), in two columns.]

Figure 4.2: Observed Sample Mean (OSM) with hospital admission, status, and duration. NOTE: The estimates in the left column are connected by lines, while the estimates in the right column are shown as a scatter plot with a lowess smoother.

Chapter 5

Final Remarks

We applied several approaches for recurrent event analysis that have been proposed in the literature to evaluate hospital utilization of childhood cancer survivors. In doing so, we noticed that each hospital admission is followed by a hospital stay, during which the subject cannot be admitted again. This motivated us to adjust for event duration using the available discharge records: a subject should be excluded from the group at risk of admission until his or her discharge from hospital. In addition, to incorporate different aspects of clinical interest, we extended recurrent event analysis to generalized longitudinal data analysis. With the flexibility of our methodology, we are able to analyze hospital admission, status, and duration in a unified framework.

Further work includes theoretical and simulation studies of the proposed methods.

Model checking, consideration of different time scales, implementation of additional models

to address other scientific questions of interest, and case-control analysis are also worth

further investigation.

The childhood cancer data used to illustrate our formulation and approaches can be considered a subset of the data from an ongoing research project at the BC Cancer Agency. In practice, investigators are interested in more potential risk factors. Including all of them may be computationally challenging, but important factors might be overlooked if they are not all included.


For the analysis of recurrent events in longitudinal data, several time scales can be considered. Two natural time scales for recurrent event data are time to event and time between events. We have treated the recurrent events on the time-to-event scale, but the estimation procedures also apply to recurrent gap times, also known as times between events or inter-event times.
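Converting recurrent event times on the time-since-entry scale to gap times is a simple differencing step, sketched below with hypothetical names. With event durations, one might instead measure each gap from the end of the previous event; that variant is not shown.

```python
def to_gap_times(event_times, censor):
    """Convert event times measured from entry into gap (inter-event) times.

    Returns (gaps, censored_gap): the times between successive events, starting
    from time 0, and the time from the last event to censoring.
    """
    gaps, prev = [], 0
    for t in event_times:
        gaps.append(t - prev)
        prev = t
    return gaps, censor - prev


print(to_gap_times([10, 25, 40], 50))   # ([10, 15, 15], 10)
```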

Lin and Ying (1994) proposed the additive risk model as an alternative to the proportional hazards model. The additive risk model specifies the hazard function as the sum of, rather than the product of, the baseline hazard function and the regression function of the covariates. Their techniques for estimating parameters resemble the partial-likelihood-based methods for the proportional hazards model, and their estimator of the regression parameters has a closed form. They suggested fitting both multiplicative and additive risk models to the same data set, since the two inform different aspects of the association between risk factors and the outcome (here, hospital utilization): the former pertains to risk ratios and the latter to risk differences.
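For a single covariate z, the contrast between the two models can be written as follows: the multiplicative model implies a constant hazard ratio per unit of z, the additive model a constant hazard difference.

```latex
\begin{aligned}
\text{multiplicative:}\quad & \lambda(t \mid Z) = \lambda_0(t)\exp\{\beta' Z(t)\},
  & \frac{\lambda(t \mid Z = z+1)}{\lambda(t \mid Z = z)} &= e^{\beta},\\
\text{additive:}\quad & \lambda(t \mid Z) = \lambda_0(t) + \beta' Z(t),
  & \lambda(t \mid Z = z+1) - \lambda(t \mid Z = z) &= \beta.
\end{aligned}
```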

Another possibility for future work is to focus on a further aspect of clinical interest by applying a multi-state model, which provides the transition probabilities into and out of states. For this particular health utilization project, possible states of interest can be determined by (1) hospitalization status, (2) ordinal hospital admissions or discharges, (3) a combination of ordinal hospital admissions and the corresponding discharges, and (4) all of the above together with other states of interest. Figure 5.1 displays possible transitions between these states. As one would expect, the multi-state model is very informative for making inference about patients' health condition as well as for assessing hospital utilization.
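As a minimal sketch of the simplest case, states (1) in Figure 5.1 (in or out of hospital), one-step transition probabilities can be estimated from a daily status sequence by counting transitions. This assumes a time-homogeneous discrete-time Markov chain, an illustrative assumption rather than a model fitted in this thesis; dedicated software, for example the R packages msm or mstate, would be used for continuous-time multi-state models. The function name is hypothetical.

```python
from collections import Counter


def transition_probabilities(status):
    """Estimate one-step transition probabilities from a daily state sequence.

    Assumes a time-homogeneous discrete-time Markov chain (illustrative only).
    Returns {(a, b): estimated P(a -> b)} for the transitions that occur.
    """
    steps = list(zip(status, status[1:]))
    pair_counts = Counter(steps)
    from_counts = Counter(a for a, _ in steps)
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical daily status: 0 = out of hospital, 1 = in hospital.
print(transition_probabilities([0, 0, 1, 1, 0, 0, 0, 1]))
```

In this toy sequence there are five transitions out of state 0, two of them into hospital, so the estimated daily admission probability is 0.4.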

Comparisons between cases and controls are of particular interest from a clinical perspective. A case-control comparison can help investigators draw conclusions about how much additional health care utilization childhood cancer patients need in order to improve their quality of life. Such a comparison can be implemented easily by introducing an additional covariate that indicates membership in the treatment (case) group or the control group. It is important to select covariates that suit one's interest; however, the choice of covariates for the case-control comparison is sometimes unclear to the investigators.
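This sketch shows how a case-control indicator could be appended to subject records before model fitting. It also shows how the fitted coefficient would be read under a multiplicative (Cox-type) model. The field names `id` and `case` and the coefficient value are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set


def add_case_indicator(rows: Iterable[Dict], case_ids: Set[int]) -> List[Dict]:
    """Copy each record and add a 0/1 covariate `case` (1 = case, 0 = control)."""
    return [dict(row, case=1 if row["id"] in case_ids else 0) for row in rows]


def case_vs_control_ratio(beta_case: float) -> float:
    """Under a multiplicative model, exp(beta_case) is the case/control rate ratio."""
    return math.exp(beta_case)


if __name__ == "__main__":
    subjects = [{"id": 1, "gender": 0}, {"id": 2, "gender": 1}]  # hypothetical records
    print(add_case_indicator(subjects, case_ids={1}))
    print(case_vs_control_ratio(0.4))  # 0.4 is a hypothetical fitted coefficient
```

Under an additive risk model, the same coefficient would instead be read directly as a difference in rates between cases and controls.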

Assessing hospital resource use in the province of British Columbia (BC), Canada, is feasible because every BC resident is required to hold medical insurance (the Medical Services Plan, currently named Health Insurance BC). Although rare, a gap in insurance coverage is possible even for people with serious diseases, such as cancer patients. Such gaps complicate the observation process. Rather than assuming continuous insurance coverage over an observation period subject only to censoring, one option is to account for insurance coverage status together with the censoring information. Moreover, from the perspective of an insurance company or the government, it is useful to know the frequency of hospitalization and the associated costs claimed by the insured when setting up assistance programs. To make inference about this, a state for being in or out of insurance coverage can be added to the multi-state model, as displayed in Figure 5.1 (4).
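One way to combine censoring with coverage status is through the at-risk indicator Y(t) used in counting-process formulations. Under this approach, a subject contributes at time t only if t does not exceed the censoring time and the subject is insured at t. The sketch assumes that coverage gaps are recorded as half-open intervals [start, end). The function and argument names are hypothetical.

```python
from typing import Iterable, Tuple


def at_risk(t: float, censor_time: float, coverage_gaps: Iterable[Tuple[float, float]]) -> int:
    """Y(t) = 1 if t <= censoring time and t lies in no coverage gap [start, end); else 0."""
    if t > censor_time:
        return 0  # already censored
    for start, end in coverage_gaps:
        if start <= t < end:
            return 0  # uninsured, so hospital use at t would not be observed
    return 1


if __name__ == "__main__":
    gaps = [(30.0, 45.0)]  # hypothetical gap in coverage (days since diagnosis)
    print([at_risk(t, censor_time=100.0, coverage_gaps=gaps) for t in (10, 30, 45, 101)])
    # prints [1, 0, 1, 0]
```

Under this approach, a coverage gap is treated like an intermittent period of non-observation rather than as permanent censoring.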

[Figure 5.1 state labels: (1) In Hospital, Out of Hospital; (2) 1st admission, 2nd admission, 3rd admission, and so on; (3) 2nd admission, 2nd discharge, and so on; (4) In Hospital and Out of Hospital under MSP coverage, and Out of MSP Coverage]

Figure 5.1: Illustration of a multi-state model with states determined by (1) the status of hospitalization, (2) ordinal hospital admissions (or discharges), (3) the combination of ordinal hospital admissions and corresponding discharges, and (4) the status of hospitalization and the status of health insurance coverage.

Appendix A

Estimation Results Using R Built-in Function

This appendix provides all of the estimates obtained with the R built-in function, together with robust standard errors. It includes tables for the analysis of hospital admissions without event-duration adjustment (Table A.1 and Table A.2), the analysis of hospital admissions with an additional time-dependent covariate under event-duration adjustment (Table A.3 and Table A.4), and the analysis of hospital admissions with event-duration adjustment (Table A.5 and Table A.6). The results for days in hospital presented in Chapter 4 were also obtained with the R built-in function; Table A.7 provides these estimates with robust standard errors.

Because robust standard errors differ from model-based standard errors, the covariate effects found significant here may differ from those identified without robust standard errors, even though the same significance level, α = 0.05, is used.
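The point about changing significance can be illustrated with a Wald test. Here z = estimate / standard error, and the two-sided p-value is 2(1 − Φ(|z|)). The sketch below applies the same hypothetical estimate with a smaller model-based standard error and a larger robust one. None of these numbers come from the tables in this appendix.

```python
import math


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: beta = 0, using z = estimate / std_error.

    math.erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    """
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    beta_hat = 0.30   # hypothetical coefficient estimate
    alpha = 0.05
    for label, se in (("model-based", 0.12), ("robust", 0.18)):  # hypothetical SEs
        p = wald_p_value(beta_hat, se)
        print(f"{label}: p = {p:.4f}, significant at {alpha}: {p < alpha}")
```

With these numbers, the effect is significant at the 0.05 level under the model-based standard error but not under the robust one.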

Table A.2: Estimates of β under Model (5) and its special cases for hospital admissions without event-duration adjustment. NOTE: Estimates of robust standard error are displayed in brackets.

Models | Stratum | Number of Events | calendar 2 | calendar 3 | lymphoma | CNS | other | kidney | bone | carcinoma | gender | PLRT | df | p-value


Models | Stratum | Number of Events | calendar 2 | calendar 3 | std.age | lymphoma | CNS | other | kidney | bone | gender | enum

Model (4a)&(4b): 0.065 (0.118), *-0.596 (0.115), 0.008 (0.010), -0.160 (0.175), *0.332 (0.166), 0.163 (0.177), 0.073 (0.233), 0.240 (0.162), *-0.381 (0.184), 0.021 (0.106), *3.061 (0.116), *3.951 (0.142), *4.442 (0.166)

Model (5b), overall effect, 11620 events: 0.057 (0.126), *-0.525 (0.120), 0.008 (0.009), -0.097 (0.174), *0.378 (0.169), 0.127 (0.185), 0.002 (0.?38), 0.246 (0.160), *-0.383 (0.195), -0.035 (0.107); PLRT = 874, 10 df, p < 0.001

Model (5a), strata 1 (enum 6), 2 (enum 19), 3 (enum 42), 4 (enum 84), 5 (enum 392), with 2277, 2280, 2342, 2357, 2364 events respectively:
*-0.319 (0.097), -0.176 (0.196), 0.276 (0.273), -0.226 (0.283), *1.140 (0.429)
*-0.691 (0.135), -0.595 (0.314), -0.296 (0.359), *-1.069 (0.539), *-1.227 (0.470)
*0.0?1 (0.009), 0.030 (0.016), 0.009 (0.029), -0.029 (0.024), *0.070 (0.017)
-0.190 (0.153), -0.451 (0.286), -0.165 (0.520), 0.559 (0.599), -1.415 (1.182)
0.172 (0.34), 0.084 (0.245), -0.138 (0.315), 0.363 (0.327), *1.509 (0.422)
-0.201 (0.255), -0.045 (0.508), 0.065 (0.443), *-0.??1 (0.387), *1.555 (0.448)
-0.344 (0.222), -0.060 (0.827), -0.1?? (0.644), 0.293 (0.606), *0.794 (0.367)
0.289 (0.157), 0.270 (0.336), 0.471 (0.?95), -0.510 (0.366), 0.680 (0.438)
-0.044 (0.156), -0.160 (0.276), -0.609 (0.535), *-1.303 (0.409), *-1.792 (0.377)
*-0.389 (0.087), -0.026 (0.189), -0.085 (0.250), 0.337 (0.290), -0.206 (0.288)
*5.039 (0.199)
PLRT = 37389, 14 df, p < 0.001

Model (4c): -0.256 (0.200), *-1.142 (0.244), 0.026 (0.021), *-0.914 (0.299), 0.138 (0.264), 0.101 (0.443), -0.305 (0.562), 0.487 (0.345), *-0.6?? (0.290), -0.049 (0.178); PLRT = ?, 10 df, p < 0.001

PLRT = 320, 184, 247, 1070, 1321 (10 df each, p < 0.001 in each case)

Model (5b), overall effect, 11620 events: -0.074 (0.076), -0.473 (0.095), *0.013 (0.006), -0.144 (0.113), 0.1?? (0.094), -0.075 (0.140), -0.116 (0.223), 0.104 (0.117), -0.1?3 (0.117), -0.019 (0.??6); PLRT = 188, 10 df, p < 0.001

Table A.7: Estimation results for hospital duration (days in hospital) using the R built-in function for several models (estimates of robust standard error are displayed in brackets). NOTE: The Andersen and Gill model, model (4c), is unaffected by the different definitions of stratification. The last column of the table displays the estimation results for the overall effect of the Prentice, Williams and Peterson model, model (5b), under a commonly used definition of strata based on the number of days stayed in hospital. * indicates significance at the α level of 0.05.

Appendix B

Data Input in C Program

Because of the variety of models, types of response process, and event-duration adjustments considered in this thesis project, a C program was written that accommodates this variety through user input. The user must provide the dimension of the parameters under the specific model (Dimension), the response process used for analyzing hospital utilization (Response Type), the model specification (Model Type), the stratum identification for stratum-specific or overall estimates (Stratum), and whether event-duration adjustment is applied (Adjustment). Table B.1 displays the possible combinations considered in this thesis project.

Dimension and Model Type are model-dependent. Dimension is useful when the same model is fitted with the previous hospital duration included as one extra time-dependent covariate, so that an estimate for this extra covariate can also be obtained.
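The role of Dimension can be illustrated with a small Python sketch. This is not the thesis's C code. The sketch appends the duration of the previous hospital stay to the baseline covariate vector as a time-dependent covariate, so the parameter dimension grows by one. The data layout and function names are hypothetical, as is the convention that a stay counts once its discharge time has been reached.

```python
from typing import List, Sequence, Tuple


def previous_duration(t: float, stays: Sequence[Tuple[float, float]]) -> float:
    """Length of the most recent stay discharged at or before time t (0 if none).

    `stays` holds (admission_time, discharge_time) pairs for one patient.
    """
    completed = [(adm, dis) for adm, dis in stays if dis <= t]
    if not completed:
        return 0.0
    adm, dis = max(completed, key=lambda stay: stay[1])  # latest discharge
    return dis - adm


def covariate_vector(z: Sequence[float], t: float,
                     stays: Sequence[Tuple[float, float]]) -> List[float]:
    """Baseline covariates plus the time-dependent previous duration (Dimension + 1)."""
    return list(z) + [previous_duration(t, stays)]


if __name__ == "__main__":
    stays = [(10.0, 14.0), (30.0, 37.0)]  # hypothetical admission/discharge days
    for t in (5.0, 20.0, 35.0, 40.0):
        print(t, covariate_vector([1.0, 0.0, 0.3], t, stays))
```

Because the extra covariate changes over time, its value must be re-evaluated at each event time when the risk sets are formed.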

Detailed categorical values of each input are provided for hospital admissions. Some modification of Stratum⁶ is necessary for other responses, such as hospital status and duration, as mentioned in Section 4.2.

⁶ Modification required for Stratum under hospital status and duration.

Response Type = 1 Hospital Admissions
                2 Hospital Status or Prevalence of Hospitalization
                3 Hospital Duration

Model Type =    1 for model (4)
                2 for model (4a)&(4b)
                3 for model (4c)
                4 for model (5a)
                5 for model (5b)

Stratum⁶ =      0 for overall effect
                1 if N(t-) = 0
                2 if N(t-) = 1
                3 if N(t-) = 2
                4 if N(t-) = 3 to 5
                5 if N(t-) > 5

Adjustment =    0 for no event-duration adjustment
                1 for event-duration adjustment
                2 for event-duration adjustment with extra time-dependent covariate
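Table B.1 can be mirrored as a lookup of allowed input codes. The Python sketch below validates a set of user inputs and maps N(t-), the number of admissions observed before time t, to the Stratum code used for hospital admissions. It is not the thesis's C program, and the dictionary keys and function names are hypothetical. As noted above, the Stratum coding would need modification for hospital status and duration.

```python
from typing import Dict, Set

# Allowed input codes transcribed from Table B.1 (key names are hypothetical).
ALLOWED_CODES: Dict[str, Set[int]] = {
    "response_type": {1, 2, 3},     # admissions, hospital status, hospital duration
    "model_type": {1, 2, 3, 4, 5},  # models (4), (4a)&(4b), (4c), (5a), (5b)
    "stratum": {0, 1, 2, 3, 4, 5},  # 0 = overall effect
    "adjustment": {0, 1, 2},        # none, event-duration, event-duration + extra covariate
}


def validate_input(user_input: Dict[str, int]) -> None:
    """Raise ValueError if any code is missing or outside the values in Table B.1."""
    for key, allowed in ALLOWED_CODES.items():
        if user_input.get(key) not in allowed:
            raise ValueError(f"invalid {key}: {user_input.get(key)!r}")


def stratum_code(n_prior_events: int) -> int:
    """Map N(t-) to the Stratum code for hospital admissions."""
    if n_prior_events < 0:
        raise ValueError("N(t-) cannot be negative")
    if n_prior_events <= 2:
        return n_prior_events + 1  # N(t-) = 0, 1, 2 -> strata 1, 2, 3
    if n_prior_events <= 5:
        return 4                   # N(t-) = 3 to 5
    return 5                       # N(t-) > 5


if __name__ == "__main__":
    validate_input({"response_type": 1, "model_type": 4, "stratum": 0, "adjustment": 1})
    print([stratum_code(n) for n in range(8)])  # prints [1, 2, 3, 4, 4, 4, 5, 5]
```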


Appendix C

Estimation Results Based on C Program

We apply some of the models proposed in Chapter 3 by supplying the corresponding user input to the C program described in Appendix B. For illustration, we choose model (4a)&(4b), model (4c), model (5a), and model (5b). In addition to the model specification, we also provide the estimates for hospital admissions with and without event-duration adjustment, so that these results can be compared with those obtained using R.

For the models just mentioned, obtaining estimates for hospital status was problematic because the estimates were sensitive to the choice of initial values. Therefore, we only provide estimates under model (5a) for hospital status.

The estimation results are provided in Table C.1 and Table C.2 for hospital admissions without and with event-duration adjustment, Table C.3 for hospital status, and Table C.4 for hospital duration. Estimates of robust standard error are displayed in brackets for hospital status and hospital duration, but not for hospital admissions.


Table C.3: Estimated results of hospital status based on the C program for Model (5a), i.e., the estimates are obtained through the specific user input of Response Type = 2; Model Type = 4; Stratum = 1, 2, 3, 4, 5; and Adjustment = 0, with the corresponding Dimension and Stratum. Estimates of robust standard error are displayed in brackets.

Stratum | calendar 2 | calendar 3 | std.age | lymphoma | CNS | other | kidney | bone | carcinoma | gender
1 (enum 1) | -0.263 (0.317) | ...
2 (enum 2) | 0.207 (0.573) | ...
3 (enum ?) | 0.954 (1.355) | ...
4 (enum ?5) | 0.215 (0.758) | ...
5 (enum ?7+) | -0.567 (0.759) | ...

Bibliography

Andersen, P.K. and Gill, R.D. (1982). 'Cox's regression model for counting processes: a large sample study'. Annals of Statistics 10:1100-1120.

Cleveland, W.S. (1979). 'Robust Locally Weighted Regression and Smoothing Scatterplots'. Journal of the American Statistical Association 74:829-836.

Cleveland, W.S. (1981). 'LOWESS: A program for smoothing scatterplots by robust locally weighted regression'. The American Statistician 35:54.

Cox, D.R. (1972). 'Regression Models and Life-Tables'. Journal of the Royal Statistical Society. Series B (Methodological) 34, No. 2:187-220.

Cox, D.R. (1975). 'Partial Likelihood'. Biometrika 62, No. 2:269-276.

Finkelstein, D.M., Schoenfeld, D.A., and Stamenovic, E. (1997). 'Analysis of multiple failure time data from an AIDS clinical trial'. Statistics in Medicine 16:951-961.

Hewitt, M., Weiner, S.L., and Simone, J.V. (2003). Childhood Cancer Survivorship: Improving Care and Quality of Life. The National Academies Press, Washington, D.C.

Hu, X.J., and Lagakos, S.W. (2006). 'Nonparametric estimation of the mean function of a stochastic process with missing observations' (revision for publication).

Hu, X.J., Sun, J., and Wei, L.J. (2003). 'Regression Parameter Estimation from Panel Counts'. Scandinavian Journal of Statistics 30:25-43.

Kelly, P.J. and Lim, L.L-Y. (2000). 'Survival analysis for recurrent event data: an application to childhood infectious diseases'. Statistics in Medicine 19:13-33.

Klein, J.P. and Moeschberger, M.L. (2003). Survival Analysis: Techniques for Censored and Truncated Data. Springer (Statistics for Biology and Health), New York.

Lawless, J.F. (2002). Statistical Models and Methods for Lifetime Data. John Wiley & Sons Canada (Wiley series in probability and statistics).

Lee, E.W., Wei, L.J. and Amato, D.A. (1992). 'Cox-type regression analysis for large numbers of small groups of correlated failure time observations'. Survival Analysis: State of the Art (Phaenomenologica Series) 39:237-247.

Lin, D.Y. and Wei, L.J. (1989). 'The Robust Inference for the Cox Proportional Hazards Model'. Journal of the American Statistical Association 84:1074-1078.

Lin, D.Y. and Ying, Z. (1994). 'Semiparametric Analysis of the Additive Risk Model'. Biometrika 81:61-71.

Metcalfe, C., Thompson, S.G., and White, I.R. (2005). 'Analyzing the duration of recurrent events in clinical trials: A comparison of approaches using data from the UK700 trial of psychiatric case management'. Contemporary Clinical Trials 26:443-458.

Prentice, R.L., Williams, B.J., and Peterson, A.V. (1981). 'On the regression analysis of multivariate failure time data'. Biometrika 68:373-379.

Therneau, T.M. and Grambsch, P.M. (2000). Modeling Survival Data: Extending the Cox Model. Springer (Statistics for Biology and Health), New York.

Wei, L.J. and Glidden, D.V. (1997). 'An overview of statistical methods for multiple failure time data in clinical trials'. Statistics in Medicine 16:833-839.

Wei, L.J., Lin, D.Y. and Weissfeld, L. (1989). 'Regression analysis of multivariate incomplete failure time data by modeling marginal distributions'. Journal of the American Statistical Association 84:1065-1073.
