Design and Analysis of Clinical Study 11. Analysis of Cohort Study Dr. Tuan V. Nguyen Garvan Institute of Medical Research Sydney, Australia




Overview

• Incidence, person-years, hazard
• Relative risk
• Logistic regression analysis
• Lifetable
• Cox’s regression analysis
• Diagnosis and prognosis


Person-time

• Person-time = # persons x duration

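[Figure: follow-up timelines for subjects 1 to 5 on a time axis from 0 to 8; the five subjects contribute 2, 4, 4, 8 and 2 person-years]

Incidence rate (IR). During (2+4+4+8+2) = 20 person-years, there were 2 incident cases: IR = 2/20 = 0.1 cases per person-year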
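As a small sketch of the person-time calculation, the R lines below total the follow-up of the five subjects and divide the number of incident cases by it. The names `followup_years` and `cases` are illustrative; the slide does not say which two subjects were the cases, so only the count is used.

```r
# Follow-up time (years) contributed by each of the five subjects
followup_years <- c(2, 4, 4, 8, 2)

# Number of incident cases observed during follow-up
cases <- 2

# Person-time is the sum of individual follow-up times
person_years <- sum(followup_years)

# Incidence rate = cases / person-time
incidence_rate <- cases / person_years

print(person_years)
print(incidence_rate)
```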


Incidence

[Figure: follow-up timelines for subjects 1 to 5 over a 2-year period (time axis from 0 to 2)]

Incidence proportion (IP). During a 2-year period, 3 out of 5 subjects developed the disease: IP = 3/5 = 0.6


Estimation of Incidence Rates

• Consider a study in which P patient-years of follow-up were accumulated and N cases (e.g. deaths, survivors, diseased) were recorded.

• Assumption: N follows a Poisson distribution.

• The estimate of the incidence rate is: $I = N / P$

• Standard error of I is: $SE(I) = \sqrt{N} / P$

• 95% confidence interval of the “true” incidence rate: $I \pm 1.96 \times SE(I)$
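The formulas above can be sketched in R as follows. The inputs (28 cases in 1858 person-years) are taken from the IHD example in these slides; the variable names are illustrative, and rates are rescaled to per 1000 person-years only for display.

```r
# Cases and person-years (low-energy-intake group of the IHD example)
N <- 28
P <- 1858

# Point estimate, Poisson-based standard error, and 95% CI
I    <- N / P
se_I <- sqrt(N) / P
ci_I <- I + c(-1, 1) * 1.96 * se_I

# Report per 1000 person-years
print(round(1000 * c(rate = I, se = se_I, lower = ci_I[1], upper = ci_I[2]), 1))
```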


Relative Risk

Incidence rate of ischemic heart disease (IHD), per 1000 person-years

                      <2750 kcal    >2750 kcal
______________________________________________________________
Person-years              1858          2769
New cases                   28            17
______________________________________________________________
Estimated rate            15.1           6.1
SE of est. rate            2.8           1.5

• Relative risk (RR): $RR = \frac{Risk(exposed)}{Risk(unexposed)} = \frac{15.1}{6.1} = 2.48$

• L = log(RR) = 0.908

• Standard error of log(RR): $SE = \sqrt{\frac{1}{N_1} + \frac{1}{N_2}} = \sqrt{\frac{1}{28} + \frac{1}{17}} = 0.3075$

• 95% CI of L: L ± 1.96 × SE = 0.908 ± 1.96 × 0.3075 = 0.305, 1.51

• 95% CI of RR: exp(0.305), exp(1.51) = 1.36, 4.53
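A minimal R sketch of the same calculation is given below; the variable names are illustrative. Because it works from the unrounded rates, the rate ratio comes out at about 2.45 rather than the 2.48 obtained from the rounded rates 15.1 and 6.1, and the interval shifts slightly accordingly.

```r
# Cases and person-years by energy intake (low = <2750 kcal, high = >2750 kcal)
cases  <- c(low = 28, high = 17)
pyears <- c(low = 1858, high = 2769)

# Incidence rates and their ratio
rate <- cases / pyears
rr   <- rate[["low"]] / rate[["high"]]

# Standard error of log(RR) depends only on the case counts
log_rr    <- log(rr)
se_log_rr <- sqrt(sum(1 / cases))

# 95% CI on the log scale, then back-transformed
ci_rr <- exp(log_rr + c(-1, 1) * 1.96 * se_log_rr)

print(round(c(RR = rr, SE.log = se_log_rr, lower = ci_rr[1], upper = ci_rr[2]), 3))
```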


Analysis of Difference in Incidence Rates

Incidence rate of ischemic heart disease (IHD), per 1000 person-years

                      <2750 kcal    >2750 kcal
______________________________________________________________
Person-years              1858          2769
New cases                   28            17
______________________________________________________________
Estimated rate            15.1           6.1
SE of est. rate            2.8           1.5

• Difference (per 1000 person-years, from the unrounded rates): D = 15.07 – 6.14 = 8.93

• Standard error (SE) of D: $SE = \sqrt{\frac{28}{1858^2} + \frac{17}{2769^2}} = 0.0032$ per person-year, i.e. 3.21 per 1000 person-years

• 95% CI of D = D ± 1.96 × SE = 8.93 ± 1.96 × 3.21 = 2.63, 15.2
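The rate difference and its interval can be checked with a few lines of R. This is a sketch with illustrative variable names; it keeps everything on the per-person-year scale until the final print, so that the standard error and the difference are in the same units.

```r
# Cases and person-years by energy intake (low = <2750 kcal, high = >2750 kcal)
cases  <- c(low = 28, high = 17)
pyears <- c(low = 1858, high = 2769)

# Difference in incidence rates (per person-year)
rate <- cases / pyears
d    <- rate[["low"]] - rate[["high"]]

# SE of the difference: square root of the summed Poisson variances N / P^2
se_d <- sqrt(sum(cases / pyears^2))

# 95% confidence interval
ci_d <- d + c(-1, 1) * 1.96 * se_d

# Report per 1000 person-years
print(round(1000 * c(D = d, SE = se_d, lower = ci_d[1], upper = ci_d[2]), 2))
```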


Logistic Regression Analysis

• Example: A prospective study of the association between BMI, BMD, and bone turnover markers and fracture in 139 men. The risk factors were measured at baseline, and fractures were recorded during the 10-year follow-up:

 id  fx  age      bmi    bmd   ictp    pinp
  1   1   79  24.7252  0.818  9.170  37.383
  2   1   89  25.9909  0.871  7.561  24.685
  3   1   70  25.3934  1.358  5.347  40.620
  4   1   88  23.2254  0.714  7.354  56.782
  5   1   85  24.6097  0.748  6.760  58.358
  6   0   68  25.0762  0.935  4.939  67.123
  7   0   70  19.8839  1.040  4.321  26.399
  8   0   69  25.0593  1.002  4.212  47.515
  9   0   74  25.6544  0.987  5.605  26.132
 10   0   79  19.9594  0.863  5.204  60.267
...
137   0   64  38.0762  1.086  5.043  32.835
138   1   80  23.3887  0.875  4.086  23.837
139   0   67  25.9455  0.983  4.328  71.334


Logistic Regression: Model

• p = probability of fracture

• Odds: $Odds = \frac{p}{1 - p}$

• Logit of p: $L = \log\frac{p}{1 - p}$

• X is a risk factor. Linear logistic model: $L = \alpha + \beta X + \epsilon$

• Expected value of $\epsilon$ = 0. Expected value of L is: $L = \alpha + \beta X$

• Odds = $e^{\alpha + \beta X}$

• Odds ratio (OR): $OR = \frac{odds(p \mid x = x_0 + 1)}{odds(p \mid x = x_0)} = \frac{e^{\alpha + \beta (x_0 + 1)}}{e^{\alpha + \beta x_0}} = e^{\beta}$
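The odds-ratio result can be checked numerically. The sketch below uses hypothetical values for the intercept and slope (they are not estimates from the fracture data) and shows that the ratio of odds for a one-unit increase in x is the same at every starting value and equals exp(slope).

```r
# Hypothetical coefficients, chosen only for illustration
alpha <- 1.0
beta  <- -2.0

# Odds implied by the linear logistic model
odds <- function(x) exp(alpha + beta * x)

# Odds ratio for a one-unit increase, evaluated at several starting values x0
x0 <- c(0.6, 0.9, 1.2)
or <- odds(x0 + 1) / odds(x0)

print(or)         # identical at every x0
print(exp(beta))  # the same value

# Converting odds back to a probability agrees with the inverse logit
p <- odds(x0) / (1 + odds(x0))
print(all.equal(p, plogis(alpha + beta * x0)))
```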


Logistic Regression Analysis using R

fracture <- read.table("fracture.txt", header=TRUE, na.strings=".")
attach(fracture)
results <- glm(fx ~ bmd, family="binomial")
summary(results)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0287  -0.8242  -0.7020   1.3780   2.0709

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.063      1.342   0.792    0.428
bmd           -2.270      1.455  -1.560    0.119

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 157.81  on 136  degrees of freedom
Residual deviance: 155.27  on 135  degrees of freedom
AIC: 159.27


Model of Prediction

> sd(bmd)

[1] 0.1406543

• OR per SD increase in BMD: $e^{-2.27 \times 0.1406} = 0.7267$

• Predictive model: $\hat{p} = \frac{e^{1.063 - 2.27 \times bmd}}{1 + e^{1.063 - 2.27 \times bmd}}$
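As a sketch of how the predictive model might be used, the R lines below plug the reported estimates (intercept 1.063, slope -2.270, SD of BMD 0.1406543) into the formulas. The helper `predict_fracture` and the grid of BMD values are illustrative and not part of the original analysis.

```r
# Estimates reported in the glm output, and the SD of bmd
intercept <- 1.063
slope     <- -2.270
sd_bmd    <- 0.1406543

# Odds ratio per one-SD increase in BMD
or_per_sd <- exp(slope * sd_bmd)

# Predicted probability of fracture for a given BMD (hypothetical helper)
predict_fracture <- function(bmd) {
  eta <- intercept + slope * bmd
  exp(eta) / (1 + exp(eta))
}

# Illustrative BMD values spanning the plotted range
bmd_grid <- c(0.6, 0.8, 1.0, 1.2)
p_hat <- predict_fracture(bmd_grid)

print(round(or_per_sd, 4))
print(round(p_hat, 3))
```

The predicted probability falls as BMD rises, which is the pattern expected from a negative slope.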


Model of Prediction

plot(bmd, fitted(glm(fx ~ bmd, family="binomial")))

[Figure: fitted probability of fracture, fitted(glm(fx ~ bmd, family = "binomial")), on the y-axis (0.15 to 0.40) plotted against bmd on the x-axis (0.6 to 1.2)]


Problem of Time-to-event Data

• Non-normal distribution
• Lost to follow-up
• Censored observations (e.g. patients are still alive at the last follow-up)

• Survival analysis: a class of statistical methods for studying the occurrence and timing of events.

• Its applications are found in medicine and engineering science.

– Lifetime of machine components

– Time from diagnosis to death

– Time from infection to disease onset (latency time)


Definition of “Failure Time”

• Time origin
– Time origin = starting time of the experiment/study

• Scale of measurement
– Chronological time, but not necessarily
– Must be non-negative

• Precise definition of failure
– Death
– Death with a specified reason

[Figure: follow-up timelines for subjects 1 to 6, each ending in either an observed failure or a censored observation (c)]


Construction of Lifetable

• Survival time (in weeks) of 18 patients after diagnosis of parathyroid cancer:

10 13* 18* 19 23* 30 36 38* 54* 56* 59 75 93 97 104* 107 107* 107*
*: censored (= survived)

• Arrange the observed failure times in increasing order ($t_j$)

• Calculate the number of failures ($d_j$) during each interval from $t_{j-1}$ to $t_j$

• Calculate the number of censored observations ($c_j$) during each interval from $t_{j-1}$ to $t_j$

• Calculate the number of subjects at risk ($n_j$) at the start of each interval

• Compute the proportion of deaths for each interval: $h_j = d_j / n_j$

• Compute the estimate of the survivor function: $S(t_j) = \prod_{i \le j} (1 - h_i)$
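The construction steps can be sketched in base R, without the survival package, using the 18 observations listed above. The object names (`t_j`, `n_j`, `d_j`, `h_j`, `S_j`, `lifetable`) are illustrative; the coding 1 = failure, 0 = censored follows the asterisks in the data.

```r
# Survival times and status (1 = observed failure, 0 = censored)
weeks  <- c(10, 13, 18, 19, 23, 30, 36, 38, 54,
            56, 59, 75, 93, 97, 104, 107, 107, 107)
status <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0)

# Distinct observed failure times in increasing order
t_j <- sort(unique(weeks[status == 1]))

# Number at risk just before each failure time, and failures at that time
n_j <- sapply(t_j, function(t) sum(weeks >= t))
d_j <- sapply(t_j, function(t) sum(weeks == t & status == 1))

# Proportion failing, and cumulative (product-limit) survival
h_j <- d_j / n_j
S_j <- cumprod(1 - h_j)

lifetable <- data.frame(time = t_j, n.risk = n_j, n.event = d_j,
                        hazard = round(h_j, 4), survival = round(S_j, 4))
print(lifetable)
```

Censored subjects leave the risk set without counting as failures, which is why the number at risk drops by more than one between some failure times.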


Lifetable of Example Data

n_t = number at risk at the start of the duration; d_t = number of failures during the duration; h(t) = probability of failure; p_t = probability of survival; S(t) = cumulative probability of survival.

Time (t)   Duration (in weeks)   n_t   d_t     h(t)      p_t     S(t)
    1            0 – 9            18     0   0.0000   1.0000   1.0000
    2           10 – 18           18     1   0.0556   0.9444   0.9444
    3           19 – 29           15     1   0.0667   0.9333   0.8815
    4           30 – 35           13     1   0.0769   0.9231   0.8137
    5           36 – 58           12     1   0.0833   0.9167   0.7459
    6           59 – 74            8     1   0.1250   0.8750   0.6526
    7           75 – 92            7     1   0.1429   0.8571   0.5594
    8           93 – 96            6     1   0.1667   0.8333   0.4662
    9           97 – 106           5     1   0.2000   0.8000   0.3729
   10          107 –               3     1   0.3333   0.6667   0.2486


Lifetable analysis using R

library(survival)

weeks <- c(10, 13, 18, 19, 23, 30, 36, 38, 54,
           56, 59, 75, 93, 97, 104, 107, 107, 107)
status <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0)  # 1 = failure, 0 = censored
data <- data.frame(weeks, status)
survtime <- Surv(weeks, status==1)
kp <- survfit(survtime ~ 1)   # Kaplan-Meier estimate for a single group
summary(kp)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10     18       1    0.944  0.0540        0.844        1.000
   19     15       1    0.881  0.0790        0.739        1.000
   30     13       1    0.814  0.0978        0.643        1.000
   36     12       1    0.746  0.1107        0.558        0.998
   59      8       1    0.653  0.1303        0.441        0.965
   75      7       1    0.559  0.1412        0.341        0.917
   93      6       1    0.466  0.1452        0.253        0.858
   97      5       1    0.373  0.1430        0.176        0.791
  107      3       1    0.249  0.1392        0.083        0.745


Lifetable analysis using R

plot(kp, xlab="Time (weeks)", ylab="Cumulative survival probability")

[Figure: Kaplan-Meier survival curve; x-axis: Time (weeks), 0 to 100; y-axis: Cumulative survival probability, 0.0 to 1.0]


Example of Cox’s Survival Data

Treatment group                     Control group
id  episodes  time  infected        id  episodes  time  infected
 1     12       8      1             2      9      15      1
 3     10      12      0             4     10      44      0
 6      7      52      0             5     12       2      0
 7     10      28      1             9      7       8      1
 8      6      44      1            11      7      12      1
10      8      14      1            13      7      52      0
12      8       3      1            16      7      21      1
14      9      52      1            17     11      19      1
15     11      35      1            19     16       6      1
18     13       6      1            21     16      10      1
20      7      12      1            22      6      15      0
23     13       7      0            25     15       4      1
24      9      52      0            27      9       9      0
26     12      52      0            29     10      27      1
28     13      36      1            30     17       1      1
31      8      52      0            32      8      12      1
33     10       9      1            35      8      20      1
34     16      11      0            37      8      32      0
36      6      52      0            38      8      15      1
39     14      15      1            41     14       5      1
40     13      13      1            43     13      35      1
42     13      21      1            45      9      28      1
44     16      24      0            47     15       6      1
46     13      52      0
48      9      28      1

Time to infection among patients with herpes. 25 patients were treated with gd2 and 23 patients were not treated.

Risk factor is the number of infectious episodes in previous year.


Cox’s Regression Model: Theory

• Setting: a prospective study (or randomized clinical trial)
– Risk factors were measured at baseline
– Patients were followed up for time T
– Events occurred during that time
– Question: is the risk of having the event related to the baseline risk factors?

• Let $x_1, x_2, x_3, \ldots, x_p$ be risk factors. Each x can be a continuous or a discrete variable.

• Model:

Risk = (base risk) x (risk factor)

$h(t) = \lambda(t) e^{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_p x_p}$

h(t) : hazard / risk of having the event

$\lambda(t)$ : base risk

$\beta_1, \beta_2, \ldots, \beta_p$ : coefficients associated with each risk factor


Cox’s Regression Model: Data

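• Model for the example data:

$h(t \mid group, episode) = \lambda(t) e^{\beta_1 group + \beta_2 episode}$

• Relative risk (relative hazards - RH), comparing the two groups at the same number of episodes:

$RH = \frac{h(t \mid group = 2)}{h(t \mid group = 1)} = \frac{e^{2\beta_1}}{e^{\beta_1}} = e^{\beta_1}$

$e^{\beta_1}$ represents the relative hazards or treatment effect ($\beta_1$ is its logarithm)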


Cox’s Regression Model Using R

# group: 1 = treatment (25 patients), 2 = control (23 patients)
group <- c(rep(1, 25), rep(2, 23))
episode <- c(12, 10, 7, 10, 6, 8, 8, 9, 11, 13, 7, 13, 9, 12, 13, 8, 10, 16, 6, 14, 13, 13, 16, 13, 9,
             9, 10, 12, 7, 7, 7, 7, 11, 16, 16, 6, 15, 9, 10, 17, 8, 8, 8, 8, 14, 13, 9, 15)
time <- c(8, 12, 52, 28, 44, 14, 3, 52, 35, 6, 12, 7, 52, 52, 36, 52, 9, 11, 52, 15, 13, 21, 24, 52, 28,
          15, 44, 2, 8, 12, 52, 21, 19, 6, 10, 15, 4, 9, 27, 1, 12, 20, 32, 15, 5, 35, 28, 6)
infected <- c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1,
              1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)
data <- data.frame(group, episode, time, infected)


Cox’s Regression Model Using R

library(survival)

# Kaplan-Meier curves by group
kp.by.group <- survfit(Surv(time, infected==1) ~ group)
summary(kp.by.group)
plot(kp.by.group,
     xlab="Time",
     ylab="Cum. survival probability",
     col=c("black", "red"))

# Cox's regression model 1: group only
analysis <- coxph(Surv(time, infected==1) ~ group)
summary(analysis)

# Cox's regression model 2: group adjusted for number of episodes
analysis <- coxph(Surv(time, infected==1) ~ group + episode)
summary(analysis)

# Survival curves from a Cox model stratified by group
Cox.model <- survfit(coxph(Surv(time, infected==1) ~ episode + strata(group)))


Survival Curves

[Figure: survival curves; x-axis: Time, 0 to 50; y-axis: Cumulative survival probability, 0.0 to 1.0]


Cox’s Regression Model Using R

analysis <- coxph(Surv(time, infected==1) ~ group + episode)

summary(analysis)

          coef exp(coef) se(coef)    z      p
group    0.874      2.40   0.3712 2.35 0.0190
episode  0.172      1.19   0.0648 2.66 0.0079

        exp(coef) exp(-coef) lower .95 upper .95
group        2.40      0.417      1.16      4.96
episode      1.19      0.842      1.05      1.35

Rsquare= 0.196   (max possible= 0.986 )
Likelihood ratio test= 10.5  on 2 df,   p=0.00537
Wald test            = 10.4  on 2 df,   p=0.00555
Score (logrank) test = 10.6  on 2 df,   p=0.00489
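To show where the second block of output comes from, the sketch below rebuilds the relative hazards and their 95% limits from the reported coefficients and standard errors alone. The variable names are illustrative, and the five-episode comparison at the end is an added example, not part of the original output.

```r
# Coefficients and standard errors reported by summary(analysis)
coef_hat <- c(group = 0.874, episode = 0.172)
se_hat   <- c(group = 0.3712, episode = 0.0648)

# Relative hazard = exp(coef); 95% limits = exp(coef -/+ 1.96 * se)
rh    <- exp(coef_hat)
lower <- exp(coef_hat - 1.96 * se_hat)
upper <- exp(coef_hat + 1.96 * se_hat)

print(round(cbind(RH = rh, lower = lower, upper = upper), 2))

# Illustration: relative hazard for 5 additional previous episodes,
# holding group fixed
rh_5_episodes <- exp(5 * coef_hat[["episode"]])
print(rh_5_episodes)
```

With group coded 1 = treatment and 2 = control, a relative hazard above 1 for group means a higher hazard of infection in the control group at the same number of previous episodes.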