Lecture 4: Extensions of the Logrank Test
Statistics 255 - Survival Analysis
Presented January 21, 2016

Dan Gillen
Department of Statistics
University of California, Irvine

Outline
- Weighted Logrank Tests
- K-Sample Logrank Tests
- K-Sample (Tarone) Test for Trend
- Stratified Logrank Tests
- Matched Tests
- Summary
Weighted Logrank Tests
Logrank and Mantel-Haenszel Test

M-H Test: Series of (independent) tables at different levels of a confounder C

- Data at level C = k:

              $D$      $\bar{D}$
  $E$         $a_k$    $b_k$
  $\bar{E}$   $c_k$    $d_k$

- The M-H test compares $\Pr[D \mid E, C = k]$ and $\Pr[D \mid \bar{E}, C = k]$ and is designed (most powerful) for the case where the odds ratio, $\psi_k$, is constant at all levels of C:

\[
\psi_k = \frac{\Pr[D \mid E, C = k] \,/\, \Pr[\bar{D} \mid E, C = k]}{\Pr[D \mid \bar{E}, C = k] \,/\, \Pr[\bar{D} \mid \bar{E}, C = k]}
\]
Logrank and Mantel-Haenszel Test

Logrank Test: Series of (dependent) tables at different failure times

- Data at time $t_k$:

              $D$         $\bar{D}$            Risk Size
  $E$         $d_{0k}$    $y_{0k} - d_{0k}$    $y_{0k}$
  $\bar{E}$   $d_{1k}$    $y_{1k} - d_{1k}$    $y_{1k}$
  Total       $d_k$       $y_k - d_k$          $y_k$

- We expect the logrank test to be most powerful when the "odds ratio" over infinitesimal time intervals is constant across time, i.e. $\psi_t = \psi$ for all $t$, where

\[
\psi_t = \frac{\Pr[t \le T < t + \Delta t \mid E, T \ge t] \,/\, \{1 - \Pr[t \le T < t + \Delta t \mid E, T \ge t]\}}{\Pr[t \le T < t + \Delta t \mid \bar{E}, T \ge t] \,/\, \{1 - \Pr[t \le T < t + \Delta t \mid \bar{E}, T \ge t]\}}
\]
Proportional Hazards
- But, as $\Delta t \downarrow 0$:
  - the $1 - \Pr$ terms $\uparrow 1$
  - the ratio of the Pr's $\to$ the ratio of hazards, i.e.

\[
\psi_t \approx \frac{\lambda(t \mid E)}{\lambda(t \mid \bar{E})}
\]

- The logrank test will be most powerful for the case where the hazard ratio remains constant over time. This is called the proportional hazards case.
Ex. Proportional Hazards
[Figure: survival curves, Survival (0.0 to 1.0) versus Time (months, 0 to 30). Control ~ Exponential(0.050); Treatment ~ Exponential(0.050/1.56).]
Ex. Non-Proportional Hazards
[Figure: survival curves, Survival (0.0 to 1.0) versus Time (months, 0 to 30). Control ~ Weibull(1.5, 16.9); Treatment ~ Weibull(0.90, 31.3).]
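The contrast between the two examples can be checked numerically. The sketch below computes the control-to-treatment hazard ratio on a small grid of months. It assumes the Weibull parameters in the figure legend are (shape, scale) in R's parameterization, which the slide does not state; the names `weibull_hazard`, `t_grid`, `hr_exponential` and `hr_weibull` are illustrative.

```r
# Hazard of a Weibull(shape, scale) survival time: density / survival function.
weibull_hazard <- function(t, shape, scale) {
  dweibull(t, shape = shape, scale = scale) /
    pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}

t_grid <- c(1, 5, 10, 20, 30)   # months

# Proportional hazards example: exponential hazards do not depend on time,
# so the control/treatment hazard ratio is the same at every month.
hr_exponential <- rep(0.050 / (0.050 / 1.56), length(t_grid))

# Non-proportional example, ASSUMING the legend means Weibull(shape, scale).
hr_weibull <- weibull_hazard(t_grid, shape = 1.5, scale = 16.9) /
  weibull_hazard(t_grid, shape = 0.90, scale = 31.3)

print(data.frame(month = t_grid, hr_exponential, hr_weibull))
```

Under that assumption the exponential ratio stays at 1.56 while the Weibull ratio changes with time, which is the situation in which a weighted test may have more power than the logrank test.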
Weighted Logrank Statistics
- Consider weighting (Obs − Exp) differently over time
- This will enable us to inflate early or late differences
  → Potential for increased power under non-proportional hazards

\[
T_W = \frac{\left[\sum_{k=1}^{D} w_k (O_k - E_k)\right]^2}{\sum_{k=1}^{D} w_k^2 V_k}
    = \frac{\left[\sum_{k=1}^{D} w_k U_k\right]^2}{\sum_{k=1}^{D} w_k^2 V_k}
\]
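For reference, the pieces of $T_W$ can be written in the notation of the logrank table at time $t_k$ given earlier. This is the usual hypergeometric form of the logrank ingredients; taking $O_k$ to be the failures in group 1 is an assumption made here for illustration.

```latex
% Assumes O_k counts the failures in group 1 at the k-th distinct failure time.
O_k = d_{1k}, \qquad
E_k = \frac{y_{1k}\, d_k}{y_k}, \qquad
V_k = \frac{y_{0k}\, y_{1k}\, d_k\, (y_k - d_k)}{y_k^{2}\, (y_k - 1)}
```

With $w_k = 1$ for every $k$, $T_W$ reduces to the ordinary logrank statistic.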
Weighted Logrank Statistics
- Choices for $w_k$ that have been proposed:
  1. $w_k = n_k$ gives the Gehan-Breslow test (weights equal to the total number of subjects at risk at each failure time). Applies greater weight to early failure times.
  2. $w_k = S_{KM}(t_k-)$ gives the generalized Wilcoxon test (weights equal to the pooled estimate of survival just prior to time $t_k$). Applies greater weight to early failure times.
     - Equivalent to the Wilcoxon rank sum statistic when there is no censoring.
- The $G^{\rho,\gamma}$ family (Fleming and Harrington; 1991)
  - $w_k = \left[S_{KM}(t_k-)\right]^{\rho} \left[1 - S_{KM}(t_k-)\right]^{\gamma}$
  - $\rho = \gamma = 0$ gives the usual logrank statistic
  - $\rho = 1$ and $\gamma = 0$ gives the generalized Wilcoxon test
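A hand-rolled version of the $G^{\rho,\gamma}$ statistic makes the role of the weights concrete. The sketch below uses base R only and the 6-MP trial data (21 placebo and 21 6-MP patients) in unpaired form. The function name `fh_test` and its arguments are illustrative, and the code is a minimal sketch rather than a replacement for survdiff().

```r
# 6-MP trial data in unpaired form: every placebo time is a relapse,
# sixmp_event marks relapses (1) versus censored times (0) on 6-MP.
placebo_time <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12,
                  15, 17, 22, 23)
sixmp_time   <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23,
                  25, 32, 32, 34, 35)
sixmp_event  <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
                  0, 0, 0, 0, 0)

time  <- c(placebo_time, sixmp_time)
event <- c(rep(1, 21), sixmp_event)
group <- rep(0:1, each = 21)

# Weighted logrank statistic with w_k = S(t_k-)^rho * (1 - S(t_k-))^gamma,
# where S is the pooled Kaplan-Meier estimate (hypothetical helper).
fh_test <- function(time, event, group, rho = 0, gamma = 0) {
  s_left <- 1                      # pooled KM estimate just before t_k
  u <- 0
  v <- 0
  for (tk in sort(unique(time[event == 1]))) {
    y  <- sum(time >= tk)                              # at risk, pooled
    y1 <- sum(time >= tk & group == 1)                 # at risk, group 1
    d  <- sum(time == tk & event == 1)                 # failures, pooled
    d1 <- sum(time == tk & event == 1 & group == 1)    # failures, group 1
    w  <- s_left^rho * (1 - s_left)^gamma
    u  <- u + w * (d1 - y1 * d / y)                    # w_k (O_k - E_k)
    if (y > 1) {
      v <- v + w^2 * y1 * (y - y1) * d * (y - d) / (y^2 * (y - 1))
    }
    s_left <- s_left * (1 - d / y)                     # update KM after t_k
  }
  c(U = u, V = v, chisq = u^2 / v)
}

fh_test(time, event, group, rho = 0, gamma = 0)   # logrank
fh_test(time, event, group, rho = 1, gamma = 0)   # generalized Wilcoxon
fh_test(time, event, group, rho = 0, gamma = 1)   # emphasizes late differences
```

Setting `gamma` above zero is the part that survdiff() does not offer, since it takes only `rho`.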
Power Comparisons - Proportional Hazards
[Figure, two panels. Left: "Hazard Ratio Over Time (Cox estimate: 0.50)", survival curves (0.0 to 1.0) versus Time (0 to 4), annotated HR = 1/2. Right: Power (0.2 to 1.0) versus Theta (1.0 to 3.0) for four tests: Rho=0, Gamma=0 (Logrank); Rho=1, Gamma=0 (Wilcoxon); Rho=0, Gamma=1; Rho=1, Gamma=1.]
Power Comparisons - Early Diverging Hazards
[Figure, two panels. Left: "Hazard Ratio Over Time (Cox estimate: 0.75)", survival curves (0.0 to 1.0) versus Time (0 to 4), annotated HR = 1/2, HR = 2, HR = 1. Right: Power (0.2 to 1.0) versus the alternative (axis 1.0 to 1.5) for four tests: Rho=0, Gamma=0 (Logrank); Rho=1, Gamma=0 (Wilcoxon); Rho=0, Gamma=1; Rho=1, Gamma=1.]
Power Comparisons - Late Diverging Hazards
[Figure, two panels. Left: "Hazard Ratio Over Time (Cox estimate: 0.69)", survival curves (0.0 to 1.0) versus Time (0 to 4), annotated HR = 1, HR = 1/2. Right: Power (0.2 to 1.0) versus the alternative (axis 1.0 to 1.8) for four tests: Rho=0, Gamma=0 (Logrank); Rho=1, Gamma=0 (Wilcoxon); Rho=0, Gamma=1; Rho=1, Gamma=1.]
Implementation in R - 6MP Example
- We know that the (unweighted) logrank statistic will be most powerful under proportional hazards
- How can we (informally) check the proportional hazards assumption?
- If we have proportional hazards, then

\[ \lambda_1(t) = \phi \lambda_0(t) \]

  so that

\[ \log \Lambda_1(t) = \log(\phi) + \log \Lambda_0(t) \]

- So, if the log cumulative hazards are roughly parallel, the logrank test will tend to be most powerful
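The step from proportional hazards to parallel log cumulative hazards can be filled in as follows. The second line restates it in terms of survival functions, which is why a complementary log-log plot of the Kaplan-Meier curves serves as the informal check.

```latex
\Lambda_1(t) = \int_0^t \lambda_1(u)\,du = \phi \int_0^t \lambda_0(u)\,du = \phi\,\Lambda_0(t)
\;\Longrightarrow\;
\log \Lambda_1(t) = \log(\phi) + \log \Lambda_0(t)

S_g(t) = \exp\{-\Lambda_g(t)\}
\;\Longrightarrow\;
\log\{-\log S_1(t)\} = \log(\phi) + \log\{-\log S_0(t)\}
```

The vertical gap between the two curves on that scale is $\log(\phi)$, so a roughly constant gap is what proportional hazards predicts.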
6MP log-Cumulative Hazards Plot
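library(survival)   # provides Surv(), survfit() and survdiff()
# sixmpLong is built in the "Ex: 6-MP data" code under Matched Tests
plot( survfit( Surv( time, irelapse ) ~ sixmp, data=sixmpLong ),
      fun="cloglog", lty=1:2, mark.time=FALSE,
      xlab="Time (mths)", ylab="log-Cumulative Hazard" )
legend( 1, 1, lty=1:2, legend=c("Control (N=21)", "6-MP (N=21)"),
        bty="n" )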
[Figure: log-Cumulative Hazard (about −2.0 to 1.0) versus Time (mths), log-scaled axis with ticks at 1, 2, 5, 10, 20. Two curves: Control (N=21) and 6-MP (N=21).]
Implementation in R - 6MP Example
- Not too bad... We do not expect the generalized Wilcoxon test to be as powerful as the logrank test in this situation
- To compute the generalized Wilcoxon test, specify the option rho=1 in the survdiff() function
- Note: The general $G^{\rho,\gamma}$ statistic is not currently implemented in the survival package (survdiff() takes only rho)

> ##### Usual (unweighted) LR test
> survdiff( Surv( time, irelapse ) ~ sixmp, rho=0, data=sixmpLong )

         N Observed Expected (O-E)^2/E (O-E)^2/V
sixmp=0 21       21     10.7      9.77      16.8
sixmp=1 21        9     19.3      5.46      16.8

 Chisq= 16.8  on 1 degrees of freedom, p= 4.17e-05

> ##### Generalized Wilcoxon test
> survdiff( Surv( time, irelapse ) ~ sixmp, rho=1, data=sixmpLong )

         N Observed Expected (O-E)^2/E (O-E)^2/V
sixmp=0 21    14.55     7.68      6.16      14.5
sixmp=1 21     5.12    12.00      3.94      14.5

 Chisq= 14.5  on 1 degrees of freedom, p= 0.000143
How should weights be chosen?
- For scientific inference it is not reasonable to look at the survival curves first, then choose weights
- First, ask whether there is a reason to believe we will have non-proportional hazards
  - If not, go with the logrank test
  - If so, consider what survival differences are most meaningful (early vs. late)
    → Childhood cancer (late differences)
    → Late stage lung cancer remission (early differences)
K-Sample Logrank Tests

- Suppose we have K > 2 groups and we wish to simultaneously compare them with respect to survival time distributions (or equivalently, hazards)

\[ H_0 : \lambda_1(t) = \lambda_2(t) = \cdots = \lambda_K(t), \quad \text{for all } t > 0 \]

  (i.e. the survival curves for all the groups are equal everywhere)

- We are particularly concerned with the alternatives

\[
H_A : \lambda_k(t) > \lambda_{k'}(t) \text{ for some } t > 0
\quad \text{or} \quad
\lambda_k(t) < \lambda_{k'}(t) \text{ for some } t > 0,
\quad \text{for at least some } k \ne k'
\]
K-Sample Logrank Tests

- The test statistic is a generalization of the two-sample statistic that depends on the covariances among the (O − E)'s of the groups
- Consider the data at the i-th observed event time $t_i$ in the pooled sample:

  Group         1                    2                    ...   k                    ...   K                    Total
  Events        $d_{1i}$             $d_{2i}$             ...   $d_{ki}$             ...   $d_{Ki}$             $d_i$
  Non-events    $y_{1i} - d_{1i}$    $y_{2i} - d_{2i}$    ...   $y_{ki} - d_{ki}$    ...   $y_{Ki} - d_{Ki}$    $y_i - d_i$
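To show how the covariances enter, the K-sample statistic can be assembled by hand from the table above. The sketch below uses base R and a simulated three-group data set; the seed, rates and the name `ksample_logrank` are made up for illustration. It uses the standard multivariate hypergeometric covariance, which is assumed rather than taken from the slides.

```r
set.seed(255)                          # arbitrary seed, toy data only
n      <- 150
group  <- rep(1:3, each = n / 3)
t_fail <- rexp(n, rate = c(0.10, 0.15, 0.25)[group])   # hypothetical hazards
t_cens <- runif(n, 0, 15)
time   <- pmin(t_fail, t_cens)
event  <- as.numeric(t_fail <= t_cens)

ksample_logrank <- function(time, event, group) {
  groups <- sort(unique(group))
  K <- length(groups)
  U <- numeric(K)                      # U_k = sum_i (d_ki - y_ki d_i / y_i)
  V <- matrix(0, K, K)                 # covariance matrix of (U_1, ..., U_K)
  for (ti in sort(unique(time[event == 1]))) {
    y_k <- sapply(groups, function(g) sum(time >= ti & group == g))
    d_k <- sapply(groups, function(g) sum(time == ti & event == 1 & group == g))
    y <- sum(y_k)
    d <- sum(d_k)
    p <- y_k / y                       # share of the risk set in each group
    U <- U + (d_k - d * p)
    if (y > 1) {
      V <- V + d * (y - d) / (y - 1) * (diag(p) - outer(p, p))
    }
  }
  # The U_k sum to zero, so V is singular: drop one group before inverting.
  keep  <- -K
  chisq <- drop(t(U[keep]) %*% solve(V[keep, keep], U[keep]))
  list(U = U, V = V, chisq = chisq, df = K - 1,
       p = pchisq(chisq, df = K - 1, lower.tail = FALSE))
}

fit <- ksample_logrank(time, event, group)
fit$chisq
fit$p
```

Dropping a different group gives the same chi-square, which is why the test has K − 1 degrees of freedom.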
Ex: Survival in patients with cancer of the larynx (Sect 1.8 in K&M)

- Time origin: diagnosis with cancer
- Failure event: death
- Question of interest: How does survival time from diagnosis to death vary by stage of disease at presentation?

> larynx[1:10,]
   stage t2death age year death
1      1     0.6  77   76     1
2      1     1.3  53   71     1
3      1     2.4  45   71     1
4      1     2.5  57   78     0
5      1     3.2  58   74     1
6      1     3.2  51   77     0
7      1     3.3  76   74     1
8      1     3.3  63   77     0
9      1     3.5  43   71     1
10     1     3.5  60   73     1
Ex: Survival in patients with cancer of the larynx (Sect 1.8 in K&M)

[Figure: survival curves by stage (Stage 1 to Stage 4); Survival (0.2 to 1.0) versus Time from study start (yrs), 0 to 10.]

Counts printed beneath the plot (the time points they correspond to are not recoverable from the extracted text):

  Stage 1   33 (0)   23 (7)    6 (14)    1 (15)
  Stage 2   17 (0)   11 (3)    3 (7)     0 (7)
  Stage 3   27 (0)   14 (13)   4 (16)    0 (17)
  Stage 4   13 (0)    3 (9)    0 (11)    0 (11)
  Total     90 (0)   51 (32)  13 (48)    1 (50)
Ex: Survival in patients with cancer of the larynx (Sect 1.8 in K&M)

- survdiff() can be used to test differences in K samples as before

> survdiff( Surv(t2death,death) ~ stage, data=larynx )
Call:
survdiff(formula = Surv(t2death, death) ~ stage, data = larynx)

         N Observed Expected (O-E)^2/E (O-E)^2/V
stage=1 33       15    22.57     2.537     4.741
stage=2 17        7    10.01     0.906     1.152
stage=3 27       17    14.08     0.603     0.856
stage=4 13       11     3.34    17.590    19.827

 Chisq= 22.8  on 3 degrees of freedom, p= 4.53e-05

Conclusion: The hypothesis that all four survival curves are equal is clearly rejected. We conclude that at least one group is different with respect to survival.
Ex: Survival in patients with cancer of the larynx (Sect 1.8 in K&M)

- Additional Notes:
  - The $G^{\rho}$ family of weighted logrank statistics can be extended to K samples by specifying the rho value in survdiff()
  - These tests say nothing about how the groups differ; which one is worst, best, etc. (though the sum of ranks gives a clue). That can be further explored with a trend test or regression modeling...
  - For now, we could also think about testing for trend since stage is ordinal
K-Sample (Tarone) Test for Trend
Larynx Cancer Example

- Recall that there were 4 stages of disease recorded at baseline (the origin)
- The 4 stage-of-disease groups can be ordered in a meaningful way
- Suppose we wish to examine the hypothesis that the survival experience is either progressively worse or progressively better by stage of disease
- That is, we wish to take advantage of the ordinal nature of the stage of disease variable stage
Larynx Cancer Example
- Formally, for K ordered groups with dose vector $s_1, \ldots, s_K$ (could be $s_1 = 1, \ldots, s_K = K$), we want to test the hypothesis

\[ H_0 : \lambda_1(t) = \lambda_2(t) = \cdots = \lambda_K(t), \quad \text{for all } t > 0 \]

  (*Note: Same $H_0$ as the general K-sample problem!)

  vs.

\[ H_A : \phi^{s_1}\lambda_1(t) = \phi^{s_2}\lambda_2(t) = \cdots = \phi^{s_K}\lambda_K(t), \quad \text{for } \phi \ne 1 \text{ and for all } t > 0 \]

- What is the general form of this alternative in terms of survival curves?
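One way to work out the question above is sketched here. It writes the common value in $H_A$ as a reference hazard $\lambda^*(t)$, a symbol introduced for this sketch and not used in the lecture.

```latex
% lambda^*(t), Lambda^*(t), S^*(t): hypothetical reference hazard, cumulative hazard, survival
\phi^{s_k}\lambda_k(t) = \lambda^*(t)
\;\Longrightarrow\;
\Lambda_k(t) = \phi^{-s_k}\,\Lambda^*(t)
\;\Longrightarrow\;
S_k(t) = \exp\{-\Lambda_k(t)\} = \left[S^*(t)\right]^{\phi^{-s_k}}
```

So the K survival curves are powers of one common curve with exponents that are monotone in $s_k$: the curves are ordered by score and do not cross.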
Formulation of the test
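- Recall the log-rank test is:

\[ X^2 = \frac{U^2}{V} \overset{\cdot}{\sim} \chi^2_1 \]

  where

\[ U \equiv U_1 \equiv \sum_j \left(\mathrm{obs}_{(j)1} - \mathrm{exp}_{(j)1}\right) = \sum_j U_{(j)1} \]

  is the "observed" − "expected" discrepancy for group 1, and

\[ V \equiv V_1 \equiv \mathrm{Var}[U_1] = \sum_j V_{(j)1} \]

  is the variance of $U_1$ under $H_0$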
Formulation of the test
- Suppose group 1 (e.g. treatment) is being compared to group 0 (e.g. placebo). We could similarly define

\[ U_0 = \sum_j \left(\mathrm{obs}_{(j)0} - \mathrm{exp}_{(j)0}\right) = \sum_j U_{(j)0} \]

  and write

\[ U = 1 \times U_1 + 0 \times U_0 = U_1 \]

  assigning "scores" 1 and 0 to the two groups
Formulation of the test
- For K samples, assign scores $s_1, s_2, \ldots, s_K$ and compute

\[ U_T = s_1 U_1 + s_2 U_2 + \cdots + s_K U_K = \sum_{k=1}^{K} s_k (O_k - E_k) \]

- Then $U_T$ will be large (positive or negative) if the $U_k$'s increase (or decrease) with $s_k$
- $V_T$ is computed using the variance-covariance matrix of $(U_1, U_2, \ldots, U_K)$:

\[ V_T = \sum_{k=1}^{K} s_k^2 V_{kk} + 2 \sum_{k < k'} s_k s_{k'} V_{kk'} \]
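The two formulas above translate directly into a few lines of R, because a survdiff() fit already stores the observed counts, expected counts and the covariance matrix of the $U_k$ (components `obs`, `exp` and `var`). The sketch below is not the course's survtrend() function; `trend_test`, the simulated data and the seed are hypothetical.

```r
library(survival)

set.seed(42)                                   # arbitrary seed, toy data only
n      <- 200
stage  <- sample(1:4, n, replace = TRUE)       # hypothetical ordinal grouping
t_fail <- rexp(n, rate = 0.08 * 1.4^(stage - 1))
t_cens <- runif(n, 0, 12)
dat <- data.frame(time  = pmin(t_fail, t_cens),
                  death = as.numeric(t_fail <= t_cens),
                  stage = stage)

# Trend statistic from an unstratified survdiff() fit and a score vector.
trend_test <- function(fit, scores) {
  U   <- fit$obs - fit$exp                          # U_k = O_k - E_k
  U_T <- sum(scores * U)                            # sum_k s_k U_k
  V_T <- drop(t(scores) %*% fit$var %*% scores)     # s' V s, same as the V_T sum
  chisq <- U_T^2 / V_T
  c(U_T = U_T, V_T = V_T, chisq = chisq,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

fit <- survdiff(Surv(time, death) ~ stage, data = dat)
trend_test(fit, scores = 1:4)
```

Because the $U_k$ sum to zero, adding a constant to every score leaves the statistic unchanged, so scores 1:4 and 0:3 give the same test; with two groups and scores (0, 1) it is the ordinary logrank test.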
Formulation of the test
- So our test statistic is

\[ X_T^2 = \frac{U_T^2}{V_T} \overset{\cdot}{\sim} \chi^2_1 \]

- Why 1 degree of freedom?
  → The test is really a regression of $\log \lambda_k(t)$ on $s_k$
Larynx Cancer Example
- There is no dedicated function for the trend test in R, but I have written the function survtrend() and posted it on the course webpage for this purpose

> survtrend( Surv(t2death,death) ~ stage, data=larynx )
         N Observed Expected
stage=1 33       15  22.5660
stage=2 17        7  10.0117
stage=3 27       17  14.0845
stage=4 13       11   3.3377

Logrank Test      : Chi(3) = 22.763, p-value = 4.5252e-05
Tarone Test Trend : Chi(1) = 13.815, p-value = 0.00020169

Conclusion: Reject the hypothesis that all four survival curves are equal and conclude that stage is positively associated with the hazard for death
Comments
- The trend test depends on the order of the covariate being tested while the general K-sample test does not
- Why use a trend test (on 1 df) vs. a general K-sample test (on K − 1 df)?
  - If effects are monotonically ordered it will be more sensitive
    - The general K-sample test has less power because it does not take advantage of the ordinal nature of the data
  - It seeks to detect a more specific alternative
    - If survival curves differ, but the differences are not ordered, the trend test is less likely to reject
- The trend test is essentially a regression of the (log) hazard on the covariate of interest
Stratified Logrank Tests
Confounding
- One definition: A confounder is a variable that is associated with the predictor of interest (X) and causally related to the outcome of interest (Y).

[Diagram: Predictor (X) → Outcome (Y); Confounder (W) ↔ Predictor (X); Confounder (W) → Outcome (Y).]
Confounding
- Example: Weight may be a confounder in the relationship between diabetes and blood pressure:
  - Diabetics tend to be heavier than non-diabetics
  - Increased weight is associated with higher blood pressure
- Note: When considering potential confounders, we need to carefully consider whether or not the potential confounder lies in the causal pathway of the association of interest

[Diagram: Exposure → Z → Outcome]
Confounding
- How do we deal with confounding? Adjust for the confounder
- Adjustment involves the assumption that the effect of interest is similar across all strata of the potential confounder
- What if we want to test for differences in risk (i.e. survival data) after adjustment for a potential confounding factor?
  → One solution is to stratify the sample
Set-up and Notation
- Suppose the variable we wish to stratify on has J levels
- Consider testing the hypothesis

\[ H_0 : \lambda_{j1}(t) = \lambda_{j0}(t), \quad \text{for } j = 1, \ldots, J \text{ and } t > 0 \]
\[ H_A : \lambda_{j1}(t) = \phi \lambda_{j0}(t), \quad \text{for } j = 1, \ldots, J \text{ and } t > 0,\ \phi \ne 1 \]

- Notes:
  1. $\lambda_{j1}(t)$ can differ from $\lambda_{j'1}(t)$ for two strata $j$ and $j'$, as can $\lambda_{j0}(t)$ and $\lambda_{j'0}(t)$
  2. We are testing whether, on average across strata $j = 1, \ldots, J$ and across time $t$, the within-stratum hazard $\lambda_{j1}(t)$ is greater (or less) than $\lambda_{j0}(t)$
  3. We are testing for similar (proportional hazards) effects across time and strata
Set-up and Notation
- Suppose that, for the j-th stratum,
    $n_{i(j)}$ = the number at risk at time $t_{i(j)}$
    $d_{i(j)}$ = the number failing at time $t_{i(j)}$
- Define
    $n_{i(j)1}$ = the number at risk in group 1 and stratum j at time $t_{i(j)}$
    $d_{i(j)1}$ = the number failing in group 1 and stratum j at time $t_{i(j)}$
Set-up and Notation
- Recall: the log-rank test for the j-th stratum only would compare "observed" to "expected":

\[
U_j = \sum_i \left(\mathrm{obs}_{i(j)} - \mathrm{exp}_{i(j)}\right) = \sum_i U_{i(j)}
    = \sum_i \left\{ d_{i(j)1} - n_{i(j)1} \left( \frac{d_{i(j)}}{n_{i(j)}} \right) \right\}
\]

  using the variance

\[ V_j = \mathrm{Var}[U_j] = \sum_i v_{i(j)} \]

- If $U_j$ is large (positive or negative), then the test will reject within the j-th stratum
Set-up and Notation
- The stratified log-rank test sums (averages) over strata just as the log-rank test sums (averages) over times:

\[ U_S = \sum_j U_j = \sum_j \sum_i U_{i(j)} \]

  and

\[ V_S = \mathrm{Var}[U_S] = \sum_j V_j = \sum_j \sum_i v_{i(j)} \]

- Under $H_0$

\[ X_S^2 = \frac{U_S^2}{V_S} \overset{\cdot}{\sim} \chi^2_1 \]

- The stratified log-rank test statistic $U_S$ is a weighted average of the within-stratum log-rank test statistics $U_j$
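The "sum over strata" construction can be checked against survdiff() with a strata() term. The sketch below runs a separate two-group logrank test in each stratum, adds up the $U_j$ and $V_j$, and compares the result with the stratified fit. The simulated data, seed and variable names are hypothetical.

```r
library(survival)

set.seed(7)                                     # arbitrary seed, toy data only
n       <- 180
stratum <- sample(1:3, n, replace = TRUE)       # hypothetical confounder levels
group   <- rbinom(n, 1, 0.5)
base    <- c(0.05, 0.10, 0.20)[stratum]         # baseline hazard differs by stratum
t_fail  <- rexp(n, rate = base * ifelse(group == 1, 0.6, 1))
t_cens  <- runif(n, 0, 20)
dat <- data.frame(time    = pmin(t_fail, t_cens),
                  status  = as.numeric(t_fail <= t_cens),
                  group   = group,
                  stratum = stratum)

# Within-stratum pieces: U_j = O_j - E_j for group 1, and its variance V_j.
pieces <- sapply(split(dat, dat$stratum), function(d) {
  fit <- survdiff(Surv(time, status) ~ group, data = d)
  c(U = unname(fit$obs[2] - fit$exp[2]),
    V = unname(fit$var[2, 2]))
})

U_S <- sum(pieces["U", ])
V_S <- sum(pieces["V", ])
chisq_by_hand <- U_S^2 / V_S

fit_strat <- survdiff(Surv(time, status) ~ group + strata(stratum), data = dat)
c(by_hand = chisq_by_hand, survdiff = fit_strat$chisq)
```

The baseline hazards differ across strata in this toy example, which is allowed: only the within-stratum comparisons contribute to $U_S$.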
Back to the larynx cancer example...
- Let's consider adjustment for age as a potential confounding factor in the relationship between stage of disease and time to death
- Specifically, consider age discretized into 3 groups

> ##
> ##### Consider potential confounding by age
> ##
> summary(larynx$age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   41.0    57.0    65.0    64.6    72.0    86.0
> round( quantile( larynx$age, c(0,.33,.66,1) ) )
  0%  33%  66% 100%
  41   60   70   86
> larynx$agegrp <- cut( larynx$age, c(41,60,70,86), include.lowest=TRUE )
> summary( larynx$agegrp )
[41,60] (60,70] (70,86]
     30      32      28
Back to the larynx cancer example...
- Let's first consider whether or not age is likely to meet the definition of a confounder...

> ##
> ##### Does age meet the definition of a confounder? (not really...)
> ##
> chisq.test( table( larynx$stage, larynx$agegrp ) )

        Pearson's Chi-squared test

data:  table(larynx$stage, larynx$agegrp)
X-squared = 4.7134, df = 6, p-value = 0.5811

> survdiff( Surv(t2death,death) ~ agegrp, data=larynx )
Call:
survdiff(formula = Surv(t2death, death) ~ agegrp, data = larynx)

                N Observed Expected (O-E)^2/E (O-E)^2/V
agegrp=[41,60] 30       14     15.9     0.221     0.330
agegrp=(60,70] 32       16     20.8     1.103     1.938
agegrp=(70,86] 28       20     13.3     3.325     4.615

 Chisq= 4.7  on 2 degrees of freedom, p= 0.0937
Back to the larynx cancer example...
- From the above, age is not significantly associated with stage or with time to death (in the dataset)
- The implication of this is that adjustment for age is unlikely to have any impact on the conclusions of our analysis (we will lose some efficiency though...)
- In a real setting, when testing a well-defined hypothesis we should decide upon adjustment for age before assessing the data in order to avoid data-driven inflation of the type 1 error rate!
Back to the larynx cancer example...
- Let's stratify by age here as an example... To do this, use the strata() function in the formula statement of survdiff()

> ##
> ##### LR test of association between stage and t2death,
> ##### stratified by agegrp
> ##
> survdiff(Surv(t2death,death) ~ stage + strata(agegrp), data=larynx)
Call:
survdiff(formula = Surv(t2death, death) ~ stage + strata(agegrp),
    data = larynx)

         N Observed Expected (O-E)^2/E (O-E)^2/V
stage=1 33       15    23.60     3.134     6.430
stage=2 17        7     9.38     0.602     0.763
stage=3 27       17    13.23     1.074     1.547
stage=4 13       11     3.79    13.686    16.182

 Chisq= 20.1  on 3 degrees of freedom, p= 0.00016
Conclusions
- Stage of disease at diagnosis is positively related to the risk of death
- This relationship still holds after adjusting for the effect of age
  - The association is not due to any (positive or negative) association of age with stage of disease and/or age with risk of death
  - The association is not due to any confounding effect of age
Matched Tests
Matching
- When explicit control for confounders will be difficult, comparative studies are sometimes performed on samples of matched pairs:
  - one member of the pair is exposed or treated and the other is not (or gets placebo)
  - matching on: age × sex, neighborhood, clinic, etc.
  - twins
  - matched pairs are like many strata of size 2
Matching
- To account for matching in the sampling scheme, we can:
  1. stratify on the matching set,
  2. compare outcomes within each stratum, then
  3. combine the results across (independent) strata
- As an example, consider the 6-MP data where subjects were actually matched by remission status and hospital
  - One member of each pair was randomized to 6-MP (vs. placebo) maintenance therapy
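The three steps above become very simple when every stratum is a pair. A pair contributes only if its earlier observed time is a failure, and it then adds ±1/2 to the observed-minus-expected total and 1/4 to the variance. The sketch below checks this against survdiff() with strata(pairid) on simulated pairs; the data, seed and names such as `make_arm` are hypothetical, and continuous times are assumed so that there are no ties within a pair.

```r
library(survival)

set.seed(2016)                                   # arbitrary seed, toy data only
n_pairs     <- 40
pair_effect <- rexp(n_pairs, 1)                  # hypothetical shared risk within a pair

make_arm <- function(hr) {
  t_fail <- rexp(n_pairs, rate = 0.1 * pair_effect * hr)
  t_cens <- runif(n_pairs, 0, 30)
  data.frame(pairid = seq_len(n_pairs),
             time   = pmin(t_fail, t_cens),
             status = as.numeric(t_fail <= t_cens))
}
control <- cbind(make_arm(hr = 1),   trt = 0)
treated <- cbind(make_arm(hr = 0.5), trt = 1)
long    <- rbind(control, treated)

# Steps 1-3 in one call: stratify on the pair, compare within it, combine.
fit <- survdiff(Surv(time, status) ~ trt + strata(pairid), data = long)

# Same statistic by counting: which member of each informative pair failed first?
control_first   <- control$time < treated$time
first_is_event  <- ifelse(control_first, control$status, treated$status) == 1
n_control_first <- sum(first_is_event & control_first)
n_treated_first <- sum(first_is_event & !control_first)
count_chisq <- (n_control_first - n_treated_first)^2 /
  (n_control_first + n_treated_first)

c(survdiff = fit$chisq, by_counting = count_chisq)
```

Pairs whose earlier time is censored drop out entirely, which is the price of respecting the matching.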
Ex: 6-MP data
- A proper analysis should account for the correlation induced by matching...

> ##
> ##### Matched analysis of the 6-MP data
> ##
> sixmp <- read.table( "http://www.ics.uci.edu/~dgillen/STAT255/Data/sixmp.txt" )
> sixmp[1:5,]
  pairid tpbo t6mp irelapse
1      1    1   10        1
2      2   22    7        1
3      3    3   32        0
4      4   12   23        1
5      5    8   22        1
> ##
> ##### Transform data to long format
> ##
> sixmpLong <- cbind( rep(sixmp$pairid, 2), c(sixmp$tpbo, sixmp$t6mp),
+                     rep(0:1, each=21), c( rep(1,21), sixmp$irelapse ) )
> sixmpLong <- as.data.frame( sixmpLong )
> names( sixmpLong ) <- c( "pairid", "time", "sixmp", "irelapse" )
Ex: 6-MP data

> ##
> ##### Matched analysis stratifying by pairid
> ##
> survdiff(Surv(time, irelapse)~sixmp+strata(pairid), data=sixmpLong)
Call:
survdiff(formula = Surv(time, irelapse) ~ sixmp + strata(pairid),
    data = sixmpLong)

         N Observed Expected (O-E)^2/E (O-E)^2/V
sixmp=0 21       21     13.5      4.17      10.7
sixmp=1 21        9     16.5      3.41      10.7

 Chisq= 10.7  on 1 degrees of freedom, p= 0.00106

Conclusion: After accounting for correlation induced by matching on remission status and hospital, we conclude that there is a difference in time-to-relapse between the 6-MP arm and the control arm using a level 0.05 test
Stratified Analyses
Summary
1. Analyses by separate strata, stratified tests and adjustment are statistical activities, but ...
   - ... the identification of confounders and the decision to adjust for them are extra-statistical considerations
   - ... they involve (1) the scientific question of interest and (2) possible chains of causality
2. If the study design is stratified or matched, always adjust
3. Stratified tests will have good power for alternatives that are in the same direction in each stratum
4. When effects are different by stratum (interaction or effect modification), analyses are better performed and reported separately on each stratum
Summary
5. Weighted, K-sample and K-sample trend test versions of the stratified log-rank test exist
6. Strata can be quite small for adjustment (but not for within-stratum analyses)
7. When data are in the form of matched pairs (or small sets), think of them as many small strata
8. The stratified log-rank test on matched pairs is the censored data analogue of the sign test for paired data
Analogous Methods For Binomial Data
                                   Proportions                    Survival Data
1. Description                     p, RR, OR                      S, Λ, λ, RR
2. Two-sample test                 Z test / χ² test               Logrank test
3. Stratified two-sample test      Mantel-Haenszel test           Stratified logrank test
4. K-sample heterogeneity test     K-sample heterogeneity test    K-sample logrank test
5. K-sample trend test             Cochran-Armitage trend test    Tarone trend test
6. Regression models               Logistic regression            Cox regression