Survival Analysis
- Outcome variable: time until an event occurs.
- Start follow-up --Time--> Event
- Event examples: death; disease; recovery; purchase.
- Time is often called survival time, since it is the time that an individual has survived over some follow-up period.
Censored Data
- Censoring occurs when we have some information about an individual's survival time, but it is not known exactly.
- Consider a person followed in a medical study until they die. If the study ends while the patient is still alive, then the patient's survival time is censored, since the person will die after the study ends.
- Common reasons why censoring may occur:
  - study ends - no event
  - lost to follow-up
  - withdrawal from study
Survival Data
The data below is from Pancreas cancer patients in TCGA.
days_to_last_follow_up  days_to_death  vital_status
–                       12             Dead
706                     –              Alive
–                       239            Dead
1794                    –              Alive
–                       153            Dead
33                      –              Alive
How can observed survival time be defined?
Survival Data
days_to_last_follow_up  days_to_death  vital_status
–                       12             Dead
706                     –              Alive
–                       239            Dead
1794                    –              Alive
–                       153            Dead
33                      –              Alive
- If vital_status = Dead and days_to_death is not missing, then T = days_to_death.
- If vital_status = Alive and days_to_last_follow_up is not missing, then T = days_to_last_follow_up.
- If a person is still alive during the study period, then they are censored; if the person died, then the event occurred.
Survival Data - Define Survival Time
Define survival time:

clinical$os_days <- ifelse(clinical$vital_status == "Alive",
                           clinical$days_to_last_follow_up,
                           ifelse(clinical$vital_status == "Dead",
                                  clinical$days_to_death, NA))
days_to_last_follow_up  days_to_death  vital_status  os_days
–                       12             Dead          12
706                     –              Alive         706
–                       239            Dead          239
1794                    –              Alive         1794
–                       153            Dead          153
33                      –              Alive         33
Survival Data - Define Censoring
Define the event indicator for subject i:

    δi = 1 if death
         0 if censored
clinical$dead <- ifelse(clinical$vital_status == "Alive", 0,
                        ifelse(clinical$vital_status == "Dead", 1, NA))
days_to_last_follow_up  days_to_death  vital_status  os_days  dead
–                       12             Dead          12       1
706                     –              Alive         706      0
–                       239            Dead          239      1
1794                    –              Alive         1794     0
–                       153            Dead          153      1
33                      –              Alive         33       0
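The same recoding can be sketched in plain Python using the six rows from the table (a hypothetical stand-in for the clinical data frame):

```python
# (days_to_last_follow_up, days_to_death, vital_status); None marks '–'
rows = [(None, 12, "Dead"), (706, None, "Alive"), (None, 239, "Dead"),
        (1794, None, "Alive"), (None, 153, "Dead"), (33, None, "Alive")]

# os_days: follow-up time while alive, time of death otherwise
os_days = [fu if status == "Alive" else death for fu, death, status in rows]
# dead: event indicator delta_i (1 = death observed, 0 = censored)
dead = [0 if status == "Alive" else 1 for _, _, status in rows]

print(os_days)  # [12, 706, 239, 1794, 153, 33]
print(dead)     # [1, 0, 1, 0, 1, 0]
```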
Survivor Function

- Let T ≥ 0 be a random variable for a person's survival time.
- S(t) = P(T > t) = 1 − F(t) is the survivor function.
- For example, if T ∼ Exp(λ), then S(t) for λ = 1/2 is
[Figure: Exponential survival function with rate = 0.5 — S(t) plotted for t from 0 to 5]
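The curve above can be reproduced by evaluating S(t) directly; a minimal sketch of the exponential survivor function with λ = 1/2:

```python
import math

def surv_exp(t, rate=0.5):
    """S(t) = P(T > t) = exp(-rate * t) for T ~ Exp(rate)."""
    return math.exp(-rate * t)

def cdf_exp(t, rate=0.5):
    """F(t) = 1 - S(t), the exponential CDF."""
    return 1.0 - math.exp(-rate * t)

print(round(surv_exp(2), 4))  # 0.3679, i.e. e^{-1}
```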
- Observed survival function and exponential survival function with λ = 1/2.
[Figure: Observed survival function — observed S(t) plotted for t from 0 to 4]
Hazard Function
The hazard function is defined as:
    h(t) = lim_{∆t→0} P(t ≤ T < t + ∆t | T ≥ t) / ∆t.

- h(t)∆t ≈ P(t ≤ T < t + ∆t | T ≥ t): the probability of death in (t, t + ∆t) given survival up to time t.
- For example, if time is measured in days, then h(t) is the approximate probability that an individual who is alive on day t dies in the following day.
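For the exponential example used earlier, the hazard works out to the constant λ; a numerical check of the quotient inside the limit (λ = 1/2 as before):

```python
import math

rate = 0.5
S = lambda t: math.exp(-rate * t)  # survivor function of Exp(rate)

def hazard_approx(t, dt=1e-6):
    """The quotient inside the limit: P(t <= T < t + dt | T >= t) / dt."""
    return (S(t) - S(t + dt)) / (S(t) * dt)

print(round(hazard_approx(3.0), 4))  # 0.5 — the same at every t: constant hazard
```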
Non-parametric Estimation of the Survival Function
    Ŝ(t) = (number of individuals with survival times ≥ t) / (number of individuals in the data set).

- The Kaplan-Meier estimate is an example of a procedure for estimating S(t).
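With no censoring, the estimator above is just a counting exercise; a minimal sketch on hypothetical fully observed survival times (not the TCGA data):

```python
# Hypothetical survival times, all events observed (no censoring)
times = [12, 33, 153, 239, 706, 1794]

def s_hat(t):
    """Empirical S(t): fraction of individuals with survival time >= t."""
    return sum(ti >= t for ti in times) / len(times)

print(s_hat(0), s_hat(200))  # 1.0 0.5 — everyone at t = 0; 3 of 6 at t = 200
```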
Kaplan-Meier
km <- survfit(Surv(time, status) ~ 1, data = leukemia)
summary(km)
## Call: survfit(formula = Surv(time, status) ~ 1, data = leukemia)
##
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     5     23       2   0.9130  0.0588       0.8049        1.000
##     8     21       2   0.8261  0.0790       0.6848        0.996
##     9     19       1   0.7826  0.0860       0.6310        0.971
##    12     18       1   0.7391  0.0916       0.5798        0.942
##    13     17       1   0.6957  0.0959       0.5309        0.912
##    18     14       1   0.6460  0.1011       0.4753        0.878
##    23     13       2   0.5466  0.1073       0.3721        0.803
##    27     11       1   0.4969  0.1084       0.3240        0.762
##    30      9       1   0.4417  0.1095       0.2717        0.718
##    31      8       1   0.3865  0.1089       0.2225        0.671
##    33      7       1   0.3313  0.1064       0.1765        0.622
##    34      6       1   0.2761  0.1020       0.1338        0.569
##    43      5       1   0.2208  0.0954       0.0947        0.515
##    45      4       1   0.1656  0.0860       0.0598        0.458
##    48      2       1   0.0828  0.0727       0.0148        0.462
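Each entry in the survival column is a running product of (n.risk − n.event) / n.risk over the event times; recomputing the product-limit estimate from the printed table:

```python
# (time, n.risk, n.event) copied from the summary(km) output above
steps = [(5, 23, 2), (8, 21, 2), (9, 19, 1), (12, 18, 1), (13, 17, 1),
         (18, 14, 1), (23, 13, 2), (27, 11, 1), (30, 9, 1), (31, 8, 1),
         (33, 7, 1), (34, 6, 1), (43, 5, 1), (45, 4, 1), (48, 2, 1)]

surv, s = [], 1.0
for t, n_risk, n_event in steps:
    s *= (n_risk - n_event) / n_risk  # product-limit step at each event time
    surv.append(round(s, 4))

print(surv[0], surv[1], surv[-1])  # 0.913 0.8261 0.0828 — matches the survival column
```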
Kaplan-Meier

library(survminer)
ggsurvplot(km)
[Figure: Kaplan-Meier curve from ggsurvplot — survival probability vs. time (0 to 160), censored observations marked with +, single stratum "All"]
Proportional Hazards Model
- Suppose we have two groups of patients: A and B.
- Assume the hazard at time t for a patient in group B is proportional to the hazard at time t for a patient in group A. That is,

      hB(t) = ψ hA(t),

  where t ≥ 0 and ψ is a constant.
- It is convenient to set ψ = exp(β), since the ratio must always be positive. Let xi = 1 if subject i is in group B and xi = 0 if subject i is in group A; then

      hi(t) = exp(β xi) h0(t).

  This is the proportional hazards model for comparing two groups, with ψ = exp(β).
General Proportional Hazards Model
- Suppose that we would like to model the hazard of death at a particular time as a linear function of p explanatory variables x1, ..., xp.
- In this case, the linear component is

      ηi = β1 x1i + ⋯ + βp xpi,  with  ψi = exp(ηi).

  So,

      hi(t) = exp(β1 x1i + ⋯ + βp xpi) h0(t),  or

      log{ hi(t) / h0(t) } = β1 x1i + ⋯ + βp xpi.

- h0(t) is called the baseline hazard: the hazard for a subject with all x's equal to zero.
- This is a linear model for the logarithm of the hazard ratio.
Fitting the Proportional Hazards Model

- Cox (1972) derived the appropriate likelihood for this model.
- The MLE of the β parameters can be found by maximizing the log-likelihood function using numerical methods such as Newton-Raphson.
Fitting the Proportional Hazards Model

cox.mod <- coxph(Surv(time, status) ~ x, data = leukemia)
summary(cox.mod)
## Call:
## coxph(formula = Surv(time, status) ~ x, data = leukemia)
##
##   n= 23, number of events= 18
##
##                  coef exp(coef) se(coef)     z Pr(>|z|)
## xNonmaintained 0.9155    2.4981   0.5119 1.788   0.0737 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                exp(coef) exp(-coef) lower .95 upper .95
## xNonmaintained     2.498     0.4003    0.9159     6.813
##
## Concordance= 0.619  (se = 0.063 )
## Likelihood ratio test= 3.38  on 1 df,   p=0.07
## Wald test            = 3.2   on 1 df,   p=0.07
## Score (logrank) test = 3.42  on 1 df,   p=0.06
- The estimated hazard ratio for patients not given maintenance chemotherapy vs. patients given maintenance chemotherapy is 2.4981.
hnon-maintenance(t) = exp(0.9155)hmaintenance(t)
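The exp(coef) column and its confidence interval can be reproduced from the coefficient and standard error in the output; a quick check (0.9155 and 0.5119 are taken from the summary above):

```python
import math

beta, se = 0.9155, 0.5119        # coef and se(coef) for xNonmaintained
hr = math.exp(beta)              # estimated hazard ratio, exp(coef)
lo = math.exp(beta - 1.96 * se)  # lower 95% confidence limit
hi = math.exp(beta + 1.96 * se)  # upper 95% confidence limit

print(round(hr, 4), round(lo, 4), round(hi, 3))  # ≈ 2.498 0.9159 6.813
```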
Cox Model Diagnostics
- Graphical approaches such as log-log plots.
- Goodness-of-fit tests such as a Schoenfeld residual plot.
Principal Components Analysis (PCA)
- Suppose we have n measurements on p features/covariates, X1, ..., Xp.
- PCA finds a low-dimensional representation of a data set that contains as much of the variation as possible.
- The first principal component is the linear combination of the features

      Z1 = φ11 X1 + φ21 X2 + ⋯ + φp1 Xp

  that has the largest variance, subject to Σ_{j=1}^{p} φj1² = 1.
- φ11, φ21, ..., φp1 are called the loadings of the first principal component.
Computing the first PC
- To compute the first principal component, centre each of X1, ..., Xp (i.e., so that they have mean 0).
- Then find the linear combination

      zi1 = φ11 xi1 + φ21 xi2 + ⋯ + φp1 xip

  that has the largest variance, subject to Σ_{j=1}^{p} φj1² = 1.
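The constrained maximization above can be carried out via the singular value decomposition; a minimal numpy sketch on made-up data (the variable names are illustrative, not from the slides):

```python
import numpy as np

# Toy data (hypothetical, not the pancreas data): 3 features, unequal variances
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)                  # centre each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
phi1 = Vt[0]                             # loadings of the first PC
z1 = Xc @ phi1                           # first PC scores z_i1
z2 = Xc @ Vt[1]                          # second PC scores

print(round(float(np.sum(phi1 ** 2)), 6))  # 1.0 — loadings satisfy the unit-norm constraint
```

The first right-singular vector of the centred matrix is exactly the loading vector with the largest score variance, and the scores for successive components are uncorrelated by construction.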
Computing the second PC
The second principal component is the linear combination of X1, ..., Xp that has maximal variance out of all linear combinations that are uncorrelated with Z1.
Computing PC

Plot of the score vectors for PC1 and PC2:

library(ggfortify)
# Random sample of patients and genes from pancreas
pca.out <- prcomp(pancreas_gene_sample[, 9:38], center = T)
as.data.frame(pca.out$rotation[, c(1, 2)]) %>% rownames_to_column() %>%
  arrange(desc(PC1)) %>% head(n = 3)

##              rowname        PC1         PC2
## 1 ENSG00000152969.15 0.15331976 -0.09474669
## 2  ENSG00000183580.9 0.08169159 -0.59653953
## 3 ENSG00000169696.14 0.04970751  0.35904737

autoplot(pca.out, data = pancreas_gene_sample, colour = "dead")
[Figure: PCA score plot — PC2 (11.15%) vs. PC1 (72.47%), points coloured by dead (0 to 1)]
More about principal components
- How many principal components? Scree plot.
- Proportion of variance explained.
PCA using sklearn

- In Python use the sklearn.decomposition module.

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import numpy as np
from sklearn.decomposition import PCA
import pandas as pd

data = load_iris()  # load iris data
df = pd.DataFrame(data = data.data, columns = data.feature_names)
x = StandardScaler().fit_transform(df)  # standardize data

pca = PCA(n_components=2)   # 2 components
pcs = pca.fit_transform(x)  # compute pc scores

principalDf = pd.DataFrame(data = pcs,
                           columns = ['principal component 1',
                                      'principal component 2'])
principalDf.head(n=2)

##    principal component 1  principal component 2
## 0              -2.264703               0.480027
## 1              -2.080961              -0.674134
k means
- Aim to define k disjoint clusters such that each observation belongs to exactly one cluster and the within-cluster variation is as small as possible.
- Typically Euclidean distance is used to define within-cluster variation.
1. Randomly assign a number, from 1 to k, to each of the observations. These serve as initial cluster assignments for the observations.
2. Iterate until the cluster assignments stop changing:
   (a) For each of the k clusters, compute the cluster centroid. The kth cluster centroid is the vector of the p feature means for the observations in the kth cluster.
   (b) Assign each observation to the cluster whose centroid is closest (where closest is defined using Euclidean distance).
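The two-step algorithm above can be sketched with numpy on toy 2-D data (hypothetical, not the TCGA matrix):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means following the two-step algorithm above."""
    rng = np.random.default_rng(seed)
    # Step 1: random initial cluster assignments 0..k-1
    labels = rng.integers(0, k, size=len(X))
    for _ in range(n_iter):
        # Step 2(a): centroid = vector of feature means for each cluster;
        # re-seed an empty cluster from a random observation
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else X[rng.integers(len(X))] for j in range(k)])
        # Step 2(b): assign each observation to the nearest centroid (Euclidean)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # assignments stopped changing
            break
        labels = new_labels
    return labels

# Two well-separated toy blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
labels = kmeans(X, 2)
print(np.unique(labels[:20]).size, np.unique(labels[20:]).size)  # each blob in one cluster
```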
k means
set.seed(416)
k <- kmeans(x = alldat[, c(9:60491)], centers = 2, nstart = 25)
alldat$cluster <- k$cluster
fit <- survfit(Surv(os_days/365.25, dead) ~ as.factor(cluster),
               data = alldat)
ggsurvplot(fit, risk.table = T)
Hierarchical Clustering

set.seed(6)
X1 <- rnorm(9, 0, 2)
X2 <- rnorm(9, 0, 1) + runif(9, 0, 10)
df <- data.frame(X1, X2)
df$rownum <- 1:nrow(df)
plot(X1, X2, type = "n"); text(X1, X2, labels = rownames(df))
[Figure: scatter plot of X2 vs. X1 with points labelled 1-9]
hc <- hclust(dist(as.matrix(df)))
plot(hc)
[Figure: cluster dendrogram from hclust (complete linkage, Euclidean distance), leaves labelled 1-9]
Hierarchical Clustering
[Figure: labelled scatter plot of X2 vs. X1 shown alongside the corresponding cluster dendrogram (complete linkage)]
hc$merge

##      [,1] [,2]
## [1,]   -5   -6
## [2,]   -3   -4
## [3,]   -1   -2
## [4,]   -8   -9
## [5,]    1    2
## [6,]    4    5
## [7,]   -7    3
## [8,]    6    7

Negative entries refer to single observations; positive entries refer to clusters formed at earlier merge steps. For example, row 5 merges the clusters formed in steps 1 and 2, i.e. {5, 6} with {3, 4}.