Functional Principal Component Analysis for Longitudinal and Survival Data
Author: Fang Yao, University of Toronto
Presenter: Duzhe Wang, UW-Madison
November 29, 2016
Outline
Background
  Functional Principal Component Analysis
Review of the Paper
  Assumptions and Notations
  Models
  Practical Issues
  Simulation Studies
  Real Data Analysis
  Model Discussion
My Research
Principal Component Analysis
Figure 1: 3D data
The line approximates the data well
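A minimal sketch of the idea behind Figure 1, assuming synthetic 3D points scattered around a line (the data, seed, and function names are illustrative and not taken from the slides). The first principal component is the leading eigenvector of the sample covariance matrix. Power iteration is used here only to keep the example free of dependencies.

```python
import math
import random


def covariance_matrix(points):
    """Sample mean and covariance matrix (nested lists) of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return mean, cov


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly apply the matrix and renormalize."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the matching eigenvalue.
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue, v


# Hypothetical data: points along the unit direction (1, 2, 2)/3 plus small noise.
rng = random.Random(0)
direction = (1 / 3, 2 / 3, 2 / 3)
points = []
for _ in range(300):
    s = rng.gauss(0.0, 3.0)  # position along the line
    points.append([s * direction[j] + rng.gauss(0.0, 0.2) for j in range(3)])

mean, cov = covariance_matrix(points)
lam1, pc1 = leading_eigenvector(cov)
total_variance = sum(cov[j][j] for j in range(3))
explained = lam1 / total_variance  # share of total variance captured by the line
```

Here `pc1` should be nearly parallel to `direction`, and `explained` close to 1, which is what "the line approximates the data well" means numerically.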
What if each observation is a curve? Then we have functional data.
Figure 2: Plots of the log(FEV1) curves for the first one hundred subjects. The left panel displays curves obtained via smoothing splines, while the right panel displays curves which are linearly interpolated.
Functional Principal Component Analysis
Let $X$ be a random variable $X : \Omega \to L^2(I)$ such that $X \in L^2(\Omega)$.
Define $\mu = E(X)$ to be the mean process of $X$.
The covariance function of $X$ is defined to be the function $G : I \times I \to \mathbb{R}$ such that
$$G(s, t) = \operatorname{Cov}\{X(s), X(t)\} = E\{[X(s) - \mu(s)][X(t) - \mu(t)]\}.$$
We assume that $X$ has a continuous and square-integrable covariance function, that is,
$$\int\!\!\int G(s, t)^2 \, ds \, dt < \infty.$$
Functional Principal Component Analysis
Mercer's Lemma: Assume that the covariance function $G$ defined above is continuous over $I^2$. Then there exist an orthonormal sequence $\{\phi_j\}$ of continuous functions in $L^2(I)$ and a non-increasing sequence $\{\lambda_j\}$ of positive numbers such that
$$G(s, t) = \sum_{j=1}^{\infty} \lambda_j \phi_j(s) \phi_j(t), \quad s, t \in I,$$
where the series converges uniformly on $I^2$.
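A small numerical check of the Mercer representation, assuming an illustrative two-term kernel on $I = [0, 10]$ with eigenvalues 10 and 1 and trigonometric eigenfunctions (these choices and all function names are assumptions made for the sketch). It builds $G$ from its expansion and verifies the eigen-equation $\int G(s, t) \phi_j(t) \, dt = \lambda_j \phi_j(s)$ by quadrature.

```python
import math

TAU = 10.0
LAMBDAS = (10.0, 1.0)  # assumed eigenvalues, non-increasing


def phi(k, t):
    """Assumed eigenfunctions, orthonormal in L2([0, TAU])."""
    if k == 0:
        return -math.cos(math.pi * t / 10.0) / math.sqrt(5.0)
    return math.sin(math.pi * t / 10.0) / math.sqrt(5.0)


def G(s, t):
    """Covariance kernel assembled from its (finite) Mercer expansion."""
    return sum(lam * phi(k, s) * phi(k, t) for k, lam in enumerate(LAMBDAS))


def integrate(f, a=0.0, b=TAU, n=2000):
    """Midpoint rule on n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def apply_kernel(k, s):
    """Left-hand side of the eigen-equation: integral of G(s, t) * phi_k(t) dt."""
    return integrate(lambda t: G(s, t) * phi(k, t))


s0 = 3.7  # arbitrary evaluation point
residuals = [apply_kernel(k, s0) - LAMBDAS[k] * phi(k, s0) for k in range(2)]

# Gram matrix of the eigenfunctions; should be close to the identity.
gram = [[integrate(lambda t, a=a, b=b: phi(a, t) * phi(b, t)) for b in range(2)]
        for a in range(2)]
```

Both entries of `residuals` should be numerically zero, and `gram` should be close to the 2 x 2 identity, confirming orthonormality.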
Functional Principal Component Analysis
Karhunen-Loève expansion: Under the assumptions and notation of Mercer's Lemma, we have
$$X(t) = \mu(t) + \sum_{j=1}^{\infty} \xi_j \phi_j(t),$$
where the coefficients $\xi_j = \int \{X(t) - \mu(t)\} \phi_j(t) \, dt$ are uncorrelated random variables with mean zero and variances $E\xi_j^2 = \lambda_j$, such that $\sum_j \lambda_j < \infty$.
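The expansion can be illustrated by simulation. The sketch below assumes a hypothetical linear mean function, two trigonometric eigenfunctions on $[0, 10]$, and Gaussian scores (none of these are prescribed by the expansion itself). It draws scores, builds a curve, and recovers the scores by projecting the centred curve onto each eigenfunction.

```python
import math
import random

TAU = 10.0
LAMBDAS = (10.0, 1.0)  # assumed eigenvalues, non-increasing


def mu(t):
    """Hypothetical mean function, chosen only for illustration."""
    return 1.0 + 0.5 * t


def phi(k, t):
    """Assumed eigenfunctions, orthonormal in L2([0, TAU])."""
    scale = 1.0 / math.sqrt(5.0)
    if k == 0:
        return -scale * math.cos(math.pi * t / 10.0)
    return scale * math.sin(math.pi * t / 10.0)


def integrate(f, n=1000):
    """Midpoint rule on [0, TAU]."""
    h = TAU / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def draw_scores(rng):
    """Independent mean-zero scores with Var(xi_k) = lambda_k (Gaussian by assumption)."""
    return [rng.gauss(0.0, math.sqrt(lam)) for lam in LAMBDAS]


def curve(scores):
    """Return the function t -> mu(t) + sum_k xi_k * phi_k(t)."""
    return lambda t: mu(t) + sum(x * phi(k, t) for k, x in enumerate(scores))


def project(x, k):
    """xi_k = integral of {X(t) - mu(t)} * phi_k(t) dt."""
    return integrate(lambda t: (x(t) - mu(t)) * phi(k, t))


rng = random.Random(1)
true_scores = draw_scores(rng)
recovered = [project(curve(true_scores), k) for k in range(2)]

# Empirical check of Var(xi_k) = lambda_k over many simulated score vectors.
sample = [draw_scores(rng) for _ in range(20000)]
variances = [sum(s[k] ** 2 for s in sample) / len(sample) for k in range(2)]
```

`recovered` should match `true_scores` up to quadrature error, and `variances` should be close to `LAMBDAS`.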
Functional Principal Component Analysis
The idea of FPCA is to retain the first $k$ terms of the above expansion as an approximation to $X$:
$$X(t) \approx \mu(t) + \sum_{j=1}^{k} \xi_j \phi_j(t).$$
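One way to see how much is lost by truncation: the expected integrated squared error of the $k$-term approximation equals the sum of the discarded eigenvalues. The sketch below uses hypothetical eigenvalues and a fraction-of-variance-explained threshold. The threshold rule is a common heuristic assumed here for illustration, not a rule stated on this slide.

```python
def fraction_of_variance_explained(eigenvalues, k):
    """Share of total variance carried by the first k components."""
    return sum(eigenvalues[:k]) / sum(eigenvalues)


def truncation_error(eigenvalues, k):
    """Expected integrated squared error of the k-term approximation."""
    return sum(eigenvalues[k:])


def smallest_k(eigenvalues, threshold=0.95):
    """Smallest k whose explained share reaches the threshold."""
    for k in range(1, len(eigenvalues) + 1):
        if fraction_of_variance_explained(eigenvalues, k) >= threshold:
            return k
    return len(eigenvalues)


eigenvalues = [10.0, 1.0, 0.1, 0.01]  # hypothetical, non-increasing
k = smallest_k(eigenvalues)
error_at_k = truncation_error(eigenvalues, k)
```

With these values the first component explains about 90% of the variance, so the 95% threshold selects `k = 2`.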
Assumptions and Notations
For the $i$th individual, $i = 1, \ldots, n$, the survival time $S_i$ is assumed independent of the right-censoring time $C_i$. One observes $T_i = \min(S_i, C_i)$ and the failure indicator $\Delta_i$.
There is also a latent longitudinal process $X_i(t)$ for the $i$th individual, with corresponding mean process $\mu_i(t)$. The observed covariate $Y_i = (Y_{i1}, \ldots, Y_{in_i})^\top$ for the $i$th individual is assumed to be sampled from $X_i(t)$, measured at $t_i = (t_{i1}, \ldots, t_{in_i})^\top$, terminated at the endpoint $T_i$, and subject to measurement error $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{in_i})^\top$.
Therefore, we have $Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$.
We also assume $\max\{T_i : i = 1, \ldots, n\} \le \tau$.
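The data structure described above can be sketched as follows. Everything distributional here is an assumption made for illustration: exponential survival and censoring times, a simple linear latent process, and the usual convention that the failure indicator equals 1 when the failure is observed ($S_i \le C_i$).

```python
import random


def simulate_subject(rng, visit_times, tau=10.0, sigma=0.3):
    """Return one subject's observed data (T_i, Delta_i, t_i, Y_i)."""
    survival = rng.expovariate(0.15)             # S_i (hypothetical distribution)
    censoring = min(rng.expovariate(0.10), tau)  # C_i, capped so that T_i <= tau
    observed = min(survival, censoring)          # T_i = min(S_i, C_i)
    delta = 1 if survival <= censoring else 0    # assumed convention for Delta_i

    level = rng.gauss(0.0, 1.0)                  # subject-specific random level

    def latent(t):
        """Hypothetical latent process X_i(t)."""
        return 2.0 + level + 0.1 * t

    # Measurements stop at the endpoint T_i.
    times = [t for t in visit_times if t <= observed]
    # Y_ij = X_i(t_ij) + eps_ij with Gaussian measurement error.
    y = [latent(t) + rng.gauss(0.0, sigma) for t in times]
    return {"T": observed, "delta": delta, "t": times, "Y": y}


rng = random.Random(2016)
visits = [0.0, 2.0, 4.0, 6.0, 8.0]
subjects = [simulate_subject(rng, visits) for _ in range(200)]
n_events = sum(s["delta"] for s in subjects)
```

Each subject contributes a different number of measurements $n_i$, because observation stops at $T_i$.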
Assumptions and Notations
Let $Z_i(t) = \{Z_{i1}(t), \ldots, Z_{ir}(t)\}^\top$ be the vector of covariates at time $t \le T_i$ that affect the longitudinal process, and let $V_i(t) = \{V_{i1}(t), \ldots, V_{is}(t)\}^\top$ be the vector of covariates at time $t \le T_i$ that affect the survival time. Note that $Z_i(t)$ and $V_i(t)$ are possibly time-dependent.
We assume that the "true" values of these covariates for the $i$th subject can be observed exactly at any time $t \le T_i$. In contrast, the longitudinal covariate $Y_i$ is assumed to be measured with error and to be available only at $t_i$.
Denote the $n_i \times r$ design matrix formed by the covariate $Z_i(t)$ at $t_i$ by $Z_i = \{Z_i(t_{i1}), \ldots, Z_i(t_{in_i})\}^\top$.
Models
Linear Model: Let $\mu(t)$ be the overall mean function without considering the vector of covariates $Z_i(t)$. If the effects of $Z_i(t)$ are also taken into account, then
$$\mu_i(t) = \mu(t) + \beta^\top Z_i(t), \quad t \in [0, \tau],$$
where $\beta = (\beta_1, \ldots, \beta_r)^\top$.
Karhunen-Loève Representation: We assume that the covariance structure $G(\cdot, \cdot)$ of $X_i(t)$ is the same for all $i$. In terms of orthogonal eigenfunctions $\{\phi_k\}$ and non-increasing eigenvalues $\{\lambda_k\}$,
$$G(s, t) = \sum_k \lambda_k \phi_k(s) \phi_k(t), \quad s, t \in [0, \tau].$$
Therefore we have $X_i(t) = \mu_i(t) + \sum_k \xi_{ik} \phi_k(t)$, where $\xi_{ik} = \int_0^\tau \{X_i(t) - \mu_i(t)\} \phi_k(t) \, dt$ are uncorrelated random variables with mean zero and variances $E\xi_{ik}^2 = \lambda_k$, subject to $\sum_k \lambda_k < \infty$.
Models
FPCA: We assume that the $\lambda_k$ tend to zero rapidly, so the covariance function $G$ can be well approximated by the first few terms of the eigen-decomposition. Therefore, we model $X_i(t)$ using the first $K$ leading principal components,
$$X_i(t) = \mu_i(t) + \sum_{k=1}^{K} \xi_{ik} \phi_k(t).$$
Sub-model for the longitudinal covariate $Y_{ij}$:
$$Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij} = \mu(t_{ij}) + Z_i(t_{ij})^\top \beta + \sum_{k=1}^{K} \xi_{ik} \phi_k(t_{ij}) + \varepsilon_{ij}, \quad t_{ij} \in [0, \tau]. \tag{1}$$
Models
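Spline Approximation of $\mu(t)$: Let $B_p(t) = (B_{p1}(t), \ldots, B_{pp}(t))^\top$ be a set of basis spline functions on $[0, \tau]$ used to model the overall mean function $\mu(t)$, with coefficients $\alpha = (\alpha_1, \ldots, \alpha_p)^\top$, so that $\mu(t) = B_p(t)^\top \alpha$.
Spline Approximation of $\phi_k$: The eigenfunctions are modeled using a set of orthonormal basis functions $B_q(t) = (B_{q1}(t), \ldots, B_{qq}(t))^\top$ with coefficients $\theta_k = (\theta_{1k}, \ldots, \theta_{qk})^\top$, so that $\phi_k(t) = B_q(t)^\top \theta_k$, subject to
$$\int_0^\tau B_{qm}(t) B_{qn}(t) \, dt = \delta_{mn}, \qquad \theta_k^\top \theta_l = \delta_{kl}, \tag{2}$$
where $m, n = 1, \ldots, q$ and $k, l = 1, \ldots, K$.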
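The two orthonormality constraints in (2) can be enforced by Gram-Schmidt orthonormalization. The sketch below is an assumption-laden stand-in: it orthonormalizes the monomials $1, t, t^2, t^3$ on $[0, \tau]$ instead of a B-spline basis (so that the $L^2$ inner product is available in closed form), and it orthonormalizes two arbitrary coefficient vectors under the Euclidean inner product. The value $\tau = 10$ and all names are illustrative.

```python
import math

TAU = 10.0
Q = 4  # number of basis functions (q in the slides)


def l2_inner(a, b, tau=TAU):
    """Exact L2([0, tau]) inner product of two polynomials given by coefficient lists."""
    return sum(ai * bj * tau ** (i + j + 1) / (i + j + 1)
               for i, ai in enumerate(a) for j, bj in enumerate(b))


def euclid_inner(a, b):
    """Euclidean inner product, used for the coefficient vectors theta_k."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors, inner):
    """Orthonormalize `vectors` with respect to the inner product `inner`."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coef = inner(w, e)
            w = [wi - coef * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(inner(w, w))
        basis.append([wi / norm for wi in w])
    return basis


# Monomials 1, t, t^2, t^3 as coefficient lists (a stand-in for B-splines).
monomials = [[1.0 if j == m else 0.0 for j in range(Q)] for m in range(Q)]
basis = gram_schmidt(monomials, l2_inner)

# Two arbitrary starting vectors for theta_1, theta_2 (K = 2).
thetas = gram_schmidt([[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]], euclid_inner)

basis_gram = [[l2_inner(a, b) for b in basis] for a in basis]
theta_gram = [[euclid_inner(a, b) for b in thetas] for a in thetas]
```

Both `basis_gram` and `theta_gram` should be identity matrices up to rounding. Together these two conditions make the resulting $\phi_k(t) = B_q(t)^\top \theta_k$ orthonormal.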
Models
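An alternative to (1): Let $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})^\top$ and let $\Theta = (\theta_1, \ldots, \theta_K)$ be the $q \times K$ matrix with columns $\theta_k$. Then (1) becomes
$$Y_{ij} = B_p(t_{ij})^\top \alpha + Z_i(t_{ij})^\top \beta + B_q(t_{ij})^\top \Theta \xi_i + \varepsilon_{ij}. \tag{3}$$
Let $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_K)$. We also assume $\xi_i \sim N(0_K, \Lambda)$, $\varepsilon_{ij} \sim N(0, \sigma^2)$ and $\xi_i \perp \varepsilon_{ij}$. The covariance between observed values of the longitudinal process is then
$$\operatorname{cov}(Y_{ij}, Y_{il}) = B_q(t_{ij})^\top \Theta \Lambda \Theta^\top B_q(t_{il}) + \sigma^2 \delta_{jl}. \tag{4}$$
A concise matrix form: Let $B_i = (B_p(t_{i1}), \ldots, B_p(t_{in_i}))^\top$ and $\tilde{B}_i = (B_q(t_{i1}), \ldots, B_q(t_{in_i}))^\top$. Model (3) can then be written as
$$Y_i = B_i \alpha + Z_i \beta + \tilde{B}_i \Theta \xi_i + \varepsilon_i, \quad i = 1, \ldots, n.$$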
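The covariance in (4) can be evaluated directly once a basis and parameter values are fixed. The sketch below assumes a hypothetical orthonormal trigonometric basis with $q = 3$ on $[0, 10]$ in place of B-splines, $K = 2$, and illustrative values for $\Theta$, $\Lambda$ and $\sigma^2$. $\Theta$ is stored as a list of its columns $\theta_k$.

```python
import math

TAU = 10.0


def basis_q(t):
    """Hypothetical orthonormal basis on [0, TAU] with q = 3 (a stand-in for B-splines)."""
    return [
        1.0 / math.sqrt(TAU),
        math.cos(2.0 * math.pi * t / TAU) / math.sqrt(TAU / 2.0),
        math.sin(2.0 * math.pi * t / TAU) / math.sqrt(TAU / 2.0),
    ]


def eigenfunctions(t, theta_columns):
    """phi_k(t) = B_q(t)^T theta_k for each column theta_k of Theta."""
    b = basis_q(t)
    return [sum(bm * th for bm, th in zip(b, col)) for col in theta_columns]


def covariance_matrix(times, theta_columns, lambdas, sigma2):
    """Entries B_q(t_j)^T Theta Lambda Theta^T B_q(t_l) + sigma^2 * delta_jl."""
    phis = [eigenfunctions(t, theta_columns) for t in times]
    return [
        [
            sum(lam * pj[k] * pl[k] for k, lam in enumerate(lambdas))
            + (sigma2 if j == l else 0.0)
            for l, pl in enumerate(phis)
        ]
        for j, pj in enumerate(phis)
    ]


theta_columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # orthonormal columns, K = 2
lambdas = [10.0, 1.0]
sigma2 = 0.1
times = [0.0, 2.0, 6.0]
cov = covariance_matrix(times, theta_columns, lambdas, sigma2)
```

The measurement-error variance appears only on the diagonal, so off-diagonal entries reflect the latent process alone.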
Models
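Cox Model: We model the failure time through the hazard
$$h_i(t) = h_0(t) \exp\{\gamma X_i(t) + V_i(t)^\top \zeta\}, \tag{5}$$
where $h_0(t)$ is the baseline hazard, and $\gamma$ and $\zeta = (\zeta_1, \ldots, \zeta_s)^\top$ are coefficients.
Observed data and parameters of interest: The observed data for each individual are denoted by
$$O_i = (T_i, \Delta_i, Y_i, Z_i, V_i, t_i),$$
and the full set of parameters of interest is
$$\Omega = (\gamma, \zeta, h_0(\cdot), \alpha, \beta, \Theta, \Lambda, \sigma^2).$$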
Models
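Joint Modelling: The observed-data likelihood for the full set of parameters is given by
$$L_o = \prod_{i=1}^{n} \int f(T_i, \Delta_i \mid X_i^H(T_i), V_i(T_i), \gamma, \zeta, h_0) \, f(Y_i \mid \tilde{X}_i, \sigma^2, t_i) \, f(\xi_i \mid \Lambda) \, d\xi_i,$$
where $X_i^H(T_i) = \{X_i(t) : 0 \le t \le T_i\}$, $\tilde{X}_i = \{X_i(t_{i1}), \ldots, X_i(t_{in_i})\}^\top$, and
$$f(T_i, \Delta_i \mid X_i^H(T_i), V_i(T_i), \gamma, \zeta, h_0) = \left[h_0(T_i) \exp\{\gamma X_i(T_i) + V_i(T_i)^\top \zeta\}\right]^{\Delta_i} \times \exp\left[-\int_0^{T_i} h_0(u) \exp\{\gamma X_i(u) + V_i(u)^\top \zeta\} \, du\right].$$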
Models
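Joint Modelling (continued)
$$f(Y_i \mid \tilde{X}_i, \sigma^2, t_i) = (2\pi\sigma^2)^{-n_i/2} \exp\left\{-\frac{1}{2\sigma^2} (Y_i - \tilde{X}_i)^\top (Y_i - \tilde{X}_i)\right\},$$
and
$$f(\xi_i \mid \Lambda) = (2\pi)^{-K/2} |\Lambda|^{-1/2} \exp\left(-\frac{1}{2} \xi_i^\top \Lambda^{-1} \xi_i\right),$$
where $\tilde{X}_i = B_i \alpha + Z_i \beta + \tilde{B}_i \Theta \xi_i$ is the vector of latent process values at $t_i$, with $B_i$ and $\tilde{B}_i$ the matrices of $B_p$ and $B_q$ evaluated at $t_i$.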
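The survival and longitudinal factors of the likelihood can be evaluated on the log scale as sketched below. This is not the EM implementation. It only computes the two conditional log-densities for one subject, given a latent trajectory supplied as a function. The midpoint-rule integration, the example trajectory, and all function names are assumptions. The baseline hazard $t^2/100$ is the one listed in the simulation set-up.

```python
import math


def log_survival_density(T, delta, x, v, gamma, zeta, h0, n_grid=2000):
    """log f(T, Delta | X^H(T), V, gamma, zeta, h0) for one subject.

    x:  callable t -> latent process value X_i(t)
    v:  callable t -> list of survival covariates V_i(t)
    h0: callable t -> baseline hazard
    """
    def linear_predictor(t):
        return gamma * x(t) + sum(vk * zk for vk, zk in zip(v(t), zeta))

    # Cumulative hazard over [0, T] by the midpoint rule.
    step = T / n_grid
    cumulative = step * sum(
        h0((i + 0.5) * step) * math.exp(linear_predictor((i + 0.5) * step))
        for i in range(n_grid)
    )
    # The hazard at T contributes only when the failure is observed.
    event_term = math.log(h0(T)) + linear_predictor(T) if delta == 1 else 0.0
    return event_term - cumulative


def log_longitudinal_density(y, x_tilde, sigma2):
    """log f(Y_i | X~_i, sigma^2, t_i) with independent Gaussian measurement errors."""
    n_i = len(y)
    rss = sum((yj - xj) ** 2 for yj, xj in zip(y, x_tilde))
    return -0.5 * n_i * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)


def h0(t):
    """Baseline hazard from the simulation set-up."""
    return t ** 2 / 100.0


def x(t):
    """Hypothetical latent trajectory."""
    return 1.0 + 0.1 * t


def v(t):
    """Single, constant survival covariate (hypothetical)."""
    return [0.0]


# With gamma = 0 and zeta = 0 the cumulative hazard is T^3 / 300.
null_value = log_survival_density(6.0, 1, x, v, 0.0, [0.0], h0)
censored_value = log_survival_density(6.0, 0, x, v, -1.0, [0.0], h0)
gauss_value = log_longitudinal_density([1.0, 1.2], [1.0, 1.2], 0.1)
```

For a censored subject only the cumulative-hazard term remains, which is why `censored_value` is simply minus the integrated hazard.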
Estimation Procedures: The EM algorithm is used; see the appendix for details.
Practical Issues
Tuning Parameters
Degree of smoothness for the spline basis functions: how many knots, i.e. what are $p$ and $q$?
FPCA: how many eigenfunctions, i.e. what is $K$?
$$l_c(K, p, q) = \sum_{i=1}^{n} \left[\log f(T_i, \Delta_i \mid X_i^H(T_i), V_i(T_i), \gamma, \zeta, h_0) + \log f(Y_i \mid \tilde{X}_i, \sigma^2, t_i)\right]$$
$$\mathrm{AIC}(K, p, q) = -2 l_c(K, p, q) + 2[p + (K + 1)q + r + s + 1] \tag{6}$$
We start with initial guesses for $p$ and $q$ and choose $K$ by minimizing (6). We then choose $p$ and $q$ in turn based on (6), and repeat until (6) no longer changes.
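The coordinate-wise search described above can be sketched as follows. The function `loglik_fn` stands for a model fit that returns $l_c(K, p, q)$. It is replaced here by a toy quadratic surrogate, because fitting the joint model is outside the scope of this sketch. The grids, starting values, and names are all assumptions.

```python
def n_parameters(K, p, q, r, s):
    """Penalty count in (6): p + (K + 1) q + r + s + 1."""
    return p + (K + 1) * q + r + s + 1


def aic(loglik, K, p, q, r, s):
    """AIC(K, p, q) = -2 l_c + 2 * (number of parameters)."""
    return -2.0 * loglik + 2.0 * n_parameters(K, p, q, r, s)


def select_by_aic(loglik_fn, grids, start, r, s, max_sweeps=50):
    """Minimize AIC over K, then p, then q, and repeat until nothing changes."""
    current = dict(start)

    def score(c):
        return aic(loglik_fn(c["K"], c["p"], c["q"]), c["K"], c["p"], c["q"], r, s)

    best = score(current)
    for _ in range(max_sweeps):
        previous = best
        for name in ("K", "p", "q"):
            for value in grids[name]:
                trial = dict(current, **{name: value})
                trial_score = score(trial)
                if trial_score < best:
                    best, current = trial_score, trial
        if best == previous:  # a full sweep brought no improvement
            break
    return current, best


def toy_loglik(K, p, q):
    """Hypothetical surrogate for the fitted log-likelihood, peaked at (2, 4, 4)."""
    return -10.0 * ((K - 2) ** 2 + (p - 4) ** 2 + (q - 4) ** 2)


grids = {"K": range(1, 5), "p": range(3, 9), "q": range(3, 9)}
choice, value = select_by_aic(toy_loglik, grids, {"K": 1, "p": 3, "q": 3}, r=0, s=1)
```

A coordinate-wise search of this kind finds a point that cannot be improved by changing one tuning parameter at a time, which need not be the global minimizer of (6).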
Simulation Studies
Parameters Set-up
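Item                 Value
$\mu(t)$             $13 \sin(3\pi t/40)$
$\beta$              $0_r$
$\phi_1(t)$          $-\frac{1}{\sqrt{5}} \cos(\pi t/10)$
$\phi_2(t)$          $\frac{1}{\sqrt{5}} \sin(\pi t/10)$
$\lambda_1$          10
$\lambda_2$          1
$\tau$               10
$\gamma$             -1.0
$\zeta$              0
$\varepsilon_{ij}$   $N(0, 0.1)$
$h_0(t)$             $t^2/100$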
Simulation Studies
Sampling
Censoring times $C_i$ were generated independently of all other variables as Weibull random variables, with 20% dropouts at $t = 6$ and 70% dropouts at $t = 9$.
There were 200 normal samples and 200 non-normal samples, each of sample size $n = 200$. For the 200 normal samples, $\xi_{ik} \sim N(0, \lambda_k)$. For the 200 non-normal samples, the $\xi_{ik}$ were generated from a mixture of two normals, $N(\sqrt{\lambda_k/2}, \lambda_k/2)$ with probability 1/2 and $N(-\sqrt{\lambda_k/2}, \lambda_k/2)$ with probability 1/2.
Time sampling: $s_i = c_i + e_i$, with $c_i = i$ for $i = 1, \ldots, 9$ and $e_i \sim N(0, 0.1)$.
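The score and time-point sampling schemes can be sketched as below. The code reads the second argument of $N(\cdot, \cdot)$ as a variance, which is an assumption about the slide's notation. The seed and function names are illustrative. The check at the end confirms that the two-component mixture has mean zero and variance $\lambda_k$, just like the normal case.

```python
import math
import random


def draw_score(rng, lam, mixture=False):
    """Draw one score xi_ik with variance lam, from a normal or a two-normal mixture."""
    if not mixture:
        return rng.gauss(0.0, math.sqrt(lam))
    center = math.sqrt(lam / 2.0)
    sign = 1.0 if rng.random() < 0.5 else -1.0  # each component has probability 1/2
    return rng.gauss(sign * center, math.sqrt(lam / 2.0))


def draw_visit_times(rng, jitter_variance=0.1):
    """s = c + e with c = 1, ..., 9 and e ~ N(0, jitter_variance)."""
    return [c + rng.gauss(0.0, math.sqrt(jitter_variance)) for c in range(1, 10)]


rng = random.Random(7)
lam = 10.0
draws = [draw_score(rng, lam, mixture=True) for _ in range(40000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
visit_times = draw_visit_times(rng)
```

The mixture variance is the within-component variance $\lambda_k/2$ plus the squared offset $\lambda_k/2$, so both sampling schemes share the same first two moments and differ only in shape.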
Simulation Studies
For each normal and mixture sample, γ was estimated in threeways:
Real Data Analysis
Longitudinal CD4 Counts and Survival Data
A total of 467 HIV-infected patients were enrolled in this trial and randomly assigned to receive either ddC or ddI treatment. This real data analysis included 160 patients.
CD4 counts were recorded at study entry and at the 2-, 6-, 12- and 18-month visits. CD4 counts were transformed by a fourth-root power to achieve homogeneity of within-subject variance. The time to death was also recorded.
The sample sizes at the five time points were (79, 62, 62, 58, 11) for the ddC group and (81, 71, 60, 51, 10) for the ddI group.
Real Data Analysis
Longitudinal Model and Cox Model
CD4 curves of the two groups are modelled separately using a B-spline basis, with a common covariance structure. The model is
$$Y_{ij} = \mu_{g_i}(t_{ij}) + \sum_{k=1}^{K} \xi_{ik} \phi_k(t_{ij}) + \varepsilon_{ij} = B_p(t_{ij})^\top (\alpha + g_i \beta) + B_q(t_{ij})^\top \Theta \xi_i + \varepsilon_{ij},$$
where $g_i = 0$ for the ddC group and $g_i = 1$ for the ddI group, and $\beta = (\beta_1, \ldots, \beta_p)^\top$ is the vector of coefficients modelling the difference between the two drug groups. Here $K = 2$ and $p = q = 4$.
The effects of the underlying CD4 process and of the drug group on survival time are modelled with
$$h_i(t) = h_0(t) \exp\{\gamma X_i(t) + \zeta g_i\}, \quad t \in [0, \tau],$$
where $\tau = 21.4$.
Real Data Analysis
CD4 Counts in ddC and ddI group
Real Data Analysis
Smooth estimates of µ0(t), µ1(t), φ1(t), φ2(t)
Real Data Analysis
Conclusions
The estimate of $\gamma$ is $-0.726$, with 95% confidence interval $(-1.176, -0.277)$, which indicates a significant relationship between CD4 and survival.
At a fixed time, a reduction of the CD4 count by 1 on the fourth-root scale increases the risk of death by 107%, with 95% confidence interval (32%, 224%).
The estimate of $\zeta$ is $1.034$, with 95% confidence interval $(0.212, 1.856)$. This implies that if two patients have the same CD4 counts, the risk of death for the patient in the ddI group is about 2.813 times that in the ddC group, with confidence interval (1.236, 6.4).
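The percentages and ratios quoted above follow from exponentiating the Cox coefficients. A short sketch of the arithmetic, using the reported point estimates and the reported interval for $\zeta$ (the function names are illustrative):

```python
import math


def hazard_ratio(coefficient):
    """Multiplicative change in hazard for a one-unit increase in the covariate."""
    return math.exp(coefficient)


def percent_increase_per_unit_drop(coefficient):
    """Percent change in hazard when the covariate decreases by one unit."""
    return 100.0 * (math.exp(-coefficient) - 1.0)


gamma_hat = -0.726
zeta_hat, zeta_ci = 1.034, (0.212, 1.856)

cd4_effect = percent_increase_per_unit_drop(gamma_hat)   # about 107
group_ratio = hazard_ratio(zeta_hat)                     # about 2.81
group_ratio_ci = tuple(hazard_ratio(b) for b in zeta_ci)  # about (1.24, 6.40)
```

Because $\gamma$ is negative, a one-unit drop in fourth-root CD4 multiplies the hazard by $\exp(-\gamma) \approx 2.07$, which is the reported 107% increase.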
Model Discussion
Open Question 1: What’s the difference?
Model Discussion
Open Question 2: Is EM algorithm the best?
My Research
Functional Data vs. High-dimensional Data
My Research
To model a high-frequency multivariate time series process for the $i$th individual subject, a possible alternative model is
$$Y_i(t + 1, m) \mid Y_i(t) \sim p(v_{i,m} + a_{i,m}^\top Y_i(t)),$$
where $Y_i(t + 1, m)$ is the $m$th component of $Y_i(t + 1)$, the $Y_i(t)$ are $M$-variate vectors, and $p(\cdot)$ is an exponential family distribution.
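As one concrete instance of this model, the sketch below assumes a Gaussian family with unit variance and identity link, so that each component of $Y_i(t + 1)$ is normal with mean $v_{i,m} + a_{i,m}^\top Y_i(t)$. The family, the parameter values, and the function name are illustrative assumptions. Other exponential families would need a link function.

```python
import random


def simulate_path(v, A, y0, steps, rng, noise_sd=1.0):
    """Simulate Y(t + 1, m) | Y(t) ~ N(v[m] + A[m] . Y(t), noise_sd^2) for one subject.

    v:  list of M intercepts v_{i,m}
    A:  M x M nested list whose m-th row is a_{i,m}
    y0: starting M-variate vector Y(0)
    """
    path = [list(y0)]
    for _ in range(steps):
        prev = path[-1]
        means = [v[m] + sum(A[m][j] * prev[j] for j in range(len(prev)))
                 for m in range(len(v))]
        path.append([rng.gauss(mean, noise_sd) for mean in means])
    return path


rng = random.Random(33)
v = [0.5, -0.2]                 # hypothetical intercepts, M = 2
A = [[0.6, 0.1], [0.0, 0.4]]    # hypothetical coefficient rows
path = simulate_path(v, A, [1.0, 2.0], steps=100, rng=rng)

# With the noise switched off, one step reproduces the conditional mean exactly.
noiseless = simulate_path(v, A, [1.0, 2.0], steps=1, rng=rng, noise_sd=0.0)
```

Each component has its own intercept and coefficient vector, so the $M$ series are coupled only through the shared lagged vector $Y_i(t)$.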
Thank you!