
Functional Principal Component Analysis for Longitudinal and Survival Data

Author: Fang Yao, University of Toronto

Presenter: Duzhe Wang, UW-Madison

November 29, 2016


Outline

Background
  Functional Principal Component Analysis

Review of the Paper
  Assumptions and Notations
  Models
  Practical Issues
  Simulation Studies
  Real Data Analysis
  Model Discussion

My Research


Principal Component Analysis

Figure 1: 3D data

The line approximates the data well


What if each observation is a curve? Then we have functional data.

Figure 2: Plots of the log(FEV1) curves for the first one hundred subjects. The left panel displays curves obtained via smoothing splines, while the right panel displays curves which are linearly interpolated.


Functional Principal Component Analysis

Let $X$ be a random variable $X : \Omega \to L^2(I)$ such that $X \in L^2(\Omega)$.

Define $\mu = E(X)$ to be the mean process of $X$.

The covariance function of $X$ is defined to be the function $G : I \times I \to \mathbb{R}$ such that $G(s,t) = \mathrm{Cov}\{X(s), X(t)\} = E\{[X(s) - \mu(s)][X(t) - \mu(t)]\}$. We assume that $X$ has a continuous and square-integrable covariance function, that is,

$$\int\!\!\int G(s,t)^2 \, ds \, dt < \infty.$$



Mercer's Lemma: Assume that the covariance function $G$ as defined is continuous over $I^2$. Then there exist an orthonormal sequence $\{\phi_j\}$ of continuous functions in $L^2(I)$ and a non-increasing sequence $\{\lambda_j\}$ of positive numbers such that

$$G(s,t) = \sum_{j=1}^{\infty} \lambda_j \phi_j(s) \phi_j(t), \quad s, t \in I,$$

where the series converges uniformly on $I^2$.
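As a numerical sanity check of the expansion, the following minimal sketch builds a two-term covariance on an assumed interval $I = [0, 10]$ from two trigonometric eigenfunctions with assumed eigenvalues (10, 1). It then verifies orthonormality and the eigen-equation $\int G(s,t)\phi_j(t)\,dt = \lambda_j\phi_j(s)$ by midpoint quadrature. The function names, the grid size and the choice of eigenpairs are illustrative assumptions, not part of the lemma.

```python
import math

TAU = 10.0                 # assumed interval I = [0, TAU]
LAMBDAS = (10.0, 1.0)      # assumed eigenvalues, non-increasing

def phi(j, t):
    """Two assumed orthonormal eigenfunctions on [0, TAU]; j is 0 or 1."""
    scale = math.sqrt(2.0 / TAU)
    if j == 0:
        return -scale * math.cos(math.pi * t / TAU)
    return scale * math.sin(math.pi * t / TAU)

def G(s, t):
    """Covariance function built from the (finite) Mercer expansion."""
    return sum(lam * phi(j, s) * phi(j, t) for j, lam in enumerate(LAMBDAS))

def integrate(f, a=0.0, b=TAU, n=2000):
    """Midpoint rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def inner(j, k):
    """L2 inner product of phi_j and phi_k."""
    return integrate(lambda t: phi(j, t) * phi(k, t))

def apply_G(j, s):
    """Covariance operator applied to phi_j, evaluated at s."""
    return integrate(lambda t: G(s, t) * phi(j, t))
```

For any point `s` in the interval, `apply_G(0, s)` should agree with `LAMBDAS[0] * phi(0, s)` up to quadrature error.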



Karhunen-Loève expansion: Under the assumptions and notation of Mercer's Lemma, we have

$$X(t) = \mu(t) + \sum_{j=1}^{\infty} \xi_j \phi_j(t),$$

where the coefficients $\xi_j = \int_I \{X(t) - \mu(t)\} \phi_j(t) \, dt$ are uncorrelated random variables with mean zero and variances $E\xi_j^2 = \lambda_j$, such that $\sum_j \lambda_j < \infty$.
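The role of the scores can be illustrated with a short sketch: given a curve built from known scores, the projection integral $\int \{X(t)-\mu(t)\}\phi_j(t)\,dt$ recovers them. The mean function, the two eigenfunctions and the interval $[0, 10]$ below are assumptions chosen for illustration only.

```python
import math

TAU = 10.0  # assumed interval [0, TAU]

def mu(t):
    """An assumed smooth mean function."""
    return math.sin(3.0 * math.pi * t / 40.0)

def phi(j, t):
    """Two assumed orthonormal eigenfunctions on [0, TAU]."""
    scale = math.sqrt(2.0 / TAU)
    if j == 0:
        return -scale * math.cos(math.pi * t / TAU)
    return scale * math.sin(math.pi * t / TAU)

def make_curve(scores):
    """Return X(t) = mu(t) + sum_j xi_j * phi_j(t) for the given scores."""
    return lambda t: mu(t) + sum(x * phi(j, t) for j, x in enumerate(scores))

def score(X, j, n=2000):
    """Recover xi_j by midpoint quadrature of (X - mu) * phi_j."""
    h = TAU / n
    nodes = ((i + 0.5) * h for i in range(n))
    return h * sum((X(t) - mu(t)) * phi(j, t) for t in nodes)

X = make_curve([2.5, -0.7])
recovered = [score(X, 0), score(X, 1)]
```

Here `recovered` should match the scores used to build the curve, up to quadrature error.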



The idea of FPCA is to retain the first $k$ terms in the above expansion as an approximation to $X$:

$$X(t) \approx \mu(t) + \sum_{j=1}^{k} \xi_j \phi_j(t).$$
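One common heuristic for judging how many terms to keep is the fraction of variance explained, $\sum_{j \le k} \lambda_j / \sum_j \lambda_j$. The sketch below is only an illustration of that heuristic; the paper reviewed in these slides selects the number of components with an AIC criterion instead, and the function names and the 95% threshold are assumptions.

```python
def fraction_of_variance_explained(eigenvalues, k):
    """Share of total variance captured by the first k eigenvalues."""
    total = sum(eigenvalues)
    return sum(eigenvalues[:k]) / total

def smallest_k(eigenvalues, threshold=0.95):
    """Smallest k whose leading eigenvalues reach the given share."""
    for k in range(1, len(eigenvalues) + 1):
        if fraction_of_variance_explained(eigenvalues, k) >= threshold:
            return k
    return len(eigenvalues)
```

With eigenvalues (10, 1), the first component alone explains 10/11 of the variance, so a 90% threshold keeps one component and a 95% threshold keeps two.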


Assumptions and Notations

For the $i$th individual, $i = 1, \ldots, n$, the survival time $S_i$ is independent of the right-censoring time $C_i$; one observes $T_i = \min(S_i, C_i)$ and the failure indicator $\Delta_i$.

There is also a latent longitudinal process $X_i(t)$ for the $i$th individual with the corresponding mean process $\mu_i(t)$. The observed covariate $Y_i = (Y_{i1}, \ldots, Y_{in_i})^\top$ for the $i$th individual is assumed to be sampled from $X_i(t)$, measured at $t_i = (t_{i1}, \ldots, t_{in_i})^\top$, and terminated at the endpoint $T_i$, subject to measurement error $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{in_i})^\top$.

Therefore, we have $Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$.

We also assume $\max\{T_i : i = 1, \ldots, n\} \le \tau$.
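The observation scheme can be sketched in a few lines. The slide does not spell out $\Delta_i$, so the usual convention $\Delta_i = 1(S_i \le C_i)$ is assumed here. The exponential survival times, Weibull censoring times and their parameters are arbitrary illustrative choices, and censoring is capped at an assumed $\tau = 10$ so that every $T_i \le \tau$.

```python
import random

def observe(survival_time, censoring_time):
    """Return (T, delta): the observed time and the failure indicator.

    delta = 1 means the failure was observed (assumed convention S <= C).
    """
    T = min(survival_time, censoring_time)
    delta = 1 if survival_time <= censoring_time else 0
    return T, delta

def simulate_observations(n, rng, tau=10.0):
    """Draw n independent (T, delta) pairs with illustrative distributions."""
    data = []
    for _ in range(n):
        S = rng.expovariate(0.2)                   # assumed survival law
        C = min(rng.weibullvariate(8.0, 3.0), tau)  # assumed censoring law
        data.append(observe(S, C))
    return data
```

For example, `observe(3.0, 5.0)` gives an observed failure at time 3, while `observe(7.0, 5.0)` gives a censored observation at time 5.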


Assumptions and Notations

Let $Z_i(t) = \{Z_{i1}(t), \ldots, Z_{ir}(t)\}^\top$ be the vector of covariates at time $t \le T_i$ having effects on the longitudinal process, and $V_i(t) = \{V_{i1}(t), \ldots, V_{is}(t)\}^\top$ be the vector of covariates at time $t \le T_i$ having effects on the survival time. Note that $Z_i(t)$ and $V_i(t)$ are possibly time-dependent.

We assume that the "true" values of these covariates for the $i$th subject can be observed exactly at any time $t \le T_i$. In contrast, the longitudinal covariate $Y_i$ is assumed to have measurement error and to be available only at $t_i$.

Denote the $n_i \times r$ design matrix formed by the covariate $Z_i(t)$ at $t_i$ by $Z_i = \{Z_i(t_{i1}), \ldots, Z_i(t_{in_i})\}^\top$.


Models

Linear Model: Let $\mu(t)$ be the overall mean function without considering the vector of covariates $Z_i(t)$. If the effects of $Z_i(t)$ are also taken into account, then $\mu_i(t) = \mu(t) + \beta^\top Z_i(t)$, $t \in [0, \tau]$, where $\beta = (\beta_1, \ldots, \beta_r)^\top$.

Karhunen-Loève Representation: We assume the covariance structure $G(\cdot, \cdot)$ of $X_i(t)$ is the same for all $i$. Then, in terms of orthogonal eigenfunctions $\{\phi_k\}$ and non-increasing eigenvalues $\{\lambda_k\}$,

$$G(s,t) = \sum_k \lambda_k \phi_k(s) \phi_k(t), \quad s, t \in [0, \tau].$$

Therefore we have $X_i(t) = \mu_i(t) + \sum_k \xi_{ik} \phi_k(t)$, where $\xi_{ik} = \int_0^\tau \{X_i(t) - \mu_i(t)\} \phi_k(t) \, dt$ are uncorrelated random variables with mean zero and variances $E\xi_{ik}^2 = \lambda_k$, subject to $\sum_k \lambda_k < \infty$.


Models

FPCA: We assume the $\lambda_k$ tend to zero rapidly, so the covariance function $G$ can be well approximated by the first few terms in the eigen-decomposition. Therefore, we model $X_i(t)$ using the first $K$ leading principal components, $X_i(t) = \mu_i(t) + \sum_{k=1}^{K} \xi_{ik} \phi_k(t)$.

Sub-model for the longitudinal covariate $Y_{ij}$:

$$Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij} = \mu(t_{ij}) + Z_i(t_{ij})^\top \beta + \sum_{k=1}^{K} \xi_{ik} \phi_k(t_{ij}) + \varepsilon_{ij}, \quad t_{ij} \in [0, \tau] \qquad (1)$$


Models

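Spline Approximation of $\mu(t)$: Let $B_p(t) = (B_{p1}(t), \ldots, B_{pp}(t))^\top$ be a set of basis spline functions on $[0, \tau]$ used to model the overall mean function $\mu(t)$, with coefficients $\alpha = (\alpha_1, \ldots, \alpha_p)^\top$, so that $\mu(t) = B_p(t)^\top \alpha$.

Spline Approximation of $\phi_k$: The eigenfunctions are modeled using a set of orthonormal basis functions $B_q(t) = (B_{q1}(t), \ldots, B_{qq}(t))^\top$ with coefficients $\theta_k = (\theta_{1k}, \ldots, \theta_{qk})^\top$, so that $\phi_k(t) = B_q(t)^\top \theta_k$, subject to

$$\int_0^\tau B_{qm}(t) B_{qn}(t) \, dt = \delta_{mn}, \qquad \theta_k^\top \theta_l = \delta_{kl}, \qquad (2)$$

where $m, n = 1, \ldots, q$ and $k, l = 1, \ldots, K$.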
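The two constraints in (2) together guarantee that the modeled eigenfunctions are orthonormal. The sketch below illustrates this numerically. It orthonormalizes a small basis on $[0, \tau]$ by Gram-Schmidt and then combines it with orthonormal coefficient vectors. Note the swap: monomials stand in for the spline basis, and $\tau = 10$, $q = 4$, the grid size and the rotation angle are all assumptions made for brevity.

```python
import math

TAU = 10.0
N = 4000                                   # midpoint-rule nodes (assumed)
H = TAU / N
GRID = [(i + 0.5) * H for i in range(N)]

def dot(u, v):
    """Midpoint-rule approximation of the L2 inner product on [0, TAU]."""
    return H * sum(a * b for a, b in zip(u, v))

def gram_schmidt(functions):
    """Orthonormalize functions (evaluated on GRID) under dot()."""
    basis = []
    for f in functions:
        v = [f(t) for t in GRID]
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        basis.append([vi / norm for vi in v])
    return basis

# Monomials 1, t, t^2, t^3 (rescaled) stand in for a spline basis here.
raw = [lambda t, d=d: (t / TAU) ** d for d in range(4)]
B = gram_schmidt(raw)

# Two orthonormal coefficient vectors theta_1, theta_2 (a plane rotation).
angle = 0.3
theta = [
    [math.cos(angle), math.sin(angle), 0.0, 0.0],
    [-math.sin(angle), math.cos(angle), 0.0, 0.0],
]

# phi_k(t) = B_q(t)^T theta_k, evaluated on the grid.
phi = [[sum(th[m] * B[m][i] for m in range(4)) for i in range(N)] for th in theta]
```

Under these assumptions, `dot(phi[0], phi[1])` is numerically zero and each `phi[k]` has unit norm, which mirrors the identifiability role of (2).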


Models

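An alternative to (1): Let $\xi_i = (\xi_{i1}, \ldots, \xi_{iK})^\top$ and let $\Theta = (\theta_1, \ldots, \theta_K)$ be the $q \times K$ matrix of eigenfunction coefficients. Then (1) becomes

$$Y_{ij} = B_p(t_{ij})^\top \alpha + Z_i(t_{ij})^\top \beta + B_q(t_{ij})^\top \Theta \xi_i + \varepsilon_{ij}. \qquad (3)$$

Let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$. We also assume $\xi_i \sim N(0_K, \Lambda)$, $\varepsilon_{ij} \sim N(0, \sigma^2)$ and $\xi_i \perp \varepsilon_{ij}$. Then the covariance between observed values of the longitudinal process is

$$\mathrm{cov}(Y_{ij}, Y_{il}) = B_q(t_{ij})^\top \Theta \Lambda \Theta^\top B_q(t_{il}) + \sigma^2 \delta_{jl}. \qquad (4)$$

A concise matrix form: Let $B_i = (B_p(t_{i1}), \ldots, B_p(t_{in_i}))^\top$ and $\tilde{B}_i = (B_q(t_{i1}), \ldots, B_q(t_{in_i}))^\top$. Then model (3) can be written as $Y_i = B_i \alpha + Z_i \beta + \tilde{B}_i \Theta \xi_i + \varepsilon_i$, $i = 1, \ldots, n$.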


Models

Cox Model: We model the hazard of failure by

$$h_i(t) = h_0(t) \exp\{\gamma X_i(t) + V_i(t)^\top \zeta\}, \qquad (5)$$

where $\gamma$ and $\zeta = (\zeta_1, \ldots, \zeta_s)^\top$ are coefficients.

Observed data and parameters of interest: The observed data for each individual are denoted by

$$O_i = (T_i, \Delta_i, Y_i, Z_i, V_i, t_i),$$

and the full set of parameters of interest is

$$\Omega = (\gamma, \zeta, h_0(\cdot), \alpha, \beta, \Theta, \Lambda, \sigma^2).$$


Models

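Joint Modelling: The observed-data likelihood for the full set of parameters is given by

$$L_o = \prod_{i=1}^{n} \int f(T_i, \Delta_i \mid X_i^H(T_i), V_i(T_i), \gamma, \zeta, h_0) \, f(Y_i \mid X_i, \sigma^2, t_i) \, f(\xi_i \mid \Lambda) \, d\xi_i,$$

where $X_i^H(T_i) = \{X_i(t) : 0 \le t \le T_i\}$, $X_i = \{X_i(t_{i1}), \ldots, X_i(t_{in_i})\}^\top$, and

$$f(T_i, \Delta_i \mid X_i^H(T_i), V_i(T_i), \gamma, \zeta, h_0) = \left[ h_0(T_i) \exp\{\gamma X_i(T_i) + V_i(T_i)^\top \zeta\} \right]^{\Delta_i} \times \exp\left[ -\int_0^{T_i} h_0(u) \exp\{\gamma X_i(u) + V_i(u)^\top \zeta\} \, du \right].$$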
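The survival factor of the likelihood can be evaluated numerically once a baseline hazard and a trajectory are given. The sketch below computes its logarithm, $\Delta_i \log h_i(T_i) - \int_0^{T_i} h_i(u)\,du$, with midpoint quadrature. The baseline hazard $t^2/100$ is the one listed in the simulation set-up of these slides. The function names and the representation of $X_i(\cdot)$ and $V_i(\cdot)^\top\zeta$ as Python callables are assumptions.

```python
import math

def h0(t):
    """Baseline hazard; t^2/100 as in the simulation set-up."""
    return t ** 2 / 100.0

def log_survival_density(T, delta, X, v_effect, gamma, n=2000):
    """log f(T, delta | history) for the Cox sub-model.

    X(u)        : longitudinal trajectory (callable, assumed known here)
    v_effect(u) : the linear predictor V(u)^T zeta (callable)
    """
    def hazard(u):
        return h0(u) * math.exp(gamma * X(u) + v_effect(u))

    step = T / n
    cumulative = step * sum(hazard((i + 0.5) * step) for i in range(n))
    log_hazard_term = math.log(hazard(T)) if delta else 0.0
    return log_hazard_term - cumulative
```

With a trajectory and covariate effect that are identically zero, the cumulative hazard reduces to $T^3/300$, which gives a simple check of the quadrature.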


Models

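Joint Modelling (continued)

$$f(Y_i \mid X_i, \sigma^2, t_i) = (2\pi\sigma^2)^{-n_i/2} \exp\left\{ -\frac{1}{2\sigma^2} (Y_i - X_i)^\top (Y_i - X_i) \right\},$$

and

$$f(\xi_i \mid \Lambda) = (2\pi)^{-K/2} |\Lambda|^{-1/2} \exp\left( -\frac{1}{2} \xi_i^\top \Lambda^{-1} \xi_i \right),$$

where $X_i = B_i \alpha + Z_i \beta + \tilde{B}_i \Theta \xi_i$, with $B_i$ and $\tilde{B}_i$ the matrices of $B_p$ and $B_q$ evaluated at the observation times $t_i$.

Estimation Procedures: The EM algorithm; see the appendix for details.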


Practical Issues

Tuning Parameters

Degree of smoothness for the spline basis functions: How many knots, i.e. what are $p$ and $q$?

FPCA: How many eigenfunctions, i.e. what is $K$?

$$l_c(K, p, q) = \sum_{i=1}^{n} \left[ \log f(T_i, \Delta_i \mid X_i^H(T_i), V_i(T_i), \gamma, \zeta, h_0) + \log f(Y_i \mid X_i, \sigma^2, t_i) \right]$$

$$\mathrm{AIC}(K, p, q) = -2 l_c(K, p, q) + 2[p + (K + 1)q + r + s + 1] \qquad (6)$$

We start with initial guesses for $p$ and $q$, choose $K$ by minimizing (6), then choose $p$ and $q$ in turn based on (6), and repeat until there is no further change in (6).
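Criterion (6) and one step of the search can be sketched as follows. The callable `fit_loglik` is a hypothetical stand-in for a routine that fits the joint model and returns $l_c(K, p, q)$. It is not an existing function, and the slides do not specify the candidate grid.

```python
def aic(lc, K, p, q, r, s):
    """AIC(K, p, q) = -2 * lc + 2 * [p + (K + 1) * q + r + s + 1]."""
    n_params = p + (K + 1) * q + r + s + 1
    return -2.0 * lc + 2.0 * n_params

def choose_K(fit_loglik, p, q, r, s, K_grid):
    """Pick K minimizing AIC for fixed p and q.

    fit_loglik(K, p, q) is a hypothetical callable returning l_c.
    """
    return min(K_grid, key=lambda K: aic(fit_loglik(K, p, q), K, p, q, r, s))
```

The same pattern applies to $p$ and $q$ in turn, cycling until the selected triple stops changing. For instance, `aic(-100.0, 2, 4, 4, 1, 1)` counts 19 parameters and returns 238.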


Simulation Studies

Parameters Set-up

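Item                  Value
$\mu(t)$              $13\sin(3\pi t/40)$
$\beta$               $0_r$
$\phi_1(t)$           $-\frac{1}{\sqrt{5}}\cos(\pi t/10)$
$\phi_2(t)$           $\frac{1}{\sqrt{5}}\sin(\pi t/10)$
$\lambda_1$           10
$\lambda_2$           1
$\tau$                10
$\gamma$              -1.0
$\zeta$               0
$\varepsilon_{ij}$    $N(0, 0.1)$
$h_0(t)$              $t^2/100$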


Simulation Studies

Sampling

Censoring times $C_i$ were generated independently of all other variables as Weibull random variables, with 20% dropouts at $t = 6$ and 70% dropouts at $t = 9$.

200 normal samples and 200 non-normal samples were generated, each of size $n = 200$. For the 200 normal samples, $\xi_{ik} \sim N(0, \lambda_k)$. For the other 200 non-normal samples, $\xi_{ik}$ were generated from a mixture of two normals, $N(\sqrt{\lambda_k/2}, \lambda_k/2)$ with probability 1/2 and $N(-\sqrt{\lambda_k/2}, \lambda_k/2)$ with probability 1/2.

Time sampling: $s_i = c_i + e_i$, with $c_i = i$ for $i = 1, \ldots, 9$ and $e_i \sim N(0, 0.1)$.
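The mixture is constructed so that the scores keep mean zero and variance $\lambda_k$: the within-component variance $\lambda_k/2$ plus the between-component variance $(\sqrt{\lambda_k/2})^2$ sums to $\lambda_k$. A minimal sampler, with an assumed function name and the second parameter of each normal read as a variance, might look like this:

```python
import math
import random

def sample_mixture_score(lam, rng):
    """Draw one score from the two-component normal mixture.

    Components: N(+sqrt(lam/2), lam/2) and N(-sqrt(lam/2), lam/2),
    each with probability 1/2 (second argument read as a variance).
    """
    sign = 1.0 if rng.random() < 0.5 else -1.0
    mean = sign * math.sqrt(lam / 2.0)
    sd = math.sqrt(lam / 2.0)
    return rng.gauss(mean, sd)

def sample_moments(lam, n, seed=0):
    """Empirical mean and variance of n mixture draws."""
    rng = random.Random(seed)
    draws = [sample_mixture_score(lam, rng) for _ in range(n)]
    m = sum(draws) / n
    v = sum((d - m) ** 2 for d in draws) / n
    return m, v
```

For a large number of draws with `lam = 10`, the empirical mean should be near 0 and the empirical variance near 10, even though the distribution is bimodal rather than normal.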


Simulation Studies

For each normal and mixture sample, γ was estimated in three ways:


Real Data Analysis

Longitudinal CD4 Counts and Survival Data

A total of 467 HIV-infected patients were enrolled in this trial and randomly assigned to receive either ddC or ddI treatment. This real data analysis included 160 patients.

CD4 counts were recorded at study entry and at the 2-, 6-, 12- and 18-month visits. CD4 counts were transformed by a fourth-root power to achieve homogeneity of within-subject variance. The time to death was also recorded.

The sample sizes at the five time points were (79, 62, 62, 58, 11) for the ddC group and (81, 71, 60, 51, 10) for the ddI group.


Real Data Analysis

Longitudinal Model and Cox Model

CD4 curves of the two groups are modelled separately using a B-spline basis and are assumed to have a common covariance structure. The model is

$$Y_{ij} = \mu_{g_i}(t_{ij}) + \sum_{k=1}^{K} \xi_{ik} \phi_k(t_{ij}) + \varepsilon_{ij} = B_p(t_{ij})^\top (\alpha + g_i \beta) + B_q(t_{ij})^\top \Theta \xi_i + \varepsilon_{ij},$$

where $g_i = 0$ for the ddC group and $g_i = 1$ for the ddI group, and $\beta = (\beta_1, \ldots, \beta_p)^\top$ is the vector of coefficients modelling the difference between the two drug groups. Here $K = 2$ and $p = q = 4$.

The effects of the underlying CD4 processes and drug groups on survival time are modelled by

$$h_i(t) = h_0(t) \exp\{\gamma X_i(t) + \zeta g_i\}, \quad t \in [0, \tau],$$

where $\tau = 21.4$.

Real Data Analysis

CD4 Counts in ddC and ddI group


Real Data Analysis

Smooth estimates of µ0(t), µ1(t), φ1(t), φ2(t)


Real Data Analysis

Conclusions

γ = −0.726 with 95% confidence interval (−1.176, −0.227), which indicates a significant relationship between CD4 and survival.

At a fixed time, a reduction of the CD4 count by 1 on the fourth-root scale results in the risk of death increasing by 107%, with a 95% confidence interval of (32%, 224%).

ζ = 1.034 with 95% confidence interval (0.212, 1.856). This implies that if two patients have the same CD4 counts, the risk of death for the patient in the ddI group is about 2.813 times that in the ddC group, with confidence interval (1.236, 6.4).
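These interpretations follow from exponentiating the Cox coefficients: a change of $d$ units in a covariate multiplies the hazard by $\exp(\text{coefficient} \times d)$. The short sketch below reproduces the arithmetic from the reported estimates; the helper names are illustrative.

```python
import math

def hazard_ratio(coef, change=1.0):
    """Multiplicative change in hazard for a given change in the covariate."""
    return math.exp(coef * change)

def percent_increase(coef, change=1.0):
    """Hazard change expressed as a percentage increase."""
    return 100.0 * (hazard_ratio(coef, change) - 1.0)

# A one-unit *reduction* in fourth-root CD4 with gamma = -0.726.
cd4_increase = percent_increase(-0.726, change=-1.0)

# ddI versus ddC with zeta = 1.034 and its interval endpoints.
group_ratio = hazard_ratio(1.034)
group_interval = (hazard_ratio(0.212), hazard_ratio(1.856))
```

Here `cd4_increase` comes out near 107 and `group_ratio` near 2.81, in line with the slide. One caveat when reading the interval for γ: the endpoint −1.176 maps to about 224%, but −0.227 maps to roughly 25% rather than 32% (which would correspond to about −0.277), so one of those two printed figures may contain a transcription slip.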


Model Discussion

Open Question 1: What’s the difference?


Model Discussion

Open Question 2: Is the EM algorithm the best?


My Research

Functional Data vs. High-dimensional Data


My Research

To model a high-frequency multivariate time series process for the $i$th individual subject, a possible alternative model is

$$Y_i(t+1, m) \mid Y_i(t) \sim p(v_{i,m} + a_{i,m}^\top Y_i(t)),$$

where $Y_i(t+1, m)$ is the $m$th observation of $Y_i(t+1)$, the $Y_i(t)$ are $M$-variate vectors, and $p(\cdot)$ is an exponential family distribution.


Thank you!
