Estimation, testing and residual analysis in the GMANOVA-MANOVA model

Martin Singull

Department of Mathematics, Linköping University, Sweden

Department of Mathematics, Makerere University, March 13, 2019

Martin Singull, Linköping University (LiU), Sweden

Academic Experience

• PhD in Mathematical Statistics (LiU)
• Associate Professor in Mathematical Statistics
• Head of Division, Mathematical Statistics (LiU)

Capacity Building in Math and Stat

• Deputy Team Leader, UR-Sweden Programme / Applied Math and Stat Sub-Programme (Rw)
• Active in the bilateral programs in Tz (UDSM), Ug (MAK) and Moz (EMU)
• External Evaluator of the EAUMP (2016)

Supervision of PhDs

• Supervised 5 PhDs (2 Rw, 1 Swe, 2 Tz)
• Ongoing PhD students
  • Main supervisor: 3 (2 Rw, 1 Tz)
  • Co-supervisor: 6 (1 Rw, 3 Swe, 1 Ug, 1 Za)

Martin Singull 1/48

Joint work

PhD student Beatrice Byukusenge

Professor Dietrich von Rosen

Content

• Multivariate Linear Model (MANOVA)
• Growth Curve Model
• Extended Growth Curve Model
• Hypothesis Testing MANOVA Model
• GMANOVA-MANOVA Model
  • Estimation
  • Fitted Values
  • Residuals
  • Interpretation of Residuals
  • Properties of Residuals
  • Simulation Study
  • Example

Multivariate Linear Model

Let $X = BC + E$, where $B : p \times k$ is an unknown parameter matrix, $C : k \times n$ is a known design matrix such that $r(C) + p \le n$, and

\[ E \sim N_{p,n}(0, \Sigma, I). \]

The likelihood function is given by

\[
L(B, \Sigma) = (2\pi)^{-\frac{np}{2}} |\Sigma|^{-\frac{n}{2}} e^{-\frac{1}{2}\operatorname{tr}\{\Sigma^{-1}(X - M)(X - M)'\}}
\propto |\Sigma|^{-\frac{n}{2}} e^{-\frac{1}{2}\operatorname{tr}\{\Sigma^{-1}(X - BC)(X - BC)'\}},
\]

where $M = BC$ is the linear mean structure.

The MLEs for the multivariate linear model are given by

\[
\widehat{B} = XC'(CC')^{-1}, \qquad
n\widehat{\Sigma} = X(I - P_{C'})X' = RR' = V,
\]

where $P_{C'} = C'(CC')^{-1}C$ is the projection on the space $\mathcal{C}(C')$, and the estimated mean structure and residuals are

\[
\widehat{B}C = XP_{C'} \quad \text{and} \quad R = X(I - P_{C'}) = XQ_{C'},
\]

where $Q_{C'} = I - P_{C'}$ denotes the orthogonal projector on $\mathcal{C}(C')^{\perp}$.

The whole space can now be decomposed as the orthogonal sum $\mathcal{C}(C') \boxplus \mathcal{C}(C')^{\perp}$:

  Subspace      $\mathcal{C}(C')$    $\mathcal{C}(C')^{\perp}$
  Component     $\widehat{B}C$       $R$
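The estimators and the projection decomposition above can be checked numerically. The following is a minimal sketch on simulated data; the dimensions, the design matrix `C` and the seed are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: p responses, k design rows, n subjects.
p, k, n = 3, 2, 12
C = np.vstack([np.ones(n), np.arange(n, dtype=float)])  # k x n known design (assumed)
B_true = rng.normal(size=(p, k))                        # p x k parameter matrix (assumed)
X = B_true @ C + rng.normal(size=(p, n))                # X = BC + E with Sigma = I

# Projector on C(C') and the projector on its orthogonal complement.
P_C = C.T @ np.linalg.solve(C @ C.T, C)
Q_C = np.eye(n) - P_C

B_hat = X @ C.T @ np.linalg.inv(C @ C.T)  # MLE of B
R = X @ Q_C                               # residuals R = X Q_C'
V = R @ R.T                               # V = RR' = n * Sigma_hat
Sigma_hat = V / n

print(np.allclose(B_hat @ C, X @ P_C))    # fitted mean equals X P_C'
print(np.allclose(X, B_hat @ C + R))      # X splits into fit plus residual
```

Because `P_C` and `Q_C` are complementary orthogonal projectors, the fitted mean and the residuals live in orthogonal subspaces, which is what the decomposition expresses.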


Example - Linear Regression

If $p = 1$ we have an ordinary linear regression, where one wants to explain the variation of a response variable $y$ by the variation of predictors $x_1, \ldots, x_k$ through a "theoretical" linear function

\[ y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k. \]

Model: $y_j = \mu_j + \varepsilon_j = \beta_0 + \beta_1 x_{j1} + \ldots + \beta_k x_{jk} + \varepsilon_j$, for $j = 1, \ldots, n$, where

\[ \mu_j = \beta_0 + \beta_1 x_{j1} + \ldots + \beta_k x_{jk}. \]

The predictors $x_{j1}, \ldots, x_{jk}$ are known, while the parameters $\beta_0, \ldots, \beta_k$ are unknown. Further,

\[ \varepsilon_1, \ldots, \varepsilon_n \text{ are independent and } N(0, \sigma^2), \]

with $\sigma^2$ unknown.


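Hence, the random variables $y_1, \ldots, y_n$ are independently distributed as

\[
y_j \sim N(\mu_j, \sigma^2) = N(\beta_0 + \beta_1 x_{j1} + \ldots + \beta_k x_{jk}, \sigma^2),
\]

which can be written in matrix form

\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
\beta_0 + \beta_1 x_{11} + \ldots + \beta_k x_{1k} \\
\beta_0 + \beta_1 x_{21} + \ldots + \beta_k x_{2k} \\
\vdots \\
\beta_0 + \beta_1 x_{n1} + \ldots + \beta_k x_{nk}
\end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\]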


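or

\[
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{=y}
=
\underbrace{\begin{pmatrix}
1 & x_{11} & \ldots & x_{1k} \\
1 & x_{21} & \ldots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \ldots & x_{nk}
\end{pmatrix}}_{=X}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{=\beta}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{=\varepsilon},
\]

or, in short,

\[ y \sim N_n(X\beta, \sigma^2 I_n), \]

i.e., a linear model.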

When $X$ is of full column rank, i.e., $r(X) = k + 1$, the estimators for $\beta$ and $\sigma^2$ are given by

\[
\widehat{\beta} = (X'X)^{-1}X'y, \qquad
\widehat{\sigma}^2 = s^2 = \frac{SSE}{n - k - 1},
\]

where

\[
SSE = RR' = (y - X\widehat{\beta})'(y - X\widehat{\beta}) = y'(I - X(X'X)^{-}X')y = V,
\]

and the residuals are given by

\[ R = (y - X\widehat{\beta})'. \]
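As a small illustration of these formulas, the sketch below fits a regression to simulated data and confirms that the two expressions for SSE agree. The sample size, coefficients and noise level are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 30, 2                                                 # hypothetical sample size and number of predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1), full column rank
beta_true = np.array([1.0, 2.0, -0.5])                       # assumed coefficients
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
resid = y - X @ beta_hat                       # residual vector
SSE = resid @ resid                            # (y - X beta_hat)'(y - X beta_hat)
s2 = SSE / (n - k - 1)                         # estimator of sigma^2

# Same SSE through the projector I - X (X'X)^- X'.
H = X @ np.linalg.pinv(X.T @ X) @ X.T
SSE_proj = y @ (np.eye(n) - H) @ y
print(np.isclose(SSE, SSE_proj))
```

The pseudoinverse is used for the generalized inverse; with full column rank it coincides with the ordinary inverse.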

We would like to test the bilinear hypothesis

\[ H_0 : FBG = 0 \quad \text{vs.} \quad H_1 : B \text{ unrestricted}. \]

Assume that the known matrices $F : r \times p$ and $G : k \times m$ are of full rank with $m \le k$ and $r \le p$.

The likelihood ratio test is based on the statistic

\[
\Lambda = \frac{|FVF'|}{|FVF' + F\widehat{B}G[G'(CC')^{-1}G]^{-1}G'\widehat{B}'F'|},
\]

where the asymptotic distribution of $\Lambda$ is given by

\[
P\Big(-\big(n - k - \tfrac{1}{2}(r - m + 1)\big)\ln\Lambda \ge z\Big)
\approx P(\chi^2_f \ge z) + \frac{\gamma_2}{\nu^2}\Big(P(\chi^2_{f+4} \ge z) - P(\chi^2_f \ge z)\Big),
\]

where $f = rm$, $\gamma_2 = \dfrac{rm(r^2 + m^2 - 5)}{48}$ and $\nu = n - \dfrac{r - m + 1}{2}$.
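The statistic $\Lambda$ is straightforward to compute once $\widehat{B}$ and $V$ are available. The sketch below does so on simulated data; the dimensions, the design and the particular choices of `F` and `G` (which select rows and columns of $B$) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

p, k, n = 3, 3, 40                                      # hypothetical dimensions
C = np.vstack([np.ones(n), rng.normal(size=(k - 1, n))])  # k x n design (assumed)
B_true = np.zeros((p, k))
B_true[:, 0] = 1.0                                      # only the first column of B is non-zero
X = B_true @ C + rng.normal(size=(p, n))

CCt_inv = np.linalg.inv(C @ C.T)
B_hat = X @ C.T @ CCt_inv                               # MLE of B
V = X @ (np.eye(n) - C.T @ CCt_inv @ C) @ X.T           # V = X (I - P_C') X'

F = np.eye(p)[:2]       # r x p with r = 2: first two rows of B
G = np.eye(k)[:, 1:]    # k x m with m = 2: last two columns of B

FBG = F @ B_hat @ G
H = FBG @ np.linalg.inv(G.T @ CCt_inv @ G) @ FBG.T      # hypothesis sum of squares
FVF = F @ V @ F.T
Lambda = np.linalg.det(FVF) / np.linalg.det(FVF + H)
print(Lambda)
```

Since `H` is positive semidefinite and `FVF` is positive definite, the ratio lies in (0, 1], with small values speaking against $H_0$.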


However, the linear system $FBG = 0$ can be solved for $B$ as

\[ B = (F')^{\circ}\Theta_1 G' + \Theta_2 G^{\circ\prime}, \]

where $\Theta_1$ and $\Theta_2$ are new parameters.

Then the model becomes

\[
X = \underbrace{(F')^{\circ}}_{=A}\,\underbrace{\Theta_1}_{=B_1}\,\underbrace{G'C}_{=C_1}
+ \underbrace{\Theta_2}_{=B_2}\,\underbrace{G^{\circ\prime}C}_{=C_2} + E,
\]

which is a special case of the so-called Extended Growth Curve Model (EGCM), also called the GMANOVA-MANOVA model.

Furthermore, one can show that the likelihood ratio statistic can be expressed in terms of the residuals, so it is of interest to study the residuals for the GMANOVA-MANOVA model.
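To see why this parametrization satisfies the hypothesis, the sketch below builds $B$ from arbitrary $\Theta_1$, $\Theta_2$ and checks that $FBG = 0$. The helper `ortho_complement` is a hypothetical name; it returns one possible choice of $A^{\circ}$ (an orthonormal basis of $\mathcal{C}(A)^{\perp}$ obtained from the SVD), and all dimensions are assumptions.

```python
import numpy as np

def ortho_complement(A, tol=1e-10):
    """Return a matrix whose columns form an orthonormal basis of C(A)^perp."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    rank = int((s > tol).sum())
    return U[:, rank:]

rng = np.random.default_rng(3)
p, k, r, m = 4, 3, 2, 1            # hypothetical dimensions with r <= p, m <= k
F = rng.normal(size=(r, p))        # full rank with probability one
G = rng.normal(size=(k, m))

Ft_o = ortho_complement(F.T)       # (F')°, p x (p - r), so F (F')° = 0
G_o = ortho_complement(G)          # G°,    k x (k - m), so G°' G = 0

Theta1 = rng.normal(size=(p - r, m))
Theta2 = rng.normal(size=(p, k - m))
B = Ft_o @ Theta1 @ G.T + Theta2 @ G_o.T

print(np.allclose(F @ B @ G, 0))
```

The first term vanishes under $F$ from the left, the second under $G$ from the right, so any $B$ of this form satisfies the bilinear restriction.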


Example - Potthoff & Roy (1964)

Dental measurements on eleven girls and sixteen boys at four different ages (8, 10, 12, 14) were taken. Each measurement is the distance, in millimeters, from the center of the pituitary (hypophysis) to the pterygo-maxillary fissure.

The data matrix $X = (x_1, \ldots, x_{27})$ is $4 \times 27$. Each row below is one row of $X$ (one age); columns 1-11 are the girls and columns 12-27 are the boys.

  Age 8:   21 21 20.5 23.5 21.5 20 21.5 23 20 16.5 24.5 | 26 21.5 23 20 25.5 24.5 22 24 23 27.5 23 21.5 17 22.5 23 22
  Age 10:  20 21.5 24 24.5 23 21 22.5 23 21 19 25 | 25 22.5 22.5 23.5 27.5 25.5 22 21.5 20.5 28 23 23.5 24.5 25.5 24.5 21.5
  Age 12:  21.5 24 24.5 25 22.5 21 23 23.5 22 19 28 | 29 23 24 22.5 26.5 27 24.5 24.5 31 31 23.5 24 26 25.5 26 23.5
  Age 14:  23 25.5 26 26.5 23.5 22.5 25 24 21.5 19.5 28 | 31 26.5 27.5 26 27 28.5 26.5 25.5 26 31.5 25 28 29.5 26 30 25


Growth Curve Model (Potthoff and Roy, 1964)

Definition. Let $X : p \times N$ and $B : q \times m$ be the observation and parameter matrices, respectively, and let $A : p \times q$ and $C : m \times N$ be the within- and between-individual design matrices, respectively. Suppose that $q \le p$ and $p \le N - r(C) = n$.

The Growth Curve model (GCM) is defined by

\[ X = ABC + E, \]

where $E \sim N_{p,N}(0, \Sigma, I_N)$.

For more about the GCM, see Kollo and von Rosen (2005).

Example - Potthoff & Roy (1964), cont.

Assume that we want to model linear growth, i.e.,

\[
\mu_i = \begin{pmatrix}
b_{0i} + 8b_{1i} \\
b_{0i} + 10b_{1i} \\
b_{0i} + 12b_{1i} \\
b_{0i} + 14b_{1i}
\end{pmatrix}, \quad \text{for } i = 1, 2.
\]

For this we may use the parameter and design matrices

\[
B = \begin{pmatrix} b_{01} & b_{02} \\ b_{11} & b_{12} \end{pmatrix}, \quad
A = \begin{pmatrix} 1 & 8 \\ 1 & 10 \\ 1 & 12 \\ 1 & 14 \end{pmatrix}
\quad \text{and} \quad
C = \begin{pmatrix} 1_{11}' & 0_{16}' \\ 0_{11}' & 1_{16}' \end{pmatrix}.
\]


Growth Curve Model: MLEs

If $A$ and $C$ have full rank, the MLEs for the GCM are given by

\[
\widehat{B} = (A'V^{-1}A)^{-1}A'V^{-1}XC'(CC')^{-1}, \quad \text{i.e.,} \quad
A\widehat{B}C = P_A^V XP_{C'},
\]

\[
N\widehat{\Sigma} = (X - A\widehat{B}C)(X - A\widehat{B}C)' = \underbrace{RR'}_{=V} + R_1R_1',
\]

where

\[
\begin{aligned}
R_1 &= (I_p - P_A^V)XP_{C'}, \\
R &= X(I_N - P_{C'}), \\
V &= RR' = X(I_N - P_{C'})X', \\
P_{C'} &= C'(CC')^{-1}C = \text{projection on } \mathcal{C}(C'), \\
P_A^V &= A(A'V^{-1}A)^{-1}A'V^{-1} = \text{projection on } \mathcal{C}_V(A).
\end{aligned}
\]


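\[
\mathcal{C}_V(A) \otimes \mathcal{C}(C') \boxplus \big(\mathcal{C}_V(A) \otimes \mathcal{C}(C')\big)^{\perp}
= \big(\mathcal{C}_V(A) \otimes \mathcal{C}(C')\big) \boxplus \mathcal{C}_V(A)^{\perp} \otimes \mathcal{C}(C') \boxplus \mathcal{V} \otimes \mathcal{C}(C')^{\perp}
\]

                               $\mathcal{C}(C')$                        $\mathcal{C}(C')^{\perp}$
  $\mathcal{C}_V(A)^{\perp}$   $R_1 = (I_p - P_A^V)XP_{C'}$             $R = X(I_N - P_{C'})$
  $\mathcal{C}_V(A)$           $A\widehat{B}C$                          $R = X(I_N - P_{C'})$

\[
A\widehat{B}C = P_A^V XP_{C'}, \qquad
N\widehat{\Sigma} = RR' + R_1R_1'.
\]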

Example - Potthoff & Roy (1964), cont.

The MLEs for the example are given by

\[
\widehat{B} = \begin{pmatrix} 17.4254 & 15.8423 \\ 0.4764 & 0.8268 \end{pmatrix}
\]

and

\[
\widehat{\Sigma} = \begin{pmatrix}
5.1192 & 2.4409 & 3.6105 & 2.5222 \\
2.4409 & 3.9279 & 2.7175 & 3.0623 \\
3.6105 & 2.7175 & 5.9798 & 3.8235 \\
2.5222 & 3.0623 & 3.8235 & 4.6180
\end{pmatrix}.
\]
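The GCM estimators can be evaluated directly on the dental data from the data slide. The sketch below assumes that the rows of $X$ are the ages 8, 10, 12, 14 and that the first 11 columns are the girls, which is how the design matrix $C$ is laid out; the printed estimates can then be compared with the values reported for the example.

```python
import numpy as np

# Dental data as listed on the data slide: rows = ages 8, 10, 12, 14;
# columns 1-11 = girls, columns 12-27 = boys (assumed layout, matching C).
X = np.array([
    [21, 21, 20.5, 23.5, 21.5, 20, 21.5, 23, 20, 16.5, 24.5,
     26, 21.5, 23, 20, 25.5, 24.5, 22, 24, 23, 27.5, 23, 21.5, 17, 22.5, 23, 22],
    [20, 21.5, 24, 24.5, 23, 21, 22.5, 23, 21, 19, 25,
     25, 22.5, 22.5, 23.5, 27.5, 25.5, 22, 21.5, 20.5, 28, 23, 23.5, 24.5, 25.5, 24.5, 21.5],
    [21.5, 24, 24.5, 25, 22.5, 21, 23, 23.5, 22, 19, 28,
     29, 23, 24, 22.5, 26.5, 27, 24.5, 24.5, 31, 31, 23.5, 24, 26, 25.5, 26, 23.5],
    [23, 25.5, 26, 26.5, 23.5, 22.5, 25, 24, 21.5, 19.5, 28,
     31, 26.5, 27.5, 26, 27, 28.5, 26.5, 25.5, 26, 31.5, 25, 28, 29.5, 26, 30, 25],
])
p, N = X.shape

A = np.column_stack([np.ones(p), [8.0, 10.0, 12.0, 14.0]])  # within-individual design
C = np.zeros((2, N))
C[0, :11] = 1.0                                             # girls
C[1, 11:] = 1.0                                             # boys

P_C = C.T @ np.linalg.solve(C @ C.T, C)        # projection on C(C')
R = X @ (np.eye(N) - P_C)
V = R @ R.T
V_inv = np.linalg.inv(V)
AVA_inv = np.linalg.inv(A.T @ V_inv @ A)
P_AV = A @ AVA_inv @ A.T @ V_inv               # projection on C_V(A)

B_hat = AVA_inv @ A.T @ V_inv @ X @ C.T @ np.linalg.inv(C @ C.T)
R1 = (np.eye(p) - P_AV) @ X @ P_C
Sigma_hat = (V + R1 @ R1.T) / N

print(np.round(B_hat, 4))
print(np.round(Sigma_hat, 4))
```

The first column of `B_hat` holds the intercept and slope for the girls and the second column those for the boys.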


Extended Growth Curve Model

Definition. Let $X : p \times n$, $A_i : p \times q_i$, $q_i \le p$, $B_i : q_i \times k_i$, $C_i : k_i \times n$, $r(C_1) + p \le n$, $i = 1, \ldots, m$, $\mathcal{C}(C_i') \subseteq \mathcal{C}(C_{i-1}')$, $i = 2, \ldots, m$, and $\Sigma : p \times p$ positive definite.

Then the Extended Growth Curve model (EGCM) is defined by

\[ X = \sum_{i=1}^{m} A_iB_iC_i + E, \]

where $E \sim N_{p,n}(0, \Sigma, I_n)$.

The subspace condition $\mathcal{C}(C_i') \subseteq \mathcal{C}(C_{i-1}')$ may be replaced by

\[ \mathcal{C}(A_i) \subseteq \mathcal{C}(A_{i-1}). \]

See Filipiak & von Rosen (2012) for more details.

The EGCM is also called the sum of profiles model.


Example - Potthoff & Roy (1964), cont.

It seems reasonable to assume that both girls and boys have a linear growth component, but that for the boys there is additionally a second-order polynomial structure.

Then we may use the EGCM with two terms,

\[ E(X) = A_1B_1C_1 + A_2B_2C_2, \]

where $A_1$, $B_1$ and $C_1$ are as before and

\[
A_2 = \begin{pmatrix} 8^2 & 10^2 & 12^2 & 14^2 \end{pmatrix}', \quad
C_2 = \begin{pmatrix} 0_{11}' & 1_{16}' \end{pmatrix}
\quad \text{and} \quad B_2 = b_{22}.
\]
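The two-term mean structure and the nested subspace condition $\mathcal{C}(C_2') \subseteq \mathcal{C}(C_1')$ can be made concrete as follows. This is only a sketch: the parameter values `B1` and `b22` are made up for illustration, and `col_space_contained` is a hypothetical helper based on a rank comparison.

```python
import numpy as np

ages = np.array([8.0, 10.0, 12.0, 14.0])
A1 = np.column_stack([np.ones(4), ages])   # linear profile
A2 = (ages ** 2).reshape(-1, 1)            # quadratic term, 4 x 1

C1 = np.zeros((2, 27))
C1[0, :11] = 1.0                           # girls
C1[1, 11:] = 1.0                           # boys
C2 = C1[1:2, :].copy()                     # (0'_11  1'_16): boys only

def col_space_contained(M_small, M_big):
    """True if C(M_small) is a subspace of C(M_big) (rank comparison)."""
    rank = np.linalg.matrix_rank
    return rank(np.hstack([M_big, M_small])) == rank(M_big)

nested = col_space_contained(C2.T, C1.T)   # checks C(C2') within C(C1')

# Hypothetical parameter values, for illustration only.
B1 = np.array([[17.0, 16.0], [0.5, 0.8]])
b22 = 0.01
mean = A1 @ B1 @ C1 + A2 @ np.array([[b22]]) @ C2
print(nested, mean.shape)
```

With these designs every girl has the mean profile `A1 @ B1[:, 0]`, while every boy has `A1 @ B1[:, 1]` plus the quadratic contribution `b22 * ages**2`.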

In the papers

• Singull, M., & von Rosen, D. (2010). Explicit estimators of parameters in the Growth Curve model with linearly structured covariance matrices. J. Multivariate Anal. 101:1284–1295.
• Nzabanita, J., von Rosen, D., & Singull, M. (2015). Estimation of parameters in the extended growth curve model with a linearly structured covariance matrix. Acta et Com. Univ. Tartuensis de Math., 16(1):13–32.
• Nzabanita, J., von Rosen, D., & Singull, M. (2015). Extended GMANOVA Model with a Linearly Structured Covariance Matrix. Math. Meth. Stat., 24(4):280–291.

we study the problem of estimating the parameters of the (E)GCM with a linearly structured covariance matrix $\Sigma$ by using the residuals.

Furthermore, in

• Srivastava, M. S. & Singull, M. (2017). Test for the mean matrix in a Growth Curve model for high dimensions. Commun. Statist. Theor. Meth., 46(13):6668–6683.
• Srivastava, M. S. & Singull, M. (2017). Testing sphericity and intraclass covariance structures under a growth curve model in high dimension. Commun. Statist. Sim. Comp., 46(7):5740–5751.

we study the high-dimensional GCM, i.e., the case when

\[ p \nearrow N - r = n \]

or even when $p > n$.

Example - Multiple Sclerosis

In this example we will study a special case of the extended GCM.

• A study was conducted to investigate two treatments for patients suffering from multiple sclerosis (MS).
• 79 sufferers of the disease were recruited into the study.
• 44 were randomized to receive azathioprine alone (group 1).
• 35 were randomized to receive azathioprine + methylprednisolone (group 2).
• For each participant, a measure of autoimmunity, AFCR, was planned at clinic visits at baseline (time 0, at initiation of the treatment) and at 6, 12, and 18 months thereafter.
• MS affects the immune system: low values of AFCR are evidence that immunity is improving, which is hopefully associated with a better prognosis for sufferers of MS.
• Also recorded for each subject was an indicator of whether or not the subject had previous treatment (0=no, 1=yes).


GMANOVA-MANOVA Model

The GMANOVA-MANOVA model is a special case of the EGCM.

Let $X$ be a $p \times n$ matrix of observations, where $n$ is the number of subjects, each measured at $p$ occasions. The model is

\[ X = AB_1C_1 + B_2C_2 + E, \]

where $E \sim N_{p,n}(0, \Sigma, I_n)$ and

• $A : p \times m_1$, $B_1 : m_1 \times r_1$, $C_1 : r_1 \times n$,
• $B_2 : p \times r_2$, $C_2 : r_2 \times n$,
• in this case the $B_2C_2$ term encodes the pre-treatment (covariates).

Observe that for this model the subspace condition for the within-individual design matrices is always fulfilled, since $\mathcal{C}(A) \subseteq \mathcal{C}(I)$.

Estimation of Parameters $B_1$, $B_2$ and $\Sigma$

If $\mathcal{C}(A) \subseteq \mathcal{C}(I)$, $m_1 \le p$ and $\Sigma$ is a positive definite matrix, then the MLEs can be obtained by examining the likelihood function

\[
L(B_1, B_2, \Sigma) = (2\pi)^{-\frac{1}{2}pn} |\Sigma|^{-\frac{1}{2}n}
\exp\Big(-\tfrac{1}{2}\operatorname{tr}\big\{\Sigma^{-1}(X - AB_1C_1 - B_2C_2)(X - AB_1C_1 - B_2C_2)'\big\}\Big).
\]

The likelihood equations are

\[
\begin{aligned}
A'\Sigma^{-1}(X - AB_1C_1 - B_2C_2)C_1' &= 0, \\
\Sigma^{-1}(X - AB_1C_1 - B_2C_2)C_2' &= 0, \\
n\Sigma &= (X - AB_1C_1 - B_2C_2)(X - AB_1C_1 - B_2C_2)'.
\end{aligned}
\]

As before, let

\[
P_A^V = A(A'V^{-1}A)^{-1}A'V^{-1}, \quad \text{with } P_A^I = P_A \text{ and } Q_A = I - P_A.
\]

Then, after some algebra, the solution of this system is

\[
\begin{aligned}
\widehat{B}_1 &= (A'V^{-1}A)^{-}A'V^{-1}XQ_{C_2'}C_1'(C_1Q_{C_2'}C_1')^{-}, \\
\widehat{B}_2 &= (X - A\widehat{B}_1C_1)C_2'(C_2C_2')^{-}, \\
n\widehat{\Sigma} &= (X - A\widehat{B}_1C_1 - \widehat{B}_2C_2)(X - A\widehat{B}_1C_1 - \widehat{B}_2C_2)' \\
&= \big(X - (XP_{C_2'} + P_A^V XP_{Q_{C_2'}C_1'})\big)\big(X - (XP_{C_2'} + P_A^V XP_{Q_{C_2'}C_1'})\big)' \\
&= V + (I - P_A^V)XP_{Q_{C_2'}C_1'}X'(I - P_A^V)',
\end{aligned}
\]

with $V = XQ_{C_2'}(I - P_{Q_{C_2'}C_1'})Q_{C_2'}X'$.
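These explicit estimators can be verified against the likelihood equations numerically. The sketch below uses simulated data with two groups and a pre-treatment indicator; the dimensions, design matrices and parameter values are assumptions, and generalized inverses are computed with the Moore-Penrose pseudoinverse as one admissible choice.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 4, 20                                                  # hypothetical dimensions
A = np.column_stack([np.ones(p), np.arange(p, dtype=float)])  # p x m1 (assumed)
C1 = np.zeros((2, n)); C1[0, :10] = 1; C1[1, 10:] = 1         # group indicators
C2 = np.zeros((2, n)); C2[0, 5:10] = 1; C2[1, 15:] = 1        # pre-treated subjects
B1 = np.array([[10.0, 12.0], [1.0, 0.5]])                     # assumed parameters
B2 = rng.normal(size=(p, 2))
X = A @ B1 @ C1 + B2 @ C2 + rng.normal(size=(p, n))

pinv = np.linalg.pinv
I_n, I_p = np.eye(n), np.eye(p)
P_C2 = C2.T @ pinv(C2 @ C2.T) @ C2
Q_C2 = I_n - P_C2
M = C1 @ Q_C2 @ C1.T
P_QC1 = Q_C2 @ C1.T @ pinv(M) @ C1 @ Q_C2       # projector on C(Q_{C2'} C1')

V = X @ Q_C2 @ (I_n - P_QC1) @ Q_C2 @ X.T
V_inv = np.linalg.inv(V)
AVA = pinv(A.T @ V_inv @ A)
P_AV = A @ AVA @ A.T @ V_inv

B1_hat = AVA @ A.T @ V_inv @ X @ Q_C2 @ C1.T @ pinv(M)
B2_hat = (X - A @ B1_hat @ C1) @ C2.T @ pinv(C2 @ C2.T)
fit = A @ B1_hat @ C1 + B2_hat @ C2
E_hat = X - fit
Sigma_hat = E_hat @ E_hat.T / n

# First two likelihood equations, evaluated at the estimates.
eq1 = A.T @ np.linalg.inv(Sigma_hat) @ E_hat @ C1.T
eq2 = E_hat @ C2.T
print(np.allclose(eq1, 0, atol=1e-8), np.allclose(eq2, 0, atol=1e-8))
```

The third likelihood equation holds by construction, since `Sigma_hat` is defined from the fitted residuals.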

Fitted Values

The fitted values are given by

\[
\widehat{X} = A\widehat{B}_1C_1 + \widehat{B}_2C_2
= XP_{C_2'} + P_A^V XQ_{C_2'}C_1'(C_1Q_{C_2'}C_1')^{-}C_1Q_{C_2'}.
\]

Hence, it follows that

\[ \widehat{X} = XP_{C_2'} + P_A^V XP_{Q_{C_2'}C_1'}. \]

Theorem. Let $\widehat{B}_1$ and $\widehat{B}_2$ be the maximum likelihood estimators of $B_1$ and $B_2$, respectively, in the GMANOVA-MANOVA model.

Then we have

\[
\widehat{X} = A\widehat{B}_1C_1 + \widehat{B}_2C_2 = XP_{C_2'} + P_A^V XP_{Q_{C_2'}C_1'},
\]

where

\[
\begin{aligned}
V &= XQ_{C_2'}(I - P_{Q_{C_2'}C_1'})Q_{C_2'}X', \\
Q_{C_2'} &= I - P_{C_2'} = I - C_2'(C_2C_2')^{-}C_2.
\end{aligned}
\]

Residuals

• Residuals, i.e., estimates of the model errors, $X - \widehat{X}$, are very important quantities.
• Usually, residuals are helpful for model validation.
• Residuals can also be used to develop test statistics for different hypotheses.

The ordinary residuals are obtained by subtracting the fitted values from the observations, which means that

\[
\begin{aligned}
R &= X - \widehat{X} \\
&= X - \big(XP_{C_2'} + P_A^V XP_{Q_{C_2'}C_1'}\big) \\
&= XQ_{C_2'} - P_A^V XP_{Q_{C_2'}C_1'}.
\end{aligned}
\]

Equivalently (skipping a lot of details), we can write

\[
\begin{aligned}
R = \ldots &= X\big(Q_{C_2'} - P_{Q_{C_2'}C_1'}\big) + (I - P_A^V)XP_{Q_{C_2'}C_1'} \\
&= X\big(I - P_{C_2':C_1'}\big) + (I - P_A^V)XP_{Q_{C_2'}C_1'}.
\end{aligned}
\]


Theorem. Let $V = XQ_{C_2'}(I - P_{Q_{C_2'}C_1'})Q_{C_2'}X'$. Then the residuals for the GMANOVA-MANOVA model are given by

\[
\begin{aligned}
R_1 &= X\big(I - P_{C_2':C_1'}\big), \\
R_2 &= (I - P_A^V)XP_{Q_{C_2'}C_1'}.
\end{aligned}
\]

Moreover, $R_1 = R_{11} + R_{12}$, where

\[
\begin{aligned}
R_{11} &= P_A^V X\big(I - P_{C_2':C_1'}\big), \\
R_{12} &= (I - P_A^V)X\big(I - P_{C_2':C_1'}\big).
\end{aligned}
\]
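The decomposition of the ordinary residual into $R_1$ and $R_2$, and of $R_1$ into $R_{11}$ and $R_{12}$, is purely algebraic and can be checked for any data matrix. The following sketch does this with assumed designs; `proj` is a hypothetical helper returning the orthogonal projector on a column space.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 4, 20                                                  # hypothetical dimensions
A = np.column_stack([np.ones(p), np.arange(p, dtype=float)])
C1 = np.zeros((2, n)); C1[0, :10] = 1; C1[1, 10:] = 1
C2 = np.zeros((2, n)); C2[0, 5:10] = 1; C2[1, 15:] = 1
X = 10.0 + rng.normal(size=(p, n))    # arbitrary data; the identities are algebraic

def proj(M):
    """Orthogonal projector on the column space C(M)."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

I_n, I_p = np.eye(n), np.eye(p)
P_C2 = proj(C2.T)
Q_C2 = I_n - P_C2
P_QC1 = proj(Q_C2 @ C1.T)             # projector on C(Q_{C2'} C1')
P_both = proj(np.vstack([C2, C1]).T)  # projector on C(C2' : C1')

V = X @ Q_C2 @ (I_n - P_QC1) @ Q_C2 @ X.T
V_inv = np.linalg.inv(V)
P_AV = A @ np.linalg.pinv(A.T @ V_inv @ A) @ A.T @ V_inv

X_hat = X @ P_C2 + P_AV @ X @ P_QC1   # fitted values
R = X - X_hat                         # ordinary residuals
R1 = X @ (I_n - P_both)
R2 = (I_p - P_AV) @ X @ P_QC1
R11 = P_AV @ R1
R12 = (I_p - P_AV) @ R1

print(np.allclose(R, R1 + R2), np.allclose(R1, R11 + R12))
```

The check relies on the identity $P_{C_2':C_1'} = P_{C_2'} + P_{Q_{C_2'}C_1'}$, which the code also allows one to confirm by comparing `P_both` with `P_C2 + P_QC1`.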

• For any matrix $A$, let $A^{\circ}$ be any matrix of full rank spanning the orthogonal complement of $\mathcal{C}(A)$ with respect to the standard inner product, i.e., $\mathcal{C}(A^{\circ}) = \mathcal{C}(A)^{\perp}$. Using this definition, we can rewrite $R_2$ as

\[
R_2 = (I - P_A^V)XP_{Q_{C_2'}C_1'} = P_{A^{\circ}}^{V^{-1}}XP_{Q_{C_2'}C_1'}
\]

because

\[
I - P_A^V = I - A(A'V^{-1}A)^{-}A'V^{-1} = VA^{\circ}(A^{\circ\prime}VA^{\circ})^{-}A^{\circ\prime} = P_{A^{\circ}}^{V^{-1}}.
\]
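The explicit matrix identity in the last display can be confirmed numerically. In the sketch below, `V` is an arbitrary positive definite stand-in (not the $V$ of the model), `A` is an assumed design, and $A^{\circ}$ is taken as an orthonormal basis of $\mathcal{C}(A)^{\perp}$ from the SVD, which is one admissible choice.

```python
import numpy as np

rng = np.random.default_rng(6)
p = 4
A = np.column_stack([np.ones(p), np.arange(p, dtype=float)])  # assumed p x 2 design

W = rng.normal(size=(p, 3 * p))
V = W @ W.T                                   # arbitrary positive definite matrix

# One choice of A°: orthonormal basis of the orthogonal complement of C(A).
U, s, _ = np.linalg.svd(A, full_matrices=True)
A_o = U[:, np.linalg.matrix_rank(A):]

V_inv = np.linalg.inv(V)
lhs = np.eye(p) - A @ np.linalg.pinv(A.T @ V_inv @ A) @ A.T @ V_inv
rhs = V @ A_o @ np.linalg.pinv(A_o.T @ V @ A_o) @ A_o.T

print(np.allclose(lhs, rhs))
```

Both sides annihilate $A$ from the right and act as the identity on $VA^{\circ}$, which is why they coincide.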

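[Figure: decomposition of the space. Between individuals, the space is split into $\mathcal{C}(C_2')$, $\mathcal{C}(Q_{C_2'}C_1')$ and $\mathcal{C}(C_2' : C_1')^{\perp}$; within individuals, into $\mathcal{C}_V(A)$ and $\mathcal{C}_V(A)^{\perp}$. The figure shows where the mean $E[X] = AB_1C_1 + B_2C_2$ and the residuals $R_1$ and $R_2$ are located.]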

Interpretation of Residuals

The expressions for the residuals $R_1$ and $R_2$ have a clear meaning.

The first residual,

\[ R_1 = X\big(I - P_{C_2':C_1'}\big), \]

is the difference between the observations $X$ and the mean $XP_{C_2':C_1'}$. It is the observations minus the means for each group, and therefore it tells us how far the observations are from their group mean.

More specifically, it gives us information about the between-individual assumption in a given group.

Therefore, it can be used to detect observations which deviate from the others without taking into account any model assumption.

Furthermore,

\[
R_1 = P_A^V X\big(I - P_{C_2':C_1'}\big) + (I - P_A^V)X\big(I - P_{C_2':C_1'}\big) = R_{11} + R_{12},
\]

and we see that $R_{11}$ is the difference between the observations $X$ and the mean $XP_{C_2':C_1'}$ relative to the within-individuals model.

It could therefore be useful for detecting whether observations fail to follow the "within-individuals" model assumptions.

Similarly, $R_{12}$ is the difference between the observations $X$ and the mean $XP_{C_2':C_1'}$ relative to the case where the within-individuals model assumptions do not hold.

For the second residual $R_2$ we have

\[
R_2 = XP_{C_2':C_1'} - \underbrace{\big[A\widehat{B}_1C_1 + \widehat{B}_2C_2\big]}_{=\widehat{X}}.
\]

It is the observed mean $XP_{C_2':C_1'}$ minus the estimated mean model $A\widehat{B}_1C_1 + \widehat{B}_2C_2 = \widehat{X}$, i.e., the fitted values.

It tells us how well the estimated mean model fits the observed mean.

More specifically, it gives us information about the within-individual assumptions.

Therefore, it could give some information about the appropriateness of the model assumptions about the mean structure.

Properties of Residuals

With the assumption of normality, the following results show that the residuals are symmetrically distributed around zero.

We have

\[
\begin{aligned}
E(R_1) &= (AB_1C_1 + B_2C_2)\big(Q_{C_2'} - P_{Q_{C_2'}C_1'}\big) \\
&= (AB_1C_1 + B_2C_2)Q_{C_2'} - (AB_1C_1 + B_2C_2)P_{Q_{C_2'}C_1'} \\
&= AB_1C_1Q_{C_2'} - AB_1C_1Q_{C_2'} \\
&= 0.
\end{aligned}
\]

For the residual $R_2$, since $V$ and $XP_{Q_{C_2'}C_1'}$ are independent, we have

\[
E(R_2) = E\big(P_{A^{\circ},V^{-1}}XP_{Q_{C_2'}C_1'}\big)
= E\big(P_{A^{\circ},V^{-1}}\,E\big[XP_{Q_{C_2'}C_1'}\big]\big) = 0.
\]
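The steps skipped in these two derivations can be spelled out as follows. This is a sketch in the notation of the slides, with the shorthand $Q = Q_{C_2'}$ and $P = P_{Q_{C_2'}C_1'} = QC_1'(C_1QC_1')^{-}C_1Q$ introduced here only for brevity.

```latex
% Shorthand (assumed here): Q = Q_{C_2'},  P = Q C_1' (C_1 Q C_1')^{-} C_1 Q.
\begin{align*}
C_2 Q &= C_2 - C_2 C_2'(C_2 C_2')^{-} C_2 = 0, \\
C_2 P &= (C_2 Q)\, C_1' (C_1 Q C_1')^{-} C_1 Q = 0, \\
C_1 P &= (C_1 Q C_1')(C_1 Q C_1')^{-}(C_1 Q) = C_1 Q, \\
(I - P_A^V) A &= A - A (A' V^{-1} A)^{-} A' V^{-1} A = 0 .
\end{align*}
% Consequences:
%   (A B_1 C_1 + B_2 C_2) Q = A B_1 C_1 Q   and   (A B_1 C_1 + B_2 C_2) P = A B_1 C_1 Q,
%   so E(R_1) = 0.
%   E[X P] = A B_1 C_1 Q, and (I - P_A^V) A = 0 for every realisation of V,
%   so E(R_2) = E[(I - P_A^V) A B_1 C_1 Q] = 0.
```

The first two lines use only the definition of $Q_{C_2'}$, the third uses the defining property of a generalized inverse together with $\mathcal{C}(C_1Q) = \mathcal{C}(C_1QC_1')$, and the fourth uses $\mathcal{C}(A') = \mathcal{C}(A'V^{-1}A)$.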

Theorem. Let $R_1$ and $R_2$ be the residuals defined above. Then

(i) $D(R_1) = \big(I - P_{C_1':C_2'}\big) \otimes \Sigma$,

(ii) $D(R_2) = P_{Q_{C_2'}C_1'} \otimes \left[\dfrac{p - r(A)}{n - (r_1 + r_2) - p + r(A) - 1}\right] A(A'\Sigma^{-1}A)^{-}A'$.

Theorem. Let $R_1$ and $R_2$ be the residuals defined above. Then

(i) $\operatorname{cov}(R_1, R_2) = 0$,

(ii) $\operatorname{cov}(R_i, \widehat{X}) = 0$ for $i = 1, 2$,

(iii) $\operatorname{cov}(R_i, \widehat{B}_j) = 0$ for $i, j = 1, 2$.


Simulation Study

In this simulation study we simulate the residuals $R_1$ and $R_2$ for the GMANOVA-MANOVA model.

For simplicity, assume two groups of equal size, where in each group half of the individuals have received a pre-treatment.

For this study we choose $n = 20$.

Hence, we simulate $X$ as above $N$ times and calculate the Frobenius norm of the mean of the residuals.
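A simulation along these lines might look as follows. This is only a sketch of the described setup, not the code behind the talk: the number of occasions `p`, the parameter values, the covariance matrix and the number of replicates `N` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, N = 4, 20, 1000                      # n = 20 as in the study; p and N assumed

A = np.column_stack([np.ones(p), np.arange(p, dtype=float)])
C1 = np.zeros((2, n)); C1[0, :10] = 1; C1[1, 10:] = 1     # two groups of equal size
C2 = np.zeros((2, n)); C2[0, 5:10] = 1; C2[1, 15:] = 1    # half of each group pre-treated
B1 = np.array([[10.0, 12.0], [1.0, 0.5]])                 # assumed parameters
B2 = np.ones((p, 2))
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # assumed covariance
L = np.linalg.cholesky(Sigma)
mean = A @ B1 @ C1 + B2 @ C2

def proj(M):
    """Orthogonal projector on the column space C(M)."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

I_n, I_p = np.eye(n), np.eye(p)
Q_C2 = I_n - proj(C2.T)
P_QC1 = proj(Q_C2 @ C1.T)
Q_both = Q_C2 - P_QC1                      # equals I - P_{C2':C1'}

sum_R1 = np.zeros((p, n))
sum_R2 = np.zeros((p, n))
for _ in range(N):
    X = mean + L @ rng.normal(size=(p, n))
    V = X @ Q_both @ X.T                   # same as X Q (I - P) Q X' because Q P = P
    V_inv = np.linalg.inv(V)
    P_AV = A @ np.linalg.inv(A.T @ V_inv @ A) @ A.T @ V_inv
    sum_R1 += X @ Q_both
    sum_R2 += (I_p - P_AV) @ X @ P_QC1

norm_R1 = np.linalg.norm(sum_R1 / N, "fro")
norm_R2 = np.linalg.norm(sum_R2 / N, "fro")
print(norm_R1, norm_R2)
```

Both norms should shrink towards zero as `N` grows, in line with $E(R_1) = 0$ and $E(R_2) = 0$.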


Example - Multiple sclerosis, cont.

• A study was conducted to investigate two treatments for patients suffering from MS.
• 79 sufferers of the disease were recruited into the study.
• 44 were randomized to receive azathioprine alone (group 1).
• 35 were randomized to receive azathioprine + methylprednisolone (group 2).
• For each participant, a measure of autoimmunity, AFCR, was planned at clinic visits at baseline (time 0, at initiation of the treatment) and at 6, 12, and 18 months thereafter.
• MS affects the immune system: low values of AFCR are evidence that immunity is improving, which is hopefully associated with a better prognosis for sufferers of MS.
• Also recorded for each subject was an indicator of whether or not the subject had previous treatment (0=no, 1=yes).

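One model could be given as

\[ X = AB_1C_1 + B_2C_2 + E, \]

where $E \sim N_{4,79}(0, \Sigma, I)$ with

\[
A = \begin{pmatrix} 1 & 0 \\ 1 & 6 \\ 1 & 12 \\ 1 & 18 \end{pmatrix}, \quad
B_1 : 2 \times 2, \quad
C_1 = \begin{pmatrix} 1_{44}' & 0_{35}' \\ 0_{44}' & 1_{35}' \end{pmatrix}
\]

and

\[
B_2 : 4 \times 2 \quad \text{and} \quad
C_2 = \begin{pmatrix} 0_{19}' & 1_{25}' & 0_{11}' & 0_{24}' \\ 0_{19}' & 0_{25}' & 0_{11}' & 1_{24}' \end{pmatrix}.
\]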
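The design matrices of this model can be assembled as below. The helper `blocks` is a hypothetical convenience function; the block sizes are taken from the matrices above, and the rank check illustrates that $C_2$ is not nested in $C_1$.

```python
import numpy as np

def blocks(pattern, sizes):
    """Concatenate constant blocks, e.g. pattern (1, 0) with sizes (44, 35)."""
    return np.concatenate([np.full(s, v, dtype=float) for v, s in zip(pattern, sizes)])

A = np.column_stack([np.ones(4), [0.0, 6.0, 12.0, 18.0]])  # months 0, 6, 12, 18

# Group indicators: 44 subjects in group 1, 35 in group 2.
C1 = np.vstack([blocks([1, 0], [44, 35]),
                blocks([0, 1], [44, 35])])

# Pre-treatment indicators within each group (block sizes 19, 25, 11, 24).
sizes = [19, 25, 11, 24]
C2 = np.vstack([blocks([0, 1, 0, 0], sizes),
                blocks([0, 0, 0, 1], sizes)])

rank = np.linalg.matrix_rank
print(A.shape, C1.shape, C2.shape)
print(rank(C1), rank(np.vstack([C1, C2])))  # a larger joint rank means C(C2') is not inside C(C1')
```

Each row of `C2` marks the previously treated subjects of one group, so $B_2$ holds a separate pre-treatment effect per group and per time point.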

The estimates are given by

\[
\widehat{B}_1 = \begin{pmatrix} 12.5537 & 12.4220 \\ -0.1639 & -0.2117 \end{pmatrix}, \quad
\widehat{B}_2 = \begin{pmatrix}
0.9903 & 0.9030 \\
1.2536 & 1.0026 \\
1.1368 & 0.7646 \\
1.0720 & 0.8725
\end{pmatrix}
\]

and

\[
\widehat{\Sigma} = \begin{pmatrix}
5.2368 & 2.1029 & 1.7694 & 2.5863 \\
2.1029 & 4.7146 & 2.7628 & 2.3344 \\
1.7694 & 2.7628 & 5.0649 & 2.7770 \\
2.5863 & 2.3344 & 2.7770 & 5.3530
\end{pmatrix}.
\]

Histograms of the residuals $R_1$ and $R_2$ can also be computed.

Conclusions

• In regression analysis, it is well known that the residuals in the univariate linear model are symmetrically distributed around zero and uncorrelated with the fitted model.
• For the residuals in the GCM, von Rosen (1995) has shown that they are symmetrically distributed around zero and has obtained moment expressions.
• Similar results can be obtained for the GMANOVA-MANOVA model.
• Similar results have been obtained by Seid Hamid and von Rosen (2006) for an EGCM with the nested subspace condition $\mathcal{C}(C_2') \subseteq \mathcal{C}(C_1')$.
• We recall that in our model we do not assume the nested subspace condition $\mathcal{C}(C_2') \subseteq \mathcal{C}(C_1')$.

Linköping University - Research that makes a difference