Estimation, testing and residual analysis in the GMANOVA-MANOVA model
Martin Singull
Department of Mathematics, Linköping University, Sweden
Department of Mathematics, Makerere University, March 13, 2019
Martin Singull, Linköping University (LiU), Sweden
Academic Experience
I PhD in Mathematical Statistics (LiU)
I Associate Professor in Mathematical Statistics
I Head of Division, Mathematical Statistics (LiU)
Capacity Building in Math and Stat
I Deputy Team Leader - UR-Sweden Programme / Applied Math and Stat Sub-Programme (Rw)
I Active in the bilateral programs in Tz (UDSM), Ug (MAK) and Moz (EMU)
I External Evaluator of the EAUMP (2016)
Supervision of PhDs
I Supervised 5 PhDs (2 Rw, 1 Swe, 2 Tz)
I Ongoing PhD students
I Main supervisor: 3 (2 Rw, 1 Tz)
I Co-supervisor: 6 (1 Rw, 3 Swe, 1 Ug, 1 Za)
Martin Singull 1/48
Joint work
PhD student Beatrice Byukusenge
Professor Dietrich von Rosen
Martin Singull 2/48
Content
I Multivariate Linear Model (MANOVA)
I Growth Curve Model
I Extended Growth Curve Model
I Hypothesis Testing MANOVA Model
I GMANOVA-MANOVA Model
I Estimation
I Fitted Values
I Residuals
I Interpretation of Residuals
I Properties of Residuals
I Simulation Study
I Example
Martin Singull 3/48
Multivariate Linear Model
Let $X = BC + E$, where $B: p \times k$ is an unknown parameter matrix, $C: k \times n$ is a known design matrix such that $r(C) + p \le n$, and

$$E \sim N_{p,n}(0, \Sigma, I).$$

The likelihood function is given by

$$L(B,\Sigma) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2}\, e^{-\frac{1}{2}\operatorname{tr}\{\Sigma^{-1}(X-M)(X-M)'\}} \propto |\Sigma|^{-n/2}\, e^{-\frac{1}{2}\operatorname{tr}\{\Sigma^{-1}(X-BC)(X-BC)'\}},$$

where $M = BC$ is the linear mean structure.
Martin Singull 4/48
The MLEs for the multivariate linear model are given by

$$\hat{B} = XC'(CC')^{-1}, \qquad n\hat{\Sigma} = X(I - P_{C'})X' = RR' = V,$$

where $P_{C'} = C'(CC')^{-1}C$, i.e., the projection on the space $\mathcal{C}(C')$. The estimated mean structure and the residuals are

$$\hat{B}C = XP_{C'} \quad \text{and} \quad R = X(I - P_{C'}) = XQ_{C'},$$

where we denote the orthogonal projector $Q_{C'} = I - P_{C'}$.

The whole space can now be decomposed as $\mathcal{C}(C') \boxplus \mathcal{C}(C')^{\perp}$, with $\hat{B}C$ the part of the data in $\mathcal{C}(C')$ and $R$ the part in $\mathcal{C}(C')^{\perp}$.
Martin Singull 5/48
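As a quick numerical illustration (not from the slides), the MANOVA MLEs and the orthogonal decomposition above can be verified directly with NumPy. The data $X$ and the two-group design $C$ below are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 3, 2, 10

# Made-up data for illustration: two groups of five individuals
C = np.kron(np.eye(k), np.ones(n // k))        # k x n between-individual design
X = rng.normal(size=(p, n))                    # p x n observations

# MLEs from the slide: B_hat = X C'(CC')^{-1}, n Sigma_hat = X (I - P_{C'}) X'
P_C = C.T @ np.linalg.inv(C @ C.T) @ C         # projection on C(C')
B_hat = X @ C.T @ np.linalg.inv(C @ C.T)
R = X @ (np.eye(n) - P_C)                      # residuals R = X Q_{C'}
Sigma_hat = (R @ R.T) / n

# The data decompose orthogonally: X = B_hat C + R with (B_hat C) R' = 0
assert np.allclose(B_hat @ C + R, X)
assert np.allclose(B_hat @ C @ R.T, 0.0)
```

The assertions reflect exactly the decomposition $\mathcal{C}(C') \boxplus \mathcal{C}(C')^{\perp}$ pictured above.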
Example - Linear Regression
If $p = 1$ we have ordinary linear regression, where one wants to explain the variation of a response variable $y$ by the variation of predictors $x_1, \ldots, x_k$ through a "theoretical" linear function

$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k.$$

Model: $y_j = \mu_j + \varepsilon_j = \beta_0 + \beta_1 x_{j1} + \ldots + \beta_k x_{jk} + \varepsilon_j$, for $j = 1, \ldots, n$, where

$$\mu_j = \beta_0 + \beta_1 x_{j1} + \ldots + \beta_k x_{jk}.$$

The predictors $x_{j1}, \ldots, x_{jk}$ are known, while the parameters $\beta_0, \ldots, \beta_k$ are unknown. Further,

$$\varepsilon_1, \ldots, \varepsilon_n \ \text{are independent and}\ N(0, \sigma^2),$$

with $\sigma^2$ unknown.
Martin Singull 6/48
Hence, the random variables $y_1, \ldots, y_n$ are independently distributed as

$$y_j \sim N(\mu_j, \sigma^2) \sim N(\beta_0 + \beta_1 x_{j1} + \ldots + \beta_k x_{jk},\ \sigma^2),$$

which can be written in matrix form

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_{11} + \ldots + \beta_k x_{1k} \\ \beta_0 + \beta_1 x_{21} + \ldots + \beta_k x_{2k} \\ \vdots \\ \beta_0 + \beta_1 x_{n1} + \ldots + \beta_k x_{nk} \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},$$
Martin Singull 7/48
or

$$\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{=y} = \underbrace{\begin{pmatrix} 1 & x_{11} & \ldots & x_{1k} \\ 1 & x_{21} & \ldots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \ldots & x_{nk} \end{pmatrix}}_{=X} \underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}}_{=\beta} + \underbrace{\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}}_{=\varepsilon},$$

or shorter

$$y \sim N_n(X\beta, \sigma^2 I_n),$$

i.e., a linear model.
Martin Singull 8/48
When $X$ is of full column rank, i.e., $r(X) = k + 1$, the estimators for $\beta$ and $\sigma^2$ are given by

$$\hat{\beta} = (X'X)^{-1}X'y, \qquad \hat{\sigma}^2 = s^2 = \frac{SSE}{n - k - 1},$$

where

$$SSE = RR' = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'(I - X(X'X)^{-}X')y = V,$$

and the residuals are given by

$$R = (y - X\hat{\beta})'.$$
Martin Singull 9/48
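A minimal sketch of these univariate formulas, with made-up predictors and response (NumPy assumed); it checks that $SSE$ equals the quadratic form $y'(I - X(X'X)^{-1}X')y$ from the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 2

# Made-up design (intercept + k predictors) and response for illustration
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1)
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                 # (X'X)^{-1} X'y
resid = y - X @ beta_hat
SSE = resid @ resid
s2 = SSE / (n - k - 1)                                       # estimator of sigma^2

# SSE equals y'(I - X(X'X)^{-1}X')y
H = X @ np.linalg.solve(X.T @ X, X.T)
assert np.isclose(SSE, y @ (np.eye(n) - H) @ y)
```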
We would like to test the bilinear hypothesis

$$H_0: FBG = 0 \quad \text{vs.} \quad H_1: B \ \text{unrestricted}.$$

Assume that the known matrices $F: r \times p$ and $G: k \times m$ are of full rank with $m \le k$ and $r \le p$.

The likelihood ratio test is based on the statistic

$$\Lambda = \frac{|FVF'|}{|FVF' + F\hat{B}G[G'(CC')^{-1}G]^{-1}G'\hat{B}'F'|},$$

where the asymptotic distribution of $\Lambda$ is given by

$$P\left(-\left(n - k - \tfrac{1}{2}(r - m + 1)\right)\ln\Lambda \ge z\right) \approx P(\chi^2_f \ge z) + \frac{\gamma_2}{\nu^2}\left(P(\chi^2_{f+4} \ge z) - P(\chi^2_f \ge z)\right),$$

where $f = rm$, $\gamma_2 = \dfrac{rm(r^2 + m^2 - 5)}{48}$ and $\nu = n - \dfrac{r - m + 1}{2}$.
Martin Singull 10/48
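The test statistic $\Lambda$ can be computed in a few lines of NumPy. Everything below is made up for illustration: the data are pure noise (so $H_0$ holds), $F$ picks the first two rows of $B$, and $G$ is a between-group contrast; only the formula for $\Lambda$ and the scaled $-\ln\Lambda$ statistic come from the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, n = 3, 2, 20
r, m = 2, 1

C = np.kron(np.eye(k), np.ones(n // k))        # two groups of ten
X = rng.normal(size=(p, n))                    # noise data: H0 is true

P_C = C.T @ np.linalg.inv(C @ C.T) @ C
B_hat = X @ C.T @ np.linalg.inv(C @ C.T)
V = X @ (np.eye(n) - P_C) @ X.T

F = np.eye(r, p)                               # hypothesis matrices (made up)
G = np.array([[1.0], [-1.0]])                  # k x m group contrast

M = F @ B_hat @ G @ np.linalg.inv(G.T @ np.linalg.inv(C @ C.T) @ G) @ G.T @ B_hat.T @ F.T
Lam = np.linalg.det(F @ V @ F.T) / np.linalg.det(F @ V @ F.T + M)
test_stat = -(n - k - (r - m + 1) / 2) * np.log(Lam)   # approx. chi2 with f = r*m df
assert 0 < Lam <= 1
```

Since $M$ is positive semidefinite, $\Lambda$ always lies in $(0, 1]$, and large values of the scaled $-\ln\Lambda$ statistic speak against $H_0$.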
However, the linear system $FBG = 0$ can be solved for $B$ as

$$B = (F')^{\circ}\Theta_1 G' + \Theta_2 G^{\circ\prime},$$

where $\Theta_1$ and $\Theta_2$ are new parameters.

Then the model becomes

$$X = \underbrace{(F')^{\circ}}_{=A}\underbrace{\Theta_1}_{=B_1}\underbrace{G'C}_{=C_1} + \underbrace{\Theta_2}_{=B_2}\underbrace{G^{\circ\prime}C}_{=C_2} + E,$$

which is a special case of the so-called Extended Growth Curve Model (EGCM), also called the GMANOVA-MANOVA model.

Furthermore, one can show that the likelihood ratio statistic can be expressed through the residuals, i.e., it is of interest to study the residuals for the GMANOVA-MANOVA model.
Martin Singull 11/48
Example - Potthoff & Roy (1964)
Dental measurements on eleven girls and sixteen boys were taken at four different ages (8, 10, 12, 14). Each measurement is the distance, in millimeters, from the center of the pituitary (hypophysis) to the pterygo-maxillary fissure.
Martin Singull 12/48
$X = (x_1, \ldots, x_{27})$, with rows corresponding to ages 8, 10, 12 and 14:

Age 8:  21, 21, 20.5, 23.5, 21.5, 20, 21.5, 23, 20, 16.5, 24.5, 26, 21.5, 23, 20, 25.5, 24.5, 22, 24, 23, 27.5, 23, 21.5, 17, 22.5, 23, 22
Age 10: 20, 21.5, 24, 24.5, 23, 21, 22.5, 23, 21, 19, 25, 25, 22.5, 22.5, 23.5, 27.5, 25.5, 22, 21.5, 20.5, 28, 23, 23.5, 24.5, 25.5, 24.5, 21.5
Age 12: 21.5, 24, 24.5, 25, 22.5, 21, 23, 23.5, 22, 19, 28, 29, 23, 24, 22.5, 26.5, 27, 24.5, 24.5, 31, 31, 23.5, 24, 26, 25.5, 26, 23.5
Age 14: 23, 25.5, 26, 26.5, 23.5, 22.5, 25, 24, 21.5, 19.5, 28, 31, 26.5, 27.5, 26, 27, 28.5, 26.5, 25.5, 26, 31.5, 25, 28, 29.5, 26, 30, 25
Martin Singull 13/48
Martin Singull 14/48
Growth Curve Model (Potthoff and Roy, 1964)
Definition. Let $X: p \times N$ and $B: q \times m$ be the observation and parameter matrices, respectively, and let $A: p \times q$ and $C: m \times N$ be the within- and between-individuals design matrices, respectively. Suppose that $q \le p$ and $p \le N - r(C) = n$.
The Growth Curve model (GCM) is defined by
X = ABC + E ,
where E ∼ Np,N (0,Σ, IN).
For more about the GCM, see Kollo and von Rosen (2005).
Martin Singull 15/48
Example - Potthoff & Roy (1964), cont.
Assume that we want to model linear growth, i.e.,

$$\mu_i = \begin{pmatrix} b_{0i} + 8b_{1i} \\ b_{0i} + 10b_{1i} \\ b_{0i} + 12b_{1i} \\ b_{0i} + 14b_{1i} \end{pmatrix}, \quad \text{for } i = 1, 2.$$

For this we may use the parameter and design matrices

$$B = \begin{pmatrix} b_{01} & b_{02} \\ b_{11} & b_{12} \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 8 \\ 1 & 10 \\ 1 & 12 \\ 1 & 14 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} \mathbf{1}'_{11} & \mathbf{0}'_{16} \\ \mathbf{0}'_{11} & \mathbf{1}'_{16} \end{pmatrix}.$$
Martin Singull 16/48
Growth Curve Model – MLEs
If $A$ and $C$ have full rank, the MLEs for the GCM are given by

$$\hat{B} = (A'V^{-1}A)^{-1}A'V^{-1}XC'(CC')^{-1}, \quad \text{i.e.,} \quad A\hat{B}C = P^V_A X P_{C'},$$

$$N\hat{\Sigma} = (X - A\hat{B}C)(X - A\hat{B}C)' = \underbrace{RR'}_{=V} + R_1R_1',$$

where

$$R_1 = (I_p - P^V_A)XP_{C'}, \qquad R = X(I_N - P_{C'}), \qquad V = RR' = X(I_N - P_{C'})X',$$

$$P_{C'} = C'(CC')^{-1}C = \text{projection on } \mathcal{C}(C'), \qquad P^V_A = A(A'V^{-1}A)^{-1}A'V^{-1} = \text{projection on } \mathcal{C}_V(A).$$
Martin Singull 17/48
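These GCM estimators are again a few lines of NumPy. The sketch below uses the Potthoff-Roy design ($p = 4$ ages, 11 girls and 16 boys) but made-up data, and checks two identities from the slide: $A\hat{B}C = P^V_A X P_{C'}$ and $N\hat{\Sigma} = RR' + R_1R_1'$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q, k, N = 4, 2, 2, 27

A = np.column_stack([np.ones(p), [8.0, 10.0, 12.0, 14.0]])  # within-individual design
C = np.zeros((k, N)); C[0, :11] = 1; C[1, 11:] = 1          # 11 girls, 16 boys
X = rng.normal(loc=22.0, scale=2.0, size=(p, N))            # made-up data, P&R shape

P_C = C.T @ np.linalg.inv(C @ C.T) @ C                      # P_{C'}
V = X @ (np.eye(N) - P_C) @ X.T
Vi = np.linalg.inv(V)
P_A_V = A @ np.linalg.inv(A.T @ Vi @ A) @ A.T @ Vi          # P^V_A

B_hat = np.linalg.inv(A.T @ Vi @ A) @ A.T @ Vi @ X @ C.T @ np.linalg.inv(C @ C.T)
R = X @ (np.eye(N) - P_C)
R1 = (np.eye(p) - P_A_V) @ X @ P_C
Sigma_hat = (R @ R.T + R1 @ R1.T) / N

# Identities from the slide
assert np.allclose(A @ B_hat @ C, P_A_V @ X @ P_C)
E_hat = X - A @ B_hat @ C
assert np.allclose(E_hat @ E_hat.T, R @ R.T + R1 @ R1.T)
```

The second assertion holds because the cross terms $RR_1'$ vanish: $R$ and $R_1$ live in $\mathcal{C}(C')^{\perp}$ and $\mathcal{C}(C')$, respectively.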
The whole space can be decomposed as

$$\mathcal{C}_V(A) \otimes \mathcal{C}(C') \boxplus \left(\mathcal{C}_V(A) \otimes \mathcal{C}(C')\right)^{\perp} = \mathcal{C}_V(A) \otimes \mathcal{C}(C') \boxplus \mathcal{C}_V(A)^{\perp} \otimes \mathcal{C}(C') \boxplus \mathcal{V} \otimes \mathcal{C}(C')^{\perp},$$

where the estimated mean $A\hat{B}C = P^V_A X P_{C'}$ lies in $\mathcal{C}_V(A) \otimes \mathcal{C}(C')$, the residual $R_1 = (I_p - P^V_A)XP_{C'}$ in $\mathcal{C}_V(A)^{\perp} \otimes \mathcal{C}(C')$, and the residual $R = X(I_N - P_{C'})$ in $\mathcal{V} \otimes \mathcal{C}(C')^{\perp}$. Moreover,

$$A\hat{B}C = P^V_A X P_{C'}, \qquad N\hat{\Sigma} = RR' + R_1R_1'.$$
Martin Singull 18/48
Example - Potthoff & Roy (1964), cont.
The MLEs for the Example are given by

$$\hat{B} = \begin{pmatrix} 17.4254 & 15.8423 \\ 0.4764 & 0.8268 \end{pmatrix} \quad \text{and} \quad \hat{\Sigma} = \begin{pmatrix} 5.1192 & 2.4409 & 3.6105 & 2.5222 \\ 2.4409 & 3.9279 & 2.7175 & 3.0623 \\ 3.6105 & 2.7175 & 5.9798 & 3.8235 \\ 2.5222 & 3.0623 & 3.8235 & 4.6180 \end{pmatrix}.$$
Martin Singull 19/48
Martin Singull 20/48
Extended Growth Curve Model
Definition. Let $X: p \times n$, $A_i: p \times q_i$, $q_i \le p$, $B_i: q_i \times k_i$, $C_i: k_i \times n$, $r(C_1) + p \le n$, $\mathcal{C}(C_i') \subseteq \mathcal{C}(C_{i-1}')$, $i = 1, \ldots, m$, and $\Sigma: p \times p$ positive definite.

Then the Extended Growth Curve model (EGCM) is defined by

$$X = \sum_{i=1}^{m} A_iB_iC_i + E,$$

where $E \sim N_{p,n}(0, \Sigma, I_n)$.

The subspace condition $\mathcal{C}(C_i') \subseteq \mathcal{C}(C_{i-1}')$ may be replaced by $\mathcal{C}(A_i) \subseteq \mathcal{C}(A_{i-1})$. See Filipiak and von Rosen (2012) for more details.

The EGCM is also called the sum of profiles model.
Martin Singull 21/48
Example - Potthoff & Roy (1964), cont.
It seems reasonable to assume that both girls and boys have a linear growth component, but that for the boys there additionally exists a second order polynomial structure.

Then we may use the EGCM with two terms,

$$E(X) = A_1B_1C_1 + A_2B_2C_2,$$

where $A_1$, $B_1$ and $C_1$ are as before and

$$A_2 = \begin{pmatrix} 8^2 & 10^2 & 12^2 & 14^2 \end{pmatrix}', \qquad C_2 = \begin{pmatrix} \mathbf{0}'_{11} & \mathbf{1}'_{16} \end{pmatrix} \quad \text{and} \quad B_2 = b_{22}.$$
Martin Singull 22/48
In the papers
I Singull, M., & von Rosen, D. (2010). Explicit estimators ofparameters in the Growth Curve model with linearly structuredcovariance matrices. J. Multivariate Anal. 101:1284–1295.
I Nzabanita, J., von Rosen, D., & Singull, M. (2015). Estimation ofparameters in the extended growth curve model with a linearlystructured covariance matrix. Acta et Com. Univ. Tartuensis deMath., 16(1):13–32.
I Nzabanita, J., von Rosen, D., & Singull, M. (2015). ExtendedGMANOVA Model with a Linearly Structured Covariance Matrix.Math. Meth. Stat., 24(4):280–291.
we study the problem of estimating the parameters of the (E)GCM with a linearly structured covariance matrix $\Sigma$ by using the residuals.
Martin Singull 23/48
Furthermore, in
I Srivastava, M. S. & Singull, M. (2017). Test for the mean matrix ina Growth Curve model for high dimensions. Commun. Statist.Theor. Meth., 46(13):6668–6683.
I Srivastava, M. S. & Singull, M. (2017). Testing sphericity and intra-class covariance structures under a growth curve model in highdimension. Commun. Statist. Sim. Comp., 46(7):5740-5751.
we study the high-dimensional GCM, i.e., when

$$p \nearrow N - r = n,$$

or even when $p > n$.
Martin Singull 24/48
Example - Multiple Sclerosis
In this example we will study a special case of the extended GCM.

I A study was conducted to investigate two treatments for patients suffering from multiple sclerosis (MS).
I 79 sufferers of the disease were recruited into the study.
I 44 were randomized to receive azathioprine alone (group 1).
I 35 were randomized to receive azathioprine + methylprednisolone (group 2).
I For each participant, a measure of autoimmunity, AFCR, was planned at clinic visits at baseline (time 0, at initiation of the treatment) and at 6, 12, and 18 months thereafter.
I MS affects the immune system: low values of AFCR are evidence that immunity is improving, which is hopefully associated with a better prognosis for sufferers of MS.
I Also recorded for each subject was an indicator of whether or not the subject had previous treatment (0=no, 1=yes).
Martin Singull 25/48
Martin Singull 26/48
GMANOVA-MANOVA Model
The GMANOVA-MANOVA model is a special case of the EGCM.
Let $X$ be a $p \times n$ matrix of observations, where $n$ represents the number of subjects, each measured at $p$ occasions,

$$X = AB_1C_1 + B_2C_2 + E,$$

where $E \sim N_{p,n}(0, \Sigma, I_n)$ and

I $A: p \times m_1$, $B_1: m_1 \times r_1$, $C_1: r_1 \times n$,
I $B_2: p \times r_2$, $C_2: r_2 \times n$.
I In this case the $B_2C_2$ term encodes the pre-treatment (covariates).

Observe that for this model the subspace condition for the within-individual design matrices is always fulfilled, since $\mathcal{C}(A) \subseteq \mathcal{C}(I)$.
Martin Singull 27/48
Estimation of Parameters B1, B2 and Σ
If $\mathcal{C}(A) \subseteq \mathcal{C}(I)$, $m_1 \le p$ and $\Sigma$ is a positive definite matrix, then the MLEs can be obtained from examination of the likelihood function

$$L(B_1, B_2, \Sigma) = (2\pi)^{-\frac{1}{2}pn}\,|\Sigma|^{-\frac{1}{2}n}\exp\left(-\tfrac{1}{2}\operatorname{tr}\left\{\Sigma^{-1}(X - AB_1C_1 - B_2C_2)(\,)'\right\}\right),$$

where $(\,)'$ denotes the transpose of the preceding factor.

The likelihood equations are

$$A'\Sigma^{-1}(X - AB_1C_1 - B_2C_2)C_1' = 0,$$
$$\Sigma^{-1}(X - AB_1C_1 - B_2C_2)C_2' = 0,$$
$$n\Sigma = (X - AB_1C_1 - B_2C_2)(\,)'.$$
Martin Singull 28/48
As before, take

$$P^V_A = A(A'V^{-1}A)^{-1}A'V^{-1}, \quad \text{with } P^I_A = P_A \text{ and } Q_A = I - P_A.$$

Then, after some algebra, the solution of this system follows as

$$\hat{B}_1 = (A'V^{-1}A)^{-}A'V^{-1}XQ_{C_2'}C_1'\left(C_1Q_{C_2'}C_1'\right)^{-},$$
$$\hat{B}_2 = (X - A\hat{B}_1C_1)C_2'(C_2C_2')^{-},$$
$$n\hat{\Sigma} = (X - A\hat{B}_1C_1 - \hat{B}_2C_2)(\,)' = \left(X - \left(XP_{C_2'} + P^V_A XP_{Q_{C_2'}C_1'}\right)\right)(\,)' = V + (I - P^V_A)XP_{Q_{C_2'}C_1'}X'(I - P^V_A)',$$

with $V = XQ_{C_2'}\left(I - P_{Q_{C_2'}C_1'}\right)Q_{C_2'}X'$.
Martin Singull 29/48
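A numerical sketch of these GMANOVA-MANOVA estimators (NumPy assumed; the design, pretreatment indicator and data below are made up, loosely following the two-group layout used later). It also checks the fitted-value identity $\hat{X} = XP_{C_2'} + P^V_A XP_{Q_{C_2'}C_1'}$ stated on the next slide.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 4, 20
A = np.column_stack([np.ones(p), [0.0, 6.0, 12.0, 18.0]])  # within-individual design
C1 = np.zeros((2, n)); C1[0, :10] = 1; C1[1, 10:] = 1      # two treatment groups
C2 = (np.arange(n) % 2 == 0).astype(float)[None, :]        # made-up pretreatment row
X = rng.normal(size=(p, n))                                # made-up data

I_n = np.eye(n)
P2 = C2.T @ np.linalg.pinv(C2 @ C2.T) @ C2                 # P_{C2'}
Q2 = I_n - P2                                              # Q_{C2'}
W = Q2 @ C1.T                                              # Q_{C2'} C1'
P_W = W @ np.linalg.pinv(W.T @ W) @ W.T                    # P_{Q_{C2'}C1'}

V = X @ Q2 @ (I_n - P_W) @ Q2 @ X.T
Vi = np.linalg.inv(V)
P_A_V = A @ np.linalg.pinv(A.T @ Vi @ A) @ A.T @ Vi        # P^V_A

B1_hat = np.linalg.pinv(A.T @ Vi @ A) @ A.T @ Vi @ X @ W @ np.linalg.pinv(C1 @ W)
B2_hat = (X - A @ B1_hat @ C1) @ C2.T @ np.linalg.pinv(C2 @ C2.T)
E_hat = X - A @ B1_hat @ C1 - B2_hat @ C2
Sigma_hat = E_hat @ E_hat.T / n

# Fitted-value identity: X_hat = X P_{C2'} + P^V_A X P_{Q_{C2'}C1'}
X_hat = A @ B1_hat @ C1 + B2_hat @ C2
assert np.allclose(X_hat, X @ P2 + P_A_V @ X @ P_W)
```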
Fitted Values
The fitted values are given by

$$\hat{X} = A\hat{B}_1C_1 + \hat{B}_2C_2 = XP_{C_2'} + P^V_A XQ_{C_2'}C_1'\left(C_1Q_{C_2'}C_1'\right)^{-}C_1Q_{C_2'}.$$

Hence, it follows that

$$\hat{X} = XP_{C_2'} + P^V_A XP_{Q_{C_2'}C_1'}.$$
Martin Singull 30/48
Theorem. Let $\hat{B}_1$ and $\hat{B}_2$ be the maximum likelihood estimators of $B_1$ and $B_2$, respectively, appearing in the GMANOVA-MANOVA model. Then we have

$$\hat{X} = A\hat{B}_1C_1 + \hat{B}_2C_2 = XP_{C_2'} + P^V_A XP_{Q_{C_2'}C_1'},$$

where

$$V = XQ_{C_2'}\left(I - P_{Q_{C_2'}C_1'}\right)Q_{C_2'}X', \qquad Q_{C_2'} = I - P_{C_2'} = I - C_2'(C_2C_2')^{-}C_2.$$
Martin Singull 31/48
Residuals
I Residuals, i.e., estimates of the model errors, $X - \hat{X}$, are very important quantities.
I Usually, residuals are helpful for model validation.
I Residuals can also be used to develop test statistics for different hypotheses.

The ordinary residuals are obtained by subtracting the fitted values from the observations. This means that

$$R = X - \hat{X} = X - \left(XP_{C_2'} + P^V_A XP_{Q_{C_2'}C_1'}\right) = XQ_{C_2'} - P^V_A XP_{Q_{C_2'}C_1'}.$$

Equivalently (skipping a lot of details), we can write

$$R = \ldots = X\left(Q_{C_2'} - P_{Q_{C_2'}C_1'}\right) + (I - P^V_A)XP_{Q_{C_2'}C_1'} = X\left(I - P_{C_2':C_1'}\right) + (I - P^V_A)XP_{Q_{C_2'}C_1'}.$$
Martin Singull 32/48
Theorem. Take $V = XQ_{C_2'}\left(I - P_{Q_{C_2'}C_1'}\right)Q_{C_2'}X'$. Then the residuals for the GMANOVA-MANOVA model are given by

$$R_1 = X\left(I - P_{C_2':C_1'}\right), \qquad R_2 = (I - P^V_A)XP_{Q_{C_2'}C_1'}.$$

Moreover, $R_1 = R_{11} + R_{12}$, where

$$R_{11} = P^V_A X\left(I - P_{C_2':C_1'}\right), \qquad R_{12} = (I - P^V_A)X\left(I - P_{C_2':C_1'}\right).$$
Martin Singull 33/48
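The decompositions in the theorem can be verified numerically. In the NumPy sketch below (designs and data made up, as in the earlier sketches) the assertions check that $R_1 = R_{11} + R_{12}$ and that the data split exactly into fitted values plus the two residuals.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 4, 20
A = np.column_stack([np.ones(p), [0.0, 6.0, 12.0, 18.0]])  # made-up within design
C1 = np.zeros((2, n)); C1[0, :10] = 1; C1[1, 10:] = 1      # two groups
C2 = np.tile([1.0, 0.0], n // 2)[None, :]                  # made-up covariate row
X = rng.normal(size=(p, n))

I_n, I_p = np.eye(n), np.eye(p)
def proj(M):                                  # orthogonal projector onto C(M)
    return M @ np.linalg.pinv(M.T @ M) @ M.T

P2 = proj(C2.T)                               # P_{C2'}
Q2 = I_n - P2
P_W = proj(Q2 @ C1.T)                         # P_{Q_{C2'}C1'}
P12 = proj(np.vstack([C2, C1]).T)             # P_{C2':C1'}
V = X @ Q2 @ (I_n - P_W) @ Q2 @ X.T
Vi = np.linalg.inv(V)
P_A_V = A @ np.linalg.pinv(A.T @ Vi @ A) @ A.T @ Vi

R1 = X @ (I_n - P12)
R2 = (I_p - P_A_V) @ X @ P_W
R11 = P_A_V @ X @ (I_n - P12)
R12 = (I_p - P_A_V) @ X @ (I_n - P12)
X_hat = X @ P2 + P_A_V @ X @ P_W              # fitted values from the theorem

assert np.allclose(R1, R11 + R12)             # R1 decomposes as stated
assert np.allclose(X_hat + R1 + R2, X)        # data = fit + residuals
```

The second assertion works because $P_{C_2'} + P_{Q_{C_2'}C_1'} = P_{C_2':C_1'}$, i.e., the two projectors split $\mathcal{C}(C_2' : C_1')$ orthogonally.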
I For any matrix $A$, let $A^{\circ}$ be any matrix of full rank spanning the orthogonal complement of $\mathcal{C}(A)$ with respect to the standard inner product, i.e., $\mathcal{C}(A^{\circ}) = \mathcal{C}(A)^{\perp}$. Using this definition we can rewrite $R_2$ as

$$R_2 = (I - P^V_A)XP_{Q_{C_2'}C_1'} = P^{V^{-1}}_{A^{\circ}}XP_{Q_{C_2'}C_1'},$$

because

$$I - P^V_A = I - A(A'V^{-1}A)^{-}A'V^{-1} = VA^{\circ}\left(A^{\circ\prime}VA^{\circ}\right)^{-}A^{\circ\prime} = P^{V^{-1}}_{A^{\circ}}.$$
Martin Singull 34/48
The decomposition can be pictured with the between-individuals spaces $\mathcal{C}(C_2')$, $\mathcal{C}(Q_{C_2'}C_1')$ and $\mathcal{C}(C_2' : C_1')^{\perp}$ along one axis and the within-individuals spaces $\mathcal{C}_V(A)$ and $\mathcal{C}_V(A)^{\perp}$ along the other: the mean $E[X] = AB_1C_1 + B_2C_2$ sits in the part spanned by $\mathcal{C}(C_2')$ together with $\mathcal{C}_V(A) \otimes \mathcal{C}(Q_{C_2'}C_1')$, the residual $R_2$ in $\mathcal{C}_V(A)^{\perp} \otimes \mathcal{C}(Q_{C_2'}C_1')$, and the residual $R_1$ in the part corresponding to $\mathcal{C}(C_2' : C_1')^{\perp}$.
Martin Singull 35/48
Interpretation of Residuals
The expressions for the residuals $R_1$ and $R_2$ have a clear meaning.

The first residual

$$R_1 = X\left(I - P_{C_2':C_1'}\right)$$

is the difference between the observations $X$ and the mean $XP_{C_2':C_1'}$; it is the observations minus the means for each group, and therefore it tells us how far the observations are from their group means.

More specifically, it gives us information about the between-individuals assumptions in a given group.

Therefore, it can be used to detect observations which deviate from the others, without taking any model assumption into account.
Martin Singull 36/48
Furthermore,

$$R_1 = P^V_A X\left(I - P_{C_2':C_1'}\right) + (I - P^V_A)X\left(I - P_{C_2':C_1'}\right) = R_{11} + R_{12},$$

and we see that $R_{11}$ is the difference between the observations $X$ and the mean $XP_{C_2':C_1'}$ relative to the within-individuals model.

It could therefore be useful for detecting whether observations do not follow the "within-individuals" model assumptions.

Similarly, $R_{12}$ is the difference between the observations $X$ and the mean $XP_{C_2':C_1'}$ relative to the case where the within-individuals model assumptions do not hold.
Martin Singull 37/48
For the second residual $R_2$ we have

$$R_2 = XP_{C_2':C_1'} - \underbrace{\left[A\hat{B}_1C_1 + \hat{B}_2C_2\right]}_{=\hat{X}}.$$

It is the observed mean $XP_{C_2':C_1'}$ minus the estimated mean model $A\hat{B}_1C_1 + \hat{B}_2C_2 = \hat{X}$, i.e., the fitted values.

It tells us how well the estimated mean model fits the observed mean. More specifically, it gives us information about the within-individuals assumptions.

Therefore, it could give some information about the appropriateness of the model assumptions about the mean structure.
Martin Singull 38/48
Properties of Residuals
With the assumption of normality, the following results show that the residuals are symmetrically distributed around zero. We have

$$E(R_1) = (AB_1C_1 + B_2C_2)\left(Q_{C_2'} - P_{Q_{C_2'}C_1'}\right) = (AB_1C_1 + B_2C_2)Q_{C_2'} - (AB_1C_1 + B_2C_2)P_{Q_{C_2'}C_1'} = AB_1C_1Q_{C_2'} - AB_1C_1Q_{C_2'} = 0.$$

For the residual $R_2$, since $V$ and $XP_{Q_{C_2'}C_1'}$ are independent, we have

$$E(R_2) = E\left(P^{V^{-1}}_{A^{\circ}}XP_{Q_{C_2'}C_1'}\right) = E\left(P^{V^{-1}}_{A^{\circ}}\,E\left[XP_{Q_{C_2'}C_1'}\right]\right) = 0.$$
Martin Singull 39/48
Theorem. Let $R_1$ and $R_2$ be the residuals defined above. Then

(i) $D(R_1) = \left(I - P_{C_1':C_2'}\right) \otimes \Sigma$,

(ii) $D(R_2) = P_{Q_{C_2'}C_1'} \otimes \left[\dfrac{p - r(A)}{n - (r_1 + r_2) - p + r(A) - 1}\right]A\left(A'\Sigma^{-1}A\right)^{-}A'$.

Theorem. Let $R_1$ and $R_2$ be the residuals defined above. Then

(i) $\operatorname{cov}(R_1, R_2) = 0$,

(ii) $\operatorname{cov}(R_i, \hat{X}) = 0$ for $i = 1, 2$,

(iii) $\operatorname{cov}(R_i, \hat{B}_j) = 0$ for $i, j = 1, 2$.
Martin Singull 40/48
Simulation Study
In this simulation study we simulate the residuals $R_1$ and $R_2$ for the GMANOVA-MANOVA model.

For simplicity, assume two groups of equal size, where in each group half of the individuals have received a pretreatment. For this study we choose $n = 20$.

Hence, we simulate $X$ as above $N$ times and calculate the Frobenius norm of the mean of the residuals.
Martin Singull 41/48
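A sketch of such a simulation in NumPy. The true parameter values, the covariance $\Sigma = 2I + 0.5\mathbf{1}\mathbf{1}'$ and $N = 500$ replicates are made up for illustration; since $E(R_1) = E(R_2) = 0$, the Frobenius norms of the averaged residuals should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, N_rep = 4, 20, 500

# Made-up true parameters; two equal groups, half of each group pretreated
A = np.column_stack([np.ones(p), [0.0, 6.0, 12.0, 18.0]])
C1 = np.zeros((2, n)); C1[0, :10] = 1; C1[1, 10:] = 1
C2 = np.tile([1.0, 0.0], n // 2)[None, :]
B1 = np.array([[12.0, 12.5], [-0.2, -0.1]])
B2 = np.array([[1.0], [1.2], [1.1], [0.9]])
L = np.linalg.cholesky(2.0 * np.eye(p) + 0.5)      # Sigma = 2I + 0.5*ones

I_n, I_p = np.eye(n), np.eye(p)
def proj(M):                                        # orthogonal projector onto C(M)
    return M @ np.linalg.pinv(M.T @ M) @ M.T
Q2 = I_n - proj(C2.T)
P_W = proj(Q2 @ C1.T)
P12 = proj(np.vstack([C2, C1]).T)

R1_sum = np.zeros((p, n)); R2_sum = np.zeros((p, n))
for _ in range(N_rep):
    X = A @ B1 @ C1 + B2 @ C2 + L @ rng.normal(size=(p, n))
    V = X @ Q2 @ (I_n - P_W) @ Q2 @ X.T
    Vi = np.linalg.inv(V)
    P_A_V = A @ np.linalg.pinv(A.T @ Vi @ A) @ A.T @ Vi
    R1_sum += X @ (I_n - P12)
    R2_sum += (I_p - P_A_V) @ X @ P_W

# Both residuals have expectation zero, so the mean norms should be small
frob1 = np.linalg.norm(R1_sum / N_rep)
frob2 = np.linalg.norm(R2_sum / N_rep)
assert frob1 < 1.0 and frob2 < 1.0
```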
Martin Singull 42/48
Example - Multiple sclerosis, cont.
I A study was conducted to investigate two treatments for patientssuffering from MS.
I 79 sufferers of the disease were recruited into the study.
I 44 were randomized to receive azathioprine alone (group 1).
I 35 were randomized to receive azathioprine + methylprednisolone (group 2).
I For each participant, a measure of autoimmunity, AFCR, wasplanned at clinic visits at baseline (time 0, at initiation of thetreatment) and at 6, 12, and 18 months thereafter.
I MS affects the immune system: low values of AFCR are evidence that immunity is improving, which is hopefully associated with a better prognosis for sufferers of MS.
I Also recorded for each subject was an indicator of whether or notthe subject had previous treatment (0=no, 1=yes).
Martin Singull 43/48
One model could be given as
X = AB1C 1 + B2C 2 + E ,
where E ∼ N4,79 (0,Σ, I ) with
$$A = \begin{pmatrix} 1 & 0 \\ 1 & 6 \\ 1 & 12 \\ 1 & 18 \end{pmatrix}, \qquad B_1: 2 \times 2, \qquad C_1 = \begin{pmatrix} \mathbf{1}'_{44} & \mathbf{0}'_{35} \\ \mathbf{0}'_{44} & \mathbf{1}'_{35} \end{pmatrix}$$

and

$$B_2: 4 \times 2, \qquad C_2 = \begin{pmatrix} \mathbf{0}'_{19} & \mathbf{1}'_{25} & \mathbf{0}'_{11} & \mathbf{0}'_{24} \\ \mathbf{0}'_{19} & \mathbf{0}'_{25} & \mathbf{0}'_{11} & \mathbf{1}'_{24} \end{pmatrix}.$$
Martin Singull 44/48
The estimates are given by
$$\hat{B}_1 = \begin{pmatrix} 12.5537 & 12.4220 \\ -0.1639 & -0.2117 \end{pmatrix}, \qquad \hat{B}_2 = \begin{pmatrix} 0.9903 & 0.9030 \\ 1.2536 & 1.0026 \\ 1.1368 & 0.7646 \\ 1.0720 & 0.8725 \end{pmatrix}$$

and

$$\hat{\Sigma} = \begin{pmatrix} 5.2368 & 2.1029 & 1.7694 & 2.5863 \\ 2.1029 & 4.7146 & 2.7628 & 2.3344 \\ 1.7694 & 2.7628 & 5.0649 & 2.7770 \\ 2.5863 & 2.3344 & 2.7770 & 5.3530 \end{pmatrix}.$$
Martin Singull 45/48
Histograms of the residuals $R_1$ and $R_2$ can also be computed:
Martin Singull 46/48
Martin Singull 47/48
Conclusions
I In regression analysis, it is well known that the residuals in the univariate linear model are symmetrically distributed around zero and uncorrelated with the fitted model.
I For the residuals in the GCM, von Rosen (1995) has shown that they are symmetrically distributed around zero, and he obtained moment expressions.
I Similar results can be obtained for the GMANOVA-MANOVA model.
I Similar results have been obtained by Seid Hamid and von Rosen (2006) for an EGCM with the nested subspace condition $\mathcal{C}(C_2') \subseteq \mathcal{C}(C_1')$.
I We recall that in our model, we assume no nested subspace condition $\mathcal{C}(C_2') \subseteq \mathcal{C}(C_1')$.
Martin Singull 48/48
Linköping University - Research that makes a difference