SADC Course in Statistics
Analysis of Variance with two factors
(Session 13)
2To put your footer here go to View > Header and Footer
Learning Objectives
At the end of this session, you will be able to
• understand and interpret the components of a linear model with two categorical factors
• fit a model involving two factors, interpret the output and present the results
• understand the difference between raw means and adjusted means
• appreciate that a residual analysis is the same with more complex models
Using Paddy again!
In the paddy example, there were two categorical factors, variety and village.
Here we will look at a model including both factors and the corresponding output.
We will also discuss assumptions associated with anova models with categorical factors and procedures to check these assumptions.
A model using two factors
The objective here is to compare paddy yields across the 3 varieties and also across the 4 villages.
A linear model for this takes the form:
$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$
Here $\beta_0$ represents a constant, and the $g_j$ ($j = 1, 2, 3$) represent the variety effect as before.
We also have the term $v_i$ ($i = 1, 2, 3, 4$) to represent the village effect.
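One way to see what the model terms mean is to write out the indicator (dummy) coding for a single observation. The sketch below assumes the first level of each factor is the baseline, with its effect set to zero; the function name `design_row` and the level orderings are illustrative choices, not part of the course material.

```python
# Factor levels from the paddy example. Treating the first level of each
# factor as the baseline is an assumption made for this sketch.
VILLAGES = ["KESEN", "NANDA", "NIKO", "SABEY"]
VARIETIES = ["New", "Old", "Trad"]


def design_row(village, variety):
    """Return the indicator row [1, v2, v3, v4, g2, g3] for one plot."""
    v_dummies = [1 if village == v else 0 for v in VILLAGES[1:]]
    g_dummies = [1 if variety == g else 0 for g in VARIETIES[1:]]
    return [1] + v_dummies + g_dummies


# A traditional-variety plot in Nanda switches on v2 and g3 only.
row = design_row("NANDA", "Trad")
```

Multiplying such a row by the vector of parameters gives the model's expected yield for that village and variety.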
Anova results
Source     d.f.    S.S.     M.S.      F     Prob.
Village      3    13.91     4.64    14.0    0.000
Variety      2    25.68    12.84    38.7    0.000
Residual    30     9.95     0.3318
Total       35    49.55
The above is a two-way anova, since there are two factors explaining the variability in paddy yields.
Again the Residual M.S. ($s^2$ = 0.3318) describes the variation not explained by village and variety.
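The M.S. and F columns follow directly from the d.f. and S.S. columns. The sketch below redoes that arithmetic using the rounded values printed in the table, so results can differ from the slide in the last decimal place; the variable names are illustrative.

```python
# d.f. and S.S. copied from the anova table (rounded values).
anova = {
    "Village": {"df": 3, "ss": 13.91},
    "Variety": {"df": 2, "ss": 25.68},
    "Residual": {"df": 30, "ss": 9.95},
}


def mean_square(term):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return anova[term]["ss"] / anova[term]["df"]


# The residual mean square is the estimate of the error variance s^2.
residual_ms = mean_square("Residual")

# Each F ratio compares a factor's mean square with the residual mean square.
f_ratios = {
    term: mean_square(term) / residual_ms for term in ("Village", "Variety")
}

for term, f in f_ratios.items():
    print(f"{term}: M.S. = {mean_square(term):.2f}, F = {f:.1f}")
```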
Sample sizes
--------+-----------------------+-------
        |        Variety        |
Village |   New    Old    Trad  | Total
--------+-----------------------+-------
KESEN   |    0      3      4    |    7
NANDA   |    2      7      5    |   14
NIKO    |    0      2      3    |    5
SABEY   |    2      5      3    |   10
--------+-----------------------+-------
Total   |    4     17     15    |   36
--------+-----------------------+-------
The table above shows that the data are not balanced. Hence we need to worry about the order in which terms are fitted. How then should we interpret the sequential S.S. shown in the Anova results table?
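A quick check of balance can be done from the cell counts alone. The sketch below is one simple way to do it, treating "balanced" as "every village by variety cell has the same number of observations"; the variable names are illustrative.

```python
# Cell counts copied from the sample size table.
counts = {
    "KESEN": {"New": 0, "Old": 3, "Trad": 4},
    "NANDA": {"New": 2, "Old": 7, "Trad": 5},
    "NIKO": {"New": 0, "Old": 2, "Trad": 3},
    "SABEY": {"New": 2, "Old": 5, "Trad": 3},
}

# The design is balanced only if all cells hold the same count.
cell_counts = [n for row in counts.values() for n in row.values()]
is_balanced = len(set(cell_counts)) == 1

# Marginal totals, matching the Total row and column of the table.
village_totals = {v: sum(row.values()) for v, row in counts.items()}
variety_totals = {
    g: sum(row[g] for row in counts.values()) for g in ("New", "Old", "Trad")
}

# Combinations with no observations at all.
empty_cells = [
    (v, g) for v, row in counts.items() for g, n in row.items() if n == 0
]
```

Here the new variety was not grown at all in two of the villages, which is one reason the village and variety effects cannot be separated as cleanly as in a balanced design.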
Anova with adjusted SS and MS
Source     d.f.   Adj.S.S.   Adj.M.S.     F     Prob.
Village      3      4.32       1.44      4.34   0.012
Variety      2     25.68      12.84     38.7    0.000
Residual    30      9.95       0.3318
Total       35     49.55
How may the above results be interpreted?
What are your conclusions?
Model estimates
Parameter              Coeff.   Std.error     t     t prob
$\beta_0$: constant     5.284     0.386     13.7    0.000
$v_2$ (Nanda)           0.718     0.272      2.63   0.013
$v_3$ (Niko)           -0.179     0.337     -0.53   0.599
$v_4$ (Sabey)           0.633     0.294      2.16   0.039
$g_2$ (old)            -1.201     0.327     -3.67   0.001
$g_3$ (trad)           -2.614     0.340     -7.68   0.000
What do these results tell us?
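One way to read the estimates is to combine them into fitted yields for particular village and variety combinations. The sketch below assumes the baseline levels (effect zero) are Kesen for village and the new variety, which is consistent with $v_1$ and $g_1$ being absent from the table; the function name `fitted_yield` is illustrative.

```python
# Estimates copied from the table; baseline levels are given effect 0.0.
# Taking KESEN and New as the baselines is an assumption of this sketch.
coef = {
    "constant": 5.284,
    "village": {"KESEN": 0.0, "NANDA": 0.718, "NIKO": -0.179, "SABEY": 0.633},
    "variety": {"New": 0.0, "Old": -1.201, "Trad": -2.614},
}


def fitted_yield(village, variety):
    """Fitted value = constant + village effect + variety effect."""
    return coef["constant"] + coef["village"][village] + coef["variety"][variety]


# The constant on its own is the fitted yield for the two baseline levels.
baseline = fitted_yield("KESEN", "New")

# Old improved variety grown in Nanda: 5.284 + 0.718 - 1.201.
nanda_old = fitted_yield("NANDA", "Old")
```

Because the model has no interaction term, the difference between two varieties is the same in every village.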
Relating estimates to means
Again: Old - New = -1.201 = Estimate of $g_2$
Trad - New = -2.614 = Estimate of $g_3$
This is similar to the case with one categorical factor: comparisons with the “base” level can be made easily using the model estimates.
But when sample sizes are unequal across the two categorical factors, results should be reported in terms of adjusted means!
Raw means and adjusted means
Raw means:
Variety         Sample size (n)   Raw mean   Std.error (s.d./√n)
New improved           4            5.96          0.128
Old improved          17            4.54          0.173
Traditional           15            3.00          0.168

Model based summaries (adjusted means):
Variety         Adjusted mean   Std.error (s/√n)
New improved        5.58             0.308
Old improved        4.38             0.148
Traditional         2.96             0.150
Computing adjusted means
The model equation
$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$
can be used to find the variety adjusted means,
e.g. the adjusted mean for the traditional variety is:
$\hat{\beta}_0 + \frac{\hat{v}_1 + \hat{v}_2 + \hat{v}_3 + \hat{v}_4}{4} + \hat{g}_3$
= 5.284 + 0.25[0 + 0.718 - 0.179 + 0.633] - 2.614
= 2.963
Thus the variety adjusted mean is an average over the 4 villages.
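The same calculation can be repeated for all three varieties. The sketch below uses the model estimates given earlier, with the baseline village effect taken as zero, and averages the village effects with equal weight; the variable names are illustrative.

```python
# Model estimates; the first village is the baseline, with effect 0.0.
constant = 5.284
village_effects = [0.0, 0.718, -0.179, 0.633]
variety_effects = {
    "New improved": 0.0,
    "Old improved": -1.201,
    "Traditional": -2.614,
}

# Average the village effects, giving each of the 4 villages equal weight.
mean_village_effect = sum(village_effects) / len(village_effects)

# Adjusted mean = constant + average village effect + variety effect.
adjusted_means = {
    variety: constant + mean_village_effect + effect
    for variety, effect in variety_effects.items()
}

for variety, mean in adjusted_means.items():
    print(f"{variety}: {mean:.2f}")
```

To two decimals this gives 5.58, 4.38 and 2.96, the values in the adjusted means table. Unlike the raw means, these do not depend on how many plots of each variety happened to fall in each village.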
Checking model assumptions
The anova model with two categorical factors is:
$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$
Model assumptions are associated with the $\varepsilon_{ij}$.
These are checked in exactly the same way as before.
A residual analysis is done, looking at plots of residuals in various ways.
We give below a residual analysis for the model fitted above.
Histogram to check normality
Histogram of standardised residuals after fitting a model of yield on village and variety.
[Figure: histogram of standardized residuals. x-axis: Standardized residuals (-2 to 2); y-axis: Density (0 to 0.5)]
A normal probability plot…
Another check on the normality assumption
Do you think the points follow a straight line?
[Figure: normal probability plot. x-axis: Inverse Normal (-2 to 2); y-axis: Standardized residuals (-2 to 2)]
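The coordinates of a normal probability plot can be computed by pairing each sorted residual with the standard normal quantile expected for its rank. The sketch below is one way to do this; the residual values are hypothetical (not the paddy residuals), the plotting position (i - 0.5)/n is one common choice among several, and the function name is illustrative.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # (i - 0.5) / n is an assumed plotting position; packages may use others.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical standardized residuals, for illustration only.
example_residuals = [0.3, -1.4, 1.3, -0.1, 0.5, -0.6]
points = normal_plot_points(example_residuals)

for q, r in points:
    print(f"inverse normal = {q:6.2f}, residual = {r:5.2f}")
```

If the residuals are roughly normal, plotting these pairs gives points close to a straight line.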
Std. residuals versus fitted values
Checking assumption of variance homogeneity, and identification of outliers:
What can you say here about the variance homogeneity assumption?
[Figure: standardized residuals versus fitted values. x-axis: Fitted values (2 to 6); y-axis: Standardized residuals (-2 to 2)]
Finally… know your software
Different software packages impose different constraints on model parameters, so you need to be aware of which constraint your package uses.
For example, Stata and Genstat set the first level of the factor to zero. SPSS and SAS set the last level to zero. Minitab imposes a constraint that sets the sum of the parameter estimates to zero!
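The choice of constraint changes the individual parameter estimates but not the differences between levels. The sketch below re-expresses the variety estimates from this session (first level set to zero) under the other two constraints by simple arithmetic; it does not reproduce the output of any package, and the function names are illustrative.

```python
# Variety effects from this session, with the first level set to zero.
variety_first_zero = {"New": 0.0, "Old": -1.201, "Trad": -2.614}


def to_last_zero(effects):
    """Shift the effects so that the last level becomes zero."""
    last = list(effects.values())[-1]
    return {level: value - last for level, value in effects.items()}


def to_sum_zero(effects):
    """Shift the effects so that they sum to zero."""
    mean = sum(effects.values()) / len(effects)
    return {level: value - mean for level, value in effects.items()}


variety_last_zero = to_last_zero(variety_first_zero)
variety_sum_zero = to_sum_zero(variety_first_zero)

# The Old - New comparison is -1.201 under every parameterization.
for name, effects in [
    ("first level zero", variety_first_zero),
    ("last level zero", variety_last_zero),
    ("sum to zero", variety_sum_zero),
]:
    print(f"{name}: Old - New = {effects['Old'] - effects['New']:.3f}")
```

The constant term shifts by the opposite amount, so fitted values and adjusted means are the same whichever constraint is used.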
Check also whether the software produces sequential or adjusted or some other form of sums of squares. The correct interpretation of anova results would depend on this.
Practical work follows to ensure learning objectives are achieved…