BIOL 582 Lecture Set 8 Two-factor Models


Page 1: BIOL 582

BIOL 582

Lecture Set 8

Two-factor Models

Page 2: BIOL 582

BIOL 582 Disclaimer

• We learned that ANOVA can be thought of as the comparison of errors between two models (one of which is the “Null” model)

• This is a useful way of thinking about ANOVA

• In fact, from this point on, this course is a severe departure from how different ANOVA models are typically presented

• Most courses/texts use always-changing definitions of how to calculate sums of squares.

• Here is a link for how one person (like many) approaches multi-factor ANOVA: link

• We will not worry about formulas; rather we will worry about defining models and “sub-models”, and calculating sums of squares from comparisons of SSE
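The model-comparison view described above can be sketched in R. This is a minimal illustration on simulated data, not the course data: the variables `y` and `g` and the helper `sse()` are hypothetical names introduced here. It shows that the sum of squares for a factor equals the SSE of the null model minus the SSE of the model that includes the factor.

```r
# Simulated one-factor data (hypothetical; not the course data)
set.seed(1)
g  <- factor(rep(c("a", "b", "c"), times = c(8, 10, 12)))
mu <- c(1.0, 1.5, 2.5)                       # assumed group means
y  <- mu[as.integer(g)] + rnorm(length(g))

# Helper (an assumption for illustration): SSE from model residuals
sse <- function(fit) sum(residuals(fit)^2)

null.fit <- lm(y ~ 1)   # "null" model: intercept only
full.fit <- lm(y ~ g)   # model that includes the factor

# Sum of squares for g as a difference in model error
ss.g <- sse(null.fit) - sse(full.fit)
ss.g

# The same quantity as reported in R's ANOVA table
anova(full.fit)["g", "Sum Sq"]
```

The two printed values should agree up to floating-point error, which is the sense in which ANOVA compares the errors of two models.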

Page 3: BIOL 582

BIOL 582 Two-factor Model Set-up

• Often, biological research is concerned with multiple factors that might explain the variation of a response variable

• Consider the pupfish data. There are two factors: sex and population

• A two-factor ANOVA is one which allows for the comparison of relative strengths of each factor to explain response variation

• As we will see, these factors are additive; they are decomposed parts of a larger factor

• First, consider this linear equation

• Which has the model

• That produces error
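The equations on this slide did not survive in the text. A plausible reconstruction in standard notation is sketched below; the indices (i for levels of factor A, j for levels of factor B, k for individuals) and the symbols α and β for the two effects are assumptions, chosen to agree with the SSE subscripts used in the later slides.

```latex
\begin{align*}
% Linear equation: response = intercept + effect of A + effect of B + error
y_{ijk} &= \mu + \alpha_i + \beta_j + \varepsilon_{ijk} \\
% Model: predictions from the estimated parameters
\hat{y}_{ijk} &= \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j \\
% Error: sum of squared residuals of the full (A and B) model
SSE_{A,B} &= \sum_{i,j,k} \left( y_{ijk} - \hat{y}_{ijk} \right)^2
\end{align*}
```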

Page 4: BIOL 582

BIOL 582 Two-factor Model Set-up

• It is easy to see all possible “sub-models” (reduced models) of the full model. They are shown here in terms of decreasing complexity

• Imagine that for every model, the SSE can be obtained easily (from residuals of predictions made by estimated model parameters). There are four sets of SSE from the four different models

• From the model containing: both factors; factor A only; factor B only; only the intercept

• All models contain an intercept
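The four models and their SSE values can be sketched in R as follows. The data are simulated stand-ins (the names `A`, `B`, `y`, and the helper `sse()` are hypothetical, and the effect sizes are arbitrary); the pupfish data themselves are not used here.

```r
# Simulated, unbalanced two-factor data (hypothetical stand-in for the pupfish data)
set.seed(1)
n <- 60
A <- factor(sample(c("F", "M"), n, replace = TRUE))
B <- factor(sample(c("pop1", "pop2"), n, replace = TRUE))
y <- 1 + 0.8 * (A == "M") + 0.2 * (B == "pop2") + rnorm(n)

# Helper (an assumption for illustration): SSE from model residuals
sse <- function(fit) sum(residuals(fit)^2)

# Full model and its three sub-models, in decreasing complexity
fits <- list(
  AB   = lm(y ~ A + B),  # both factors
  A    = lm(y ~ A),      # factor A only
  B    = lm(y ~ B),      # factor B only
  null = lm(y ~ 1)       # intercept only
)

# One SSE per model
sse.all <- sapply(fits, sse)
sse.all
```

Because the sub-models are nested within the full model, the SSE cannot decrease when a factor is removed.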

Page 5: BIOL 582

BIOL 582 Two-factor Model Set-up

• This is where understanding linear models makes this a whole lot easier! Let’s concern ourselves with factor A first. What is the sum of squares between levels of factor A (with respect to factor B)?

• There are two ways to do this!

• Type I SS (Sequential)

• Consider the null hypothesis, $H_0: SS_A = 0$ *, which means factor A has no effect

• Then it should be true that $SSE_\mu = SSE_A$ is another way to say the same thing, because if factor A is meaningless, there would be no improvement over the null model if we included it. Thus, $SSE_\mu - SSE_A$ is a measure of model improvement because of factor A.

• Likewise, $SSE_A - SSE_{A,B}$ is a measure of improvement because of factor B

* Actually, it is more appropriate to state the null that the effect, α, is equal to 0, as it is a parameter that has some real value, estimated from the observed data. $SS_A$ is not a parameter (although it contributes to a population parameter, σ², among population means).
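A minimal sketch of the sequential (Type I) calculation, assuming simulated data and hypothetical names (`A`, `B`, `y`, `sse()`): each factor's sum of squares is the drop in SSE when that factor is added to the model built so far, and the results can be checked against `anova()`.

```r
# Simulated, unbalanced two-factor data (hypothetical; not the pupfish data)
set.seed(1)
n <- 60
A <- factor(sample(c("F", "M"), n, replace = TRUE))
B <- factor(sample(c("pop1", "pop2"), n, replace = TRUE))
y <- 1 + 0.8 * (A == "M") + 0.2 * (B == "pop2") + rnorm(n)

sse <- function(fit) sum(residuals(fit)^2)  # illustrative helper

fit.AB <- lm(y ~ A + B)
sse.mu <- sse(lm(y ~ 1))   # intercept only
sse.A  <- sse(lm(y ~ A))   # factor A only
sse.AB <- sse(fit.AB)      # both factors

# Sequential SS: add A to the null model, then add B to the A model
typeI <- c(A = sse.mu - sse.A, B = sse.A - sse.AB)
typeI

# anova() on an lm fit reports sequential sums of squares
anova(fit.AB)[c("A", "B"), "Sum Sq"]
```

Note that the order of terms matters here: fitting `y ~ B + A` would instead add B to the null model first.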

Page 6: BIOL 582

BIOL 582 Two-factor Model Set-up

• This is where understanding linear models makes this a whole lot easier! Let’s concern ourselves with factor A first. What is the sum of squares between levels of factor A (with respect to factor B)?

• There are two ways to do this!

• Type III SS (Weighted)

• Consider the null hypothesis, $H_0: SS_A = 0$, which means factor A has no effect

• Then it should be true that $SSE_B = SSE_{A,B}$ is another way to say the same thing, because if factor A is meaningless, there would be no change in model error by excluding it. Thus, $SSE_B - SSE_{A,B}$ is a measure of model detriment by excluding factor A, and this detriment is, therefore, the effect of factor A.

• Likewise, $SSE_A - SSE_{A,B}$ is a measure of the effect of factor B
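A minimal sketch of the Type III calculation for a model without interactions, again on simulated data with hypothetical names (`A`, `B`, `y`, `sse()`): each factor's sum of squares is the increase in SSE when that factor alone is removed from the full model, which is what `drop1()` reports.

```r
# Simulated, unbalanced two-factor data (hypothetical; not the pupfish data)
set.seed(1)
n <- 60
A <- factor(sample(c("F", "M"), n, replace = TRUE))
B <- factor(sample(c("pop1", "pop2"), n, replace = TRUE))
y <- 1 + 0.8 * (A == "M") + 0.2 * (B == "pop2") + rnorm(n)

sse <- function(fit) sum(residuals(fit)^2)  # illustrative helper

fit.AB <- lm(y ~ A + B)
sse.A  <- sse(lm(y ~ A))   # factor A only
sse.B  <- sse(lm(y ~ B))   # factor B only
sse.AB <- sse(fit.AB)      # both factors

# Detriment from excluding each factor from the full model
typeIII <- c(A = sse.B - sse.AB, B = sse.A - sse.AB)
typeIII

# drop1() performs the same single-term deletions
drop1(fit.AB, test = "F")[c("A", "B"), "Sum of Sq"]
```

For the last term in the model (B here), the Type I and Type III values coincide, since both compare the A-only model with the full model.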

Page 7: BIOL 582

BIOL 582 Two-factor Model Set-up

• This is where understanding linear models makes this a whole lot easier! Let’s concern ourselves with factor A first. What is the sum of squares between levels of factor A (with respect to factor B)?

• There are two ways to do this!

• What, no type II?

• There are actually ~6 types of SS. Some (IV-VI) concern missing data. Type II will be explained later; its distinction from types I and III only becomes apparent once factor interactions are introduced.

• Types IV-VI will not be discussed (beyond scope of this class)

Page 8: BIOL 582

BIOL 582 Two-factor Model Hypotheses

Null Alternative Base Statistic

SSM = SST - SSE

SSA

SSB

M means “model” – it refers to all effects of the model

In general, it is better to use variance in the null hypothesis statement, as effects might contain several parameters. Also, when the null is stated in terms of variance, the alternative hypothesis is always one-tailed.
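The base statistic SSM = SST - SSE can be sketched in R on simulated data (the names `A`, `B`, and `y` are hypothetical). The sketch also shows that the sequential (Type I) sums of squares for A and B add up to SSM; with unbalanced data the Type III values generally do not.

```r
# Simulated, unbalanced two-factor data (hypothetical; not the pupfish data)
set.seed(1)
n <- 60
A <- factor(sample(c("F", "M"), n, replace = TRUE))
B <- factor(sample(c("pop1", "pop2"), n, replace = TRUE))
y <- 1 + 0.8 * (A == "M") + 0.2 * (B == "pop2") + rnorm(n)

fit <- lm(y ~ A + B)

sst      <- sum((y - mean(y))^2)    # total SS = SSE of the intercept-only model
sse.full <- sum(residuals(fit)^2)   # SSE of the full model
ssm      <- sst - sse.full          # SS for the whole model ("M")

# Sequential SS for the two factors
typeI <- anova(fit)[c("A", "B"), "Sum Sq"]

c(SSM = ssm, sum.of.typeI = sum(typeI))
```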

Page 9: BIOL 582

BIOL 582 Two-factor Model Uses and Assumptions

• There is not much good use for a two-factor ANOVA except to introduce how to use ANOVA with multiple factors. The next step (next lecture) will be to understand factor interactions. Many research designs use multiple factors with interactions.

• Assumptions include

    • Normally distributed residuals (not data)

    • Homoscedasticity

    • Independent observations (i.e., samples don’t contain multiple measurements on the same subjects; different samples or treatments do not contain the same subjects)

• These are the assumptions of Linear Models!
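The first two assumptions above can be examined from the residuals of the fitted model. The following is a rough sketch on simulated data with hypothetical names (`A`, `B`, `y`); the choice of the Shapiro-Wilk and Bartlett tests is an assumption made here for illustration, not something prescribed by the lecture. Independence is a property of the study design and is not checked by this code.

```r
# Simulated, unbalanced two-factor data (hypothetical; not the pupfish data)
set.seed(1)
n <- 60
A <- factor(sample(c("F", "M"), n, replace = TRUE))
B <- factor(sample(c("pop1", "pop2"), n, replace = TRUE))
y <- 1 + 0.8 * (A == "M") + 0.2 * (B == "pop2") + rnorm(n)

fit <- lm(y ~ A + B)
res <- residuals(fit)   # the assumptions concern residuals, not raw data

# Normality of residuals (Shapiro-Wilk test)
sw <- shapiro.test(res)

# Homoscedasticity: compare residual variance among the A-by-B cells
bt <- bartlett.test(res, interaction(A, B))

c(shapiro.p = sw$p.value, bartlett.p = bt$p.value)
```

Small P-values would suggest a departure from the corresponding assumption; plots of residuals against fitted values are a common complement to such tests.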

Page 10: BIOL 582

BIOL 582 Two-factor Model Evaluation

• Summary of ANOVA for two factors (excluding interactions of factors)

• Type I (Sequential); values in blue are only necessary for determining P-values from the F distribution.

Source   SS                   df                  MS                         F
A        SSE_μ - SSE_A        k_A                 MS_A = SS_A/df_A           MS_A/MSE
B        SSE_A - SSE_{A,B}    k_B                 MS_B = SS_B/df_B           MS_B/MSE
error    SSE_{A,B}            n - k_A - k_B - 1   MSE = SSE_{A,B}/df_error

• Type III (Weighted)

Source   SS                   df                  MS                         F
A        SSE_B - SSE_{A,B}    k_A                 MS_A = SS_A/df_A           MS_A/MSE
B        SSE_A - SSE_{A,B}    k_B                 MS_B = SS_B/df_B           MS_B/MSE
error    SSE_{A,B}            n - k_A - k_B - 1   MSE = SSE_{A,B}/df_error

• k is the number of parameters (coefficients) needed for the effect
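The Type I table above can be computed by hand from the SSE values. This is a sketch on simulated data with hypothetical names (`A`, `B`, `y`, `sse()`); it assumes that k for a factor is its number of levels minus one, which is the number of coefficients R uses for a factor under its default coding.

```r
# Simulated, unbalanced two-factor data (hypothetical; not the pupfish data)
set.seed(1)
n <- 60
A <- factor(sample(c("F", "M"), n, replace = TRUE))
B <- factor(sample(c("pop1", "pop2"), n, replace = TRUE))
y <- 1 + 0.8 * (A == "M") + 0.2 * (B == "pop2") + rnorm(n)

sse <- function(fit) sum(residuals(fit)^2)  # illustrative helper

fit.AB <- lm(y ~ A + B)
sse.mu <- sse(lm(y ~ 1))
sse.A  <- sse(lm(y ~ A))
sse.AB <- sse(fit.AB)

# Degrees of freedom: k coefficients per effect, remainder for error
k.A      <- nlevels(A) - 1
k.B      <- nlevels(B) - 1
df.error <- n - k.A - k.B - 1

# Sequential SS, mean squares, and F ratios
ss   <- c(A = sse.mu - sse.A, B = sse.A - sse.AB)
df   <- c(A = k.A, B = k.B)
ms   <- ss / df
mse  <- sse.AB / df.error
Fval <- ms / mse

# P-values from the upper tail of the F distribution
pval <- pf(Fval, df, df.error, lower.tail = FALSE)

data.frame(SS = ss, df = df, MS = ms, F = Fval, P = pval)
```

Replacing the SS for A with `sse(lm(y ~ B)) - sse.AB` would give the Type III version of the same table.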

Page 11: BIOL 582

BIOL 582 Two-factor Model Evaluation

• Example from pupfish-parasite data in R (ignore AIC values for now)

> lm.sex.pop<-lm(log.grubs~SEX+POPULATION)
> anova(lm.sex.pop) # Type I SS
Analysis of Variance Table

Response: log.grubs
            Df  Sum Sq Mean Sq F value   Pr(>F)   
SEX          1  15.554 15.5543  9.4775 0.002685 **
POPULATION   1   1.176  1.1762  0.7167 0.399264   
Residuals  100 164.119  1.6412                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> drop1(lm.sex.pop,test="F") # Type III SS
Single term deletions

Model:
log.grubs ~ SEX + POPULATION
           Df Sum of Sq    RSS    AIC F value    Pr(F)   
<none>                  164.12 53.984                    
SEX         1   16.6425 180.76 61.932 10.1405 0.001934 **
POPULATION  1    1.1762 165.29 52.719  0.7167 0.399264   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
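The two outputs above can be tied back to the SSE differences with a little arithmetic. The sketch below uses only the rounded values printed in the output (the variable names are introduced here for illustration), so the results match the printed sums of squares only to within rounding.

```r
# Values copied from the printed output above (rounded as printed)
sse.AB <- 164.119                    # Residuals row of anova(); <none> row of drop1()
sse.A  <- 165.29                     # RSS after deleting POPULATION (SEX-only model)
sse.B  <- 180.76                     # RSS after deleting SEX (POPULATION-only model)
sse.mu <- 15.554 + 1.176 + 164.119   # total SS: sum of the Type I table

# Type I SS for SEX: improvement over the null model
typeI.sex <- sse.mu - sse.A

# Type III SS for SEX: detriment from excluding SEX from the full model
typeIII.sex <- sse.B - sse.AB

# SS for POPULATION: the same comparison under both types
ss.pop <- sse.A - sse.AB

round(c(typeI.SEX = typeI.sex, typeIII.SEX = typeIII.sex, POPULATION = ss.pop), 3)

# F ratios from the printed mean squares (df = 1 for each factor)
c(F.typeI.SEX = 15.5543 / 1.6412, F.typeIII.SEX = 16.6425 / 1.6412)
```

The Type I and Type III values for SEX differ because the data are unbalanced, while POPULATION, as the last term, gives the same value either way.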

Page 12: BIOL 582

BIOL 582 Multiple Comparisons

• Whether one uses type I or type III sums of squares is not an issue, because the SSE (called RSS in R) of the full model is the same.

• Multiple comparison tests like Tukey’s HSD use the SSE of the full model to calculate standard error.

• However, multiple comparisons with a two-factor model are generally unenlightening

• Example from pupfish-parasite data in R

> pop<-factor(POPULATION)
> sex<-factor(SEX)
> aov.two.factor<-aov(log.grubs~sex+pop)
> aov.two.factor
Call:
   aov(formula = log.grubs ~ sex + pop)

Terms:
                      sex       pop Residuals
Sum of Squares   15.55428   1.17617 164.11883
Deg. of Freedom         1         1       100

Residual standard error: 1.281089
Estimated effects may be unbalanced

> TukeyHSD(aov.two.factor)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = log.grubs ~ sex + pop)

$sex
         diff       lwr      upr    p adj
M-F 0.7938817 0.2822641 1.305499 0.002685

$pop
         diff        lwr       upr     p adj
2-1 0.2101225 -0.2919093 0.7121544 0.4083022