Introduction to Analysis of Varianceacademic.udayton.edu/gregelvers/psy216/pdfs/Chap12.pdfFor the summary table on slide 29, df between treatments = 2, df within treatment = 27, and

1

1

Introduction to Analysis of Variance

2

Experiments with More than 2 Conditions

Often the research that psychologists perform has more conditions than just the control and experimental conditions

You might want to compare the effectiveness of a new treatment to an old treatment and to a placebo control group

Factor – the variable (independent or quasi independent) that designates the groups being compared

Level – the values that the factor can take on

3

Multiple t Tests

How would you analyze data from an experiment with more than two levels?

Given what we know so far, our only real option would be to perform multiple t-tests

The problem with this approach to the data analysis is that you might have to perform many, many t-tests to test all possible combinations of the levels

2

4

Multiple t-Tests

As the number of levels (or conditions) increases, the number of comparisons needed increases more rapidly

# comparisons = (n2 -n) / 2

n = number of levels

5

Why Not Perform Multiple t Tests?

With a computer, it is easy, quick, and error-free to perform multiple t-tests, so why shouldn’t we do it?

The answer lies in αα is the probability of making a Type-I error, or incorrectly rejecting H0 when H0 is true

α applies to a single comparisonIt is correctly called αcomparison wise

6

Why Not Perform Multiple t Tests?

What happens to the probability of making at least one Type-I error as the number of comparisons increases?

It increases too

The probability of making at least one Type-I error across all of your inferential statistical tests is called αfamilywiseor αfw

3

7

αfw

If the probability of making a Type-I error on a given comparison is given by α, what is the probability of making at least 1 Type-I error when you make n comparisons?

8

αfw

αfw = 1 - (1 -α)n

This assumes that the Type I errors are independent

As the number of comparisons increases, the probability of making at least one Type-I error increases rapidly

9

Why Not Multiple t-Tests?

Performing multiple t-tests is bad because it increases the probability that you will make at least one Type-I error

You can never determine which statistically significant results, if any, are probably due to chance

4

10

ANOVA

Part of the solution to this problem rests in a statistical procedure known as the analysis of variance or ANOVA for short

ANOVA replaces the multiple comparisons with a single omnibus null hypothesis

Omnibus -- covers all aspects

In ANOVA, H0 is of the form:H0: µ1 = µ2 = µ3 = …. = µn

That is, H0 is that all the means are equal

11

ANOVA Alternative Hypothesis

Given the H0 and H1 must be both mutually exclusive and exhaustive, what is H1 for ANOVA?Why isn’t this H1?H1: µ1 ≠ µ2 ≠ µ3 ≠…. ≠ µn

In ANOVA, the alternative hypothesis is of the form:H1: not H0 orH1: at least one population mean is different from another

12

Two Estimates of Variance

ANOVA compares two estimates of the variability in the data in order to determine if H0 can be rejected:

Between-treatments varianceAlso called between-groups variance

Within-treatment varianceAlso called within-groups variance

5

13

Within-Treatment Variance

Within-treatment variance is the weighted mean variability within each group or condition

Which of the two sets of distributions to the right has a larger within-treatment variance? Why?

14

Within-Treatment Variance

What causes variability within a treatment?Within-treatment variation is caused by factors that we cannot or did not control in the experiment, such as individual differences between the participants

Do you want within-treatment variability to be large or small?

15

Between-Treatments Variance

Between-treatments variance is a measure of how different the groups are from each other

Which set of distributions has a greater between-treatments variance?

6

16

Sources of Between-Treatments Variance

What causes, or is the source of, between-treatments variance? That is, why are not all the groups identical to each other?

Between-treatments variance is partially caused by the effect that the treatment has on the dependent variable

The larger the effect is, the more different the groups become and the larger between-treatments variance will be

17

Sources of Between-Treatments Variance

Between-treatments variance also measures sampling error

Even if the treatment had no effect on the dependent variable, you still would not expect the distributions to be equal because samples rarely are identical to each other

That is, some of the between-treatments variance is due to the fact that the groups are initially different due to random fluctuations (or errors) in our sampling

18

Variance Summary

Within-treatments variance measures error

Between-treatments variance measures the effect of the treatment on the dependent variable and error

7

19

F Ratio

Fisher’s F ratio is defined as:

20

F Ratio

What should F equal if H0 is true?When H0 is true, then there is no effect of the treatment on the DV

Thus, when H0 is true, we have random, unsystematic differences / random, unsystematic differences which should equal 1

21

F Ratio

What should F be if H0 is not true?When H0 is not true, then there is an effect of the treatment on the DV

Thus, when H0 is not true, F should be larger than 1

8

22

F Ratio

The denominator of the F ratio is called the error term or error variance.

It provides a measure of the variance due to random, unsystematic differences

23

How Big Does F Have to Be?

Several factors influence how big F has to be in order to reject H0

To reject H0, the calculated F must be larger than the critical F which can be found in a table

Anything that makes the critical F value large will make it more difficult to reject H0

24

Factors That Influence the Size of the Critical F

α level -- as α decreases, the size of the critical F increases

Sample size -- as the sample size increase, the degrees of freedom for the within-treatments variance increases, and the size of the critical F decreases

As the sample becomes larger, it becomes more representative of the population

9

25

Factors That Influence the Size of the Critical F

Number of conditions -- as the number of conditions increase, so does the degrees of freedom for the between-treatments variance term

For any reasonable sample size, the size of the critical F will decrease as the number of conditions increases

26

Factors That Influence the Size of the Observed F

Experimental Control -- as experimental control increases, within-treatments variance decreases, and the observed Fincreases

ANOVA Example

Students used one of three types of mnemonics to learn a list of words: rehearsal, visual imagery, peg word. The number of words recalled in the correct order was measured.There were 10 students in each level of the factor

27

10

ANOVA Example

28

Rehearsal Visual Imagery Peg Word

1094184

35246

52332

363418

1111191519

1915192020

n = 10=6.5ΣX = 65ΣX2 = 627

SSReh= 627 – 652 / 10= 204.5

n = 10=4.9ΣX = 49ΣX2 = 445

SSVis = 445 – 492 / 10= 204.9

n = 10=16.8ΣX = 168ΣX2 = 2936

SSPeg= 2936 – 1682 / 10= 113.6

N = 30=9.4

ΣX= 282 = 65+49+169ΣX2=4008=627+445+2936SSTotal = 4008 – 2822 / 30

= 1357.2

29

ANOVA Summary Table

SS = sum of squares -- the numerator of the variance formula = Σ(X-)2 = ΣX2 – (ΣX)2 / n

If n1 = n2 = n3 = …SSBetween= n · SSMΣM = 6.5 + 4.9 + 16.8 = 28.2ΣM2 = 6.52 + 4.92 + 16.82 = 348.5SSBetween= 10 · (348.5 – 28.22 / 3) = 834.2

SS df MS F p

Between treatments 834.2 2 417.1 21.53 .000

Within treatment 523.0 27 19.37

Total 1357.2 29

30

ANOVA Summary Table

SS df MS F p



Total 1357.2 29

SSWithin = SSRehearsal+ SSVisual Imagery+ SSPeg Word

= 204.5 + 204.9 + 113.6= 523.0

SSTotal = SSBetween+ SSWithin = 834.2 + 523.0 = 1357.2

11

31

ANOVA Summary Table

SS df MS F p



Total 1357.2 29

dfBetween= # of conditions – 1 = 3 – 1 = 2

dfWithin = Σ(ni – 1) = (10 – 1) + (10 – 1) + (10 – 1) = 27

dfTotal = N – 1 = 30 – 1 = 29

dfTotal = dfBetween+ dfWithin = 2 + 27 = 29

32

ANOVA Summary Table

SS df MS F p



Total 1357.2 29

MSBetween= SSBetween/ dfBetween= 834.2 / 2 = 417.1

MSWithin = SSWithin / dfWithin = 523.0 / 27 = 19.37

F = MSBetween/ MSWithin = 417.1 / 19.37 = 21.53

Critical Value of F

Reject H0 when the calculated F ratio is at least as large as the critical F ratioFind the critical F ratio in a table of critical Fratios

You need to know dfbetween treatments, dfwithin treatment, and αFor the summary table on slide 29, dfbetween treatments= 2, dfwithin treatment= 27, and α = .05Critical F = 3.35Since the calculated F = 21.53 is larger than the critical F = 3.35, we should reject H0

33

12

Effect Sizes with ANOVA

r2 is the proportion of the total variance that is explainable by the treatment

For ANOVA, r2 is reported as η2 (eta squared) in the literature

34

35

ANOVA Assumptions

ANOVA makes certain assumptions:Population from which the samples are selected are normally distributed

Homogeneity of variance -- the variability within each group is approximately equal

Independence of observations

36

ANOVA Assumptions

ANOVA is fairly robust (it will give good results even if the assumptions are violated) to the normality assumption and the homogeneity of variance assumption as long as:

the number of participants in each group is equal

the number of participants in each group is fairly large

13

37

Post Hoc Tests / Multiple Comparisons

ANOVA only tells us if there is an effectThat is, are the means of the groups not all equal?

ANOVA does not tell us which means are different from which other means

Multiple comparisons, or post hoc tests, are used after you reject ANOVA’s H0 to determine which means are probably different from which other means

38

Post Hoc Tests

Post hoc tests are not fundamentally different from performing a t-test

That is, both post hoc tests and t-tests answer the same question: are two means different from each other

The difference is that the post hoc tests protect us from making a Type-I error even though we perform many such comparisons

39

Post Hoc Tests

There are many types of post hoc testsE.g. Tukey, Newman-Keuls, Dunnett, Scheffé, Bonferroni

There is little agreement about which particular one is best

The different tests trade off statistical power for protection from Type-I errors

14

40

Tukey Tests

We will consider only the Tukey test; other people may feel that other tests are more appropriate

The Tukey test offers reasonable protection from Type-I errors while maintaining reasonable statistical power

41

Tukey Tests

You should perform Tukey tests only when two criteria have been met:

There are more than two levels to the factor / IV

The factor / IV is statistically significant

42

Steps in Performing Tukey Tests

Write the hypothesesH0: µ1 = µ2

H1: µ1 ≠ µ2

Specify the α levelα = .05

15

43


Calculate the honestly significant difference (HSD)

qα,k,dfwithin-treatment= tabled q valueα = α levelk = number of levels of the IVdfwithin-treatment= degrees of freedom for MSwithin-treatment

MSwithin-treatment= within-treatment variance estimaten = number of scores in each treatment

44


Take the difference of the means of the conditions you are comparing

If the difference of the means is at least as large as the HSD, you can reject H0

Repeat for whatever other comparisons need to be made

Tukey Example

SS df MS F p



Total 1357.2 29

45

Mnemonic

Rehearsal 6.5

Visual Imagery 4.9

Peg Word 16.8

16

Tukey Example

46

SS df MS F p



Total 1357.2 29

Tukey Example

47

Mnemonic

Rehearsal 6.5

Visual Imagery 4.9

Peg Word 16.8

If a pair of means is at least the HSD, 4.88, apart, then they are reliably different

Tukey Example

48

Mnemonic

Rehearsal 6.5

Visual Imagery 4.9

Peg Word 16.8

H0: µRehearsal= µVisual Imagery

H1: µRehearsal≠ µVisual Imagery

If | Rehearsal- Visual Imagery| ≥ HSD, then reject H0| 6.5 – 4.9 | < 4.88 → fail to reject H0

17

Tukey Example

49

Mnemonic

Rehearsal 6.5

Visual Imagery 4.9

Peg Word 16.8

H0: µRehearsal= µPeg Word

H1: µRehearsal≠ µPeg Word

If | Rehearsal- Peg Word| ≥ HSD, then reject H0| 6.5 – 16.8 | ≥ 4.88 → reject H0

Tukey Example

50

Mnemonic

Rehearsal 6.5

Visual Imagery 4.9

Peg Word 16.8

H0: µVisual Imagery= µPeg Word

H1: µVisual Imagery≠ µPeg Word

If | Visual Imagery - Peg Word| ≥ HSD, then reject H0| 4.9 – 16.8 | ≥ 4.88 → reject H0

F = t2

When there are only two levels to the factor and you are performing a two-tailed test, you can use either the t-test or ANOVAThe calculated value of F will equal the square of the calculated value of tThe critical value of F will equal the square of the critical value of tThus, you will always reach the same conclusion –either reject H0 or fail to reject H0

51

Documents

Introduction to Analysis of Varianceacademic.udayton.edu/gregelvers/psy216/pdfs/Chap12.pdfFor the summary table on slide 29, df between treatments = 2, df within treatment = 27, and