1
Introduction to Analysis of Variance
2
Experiments with More than 2 Conditions
Often the research that psychologists perform has more conditions than just the control and experimental conditions
You might want to compare the effectiveness of a new treatment to an old treatment and to a placebo control group
Factor – the variable (independent or quasi independent) that designates the groups being compared
Level – the values that the factor can take on
3
Multiple t Tests
How would you analyze data from an experiment with more than two levels?
Given what we know so far, our only real option would be to perform multiple t-tests
The problem with this approach to the data analysis is that you might have to perform many, many t-tests to test all possible combinations of the levels
4
Multiple t-Tests
As the number of levels (or conditions) increases, the number of comparisons needed increases more rapidly
Number of comparisons = $(n^2 - n) / 2$
n = number of levels
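A minimal sketch of the formula above, showing how quickly the number of pairwise comparisons grows. The function name `n_comparisons` is an illustrative choice, not something from the slides.

```python
def n_comparisons(n_levels: int) -> int:
    """Number of distinct pairwise comparisons among n_levels conditions."""
    return (n_levels ** 2 - n_levels) // 2


# The count grows much faster than the number of levels.
for n_levels in (2, 3, 5, 10):
    print(n_levels, "levels ->", n_comparisons(n_levels), "comparisons")
```

With 3 levels there are 3 comparisons; with 10 levels there are already 45.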
5
Why Not Perform Multiple t Tests?
With a computer, it is easy, quick, and error-free to perform multiple t-tests, so why shouldn’t we do it?
The answer lies in α
α is the probability of making a Type-I error, or incorrectly rejecting H0 when H0 is true
α applies to a single comparison
It is more precisely called $\alpha_{comparisonwise}$
6
Why Not Perform Multiple t Tests?
What happens to the probability of making at least one Type-I error as the number of comparisons increases?
It increases too
The probability of making at least one Type-I error across all of your inferential statistical tests is called $\alpha_{familywise}$ or $\alpha_{fw}$
7
αfw
If the probability of making a Type-I error on a given comparison is given by α, what is the probability of making at least 1 Type-I error when you make n comparisons?
8
αfw
$\alpha_{fw} = 1 - (1 - \alpha)^n$
This assumes that the Type I errors are independent
As the number of comparisons increases, the probability of making at least one Type-I error increases rapidly
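The growth of the familywise error rate can be checked numerically. This is a small sketch that assumes independent comparisons, as the slide does; `familywise_alpha` is a hypothetical helper name.

```python
def familywise_alpha(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one Type-I error across n independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


# With alpha = .05 per comparison, the familywise rate climbs quickly.
for n in (1, 3, 10, 45):
    print(f"{n:2d} comparisons: alpha_fw = {familywise_alpha(0.05, n):.3f}")
```

For example, 3 comparisons at α = .05 give a familywise rate of about .143, and 10 comparisons give about .401.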
9
Why Not Multiple t-Tests?
Performing multiple t-tests is bad because it increases the probability that you will make at least one Type-I error
You can never determine which statistically significant results, if any, are probably due to chance
10
ANOVA
Part of the solution to this problem rests in a statistical procedure known as the analysis of variance or ANOVA for short
ANOVA replaces the multiple comparisons with a single omnibus null hypothesis
Omnibus -- covers all aspects
In ANOVA, H0 is of the form:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_n$
That is, H0 is that all the means are equal
11
ANOVA Alternative Hypothesis
Given that H0 and H1 must be both mutually exclusive and exhaustive, what is H1 for ANOVA?
Why isn't this H1?
$H_1: \mu_1 \neq \mu_2 \neq \mu_3 \neq \dots \neq \mu_n$
In ANOVA, the alternative hypothesis is of the form:
H1: not H0, or
H1: at least one population mean is different from another
12
Two Estimates of Variance
ANOVA compares two estimates of the variability in the data in order to determine if H0 can be rejected:
Between-treatments variance (also called between-groups variance)
Within-treatment variance (also called within-groups variance)
13
Within-Treatment Variance
Within-treatment variance is the weighted mean variability within each group or condition
Which of the two sets of distributions to the right has a larger within-treatment variance? Why?
14
Within-Treatment Variance
What causes variability within a treatment?
Within-treatment variation is caused by factors that we cannot or did not control in the experiment, such as individual differences between the participants
Do you want within-treatment variability to be large or small?
15
Between-Treatments Variance
Between-treatments variance is a measure of how different the groups are from each other
Which set of distributions has a greater between-treatments variance?
16
Sources of Between-Treatments Variance
What causes, or is the source of, between-treatments variance? That is, why are not all the groups identical to each other?
Between-treatments variance is partially caused by the effect that the treatment has on the dependent variable
The larger the effect is, the more different the groups become and the larger between-treatments variance will be
17
Sources of Between-Treatments Variance
Between-treatments variance also measures sampling error
Even if the treatment had no effect on the dependent variable, you still would not expect the distributions to be equal because samples rarely are identical to each other
That is, some of the between-treatments variance is due to the fact that the groups are initially different due to random fluctuations (or errors) in our sampling
18
Variance Summary
Within-treatment variance measures error (random, unsystematic differences)
Between-treatments variance measures the effect of the treatment on the dependent variable and error
19
F Ratio
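Fisher’s F ratio is defined as:
$F = \dfrac{\text{between-treatments variance}}{\text{within-treatment variance}} = \dfrac{\text{treatment effect} + \text{random, unsystematic differences}}{\text{random, unsystematic differences}}$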
20
F Ratio
What should F equal if H0 is true?
When H0 is true, then there is no effect of the treatment on the DV
Thus, when H0 is true, F = (random, unsystematic differences) / (random, unsystematic differences), which should be approximately 1
21
F Ratio
What should F be if H0 is not true?
When H0 is not true, then there is an effect of the treatment on the DV
Thus, when H0 is not true, F should be larger than 1
22
F Ratio
The denominator of the F ratio is called the error term or error variance.
It provides a measure of the variance due to random, unsystematic differences
23
How Big Does F Have to Be?
Several factors influence how big F has to be in order to reject H0
To reject H0, the calculated F must be larger than the critical F which can be found in a table
Anything that makes the critical F value large will make it more difficult to reject H0
24
Factors That Influence the Size of the Critical F
α level -- as α decreases, the size of the critical F increases
Sample size -- as the sample size increases, the degrees of freedom for the within-treatment variance increase, and the size of the critical F decreases
As the sample becomes larger, it becomes more representative of the population
25
Factors That Influence the Size of the Critical F
Number of conditions -- as the number of conditions increases, so do the degrees of freedom for the between-treatments variance term
For any reasonable sample size, the size of the critical F will decrease as the number of conditions increases
26
Factors That Influence the Size of the Observed F
Experimental control -- as experimental control increases, within-treatment variance decreases, and the observed F increases
ANOVA Example
Students used one of three types of mnemonics to learn a list of words: rehearsal, visual imagery, peg word
The number of words recalled in the correct order was measured
There were 10 students in each level of the factor
27
ANOVA Example
28
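Rehearsal   Visual Imagery   Peg Word
   10             5             11
    9             2             11
    4             3             19
   18             3             15
    4             2             19
    3             3             19
    5             6             15
    2             3             19
    4             4             20
    6            18             20

Rehearsal: $n = 10$, $M = 6.5$, $\Sigma X = 65$, $\Sigma X^2 = 627$
$SS_{Reh} = 627 - 65^2 / 10 = 204.5$
Visual Imagery: $n = 10$, $M = 4.9$, $\Sigma X = 49$, $\Sigma X^2 = 445$
$SS_{Vis} = 445 - 49^2 / 10 = 204.9$
Peg Word: $n = 10$, $M = 16.8$, $\Sigma X = 168$, $\Sigma X^2 = 2936$
$SS_{Peg} = 2936 - 168^2 / 10 = 113.6$
Total: $N = 30$, $M = 9.4$
$\Sigma X = 282 = 65 + 49 + 168$
$\Sigma X^2 = 4008 = 627 + 445 + 2936$
$SS_{Total} = 4008 - 282^2 / 30 = 1357.2$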
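The sums of squares for this example can be reproduced with a short script. The scores below are read off the slide's data table and reproduce its reported ΣX and ΣX² values; the order of scores within each group is an assumption and does not affect any of the sums. `sum_of_squares` is an illustrative helper name.

```python
# Scores as recovered from the slide's data table (within-group order is an
# assumption; it does not change any sum computed below).
scores = {
    "Rehearsal": [10, 9, 4, 18, 4, 3, 5, 2, 4, 6],
    "Visual Imagery": [5, 2, 3, 3, 2, 3, 6, 3, 4, 18],
    "Peg Word": [11, 11, 19, 15, 19, 19, 15, 19, 20, 20],
}


def sum_of_squares(xs):
    """Computational formula: SS = sum(X^2) - (sum(X))^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


# SS within each group, and SS for all 30 scores pooled together.
group_ss = {name: sum_of_squares(xs) for name, xs in scores.items()}
all_scores = [x for xs in scores.values() for x in xs]
ss_total = sum_of_squares(all_scores)

for name, xs in scores.items():
    print(f"{name}: sum = {sum(xs)}, sum of squares = {sum(x * x for x in xs)}, "
          f"SS = {group_ss[name]:.1f}")
print(f"SS_Total = {ss_total:.1f}")
```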
29
ANOVA Summary Table
SS = sum of squares -- the numerator of the variance formula
$SS = \Sigma(X - M)^2 = \Sigma X^2 - (\Sigma X)^2 / n$
If $n_1 = n_2 = n_3 = \dots$, then $SS_{Between} = n \cdot SS_M$
$\Sigma M = 6.5 + 4.9 + 16.8 = 28.2$
$\Sigma M^2 = 6.5^2 + 4.9^2 + 16.8^2 = 348.5$
$SS_{Between} = 10 \cdot (348.5 - 28.2^2 / 3) = 834.2$
                     SS       df   MS       F       p
Between treatments   834.2     2   417.1    21.53   < .001
Within treatment     523.0    27   19.37
Total                1357.2   29
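The equal-n shortcut for the between-treatments sum of squares can be checked directly from the three group means. This is a sketch using the means and group size given in the example; the variable names are illustrative.

```python
means = [6.5, 4.9, 16.8]   # group means: Rehearsal, Visual Imagery, Peg Word
n_per_group = 10           # equal group sizes are required for this shortcut

# SS of the means themselves, using the same computational formula as for raw scores.
sum_m = sum(means)
sum_m_squared = sum(m * m for m in means)
ss_of_means = sum_m_squared - sum_m ** 2 / len(means)

# With equal n, SS_Between is n times the SS of the group means.
ss_between = n_per_group * ss_of_means
print(f"SS_Between = {ss_between:.1f}")
```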
30
ANOVA Summary Table
$SS_{Within} = SS_{Rehearsal} + SS_{Visual\ Imagery} + SS_{Peg\ Word} = 204.5 + 204.9 + 113.6 = 523.0$
$SS_{Total} = SS_{Between} + SS_{Within} = 834.2 + 523.0 = 1357.2$
31
ANOVA Summary Table
$df_{Between}$ = number of conditions $- 1 = 3 - 1 = 2$
$df_{Within} = \Sigma(n_i - 1) = (10 - 1) + (10 - 1) + (10 - 1) = 27$
$df_{Total} = N - 1 = 30 - 1 = 29$
$df_{Total} = df_{Between} + df_{Within} = 2 + 27 = 29$
32
ANOVA Summary Table
$MS_{Between} = SS_{Between} / df_{Between} = 834.2 / 2 = 417.1$
$MS_{Within} = SS_{Within} / df_{Within} = 523.0 / 27 = 19.37$
$F = MS_{Between} / MS_{Within} = 417.1 / 19.37 = 21.53$
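The chain from sums of squares to the F ratio can be sketched in a few lines. The input values are the ones from the summary table; the variable names are illustrative.

```python
ss_between = 834.2
ss_within = 523.0
n_conditions = 3
n_total = 30

# Degrees of freedom: conditions - 1, and total scores minus number of conditions
# (equivalent to summing n_i - 1 over the groups).
df_between = n_conditions - 1
df_within = n_total - n_conditions

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

f_ratio = ms_between / ms_within
print(f"MS_Between = {ms_between:.1f}, MS_Within = {ms_within:.2f}, F = {f_ratio:.2f}")
```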
Critical Value of F
Reject H0 when the calculated F ratio is at least as large as the critical F ratio
Find the critical F ratio in a table of critical F ratios
You need to know $df_{between\ treatments}$, $df_{within\ treatment}$, and α
For the summary table on slide 29, $df_{between\ treatments} = 2$, $df_{within\ treatment} = 27$, and α = .05
Critical F = 3.35
Since the calculated F = 21.53 is larger than the critical F = 3.35, we should reject H0
33
Effect Sizes with ANOVA
$r^2$ is the proportion of the total variance that is explainable by the treatment
For ANOVA, $r^2$ is reported as $\eta^2$ (eta squared) in the literature
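As a worked example, assuming the usual definition of eta squared as the between-treatments sum of squares divided by the total sum of squares (the slides do not show the formula itself), the mnemonic data give:

```latex
\eta^2 = \frac{SS_{Between}}{SS_{Total}} = \frac{834.2}{1357.2} \approx 0.61
```

Under that definition, roughly 61% of the total variability in recall is associated with the type of mnemonic.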
34
35
ANOVA Assumptions
ANOVA makes certain assumptions:
Normality -- the populations from which the samples are selected are normally distributed
Homogeneity of variance -- the variability within each group is approximately equal
Independence of observations
36
ANOVA Assumptions
ANOVA is fairly robust to violations of the normality and homogeneity of variance assumptions (it will still give good results) as long as:
the number of participants in each group is equal
the number of participants in each group is fairly large
37
Post Hoc Tests / Multiple Comparisons
ANOVA only tells us if there is an effect
That is, are the means of the groups not all equal?
ANOVA does not tell us which means are different from which other means
Multiple comparisons, or post hoc tests, are used after you reject ANOVA’s H0 to determine which means are probably different from which other means
38
Post Hoc Tests
Post hoc tests are not fundamentally different from performing a t-test
That is, both post hoc tests and t-tests answer the same question: are two means different from each other
The difference is that post hoc tests keep the probability of making at least one Type-I error under control even though we perform many such comparisons
39
Post Hoc Tests
There are many types of post hoc tests
E.g., Tukey, Newman-Keuls, Dunnett, Scheffé, Bonferroni
There is little agreement about which particular one is best
The different tests trade off statistical power for protection from Type-I errors
40
Tukey Tests
We will consider only the Tukey test; other people may feel that other tests are more appropriate
The Tukey test offers reasonable protection from Type-I errors while maintaining reasonable statistical power
41
Tukey Tests
You should perform Tukey tests only when two criteria have been met:
There are more than two levels to the factor / IV
The factor / IV is statistically significant
42
Steps in Performing Tukey Tests
Write the hypotheses
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Specify the α level
α = .05
43
Steps in Performing Tukey Tests
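Calculate the honestly significant difference (HSD):
$HSD = q_{\alpha,\,k,\,df_{within\text{-}treatment}} \sqrt{MS_{within\text{-}treatment} / n}$
$q_{\alpha,\,k,\,df_{within\text{-}treatment}}$ = tabled q value
α = α level
k = number of levels of the IV
$df_{within\text{-}treatment}$ = degrees of freedom for $MS_{within\text{-}treatment}$
$MS_{within\text{-}treatment}$ = within-treatment variance estimate
n = number of scores in each treatment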
44
Steps in Performing Tukey Tests
Take the difference of the means of the conditions you are comparing
If the difference of the means is at least as large as the HSD, you can reject H0
Repeat for whatever other comparisons need to be made
Tukey Example
                     SS       df   MS       F       p
Between treatments   834.2     2   417.1    21.53   < .001
Within treatment     523.0    27   19.37
Total                1357.2   29
45
Mnemonic         Mean
Rehearsal         6.5
Visual Imagery    4.9
Peg Word         16.8
Tukey Example
46
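A sketch of the HSD calculation for this example, assuming the standard Tukey formula (the tabled q value is not shown in the slides, so it is left symbolic here). The within-treatment mean square is 19.37, there are 10 scores per treatment, and the HSD quoted in the example is 4.88:

```latex
HSD = q_{.05,\,3,\,27} \sqrt{\frac{MS_{within}}{n}}
    = q_{.05,\,3,\,27} \sqrt{\frac{19.37}{10}}
    \approx q_{.05,\,3,\,27} \times 1.39
    = 4.88
```

Working backward, an HSD of 4.88 implies a tabled q of roughly 3.51 for α = .05, k = 3, and 27 degrees of freedom; that value is an inference from the quoted HSD, not a figure given in the slides.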
Tukey Example
47
If a pair of means is at least the HSD, 4.88, apart, then they are reliably different
Tukey Example
48
$H_0: \mu_{Rehearsal} = \mu_{Visual\ Imagery}$
$H_1: \mu_{Rehearsal} \neq \mu_{Visual\ Imagery}$
If $|M_{Rehearsal} - M_{Visual\ Imagery}| \geq HSD$, then reject H0
$|6.5 - 4.9| = 1.6 < 4.88$ → fail to reject H0
Tukey Example
49
$H_0: \mu_{Rehearsal} = \mu_{Peg\ Word}$
$H_1: \mu_{Rehearsal} \neq \mu_{Peg\ Word}$
If $|M_{Rehearsal} - M_{Peg\ Word}| \geq HSD$, then reject H0
$|6.5 - 16.8| = 10.3 \geq 4.88$ → reject H0
Tukey Example
50
$H_0: \mu_{Visual\ Imagery} = \mu_{Peg\ Word}$
$H_1: \mu_{Visual\ Imagery} \neq \mu_{Peg\ Word}$
If $|M_{Visual\ Imagery} - M_{Peg\ Word}| \geq HSD$, then reject H0
$|4.9 - 16.8| = 11.9 \geq 4.88$ → reject H0
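The three comparisons above follow the same rule, so they can be run in a loop. This sketch takes the group means and the HSD of 4.88 directly from the example; `tukey_decisions` is a hypothetical helper name, not a library function.

```python
from itertools import combinations

means = {"Rehearsal": 6.5, "Visual Imagery": 4.9, "Peg Word": 16.8}
hsd = 4.88  # honestly significant difference quoted in the example


def tukey_decisions(group_means, hsd_value):
    """Map each pair of groups to (|mean difference|, reject H0?)."""
    results = {}
    for a, b in combinations(group_means, 2):
        diff = abs(group_means[a] - group_means[b])
        results[(a, b)] = (diff, diff >= hsd_value)
    return results


decisions = tukey_decisions(means, hsd)
for (a, b), (diff, reject) in decisions.items():
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"{a} vs {b}: |difference| = {diff:.1f} -> {verdict}")
```

Only the two comparisons involving the peg word group exceed the HSD.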
$F = t^2$
When there are only two levels to the factor and you are performing a two-tailed test, you can use either the t-test or ANOVA
The calculated value of F will equal the square of the calculated value of t
The critical value of F will equal the square of the critical value of t
Thus, you will always reach the same conclusion: either reject H0 or fail to reject H0
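The relationship between F and t can be verified numerically. This sketch assumes an independent-samples t-test with pooled variance and uses the rehearsal and visual imagery scores from the ANOVA example as the two groups (within-group order is an assumption and does not matter). The function names are illustrative.

```python
from math import isclose, sqrt

rehearsal = [10, 9, 4, 18, 4, 3, 5, 2, 4, 6]
visual_imagery = [5, 2, 3, 3, 2, 3, 6, 3, 4, 18]


def mean(xs):
    return sum(xs) / len(xs)


def ss(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance."""
    pooled_var = (ss(a) + ss(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def anova_f(groups):
    """One-way ANOVA F ratio: MS_Between / MS_Within."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(ss(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


t = pooled_t(rehearsal, visual_imagery)
f = anova_f([rehearsal, visual_imagery])
print(f"t = {t:.4f}, t squared = {t ** 2:.4f}, F = {f:.4f}")
print("F equals t squared:", isclose(t ** 2, f, rel_tol=1e-9))
```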
51