Analysis of variance
John W. Worley
AudioGroup, WCL
Department of Electrical and Computer Engineering
University of Patras, Greece
http://www.wcl.ee.upatras.gr/AudioGroup/



Slide 2 of 25

[Diagram: Experiment → Hypothesis (Null Hyp. H0 / Exp. Hyp. H1) → I.V. and D.V. → Data (Nominal, Ordinal, Interval, Ratio) → Analysis (Type I / Type II errors) → Modify Hyp.]


Slide 3 of 25

Data Analysis

Descriptive statistics:
• Mean.
• Standard error of the mean (SEM).

Inferential statistics:
• t-test.
  – Related-means t-test.
  – Independent-means t-test.
• Univariate Analysis of Variance (ANOVA).
• Multivariate ANOVA (MANOVA).

There are lots of fancy statistics out there, and if the study calls for them you have to use them. However, with a well-designed experiment you can get away with just simple statistics.
The mean tells us whether an experimental factor created a change in performance, above chance or beyond a control group or condition.
The mean is irrelevant without a measure of variance, such as the standard deviation. It is better to use the SEM, since we then account for the number of observations.
To correctly reject the null hypothesis, we need a measure of probability to conclude that our result was not merely due to chance. With psychological tests we work to a 0.05 significance level, so we are 95% sure that our result is statistically significant. In the t-test you essentially take the difference between the means and divide by the standard error of the difference to get t. When looking up critical values of t in a table you need to know the degrees of freedom, which is the number of observations minus 1.
Related-means is when you have a within-subject design, and therefore the samples are drawn from the same population. Independent-means is when you have a between-subjects design, and therefore the population distributions could be different.
ANOVA is an extension of the t-test and allows us to manipulate many IVs within one experiment, rather than running multiple separate experiments.
The MANOVA is an extension of the ANOVA, but with more than one DV.
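The statistics listed above can be sketched in Python with scipy. The recall scores below are hypothetical, purely for illustration:

```python
import numpy as np
from scipy import stats

control = np.array([52, 48, 55, 60, 47, 51, 58, 50])
treated = np.array([61, 66, 58, 70, 63, 59, 68, 64])

# Descriptive statistics: mean and standard error of the mean (SEM).
print(f"control: mean={control.mean():.2f}, SEM={stats.sem(control):.2f}")

# Independent-means t-test (between-subjects design).
t_ind, p_ind = stats.ttest_ind(control, treated)

# Related-means t-test (within-subject design: paired observations).
t_rel, p_rel = stats.ttest_rel(control, treated)

print(f"independent-means: t={t_ind:.2f}, p={p_ind:.4f}")
print(f"related-means:     t={t_rel:.2f}, p={p_rel:.4f}")
```

Note that the related-means test only makes sense when the two arrays hold paired observations from the same subjects, as in a within-subject design.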

Slide 4 of 25

Small Variance
[Figure: overlapping score distributions for the Control Group (µ1) and Exp. Group (µ2), both with small variance.]


Slide 5 of 25

Large Variance
[Figure: score distributions for the Control Group (µ1) and Exp. Group (µ2), both with large variance, so the groups overlap heavily.]

The amount of overlap we allow is determined by our significance level.

Slide 6 of 25

Small Variance
[Figure: Control Group (µ1) and Exp. Group (µ2) distributions with small variance; P < 0.05.]

So if we accept a 0.05 significance level, then we accept data distributed like this.

Slide 7 of 25

Large Variance
[Figure: Control Group (µ1) and Exp. Group (µ2) distributions with large variance; P > 0.05.]

But we would not accept this distribution.

Slide 8 of 25

Errors in statistical decisions

Type I error:
• Rejecting the Null Hyp. when it's true.

Type II error:
• Retaining the Null Hyp. when it's false.

These error types are important when trying to understand which statistical test you can perform.
Before discussing ANOVA, let's consider why we use ANOVA when we have several IVs, since we could simply use the t-test to look at all of the differences in an experiment with more than 2 groups. E.g. if we have 3 separate groups, then to test all group combinations we will need to perform 3 t-tests. The probability of making a Type I error in each test is 5%, so the probability of making no Type I error in a single test is 95%. However, assuming each t-test is independent, the overall probability of no Type I errors is 0.95 to the power 3, which is 0.857. In other words we have a 14% probability of making at least one Type I error (far greater than the 5% criterion we usually adopt). With 10 tests the probability of at least one Type I error increases to 40%.
Whenever we reject the null hypothesis there is a probability of making a Type I error. The probability of making the error is given by the size of the region of rejection. Normally this is 0.05, so there is a 1 in 20 chance of making a Type I error.
A Type II error is when a significant difference exists between our groups but we still retain the null hypothesis. If we avoid a Type II error, we are correctly rejecting a false null hypothesis. This ability is known as power: the power of a statistical test is the probability that we will correctly reject a false null hypothesis. (Microscope analogy.)
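The inflation of Type I error described above is a one-line calculation — with k independent tests at a per-test alpha of 0.05, the chance of at least one false positive is 1 - 0.95^k:

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one Type I error across k independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** k

print(round(familywise_error(3), 3))   # 0.143 -> the ~14% quoted for 3 t-tests
print(round(familywise_error(10), 3))  # 0.401 -> the ~40% quoted for 10 tests
```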

Slide 9 of 25

Factorial design: Definitions

A factor is a categorical predictor variable.
• e.g. a treatment.

A level is an amount of the factor.
• e.g. the amount of treatment.

One-way ANOVA:
• One factor, multiple levels.

Two-way ANOVA:
• Two factors, different levels within the factors.

So let's define some terminology you will find with an analysis of variance.
A factor is basically your IV, and the level is the amount of change in the IV. If we use a one-way ANOVA, we only manipulate one factor within the experiment. A two-way ANOVA means that we have two IVs; these are advantageous because they allow us to identify interactions between variables. You can also have 3- or 4-way ANOVAs, but it can be difficult to interpret the interactions if you find any.
An example: a memory task. We wish to measure a subject's memory recall for a story, and we predict that use of a mnemonic aid will improve story recall. The mnemonic is an experimental factor, and if we use multiple types of mnemonic aid then these constitute different levels. As already stated, we can have a within-subjects design (all subjects perform all conditions), between-subjects factors (a different subject group for each factor), or a mixed design (where some factors are split between groups, and other factors are performed by all subjects).

Slide 10 of 25

ANOVA: Assumptions

Normally distributed data.
• Histogram.
• Kolmogorov-Smirnov test.
• Shapiro-Wilk test.

Interval or ratio data.
Independence.
Homogeneity of variance.
• Levene's test.

These are the same as for all parametric tests.
We assume that data from different subjects are independent, i.e. that subjects do not influence each other.
The problem with a histogram is that it's subjective to look and say "yeah, that looks normal".
If the result of these tests is non-significant, then the data are normally distributed, since you are comparing your data to a normal data set with the same mean and standard deviation as your data.
The variance in each group should be roughly equal. Use Levene's test: if the result is non-significant then the variances are homogeneous.
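These assumption checks can be run with scipy; the sketch below uses synthetic normal data (the group means and spreads are arbitrary choices, not from the slides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=50, scale=5, size=30)
group2 = rng.normal(loc=55, scale=5, size=30)

# Normality: a non-significant result gives no evidence against normality.
w, p_shapiro = stats.shapiro(group1)
d, p_ks = stats.kstest(group1, 'norm', args=(group1.mean(), group1.std(ddof=1)))

# Homogeneity of variance: Levene's test across the groups.
stat, p_levene = stats.levene(group1, group2)

print(f"Shapiro-Wilk p={p_shapiro:.3f}, K-S p={p_ks:.3f}, Levene p={p_levene:.3f}")
```

As noted above, a non-significant p-value on each test means the corresponding assumption is not contradicted by the data.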

Slide 11 of 25

Factor: Mnemonic aid
Levels of factor: YES | NO
Subjects: n1, n2, …, nn

One-way, Between-Subjects Design:
- One between-groups factor (with 2 levels).

So, using our memory example: this is the best way to visualise your design, and it is how you will enter your data into a statistical package such as SPSS. We have one between-subjects factor, the mnemonic aid, and whether or not the subjects receive the aid gives the levels of the factor.
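A one-way ANOVA on this design is a one-liner in scipy. With one factor at two levels it is equivalent to an independent-means t-test (F = t²), which the sketch below demonstrates on hypothetical recall scores:

```python
import numpy as np
from scipy import stats

aid    = np.array([72, 68, 75, 70, 74, 69])  # recall with the mnemonic aid
no_aid = np.array([60, 63, 58, 65, 61, 59])  # recall without it

# One-way ANOVA across the two levels of the factor.
f, p_f = stats.f_oneway(aid, no_aid)

# Equivalent independent-means t-test: F equals t squared.
t, p_t = stats.ttest_ind(aid, no_aid)

print(f"F={f:.2f}, t^2={t**2:.2f}, p={p_f:.5f}")
```

With three or more levels the t-test equivalence no longer holds, and the ANOVA is the appropriate single omnibus test.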

Slide 12 of 25

Memory recall with practice
Factor: Time
Levels of factor: Day-1 | Day-2 | Day-3
Subjects: n1, n2, …, nn

One-way, Within-Subjects Design:
- One within-groups factor (with 3 levels).

Alternatively, we could predict that practising the story over a period of days will improve recall. This we could create as a within-subjects design: practice would be the predictive variable, and the amount of practice would be the levels.

Slide 13 of 25

Factor: Mnemonic aid (between groups) — levels: YES | NO
Factor: Time (within groups) — levels: Day-1 | Day-2 | Day-3
Subjects: n1, n2, …, nn

Mixed Design:
- One between-groups factor (with 2 levels).
- One within-groups factor (with 3 levels).

OK, those examples were fairly simplistic, and if we were interested in both practice and mnemonic aids in memory recall we would need two separate experiments. An alternative would be to create a mixed design, with one between-groups factor (the mnemonic aid, with 2 levels) and one within-groups factor (the amount of practice, with 3 levels). The advantage of this design is that it can reveal whether there is an interaction between mnemonic aid and practice, an interaction being when the effect of one variable depends on the level of another variable. This is easy to see graphically.

Slide 14 of 25

Variable Interaction

No Interaction
[Figure: % Story Recall vs. Practice (Days 1-3) for Aid and No Aid — parallel lines.]

Interaction
[Figure: % Story Recall vs. Practice (Days 1-3) for Aid and No Aid — diverging lines.]

Story recall is improved by mnemonic aid and practice, with no interaction.

An interaction, practice has a greater effect upon recall with a mnemonic aid.

So, using the hypothetical story recall example: on the left we see that practice gives a slight improvement in recall, and use of a mnemonic aid provides a consistent improvement in recall over the three days of testing; however, there is no interaction between practice and the use of a mnemonic aid, as the difference in recall is 20% on each day. Conversely, on the right we see the same consistent increase in performance over time without a mnemonic aid, but with a mnemonic aid story recall improves further every day relative to not having one. This is therefore an interaction between the factors.
Notice that the lines on the left are parallel; when there isn't an interaction this is always true.

Slide 15 of 25

Post-hoc tests

Least-significant difference (LSD) pairwise comparison.
• No Type I error control.

Studentized Newman-Keuls (SNK).
• Liberal; no Type I error control.

Bonferroni method.
• Controls Type I error.
• Good when the number of comparisons is small.

Tukey test.
• Controls Type I error.
• Good when the number of comparisons is large.

LSD is equivalent to multiple t-tests.
In some circumstances you may wish to make pairwise comparisons with your data, as in the case of post-hoc tests. The most popular method for controlling Type I error is to divide the Type I error rate (0.05) by the number of comparisons, so if we make 10 comparisons the per-comparison Type I error rate becomes 0.005. This method is known as the Bonferroni correction. We control for making a Type I error, but at the cost of statistical power: by being conservative we reduce the probability of a Type I error but increase the probability of a Type II error.
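The Bonferroni correction above is available directly in statsmodels (equivalently, multiply each p-value by the number of comparisons and cap at 1). The p-values below are made up for illustration:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from three pairwise comparisons.
p_values = [0.004, 0.020, 0.300]
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

print(p_adj)   # each p multiplied by 3 (capped at 1): 0.012, 0.06, 0.9
print(reject)  # only the first comparison survives the correction
```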

Slide 16 of 25

MANOVA

Memory recall with practice
Factor: Time — levels: Day-1 | Day-2 | Day-3
Subjects: n1, n2, …, nn
DVs: Recall, Comprehension, Confidence

Using our memory recall example, we can visualise this study if the experimenter had measured more than just participants' recall of the story; i.e. the experimenter could also measure comprehension of the story and the participants' confidence in doing the task.

Slide 17 of 25

Multivariate analysis of variance (MANOVA)

A 2-stage test.
For more than one DV.
Avoids the Type I error inflation of multiple ANOVAs.
Detects differences among a combination of variables, whereas ANOVA will only detect differences along a single dimension.
First the MANOVA, for which there are 4 ways to assess the overall significance. Then a secondary test interprets the group differences when the MANOVA is significant, typically univariate ANOVAs or discriminant analysis.

Slide 18 of 25

MANOVA: Pre-requisites

Which DVs:
• Correlations*.
• Discriminant function variates.

Assumptions:
• Independence.
• Random sampling.
• Interval data.
• Multivariate normality.
• Equality of covariance.
  – Levene's test.
  – Box's test.

*Cole et al. (1994), Psychological Bulletin, 115(3), 465-474.

Analyse the theoretically meaningful variables. Just because you have measured lots of variables does not mean that you have to include all of them in the analysis. There is some controversy about how to choose the DVs, but the bottom line appears to be: if you have large group differences, the power of MANOVA increases if the DVs are uncorrelated; but if you have 2 DVs, one showing a large group difference and the other a small one, the power of the MANOVA improves if these two variables are highly correlated.
Finding discriminant function variates is really just a method of data reduction: you are reducing the DVs into underlying dimensions or factors via a mathematical procedure known as maximisation. E.g. the 1st discriminant function is the linear combination of DVs that maximises the differences between groups.
Observations should be statistically independent.
Data should be randomly sampled and measured at an interval level.
Multivariate normality: in ANOVA we assume univariate normality, i.e. that the DV is normally distributed within each group. With MANOVA we assume that the DVs collectively have multivariate normality within groups. You cannot test multivariate normality with SPSS, so instead you test univariate normality on each DV. This does not guarantee multivariate normality, but it is close, since univariate normality is a necessary condition for multivariate normality.
Equality of covariance: essentially that the variance is homogeneous, so that the correlation between any 2 DVs is the same in all groups (covariance means that two variables change in the same way). First test univariate equality of variance using Levene's test, which should not be significant for any of the DVs. But Levene's test doesn't account for covariance; for this use Box's test, whose result should also be non-significant. However, Box's test is susceptible to deviations from multivariate normality, so it can come out significant not because the covariance matrices differ but because the assumption of multivariate normality is not tenable. Therefore, you need to have an idea about the normality of your data before you look at equality of covariance.

Slide 19 of 25

MANOVA: Output

Pillai-Bartlett Trace (V).
Wilks' Lambda (Λ).
Hotelling's T².
Roy's Largest Root.

In SPSS the MANOVA output will give you 4 test statistics. If you have one underlying variate then these test statistics will be the same. However, you will not always have only one underlying variate, so you need to know which test is best in terms of robustness and power.
Power: firstly, if your sample size is small to moderate then the tests will not differ much in power (Olson, 1974). If the group differences are concentrated on the first variate then Roy's largest root is the best choice, since it only takes into account the first variate, followed by Hotelling's trace, Wilks' lambda, then the Pillai-Bartlett trace. However, when your groups differ along more than one variate this power ordering is reversed: the Pillai-Bartlett trace is the most powerful and Roy's largest root the least.
Another issue related to test power is the sample size and the number of DVs. Stevens (1980) recommends keeping the number of DVs below 10 unless you have a large sample size.
Robustness: all 4 test statistics are relatively robust to violations of multivariate normality. However, Roy's largest root is not robust when homogeneity of covariance is not tenable, and it is also affected by platykurtic distributions.
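Outside SPSS, a MANOVA reporting the same four statistics can be sketched with statsmodels. The two-DV data set below is fabricated, purely to show the call:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Fabricated scores: two DVs (recall, comprehension) for two groups.
data = pd.DataFrame({
    'recall':        [70, 72, 68, 71, 69, 55, 58, 54, 56, 57],
    'comprehension': [80, 78, 82, 79, 81, 65, 66, 62, 64, 63],
    'group':         ['aid'] * 5 + ['none'] * 5,
})

# mv_test() reports Wilks' lambda, Pillai's trace, the Hotelling-Lawley
# trace and Roy's greatest root for each term in the formula.
mv = MANOVA.from_formula('recall + comprehension ~ group', data=data)
print(mv.mv_test())
```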

Slide 20 of 25

Distributions

platykurtic

leptokurtic

Distributions can differ in terms of how large or fat their tails are. The two distributions shown here differ in how many scores they have in their tails, i.e. they differ in kurtosis. The top distribution has relatively more scores in its tails than the bottom distribution; it is called a leptokurtic distribution. The bottom distribution is a platykurtic distribution, and as you can see it has fewer extreme scores.
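Kurtosis can be checked numerically with scipy, whose default (Fisher) definition gives 0 for a normal distribution, positive values for leptokurtic (heavy-tailed) data and negative values for platykurtic (light-tailed) data. The example distributions below are my own choices to illustrate each case:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
normal_data      = rng.normal(size=100_000)
leptokurtic_data = rng.laplace(size=100_000)   # heavier tails than normal
platykurtic_data = rng.uniform(size=100_000)   # lighter tails than normal

print(kurtosis(normal_data))       # near 0
print(kurtosis(leptokurtic_data))  # clearly positive (Laplace: ~3)
print(kurtosis(platykurtic_data))  # clearly negative (uniform: ~-1.2)
```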

Slide 21 of 25

MANOVA: Output

Pillai-Bartlett Trace (V).
Wilks' Lambda (Λ).
Hotelling's T².
Roy's Largest Root.

The bottom line is that when your sample sizes are equal the Pillai-Bartlett trace is most robust to violations of our assumptions. However, if sample sizes are unequal, the Pillai-Bartlett trace is affected by violations of the assumption of equal covariance matrices. Therefore, if you have unequal sample sizes, check the homogeneity of covariance matrices using Box's test: if it is non-significant, and the assumption of multivariate normality is satisfied (which allows us to assume that Box's test is accurate), then the Pillai-Bartlett trace can be assumed accurate.

Slide 22 of 25

MANOVA: Follow-up analysis

If MANOVA is significant:
• ANOVA.
  – Type I error: Bonferroni correction.
• Discriminant analysis.

One method to follow up a significant MANOVA is to run separate ANOVAs on each DV. But you might think: why bother with the MANOVA, and isn't there a possibility of a Type I error? One idea is that a significant MANOVA protects against Type I error: if the MANOVA is non-significant then you do not report any follow-up ANOVAs. But this is not entirely true, since a significant MANOVA could be due to only one of your DVs rather than all of them; the MANOVA only gives protection to those DVs for which genuine differences exist. Therefore, it is probably best to apply a Bonferroni correction to the level at which you accept significance.
Another follow-up method advocated by some researchers is discriminant analysis, since the follow-up ANOVA ignores the possibility that the significant MANOVA is due to the DVs representing underlying dimensions that differentiate the groups. A discriminant analysis finds the linear combinations of the DVs that best differentiate the groups. Its advantage over ANOVA is that it reduces and explains the DVs in terms of underlying dimensions that may reflect theoretical dimensions. In SPSS the MANOVA analysis will automatically give you univariate analyses on each DV, whereas you will have to run the discriminant analysis separately on your data.

Slide 23 of 25

Practical Considerations

Controls:
• Standardisation.
• Instructions.

Practice/fatigue effects:
• Counterbalance conditions.
• Randomise trial order.

Within- or between-subjects design?

For MANOVA:
• Choose DVs theoretically.
• Do follow-up analysis.
• Experimentwise errors.

OK, before I finish let's go through a few practical considerations which will help you design experiments.
We need to control extraneous variables; if an extraneous variable is unavoidable, we control for it by making sure that every subject receives it.
Watch for carry-over effects.
Use a Latin square for more than 2 levels in a repeated-measures test.
A within-subjects design will give you more statistical power, since you are then certain that the two groups are homogeneous.
A one-tailed hypothesis will make the critical value of t easier to reach.

Slide 24 of 25

Conclusions

Identify your DV type.
Within- or between-subjects design.
Control for everything.
Report descriptive and inferential statistics.
ANOVA is highly useful.
MANOVA is a logical progression.
• Follow-up analysis.
• Bonferroni correction.

Your DV type will define the statistics you can use.
If possible choose a within-subjects design; it allows a more powerful statistical test.
Be aware that there are different test paradigms: choosing the right one will make your experiment more valid. You do not need to design an experiment like all those that have gone before. Before you start, think about what the answer will look like given chance performance or perfect performance, and ask yourself whether this is what you wish to discover.
Without correct experimental controls we cannot be sure that the result obtained is not merely an artefact of some uncontrolled extraneous variable.
Remember that without a measure of variance we can infer nothing from your data.
ANOVA suppresses the increase in Type I error that would occur with multiple t-tests, and allows us to see interactions.
MANOVA is a logical progression of the ANOVA, allowing us to examine more than one DV simultaneously. But be aware of the possibility of rejecting the null when it's true; therefore use a Bonferroni correction and follow-up analysis.

AudioGroup, WCL
Department of Electrical and Computer Engineering
University of Patras, Greece
http://www.wcl.ee.upatras.gr/AudioGroup/