Analysis of Variance

Introduction

Many statistical problems are solved by making inferences about population means. In this chapter we extend the methods of inference about population means to the comparison of more than two means. When data are obtained according to specified sampling procedures, they are easier to analyze and may also contain more information pertinent to population means than could be obtained using random sampling. The procedure for selecting sample data is called the design of an experiment (experimental design), and a statistical procedure for comparing the population means is called the analysis of variance (ANOVA).

Read and make notes on the types of experimental study designs:

- The Completely Randomized Design
- The Randomized Block Design
- The Factorial Design
- Etc.

ANOVA

ANOVA stands for analysis of variance.

It is one of the multivariate data analysis techniques.

It is defined as a technique whereby the total variation present in a set of data is partitioned into two or more components.

Associated with each of these components is a specific source of variation, so that in the analysis it is possible to ascertain the magnitude of the contribution of each of these sources to the total variation.

Application

ANOVA has wide application in the analysis of data resulting from experiments.

Analysis of variance is studied mainly for two purposes:

- To estimate and test hypotheses about population variances
- To estimate and test hypotheses about population means

Other Terms Related to ANOVA

Read about the following

i. Treatment variable

ii. Response variable

iii. Extraneous variables

iv. Experimental unit

ANOVA and the F-distribution

ANOVA uses the F-distribution.

The F-distribution was developed by Sir Ronald Fisher.

It is one of the continuous sampling distributions in statistics.

This distribution is developed from the ratio

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

where

- $s_1^2$ is the variance of a sample of size $n_1$ selected from a normally distributed population with variance $\sigma_1^2$
- $s_2^2$ is the variance of a sample of size $n_2$ selected from a normally distributed population with variance $\sigma_2^2$
- $s_1^2$ designates the larger of the two sample variances

This distribution depends on two degrees-of-freedom values, one corresponding to the value $n_1 - 1$ used in computing $s_1^2$ and the other corresponding to the value $n_2 - 1$ used in computing $s_2^2$.
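As a rough illustration of the ratio above, the sketch below computes a variance ratio from two samples, placing the larger sample variance in the numerator. The data are hypothetical, and the sketch assumes the two population variances are equal, so the σ² terms cancel and the ratio reduces to the ratio of the sample variances.

```python
from statistics import variance

# Hypothetical samples, invented purely for illustration.
sample_a = [2, 4, 4, 6]
sample_b = [1, 5, 9]

def variance_ratio(x, y):
    """Return (ratio, numerator df, denominator df), larger variance on top.

    Assumes equal population variances, so the sigma^2 terms cancel.
    """
    vx, vy = variance(x), variance(y)  # sample variances (divisor n - 1)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1

f_ratio, df_num, df_den = variance_ratio(sample_a, sample_b)
print(f_ratio, df_num, df_den)
```

Here the second sample has the larger variance, so its degrees of freedom (3 − 1 = 2) become the numerator degrees of freedom.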

The distribution is skewed to the right.

Reading F-values from F-tables

Locate the F-tables in any statistics textbook, normally at the back, e.g. Table G in Wayne W. Daniel, Biostatistics: A Foundation for Analysis in the Health Sciences.

The F-value at any point is then obtained using a combination of three things: (i) α and (ii) the two degrees of freedom (numerator and denominator).

The area given in the tables is normally the area to the right of the F-value (the upper-tail area α) corresponding to the two degrees of freedom.

Graph of the F-distribution

Illustrative examples

Suppose we have the following information:

(i) n1 = 11 and n2 = 4, so the degrees of freedom are n1 − 1 = 10 and n2 − 1 = 3

(ii) α = 0.05

Then, from the F-tables:

F 0.025,(10,3) = 14.42

F 0.05,(10,3) = 8.79
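The two table values above can also be checked numerically. The following is a minimal standard-library sketch, not the method used to build the printed tables: it integrates the F upper-tail area with Simpson's rule and finds the critical value by bisection. The function names `f_upper_tail` and `f_critical` are made up for this illustration; in practice a statistics library (for example `scipy.stats.f.ppf`) would normally be used.

```python
import math

def f_upper_tail(x, df1, df2, steps=2000):
    """Approximate P(F > x) for (df1, df2) degrees of freedom, x > 0."""
    a, b = df2 / 2.0, df1 / 2.0
    z = df2 / (df2 + df1 * x)      # P(F > x) = I_z(df2/2, df1/2)
    upper = math.sqrt(z)
    h = upper / steps

    def g(u):
        # Incomplete-beta integrand after the substitution t = u**2.
        return 2.0 * u ** (df2 - 1) * (1.0 - u * u) ** (b - 1.0)

    # Composite Simpson rule on [0, sqrt(z)] (steps must be even).
    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)

def f_critical(alpha, df1, df2):
    """Find x such that the area to the right of x equals alpha."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:
        hi *= 2.0                  # expand until the tail area drops below alpha
    for _ in range(60):            # bisection on the decreasing tail area
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(f_critical(0.05, 10, 3), 2))
print(round(f_critical(0.025, 10, 3), 2))
```

The arguments are given in the same order as in the table notation: α first, then the numerator and denominator degrees of freedom.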

Obtain the following F-values from the F-tables

i. F 0.05,(4,20)

ii. F 0.05,(20,4)

iii. F 0.1, (5,10)

iv. F 0.1,(10,5)

v. F 0.025,(12,2)

ANOVA is a statistical procedure for partitioning error (variation) according to its source, i.e. how much of the error is attributed to the experimental design and how much is attributed to the treatment.

Example

A researcher wants to compare the effect of 3 different fertilizers on the yield of maize. He is interested in knowing whether differences in yield are due to the type of fertilizer, the soil type, or both; that is, the effect of fertilizer (treatment) or soil on maize yield.

One method of determining such effects is ANOVA, where you compare the performance of the three types of fertilizer while controlling for soil type. When we look at the effect of fertilizer alone, without considering soil type, the analysis is referred to as ONE-WAY ANOVA.

When we look at the effect of both fertilizer and soil type, we have a TWO-WAY ANOVA.

When we consider more than 2 factors, we are dealing with n-WAY ANOVA.

For this course, we shall only look at ONE-WAY ANOVA.

ONE-WAY ANOVA

One-way ANOVA has three components, namely:

1. SST = Total Sum of Squares
2. SSC = Sum of Squares for column means
3. SSE = Error Sum of Squares

Sum of Squares computational formulas

$$SST = \sum_{i}\sum_{j} x_{ij}^2 - \frac{T_{..}^2}{N} \qquad \text{(i)}$$

$$SSC = \sum_{i} \frac{T_{i.}^2}{n} - \frac{T_{..}^2}{N} \qquad \text{(ii)}$$

$$SSE = SST - SSC \qquad \text{(iii)}$$

where

- $N$ is the total number of observations (the number of participants, or sample size)
- $T_{i.}$ is the total of the observations in the ith column
- $T_{..}$ is the total of all the observations
- $n$ is the number of rows (observations per column); for unequal numbers of rows it is replaced by $n_i$, the number of rows in the ith column
- $x_{ij}$ is the jth observation from the ith population
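The three computational formulas can be sketched in a few lines of code. This is only an illustration: the function name `sums_of_squares` and the small data set are hypothetical, and each column total is divided by its own column length so that unequal column sizes are also handled.

```python
def sums_of_squares(columns):
    """Return (SST, SSC, SSE) for a list of columns of observations."""
    N = sum(len(col) for col in columns)             # total number of observations
    grand_total = sum(sum(col) for col in columns)   # T..
    correction = grand_total ** 2 / N                # T..^2 / N
    sst = sum(x * x for col in columns for x in col) - correction     # equation (i)
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - correction  # equation (ii)
    sse = sst - ssc                                  # equation (iii)
    return sst, ssc, sse

# Hypothetical data: three columns of three observations each.
columns = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
sst, ssc, sse = sums_of_squares(columns)
print(sst, ssc, sse)  # 48.0 42.0 6.0
```

Because SSE is obtained by subtraction, only the grand total, the column totals and the sum of squared observations are needed.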

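We shall use equations (i), (ii) and (iii) to develop an ANOVA table:

| Source | SS | df | MS | V.R. |
|---|---|---|---|---|
| Treatment | SSC | C − 1 | MSC = SSC/(C − 1) | V.R. = F = MSC/MSE |
| Residual | SSE | N − C | MSE = SSE/(N − C) | |
| Total | SST | N − 1 | | |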
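The table layout above translates directly into a small helper. The sketch below is illustrative only: `anova_table` is a hypothetical name, and the sums of squares passed in (SSC = 42, SSE = 6, with C = 3 columns and N = 9 observations) are made-up values, not data from this chapter.

```python
def anova_table(ssc, sse, c, n_total):
    """Build the one-way ANOVA table from SSC, SSE, C columns and N observations."""
    df_treat, df_resid = c - 1, n_total - c
    msc = ssc / df_treat          # MSC = SSC / (C - 1)
    mse = sse / df_resid          # MSE = SSE / (N - C)
    return {
        "Treatment": {"SS": ssc, "df": df_treat, "MS": msc, "VR": msc / mse},
        "Residual": {"SS": sse, "df": df_resid, "MS": mse},
        "Total": {"SS": ssc + sse, "df": n_total - 1},
    }

# Hypothetical sums of squares, for illustration only.
table = anova_table(ssc=42.0, sse=6.0, c=3, n_total=9)
for source, row in table.items():
    print(source, row)
```

Note that the treatment and residual degrees of freedom add up to the total, (C − 1) + (N − C) = N − 1, just as SSC + SSE = SST.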

Steps in ANOVA

1. Hypothesis statement

H0: µ1 = µ2 = µ3

HA: at least two of the means are not equal

OR

H0: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$

HA: at least two of the variances are not the same

2. Critical region (values)

Reject H0 if f > fα[k − 1, k(n − 1)]

3. Computation of test statistic

We use the formulae and summarize the results in an ANOVA table, whose format is given below:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | Computed f |
|---|---|---|---|---|
| Column means | SSC | k − 1 | $s_1^2$ = SSC/(k − 1) | f = $s_1^2/s_2^2$ |
| Error | SSE | k(n − 1) | $s_2^2$ = SSE/[k(n − 1)] | |
| Total | SST | nk − 1 | | |

Example one

An elementary school teacher wants to try out 3 different reading workbooks. At the end of the year, the 18 children in the class will take a test in reading achievement. These test scores will be used to compare the workbooks. The table below gives the reading achievement test scores. Each set of scores of the six children using a workbook is considered a sample from the population of all the nursery children who might use that type of workbook.

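| | Workbook 1 | Workbook 2 | Workbook 3 | Total |
|---|---|---|---|---|
| | 2 | 9 | 4 | |
| | 4 | 10 | 5 | |
| | 3 | 10 | 6 | |
| | 4 | 7 | 3 | |
| | 5 | 8 | 7 | |
| | 6 | 10 | 5 | |
| Total | 24 | 54 | 30 | 108 |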
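The notes give the workbook scores and their totals but no worked solution. As a sketch of how equations (i) to (iii) would apply to this table, the code below computes the sums of squares and the f ratio step by step, with k = 3 workbooks and n = 6 children per workbook. The variable names are chosen for this illustration only.

```python
scores = {
    "Workbook 1": [2, 4, 3, 4, 5, 6],
    "Workbook 2": [9, 10, 10, 7, 8, 10],
    "Workbook 3": [4, 5, 6, 3, 7, 5],
}

k = len(scores)                 # number of columns (workbooks)
n = 6                           # observations per column
N = k * n                       # total number of observations
T = sum(sum(col) for col in scores.values())    # grand total (108 in the table)
correction = T ** 2 / N                          # T..^2 / N

sst = sum(x * x for col in scores.values() for x in col) - correction
ssc = sum(sum(col) ** 2 for col in scores.values()) / n - correction
sse = sst - ssc

msc = ssc / (k - 1)             # mean square for column means
mse = sse / (k * (n - 1))       # error mean square
f = msc / mse

print(f"SST={sst:.3f} SSC={ssc:.3f} SSE={sse:.3f} f={f:.2f}")
# SST=112.000 SSC=84.000 SSE=28.000 f=22.50
```

The computed f would then be compared with the tabulated value for (k − 1, k(n − 1)) = (2, 15) degrees of freedom, which is about 3.68 at α = 0.05 in standard F-tables.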

Example two

The data in the table below represent the number of hours of pain relief provided by 5 different brands of headache tablets administered to 25 subjects. The 25 subjects were randomly divided into 5 groups and each group was treated with a different brand.

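Hours of relief from the headache tablets

| Tablet | A | B | C | D | E | Total |
|---|---|---|---|---|---|---|
| | 5 | 9 | 3 | 2 | 7 | |
| | 4 | 7 | 5 | 3 | 6 | |
| | 8 | 8 | 2 | 4 | 9 | |
| | 6 | 6 | 3 | 1 | 4 | |
| | 3 | 9 | 7 | 4 | 7 | |
| Total | 26 | 39 | 20 | 14 | 33 | 132 |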

Question: Perform the analysis of variance and test, at the α = 0.05 level of significance, the hypothesis that the mean number of hours of relief provided by the tablets is the same for all five brands.

Solution:

1. Hypothesis statement

H0: µ1 = µ2 = µ3= µ4 = µ5

H1: At least two of the means are not the same

2. Critical values

α = 0.05

Reject H0 if f > fα[k − 1, k(n − 1)]

k = 5 and n = 5, so the critical value is fα[5 − 1, 5(5 − 1)] = f0.05[4, 20]

From the F-distribution tables, f0.05[4, 20] = 2.87

Therefore, reject H0 if f > 2.87

3. Computation of test statistic

SST = 5² + 4² + … + 7² − 132²/25 = 834 − 696.960 = 137.040

SSC = (26² + 39² + … + 33²)/5 − 132²/25 = 776.400 − 696.960 = 79.440

SSE = 137.040 − 79.440 = 57.600

These results and the remaining computations are entered in the ANOVA table:

| Source of variation | Sum of squares | Degrees of freedom | Mean square | Computed f |
|---|---|---|---|---|
| Column means | SSC = 79.440 | k − 1 = 4 | $s_1^2$ = SSC/(k − 1) = 79.440/4 = 19.860 | f = $s_1^2/s_2^2$ = 6.90 |
| Error | SSE = 57.600 | k(n − 1) = 20 | $s_2^2$ = SSE/[k(n − 1)] = 57.600/20 = 2.880 | |
| Total | SST = 137.040 | nk − 1 = 24 | | |

4. Decision

Since the computed f = 6.90 exceeds the critical value 2.87, reject the null hypothesis and conclude that the mean number of hours of relief provided by the headache tablets is not the same for all 5 brands.
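The arithmetic of this example can be re-checked with a short script. This is a sketch under the same one-way layout, using the data from the table of relief hours; the helper name `one_way_anova` is hypothetical, and the critical value 2.87 is simply the table value quoted in the solution, not something the code derives.

```python
relief_hours = {
    "A": [5, 4, 8, 6, 3],
    "B": [9, 7, 8, 6, 9],
    "C": [3, 5, 2, 3, 7],
    "D": [2, 3, 4, 1, 4],
    "E": [7, 6, 9, 4, 7],
}

def one_way_anova(groups):
    """Return (SST, SSC, SSE, f) for a list of groups of observations."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    correction = sum(sum(g) for g in groups) ** 2 / N
    sst = sum(x * x for g in groups for x in g) - correction
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - ssc
    f = (ssc / (k - 1)) / (sse / (N - k))
    return sst, ssc, sse, f

sst, ssc, sse, f = one_way_anova(list(relief_hours.values()))
F_CRITICAL = 2.87  # f0.05[4, 20], as read from the F-tables in the solution

print(round(sst, 3), round(ssc, 3), round(sse, 3), round(f, 2))
print("reject H0" if f > F_CRITICAL else "do not reject H0")
```

With equal group sizes, N − k equals k(n − 1), so the error degrees of freedom here are 25 − 5 = 20, matching the table.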

Note: For unequal sample sizes, the critical region is given by

Reject H0 if f > fα[k − 1, N − k], where N is the total number of observations
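To see how the unequal-size case changes the computation, the sketch below uses a small hypothetical data set with groups of 3, 2 and 4 observations. Each group total is divided by its own size, and the error degrees of freedom become N − k rather than k(n − 1).

```python
# Hypothetical groups of unequal size, invented for illustration.
groups = [[1, 2, 3], [4, 6], [5, 7, 9, 11]]

k = len(groups)                          # number of groups
N = sum(len(g) for g in groups)          # total number of observations
correction = sum(sum(g) for g in groups) ** 2 / N

sst = sum(x * x for g in groups for x in g) - correction
ssc = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # each total over its own n_i
sse = sst - ssc

df_between, df_within = k - 1, N - k     # degrees of freedom for the critical value
f = (ssc / df_between) / (sse / df_within)

print(df_between, df_within, f)  # 2 6 7.75
```

The resulting f would be compared with the tabulated value for (k − 1, N − k) = (2, 6) degrees of freedom at the chosen α.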

Assignment:

See the handout.
