jovel-lapid
Analysis of Variance
Introduction
Many statistical problems are solved by making inferences about population means. In this chapter we extend the methods of inference about population means to the comparison of more than two means. When data are obtained according to specified sampling procedures, they are easier to analyze and may also contain more information pertinent to the population means than could be obtained using random sampling. The procedure for selecting sample data is called the design of an experiment (experimental design), and the statistical procedure for comparing the population means is called the analysis of variance (ANOVA).
Read and make notes on the types of experimental study designs:
- The Completely Randomized Design
- The Randomized Block Design
- The Factorial Design
- Etc.
ANOVA
ANOVA stands for analysis of variance.
It is one of the multivariate data analysis techniques.
It is defined as a technique whereby the total variation present in a set of data is partitioned into two or more components.
Associated with each of the components is a specific source of variation, so that in the ... analysis, it is possible to ascertain the magnitude of the contribution of each of these sources to the total variation.
Application
ANOVA has wide application in the analysis of data resulting from experiments.
Analysis of variance is studied mainly for two purposes:
To estimate and test hypotheses about population variances
To estimate and test hypotheses about population means
Other Terms Related to ANOVA
Read about the following
i. Treatment variable
ii. Response variable
iii. Extraneous variables
iv. Experimental unit
ANOVA and the F-distribution
ANOVA uses the F-distribution.
The F-distribution was developed by Sir Ronald Fisher.
It is one of the continuous sampling distributions used in statistics.
This distribution is developed from the ratio
$F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$
where $S_1^2$ is the variance of a sample of size $n_1$ selected from a normally distributed population, $S_2^2$ is the variance of a sample of size $n_2$ selected from a normally distributed population, and $S_1^2$ designates the larger of the two sample variances.
This distribution depends on two degrees-of-freedom values, one corresponding to the value $n_1 - 1$ used in computing $S_1^2$ and another corresponding to the value $n_2 - 1$ used in computing $S_2^2$.
It is skewed to the right (this describes its shape).
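The variance ratio described above can be sketched in a few lines of Python. This is only an illustration: the two samples are hypothetical values invented for the sketch, the function name `variance_ratio` is not from the notes, and the population variances are assumed equal, in which case the ratio reduces to $S_1^2/S_2^2$.

```python
from statistics import variance

def variance_ratio(sample_a, sample_b):
    """Return (F, numerator df, denominator df), larger sample variance on top."""
    var_a = variance(sample_a)  # sample variance, divisor n - 1
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical data, invented for illustration only.
sample_a = [12, 15, 11, 18, 14, 10]
sample_b = [13, 14, 13, 15, 14]

f_value, df1, df2 = variance_ratio(sample_a, sample_b)
print(f"F = {f_value:.3f} with ({df1}, {df2}) degrees of freedom")
```

Placing the larger variance in the numerator keeps the ratio at or above 1, which matches the convention stated above for $S_1^2$.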
Reading F-values from F-tables
Locate the F-tables in any statistics book, normally at the back, e.g. Table G in Wayne W. Daniel, Biostatistics: A Foundation for Analysis in the Health Sciences.
The F-value at any point is then obtained using a combination of three things: (i) α, and (ii) the two degrees of freedom, $n_1 - 1$ (numerator) and $n_2 - 1$ (denominator).
The area given in the tables is normally the area to the right of the F-value for the given pair of degrees of freedom.
Graph of the F-distribution
Illustrative examples
Suppose we have the following information:
(i) $n_1 = 11$, $n_2 = 4$
(ii) α = 0.05
The degrees of freedom are $n_1 - 1 = 10$ and $n_2 - 1 = 3$, so:
$F_{0.025,(10,3)} = 14.42$
$F_{0.05,(10,3)} = 8.79$
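Table values like the two above can be cross-checked numerically. The sketch below is one possible way to do so with the Python standard library only: it evaluates the upper-tail area of the F-distribution through the regularized incomplete beta function (a standard continued-fraction method) and then finds the critical value by bisection. The function names are illustrative and not part of the notes; where SciPy is installed, `scipy.stats.f.ppf(1 - alpha, df1, df2)` is the more usual route.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-12, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f, df1, df2):
    """Area to the right of f under the F-distribution with (df1, df2) df."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def f_critical(alpha, df1, df2):
    """F-value whose upper-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:   # widen until the tail area is small enough
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(f_critical(0.025, 10, 3), 2))
print(round(f_critical(0.05, 10, 3), 2))
```

The first argument is the area to the right of the F-value, matching the way the tables are read above.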
Obtain the following F-values from the F-tables:
i. $F_{0.05,(4,20)}$
ii. $F_{0.05,(20,4)}$
iii. $F_{0.1,(5,10)}$
iv. $F_{0.1,(10,5)}$
v. $F_{0.025,(12,2)}$
ANOVA is a statistical procedure for partitioning error (variation) by source, i.e. how much of the error is attributed to the experimental design and how much is attributed to the treatment.
Example
A researcher wants to compare the effect of 3 different fertilizers on the yield of maize. He is interested in knowing whether differences in yield are due to the type of fertilizer, the soil type, or both; that is, the effect of fertilizer (treatment) or soil on maize yield.
One method of determining such effects is ANOVA, where you compare the performance of the three types of fertilizer while controlling for soil type. When we look at the effect of fertilizer alone, without considering soil type, the design of the analysis is referred to as ONE-WAY ANOVA.
When we look at the effects of both fertilizer and soil type, we have a TWO-WAY ANOVA.
When we consider more than 2 factors, we are dealing with n-WAY ANOVA.
For this course, we shall only look at ONE-WAY ANOVA.
ONE-WAY ANOVA
One-way ANOVA has three components, namely:
1. SST = Total Sum of Squares
2. SSC = Sum of Squares for Column Means
3. SSE = Error Sum of Squares
Sum of Squares Computational Formulas
$SST = \sum_{i}\sum_{j} x_{ij}^2 - \dfrac{T_{..}^2}{N}$ ... (i)
$SSC = \sum_{i} \dfrac{T_{i.}^2}{n_i} - \dfrac{T_{..}^2}{N}$ ... (ii)
$SSE = SST - SSC$ ... (iii)
where
$N$ is the total number of observations (the number of participants, or overall sample size),
$T_{i.}$ is the total of the observations in the ith column,
$T_{..}$ is the total of all the observations,
$n_i$ is the number of rows (measurements) in the ith column, written simply $n$ when every column has the same number of rows,
$x_{ij}$ is the jth observation from the ith population.
We shall use equations (i), (ii) and (iii) to develop an ANOVA table.
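Source       SS     df       MS                   V.R.
Treatment    SSC    C - 1    MSC = SSC/(C - 1)    V.R. = F = MSC/MSE
Residual     SSE    N - C    MSE = SSE/(N - C)
Total        SST    N - 1
Here C is the number of columns (treatments) and V.R. is the variance ratio.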
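The computational formulas (i) to (iii) and the table layout above can be sketched as a small Python function. The function name `one_way_anova` and the data are hypothetical, chosen only to illustrate columns of unequal length; this is a sketch of the hand computation, not a replacement for a statistics package.

```python
def one_way_anova(columns):
    """Return the one-way ANOVA table quantities for a list of columns (treatments).

    Columns may have unequal lengths; each column total is divided by its own n_i.
    """
    c = len(columns)                                  # number of treatments, C
    n_total = sum(len(col) for col in columns)        # N
    grand_total = sum(sum(col) for col in columns)    # T..
    correction = grand_total ** 2 / n_total           # T..^2 / N

    sst = sum(x * x for col in columns for x in col) - correction         # formula (i)
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - correction   # formula (ii)
    sse = sst - ssc                                                        # formula (iii)

    df_treatment, df_residual = c - 1, n_total - c
    msc = ssc / df_treatment
    mse = sse / df_residual
    return {
        "SST": sst, "SSC": ssc, "SSE": sse,
        "df": (df_treatment, df_residual, n_total - 1),
        "MSC": msc, "MSE": mse, "F": msc / mse,
    }

# Hypothetical data with unequal column sizes, invented for illustration only.
data = [[3, 5, 4], [8, 9, 7, 8], [5, 6]]
result = one_way_anova(data)
for name, value in result.items():
    print(name, value)
```

Dividing each squared column total by its own $n_i$ is what lets the same function handle both equal and unequal numbers of rows.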
Steps in ANOVA
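1. Hypothesis statement
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_A$: at least two of the means are not equal
OR
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$
$H_A$: at least two of the variances are not the same
2. Critical region (values)
Reject $H_0$ if $f > f_\alpha[k-1,\ k(n-1)]$, where $k$ is the number of columns (treatments) and $n$ is the number of observations in each column.
3. Computation of test statistic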
We use the formulae and summarize the results in an ANOVA table, whose format is given below.
Source of variation    Sum of squares    Degrees of freedom    Mean square                   Computed f
Column means           SSC               k - 1                 $S_1^2$ = SSC/(k - 1)         f = $S_1^2/S_2^2$
Error                  SSE               k(n - 1)              $S_2^2$ = SSE/[k(n - 1)]
Total                  SST               nk - 1
Example one
An elementary school teacher wants to try out 3 different reading workbooks. At the end of the year, the 18 children in the class will take a test in reading achievement. These test scores will be used to compare the workbooks. The table below gives the reading achievement test scores. Each set of scores of the six children using a workbook is considered as a sample from the population of all the nursery children who might use that type of workbook.
         Workbook 1    Workbook 2    Workbook 3    Total
         2             9             4
         4             10            5
         3             10            6
         4             7             3
         5             8             7
         6             10            5
Total    24            54            30            108
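As a check on the hand computation, the workbook scores above can be run through the one-way ANOVA formulas in Python. The notes give only the data for this example, so the sketch below simply computes the sums of squares, mean squares and f, assuming equal column sizes; the variable names are illustrative.

```python
# Reading scores from the table above, one list per workbook.
scores = {
    "Workbook 1": [2, 4, 3, 4, 5, 6],
    "Workbook 2": [9, 10, 10, 7, 8, 10],
    "Workbook 3": [4, 5, 6, 3, 7, 5],
}

k = len(scores)                                   # number of columns
n = len(next(iter(scores.values())))              # observations per column (equal here)
totals = {name: sum(col) for name, col in scores.items()}
grand_total = sum(totals.values())
correction = grand_total ** 2 / (n * k)           # T..^2 / N

sst = sum(x ** 2 for col in scores.values() for x in col) - correction
ssc = sum(t ** 2 for t in totals.values()) / n - correction
sse = sst - ssc

ms_columns = ssc / (k - 1)
ms_error = sse / (k * (n - 1))
f_value = ms_columns / ms_error

print(f"{'Source':<14}{'SS':>9}{'df':>5}{'MS':>9}{'f':>8}")
print(f"{'Column means':<14}{ssc:>9.3f}{k - 1:>5}{ms_columns:>9.3f}{f_value:>8.2f}")
print(f"{'Error':<14}{sse:>9.3f}{k * (n - 1):>5}{ms_error:>9.3f}")
print(f"{'Total':<14}{sst:>9.3f}{n * k - 1:>5}")
```

For these scores the sketch gives SST = 112, SSC = 84, SSE = 28 and f = 22.5, with 2 and 15 degrees of freedom.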
Example two
The data in the table below represent the number of hours of pain relief provided by 5 different brands of headache tablets administered to 25 subjects. The 25 subjects were randomly divided into 5 groups, and each group was treated with a different brand.
Hours of relief from headache tablets
Tablet
         A     B     C     D     E     Total
         5     9     3     2     7
         4     7     5     3     6
         8     8     2     4     9
         6     6     3     1     4
         3     9     7     4     7
Total    26    39    20    14    33    132
Question: Perform the analysis of variance, and test the hypothesis at the α = 0.05 level of significance that the mean number of hours of relief provided by the tablets is the same for all five brands.
Solution:
1. Hypothesis statement
H0: µ1 = µ2 = µ3= µ4 = µ5
H1: At least two of the means are not the same
2. Critical values
α= 0.05
Reject $H_0$ if $f > f_\alpha[k-1,\ k(n-1)]$
$k = 5$, $n = 5$,
so $f_\alpha[5-1,\ 5(5-1)] = f_{0.05}[4, 20]$.
From the F-distribution tables, $f_{0.05}[4, 20] = 2.87$.
Therefore, reject $H_0$ if $f > 2.87$.
Computation of test statistic
$SST = 5^2 + 4^2 + \dots + 7^2 - \dfrac{132^2}{25} = 834 - 696.960 = 137.040$
$SSC = \dfrac{26^2 + 39^2 + \dots + 33^2}{5} - \dfrac{132^2}{25} = 776.400 - 696.960 = 79.440$
$SSE = 137.040 - 79.440 = 57.600$
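The three sums of squares above can be reproduced directly from the tablet data. The short Python sketch below mirrors the hand arithmetic step by step; the variable names are illustrative and not part of the notes.

```python
# Hours of relief, one list per tablet brand (columns of the table above).
relief_hours = {
    "A": [5, 4, 8, 6, 3],
    "B": [9, 7, 8, 6, 9],
    "C": [3, 5, 2, 3, 7],
    "D": [2, 3, 4, 1, 4],
    "E": [7, 6, 9, 4, 7],
}

columns = list(relief_hours.values())
n_total = sum(len(col) for col in columns)                      # N = 25
grand_total = sum(sum(col) for col in columns)                  # T.. = 132
sum_of_squares = sum(x ** 2 for col in columns for x in col)    # sum of all x_ij^2
correction = grand_total ** 2 / n_total                         # T..^2 / N
column_term = sum(sum(col) ** 2 / len(col) for col in columns)  # sum of T_i.^2 / n

sst = sum_of_squares - correction
ssc = column_term - correction
sse = sst - ssc

print(f"SST = {sum_of_squares} - {correction:.3f} = {sst:.3f}")
print(f"SSC = {column_term:.3f} - {correction:.3f} = {ssc:.3f}")
print(f"SSE = {sst:.3f} - {ssc:.3f} = {sse:.3f}")
```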
These results and the remaining computations are entered in the ANOVA table.
Source of variation    Sum of squares    Degrees of freedom    Mean square                                        Computed f
Column means           SSC = 79.440      k - 1 = 4             $S_1^2$ = SSC/(k - 1) = 79.440/4 = 19.860          f = $S_1^2/S_2^2$ = 6.90
Error                  SSE = 57.600      k(n - 1) = 20         $S_2^2$ = SSE/[k(n - 1)] = 57.600/20 = 2.880
Total                  SST = 137.040     nk - 1 = 24
Decision
Since the computed f = 6.90 exceeds the critical value 2.87, reject the null hypothesis and conclude that the mean number of hours of relief provided by the headache tablets is not the same for all 5 brands.
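The decision step can also be sketched in Python, starting from the sums of squares and the critical value 2.87 already obtained in this example. The helper name `anova_decision` is hypothetical, and the sketch assumes equal group sizes, as in this example.

```python
def anova_decision(ssc, sse, k, n, f_critical):
    """Return (mean square for columns, mean square for error, f, reject H0?)."""
    df_columns = k - 1            # numerator degrees of freedom
    df_error = k * (n - 1)        # denominator degrees of freedom
    ms_columns = ssc / df_columns
    ms_error = sse / df_error
    f_value = ms_columns / ms_error
    return ms_columns, ms_error, f_value, f_value > f_critical

# Values taken from the headache tablet example: SSC, SSE, k = 5 brands, n = 5 per brand.
ms_columns, ms_error, f_value, reject = anova_decision(
    79.440, 57.600, k=5, n=5, f_critical=2.87
)
print(f"Mean squares: {ms_columns:.3f} and {ms_error:.3f}")
print(f"f = {f_value:.2f}; reject H0: {reject}")
```

The comparison `f_value > f_critical` is the same one-sided rule stated in the critical region, so the function returns `True` exactly when the null hypothesis is rejected.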
Note: For unequal sample sizes, the critical region is given by
Reject $H_0$ if $f > f_\alpha[k-1,\ N-k]$, where $N$ is the total number of observations.
Assignment:
See handout.