Ch12 Analysis of Variance


Outline

Completely randomized designs

Randomized-block designs

Analysis of Variance

Single Factor Analysis of Variance

Single Factor ANOVA

One Way Analysis of Variance

One Way ANOVA

Background

If we have, say, 3 treatments to compare (A, B, C), we would need 3 separate t-tests (A with B, A with C, and B with C). With 7 treatments we would need 21 separate t-tests. This would be time-consuming but, more importantly, it would be inherently flawed, because each t-test carried out at the 0.05 significance level accepts a 5% chance of wrongly declaring a difference when none exists. Across 21 such tests we would therefore expect about one false positive (21 × 0.05 ≈ 1) even if all the treatment means were equal. ANalysis Of VAriance (ANOVA) overcomes this problem by testing for differences between the treatments as a whole: we do a single test, at our chosen significance level, to see whether there are any differences between the means.
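The counting argument above can be checked with a few lines of Python. This is only a sketch: the function name `pairwise_test_burden` is made up for illustration, and the familywise error rate it reports assumes the tests are independent, which pairwise t-tests on shared samples are not, so that figure should be read as a rough guide.

```python
from math import comb

def pairwise_test_burden(k, alpha=0.05):
    """Return (number of pairwise tests, expected false positives,
    familywise error rate under an independence assumption)."""
    m = comb(k, 2)                      # number of distinct treatment pairs
    expected_false = m * alpha          # expected false rejections if all means are equal
    familywise = 1 - (1 - alpha) ** m   # P(at least one false rejection), assuming independence
    return m, expected_false, familywise

for k in (3, 7):
    m, expected_false, familywise = pairwise_test_burden(k)
    print(f"k={k}: {m} tests, expected false positives {expected_false:.2f}, "
          f"familywise error {familywise:.3f}")
```

For k = 7 this gives the 21 tests and roughly one expected false positive quoted in the text.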

Assumptions: normally distributed populations with equal variances, independent populations, and random sampling.

The scheme of one-way classification

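\[
\begin{array}{l|l|c|c}
 & \text{Observations} & \text{Means} & \text{Sum of squares} \\ \hline
\text{Sample } 1 & y_{11},\ y_{12},\ \ldots,\ y_{1j},\ \ldots,\ y_{1n_1} & \bar{y}_1 & \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2 \\
\text{Sample } 2 & y_{21},\ y_{22},\ \ldots,\ y_{2j},\ \ldots,\ y_{2n_2} & \bar{y}_2 & \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2 \\
\vdots & \vdots & \vdots & \vdots \\
\text{Sample } i & y_{i1},\ y_{i2},\ \ldots,\ y_{ij},\ \ldots,\ y_{in_i} & \bar{y}_i & \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \\
\vdots & \vdots & \vdots & \vdots \\
\text{Sample } k & y_{k1},\ y_{k2},\ \ldots,\ y_{kj},\ \ldots,\ y_{kn_k} & \bar{y}_k & \sum_{j=1}^{n_k} (y_{kj} - \bar{y}_k)^2
\end{array}
\]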

Simplify

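\[
T_{\cdot} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}, \qquad
N = \sum_{i=1}^{k} n_i, \qquad
\bar{y} = \frac{T_{\cdot}}{N} = \frac{\sum_{i=1}^{k} n_i \bar{y}_i}{\sum_{i=1}^{k} n_i}
\]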

$\bar{y}$ is the overall mean or grand mean of all observations.

$\bar{y}_i$ is the mean of the measurements in the i-th sample (for example, those obtained by the i-th laboratory).

The statistical analysis leading to a comparison of the k different population means consists essentially of splitting the sum of squares about the overall grand mean into a component due to treatment difference, and a component due to error or variation within a sample.

Example

Suppose three drying formulas for curing a glue are studied and the following drying times are observed.

Formula A: 13 10 8 11 8

Formula B: 13 11 14 14

Formula C: 4 1 3 4 2 4

Each observation can be decomposed as

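\[
\underbrace{y_{ij}}_{\text{observation}}
= \underbrace{\bar{y}}_{\text{grand mean}}
+ \underbrace{(\bar{y}_i - \bar{y})}_{\text{deviation due to treatment}}
+ \underbrace{(y_{ij} - \bar{y}_i)}_{\text{error}}
\]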

Repeating the decomposition for each observation, we obtain the arrays

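\[
\underbrace{\begin{bmatrix}
13 & 10 & 8 & 11 & 8 & \\
13 & 11 & 14 & 14 & & \\
4 & 1 & 3 & 4 & 2 & 4
\end{bmatrix}}_{\text{observation } y_{ij}}
=
\underbrace{\begin{bmatrix}
8 & 8 & 8 & 8 & 8 & \\
8 & 8 & 8 & 8 & & \\
8 & 8 & 8 & 8 & 8 & 8
\end{bmatrix}}_{\text{grand mean } \bar{y}}
+
\underbrace{\begin{bmatrix}
2 & 2 & 2 & 2 & 2 & \\
5 & 5 & 5 & 5 & & \\
-5 & -5 & -5 & -5 & -5 & -5
\end{bmatrix}}_{\text{treatment effects } \bar{y}_i - \bar{y}}
+
\underbrace{\begin{bmatrix}
3 & 0 & -2 & 1 & -2 & \\
0 & -2 & 1 & 1 & & \\
1 & -2 & 0 & 1 & -1 & 1
\end{bmatrix}}_{\text{error } y_{ij} - \bar{y}_i}
\]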

Treatment sum of squares:

\[
SS(Tr) = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2
\]

Error sum of squares:

\[
SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
\]

Degrees of freedom for treatment: k-1

Degrees of freedom for error: N-k

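Theorem.

\[
SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
+ \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2
\]

that is,

\[
SST = SSE + SS(Tr)
\]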
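As a numerical check of this identity, the following sketch computes the three sums of squares for the glue example directly from their definitions. The function name `sums_of_squares` and the dictionary layout are choices made here for illustration, not part of the original slides.

```python
# Drying times from the glue example (formulas A, B, C).
glue = {
    "A": [13, 10, 8, 11, 8],
    "B": [13, 11, 14, 14],
    "C": [4, 1, 3, 4, 2, 4],
}

def sums_of_squares(groups):
    """Return (SST, SS(Tr), SSE) for an iterable of samples of any sizes."""
    samples = [list(g) for g in groups]
    observations = [y for g in samples for y in g]
    grand_mean = sum(observations) / len(observations)
    ss_tr = 0.0   # variation between the sample means
    sse = 0.0     # variation within the samples
    for g in samples:
        mean_i = sum(g) / len(g)
        ss_tr += len(g) * (mean_i - grand_mean) ** 2
        sse += sum((y - mean_i) ** 2 for y in g)
    sst = sum((y - grand_mean) ** 2 for y in observations)
    return sst, ss_tr, sse

sst, ss_tr, sse = sums_of_squares(glue.values())
print(f"SST = {sst:.2f}, SS(Tr) = {ss_tr:.2f}, SSE = {sse:.2f}")
print("identity holds:", abs(sst - (ss_tr + sse)) < 1e-9)
```

With sample means 10, 13 and 3 and grand mean 8, the treatment and error sums of squares add up to the total, as the theorem states.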

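If $\mu_i$ denotes the mean of the i-th population and $\sigma^2$ denotes the common variance of the k populations, each observation can be written as

\[
Y_{ij} = \mu_i + \varepsilon_{ij}
\]

where the $\varepsilon_{ij}$ are independent random errors with mean 0 and variance $\sigma^2$. Equivalently,

\[
Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}
\]

where $\mu$ is the mean of the $\mu_i$ in the experiment (weighted by the sample sizes, $\mu = \sum_{i=1}^{k} n_i \mu_i / N$) and $\alpha_i$ is the effect of the i-th treatment; hence

\[
\sum_{i=1}^{k} n_i \alpha_i = 0
\]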

The null hypothesis that the k population means are all equal can be replaced by the null hypothesis

\[
H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0
\]

The alternative hypothesis is that at least two of the population means are unequal, that is, $\alpha_i \neq 0$ for at least one i.

To test the null hypothesis that the k population means are all equal, we shall compare two estimates of $\sigma^2$: one based on the variation among the sample means, and one based on the variation within the samples.

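Each sum of squares is first converted to a mean square:

\[
\text{mean square} = \frac{\text{sum of squares}}{\text{degrees of freedom}}
\]

When the population means are equal, both

\[
\text{treatment mean square} = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}{k-1}
\qquad \text{and} \qquad
\text{error mean square} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{N-k}
\]

are estimates of $\sigma^2$.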

If the null hypothesis is true, it can be shown that the two mean squares are independent and that their ratio

\[
F = \frac{SS(Tr)/(k-1)}{SSE/(N-k)}
= \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 / (k-1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 / (N-k)}
\]

has an F distribution with k-1 and N-k degrees of freedom.

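A large value of F indicates large differences between the sample means. Therefore, the null hypothesis will be rejected at level of significance $\alpha$ if $F > F_\alpha$, where $F_\alpha$ is the upper-$\alpha$ critical value of the F distribution with k-1 and N-k degrees of freedom.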
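Critical values are normally read from an F table or from statistical software. For the special case of three treatments (numerator degrees of freedom k-1 = 2) the upper tail of the F distribution has a simple closed form, which the sketch below uses to apply the rejection rule without a table. This shortcut is not from the original slides, the function names are hypothetical, and the formulas are valid only when the numerator degrees of freedom equal 2.

```python
def f_upper_tail_df2(f, df_error):
    """P(F > f) for an F distribution with 2 and df_error degrees of freedom.

    Closed form valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f / df_error) ** (-df_error / 2)

def f_critical_df2(alpha, df_error):
    """Upper-alpha critical value of F(2, df_error), obtained by inverting
    the tail formula above."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)

def reject_equal_means(f_stat, alpha, df_error):
    """Rejection rule: reject H0 when the computed F exceeds the critical value."""
    return f_stat > f_critical_df2(alpha, df_error)

for df_error in (12, 6):
    print(f"F_0.05(2, {df_error}) = {f_critical_df2(0.05, df_error):.2f}")
```

For other numerator degrees of freedom no such shortcut applies, and a table or a statistics library is needed.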

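One-way ANOVA table (equal sample sizes n, so N = nk)

\[
\begin{array}{l|c|c|c|c}
\text{Source of variation} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Mean square} & \text{Computed } f \\ \hline
\text{Treatments} & SS(Tr) & k-1 & s_1^2 = \dfrac{SS(Tr)}{k-1} & f = \dfrac{s_1^2}{s^2} \\
\text{Error} & SSE & k(n-1) & s^2 = \dfrac{SSE}{k(n-1)} & \\
\text{Total} & SST & nk-1 & &
\end{array}
\]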

Solution of the example

One-way ANOVA: A, B, C

Source  DF      SS      MS      F      P
Factor   2  270.00  135.00  50.63  0.000
Error   12   32.00    2.67
Total   14  302.00

The computed F = 50.63 exceeds the critical value $F_{0.05}(2, 12) = 3.89$, so we reject the null hypothesis of equal means.

Exercise

Assume that we have recorded the biomass of 3 bacteria in flasks of glucose broth, using 3 replicate flasks for each bacterium.

Replicate  Bacterium A  Bacterium B  Bacterium C
1                   12           20           40
2                   15           19           35
3                    9           23           42

Solution

One-way ANOVA: A, B, C

Source DF SS MS F P

Factor 2 1140.2 570.1 64.93 0.000

Error 6 52.7 8.78

Total 8 1192.9

The computed F = 64.93 exceeds the critical value $F_{0.05}(2, 6) = 5.14$ at the 0.05 level, so we reject the null hypothesis of equal means.
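The table for this exercise can be rebuilt by hand using totals instead of deviations. The sketch below uses the usual computing formulas with the correction term C = T²/N, so that SST = Σy² − C, SS(Tr) = ΣT_i²/n_i − C and SSE = SST − SS(Tr). These shortcut formulas are standard but are not written out in the slides, and the function name `anova_table_from_totals` is an assumption made for this example.

```python
# Biomass readings from the exercise (3 replicate flasks per bacterium).
biomass = {
    "A": [12, 15, 9],
    "B": [20, 19, 23],
    "C": [40, 35, 42],
}

def anova_table_from_totals(groups):
    """One-way ANOVA computed from sample totals and the correction term."""
    samples = list(groups.values())
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    correction = grand_total ** 2 / n_total            # C = T^2 / N
    sst = sum(y * y for s in samples for y in s) - correction
    ss_tr = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    sse = sst - ss_tr
    ms_tr = ss_tr / (k - 1)
    mse = sse / (n_total - k)
    return {
        "Factor": {"DF": k - 1, "SS": ss_tr, "MS": ms_tr, "F": ms_tr / mse},
        "Error": {"DF": n_total - k, "SS": sse, "MS": mse},
        "Total": {"DF": n_total - 1, "SS": sst},
    }

table = anova_table_from_totals(biomass)
for source, row in table.items():
    print(source, {name: round(value, 2) for name, value in row.items()})
```

The sums of squares and mean squares should agree with the printed solution; the F ratio may differ in the second decimal place depending on whether the mean squares are rounded before dividing.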

12.3 Randomized-block designs

       1    2    3    4
1     13    7    9    3
2     14    6    3    1
3     11    5   15    5

Blocks

Two way ANOVA

RCB Randomized Complete Block

The randomized block design is an extension of the paired t-test to situations where the factor of interest has more than two levels.

Example 1:

Suppose we are interested in how weight gain (Y) in rats is affected by Source of protein (Beef, Cereal, and Pork) and by Level of Protein (High or Low).

There are a total of t = 3 × 2 = 6 treatment combinations of the two factors (Beef-High Protein, Cereal-High Protein, Pork-High Protein, Beef-Low Protein, Cereal-Low Protein, and Pork-Low Protein).

Suppose we have available to us a total of N = 60 experimental rats to which we are going to apply the different diets based on the t = 6 treatment combinations.

Prior to the experiment, the rats were divided into n = 10 homogeneous groups (blocks) of size 6.

The grouping was based on factors that had previously been ignored (for example, initial weight and appetite).

Within each of the 10 blocks, each rat is randomly assigned one of the six treatment combinations (diets), so that every diet is used exactly once per block.

The weight gain after a fixed period is measured for each of the test animals and is tabulated on the next slide:

Weight gain by block; (1) to (6) denote the six treatment combinations (diets).

Block  (1)  (2)  (3)  (4)  (5)  (6)
1      107   96  112   83   87   90
2      102   72  100   82   70   94
3      102   76  102   85   95   86
4       93   70   93   63   71   63
5      111   79  101   72   75   81
6      128   89  104   85   84   89
7       56   70   72   64   62   63
8       97   91   92   80   72   82
9       80   63   87   82   81   63
10     103  102  112   83   93   81

Randomized Block Design

Example 2:

The following experiment compares the effect of four different chemicals (A, B, C and D) in producing water resistance (y) in textiles.

A strip of material, randomly selected from each bolt, is cut into four pieces (samples); the pieces are randomly assigned to receive one of the four chemical treatments.

This process is replicated three times, producing a Randomized Block (RB) design.

Moisture resistance (y) was measured for each of the samples. (Low readings indicate low moisture penetration.)

The data are given in the diagram and table on the next slide.

Diagram: Blocks (Bolt Samples)

Block 1   Block 2   Block 3
9.9 C     13.4 D    12.7 B
10.1 A    12.9 B    12.9 D
11.4 B    12.2 A    11.4 C
12.1 D    12.3 C    11.9 A

Table

Blocks (Bolt Samples)

Chemical 1 2 3

A 10.1 12.2 11.9

B 11.4 12.9 12.7

C 9.9 12.3 11.4

D 12.1 13.4 12.9

The randomized block design (RBD) consists of a two-step procedure:

1. Matched sets of experimental units, called blocks, are formed; each of the b blocks consists of a units. The blocks should consist of experimental units that are as similar as possible (to reduce the within-treatments variation).

2. One experimental unit from each block is randomly assigned to each of the a treatments, resulting in a total of n = ab responses.

If every block has responses from all treatments, the design is complete: a randomized complete block (RCB) design.
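The random assignment within blocks described in the two-step procedure can be sketched in Python as follows. The function name `randomize_within_blocks` and the `seed` parameter are assumptions introduced for illustration; the slides do not prescribe a particular randomization tool.

```python
import random

def randomize_within_blocks(treatments, n_blocks, seed=None):
    """Return one random treatment order per block (an RCB layout sketch).

    Position u in a block's list is the treatment given to unit u of that
    block, so every treatment appears exactly once in every block.
    """
    rng = random.Random(seed)      # seeded generator so a layout can be reproduced
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)         # independent random permutation for each block
        layout.append(order)
    return layout

# Example: four chemicals applied to pieces cut from three bolts.
layout = randomize_within_blocks(["A", "B", "C", "D"], n_blocks=3, seed=0)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {order}")
```

Shuffling separately inside each block is what distinguishes this design from a completely randomized one, where all ab units would be shuffled together.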

For example, consider the situation where three different methods were used to predict the shear strength of steel plate girders. Say we use four girders as the experimental units.

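For treatments $i = 1, \ldots, a$ and blocks $j = 1, \ldots, b$, let $y_{ij}$ be the response for treatment i in block j. The treatment means, block means and grand mean are

\[
\bar{y}_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} y_{ij}, \qquad
\bar{y}_{\cdot j} = \frac{1}{a} \sum_{i=1}^{a} y_{ij}, \qquad
\bar{y}_{\cdot\cdot} = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}
\]

The total number of responses is ab.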

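The appropriate linear statistical model:

\[
Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, b
\]

where $\mu$ is the overall mean, $\tau_i$ is the effect of the i-th treatment, $\beta_j$ is the effect of the j-th block, and $\varepsilon_{ij}$ is a random error term.

We assume

• treatments and blocks are initially fixed effects

• blocks do not interact with treatments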

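The hypotheses of interest are:

\[
H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0
\qquad \text{versus} \qquad
H_1: \tau_i \neq 0 \text{ for at least one } i
\]

where $\tau_i$ is the effect of the i-th treatment; i.e., under $H_0$ there is no treatment effect.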

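The total sum of squares is partitioned as $SS_T = SS_{Treatments} + SS_{Blocks} + SS_E$, where

$SS_{Treatments} = b \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$ has a-1 df

$SS_{Blocks} = a \sum_{j=1}^{b} (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2$ has b-1 df

$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$ has (a-1)(b-1) df

The mean squares are:

\[
MS_{Treatments} = \frac{SS_{Treatments}}{a-1}, \qquad
MS_{Blocks} = \frac{SS_{Blocks}}{b-1}, \qquad
MS_E = \frac{SS_E}{(a-1)(b-1)}
\]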

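The expected values of these mean squares are:

\[
E(MS_{Treatments}) = \sigma^2 + \frac{b \sum_{i=1}^{a} \tau_i^2}{a-1}, \qquad
E(MS_{Blocks}) = \sigma^2 + \frac{a \sum_{j=1}^{b} \beta_j^2}{b-1}, \qquad
E(MS_E) = \sigma^2
\]

Here $\tau_i$ and $\beta_j$ are the fixed treatment and block effects and $\sigma^2$ is the error variance. If there is no treatment effect, $MS_{Treatments}$ and $MS_E$ both estimate $\sigma^2$, so the ratio $F = MS_{Treatments} / MS_E$ is compared with the F distribution with a-1 and (a-1)(b-1) degrees of freedom.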

Minitab

Two-way ANOVA: response versus row, col

Source DF SS MS F P

row 2 56 28.0000 3.23 0.112

col 3 90 30.0000 3.46 0.091

Error 6 52 8.6667

Total 11 198

Both P-values (0.112 for row and 0.091 for col) exceed the 0.05 level of significance, so we cannot reject the null hypothesis.

The ANOVA Table for Diet Experiment

Source         S.S.  d.f.        M.S.           F  p-value
Block    5992.41667     9   665.82407        9.52  0.00000
Diet     4572.88333     5  914.576666  13.0766586  0.00000
Error    3147.28333    45    69.93963
Total   13712.58       59

The ANOVA Table for Textile Experiment

SOURCE  SUM OF SQUARES  D.F.  MEAN SQUARE      F  TAIL PROB.
Blocks         7.17167     2       3.5858  40.21      0.0003
Chem           5.20000     3       1.7333  19.44      0.0017
ERROR          0.53500     6       0.0892
Total         12.90667    11
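The textile table can be reproduced from the raw readings with a short randomized-block computation. This is a minimal sketch, assuming one response per treatment-block cell; the function name `rcb_anova` and the returned layout are illustrative choices, not part of the original material.

```python
# Moisture-resistance readings from the textile example:
# rows are chemicals (treatments), columns are bolts (blocks) 1 to 3.
textile = {
    "A": [10.1, 12.2, 11.9],
    "B": [11.4, 12.9, 12.7],
    "C": [9.9, 12.3, 11.4],
    "D": [12.1, 13.4, 12.9],
}

def rcb_anova(rows):
    """ANOVA for a randomized complete block design, one response per cell.

    rows maps each treatment to its responses, listed in the same block order.
    Returns {source: (sum of squares, degrees of freedom, F or None)}.
    """
    data = list(rows.values())
    a, b = len(data), len(data[0])                 # treatments, blocks
    grand = sum(map(sum, data)) / (a * b)
    trt_means = [sum(r) / b for r in data]
    blk_means = [sum(r[j] for r in data) / a for j in range(b)]
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = a * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((y - grand) ** 2 for r in data for y in r)
    ss_err = ss_tot - ss_trt - ss_blk              # residual after both effects
    ms_err = ss_err / ((a - 1) * (b - 1))
    return {
        "Blocks": (ss_blk, b - 1, (ss_blk / (b - 1)) / ms_err),
        "Chem": (ss_trt, a - 1, (ss_trt / (a - 1)) / ms_err),
        "Error": (ss_err, (a - 1) * (b - 1), None),
        "Total": (ss_tot, a * b - 1, None),
    }

for source, (ss, df, f) in rcb_anova(textile).items():
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<7}{ss:>10.5f}{df:>4}  {f_text}")
```

Removing the block sum of squares from the error term is what makes the chemical comparison sharper here: without blocking, the bolt-to-bolt variation would be counted as error.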
