
THE EXPECTATIONS OF MEAN SQUARES

by

R. E. Comstock

Institute of Statistics

Mimeograph Series No. 76
For Limited Distribution


Chapter VI

THE EXPECTATIONS OF MEAN SQUARES

The Expectation of a Variable

If individuals are drawn randomly from a population, their average value in terms of any specified measurement will be equal in the long run to the mean for the measurement in the population. We say that the value to be expected on the average is that of the population mean. In fact, in statistics the expectation of a variable quantity is defined as the mean for such quantities in the population to which the particular variate belongs. For example, let $X_1, X_2, \ldots, X_i, \ldots$ symbolize the values of the individuals in any univariate population. Then the X's constitute a population of quantities of which the expectation of any one chosen at random is $\mu_x$, where $\mu_x$ is the population mean. This is stated symbolically as follows:

$E(X_i) = \mu_x$  (1)

where $X_i$ can be any of the X's depending on the value given i, and $E(X_i)$ is read "the expectation of $X_i$".

As a second example, recall that the population variance is defined as

$\sigma_x^2 = \sum_i (X_i - \mu_x)^2 / N$

where $\sigma_x^2$ symbolizes the population variance,
$X_i$ symbolizes the value of any individual quantity in the population,
$N$ is the number of individuals in the population, and
$\mu_x$ as before is the population mean.

Thus the variance, $\sigma_x^2$, is defined as the mean of all values of the $(X_i - \mu_x)^2$. In accord with the definition of expectation we see that

$E(X_i - \mu_x)^2 = \sigma_x^2$  (2a)

or if we wish to represent the deviation of $X_i$ from its population mean by a single symbol, say $x_i$, we can write

$x_i = X_i - \mu_x$

$E(x_i^2) = \sigma_x^2$  (2b)

As a final example, recall that the population covariance of two variables, say X and Y, is defined as

$\sigma_{xy} = \sum_i (X_i - \mu_x)(Y_i - \mu_y) / N$

where $\sigma_{xy}$ is the covariance and the other symbols have meanings in conformity with those listed above when considering the variance of X. We see that the covariance, $\sigma_{xy}$, is defined as the population mean of $(X_i - \mu_x)(Y_i - \mu_y)$ and therefore that

$E(X_i - \mu_x)(Y_i - \mu_y) = \sigma_{xy}$  (3a)

Again, if we set $x_i = X_i - \mu_x$ and $y_i = Y_i - \mu_y$, we can write

$E(x_i y_i) = \sigma_{xy}$  (3b)
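As a quick numerical illustration of (3a) and (3b), an addition to the original text, the Python sketch below treats four paired values as an entire finite population and checks that the population mean of the products $x_i y_i$ equals the population covariance. The numbers are arbitrary and numpy is assumed to be available.

```python
import numpy as np

# Illustrative check of (3a)/(3b) on a small finite population:
# the population covariance equals the population mean of x_i * y_i,
# where x and y are deviations from their population means.
X = np.array([2.0, 4.0, 6.0, 8.0])
Y = np.array([1.0, 3.0, 2.0, 6.0])

x, y = X - X.mean(), Y - Y.mean()
print((x * y).mean())                 # population mean of x_i * y_i: 3.5
print(np.cov(X, Y, bias=True)[0, 1])  # population covariance (divide by N): 3.5
```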

Interest in expectations centers around the fact that by setting observed quantities equal to their expectations we find a basis for unbiased estimation of parameters involved in the expectation. For example, it can be shown that

$E\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = (n-1)\sigma^2$

where $\bar{X}$ is the mean of a sample of X's, and
$n$ is the number of individuals in the sample.

It follows that

$E(s^2) = E\left[\sum_{i=1}^n (X_i - \bar{X})^2 / (n-1)\right] = \sigma^2$

From this we see that the sample variance obtained by dividing the sum of squares by degrees of freedom has $\sigma^2$ as its expectation, i.e. that it provides an unbiased estimate of $\sigma^2$.
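The following Python sketch, an illustrative addition assuming numpy, checks this unbiasedness claim by Monte Carlo: over many samples, the sum of squares divided by n-1 averages to $\sigma^2$, while division by n is biased downward.

```python
import numpy as np

# Minimal Monte Carlo check that dividing the sum of squares by n-1
# (degrees of freedom) rather than n gives an unbiased estimate of sigma^2.
rng = np.random.default_rng(0)
n, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(loc=10.0, scale=np.sqrt(sigma2), size=(reps, n))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print(ss.mean() / (n - 1))  # ~ 4.0, matching E(s^2) = sigma^2
print(ss.mean() / n)        # ~ 3.2 = (n-1)/n * sigma^2, biased downward
```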

Expectation of a Constant

This is specifically mentioned for completeness. Since a constant, by defi­

nition, is a quantity that always has the same value, the expectation of a constant

could hardly be anything but that particular value. For example, a population mean

is a constant and its expectation is the mean itself. Symbolically, if c is any

constant,

1(c) = c (4)

The Expectation of the Product of a Constant and a Variable

Consider the product

$Y = cX$

where X is a variable and c is a constant. We know that the population mean of Y is $c\mu_x$ and, therefore,

$E(Y) = E(cX) = c\mu_x = c\,E(X)$  (5)

In general, the expectation of such a product is the product of the constant and the expectation of the variable.

The Expectation of a Linear Function

Consider the linear function

$F = a + b + cX_1 + X_2$

in which a, b, and c are constants and $X_1$ and $X_2$ are variable quantities drawn randomly (but not necessarily independently) from two populations (one, a population of quantities symbolized as $X_1$, the other a population of quantities symbolized as $X_2$). Two points are worth special attention.

(1) The specific manner in which F is defined may have the result that values of $X_1$ and $X_2$ contributing to different values of the quantity, F, are correlated or on the other hand are independent, i.e. uncorrelated. For example, suppose F is designed to reflect in some special way the height of married couples. Then any single value of F would involve the height of the husband ($X_1$) and that of his wife ($X_2$). If the couples are chosen randomly, both $X_1$ and $X_2$ are random values from their respective populations, but are not necessarily independent in magnitude from one couple to another. In fact, evidence indicates that there is a degree of correlation in stature of man and wife.

On the other hand, suppose F were defined as the height of plants, $X_1$ as the effect of genotype, and $X_2$ as the effect of environment on height; and it were known that in the population of plants involved genotypes were distributed randomly with respect to environment. The magnitudes of $X_1$ and $X_2$ would vary independently from plant to plant and, therefore, from one value of F to another.

(2) The different variables may actually belong to the same population though it may be useful to think of them as coming from different ones. For example, in the function given above $X_1$ and $X_2$ could be a pair of values drawn randomly from the same population, $X_1$ being the first and $X_2$ the second drawn of any pair. In this case $X_1$ and $X_2$ would vary independently, i.e. be uncorrelated.

Corresponding to every possible pair of values of $X_1$ and $X_2$ there is obviously a value of F. These values comprise a population of F's. We know that the mean value of F in that population is

$a + b + c\mu_1 + \mu_2$

where $\mu_1$ and $\mu_2$ are the population means for $X_1$ and $X_2$, respectively. Hence

$E(F) = \mu_F = a + b + c\mu_1 + \mu_2$

where $\mu_F$ is the population mean of F. This serves to demonstrate the general fact that the expectation of a variable quantity that is a linear function of other variables is the same linear function of the expectations of those variables. By this rule

$E(F) = E(a) + E(b) + E(cX_1) + E(X_2)$

and since

$E(a) = a$
$E(b) = b$
$E(cX_1) = c\mu_1$
$E(X_2) = \mu_2$

we have by substitution

$E(F) = a + b + c\mu_1 + \mu_2$

as given above.
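A short simulation can make the point of section (1) above concrete: the rule holds even when $X_1$ and $X_2$ are correlated. The following Python sketch is an illustrative addition with arbitrary parameter values; numpy is assumed.

```python
import numpy as np

# Sketch of the linearity rule E(F) = a + b + c*mu1 + mu2.  It holds even
# though X2 is deliberately built to be correlated with X1 here: linearity
# of expectation does not require independence.
rng = np.random.default_rng(1)
a, b, c = 2.0, 3.0, 0.5
mu1, mu2 = 10.0, 20.0

X1 = rng.normal(mu1, 2.0, size=500_000)
X2 = mu2 + 0.8 * (X1 - mu1) + rng.normal(0.0, 1.0, size=X1.size)  # correlated with X1

F = a + b + c * X1 + X2
print(F.mean())               # ~ 30.0
print(a + b + c * mu1 + mu2)  # 30.0 exactly
```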

Expectations of Mean Squares

Any mean square can be written as a linear function in which the variable quantities are the squares of variables, products of a variable with a constant, or products of variables. Hence, the expectations can always be written in terms of what is presented above. This fact will be clarified by examples.

Example 1

Consider the case represented by the analysis of variance for comparing groups of equal size. The form of the analysis is as follows:

Variance Source      d.f.      m.s.
Groups               m-1       M1
Within groups        m(n-1)    M2
Total                mn-1

where m is the number of groups and n is the number of individuals within groups. The model on which the analysis is based can be stated symbolically as follows:

$Y_{ij} = \mu + g_i + e_{ij}$

where $\mu$ is the population mean taken over all groups,
$g_i$ is the effect of the i-th group (the amount by which the population mean for the i-th group deviates from $\mu$), and
$e_{ij}$ is a random effect contributing to the value of Y for the j-th individual in the i-th group (the amount by which the individual deviates from the mean for its group).

One of two assumptions is usually made concerning the groups: (a) that they are random members of a population of groups, or (b) that the ones on which data are taken are of special interest in themselves rather than as a sample from a population. In case (a) the assumption is frequently stated by saying that $g_i$ is considered a random variable, in contrast to case (b) where it is alternatively said that the $g_i$ are considered constant or fixed.

g assumed to be a random variable

We will consider first the case where $g_i$ is considered a random variable. Let

$G_i$ be the sum of Y's for the n individuals of the i-th group, and
T be the sum of Y's for all nm individuals on which data were collected.

Then the mean square for groups is

$M_1 = \frac{1}{m-1}\left[\frac{1}{n}(G_1^2 + G_2^2 + \cdots + G_m^2) - \frac{T^2}{nm}\right]$  (6)

This may be considered the product of a constant and a variable, where $\frac{1}{m-1}$ is the constant and the quantity in brackets is the variable. Hence, its expectation may be written,

$E(M_1) = \frac{1}{m-1}\,E\left[\frac{1}{n}(G_1^2 + G_2^2 + \cdots + G_m^2) - \frac{T^2}{nm}\right]$

Note that $\frac{1}{n}(G_1^2 + G_2^2 + \cdots + G_m^2)$ is what we commonly call the "uncorrected sum of squares", that $T^2/nm$ is what we call the "correction factor", and that the whole quantity in brackets is the "corrected sum of squares".

By the rule that the expectation of a linear function is the same function of expectations of the variables in the function, we can write

$E(M_1) = \frac{1}{m-1}\left[\frac{1}{n}(EG_1^2 + EG_2^2 + \cdots + EG_m^2) - \frac{1}{nm}ET^2\right]$

Now the separate expectations in the expression can be considered one by one. Consider $EG_i^2$. In terms of our model,

$EG_i^2 = E\left[\sum_{j=1}^n Y_{ij}\right]^2 = E[Y_{i1} + Y_{i2} + \cdots + Y_{in}]^2 = E(n\mu + ng_i + e_{i1} + e_{i2} + \cdots + e_{in})^2$

Squaring and taking expectations term by term this can be written

$EG_i^2 = E\,n^2\mu^2 + E\,n^2g_i^2 + E(e_{i1} + e_{i2} + \cdots + e_{in})^2 + E\,2n^2\mu g_i + E\,2n\mu(e_{i1} + e_{i2} + \cdots + e_{in}) + E\,2ng_i(e_{i1} + e_{i2} + \cdots + e_{in})$  (7)

Before going further note that both the g's and e's are defined as deviations from a mean and hence, that the population mean of both the g's and the e's is zero. Thus, $E(g_i) = 0$, $E(e_{ij}) = 0$, $E(g_i^2) = \sigma_g^2$, and $E(e_{ij}^2) = \sigma_e^2$, where $\sigma_g^2$ is the population variance of the g's and $\sigma_e^2$ is the population variance of the e's. It is common to assume that all e's are members of the same population and, therefore, that $\sigma_e^2$ is homogeneous over all groups. This assumption will be made for the purpose of our example, but it should be understood that special cases may arise where the variance of e varies from group to group. It should also be noted that all g's and e's are assumed to be random members of their populations. The significance of this is that in the population (the population that would be generated by repeating the experiment in identical fashion an infinity of times) the correlation between (1) any two g's, (2) any two e's, or (3) any g and any e would be zero. If the correlation is zero, so also is the covariance, and this means that the expectations of all products of two g's, two e's, or a g and an e are all equal to zero. Symbolically this is stated as follows:

$E(g_i g_{i'}) = 0$  (i ≠ i')
$E(e_{ij} e_{i'j'}) = 0$  (i ≠ i' if j = j'; j ≠ j' if i = i')
$E(g_i e_{i'j}) = 0$  (either when i = i' or when i ≠ i')

Now let us consider the several terms of $EG_i^2$ one by one.

(a) $E\,n^2\mu^2 = n^2\mu^2$ (since $n^2\mu^2$ is a constant)

(b) $E\,n^2g_i^2 = n^2\,Eg_i^2 = n^2\sigma_g^2$ (since $n^2$ is a constant and $Eg_i^2 = \sigma_g^2$)

(c) $E(e_{i1} + e_{i2} + \cdots + e_{in})^2 = n\sigma_e^2$ (since the expectation of each of the $e^2$'s, of which there are n of them, is $\sigma_e^2$, and that of each product term is zero)

(d) $E\,2n^2\mu g_i = 2n^2\mu\,Eg_i$ (since $2n^2\mu$ is a constant)
    = zero (since $Eg_i = 0$)

(e) $E\,2n\mu(e_{i1} + e_{i2} + \cdots + e_{in}) = 2n\mu\,E(e_{i1} + e_{i2} + \cdots + e_{in})$ (since $2n\mu$ is a constant)
    = zero (since the expectation of all e's is zero and, therefore, that of the sum of any set of e's is also zero)

(f) $E\,2ng_i(e_{i1} + e_{i2} + \cdots + e_{in}) = 2n\,Eg_i(e_{i1} + e_{i2} + \cdots + e_{in})$ (since 2n is a constant)
    = zero (since the expectation of the product of any g and e is zero)


Substituting in (7) in terms of (a) to (f) we find that,

$EG_i^2 = n^2\mu^2 + n^2\sigma_g^2 + n\sigma_e^2$  (8)

Now note that nothing in (8) is specific for the particular group in question (i does not appear as a subscript in the right hand member). The significance is that the expectation of $G^2$ is the same for all groups, that

$EG_1^2 = EG_2^2 = \cdots = EG_m^2$

In order to evaluate $E(M_1)$ it remains only to obtain $ET^2$. Substituting for the G's we obtain

$ET^2 = E(G_1 + G_2 + \cdots + G_m)^2 = E[nm\mu + n(g_1 + g_2 + \cdots + g_m) + e_{11} + e_{12} + \cdots + e_{1n} + e_{21} + e_{22} + \cdots + e_{2n} + \cdots + e_{m1} + e_{m2} + \cdots + e_{mn}]^2$  (9a)

Squaring, taking expectations term by term, and moving constants to the left of the sign for expectation (proper because the expectation of the product of a constant and a variable is equal to the product of the constant and the expectation of the variable) we get

$ET^2 = n^2m^2\mu^2 + n^2Eg_1^2 + n^2Eg_2^2 + \cdots + n^2Eg_m^2 + Ee_{11}^2 + Ee_{12}^2 + \cdots + Ee_{mn}^2$ plus product terms of the types $2n^2m\mu Eg_1$, $2n^2Eg_1g_2$, $2nEg_1e_{11}$, or $2Ee_{11}e_{12}$  (9b)

Consider the various terms of this expression.

(g) $n^2Eg_1^2 = n^2Eg_2^2 = \cdots = n^2Eg_m^2 = n^2\sigma_g^2$ (since $Eg_i^2 = \sigma_g^2$)

(h) $Ee_{11}^2 = Ee_{12}^2 = \cdots = Ee_{mn}^2 = \sigma_e^2$ (since $Ee_{ij}^2 = \sigma_e^2$)

(i) all product terms are of types shown to have zero expectation in the process of developing $EG_i^2$.

Substituting in (9b) in terms of (g) to (i) we obtain

$ET^2 = n^2m^2\mu^2 + n^2m\sigma_g^2 + nm\sigma_e^2$  (10)

Finally, substituting in (6) in terms of (8) and (10) we find

$E(M_1) = \frac{1}{m-1}\left[\frac{m}{n}(n^2\mu^2 + n^2\sigma_g^2 + n\sigma_e^2) - \frac{1}{nm}(n^2m^2\mu^2 + n^2m\sigma_g^2 + nm\sigma_e^2)\right]$

$= \mu^2\left[\frac{mn - mn}{m-1}\right] + \sigma_g^2\left[\frac{mn - n}{m-1}\right] + \sigma_e^2\left[\frac{m-1}{m-1}\right] = n\sigma_g^2 + \sigma_e^2$  (11)

The within group mean square may be computed as follows:

$M_2 = \frac{1}{m(n-1)}\sum_{i=1}^m\left(Y_{i1}^2 + Y_{i2}^2 + \cdots + Y_{in}^2 - \frac{G_i^2}{n}\right)$  (12)

Remembering (a) that the expectation of the product of a constant and a variable is the product of the constant and the expectation of the variable, and (b) that the expectation of a variable that is a linear function of variables is the same function of the expectations of these latter variables, we see that

$E(M_2) = \frac{1}{m(n-1)}\sum_{i=1}^m\left[EY_{i1}^2 + EY_{i2}^2 + \cdots + EY_{in}^2 - \frac{1}{n}EG_i^2\right]$

Consider the expectation of $Y_{ij}^2$. In terms of our model, $Y_{ij} = \mu + g_i + e_{ij}$.

Therefore,

$EY_{ij}^2 = E(\mu + g_i + e_{ij})^2$

Expanding and taking expectations of individual terms separately we obtain,

$EY_{ij}^2 = E\mu^2 + Eg_i^2 + Ee_{ij}^2 + E\,2\mu g_i + E\,2\mu e_{ij} + E\,2g_ie_{ij}$  (13)

Taking the terms of this expression separately,

(j) $E\mu^2 = \mu^2$ (because $\mu^2$ is a constant),
(k) $Eg_i^2 = \sigma_g^2$ (by definition when the g's are assumed random),
(l) $Ee_{ij}^2 = \sigma_e^2$ (by definition),
(m) $E\,2\mu g_i = 2\mu\,Eg_i$ = zero (since $2\mu$ is a constant and $Eg_i = 0$),
(n) $E\,2\mu e_{ij} = 2\mu\,Ee_{ij}$ = zero (since $2\mu$ is a constant and $Ee_{ij} = 0$),
(o) $E\,2g_ie_{ij} = 2\,Eg_ie_{ij}$ = zero (since 2 is a constant and $Eg_ie_{ij} = 0$).

Substituting in (13) in terms of (j) to (o) we obtain,

$EY_{ij}^2 = \mu^2 + \sigma_g^2 + \sigma_e^2$  (14)

We have already shown (8) that the expectation of $G_i^2$ is,

$EG_i^2 = n^2\mu^2 + n^2\sigma_g^2 + n\sigma_e^2$  (8)

Note that both (14) and (8) are the same for all Y's and G's, respectively (all terms in the right hand members are constants). Recognizing this and substituting in (12) in terms of (8) and (14) we obtain,

$E(M_2) = \frac{1}{m(n-1)}\left[mn(\mu^2 + \sigma_g^2 + \sigma_e^2) - \frac{m}{n}(n^2\mu^2 + n^2\sigma_g^2 + n\sigma_e^2)\right]$

$= \mu^2\left[\frac{m(n-n)}{m(n-1)}\right] + \sigma_g^2\left[\frac{m(n-n)}{m(n-1)}\right] + \sigma_e^2\left[\frac{m(n-1)}{m(n-1)}\right] = \sigma_e^2$  (15)

Using (11) and (15) the analysis of variance can now be presented giving the expectations of the mean squares.

Variance Source      d.f.      Expectation of m.s.
Groups               m-1       $\sigma_e^2 + n\sigma_g^2$
Within groups        m(n-1)    $\sigma_e^2$
Total                mn-1
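The expectations in the table above can be checked by simulation. The Python sketch below, an illustrative addition with arbitrary parameter values (numpy assumed), repeatedly generates data under the random-g model, computes $M_1$ and $M_2$ from (6) and (12), and compares their long-run averages with $\sigma_e^2 + n\sigma_g^2$ and $\sigma_e^2$.

```python
import numpy as np

# Monte Carlo sketch of example 1 with g random: the groups mean square M1
# should average sigma_e^2 + n*sigma_g^2 and the within mean square M2
# should average sigma_e^2.  Parameter values are arbitrary.
rng = np.random.default_rng(2)
m, n = 8, 5                      # m groups of n individuals
mu, sig_g, sig_e = 50.0, 2.0, 3.0
reps = 20_000

M1_vals, M2_vals = [], []
for _ in range(reps):
    g = rng.normal(0.0, sig_g, size=(m, 1))       # random group effects
    e = rng.normal(0.0, sig_e, size=(m, n))       # random within-group effects
    Y = mu + g + e
    G = Y.sum(axis=1)                             # group totals
    T = Y.sum()
    M1_vals.append(((G ** 2).sum() / n - T ** 2 / (n * m)) / (m - 1))      # equation (6)
    M2_vals.append(((Y ** 2).sum() - (G ** 2).sum() / n) / (m * (n - 1)))  # equation (12)

print(np.mean(M1_vals), sig_e**2 + n * sig_g**2)  # ~ 29.0 vs 29.0
print(np.mean(M2_vals), sig_e**2)                 # ~ 9.0 vs 9.0
```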

g's assumed to be constants

Differences occasioned by assuming the g's constant rather than random are listed below.

g's random                     g's constant
$E\,g_i = 0$                   $E\,g_i = g_i$
$E\,g_i^2 = \sigma_g^2$        $E\,g_i^2 = g_i^2$
$E\,cg_i = 0$                  $E\,cg_i = cg_i$

where c is any constant.

Other expectations involved in (7), (9b), and (13) are not affected. With the above differences in mind we see that in this case (7) does not reduce to (8) but to

$EG_i^2 = n^2\mu^2 + n^2g_i^2 + 2n^2\mu g_i + n\sigma_e^2$  (16)

In like manner (9b) reduces to

$ET^2 = n^2m^2\mu^2 + nm\sigma_e^2$  (17)

rather than to (10). The reason why no terms involving g's or the squares or products of g's occur in (17) is clarified by reference to (9a). Note that the g's enter (9a) in a term that is the sum of the g's for the m groups. In the case where the g's are assumed constant, $\mu$ is taken as the population mean for the m groups in question. Then, since the g's are defined as deviations from this mean, their sum must be zero. Hence, the term $n(g_1 + g_2 + \cdots + g_m)$ disappears from (9a) and correspondingly terms involving g's disappear from (9b). Finally, (13) reduces to

$EY_{ij}^2 = \mu^2 + g_i^2 + 2\mu g_i + \sigma_e^2$  (18)

rather than to (14).

Substituting in (6) in terms of (16) and (17) rather than in terms of (8) and (10) we obtain,

$E(M_1) = \frac{1}{m-1}\left[\frac{1}{n}\left(mn^2\mu^2 + n^2\sum_{i=1}^m g_i^2 + 2n^2\mu\sum_{i=1}^m g_i + nm\sigma_e^2\right) - \frac{1}{nm}\left(n^2m^2\mu^2 + nm\sigma_e^2\right)\right]$

Keeping in mind that $\sum_{i=1}^m g_i = 0$ as pointed out above, this reduces to

$E(M_1) = \mu^2\left[\frac{mn - mn}{m-1}\right] + \frac{n}{m-1}\sum_{i=1}^m g_i^2 + \sigma_e^2\left[\frac{m-1}{m-1}\right] = \frac{n}{m-1}\sum_{i=1}^m g_i^2 + \sigma_e^2$  (19)

Substituting in (12) in terms of (16) and (18) rather than in terms of (8) and (14) we obtain,

$E(M_2) = \frac{1}{m(n-1)}\left[\left(mn\mu^2 + n\sum_{i=1}^m g_i^2 + 2n\mu\sum_{i=1}^m g_i + mn\sigma_e^2\right) - \frac{1}{n}\left(mn^2\mu^2 + n^2\sum_{i=1}^m g_i^2 + 2n^2\mu\sum_{i=1}^m g_i + mn\sigma_e^2\right)\right]$

$= \mu^2\left[\frac{m(n-n)}{m(n-1)}\right] + \sigma_e^2\left[\frac{m(n-1)}{m(n-1)}\right] = \sigma_e^2$  (20)

We have again used the fact that $\sum_{i=1}^m g_i = 0$. The analysis of variance with expectations is now as follows:

Variance Source      d.f.      Expectation of m.s.
Groups               m-1       $\sigma_e^2 + \frac{n}{m-1}\sum_{i=1}^m g_i^2$
Within groups        m(n-1)    $\sigma_e^2$
Total                mn-1
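The same kind of check works for the fixed-g case, holding one set of g's (summing to zero) fixed across repetitions; $M_1$ should now average $\sigma_e^2 + \frac{n}{m-1}\sum g_i^2$ as in the table above. Again an illustrative sketch with made-up values:

```python
import numpy as np

# Variation of the previous sketch with the g's held fixed: M1 should now
# average sigma_e^2 + n/(m-1) * sum(g_i^2), while E(M2) is unchanged.
rng = np.random.default_rng(3)
m, n = 4, 6
mu, sig_e = 50.0, 3.0
g = np.array([-3.0, -1.0, 1.0, 3.0]).reshape(m, 1)   # fixed effects, summing to zero
reps = 20_000

M1_vals = []
for _ in range(reps):
    Y = mu + g + rng.normal(0.0, sig_e, size=(m, n))
    G = Y.sum(axis=1)
    T = Y.sum()
    M1_vals.append(((G ** 2).sum() / n - T ** 2 / (n * m)) / (m - 1))

print(np.mean(M1_vals))                          # ~ 49.0
print(sig_e**2 + n / (m - 1) * (g ** 2).sum())   # 9 + 2*20 = 49.0
```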

Example 2

As a variation of example 1, consider the analysis of variance for comparison of groups of unequal size. Let $n_1, n_2, \ldots, n_m$ symbolize the number per group in groups 1, 2, ..., m, respectively. The form of the analysis is as follows:

Variance Source      d.f.                        m.s.
Groups               m-1                         M1
Within groups        $\sum_{i=1}^m (n_i - 1)$    M2
Total                N-1

where N is the total number of individuals in all groups. Except for variation in group size the model will be the same as in example 1. We will consider only the case where g is considered a random variable. The mean square for groups is

$M_1 = \frac{1}{m-1}\left[\frac{G_1^2}{n_1} + \frac{G_2^2}{n_2} + \cdots + \frac{G_m^2}{n_m} - \frac{T^2}{N}\right]$  (21)

Referring to (7) and (8) it is clear that

$EG_i^2 = n_i^2\mu^2 + n_i^2\sigma_g^2 + n_i\sigma_e^2$

and hence that

$E(G_i^2/n_i) = n_i\mu^2 + n_i\sigma_g^2 + \sigma_e^2$  (22)

T is now equal to

$T = N\mu + n_1g_1 + n_2g_2 + \cdots + n_mg_m + e_{11} + e_{12} + \cdots + e_{1n_1} + e_{21} + e_{22} + \cdots + e_{2n_2} + \cdots + e_{m1} + e_{m2} + \cdots + e_{mn_m}$

Squaring and taking expectations but omitting terms with expectation zero we obtain,

$ET^2 = N^2\mu^2 + E\,n_1^2g_1^2 + E\,n_2^2g_2^2 + \cdots + E\,n_m^2g_m^2 + Ee_{11}^2 + Ee_{12}^2 + \cdots + Ee_{1n_1}^2 + \cdots + Ee_{m1}^2 + Ee_{m2}^2 + \cdots + Ee_{mn_m}^2$

Evaluating the separate terms, this becomes

$ET^2 = N^2\mu^2 + n_1^2\sigma_g^2 + n_2^2\sigma_g^2 + \cdots + n_m^2\sigma_g^2 + N\sigma_e^2$

and hence,

$E(T^2/N) = N\mu^2 + \sigma_g^2\sum_{i=1}^m n_i^2/N + \sigma_e^2$  (23)

We have $N\sigma_e^2$ because there are a total of N terms of the type $Ee_{11}^2$ that are equal to $\sigma_e^2$. Writing $E(M_1)$ in terms of (21), (22), and (23) we get

$E(M_1) = \frac{1}{m-1}\left[\mu^2\sum_{i=1}^m n_i + \sigma_g^2\sum_{i=1}^m n_i + m\sigma_e^2 - N\mu^2 - \sigma_g^2\sum_{i=1}^m n_i^2/N - \sigma_e^2\right]$

Noting that $\sum_{i=1}^m n_i = N$, this reduces to

$E(M_1) = \sigma_e^2 + \frac{1}{m-1}\left[N - \sum_{i=1}^m n_i^2/N\right]\sigma_g^2$  (24)

The coefficient of $\sigma_g^2$ in (24) is of the same form as given by Snedecor (p. 234, 1948).

The within group mean square is computed as

$M_2 = \frac{1}{N-m}\left[\left(Y_{11}^2 + Y_{12}^2 + \cdots + Y_{1n_1}^2 - \frac{G_1^2}{n_1}\right) + \left(Y_{21}^2 + Y_{22}^2 + \cdots + Y_{2n_2}^2 - \frac{G_2^2}{n_2}\right) + \cdots + \left(Y_{m1}^2 + Y_{m2}^2 + \cdots + Y_{mn_m}^2 - \frac{G_m^2}{n_m}\right)\right]$

Taking the expectation term by term we have

$E(M_2) = \frac{1}{N-m}\left[EY_{11}^2 + EY_{12}^2 + \cdots + EY_{1n_1}^2 + EY_{21}^2 + EY_{22}^2 + \cdots + EY_{mn_m}^2 - \sum_{i=1}^m EG_i^2/n_i\right]$  (25)

The expectation of the square of any single Y is in no way affected by the number of individuals observed in each group. Therefore, it is given by (14). Substituting in (25) in terms of (14) and (22) we obtain,

$E(M_2) = \frac{1}{N-m}\left[N\mu^2 + N\sigma_g^2 + N\sigma_e^2 - \mu^2\sum_{i=1}^m n_i - \sigma_g^2\sum_{i=1}^m n_i - m\sigma_e^2\right]$

Remembering that $\mu$ and $\sigma_g^2$ are constants and that $\sum_{i=1}^m n_i = N$, this reduces to

$E(M_2) = \sigma_e^2$  (26)

Referring to (24) and (26) the analysis of variance with mean square expectations can now be written as follows:

Variance Source      d.f.      Expectation of m.s.
Groups               m-1       $\sigma_e^2 + n'\sigma_g^2$
Within groups        N-m       $\sigma_e^2$
Total                N-1

where $n' = \frac{1}{m-1}\left[N - \frac{\sum_{i=1}^m n_i^2}{N}\right]$
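Computing n' is a one-liner once the group sizes are counted. A small Python helper follows as an illustration; the function name and the group sizes are made up.

```python
import numpy as np

# Computing n' of example 2 from a list of (unequal) group sizes:
# n' = [N - sum(n_i^2)/N] / (m - 1).
def n_prime(sizes):
    sizes = np.asarray(sizes, dtype=float)
    N, m = sizes.sum(), sizes.size
    return (N - (sizes ** 2).sum() / N) / (m - 1)

print(n_prime([4, 7, 9, 12]))  # ~ 7.65, a little below the mean group size 8
print(n_prime([8, 8, 8, 8]))   # equal sizes recover n' = n = 8
```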

General Procedures

Before turning to other examples it will be useful to summarize the general procedures demonstrated in the foregoing examples. Steps in the procedure are listed below.

composition of the individual values that make up th0 data, assumptions

as to whether the various ~ffccts are fixed or random, and assumptions

concerning whether separate 8ffects vary independently.

2. Th~ composition of each moan square is written out in terms of the mod~l

and the steps followed in computing the mean square~

3. The expectation of the mean square is developed term by term.

Rules employed in step 3 may be summarized as follows.

1. The expectation of a constant is thu constant itsolf.

2. The 6xp~ctation of ~ variable is the population mGan of the v~riable.

3. The expectation of the square of a variable '~hat has population mean zero

is the population variance of the variable.

4. The expectation of tht product of a constant and a variable is the product

of the constant and th~ expectation of the variable.

5. The expectation of th6 product of two variables that have population mean

zero is the population covariance of th0 variables.

6. The population covariance of any two variable effects is zero whenever

the particular two effects contributing to any OnE) m(;asur~ment in thi'j

data may be assumed to bejn~ drawn from their respective popu­

lations.


Two points merit special attention.

1. It is desirable to write the model in terms of a general mean so that all effects will have zero as their population mean. This allows taking advantage of 3 and 5 above.

2. If 6 above is kept in mind a great deal of labor can be saved, in writing out the composition of mean squares in expanded form, by omitting product terms that have expectation zero. For example, with this in mind (7) might have been written

$EG_i^2 = E\,n^2\mu^2 + E\,n^2g_i^2 + E\,e_{i1}^2 + E\,e_{i2}^2 + \cdots + E\,e_{in}^2$

for the case where $g_i$ was considered a random variable.

In the case of more complicated analyses than those considered in the foregoing examples, expressions for the composition of the various mean squares may be very long. Rather than follow the procedure outlined above in just the form demonstrated by examples 1 and 2, it is more convenient in these cases to recognize that every mean square can be computed as a linear function of one or more "uncorrected" sums of squares and what is commonly called the correction factor. Thus the expectation of a mean square can be obtained by combining the expectations of uncorrected sums of squares and the correction factor in the same way that the sums of squares and correction factor were combined to obtain the mean square. The procedure is to find the expectations of the uncorrected sums of squares that must be computed in the analysis and of the correction factor, and then combine these appropriately to obtain the expectations of the mean squares.

Example 3

Consider the analysis of data obtained from comparison of n genetic strains of a particular annual crop in a randomized block design at each of s locations in each of t years. Assume r replications in each location each year and that different land, or at least a new randomization, is used in successive years at each location. The form of the variance analysis is as follows:

Variance Source                d.f.                 m.s.
Locations                      s-1
Years                          t-1
L x Y                          (s-1)(t-1)
Reps in years and locations    st(r-1)
Strains                        n-1
L x Strains                    (s-1)(n-1)           M2
Y x Strains                    (t-1)(n-1)
L x Y x Strains                (s-1)(t-1)(n-1)
Strains x reps in L and Y      st(r-1)(n-1)
Total                          rstn-1

The model employed will be as follows:

$Y_{ijkl} = \mu + g_i + a_j + b_k + (ab)_{jk} + (ga)_{ij} + (gb)_{ik} + (gab)_{ijk} + c_{jkl} + (gc)_{ijkl}$

where $\mu$ is the population mean,
$g_i$ is the effect of the i-th strain,
$a_j$ is the effect of the j-th location,
$b_k$ is the effect of the k-th year,
$(ab)_{jk}$ is an effect arising from first order interaction between environmental conditions of the j-th location and k-th year,
$(ga)_{ij}$ is an effect arising from first order interaction of the i-th strain with the j-th location,
$(gb)_{ik}$ is an effect arising from first order interaction of the i-th strain with the k-th year,
$(gab)_{ijk}$ is an effect arising from second order interaction of the i-th strain with the j-th location and k-th year,
$c_{jkl}$ is the effect of the l-th block at the j-th location in the k-th year as a deviation from the mean for that location and year, and
$(gc)_{ijkl}$ is the effect of the plot to which the i-th strain is assigned in the l-th block in the j-th location and k-th year (strictly speaking it also contains a plot x strain interaction effect and the error of measurement, but only in special cases would it be important to indicate this sub-division in the model).

All effects will be considered random variables with mean zero. This would be appropriate if the objective of the work was to compare the strains for use in locations and years of which those involved in the experiment were a random sample, and if the strains represented a random sample from a population from which other strains might have been taken for comparison. It will also be assumed that all effects vary randomly with respect to each other so that all covariances among pairs of effects are zero. This is an appropriate assumption in consideration of the way work like this is usually conducted. Finally, it will be assumed that

$E(ga)_{ij}^2$ is constant over all values of i and j,
$E(gb)_{ik}^2$ is constant over all values of i and k,
$E(ab)_{jk}^2$ is constant over all values of j and k,
$E(gab)_{ijk}^2$ is constant over all values of i, j, and k,
$E\,c_{jkl}^2$ is constant over all values of j, k, and l, and
$E(gc)_{ijkl}^2$ is constant over all values of i, j, k, and l.

The sense of this is that all individual effects within any one of the six kinds belong to a common population and have the variance of that population as the expectation of their squares. This is an assumption very commonly made in connection with analyses of the type in question, though it may not always be justified.

The letter T with appropriate subscripts is used to symbolize different sums of the Y's. For example,

T = grand total
$T_i$ = sum for the i-th strain (over all locations, years, and blocks)
$T_j$ = sum for the j-th location (over all strains, years, and blocks)
$T_{ij}$ = sum for the i-th strain at the j-th location (over all years and blocks)

etc. Carried to its ultimate this means $T_{ijkl} = Y_{ijkl}$, but $Y_{ijkl}$ will be used instead of $T_{ijkl}$. The uncorrected sums of squares will be symbolized by S with appropriate subscripts. For example,

$S = T^2/nrst$ = the correction factor,
$S_i = \sum_{i=1}^n T_i^2/rst$ = uncorrected sum of squares for strains,
$S_{ij} = \sum_{i=1}^n \sum_{j=1}^s T_{ij}^2/rt$ = uncorrected sum of squares for strain-location totals,

etc.

The process of obtaining the expectations of the mean squares can be amply illustrated by considering only one mean square, say $M_2$. It is computed as follows:

$M_2 = \frac{1}{(n-1)(s-1)}\left[S_{ij} - (S_i - S) - (S_j - S) - S\right] = \frac{1}{(n-1)(s-1)}\left[S_{ij} - S_i - S_j + S\right]$

Consequently,

$E(M_2) = \frac{1}{(n-1)(s-1)}\left[ES_{ij} - ES_i - ES_j + ES\right]$  (27)

The S's involved have the following composition:

$S_{ij} = \frac{1}{rt}\sum_i\sum_j T_{ij}^2$

$S_i = \frac{1}{rst}\sum_i T_i^2$

$S_j = \frac{1}{nrt}\sum_j T_j^2$

$S = \frac{1}{nrst}T^2$

It follows that their expectations are,

$ES_{ij} = \frac{1}{rt}\sum_i\sum_j ET_{ij}^2$

$ES_i = \frac{1}{rst}\sum_i ET_i^2$

$ES_j = \frac{1}{nrt}\sum_j ET_j^2$

$ES = \frac{1}{nrst}ET^2$  (28)

As the basis for obtaining the expectations of the T's we must know their composition:

$T_{ij} = \sum_k\sum_l Y_{ijkl}$

$T_i = \sum_j\sum_k\sum_l Y_{ijkl}$

$T_j = \sum_i\sum_k\sum_l Y_{ijkl}$

$T = \sum_i\sum_j\sum_k\sum_l Y_{ijkl}$

Expanding these sums in terms of the model for the analysis we have the following:

$T_{ij} = rt\mu + rtg_i + rta_j + r\sum_k b_k + r\sum_k (ab)_{jk} + rt(ga)_{ij} + r\sum_k (gb)_{ik} + r\sum_k (gab)_{ijk} + \sum_k\sum_l c_{jkl} + \sum_k\sum_l (gc)_{ijkl}$

$T_i = rst\mu + rstg_i + rt\sum_j a_j + rs\sum_k b_k + r\sum_j\sum_k (ab)_{jk} + rt\sum_j (ga)_{ij} + rs\sum_k (gb)_{ik} + r\sum_j\sum_k (gab)_{ijk} + \sum_j\sum_k\sum_l c_{jkl} + \sum_j\sum_k\sum_l (gc)_{ijkl}$

$T_j = nrt\mu + rt\sum_i g_i + nrta_j + nr\sum_k b_k + nr\sum_k (ab)_{jk} + rt\sum_i (ga)_{ij} + r\sum_i\sum_k (gb)_{ik} + r\sum_i\sum_k (gab)_{ijk} + n\sum_k\sum_l c_{jkl} + \sum_i\sum_k\sum_l (gc)_{ijkl}$

$T = nrst\mu + rst\sum_i g_i + nrt\sum_j a_j + nrs\sum_k b_k + nr\sum_j\sum_k (ab)_{jk} + rt\sum_i\sum_j (ga)_{ij} + rs\sum_i\sum_k (gb)_{ik} + r\sum_i\sum_j\sum_k (gab)_{ijk} + n\sum_j\sum_k\sum_l c_{jkl} + \sum_i\sum_j\sum_k\sum_l (gc)_{ijkl}$

The expectation of the square of any of these T's is the sum of the expectations of each term in the square. However, since all covariances among different effects are zero (see statement of model), the expectations of all product terms in the square of any T are also zero. Thus only the expectations of the squares of the separate terms in the above expressions contribute to the expectations we are seeking. These can be written directly from inspection of the terms. For example,

$E\left[r\sum_k b_k\right]^2 = r^2t\sigma_b^2$

where $\sigma^2$ symbolizes the population variance of the effect indicated by its subscript (because (1) the number of b's in the sum indicated is t, (2) $Eb_k^2 = \sigma_b^2$, and (3) the expectation of the product of two b's is zero).

Proceeding in this way the expectations can be written from the equations for the T's as follows:

$ET_{ij}^2 = r^2t^2\mu^2 + r^2t^2\sigma_g^2 + r^2t^2\sigma_a^2 + r^2t\sigma_b^2 + r^2t\sigma_{ab}^2 + r^2t^2\sigma_{ga}^2 + r^2t\sigma_{gb}^2 + r^2t\sigma_{gab}^2 + rt\sigma_c^2 + rt\sigma_{gc}^2$  (29a)

$ET_i^2 = r^2s^2t^2\mu^2 + r^2s^2t^2\sigma_g^2 + r^2st^2\sigma_a^2 + r^2s^2t\sigma_b^2 + r^2st\sigma_{ab}^2 + r^2st^2\sigma_{ga}^2 + r^2s^2t\sigma_{gb}^2 + r^2st\sigma_{gab}^2 + rst\sigma_c^2 + rst\sigma_{gc}^2$  (29b)

$ET_j^2 = n^2r^2t^2\mu^2 + nr^2t^2\sigma_g^2 + n^2r^2t^2\sigma_a^2 + n^2r^2t\sigma_b^2 + n^2r^2t\sigma_{ab}^2 + nr^2t^2\sigma_{ga}^2 + nr^2t\sigma_{gb}^2 + nr^2t\sigma_{gab}^2 + n^2rt\sigma_c^2 + nrt\sigma_{gc}^2$  (29c)

$ET^2 = n^2r^2s^2t^2\mu^2 + nr^2s^2t^2\sigma_g^2 + n^2r^2st^2\sigma_a^2 + n^2r^2s^2t\sigma_b^2 + n^2r^2st\sigma_{ab}^2 + nr^2st^2\sigma_{ga}^2 + nr^2s^2t\sigma_{gb}^2 + nr^2st\sigma_{gab}^2 + n^2rst\sigma_c^2 + nrst\sigma_{gc}^2$  (29d)

Note that the first of these expressions is constant no matter which genotype-location sum is in question (this is apparent since neither i nor j appears as a subscript in the right hand side of the expression). The same sort of thing is true for the second and third expressions as well. Therefore, equations (28) can be rewritten as follows:

$ES_{ij} = \frac{1}{rt}\left[ns\,ET_{ij}^2\right]$

$ES_i = \frac{1}{rst}\left[n\,ET_i^2\right]$

$ES_j = \frac{1}{nrt}\left[s\,ET_j^2\right]$

$ES = \frac{1}{nrst}ET^2$  (30)

The only remaining step is to substitute in (27) in terms of equations (29) and (30). Collecting terms involving a common parameter at the same time that the substitutions are made, we obtain,

$E(M_2) = \left[\mu^2(nrst - nrst - nrst + nrst) + \sigma_g^2(nrst - nrst - rst + rst) + \sigma_a^2(nrst - nrt - nrst + nrt) + \sigma_b^2(nrs - nrs - nrs + nrs) + \sigma_{ab}^2(nrs - nr - nrs + nr) + \sigma_{ga}^2(nrst - nrt - rst + rt) + \sigma_{gb}^2(nrs - nrs - rs + rs) + \sigma_{gab}^2(nrs - nr - rs + r) + \sigma_c^2(ns - n - ns + n) + \sigma_{gc}^2(ns - n - s + 1)\right]\frac{1}{(n-1)(s-1)}$

$= \left[rt(ns - n - s + 1)\sigma_{ga}^2 + r(ns - n - s + 1)\sigma_{gab}^2 + (ns - n - s + 1)\sigma_{gc}^2\right]\frac{1}{(n-1)(s-1)}$

Since $(n-1)(s-1) = ns - n - s + 1$, this reduces further to

$E(M_2) = rt\sigma_{ga}^2 + r\sigma_{gab}^2 + \sigma_{gc}^2$
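Because the collection step above is tedious, it is a natural candidate for a symbolic check. The following Python sketch, an illustrative addition, uses sympy (assumed available; the v_* names stand for the $\sigma^2$ components) to rebuild $E(M_2)$ from (29a)-(29d) and (30) and confirm that it simplifies to $rt\sigma_{ga}^2 + r\sigma_{gab}^2 + \sigma_{gc}^2$.

```python
import sympy as sp

# Symbolic check of the collection step for E(M2) in example 3, using the
# expectations (29a)-(29d) and the rewrites (30).  v_* stand for the
# variance components (sigma^2 with the corresponding subscript).
n, r, s, t = sp.symbols('n r s t', positive=True)
mu2, v_g, v_a, v_b, v_ab, v_ga, v_gb, v_gab, v_c, v_gc = sp.symbols(
    'mu2 v_g v_a v_b v_ab v_ga v_gb v_gab v_c v_gc')

ET_ij2 = (r**2*t**2*mu2 + r**2*t**2*v_g + r**2*t**2*v_a + r**2*t*v_b
          + r**2*t*v_ab + r**2*t**2*v_ga + r**2*t*v_gb + r**2*t*v_gab
          + r*t*v_c + r*t*v_gc)                                         # (29a)
ET_i2 = (r**2*s**2*t**2*mu2 + r**2*s**2*t**2*v_g + r**2*s*t**2*v_a
         + r**2*s**2*t*v_b + r**2*s*t*v_ab + r**2*s*t**2*v_ga
         + r**2*s**2*t*v_gb + r**2*s*t*v_gab + r*s*t*v_c + r*s*t*v_gc)  # (29b)
ET_j2 = (n**2*r**2*t**2*mu2 + n*r**2*t**2*v_g + n**2*r**2*t**2*v_a
         + n**2*r**2*t*v_b + n**2*r**2*t*v_ab + n*r**2*t**2*v_ga
         + n*r**2*t*v_gb + n*r**2*t*v_gab + n**2*r*t*v_c + n*r*t*v_gc)  # (29c)
ET2 = (n**2*r**2*s**2*t**2*mu2 + n*r**2*s**2*t**2*v_g + n**2*r**2*s*t**2*v_a
       + n**2*r**2*s**2*t*v_b + n**2*r**2*s*t*v_ab + n*r**2*s*t**2*v_ga
       + n*r**2*s**2*t*v_gb + n*r**2*s*t*v_gab + n**2*r*s*t*v_c
       + n*r*s*t*v_gc)                                                  # (29d)

ES_ij = n*s*ET_ij2 / (r*t)   # (30)
ES_i = n*ET_i2 / (r*s*t)
ES_j = s*ET_j2 / (n*r*t)
ES = ET2 / (n*r*s*t)

EM2 = sp.simplify((ES_ij - ES_i - ES_j + ES) / ((n - 1)*(s - 1)))
print(EM2)   # r*t*v_ga + r*v_gab + v_gc
```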

It is worth noting that the mean square for locations is computed as

$\frac{1}{s-1}\left(S_j - S\right)$

and the one for strains as

$\frac{1}{n-1}\left(S_i - S\right)$

Thus the expectations of these mean squares could be quickly obtained in terms of information developed in working out $E(M_2)$.

An important practical angle to note is that as one gains experience in working out mean square expectations various short cuts become apparent (for an example, see Crump, Biometrics 1946). However, no attempt will be made to describe such short-cuts and when they can be used, as the novice will run less chance of misapplications if he goes through the full procedure in detail until he perceives short-cuts and their rationale by himself. In doubtful cases it is always best to proceed in a straight-forward manner working through the full procedure described above.

Example 4

On occasion estimates of variance components are required from n-fold classification data in which sub-classes are disproportionate and in which in many instances a portion of the sub-classes are not represented at all in the data. In the case of data available to the animal geneticist for estimation of variances arising from genetic variation or genotype-environment interaction this can almost be said to be the rule rather than the exception.

As a specific example, suppose that data are available on the annual milk production of cows that were by different sires and that were members of different herds. It will be assumed that members of any particular sire family may have been scattered through two or more herds, but not necessarily all herds. Herd effects will vary due to management practices (and perhaps for other reasons), family effects will vary as a result of genotypic variation among sires, and herd-family interaction effects may be presumed to exist. A rational model on which to base analysis of the data would be as follows:

$Y_{ijk} = \mu + g_i + a_j + (ga)_{ij} + e_{ijk}$

where $Y_{ijk}$ is the production of the k-th cow that is by the i-th sire and located in the j-th herd,
$g_i$ is the effect of the genotype of the i-th sire (on production by his daughters),
$a_j$ is the effect of the j-th herd,
$(ga)_{ij}$ is an effect due to interaction between the average genotype of the i-th family and the environment to which cows are exposed in the j-th herd, and
$e_{ijk}$ is the deviation in production of the k-th cow from the population average for the i-th family in the j-th herd.

It will be assumed that all effects are random with population mean zero, that all individual effects are random with respect to each other so that the expectation for any product of two effects is zero, and finally that

$E(ga)_{ij}^2$ is constant over all values of i and j, and
$E\,e_{ijk}^2$ is constant over all values of i and j.

If production were measured in various years a realistic model would include other effects, but for the purpose of this example we will assume all records were taken in a single year.

There are various computational approaches that may be taken in the use of such data for estimation of variance components, but one that is becoming increasingly popular because of its ease is as follows. In terms of our example, four mean squares would be computed: mean squares for (1) families, (2) herds, (3) herd-family subclasses, and (4) cows within subclasses. The expectations of the first three of these will be linear functions of the variances of all four of the variables in the model. The fourth will have expectation $\sigma_e^2$. Once computed, the four mean squares would be equated to their respective expectations to provide four equations in four unknowns (the variances of the four effects) that would then be solved simultaneously to obtain estimates of the four variances.

We will consider the mean square for subclasses ($M_{sc}$) in detail. It would be

$M_{sc} = \frac{1}{s-1}\left[\sum_i\sum_j \frac{T_{ij}^2}{n_{ij}} - \frac{T^2}{N}\right]$

where $n_{ij}$ is the number of cows of the i-th family in the j-th herd,
$T_{ij}$ is the sum of production by all cows of the i-th family in the j-th herd,
T is the grand total of production by all cows,
N is the total number of cows, and
s is the number of sub-classes represented by one or more cows.

Obviously,

$E\,M_{sc} = \frac{1}{s-1}\left[\sum_i\sum_j E\left(\frac{T_{ij}^2}{n_{ij}}\right) - \frac{E\,T^2}{N}\right]$  (31)

Proceeding in accord with arguments presented in connection with the previous examples we can write directly

$T_{ij} = n_{ij}\mu + n_{ij}g_i + n_{ij}a_j + n_{ij}(ga)_{ij} + \sum_k e_{ijk}$

and hence

$E\left(\frac{T_{ij}^2}{n_{ij}}\right) = n_{ij}\mu^2 + n_{ij}\sigma_g^2 + n_{ij}\sigma_a^2 + n_{ij}\sigma_{ga}^2 + \sigma_e^2$  (32)

In contrast to example 3 this is not constant for all $T_{ij}$ but varies with $n_{ij}$. We must now find the expectation of $T^2$.

$T = \sum_i\sum_j T_{ij} = N\mu + \sum_i n_i g_i + \sum_j n_j a_j + \sum_i\sum_j n_{ij}(ga)_{ij} + \sum_i\sum_j\sum_k e_{ijk}$

where $n_i$ = total number of cows in the i-th family, and
$n_j$ = total number of cows in the j-th herd. Then

$ET^2 = N^2\mu^2 + \sigma_g^2\sum_i n_i^2 + \sigma_a^2\sum_j n_j^2 + \sigma_{ga}^2\sum_i\sum_j n_{ij}^2 + N\sigma_e^2$  (33)

As an example of the detail involved in writing $ET^2$, consider the term $\sum_i n_i g_i$ from the expression for T.

Squaring,

$\left(\sum_i n_i g_i\right)^2 = n_1^2g_1^2 + n_2^2g_2^2 + \cdots + n_f^2g_f^2$ + product terms that need not be written out since all have zero expectation

where f is the number of families. Then

$E\left(\sum_i n_i g_i\right)^2 = \sigma_g^2\sum_i n_i^2$

since the expectation of the square of any random g is $\sigma_g^2$.

Substituting in (31) in terms of (32) and (33) we obtain,

$E\,M_{sc} = \frac{1}{s-1}\left[\sum_i\sum_j\left(n_{ij}\mu^2 + n_{ij}\sigma_g^2 + n_{ij}\sigma_a^2 + n_{ij}\sigma_{ga}^2 + \sigma_e^2\right) - \left(N\mu^2 + \sigma_g^2\frac{\sum_i n_i^2}{N} + \sigma_a^2\frac{\sum_j n_j^2}{N} + \sigma_{ga}^2\frac{\sum_i\sum_j n_{ij}^2}{N} + \sigma_e^2\right)\right]$

$= \frac{1}{s-1}\left[\left(N\mu^2 + \sigma_g^2\sum_i\sum_j n_{ij} + \sigma_a^2\sum_i\sum_j n_{ij} + \sigma_{ga}^2\sum_i\sum_j n_{ij} + s\sigma_e^2\right) - \left(N\mu^2 + \sigma_g^2\frac{\sum_i n_i^2}{N} + \sigma_a^2\frac{\sum_j n_j^2}{N} + \sigma_{ga}^2\frac{\sum_i\sum_j n_{ij}^2}{N} + \sigma_e^2\right)\right]$

$= \sigma_e^2 + \frac{1}{s-1}\left[\sigma_g^2\left(\sum_i\sum_j n_{ij} - \frac{\sum_i n_i^2}{N}\right) + \sigma_a^2\left(\sum_i\sum_j n_{ij} - \frac{\sum_j n_j^2}{N}\right) + \sigma_{ga}^2\left(\sum_i\sum_j n_{ij} - \frac{\sum_i\sum_j n_{ij}^2}{N}\right)\right]$

Since $\sum_i\sum_j n_{ij} = N$, the coefficients of $\sigma_g^2$, $\sigma_a^2$, and $\sigma_{ga}^2$ are $\left(N - \sum_i n_i^2/N\right)/(s-1)$, $\left(N - \sum_j n_j^2/N\right)/(s-1)$, and $\left(N - \sum_i\sum_j n_{ij}^2/N\right)/(s-1)$, respectively.

Expectations of the other mean squares are obtained by the same procedure as that used for $E\,M_{sc}$. For any particular body of data, N, the $n_i$, the $n_j$, and the $n_{ij}$ can be obtained by mere counting and hence, the coefficients of the several variances in $E\,M_{sc}$ can be computed.
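As an illustration of that counting step, the Python sketch below takes a made-up table of subclass numbers $n_{ij}$ (zeros marking unrepresented subclasses) and computes the coefficients of $\sigma_g^2$, $\sigma_a^2$, and $\sigma_{ga}^2$ in $E\,M_{sc}$; numpy is assumed.

```python
import numpy as np

# Counting step for example 4: given subclass counts n_ij (rows = families,
# columns = herds, 0 = subclass not represented), compute the coefficients
# of sigma_g^2, sigma_a^2, and sigma_ga^2 in E(M_sc).
n_ij = np.array([[3, 0, 5],
                 [2, 4, 0],
                 [0, 6, 1]], dtype=float)   # made-up counts for illustration

N = n_ij.sum()
n_i = n_ij.sum(axis=1)               # cows per family
n_j = n_ij.sum(axis=0)               # cows per herd
s = np.count_nonzero(n_ij)           # subclasses with one or more cows

coef_g = (N - (n_i ** 2).sum() / N) / (s - 1)
coef_a = (N - (n_j ** 2).sum() / N) / (s - 1)
coef_ga = (N - (n_ij ** 2).sum() / N) / (s - 1)
print(coef_g, coef_a, coef_ga)       # E(M_sc) = sigma_e^2 + these * components
```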

Final Comments

The essence of working out mean square expectations can be summarized as follows:

1. It is necessary to know what is meant by expectation.

2. It is necessary to know the values that the definition of expectation imposes on the expectations of (a) a constant, (b) a random variate, (c) the product of a constant and a random variate, (d) the square of a random variate, and (e) the product of two random variates (only the cases of random variates with population mean zero are of special importance).

3. Fundamentally, the procedure is to write the mean square out symbolically in a form that is expanded to the point that it is a linear function of only terms of the type (a) to (e) of point 2 above.

4. When this has been done, knowledge specified in point 2 above, together with the rule that the expectation of a linear function is equal to the same function of the expectations of the separate terms of the quantity for which the expectation is desired, provides the basis for writing the desired expectation.

5. From the practical point of view, many of the steps can and will be performed only mentally (will not be written out). However, in case of doubt, writing steps out in detail is likely to insure against an occasional serious error. There are rules-of-thumb that can sometimes be used, but their application involves risk of error unless the entire matter is so well understood that the reason why these rules work in specific cases is entirely clear. Otherwise they may be applied in cases where they do not work.

l"~r supplemontary reading on th", derivation of mt:;an squart; t;;xpectations Sc,6

And~rson and Bancroft (1952) and Kempthorne (1952).

Lit8rature Cited

Anderson, R. L. and T. A. Bancroft (1952)Hill.

Statistical Theory in ResGarch, McGraw­

N8W York.

Crump, .S •• Lee (1946) The Estimation of Variance Compon;;:nts in Analysis of

Variance. Biometrics Bull. 2:7-11.

K~mpthorn~, Oscar (1952) The Design and Analysis of £Xperilncnts. John Wiley

and Sons, Inc. New York.