
ST217 Mathematical Statistics B

Bärbel Finkenstädt

March 11, 2008


CHAPTER 6

LINEAR STATISTICAL MODELS

- The (Normal) Linear Model
- The Analysis of Variance (ANOVA)
  - one-way ANOVA
  - two-way ANOVA


Introduction

Definition [Response Variable] A response variable is a random variable $Y$ whose value we wish to predict.

Definition [Explanatory Variable] An explanatory variable is a random variable $X$ whose values can be used to predict $Y$.

Definition [Linear Model] A linear model is a prediction function for $Y$ in terms of the values $x_1, x_2, \dots, x_k$ of $X_1, X_2, \dots, X_k$ of the form

$$E[Y \mid x_1, x_2, \dots, x_k] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k \tag{1}$$


Matrix formulation

If $Y_1, Y_2, \dots, Y_n$ are responses for cases $1, 2, \dots, n$, and $x_{ij}$ is the value of $X_j$ ($j = 1, \dots, k$) for case $i$, then

$$E[\mathbf{Y} \mid X] = X\beta. \tag{2}$$

Vector of responses:

$$\mathbf{Y} = (Y_1, Y_2, \dots, Y_n)^T$$

Matrix of explanatory variables (design matrix):

$$X = (x_{ij}) \quad \text{where } x_{i0} = 1 \text{ for } i = 1, \dots, n$$

Parameter vector (unknown):

$$\beta = (\beta_0, \beta_1, \dots, \beta_k)^T.$$
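As a small illustration of equation (2), the following sketch builds a design matrix with the constant column $x_{i0} = 1$ and evaluates $X\beta$. The helper names `design_matrix` and `linear_predictor`, the data and the parameter values are all made up for illustration; they are not part of the lecture notes.

```python
# Sketch: build the design matrix X with a leading column of ones and
# evaluate the linear predictor X beta using plain Python lists.

def design_matrix(rows):
    """Prepend the constant x_i0 = 1 to each case's explanatory values."""
    return [[1.0] + [float(v) for v in row] for row in rows]

def linear_predictor(X, beta):
    """Return the vector X beta, i.e. E[Y | X] under equation (2)."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# Hypothetical values: n = 3 cases, k = 2 explanatory variables.
x_values = [[2.0, 5.0], [3.0, 1.0], [4.0, 0.0]]
beta = [1.0, 0.5, -2.0]  # (beta_0, beta_1, beta_2), made up

X = design_matrix(x_values)
mean_response = linear_predictor(X, beta)
```

Each row of `X` has $k + 1$ entries, matching the $k + 1$ components of $\beta$.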


Example 1

Response $Y = S_2$ (systolic BP after treatment), with

- explanatory variable $S_1$ (this is a ‘simple linear regression model’, with just 1 explanatory variable),
- explanatory variable $D_2$,
- explanatory variables $D_1$ and $S_1$ (a ‘multiple regression model’).


Example 2

Response $Y = 2D_2 + S_2$, with

- explanatory variable $Z = 2D_1 + S_1$,
- explanatory variables $Z$ and $Z^2$ (a ‘quadratic regression model’).

New response and explanatory variables may be obtained by transforming and/or combining old ones.


Comments

- A linear relationship is the simplest possible relationship between response variables and explanatory variables.
- Can sometimes approximate a nonlinear relationship by a linear model, for example ‘polynomial regression’

  $$E[Y \mid x] = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_m x^m.$$

- By ‘linear model’ we mean ‘linear in parameters’.
- Distributional assumptions (if any) are made ONLY about the response variable $Y$. The explanatory variables are considered as given and fixed.
- There are various other names for response and explanatory variables.


Simple Linear Regression

Definition [Simple linear regression] A simple linear regression model is a linear model with one response variable $Y$ and one explanatory variable $X$, i.e. a model of the form

$$E[Y \mid x_1] = \beta_0 + \beta_1 x_1. \tag{3}$$

Typically in practice we have $n$ data points $(x_i, y_i)$ for $i = 1, \dots, n$, and we want to predict a future response $Y$ from the corresponding observed value $x$ of $X$.


Which variable should be treated as the response?

1. $X$ may precede $Y$ in time, for example
   - $X$ is BP before treatment and $Y$ is BP after treatment, or
   - $X$ is number of hours revision and $Y$ is exam mark;
2. $X$ may be in some way more fundamental, for example
   - $X$ is weight and $Y$ is height or
   - $X$ is height and $Y$ is weight;
3. $X$ may be easier or cheaper to observe, and in future we hope to estimate/approximate $Y$ without measuring it.


Parameters $\beta_0$ and $\beta_1$ are unknown and we need to estimate them in order to predict $Y$ by $\hat{Y} = \hat\beta_0 + \hat\beta_1 x$.

To make accurate predictions we require the prediction error

$$Y - \hat{Y} = Y - (\hat\beta_0 + \hat\beta_1 x)$$

to be small.

This suggests that, given data $(x_i, y_i)$ for $i = 1, \dots, n$, we should find $\hat\beta_0$ and $\hat\beta_1$ so that all the vertical deviations of the observed data points from the fitted line $y = \hat\beta_0 + \hat\beta_1 x$ are small.

To do this we apply the ‘least squares’ criterion: minimise the sum of squared deviations

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$


Method of Least Squares

For simple linear regression,

$$\hat{y}_i = \beta_0 + \beta_1 x_i \quad (i = 1, \dots, n) \tag{4}$$

To estimate $\beta_0$ and $\beta_1$ by least squares, we minimise

$$Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2. \tag{5}$$


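Exercise 6.1: Show that $Q$ is minimised at values $\hat\beta_0$ and $\hat\beta_1$ satisfying the equations

$$
\begin{aligned}
\hat\beta_0 n + \hat\beta_1 \sum x_i &= \sum y_i, \\
\hat\beta_0 \sum x_i + \hat\beta_1 \sum x_i^2 &= \sum x_i y_i,
\end{aligned}
\tag{6}
$$

(these are called the normal equations for $\beta_0$ and $\beta_1$) and hence that

$$\hat\beta_1 = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}, \tag{7}$$

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. \tag{8}$$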


Comments

- Checking second order partial derivatives of $Q$ verifies that $Q$ is minimised at $\beta = \hat\beta$.
- $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ is called the ‘least squares fit’ to the data.
- The least squares fitted line passes through $(\bar{x}, \bar{y})$, the centroid of the data points.
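The closed-form estimates (7) and (8) can be tried out numerically. The following is a minimal sketch in plain Python; the function name `simple_least_squares` and the five data points are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
# Sketch: least squares estimates for simple linear regression,
# computed directly from formulas (7) and (8).

def simple_least_squares(x, y):
    """Return (beta0_hat, beta1_hat) for the fitted line y = beta0 + beta1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar  # numerator of (7)
    s_xx = sum(xi * xi for xi in x) - n * x_bar ** 2                 # denominator of (7)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar                                    # formula (8)
    return beta0, beta1

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat = simple_least_squares(x, y)

# The fitted line evaluated at x_bar should reproduce y_bar (the centroid).
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
fit_at_centroid = beta0_hat + beta1_hat * x_bar
```

For these data $\bar{x} = 3$, $\bar{y} = 4$, and the estimates work out to $\hat\beta_1 = 0.6$ and $\hat\beta_0 = 2.2$.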


The Normal Linear Model (NLM)

Definition [NLM] Given $n$ response RVs $Y_i$ ($i = 1, 2, \dots, n$), with corresponding values of explanatory variables $x_i^T$, the NLM makes the following assumptions:

1. (Conditional) Independence
   The $Y_i$ are mutually independent given the $x_i^T$.
2. Linearity
   The expected value of the response variable is linearly related to the unknown parameters $\beta$:
   $$E Y_i = x_i^T \beta.$$
3. Normality
   The random variation $Y_i \mid x_i$ is Normally distributed.
4. Homoscedasticity (Equal Variances)
   i.e. $Y_i \mid x_i \sim N(x_i^T \beta, \sigma^2)$.


Matrix Formulation of NLM

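The NLM for responses $\mathbf{y} = (y_1, y_2, \dots, y_n)^T$ can be recast as follows:

1. $E[\mathbf{Y}] = X\beta$ for some parameter vector $\beta = (\beta_1, \beta_2, \dots, \beta_p)^T$,
2. $\varepsilon = \mathbf{Y} - E[\mathbf{Y}] \sim MVN(\mathbf{0}, \sigma^2 I)$, where $I$ is the $(n \times n)$ identity matrix.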


It can be shown that the least squares estimates of $\beta$ are given by solving the simultaneous linear equations

$$X^T \mathbf{y} = X^T X \hat\beta \tag{9}$$

(the normal equations), with solution (assuming that $X^T X$ is nonsingular)

$$\hat\beta = (X^T X)^{-1} X^T \mathbf{y}. \tag{10}$$


- In the general formulation the first column of the matrix $X$ contains 1’s. The corresponding parameter $\beta_1$ will be the ‘constant term’.
- The fitted values are $\hat{\mathbf{y}} = X \hat\beta$.
- The vector of residuals is $\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}}$.
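To make the matrix form concrete, here is a rough sketch that forms $X^T X$ and $X^T \mathbf{y}$, solves the normal equations (9) by Gaussian elimination, and then computes fitted values and residuals. It uses plain Python lists rather than a linear algebra library; the helper names (`transpose`, `matmul`, `solve`, `least_squares`) and the data are assumptions made for this illustration.

```python
# Sketch: solve the normal equations X^T y = X^T X beta for a small design matrix.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("X^T X is (numerically) singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def least_squares(X, y):
    """Return beta_hat solving the normal equations (9)."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Made-up data: first column of ones (constant term), second column x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta_hat = least_squares(X, y)
fitted = [sum(x * b for x, b in zip(row, beta_hat)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

Because the normal equations say $X^T \mathbf{r} = \mathbf{0}$, the residuals sum to zero whenever $X$ contains a column of ones, which is a convenient sanity check on any implementation.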


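Definition [Residual sum of squares] The residual sum of squares (RSS) in the fitted NLM is

$$s^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (\mathbf{y} - X\hat\beta)^T (\mathbf{y} - X\hat\beta) \tag{11}$$

- $S^2 / \sigma^2 \sim \chi^2_{(n-p)}$,
- $S^2$ is independent of $\hat\beta$.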


Exercise 6.2

(a) Show that the log-likelihood function for the NLM is

$$(\text{constant}) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (\mathbf{y} - X\beta)^T (\mathbf{y} - X\beta). \tag{12}$$

(b) Show that the maximum likelihood estimate of $\beta$ is identical to the least squares estimate. What is the distribution of $\hat\beta$?

(c) Show that the MLE $\hat\sigma^2$ of $\sigma^2$ is

$$\hat\sigma^2 = \frac{s^2}{n}. \tag{13}$$

What are the mean and variance of $\hat\sigma^2$?

(d) Show that an unbiased estimator of $\sigma^2$ is given by the formula

$$\frac{\text{Residual Sum of Squares}}{\text{Residual Degrees of Freedom}}$$


Amended lecture notes start here: insert the following slides instead of section 6.6 in the current lecture notes.

Exercise

Give the Rao-Cramer lower bound for an unbiased estimator of $\beta$ and $\sigma^2$ and verify whether $\hat\beta$ and the unbiased estimator of $\sigma^2$ attain it.


Degree of Explanation

A frequently reported summary statistic for the ‘goodness of fit’ is the coefficient of determination $R^2$:

$$R^2 = \frac{\text{Variation explained by regression}}{\text{Total variation}} \tag{14}$$

$$= \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{15}$$

where the total variation corresponds to the observed unconditional variation of the response variable.

Caution: $R^2$ cannot be used as a basis for comparing the goodness of fit of alternative models as it can be increased simply by increasing the number of explanatory variables.
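A minimal sketch of formula (15) in plain Python follows. The function name `r_squared` and the numbers are hypothetical; the fitted values are those of the least squares line $2.2 + 0.6x$ at $x = 1, \dots, 5$ for the made-up responses shown, and the explained/total ratio is only meaningful for a least squares fit that includes a constant term.

```python
# Sketch: coefficient of determination as explained variation / total variation.

def r_squared(y, fitted):
    """Return sum (yhat_i - ybar)^2 / sum (y_i - ybar)^2, as in (15)."""
    y_bar = sum(y) / len(y)
    explained = sum((f - y_bar) ** 2 for f in fitted)
    total = sum((yi - y_bar) ** 2 for yi in y)
    return explained / total

# Made-up responses and their least squares fitted values.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, fitted)
```

Here the explained variation is 3.6 and the total variation is 6, so $R^2 = 0.6$.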


Hypothesis Testing

(a) General linear hypotheses

Suppose we wish to test the null hypothesis

$$H_0 : R\beta = \mathbf{r}$$

that the parameters of the $\beta$ vector are contained in the subspace for which $R\beta = \mathbf{r}$, where

- $R$ is a $J \times p$ hypothesis design matrix of rank $J$ ($J \le p$);
- $\mathbf{r}$ is a $J \times 1$ known vector.


Exercise Formulate

(a) $H_0 : \beta_1 = \beta_2$ for the case $p = 2$,
(b) $H_0 : \beta_1 = \beta_2$ and $\sum_{i=1}^{p} \beta_i = 1$,
(c) $H_0 : \beta_1 - 3\beta_3 = 6\beta_4$ and $\frac{\beta_1}{\beta_2} = 4$,

as a general linear hypothesis $R\beta = \mathbf{r}$.


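Let $\hat\beta = (X^T X)^{-1} X^T \mathbf{y}$ be the unconstrained MLE and $\hat\sigma^2 = \frac{s^2}{(n-p)}$. Then it can be shown that the following test is a general likelihood ratio test:

Under $H_0 : R\beta = \mathbf{r}$

$$\lambda = \frac{(R\hat\beta - \mathbf{r})^T [R (X^T X)^{-1} R^T]^{-1} (R\hat\beta - \mathbf{r})}{J \hat\sigma^2} \sim F_{(J, n-p)}.$$

Rejection rule: reject $H_0$ if the realised $\lambda$ is larger than the $(1 - \alpha)$ quantile of the $F_{(J, n-p)}$ distribution.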


(b) Single Hypothesis

This is a subcase of (a) with rank $J = 1$:

Suppose we wish to test the null hypothesis

$$H_0 : R_1 \beta = r_1$$

Examples: Formulate

- $H_0 : \beta_1 = \beta_2$,
- $H_0 : \sum_{i=1}^{p} \beta_i = 1$,
- $H_0 : \beta_1 = r_1$,

as a simple linear hypothesis of the form $R_1 \beta = r_1$.


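In the case of a simple linear hypothesis ($J = 1$) the likelihood ratio test in (a) reduces to

$$\lambda = \frac{(R_1 \hat\beta - r_1)^2}{\hat\sigma^2 [R_1 (X^T X)^{-1} R_1^T]} \sim F_{(1, n-p)} = t^2_{(n-p)}.$$

Therefore we can use a t-test

$$T = \sqrt{\lambda} = \frac{R_1 \hat\beta - r_1}{\hat\sigma \sqrt{R_1 (X^T X)^{-1} R_1^T}} \sim t_{(n-p)}.$$

Decision rule: reject $H_0$ if $|T|$ is too large.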


Comment: For the single hypothesis $H_0 : \beta_i = \beta_{i,0}$ note that

$$T = \frac{\hat\beta_i - \beta_{i,0}}{\hat\sigma \sqrt{a_{ii}}} \sim t_{(n-p)},$$

where $a_{ii}$ is the $i$’th diagonal element of $(X^T X)^{-1}$.

Regression software often reports the value of $T$ for $H_0 : \beta_i = 0$ for each single regression coefficient.
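For simple linear regression ($p = 2$) the statistic $T$ for the slope can be computed by hand, because the diagonal element of $(X^T X)^{-1}$ belonging to the slope equals $1 / \sum (x_i - \bar{x})^2$. The sketch below relies on that fact; the function name `slope_t_statistic` and the data are made up for illustration.

```python
import math

# Sketch: t statistic for H0: slope = beta1_null in simple linear regression.

def slope_t_statistic(x, y, beta1_null=0.0):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta0 = y_bar - beta1 * x_bar
    rss = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)   # unbiased estimate, residual df = n - p with p = 2
    a_slope = 1.0 / s_xx         # slope entry on the diagonal of (X^T X)^{-1}
    return (beta1 - beta1_null) / math.sqrt(sigma2_hat * a_slope)

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

t_stat = slope_t_statistic(x, y)
```

The value of `t_stat` would then be compared with a quantile of the $t_{(n-p)}$ distribution, here with $n - p = 3$ degrees of freedom, taken from tables.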


(c) Hypothesis testing about $\sigma^2$

Straightforward: The RV

$$\frac{(n - p) \hat\sigma^2}{\sigma^2} \sim \chi^2_{(n-p)}$$

provides the distributional basis for testing $H_0 : \sigma^2 = \sigma_0^2$.


Analysis of Residuals

Residuals should be approximately iid, normal, random, patternless, uncorrelated, ...

Investigate residuals to

- check for outliers
  - plot standardised residuals (these should be approximately standard normal) and be suspicious about absolute values larger than about 2.
- check for violations of normality
  - histogram and normal or QQ plot;
  - possibly tests for normality ($\chi^2$ goodness of fit test, Kolmogorov-Smirnov test)
- check for homogeneous variances by plotting residuals against predictions $\hat{y}$ and also against explanatory variables;
- in particular, if the response variable is measured over time then the residuals might be correlated. You can inspect this in scatterplots where you plot the residual at time point $t$ versus the residual at time point $t - 1$ and compute the sample correlation coefficient.
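The last check, correlating each residual with its predecessor, can be sketched as follows. The function name `lag1_correlation` and the residual values are hypothetical; the alternating pattern is deliberately extreme so that the outcome is obvious.

```python
import math

# Sketch: sample correlation between the residual at time t and at time t - 1.

def lag1_correlation(residuals):
    """Correlate (r_2, ..., r_n) with (r_1, ..., r_{n-1})."""
    current = residuals[1:]
    previous = residuals[:-1]
    m = len(current)
    mean_c = sum(current) / m
    mean_p = sum(previous) / m
    cov = sum((c - mean_c) * (p - mean_p) for c, p in zip(current, previous))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_p = sum((p - mean_p) ** 2 for p in previous)
    return cov / math.sqrt(var_c * var_p)

# Made-up residuals that alternate in sign: strong negative serial correlation.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho = lag1_correlation(residuals)
```

A value near zero is what one would hope to see for uncorrelated residuals; values near $\pm 1$, as in this artificial example, suggest that the independence assumption is doubtful.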


Examples of the NLM

Example 1

Simple Linear Regression

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \tag{16}$$

where $\varepsilon_i \overset{IID}{\sim} N(0, \sigma^2)$.


Example 2

Two-sample t-test

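$$
\mathbf{y} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \\ y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \tag{17}
$$

and we’re interested in the hypothesis $H_0 : (\beta_0 - \beta_1) = 0$.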


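Example 3: Multiple Regression

$Y$ = SBP after captopril, $x_1$ = SBP before captopril, $x_2$ = DBP before captopril,

$$
\mathbf{y} = \begin{pmatrix} 201 \\ 165 \\ 166 \\ 157 \\ 147 \\ 145 \\ 168 \\ 180 \\ 147 \\ 136 \\ 151 \\ 168 \\ 179 \\ 129 \\ 131 \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & 210 & 130 \\
1 & 169 & 122 \\
1 & 187 & 124 \\
1 & 160 & 104 \\
1 & 167 & 112 \\
1 & 176 & 101 \\
1 & 185 & 121 \\
1 & 206 & 124 \\
1 & 173 & 115 \\
1 & 146 & 102 \\
1 & 174 & 98 \\
1 & 201 & 119 \\
1 & 198 & 106 \\
1 & 148 & 107 \\
1 & 154 & 100
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}. \tag{18}
$$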


See slides captopril.pdf for output; the file can be downloaded from the material webpage.


Exercise [dependence and heteroscedasticity] Suppose that $\mathbf{Y} \mid X$ is $MVN(X\beta, \sigma^2 \Phi)$ where $\Phi$ is a known positive definite matrix. Show that the MLE of $\beta$ is given by the generalised least squares estimator

$$\hat\beta = (X^T \Phi^{-1} X)^{-1} X^T \Phi^{-1} \mathbf{y}.$$


ANALYSIS OF VARIANCE (ANOVA)

- One-Way Analysis of Variance
- Two-Way Analysis of Variance


One-Way Analysis of Variance

Generalization of the two-sample t-test to p > 2 groups

Observations $y_{ij}$ ($j = 1, 2, \dots, n_i$) in the $i$th group ($i = 1, 2, \dots, p$).

Total number of observations $n = n_1 + n_2 + \dots + n_p$.

Denote the corresponding RVs by $Y_{ij}$. Assume that $Y_{ij} \sim N(\beta_i, \sigma^2)$ independently.

Wish to test the null hypothesis

$$H_0 : \beta_1 = \beta_2 = \dots = \beta_p$$


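The one-way ANOVA model is an NLM of the form

$$\mathbf{Y} \sim MVN(X\beta, \sigma^2 I),$$

where

$$
\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \tag{19}
$$

where $X$ has $n_1$ rows of the first type, ..., $n_p$ rows of the last type, and $n_1 + n_2 + \dots + n_p = n$.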


Notation

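$$
\bar{y}_{i+} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}, \qquad
\bar{y}_{++} = \frac{1}{n} \sum_{i=1}^{p} \sum_{j=1}^{n_i} y_{ij}
\left( = \frac{1}{n} \sum_{i=1}^{p} n_i \bar{y}_{i+} \right), \quad \text{etc.}
$$

Exercise 6.6 Show that for one-way ANOVA, $X^T X = \mathrm{diag}(n_1, n_2, \dots, n_p)$, and hence $\hat\beta = (\bar{Y}_{1+}, \bar{Y}_{2+}, \dots, \bar{Y}_{p+})^T$.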


Equivalent Reformulation

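$$Y_{ij} = \beta_i + e_{ij} = \mu + \alpha_i + e_{ij}$$

where

- $\mu$ is called the grand mean,
- $\alpha_i$ is called the treatment effect of treatment $i$.

$$\mu = E[\bar{Y}_{++}] = \frac{1}{n} \sum_{i=1}^{p} \sum_{j=1}^{n_i} E Y_{ij} = \frac{1}{n} \sum_{i=1}^{p} n_i \beta_i,$$

$$\alpha_i = \beta_i - \mu \quad (i = 1, 2, \dots, p), \qquad \sum_{i=1}^{p} \alpha_i = 0.$$

Hypotheses are now

- $H_0 : \alpha_i = 0$ ($i = 1, 2, \dots, p$),
- $H_1 : \alpha_i \ne 0$ for at least one $i$.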


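One-Way ANOVA Test

Total variability of all data taken together around its estimated grand mean can be decomposed as

$$
\begin{aligned}
SST &= \sum_{i=1}^{p} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{++})^2 \\
&= \sum_{i=1}^{p} n_i (\bar{y}_{i+} - \bar{y}_{++})^2 + \sum_{i=1}^{p} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i+})^2 \\
&= SS(tr) + SSE
\end{aligned}
$$

with corresponding degrees of freedom

$$n - 1 = (p - 1) + (n - p).$$

Under $H_0$ the F-ratio

$$f = \frac{SS(tr) / ((p - 1)\sigma^2)}{SSE / ((n - p)\sigma^2)} = \frac{MS(tr)}{MSE} \sim F(p - 1, n - p)$$

and we reject $H_0$ if $f$ exceeds the critical value at significance level $\alpha$.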


Exercises

1) Prove that SST = SS(tr) + SSE.
2) Show that $f$ is a general likelihood ratio test.


One-way-ANOVA table

Source of     Degrees of    Sum of     Mean
variation     freedom       squares    square    f
Treatment     p − 1         SS(tr)     MS(tr)    MS(tr)/MSE
Error         n − p         SSE        MSE
Total         n − 1         SST        MST


Example: River pollution (p.94)

The following data come from a study of pollution in inland waterways. In each of seven localities, five pike were caught and the log concentration of copper in their livers measured.

Locality                   Log concentration of copper (ppm)
1. Windermere              0.187   0.836   0.704   0.938   0.124
2. Grassmere               0.449   0.769   0.301   0.045   0.846
3. River Stour             0.628   0.193   0.810   0.000   0.855
4. Wimbourne St Giles      0.412   0.286   0.497   0.417   0.337
5. River Avon              0.243   0.258  -0.276  -0.538   0.041
6. River Leam              0.134   0.281   0.529   0.305   0.459
7. River Kennett           0.471   0.371   0.297   0.691   0.535

Want to carry out a one-way analysis of variance to test for differences between localities.
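The sums of squares and the F-ratio for these data can be computed along the following lines. This is only a sketch in plain Python: the function name `one_way_anova` is an assumption, and the comparison with the critical value of the $F(6, 28)$ distribution still has to be done from tables.

```python
# Sketch: one-way ANOVA sums of squares and F-ratio for the pike data.

def one_way_anova(groups):
    """Return (SS(tr), SSE, SST, f) for a list of groups of observations."""
    p = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_t = sum((y - grand_mean) ** 2 for g in groups for y in g)
    f = (ss_tr / (p - 1)) / (ss_e / (n - p))
    return ss_tr, ss_e, ss_t, f

# Log concentration of copper, one list per locality (data from the table above).
pike = [
    [0.187, 0.836, 0.704, 0.938, 0.124],    # Windermere
    [0.449, 0.769, 0.301, 0.045, 0.846],    # Grassmere
    [0.628, 0.193, 0.810, 0.000, 0.855],    # River Stour
    [0.412, 0.286, 0.497, 0.417, 0.337],    # Wimbourne St Giles
    [0.243, 0.258, -0.276, -0.538, 0.041],  # River Avon
    [0.134, 0.281, 0.529, 0.305, 0.459],    # River Leam
    [0.471, 0.371, 0.297, 0.691, 0.535],    # River Kennett
]

ss_tr, ss_e, ss_t, f = one_way_anova(pike)
df_treatment = len(pike) - 1                       # p - 1 = 6
df_error = sum(len(g) for g in pike) - len(pike)   # n - p = 28
```

Computing `ss_t` separately, rather than as `ss_tr + ss_e`, gives a numerical check of the decomposition SST = SS(tr) + SSE.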


Figure: Concentration of copper in pike livers


Two-Way Analysis of Variance

Here there are two factors:

Factor A has $I$ ‘levels’ $1, 2, \dots, I$, and factor B has $J$ ‘levels’ $1, 2, \dots, J$.

                        Factor B
                 1       2       ...     J
           1     Y_11    Y_12    ...     Y_1J
           2     Y_21    Y_22    ...     Y_2J
Factor A   3     Y_31    Y_32    ...     Y_3J
           ...   ...     ...     ...     ...
           I     Y_I1    Y_I2    ...     Y_IJ

i.e. there is precisely one observation $Y_{ij}$ at each $(i, j)$ combination of factor levels.


Again assume the NLM with

$$E[Y_{ij}] = \theta_i + \phi_j \quad \text{for } i = 1, \dots, I \text{ and } j = 1, \dots, J,$$

i.e.

$$Y_{ij} \sim N(\theta_i + \phi_j, \sigma^2) \text{ independently.} \tag{20}$$

Exercise 6.7 What is the matrix formulation of this model?


Reformulation

The two-way ANOVA model

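$$Y_{ij} = \mu + \alpha_i + \beta_j + e_{ij},$$

where

$$e_{ij} \sim N(0, \sigma^2), \qquad \sum_{i=1}^{I} \alpha_i = 0, \qquad \sum_{j=1}^{J} \beta_j = 0.$$

The hypotheses of interest are

- $H_{0,A} : \alpha_i = 0$, $i = 1, \dots, I$, versus $H_{1,A} : \alpha_i \ne 0$ for at least one $i$.
- $H_{0,B} : \beta_j = 0$, $j = 1, \dots, J$, versus $H_{1,B} : \beta_j \ne 0$ for at least one $j$.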


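Notation

Estimated grand mean:

$$\bar{y}_{++} = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} y_{ij}$$

Mean observation for $i$’th level of factor A:

$$\bar{y}_{i+} = \frac{1}{J} \sum_{j=1}^{J} y_{ij},$$

Mean observation for $j$’th level of factor B:

$$\bar{y}_{+j} = \frac{1}{I} \sum_{i=1}^{I} y_{ij}.$$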


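Decomposition of total variability of all data taken together around its estimated grand mean:

$$
\begin{aligned}
\sum_{i=1}^{I} \sum_{j=1}^{J} (y_{ij} - \bar{y}_{++})^2 &= \sum_{i=1}^{I} J (\bar{y}_{i+} - \bar{y}_{++})^2 \\
&\quad + \sum_{j=1}^{J} I (\bar{y}_{+j} - \bar{y}_{++})^2 \\
&\quad + \sum_{i=1}^{I} \sum_{j=1}^{J} (y_{ij} - \bar{y}_{i+} - \bar{y}_{+j} + \bar{y}_{++})^2
\end{aligned}
$$

$$SST = SSA + SSB + SSE$$

with corresponding degrees of freedom

$$(IJ - 1) = (I - 1) + (J - 1) + (IJ - J - I + 2 - 1).$$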


Two-way ANOVA test

We test $H_{0,A}$ with the F-ratio

$$f_A = \frac{SSA / ((I - 1)\sigma^2)}{SSE / ((I - 1)(J - 1)\sigma^2)} = \frac{MSA}{MSE}$$

Under $H_{0,A}$

$$f_A \sim F(I - 1, (I - 1)(J - 1))$$

and we reject $H_{0,A}$ if $f_A$ is larger than the appropriately chosen critical value.

The test for H0,B is analogous.


Two-way-ANOVA table

Source of     Degrees of        Sum of     Mean
variation     freedom           squares    square    f
Factor A      I − 1             SSA        MSA       MSA/MSE
Factor B      J − 1             SSB        MSB       MSB/MSE
Error         (I − 1)(J − 1)    SSE        MSE
Total         IJ − 1            SST        MST


Exercise

Consider the following data on the amount of time (in minutes) it took a certain person to drive to work, Monday through Friday, along four different routes:

           Mon   Tues   Wed   Thu   Fri
Route 1    22    26     25    25    31
Route 2    25    27     28    26    29
Route 3    26    29     33    30    33
Route 4    26    28     27    30    30

Test at the 0.05 level of significance whether the differences among the means obtained for the different routes are significant and also whether the differences among the means obtained for the different days of the week are significant.
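The arithmetic for this exercise (normally done with a pocket calculator) can be checked with a short sketch like the one below. The function name `two_way_anova` is an assumption; routes are taken as factor A ($I = 4$) and days as factor B ($J = 5$). The sketch stops at the F-ratios, which still need to be compared with tabulated critical values of $F(3, 12)$ and $F(4, 12)$.

```python
# Sketch: two-way ANOVA (one observation per cell) for the driving-time data.

def two_way_anova(table):
    """table[i][j] is the observation at level i of factor A and level j of factor B."""
    I = len(table)
    J = len(table[0])
    grand = sum(sum(row) for row in table) / (I * J)
    row_means = [sum(row) / J for row in table]
    col_means = [sum(table[i][j] for i in range(I)) / I for j in range(J)]
    ss_a = J * sum((m - grand) ** 2 for m in row_means)
    ss_b = I * sum((m - grand) ** 2 for m in col_means)
    ss_e = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(I) for j in range(J))
    ms_e = ss_e / ((I - 1) * (J - 1))
    f_a = (ss_a / (I - 1)) / ms_e
    f_b = (ss_b / (J - 1)) / ms_e
    return ss_a, ss_b, ss_e, f_a, f_b

# Rows are routes 1-4, columns are Mon-Fri (data from the table above).
times = [
    [22, 26, 25, 25, 31],
    [25, 27, 28, 26, 29],
    [26, 29, 33, 30, 33],
    [26, 28, 27, 30, 30],
]

ss_routes, ss_days, ss_error, f_routes, f_days = two_way_anova(times)
```

With these data the sums of squares come out as SSA = 52.8, SSB = 73.2 and SSE = 27.2, which add up to SST = 153.2.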


Notes for Revision

Revise all chapters 2-6 carefully, including all material added on slides. You are expected to

Chapter 2: be able to derive marginal and conditional distributions, understand independence, be able to do all manipulations of expectations and conditional expectations (including correlations, variances and covariances); all exercises and problems of chapter 2 are relevant.

Chapter 3: fully understand and do manipulations about the multivariate normal as practised in exercises (you do need to memorize the pdf of the MVN distribution!); recognise chi-square, t and F distributions as transformations of other random variables, and know some basic properties of these (such as first two moments).

Chapter 4: fully understand all material, in particular about properties of estimators and likelihood estimation. Be able to derive ML estimators and their information matrix in the multiparameter case.


Revision contd.

Chapter 5: understand all basic concepts of testing, prove the Neyman-Pearson lemma, be able to apply all tests introduced to data, be aware of the test assumptions and how to check them, be able to prove when a test is a LR test.

Chapter 6: definition and all assumptions of the NLM and its matrix formulation, be able to derive ML and LS estimators and their properties (again you do need to memorize the pdf of MVN to set up the likelihood!), fully understand the output of some regression software and analysis of residuals as well as some formal testing in the NLM. Carry out a 1-way and 2-way analysis of variance for some real data (with your pocket calculator).


Notes on Revision

Revise

- some linear algebra: you need to be able to carry out matrix manipulations (incl. determinants, inverses, rank) and formulate linear equations in matrix form.
- exercises and problems (and most importantly all exercises done in lectures and problems marked as past exam questions in the lecture notes)

Make sure you bring a working pocket calculator to the exam and that you know how to use it. The pocket calculator has to conform to University rules, i.e. it may only be able to display figures, no letters!

Be aware: If you forget your calculator you will not be able to doa significant part of the exam. Also it is not permitted to share apocket calculator with a neighbour.