20-1 Lecture 20 Cell Means Model STAT 512 Spring 2011 Background Reading KNNL: 16.3-16.6

Lecture 20 Cell Means Model - Purdue University · ghobbs/STAT_512/Lecture_Notes/ANO… · 20-1 Lecture 20 Cell Means Model STAT 512 Spring 2011 Background Reading KNNL: 16.3-16.6

20-1

Lecture 20

Cell Means Model

STAT 512

Spring 2011

Background Reading

KNNL: 16.3-16.6

20-2

Topic Overview

• ANOVA as Regression

• Cell Means Model for Single Factor ANOVA

• Sums of Squares & Degrees of Freedom

• Cash Offers example

20-3

Cash Offers Example (Pr16.10)

• Goal: Determine if age of owner affects the cash offer made by a dealer for a used car.

• Experiment: SAME car was taken by 36 different people (12 young, 12 middle-aged, and 12 elderly) to 36 different dealerships for an offer.

• Notes: “Owners” were randomized to dealerships. Offers given in hundreds of dollars.

• SAS code: cashoffers.sas

20-4

Aligned Box Plots

• Informative way to plot the data and can be done easily in SAS

*Box plots of offer for each age group;
symbol1 v=dot c=purple;
proc boxplot data=cash;
  plot offer*age / cboxes=purple cboxfill=yellow;
run;

• Numerous different options can be used in the plot statement (see SAS help files).

20-5

Aligned Box Plots (Cash Offers)

20-6

Interpreting Box Plots

20-7

Modeling by Regression

• Uses indicator variables

  ◦ YNG = 1 if YOUNG

  ◦ MID = 1 if MIDDLE

  ◦ (Both are 0 if ELDERLY)

• Model: $Y_{ij} = \beta_0 + \beta_{yng} X_{yng} + \beta_{mid} X_{mid} + \varepsilon_{ij}$

  ◦ $\beta_0$ is the mean offer for ELDERLY

  ◦ $\beta_0 + \beta_{yng}$ is the mean offer for YOUNG

  ◦ $\beta_0 + \beta_{mid}$ is the mean offer for MIDDLE

20-8

SAS Code

*Create indicator variables for regression;
data cash; set cash;
  if age='Young' then yng = 1; else yng = 0;
  if age='Middle' then mid = 1; else mid = 0;
proc print; run;

*Use Regression to analyze the indicator variables;
proc reg data=cash;
  model offer=yng mid / clm alpha=0.01667;
  id yng mid;
run;

20-9

Regression Model

Source   DF      SS       MS    F Value   Pr > F
Model     2    316.7    158.4     63.60   <.0001
Error    33     82.2     2.49
Total    35    398.9

• Two DF in model since two indicator variables needed

• The F-test indicates that there is a difference due to age (but doesn’t tell us exactly which groups differ)

20-10

Regression Model (2)

Variable    DF      Est      SE    t Value   Pr>|t|
Intercept    1   21.417   0.456     47.02    <.0001
yng          1    0.083   0.644      0.13    0.8979
mid          1    6.333   0.644      9.83    <.0001

• Estimated mean for Elderly is: $2142.

• Estimated mean for Young is: 2142+8=$2150.

• Estimated mean for Middle is: 2142+633=$2775.

• No significant difference between Elderly and Young (p = 0.8979); significant difference between Elderly and Middle (p < .0001).

20-11

Regression Model (3)

• Can get confidence intervals by taking appropriate combinations / using CLM

• CLM gives CIs for all 36 observations, but the intervals are identical within each group of 12

• Use alpha = 0.01667 (why?)

  ◦ YOUNG: (20.35, 22.65)

  ◦ MIDDLE: (26.60, 28.90)

  ◦ ELDERLY: (20.27, 22.57)
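
One likely answer to the "why?" is that 0.01667 is 0.05 / 3, a Bonferroni adjustment so that the three intervals hold jointly at roughly the 95% level. The sketch below (Python rather than SAS, standard library only) checks that arithmetic against numbers reported on these slides. The family-wise level of 0.05 is an assumption, not something the slides state, and all variable names are illustrative.

```python
import math

# Values reported on the slides (offers are in hundreds of dollars).
mse = 2.49            # error mean square from the ANOVA table
n_per_group = 12      # observations in each age group
family_alpha = 0.05   # ASSUMED family-wise level (not stated on the slides)
n_intervals = 3       # one interval per age group

# Bonferroni: split the family-wise alpha evenly across the intervals.
per_interval_alpha = family_alpha / n_intervals

# Standard error of a single group mean under the pooled-variance model.
se_mean = math.sqrt(mse / n_per_group)

# Half-width of the reported YOUNG interval (20.35, 22.65).
half_width = (22.65 - 20.35) / 2

# Multiplier implied by the reported interval; it should correspond to
# the t quantile at 1 - alpha/2 with 33 error degrees of freedom.
implied_t = half_width / se_mean

print(round(per_interval_alpha, 5))
print(round(se_mean, 3))
print(round(implied_t, 2))
```

All three reported intervals have the same half-width (1.15), which is what one would expect when every group has n = 12 and the intervals use the pooled MSE.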

20-12

Big Picture

• We could do everything for categorical variables using these indicator variables.

• Internally, SAS does this! But from an analytical viewpoint, there are other ways of modeling that are a bit easier to understand.

20-13

Cell Means Model

$$Y_{ij} = \mu_i + \varepsilon_{ij}$$

• $Y_{ij}$ is the value of the response variable in the $j$th trial for the $i$th factor level.

• $\mu_i$ is the (unknown) theoretical mean for all of the observations at level $i$.

• $\varepsilon_{ij}$ are independent normal errors with means 0 and variances $\sigma^2$.

• Since the $\varepsilon_{ij}$ are normal RVs, the $Y_{ij}$ are also normal RVs with means $\mu_i$ and variances $\sigma^2$.

20-14

Comparison to Regression

• $\mu_{eld} = \beta_0$ is the mean offer for ELDERLY

• $\mu_{yng} = \beta_0 + \beta_{yng}$ is the mean offer for YOUNG

• $\mu_{mid} = \beta_0 + \beta_{mid}$ is the mean offer for MIDDLE

• Note that the number of parameters involved is the same: 3 in each case. If I estimate the $\mu$'s, I can get the $\beta$'s, and vice versa.
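
The one-to-one mapping between the two parameterizations can be sketched in a few lines. This Python snippet (not part of the original SAS workflow) uses the regression estimates reported earlier on these slides; the function names are hypothetical.

```python
# Regression estimates reported on the slides (hundreds of dollars).
beta = {"intercept": 21.417, "yng": 0.083, "mid": 6.333}

def betas_to_means(b):
    """Cell means implied by indicator coding with ELDERLY as the baseline."""
    return {
        "Elderly": b["intercept"],
        "Young": b["intercept"] + b["yng"],
        "Middle": b["intercept"] + b["mid"],
    }

def means_to_betas(mu):
    """Invert the mapping: recover regression coefficients from cell means."""
    return {
        "intercept": mu["Elderly"],
        "yng": mu["Young"] - mu["Elderly"],
        "mid": mu["Middle"] - mu["Elderly"],
    }

mu_hat = betas_to_means(beta)
round_trip = means_to_betas(mu_hat)

for level, value in mu_hat.items():
    print(f"{level}: {value:.3f}")
```

Going from the betas to the means and back returns the original coefficients (up to floating-point rounding), which is the sense in which the two models carry the same information.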

20-15

Parameters in ANOVA

• Need to estimate all of the cell means $\mu_1, \mu_2, \ldots, \mu_r$ and also $\sigma^2$

• F-test answers the question of whether $\mu_i$ depends on $i$. That is, we test the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_r$ against the alternative that not all the means are the same.

20-16

Notation

• “DOT” indicates to sum over that index, “BAR” indicates to take the average.

• Overall or grand mean is

$$\bar{Y}_{\cdot\cdot} = \frac{1}{n_T} \sum_i \sum_j Y_{ij}$$

• Mean for factor level $i$ is

$$\bar{Y}_{i\cdot} = \frac{1}{n_i} \sum_j Y_{ij}$$

20-17

Estimates

• Each group mean is estimated by the mean of the observations within that group:

$$\hat{\mu}_i = \bar{Y}_{i\cdot} = \frac{1}{n_i} \sum_j Y_{ij}$$

• Cell variances are estimated by

$$s_i^2 = \frac{1}{n_i - 1} \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2$$

• Note: $n_i$ is the number of obs. in cell $i$. If all the $n_i$ are the same, we usually just write $n$.

20-18

Pooled Variance Estimate

• The variances are assumed to be the same, so we pool the cell variances to get an overall variance estimate.

$$MSE = \frac{\sum_i (n_i - 1) s_i^2}{\sum_i (n_i - 1)} = \frac{\sum_i \sum_j (Y_{ij} - \bar{Y}_{i\cdot})^2}{n_T - r}$$

• Pooling is weighted according to the number of observations in each group.
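
As a quick check of the pooling formula, the sketch below (Python, standard library only) recomputes SSE and MSE for the cash offers data from the group sizes and standard deviations reported in the MEANS output later in these slides. The variable names are illustrative.

```python
# Group sizes and sample standard deviations from the MEANS output.
groups = {
    "Elderly": (12, 1.67649),
    "Middle": (12, 1.28806),
    "Young": (12, 1.73205),
}

n_total = sum(n for n, _ in groups.values())
r = len(groups)

# SSE is the sum of (n_i - 1) * s_i^2; dividing by n_T - r pools the variances.
sse = sum((n - 1) * sd ** 2 for n, sd in groups.values())
mse = sse / (n_total - r)

print(f"SSE = {sse:.1f}, MSE = {mse:.2f}")  # SSE = 82.2, MSE = 2.49
```

These agree with the Error line of the ANOVA table (SS = 82.2, MS = 2.49).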

20-19

SAS Coding for ANOVA

proc glm data=cash;
  class age;
  model offer=age;
  means age;
run;

• Class statement causes AGE to be treated as a classification (categorical) variable.

• Means statement produces table of means and standard deviations.

20-20

ANOVA Output

Source   DF      SS       MS    F Value   Pr > F
Model     2    316.7    158.4     63.60   <.0001
Error    33     82.2     2.49
Total    35    398.9

• Exactly the same as the regression model

• 2 DF in the model since 3 levels for AGE

• F-statistic indicates model significance; there is some difference in the age groups

20-21

ANOVA Output (2)

Level of         ---------offer---------
age         N       Mean       Std Dev
Elderly    12    21.4167       1.67649
Middle     12    27.7500       1.28806
Young      12    21.5000       1.73205

• Note these results are the same as in the regression approach

• It seems apparent here where the difference among the age groups lies, but it is important to do statistical tests to obtain “groupings” (more in a later topic).

20-22

Partitioning Variation

• Break down difference between observation and grand mean into two parts:

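$$\underbrace{Y_{ij} - \bar{Y}_{\cdot\cdot}}_{\text{Total Deviation}} = \underbrace{\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}}_{\text{Deviation of Estimated Factor Level Mean Around Grand Mean}} + \underbrace{Y_{ij} - \bar{Y}_{i\cdot}}_{\text{Deviation around Estimated Factor Level Mean}}$$

The first term on the right is the BETWEEN part; the second is the WITHIN part.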

20-23

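[Figure: diagram for a single observation marking the deviations $Y_{ij} - \bar{Y}_{i\cdot}$, $\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$, and $Y_{ij} - \bar{Y}_{\cdot\cdot}$]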

20-24

Sums of Squares

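• If we square both sides of the equation on Slide 22 and sum over all observations, the cross-terms $(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot})$ sum to zero and the equation works out nicely to:

$$\underbrace{\sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2}_{SSTO} = \underbrace{\sum_{i,j} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}_{SSTrt} + \underbrace{\sum_{i,j} (Y_{ij} - \bar{Y}_{i\cdot})^2}_{SSE}$$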
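
The reason the cross-terms drop out takes one more step than the slide shows. A short derivation, using the dot/bar notation defined earlier: within each factor level, the deviations from the level mean sum to zero, so the whole cross-product sum vanishes.

```latex
\begin{align*}
\sum_{i}\sum_{j} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot})
  &= \sum_{i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \sum_{j} (Y_{ij} - \bar{Y}_{i\cdot}) \\
  &= \sum_{i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \left( n_i \bar{Y}_{i\cdot} - n_i \bar{Y}_{i\cdot} \right) \\
  &= 0 .
\end{align*}
```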

20-25

Analysis of Variance Table

Source      DF          SS                                                          MS
Model/Trt   $r - 1$     $\sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$    $SSTrt / df_{Trt}$
Error       $n_T - r$   $\sum_{i,j} (Y_{ij} - \bar{Y}_{i\cdot})^2$                  $SSE / df_E$
Total       $n_T - 1$   $\sum_{i,j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$
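
To see the table's formulas at work on raw observations, here is a minimal Python sketch on a small made-up data set (the values and level names are hypothetical, chosen only so the arithmetic is easy to follow). It computes each sum of squares directly from its definition, so the identity SSTO = SSTrt + SSE can be checked rather than assumed.

```python
# HYPOTHETICAL toy data: three factor levels with unequal sample sizes.
data = {
    "A": [4.0, 6.0, 5.0],
    "B": [9.0, 7.0, 8.0, 8.0],
    "C": [2.0, 3.0],
}

all_obs = [y for ys in data.values() for y in ys]
n_total = len(all_obs)
r = len(data)
grand_mean = sum(all_obs) / n_total
level_means = {k: sum(ys) / len(ys) for k, ys in data.items()}

# Each sum of squares is computed from its own definition.
ssto = sum((y - grand_mean) ** 2 for y in all_obs)
sstrt = sum(len(ys) * (level_means[k] - grand_mean) ** 2 for k, ys in data.items())
sse = sum((y - level_means[k]) ** 2 for k, ys in data.items() for y in ys)

rows = [
    ("Model/Trt", r - 1, sstrt),
    ("Error", n_total - r, sse),
    ("Total", n_total - 1, ssto),
]
for source, df, ss in rows:
    print(f"{source:<10} {df:>3} {ss:>8.3f}")
```

Both the degrees of freedom and the sums of squares in the first two rows add up to the Total row, even though the group sizes differ.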

20-26

Sources of Variation

• MODEL line represents variation BETWEEN groups

• ERROR line represents variation WITHIN groups

• Ratio of Model to Error Mean Squares yields F-test as usual.

20-27

Expected Mean Squares

• Can show that

$$E(MSTR) = \sigma^2 + \frac{1}{r - 1} \sum_i n_i (\mu_i - \mu_\cdot)^2$$

where $\mu_\cdot$ is the grand mean.

• $E(MSE) = \sigma^2$

• The ratio MSTR / MSE will tend to be near 1 if there is no treatment effect and will tend to be bigger than 1 if there is a treatment effect.

• See page 696 for how to find the expected mean squares.
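
The expected mean squares can also be checked by simulation. The sketch below is an illustration under assumed settings (three groups of 12, $\sigma = 1$, and hypothetical cell means), not a reproduction of anything in the course materials. It averages MSTR and MSE over repeated samples drawn from the cell means model.

```python
import random

random.seed(512)  # fixed seed so the run is repeatable

def mean_squares(groups):
    """Return (MSTR, MSE) for a list of groups of observations."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return sstr / (r - 1), sse / (n_total - r)

def simulate(mus, sigma, n, reps=2000):
    """Average MSTR and MSE over repeated samples from the cell means model."""
    mstr_sum = mse_sum = 0.0
    for _ in range(reps):
        groups = [[random.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        mstr, mse = mean_squares(groups)
        mstr_sum += mstr
        mse_sum += mse
    return mstr_sum / reps, mse_sum / reps

sigma, n = 1.0, 12                      # ASSUMED values for illustration
null_mus = [0.0, 0.0, 0.0]              # no treatment effect
alt_mus = [0.0, 0.0, 1.0]               # one cell mean differs

null_mstr, null_mse = simulate(null_mus, sigma, n)
alt_mstr, alt_mse = simulate(alt_mus, sigma, n)

# Theoretical E(MSTR) under the alternative, from the formula above.
mu_dot = sum(alt_mus) / len(alt_mus)
expected_alt_mstr = sigma ** 2 + sum(n * (m - mu_dot) ** 2 for m in alt_mus) / (len(alt_mus) - 1)

print(f"No effect: mean MSTR = {null_mstr:.2f}, mean MSE = {null_mse:.2f}")
print(f"Effect:    mean MSTR = {alt_mstr:.2f} (theory {expected_alt_mstr:.2f}), mean MSE = {alt_mse:.2f}")
```

In any single sample the ratio MSTR / MSE varies around these averages, which is why the F distribution is needed to judge whether an observed ratio is unusually large.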

20-28

F-test

• $H_0: \mu_1 = \mu_2 = \cdots = \mu_r$

• $H_a$: Not all the $\mu_i$ are equal

• $F = MSTR / MSE$

• Under $H_0$, $F \sim F(r - 1, n_T - r)$, so reject $H_0$ if $F$ is bigger than the critical value at significance level $\alpha$

• SAS reports the p-value for this test in the ANOVA table. Reject $H_0$ if the p-value is less than the significance level $\alpha$.

20-29

Example (Cash Offers)

Source   DF      SS       MS    F Value   Pr > F
Model     2    316.7    158.4     63.60   <.0001
Error    33     82.2     2.49
Total    35    398.9

• P-value < 0.0001 so there is some difference among the age groups

• We understand where the difference is from the plots and from the output of the MEANS statement we saw earlier, but we will check with a formal test later.
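
The ANOVA table above can be rebuilt from the MEANS output alone, which is a useful sanity check on the formulas. This is a Python sketch using only the summary statistics reported on these slides (the raw data are not reproduced here); the p-value is left out because it needs the F distribution, which the standard library does not provide.

```python
# Summary statistics from the MEANS output: level -> (n, mean, std dev).
summary = {
    "Elderly": (12, 21.4167, 1.67649),
    "Middle": (12, 27.7500, 1.28806),
    "Young": (12, 21.5000, 1.73205),
}

n_total = sum(n for n, _, _ in summary.values())
r = len(summary)
grand_mean = sum(n * m for n, m, _ in summary.values()) / n_total

# Between-group and within-group sums of squares.
sstr = sum(n * (m - grand_mean) ** 2 for n, m, _ in summary.values())
sse = sum((n - 1) * sd ** 2 for n, _, sd in summary.values())
ssto = sstr + sse

df_trt, df_err = r - 1, n_total - r
mstr, mse = sstr / df_trt, sse / df_err
f_stat = mstr / mse

print(f"Model  {df_trt:3d}  {sstr:7.1f}  {mstr:7.1f}  F = {f_stat:.1f}")
print(f"Error  {df_err:3d}  {sse:7.1f}  {mse:7.2f}")
print(f"Total  {n_total - 1:3d}  {ssto:7.1f}")
```

Up to rounding in the reported means and standard deviations, this reproduces SS = 316.7, 82.2, and 398.9 and F of about 63.6.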

20-30

Assumptions

• Constant variance of errors

• Residual plot: Residuals vs. X (Age Group)

• No obvious problems with the variance.

20-31

Assumptions (2)

• Normality of errors

• No major violations of normality

• If minor violations, generally ok for ANOVA

20-32

Assumptions (3)

• There are some slight differences in how these are assessed for ANOVA, and also in how we fix problems if they exist.

• Will discuss diagnostics/remedial measures in greater detail later.

20-33

Upcoming in Lecture 21...

• Factor Effects Model

• Power/Sample Size Planning

• Sections 16.7, 16.10-11