Sample Size/Power Calculations

Purdue University, bacraig/notes525/Power 1.pdf


Why Power Analysis?

• Research is expensive… wouldn’t want to conduct an experiment with far too

  1. few experimental units (EUs): project won’t find important differences that exist, so not really worth doing

  2. many experimental units (EUs): project will be unnecessarily expensive

• Typical granting agency requirement

A Simple Experimental Design

• Effect of diet on blood pressure (mmHg) in rats

• Consider a Completely Randomized Design (CRD)
  • 12 rats randomly assigned to one of two different diets
    • Trt 1: DASH diet… n=6
    • Trt 2: Standard diet… n=6
  • Investigator expects higher mean blood pressure (BP) at the end of 12 weeks when under Trt 2
  • Is n=6 enough to detect this difference?

Statistical Analysis

• Two competing hypotheses:

  Ho: μ1 = μ2

  H1: μ1 < μ2 (i.e. one-tailed test, for now)

• Basis for choosing between the two is the degree of evidence against the null hypothesis. We use the P-value relative to a declared significance level α
  • P ≤ α → reject Ho and conclude mean BP larger in Trt 2
  • P > α → fail to reject Ho, not enough evidence to conclude H1

• There are two possible incorrect conclusions based on this approach to inference

Type I and Type II errors

| True unknown state | What the data indicate: fail to reject Ho (P > α) | What the data indicate: reject Ho (P ≤ α) |
|---|---|---|
| Ho | No error | Type I error (prob = α) |
| H1 | Type II error (prob = β) | No error |

So is n = 6 rats large enough?

• Rephrase: Do we have enough statistical power?

• Need to “know” two things

  1. How large is the true mean difference (δ = μ2 − μ1)?
     a) What do you anticipate and/or want to detect?
     b) What would be economically/practically important?

  2. How much variability (σ) between rats within a group?
     • Sometimes prior information available from pilot study or previously published studies
     • Otherwise need to make an educated “guess”
     • Always round up to be a little conservative

One way to elicit values for σ

• Empirical rule: Consider range of responses to be equal to 4σ (R ≈ 4σ)

• Question to client: What would be the likely range (max − min) of responses for rats within the same trt?

• Suppose the answer was 60 mmHg: R = 60 → σ = 60/4 = 15 mmHg.

Suppose researchers also believe that δ ≥ 20 mmHg is important

Two competing hypotheses

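• Under Ho:

$$\bar{y}_2 - \bar{y}_1 \sim N\!\left(0,\ \frac{2\sigma^2}{n}\right)$$

• Under H1:

$$\bar{y}_2 - \bar{y}_1 \sim N\!\left(\delta,\ \frac{2\sigma^2}{n}\right)$$

Currently assuming the data are Normal. Sole difference is in the mean of the distribution.

Conduct one-tailed z-test for a certain α. Reject Ho if

$$z = \frac{\bar{y}_2 - \bar{y}_1}{\sqrt{2\sigma^2/n}} > z_\alpha, \quad \text{i.e. if} \quad \bar{y}_2 - \bar{y}_1 > z_\alpha\sqrt{\frac{2\sigma^2}{n}}$$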

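Distributions of $\bar{y}_2 - \bar{y}_1$

[Figure: two Normal curves for $\bar{y}_2 - \bar{y}_1$, one under Ho centered at 0 and one under H1 centered at δ. The rejection cutoff sits at $z_\alpha\sqrt{2\sigma^2/n}$; the area to its right is α under Ho and 1 − β under H1.]

$$\text{Power} = 1 - \beta = P\!\left(Z > z_\alpha - \frac{\delta}{\sqrt{2\sigma^2/n}}\right)$$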
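As a rough check, the z-test power for the rat example can be computed directly in a DATA step. This is a minimal sketch, assuming σ = 15 is known exactly and taking δ = 20, n = 6 and α = 0.05 from the example; the data set and variable names are illustrative only.

```sas
/* Power of the one-sided z-test for the rat example (sigma treated as known) */
data ztest_power;
  alpha = 0.05;   /* significance level */
  delta = 20;     /* true mean difference, mmHg */
  sigma = 15;     /* within-group standard deviation, mmHg */
  n     = 6;      /* rats per group */

  se      = sqrt(2 * sigma**2 / n);        /* std. error of ybar2 - ybar1 */
  z_alpha = probit(1 - alpha);             /* upper-alpha standard Normal quantile */
  power   = 1 - probnorm(z_alpha - delta / se);
run;

proc print data=ztest_power noobs;
run;
```

Under these assumptions the power comes out at roughly 0.75. That figure is optimistic, because it treats σ as known rather than estimated from the data.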

More reasonable statistical test

• t-test
  • Because you likely won’t be able to assume σ² is known
  • One-sided: Reject Ho if

$$t = \frac{\bar{y}_2 - \bar{y}_1}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} > t_{\alpha,\,df}$$

  • Two-sided (H1: μ1 ≠ μ2): Reject Ho if

$$|t| > t_{\alpha/2,\,df}$$

• Use of t distribution results in more complicated alternative hypothesis distribution (non-central t)
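The non-central t calculation can be sketched in a short DATA step. This assumes equal group sizes and a pooled variance estimate, so that df = 2(n − 1) and the noncentrality parameter is δ/(σ√(2/n)); the data set and variable names are illustrative.

```sas
/* Power of the two-sample t-test via the non-central t distribution */
data ttest_power;
  alpha = 0.05;  delta = 20;  sigma = 15;  n = 6;

  df  = 2 * (n - 1);                     /* pooled error degrees of freedom */
  ncp = delta / (sigma * sqrt(2 / n));   /* noncentrality parameter */

  /* one-sided test: reject when t > t(alpha, df) */
  t1 = tinv(1 - alpha, df);
  power_one_sided = 1 - probt(t1, df, ncp);

  /* two-sided test: reject when |t| > t(alpha/2, df) */
  t2 = tinv(1 - alpha/2, df);
  power_two_sided = 1 - probt(t2, df, ncp) + probt(-t2, df, ncp);
run;

proc print data=ttest_power noobs;
run;
```

For the rat example the one-sided value should land close to the 0.693 that PROC POWER reports for the same scenario, and the two-sided value close to the 0.550 reported for the overall F test with two groups.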

Using SAS for power analysis

proc power;
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=20 npergroup=6 stddev=15
    power=.;
run;

or (similar to two-sided t-test)

proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(0 20) npergroup=6 stddev=15
    power=.;
run;

SAS Output

Two-sample t Test for Mean Difference

Fixed Scenario Elements
  Distribution             Normal
  Method                   Exact
  Number of Sides          1
  Null Difference          0
  Alpha                    0.05
  Mean Difference          20
  Standard Deviation       15
  Sample Size Per Group    6

Computed Power
  Power    0.693

SAS Output

Overall F Test for One-Way ANOVA

Fixed Scenario Elements
  Method                   Exact
  Alpha                    0.05
  Group Means              0 20
  Standard Deviation       15
  Sample Size Per Group    6

Computed Power
  Power    0.550

Typically want power to be larger than 80% so more rats would be desirable

Using SAS for sample size

proc power;
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=20 npergroup=. stddev=15
    power=.80;
run;

or

proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(0 20) npergroup=. stddev=15
    power=.80;
run;

SAS Output

Two-sample t Test for Mean Difference

Fixed Scenario Elements
  Distribution          Normal
  Method                Exact
  Number of Sides       1
  Null Difference       0
  Alpha                 0.05
  Mean Difference       20
  Standard Deviation    15
  Nominal Power         0.8

Computed N Per Group
  Actual Power    N Per Group
  0.813           8

SAS Output

Overall F Test for One-Way ANOVA

Fixed Scenario Elements
  Method                Exact
  Alpha                 0.05
  Group Means           0 20
  Standard Deviation    15
  Nominal Power         0.8

Computed N Per Group
  Actual Power    N Per Group
  0.805           10

Generating Power Curve I

proc power;
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=20 stddev=15 power=.
    npergroup=3 to 20 by 1;
  plot interpol=join yopts=(ref=0.80);
run;

[Figure: Power Curve for one-sided t test. Power (0.3 to 1.0) plotted against Sample Size Per Group (0 to 20), with a reference line at power = 0.8.]

Generating Power Curve II

proc power;
  twosamplemeans alpha=.05 nulldiff=0 sides=1
    meandiff=10 to 30 by 1 stddev=15
    npergroup=6 power=.;
  plot x=effect interpol=join yopts=(ref=0.80);
run;

Determining sample size for a desired margin of error

• Confidence interval

$$\bar{y}_2 - \bar{y}_1 \pm \underbrace{t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n}}}_{\text{Margin of error}}$$

• Given guesstimates for the variances, one can set margin of error equal to desired amount and solve for n
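Because the t quantile itself depends on n, solving for n is easiest by stepping through candidate sample sizes. The sketch below assumes equal variances in both groups with the guessed σ = 15 mmHg from the rat example, a 95% interval, and df = 2(n − 1); the target margin of 15 mmHg is a hypothetical value chosen only for illustration, as are the variable names.

```sas
/* Smallest n per group whose anticipated margin of error meets a target */
data me_samplesize;
  alpha  = 0.05;   /* 95% confidence interval */
  sigma  = 15;     /* guessed within-group SD, mmHg */
  target = 15;     /* HYPOTHETICAL desired margin of error, mmHg */

  do n = 2 to 1000;
    df     = 2 * (n - 1);
    margin = tinv(1 - alpha/2, df) * sqrt(2 * sigma**2 / n);
    if margin <= target then leave;   /* stop at the first n that is small enough */
  end;
run;

proc print data=me_samplesize noobs;
  var n df margin;
run;
```

This plugs the guessed σ in for both sample standard deviations, so the margin actually observed in the experiment will vary around the value printed here.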

What if more than two trts?

• Example: In a study of vitamin supplementation, pigs are assigned to each of 5 treatment groups and weight gains over a specified time period are to be recorded.

• Researchers anticipate mean responses to be 3.9, 4.1, 4.2, 4.3 and 4.5 kg for the five treatments, respectively

• Based on previous experience, they anticipate a within-treatment variance of about 0.30 kg²

• They want to know if n=4 animals per treatment would provide sufficient power for the ANOVA F-test.

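Linear model written two ways

1) Cell means model:

$$Y_{ij} = \mu_i + e_{ij}$$

2) Factor level effects model:

$$Y_{ij} = \mu + \alpha_i + e_{ij}$$

i = 1, …, r = 5; j = 1, 2, …, n = 4; $e_{ij} \sim NIID(0, \sigma^2)$

$$\mu = \frac{1}{r}\sum_{i=1}^{r}\mu_i, \qquad \alpha_i = \mu_i - \mu, \qquad \sum_{i=1}^{r}\alpha_i = 0 \quad \text{(i.e. sum-to-zero constraints)}$$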

One-way ANOVA table

| Source | Df | SS | MS | EMS |
|---|---|---|---|---|
| Treatment | r−1 | SSTrt | MSTrt | $\sigma^2 + \text{function}\left(\sum_{i=1}^{r}\alpha_i^2\right)$ |
| Error | r(n−1) | SSE | MSE | $\sigma^2$ |

ANOVA F-test (equivalent specs.):

1) Ho: μ1 = μ2 = μ3 = μ4 = μ5 versus H1: at least one μi ≠ μi’

2) Ho: all αi = 0 versus H1: at least one αi ≠ 0

Note: if Ho is true then both EMS = σ², such that F = MSTrt/MSE ~ $F_{r-1,\,r(n-1)}$ (central F-distribution)

Power determination for F-test

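• Under H1:

$$\begin{pmatrix}\mu_1\\ \mu_2\\ \mu_3\\ \mu_4\\ \mu_5\end{pmatrix} = \begin{pmatrix}3.9\\ 4.1\\ 4.2\\ 4.3\\ 4.5\end{pmatrix}, \quad \text{or} \quad \begin{pmatrix}\alpha_1\\ \alpha_2\\ \alpha_3\\ \alpha_4\\ \alpha_5\end{pmatrix} = \begin{pmatrix}-0.3\\ -0.1\\ 0.0\\ 0.1\\ 0.3\end{pmatrix} \text{ with } \mu = 4.2$$

This means F = MSTrt/MSE ~ $F_{r-1,\,r(n-1),\,\phi}$, a non-central F-distribution (if φ > 0), where

$$\phi = \frac{n\sum_{i=1}^{r}\alpha_i^2}{\sigma^2}$$

is the non-centrality parameter.

“Corrected sum of squared means” (CSSM) = $\sum_{i=1}^{r}\alpha_i^2$, for example (−0.3)² + (−0.1)² + (0.0)² + (0.1)² + (0.3)² = 0.20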
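The non-central F calculation for the pig example can be sketched by hand in a DATA step. This assumes the values stated in the example (r = 5, n = 4, σ² = 0.30, CSSM = 0.20) and a non-centrality parameter of n × CSSM / σ²; the data set and variable names are illustrative.

```sas
/* Power of the one-way ANOVA F-test from the non-central F distribution */
data ftest_power;
  alpha  = 0.05;
  r      = 5;       /* number of treatments */
  n      = 4;       /* animals per treatment */
  sigma2 = 0.30;    /* within-treatment variance, kg**2 */
  cssm   = 0.20;    /* corrected sum of squared means */

  numdf  = r - 1;
  dendf  = r * (n - 1);
  phi    = n * cssm / sigma2;                        /* non-centrality parameter */
  f_crit = finv(1 - alpha, numdf, dendf);            /* central F critical value */
  power  = 1 - probf(f_crit, numdf, dendf, phi);     /* area beyond it under H1 */
run;

proc print data=ftest_power noobs;
run;
```

The printed power should agree with what PROC POWER's onewayanova statement gives for the same means, standard deviation and sample size.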

SAS Code

proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(3.9 4.1 4.2 4.3 4.5)
    npergroup=4 stddev=0.5477   /* this is the square root of 0.30 */
    power=.;
run;

SAS Output

Overall F Test for One-Way ANOVA

Fixed Scenario Elements
  Method                   Exact
  Alpha                    0.05
  Group Means              3.9 4.1 4.2 4.3 4.5
  Standard Deviation       0.5477
  Sample Size Per Group    4

Computed Power
  Power    0.171

Severely underpowered… as designed, this would be a waste of time and money to run!!

SAS Code

Let’s look at a power curve to get an idea of the necessary sample size:

proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(3.9 4.1 4.2 4.3 4.5)
    npergroup=4 to 30
    stddev=0.5477 power=.;
  plot interpol=join yopts=(ref=.80);
run;

[Figure: power curve for the five-treatment example. Power (0.1 to 1.0) plotted against Sample Size Per Group (0 to 30), with a reference line at power = 0.8.]

Looks like we need about 19 animals per group (almost 5 times the number before)
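Rather than reading the value off the curve, the sample size can also be requested directly, in the same way as for the two-group example. This sketch simply sets npergroup to missing and fixes the nominal power at 0.80; everything else is taken from the scenario above.

```sas
/* Solve for n per group instead of reading it off the power curve */
proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(3.9 4.1 4.2 4.3 4.5)
    stddev=0.5477        /* square root of 0.30 */
    npergroup=.          /* quantity to solve for */
    power=.80;           /* nominal power */
run;
```

PROC POWER then reports the smallest n per group whose actual power is at least the nominal value, which gives an exact counterpart to the visual estimate.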

What if trt means unknown?

• Use the “worst case” scenario

• Conservative assessment of power
  • Just have to know the difference between the largest and smallest means or the smallest difference D that is scientifically meaningful
  • Use −D/2 and D/2 with all other means clumped at zero

• Minimizes $\sum_i \alpha_i^2$, so it minimizes φ. True power will be greater than or equal to this
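A short worked version of the worst case may help. Here D is the range of the means, n the number of animals per group and σ² the within-treatment variance; the numbers use D = 0.6 and σ² = 0.30 from the pig example, and φ denotes the non-centrality parameter of the F-test.

```latex
\begin{align*}
\text{worst-case effects: } & \left(-\tfrac{D}{2},\ 0,\ \dots,\ 0,\ \tfrac{D}{2}\right) \\
\sum_{i=1}^{r}\alpha_i^2 &= \left(-\tfrac{D}{2}\right)^2 + \left(\tfrac{D}{2}\right)^2 = \frac{D^2}{2} \\
\phi_{\min} &= \frac{n\,D^2}{2\sigma^2} \\
D = 0.6,\ \sigma^2 = 0.30: \quad \sum_{i=1}^{r}\alpha_i^2 &= \frac{0.36}{2} = 0.18, \qquad \phi_{\min} = \frac{0.18\,n}{0.30} = 0.6\,n
\end{align*}
```

For comparison, the anticipated means 3.9, 4.1, 4.2, 4.3 and 4.5 have the same range of 0.6 but a corrected sum of squared means of 0.20, so the worst case gives a slightly smaller non-centrality parameter and therefore slightly lower power at any given n.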

SAS Code

/* Suppose D = 0.6 */
proc power;
  onewayanova alpha=.05 test=overall
    groupmeans=(-0.3 0 0 0 0.3)
    npergroup=4 to 30
    stddev=0.5477 power=.;
  plot interpol=join yopts=(ref=.80);
run;

[Figure: power curve for the worst-case means. Power (0.1 to 1.0) plotted against Sample Size Per Group (0 to 30), with a reference line at power = 0.8.]

Looks like we need about 21 animals per group in the worst case

There is actually a “trick” to computing φ using ANOVA software like PROC GLM/MIXED (O’Brien and Lohr, 1984)

1) Substitute “true means” for data in ANOVA.

2) Use the ANOVA table to compute the noncentrality parameter.

3) Then use that computed value in power calculations!

Using “true means” for data

Suppose you are interested in 3 treatments.

Anticipate true mean responses of 4.0, 4.3 and 4.6

Anticipate residual variance of 0.30

Wish to compute power based on sample size of n = 4 for each treatment.

data oneway;
  input treatment mean;
  datalines;
1 4.0
1 4.0
1 4.0
1 4.0
2 4.3
2 4.3
2 4.3
2 4.3
3 4.6
3 4.6
3 4.6
3 4.6
;

proc mixed data=oneway noprofile;
  class treatment;
  model mean = treatment;
  parms (0.30) / noiter;
  ods output tests3 = tests3;   /* output the ANOVA table to a data set called "tests3" */
run;

Trick to compute φ

• Compute the ANOVA treatment “F ratio” (“FTreatment”)

• Multiply “FTreatment” by numerator degrees of freedom (NumDF) to get φ:

$$\phi = F_{\text{Treatment}} \times df_{\text{Treatment}} = 1.2 \times 2 = 2.4$$

• FTreatment is a function of CSSM.

  Obs  Effect     NumDF  DenDF  FValue  ProbF
  1    treatment  2      9      1.20    0.3452
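The reason the trick works can be written out in a few lines. This is a sketch of the reasoning, assuming equal group sizes n and a residual variance held fixed at its anticipated value σ² (which is what the parms statement with noiter does); r is the number of treatments and the α's are the treatment effects.

```latex
\begin{align*}
% with the true means used as data, every treatment sample mean equals \mu_i
\mathrm{SSTrt} &= n\sum_{i=1}^{r}(\mu_i - \bar{\mu})^2 = n\sum_{i=1}^{r}\alpha_i^2 \\
F_{\mathrm{Treatment}} &= \frac{\mathrm{SSTrt}/(r-1)}{\sigma^2} \\
F_{\mathrm{Treatment}} \times (r-1) &= \frac{n\sum_{i=1}^{r}\alpha_i^2}{\sigma^2} = \phi \\
% means 4.0, 4.3, 4.6 with n = 4 and \sigma^2 = 0.30
\phi &= \frac{4\left[(-0.3)^2 + 0^2 + (0.3)^2\right]}{0.30} = \frac{4(0.18)}{0.30} = 2.4
\end{align*}
```

The hand calculation matches the value obtained from the table, since 1.20 × 2 = 2.4.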

Use φ to compute power

data power;
  set tests3;
  noncent = Fvalue*numdf;
  alpha = 0.05;
  /* critical value separating the "acceptance region" from the "rejection region" */
  criticalvalue = Finv(1-alpha, numdf, dendf, 0);
  /* probability of falling in rejection region if H1 is true */
  Power = 1 - Probf(criticalvalue, numdf, dendf, noncent);
run;

proc print data=power;
run;

  Effect     NumDF  DenDF  FValue  ProbF   noncent  alpha  criticalvalue  Power
  treatment  2      9      1.20    0.3452  2.4      0.05   4.25649        0.20010

PROC GLMPOWER does this

/* much simpler data step */
data example1;
  input FactorA $ mean;
  datalines;
1 4.0
2 4.3
3 4.6
;
run;

proc glmpower data=example1;
  class FactorA;
  model mean = FactorA;
  power
    stddev = .548
    ntotal = 12      /* total number of experimental units */
    power  = .
    alpha  = 0.05;
run;

The GLMPOWER Procedure

Fixed Scenario Elements
  Dependent Variable          mean
  Source                      FactorA
  Alpha                       0.05
  Error Standard Deviation    0.548
  Total Sample Size           12
  Test Degrees of Freedom     2
  Error Degrees of Freedom    9

Computed Power
  Power    0.200

What about Factorial Designs?

• An experiment was conducted to determine the effects of three different sources of dietary phosphorus and four different varieties of corn silage on daily milk production

• Proposed a 3 x 4 factorial experiment:
  • Factor A, Dietary phosphorus: 1, 2, & 3 (a=3)
  • Factor B, Corn silage varieties: 1, 2, 3, & 4 (b=4)

• Each cow randomly assigned to just one particular A*B treatment combination.

• How many cows should be considered?

Need to specify “true” means

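Power analysis requires “knowledge” of μij and σ².

Suppose investigator anticipates that:

| Factor A \ Factor B | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | μ11 = 37 | μ12 = 38 | μ13 = 44 | μ14 = 41 |
| 2 | μ21 = 42 | μ22 = 43 | μ23 = 49 | μ24 = 46 |
| 3 | μ31 = 47 | μ32 = 48 | μ33 = 54 | μ34 = 51 |

σ = 5 kg (σ² = 25 kg²)

Wishes to determine power for both main effects and two-way interaction and also the difference between, say, Level 1 and 2 of A

$$\bar{\mu}_{1.} - \bar{\mu}_{2.} = \frac{1}{4}\left(\mu_{11} + \mu_{12} + \mu_{13} + \mu_{14}\right) - \frac{1}{4}\left(\mu_{21} + \mu_{22} + \mu_{23} + \mu_{24}\right)$$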

Setup “data”

data power;
  input FactorA FactorB cellmean;
  datalines;
1 1 37
1 2 38
1 3 44
1 4 41
2 1 42
2 2 43
2 3 49
2 4 46
3 1 47
3 2 48
3 3 54
3 4 51
;
run;

symbol1 i=join;
proc gplot;
  plot cellmean*FactorB=FactorA;
run;

Profile means plot

[Figure: profile plot of cellmean (37 to 54) against FactorB (1 to 4), with one line per level of FactorA (1, 2, 3).]

Researcher anticipating no interaction

(Power analysis should still take its possibility into account in ANOVA)

Using GLMPower

proc glmpower data=power;
  class FactorA FactorB;
  model cellmean = FactorA | FactorB;
  contrast 'A1 vs A2' FactorA 1 -1 0
                      FactorB 0 0 0 0
                      FactorA*FactorB  0.25  0.25  0.25  0.25
                                      -0.25 -0.25 -0.25 -0.25
                                       0     0     0     0;
  power
    stddev = 5     /* residual standard deviation (square root of the residual variance) */
    ntotal = 36    /* provides power determination for n = 36/12 = 3 reps per group */
    power  = .     /* missing… because you want to compute power */
    alpha  = 0.05;
  plot x=n min=24 max=96;   /* power curve plot ranging from n = 24/12 to 96/12 reps per group */
run;

PROC GLMPOWER OUTPUT

Fixed Scenario Elements
  Dependent Variable          cellmean
  Alpha                       0.05
  Error Standard Deviation    5
  Total Sample Size           36
  Error Degrees of Freedom    24

Computed Power
  Index  Type      Source            Test DF  Power
  1      Effect    FactorA           2        0.989
  2      Effect    FactorB           3        0.720
  3      Effect    FactorA*FactorB   6        0.050
  4      Contrast  A1 vs A2          1        0.652
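The main-effect powers can be cross-checked by hand with the same non-central F logic used for the one-way design. This is a sketch under the assumptions of a balanced design and an error standard deviation of 5, as in the PROC GLMPOWER call; the marginal means (40, 45, 50 for Factor A and 42, 43, 49, 46 for Factor B, grand mean 45) are averages of the anticipated cell means, and the data set and variable names are illustrative.

```sas
/* Hand calculation of main-effect power for the 3 x 4 factorial */
data factorial_power;
  length source $ 8;
  alpha  = 0.05;
  sigma  = 5;                      /* error standard deviation */
  ntotal = 36;                     /* total cows */
  dendf  = ntotal - 12;            /* 36 cows - 12 cells = 24 error df */

  /* Factor A: marginal means 40, 45, 50 around the grand mean 45 */
  source = 'FactorA';  numdf = 2;  n_per_level = ntotal / 3;
  cssm   = (40-45)**2 + (45-45)**2 + (50-45)**2;
  phi    = n_per_level * cssm / sigma**2;
  power  = 1 - probf(finv(1-alpha, numdf, dendf), numdf, dendf, phi);
  output;

  /* Factor B: marginal means 42, 43, 49, 46 around the grand mean 45 */
  source = 'FactorB';  numdf = 3;  n_per_level = ntotal / 4;
  cssm   = (42-45)**2 + (43-45)**2 + (49-45)**2 + (46-45)**2;
  phi    = n_per_level * cssm / sigma**2;
  power  = 1 - probf(finv(1-alpha, numdf, dendf), numdf, dendf, phi);
  output;
run;

proc print data=factorial_power noobs;
  var source numdf dendf phi power;
run;
```

The two powers should come out close to the 0.989 and 0.720 in the table above. The interaction has a non-centrality parameter of zero because the anticipated profiles are parallel, which is why its power equals α = 0.05.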

Power curves
