32
7/9/2014 1 Power and Sample Size: Considerations in Study Design Nae-Yuh Wang, PhD Associate Professor of Medicine, Biostatistics & Epidemiology Institute for Clinical and Translational Research Introduction to Clinical Research Lecture Topics July 16, 2014 2 ACKNOWLEDGEMENT John McGready Department of Biostatistics Bloomberg School of Public Health 3 LECTURE TOPICS Sample size and statistical inferences Factors influencing sample size and power Sample size determination when comparing two means Sample size determination when comparing two proportions

Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

1

Power and Sample Size:Considerations in Study DesignNae-Yuh Wang, PhDAssociate Professor of Medicine, Biostatistics & Epidemiology

Institute for Clinical and Translational ResearchIntroduction to Clinical Research

Lecture Topics

July 16, 2014

2

ACKNOWLEDGEMENT

• John McGreadyDepartment of BiostatisticsBloomberg School of Public Health

3

LECTURE TOPICS

• Sample size and statistical inferences• Factors influencing sample size and power• Sample size determination when comparing two

means• Sample size determination when comparing two

proportions

Page 2: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

2

Section A

Sample Size and Statistical Inferences

Institute for Clinical and Translational ResearchIntroduction to Clinical Research

WHY IS SAMPLE SIZE RELEVANT

• A small sample will not allow precise estimates, therefore could not tell whether observed difference is real or due to variability. We may call things equal when we shouldn’t

• A very large sample size will allow us to detect tiny differences that have no clinical significance. Waste of research resources

• A very large sample size may not be large enough if the event rate is very small

• Sometimes the available sample size is fixed and we have to decide what we will be able to do with such a sample 5

INTERVAL ESTIMATION

Interval estimation: using sample mean and SE (SD of the sampling distribution) to construct 95% confidence interval to infer the population mean

• Sampling distribution becomes more and more like a normal (Gaussian) distribution with increasingly larger N

• 95% CI = sample mean +/- 1.96*SE= sample mean +/- 1.96*SD/√N

6

Page 3: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

3

WHY IS SAMPLE SIZE RELEVANT ESTIMATION

Estimation: What is the prevalence of obesity among adults in Baltimore City?• How precise do we want our estimate to be? • 95% CI = p +/- 1.96*SE(p)

= p +/- 1.96*SD/√n= p +/- 1.96*√p(1- p)/n

• Larger sample leads to more precise estimate, but it requires more resources.

7

ESTIMATION PRECISION

• Estimate the percentage of adults who are obese in such way that I have 95% confidence that the true percentage is within +/- 10% of my estimate.

• Define parameters of variables such as adults (e.g. age range), obese (e.g. BMI > 30 kg/m2), etc.

• Use 95% confidence interval as measure of estimation precision.

• Set acceptable precision level e, to be equated to the 95% confidence limit, 1.96 x SE

8

Commonly seen formula:n = 4 pq / e2

n: size of samplee: level of acceptable precisionp: expected prevalence value q =1 - pSD2 = Variance = pq4: constant equivalent to 95% level of confidence

9

ne /SD 1.96 222 /SD)96.1( en

ESTIMATION PRECISION

Page 4: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

4

• Say p = .30 (from literature or pilot data)e = .1 (determined subjectively, level of

minimal clinical significance)n = 4(.3)(.7)/(.1)2 = .84/.01 = 84

• If wanted precision of +/- 5%e = .05n = 4(.3)(.7)/(.05)2 = .84/.0025 = 336

• Greater estimation precision larger sample size

10

ESTIMATION PRECISION

• Estimating the difference in a single parameter (e.g. mean) between 2 populations

• Example:What is the difference in bone mineral density (lumber spine) between European and African American women, age 25-34 years of age?

11

ESTIMATION PRECISION

• Commonly used formula:n = 8 SD2 / e2 = 4 (2 SD2) / e2

n = size of sample for each populatione = level of acceptable precisionSD = population standard deviation, SD2 the variance2 SD2 reflects the fact that statistical inferences on difference of two independent populations subject to two sources of variability8 = constant equivalent to 95% confidence level and reflecting 2 sources of error

** Assumption: the populations we are comparing have the same variance, i.e.

SDB2 = SDW

2 = SD2 12

ESTIMATION PRECISION

Page 5: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

5

13

• Use formula: n = 8 SD2 / e2

Say SD = 1 (based on literature or pilot data)e = .5 (i.e. tolerable error = +/- 0.5 SD)n = 8 (1/.5)2 = 8(4) = 32 (per group)SD = 2 e = .25 (i.e. tolerable error = +/- 0.125 SD)n = 8 (2/.25)2 = 8(8)2 = 512 (per group)

• Greater population variability larger sample size• Greater estimation precision larger sample size• Higher confidence level (smaller a error) larger

sample size

ESTIMATION PRECISION

14

EXAMPLE

• Consider the following results from a study done on 29 women, all 35–39 years old

Sample Data

n Mean SBP SD of SBP

OC users 8 132.8 15.3

Non-OC Users 21 127.4 18.2

15

• Hypothesis: OC use is associated with higher blood pressure

• Statistically speaking we are interested in testing :

H0:µOC = µNO OC H0: µOC - µNO OC = 0HA: µOC ≠ µNO OC HA: µOC - µNO OC ≠ 0

µOC represents population mean SBP for OC usersµNO OC represents population mean BP for women not

using OC

EXAMPLE

Page 6: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

6

16

STATISTICAL POWER

TRUTHH0 HA

Reject H0

Not

Reject H0

Type I Error

alpha-level

Type II Error

beta

Power

1-beta

DE

CIS

ION

17

• The sample mean difference in blood pressures is 132.8 – 127.4 = 5.4

• This could be considered scientifically (clinically) significant, however, the result is not statistically significant (or even close to it!) at the = 0.05 level

• Suppose, as a researcher, you were concerned about detecting a population difference of this magnitude if it truly existed

• A study based on 29 women has low power to detect a difference of such magnitude

EXAMPLE

18

• Power is a measure of “doing the right thing” when HA is true!

• Higher power is better (smaller type II error)• We can calculate power for a given study if we specify a

specific HA

• An OC/Blood pressure study based on 29 women has power of .13 (13%) to detect a difference in BP of 5.4 mmHg or more, if this difference truly exists between the populations of 35-39 years old women who are using and not using OC

• When power is this low, it is difficult to determine whether there is really no difference in population means or we just call it wrong

STATISTICAL POWER

Page 7: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

7

Ho: 1-2= 019

• Recall, the sampling behavior of is normally distributed (large samples) with this sampling distributed centered at true mean difference. If H0is true, then curve is centered at 0

21 xx

H0: µ1 - µ2 = 0

STATISTICAL POWER

20

• Recall, the sampling behavior of is normally distributed (large samples) with this sampling distributed centered at true mean difference. If HAtruth, then curve is centered at some value d, d≠0

21 xx

HA: 1-2 = d

STATISTICAL POWER

Ho: 1-2= 021

• H0 will be rejected (for α = 0.05) if the sample result, , is more than 2 standard errors away from 0, either above or below

21 xx

H0: µ1 - µ2 = 0

STATISTICAL POWER

Page 8: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

8

22

• H0 will be rejected (for α = 0.05) if the sample result, , is more than 2 standard errors away from 0, either above or below

21 xx

H0: µ1 - µ2 = 0HA: µ1 - µ2 = d

STATISTICAL POWER

H1 H0 H1 H0

Increased n

WHY IS SAMPLE SIZE RELEVANTHYPOTHESIS TESTING

Does the sample better support a distribution under H0 or H1 ?

No difference No differenceNew TX better New TX better 23

Section B

Factors Influencing Sample Size and Power

Institute for Clinical and Translational ResearchIntroduction to Clinical Research

Page 9: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

9

Does a clinically significant effect exist between a treated group and a non treated or differently treated group?

1-sided test: n / group = (Z1-α + Z1-β)2 2 (SD/Δ)2

2-sided test: n / group = (Z1-α/2 + Z1-β)2 2 (SD/Δ)2

25

HYPOTHESIS TESTING

SAMPLE SIZE / POWER CALCULATIONBASIC ELEMENTS

26

n , Zα , Zβ , SD2 , Δ2

• Smaller α error larger sample size• Smaller β error (higher power) larger sample

size• Higher variability in outcome larger sample

size• Smaller Δ difference to detect larger sample

size

27

Use formula:n / group = (Z1-α + Z1-β)2 (2 SD2 ) / Δ2

where Z1-α = normal deviate for a error (two sided test uses Z1-α/2)

Z1-β = normal deviate for b error2 indicates two groups (sources of variability)

i.e. treatment group; placebo groupΔ = least difference between treated (estrogen) group and non treated group (placebo) that is considered clinically significantn = sample size of each group

HYPOTHESIS TESTING

Page 10: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

10

28

Given formula:n = (Z1-α/2 + Z1-β)2 (2 SD2 ) / Δ2

Then n = (1.96 + 0.84)2 2 (.65/.30)2

= (15.68) (4.69) = 73.6 => 74

Allow for 20% random dropout: .8nc = 74nc = 74 / .8 = 92.5

Need 93 per arm for 80% power to detect a .3difference between group with 20% random dropouts

90% power use n = (1.96 + 1.28)2 2 (.65/.30)2 = 98.6

HYPOTHESIS TESTING

29

WHAT INFLUENCES POWER?

• In order to INCREASE power for a study comparing two group means, we need to do the following:1. Change the hypothesized values of µ1 and µ2 so

that the difference ( µ1 - µ2 ) is bigger

H0: µ1 - µ2 = 0HA: µ1 - µ2 = d

30

• In order to INCREASE power for a study comparing two group means, we need to do the following:1. Change the hypothesized values of µ1 and µ2 so

that the difference ( µ1 - µ2 ) is bigger

HA: µ1 - µ2 = d’H0: µ1 - µ2 = 0

WHAT INFLUENCES POWER?

Page 11: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

11

31

• In order to INCREASE power for a study comparing two group means, we need to do the following:2. Increase the sample size in each group

HA: µ1 - µ2 = dH0: µ1 - µ2 = 0

WHAT INFLUENCES POWER?

Ho: 1-2= 0HA: 1-2 = d32

• In order to INCREASE power for a study comparing two group means, we need to do the following:2. Increase the sample size in each group

HA: µ1 - µ2 = dH0: µ1 - µ2 = 0

WHAT INFLUENCES POWER?

33

• In order to INCREASE power for a study comparing two group means, we need to do the following:3. Increase the -level (functionally speaking,

make the test “easier to reject”)

with = 0.05

HA: µ1 - µ2 = dH0: µ1 - µ2 = 0

WHAT INFLUENCES POWER?

Page 12: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

12

34

• In order to INCREASE power for a study comparing two group means, we need to do the following:3. Increase the -level (functionally speaking,

make the test “easier to reject”)

with = 0.10

HA: µ1 - µ2 = dH0: µ1 - µ2 = 0

WHAT INFLUENCES POWER?

SAMPLE SIZE / POWER CALCULATIONBASIC ELEMENTS

35

n , Zα , Zβ , SD2 , Δ2

• increased α error higher power• Larger sample size higher power• Less variability (SD) in outcome higher power• Larger Δ difference to detect higher power

SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS

36

When the outcome of interest is before-after change of a certain outcome measure, and D denotes the between group difference of these changes:

• The SD used in the calculation should be the SD for the before-after change, not the SD for the cross-sectional outcome measure

• Assuming the SDbefore and SDafter = σ, then

SDchange = σ x √ 2(1-ρ)

where ρ is the correlation between the outcome

measures before and after

• Literature usually report SDbefore but not ρ

Page 13: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

13

• Power is computed to inform study design, and calculation of post-hoc power after a study is completed does not provide additional information to the study results

• Can only be computed for specific HA’s: i.e. this study had XX% to detect a difference in population means of YY or greater.

• Frequently, in study design, a required sample size is computed to actually achieve a certain preset power level to find a “Clinically/scientifically” minimal important difference in means

• “Industry standard” for power: 80% (or greater)

37

POWER & STUDIES

Section C

Sample Size Calculations for Comparing Two Means

Institute for Clinical and Translational ResearchIntroduction to Clinical Research

39

• Blood pressure and oral contraceptives – Suppose we used data from the example in

Section A to motivate the following question:– Is oral contraceptive use associated with higher

blood pressure among individuals between the ages of 35–39?

EXAMPLE

Page 14: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

14

• Recall, the data:

40

PILOT STUDY

Sample Data

n Mean SBP SD of SBP

OC users 8 132.8 15.3

Non-OC Users 21 127.4 18.2

41

• 5 mmHg difference in BP is too large to ignore• Need a bigger study that has ample power to detect

such a difference, should it really exist in the population

• Task: determine sample sizes needed to detect a 5 mmHg increase in blood pressure in O.C. users with 80% power at significance level α = 0.05– Using pilot data, we estimate that the standard

deviations are 15.3 and 18.2 in O.C. and non-O.C. users respectively

PILOT STUDY

42

• Here we have a desired power in mind and want to find the sample sizes necessary to achieve a power of 80% to detect a population difference in blood pressure of five or more mmHg between the two groups

PILOT STUDY

Page 15: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

15

43

• We can find the necessary sample size(s) of this study if we specify .– α-level of test (.05)– Specific values for μ1 and μ2 (specific HA) and

hence d = μ1 - μ2: usually represents the minimum scientific difference of interest)

– Estimates of σ1 and σ2

– The desired power(.80)

PILOT STUDY

44

• How can we specify d = μ1 - μ2 and estimate population SDs?– Researcher knowledge—experience makes for

good educated guesses– Make use of pilot study data!

PILOT STUDY

45

• Fill in blanks from pilot study– -level of test (.05)– Specific HA (μOC=132.8, μNO OC =127.4), and

hence d = μ1 - μ2 = 5.4 mmHg

– Estimates of σOC ( = 15.3) and σNO OC (=18.2)– The power we desire (.80)

PILOT STUDY

Page 16: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

16

46

• Given this information, we can use Stata to do the sample size calculation

• “sampsi” command– Command syntax (items in italics are numbers to

be supplied by researcher)

sampsi 1 2, alpha() power(power) sd1(1) sd2(2)

PILOT STUDY

47

STATA RESULTS

• Blood Pressure/OC example

Total N = 306

48

PILOT STUDY/STATA RESULTS

• Our results from Stata suggest that in order to detect a difference in BP of 5.4 mmHg (if it really exists in the population) with high (80%) certainty, we would need to enroll 153 O.C. users and 153 non-users

• This assumed that we wanted equal numbers of women in each group!

Page 17: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

17

49

• Suppose we estimate that the prevalence of O.C. use amongst women 35–39 years of age is 20% – We wanted this reflected in our group sizes

• If O.C. users are 20% of the population, non-OC users are 80%– There are four times as many non-users as there

are users (4:1 ratio)

PILOT STUDY/STATA RESULTS

50

• We can specify a ratio of group sizes in Stata

– Again, using “sampsi” command with “ratio”option

sampsi 1 2, alpha() power(power) sd1(1) sd2(2) ratio(n2/n1)

PILOT STUDY/STATA RESULTS

• Blood Pressure/OC example

51

STATA RESULTS

Total N = 430

Page 18: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

18

Section D

Sample Size Determination for Comparing Two Proportions

Institute for Clinical and Translational ResearchIntroduction to Clinical Research

53

POWER FOR COMPARING TWO PROPORTIONS

• Same ideas as with comparing means

54

• We can find the necessary sample size(s) of this study if we specify .– α-level of test – Specific values for p1 and p2 (specific HA) and

hence d = p1 - p2: usually represents the minimum scientific difference of interest)

– The desired power

PILOT STUDY

Page 19: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

19

55

• Given this information, we can use Stata to do the sample size calculation

• “sampsi” command– Command syntax (items in italics are numbers to

be supplied by researcher)

sampsi p1 p2, alpha() power(power)

PILOT STUDY

56Continued

ANOTHER EXAMPLE

• Two drugs for treatment of peptic ulcer compared (Familiari, et al., 1981)– The percentage of ulcers healed by pirenzepine (drug

A) and trithiozine (drug B) was 77% and 58% based on 30 and 31 patients respectively (p-value = .17), 95% CI for difference in proportions healed was (-.04, .42)

– The power to detect a difference as large as the sample results with samples of size 30 and 31 respectively is only 25%

)( B DRUGA DRUG p-p

Healed Not Healed Total

Drug A 23 7 30

Drug B 18 13 31

• As a clinician, you find the sample results intriguing – want to do a larger study to better quantify the difference in proportions healed

• Redesign a new trial, using aforementioned study results to “guestimate” population characteristics– Use pDRUG A = .77 and pDRUG B = .58– 80% power– = .05

• Command in Statasampsi .77 .58, alpha (.05) power (.8)

57

EXAMPLE

Page 20: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

20

• Command in Statasampsi .77 .58, alpha (.05) power (.8)

58

EXAMPLE

• What would happen if you change power to 90%?sampsi .77 .58, alpha (.05) power (.9)

59

EXAMPLE

60

• Suppose you wanted two times as many people on trithiozone (“Drug B”) as compared to pirenzephine(“Drug A”)– Here, the ratio of sample size for Group 2 to

Group 1 is 2.0

• Can use “ratio” option in “sampsi” command

EXAMPLE

Page 21: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

21

• Changing the ratio sampsi .77 .58, alpha (.05) power (.9) ratio(2)

61

Example

62

SAMPLE SIZE FOR COMPARING TWO PROPORTIONS

• A randomized trial is being designed to determine if vitamin A supplementation can reduce the risk of breast cancer– The study will follow women between the ages of

45–65 for one year– Women were randomized between vitamin A and

placebo• What sample sizes are recommended?

63

BREAST CANCER/VITAMIN A EXAMPLE

• Design a study to have 80% power to detect a 50% relative reduction in risk of breast cancer w/vitamin A

(i.e. )

using a (two-sided) test with α-level = .05• To get estimates of proportions of interest:

- using other studies, the breast cancer rate in the controls can be assumed to be 150/100,000 per year

50.PLACEBO

VITA

p

pRR

Page 22: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

22

• A 50% relative reduction: ifthen,

So, for this desired difference in the relative scale:

Notice, that this difference on the absolute scale, , is much smaller in magnitude:

64

100,000

750.5

100,000

150AVITp

50.PLACEBO

VITA

p

pRR PLACEBOAVIT pp 50.

PLACEBOAVIT pp

%075.000075.0 100,000

75

BREAST CANCER/VITAMIN A EXAMPLE

65

BREAST CANCER SAMPLE SIZE CALCULATION IN STATA

• “sampsi” commandsampsi .00075 .0015, alpha(.05) power(.8)

66

BREAST CANCER SAMPLE SIZE CALCULATION IN STATA

• You would need about 34,000 individuals per group• Why so many?

– Difference between two hypothesized proportions is very small: = .00075

We would expect about 50 cancer cases among the controls and 25 cancer cases among the vitamin A group

2534,000100,000

75:A vitamin

5134,000100,000

150:placebo

PLACEBOAVIT pp

Page 23: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

23

• Suppose you want 80% power to detect only a 20% (relative) reduction in risk associated with vitamin A

A 20% relative reduction: if then

So, for this desired difference in the relative scale:

Notice, that this difference on the absolute scale, , is much smaller in magnitude:

67

100,000

1200.8

100,000

150AVITp

BREAST CANCER/VITAMIN A EXAMPLE

80.PLACEBO

VITA

p

pRR

PLACEBOAVIT pp 80.

PLACEBOAVIT pp

%03.00003.0 100,000

30

• “sampsi” commandsampsi .0012 .0015, alpha(.05) power(.8)

68

BREAST CANCER SAMPLE SIZE CALCULATION IN STATA

69

BREAST CANCER—VITAMIN A EXAMPLE REVISITED

• You would need about 242,000 per group!• We would expect 360 cancer cases among the

placebo group and 290 among vitamin A group

Page 24: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

24

70

AN ALTERNATIVE APPROACH—DESIGN A LONGER STUDY

• Proposal– Five-year follow-up instead of one year

Here:

pVITA≈5×.0012 = .006pPLACEBO≈5×.0015 = .0075

• Need about 48,400 per group– Yields about 290 cases among vitamin A and 360

cases among placebo• Issue

– Loss to follow-up

Section E

More Examples

Institute for Clinical and Translational ResearchIntroduction to Clinical Research

Johns Hopkins Practice-based Opportunity for Promotion of Weight Reduction (POWER) Trial: • a comparative effectiveness trial that tested

two practical behavioral weight loss interventions in obese medical outpatients with cardiovascular risk factors

72

HYPOTHESIS TESTINGEXAMPLE

Page 25: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

25

HYPOTHESIS TESTINGEXAMPLE

73

Control

Remote

In-Person

Randomization

= Measured weights and other outcomes

Baseline 6 Mo 12 Mo 24 Mo

HYPOTHESIS TESTINGEXAMPLE

Study Design

74

Hypotheses: • Primary:

– The Remote arm will produce 2.75kg greater mean weight loss over the 24 month trial period than the Control arm

– The In-Person arm will produce 2.75kg greater mean weight loss over the 24 month trial period than the Control arm

• Secondary:– In-Person vs. Remote, other weight outcomes (%

weight loss, % reached goal), other health outcomes…75

HYPOTHESIS TESTINGEXAMPLE

Page 26: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

26

Considerations for selecting the primary hypotheses in POWER trial• Weight loss can be quantitatively measured with

high precision, easy interpretation• Continuous outcomes typically allow greater

statistical power• Weight loss as small as 2.75 kg has been shown to

produce health benefits, so clinically relevant• Difference in intervention effects between Remote

vs In-Person not clear but expected to be smaller• Type-I error adjustment for multiple comparisons

76

HYPOTHESIS TESTINGEXAMPLE

SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS

77

Back to the POWER trial example:

Minimum Detectable Difference (MDD) in Kg Minimum Detectable Difference α=0.025A α=0.05B

Sample size Baselinec 18m ∆ σ VIF ρ β=0.80 β=0.90 β=0.80 β=0.90 300 (100/group)

103.83 -3.41 16.53 1.0 0.85 2.68 3.10 3.60 4.15 16.53 1.5 0.85 2.75 3.27 4.40 5.05

350 (117/group)

16.53 1.0 0.85 2.48 2.90 3.30 3.85 16.53 1.5 0.85 3.05 3.55 4.05 4.70

400 (133/group)

16.53 1.0 0.85 2.33 2.70 3.10 3.60 16.53 1.5 0.85 2.85 3.30 3.80 4.40

APower if either of 2 tests is positive at α=0.05/2; BPower for single test at α=0.05; CBaseline, 18m ∆ (= change from EST+DASH intervention in PREMIER, net of control), and rho(ρ, correlation between baseline and 18m), all from PREMIER trial unless otherwise indicated.

SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS

78

Then, budget planning indicated that we could afford N = 390 (130/group), so the table evolved to:

Net tx effect

in Kg

α=0.025A (at least 1 of 2

significant)

α=0.025B (1 of 1

significant)

α=0.05C (1 of 1

significant) 2.00 0.52 0.31 0.41 2.50 0.72 0.47 0.582.75 0.80 0.56 0.66 3.00 0.87 0.64 0.74 3.25 0.92 0.72 0.803.50 0.95 0.78 0.86 4.00 0.99 0.89 0.934.50 1.00 0.95 0.97

Page 27: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

27

SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS

79

Recall POWER has 2 primary hypotheses to test:

H0: Remote - Control = 0; H1: Remote - Control > 2.75kg

H0’: In-Person - Control = 0; H1’: In-Person - Control > 2.75kg

If each test is controlled at α = 0.05, then the prob. for both tests to be type-I error free would be

(0.95)(0.95) = 0.9025

So the prob. for at least 1 test to encounter type-I error would be 1 - 0.9025 = .0975 > 0.05

=> Multiple comparisons inflate overall α error

SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS

80

If each of the 2 tests is controlled at α = 0.05/2 = 0.025, then the prob. for both independent tests to be type-I error free would be

(0.975)(0.975) = 0.950625

So the prob. for at least 1 test to encounter type-I error would be 1 - 0.950625 = .049375 < 0.05

Bonferroni adjustment for multiple comparisons:

if we are testing k hypotheses and want to control the overall α at the nominal 0.05 level, then we should test each at α = 0.05/k level

SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS

81

Bonferroni correction is conservative in terms of statistical power, requires greater sample size

Holm method:

• Compare the smaller p-value of the 2 tests against α = 0.05/2 = 0.025. If significant, then compare the larger p-value against α = 0.05. If the first test is not significant, then declare both tests as non-significant.

• Family-wise type I error rate < 0.05

• Provide better statistical power then the Bonferronicorrection

Page 28: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

28

SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS

82

Then, budget planning indicated that we could afford N = 390 (130/group), so the table evolved to:

Net tx effect

in Kg

α=0.025A (at least 1 of 2

significant)

α=0.025B (1 of 1

significant)

α=0.05C (1 of 1

significant) 2.00 0.52 0.31 0.41 2.50 0.72 0.47 0.582.75 0.80 0.56 0.66 3.00 0.87 0.64 0.74 3.25 0.92 0.72 0.803.50 0.95 0.78 0.86 4.00 0.99 0.89 0.934.50 1.00 0.95 0.97

HYPOTHESIS TESTINGEXAMPLE

83

HYPOTHESIS TESTINGEXAMPLE

84

Page 29: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

29

Hypotheses: – Ca + Vit D arm will have 22% reduction of colon CA

incidence over average 8 years of trial period than the Placebo arm

Log-rank test at α = 0.05, with incidence rate of 0.2% per year in the placebo arm, and pre-fixed N = 35,000

Plug these assumed parameters into the calculator=> power = 81% (83% stated in the paper)

85

HYPOTHESIS TESTINGEXAMPLE

86

Sample size / power evaluation is a critical part of study design for hypothesis testing• Smaller α error larger sample size• Smaller β error (higher power) larger sample

size• Higher variability in outcome larger sample

size• Smaller difference to detect larger sample

size

Need to account for type I error when testing multiple hypotheses

SUMMARY

Section F

Sample Size and Study Design Principles: A Brief Summary

Institute for Clinical and Translational ResearchIntroduction to Clinical Research

Page 30: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

30

88

DESIGNING YOUR OWN STUDY

• When designing a study, there is a tradeoff between :– Power– -level– Sample size – Minimum detectable difference (specific HA)

• Industry standard—80% power, = 0.05

89

• What if sample size calculation yields group sizes that are too big (i.e., cannot afford or will take too long to do the study) or are very difficult to recruit enough participants for the study?– Increase minimum difference of interest– Increase -level– Decrease desired power– Select a more precise primary outcome

DESIGNING YOUR OWN STUDY

90

• Sample size calculations are an important part of study proposal– Study funders want to know that the researcher

can detect a relationship with a high degree of certainty (should it really exist)

• Even if you anticipate confounding factors, these approaches are the best you can do and are relatively easy

• Accounting for confounders requires more information and sample size has to be done via computer simulation—consult a statistician!

DESIGNING YOUR OWN STUDY

Page 31: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

31

91

• What is this specific alternative hypothesis? – Power can only be calculated for a specific

alternative hypothesis– When comparing two groups this means providing

value of the true population mean (proportion) for each group

DESIGNING YOUR OWN STUDY

92

• What is this specific alternative hypothesis?– Therefore specifying a difference between the two

groups, typically based on the smallest difference of scientific / clinical interest

– This difference is frequently called effect size– When sample size, power and other parameters are

fixed for a study, this difference derived from the calculation is frequently called minimum detectable difference

– Typically aiming for a sample size that would afford sufficient power to detect the minimum detectable difference that covers the smallest difference of scientific / clinical interest

DESIGNING YOUR OWN STUDY

93

• Where does this specific alternative hypothesis come from?– Hopefully, not the statistician!– As this is generally a quantity of scientific

interest, it is best estimated by a knowledgeable researcher or pilot study data

– This is perhaps the most difficult component of sample size calculations, as there is no magic rule or “industry standard”

DESIGNING YOUR OWN STUDY

Page 32: Wang Sample Size 16July2014 - ICTR · 2. Increase the sample size in each group HA: µ1-µ2 = d H0: µ1-µ2 = 0 WHAT INFLUENCES POWER? Ho: 1- 2= 0 32 HA: 1- 2 = d • In order to

7/9/2014

32

94

CALCULATING POWER

• In order to calculate power for a study comparing two population means, we need the following:– Sample size for each group– Estimated (population) means, μ1 and μ2 for each

group—these values frame a specific alternative hypothesis (usually minimum difference of scientific interest)

– Estimated (population) SD’s, σ1 and σ2

– -level of the hypothesis test

OTHER CONSIDERATIONS

Obtaining high quality data through best possible study design and rigorous research conduct are critical to the success of clinical investigations

Work with a biostatistician starting at the study design stage

95