Power and Sample Size: Considerations in Study Design
Nae-Yuh Wang, PhD
Associate Professor of Medicine, Biostatistics & Epidemiology
Institute for Clinical and Translational Research
Introduction to Clinical Research
Lecture Topics
July 16, 2014
ACKNOWLEDGEMENT
• John McGready, Department of Biostatistics, Bloomberg School of Public Health
LECTURE TOPICS
• Sample size and statistical inferences
• Factors influencing sample size and power
• Sample size determination when comparing two means
• Sample size determination when comparing two proportions
Section A
Sample Size and Statistical Inferences
WHY IS SAMPLE SIZE RELEVANT
• A small sample will not allow precise estimates, so we cannot tell whether an observed difference is real or due to variability. We may call things equal when we shouldn’t
• A very large sample size will allow us to detect tiny differences that have no clinical significance, which wastes research resources
• A very large sample size may still not be large enough if the event rate is very small
• Sometimes the available sample size is fixed and we have to decide what we will be able to do with such a sample
INTERVAL ESTIMATION
Interval estimation: using sample mean and SE (SD of the sampling distribution) to construct 95% confidence interval to infer the population mean
• Sampling distribution becomes more and more like a normal (Gaussian) distribution with increasingly larger N
• 95% CI = sample mean +/- 1.96*SE = sample mean +/- 1.96*SD/√N
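The interval formula above can be tried out with a short sketch. The function name `ci95_mean` and the numbers (mean 120, SD 15) are hypothetical, chosen only to show how the interval narrows as N grows; a z-based interval is assumed.

```python
import math

def ci95_mean(sample_mean, sd, n):
    """Return (lower, upper) limits of the z-based 95% CI for a mean."""
    se = sd / math.sqrt(n)       # standard error of the sample mean
    half_width = 1.96 * se       # 95% confidence limit
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical data: same mean and SD, increasing sample size
for n in (25, 100, 400):
    low, high = ci95_mean(120.0, 15.0, n)
    print(f"N={n}: 95% CI = ({low:.2f}, {high:.2f})")
```

Quadrupling N halves the width of the interval, because SE shrinks with √N.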
WHY IS SAMPLE SIZE RELEVANT: ESTIMATION
Estimation: What is the prevalence of obesity among adults in Baltimore City?
• How precise do we want our estimate to be?
• 95% CI = p +/- 1.96*SE(p) = p +/- 1.96*SD/√n = p +/- 1.96*√(p(1-p)/n)
• A larger sample leads to a more precise estimate, but it requires more resources.
ESTIMATION PRECISION
• Estimate the percentage of adults who are obese in such a way that I have 95% confidence that the true percentage is within +/- 10% of my estimate.
• Define parameters of variables such as adults (e.g. age range), obese (e.g. BMI > 30 kg/m²), etc.
• Use the 95% confidence interval as the measure of estimation precision.
• Set the acceptable precision level e, to be equated to the 95% confidence limit, 1.96 x SE
Commonly seen formula: n = 4pq / e²
n: size of sample
e: level of acceptable precision
p: expected prevalence value
q = 1 - p
SD² = variance = pq
4: constant (1.96² ≈ 4) equivalent to the 95% level of confidence
\[ e = 1.96 \cdot \frac{SD}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{(1.96)^2 \, SD^2}{e^2} \]
ESTIMATION PRECISION
• Say p = .30 (from literature or pilot data) and e = .1 (determined subjectively, level of minimal clinical significance):
n = 4(.3)(.7)/(.1)² = .84/.01 = 84
• If we wanted precision of +/- 5%, e = .05:
n = 4(.3)(.7)/(.05)² = .84/.0025 = 336
• Greater estimation precision → larger sample size
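The arithmetic for n = 4pq / e² can be checked with a few lines of Python. This is a minimal sketch: the function name `n_for_prevalence` is made up here, and the default z = 2.0 is an assumption that reproduces the slide's rounded constant 4; passing z = 1.96 gives the slightly smaller unrounded value.

```python
import math

def n_for_prevalence(p, e, z=2.0):
    """Sample size so that the 95% CI half-width for a proportion is about e.

    z=2.0 reproduces the constant 4 (= 2^2) used in the formula above;
    z=1.96 is the exact normal deviate for 95% confidence.
    """
    return z ** 2 * p * (1 - p) / e ** 2

print(round(n_for_prevalence(0.30, 0.10)))              # precision +/- 10%
print(round(n_for_prevalence(0.30, 0.05)))              # precision +/- 5%
print(math.ceil(n_for_prevalence(0.30, 0.10, z=1.96)))  # unrounded constant
```

Halving e quadruples the required sample size, since e enters the formula squared.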
ESTIMATION PRECISION
• Estimating the difference in a single parameter (e.g. mean) between 2 populations
• Example: What is the difference in bone mineral density (lumbar spine) between European American and African American women aged 25-34 years?
ESTIMATION PRECISION
• Commonly used formula: n = 8 SD² / e² = 4 (2 SD²) / e²
n = size of sample for each population
e = level of acceptable precision
SD = population standard deviation; SD² is the variance
2 SD² reflects the fact that statistical inference on the difference between two independent populations is subject to two sources of variability
8 = constant equivalent to the 95% confidence level and reflecting 2 sources of error
** Assumption: the populations we are comparing have the same variance, i.e. SD_B² = SD_W² = SD²
ESTIMATION PRECISION
• Use formula: n = 8 SD² / e²
Say SD = 1 (based on literature or pilot data), e = .5 (i.e. tolerable error = +/- 0.5 SD):
n = 8 (1/.5)² = 8(4) = 32 (per group)
Say SD = 2, e = .25 (i.e. tolerable error = +/- 0.125 SD):
n = 8 (2/.25)² = 8(8)² = 512 (per group)
• Greater population variability → larger sample size
• Greater estimation precision → larger sample size
• Higher confidence level (smaller α error) → larger sample size
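The two worked cases for n = 8 SD² / e² can be reproduced as follows. This is a sketch under the same assumptions as the formula (equal variances, z rounded to 2); the function name `n_per_group_for_difference` is hypothetical.

```python
def n_per_group_for_difference(sd, e, z=2.0):
    """Per-group sample size to estimate a difference in means within +/- e.

    The factor 2 * z**2 equals 8 when z is rounded to 2, matching the
    formula above; the two groups are assumed to share the same SD.
    """
    return 2 * z ** 2 * sd ** 2 / e ** 2

print(n_per_group_for_difference(sd=1, e=0.5))    # tolerable error 0.5 SD
print(n_per_group_for_difference(sd=2, e=0.25))   # tolerable error 0.125 SD
```

Only the ratio SD/e matters: expressing the tolerable error in SD units is enough to get the sample size.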
ESTIMATION PRECISION
EXAMPLE
• Consider the following results from a study done on 29 women, all 35–39 years old
Sample Data
               n     Mean SBP   SD of SBP
OC users       8     132.8      15.3
Non-OC users   21    127.4      18.2
• Hypothesis: OC use is associated with higher blood pressure
• Statistically speaking, we are interested in testing:
H0: µOC = µNO OC, i.e. H0: µOC - µNO OC = 0
HA: µOC ≠ µNO OC, i.e. HA: µOC - µNO OC ≠ 0
µOC represents the population mean SBP for OC users
µNO OC represents the population mean SBP for women not using OC
EXAMPLE
STATISTICAL POWER
                  TRUTH
DECISION          H0                           HA
Reject H0         Type I Error (alpha-level)   Power (1-beta)
Not Reject H0                                  Type II Error (beta)
• The sample mean difference in blood pressures is 132.8 – 127.4 = 5.4
• This could be considered scientifically (clinically) significant; however, the result is not statistically significant (or even close to it!) at the α = 0.05 level
• Suppose, as a researcher, you were concerned about detecting a population difference of this magnitude if it truly existed
• A study based on 29 women has low power to detect a difference of such magnitude
EXAMPLE
• Power is a measure of “doing the right thing” when HA is true!
• Higher power is better (smaller type II error)
• We can calculate power for a given study if we specify a specific HA
• An OC/blood pressure study based on 29 women has power of .13 (13%) to detect a difference in BP of 5.4 mmHg or more, if this difference truly exists between the populations of women 35-39 years old who are using and not using OC
• When power is this low, a non-significant result cannot tell us whether there is really no difference in population means or whether the study simply missed a real difference
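The 13% figure can be approximated with a normal (large-sample) calculation. This is only a sketch: the function name `power_two_means` is made up, the pilot SDs are treated as if they were the population SDs, and the lecture's own number may have been produced by different software settings.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(d, sd1, n1, sd2, n2, alpha=0.05):
    """Approximate two-sided power to detect a true mean difference d."""
    z = NormalDist()
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    shift = abs(d) / se                        # true difference in SE units
    # Probability that the sample difference lands in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# OC users: n = 8, SD = 15.3; non-users: n = 21, SD = 18.2; d = 5.4 mmHg
print(round(power_two_means(5.4, 15.3, 8, 18.2, 21), 2))
```

With these inputs the true difference is less than one standard error away from 0, which is why the power is so low.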
STATISTICAL POWER
• Recall that the sampling distribution of the difference in sample means, x̄1 - x̄2, is approximately normal (large samples) and centered at the true mean difference. If H0 is true, then the curve is centered at 0
H0: µ1 - µ2 = 0
STATISTICAL POWER
• Recall that the sampling distribution of the difference in sample means, x̄1 - x̄2, is approximately normal (large samples) and centered at the true mean difference. If HA is true, then the curve is centered at some value d, d ≠ 0
HA: µ1 - µ2 = d
STATISTICAL POWER
• H0 will be rejected (for α = 0.05) if the sample result, x̄1 - x̄2, is more than 2 standard errors away from 0, either above or below
H0: µ1 - µ2 = 0
STATISTICAL POWER
• H0 will be rejected (for α = 0.05) if the sample result, x̄1 - x̄2, is more than 2 standard errors away from 0, either above or below
H0: µ1 - µ2 = 0
HA: µ1 - µ2 = d
STATISTICAL POWER
WHY IS SAMPLE SIZE RELEVANT: HYPOTHESIS TESTING
Does the sample better support a distribution under H0 or H1?
[Figure: sampling distributions under H0 (no difference) and H1 (new TX better), shown before and after increasing n]
Section B
Factors Influencing Sample Size and Power
Does a clinically significant effect exist between a treated group and a non treated or differently treated group?
1-sided test: n / group = (Z_{1-α} + Z_{1-β})² × 2 (SD/Δ)²
2-sided test: n / group = (Z_{1-α/2} + Z_{1-β})² × 2 (SD/Δ)²
HYPOTHESIS TESTING
SAMPLE SIZE / POWER CALCULATION: BASIC ELEMENTS
n, Zα, Zβ, SD², Δ²
• Smaller α error → larger sample size
• Smaller β error (higher power) → larger sample size
• Higher variability in outcome → larger sample size
• Smaller Δ difference to detect → larger sample size
Use formula: n / group = (Z_{1-α} + Z_{1-β})² (2 SD²) / Δ²
where
Z_{1-α} = normal deviate for α error (a two-sided test uses Z_{1-α/2})
Z_{1-β} = normal deviate for β error
2 indicates two groups (sources of variability), i.e. treatment group and placebo group
Δ = least difference between the treated (estrogen) group and the non-treated (placebo) group that is considered clinically significant
n = sample size of each group
HYPOTHESIS TESTING
Given formula: n = (Z_{1-α/2} + Z_{1-β})² (2 SD²) / Δ²
With SD = .65, Δ = .30, two-sided α = .05 and 80% power:
n = (1.96 + 0.84)² × 2 (.65/.30)²
= (15.68) (4.69) = 73.6 => 74
Allow for 20% random dropout: .8 n_c = 74, so n_c = 74 / .8 = 92.5
Need 93 per arm for 80% power to detect a .3 difference between groups with 20% random dropout
For 90% power use n = (1.96 + 1.28)² × 2 (.65/.30)² = 98.6
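The calculation above can be scripted so that α, power, and dropout are easy to vary. A minimal sketch using only the Python standard library; the function name `n_per_group` is hypothetical, and the normal deviates are computed exactly rather than rounded to 1.96, 0.84, and 1.28, so results differ from the hand calculation in the first decimal.

```python
import math
from statistics import NormalDist

def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two means (equal SDs)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # Z_{1-alpha/2}
    z_beta = z.inv_cdf(power)            # Z_{1-beta}
    return (z_alpha + z_beta) ** 2 * 2 * (sd / delta) ** 2

n80 = n_per_group(0.65, 0.30)                   # 80% power
n_enroll = math.ceil(math.ceil(n80) / 0.8)      # inflate for 20% dropout
n90 = n_per_group(0.65, 0.30, power=0.90)       # 90% power

print(f"80% power: n = {n80:.1f} per group; enroll {n_enroll} per arm")
print(f"90% power: n = {n90:.1f} per group")
```

Dividing by 0.8 (rather than multiplying by 1.2) is what guarantees the target n remains after 20% of enrolled participants drop out.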
HYPOTHESIS TESTING
WHAT INFLUENCES POWER?
• To INCREASE power for a study comparing two group means, we can do any of the following:
1. Change the hypothesized values of µ1 and µ2 so that the difference (µ1 - µ2) is bigger
H0: µ1 - µ2 = 0
HA: µ1 - µ2 = d
• With a bigger hypothesized difference d’ (d’ > d), the curve under HA sits farther from the curve under H0:
H0: µ1 - µ2 = 0
HA: µ1 - µ2 = d’
WHAT INFLUENCES POWER?
• To INCREASE power for a study comparing two group means (continued):
2. Increase the sample size in each group
H0: µ1 - µ2 = 0
HA: µ1 - µ2 = d
WHAT INFLUENCES POWER?
• To INCREASE power for a study comparing two group means (continued):
3. Increase the α-level (functionally speaking, make H0 “easier to reject”)
with α = 0.05
H0: µ1 - µ2 = 0
HA: µ1 - µ2 = d
WHAT INFLUENCES POWER?
• The same comparison with α = 0.10 (a larger α-level makes H0 “easier to reject”)
H0: µ1 - µ2 = 0
HA: µ1 - µ2 = d
WHAT INFLUENCES POWER?
SAMPLE SIZE / POWER CALCULATION: BASIC ELEMENTS
n, Zα, Zβ, SD², Δ²
• Increased α error → higher power
• Larger sample size → higher power
• Less variability (SD) in outcome → higher power
• Larger Δ difference to detect → higher power
SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS
When the outcome of interest is the before-after change of a certain outcome measure, and Δ denotes the between-group difference of these changes:
• The SD used in the calculation should be the SD for the before-after change, not the SD for the cross-sectional outcome measure
• Assuming SDbefore = SDafter = σ, then
SDchange = σ x √(2(1-ρ))
where ρ is the correlation between the outcome measures before and after
• The literature usually reports SDbefore but not ρ
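The effect of ρ on the SD of the change score can be seen in a short sketch. The function name `sd_change` is hypothetical; the inputs σ = 16.53 and ρ = 0.85 are the planning values listed in the POWER trial table in Section E, used here purely as an illustration.

```python
import math

def sd_change(sigma, rho):
    """SD of a before-after change when both time points share SD sigma."""
    return sigma * math.sqrt(2 * (1 - rho))

for rho in (0.0, 0.5, 0.85):
    print(f"rho = {rho}: SD of change = {sd_change(16.53, rho):.2f}")
```

When ρ is above 0.5 the change score is less variable than a single measurement, so analyzing change reduces the required sample size; when ρ is below 0.5 it is more variable.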
• Power is computed to inform study design; calculating post-hoc power after a study is completed does not add information to the study results
• Can only be computed for specific HA’s: i.e. this study had XX% power to detect a difference in population means of YY or greater.
• Frequently, in study design, a required sample size is computed to actually achieve a certain preset power level to find a “Clinically/scientifically” minimal important difference in means
• “Industry standard” for power: 80% (or greater)
POWER & STUDIES
Section C
Sample Size Calculations for Comparing Two Means
• Blood pressure and oral contraceptives
– Suppose we used data from the example in Section A to motivate the following question:
– Is oral contraceptive use associated with higher blood pressure among individuals between the ages of 35–39?
EXAMPLE
• Recall, the data:
PILOT STUDY
Sample Data
               n     Mean SBP   SD of SBP
OC users       8     132.8      15.3
Non-OC users   21    127.4      18.2
• A 5 mmHg difference in BP is too large to ignore
• Need a bigger study that has ample power to detect such a difference, should it really exist in the population
• Task: determine the sample sizes needed to detect a 5 mmHg increase in blood pressure in O.C. users with 80% power at significance level α = 0.05
– Using pilot data, we estimate that the standard deviations are 15.3 and 18.2 in O.C. and non-O.C. users, respectively
PILOT STUDY
• Here we have a desired power in mind and want to find the sample sizes necessary to achieve a power of 80% to detect a population difference in blood pressure of five or more mmHg between the two groups
PILOT STUDY
• We can find the necessary sample size(s) of this study if we specify:
– α-level of test (.05)
– Specific values for μ1 and μ2 (specific HA) and hence d = μ1 - μ2 (usually represents the minimum scientific difference of interest)
– Estimates of σ1 and σ2
– The desired power (.80)
PILOT STUDY
• How can we specify d = μ1 - μ2 and estimate population SDs?
– Researcher knowledge: experience makes for good educated guesses
– Make use of pilot study data!
PILOT STUDY
• Fill in blanks from pilot study
– α-level of test (.05)
– Specific HA (μOC = 132.8, μNO OC = 127.4), and hence d = μ1 - μ2 = 5.4 mmHg
– Estimates of σOC (= 15.3) and σNO OC (= 18.2)
– The power we desire (.80)
PILOT STUDY
• Given this information, we can use Stata to do the sample size calculation
• “sampsi” command
– Command syntax (the means, α, power, and SDs are numbers to be supplied by the researcher)
sampsi μ1 μ2, alpha(α) power(power) sd1(σ1) sd2(σ2)
PILOT STUDY
STATA RESULTS
• Blood Pressure/OC example
Total N = 306
PILOT STUDY/STATA RESULTS
• Our results from Stata suggest that in order to detect a difference in BP of 5.4 mmHg (if it really exists in the population) with high (80%) certainty, we would need to enroll 153 O.C. users and 153 non-users
• This assumed that we wanted equal numbers of women in each group!
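The per-group figure can be cross-checked outside Stata with the normal-approximation formula when the two SDs differ. This is a sketch, not a description of Stata's internals; the function name `n_per_group_two_sds` is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_sds(d, sd1, sd2, alpha=0.05, power=0.80):
    """Per-group n for equal group sizes when the two groups have different SDs."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of the difference in means is (sd1^2 + sd2^2) / n
    return z_sum ** 2 * (sd1 ** 2 + sd2 ** 2) / d ** 2

n = math.ceil(n_per_group_two_sds(5.4, 15.3, 18.2))
print(n, "per group;", 2 * n, "in total")
```

Rounding up to whole participants gives the same 153 per group (306 in total) quoted above.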
• Suppose we estimate that the prevalence of O.C. use amongst women 35–39 years of age is 20%
– We want this reflected in our group sizes
• If O.C. users are 20% of the population, non-OC users are 80%
– There are four times as many non-users as there are users (4:1 ratio)
PILOT STUDY/STATA RESULTS
• We can specify a ratio of group sizes in Stata
– Again, using the “sampsi” command with the “ratio” option
sampsi μ1 μ2, alpha(α) power(power) sd1(σ1) sd2(σ2) ratio(n2/n1)
PILOT STUDY/STATA RESULTS
• Blood Pressure/OC example
STATA RESULTS
Total N = 430
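The total of 430 can be reproduced with the normal-approximation formula for unequal allocation. A sketch under stated assumptions: group 1 is the O.C. users, `ratio` means n2/n1 as in the command syntax above, and the function name `group_sizes` is made up for illustration.

```python
import math
from statistics import NormalDist

def group_sizes(d, sd1, sd2, ratio, alpha=0.05, power=0.80):
    """Return (n1, n2) with n2 = ratio * n1 for a two-sided test of two means."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of the difference in means is sd1^2/n1 + sd2^2/(ratio * n1)
    n1 = math.ceil(z_sum ** 2 * (sd1 ** 2 + sd2 ** 2 / ratio) / d ** 2)
    n2 = math.ceil(ratio * n1)
    return n1, n2

n1, n2 = group_sizes(5.4, 15.3, 18.2, ratio=4)
print(n1, "O.C. users,", n2, "non-users,", n1 + n2, "in total")
```

Unequal allocation costs participants overall: the 4:1 design needs 430 women where the balanced design needed 306.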
Section D
Sample Size Determination for Comparing Two Proportions
POWER FOR COMPARING TWO PROPORTIONS
• Same ideas as with comparing means
• We can find the necessary sample size(s) of this study if we specify:
– α-level of test
– Specific values for p1 and p2 (specific HA) and hence d = p1 - p2 (usually represents the minimum scientific difference of interest)
– The desired power
PILOT STUDY
• Given this information, we can use Stata to do the sample size calculation
• “sampsi” command
– Command syntax (the proportions, α, and power are numbers to be supplied by the researcher)
sampsi p1 p2, alpha(α) power(power)
PILOT STUDY
ANOTHER EXAMPLE
• Two drugs for treatment of peptic ulcer compared (Familiari, et al., 1981)
– The percentage of ulcers healed by pirenzepine (drug A) and trithiozine (drug B) was 77% and 58% based on 30 and 31 patients, respectively (p-value = .17); the 95% CI for the difference in proportions healed (p_DRUG A - p_DRUG B) was (-.04, .42)
– The power to detect a difference as large as the sample results with samples of size 30 and 31, respectively, is only 25%

          Healed   Not Healed   Total
Drug A    23       7            30
Drug B    18       13           31
• As a clinician, you find the sample results intriguing and want to do a larger study to better quantify the difference in proportions healed
• Redesign a new trial, using the aforementioned study results to “guesstimate” population characteristics
– Use pDRUG A = .77 and pDRUG B = .58
– 80% power
– α = .05
• Command in Stata
sampsi .77 .58, alpha(.05) power(.8)
EXAMPLE
EXAMPLE
• What would happen if you change power to 90%?
sampsi .77 .58, alpha(.05) power(.9)
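Since the Stata output is not reproduced here, the effect of raising power can be explored with the standard normal-approximation formula for two proportions. This is a sketch, not a re-implementation of sampsi: the function name `n_two_proportions` is hypothetical, and the optional continuity correction is included on the assumption that sampsi applies one by default, so treat the corrected values as approximations of what Stata reports.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80, continuity=False):
    """Per-group n for a two-sided comparison of two proportions (equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    d = abs(p1 - p2)
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    if continuity:
        # Continuity-corrected sample size
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * d))) ** 2
    return n

for power in (0.80, 0.90):
    plain = math.ceil(n_two_proportions(0.77, 0.58, power=power))
    corrected = math.ceil(n_two_proportions(0.77, 0.58, power=power, continuity=True))
    print(f"power {power}: {plain} per group (no correction), {corrected} (with correction)")
```

Moving from 80% to 90% power raises the per-group requirement by roughly a third in this example.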
• Suppose you wanted two times as many people on trithiozine (“Drug B”) as compared to pirenzepine (“Drug A”)
– Here, the ratio of sample size for Group 2 to Group 1 is 2.0
• Can use the “ratio” option in the “sampsi” command
EXAMPLE
• Changing the ratio
sampsi .77 .58, alpha(.05) power(.9) ratio(2)
SAMPLE SIZE FOR COMPARING TWO PROPORTIONS
• A randomized trial is being designed to determine if vitamin A supplementation can reduce the risk of breast cancer
– The study will follow women between the ages of 45–65 for one year
– Women will be randomized between vitamin A and placebo
• What sample sizes are recommended?
BREAST CANCER/VITAMIN A EXAMPLE
• Design a study to have 80% power to detect a 50% relative reduction in risk of breast cancer w/vitamin A (i.e. \( RR = p_{VIT\,A} / p_{PLACEBO} = .50 \)) using a (two-sided) test with α-level = .05
• To get estimates of proportions of interest:
- using other studies, the breast cancer rate in the controls can be assumed to be 150/100,000 per year
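• A 50% relative reduction: if \( RR = p_{VIT\,A} / p_{PLACEBO} = .50 \), then \( p_{VIT\,A} = .50 \times p_{PLACEBO} \)
So, for this desired difference in the relative scale:
\[ p_{VIT\,A} = 0.5 \times \frac{150}{100{,}000} = \frac{75}{100{,}000} \]
Notice that this difference on the absolute scale, \( p_{VIT\,A} - p_{PLACEBO} \), is much smaller in magnitude:
\[ p_{VIT\,A} - p_{PLACEBO} = -\frac{75}{100{,}000} = -0.00075 = -0.075\% \]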
BREAST CANCER/VITAMIN A EXAMPLE
BREAST CANCER SAMPLE SIZE CALCULATION IN STATA
• “sampsi” command
sampsi .00075 .0015, alpha(.05) power(.8)
• You would need about 34,000 individuals per group
• Why so many?
– The difference between the two hypothesized proportions is very small in magnitude: \( |p_{VIT\,A} - p_{PLACEBO}| = .00075 \)
We would expect about 50 cancer cases among the controls and 25 cancer cases among the vitamin A group
\[ \text{placebo: } \frac{150}{100{,}000} \times 34{,}000 \approx 51 \]
\[ \text{vitamin A: } \frac{75}{100{,}000} \times 34{,}000 \approx 25 \]
BREAST CANCER/VITAMIN A EXAMPLE
• Suppose you want 80% power to detect only a 20% (relative) reduction in risk associated with vitamin A
A 20% relative reduction: if \( RR = p_{VIT\,A} / p_{PLACEBO} = .80 \), then \( p_{VIT\,A} = .80 \times p_{PLACEBO} \)
So, for this desired difference in the relative scale:
\[ p_{VIT\,A} = 0.8 \times \frac{150}{100{,}000} = \frac{120}{100{,}000} \]
Notice that this difference on the absolute scale, \( p_{VIT\,A} - p_{PLACEBO} \), is much smaller in magnitude:
\[ p_{VIT\,A} - p_{PLACEBO} = -\frac{30}{100{,}000} = -0.0003 = -0.03\% \]
• “sampsi” command
sampsi .0012 .0015, alpha(.05) power(.8)
BREAST CANCER SAMPLE SIZE CALCULATION IN STATA
BREAST CANCER—VITAMIN A EXAMPLE REVISITED
• You would need about 242,000 per group!
• We would expect 360 cancer cases among the placebo group and 290 among the vitamin A group
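The link between rare events, required sample size, and expected case counts can be sketched as follows. The helper name `n_rare_events` is hypothetical; it uses the normal-approximation formula for two proportions with a continuity correction, which is assumed (not confirmed by the lecture) to be what produced the quoted figures.

```python
import math
from statistics import NormalDist

def n_rare_events(p_treated, p_control, alpha=0.05, power=0.80):
    """Continuity-corrected per-group n for comparing two proportions."""
    z = NormalDist()
    z_alpha, z_beta = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    p_bar = (p_treated + p_control) / 2
    d = abs(p_treated - p_control)
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p_treated * (1 - p_treated)
                              + p_control * (1 - p_control))) ** 2 / d ** 2
    return n / 4 * (1 + math.sqrt(1 + 4 / (n * d))) ** 2

p_placebo = 150 / 100_000
for relative_risk in (0.5, 0.8):
    p_vit_a = relative_risk * p_placebo
    n = n_rare_events(p_vit_a, p_placebo)
    print(f"RR = {relative_risk}: about {n:,.0f} per group; expected cases: "
          f"{n * p_placebo:.0f} placebo, {n * p_vit_a:.0f} vitamin A")
```

In both scenarios the trial needs a few hundred events or fewer, but tens or hundreds of thousands of women to observe them: with rare outcomes, the number of events is what drives power.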
AN ALTERNATIVE APPROACH—DESIGN A LONGER STUDY
• Proposal
– Five-year follow-up instead of one year
Here:
pVITA ≈ 5 × .0012 = .006
pPLACEBO ≈ 5 × .0015 = .0075
• Need about 48,400 per group
– Yields about 290 cases among vitamin A and 360 cases among placebo
• Issue
– Loss to follow-up
Section E
More Examples
Johns Hopkins Practice-based Opportunity for Promotion of Weight Reduction (POWER) Trial:
• a comparative effectiveness trial that tested two practical behavioral weight loss interventions in obese medical outpatients with cardiovascular risk factors
HYPOTHESIS TESTING EXAMPLE
Study Design
[Figure: randomization to Control, Remote, and In-Person arms; weights and other outcomes measured at Baseline, 6 Mo, 12 Mo, and 24 Mo]
Hypotheses:
• Primary:
– The Remote arm will produce 2.75 kg greater mean weight loss over the 24-month trial period than the Control arm
– The In-Person arm will produce 2.75 kg greater mean weight loss over the 24-month trial period than the Control arm
• Secondary:
– In-Person vs. Remote, other weight outcomes (% weight loss, % reached goal), other health outcomes…
Considerations for selecting the primary hypotheses in the POWER trial
• Weight loss can be quantitatively measured with high precision and is easy to interpret
• Continuous outcomes typically allow greater statistical power
• Weight loss as small as 2.75 kg has been shown to produce health benefits, so it is clinically relevant
• The difference in intervention effects between Remote and In-Person is not clear but is expected to be smaller
• Type I error adjustment for multiple comparisons
SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS
Back to the POWER trial example:

Minimum Detectable Difference (MDD) in Kg

                                                          α=0.025 (A)            α=0.05 (B)
Sample size       Baseline (C)  18m ∆   σ      VIF  ρ     1-β=0.80  1-β=0.90   1-β=0.80  1-β=0.90
300 (100/group)   103.83        -3.41   16.53  1.0  0.85  2.68      3.10       3.60      4.15
                                        16.53  1.5  0.85  2.75      3.27       4.40      5.05
350 (117/group)                         16.53  1.0  0.85  2.48      2.90       3.30      3.85
                                        16.53  1.5  0.85  3.05      3.55       4.05      4.70
400 (133/group)                         16.53  1.0  0.85  2.33      2.70       3.10      3.60
                                        16.53  1.5  0.85  2.85      3.30       3.80      4.40

(A) Power if either of 2 tests is positive at α = 0.05/2. (B) Power for a single test at α = 0.05. (C) Baseline, 18m ∆ (= change from EST+DASH intervention in PREMIER, net of control), and ρ (correlation between baseline and 18m), all from the PREMIER trial unless otherwise indicated.
Then, budget planning indicated that we could afford N = 390 (130/group), so the table evolved to a table of power by net treatment effect:

Net tx effect   α=0.025 (A)                      α=0.025 (B)            α=0.05 (C)
in Kg           (at least 1 of 2 significant)    (1 of 1 significant)   (1 of 1 significant)
2.00            0.52                             0.31                   0.41
2.50            0.72                             0.47                   0.58
2.75            0.80                             0.56                   0.66
3.00            0.87                             0.64                   0.74
3.25            0.92                             0.72                   0.80
3.50            0.95                             0.78                   0.86
4.00            0.99                             0.89                   0.93
4.50            1.00                             0.95                   0.97
SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS
Recall POWER has 2 primary hypotheses to test:
H0: Remote - Control = 0; H1: Remote - Control > 2.75kg
H0’: In-Person - Control = 0; H1’: In-Person - Control > 2.75kg
If each test is controlled at α = 0.05 and the two tests are independent, then the prob. for both tests to be type-I error free would be
(0.95)(0.95) = 0.9025
So the prob. for at least 1 test to encounter type-I error would be 1 - 0.9025 = .0975 > 0.05
=> Multiple comparisons inflate overall α error
SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS
If each of the 2 tests is controlled at α = 0.05/2 = 0.025, then the prob. for both independent tests to be type-I error free would be
(0.975)(0.975) = 0.950625
So the prob. for at least 1 test to encounter type-I error would be 1 - 0.950625 = .049375 < 0.05
Bonferroni adjustment for multiple comparisons:
if we are testing k hypotheses and want to control the overall α at the nominal 0.05 level, then we should test each at α = 0.05/k level
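The two calculations above generalize to k independent tests. A minimal sketch; the function name `familywise_error` is made up here, and independence of the tests is assumed, as in the calculation above.

```python
def familywise_error(alpha_per_test, k):
    """Probability of at least one type I error among k independent tests."""
    return 1 - (1 - alpha_per_test) ** k

for k in (1, 2, 5, 10):
    unadjusted = familywise_error(0.05, k)
    bonferroni = familywise_error(0.05 / k, k)
    print(f"k={k}: unadjusted {unadjusted:.4f}, Bonferroni {bonferroni:.4f}")
```

Without adjustment the overall error rate grows quickly with k, while testing each hypothesis at 0.05/k keeps it just under 0.05.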
SAMPLE SIZE / POWER CALCULATION OTHER CONSIDERATIONS
The Bonferroni correction is conservative: it reduces statistical power, so a greater sample size is required
Holm method:
• Compare the smaller p-value of the 2 tests against α = 0.05/2 = 0.025. If significant, then compare the larger p-value against α = 0.05. If the first test is not significant, then declare both tests as non-significant.
• Family-wise type I error rate ≤ 0.05
• Provides better statistical power than the Bonferroni correction
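The Holm procedure described above can be written as a short step-down loop and compared with Bonferroni. This is an illustrative sketch: the function names and the p-values 0.03 and 0.02 are hypothetical, not results from the POWER trial.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure; returns one reject/not-reject flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Thresholds are alpha/m, alpha/(m-1), ..., alpha
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # once one test fails, all larger p-values fail too
    return reject

def bonferroni(p_values, alpha=0.05):
    """Bonferroni: every test is compared against alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

p_values = [0.03, 0.02]   # hypothetical p-values for the two primary tests
print("Holm:      ", holm(p_values))
print("Bonferroni:", bonferroni(p_values))
```

With these hypothetical p-values Holm rejects both null hypotheses, while Bonferroni rejects only the one with p = 0.02, which illustrates the gain in power.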
HYPOTHESIS TESTING EXAMPLE
Hypotheses:
– The Ca + Vit D arm will have a 22% lower incidence of colon cancer (CA) than the Placebo arm over an average trial period of 8 years
Log-rank test at α = 0.05, with an incidence rate of 0.2% per year in the placebo arm, and pre-fixed N = 35,000
Plug these assumed parameters into the calculator => power = 81% (83% stated in the paper)
SUMMARY
Sample size / power evaluation is a critical part of study design for hypothesis testing
• Smaller α error → larger sample size
• Smaller β error (higher power) → larger sample size
• Higher variability in outcome → larger sample size
• Smaller difference to detect → larger sample size
Need to account for type I error inflation when testing multiple hypotheses
Section F
Sample Size and Study Design Principles: A Brief Summary
DESIGNING YOUR OWN STUDY
• When designing a study, there is a tradeoff between:
– Power
– α-level
– Sample size
– Minimum detectable difference (specific HA)
• Industry standard: 80% power, α = 0.05
• What if the sample size calculation yields group sizes that are too big (i.e., you cannot afford the study, it will take too long, or it is very difficult to recruit enough participants)?
– Increase minimum difference of interest
– Increase α-level
– Decrease desired power
– Select a more precise primary outcome
DESIGNING YOUR OWN STUDY
• Sample size calculations are an important part of a study proposal
– Study funders want to know that the researcher can detect a relationship with a high degree of certainty (should it really exist)
• Even if you anticipate confounding factors, these approaches are the best you can do and are relatively easy
• Accounting for confounders requires more information, and the sample size calculation has to be done via computer simulation, so consult a statistician!
DESIGNING YOUR OWN STUDY
• What is this specific alternative hypothesis?
– Power can only be calculated for a specific alternative hypothesis
– When comparing two groups this means providing the value of the true population mean (proportion) for each group
DESIGNING YOUR OWN STUDY
• What is this specific alternative hypothesis?
– It specifies a difference between the two groups, typically based on the smallest difference of scientific / clinical interest
– This difference is frequently called the effect size
– When sample size, power and other parameters are fixed for a study, the difference derived from the calculation is frequently called the minimum detectable difference
– We typically aim for a sample size that affords sufficient power so that the minimum detectable difference covers the smallest difference of scientific / clinical interest
DESIGNING YOUR OWN STUDY
• Where does this specific alternative hypothesis come from?
– Hopefully, not the statistician!
– As this is generally a quantity of scientific interest, it is best estimated by a knowledgeable researcher or from pilot study data
– This is perhaps the most difficult component of sample size calculations, as there is no magic rule or “industry standard”
DESIGNING YOUR OWN STUDY
CALCULATING POWER
• In order to calculate power for a study comparing two population means, we need the following:
– Sample size for each group
– Estimated (population) means, μ1 and μ2, for each group; these values frame a specific alternative hypothesis (usually the minimum difference of scientific interest)
– Estimated (population) SD’s, σ1 and σ2
– α-level of the hypothesis test
OTHER CONSIDERATIONS
Obtaining high-quality data through the best possible study design and rigorous research conduct is critical to the success of clinical investigations
Work with a biostatistician starting at the study design stage