How big should my study be? The science and art of choosing your sample size Mark Pletcher Designing Clinical Research Summer 2013


Choosing sample size

• A fundamental decision
– A critical determinant of statistical power
– A critical determinant of feasibility


• “Nothing focuses the mind like a sample size calculation” – Mike Kohn

• Ingredients for a sample size calculation
– “Focusing the mind” on measurements, etc

• Tools for making the calculation
– Tables in the book, Stata, online calculators

• Examples
– What drives sample size?
– Modifying study design to reduce sample size

• Getting to a final answer for your study
– Round peg/square hole? MAKE IT FIT!
– Unknown assumptions? GUESS!
– Persuasive writing and justification

Example 1

• Alcohol and atrial fibrillation incidence

As an example, we might wish to assess alcohol as a predictor of incident atrial fibrillation. Assuming 20% of the cohort will drink 2 or more alcoholic beverages daily, we estimate that 2920 participants (584 drinking 2+/day) with full data and longitudinal follow-up over 5 years would provide 90% power to detect a 5% difference (15% vs. 10% in controls) in the incidence of AF using a two-tailed alpha of 0.05.


Example 1

(boiled down…)

– If………..[assumptions]
– Then……a sample size of 2920 will give us a 90% chance of ending up with a “statistically significant” result


What are the key assumptions?

Key assumptions

• Assumptions (aka “ingredients”)
– Testable hypothesis
  • Clear measurements
  • Usually phrased as a “null” hypothesis
– Planned statistical test
– Assumption about variability of measurements
– An effect size
– “Alpha” error (1-sided or 2-sided) threshold

Key assumptions

“Does alcohol cause atrial fibrillation?”

Too vague!

“Is drinking 2+ drinks/day (vs. drinking less) associated with incident atrial fibrillation at 5 years in adults over age 65?”

Better, but not phrased as a “null” hypothesis

“H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65”

The Null Hypothesis…

• Why do we need a NULL hypothesis?

– Theoretically speaking, we can only DISPROVE something (or say it’s unlikely), we can never PROVE something*
– So we state a NULL hypothesis, and then say that it is very unlikely to be true


*Karl Popper, The Logic of Scientific Discovery, 1934

Key assumptions

• Assumptions (aka “ingredients”)
– Planned statistical test

                     PREDICTOR
OUTCOME        Dichotomous    Continuous
Dichotomous    chi-squared    t-test
Continuous     t-test         correlation
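
The table can be read as a simple lookup. A minimal Python sketch, assuming only the two variable types and four cells shown above (the names `TESTS` and `planned_test` are illustrative, not from the lecture):

```python
# Planned statistical test by variable type, mirroring the 2x2 table above.
# Keys are (predictor_type, outcome_type); the table is symmetric.
TESTS = {
    ("dichotomous", "dichotomous"): "chi-squared",
    ("dichotomous", "continuous"): "t-test",
    ("continuous", "dichotomous"): "t-test",
    ("continuous", "continuous"): "correlation",
}


def planned_test(predictor: str, outcome: str) -> str:
    """Return the test the table suggests for this pair of variable types."""
    return TESTS[(predictor.lower(), outcome.lower())]


# Heavy drinking (yes/no) vs. incident afib at 5 years (yes/no)
print(planned_test("dichotomous", "dichotomous"))
```

Anything that is not one of these two types (for example a 4-level categorical variable) raises a `KeyError` here, which is the point: it has to be made to fit first.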

Need to know your variable types!

Key assumptions


Dichotomous variables have only 2 values.

Male vs. female
Dead vs. alive
Hypertension vs. no hypertension
Smoker or non-smoker

Key assumptions


Continuous variables have many values

Blood pressure
Age
Quality of life
Waist circumference

Key assumptions

What kind of variable is alcohol use?

Drinks/day (not normally distributed?)
Drinker vs. non-drinker
Heavy (2+) vs. light drinker (<2 drinks/day)
Non-drinker vs. occasional vs. regular vs. heavy (4-level categorical variable?)

For the purposes of sample size calculation, you may want to dichotomize… Easy!

Key assumptions

What kind of variable is atrial fibrillation?

Person with vs. without afib
Frequency of episodes
Beats/minute
Years to onset of afib (“time to event”)
Proportion onset of afib at 5 years

Normally distributed? (the continuous options)
“Survival analysis” (time to event)
Dichotomous (easy)

Key assumptions

• Assumptions (aka “ingredients”)
– Variability and effect size for chi-squared test

Probability of outcome in each predictor group:

P1 = 10% (prob afib at 5 years if <2 drinks)
P2 = 15% (prob afib at 5 years if 2+ drinks)

Effect size clearly delineated: risk difference = 5%; relative risk = 1.5

Variability is “embedded”: it varies with P1

Bottom line: Giving both probabilities is clear and unambiguous (…wait for counter-examples)
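
To see why both probabilities are needed, here is a small Python sketch (the function name `effect_summary` is illustrative). It uses the standard fact that a yes/no outcome with probability p has variance p(1 - p), which is what “embedded” variability means here:

```python
def effect_summary(p1: float, p2: float) -> dict:
    """Summarize effect size and per-group variability for two proportions."""
    return {
        "risk_difference": p2 - p1,
        "relative_risk": p2 / p1,
        # Variance of a dichotomous outcome is p * (1 - p)
        "variance_group1": p1 * (1 - p1),
        "variance_group2": p2 * (1 - p2),
    }


# Same 5-point risk difference, different baseline risk, different variability
for p1, p2 in [(0.10, 0.15), (0.20, 0.25)]:
    print(p1, p2, effect_summary(p1, p2))
```

Both pairs have a risk difference of 5%, but the second pair has larger variances, so the same “5%” would need a larger sample.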

Key assumptions

• Assumptions (aka “ingredients”)
– “Alpha” error (1-sided or 2-sided) threshold

Standard p-value threshold: 0.05 (“Type I error” rate = “alpha”)

Standard choice: 2-sided test

Unless you are uninterested in a large effect in the opposite direction from the one you expect, choose 2-sided: almost always the clear, safe choice

Power = 1 - “beta” error (so 90% power = 10% beta error)
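
Alpha and beta enter the formulas as standard normal quantiles. A minimal sketch using Python's standard library (the helper names `z_alpha` and `z_beta` are illustrative):

```python
from statistics import NormalDist


def z_alpha(alpha: float = 0.05, two_sided: bool = True) -> float:
    """Critical value for the Type I error threshold."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def z_beta(power: float = 0.90) -> float:
    """Quantile for the desired power (power = 1 - beta)."""
    return NormalDist().inv_cdf(power)


print(round(z_alpha(), 3))                 # 2-sided 0.05: about 1.96
print(round(z_alpha(two_sided=False), 3))  # 1-sided 0.05: about 1.645
print(round(z_beta(0.90), 3))              # 90% power: about 1.282
```

The 1-sided critical value is smaller, which is why a 1-sided test needs fewer participants and why choosing it needs justification.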

Example 1

• H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and incident atrial fibrillation at 5 years in adults over age 65

• 2 dichotomous variables → chi-squared test
• P1 = 10%
• P2 = 15%
• 2-sided alpha = 0.05, beta = .10

Example 1

Go to page 75 of DCR (4th edition)…



Example 1

Sample size = 958 PER GROUP = 1916 total
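
For readers who want to see what sits behind a table lookup like this, here is a hedged Python sketch. It assumes the usual normal-approximation formula for comparing two proportions with a Fleiss-type continuity correction; the slides do not say which formula the DCR tables use, so treat that as an assumption. The function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.90, continuity=True):
    """Approximate per-group sample size for two equal-sized groups."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided alpha
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    d = abs(p2 - p1)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    if continuity:
        # Continuity correction (assumed form), which inflates n slightly
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * d))) ** 2
    return math.ceil(n)


print(n_per_group(0.10, 0.15))                    # with correction
print(n_per_group(0.10, 0.15, continuity=False))  # without correction
```

With the correction, the result lands within a participant or two of the 958 per group read from the table; small differences come from rounding of the normal quantiles.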




Example 1

• 2 dichotomous variables → chi-squared test
• P1 = 15%
• P2 = 20% (risk diff = 5%)
• 2-sided alpha = 0.05, beta = .10

Go to page 86 of DCR (3rd edition)…

Sample size = 1252 x 2 = 2504 total

Example 1




• 2 dichotomous variables → chi-squared test
• P1 = 20%
• P2 = 25% (risk diff = 5%)
• 2-sided alpha = 0.05, beta = .10


Example 1




• 2 dichotomous variables → chi-squared test
• P1 = 20%
• P2 = 30% (RR = 1.5)
• 2-sided alpha = 0.05, beta = .10

Not enough to specify an effect size of “5%” or “RR = 1.5”: you need to give both probabilities
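
The comparison across these slides can be sketched in a few lines of Python. This assumes the same normal-approximation formula with a continuity correction as a stand-in for the book's tables (an assumption; the function name `n_per_group` is illustrative):

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Approximate per-group n, equal groups, continuity-corrected."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    d = abs(p2 - p1)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    return math.ceil(n / 4 * (1 + math.sqrt(1 + 4 / (n * d))) ** 2)


scenarios = [
    ("risk diff 5%, RR 1.50", 0.10, 0.15),
    ("risk diff 5%, RR 1.33", 0.15, 0.20),
    ("risk diff 5%, RR 1.25", 0.20, 0.25),
    ("risk diff 10%, RR 1.50", 0.20, 0.30),
]
for label, p1, p2 in scenarios:
    print(f"{label}: P1={p1:.2f}, P2={p2:.2f}, n per group = {n_per_group(p1, p2)}")
```

Holding the risk difference at 5% while raising P1 increases the required n; holding RR at 1.5 while raising P1 decreases it. Neither summary alone pins down the sample size.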


Back to our paragraph…


Unequal sample sizes!! What do we do?

Tools for making the calculation…

• Options for getting the final answer:
– Look at a table in the book (DCR)
– Try an online calculator, like at:
  • http://www.stat.ubc.ca/~rollin/stats/ssize/
  • http://www.dcr-4.net
– Fancy program (need to download): PSpower
  • http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
– Use Stata (sampsi, launch dialog box)

Only some of these allow you to estimate sample size with unequal groups (the http://www.dcr-4.net calculator is one that does)
Example 1

• 2 dichotomous variables → chi-squared test
• P1 = 10%
• P2 = 15%
• 2-sided alpha = 0.05, beta = .10
• Proportion with P2 = 20% (ratio = 1:4)

Using the http://www.dcr-4.net calculator…

Sample size = 584 + 2336 = 2920
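
A hedged Python sketch of an unequal-groups calculation. It assumes the normal-approximation formula for two proportions with a Fleiss-type continuity correction generalized to a 1:ratio allocation; this is an assumption about what the calculator does, though it does reproduce the numbers above. The function name `n_unequal` is illustrative.

```python
import math
from statistics import NormalDist


def n_unequal(p_small, p_large, ratio, alpha=0.05, power=0.90):
    """Return (n in smaller group, n in larger group).

    p_small: outcome probability in the smaller group (e.g. 2+ drinks/day)
    p_large: outcome probability in the larger group
    ratio:   larger group size / smaller group size
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    d = abs(p_small - p_large)
    p_bar = (p_small + ratio * p_large) / (ratio + 1)  # pooled probability
    m = (z_a * math.sqrt((ratio + 1) * p_bar * (1 - p_bar))
         + z_b * math.sqrt(ratio * p_small * (1 - p_small)
                           + p_large * (1 - p_large))) ** 2 / (ratio * d ** 2)
    # Continuity correction for unequal groups (assumed form)
    m = m / 4 * (1 + math.sqrt(1 + 2 * (ratio + 1) / (m * ratio * d))) ** 2
    n_small = math.ceil(m)
    return n_small, math.ceil(ratio * n_small)


n_drinkers, n_others = n_unequal(p_small=0.15, p_large=0.10, ratio=4)
print(n_drinkers, n_others, n_drinkers + n_others)  # 584 2336 2920
```

Compare with the equal-groups answer of 958 x 2 = 1916: the 1:4 split costs roughly 1000 extra participants for the same power.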


Example 2

• Does alcohol consumption cause high blood pressure?

Example 2

• H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and prevalent hypertension in middle-aged women

Example 2

• 2 dichotomous variables → chi-squared test
• P1 = 20%
• P2 = 27%
• 2-sided alpha = 0.05, beta = .10
• Proportion with P2 = 20% (ratio = 1:4)

Using dcr-4.net…

Sample size = 503 + 2014 = 2518

Example 2

• 2 dichotomous variables → chi-squared test
• P1 = 20%
• P2 = 27%
• 2-sided alpha = 0.05, beta = .10
• Proportion with P2 = 20% (ratio = 1:4)
• I only have enough money to study 1000 people!

Using Stata sampsi command (dialog box)…

If sample size = 200 + 800 = 1000, power = .54
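
The same question can be worked in Python: what sample size gives 90% power, and what power does a fixed budget of 200 + 800 buy? This is a sketch under the same assumptions as before (normal approximation, Fleiss-type continuity correction); it is not Stata's implementation, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def _sds(p_small, p_large, ratio):
    """Null and alternative SD terms for a 1:ratio two-proportion comparison."""
    p_bar = (p_small + ratio * p_large) / (ratio + 1)
    null_sd = math.sqrt((ratio + 1) * p_bar * (1 - p_bar))
    alt_sd = math.sqrt(ratio * p_small * (1 - p_small) + p_large * (1 - p_large))
    return null_sd, alt_sd


def n_unequal(p_small, p_large, ratio, alpha=0.05, power=0.90):
    """Return (n in smaller group, n in larger group) for the desired power."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    d = abs(p_small - p_large)
    null_sd, alt_sd = _sds(p_small, p_large, ratio)
    m = (z_a * null_sd + z_b * alt_sd) ** 2 / (ratio * d ** 2)
    m = m / 4 * (1 + math.sqrt(1 + 2 * (ratio + 1) / (m * ratio * d))) ** 2
    n_small = math.ceil(m)
    return n_small, math.ceil(ratio * n_small)


def power_unequal(p_small, p_large, n_small, ratio, alpha=0.05):
    """Approximate power when the smaller group has n_small participants."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    d = abs(p_small - p_large)
    null_sd, alt_sd = _sds(p_small, p_large, ratio)
    # Undo the continuity correction to get the equivalent uncorrected size
    k = 2 * (ratio + 1) / (ratio * d)
    m = n_small - k / 2 + k ** 2 / (16 * n_small)
    z_b = (d * math.sqrt(ratio * m) - z_a * null_sd) / alt_sd
    return NormalDist().cdf(z_b)


print(n_unequal(0.27, 0.20, ratio=4))                       # (491, 1964)
print(round(power_unequal(0.27, 0.20, 200, ratio=4), 2))    # 0.54
```

Under these assumptions the required total is 491 + 1964 = 2455, which matches the 2455 quoted later in these slides; the dcr-4.net figure of 2518 above presumably reflects a slightly different formula. The power at 200 + 800 comes out at about 0.54, in line with the Stata result.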

“Underpowered” studies…

• Is it unethical to conduct a study with only 54% power?
– Traditional view: <80% power is unethical
– Newer thinking: Not necessarily! (Bacchetti, BMC Medicine 2010, 8:17)
– But still, you might want to give yourself a better chance at statistical significance…

Example 2, redesigned

• H0: There is no association between drinking 2+ drinks/day (vs. drinking less) and systolic blood pressure in middle-aged women

Another option: Redesign with a continuous outcome

• 1 dichotomous, 1 continuous → t-test




– Assumption about variability of measurements
– An effect size

For a continuous outcome, specify mean + standard deviation for each group



• 1 dichotomous, 1 continuous → t-test
• Mean1 = 111 +/- 15 (from CARDIA)
• Mean2 = 116 +/- 15 (guess at effect size?)

How do I find standard deviation?

• Search for a published study
– Be careful you get SD not SE…

• Your own pilot study

• Take reasonable range, divide by 4

• Guess!

• Beware: SD of SBP is NOT equal to SD of change in SBP
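
Three of these points can be made concrete with standard formulas. A minimal Python sketch; the helper names and all input numbers are hypothetical, chosen only for illustration:

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """A published standard error of a mean converts to SD via SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_range(low: float, high: float) -> float:
    """Rule of thumb from the slide: reasonable range divided by 4."""
    return (high - low) / 4


def sd_of_change(sd: float, rho: float) -> float:
    """SD of within-person change, assuming the same SD at both visits
    and correlation rho between the two measurements."""
    return sd * math.sqrt(2 * (1 - rho))


print(sd_from_se(se=1.5, n=100))       # hypothetical paper: SE 1.5 in 100 people
print(sd_from_range(80, 140))          # hypothetical "reasonable range" for SBP
print(sd_of_change(sd=15, rho=0.8))    # hypothetical visit-to-visit correlation
```

The last function shows why the warning matters: the SD of change equals the SD of a single measurement only when rho is 0.5, is smaller when repeat measurements are highly correlated, and is larger when they are not.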



• 1 dichotomous, 1 continuous → t-test
• Mean1 = 111 +/- 15 (from CARDIA)
• Mean2 = 116 +/- 15 (guess at effect size?)
• 2-sided alpha = 0.05, beta = .10








We need E/S!!!


Standardized effect size…

• E/S = “Standardized effect size”

= Effect size, in terms of variability

= Difference in means / SD



• 1 dichotomous, 1 continuous → t-test
• Mean1 = 111 +/- 15
• Mean2 = 116 +/- 15, so E/S = (116-111)/15 = .33
• 2-sided alpha = 0.05, beta = .10






133-235 per group = ~400 total…if equal size!
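
The table range can be narrowed with a direct calculation. A Python sketch using the normal approximation for a two-group comparison of means (an assumption: exact t-based tables run a participant or two higher); `n_per_group_means` is an illustrative name:

```python
import math
from statistics import NormalDist


def n_per_group_means(effect, sd, alpha=0.05, power=0.90):
    """Per-group n for equal groups, from the standardized effect size E/S."""
    es = effect / sd  # standardized effect size
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z / es) ** 2)


es = (116 - 111) / 15
print(round(es, 2))                          # 0.33
print(n_per_group_means(effect=5, sd=15))    # falls between 133 and 235
```

E/S = .33 sits between the table columns for .30 and .40, which is why the slide quotes a range.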



• 1 dichotomous, 1 continuous → t-test
• Mean1 = 111 +/- 15
• Mean2 = 116 +/- 15, so E/S = (116-111)/15 = .33
• 2-sided alpha = 0.05, beta = .10
• Proportion with 2+ drinks/day = 20% (ratio = 1:4)

Stata sampsi command…

118 + 473 = 591 total





88 + 353 = 441 total





22 + 88 = 110 total
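
The totals above shrink as assumptions change, but this text does not preserve which assumption changed at each step. The Python sketch below assumes a plausible sequence (relax power from 90% to 80%, then raise the effect from 5 to 10 mmHg), which is consistent with the numbers shown and with the assumptions on later slides. It uses the normal approximation and rounds to the nearest integer, which reproduces the slide figures; rounding up is the more conservative convention. `t_test_sizes` is an illustrative name.

```python
from statistics import NormalDist


def t_test_sizes(diff, sd, ratio=4, alpha=0.05, power=0.90):
    """Unrounded (smaller group n, larger group n) for a 1:ratio comparison of means."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_small = (1 + 1 / ratio) * (z * sd / diff) ** 2
    return n_small, ratio * n_small


scenarios = [
    ("5 mmHg, SD 15, 90% power", dict(diff=5, sd=15, power=0.90)),
    ("5 mmHg, SD 15, 80% power (assumed step)", dict(diff=5, sd=15, power=0.80)),
    ("10 mmHg, SD 15, 80% power (assumed step)", dict(diff=10, sd=15, power=0.80)),
]
for label, kwargs in scenarios:
    n1, n2 = t_test_sizes(**kwargs)
    print(f"{label}: {round(n1)} + {round(n2)} = {round(n1) + round(n2)} total")
```

Each step is a choice the investigator has to be able to defend, not a free lunch.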

How to pick your effect size…

• If you knew the answer, you wouldn’t need to do the study!

• No right answer! No right method!*
– Lowest possible interesting result?
– Highest that you can justify as being possible?
– Lowest that you can “afford”? (with fixed sample size)

• Clinical/scientific significance is key
– Should be interesting, important and realistic

* - Bacchetti, BMC Medicine 2010, 8:17

After the study is completed, only the actual estimate and confidence interval are relevant*



• 1 dichotomous, 1 continuous → t-test
• Mean1 = 111 +/- 10
• Mean2 = 121 +/- 10, so E/S = (121-111)/10 = 1
• 2-sided alpha = 0.05, beta = .20
• Proportion with 2+ drinks/day = 20% (ratio = 1:4)


10 + 39 = 49 total

How can you reduce variability?...

• Variability derives from:
– Actual population variation
– Measurement error

• Reduce variability by reducing measurement error
• Consider alternate designs:
– CHANGE over time within people is often less variable than between-person differences



Redesign and changing assumptions reduced our sample size from 2455 to 49!

Which to choose?

• What is feasible?
– What can you afford?
– How many patients do you actually have access to?

• Can you convince yourself or a reader that:
– Heavy alcohol might really increase SBP by 10 mmHg?
– You can SUBSTANTIALLY reduce measurement error such that SD goes from 15 to 10?
– Or that being under-powered is actually OK? (Cite Bacchetti!*)


No right answers! This is an art, not (just) a science

What we haven’t addressed…

• 2 continuous variables – “correlation”
• Descriptive studies (including estimating sensitivity and specificity)
• Categorical variables with 3+ levels
• Non-normally distributed continuous variables
• Survival analysis
• Loss to follow-up
• Regression and adjustment for other variables

Critical advice

• Fit your study into the mold!
– Dichotomize any variable that “doesn’t fit”
– Guess when you need to
– Show results of alternate guesses (“sensitivity analyses”)

• It’s often OK to work backwards

• It is really important for you to get all the way through a power calculation
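
“Working backwards” and “alternate guesses” can be combined in one small table. A Python sketch, assuming the fixed budget of 200 + 800 participants from Example 2 and a normal approximation for a comparison of means; the function name `detectable_difference` and the grid of guesses are illustrative:

```python
import math
from statistics import NormalDist


def detectable_difference(n1, n2, sd, alpha=0.05, power=0.90):
    """Smallest difference in means detectable with the given group sizes."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sd * math.sqrt(1 / n1 + 1 / n2)


# Sensitivity analysis: vary the guesses for SD and power
for sd in (10, 15):
    for power in (0.80, 0.90):
        d = detectable_difference(200, 800, sd=sd, power=power)
        print(f"SD={sd}, power={power:.0%}: detectable difference = {d:.1f} mmHg")
```

Instead of asking how many people are needed for a guessed effect, this asks what effect the affordable sample can detect, and then leaves the judgment of whether that effect is interesting and realistic to the investigator.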

Other key points

• Sample size calculations help you clarify your thinking about measurements

• Present effect size unambiguously
– Give BOTH %’s or means, etc

• Watch out for unequal group sizes
• “Always” choose 2-sided alpha
• More power with continuous variables
• More power with better measurement (with less error, less noise)

Acknowledgements

• Mike Kohn – advice, tools

• Steve Cummings – very nice lecture!

Thanks!

• Questions?
