Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Power Analysis OPRE Report # 2021-143
: : :URBAN • · · • I N S T I T U T E • E L E V A T E • T H E · D E B A T E
a,s Children's ~f...: Bureau
+ Portlan~ .~.t.~~T<; - oPRE Carrie Furrer, PhD, Portland State University
INTRODUCTIONS
· U R B A N · I N S T I T U T E •
L E A R N I N G G O A L S
Learning Goals
• Participants will be able to calculate and interpret an expectedeffect size.
• Participants will be able to calculate the minimum number ofobservations to demonstrate potential impacts.
3
· U R B A N · I N S T I T U T E •
Overview of Session
• Power and evaluation design 20 min
• Power’s four parameters 15 min
• Estimating power or estimating parameters? 10 min
• Interactive visualization 10 min
• Other design considerations 10 min
• Closing 10 min
4
Power and Evaluation Design
· U R B A N · I N S T I T U T E •
P O W E R & D E S I G N
Design Determines a Statistical Test
• Step 1: What is your evaluation question?
• Step 2: What variables are you using?
• Step 3: How will you analyze?
Power is understood in the context of a statistical test.
6
· U R B A N · I N S T I T U T E •
P O W E R & D E S I G N
A Statistical Test Is Used for Hypothesis Testing
• Hypothesis Testing
Null hypothesis H0: parameter = value
Alternative hypothesis H1: parameter ≠ value
• Statistical test = guidelines for assessing how much information we haveabout the null hypothesis
Power is embedded in hypothesis testing.
7
P O W E R & D E S I G N
Review of Hypothesis Testing
Actual
H0 True H0 False
· U R B A N · I N S T I T U T E •
Decision Reject H0 Type I error α Correct (1-β)
Accept H0 Correct (1-α) Type I I error β
Power
H0 = Null hypothesis: no difference between groups α = Alpha, or % chance of deciding there is a difference when there isn’t one Β = Beta, or % chance of deciding there is NOT a difference when there is one
8
· U R B A N · I N S T I T U T E •
D E F I N I N G P O W E R
Power Defined
• Technical: the probability of rejecting a null hypothesiswhen it is false
• Translation: the chance of detecting an effect thatactually exists
9
Your evaluation design influences whether you willbe able to demonstrate that your intervention has an
effect.
An underpowered design cannot answer yourevaluation question.
EXAMPLE: Testing program impact with a continuous outcome
Statistically significant difference between treatment vs. control groups
· U R B A N · I N S T I T U T E •
I M P A C T E X A M P L E
Design Determines a Statistical Test: Example
• Step 1: Evaluation Question
Did children in the intervention group experience greater well-being than children in the control group?
• Step 2: Outcome VariableWell-being (scale ranging from 0 to 50)
• Step 3: Statistical test
Independent sample t-test
12
· U R B A N · I N S T I T U T E •
I M P A C T E X A M P L E
Hypothesis Testing: Example
• Hypothesis TestingNo difference between
Null hypothesis H0: diff �= 0 groups
Alternative hypothesis H1: diff � ≠ 0 Difference between groups
• Statistical test"!#"� = "
" " p<.05 #! #$ "
$! $"
13
· U R B A N · I N S T I T U T E •
Important Parameters
C "!#""� = " "#! $
#"ratio $! $"
• t = test statistic
• Difference between groups (effect s ize or diff �)
• Variation in outcome scores (s)
• # in each group (n)
• p value threshold (alpha or α)
p < .05
14
Ho
a/2
. u R B A N . I N s T I I u I K •
Null Hypothesis: Example
95%+ probability that there is no difference between treatment and control
H0 = Null hypothesis: No d ifference between groups α = Alpha, or % chance of deciding there is a difference when there isn’t one diff � = difference between average scores of two groups
15
Ho
diff x = 10
· U R B A N · I N S T I T U T E ·
. '
\ .
Actual Parameters: Example Treatment = 60 Control = 50 diff � = 10
H0 = Null hypothesis: No difference between groups H1 = Alternative hypothesis: Difference between groups α = Alpha, or % chance of deciding there is a difference when there isn’t one diff � = difference between average scores of two groups
16
Ho
\
'
Power= 1 - ~
cliff x = 0 diff x = 10
· U R B A N · I N S T I T U T E ·
D E F I N I N G P O W E R
What Is Power?
17
P O W E R & D E S I G N
Back to Hypothesis Testing
Finding
H0 True H0 False
· U R B A N · I N S T I T U T E •
Decision Reject H0 Type I error α Correct (1-β)
Accept H0 Correct (1-α) Type I I error β
H0 = Null hypothesis: no difference between groups α = Alpha, or % chance of deciding there is a difference when there isn’t one β = Beta, or % chance of deciding there is NOT a difference when there is one
Power
18
KEY POINTS
1. Power is understood in the context of a statisticaltest.
2. Power is embedded in hypothesis testing.
3. Power is the chance of correctly rejecting the nullhypothesis (finding a true difference).
Power depends on four parameters.
Cohen's d: 1.6
· U R B A N · I N S T I T U T E ·
JOO
••• ·•J •• • •
!00
•• I' .. • • • ,... .. . . . .. . . .. . . . . . . . . .. : .. . \ -.• •• • •• • •
.00 • • y• • . ... .... '· ..... . 0 . \ •
•
• • 20 40 60 80 11
• • •• f I
-1.96 0 1.96
F O U R P A R A M E T E R S
Power’s Four Parameters (Again, Better Graphic)
Effect size Variance in Outcome
p value Sample size
21
X
· U R B A N · I N S T I T U T E •
F O U R P A R A M E T E R S
Ways to Increase Power
"!#""� = " "#! $
#"$! $"
p < .05
• Increase diff �
• Decrease s
• Increase n
• Increase p value
22
F O U R P A R A M E T E R S
Four Parameters: Effect Size Example
Effect size 5 8 12 15
· U R B A N · I N S T I T U T E •
Scenario 1 Scenario 2 Scenario 3 Scenario 4
Variance 20 20 20 20
Sample size 100 100 100 100
p value .05 .05 .05 .05
Power .42 .80 .99 1.00 INCREASING EFFECT SIZE
23
F O U R P A R A M E T E R S
Four Parameters: Variance Example
Scenario 1 Scenario 2 Scenario 3 Scenario 4 Effect size 5 5 5 5
Variance 5 10 15 20
· U R B A N · I N S T I T U T E •
Sample size 100 100 100 100
p value .05 .05 .05 .05
Power 1.00 .94 .65 .42 DECREASING VARIANCE
24
F O U R P A R A M E T E R S
Four Parameters: Sample Size Example
Scenario 1 Scenario 2 Scenario 3 Scenario 4 Effect size 5 5 5 5
Variance 15 15 15 15
Sample size 60 100 150 250
· U R B A N · I N S T I T U T E •
p value .05 .05 .05 .05
Power .44 .65 .82 .96 INCREASING SAMPLE SIZE
25
KEY POINT
4. Power has four primary parameters: effect size,variation, sample size, p value.
Estimating power orestimating parameters?
Cohen's d: 1.6
· U R B A N · I N S T I T U T E ·
JOO
••• ·•J •• • •
!00
•• I' .. • • • ,... .. . . . .. . . .. . . . . . . . . .. : .. . \ -.• •• • •• • •
.00 • • y• • . ... .... '· ..... . 0 . \ •
•
• • 20 40 60 80 11
• • •• f I
-1.96 0 1.96
E S T I M A T I N G P O W E R
Power = These Four Parameters:
Effect size Variance of
outcome
p value Sample size
28
· U R B A N · I N S T I T U T E •
E S T I M A T I N G P O W E R
You May Already Have Some Ingredients:
• Fixed or maximum sample size
• Desired effect size (pilot studies, standardized measurement,expert opinion)
• Variance (pilot studies, standardized measurement)
• Power >= .80 (common threshold)
• p value <.05 (common threshold)
29
· U R B A N · I N S T I T U T E •
E S T I M A T I N G P O W E R
Solving for Effect Size: Sensitivity Power Analysis
• Smallest effect size that is meaningful to the field
• Minimum detectable effect (MDE)—minimum effect you are ableto distinguish from a null effect
• You must already know (or be able to estimate):• variance of outcome• sample size• p-value• desired power threshold
30
· U R B A N · I N S T I T U T E •
E S T I M A T I N G P O W E R
Types of Effect Sizes
• Mean
• Mean difference/Cohen’s d
• Correlation coefficient/r
• Regression coefficient/β
• Proportion
• Proportion difference
• Odds ratio
31
· U R B A N · I N S T I T U T E •
E S T I M A T I N G P O W E R
Resources to Help Estimate Variance/Standard Deviation andEffect Size • Pilot or historical data
• Published literature
• Standardized measures
• Expert opinion
• Develop a range of possibilities to see how different values affectpower (scenarios)
32
· U R B A N · I N S T I T U T E •
E S T I M A T I N G P O W E R
Solving for Sample Size: A Priori Power Analysis
• A priori power analysis is done during planning phase
• What sample size (n) is necessary to produce smallest effect size ofinterest, or MDE (minimum detectable effect)?
• You must already know (or be able to estimate):• variance of outcome• effect size• p-value• desired power threshold
33
p
1:i' 0 ,8 e Q.
0,8 co. . -._.
0 ,4
~ 0 ,2
0
rm rs
O (On y
t IS - A O A. mber o g oups • 5,
10 30 50 70
IS) r pro )
as • funct ion of
0 3
1th
• d p ob
· U R B A N · I N S T I T U T E ·
90 110
po nt
T bl
0.35
=0
• 0.25
130 150 70 90 210 230 250 270
To I
mM e lu s n
0 In s eps o 0 t rough o 70
:
: o. n t p of 0,0
: ;a
E S T I M A T I N G P O W E R
Example of Power Plot (G*Power)
Scenarios help determine a range of conditions under which your evaluation design will allow for a minimum detectable effect.
34
KEY POINTS
5. It’s typical to solve for sample size or minimumdetectable effect.
6. It’s good practice to construct scenarios.
Interactive Activity
· U R B A N · I N S T I T U T E •
F O U R P A R A M E T E R S
Interactive Visualization
"!#""Cohen’s � = $#
Cohen’s cutoffs
0.2 = small
0.5 = medium
0.8 = large
Interactive Visualization of P ower
37
Other Design Considerations
The more complex your design, research questions, and statistical tests, the harder it is to get good power
estimates.
· U R B A N · I N S T I T U T E •
W H A T ’ S N E W
Other Design Considerations: Increase Power
• Using covariates
• Including repeated measures, correlated outcomes(dependent variables) over time
Both reduce random error (or variance) in the outcome variable.
39
· U R B A N · I N S T I T U T E •
W H A T ’ S N E W
Other Design Considerations: Less Power
• Violating assumptions of the statistical test (independence,normality, heterogeneity)
• Number of statistical tests
• Interactions, stratification
• Missing data (e.g., data quality, attrition, service engagement)
• Cluster correlated data
• Complex sampling/weighted designs
40
CLOSING REFLECTION
What ne w insights do you have about po wer a s it relates to your evaluation work?
This report is in the public domain. Permission to reproduce is not necessary. Suggested citation: Urban Institute et al. (2021). Slide Deck Session 10: Power Analysis - Child Welfare Evidence-Building Academy. OPRE Report 2021-143, Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, U.S. Department of Health and Human Services.