Upload
hakhanh
View
223
Download
0
Embed Size (px)
Citation preview
Motivation Probability basics Power calculation Exercises
Sample Size and Power Calculations
IPA/JPAL/CMF Training
Limuru, Kenya28 July 2010
Owen OzierDepartment of Economics
University of California at Berkeley
Slides revised 14 September 2010
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Thanks and Introduction
Thanks to everyone from JPAL/IPA who made this happen!
My background: randomized evaluations in Busia, Kenya.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Thanks and Introduction
Thanks to everyone from JPAL/IPA who made this happen!
My background: randomized evaluations in Busia, Kenya.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Motivation
Program evaluation:bringing the scientific method to social science
First steps:
Propose a hypothesis
Design an experiment to test the hypothesis
This involves gathering data......but how much data will we need?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Motivation
Program evaluation:bringing the scientific method to social science
First steps:
Propose a hypothesis
Design an experiment to test the hypothesis
This involves gathering data......but how much data will we need?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Motivation
Program evaluation:bringing the scientific method to social science
First steps:
Propose a hypothesis
Design an experiment to test the hypothesis
This involves gathering data......but how much data will we need?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Motivation
Program evaluation:bringing the scientific method to social science
First steps:
Propose a hypothesis
Design an experiment to test the hypothesis
This involves gathering data......but how much data will we need?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Motivation
Program evaluation:bringing the scientific method to social science
First steps:
Propose a hypothesis
Design an experiment to test the hypothesis
This involves gathering data...
...but how much data will we need?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Motivation
Program evaluation:bringing the scientific method to social science
First steps:
Propose a hypothesis
Design an experiment to test the hypothesis
This involves gathering data......but how much data will we need?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Usually a lot
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
How this will work
“Numerical data should be kept for eternity; it’s great stuff.”- Glenn Stevens, Boston University
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Outline
1 Motivation
2 Probability basicsCoin tossing
3 Power calculationTerminology/ConceptsThe Basic CalculationClustersCovariatesDetails
4 Exercises
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Outline
1 Motivation
2 Probability basicsCoin tossing
3 Power calculationTerminology/ConceptsThe Basic CalculationClustersCovariatesDetails
4 Exercises
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Outline
1 Motivation
2 Probability basicsCoin tossing
3 Power calculationTerminology/ConceptsThe Basic CalculationClustersCovariatesDetails
4 Exercises
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Outline
1 Motivation
2 Probability basicsCoin tossing
3 Power calculationTerminology/ConceptsThe Basic CalculationClustersCovariatesDetails
4 Exercises
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
A hypothesis and a kind of test
“Null” Hypothesis: the coin is fair50% chance of heads, 50% chance of tails.
Structure of the data:Toss the coin a number of times, count heads.
The test:“Accept” hypothesis if within some distance of the mean under the null;“Reject” otherwise.
If we only had 4 tosses of the coin, what distance cutoffs could we use?Could accept (A) never,(B) when exactly the mean (2 heads),(C) when within 1 (1, 2, or 3 heads),or (D) always.
We don’t want to reject the null when it is true, though;How much accidental rejection would each possible cutoff give us?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
A hypothesis and a kind of test
“Null” Hypothesis: the coin is fair50% chance of heads, 50% chance of tails.
Structure of the data:Toss the coin a number of times, count heads.
The test:“Accept” hypothesis if within some distance of the mean under the null;“Reject” otherwise.
If we only had 4 tosses of the coin, what distance cutoffs could we use?Could accept (A) never,(B) when exactly the mean (2 heads),(C) when within 1 (1, 2, or 3 heads),or (D) always.
We don’t want to reject the null when it is true, though;How much accidental rejection would each possible cutoff give us?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
A hypothesis and a kind of test
“Null” Hypothesis: the coin is fair50% chance of heads, 50% chance of tails.
Structure of the data:Toss the coin a number of times, count heads.
The test:“Accept” hypothesis if within some distance of the mean under the null;“Reject” otherwise.
If we only had 4 tosses of the coin, what distance cutoffs could we use?Could accept (A) never,(B) when exactly the mean (2 heads),(C) when within 1 (1, 2, or 3 heads),or (D) always.
We don’t want to reject the null when it is true, though;How much accidental rejection would each possible cutoff give us?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
A hypothesis and a kind of test
“Null” Hypothesis: the coin is fair50% chance of heads, 50% chance of tails.
Structure of the data:Toss the coin a number of times, count heads.
The test:“Accept” hypothesis if within some distance of the mean under the null;“Reject” otherwise.
If we only had 4 tosses of the coin, what distance cutoffs could we use?Could accept (A) never,(B) when exactly the mean (2 heads),(C) when within 1 (1, 2, or 3 heads),or (D) always.
We don’t want to reject the null when it is true, though;How much accidental rejection would each possible cutoff give us?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
A hypothesis and a kind of test
“Null” Hypothesis: the coin is fair50% chance of heads, 50% chance of tails.
Structure of the data:Toss the coin a number of times, count heads.
The test:“Accept” hypothesis if within some distance of the mean under the null;“Reject” otherwise.
If we only had 4 tosses of the coin, what distance cutoffs could we use?
Could accept (A) never,(B) when exactly the mean (2 heads),(C) when within 1 (1, 2, or 3 heads),or (D) always.
We don’t want to reject the null when it is true, though;How much accidental rejection would each possible cutoff give us?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
A hypothesis and a kind of test
“Null” Hypothesis: the coin is fair50% chance of heads, 50% chance of tails.
Structure of the data:Toss the coin a number of times, count heads.
The test:“Accept” hypothesis if within some distance of the mean under the null;“Reject” otherwise.
If we only had 4 tosses of the coin, what distance cutoffs could we use?Could accept (A) never,(B) when exactly the mean (2 heads),(C) when within 1 (1, 2, or 3 heads),or (D) always.
We don’t want to reject the null when it is true, though;How much accidental rejection would each possible cutoff give us?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
A hypothesis and a kind of test
“Null” Hypothesis: the coin is fair50% chance of heads, 50% chance of tails.
Structure of the data:Toss the coin a number of times, count heads.
The test:“Accept” hypothesis if within some distance of the mean under the null;“Reject” otherwise.
If we only had 4 tosses of the coin, what distance cutoffs could we use?Could accept (A) never,(B) when exactly the mean (2 heads),(C) when within 1 (1, 2, or 3 heads),or (D) always.
We don’t want to reject the null when it is true, though;How much accidental rejection would each possible cutoff give us?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
Distribution of possible results
0.06
0.25
0.38
0.25
0.06
0.1
.2.3
.4P
roba
bilit
y
0 1 2 3 4P(2)=.38; P(1...3)=.88; P(0...4)=1
Distribution of numbers of heads in 4 tosses of a fair coin
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
Not enough data.
There is no way* to create such a test with four coin tossesso that the chance of accidental rejection under the “null” hypothesis(sometimes written H0) is less than 5%, a standard in social science.
* (Except the “never reject, no matter what” rule. Not very useful.)
What about 20 coin tosses?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
Not enough data.
There is no way* to create such a test with four coin tossesso that the chance of accidental rejection under the “null” hypothesis(sometimes written H0) is less than 5%, a standard in social science.
* (Except the “never reject, no matter what” rule. Not very useful.)
What about 20 coin tosses?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
Not enough data.
There is no way* to create such a test with four coin tossesso that the chance of accidental rejection under the “null” hypothesis(sometimes written H0) is less than 5%, a standard in social science.
* (Except the “never reject, no matter what” rule. Not very useful.)
What about 20 coin tosses?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
Not enough data.
There is no way* to create such a test with four coin tossesso that the chance of accidental rejection under the “null” hypothesis(sometimes written H0) is less than 5%, a standard in social science.
* (Except the “never reject, no matter what” rule. Not very useful.)
What about 20 coin tosses?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
Distribution of possible results
0.000.000.000.000.000.01
0.04
0.07
0.12
0.16
0.18
0.16
0.12
0.07
0.04
0.010.000.000.000.000.00
0.0
5.1
.15
.2P
roba
bilit
y
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20P(10)=.18; P(9...11)=.5; P(8...12)=.74; P(7...13)=.88; P(6...14)=.96
Distribution of numbers of heads in 20 tosses of a fair coin
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
The normal distribution
0.1
.2.3
.4P
roba
bilit
y de
nsity
−4 −2 0 2 4Deviations from mean
Distribution (limiting) of any well−behaved residual error
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Coin tossing
As sample size increases more:
0.0
2.0
4.0
6.0
8P
roba
bilit
y
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Distribution of numbers of heads in 100 tosses of a fair coin
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Types of error
Test result
“Reject Null,” “Accept Null,”Find an effect! Conclude no effect.
Truth:There is an effect Great! “Type II Error”
(low power)Truth:
There is NO effect “Type I Error” Great!(test size)
The probability of Type I error is what we just discussed: the “size” of the test.By convention, we are usually interested in tests of “size” 0.05.
The probability of Type II error is also very important;If P(failure to detect an effect) = 1 − κ, then the power of the test is κ.Power depends on anticipated effect size; typical desired power is 80% or higher.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Types of error
Test result
“Reject Null,” “Accept Null,”Find an effect! Conclude no effect.
Truth:There is an effect Great! “Type II Error”
(low power)Truth:
There is NO effect “Type I Error” Great!(test size)
The probability of Type I error is what we just discussed: the “size” of the test.By convention, we are usually interested in tests of “size” 0.05.
The probability of Type II error is also very important;If P(failure to detect an effect) = 1 − κ, then the power of the test is κ.Power depends on anticipated effect size; typical desired power is 80% or higher.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Types of error
Test result
“Reject Null,” “Accept Null,”Find an effect! Conclude no effect.
Truth:There is an effect Great! “Type II Error”
(low power)Truth:
There is NO effect “Type I Error” Great!(test size)
The probability of Type I error is what we just discussed: the “size” of the test.By convention, we are usually interested in tests of “size” 0.05.
The probability of Type II error is also very important;If P(failure to detect an effect) = 1 − κ, then the power of the test is κ.
Power depends on anticipated effect size; typical desired power is 80% or higher.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Types of error
Test result
“Reject Null,” “Accept Null,”Find an effect! Conclude no effect.
Truth:There is an effect Great! “Type II Error”
(low power)Truth:
There is NO effect “Type I Error” Great!(test size)
The probability of Type I error is what we just discussed: the “size” of the test.By convention, we are usually interested in tests of “size” 0.05.
The probability of Type II error is also very important;If P(failure to detect an effect) = 1 − κ, then the power of the test is κ.Power depends on anticipated effect size; typical desired power is 80% or higher.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Rejecting H0 in critical region
Area: α/2Area: α/2
0P
roba
bilit
y de
nsity
0
Null distn. False rejection probability α
Significance level (test size) α
Note: two-sided test.Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Under an alternative:
0P
roba
bilit
y de
nsity
0
Null Under 1 SE effect
Suppose true effect were 1 SE (Standard Error):
Note: two-sided test.Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Under an alternative:
0P
roba
bilit
y de
nsity
0
Null Under 1 SE effect
Power would only be approximately 0.17
Note: two-sided test.Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Under an alternative:
0P
roba
bilit
y de
nsity
0 1 2 3
Null Under 3 SE effect
Suppose true effect were 3 SE’s (Standard Errors):
Note: this example follows slides by Marc Shotland (JPAL).Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Under an alternative:
0P
roba
bilit
y de
nsity
0 1 2 3
Null Under 3 SE effect
Power would be approximately 0.85
Note: this example follows slides by Marc Shotland (JPAL).Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Power calculation, visually
Effect
0P
roba
bilit
y de
nsity
0
Null distn. Effect distn. t_α/2 size t_1−κ power
How the power calculation formula works
Note: see the related figure in the Toolkit paper.Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula: for power κ and size α,
Effect > (t1−κ + tα/2)SE (β̂) Notation: t1−p = pth percentile of the t dist’n.
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N≈ (z1−κ + zα/2)
√1
P(1 − P)
√σ2
N
In practice (Stata): sampsiNote: Stata uses the normal rather than t distribution (avoiding the D.O.F. issue).
Where do these numbers come from, σ2 and the effect size? Two basic options:
Consider standardized effect sizes in terms of standard deviations
Draw on existing data: What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula: for power κ and size α,
Effect > (t1−κ + tα/2)SE (β̂) Notation: t1−p = pth percentile of the t dist’n.
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N
≈ (z1−κ + zα/2)
√1
P(1 − P)
√σ2
N
In practice (Stata): sampsiNote: Stata uses the normal rather than t distribution (avoiding the D.O.F. issue).
Where do these numbers come from, σ2 and the effect size? Two basic options:
Consider standardized effect sizes in terms of standard deviations
Draw on existing data: What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula: for power κ and size α,
Effect > (t1−κ + tα/2)SE (β̂) Notation: t1−p = pth percentile of the t dist’n.
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N≈ (z1−κ + zα/2)
√1
P(1 − P)
√σ2
N
In practice (Stata): sampsiNote: Stata uses the normal rather than t distribution (avoiding the D.O.F. issue).
Where do these numbers come from, σ2 and the effect size? Two basic options:
Consider standardized effect sizes in terms of standard deviations
Draw on existing data: What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula: for power κ and size α,
Effect > (t1−κ + tα/2)SE (β̂) Notation: t1−p = pth percentile of the t dist’n.
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N≈ (z1−κ + zα/2)
√1
P(1 − P)
√σ2
N
In practice (Stata): sampsiNote: Stata uses the normal rather than t distribution (avoiding the D.O.F. issue).
Where do these numbers come from, σ2 and the effect size? Two basic options:
Consider standardized effect sizes in terms of standard deviations
Draw on existing data: What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula: for power κ and size α,
Effect > (t1−κ + tα/2)SE (β̂) Notation: t1−p = pth percentile of the t dist’n.
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N≈ (z1−κ + zα/2)
√1
P(1 − P)
√σ2
N
In practice (Stata): sampsiNote: Stata uses the normal rather than t distribution (avoiding the D.O.F. issue).
Where do these numbers come from, σ2 and the effect size? Two basic options:
Consider standardized effect sizes in terms of standard deviations
Draw on existing data: What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula: for power κ and size α,
Effect > (t1−κ + tα/2)SE (β̂) Notation: t1−p = pth percentile of the t dist’n.
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N≈ (z1−κ + zα/2)
√1
P(1 − P)
√σ2
N
In practice (Stata): sampsiNote: Stata uses the normal rather than t distribution (avoiding the D.O.F. issue).
Where do these numbers come from, σ2 and the effect size? Two basic options:
Consider standardized effect sizes in terms of standard deviations
Draw on existing data: What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
What if treatment is assigned by groups?
We have been thinking here of randomizing at the individual level.But in practice, we often randomize larger units.
Examples:
Entire schools are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual pupil
Classes within a school are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual pupil
Households are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual family member
Sub-district locations are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual road
Microfinance branch offices are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual borrower
What does this do?It depends on how much variation is explained by the group each individual is in.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
What if treatment is assigned by groups?
We have been thinking here of randomizing at the individual level.But in practice, we often randomize larger units.Examples:
Entire schools are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual pupil
Classes within a school are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual pupil
Households are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual family member
Sub-district locations are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual road
Microfinance branch offices are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual borrower
What does this do?It depends on how much variation is explained by the group each individual is in.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
What if treatment is assigned by groups?
We have been thinking here of randomizing at the individual level.But in practice, we often randomize larger units.Examples:
Entire schools are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual pupil
Classes within a school are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual pupil
Households are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual family member
Sub-district locations are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual road
Microfinance branch offices are assigned to treatment or comparison groups,and we observe outcomes at the level of the individual borrower
What does this do?It depends on how much variation is explained by the group each individual is in.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Intuitive example
You want to know how close the upcoming Kenyan referendum will be.
Method 1: Randomly select 50 people from entire Kenyan population
Method 2: Randomly select 5 families, and ask ten members of each extendedfamily their opinion
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Intuitive example
You want to know how close the upcoming Kenyan referendum will be.
Method 1: Randomly select 50 people from entire Kenyan population
Method 2: Randomly select 5 families, and ask ten members of each extendedfamily their opinion
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Intuitive example
You want to know how close the upcoming Kenyan referendum will be.
Method 1: Randomly select 50 people from entire Kenyan population
Method 2: Randomly select 5 families, and ask ten members of each extendedfamily their opinion
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula
Scale the effective standard error by:
DesignEffect =√
1 + (ngroupsize − 1)ρ
ρ (“rho”) is the intra-class correlation.
In practice (Stata): loneway and sampclus
Recall earlier formula:
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N
But where does this ρ number come from? Two basic options:
Consider what might be reasonable assumptions
Draw on existing data (again):What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula
Scale the effective standard error by:
DesignEffect =√
1 + (ngroupsize − 1)ρ
ρ (“rho”) is the intra-class correlation.In practice (Stata): loneway and sampclus
Recall earlier formula:
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N
But where does this ρ number come from? Two basic options:
Consider what might be reasonable assumptions
Draw on existing data (again):What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula
Scale the effective standard error by:
DesignEffect =√
1 + (ngroupsize − 1)ρ
ρ (“rho”) is the intra-class correlation.In practice (Stata): loneway and sampclus
Recall earlier formula:
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N
But where does this ρ number come from? Two basic options:
Consider what might be reasonable assumptions
Draw on existing data (again):What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula
Scale the effective standard error by:
DesignEffect =√
1 + (ngroupsize − 1)ρ
ρ (“rho”) is the intra-class correlation.In practice (Stata): loneway and sampclus
Recall earlier formula:
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N
But where does this ρ number come from? Two basic options:
Consider what might be reasonable assumptions
Draw on existing data (again):What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula
Scale the effective standard error by:
DesignEffect =√
1 + (ngroupsize − 1)ρ
ρ (“rho”) is the intra-class correlation.In practice (Stata): loneway and sampclus
Recall earlier formula:
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N
But where does this ρ number come from? Two basic options:
Consider what might be reasonable assumptions
Draw on existing data (again):What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
The formula
Scale the effective standard error by:
DesignEffect =√
1 + (ngroupsize − 1)ρ
ρ (“rho”) is the intra-class correlation.In practice (Stata): loneway and sampclus
Recall earlier formula:
MDE = (t1−κ + tα/2)
√1
P(1 − P)
√σ2
N
But where does this ρ number come from? Two basic options:
Consider what might be reasonable assumptions
Draw on existing data (again):What is available that could inform your project?
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Intra-class correlations we have known
Data source ICC (ρ)Madagascar Math + Language 0.5
Busia, Kenya Math + Language 0.22Udaipur, India Math + Language 0.23Mumbai, India Math + Language 0.29
Vadodara, India Math + Language 0.28Busia, Kenya Math 0.62
Source: Marc Shotland (JPAL) slides
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Baseline and other covariate data
Controlling for covariates reduces the SE, so increases power.
Baseline survey data on the outcome of interest is especially useful when thewithin-individual correlation of outcomes (in the absence of treatment) ishigh. Consider academic test scores, for example.
However, it may risk contaminating an otherwise clean randomized design.(randomizing was supposed to avoid omitted variable biases in OLS!)
Stratifying is a good option:
On baseline outcome valuesOn fixed observables that have high predictive power(parents’ education, age, ...)On observables that delineate subpopulations you may want to test within(you might want to do another power calculation on this subsample)
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Baseline and other covariate data
Controlling for covariates reduces the SE, so increases power.
Baseline survey data on the outcome of interest is especially useful when thewithin-individual correlation of outcomes (in the absence of treatment) ishigh. Consider academic test scores, for example.
However, it may risk contaminating an otherwise clean randomized design.(randomizing was supposed to avoid omitted variable biases in OLS!)
Stratifying is a good option:
On baseline outcome valuesOn fixed observables that have high predictive power(parents’ education, age, ...)On observables that delineate subpopulations you may want to test within(you might want to do another power calculation on this subsample)
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Baseline and other covariate data
Controlling for covariates reduces the SE, so increases power.
Baseline survey data on the outcome of interest is especially useful when thewithin-individual correlation of outcomes (in the absence of treatment) ishigh. Consider academic test scores, for example.
However, it may risk contaminating an otherwise clean randomized design.(randomizing was supposed to avoid omitted variable biases in OLS!)
Stratifying is a good option:
On baseline outcome valuesOn fixed observables that have high predictive power(parents’ education, age, ...)On observables that delineate subpopulations you may want to test within(you might want to do another power calculation on this subsample)
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Baseline and other covariate data
Controlling for covariates reduces the SE, so increases power.
Baseline survey data on the outcome of interest is especially useful when thewithin-individual correlation of outcomes (in the absence of treatment) ishigh. Consider academic test scores, for example.
However, it may risk contaminating an otherwise clean randomized design.(randomizing was supposed to avoid omitted variable biases in OLS!)
Stratifying is a good option:
On baseline outcome valuesOn fixed observables that have high predictive power(parents’ education, age, ...)On observables that delineate subpopulations you may want to test within(you might want to do another power calculation on this subsample)
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Baseline and other covariate data
Controlling for covariates reduces the SE, so increases power.
Baseline survey data on the outcome of interest is especially useful when thewithin-individual correlation of outcomes (in the absence of treatment) ishigh. Consider academic test scores, for example.
However, it may risk contaminating an otherwise clean randomized design.(randomizing was supposed to avoid omitted variable biases in OLS!)
Stratifying is a good option:
On baseline outcome values
On fixed observables that have high predictive power(parents’ education, age, ...)On observables that delineate subpopulations you may want to test within(you might want to do another power calculation on this subsample)
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Baseline and other covariate data
Controlling for covariates reduces the SE, so increases power.
Baseline survey data on the outcome of interest is especially useful when thewithin-individual correlation of outcomes (in the absence of treatment) ishigh. Consider academic test scores, for example.
However, it may risk contaminating an otherwise clean randomized design.(randomizing was supposed to avoid omitted variable biases in OLS!)
Stratifying is a good option:
On baseline outcome valuesOn fixed observables that have high predictive power(parents’ education, age, ...)
On observables that delineate subpopulations you may want to test within(you might want to do another power calculation on this subsample)
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Baseline and other covariate data
Controlling for covariates reduces the SE, so increases power.
Baseline survey data on the outcome of interest is especially useful when thewithin-individual correlation of outcomes (in the absence of treatment) ishigh. Consider academic test scores, for example.
However, it may risk contaminating an otherwise clean randomized design.(randomizing was supposed to avoid omitted variable biases in OLS!)
Stratifying is a good option:
On baseline outcome valuesOn fixed observables that have high predictive power(parents’ education, age, ...)On observables that delineate subpopulations you may want to test within(you might want to do another power calculation on this subsample)
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Power calculation, visually
0.0
1.0
2.0
3K
erne
l den
sity
est
imat
e
100 120 140 160 180Height (cm)
5.1 cm average per additional year in age; overall SD=12.1cm;
Kernel density estimate
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Power calculation, visually
0.0
2.0
4.0
6.0
8K
erne
l den
sity
est
imat
e
100 120 140 160 180Height (cm)
Age 8Age 9
Age 10Age 11
Age 12Age 13
Age 14
5.1 cm average per additional year in age; overall SD=12.1cm; within SD=7.21cm
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.
To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.
Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th century
Issues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random
(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises Terminology/Concepts The Basic Calculation Clusters Covariates Details
Some things to watch out for
Imperfect compliance with treatment:
The most straightforward approach is to decrease the effect size.
Multiple treatments:
Optimal treatment structure depends on several things;May be better to have control group larger than any of the treatment groups.To get started, calculate power for the smallest effect you hope to test.Multiple test correction may be needed later. (recently: Michael Anderson)
How randomization is done:
Alternative test based on statistical work of Fisher in early 20th centuryIssues with 1,2,3,1,2,3... if alphabetical, not a problem if order is random(recently: McKenzie and Bruhn)
Much more: check out Duflo, Glennerster, and Kremer:Using randomization in development economics research: a toolkit
“A first comment is that, despite all the precision of these formulas,power calculations involve substantial guess work in practice.”
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Exercises
First make sure you have the files:
Make sure sampclus is installed; either by using findit sampclusor by copying the two files into C:/ado/plus/s/ (typically).
Various .do and .dta files
type-I-error-reject-null-when-true.do
type-II-error-fail-to-reject-when-alt-is-true.do
sampsi-syntax.do
icc-example.do
SampsiExerciseB.do -here we need to adjust the directory in the .do file before running.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Exercises
First make sure you have the files:
Make sure sampclus is installed; either by using findit sampclusor by copying the two files into C:/ado/plus/s/ (typically).
Various .do and .dta files
type-I-error-reject-null-when-true.do
type-II-error-fail-to-reject-when-alt-is-true.do
sampsi-syntax.do
icc-example.do
SampsiExerciseB.do -here we need to adjust the directory in the .do file before running.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Exercises
First make sure you have the files:
Make sure sampclus is installed; either by using findit sampclusor by copying the two files into C:/ado/plus/s/ (typically).
Various .do and .dta files
type-I-error-reject-null-when-true.do
type-II-error-fail-to-reject-when-alt-is-true.do
sampsi-syntax.do
icc-example.do
SampsiExerciseB.do -here we need to adjust the directory in the .do file before running.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Exercises
First make sure you have the files:
Make sure sampclus is installed; either by using findit sampclusor by copying the two files into C:/ado/plus/s/ (typically).
Various .do and .dta files
type-I-error-reject-null-when-true.do
type-II-error-fail-to-reject-when-alt-is-true.do
sampsi-syntax.do
icc-example.do
SampsiExerciseB.do -here we need to adjust the directory in the .do file before running.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Exercises
First make sure you have the files:
Make sure sampclus is installed; either by using findit sampclusor by copying the two files into C:/ado/plus/s/ (typically).
Various .do and .dta files
type-I-error-reject-null-when-true.do
type-II-error-fail-to-reject-when-alt-is-true.do
sampsi-syntax.do
icc-example.do
SampsiExerciseB.do -here we need to adjust the directory in the .do file before running.
Owen Ozier Sample Size and Power Calculations
Motivation Probability basics Power calculation Exercises
Exercises
First make sure you have the files:
Make sure sampclus is installed; either by using findit sampclusor by copying the two files into C:/ado/plus/s/ (typically).
Various .do and .dta files
type-I-error-reject-null-when-true.do
type-II-error-fail-to-reject-when-alt-is-true.do
sampsi-syntax.do
icc-example.do
SampsiExerciseB.do -here we need to adjust the directory in the .do file before running.
Owen Ozier Sample Size and Power Calculations