Stata code for Sampling
Introduction
This is a basic introduction to the Stata code used for design effect and sample size calculations.
Other tools are available: many sampling practitioners do these calculations in SAS, Excel, or Optimal Design.
Design Effects
• To calculate design effects for a particular dataset, you first need to declare the complex survey design to Stata.
• This example uses the common two-stage cluster sample, but more complicated designs are also supported.
svyset clusterid [pweight = hh_weight_trimmed], strata(strataid)
To simply calculate the design effects for the overall sample, use the following commands:
svy: mean hhsize
(running mean on estimation sample)
Survey: Mean estimation
Number of strata =      16      Number of obs   =       3265
Number of PSUs   =     409      Population size =    7245851
                                Design df       =        393
--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      hhsize |   5.166811   .0668891      5.035306    5.298316
--------------------------------------------------------------
estat effects
----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       DEFF      DEFT
-------------+--------------------------------------------
      hhsize |   5.166811   .0668891     1.76149   1.32721
----------------------------------------------------------
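The two columns reported by estat effects are closely related. As a rough check outside Stata, the Python sketch below assumes that no finite population correction has been set (none appears in the svyset call here), in which case DEFT should be the square root of DEFF. The "effective sample size" it prints is a common rule-of-thumb reading of DEFF, not something Stata reports.

```python
from math import sqrt

# Values copied from the estat effects output above.
deff = 1.76149
n_obs = 3265

# Assumption: without a finite population correction, DEFT = sqrt(DEFF).
deft = sqrt(deff)

# Rule of thumb: the simple random sample size that would give
# about the same precision as the clustered sample.
n_effective = n_obs / deff

print(f"DEFT = {deft:.5f}")
print(f"Effective sample size = {n_effective:.0f}")
```

In other words, the 3265 clustered observations carry roughly the information of a much smaller simple random sample for this variable.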
Over subpopulations:
svy: mean hhsize, over(rural)
(running mean on estimation sample)
Survey: Mean estimation
Number of strata =      16      Number of obs   =       3265
Number of PSUs   =     409      Population size =    7245851
                                Design df       =        393
            1: rural = 1
            2: rural = 2
--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
hhsize       |
           1 |   5.436019   .0809182      5.276932    5.595105
           2 |    4.41353     .10837      4.200473    4.626587
--------------------------------------------------------------
estat effects, srssubpop
            1: rural = 1
            2: rural = 2
----------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.       DEFF      DEFT
-------------+--------------------------------------------
hhsize       |
           1 |   5.436019   .0809182     1.58701   1.25977
           2 |    4.41353     .10837     2.04029   1.42839
----------------------------------------------------------
Design Effects
Then to calculate ρ, use the following formula:
$$\text{deff} = 1 + \rho (m - 1)$$
which rearranges to $\rho = (\text{deff} - 1)/(m - 1)$,
where m is the cluster size. You will know the cluster size either from the survey documentation, or you can calculate it from the data:
gen n=1
collapse (sum) n, by(clusterid)
sum n
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           n |       409    7.982885    .1298585          7          8
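As an illustration, the relation deff = 1 + ρ(m − 1) can be solved for ρ with the overall DEFF for hhsize (1.76149) and the mean cluster size from the summary above (7.982885). The Python sketch below only checks the arithmetic; the function name is hypothetical, and using the mean cluster size for m is an assumption that seems reasonable here because cluster sizes barely vary.

```python
def icc_from_deff(deff, m):
    """Solve deff = 1 + rho * (m - 1) for rho (hypothetical helper)."""
    if m <= 1:
        raise ValueError("cluster size m must exceed 1")
    return (deff - 1) / (m - 1)

# DEFF for hhsize and the mean cluster size, both taken from the output above.
rho = icc_from_deff(deff=1.76149, m=7.982885)
print(f"rho = {rho:.3f}")
```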
Sample Size
Maize yield (kg/hect)
• Mean: 802.6 kg/hect
• St. Dev.: 1027.79
• Deff: 1.28 (ρ=0.03)
Irrigation Usage
• Mean: 0.15
• Deff: 5.37 (ρ=0.49)
Fertilizer Usage
• Mean: 0.553
• Deff: 3.63 (ρ=0.29)
Critical values for a two-tailed test (standard normal z, which large-sample t values approach)
Confidence level   50%       80%        90%        95%       98%       99%
Critical value     0.67449   1.281552   1.644854   1.95996   2.32635   2.57583
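These critical values can be regenerated rather than looked up. The Python sketch below assumes they are standard normal quantiles for a two-tailed test at each confidence level, which matches the tabulated numbers; the helper name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_z(confidence):
    """Standard normal critical value leaving (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.50, 0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  {two_tailed_z(level):.5f}")
```

For a one-sided test at level α, the corresponding value is the quantile at 1 − α, so the 90% column (1.644854) doubles as the one-sided 5% critical value.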
Sample Size
• What is the required sample size to detect a 10% change in maize yields?
• Note this is the same question as if you wanted to see a 10% difference between related questions in the same dataset (though the variance would probably be lower).
$$n = \frac{4\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{D^2}\,\left[1 + \rho(m - 1)\right]$$
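A rough worked example may help here. It reads the formula on this slide as n = 4σ²(z₁₋α/₂ + z₁₋β)²/D² × [1 + ρ(m − 1)] and assumes that n is the total sample size across the two groups, σ the standard deviation, D the difference to detect, and the z terms the critical values for significance and power. The slide does not define these symbols, so these readings are assumptions. With the maize yield figures, a two-sided 5% test, 90% power, and D = 80 (10% of the mean, rounded as in the commands that follow):

```latex
\begin{aligned}
n_{\text{SRS}} &= \frac{4\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{D^2}
  = \frac{4\,(1027.79)^2\,(1.95996 + 1.281552)^2}{80^2} \approx 6937 \\
n_{\text{cluster}} &= n_{\text{SRS}} \left[1 + \rho(m - 1)\right]
  \approx 6937 \times 1.28 \approx 8880
\end{aligned}
```

Half of 6937 is about 3469 per group, which agrees with the n1 and n2 reported by the first sampsi run.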
sampsi 802.6 882.6, sd1(1027.79)
Estimated sample size for two-sample comparison of means
Test Ho: m1 = m2, where m1 is the mean in population 1
and m2 is the mean in population 2
Assumptions:
alpha = 0.0500 (two-sided)
power = 0.9000
m1 = 802.6
m2 = 882.6
sd1 = 1027.79
sd2 = 1027.79
n2/n1 = 1.00
Estimated required sample sizes:
n1 = 3469
n2 = 3469
sampsi 802.6 882.6, sd1(1027.79) onesided
Estimated sample size for two-sample comparison of means
Test Ho: m1 = m2, where m1 is the mean in population 1
and m2 is the mean in population 2
Assumptions:
alpha = 0.0500 (one-sided)
power = 0.9000
m1 = 802.6
m2 = 882.6
sd1 = 1027.79
sd2 = 1027.79
n2/n1 = 1.00
Estimated required sample sizes:
n1 = 2828
n2 = 2828
sampsi 802.6 882.6, sd1(1027.79) power(0.8) onesided
Estimated sample size for two-sample comparison of means
Test Ho: m1 = m2, where m1 is the mean in population 1
and m2 is the mean in population 2
Assumptions:
alpha = 0.0500 (one-sided)
power = 0.8000
m1 = 802.6
m2 = 882.6
sd1 = 1027.79
sd2 = 1027.79
n2/n1 = 1.00
Estimated required sample sizes:
n1 = 2041
n2 = 2041
sampsi 802.6 882.6, ratio(2) sd1(1027.79)
Estimated sample size for two-sample comparison of means
Test Ho: m1 = m2, where m1 is the mean in population 1
and m2 is the mean in population 2
Assumptions:
alpha = 0.0500 (two-sided)
power = 0.9000
m1 = 802.6
m2 = 882.6
sd1 = 1027.79
sd2 = 1027.79
n2/n1 = 2.00
Estimated required sample sizes:
n1 = 2602
n2 = 5204
sampsi 802.6 1043.38, sd1(1027.79) onesided
Estimated sample size for two-sample comparison of means
Test Ho: m1 = m2, where m1 is the mean in population 1
and m2 is the mean in population 2
Assumptions:
alpha = 0.0500 (one-sided)
power = 0.9000
m1 = 802.6
m2 = 1043.38
sd1 = 1027.79
sd2 = 1027.79
n2/n1 = 1.00
Estimated required sample sizes:
n1 = 313
n2 = 313
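The five sampsi runs above differ only in the critical values, the allocation ratio, and the size of the difference. The Python sketch below is not sampsi's source code; it assumes a plain normal approximation with a common standard deviation and rounding up, which happens to reproduce the sample sizes shown. The function name and its arguments are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def two_means_n(m1, m2, sd, alpha=0.05, power=0.90, onesided=False, ratio=1.0):
    """Normal-approximation sample sizes (n1, n2) for comparing two means.

    Assumes a common standard deviation; ratio is n2/n1.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if onesided else z(1 - alpha / 2)
    z_beta = z(power)
    n1 = ceil((1 + 1 / ratio) * sd**2 * (z_alpha + z_beta) ** 2 / (m2 - m1) ** 2)
    n2 = ceil(ratio * n1)
    return n1, n2

print(two_means_n(802.6, 882.6, 1027.79))                            # two-sided, 90% power
print(two_means_n(802.6, 882.6, 1027.79, onesided=True))             # one-sided
print(two_means_n(802.6, 882.6, 1027.79, power=0.8, onesided=True))  # 80% power
print(two_means_n(802.6, 882.6, 1027.79, ratio=2))                   # n2 twice n1
print(two_means_n(802.6, 1043.38, 1027.79, onesided=True))           # 30% difference
```

Because the required size scales with 1/D², tripling the detectable difference from 10% to 30% cuts the sample by roughly a factor of nine.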
sampsi 802.6 882.6, sd1(1027.79) onesided ///
    method(change) pre(1) post(1) r01(0.7)
Estimated sample size for two samples with repeated measures
Assumptions:
alpha = 0.0500 (one-sided)
power = 0.9000
m1 = 802.6
m2 = 882.6
sd1 = 1027.79
sd2 = 1027.79
n2/n1 = 1.00
number of follow-up measurements = 1
number of baseline measurements = 1
correlation between baseline & follow-up = 0.700
Method: CHANGE
relative efficiency = 1.667
adjustment to sd = 0.775
adjusted sd1 = 796.123
Estimated required sample sizes:
n1 = 1697
n2 = 1697
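The adjustment figures in this output can be traced by hand. The Python sketch below assumes that, with one baseline and one follow-up measurement, the change method scales the standard deviation by sqrt(2(1 − r)), where r is the baseline to follow-up correlation. That assumption is consistent with the 0.775 and 796.123 reported above, but it is an inference from the output rather than a statement of how sampsi is implemented.

```python
from math import ceil, sqrt
from statistics import NormalDist

r01 = 0.7        # correlation between baseline and follow-up
sd = 1027.79     # unadjusted standard deviation
diff = 80        # difference to detect: 882.6 - 802.6

# Assumed adjustment for one baseline and one follow-up measurement.
adjustment = sqrt(2 * (1 - r01))
adjusted_sd = sd * adjustment
relative_efficiency = 1 / adjustment**2

# One-sided 5% test with 90% power, equal group sizes, rounded up.
z = NormalDist().inv_cdf
n_per_group = ceil(2 * adjusted_sd**2 * (z(0.95) + z(0.90)) ** 2 / diff**2)

print(f"adjustment to sd    = {adjustment:.3f}")
print(f"adjusted sd         = {adjusted_sd:.3f}")
print(f"relative efficiency = {relative_efficiency:.3f}")
print(f"n per group         = {n_per_group}")
```

The higher the correlation between rounds, the smaller the adjusted standard deviation and the required sample.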
sampsi 802.6 907.6, sd1(1027.79) sd2(1456.85) onesided ///
    n1(1500) n2(1246)
Estimated power for two-sample comparison of means
Test Ho: m1 = m2, where m1 is the mean in population 1
and m2 is the mean in population 2
Assumptions:
alpha = 0.0500 (one-sided)
m1 = 802.6
m2 = 907.6
sd1 = 1027.79
sd2 = 1456.85
sample size n1 = 1500
n2 = 1246
n2/n1 = 0.83
Estimated power:
power = 0.6897
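When both sample sizes are fixed, the same normal approximation can be turned around to give power. The Python sketch below is an illustration under that assumption, with a hypothetical function name; for a two-sided test it ignores the negligible probability of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(m1, m2, sd1, sd2, n1, n2, alpha=0.05, onesided=False):
    """Approximate power for comparing two means with known, unequal sds."""
    nd = NormalDist()
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
    z_alpha = nd.inv_cdf(1 - alpha) if onesided else nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(abs(m2 - m1) / se - z_alpha)

power = power_two_means(802.6, 907.6, 1027.79, 1456.85,
                        n1=1500, n2=1246, onesided=True)
print(f"power = {power:.4f}")
```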
sampsi 802.6 907.6, sd1(1027.79) sd2(1456.85) onesided ///
    method(change) pre(1) post(1) r01(.7) n1(1500) n2(1246)
Estimated power for two samples with repeated measures
Assumptions:
alpha = 0.0500 (one-sided)
m1 = 802.6
m2 = 907.6
sd1 = 1027.79
sd2 = 1456.85
sample size n1 = 1500
n2 = 1246
n2/n1 = 0.83
number of follow-up measurements = 1
number of baseline measurements = 1
correlation between baseline & follow-up = 0.700
Method: CHANGE
relative efficiency = 1.667
adjustment to sd = 0.775
adjusted sd1 = 796.123
adjusted sd2 = 1128.471
Estimated power:
power = 0.868
Can’t forget the Deff…
No design effects    With design effects
2828                 3620
2041                 2612
313                  401
849                  1087
…where deff = 1.28
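The right-hand column is the left-hand column multiplied by the design effect. The Python sketch below assumes rounding to the nearest whole number, which is what reproduces the figures on the slide; rounding up would be the more conservative choice and would change 2612 to 2613.

```python
deff = 1.28
srs_sizes = [2828, 2041, 313, 849]   # "No design effects" column

# Inflate each simple-random-sample size by the design effect.
clustered_sizes = [round(n * deff) for n in srs_sizes]

for n_srs, n_clustered in zip(srs_sizes, clustered_sizes):
    print(f"{n_srs:>5}  {n_clustered:>5}")
```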
sampsi 802.6 1043.38, sd1(1027.79) onesided
n1 = 313
n2 = 313
sampclus, obsclus(10) rho(0.03)
Sample Size Adjusted for Cluster Design
n1 (uncorrected) = 313
n2 (uncorrected) = 313
Intraclass correlation = .03
Average obs. per cluster = 10
Minimum number of clusters = 80
Estimated sample size per group:
n1 (corrected) = 398
n2 (corrected) = 398
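The corrected figure of 398 differs slightly from the 401 obtained with deff = 1.28. A plausible explanation, sketched in Python below, is that sampclus builds the design effect from its own inputs: with 10 observations per cluster and ρ = 0.03, deff = 1 + (10 − 1) × 0.03 = 1.27. The rounding rules and the cluster count formula here are assumptions chosen to match the output, not a description of the sampclus source.

```python
from math import ceil

rho = 0.03          # intraclass correlation passed to sampclus
obs_per_cluster = 10
n_uncorrected = 313  # per-group size from sampsi

deff = 1 + (obs_per_cluster - 1) * rho

# Assumed: inflate the per-group size and round up.
n_corrected = ceil(n_uncorrected * deff)

# Assumed: clusters needed to cover both groups at 10 observations each.
min_clusters = ceil(2 * n_corrected / obs_per_cluster)

print(f"deff = {deff:.2f}")
print(f"n per group (corrected) = {n_corrected}")
print(f"minimum number of clusters = {min_clusters}")
```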