1
Ratio estimation under SRS
- Assume
  - Absence of nonsampling error
  - SRS of size n from a population of size N
- Ratio estimation is an alternative to $\bar{y}$, $\hat{t}$, and $\hat{p}$ under SRS; it uses “auxiliary” information (X)
  - Sample data: observe $y_i$ and $x_i$
  - Population information
    - Have $x_i$ on all individual units, or
    - Have summary statistics from the population distribution of X, such as the population mean or total of X
- Ratio estimation is also used to estimate a population parameter called a ratio (B)
2
Uses
- Estimate a ratio
  - Tree volume or bushels per acre
  - Per capita income
  - Liability to asset ratio
- More precise estimator of population parameters
  - If X and Y are correlated, can improve upon $\bar{y}$, $\hat{t}$, and $\hat{p}$
- Estimating totals when population size N is unknown
  - Avoids need to know N in formula for $\hat{t}$
- Domain estimation
  - Obtaining estimates for subsamples
- Incorporate known information into estimates
  - Poststratification
  - Adjust for nonresponse
3
Estimating a ratio, B
- Population parameter for the ratio:
  $$B = \frac{t_y}{t_x} = \frac{\bar{y}_U}{\bar{x}_U}$$
- Examples
  - Number of bushels harvested (y) per acre (x)
  - Number of children (y) per single-parent household (x)
  - Total usable weight (y) relative to total shipment weight (x) for chickens
4
Estimating a ratio
- SRS of n observation units (OUs)
- Collect data on y and x for each OU
- Natural estimator for B?
  $$B = \frac{t_y}{t_x} = \frac{\bar{y}_U}{\bar{x}_U}$$
5
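Estimating a ratio – 2
- Estimator for B
  $$\hat{B} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\hat{t}_y}{\hat{t}_x}$$
- $\hat{B}$ is a biased estimator for B: $E[\hat{B}] \neq B$
- $\hat{B}$ is a ratio of random variables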
6
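Bias of $\hat{B}$
  $$E[\hat{B} - B] \approx \left(1 - \frac{n}{N}\right) \frac{1}{n \bar{x}_U^2} \left(B S_x^2 - R S_x S_y\right) \neq 0$$
where
- $S_x$ = population standard deviation of x
- $S_y$ = population standard deviation of y
- $R = \mathrm{Corr}(x, y) = \dfrac{\sum_{i=1}^{N} (x_i - \bar{x}_U)(y_i - \bar{y}_U)}{(N - 1) S_x S_y}$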
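The bias approximation can be sketched in a few lines of Python to see how it shrinks as n grows. The function name `approx_bias_ratio` and the population values below are hypothetical, chosen only for illustration; they are not taken from Lohr.

```python
def approx_bias_ratio(n, N, xbar_U, B, S_x, S_y, R):
    """Approximate bias of B_hat = ybar / xbar under SRS (large-sample formula)."""
    fpc = 1 - n / N  # finite population correction
    return fpc * (B * S_x**2 - R * S_x * S_y) / (n * xbar_U**2)


# Hypothetical population values, for illustration only.
small_n = approx_bias_ratio(n=10, N=100, xbar_U=5.0, B=2.0, S_x=1.0, S_y=4.0, R=0.25)
large_n = approx_bias_ratio(n=50, N=100, xbar_U=5.0, B=2.0, S_x=1.0, S_y=4.0, R=0.25)
print(small_n, large_n)  # the larger sample gives the smaller approximate bias
```

With everything else held fixed, a larger n reduces the bias both through the 1/n factor and through the finite population correction.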
7
Bias of $\hat{B}$ – 2
- Bias is small if
  - Sample size n is large
  - Sampling fraction n/N is large
  - $\bar{x}_U$ is large
  - $S_x$ is small (population standard deviation for x)
  - High positive correlation between X and Y
- (see Lohr p. 67)
8
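Estimated variance of estimator for B
- Estimator for $V[\hat{B}]$
  $$\hat{V}[\hat{B}] = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n \bar{x}_U^2}$$
  where
  $$s_e^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \hat{B} x_i)^2 = \frac{1}{n-1} \sum_{i=1}^{n} e_i^2, \qquad e_i = y_i - \hat{B} x_i$$
- If $\bar{x}_U$ is unknown? Estimate it with the sample mean $\bar{x}$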
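A minimal Python sketch of the ratio estimator and its estimated variance under SRS. The function name `ratio_estimate`, its optional `xbar_U` argument, and the toy data are assumptions made for illustration; when `xbar_U` is not supplied the sample mean of x is used in its place.

```python
def ratio_estimate(y, x, N, xbar_U=None):
    """Return (B_hat, estimated variance of B_hat) for an SRS of size n from N units."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    B_hat = ybar / xbar
    # Residuals about the fitted line through the origin: e_i = y_i - B_hat * x_i
    residuals = [yi - B_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(e * e for e in residuals) / (n - 1)
    # Use the known population mean of x if available, otherwise the sample mean.
    mean_x = xbar if xbar_U is None else xbar_U
    var_B = (1 - n / N) * s2_e / (n * mean_x**2)
    return B_hat, var_B


# Toy data (hypothetical): y is exactly 2 * x, so every residual is zero.
B_hat, var_B = ratio_estimate([2.0, 4.0, 6.0], [1.0, 2.0, 3.0], N=30)
print(B_hat, var_B)
```

When y is exactly proportional to x the residuals vanish and the estimated variance is zero, which matches the statement that the variance is small when deviations about the line are small.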
9
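Variance of $\hat{B}$
  $$\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n \bar{x}_U^2}$$
- Variance is small if
  - sample size n is large
  - sampling fraction n/N is large
  - deviations about the line, $e = y - Bx$, are small
  - correlation between X and Y is close to 1
  - $\bar{x}_U$ is large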
10
Ag example – 1
- Frame: 1987 Agricultural Census
- Take SRS of 300 counties from 3078 counties to estimate conditions in 1992
- Collect data on y, have data on x for sample
  - $y_i$ = total acreage of farms in county i in 1992
  - $x_i$ = total acreage of farms in county i in 1987
- Existing knowledge about the population
  - $t_x$ = total acreage of farms in US in 1987 = 964,470,625 acres
  - $\bar{x}_U$ = average acreage of farms per county in US in 1987 = 313,343.283 acres / county
11
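Ag example – 2
- Estimate
  $$B = \frac{\text{acres of farms in 1992}}{\text{acres of farms in 1987}}$$
- Sample totals: $\sum_{i=1}^{300} y_i$ = 89,369,114 acres and $\sum_{i=1}^{300} x_i$ = 90,586,117 acres
  $$\hat{B} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{300} y_i}{\sum_{i=1}^{300} x_i} = \frac{89,369,114}{90,586,117} = 0.9866$$
- 0.9866 farm acres in 1992 relative to 1987 farm acres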
12
Ag example – 3
  $$\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n \bar{x}_U^2}$$
- Need to calculate variance of the $e_i$'s
13
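Ag example – 4
- For each county i, calculate $e_i = y_i - \hat{B} x_i$
  - Coffee Co, AL example: $e_i = 175,209 - (0.9866)(179,311) = -1693.00$ (computed with unrounded $\hat{B}$)
- Sum of squares for $e_i$
  $$\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \cdots + e_{300}^2 = 2.9965166 \times 10^{11}$$
  $$s_e^2 = \frac{1}{n-1} \sum_{i=1}^{n} e_i^2 = \frac{1}{299} \left(2.9965166 \times 10^{11}\right) = 1,002,179,462$$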
14
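Ag example – 5
  $$\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n \bar{x}_U^2} = \left(1 - \frac{300}{3078}\right) \frac{1,002,179,462}{300 \, (313,343.283)^2} = 0.000030707$$
- $\hat{B}$ = 0.9866 farm acres in 1992 per 1987 farm acre
- $SE(\hat{B})$ = 0.0055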
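The Ag example arithmetic can be checked with a short Python sketch. The numbers are the summary values quoted on the slides; the variable names are only illustrative.

```python
import math

# Summary values quoted in the Ag example slides.
n, N = 300, 3078
sum_y = 89_369_114        # sample total of 1992 farm acreage
sum_x = 90_586_117        # sample total of 1987 farm acreage
xbar_U = 313_343.283      # known population mean of x (acres / county)
s2_e = 1_002_179_462      # residual variance s_e^2 from the slides

B_hat = sum_y / sum_x
var_B = (1 - n / N) * s2_e / (n * xbar_U**2)
se_B = math.sqrt(var_B)

print(round(B_hat, 4), round(se_B, 4))
```

The standard error is tiny relative to the estimate because the 1987 and 1992 acreages are very highly correlated.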
15
Estimating proportions
- If denominator variable is random, use ratio estimator to estimate the proportion p
- Example (p. 72)
  - 10 plots under protected oak trees used to assess effect of feral pigs on native vegetation on Santa Cruz Island, CA
  - Count live seedlings y and total number of seedlings x per plot
  - Y and X correlated due to common environmental factors
  - Estimate proportion of live seedlings to total number of seedlings, B
  $$\hat{B} = \frac{\bar{y}}{\bar{x}} = \frac{6.1}{20.6} \approx 0.30 \quad \text{with } SE(\hat{B}) = 0.032$$
16
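Estimating population mean
- Estimator for $\bar{y}_U$
  $$\hat{\bar{y}}_r = \hat{B} \bar{x}_U = \frac{\bar{y}}{\bar{x}} \bar{x}_U = \bar{y} \, \frac{\bar{x}_U}{\bar{x}}$$
- “Adjustment factor” for sample mean: $\bar{x}_U / \bar{x}$
  - A measure of discrepancy between sample and population information, $\bar{x}$ and $\bar{x}_U$
  - Improves precision if X and Y are + correlated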
17
Underlying model with B > 0
  $$y_i = B x_i$$
- B is a slope
  - B > 0 indicates X and Y are positively correlated
- Absence of intercept implies line must go through origin (0, 0)
[Figure: line $y = Bx$ through the origin; horizontal axis x, vertical axis y]
18
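Using population mean of X to adjust sample mean
  $$\hat{\bar{y}}_r = \hat{B} \bar{x}_U = \bar{y} \, \frac{\bar{x}_U}{\bar{x}}$$
- Discrepancy between sample & pop info for X is viewed as evidence that the same relative discrepancy exists between $\bar{y}$ and $\bar{y}_U$
  - If $\bar{x} < \bar{x}_U$, then $\bar{x}_U / \bar{x} > 1$: adjust $\bar{y}$ upward to get a better estimate of $\bar{y}_U$
  - If $\bar{x} > \bar{x}_U$, then $\bar{x}_U / \bar{x} < 1$: adjust $\bar{y}$ downward to get a better estimate of $\bar{y}_U$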
19
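Bias of $\hat{\bar{y}}_r$
- Ratio estimator for the population mean is biased
  $$E[\hat{\bar{y}}_r - \bar{y}_U] = E[\hat{B} \bar{x}_U - B \bar{x}_U] = \bar{x}_U \, E[\hat{B} - B] \neq 0$$
- Rules of thumb for bias of $\hat{B}$ apply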
20
Estimator for variance of $\hat{\bar{y}}_r$
  $$\hat{V}[\hat{\bar{y}}_r] = \bar{x}_U^2 \, \hat{V}[\hat{B}] = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n}$$
21
Ag example – 6
  $$\hat{\bar{y}}_r = \hat{B} \bar{x}_U = (0.9866)(313,343.283) = 309,133.6 \text{ farm acres per county in 1992}$$
22
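Ag example - 8
- $\hat{\bar{y}}_r \approx$ 309,100 farm acres / county in 1992
  $$\hat{V}[\hat{\bar{y}}_r] = \bar{x}_U^2 \, \hat{V}[\hat{B}] = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n} = (0.9025) \frac{1,002,179,462}{300} = 3,014,890.67$$
- $SE[\hat{\bar{y}}_r] = 1736 \approx 1700$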
23
Ag example – 9
- Expect a linear relationship between X and Y (Figure 3.1)
- Note that sample mean is not equal to population mean for X
  - $\bar{x}$ = mean acreage of farms per county in US in 1987 for the sample = 301,953.723 acres / county
  - $\bar{x}_U$ = average acreage of farms per county in US in 1987 = 313,343.283 acres / county
24
MSE under ratio estimation
- Recall …
  - MSE = Variance + Bias²
  - SRS estimators are unbiased, so MSE = Variance
  - Ratio estimators are biased, so MSE > Variance
- Use MSE to compare design/estimation strategies
  - EX: compare sample mean under SRS with ratio estimator for pop mean under SRS
25
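Sample mean vs. ratio estimator of mean
- $MSE[\hat{\bar{y}}_r]$ is smaller than $MSE[\bar{y}]$ if and only if
  $$R > \frac{1}{2} \, \frac{CV(x)}{CV(y)}$$
- For example, if $CV(x) \approx CV(y)$ and $R = \mathrm{Corr}(x, y) > 1/2$, ratio estimation will be better than the SRS sample mean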
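A small Python sketch of this comparison rule. The helper name `ratio_beats_sample_mean` is hypothetical, and plugging in sample values from the Ag example (sample standard deviations, sample means, and r = 0.9958) in place of the population quantities is an assumption made for illustration.

```python
def ratio_beats_sample_mean(R, cv_x, cv_y):
    """True when the rule R > CV(x) / (2 CV(y)) favors the ratio estimator."""
    return R > 0.5 * cv_x / cv_y


# Sample plug-in values from the Ag example (assumed stand-ins for population values).
cv_x = 344_830 / 301_954   # s_x / xbar
cv_y = 344_552 / 297_897   # s_y / ybar
threshold = 0.5 * cv_x / cv_y

print(round(threshold, 3), ratio_beats_sample_mean(0.9958, cv_x, cv_y))
```

Here the two coefficients of variation are nearly equal, so the threshold sits just under 1/2 and the observed correlation clears it by a wide margin.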
26
Estimating the MSE
- Estimate MSE with sample estimates of bias and variance of estimator
- This tends to underestimate MSE
  - $Bias[\hat{B}]$ and $V[\hat{B}]$ are approximations
- Estimated MSE is less biased if
  - $Bias[\hat{B}]$ is small (see earlier slide)
    - Large sample size or sampling fraction
    - High + correlation for X and Y
  - $\bar{x}$ is a precise estimate (small CV for $\bar{x}$)
  - We have a reasonably large sample size (n > 30)
27
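Ag example – 10
- Sample mean under SRS
  - $\bar{y}$ = 297,897 farm acres per county in 1992
  - $se[\bar{y}] = \left[\left(1 - \frac{300}{3078}\right) \frac{344,552^2}{300}\right]^{1/2}$ = 18,898 acres
  - $\widehat{MSE}[\bar{y}] = \hat{V}[\bar{y}]$ = 357,151,042
- Ratio estimator
  - $\hat{\bar{y}}_r$ = 309,134 farm acres per county in 1992
  - $se[\hat{\bar{y}}_r]$ = 1,736 acres
  - $\widehat{Bias}[\hat{\bar{y}}_r] = \left(1 - \frac{300}{3078}\right) \frac{1}{300 \, (301,954)} \left[0.9866 \, (344,830)^2 - 0.9958 \, (344,552)(344,830)\right] \approx -10$
  - $\widehat{MSE}[\hat{\bar{y}}_r] = \hat{V}[\hat{\bar{y}}_r] + \widehat{Bias}[\hat{\bar{y}}_r]^2 = 3,013,696 + 100 = 3,013,796$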
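The MSE comparison can be reproduced with a short Python sketch that plugs the quoted sample summaries into the bias approximation. The variable names are illustrative; the inputs are the rounded values shown in the Ag example, so results agree with the slide only up to rounding.

```python
# Rounded sample summaries quoted in the Ag example.
n, N = 300, 3078
xbar = 301_954                  # sample mean of x
B_hat, r = 0.9866, 0.9958       # estimated ratio and sample correlation
s_x, s_y = 344_830, 344_552     # sample standard deviations
se_ratio = 1_736                # standard error of the ratio estimator of the mean

fpc = 1 - n / N
# Plug-in bias of the ratio estimator of the mean (x-bar times the bias of B_hat).
bias_ratio = fpc * (B_hat * s_x**2 - r * s_x * s_y) / (n * xbar)
mse_ratio = se_ratio**2 + bias_ratio**2
# The sample mean is unbiased under SRS, so its MSE is just its variance.
mse_mean = fpc * s_y**2 / n

print(round(bias_ratio, 1), round(mse_ratio), round(mse_mean))
```

The squared bias contributes only about 100 to an MSE of roughly 3 million, so the ratio estimator's MSE is more than 100 times smaller than that of the sample mean in this example.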
28
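Estimating population total t
- Estimator for t
  $$\hat{t}_{yr} = \hat{B} t_x = \frac{\hat{t}_y}{\hat{t}_x} t_x = N \hat{\bar{y}}_r$$
- Is $\hat{t}_{yr}$ biased?
- Estimator for $V[\hat{t}_{yr}]$
  $$\hat{V}[\hat{t}_{yr}] = N^2 \, \hat{V}[\hat{\bar{y}}_r]$$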
29
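Ag example – 11
  $$\hat{t}_{yr} = \hat{B} t_x = (0.9866)(964,470,625) = 951,513,191 \text{ farm acres in US in 1992}$$
  $$\hat{V}[\hat{t}_{yr}] = t_x^2 \, \hat{V}[\hat{B}] = N^2 \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n} = N^2 \, \hat{V}[\hat{\bar{y}}_r] = 3078^2 \, (0.9025) \, \frac{1,002,179,462}{300} \approx 2.856 \times 10^{13}$$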
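A Python sketch that carries the Ag example from the estimated ratio through to the ratio estimators of the mean and total. The inputs are the values quoted on the slides; the variable names are illustrative, and the unrounded ratio is used, which is why the total matches 951,513,191 rather than 0.9866 times $t_x$.

```python
import math

n, N = 300, 3078
B_hat = 89_369_114 / 90_586_117   # unrounded ratio of sample totals
xbar_U = 313_343.283              # known population mean of x
t_x = 964_470_625                 # known population total of x
s2_e = 1_002_179_462              # residual variance s_e^2 from the slides

fpc = 1 - n / N

# Ratio estimator of the population mean and its standard error.
ybar_r = B_hat * xbar_U
var_ybar_r = fpc * s2_e / n
se_ybar_r = math.sqrt(var_ybar_r)

# Ratio estimator of the population total and its standard error.
t_yr = B_hat * t_x
var_t_yr = N**2 * var_ybar_r
se_t_yr = math.sqrt(var_t_yr)

print(round(ybar_r, 1), round(se_ybar_r), round(t_yr), round(se_t_yr))
```

The total and its standard error are just N times the corresponding quantities for the mean, since $t_x = N \bar{x}_U$.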
30
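Summary of ratio estimation
  $$B = \frac{t_y}{t_x} = \frac{\bar{y}_U}{\bar{x}_U}$$
  $$\hat{B} = \frac{\hat{t}_y}{\hat{t}_x} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$
  $$\hat{V}[\hat{B}] = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n \bar{x}_U^2} \quad (\text{est. } \bar{x}_U \text{ w/ } \bar{x})$$
  where
  $$s_e^2 = \frac{1}{n-1} \sum_{i=1}^{n} e_i^2, \qquad e_i = y_i - \hat{B} x_i$$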
31
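Summary of ratio estn – 2
  $$\hat{\bar{y}}_r = \hat{B} \bar{x}_U = \bar{y} \, \frac{\bar{x}_U}{\bar{x}}$$
  $$\hat{V}[\hat{\bar{y}}_r] = \bar{x}_U^2 \, \hat{V}[\hat{B}] = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n}$$
  $$\hat{t}_{yr} = \hat{B} t_x = \hat{t}_y \, \frac{t_x}{\hat{t}_x}$$
  $$\hat{V}[\hat{t}_{yr}] = t_x^2 \, \hat{V}[\hat{B}] = N^2 \, \hat{V}[\hat{\bar{y}}_r] = N^2 \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n}$$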
32
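Regression estimation
- What if relationship between y and x is linear, but does NOT pass through the origin?
- Better model in this case is
  $$y = B_0 + B_1 x$$
[Figure: line $y = B_0 + B_1 x$ with intercept $B_0$ and slope $B_1$; horizontal axis x, vertical axis y]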
33
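Regression estimation – 2
- New estimator is a regression estimator
  $$\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U = \bar{y} + \hat{B}_1 (\bar{x}_U - \bar{x})$$
- To estimate $\bar{y}_U$, $\hat{\bar{y}}_{reg}$ is the predicted value from the regression of y on x at $x = \bar{x}_U$
- Adjustment factor for sample mean is linear, rather than multiplicative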
34
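Estimating population mean
- Regression estimator for $\bar{y}_U$
  $$\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U = \bar{y} + \hat{B}_1 (\bar{x}_U - \bar{x})$$
- Estimating regression parameters
  $$\hat{B}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2} = \frac{r s_y}{s_x}$$
  $$\hat{B}_0 = \bar{y} - \hat{B}_1 \bar{x}$$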
35
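Estimating pop mean – 2
- Sample variances, correlation, covariance
  $$s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$
  $$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
  $$r = \frac{s_{xy}}{s_x s_y}$$
  $$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$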
36
Bias in regression estimator
  $$E[\hat{\bar{y}}_{reg} - \bar{y}_U] = -\mathrm{Cov}[\hat{B}_1, \bar{x}] \neq 0$$
37
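Estimating variance
  $$\hat{V}[\hat{\bar{y}}_{reg}] = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n}$$
  where
  $$s_e^2 = \frac{1}{n-1} \sum_{i=1}^{n} e_i^2, \qquad e_i = y_i - (\hat{B}_0 + \hat{B}_1 x_i) = y_i - \hat{y}_i$$
- Note: This is a different residual than ratio estimation (predicted values differ)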
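A minimal Python sketch of the regression estimator of the mean and its estimated variance, using the $1/(n-1)$ divisor for $s_e^2$ as written on this slide. The function name `regression_estimate` and the toy data are hypothetical.

```python
def regression_estimate(y, x, N, xbar_U):
    """Return (ybar_reg, estimated variance) for an SRS of size n from N units."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    # Least-squares slope and intercept from the sample.
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    s2_x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s2_x
    b0 = ybar - b1 * xbar
    # Residuals about the fitted line (with intercept).
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2_e = sum(e * e for e in residuals) / (n - 1)
    ybar_reg = b0 + b1 * xbar_U      # prediction at the population mean of x
    var_reg = (1 - n / N) * s2_e / n
    return ybar_reg, var_reg


# Toy data (hypothetical): slope 1.5, intercept 0.5.
ybar_reg, var_reg = regression_estimate(y=[1.0, 1.0, 4.0], x=[0.0, 1.0, 2.0], N=30, xbar_U=2.0)
print(ybar_reg, var_reg)
```

The only structural difference from the ratio sketch is the residual: predictions come from a line with an intercept rather than a line through the origin.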
38
Estimating the MSE
- Plugging sample estimates into Lohr, equation 3.13:
  $$\widehat{MSE}[\hat{\bar{y}}_{reg}] = \left(1 - \frac{n}{N}\right) \frac{s_y^2}{n} (1 - r^2)$$
39
Estimating population total t
  $$\hat{t}_{yreg} = N \hat{\bar{y}}_{reg}$$
  $$\hat{V}[\hat{t}_{yreg}] = N^2 \, \hat{V}[\hat{\bar{y}}_{reg}]$$
- Is regression estimator for t unbiased?
40
Tree example
- Goal: obtain a precise estimate of number of dead trees in an area
- Sample
  - Select n = 25 out of N = 100 plots
  - Make field determination of number of dead trees per plot, $y_i$
- Population
  - For all N = 100 plots, have photo determination of number of dead trees per plot, $x_i$
  - Calculate $\bar{x}_U$ = 11.3 dead trees per plot
41
Tree example – 2
- Lohr, p. 77-78
  - Data
  - Plot of y vs. x
  - Output from PROC REG
    - Components for calculating estimators and estimating the variance of the estimators
- We will use PROC SURVEYREG, which will give you the correct output for regression estimators
42
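Tree example – 3
- Estimated mean number of dead trees/plot
  $$\hat{\bar{y}}_{reg} = 5.059292 + 0.613274 \, (11.3) = 11.99 \text{ dead trees/plot}$$
  $$se[\hat{\bar{y}}_{reg}] = \sqrt{\left(1 - \frac{25}{100}\right) \frac{5.54834}{25}} = 0.4080 \approx 0.41$$
- Estimated total number of dead trees
  $$\hat{t}_{yreg} = 100 \, (11.99) = 1199 \text{ dead trees in area}$$
  $$se[\hat{t}_{yreg}] = 100 \, (0.4080) = 40.80 \approx 41$$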
43
Tree example – 4
- Due to small sample size, Lohr uses t-distribution w/ n − 2 degrees of freedom
  - $\alpha = .05$, $df = n - 2 = 23$, so $t_{\alpha/2, df} = t_{.025, 23} = 2.07$
- Half-width for 95% CI
  $$t_{\alpha/2, n-2} \; se[\hat{t}_{yreg}] = 2.07 \, (40.80) = 84.45$$
- Approx 95% CI for $t_y$ is (1115, 1283) dead trees
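The tree example arithmetic can be traced in Python. The regression coefficients, residual variance, and critical value are the ones quoted on the slides; the variable names are illustrative, and the sketch follows the slide in rounding the estimated mean to 11.99 before scaling up to the total.

```python
import math

n, N = 25, 100
b0_hat, b1_hat = 5.059292, 0.613274   # fitted intercept and slope quoted on the slide
xbar_U = 11.3                         # population mean of photo counts
s2_e = 5.54834                        # residual variance quoted on the slide
t_crit = 2.07                         # t critical value for 23 df, as on the slide

ybar_reg = b0_hat + b1_hat * xbar_U
se_ybar_reg = math.sqrt((1 - n / N) * s2_e / n)

# The slide rounds the estimated mean to 11.99 before computing the total.
t_hat = N * round(ybar_reg, 2)
se_t_hat = N * se_ybar_reg
half_width = t_crit * se_t_hat
ci = (t_hat - half_width, t_hat + half_width)

print(round(ybar_reg, 2), round(se_ybar_reg, 4), round(half_width, 2), ci)
```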
44
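Related estimators
- General model: $y = B_0 + B_1 x$
- Ratio estimator
  - $B_0 = 0$ gives the ratio model
  - Ratio estimator corresponds to a regression model with no intercept
- Difference estimation
  - $B_1 = 1$: slope is assumed to be 1
[Figure: line $y = B_0 + B_1 x$ with intercept $B_0$ and slope $B_1$; horizontal axis x, vertical axis y]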
45
Domain estimation under SRS
- Usually interested in estimates and inferences for subpopulations, called domains
- If we have not used stratification to set the sample size for each domain, then we should use domain estimation
  - We will assume SRS for this discussion
- If we use stratified sampling with strata = domains, then use stratum estimators (Ch 4)
  - To use stratification, need to know domain assignment for each unit in the sampling frame prior to sampling
46
Stratification vs. domain estimation
- In stratified random sampling
  - Define sample size in each stratum before collecting data
  - Sample size in stratum h is fixed, or known
  - In other words, the sample size $n_h$ is the same for each sample selected under the specified design
- In domain estimation
  - $n_d$ = sample size in domain d is random
  - Don’t know $n_d$ until after the data have been collected
  - The value of $n_d$ changes from sample to sample
47
Population partitioned into domains
- Recall U = index set for population = {1, 2, …, N}
- Domain index set for domain d = 1, 2, …, D
  - $U_d$ = {1, 2, …, $N_d$}, where $N_d$ = number of OUs in domain d in the population
- In sample of size n
  - $n_d$ = number of sample units that are from domain d
  - $S_d$ = index set for sample units belonging to domain d
[Figure: population partitioned into Domain #1 (d = 1), d = 2, . . ., Domain D (d = D)]
48
Boat owner example
- Population
  - N = 400,000 boat owners (currently licensed)
- Sample
  - n = 1,500 owners selected using SRS
- Divide universe (population) into 2 domains
  - d = 1: own open motor boat > 16 ft. (large boat)
  - d = 2: do not own this type of boat
- Of the n = 1500 sample owners:
  - $n_1$ = 472 owners of open motor boat > 16 ft.
  - $n_2$ = 1028 owners who do not own this kind of boat
49
New population parameters
- Domain mean
  $$\bar{y}_{U_d} = \frac{1}{N_d} \sum_{i \in U_d} y_i$$
- Domain total
  $$t_d = \sum_{i \in U_d} y_i$$
- Notation
  - $i \in U_d$: "Unit i belongs to domain d"
  - $i \notin U_d$: "Unit i does NOT belong to domain d"
50
Boat owner example - 2
- Estimate population domain mean
  - Estimate the average number of children for boat owners from domain 1
  - Estimate proportion of boat owners from domain 1 who have children
- Estimate population domain total
  - Estimate the total number of children for large boat owners (domain 1)
51
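New population parameter – 2
- Ratio form of population domain mean
  $$\bar{y}_{U_d} = \frac{\sum_{i \in U_d} y_i}{N_d} = \frac{\sum_{i \in U_d} y_i / N}{N_d / N} = \frac{\sum_{i=1}^{N} u_i / N}{\sum_{i=1}^{N} x_i / N} = \frac{\bar{u}_U}{\bar{x}_U} = B$$
- Numerator variable
  $$u_i = \begin{cases} y_i & \text{if } i \in U_d \\ 0 & \text{if } i \notin U_d \end{cases}$$
- Denominator variable
  $$x_i = \begin{cases} 1 & \text{if } i \in U_d \\ 0 & \text{if } i \notin U_d \end{cases}$$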
52
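Boat owner example - 3
- Estimate mean number of children for owners from domain 1
  - $y_i$ = number of children for owner i
  $$u_i = \begin{cases} y_i & \text{if owner } i \in U_1 \text{ (domain 1)} \\ 0 & \text{otherwise (not in domain 1)} \end{cases}$$
  $$x_i = \begin{cases} 1 & \text{if owner } i \in U_1 \\ 0 & \text{otherwise} \end{cases}$$
- Zero values for OUs that are not in domain 1
- Applies to whole pop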
53
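Boat example – 4
Owner (i)   Domain (d_i)   # Kids (y_i)   Num. (u_i)   Den. (x_i)
1           1              3              3            1
2           1              2              2            1
3           2              5              0            0
4           1              0              0            1
5           2              0              0            0
6           2              1              0            0
7           1              1              1            1
8           2              2              0            0
…
(u_i and x_i are filled in for domain 1, following the definitions of the numerator and denominator variables)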
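A short Python sketch of how the numerator and denominator variables are built for a chosen domain, using the eight owners listed in the table. The helper name `domain_variables` is hypothetical.

```python
# (owner, domain, number of kids) for the eight owners listed in the table.
owners = [(1, 1, 3), (2, 1, 2), (3, 2, 5), (4, 1, 0),
          (5, 2, 0), (6, 2, 1), (7, 1, 1), (8, 2, 2)]


def domain_variables(records, d):
    """Return (u, x): u_i = y_i inside domain d and 0 elsewhere; x_i = domain indicator."""
    u = [y if dom == d else 0 for _, dom, y in records]
    x = [1 if dom == d else 0 for _, dom, _ in records]
    return u, x


u, x = domain_variables(owners, d=1)
# Ratio of totals equals the mean of y over the owners in domain 1.
domain_mean = sum(u) / sum(x)
print(u, x, domain_mean)
```

Both variables are defined for every sampled unit, which is what lets the ratio machinery be applied to the whole SRS rather than to the domain subsample alone.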
54
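Estimator for population domain mean $\bar{y}_{U_d}$
  $$\bar{y}_d = \hat{B} = \frac{\bar{u}}{\bar{x}} = \frac{\frac{1}{n} \sum_{i=1}^{n} u_i}{\frac{1}{n} \sum_{i=1}^{n} x_i} = \frac{\sum_{i=1}^{n} u_i}{\sum_{i=1}^{n} x_i} = \frac{\sum_{i \in S_d} y_i}{n_d}$$
- $\bar{y}_d$ = sample mean of observations in domain d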
55
Boat example – 5
Domain 1 data

Number of Children   Number of Respondents
0                    76
1                    139
2                    166
3                    63
4                    19
5                    5
6                    3
8                    1
Total                472
56
Boat example – 6
Domain 1 and domain 2 data combined

u_i      Number of Respondents
0        1104
1        139
2        166
3        63
4        19
5        5
6        3
8        1
Total    1500

1104 zeros = 76 zeros from domain 1 + 1028 zeros from domain 2
57
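Two ways of estimating mean
Boat example – 7
- Whole data set
  $$\bar{u} = \frac{1}{n} \sum_{i=1}^{n} u_i = \frac{1}{1500} \left[1028(0) + 76(0) + 139(1) + 166(2) + \cdots + 1(8)\right] = \frac{787}{1500} = 0.524667$$
  $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{n_1}{n} = \frac{472}{1500} = 0.314667$$
  $$\bar{y}_1 = \hat{B} = \frac{\bar{u}}{\bar{x}} = \frac{0.524667}{0.314667} = 1.67 \text{ children per large boat owner}$$
- Domain 1 data only
  $$\bar{y}_1 = \frac{1}{n_1} \sum_{i \in S_1} y_i = \frac{787}{472} = 1.67 \text{ children per large boat owner}$$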
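The two routes to the domain mean can be compared directly in Python using the domain 1 frequency table. The dictionary `counts` is just a convenient encoding of that table (number of children mapped to number of respondents); the variable names are illustrative.

```python
# Domain 1 frequency table: number of children -> number of respondents.
counts = {0: 76, 1: 139, 2: 166, 3: 63, 4: 19, 5: 5, 6: 3, 8: 1}

n = 1500                            # full SRS size
n1 = sum(counts.values())           # domain 1 sample size
total_children = sum(k * c for k, c in counts.items())

# Route 1: whole data set, with u_i = 0 and x_i = 0 for owners outside domain 1.
u_bar = total_children / n
x_bar = n1 / n
mean_via_ratio = u_bar / x_bar

# Route 2: domain 1 data only.
mean_via_domain = total_children / n1

print(n1, total_children, round(mean_via_ratio, 6), round(mean_via_domain, 6))
```

The factor 1/n cancels in the ratio, so the two routes agree exactly; the ratio form matters for the variance, not for the point estimate.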
58
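Estimator for variance of $\bar{y}_d$
  $$\bar{y}_d = \hat{B} = \frac{\bar{u}}{\bar{x}}$$
  $$\hat{V}[\bar{y}_d] = \left(1 - \frac{n}{N}\right) \frac{1}{n} \left(\frac{N}{N_d}\right)^2 \frac{n_d - 1}{n - 1} \, s_{yd}^2$$
  where
  $$s_{yd}^2 = \frac{1}{n_d - 1} \sum_{i \in S_d} (y_i - \hat{B} x_i)^2$$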
59
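Boat example – 8
  $$\hat{V}[\bar{y}_1] = \left(1 - \frac{n}{N}\right) \frac{1}{n} \left(\frac{N}{N_1}\right)^2 \frac{n_1 - 1}{n - 1} \, s_{y1}^2$$
- $1 - \frac{n}{N} = 1 - \frac{1,500}{400,000} \approx 1$, so can ignore FPC
- $\frac{N_1}{N} = \frac{?}{400,000}$ -- estimate with $\frac{n_1}{n} = \frac{472}{1500}$, so $\frac{N}{N_1} \approx \frac{n}{n_1} = \frac{1500}{472} = 3.177966$
- $\frac{n_1 - 1}{n - 1} = \frac{472 - 1}{1500 - 1} \approx \frac{n_1}{n} = \frac{1}{3.177966}$
- $s_{y1}^2 = \frac{1}{n_1 - 1} \sum_{i \in S_1} (y_i - \hat{B} x_i)^2 = 94.111078$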
60
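Boat example – 9
  $$\hat{V}[\bar{y}_1] = \left(1 - \frac{n}{N}\right) \frac{1}{n} \left(\frac{N}{N_1}\right)^2 \frac{n_1 - 1}{n - 1} \, s_{y1}^2 \approx \frac{1}{n} \left(\frac{n}{n_1}\right)^2 \frac{n_1}{n} \, s_{y1}^2 = \frac{s_{y1}^2}{n_1} = \frac{94.111078}{472} = 0.199388$$
- $\bar{y}_1 = 1.667373 \approx 1.67$ children per large boat owner
- $SE[\bar{y}_1] = 0.4465287 \approx 0.45$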
61
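Approximation for estimator of variance of $\bar{y}_d$
  $$\hat{V}[\bar{y}_d] \approx \left(1 - \frac{n}{N}\right) \frac{s_{yd}^2}{n_d}$$
  assuming
  $$\frac{N_d}{N} \approx \frac{n_d}{n} \quad \text{and} \quad \frac{n_d - 1}{n - 1} \approx \frac{n_d}{n}$$
  where
  $$s_{yd}^2 = \frac{1}{n_d - 1} \sum_{i \in S_d} (y_i - \hat{B} x_i)^2$$
- Uses domain d data only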
62
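Estimated variance of $\hat{B}$
- Estimator for $V[\hat{B}]$
  $$\hat{V}[\hat{B}] = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n \bar{x}_U^2}$$
  where
  $$s_e^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \hat{B} x_i)^2 = \frac{1}{n-1} \sum_{i=1}^{n} e_i^2, \qquad e_i = y_i - \hat{B} x_i$$
- Domain variance estimator is directly related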
63
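Relationship to estimating a ratio
- Domain mean as a ratio: $\bar{y}_{U_d} = B = \bar{u}_U / \bar{x}_U$, estimated by $\hat{B} = \bar{u} / \bar{x}$
- Population mean of X
  $$\bar{x}_U = \frac{N_d}{N}$$
- Residual
  $$e_i = u_i - \hat{B} x_i = \begin{cases} y_i - \hat{B} x_i & \text{if } i \in S_d \\ 0 - \hat{B}(0) = 0 & \text{if } i \notin S_d \end{cases}$$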
64
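Relationship to estimating a ratio - 2
- Domain mean as a ratio: $\bar{y}_{U_d} = B = \bar{u}_U / \bar{x}_U$, estimated by $\hat{B} = \bar{u} / \bar{x}$
- Residual variance
  $$s_e^2 = \frac{1}{n-1} \sum_{i=1}^{n} (u_i - \hat{B} x_i)^2 = \frac{1}{n-1} \left[\sum_{i \in S_d} (y_i - \hat{B} x_i)^2 + \sum_{i \notin S_d} 0^2\right]$$
  $$= \frac{n_d - 1}{n - 1} \cdot \frac{1}{n_d - 1} \sum_{i \in S_d} (y_i - \hat{B} x_i)^2 = \frac{n_d - 1}{n - 1} \, s_{yd}^2$$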
65
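Estimator for variance of $\bar{y}_d$
  $$\hat{V}[\hat{B}] = \left(1 - \frac{n}{N}\right) \frac{1}{n \bar{x}_U^2} \, s_e^2$$
  With $\bar{x}_U = N_d / N$ and $s_e^2 = \frac{n_d - 1}{n - 1} s_{yd}^2$:
  $$\hat{V}[\bar{y}_d] = \left(1 - \frac{n}{N}\right) \frac{1}{n} \left(\frac{N}{N_d}\right)^2 \frac{n_d - 1}{n - 1} \, s_{yd}^2$$
  where
  $$s_{yd}^2 = \frac{1}{n_d - 1} \sum_{i \in S_d} (y_i - \hat{B} x_i)^2$$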
66
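Estimating a population domain total
- If we know the domain sizes, $N_d$
  $$t_d = N_d \, \bar{y}_{U_d}$$
  $$\hat{t}_{yd} = N_d \, \bar{y}_d \quad \text{if } N_d \text{ known}$$
  $$\hat{V}[\hat{t}_{yd}] = N_d^2 \, \hat{V}[\bar{y}_d]$$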
67
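Estimating a population domain total - 2
- If we do NOT know the domain sizes
  $$t_d = N_d \, \bar{y}_{U_d}$$
  $$\hat{t}_{yd} = N \bar{u} \quad \text{if } N_d \text{ unknown}$$
  $$\hat{V}[\hat{t}_{yd}] = N^2 \, \hat{V}[\bar{u}] = N^2 \left(1 - \frac{n}{N}\right) \frac{s_u^2}{n}$$
  where
  $$s_u^2 = \frac{1}{n-1} \sum_{i=1}^{n} (u_i - \bar{u})^2$$
- Standard SRS estimator using u as the variable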
68
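Boat example – 10
- Do not know the domain size, $N_1$
  $$\hat{t}_{y1} = N \bar{u} = 400,000 \, (0.524667) = 209,867 \approx 210,000 \text{ children}$$
  $$\hat{V}[\hat{t}_{y1}] = N^2 \, \hat{V}[\bar{u}] = N^2 \left(1 - \frac{n}{N}\right) \frac{s_u^2}{n} \approx 400,000^2 \, (1) \, \frac{1.0394178}{1500} = 110,871,232$$
  $$SE(\hat{t}_{y1}) = \sqrt{\hat{V}[\hat{t}_{y1}]} = 10,530 \approx 10,000$$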
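A Python sketch of the domain total when $N_1$ is unknown, treating u as an ordinary SRS variable. The `counts` dictionary encodes the domain 1 frequency table (the 1028 owners outside domain 1 all have u = 0), and setting the FPC to 1 follows the slide's approximation; the variable names are illustrative.

```python
import math

# Domain 1 frequency table: number of children -> number of respondents.
# Owners outside domain 1 contribute u_i = 0, so they add nothing to the sums.
counts = {0: 76, 1: 139, 2: 166, 3: 63, 4: 19, 5: 5, 6: 3, 8: 1}
N, n = 400_000, 1500

sum_u = sum(k * c for k, c in counts.items())
sum_u2 = sum(k * k * c for k, c in counts.items())

u_bar = sum_u / n
s2_u = (sum_u2 - n * u_bar**2) / (n - 1)   # sample variance of u over all n units

t_hat = N * u_bar
fpc = 1.0                                  # the slide treats 1 - n/N as 1
var_t = N**2 * fpc * s2_u / n
se_t = math.sqrt(var_t)

print(round(t_hat), round(s2_u, 7), round(se_t))
```

Replacing `fpc = 1.0` with `1 - n / N` changes the standard error only slightly here because the sampling fraction is under half a percent.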
69
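Comparing 2 domain means
- Suppose we want to test the hypothesis that two domain means are equal
  $$H_0: \bar{y}_{U_1} = \bar{y}_{U_2}$$
  $$H_1: \bar{y}_{U_1} \neq \bar{y}_{U_2}$$
- Construct a z-test with Type 1 error rate $\alpha$ (for falsely rejecting null hypothesis)
  - Test statistic:
    $$z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\hat{V}(\bar{y}_1) + \hat{V}(\bar{y}_2)}}$$
  - Critical value: $z_{\alpha/2}$
  - Reject $H_0$ if $|z| > z_{\alpha/2}$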
70
Boat example - 10
- Large boat owners (d = 1)
  - $\bar{y}_1$ = 1.667373 children per large boat owner
  - $SE[\bar{y}_1]$ = 0.4465287
- Other boat owners (d = 2)
  - $\bar{y}_2$ = 2.501059 children per other boat owner
  - $SE[\bar{y}_2]$ = 0.669793
71
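Boat example - 11
- Test whether domain means are equal at $\alpha$ = 0.05
- Calculate z-statistic
  $$z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\hat{V}(\bar{y}_1) + \hat{V}(\bar{y}_2)}} = \frac{1.667373 - 2.501059}{\sqrt{0.446529^2 + 0.669793^2}} = \frac{-0.833686}{0.804991} = -1.04$$
- Critical value $z_{\alpha/2} = z_{0.025} = 1.96$
- Apply rejection rule
  - $|z| = |-1.04| = 1.04 < 1.96 = z_{0.025}$
  - Fail to reject $H_0$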
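The z-test can be written out in Python using the domain estimates and standard errors quoted in the boat example. The critical value comes from the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Domain estimates and standard errors quoted in the boat example.
y1, se1 = 1.667373, 0.4465287   # large boat owners (d = 1)
y2, se2 = 2.501059, 0.669793    # other boat owners (d = 2)
alpha = 0.05

z = (y1 - y2) / math.sqrt(se1**2 + se2**2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
reject_h0 = abs(z) > z_crit

print(round(z, 2), round(z_crit, 2), reject_h0)
```

The statistic adds the two estimated variances without a covariance term, as in the test statistic given for comparing two domain means.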
72
Overview
- Population parameters
  - Mean
  - Total
  - Proportion (w/ fixed denom)
  - Ratio
    - Includes proportion w/ random denominator
  - Domain mean
  - Domain total
73
Overview – 2
- Estimation strategies
  - No auxiliary information
  - Auxiliary information X, no intercept
    - Y and X positively correlated
    - Linear relationship passes through origin
  - Auxiliary information X, intercept
    - Y and X positively correlated
    - Linear relationship does not pass through origin
74
Overview – 3
- Make a table of population parameters (rows) by estimation strategy (columns)
- In each cell, write down
  - Estimator for population parameter, $\hat{\theta}$
  - Estimator for variance of estimated parameter, $\hat{V}(\hat{\theta})$
  - Residual $e_i$
- Notes
  - Some cells will be blank
  - Look for relationship between mean and total, and mean and proportion
  - Look at how the variance formulas for many of the estimators are essentially the same form