Ch 4: Stratified Random Sampling (STS)

DEFN: A stratified random sample is obtained by separating the population units into non-overlapping groups, called strata, and then selecting a random sample from each stratum.



Procedure
- Divide sampling frame into mutually exclusive and exhaustive strata
  - Assign each SU to one and only one stratum
- Select a random sample from each stratum
  - Select random sample from stratum 1
  - Select random sample from stratum 2
  - …

[Figure: population divided into strata h = 1, h = 2, …, h = H, labeled Stratum #1 through Stratum H]


Ag example
- Divide 3078 counties into 4 strata corresponding to regions of the country
  - Northeast (h = 1)
  - North central (h = 2)
  - South (h = 3)
  - West (h = 4)
- Select a SRS from each stratum
  - In this example, stratum sample size is proportional to stratum population size
  - 300 is 9.75% of 3078
  - Each stratum sample size is 9.75% of stratum population


Ag example – 2

Stratum (h)   Stratum size (N_h)   Sample size (n_h)
1 (NE)                 220                  21
2 (NC)                1054                 103
3 (S)                 1382                 135
4 (W)                  422                  41
Total                 3078                 300


Procedure – 2
- Need to have a stratum value for each SU in the frame
- Minimum set of variables in sampling frame: SU id, stratum assignment

Stratum (h)   SU (j)
1             1
1             2
1             3
2             1
2             2
…             …


Ag example – 3

Stratum (h)   SU (j)
1             1
1             2
1             3
…             …
1             220
2             1
2             2
…             …
4             421
4             422


Procedure – 3
- Each stratum sample is selected independently of others
  - New set of random numbers for each stratum
  - Basis for deriving properties of estimators
- Design within a stratum
  - For Ch 4, we will assume a SRS is selected within each stratum
  - Can use any probability design within a stratum
  - Sample designs do not need to be the same across strata
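The selection step can be sketched in a few lines of Python. This is a minimal illustration, assuming the frame is held as a dictionary from stratum label to a list of SU ids; the function name `stratified_srs`, the dictionary layout, and the seed are hypothetical choices, not part of the slides. The stratum sizes and sample sizes are those of the Ag example.

```python
import random

def stratified_srs(frame, n_h, seed=None):
    """Draw an SRS without replacement from each stratum, independently.

    frame: dict mapping stratum label -> list of SU ids (assumed layout)
    n_h:   dict mapping stratum label -> stratum sample size
    """
    rng = random.Random(seed)
    sample = {}
    for h, units in frame.items():
        # Each stratum gets its own draw of fresh random numbers,
        # so the stratum samples are independent of one another.
        sample[h] = rng.sample(units, n_h[h])
    return sample

# Ag example: SU ids 1..N_h within each of the four regional strata.
N_h = {"NE": 220, "NC": 1054, "S": 1382, "W": 422}
n_h = {"NE": 21, "NC": 103, "S": 135, "W": 41}
frame = {h: list(range(1, N + 1)) for h, N in N_h.items()}
sample = stratified_srs(frame, n_h, seed=1)
```

Only the per-stratum draw would change if a different probability design (for example SYS) were used inside a stratum.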


Uses for STS
- To improve representativeness of sample
  - In SRS, can get ANY combination of n elements in the sample
  - In SYS, we severely restricted the set to k possible samples
    - Can get “bad” samples
    - Less likely to get unbalanced samples if frame is sorted using a variable correlated with Y


Uses for STS – 2
- To improve representativeness of sample - 2
  - In STS, we also exclude samples
    - Explicitly choose strata to restrict possible samples
  - Improve chance of getting representative samples if use strata to encourage spread across variation in population


Uses for STS – 3
- To improve precision of estimates for population parameters
  - Achieved by creating strata so that
    - variation WITHIN stratum is small
    - variation AMONG strata is large
  - Uses same principle as “blocking” in experimental design
  - Improve precision of estimate for population parameter by obtaining precise estimates within each stratum


Uses for STS – 4
- To study specific subpopulations
  - Define strata to be subpopulations of interest
  - Examples
    - Male v. female
    - Racial/ethnic minorities
    - Geographic regions
    - Population density (rural v. urban)
    - College classification
  - Can establish sample size within each stratum to achieve desired precision level for estimates of subpopulations


Uses for STS – 5
- To assist in implementing operational aspects of survey
  - May wish to apply different sampling and data collection procedures for different groups
  - Agricultural surveys (sample designs)
    - Large farms in one stratum are selected using a list frame
    - Smaller farms belong to a second stratum, and are selected using an area sample
  - Survey of employers (data collection methods)
    - Large firms: use mail survey because information is too voluminous to get over the phone
    - Small firms: telephone survey


Estimation strategy
- Objective: estimate population total
- Obtain estimates for each stratum
  - Estimate stratum population total
    - Use SRS estimator for stratum total
  - Estimate variance of estimator in each stratum
    - Use SRS estimator for variance of estimated stratum total
- Pool estimates across strata
  - Sum stratum total estimates and variance estimates across strata
  - Variance formula justified by independence of samples across strata


Ag example – 4

Stratum (h)   Stratum size (N_h)   Sample size (n_h)   Sample mean ($\bar{y}_h$)   Estimated stratum total ($\hat{t}_h$)
1 (NE)                 220                  21                     97,630                     21,478,558
2 (NC)                1054                 103                    300,504                    316,731,379
3 (S)                 1382                 135                    211,315                    292,037,391
4 (W)                  422                  41                    662,295                    279,488,706
Total                 3078                 300

- $\bar{y}_h$: acres devoted to farms / co
- $\hat{t}_h$: total farm acres for stratum


Ag example – 5
- Estimated total farm acres in US

$$\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h \bar{y}_h = 220(97{,}630) + 1054(300{,}504) + 1382(211{,}315) + 422(662{,}295) = 909{,}736{,}034 \text{ farm acres in US}$$


Ag example – 6

Stratum (h)   Stratum size (N_h)   Sample size (n_h)   Sample variance ($s_h^2$)
1 (NE)                 220                  21                7,647,472,708
2 (NC)                1054                 103               29,618,183,543
3 (S)                 1382                 135               53,587,487,856
4 (W)                  422                  41              396,185,950,266
Total                 3078                 300


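Ag example – 7
- Estimated variance for estimated total farm acres in US

$$\hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \hat{V}(\hat{t}_h) = \sum_{h=1}^{H} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}$$

$$= 220^2 \left(1 - \frac{21}{220}\right) \frac{7{,}647{,}472{,}708}{21} + 1054^2(\ldots) + 1382^2(\ldots) + 422^2(\ldots) = 2.5419 \times 10^{15}$$

$$SE(\hat{t}_{str}) = \sqrt{\hat{V}(\hat{t}_{str})} = 50{,}417{,}248 \text{ acres}$$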
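The pooling of stratum estimates can be checked numerically. The sketch below plugs the stratum sizes, sample sizes, sample means, and sample variances from the Ag example tables into the STS formulas for the total and its variance; the function name `sts_total` and the tuple layout are illustrative assumptions. Because the stratum means in the table are rounded, the computed total differs slightly from the 909,736,034 shown on the slide.

```python
from math import sqrt

# (N_h, n_h, sample mean, sample variance) per stratum, from the Ag example tables
strata = {
    "NE": (220, 21, 97_630, 7_647_472_708),
    "NC": (1054, 103, 300_504, 29_618_183_543),
    "S": (1382, 135, 211_315, 53_587_487_856),
    "W": (422, 41, 662_295, 396_185_950_266),
}

def sts_total(strata):
    """Return (estimated total, estimated variance, standard error)."""
    # Sum of stratum totals N_h * ybar_h
    t_hat = sum(N * ybar for N, n, ybar, s2 in strata.values())
    # Sum of SRS variance estimates, valid because stratum samples are independent
    v_hat = sum(N**2 * (1 - n / N) * s2 / n for N, n, ybar, s2 in strata.values())
    return t_hat, v_hat, sqrt(v_hat)

t_hat, v_hat, se = sts_total(strata)
print(f"t_hat = {t_hat:,.0f}, V_hat = {v_hat:.4e}, SE = {se:,.0f}")
```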


Ag example – 8
- Compare with SRS estimates

$$\hat{t} = N\bar{y} = 916{,}927{,}100 \text{ acres}$$

$$\hat{V}(N\bar{y}) = 3.38368 \times 10^{15}$$

$$SE(N\bar{y}) = \sqrt{\hat{V}(N\bar{y})} = 58{,}169{,}381 \text{ acres}$$


Estimation strategy - 2
- Objective: estimate population mean
- Divide estimated total by population size

$$\bar{y}_{str} = \frac{\hat{t}_{str}}{N}$$

- OR equivalently,
  - Obtain estimates for each stratum
    - Estimate stratum mean with stratum sample mean
  - Pool estimates across strata
    - Use weighted average of stratum sample means with weights proportional to stratum sizes N_h


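Ag example – 9
- Estimated mean farm acres / county

$$\bar{y}_{str} = \frac{\hat{t}_{str}}{N} = \frac{909{,}736{,}034}{3078} \approx 295{,}561 \text{ farm acres / county}$$

or

$$\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h = \frac{220}{3078}(97{,}630) + \frac{1054}{3078}(300{,}504) + \frac{1382}{3078}(211{,}315) + \frac{422}{3078}(662{,}295) \approx 295{,}561 \text{ farm acres / county}$$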


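Ag example – 10
- Estimate variance of estimated mean farm acres / county

$$\hat{V}(\bar{y}_{str}) = \frac{1}{N^2} \hat{V}(\hat{t}_{str})$$

or

$$\hat{V}(\bar{y}_{str}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \hat{V}(\bar{y}_h)$$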
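As a worked check of the first form, the standard error of the estimated mean follows from dividing the standard error of the estimated total by N. The numbers are the ones already given for the Ag example; the rounded result is computed here and is not shown on the slides.

$$SE(\bar{y}_{str}) = \frac{1}{N} SE(\hat{t}_{str}) = \frac{50{,}417{,}248}{3078} \approx 16{,}380 \text{ farm acres / county}$$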


Notation
- Index set for stratum h = 1, 2, …, H
  - U_h = {1, 2, …, N_h}
  - N_h = number of OUs in stratum h in the population
- Partition sample of size n across strata
  - n_h = number of sample units from stratum h (fixed)
  - S_h = index set for sample belonging to stratum h

[Figure: population divided into strata h = 1, h = 2, …, h = H, labeled Stratum 1 through Stratum H]


Notation – 2
- Population sizes
  - N_h = number of OUs in stratum h in the population
  - N = N_1 + N_2 + … + N_H
- Partition sample of size n across strata
  - n_h = number of sample units from stratum h
  - n = n_1 + n_2 + … + n_H
  - The stratum sample sizes are fixed
    - In domain estimation, they are random
- For now, we will assume that the sampling unit (SU) is an observation unit (OU)


Notation – 3
- Response variable
  - y_hj = characteristic of interest for OU j in stratum h
- Population and stratum totals

$$t_h = \sum_{j=1}^{N_h} y_{hj} \quad \text{(population total in stratum } h\text{)}$$

$$t = \sum_{h=1}^{H} t_h \quad \text{(population total)}$$


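Notation – 4
- Population and stratum means

$$\bar{y}_{hU} = \frac{1}{N_h} \sum_{j=1}^{N_h} y_{hj} \quad \text{(population mean in stratum } h\text{)}$$

$$\bar{y}_U = \frac{t}{N} = \frac{1}{N} \sum_{h=1}^{H} \sum_{j=1}^{N_h} y_{hj} \quad \text{(overall population mean)}$$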


Notation – 5
- Population stratum variance

$$S_h^2 = \frac{1}{N_h - 1} \sum_{j=1}^{N_h} \left(y_{hj} - \bar{y}_{hU}\right)^2 \quad \text{(population variance in stratum } h\text{)}$$


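Notation – 6
- SRS estimators for stratum parameters

$$\bar{y}_h = \frac{1}{n_h} \sum_{j \in S_h} y_{hj}$$

$$\hat{t}_h = \frac{N_h}{n_h} \sum_{j \in S_h} y_{hj} = N_h \bar{y}_h$$

$$s_h^2 = \frac{1}{n_h - 1} \sum_{j \in S_h} \left(y_{hj} - \bar{y}_h\right)^2$$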


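STS estimators
- For population total

$$\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h \bar{y}_h$$

$$\hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \hat{V}(\hat{t}_h) = \sum_{h=1}^{H} N_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h}$$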


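STS estimators – 2
- For population mean

$$\bar{y}_{str} = \frac{\hat{t}_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h$$

$$\hat{V}(\bar{y}_{str}) = \frac{1}{N^2} \hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \hat{V}(\bar{y}_h)$$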


STS estimators – 3
- For population proportion
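A sketch of how the proportion case follows from the mean: treating the response as a 0/1 indicator, the stratum sample mean becomes the stratum sample proportion, and the mean formulas carry over. The symbol $\hat{p}_h$ for the stratum sample proportion is an assumption here, chosen to match the $\hat{p}_{str}$ used elsewhere in the chapter; the variance line uses the usual SRS estimator for a proportion within each stratum.

$$\hat{p}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \hat{p}_h$$

$$\hat{V}(\hat{p}_{str}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \left(1 - \frac{n_h}{N_h}\right) \frac{\hat{p}_h (1 - \hat{p}_h)}{n_h - 1}$$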


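Properties
- STS estimators are unbiased
  - Each estimate of stratum population mean or total is unbiased (from SRS)

$$E\left[\sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h\right] = \sum_{h=1}^{H} \frac{N_h}{N} E[\bar{y}_h] = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_{hU} = \bar{y}_U$$

  - $\bar{y}_{str}$ is unbiased estimator of $\bar{y}_U$
  - $\hat{t}_{str}$ is unbiased estimator of $t$
  - $\hat{p}_{str}$ is unbiased estimator of $p$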


Properties – 2
- Inclusion probability for SU j in stratum h
  - Definition in words:
  - Formula: $\pi_{hj}$ =


Properties – 3
- In general, for any stratification scheme, STS will provide a more precise estimate of the population parameters (mean, total, proportion) than SRS
  - For example

$$V(\bar{y}_{str}) \le V(\bar{y})$$

- Confidence intervals
  - Same form (using $z_{\alpha/2}$)
  - Different CLT


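Sampling weights
- Note that

$$\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h \bar{y}_h = \sum_{h=1}^{H} \sum_{j \in S_h} \frac{N_h}{n_h} y_{hj} = \sum_{h=1}^{H} \sum_{j \in S_h} w_{hj} y_{hj}$$

- Sampling weight for SU j in stratum h

$$w_{hj} = \frac{N_h}{n_h}$$

- A sampling weight is a measure of the number of units in the population represented by SU j in stratum h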


Example
- Note: weights for each OU within a stratum are the same

Stratum (h)   N_h   n_h   w_hj = N_h / n_h
h = 1           6     3   6/3 = 2
h = 2           2     2   2/2 = 1
h = 3           4     1   4/1 = 4
h = 4           5     3   5/3 = 1.67
Total          17     9


Example – 2
- Dataset from study

Stratum (h)   N_h   n_h   w_hj   y_hj
1               6     3   2        53
1               6     3   2       107
1               6     3   2        83
2               2     2   1        34
2               2     2   1        22
3               4     1   4        90
4               5     3   1.67     12
4               5     3   1.67     34
4               5     3   1.67     15
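With the weights attached to each record, the estimated total is a single weighted sum over the dataset. The short sketch below applies that to the nine records of this example; the variable names are illustrative, and the weights are recomputed as N_h / n_h rather than taken from the rounded 1.67.

```python
# (stratum h, N_h, n_h, y_hj) for each sampled OU in the example dataset
rows = [
    (1, 6, 3, 53), (1, 6, 3, 107), (1, 6, 3, 83),
    (2, 2, 2, 34), (2, 2, 2, 22),
    (3, 4, 1, 90),
    (4, 5, 3, 12), (4, 5, 3, 34), (4, 5, 3, 15),
]

# Sampling weight w_hj = N_h / n_h, identical for all OUs in the same stratum
weights = [N / n for _, N, n, _ in rows]

# Estimated population total: sum of w_hj * y_hj over the whole sample
t_hat = sum(w * y for w, (_, _, _, y) in zip(weights, rows))

# The weights themselves sum to the population size N = 17
N_hat = sum(weights)
print(round(t_hat, 2), round(N_hat, 2))
```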


Sampling weights – 2
- For STS estimators presented in Ch 4, sampling weight is the inverse inclusion probability

$$\pi_{hj} = \frac{n_h}{N_h}$$

$$w_{hj} = \frac{1}{\pi_{hj}} = \frac{N_h}{n_h}$$


Defining strata
- Depends on purpose of stratification
  - Improved representativeness
  - Improved precision
  - Subpopulation estimates
  - Implementing operational aspects
- If possible, use factors related to variation in characteristic of interest, Y
  - Geography, political boundaries, population density
  - Gender, ethnicity/race, ISU classification
  - Size or type of business
- Remember
  - Stratum variable must be available for all OUs


Allocation strategies
- Want to sample n units from the population
- An allocation rule defines how n will be spread across the H strata and thus defines values for n_h
- Overview for estimating population parameters

Stratum costs same   Stratum variances same   Allocation rule
No                   No                       Optimal
Yes                  No                       Neyman
Yes                  Yes                      Proportional

- Neyman and proportional allocation are special cases of optimal allocation


Allocation strategies – 2
- Focus is on estimating parameter for entire population
  - We’ll look at subpopulations later
- Factors affecting allocation rule
  - Number of OUs in stratum
  - Data collection costs within strata
  - Within-stratum variance


Proportional allocation
- Stratum sample size allocated in proportion to population size within stratum

$$\frac{n_h}{N_h} = \frac{n}{N}$$

- Allocation rule

$$n_h = \frac{N_h}{N} n$$


Ag example – 11

Stratum h   Stratum total N_h   Stratum sample size n_h   n_h = n (N_h / N)
1 (NE)               220                   21             .0975 (220) = 21.4
2 (NC)              1054                  103             .0975 (1054) = 102.7
3 (S)               1382                  135             .0975 (1382) = 134.7
4 (W)                422                   41             .0975 (422) = 41.1
Total           N = 3078              300 = n
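The allocation in this table can be reproduced with a few lines of Python. This is a minimal sketch; the function name `proportional_allocation` is an illustrative choice, and simple rounding to the nearest integer is an assumption that happens to sum to n = 300 here but need not do so in general.

```python
def proportional_allocation(N_h, n):
    """Return the unrounded n_h = n * N_h / N for each stratum."""
    N = sum(N_h.values())
    return {h: n * Nh / N for h, Nh in N_h.items()}

N_h = {"NE": 220, "NC": 1054, "S": 1382, "W": 422}
raw = proportional_allocation(N_h, 300)

# Round to whole sample sizes; check the total afterwards because
# rounding can leave the sum one or two units away from n.
n_h = {h: round(x) for h, x in raw.items()}
print(n_h, sum(n_h.values()))
```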


Proportional allocation – 2
- Proportional allocation rule implies
  - Sampling fraction for stratum h is constant across strata

$$\frac{n_h}{N_h} = \frac{n}{N}$$

  - Inclusion probability is constant for all SUs in population

$$\pi_{hj} = \frac{n_h}{N_h} = \frac{n}{N}$$

  - Sampling weight for each unit is constant

$$w_{hj} = \frac{1}{\pi_{hj}} = \frac{N}{n}$$


Proportional allocation – 3
- STS with proportional allocation leads to a self-weighting sample
- What is a self-weighting sample?
  - If w_hj has the same value for every OU in the sample, a sample is said to be self-weighting
  - Since each weight is the same, each sample unit represents the same number of units in the population
  - For self-weighting samples, estimator for population mean reduces to sample mean $\bar{y}$
  - Estimator for variance does NOT necessarily reduce to SRS estimator for variance of $\bar{y}$


Proportional allocation – 4
- Check to see that a STS with proportional allocation generates a self-weighting sample
  - Is the sample weight w_hj the same for each OU?
  - Is estimator for population mean $\bar{y}_{str}$ equal to the sample mean $\bar{y}$?
  - What happens to the variance of $\bar{y}_{str}$?
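One way to work through these checks, as a sketch that assumes exact proportional allocation (no rounding), so that $N_h / N = n_h / n$ in every stratum:

$$\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h = \sum_{h=1}^{H} \frac{n_h}{n} \cdot \frac{1}{n_h} \sum_{j \in S_h} y_{hj} = \frac{1}{n} \sum_{h=1}^{H} \sum_{j \in S_h} y_{hj} = \bar{y}$$

$$\hat{V}(\bar{y}_{str}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^2}{n_h} = \left(1 - \frac{n}{N}\right) \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N} s_h^2$$

The variance estimator uses a weighted average of the within-stratum variances $s_h^2$ in place of the overall sample variance, which is why it does not reduce to the SRS variance estimator.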


Ag example – 12

Stratum h   Stratum total N_h   Stratum sample size n_h   Sample weight w_hj
1 (NE)               220                   21             220/21 = 10.5
2 (NC)              1054                  103             1054/103 = 10.2
3 (S)               1382                  135             1382/135 = 10.2
4 (W)                422                   41             422/41 = 10.3
Total           N = 3078              n = 300

- Even though we have used proportional allocation, rounding in setting sample sizes can lead to unequal (but approximately equal) weights


Neyman allocation
- Suppose within-stratum variances $S_h^2$ vary across strata
- Stratum sample size allocated in proportion to
  - Population size within stratum N_h
  - Population standard deviation within stratum S_h
- Allocation rule

$$n_h = \frac{N_h S_h}{\sum_{l=1}^{H} N_l S_l} n$$


Caribou survey example

Stratum h   N_h   S_h      N_h S_h     n N_h S_h / Σ N_l S_l   n_h   w_hj
A           400   3,000    1,200,000   96.26                    96   400/96 = 4.17
B            30   2,000       60,000    4.81                     5   30/10 = 3.00
C            61   9,000      549,000   44.04                    44   61/37 = 1.65
D            18   2,000       36,000    2.89                     3   18/6 = 3.00
E            70   12,000     840,000   67.38                    67   70/39 = 1.79
F           120   1,000      120,000    9.63                    10   120/21 = 5.71
Total   N = 699            Σ N_l S_l = 2,805,000              n = 225
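The Neyman sample sizes in the caribou table can be recomputed directly from N_h and S_h. The sketch below is illustrative: the dictionary layout and variable names are assumptions, and rounding to the nearest integer is used only because it reproduces the whole-number n_h column.

```python
# (N_h, S_h) for each caribou survey stratum
strata = {
    "A": (400, 3000), "B": (30, 2000), "C": (61, 9000),
    "D": (18, 2000), "E": (70, 12000), "F": (120, 1000),
}
n = 225

# Denominator of the Neyman rule: sum over strata of N_l * S_l
total = sum(N * S for N, S in strata.values())

# n_h = n * N_h * S_h / sum_l N_l * S_l, before and after rounding
raw = {h: n * N * S / total for h, (N, S) in strata.items()}
n_h = {h: round(x) for h, x in raw.items()}
print(total, {h: round(x, 2) for h, x in raw.items()}, n_h)
```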


Optimal allocation
- Suppose data collection costs c_h vary across strata
- Let
  - C = total budget
  - c_0 = fixed costs (office rental, field manager)
  - c_h = cost per SU in stratum h (interviewer time, travel cost)
- Express budget constraints as

$$C = c_0 + \sum_{h=1}^{H} c_h n_h$$

  and determine n_h


Optimal allocation – 2
- Assume general case: stratum population sizes, stratum variances, and stratum data collection costs vary across strata
- Sample size is allocated to strata in proportion to
  - Stratum population size N_h
  - Stratum standard deviation S_h
  - Inverse square root of stratum data collection costs, $1/\sqrt{c_h}$
- Allocation rule

$$n_h = \frac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}} n$$


Optimal allocation – 3
- Obtain this formula by finding n_h such that $V(\bar{y}_{str})$ is minimized given cost constraints
- The optimal stratum allocation will generate the smallest variance of $\bar{y}_{str}$ for a given stratification and cost constraint
- Sample size for stratum h (n_h) is larger in strata where one or more of the following conditions exist
  - Stratum size N_h is large
  - Stratum variance $S_h^2$ is large
  - Stratum per-unit data collection costs c_h are small


Welfare example
- Objective
  - Estimate fraction of welfare participant households in NE Iowa that have access to a reliable vehicle for work
- Sample design
  - Frame = welfare participant list
  - Stratum 1: Phone
    - N_1 = 4500 households, p_1 = 0.85, c_1 = $100
  - Stratum 2: No phone
    - N_2 = 500 households, p_2 = 0.50, c_2 = $300
  - Sample size n = 500


Welfare example – 2
- Optimal allocation with phone strata
- Here $S_h^2 = p_h (1 - p_h)$, and "share" is $\dfrac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}}$

Stratum h     N_h        S_h^2   c_h   N_h S_h / √c_h   share   n_h       w_hj
1: phone
2: no phone
Total         N = 5000                 Σ =                      n = 500
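The blank cells of this table can be filled in by applying the optimal allocation rule to the values given for the two phone strata. The sketch below is one way to do that; the variable names are illustrative, and the total sample size is held at n = 500 as stated on the earlier slide rather than derived from a budget C.

```python
from math import sqrt

# (N_h, p_h, c_h) for the two strata of the welfare example
strata = {"phone": (4500, 0.85, 100), "no phone": (500, 0.50, 300)}
n = 500

# N_h * S_h / sqrt(c_h), with S_h^2 = p_h * (1 - p_h) for a proportion
terms = {h: N * sqrt(p * (1 - p)) / sqrt(c) for h, (N, p, c) in strata.items()}
total = sum(terms.values())

# Each stratum's share of the total, then its (unrounded) sample size
n_h = {h: n * t / total for h, t in terms.items()}

# Sampling weight w_hj = N_h / n_h
w_h = {h: strata[h][0] / n_h[h] for h in strata}
print({h: round(x, 1) for h, x in n_h.items()}, {h: round(w, 2) for h, w in w_h.items()})
```

Under these inputs roughly 459 of the 500 interviews go to the phone stratum, which is both much larger and cheaper per unit, even though its stratum variance is smaller.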


Optimal allocation – 4
- Proportional and Neyman allocation are special cases of optimal allocation
- Neyman allocation
  - Data collection costs per sample unit c_h are approximately constant across strata
    - Telephone survey of US residents with regional strata
  - c_h term cancels out of optimal allocation formula

$$n_h = \frac{N_h S_h}{\sum_{l=1}^{H} N_l S_l} n$$


Optimal allocation – 5
- Proportional allocation
  - Data collection costs per sample unit c_h are approximately constant across strata
  - Within-stratum variances $S_h^2$ are approximately constant across strata
    - Y = number of persons per household is relatively constant across regions
  - c_h and S_h terms drop out of allocation formula

$$n_h = \frac{N_h}{N} n$$


Subpopulation allocation
- Suppose main interest is in estimating stratum parameters
  - Subpopulation (stratum) mean, total, proportion
- Define strata to be subpopulations
  - Estimate stratum population parameters: $\bar{y}_{hU}$ or $t_h$ or $p_h$
- Allocation rules derived from independent SRS within each stratum (subpopulation)
  - Equal allocation for equal stratum costs, variances
  - Stratum variances change across strata


Subpopulation allocation – 2
- Equal allocation
  - Assume
    - Desired precision levels for each subpopulation (stratum) are constant across strata
    - Stratum costs, stratum variances equal across strata
    - Stratum FPCs near 1
  - Allocation rule is to divide n equally across the H strata (subpopulations)

$$n_h = \frac{n}{H}$$

  - If N_h vary much, equal allocation will lead to less precise estimates of parameters for full population


Welfare example – 3
- Suppose we wanted to estimate proportion of welfare households that have access to a car for households in each of three subpopulations in NE Iowa
  - Metropolitan county
  - Counties adjacent to metropolitan county
  - Counties not adjacent to metro county


Welfare example – 4
- Equal allocation with population density strata

Stratum h                   N_h        n_h       $\pi_{hj}$   w_hj
1: Metro                    3,800
2: Adjacent to metro          700
3: Not adjacent to metro      500
Total                       N = 5000   n = 500


Subpopulation allocation – 3
- More complex settings: If S_h vary across strata, can use SRS formulas for determining stratum sample sizes, e.g., for stratum mean

$$n_h = \frac{z_{\alpha/2}^2 S_h^2}{e_h^2 + \dfrac{z_{\alpha/2}^2 S_h^2}{N_h}}$$

- Result is

$$n = \sum_{h=1}^{H} n_h$$

- May get sample sizes (n_h) that are too large or small relative to budget
  - Relax margin of error $e_h$ and/or confidence level $100(1-\alpha)\%$
  - Recalibrate stratum sample sizes to get desired sample size


Welfare example – 5
- 95% CI, e = 0.10 for all pop density strata

Stratum h                   N_h        p_h    S_h^2   Initial n_h   Recalibrate n_h
1: Metro                    3,800      0.70   0.21
2: Adjacent to metro          700      0.80   0.16
3: Not adjacent to metro      500      0.90   0.09
Total                       N = 5000                                n = 500
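The two empty columns can be explored with a short calculation. The sketch below applies the SRS sample size formula for a stratum mean to each density stratum. Several choices are assumptions rather than anything stated on the slides: z = 1.96 for the 95% level, rounding the initial sizes up, and recalibrating by rescaling the initial sizes proportionally so they sum to n = 500.

```python
from math import ceil

z = 1.96  # assumed z_{alpha/2} for a 95% CI
e = 0.10  # margin of error, the same for every stratum
strata = {"Metro": (3800, 0.21), "Adjacent": (700, 0.16), "Not adjacent": (500, 0.09)}

def srs_sample_size(N_h, S2_h, e, z):
    """n_h = z^2 S_h^2 / (e^2 + z^2 S_h^2 / N_h)."""
    num = z**2 * S2_h
    return num / (e**2 + num / N_h)

initial = {h: srs_sample_size(N, S2, e, z) for h, (N, S2) in strata.items()}
initial_rounded = {h: ceil(x) for h, x in initial.items()}

# One possible recalibration (an assumption): scale every stratum by the
# same factor so that the stratum sizes add up to the available n.
n = 500
scale = n / sum(initial.values())
recalibrated = {h: x * scale for h, x in initial.items()}
print(initial_rounded, {h: round(x) for h, x in recalibrated.items()})
```

Here the initial sizes total well under 500, so recalibration scales them up rather than forcing a relaxed margin of error.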


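Compromise allocations

[Figure: three panels plotting n_h against N_h, one per allocation rule]

- Proportional Allocation: n_h = n N_h / N
- Square Root Allocation:

$$n_h = n \frac{\sqrt{N_h}}{\sum_{l=1}^{H} \sqrt{N_l}}$$

- Equal Allocation: n_h = n / H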


Square root allocation
- More SUs to small strata than proportional allocation
- Fewer SUs to large strata than equal
- Variance for subpopulation estimates is smaller than proportional
- Variance for whole population estimates is smaller than equal allocation

$$n_h = n \frac{\sqrt{N_h}}{\sum_{l=1}^{H} \sqrt{N_l}}$$

[Figure: Square Root Allocation, n_h plotted against N_h]


Compromise allocations – 2
- May want to set
  - Minimum number of SUs in a stratum
  - Cap on max number of SUs in a stratum
- Rule
  - n_h = min for N_h < A
  - n_h = max for N_h > B
  - Apply rule in between A and B
    - Square root
    - Proportional

[Figure: two panels plotting n_h against N_h, flat at min n_h below A and at max n_h above B, with the chosen rule applied between A and B]


Welfare example – 6
- Comparing equal, proportional and square root allocation

Stratum h                   N_h        Equal allocation   Proportional allocation   Square root of N_h   Square root allocation
1: Metro                    3,800      167
2: Adjacent to metro          700      167
3: Not adjacent to metro      500      166
Total                       N = 5000   n = 500            n = 500                   Sum =                n = 500
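The three allocation rules can be compared side by side for the welfare strata with a short calculation. This is a sketch using the N_h values from the table and n = 500; the variable names are illustrative, and the results are left unrounded so the rounding step stays visible.

```python
from math import sqrt

N_h = {"Metro": 3800, "Adjacent": 700, "Not adjacent": 500}
n = 500
H = len(N_h)
N = sum(N_h.values())

# Equal allocation: n / H in every stratum
equal = {h: n / H for h in N_h}

# Proportional allocation: n * N_h / N
proportional = {h: n * Nh / N for h, Nh in N_h.items()}

# Square root allocation: n * sqrt(N_h) / sum_l sqrt(N_l)
root_sum = sum(sqrt(Nh) for Nh in N_h.values())
square_root = {h: n * sqrt(Nh) / root_sum for h, Nh in N_h.items()}

for h in N_h:
    print(h, round(equal[h]), round(proportional[h]), round(square_root[h]))
```

The square root rule lands between the other two: Metro gets more than under equal allocation but fewer than under proportional, and the two small strata get the reverse.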


Other allocations
- Certainty stratum is used to guarantee inclusion in sample
  - Census (sample all) the units in a stratum
  - For certainty stratum h
    - Allocation: n_h = N_h
    - Inclusion probability: $\pi_{hj} = 1$
- Ad hoc allocations
  - The sample allocation does not have to follow any of the rules mentioned so far
  - However, you should determine the stratum allocation in relation to analysis objectives and operational constraints


Welfare example – 7
- Ad hoc allocation

Stratum h                   N_h        Equal allocation   Square root allocation   Proportional allocation   Actual allocation
1: Metro                    3,800      167                279                      380                       200
2: Adjacent to metro          700      167                120                       70                       150
3: Not adjacent to metro      500      166                101                       50                       150
Total                       N = 5000   n = 500            n = 500                  n = 500                   n = 500


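Determining sample size n
- Determine allocation using rule expressed in terms of relative sample size n_h / n

$$\frac{n_h}{n} = \frac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}}$$

- Rewrite variance of $\hat{t}_{str}$ as a function of relative sample sizes (ignoring stratum FPCs)

$$V(\hat{t}_{str}) \approx \frac{1}{n} \sum_{h=1}^{H} \frac{n}{n_h} N_h^2 S_h^2 = \frac{\nu}{n}, \quad \text{where } \nu = \sum_{h=1}^{H} \frac{n}{n_h} N_h^2 S_h^2$$

- Sample size calculation based on margin of error e for population total

$$n = \frac{z_{\alpha/2}^2 \nu}{e^2}$$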


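Determining sample size n – 2
- Rewrite variance of $\bar{y}_{str}$ as a function of relative sample sizes (ignoring stratum FPCs)

$$V(\bar{y}_{str}) \approx \frac{1}{n} \cdot \frac{1}{N^2} \sum_{h=1}^{H} \frac{n}{n_h} N_h^2 S_h^2 = \frac{\nu}{n N^2}, \quad \text{where } \nu = \sum_{h=1}^{H} \frac{n}{n_h} N_h^2 S_h^2$$

- Sample size calculation based on margin of error e for population mean

$$n = \frac{z_{\alpha/2}^2 \nu}{e^2 N^2}$$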


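Welfare example – 8
- Relative sample size for equal allocation

$$\frac{n_h}{n} = \frac{1}{H}$$

- Value of $\nu$

$$\nu = \sum_{h=1}^{H} \frac{n}{n_h} N_h^2 S_h^2 = H \sum_{h=1}^{H} N_h^2 S_h^2 = 3\left[3800^2(.21) + 700^2(.16) + 500^2(.09)\right] = 9{,}399{,}900$$

- For 95% CI with e = 0.1

$$n = \frac{z_{\alpha/2}^2 \nu}{e^2 N^2} = \frac{4(9{,}399{,}900)}{.01(25{,}000{,}000)} \approx 150$$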
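The same calculation can be written out in Python as a check on the arithmetic. This is a sketch using the values from the slide; z = 2 is used because the slide's numerator takes $z_{\alpha/2}^2 = 4$, and the variable names are illustrative.

```python
N_h = [3800, 700, 500]   # stratum sizes
S2_h = [0.21, 0.16, 0.09]  # stratum variances p_h * (1 - p_h)
H = len(N_h)
N = sum(N_h)

# Relative sample sizes n_h / n under equal allocation
rel = [1 / H] * H

# nu = sum over strata of (n / n_h) * N_h^2 * S_h^2
nu = sum(Nh**2 * s2 / r for Nh, s2, r in zip(N_h, S2_h, rel))

z = 2    # slide uses z^2 = 4 for the 95% level
e = 0.1  # margin of error for the population proportion (a mean of 0/1 values)
n = z**2 * nu / (e**2 * N**2)
print(round(nu), round(n, 1))
```

Swapping `rel` for the relative sample sizes of another allocation rule gives the total n needed under that rule for the same margin of error.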


STS Summary
- Choose stratification scheme
  - Scheme depends on objectives, operational constraints
  - Must know stratum identifier for each SU in the frame
- Set a design for each stratum
  - Design for each stratum – SRS, SYS, …
- Determine n and n_h
- Select sample independently within each stratum
- Pool stratum estimates to get estimates of population parameters