Lenka Mach, Statistics Canada
Ioana Şchiopu-Kratina, Statistics Canada
Philip T. Reiss, New York University Child Study Center
Jean-Marc Fillion, Statistics Canada
ICES III, June 2007
Optimal Coordination of Samples in Business Surveys
2
OUTLINE OF THE PRESENTATION:
1. Coordinated sampling
2. Optimal Sample Coordination
2.1 Transportation Problem
2.2 Reduced Transportation Problem
2.3 Variability of the Overlap
3. Example 1: NWCR method for negative coordination of two surveys.
4. Example 2: Reduced TP for positive coordination after re-stratification.
5. Conclusion
3
1. COORDINATED SAMPLING
• Needed when multiple sample surveys of overlapping populations
are conducted.
• Encompasses many different techniques for controlling the overlap of samples (the number of units they have in common).
• Objective: higher overlap (positive coordination) or lower overlap (negative coordination) than if the samples were selected independently.
• References: Ernst (1999), ICES II (2000), etc.
4
1. COORDINATED SAMPLING
First Survey:
S = set of all possible samples s
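(marginal) prob. distribution $P = \{p(s),\ s \in S\}$ on S
Second Survey:
S’ = set of all possible samples s’
(marginal) prob. distribution $Q = \{p(s'),\ s' \in S'\}$ on S’
Integrated surveys:
joint prob. distribution $\{p(s, s'),\ s \in S,\ s' \in S'\}$ s. t.
$\sum_{s' \in S'} p(s, s') = p(s),\ s \in S$
and
$\sum_{s \in S} p(s, s') = p(s'),\ s' \in S'$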
5
1. COORDINATED SAMPLING
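Overlap of s and s’:
$o(s, s')$ = number of units that s and s’ have in common
Expected sample overlap:
$E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s')$   (1)
Surveys are positively coordinated if the expected overlap exceeds its value under independent selection:
$\sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s') > \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s)\, p(s')$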
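As a rough illustration of the independent-selection benchmark, the sketch below enumerates every pair of samples for two tiny SRSWOR designs and sums o(s, s’) p(s) p(s’). The frames, sample sizes, and the function name `expected_overlap_independent` are made up for this example and do not come from the presentation.

```python
from fractions import Fraction
from itertools import combinations

def expected_overlap_independent(frame1, n1, frame2, n2):
    """E[o(s, s')] when s and s' are independent SRSWOR samples (brute force)."""
    samples1 = list(combinations(sorted(frame1), n1))
    samples2 = list(combinations(sorted(frame2), n2))
    p_s = Fraction(1, len(samples1))    # SRSWOR: every sample s is equally likely
    p_s2 = Fraction(1, len(samples2))   # same for every sample s'
    total = Fraction(0)
    for s in samples1:
        for s2 in samples2:
            overlap = len(set(s) & set(s2))   # o(s, s')
            total += overlap * p_s * p_s2     # joint probability is p(s) p(s')
    return total

# Hypothetical toy frames: three common units (2, 3, 4), one death, one birth.
frame1 = {1, 2, 3, 4}
frame2 = {2, 3, 4, 5}
e_indep = expected_overlap_independent(frame1, 2, frame2, 2)
print(e_indep)  # 3/4
```

Any joint distribution with these marginals whose expected overlap is above 3/4 would count as positive coordination for this toy case, and below 3/4 as negative coordination.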
6
2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem
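We integrate two surveys so that the expected overlap is maximized (minimized):
Find max (min) of the objective function
$E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s')$   (1)
over all unknowns
$X = \{p(s, s'),\ s \in S,\ s' \in S'\}$   (2)
subject to the constraints
$\sum_{s' \in S'} p(s, s') = p(s),\ s \in S$
$\sum_{s \in S} p(s, s') = p(s'),\ s' \in S'$   (3)
$\sum_{s \in S} \sum_{s' \in S'} p(s, s') = 1$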
7
2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem
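TP tableau: each cell (s_i, s’_j) holds the overlap o(s_i, s’_j) and the unknown X_ij = p(s_i, s’_j).

| s \ s’ | s’1 | s’2 | s’3 | … | s’L | p(s) |
|---|---|---|---|---|---|---|
| s1 | o(s1,s’1), X11 | o(s1,s’2), X12 | o(s1,s’3), X13 | … | o(s1,s’L), X1L | p(s1) |
| s2 | o(s2,s’1), X21 | o(s2,s’2), X22 | o(s2,s’3), X23 | … | o(s2,s’L), X2L | p(s2) |
| s3 | o(s3,s’1), X31 | o(s3,s’2), X32 | o(s3,s’3), X33 | … | o(s3,s’L), X3L | p(s3) |
| … | … | … | … | … | … | … |
| sK | o(sK,s’1), XK1 | o(sK,s’2), XK2 | o(sK,s’3), XK3 | … | o(sK,s’L), XKL | p(sK) |
| p(s’) | p(s’1) | p(s’2) | p(s’3) | … | p(s’L) | 1 |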
8
2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem
TP is too large, too many variables!
Example: First survey selects SRSWOR of n = 20 from N = 40.
$K = \binom{N}{n} = \binom{40}{20}$ = 137,846,528,820
BUT, for stratified SRSWOR designs, we can reduce TP by grouping samples!
Condition: The matrix of o(s, s’) within each group must be “symmetric”.
We use a two-stage procedure.
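To give a sense of scale, the count above can be checked with the standard library. The second design (SRSWOR of n’ = 20 from N’ = 41) is borrowed from Example 1 later in the deck and is an assumption here; this slide only states the first design.

```python
from math import comb

K = comb(40, 20)   # possible samples s: SRSWOR of n = 20 from N = 40
L = comb(41, 20)   # possible samples s': assumed SRSWOR of n' = 20 from N' = 41
unknowns = K * L   # one unknown p(s, s') per cell of the full TP tableau

print(f"K = {K:,}")
print(f"L = {L:,}")
print(f"K x L = {unknowns:.3e} unknowns")
```

With counts of this size the full tableau cannot even be stored, which is the motivation for grouping samples into blocks.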
9
2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem
Notation:
P frame for Survey 1, P’ frame for Survey 2, C = P ∩ P’
c = c(s) = number of units in C ∩ s
c’ = c’(s’) = number of units in C ∩ s’
Solution - Stage 1:
• Group samples s → super-rows c
• Group samples s’ → super-columns c’
• Form a matrix of blocks (c, c’), define block optimum o(c, c’)
• Solve the reduced TP → joint probabilities p(c, c’)
Solution - Stage 2:
Distribute p(c, c’) evenly among the pairs (s, s’) that have the optimum overlap
– each row s within the block gets the same probability
– each column s’ within the block gets the same probability
10
2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem
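Matrix of o(s, s’) within a block (rows s, columns s’; d = death, u = common unit, b = birth):

| s \ s’ | b1 b2 u1 u2 | b1 b2 u2 u3 | b1 b2 u1 u3 |
|---|---|---|---|
| d1 u1 u2 | 2 | 1 | 1 |
| d1 u2 u3 | 1 | 2 | 1 |
| d1 u1 u3 | 1 | 1 | 2 |
| d2 u1 u2 | 2 | 1 | 1 |
| d2 u2 u3 | 1 | 2 | 1 |
| d2 u1 u3 | 1 | 1 | 2 |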
11
2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem
Example 1:
Survey 1: N =40, SRSWOR n =20
Survey 2: N’=41, SRSWOR n’=20
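C=37, D=3, B=4 (C = common units, D = deaths, B = births)
c = 17, 18, 19, 20 → 4 super-rows
c’ = 16, 17, 18, 19, 20 → 5 super-columns
Reduced TP has only 4 x 5 = 20 unknowns.
Constraints:
$p(c) = \binom{C}{c} \binom{D}{n-c} \Big/ \binom{N}{n}$
$p(c') = \binom{C}{c'} \binom{B}{n'-c'} \Big/ \binom{N'}{n'}$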
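The two constraint formulas are hypergeometric probabilities, so the super-row and super-column marginals for Example 1 can be computed directly. The sketch below uses exact fractions; the function names `p_row` and `p_col` are illustrative and not taken from the presentation.

```python
from fractions import Fraction
from math import comb

C, D, B = 37, 3, 4          # common units, deaths, births
N, n = C + D, 20            # Survey 1: N = 40, n = 20
N2, n2 = C + B, 20          # Survey 2: N' = 41, n' = 20

def p_row(c):
    """p(c): probability that s contains exactly c common units."""
    return Fraction(comb(C, c) * comb(D, n - c), comb(N, n))

def p_col(c2):
    """p(c'): probability that s' contains exactly c' common units."""
    return Fraction(comb(C, c2) * comb(B, n2 - c2), comb(N2, n2))

p_c = {c: p_row(c) for c in range(n - D, n + 1)}        # c = 17, ..., 20
p_c2 = {c2: p_col(c2) for c2 in range(n2 - B, n2 + 1)}  # c' = 16, ..., 20

for c, p in p_c.items():
    print(f"p(c={c}) = {float(p):.4f}")
for c2, p in p_c2.items():
    print(f"p(c'={c2}) = {float(p):.4f}")
```

Each set of marginals sums to exactly 1, and the rounded values agree with the margins of Table 1a.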
12
2. OPTIMAL SAMPLE COORDINATION 2.3 Variability of the Overlap
• Optimal coordination maximizes (minimizes) $E[o(s, s')]$.
• In practice, one pair of samples (s, s’) is selected, so its overlap o(s, s’) should be close to $E[o(s, s')]$!
• TP can be used in 2 steps:
– Step 1: as described on Slide 6
– Step 2: - Use $E[o(s, s')]$ from Step 1 as an additional constraint
- New objective function: For example, find the minimum of
$V[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} \left(o(s, s') - E[o(s, s')]\right)^2 p(s, s')$   (4)
13
3. Example 1: NWCR method for negative coordination of two surveys.
Survey 1: N =40, SRSWOR n =20
Survey 2: N’=41, SRSWOR n’=20
D=3, C=37, B=4
Minimize $E[o(s, s')]$.
Stage 1 – Solve the Reduced TP:
• Group samples s into super-rows and s’ into super-columns.
• Order super-rows by ascending c and super-columns by descending c’, form a matrix of blocks.
• Block optimum o(c, c’) = max{0, c+c’–C} = smallest possible overlap o(s, s’) within (c, c’).
• Use the NWCR (Northwest Corner Rule) algorithm to obtain a solution.
Stage 2 - Determine p(s, s’) for each pair (s, s’):
• Distribute p(c, c’) equally among all pairs (s, s’) within the block that have o(s, s’) = o(c, c’).
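Stage 1 can be sketched in a few lines, assuming the hypergeometric marginals p(c) and p(c’) of the reduced TP and a textbook Northwest Corner Rule. The function name `northwest_corner` and the variable names are illustrative; this is a minimal sketch rather than the authors' implementation.

```python
from fractions import Fraction
from math import comb

C, D, B, n, n2 = 37, 3, 4, 20, 20

rows = list(range(n - D, n + 1))          # super-rows, ascending c: 17, ..., 20
cols = list(range(n2, n2 - B - 1, -1))    # super-columns, descending c': 20, ..., 16
supply = [Fraction(comb(C, c) * comb(D, n - c), comb(C + D, n)) for c in rows]
demand = [Fraction(comb(C, c2) * comb(B, n2 - c2), comb(C + B, n2)) for c2 in cols]

def northwest_corner(supply, demand):
    """Fill the tableau from the top-left cell, exhausting one margin at a time."""
    supply, demand = list(supply), list(demand)
    alloc = {}
    i = j = 0
    while i < len(supply) and j < len(demand):
        x = min(supply[i], demand[j])
        alloc[(i, j)] = x
        supply[i] -= x
        demand[j] -= x
        if supply[i] == 0:
            i += 1        # row exhausted: move down
        else:
            j += 1        # column exhausted: move right
    return alloc

p = {(rows[i], cols[j]): x for (i, j), x in northwest_corner(supply, demand).items()}

# Block optimum for negative coordination: smallest possible overlap in block (c, c').
expected_overlap = sum(max(0, c + c2 - C) * x for (c, c2), x in p.items())

for (c, c2), x in p.items():
    print(f"p(c={c}, c'={c2}) = {float(x):.4f}")
print("expected overlap:", expected_overlap)
```

Because the two orderings are opposite, the allocation runs along blocks with c + c’ ≤ C, where the block optimum is 0, so the expected overlap comes out as exactly 0 in this example.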
14
3. Example 1 NWCR method for negative coordination of two surveys.
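Table 1a: Reduced TP, p(c, c’) assigned by NWCR. Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 20 | 19 | 18 | 17 | 16 | p(c) |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0591 | 0 / 0.0563 | 0 / 0 | 0 / 0 | 0 / 0 | 0.1154 |
| 18 | 1 / 0 | 0 / 0.2064 | 0 / 0.1782 | 0 / 0 | 0 / 0 | 0.3846 |
| 19 | 2 / 0 | 1 / 0 | 0 / 0.2158 | 0 / 0.1689 | 0 / 0 | 0.3846 |
| 20 | 3 / 0 | 2 / 0 | 1 / 0 | 0 / 0.0675 | 0 / 0.0478 | 0.1154 |
| p(c’) | 0.0591 | 0.2627 | 0.3940 | 0.2364 | 0.0478 | 1.0000 |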
15
3. Example 1 NWCR method for negative coordination of two surveys.
Stage 2 - Distribution of probabilities within blocks
Consider (c=17, c’=20) with o(c, c’)=0:
• there are $\binom{37}{17}\binom{3}{3}$ = 15,905,368,710 different samples (rows) s
• there are $\binom{37}{20}\binom{4}{0}$ = 15,905,368,710 different samples (columns) s’
The matrix of overlaps o(s, s’) is symmetric:
For each sample s, there is exactly one sample s’ such that o(s, s’)=0.
For each sample s’, there is exactly one sample s such that o(s, s’)=0.
Each sample s will get probability of 0.0591 / 15,905,368,710.
Each sample s’ will get probability of 0.0591 / 15,905,368,710.
16
3. Example 1 NWCR method for negative coordination of two surveys.
Theorem:
(a) The joint density $X_{NWCR}$ obtained by the NWCR method for negative coordination satisfies the constraints given in (3).
(b) $X_{NWCR}$ has the minimum expected overlap within the set of joint densities that satisfy (3).
(c) $X_{NWCR}$ has the minimum variance within this set of joint densities.
Proof in Mach, Reiss, Şchiopu-Kratina (2006).
17
3. Example 1 NWCR method for negative coordination of two surveys.
Simultaneous Selection
i) Select one block using the joint probabilities p(c, c’) in Table 1a.
ii) To draw samples s and s’, randomly select units from each set: C = common units, D = deaths, B = births.
Suppose block (19, 18) is selected in i). To select s, randomly select 19 units from the 37 in C and 1 unit from the 3 in D. To select s’, take the remaining 37 − 19 = 18 units from C and randomly select 2 units from the 4 in B.
Sequential Selection (s drawn first)
i) Select one block from the super-row c(s) using the conditional probabilities p{(c, c’) | c(s)} corresponding to the joint probabilities in Table 1a.
ii) Randomly select units from the C and B sets to form s’.
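The simultaneous selection steps can be sketched as follows, using the rounded block probabilities from Table 1a. The unit labels and the function name `select_pair` are hypothetical. For blocks where more common units remain than c’ requires, the sketch draws the c’ units at random from the remainder, which is one way to realise the equal distribution within a block.

```python
import random

# Nonzero block probabilities p(c, c') copied from Table 1a (rounded values).
BLOCKS = {
    (17, 20): 0.0591, (17, 19): 0.0563,
    (18, 19): 0.2064, (18, 18): 0.1782,
    (19, 18): 0.2158, (19, 17): 0.1689,
    (20, 17): 0.0675, (20, 16): 0.0478,
}
# Hypothetical unit labels for the three sets.
COMMON = [f"u{i}" for i in range(1, 38)]   # C = 37 common units
DEATHS = [f"d{i}" for i in range(1, 4)]    # D = 3 deaths
BIRTHS = [f"b{i}" for i in range(1, 5)]    # B = 4 births
SAMPLE_SIZE = 20                           # n = n' = 20

def select_pair(rng):
    """Draw a block, then samples s and s' with zero overlap."""
    blocks = list(BLOCKS)
    c, c2 = rng.choices(blocks, weights=[BLOCKS[b] for b in blocks], k=1)[0]
    s_common = rng.sample(COMMON, c)
    s = s_common + rng.sample(DEATHS, SAMPLE_SIZE - c)
    chosen = set(s_common)
    remaining = [u for u in COMMON if u not in chosen]   # common units not in s
    s2 = rng.sample(remaining, c2) + rng.sample(BIRTHS, SAMPLE_SIZE - c2)
    return (c, c2), s, s2

block, s, s2 = select_pair(random.Random(2007))
print(block, len(s), len(s2), len(set(s) & set(s2)))
```

Every block with positive probability satisfies c + c’ ≤ 37, so the remaining common units always suffice and the two samples never share a unit.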
18
3. Example 1 NWCR method for negative coordination of two surveys.
[Figure: the units partitioned into Deaths (D = 3), Common Units (C = 37), and Births (B = 4), with sample s (n = 20, c = 19) and sample s’ (n’ = 20, c’ = 18) drawn so that o(s, s’) = 0.]
19
3. Example 1 NWCR method for negative coordination of two surveys.
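Table 1b: Empirical block probabilities for Sequential SRSWOR (PRN). Each cell shows o(c, c’) / empirical probability.

| c \ c’ | 20 | 19 | 18 | 17 | 16 | p(c) |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0083 | 0 / 0.0336 | 0 / 0.0426 | 0 / 0.0239 | 0 / 0.0043 | 0.1127 |
| 18 | 1 / 0.0225 | 0 / 0.1022 | 0 / 0.1574 | 0 / 0.0890 | 0 / 0.0160 | 0.3871 |
| 19 | 2 / 0.0210 | 1 / 0.0987 | 0 / 0.1508 | 0 / 0.0993 | 0 / 0.0182 | 0.3880 |
| 20 | 3 / 0.0052 | 2 / 0.0251 | 1 / 0.0426 | 0 / 0.0319 | 0 / 0.0074 | 0.1122 |
| p(c’) | 0.0570 | 0.2596 | 0.3934 | 0.2441 | 0.0459 | 1.0000 |

Table 1c: Expectations

| Method | E [o(s, s’)] | V [o(s, s’)] |
|---|---|---|
| NWCR | 0 | 0 |
| PRN | 0.2716 | 0.3212 |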
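The PRN row of Table 1c can be reproduced from Table 1b by applying formulas (1) and (4) at block level. The sketch assumes that every pair selected in block (c, c’) has overlap o(c, c’) = max{0, c + c’ − C}; whether the presenters computed the figures this way is not stated, but the numbers agree.

```python
C = 37
COLS = [20, 19, 18, 17, 16]   # super-columns c' in the order of Table 1b

# Empirical block probabilities for sequential SRSWOR (PRN), copied from Table 1b.
P_PRN = {
    17: [0.0083, 0.0336, 0.0426, 0.0239, 0.0043],
    18: [0.0225, 0.1022, 0.1574, 0.0890, 0.0160],
    19: [0.0210, 0.0987, 0.1508, 0.0993, 0.0182],
    20: [0.0052, 0.0251, 0.0426, 0.0319, 0.0074],
}

def block_overlap(c, c2):
    """Overlap assumed for every pair in block (c, c')."""
    return max(0, c + c2 - C)

cells = [(block_overlap(c, c2), p)
         for c, row in P_PRN.items()
         for c2, p in zip(COLS, row)]

mean = sum(o * p for o, p in cells)                  # formula (1) at block level
var = sum((o - mean) ** 2 * p for o, p in cells)     # formula (4) at block level
print(round(mean, 4), round(var, 4))  # 0.2716 0.3212
```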
20
4. Example 2: Reduced TP for positive coordination after re-stratification.
Old stratum 1: N1 = 20, n1 = 10
Old stratum 2: N2 = 6, n2 = 3
Old stratum 3: N3 = 10, n3 = 2
New stratum: N’ = 15, n’ = 5
Units common to old stratum h and the new stratum: C1 = 2, C2 = 3, C3 = 10
Objective: Maximize $E[o(s, s')]$.
21
4. Example 2 Reduced TP for positive coordination after re-stratification.
Super-rows: $c = (c_1, c_2, c_3)$ with $c_1 = 0, 1, 2$; $c_2 = 0, 1, 2, 3$; $c_3 = 2$ → 3 x 4 x 1 = 12 super-rows
Super-columns: $c' = (c_1', c_2', c_3')$ with $c_1' + c_2' + c_3' = 5$:
(0, 0, 5), (0, 1, 4), (0, 2, 3), (0, 3, 2), (1, 0, 4), (1, 1, 3), (1, 2, 2), (1, 3, 1),
(2, 0, 3), (2, 1, 2), (2, 2, 1), (2, 3, 0). → 12 super-columns
Reduced TP has 12 x 12 = 144 unknowns.
Constraints:
Product of hypergeometric probabilities:
$p(c) = \frac{\binom{C_1}{c_1}\binom{N_1-C_1}{n_1-c_1}}{\binom{N_1}{n_1}} \cdot \frac{\binom{C_2}{c_2}\binom{N_2-C_2}{n_2-c_2}}{\binom{N_2}{n_2}}$
Multihypergeometric probabilities:
$p(c') = \frac{\binom{C_1}{c_1'}\binom{C_2}{c_2'}\binom{C_3}{c_3'}}{\binom{N'}{n'}}$
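The super-rows, super-columns, and marginals of Example 2 can be enumerated directly from these formulas. The sketch below does only that, plus the block optimum for positive coordination; solving the reduced TP itself would need a linear-programming solver and is not attempted. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

N_OLD = (20, 6, 10)    # old stratum sizes N1, N2, N3
n_OLD = (10, 3, 2)     # old sample sizes n1, n2, n3
C = (2, 3, 10)         # common units C1, C2, C3
N_NEW, n_NEW = 15, 5   # new stratum: N', n'

def p_row(c):
    """p(c): product of hypergeometric probabilities over the old strata."""
    prob = Fraction(1)
    for N_h, n_h, C_h, c_h in zip(N_OLD, n_OLD, C, c):
        prob *= Fraction(comb(C_h, c_h) * comb(N_h - C_h, n_h - c_h), comb(N_h, n_h))
    return prob

def p_col(c2):
    """p(c'): multihypergeometric probability in the new stratum."""
    ways = 1
    for C_h, c_h in zip(C, c2):
        ways *= comb(C_h, c_h)
    return Fraction(ways, comb(N_NEW, n_NEW))

def block_optimum(c, c2):
    """Largest possible overlap within block (c, c')."""
    return sum(min(a, b) for a, b in zip(c, c2))

candidates = list(product(*(range(min(C_h, n_h) + 1) for C_h, n_h in zip(C, n_OLD))))
super_rows = [c for c in candidates if p_row(c) > 0]
super_cols = [c2 for c2 in product(*(range(C_h + 1) for C_h in C)) if sum(c2) == n_NEW]

print(len(super_rows), len(super_cols))
print(float(p_row((2, 3, 2))), float(p_col((1, 2, 2))))
print(block_optimum((2, 2, 2), (0, 3, 2)))
```

The third old stratum lies entirely inside the new stratum (C3 = N3), so c3 is always 2 and contributes a factor of 1 to p(c).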
22
4. Example 2 Reduced TP for positive coordination after re-stratification.
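Table 2a: Block overlap and probabilities p(c,c’) (TP solution). Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 1,2,2 | 2,1,2 | 0,3,2 | … | 0,0,5 | p(c) |
|---|---|---|---|---|---|---|
| 2,3,2 | 5 / 0 | 5 / 0.0115 | 5 / 0 | … | 2 / 0 | 0.0118 |
| 2,2,2 | 5 / 0 | 5 / 0.0301 | 4 / 0 | … | 2 / 0 | 0.1066 |
| 1,3,2 | 5 / 0 | 4 / 0 | 5 / 0.0031 | … | 2 / 0 | 0.0263 |
| … | … | … | … | … | … | … |
| 0,0,2 | 2 / 0 | 2 / 0 | 2 / 0 | … | 2 / 0.0118 | 0.0118 |
| p(c’) | 0.0899 | 0.0450 | 0.0150 | … | 0.0839 | 1.0000 |

o(c, c’) = min(c1,c1’) + min(c2,c2’) + min(c3,c3’)
ETP [o(s, s’)] = 3.6494
VTP [o(s, s’)] = 0.7292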
23
4. Example 2 Reduced TP for positive coordination after re-stratification.
Sequential selection:
Suppose c = (2,3,2) with p(c) = 0.01184.

Table 2b: Probabilities for c = (2,3,2)

| c’ | 2,1,2 | 2,3,0 | Σ |
|---|---|---|---|
| p(c, c’) | 0.01151 | 0.00033 | 0.01184 |
| p{c’ \| c=(2,3,2)} | 0.97213 | 0.02787 | 1 |

ETP{o | c=(2,3,2)} = 5
VTP{o | c=(2,3,2)} = 0
i) Select super-column c’ using p{c’ | c=(2,3,2)}.
ii) Suppose c’ = (2,1,2) is selected. → Randomly de-select 2 units from s ∩ C2 to form s’.
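The conditional probabilities in Table 2b are the joint block probabilities divided by p(c), and step ii) amounts to keeping c’_h of the units that s holds in each common set. A minimal sketch follows; the unit labels and the function name `draw_second_sample` are hypothetical, and the "keep c’_h units per set" rule is an assumed generalisation of the de-selection example.

```python
import random

c = (2, 3, 2)
# Joint block probabilities p(c, c') in super-row c = (2,3,2), from Table 2b.
joint = {(2, 1, 2): 0.01151, (2, 3, 0): 0.00033}

p_c = sum(joint.values())
conditional = {c2: p / p_c for c2, p in joint.items()}   # p{c' | c}

def block_optimum(c, c2):
    return sum(min(a, b) for a, b in zip(c, c2))

cond_mean = sum(block_optimum(c, c2) * q for c2, q in conditional.items())
cond_var = sum((block_optimum(c, c2) - cond_mean) ** 2 * q
               for c2, q in conditional.items())

def draw_second_sample(s_common, rng):
    """s_common: the units of s lying in C1, C2 and C3, as three lists."""
    columns = list(conditional)
    c2 = rng.choices(columns, weights=[conditional[k] for k in columns], k=1)[0]
    s2 = []
    for units, keep in zip(s_common, c2):
        s2.extend(rng.sample(units, keep))   # de-select len(units) - keep units
    return c2, s2

# Hypothetical labels for the units of s in C1, C2, C3.
s_common = [["x1", "x2"], ["y1", "y2", "y3"], ["z1", "z2"]]
c2, s2 = draw_second_sample(s_common, random.Random(1))
print({k: round(v, 5) for k, v in conditional.items()})
print(c2, s2)
```

Both super-columns have block optimum 5, which is why the conditional expectation is 5 and the conditional variance is 0 for this super-row.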
24
4. Example 2 Reduced TP for positive coordination after re-stratification.
Is the matrix of overlaps o(s, s’) within a block symmetric?
Consider block {c =(2,3,2), c’ =(2,1,2)} with o(c, c’)=5:
• $\binom{2}{2}\binom{18}{8} \times \binom{3}{3}\binom{3}{0} \times \binom{10}{2}$ = 43,758 x 1 x 45 different samples (rows) s
• $\binom{2}{2} \times \binom{3}{1} \times \binom{10}{2}$ = 1 x 3 x 45 different samples (columns) s’
For each s, there are exactly 3 samples s’ such that o(s, s’)=5.
For each s’, there are exactly 43,758 samples s such that o(s, s’)=5.
Each s’ will get probability of 0.01151 / (3 x 45).
25
4. Example 2 Reduced TP for positive coordination after re-stratification.
Table 2c: Matrix of o(s, s’); block {c =(2,3,2), c’ =(2,1,2)}
[Matrix not reproduced: rows are shown in groups of 43,758 samples s, entries take the values 3, 4, and 5, and columns are shown in groups of 16 s’ and 28 s’.]
26
4. Example 2 Reduced TP for positive coordination after re-stratification.
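Table 2d: Empirical block probabilities for Sequential SRSWOR (PRN). Each cell shows o(c, c’) / empirical probability.

| c \ c’ | 1,2,2 | 2,1,2 | 0,3,2 | … | 0,0,5 | p(c) |
|---|---|---|---|---|---|---|
| 2,3,2 | 5 / 0.0022 | 5 / 0.0015 | 5 / 0.0007 | … | 2 / 0.0002 | 0.0124 |
| 2,2,2 | 5 / 0.0160 | 5 / 0.0173 | 4 / 0.0006 | … | 2 / 0.0022 | 0.1067 |
| 1,3,2 | 5 / 0.0055 | 4 / 0.0001 | 5 / 0.0025 | … | 2 / 0.0007 | 0.0254 |
| … | … | … | … | … | … | … |
| 0,0,2 | 2 / 0.0001 | 2 / 0 | 2 / 0 | … | 2 / 0.0069 | 0.0116 |
| p(c’) | 0.0897 | 0.0453 | 0.0153 | … | 0.0847 | 1.0000 |

Table 2e: Expectations

| Method | E [o(s, s’)] | V [o(s, s’)] | E{o \| c=(2,3,2)} | V{o \| c=(2,3,2)} |
|---|---|---|---|---|
| TP | 3.6494 | 0.7292 | 5 | 0 |
| PRN | 3.5602 | 0.6940 | 4.3282 | 0.5746 |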
5. CONCLUSION
Optimal sample coordination can be formulated as a transportation problem (TP).
For stratified SRSWOR, we can reduce TP by grouping samples.
The groups must be formed so that the matrix of o(s, s’) within each group is symmetric.
The solution and the selection are each carried out in two stages.
Different objective functions can be defined, depending on the goal of the sample coordination project.
For more information please contact
Lenka Mach
E-mail: [email protected]
www.statcan.ca
29
REFERENCES
Ernst, L.R. (1999), “The Maximization and Minimization of Sample Overlap Problems: A Half Century of Results,” Bulletin of the International Statistical Institute, Proceedings, Tome LVIII, Book 2, pp 293-296.
Mach, L., Reiss, P.T., and Şchiopu-Kratina, I. (2006), “Optimizing the Expected Overlap of Survey Samples via the Northwest Corner Rule,” Journal of the American Statistical Association, Vol. 101, No. 476, Theory and Methods, pp. 1671-1679.
McKenzie, B. and Gross, B. (2000), “Synchronized Sampling,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 237-243.
Ohlsson, E. (2000), “Coordination of PPS Samples Over Time,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 255-264.
Royce, D. (2000), “Issues in Coordinated Sampling at Statistics Canada,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 245-254.