Designing and Conducting Business Surveys for Official Statistics, Tbilisi, November 2016

Haraldsen & Snijkers 2016 1

Sampling Business Surveys

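TSE approach: Quality = 1 - ∑(Bias² + Variance)

$\text{Quality} = 1 - \left[ (B_{\text{Specification}} + B_{\text{Frame}} + B_{\text{Nonresponse}} + B_{\text{Measurement}} + B_{\text{Processing}})^2 + (V_{\text{Sampling}} + V_{\text{Measurement}} + V_{\text{Processing}}) \right]$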

[Figure: Survey Cycle and Extended Survey Cycle (WHO? WHAT? HOW?).
Representation line: Population, Sample frame, Sample, Respondents, Adjusted. Error sources: coverage errors, sampling errors, nonresponse errors, adjustment errors.
Measurement line: Construct, Measurement instrument, Response, Edited. Error sources: validity, measurement errors, processing errors.
Both lines end in Data delivery.
Extended cycle stages: Specification; Development & testing; Survey communication & management; Coding, cleaning & data integration; Data delivery & documentation.]

Administrative --> Statistical Business Register. Norway 2009

[Bar chart, share of identified legal entities remaining: Identified legal entities 100.0 %; Errors subtracted 93.0 %; Passive subtracted 49.0 %]

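$\text{Cost Efficiency} = \frac{\text{Quality}}{\text{Cost}} = \frac{1 - \left[ (B_{\text{Specification}} + B_{\text{Frame}})^2 + V_{\text{Sampling}} \right]}{\text{Cost}_{\text{Sample Size}}}$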

• IDENTIFY
• CLASSIFY
• FRAME
• SAMPLE

• Inactive included
• Active excluded
• One counted as several (multiplicity)
• Several counted as one (clustering)
• Inaccurate classification
• Misclassification

UPDATE

• With computerized questionnaires the relationship between cost and sample size is weaker than before

• For the individual company sample size does not affect response burden

• For the business world, however, it does

Sampling:

• Stratification as default

• Completely enumerated strata

• Partitions based on the most important domains in estimation

• Keep the number of stratifiers (and strata) low

Stratification = the partition of the population in such a way that the elements within a stratum are as similar as possible and the means of the strata are as different as possible.

• Decide what auxiliary variables to use as stratification variables = those which predict the survey outcome well (industry code + no of employees?)
• Determine the number of strata = the point where the variance decrease flattens out
• Assign units to strata
• Allocate the number to be sampled from each stratum (sample allocation)
• Take constraints (like expected nonresponse) into consideration
• (Random) sampling within strata
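The stratification steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the frame, the unit identifiers and the allocation are hypothetical, and units are stratified by number of employees only, using the size classes 0-9, 10-99 and 100+ that appear later in the cutoff sampling slide. A stratum whose allocation equals its population size becomes a completely enumerated stratum.

```python
import random
from collections import defaultdict


def size_class(employees):
    """Assign a unit to a stratum by number of employees (assumed classes)."""
    if employees >= 100:
        return "large"
    if employees >= 10:
        return "medium"
    return "small"


def stratified_sample(frame, allocation, rng):
    """Draw a simple random sample within each stratum.

    frame: dict mapping unit id -> number of employees
    allocation: dict mapping stratum -> number of units to sample
    rng: random.Random instance, passed in for reproducibility
    """
    strata = defaultdict(list)
    for unit_id, employees in frame.items():
        strata[size_class(employees)].append(unit_id)

    sample = {}
    for stratum, members in strata.items():
        # Never ask for more units than the stratum holds.
        n_h = min(allocation.get(stratum, 0), len(members))
        sample[stratum] = sorted(rng.sample(members, n_h))
    return sample


# Hypothetical frame: 6 small, 4 medium and 2 large businesses.
employee_counts = [1, 3, 5, 8, 2, 6, 12, 40, 75, 20, 150, 900]
frame = {f"B{i:03d}": e for i, e in enumerate(employee_counts)}
allocation = {"small": 2, "medium": 2, "large": 2}  # "large" is take-all

sample = stratified_sample(frame, allocation, random.Random(2016))
```

Because the allocation for the large stratum equals its size, both large businesses are always selected, while the small and medium strata are sampled at random.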

Measures of variability

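Range: $\text{Range}(X) = X_{\max} - X_{\min}$

Variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^2$

Standard deviation: $s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^2}$

Coefficient of variation: $cv = \frac{1}{\bar{x}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^2}$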
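As a small worked example, the four measures can be computed as below. The sketch uses the divisor n, as in the formulas on this slide, and the employee counts are hypothetical values chosen so that the results are round numbers.

```python
from math import sqrt


def variability(values):
    """Range, variance, standard deviation and coefficient of variation.

    The variance uses the divisor n, matching the slide's formulas.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    sd = sqrt(variance)
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "sd": sd,
        "cv": sd / mean,
    }


# Hypothetical number of employees in eight businesses (mean = 5).
employees = [2, 4, 4, 4, 5, 5, 7, 9]
measures = variability(employees)
```

The coefficient of variation divides the standard deviation by the mean, which makes the spread comparable across strata with very different levels.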

Stratification by intuition: Dalenius-Hodges cumulative method for determining stratum boundaries

[Figure: cumulative plot, vertical axis 0 to 35. Total t = 29.9, divided into four equal intervals at t/4 = 7.5, 2t/4 = 14.9 and 3t/4 = 22.4]

Stratification by intuition 2: Splitting large groups

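Class (first five shown)   1-10   >10-20   >20-30   >30-40   >40-50
Frequency                    44       52       23       21       17

Cumulative √f   6.6  13.8  18.6  23.2  27.3  35.6  39.7  43.3  43.3  45.5  45.5  45.5  47.1
Stratum           1     1     2     2     3     3     4     4     4     4     4     4     4

[Figure: cumulative plot, vertical axis 0 to 50. Total t = 47.1, divided into four equal intervals at t/4 = 11.8, 2t/4 = 23.5 and 3t/4 = 35.3]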
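The cumulative values on this slide can be reproduced with a short sketch of the Dalenius-Hodges cumulative √f rule. The first function accumulates the square roots of the class frequencies. The second divides the total t into equal intervals and, as an assumption about how the slide assigns strata, places each boundary at the class whose cumulative value lies closest to the target k·t/4. The function names are illustrative.

```python
from itertools import accumulate
from math import sqrt


def cum_sqrt_f(frequencies):
    """Cumulative sum of the square roots of the class frequencies."""
    return list(accumulate(sqrt(f) for f in frequencies))


def assign_strata(cum_values, n_strata):
    """Assign each class to a stratum from its cumulative sqrt(f) value.

    Assumed rule: the k-th boundary is the class whose cumulative value
    is closest to k * total / n_strata; that class closes stratum k.
    """
    total = cum_values[-1]
    cut_indices = []
    for k in range(1, n_strata):
        target = k * total / n_strata
        closest = min(range(len(cum_values)),
                      key=lambda i: abs(cum_values[i] - target))
        cut_indices.append(closest)

    strata = []
    stratum = 1
    for i in range(len(cum_values)):
        # Move to the next stratum once the current boundary class is passed.
        while stratum < n_strata and i > cut_indices[stratum - 1]:
            stratum += 1
        strata.append(stratum)
    return strata


# Frequencies of the five classes shown on the slide.
first_five = [round(v, 1) for v in cum_sqrt_f([44, 52, 23, 21, 17])]

# All thirteen cumulative values as printed on the slide.
slide_cum = [6.6, 13.8, 18.6, 23.2, 27.3, 35.6, 39.7,
             43.3, 43.3, 45.5, 45.5, 45.5, 47.1]
slide_strata = assign_strata(slide_cum, 4)
```

With the five frequencies shown, the rounded cumulative values agree with the first five values on the slide, and the nearest-class rule applied to all thirteen values reproduces the stratum row.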

Alternatives

Varying sample variances: Hidiroglou (1986), Lavallée-Hidiroglou (1988)

Power allocation

Allocation proportional to the population standard deviation estimate × stratum population size (Neyman 1934):

$n_h = n \frac{s_h N_h}{\sum_h s_h N_h}$

Cochran (1977), with cost = internal + external cost (response burden)

Probability proportional to size (PPS) sampling: direct rather than stratified sampling, proportional to suitable auxiliary information

$0 \le s(\text{eed}) \le X/n$
$n_1 \approx s$
$n_2 \approx s + X/n$
$n_3 \approx s + X/n + X/n$
$n_n \approx \dots$
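Two of these alternatives lend themselves to a short sketch: Neyman allocation and systematic selection with probability proportional to size. The code below is illustrative only. Function names and input values are hypothetical, the allocation is returned unrounded, and the seed is restricted to the half-open interval [0, X/n) so that the last selection point stays inside the cumulated size total X.

```python
from bisect import bisect_right
from itertools import accumulate


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Allocate n proportionally to N_h * s_h (unrounded)."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def systematic_pps(sizes, n, seed_point):
    """Systematic PPS selection.

    sizes: size measure X_i of each unit, in frame order
    n: number of selection points
    seed_point: start s, with 0 <= s < X/n where X = sum(sizes)
    Returns the index of the unit hit by each point s, s + X/n, ...
    A unit larger than X/n can be hit more than once.
    """
    total = sum(sizes)
    interval = total / n
    if not 0 <= seed_point < interval:
        raise ValueError("seed_point must lie in [0, X/n)")
    cumulative = list(accumulate(sizes))
    points = [seed_point + k * interval for k in range(n)]
    # bisect_right finds the first unit whose cumulative size exceeds the point.
    return [bisect_right(cumulative, p) for p in points]


# Hypothetical strata: the small, homogeneous stratum and the
# large-variance stratum receive equal shares here.
allocation = neyman_allocation(100, [1000, 100], [1.0, 10.0])

# Hypothetical size measures for four units; X = 100, X/n = 50.
selected = systematic_pps([10, 20, 30, 40], n=2, seed_point=25.0)
```

In the PPS example the selection points are 25 and 75, which fall in the cumulated size ranges of the second and fourth unit.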

Hard to sample:

• A range of products (or services) that vary between none and a lot
  – Large samples or samples based on census
• Prices (transactions) of a range of products (or services)
  – Multistage sampling
• Estimates of rare characteristics
  – Satellite registers
  – Two-stage sampling / filter questions

Cutoff sampling

[Bar chart: distribution by size class]

                      Small (0-9 employees)   Medium (10-99 employees)   Large (100+ employees)
No of businesses            93.1 %                    6.4 %                     0.5 %
No of employees             28.5 %                   31.8 %                    39.7 %
Economic turnover           23.9 %                   24.6 %                    51.6 %

Variable rather than unit coverage:
• The large are fully enumerated
• The middle-sized are sampled
• The small are left out

Common in spite of immediate disadvantages:
• Estimates among the small based on assumptions
• Biased estimates of the total
• Changes among the smallest go unnoticed
• How to determine the cutoff?
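The idea of variable rather than unit coverage can be illustrated with the percentages in the chart on this slide. The sketch below rests on one reading of that chart, namely that each series (businesses, employees, turnover) is distributed over the three size classes and sums to about 100 %. Under that assumption, leaving out the small businesses drops most units but keeps most of the employment and turnover.

```python
# Shares in per cent by size class, as read from the chart (an interpretation).
shares = {
    "businesses": {"small": 93.1, "medium": 6.4, "large": 0.5},
    "employees": {"small": 28.5, "medium": 31.8, "large": 39.7},
    "turnover": {"small": 23.9, "medium": 24.6, "large": 51.6},
}


def coverage_after_cutoff(shares, excluded=("small",)):
    """Share of each variable still covered when some classes are cut off."""
    return {
        variable: sum(share for size, share in by_size.items()
                      if size not in excluded)
        for variable, by_size in shares.items()
    }


coverage = coverage_after_cutoff(shares)
```

With the small class excluded, roughly 7 % of the businesses remain in scope, yet they account for about 72 % of the employees and 76 % of the turnover.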

Sample rotation and overlap

Variance of change = var (t1) + var (t2) – 2cov(t1,t2)
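A short numerical sketch shows why overlap matters for estimates of change. The variances and the covariance below are hypothetical. The point is only that a positive covariance between the two estimates, which overlapping samples tend to produce, reduces the variance of the estimated change.

```python
def variance_of_change(var_t1, var_t2, cov_t1_t2):
    """var(t2 - t1) = var(t1) + var(t2) - 2 cov(t1, t2)."""
    return var_t1 + var_t2 - 2 * cov_t1_t2


# Hypothetical values: same variances, with and without covariance.
independent_samples = variance_of_change(4.0, 4.0, 0.0)
overlapping_samples = variance_of_change(4.0, 4.0, 3.0)
```

With independent samples the variance of change is the sum of the two variances. With the assumed covariance of 3.0 it falls from 8.0 to 2.0.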

Panel element: representativity (+/-), learning effects, (perceived) response burden

Common, frozen frame

Sample rotation Response Burden

• For the business world response burden is unaffected by sample rotation

Participation scenario, 2 surveys

Scenario (survey 1, survey 2)   (1,0)       (0,1)       (1,1)      Expected burden
Actual burden                   B1          B2          B1 + B2
Allow overlap                   p1(1-p2)    (1-p1)p2    p1p2       p1B1 + p2B2
No overlap                      p1          p2          0          p1B1 + p2B2
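The equality of the two expected burdens can be checked numerically. The sketch assumes that a business sampled for both surveys carries the sum of the two burdens, B1 + B2, and it uses hypothetical selection probabilities and burdens. Function names are illustrative.

```python
from math import isclose


def expected_burden_with_overlap(p1, p2, b1, b2):
    """Independent selection: scenario probabilities times actual burden."""
    only_survey_1 = p1 * (1 - p2)
    only_survey_2 = (1 - p1) * p2
    both_surveys = p1 * p2
    return only_survey_1 * b1 + only_survey_2 * b2 + both_surveys * (b1 + b2)


def expected_burden_without_overlap(p1, p2, b1, b2):
    """Coordinated selection: a business is never in both samples."""
    if p1 + p2 > 1:
        raise ValueError("overlap cannot be avoided when p1 + p2 > 1")
    return p1 * b1 + p2 * b2


# Hypothetical selection probabilities and burdens (for example, hours).
p1, p2, b1, b2 = 0.2, 0.1, 3.0, 5.0
with_overlap = expected_burden_with_overlap(p1, p2, b1, b2)
without_overlap = expected_burden_without_overlap(p1, p2, b1, b2)
```

Both calls give the same expected burden, p1B1 + p2B2. Avoiding overlap changes how the burden is distributed across businesses, not how much of it there is.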

• Sample/population size decides the room for rotation

• Actual fairness is unrealistic

No of employees       0-4     5-9   10-19   20-49   50-99    100+

Sample no 1             0    1286    1135    1256    1124    1635
Sample no 2             0     400     947    2094    2236    1982
Sample no 3             0     948    1509     838    1124    1239
Sample no 4          3858    1606    3787    4188    2236    1982
Sample no 5         71969   12843   15140    8368    2236    1982
Sample no 6             0     386     758    1256    1124    1982
Total sampled       75828   17469   23276   17999   10081   10802
Population size    349104   25683   15140    8368    2236    1982

Sample rotation Perceived Response Burden

• Compared to those not sampled, fairness is a weak argument

• For most businesses response burden appears slightly more evenly distributed

No of employees       0-4     5-9   10-19   20-49   50-99    100+

Sample no 4          3858    1606    3787    4188    2236    1982
Sample no 5         71969   12843   15140    8368    2236    1982
Total sampled       75828   17469   23276   17999   10081   10802
Population size    349104   25683   15140    8368    2236    1982

Neither 4 nor 5      78 %    44 %   -25 %   -50 %  -100 %  -100 %
Both 4 and 5       0.23 %     3 %    25 %    50 %   100 %   100 %
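The two percentage rows can be reproduced from the sample and population sizes. The formulas below are an inference from the numbers, not something stated on the slide: "Neither 4 nor 5" matches one minus the two sampling fractions added together (the share left untouched if the samples do not overlap, negative where overlap cannot be avoided), and "Both 4 and 5" matches the product of the two sampling fractions (independent selection).

```python
size_classes = ["0-4", "5-9", "10-19", "20-49", "50-99", "100+"]
sample_4 = [3858, 1606, 3787, 4188, 2236, 1982]
sample_5 = [71969, 12843, 15140, 8368, 2236, 1982]
population = [349104, 25683, 15140, 8368, 2236, 1982]


def sampling_fractions(sample, population):
    """Sampling fraction per size class."""
    return [n / N for n, N in zip(sample, population)]


f4 = sampling_fractions(sample_4, population)
f5 = sampling_fractions(sample_5, population)

# Share in neither sample if the two samples are kept apart (assumed formula).
neither = [1 - a - b for a, b in zip(f4, f5)]
# Share in both samples if they are drawn independently (assumed formula).
both = [a * b for a, b in zip(f4, f5)]

neither_pct = [round(100 * x) for x in neither]
both_pct = [round(100 * x, 2) for x in both]
```

In the smallest size class almost four out of five businesses are in neither sample, while from 10 employees upward the sampling fractions add to more than one, so some businesses must take part in both surveys.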

Sample rotation Perceived Response Burden

• Expected survey holidays are probably the most convincing argument

No of employees       0-4     5-9   10-19   20-49   50-99    100+

Sample no 5         71969   12843   15140    8368    2236    1982
Sample no 6             0     386     758    1256    1124    1982
Total sampled       75828   17469   23276   17999   10081   10802
Population size    349104   25683   15140    8368    2236    1982

Available for rotation: 273276 (0-4), 8214 (5-9)
To be rotated: 37914 (0-4), 8734 (5-9)
Survey holiday calculation: 7
Conditional holiday calculation: 9, 2, 0

“A design that is robust to nonsampling errors will often be better than a highly optimized design that cannot be realized in practice”

Paul Smith’s summary of chapter 5
