Sampling strategy for the dual-system correction of the under-coverage in the Register Supported 2011 Italian Population Census
Loredana Di Consiglio, Marco Fortini, Stefano Falorsi
ISTAT
Q2010 European Conference on Quality in Official Statistics - Helsinki, 4 - 6 May 2010
Outline
Purpose: to plan a sampling strategy that accounts for municipal under-coverage in the next Italian Census round
Sketch of the 2011 Italian Census
Sources of data useful in planning the Post Enumeration Survey (PES)
Sampling strategies considered for comparison
Construction of a fictitious but plausible population for simulating the sampling universe
Results of the simulation study
Key innovations of the 2011 Italian census
From the traditional enumeration method…
Search for households and people in the field
… to a register-supported census
Municipal population registers used to mail out questionnaires to people
Data collection based on web, mail-back and municipal data collection centres
Reduction of the number of enumerators
Data collection from late respondents
Coverage evaluation activities
Coverage evaluation program
Required by the Eurostat quality report, and in any case crucial given the extensive innovations in processes and methods
Over-coverage: people no longer living in the municipality who are still listed in the population registers
Checked by interviewers when contacting late respondents
Under-coverage: people living in the municipality who are not yet listed in the population registers
Supplemental lists of people
Extensive search in the field
Statistical estimation based on capture-recapture techniques
Overview of Italian census undercount
Gross under-coverage of population registers
Estimated by Fortini and Gallo (2009) at about 400,000 people (up to 560,000), using administrative data and a mixture model that accounts for under-reporting in the source
Gross under-coverage of the 2001 Census (enumeration based)
The 2001 Post Enumeration Survey estimates that about 800,000 people were missed
Both estimates rest on strong assumptions
However, this evidence makes it reasonable to use municipal population registers as the main source for household enumeration
Capture-Recapture Approach
Correction for population register undercount through a second source based on independent field enumeration
x_{1+}: people listed in the municipal register
\hat{x}_{+1}: estimate of the municipal population based on a field enumeration survey in a sample of enumeration areas (EAs)
\hat{x}_{11}: estimate of the people who would have been counted by both sources if the field enumeration had been carried out over the whole municipal area
Petersen estimator of the population size, including people hidden from both sources (Wolter, 1986):
$$\tilde{N} = \frac{x_{1+}\,\hat{x}_{+1}}{\hat{x}_{11}}$$
Main goal: municipal estimates of population counts
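A minimal Python sketch of the Petersen estimator. The function names are illustrative, and the figures are the full-enumeration totals of municipality 1015 in the pseudo-population example of this presentation (register 746, survey 752, both 743, true count 755), used here as if both sources had covered the whole municipality. The second function rewrites the same estimate as the sum of the four cells of the capture table: people in both sources, in the register only, in the survey only, plus the estimated number missed by both (the hidden part).

```python
def petersen(x_reg: float, x_surv: float, x_both: float) -> float:
    """Petersen dual-system estimate of the population size.

    x_reg  -- people listed in the register (x_{1+})
    x_surv -- people counted by the field enumeration (x_{+1})
    x_both -- people counted by both sources (x_{11})
    """
    if x_both <= 0:
        raise ValueError("the overlap count x_both must be positive")
    return x_reg * x_surv / x_both


def petersen_decomposed(x_reg: float, x_surv: float, x_both: float) -> float:
    """The same estimate, split into the four cells of the 2x2 capture table."""
    if x_both <= 0:
        raise ValueError("the overlap count x_both must be positive")
    reg_only = x_reg - x_both                        # in the register, missed by the survey
    surv_only = x_surv - x_both                      # found by the survey, not in the register
    missed_by_both = reg_only * surv_only / x_both   # estimated hidden population
    return x_both + reg_only + surv_only + missed_by_both


print(round(petersen(746, 752, 743), 2))             # 755.04
print(round(petersen_decomposed(746, 752, 743), 2))  # 755.04
```

Both forms give about 755.04, close to the true count of 755 in that example.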
Sampling design for the 2011 Post-Enumeration Survey
About 1,300 municipalities and 1,200,000 people will be sampled
Two alternative two-stage sampling designs, with municipalities and enumeration areas as primary and secondary sampling units
Design A: strata given by region by class of population size (less than 5,000; 5,000-20,000; 20,000-50,000; more than 50,000)
Design B: strata given by aggregations of provinces within region by the same 4 classes of population size (this helps reduce the bias of small area estimation, SAE)
In both designs, municipalities are stratified and selected according to their population size
Sampling municipalities is necessary in order to control costs
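A small Python sketch of how municipalities could be mapped to strata under the two designs. The class boundaries are treated as half-open intervals (so a municipality with exactly 50,000 inhabitants falls in the top class); that choice, the labels and the `province_group` argument for Design B are assumptions made for illustration.

```python
from typing import Tuple

# Size-class labels as listed for Design A; boundary handling is an assumption.
SIZE_CLASSES = ("less than 5,000", "5,000-20,000", "20,000-50,000", "more than 50,000")
UPPER_BOUNDS = (5_000, 20_000, 50_000)  # exclusive upper bound of the first three classes


def size_class(population: int) -> str:
    """Return the population-size class of a municipality."""
    for label, upper in zip(SIZE_CLASSES, UPPER_BOUNDS):
        if population < upper:
            return label
    return SIZE_CLASSES[-1]


def stratum_design_a(region: str, population: int) -> Tuple[str, str]:
    """Design A stratum: region crossed with size class."""
    return region, size_class(population)


def stratum_design_b(region: str, province_group: str, population: int) -> Tuple[str, str, str]:
    """Design B stratum: a hypothetical aggregation of provinces within the
    region, crossed with size class."""
    return region, province_group, size_class(population)


print(stratum_design_a("Toscana", 12_000))   # ('Toscana', '5,000-20,000')
```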
Estimators
Direct estimates of census counts are available only at planned domain level
Small area estimation methods are therefore needed, at least for municipalities not included in the sample
Predictors possibly available for area-level modelling:
Population counts from registers
Demographic indicators (e.g. dependency ratios)
Socio-economic indicators
In what follows we consider:
Direct estimation at regional level (planned domains)
Synthetic estimation at municipality level
Assumption: municipal under-coverage rates are invariant within each planned domain
Direct Estimators
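Simple expansion estimators:
$$\hat{N}_{m(E)} = \frac{X_{1+,m}\,\hat{X}_{+1,m(E)}}{\hat{X}_{11,m(E)}}, \qquad \hat{X}_{+1,m(E)} = \sum_{i \in m} w_i\, x_{+1,i}, \qquad \hat{X}_{11,m(E)} = \sum_{i \in m} w_i\, x_{11,i}$$
$$w_i = C_m / c_m \quad \text{(inverse of the selection probability)}$$
Calibrated expansion estimators:
$$\hat{N}_{m(C)} = \frac{X_{1+,m}\,\hat{X}_{+1,m(C)}}{\hat{X}_{11,m(C)}}, \qquad \hat{X}_{+1,m(C)} = \sum_{i \in m} w_{Ci}\, x_{+1,i}, \qquad \hat{X}_{11,m(C)} = \sum_{i \in m} w_{Ci}\, x_{11,i}$$
$$w_{Ci} = X_{1+,m} \Big/ \sum_{i \in m} x_{1+,i} \quad \text{(final weight)}$$
Sums run over the sampled EAs i of municipality m; C_m and c_m are the numbers of EAs in m and of sampled EAs in m, and X_{1+,m} is the register count of m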
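A minimal Python sketch of the expansion estimator for a sampled municipality, written for an arbitrary vector of EA weights: passing the simple weights (number of EAs in the municipality divided by the number of sampled EAs, i.e. the inverse SRS selection probability) gives the simple estimator, and passing calibrated weights gives the calibrated one. Function names are hypothetical, and the example uses municipality 1015 of the pseudo-population example with a hypothetical sample of EAs 2, 3 and 6 (3 of 7 EAs).

```python
from typing import List, Sequence


def srs_weights(n_eas: int, n_sampled: int) -> List[float]:
    """Inverse selection probability of each EA under SRS of n_sampled out of n_eas."""
    return [n_eas / n_sampled] * n_sampled


def direct_dse(x_reg_m: float, weights: Sequence[float],
               x_surv: Sequence[float], x_both: Sequence[float]) -> float:
    """Dual-system estimate X_{1+,m} * Xhat_{+1,m} / Xhat_{11,m} for municipality m.

    x_reg_m -- register total of the municipality (known for every EA)
    weights -- one weight per sampled EA (simple or calibrated system)
    x_surv  -- survey count x_{+1,i} in each sampled EA
    x_both  -- count x_{11,i} of people found by both sources in each sampled EA
    """
    x_surv_hat = sum(w * v for w, v in zip(weights, x_surv))  # expanded survey total
    x_both_hat = sum(w * v for w, v in zip(weights, x_both))  # expanded overlap total
    return x_reg_m * x_surv_hat / x_both_hat


# Hypothetical sample of EAs 2, 3 and 6 of municipality 1015.
print(round(direct_dse(746, srs_weights(7, 3), [37, 58, 65], [37, 53, 64]), 2))  # 775.06
```

The result, about 775, overshoots the true count of 755 because this sample includes EA 3, where the register misses 5 of 58 people: this is the kind of sampling variability the simulation study measures.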
Synthetic Estimator
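Simple:
$$\hat{\theta}_{D(E)} = \frac{\hat{X}_{11,D(E)}}{\hat{X}_{+1,D(E)}}, \qquad \hat{N}_{S,m(E)} = X_{1+,m} \big/ \hat{\theta}_{D(E)}$$
Calibrated:
$$\hat{\theta}_{D(C)} = \frac{\hat{X}_{11,D(C)}}{\hat{X}_{+1,D(C)}}, \qquad \hat{N}_{S,m(C)} = X_{1+,m} \big/ \hat{\theta}_{D(C)}$$
where D is the planned domain containing municipality m
Based on the assumption that under-coverage rates are invariant across municipalities belonging to the same planned domain
For each system of weights, the coverage ratio is computed at domain level
From these ratios, simple and calibrated synthetic estimators are obtained for municipalities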
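A Python sketch of the synthetic estimator: the register coverage ratio is estimated once per planned domain from the weighted sample totals, and every municipality of the domain, sampled or not, receives its register count divided by that ratio. Identifiers and data structures are hypothetical; the function assumes each domain contains at least one sampled EA, as planned domains do by design.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def synthetic_estimates(
    domain_of: Dict[str, str],
    register_total: Dict[str, float],
    sample: Iterable[Tuple[str, float, float, float]],
) -> Dict[str, float]:
    """Synthetic municipal counts X_{1+,m} / theta_hat_D.

    domain_of      -- municipality id -> planned domain id
    register_total -- municipality id -> register count X_{1+,m}
    sample         -- one (municipality id, weight, x_{+1,i}, x_{11,i}) per sampled EA
    """
    surv_tot: Dict[str, float] = defaultdict(float)  # weighted survey total per domain
    both_tot: Dict[str, float] = defaultdict(float)  # weighted overlap total per domain
    for m, w, x_surv, x_both in sample:
        d = domain_of[m]
        surv_tot[d] += w * x_surv
        both_tot[d] += w * x_both
    # Register coverage ratio per domain: Xhat_{11,D} / Xhat_{+1,D}.
    theta = {d: both_tot[d] / surv_tot[d] for d in surv_tot}
    # Invariance assumption: the domain ratio applies to every municipality in it.
    return {m: x_reg / theta[domain_of[m]] for m, x_reg in register_total.items()}


est = synthetic_estimates(
    domain_of={"1015": "D1", "9999": "D1"},     # "9999": hypothetical unsampled municipality
    register_total={"1015": 746, "9999": 1000},
    sample=[("1015", 7 / 3, 37, 37), ("1015", 7 / 3, 58, 53), ("1015", 7 / 3, 65, 64)],
)
print({m: round(v, 1) for m, v in est.items()})   # {'1015': 775.1, '9999': 1039.0}
```

The unsampled municipality "9999" borrows the ratio estimated in municipality 1015, which is exactly where the invariance assumption does its work.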
Empirical study
It is based on a simulation study
Two pseudo-populations of 335,643 Italian EAs were considered
Sources of information:
2001 Italian Post Enumeration Survey
Administrative data on changes of residence that occurred after the 2001 Census (from November 2002 to December 2005)
For every non-empty EA belonging to the 8,101 Italian municipalities, the following counts were generated:
Observed count from the population register (X1+)
True population count (N)
Field enumeration count (X+1)
Count of people enumerated by both sources (X11)
Assemble the Pseudo-population
For each municipality:
Munic. Id  EA Id  True N  P. Reg  Survey  Both
1015       1              535
1015       2              37
1015       3              53
1015       4              40
1015       5              4
1015       6              64
1015       7              13
Tot.                      746
EA population register counts come from 2001 Census counts
Assign True population counts to municipality
For each municipality:
Munic. Id  EA Id  True N  P. Reg  Survey  Both
1015       1              535
1015       2              37
1015       3              53
1015       4              40
1015       5              4
1015       6              64
1015       7              13
Tot.              755     746
True municipal population counts: obtained by inflating the P. Reg. count by 1/r, where the coverage rate r is estimated with the model in Fortini and Gallo (2009) (two variants, giving two pseudo-populations)
Assign True population counts to EAs
For each municipality:
Munic. Id  EA Id  True N  P. Reg  Survey  Both
1015       1      538     535
1015       2      37      37
1015       3      58      53
1015       4      40      40
1015       5      4       4
1015       6      65      64
1015       7      13      13
Tot.              755     746
True N is allocated among EAs through a hierarchical Dirichlet-multinomial model, with parameter vector p given by the distribution of the P. Reg. population among EAs
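A NumPy sketch of the first two generation steps for one municipality, under stated assumptions. The register counts are inflated by 1/r to get the true municipal total; then, to keep every EA's true count at least as large as its register count (as in the example table), only the shortfall is spread over the EAs: EA shares are drawn from a Dirichlet distribution centred on the register shares p, and the shortfall is allocated multinomially. The `concentration` parameter and this "allocate only the shortfall" reading are assumptions, not the authors' documented specification; r = 746/755 simply reproduces the example total.

```python
import numpy as np


def allocate_true_counts(reg_counts, r, concentration, rng):
    """Pseudo-true EA counts for one municipality.

    reg_counts    -- register counts per EA (all assumed positive, i.e. non-empty EAs)
    r             -- municipal register coverage rate
    concentration -- hypothetical Dirichlet concentration (larger = shares closer to p)
    rng           -- numpy.random.Generator
    """
    reg = np.asarray(reg_counts, dtype=np.int64)
    true_total = int(round(reg.sum() / r))        # inflate the register total by 1/r
    shortfall = max(true_total - int(reg.sum()), 0)
    p = reg / reg.sum()                            # register share of each EA
    q = rng.dirichlet(concentration * p)           # hierarchical level: EA shares around p
    extra = rng.multinomial(shortfall, q)          # people missing from the register, per EA
    return reg + extra


rng = np.random.default_rng(2011)
reg_1015 = [535, 37, 53, 40, 4, 64, 13]
true_1015 = allocate_true_counts(reg_1015, r=746 / 755, concentration=50.0, rng=rng)
print(true_1015, true_1015.sum())   # EA counts depend on the seed; the total is 755
```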
Assign survey counts to EAs
For each municipality:
Munic. Id  EA Id  True N  P. Reg  Survey  Both
1015       1      538     535     536
1015       2      37      37
1015       3      58      53
1015       4      40      40
1015       5      4       4
1015       6      65      64
1015       7      13      13
Tot.              755     746
EA survey counts: True N multiplied by a coverage rate rs drawn from a beta-binomial model, with "alpha" and "beta" chosen so that the mean and variance of the 2001 PES coverage rates are reproduced (5 macro-regions by 4 classes of municipal population size)
Example: EA 1, True N = 538, rs → survey count 536
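A NumPy sketch of the survey-count step, reading "beta-binomial" as: draw an EA coverage rate rs from a Beta(alpha, beta) distribution, then draw the survey count as Binomial(True N, rs). Alpha and beta are set by the method of moments from a target mean and variance of coverage rates; the values 0.98 and 0.0004 below are placeholders, not the 2001 PES figures for any stratum, and the function names are hypothetical.

```python
import numpy as np


def beta_params(mean, var):
    """Method-of-moments Beta(alpha, beta) reproducing a given mean and variance."""
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < var < mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0      # alpha + beta
    return mean * total, (1.0 - mean) * total


def survey_counts(true_counts, mean, var, rng):
    """Survey count per EA: rs ~ Beta(alpha, beta), then Binomial(True N, rs)."""
    alpha, beta = beta_params(mean, var)
    n_true = np.asarray(true_counts, dtype=np.int64)
    rs = rng.beta(alpha, beta, size=n_true.shape)   # one coverage rate per EA
    return rng.binomial(n_true, rs)


# Placeholder stratum moments (not the 2001 PES values).
rng = np.random.default_rng(2001)
true_1015 = [538, 37, 58, 40, 4, 65, 13]
print(survey_counts(true_1015, mean=0.98, var=0.0004, rng=rng))
```

In the study, one (mean, variance) pair would be used per macro-region by size-class cell.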
Assign survey counts to municipality
For each municipality:
Munic. Id  EA Id  True N  P. Reg  Survey  Both
1015       1      538     535     536
1015       2      37      37      37
1015       3      58      53      58
1015       4      40      40      39
1015       5      4       4       4
1015       6      65      64      65
1015       7      13      13      13
Tot.              755     746     752
The municipal count is obtained by summing the values of the EAs
Assign number of people enumerated by both the lists
For each municipality:
Munic. Id  EA Id  True N  P. Reg  Survey  Both
1015       1      538     535     536     533
1015       2      37      37      37
1015       3      58      53      58
1015       4      40      40      39
1015       5      4       4       4
1015       6      65      64      65
1015       7      13      13      13
Tot.              755     746     752
People enumerated by both lists: hypergeometric distribution at EA level with parameters True N, P. Reg. and Survey
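A NumPy sketch of the overlap step. If the survey is viewed as a random draw of Survey people out of the True N in an EA, of whom P. Reg are on the register, the number found by both sources is hypergeometric; `numpy.random.Generator.hypergeometric(ngood, nbad, nsample)` draws it for all EAs at once. The function name is hypothetical; the counts are those of municipality 1015.

```python
import numpy as np


def both_counts(true_counts, reg_counts, surv_counts, rng):
    """Number of people enumerated by both the register and the survey, per EA.

    Survey people are treated as a random draw of size Survey from the True N
    people of the EA, of whom P. Reg are listed in the register.
    """
    n_true = np.asarray(true_counts, dtype=np.int64)
    n_reg = np.asarray(reg_counts, dtype=np.int64)
    n_surv = np.asarray(surv_counts, dtype=np.int64)
    # ngood = listed in the register, nbad = not listed, nsample = counted by the survey
    return rng.hypergeometric(n_reg, n_true - n_reg, n_surv)


rng = np.random.default_rng(7)
both_1015 = both_counts(
    [538, 37, 58, 40, 4, 65, 13],   # True N
    [535, 37, 53, 40, 4, 64, 13],   # P. Reg
    [536, 37, 58, 39, 4, 65, 13],   # Survey
    rng,
)
print(both_1015)
```

Where the register or the survey is complete the draw is degenerate (EA 3, for instance, always gives 53); only EA 1 can vary, between 533 and 535.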
Assign number of people enumerated by both the lists
For each municipality:
Munic. Id  EA Id  True N  P. Reg  Survey  Both
1015       1      538     535     536     533
1015       2      37      37      37      37
1015       3      58      53      58      53
1015       4      40      40      39      39
1015       5      4       4       4       4
1015       6      65      64      65      64
1015       7      13      13      13      13
Tot.              755     746     752     743
The municipal count is obtained by summing over EAs
St. dev. of coverage rates among municipalities
About 400,000 and 900,000 missing people were generated for the pseudo-register and the pseudo-survey, respectively
Population register variability is larger for POP2 than for POP1
Survey variability is larger than the corresponding population register variability (because of its lower coverage rate), except in the North-East
Survey variability is not very close to PES variability, although their order of magnitude is the same
Macro-region  Register POP1  Register POP2  Survey POP1  Survey POP2  PES
N-West        0.0051         0.0096         0.0164       0.0172       0.0211
N-East        0.0051         0.0100         0.0041       0.0044       0.0145
Centre        0.0041         0.0085         0.0108       0.0128       0.0059
South         0.0036         0.0068         0.0094       0.0099       0.0284
Isles         0.0040         0.0082         0.0134       0.0135       0.0211
Variability of coverage rates among EAs – Population registers
Pseudo-coverage of the register vs size of EAs (left) compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs)
[Figure annotation: "Too many points here"]
Simulated EAs include too many large units with very small coverage rates, which does not seem realistic in our context
Variability of coverage rates among EAs – Control survey
Pseudo-coverage of the survey vs size of EAs (left) compared with the distribution of EA coverage rates in the 2001 Italian PES (1,098 EAs)
[Figure annotation: "Too few points here"]
In this case, simulated EAs include too few small units with small coverage rates
Simulation of the sampling space
Four tests: designs A and B for Populations 1 and 2
Each simulation is based on 500 sample replications
Municipalities are sampled with probability proportional to their population size
EAs are selected by simple random sampling within municipalities
Simple and weighted direct estimation at domain level
Synthetic estimation at municipality level
Population counts from population registers are used as the benchmark for comparisons: they are biased downwards but available at no cost
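A NumPy sketch of one sample replication within a stratum: municipalities are drawn by systematic PPS on their population size, then EAs by simple random sampling without replacement. Systematic PPS is one common way to select with probability proportional to size and is an assumption here; the slides do not say which PPS scheme was used. Function names and the example data are hypothetical.

```python
import numpy as np


def systematic_pps(sizes, n, rng):
    """Indices of units selected by systematic PPS (probability proportional to size).

    A unit larger than the sampling interval can be hit more than once; duplicates
    are dropped, so such (self-representative) units are included once.
    """
    cum = np.cumsum(np.asarray(sizes, dtype=float))
    step = cum[-1] / n
    points = rng.uniform(0.0, step) + step * np.arange(n)
    return np.unique(np.searchsorted(cum, points, side="right"))


def two_stage_sample(eas_by_munic, munic_sizes, n_munic, eas_per_munic, rng):
    """One replication: PPS sample of municipalities, then SRS of EAs in each."""
    sample = {}
    for m in systematic_pps(munic_sizes, n_munic, rng):
        eas = eas_by_munic[m]
        k = min(eas_per_munic, len(eas))
        picked = rng.choice(eas, size=k, replace=False)   # SRS without replacement
        sample[int(m)] = [int(e) for e in picked]
    return sample


rng = np.random.default_rng(500)
sizes = [1200, 4800, 15000, 60000]                             # hypothetical populations
eas = [[1, 2], [3, 4, 5], [6, 7, 8, 9], list(range(10, 40))]   # hypothetical EA ids
print(two_stage_sample(eas, sizes, n_munic=2, eas_per_munic=2, rng=rng))
```

Repeating such a draw 500 times per design and population, and recomputing the estimators each time, gives the sampling distributions summarised in the following tables.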
Results – Bias of registers vs. synthetic estimates
Main results
Direct estimates perform well at domain level in terms of bias and MSE
Calibrated estimates outperform the simple ones in terms of MSE, for both direct and synthetic estimators
The less aggregated design B does not significantly improve the estimates, so only design A is shown here
In terms of bias, the synthetic estimator improves on the registers, although the improvement decreases for larger municipalities; these results are more evident for Population 1 than for Population 2
In terms of maximum bias, the improvement is less noticeable
Table 2 Average and (maximum) relative bias % of the calibrated synthetic estimator and of the register count, for Populations 1 and 2 (design A, by class of municipality size)
Municipality size   P1 Register     P1 Synthetic    P2 Register     P2 Synthetic
Less than 5,000     0.952 (8.75)    0.327 (7.94)    0.844 (14.12)   0.676 (13.29)
5,000-19,000        1.063 (9.20)    0.261 (7.80)    0.940 (14.92)   0.541 (13.97)
20,000-50,000       0.642 (3.74)    0.224 (2.40)    0.702 (6.30)    0.469 (5.27)
50,000 and more     0.515 (1.42)    0.189 (1.30)    0.479 (2.43)    0.326 (1.81)
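A NumPy sketch of the two accuracy measures reported in Tables 2 and 3, computed per municipality over the R replications and then averaged, or maximised, within a size class. Taking the absolute value of the relative bias is an assumption (the register bias values in Table 2 are all positive although registers undercount); names are hypothetical. For the register the "estimate" is the same in every replication, so its relative bias equals its relative root MSE.

```python
import numpy as np


def relative_errors(estimates, truth):
    """Per-municipality relative bias % and relative root MSE %.

    estimates -- array of shape (R, M): R replications, M municipalities
    truth     -- array of length M with the true counts N_m
    """
    est = np.asarray(estimates, dtype=float)
    n_true = np.asarray(truth, dtype=float)
    rel_bias = np.abs(est.mean(axis=0) - n_true) / n_true * 100
    rel_rmse = np.sqrt(((est - n_true) ** 2).mean(axis=0)) / n_true * 100
    return rel_bias, rel_rmse


def average_and_max(values):
    """Table summary for a size class: average and maximum over its municipalities."""
    return float(np.mean(values)), float(np.max(values))


# Register count of municipality 1015 (746 vs a true 755) over 500 replications.
rb, rrmse = relative_errors(np.full((500, 1), 746.0), [755])
print(round(rb[0], 3), round(rrmse[0], 3))   # 1.192 1.192
```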
Bias of synthetic estimator vs register counts, Population 1 - design A by class of municipality size
[Figure panels: less than 5,000; 5,000-19,000; 20,000-49,000; 50,000 and more]
Bisectors delimit the zone where synthetic estimates are better than simple register counts in terms of bias
The synthetic estimator almost always improves on the registers in terms of bias; however, the improvement does not seem very prominent
Bias of synthetic estimator vs register counts, Population 2 - design A by class of municipality size
Same conclusion for POP2, with worse results for larger municipalities
Results – MSE of synthetic and direct estimators
The direct estimator can be applied to self-representative municipalities
It is reported in the table for the two classes of larger municipalities
On average, the synthetic estimator outperforms the direct one, which does not seem useful even in sampled municipalities
The MSE of synthetic estimates is much larger than their bias (Table 2)
Since this does not happen in real cases, it could be evidence of too high a variability of the pseudo-populations at EA level
Table 3 Average relative root MSE % (ARRMSE) and, in parentheses, maximum ARRMSE % of the calibrated synthetic estimator for Populations 1 and 2, design A, by class of municipality size (for classes 3 and 4 the calibrated direct estimate is also reported)
Municipality size   P1 Synthetic    P1 Direct       P2 Synthetic    P2 Direct
Less than 5,000     0.658 (7.95)    -               1.070 (13.30)   -
5,000-19,000        1.198 (9.96)    -               1.297 (13.99)   -
20,000-50,000       0.966 (3.58)    2.793 (30.667)  1.350 (5.54)    3.113 (42.88)
50,000 and more     0.999 (6.86)    2.609 (19.07)   1.280 (3.43)    2.372 (15.26)
Difference between synthetic and direct estimator in terms of MSE, municipalities larger than 50,000 inhabitants
Most municipalities larger than 50,000 inhabitants show a smaller synthetic MSE (negative values)
Direct and synthetic estimates are equivalent for the largest municipalities (more than 250,000 inhabitants), but only in POP1
Concluding Remarks
The sampling strategy of the next Italian Census PES is evaluated here through pseudo-populations and simulation experiments
Synthetic estimates yield a slight improvement over census counts from registers
Although a Census PES is required by EU regulation for evaluation purposes, our present results do not endorse the use of the PES to correct Census counts
Although not discussed here, direct estimation with calibration achieved suitable results at domain level in terms of both bias and variance
Further developments
Better definition of pseudo-populations with respect to coverage ratios between EAs
Use of model-based estimation (EBLUP), which proved promising in our previous studies carried out in a simplified framework