1

Enhancing Small Area Estimation Methods: Applications to Istat’s Survey Data

Ranalli M.G. ~ Università di Perugia

D’Alo’ M., Di Consiglio L., Falorsi S., Solari F. ~ Istat

Pratesi M., Salvati N. ~ Università di Pisa

Q2008 ~ Rome, July 11th

2

OUTLINE

Italian Labour Force Survey

Standard small area estimators for LFS

Small area estimators that incorporate spatial information

Model based direct estimator (MBDE)

Semi-parametric models (based on p-splines)

Experimental study

Analysis of results

Final remarks

3

Labour Force Survey description

The Labour Force Survey (LFS) is a quarterly two-stage survey with partial overlap of sampling units according to a 2-2-2 rotation scheme.

In each province the municipalities are classified as either Self-Representing Areas (SRAs) or Non Self-Representing Areas (NSRAs).

From each SRA a sample of households is selected.

In NSRAs the sample is based on a stratified two-stage sampling design. The municipalities are the Primary Sampling Units (PSUs), while the households are the Secondary Sampling Units (SSUs).

For each quarterly sample about 1350 municipalities and 200,000 individuals are involved.

4

Small area estimation on LFS

■ Since 2000, ISTAT has disseminated yearly LFS estimates of the employed and unemployed counts for the 784 Local Labour Market Areas (LLMAs).

■ LLMAs are unplanned domains obtained as clusters of municipalities; they cut across provinces, which are the finest planned domains of the LFS.

■ The direct estimates are unstable because LLMA sample sizes are very small (more than 100 LLMAs have zero sample size), so SAE methods are necessary.

■ Until 2003, a design-based composite-type estimator was adopted.

■ Since 2004, after the redesign of the LFS sampling strategy, a unit-level EBLUP estimator with spatially autocorrelated random area effects has been used.

5

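Standard small area estimators – design based

Direct and GREG estimator

The direct estimator is given by

$$\hat{\bar{Y}}_d^{D} = \sum_{i \in s_d} w_i y_i / N_d$$

where $s_d$ is the sample in area $d$, $w_i$ is the sampling weight of unit $i$ and $N_d$ is the population size of area $d$.

The GREG estimator is based on the standard linear model:

$$y_{id} = \mathbf{x}_{id}^{T}\boldsymbol{\beta} + \varepsilon_{id}, \qquad E(\varepsilon_{id}) = 0, \quad \mathrm{var}(\varepsilon_{id}) = \sigma^{2}$$

and can be expressed as an adjustment of the direct estimator for differences between the sample and population area means of covariates

$$\hat{\bar{Y}}_d^{GREG} = \hat{\bar{Y}}_d^{D} + \left(\bar{\mathbf{X}}_d - \hat{\bar{\mathbf{X}}}_d^{D}\right)^{T}\hat{\boldsymbol{\beta}}_w$$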
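The direct and GREG estimators on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the area population size `N_d` and the population covariate means `Xbar_pop` are taken as known, and the survey-weighted least squares fit used for the regression coefficients is one plausible choice, not necessarily the one used for the LFS.

```python
import numpy as np

def direct_mean(y, w, N_d):
    """Direct estimator of the area mean: sum_i w_i * y_i / N_d."""
    return float(np.sum(w * y) / N_d)

def weighted_beta(y, X, w):
    """Survey-weighted least squares coefficients (assumed choice for beta_w)."""
    XtW = X.T * w                      # each column of X.T scaled by its weight
    return np.linalg.solve(XtW @ X, XtW @ y)

def greg_mean(y, X, w, N_d, Xbar_pop, beta_w):
    """GREG: direct estimate adjusted for the gap between the population
    covariate means and their direct estimates."""
    y_dir = direct_mean(y, w, N_d)
    X_dir = (w[:, None] * X).sum(axis=0) / N_d   # direct estimate of covariate means
    return float(y_dir + (Xbar_pop - X_dir) @ beta_w)

# Toy data for one area (hypothetical values, not LFS data)
y = np.array([1.0, 0.0, 1.0, 0.0])               # 0-1 unemployment indicator
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0], [1.0, 8.0]])
w = np.full(4, 5.0)                              # sampling weights
beta_w = weighted_beta(y, X, w)
print(direct_mean(y, w, 20.0))
print(greg_mean(y, X, w, 20.0, np.array([1.0, 5.5]), beta_w))
```

When the direct estimates of the covariate means already match the population means, the adjustment term vanishes and the GREG estimate coincides with the direct one.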

6

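Standard small area estimators – model based

Unit level Synthetic and EBLUP

The target parameter is the area mean

$$\bar{Y}_d = \sum_{i \in U_d} y_i / N_d$$

The Synthetic estimator assumes a standard linear mixed model with unit-specific auxiliary variables, random area-specific effects and errors independently normally distributed

$$y_{id} = \mathbf{x}_{id}^{T}\boldsymbol{\beta} + u_d + e_{id}, \qquad u_d \sim iid\ N(0, \sigma_u^{2}), \quad e_{id} \sim iid\ N(0, \sigma_e^{2})$$

and is given by

$$\hat{\bar{Y}}_d^{SI} = \bar{\mathbf{X}}_d^{T}\hat{\boldsymbol{\beta}}$$

The EBLUP estimator assumes the same model but is given by

$$\hat{\bar{Y}}_d^{EB} = \bar{\mathbf{X}}_d^{T}\hat{\boldsymbol{\beta}} + \hat{u}_d$$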
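The difference between the Synthetic and EBLUP estimators is the predicted area effect. A minimal sketch follows, assuming the usual nested-error shrinkage form for the predicted effect, with the regression coefficients and the two variance components treated as already estimated (in practice they would come from an ML or REML fit). Function and argument names are hypothetical.

```python
import numpy as np

def synthetic_mean(Xbar_pop, beta_hat):
    """Synthetic estimator: population covariate means times fitted coefficients."""
    return float(Xbar_pop @ beta_hat)

def eblup_mean(Xbar_pop, beta_hat, ybar_s, xbar_s, n_d, sigma_u2, sigma_e2):
    """EBLUP: synthetic part plus the predicted area effect u_hat_d.

    u_hat_d = gamma_d * (ybar_s - xbar_s' beta_hat), where
    gamma_d = n_d * sigma_u2 / (n_d * sigma_u2 + sigma_e2) is the shrinkage factor.
    Variance components are assumed given (plug-in estimates).
    """
    gamma_d = n_d * sigma_u2 / (n_d * sigma_u2 + sigma_e2)
    u_hat = gamma_d * (ybar_s - xbar_s @ beta_hat)
    return float(Xbar_pop @ beta_hat + u_hat)

# Toy example with hypothetical values
beta_hat = np.array([0.1, 0.2])
Xbar_pop = np.array([1.0, 0.5])
print(synthetic_mean(Xbar_pop, beta_hat))
print(eblup_mean(Xbar_pop, beta_hat, ybar_s=0.5, xbar_s=np.array([1.0, 0.5]),
                 n_d=4, sigma_u2=1.0, sigma_e2=4.0))
```

With `n_d = 0` the shrinkage factor is zero, so the EBLUP falls back to the synthetic estimate, which is how non-sampled areas are handled in this formulation.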

7

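Enhanced small area estimators

1. Unit level EBLUP with spatial correlation of area effects

The EBLUP-S estimator is based on the following unit level linear mixed model:

$$y_{id} = \mathbf{x}_{id}^{T}\boldsymbol{\beta} + u_d + e_{id}$$

$$\mathbf{u} \sim MN(\mathbf{0}, \sigma_u^{2}\mathbf{A}), \qquad \mathbf{e} \sim MN(\mathbf{0}, \sigma_e^{2}\mathbf{I}_N)$$

The matrix A depends on the distances among the areas and on an unknown parameter $\alpha$ connected to the spatial correlation coefficient among the areas:

$$\mathbf{A} = [a_{d,d'}], \qquad a_{d,d'} = \left[1 + \delta_{d,d'}\exp\left(\frac{dist(d,d')}{\alpha}\right)\right]^{-1}, \qquad \delta_{d,d'} = \begin{cases} 0 & \text{if } d = d' \\ 1 & \text{otherwise} \end{cases}$$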
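As an illustration of how the matrix A of the EBLUP-S model can be built, the sketch below assumes the entries have the form [1 + δ exp(dist/α)]^(-1), with δ equal to 0 on the diagonal and 1 elsewhere, and uses Euclidean distances between area centroids. The centroid coordinates, the value of `alpha` and the function name are hypothetical; in the model `alpha` is unknown and has to be estimated.

```python
import numpy as np

def spatial_matrix(coords, alpha):
    """Build A = [a_dd'] with a_dd' = 1 / (1 + delta_dd' * exp(dist(d, d') / alpha)).

    coords : (D, 2) array of area centroids (assumed Euclidean coordinates)
    alpha  : positive scale parameter tied to the spatial correlation
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise distances
    delta = 1.0 - np.eye(len(coords))             # 0 on the diagonal, 1 elsewhere
    return 1.0 / (1.0 + delta * np.exp(dist / alpha))

# Three hypothetical area centroids
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
A = spatial_matrix(coords, alpha=5.0)
print(np.round(A, 3))
```

The diagonal entries equal 1 and the off-diagonal entries decrease as the distance between areas grows, so nearby areas have more strongly related random effects.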

8


2. Model Based Direct Estimator (Chambers & Chandra, 2006)

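The MBD estimator is based on a unit level linear mixed model and is given by

$$\hat{\bar{Y}}_d^{MBD} = \sum_{i \in s_d} w_i^{m} y_i \Big/ \sum_{i \in s_d} w_i^{m}$$

where the weights $w_i^{m}$ are such that $\hat{Y} = \sum_{i \in s} w_i^{m} y_i$ is the (E)BLUP of $Y = \sum_{i \in U} y_i$ under the model (Royall, 1976).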

Calibrated with respect to the total of x.

Reduces bias vs EBLUP

Does not allow estimation for non-sampled areas

Less efficient than EBLUP

9

3. Nonparametric EBLUP (Opsomer et al., 2008)

In the literature there are many nonparametric regression methods (kernel, local polynomial, wavelets, …), but they are difficult to incorporate in a small area model.

Methods based on penalized splines (Eilers and Marx, 1996; Ruppert et al., 2003) can be estimated by means of mixed models, which makes them a promising candidate for SAE methods.

$$y_{id} = \mathbf{x}_{id}^{T}\boldsymbol{\beta} + f(z_{1id}) + f(z_{2id}, z_{3id}) + u_d + e_{id}$$

$$u_d \sim iid\ N(0, \sigma_u^{2}), \qquad e_{id} \sim iid\ N(0, \sigma_e^{2})$$

Great flexibility in the definition of the model.

Estimable with existing software using REML.

Hard to estimate efficiency and to test the significance of terms (via bootstrap?)
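To show why penalized splines fit naturally into a mixed model, the sketch below builds a truncated linear spline basis for a single covariate and fits it by penalized least squares, where only the spline coefficients are penalized. For a fixed penalty `lam` this is the same as the BLUP in a mixed model whose spline coefficients are random effects, with `lam` playing the role of the ratio between the error variance and the spline-coefficient variance. The basis type, the knot placement at quantiles and the fixed `lam` are assumptions for illustration; under REML the penalty would be estimated rather than fixed.

```python
import numpy as np

def truncated_linear_basis(z, num_knots=10):
    """Return Z with columns (z - k)_+ for knots k placed at quantiles of z."""
    z = np.asarray(z, dtype=float)
    probs = np.arange(1, num_knots + 1) / (num_knots + 1)
    knots = np.quantile(np.unique(z), probs)
    Z = np.maximum(z[:, None] - knots[None, :], 0.0)
    return Z, knots

def penalized_fit(y, X, Z, lam):
    """Penalized least squares: fixed part X unpenalized, spline part Z penalized."""
    C = np.hstack([X, Z])
    penalty = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])
    return np.linalg.solve(C.T @ C + lam * penalty, C.T @ y)

# Toy example: smooth effect of a covariate such as age (hypothetical data)
rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * z) + rng.normal(scale=0.2, size=z.size)
X = np.column_stack([np.ones_like(z), z])
Z, knots = truncated_linear_basis(z, num_knots=10)
coef = penalized_fit(y, X, Z, lam=1.0)
fitted = np.hstack([X, Z]) @ coef
print(fitted[:5])
```

In a small area setting the same `Z` block would simply be added to the random-effects design alongside the area effects.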

10

LFS empirical study

A simulation study on the LFS has been carried out to estimate the unemployment rate at LLMA level.

500 two-stage LFS samples have been drawn from the 2001 census data set.

The performance of the methods has been evaluated for the estimation of the unemployment rate in the 127 LLMAs belonging to the geographical area “Center of Italy”.

The GREG, Synthetic and EBLUP small area estimators have been applied with two different sets of auxiliary variables:

Case A – LFS real covariates = sex by 14 age classes + employment indicator at the previous census;

Case B – LFS real covariates + geographic coordinates (latitude and longitude of the municipality the sampling unit belongs to).

11

Enhanced small area estimators

■ Spatial EBLUP (EBLUP SP): a spatial correlation structure is included in the variance matrix of the random area effects, with Case A covariates.

■ MBD: model based direct estimation is performed for sampled LLMAs, while a synthetic estimator based on a unit level linear mixed model is used for non-sampled LLMAs (Case A covariates).

■ Nonparametric EBLUP: two semiparametric representations based on penalized splines have been applied (fitted as additional random effects):

o geographical coordinates of the municipality (EBLUP-SPLINE SP): this allows for a finer representation of the spatial component than EBLUP SP (at municipality level instead of LLMA level);

o age (EBLUP-SPLINE AGE and EBLUP SP-SPLINE AGE).

12

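Evaluation Criteria

% Relative Bias:

$$RB_d = \frac{1}{R}\sum_{r=1}^{R}\frac{\hat{\bar{Y}}_d^{r} - \bar{Y}_d}{\bar{Y}_d}\cdot 100$$

% Relative Root Mean Squared Error:

$$RRMSE_d = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\frac{\hat{\bar{Y}}_d^{r} - \bar{Y}_d}{\bar{Y}_d}\right)^{2}}\cdot 100$$

where $\hat{\bar{Y}}_d^{r}$ is the estimate for area $d$ in replicate $r$, $\bar{Y}_d$ is the true value, $R$ is the number of replicates and $D$ is the number of areas.

Average Absolute RB:

$$AARB = \frac{1}{D}\sum_{d=1}^{D}\left|RB_d\right|$$

Average RRMSE:

$$ARRMSE = \frac{1}{D}\sum_{d=1}^{D}RRMSE_d$$

Maximum Absolute RB:

$$MARB = \max_d \left|RB_d\right|$$

Maximum RRMSE:

$$MRRMSE = \max_d RRMSE_d$$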
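The evaluation criteria can be computed directly from the matrix of simulated estimates. The sketch below assumes the estimates are stored in an array with one row per replicate and one column per area, and that the true area values are known from the census; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def evaluation_criteria(estimates, truth):
    """Compute RB_d, RRMSE_d and their summaries over areas.

    estimates : (R, D) array, estimate for area d in replicate r
    truth     : (D,) array of true area values
    """
    rel_err = (estimates - truth) / truth
    rb = 100.0 * rel_err.mean(axis=0)                    # % relative bias per area
    rrmse = 100.0 * np.sqrt((rel_err ** 2).mean(axis=0)) # % relative RMSE per area
    return {
        "RB": rb,
        "RRMSE": rrmse,
        "AARB": float(np.abs(rb).mean()),
        "ARRMSE": float(rrmse.mean()),
        "MARB": float(np.abs(rb).max()),
        "MRRMSE": float(rrmse.max()),
    }

# Toy example: 2 replicates, 2 areas (hypothetical values)
truth = np.array([10.0, 20.0])
estimates = np.array([[11.0, 18.0],
                      [9.0, 18.0]])
print(evaluation_criteria(estimates, truth))
```

In the toy example the first area is unbiased but variable, while the second is biased with no variability, so both areas end up with the same RRMSE but different RB.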

13

Results – A: LFS covariates; B = A + geographic coordinates of the municipality

ESTIMATOR               AARB   ARRMSE   MARB   MRRMSE
DIRECT                   2.9     51.7   20.4     90.7
GREG A                   7.2     40.2   83.3     93.8
GREG B                   6.9     40.0   71.5     82.8
SYNTH A                 14.0     15.8   93.0     93.5
SYNTH B                 12.4     16.4   79.7     81.0
EBLUP A                 13.2     16.2   92.5     93.1
EBLUP B                 11.9     16.7   79.5     80.7
EBLUP SP                12.7     16.3   90.9     91.6
MBD                      8.8     35.3   86.3     92.6
EBLUP-SPLINE SP         12.1     16.5   91.1     92.2
EBLUP-SPLINE AGE        13.2     16.5   89.8     90.5
EBLUP SP-SPLINE AGE     12.2     17.3   90.3     90.9

14

Analysis of results

Area level estimators (not shown here) perform a little better in terms of bias but much worse in terms of MSE.

GREG, SYNTH and EBLUP perform better in terms of bias in Case B, when geographical information is included in the fixed part of the model.

In terms of MSE, the standard estimators in Case A outperform those in Case B if ARRMSE is taken as the overall evaluation criterion, while Case B gives better results if MRRMSE is considered.

15

Analysis of results

EBLUP SP can be compared with the unit level EBLUP with geographical information included as covariates (EBLUP B) and with the EBLUP-SPLINE SP.

o EBLUP SP shows better performance in terms of MSE, while the unit level EBLUP outperforms the other estimators in terms of bias.

o The EBLUP-SPLINE SP performs in between the other two estimators.

EBLUP-SPLINE AGE performs similarly to the unit level EBLUP in Case A.

o Using age in a nonparametric way is an alternative use of the auxiliary information. With respect to Case A the model is more parsimonious.

As expected, MBDE shows better results in terms of bias but performs worse in terms of MSE than the other SAE methods.

Using the autocorrelation structure together with the spline on the variable age does not improve performance.

16

Final remarks

Sensitivity to the choice of smoothing parameters in the spline approach has to be investigated.

The introduction of the sampling weights should be considered, to try to achieve benchmarking with the direct estimates produced at regional level.

The response is a 0-1 variable: a logistic mixed model is currently being investigated.

The model group is a small portion of Italy (the Center); hence the area-specific effects are smaller than they could be if an overall model were considered for the whole country. The introduction of geographical information should be analyzed considering a larger model group.
