Recent advances in small area methodologies for poverty estimation*

Caterina Giusti (1), Monica Pratesi (2), Nikos Tzavidis (3), Nicola Salvati (4)

(1) University of Pisa, e-mail: [email protected]
(2) University of Pisa, e-mail: [email protected]
(3) University of Southampton, e-mail: [email protected]
(4) University of Pisa, e-mail: salvati@ec.unipi.it

Abstract

Until very recently the practice of poverty mapping has been dominated by the World Bank method proposed by Elbers et al. (2003). More recently researchers in small area estimation have intensively studied the World Bank method and have proposed small area models for poverty mapping. Two such recent methods are the Empirical Best Prediction (EBP) approach proposed by Molina and Rao (2010) and the M-quantile approach (Chambers and Tzavidis, 2006 and Tzavidis et al. 2008; 2010). This renewed interest extends beyond the academic community with National Statistical Offices around the world showing interest in poverty estimation methodologies. This interest is also reflected in the major investment that the European Commission has made by funding two research programmes on small area estimation of indicators under the auspices of the 7th Framework (SAMPLE http://www.sample-project.eu/ and AMELI http://www.ameli.surveystatistics.net). The aim of this paper is to present an overview of the recently proposed methodologies for small area estimation of poverty indicators that we have developed under the M-quantile approach. We apply the alternative methodologies to real data from the European Survey of Income and Living Conditions in Italy for estimating the incidence of poverty and the poverty gap in 29 Italian provinces situated in three regions, namely Tuscany, Lombardia and Campania.

Keywords: poverty indicators; M-quantile models; EU-SILC survey.

1. Introduction

The estimation and dissemination of poverty, inequality and living condition indicators across the European Union is nowadays a topic of primary interest. Such indicators should assist in monitoring living conditions and in guiding the implementation of policies that aim at improving the living conditions in the EU Member States.

In particular, the estimation of the average household equivalised income and of the corresponding quantiles should be accompanied by the estimation of poverty indicators such as the at-risk-of-poverty rate (Head Count Ratio, HCR) and the poverty gap (PG). The HCR indicator is a widely used measure of poverty. The popularity of this indicator is due to its ease of construction and interpretation. At the same time this indicator also assumes that all poor households/individuals are in the same situation. For example, the easiest way of reducing the headcount index is by targeting benefits to people just below the poverty line because they are the ones who are cheapest to move across the line. Hence, policies based on the headcount index might be sub-optimal. For this reason we also obtain estimates of the PG indicator. The PG can be interpreted as the average shortfall of poor people. It shows how much would have to be transferred to the poor to bring their expenditure up to the poverty line.

* Work supported by the project SAMPLE “Small Area Methodology for Poverty and Living Condition Estimates” awarded by the European Commission in the 7th FP.

The estimation of these target parameters at the small area level using data coming from major sample surveys, such as the EU-SILC survey, can be performed by a variety of methods. Since M-quantile models (Chambers and Tzavidis, 2006 and Tzavidis et al. 2008; 2010) do not impose strong distributional assumptions and are outlier robust, the use of these models for poverty estimation may protect against departures from assumptions of the traditional unit-level nested error regression model for small area estimation.

The aim of this paper is to estimate the average equivalised household income, the HCR and the PG for the provinces of three Italian regions (Lombardia in the north of the country, Toscana in central Italy, and Campania in southern Italy) using M-quantile methods. The choice of these three regions, out of the 20 in Italy, is motivated by the geographical differences characterizing the Italian territory: indeed, it has been stated that the historical and geographical differences between Italian regions and municipalities cause an internal variability in any result that is often comparable to that of the EU as a whole (Brandolini and Saraceno, 2007).

The structure of the paper is as follows. Section 2 describes the models and the estimators. Section 3 presents and discusses the estimates of interest. Section 4 concludes with final remarks and directions for future research.

2. Theory

Let $x_{ij}$ be a known vector of p auxiliary variables for each population unit j in small area i (abbreviated to $x_j$, and similarly $y_j$ for $y_{ij}$, when the area is clear from the summation set), and assume that information for the variable of interest y is available only on the sample. Chambers and Tzavidis (2006) have developed an approach to small area estimation based on the quantiles of the conditional distribution of the variable of study (y) given the covariates (Breckling and Chambers, 1988). The qth M-quantile $Q_q(x;\psi)$ of the conditional distribution of y given x satisfies:

$$Q_q(x_{ij};\psi) = x_{ij}^T \beta_\psi(q) \qquad (2.1)$$

where $\psi$ denotes the influence function associated with the M-quantile. For specified q and continuous $\psi$, an estimate $\hat\beta_\psi(q)$ of $\beta_\psi(q)$ is obtained via an iterative weighted least squares algorithm. When (2.1) holds, the bias-adjusted M-quantile predictor of $m_i$, the mean of the variable of interest in area i, is:

$$\hat m_i^{MQ/CD} = N_i^{-1}\left\{\sum_{j\in s_i} y_j + \sum_{j\in r_i} x_j^T\hat\beta_\psi(\hat\theta_i) + \frac{N_i-n_i}{n_i}\sum_{j\in s_i}(y_j-\hat y_j)\right\} \qquad (2.2)$$

where $s_i$ denotes the $n_i$ sampled units in area i, $r_i$ denotes the remaining $N_i - n_i$ units in the area, $\hat y_j = x_j^T\hat\beta_\psi(\hat\theta_i)$ is a linear combination of the auxiliary variables and $\hat\theta_i$ is an estimate of the average value of the M-quantile coefficients of the units in area i (Tzavidis and Chambers, 2007). The MSE of the estimator (2.2) can be estimated analytically as suggested in Chambers et al. (2007).
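As an illustration only, the iterative weighted least squares step and the bias-adjusted mean predictor (2.2) can be sketched in a few lines of Python. This is a minimal sketch under several assumptions that are not stated in the paper: a single covariate plus intercept, a Huber influence function with tuning constant c = 1.345, and a median-absolute-residual scale estimate. The function names `fit_mquantile_line` and `cd_mean_predictor` are hypothetical, and the predictor takes the fitted values at the area coefficient as given rather than estimating the area coefficient itself.

```python
from statistics import median


def fit_mquantile_line(x, y, q=0.5, c=1.345, tol=1e-8, max_iter=200):
    """IRLS fit of y ~ b0 + b1*x at M-quantile q (Huber influence, an assumption)."""
    w = [1.0] * len(x)  # unit weights: the first pass is ordinary least squares
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Weighted least squares step, closed form for one covariate.
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        new_b1 = sxy / sxx
        new_b0 = ybar - new_b1 * xbar
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
        # Residuals and a robust scale (median absolute residual / 0.6745).
        res = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        scale = median([abs(r) for r in res]) / 0.6745
        if scale == 0.0:
            scale = 1.0  # perfect fit: any positive scale gives the same weights
        # Asymmetric Huber weights: 2q above the fit, 2(1 - q) at or below it.
        w = []
        for r in res:
            side = 2.0 * q if r > 0 else 2.0 * (1.0 - q)
            u = abs(r) / scale
            w.append(side * (1.0 if u <= c else c / u))
    return b0, b1


def cd_mean_predictor(y_s, yhat_s, yhat_r):
    """Bias-adjusted mean (2.2) from sample values and fitted values."""
    n = len(y_s)
    N = n + len(yhat_r)
    mean_resid = sum(y - f for y, f in zip(y_s, yhat_s)) / n
    # (N - n) * mean_resid equals ((N - n) / n) * sum of sample residuals.
    return (sum(y_s) + sum(yhat_r) + (N - n) * mean_resid) / N
```

With q = 0.5 the fit reduces to a Huber-type robust regression; values of q above or below 0.5 shift the fitted line towards the upper or lower part of the conditional distribution.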

Although small area averages are widely used in small area applications, relying only on averages may not provide a very informative picture of the distribution of wealth in a small area. In economic applications, for example, estimates of average income may not provide an accurate picture of the area wealth because of high within-area inequality. For this reason, it is of interest in poverty analysis to focus on the estimation of poverty indicators such as the HCR $F_0$ and the PG $F_1$ (see Foster et al. (1984)). Denoting by t the poverty line, the FGT poverty measures for a small area i are defined through the unit-level quantities

$$F_{\alpha,ij} = \left(\frac{t - y_{ij}}{t}\right)^{\alpha} I(y_{ij} \le t) \qquad (2.3)$$

Setting α = 0 defines the Head Count Ratio whereas setting α = 1 defines the Poverty Gap.

In more detail, estimation of these indicators under the M-quantile approach corresponds to the problem of estimating the out-of-sample component in the expression:

$$F_{\alpha,i} = N_i^{-1}\left\{\sum_{j\in s_i} F_{\alpha,ij} + \sum_{j\in r_i} F_{\alpha,ij}\right\}. \qquad (2.4)$$
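To make the FGT definitions concrete, the following minimal Python sketch evaluates the unit-level measure and its area average when every income in the area is known. The function names `fgt_unit` and `fgt_area` and the toy incomes are hypothetical and serve only as an illustration.

```python
def fgt_unit(y, t, alpha):
    """Unit-level FGT measure: ((t - y) / t) ** alpha if y <= t, else 0."""
    return ((t - y) / t) ** alpha if y <= t else 0.0


def fgt_area(incomes, t, alpha):
    """Area-level FGT measure: average of the unit-level measures."""
    return sum(fgt_unit(y, t, alpha) for y in incomes) / len(incomes)


# Hypothetical area with four households and a poverty line of 10000.
toy_incomes = [5000, 8000, 12000, 20000]
hcr = fgt_area(toy_incomes, 10000, alpha=0)  # share of households at or below the line
pg = fgt_area(toy_incomes, 10000, alpha=1)   # average relative shortfall
```

In this toy area two of the four households fall below the line, so the HCR is 0.5, while the PG weights each poor household by its relative distance from the line.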

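One approach to estimating $F_{\alpha,i}$ is to use a smearing-type estimator of the distribution function such as the Chambers-Dunstan estimator. In this case, an estimator $\hat F_{\alpha,i}^{MQ}$ of $F_{\alpha,i}$ is

$$\hat F_{\alpha,i}^{MQ} = N_i^{-1}\left\{\sum_{j\in s_i} I(y_j \le t) + n_i^{-1}\sum_{k\in r_i}\sum_{j\in s_i} I\big(\hat y_k + (y_j - \hat y_j) \le t\big)\right\}. \qquad (2.5)$$

The indicator functions in (2.5) correspond to the case α = 0, that is, to the HCR.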
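A direct evaluation of the smearing-type estimator (2.5) can be sketched as follows. This is an illustrative sketch only: the function name `hcr_smearing` is hypothetical, and the fitted values for sampled and non-sampled units are assumed to come from an already fitted M-quantile model.

```python
def hcr_smearing(y_s, yhat_s, yhat_r, t):
    """Smearing-type HCR estimate as in (2.5).

    y_s    : observed values for the sampled units of the area
    yhat_s : fitted values for the same sampled units
    yhat_r : fitted values for the non-sampled units of the area
    t      : poverty line
    """
    n = len(y_s)
    N = n + len(yhat_r)
    resid = [y - f for y, f in zip(y_s, yhat_s)]
    # Sample part: observed incomes at or below the poverty line.
    in_sample = sum(1 for y in y_s if y <= t)
    # Non-sample part: each prediction is perturbed by every sample residual.
    out_sample = sum(
        sum(1 for e in resid if f + e <= t) / n for f in yhat_r
    )
    return (in_sample + out_sample) / N
```

The double loop costs on the order of $n_i (N_i - n_i)$ operations per area, which is one practical reason to consider a simulation-based alternative.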

In more detail, the procedure to estimate this quantity is:

1. Fit the M-quantile small area model (2.1) using the raw $y_s$ sample values and obtain estimates of $\beta_\psi$ and $\theta_i$;
2. Draw an out-of-sample vector using $y_{ij,r}^* = x_{ij,r}^T\hat\beta_\psi(\hat\theta_i) + e_{ij,r}^*$, where $e_{ij,r}^*$ is a vector of size $N_i - n_i$ drawn from the Empirical Distribution Function (EDF) of the estimated M-quantile regression residuals or from a smooth version of this distribution, and $\hat\beta_\psi$, $\hat\theta_i$ are obtained from the previous step;
3. Repeat the process H times. Each time combine the sample data and out-of-sample data for estimating the target using
   $$\hat F_{\alpha,i}^{MQ} = N_i^{-1}\left\{\sum_{j\in s_i} I(y_j \le t) + \sum_{j\in r_i} I(y_j^* \le t)\right\};$$
4. Average the results over H simulations.
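Steps 2 to 4 above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes step 1 has already produced fitted values for the sampled and non-sampled units, it draws residuals from the raw EDF (not a smoothed version), and the function name `mc_poverty`, the default H = 200 and the seed are hypothetical choices. The PG is computed alongside the HCR by replacing the indicator with the relative shortfall.

```python
import random


def mc_poverty(y_s, yhat_s, yhat_r, t, H=200, seed=0):
    """Monte Carlo estimates of HCR and PG for one area (steps 2 to 4)."""
    rng = random.Random(seed)
    resid = [y - f for y, f in zip(y_s, yhat_s)]  # estimated sample residuals
    N = len(y_s) + len(yhat_r)
    hcr_sum = 0.0
    pg_sum = 0.0
    for _ in range(H):
        # Step 2: out-of-sample values = prediction + residual drawn from the EDF.
        y_star = [f + rng.choice(resid) for f in yhat_r]
        # Step 3: combine sample and simulated out-of-sample values.
        population = list(y_s) + y_star
        hcr_sum += sum(1 for y in population if y <= t) / N
        pg_sum += sum((t - y) / t for y in population if y <= t) / N
    # Step 4: average over the H replicates.
    return hcr_sum / H, pg_sum / H
```

Because the residuals are drawn with equal probability from the sample residuals, the HCR average should approach the value of the smearing-type estimator (2.5) as H grows.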


The mean squared error of the M-quantile estimates of the poverty indicators obtained with this procedure can be estimated using the non-parametric bootstrap approach described in Tzavidis et al. (2010).

Recently, Molina and Rao (2009) proposed an Empirical Best Prediction (EBP) approach to poverty estimation, which attempts to minimize the effect of potential outliers in the data by modelling a logarithmic transformation of the outcome variable. Strictly speaking, the EBP is not designed to be an outlier robust approach; however, the fact that this method uses the log-transformed income/consumption offers some protection against outliers. Although this transformation makes the Gaussian assumptions of the random effects model employed by EBP more plausible, these assumptions may still not hold with real data.

3. Application

In this section we present the results for the average equivalised household income, HCR, PG and for the corresponding RMSEs estimated using the M-quantile models approach. The estimates refer to the Provinces of three Italian Regions: Lombardia in Northern Italy, Toscana in Central Italy, and Campania in Southern Italy. The choice of these three regions, out of the 20 existing regions in Italy, is motivated by the geographical differences characterizing the Italian territory. In particular, the aim is to investigate the so-called “north-south” divide characterizing the Italian territory, since each of the three regions can be considered as representative of the corresponding geographical area of Italy (Northern, Central and Southern/Insular Italy). Estimation of the quartiles of the equivalised household income in the same Provinces has been performed as well, but it is not shown here.

The working M-quantile small area model uses data from the EU-SILC survey 2007 for the sampled households in the three Regions, and data from the Population Census 2001 for all the households living in the Regions. In the working model the equivalised household income is the outcome variable. The explanatory variables, common to the EU-SILC survey and the Census micro-data, are the ownership status, the household size and, for the head of the household, the age, the employment status, the gender and the years of education.

The results (point and MSE estimation) from the application of the M-quantile model for the estimation of the average equivalised household income are presented in Figure 1, where colour intensity increases with the estimate, and in Tables 1, 2 and 3. The estimates of the average income for each province show that there are intra-regional differences. In Lombardia, the provinces of Milano, Pavia and Varese have the highest average equivalised household income while the provinces of Sondrio, Cremona and Brescia have lower average income. Such intra-regional variability is also present in Toscana. The provinces of Siena and Firenze appear to be as wealthy as the wealthier provinces of Lombardia whereas the provinces of Lucca and Massa-Carrara have lower average income. These results indicate that Toscana and Lombardia have similar levels of average equivalised household income although one may say that Lombardia is somewhat wealthier. Looking now at the results of the southern region of Campania, it is clear that provinces in this region have smaller average equivalised household income than provinces in Lombardia and Toscana. Compared to Caserta and Benevento, the provinces of Avellino, Salerno and Napoli have higher average income although the intra-regional differences in Campania are not so pronounced.

Figure 1. Estimated Average (Root Mean Squared Error) of Household Equivalised Income for Lombardia, Toscana and Campania Provinces.

Table 1. Estimated average of household equivalised income (MEAN) and estimated Root Mean Squared Error of the Mean estimator (RMSE) for the Lombardia Provinces.

Province         MEAN        RMSE
VARESE           21091.49    1305.98
COMO             18578.33    1137.01
SONDRIO          16307.16    1668.92
MILANO           20798.63     497.68
BERGAMO          18323.07     820.61
BRESCIA          16326.21     581.47
PAVIA            21081.25    4080.17
CREMONA          16774.18     883.69
MANTOVA          17774.90     677.24
LECCO            19497.61    1131.62
LODI             17052.58     965.49


Table 2. Estimated average of household equivalised income (MEAN) and estimated Root Mean Squared Error of the Mean estimator (RMSE) for the Toscana Provinces.

Province         MEAN        RMSE
MASSA CARRARA    14128.26     664.84
LUCCA            15867.69     766.80
PISTOIA          18980.76    1119.33
FIRENZE          19184.92     498.35
LIVORNO          17875.01     919.41
PISA             18550.16     876.37
AREZZO           18665.97    1014.42
SIENA            20228.98    1113.91
GROSSETO         16152.47    1151.84
PRATO            17702.87     632.74

Table 3. Estimated average of household equivalised income (MEAN) and estimated Root Mean Squared Error of the Mean estimator (RMSE) for the Campania Provinces.

Province         MEAN        RMSE
CASERTA          11685.74     574.89
BENEVENTO        11312.89    1033.79
NAPOLI           12661.84     291.73
AVELLINO         12873.13     979.46
SALERNO          12715.91     502.22

The results for the estimation of the HCR and PG (point and MSE estimation) from the application of the M-quantile model are mapped in Figures 2, 3 and 4, where colour intensity increases with the estimated level of poverty, and are also presented in Tables 4, 5 and 6.

The HCR in Lombardia ranges from 0.172 (Milano) to 0.270 (Sondrio) while the PG for provinces in the same region ranges from 0.073 (Milano) to 0.124 (Sondrio). For provinces in Toscana the HCR ranges from 0.161 (Siena) to 0.280 (Massa-Carrara) and the PG in the same region from 0.060 (Siena) to 0.117 (Massa-Carrara). Finally, for provinces in the region of Campania the HCR ranges from 0.238 (Salerno) to 0.280 (Benevento) and the PG from 0.127 (Salerno) to 0.161 (Caserta). The picture that emerges is as expected, i.e. Campania is a region that has consistently higher poverty than Toscana and Lombardia. The use of the PG indicator significantly enhances the picture of wealth in the different regions.

Equally noticeable are some aspects of the comparison between the regions of Toscana and Lombardia. The analysis of the average household income indicated that Lombardia is somewhat wealthier than Toscana. However, looking at the estimates of HCR and PG a different picture emerges. Overall, provinces in Toscana have lower HCR and PG than provinces in Lombardia. For example, Pavia, one of the wealthiest provinces in Lombardia in terms of average income and income distribution, appears to have higher poverty than a number of provinces in Toscana such as Siena and Firenze. Of course, in these comparisons one must take into account the precision of the estimates and the fact that each region has its own poverty line (computed as 0.6 times the median of the equivalised household income of the region). Nevertheless, these results indicate that inequalities in Lombardia may be more pronounced than inequalities in Toscana.
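The regional poverty line described above is simple to compute. The sketch below is illustrative only: the function name `poverty_line`, the region labels and the income values are hypothetical toy data, and survey weights, which a production computation on EU-SILC data would normally use, are ignored here.

```python
from statistics import median


def poverty_line(equivalised_incomes, fraction=0.6):
    """Poverty line as a fraction (0.6 in the text) of the median income."""
    return fraction * median(equivalised_incomes)


# Hypothetical equivalised incomes for two regions (toy values, not EU-SILC data).
toy_regions = {
    "region_a": [10000, 15000, 20000, 30000, 50000],
    "region_b": [6000, 9000, 12000, 14000, 40000],
}
lines = {name: poverty_line(vals) for name, vals in toy_regions.items()}
```

Since each region gets its own line, the same income can count as poor in one region and not in another, which is why HCR and PG values are not directly comparable across regions.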

These results illustrate that, to construct a good picture of local wealth, we must produce a wide range of small area statistics. Finally, the present work demonstrates how small area estimation and inference for such statistics can be implemented in practice.

Figure 2. Estimated HCR and PG (Root Mean Squared Error) for Lombardia Provinces.

Table 4. Estimated HCR and PG and corresponding bootstrapped RMSE for the Provinces of the Lombardia Region.

Province         HCR      RMSE HCR   PG       RMSE PG
VARESE           0.203    0.014      0.087    0.010
COMO             0.222    0.018      0.096    0.013
SONDRIO          0.270    0.037      0.124    0.028
MILANO           0.172    0.010      0.073    0.007
BERGAMO          0.224    0.015      0.097    0.011
BRESCIA          0.250    0.018      0.113    0.013
PAVIA            0.223    0.027      0.098    0.019
CREMONA          0.236    0.025      0.105    0.018
MANTOVA          0.226    0.015      0.100    0.011
LECCO            0.196    0.021      0.083    0.015
LODI             0.215    0.025      0.095    0.018


Figure 3. Estimated HCR and PG (Root Mean Squared Error) for Toscana Provinces.

Table 5. Estimated HCR and PG and corresponding bootstrapped RMSE for the Provinces of the Toscana Region.

Province         HCR      RMSE HCR   PG       RMSE PG
MASSA CARRARA    0.280    0.039      0.117    0.022
LUCCA            0.239    0.026      0.094    0.015
PISTOIA          0.195    0.019      0.073    0.011
FIRENZE          0.166    0.012      0.061    0.007
LIVORNO          0.193    0.020      0.075    0.012
PISA             0.175    0.018      0.065    0.010
AREZZO           0.182    0.018      0.068    0.010
SIENA            0.161    0.023      0.060    0.012
GROSSETO         0.231    0.029      0.093    0.019
PRATO            0.172    0.021      0.062    0.011

Figure 4. Estimated HCR and PG (Root Mean Squared Error) for Campania Provinces.


Table 6. Estimated HCR and PG and corresponding bootstrapped RMSE for the Provinces of the Campania Region.

Province         HCR      RMSE HCR   PG       RMSE PG
CASERTA          0.277    0.018      0.161    0.017
BENEVENTO        0.280    0.033      0.153    0.026
NAPOLI           0.266    0.010      0.158    0.011
AVELLINO         0.245    0.024      0.131    0.020
SALERNO          0.238    0.016      0.127    0.014

4. Conclusions and future perspectives

In this paper we present the M-quantile small area methodologies for estimating small area means and poverty indicators. Our results suggest that the presence of outliers can significantly affect small area estimates, so outlier robust small area methodologies may be needed in real data applications.

In Section 3 we presented a case study using the Italian EU-SILC data. The aim was to estimate the mean income, the HCR and the PG, together with the corresponding MSE, in each province of the Toscana, Lombardia and Campania regions. The results show a clear gap between the levels of poverty in the northern region of Lombardia and the central region of Toscana compared to the levels of poverty in provinces of the southern region of Campania. In addition, the results allow us to examine the within-region variability in the levels of poverty and income. As an overall comment we can say that the outlier robust small area methodologies represent a useful way of deriving important small area estimates alongside their corresponding measures of variability, even when the model assumptions are not met.

As a concluding remark, future work can focus on the application of the M-quantile approach for obtaining small area estimates at even lower geographical levels such as Italian municipalities. This geographical level represents an important target for the implementation of policies and the use of small area methods may give even more significant gains due to the smaller sample sizes at this level of geography.

References

Battese, G., Harter, R. and Fuller, W. (1988). An Error-Components Model for Prediction of County Crop Areas using Survey and Satellite Data. Journal of the American Statistical Association, 83, 28-36.

Brandolini, A. and Saraceno, C. (2007). Introduzione, in Povertà e Benessere. Una geografia delle disuguaglianze in Italia, edited by Brandolini, A. and Saraceno, C. Il Mulino.

Breckling, J. and Chambers, R. (1988). M-quantiles. Biometrika, 75, 761-71.

Chambers, R. and Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika, 93, 255-268.

Chambers, R., Chandra, H. and Tzavidis, N. (2007). On Robust Mean Squared Error Estimation for Linear Predictors for Domains. CCSR Working paper 2007-10. Cathie Marsh Centre for Census and Survey Research, University of Manchester.

Chambers, R. and Dorfman, A.H. (2003). Transformed Variables in Survey Sampling. S3RI Methodology Working Papers, M03/21, Southampton Statistical Sciences Research Institute, University of Southampton, UK.

Cheli, B. and Lemmi, A. (1995). A Totally Fuzzy and Relative Approach to the Multidimensional Analysis of Poverty. Economic Notes, 24, 115-134.

Eilers, P. and Marx, B. (1996). Flexible Smoothing using B-splines and Penalized Likelihood (with comments and rejoinder). Statistical Science, 11, 1200-1224.

European Commission (2006). Description of SILC Database Variables: Cross-sectional and Longitudinal. Version 2004.1 from 25-05-06. European Commission, Eurostat.

Foster, J., Greer, J. and Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica, 52, 761-766.

Kackar, R.N. and Harville, D.A. (1984). Approximations for standard errors of estimators for fixed and random effects in mixed models. Journal of the American Statistical Association, 79, 853-862.

Opsomer, J.D., Claeskens, G., Ranalli, M.G., Kauermann, G. and Breidt, F.J. (2008). Nonparametric small area estimation using penalized spline regression. Journal of the Royal Statistical Society: Series B, 70, 265-286.

Prasad, N.G.N. and Rao, J.N.K. (1990). The Estimation of the Mean Squared Error of Small Area Estimators. Journal of the American Statistical Association, 85, 163-171.

Pratesi, M., Ranalli, M.G. and Salvati, N. (2008). Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US. Environmetrics, 19, 687-701.

Rao, J.N.K. (2003). Small Area Estimation. New York: Wiley.

Ruppert, D., Wand, M.P. and Carroll, R. (2003). Semiparametric Regression. Cambridge University Press, Cambridge, New York.

Särndal, C.E., Swensson, B. and Wretman, J.H. (1992). Model Assisted Survey Sampling. New York, Springer-Verlag.

Tzavidis, N., Salvati, N., Pratesi, M. and Chambers, R. (2008). M-quantile Models with Application to Poverty Mapping. Statistical Methods & Applications, 17, 393-411.

Tzavidis, N., Marchetti, S. and Chambers, R. (2010). Robust estimation of small area means and quantiles. Australian and New Zealand Journal of Statistics, 52(2), 167-186.