Collecting the household data as a sub-sample. Rome May 2014 Jonas Kylov Gielfeldt


The broader frame – why are we collecting household data?

• Are households the “natural” unit for collecting LFS variables?

• Not very often! (jobless households and…)

• Is the household unit better for data collection?

• Sometimes yes! With CAPI mode the household is a very sensible unit, but not with CAWI and CATI mode.


The economy of it all…

• All NSIs work in an environment where resources are scarce(r).

• Is it justifiable to use a lot of resources on collecting household data if

a) there is no gain in terms of collection mode

b) there is no strict substantive reason for collecting the variables on households instead of on individuals?


The Danish case


• In Denmark we collect the core-LFS through CATI mode, and this mode is better suited to individuals as the unit.

• We are obliged to collect the household data; this is done through a combination of CAWI and CATI.

• Since collecting by household does not fit our collection mode, and we do not see a substantive reason for collecting LFS variables on households, we use a sub-sample to minimize costs.


The core-sample and the sub-sample

• The core-LFS gross sample: 40.000 persons per quarter

• The number of respondents per quarter: 22.000 persons

• The gross sub-sample: 11.000 persons (not including core-LFS respondents)

• The number of respondents: 6.000 persons
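The response rates implied by these sample sizes can be worked out directly. The short sketch below uses only the four figures quoted on the slide; since those figures look rounded, the resulting rates should be read as approximate. The function and variable names are illustrative.

```python
# Figures quoted on the slide (persons per quarter); they appear to be rounded.
core_gross, core_resp = 40_000, 22_000
sub_gross, sub_resp = 11_000, 6_000


def response_rate(respondents: int, gross_sample: int) -> float:
    """Share of the gross sample that actually responded."""
    return respondents / gross_sample


core_rate = response_rate(core_resp, core_gross)
sub_rate = response_rate(sub_resp, sub_gross)

print(f"core-LFS response rate:   {core_rate:.1%}")
print(f"sub-sample response rate: {sub_rate:.1%}")
```

Both samples end up with a response rate of roughly 55 %, so the sub-sample is smaller but not noticeably worse in terms of response.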


Why use a sub-sample

• If the NSI primarily uses CATI, collecting the whole household through this mode will increase costs significantly.

• Collecting household data for the core-sample quadruples the costs! The alternative is to reduce the sample size, at the risk of increased bias and cluster effects (household members are often alike).

Costs saved by sub-sampling

6.000 respondents quadrupled                               Euro       DKR (rate 7,45)

Number of respondents: 24.000

Current average price for ca. 6.000 respondents on HH      29.600     220.520

Price quadrupled                                           118.400    882.080

Difference (saved costs)                                   88.800     661.560
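The arithmetic behind the cost table can be reproduced in a few lines. This sketch assumes that "7,45" is the euro-to-krone exchange rate and that "quadrupled" means a plain factor of four; the variable names are illustrative.

```python
from decimal import Decimal

# Assumption: the "(7,45)" in the table header is the EUR -> DKK exchange rate.
EUR_TO_DKK = Decimal("7.45")

current_eur = Decimal("29600")  # current average price for ca. 6.000 HH respondents
factor = 4                      # 6.000 respondents -> 24.000 respondents

quadrupled_eur = current_eur * factor
saved_eur = quadrupled_eur - current_eur

for label, eur in [
    ("Current price", current_eur),
    ("Price quadrupled", quadrupled_eur),
    ("Difference (saved costs)", saved_eur),
]:
    print(f"{label}: {eur} EUR = {eur * EUR_TO_DKK} DKK")
```

The figures match the table: the sub-sample saves 88.800 euro, or 661.560 DKR, compared with collecting household data from four times as many respondents.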


Different sample sizes mean different weighting models

• The weighting model of the Core-LFS

Variables                                  Groupings

Information is crossed:

-age11                                     11 grp

-sex                                       2 grp

-region                                    5 grp

Information is crossed:

-age6                                      6 grp

-education                                 3 grp

-socio-economic status                     8 grp

-number of children in the household       4 grp

-citizenship                               4 grp

-registered as unemployed                  12 grp

-gross income                              4 grp

-moved                                     2 grp


On the core-LFS weighting model

• Non-response in the Danish LFS is quite high, but there are many high-quality registers.

• The registers are used as auxiliary information in a rather complex weighting model.

• The weighting model is optimized for the number of individuals in the population and is especially intended to control bias in, for example, labour market status and education.


The weighting model of the household-sample

Variables                                        Groupings

Information is crossed:

-age                                             3 grp

-sex                                             2 grp

-family type                                     6 grp

-size of household                               4 grp

-a person from the household has moved           2 grp

-only Danes in household or mixed household      2 grp

-average age of the household                    3 grp

-gross household income                          4 grp


On the household weighting model

• This weighting model is optimized for both the number of individuals in the population and the total number of households.

• This means that new variables must be added as auxiliary information (family type, size of household, etc.).

• At the same time, the smaller sample size limits the amount of auxiliary information that can be used.
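A rough way to see why sample size limits the auxiliary information is to count the cells that a set of crossed variables creates and divide the respondents across them. The sketch below is only an illustration: it assumes the crossed block is age11 × sex × region in the core model and age × sex × family type in the household model (one reading of the slide tables), it uses the respondent counts quoted earlier, and the last step is a purely hypothetical scenario in which education (3 groups) is crossed into the household model.

```python
from math import prod


def avg_respondents_per_cell(groupings: dict[str, int], respondents: int) -> float:
    """Average number of respondents per cell when the variables are fully crossed."""
    return respondents / prod(groupings.values())


# Assumed crossed blocks (a reading of the slide tables, not a confirmed specification).
core_crossed = {"age11": 11, "sex": 2, "region": 5}
hh_crossed = {"age": 3, "sex": 2, "family type": 6}

core_cells = prod(core_crossed.values())
hh_cells = prod(hh_crossed.values())

core_avg = avg_respondents_per_cell(core_crossed, 22_000)
hh_avg = avg_respondents_per_cell(hh_crossed, 6_000)

# Hypothetical: also crossing education (3 groups) into the household model.
hh_with_edu = {**hh_crossed, "education": 3}
hh_edu_avg = avg_respondents_per_cell(hh_with_edu, 6_000)

print(f"core: {core_cells} cells, {core_avg:.0f} respondents per cell on average")
print(f"household: {hh_cells} cells, {hh_avg:.0f} respondents per cell on average")
print(f"household + education: {prod(hh_with_edu.values())} cells, {hh_edu_avg:.0f} per cell")
```

Under these assumptions every added crossed variable multiplies the number of cells, so the 6.000 household respondents are spread thin much faster than the 22.000 core respondents.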


Differences in estimates – the example of education

• Education is added as auxiliary information in the core-LFS weighting model but not in the household model.

• This means differences in estimates.

Highest level of education completed (25-64 years), %

                                    2011       2011      2012       2012      2013       2013
                                    Core-LFS   HH-LFS    Core-LFS   HH-LFS    Core-LFS   HH-LFS

-At most lower secondary level      23,1       19,4      22,1       18,3      21,7       19,1

-Upper secondary level              43,2       41,6      43,1       41,0      42,8       40,6

-Third level                        33,7       39,0      34,8       40,6      35,4       40,2

Other education indicators, %

                                    2011       2011      2012       2012      2013       2013
                                    Core-LFS   HH-LFS    Core-LFS   HH-LFS    Core-LFS   HH-LFS

-Min. ISCED3c long / upper
 secondary level (20-24 years)      70,0       74,9      72,0       74,9      71,8       76,1

-Early leavers from education
 and training (18-24 years)         9,7        7,9       9,1        8,0       8,1        6,9
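To make the size of the differences explicit, the gaps between the two estimates can be computed in percentage points. The sketch below takes the figures for the lowest and highest education levels from the table above; the data structure and names are illustrative.

```python
# (Core-LFS, HH-LFS) estimates in %, copied from the table, ages 25-64.
third_level = {2011: (33.7, 39.0), 2012: (34.8, 40.6), 2013: (35.4, 40.2)}
lower_secondary = {2011: (23.1, 19.4), 2012: (22.1, 18.3), 2013: (21.7, 19.1)}


def gaps(estimates: dict[int, tuple[float, float]]) -> dict[int, float]:
    """HH-LFS estimate minus Core-LFS estimate, in percentage points."""
    return {year: round(hh - core, 1) for year, (core, hh) in estimates.items()}


third_gaps = gaps(third_level)
lower_gaps = gaps(lower_secondary)

print("Third level, HH minus Core:", third_gaps)
print("At most lower secondary, HH minus Core:", lower_gaps)
```

In all three years the household estimate of the third-level share is about 5 to 6 percentage points above the core estimate, while the share with at most lower secondary education is about 3 to 4 points below it.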


The auxiliary information on education

• The difference between the Core-LFS and the household-LFS shows that the auxiliary information helps to deal with the over-representation of the highly educated.

• But it is not possible to use this information in the household model, since it would make the model too complex.

• The household model does not handle this bias at all.
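The effect of education as auxiliary information can be illustrated with a simple post-stratification example. This is a deliberately simplified stand-in, not the weighting model actually used for the Danish LFS, and every number in it is invented for illustration: respondents with third-level education are over-represented, and weighting each education group to its (hypothetical) register total pulls the estimate back.

```python
# All figures below are hypothetical and only illustrate the mechanism.
population = {"lower": 230, "upper": 430, "third": 340}   # register totals (thousands)
respondents = {"lower": 150, "upper": 400, "third": 450}  # third level over-represented
employment_rate = {"lower": 0.60, "upper": 0.80, "third": 0.90}  # among respondents

# Post-stratification weight: register total divided by respondents in the group.
weights = {g: population[g] / respondents[g] for g in population}

# Unweighted estimate: every respondent counts equally.
unweighted = (
    sum(respondents[g] * employment_rate[g] for g in respondents)
    / sum(respondents.values())
)

# Weighted estimate: each group is scaled to its register total.
weighted = (
    sum(weights[g] * respondents[g] * employment_rate[g] for g in respondents)
    / sum(weights[g] * respondents[g] for g in respondents)
)

print(f"unweighted employment rate: {unweighted:.3f}")
print(f"weighted employment rate:   {weighted:.3f}")
```

In this invented example the unweighted estimate is too high because the highly educated, who are more often employed, respond more often. A model without education among its auxiliary variables has no way to make this correction.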