ESDS Using working with surveys: v.10/07

Further Applications of Linking and matching

Anthony Rafferty & Jo Wathan

Economic and Social Data Service (Government Data)

Other linking applications:

• 1) Complex datasets: linking files within and across levels of hierarchy held in separate data files (‘across a database’)

• 2) Pooling to form repeated cross-sectional datasets

• 3) Combining panel survey waves

• Final practice: two exercises – a) GHS (simple) and b) Family Resources Survey (slightly more advanced)

1) File Linking across a database

• In some datasets, different information or levels of hierarchy are stored in separate data files

• e.g.:
  – Family Resources Survey
  – British Crime Survey (BCS)
  – Family Expenditure Survey (FES)
  – British Household Panel Survey (BHPS)

• So using the hierarchy requires linking and combining information from different files.
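The following is a minimal Python sketch of what linking across levels of hierarchy does. It is not SPSS or Stata syntax. It assumes two hypothetical tables held as lists of dicts:

- a household table with one row per household;
- an individual table with one row per person.

Both tables carry a household identifier, called `hh_id` here. This is an illustrative name, not a variable from any of the surveys above. Every person row picks up its household's variables through the shared key, which is the one-to-many match that the SPSS and Stata merge commands perform.

```python
def link_household_to_people(households, people, key="hh_id"):
    """Attach household-level variables to every person in that household.

    households: list of dicts, one per household (the 'look-up' table)
    people: list of dicts, one per person, each carrying the household key
    """
    # Index the household table by key; a repeated key would make the link ambiguous
    lookup = {}
    for hh in households:
        if hh[key] in lookup:
            raise ValueError(f"duplicate household key: {hh[key]!r}")
        lookup[hh[key]] = hh

    linked = []
    for person in people:
        hh = lookup.get(person[key])
        if hh is None:
            continue  # no matching household: this sketch simply drops the person
        # Household variables first; person variables win if a name appears in both
        linked.append({**hh, **person})
    return linked


# Hypothetical toy records, for illustration only
households = [
    {"hh_id": 1, "region": "London"},
    {"hh_id": 2, "region": "East Midlands"},
]
people = [
    {"hh_id": 1, "person": 1, "age": 34},
    {"hh_id": 1, "person": 2, "age": 36},
    {"hh_id": 2, "person": 1, "age": 71},
]
linked = link_household_to_people(households, people)
# Each person row now also carries its household's region
```

In this sketch a person with no matching household is silently dropped. In Stata, the `_merge` variable is the usual way of checking how many cases ended up in that situation.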

Example: Family Resources Survey

• A continuous, cross-sectional, voluntary survey

• 28,000 private households in the UK
  – Northern Ireland added to the survey in 2002-03

• Fieldwork by a consortium of ONS and NatCen

A typical FRS year database

Terminology

• The complete collection of files for a given year of a survey is often referred to as a ‘Database’

• Individual files are often referred to as ‘Tables’ (think of them as tables of micro-data)

• We will still refer to id variables as linking variables or ‘keys’

Main Levels of Hierarchy in the FRS

• Household level (HOUSEHOL)

• Benefit unit

• Individual level (ADULT, CHILD)

• Specific sub-levels (e.g. benefit claims, mortgage policies)
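Below the household level, a record is usually identified by a combination of identifiers rather than by one. For example, a person number may restart at 1 in every household. The Python sketch below illustrates this with hypothetical identifier names (`sernum` for the household, `benunit` for the benefit unit, `person` for the individual). These are assumptions for illustration, so check the FRS documentation for the actual key variables in your year. The sketch does three things:

- checks which combination of fields uniquely identifies each row;
- confirms that the benefit-unit table is safe to use as a look-up table;
- links adults to their benefit unit on the composite key.

```python
from collections import Counter


def make_key(row, fields):
    """Build a composite key (a tuple) from several identifier fields."""
    return tuple(row[f] for f in fields)


def is_unique_key(rows, fields):
    """True if the given fields identify every row exactly once."""
    counts = Counter(make_key(r, fields) for r in rows)
    return all(n == 1 for n in counts.values())


# Hypothetical adult-level records: person numbers restart in each household
adults = [
    {"sernum": 101, "benunit": 1, "person": 1},
    {"sernum": 101, "benunit": 1, "person": 2},
    {"sernum": 101, "benunit": 2, "person": 3},
    {"sernum": 102, "benunit": 1, "person": 1},
]
# Hypothetical benefit-unit table, identified by household + benefit unit
benefit_units = [
    {"sernum": 101, "benunit": 1, "bu_type": "couple"},
    {"sernum": 101, "benunit": 2, "bu_type": "single"},
    {"sernum": 102, "benunit": 1, "bu_type": "single"},
]

bu_key = ["sernum", "benunit"]
assert not is_unique_key(adults, ["person"])        # person number alone is ambiguous
assert is_unique_key(adults, ["sernum", "person"])  # unique once the household is included
assert is_unique_key(benefit_units, bu_key)         # safe to use as a look-up table

# Link each adult to its benefit unit on the composite key
bu_lookup = {make_key(b, bu_key): b for b in benefit_units}
for a in adults:
    a["bu_type"] = bu_lookup[make_key(a, bu_key)]["bu_type"]
```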

Linking across Hierarchy in SPSS

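• Sort both datasets by the linking variable first

• Use the Merge Files / Add Variables command

• Version 14 onwards allows you to open more than one dataset at a time

• Definition: the key or ‘look-up’ table is the file with one case per key value, whose variables are copied onto every matching case in the other file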
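One way to see why both files must be sorted first is to picture the match happening in a single sequential pass through both files. The Python sketch below illustrates that idea with a key ('look-up') table. The table has one row per key value, and each row is copied onto every case in the other file with the same key. This is our own illustration under these assumptions, not SPSS's internal code, and `hh_id` and `tenure` are hypothetical variable names.

```python
def check_sorted(rows, key, name):
    """Raise an error if rows are not in ascending order of key."""
    keys = [r[key] for r in rows]
    if keys != sorted(keys):
        raise ValueError(f"{name} is not sorted by {key!r}")


def sequential_table_match(cases, table, key):
    """Copy each look-up table row onto the cases that share its key.

    Works in a single pass, so both inputs must already be sorted by key,
    and the table must hold at most one row per key value.
    Cases with no table row keep only their own variables.
    """
    check_sorted(cases, key, "case file")
    check_sorted(table, key, "table file")
    result = []
    j = 0
    for case in cases:
        # Move the table pointer past keys smaller than this case's key
        while j < len(table) and table[j][key] < case[key]:
            j += 1
        if j < len(table) and table[j][key] == case[key]:
            result.append({**table[j], **case})
        else:
            result.append(dict(case))
    return result


# Hypothetical toy data; sorting first mirrors the 'sort both files' step
cases = sorted(
    [{"hh_id": 2, "age": 71}, {"hh_id": 1, "age": 34},
     {"hh_id": 1, "age": 36}, {"hh_id": 3, "age": 50}],
    key=lambda r: r["hh_id"],
)
table = [{"hh_id": 1, "tenure": "owned"}, {"hh_id": 2, "tenure": "rented"}]
matched = sequential_table_match(cases, table, "hh_id")
```

If either input were unsorted, the pointer could skip past a key that is still needed. That is why this sketch refuses unsorted input rather than silently producing wrong matches.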

Linking Across Hierarchy in Stata

• Sort by the linking variable(s)

• Use the merge command

• merge creates a variable “_merge”; check it with: tabulate _merge
  – 1 = case only in the master dataset (the one in memory); 2 = case only in the using dataset; 3 = case in both datasets
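To make the meaning of the three `_merge` codes concrete, here is a Python sketch of an outer join that tags each row the same way. It is an illustration of the logic, not Stata's implementation. It assumes the key is unique in the using data, and the variable names are hypothetical. Counting the codes afterwards plays the role of `tabulate _merge`.

```python
from collections import Counter


def merge_with_indicator(master, using, key):
    """Outer-join two lists of dicts on key, tagging rows like Stata's _merge.

    _merge = 1: key only in master; 2: key only in using; 3: key in both.
    Assumes key values are unique in `using` (a one-to-one or many-to-one link).
    """
    using_by_key = {row[key]: row for row in using}
    if len(using_by_key) != len(using):
        raise ValueError("key is not unique in the using data")
    master_keys = set()
    out = []
    for row in master:
        master_keys.add(row[key])
        match = using_by_key.get(row[key])
        if match is None:
            out.append({**row, "_merge": 1})
        else:
            # Master values are kept where a variable appears in both files
            out.append({**match, **row, "_merge": 3})
    # Using-only cases are added after the master cases
    for k, row in using_by_key.items():
        if k not in master_keys:
            out.append({**row, "_merge": 2})
    return out


# Hypothetical toy data
master = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 4, "x": "d"}]
using = [{"id": 1, "y": 10}, {"id": 2, "y": 20}, {"id": 3, "y": 30}]
merged = merge_with_indicator(master, using, "id")

# Equivalent of 'tabulate _merge': counts of each code
tab = Counter(r["_merge"] for r in merged)
print(sorted(tab.items()))  # [(1, 1), (2, 1), (3, 2)]
```

Any 1s or 2s are worth investigating before analysis. They indicate cases that failed to link, often because of a wrong or incomplete key.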

2) Pooling repeated cross-sectional datasets

• Repeated cross-sectional: multiple measurement time points, but different people interviewed at each time point (so ‘independent samples’)

• Most ESDS Government datasets are of this type

• Special cases:
  – General Household Survey (pre-05)
  – LFS (also has a five-quarter panel element)

Types of data

Survey  | Repeated cross-sectional | Longitudinal element
LFS     | √                        | 1992 onwards
GHS     | √                        | 2005 onwards
FRS     | √                        |
EFS     | √                        |
TUS     | 2000 (2005 in Omnibus)   |
BSAS    | √                        | 1984-1986
Omnibus | √ (modules)              |
APS     | √                        |
NTS     | √                        |
BCS     | √                        |
HSE     | √                        |
SEH     | √                        |

Definitions

• Cross-sectional: one point in time

• Repeated cross-sectional: survey repeated (each year) on different samples

• True longitudinal: same people at multiple points in time

• Retrospective

Why pool data over time?

• Increase sample sizes, and so reduce standard errors

• Examine trends over time

• Include year-specific controls in regression models (e.g. year dummies, regional unemployment rate)
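As a rough guide, the standard error of a mean shrinks with the square root of the sample size. This assumes:

- simple random sampling;
- independent samples each year, as in a repeated cross-section;
- a similar spread of values across years.

Pooling three years of similar size therefore reduces the standard error by roughly 42%, not by two-thirds. Clustering and weighting in real survey designs will change the exact figures.

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad
\frac{\mathrm{SE}_{\text{3 years pooled}}}{\mathrm{SE}_{\text{1 year}}}
  \approx \frac{s/\sqrt{3n}}{s/\sqrt{n}} = \frac{1}{\sqrt{3}} \approx 0.58
```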

Change in vehicle ownership over time

Source: GHS

Pooling Data

• SPSS: Merge Files, “Add Cases”

• Stata: append command
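Appending stacks the cases of one file under those of another. Below is a minimal Python sketch of the same step. It assumes each year's file is held as a list of dicts, with hypothetical variable names. The sketch also adds an explicit year variable while stacking. That is our suggestion rather than something append does for you, and it is what later allows trends and year dummies to be estimated. As with Stata's append, a variable absent from one year's records is left missing for those cases.

```python
def pool_years(datasets, year_var="year"):
    """Stack several years of a repeated cross-section into one file.

    datasets: dict mapping year -> list of case dicts.
    A year variable is added so years can still be told apart after pooling;
    variables missing from a given record are set to None (missing).
    """
    # Union of all variable names, in order of first appearance
    all_vars = []
    for rows in datasets.values():
        for row in rows:
            for var in row:
                if var not in all_vars:
                    all_vars.append(var)
    pooled = []
    for year in sorted(datasets):
        for row in datasets[year]:
            case = {var: row.get(var) for var in all_vars}
            case[year_var] = year
            pooled.append(case)
    return pooled


# Hypothetical toy data: 'hhsize' is absent from one record in the second year
surveys = {
    2004: [{"age": 30, "hhsize": 2}],
    2005: [{"age": 45, "hhsize": 1}, {"age": 60}],
}
pooled = pool_years(surveys)
```

Before pooling real files, check that each variable is coded the same way in every year. Appending only lines up variables by name, not by meaning.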

3) Combining panel survey waves

• Same individuals interviewed at different waves

• Data have both a cross-sectional (i) and a time-series (t) dimension

• Often stored as separate wave files
  – e.g. British Household Panel Survey (BHPS)

• The same linking commands can be used to join the files
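A minimal Python sketch of joining wave files, under these assumptions:

- each wave is a list of dicts carrying a cross-wave person identifier, called `pid` here;
- each wave has a one-letter label that is used as a variable-name prefix;
- `pid`, the labels and the variable names are all hypothetical.

Joining on `pid` gives one row per person. People who drop out or join later simply lack the columns for the waves they missed.

```python
def join_waves(waves, key="pid"):
    """Join wave files on a person id, giving one row per person (wide format).

    waves: dict mapping a wave label (e.g. 'a', 'b') to a list of dicts.
    Each non-key variable is renamed with its wave label as a prefix,
    so 'age' in wave 'b' becomes 'bage'.
    """
    people = {}
    for label in sorted(waves):
        for row in waves[label]:
            person = people.setdefault(row[key], {key: row[key]})
            for var, value in row.items():
                if var != key:
                    person[label + var] = value
    return [people[k] for k in sorted(people)]


# Hypothetical toy waves: person 2 leaves after wave a, person 3 joins at wave b
waves = {
    "a": [{"pid": 1, "age": 40}, {"pid": 2, "age": 25}],
    "b": [{"pid": 1, "age": 41}, {"pid": 3, "age": 19}],
}
wide = join_waves(waves)
```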

Long and Wide Format

• See Appendix E of the workbook
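The two formats differ as follows:

- **Wide format:** each person has one row, with a set of columns per wave.
- **Long format:** each person has one row per wave, so the (i, t) structure is explicit.

Stata converts between the two with `reshape`, and SPSS with `VARSTOCASES` and `CASESTOVARS`. The Python sketch below shows the wide-to-long direction. It assumes wave letters are used as variable-name prefixes, and the names `pid`, `aage` and `bage` are hypothetical.

```python
def wide_to_long(rows, key, stubs, waves):
    """Reshape wide panel data (one row per person) to long (one row per person-wave).

    stubs: variable stems, e.g. ["age"]; waves: wave labels used as prefixes, e.g. ["a", "b"].
    Column "<wave><stub>" becomes variable "<stub>" on that person's row for that wave.
    Person-waves where all stub variables are absent are skipped.
    """
    long_rows = []
    for row in rows:
        for wave in waves:
            values = {s: row.get(wave + s) for s in stubs}
            if all(v is None for v in values.values()):
                continue  # person not observed at this wave
            long_rows.append({key: row[key], "wave": wave, **values})
    return long_rows


# Hypothetical wide data: person 2 was not interviewed at wave b
wide = [{"pid": 1, "aage": 40, "bage": 41}, {"pid": 2, "aage": 25}]
long_data = wide_to_long(wide, "pid", ["age"], ["a", "b"])
```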

Exercises

• FRS Exercise: Using data from three levels of hierarchy across three data tables

• GHS Exercise: Pooling years of repeated cross-sectional surveys (easier)

FRS Exercise

• What percentage of people in London, the East Midlands and the West Midlands are claiming state retirement pensions?

• Method: combine three files at different levels of hierarchy: HOUSEHOL, ADULT and BENEFITS

• Then run the cross-tab syntax at the bottom. If the data linking is done correctly, you will get the right answer.
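The linking logic of this exercise can be sketched in Python as below. This is not the exercise's own syntax, and it ignores survey weights. The table names follow the slide, but the key names, the variable names and the benefit code are hypothetical. The denominator here is all adults, which is an assumption about what "people" means in the question.

The important design choice is the order of operations:

- BENEFITS holds one row per benefit claim, so it is first collapsed to one flag per adult.
- The loop then runs over ADULT, so adults who claim nothing stay in the denominator.

```python
from collections import defaultdict

PENSION = "retirement_pension"  # hypothetical benefit code, not the real FRS coding


def pension_rate_by_region(househol, adult, benefits):
    """Unweighted % of adults with at least one state retirement pension record, by region.

    Identifier and variable names (sernum, benunit, person, region, benefit)
    are hypothetical stand-ins for the real FRS variables.
    """
    # 1) Collapse BENEFITS (possibly several rows per person) to a set of claimants
    claimants = {
        (b["sernum"], b["benunit"], b["person"])
        for b in benefits
        if b["benefit"] == PENSION
    }
    # 2) Household-level look-up for region
    region_of = {h["sernum"]: h["region"] for h in househol}

    # 3) Loop over ADULT so that non-claimants stay in the denominator
    totals = defaultdict(int)
    claiming = defaultdict(int)
    for a in adult:
        region = region_of[a["sernum"]]
        totals[region] += 1
        if (a["sernum"], a["benunit"], a["person"]) in claimants:
            claiming[region] += 1
    return {r: 100 * claiming[r] / totals[r] for r in totals}


# Hypothetical toy records
househol = [{"sernum": 1, "region": "London"}, {"sernum": 2, "region": "West Midlands"}]
adult = [
    {"sernum": 1, "benunit": 1, "person": 1},
    {"sernum": 1, "benunit": 1, "person": 2},
    {"sernum": 2, "benunit": 1, "person": 1},
]
benefits = [
    {"sernum": 1, "benunit": 1, "person": 1, "benefit": PENSION},
    {"sernum": 1, "benunit": 1, "person": 1, "benefit": "other"},
    {"sernum": 2, "benunit": 1, "person": 1, "benefit": PENSION},
]
rates = pension_rate_by_region(househol, adult, benefits)
```

Two mistakes would each give the wrong percentages:

- Linking in the other direction, starting from BENEFITS, would drop non-claimants.
- Counting benefit rows instead of adults would double-count people with several claims.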

General Household Survey (GHS) Exercise

• How does the age of the UK population vary by ethnicity? Estimate the average age of different ethnic groups as coded in the variable ethnigp2.

• Pool three years of GHS data

• Consider the effects of pooling on sample size and estimation
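Once the years are pooled, the estimate itself is a group mean. The Python sketch below makes these assumptions:

- pooled cases are held as dicts with `ethnigp2` and a hypothetical `age` variable;
- the group codes are made up;
- `None` is treated as missing.

Check how missing values are actually coded in the GHS files you use. The sketch is also unweighted. Reporting the case count next to each mean shows how much pooling helps the smaller groups.

```python
from collections import defaultdict
from statistics import mean


def mean_age_by_group(cases, group_var="ethnigp2", age_var="age"):
    """Unweighted case count and mean age for each category of group_var."""
    ages = defaultdict(list)
    for c in cases:
        if c.get(group_var) is None or c.get(age_var) is None:
            continue  # skip cases with a missing group or age
        ages[c[group_var]].append(c[age_var])
    return {g: (len(v), mean(v)) for g, v in sorted(ages.items())}


# Hypothetical pooled records; the ethnigp2 codes here are made up
pooled_cases = [
    {"year": 2004, "ethnigp2": 1, "age": 30},
    {"year": 2005, "ethnigp2": 1, "age": 40},
    {"year": 2005, "ethnigp2": 2, "age": 20},
    {"year": 2006, "ethnigp2": 2, "age": None},
]
summary = mean_age_by_group(pooled_cases)
```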

Units of analysis

• Fundamental to your research question!
  – Who do you want to generalise to?
  – What are your cases?
  – What units is your population composed of?
  – Who is your research question applicable to?

• Some typical units
  – Individuals
  – Households
  – Schools
  – Businesses
  – Farms
  – Doctors
  – Wards

Hierarchy in some key datasets

Survey         | Levels of hierarchy                                          | File type
GHS            | Household, family, individual, sub-individual                | Flat file
LFS            | Household, family, individual                                | Flat files (QLFS / household data)
FES            | Multiple, inc. household, person, family unit, benefit unit  | Multiple files
FRS            | Household, benefit unit, individual                          | Multiple files
HSE            | Household, individual (watch out for variable samples)       | Flat files (1 all individuals, 1 all respondents)
BSAS           | Individual                                                   | Flat file
BCS            | Individual, incident (household context only)                | Multiple files
BHPS           | Household, individual (& below)                              | Multiple files
Household SARs | Household, family, individual                                | Flat file
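In the flat-file surveys above, household variables are repeated on each individual's record. A household-level analysis therefore needs one row per household; otherwise larger households are counted once per member. A minimal Python sketch, with hypothetical variable names `hh_id` and `region`:

```python
def to_household_level(individuals, key="hh_id", hh_vars=("region",)):
    """Collapse an individual-level flat file to one row per household.

    Household variables are copied from the first record of each household
    (they are assumed identical for all members); n_people counts the members.
    """
    households = {}
    for row in individuals:
        hh = households.get(row[key])
        if hh is None:
            hh = {key: row[key], **{v: row[v] for v in hh_vars}, "n_people": 0}
            households[row[key]] = hh
        hh["n_people"] += 1
    return list(households.values())


# Hypothetical flat file: household 1 has two members, household 2 has one
individuals = [
    {"hh_id": 1, "region": "London", "age": 34},
    {"hh_id": 1, "region": "London", "age": 5},
    {"hh_id": 2, "region": "Wales", "age": 80},
]
hh_rows = to_household_level(individuals)

# The same data give different answers depending on the unit of analysis
share_people = sum(r["region"] == "London" for r in individuals) / len(individuals)
share_households = sum(r["region"] == "London" for r in hh_rows) / len(hh_rows)
```

In this toy example two-thirds of individuals but only half of households are in London. This is the kind of difference the choice of unit of analysis makes.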

Quarterly Labour Force Survey

Wave | Spring quarter | Summer quarter | Autumn quarter | Winter quarter | Spring quarter (year +1)
W1   | 12k            | 12k            | 12k            | 12k            | 12k
W2   | 12k            | 12k            | 12k            | 12k            | 12k
W3   | 12k            | 12k            | 12k            | 12k            | 12k
W4   | 12k            | 12k            | 12k            | 12k            | 12k
W5   | 12k            | 12k            | 12k            | 12k            | 12k

Follow the cases who were in wave 1 in spring of year 1: they are in wave 2 in summer, wave 3 in autumn, wave 4 in winter and wave 5 in the following spring.

• Each household participates for 5 consecutive waves (one interview every 3 months, i.e. each quarter)

• Total of 60k households per quarter (5 waves × 12k)
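The rotation pattern can be expressed as simple quarter arithmetic. In this Python sketch, quarters are indexed from 0 (spring of year 1), and the per-wave figure is the slide's approximate 12k. A household entering in quarter q is in wave w in quarter q + w - 1, so five cohorts overlap in any quarter.

```python
SEASONS = ["Spring", "Summer", "Autumn", "Winter"]
N_WAVES = 5          # each household is interviewed in 5 consecutive quarters
PER_WAVE = 12_000    # approximate households per wave per quarter, from the slide


def quarter_label(q):
    """Label quarter index q (0 = spring of year 1), e.g. 4 -> 'Spring year 2'."""
    return f"{SEASONS[q % 4]} year {q // 4 + 1}"


def interview_schedule(entry_quarter):
    """(wave, quarter label) pairs for a household that enters in entry_quarter."""
    return [(wave, quarter_label(entry_quarter + wave - 1))
            for wave in range(1, N_WAVES + 1)]


def cohorts_in_quarter(q):
    """Entry quarter of the cohort in each wave during quarter q, keyed by wave."""
    return {wave: q - (wave - 1) for wave in range(1, N_WAVES + 1)}


for wave, quarter in interview_schedule(0):
    print(wave, quarter)
# the last line printed is: 5 Spring year 2

total_per_quarter = N_WAVES * PER_WAVE  # 60,000 households in any one quarter
```

Four of the five waves carry over from one quarter to the next, so consecutive quarters share most of their households. This overlap is the LFS panel element mentioned earlier.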