ESDS Using working with surveys: v.10/07
Further Applications of Linking and Matching
Anthony Rafferty & Jo Wathan
Economic and Social Data Service (Government Data)
Other linking applications:
• 1) Complex datasets: linking files within and across levels of hierarchy (‘across a database’)
• 2) Pooling to form repeated cross-sectional datasets
• 3) Combining panel survey waves
• Final practice: two exercises – a) GHS (simple) and b) Family Resources Survey (slightly more advanced)
1) File Linking across a database
• In some datasets, different information or levels of hierarchy are stored in separate data files, e.g.:
  – Family Resources Survey (FRS)
  – British Crime Survey (BCS)
  – Family Expenditure Survey (FES)
  – British Household Panel Survey (BHPS)
• ...so using the hierarchy requires linking and combining information from different files.
Example: Family Resources Survey
• A continuous, cross-sectional, voluntary survey
• 28,000 private households in the UK
  – Northern Ireland added to the survey in 2002-03
• Fieldwork by a consortium of ONS and NatCen
A typical FRS year database (diagram not reproduced)
Terminology
• The complete collection of files for a given year of a survey is often referred to as a ‘Database’
• Individual files are often referred to as ‘Tables’ (think of them as tables of microdata)
• We will still refer to ID variables as linking variables or ‘keys’
Main Levels of Hierarchy in the FRS
• Household level (HOUSEHOL)
• Benefit unit
• Individual level (ADULT, CHILD)
• Specific sub-levels (e.g. benefit claims, mortgage policies)
Linking across Hierarchy in SPSS
• Sort both datasets by the linking variable first
• Use the Merge Files > Add Variables command
• Version 14 onwards allows you to open more than one dataset at a time
• Definition: the key or ‘look-up’ table
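The sort-then-match logic behind a key (‘look-up’) table merge can be sketched in a few lines. This is a minimal illustration, not SPSS's implementation: it assumes two small in-memory tables, a household file with one row per household (the key table) and an adult file with many rows per household. The field names sernum, person and gvtregn, and all values, are illustrative assumptions.

```python
def table_lookup_merge(cases, table, key):
    """Attach key-table fields to every case (many-to-one match).

    Both inputs are sorted by `key` first, mirroring the 'sort both
    datasets by the linking variable' step; the key table is assumed
    to hold at most one row per key value.
    """
    cases = sorted(cases, key=lambda r: r[key])
    table = sorted(table, key=lambda r: r[key])
    merged = []
    j = 0
    for case in cases:
        # Step forward through the key table until its key catches up.
        while j < len(table) and table[j][key] < case[key]:
            j += 1
        row = dict(case)
        if j < len(table) and table[j][key] == case[key]:
            # Add table fields without overwriting the case's own values.
            for name, value in table[j].items():
                row.setdefault(name, value)
        merged.append(row)
    return merged


# Hypothetical toy data: one household row, many adult rows.
households = [
    {"sernum": 1, "gvtregn": "London"},
    {"sernum": 2, "gvtregn": "West Midlands"},
]
adults = [
    {"sernum": 2, "person": 1, "age": 67},
    {"sernum": 1, "person": 1, "age": 34},
    {"sernum": 1, "person": 2, "age": 36},
]

for row in table_lookup_merge(adults, households, "sernum"):
    print(row)
```

Because both tables are sorted, a single forward pass is enough; this is why the sort step matters when the software matches files sequentially.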
Linking Across Hierarchy in Stata
• Sort by the linking variable(s)
• Use the merge command
• merge creates a variable “_merge”; check it with: tabulate _merge
  – 1 = case in master dataset only (the dataset in memory); 2 = case in using dataset only; 3 = case in both datasets
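The meaning of the _merge codes can be reproduced with a small one-to-one merge that records where each case came from. This is a sketch of the idea only, not Stata's code: it assumes each file has unique key values, and the variable names (hhid, tenure, income) and values are made up for illustration. As in Stata's default behaviour, master values are kept where both files hold the same variable.

```python
from collections import Counter


def merge_with_indicator(master, using, key):
    """One-to-one merge on `key`, adding a _merge code to every row:
    1 = master only, 2 = using only, 3 = found in both files."""
    master_by_key = {row[key]: row for row in master}
    using_by_key = {row[key]: row for row in using}
    merged = []
    for k in sorted(master_by_key.keys() | using_by_key.keys()):
        row = dict(using_by_key.get(k, {}))
        row.update(master_by_key.get(k, {}))  # master values win on clashes
        in_master = k in master_by_key
        in_using = k in using_by_key
        if in_master and in_using:
            row["_merge"] = 3
        elif in_master:
            row["_merge"] = 1
        else:
            row["_merge"] = 2
        merged.append(row)
    return merged


# Hypothetical toy files: households 1-3 in master, 2-4 in using.
master = [{"hhid": 1, "tenure": "owner"},
          {"hhid": 2, "tenure": "renter"},
          {"hhid": 3, "tenure": "owner"}]
using = [{"hhid": 2, "income": 410},
         {"hhid": 3, "income": 620},
         {"hhid": 4, "income": 380}]

merged = merge_with_indicator(master, using, "hhid")
tab = Counter(row["_merge"] for row in merged)  # like: tabulate _merge
for code in sorted(tab):
    print(code, tab[code])
```

Cases coded 1 or 2 are the ones to investigate: they usually point to a wrong key, a wrong level of hierarchy, or genuinely unmatched records.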
2) Pooling repeated cross-sectional datasets
• Repeated cross-sectional: multiple measurement time points, but different people interviewed at each time point (so ‘independent samples’)
• Most ESDS Government datasets are of this type
• Special cases:
  – General Household Survey (pre-05)
  – LFS (also has a five-quarter panel element)
Types of data

Survey    Repeated cross-sectional    Longitudinal element
LFS       √                           1992 onwards
GHS       √                           2005 onwards
FRS       √
EFS       √
TUS       2000 (2005 in Omnibus)
BSAS      √                           1984-1986
Omnibus   √ (modules)
APS       √
NTS       √
BCS       √
HSE       √
SEH       √

Definitions
• Cross-sectional: one point in time
• Repeated cross-sectional: survey repeated (each year) on different samples
• True longitudinal: same people at multiple points in time
• Retrospective
Why Pool data over time?
• Increase sample sizes and so reduce standard errors
• Examine trends over time
• Include year-specific controls in regression models (e.g. year dummies, regional unemployment rate)
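The gain from pooling can be worked through with the textbook standard error of a mean. This is a simplified sketch, assuming simple random sampling, equal-sized years and the same population variance in every year; real survey designs (clustering, weighting) give larger standard errors, so the √K gain is only an approximation. The regression line shows one common way to add year dummies, with the first year as the reference category.

```latex
% Standard error of a mean from one year of n cases (simple random sampling)
\mathrm{SE}(\bar{y}) = \frac{s}{\sqrt{n}}

% Pooling K independent years of n cases each
\mathrm{SE}_{\text{pooled}} = \frac{s}{\sqrt{Kn}} = \frac{\mathrm{SE}_{\text{one year}}}{\sqrt{K}},
\qquad K = 3 \;\Rightarrow\; \frac{1}{\sqrt{3}} \approx 0.58

% Pooled regression with year dummies (D_{ik} = 1 if case i comes from year k)
y_i = \beta_0 + \beta_1 x_i + \sum_{k=2}^{K} \gamma_k D_{ik} + \varepsilon_i
```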
Change in vehicle ownership over time (chart not reproduced; source: GHS)
Pooling Data
• SPSS: Merge Files > Add Cases
• Stata: append command
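Appending (‘add cases’) simply stacks the records of each year on top of each other. The sketch below does this for hypothetical toy records and adds a year variable, so that trends and year dummies remain possible after pooling. It assumes the variables already have the same names and codings in every year; in practice they often need harmonising before appending.

```python
def append_years(files):
    """Stack several years of cross-sectional records into one pooled
    list, tagging every record with the year it came from."""
    pooled = []
    for year, records in sorted(files.items()):
        for record in records:
            row = dict(record)
            row["year"] = year
            pooled.append(row)
    return pooled


# Hypothetical toy records for three survey years.
ghs = {
    2004: [{"age": 45}, {"age": 30}],
    2005: [{"age": 52}],
    2006: [{"age": 19}, {"age": 71}],
}
pooled = append_years(ghs)
print(len(pooled))  # 5
```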
3) Combining panel survey waves
• Same individuals interviewed at different waves
• Data have both a cross-sectional (i) and a time-series (t) dimension
• Often stored as separate wave files, e.g. British Household Panel Survey (BHPS)
• The same linking commands can be used to join the wave files
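Joining wave files is a one-to-one match on a person identifier. The sketch below assumes a hypothetical identifier pid and prefixes each variable with a wave letter (so age in wave a becomes aage), loosely modelled on BHPS-style naming; the names and values are illustrative only. It keeps only people present in every wave, which makes attrition visible.

```python
def link_waves(wave_files, key="pid"):
    """Join wave files one-to-one on `key`, giving one wide record per
    person; only people present in every wave are kept."""
    ids_per_wave = ({row[key] for row in rows} for rows in wave_files.values())
    common = set.intersection(*ids_per_wave)
    wide = {}
    for wave, rows in sorted(wave_files.items()):
        for row in rows:
            if row[key] not in common:
                continue  # person missing from at least one wave
            person = wide.setdefault(row[key], {key: row[key]})
            for name, value in row.items():
                if name != key:
                    person[wave + name] = value  # e.g. "aage" for wave a
    return [wide[k] for k in sorted(wide)]


# Hypothetical toy wave files: person 102 drops out after wave a.
waves = {
    "a": [{"pid": 101, "age": 30}, {"pid": 102, "age": 45}],
    "b": [{"pid": 101, "age": 31}],
}
print(link_waves(waves))
```

Keeping only complete cases gives a balanced panel; an outer join (as in the _merge logic for Stata) would instead keep everyone and leave gaps.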
Long and Wide Format
• See Appendix E of the workbook
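In wide format there is one row per person and one column per wave; in long format there is one row per person-wave. Stata's reshape command and SPSS's VARSTOCASES / CASESTOVARS do this conversion; the sketch below only illustrates the wide-to-long direction, using hypothetical wave-prefixed column names (aage, bage) and made-up values.

```python
def wide_to_long(wide_rows, key, stub, waves):
    """Reshape wide records (one column per wave, named wave + stub)
    into long records (one row per person per observed wave)."""
    long_rows = []
    for row in wide_rows:
        for wave in waves:
            column = wave + stub
            if column in row:  # skip waves where the person was not observed
                long_rows.append({key: row[key], "wave": wave, stub: row[column]})
    return long_rows


# Hypothetical toy wide file: person 102 only observed in wave a.
wide = [{"pid": 101, "aage": 30, "bage": 31}, {"pid": 102, "aage": 45}]
long_rows = wide_to_long(wide, "pid", "age", ["a", "b"])
for row in long_rows:
    print(row)
```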
Exercises
• FRS Exercise: Using data from three levels of hierarchy across three data tables
• GHS Exercise: Pooling years of repeated cross-sectional surveys (easier)
FRS Exercise
• What percentage of people in London, the East Midlands and the West Midlands are claiming the state retirement pension?
• Method: combine three files at different levels of hierarchy: HOUSEHOL, ADULT and BENEFITS
• Then run the cross-tab syntax at the bottom of the exercise. If the data linking is done correctly, you will get the right answer.
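The logic of the three-file link can be sketched as: reduce the benefit-level records to one flag per adult, attach the household region to each adult, then tabulate. This is a minimal illustration, not the exercise's syntax. The identifiers sernum and person, the region field and the benefit code "state_pension" are hypothetical placeholders (check the FRS documentation for the real names and codes), the toy data are made up, and the result is unweighted.

```python
from collections import defaultdict


def percent_claiming(households, adults, benefits, benefit_code):
    """Percentage of adults in each region with at least one claim of
    `benefit_code`, linking BENEFITS -> ADULT -> HOUSEHOL (unweighted)."""
    # Household level: one region per household serial number.
    region_of = {h["sernum"]: h["region"] for h in households}
    # Benefit level -> adult level: a set, so an adult with several
    # matching benefit records is still counted only once.
    claimants = {(b["sernum"], b["person"]) for b in benefits
                 if b["benefit"] == benefit_code}
    totals = defaultdict(int)
    claiming = defaultdict(int)
    for a in adults:
        region = region_of[a["sernum"]]  # adult -> household link
        totals[region] += 1
        if (a["sernum"], a["person"]) in claimants:
            claiming[region] += 1
    return {r: 100 * claiming[r] / totals[r] for r in sorted(totals)}


# Hypothetical toy tables.
households = [{"sernum": 1, "region": "London"},
              {"sernum": 2, "region": "East Midlands"}]
adults = [{"sernum": 1, "person": 1},
          {"sernum": 1, "person": 2},
          {"sernum": 2, "person": 1}]
benefits = [{"sernum": 1, "person": 1, "benefit": "state_pension"},
            {"sernum": 1, "person": 1, "benefit": "other"},
            {"sernum": 2, "person": 1, "benefit": "state_pension"}]

print(percent_claiming(households, adults, benefits, "state_pension"))
```

Collapsing the benefit records to the adult level before linking avoids the common error of counting an adult once per benefit record.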
General Household Survey (GHS) Exercise
• How does the age of the UK population vary by ethnicity? Estimate the average age of the different ethnic groups as coded in the variable ethnigp2
• Pooling three years of GHS data
• Effects of pooling on sample size and estimation
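The effect of pooling on group sample sizes can be seen by computing group means per year and then on the pooled file. In this sketch the ethnigp2 codes 1 and 2 and all ages are made-up toy values, and the means are unweighted. If the same households can appear in more than one pooled year (for example where a survey has a longitudinal element), the pooled records are not fully independent.

```python
from collections import defaultdict
from statistics import mean


def group_means(records, group_var, value_var):
    """Return {group: (n, mean)} for a list of records."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_var]].append(r[value_var])
    return {g: (len(v), mean(v)) for g, v in sorted(groups.items())}


# Hypothetical toy data: three years, two ethnic-group codes.
years = {
    2004: [{"ethnigp2": 1, "age": 40}, {"ethnigp2": 2, "age": 30}],
    2005: [{"ethnigp2": 1, "age": 44}, {"ethnigp2": 2, "age": 26}],
    2006: [{"ethnigp2": 1, "age": 42}, {"ethnigp2": 2, "age": 34}],
}
for year, records in years.items():
    print(year, group_means(records, "ethnigp2", "age"))

pooled = [r for records in years.values() for r in records]
print("pooled", group_means(pooled, "ethnigp2", "age"))
```

Each group's n triples in the pooled file, which is what makes estimates for small groups more stable.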
Units of analysis
• Fundamental to your research question!
  – Who do you want to generalise to?
  – What are your cases?
  – What units are your population composed of?
  – Who is your research question applicable to?
• Some typical units:
  – Individuals
  – Households
  – Schools
  – Businesses
  – Farms
  – Doctors
  – Wards
Hierarchy in some key datasets

Survey           Household hierarchy levels                                    File type
GHS              Household, Family, Individual, Sub-individual                 Flat file
LFS              Household, Family, Individual                                 Flat files (QLFS / household data)
FES              Multiple, incl. household, person, family unit, benefit unit  Multiple files
FRS              Household, Benefit unit, Individual                           Multiple files
HSE              Household, Individual (watch out for variable samples)        Flat files (1 all individuals, 1 all respondents)
BSAS             Individual                                                    Flat file
BCS              Individual, Incident (household context only)                 Multiple files
BHPS             Household, Individual (& below)                               Multiple files
Household SARs   Household, Family, Individual                                 Flat file
Quarterly Labour Force Survey

Wave    Spring  Summer  Autumn  Winter  Spring +1
W1      12k     12k     12k     12k     12k
W2      12k     12k     12k     12k     12k
W3      12k     12k     12k     12k     12k
W4      12k     12k     12k     12k     12k
W5      12k     12k     12k     12k     12k
Follow the cases who were in wave 1 in spring of year 1: they are in wave 2 in summer, wave 3 in autumn, wave 4 in winter and wave 5 in the following spring.
• Each household participates for 5 consecutive waves, one every 3 months (quarter)
• Total 60k households per quarter (5 waves × 12k)
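The rotation pattern is simple arithmetic: a cohort's wave number is the number of quarters since it entered plus one, and in any quarter five cohorts are present, each in a different wave. The sketch below assumes quarters are numbered consecutively from 0 (spring of year 1) and uses the slide's approximate 12k households per wave.

```python
QUARTERS = ["spring", "summer", "autumn", "winter"]
WAVES = 5
PER_WAVE = 12_000  # approximate households per wave, as on the slide


def wave_in_quarter(entry_quarter, quarter):
    """Wave (1-5) of the cohort that entered in `entry_quarter` when
    observed in `quarter`; None if it has not entered or has left."""
    wave = quarter - entry_quarter + 1
    return wave if 1 <= wave <= WAVES else None


# Follow the cohort that entered in spring of year 1 (quarter 0).
for q in range(WAVES):
    year = 1 + q // 4
    print(f"{QUARTERS[q % 4]} year {year}: wave {wave_in_quarter(0, q)}")

print("households per quarter:", WAVES * PER_WAVE)  # 60000
```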