Data Management for Longitudinal Data

20.7.04: LSS 1

Longitudinal Studies Seminars: Longitudinal Analyses Using

Stirling University, 20.7.04

Data and Variable Management

Paul Lambert

Data Management for Longitudinal Data

1. The Nature of ‘Large and Complex’ Data

2. Data management & STATA – getting started

3. Longitudinal Data Types

4. Merging Datasets

The nature of ‘large and complex’ longitudinal resources: complicating the variable by case matrix

Cases  Variables
1      1   17   1.73   A   . . . .
2      1   18   1.85   B   . . . .
3      2   17   1.60   C   . . . .
4      2   18   1.69   A   . . . .
.      .   .    .      .   . . . .

Large and complex =

Complexity in:
• Multiple hierarchies of measurement
• Array of variables / operationalisations
• Relations between / subgroups of cases
• Multiple points of measurement
  – Balanced or unbalanced repeated contacts
  – Censored duration data
• Sample collection and weighting

i) Multiple hierarchies (levels) of measurement

Common examples:
• Both individuals and households
• Schools and pupils
• People and local districts and regions

Solutions:
• Separate VxC matrix for each level, eg BHPS
• Merged VxC matrix at lowest level

Illustration: Hierarchical dataset

Cluster  Person  Person-level Vars
1        1       1   38   1   1
1        2       2   34   2   2
1        3       2   6    -   -
2        1       1   45   1   3
2        2       2   41   1   1
3        1       1   20   2   2
3        2       1   25   2   2
3        3       1   20   1   1

n1=3 n2=8
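As a rough sketch of the ‘merged VxC matrix at lowest level’ option, the rows of the illustration can be held as person-level records with a cluster-level summary attached to each. Python is used here purely for illustration; reading the first column as the cluster identifier, using None for the ‘-’ entries, and the name `persons_per_cluster` are all assumptions of this sketch.

```python
from collections import Counter

# Rows copied from the illustration: (cluster, person, four person-level variables).
# None stands in for the "-" (missing) entries.
rows = [
    (1, 1, 1, 38, 1, 1),
    (1, 2, 2, 34, 2, 2),
    (1, 3, 2, 6, None, None),
    (2, 1, 1, 45, 1, 3),
    (2, 2, 2, 41, 1, 1),
    (3, 1, 1, 20, 2, 2),
    (3, 2, 1, 25, 2, 2),
    (3, 3, 1, 20, 1, 1),
]

# Count persons within each cluster (a cluster-level summary).
persons_per_cluster = Counter(row[0] for row in rows)

n_clusters = len(persons_per_cluster)  # number of higher-level units
n_persons = len(rows)                  # number of lowest-level units

# Attach the cluster-level value to every person record,
# giving a single matrix at the lowest level.
merged = [row + (persons_per_cluster[row[0]],) for row in rows]

print(n_clusters, n_persons)  # 3 8
print(merged[2])              # (1, 3, 2, 6, None, None, 3)
```

The counts agree with the n1=3 and n2=8 given for the illustration.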

ii) Array of variables

Vast number of variable responses, eg 1K+
• Recoding multiplies these up, eg dummies
• Multiple response var.s (‘all that apply’)
• Categorisations / indexes (eg occupations)

Implication:
• Either separate files for separate var. groups
• Or very long and difficult files…
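To see how recoding multiplies variables up, the sketch below turns one categorical variable into one dummy per category. The function `make_dummies`, the variable name `work` and the four records are invented for illustration; missing values are carried through as None rather than coded 0, which is a choice made for this sketch.

```python
def make_dummies(records, var):
    """Add one 0/1 variable per observed category of `var`."""
    categories = sorted({r[var] for r in records if r[var] is not None})
    expanded = []
    for r in records:
        new = dict(r)
        for c in categories:
            # Keep missing as missing instead of silently coding it 0.
            new[f"{var}_{c}"] = None if r[var] is None else int(r[var] == c)
        expanded.append(new)
    return expanded


# Hypothetical person records with a three-category work status.
people = [
    {"pid": 1, "work": "FT"},
    {"pid": 2, "work": "PT"},
    {"pid": 3, "work": "NW"},
    {"pid": 4, "work": None},
]

expanded = make_dummies(people, "work")
print(sorted(expanded[0]))  # ['pid', 'work', 'work_FT', 'work_NW', 'work_PT']
```

One three-category variable becomes four columns, so a file with many such variables grows quickly.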

iii) Relations between cases

• All respondents in a household
• Husbands and wives both sampled
• Fellow school pupils sampled
• Longitudinal: differing relations with others at different times

Outcomes:
• Link information between related cases

iv) Multiple measurement points

Longitudinal: information on same cases for multiple time points

Panel or cohort: several records via repeated contact for each individual
• Problems of ‘unbalanced’ panels

Life history / retrospective:
• Durations in spells: multistate / multiepisode, overlapping spells; time varying covariates
• Left or right censoring of durations in spells

v) Sample collection / weighting

• Multistage cluster particularly popular
• Sample may have been clustered, stratified
• Longitudinal: uneven inclusion of cases over time
• Sample weights designed to solve, but:
  • Complex in application
  • Not suited to all applications

2. Data management & STATA – getting started

STATA data management examples: see datmanag_part1.do

Claim: For data management, STATA is powerful, but not always well designed

Batch files / interactive syntax / programs

• Data entry / browsing
• Variable labels
• Computing / recoding
• Missing values
• Weighting data
• Survey estimators (svy)

3. Longitudinal Data Types

Typology of longitudinal data files

3 sets of contrasts:

1. Repeated X-section / Panel / Cohort / Event History / Time Series
2. Wide vs Long
3. Discrete vs Continuous time

See datmanag_part2.do

Contrast 1 Type A: Repeated x-sect data

Survey  Person  Person-level Vars
1       1       1   38   1   1
1       2       2   34   2   2
1       3       2   6    -   -
2       4       1   45   1   3
2       5       2   41   1   1
3       6       1   20   2   2
3       7       1   25   2   2
3       8       1   20   1   1

N_s=3 N_c=8

C1 Type B: Panel dataset (Unbalanced)

Cases  Year  Variables
1      1     1   17   1   1
1      2     1   18   2   1
1      3     1   19   2   -
2      1     1   17   1   3
2      2     1   18   1   1
3      2     2   20   2   2
3      3     2   21   2   2
3      4     2   22   1   1

n1=3 n2=8
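Whether a panel is balanced can be checked by comparing the years observed for each case with the full set of years in the file. The following is a minimal sketch in Python using the case and year columns of the illustration; the variable names are invented, and defining ‘balanced’ as every case being present in every year observed in the file is an assumption of the sketch.

```python
from collections import defaultdict

# (case, year) pairs copied from the unbalanced panel illustration.
case_years = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 2), (3, 3), (3, 4)]

# Collect the set of years in which each case was observed.
years_by_case = defaultdict(set)
for case, year in case_years:
    years_by_case[case].add(year)

# Every year that appears anywhere in the file.
all_years = set().union(*years_by_case.values())

# Years missing for each case; an empty list would mean full coverage.
missing = {case: sorted(all_years - years) for case, years in years_by_case.items()}
balanced = all(not gaps for gaps in missing.values())

print(balanced)  # False
print(missing)   # {1: [4], 2: [3, 4], 3: [1]}
```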

C1 Type C: Event history data analysis

Alternative data sources:
• Panel / cohort (more reliable)
• Retrospective (cheaper, but recall errors)

Aka: ‘Survival data analysis’; ‘Failure time analysis’; ‘hazards’; ‘risks’; ..

Focus shifts to length of time in a ‘state’: analyses determinants of time in state

Key to event histories is ‘state space’

[Figure: Episodes within state space: lifetime work histories for 3 adults born 1935. For each of Person 1, Person 2 and Person 3, the state space (FT work, PT work, Not in work) is plotted against time, 1950 to 2000.]

C1 Type D: Time series data

**Exact equivalence to panel data format

Examples:
• Unemployment rates by year in UK
• University entrance rates by year by country

Statistical summary of one particular concept, collected at repeated time points from one or more subjects

Contrast 2: ‘Wide’ versus ‘Long’ format

Relevant to all types of dataset:

‘Wide’ = 1 case per record (person), additional vars for time points:

Person 1   Sex   YoB   Var1_92   Var1_93   Var1_94   …
Person 2   …

‘Long’ = 1 case per time point within person (as panel data example)

STATA: ‘reshape’ command allows transfer between the two formats
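The transfer between the two formats can be sketched outside STATA as well. The Python below is an illustration of the idea only, not of how ‘reshape’ works internally; the functions `wide_to_long` and `long_to_wide`, the lower-case variable names and all data values are invented for the example.

```python
def wide_to_long(wide_rows, id_var, fixed_vars, stub):
    """One record per person -> one record per person and time point."""
    long_rows = []
    for row in wide_rows:
        for name, value in row.items():
            if name.startswith(stub + "_"):
                rec = {id_var: row[id_var]}
                rec.update({v: row[v] for v in fixed_vars})
                rec["year"] = int(name[len(stub) + 1:])  # suffix after "stub_"
                rec[stub] = value
                long_rows.append(rec)
    return long_rows


def long_to_wide(long_rows, id_var, fixed_vars, stub):
    """One record per person and time point -> one record per person."""
    wide = {}
    for rec in long_rows:
        base = {id_var: rec[id_var], **{v: rec[v] for v in fixed_vars}}
        row = wide.setdefault(rec[id_var], base)
        row[f"{stub}_{rec['year']}"] = rec[stub]
    return list(wide.values())


# Hypothetical wide file: var1 measured in 92, 93 and 94.
wide = [
    {"pid": 1, "sex": 1, "yob": 1960, "var1_92": 10, "var1_93": 12, "var1_94": 11},
    {"pid": 2, "sex": 2, "yob": 1971, "var1_92": 7, "var1_93": None, "var1_94": 9},
]

long_rows = wide_to_long(wide, "pid", ["sex", "yob"], "var1")
print(len(long_rows))  # 6
print(long_rows[0])    # {'pid': 1, 'sex': 1, 'yob': 1960, 'year': 92, 'var1': 10}
print(long_to_wide(long_rows, "pid", ["sex", "yob"], "var1") == wide)  # True
```

Time-constant variables (sex, year of birth) are repeated on every long record, which is why long files are larger but simpler to analyse by time point.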

Contrast 3: Continuous vs Discrete time

Primarily in terms of event history datasets

Continuous time (‘spell files’, ‘event oriented’)
• One episode per case, time in case is a variable

Discrete time
• One episode per time unit, type of event and event occurrence as variables

Analyses: Most packages can handle either format comfortably

Illustration of a continuous time retrospective dataset

Case  Person  Start time  End time  Duration  Origin state  Destination state  {Other vars, person/state}
1     1       1           158       157       1 (FT)        3 (NW)
2     1       158         170       12        3 (NW)        3 (NW)
3     2       1           22        21        3 (NW)        1 (FT)
4     2       22          106       84        1 (FT)        3 (NW)
5     2       106         149       43        3 (NW)        2 (PT)
6     2       149         170       21        2 (PT)        2 (PT)
7     3       1           10        9         1 (FT)        2 (PT)
.     .       .           .         .         .             .
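In a spell file the duration is derived from the start and end times, and a censoring flag usually has to be constructed. The sketch below, in Python for illustration, recomputes duration for the seven listed spells. Reading a spell whose destination state equals its origin state as right-censored (no transition observed by the end of the record) is an assumption of this sketch, not something the illustration states.

```python
# (case, person, start, end, origin, destination) copied from the illustration.
spells = [
    (1, 1, 1, 158, "FT", "NW"),
    (2, 1, 158, 170, "NW", "NW"),
    (3, 2, 1, 22, "NW", "FT"),
    (4, 2, 22, 106, "FT", "NW"),
    (5, 2, 106, 149, "NW", "PT"),
    (6, 2, 149, 170, "PT", "PT"),
    (7, 3, 1, 10, "FT", "PT"),
]

derived = []
for case, person, start, end, origin, destination in spells:
    derived.append({
        "case": case,
        "person": person,
        "duration": end - start,
        # Assumption: no change of state means the spell was still
        # ongoing when observation stopped (right-censored).
        "censored": int(origin == destination),
    })

print([d["duration"] for d in derived])
# [157, 12, 21, 84, 43, 21, 9]
print([d["case"] for d in derived if d["censored"]])
# [2, 6]
```

The recomputed durations match the Duration column of the illustration.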

Illustration of a discrete time retrospective dataset

Case  Person  Discrete time  Approx real time  State  End of state  {Other person, state, or time unit level variables}
1     1       1              5                 1 FT   0
2     1       2              20                1 FT   0
3     1       3              35                1 FT   0
4     1       4              50                1 FT   0
5     1       5              65                1 FT   0
6     1       6              80                1 FT   0
7     1       7              95                1 FT   0
8     1       8              110               1 FT   0
9     1       9              125               1 FT   0
10    1       10             140               1 FT   1
11    1       11             155               3 NW   0
12    1       12             170               3 NW   1
13    2       1              5                 3 NW   0
14    2       2              20                3 NW   1
15    2       3              35                1 FT   0
16    2       4              50                1 FT   1
.     .       .              .                 .      .
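A discrete time file can be built by expanding each spell into one record per time unit. The following Python sketch shows one simple way of doing so, using two invented spells rather than the rows above. The function `spells_to_periods` is hypothetical, and it assumes that spell boundaries fall exactly on multiples of the unit width, which real data rarely guarantee.

```python
def spells_to_periods(spells, width=1):
    """Expand (person, start, end, state) spells into one record per time unit."""
    records = []
    for person, start, end, state in spells:
        n_units = (end - start) // width  # assumes (end - start) divides evenly
        for k in range(1, n_units + 1):
            records.append({
                "person": person,
                "time": start + k * width,           # end point of this unit
                "state": state,
                "end_of_state": int(k == n_units),   # 1 in the spell's last unit
            })
    return records


# Hypothetical person: 3 units in FT work, then 2 units not in work.
spells = [(1, 0, 3, "FT"), (1, 3, 5, "NW")]
periods = spells_to_periods(spells)

print([(p["time"], p["state"], p["end_of_state"]) for p in periods])
# [(1, 'FT', 0), (2, 'FT', 0), (3, 'FT', 1), (4, 'NW', 0), (5, 'NW', 1)]
```

Two spell records become five person-period records, with the event indicator set only on the last record of each spell.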

4. Merging Datasets

Matching files

Complex data inevitably involves more than one related data file

A vital data analysis skill!!

Link data between files by connecting them according to key linking variable(s)

Eg, ‘person identifier’ variable ‘pid’
Eg : http://iserwww.essex.ac.uk/bhps/doc/

See datmanag_part3.do

Types of file matching

Case-to-case matching
• One-to-one link, eg two files with different sets of variables for same people
• STATA: append or merge

Table distribution
• One-to-many link, eg one file has individuals, another has households, and match household info to the individuals
• STATA: merge
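Table distribution amounts to a lookup: each individual record picks up the variables of its household. The Python sketch below illustrates that logic only. The household identifier `hid`, the household variables `region` and `hh_size`, and all values are invented; ‘pid’ follows the person identifier mentioned earlier.

```python
# Hypothetical household file, keyed by household identifier.
households = {
    10: {"region": "Scotland", "hh_size": 2},
    20: {"region": "Wales", "hh_size": 1},
}

# Hypothetical individual file; several people can share one household.
individuals = [
    {"pid": 1, "hid": 10, "age": 38},
    {"pid": 2, "hid": 10, "age": 34},
    {"pid": 3, "hid": 20, "age": 45},
    {"pid": 4, "hid": 30, "age": 20},  # household not present in the household file
]

# Copy the household variables onto every individual in that household.
# Individuals with no matching household keep only their own variables.
merged = [{**person, **households.get(person["hid"], {})} for person in individuals]

print(merged[1])  # {'pid': 2, 'hid': 10, 'age': 34, 'region': 'Scotland', 'hh_size': 2}
print(merged[3])  # {'pid': 4, 'hid': 30, 'age': 20}
```

The result stays at the individual level: four records in, four records out, with household information repeated across household members.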

Types of file matching ctd

Aggregating
• Summarise over multiple cases then link summaries back to cases
• STATA: collapse
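Aggregating is a two-step operation: summarise within groups, then attach the summary to each case. A minimal Python sketch of those two steps follows; the identifiers `hid` and `hh_mean_age` and the data values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical individual file with a household identifier.
people = [
    {"pid": 1, "hid": 1, "age": 38},
    {"pid": 2, "hid": 1, "age": 34},
    {"pid": 3, "hid": 1, "age": 6},
    {"pid": 4, "hid": 2, "age": 45},
    {"pid": 5, "hid": 2, "age": 41},
]

# Step 1: summarise over the cases in each household.
ages = defaultdict(list)
for p in people:
    ages[p["hid"]].append(p["age"])
hh_mean_age = {hid: sum(a) / len(a) for hid, a in ages.items()}

# Step 2: link the household summary back to every case.
linked = [{**p, "hh_mean_age": hh_mean_age[p["hid"]]} for p in people]

print(hh_mean_age)  # {1: 26.0, 2: 43.0}
print(linked[2])    # {'pid': 3, 'hid': 1, 'age': 6, 'hh_mean_age': 26.0}
```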

Related cases matching
• Link info from one related case to another case, eg info on spouse put on own case
• STATA: merge or joinby
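Putting spouse information on a respondent’s own record needs a variable that holds the spouse’s identifier. The Python sketch below assumes such a variable, here called `sppid`; that name, the `job` variable and the data are invented for illustration.

```python
# Hypothetical file: each person carries the pid of their spouse, if any.
people = [
    {"pid": 1, "sppid": 2, "job": "FT"},
    {"pid": 2, "sppid": 1, "job": "PT"},
    {"pid": 3, "sppid": None, "job": "NW"},
]

# Index the file by person identifier so a spouse can be looked up directly.
by_pid = {p["pid"]: p for p in people}

with_spouse = []
for p in people:
    spouse = by_pid.get(p["sppid"])  # None when there is no spouse in the file
    with_spouse.append({**p, "sp_job": spouse["job"] if spouse else None})

print([(p["pid"], p["job"], p["sp_job"]) for p in with_spouse])
# [(1, 'FT', 'PT'), (2, 'PT', 'FT'), (3, 'NW', None)]
```

The same file is in effect matched to itself, with the spouse identifier on one side acting as the person identifier on the other.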

STATA file matching crib:

_merge = indicator of cases present for:

1 = Master file but not input file
2 = Input file but not Master file
3 = Master and input file
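The three codes can be reproduced in a small sketch of a one-to-one match. The Python below imitates the meaning of the indicator only and is not STATA’s implementation. The function `merge_with_indicator` and the data are invented, the key is assumed to be unique in both files, and keeping the master file’s value when both files hold the same variable is a choice made for this sketch.

```python
def merge_with_indicator(master, using, key):
    """One-to-one match on `key`, adding a _merge code of 1, 2 or 3."""
    master_by_key = {r[key]: r for r in master}
    using_by_key = {r[key]: r for r in using}
    merged = []
    for k in sorted(master_by_key.keys() | using_by_key.keys()):
        in_master = k in master_by_key
        in_using = k in using_by_key
        # Master values are written last, so they win on any name clash.
        row = {**using_by_key.get(k, {}), **master_by_key.get(k, {})}
        if in_master and in_using:
            row["_merge"] = 3
        elif in_master:
            row["_merge"] = 1
        else:
            row["_merge"] = 2
        merged.append(row)
    return merged


# Hypothetical master and input files sharing the key 'pid'.
master = [{"pid": 1, "sex": 1}, {"pid": 2, "sex": 2}]
using = [{"pid": 2, "income": 300}, {"pid": 3, "income": 150}]

result = merge_with_indicator(master, using, "pid")
print([(r["pid"], r["_merge"]) for r in result])
# [(1, 1), (2, 3), (3, 2)]
```

Tabulating the indicator after every match is a quick check that the files linked as expected.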
