Administrative Data Research Centre for England
Workshop on Synthetic Data
ONS, Dec 2014
The Administrative Data Research Centre for England and the Role of Synthetic Data
Dr Andy Cullis, Senior Data Scientist
University of Southampton
Outline of talk
ESRC Big Data Initiative
What is the Administrative Data Research Network (ADRN)?
What is the Administrative Data Research Centre for England (ADRC-E)?
ADRC-E infrastructure
What is the process for using data in the ADRC-E?
Role of synthetic data in the ADRC-E
ADRC-E synthetic data requirements
ESRC’s Big Data Initiative
Administrative Data Task Force 2011-12
BIS awarded ESRC £64m for a Big Data Initiative
Phase 1 (started Oct 2013): Administrative Data Research Network (ADRN)
Five-year funded project
Two other phases to come in 2014-15
Business and Local Authority Data Research Centres
Civil Society Data Partnership Projects
What is the Administrative Data Research Network (ADRN)?
Four Administrative Data Research Centres (one in each country of the UK)
And an Administrative Data Service (ADS), which coordinates ADRN policies and procedures and plays a key role in public engagement
Each ADRC is a partnership between academic institution(s) and the respective national statistical organisation
What is the Administrative Data Research Centre for England (ADRC-E)?
Led by the University of Southampton and run in collaboration with:
The Office for National Statistics
University College London
The London School of Hygiene and Tropical Medicine
The Institute for Fiscal Studies
The Institute of Education
Director: Peter Smith (University of Southampton)
ADRC-E Infrastructure
Access to the Office for National Statistics Virtual Microdata Laboratory facilities and secure research laboratories in Southampton (UoS) and London (UCL) for the analysis of de-identified linked administrative data
Staff:
Southampton: 2 data scientists and 3 Research Fellows
London: 2 data scientists and 2 Research Fellows
Plus staff in London, Southampton and ONS who manage liaison across sites and are responsible for public engagement, communications activities, user outreach and training
A team of academics providing advice and leading coordinated research programmes
What is the process for using data in the ADRC-E?
A researcher approaches the ADRN
Project accreditation, ethics approval and approval from the data supplier(s)
Data from suppliers linked by a trusted third party (e.g. ONS)
Access to the linked, anonymised data granted within an appropriate secure setting for analysis
All outputs vetted and approved
End of project: data destroyed; documentation and syntax/code retained
Plain English summaries of research published
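The linkage step in this process keeps identifiers away from researchers: the trusted third party sees who is who, while the researcher only sees analysis variables joined by a meaningless key. The sketch below is a minimal illustration under stated assumptions, not the ADRC-E or ONS procedure. The identifier fields `person_id` and `dob`, the function names and the example records are all hypothetical, matching is exact (real linkage is often probabilistic), and the pseudonym is simply a random token from Python's `secrets` module.

```python
import secrets

# Hypothetical identifier fields; real suppliers' identifiers will differ.
ID_FIELDS = ("person_id", "dob")


def trusted_third_party_link(*datasets):
    """Run by the trusted third party: swap identifiers for a pseudonym.

    Records whose identifier values match exactly (across any of the
    datasets) receive the same random link_id. Identifiers are dropped, so
    the released datasets contain only link_id plus analysis attributes.
    """
    pseudonyms = {}  # identifier tuple -> link_id, held only by the TTP
    released = []
    for dataset in datasets:
        rows = []
        for record in dataset:
            key = tuple(record[field] for field in ID_FIELDS)
            if key not in pseudonyms:
                pseudonyms[key] = secrets.token_hex(8)
            attributes = {k: v for k, v in record.items() if k not in ID_FIELDS}
            rows.append({"link_id": pseudonyms[key], **attributes})
        released.append(rows)
    return released


def join_on_link_id(left, right):
    """Inside the secure setting: inner join on link_id (one row per person assumed)."""
    right_by_id = {row["link_id"]: row for row in right}
    return [
        {**row, **right_by_id[row["link_id"]]}
        for row in left
        if row["link_id"] in right_by_id
    ]


# Hypothetical extracts from two data suppliers.
health = [
    {"person_id": "A1", "dob": "1980-01-02", "admissions": 3},
    {"person_id": "B2", "dob": "1975-06-30", "admissions": 1},
]
education = [
    {"person_id": "A1", "dob": "1980-01-02", "qualification": "degree"},
    {"person_id": "C3", "dob": "1990-11-11", "qualification": "none"},
]

health_deid, education_deid = trusted_third_party_link(health, education)
linked = join_on_link_id(health_deid, education_deid)
print(linked)  # one row: a random link_id, admissions=3, qualification='degree'
```

Because the mapping from identifiers to `link_id` stays with the third party, the researcher-side join works without the researcher ever handling names, dates of birth or reference numbers.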
Role of Synthetic data in ADRC-E
• Why do we need synthetic data?
• Two perspectives:
• Researcher wishing to use linked data
• Data scientists supporting the ADRC-E
Role of Synthetic data in ADRC-E
• Researcher wishing to use linked data
• Get a better understanding of the variables in the datasets
• Be able to prepare syntax/code
• Run exploratory models to help decide on feasibility of data linkage project
Role of Synthetic data in ADRC-E
• Data scientists supporting the ADRC-E
• Enable us to get a better understanding of potential administrative data sources
• Understand the quality of the administrative data to be used in projects
• Help prepare data for upload to the secure environment
• Advise researchers of potential issues with the administrative data they hope to use
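Understanding data quality usually starts with simple profiling: how much of each variable is missing, and whether coded values fall outside the permitted set. The sketch below is one way to do this in plain Python; the function name `profile`, the variables `sex` and `region`, and the coding scheme ("1"/"2" as valid codes for `sex`) are hypothetical choices for illustration. It treats both `None` and the empty string as missing.

```python
from collections import Counter


def profile(records, valid_codes):
    """Report percentage missing and any out-of-range codes per variable.

    valid_codes maps a variable name to its set of permitted values;
    variables without an entry are checked for missingness only.
    None and the empty string are both treated as missing.
    """
    if not records:
        return {}
    n = len(records)
    report = {}
    for var in sorted({key for r in records for key in r}):
        values = [r.get(var) for r in records]
        present = [v for v in values if v not in (None, "")]
        allowed = valid_codes.get(var)
        invalid = Counter(v for v in present if allowed is not None and v not in allowed)
        report[var] = {
            "missing_pct": 100 * (n - len(present)) / n,
            "invalid": dict(invalid),
        }
    return report


# Hypothetical extract: sex should be coded "1" or "2".
records = [
    {"sex": "1", "region": "South East"},
    {"sex": "2", "region": ""},
    {"sex": "9", "region": "London"},
    {"sex": None, "region": "London"},
]
report = profile(records, {"sex": {"1", "2"}})
for var, summary in report.items():
    print(var, summary)
# region {'missing_pct': 25.0, 'invalid': {}}
# sex {'missing_pct': 25.0, 'invalid': {'9': 1}}
```

A report like this, run on synthetic or sample data, is the kind of evidence that could back up advice to researchers about issues in the data they hope to use.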
ADRC-E Synthetic data requirements
• What do we need?
• Data synthesised to differing degrees
• Regularly used administrative data
• Who holds these types of datasets?
• Where could it be accessed?
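The talk does not define the "differing degrees" of synthesis. One common reading is a trade-off between fidelity and disclosure risk: a low-fidelity file keeps only each variable's own distribution, while a higher-fidelity file also keeps relationships between variables. The sketch below contrasts the two under that assumption; the function names, the variables `age_band` and `status`, and the example data are hypothetical, and real synthesis methods are considerably more sophisticated.

```python
import random
from collections import defaultdict


def synth_independent(records, n, rng):
    """Lower degree of fidelity: draw every variable independently from its
    own observed values. Marginal distributions are kept; relationships
    between variables are not."""
    columns = {var: [r[var] for r in records] for var in records[0]}
    return [{var: rng.choice(vals) for var, vals in columns.items()} for _ in range(n)]


def synth_conditional(records, n, rng, first, second):
    """Higher degree of fidelity: draw `first` from its observed values, then
    draw `second` only from records sharing that value of `first`, so the
    joint distribution of the pair is kept."""
    second_given_first = defaultdict(list)
    for r in records:
        second_given_first[r[first]].append(r[second])
    first_values = [r[first] for r in records]
    rows = []
    for _ in range(n):
        a = rng.choice(first_values)
        rows.append({first: a, second: rng.choice(second_given_first[a])})
    return rows


# Hypothetical "real" data in which age band fully determines status.
real = [{"age_band": "16-24", "status": "student"} for _ in range(50)] + [
    {"age_band": "25-64", "status": "employed"} for _ in range(50)
]
expected = {"16-24": "student", "25-64": "employed"}


def share_consistent(rows):
    """Fraction of rows that keep the real age_band -> status relationship."""
    return sum(expected[r["age_band"]] == r["status"] for r in rows) / len(rows)


rng = random.Random(42)
low = synth_independent(real, 1000, rng)
high = synth_conditional(real, 1000, rng, "age_band", "status")
print(share_consistent(low))   # roughly 0.5: the relationship is lost
print(share_consistent(high))  # 1.0: the relationship is preserved
```

With only two variables, the conditional version amounts to resampling observed pairs, which illustrates the trade-off: the closer a synthetic file is to the real records, the more useful it is for testing models, and the more disclosure checking it needs before it can be shared.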