A Data Curation Application Using DDI: The DAMES Data Curation Tool for Organising Specialist Social Science Data Resources

Simon Jones*, Guy Warner*, Paul Lambert**, Jesse Blum*

* Computing Science, ** Applied Social Science, University of Stirling, Stirling, Scotland, UK

IASSIST 2010 Session: Automated Curation Tools and Services for Metadata

Ithaca, NY, 3 June 2010

DAMES: Data Management through e-Social Science, http://www.dames.org.uk

DAMES: Background

DAMES: Case studies, provision and support for data management in the social sciences

• A 3-year ESRC research “node”: Feb 2008 to Feb 2011
• Funded by the Digital Social Research Strategy group
• Driven by social science needs for support for advanced data management operations
  • “In practice, social researchers often spend more time on data management than any other part of the research process” (Lambert)
• A ‘methodology’ of data management is relevant to ‘harmonisation’, ‘comparability’ and ‘reproducibility’ in quantitative social science

DAMES: Themes

Four social science themes:
• Grid Enabled Specialist Data Environments: occupations; education; ethnicity
• Micro-simulation on social care data
• Linking e-Health and social science databases
• Training and interfaces for data management support

Underlying computer science research themes:
• Metadata
• Data curation
• Data fusion
• Workflow modelling
• Data security

DAMES goals

• To build on previous work on the GEODE project (Grid Enabled Occupational Data Environment)
• To build an integrated DAMES Grid-based portal:
  • Combining specialist occupational, ethnicity, educational and health data resources: GE*DE (“GESDE”)
  • Enabling a virtual community of social science researchers:
    • To deposit and search heterogeneous data resources
    • To access online services/‘tools’ that enable researchers to carry out repeatable and challenging data management techniques such as fusion, matching, imputation …
• Facilitating access is an important goal

GEODE

DAMES scenarios

Curation scenarios include:
• Uploading occupational data to distribute across the academic community
• Recording data properties prior to undertaking data fusion involving a survey and an aggregate dataset

Fusion scenarios include:
• Linking a micro-social survey with aggregate occupational information (deterministic link)
• Enhancing a survey dataset with ‘nearest match’ explanatory variables (probabilistic link)
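The two kinds of fusion link can be sketched in Python. This is a minimal illustration, assuming records held as dictionaries; the occupation codes, scores and variable names are invented, and the nearest-match step uses plain Euclidean nearest-neighbour donor matching as a simple stand-in for whatever probabilistic method the DAMES tools actually apply.

```python
import math

# Hypothetical aggregate occupational table: occupation code -> aggregate score.
# Codes and scores are invented for illustration only.
AGGREGATE = {"2311": 71.0, "4112": 48.5, "9233": 22.0}


def deterministic_link(survey, aggregate, key):
    """Attach an aggregate value to each survey record by exact key match.

    Records whose key has no match get None rather than being dropped.
    Input records are copied, not modified in place.
    """
    linked = []
    for record in survey:
        enriched = dict(record)
        enriched["agg_score"] = aggregate.get(record[key])
        linked.append(enriched)
    return linked


def nearest_match(recipients, donors, common_vars, impute_vars):
    """Stand-in for a 'nearest match' link: copy variables from the closest donor.

    Closeness is plain Euclidean distance over the common variables (assumed
    numeric); a real fusion method would standardise variables and may add
    randomness. Ties go to the first donor in the list.
    """
    fused = []
    for r in recipients:
        best = min(
            donors,
            key=lambda d: math.dist([r[v] for v in common_vars],
                                    [d[v] for v in common_vars]),
        )
        enriched = dict(r)
        for v in impute_vars:
            enriched[v] = best[v]
        fused.append(enriched)
    return fused
```

Keeping unmatched codes as None in the deterministic link makes link failures visible instead of silently dropping respondents.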

Key role for metadata

Metadata records are absolutely core to the functioning of the portal infrastructure:
• To provide adequate, searchable records for the heterogeneous resources (data tables, command files, notes and documentation)
• To connect the resources and the data management tools
• To document the datasets resulting from application of the data management tools: inputs, process, rationale, …

DAMES requirements: (micro-)data based, very general, lifecycle oriented, Grid friendly

Chosen standard: DDI 3
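One way to picture a record that serves all three roles (search, connecting resources to tools, documenting derived data) is a small data structure. This is a hypothetical sketch: the class and field names are assumptions chosen for illustration and do not correspond to DDI 3 element names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    """How a derived dataset was produced (inputs, process, rationale)."""
    inputs: List[str]      # identifiers of the source resources
    process: str           # e.g. the data management tool that was run
    rationale: str = ""    # free-text justification recorded by the user


@dataclass
class ResourceRecord:
    """A simplified, DDI-inspired catalogue entry for one portal resource.

    Field names are illustrative assumptions, not DDI 3 element names.
    """
    identifier: str
    title: str
    kind: str                                   # "data table", "command file", "notes", ...
    keywords: List[str] = field(default_factory=list)
    compatible_tools: List[str] = field(default_factory=list)
    provenance: Optional[Provenance] = None     # set only for derived datasets

    def matches(self, term: str) -> bool:
        """Case-insensitive substring match on the title or any keyword."""
        term = term.lower()
        return term in self.title.lower() or any(term in k.lower() for k in self.keywords)


def usable_with(records, tool):
    """Return identifiers of records that list the given tool as compatible."""
    return [r.identifier for r in records if tool in r.compatible_tools]
```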

The metadata "cycle"

(Diagram: a cycle linking Processing, Metadata and Search)

DAMES portal architecture overview

(Diagram components: User; Portal; Services: Search, Enact Fusion, File Access; DAMES Resources; Compute Resources; Metadata; Local Datasets; External Dataset Repositories. Note: security omitted.)

Curation Tool prototype

The source data:

The resulting metadata record is also automatically uploaded to a searchable eXist database.
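eXist is an XML database, normally queried with XQuery. As a local, self-contained stand-in, the sketch below searches simplified XML records with Python's ElementTree; the record layout and element names are hypothetical and deliberately not schema-valid DDI 3.

```python
import xml.etree.ElementTree as ET

# Simplified, DDI-flavoured records. Element names are illustrative only and
# are NOT valid DDI 3, which uses namespaced modules and different elements.
RECORDS = [
    """<record id="ds-001">
         <title>Occupational survey extract</title>
         <variable name="soc2000"><label>Occupation code</label></variable>
         <variable name="age"><label>Age of respondent</label></variable>
       </record>""",
    """<record id="ds-002">
         <title>Education attainment table</title>
         <variable name="isced"><label>Highest qualification</label></variable>
       </record>""",
]


def find_datasets_with_variable(records, text):
    """Return ids of records containing a variable whose label mentions text.

    ElementTree supports only a subset of XPath, so the case-insensitive text
    match is done in Python; an XML database would express it as one query.
    """
    hits = []
    for doc in records:
        root = ET.fromstring(doc)
        for label in root.iterfind("./variable/label"):
            if text.lower() in (label.text or "").lower():
                hits.append(root.get("id"))
                break  # one hit per record is enough
    return hits
```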

Example: the Fusion Tool

Scenario: a social science researcher wishes to fuse Scottish Household Survey (SHS) data with privately collected study data. The researcher:
• Uses the data curation tool to upload the data
• Uses the data fusion/imputation tool to select the data, identify corresponding variables and generate a derived dataset (held in the portal)

The metadata about this derived dataset is stored and may be made public through the portal.

Another researcher can now search the portal's metadata for SHS data and find the derived dataset.

DAMES metadata handling must facilitate this process.
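The discovery step in this scenario depends on the derived dataset's metadata recording its sources. A minimal sketch, assuming a hypothetical in-memory catalogue in which each entry lists the identifiers of the datasets it was derived from (titles and identifiers are invented):

```python
# Hypothetical in-memory catalogue: dataset id -> metadata dict.
CATALOGUE = {
    "shs-2008": {"title": "Scottish Household Survey (example extract)", "sources": []},
    "study-x": {"title": "Privately collected study data", "sources": []},
    "fused-1": {"title": "Fused dataset", "sources": ["shs-2008", "study-x"]},
}


def lineage(catalogue, ds_id):
    """All ancestors of a dataset, following the recorded 'sources' links."""
    seen, stack = [], list(catalogue[ds_id]["sources"])
    while stack:
        parent = stack.pop()
        if parent not in seen:  # guards against revisiting shared ancestors
            seen.append(parent)
            stack.extend(catalogue[parent]["sources"])
    return seen


def search(catalogue, term):
    """Datasets whose title matches term, plus datasets derived from them."""
    direct = {k for k, m in catalogue.items() if term.lower() in m["title"].lower()}
    derived = {k for k in catalogue if direct & set(lineage(catalogue, k))}
    return sorted(direct | derived)
```

A search for "household" then returns both the survey extract and the fused dataset, because the fused dataset's metadata names the survey as a source.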

The Fusion Tool prototype

• Select datasets (recipient and donor)
• Select "common variables"
• Select variables to be imputed
• Select data fusion method
• Submit to fusion "enactor"

Metadata accessed
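Because the tool reads variable lists from metadata, a request can be checked before it is submitted. The sketch below is hypothetical: the request fields mirror the steps listed above, but the method names and the validation rules are assumptions, not the DAMES implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical set of methods the enactor accepts; the slides do not list them.
SUPPORTED_METHODS = {"nearest_neighbour", "exact_match"}


@dataclass
class FusionRequest:
    recipient: str
    donor: str
    common_vars: List[str]
    impute_vars: List[str]
    method: str


def validate(req: FusionRequest, variables: Dict[str, Set[str]]) -> List[str]:
    """Check a request against variable lists read from each dataset's metadata.

    Returns a list of problems; an empty list means the request can be submitted.
    """
    problems = []
    rec_vars = variables.get(req.recipient, set())
    don_vars = variables.get(req.donor, set())
    if not req.common_vars:
        problems.append("no common variables selected")
    for v in req.common_vars:
        if v not in rec_vars or v not in don_vars:
            problems.append(f"common variable {v!r} missing from one dataset")
    for v in req.impute_vars:
        if v not in don_vars:
            problems.append(f"variable to impute {v!r} not in donor")
    if req.method not in SUPPORTED_METHODS:
        problems.append(f"unsupported method {req.method!r}")
    return problems
```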

The same steps, with intermediate screens skipped, lead to the metadata for the result dataset.

Fusion job flow description

• The fusion/imputation is potentially resource-intensive, so we generalise to using a pool of computing resources
• We use a Job Flow Description Language (JFDL) to submit the job to the computing resources pool
• The JFDL job description includes references to:
  • Input datasets
  • Processing steps and their relationships
  • Outputs
• This must be recorded in the DDI record for the resultant dataset
• We are automating the translation JFDL → DDI3
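The JFDL syntax itself is not shown here, so the sketch below assumes a hypothetical job description held as a Python dictionary with inputs, steps (each naming the steps it runs after) and outputs. It translates that description into a simplified XML provenance record; the element names are placeholders, and a real JFDL → DDI3 translation would target the appropriate DDI 3 modules instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical job description; the real JFDL syntax is not given in the slides.
JOB = {
    "inputs": ["shs-2008", "study-x"],
    "steps": [
        {"id": "s1", "tool": "select-variables", "after": []},
        {"id": "s2", "tool": "nearest-neighbour-fusion", "after": ["s1"]},
    ],
    "outputs": ["fused-1"],
}


def job_to_provenance_xml(job):
    """Translate a job description into a simplified provenance record.

    Element names are placeholders, not DDI 3 elements.
    """
    root = ET.Element("provenance")
    for ds in job["inputs"]:
        ET.SubElement(root, "input", ref=ds)
    steps = ET.SubElement(root, "processing")
    for step in job["steps"]:
        el = ET.SubElement(steps, "step", id=step["id"], tool=step["tool"])
        for dep in step["after"]:
            # Record the step ordering so the processing graph survives.
            ET.SubElement(el, "dependsOn", ref=dep)
    for ds in job["outputs"]:
        ET.SubElement(root, "output", ref=ds)
    return ET.tostring(root, encoding="unicode")
```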

Summary/Work in progress

• The DAMES infrastructure is still very much in the development phase
• We have a prototype outline portal, with prototype curation and fusion tools
• We are firmly fixed on using DDI 3 as our metadata standard, but we are still:
  • Refining the JFDL
  • Refining the DDI3
  • Improving generation of DDI3 from JFDL
  • Improving searching and discovery of datasets

Thank you!