Research data: what is being done

Kevin Ashley, Digital Curation Centre

www.dcc.ac.uk @kevingashley [email protected]

Reusable with attribution: CC-BY

The DCC is supported by Jisc




My home – the DCC

• Mission – to increase capability and capacity for research data services in UK institutions

• Not just a UK problem – an international one

• Training, shared services, guidance, policy, standards, futures

2014-10-01 Kevin Ashley –DEIC-2014 - CC-BY 2


Before what – WHY?


Data reuse stories

• The palaeontologist who saved years of work with archaeological data


What a palaeontologist looks at

[Figure: timeline running from now to 100 million years ago, with ticks at 1m, 25m, 50m and 75m years]

What a palaeontologist looks at

[Figure: two timelines. The first runs from now to 100 million years ago, with ticks at 1m, 25m, 50m and 75m years; the second runs from now to 1 million years, with ticks at 100,000, 500,000 and 750,000 years]

What an archaeologist looks at

[Figure: two timelines. The first runs from now to 1 million years, with ticks at 100,000, 500,000 and 750,000 years; the second runs from now to 100,000 years ago, with ticks at 25,000, 50,000 and 75,000 years]

Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships' logs that help us model climate change


The Old Weather project

Data for research, not from research


Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships' logs that help us model climate change

• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull


Data reuse - messages

Often your data tells stories that your publications do not

Not all data comes from other researchers

One person’s noise is another person’s signal

Discipline-bounded data discovery doesn’t give us all we need or want

Data reuse from Hubble


G8UK - Endorses OA

Open Data Charter, Policy Paper, 18 June 2013

Why care?

• Data is expensive – an investment

• Reuse:

– More research

– Teaching & Learning

– Planning

• Impact – with or without publication

• Accountability

• Legal & regulatory requirements


Why does this matter?

• Research quality – How close can we get to the truth?

• Research speed – How quickly can we get to the truth?

• Research finance – How much does the truth cost?

• Improving one or more of these is of interest to all actors:

– Researchers as data creators

– Researchers as data reusers

– Research institutions

– Funders – hence government and society


FUNDER POLICY UNIVERSITY RESPONSE


Funders are making demands


Funder requirements

• UK

• USA – NSF, NEH, NIH

• Europe

• Denmark – in development

• Most place burden on researcher – some on the institution

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx

RCUK policy - The 1-minute version

• Research data are a public good – make openly available in timely & responsible way

• Have policies & plans. Data with long-term value should be preserved & usable

• Metadata for discovery & reuse. Link publications & data

• Sometimes law, ethics get in the way. We understand.

• Limited embargos OK. Recognition is important – always cite data sources

• OK to use public money to do this. Do it efficiently.


EPSRC policy points

• Awareness of regulatory environment

• Data access statement

• Policies and processes

• Data storage

• Structured metadata descriptions

• DOIs for data

• Securely preserved for a minimum of 10 years from last use


Compliance expected by 2015
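
The 10-year retention point above is measured from last use, so each new use restarts the clock. A minimal Python sketch of that reading follows; the function name, the single `last_used` date and the leap-day handling are illustrative assumptions, not part of the EPSRC policy text.

```python
from datetime import date

RETENTION_YEARS = 10  # minimum retention period quoted on the slide


def earliest_disposal_date(last_used: date, years: int = RETENTION_YEARS) -> date:
    """Return the earliest date on which a dataset could be disposed of.

    Assumption: "last use" is recorded as a single calendar date.
    """
    try:
        return last_used.replace(year=last_used.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to
        # 28 February (an arbitrary choice made for this sketch).
        return last_used.replace(year=last_used.year + years, day=28)


# A dataset last used on the date of this talk must be kept until at least:
print(earliest_disposal_date(date(2014, 10, 1)))  # 2024-10-01
```

Under this reading, a dataset that keeps being used never reaches its disposal date, which is one reason storage planning needs to track use as well as deposit.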


DCC Policy Summary

http://www.dcc.ac.uk/resources/policy-and-legal


Research data centres are good value!

• See Jisc reports on ADS, BADC, UKDA:

• Returns on investment between 400% and 1200%


http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx


Research Data Centres – the solution!

MANY AREAS OF RESEARCH HAVE NO DATA CENTRE TO SERVE THEM

164 universities in UK*

*2011 HESA data


71 (43%) > 5% research income

115 (70%) > £1m income from research
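
The percentages quoted above can be checked against the 164-institution total with a few lines of Python. This is only an arithmetic sketch; the variable and function names are illustrative, and the counts are the ones given on the slide.

```python
# Counts quoted above (2011 HESA data); names are illustrative.
total_universities = 164
over_5pct_research_income = 71
over_1m_research_income = 115


def share(count: int, total: int) -> int:
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


print(share(over_5pct_research_income, total_universities))  # 43
print(share(over_1m_research_income, total_universities))    # 70
```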


£4.4 billion total research grants


BIS business case: £1.5m annual investment in national research data services pays back 2.5 times after 5 years.
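
One way to read the BIS figure above is as a multiple of cumulative spend. The sketch below works that through in Python; treating the 2.5 times payback as applying to the undiscounted five-year total is an assumption made for illustration, and the function name is hypothetical.

```python
def implied_return_gbp_m(annual_investment_gbp_m: float, years: int, payback_multiple: float) -> float:
    """Return the implied total return in £ million.

    Assumption: the payback multiple applies to the simple (undiscounted)
    sum of annual investments over the period.
    """
    cumulative_investment = annual_investment_gbp_m * years
    return cumulative_investment * payback_multiple


# Figures quoted above: £1.5m a year, 5 years, 2.5 times payback.
print(implied_return_gbp_m(1.5, 5, 2.5))  # 18.75
```

Under that reading, £7.5m invested over five years would correspond to about £18.75m in benefits; the business case itself may define payback differently.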


DCC ‘institutional engagement’

Assess needs

Make the case

Develop support and services

…and support policy implementation

DCC support team

• RDM policy development

• Customised Data Management Plans

• DAF & CARDIO assessments

• Guidance and training

• Workflow assessment

• Advocacy with senior management

• Institutional data catalogues

• Pilot RDM tools

DCC guidance


Roles and Responsibilities

What data to keep


Some institutional roles

• Leadership – coordinate action

• Audit – who has what, where does it go?

• Advice on access – data, wherever it is

• Preservation – permanence

• Citability

• Data/publication linking

• Promoting data in teaching

• Selection

• Education – early career researchers

Who (in the UK) is leading RDM work?

Library

IT

Research Office

RESEARCHERS

Understanding Data Requirements

http://www.dcc.ac.uk/


“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”

Incremental Project Report, June 2010

http://www.flickr.com/photos/mattimattila/3003324844/


INSTITUTIONAL SERVICES


Some example services

• Storage – persistent, shareable

• Permanent, citable identifiers

• Database as a service (e.g. Oxford ORDS)

• Embed tools in Excel – DataUp, others

• Workflow management - Taverna


Make data creation easier


Make data citable

• Making data available increases citations

• Everyone – academic, funder, institution – loves citations

• Want evidence?

– Alter, Pienta, Lyle – 240% (social sciences) *

– Piwowar, Vision – 9% (microarray data) †

– Henneken, Accomazzi – 20% (astronomy) #

* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1

# Henneken E, Accomazzi A. (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Make data discoverable

• Data must be discoverable to be reused

• Alone, or in conjunction with publication

• Institutional catalogues, national data registries – Jisc is piloting through DCC

• We are copying the Australian approach


Pimp your data – make it findable & reusable

gking.harvard.edu/data

Cloud for storage – sorted!

• Sorry, but it isn’t.

• See David Rosenthal’s analysis of the economics of Amazon for preservation: “Distributed digital preservation in the cloud”, IJDC 8(1), 2013, doi:10.2218/ijdc.v8i1.248

The cloud has uses – long-term data retention is not one.


Cost of data for 100 years – local vs Amazon S3

Data from blog.dshr.org/2013/01/talk-at-idcc2013.html

© David Rosenthal, used under CC-BY-SA licence

Cost of data for 100 years – local vs Amazon S3 AND Glacier

Data from blog.dshr.org/2013/01/talk-at-idcc2013.html

© David Rosenthal, used under CC-BY-SA licence

What about collaboration?

• Collaborate within the university

• Collaborate with partners

• Collaborate with regional, national services

• Not everything can be done well locally

• Infrastructure needed at research group, institution, national, (discipline) & international level

• Internationally – look to Research Data Alliance


Commercial services


SWEDEN

DENMARK?

CANADA


Closing thoughts

• Library/data centre roles:

– selecting content

– protecting it

– enabling and encouraging reuse

– assisting with data management planning

• Library:

– helping users find the most relevant content

– much research data does not come from research

• Data centre:

– setting standards

– enabling uptake

– providing services

Infrastructure levels

• Truly international – instruments, standards

• National variation, international core:

– Training

– Data management planning

– Policy

– ..


My message to researchers

• The credit belongs to you

• The data belongs to all of us

• Share, and we all reap the benefits
