Library Services
Research Data Management at Imperial College London
17th May 2017 – University of Copenhagen Library
Sarah Stewart - Research Data Support Assistant,
Scholarly Communications Team / @Biostew
ORCID: http://orcid.org/0000-0002-9465-4042
Imperial College London (some context)
• ~15,000 students and ~8,000 staff, including ~3,000 researchers
• International community, with students from 125 countries
• Focus on four main disciplines: sciences, engineering, medicine and business
• Times Higher Education World University Rankings 2016-2017: 3rd in Europe and 8th in the world
• Greatest concentration of high-impact research of any major UK university
The Strong Case for RDM
• Intensive Data-Generating Research Hubs = ‘Big Data’
• UK Med Bio - Bioinformatics Data Science Group – research into causes
and progression of human diseases.
• NHS Trust Research Data (Medicine)
• Research Computing Group and Research Software Engineering
Community
• But also many important ‘small data’ projects across College.
Funder requirements…
“Publicly funded research data are a public good,
produced in the public interest, which should be made
openly available with as few restrictions as
possible…”
RCUK Common Principles on Data Policy
Data Science hub and KPMG Data
Observatory launch (Nov 2015)
"At a research intensive
university like Imperial it is
hard to do anything that
doesn't involve data."
James Stirling, Provost
"Data is at the heart of the
human condition."
Joanna Shields, UK Minister
for Internet Safety and
Security
The importance of RDM…
“In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks — these are just some of the inaccessible places in which scientists have admitted to keeping their old research data.”
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
Process of policy development
•2014: Draft policy: “Statement of Strategic Aims”
•Lack of reliable data (on data storage needs (scale) in particular)
•Concerns about cost of maintaining infrastructure
•Concerns about uncertainties and changing market / policy landscape
•Decision: re-think approach – more cost-effective, based on better data
•Approach: RDM Green Shoots and RDM Investigation
•Funded by Vice-Provost (Research)
•Green Shoots: 6 bottom-up, academic projects (2nd half of 2014)
•RDM investigation (Oct 2014-Jan 2015)
•Online survey (academics; 390 responses)
•~40 interviews (academics)
•Workshops (academics & data managers)
Online survey – where does active data live?
[Bar chart: use of different types of storage in %. Categories: College computer, External/portable storage, Cloud storage, Personal computer, Departmental/group storage, College H drive, ICT central storage]
Online survey – growth of data volume
[Bar chart: research group data storage needs in %, now and in 2 years. Bands: < 10 GB, 10 GB – 100 GB, 100 GB – 1 TB, 1 TB – 10 TB, 10 TB – 100 TB, 100 TB – 1 PB, > 1 PB]
Findings (best practice)
•RDM principles are considered to be sound but not fully practised
•Sharing publicly-funded data accepted in principle but some question value and cost
•Concerns about (metadata) effort to make shared data discoverable
•Metadata schemas are not yet widely available across disciplines
•Auto-generate metadata where possible
•Consensus that RDM training for PhDs is vital (also to prevent data loss when they leave)
Findings (data)
•60-100% of grant required to re-generate data used in publications
•% of data that needs retaining to support publications: ~60%
•Data storage capacity will have to grow significantly
•Concerns around back-up and archiving, esp. considering data volume
•Popularity of cloud services (as opposed to College storage)
Researchers want a self-administered, secure, responsive solution for data sharing, storing and archiving; open APIs preferred
(“Yes [storage] is really important. Basically, whenever we have been out to talk to researchers, that's the thing they have latched on to and want to talk about the most.” 10.1371/journal.pone.0114734)
Conclusions / policy implementation principles
•Provide platform-independent, flexible data storage
•Embed RDM training into PhD progression
•Where available, use existing workflows:
Symplectic Elements: metadata management
Spiral (DSpace): public (metadata) catalogue
•Additional infrastructure:
•use external resources
•no long-term commitment
•as flexible as possible
•cost-effective
Infrastructure summary
•Flexible, can react to market / policy changes
•Components can be exchanged, no additional
in-house infrastructure
•Make a start, collect data, learn – change as required
•Preservation infrastructure needs further work
(discussions with Arkivum about ‘framework’ for costing
into grants) – how much do we need
to retain beyond published data?
•It isn’t perfect, but we can make a start
Result: Imperial College RDM Policy
“Imperial College London is committed to
promoting the highest standards of academic
research, including excellence in research data
management. This includes a robust digital
curation infrastructure that supports open data
access and protects confidential data. The
College acknowledges legal, ethical and
commercial constraints on data sharing and the
need to preserve the academic entitlement to
publication.”
“Principal Investigators have overall
responsibility for the effective management of
research data generated within or obtained for
their research, including by their research groups.
The Library and ICT will provide training,
guidance and services to support PIs.”
http://imperial.ac.uk/research-data-management
Who are we?
Helping the Imperial community to communicate and disseminate their research
and academic work.
Live Data Storage: Box (and Others)
• Box for live data storage (non-sensitive) and data
sharing
• Sensitive data storage via ICT secure storage and
encryption
• Specialist data storage, e.g. Omero in Bioinformatics
Data Science Group for light microscopy images
• Research Computing Repository
• Imperial GitHub for Software and code
Treat software as valuable research output
PyRDM Green Shoots project
Zenodo integrates with GitHub
College survey on distributed version control
Software Sustainability Institute – I am a fellow
Archiving Data ‘without a Repository?’
• Data is archived in Zenodo
or in UK Data Service
(sensitive data) post-
project
• Software and code
archived in Zenodo via
GitHub
• Metadata from Data and
Software are deposited
into Spiral via Symplectic
• Indexed by DataCite and
CrossRef
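To make the metadata step in this workflow more concrete, here is a minimal sketch of a completeness check a depositor could run before sending a record on for indexing. It assumes a simplified dictionary representation of a record; the field names follow the mandatory properties of the DataCite Metadata Schema 4.x, but the record layout, the helper name `missing_mandatory` and the sample values (including the placeholder DOI) are hypothetical and are not taken from Spiral, Symplectic Elements or Zenodo.

```python
# Mandatory DataCite 4.x properties, written here as simplified dict keys.
# The dict layout is an assumption for illustration, not DataCite's own
# XML or JSON serialisation.
MANDATORY_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in the record."""
    return [field for field in MANDATORY_FIELDS if not record.get(field)]


# Hypothetical record: every value below is a placeholder.
example_record = {
    "identifier": "10.5281/zenodo.0000000",  # placeholder DOI
    "creators": ["Example, Researcher"],
    "titles": ["Example dataset supporting a publication"],
    "publisher": "",  # left empty on purpose
    "publicationYear": 2017,
    "resourceType": "Dataset",
}

print(missing_mandatory(example_record))
```

A check like this only tests for presence; it does not validate the values themselves against the schema.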
ORCID – Open Researcher and Contributor ID
•Emerging global standard for identifying authors of academic outputs
•The College created ORCID iDs for academic staff in late 2014
(now 2,088 of 3,200 iDs claimed, ~1,500 linked in Elements)
•Imperial hosted launch of Jisc ORCID consortium with
50 UK universities in September 2015
http://www.imperial.ac.uk/orcid
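An ORCID iD carries a check character as its final position, computed with the ISO 7064 MOD 11-2 algorithm, so a system that stores iDs can catch most typing errors before linking a record. The sketch below is an illustrative implementation of that check, not code used by the College or by Symplectic Elements; the function names are hypothetical.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the format and check character of an iD such as 0000-0002-9465-4042."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check


# The iD shown on the title slide passes the check.
print(is_valid_orcid("0000-0002-9465-4042"))
```

This confirms only that an iD is well formed; it does not confirm that the iD is registered or belongs to a given person.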
Case for a national infrastructure?
Currently, ~100 UK institutions spend effort to define and implement
an RDM infrastructure (storage, workflows, interfaces, metadata, compliance, monitoring,
business model etc.). Some aspects
have to be local, but…
…imagine a national research data infrastructure (say for data
publishing and preservation), run by RCUK:
•Economies of scale
•No issues with funding
•Just one system to interface with
•Increased visibility/discoverability
•Solution would by default be compliant
•No commercial “ownership” of public data
Outreach – Love Your Data!
• PhD Training on RDM Basics and DMPOnline (including
PhD-specific DMPOnline template)
• RDM ‘Drop-in Clinics’
• RDM ‘Byte-Size’ sessions – informal sessions on various
topics
• Imperial Data Circus
• Open Access Road Show
Liaisons
[Diagram: RDM outreach network. Labels as shown on the slide: BDAU, FoE, FoLS, Business School, DoM, DoSC, Ped, Materials, Bioinf, Grad School, ESA, RDM Clinics, RDM Talks, 1:1s, RMs, Chem, Aero, HPC, RSE, CDT, CM, Hub, FoM, PhD, Webi, DMP, DOIs, DoM, Event, New Starter, Research, Doughnut, RDM Outreach, OA Team, OA/RDM, OA Week, Data Circus, Civil, Bio, eng, comp]
Imperial Data Circus
• Originally for Open
Access Week 2016
• Informal showcase
for research
conducted at
Imperial with Open
Data and Open
Software
• Provides a forum to
discuss open
research across
disciplines
Engaging Directly with Researchers
• Embedded approach – meet with researchers in situ – in
their labs and offices
• One-on-one or group meetings
• Departmental meetings to inform on policy changes and
updates and provide insight into best practice.
Stats are exciting!
[Chart: Time Series of RDM Enquiries (Totals); y-axis: number of enquiries]
[Chart: Dataset Deposits in Spiral, 2015-2016; series: number of deposits, cumulative deposits]
[Chart: Software Deposits in Spiral, 2015-2016]
Data Catalogue
[Chart: Nature of RDM Enquiries. Categories: Box, Data Access Statement, Data Management Plan, Data Sharing/Publication, Data Archiving, DOIs/Metadata, Data Policy, Software, Zenodo, Outreach, Data Licenses]
rdm-enquiries: what are they asking?
On the Horizon…
• On-line MOOC, pre-recorded webinar and video
presentations for researchers and students
• Medicine-specific DMPOnline template
• Jisc Shared Services Pilot – UK-wide network of data
management services (in planning)
Questions?
For more information:
www.imperial.ac.uk/rdm
rdm-enquiries@imperial.ac.uk
Sarah Stewart – sarah.stewart@imperial.ac.uk
@Biostew
Ash Barnes – a.barnes@imperial.ac.uk
@ashbarnes71