IASSIST40 W1: Data management & curation workshop
Toronto, 3 June, 2014
Robin Rice
EDINA and Data Library
University of Edinburgh
Edinburgh DataShare
• An institutional OA data repository built on DSpace
• Multidisciplinary, multiple data types
• Supports University Research Data Mgmt Policy
• For UoE researchers & their collaborators only
• For research projects without a domain repository
• Has been a full service since 2010, but recent University funding for development is allowing enhancements
• Promoted as part of RDM programme, one of many ‘tools for the job’
Scope
• No limits in terms of subject matter or data types
• From mathematical proofs to audio files of Shilluk tribal narratives
• Meeting new users across university
• 21 ‘schools’ make up our top level ‘communities’ – several still empty
• Open data a tough sell for (some) academics
• Interviews with Creative Arts researchers, not comfortable with term ‘data’ in their context
• Social Sciences covered well by UK Data Archive
Policies
• No mandate for deposit; can only promote benefits
• No deposit of sensitive data, please
• No user registration or enforcement of T&Cs; open data or embargo or ‘request copy’ from depositor
• Self-deposit model: so KISS, in terms of workflow
  o Guidance, such as checklist for deposit, user guide with screenshots
  o Meetings to discuss data welcome; assisted deposit where warranted
• Basic quality assurance checks by staff (documentation exists, file formats, file integrity)
• Open Data Commons Attribution licence by default; open metadata
• Preservation policy; depositor agreement; service level definition
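As an illustration of the file integrity part of the staff QA checks, here is a minimal sketch that compares deposited files against checksums supplied by the depositor. The slides do not say how DataShare staff actually perform this check; the `manifest` structure (file name to SHA-256 hex digest) and both function names are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_deposit(manifest: dict[str, str], folder: Path) -> list[str]:
    """Return names of files that are missing or whose checksum differs.

    `manifest` is a hypothetical depositor-supplied mapping of
    file name -> expected SHA-256 hex digest.
    """
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            problems.append(name)
    return problems
```

An empty list from `verify_deposit` means every listed file is present and unchanged; anything else is a prompt to go back to the depositor.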
Discoverability
• Keep DSpace up to date; search engine hits good
• Focus on discoverability metadata (DCMI); ‘type’
• System-generated ‘suggested citation’ based on required fields on landing page
• Handle on landing page. DOIs coming soon
• Harvested by Data Citation Index
• Metadata links to data sources, articles, other versions
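The idea of a system-generated ‘suggested citation’ can be sketched as below. This is not DataShare's code: the field names, the citation layout and the placeholder handle in the usage note are all assumptions chosen for illustration; the only point taken from the slide is that the citation is assembled from required metadata fields.

```python
# Hypothetical set of required fields; the real repository's list may differ.
REQUIRED_FIELDS = ("creators", "year", "title", "type", "publisher", "identifier")


def suggested_citation(record: dict) -> str:
    """Build a citation string from required metadata fields.

    Assumed layout: Creators. (Year). Title, [type]. Publisher. Identifier
    """
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # Avoid a doubled full stop when the last creator ends in an initial.
    creators = "; ".join(record["creators"]).rstrip(".")
    return (
        f"{creators}. ({record['year']}). {record['title']}, "
        f"[{record['type']}]. {record['publisher']}. {record['identifier']}"
    )
```

Called with two creators, a year, a title, type `dataset`, a publisher and a placeholder identifier such as `http://hdl.handle.net/EXAMPLE/123`, this returns a single line that can be shown on the landing page. Because every input is a required field, the citation can always be generated at deposit time.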
Platform
• DSpace originally, in 2013 RDM Steering Group suggested pilot depositors to test fitness for purpose
• Advantages of continuity but we have also looked at: Fedora, Dataverse, CKAN, Invenio
• Still mainly upload/download model, no visualisation, modeling, online analysis
• SWORD and batch ingest recently enabled for large and/or voluminous datasets (upload)
• Becoming part of a local system including CRIS, Data Asset Register, Vault, Active Data Store
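For batch ingest, one common DSpace input is the Simple Archive Format: one directory per item holding a `dublin_core.xml` file, the data files, and a `contents` file listing them. Whether DataShare's batch ingest uses this format is an assumption, and the helper name and its arguments are hypothetical; the sketch only shows how such a package could be prepared before upload.

```python
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(item_dir: Path,
                   metadata: list[tuple[str, str, str]],
                   files: dict[str, bytes]) -> None:
    """Write one item directory in DSpace Simple Archive Format.

    `metadata` holds (element, qualifier, value) triples; use the
    qualifier "none" for unqualified Dublin Core elements.
    `files` maps file name -> file content.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue",
                                element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # The data files themselves.
    for name, payload in files.items():
        (item_dir / name).write_bytes(payload)

    # contents: one file name per line.
    (item_dir / "contents").write_text("\n".join(files) + "\n",
                                       encoding="utf-8")
```

A batch is then a parent directory containing one such item directory per dataset, which keeps large or numerous deposits out of the web upload form.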
Metadata standards
• Based on DCMI and a few necessary system-related fields
• Needs to be lightweight & multidisciplinary
• Conforms to DataCite minimum fields (DOIs soon)
• Discovery metadata only; documentation files required to allow re-use (part of manual QA check)
• File formats are supported, known, or unknown, with guidance given to depositors.
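Conformance to the DataCite minimum fields can be checked mechanically before a DOI is requested. The sketch below uses the five properties that were mandatory in DataCite Metadata Schema 3.x (Identifier, Creator, Title, Publisher, PublicationYear); the flat `record` dictionary and the function names are assumptions, and the mapping from the repository's DCMI fields to these properties is not shown.

```python
# Mandatory properties in DataCite Metadata Schema 3.x.
DATACITE_MANDATORY = ("identifier", "creator", "title",
                      "publisher", "publicationYear")


def is_blank(value) -> bool:
    """True for None, empty collections, and empty or whitespace-only strings."""
    if isinstance(value, str):
        return not value.strip()
    return not value


def missing_datacite_fields(record: dict) -> list[str]:
    """Return the mandatory DataCite properties absent from `record`."""
    return [name for name in DATACITE_MANDATORY
            if is_blank(record.get(name))]
```

An empty result means the record carries the minimum needed for DOI registration; documentation files still have to be checked by hand, as the slide notes.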
Thank you
Binary-by-Xerones-CC-BY-NC