Because good research needs good data
The DCC lifecycle model, Exeter Uni, 18-19 May 2011
The Digital Curation Lifecycle Model
Joy Davidson and Sarah Jones
Digital Curation Centre, Glasgow
DCC curation lifecycle model
Activities / phases to cover are:
• storing and preserving data
• accessing data
• licensing data
• citing data
Key questions:
• which repository is right for your data?
• what are the policies of the repository?
• how long will your data be kept?
• who will have access to your data?
• how can you make the most impact with your data?
Ingest: where should your data go?
Guidance:
Think about where you will deposit your data from the outset: does the repository have policies that may affect your data?
Decide early how you will license your data. This is especially important to clarify in collaborative research.
Think about impact from the outset, and make it as easy as possible for others to cite your data.
Where will you store your data?
Exeter University has three repositories: http://as.exeter.ac.uk/library/resources/openaccess/repositories
• Exeter Research and Institutional Content archive (ERIC) – theses and publications
• Digital Collections Online (DCO) – images and multimedia
• Exeter Data Archive (EDA) – research data
DataCite list of data repositories http://datacite.org/repolist
Exeter Data Archive (EDA)
EDA enables searches by:
• subject
• authors
• collections
As well as being a place to store your data, it is a good way to find potential collaborators and gaps in the available data.
Sustainability
Prior to deposit, check the data repository’s sustainability claims – both for your data and for the repository itself.
EDA Example
EDA regularly backs up its files according to current best practice.
In the event of the Exeter Data Archive being closed down, its database will be transferred to another appropriate archive.
Policies
Formats
Prior to deposit, check to make sure that the data repository accepts the format(s) you will be working with.
Check whether the repository applies any normalisation procedures (at ingest or for preservation).
EDA Example
The Exeter Data Archive (EDA) collects, preserves and makes available the University's research data. The content policy states that EDA will accept all types of data.
Policies
What information do you need to provide?
Most repositories have a clearly defined set of minimum information requirements.
EDA Example
• Title
• Data creator
• Department
• Date of publication
• Dataset description
Policies
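The minimum EDA fields listed above can be captured as a small structured record, so that blank fields are caught before deposit. This is a hypothetical sketch, not an EDA tool or API: the class `DepositRecord`, the function `missing_fields` and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class DepositRecord:
    """Hypothetical record holding the minimum fields from the EDA example."""
    title: str
    data_creator: str
    department: str
    date_of_publication: str  # assumed to be an ISO 8601 date string, e.g. "2011-05-18"
    dataset_description: str


def missing_fields(record: DepositRecord) -> list:
    """Return the names of required fields that are empty or whitespace only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Illustrative values only; the description has been left blank on purpose.
record = DepositRecord(
    title="Stream temperature readings, site A",
    data_creator="A. Researcher",
    department="Geography",
    date_of_publication="2011-05-18",
    dataset_description="",
)
print(missing_fields(record))  # ['dataset_description']
```

Treating every field as required mirrors the "minimum information" idea; a real repository form may validate more (date formats, controlled department names).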
• details of how the data have been encoded (database structures, file formats);
• a list of software known to work with the data and their supporting information;
• indications of how the data relate to other data assets;
• administrative information (identifiers, checksums);
General guidance
• explanations of what the data represent (e.g. for sensor data, what the sensor was measuring and in what units);
• the processing history of the data (how they were generated and subsequently transformed, when and by whom);
• a narrative describing the context (why the data were generated/collected, what methodology was used and why).
This information is particularly important for users as they interpret the data and determine whether and how they can be integrated with other data.
General guidance
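The administrative information listed above includes checksums, which let you and the repository confirm that a file has not changed since deposit. A minimal sketch, assuming SHA-256 as the algorithm and a plain-text manifest; both are choices of this example rather than EDA or DCC requirements, and `file_checksum` and `manifest_line` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks so large files need little memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_line(path, algorithm="sha256"):
    """Format one manifest entry as 'digest  filename' (the two-space layout of GNU sha256sum)."""
    return f"{file_checksum(path, algorithm)}  {Path(path).name}"


# Illustrative run on a throwaway file; 'readings.csv' is a made-up name.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "readings.csv"
    sample.write_bytes(b"abc")
    line = manifest_line(sample)

print(line)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad  readings.csv
```

Recomputing the digest after transfer and comparing it with the stored manifest entry is a simple fixity check.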
Access
Prior to deposit, check to make sure that the data repository’s policy on access meets your needs.
EDA Example
• Anyone may access full items free of charge.
• Copies of full items generally can be reproduced, displayed or performed, given to third parties, and stored in a database in any format or medium, for personal research or study, educational, or not-for-profit purposes, without prior permission or charge.
Policies
Restrictions on Access
Are there any restrictions on access to your data that the repository should be made aware of?
EDA Example
• Items can be deposited at any time, but will not be made publicly visible until any publishers' or funders' embargo period has expired.
• This repository is not the publisher; it is merely the online archive.
Policies
Licensing
Prior to deposit, think about how you will license your data. Make sure that your data license respects limits associated with any external data you are using in your work.
EDA Example
• Full items must not be sold commercially in any format or medium without formal permission of the copyright holders.
• Any copyright violations are entirely the responsibility of the authors/depositors.
• Some full items are individually tagged with different rights permissions and conditions.
Policies
Two key issues to consider:
• Licensing – legal instrument stating what people can and can’t do with your data
• Waivers – legal instrument allowing author to give up rights
General guidance for data licensing
Taken from DCC How-to guide on licensing data www.dcc.ac.uk/resources/how-guides
• Attribution condition - allows others to copy, distribute, display, and perform the work as long as the creator is given due credit.
• Non-commercial – users cannot use the work for commercial purposes
• Share-alike – all derivative works must be released under the same licence as the original work
Attribution Non-Commercial Share Alike (CC-BY-NC-SA): http://creativecommons.org/
Creative Commons
The most permissive way of releasing data is under a dedication to the public domain. This is where all copyright interests and database rights are waived, allowing the data to be used as freely as possible.
Creative Commons Zero (CC0) is the Creative Commons tool for dedicating works to the public domain. It works on two levels: as a waiver of a person’s rights to the work, and, in case that is not effective, as an irrevocable, royalty-free and unconditional licence for anyone to use the work for any purpose. http://creativecommons.org/publicdomain/zero/1.0/
CC0
Citation
Make sure that your data is citable to increase your potential impact.
EDA Example
Once your work has been approved for entry into the EDA, you will receive a notification via email. This email will contain a permanent link to your work; cite this link in preference to the URL of the item, as it continues to provide access even if the URL changes.
Policies
If you have generated/collected data to be used as evidence in an academic publication, you should deposit them with a suitable data archive or repository as soon as you are able.
If the repository does not provide you with a persistent identifier or URL for your data, encourage it to do so.
General guidance for data citation
Taken from DCC How-to guide on data citation www.dcc.ac.uk/resources/how-guides
When citing a dataset in a paper, use the citation style required by the editor/publisher. If no form is suggested for datasets, take a standard data citation style and adapt it to match the style for textual publications.
Give dataset identifiers in the form of a URL wherever possible, unless otherwise directed.
Include data citations alongside those for textual publications. Some reference management packages now include support for datasets, which should make this easier.
General guidance for data citation
Cite datasets at the finest-grained level available that meets your need. If that is not fine enough, provide details of the subset of data you are using at the point in the text where you make the citation.
If a dataset exists in several versions, be sure to cite the exact version you used.
When you publish a paper that cites a dataset, notify the repository that holds the dataset, so it can add a link from that dataset to your paper.
General guidance for data citation
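The citation guidance above asks for an identifier given as a URL, the exact version used and, where needed, a note on the subset used. One way to assemble those pieces is sketched below, loosely following the element order DataCite recommends (Creator (PublicationYear): Title. Version. Publisher. Identifier). The class `DatasetCitation`, its fields and all sample values, including the DOI, are illustrative assumptions; the publisher's required style always takes precedence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the elements of a dataset citation."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier_url: str            # persistent identifier given as a URL
    version: Optional[str] = None  # cite the exact version used, if versioned
    subset: Optional[str] = None   # optional note on the part of the data used

    def format(self) -> str:
        """Build the citation string: Creators (Year): Title. Version. Publisher. URL"""
        parts = [f"{'; '.join(self.creators)} ({self.year}): {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        parts.append(self.identifier_url)
        text = " ".join(parts)
        if self.subset:
            text += f" [Subset used: {self.subset}]"
        return text


# Placeholder values only; the DOI does not identify a real dataset.
citation = DatasetCitation(
    creators=["Researcher, A.", "Analyst, B."],
    year=2011,
    title="Example river-flow measurements",
    publisher="Exeter Data Archive",
    identifier_url="https://doi.org/10.5072/example",
    version="2",
)
print(citation.format())
# Researcher, A.; Analyst, B. (2011): Example river-flow measurements. Version 2. Exeter Data Archive. https://doi.org/10.5072/example
```

Keeping `version` and `subset` as explicit fields makes it harder to forget the advice to cite the exact version and to say which part of the data was used.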
Using Exeter Data Archive (EDA)
EDA is currently being piloted. If you would like to place any of your data in the repository please email [email protected] and a member of the team will be in contact.
But remember!
The validity and authenticity of the content of submissions is the sole responsibility of the depositor.
This is true for any data repository.
Managing data: http://www.dcc.ac.uk/resources/external/tools-services/managing-active-research-data
Sharing and tracking reuse: http://www.dcc.ac.uk/resources/external/tools-services/sharing-output-and-tracking-impact
Useful resources - DCC Tools catalogue
Other ideas
Consider sharing negative results.
• Journal of Negative Results in BioMedicine www.jnrbm.com/
• Journal of Negative Results www.jnr-eeb.org/
• The All Results Journals www.arjournals.com/
Consider publishing about your research data management (RDM) activity in curation journals: http://www.dcc.ac.uk/resources/curation-journals
Any questions?
For DCC guidance, tools and case studies see: www.dcc.ac.uk/resources
Follow us on twitter @digitalcuration and #ukdcc