Because good research needs good data The DCC lifecycle model, Exeter Uni, 18-19 May 2011 Funded by: The Digital Curation Lifecycle Model Joy Davidson and Sarah Jones Digital Curation Centre, Glasgow [email protected] [email protected]

Because good research needs good data

The DCC lifecycle model, Exeter Uni, 18-19 May 2011

Funded by:

The Digital Curation Lifecycle Model

Joy Davidson and Sarah Jones

Digital Curation Centre, Glasgow

[email protected]

[email protected]

DCC curation lifecycle model

Activities / phases to cover are:

• storing and preserving data

• accessing data

• licensing data

• citing data

Key questions:

• which repository is right for your data?

• what are the policies of the repository?

• how long will your data be kept?

• who will have access to your data?

• how can you make the most impact with your data?

Ingest: where should your data go?

Guidance:

Think about where you will deposit your data from the outset (what policies does the repository have that may affect your data?)

Decide early on how you will license your data. This is especially important in collaborative research endeavours.

Think about your impact from the outset. Make potential citations as easy as possible.

Where will you store your data?

Exeter University has three repositories: http://as.exeter.ac.uk/library/resources/openaccess/repositories

• Exeter Research and Institutional Content archive (ERIC) – theses and publications

• Digital Collections Online (DCO) – images and multimedia

• Exeter Data Archive (EDA) – research data

DataCite list of data repositories http://datacite.org/repolist

Exeter Data Archive (EDA)

EDA enables searches by:

• subject

• authors

• collections

It can be a place to store your data but also a good way to find potential collaborators and gaps.

Sustainability

Prior to deposit, check the data repository’s sustainability claims – both for your data and for the repository itself.

EDA Example

EDA regularly backs up its files according to current best practice.

In the event of Exeter Data Repository being closed down, the database will be transferred to another appropriate archive.

Policies

Formats

Prior to deposit, check to make sure that the data repository accepts the format(s) you will be working with.

Check to see if there are normalisation procedures (ingest, preservation).

EDA Example

The Exeter Data Archive (EDA) collects, preserves and makes available the University's research data. The content policy states that EDA will accept all types of data.

Policies

What information do you need to provide?

Most repositories have a clearly defined set of minimum information requirements.

EDA Example

• Title

• Data creator

• Department

• Date of publication

• Dataset description
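A minimal sketch of how a depositor might check a metadata record against the five EDA fields listed above before submitting. The field names, the dictionary layout and the `missing_fields` helper are assumptions made for illustration; they are not part of any EDA interface.

```python
# Hypothetical field names mirroring the five EDA requirements on the slide.
REQUIRED_FIELDS = (
    "title",
    "data_creator",
    "department",
    "date_of_publication",
    "dataset_description",
)


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Placeholder record: the description has been left out.
draft = {
    "title": "Example sensor readings",
    "data_creator": "A. Researcher",
    "department": "Example Department",
    "date_of_publication": "2011-05-18",
}
print(missing_fields(draft))
```

Running the check before deposit gives a quick list of what still needs to be written, here the dataset description.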

Policies

• details of how the data have been encoded (database structures, file formats);

• a list of software known to work with the data and their supporting information;

• indications of how the data relate to other data assets;

• administrative information (identifiers, checksums);

General guidance

• explanations of what the data represent (e.g. for sensor data, what the sensor was measuring and in what units);

• the processing history of the data (how they were generated and subsequently transformed, when and by whom);

• a narrative describing the context (why the data were generated/collected, what methodology was used and why).

This information is particularly important for users as they interpret the data, and determine whether and how they can be integrated with other data.
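The guidance lists checksums among the administrative information to supply. As a rough illustration of what that involves, the sketch below computes a SHA-256 checksum for a file using Python's standard `hashlib` module. The choice of SHA-256, the helper name `sha256_of_file` and the throwaway demonstration file are assumptions; a repository may ask for a different algorithm.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file; a real deposit would use the data file itself.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"temperature_c,time\n21.5,2011-05-18T09:00\n")
    demo_path = tmp.name

checksum_at_deposit = sha256_of_file(demo_path)
# Recomputing later and comparing detects accidental changes to the file.
assert sha256_of_file(demo_path) == checksum_at_deposit
os.remove(demo_path)
```

Recording the checksum alongside the identifier lets anyone who later downloads the file confirm that it matches what was deposited.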

General guidance

Access

Prior to deposit, check to make sure that the data repository’s policy on access meets your needs.

EDA Example

• Anyone may access full items free of charge.

• Copies of full items generally can be:

(a) reproduced, displayed or performed, given to third parties, and stored in a database in any format or medium

(b) for personal research or study, educational, or not-for-profit purposes without prior permission or charge

Policies

Restrictions on Access

Are there any restrictions on access to your data that the repository should be made aware of?

EDA Example

• Items can be deposited at any time, but will not be made publicly visible until any publishers' or funders' embargo period has expired.

• This repository is not the publisher; it is merely the online archive.
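The embargo rule in the EDA example amounts to a simple date comparison, sketched below. The function name, the use of `None` for "no embargo" and the reading of the end date as the last day of the embargo are all assumptions for illustration, not EDA behaviour.

```python
from datetime import date


def is_publicly_visible(embargo_end, today):
    """Return True once any embargo on a deposited item has expired.

    embargo_end is the last day of the embargo (assumed convention),
    or None when the item has no embargo at all.
    """
    return embargo_end is None or today > embargo_end


# An item deposited during the embargo stays hidden until the day after it ends.
print(is_publicly_visible(date(2011, 12, 31), date(2011, 5, 18)))
print(is_publicly_visible(date(2011, 12, 31), date(2012, 1, 1)))
```

The point for depositors is that deposit and release are separate events: the data can go in early, with the embargo date supplied to the repository at the same time.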

Policies

Licensing

Prior to deposit, think about how you will license your data. Make sure that your data license respects limits associated with any external data you are using in your work.

EDA Example

• Full items must not be sold commercially in any format or medium without formal permission of the copyright holders.

• Any copyright violations are entirely the responsibility of the authors/depositors.

• Some full items are individually tagged with different rights permissions and conditions.

Policies

Two key issues to consider:

• Licensing – legal instrument stating what people can and can’t do with your data

• Waivers – legal instrument allowing author to give up rights

General guidance for data licensing

Taken from DCC How-to guide on licensing data www.dcc.ac.uk/resources/how-guides

• Attribution condition - allows others to copy, distribute, display, and perform the work as long as the creator is given due credit.

• Non-commercial – users cannot use the work for commercial purposes

• Share-alike – all derivative works must be released under the same licence as the original work

Attribution Non-Commercial Share Alike (CC-BY-NC-SA) http://creativecommons.org/
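To show how the three conditions above combine into a licence short code such as CC-BY-NC-SA, here is a small sketch. The helper `cc_licence_code` is hypothetical, it covers only the conditions named on this slide, and it assumes the attribution condition is always present.

```python
def cc_licence_code(non_commercial=False, share_alike=False):
    """Build a Creative Commons short code from the chosen conditions.

    Attribution (BY) is assumed to apply in every case; only the
    conditions named on the slide are handled.
    """
    parts = ["CC", "BY"]
    if non_commercial:
        parts.append("NC")
    if share_alike:
        parts.append("SA")
    return "-".join(parts)


print(cc_licence_code(non_commercial=True, share_alike=True))  # CC-BY-NC-SA
print(cc_licence_code())  # CC-BY
```

Each extra condition narrows what reusers may do, so the shortest code that meets your obligations is generally the easiest for others to work with.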

Creative Commons

The most permissive way of releasing data is under a dedication to the public domain. This is where all copyright interests and database rights are waived, allowing the data to be used as freely as possible.

Creative Commons Zero (CC0) is the Creative Commons tool for dedicating works to the public domain. It works on two levels: as a waiver of a person’s rights to the work, and in case that is not effective, as an irrevocable, royalty-free and unconditional licence for anyone to use the work for any purpose. http://creativecommons.org/publicdomain/zero/1.0/

CC0

Citation

Make sure that your data is citable to increase your potential impact.

EDA Example

Once your work has been approved for entry into the EDA, you will receive a notification via email. This email will contain a permanent link to your work - you should cite this link in preference to the URL of the item as it provides continuing persistent access in case the URL should ever change.

Policies

If you have generated/collected data to be used as evidence in an academic publication, you should deposit them with a suitable data archive or repository as soon as you are able.

If the archive or repository does not provide you with a persistent identifier or URL for your data, encourage it to do so.

General guidance for data citation

Taken from DCC How-to guide on data citation www.dcc.ac.uk/resources/how-guides

When citing a dataset in a paper, use the citation style required by the editor/publisher. If no form is suggested for datasets, take a standard data citation style and adapt it to match the style for textual publications.

Give dataset identifiers in the form of a URL wherever possible, unless otherwise directed.

Include data citations alongside those for textual publications. Some reference management packages now include support for datasets, which should make this easier.

General guidance for data citation

Cite datasets at the finest-grained level available that meets your need. If that is not fine enough, provide details of the subset of data you are using at the point in the text where you make the citation.

If a dataset exists in several versions, be sure to cite the exact version you used.

When you publish a paper that cites a dataset, notify the repository that holds the dataset, so it can add a link from that dataset to your paper.
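As an illustration of the citation guidance (identifier given as a URL, exact version stated), the sketch below assembles a dataset citation string. The element order is one plausible style chosen for the example, not a format prescribed by the DCC guide, and the function name and all values are placeholders.

```python
def format_data_citation(creator, year, title, publisher, identifier, version=None):
    """Assemble a dataset citation; the element order is an assumed style."""
    parts = [f"{creator} ({year}): {title}."]
    if version:
        # Cite the exact version used when a dataset exists in several versions.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # Give the identifier as a URL so readers can follow it directly.
    parts.append(identifier)
    return " ".join(parts)


print(
    format_data_citation(
        creator="Researcher, A.",
        year=2011,
        title="Example sensor readings",
        publisher="Example Data Archive",
        identifier="http://example.org/dataset/1234",
        version="2",
    )
)
```

When a journal specifies its own style, the same elements can be rearranged to match it; the important part is that the version and a resolvable identifier are both present.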

General guidance for data citation

Using Exeter Data Archive (EDA)

EDA is currently being piloted. If you would like to place any of your data in the repository please email [email protected] and a member of the team will be in contact.

But remember!

The validity and authenticity of the content of submissions is the sole responsibility of the depositor.

This is true for any data repository.

Managing data: http://www.dcc.ac.uk/resources/external/tools-services/managing-active-research-data

Sharing and tracking reuse: http://www.dcc.ac.uk/resources/external/tools-services/sharing-output-and-tracking-impact

Useful resources - DCC Tools catalogue

Other ideas

Consider sharing negative results.

• Journal of Negative Results in BioMedicine www.jnrbm.com/

• Journal of Negative Results www.jnr-eeb.org/ 

• The All Results Journals www.arjournals.com/

Consider publishing on your RDM activity in curation journals http://www.dcc.ac.uk/resources/curation-journals

Any questions?

For DCC guidance, tools and case studies see: www.dcc.ac.uk/resources

Follow us on Twitter @digitalcuration and #ukdcc