The research data lifecycle 3/24/22 Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose Research Data Management Workshop 1.4

RDMRose 1.4 The research data lifecycle

Apr 15, 2023

The research data lifecycle

Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose

Research Data Management Workshop 1.4

Session 1.4 overview

• DCC Curation Lifecycle Model
  – Background
  – Target audience
  – The 8 lifecycle actions

• Alternative lifecycle models

DCC Curation Lifecycle Model

• The DCC Curation Lifecycle Model is an authoritative generic model outlining what the umbrella term RDM consists of

• It outlines the activities that are required to successfully curate research data throughout its entire lifecycle

• The model describes an idealised situation: curation is planned from the very beginning, and planned for throughout the lifecycle

• According to the DCC (n.d. c) this model is relevant to:
  – Data creators
  – Data archivists/curators
  – Data (re)users

Background

• The DCC Curation Lifecycle Model is based on the OAIS Reference Model

• OAIS = Open Archival Information System (pictured)

• OAIS is a model that defines a generic framework for building a digital archive (Lavoie, 2004; Ball, 2006)

Background

• The DCC Curation Lifecycle Model adds to the OAIS Reference Model

• It includes activities that take place outside the archival system: the research lifecycle

• In particular:
  – the creation of data
  – the re-use of data by other research projects

DCC Curation Lifecycle Model

Actions

• Three sets of actions:
  – Sequential Actions (8): key actions needed as data move through their lifecycle
  – Occasional Actions (3): occur only when special conditions are met, so they do not apply to all data
  – Full Lifecycle Actions (4): apply to all stages in the lifecycle

Actions

• Sequential actions (8):
  – Conceptualise
  – Create or receive
  – Appraise and select
  – Ingest
  – Preservation action
  – Store
  – Access, use and reuse
  – Transform

Action 1 Conceptualise

• Aim:
  – Designing research projects (and grant proposals) with digital curation in mind, so that you produce curation-ready data

• Rusbridge (2008): “Repeat after me: curation begins before creation!”

Key activities

• DCC’s key activities include planning for:
  – Data capture and storage in curation-friendly file formats (open standards)
  – Recording sufficient information at the time of data capture to assist with ongoing management of those data and with their use
  – Data storage on appropriate media
  – Identification of a safe place for the data and ensuring that an archive will take them

Action 2 Create or receive

• Aim:
  – Creating or receiving digital data that is curation-ready

• DCC’s key activities:
  – Researcher: Create data that is curation-ready, including metadata.
  – LIS professional: Receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres, and if required assign appropriate metadata.

Data quality

• Authentic: be what it purports to be.

• Reliable: have trusted contents which accurately reflect the business transaction documented.

• Have integrity: be complete and unaltered.

• Usable: can be located, retrieved, presented and interpreted.

(ISO 15489-1.)

Metadata

• Descriptive: ensures identification, location and retrieval.

• Technical: records the technical infrastructure used to create or access the data.

• Administrative: for management of data such as acquisition, appraisal decisions, and IPR.

• Use: manages access rights and tracks usage.

• Preservation: records preservation actions, such as checksums, and migrations

(Based on Higgins, 2012, p. 38.)

• Representation information: metadata that are necessary to make the dataset intelligible to the designated community
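
As an illustration only, the metadata categories above could be grouped in a single record so that gaps are easy to spot. The class name `DatasetMetadata`, its field names and the example values are hypothetical; they are not part of the DCC model or of any metadata standard.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical record with one slot per metadata category on this slide."""
    descriptive: dict = field(default_factory=dict)     # identification, location, retrieval
    technical: dict = field(default_factory=dict)       # infrastructure used to create/access
    administrative: dict = field(default_factory=dict)  # acquisition, appraisal, IPR
    use: dict = field(default_factory=dict)             # access rights, usage tracking
    preservation: dict = field(default_factory=dict)    # checksums, migrations
    representation: dict = field(default_factory=dict)  # what makes the data intelligible

    def missing_categories(self):
        """Return the names of categories that are still empty, in field order."""
        return [name for name, value in asdict(self).items() if not value]


# Example values are invented for the sketch.
record = DatasetMetadata(
    descriptive={"title": "Example survey responses", "creator": "A. Researcher"},
    technical={"file_format": "text/csv"},
    preservation={"checksum_algorithm": "sha256"},
)
print(record.missing_categories())  # ['administrative', 'use', 'representation']
```

A check like `missing_categories()` is one simple way to prompt a data creator for the categories not yet filled in before data are handed over.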

Action 3 Appraise and select

• “the process of evaluating material in order to decide which to retain over the long term, which to retain for the meantime, and which to discard.” (Higgins, 2012, p. 28)

• DCC’s key activities:
  – Evaluate data and select for long-term curation and preservation
  – Adhere to documented guidance, policies or legal requirements

Why appraise and select?

• Digital content expands (data deluge).

• Backup and mirroring increase costs.

• Discovery gets harder.

• Managing and preserving is expensive.

(Based on Whyte and Wilson, 2010.)

Significance

• Appraisal = “determination of significance” (Harvey, 2010, p. 132)

• What data do you need/want to keep?
  – Which datasets or digital resources do you want to keep?
  – Which characteristics or elements of these datasets or resources do you want to keep? (Look and feel, structure of dataset, functionality such as hyperlinks or embedded comments, interoperability with other datasets.)

• How long do you need/want to keep the data?
  – E.g. in terms of user requirements (as evidence for verifying conclusions) or risks of not keeping the data.

Criteria

• General appraisal criteria (Whyte & Wilson, 2010)
  – Relevance to mission of the repository
  – Relevance to research: scientific or historical value (inferring anticipated future use)
  – Uniqueness (the only or most complete source? At risk of loss if not accepted?)
  – Non-replicability (not feasible or impossible)
  – Potential for redistribution (depending on reliability, integrity and usability of the data; legal issues may limit this)
  – Full documentation (to facilitate discovery, access, reuse etc.)
  – Economic case (costs of long-term maintenance vs potential future benefits, available funding)
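
The criteria above can be recorded as a simple checklist. The sketch below is only one hypothetical way of doing so: the identifiers in `CRITERIA`, the function `unmet_criteria` and the yes/no answers are invented for illustration, and Whyte and Wilson (2010) do not prescribe any particular scoring scheme.

```python
# Hypothetical identifiers for the seven general appraisal criteria listed above.
CRITERIA = (
    "relevance_to_mission",
    "relevance_to_research",
    "uniqueness",
    "non_replicability",
    "potential_for_redistribution",
    "full_documentation",
    "economic_case",
)


def unmet_criteria(assessment):
    """Return the criteria not marked True, in checklist order.

    Criteria absent from the assessment count as unmet; unknown keys are rejected.
    """
    unknown = set(assessment) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    return [name for name in CRITERIA if not assessment.get(name, False)]


# Invented answers for an imaginary dataset.
assessment = {
    "relevance_to_mission": True,
    "relevance_to_research": True,
    "uniqueness": True,
    "non_replicability": False,
    "potential_for_redistribution": True,
    "full_documentation": False,
    "economic_case": True,
}
print(unmet_criteria(assessment))  # ['non_replicability', 'full_documentation']
```

The output is a prompt for discussion rather than a decision: whether unmet criteria rule a dataset out remains a matter for the repository's documented policies.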

Action 4 Ingest

• DCC’s key activities:
  – Transfer data to an archive, repository, data centre or other custodian
  – Adhere to documented guidance, policies or legal requirements

• The term “Ingest” was introduced by the Open Archival Information System (OAIS) Reference Model

Key activities for ingest

– Preparing the data for placing in long-term storage could involve:
  • Assigning a persistent identifier (such as a DOI)
  • Checking that the data does not contain malware
  • Extracting, creating and assigning description and representation information
  • Creating fixity values (checksums)
  • Confirming technical details such as file formats
  • Combining the data and their associated metadata into an Archival Information Package
  • Migrating data to a different file format

(DCC, n.d. a)
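
Two of the ingest steps above, creating fixity values and recording file details alongside an identifier, can be sketched with the Python standard library. This is a minimal illustration, not an OAIS-conformant Archival Information Package: the manifest layout, the function names and the identifier `doi:10.0000/example` are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute a SHA-256 fixity value, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir, identifier):
    """List every file under data_dir with its relative path, suffix and checksum."""
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    return {
        "identifier": identifier,  # placeholder, not a registered DOI
        "files": [
            {
                "path": p.relative_to(data_dir).as_posix(),
                "suffix": p.suffix.lower(),
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


# Demonstration on a throwaway directory with one invented data file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "results.csv").write_bytes(b"id,value\n1,3.5\n")
    manifest = build_manifest(data_dir, identifier="doi:10.0000/example")
    print(json.dumps(manifest, indent=2))
```

Storing the manifest with the data gives later lifecycle stages something to check the files against.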

Action 5 Preservation Action

• Aims:
  – To ensure that data remains authentic, reliable and usable while maintaining its integrity (data quality).

• DCC’s key activities:
  – Undertaking actions to ensure long-term preservation and retention of the authoritative nature of data.

Preservation actions and strategies

Ongoing preservation actions (Lord and Macdonald, 2003, pp. 30-31):

• Data checking and cleaning (detecting and correcting/removing corrupt or inaccurate data)

• Assigning preservation metadata and representation information

• Ensuring acceptable data structures or file formats (open standards)

• Applying good data management practices

• Implementing secure storage and organisational continuity

Three main families of digital preservation strategies to combat obsolescence of hardware and software:

• Information migration

• Technology emulation

• Technology preservation (‘computer museums’)

Action 6 Store

• DCC’s key activities:
  – Storing the data in a secure manner adhering to relevant standards

• This includes
  – the storage facilities themselves, including refreshment of storage media to avoid hardware obsolescence or bit-rot
  – the administration of the data storage service with appropriate policies

Specific activities (Harvey, 2010)

• Develop, maintain, and apply policies relating to secure data storage

• Ensure that sufficient description and representation information is stored with data

• Use a reliable storage medium, preferably on more than one carrier and with geographically distributed backup systems

• Monitor events that might trigger other preservation actions (e.g., file format migration, file corruption)

• Regularly check to ensure the integrity of the stored data and their description and representation information

• Ensure system and physical security

• Maintain and replace the technical infrastructure as necessary

• Develop, and administer as necessary, data recovery procedures
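
The regular integrity check mentioned above can be sketched as a comparison between the checksums recorded at ingest and the files currently in storage. The function names, the status labels and the sample files are assumptions for illustration; a production storage service would typically add logging, scheduling and repair from backup copies.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 checksum of a (small) file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_integrity(storage_dir, recorded):
    """Compare stored files with recorded checksums.

    Returns {relative_path: "ok" | "changed" | "missing"}.
    """
    report = {}
    for rel_path, expected in recorded.items():
        target = storage_dir / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif file_sha256(target) != expected:
            report[rel_path] = "changed"
        else:
            report[rel_path] = "ok"
    return report


# Demonstration: one intact file, one altered after recording, one never stored.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    (store / "a.csv").write_bytes(b"x,y\n1,2\n")
    (store / "b.csv").write_bytes(b"x,y\n3,4\n")
    recorded = {name: file_sha256(store / name) for name in ("a.csv", "b.csv")}
    recorded["c.csv"] = "0" * 64                   # recorded but absent from storage
    (store / "b.csv").write_bytes(b"x,y\n3,5\n")  # simulate corruption
    report = check_integrity(store, recorded)
    print(report)  # {'a.csv': 'ok', 'b.csv': 'changed', 'c.csv': 'missing'}
```

A "changed" or "missing" result is the kind of event that might trigger a recovery procedure or another preservation action.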

Action 7 Access, use and reuse

• Aims:
  – Data can be located, and used and reused by legitimate users

• DCC’s key activities:
  – Ensuring that data is accessible to both designated users and re-users, on a day-to-day basis, usually (but not necessarily) in the form of publicly available published information
  – Applying robust access controls and authentication procedures where applicable

Specific actions

• Ensuring data can be discovered (located) by applying standards that ensure appropriate metadata are present

• Ensuring that the required legal permissions are available for data to be used and reused, and that legal restrictions on the use and reuse of data are adhered to (funding bodies, legislation about confidentiality and privacy, IPR)

• Providing tools that allow collaboration in the use and reuse of data (e.g. annotation)

• Ensuring data is accessible only by authorised users, by applying access controls and authentication procedures.

Action 8 Transform

• DCC’s key activities:
  – Create new data from the original data

• Methods:
  – Creating a subset (by selection or query) to create newly derived results
    • for verification of results
    • as the basis of further research
  – Migration into a different format (migration changes data)
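
Creating a subset by selection can be sketched as follows. The sample data, the column names and the helper functions are invented for the example; the point is only that the derived file is new data created from the original, which is left unchanged.

```python
import csv
import io

# Invented source data standing in for an original dataset.
SOURCE = """site,year,temperature_c
A,2010,11.2
B,2010,9.8
A,2011,11.9
"""


def select_rows(csv_text, predicate):
    """Return the rows (as dicts) for which predicate(row) is true."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if predicate(row)]


def to_csv(rows, fieldnames):
    """Serialise rows back to CSV text as a new, derived dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Select only the measurements from site A.
subset = select_rows(SOURCE, lambda row: row["site"] == "A")
derived = to_csv(subset, ["site", "year", "temperature_c"])
print(derived)
```

Because the subset is itself new data, it would re-enter the lifecycle and need its own description, for example a note of the selection rule and of the source dataset it came from.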

Activity 1: How usable is the model?

• How is the model different from a library’s typical emphasis on collection development for access (as opposed to preservation)?

• If you were a researcher, how useful would this model be?

• How useful is the DCC Lifecycle Model for you in your role?

Other lifecycles

• The DCC lifecycle emphasises data curation, not research

• Other lifecycles incorporate the research lifecycle more fully

• E.g. the UK Data Archive’s research data lifecycle (http://data-archive.ac.uk/create-manage/life-cycle), with the stages:
  – Creating data
  – Processing data
  – Analysing data
  – Preserving data
  – Giving access to data
  – Re-using data

Activity 2: Alternative lifecycle models

• Look at the Review of Data Management Lifecycle Models by A. Ball at http://opus.bath.ac.uk/28587/1/redm1rep120110ab10.pdf

• The document gives an overview of 8 alternative models, including the DCC Curation Lifecycle Model

• Examine the models. Which of these, if any, would you prefer to use when discussing RDM with a researcher, and why?

• Which of these would be most useful for you in your role, and why?

IMAGES, SOURCES AND REFERENCES

Images

• Slide 4:
  – http://commons.wikimedia.org/wiki/File:OAIS-.gif

• DCC Curation Lifecycle Model:
  – http://www.dcc.ac.uk/resources/curation-lifecycle-model

Sources

• Slides on the DCC Curation Lifecycle Model are based on:
  – DCC (n.d. a). Digital 101 materials. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/training/train-the-trainer/dc-101-training-materials.
  – DCC (n.d. b). DCC Charter and Statement of Principles. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/about-us/dcc-charter/dcc-charter-and-statement-principles.
  – DCC (n.d. c). Lifecycle Model FAQ. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/curation-lifecycle-model/lifecycle-model-faqs.

References

• Ball, A. (2006). Briefing Paper: the OAIS Reference Model. Retrieved from http://www.ukoln.ac.uk/projects/grand-challenge/papers/oaisBriefing.pdf.

• Ball, A. (2012). Review of Data Management Lifecycle Models. Bath: University of Bath. Retrieved from http://opus.bath.ac.uk/28587/1/redm1rep120110ab10.pdf.

• Donnelly, M. (2012). Data management plans and planning. In G. Pryor (Ed.). Managing Research Data (pp. 83-104). London: Facet.

• Harvey, R. (2010). Digital Curation: A How-To-Do-It Manual. London: Facet.

• Higgins, S. (2012). The lifecycle of data management. In G. Pryor (Ed.). Managing Research Data (pp. 17-46). London: Facet.

References

• Lavoie, B.F. (2004). The Open Archival Information System Reference Model: introductory guide. Dublin, Ohio; York: OCLC Online Computer Library Centre; Digital Preservation Coalition. Retrieved from http://www.dpconline.org/docs/lavoie_OAIS.pdf.

• Lord, P. & Macdonald, A. (2003). Data Curation for e-Science in the UK: An Audit to Establish Requirements for Future Curation and Provision. Twickenham: The Digital Archiving Consultancy. Retrieved from http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf.

• Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B., & Stafford, S.G. (1997). Nongeospatial metadata for the ecological sciences. Ecological Applications, 7(1), 330-342.

• Rusbridge, C. (2008). Project data life course. Blogs. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/news/project-data-life-course.

• Whyte, A., & Wilson, A. (2010). How to Appraise & Select Research Data for Curation. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/webfm_send/828.