Apr 15, 2023
The research data lifecycle
Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose
Research Data Management Workshop 1.4
Apr 15, 2023 Learning material produced by RDMRose http://www.sheffield.ac.uk/is/research/projects/rdmrose
Session 1.4 overview
• DCC Curation Lifecycle Model
  – Background
  – Target audience
  – The 8 lifecycle actions
• Alternative lifecycle models
DCC Curation Lifecycle Model
• The DCC Curation Lifecycle Model is an authoritative generic model outlining what the umbrella term RDM consists of
• It outlines the activities that are required to successfully curate research data throughout its entire lifecycle
• The model describes an idealised situation: curation is planned from the very beginning, and planned for throughout the lifecycle
• According to the DCC (n.d. c), this model is relevant to:
  – Data creators
  – Data archivists/curators
  – Data (re)users
Background
• The DCC Curation Lifecycle Model is based on the OAIS Reference Model
• OAIS = Open Archival Information System (pictured)
• OAIS is a model that defines a generic framework for building a digital archive (Lavoie, 2004; Ball, 2006)
Background
• The DCC Curation Lifecycle Model adds to the OAIS Reference Model
• It includes activities that take place outside the archival system: the research lifecycle
• In particular:
  – the creation of data
  – the re-use of data by other research projects
DCC Curation Lifecycle Model
Actions
• Three sets of actions:
  – Sequential Actions (8): key actions needed as data move through their lifecycle
  – Occasional Actions (3): occur only when special conditions are met, so they do not apply to all data
  – Full Lifecycle Actions (4): apply to all stages in the lifecycle
Actions
• Sequential actions (8):
  – Conceptualise
  – Create or receive
  – Appraise and select
  – Ingest
  – Preservation action
  – Store
  – Access, use and reuse
  – Transform
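For anyone who wants to track where a dataset sits in the model, the eight sequential actions can be written down as an ordered enumeration. This is only an illustrative sketch: the names `SequentialAction` and `remaining_actions` are made up for this example and are not part of any DCC tool.

```python
from enum import IntEnum


class SequentialAction(IntEnum):
    """The eight sequential actions, numbered as in this session (hypothetical helper)."""
    CONCEPTUALISE = 1
    CREATE_OR_RECEIVE = 2
    APPRAISE_AND_SELECT = 3
    INGEST = 4
    PRESERVATION_ACTION = 5
    STORE = 6
    ACCESS_USE_AND_REUSE = 7
    TRANSFORM = 8


def remaining_actions(current):
    """Return the sequential actions that come after `current`, in order."""
    return [action for action in SequentialAction if action > current]


# Example: a dataset that has just been stored.
for action in remaining_actions(SequentialAction.STORE):
    print(action.name)
```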
Action 1 Conceptualise
• Aim:
  – Designing research projects (and grant proposals) with digital curation in mind, so that you produce curation-ready data
• Rusbridge (2008): “Repeat after me: curation begins before creation!”
Key activities
• DCC’s key activities include planning for:
  – Data capture and storage in curation-friendly file formats (open standards)
  – Recording sufficient information at the time of data capture to assist with ongoing management of those data and with their use
  – Data storage on appropriate media
  – Identification of a safe place for the data and ensuring that an archive will take them
Action 2 Create or receive
• Aim:
  – Creating or receiving digital data that is curation-ready
• DCC’s key activities:
  – Researcher: Create data that is curation-ready, including metadata.
  – LIS professional: Receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres, and if required assign appropriate metadata.
Data quality
• Authentic: be what it purports to be.
• Reliable: have trusted contents which accurately reflect the business transaction documented.
• Have integrity: be complete and unaltered.
• Usable: can be located, retrieved, presented and interpreted.
(ISO 15489-1.)
Metadata
• Descriptive: ensures identification, location and retrieval.
• Technical: records the technical infrastructure used to create or access the data.
• Administrative: for management of data such as acquisition, appraisal decisions, and IPR.
• Use: manages access rights and tracks usage.
• Preservation: records preservation actions, such as checksums, and migrations.
(Based on Higgins, 2012, p. 38.)
• Representation information: metadata that are necessary to make the dataset intelligible to the designated community
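The metadata categories above can be pictured as the sections of a single record kept alongside a dataset. The sketch below is an assumption-laden illustration, not a metadata standard: the class `DatasetMetadata`, its field names and the example values are all invented here.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """One dictionary per metadata category named on this slide (hypothetical layout)."""
    descriptive: dict = field(default_factory=dict)       # identification, location, retrieval
    technical: dict = field(default_factory=dict)         # infrastructure used to create/access
    administrative: dict = field(default_factory=dict)    # acquisition, appraisal decisions, IPR
    use: dict = field(default_factory=dict)               # access rights, usage tracking
    preservation: dict = field(default_factory=dict)      # checksums, migrations
    representation_information: dict = field(default_factory=dict)  # what makes data intelligible

    def missing_categories(self):
        """List the categories that have not been filled in yet."""
        return [name for name, value in vars(self).items() if not value]


# Example with made-up values: only two categories recorded so far.
record = DatasetMetadata(
    descriptive={"title": "Survey responses 2012"},
    technical={"format": "CSV"},
)
print(record.missing_categories())
```

A check like `missing_categories` is one way to spot gaps at the time of data capture rather than at ingest.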
Action 3 Appraise and select
• “the process of evaluating material in order to decide which to retain over the long term, which to retain for the meantime, and which to discard.” (Higgins, 2012, p. 28)
• DCC’s key activities:
  – Evaluate data and select for long-term curation and preservation
  – Adhere to documented guidance, policies or legal requirements
Why appraise and select?
• Digital content expands (data deluge).
• Backup and mirroring increase costs.
• Discovery gets harder.
• Managing and preserving is expensive.
(Based on Whyte and Wilson, 2010.)
Significance
• Appraisal = “determination of significance” (Harvey, 2010, p. 132)
• What data do you need/want to keep?
  – Which datasets or digital resources do you want to keep?
  – Which characteristics or elements of these datasets or resources do you want to keep? (Look and feel, structure of dataset, functionality such as hyperlinks or embedded comments, interoperability with other datasets.)
• How long do you need/want to keep the data?
  – E.g. in terms of user requirements (as evidence for verifying conclusions) or risks of not keeping the data.
Criteria
• General appraisal criteria (Whyte & Wilson, 2010)
  – Relevance to mission of the repository
  – Relevance to research: scientific or historical value (inferring anticipated future use)
  – Uniqueness (the only or most complete source? At risk of loss if not accepted?)
  – Non-replicability (not feasible or impossible)
  – Potential for redistribution (depending on reliability, integrity and usability of the data; legal issues may limit this)
  – Full documentation (to facilitate discovery, access, reuse etc.)
  – Economic case (costs of long-term maintenance vs potential future benefits, available funding)
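One way to make the criteria above practical is to record them as a checklist for each dataset. The sketch below only summarises yes/no answers; it does not make the retention decision, which remains a matter of judgement and policy. The identifiers (`CRITERIA`, `appraisal_summary`) and the yes/no simplification are assumptions made for this example.

```python
# Short labels for the seven general appraisal criteria listed above (labels are hypothetical).
CRITERIA = (
    "relevance_to_mission",
    "relevance_to_research",
    "uniqueness",
    "non_replicability",
    "potential_for_redistribution",
    "full_documentation",
    "economic_case",
)


def appraisal_summary(answers):
    """Group criteria into met / not met / not yet assessed.

    `answers` maps a criterion label to True or False; labels left out
    are reported as unassessed.
    """
    unknown = set(answers) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    return {
        "met": [c for c in CRITERIA if answers.get(c) is True],
        "not_met": [c for c in CRITERIA if answers.get(c) is False],
        "unassessed": [c for c in CRITERIA if c not in answers],
    }


# Example with made-up answers for one dataset.
summary = appraisal_summary({
    "relevance_to_mission": True,
    "uniqueness": True,
    "full_documentation": False,
})
print(summary["unassessed"])
```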
Action 4 Ingest
• DCC’s key activities:
  – Transfer data to an archive, repository, data centre or other custodian
  – Adhere to documented guidance, policies or legal requirements
• The term “Ingest” was introduced by the Open Archival Information System (OAIS) Reference Model
Key activities for ingest
– Preparing the data for placing in long-term storage could involve:
  • Assigning a persistent identifier (such as a DOI)
  • Checking that the data does not contain malware
  • Extracting, creating and assigning description and representation information
  • Creating fixity values (checksums)
  • Confirming technical details such as file formats
  • Combining the data and their associated metadata into an Archival Information Package
  • Migrating data to a different file format
(DCC, n.d. a)
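Several of the ingest steps above (creating fixity values, noting file formats, combining data with metadata) can be sketched in a few lines. This is a minimal illustration under stated assumptions: `sha256_of` and `build_package` are hypothetical helpers, the package is a plain dictionary rather than a standards-conformant Archival Information Package, the identifier is a local placeholder and not a DOI, and the file extension is used only as a rough stand-in for proper format identification.

```python
import hashlib
import uuid
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute a SHA-256 fixity value, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_package(files, metadata):
    """Combine file details and metadata into one dictionary (not a real AIP format)."""
    return {
        # Placeholder identifier; a repository would mint a persistent one such as a DOI.
        "identifier": f"local:{uuid.uuid4()}",
        "metadata": dict(metadata),
        "files": [
            {
                "name": Path(f).name,
                "size_bytes": Path(f).stat().st_size,
                # Extension only; real ingest would use a format identification tool.
                "format": Path(f).suffix.lstrip(".").lower() or "unknown",
                "sha256": sha256_of(f),
            }
            for f in files
        ],
    }
```

Calling `build_package(["results.csv"], {"title": "Example"})` on an existing file would return a dictionary that can be stored next to the data, so the checksums are available for later integrity checks.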
Action 5 Preservation Action
• Aims:
  – To ensure that data remains authentic, reliable and usable while maintaining its integrity (data quality).
• DCC’s key activities:
  – Undertaking actions to ensure long-term preservation and retention of the authoritative nature of data.
Preservation actions and strategies
Ongoing preservation actions (Lord and Macdonald, 2003, pp. 30-31):
• Data checking and cleaning (detecting and correcting/removing corrupt or inaccurate data)
• Assigning preservation metadata and representation information
• Ensuring acceptable data structures or file formats (open standards)
• Apply good data management practices
• Implement secure storage and organisational continuity
Three main families of digital preservation strategies to combat obsolescence of hardware and software:
• Information migration
• Technology emulation
• Technology preservation (‘computer museums’)
Action 6 Store
• DCC’s key activities:
  – Storing the data in a secure manner adhering to relevant standards
• This includes
  – the storage facilities themselves, including refreshment of storage media to avoid hardware obsolescence or bit-rot
  – the administration of the data storage service with appropriate policies
Specific activities (Harvey, 2010)
• Develop, maintain, and apply policies relating to secure data storage
• Ensure that sufficient description and representation information is stored with data
• Use a reliable storage medium, preferably on more than one carrier and with geographically distributed backup systems
• Monitor events that might trigger other preservation actions (e.g., file format migration, file corruption)
• Regularly check to ensure the integrity of the stored data and their description and representation information
• Ensure system and physical security
• Maintain and replace the technical infrastructure as necessary
• Develop, and administer as necessary, data recovery procedures
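The regular integrity check mentioned above is commonly done by recomputing checksums and comparing them with values recorded earlier. The following is a minimal sketch of that idea, assuming a manifest that maps file names to SHA-256 values; `file_sha256`, `check_fixity` and the manifest layout are hypothetical choices for this example, not a prescribed procedure.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """SHA-256 of a file (reads it fully into memory, so suited to small files only)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_fixity(manifest, root):
    """Compare stored files against expected checksums.

    `manifest` maps a file name (relative to `root`) to its expected
    SHA-256 value. Returns a dict mapping each name to "ok", "changed"
    or "missing".
    """
    report = {}
    for name, expected in manifest.items():
        target = Path(root) / name
        if not target.is_file():
            report[name] = "missing"
        elif file_sha256(target) == expected:
            report[name] = "ok"
        else:
            report[name] = "changed"
    return report
```

A result of "changed" or "missing" is the kind of event that might trigger another preservation action, such as restoring the file from a backup copy.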
Action 7 Access, use and reuse
• Aims:
  – Data can be located, used and reused by legitimate users
• DCC’s key activities:
  – Ensuring that data is accessible to both designated users and re-users, on a day-to-day basis, usually (but not necessarily) in the form of publicly available published information
  – Applying robust access controls and authentication procedures where applicable
Specific actions
• Ensuring data can be discovered (located) by applying standards that ensure appropriate metadata are present
• Ensuring that the required legal permissions are available for data to be used and reused, and that legal restrictions on the use and reuse of data are adhered to (funding bodies, legislation about confidentiality and privacy, IPR)
• Providing tools that allow collaboration in the use and reuse of data (e.g. annotation)
• Ensuring data is accessible only by authorised users, by applying access controls and authentication procedures.
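The last point, access limited to authorised users, can be illustrated with a very small access check. This is a toy sketch only: the `Dataset` class, the two access levels and the `can_access` function are assumptions made for this example, and a real repository would rely on its own authentication and authorisation infrastructure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A dataset with a simple access policy (hypothetical model)."""
    name: str
    access_level: str                      # "open" or "restricted"
    authorised_users: frozenset = frozenset()


def can_access(dataset, user_id, authenticated):
    """Open data is available to anyone; restricted data needs a logged-in, listed user."""
    if dataset.access_level == "open":
        return True
    return authenticated and user_id in dataset.authorised_users


# Example with made-up datasets and user names.
published = Dataset("published-tables", "open")
interviews = Dataset("interview-transcripts", "restricted", frozenset({"alice"}))

print(can_access(published, "bob", authenticated=False))
print(can_access(interviews, "alice", authenticated=True))
print(can_access(interviews, "bob", authenticated=True))
```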
Action 8 Transform
• DCC’s key activities:
  – Create new data from the original data
• Methods:
  – Creating a subset (by selection or query) to create newly derived results
    • for verification of results
    • as the basis of further research
  – Migration into a different format (migration changes data)
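Creating a subset by selection or query can be as simple as filtering rows and keeping a few columns. The sketch below uses invented sample data and a hypothetical helper, `subset`; the point is that the derived data are new, while the original rows are left untouched.

```python
import csv
import io


def subset(rows, predicate, columns):
    """Select the rows that satisfy `predicate` and keep only `columns`."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# Made-up sample data standing in for an original dataset.
SAMPLE = "site,year,temperature_c\nA,2010,11.2\nB,2011,12.5\nA,2011,11.9\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Derived dataset: measurements for site A only.
site_a = subset(rows, lambda r: r["site"] == "A", ["year", "temperature_c"])
for row in site_a:
    print(row["year"], row["temperature_c"])
```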
Activity 1: How usable is the model?
• How is the model different from a library’s typical emphasis on collection development for access (as opposed to preservation)?
• If you were a researcher, how useful would this model be?
• How useful is the DCC Lifecycle Model for you in your role?
Other lifecycles
• The DCC lifecycle emphasises data curation rather than research
• Other lifecycle models incorporate the research lifecycle more fully
• E.g. the UK Data Archive’s research data lifecycle (on the left): http://data-archive.ac.uk/create-manage/life-cycle
  – Creating data
  – Processing data
  – Analysing data
  – Preserving data
  – Giving access to data
  – Re-using data
Activity 2: Alternative lifecycle models
• Look at the Review of Data Management Lifecycle Models by A. Ball at http://opus.bath.ac.uk/28587/1/redm1rep120110ab10.pdf
• The document gives an overview of 8 alternative models, including the DCC Curation Lifecycle Model
• Examine the models. Which of these, if any, would you prefer to use when discussing RDM with a researcher, and why?
• Which of these would be most useful for you in your role, and why?
IMAGES, SOURCES AND REFERENCES
Images
• Slide 4:
  – http://commons.wikimedia.org/wiki/File:OAIS-.gif
• DCC Curation Lifecycle Model:
  – http://www.dcc.ac.uk/resources/curation-lifecycle-model
Sources
• Slides on the DCC Curation Lifecycle Model are based on:
  – DCC (n.d. a). Digital Curation 101 materials. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/training/train-the-trainer/dc-101-training-materials.
  – DCC (n.d. b). DCC Charter and Statement of Principles. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/about-us/dcc-charter/dcc-charter-and-statement-principles.
  – DCC (n.d. c). Lifecycle Model FAQ. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/curation-lifecycle-model/lifecycle-model-faqs.
References
• Ball, A. (2006). Briefing Paper: the OAIS Reference Model. Retrieved from http://www.ukoln.ac.uk/projects/grand-challenge/papers/oaisBriefing.pdf.
• Ball, A. (2012). Review of Data Management Lifecycle Models. Bath: University of Bath. Retrieved from http://opus.bath.ac.uk/28587/1/redm1rep120110ab10.pdf.
• Donnelly, M. (2012). Data management plans and planning. In G. Pryor (Ed.). Managing Research Data (pp. 83-104). London: Facet.
• Harvey, R. (2010). Digital Curation: A How-To-Do-It Manual. London: Facet.
• Higgins, S. (2012). The lifecycle of data management. In G. Pryor (Ed.). Managing Research Data (pp. 17-46). London: Facet.
References
• Lavoie, B.F. (2004). The Open Archival Information System Reference Model: introductory guide. Dublin, Ohio; York: OCLC Online Computer Library Centre; Digital Preservation Coalition. Retrieved from http://www.dpconline.org/docs/lavoie_OAIS.pdf.
• Lord, P. & Macdonald, A. (2003). Data Curation for e-Science in the UK: An Audit to Establish Requirements for Future Curation and Provision. Twickenham: The Digital Archiving Consultancy. Retrieved from http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf.
• Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B., & Stafford, S.G. (1997). Nongeospatial metadata for the ecological sciences. Ecological Applications, 7(1), 330-342.
• Rusbridge, C. (2008). Project data life course. Blogs. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/news/project-data-life-course.
• Whyte, A., & Wilson, A. (2010). How to Appraise & Select Research Data for Curation. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/webfm_send/828.