a centre of expertise in data curation and preservation

Funded by:

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

SCARP: Investigating Our Digital Landscape

1. The curation of earth observation data: an OAIS-based approach to preservation analysis
2. Curating digital support materials for atmospheric science data

High level Data Survey

• World Data Centre
• EISCAT
• British Atmospheric Data Centre
• ISIS
• Diamond Light Source
• Central Laser Facility
• Epubs
• Tier 1

Analysed with high level data maps

Data set specific

- Ionosonde
- MST
- EISCAT

CASPAR Questionnaire

• What information/performance/behaviour does your current user extract from this data, and what needs preserving?
• What information do you provide to a new data user, and what support do you give them during their use of the data?
• A clear definition of the information contained in the dataset
• How is the digitally encoded information ingested into the repository?
• How is the required data currently located and accessed?
• Are there any access restrictions?
• Identify common "domain objects" currently in use. Are these objects special cases of simpler objects?
• What information is required to reconstruct the information objects, reproduce the performance or duplicate the required behaviour?
• Structure Representation Information
• Semantic Representation Information
• How is the data physically stored?
• Are there any additional preservation requirements?

Stakeholder analysis

• Funding Bodies
• Scientific Organisations
• Data Producers
• Scientists in the Community
• Data Archivists

Impact of Archive evolution and management

Preservation Data Flows and strategies

MST simple scenario – a simple record of wind speed and trajectory above Aberystwyth

MST complex scenario – supporting atmospheric study and climate modelling on a global scale, permitting study of the following:

• Precipitation
• Convection
• Gravity Waves
• Rossby Waves
• Mesoscale and Microscale Structures
• Fallstreak Clouds
• Ozone Layering

Ionosonde simple scenario

Ionosonde complex scenario – requiring raw data, instrument provenance, data provenance related to scaling of parameters, software technical manuals, bibliographies and journal articles

EISCAT simple scenario – standard program rslt files with a basic description of integration and analysis

EISCAT complex scenario – special program reanalysis: raw data capturing the ability to reprocess, operational provenance, and scientific intent and outcome within scientific experimental proposals and output

Wide-ranging, discipline-specific information survey

- 10 data sets inspected

- Over 1000 files manually read

- Over 3000 OAIS relationships classified

Atmospheric Datasets

Significant Properties of Software

The BADC has substantial data holdings of its own and also provides information and links to data held by other data centres. The data held at the BADC are of two types:

• Datasets produced by NERC-funded projects; these datasets are of high priority since the BADC may be the only long-term archive of the data.
• Third party datasets that are required by a large section of the UK atmospheric research community and are most efficiently made available through one location (e.g. Met Office and ECMWF datasets).

The BADC therefore develops, supports, supplies and provides access to a variety of software necessary to locate, access and interpret this atmospheric data. The BADC would categorise the types of software it interacts with in the following ways:

• Software which it utilises to facilitate direct discovery of, and permit remote or local access to, data
• Software which processes archived data for the "on-the-fly" provision of processed data products
• Generic analysis tools
• Large-scale modelling, specifically the Met Office Unified Model
• Dataset-specific software tools and scripts which are informally archived
• Community-based models and analysis tools

Software examples inspected

1. The BADC website www.badc.rl.ac.uk
2. SSH clients and localised processing of data
3. Trajectories
4. Data Extractor
5. Geosplat
6. Xconvsh/convsh
7. GrADS
8. CDAT
9. Met Office Ported Unified Model
10. Data Set Specific software tools and scripts
11. MST data plotting software
12. Collected scripts instinctive in organic collection
13. Community based models and analysis tools

Repository Solutions?

• What are the functional requirements?
• Should we collaborate or build our own?
• What are the legal and copyright issues?

Repository scope: desired and required research deposit types

The core content intended for capture by an E-prints repository can be characterised by the following deposit types:

• Theses or Dissertations
• Research Papers
• Pre-Prints
• Reports
• Working Papers
• Conference Papers

It was felt that this type of traditional research output should reasonably be in scope for capture within an E-prints repository. We have noted that NCAS produces other types of digital materials which could contribute to the understanding of atmospheric science. Some examples of this type of information we have identified are:

• Software, including code, documentation, descriptions of algorithms and support materials for use of the software
• File format descriptions
• Data dictionaries, thesauri and informal semantic descriptions
• Data provenance information, including technical manuals, calibration and operational information
• Web pages, including support materials, educational materials, non-technical documents for consumption by a general audience, information packs and background documents
• Subject-specific bibliographies and texts

Advantages of collaborating with the NERC Open Research Archive (NORA)

This repository currently permits deposit by the following NERC research centres:

• Proudman Oceanographic Laboratory
• British Geological Survey
• Centre for Ecology and Hydrology
• British Antarctic Survey

NORA is now in a position where it could allow a wider range of scientists, including NCAS, to use this repository, with NCAS as an additional depositing centre.

Software Selection Options

There are a number of options open to an organisation. Those we looked at included Eprints, DSpace, CDSWare, Fedora, I-ToR, MyCoRe, MPGeDoc, ARNO and Epubs, and there is of course the possibility of writing our own bespoke solution.

We thought it essential that the NCAS institutional repository software be:

• OAI compliant
• Open source
• Built on established technology
• Well supported and easy to maintain
• Easily configurable to the needs of NCAS
• Highly accepted by the target user community

It was felt that the E-prints software most closely met these requirements. It also has the advantage of strong advocacy and support services, and is currently endorsed by organisations such as JISC, E-prints support services, DCC and NERC.
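As an illustration of what "OAI compliant" implies in practice, a repository that implements OAI-PMH answers HTTP requests made up of a base URL, a verb and that verb's arguments. The sketch below only builds two such request URLs and does not contact any server. The endpoint address and the helper name `oai_request_url` are hypothetical, chosen for this example; `Identify`, `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; a real repository
# publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/cgi/oai2"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Ask the repository to describe itself.
identify_url = oai_request_url(BASE_URL, "Identify")

# Ask for records in unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(records_url)
```

A harvester would fetch these URLs and parse the XML responses; that step is left out here because it depends on the repository being queried.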

Questions?