a centre of expertise in data curation and preservation
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
SCARP: Investigating Our Digital Landscape
1. The curation of earth observation data: an OAIS-based approach to preservation analysis
2. Curating digital support materials for atmospheric science data
High-Level Data Survey
• World Data Centre
• EISCAT
• British Atmospheric Data Centre
• ISIS
• Diamond Light Source
• Central Laser Facility
• Epubs
• Tier 1
Analysed with high level data maps
Data set specific
- Ionosonde
- MST
- EISCAT
CASPAR Questionnaire
• What information, performance or behaviour does your current user extract from this data, and what needs preserving?
• What information do you provide to a new data user, and what support do you give them during the use of the data?
• A clear definition for the information contained in the dataset
• How is the digitally encoded information ingested into the repository?
• How is the required data currently located and accessed?
• Are there any access restrictions?
• Identify common "domain objects" currently used. Are these objects special cases of simpler objects?
• What information is required to reconstruct the information objects, reproduce the performance or duplicate the required behaviour?
• Structure Representation Information
• Semantic Representation Information
• How is the data physically stored?
• Are there any additional preservation requirements?
Stakeholder analysis
• Funding Bodies
• Scientific Organisations
• Data Producers
• Scientists in the Community
• Data Archivist
Impact of Archive evolution and management
Preservation Data Flows and strategies
MST simple scenario – as a simple record of wind speed and trajectory above Aberystwyth
MST complex scenario – supporting atmospheric study and climate modelling on a global scale, permitting study of the following:
• Precipitation
• Convection
• Gravity Waves
• Rossby Waves
• Mesoscale and Microscale Structures
• Fallstreak Clouds
• Ozone Layering
Ionosonde simple scenario
Ionosonde complex scenario – requiring raw data, instrument provenance, data provenance related to scaling of parameters, software technical manuals, bibliographies and journal articles
EISCAT simple – standard program rslt files with basic description of integration and analysis
EISCAT complex – special program reanalysis scenario: raw data capturing the ability to reprocess, operational provenance, and scientific intent and outcome within scientific experimental proposals and output.
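The simple and complex scenarios differ mainly in which supporting information must be preserved alongside the data. A minimal sketch of that idea follows, assuming item labels paraphrased from the scenario descriptions above. The names `SCENARIO_REQUIREMENTS` and `missing_items` are hypothetical and not part of any project tooling.

```python
# Illustrative only: the item labels paraphrase the scenario slides and are
# not an official SCARP or CASPAR vocabulary.
SCENARIO_REQUIREMENTS = {
    "ionosonde_complex": {
        "raw data",
        "instrument provenance",
        "scaling provenance",
        "software technical manuals",
        "bibliographies and journal articles",
    },
    "eiscat_simple": {
        "standard program rslt files",
        "description of integration and analysis",
    },
}


def missing_items(scenario, held):
    """Return the items a scenario requires that are not yet held, sorted by name."""
    return sorted(SCENARIO_REQUIREMENTS[scenario] - set(held))


if __name__ == "__main__":
    # Example: an archive holding only the raw data and instrument provenance.
    print(missing_items("ionosonde_complex", ["raw data", "instrument provenance"]))
```

Under this sketch, moving an archive from a simple to a complex scenario amounts to enlarging the required set and checking what is still missing.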
Wide-ranging discipline-specific information survey
- 10 data sets inspected
- Over 1000 files manually read
- Over 3000 OAIS relationships classified
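The slides do not give the classification scheme used in the survey. As a rough sketch of how manually classified relationships could be recorded and counted, the example below uses category names from the OAIS reference model. The file names, the sample records and the function `tally_by_category` are all invented for illustration.

```python
from collections import Counter

# Hypothetical sample records: (file, related resource, OAIS category).
# Category names follow the OAIS reference model; the file names are invented.
relationships = [
    ("example_data.nc", "format_description.pdf", "Structure Representation Information"),
    ("example_data.nc", "parameter_dictionary.txt", "Semantic Representation Information"),
    ("example_data.nc", "instrument_log.txt", "Provenance Information"),
    ("example_plot.png", "parameter_dictionary.txt", "Semantic Representation Information"),
]


def tally_by_category(records):
    """Count how many classified relationships fall into each OAIS category."""
    return Counter(category for _, _, category in records)


if __name__ == "__main__":
    for category, count in tally_by_category(relationships).most_common():
        print(f"{category}: {count}")
```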
Atmospheric Datasets
Significant Properties of Software
The BADC has substantial data holdings of its own and also provides information and links to data held by other data centres. The data held at the BADC are of two types:
• Datasets produced by NERC-funded projects; these datasets are of high priority since the BADC may be the only long-term archive of the data.
• Third party datasets that are required by a large section of the UK atmospheric research community and are most efficiently made available through one location (e.g. Met Office and ECMWF datasets).
The BADC therefore develops, supports, supplies and provides access to a variety of software necessary to locate, access and interpret this atmospheric data. The BADC would categorise the types of software it interacts with in the following ways:
• Software which it utilises to facilitate direct discovery of, and permit remote or local access to, data
• Software which processes archived data for the "on-the-fly" provision of processed data products
• Generic analysis tools
• Large-scale modelling, specifically the Met Office Unified Model
• Data set specific software tools and scripts which are informally archived
• Community-based models and analysis tools
Software examples inspected
1. The BADC website www.badc.rl.ac.uk
2. SSH clients and localised processing of data
3. Trajectories
4. Data Extractor
5. Geosplat
6. Xconvsh/convsh
7. GrADS
8. CDAT
9. Met Office Ported Unified Model
10. Data Set Specific software tools and scripts
11. MST data plotting software
12. Collected scripts instinctive in organic collection
13. Community based models and analysis tools
Repository Solutions?
• What are the functional requirements?
• Should we collaborate or build our own?
• What are the legal copyright issues?
Repository scope: desired and required research deposit types
The core content intended for capture by an E-prints repository can be characterised by the following deposit types:
• Theses or Dissertations
• Research Papers
• Pre-Prints
• Reports
• Working Papers
• Conference Papers
It was felt that this type of traditional research output should be reasonably in scope for capture within an E-Prints repository. We have noted that NCAS produces other types of digital materials which could contribute to the understanding of atmospheric science. Some examples of this type of information we have identified are:
• Software, including code, documentation, descriptions of algorithms and support materials for use of the software
• File format descriptions
• Data dictionaries, thesauri and informal semantic descriptions
• Data provenance information, including technical manuals, calibration and operational information
• Web pages, including support materials, educational materials, non-technical documents for consumption by a general audience, information packs and background documents
• Subject-specific bibliographies and texts
Advantages of collaborating with the NERC Open Research Archive (NORA)
This repository currently permits deposit by the following NERC research centres:
• The Proudman Oceanographic Institute
• British Geological Survey
• Centre for Ecology and Hydrology
• British Antarctic Survey
NORA is now in a position where it could allow a wider range of scientists, including NCAS, to use this repository, where NCAS would be an additional depositing centre.
Software Selection Options
A number of options are open to an organisation. Those we looked at included Eprints, DSpace, CDSWare, Fedora, I-ToR, MyCoRe, MPGeDoc, ARNO and Epubs, and there is of course the possibility of writing our own bespoke solution.
We thought it essential that the NCAS institutional repository software be:
• OAI compliant
• Open source
• Built on established technology
• Well supported and easy to maintain
• Easily configurable to the needs of NCAS
• Highly accepted by the target user community
It was felt that the E-prints software most closely met these requirements. It also has the advantage of strong advocacy and support services surrounding E-prints, which is currently endorsed by organisations such as JISC, E-prints support services, DCC and NERC.
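To illustrate what the "OAI compliant" requirement implies in practice, the sketch below builds harvesting requests, assuming the requirement refers to the OAI-PMH protocol. The verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from that protocol. The base URL and the helper name `oai_pmh_url` are hypothetical placeholders, not a real NCAS or NORA endpoint.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"

if __name__ == "__main__":
    # Ask the repository to describe itself.
    print(oai_pmh_url(BASE_URL, "Identify"))
    # Harvest records as simple Dublin Core.
    print(oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A compliant repository would answer such requests with XML that other services can harvest. No network call is made here.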
Questions?