Reuse for Research
Curating Astrophysical Datasets for Future Researchers
Practice Paper, IDCC17
Anders Conrad, Royal Danish Library
Michael Svendsen, Royal Danish Library
Rasmus Handberg, Aarhus University
The NASA Kepler/K2 Mission
Read about the mission at https://kepler.nasa.gov/Mission/QuickGuide/
The Kepler Photometer
From Space to Aarhus…
Spacecraft
Deep Space Network
NASA MAST archive
KASOC archive, Aarhus
KASC scientists/working groups
KASOC website (kasoc.phys.au.dk)
The Challenge - Where Next?
•Data will remain valuable for active research for at least 50 years!
•Who will take care of the data when the current research organisation (Kepler Asteroseismic Science Consortium, KASC) no longer exists?
•How can data be kept accessible for continued active research?
KASC requirements for a Living Archive
•Available for 50 years
•Always freely available on-line
•Continue to be used for active research
•Extendable: New information can be added
•Formats must be readable by both humans and computers
•Understandable and useful for future researchers – no matter the science case
Future workshops - Reuse for Research
•For which research questions might future researchers find this data useful?
•How would they most likely want to see data packaged?
•What documentation is needed to understand data outside the current context?
•What search criteria would most likely be used to discover data?
The 50 Years Issue
•Institutionally: Who can offer more than 5-10 years of storage and preservation?
•Financially: Who will pay?
•Technically: How will data remain readable and understandable?
•Scientifically: How will data remain useful and trustworthy?
From "Who" and "How" to…
•How to best:
•Structure datasets in a way that is most useful for research
•Use formats that are suitable for long-term preservation
•Secure sufficient contextual and specific documentation for scientific reuse
•Facilitate cross-institutional collaboration, to provide a sustainable service
•Secure access and discoverability according to scientific needs
•Secure possibility for continued deposit
Dataset Structure
•One self-contained dataset for each star
  •5 different types of data products
  •Dataset-specific documentation
  •TOC file (machine and human readable)
  •References to publications (bibcodes)
•One generic documentation package
  •E.g. NASA and KASC release notes
One BagIt Archive for Each Star
Kepler_10.zip
│   bag-info.txt
│   bagit.txt
│   fetch.txt
│   manifest-sha1.txt
└───data
    │   bundle.xml
    │   readme.txt
    ├───datafiles
    │   └───...
    ├───additional_files
    │   └───...
    ├───documentation
    │   └───...
    └───stellar_models
        └───...
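As an illustration of how a per-star bag can be checked independently of any repository software, the sketch below writes a minimal bag with only the Python standard library and then verifies the payload against manifest-sha1.txt. The function names, the example payload and the BagIt version string are assumptions made for this illustration; they are not taken from the KASOC workflow.

```python
import hashlib
from pathlib import Path


def write_minimal_bag(bag_dir, payload):
    """Write payload files under data/ plus bagit.txt and manifest-sha1.txt.

    payload maps a relative path (str) to file content (bytes).
    Hypothetical helper, for illustration only.
    """
    bag_dir = Path(bag_dir)
    (bag_dir / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = bag_dir / "data" / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha1(content).hexdigest()
        # Manifest line format: "<checksum>  <path relative to the bag root>"
        manifest_lines.append(f"{digest}  data/{rel_path}")
    # Bag declaration; the version number here is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha1.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir):
    """Return the payload paths that are missing or whose SHA-1 does not match."""
    bag_dir = Path(bag_dir)
    failures = []
    manifest = (bag_dir / "manifest-sha1.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)
        elif hashlib.sha1(target.read_bytes()).hexdigest() != expected:
            failures.append(rel_path)
    return failures
```

Because the manifest is a plain text file of checksums and paths, a future researcher could repeat this fixity check with whatever tools are available at the time.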
Documentation for Each Dataset
<star kic="12345678">
  <numax value="3100" error="20" unit="uHz" />
  <mass value="1.0" error="0.01" unit="solar" />
  <radius value="1.0" error="0.01" unit="solar" />
  <datafiles>
    <datafile uid="1" path="datafiles/original/kplr12345678_llc.fits" />
    <datafile uid="2" path="datafiles/kasoc.ts/kplr12345678_kasoc.ts.fits">
      <dependency datafile="1" />
    </datafile>
    …
  </datafiles>
  <model path="stellar_models/kic12345678/" />
</star>
● The bundle.xml file
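To show how a machine might read such a file, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The embedded sample is a shortened copy of the example above; the function name read_bundle and the shape of the returned dictionary are assumptions for this illustration, not part of the KASOC tooling.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bundle.xml example, with the elided entries left out.
SAMPLE_BUNDLE = """\
<star kic="12345678">
  <numax value="3100" error="20" unit="uHz" />
  <datafiles>
    <datafile uid="1" path="datafiles/original/kplr12345678_llc.fits" />
    <datafile uid="2" path="datafiles/kasoc.ts/kplr12345678_kasoc.ts.fits">
      <dependency datafile="1" />
    </datafile>
  </datafiles>
</star>
"""


def read_bundle(xml_text):
    """Parse a bundle and resolve each data file's dependencies to paths."""
    root = ET.fromstring(xml_text)
    datafiles = list(root.iter("datafile"))
    # Map each uid to its path so dependency references can be resolved.
    path_by_uid = {df.get("uid"): df.get("path") for df in datafiles}
    depends_on = {
        df.get("path"): [
            path_by_uid[dep.get("datafile")] for dep in df.findall("dependency")
        ]
        for df in datafiles
    }
    numax = root.find("numax")
    return {
        "kic": root.get("kic"),
        # (value, error, unit) as given in the file
        "numax": (
            float(numax.get("value")),
            float(numax.get("error")),
            numax.get("unit"),
        ),
        "depends_on": depends_on,
    }


bundle = read_bundle(SAMPLE_BUNDLE)
for path, parents in bundle["depends_on"].items():
    print(path, "<-", parents)
```

Resolving the dependency elements in this way recovers the provenance chain between derived products and the original files without consulting any external database.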
Proof-of-concept - Repository Setup
•Using Dataverse repository software
  •Support for astrophysics metadata
  •Discoverability and citability (DataCite DOIs)
  •APIs for automatic ingest workflow
  •Versioning, allowing redeposit of extended versions of datasets
•Issues:
  •Missing numeric fields for celestial coordinates (for discovery)
  •Limited options for mapping to external storage (we use erda.dk)
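The coordinate issue can be made concrete with a small sketch: a positional (cone) search has to compute an angular distance, which is only possible when right ascension and declination are stored as numbers rather than free text. The catalogue entries and function names below are hypothetical and chosen for illustration; the separation uses the standard haversine formula.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalogue, ra, dec, radius_deg):
    """Return the names of catalogue entries within radius_deg of (ra, dec)."""
    return [
        name
        for name, (obj_ra, obj_dec) in catalogue.items()
        if angular_separation_deg(ra, dec, obj_ra, obj_dec) <= radius_deg
    ]


# Hypothetical entries: name -> (RA, Dec) in decimal degrees.
catalogue = {
    "star_a": (290.0, 44.0),
    "star_b": (291.0, 44.0),
    "star_c": (300.0, 40.0),
}
print(cone_search(catalogue, 290.0, 44.0, 1.0))
```

With coordinates held only as text, a repository can match exact strings but cannot answer a query of this kind.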
Institutional Collaboration
Conclusions – as of February 2017
•Data packages are designed so that they can outlive the repository software
  •Caveat: this may limit the use of repository features
•Preservation actions will potentially be possible, even if we don’t plan them
•We are still working on establishing funding and a sustainable business model
•We need to establish a production environment for the repository
Reuse for Research
Contact: Michael Svendsen, @tullemich, Royal Danish Library