Principles for Sustainable Data Curation





TIGGE Archive Access at NCAR

Principles for Sustainable Data Curation

Steven Worley
Computational and Information Systems Laboratory
NCAR

Pleasure to speak to the leadership of the Association of Research Libraries. Over the past five years the data curation and stewardship community has been drawing closer to the library community, for good reasons. By working together we can better support science research and productivity. Today is another chance to continue that conversation.

TITLE, dry

Can Research Library Repositories Benefit from the Federal Lab Experience?

Best outcome for me is that you take away some best practices that can be applied in your libraries as they develop digital repositories.

Topics

  • My perspective: Research Data Archive @ NCAR
  • Principles for Sustainable Data Curation
      • Stable Funding
      • Knowledgeable Staff
      • Robust Digital Storage
      • Protection from Loss
      • Data and Metadata Format
      • Partnerships
  • Data Management Evolution

21 March 2012, ARL Leadership Fellows

My perspective: Research Data Archive @ NCAR

Operational and Reanalysis Model Outputs

Meteorological and Oceanographic Observations

Remote Sensing Observations

Topography, Bathymetry, Vegetation, and Land Use

Core Data Categories: suite of information to support Earth Systems Research

My perspective: Research Data Archive @ NCAR

Purposes

  • Support climate & weather research at NCAR and UCAR universities
  • Extend data service worldwide

Basic Metrics

  • Established in 1960s
  • 600+ datasets, 4M+ files
  • 70+ datasets growing daily to monthly

[Figure: simplified data life cycle; focus on the Curation part today. Diagram labels: Users, US, International, Data, Assistance, Feedback, Requests and Needs, Management, Supervision, Guidance, Stewardship, Integrity, Access, Curation, Archiving, Metadata, Data Integrity, Preservation]

Sustainable Curation - Stable Funding

Permits:

  • Flexibility
  • Evolution of data management to meet expectations
  • Holistic approach not driven by narrowly defined projects
  • Take advantage of unplanned opportunities

Necessary to keep the collection viable for the long term

Sustainable Curation - Knowledgeable Staff

Data domain knowledge enables staff to:

  • Understand data and do integrity checks
  • Choose data organization to fit science discipline
  • Design appropriate access systems and do consulting

Consistent staffing levels nurture:

  • Professionals dedicated to best practices

Human-based knowledge cannot be underestimated

Big challenge for new repositories that have a broad data scope. We find that 5-10 years of experience are needed to create an expert data scientist.

Sustainable Curation - Robust Digital Storage

Keep pace with digital media evolution:

  • Expect data migration every 2-5 years (tape, disk capacity, etc.)
  • Plan, test, and implement migration carefully
  • Mistakes are irrecoverable!
  • Use knowledgeable staff heavily
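One way to "test" a migration is to compare checksums of every file before and after the move. The following is a minimal sketch of that idea, not a description of how the Research Data Archive actually does it. The function names (`sha256_of`, `build_manifest`, `verify_migration`), the choice of SHA-256, and the assumption that old and new storage are both visible as directory trees are all illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_migration(source: Path, target: Path) -> dict:
    """Compare two directory trees (hypothetical old and new storage).

    Returns files missing from the target, files whose content changed,
    and files present only in the target.
    """
    before = build_manifest(source)
    after = build_manifest(target)
    shared = before.keys() & after.keys()
    return {
        "missing": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in shared if before[name] != after[name]),
        "unexpected": sorted(after.keys() - before.keys()),
    }
```

A migration would be accepted only when all three lists come back empty; the old copy should not be retired before that check passes.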

Why evolve?

  • Users expect more data with faster access
  • Media will eventually fail

This rate of media evolution is new for librarian experts.

Sustainable Curation - Protection from Loss

Create backup data and test disaster recovery

Why?

  • Physical failures
      • Environmental: power outage, fire, flood, ...
      • Hardware: disk system failure, tape degradation
  • Poor curation practices
      • Metadata loss
      • Accidental data over-writes and deletions

Solutions

  • Store backup at separate physical location
  • Treat metadata and data as equals - couple together

If you lose metadata, access may not be possible (documentation, software, etc.). No question, it is not if this will happen, but WHEN!

Sustainable Curation - Protection from Loss

RDA: 40%

Sustainable Curation - Data and Metadata Format

Formats are a serious consideration because:

  • Must maintain data access for the long term

How?

  • Insist that data and metadata are in standard formats
  • Avoid computer OS dependent formats
  • Worry about application driven formats
      • E.g.: .xls, .xlsx, .doc, .docx, .ppt, .pptx, etc.

Challenge: scientists are reluctant to help

Curator's nightmare: never-ending data and metadata format diversity

Proprietary formats, e.g. instrument-dependent output, are definitely out! Data in Microsoft workbooks is especially hard to deal with; exporting it is error prone, e.g. empty cells, merged cells, etc. DIVERSITY => UNSCALABLE SYSTEM, very expensive.

Sustainable Curation - Partnerships

Science productivity is enhanced by partnerships

  • Open sharing of data and metadata
  • Relies heavily on standards
  • No one archive or repository can do it all
  • BUT, users need/want it all
  • Cost saving by sharing

Data Management Evolution - Person-centric

1960s to 1990s

Data Management Evolution - Metadata-centric

1990s to 2010s

Summary: For Research Library Repositories

Sustainable Data Curation

  • Stable Funding
  • Knowledgeable Staff
  • Robust Digital Storage
  • Protection from Loss
  • Data/Metadata Format
  • Partnerships

Curation is supported by best practices in six areas; bundled together in an operational system, they will facilitate sustainability.

Research Data Archive @ NCAR

Word cloud
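As a closing illustration of the Data and Metadata Format guidance, a repository could screen incoming files for application-driven formats at intake. This is only a sketch under stated assumptions: the extension list is copied from the slide, while the function name `screen_submission` and the example filenames are hypothetical, and a real intake check would likely inspect file contents rather than trust extensions.

```python
from pathlib import Path

# Extensions listed on the Data and Metadata Format slide as application driven.
APPLICATION_DRIVEN = {".xls", ".xlsx", ".doc", ".docx", ".ppt", ".pptx"}


def screen_submission(filenames):
    """Split filenames into accepted ones and ones needing conversion.

    The check is by extension only (case-insensitive); it does not
    inspect file contents.
    """
    accepted, needs_conversion = [], []
    for name in filenames:
        if Path(name).suffix.lower() in APPLICATION_DRIVEN:
            needs_conversion.append(name)
        else:
            accepted.append(name)
    return accepted, needs_conversion


# Hypothetical submission: two files pass, two are flagged for conversion.
ok, flagged = screen_submission(
    ["stations.csv", "notes.DOCX", "table.xlsx", "readme.txt"]
)
```

Flagging at intake, rather than years later, puts the conversion request to the scientist while the data are still fresh in mind.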

