17
Shortened presentation title Shortened presentation title AMS 99 th Annual Meeting: TJ 22.4 Data management services supported by NCAR’s Research Data Archive AMS 99th Annual Meeting Paper TJ 22.4 10 January 2019 Thomas Cram and Doug Schuster National Center for Atmospheric Research Computational and Information Systems Laboratory (CISL) Research Data Archive

Data management services supported by NCAR’s Research Data ... · RDA user services-Challenges & strategies 4 •Enable reproducible research –Structured DOI policies –User

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Data management services supported by NCAR’s Research Data Archive

AMS 99th Annual MeetingPaper TJ 22.4

10 January 2019

Thomas Cram and Doug SchusterNational Center for Atmospheric Research

Computational and Information Systems Laboratory (CISL)

Research Data Archive

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

About the RDA• History– Established 1960s

• Purpose– Support climate & weather research at NCAR and

UCAR universities with reference datasets• Collections– Ocean & atmospheric observations, climate

reanalyses, operational NWP products– 600+ datasets, 10M files, 2.2 PB– Continually growing: 70+ updated daily-monthly

• Free and open access• Science educated staff

2

rda.ucar.edu

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

RDA Data usage

3

• FY 201814K+ unique web users2.85 PB data delivered

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

RDA user services- Challenges & strategies

4

• Enable reproducible research– Structured DOI policies– User history of data access maintained to

facilitate dynamic citation generation

• Make data discovery easy & relevant– Maintain robust metadata DB

• Provide scalable data access capabilities – Diversity of data formats/dialects– Growth in user numbers and types– Growth in archive volume and complexity

• Goal: Reduce time spent dealing with data

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

• Registration through DataCite.org• Collection level DOI, i.e. one DOI per

data collection• MySQL DB records for each file– DOI– Internal Version Control (IVC)– Date and time stamp of file activities

• Other features– Maintain history of file activities– Tracks user access via registration and

login

Data citation – Technical approach

5

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Three ways to get a citation

1. Generic citation, from RDA portal• Multiple citation formats

User DOI services

6

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Three ways to get a citation

1. Generic citation, from RDA portal2. Download service (scripts, subsetting):

Provide complete dataset citation, including “Accessed on” date

User DOI services

7

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Three ways to get a citation

1. Generic citation, from RDA portal2. Download service (scripts, subsetting):

Provide complete dataset citation, including “Accessed on” date

3. Generate citations on demand at a later time– Display user specific access activities• Utilize registration information

– Allow activity selection– Create the complete citation

User DOI services

8

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Dataset citation tracker

9

• Citations retrieved via – CrossRef.org API (journal articles)– Google API (books, chapters, etc.)

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Dataset citation tracker

10

• Citations retrieved via – CrossRef.org API (journal articles)– Google API (books, chapters, etc.)

• Current metrics (Jan. 2019)– 55 journal articles– 3 book chapters

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

• Citations retrieved via – CrossRef.org API (journal articles)– Google API (books, chapters, etc.)

• Current metrics (Jan. 2019)– 55 journal articles– 3 book chapters

• Adherence to citation guidelines (authors)– Enforce at review stage?

• Dependent on publishers providing info to CrossRef.org

Dataset citation tracker

11

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4 12

• Maintain programmatic data ingest capabilities

• Support a range of dataset search/discovery mechanisms

– Metadata Standards (ISO-19137, NASA-DIF, FGDC, etc.)

– Controlled vocabularies – NASA GCMD– Access points (OAI-PMH, CSW) – Schema.org embedded JSON-LD

metadata (to support Google Dataset Search)

• Support data subsetting services– 97% data download reduction

Metadata processing & support

toolbox.google.com/datasetsearch

Google Dataset Search

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4 13

• Maintain programmatic data ingest capabilities

• Support a range of dataset search/discovery mechanisms

– Metadata Standards (ISO-19137, NASA-DIF, FGDC, etc.)

– Controlled vocabularies – NASA GCMD– Access points (OAI-PMH, CSW) – Schema.org embedded JSON-LD

metadata (to support Google Dataset Search)

• Support data subsetting services– 97% data download reduction

Metadata processing & support

FY 2018 data subsetting• 57,000+ user requests• 43 PB data input for processing• 749 TB data output delivered• 98% data download volume reduction

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

• Dataset file feature harvesting is critical– Supports value added services– Validates file integrity during data ingest

• Programmatic harvest of data citation metrics is just beginning.– Wanted:

• More support from publishers through API access• Adherence to citation guidelines (authors)

– Participate in community efforts to broaden this capability (e.g. Make Data Count, makedatacount.org)

• For more info: rda.ucar.edu/#!data-citation

Lessons learned/Looking ahead

14

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Contact [email protected]

15

ncarrda.blogspot.com

@NCAR.RDA

@NCAR_RDA

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

Backup slides

16

Shortened presentation title

Shortened presentation title

AMS 99th Annual Meeting: TJ 22.4

1. Create a new DOI dataset2. Complete dataset replacement (new data

from the provider)3. Routine DOI dataset extension in time

(regularly updated datasets)4. Removal of DOI dataset5. Small scale data replacement within a

dataset (error fixes)

DOI policies – RDA use cases

17