Transcript
Page 1: Principles for Sustainable Data Curation;

Principles for Sustainable Data Curation

Steven Worley
Computational and Information Systems Laboratory

NCAR

Page 2: Principles for Sustainable Data Curation;

Can Research Library Repositories Benefit from the Federal Lab Experience?

Page 3: Principles for Sustainable Data Curation;

Topics

• My perspective – Research Data Archive @ NCAR
• Principles for Sustainable Data Curation
  • Stable Funding
  • Knowledgeable Staff
  • Robust Digital Storage
  • Protection from Loss
  • Data and Metadata Format
  • Partnerships

Data Management Evolution

21 March 2012 ARL, Leadership Fellows

Page 4: Principles for Sustainable Data Curation;

My perspective – Research Data Archive @ NCAR

Core Data Categories
• Operational and Reanalysis Model Outputs
• Meteorological and Oceanographic Observations
• Remote Sensing Observations
• Topography, Bathymetry, Vegetation, and Land Use

Page 5: Principles for Sustainable Data Curation;

My perspective – Research Data Archive @ NCAR

Purposes
• Support climate & weather research at NCAR and UCAR universities
• Extend data service worldwide

Basic Metrics
• Established in 1960s
• 600+ datasets, 4M+ files
• 70+ datasets growing daily to monthly

Page 6: Principles for Sustainable Data Curation;

My perspective – Research Data Archive @ NCAR

[Figure: RDA Total Size and # of Unique Users, by year, 2006 to 2010. Axes: Year, Unique Users, Size in TB. Series: Users, Size.]

Page 7: Principles for Sustainable Data Curation;

[Diagram: Curation and Stewardship]

• Users: US, International
• Requests and Needs: Data, Assistance, Feedback
• Stewardship: Management, Supervision, Guidance, Integrity, Access
• Curation: Archiving, Metadata, Data Integrity, Preservation

Page 8: Principles for Sustainable Data Curation;

Sustainable Curation - Stable Funding

Permits:
• Flexibility
  • Evolution of data management to meet expectations
  • Holistic approach – not driven by narrowly defined projects
  • Take advantage of unplanned opportunities

Necessary to keep the collection viable for the long term

Page 9: Principles for Sustainable Data Curation;

Sustainable Curation - Knowledgeable Staff

Data domain knowledge enables staff to:
• Understand data and do integrity checks
• Choose data organization to fit the science discipline
• Design appropriate access systems and do consulting

Consistent staffing levels nurture:
• Professionals dedicated to best practices
• Human-based knowledge, which cannot be underestimated

Page 10: Principles for Sustainable Data Curation;

Sustainable Curation – Robust Digital Storage

Keep pace with digital media evolution:
• Expect data migration every 2-5 years
  • Tape, disk capacity, etc.
• Plan, test, and implement migration carefully
  • Mistakes are irrecoverable!
  • Use knowledgeable staff heavily

Why evolve?
• Users expect more data with faster access
• Media will eventually fail
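The advice to plan, test, and implement migration carefully can be illustrated with a small sketch. The code below is not the RDA's procedure; it assumes a simple file-to-file copy, and the helper names `sha256_of` and `migrate_file` are hypothetical. It records a checksum before the copy, recomputes it on the new media, and leaves the source untouched so that a failed migration remains recoverable.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical.

    The source is never deleted here (an assumption of this sketch):
    retiring old media is a separate step taken only after verification.
    """
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copies contents and timestamps
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"checksum mismatch after migrating {source}")
    return expected
```

Returning the digest lets the caller store it alongside the file's metadata, so the same value can be rechecked at the next migration.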

Page 11: Principles for Sustainable Data Curation;

Sustainable Curation – Protection from Loss

Create backup data and test disaster recovery

Why?
• Physical failures
  • Environmental: power outage, fire, flood, ...
  • Hardware: disk system failure, tape degradation
• Poor curation practices
  • Metadata loss
  • Accidental data over-writes and deletions

Solutions
• Store backup at separate physical location
• Treat metadata and data as equals - couple together
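One way to picture "treat metadata and data as equals" is a manifest that refuses to list a data file without its metadata, and a backup check that verifies both. This is a minimal sketch under stated assumptions, not a description of the RDA's system: it assumes each data file has a sidecar named `<file>.meta.json` (a hypothetical convention), and the function names are invented for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes (small files assumed for this sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path, metadata_suffix: str = ".meta.json") -> dict:
    """Map each data file to its checksum and its metadata sidecar's checksum.

    Raises if a data file has no sidecar, so data is never listed alone.
    """
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name.endswith(metadata_suffix):
            continue
        sidecar = path.with_name(path.name + metadata_suffix)
        if not sidecar.is_file():
            raise FileNotFoundError(f"no metadata sidecar for {path}")
        manifest[path.relative_to(root).as_posix()] = {
            "data": file_digest(path),
            "metadata": file_digest(sidecar),
        }
    return manifest


def backup_problems(primary: Path, backup: Path,
                    metadata_suffix: str = ".meta.json") -> list:
    """List data or metadata files that are missing or altered in the backup."""
    problems = []
    for rel, digests in build_manifest(primary, metadata_suffix).items():
        data_copy = backup / rel
        meta_copy = data_copy.with_name(data_copy.name + metadata_suffix)
        if not data_copy.is_file() or file_digest(data_copy) != digests["data"]:
            problems.append(f"data: {rel}")
        if not meta_copy.is_file() or file_digest(meta_copy) != digests["metadata"]:
            problems.append(f"metadata: {rel}")
    return problems
```

Running `backup_problems` on a schedule against the off-site copy is a lightweight form of disaster-recovery testing: an empty list means every data file and its metadata survived intact.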

Page 12: Principles for Sustainable Data Curation;

Sustainable Curation – Protection from Loss

[Figure: archive size in TB by year, 2006 to 2010. Series: User Data, Full Archive.]

Page 13: Principles for Sustainable Data Curation;

Sustainable Curation – Protection from Loss

[Figure: archive size in TB by year, 2006 to 2010. Series: Full Archive, User Data. Annotation: BACKUPS, RDA: 40%.]

Page 14: Principles for Sustainable Data Curation;

Sustainable Curation – Data and Metadata Format

Formats are a serious consideration because:
• Must maintain data access for the long term

How?
• Insist that data and metadata are in standard formats
• Avoid computer OS dependent formats
• Worry about application driven formats
  • E.g.: .xls, .xlsx, .doc, .docx, .ppt, .pptx, etc.

Challenge: scientists are reluctant to help
Curator's nightmare: never-ending data and metadata format diversity
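As a rough illustration of how a repository might act on the worry about application-driven formats, the sketch below flags incoming files by extension so a curator can ask for a standard-format version. The extension list is taken from the slide; the function name and the idea of screening at deposit time are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Extensions named on the slide as application-driven formats.
APPLICATION_DRIVEN = {".xls", ".xlsx", ".doc", ".docx", ".ppt", ".pptx"}


def flag_application_formats(filenames):
    """Return the filenames whose extension is in APPLICATION_DRIVEN.

    Matching is case-insensitive, and the input order is preserved.
    """
    return [
        name for name in filenames
        if PurePosixPath(name).suffix.lower() in APPLICATION_DRIVEN
    ]
```

An extension check is only a first pass: it says nothing about whether the remaining files are in a documented, standard format, which still needs a knowledgeable person to judge.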

Page 15: Principles for Sustainable Data Curation;

Sustainable Curation – Partnerships

Science productivity is enhanced by partnerships
• Open sharing of data and metadata
  • Relies heavily on standards
• No one archive or repository can do it all
  • BUT, users need/want it all
• Cost saving by sharing

Page 16: Principles for Sustainable Data Curation;

Data Management Evolution – Person-centric

1960s to 1990s

Page 17: Principles for Sustainable Data Curation;

Data Management Evolution – Metadata-centric

1990s – 2010s

Page 18: Principles for Sustainable Data Curation;

Summary: For Research Library Repositories

Sustainable Data Curation
• Stable Funding
• Knowledgeable Staff
• Robust Digital Storage
• Protection from Loss
• Data/Metadata Format
• Partnerships

Page 19: Principles for Sustainable Data Curation;

Research Data Archive @ NCAR
http://dss.ucar.edu/

[email protected]

