Managing the Impacts of Change on Archiving Research Data. A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data”, June 23, 2004, Beijing, China. Raymond McCord, Oak Ridge National Laboratory*, Oak Ridge, Tennessee, USA. *Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S.


Page 1: Managing the Impacts of Change on Archiving Research Data

A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data”

June 23, 2004

Beijing, China

Raymond McCord

Oak Ridge National Laboratory*

Oak Ridge, Tennessee, USA

*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

Page 2: Presentation Strategy

- Change is part of Science
- Accommodating change
- Integration with good practices

Page 3: Research Implies Change …

[Cycle diagram: Research → Discovery → New questions → New information requirements → repeat…]

Not always true for other information systems

Page 4: Minimize Changes / Maximize Documentation

Unpredicted variation in data during research is:
- No excuse for loose management of changes!!
- Often used as an excuse to avoid standards.
- Unavoidable in all cases, but try…
  - Missing values will occur; plan ahead
  - Avoid this complexity: “Temp, temp, t, T, temperature…”
- A source of ambiguity; be clear.
  - Consider the view of future users

Minimal observational intensity is:
- No excuse (!!) for skipping documentation!!
- Quick study = no documentation?? {NO}

The unexpected are rare and most valuable??
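The two warnings on this slide (plan for missing values, avoid a pile of spellings for one parameter) can be sketched in a few lines. This is only an illustration: the alias table, the missing-value codes, and the function names are assumptions, not part of the presentation.

```python
from typing import Optional

# Hypothetical alias table: every spelling seen in incoming files maps to
# one canonical parameter name.
PARAMETER_ALIASES = {
    "temp": "temperature",
    "t": "temperature",
    "temperature": "temperature",
}

# Hypothetical missing-value codes, agreed on before data collection starts.
MISSING_CODES = {"", "NA", "-9999"}


def canonical_parameter(raw_name: str) -> str:
    """Return the canonical parameter name, or raise if the alias is unknown."""
    key = raw_name.strip().lower()
    if key not in PARAMETER_ALIASES:
        # An unknown spelling is reported, not silently accepted.
        raise KeyError(f"Unregistered parameter name: {raw_name!r}")
    return PARAMETER_ALIASES[key]


def parse_value(raw_value: str) -> Optional[float]:
    """Convert a raw field to a float; documented missing codes become None."""
    text = raw_value.strip()
    if text in MISSING_CODES:
        return None
    return float(text)
```

Raising on an unregistered name is a deliberate choice in this sketch: a new spelling has to be added to the table on purpose, which leaves a record of the change.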

Page 5: Management Issues to Consider

- What will change?
- Which changes can be controlled?
- How are changes approved?
- How are users notified about changes?
- How and when can changes be “smoothed” in the cumulative view?

Page 6: Things that will Change

- Access expectations
  - Removal or addition of access restrictions
- The scope and logical hierarchy of the information.
  - New parameters
  - New disciplines
  - New study sites
  - New data sources or methods
- Revisions and additions to metadata codes for parameters, sites, and measurements.
- Updates of hardware and software

Page 7: Design Considerations (1)

- Create “extensible standards” for metadata
  - Have a process for proposing and implementing new standard metadata codes.
  - Record the effective dates of changes.
- Build databases and applications software “for change”
  - Put labels in “lookup” tables (outside the software code)
  - DO NOT let the flexibility needed to store the information become constrained by software that is too complex to be changed!!
- Ask developers “How hard will this design be to change in the future?” before software and databases are built.
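One way to picture a lookup table that sits outside the software code and records effective dates is sketched below. The table contents, site code, and names such as `CodeEntry` and `label_for` are hypothetical; in practice the entries would more likely live in a database table or data file than in a Python list.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CodeEntry:
    code: str
    label: str
    effective_from: date
    effective_to: Optional[date] = None  # None means "still in effect"


# Hypothetical lookup table: the same code carries a different label after
# a documented change, and both periods stay on record.
SITE_CODES: List[CodeEntry] = [
    CodeEntry("S01", "North Meadow", date(1998, 1, 1), date(2001, 12, 31)),
    CodeEntry("S01", "North Meadow (relocated tower)", date(2002, 1, 1)),
]


def label_for(code: str, on: date, table: List[CodeEntry]) -> str:
    """Return the label that was in effect for a code on a given date."""
    for entry in table:
        if entry.code != code:
            continue
        ended = entry.effective_to is not None and on > entry.effective_to
        if entry.effective_from <= on and not ended:
            return entry.label
    raise LookupError(f"No label for code {code!r} on {on.isoformat()}")
```

Adding a new code or revising a label then means adding a row with its effective date, not editing application logic.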

Page 8: Design Considerations (2)

- Include notification procedures to data users about changes
  - Process is simple: distribute information to previous data users.
  - Records about previous data access are required.
  - The description of the change may be difficult to acquire and manage.
- Allocate resources for reprocessing
  - Some changes over time may be very difficult (and irritating) to the data users.
  - Reprocessing can “smooth over” some changes.
  - Reprocessing may be limited by available documentation.
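The notification step depends on having access records to query. A minimal sketch, assuming a hypothetical access log with one record per download (the `AccessRecord` fields and `users_to_notify` are illustrative names, not an existing system):

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class AccessRecord:
    user_email: str
    dataset_id: str
    accessed_on: date


def users_to_notify(
    log: Iterable[AccessRecord], dataset_id: str, changed_on: date
) -> List[str]:
    """Return sorted, de-duplicated addresses of users who retrieved the
    dataset on or before the date of the change."""
    recipients = {
        rec.user_email
        for rec in log
        if rec.dataset_id == dataset_id and rec.accessed_on <= changed_on
    }
    return sorted(recipients)
```

Users who first retrieved the data after the change already hold the revised version, so this sketch leaves them off the list.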

Page 9: Change and Dataset Design

The following series of slides presents:
- Basic “principles” for good dataset design

AND

- How the “principles” need to be adapted to accommodate changes and future data archiving.

Page 10: Rules for Creating Datasets for Archiving (1)

Unique Occurrences
- Each type of measurement is represented in a consistent way.
- Each measurement event is represented by only one value.
- If multiple versions of datasets accumulate:
  - Provide version information
  - Explain version differences
  - Document effective date range for each version
    - When was “it done this way” (observation date range)
    - When was “it distributed this way” (distribution date range)
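The two date ranges named above answer different questions, so a version record can carry both. The sketch below is one possible layout; the field and function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    version: str
    change_note: str               # explains how this version differs
    observed_from: date            # when "it was done this way"
    observed_to: Optional[date]    # None means still current
    distributed_from: date         # when "it was distributed this way"
    distributed_to: Optional[date]


def _in_range(day: date, start: date, end: Optional[date]) -> bool:
    return start <= day and (end is None or day <= end)


def versions_for_observation(versions: List[DatasetVersion], day: date) -> List[str]:
    """Versions whose observation date range covers a measurement date."""
    return [v.version for v in versions if _in_range(day, v.observed_from, v.observed_to)]


def versions_distributed_on(versions: List[DatasetVersion], day: date) -> List[str]:
    """Versions that were being handed to users on a given date."""
    return [v.version for v in versions if _in_range(day, v.distributed_from, v.distributed_to)]
```

Keeping the ranges separate lets a later user ask both "which method produced this 1999 value?" and "which version would someone have downloaded in 2003?".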

Page 11: Rules for Creating Datasets for Archiving (2)

Identifiers
- Each value is associated with a parameter name.
- Each measurement value has a quality indicator and link to a method description.
- When possible remove multiple aliases for the same identifier (sample ID, site ID or name, measurement name, etc.).
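As a rough illustration of these identifier rules, each stored value can carry its parameter name, a quality flag, and a key into a table of method descriptions. The flag vocabulary and the class names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Method:
    method_id: str
    description: str


@dataclass(frozen=True)
class Measurement:
    parameter: str      # canonical parameter name
    value: float
    quality_flag: str   # quality indicator for this value
    method_id: str      # link to a method description


# Hypothetical quality vocabulary; a real archive would define its own.
QUALITY_FLAGS = {"good", "suspect", "bad"}


def validate(m: Measurement, methods: Dict[str, Method]) -> List[str]:
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    if not m.parameter:
        problems.append("missing parameter name")
    if m.quality_flag not in QUALITY_FLAGS:
        problems.append(f"unknown quality flag {m.quality_flag!r}")
    if m.method_id not in methods:
        problems.append(f"no method description for {m.method_id!r}")
    return problems
```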

Page 12: Rules for Creating Datasets for Archiving (3)

Place and Time
- Each value is associated with a unique place name with a quantitatively defined location (geographic coordinates).
- Each value is associated with a date and time.
- Do not confuse date and time for measurements with:
  - Date and time for storage or revisions.
  - Date and time ranges for measurement or encoding methods.
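A small sketch of keeping the measurement time apart from the storage time: the record holds both as separate, explicitly named fields, and queries by observation period use only the former. The site name, coordinates, and field names are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Site:
    name: str          # unique place name
    latitude: float    # decimal degrees
    longitude: float   # decimal degrees


@dataclass(frozen=True)
class Observation:
    site: Site
    measured_at: datetime  # when the measurement was taken
    stored_at: datetime    # when the record was written or last revised
    value: float


def observations_between(
    records: Iterable[Observation], start: datetime, end: datetime
) -> List[Observation]:
    """Select by measurement time only; storage time plays no part."""
    return [r for r in records if start <= r.measured_at < end]
```

A value measured in 2003 but loaded into the archive in 2004 still belongs to the 2003 record, which is the distinction the slide asks for.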

Page 13: Rules for Creating Datasets for Archiving (4)

Data Storage and Transport
- Data are stored or managed with a database management system or self-documenting data format.
  - NetCDF is an example of a non-proprietary data format that is self-documented.
    - Developed by the atmospheric sciences research community.
    - Main documentation and software libraries are openly available.
    - http://my.unidata.ucar.edu/content/software/netcdf/index.html
    - Some commercial data analysis software includes interfaces to this open format.
- Include data analysis software in data management suite
  - Useful for comparing versions of data that accumulate over time
- Include data format conversion software in data management suite
  - Useful for migrating data from one storage technology to another
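Comparing accumulated versions does not require heavy tooling. The sketch below assumes each version has already been loaded into a dictionary keyed by (site, timestamp, parameter); that key layout and the `diff_versions` name are assumptions for illustration, and reading the files (NetCDF or otherwise) is left out.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical key: (site ID, ISO timestamp, parameter name)
Key = Tuple[str, str, str]


class VersionDiff(NamedTuple):
    added: List[Key]     # present only in the new version
    removed: List[Key]   # present only in the old version
    changed: List[Key]   # present in both, with different values


def diff_versions(old: Dict[Key, float], new: Dict[Key, float]) -> VersionDiff:
    """Summarize how a new dataset version differs from an older one."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return VersionDiff(added, removed, changed)
```

The resulting lists could feed the explanation of version differences that the archiving rules call for.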

Page 14: Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive

Best Practices Include:
- Assign descriptive file names
- Use consistent and stable file formats
- Define the parameters
- Use consistent data organization
- Perform basic quality assurance
- Assign descriptive data set titles
- Provide documentation

Published: Cook et al. 2001. Bulletin of the Ecological Society of America
http://www.daac.ornl.gov/DAAC/PI/bestprac.html

Page 15: A Future Scientist’s View

Three years ago:
- I told my college-age daughter about the Japanese announcement of 1 TB of optical memory in 1 cubic centimeter.

Her reply was:
- “…We need to know how to think critically and select what kinds of projects and data we need to keep because the limiting factor will be our minds, not the technology.”

Page 16: Comments and Questions…