
Bridging the Gap between Libraries and Data Archives: Progress Report


Roger Revelle, Gulf of California Expedition, 1939

JISC/NSF Digital Libraries Initiative All Projects Meeting

24-25 June 2002, Edinburgh


Two new NSF Projects …

“Bridging the Gap between Libraries and Data Archives”

NSDL Collections Track

“SIOExplorer: Web Exploration of Seagoing Archives”

Information Technology Research (ITR)

• Started October 2001


Collaborative effort

• UCSD Libraries
• Scripps Institution of Oceanography
• San Diego Supercomputer Center

Advisory Board

• NOAA
• US Naval Oceanographic Office
• Private industry
• Other oceanographic institutions


Combine …

Data
• 50 years of digital data
• Growing 200 GB per year

Images
• 99 years of SIO Archives

Documents
• Reports, publications, books

… into one digital library


Data in the collection …


Bathymetry, magnetics, gravity
• Gathered from worldwide sources

795 SIO cruise legs
• Swath bathymetry since 1981

Approx. 3000 cruise legs online at SIO


Multibeam sonar revolutionizes seafloor understanding

Map a wide swath
• Not just a single profile
– SeaBeam Classic, 1981-1992: 16 beams
– SeaBeam 2000, 1992-: 121 beams
– SeaBeam 2100, 1996-2000: 151 beams
– Simrad EM120, 2001-: 191 beams, 150 degree swath width
• Also backscatter
– Determine bottom type: sediment, lava flow

Realtime swath 20 km across-track
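
The swath angle and the across-track coverage can be related by simple geometry. The following is a minimal sketch, assuming a flat seafloor and a swath symmetric about the vertical (neither assumption is stated on the slide); the function names are illustrative.

```python
import math


def swath_width_km(depth_km: float, swath_angle_deg: float) -> float:
    """Across-track coverage on a flat seafloor for a symmetric swath."""
    half_angle = math.radians(swath_angle_deg / 2.0)
    return 2.0 * depth_km * math.tan(half_angle)


def depth_for_width_km(width_km: float, swath_angle_deg: float) -> float:
    """Water depth at which a symmetric swath reaches the given width."""
    half_angle = math.radians(swath_angle_deg / 2.0)
    return width_km / (2.0 * math.tan(half_angle))
```

Under these assumptions a 150 degree swath covers about 7.5 times the water depth, so a 20 km swath would correspond to roughly 2.7 km of water if the full 150 degrees were in use.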


SIO Swath Mapping Expeditions

244 swath mapping cruises on vessels, since 1981
• Thomas Washington
• Melville
• Revelle

600 GB multibeam holdings
Adding 200 GB/year


Deliver sampling information

Sample index, 1968-
• 100,000 entries
• 500 types
– Dredged rocks, cores
– Biological trawls
– Water samples
– CTD

Build on www.EarthRef.org
• Seamount catalog

(Amelia Earhart)

Roger Revelle, MidPac, 1950


Images in the collection …


Access Voyages of Discovery

Encourage inquiry
• “What’s this?” links from image
– Data (“What”)
– Instruments (“How”)
– Other voyages

Dual use
• Research and education

Naga Expedition, 1959-61 (artist’s illustrations from logbook)


R/V Albatross departed SIO 1904

Sigsbee sounding machine


Voyages of Discovery in the Pacific

La Perouse 1780’s

R/V Revelle
• “La Perouse Expedition”
– Departed June 8

R/V Melville
• “Cook Expedition”
– Returns July 17

Special Collections, UCSD Library

James Cook, by Nathaniel Dance, 1776


Voyages of Discovery in the Pacific

1950’s

Ed Hamilton, MidPac, 1950

Samoa, Capricorn, 1952


R/V Spencer F. Baird

L to R back row: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt.

Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt.

Capricorn Expedition, 1952-53

Query for ideas and careers

Not just data

Track a scientist’s expeditions and publications


Documents in the collection …


Full text of publications

The Challenger Expedition
• 30,000 scanned pages

Anatomy of an Expedition
• Bill Menard, 1967 Nova Expedition
– Link to 1998 Avon Expedition

Exploring the Deep Pacific
• Helen Raitt, 1952 Capricorn Expedition


Cruise reports

50 years available
• Scan older versions
• Currently generate .pdf automatically

Page with swath bathymetry every 6 hours


Bridging the Gap: Progress Report


The Problem

Archives are search-impaired

Content not a problem
Material exists in great abundance
• Data archives
• Historical archives

But it is hard to get
Litany of woes …


Litany of archive woes

Magnetic media at risk
• Need to migrate to new storage

Local access only
• Some online, but sprawling directories
• Tapes and CDs in drawers
• Inconsistent naming over 30 years

Home-grown software
• Pre-database technology
• Minimal documentation
• Formal metadata non-existent

Creators now retired

What to do?

Shipboard archives for one recent cruise


Steps toward a Solution

Seek professional help
• Computer scientists
• Advisory Board
• (Similar problems faced in many fields)

Review the problem
• Seven issues from national workshop
• Analyze the dataflow

Build a prototype

Test the prototype
• New Zealand – Samoa Expedition


Review archive problems

NSF/ONR Marine Geology and Geophysics Workshop

• Search: metadata rarely exist
• Access: automated management
• Quality: a challenge
• Display: interactive tools
• Flexibility: import, export
• Scalability: interoperate with large projects
• Stability: curation, beyond end of project


First, create a conceptual data model

Spend time to review with all participants

Design a robust model
• Define common categories
– 9 basic directories
– Specific subdirectories
• Controlled design document

Map existing digital objects to categories
• Both documents and data
• Accommodate variations
– Data types and names over 50 years
– Valid for future developments

Result: “CCDS” – Canonical Cruise Data Structure

Dataflow
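
Mapping existing digital objects onto a fixed set of categories can be sketched as a small rule table. The category names and filename patterns below are hypothetical, since the slides do not list the nine CCDS directories; this only illustrates the idea of one controlled mapping that tolerates naming variations.

```python
import fnmatch

# Hypothetical categories and filename patterns; the actual nine CCDS
# directories are not listed in the slides.
RULES = [
    ("multibeam", ["*.mb*", "sb*.swath"]),
    ("navigation", ["*.nav", "*.gps"]),
    ("documents", ["*.pdf", "*.txt"]),
]


def classify(filename: str) -> str:
    """Return the first category whose patterns match, else 'unsorted'."""
    name = filename.lower()  # tolerate inconsistent capitalization
    for category, patterns in RULES:
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            return category
    return "unsorted"
```

Anything that falls through to "unsorted" is a candidate for review by a domain specialist before the rule table is extended.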


Second, organize domain-specific content

Work inside a “Staging Area”
• Deal with complexity
– Extract from 3 archive levels
  – Shipboard (tape, CD)
  – Post-processing lab (tape)
  – Current online content (not always “best”)
• Opportunity for data cleanup
– Apply corrections
– Weed out intermediate and duplicate versions
– Gather information for metadata
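
One part of the cleanup, finding duplicate versions, can be sketched with content checksums. This is an illustrative approach, not the project's documented method; the function names are hypothetical, and it only detects byte-identical copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large data files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that have identical content."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Intermediate versions that differ by even one byte are not caught this way; deciding which version is "best" still needs a domain specialist.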


Third, load the “CCDS”

Clear transition in activities
• Domain specialists final approval
• IT team takes over

Early mistake
• “Pushed” content from legacy data directories
– Complex, vary over years
– Revised to “pull” into Canonical Structure

IT lesson learned
• Dataflow needs to be “template-driven”
• Template can incorporate
– Rules for automatic loading
– Adaptive choice among multiple alternatives
• Maintain flexibility as project evolves
– Team members negotiate content of template
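
A template-driven "pull" can be sketched as follows: the template names each canonical target and lists legacy locations in order of preference, and the loader takes the first one that exists. The template layout, paths, and function name are assumptions for illustration; the slides do not describe the actual template format.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical template: canonical target -> ordered legacy alternatives.
TEMPLATE: Dict[str, List[str]] = {
    "navigation/track.nav": ["processed/nav/final.nav", "raw/nav.dat"],
    "documents/report.pdf": ["docs/cruise_report.pdf"],
}


def pull_into_canonical(
    legacy_root: Path, canonical_root: Path, template: Dict[str, List[str]]
) -> Dict[str, Optional[str]]:
    """Copy the first available alternative for each canonical target."""
    chosen: Dict[str, Optional[str]] = {}
    for target, alternatives in template.items():
        source = next(
            (alt for alt in alternatives if (legacy_root / alt).is_file()), None
        )
        chosen[target] = source  # None records a gap for later review
        if source is not None:
            destination = canonical_root / target
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(legacy_root / source, destination)
    return chosen
```

Because the legacy layout appears only in the template, changes negotiated by the team are edits to data rather than to loader code.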


Fourth, load the data

Persistent data archive management
• Use the “Storage Resource Broker”
– San Diego Supercomputer Center product

Fifth, load the metadata

• Harvest metadata from data files, automatically
• Provide tools for metadata editing
• Load into Oracle
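
Automatic harvesting can be sketched for the simplest case, deriving time and geographic extent from navigation records. The input format here (one "ISO-timestamp latitude longitude" record per line, with # comments) is an assumption for illustration, not one of the actual SIO data formats.

```python
from datetime import datetime
from typing import Dict, Iterable, Union


def harvest_extent(lines: Iterable[str]) -> Dict[str, Union[str, float]]:
    """Derive temporal and geographic extent from navigation records."""
    times, lats, lons = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        stamp, lat, lon = line.split()
        times.append(datetime.fromisoformat(stamp))
        lats.append(float(lat))
        lons.append(float(lon))
    if not times:
        raise ValueError("no navigation records found")
    return {
        "start": min(times).isoformat(),
        "end": max(times).isoformat(),
        "south": min(lats),
        "north": max(lats),
        "west": min(lons),
        "east": max(lons),
    }
```

Taking the minimum and maximum longitude does not handle a track that crosses the 180 degree meridian, which a real harvester for Pacific cruises would need to address.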


Building a Collection Developer’s Toolkit


Collection Developer’s Toolkit

Make it easy to build, and maintain
• Not just for IT experts

Portable and scalable for other projects

Integrate
• Metadata tools
• Data tools
• Interactive search and display console


Make use of existing resources

Alexandria Digital Library
• Geospatial content
• OAI-compliant server

Environmental data archive and delivery tools
• John Helly, http://ceed.sdsc.edu/

Storage Resource Broker
• http://www.npaci.edu/DICE/SRB/index.html/

Domain-specific toolkits
• GMT, MB-System, ARC/IMS


Build metadata tools

Automate
• Bulk harvesting from data files
• Bulk loading into Oracle database

Use NSDL community standards
• Dublin Core + “ADN” metadata
– Alexandria Digital Library (UCSB)
– DLESE (Digital Library for Earth System Education)
– NASA
• Controlled vocabularies
– Science themes
– Geographic names

Embed domain-specific metadata into standards
• Multibeam, cruise, sampling
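
Bulk loading of harvested records can be sketched as a single batched insert. SQLite from the Python standard library stands in for Oracle here so the example is self-contained; the table name and the choice of Dublin Core elements as columns are assumptions for illustration.

```python
import sqlite3
from typing import Dict, Iterable

# A few Dublin Core elements used as columns (illustrative subset).
DC_FIELDS = ("identifier", "title", "creator", "date", "coverage")


def bulk_load(connection: sqlite3.Connection, records: Iterable[Dict[str, str]]) -> int:
    """Insert or update all records in one transaction; return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS dc_metadata ("
        "identifier TEXT PRIMARY KEY, title TEXT, creator TEXT, "
        "date TEXT, coverage TEXT)"
    )
    # Missing elements become NULL rather than failing the whole batch.
    rows = [tuple(record.get(field) for field in DC_FIELDS) for record in records]
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT OR REPLACE INTO dc_metadata VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Keying on the identifier makes the load repeatable, so re-harvesting a cruise updates its rows instead of duplicating them.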


MOBE

• Metadata Object Browser and Editor
• Inherit metadata from
– Dublin Core
– Cruise
• Flexible
– Expand for projects as needed
– Generic ASCII metadata interchange format “MIF”
– Export to XML
• Java
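
Metadata inheritance and XML export can be sketched as layered dictionaries, where an object's own values override cruise-level values, which in turn override general defaults. This is not MOBE's implementation (MOBE is written in Java and its MIF format is not shown in the slides); the function and element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import ChainMap
from typing import Dict


def inherit(*levels: Dict[str, str]) -> Dict[str, str]:
    """Merge metadata levels; earlier arguments override later ones."""
    return dict(ChainMap(*levels))


def to_xml(metadata: Dict[str, str], root_tag: str = "metadata") -> str:
    """Serialize a flat metadata dictionary as one XML element per key."""
    root = ET.Element(root_tag)
    for name in sorted(metadata):
        ET.SubElement(root, name).text = str(metadata[name])
    return ET.tostring(root, encoding="unicode")
```

A call such as `inherit(object_level, cruise_level, dublin_core_defaults)` lets an editor store only what differs at each level.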


Search interface

Design for alternative approaches
• Geospatial
– Lat, lon
• Temporal
– “1995-2000”
• Keyword
– Region “Samoa”
– Vessel “Melville”
– Cruise “AVON02MV”
– Data type “dredge”
– Scientist “Staudigel”
• Expert-level
– Research, teacher, student, public

Prototype search interface
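
Combining the geospatial, temporal, and keyword approaches can be sketched as one filter in which every supplied criterion must match. The record fields and function signature are hypothetical and do not describe the prototype's actual query interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Holding:
    cruise: str
    vessel: str
    region: str
    data_type: str
    year: int
    lat: float
    lon: float


def search(
    holdings: Iterable[Holding],
    bbox: Optional[Tuple[float, float, float, float]] = None,  # south, west, north, east
    years: Optional[Tuple[int, int]] = None,  # inclusive range
    **keywords: str,  # field name -> wanted value, case-insensitive
) -> List[Holding]:
    """Return holdings matching every supplied criterion (logical AND)."""
    hits = []
    for item in holdings:
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= item.lat <= north and west <= item.lon <= east):
                continue
        if years is not None and not (years[0] <= item.year <= years[1]):
            continue
        if all(
            str(getattr(item, field)).lower() == str(wanted).lower()
            for field, wanted in keywords.items()
        ):
            hits.append(item)
    return hits
```

A query in the spirit of the slide would be `search(catalog, region="Samoa", vessel="Melville", data_type="dredge", years=(1995, 2000))`.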


CruiseViewer

• Interactive browser and query interface

• Display tracks and samples

• Download library objects

• Java


Manage interfaces for multiple projects

Both data and metadata


Lessons learned (so far)…


Make it easier to collaborate

Interactions between groups
• Not just a technology project
• Diverse goals, vocabularies and audiences

Interoperate
• Each domain has own sphere of responsibility
– Don’t engineer someone else’s domain
• Work through interfaces
– Re-negotiate as needed
– Avoid long-term maintenance headaches between domains


Build tools for collaborative projects

3 “cultures” in this project
• Oceanographers
• Computer scientists
• Librarians

Example: bridge vocabularies between separate domains
• Use metadata “triples,” not “pairs”
• Reduce phone calls by including narrative label

parameter name: science_themes
value: geochemistry, marine geology, marine geophysics, hot spots, mantle plumes, geochronometry, seamount chains
narrative label: keywords, from controlled vocabulary of science terms, selected from the “SIOExplorer Science Theme” template
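
The triple can be sketched as a three-field record, so that the explanation travels with every parameter. The type and function names are illustrative; only the three fields come from the slide.

```python
from typing import NamedTuple


class MetadataTriple(NamedTuple):
    name: str   # parameter name, as used by software
    value: str  # the value itself
    label: str  # narrative label, written for people in other domains


def describe(triple: MetadataTriple) -> str:
    """Render a triple so the narrative label accompanies the value."""
    return f"{triple.name} = {triple.value} ({triple.label})"
```

A name/value pair would carry only the first two fields, leaving a librarian or computer scientist to ask an oceanographer what the parameter means.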


Adding new projects to SIOExplorer

Make use of
• Collection Developer’s Toolkit
• NSDL server
• Metadata interchange
• Query processing
• SDSC
– Managed storage
– Web service


Test the prototype

Melville departs Lyttelton harbor


Floating Digital Library Workshop

R/V Melville
• March 7-21, New Zealand to Samoa

Realtime acquisition of library objects?
• Load metadata into swath files
– At acquisition time
• Specify cruise metadata
• Sensor documentation database
• Load the CCDS

Learn from a common experience


A good day at 51° S

Renewed appreciation for the collection of field data


Common experience
• Librarians
• Computer scientists
• Oceanographers
• Royal New Zealand Navy

Melville in Lyttelton

Collaboration between SIO and RNZN


Floating Digital Library Workshop

Librarian at sea

Computer scientist in galley

Oceanographer holding onto computer


Bollons Gap survey

New Zealand Law of the Sea Claim

Librarian at sea

Visualization of swath bathymetry, looking north


Heading for Samoa

Crossing the
• Louisville Ridge
• Tonga Trench
• Osbourn Trough (ancient spreading center)

Visualization of Global Topography, looking north


Relate cruise to SIO holdings

Display search results
• Red
– SIO multibeam
• Black
– Other cruises
• Yellow
– SIO dredged rock samples
• Also
– Volcanoes
– Earthquakes
– Plate boundaries

Typical research support product
• Make it available on web
• Select cruises for further study
• Export for ArcView
– Related NSF/ITR project


Data Publishing Toolkit for Digital Library Interoperability: Integrating the Albatross Cruise Holdings into SIOExplorer

NSF Division of Biological Infrastructure

Collaboration with Smithsonian Institution

Biogeography and Geology of the Oceans: SIO Collections Gateway for the NSDL

NSF NSDL Collections Track

Track of the Albatross, 1884-1921

Next steps


SIOExplorer: Expedition Planner

Open research data for student discovery

• Leverage Digital Library efforts

• Students design a virtual expedition
– Explore relationships
– Depth, sediment thickness, crustal age
– More …
– Earthquakes, volcanoes, trenches
– Wind, waves, currents
– Climate

• Students publish expedition report
– On the web

• Teacher workshops
– At the Birch Aquarium

Crustal Age

Sediment thickness

Global Topography


SIO 100th Anniversary

September 26, 2003

SIO, 1909

http://SIOExplorer.ucsd.edu

R/V Alexander Agassiz, 1907