Bridging the Gap between Libraries and Data Archives: Progress Report
Roger Revelle, Gulf of California Expedition, 1939
JISC/NSF Digital Libraries Initiative All Projects Meeting
24-25 June 2002, Edinburgh
Two new NSF Projects …
“Bridging the Gap between Libraries and Data Archives”
NSDL Collections Track
“SIOExplorer: Web Exploration of Seagoing Archives”
Information Technology Research (ITR)
• Started October 2001
Collaborative effort
• UCSD Libraries
• Scripps Institution of Oceanography
• San Diego Supercomputer Center
Advisory Board
• NOAA
• US Naval Oceanographic Office
• Private industry
• Other oceanographic institutions
Combine …
Data: 50 years of digital data, growing 200 GB per year
Images: 99 years of SIO Archives
Documents: reports, publications, books
… into one digital library
Data in the collection …
Bathymetry, magnetics, gravity
• Gathered from worldwide sources
795 SIO cruise legs
• Swath bathymetry since 1981
Approx. 3,000 cruise legs online at SIO
Multibeam sonar revolutionizes seafloor understanding
Map a wide swath
• Not just a single profile
– SeaBeam Classic, 1981-1992: 16 beams
– SeaBeam 2000, since 1992: 121 beams
– SeaBeam 2100, 1996-2000: 151 beams
– Simrad EM120, since 2001: 191 beams, 150-degree swath width
• Also backscatter
– Determine bottom type
– Sediment
– Lava flow
Real-time swath, 20 km across-track
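The link between water depth and swath width comes from simple flat-seafloor geometry. The sketch below gives an idealized upper bound under stated assumptions: a flat bottom, no refraction, no roll, and no signal loss in the outer beams. The function names are hypothetical and do not belong to any SIO or vendor tool.

```python
import math


def swath_width_m(depth_m: float, swath_angle_deg: float) -> float:
    """Ideal across-track swath width over a flat seafloor, in metres.

    The outermost beams leave the ship at half the total swath angle from
    vertical, so each reaches the bottom depth * tan(half_angle) to one side.
    Refraction, ship roll and signal loss in the outer beams are ignored.
    """
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    if not 0 < swath_angle_deg < 180:
        raise ValueError("swath angle must be strictly between 0 and 180 degrees")
    half_angle = math.radians(swath_angle_deg / 2)
    return 2 * depth_m * math.tan(half_angle)


def depth_for_width_m(width_m: float, swath_angle_deg: float) -> float:
    """Invert the same relation: depth at which the ideal swath reaches width_m."""
    return width_m / swath_width_m(1.0, swath_angle_deg)


# 150-degree swath (the EM120 figure above) in 3,000 m of water.
print(round(swath_width_m(3000, 150) / 1000, 1))  # → 22.4 (km)
```

Under these assumptions the width scales linearly with depth. Real systems fall short of this bound wherever the outer beams return too weak a signal.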
SIO Swath Mapping Expeditions
244 swath mapping cruises since 1981, on the vessels Thomas Washington, Melville and Revelle
600 GB multibeam holdings, adding 200 GB/year
Deliver sampling information
Sample index, since 1968
• 100,000 entries
• 500 types
– Dredged rocks, cores
– Biological trawls
– Water samples
– CTD
Build on www.EarthRef.org
• Seamount catalog
(Amelia Earhart)
Roger Revelle, MidPac, 1950
Images in the collection …
Access Voyages of Discovery
Encourage inquiry
• “What’s this?” links from image
– Data (“What”)
– Instruments (“How”)
– Other voyages
Dual use: research and education
Naga Expedition, 1959-61 (artist’s illustrations from logbook)
R/V Albatross departed SIO 1904
Sigsbee sounding machine
Voyages of Discovery in the Pacific: La Perouse, 1780s
R/V Revelle
• “La Perouse Expedition”
– Departed June 8
R/V Melville
• “Cook Expedition”
– Returns July 17
Special Collections, UCSD Library
James Cook, by Nathaniel Dance, 1776
Voyages of Discovery in the Pacific: 1950s
Ed Hamilton, MidPac, 1950
Samoa, Capricorn, 1952
R/V Spencer F. Baird
Back row, left to right: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt.
Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt
Capricorn Expedition, 1952-53
Query for ideas and careers
Not just data
Track a scientist’s expeditions and publications
Documents in the collection …
Full text of publications
The Challenger Expedition
• 30,000 scanned pages
Anatomy of an Expedition
• Bill Menard, 1967 Nova Expedition
– Link to 1998 Avon Expedition
Exploring the Deep Pacific
• Helen Raitt, 1952 Capricorn Expedition
Cruise reports: 50 years available
• Scan older versions
• Currently generate .pdf automatically
Page with swath bathymetry every 6 hours
Bridging the Gap: Progress Report
The Problem
Archives are search-impaired
Content is not the problem: material exists in great abundance
• Data archives
• Historical archives
But it is hard to obtain
• Litany of woes …
Litany of archive woes
Magnetic media at risk
• Need to migrate to new storage
Local access only
• Some online, but in sprawling directories
• Tapes and CDs in drawers
• Inconsistent naming over 30 years
Home-grown software
• Pre-database technology
• Minimal documentation
• Formal metadata non-existent
Creators now retired
What to do?
Shipboard archives for one recent cruise
Steps toward a solution
Seek professional help
• Computer scientists
• Advisory Board
• (Similar problems faced in many fields)
Review the problem
• Seven issues from national workshop
• Analyze the dataflow
Build a prototype
Test the prototype
• New Zealand – Samoa Expedition
Search: metadata rarely exist
Access: automated management
Quality: a challenge
Display: interactive tools
Flexibility: import, export
Scalability: interoperate with large projects
Stability: curation beyond the end of the project
Review archive problems
NSF/ONR Marine Geology and Geophysics Workshop
First, create a conceptual data model
Spend time to review with all participants
Design a robust model
• Define common categories
– 9 basic directories
– Specific subdirectories
• Controlled design document
Map existing digital objects to categories
• Both documents and data
• Accommodate variations
– Data types and names over 50 years
– Valid for future developments
Result: “CCDS”, the Canonical Cruise Data Structure
Dataflow
Second, organize domain-specific content
Work inside a “Staging Area”
• Deal with complexity
– Extract from 3 archive levels
– Shipboard (tape, CD)
– Post-processing lab (tape)
– Current online content (not always the “best” version)
• Opportunity for data cleanup
– Apply corrections
– Weed out intermediate and duplicate versions
– Gather information for metadata
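One way to weed out duplicate copies gathered from the three archive levels is to group files by a content checksum. Below is a minimal sketch using SHA-256 from Python's standard library. It illustrates an assumed approach rather than the project's documented procedure. It catches only byte-identical duplicates. Spotting intermediate versions still needs a domain specialist's judgment.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large swath files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(staging_root: Path) -> dict[str, list[Path]]:
    """Group files under staging_root by content; keep only groups of 2+."""
    groups = defaultdict(list)
    for path in sorted(staging_root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Each returned group lists copies of the same bytes, for example a shipboard copy and a lab copy of one file. Deciding which copy to keep remains with the domain specialists.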
Third, load the “CCDS”
Clear transition in activities
• Domain specialists give final approval
• IT team takes over
Early mistake
• “Pushed” content from legacy data directories
– Legacy directories were complex and varied over the years
– Revised to “pull” content into the Canonical Structure
IT lesson learned
• Dataflow needs to be “template-driven”
• Template can incorporate
– Rules for automatic loading
– Adaptive choice among multiple alternatives
• Maintain flexibility as project evolves
– Team members negotiate content of template
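The “pull” approach can be sketched as a template that lists candidate legacy filename patterns for each canonical category, in order of preference. The loader takes the first alternative that matches anything. This is a hypothetical illustration: the category names, patterns and file names are made up and are not the actual CCDS layout.

```python
from fnmatch import fnmatchcase

# Hypothetical template: canonical category -> legacy filename patterns,
# most preferred first. Real CCDS categories and legacy names differ.
TEMPLATE = {
    "multibeam": ["*_processed.mb*", "*.mb*"],
    "navigation": ["*.nav"],
    "documents": ["*.pdf", "*.txt"],
}


def pull_into_canonical(legacy_files, template=TEMPLATE):
    """Pull legacy files into canonical categories using a template.

    For each category the patterns are tried in order; the first pattern
    that matches any unclaimed file wins, and only its matches are kept.
    This is the "adaptive choice among alternatives": a processed file is
    preferred, and a raw one is used only when nothing better exists.
    Files that no category pulls are returned separately for review.
    """
    claimed = set()
    canonical = {}
    for category, patterns in template.items():
        chosen = []
        for pattern in patterns:
            # Lower-case names so matching ignores inconsistent legacy case.
            chosen = [f for f in legacy_files
                      if f not in claimed and fnmatchcase(f.lower(), pattern)]
            if chosen:
                break
        canonical[category] = sorted(chosen)
        claimed.update(chosen)
    leftovers = sorted(f for f in legacy_files if f not in claimed)
    return canonical, leftovers


canonical, leftovers = pull_into_canonical(
    ["LEG1_processed.mb", "leg1.mb", "ship.nav", "report.pdf"])
```

In this example the processed multibeam file is pulled in and the raw `leg1.mb` is left over. Changing behaviour means editing the template, not the loader, which is what lets team members negotiate the template's content.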
Fourth, load the data
Persistent data archive management
• Use the “Storage Resource Broker”
– San Diego Supercomputer Center product
Fifth, load the metadata
• Harvest metadata from data files automatically
• Provide tools for metadata editing
• Load into Oracle
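Automatic harvesting can be as simple as scanning navigation fixes for a cruise's spatial and temporal extent, which can then fill coverage fields in a metadata record. Below is a minimal sketch under stated assumptions:

- Fixes are already parsed into (ISO timestamp, latitude, longitude) tuples.
- The output field names are hypothetical.
- The Oracle loading step is not shown.

The longitude handling is a heuristic, needed because a New Zealand to Samoa track crosses the 180-degree meridian.

```python
from datetime import datetime


def lon_extent(lons):
    """West/east bounds, choosing the tighter of two longitude conventions.

    Heuristic: compare the span in [-180, 180] with the span in [0, 360).
    If the second is tighter, the track crosses the 180-degree meridian
    and the result has west > east.
    """
    plain = (min(lons), max(lons))
    wrapped = [lon % 360 for lon in lons]
    shifted = (min(wrapped), max(wrapped))
    if shifted[1] - shifted[0] < plain[1] - plain[0]:
        return tuple(lon - 360 if lon > 180 else lon for lon in shifted)
    return plain


def harvest_extent(fixes):
    """Spatial and temporal coverage from (iso_time, lat, lon) fixes."""
    if not fixes:
        raise ValueError("no navigation fixes")
    times = [datetime.fromisoformat(t) for t, _, _ in fixes]
    lats = [lat for _, lat, _ in fixes]
    west, east = lon_extent([lon for _, _, lon in fixes])
    return {
        "start": min(times).isoformat(),
        "end": max(times).isoformat(),
        "south": min(lats),
        "north": max(lats),
        "west": west,
        "east": east,
    }


# Made-up fixes for illustration only, not real cruise navigation.
fixes = [
    ("2002-03-07T06:00:00", -43.6, 172.7),
    ("2002-03-10T12:00:00", -51.0, 178.0),
    ("2002-03-18T00:00:00", -25.0, -175.0),
    ("2002-03-21T18:00:00", -13.8, -171.8),
]
extent = harvest_extent(fixes)
```

For these fixes the heuristic reports a west bound of 172.7 and an east bound of about -171.8. This is a narrow box across the dateline, not one spanning nearly the whole globe.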
Building a Collection Developer’s Toolkit
Collection Developer’s Toolkit
Make it easy to build and maintain
• Not just for IT experts
Portable and scalable for other projects
Integrate
• Metadata tools
• Data tools
• Interactive search and display console
Make use of existing resources
Alexandria Digital Library
• Geospatial content
• OAI-compliant server
Environmental data archive and delivery tools
• John Helly, http://ceed.sdsc.edu/
Storage Resource Broker
• http://www.npaci.edu/DICE/SRB/index.html/
Domain-specific toolkits
• GMT, MB-System, ARC/IMS
Build metadata tools
Automate
• Bulk harvesting from data files
• Bulk loading into Oracle database
Use NSDL community standards
• Dublin Core + “ADN” metadata
– Alexandria Digital Library (UCSB)
– DLESE (Digital Library for Earth System Education)
– NASA
• Controlled vocabularies
– Science themes
– Geographic names
Embed domain-specific metadata into standards
• Multibeam, cruise, sampling
MOBE
• Metadata Object Browser and Editor
• Inherit metadata from
– Dublin Core
– Cruise
• Flexible
– Expand for projects as needed
– Generic ASCII metadata interchange format “MIF”
– Export to XML
• Java
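The slides do not describe the layout of the ASCII “MIF” format. The sketch below therefore assumes a hypothetical one-field-per-line `name = value` syntax. It shows how such a generic interchange file could be exported to XML with Python's standard `xml.etree.ElementTree`. MOBE itself is written in Java, and this is not its code.

```python
import xml.etree.ElementTree as ET


def mif_to_xml(mif_text: str, root_tag: str = "metadata") -> str:
    """Convert hypothetical 'name = value' lines into a flat XML document.

    Blank lines and lines starting with '#' are skipped. Field names go in
    an attribute rather than the tag, so any name yields well-formed XML.
    """
    root = ET.Element(root_tag)
    for raw in mif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'name = value' line: {raw!r}")
        field = ET.SubElement(root, "field", name=name.strip())
        field.text = value.strip()
    return ET.tostring(root, encoding="unicode")


example = """# cruise-level metadata (illustrative)
cruise_id = AVON02MV
vessel = Melville
"""
xml_text = mif_to_xml(example)
```

Storing names as attributes keeps the exporter generic. Projects can then add fields as needed without changing an XML schema.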
Search interface
Design for alternative approaches
• Geospatial
– Lat, lon
• Temporal
– “1995-2000”
• Keyword
– Region: “Samoa”
– Vessel: “Melville”
– Cruise: “AVON02MV”
– Data type: “dredge”
– Scientist: “Staudigel”
• Expert level
– Researcher, teacher, student, public
Prototype search interface
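The geospatial, temporal and keyword criteria can be combined in a conjunctive filter over catalog records. The record fields, the AND semantics and the sample entries below are assumptions made for illustration. The coordinates and keyword sets are made up, and the slides do not describe how the prototype actually processes queries.

```python
from dataclasses import dataclass, field


@dataclass
class CruiseRecord:
    """Minimal, hypothetical catalog entry for one cruise leg."""
    cruise_id: str
    vessel: str
    start_year: int
    end_year: int
    south: float
    north: float
    west: float
    east: float
    keywords: set = field(default_factory=set)  # region, data type, scientist, ...


def search(records, bbox=None, years=None, keywords=()):
    """Return records meeting every criterion supplied (logical AND).

    bbox is (south, north, west, east) in degrees and, like the records,
    must not cross the 180-degree meridian. years is (first, last).
    Keywords match cruise ID, vessel or keyword set, ignoring case.
    """
    hits = []
    for rec in records:
        if bbox is not None:
            south, north, west, east = bbox
            # Two boxes overlap unless one lies wholly beyond the other.
            if (rec.north < south or rec.south > north
                    or rec.east < west or rec.west > east):
                continue
        if years is not None:
            first, last = years
            if rec.end_year < first or rec.start_year > last:
                continue
        terms = {rec.cruise_id.lower(), rec.vessel.lower()}
        terms |= {k.lower() for k in rec.keywords}
        if all(k.lower() in terms for k in keywords):
            hits.append(rec)
    return hits


# Made-up entries for illustration only.
catalog = [
    CruiseRecord("AVON02MV", "Melville", 2002, 2002,
                 -20.0, -10.0, -175.0, -168.0, {"Samoa", "dredge"}),
    CruiseRecord("EXAMPLE97", "Revelle", 1997, 1997,
                 10.0, 20.0, -120.0, -110.0, {"multibeam"}),
]
melville_dredges = search(catalog, keywords=["Melville", "dredge"])
```

Each criterion is optional, so the same function can serve a quick keyword query from the public or a combined region-and-period query from a researcher.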
CruiseViewer
• Interactive browser and query interface
• Display tracks and samples
• Download library objects
• Java
Manage interfaces for multiple projects
• Both data and metadata
Lessons learned (so far)…
Make it easier to collaborate
Interactions between groups
• Not just a technology project
• Diverse goals, vocabularies and audiences
Interoperate
• Each domain has its own sphere of responsibility
– Don’t engineer someone else’s domain
• Work through interfaces
– Re-negotiate as needed
– Avoid long-term maintenance headaches between domains
Build tools for collaborative projects
3 “cultures” in this project
• Oceanographers
• Computer scientists
• Librarians
Example: bridge vocabularies between separate domains
• Use metadata “triples,” not “pairs”
• Reduce phone calls by including a narrative label
Parameter name: science_themes
Value: geochemistry, marine geology, marine geophysics, hot spots, mantle plumes, geochronometry, seamount chains
Narrative label: keywords, from controlled vocabulary of science terms, selected from the “SIOExplorer Science Theme” template
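A metadata triple can be represented as a small record whose narrative label travels with the value. The type and the rendering function below are a hypothetical sketch. Only the example values are copied from the slide.

```python
from typing import NamedTuple


class MetadataTriple(NamedTuple):
    """Parameter name, value, and a narrative label readable across domains."""
    parameter: str
    value: str
    label: str


def render(triples):
    """Lay out each triple with its label, so no phone call is needed."""
    return "\n".join(f"{t.parameter} = {t.value}\n    # {t.label}" for t in triples)


science = MetadataTriple(
    parameter="science_themes",
    value="geochemistry, marine geology, marine geophysics, hot spots, "
          "mantle plumes, geochronometry, seamount chains",
    label="keywords, from controlled vocabulary of science terms, selected "
          'from the "SIOExplorer Science Theme" template',
)
print(render([science]))
```

A plain name/value pair would carry only the first two fields. The third field is what lets a librarian or computer scientist interpret an oceanographer's parameter name.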
Adding new projects to SIOExplorer
Make use of
• Collection Developer’s Toolkit
• NSDL server
• Metadata interchange
• Query processing
• SDSC
– Managed storage
– Web service
Test the prototype
Melville departs Lyttelton harbor
Floating Digital Library Workshop
R/V Melville
• March 7-21, New Zealand to Samoa
Realtime acquisition of library objects?
• Load metadata into swath files
– At acquisition time
• Specify cruise metadata
• Sensor documentation database
• Load the CCDS
Learn from a common experience
A good day at 51° S
Renewed appreciation for the collection of field data
Common experience
• Librarians
• Computer scientists
• Oceanographers
• Royal New Zealand Navy
Melville in Lyttelton
Collaboration between SIO and RNZN
Floating Digital Library Workshop
Librarian at sea
Computer scientist in galley
Oceanographer holding onto computer
Bollons Gap survey
New Zealand Law of the Sea claim
Librarian at sea
Visualization of swath bathymetry, looking north
Heading for Samoa
Crossing the
• Louisville Ridge
• Tonga Trench
• Osbourn Trough (ancient spreading center)
Visualization of Global Topography, looking north
Relate cruise to SIO holdings
Display search results
• Red: SIO multibeam
• Black: other cruises
• Yellow: SIO dredged rock samples
• Also: volcanoes, earthquakes, plate boundaries
Typical research support product
• Make it available on the web
• Select cruises for further study
• Export for ArcView
– Related NSF/ITR project
Data Publishing Toolkit for Digital Library Interoperability: Integrating the Albatross Cruise Holdings into SIOExplorer
NSF Division of Biological Infrastructure
Collaboration with Smithsonian Institution
Biogeography and Geology of the Oceans: SIO Collections Gateway for the NSDL
NSF NSDL Collections Track
Track of the Albatross, 1884-1921
Next steps
SIOExplorer: Expedition Planner
Open research data for student discovery
• Leverage Digital Library efforts
• Students design a virtual expedition
– Explore relationships
– Depth, sediment thickness, crustal age
– More …
– Earthquakes, volcanoes, trenches
– Wind, waves, currents
– Climate
• Students publish expedition report
– On the web
• Teacher workshops
– At the Birch Aquarium
Crustal Age
Sediment thickness
Global Topography
SIO 100th Anniversary
September 26, 2003
SIO, 1909
http://SIOExplorer.ucsd.edu
R/V Alexander Agassiz, 1907