A Data Management Life-Cycle
By
David Ferderer, Project Chief
Chris Skinner, Contractor
Greg Gunther, Contractor
Presentation Outline
• USGS Landscape
• Life-Cycle Model and Strategy
• Component Descriptions (Skinner)
• Demonstration (Gunther)
• Conclusions and Future Directions
USGS Landscape - Energy Program
• What We Do
– Provides Science-Based Energy Assessments
• Organization Issues
– Regional Centers and Competitive Funding Process
– Multiple Project Areas, Applications, Data Types, and Platforms
• Information Issues
– Technology and Data Explosion
– Access, Delivery, and Archive Requirements
– Diverse Client and Product Needs
• Policy and Mandates
USGS Landscape - Central Energy Team
• 125 Full- and Part-Time Employees
– Independent Thinkers and Researchers
• Multiple Application Platforms
– UNIX (ArcInfo 8, ArcView 3.x, SDE 3, Oracle 8, EarthVision, Seismic, PetroMod)
– PC/NT (ArcInfo 8, ArcView 3.x, GeoGraphix)
• Centralized and Distributed Data Storage
• 100 Mbps Fast Ethernet Network
Central Energy Team “Information” Shift
[Diagram: Data Management, Information Services, GIS, Project Life-Cycle, and Integration]
Life-Cycle Model and Strategy
• Life-Cycle Model (Conceptual)
– A Series of Processes and Utilities that Manage the Flow of Data to Information, Products, and Knowledge
• Life-Cycle Implementation Strategy (Actual)
– Processes are Translated into the Find, Get, Use, Deliver, and Maintain Strategy
– Strategy Defines Tasks, Components, and Deliverables
Implementation Strategy
• Data Management (DM) Finds Internal and External Data Resources
• DM Gets the Data Organized, Documented, and Accessible to Team Projects
• Projects Use the Data and Other Resources in Research
• DM Assists Projects in Delivering Products to Public
• DM Maintains the System and Upgrades Components
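The Find, Get, Use, Deliver, and Maintain sequence can be modeled as a simple repeating cycle. The following is a minimal Python sketch, not part of the team's system: the `Stage` enum and its `lead` and `next` members are hypothetical names, and the lead roles are taken from the bullets above. Maintain wraps around to Find, so each pass begins a new cycle.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical model of the Find/Get/Use/Deliver/Maintain strategy.

    Each value is (order, lead role). Tuple values keep members distinct;
    plain role strings would turn repeated roles into enum aliases.
    """
    FIND = (1, "Data Management")
    GET = (2, "Data Management")
    USE = (3, "Projects")
    DELIVER = (4, "Data Management with Projects")
    MAINTAIN = (5, "Data Management")

    @property
    def lead(self) -> str:
        """Role that leads this stage, per the strategy bullets."""
        return self.value[1]

    def next(self) -> "Stage":
        """Return the following stage; MAINTAIN wraps around to FIND."""
        members = list(Stage)
        return members[(members.index(self) + 1) % len(members)]
```

For example, `Stage.USE.lead` is `"Projects"`, and `Stage.MAINTAIN.next()` returns `Stage.FIND`.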
Strategy Components and Utilities (Internal USGS)
[Diagram: Data Life-Cycle]
• Stages
– Find External Data and Information
– Find Internal Data and Information (Archive and Reuse)
– Get Data Organized
– Use Data and Other Resources in Research Projects
– Deliver Data and Knowledge to Projects and the Public
– Maintain (Upgrades and Documentation)
• Components
– Team Data Library
– Archive Library
– Inventory Database
– Metadata Utilities
– Data Processing Utilities
– Project Design
– Intranet Resources
– Hypermedia Publications
– CD-ROM Templates
Team Data Library
• Centralized Storage
– Team Data Resources (primarily spatial)
– Theme and Sub-Theme Organization
• Standardized
– Naming Conventions
– Directory Structure
– Storage Formats (e00, Shapefile, SDE)
– Common Data Projection (geographic)
– Metadata
– Browse Graphics
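The slides list standardized naming conventions, a theme and sub-theme directory structure, and fixed storage formats without spelling out the rules. A minimal Python sketch of how such a convention could be enforced follows; the `root/theme/sub-theme/dataset.ext` layout, the lowercase-with-underscores rule, and the helper names `normalize_name` and `library_path` are assumptions for illustration, not the team's documented convention.

```python
import re
from pathlib import PurePosixPath

# Storage formats named on the slide; the extension mapping is an assumption.
# SDE layers are stored in the database rather than as files, so they map to None.
FORMAT_EXTENSIONS = {"e00": ".e00", "shapefile": ".shp", "sde": None}


def normalize_name(name: str) -> str:
    """Hypothetical naming rule: lowercase, runs of non-alphanumerics become '_'."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", name.strip().lower())
    return cleaned.strip("_")


def library_path(root: str, theme: str, subtheme: str,
                 dataset: str, fmt: str) -> PurePosixPath:
    """Build a path of the assumed form root/theme/subtheme/dataset.ext."""
    if fmt not in FORMAT_EXTENSIONS:
        raise ValueError(f"unsupported storage format: {fmt}")
    ext = FORMAT_EXTENSIONS[fmt]
    if ext is None:
        raise ValueError("SDE layers live in the database, not in the file tree")
    return PurePosixPath(root, normalize_name(theme), normalize_name(subtheme),
                         normalize_name(dataset) + ext)
```

For example, `library_path("/data/library", "Geology", "Faults & Folds", "Basin Faults 1998", "shapefile")` yields `/data/library/geology/faults_folds/basin_faults_1998.shp`.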
Team Archive Library
• Offline Storage of Team Data Resources
• Contains
– Publications
– USGS Digital Data Products (DLG, DEM, DOQ)
– Team Archives
• Standardized File Names and Directory Structure
Inventory Database
• Microsoft Access Database Tracking the Team’s Data Holdings
• Contains
– 60 Information Fields (10 Required) in 21 Tables
– 28 Fields Corresponding to FGDC Metadata Elements
– Inventoried 4600 Datasets and 680 Archives (> 500 GB)
Inventory Database
• Features
– Tracks Multiple Types of Data (Spatial, Text, Graphic, and Tabular)
– Separately Tracks Archives, Publications, and Individual Datasets
– Automatic Loading and Editing Scripts
– Serves as the Engine to DART…
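The inventory defines 60 fields, 10 of them required, but the slides do not list them. The Python sketch below shows one way a loading script might check required fields before accepting a record; the field names, the `InventoryItem` class, and `missing_required` are hypothetical, and only five stand-in required fields are shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ItemKind(Enum):
    """Holdings the inventory tracks separately."""
    DATASET = "dataset"
    ARCHIVE = "archive"
    PUBLICATION = "publication"


class DataType(Enum):
    """Data types named on the slide."""
    SPATIAL = "spatial"
    TEXT = "text"
    GRAPHIC = "graphic"
    TABULAR = "tabular"


# Hypothetical stand-ins; the real database defines 10 required fields.
REQUIRED_FIELDS = ("title", "originator", "kind", "data_type", "location")


@dataclass
class InventoryItem:
    title: str
    originator: str
    kind: ItemKind
    data_type: DataType
    location: str                     # file path or archive volume (assumed)
    abstract: Optional[str] = None    # optional field (assumed)
    size_mb: Optional[float] = None   # optional field (assumed)


def missing_required(item: InventoryItem) -> List[str]:
    """Return the names of required fields that are empty or None."""
    return [name for name in REQUIRED_FIELDS if not getattr(item, name)]
```

A record with an empty `originator` would be reported as `["originator"]` and held back for editing instead of being loaded.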
DART
• Data Access, Retrieval, and Tracking System
– Easy Access to Team Data Resources via Web Browsers
– Customized Search and Browse of Archives, Publications, and Datasets
– Direct Data and Metadata Download to User’s Desktop
– Object-Oriented Application
– JavaServer Pages (JSP) on ServletExec 3.1
– Stay Tuned for the Demonstration!
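DART itself is a JSP application; the Python sketch below only illustrates the kind of query that could sit behind a customized search, using an in-memory SQLite table as a stand-in for the Access inventory. The table layout, column names, function names, and sample rows are invented for illustration. Parameterized `?` placeholders keep user search terms out of the SQL text.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_inventory() -> sqlite3.Connection:
    """Create an in-memory stand-in for the inventory (SQLite, not MS Access)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
        "title TEXT NOT NULL, keywords TEXT, path TEXT)"
    )
    # Invented sample rows for illustration only.
    conn.executemany(
        "INSERT INTO items (kind, title, keywords, path) VALUES (?, ?, ?, ?)",
        [
            ("dataset", "Basin Faults", "geology faults", "/library/geology/faults.shp"),
            ("archive", "Project Archive A", "geology wells", "archive_a"),
            ("publication", "Assessment Report", "assessment", "pubs/report.pdf"),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, term: str,
           kind: Optional[str] = None) -> List[Tuple[str, str]]:
    """Case-insensitive match on title or keywords, optionally limited to one kind."""
    pattern = f"%{term.lower()}%"
    sql = ("SELECT title, path FROM items "
           "WHERE (lower(title) LIKE ? OR lower(keywords) LIKE ?)")
    params: List[str] = [pattern, pattern]
    if kind is not None:
        sql += " AND kind = ?"
        params.append(kind)
    sql += " ORDER BY title"
    return conn.execute(sql, params).fetchall()
```

With the sample rows, `search(conn, "geology", kind="archive")` returns `[("Project Archive A", "archive_a")]`; the returned path is what a download link would point to.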
Metadata Utilities
• Web-Based Metadata Entry and Creation System
– Users Generate, Modify, and Save FGDC-Compliant Metadata Output to the Desktop
– Provides a Simplified and Comprehensive Online Help System
• Contains
– Links to Other Metadata Tools and Resources
– Library of Metadata
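At its core, a metadata entry tool collects form input and writes it out as FGDC Content Standard for Digital Geospatial Metadata (CSDGM) elements. The Python sketch below builds only the citation and description portion of the identification section using the standard library's `xml.etree.ElementTree`; the function name is hypothetical, and the output is a fragment, since a compliant record requires many more elements (for example, time period of content, status, keywords, and metadata reference information).

```python
import xml.etree.ElementTree as ET


def partial_fgdc_record(origin: str, pubdate: str, title: str,
                        abstract: str, purpose: str) -> str:
    """Build a partial CSDGM record: idinfo/citation and idinfo/descript only.

    Not a compliant record on its own; many mandatory sections are omitted.
    """
    metadata = ET.Element("metadata")
    idinfo = ET.SubElement(metadata, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    for tag, value in (("origin", origin), ("pubdate", pubdate), ("title", title)):
        ET.SubElement(citeinfo, tag).text = value
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract
    ET.SubElement(descript, "purpose").text = purpose
    # ElementTree escapes characters such as '&' in text content.
    return ET.tostring(metadata, encoding="unicode")
```

Generating the XML from code rather than by hand keeps element nesting and escaping consistent, which is the main source of errors in hand-edited metadata.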
Other Data Management Products
• Data Processing and Automation Utilities
– Portal to ‘How-To’ Documents, AML (ARC Macro Language) Scripts, and FAQs Residing Within the Team and on the WWW
• Project and Workspace Design Recommendations
– Templates Promote Efficient Workflow, Data Organization, Archiving, and Rapid Publication
• CD-ROM Templates and Hypermedia Distribution
Maintenance
• DM Provides Continual Maintenance and Upgrades of System Components
• Develop Publications and Documentation
– User Manuals
– Formal Component Documentation
– Templates, Guidelines, and Policies
– Fact Sheets and Bulletins
Demonstration
Greg Gunther
System Summary
• Easy Access to Datasets
• Generate Metadata Quickly and Easily
• Find External Data with Over 1000 WWW Links
• Simplify Data Processing Tasks
• Organize Projects with Workspace Templates
• Streamline CD-ROM Publications
• Provide One-Stop Shopping for Shared Internal Resources
Future Directions
• Increase Inventory Effort
• Integrate GeoDatabase Model (ArcGIS) for Proprietary Datasets
• Formalize Metadata Extension to FGDC Standard
• Streamline Product Delivery - Implement IMS
• Publish Documented Tools and Utilities
• Implement Enterprise Architecture and Planning
Future Architecture
Enterprise Planning*
• Getting Started
– Planning & Initiatives
• Where We Are Today
– Business Processes
– Current Systems
• Where We Want To Be
– Data Architecture
– GIS & Application Architecture
– IS/IT Architecture
• Plan To The Future
– Implementation and Migration Plans
*Modified from Spewak Model
Conclusions – What We Have Learned
Data Management:
• It’s ESSENTIAL for Survival But Needs to be Promoted
• Distributed Projects REQUIRE Data Centralization
• Projects RARELY Account for Data Management Planning and Costs
• Data Stewardship MUST Begin at the Onset of Projects
• Tools That Are EASY and USEFUL Get Implemented
• Component Model Must Be FLEXIBLE to Adapt to Technology Trends
The End
And
The Beginning Of a New Cycle…