Collection and Preservation of At-Risk Digital Geospatial Data: NDIIPP Project Update on the NC Geospatial Data Archiving Project (NCGDAP) Steven P. Morris North Carolina State University Libraries DLF Fall Forum – NDIIPP Roundtable November 8, 2006


Collection and Preservation of At-Risk Digital Geospatial Data:

NDIIPP Project Update on the NC Geospatial Data Archiving Project (NCGDAP)

Steven P. Morris
North Carolina State University Libraries

DLF Fall Forum – NDIIPP Roundtable November 8, 2006


Note: Percentages based on the actual number of respondents to each question

NC Geospatial Data Archiving Project

• Partnership between university library (NCSU) and state agency (NCCGIA)
• Focus on state and local geospatial data in North Carolina (state demonstration)
• Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
• Objective: engage existing state/federal geospatial data infrastructures in preservation
• Project approaches: Technical and Social
• $520,000 over 3 years

Serve as catalyst for discussion within industry


Risks to State/Local Geospatial Data

• Producer focus on current data
  • Data overwrite as common practice
• Future support of data formats in question
  • No open, supported format for vector data
• Shift to web services-based access
  • Data becoming more ephemeral
• Inadequate or nonexistent metadata
  • Impedes discovery and use
• Increasing use of spatial databases for data management
  • The whole is greater than the sum of the parts


Different Ways to Approach Preservation

Technical solutions: How do we archive acquired content over the long term?

• Build a data repository: not as an end in itself but as a catalyst for discussion within the data community
• Develop a repository ingest workflow: create technical points of engagement with the digital preservation community


Different Ways to Approach Preservation

Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be archived—from point of production?

• Engage data producer community and spatial data infrastructure through outreach and engagement; influence practice
• Sell the problem to software vendors and standards development
• Find overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc.
• Start a discussion about roles at the local, state, and federal level


NCGDAP Technical Approach

• Receive data as is: variety of distribution methods
• Migration of some at-risk formats
• Metadata remediation, normalization, and synchronization
• Distilling complex objects into repository ingest items (not easy)
• Using DSpace for demonstration purposes (keeping repository platform at arm's length)
• In development: use METS record as dormant item “brain” within the repository

Some unsustainable activities – for learning experience


Building Data Bundles: The Zip Codes Example


Where is the Dataset?


Here’s One!

Files

• Multi-file dataset
• Georeferencing
• Metadata file
• Symbolization file
• Additional documentation
• License
• Disclaimer
• More

Metadata

• FGDC
• Acquisition metadata
• Transfer metadata
• Ingest metadata
• Archive rights
• Archive processes
• Collection metadata
• Series metadata
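
The bundle described above (many files plus several layers of metadata) can be sketched as a small manifest structure. This is only an illustration, not the NCGDAP implementation: the class names `DataBundle` and `BundleFile`, the role labels, and the JSON layout are all hypothetical. MD5 is used for the fixity value because it is one of the ingest tools named elsewhere in this talk.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class BundleFile:
    """One file in a bundle, with a role label and an MD5 fixity value."""
    name: str
    role: str  # hypothetical labels, e.g. "data", "metadata", "license"
    md5: str


@dataclass
class DataBundle:
    """Hypothetical container for everything that makes up one dataset."""
    title: str
    files: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # e.g. acquisition, ingest, rights

    def add_file(self, path: Path, role: str) -> BundleFile:
        # Record the checksum at acquisition time so later copies can be verified.
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        entry = BundleFile(name=path.name, role=role, md5=digest)
        self.files.append(entry)
        return entry

    def to_json(self) -> str:
        # Serialize the whole bundle description as a single manifest document.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A caller would create one `DataBundle` per dataset, call `add_file` for each component file, and store the `to_json()` output next to the files. In a real workflow this role would more likely be played by a METS record than by ad hoc JSON.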


Hub-and-Spoke Metadata Workflow


Hub-and-Spoke Metadata Workflow
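
The workflow diagrams for these two slides did not survive extraction. As a rough illustration of the general hub-and-spoke idea only, the sketch below converts each metadata format to and from one shared "hub" record, so that N formats need about 2N converters instead of one for every pair. The field names, the two toy formats, and the `convert` function are assumptions for illustration; they are not the NCGDAP or UIUC/OCLC implementation.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# Spoke -> hub converters. The hub record uses hypothetical keys
# "title" and "description".
def fgdc_to_hub(rec: Record) -> Record:
    return {"title": rec["title"], "description": rec["abstract"]}

def dc_to_hub(rec: Record) -> Record:
    return {"title": rec["dc:title"], "description": rec["dc:description"]}

# Hub -> spoke converters.
def hub_to_fgdc(hub: Record) -> Record:
    return {"title": hub["title"], "abstract": hub["description"]}

def hub_to_dc(hub: Record) -> Record:
    return {"dc:title": hub["title"], "dc:description": hub["description"]}

TO_HUB: Dict[str, Callable[[Record], Record]] = {
    "fgdc": fgdc_to_hub,
    "dc": dc_to_hub,
}
FROM_HUB: Dict[str, Callable[[Record], Record]] = {
    "fgdc": hub_to_fgdc,
    "dc": hub_to_dc,
}

def convert(record: Record, source: str, target: str) -> Record:
    """Convert between any two registered formats by way of the hub record."""
    hub = TO_HUB[source](record)
    return FROM_HUB[target](hub)
```

Supporting a new format in this arrangement means registering one function in `TO_HUB` and one in `FROM_HUB`; the existing converters are left alone.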


Metadata: Going Beyond a Passive Role

• Feedback to the NC OneMap Metadata Outreach Program vis-à-vis metadata quality problems encountered in repository ingest
• Engage standards body (Open Geospatial Consortium, OGC) in discussions about:
  • content packaging standards for geospatial data
  • better practices for time-versioned data
  • persistent identifier schemes
  • contributing archive use cases to GeoDRM
• Meetings with major software vendor development teams


Social Issues: Changing Industry Thinking

• Is the geospatial industry “temporally-impaired?”
  • Lack of access to older data
  • Lack of tool/model support for temporal analysis
  • Metadata: poor support for changing data
  • Education: building class projects around available data (i.e., not temporal)
• Increased interest now in temporal applications?
  • Increased demand for temporal data?
  • Improved tool support: ArcGIS 9.2 animation tools; Geodatabase History, etc.

IMPORTANT: Gathering business cases for using older data


Social Issues: Content Exchange Networks

• Solving the present-day problems of data sharing is a prerequisite to solving the problem of long-term access
• Leveraging more compelling business problems: disaster preparedness and business continuity needs can put the data in motion (siphon off to the archive)
• Geospatial data: large data volumes, frequent data update, complex datasets, ambiguous rights
• Content exchange network technical challenges:
  • Rights management
  • Large-scale transfers on network
  • Content packaging (MPEG-21 DIDL, XFDU, METS, …)


Content Issues: Frequency of Capture Survey

Survey objective:
• Document current practices for obtaining archival snapshots of county/municipal geospatial vector data layers
• Seek guidance about frequency of capture

Survey topics:
• General questions about data archiving practice
• Specific questions about parcels, street centerlines, jurisdictional boundaries, and zoning

Survey subjects:
• All 100 counties and 25 municipalities (58% response rate)
• Survey conducted September 2006

Added benefit: Survey socialized the preservation issue


Content Issues: What About Commercial Data?

Project Status Cultivating a commercial market for older data.

Part of “permanent access” is marketing, advertising, and putting older data into the path of the user


New Challenges: “Platial” vs. Spatial Imagery

• Mobile, LBS, and social networking applications drive demand for place-based data
• Example sources:
  • Oblique imagery
  • Street-view imagery (e.g., A9.com)
  • Transportation Dept. videologs
• Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function

Emerging: “Tricorder” applications


New Challenges: Ajax Applications, Google Earth and All That

• Emerging online environments are increasingly used to make decisions. How are these decisions documented?
• Web mashup/AJAX interactions with existing systems spur creation of intermediate content layers: e.g., tiling and caching of WMS services
• Formulation of a standard tiling scheme may create a new preservation opportunity (temporal axis on caches?)
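
One way to picture the "temporal axis on caches" idea is a tile cache whose keys carry a capture date alongside the usual zoom/column/row address. The sketch below is an assumption-laden illustration, not something proposed in the talk: it uses the widely used Web Mercator XYZ ("slippy map") tile addressing as a stand-in for whatever standard tiling scheme might emerge, and the key layout and the function names `lonlat_to_tile` and `tile_cache_key` are hypothetical.

```python
import math
from datetime import date


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Return the (x, y) tile index for a point in the Web Mercator XYZ scheme."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_cache_key(layer: str, zoom: int, x: int, y: int, captured: date) -> str:
    """Build a cache key with a temporal component (hypothetical layout)."""
    return f"{layer}/{captured.isoformat()}/{zoom}/{x}/{y}.png"
```

With keys of this shape, each capture date forms its own complete tile pyramid, so an archive could keep dated snapshots of a cached service without reconstructing the underlying source data.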


Working with the NDIIPP Network

• Partners meetings provide opportunities to cross-fertilize with other efforts
• Cross-fertilization examples:
  • Maturing thinking about metadata transformations (inspiration from the UIUC/OCLC “hub and spoke” model)
  • Stanford work with METS/PREMIS/FGDC informs NCGDAP metadata strategy
  • NDIIPP-wide discussions about mutual use of tools in ingest workflow (JHOVE, ClamAV, noid, MD5, etc.)
  • Discussions about repository exchange issues
• Affiliation with a national preservation effort helps get traction in attracting additional partners


Working with New Partners

• State Archives now an informal member of the NCGDAP project
• Collaboration with NARA
• Working with the Open Geospatial Consortium on standards issues
• Associate Partnership with JISC-funded UK-wide project
• Site visits with ESRI (major software vendor) development groups
• Participation in a variety of content exchange network activities
• More …


Next Steps

• Working with NARA and the OGC Interoperability Institute to develop an OGC Data Preservation Working Group charter
• Evaluating results for the frequency of capture survey
• Stepping up data acquisition and repository ingest
• Evaluating initial data acquisition efforts (time factors, content variety, technical/legal barriers)
• Partnership with content exchange network activities
• Ramping up partnerships with broader (non-geospatial) data repository efforts


Questions?

Contact:

Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) [email protected]

http://www.lib.ncsu.edu/ncgdap