Geospatial Data Preservation Challenges at the Sub-National Level: The North Carolina Experience
Steve Morris, Head of Digital Library Initiatives, North Carolina State University Libraries
Cambridge Conference, July 18, 2007




Outline

• Project background
• Targeted geospatial content
• Risks to data
• Value in older data
• Challenges (technical and organizational)
• Solutions (?)
• Next steps


NC Geospatial Data Archiving Project

• Partnership between a university library (NCSU) and the NC Center for Geographic Information & Analysis
• Part of the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP)
• Focus on state and local geospatial content in North Carolina (state demonstration)
• Tied to the NC OneMap initiative, which provides seamless access to data, metadata, and inventories
• Objective: engage existing state/federal geospatial data infrastructures in preservation
• Serve as a catalyst for discussion within the industry


NCGDAP Goals

Repository goals:
• Capture at-risk data
• Explore technical and organizational challenges

Project end goals:
• Data producers: improved temporal data management practices
• Archives: more efficient means of acquiring and preserving data
• Progress toward best practices

Temporal data management vs. long-term preservation


Collection Focus: State and Local Government Geospatial Data

• 96 of 100 North Carolina counties have GIS systems, as do many municipalities
• Over 30 state agency data producers
• Exceptional value: detailed, current, accurate
• Exceptional risk: inconsistent or nonexistent archiving practices, complicated formats, and complex objects

Source: NC OneMap

Carrboro, NC: population 17,797 (2005 est.)

22 downloadable GIS data layers

3 OGC WMS services (web services)

10 web mapping applications

9 downloadable PDF map layers


NCGDAP Data Types – Vector GIS

• Cadastral (tax parcels)

• Street centerlines

• Zoning

• Topographic contours

• School, sheriff, fire

• Voting precincts

• More …

• County, municipal, state

• Detailed, accurate, current

• Frequently updated


NCGDAP Data Types – Digital Orthophotography

• All 100 NC counties have orthophotography
• 1-5 flight years per county
• 30-300 GB per flight
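These ranges imply a very wide spread in total imagery volume. A back-of-the-envelope sketch, assuming only that every county falls within the stated ranges (the actual per-county mix is not given on the slide), multiplies out the lower and upper bounds; the function name `ortho_storage_bounds_gb` is hypothetical.

```python
def ortho_storage_bounds_gb(counties, flights_per_county, gb_per_flight):
    """Return (low, high) total storage in GB from (min, max) ranges.

    A crude bound: `low` assumes every county has the fewest flight years and
    the smallest flights; `high` assumes the most flight years and the largest.
    """
    min_flights, max_flights = flights_per_county
    min_gb, max_gb = gb_per_flight
    low = counties * min_flights * min_gb
    high = counties * max_flights * max_gb
    return low, high


# Ranges from the slide: 100 counties, 1-5 flight years, 30-300 GB per flight.
low, high = ortho_storage_bounds_gb(100, (1, 5), (30, 300))
print(f"{low:,} GB to {high:,} GB")  # 3,000 GB to 150,000 GB
```

In decimal units that is roughly 3 TB to 150 TB of imagery, a fifty-fold spread, before counting any replicated or offsite copies.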

Note: Percentages based on the actual number of respondents to each question

NCGDAP Data Types – Cartographic

GIS software outputs:
• Software project files (.mxd, .apr, …)
• Data layer files (.avl, .lyr, …)
• PDF map exports
• Web services-based representations


Other Data Types – Place-based Data

• Oblique imagery
• Road videologs
• Tax Dept. photos
• Street View images

Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function

Mobile, location-based services (LBS), and social networking applications


Digital Preservation Points of Failure

Data can be lost at any of these points:
• Data is not saved
• Data can't be found
• Media is obsolete
• Media is corrupt
• Format is obsolete
• File is corrupt
• Meaning is lost

Solutions:
• Migration
• Emulation
• Encapsulation
• XML


Risks to Geospatial Data

• Producer focus on current data: overwriting data is common practice
• Future support of data formats in question: no open, supported format for vector data
• Shift to web services-based access: data becoming more ephemeral
• Inadequate or nonexistent metadata: impedes discovery and use
• Increasing use of spatial databases for data management: the whole is greater than the sum of the parts


Value in Older Data: Solving Business Problems

Example imagery: suburban development, 1993/2002, near the Mecklenburg-Cabarrus county border

• Land use change analysis
• Real estate trends analysis
• Site location analysis
• Disaster response
• Resolution of legal challenges
• Impervious surface maps


Value in Older Data: Cultural Heritage

Future uses of data are difficult to anticipate (as with Sanborn Maps)


Challenge: Vector Data Formats

• No widely supported, open vector formats for geospatial data
• Spatial Data Transfer Standard (SDTS): not widely supported
• Geography Markup Language (GML): the diversity of application schemas and profiles is a challenge for “permanent access”

Spatial databases:
• The whole is more than the sum of the parts, and the whole is very difficult to preserve
• Individual data layers can be exported for curation, but relationships and context are lost
• Some are considering using the spatial database itself as the primary archival platform


Challenge: Cartographic Representation

Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.


Challenge: Geospatial Web Services

• How can records of decision-making processes be captured?
• Possible approach: atlas collections from automated image capture
• Web 2.0 impact: emerging tiling and caching schemes (an archive target?)
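The slide leaves open which tiling scheme would be the archive target. As a sketch, assuming the widely used Web Mercator "XYZ" tile grid (the scheme popularized by Google Maps and OpenStreetMap, in which zoom level z has 2^z × 2^z tiles), the standard conversion below lists the tile addresses that cover a jurisdiction's bounding box, i.e. the set of cached images one would have to harvest at a given zoom level. The function names and the example bounding box are hypothetical.

```python
import math

# Web Mercator tiles stop at about +/-85.0511 degrees latitude.
MAX_LAT = 85.05112878


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 lon/lat (degrees) to XYZ tile column/row at a zoom level."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or lat = -MAX_LAT still falls inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List (zoom, x, y) for every tile touching a bbox (no antimeridian wrap)."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # row numbers grow southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


# Illustrative bounding box only (approximate, not an official boundary).
tiles = tiles_for_bbox(-79.12, 35.88, -79.03, 35.95, zoom=14)
print(len(tiles), "tiles to capture at zoom 14")
```

Because the tile count grows fourfold with each zoom level, a capture policy would likely have to choose which zoom levels are worth archiving rather than harvesting the whole pyramid.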


Challenge: Preservation Metadata

[Chart: "Metadata Archived?" Percentage of respondents (axis 0% to 70%) by metadata type: FGDC format, locally defined metadata, NC OneMap metadata starter block, none]

Results from a 2006 survey of all 100 NC counties and the 25 largest NC municipalities


Challenge: Data Capture

[Chart: "Jurisdictions Archiving Snapshots." Response: yes = 65.3%, no = 34.7%* (based on a 57.6% response rate)]

2006 Frequency of Capture Survey targeting North Carolina counties and municipalities
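Because the 65.3% applies only to jurisdictions that answered, the share of all surveyed jurisdictions known to archive snapshots is smaller. A quick check, assuming (this is not stated on the slide) that the capture survey went to the same 125 jurisdictions as the metadata survey, shows that the published percentages are consistent with whole-number counts; the counts below are derived, not reported, and `survey_counts` is a hypothetical helper.

```python
def survey_counts(total_surveyed, response_rate, yes_share):
    """Back out respondent counts from published percentages."""
    respondents = round(total_surveyed * response_rate)
    yes = round(respondents * yes_share)
    no = respondents - yes
    return respondents, yes, no


# Assumption: 100 counties + 25 largest municipalities, as in the metadata survey.
TOTAL = 100 + 25
respondents, yes, no = survey_counts(TOTAL, 0.576, 0.653)
print(respondents, yes, no)                       # 72 47 25
print(f"{yes / respondents:.1%} of respondents")  # 65.3% of respondents
print(f"{yes / TOTAL:.1%} of all surveyed")       # 37.6% of all surveyed
```

Under that assumption, roughly 38% of all surveyed jurisdictions are known to keep snapshots; the practices of the non-respondents are simply unknown.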


Data Capture Survey Results: Overview

• Two-thirds of responding agencies create and retain periodic snapshots
• Long-term retention is more common in counties with larger populations
• Storage environments vary, with servers and CD-ROMs most common
• Nearly half of the respondents use offsite storage (or both onsite and offsite)
• The popularity of historic images has led one-third of the respondents to scan and georeference hardcopy aerial photos
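A periodic snapshot can be as simple as a dated copy that is never overwritten, paired with a checksum manifest so that later corruption (the "file is corrupt" point of failure) can be detected. A minimal sketch, assuming a data layer stored as files in one directory; the directory layout, manifest filename, and function names are hypothetical, and a repository workflow such as DSpace ingest would record fixity in its own metadata instead.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path

MANIFEST = "manifest-sha256.txt"  # hypothetical manifest filename


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large rasters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_snapshot(layer_dir: Path, archive_root: Path, when: date) -> Path:
    """Copy a layer into archive_root/<layer>/<YYYY-MM-DD>/ and write a manifest."""
    target = archive_root / layer_dir.name / when.isoformat()
    if target.exists():
        # Refuse to overwrite: an earlier snapshot is a record, not a cache.
        raise FileExistsError(f"snapshot already exists: {target}")
    shutil.copytree(layer_dir, target)  # also creates missing parent folders
    files = sorted(p for p in target.rglob("*") if p.is_file())
    lines = [f"{sha256_of(p)}  {p.relative_to(target).as_posix()}" for p in files]
    (target / MANIFEST).write_text("\n".join(lines) + "\n")
    return target


def verify_snapshot(snapshot: Path) -> list[str]:
    """Return relative paths whose checksum no longer matches the manifest.

    A file that has been deleted raises FileNotFoundError rather than being listed.
    """
    mismatched = []
    for line in (snapshot / MANIFEST).read_text().splitlines():
        if not line:
            continue
        expected, rel_path = line.split("  ", 1)
        if sha256_of(snapshot / rel_path) != expected:
            mismatched.append(rel_path)
    return mismatched
```

For example, `capture_snapshot(Path("gis/parcels"), Path("archive"), date(2007, 7, 18))` would create `archive/parcels/2007-07-18/`; running `verify_snapshot` on a schedule then flags any file whose bytes have changed since capture.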


Solutions: Content Exchange Infrastructure

• The volume of state/federal requests for local data (“contact fatigue”) spurs a rethinking of archive strategy for data acquisition
• Leveraging more compelling business reasons to put the data in motion (disaster preparedness, highway construction, census, …)
• Content exchange networks:
  • Minimize the need to make contact
  • Add technical, administrative, and descriptive metadata
  • Establish rights and provenance


Informing and Leveraging Other Infrastructure

Orthophoto Data Distribution System

Efficient transfer of large quantities of imagery

Street Centerline Data Distribution System

Efficient transfer of data from 100 counties, with metadata and clarified rights

NC GIS Inventory

• Efficient data identification
• Adding preservation elements

NC OneMap Data Download and Viewer

• Public access
• Data visualization


Solutions: Engaging Standards Efforts

• Partnered with EDINA (UK) and NARA to approach the Open Geospatial Consortium (OGC) in 2005-2006
• Working group charter approved by the OGC Technical Committee plenary, Dec. 2006


Points of Engagement with the OGC

• GML for archiving
• Geo Rights Management: adding archive use cases
• Content packaging
• Saving data state in web services interactions
• Content replication (OGC/Open Grid Forum talks)
• Persistent identifiers
• Data versioning (metadata and catalog support)
• Cartographic representation

Cross-fertilize between the library/archives and geospatial communities


Role of Commercial Data Providers

Project status: cultivating a commercial market for older data

Part of “permanent access” is marketing, advertising, and putting older data into the path of the user


Signs of Hope

• Software vendors are more keenly aware of temporal data management as a customer problem
• Consulting firms increasingly see temporal data management and archiving as a business opportunity
• Innovative practices are emerging at the local and state level to complement and inform national-level activities

Viral adoption of archiving practices vs. mandated archiving practices: which will have more effect?


Next Steps

Technical:
• Refining the repository ingest workflow (currently using DSpace)
• Further investigation into the use of METS (Metadata Encoding and Transmission Standard) and PREMIS (Preservation Metadata: Implementation Strategies)
• Content exchange tests with other organizations

Organizational:
• OGC Data Preservation Working Group
• Engaging State Archives: local records outreach and records retention practices
• Working toward best practices for data capture by local agencies
• Content exchange networks


Questions?

Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) [email protected]

http://www.lib.ncsu.edu/ncgdap
