Cooperative Project with Library of Congress on Preservation of Digital Geospatial Data Steve Morris Head of Digital Library Initiatives NCSU Libraries

NC Geospatial Data Archiving Project (NCGDAP)


Page 1: NC Geospatial Data Archiving Project (NCGDAP)

Cooperative Project with Library of Congress on Preservation of Digital Geospatial Data

Steve Morris

Head of Digital Library Initiatives

NCSU Libraries

Page 2: NC Geospatial Data Archiving Project (NCGDAP)

Note: Percentages based on the actual number of respondents to each question 2

NC Geospatial Data Archiving Project (NCGDAP)

Partnership between NCSU Libraries and NC Center for Geographic Information & Analysis

$520,000 funding – 3 years

Focus on state and local geospatial content in North Carolina (state demonstration)

Address NC OneMap objective: “Historic and temporal data will be maintained and available.”

One of eight projects in the first NDIIPP funding round: “Building a Network of Partners”

Page 3: NC Geospatial Data Archiving Project (NCGDAP)


Page 4: NC Geospatial Data Archiving Project (NCGDAP)


NDIIPP Overview

National Digital Information Infrastructure and Preservation Program

Congress appropriated $100 million for this effort and instructed the Library of Congress to spend an initial $25 million to develop and execute a congressionally approved strategic plan

Eight initial projects, 2004-2007: web pages, cultural heritage, numeric data, video, business records, mixed content, geospatial (2)

Developing partnerships and identifying issues

Extensive interaction among NDIIPP projects

Page 5: NC Geospatial Data Archiving Project (NCGDAP)


Targeted Content

Resource Types

GIS “vector” (point/line/polygon) data

Digital orthophotography

Digital maps

Tabular data (e.g. assessment data)

Content Producers

Mostly state, local, regional agencies

Some university, not-for-profit, commercial

Selected local federal projects

Page 6: NC Geospatial Data Archiving Project (NCGDAP)


Risks to Digital Geospatial Data

.shp

.mif

.gml

.e00

.dwg

.dgn

.bsb

.bil

.sid

Page 7: NC Geospatial Data Archiving Project (NCGDAP)


Risks to Digital Geospatial Data

Focus on current data

Archiving data does not guarantee “permanent access”

Future support of data formats in question

Need to migrate formats or allow for emulation

Data failure

“Bit rot”, media failure

Preservation metadata requirements

Descriptive, administrative, technical, DRM

Shift to “streaming data” for access
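The “bit rot” and media failure risk listed above is commonly addressed with fixity checks: record a checksum for every file at ingest, then recompute and compare later. The following is a minimal sketch of that idea, not the project's actual workflow; the file names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose checksum changed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures


def demo() -> List[str]:
    """Create a tiny fake dataset, corrupt one file, and report failures."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical file names and contents, used only for the demo.
        (root / "parcels.shp").write_bytes(b"hypothetical geometry bytes")
        (root / "parcels.dbf").write_bytes(b"hypothetical attribute bytes")
        manifest = build_manifest(root)
        # Simulate silent corruption of one component file.
        (root / "parcels.dbf").write_bytes(b"hypothetical attribute bytez")
        return verify_manifest(root, manifest)
```

Calling `demo()` reports the corrupted component. Multi-file formats such as shapefiles are a reason to keep the manifest per dataset, since losing one component file can make the rest unusable.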

Page 8: NC Geospatial Data Archiving Project (NCGDAP)


Time series – vector data

Parcel Boundary Changes 2001-2004, North Raleigh, NC

Page 9: NC Geospatial Data Archiving Project (NCGDAP)


Time series – Ortho imagery

Vicinity of Raleigh-Durham International Airport 1993-2002

Page 10: NC Geospatial Data Archiving Project (NCGDAP)


Today’s geospatial data as tomorrow’s cultural heritage

Page 11: NC Geospatial Data Archiving Project (NCGDAP)


Earlier NCSU Acquisition Efforts

NCSU University Extension project 2000-2001

Target: County/city data in eastern NC

“Digital rescue” not “digital preservation”

Project learning outcomes

Confirmed concerns about long term access

Need for efficient inventory/acquisition

Wide range in rights/licensing

Need to work within statewide infrastructure

Acquired experience; unanticipated collaboration

Page 12: NC Geospatial Data Archiving Project (NCGDAP)


One Earlier Project Outcome: Directory of County and City Services

Among top 15 most used resources on library web site

99.5% of directory users from outside ncsu.edu

Page 13: NC Geospatial Data Archiving Project (NCGDAP)


NDIIPP Project Phases

Content Identification and Selection

Content Acquisition

Partnership Building

Content Retention and Transfer

All 8 NDIIPP cooperative projects adhere to this structure

Page 14: NC Geospatial Data Archiving Project (NCGDAP)


Content Identification and Selection

Work from NC OneMap Data Inventory

Combine with inventory information from various state agencies and from previous NCSU efforts

Develop methodology for selecting from among “early,” “middle,” and “late” stage products

Develop criteria for time series development

Investigate use of emerging Open Geospatial Consortium technologies in data identification

Page 15: NC Geospatial Data Archiving Project (NCGDAP)


Content Acquisition

Work from NC OneMap Data Sharing Agreements as a starting point (the “blanket”)

Secure individual agreements (the “quilt”)

Investigate use of OGC technologies in capture

Use METS (Metadata Encoding and Transmission Standard) as a metadata wrapper

Bundle data files, metadata, ancillary documentation

Supplement FGDC metadata with additional administrative, technical, and descriptive metadata

Encode rights (Digital Rights Management – DRM)

Links to services
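To make the “metadata wrapper” idea concrete, here is a rough sketch of a METS document that bundles a set of data files and points to an external FGDC metadata record. It uses only Python's standard library. The object identifier and file names are hypothetical, and the output is illustrative only: it has not been validated against the METS schema and is not the project's actual profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Build a namespace-qualified name in ElementTree's {ns}tag form."""
    return "{%s}%s" % (namespace, tag)


def build_mets(object_id, metadata_href, data_files):
    """Return a METS wrapper (as text) for one dataset and its metadata."""
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})

    # Descriptive metadata section: reference the external FGDC record.
    dmd = ET.SubElement(root, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, q(METS_NS, "mdRef"), {
        "LOCTYPE": "URL",
        "MDTYPE": "FGDC",
        q(XLINK_NS, "href"): metadata_href,
    })

    # File section lists the component files; the structural map ties
    # them to the descriptive metadata.
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "data"})
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    div = ET.SubElement(struct_map, q(METS_NS, "div"), {"DMDID": "DMD1"})

    for index, href in enumerate(data_files, start=1):
        file_id = "FILE%d" % index
        file_el = ET.SubElement(group, q(METS_NS, "file"), {"ID": file_id})
        ET.SubElement(file_el, q(METS_NS, "FLocat"), {
            "LOCTYPE": "URL",
            q(XLINK_NS, "href"): href,
        })
        ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})

    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset: one shapefile's components plus its FGDC record.
example = build_mets(
    "hypothetical-parcels-2004",
    "parcels.shp.xml",
    ["parcels.shp", "parcels.shx", "parcels.dbf"],
)
```

Administrative, technical, and rights metadata would go in additional sections of the same wrapper; they are left out here to keep the sketch short.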

Page 16: NC Geospatial Data Archiving Project (NCGDAP)


Partnership Building

Work within context of the NC OneMap initiative

Explore state, local, federal partnerships

Defined characteristic: “Historic and temporal data will be maintained and available”

Advisory Committee drawn from the NC Geographic Information Coordinating Council subcommittees

Seek external partners

National States Geographic Information Council

FGDC Historical Data Committee

… more

Page 17: NC Geospatial Data Archiving Project (NCGDAP)


Content Retention and Transfer

Ingest into DSpace open source digital repository software

Look more generically at the issue of putting geospatial content into digital repositories

Investigate re-ingest into a second platform

Start to define format migration paths

Special problem: geodatabases

Pursue long-term solution

Roles of data producing agencies, state agencies; NC OneMap; NCSU

Page 18: NC Geospatial Data Archiving Project (NCGDAP)


Big Geoarchiving Challenges

Format migration paths

Management of data versions over time

Preservation metadata

Preserving cartographic representation

Keeping content repository-agnostic

Preserving geodatabases

Harnessing geospatial web services

More …

Page 19: NC Geospatial Data Archiving Project (NCGDAP)


Vector Data Format Issues

Vector data much more complicated than image data

‘Preservation’ vs. ‘Permanent access’

An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access

Piles of XML need to be widely understood piles

GML: need widely accepted application schemas (like OSMM?)

The Geodatabase conundrum

Export feature classes, and lose topology, annotation, relationships, etc.

… or use the Geodatabase as the primary archival platform (some are now thinking this way)

Page 20: NC Geospatial Data Archiving Project (NCGDAP)


Geography Markup Language Issues

GML still more useful as a transfer format than an archival format; support limited even for transfer

FGDC Historical Data Working Group investigations into GML for use in archiving

Plans for environmental scan of existing GML profiles and application schemas or profiles

schema name (e.g. OSMM, top10NL, ESRI GML, LandGML)

responsible agency; schema has official government status?

GML version; known unsupported GML components

schema history; known interoperation with other schemas

vendor support; translator support
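The fields proposed above for the environmental scan could be captured in a simple record structure. The sketch below is one possible shape, assuming Python dataclasses; the class name, field names, and the helper method are hypothetical, and the example values are placeholders rather than findings about any real schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class GmlSchemaRecord:
    """One row of a hypothetical GML application schema scan."""
    schema_name: str
    responsible_agency: str = ""
    # None means "not yet determined" during the scan.
    official_government_status: Optional[bool] = None
    gml_version: str = ""
    unsupported_gml_components: List[str] = field(default_factory=list)
    schema_history: str = ""
    interoperates_with: List[str] = field(default_factory=list)
    vendor_support: List[str] = field(default_factory=list)
    translator_support: List[str] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        """Return names of fields that are still empty or undetermined."""
        return [
            f.name
            for f in fields(self)
            if getattr(self, f.name) in ("", None, [])
        ]


# Placeholder values only; nothing here describes a real schema.
record = GmlSchemaRecord(schema_name="ExampleSchema", gml_version="placeholder")
remaining = record.open_questions()
```

Listing the unanswered fields per schema gives a quick view of how complete the scan is.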

Page 21: NC Geospatial Data Archiving Project (NCGDAP)


Managing Time-versioned Content

Many local agency data layers continuously updated

Older versions not generally available

Individual versioned datasets will wander off from the archive

How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?

How do we certify concurrency and agreement between the metadata and the data?
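One way to think about the last two questions is a registry keyed by content hash: a copy found “in the wild” is hashed, matched to a known version, and checked against the metadata that was registered with it. This is a speculative sketch of that idea, not a description of the project's design; the class names, dataset identifier, and byte strings are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


def digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class VersionRecord:
    dataset_id: str
    snapshot_date: date
    data_sha256: str
    metadata_sha256: str


class VersionRegistry:
    """Hypothetical registry linking dataset versions to their metadata."""

    def __init__(self) -> None:
        self._by_data_hash: Dict[str, VersionRecord] = {}
        self._by_dataset: Dict[str, List[VersionRecord]] = {}

    def register(self, dataset_id, snapshot_date, data, metadata):
        record = VersionRecord(
            dataset_id, snapshot_date, digest(data), digest(metadata)
        )
        self._by_data_hash[record.data_sha256] = record
        self._by_dataset.setdefault(dataset_id, []).append(record)
        return record

    def identify(self, data: bytes) -> Optional[VersionRecord]:
        """Match a stray copy of the data to a registered version."""
        return self._by_data_hash.get(digest(data))

    def latest(self, dataset_id: str) -> Optional[VersionRecord]:
        """Return the most recent registered version of a dataset."""
        versions = self._by_dataset.get(dataset_id, [])
        return max(versions, key=lambda r: r.snapshot_date, default=None)

    def metadata_agrees(self, data: bytes, metadata: bytes) -> bool:
        """Check that metadata is the record registered with this data."""
        record = self.identify(data)
        return record is not None and record.metadata_sha256 == digest(metadata)


registry = VersionRegistry()
v2001 = registry.register(
    "hypothetical-parcels", date(2001, 1, 1), b"data 2001", b"meta 2001"
)
v2004 = registry.register(
    "hypothetical-parcels", date(2004, 1, 1), b"data 2004", b"meta 2004"
)
found = registry.identify(b"data 2001")
```

A hash match only works for byte-identical copies; a dataset that has been re-exported or reprojected would need a different identification strategy.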

Page 22: NC Geospatial Data Archiving Project (NCGDAP)


Preservation Metadata Issues

FGDC Metadata

Many flavors; incoming metadata needs processing

Other standards: PREMIS, MODS

Metadata wrapper

METS (Metadata Encoding and Transmission Standard) vs. other industry solutions

Need a geospatial industry solution for the ‘METS-like problem’

GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGC Web Services 3)
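As an illustration of the “incoming metadata needs processing” point, the sketch below pulls a few citation fields out of an FGDC CSDGM XML record and flags the ones that are missing. The element paths follow the standard's short tag names (`idinfo`, `citation`, `citeinfo`); the sample record and the choice of fields are assumptions made for the example, and real records vary more than this.

```python
import xml.etree.ElementTree as ET

# A made-up record for demonstration; not real project metadata.
SAMPLE_FGDC = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Hypothetical County GIS</origin>
        <pubdate>20040115</pubdate>
        <title>Hypothetical County Parcels</title>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""

# Output field name -> path within the FGDC record.
FIELD_PATHS = {
    "title": "idinfo/citation/citeinfo/title",
    "originator": "idinfo/citation/citeinfo/origin",
    "publication_date": "idinfo/citation/citeinfo/pubdate",
}


def extract_fields(xml_text):
    """Return the selected fields, with None for absent or empty ones."""
    root = ET.fromstring(xml_text)
    result = {}
    for name, path in FIELD_PATHS.items():
        text = root.findtext(path)
        result[name] = text.strip() if text and text.strip() else None
    return result


def missing_fields(record):
    """List the fields that still need attention before ingest."""
    return [name for name, value in record.items() if value is None]


complete = extract_fields(SAMPLE_FGDC)
partial = extract_fields(SAMPLE_FGDC.replace("<pubdate>20040115</pubdate>", ""))
```

A report of missing fields per record is one simple input to the kind of metadata workflow the slide describes.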

Page 23: NC Geospatial Data Archiving Project (NCGDAP)


Preserving Cartographic Representation

The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:

Intellectual choices about symbolization, layer combinations

Data models, analysis, annotations

Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration

Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem

Page 24: NC Geospatial Data Archiving Project (NCGDAP)


Preserving Cartographic Representation

Page 25: NC Geospatial Data Archiving Project (NCGDAP)


Preserving Cartographic Representation

Image-based approaches (“desiccated data”)

Generate images using Map Book or similar tools

Harvest existing atlas images

Capture atlases from WMS servers

Export ‘layouts’ or ‘maps’ to image

Vector-based approaches

Store explicitly in the data format (e.g. Feature Class Representation in ArcGIS 9.2)

Archive and upward-migrate existing files .avl, .apr, .lyr, .mxd, etc.

SVG, VML or other XML approaches

Other?

Page 26: NC Geospatial Data Archiving Project (NCGDAP)


Preserving Cartographic Representation

Page 27: NC Geospatial Data Archiving Project (NCGDAP)


Preserving Cartographic Representation

Page 28: NC Geospatial Data Archiving Project (NCGDAP)


Preserving Geodatabases

Not just data layers and attributes—also topology, annotation, relationships, behaviors

ESRI Geodatabase archival issues

XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication

Growing use of geodatabases by municipal, county agencies

Some looking to Geodatabase as archival platform (in addition to feature class export)

Page 29: NC Geospatial Data Archiving Project (NCGDAP)


Geodatabase Availability

According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in Geodatabase format.

[Charts: “Cities: Street Centerline Formats” and “Counties: Street Centerline Formats”; categories: Geodatabase, Shapefile, Coverage, Other]

Page 30: NC Geospatial Data Archiving Project (NCGDAP)


Evolving Geodatabase Handling Approaches

Project Stage: Planned Approach

Original Proposal (Nov. 2003): Export feature classes as shapefiles; archive Geodatabases less than 2 GB in size

Finalized Work Plan (Dec. 2004): Also export content as Geodatabase XML

Possible Future Work Plan Changes: Explore maintenance of some archival content in Geodatabase form; explore Geodatabase replication as an archive development approach; archive Geodatabases of unlimited size

Page 31: NC Geospatial Data Archiving Project (NCGDAP)


Harnessing Geospatial Web Services

Automated content identification: ‘capabilities files,’ registries, catalog services

WMS (Web Map Service) for batch extraction of image atlases

last ditch capture option

preserve cartographic representation

retain records of decision-making process

… feature services (WFS) later.

Rights issues in the web services space are ambiguous
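Batch extraction of an image atlas from a WMS server amounts to splitting an extent into tiles and issuing one GetMap request per tile. The sketch below only builds the request URLs, using WMS 1.1.1 parameter names; the server address, layer name, extent, and image size are hypothetical, and no request is actually sent. Given the rights questions noted above, any real harvest would need the data producer's agreement.

```python
from urllib.parse import urlencode


def tile_bboxes(minx, miny, maxx, maxy, cols, rows):
    """Yield (minx, miny, maxx, maxy) boxes covering the extent as a grid."""
    dx = (maxx - minx) / cols
    dy = (maxy - miny) / rows
    for row in range(rows):
        for col in range(cols):
            yield (
                minx + col * dx,
                miny + row * dy,
                minx + (col + 1) * dx,
                miny + (row + 1) * dy,
            )


def getmap_url(base_url, layers, bbox, srs="EPSG:4326",
               width=1024, height=1024, image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one tile."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join("%.6f" % value for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server, layer, and extent for a 2 x 2 atlas.
atlas_urls = [
    getmap_url("https://example.org/wms", ["parcels"], bbox)
    for bbox in tile_bboxes(-79.0, 35.0, -78.0, 36.0, cols=2, rows=2)
]
```

The sketch does not match the image aspect ratio to the tile's ground extent; a real harvest would pick WIDTH and HEIGHT per tile, and would first read the server's capabilities document to confirm the available layers, formats, and coordinate systems.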

Page 32: NC Geospatial Data Archiving Project (NCGDAP)


Partnerships

ESRI

Discussing software requirements: meetings with development teams April 2005

Open Geospatial Consortium (OGC)

Meet with Architecture Working Group Nov. 2005

National Archives and Records Administration

Investigations into GML for archiving; planned presentation to NARA technology team

FGDC Historical Data Working Group

General geospatial data preservation issues

Page 33: NC Geospatial Data Archiving Project (NCGDAP)


Partnerships

EDINA (University of Edinburgh, UK)

NCSU is Associate Partner on UK project for geospatial institutional repositories

UC Santa Barbara & Stanford University

Other NDIIPP geospatial project

EROS Data Center

Planned site visit

Project visits to regional GIS groups

Albemarle Regional GIS meeting Nov. 3

More planned …

Page 34: NC Geospatial Data Archiving Project (NCGDAP)


Progress to Date

Completion of project agreements

Hiring staff

Acquisition and deployment of storage system (12.4 TB capacity – two 16.8 TB systems)

Testing and deployment of repository software

Development of metadata workflow

Development of ingest workflow

Pilot project with NC Geologic Survey data

… Initial focus on developing the “plumbing”

Page 35: NC Geospatial Data Archiving Project (NCGDAP)


Questions for You?

What are your current practices for:

Archiving data and managing time versions

Managing geodatabase versions

Transfer mechanisms for data

• to regional entities?

• to off-site storage for disaster recovery?

Archiving project files and finished products

What rights issues exist with regard to putting county and city data into an archive?

What would you like this project to do?

Page 36: NC Geospatial Data Archiving Project (NCGDAP)


Ways to Participate in NCGDAP

Identifying data for inclusion in the repository

Discussing data format strategies

Sharing ideas about archiving approaches and architectures

Sharing and identifying concerns about rights issues, liability, etc.

Host project visits to regional GIS groups

Use Local Government GIS listserv to discuss preservation issues?

Page 37: NC Geospatial Data Archiving Project (NCGDAP)


Questions?

Contact:

Steve Morris

Head, Digital Library Initiatives

NCSU

[email protected]

http://www.lib.ncsu.edu/ncgdap