Community Inventory of EarthCube Resources for Geoscience Interoperability (CINERGI)


Page 1

Community Inventory of EarthCube Resources for Geoscience Interoperability

Data discovery is the most often cited issue in the executive summaries on the EarthCube website.

CINERGI

Ilya Zaslavsky, Steve Richard and the CINERGI team
http://workspace.earthcube.org/cinergi

Page 2

Goals

• Large inventory of high-quality information resources across disciplines, with traceable provenance, usable across EarthCube research scenarios: datasets, catalogs, vocabularies, information models, services, process models, repositories, etc.
• Make it open to the community
• Organize it to enable search and integration across domains and linking between information objects
• Plus links between resources, people/organizations, publications, models, workflows, software, activities, etc.

Page 3

Approach

• Build on the high-level resource inventory started at http://connections.earthcube.org
• Compile metadata for as many resources as we can (collect recommendations from geoscientists, harvest existing catalogs)
• Expose it through a simple search interface
• Use off-the-shelf technology: Geoportal, ISO metadata, CSW
• Make it accessible through EarthCube.org
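To make the "off-the-shelf" CSW piece concrete, here is a minimal sketch of a full-text GetRecords query in the CSW 2.0.2 key-value-pair encoding, the kind of request a Geoportal catalog is expected to answer. The endpoint URL and the function name are hypothetical, and which optional parameters a given server honors may vary. The search text is inserted into the CQL constraint without escaping quotes, so a real client should sanitize it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: replace with the CSW URL of a real catalog.
CSW_ENDPOINT = "https://example.org/geoportal/csw"


def build_getrecords_url(endpoint: str, text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request (KVP encoding) for a full-text search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        # Declares the csw: prefix used in typeNames and in the constraint
        "namespace": "xmlns(csw=http://www.opengis.net/cat/csw/2.0.2)",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "outputSchema": "http://www.opengis.net/cat/csw/2.0.2",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # csw:AnyText is the CSW core queryable for full-text matching
        "constraint": f"csw:AnyText like '%{text}%'",
        "maxRecords": str(max_records),
    }
    # urlencode percent-escapes the quotes, '%' wildcards and spaces in the constraint
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_getrecords_url(CSW_ENDPOINT, "sediment")
    print(url)
    # Decode the query string again to show the parameters a server would receive
    for key, values in parse_qs(urlsplit(url).query).items():
        print(f"{key} = {values[0]}")
```

Issuing the request (for example with `urllib.request.urlopen`) returns a `csw:GetRecordsResponse` XML document; parsing that response is left out here.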

Page 4

Page 5

READINESS ASSESSMENT

Catalog Metadata
  M1  Has a data listing
  M2  Uses a minimal metadata standard, such as Dublin Core
  M3  Uses a metadata standard, such as FGDC or INSPIRE
Catalog Search
  S1  Search interface
  S2  Search API, not following a standard
  S3  Complies with the OpenSearch API
  S4  Complies with the OGC CSW API
Catalog Harvest
  H1  Has a harvest API
  H2  OAI API
  H3  OGC CSW API

Vocabulary – Control and Access
  V1  Uses controlled terminology
  V2  Community-managed terminology
  V3  SPARQL
Vocabulary – Representation
  T1  Listing of terminology, such as web pages
  T2  Uses an ontology or SKOS

Data Access API
  A1  Bulk download
  A2  Static URL
  A3  Web service
Data Query API
  Q1  Simple query subset
  Q2  Complex query
  Q3  Processing subset

Information Model – Conceptual
  C0  Unspecified
  C1  Domain/conceptual model using UML
  C2  Domain/conceptual model using UML, based on OGC or ISO standards
Information Model as XML
  X1  XML format; schema may not be specified
  X2  XML Schema
Information Model as SQL
  S1  Provides an SQL schema

Also evaluated: processing services; visualization services; community consensus efforts; identifier persistence
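One way to record assessments against this rubric is to store, per category, which codes a resource satisfies. The sketch below is hypothetical: the function names and the `coverage` summary are illustrations, not the project's scoring method. It keys codes by category because "S1" is used both under Catalog Search and under Information Model as SQL, and it deliberately does not assume that a higher code number always means "more ready" (H2 OAI and H3 CSW, for instance, are alternatives).

```python
from typing import Dict, List, Set

# Codes copied from the readiness assessment, in the order listed there.
# "S1" appears under two categories, so a code only means something with its category.
RUBRIC: Dict[str, List[str]] = {
    "Catalog Metadata": ["M1", "M2", "M3"],
    "Catalog Search": ["S1", "S2", "S3", "S4"],
    "Catalog Harvest": ["H1", "H2", "H3"],
    "Vocabulary Control and Access": ["V1", "V2", "V3"],
    "Vocabulary Representation": ["T1", "T2"],
    "Data Access API": ["A1", "A2", "A3"],
    "Data Query API": ["Q1", "Q2", "Q3"],
    "Information Model Conceptual": ["C0", "C1", "C2"],
    "Information Model as XML": ["X1", "X2"],
    "Information Model as SQL": ["S1"],
}


def readiness_profile(observed: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, for every rubric category, the codes a resource satisfies.

    Raises ValueError on an unknown category or code so typos are caught early.
    """
    for category, codes in observed.items():
        if category not in RUBRIC:
            raise ValueError(f"unknown category: {category}")
        unknown = set(codes) - set(RUBRIC[category])
        if unknown:
            raise ValueError(f"unknown codes for {category}: {sorted(unknown)}")
    # Keep the rubric's order so reports are stable across runs
    return {category: [code for code in levels if code in observed.get(category, set())]
            for category, levels in RUBRIC.items()}


def coverage(profile: Dict[str, List[str]]) -> float:
    """Hypothetical summary metric: share of categories with at least one code met."""
    return sum(1 for codes in profile.values() if codes) / len(profile)


if __name__ == "__main__":
    # Hypothetical catalog: standard metadata, a search UI plus CSW, harvestable via CSW
    profile = readiness_profile({
        "Catalog Metadata": {"M3"},
        "Catalog Search": {"S1", "S4"},
        "Catalog Harvest": {"H3"},
    })
    print(profile["Catalog Search"])
    print(f"coverage: {coverage(profile):.1f}")
```

For this example, three of the ten categories have at least one code met, so `coverage` is 0.3.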

Page 6

High-level inventory and readiness assessment: viewer

http://connections.earthcube.org

Page 7

Page 8

[Diagram: "Ye Most Excellent EarthCube Inventory System". Resource descriptions pass from the harvest adapters through the staging database and the document processing components to the public access components, which provide the interfaces to the world.]

Harvest adapters: components that connect to information sources and import descriptions of EarthCube resources into the staging database.

Staging database: a document database that persists the originally harvested descriptions in their native state, along with any additional information or updates resulting from subsequent processing or curation of those descriptions.

Document processing components: components that pull documents from the staging database and perform various functions to upgrade the content or transform its presentation. A processed document may be pushed back to the staging database or out to the public access components.

Public access components: components that connect to the document processors and implement external interfaces that present content to users.
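The four component types can be sketched as a small in-memory pipeline. This is an illustration under stated assumptions, not the CINERGI implementation: a Python dict stands in for the staging document database, harvest adapters and processors are plain callables, and all class and function names are hypothetical. The point it shows is the split between the native description (kept unchanged) and the updates that processing pushes back to staging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class StagedDocument:
    source: str                                        # harvest adapter that produced it
    native: str                                        # description as harvested, never modified
    updates: List[dict] = field(default_factory=list)  # results of processing/curation


class StagingDatabase:
    """In-memory stand-in for the staging document database."""

    def __init__(self) -> None:
        self._docs: Dict[str, StagedDocument] = {}

    def put(self, doc_id: str, doc: StagedDocument) -> None:
        self._docs[doc_id] = doc

    def get(self, doc_id: str) -> StagedDocument:
        return self._docs[doc_id]

    def ids(self) -> List[str]:
        return list(self._docs)


# A harvest adapter yields (document id, native description) pairs from one source.
HarvestAdapter = Callable[[], Iterable[Tuple[str, str]]]
# A processor inspects a staged document and returns fields to add or update.
Processor = Callable[[StagedDocument], dict]


def harvest(adapters: Dict[str, HarvestAdapter], db: StagingDatabase) -> None:
    for name, adapter in adapters.items():
        for doc_id, native in adapter():
            db.put(doc_id, StagedDocument(source=name, native=native))


def process(db: StagingDatabase, processors: List[Processor]) -> None:
    for doc_id in db.ids():
        doc = db.get(doc_id)
        for processor in processors:
            # Results are pushed back to staging; the native description stays intact
            doc.updates.append(processor(doc))


def publish(db: StagingDatabase) -> List[dict]:
    """Public access view: each native description merged with its processing results."""
    records = []
    for doc_id in db.ids():
        doc = db.get(doc_id)
        record = {"id": doc_id, "source": doc.source, "native": doc.native}
        for update in doc.updates:
            record.update(update)
        records.append(record)
    return records


if __name__ == "__main__":
    db = StagingDatabase()
    # Hypothetical adapter standing in for, e.g., a CSW or OAI-PMH harvester
    harvest({"demo": lambda: [("rec-1", "<MD_Metadata/>")]}, db)
    process(db, [lambda doc: {"is_xml": doc.native.lstrip().startswith("<")}])
    print(publish(db))
```

Keeping the native text separate from the update log means curation results can be recomputed or rolled back without re-harvesting the source.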

Page 9

Then add features

• Links to organizations, researchers, other systems
• Validation services
• Deep registration of datasets/databases (at feature level)
• Data search capabilities
• Quality/interop readiness assessment
• Annotation system

Page 10

CINERGI Outline (without deep registration so far)

Tiers: Publication; Staging and curation; Harvesting

• Geoportal: CSW, ISO 19115; ATOM, GeoRSS, etc.
• Linked data: RDF, RDF store, e.g. Neo4j
• Extra metadata, provenance, links, annotations
• WAF with ISO XML
• Staging DB (MDB): MongoDB, CouchDB, Geoportal, etc.
• ISO, DC, other
• Harvest sources: CSW, OAI-PMH, WAF, CKAN, other
• DISCO
• Validated triples

Processing steps:
1. Metadata validation per record
2. Triggering parsers depending on metadata and validation results: spatial parser, person/org parser, LOD parser, keyword parser, topic parser, time parser
3.
4. Finding ambiguities for manual curation

• Need a parser API so parsers can be added
• Duplicate detection, tagging, grouping
• Curation UI: results of parsing, provenance, duplicate flags
• Search UI
• Reporting to sources
• Pivot for search results
• Harvesting dashboard
• Record editor
• Community pivots
• Hot page
• Search in domain systems
• geoportal
• pivotDB
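The outline calls for a parser API so parsers can be added, with parsers triggered by metadata and validation results. Below is a minimal sketch of one way to do that, assuming a registry where each parser declares a trigger predicate. The required fields, the two example parsers, and the rule that any validation failure goes to curation are all hypothetical simplifications; the keyword parser in particular is a toy heuristic, not a real topic or keyword extractor.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
ParserFn = Callable[[Record], object]
Trigger = Callable[[Record, List[str]], bool]

# Assumption for illustration: the fields a record must carry to pass validation.
REQUIRED_FIELDS = ("title", "abstract")


def validate(record: Record) -> List[str]:
    """Step 1: per-record validation; returns a list of problems (empty if valid)."""
    return [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]


class ParserRegistry:
    """Parsers register with a trigger, so new parsers can be added without editing run()."""

    def __init__(self) -> None:
        self._parsers: List[Tuple[str, Trigger, ParserFn]] = []

    def register(self, name: str, trigger: Trigger) -> Callable[[ParserFn], ParserFn]:
        def decorator(fn: ParserFn) -> ParserFn:
            self._parsers.append((name, trigger, fn))
            return fn
        return decorator

    def run(self, record: Record) -> Dict[str, object]:
        problems = validate(record)
        results: Dict[str, object] = {"validation": problems}
        for name, trigger, fn in self._parsers:
            # Step 2: run a parser only if the metadata and validation results call for it
            if trigger(record, problems):
                results[name] = fn(record)
        # Step 4 (simplified): anything that failed validation goes to manual curation
        results["needs_curation"] = bool(problems)
        return results


registry = ParserRegistry()


@registry.register("spatial", lambda record, problems: "bbox" in record)
def parse_spatial(record: Record) -> Dict[str, bool]:
    west, south, east, north = record["bbox"]
    # Naive check; boxes that cross the antimeridian would be reported as invalid
    return {"valid_bbox": west <= east and south <= north}


@registry.register("keyword", lambda record, problems: "missing abstract" not in problems)
def parse_keywords(record: Record) -> List[str]:
    # Toy heuristic: treat long words in the abstract as candidate keywords
    words = (w.strip(".,;").lower() for w in str(record["abstract"]).split())
    return sorted({w for w in words if len(w) > 6})


if __name__ == "__main__":
    print(registry.run({
        "title": "River chemistry",
        "abstract": "Dissolved sediment measurements from global river observatories.",
        "bbox": (-120.0, 30.0, -110.0, 40.0),
    }))
```

Adding, say, a time parser would only require another `@registry.register(...)` function; `run()` itself does not change.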

Page 11

Challenges

• Scope
• Different levels of granularity
• Lack of formal information models
• Implicit domain semantics
• Multiple metadata registry platforms and standards
• Lots of data outside managed repositories
• Cross-domain governance vs. domain systems
• Different expectations across domains (survey)

Page 12

Initial inventory

http://metadata.earthcube.org

Resources from domain workshops and surveys + initial harvesting

Page 13

Domain inventories: you are invited to participate!

• All sources of data mentioned at domain end-user workshops are included
• Working with funded RCNs

Step 1: Prepare an initial collection in a spreadsheet.
Step 2: CINERGI will set up your community resource viewer and editing system, seeded with your collection.
Step 3: Community editing, updates and curation.
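Step 1 hands CINERGI a spreadsheet; seeding a viewer from it (step 2) means turning rows into metadata records. The sketch below assumes the spreadsheet is exported as CSV and that its columns are named "Name", "Description", "URL" and "Type". Those column names, the mapping to Dublin Core elements, and the function name are hypothetical, since the slides do not prescribe a template.

```python
import csv
import io
from typing import Dict, List

# Hypothetical spreadsheet columns mapped to Dublin Core elements;
# the slides do not prescribe a template.
COLUMN_MAP = {"Name": "title", "Description": "description", "URL": "identifier", "Type": "type"}


def rows_to_records(csv_text: str) -> List[Dict[str, str]]:
    """Convert spreadsheet rows (exported as CSV) into minimal Dublin Core-style records.

    Blank cells are dropped, and rows without a name are skipped.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for column, element in COLUMN_MAP.items():
            # row.get() returns None for missing columns or short rows
            value = (row.get(column) or "").strip()
            if value:
                record[element] = value
        if "title" in record:
            records.append(record)
    return records


SAMPLE = """Name,Description,URL,Type
Stream gauges,Daily discharge series,https://example.org/gauges,Dataset
,Row without a name,,
Rock vocabulary,,https://example.org/vocab,Vocabulary
"""

if __name__ == "__main__":
    for record in rows_to_records(SAMPLE):
        print(record)
```

Skipping nameless rows rather than failing keeps a partially filled community spreadsheet usable; the skipped rows could be reported back for step 3 curation.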

Page 14

Short questionnaire

For each function, rate its importance from 1 (Unimportant) to 7 (Essential), or mark NA (not applicable) or DK (don't know), and add comments.

• Making metadata from your facility available for search using standard metadata, via standard APIs
• Tracking demand for and cross-domain usage of your resources
• Identifying issues related to data and metadata quality and completeness
• Tracking search hits that become searches for resources managed by your data facility
• Connecting owners of relevant datasets to your facility for potential longer-term data management
• Connecting data from your facility with people, publications, models, and projects
• Identifying communities using data, tools, and models from your facility
• Validating published metadata and service signatures from your facility
• Finding and reporting to you resources that appear as duplicates across multiple registries

Potential added value by a cross-domain system
Integration with cross-domain search
Key characteristics for CINERGI

See the CINERGI survey at http://workspace.earthcube.org/data-facilities
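One function in the questionnaire, finding resources that appear as duplicates across multiple registries (also listed in the CINERGI outline as duplicate detection, tagging, grouping), can be illustrated with a deliberately crude sketch: group records by a normalized title and report groups that span more than one registry. Exact normalized-title matching is a stand-in chosen for brevity; a production system would likely also compare identifiers, URLs, spatial extents and fuzzy title similarity. Registry names, record IDs and function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def find_cross_registry_duplicates(
    records: List[Tuple[str, str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """records are (registry, record_id, title) triples.

    Returns groups keyed by normalized title that occur in more than one registry.
    """
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for registry, record_id, title in records:
        groups[normalize_title(title)].append((registry, record_id))
    return {key: members for key, members in groups.items()
            if len({registry for registry, _ in members}) > 1}


SAMPLE_RECORDS = [
    ("registryA", "a-17", "Global River Chemistry Database"),
    ("registryB", "b-3", "global river chemistry  database."),
    ("registryB", "b-4", "Sediment Cores, Atlantic"),
    ("registryA", "a-18", "Sediment cores - Pacific"),
]

if __name__ == "__main__":
    for title, members in find_cross_registry_duplicates(SAMPLE_RECORDS).items():
        print(title, members)
```

Only the two river-chemistry records are grouped; the two sediment-core titles normalize to different strings and are not flagged.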

Page 15

Development Team

San Diego Supercomputer Center/UCSD: Ilya Zaslavsky, David Valentine, Tom Whitenack; Amarnath Gupta, Jeff Grethe (NIF project)
Lamont/Columbia Univ./IEDA: Kerstin Lehnert, Leslie Hsu
Arizona Geological Survey: Stephen Richard
University of Chicago: Tanu Malik
Open Geospatial Consortium: Luis Bermudez

Community Partners

• Anthony Aufdenkampe: Critical Zone Observatories
• Shanan Peters: stratigraphy
• Bernhard Peucker-Ehrenbrink: Global River Observatories
• RCN projects that plan to organize community resources
• Test Enterprise Governance
• Building Blocks projects working on web services, brokering solutions
• Agencies
• International