Science Environment for Ecological Knowledge: Ecogrid Interfaces Dave Vieglais The Natural History Museum and Biodiversity Research Center University of Kansas


Page 1: Science Environment for Ecological Knowledge:  Ecogrid Interfaces Dave Vieglais

Science Environment for Ecological Knowledge: Ecogrid Interfaces

Dave Vieglais

The Natural History Museum and Biodiversity Research Center

University of Kansas

Page 2

Science Environment for Ecological Knowledge

Research Objectives

• Access to ecological and environmental data
  • Enable data sharing & re-use
  • Enhance data discovery at global scales
• Scalable analysis and synthesis
  • Taxonomic, spatial, temporal, conceptual integration of data
  • Enable communication and collaboration for analysis
  • Address data heterogeneity issues
  • Enable re-use of analytical components

Page 3

Informatics Challenges for SEEK

• Data is heterogeneous
  • Syntax
  • Schema
  • Semantics
• From many disciplines
  • Biodiversity surveys, hydrology, atmospheric chemistry, spatial data, behavioral experiments, …
  • Data on economics, demographics, legal issues, …
• Data is distributed

Page 4

SEEK Components

• EcoGrid
  • Ecological, biodiversity and environmental data
  • Computational access
• Analysis and Modeling System
  • Modeling scientific workflows
• Semantic Mediation System
  • “Smart” data discovery
  • Knowledge-based data integration
  • Knowledge-based analysis integration
• Knowledge Representation
  • Ontologies for describing ecology

Page 5

Building the EcoGrid

[Figure: map of EcoGrid nodes. Node labels: AND, SEV, LUQ, VCR, HBR, NTL, NRS, PISCO1, PISCO2, OBFS, SDSC, NET, KU, NCEAS. Legend: Metacat node, Site node, SRB node, DiGIR node.]

• LTER Network (24)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)

Page 6

SEEK EcoGrid

• Integrate diverse data networks from ecology, biodiversity, and environmental sciences
  • Metacat, DiGIR, SRB, Xanthoria, ...
• EML is the core for data documentation
• Access to computational resources via the Grid (OGSA)

Page 7

Ecological Metadata Language (EML)

• Metadata: a means to manage ecological data
  • There is no universal data model for ecology
  • Accommodate heterogeneity and dispersion
• EML
  • Discovery information
    • Creator, Title, Abstract, Keyword, etc.
  • Coverage
    • Geographic, temporal, and taxonomic extent
  • Logical and physical data structure
    • Data semantics via unit definitions and typing
  • Protocols and methods

Page 8

DiGIR Overview

• DiGIR = Distributed Generic Information Retrieval
• A DiGIR client may communicate with any number of data providers
• A DiGIR data provider may expose any number of resources (databases)
• A DiGIR resource is a collection of objects described by a single federation schema

[Diagram: DiGIR Client (1..n) DiGIR Provider (1..n) Data Resource]
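The one-to-many relationships on this slide can be sketched as a small data model. This is only an illustration in Python, not DiGIR's actual implementation: the class names, the `resources_for_schema` helper and the `example` URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A collection of objects described by a single federation schema."""
    name: str
    schema_uri: str


@dataclass
class Provider:
    """A data provider that may expose any number of resources (databases)."""
    url: str
    resources: list[Resource] = field(default_factory=list)


@dataclass
class Client:
    """A client that may communicate with any number of providers."""
    providers: list[Provider] = field(default_factory=list)

    def resources_for_schema(self, schema_uri: str) -> list[tuple[str, str]]:
        """List (provider URL, resource name) pairs that use the given schema."""
        return [
            (provider.url, resource.name)
            for provider in self.providers
            for resource in provider.resources
            if resource.schema_uri == schema_uri
        ]


# Hypothetical providers and resources, for illustration only.
DARWIN = "http://digir.net/schema/conceptual/darwin/2003/1.0"
client = Client([
    Provider("http://provider-a.example/digir",
             [Resource("mammals", DARWIN), Resource("birds", DARWIN)]),
    Provider("http://provider-b.example/digir",
             [Resource("plots", "http://example.org/other-schema")]),
])
matches = client.resources_for_schema(DARWIN)
```

Because each resource declares exactly one federation schema, a client can decide which resources to query by schema alone.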

Page 9

EcoGrid Interfaces

• Registry
  • Resolves references to objects
    • Interface definitions
    • Data structures
    • Service instances
• Session
  • Authentication
  • Details on session information
  • Coarse granularity of resource restriction
• Query
  • Search and retrieve metadata and data
  • Different levels of “conformance”
  • Low bar for participation in SEEK
• Taxon
  • System to reduce ambiguity in scientific names
  • Commonly used to address synonymy
• SMS (Semantic Mediation System)
  • Mechanism for relating and resolving data and metadata concepts

Page 10

EcoGrid Query Interfaces

• Provides a mechanism for search and retrieval of metadata and federated data
• Supports third party interaction with search results – forwarding of result set identifiers to another service instance for retrieval
• Different levels of compliance
  • Low barrier for participation
  • Bulk of data will be accessible through Type I

Page 11

Query Interfaces Implemented

• Initial requirement to support query and retrieval from:
  • SRB
  • Metacat
  • DiGIR
  • Xanthoria
• Federated data sets that subscribe to a small set of federation schemas

Page 12

EcoGrid Query Level I

Basic, entry level exposure of data and metadata for EcoGrid and SEEK

Response contains data – intended for direct communications rather than 3rd party indirection

ResultsetType query(SessionID,QueryType)

byte[] get(SessionID,objectID)
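The two Level I operations can be sketched as a toy in-memory service. This is a minimal sketch under assumptions, not the EcoGrid implementation: the `Query` fields, the single `EQUALS` operator, the JSON encoding used by `get`, and the sample records are all invented for illustration, and the session ID is accepted but not checked.

```python
import json
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical, much-reduced stand-in for QueryType."""
    concept: str
    operator: str  # only "EQUALS" is handled in this sketch
    value: str
    return_fields: list[str]


class InMemoryLevelOneService:
    """Toy Level I service backed by a dict of object ID -> record."""

    def __init__(self, records: dict[str, dict[str, str]]):
        self._records = records

    def query(self, session_id: str, q: Query) -> list[dict[str, str]]:
        """Return the matching data directly in the response (no indirection)."""
        if q.operator != "EQUALS":
            raise ValueError(f"unsupported operator: {q.operator}")
        return [
            {"identifier": object_id,
             **{name: rec.get(name, "") for name in q.return_fields}}
            for object_id, rec in self._records.items()
            if rec.get(q.concept) == q.value
        ]

    def get(self, session_id: str, object_id: str) -> bytes:
        """Return one object as raw bytes, mirroring byte[] get(...)."""
        return json.dumps(self._records[object_id], sort_keys=True).encode("utf-8")


# Hypothetical sample data.
service = InMemoryLevelOneService({
    "mvz1": {"Genus": "Peromyscus", "ScientificName": "Peromyscus leucopus"},
    "mvz2": {"Genus": "Mus", "ScientificName": "Mus musculus"},
})
hits = service.query("session-1", Query("Genus", "EQUALS", "Peromyscus", ["ScientificName"]))
raw = service.get("session-1", "mvz1")
```

The caller receives the records in the `query` response itself, which is what makes Level I suited to direct communication rather than third-party hand-off.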

Page 13

Query Example

<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>
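As a rough illustration of how a service might read such a query document, the following Python sketch parses it with the standard library's `xml.etree.ElementTree`. The XML is the example from this slide with the `xsi` attributes left out for brevity; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

QUERY_XML = """<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>"""

root = ET.fromstring(QUERY_XML)

# Only the root element is namespace-qualified; its children are plain names.
query_id = root.get("queryId")
return_fields = [el.text for el in root.findall("returnfield")]

condition = root.find("condition")
parsed_condition = (condition.get("concept"), condition.get("operator"), condition.text)
```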

Page 14

Query Structure

• Language independent representation of a query structure
• Transformed into the appropriate native language of the data store

Example:

<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus man%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>
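To make the "transformed into the native language of the data store" step concrete, here is a minimal sketch that turns the example tree into a parameterised SQL `WHERE` fragment. It assumes a SQL data store whose column names equal the concept names, and it handles only the operators listed in `SQL_OPERATORS`; the function name `to_sql` and that mapping are illustrative, not part of EcoGrid.

```python
import xml.etree.ElementTree as ET

# Operators this sketch can translate (an assumed, incomplete list).
SQL_OPERATORS = {"LIKE": "LIKE", "EQUALS": "=", "NOT EQUALS": "<>"}


def to_sql(node: ET.Element) -> tuple[str, list[str]]:
    """Translate a query tree into a SQL fragment plus its bound parameters."""
    if node.tag in ("AND", "OR"):
        parts, params = [], []
        for child in node:
            sql, child_params = to_sql(child)
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {node.tag} ".join(parts) + ")", params
    if node.tag != "condition":
        raise ValueError(f"unexpected element: {node.tag}")

    concept = node.get("concept", "")
    if not concept.isidentifier():
        raise ValueError(f"unsafe concept name: {concept!r}")
    operator = node.get("operator", "")
    if operator not in SQL_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    value = (node.text or "").strip()

    if value == "NULL":
        # SQL tests for NULL with IS / IS NOT rather than = / <>.
        if operator == "EQUALS":
            return f"{concept} IS NULL", []
        if operator == "NOT EQUALS":
            return f"{concept} IS NOT NULL", []
        raise ValueError(f"operator {operator!r} cannot be used with NULL")
    return f"{concept} {SQL_OPERATORS[operator]} ?", [value]


tree = ET.fromstring(
    '<AND>'
    '<condition operator="LIKE" concept="ScientificName">peromyscus man%</condition>'
    '<condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>'
    '</AND>'
)
where, params = to_sql(tree)
# where  == "(ScientificName LIKE ? AND DecimalLatitude IS NOT NULL)"
# params == ["peromyscus man%"]
```

Values are passed as bound parameters rather than spliced into the SQL text, so only the concept names need validating.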

Page 15

Specifying the Resultset

• Specify the list of concepts (fields) to be returned in the resultset
• Simple paths used to identify elements or document subtrees
• Effectively flattens the structure of the records, but allows generic representation

Example:

<returnfield>/ScientificName</returnfield>
<returnfield>/Longitude</returnfield>
<returnfield>/Latitude</returnfield>
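One way to picture the flattening is the sketch below, which applies a list of return-field paths to a nested record and produces one flat row. It is an assumption about how such paths could be evaluated, not the EcoGrid code; the `flatten` helper, the nested `Locality` element and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten(record: ET.Element, return_fields: list[str]) -> dict[str, str]:
    """Map each return-field path to a single string value for this record."""
    row = {}
    for path in return_fields:
        el = record.find("." + path)  # "/Locality/Latitude" -> "./Locality/Latitude"
        if el is None:
            row[path] = ""  # field absent from this record
        elif len(el):
            # The path names a subtree: keep it as serialised XML.
            row[path] = ET.tostring(el, encoding="unicode").strip()
        else:
            row[path] = (el.text or "").strip()
    return row


# Hypothetical nested record.
record = ET.fromstring(
    "<record>"
    "<ScientificName>Peromyscus maniculatus</ScientificName>"
    "<Locality><Longitude>-95.2</Longitude><Latitude>38.9</Latitude></Locality>"
    "</record>"
)
row = flatten(record, ["/ScientificName", "/Locality/Latitude", "/Elevation"])
```

Each record becomes one row keyed by path, whatever the nesting of the source document.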

Page 16

Query Result Set Structure

<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <startRecord>1</startRecord>
    <endRecord>2</endRecord>
    <recordCount>2</recordCount>
  </resultsetMetadata>
  <record number="1"
      system="http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2"
      identifier="mvz1"
      namespace="http://digir.net/schema/conceptual/darwin/2003/1.0"
      lastModifiedDate="2003-03-03T10:42:13"
      creationDate="2003-03-03T10:42:13">
    <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
    <darwin:Longitude>121</darwin:Longitude>
    <darwin:Latitude>33</darwin:Latitude>
  </record>
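A client might read a result set along the following lines. The slide shows only an excerpt, so this sketch uses a trimmed, well-formed variant: the `xmlns:darwin` declaration and the closing tag are added as assumptions, several attributes are dropped, and the metadata counts are changed to match the single record included here.

```python
import xml.etree.ElementTree as ET

DARWIN = "http://digir.net/schema/conceptual/darwin/2003/1.0"

# Adapted from the slide excerpt; see the assumptions stated above.
RESULTSET_XML = f"""<rs:resultset resultsetId="foo.1.1"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:darwin="{DARWIN}">
  <resultsetMetadata>
    <startRecord>1</startRecord>
    <endRecord>1</endRecord>
    <recordCount>1</recordCount>
  </resultsetMetadata>
  <record number="1" identifier="mvz1" namespace="{DARWIN}">
    <darwin:ScientificName>PEROMYSCUS LEUCOPUS NOVEBORACENSIS</darwin:ScientificName>
    <darwin:Longitude>121</darwin:Longitude>
    <darwin:Latitude>33</darwin:Latitude>
  </record>
</rs:resultset>"""

root = ET.fromstring(RESULTSET_XML)
record_count = int(root.findtext("resultsetMetadata/recordCount"))

rows = []
for rec in root.findall("record"):
    # ElementTree reports qualified names as "{namespace}Local"; keep the local part.
    fields = {child.tag.split("}", 1)[-1]: (child.text or "").strip() for child in rec}
    rows.append({"identifier": rec.get("identifier"), **fields})
```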

Page 17

EcoGrid Query Level II

• More detailed handling of results
• Uses RSIDs to identify resultsets: handles that can be passed to a third party

Resultset retrieve(SessionID,RSID,start,numrecs)

RSID search(SessionID,query)

query decodeResultsetIdentifier(SessionID,RSID)

statusinfo getResultStatus(SessionID)

int transfer(SessionID,sourceURL,destURL,ObjectID)
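The handle-based pattern can be sketched as follows: `search` stores the results on the service and returns only an RSID, which any party can later pass to `retrieve`. This is a toy under assumptions, not the EcoGrid implementation: the query is reduced to a genus string, `start` is assumed to count from 1, session IDs are not checked, and `getResultStatus` and `transfer` are omitted.

```python
import uuid


class InMemoryLevelTwoService:
    """Toy Level II service that keeps resultsets server-side, keyed by RSID."""

    def __init__(self, records: list[dict[str, str]]):
        self._records = records
        self._resultsets: dict[str, tuple[str, list[dict[str, str]]]] = {}

    def search(self, session_id: str, genus: str) -> str:
        """Run the query, keep the hits, and return only a handle (RSID)."""
        rsid = uuid.uuid4().hex
        hits = [rec for rec in self._records if rec["Genus"] == genus]
        self._resultsets[rsid] = (genus, hits)
        return rsid

    def retrieve(self, session_id: str, rsid: str, start: int, numrecs: int) -> list[dict[str, str]]:
        """Return up to numrecs records, with start counted from 1 (assumed)."""
        _, hits = self._resultsets[rsid]
        return hits[start - 1:start - 1 + numrecs]

    def decode_resultset_identifier(self, session_id: str, rsid: str) -> str:
        """Return the query that produced the resultset behind this RSID."""
        query, _ = self._resultsets[rsid]
        return query


# Hypothetical sample data.
service = InMemoryLevelTwoService([
    {"Genus": "Peromyscus", "id": "a"},
    {"Genus": "Peromyscus", "id": "b"},
    {"Genus": "Mus", "id": "c"},
])
rsid = service.search("session-1", "Peromyscus")

# A third party holding only the RSID can page through the results.
page = service.retrieve("session-2", rsid, start=2, numrecs=5)
original_query = service.decode_resultset_identifier("session-2", rsid)
```

Passing the RSID instead of the records keeps the data on the service until someone actually asks for a page of it.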

Page 18

EcoGrid Write

Used to push data back to sources (e.g. publishing EML documents)

Depends on the availability of an authentication system

put(sessionID, objectID, object, type)

delete(sessionID,objectID)
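The dependency on authentication can be illustrated with a toy write service in which `put` and `delete` both refuse to act without a known session. This is a sketch under assumptions: the set of valid sessions stands in for a real authentication system, and the `exists` helper is added only to make the example observable.

```python
class InMemoryWriteService:
    """Toy write service; a set of session IDs stands in for real authentication."""

    def __init__(self, valid_sessions: set[str]):
        self._valid_sessions = set(valid_sessions)
        self._objects: dict[str, tuple[str, bytes]] = {}

    def _require_session(self, session_id: str) -> None:
        if session_id not in self._valid_sessions:
            raise PermissionError("unknown or expired session")

    def put(self, session_id: str, object_id: str, obj: bytes, type_: str) -> None:
        """Store (or overwrite) an object together with its declared type."""
        self._require_session(session_id)
        self._objects[object_id] = (type_, obj)

    def delete(self, session_id: str, object_id: str) -> None:
        """Remove an object; raises KeyError if it does not exist."""
        self._require_session(session_id)
        del self._objects[object_id]

    def exists(self, object_id: str) -> bool:
        """Hypothetical helper, not part of the interface on the slide."""
        return object_id in self._objects


# Hypothetical session and object identifiers.
writer = InMemoryWriteService({"session-1"})
writer.put("session-1", "doc.1", b"<eml/>", "eml")
stored = writer.exists("doc.1")
```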

Page 19

Data Instance Query?

• New requirement to support direct query and retrieval with arbitrary data sets
• Generally no common schemas between different instances
• Could either
  • Push data instance to service that can query object (e.g. the SRB)
  • Implement interface at the data instance location
• Simple JDBC / SQL interface?

dbSchema getDataSchema(sessionID,objectID)

dbResultset search(sessionID,objectID,SQL)
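A simple SQL interface of this kind could look like the sketch below, which uses Python's built-in `sqlite3` module as a stand-in for JDBC. Everything here is an assumption for illustration: the class name, the `load` helper, the `obs` table and its rows. Session IDs are not checked, and a real service would have to restrict `search` to read-only statements.

```python
import sqlite3


class SqliteDataInstanceService:
    """Toy service holding one in-memory SQLite database per object ID."""

    def __init__(self) -> None:
        self._instances: dict[str, sqlite3.Connection] = {}

    def load(self, object_id: str, script: str) -> None:
        """Hypothetical helper: create a data instance from a SQL script."""
        conn = sqlite3.connect(":memory:")
        conn.executescript(script)
        self._instances[object_id] = conn

    def get_data_schema(self, session_id: str, object_id: str) -> dict[str, list[tuple[str, str]]]:
        """Return {table name: [(column name, declared type), ...]}."""
        conn = self._instances[object_id]
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        schema = {}
        for (table,) in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [(col[1], col[2]) for col in columns]
        return schema

    def search(self, session_id: str, object_id: str, sql: str) -> list[tuple]:
        """Run caller-supplied SQL against one data instance."""
        return self._instances[object_id].execute(sql).fetchall()


# Hypothetical data instance.
service = SqliteDataInstanceService()
service.load(
    "plots.1",
    "CREATE TABLE obs (species TEXT, abundance INTEGER);"
    "INSERT INTO obs VALUES ('Peromyscus leucopus', 3);"
    "INSERT INTO obs VALUES ('Mus musculus', 1);",
)
schema = service.get_data_schema("session-1", "plots.1")
rows = service.search("session-1", "plots.1", "SELECT species FROM obs WHERE abundance > 1")
```

Since instances share no common schema, the caller first asks for the schema and then writes SQL against it.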

Page 20

Convergence with Globus?

EcoGrid originally intended to use Globus since it provided much of the infrastructure

Globus is not a viable infrastructure layer due to installation and reliability concerns

Should SEEK implement Globus infrastructure to support project requirements?

Likely to duplicate minimal service definitions and re-implement

Page 21

Acknowledgements

This material is based upon work supported by:

The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.

The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.

The Andrew W. Mellon Foundation.

PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)