The SEEK EcoGrid: A Data Grid System for Ecology

Arcot Rajasekar
Matthew Jones
Bertram Ludäscher

UC DAVIS Department of Computer Science
San Diego Supercomputer Center


Science Environment for Ecological Knowledge

Large collaborative NSF/ITR (2002-2007)

Bringing together ecologists, IT experts, CS researchers, …

SEEK.ecoinformatics.org


http://seek.ecoinformatics.org

SWDB Aug 29, 2004

What is SEEK?

• Multidisciplinary research project to facilitate …
• Access to ecological, environmental, and biodiversity data
  – Enable data sharing & re-use
  – Enhance data discovery at global scales
• Scalable analysis and synthesis
  – Taxonomic, spatial, temporal, and conceptual integration of data, addressing data heterogeneity issues
  – Enable communication and collaboration for analysis
  – Enable re-use of analytical components


SEEK Components

Main Components:
• Kepler
  – Problem-solving environment for scientific data analysis and visualization ("scientific workflows")
• EcoGrid
  – Distributed data network for environmental, ecological, and systematics data
  – Making diverse environmental data systems interoperate
• Semantic Mediation System
  – "Smart" data discovery and integration
• Knowledge Representation WG
• Taxon WG
• BEAM WG
• Education, Outreach, Training


Ecological Metadata Language

• Metadata: a means to manage ecological data
  – There is no universal data model for ecology
  – Accommodate heterogeneity and dispersion
• EML
  – Common language for archiving and transporting data
  – Discovery information
    • Creator, Title, Abstract, Keyword, etc.
  – Content
  – Context
  – Physical, logical structure
• SEEK adds semantic structure


An Example EML Document

<?xml version="1.0"?>
<eml:eml packageId="piscoUCSB.5.20" system="knb"
    xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
  <dataset>
    <shortName>Alegria Temperatures</shortName>
    <title>PISCO: Intertidal Temperature Data: Alegria, California: 1996-1997</title>
    <creator id="C.Blanchette">
      <individualName>
        <givenName>Carol</givenName>
        <surName>Blanchette</surName>
      </individualName>
      <organizationName>PISCO</organizationName>
      <address>
        <deliveryPoint>UCSB Marine Science Institute</deliveryPoint>
        <city>Santa Barbara</city>
        <administrativeArea>CA</administrativeArea>
        <postalCode>93106</postalCode>
      </address>
    </creator>
    <abstract>
      <para>These temperature data were collected at Alegria Beach, California, and were ... </para>
    </abstract>
    <keywordSet>
      <keyword>OceanographicSensorData</keyword>
      <keyword>Thermistor</keyword>
      <keywordThesaurus>PISCOCategories</keywordThesaurus>
    </keywordSet>
    <intellectualRights>
      <para>Please contact the authors for permission to use these data. Please also acknowledge the authors in any publications.</para>
    </intellectualRights>
    <contact>
      <references>C.Blanchette</references>
    </contact>
  </dataset>
</eml:eml>

[Diagram label: Transform]
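Because EML is plain XML, the discovery fields in the example can be read with a few lines of code. The following is a minimal sketch, not part of SEEK: it uses Python's standard xml.etree.ElementTree module, the embedded document is an abbreviated copy of the example above, and the function name discovery_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example EML document (address, abstract, rights omitted).
EML_DOC = """<?xml version="1.0"?>
<eml:eml packageId="piscoUCSB.5.20" system="knb"
         xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
  <dataset>
    <shortName>Alegria Temperatures</shortName>
    <title>PISCO: Intertidal Temperature Data: Alegria, California: 1996-1997</title>
    <creator id="C.Blanchette">
      <individualName>
        <givenName>Carol</givenName>
        <surName>Blanchette</surName>
      </individualName>
      <organizationName>PISCO</organizationName>
    </creator>
    <keywordSet>
      <keyword>OceanographicSensorData</keyword>
      <keyword>Thermistor</keyword>
      <keywordThesaurus>PISCOCategories</keywordThesaurus>
    </keywordSet>
  </dataset>
</eml:eml>"""


def discovery_fields(eml_text):
    """Return the discovery-level metadata of an EML document as a dict."""
    root = ET.fromstring(eml_text)
    # In this example only the root element is namespaced; children are not.
    dataset = root.find("dataset")
    given = dataset.findtext("creator/individualName/givenName")
    sur = dataset.findtext("creator/individualName/surName")
    return {
        "package_id": root.get("packageId"),
        "title": dataset.findtext("title"),
        "creator": f"{given} {sur}",
        "keywords": [k.text for k in dataset.findall("keywordSet/keyword")],
    }


fields = discovery_fields(EML_DOC)
print(fields["package_id"], fields["creator"], fields["keywords"])
```

The same approach extends to the other discovery fields (abstract, contact); only the element paths change.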


SEEK Overview


EcoGrid Focus

• Data and Metadata
  – Distributed Data
  – XML-based Metadata
• Service to Semantic Mediation Layer
  – Access to Ontologies and Taxon Services
  – Helping with Semantic Data Integration
• Service to Analysis and Modelling Layer
  – Interaction with Kepler Workflows
  – Interaction with Grid Computing Facilities
• Access to Legacy Apps
  – LifeMapper
  – Spatial Data Workbench


SEEK EcoGrid

• Goal: allow diverse environmental data systems to interoperate
  – Hides complexity of underlying systems using lightweight interfaces
  – Integrate diverse data networks from ecology, biodiversity, and environmental sciences
• Data systems
  – Any system can implement these interfaces
  – Prototyping using:
    • Metacat, SRB, DiGIR, Xanthoria, etc.
• Supports multiple metadata standards
  – EML, Darwin Core as foci


Web services

• Service Oriented Architecture (SOA)
  – Remote discovery and execution of services
• Network transport of data (HTTP)
• Message format (SOAP/XML)
• Service interface description (WSDL)

[Figure: web services interaction diagram with numbered steps 1 to 3 and Morpho labeled. Diagram from http://www.w3.org/TR/2002/WD-ws-arch-20021114/]


Grid Services

• A Grid service is a Web service plus
  – Lifecycle management (persisting the service over outages)
  – State management (tracking sessions across multiple requests)
  – Factory services (allowing many clients to connect)
  – Security (authorization)
  – …

EcoGrid defines a standard set of grid interfaces for use by many data servers


EcoGrid Example

[Diagram: Morpho, the EcoGrid Registry, and an EcoGrid service exposing query() and get()]

EcoGrid WSDL
  query(session, query)
  get(session, identifier)

1. Publish
2. Find service
3. Return service description
4. Execute search, handle response
5. Execute get, handle response
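The five steps can be walked through with in-memory stand-ins. This is a rough sketch under stated assumptions, not the EcoGrid implementation: there is no SOAP or WSDL involved, the class names EcoGridNode and EcoGridRegistry are hypothetical, the registry hands back the service object itself rather than a service description, and the substring matching rule is invented for illustration. Only the query(session, query) and get(session, identifier) signatures come from the slide.

```python
class EcoGridNode:
    """In-memory stand-in for a data system exposing query() and get()."""

    def __init__(self, objects):
        self._objects = dict(objects)  # identifier -> content

    def query(self, session, query):
        # Toy matching rule (an assumption): substring match on the content.
        return [oid for oid, text in self._objects.items() if query in text]

    def get(self, session, identifier):
        return self._objects[identifier]


class EcoGridRegistry:
    """In-memory stand-in for the registry that services publish to."""

    def __init__(self):
        self._services = {}

    def publish(self, name, service):
        # Step 1: a service registers itself under a name.
        self._services[name] = service

    def find(self, name):
        # Steps 2 and 3: the client looks a service up and gets it back.
        return self._services[name]


registry = EcoGridRegistry()
registry.publish("knb", EcoGridNode({"piscoUCSB.5.20": "Alegria Temperatures"}))

node = registry.find("knb")               # steps 2 and 3
hits = node.query("session-1", "Alegria")  # step 4
content = node.get("session-1", hits[0])   # step 5
print(hits, content)
```

In the real system the client (Morpho in the diagram) would receive a WSDL description in step 3 and invoke the remote operations over the network.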


EcoGrid Query Interfaces

• Provides a mechanism for search and retrieval of metadata and federated data
  – Supports third party interaction with search results
    • forwarding of result set identifiers to another service instance for retrieval
• Different levels of compliance
  – Low barrier for participation
  – Bulk of data will be accessible through Type I


EcoGrid Query Level I

• Basic, entry level exposure of data and metadata for EcoGrid and SEEK

• Response contains data – intended for direct communications rather than 3rd party indirection

ResultsetType query(SessionID,QueryType)

byte[] get(SessionID,objectID)



Query Conditions

• Language independent representation of a query structure

• Transformed into the appropriate native language of the data store

Example:

<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>
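To illustrate how a condition tree might be transformed into the native language of a data store, here is a minimal sketch for a relational store. It is an assumption-laden example, not the EcoGrid translator: the operator table, the column names in COLUMNS, and the function name to_sql are all hypothetical, and only the AND example from the slide is exercised. Values are passed as parameters rather than spliced into the SQL text.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from query operators to SQL operators.
SQL_OPERATORS = {"LIKE": "LIKE", "EQUALS": "=", "NOT EQUALS": "<>"}


def to_sql(node, columns):
    """Translate a condition tree into (where_clause, parameters)."""
    if node.tag in ("AND", "OR"):
        parts, params = [], []
        for child in node:
            clause, child_params = to_sql(child, columns)
            parts.append(clause)
            params.extend(child_params)
        return "(" + f" {node.tag} ".join(parts) + ")", params

    column = columns[node.get("concept")]  # concept -> native column name
    operator = node.get("operator")
    value = (node.text or "").strip()
    if value == "NULL":
        # SQL tests for NULL with IS / IS NOT rather than = / <>.
        negation = "NOT " if operator == "NOT EQUALS" else ""
        return f"{column} IS {negation}NULL", []
    return f"{column} {SQL_OPERATORS[operator]} ?", [value]


CONDITIONS = """<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>"""

# Hypothetical column names for one particular relational store.
COLUMNS = {"ScientificName": "sci_name", "DecimalLatitude": "lat"}

where, params = to_sql(ET.fromstring(CONDITIONS), COLUMNS)
print(where, params)
```

A wrapper for a different store (for example a DiGIR provider or an XML database) would walk the same tree and emit that store's own filter syntax instead.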


Specifying the Resultset

• Specify the list of concepts (fields) to be returned in the resultset

• Simple paths used to identify elements or document subtrees

• Effectively flattens the structure of the records, but allows generic representation

Example:

<returnfield>/ScientificName</returnfield>

<returnfield>/Longitude</returnfield>

<returnfield>/Latitude</returnfield>
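The flattening effect of return fields can be shown with a small sketch. It assumes a hypothetical nested record layout (the RECORD structure below is invented; only the field names and values are taken from the slides) and a hypothetical helper named flatten that resolves each simple path against the record.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record; the nesting is invented for illustration.
RECORD = """<record>
  <ScientificName>PEROMYSCUS LEUCOPUS</ScientificName>
  <Location>
    <Longitude>100</Longitude>
    <Latitude>200</Latitude>
  </Location>
</record>"""


def flatten(record_xml, return_fields):
    """Map each simple path to its text value, or None if the path is absent."""
    root = ET.fromstring(record_xml)
    # A leading "/" refers to the record root, so strip it for ElementTree.
    return {path: root.findtext(path.lstrip("/")) for path in return_fields}


row = flatten(RECORD, ["/ScientificName", "/Location/Latitude", "/Elevation"])
print(row)
```

Whatever the shape of the source record, the caller receives one flat mapping from path to value, which is what lets a generic result set represent many metadata schemas.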



Full Query Example

<egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
  <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
  <returnfield>/ScientificName</returnfield>
  <returnfield>/Longitude</returnfield>
  <returnfield>/Latitude</returnfield>
  <title>Peromyscus genus query</title>
  <condition operator="LIKE" concept="Genus">Peromyscus</condition>
</egq:query>
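A client could assemble such a query document programmatically. The sketch below is one possible way to do it with Python's standard xml.etree.ElementTree; the function name build_query is hypothetical, the namespace and schemaLocation declarations of the full example are left out for brevity, and the element order simply copies the slide (whether the schema requires that order is not stated in the source).

```python
import xml.etree.ElementTree as ET

EGQ_NS = "ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
ET.register_namespace("egq", EGQ_NS)  # serialize the root as <egq:query>


def build_query(query_id, system, title, return_fields, operator, concept, value):
    """Build a single-condition query document and return it as a string."""
    root = ET.Element(f"{{{EGQ_NS}}}query", {"queryId": query_id, "system": system})
    for path in return_fields:
        ET.SubElement(root, "returnfield").text = path
    ET.SubElement(root, "title").text = title
    condition = ET.SubElement(
        root, "condition", {"operator": operator, "concept": concept}
    )
    condition.text = value
    return ET.tostring(root, encoding="unicode")


query_xml = build_query(
    "query-digir.1.1",
    "http://knb.ecoinformatics.org",
    "Peromyscus genus query",
    ["/ScientificName", "/Longitude", "/Latitude"],
    "LIKE",
    "Genus",
    "Peromyscus",
)
print(query_xml)
```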


Query Result Set Structure

<rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <startRecord>1</startRecord>
    <endRecord>2</endRecord>
    <recordCount>2</recordCount>
    <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
  …
</rs:resultset>
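On the client side, a result set like this can be turned into flat rows. The following is a minimal sketch, assuming the structure shown on the slide; the embedded document is an abbreviated copy containing only the one record the slide spells out, and the function name parse_resultset is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the slide's result set (one record, reduced metadata).
RESULTSET = """<rs:resultset resultsetId="foo.1.1"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <resultsetMetadata>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
</rs:resultset>"""


def parse_resultset(text):
    """Return one dict per record, with the source system URL resolved."""
    root = ET.fromstring(text)
    # Records refer to their source system by id; build the lookup first.
    systems = {s.get("id"): s.text for s in root.findall("resultsetMetadata/system")}
    rows = []
    for record in root.findall("record"):
        row = {f.get("name"): f.text for f in record.findall("returnField")}
        row["identifier"] = record.get("identifier")
        row["system"] = systems.get(record.get("system"))
        rows.append(row)
    return rows


rows = parse_resultset(RESULTSET)
print(rows[0]["identifier"], rows[0]["ScientificName"])
```

The identifier and system values are what a client would pass on to get, or forward to another service instance for retrieval.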


EcoGrid Get & Put

• get enables retrieval of the content of a dataset or file from systems such as SRB and Metacat.
• get also enables SQL querying of relational databases (Oracle, DB2, etc.) that are pre-registered as data sources in SRB.
• put for data: allows users to create (upload) files in EcoGrid resources such as Metacat and SRB.
• put for metadata: the EcoGrid put service also allows ingestion of metadata, such as EML in Metacat or user-defined metadata in SRB.
  – Depends on the availability of an authentication and access control system
  – put(sessionID, objectID, object, type)
  – delete(sessionID, objectID)
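The dependency of put and delete on access control can be sketched with an in-memory store. This is an illustration under assumptions, not the EcoGrid service: the class name InMemoryEcoGridStore is hypothetical, "authentication" is reduced to checking the session identifier against a set of known sessions, and checking the session on reads as well is a choice made here rather than something the slide states.

```python
class InMemoryEcoGridStore:
    """Toy store offering put/get/delete guarded by a session check."""

    def __init__(self, valid_sessions):
        self._sessions = set(valid_sessions)
        self._objects = {}  # objectID -> (type, bytes)

    def _check(self, session_id):
        # Stand-in for a real authentication and access control system.
        if session_id not in self._sessions:
            raise PermissionError(f"unknown session: {session_id}")

    def put(self, session_id, object_id, obj, obj_type):
        self._check(session_id)
        self._objects[object_id] = (obj_type, bytes(obj))

    def get(self, session_id, object_id):
        self._check(session_id)  # assumption: reads are access-controlled too
        return self._objects[object_id][1]

    def delete(self, session_id, object_id):
        self._check(session_id)
        del self._objects[object_id]


store = InMemoryEcoGridStore({"s1"})
store.put("s1", "doc.1", b"<eml/>", "metadata")
print(store.get("s1", "doc.1"))

try:
    store.put("intruder", "doc.2", b"x", "data")
except PermissionError as err:
    print("rejected:", err)
```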


Building the EcoGrid

[Map figure. Node types in legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, Legacy system. Labeled sites: AND, LUQ, NTL, VCR, HBR]

• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)


EcoGrid Client Interactions

• Modes of interaction
  – Client-server
  – Fully distributed
  – Peer-to-peer
• EcoGrid Registry
  – Node discovery
  – Service discovery
• Aggregation services
  – Centralized access
  – Reliability
  – Data preservation


Layers in EcoGrid


EcoGrid Queries in Kepler


Metadata-driven analysis cycle


Status

• Read, Query & Register Completed
• Simple Registry Operational
• EcoGrid Wrappers completed for:
  – Metacat
  – SRB
  – DiGIR
  – Xanthoria
• Available Interfaces
  – WSDL
  – Simple Web Interactivity
  – Kepler


Acknowledgements

This material is based upon work supported by:

The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.

PBI Collaborators:

NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of California, Davis, University of Kansas (Center for Biodiversity Research)

Kepler contributors:

SEEK, Ptolemy II, DOE SDM/SciDAC, GEON, and others.


Q & A


Frequently Asked Questions …

• Which version of Grid services do you use?
  – We currently use 3.2.x because it was the last stable version based on OGSA. It seems that WSRF does not support the OGSA Factory pattern, which is the main Grid Service feature that we use and wouldn't want to lose. We may migrate to WSRF eventually.
• How can a user (or developer) discover what catalogs are on the EcoGrid?
  – In Kepler, click the "Sources" button on the Data tab. The UI allows a basic query of the EcoGrid registry to discover new nodes and choose which should be searched.
  – Developers can program to the EcoGrid Registry API.
• How much is the EcoGrid *integrated*? Is there a common query language?
  – Yes, there is a common query syntax for expressing path-based metadata queries. This syntax does not do any mapping among various metadata languages. We still need a system that can translate a query that uses terms from one metadata language (e.g., DarwinCore) into queries for another metadata language (e.g., EML). The SEEK SMS system will help with this mapping.


Frequently Asked Questions …

• Is the EcoGrid a "federation of federations"?
  – In a sense. The EcoGrid is an *API* (specifically a Grid Services API) that allows clients to use a common set of communication protocols to access diverse data systems. The EcoGrid API has been implemented for Metacat, DiGIR, and SRB, all of which are federations. Because clients can access these systems via EcoGrid, it can be considered a federation of federations. The EcoGrid Registry lists the systems that have published EcoGrid interfaces accessible to clients.
• Where are the WSDLs?
  – http://ecogrid.ecoinformatics.org/ogsa/services/org/ecoinformatics/ecogrid/EcoGridQueryInterfaceLevelOneService?wsdl
• What's on the EcoGrid right now?
  – The KNB network is gathering data and metadata from NCEAS, 24 LTER sites, and about 200 other field stations (KNB EcoGrid node).
  – The DiGIR system federates access to museum collections data in the form of Darwin Core records. The EcoGrid node at KU points at this network of about 150 museums that are accessible through DiGIR.
  – SRB is currently used to hold some data objects that are described via EML metadata records that are in the KNB Metacat.


Frequently Asked Questions …

• Where is the code for the EcoGrid?
  – Most code is in CVS at seek/projects/ecogrid. Some Kepler-specific client-side UI code is in the Kepler CVS.
  – http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/ecogrid
  – There are also EcoGrid design docs, meeting notes, etc.
• Are there plans for an "EcoGrid Portal" so that end users can easily access and contribute data?
  – Yes, this is under development. In the interim, one can search the KNB and DiGIR sites individually, or use Kepler.