The Federated Data System DataFed: Experiences in Data Homogenization and Networking R.B. Husar, K. Hoijarvi, S. R. Falke, E. M. Robinson , Washington University, St. Louis G. Leptoukh, NASA GSFC Spring AGU, May 29, 2008, Ft. Lauderdale

Page 1

The Federated Data System DataFed:

Experiences in Data Homogenization and Networking

R.B. Husar, K. Hoijarvi, S. R. Falke, E. M. Robinson, Washington University, St. Louis

G. Leptoukh, NASA GSFC

Spring AGU, May 29, 2008, Ft. Lauderdale

Page 2

DataFed in a Nutshell:

A Federation of autonomous, distributed data providers

Performs non-intrusive wrapping of data into web services

Provides service-based analysis services and tools

General Experience with DataFed:

It is an agile virtual data system that can deliver info products to diverse users

Third-party mediation can homogenize distributed data on the fly

Since 2005, DataFed has been used by EPA and in research

DataFed Motivated by GEOSS

DataFed development is guided by the meme of GEOSS

Page 3

Five practices for agile, seamless data federation:

1. Space-Time Query for standardized access to all data (WCS)

2. Data Wrappers for turning heterogeneous data into web services

3. Data Mediators for transforming data into ‘Views’

4. Mashups for connecting autonomous applications

5. DataSpaces for shared metadata by the users, for the users

Page 4

Parameter-Space-Time Query Using OGC WCS Data Access Protocol

Regardless of the data location, data type and format,

• the parameter-space-time query is the same

• the return is in a user-selectable format chosen from the offerings

Query parts: Parameter (Coverage), Bounding Box (BBOX), Time Range (TIME), Out Format (FORMAT)

Grid: Coverage=THEEDDS.T&BBOX=-126,24,-65,52,0,0&TIME=2002-07-07/2002-07-07&FORMAT=NetCDF

Image: Coverage=SEAW.Refl&BBOX=-126,24,-65,52,0,0&TIME=2002-07-07/2002-07-07&FORMAT=GeoTIFF

Station Data: Coverage=SURF.Bext&BBOX=-126,24,-65,52,0,0&TIME=2002-07-07/2002-07-07&FORMAT=NetCDF-table
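The example queries above differ only in the coverage name and the output format. A minimal Python sketch of assembling such a query string follows. The function name build_query is hypothetical, and only the four parameters shown on the slide are included; a complete WCS request would also carry a service endpoint and further parameters that the slide does not show.

```python
from urllib.parse import urlencode


def build_query(coverage, bbox, time_start, time_end, fmt):
    """Assemble a parameter-space-time query string (hypothetical helper)."""
    params = {
        "Coverage": coverage,                    # parameter
        "BBOX": ",".join(str(v) for v in bbox),  # bounding box
        "TIME": f"{time_start}/{time_end}",      # time range
        "FORMAT": fmt,                           # output format
    }
    # Keep ',' and '/' literal so the result matches the slide's notation.
    return urlencode(params, safe=",/")


# Values taken from the slide.
BBOX = (-126, 24, -65, 52, 0, 0)
DAY = "2002-07-07"

grid_query = build_query("THEEDDS.T", BBOX, DAY, DAY, "NetCDF")
image_query = build_query("SEAW.Refl", BBOX, DAY, DAY, "GeoTIFF")
station_query = build_query("SURF.Bext", BBOX, DAY, DAY, "NetCDF-table")

for query in (grid_query, image_query, station_query):
    print(query)
```

The bounding box and time range are shared by all three calls, which is the point of the slide: the query shape stays the same whatever the data type.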

Page 5

DataFed wrappers are non-intrusive, third party

Third-Party Data Wrappers: heterogeneous input data >>> homogeneous (WCS) query
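The wrapper idea can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the class names, the four-value bounding box, and the sample values are hypothetical, and this is not DataFed's actual wrapper code. Two differently stored sources (a CSV table of stations and an in-memory grid) are left untouched and answer the same space-time query through a common interface.

```python
import csv
import io
from abc import ABC, abstractmethod


def in_bbox(lon, lat, bbox):
    """True if the point lies inside bbox = (west, south, east, north)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


class Wrapper(ABC):
    """Uniform query interface placed in front of one native data source."""

    @abstractmethod
    def query(self, bbox, day):
        """Return (lon, lat, value) tuples inside bbox on the given day."""


class CsvStationWrapper(Wrapper):
    """Wraps station records stored as CSV text."""

    def __init__(self, csv_text):
        self.rows = list(csv.DictReader(io.StringIO(csv_text)))

    def query(self, bbox, day):
        return [
            (float(r["lon"]), float(r["lat"]), float(r["value"]))
            for r in self.rows
            if r["date"] == day and in_bbox(float(r["lon"]), float(r["lat"]), bbox)
        ]


class GridWrapper(Wrapper):
    """Wraps regular grids stored as {day: 2D list}, with origin and cell step."""

    def __init__(self, grids, lon0, lat0, step):
        self.grids, self.lon0, self.lat0, self.step = grids, lon0, lat0, step

    def query(self, bbox, day):
        out = []
        for j, row in enumerate(self.grids.get(day, [])):
            for i, value in enumerate(row):
                lon = self.lon0 + i * self.step
                lat = self.lat0 + j * self.step
                if in_bbox(lon, lat, bbox):
                    out.append((lon, lat, float(value)))
        return out


# Hypothetical sample data, for illustration only.
STATIONS_CSV = (
    "date,lon,lat,value\n"
    "2002-07-07,-90.2,38.6,12.5\n"
    "2002-07-07,-150.0,61.2,3.0\n"
    "2002-07-08,-90.2,38.6,9.0\n"
)
stations = CsvStationWrapper(STATIONS_CSV)
grid = GridWrapper({"2002-07-07": [[1, 2], [3, 4]]}, lon0=-100.0, lat0=30.0, step=40.0)

bbox = (-126, 24, -65, 52)
station_result = stations.query(bbox, "2002-07-07")
grid_result = grid.query(bbox, "2002-07-07")
print(station_result)
print(grid_result)
```

The caller issues the identical query to both wrappers and never sees the native storage format.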

Page 6

Mediated User-Data Interface

Mediator turns data into Views

Mediated Integration is a flexible design pattern for System of Systems

Client-Server design is demanding:

User carries the burden of integration

Query

Data Views
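A minimal Python sketch of the mediated pattern, under stated assumptions: the Mediator class, the view functions, and the sample records are hypothetical and stand in for whatever DataFed actually uses. The user sends one query to the mediator and receives a finished view, rather than integrating raw data on the client side.

```python
from statistics import mean


def station_source(bbox, day):
    """Stand-in for a wrapped data service.

    It ignores bbox and day for brevity and returns hypothetical sample values.
    """
    return [
        {"lon": -90.2, "lat": 38.6, "value": 12.5},
        {"lon": -84.4, "lat": 33.7, "value": 20.0},
    ]


def table_view(records):
    """View 1: plain rows of (lon, lat, value)."""
    return [(r["lon"], r["lat"], r["value"]) for r in records]


def summary_view(records):
    """View 2: count and mean of the returned values."""
    values = [r["value"] for r in records]
    return {"count": len(values), "mean": mean(values)}


class Mediator:
    """Sits between user and sources: takes a query, returns a view."""

    def __init__(self, sources, views):
        self.sources = sources
        self.views = views

    def request(self, source_name, view_name, bbox, day):
        records = self.sources[source_name](bbox, day)  # query the source
        return self.views[view_name](records)           # turn data into a view


mediator = Mediator(
    sources={"stations": station_source},
    views={"table": table_view, "summary": summary_view},
)

bbox = (-126, 24, -65, 52)
print(mediator.request("stations", "table", bbox, "2002-07-07"))
print(mediator.request("stations", "summary", bbox, "2002-07-07"))
```

Adding a new view or a new source means registering one more entry with the mediator; the user-side call does not change.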

Page 7

Mashups: Loose Coupling of Autonomous Applications

DataFed - Wiki - GoogleEarth

[Figure: Mashup Workflow; connector labels: SOAP, RDF]

Page 8

DataSpaces for Datasets

Shared Metadata by the Users, for the Users

[Diagram: GEOSS Core, Service Offerors and Users. Components: GEOSS Comp. Registry; GEOSS Clearinghouse; Standards; SIF Registry; Community AQ Portal; Community AQ Catalog; Community DataSpaces; Services; Service Workflow. Roles: Service Offeror; Catalog User; Data Analyst; Policy Analyst; Decision Maker. Link labels: extracts; registers; catalog list; searches, harvests; invokes; references; publishes; provides; composes; visualizes; reports to; informs; find; links to.]

Adopted from Percivall, Feb 2008 by R. Husar, March 2008

Page 9

Wiki ‘DataSpaces’: Creating and Sharing Metadata

Community Catalog - Find Dataset

Describe Dataset

Discuss Dataset

ESIP Communal Wiki

• Semantic Wiki: Structured (RDF) and Unstructured Content

• Open, Standard Metadata - RDF

• Ready for Export/Harvesting by Registries, Catalogs
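To make "Open, Standard Metadata - RDF" concrete, here is a small Python sketch that writes a dataset description as RDF in N-Triples syntax, which registries or catalogs could harvest. It rests on assumptions: the dataset URI and the description text are hypothetical, and the Dublin Core terms vocabulary is used only as a familiar example, not as the vocabulary the wiki actually uses.

```python
# Dublin Core terms namespace, used here only as an example vocabulary.
DCT = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject_uri, properties):
    """Serialize {term: value} pairs as one N-Triples line per property."""
    lines = [
        f'<{subject_uri}> <{DCT}{term}> "{escape_literal(value)}" .'
        for term, value in properties.items()
    ]
    return "\n".join(lines)


# Hypothetical dataset URI and description, for illustration only.
dataset_uri = "http://example.org/datasets/SURF.Bext"
metadata = {
    "title": "SURF.Bext",
    "description": "Example dataset entry",
}

ntriples = to_ntriples(dataset_uri, metadata)
print(ntriples)
```

Each output line is a self-contained subject-predicate-object statement, which is what lets a harvester consume the metadata without knowing the wiki's page layout.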

Page 10

Sharing Best Practices: GEO Best Practice Wiki

Page 11

Developments and Challenges:

Favorable Engineering Developments:

• A core network for Air Quality data sharing is emerging

• Standards are available for sharing previously unstructured data

• Third-party mediation can homogenize the distributed data

• Agile SOA-based systems can deliver info products to diverse users

• Since 2005, one such IS, DataFed, has been used by EPA and in research

However:

• Service interfaces are still uneven; networks are still fragile

• The utility of social networking in science is not understood

• Users cannot provide feedback to upstream providers

• Many cultural, legal and other barriers hamper progress