http://www.ogsadai.org.uk The OGSA-DAI Project Databases and the Grid Neil Chue Hong Project Manager EPCC, Edinburgh [email protected]

Page 1: The OGSA-DAI Project Databases and the Grid

http://www.ogsadai.org.uk

The OGSA-DAI Project: Databases and the Grid

Neil Chue Hong, Project Manager

EPCC, Edinburgh

[email protected]

Page 2: The OGSA-DAI Project Databases and the Grid

What is OGSA-DAI?

It is a project:
– OGSA Data Access and Integration, funded by the UK eScience Grid Core Programme

It is a vision:
– From simple database access to truly virtualised data resources

It is a standard:
– The GridDataService Specification from the Data Access and Integration Working Group (DAIS-WG) of the Global Grid Forum (GGF)

It is software that you can use:
– Current version is R2.5

Page 3: The OGSA-DAI Project Databases and the Grid

OGSA-DAI Objective

To define:
– open standards and
– open source based
– uniform service interfaces
– for accessing heterogeneous data sources
– within the Open Grid Services Architecture (OGSA) framework

Why?
– Because we increasingly want to integrate data sources from different organisations
– The Grid, and OGSA, appear to provide a framework for producing software to do this

Page 4: The OGSA-DAI Project Databases and the Grid

Who are we?

£3 million, 18 months, started February 2002
Funded by the Grid Core Programme

EPCC & NeSC, IBM UK, IBM USA, Manchester e-SC, Newcastle e-SC, Oracle
373 man months

Contributing to the global grid computing community

[Figure: map. Labels: IBM USA, Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, EPCC & NeSC, Newcastle, IBM Hursley, Oracle, Manchester, Cambridge, Hinxton]

Page 8: The OGSA-DAI Project Databases and the Grid

What are we doing?

[Figure: layered diagram. Labels: Grid Plumbing & Security Infrastructure; Scheduling; Accounting; Monitoring; Diagnosis; Logging; Data Intensive Applications; Data & Storage Resources; Distributed; Authorisation; Data Access; Data Integration; Structured Data; Scientific Data Mining & Integration Technology. People shown: Operations Team; App. Developers; Owners; Data Intensive Application Scientists; Data Providers; Data Curators; Tech. Developers]

Page 9: The OGSA-DAI Project Databases and the Grid

DAIS WG

GridDatabaseService Specification
– DAIS WG of the GGF
– Aim to produce a V1.0 specification by early 2004
– Defines an interface for a GridDatabaseService
– Many contributors, not just the OGSA-DAI Project
– OGSA-DAI (the software) seeks to be a reference implementation of this standard
  • But does not necessarily track it exactly just now
– Requirements and Overview Informational documents also published

Page 10: The OGSA-DAI Project Databases and the Grid

The OGSA-DAI Approach

Reuse existing technologies and standards
– OGSA, Query languages, Java, transport

Three key services:
– GridDataService
– GridDataServiceFactory
– DAIServiceGroupRegistry

Benefits:
– Location independence
– Hides heterogeneity
– Scalable
– Flexible
– Dynamic

Page 11: The OGSA-DAI Project Databases and the Grid

OGSA-DAI Positioning - Today

[Figure: layered positioning diagram. Labels: Location; Meta Data; Notification; OGSA; Lifetime; Drivers; Query (Create, Retrieve, Update, Delete); Data Format; Delivery; OGSA-DAI Basic Services; OGSA-DAI Distributed Query; Database, Communication, OS… Technology; GDS; DAISGR; GDSF]

Page 12: The OGSA-DAI Project Databases and the Grid

OGSA-DAI To Date

Assuming that OGSA becomes the standard framework
– Have adopted the OGSA approach

Have first concentrated on data access
– Released software has only limited data integration so far
– Distributed query processor prototype due in July

Implementation provides focus on basic functionality first
– But architecturally we have tried to answer many pertinent questions
– Functionality will increase over subsequent releases

Page 13: The OGSA-DAI Project Databases and the Grid

GDS in action

[Figure: interaction diagram. Participants: Analyst; Registry (DAISGR); Factory (GDSF); Grid Data Service (GDS); Database (Xindice, MySQL, Oracle, DB2); Consumer. Legend: SOAP/HTTP; service creation; API interactions]

1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with SQL, XPath, XQuery etc.
3b. GDS interacts with database
3c. Results of query returned to client as XML
OR
3d. Results of query delivered to consumer as XML
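The numbered steps above can be sketched in a few lines of Python. This is only an illustration of the registry, factory and service pattern, not OGSA-DAI code: the classes `Registry`, `Factory` and `GridDataService`, the topic name `"contacts"` and the in-memory SQLite database are all assumptions made for the example, and the SOAP/HTTP transport is left out entirely.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: runs a query, returns results as XML."""

    def __init__(self, connection):
        self._connection = connection

    def perform(self, sql, params=()):
        cursor = self._connection.execute(sql, params)   # 3b. GDS interacts with database
        columns = [d[0] for d in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        return ET.tostring(root, encoding="unicode")      # 3c. results returned as XML


class Factory:
    """Hypothetical stand-in for a GDSF: creates a GDS to manage access."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        return GridDataService(self._connection)          # 2b and 2c


class Registry:
    """Hypothetical stand-in for a DAISGR: maps a topic to a factory."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        return self._factories[topic]                     # 1b. registry returns factory


def demo():
    # Assumed sample data; the table name is borrowed from the deck's example.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
    db.execute("INSERT INTO littleblackbook VALUES (10, 'Ada')")
    registry = Registry()
    registry.register("contacts", Factory(db))

    factory = registry.find("contacts")                   # 1a and 1b
    gds = factory.create_service()                        # 2a to 2c
    return gds.perform("SELECT * FROM littleblackbook WHERE id=?", (10,))  # 3a to 3c
```

The client never holds a database connection itself; it only holds the handles it was given, which is what lets the registry and factory hide where and how the data is stored.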

Page 14: The OGSA-DAI Project Databases and the Grid

Activities

OGSA-DAI is structured around the concept of activities

This framework allows new functionality to be added easily

Three types of activity at present:
– statement (e.g. SQLQuery, XUpdate)
– transformation (e.g. XSL transformation, compression)
– delivery (e.g. GridFTP)

OGSA-DAI provides implementations of common functionality; others can extend it
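One way to picture an extensible activity framework is a registry keyed by activity name, with each entry tagged as a statement, transformation or delivery. The sketch below is an assumption-laden illustration and does not reflect how OGSA-DAI implements activities: the `activity` decorator, the `ACTIVITIES` table and the three sample activities are hypothetical names invented here.

```python
import zlib

# Hypothetical registry: activity name -> (activity type, callable).
ACTIVITIES = {}
# Hypothetical sink used by the sample delivery activity.
DELIVERED = []


def activity(name, kind):
    """Register a function as an activity of one of the three types."""
    if kind not in ("statement", "transformation", "delivery"):
        raise ValueError(f"unknown activity type: {kind}")

    def decorator(func):
        ACTIVITIES[name] = (kind, func)
        return func

    return decorator


@activity("staticQuery", "statement")
def static_query(_data):
    # Stand-in for a statement activity such as an SQL query.
    return b"<row><id>10</id></row>"


@activity("compress", "transformation")
def compress(data):
    # Transformation activity: compress the output of the previous step.
    return zlib.compress(data)


@activity("deliverToList", "delivery")
def deliver_to_list(data):
    # Stand-in for a delivery activity such as a GridFTP transfer.
    DELIVERED.append(data)
    return data


def run(names, data=None):
    """Run the named activities in order, feeding each output to the next."""
    for name in names:
        _kind, func = ACTIVITIES[name]
        data = func(data)
    return data
```

Adding new functionality then means registering one more function; for example, `run(["staticQuery", "compress", "deliverToList"])` chains a statement, a transformation and a delivery without any change to `run` itself.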

Page 15: The OGSA-DAI Project Databases and the Grid

Documents

Accessing a Grid Data Resource is done using Documents
– caveat: this may change

A document allows you to:
– define parameters
– execute activities
– deliver results

Written in XML, normally used by a client.

<gridDataServicePerform>
  <request name="myRequest">
    <parameter name="idname">
      <value name="idvalue">10</value>
    </parameter>
    <sqlQueryStatement name="myStatement">
      <sqlParameter position="1" from="idvalue"/>
      <expression>SELECT * FROM littleblackbook WHERE id=?</expression>
      <webRowSetStream name="statementresult"/>
    </sqlQueryStatement>
    <deliverToResponse name="d1">
      <fromLocal from="statementresult"/>
    </deliverToResponse>
  </request>
</gridDataServicePerform>
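Since a perform document is ordinary XML, a client can assemble it programmatically rather than by hand. The following sketch rebuilds the document shown on this slide using Python's standard `xml.etree.ElementTree`. The function name `build_perform_document` is an assumption for illustration; the element and attribute names are copied from the slide, and no XML namespace is added because the slide shows none.

```python
import xml.etree.ElementTree as ET


def build_perform_document(id_value):
    """Build the perform document from the slide for a given id value."""
    perform = ET.Element("gridDataServicePerform")
    request = ET.SubElement(perform, "request", {"name": "myRequest"})

    # Define a parameter that the statement refers to by name.
    parameter = ET.SubElement(request, "parameter", {"name": "idname"})
    value = ET.SubElement(parameter, "value", {"name": "idvalue"})
    value.text = str(id_value)

    # Statement activity: a parameterised SQL query.
    statement = ET.SubElement(request, "sqlQueryStatement", {"name": "myStatement"})
    ET.SubElement(statement, "sqlParameter", {"position": "1", "from": "idvalue"})
    expression = ET.SubElement(statement, "expression")
    expression.text = "SELECT * FROM littleblackbook WHERE id=?"
    ET.SubElement(statement, "webRowSetStream", {"name": "statementresult"})

    # Delivery activity: return the statement output in the response.
    delivery = ET.SubElement(request, "deliverToResponse", {"name": "d1"})
    ET.SubElement(delivery, "fromLocal", {"from": "statementresult"})

    return ET.tostring(perform, encoding="unicode")
```

Calling `build_perform_document(10)` yields a single-line string with the same structure as the document above; how that string is submitted to a service is outside the scope of this sketch.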

Page 16: The OGSA-DAI Project Databases and the Grid

OGSA-DAI Core Services

OGSA-DAI Release 2.5 – out now
– Java, Tomcat, Globus Toolkit 3 Beta
– Supports MySQL, DB2, Xindice; SQL92, XPath, XUpdate

OGSA-DAI Release 3 – end July
– Java, Tomcat, Globus Toolkit 3.0
– Supports MySQL, DB2, Oracle, Xindice; SQL92, XPath, XUpdate
– Adds Notification, Internationalisation, Transactions, Caching

Continue to track Globus Toolkit 3 releases
– Experimental, then production, GT3 grids will help

Page 17: The OGSA-DAI Project Databases and the Grid

Asynchronous Delivery

[Figure: two sequence diagrams, titled "Asynchronous delivery – Pull" and "Asynchronous delivery – Push". Each shows Client, Consumer, DB, GDS, GDT and GDS Instance, with steps numbered 1 to 3. Message labels in the first diagram: Q; Ra; Rs; DT; GSH/R + data id; D + GDH. Message labels in the second diagram: Q + D + GSH/R; Ra; Rs; DT; GSH/R]

Page 18: The OGSA-DAI Project Databases and the Grid

GDS Composition

[Figure: five numbered panels (1 to 5) showing compositions of Client, GDSClient, GDS, Operation and DB elements]

Page 19: The OGSA-DAI Project Databases and the Grid

Distributed Query Service

A higher level service:
– Extension of Polar* query processor, partitions and schedules queries
– Sits on top of OGSA and OGSA-DAI

Defines new portTypes and services
– GridDistributedQuery (GDQ) PortType
– GridDistributedQueryService (GDQS) – wraps Polar*
– GridQueryEvaluatorService (GQES) – performs subqueries

Currently based on OGSA-DAI Release 1.5

Page 20: The OGSA-DAI Project Databases and the Grid

DQS Architecture

Page 21: The OGSA-DAI Project Databases and the Grid

DQP in action

Page 22: The OGSA-DAI Project Databases and the Grid

DQS: the future

The GridDistributedQueryService
– is an example of a higher level data integration service which utilises OGSA-DAI core services
– Assumes that GDSF, GDQS Factory and client live in different containers
– Really requires a well-defined meta-model for the physical schema of a database
  • Being partially addressed in DAIS WG
– Shows how a GDS can be both client and service
  • Service hierarchy and composition

DAIT (proposed follow-on to OGSA-DAI) would produce a robust reference implementation of the DQP components

Page 23: The OGSA-DAI Project Databases and the Grid

Projects using OGSA-DAI

Industry:
– FirstDIG: business process analysis (with First Transport Group)
  • OGSA-DAI with datamining

Collaborative:
– Bridges: database integration over six geographically distributed genomics research sites (with IBM UK)
  • OGSA-DAI with DiscoveryLink
– eDIKT: porting OGSA-DAI to other platforms
  • OGSA-DAI with performance
– DEISA: linking Europe’s HPC centres
  • OGSA-DAI with distributed accounting
– MS .Net Grid: porting OGSA-DAI to the .Net framework (with Microsoft Research UK)
  • OGSA-DAI with .Net

Page 24: The OGSA-DAI Project Databases and the Grid

ODD Genes

OGSA-DAI used to query gene expression data resources at GTI and HGU
– One data resource: low spatial resolution, high gene resolution
– Other resource: high spatial resolution, low gene resolution
– Query one database and use data to find correct data resource to run more detailed query and produce visualisation
– Simple example of data integration at work

[Figure: diagram. Labels: Client; Query (twice); Render; GDS (twice); GTI; EPCC; HGU]
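The two-step pattern described on this slide (query one resource, then use the answer to pick the resource for a more detailed query) can be sketched as follows. Everything concrete here is an assumption made for illustration and is not taken from the ODD Genes demonstrator: the table names `overview` and `expression`, their columns, the resource key `"siteB"` and the sample rows are hypothetical, and in-memory SQLite databases stand in for the real data resources.

```python
import sqlite3


def make_db(schema, insert, rows):
    """Create an in-memory database standing in for one data resource."""
    db = sqlite3.connect(":memory:")
    db.execute(schema)
    db.executemany(insert, rows)
    return db


def detailed_expression(gene, overview_db, detailed_resources):
    """Query the overview first, then the resource it points to."""
    # Step 1: the first resource says which detailed resource covers the gene.
    row = overview_db.execute(
        "SELECT resource FROM overview WHERE gene = ?", (gene,)
    ).fetchone()
    if row is None:
        return []
    # Step 2: run the more detailed query against the resource just found.
    detailed_db = detailed_resources[row[0]]
    return detailed_db.execute(
        "SELECT x, y, level FROM expression WHERE gene = ? ORDER BY x, y",
        (gene,),
    ).fetchall()


# Hypothetical sample data for the two resources.
overview = make_db(
    "CREATE TABLE overview (gene TEXT, resource TEXT)",
    "INSERT INTO overview VALUES (?, ?)",
    [("geneA", "siteB")],
)
site_b = make_db(
    "CREATE TABLE expression (gene TEXT, x INTEGER, y INTEGER, level REAL)",
    "INSERT INTO expression VALUES (?, ?, ?, ?)",
    [("geneA", 2, 1, 0.7), ("geneA", 1, 1, 0.4), ("geneB", 1, 1, 0.9)],
)
points = detailed_expression("geneA", overview, {"siteB": site_b})
```

The list in `points` holds the per-position values that a rendering step would visualise; the rendering itself is omitted from the sketch.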

Page 25: The OGSA-DAI Project Databases and the Grid

Project Timeline

[Figure: timeline with axis ticks Feb ’02, May ’02, Jul ’02, Sep ’02, Dec ’02, Feb ’03, May ’03, Sep ’03 and a "today" marker. The positions of the items below on the axis are not recoverable from the extracted text.]

Milestones shown:
– Phase 1 Starts; Phase 2 Starts
– RDB + GT2 / OGSA Prototypes Available
– XML + OGSA Prototype Available
– Design Documents & Demos for DAIS WG @ GGF5
– XML + OGSA Prototypes for Early Adopters
– GGF6 WG Papers & Prototypes
– Tutorial @ GGF7
– Early Adopters Workshop @ NeSC
– Tutorial @ NeSC
– OGSA-DAI Tutorial @ NeSC
– WS + GSI UK support ( > 100 downloads)
– Ship Release 1 (Jan 15th 2003)
– Release 1.5 (Feb 28th 2003)
– Release 2; Release 2.5; Release 3

Also marked: TP4; TP5; GT3 A1; GT3 A2; GT3 A3; GT3 A4; GT3 Beta; GT3 Final

Page 26: The OGSA-DAI Project Databases and the Grid

A DAIT for the Future

DAIT (Data Access and Integration Two)
– follow on project from OGSA-DAI, funded for two years
– continue to research, prototype and productise
– release every six months, R4 in December 2003
– R4:
  • support for SQL Server and structured filesystems
  • extended DBMS management functionality (e.g. archive)
  • bulk load operations (where supported)
  • support for DFDL file access
  • triggers exposed through notification
– R5:
  • Distributed Query Processing, Distributed Transactions
  • Virtualised views across databases

Page 27: The OGSA-DAI Project Databases and the Grid

Further information

The OGSA-DAI Project Site:
– http://www.ogsadai.org.uk

The DAIS-WG site:
– http://cs.man.ac.uk/grid-db

OGSA-DAI Users Mailing list
– [email protected]
– General discussion on grid data access and integration

Formal support for OGSA-DAI releases
– http://www.ogsadai.org.uk/support + [email protected]

OGSA-DAI training courses
– http://www.ogsadai.org.uk/courses/