Extensible Framework for Data Access & Integration, Malcolm Atkinson, Director, www.nesc.ac.uk, 10th November 2004


Page 1

Extensible Framework for Data Access & Integration

Malcolm Atkinson, Director

www.nesc.ac.uk

10th November 2004

Page 2

Database Growth

PDB Content Growth

Page 3

Wellcome Trust: Cardiovascular Functional Genomics

[Diagram labels: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands; Shared data; Public curated data; BRIDGES; IBM]

Page 4

Biochemical Pathway Simulator

Closing the information loop – between lab and computational model.

(Computing Science, Bioinformatics, Beatson Cancer Research Labs)

DTI Bioscience Beacon Project Harnessing Genomics Programme

Slide from Muffy Calder, Glasgow

Now largest EU project in the Life Sciences – see http://www.cancerresearchuk.org/news/pressreleases/scottishscientists_22july04

Walter Kolch

Page 5

eDiaMoND – Compute

Mammograms have different appearances, depending on image settings and acquisition systems

[Figure panels: Standard Mammo Format; Temporal mammography; Computer Aided Detection; 3D View]

Provided by eDiaMoND project: Prof. Sir Mike Brady et al.

Page 6

Automatic registration technology

Rigid registration of MR and CT images of the head

Inter-subject image warping

Provided by IXI project: Prof. Derek Hill et al.

Page 7

Move Computation to Data

Code scale
  Depends on wet-ware
  No noticeable rate of improvement

Data scale
  Grows at Moore’s Law or Moore’s Law²

Analysis of data
  Extracts & derivatives used
  Often smaller – more value for current investigation

Implies move code to data
  SQL, XQuery, Java code, …

Extensibility mechanisms used by OGSA-DAIers
  Java mobility (e.g. DataCutter), database procedures, …

Increasingly necessary

Application control or higher-level service decisions
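The "move code to data" point on this slide can be illustrated with a minimal sketch. It assumes an in-memory SQLite table standing in for a remote data resource; the table name `readings`, its columns, and both function names are hypothetical and are not part of OGSA-DAI. The first function ships every row to the client and computes there; the second ships the query to the data and receives only the derived value.

```python
import sqlite3

# Hypothetical data set: 10,000 readings held "at the data resource".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sample_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)


def move_data_to_code(conn):
    """Ship every row to the client, then filter and average there."""
    rows = conn.execute("SELECT sample_id, value FROM readings").fetchall()
    values = [value for sample_id, value in rows if sample_id == 7]
    return len(rows), sum(values) / len(values)


def move_code_to_data(conn):
    """Ship the query to the data; only the derived value comes back."""
    rows = conn.execute(
        "SELECT AVG(value) FROM readings WHERE sample_id = ?", (7,)
    ).fetchall()
    return len(rows), rows[0][0]


shipped_all, mean_client = move_data_to_code(conn)
shipped_one, mean_server = move_code_to_data(conn)
print(shipped_all, shipped_one, mean_client == mean_server)
```

Both routes give the same mean, but the second transfers one row instead of ten thousand, which is the "extracts & derivatives are often smaller" argument in miniature.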

Page 8

Integration is Everything

Motivation
  No business or research team is satisfied with one data resource

Data Curation Expertise Human Centred
Integration Human centred
Domain-specialist driven

Dynamic specification of combination function
  Iterative processes
  Revised request minutes later
  Revised request after months of thought

Sources inevitably heterogeneous
  Time-varying content, structure & policies

Robust, stable, steerable integration services
Higher-level services over multiple resources
  Fundamental requirements for (re)negotiation

Federation or Virtualisation preceding integration, or kit of integration tools to be interwoven with an application?

Page 9

Infrastructure Architecture

[Layered architecture diagram; labels recovered from the slide, layering not preserved:]

Data Intensive X Scientists
Data Intensive Applications for Science X
Simulation, Analysis & Integration Technology for Science X
Virtual Integration Architecture
Generic Virtual Data Access and Integration Layer
OGSA-DAI
Structured Data Integration
Structured Data Access
Structured Data: Relational, XML, Semi-structured
Transformation
OGSA
Registry, Job Submission, Data Transport, Resource Usage, Banking, Brokering, Workflow, Authorisation
Grid or Web Service Infrastructure
Compute, Data & Storage Resources
Distributed

Page 10

OGSA-DAI

[Interaction diagram. Participants: Analyst; Registry (GDSR); Factory (GDSF); Grid Data Service (GDS); Database (Xindice, MySQL, Oracle, DB2); Consumer. Arrow types: SOAP/HTTP; service creation; API interactions.]

Request to Registry for sources of data about “x”

Registry responds with Factory handle

Request to Factory for access to database

Factory creates GridDataService

Factory returns handle of GDS to client

Client queries GDS with SQL, XPath, XQuery etc.

GDS interacts with database

Query results returned as XML, OR delivered to consumer as XML
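The registry, factory and data-service sequence on this slide can be sketched as an in-process toy. Everything here is an assumption made for illustration: the class and method names (`Registry.lookup`, `Factory.create_service`, `GridDataService.query`), the `proteins` table, and the XML layout are hypothetical and do not reproduce the OGSA-DAI interfaces, which are exposed over SOAP/HTTP rather than as local Python objects.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: runs a query, returns XML text."""

    def __init__(self, conn):
        self._conn = conn

    def query(self, sql, params=()):
        cursor = self._conn.execute(sql, params)
        columns = [d[0] for d in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory: creates a data service bound to one database."""

    def __init__(self, conn):
        self._conn = conn

    def create_service(self):
        return GridDataService(self._conn)


class Registry:
    """Hypothetical registry: maps a topic to the factory that serves it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def lookup(self, topic):
        return self._factories[topic]


# Set up a tiny database and publish its factory under a topic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proteins (id INTEGER, name TEXT)")
conn.execute("INSERT INTO proteins VALUES (1, 'lysozyme')")
registry = Registry()
registry.register("proteins", Factory(conn))

# Client side: registry -> factory handle -> service handle -> query -> XML.
factory = registry.lookup("proteins")
gds = factory.create_service()
xml_text = gds.query("SELECT name FROM proteins WHERE id = ?", (1,))
print(xml_text)  # <results><row><name>lysozyme</name></row></results>
```

The indirection means the client never holds database credentials or a driver; it only holds handles, which is what lets the same client code reach different back ends.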

Page 11

OGSA-DAI Downloads R4

690 downloads since May 04
- Actual user downloads, not search engine crawlers
- Does not include downloads as part of GT3.2 releases

Total of 838 registered users

R1.0 (Jan 03)   104
R1.5 (Feb 03)   108
R2.0 (Apr 03)   250
R2.5 (Jun 03)   291
R3.0 (Jul 03)   792
R3.1 (Feb 04)   630

Total 2865

Downloads by Country – OGSA-DAI R4.0

United Kingdom  21%
China           26%
United States   13%
Japan            5%
Unknown          7%
Germany          5%
Italy            5%
Austria          2%
Australia        2%
France           3%
Taiwan           2%
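A quick arithmetic check on the figures above: the six earlier releases sum to 2175, so the stated total of 2865 appears to include the 690 R4 downloads. The snippet below only re-adds the numbers printed on the slide; the variable names are illustrative.

```python
# Download counts as printed on the slide.
earlier_releases = {
    "R1.0": 104,
    "R1.5": 108,
    "R2.0": 250,
    "R2.5": 291,
    "R3.0": 792,
    "R3.1": 630,
}
r4_downloads = 690

earlier_total = sum(earlier_releases.values())
grand_total = earlier_total + r4_downloads
print(earlier_total, grand_total)  # 2175 2865
```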

Page 12

Multiple tasks / request

[Diagram labels: tasks 1 and 2; REQUESTOR STUB; CLIENT API; Data Set (twice); tables with columns Ident, Type, Value]

Page 13

Be Direct

Double handling costs too much
  Memory cycles, bus capacity, cache disruption, …

Double handling via discs pathologically bad

Data translation expensive
  Avoid
  Deliver as stored, …

Compose
  Stream

Main memory is not big enough
  Stream or use disk

Couple generator & consumer directly
  Stream from RAM to RAM
  Requires coupled computation execution

Breaks down boundaries and merges data, execution & transport requirements.

Demands smart workflow enactment service & foundation services

Models for process transformation and optimisation
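The contrast between double handling and direct coupling of generator and consumer can be sketched with Python generators. This is a single-process illustration under stated assumptions: the stage names (`generate`, `translate`, `consume`) and the doubling "translation" are hypothetical, and a real deployment would stream between machines rather than between functions.

```python
def generate(n):
    """Producer: yields records one at a time."""
    for i in range(n):
        yield i


def translate(records):
    """Intermediate stage: transforms each record as it passes through."""
    for record in records:
        yield record * 2


def consume(records):
    """Consumer: folds the stream into a single result."""
    total = 0
    for record in records:
        total += record
    return total


def double_handled(n):
    """Materialise every stage: each record is stored twice before use."""
    staged = list(generate(n))
    translated = [record * 2 for record in staged]
    return consume(translated), len(staged) + len(translated)


def streamed(n):
    """Couple the stages directly: records flow through without staging."""
    return consume(translate(generate(n)))


print(double_handled(1000))  # (999000, 2000)
print(streamed(1000))        # 999000
```

Both routes produce the same result, but the streamed pipeline never holds more than one record in flight, whereas the double-handled version buffers two full copies of the data.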

Page 14

Take Home Message

Data Access & Integration
  Two Models: kit of parts; Virtualisation

Ubiquitous Needs
  Pervasive and growing number and diversity of data collections
  Opportunity and power to integrate and mine

OGSA-DAI Pioneering
  Talk by Amrey Krause - 5:15 Today

Growing Community
  Implementation
  Standards
  Users
  Join the party of users, contributors & researchers