
Page 1

UK e-Science

Grid Infrastructure meets Biological Research Challenges

Malcolm Atkinson

Director of National e-Science Centre

www.nesc.ac.uk

2nd October 2002

The UK Biological Grid: Data and Computation

The Wellcome Trust Genome Campus

Hinxton, Cambridgeshire

Page 2

Overview

UK e-Science: Reminder of Investment and Infrastructure

International e-Science: Examples and Collaboration

Data Access and Integration: Lego Bricks for Scientific Application Developers

A Computer Scientist’s View of Biology

Diversity and Opportunity

The Way Ahead

Page 3

e-Science

Fundamentally about Collaboration: Sharing

Ideas; Thought processes and Stimuli; Effort; Resources

Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure

Scientists (Biologists) have done this for Centuries

Page 4

e-Science (take 2)

Fundamentally about Collaboration: Sharing

Ideas; Thought processes and Stimuli; Effort; Resources

Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure

Text, digital media, structured, organised & curated data, computable models, visualisation, shared instruments, shared systems, shared administration, …

Nationally & Internationally Distributed, …

Routine, Daily, Automated, …

That Requires very Significant Investment in Digital Systems and their Support

Page 5

e-Science (take 3)

Fundamentally about Collaboration: Sharing

Ideas; Thought processes and Stimuli; Effort; Resources

Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure

Digital networks, digital work-places, digital instruments, …

Metadata, ontologies, standards, shared curated data, shared codes, …

Common platforms, shared software, shared training, …

The Grid SHOULD make this much easier by providing a common, supported, high-level software and organisational infrastructure

Authentication, Authorisation, Accounting, Provenance, Policies, …

Shared Provision of Platform,

Page 6

Grid Expectations

Persistence: Always there, Always Working, Always Supported

Stability: You can build on foundations that don’t move

Trustworthy & Predictable: Honours commitments

Digital policies, digital contracts, security, …; Data integrity, longevity and accessibility; Performance

High-level & Extensible: The capabilities you need are already there

Ubiquitous: Your collaborators use it

Page 7

Grid Reality

Persistence: Always there, Always Working, Always Supported

Stability: You can build on foundations that don’t move

Trustworthy & Predictable: Honours commitments

Digital policies, digital contracts, security, …; Data integrity, longevity and accessibility; Performance

High-level & Extensible: The capabilities you need are already there

Ubiquitous: Your collaborators use it

Political, Economic & Technical issues to Solve

Early days, but Open Grid Services link with Web Services + GGF standardisation

Not yet, but a very substantial global effort to achieve this

Good basis for extension: Commitment to basic functionality

WS + Community effort

Global & Industrial Rallying Cry: Must work with Web Services

Page 8

UK Grid Network (map): Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton

National e-Science Centre

Access Grid always-on video walls

HPC(x)

Page 9

National e-Science Centre

Events: Workshops, Research Meetings, International Meetings

History of Events: GGF5, HPDC11, Summer school; > 50 workshops held; > 1000 people in total; many return often

Planned Events: 25 workshops; Conferences to 2005

Visitors: 3 arrived, 4 arranged

International collaboration, visits & visitors

China, Argonne National Lab, SDSC, NCSA, …

Centre Projects: Pilot Projects, Regional Support, Research Projects

EPSRC, MRC, WT, SHEFC

Page 10

A day in the life of NeSC

Page 11

Online Access to Scientific Instruments (figure): DOE X-ray grand challenge (ANL, USC/ISI, NIST, U. Chicago) at the Advanced Photon Source; real-time collection, tomographic reconstruction, wide-area dissemination, archival storage, desktop & VR clients with shared controls

From Steve Tuecke 12 Oct. 01

Page 12

Figure: UCSF and UIUC

From Klaus Schulten, Center for Biomolecular Modeling and Bioinformatics, Urbana-Champaign

Page 13

DataGrid Testbed

Testbed Sites (map, >40): HEP sites and ESA sites, including Dubna, Moscow, RAL, Lund, Lisboa, Santander, Madrid, Valencia, Barcelona, Paris, Berlin, Lyon, Grenoble, Marseille, Brno, Prague, Torino, Milano, BO-CNAF, PD-LNL, Pisa, Roma, Catania, ESRIN, CERN, IPSL, Estec, KNMI

Page 14

A Simplified Grid Anatomy

Grid Plumbing & Security Infrastructure

Scheduling Accounting Authorisation

Monitoring Diagnosis Logging

Scientific Application

Data & Compute Resources

Operations Team

Application Developers

Distributed

Owners

Scientific Users

Page 15

A Biological Grid Anatomy

Grid Plumbing & Security Infrastructure

Scheduling Accounting Authorisation

Monitoring Diagnosis Logging

Scientific Application

Data & Compute Resources

Distributed

Biological Users

Data Access

Data Integration

Structured Data

Page 16

Database Growth

PDB protein structures

Page 17

Scientific Data

Deluge of Data: Exponential growth

Doubling times: Astronomy 12 months; Bio-Sequences 9 months; Functional Genomics 6 months; Bytes/dollar 12 to 18 months

Not how big it is, but …

Page 18

Scientific Data

Deluge of Data: Exponential growth

Doubling times: Astronomy 12 months; Bio-Sequences 9 months; Functional Genomics 6 months; Bytes/dollar 12 to 18 months

Not how big it is, but what you do with it

Sharing; Curation; Metadata; Automated movement, access & integration; Computational access

Page 19

Scientific Data

Deluge of Data: Exponential growth

Doubling times: Astronomy 12 months; Bio-Sequences 9 months; Functional Genomics 6 months; Bytes/dollar 12 to 18 months

Not how big it is, but how you embrace & manage change

The Database is a Knowledge chest; The Database is a Communication Hub; Autonomously Managed (Curated) change; An Essential part of e-BioMedical Science
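As a rough illustration of why these doubling times matter, the sketch below compounds them over a fixed horizon. It assumes clean exponential growth and treats the cost of keeping the data as volume divided by bytes/dollar; the five-year horizon and the function names are illustrative choices, not figures from the talk.

```python
"""Compound the doubling times quoted on the slide over a fixed horizon.

Assumes pure exponential growth; the 60-month horizon is an arbitrary
choice for illustration.
"""

# Doubling times in months, as quoted on the slide.
DATA_DOUBLING_MONTHS = {
    "Astronomy": 12,
    "Bio-Sequences": 9,
    "Functional Genomics": 6,
}
# Bytes/dollar doubles every 12 to 18 months: keep both ends of the range.
STORAGE_DOUBLING_RANGE = (12, 18)


def growth_factor(months: float, doubling_months: float) -> float:
    """Growth after `months` for a quantity that doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def storage_cost_factor(months: float, data_doubling: float, storage_doubling: float) -> float:
    """Change in the cost of holding all the data: volume growth / bytes-per-dollar growth."""
    return growth_factor(months, data_doubling) / growth_factor(months, storage_doubling)


if __name__ == "__main__":
    horizon = 60  # five years
    fast, slow = STORAGE_DOUBLING_RANGE
    for field, doubling in DATA_DOUBLING_MONTHS.items():
        best = storage_cost_factor(horizon, doubling, fast)
        worst = storage_cost_factor(horizon, doubling, slow)
        print(f"{field}: volume x{growth_factor(horizon, doubling):.0f}, "
              f"storage cost x{best:.1f} to x{worst:.1f}")
```

Under these assumptions, functional genomics data grows 1024-fold in five years while bytes/dollar improves at most 32-fold, so the cost of merely keeping the data still rises about 32-fold.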

Page 20

Wellcome Trust: Cardiovascular Functional Genomics

Sites (map): Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands

Shared data; Public curated data

Page 21

Data Access & Integration

Central to e-Science, especially Earth Sciences, Ecology, Biology & Medicine

Collaboration: Shared Databases; Curated Knowledge; Accumulated Observations; Accumulated Simulations

Computation: Data mining; Input to models; Calibration of models

Presentation: Publication of results; Visualisation
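The integration step these slides keep returning to can be sketched in a few lines: each source is first translated into a common form (normalised identifiers, agreed field names) and only then merged. This is a generic in-memory illustration, not part of any OGSA-DAI interface; the sources, field names and the `translate`/`integrate` helpers are hypothetical.

```python
"""Integrate records from two sources that describe the same entities
with different schemas: translate each to a common form, then merge on
a shared key. All source names, fields and values are hypothetical."""

from typing import Dict, Iterable, List, Mapping, Tuple

CommonRecord = Tuple[str, Dict[str, str]]  # (normalised key, attributes)


def translate(rows: Iterable[Mapping[str, str]], key_field: str,
              rename: Mapping[str, str]) -> List[CommonRecord]:
    """Map source-specific rows onto (key, attributes) using a renaming table."""
    records = []
    for row in rows:
        key = row[key_field].strip().upper()  # normalise identifiers across sources
        attrs = {rename.get(name, name): value
                 for name, value in row.items() if name != key_field}
        records.append((key, attrs))
    return records


def integrate(*sources: List[CommonRecord]) -> Dict[str, Dict[str, str]]:
    """Merge translated sources on their key; later sources add or override attributes."""
    merged: Dict[str, Dict[str, str]] = {}
    for source in sources:
        for key, attrs in source:
            merged.setdefault(key, {}).update(attrs)
    return merged


# Two hypothetical sources with different key names, casing and field names.
SEQUENCE_ROWS = [{"acc": "x00001", "seq": "MKV"}, {"acc": "x00002", "seq": "MAL"}]
ANNOTATION_ROWS = [{"ID": "X00001", "role": "kinase"}]

merged = integrate(
    translate(SEQUENCE_ROWS, "acc", {"seq": "sequence"}),
    translate(ANNOTATION_ROWS, "ID", {"role": "function"}),
)
print(merged)
```

Translating before merging keeps source-specific quirks (key casing, field names) out of the merge step. The last-writer-wins rule in `integrate` is a simplification; curated sources that disagree would need an explicit conflict policy.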

Page 22

GGF DAIS WG

Chairs: Norman Paton (Manchester Uni.), Leanne Guy (CERN), Dave Pearson (Oracle UK)

Activity: BoF at GGF4 Toronto; WG Meeting at GGF5 Edinburgh; Papers for GGF6; Workshops & Mail lists

Goals: Agree Standards for Database Access & Integration; Freely available reference implementations

OGSA-DAI: one source & focus for discussions

Norman Paton, Inderpal Narang, Leanne Guy, Susan Malaika, Greg Riccardi, …

Page 23

OGSA-DAI project

Lego kit for Data Access & Integration: Components for e-Science Applications

Accelerated Application Development; Multiple Data Models

Distributed Data: Access via Grid & Proxies

Integration, Translation & Transformation

Open Source Reference Implementation for the DAIS-WG standard

Trigger for Component Construction: Start a community

Page 24

OGSA-DAI Partners (map): EPCC & NeSC, IBM UK (Hursley), IBM USA, Manchester e-SC, Newcastle e-SC, Oracle

£3 million, 18 months, started February 2002

Other sites shown: Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Cambridge, Hinxton

Page 25

Primary Components

Diagram components: Client, Consumer, GDS, GDSF, GDSR, DB
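A minimal local analogue of how these primary components relate, assuming GDS means Grid Data Service, GDSF its factory, and GDSR a registry of factories: the client finds a factory in the registry, asks it for a GDS bound to a database, and sends a request whose results go either back to the client or to a separate consumer. The classes below are hypothetical stand-ins built on Python's `sqlite3`; the real components were Grid services with their own interfaces, which this sketch does not reproduce.

```python
"""Local stand-ins for the primary components: registry -> factory -> data service.

Hypothetical classes for illustration only; not the OGSA-DAI API.
"""

import sqlite3
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Tuple


class DataService:
    """Plays the GDS role: holds one database connection and runs requests on it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def perform(self, sql: str, params: Sequence = (),
                deliver_to: Optional[Callable[[List[Row]], None]] = None) -> Optional[List[Row]]:
        rows = self._conn.execute(sql, params).fetchall()
        if deliver_to is not None:
            deliver_to(rows)  # third-party delivery: results go to a consumer, not the client
            return None
        return rows


class DataServiceFactory:
    """Plays the GDSF role: creates a new data service on request."""

    def __init__(self, connect: Callable[[], sqlite3.Connection]) -> None:
        self._connect = connect

    def create_service(self) -> DataService:
        return DataService(self._connect())


class Registry:
    """Plays the GDSR role: maps a resource name to the factory that serves it."""

    def __init__(self) -> None:
        self._factories: Dict[str, DataServiceFactory] = {}

    def register(self, name: str, factory: DataServiceFactory) -> None:
        self._factories[name] = factory

    def lookup(self, name: str) -> DataServiceFactory:
        return self._factories[name]


def open_demo_db() -> sqlite3.Connection:
    """Hypothetical database with two made-up protein records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (accession TEXT, name TEXT)")
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [("X00001", "demo kinase"), ("X00002", "demo ligase")])
    return conn


# Client side: discover a factory, create a service, then query with delivery to a consumer.
registry = Registry()
registry.register("demo-proteins", DataServiceFactory(open_demo_db))
service = registry.lookup("demo-proteins").create_service()

consumer_inbox: List[Row] = []
service.perform("SELECT name FROM protein WHERE accession = ?", ("X00001",),
                deliver_to=consumer_inbox.extend)
print(consumer_inbox)
```

Separating discovery (registry), creation (factory) and use (service) lets a client bind to a data resource by name without knowing where it is hosted, which is the property the diagram's layering suggests.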

Page 26

Advanced Components

Diagram components: Consumer, GDS, Client, GDT, Translation (two instances), DB

GDS:performScript
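The `GDS:performScript` call suggests a single request that carries several steps. A minimal sketch, assuming a script is an ordered list of activities (query, translate, deliver) in which each activity consumes the previous one's output; the activity names, the dictionary format and `perform_script` itself are hypothetical, and a real service would define its own request format.

```python
"""Interpret a 'perform script': an ordered list of activities run by one data service.

Activity names and the script format are hypothetical.
"""

import sqlite3
from typing import Any, Callable, Dict, List


def perform_script(conn: sqlite3.Connection, script: List[Dict[str, Any]],
                   sinks: Dict[str, Callable[[List[Any]], None]]) -> List[Any]:
    """Run activities in order; each one works on the previous activity's output."""
    data: List[Any] = []
    for step in script:
        activity = step["activity"]
        if activity == "query":
            data = conn.execute(step["sql"]).fetchall()
        elif activity == "translate":
            data = [step["fn"](item) for item in data]
        elif activity == "deliver":
            sinks[step["to"]](data)  # hand the current result to a named consumer
        else:
            raise ValueError(f"unknown activity: {activity!r}")
    return data


# Hypothetical data and consumer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id TEXT, length INTEGER)")
conn.executemany("INSERT INTO gene VALUES (?, ?)", [("g2", 340), ("g1", 120)])

received: List[str] = []
script = [
    {"activity": "query", "sql": "SELECT id, length FROM gene ORDER BY id"},
    {"activity": "translate", "fn": lambda row: f"{row[0]},{row[1]}"},  # rows -> CSV lines
    {"activity": "deliver", "to": "consumer"},
]
perform_script(conn, script, {"consumer": received.extend})
print(received)
```

Bundling the steps into one script lets the service keep intermediate results local instead of returning them to the client between steps.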

Page 27

Composed Components

Diagram components: Client, Consumer, GDS, Translation (two instances), GDT (three instances), GDS:performScript (four instances)
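Composition can be sketched as a pipeline: one service streams query results in blocks (the GDT role), a translation step rewrites them, and a second service loads them so they can be joined with its own data. This is a hypothetical local analogue using two in-memory `sqlite3` databases and Python generators; the function names, tables and block size are illustrative assumptions.

```python
"""Compose two data services through a block-streaming transport with translation.

Hypothetical local analogue; names, tables and block size are illustrative.
"""

import sqlite3
from typing import Callable, Iterable, Iterator, List, Tuple

Block = List[Tuple]


def stream(conn: sqlite3.Connection, sql: str, block_size: int = 2) -> Iterator[Block]:
    """Transport role: yield query results in fixed-size blocks."""
    cursor = conn.execute(sql)
    while True:
        block = cursor.fetchmany(block_size)
        if not block:
            return
        yield block


def translate(blocks: Iterable[Block], fn: Callable[[Tuple], Tuple]) -> Iterator[Block]:
    """Translation role: rewrite every row as it passes through."""
    for block in blocks:
        yield [fn(row) for row in block]


def load(conn: sqlite3.Connection, table: str, blocks: Iterable[Block]) -> int:
    """Receiving service: insert each block into a two-column table; return rows loaded."""
    loaded = 0
    for block in blocks:
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", block)
        loaded += len(block)
    return loaded


# Source service: sequence lengths keyed by lower-case accessions.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE seq (acc TEXT, length INTEGER)")
db_a.executemany("INSERT INTO seq VALUES (?, ?)",
                 [("x00001", 120), ("x00002", 340), ("x00003", 95)])

# Target service: annotations keyed by upper-case accessions, plus a staging table.
db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE annot (acc TEXT, role TEXT)")
db_b.executemany("INSERT INTO annot VALUES (?, ?)",
                 [("X00001", "kinase"), ("X00003", "ligase")])
db_b.execute("CREATE TABLE seq_copy (acc TEXT, length INTEGER)")

moved = load(db_b, "seq_copy",
             translate(stream(db_a, "SELECT acc, length FROM seq"),
                       lambda row: (row[0].upper(), row[1])))
joined = db_b.execute(
    "SELECT s.acc, s.length, a.role FROM seq_copy s "
    "JOIN annot a ON s.acc = a.acc ORDER BY s.acc").fetchall()
print(moved, joined)
```

Because the stages are generators, rows flow block by block from source to target rather than being materialised in full between services.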

Page 28

Distributed Query

Diagram components: Registry (R), Client, Consumer, Factory (F), DQP, QPM, NS, several Evaluators, GDS instances, GDT and GDTV links, DB; messages T, Q and PNM; interaction steps numbered 1 to 8

Key:
DQP: Distributed Query Processor
GDT: Grid Data Transport
T: Translation
Q: Query
GDTV: Grid Data Transport Vehicle
F: Factory
QPM: Query Progress Monitor
PNM: Progress Notification Message
AM: Application Metadata
CRM: Computational Resource Metadata
NS: Notification Sink
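The evaluation side of this diagram can be approximated by a scatter-gather sketch: a coordinator (playing the DQP role) sends the same query to several evaluators, each working on one data partition, and merges their partial results, while each evaluator posts a progress notification to a sink. Threads stand in for remote evaluators and a `queue.Queue` stands in for the notification sink; the partitions, message format and function names are assumptions, and the sketch omits query planning, translation and the registry and factory steps shown in the diagram.

```python
"""Scatter-gather evaluation with progress notifications (illustrative only).

Threads stand in for remote evaluators; a Queue stands in for the notification sink.
"""

from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, int]


def evaluate(part_id: int, rows: Sequence[Row],
             predicate: Callable[[Row], bool], sink: Queue) -> List[Row]:
    """Evaluator: apply the query predicate to one partition, then report progress."""
    matches = [row for row in rows if predicate(row)]
    sink.put(("done", part_id, len(rows)))  # progress notification message
    return matches


def distributed_select(partitions: Sequence[Sequence[Row]],
                       predicate: Callable[[Row], bool], sink: Queue) -> List[Row]:
    """Coordinator: run one evaluator per partition and merge results in partition order."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(evaluate, i, part, predicate, sink)
                   for i, part in enumerate(partitions)]
        merged: List[Row] = []
        for future in futures:
            merged.extend(future.result())
    return merged


# Hypothetical partitions of an (accession, length) table held at three sites.
partitions = [
    [("X00001", 120), ("X00002", 340)],
    [("X00003", 95)],
    [("X00004", 410), ("X00005", 60)],
]
sink: Queue = Queue()
result = distributed_select(partitions, lambda row: row[1] > 100, sink)

notifications = sorted(sink.get() for _ in range(sink.qsize()))
print(result)
print(notifications)
```

Results are merged in submission order, so the output is deterministic even though evaluators may finish in any order; the notifications arrive in completion order, which is why they are sorted before printing.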

Page 29

OGSA-DAI Time Line

Feb ’02 May ’02 Jul ’02 Sep ’02 Dec ’02 Feb ’03 May ’03 Sep ’03

Ship Alpha Release for GT3 Integration

RDB + GT2 / OGSA Prototypes Available

XML + OGSA Prototype Available

Design Documents & Demos for DAIS WG @ GGF5

XML + OGSA Prototypes for Early Adopters

WS + GSI UK support (> 100 downloads)

Phase 2 Starts

Phase 1 Starts

Presentation & Beta @ GGF7

GGF6 WG Papers & Prototypes

Productisation, RAMPS & Extension

Page 30

OGSA-DAI Summary

On Schedule & Going Well: Contributions via DAIS-WG @ GGF5 & 6; Releases with GT3 releases scheduled; Status: Early Days

Released prototypes; Tested Architectural Design; Using OGSA; Working with Early Adopter Pilot Projects

AstroGrid & MyGrid

Influence OGSA-DAI direction: via DAIS-WG & direct messages to us

Page 31

Biomedical e-Scientists

Is this one species? Understanding bird energy; Understanding a river / ocean interaction; Understanding a biochemical pathway; Understanding a cell; Understanding a Heart or Brain; Understanding Rhododendra; Understanding Evolution; …

No one-size-fits-all solutions, but sharable, re-usable components

Page 32

Opportunities

Many, many … more than we can address: Compute needs; Data management needs; Data integration needs; …

Must choose some pioneers: To meet a range of common requirements; To provoke a rich & high-level platform; To generate re-usable components

A Long-Term Commitment Needed

Page 33

Advancing Biological Grid

Grid Plumbing & Security Infrastructure

Scheduling Accounting Authorisation

Monitoring Diagnosis Logging

Scientific Application

Data & Compute Resources

Distributed

Biological Users

Data Access

Data Integration

Structured Data

Biomedical (Grid) Application Component Library

Page 34

Summary

e-Science: Data as well as Compute Challenges

Needed to be put together

Need ubiquitous, supported, consistent platforms

Grid: A (potentially) invaluable platform; Only show in town

Data Integration: Hard; Develop & Use a Standard kit of parts; Started to build the kit

Opportunities: No one-size-fits-all, but re-usable subsystems; Invest in a wider range of problem-driven pioneering; Strategic choices needed