UK e-Science
Grid Infrastructure meets Biological Research Challenges
Malcolm Atkinson
Director of National e-Science Centre
www.nesc.ac.uk
2nd October 2002
The UK Biological Grid: Data and Computation
The Wellcome Trust Genome Campus
Hinxton, Cambridgeshire
Overview
UK e-Science: Reminder of Investment and Infrastructure
International e-Science: Examples and Collaboration
Data Access and Integration: Lego Bricks for Scientific Application Developers
A Computer Scientist’s View of Biology: Diversity and Opportunity
The Way Ahead
e-Science
Fundamentally about Collaboration
Sharing: Ideas; Thought processes and Stimuli; Effort; Resources
Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure
Scientists (Biologists) have done this for Centuries
e-Science (take 2)
Fundamentally about Collaboration
Sharing: Ideas; Thought processes and Stimuli; Effort; Resources
Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure
Text, digital media, structured, organised & curated data, computable models, visualisation, shared instruments, shared systems, shared administration, …
Nationally & Internationally Distributed, …
Routine, Daily, Automated, …
That Requires very Significant Investment in Digital Systems and their Support
e-Science (take 3)
Fundamentally about Collaboration
Sharing: Ideas; Thought processes and Stimuli; Effort; Resources
Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure
Digital networks, digital work-places, digital instruments, …
Metadata, ontologies, standards, shared curated data, shared codes, …
Common platforms, shared software, shared training, …
The Grid SHOULD make this much easier by providing a common, supported, high level of Software and Organisational infrastructure
Authentication, Authorisation, Accounting, Provenance, Policies, …
Shared Provision of Platform, …
Grid Expectations
Persistence: Always there, Always Working, Always Supported
Stability: You can build on foundations that don’t move
Trustworthy & Predictable: Honours commitments; Digital policies, digital contracts, security, …; Data integrity, longevity and accessibility; Performance
High-level & Extensible: The capabilities you need are already there
Ubiquitous: Your collaborators use it
Grid Reality
Persistence: Always there, Always Working, Always Supported
Stability: You can build on foundations that don’t move
Trustworthy & Predictable: Honours commitments; Digital policies, digital contracts, security, …; Data integrity, longevity and accessibility; Performance
High-level & Extensible: The capabilities you need are already there
Ubiquitous: Your collaborators use it
Political, Economic & Technical issues to Solve
Early days but Open Grid Services link with Web Services + GGF standardisation
Not yet but very substantial global effort to achieve this
Good basis for extension; Commitment to basic functionality
WS + Community effort
Global & Industrial Rallying Cry: Must work with Web Services
UK Grid Network
[Map: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton. Labels: National e-Science Centre; Access Grid always-on video walls; HPC(x)]
National e-Science Centre
Events: Workshops; Research Meetings; International Meetings
History of Events: GGF5; HPDC11; Summer school; > 50 workshops held; > 1000 people in total; Many return often
Planned Events: 25 workshops; Conferences to 2005
Visitors: 3 arrived; 4 arranged
International collaboration, visits & visitors: China; Argonne National Lab; SDSC; NCSA; …
Centre Projects: Pilot Projects; Regional Support; Research Projects
EPSRC, MRC, WT, SHEFC
A day in the life of NeSC
Online Access to Scientific Instruments
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
[Diagram: Advanced Photon Source; real-time collection; tomographic reconstruction; archival storage; wide-area dissemination; desktop & VR clients with shared controls]
From Steve Tuecke 12 Oct. 01
UCSF
UIUC
From Klaus Schulten, Center for Biomolecular Modeling and Bioinformatics, Urbana-Champaign
DataGrid Testbed
[Map of Testbed Sites (>40): Dubna, Moscow, RAL, Lund, Lisboa, Santander, Madrid, Valencia, Barcelona, Paris, Berlin, Lyon, Grenoble, Marseille, Brno, Prague, Torino, Milano, BO-CNAF, PD-LNL, Pisa, Roma, Catania, ESRIN, CERN, IPSL, Estec, KNMI. Legend: HEP sites; ESA sites]
A Simplified Grid Anatomy
Grid Plumbing & Security Infrastructure
Scheduling Accounting Authorisation
Monitoring Diagnosis Logging
Scientific Application
Data & Compute Resources
Operations Team
Application Developers
Distributed
Owners
Scientific Users
A Biological Grid Anatomy
Grid Plumbing & Security Infrastructure
Scheduling Accounting Authorisation
Monitoring Diagnosis Logging
Scientific Application
Data & Compute Resources
Distributed
Biological Users
Data Access
Data Integration
Structured Data
Database Growth
PDB protein structures
Scientific Data
Deluge of Data: Exponential growth
Doubling times: Astronomy 12 months; Bio-Sequences 9 months; Functional Genomics 6 months; Bytes/dollar 12 to 18 months
Not How big it is but What you do with it
Sharing; Curation; Metadata; Automated movement, access & integration; Computational Access
Not How big it is but How you Embrace & Manage Change
The Database is a Knowledge chest
The Database is a Communication Hub
Autonomously Managed (Curated) change
An Essential part of e-BioMedical Science
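The doubling times quoted on the Scientific Data slides can be turned into a rough comparison of data growth against storage economics. This is a minimal sketch, assuming idealised exponential growth at exactly the quoted rates; the function names and the 36-month horizon are illustrative choices, not taken from the talk.

```python
def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Growth multiple after elapsed_months for a quantity that doubles every doubling_months."""
    return 2.0 ** (elapsed_months / doubling_months)


def relative_storage_cost(data_doubling: float, bytes_per_dollar_doubling: float,
                          elapsed_months: float) -> float:
    """Change in the cost of storing all the data: data growth divided by bytes/dollar growth."""
    return (growth_factor(data_doubling, elapsed_months)
            / growth_factor(bytes_per_dollar_doubling, elapsed_months))


# Doubling times in months, as quoted on the slide.
DATA_DOUBLING_MONTHS = {"Astronomy": 12, "Bio-Sequences": 9, "Functional Genomics": 6}

if __name__ == "__main__":
    horizon = 36  # assumed horizon in months, for illustration only
    for field, months in DATA_DOUBLING_MONTHS.items():
        volume = growth_factor(months, horizon)
        low = relative_storage_cost(months, 12, horizon)   # bytes/dollar doubling in 12 months
        high = relative_storage_cost(months, 18, horizon)  # bytes/dollar doubling in 18 months
        print(f"{field}: volume x{volume:.0f}, storage cost x{low:.1f} to x{high:.1f}")
```

Under these assumptions, functional genomics data grows 64-fold in three years while bytes per dollar improves only 4- to 8-fold, so the cost of simply keeping everything rises 8- to 16-fold.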
Wellcome Trust: Cardiovascular Functional Genomics
[Map: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands]
Shared data
Public curated data
Data Access & Integration
Central to e-Science
Especially Earth Sciences, Ecology, Biology & Medicine
Collaboration: Shared Databases; Curated Knowledge; Accumulated Observations; Accumulated Simulations
Computation: Data mining; Input to models; Calibration of models
Presentation: Publication of results; Visualisation
GGF DAIS WG
Chairs: Norman Paton (Manchester Uni.); Leanne Guy (CERN); Dave Pearson (Oracle UK)
Activity: BoF GGF4 Toronto; WG Meeting GGF5 Edinburgh; Papers for GGF6; Workshops & Mail lists
Goals: Agree Standards for Database Access & Integration; Freely available reference implementations
OGSA-DAI one source & focus for discussions
Norman Paton, Inderpal Narang, Leanne Guy, Susan Maliaka, Greg Ricardi, …
OGSA-DAI project
Lego kit for Data Access & Integration
Components for e-Science Applications
Accelerated Application Development
Multiple Data Models
Distributed Data
Access via Grid & Proxies
Integration, Translation & Transformation
Open Source Reference Implementation for DAIS-WG standard
Trigger for Component Construction
Start a community
OGSA-DAI Partners
EPCC & NeSC; IBM UK; IBM USA; Manchester e-SC; Newcastle e-SC; Oracle
£3 million, 18 months, started February 2002
[Map labels: Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, EPCC & NeSC, Newcastle, IBM USA, IBM Hursley, Oracle, Manchester, Cambridge, Hinxton]
Primary Components
[Diagram components: Client, Consumer, GDS, GDSF, GDSR, DB]
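The Primary Components diagram names a client, a registry (GDSR), a factory (GDSF), a data service (GDS) and a database. The sketch below is one way to read how those pieces could fit together: the client looks up a factory in the registry, asks it for a service, and runs a request against the database. It is a toy in-process illustration, not the OGSA-DAI API. Every class and method name is hypothetical, SQLite stands in for the database, the Consumer is omitted, and the table contents are made up.

```python
import sqlite3


class GridDataService:
    """Hypothetical stand-in for a GDS: gives access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def perform_script(self, sql, params=()):
        """Run one SQL statement and return all rows (toy analogue of GDS:performScript)."""
        return self._connection.execute(sql, params).fetchall()


class GridDataServiceFactory:
    """Hypothetical stand-in for a GDSF: creates services bound to one database."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        return GridDataService(self._connection)


class GridDataServiceRegistry:
    """Hypothetical stand-in for a GDSR: maps a published name to a factory."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def find(self, name):
        return self._factories[name]


def demo():
    # Made-up example data; SQLite is only a convenient local stand-in for "DB".
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE protein (id TEXT, length INTEGER)")
    db.executemany("INSERT INTO protein VALUES (?, ?)", [("P1", 120), ("P2", 340)])

    registry = GridDataServiceRegistry()
    registry.register("proteins", GridDataServiceFactory(db))

    # Client side: discover the factory, create a service, run a request.
    service = registry.find("proteins").create_service()
    return service.perform_script("SELECT id FROM protein WHERE length > ?", (200,))


if __name__ == "__main__":
    print(demo())
```

Keeping discovery, creation and access as separate roles means a client only needs to know a published name, not where or how the database is hosted.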
Advanced Components
[Diagram components: Client, GDS (GDS:performScript), DB, Translation (twice), GDT, Consumer]
Composed Components
[Diagram components: Client, GDS, GDT (three times), Translation (twice), Consumer; four GDS:performScript calls]
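The Composed Components diagram shows translation steps chained between a data service and a consumer. A minimal sketch of that idea, assuming each translation is a function from a stream of rows to a stream of rows; the helper names, field names and sample row are all hypothetical and are not part of OGSA-DAI.

```python
from typing import Callable, Iterable, Iterator

Row = dict
Translation = Callable[[Iterator[Row]], Iterator[Row]]


def rename_field(old: str, new: str) -> Translation:
    """Build a translation that renames one field in every row."""
    def translate(rows: Iterator[Row]) -> Iterator[Row]:
        for row in rows:
            row = dict(row)  # copy so the source rows are left untouched
            row[new] = row.pop(old)
            yield row
    return translate


def scale_field(name: str, factor: float) -> Translation:
    """Build a translation that multiplies one numeric field by a factor."""
    def translate(rows: Iterator[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, name: row[name] * factor}
    return translate


def compose(source: Iterable[Row], *translations: Translation) -> Iterator[Row]:
    """Chain translations in order; rows flow lazily from the source to the consumer."""
    stream = iter(source)
    for translation in translations:
        stream = translation(stream)
    return stream


if __name__ == "__main__":
    rows = [{"id": "P1", "len": 2}]  # made-up sample row
    for row in compose(rows, rename_field("len", "length"), scale_field("length", 10)):
        print(row)
```

Because each stage only consumes and produces a stream, stages can be reordered or reused without the source or the consumer changing.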
Distributed Query
[Diagram components: Client, Consumer, Registry (R), Factory (F), DQP, QPM, NS, GDS (several), Evaluator (three), GDT, GDTV, T, Q, PNM, DB; numbered callouts 1 to 8]
DQP: Distributed Query Processor
GDT: Grid Data Transport
T: Translation
Q: Query
GDTV: Grid Data Transport Vehicle
F: Factory
QPM: Query Progress Monitor
PNM: Progress Notification Message
AM: Application Metadata
CRM: Computational Resource Metadata
NS: Notification Sink
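The Distributed Query diagram involves evaluators working against several data services, with progress messages sent to a notification sink. The sketch below shows only the simplest reading of that picture: the same query is evaluated at each source, the partial results are merged, and a progress message is posted after each source. It assumes nothing about how the real DQP plans or distributes work. All names are hypothetical, SQLite databases stand in for the remote sources, and the rows are made up.

```python
import sqlite3


class NotificationSink:
    """Hypothetical stand-in for the NS: collects progress notification messages."""

    def __init__(self):
        self.messages = []

    def notify(self, message):
        self.messages.append(message)


def evaluate(connection, sql):
    """Hypothetical evaluator: run the query against one source and return its rows."""
    return connection.execute(sql).fetchall()


def distributed_query(sources, sql, sink):
    """Evaluate sql at every source in turn, merge the rows, and report progress."""
    merged = []
    for done, (name, connection) in enumerate(sources.items(), start=1):
        merged.extend(evaluate(connection, sql))
        sink.notify(f"{name}: {done}/{len(sources)} sources evaluated")
    return sorted(merged)


def demo():
    # Made-up rows held at two made-up sites.
    site_rows = {"site_a": [("P1", 120)], "site_b": [("P2", 340), ("P3", 90)]}
    sources = {}
    for name, rows in site_rows.items():
        connection = sqlite3.connect(":memory:")
        connection.execute("CREATE TABLE protein (id TEXT, length INTEGER)")
        connection.executemany("INSERT INTO protein VALUES (?, ?)", rows)
        sources[name] = connection

    sink = NotificationSink()
    rows = distributed_query(
        sources, "SELECT id, length FROM protein WHERE length > 100", sink)
    return rows, sink.messages


if __name__ == "__main__":
    print(demo())
```

A fuller version would run the evaluators concurrently and push work such as joins towards the data; this sketch evaluates sequentially to keep the control flow easy to follow.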
OGSA-DAI Time Line
Feb ’02 May ’02 Jul ’02 Sep ’02 Dec ’02 Feb ’03 May ’03 Sep ’03
Ship Alpha Release for GT3 Integration
RDB + GT2 / OGSA Prototypes Available
XML + OGSA Prototype Available
Design Documents & Demos for DAIS WG @ GGF5
XML + OGSA Prototypes for Early Adopters
WS + GSI UK support ( > 100 downloads)
Phase 1 Starts
Phase 2 Starts
Presentation & Beta @ GGF7
GGF6 WG Papers & Prototypes
Productisation, RAMPS & Extension
OGSA-DAI Summary
On Schedule & Going Well
Contributions via DAIS-WG @ GGF5 & 6
Releases with GT3 Releases scheduled
Status: Early Days
Released prototypes
Tested Architectural Design
Using OGSA
Working with Early Adopter Pilot Projects: AstroGrid & MyGrid
Influence OGSA-DAI direction
Via DAIS-WG & Direct messages to us
Biomedical e-Scientists
Is this one species?
Understanding bird energy
Understanding a river / ocean interaction
Understanding a biochemical pathway
Understanding a cell
Understanding a Heart or Brain
Understanding Rhododendra
Understanding Evolution
…
No One-Size fits all solutions
But sharable re-usable components
Opportunities
Many, many …
More than we can address
Compute needs
Data management needs
Data integration needs
…
Must choose some pioneers
To meet a range of common requirements
To provoke rich & high-level platform
To generate re-usable components
A Long-Term Commitment Needed
Advancing Biological Grid
Grid Plumbing & Security Infrastructure
Scheduling Accounting Authorisation
Monitoring Diagnosis Logging
Scientific Application
Data & Compute Resources
Distributed
Biological Users
Data Access
Data Integration
Structured Data
Biomedical (Grid) Application Component Library
Summary
e-Science: Data as well as Compute Challenges; Needed to be put together; Need ubiquitous supported consistent platforms
Grid: A (potentially) invaluable platform; Only show in town
Data Integration: Hard; Develop & Use Standard kit of parts; Started to build the kit
Opportunities: No one-size fits all, but re-usable subsystems; Invest in wider range of Problem driven pioneering; Strategic choices needed