
NCeSS International Conference

Manchester

Progress with e-Science?

Malcolm Atkinson

e-Science Institute & e-Science Envoy

www.nesc.ac.uk

29th June 2006

Overview

A Context
• E-Science, Service-oriented Architectures & Services
• International & UK Scene
• Projects, Activities & Technologies

A Wealth of Opportunities & Challenges
• Causes of Data Growth
• Interpretational challenges
• Software challenges

Crucial Issues
• Usability & Abstraction
• Interoperation & Federations
• Cultural challenges of e-Science

What is e-Science?

Goal: to enable better research in all disciplines
Method: develop collaboration supported by advanced distributed computation

• to generate, curate and analyse rich data resources
− From experiments, observations and simulations
− Quality management, preservation and reliable evidence

• to develop and explore models and simulations
− Computation and data at all scales
− Trustworthy, economic, timely and relevant results

• to enable dynamic distributed collaboration
− Facilitating collaboration with information and resource sharing
− Security, trust, reliability, accountability, manageability and agility

Our challenge is to develop an integrated approach to all three

Are we making any progress?

General purpose e-Infrastructure
• Supports most research requirements
• Convenient & reliable
• Amortises costs over wide communities

General purpose e-Science methods
• Conduct of multi-disciplinary collaboration
• Distributed collaboration
• Use of diverse information sources
• Use of advanced computational methods

Web Services

Generic & industry supported: a mechanism for presenting any resource so it can be used remotely
• Data resources
• Computation resources
• Instruments
• Research processes & procedures

Wide range of choices as to how it may be done
• Easy for provider to make localised decisions
• Quick deployment
• Onus on users to compose services
• Little opportunity to amortise costs across services

An obvious first step & components of e-Infrastructure
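As a minimal sketch of the idea — a provider wrapping a local resource so it can be invoked remotely — the following Python example exposes a query over a small in-memory dataset via XML-RPC. The service, dataset and operation name are all hypothetical illustrations, not part of any project described in this talk.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical "data resource": a tiny in-memory gene-annotation table.
ANNOTATIONS = {
    "BRCA1": "breast cancer 1, early onset",
    "TP53": "tumor protein p53",
}

def lookup_annotation(gene: str) -> str:
    """The single operation this provider chooses to expose remotely."""
    return ANNOTATIONS.get(gene, "unknown")

def serve(port: int = 8000) -> None:
    # The provider makes a purely local decision about how to publish:
    # here, one function registered on an XML-RPC endpoint.
    server = SimpleXMLRPCServer(("localhost", port), allow_none=True)
    server.register_function(lookup_annotation)
    server.serve_forever()

if __name__ == "__main__":
    serve()
```

The ease of this kind of localised decision is exactly the slide's point: quick for each provider, but every provider may choose differently, leaving composition to the user.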

A Grid Computing Timeline

1995 ’96 ’97 ’98 ’99 2000 ’01 ’02 ’03 ’04 ’05 2006

• I-Way: SuperComputing ’95
• DARPA funds Globus Toolkit & Legion
• EU funds UNICORE project
• US DoE pioneers grids for scientific research
• NSF funds National Technology Grid
• NASA starts Information Power Grid
• US Grid Forum forms at SC ’98
• European & AP Grid Forums
• Grid Forums merge, form GGF
• UK e-Science programme starts
• “Anatomy” paper
• OGSA-WG formed
• “Physiology” paper
• Japan government funds: Business Grid project, NAREGI project
• OGSA v1.0
• GGF & EGA form OGF

Today:
• Grid solutions are common for HPC
• Grid-based business solutions are becoming common
• Required technologies & standards are evolving

Source: Hiro Kishimoto GGF17 Keynote May 2006

What is a Grid?

A grid is a system consisting of
− Distributed but connected resources and
− Software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives

[Figure: diverse resources on a grid – license server, printer, database, web server, data center, cluster, handheld, supercomputer, workstation, server]

Source: Hiro Kishimoto GGF17 Keynote May 2006

Grid & Related Paradigms

Cluster
• Tightly coupled
• Homogeneous
• Cooperative working

Distributed Computing
• Loosely coupled
• Heterogeneous
• Single administration

Grid Computing
• Large scale
• Cross-organizational
• Geographical distribution
• Distributed management

Utility Computing
• Computing “services”
• No knowledge of provider
• Enabled by grid technology

How Are Grids Used?

High-performance computing

Collaborative data-sharing

Collaborative design

Drug discovery

Financial modeling

Data center automation

High-energy physics

Life sciences

E-Business

E-Science

Source: Hiro Kishimoto GGF17 Keynote May 2006

Grids In Use: E-Science Examples

• Data sharing and integration
− Life sciences: sharing standard data-sets, combining collaborative data-sets
− Medical informatics: integrating hospital information systems for better care and better science
− Sciences: high-energy physics

• Capability computing
− Life sciences: molecular modeling, tomography
− Engineering: materials science
− Sciences: astronomy, physics

• High-throughput, capacity computing
− Life sciences: BLAST, CHARMM, drug screening
− Engineering: aircraft design, materials, biomedical
− Sciences: high-energy physics, economic modeling

• Simulation-based science and engineering
− Earthquake simulation

Source: Hiro Kishimoto GGF17 Keynote May 2006
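High-throughput capacity computing of this kind is essentially task farming: many independent runs over a parameter set, farmed out to whatever workers are available. A deliberately simplified Python sketch — the "compound scoring" job is a hypothetical placeholder, and a thread pool stands in for a real grid broker:

```python
from concurrent.futures import ThreadPoolExecutor

def score_compound(compound_id: int) -> tuple[int, float]:
    """Hypothetical stand-in for one independent job, e.g. scoring one
    candidate compound in a drug-screening sweep."""
    score = (compound_id * 37 % 101) / 100.0  # placeholder computation
    return compound_id, score

def run_sweep(compound_ids):
    # Each task is independent, so they can be dispatched to any
    # available workers -- local threads here, grid nodes in practice.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(score_compound, compound_ids))

if __name__ == "__main__":
    results = run_sweep(range(10))
    best = max(results, key=results.get)
    print(f"best candidate: {best} (score {results[best]:.2f})")
```

Because no task depends on another, throughput scales with the number of workers — which is why this workload class dominates grid usage.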

Grids

(Potentially) generic & industry supported: a mechanism for presenting many heterogeneous resources so they can be used remotely
• Data resources
• Computation resources
• Instruments
• Research processes & procedures

Restricting choices as to how it may be done
• Harder for provider to make localised decisions
• Deployment can be challenging
• Providing more homogeneity through virtualisation
• Should be easier to compose services
• More opportunity to amortise costs

A component of e-Infrastructure
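Composing services — the burden that falls on users with plain web services, and which grids aim to ease — amounts to wiring one service's output into the next one's input. A tiny Python sketch; the three "services" are hypothetical local stand-ins for remote calls:

```python
from functools import reduce

# Hypothetical stand-ins for three remote services in a pipeline.
def fetch_sequences(query: str) -> list[str]:
    return ["ACGT", "GGCA"]           # e.g. a data service

def align(seqs: list[str]) -> list[str]:
    return sorted(seqs)               # e.g. a computation service

def annotate(seqs: list[str]) -> dict:
    return {s: len(s) for s in seqs}  # e.g. an analysis service

def compose(*stages):
    """Chain services: each stage consumes the previous one's output."""
    return lambda x: reduce(lambda acc, f: f(acc), stages, x)

pipeline = compose(fetch_sequences, align, annotate)
```

With homogeneous interfaces, composition is this mechanical; with heterogeneous ad-hoc services, each arrow in the chain needs bespoke glue — the cost the slide is pointing at.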

How much do Grids help a researcher?

The grid per se doesn’t provide
• Supported e-Science methods
• Supported data & information resources
• Computations
• Convenient access

Grids help providers of these
• International & national secure e-Infrastructure
• Standards for interoperation
• Standard APIs to promote re-use

But research support must be built
• What is needed?
• Who should do it?

Commitment to e-Infrastructure

A shared resource
• That enables science, research, engineering, medicine, industry, …
• It will improve UK / European / … productivity

• Lisbon Accord 2000
• e-Science Vision SR2000 – John Taylor
• Commitment by UK government (Sections 2.23–2.25)
• Always there, cf. telephones, transport, power

e-Science On The Map Today

Funded centres:
• e-Science Institute
• CeSC (Cambridge)
• National Grid Service & NGS Support Centre
• EGEE-II
• OMII-UK
• Globus Apache Project & CDIG
• National Centre for e-Social Science
• National Institute for Environmental e-Science
• NERC e-Science Centre
• Digital Curation Centre

And much more besides

Supported data & information services
• E.g. EBI, MIMAS, EDINA, Zoo-DB, …

Community efforts
• E.g. GridPP, AstroGrid, NERC DataGrid

Access to new scientific facilities
• E.g. satellite observations, Met-Office data
• E.g. Diamond
• E.g. CEASAR, HPCx & HECToR

Communities developing code & tools
• Pilot projects in most research areas

A Complex multi-provider System

Strengths
• Resilience – no over-dependence on one set of decisions
• Dynamic & rapid response to need
• Resources gathered through many channels
• Specific user requirements understood

Weaknesses
• Complexity & possibly unnecessary duplication
• Reinforces old silos
• Composition and amortisation harder

Is more coordination
• Desirable?
• Feasible?


OMII-UK & National Grid Service: Life Sciences Gateway

A layered stack for bioinformatics & life science users:
• Upperware: Taverna
• Middleware: OMII e-Infrastructure services (SOAPLab services, OGSA-DAI, GridSAM, GRIMOIRES) over core NGS middleware
• Underware: hardware resources

Talk to us about other ‘gateways’:
• Computational Chemistry
• Engineering
• Image Analysis
• …

The National Grid Service

Core sites: White Rose (Leeds), Manchester, Oxford, CCLRC
Partner sites: Bristol, Cardiff, Lancaster
Access to HPC facilities: HPCx, CSAR

Capacity: 300+ CPUs, 30+ terabytes

Specialist facilities:
• Cardiff: 4×16-processor SGI
• Bristol: Intel
• Lancaster: SUN cluster

Services:
• Job submission (GT2)
• Storage service (SRB)
• Oracle service
• OGSA-DAI

Ian Bird, SA1, EGEE Final Review 23-24th May 2006

Enabling Grids for E-sciencE

INFSO-RI-508833

A global, federated e-Infrastructure

EGEE infrastructure:
• ~200 sites in 39 countries
• ~20,000 CPUs
• >5 PB storage
• >20,000 concurrent jobs per day
• >60 Virtual Organisations

Related projects & collaborations — EUIndiaGrid, EUMedGrid, SEE-GRID, EELA, BalticGrid, EUChinaGrid, OSG, NAREGI — are where the future expansion of resources will come from.

Project        Anticipated resources (initial estimates)

Related infrastructure projects:
• SEE-GRID     6 countries, 17 sites, 150 CPUs
• EELA         5 countries, 8 sites, 300 CPUs
• EUMedGrid    6 countries
• BalticGrid   3 countries, a few ×100 CPUs
• EUChinaGrid  TBC

Collaborations:
• OSG          30 sites, 10,000 CPUs
• ARC          15 sites, 5,000 CPUs
• DEISA        supercomputing resources


Use of the infrastructure

[Chart: jobs/day on the EGEE infrastructure, Jan-05 to Apr-06, Total and non-LCG, scale 0–35,000]

Sustained & regular workloads of >30K jobs/day
• spread across full infrastructure
• doubling/tripling in last 6 months – no effect on operations

Massive data transfers > 1.5 GB/s

About Grid Computing Now!

A Knowledge Transfer Network project funded by the DTI Technology Programme, aimed at transferring knowledge about grid computing technologies to the public and private sectors in the UK.

A partnership between Intellect, the UK hi-tech trade association; the National e-Science Centre, a world leader in grid computing research; and CNR Ltd, a consultancy focused on SME organisations and business intermediaries.

Substantial number of industrial, business and academic partners

Website
• Background information
• Industry news/events
• User case studies

Events programme
• Technical overviews
• Multiple vendor perspectives
• User case studies

Sector agenda
• Healthcare; government; telecoms; services; etc.

User community
• Network with peers
• Find useful contacts
• Contribute experience

www.gridcomputingnow.org

INFSO-SSA-26637

Invest in People

• Training
− Targeted
− Immediate goals
− Specific skills
− Building a workforce

• Education
− Pervasive
− Long term and sustained
− Generic conceptual models
− Developing a culture

• Both are needed

[Diagram: two linked investment cycles. Society invests in education, which prepares graduates, who create innovation, which enriches society. An organisation invests in training, which prepares skilled workers, who develop services & applications, which strengthen the organisation.]

Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)

Sites: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands

The CFG Virtual Organisation’s DATA HUB federates publicly curated data (Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …) with each site’s private data, accessed through a portal offering services such as SyntenyGrid and BLAST.

http://www.brc.dcs.gla.ac.uk/projects/bridges/

Overview of e-Infrastructure

eDiaMoND: Screening for Breast Cancer

From 1 Trust to many Trusts:
• Collaborative working
• Audit capability
• Epidemiology
• Other modalities: MRI, PET, ultrasound
• Better access to case information and digital tools
• Supplement mentoring with access to digital training cases and sharing of information across clinics

[Diagram: the eDiaMoND Grid connects screening, assessment and training workflows. Patients’ X-rays and case information enter via secondary capture or FFD; digital reading, SMF and CAD temporal comparison support screening, linked to letters, radiology reporting systems and electronic patient records; assessment/symptomatic work (biopsy) and training (managing and performing training cases with SMF, CAD and 3D images) share case and reading information.]

Provided by eDiamond project: Prof. Sir Mike Brady et al.

climateprediction.net and GENIE

• Largest climate model ensemble

• >45,000 users, >1,000,000 model years

[Figure: response of Atlantic circulation to freshwater forcing (2K, 10K)]

Courtesy of David Gavaghan & IB Team

Integrative Biology

Tackling two Grand Challenge research questions:

• What causes heart disease?
• How does a cancer form and grow?

Together these diseases cause 61% of all UK deaths

Will build a powerful, fault-tolerant Grid infrastructure for biomedical science

Enabling biomedical researchers to use distributed resources such as high-performance computers, databases and visualisation tools to develop complex models of how these killer diseases develop.


IB Partners

Foundations of Collaboration

Strong commitment by individuals
• To work together
• To take on communication challenges
• Mutual respect & mutual trust

Distributed technology
• To support information interchange
• To support resource sharing
• To support data integration
• To support trust building

Sufficient time
Common goals
Complementary knowledge, skills & data

Can we predict when it will work? Can we find remedies when it doesn’t?

CARMEN - Scales of Integration

• Resolving the ‘neural code’ from the timing of action potential activity
• Determining ion channel contribution to the timing of action potentials
• Examining integration within networks of differing dimensions

Understanding the brain may be the greatest informatics challenge of the 21st century

CARMEN Consortium

Leadership & Infrastructure

Colin Ingram

Paul Watson

Leslie Smith Jim Austin

CARMEN Consortium

International Partners

Daniel Gardner(Cornell)

Lead for the US NIH, Neuroscience Information Framework and Brain ML

Shiro Usui(RIKEN Brain Science Institute)

Lead for the Japan Node of the International Neuroinformatics Coordinating Facility

Sten Grillner(Karolinska Institute)

Chairman of the OECD International Neuroinformatics Coordinating Facility

Ad Aertsen(Freiburg)

Neural network modelling and large-scale simulations

George Gerstein(Pennsylvania)

Analysis of spike pattern trains

CARMEN Consortium

Commercial Partners

- applications in the pharmaceutical sector

- interfacing of data acquisition software

- application of infrastructure

- commercialisation of tools

16th March 2006

NanoCMOSgriD Meeting the Design Challenges of Nano-CMOS Electronics

The Challenge

International Technology Roadmap for Semiconductors (2005 edition):

Year                  2005  2010  2015  2020
MPU Half Pitch (nm)     90    45    25    14
MPU Gate Length (nm)    32    18    10     6

Device diversification:
• 90 nm: HP, LOP, LSTP
• 45 nm: UTB SOI
• 32 nm: double gate

[Figure: at 230 nm a single standard set of bulk MOSFETs sufficed; by 25 nm, technology diversifies across bulk MOSFET, FD SOI, UTB SOI and FinFET devices for HP (MPU), LOP and LSTP applications, each requiring statistical sets.]


Advanced Processor Technologies Group (APTGUM)

Device Modelling Group (DMGUG)

Electronic Systems Design Group (ESDGUS)

Intelligent Systems Group (ISGUY)

National e-Science Centre (NeSC)

Microsystems Technology Group (MSTGUG)

Mixed-Mode Design Group in IMNS (MMDGUE)

e-Science NorthWest Centre (eSNW)

University Partners


Industrial Partners
• Global EDA vendor and world TCAD leader: 600 licences of grid implementation, model implementation
• UK fabless design company and world microprocessor leader: core IP, simulation tools, staff time
• UK fabless design company and world mixed-mode leader: additional PhD studentship for mixed-mode design
• Global semiconductor player with strong UK presence: access to technology, device data, processing
• Global semiconductor player with UK presence: CASE studentship, interconnects
• Trade association of the microelectronics industry in the UK: recruiting new industrial partners and dissemination

Compound Causes of (Geo)Data Growth

• Faster devices
• Cheaper devices
• Higher resolution
all ~ Moore’s law

• Increased processor throughput → more derived data
• Cheaper & higher-volume storage
• Remote data more accessible
• Public policy to make research data available
• Bandwidth increases (latency doesn’t get less, though)
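These causes compound multiplicatively: if device count, per-device data rate and derived-data volume each grow independently, total data volume grows as their product. A small illustrative calculation — all growth rates below are hypothetical, chosen only to show the compounding effect:

```python
# Illustrative only: hypothetical annual growth factors for three
# independent causes of data growth.
DEVICE_COUNT_GROWTH = 1.4   # more, cheaper devices per year
DEVICE_RATE_GROWTH = 1.5    # faster, higher-resolution devices
DERIVED_DATA_GROWTH = 1.3   # more derived data per raw byte

def data_volume_factor(years: int) -> float:
    """Total growth factor after `years`: the causes multiply."""
    per_year = DEVICE_COUNT_GROWTH * DEVICE_RATE_GROWTH * DERIVED_DATA_GROWTH
    return per_year ** years

if __name__ == "__main__":
    # Compound growth (2.73x/year here) far outpaces any single cause.
    print(f"after 5 years: {data_volume_factor(5):.0f}x")
```

Even modest per-cause rates therefore produce the explosive overall growth the slide describes.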

Interpretational Challenges

Finding & accessing data
• Variety of mechanisms & policies

Interpreting data
• Variety of forms, value systems & ontologies

Independent provision & ownership
• Autonomous changes in availability, form, policy, …

Processing data
• Understanding how it may be related
• Devising models that expose the relationships

Presenting results
• Humans need either derived small volumes of statistics or visualisations
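Reducing large data volumes to the small derived statistics a human can absorb can be as simple as a streaming reduction that never holds the data in memory. A minimal Python sketch, using Welford's online algorithm over a hypothetical instrument stream:

```python
import math
import random

def summarize(stream):
    """One pass over a large stream, keeping only O(1) state:
    count, mean and standard deviation (Welford's online algorithm)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / n) if n else 0.0
    return {"n": n, "mean": mean, "std": std}

if __name__ == "__main__":
    random.seed(1)
    # Hypothetical sensor stream: a million noisy readings.
    readings = (20.0 + random.gauss(0, 2) for _ in range(1_000_000))
    print(summarize(readings))
```

The generator never materialises the million readings; only three numbers reach the human — the shape of all derived-statistics presentation.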


Collaboration

Essential to assemble experts
• Multi-discipline, multi-organisation, multi-national, multi-scale

Hard to achieve
• Instinctive competition
• Trust slow to build
• Communication is difficult

Requirements
• Leadership
• Investment
• New culture
• Technology
• Crossing commercial–academic boundaries


Towards Accessible e-Science

High-level tools
• Abstraction
• Metadata-driven interpretation & guidance
• Well-defined semantics
• Automated data management is key

Convenient user-controlled composition
• Lego-brick convenience + precision control
• Understood by scientists, engineers, modellers, diagnosticians, …

Responsibility & credit
• Provenance tracking automated & understood
• Culture developed for consortia and collaboration
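Automated provenance tracking of the kind envisaged here can be sketched as a wrapper that records, for each processing step, what ran, with which inputs, and when. This is a hypothetical minimal scheme for illustration, not any particular provenance standard:

```python
import functools
import time

PROVENANCE_LOG = []  # in practice: a persistent, shared provenance store

def tracked(func):
    """Record each call's function name, arguments and timing, so a
    result can later be traced back to how it was produced."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "step": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "duration_s": time.time() - start,
        })
        return result
    return wrapper

@tracked
def normalise(values):
    total = sum(values)
    return [v / total for v in values]

if __name__ == "__main__":
    normalise([2.0, 3.0, 5.0])
    print(PROVENANCE_LOG[-1]["step"])
```

Because the recording is automatic, researchers get attribution and reproducibility without having to remember to document each step — the cultural point the slide makes.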

Built on Dependable Infrastructure

Global federations
• Dependable and persistent facilities
• Always there & always on
• Consistent for mobile users
• Consistent for mobile code & mobile information

Affordable e-Infrastructure
• Based on well-established standards
• Based on well-honed operating procedures
• Investment preserved through stability
• Utility improved through agile development
• Amortised across disciplines and nations
• Consistent framework for multiple provisions

Trustworthy management of information

Collaboration

Collaboration is a key issue
• Multi-disciplinary
• Multi-national
• Academia & industry

Trustworthy data sharing is key for collaboration
• Plenty of opportunities for research and innovation
• Establish common frameworks where possible
• Islands of stability – reference points for data integration
• Establish international standards and cooperative behaviour
• Extend incrementally

Trustworthy code & service sharing is also key
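Trustworthy data sharing in a virtual organisation ultimately reduces to policy-mediated access: every request is checked against the owner's sharing policy before any data flows. A deliberately simplified, hypothetical sketch (real VOs use certificate-based authorisation, not a dictionary):

```python
# Hypothetical VO access policy: dataset -> roles allowed to read it.
POLICY = {
    "public-annotations": {"member", "guest"},
    "patient-records": {"clinician"},
}

def can_read(role: str, dataset: str) -> bool:
    """Deny by default; allow only what the policy explicitly grants."""
    return role in POLICY.get(dataset, set())

def read(role: str, dataset: str) -> str:
    if not can_read(role, dataset):
        raise PermissionError(f"{role} may not read {dataset}")
    return f"<contents of {dataset}>"
```

The deny-by-default stance is what lets mutually competitive partners share at all: each contributor controls exactly which islands of its data are visible.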

Federation

Federation is a key issue
• Multi-organisation
• Multi-purpose
• Multi-national
• Academia & industry

Build shared standards and ontologies
• Requires immense effort
• Requires critical mass of adoption
• Requires sufficient time

Trustworthy code & e-Infrastructure sharing
• Economic & social necessity
• Balance independence & self-motivation with social & technical constraints

Major Intellectual Challenges

Require
• Many approaches to be integrated
• Many minds engaged
• Many years of effort

Using the systems
• Requires well-tuned models
• Well-tuned relationships between systems & people
• Flexibility, adaptability & agility

Enabling this is itself a major intellectual challenge

Matched Systems

• Communities of researchers
• e-Infrastructure facilities

Each will evolve. Can you help make this co-evolution successful?
