
Empowering Distributed Science

Ian Foster
Argonne National Laboratory, University of Chicago
Globus Alliance

eScience [n]: Large-scale science carried out through distributed collaborations—often leveraging access to large-scale data & computing


It’s Amazing How Much We Have Achieved in 10 Years

• Applications
  - Production services: Grid3, ESG, Fusion, CMCS (also NEESgrid and many others that use DOE technology)
• Infrastructure
  - Broadly deployed PKI and single sign-on
  - Access Grid at 300+ institutions worldwide
• Leadership and technology
  - Grid concepts & software used worldwide
  - Global Grid Forum: standards & community
  - GridFTP: California to Illinois at 27 Gbit/s
  - Multicast almost works


There’s Still Much to Do: Where We Should Be vs. Where We Are

Goal: Any DOE scientist can access any DOE computer, software, data, instrument
• ~25,000 scientists* (vs. ~1,000 DOE certs)
• ~1,000 instruments** (vs. maybe 10 online?)
• ~1,000 scientific applications** (vs. 2 Fusion services)
• ~10 PB of interesting data** (vs. 100 TB on ESG)
• ~100,000 computers* (vs. ~3,000 on Grid3)

Not to mention many external partners

I.e., we need to scale by 2-3 orders of magnitude to have DOE-wide impact!

* Rough estimate; ** WAG


“25,000 Scientists”: The Many Aspects of Scaling

• Data & computational services integrated into the fabric of science communities
  - Used not by a handful but by 1000s
  - Part of everyday science workflows
• Scale load on services by factors of 100+
  - 100,000 requests annually to fusion codes
  - 1,000 concurrent users for ESG services
  - 25,000 users to authenticate & authorize
• Manageability as a key new challenge
  - Resource management and provisioning
  - Automation of management functions


“25,000 Scientists”: Authentication & Authorization

• User-managed PKI credentials
• Single sign-on & delegation (GSI)
• DOEGrids CA: 1,250 users
• MyProxy & related tools (see the sketch below)
• WS-Security & SAML-based authentication/authorization
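A minimal sketch of the single sign-on pattern these tools support, assuming the standard myproxy-logon command-line client is installed; the server name and username are hypothetical placeholders, not a real DOE service. The script simply retrieves a short-lived proxy credential that GSI-based tools can then delegate.

```python
# Sketch: fetch a short-lived proxy credential from a MyProxy server.
# Server name and username below are hypothetical placeholders.
import getpass
import subprocess

def fetch_proxy(server: str, username: str, hours: int = 12) -> None:
    """Retrieve a delegated proxy credential via the MyProxy CLI."""
    passphrase = getpass.getpass(f"MyProxy passphrase for {username}: ")
    subprocess.run(
        ["myproxy-logon",
         "-s", server,        # MyProxy server holding the long-lived credential
         "-l", username,      # account registered with the CA / MyProxy server
         "-t", str(hours),    # lifetime of the short-lived proxy, in hours
         "-S"],               # read the passphrase from stdin
        input=passphrase + "\n",
        text=True,
        check=True,
    )

if __name__ == "__main__":
    fetch_proxy("myproxy.example.gov", "jscientist")   # hypothetical names
```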


Authentication & Authorization: Next Steps

• Integration with campus infrastructures
  - “Authenticate locally, act globally”
  - E.g., KX509, GridLogon, GridShib, etc.
• Enabling access while enhancing security
  - Create secure virtual national laboratories
  - Technical & policy solutions to risk/benefit tradeoffs
• Evolving what we mean by “trust”
  - Colleagues → collaboration → community
• Scaling to the ultrascale
  - Data volumes, data rates, transaction rates


“1000 Instruments”: The Scale of the Problem

[Map: DOE Office of Science user facilities and the institutions that use them]
• Lawrence Berkeley National Lab: Advanced Light Source; National Center for Electron Microscopy; National Energy Research Scientific Computing Facility
• Los Alamos Neutron Science Center
• Univ. of Illinois: Electron Microscopy Center for Materials Research; Center for Microanalysis of Materials
• MIT: Bates Accelerator Center; Plasma Science & Fusion Center
• Fermi National Accelerator Lab: Tevatron
• Stanford Linear Accelerator Center: B-Factory; Stanford Synchrotron Radiation Laboratory
• Princeton Plasma Physics Lab
• General Atomics: DIII-D Tokamak
• Pacific Northwest National Lab: Environmental Molecular Sciences Lab
• Argonne National Lab: Intense Pulsed Neutron Source; Advanced Photon Source; Argonne Tandem Linac Accelerator System
• Brookhaven National Lab: Relativistic Heavy Ion Collider; National Synchrotron Light Source
• Oak Ridge National Lab: High-Flux Isotope Reactor; Surface Modification & Characterization Center; Spallation Neutron Source (under construction)
• Thomas Jefferson National Accelerator Facility: Continuous Electron Beam Accelerator Facility
• Sandia Combustion Research Facility
• James R. MacDonald Laboratory
Facility categories: physics accelerators, synchrotron light sources, neutron sources, special-purpose facilities, large fusion experiments


For Example: NSF Network for Earthquake Engineering Simulation

Links instruments, data, computers, people


NEESgrid: How It Really Happens (A Simplified View)

• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces

[Diagram components: web browser, data viewer tool, CHEF chat teamlet, telepresence monitor, CHEF, simulation tool, MyProxy, certificate authority, Globus Index Service, Globus MCS/RLS, compute servers behind Globus GRAM (see the GRAM sketch below), database services behind OGSA-DAI, cameras]

Component sources: application developer (2), off the shelf (9), Globus Toolkit (5), Grid community (3)
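A minimal sketch of how a client might hand work to one of the GRAM-fronted compute servers in the diagram, using the Globus Toolkit’s globus-job-run client; the contact string is a hypothetical placeholder, and a real VO would publish its own GRAM endpoints.

```python
# Sketch: run an executable on a GRAM-managed compute server.
# The contact string below is a hypothetical placeholder.
import subprocess

def run_remote(contact: str, executable: str, *args: str) -> str:
    """Submit a simple job via the Globus Toolkit CLI and return its output."""
    result = subprocess.run(
        ["globus-job-run", contact, executable, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Requires a valid proxy credential (see the MyProxy sketch earlier).
    print(run_remote("gram.example.edu/jobmanager-pbs", "/bin/hostname"))
```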


Scaling to 1000 Instruments: Challenges

• Common teleoperation control interfaces
  - NEESgrid Network Telecontrol Protocol (NTCP) provides a service-oriented interface: a nice start? (illustrative interface sketch below)
• Major social & organizational challenges
  - Operating instruments as shared facilities
  - Data sharing policies and mechanisms
• Basic technological challenges also
  - Provisioning/QoS for multi-modal experiments
  - Hierarchical/latency-tolerant control algorithms
  - Reliability, health, and safety
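To make “service-oriented interface” concrete, here is an illustrative sketch of what a common teleoperation interface could look like; the class and operation names are hypothetical and are not the actual NTCP operation set.

```python
# Illustrative only: a hypothetical service-oriented teleoperation interface.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ControlPoint:
    name: str          # e.g. an actuator or sensor channel
    target: float      # commanded value (units depend on the instrument)

class TelecontrolService(ABC):
    """Common interface that every shared instrument could expose."""

    @abstractmethod
    def propose(self, transaction_id: str, points: list[ControlPoint]) -> bool:
        """Ask the site whether a control step is acceptable (safety checks)."""

    @abstractmethod
    def execute(self, transaction_id: str) -> None:
        """Carry out a previously proposed and approved control step."""

    @abstractmethod
    def read(self, channel: str) -> float:
        """Return the current value of a sensor channel."""
```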


“1000 Applications”: Software as Service

• Software is increasingly central to almost every aspect of DOE science
• Service interfaces are needed for broad adoption: “shrink wrap” isn’t the answer
• TransP production service: 1,662 runs in FY03


Software as Service: What If You Have 1000s of Users?

• Service-oriented applications
  - Wrapping applications as Web services (see the sketch below)
  - Composing applications into workflows
• Service-oriented infrastructure
  - Provisioning physical resources to support application workloads

[Diagram: users invoke application services and compose them into workflows; provisioning supplies the physical resources that back those services]
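As a sketch of “wrapping applications as Web services”: a minimal HTTP wrapper around a command-line code, assuming a hypothetical executable named transport_sim and using Flask. A production service would add the authentication/authorization mechanisms discussed earlier, job queueing, and provisioning.

```python
# Sketch: expose a command-line simulation code as a tiny web service.
# "transport_sim" is a hypothetical executable; paths are illustrative.
import json
import subprocess
import tempfile
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/runs", methods=["POST"])
def submit_run():
    params = request.get_json()                      # caller's input parameters
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(params, f)                         # stage input for the code
        input_path = f.name
    result = subprocess.run(
        ["transport_sim", "--input", input_path],    # hypothetical simulation code
        capture_output=True, text=True, check=False,
    )
    return jsonify({
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    })

if __name__ == "__main__":
    app.run(port=8080)
```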


“10 PB Data”: Distributed Data Integration

Major challenges in four dimensions:
• Number & distribution of data sources
• Volume of data
• Diversity in data format, quality, semantics
• Sophistication & scale of data analysis

[Diagram: facts flow from experiments & instruments, simulations, literature, and other archives into an integration layer that turns scientists’ questions into answers]


Distributed Data Integration: Examples of Where We Are Today

• Earth System Grid: O(100 TB) online data
• STAR: 5 TB transfer (SRM, GridFTP)
• NASA/NVO: mosaics from multiple sources
• Bertram Ludäscher’s examples


Distributed Data Integration: Enabling Automated Analysis

• Data ingest
• Managing many petabytes
• Common schema and ontologies
• How to organize petabytes? Reorganize it?
• Interactive & batch analysis performance
• Universal annotation infrastructure
• Query, analysis, visualization tools


“100,000 Computers”: A Healthy Computing Pyramid

[Figure: two pyramids, drawn in the style of a food pyramid]
• Today: supercomputer at the top, cluster in the middle, desktop at the base
• Tomorrow?: supercomputers (use sparingly), specialized computers (2-3 servings), clusters (100s of servings), desktops (100,000 servings)


Grid2003: An Operational Grid

• 28 sites (2,100-2,800 CPUs) & growing
• 400-1,300 concurrent jobs
• 8 substantial applications + CS experiments
• Running since October 2003

[Map of Grid2003 sites, including one in Korea]
http://www.ivdgl.org/grid2003


Example Grid2003 Workflows

• Genome sequence analysis
• Physics data analysis
• Sloan Digital Sky Survey


Example Grid2003 Application: NVO Mosaic Construction

• NVO/NASA Montage: a small (1,200-node) workflow (toy DAG sketch below)
• Construct custom mosaics on demand from multiple data sources
• User specifies projection, coordinates, size, rotation, spatial sampling
• Work by Ewa Deelman et al., USC/ISI and Caltech
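To illustrate the shape of such a workflow, the toy sketch below builds and orders a small mosaic DAG: many independent reprojection tasks followed by a final co-addition step. Task names and the scheduling logic are illustrative only, not the Montage or Pegasus implementation.

```python
# Toy sketch: a mosaic-style workflow DAG and a simple dependency-respecting order.
from collections import defaultdict

def topological_order(deps: dict[str, set[str]]) -> list[str]:
    """Return tasks in an order that respects the dependency edges."""
    remaining = {t: set(d) for t, d in deps.items()}
    order = []
    while remaining:
        ready = [t for t, d in remaining.items() if not d]
        if not ready:
            raise ValueError("cycle in workflow")
        for t in sorted(ready):
            order.append(t)
            del remaining[t]
            for d in remaining.values():
                d.discard(t)
    return order

# Reproject N input images independently, then co-add them into one mosaic.
images = [f"image_{i:03d}.fits" for i in range(4)]    # stand-in for sky survey tiles
deps: dict[str, set[str]] = defaultdict(set)
for img in images:
    deps[f"reproject:{img}"] = set()                   # independent, can run anywhere
deps["coadd:mosaic.fits"] = {f"reproject:{img}" for img in images}

for task in topological_order(deps):
    print("run", task)   # a real system would dispatch these to Grid2003 sites
```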


Invocation Provenance

• Completion status and resource usage
• Attributes of the executable transformation
• Attributes of input and output files (record sketch below)
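A minimal sketch of what such an invocation record might hold, with illustrative field names rather than the schema of any particular Grid2003 tool.

```python
# Sketch: a provenance record covering completion status, resource usage,
# the executable transformation, and input/output file attributes.
from dataclasses import dataclass, field

@dataclass
class FileAttributes:
    name: str
    size_bytes: int
    checksum: str                      # e.g. an MD5 or SHA-256 digest

@dataclass
class InvocationRecord:
    transformation: str                # name/version of the executable transformation
    arguments: list[str]
    exit_status: int                   # completion status
    wall_time_s: float                 # resource usage
    cpu_time_s: float
    host: str
    inputs: list[FileAttributes] = field(default_factory=list)
    outputs: list[FileAttributes] = field(default_factory=list)

# Hypothetical example of a single reprojection step's record.
record = InvocationRecord(
    transformation="mProject-3.0",
    arguments=["image_001.fits", "proj_001.fits"],
    exit_status=0, wall_time_s=41.2, cpu_time_s=39.8, host="node17.example.edu",
)
print(record.transformation, record.exit_status)
```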


“100,000 Computers”: Future Challenges

• New modes of working that are driven by (& drive) massive increases in computing
  - Enabling massive data analysis, simulation-driven problem solving, application services
  - These make massively parallel computing essential, not an academic curiosity
• More pesky security & policy challenges
• Technological challenges
  - Reliability, performance, usability as infrastructure, workflows, data volumes, and user community scale by 2+ orders of magnitude
  - Manageability again


Cross-cutting Challenges

• Institutionalize infrastructure
  - Broad deployment & support at sites
  - Software as infrastructure
  - Legitimate (& challenging) security concerns
• Expand range of resource-sharing modalities
  - Research aimed at federating not just data & computers, but workflow and semantics
  - Scale data size, community sizes, etc.
• Reach new application domains
  - Sustain current collaboratory pilots, and start new ones of similar or greater ambition


Summary: It’s Amazing How Much We Have Achieved in 10 Years

• Applications
  - Production services: Grid3, ESG, Fusion, CMCS (also NEESgrid and many others that use DOE technology)
• Infrastructure
  - Broadly deployed PKI and single sign-on
  - Access Grid at 300+ institutions worldwide
• Leadership and technology
  - Grid concepts & software used worldwide
  - Global Grid Forum: standards & community
  - GridFTP: California to Illinois at 27 Gbit/s
  - Multicast almost works


But Over Those Same 10 Years: Dramatic Change

• Exponential growth in network speed, data volume, computer speed, collaboration size
  - E.g., 155 Mb/s → 10 Gb/s (ESnet backbone)
• eScience methods no longer optional but now vital to scientific competitiveness
• We’ve demonstrated the feasibility of eScience, but we are far from DOE-wide adoption
• We have moved forward, but we’ve also fallen behind


The $3.4B Question

• Future science will be dominated by “eScience”
• Europe is investing heavily in eScience
  - EU: ~$70M/yr for “Grid” infrastructure & technology
  - UK: ~$60M/yr for eScience applications & technology
  - German, Italian, Dutch, etc., programs
• Asia Pacific is investing heavily in eScience
  - Japan, China, South Korea, Singapore, Australia all have programs
• How does DOE stay competitive?


We Have Done Much, But Have Much More to Do

Goal: Any DOE scientist can access any DOE computer, software, data, instrument
• ~25,000 scientists* (vs. ~1,000 DOE certs)
• ~1,000 instruments** (vs. maybe 10 online?)
• ~1,000 scientific applications** (vs. 2 Fusion services)
• ~10 PB of interesting data** (vs. 100 TB on ESG)
• ~100,000 computers* (vs. ~3,000 on Grid3)

Not to mention many external partners

We need to scale by 2-3 orders of magnitude to have DOE-wide impact

* Rough estimate; ** WAG