Empowering Distributed Science
Ian Foster, Argonne National Laboratory, University of Chicago, Globus Alliance
eScience [n]: Large-scale science carried out through distributed collaborations—often leveraging access to large-scale data & computing
It's Amazing How Much We Have Achieved in 10 Years

Applications
- Production services: Grid3, ESG, Fusion, CMCS (also NEESgrid, many others that use DOE tech)
Infrastructure
- Broadly deployed PKI and single sign-on
- Access Grid at 300+ institutions worldwide
Leadership and technology
- Grid concepts & software used worldwide
- Global Grid Forum: standards & community
- GridFTP: California to Illinois at 27 Gbit/s
- Multicast almost works
There's Still Much to Do: Where We Should Be vs. Where We Are

Goal: Any DOE scientist can access any DOE computer, software, data, instrument
- ~25,000 scientists* (vs. ~1,000 DOE certs)
- ~1,000 instruments** (vs. maybe 10 online?)
- ~1,000 scientific applications** (vs. 2 Fusion services)
- ~10 PB of interesting data** (vs. 100 TB on ESG)
- ~100,000 computers* (vs. ~3,000 on Grid3)
Not to mention many external partners
I.e., we need to scale by 2-3 orders of magnitude to have DOE-wide impact!
* Rough estimate; ** WAG
"25,000 Scientists": The Many Aspects of Scaling

Data & computational services integrated into the fabric of science communities
- Used not by a handful but by 1000s
- Part of everyday science workflows
Scale load on services by factors of 100+
- 100,000 requests annually to fusion codes
- 1,000 concurrent users for ESG services
- 25,000 users to authenticate & authorize
Manageability as a key new challenge
- Resource management and provisioning
- Automation of management functions
"25,000 Scientists": Authentication & Authorization

- User-managed PKI credentials
- Single sign-on & delegation (GSI)
- DOEGrids CA: 1,250 users
- MyProxy & related tools
- WS-Security & SAML-based authentication/authorization
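Single sign-on and delegation in GSI rest on proxy credentials: a user derives a short-lived credential from a long-lived certificate once per session, and remote services can in turn receive delegated proxies to act on the user's behalf. The following is a minimal Python sketch of that chain under simplifying assumptions; the class and function names are illustrative only and are not the Globus Toolkit or MyProxy API.

```python
# Hypothetical, simplified model of GSI-style proxy delegation.
# Illustrative only -- not the Globus Toolkit API.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Credential:
    subject: str          # identity this credential asserts
    issuer: str           # who signed it (CA or parent proxy)
    expires: datetime     # short lifetimes limit exposure

def create_proxy(parent: Credential, hours: float = 12) -> Credential:
    """Derive a short-lived proxy from a longer-lived credential,
    as a user does once per session for single sign-on."""
    return Credential(
        subject=parent.subject + "/CN=proxy",
        issuer=parent.subject,
        expires=min(parent.expires,
                    datetime.utcnow() + timedelta(hours=hours)),
    )

def chain_is_valid(chain: list[Credential], now: datetime) -> bool:
    """A service checks that each link was issued by the previous
    subject and that nothing in the chain has expired."""
    for prev, cur in zip(chain, chain[1:]):
        if cur.issuer != prev.subject:
            return False
    return all(c.expires > now for c in chain)

# Example: a user credential (issuer and names are made up), one sign-on
# proxy, and one further delegation to a remote job.
user = Credential("/DC=org/DC=doegrids/CN=Jane Doe",
                  "/DC=org/DC=DOEGrids/CN=CA",
                  datetime.utcnow() + timedelta(days=365))
session_proxy = create_proxy(user)            # single sign-on step
service_proxy = create_proxy(session_proxy)   # delegated to a remote job
print(chain_is_valid([user, session_proxy, service_proxy], datetime.utcnow()))
```

The design point the sketch illustrates is that the user's long-lived key is used only once; everything downstream works from short-lived, verifiable derivatives.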
Authentication & Authorization: Next Steps

Integration with campus infrastructures
- "Authenticate locally, act globally"
- E.g., KX509, GridLogon, GridShib
Enabling access while enhancing security
- Create secure virtual national laboratories
- Technical & policy solutions to risk/benefit tradeoffs
Evolving what we mean by "trust"
- Colleagues → collaboration → community
Scaling to the ultrascale
- Data volumes, data rates, transaction rates
"1000 Instruments": The Scale of the Problem

[Map: DOE Office of Science laboratories and user facilities, together with the institutions that use them. Facility categories: physics accelerators, synchrotron light sources, neutron sources, special-purpose facilities, large fusion experiments.]
- Lawrence Berkeley National Lab: Advanced Light Source; National Center for Electron Microscopy; National Energy Research Scientific Computing Facility
- Los Alamos Neutron Science Center
- Univ. of Illinois: Electron Microscopy Center for Materials Research; Center for Microanalysis of Materials
- MIT: Bates Accelerator Center; Plasma Science & Fusion Center
- Fermi National Accelerator Lab: Tevatron
- Stanford Linear Accelerator Center: B-Factory; Stanford Synchrotron Radiation Laboratory
- Princeton Plasma Physics Lab
- General Atomics: DIII-D Tokamak
- Pacific Northwest National Lab: Environmental Molecular Sciences Lab
- Argonne National Lab: Intense Pulsed Neutron Source; Advanced Photon Source; Argonne Tandem Linac Accelerator System
- Brookhaven National Lab: Relativistic Heavy Ion Collider; National Synchrotron Light Source
- Oak Ridge National Lab: High-Flux Isotope Reactor; Surface Modification & Characterization Center; Spallation Neutron Source (under construction)
- Thomas Jefferson National Accelerator Facility: Continuous Electron Beam Accelerator Facility
- Sandia Combustion Research Facility
- James R. MacDonald Laboratory
For Example: NSF Network for Earthquake Engineering Simulation
Links instruments, data, computers, people
NEESgrid: How It Really Happens (A Simplified View)

- Users work with client applications
- Application services organize VOs & enable access to other services
- Collective services aggregate and/or virtualize resources
- Resources implement standard access & management interfaces

[Architecture diagram. Clients and application services: web browser, data viewer tool, simulation tool, CHEF and CHEF chat teamlet. Collective services: Globus Index Service, Globus MCS/RLS, MyProxy, Certificate Authority. Resources: compute servers behind Globus GRAM, database services behind OGSA-DAI, cameras, telepresence monitor. Component origins: 2 from the application developer, 9 off the shelf, 5 from the Globus Toolkit, 3 from the grid community.]
Scaling to 1000 Instruments: Challenges

Common teleoperation control interfaces
- NEESgrid Network Telecontrol Protocol (NTCP) provides a service-oriented interface: a nice start? (a hypothetical sketch follows this list)
Major social & organizational challenges
- Operating instruments as shared facilities
- Data sharing policies and mechanisms
Basic technological challenges also
- Provisioning/QoS for multi-modal experiments
- Hierarchical/latency-tolerant control algorithms
- Reliability, health, and safety
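A service-oriented control interface lets every shared instrument expose the same small set of operations, so clients and collaboration tools need not speak each device's native protocol. The sketch below is a hypothetical Python illustration of that idea only; the ControlService methods, the propose/execute split, and the shake-table example are assumptions for illustration, not the actual NTCP specification.

```python
# Hypothetical sketch of a uniform, service-oriented teleoperation
# interface in the spirit of shared-instrument control. Names are
# illustrative; this is not the NTCP protocol definition.
from abc import ABC, abstractmethod

class ControlService(ABC):
    """Common interface every shared instrument would expose, so clients
    need not know device-specific control details."""

    @abstractmethod
    def propose(self, transaction_id: str, setpoints: dict) -> bool:
        """Ask whether a control step is acceptable (safety/limits check)."""

    @abstractmethod
    def execute(self, transaction_id: str) -> dict:
        """Carry out a previously accepted step; return the measured response."""

    @abstractmethod
    def status(self) -> dict:
        """Report instrument health and readiness."""

class ShakeTableService(ControlService):
    """Toy implementation for a single-axis shake table."""
    MAX_DISPLACEMENT_M = 0.05  # illustrative safety limit

    def __init__(self):
        self._accepted = {}

    def propose(self, transaction_id, setpoints):
        ok = abs(setpoints.get("displacement_m", 0.0)) <= self.MAX_DISPLACEMENT_M
        if ok:
            self._accepted[transaction_id] = setpoints
        return ok

    def execute(self, transaction_id):
        setpoints = self._accepted.pop(transaction_id)
        return {"achieved_displacement_m": setpoints["displacement_m"]}

    def status(self):
        return {"ready": True, "pending": len(self._accepted)}

table = ShakeTableService()
if table.propose("step-001", {"displacement_m": 0.02}):
    print(table.execute("step-001"))
```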
"1000 Applications": Software as Service

- Software is increasingly central to almost every aspect of DOE science
- Service interfaces are needed for broad adoption: "shrink wrap" isn't the answer
- TRANSP production service: 1,662 runs in FY03
Software as Service: What If You Have 1000s of Users?

Service-oriented applications
- Wrapping applications as Web services
- Composing applications into workflows
Service-oriented infrastructure
- Provisioning physical resources to support application workloads

[Diagram: users compose application services into workflows; invocation of those services drives provisioning of the underlying resources.]
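The shift from "shrink wrap" to software as service amounts to putting an application behind a network interface that many users can invoke remotely. Below is a minimal, hypothetical sketch using only the Python standard library; the /run endpoint and the run_simulation stand-in are illustrative, not any production DOE service or the web-service stacks of the period.

```python
# Minimal, hypothetical sketch of wrapping an application as a service.
# The endpoint name and run_simulation() are illustrative stand-ins.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_simulation(params: dict) -> dict:
    """Stand-in for invoking the underlying scientific code."""
    return {"status": "complete", "inputs": params, "result": 42.0}

class ApplicationService(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        params = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_simulation(params)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Each request is one remote invocation; a real service would add
    # authentication, queuing, and provisioning behind this interface.
    HTTPServer(("localhost", 8080), ApplicationService).serve_forever()
```

A client simply POSTs a JSON parameter set and reads back the result; installation, versioning, and capacity planning all stay on the service side, which is what makes thousands of users tractable.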
"10 PB Data": Distributed Data Integration

Major challenges in four dimensions
- Number & distribution of data sources
- Volume of data
- Diversity in data format, quality, semantics
- Sophistication & scale of data analysis

[Diagram: experiments & instruments, simulations, literature, and other archives all supply facts; the integration task is turning questions posed over those distributed facts into answers.]
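One common way to attack the diversity dimension is a mediator: per-source adapters translate local records into a shared schema so a single query can range over all of the sources. The sketch below is a hypothetical illustration of that pattern; the source names, fields, and unit conversions are invented for the example.

```python
# Hypothetical sketch of the mediator pattern for distributed data
# integration: per-source adapters map local records into a common
# schema so one query spans all sources. Sources and fields are made up.
from typing import Iterator

COMMON_FIELDS = ("site", "timestamp", "temperature_k")

def simulation_adapter() -> Iterator[dict]:
    """Adapter for a simulation archive that stores Celsius."""
    for rec in [{"loc": "ANL", "t": "2004-01-01", "temp_c": 20.0}]:
        yield {"site": rec["loc"], "timestamp": rec["t"],
               "temperature_k": rec["temp_c"] + 273.15}

def instrument_adapter() -> Iterator[dict]:
    """Adapter for an instrument feed that already reports Kelvin."""
    for rec in [{"station": "LBNL", "when": "2004-01-02", "kelvin": 295.0}]:
        yield {"site": rec["station"], "timestamp": rec["when"],
               "temperature_k": rec["kelvin"]}

def federated_query(min_temperature_k: float) -> list[dict]:
    """Run one question over every source, in the common schema."""
    sources = (simulation_adapter, instrument_adapter)
    return [row for source in sources for row in source()
            if row["temperature_k"] >= min_temperature_k]

print(federated_query(min_temperature_k=290.0))
```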
Distributed Data Integration: Examples of Where We Are Today

- Earth System Grid: O(100 TB) online data
- STAR: 5 TB transfer (SRM, GridFTP)
- NASA/NVO: Mosaics from multiple sources
- Bertram Ludäscher's examples
Distributed Data Integration: Enabling Automated Analysis

- Data ingest
- Managing many petabytes
- Common schema and ontologies
- How to organize petabytes? Reorganize it?
- Interactive & batch analysis performance
- Universal annotation infrastructure
- Query, analysis, visualization tools
"100,000 Computers": A Healthy Computing Pyramid

[Figure: two computing "food pyramids". Today: supercomputer at the top, cluster in the middle, desktop at the base. Tomorrow (?), labeled like dietary guidance: desktop (100,000 servings), clusters (100s of servings), specialized computers (2-3 servings), supercomputers (use sparingly).]
Grid2003: An Operational Grid

- 28 sites (2100-2800 CPUs) & growing
- 400-1300 concurrent jobs
- 8 substantial applications + CS experiments
- Running since October 2003

[Map of Grid2003 sites, including Korea]
http://www.ivdgl.org/grid2003
Example Grid2003 Workflows

[Workflow diagrams: genome sequence analysis, physics data analysis, Sloan Digital Sky Survey]
Example Grid2003 Application: NVO Mosaic Construction

- NVO/NASA Montage: a small (1200-node) workflow
- Construct custom mosaics on demand from multiple data sources
- User specifies projection, coordinates, size, rotation, spatial sampling
- Work by Ewa Deelman et al., USC/ISI and Caltech
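Mosaic construction illustrates the workflow shape these applications share: a wide fan-out of independent per-image tasks followed by a fan-in combine step. The following is a hypothetical, drastically reduced sketch of such a DAG in Python; the function names are stand-ins rather than Montage components, and the real workflow on Grid2003 had roughly 1,200 nodes.

```python
# Hypothetical, much-reduced sketch of a mosaic-construction DAG:
# independent per-image reprojection tasks followed by one combine
# step. Task functions are illustrative stand-ins, not Montage code.
def reproject(image: str, projection: str) -> str:
    """Stand-in for reprojecting one input image onto the target grid."""
    return f"{image}.reprojected[{projection}]"

def combine(tiles: list[str]) -> str:
    """Stand-in for co-adding reprojected tiles into the final mosaic."""
    return "mosaic(" + ", ".join(tiles) + ")"

def mosaic_workflow(images: list[str], projection: str) -> str:
    # Fan-out: each reprojection is independent, so a grid scheduler
    # can farm them out to whatever CPUs are available.
    tiles = [reproject(img, projection) for img in images]
    # Fan-in: the combine step waits on all reprojection tasks.
    return combine(tiles)

print(mosaic_workflow(["image_001.fits", "image_002.fits"], projection="TAN"))
```

The user-specified parameters (projection, coordinates, size, rotation, sampling) would simply parameterize the fan-out stage, which is why demand-driven mosaics map so naturally onto grid resources.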
Invocation Provenance

- Completion status and resource usage
- Attributes of executable transformation
- Attributes of input and output files
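Each of the attribute groups above maps naturally onto a small structured record that a workflow system can write for every task invocation. A hypothetical sketch, with field names and values chosen purely for illustration:

```python
# Hypothetical sketch of an invocation-provenance record covering the
# attributes listed above: completion status and resource usage, the
# executable transformation, and the input/output files.
from dataclasses import dataclass, field

@dataclass
class FileAttributes:
    name: str
    size_bytes: int
    checksum: str

@dataclass
class InvocationRecord:
    transformation: str            # executable name and version
    arguments: list[str]
    host: str
    exit_code: int                 # completion status
    wall_time_s: float             # resource usage
    cpu_time_s: float
    inputs: list[FileAttributes] = field(default_factory=list)
    outputs: list[FileAttributes] = field(default_factory=list)

record = InvocationRecord(
    transformation="reproject-1.0", arguments=["--projection", "TAN"],
    host="node17.example.org", exit_code=0,
    wall_time_s=184.2, cpu_time_s=179.9,
    inputs=[FileAttributes("image_001.fits", 2_097_152, "sha1:...")],
    outputs=[FileAttributes("image_001.proj.fits", 2_104_320, "sha1:...")],
)
print(record.exit_code, record.wall_time_s)
```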
"100,000 Computers": Future Challenges

New modes of working that are driven by (& drive) massive increases in computing
- Enabling massive data analysis, simulation-driven problem solving, application services
- These make massively parallel computing essential, not an academic curiosity
More pesky security & policy challenges
Technological challenges
- Reliability, performance, usability as infrastructure, workflows, data volumes, user community scale by 2+ orders of magnitude
- Manageability again
Cross-cutting Challenges

Institutionalize infrastructure
- Broad deployment & support at sites
- Software as infrastructure
- Legitimate (& challenging) security concerns
Expand range of resource sharing modalities
- Research aimed at federating not just data & computers, but workflow and semantics
- Scale data size, community sizes, etc.
Reach new application domains
- Sustain current collaboratory pilots, and start new ones of similar or greater ambition
Summary: It's Amazing How Much We Have Achieved in 10 Years

Applications
- Production services: Grid3, ESG, Fusion, CMCS (also NEESgrid, many others that use DOE tech)
Infrastructure
- Broadly deployed PKI and single sign-on
- Access Grid at 300+ institutions worldwide
Leadership and technology
- Grid concepts & software used worldwide
- Global Grid Forum: standards & community
- GridFTP: California to Illinois at 27 Gbit/s
- Multicast almost works
But Over Those Same 10 Years: Dramatic Change

- Exponential growth in network speed, data volume, computer speed, collaboration size (e.g., 155 Mb/s → 10 Gb/s ESnet backbone)
- eScience methods no longer optional but now vital to scientific competitiveness
- We've demonstrated feasibility of eScience, but we are far from DOE-wide adoption
- We have moved forward, but we've also fallen behind
The $3.4B Question

Future science will be dominated by "eScience"
Europe is investing heavily in eScience
- EU: ~$70M/yr for "Grid" infrastructure & technology
- UK: ~$60M/yr for eScience applications and technology
- German, Italian, Dutch, etc., programs
Asia Pacific is investing heavily in eScience
- Japan, China, South Korea, Singapore, Australia all have programs
How does DOE stay competitive?
We Have Done Much, But Have Much More to Do

Any DOE scientist can access any DOE computer, software, data, instrument
- ~25,000 scientists* (vs. ~1,000 DOE certs)
- ~1,000 instruments** (vs. maybe 10 online?)
- ~1,000 scientific applications** (vs. 2 Fusion services)
- ~10 PB of interesting data** (vs. 100 TB on ESG)
- ~100,000 computers* (vs. ~3,000 on Grid3)
Not to mention many external partners
We need to scale by 2-3 orders of magnitude to have DOE-wide impact
* Rough estimate; ** WAG