NSF’s Evolving Cyberinfrastructure Program
Guy Almes <[email protected]>, Office of Cyberinfrastructure
Oklahoma Supercomputing Symposium 2005, Norman, 5 October 2005
Overview
Cyberinfrastructure in Context
Existing Elements
Organizational Changes
Vision and High-performance Computing planning
Closing thoughts
Cyberinfrastructure in Context
A consequence of the research university's mission: each university wants a few people from each key research specialty
Therefore, research colleagues are scattered across the nation and the world
Enabling their collaborative work is a key goal for NSF
Traditionally, there were two approaches to doing science: theoretical / analytical and experimental / observational
Now the use of aggressive computational resources has led to a third approach: in silico simulation / modeling
Cyberinfrastructure Vision
A new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology, and pulled by the expanding complexity, scope, and scale of today’s challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive “cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.
[NSF Blue Ribbon Panel report, 2003]
Historical Elements
Supercomputer Center program from the 1980s
NCSA, SDSC, and PSC have been leading centers ever since
NSFnet program of 1985-95: connected users to (and through) those centers; backbone speeds grew from 56 kb/s to 1.5 Mb/s to 45 Mb/s within ten years
Sensors: telescopes, radars, environmental instruments, but treated in an ad hoc fashion
Middleware: of growing importance, but long underestimated
Timeline of NSF advanced computing programs:
'85  Supercomputer Centers: PSC, NCSA, SDSC, JvNC, CTC
'93  Hayes Report
'95  Branscomb Report
'97  Partnerships for Advanced Computational Infrastructure: Alliance (NCSA-led), NPACI (SDSC-led)
'99  PITAC Report
'00  Terascale Computing Systems; ITR Projects
'03  Atkins Report; ETF Management & Operations
FY '05-'08  Core Support: NCSA, SDSC; Discipline-specific CI Projects
Explicit Elements
Advanced Computing: a variety of strengths, e.g., data-intensive and compute-intensive
Advanced Instruments: sensor networks, weather radars, telescopes, etc.
Advanced Networks: connecting researchers, instruments, and computers together in real time
Advanced Middleware: enabling the potential sharing and collaboration
Note the synergies!
CRAFT: A normative example – Sensors + network + HEC
Partners: Univ. Oklahoma, NCSA and PSC, Internet2, UCAR Unidata Project, National Weather Service
Current Projects within the Office of Cyberinfrastructure
HEC + X
Extensible Terascale Facility (ETF)
International Research Network Connections
NSF Middleware Initiative
Integrative Activities: Education, Outreach & Training
Social and Economic Frontiers in Cyberinfrastructure
TeraGrid: One Component
• A distributed system of unprecedented scale: 30+ TF, 1+ PB, 40 Gb/s network
• Unified user environment across resources: user software environment, user support resources
• Integrated new partners to introduce new capabilities: additional computing and visualization capabilities; new types of resources: data collections, instruments
• Built a strong, extensible team
• Created an initial community of over 500 users, 80 PIs
• Created User Portal in collaboration with NMI
courtesy Charlie Catlett
Key TeraGrid Resources: Computational
Very tightly coupled clusters: LeMieux and Red Storm systems at PSC
Tightly coupled clusters: Itanium2 and Xeon clusters at several sites
Data-intensive systems: DataStar at SDSC
Memory-intensive systems: Maverick at TACC and Cobalt at NCSA
Experimental systems: MD-GRAPE system at Indiana and BlueGene/L at SDSC
Online and Archival Storage: e.g., more than a PB online at SDSC
Data Collections: numerous
Instruments: Spallation Neutron Source at Oak Ridge, Purdue Terrestrial Observatory
TeraGrid DEEP Examples
Lattice-Boltzmann Simulations: Peter Coveney, UCL; Bruce Boghosian, Tufts
Reservoir Modeling: Joel Saltz, OSU
Aquaporin Mechanism (animation pointed to by the 2003 Nobel chemistry prize announcement): Klaus Schulten, UIUC
Groundwater/Flood Modeling: David Maidment, Gordon Wells, UT
Atmospheric Modeling: Kelvin Droegemeier, OU
Advanced Support for TeraGrid Applications: TeraGrid staff are “embedded” with applications to create
- Functionally distributed workflows
- Remote data access, storage and visualization
- Distributed data mining
- Ensemble and parameter-sweep run and data management (a minimal sweep sketch follows below)
courtesy Charlie Catlett
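The ensemble and parameter-sweep support mentioned above is largely bookkeeping over many related runs. A minimal illustrative sketch in Python, not TeraGrid code: the parameter names, value ranges, and run_model placeholder are hypothetical.

    # Illustrative parameter-sweep driver (hypothetical parameters and model).
    # Each combination gets its own run directory so inputs, outputs, and logs
    # stay separated -- the bookkeeping the embedded staff help automate.
    import itertools
    import json
    from pathlib import Path

    grid = {
        "resolution_km": [1.0, 2.0, 4.0],      # assumed sweep dimensions
        "microphysics":  ["scheme_a", "scheme_b"],
    }

    def run_model(params: dict, workdir: Path) -> None:
        """Placeholder for launching one ensemble member."""
        (workdir / "params.json").write_text(json.dumps(params, indent=2))
        # A real driver would submit a batch job here.

    for i, values in enumerate(itertools.product(*grid.values())):
        params = dict(zip(grid.keys(), values))
        workdir = Path(f"run_{i:03d}")
        workdir.mkdir(exist_ok=True)
        run_model(params, workdir)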
Cyberresources
Key NCSA Systems
Distributed Memory Clusters: Dell (3.2 GHz Xeon): 16 Tflops; Dell (3.6 GHz EM64T): 7 Tflops; IBM (1.3/1.5 GHz Itanium2): 10 Tflops
Shared Memory Clusters: IBM p690 (1.3 GHz Power4): 2 Tflops; SGI Altix (1.5 GHz Itanium2): 6 Tflops
Archival Storage System: SGI/Unitree (3 petabytes)
Visualization System: SGI Prism (1.6 GHz Itanium2 + GPUs)
(Peak-rate arithmetic for the Xeon cluster is sketched below.)
courtesy NCSA
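To connect the clock rates above to the quoted Tflops figures, a rough peak-rate sketch; the processor count and the 2 flops/cycle figure are illustrative assumptions, not numbers from the slide.

    # Rough peak-rate arithmetic: peak ≈ processors × clock × flops per cycle.
    clock_hz = 3.2e9        # 3.2 GHz Xeon, from the slide
    flops_per_cycle = 2     # assumed SSE2 double-precision rate per processor
    processors = 2500       # assumed count, chosen to illustrate the ~16 Tflops figure

    peak_tflops = processors * clock_hz * flops_per_cycle / 1e12
    print(f"estimated peak: {peak_tflops:.1f} Tflops")   # ~16.0 Tflops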
Cyberresources
Recent Scientific Studies at NCSA
Computational Biology
Weather Forecasting
Molecular Science
Earth Science
courtesy NCSA
Computing: One Size Doesn’t Fit All
courtesy SDSC
Algorithm Requirements by Science Area (columns: Multi-physics & multi-scale, Dense linear algebra, FFTs, Particle methods, AMR, Data parallelism, Irregular control flow):
Nanoscience:  X X X X X X
Combustion:   X X X X X
Fusion:       X X X X X X
Climate:      X X X X X
Astrophysics: X X X X X X X
Trade-offs: interconnect fabric, processing power, memory, I/O
[Diagram: processors (P), each with local memory (M), connected by an interconnect]
Computing: One Size Doesn’t Fit All
[Chart: applications plotted by data capability (increasing I/O and storage) vs. compute capability (increasing FLOPS). The SDSC Data Science Environment occupies the data-intensive region beyond campus, departmental, and desktop computing, alongside the traditional HEC environment. Example applications include QCD, protein folding, CPMD, NVO, EOL, CIPRes, SCEC visualization, data storage/preservation, extreme I/O, 3D + time simulation, out-of-core ENZO visualization, CFD, climate simulation, SCEC simulation, and ENZO simulation. The most I/O-intensive work can't be done on the Grid (I/O exceeds WAN capacity); less I/O-intensive work is distributed-I/O capable.]
courtesy SDSC
SDSC Resources
COMPUTE SYSTEMS
DataStar: 2,396 Power4+ processors in IBM p655 and p690 nodes; 4 TB total memory; up to 2 GB/s I/O to disk
TeraGrid Cluster: 512 Itanium2 processors; 1 TB total memory
Intimidata: early IBM BlueGene/L; 2,048 PowerPC processors; 128 I/O nodes
SCIENCE AND TECHNOLOGY STAFF, SOFTWARE, SERVICES
User Services; Application/Community Collaborations; Education and Training; SDSC Synthesis Center; Community software, toolkits, portals, codes
DATA ENVIRONMENT
1 PByte SAN; 6 PB StorageTek tape library; DB2, Oracle, MySQL; Storage Resource Broker; HPSS; 72-CPU Sun Fire 15K; 96-CPU IBM p690s
Support for community data collections and databases
Data management, mining, analysis, and preservation (a small Storage Resource Broker usage sketch follows below)
courtesy SDSC
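The Storage Resource Broker listed in the data environment is typically driven through its Scommands client tools. A minimal usage sketch from Python, assuming the Scommands (Sinit, Sput, Sls, Sexit) are installed and an SRB account is already configured; the file and collection names are placeholders.

    # Minimal sketch of staging a file into an SRB collection via the Scommands.
    # Assumes the SRB client tools are on PATH and ~/.srb is already configured;
    # "results.dat" and the collection path are made up for illustration.
    import subprocess

    def srb(*args: str) -> None:
        subprocess.run(list(args), check=True)

    srb("Sinit")                                   # authenticate to the SRB server
    srb("Sput", "results.dat", "project_data/")    # upload into a collection
    srb("Sls", "project_data/")                    # list the collection contents
    srb("Sexit")                                   # end the session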
Pittsburgh Supercomputing Center
“Big Ben” System
• Cray XT3, based on Sandia's Red Storm system; working with Cray, SNL, ORNL
• Approximately 2,000 compute nodes; 1 GB memory/node; 2 TB total memory
• 3D toroidal mesh; 10 Teraflops
• MPI latency: < 2 µs (neighbor), < 3.5 µs (full system)
• Bisection bandwidth: 2.0/2.9/2.7 TB/s (x, y, z); peak link bandwidth: 3.84 GB/s (a simple message-time sketch using these figures follows below)
• 400 sq. ft. floor space; < 400 kW power; now operational
• NSF award in Sept. 2004; in Oct. 2004 Cray announced the XT3, the commercial version of Red Storm
courtesy PSC
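To give a feel for what those latency and link-bandwidth figures mean for message passing, a back-of-the-envelope sketch using the standard latency-plus-bandwidth approximation; this is an illustrative model, not a PSC-provided benchmark.

    # Message time ≈ latency + size / bandwidth, using the Big Ben figures above.
    LATENCY_S = 2e-6           # ~2 microsecond nearest-neighbor MPI latency (from the slide)
    LINK_BW_BYTES_S = 3.84e9   # 3.84 GB/s peak link bandwidth (from the slide)

    def message_time(size_bytes: float) -> float:
        """Estimated one-way transfer time: latency plus size over bandwidth."""
        return LATENCY_S + size_bytes / LINK_BW_BYTES_S

    for size in (1e3, 1e6, 1e9):   # 1 KB, 1 MB, 1 GB
        print(f"{size:12.0f} bytes -> {message_time(size) * 1e6:12.1f} microseconds")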
I-Light, I-Light2, and the TeraGrid Network Resource
courtesy IU and PU
Purdue and Indiana Contributions to the TeraGrid
The Purdue Terrestrial Observatory portal to the TeraGrid will deliver GIS data from IU and real-time remote sensing data from the PTO to the national research community
Complementary large facilities, including large Linux clusters
Complementary special facilities, e.g., Purdue NanoHub and Indiana University MD-GRAPE systems
Indiana and Purdue Computer Scientists are developing new portal technology that makes use of the TeraGrid (GIG effort)
courtesy IU and PU
New Purdue RP resources
11-teraflops Community Cluster (being deployed)
1.3 PB tape robot
Non-dedicated resources (opportunistic), defining a model for sharing university resources with the nation
courtesy IU and PU
PTO: Distributed Datasets for Environmental Monitoring
courtesy IU and PU
TeraGrid as Integrative Technology
A likely key to ‘all’ foreseeable NSF HPC capability resources
Working with OSG and others, it will work even more broadly to encompass both capability and capacity resources
Anticipate requests for new RPs
Slogans: “Learn once, execute anywhere”; “The whole is more than the sum of its parts”
TeraGrid as a Set of Resources
TeraGrid gives each RP an opportunity to shine
Balance: the value of innovative/peculiar resources vs. the value of the slogans
Opportunistic resources, SNS, and the MD-GRAPE systems as interesting examples
Note the stress on the allocation process
2005 IRNC Awards
TransPAC2 (U.S. – Japan and beyond)
GLORIAD (U.S. – China – Russia – Korea)
TransLight/PacificWave (U.S. – Australia)
TransLight/StarLight (U.S. – Europe)
WHREN (U.S. – Latin America)
Example use: Open Science Grid, involving partners in the U.S. and Europe, mainly supporting high energy physics research based on the LHC
NSF Middleware Initiative (NMI)
Program began in 2001
Purpose: to design, develop, deploy, and support a set of reusable and expandable middleware functions that benefit many science and engineering applications in a networked environment
Program encourages open source development
Program funds mainly development, integration, deployment and support activities
Example NMI-funded Activities
GridShib – integrating Shibboleth campus attribute services with Grid security infrastructure mechanisms
UWisc Build and Test facility – community resource and framework for multi-platform build and test of grid software
Condor – mature distributed computing system installed on thousands of CPU “pools” and tens of thousands of CPUs (a minimal job-submission sketch follows below)
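A minimal sketch of handing a job to Condor from Python. The submit-description keywords (universe, executable, output, error, log, queue) and the condor_submit command are standard Condor usage; the executable name and filenames are placeholders, not from the slide.

    # Write a minimal Condor submit description and hand it to the local pool.
    # Assumes condor_submit is on PATH and ./analyze is the job executable
    # (both placeholders for illustration).
    import subprocess
    import textwrap

    submit_description = textwrap.dedent("""\
        universe   = vanilla
        executable = analyze
        output     = analyze.out
        error      = analyze.err
        log        = analyze.log
        queue
    """)

    with open("analyze.sub", "w") as f:
        f.write(submit_description)

    # Hands the job to the pool's scheduler; Condor matches it to an idle machine.
    subprocess.run(["condor_submit", "analyze.sub"], check=True)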
Organizational Changes
Office of Cyberinfrastructure: formed on 22 July 2005; had been a division within CISE
Cyberinfrastructure Council: chair is the NSF Director; members are the ADs
Vision Document started; HPC Strategy chapter drafted
Advisory Committee for Cyberinfrastructure
Cyberinfrastructure Components
Education & Training
Data Tools & Services
Collaboration & Communication Tools & Services
High Performance Computing Tools & Services
Vision Document Outline
Call to Action
Strategic Plans for …
High Performance Computing; Data; Collaboration and Communication; Education and Workforce Development
Complete document by 31 March 2006
Strategic Plan for High Performance Computing
Covers the 2006-2010 period
Enable petascale science and engineering by creating a world-class HPC environment
Science-driven HPC Systems Architectures
Portable Scalable Applications Software
Supporting Software
Inter-agency synergies will be sought
Coming HPC Solicitation
There will be a solicitation issued this month
One or more HPC systems
One or more RPs
Rôle of TeraGrid
Process driven by science user needs
Confusion about capacity/capability
Workshops: Arlington, 9 September; Lisle, 20-21 September
HPC Platforms (2000-2005), with the ETF as integrating framework
Categories: I/O-intensive platforms, tightly coupled platforms, commodity platforms
Systems: IBM DataStar 10.4 TF; SGI SMP system 6.6 TF; Dell Xeon Cluster 16.4 TF; Marvel 0.3 TF; TCS LeMieux 6 TF; IBM Itanium Cluster 8 TF; IBM Itanium Cluster 3.1 TF; Red Storm 10 TF; Cray-Dell Xeon Cluster 6.4 TF; IBM Cluster 0.2 TF; Purdue Cluster 1.7 TF; Condor Pool 0.6 TF
Cyberinfrastructure Vision
NSF will lead the development and support of a comprehensive cyberinfrastructure essential to 21st century advances
in science and engineering.
Internet2 Universities: 206 University Members, May 2005
Science Communities and Outreach
• Communities
  • CERN's Large Hadron Collider experiments
  • Physicists working in HEP and similarly data-intensive scientific disciplines
  • National collaborators and those across the digital divide in disadvantaged countries
• Scope
  • Interoperation between the LHC Data Grid Hierarchy and ETF
  • Create and deploy scientific data and services Grid portals
  • Bring the power of ETF to bear on LHC physics analysis: help discover the Higgs boson!
• Partners
  • Caltech
  • University of Florida
  • Open Science Grid and Grid3
  • Fermilab
  • DOE PPDG
  • CERN
  • NSF GriPhyN and iVDGL
  • EU LCG and EGEE
  • Brazil (UERJ, …)
  • Pakistan (NUST, …)
  • Korea (KAIST, …)
LHC Data Distribution Model