National Computational Science Alliance
Supercomputing: Directions in Technology, Architecture and Applications
• Keynote Talk to Supercomputer’98 in Mannheim, Germany
• June 18, 1998
Abstract
"By using the results of the TOP500 over the last five years, one can easily trace the complete transformation of the supercomputer industry. In 1993 none of the TOP500 systems was made by a broadly based, market-driven company, while today over three-quarters of the TOP500 are made by SGI, IBM, HP, or Sun. Similarly, vector architectures have lost market share to microprocessor-based SMPs. We now see a strong move to replace many MPPs and SMPs with the new architecture of Distributed Shared Memory (DSM), such as the SGI Origin or HP SPP series. A key trend is the move toward clusters of DSMs instead of monolithic MPPs. The next major change will be the emergence of Intel processors replacing RISC processors, particularly the Intel Merced processor, which should become dominant shortly after 2000. A major battle is shaping up between UNIX and Microsoft's NT operating systems, particularly at the lower end of the TOP500. Finally, each new architecture brings a new set of applications within reach. I will discuss how DSM will enable the dynamic load balancing needed to support the multi-scale problems that teraflop machines will let us tackle."
NCSA is the Leading Edge Site for the National Computational Science Alliance
www.ncsa.uiuc.edu
Scientific Applications Continue to Require Exponential Growth in Capacity
[Chart: machine requirement in FLOPS (10^8 to 10^20) vs. memory in bytes (10^8 to 10^14) for QCD, atomic/diatomic interaction, molecular dynamics for biological molecules, computational cosmology, and turbulent convection in stars; points run from the 1995 NSF capability to the 2000 NSF leading edge, with projections for ASCI in 2004, NSF in 2004, and a 100-year climate model computed in hours. Legend: recent computations and next-step projections by NSF Grand Challenge research teams, and long-range projections from a recent applications workshop.]
From Bob Voigt, NSF
The Promise of the Teraflop - From Thunderstorm to National-Scale Simulation
Simulation by Wilhelmson et al.; figure from Supercomputing and the Transformation of Science, Kaufmann and Smarr, Freeman, 1993
Accelerated Strategic Computing Initiative is Coupling DOE Defense Labs to Universities
• Access to ASCI Leading Edge Supercomputers
• Academic Strategic Alliances Program
• Data and Visualization Corridors
http://www.llnl.gov/asci-alliances/centers.html
Comparison of the DoE ASCI and the NSF PACI Origin Array Scale Through FY99
www.lanl.gov/projects/asci/bluemtn/Hardware/schedule.html
Los Alamos Origin System FY99: 5,000-6,000 processors
NCSA Proposed System FY99: 6x128 and 4x64 = 1,024 processors
Future Upgrade Under Negotiation with NSF
NCSA Combines Shared Memory Programming with Massive Parallelism
CM-5
CM-2
The Exponential Growth of NCSA’s SGI Shared Memory Supercomputers
[Chart: SGI processors at NCSA (log scale, 1 to 10,000), Jan 1994 through Jan 2001, climbing through the Challenge, Power Challenge, Origin, and SN1 generations. Doubling every nine months!]
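As a quick check on what "doubling every nine months" implies, a short calculation (illustrative arithmetic only, not the actual installation data): a nine-month doubling time means roughly a 2.5x annual growth factor, and about 645x over the seven years the chart spans.

```python
# Illustrative arithmetic only: what a nine-month doubling time implies.
def growth_factor(months, doubling_months=9):
    """Growth multiple after `months` with a fixed doubling time."""
    return 2 ** (months / doubling_months)

annual = growth_factor(12)      # one year of growth
span = growth_factor(7 * 12)    # Jan 1994 through Jan 2001
print(round(annual, 2))         # 2.52
print(round(span))              # 645
```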
TOP500 Systems by Vendor
TOP500 Reports: http://www.netlib.org/benchmark/top500.html
[Chart: number of TOP500 systems, Jun 1993 through Jun 1998 (stacked to 500), by vendor: CRI, SGI, IBM, Convex, HP, Sun, TMC, Intel, DEC, Japanese vendors, and others.]
Why NCSA Switched From Vector to RISC Processors
[Chart: NCSA's 1992 supercomputing community on the Cray Y-MP4/64, March 1992 through February 1993; number of users (0 to 150) vs. average per-user MFLOPS (20 to 300), for users with more than 0.5 CPU-hour. The average speed of 70 MFLOPS sat far below the marked Y-MP1 peak; the MIPS R8000 peak speed is marked for comparison.]
Replacement of Shared Memory Vector Supercomputers by Microprocessor SMPs
TOP500 Reports: http://www.netlib.org/benchmark/top500.html
[Chart: TOP500 installed supercomputers, Jun 1993 through Jun 1998 (0 to 500), by architecture: MPP, SMP/DSM, and PVP.]
Top500 Shared Memory Systems
Vector Processors vs. Microprocessors
TOP500 Reports: http://www.netlib.org/benchmark/top500.html
[Charts: PVP systems vs. SMP + DSM systems in the TOP500, Jun 1993 through Jun 1998 (0 to 300 each), broken out by region (USA, Japan, Europe).]
Simulation of the Evolution of the Universe on a Massively Parallel Supercomputer
12 Billion Light Years 4 Billion Light Years
Virgo Project - Evolving a Billion Pieces of Cold Dark Matter in a Hubble Volume - 688-Processor CRAY T3E at the Garching Computing Centre of the Max Planck Society
http://www.mpg.de/universe.htm
Limitations of Uniform Grids for Complex Scientific and Engineering Problems
Source: Greg Bryan, Mike Norman, NCSA
512x512x512 Run on 512-node CM-5
Gravitation causes a continuous increase in density until there is a large mass in a single grid zone.
Use of Shared Memory Adaptive Grids To Achieve Dynamic Load Balancing
Source: Greg Bryan, Mike Norman, John Shalf, NCSA
64x64x64 Run with Seven Levels of Adaptation on SGI Power Challenge, Locally Equivalent to 8192x8192x8192 Resolution
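The "locally equivalent" figure follows from the refinement arithmetic: a 64^3 base grid with seven levels of factor-of-two refinement matches an 8192^3 uniform grid in its most refined zones, at a tiny fraction of the memory. A minimal check (illustrative only, not the AMR solver itself):

```python
# Refinement arithmetic for the slide's example: a 64^3 base grid with
# seven levels of 2x adaptive refinement. The finest zones resolve what a
# uniform 8192^3 grid would, without storing 8192^3 zones everywhere.
def effective_resolution(base, levels, ratio=2):
    return base * ratio ** levels

fine = effective_resolution(64, 7)
print(fine)                  # 8192
# A uniform grid at that resolution needs 128^3 = 2,097,152 times the
# zones of the base grid; AMR spends that only where gravity demands it.
print(fine ** 3 // 64 ** 3)  # 2097152
```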
Extreme and Large PIs Dominate Usage of NCSA Origin
[Chart: CPU-hours burned per PI (log scale, 1 to 1,000,000) vs. PI rank (1 to ~181), January through April 1998, banded from 1-10 up to 100k-1M CPU-hours.]
Disciplines Using the NCSA Origin 2000: CPU-Hours in March 1998
Particle Physics
Chemistry
Materials Sciences
Engineering CFD
Astronomy
Physics
Industry
Molecular Biology
Other
A Variety of Discipline Codes - Single-Processor Performance, Origin vs. T3E
[Chart: single-processor MFLOPS (0 to 160) on the Origin vs. the T3E for QMC, RIEMANN, Laplace, QCD, PPM, PIMC, and ZEUS.]
Solving 2D Navier-Stokes Kernel - Performance of Scalable Systems
[Chart: gigaflops (0 to 7) vs. processors (0 to 60) for Origin-DSM, Origin-MPI, NT-MPI, SP2-MPI, T3E-MPI, and SPP2000-DSM.]
Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Preconditioner (2D, 1024x1024)
Source: Danesh Tafti, NCSA
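The kernel above is a preconditioned conjugate gradient solve on a 2D 1024x1024 grid. As a sketch of the core iteration only (plain CG on a small 1-D Laplacian in pure Python, with no preconditioner and hypothetical helper names, not the NCSA code):

```python
# Illustrative sketch: unpreconditioned conjugate gradient on a small
# 1-D Laplacian system. The real kernel adds a multi-level additive
# Schwarz Richardson preconditioner and runs on a 2-D 1024x1024 grid.
def laplacian_matvec(x):
    """y = A x for the 1-D Laplacian (tridiagonal -1, 2, -1)."""
    n = len(x)
    return [2 * x[i]
            - (x[i - 1] if i > 0 else 0)
            - (x[i + 1] if i < n - 1 else 0)
            for i in range(n)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(b, tol=1e-10, max_iter=200):
    n = len(b)
    x = [0.0] * n
    r = b[:]              # residual r = b - A*0
    p = r[:]              # initial search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = laplacian_matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Solve A x = b and verify the residual is tiny.
b = [1.0] * 8
x = conjugate_gradient(b)
residual = max(abs(ax - bi) for ax, bi in zip(laplacian_matvec(x), b))
print(residual < 1e-6)  # True
```

The matrix-vector product is the only operation that touches neighboring grid zones, which is why CG parallelizes well under both shared memory and MPI in the charts above.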
Alliance PACS Origin2000 Repository
http://scv.bu.edu/SCV/Origin2000/
Kadin Tseng, BU; Gary Jensen, NCSA; Chuck Swanson, SGI. John Connolly, U Kentucky, is Developing a Repository for the HP Exemplar
High-End Architecture 2000 - Scalable Clusters of Shared Memory Modules
• NEC SX-5
  – 32 x 16 vector processor SMP
  – 512 processors
  – 8 gigaflop peak vector processor
• IBM SP
  – 256 x 16 RISC processor SMP
  – 4096 processors
  – 1 gigaflop peak RISC processor
• SGI Origin Follow-on
  – 32 x 128 RISC processor DSM
  – 4096 processors
  – 1 gigaflop peak EPIC processor
Each is 4 Teraflops Peak
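The three peak figures line up as modules x processors-per-module x per-processor peak. A quick check using only the slide's numbers:

```python
# Arithmetic check of the slide's peak numbers:
# modules x processors-per-module x per-processor peak gigaflops.
systems = {
    "NEC SX-5":             (32, 16, 8.0),   # 512 vector processors
    "IBM SP":               (256, 16, 1.0),  # 4096 RISC processors
    "SGI Origin follow-on": (32, 128, 1.0),  # 4096 processors
}
for name, (modules, per_module, gflops) in systems.items():
    procs = modules * per_module
    peak_tf = procs * gflops / 1000.0
    print(f"{name}: {procs} processors, {peak_tf:.3f} TF peak")
```

All three come out at 4096 gigaflops, i.e. roughly 4 teraflops peak each.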
Emerging Portable Computing Standards
• HPF
• MPI
• OpenMP
• Hybrids of MPI and OpenMP
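A hybrid MPI/OpenMP code distributes the domain across shared-memory modules with MPI and threads within each module with OpenMP. The real thing needs an MPI library and compiler directives; this pure-Python sketch only imitates the two-level decomposition (one block per "rank", a thread pool within each block), with all names hypothetical:

```python
# Hypothetical sketch of the two-level hybrid decomposition. MPI would
# split the domain across DSM modules; OpenMP would split each module's
# block across its processors. Both levels are imitated here with plain
# slicing and a thread pool; the reduction stands in for MPI_Reduce.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(data):
    """Work done by one 'OpenMP thread': sum of squares over its slice."""
    return sum(x * x for x in data)

def module_sum(block, threads=4):
    """Work done inside one 'MPI rank': thread-parallel over its block."""
    step = (len(block) + threads - 1) // threads
    slices = [block[i:i + step] for i in range(0, len(block), step)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(partial_sum, slices))

def hybrid_sum(domain, modules=4):
    """Outer 'MPI' level: one contiguous block per module, then reduce."""
    step = (len(domain) + modules - 1) // modules
    blocks = [domain[i:i + step] for i in range(0, len(domain), step)]
    return sum(module_sum(b) for b in blocks)

data = list(range(1000))
print(hybrid_sum(data) == sum(x * x for x in data))  # True
```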
Basket of Applications: Average Performance as a Percentage of Linpack Performance
[Chart: Linpack vs. applications-average MFLOPS (0 to 1800) for the T90, C90, SPP-2000, SP2-160, Origin 195, and PCA; the applications average runs from 14% to 33% of Linpack (values shown: 22%, 25%, 14%, 19%, 33%, 26%).]
Applications codes: CFD, biomolecular, chemistry, materials, QCD
Harnessing Distributed UNIX Workstations - University of Wisconsin Condor Pool
Condor Cycles
CondorView, Courtesy of Miron Livny and Todd Tannenbaum (UWisc)
NT Workstation Shipments Rapidly Surpassing UNIX
[Chart: workstations shipped per year (millions, 0 to 1.4), 1995 through 1997, UNIX vs. NT.]
Source: IDC, Wall Street Journal, 3/6/98
First Scaling Testing of ZEUS-MP on CRAY T3E and Origin vs. NT Supercluster
“Supercomputer performance at mail-order prices” -- Jim Gray, Microsoft
access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
Zeus-MP Hydro Code Running Under MPI
• Alliance Cosmology Team
• Andrew Chien, UIUC
• Rob Pennington, NCSA
[Charts: single-processor speed on ZEUS-MP (MFLOPS, 0 to 140) for the T3E, Origin, and NT; and aggregate GFLOPS (0 to 8) vs. processors (0 to 200) for the T3E, Origin, and NT/Intel.]
NCSA NT Supercluster Solving Navier-Stokes Kernel
Preconditioned Conjugate Gradient Method With Multi-level Additive Schwarz Richardson Pre-conditioner
(2D 1024x1024)
Single-Processor Performance: MIPS R10k 117 MFLOPS; Intel Pentium II 80 MFLOPS
Danesh Tafti, Rob Pennington, Andrew Chien, NCSA
[Charts: speedup (0 to 60) and gigaflops (0 to 7) vs. processors (0 to 60) for NT MPI, Origin MPI, and Origin shared memory, against perfect scaling.]
Near Perfect Scaling of Cactus - 3D Dynamic Solver for the Einstein GR Equations
[Chart: scaling (0 to 120) vs. processors (0 to 120) for the Origin and the NT Supercluster.]
Ratio of GFLOPS: Origin = 2.5x NT Supercluster
Danesh Tafti, Rob Pennington, Andrew Chien, NCSA
Cactus was Developed by Paul Walker, MPI-Potsdam, UIUC, NCSA
NCSA Symbio - A Distributed Object Framework Bringing Scalable Computing to NT Desktops
http://access.ncsa.uiuc.edu/Features/Symbio/Symbio.html
• Parallel Computing on NT Clusters
  – Briand Sanderson, NCSA
  – Microsoft Co-Funds Development
• Features
  – Based on Microsoft DCOM
  – Batch or Interactive Modes
  – Application Development Wizards
• Current Status & Future Plans
  – Symbio Developer Preview 2 Released
  – Princeton University Testbed