Cactus 4.0
Cactus Computational Toolkit and Distributed Computing
• Solving Einstein’s Equations
  – Impact on computation
• Large collaborations essential, and difficult!
  – The code becomes the collaborating tool
• Cactus, a new community code for 3D GR astrophysics
  – Toolkit for many PDE systems
  – Suite of solvers for the Einstein system
• Metacomputing for the general user
  – Distributed computing experiments with Cactus and Globus
Gabrielle Allen, Ed Seidel
Albert-Einstein-Institut, MPI für Gravitationsphysik
Einstein’s Equations and Gravitational Waves
• Einstein’s General Relativity
  – Fundamental theory of physics (gravity)
  – Black holes, neutron stars, gravitational waves, ...
  – Among the most complex equations in physics
• Dozens of coupled, nonlinear hyperbolic-elliptic equations with thousands of terms
• New field: Gravitational Wave Astronomy
  – Will yield new information about the Universe
  – What are gravitational waves? “Ripples in the curvature of spacetime”
• A last major test of Einstein’s theory: do they exist?
  – Eddington: “Gravitational waves propagate at the speed of thought”
  – 1993 Nobel Prize: Hulse-Taylor pulsar (indirect evidence)
[Figure: strain signal s(t); h = Δs/s ~ 10⁻²² ! Colliding BH’s and NS’s ...]
Detecting Gravitational Waves
• LIGO, VIRGO (Pisa), GEO600, … $1 billion worldwide
We need results from numerical relativity to:
• Detect them…pattern matching against numerical templates to enhance signal/noise ratio
• Understand them…just what are the waves telling us?
[Photo: LIGO Hanford, Washington site; 4 km arms]
Merger Waveform Must Be Found Numerically
Teraflop computation, AMR, elliptic-hyperbolic, ???
Axisymmetric Black Hole Simulations: Cray C90
Evolution of Highly Distorted Black Hole
Collision of two Black Holes (“Misner Data”)
Computational Needs for 3D Numerical Relativity
• Finite difference codes
  – ~10⁴ flops/zone/time step
  – ~100 3D arrays
• Currently use 250³ zones
  – ~15 GBytes
  – ~15 TFlops/time step
• Need 1000³ zones
  – ~1000 GBytes
  – ~1000 TFlops/time step
• Need TFlop, TByte machine
• Need Parallel AMR, I/O
• Initial data: 4 coupled nonlinear elliptic equations
• Time step update
  – explicit hyperbolic update
  – also solve elliptics
[Diagram: evolution from t=0 to t=100]
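The quoted memory figures follow from quick arithmetic; a back-of-envelope sketch (the 8-byte double-precision assumption is mine):

```python
# Back-of-envelope memory estimate for a 3D finite-difference code,
# using the slide's figures: ~100 3D grid arrays, and assuming 8-byte
# (double precision) values.

def memory_gbytes(zones_per_dim, n_arrays=100, bytes_per_value=8):
    zones = zones_per_dim ** 3
    return zones * n_arrays * bytes_per_value / 1e9

print(f"250^3 grid:  {memory_gbytes(250):.1f} GBytes")   # slide quotes ~15 GBytes
print(f"1000^3 grid: {memory_gbytes(1000):.0f} GBytes")  # slide quotes ~1000 GBytes
```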
Mix of Varied Technologies and Expertise!
• Scientific/Engineering:– formulation of equations, equation of state, astrophysics, hydrodynamics ...
• Numerical Algorithms:– Finite differences? Finite elements? Structured meshes?
– Hyperbolic equations: explicit vs implicit, shock treatments, dozens of methods (and presently nothing is fully satisfactory!)
– Elliptic equations: multigrid, Krylov subspace, spectral, preconditioners (elliptics currently require most of the time…)
– Mesh Refinement?
• Computer Science:– Parallelism (HPF, MPI, PVM, ???)
– Architecture Efficiency (MPP, DSM, Vector, NOW, ???)
– I/O Bottlenecks (generate gigabytes per simulation, checkpointing…)
– Visualization of all that comes out!
• Clearly need huge teams, with huge expertise base to attack such problems…
• … in fact need collections of communities
• But how can they work together effectively?
• Need a code environment that encourages this…
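As a toy illustration of the explicit hyperbolic updates mentioned above (the simplest of the “dozens of methods”), here is a leapfrog finite-difference step for the 1D scalar wave equation; the Einstein system is vastly more complicated, but the update pattern is the same:

```python
import numpy as np

# Second-order leapfrog update for the 1D wave equation u_tt = c^2 u_xx,
# a toy stand-in for the far more complex Einstein evolution equations.

def evolve(u_prev, u_curr, c, dt, dx):
    """One explicit leapfrog step; returns u at the next time level."""
    lam2 = (c * dt / dx) ** 2          # CFL factor squared (need c*dt/dx <= 1)
    u_next = np.empty_like(u_curr)
    u_next[1:-1] = (2 * u_curr[1:-1] - u_prev[1:-1]
                    + lam2 * (u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]))
    u_next[0] = u_next[-1] = 0.0       # fixed (Dirichlet) boundaries
    return u_next

# Gaussian pulse initial data with zero initial velocity
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
dt = 0.5 * dx                          # CFL-stable for c = 1
u0 = np.exp(-200 * (x - 0.5) ** 2)
u1 = u0.copy()
for _ in range(100):
    u0, u1 = u1, evolve(u0, u1, 1.0, dt, dx)
```

With zero initial velocity the pulse splits into two half-amplitude pulses travelling in opposite directions; an unstable scheme (c*dt/dx > 1) would blow up instead.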
NSF Black Hole Grand Challenge Alliance
• University of Texas (Matzner, Browne)
• NCSA/Illinois/AEI (Seidel, Saylor, Smarr, Shapiro, Saied)
• North Carolina (Evans, York)
• Syracuse (G. Fox)
• Cornell (Teukolsky)
• Pittsburgh (Winicour)
• Penn State (Laguna, Finn)
Develop Code To Solve Gμν = 0
NASA Neutron Star Grand Challenge
• NCSA/Illinois/AEI (Saylor, Seidel, Swesty, Norman)
• Argonne (Foster)
• Washington U (Suen)
• Livermore (Ashby)
• Stony Brook (Lattimer)
“A Multipurpose Scalable Code for Relativistic Astrophysics”
Develop Code To Solve Gμν = 8πTμν
What we learn from Grand Challenges
• Successful, but also problematic…
  – No existing infrastructure to support collaborative HPC
– Many scientists are Fortran programmers, and NOT computer scientists
– Many sociological issues of large collaborations and different cultures
– Many language barriers …
… Applied mathematicians, computational scientists, physicists have very different concepts and vocabularies…
– Code fragments, styles, routines often clash
– Successfully merged code (after years) often impossible to transplant into more modern infrastructure (e.g., add AMR or switch to MPI…)
• Many serious problems … this is what the Cactus Code seeks to address
What Is Cactus?
• Cactus was developed as a general, computational framework for solving PDEs (originally in numerical relativity and astrophysics)
• Modular … for easy development, maintenance and collaborations. Users supply “thorns” which plug into a compact core “flesh”
• Configurable … thorns register parameter, variable and scheduling information with the “runtime function registry” (RFR). Object-oriented inspired features
• Scientist friendly … thorns written in F77, F90, C, C++
• Accessible parallelism … the driver layer (a thorn) is hidden from physics thorns by a fixed flesh interface
What Is Cactus?
• Standard interfaces … interpolation, reduction, IO, coordinates. Actual routines supplied by thorns
• Portable … Cray T3E, Origin, NT/Win9*, Linux, O2, Dec Alpha, Exemplar, SP2
• Free and open community code … distributed under the GNU GPL. Uses as much free software as possible
• Up-to-date … new computational developments and/or thorns immediately available to users (optimisations, AMR, Globus, IO)
• Collaborative … the thorn structure makes it possible for a large number of people to use and develop toolkits … the code becomes the collaborating tool
• New version … Cactus beta-4.0 released 30th August
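The “runtime function registry” idea can be sketched in a few lines (an illustration of the concept only, not the actual Cactus API; every name here is invented):

```python
# Minimal sketch of a runtime function registry (RFR): thorns register
# routines at named schedule points, and the flesh walks the registry.
# All names are illustrative, NOT the real Cactus interface.

registry = {}   # schedule point -> list of (thorn name, callable)

def rfr_register(point, thorn, fn):
    """A thorn registers a routine to run at a given schedule point."""
    registry.setdefault(point, []).append((thorn, fn))

def rfr_run(point):
    """Flesh-side driver: call every routine registered at this point."""
    for thorn, fn in registry.get(point, []):
        fn()

# A "thorn" supplies a physics routine and registers it with the flesh.
log = []
def wavetoy_evolve():
    log.append("WaveToy: evolve one step")

rfr_register("EVOL", "WaveToy", wavetoy_evolve)
rfr_run("EVOL")   # the flesh drives whatever thorns registered themselves
```

The point of the design is that the flesh never names any thorn directly, so thorns can be added, swapped or removed without touching the core.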
Core Thorn Arrangements Provide Tools
• Parallel drivers (presently MPI-based)
• (Mesh refinement schemes: Nested Boxes, DAGH, HLL)
• Parallel I/O for output, file reading, checkpointing (HDF5, FlexIO, Panda, etc.)
• Elliptic solvers (Petsc, multigrid, SOR, etc.)
• Interpolators
• Visualization tools (IsoSurfacer)
• Coordinates and boundary conditions
• Many relativity thorns
• Groups develop their own thorn arrangements to add to these
Cactus 4.0
[Diagram: central FLESH (parameters, variables, scheduling) surrounded by thorns: PUGH, CartGrid3D, Boundary, GrACE, IOFlexIO, IOHDF5, WaveToyF77, WaveToyF90]
Current Status
• It works: many people, with different backgrounds, different personalities, on different continents, working together effectively on problems of common interest.
• Dozens of physics/astrophysics and computational modules developed and shared by “seed” community
• Connected modules work together, largely without collisions
• Test suites used to ensure integrity of both code and physics
• How to get it …
• Workshop 27 Sept - 1 Oct, NCSA: http://www.ncsa.uiuc.edu/SCD/Training/
Movie from Werner Benger, ZIB
Near Perfect Scaling
[Plot: scaling vs. number of processors (0 to 120) for Origin and NT SC]
Cactus Scaling on T3E-600
[Plot: total Mflop/s vs. number of processors, log-log scale (1 to 1000 processors, 100 to 100000 Mflop/s); measured points: 192, 760, 5980, 47900 Mflop/s]
• Excellent scaling on many architectures
  – Origin up to 128 processors
– T3E up to 1024
– NCSA NT cluster up to 128 processors
• Achieved 142 Gflop/s on a 1024-node T3E-1200 (benchmarked for the NASA NS Grand Challenge)
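For context, the per-processor throughput implied by the quoted T3E-1200 benchmark:

```python
# Per-processor throughput implied by the benchmark quoted above:
# 142 Gflop/s aggregate on 1024 processors of a T3E-1200.
total_gflops = 142.0
processors = 1024
per_proc_mflops = total_gflops * 1000 / processors
print(f"~{per_proc_mflops:.0f} Mflop/s per processor")
```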
Many Developers: Physics & Computational Science
[Diagram: contributing groups and technologies: DAGH/AMR (UTexas), Panda I/O (UIUC CS), Globus (Foster), Petsc (Argonne), HDF5, FlexIO, AEI, NASA, NCSA, SGI, Valencia, Wash. U, ZIB]
Metacomputing: harnessing power when and where it is needed
• Easy access to available resources
– Find resources for interactive use: Garching? ZIB? NCSA? SDSC?
– Do I have an account there? What’s the password?
– How do I get an executable there?
– Where to store data?
– How do I launch the simulation? What are the local queue structures and OS idiosyncrasies?
Metacomputing: harnessing power when and where it is needed
• Access to more resources
– Einstein equations require extreme memory and speed
– Largest supercomputers too small!
– Networks very fast!
  • DFN gigabit testbed: 622 Mbit/s Potsdam-Berlin-Garching, connecting multiple supercomputers
  • Gigabit networking to the US possible
  • Connect workstations to make a supercomputer
Metacomputing: harnessing power when and where it is needed
• Acquire resources dynamically during the simulation!
  – Need more resolution in one area
• Interactive visualization, monitoring and steering from anywhere
  – Watch the simulation as it progresses … live visualisation
  – Limited bandwidth: compute visualisation online with the simulation
  – High bandwidth: ship data to be visualised locally
  – Interactive steering
    • Are parameters screwed up? Very complex?
    • Is memory running low? AMR! What to do? Refine selectively, acquire additional resources via Globus, or delete unnecessary grids?
Metacomputing: harnessing power when and where it is needed
• Call up an expert colleague … let her watch it too
  – Sharing data space
  – Remote collaboration tools
  – Visualization server: all privileged users can log in and check status/adjust if necessary
Globus: Can provide many such services for Cactus
• Information (Metacomputing Directory Service: MDS) – Uniform access to structure/state information:
Where can I run Cactus today?
• Scheduling (Globus Resource Access Manager: GRAM)– Low-level scheduler API:
How do I schedule Cactus to run at NCSA?
• Communications (Nexus)– Multimethod communication + QoS management:
How do I connect Garching and ZIB together for a big run?
• Security (Globus Security Infrastructure)– Single sign-on, key management:
How do I get authority at SDSC for Cactus?
Globus: Can provide many such services for Cactus
• Health and status (Heartbeat monitor): Is my Cactus run dead?
• Remote file access (Global Access to Secondary Storage: GASS):
How do I manage my output, and get executable to Argonne?
Colliding Black Holes and MetaComputing: German Project supported by DFN-Verein
• Solving Einstein’s Equations
• Developing Techniques to Exploit High Speed Networks
• Remote Visualization
• Distributed Computing Across OC-12 Networks between AEI (Potsdam), Konrad-Zuse-Institut (Berlin), and RZG (Garching-bei-München)
AEI
Distributing Spacetime: SC’97 Intercontinental Metacomputing at AEI/Argonne/Garching/NCSA
[Diagram: Immersadesk connected to a 512-node T3E]
Metacomputing the Einstein Equations:Connecting T3E’s in Berlin, Garching, San Diego
Collaborators
• A distributed astrophysical simulation involving the following institutions:
  – Albert Einstein Institute (Potsdam, Germany)
  – Washington University (St. Louis, MO)
  – Argonne National Laboratory (Chicago, IL)
  – NLANR Distributed Applications Team (Champaign, IL)
• The following supercomputer centers:
  – San Diego Supercomputer Center (268-proc. T3E)
  – Konrad-Zuse-Zentrum in Berlin (232-proc. T3E)
  – Max-Planck-Institute in Garching (768-proc. T3E)
The Grand Plan
• Distribute simulation across 128 PEs of the SDSC T3E and 128 PEs of the Konrad-Zuse-Zentrum T3E in Berlin, using Globus
• Visualize isosurface data in real time on an Immersadesk in Orlando
• Transatlantic bandwidth from an OC-3 ATM network
SC98 Neutron Star Collision
Movie from Werner Benger, ZIB
Cactus scaling across PEs (Jason Novotny, NLANR)
Analysis of metacomputing experiments
• It works! (That’s the main thing we wanted at SC98…)
• Cactus not optimized for metacomputing: messages too small, lower MPI bandwidth, could be better:
  – ANL-NCSA
    • Measured bandwidth: 17 Kbit/s (small messages) to 25 Mbit/s (large)
    • Latency: 4 ms
  – Munich-Berlin
    • Measured bandwidth: 1.5 Kbit/s (small) to 4.2 Mbit/s (large)
    • Latency: 42.5 ms
  – Within a single machine: an order of magnitude better
• Bottom line:
  – Expect to be able to improve performance significantly
  – Can run much larger jobs on multiple machines
  – Start using Globus routinely for job submission
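The small-message penalty above is the standard latency-plus-bandwidth effect; a sketch using the Munich-Berlin figures quoted above (42.5 ms latency, ~4.2 Mbit/s large-message bandwidth):

```python
# Transfer-time model: t(size) = latency + size / bandwidth.
# Below the "half-performance" message size (n_1/2 = latency * bandwidth),
# latency dominates and effective bandwidth collapses, which is why
# Cactus's many small messages hurt on wide-area links.

def transfer_time(size_bits, latency_s, bandwidth_bps):
    return latency_s + size_bits / bandwidth_bps

def effective_bandwidth(size_bits, latency_s, bandwidth_bps):
    return size_bits / transfer_time(size_bits, latency_s, bandwidth_bps)

latency = 42.5e-3      # seconds (Munich-Berlin, quoted above)
bandwidth = 4.2e6      # bit/s, large-message asymptote (quoted above)

half_perf_bits = latency * bandwidth   # message size giving half the peak bandwidth
print(f"half-performance message size ~ {half_perf_bits / 8 / 1024:.0f} KBytes")
```

Any message much smaller than this (tens of kilobytes on that link) spends most of its time in latency, matching the tiny small-message bandwidths measured.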
The Dream: not far away...
[Diagram: a budding Einstein in Berlin, on an “Ultra 3000: whatever, wherever”, submits BH initial data through the Globus Resource Manager to the Cactus/Einstein solver (Physics Module 1; MPI, MG, AMR, DAGH, Viz, I/O, ...) running on the Garching T3E and the NCSA Origin 2000 array, with mass storage]
Cactus 4.0 Credits
• Cactus flesh and design – Gabrielle Allen
– Tom Goodale
– Joan Massó
– Paul Walker
• Computational toolkit – Flesh authors
– Gerd Lanferman
– Thomas Radke
– John Shalf
• Development toolkit – Bernd Bruegmann
– Manish Parashar
– Many others
• Relativity and astrophysics – Flesh authors
– Miguel Alcubierre
– Toni Arbona
– Carles Bona
– Steve Brandt
– Bernd Bruegmann
– Thomas Dramlitsch
– Ed Evans
– Carsten Gundlach
– Gerd Lanferman
– Lars Nerger
– Mark Miller
– Hisaaki Shinkai
– Ryoji Takahashi
– Malcolm Tobias
• Vision and Motivation – Bernard Schutz
– Ed Seidel "the Evangelist"
– Wai-Mo Suen