
Cactus and Grid Computing


Page 1: Cactus and Grid Computing

Albert-Einstein-Institut www.aei-potsdam.mpg.de

Cactus and Grid Computing
The Cactus Team

Here today: Ed Seidel, Gabrielle Allen
Albert Einstein Institute

[email protected]

• Cactus, a new community simulation code framework
  – Toolkit for any PDE systems, ray tracing, etc.
  – Suite of solvers for Einstein and astrophysics systems (CarlK: but Cactus is not an astro app)
• Grid Computing, remote collaborative tools: what a scientist really wants and needs (but may not yet realize…)


Page 2: Cactus and Grid Computing


Computational Needs for 3D Numerical Relativity:
Can't fulfill them now, but about to change...

• Initial data: 4 coupled nonlinear elliptic equations
• Evolution
  – hyperbolic evolution
  – coupled with elliptic equations

[Figure: evolution snapshots at t=0 and t=100]

Multi-TFlop, TByte machine essential

• Explicit finite difference codes
  – ~10^4 Flops/zone/time step
  – ~100 3D arrays
• Require 1000^3 zones or more
  – ~1000 GBytes
  – Double resolution: 8x memory, 16x Flops
• Parallel AMR, I/O essential
• A code that can do this could be useful to other projects (we said this in all our grant proposals)!
  – Last few years devoted to making this useful across disciplines…
  – All tools used for these complex simulations available for other branches of science, engineering…
• The scientist/engineer wants to know only that!
  – But what algorithm? Architecture? Parallelism? Etc...

Page 3: Cactus and Grid Computing


Cactus
New concept in community-developed simulation code infrastructure

• Developed in response to the needs of big community projects
  – NSF Black Hole Grand Challenge, NASA NS GC, etc…maybe Geophysics GC...
  – New: EU Network about to be Grid enabled!
• Numerical/computational infrastructure to solve PDEs
• Freely available, open-source community framework: the spirit of GNU/Linux
  – Many communities contributing to Cactus
• Cactus divided into "Flesh" (core) and "Thorns" (modules, or collections of subroutines)
  – The Flesh, written in C, glues together the various components
  – Multilingual: user apps can be Fortran, C, C++; automated interfaces between them
• Abstraction: the Cactus Flesh provides an API for virtually all CS-type operations
  – Driver functions (storage, communication between processors, etc.)
  – Interpolation, reduction, etc.
  – I/O (traditional, socket-based, remote viz and steering…)
  – Checkpointing, coordinates
  – Etc., etc.
• Cactus is Grid-enabling application middleware...

Page 4: Cactus and Grid Computing


How to Use Cactus Features
• The application scientist usually concentrates on the application...
  – Performance
  – Algorithms
  – Logically: operations on a grid (structured or unstructured (coming…))
• ...then takes advantage of parallel API features enabled by Cactus
  – I/O, data streaming, remote visualization/steering, AMR, MPI, checkpointing, Grid Computing, etc.
  – Abstraction allows one to switch between different MPI or PVM layers, different I/O layers, etc., with no or minimal changes to the application!
• (Nearly) all architectures supported and autoconfigured
  – Common to develop on a laptop (no MPI required); run on anything
  – Compaq / SGI Origin 2000 / T3E / Linux clusters + laptops / Hitachi / NEC / HP / Windows NT / SP2, Sun
• Metacode concept
  – Very, very lightweight; not a huge framework (not Microsoft Office)
  – User specifies desired code modules in configuration files
  – Desired code generated: automatic routine calling sequences, syntax checking, etc.
  – You can actually read the code it creates...
• http://www.cactuscode.org

Page 5: Cactus and Grid Computing


Modularity of Cactus...

[Architecture diagram: Applications 1, 2, …, sub-apps, and legacy apps plug into the Cactus Flesh. Behind abstractions sit swappable layers (AMR (GrACE, etc.), MPI layers, I/O layers, unstructured meshes, remote steering, MDS/remote spawn), resting on Globus metacomputing services. The user selects the desired functionality and the code is created.]

Page 6: Cactus and Grid Computing


Computational Toolkit: provides parallel utilities (thorns) for the computational scientist

• Cactus is a framework or middleware for unifying and incorporating code from thorns developed by the community
  – Choice of parallel library layers (native MPI, MPICH, MPICH-G(2), LAM, WMPI, PACX and HPVM)
  – Various AMR schemes: nested boxes, GrACE; coming: HLL, Chombo, SAMRAI, ???
  – Parallel I/O (Panda, FlexIO, HDF5, etc.)
  – Parameter parsing
  – Elliptic solvers (PETSc, multigrid, SOR, etc.)
  – Visualization tools, remote steering tools, etc.
  – Globus (metacomputing/resource management)
  – Performance analysis tools (Autopilot, PAPI, etc.)
  – Remote visualization and steering
  – INSERT YOUR CS MODULE HERE...

Page 7: Cactus and Grid Computing


Cactus Community Development Projects

[Diagram: projects and groups connected to the AEI Cactus Group (Allen): the numerical relativity community (Cornell), NSF KDI (Suen), EU Network (Seidel), NASA NS GC, astrophysics (Zeus), geophysics (Bosl), SDSS (Szalay), crack propagation, chemical engineering (Bishop), "Egrid" (NCSA, ANL, SDSC), "GrADS" (Kennedy et al.), US Grid Forum, DFN Gigabit (Seidel), DLR, Clemson, Livermore, Intel, Microsoft, San Diego, GMD, Cornell, Berkeley.]

Page 8: Cactus and Grid Computing


Some fun simulations...

[Two movies shown here]

• 3D colliding black holes: 384^3, 100 GB simulation; the largest production relativity simulation; 256-processor Origin 2000 at NCSA; ~500 GB output data
• 3D waves forming black holes

Grid future: stream data, monitor, steer, distribute, farm tasks, etc.

Page 9: Cactus and Grid Computing


Future view of Computational Science: much of it here already...

• Scale of computations much larger
  – Complexity approaching that of Nature
  – Simulations of the Universe and its constituents
    • Black holes, neutron stars, supernovae
    • Airflow around advanced planes, spacecraft
    • Human genome, human behavior
• Teams of computational scientists working together
  – Must support efficient, high-level problem description
  – Must support collaborative computational science
  – Must support all different languages
• Ubiquitous Grid Computing
  – Very dynamic simulations, deciding their own future
  – Apps find the resources themselves: distributed, spawned, etc.
  – Must be tolerant of dynamic infrastructure (variable networks, processor availability, etc.)
  – Monitored, visualized, controlled from anywhere, with colleagues anywhere else...

Page 10: Cactus and Grid Computing


Our Team Requires Grid Technologies, Big Machines for Big Runs

[Map: collaborating sites — WashU, NCSA, AEI, ZIB, Paris, Thessaloniki, Hong Kong]

How do we:
• Maintain/develop the code?
• Manage computer resources?
• Carry out/monitor simulations?

Page 11: Cactus and Grid Computing


What we need and want in simulation science: a higher-level Portal to provide the following...

• Got an idea? Configuration manager: write a Cactus module, link it to other modules, and…
• Find resources
  – Where? NCSA, SDSC, Garching...
  – How many computers? Distribute simulations?
  – Big jobs: a "Fermilab" at our disposal: must get it right while the beam is on!
• Launch the simulation
  – How do we get the executable there?
  – How do we store the data?
  – What are the local queue structure/OS idiosyncrasies?
• Monitor the simulation
  – Remote visualization, live while running
    • Limited bandwidth: compute viz inline with the simulation
    • High bandwidth: ship data to be visualized locally
  – Visualization server: all privileged users can log in and check status/adjust if necessary
    • Are parameters screwed up? Very complex!
    • Call in an expert colleague…let her watch it too
  – Performance: how efficient is my simulation? Should something be adjusted?
• Steer the simulation
  – Is memory running low? AMR! What to do? Refine selectively, or acquire additional resources via Globus? Delete unnecessary grids? Performance steering...
• Postprocessing and analysis
  – 1 TByte of output at NCSA, research groups in St. Louis and Berlin…how do we deal with this?
• Cactus Portal and VMR under development by Michael Russell, Jason Novotny, et al...

Page 12: Cactus and Grid Computing


Grid-Enabled Cactus (static version)

• Cactus and its ancestor codes have been using Grid infrastructure since 1993 (part of the famous I-WAY at SC'95)

• Support for Grid computing was part of the design requirements

• Cactus compiles “out-of-the-box” with Globus [using globus device of MPICH-G(2)]

• The design of Cactus means that applications are unaware of the underlying machine(s) the simulation is running on … applications become trivially Grid-enabled

• Infrastructure thorns (I/O, driver layers) can be enhanced to make most effective use of the underlying Grid architecture

Page 13: Cactus and Grid Computing


Grid Computing Scenarios: Old Stuff
But still not used by the community: let's fix this!

• Simple: sit here, compute there…
  – But very complex: users in my community still don't like to do it!
  – Actually still very hard!
  – A portal is essential, and a good implementation is still very hard to get
    • Manage app configuration
    • Choose resources (which one? how?)
    • Manage batch jobs, files, results afterwards
• Compute there, monitor and steer…
  – Visualization
  – Performance
  – Science/engineering output improved…
• Choose multiple sites in advance
  – Need more than any site could provide (simulate the universe or human behavior…)
  – Need more than any site can provide NOW
    • Must wait a week to get 512 procs at NCSA, but could get 256 at NCSA and 256 at ANL now, even if it runs at 50% efficiency!

Page 14: Cactus and Grid Computing


Remote Visualization and Steering

[Diagram: the running simulation streams remote viz data over HTTP and streaming HDF5 to any viz client (Amira, OpenDX). Arbitrary grid functions can be streamed; isosurfaces and geodesics are computed inline with the simulation, so only geometry is sent across the network.]

Changing any steerable parameter:
• Parameters
• Physics, algorithms
• Performance

Page 15: Cactus and Grid Computing


Remote Offline Visualization

[Diagram: a visualization client (Amira, viz in Berlin) reads through an HDF5 VFD and DataGrid (Globus) from a remote data server, reachable via DPSS, FTP, or HTTP (DPSS server, FTP server, web server). 4 TB of data are distributed across NCSA/ANL/Garching; downsampling and hyperslabs ensure only what is needed crosses the network.]

Page 16: Cactus and Grid Computing


Dynamic Grid Computing Scenarios: New Stuff
Must make apps able to respond to a dynamic, changing Grid environment...

• Managing intelligent parameter surveys (Condor does this)
• Distributing multiple grids across different machines (climate)
• Outsourcing: spawning off independent jobs, e.g. analysis tasks, to new machines
  – "Grid Vector": the master code "outsources" slave simulations at every timestep
  – "Grid Pipeline": slave processes "outsource" tasks, which outsource…
  – Elliptic solve taking too long? Stream the matrix to Dongarra's NetSolve for help...
• Dynamic staging…seeking out and moving to faster/larger/cheaper machines as they become available
• Scripting capabilities (management, launching new jobs, checking out new code, etc.)
• Dynamic load balancing (e.g. inhomogeneous loads, multiple grids), based on performance...
• Etc…many new computing paradigms: preparing papers...

Page 17: Cactus and Grid Computing


Application Code as Information Server/Gatherer
• The code should be aware of its environment
  – What resources are out there?
  – What is their current state?
  – What is my allocation?
  – What are the bandwidth and latency between sites?
  – How can I adjust myself to take advantage of the current state?
• The code should be able to make decisions on its own
  – A slow part of my simulation can run asynchronously…spawn it off!
  – New, more powerful resources just became available…migrate there!
  – An unexpected event occurred: check out, compile, and run a new Cactus and stream data on the newly discovered resource…
  – Python, Perl scripting thorns driven by events…send email, ask for help, etc.
• The code should be able to publish this information to a central server for tracking, monitoring, steering…
  – We will have entire hierarchies of related simulations…need to track everything...

Page 18: Cactus and Grid Computing


Cactus Worm: illustration of the basic scenario
• A Cactus simulation starts, launched from a portal
• Queries the MDS, finds available resources
• Migrates itself to the next site
  – Uses some logic to choose the next resource
  – Starts up the remote simulation (passes proxy…)
  – Transfers memory contents to the remote simulation (using streaming HDF5, scp, GASS, whatever…)
• Registers its new location with the Cactus GRIS, terminates the previous simulation
• The user tracks and monitors with continuous remote viz and control using thorn http, streaming data, etc.
• Continues around Europe, and so on…
• Fun Grid game: find and trap the Cactus Worm!
• If we can do this, much of what we want can be done!
• Want to build a GADK…Grid App Dev Toolkit: bring users onto the Grid.

Page 19: Cactus and Grid Computing


Grand Picture

[Diagram: simulations are launched from the Cactus Portal; Grid-enabled Cactus runs on distributed machines (Origin at NCSA, T3E at Garching) via Globus. Remote steering and monitoring from an airport; remote viz in St. Louis; remote viz and steering from Berlin; viz of data from previous simulations in an SF café (DataGrid/DPSS, downsampling); http, HDF5, isosurfaces; Mathematica login from AEI.]

Page 20: Cactus and Grid Computing


Further details...
• Cactus
  – http://www.cactuscode.org
  – http://www.computer.org/computer/articles/einstein_1299_1.htm
• Movies, research overview (needs major updating)
  – http://jean-luc.ncsa.uiuc.edu
• Simulation collaboratory/portal work
  – http://wugrav.wustl.edu/ASC/mainFrame.html
• Remote steering, high-speed networking
  – http://www.zib.de/Visual/projects/TIKSL/
  – http://jean-luc.ncsa.uiuc.edu/Projects/Gigabit/
• EU Astrophysics Network
  – http://www.aei-potsdam.mpg.de/research/astro/eu_network/index.html