richard-hood
Department of Computer Science, University of the West Indies
Part II
Parallel Programming?
ENIAC, University of Pennsylvania, 1946 (http://www.library.upenn.edu/special/gallery/mauchly/jwmintro.html)
The Need For Power
Computational Science
Traditional scientific and engineering paradigm:
  Do theory or paper design
  Perform experiments or build system
Replacing both by numerical experiments:
  Real phenomena are too complicated to model by hand
  Real experiments are:
    too hard, e.g., build large wind tunnels
    too expensive, e.g., build a throw-away passenger jet
    too slow, e.g., wait for climate or galactic evolution
    too dangerous, e.g., weapons, drug design
Computational Science Examples
Astrophysical thermonuclear flashes
Nuclear weapons
Weather prediction
Climate and atmospheric modeling
Drug design
Blood flow
Fluid dynamics (CFD)
Fluid Dynamics
Forced convective heat transfer
Buoyant convection
Hairpin vortex generation
Rayleigh-Taylor instability
Hairpin Vortices - Transition to Turbulence
Boundary layer flow past a hemispherical roughness element
Re = 200-2000, based on hemisphere height
K = 512-8168 spectral elements of polynomial degree N = 7-15
Simulation Cost
Cost is O(Re^3)
Re=1K simulation ~ 1 week on 512 processors of ASCI Red (50 GF, 64 GB)
Re=10K ~ 1 year on all 8192 processors of ASCI Red (800 GF, 1 TB)
We’re really interested in Re=1M …
Can’t even think of doing the Re=1K problem on a uniprocessor machine let alone the 10K or 1M problems!
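The scaling on this slide can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that cost grows exactly as Re^3 and that runtime is inversely proportional to the number of processors (perfect parallel efficiency, which real runs will not achieve). The function name estimated_weeks and its parameters are hypothetical, chosen only for illustration; the baseline of 1 week on 512 processors at Re=1K is taken from the slide.

```python
def estimated_weeks(re, processors, base_re=1_000, base_weeks=1.0, base_processors=512):
    """Estimate wall-clock weeks, assuming cost ~ Re^3 and perfect speedup."""
    # Total work in processor-weeks, scaled from the Re=1K baseline.
    work = base_weeks * base_processors * (re / base_re) ** 3
    # Assumption: work divides evenly across all processors.
    return work / processors


if __name__ == "__main__":
    # Re=10K on 8192 processors: 62.5 weeks, roughly the "1 year" quoted.
    print(estimated_weeks(10_000, 8192))
    # Re=1K on a single processor: 512 weeks, roughly 10 years.
    print(estimated_weeks(1_000, 1))
    # Re=1M on 8192 processors, in weeks.
    print(estimated_weeks(1_000_000, 8192))
```

Under these assumptions the Re=10K estimate lands close to the quoted figure of about one year, and the Re=1M case comes out at tens of millions of weeks, which is why it is out of reach.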
The Necessity of Parallel Computing
How fast can a serial computer be?
Consider the 1 Tflop sequential machine:
  data must travel some distance, r, to get from memory to CPU
  to get 1 data element per cycle, data must make this trip 10^12 times per second at the speed of light, c = 3e8 m/s
  so r < c/10^12 = 0.3 mm
Now put 1 TB of storage in a 0.3 mm x 0.3 mm area:
  each word occupies a square about 3 Angstroms on a side, the size of a small atom
Figure: a 1 Tflop, 1 TB sequential machine with r = 0.3 mm
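The speed-of-light argument can be reproduced numerically. This is a rough sketch, assuming one memory access per cycle at 10^12 cycles per second and, as a simplification that is not stated on the slide, that 1 TB holds about 10^12 words. The function names are hypothetical.

```python
C = 3e8  # speed of light in m/s, as used on the slide


def max_radius_m(accesses_per_second):
    """Largest distance a signal can cover in one access period."""
    return C / accesses_per_second


def word_side_angstrom(side_m, n_words):
    """Side length, in Angstroms, of the square available to each word
    when n_words are packed into a side_m x side_m area."""
    area_per_word_m2 = side_m ** 2 / n_words
    return area_per_word_m2 ** 0.5 * 1e10  # 1 m = 1e10 Angstroms


if __name__ == "__main__":
    r = max_radius_m(1e12)
    print(r * 1e3, "mm")  # about 0.3 mm
    # Assumption: 1 TB is treated as 1e12 words.
    print(word_side_angstrom(r, 1e12), "Angstroms per side")  # about 3
```

With these numbers each word gets a square roughly 3 Angstroms on a side, which is on the scale of a single atom.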
Even if we could make it ...
... it’d be too expensive
Market forces are dictating the use of commercial off-the-shelf (COTS) components
The Solution?
Add more workers!
Use a collection of processors and memory modules to work together to solve our problems
Supercomputers, MPPs, Clusters, Beowulfs
Bad News
Still Lots of Work
Decide on and implement an interconnection network for the processors and memory modules
Design and implement system software for the hardware
Devise algorithms and data structures for solving our problems
Divide the algorithms and data structures up into subproblems
Identify the communication that will be needed between the subproblems
Assign subproblems to processors and memory modules
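The last three steps (divide, identify communication, assign) can be illustrated on a toy problem. The sketch below is not from the original slides: it sums the squares of a list by splitting the data into chunks, giving each chunk to a worker, and combining the partial results, which is the only communication this problem needs. It uses Python's standard concurrent.futures.ThreadPoolExecutor; the helper names split, partial_sum and parallel_sum_of_squares are hypothetical. Threads are used only to keep the example short; in CPython they will not speed up CPU-bound work like this, so a real program would use processes or a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Divide data into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """Subproblem: sum of squares over one chunk, no communication needed."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    chunks = split(data, n_workers)                       # divide into subproblems
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))    # assign to workers
    return sum(partials)                                  # communicate and combine


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))  # 285
```

Here the subproblems are independent, so the only communication is the final reduction. Problems such as the fluid simulations above also need to exchange boundary data between neighbouring subproblems at every step.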
Modern Layered Framework
Figure: layers of the framework, from top to bottom
Parallel applications: CAD, Database, Scientific modeling
Programming models: Multiprogramming, Shared address, Message passing, Data parallel
Communication abstraction (user/system boundary)
Compilation or library; Operating systems support
(hardware/software boundary)
Communication hardware
Physical communication medium