Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing
Rahul S. Sampath
May 9th, 2007
Computational Power Today…
Floating Point Operations Per Second (FLOPS)
Humans doing long division: milli-FLOPS (1/1000th of one FLOP)
Cray-1 supercomputer, 1976, $8M: 80 MFLOPS
Pentium II, 400 MHz: 100 MFLOPS
Typical high-end PC today: ~1 GFLOPS
Sony PlayStation 3, 2006: 2 TFLOPS
IBM TRIPS, 2010 (one-chip solution, CPU only): 1 TFLOPS
IBM Blue Gene, < 2010 (with 65,536 microprocessors): 360 TFLOPS
Why do we need more?
"DOS addresses only 1 MB of RAM because we cannot imagine any application needing more." -- Microsoft, 1980.
"640k ought to be enough for anybody"--Bill Gates, 1981.
Bottom-line: Demand for computational power will continue to increase.
Some Computationally Intensive Applications Today
Computer-aided surgery
Medical imaging
MD simulations
FEM simulations with > 10^10 unknowns
Galaxy formation and evolution: a 17-million-particle Cold Dark Matter cosmology simulation
Any application that can be scaled up should be treated as a computationally intensive application.
The Need for Parallel Computing
Memory (RAM)
There is a theoretical limit on the RAM your computer can address:
32-bit systems: 4 GB (2^32 bytes)
64-bit systems: 16 exabytes (over 16 million TB)
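A quick check of the address-space arithmetic behind those limits:

\[
2^{32}\ \text{bytes} = 4\ \text{GB}, \qquad
2^{64}\ \text{bytes} \approx 1.8 \times 10^{19}\ \text{bytes} \approx 16{,}000{,}000\ \text{TB}.
\]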
Speed
Upgrading microprocessors can't help you anymore.
FLOPS is not the bottleneck; memory is. What we need is more registers.
Hacks: pre-computing, a higher-bandwidth memory bus, L2/L3 cache, compiler optimizations, assembly language (next stop, the asylum)…
Or… think parallel…
If speed is not an issue… is an out-of-core implementation an option?
Parallel programs can be converted into out-of-core implementations easily.
Parallel Algorithms
The Key Questions
Why? Memory, speed, or both?
What kind of platform? Shared memory or distributed computing?
Typical size of the application: small (< 32 processors), medium (32 - 256 processors), or large (> 256 processors)?
How much time and effort do you want to invest?
How many times will the component be used in a single execution of the program?
Factors to Consider in Any Parallel Algorithm Design
Load balancing: give equal work to all processors at all times.
Efficient memory management: give an equal amount of data to all processors.
Processors should work independently as much as possible: minimize communication, especially iterative communication.
Overlapping communication and computation: if communication is necessary, try to do some work in the background as well (a sketch follows this list).
Optimal work algorithm: keep the sequential part of the parallel algorithm as close to the best sequential algorithm as possible.
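A rough sketch of the overlap idea in C with MPI; the buffer names, the neighbor rank, and the placeholder loop are illustrative assumptions, not anything prescribed by the slides:

#include <mpi.h>

/* Overlap a message exchange with independent computation:
   post nonblocking calls, compute on data that does not depend
   on the message, and wait only when the data is needed. */
void exchange_and_work(double *send_buf, double *recv_buf, int n,
                       int neighbor, double *interior, int m)
{
    MPI_Request reqs[2];

    /* Start the exchange in the background. */
    MPI_Irecv(recv_buf, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send_buf, n, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Useful work that touches neither buffer. */
    for (int i = 0; i < m; i++)
        interior[i] *= 2.0;          /* placeholder computation */

    /* Block only now, when the exchanged data is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}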
Difference Between Sequential and Parallel Algorithms
Not all data is accessible at all times, so all computations must be as localized as possible; there is no random access to remote data.
Division of work adds a new dimension to the existing algorithm: which processor does what portion of the work?
If communication cannot be avoided: how will it be initiated, what type of communication is it, and what are the pre-processing and post-processing operations?
The order of operations can be critical for performance.
Parallel Algorithm Approaches
Data-parallel approach: partition the data among the processors; each processor executes the same set of commands.
Control-parallel approach: partition the tasks to be performed among the processors; each processor executes different commands.
Hybrid approach: switch between the two approaches at different stages of the algorithm. Most parallel algorithms fall in this category.
A data-parallel sketch follows.
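A minimal data-parallel sketch in C with MPI; the global size N and the per-element work are illustrative assumptions. Every processor runs the same code, and the rank alone decides which block of the data it handles:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1000000;                 /* illustrative problem size */
    int begin = rank * (N / size);         /* my block of the data */
    int end   = (rank == size - 1) ? N : begin + N / size;

    double local = 0.0, total = 0.0;
    for (int i = begin; i < end; i++)
        local += (double)i;                /* placeholder per-element work */

    /* Combine the partial results on rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %.0f\n", total);

    MPI_Finalize();
    return 0;
}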
Performance Metrics
Speedup
Overhead
Scalability: fixed-size and iso-granular
Efficiency: speedup per processor (formulas below)
Iso-efficiency: the problem size as a function of p needed to keep efficiency constant
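With \(T_1\) the best sequential running time and \(T_p\) the running time on \(p\) processors, the metrics above are:

\[
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p},
\]

and the iso-efficiency function is the rate at which the problem size \(W\) must grow with \(p\) to hold \(E(p)\) constant.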
The Take Home Message
A good parallel algorithm is NOT a simple extension of the corresponding sequential algorithm.
Which model to use? Problem dependent, e.g. a + b + c + d + … = (a + b) + (c + d) + …; there is not much choice, really (see the note after this slide).
It is a big investment, but can really be worth it.
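The regrouping in the example is exactly what buys parallelism: since addition is associative, a sum of \(n\) terms can be evaluated as a balanced binary tree,

\[
\big((a+b)+(c+d)\big) + \big((e+f)+(g+h)\big),
\]

taking \(\lceil \log_2 n \rceil\) parallel rounds instead of the \(n-1\) sequential additions of the left-to-right order.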
Parallel Programming
How does a parallel program work?
You request a certain number of processors.
You set up a communicator, which gives a unique id – a rank – to each processor.
Every processor executes the same program; inside the program, you query for the rank and use it to decide what to do.
Processors exchange messages with each other using their ranks.
In theory, you only need three functions: Isend, Irecv, and Wait (a sketch follows).
In practice, you can optimize communication depending on the underlying network topology – hence Message Passing Standards…
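A minimal sketch of that skeleton in C with MPI; the message contents and tag are illustrative assumptions. Rank 0 ships one integer to rank 1 using only the three primitives named above:

#include <mpi.h>
#include <stdio.h>

/* Run with at least two processes, e.g. mpirun -np 2 ./a.out */
int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* my unique id */

    MPI_Request req;
    if (rank == 0) {
        int msg = 42;                        /* illustrative payload */
        MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* send buffer safe to reuse */
    } else if (rank == 1) {
        int incoming;
        MPI_Irecv(&incoming, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* message has now arrived */
        printf("rank 1 received %d from rank 0\n", incoming);
    }

    MPI_Finalize();
    return 0;
}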
Message Passing Standards
The standards define a set of primitive communication operations.
Vendors implementing a standard on a given machine are responsible for optimizing these operations for that machine.
Popular standards:
Message Passing Interface (MPI)
OpenMP (Open Multi-Processing) – strictly a shared-memory API rather than a message passing standard
Languages that support MPI
Fortran 77
C/C++
Python
MATLAB
MPI Implementations
MPICH ftp://info.mcs.anl.gov/pub/mpi
LAM http://www.mpi.nd.edu/lam/download
CHIMP ftp://ftp.epcc.ed.ac.uk/pub/chimp/release
WinMPI (Windows) ftp://csftp.unomaha.edu/pub/rewini/WinMPI
W32MPI (Windows) http://dsg.dei.uc.pt/wmpi/intro.html
Open Source Parallel Software
PETSc (linear and nonlinear solvers)
http://www-unix.mcs.anl.gov/petsc/petsc-as/
ScaLAPACK (linear algebra)
http://www.netlib.org/scalapack/scalapack_home.html
SPRNG (random number generator)
http://sprng.cs.fsu.edu/
ParaView (visualization)
http://www.paraview.org/HTML/Index.html
NAMD (molecular dynamics)
http://www.ks.uiuc.edu/Research/namd/
Charm++ (parallel objects)
http://charm.cs.uiuc.edu/research/charm/
References
Parallel Programming with MPI, Peter S. Pacheco
Introduction to Parallel Computing, A. Grama, A. Gupta, G. Karypis, V. Kumar
MPI – The Complete Reference, William Gropp et al.
http://www-unix.mcs.anl.gov/mpi/
http://www.erc.msstate.edu/mpi
http://www.epm.ornl.gov/~walker/mpi
http://www.erc.msstate.edu/mpi/mpi-faq.html (FAQ)
comp.parallel.mpi (newsgroup)
http://www.mpi-forum.org (MPI Forum)
Thank You