
Page 1

Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing

Rahul S. Sampath

May 9th, 2007

Page 2

Computational Power Today…

Page 3

Floating Point Operations Per Second (FLOPS)

  - Humans doing long division: milli-flops (1/1000th of one flop)
  - Cray-1 supercomputer, 1976, $8M: 80 MFLOPS
  - Pentium II, 400 MHz: 100 MFLOPS
  - Typical high-end PC today: ~1 GFLOPS
  - Sony PlayStation 3, 2006: 2 TFLOPS
  - IBM TRIPS, 2010 (one-chip solution, CPU only): 1 TFLOPS
  - IBM Blue Gene, < 2010 (with 65,536 microprocessors): 360 TFLOPS

Page 4

Why do we need more?

"DOS addresses only 1 MB of RAM because we cannot imagine any application needing more." -- Microsoft, 1980.

"640k ought to be enough for anybody"--Bill Gates, 1981.

Bottom-line: Demand for computational power will continue to increase.

Page 5

Some Computationally Intensive Applications Today

  - Computer-aided surgery
  - Medical imaging
  - MD (molecular dynamics) simulations
  - FEM simulations with > 10^10 unknowns
  - Galaxy formation and evolution, e.g., a 17-million-particle Cold Dark Matter cosmology simulation

Page 6

Any application that can be scaled up should be treated as a computationally intensive application.

Page 7

The Need for Parallel Computing

Memory (RAM)
  - There is a theoretical limit on the RAM addressable by your computer.
  - 32-bit systems: 4 GB (2^32 bytes)
  - 64-bit systems: 16 exabytes (2^64 bytes) (see the sketch below)

Speed
  - Upgrading microprocessors can't help you anymore: FLOPS is not the bottleneck, memory is. What we need is more registers.
  - Think pre-computing, a higher-bandwidth memory bus, L2/L3 caches, compiler optimizations, assembly language... the asylum.
  - Or... think parallel...
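A quick C illustration of the address-space numbers above. This is only a sketch: it reports the pointer width, which is what bounds the theoretically addressable RAM on a system.

    #include <stdio.h>

    int main(void) {
        unsigned bits = 8u * (unsigned)sizeof(void *);  /* pointer width in bits */
        printf("pointer width : %u bits\n", bits);
        printf("address space : 2^%u bytes\n", bits);   /* 2^32 = 4 GB, 2^64 = 16 EB */
        return 0;
    }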

Page 8

Hacks

If speed is not an issue... is an out-of-core implementation an option?

Parallel programs can be converted into out-of-core implementations easily (a minimal sketch follows).
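A minimal C sketch of the out-of-core idea: stream a large array from disk in fixed-size chunks so that only one chunk is resident in RAM at a time. The file name, element type, and chunk size are hypothetical choices, not from the slides.

    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK (1 << 20)   /* 2^20 doubles = 8 MB resident at a time */

    int main(void) {
        FILE *f = fopen("big_array.bin", "rb");   /* hypothetical input file */
        if (!f) { perror("fopen"); return 1; }

        double *buf = malloc(CHUNK * sizeof(double));
        if (!buf) { fclose(f); return 1; }

        double sum = 0.0;
        size_t n;
        while ((n = fread(buf, sizeof(double), CHUNK, f)) > 0)
            for (size_t i = 0; i < n; i++)
                sum += buf[i];                    /* process the resident chunk */

        printf("sum = %g\n", sum);
        free(buf);
        fclose(f);
        return 0;
    }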

Page 9

Parallel Algorithms

Page 10

The Key Questions

Why?
  - Memory
  - Speed
  - Both

What kind of platform?
  - Shared memory
  - Distributed computing

Typical size of the application
  - Small (< 32 processors)
  - Medium (32 - 256 processors)
  - Large (> 256 processors)

How much time and effort do you want to invest?

How many times will the component be used in a single execution of the program?

Page 11

Factors to Consider in any Parallel Algorithm Design

  - Load balancing: give equal work to all processors at all times.
  - Efficient memory management: give an equal amount of data to all processors.
  - Processors should work independently as much as possible: minimize communication, especially iterative communication.
  - If communication is necessary, try to do some work in the background as well, i.e., overlap communication and computation (see the sketch after this list).
  - Optimal work: keep the sequential part of the parallel algorithm as close as possible to the best sequential algorithm.
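A minimal C/MPI sketch of the overlap idea above: post non-blocking sends and receives, compute on data that does not depend on the incoming message, then wait. The ring exchange and the dummy interior loop are illustrative assumptions, not part of the original slides.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Illustrative ring exchange: send my rank to the right neighbor. */
        double send = (double)rank, recv = 0.0, interior = 0.0;
        int right = (rank + 1) % size;
        int left  = (rank + size - 1) % size;
        MPI_Request reqs[2];

        /* 1. Start the exchange; these calls return immediately. */
        MPI_Irecv(&recv, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&send, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* 2. Useful work that does not need the incoming message. */
        for (int i = 0; i < 1000000; i++) interior += 1e-6;

        /* 3. Block only when the received value is actually needed. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d: interior = %.3f, received = %.1f\n", rank, interior, recv);

        MPI_Finalize();
        return 0;
    }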

Page 12

Difference Between Sequential and Parallel Algorithms

  - Not all data is accessible at all times: all computations must be as localized as possible; you can't have random access.
  - A new dimension to the existing algorithm, division of work: which processor does what portion of the work?
  - If communication cannot be avoided: how will it be initiated? What type of communication? What are the pre-processing and post-processing operations?
  - The order of operations can be very critical for performance.

Page 13

Parallel Algorithm Approaches

  - Data-parallel approach: partition the data among the processors; each processor executes the same set of commands (a sketch follows this list).
  - Control-parallel approach: partition the tasks to be performed among the processors; each processor executes different commands.
  - Hybrid approach: switch between the two approaches at different stages of the algorithm; most parallel algorithms fall in this category.
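A minimal C/MPI sketch of the data-parallel approach: every rank executes the same program on its own block of a global index range. The block partitioning and the final reduction are illustrative choices, not prescribed by the slides.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000   /* illustrative global problem size */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank derives its own sub-range [lo, hi) from its rank. */
        int lo = (int)((long long)N * rank / size);
        int hi = (int)((long long)N * (rank + 1) / size);

        double local = 0.0;
        for (int i = lo; i < hi; i++)
            local += (double)i;   /* same commands, different data */

        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("sum of 0..%d = %.0f\n", N - 1, global);

        MPI_Finalize();
        return 0;
    }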

Page 14

Performance Metrics

  - Speedup
  - Overhead
  - Scalability: fixed-size and iso-granular
  - Efficiency: speedup per processor
  - Iso-efficiency: the problem size as a function of p required to keep efficiency constant (standard definitions are sketched below)
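These metrics have standard textbook definitions (following, e.g., Grama et al. in the references); the notation below is an assumption, not from the slides. With T_1 the best sequential running time and T_p the parallel running time on p processors:

    \[
    S(p) = \frac{T_1}{T_p}, \qquad
    E(p) = \frac{S(p)}{p}, \qquad
    T_o(p) = p\,T_p - T_1 .
    \]

Here S is speedup, E is efficiency, and T_o is the total overhead. Iso-efficiency then asks how fast the problem size must grow with p so that E(p) remains constant.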

Page 15

The Take Home Message

A good parallel algorithm is NOT a simple extension of the corresponding sequential algorithm.

What model to use? Problem dependent, e.g., a + b + c + d + ... = (a + b) + (c + d) + ... (worked out below). Not much choice, really.

It is a big investment, but it can really be worth it.
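Worked out for eight terms, the reassociation on the slide is just a balanced reduction tree:

    \[
    a_1 + a_2 + \cdots + a_8
      = \big((a_1 + a_2) + (a_3 + a_4)\big) + \big((a_5 + a_6) + (a_7 + a_8)\big),
    \]

which finishes in \(\lceil \log_2 8 \rceil = 3\) parallel steps instead of 7 sequential additions.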

Page 16

Parallel Programming

Page 17

How does a parallel program work?

  - You request a certain number of processors.
  - You set up a communicator, which gives a unique id, the rank, to each processor.
  - Every processor executes the same program.
  - Inside the program: query for the rank and use it to decide what to do; exchange messages between different processors using their ranks.
  - In theory, you only need 3 functions: Isend, Irecv, Wait (a minimal example follows this list).
  - In practice, you can optimize communication depending on the underlying network topology -- Message Passing Standards...
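A minimal complete C example of the structure just described: every rank runs the same program, queries its rank, and branches on it; rank 0 sends one integer to rank 1 using only Isend, Irecv, and Wait. The message contents are of course arbitrary.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my unique id */

        MPI_Request req;
        int msg = 0;
        if (rank == 0) {                        /* branch on the rank */
            msg = 42;
            MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("rank 0 sent %d\n", msg);
        } else if (rank == 1) {
            MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }

        MPI_Finalize();
        return 0;
    }

Compile with mpicc and launch with at least two processes, e.g. mpirun -np 2 ./a.out.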

Page 18

Message Passing Standards

The standards define a set of primitive communication operations. The vendors implementing a standard on a given machine are responsible for optimizing these operations for that machine.

Popular standards:
  - Message Passing Interface (MPI)
  - OpenMP (Open Multi-Processing; strictly a shared-memory directive standard rather than a message passing one)

Page 19

Languages that support MPI

Fortran 77

C/C++

Python

Matlab

Page 20

MPI Implementations

  - MPICH: ftp://info.mcs.anl.gov/pub/mpi
  - LAM: http://www.mpi.nd.edu/lam/download
  - CHIMP: ftp://ftp.epcc.ed.ac.uk/pub/chimp/release
  - WinMPI (Windows): ftp://csftp.unomaha.edu/pub/rewini/WinMPI
  - W32MPI (Windows): http://dsg.dei.uc.pt/wmpi/intro.html

Page 21

Open Source Parallel Software

  - PETSc (linear and nonlinear solvers): http://www-unix.mcs.anl.gov/petsc/petsc-as/
  - ScaLAPACK (linear algebra): http://www.netlib.org/scalapack/scalapack_home.html
  - SPRNG (random number generation): http://sprng.cs.fsu.edu/
  - ParaView (visualization): http://www.paraview.org/HTML/Index.html
  - NAMD (molecular dynamics): http://www.ks.uiuc.edu/Research/namd/
  - Charm++ (parallel objects): http://charm.cs.uiuc.edu/research/charm/

Page 22

References

  - Peter S. Pacheco, Parallel Programming with MPI.
  - A. Grama, A. Gupta, G. Karypis, V. Kumar, Introduction to Parallel Computing.
  - William Gropp et al., MPI: The Complete Reference.
  - http://www-unix.mcs.anl.gov/mpi/
  - http://www.erc.msstate.edu/mpi
  - http://www.epm.ornl.gov/~walker/mpi
  - http://www.erc.msstate.edu/mpi/mpi-faq.html (FAQ)
  - comp.parallel.mpi (newsgroup)
  - http://www.mpi-forum.org (MPI Forum)

Page 23

Thank You