Introduction to Message Passing
CMSC 34000
Lecture 3
1/11/05
Class goals
• Parallel hardware and software issues
• First look at actual algorithm (trapezoidal rule for numerical integration)
• Introduction to message passing in MPI
• Networks and the cost of communication
Parallel Hardware (Flynn’s Taxonomy)
• SISD
• MIMD
• SIMD
• MISD
S = Single, M = Multiple
I = Instruction stream
D = Data stream
Von Neumann & Modern
• CPU (control and arithmetic)
• Memory (main & registers)
• Data/instruction transfer bottleneck
• Pipelining (multiple instructions operating simultaneously)
• Vectorizing (single instruction acts on vector registers)
• Cache -- memory hierarchy
SIMD / MIMD
SIMD:
• Single CPU for control
• Many (scalar) ALUs with registers
• One clock

MIMD:
• Many CPUs
• Each has control and ALU
• Memory may be “shared” or “distributed”
• Synchronized?
Shared memory MIMD
• Bus (contention)
• Switch (expensive)
• Cache-coherency?
Distributed memory MIMD
• General interconnection network (e.g. CS Linux system)
• Each processor has its own memory
• To share information, processors must pass (and receive) messages that go over the network.
• Topology is very important
Different network topologies
• Totally connected
• Linear array/ring
• Hypercube
• Mesh/Torus
• Tree / hypertree
• Ethernet…
• And others
Issues to consider
• Routing (shortest path = best-case cost of single message)
• Contention - multiple messages between different processors must share a wire
• Programming: would like libraries that hide all this (somehow)
Numerical integration
• Approximate $\int_a^b f(x)\,dx$
• Using quadrature: $\int_a^b f(x)\,dx \approx h\,f\!\left(\frac{a+b}{2}\right)$
• Repeated subdivision: $\int_a^b f(x)\,dx \approx h \sum_{i=1}^{n} f(x_i)$
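To make the repeated-subdivision formula concrete, here is a minimal serial sketch in C of the composite trapezoidal rule named in the class goals (the names trap and square are illustrative, not from the slides):

#include <stdio.h>

/* Composite trapezoidal rule on [a,b] with n subintervals of width h.
   Endpoint values get weight 1/2; interior points are shared by two
   trapezoids, so they get weight 1. */
double trap(double a, double b, int n, double (*f)(double)) {
    double h = (b - a) / n;
    double sum = (f(a) + f(b)) / 2.0;
    for (int i = 1; i < n; i++)
        sum += f(a + i * h);
    return h * sum;
}

double square(double x) { return x * x; }

int main(void) {
    /* integral of x^2 over [0,1] is 1/3 */
    printf("%f\n", trap(0.0, 1.0, 1000, square));
    return 0;
}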
Finite differences
• On (0,1): $-u'' = f$
• At endpoints: $u(0) = u(1) = 0$
• First difference: $u'(x_i) \approx \frac{u(x_{i+1}) - u(x_i)}{\Delta x}$
• Second difference: $u''(x_i) \approx \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{\Delta x^2}$
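Substituting the second difference into $-u'' = f$ at an interior point $x_i$ gives the algebraic equation behind the matrix row on the next slide:

$$-u(x_{i-1}) + 2u(x_i) - u(x_{i+1}) = \Delta x^2 \, f(x_i), \qquad i = 1, \dots, n-1.$$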
System of Equations
• Algebraic system of equations at each point
• “Nearest neighbor stencil”
• A row of the matrix looks like $(\cdots\ 0 \;\; {-1} \;\; 2 \;\; {-1} \;\; 0\ \cdots)$
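Stacking those rows (shown here for five interior points, with the boundary values $u(0)=u(1)=0$ already eliminated) gives a tridiagonal system:

$$\frac{1}{\Delta x^2}
\begin{pmatrix}
 2 & -1 &    &    &    \\
-1 &  2 & -1 &    &    \\
   & -1 &  2 & -1 &    \\
   &    & -1 &  2 & -1 \\
   &    &    & -1 &  2
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \\ f_5 \end{pmatrix}$$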
Parallel strategy: Integration
• Divide [a,b] into p intervals
• Approximation on each subinterval
• Sum approximations over each processor
  – How do we communicate?
  – Broadcast / reduction (see the C/MPI sketch below)
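A minimal C/MPI sketch of this strategy, assuming each process gets an equal slice of [a,b] and using the same illustrative integrand as before; the single MPI_Reduce call performs the summation:

#include <stdio.h>
#include <mpi.h>

double f(double x) { return x * x; }  /* illustrative integrand */

/* trapezoidal rule on [left,right] with n local subintervals */
double trap(double left, double right, int n) {
    double h = (right - left) / n;
    double sum = (f(left) + f(right)) / 2.0;
    for (int i = 1; i < n; i++)
        sum += f(left + i * h);
    return h * sum;
}

int main(int argc, char **argv) {
    int rank, size;
    double a = 0.0, b = 1.0;   /* global interval */
    int n_per_proc = 1000;     /* subintervals per process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* divide [a,b] into p equal slices; process i owns slice i */
    double width = (b - a) / size;
    double local = trap(a + rank * width, a + (rank + 1) * width, n_per_proc);

    /* reduction: sum the p partial integrals onto rank 0 */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("integral ~= %f\n", total);
    MPI_Finalize();
    return 0;
}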
Parallel strategy: Finite differences
• How do we multiply the matrix by a vector (needed in Krylov subspace methods)?
• Each processor owns:
  – A range of points
  – A range of matrix rows
  – A range of vector entries
• To multiply by a vector (linear array):
  – Share the values at endpoints with neighbors (sketched below)
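A sketch of the endpoint exchange on a linear array, assuming each process stores its owned values in u[1..n_local] with ghost slots u[0] and u[n_local+1] (this layout is an assumption for illustration); MPI_Sendrecv pairs each send with a receive so neighboring exchanges cannot deadlock:

#include <stdio.h>
#include <mpi.h>

/* After the exchange, every interior -1 2 -1 stencil can be applied
   using only local data.  MPI_PROC_NULL turns the boundary ranks'
   extra sends/receives into no-ops. */
void exchange_endpoints(double *u, int n_local, int rank, int size) {
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send my last owned value right; receive left neighbor's into u[0] */
    MPI_Sendrecv(&u[n_local], 1, MPI_DOUBLE, right, 0,
                 &u[0],       1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* send my first owned value left; receive right neighbor's into u[n_local+1] */
    MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                 &u[n_local + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

int main(int argc, char **argv) {
    int rank, size, n_local = 4;
    double u[6];  /* 4 owned values + 2 ghost slots */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    for (int i = 1; i <= n_local; i++) u[i] = rank * n_local + i;
    u[0] = u[n_local + 1] = 0.0;  /* boundary values u(0) = u(1) = 0 */
    exchange_endpoints(u, n_local, rank, size);
    printf("rank %d ghosts: %f %f\n", rank, u[0], u[n_local + 1]);
    MPI_Finalize();
    return 0;
}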
SPMD (Integration)
• Single program running on multiple data
  – Summation over intervals
  – Particular points are different
• Instances of program can talk to each other
  – All must share information at the same time
  – Synchronization
MPI
• Message Passing Interface
• Developed in the 1990s
• Standard for:
  – Sending and receiving messages
  – Collective communication
  – Logical topologies
  – Etc.
Integration in MPI
• Python bindings developed by Pat Miller (LLNL)
• Ignore data types, memory size for now
• Look at sample code
Two versions
• Version 1: explicit send and receive
  – O(p) communication cost (at best); see the sketch below
• Version 2: collective operations
  – Broadcast sends to all processes
  – Reduce collects information to a single process
• Run-time depends on topology and implementation
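A sketch of the explicit version: rank 0 receives one message from each of the other p − 1 processes in turn, which is where the O(p) cost comes from (the local value is a stand-in for a partial integral):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;  /* stand-in for a partial integral */

    if (rank == 0) {
        /* O(p): rank 0 receives p-1 messages one after another */
        double total = local, piece;
        for (int src = 1; src < size; src++) {
            MPI_Recv(&piece, 1, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += piece;
        }
        printf("total = %f\n", total);
    } else {
        MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}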
Fundamental model of a message
• Processor p “sends”
• Processor q “receives”
• Information needed (each maps onto an argument of the calls sketched below):
  – Address (to read from / write to)
  – Amount of data being sent
  – Type of data
  – Tag to screen the messages
  – How much data was actually received?
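A minimal C sketch of that model (run with at least two processes; the buffer contents and tag 42 are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                 /* processor p "sends" */
        double data[3] = {1.0, 2.0, 3.0};
        /*        address, amount, type,       dest, tag */
        MPI_Send(data,     3,      MPI_DOUBLE, 1,    42, MPI_COMM_WORLD);
    } else if (rank == 1) {          /* processor q "receives" */
        double buf[10];              /* may be larger than the message */
        MPI_Status status;
        MPI_Recv(buf, 10, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD, &status);

        int count;                   /* how much data actually arrived? */
        MPI_Get_count(&status, MPI_DOUBLE, &count);
        printf("received %d doubles, first = %f\n", count, buf[0]);
    }
    MPI_Finalize();
    return 0;
}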
MPI Fundamentals
• MPI_COMM_WORLD
• MPI_Init() // import mpi
• MPI_Comm_size() // mpi.size
• MPI_Comm_rank() // mpi.rank
• MPI_Finalize() // N/A
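The same skeleton as a minimal C program, with the pyMPI equivalents from the list above repeated as comments:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* import mpi */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* mpi.size */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* mpi.rank */
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();                         /* no Python equivalent */
    return 0;
}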
MPI Fundamentals
• send
• receive
• non-blocking versions
• broadcast
• reduce
• other collective ops
Communication costs over a network
• Send, broadcast, reduce
  – Linear array
  – Point-to-point
  – Binary tree
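Counting hops gives rough estimates. Writing $t_{\text{msg}}$ for an assumed per-message cost: a single point-to-point send costs $t_{\text{msg}}$, a broadcast relayed down a linear array from an end process takes $p-1$ sequential messages, and a binary-tree broadcast doubles the number of informed processes each round:

$$T_{\text{point}} = t_{\text{msg}}, \qquad T_{\text{linear}}(p) = (p-1)\,t_{\text{msg}}, \qquad T_{\text{tree}}(p) = \lceil \log_2 p \rceil \, t_{\text{msg}}.$$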
Getting started with MPI on CS machines
• Machines available (Debian unstable):
  – bombadil, clark, guts, garfield
• mpich (installed on our system already)
  – mpicc is the compiler (mpic++, mpif77, etc.)
  – mpirun -np x -machinefile hosts executable args
• Download pyMPI & build in home directory
  – http://sourceforge.net/projects/pympi/
  – ./configure --prefix=/home/<you>
  – Builds out of the box (fingers crossed)
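Putting the pieces together, a typical compile-and-run cycle might look like this (trap.c is a hypothetical file holding the integration example from earlier; hosts is your machine file):

mpicc trap.c -o trap
mpirun -np 4 -machinefile hosts ./trap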