Introduction to Message Passing
CMSC 34000
Lecture 3
1/11/05
Class goals
• Parallel hardware and software issues
• First look at actual algorithm (trapezoidal rule for numerical integration)
• Introduction to message passing in MPI
• Networks and the cost of communication
Parallel Hardware (Flynn’s Taxonomy)
• SISD
• MIMD
• SIMD
• MISD
S = Single, M = Multiple
I = Instruction stream
D = Data stream
Von Neumann & Modern
• CPU (control and arithmetic)
• Memory (main & registers)
• Data/instruction transfer bottleneck
• Pipelining (multiple instructions operating simultaneously)
• Vectorizing (single instruction acts on vector registers)
• Cache -- memory hierarchy
SIMD / MIMD
SIMD:
• Single CPU for control
• Many (scalar) ALUs with registers
• One clock

MIMD:
• Many CPUs
• Each has control and ALU
• Memory may be “shared” or “distributed”
• Synchronized?
Shared memory MIMD
• Bus (contention)
• Switch (expensive)
• Cache-coherency?
Distributed memory MIMD
• General interconnection network (e.g. CS Linux system)
• Each processor has its own memory
• To share information, processors must pass (and receive) messages that go over the network.
• Topology is very important
Different network topologies
• Totally connected
• Linear array/ring
• Hypercube
• Mesh/Torus
• Tree / hypertree
• Ethernet…
• And others
Issues to consider
• Routing (shortest path = best-case cost of single message)
• Contention - multiple messages between different processors must share a wire
• Programming: would like libraries that hide all this (somehow)
Numerical integration
• Approximate $\int_a^b f(x)\,dx$
• Using quadrature: $\int_a^b f(x)\,dx \approx h\,f\!\left(\frac{a+b}{2}\right)$
• Repeated subdivision: $\int_a^b f(x)\,dx \approx h \sum_{i=1}^{n} f(x_i)$
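To make the repeated-subdivision formula concrete, here is a minimal serial sketch in C of the composite trapezoidal rule named in the class goals (the names trap and square are illustrative, not from the slides):

#include <stdio.h>

/* Composite trapezoidal rule on [a,b] with n subintervals of width h.
   Endpoint values get weight 1/2; interior points are shared by two
   trapezoids, so they get weight 1. */
double trap(double a, double b, int n, double (*f)(double)) {
    double h = (b - a) / n;
    double sum = (f(a) + f(b)) / 2.0;
    for (int i = 1; i < n; i++)
        sum += f(a + i * h);
    return h * sum;
}

double square(double x) { return x * x; }

int main(void) {
    /* integral of x^2 over [0,1] is 1/3 */
    printf("%f\n", trap(0.0, 1.0, 1000, square));
    return 0;
}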
Finite differences
• On (0,1): $-u'' = f$
• At endpoints: $u(0) = u(1) = 0$
• First difference: $u'(x_i) \approx \frac{u(x_{i+1}) - u(x_i)}{\Delta x}$
• Second difference: $u''(x_i) \approx \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{\Delta x^2}$
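Substituting the second difference into $-u'' = f$ at an interior point $x_i$ gives the algebraic equation behind the matrix row on the next slide:

$$-u(x_{i-1}) + 2u(x_i) - u(x_{i+1}) = \Delta x^2 \, f(x_i), \qquad i = 1, \dots, n-1.$$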
System of Equations
• Algebraic system of equations at each point
• “Nearest neighbor stencil”
• A row of the matrix looks like $(\cdots\ 0 \;\; {-1} \;\; 2 \;\; {-1} \;\; 0\ \cdots)$
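Stacking those rows (shown here for five interior points, with the boundary values $u(0)=u(1)=0$ already eliminated) gives a tridiagonal system:

$$\frac{1}{\Delta x^2}
\begin{pmatrix}
 2 & -1 &    &    &    \\
-1 &  2 & -1 &    &    \\
   & -1 &  2 & -1 &    \\
   &    & -1 &  2 & -1 \\
   &    &    & -1 &  2
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \\ f_5 \end{pmatrix}$$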
Parallel strategy: Integration
• Divide [a,b] into p intervals
• Approximation on each subinterval
• Sum approximations over each processor
  – How do we communicate?
  – Broadcast / reduction (see the C/MPI sketch below)
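A minimal C/MPI sketch of this strategy, assuming each process gets an equal slice of [a,b] and using the same illustrative integrand as before; the single MPI_Reduce call performs the summation:

#include <stdio.h>
#include <mpi.h>

double f(double x) { return x * x; }  /* illustrative integrand */

/* trapezoidal rule on [left,right] with n local subintervals */
double trap(double left, double right, int n) {
    double h = (right - left) / n;
    double sum = (f(left) + f(right)) / 2.0;
    for (int i = 1; i < n; i++)
        sum += f(left + i * h);
    return h * sum;
}

int main(int argc, char **argv) {
    int rank, size;
    double a = 0.0, b = 1.0;   /* global interval */
    int n_per_proc = 1000;     /* subintervals per process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* divide [a,b] into p equal slices; process i owns slice i */
    double width = (b - a) / size;
    double local = trap(a + rank * width, a + (rank + 1) * width, n_per_proc);

    /* reduction: sum the p partial integrals onto rank 0 */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("integral ~= %f\n", total);
    MPI_Finalize();
    return 0;
}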
Parallel strategy: Finite differences
• How do we multiply the matrix by a vector (needed in Krylov subspace methods)?
• Each processor owns:
  – A range of points
  – A range of matrix rows
  – A range of vector entries
• To multiply by a vector (linear array):
  – Share the values at endpoints with neighbors (sketched below)
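A sketch of the endpoint exchange on a linear array, assuming each process stores its owned values in u[1..n_local] with ghost slots u[0] and u[n_local+1] (this layout is an assumption for illustration); MPI_Sendrecv pairs each send with a receive so neighboring exchanges cannot deadlock:

#include <stdio.h>
#include <mpi.h>

/* After the exchange, every interior -1 2 -1 stencil can be applied
   using only local data.  MPI_PROC_NULL turns the boundary ranks'
   extra sends/receives into no-ops. */
void exchange_endpoints(double *u, int n_local, int rank, int size) {
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send my last owned value right; receive left neighbor's into u[0] */
    MPI_Sendrecv(&u[n_local], 1, MPI_DOUBLE, right, 0,
                 &u[0],       1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* send my first owned value left; receive right neighbor's into u[n_local+1] */
    MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                 &u[n_local + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

int main(int argc, char **argv) {
    int rank, size, n_local = 4;
    double u[6];  /* 4 owned values + 2 ghost slots */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    for (int i = 1; i <= n_local; i++) u[i] = rank * n_local + i;
    u[0] = u[n_local + 1] = 0.0;  /* boundary values u(0) = u(1) = 0 */
    exchange_endpoints(u, n_local, rank, size);
    printf("rank %d ghosts: %f %f\n", rank, u[0], u[n_local + 1]);
    MPI_Finalize();
    return 0;
}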
SPMD (Integration)
• Single program running on multiple data
  – Summation over intervals
  – Particular points are different
• Instances of program can talk to each other
  – All must share information at the same time
  – Synchronization
MPI
• Message Passing Interface
• Developed in the 1990s
• Standard for:
  – Sending and receiving messages
  – Collective communication
  – Logical topologies
  – Etc.
Integration in MPI
• Python bindings developed by Pat Miller (LLNL)
• Ignore data types, memory size for now
• Look at sample code
Two versions
• Version 1: explicit send and receive
  – O(p) communication cost (at best); see the sketch below
• Version 2: collective operations
  – Broadcast sends to all processes
  – Reduce collects information to a single process
• Run-time depends on topology and implementation
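A sketch of the explicit version: rank 0 receives one message from each of the other p − 1 processes in turn, which is where the O(p) cost comes from (the local value is a stand-in for a partial integral):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;  /* stand-in for a partial integral */

    if (rank == 0) {
        /* O(p): rank 0 receives p-1 messages one after another */
        double total = local, piece;
        for (int src = 1; src < size; src++) {
            MPI_Recv(&piece, 1, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += piece;
        }
        printf("total = %f\n", total);
    } else {
        MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}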
Fundamental model of a message
• Processor p “sends”
• Processor q “receives”
• Information needed (each maps onto an argument of the calls sketched below):
  – Address (to read from / write to)
  – Amount of data being sent
  – Type of data
  – Tag to screen the messages
  – How much data was actually received?
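A minimal C sketch of that model (run with at least two processes; the buffer contents and tag 42 are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                 /* processor p "sends" */
        double data[3] = {1.0, 2.0, 3.0};
        /*        address, amount, type,       dest, tag */
        MPI_Send(data,     3,      MPI_DOUBLE, 1,    42, MPI_COMM_WORLD);
    } else if (rank == 1) {          /* processor q "receives" */
        double buf[10];              /* may be larger than the message */
        MPI_Status status;
        MPI_Recv(buf, 10, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD, &status);

        int count;                   /* how much data actually arrived? */
        MPI_Get_count(&status, MPI_DOUBLE, &count);
        printf("received %d doubles, first = %f\n", count, buf[0]);
    }
    MPI_Finalize();
    return 0;
}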
MPI Fundamentals
• MPI_COMM_WORLD
• MPI_Init() // import mpi
• MPI_Comm_size() // mpi.size
• MPI_Comm_rank() // mpi.rank
• MPI_Finalize() // N/A
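The same skeleton as a minimal C program, with the pyMPI equivalents from the list above repeated as comments:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* import mpi */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* mpi.size */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* mpi.rank */
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();                         /* no Python equivalent */
    return 0;
}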
MPI Fundamentals
• send
• receive
• non-blocking versions
• broadcast
• reduce
• other collective ops
Communication costs over a network
• Send, broadcast, reduce
  – Linear array
  – Point-to-point
  – Binary tree
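Counting hops gives rough estimates. Writing $t_{\text{msg}}$ for an assumed per-message cost: a single point-to-point send costs $t_{\text{msg}}$, a broadcast relayed down a linear array from an end process takes $p-1$ sequential messages, and a binary-tree broadcast doubles the number of informed processes each round:

$$T_{\text{point}} = t_{\text{msg}}, \qquad T_{\text{linear}}(p) = (p-1)\,t_{\text{msg}}, \qquad T_{\text{tree}}(p) = \lceil \log_2 p \rceil \, t_{\text{msg}}.$$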
Getting started with MPI on CS machines
• Machines available (Debian unstable):
  – bombadil, clark, guts, garfield
• mpich (installed on our system already)
  – mpicc is the compiler (mpic++, mpif77, etc.)
  – mpirun -np x -machinefile hosts executable args
• Download pyMPI & build in home directory
  – http://sourceforge.net/projects/pympi/
  – ./configure --prefix=/home/<you>
  – Builds out of the box (fingers crossed)
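Putting the pieces together, a typical compile-and-run cycle might look like this (trap.c is a hypothetical file holding the integration example from earlier; hosts is your machine file):

mpicc trap.c -o trap
mpirun -np 4 -machinefile hosts ./trap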