
Page 1

Programming the ANDC Cluster

Sudhang Shankar

Page 2

Traditional Programming

● Serial: one instruction at a time, one after the other, on a single CPU.

[Diagram: a problem broken into an instruction stream (t1 ... t6) executed one at a time on a single CPU]

Page 3

The Funky Ishtyle

● Parallel: the problem is split into parts. Each part is represented as a sequence of instructions, and each such sequence runs on a separate CPU.

[Diagram: a problem split into Sub-Problem 1 and Sub-Problem 2, each an instruction stream (t1 ... t3) running on its own CPU (CPU1, CPU2)]

Page 4

Why Parallelise?

● Speed – "many hands make light work"

● Precision/Scale – we can solve bigger problems, with greater accuracy.

Page 5

Parallel Programming Models

● There are several parallel programming models in common use:
– Shared Memory
– Threads
– Message Passing
– Data Parallel

Page 6

Message Passing Model

● The applications currently on the ANDC cluster work on this model

● Tasks use their own local memory during computation

● Tasks exchange data through messages

Page 7

The MPI Standard

● MPI: Message Passing Interface
– A standard, with many implementations
– Codifies "best practices" of the parallel design community
– Implementations:
● LAM/MPI
● MPICH (from Argonne National Laboratory)
● Open MPI

Page 8

How MPI works

● Communicators define which collection of processes may communicate with each other

● Every process in a communicator has a unique rank

● The size of the communicator is the total number of processes in the communicator

Page 9

MPI primitives – Environment Setup

● MPI_Init: initialises the MPI execution environment
● MPI_Comm_size: determines the number of processes in the group associated with a communicator
● MPI_Comm_rank: determines the rank of the calling process within the communicator
● MPI_Finalize: terminates the MPI execution environment
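Below is a minimal sketch (not from the deck) showing the four environment calls together in a complete C program:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);                /* initialise the MPI environment */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();                        /* terminate the MPI environment */
        return 0;
    }

Compile and launch with the usual MPI tools, e.g. mpicc hello.c && mpirun -np 4 ./a.out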

Page 10

MPI primitives – Message Passing

● MPI_Send(buffer, count, type, dest, tag, comm)
● MPI_Recv(buffer, count, type, source, tag, comm, status)
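As an illustrative sketch (the variable names and tag value are mine, not the deck's), rank 0 sends a single integer to rank 1; run with at least 2 processes:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value = 0;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);          /* dest=1, tag=0 */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status); /* source=0, tag=0 */
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }

The tag lets a receiver distinguish kinds of messages; MPI_ANY_SOURCE and MPI_ANY_TAG can be used to relax the matching.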

Page 11

An Example Application

● The Monte Carlo Pi Estimation Algorithm
● AKA "The Dartboard Algorithm"

Page 12

Algorithm Description

● Imagine you have a square "dartboard", with a circle inscribed in it:

[Image: a circle inscribed in a square dartboard]

Page 13

● Randomly throw N darts at the board
● Count the number of HITS (darts landing within the circle); darts landing outside it are FLOPS

[Image: dartboard with hits and flops marked]

Page 14

● pi is the ratio of hits to total throws, multiplied by 4

Why? For a circle of radius r inscribed in a square of side 2r:
Ac = pi * r^2, so pi = Ac / r^2
As = (2r)^2 = 4 * r^2, so r^2 = As / 4
Hence pi = 4 * Ac / As, and the ratio hits/throws estimates Ac/As

[Image: dartboard with hits and flops labelled]
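One possible serial dboard() in C (a sketch: the deck only names the function, so the body and the use of rand() are assumptions):

    #include <stdlib.h>

    /* Throw `darts` darts at the square [-1,1] x [-1,1] and return the
       number landing inside the inscribed unit circle. */
    double dboard(int darts)
    {
        int i, hits = 0;
        for (i = 0; i < darts; i++) {
            double x = 2.0 * rand() / RAND_MAX - 1.0;  /* random x in [-1,1] */
            double y = 2.0 * rand() / RAND_MAX - 1.0;  /* random y in [-1,1] */
            if (x * x + y * y <= 1.0)                  /* hit: inside the circle */
                hits++;
        }
        return (double)hits;  /* double, to match the MPI_DOUBLE sends later */
    }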

Page 15

Parallel Version

● Make each worker throw an equal number of darts
● Each worker counts its HITS
● The Master adds up all the individual HITS
● It then computes pi as:

pi = (4.0)*(HITS)/N

Page 16

To Make it Faster...

● Increase the number of workers p, while keeping N constant
● Each worker deals with (N/p) throws
● So the greater the value of p, the fewer throws a worker handles
● Fewer throws => faster

Page 17

To Make it "Better"

● Increase the number of throws N
● This makes the estimate more accurate

Page 18

MPI

● For each task, run the dartboard algorithm:

homehits = dboard(DARTS);

● Workers send homehits to the master:

if (taskid != MASTER)
    MPI_Send(&homehits, 1, MPI_DOUBLE, MASTER,
             mtype, MPI_COMM_WORLD);

Page 19

● Master gets the homehits values from the workers:

for (i = 0; i < p; i++) {
    rc = MPI_Recv(&hitrecv, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
                  mtype, MPI_COMM_WORLD, &status);
    totalhits = totalhits + hitrecv;
}

● Master calculates pi as:

pi = (4.0)*(totalhits)/N
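Assembled into one runnable program, the fragments above might look like this (a sketch: the DARTS value, the tag mtype, and the master counting its own throws are assumptions; the send/receive structure follows the slides):

    #include <stdio.h>
    #include <mpi.h>

    #define MASTER 0
    #define DARTS  1000000          /* throws per task (assumed value) */

    double dboard(int darts);       /* the Monte Carlo routine sketched earlier */

    int main(int argc, char *argv[])
    {
        int taskid, numtasks, i, mtype = 1;
        double homehits, hitrecv, totalhits, pi;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
        MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

        homehits = dboard(DARTS);   /* every task, master included, throws darts */

        if (taskid != MASTER) {
            MPI_Send(&homehits, 1, MPI_DOUBLE, MASTER, mtype, MPI_COMM_WORLD);
        } else {
            totalhits = homehits;
            for (i = 0; i < numtasks - 1; i++) {   /* one receive per worker */
                MPI_Recv(&hitrecv, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
                         mtype, MPI_COMM_WORLD, &status);
                totalhits += hitrecv;
            }
            pi = 4.0 * totalhits / ((double)DARTS * numtasks);  /* N = DARTS * numtasks */
            printf("Estimated pi = %f\n", pi);
        }

        MPI_Finalize();
        return 0;
    }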

Page 20

MapReduce

● Framework for simplifying the development of parallel programs
● Developed at Google
● FLOSS implementations:
– Hadoop (Java) from Yahoo
– Disco (Erlang) from Nokia
– Dumbo from Audioscrobbler
– Many others (including a 36-line Ruby one!)

Page 21

MapReduce

● The MapReduce library requires the user to implement:
– Map(): takes as input a function and a sequence of values, and applies the function to each value in the sequence
– Reduce(): combines all the elements of a sequence using a binary operation
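The two ideas can be illustrated in plain C (this is just the functional map/fold pattern, not the MapReduce library's actual API; all names here are illustrative):

    #include <stdio.h>

    static void map(int (*f)(int), const int *in, int *out, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            out[i] = f(in[i]);            /* apply f to each value */
    }

    static int reduce(int (*op)(int, int), const int *in, int n, int init)
    {
        int i, acc = init;
        for (i = 0; i < n; i++)
            acc = op(acc, in[i]);         /* combine with the binary operation */
        return acc;
    }

    static int square(int x)     { return x * x; }
    static int add(int a, int b) { return a + b; }

    int main(void)
    {
        int v[4] = {1, 2, 3, 4}, sq[4];
        map(square, v, sq, 4);
        printf("%d\n", reduce(add, sq, 4, 0));  /* prints 30 = 1 + 4 + 9 + 16 */
        return 0;
    }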

Page 22

How it works (oversimplified)

● map() takes as input a set of <key,value> pairs and produces a set of <intermediate key,value> pairs
● This is all done in parallel, across many machines
● The parallelisation is done by the MapReduce library (the programmer doesn't have to think about it)

Page 23

● The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to reduce()

● A Reduce() instance takes a set of <intermediate key,value> pairs and produces an output value for that key, like a "summary" value

Page 24

Pi Estimation in MapReduce

● Here, map() is the dartboard algorithm
● Each worker runs the algorithm. Hits are represented as <1,no_of_hits> and flops as <0,no_of_flops>
● Thus each Map() instance returns two <boolean,count> pairs to the MapReduce library

Page 25

● The library then clumps all the <bool,count> pairs into two "sets": one for key 0 and one for key 1, and passes them to reduce()

● Reduce() then adds up the counts for each key to produce a grand total. Thus we know the total hits and total flops. These are output as <key, grand_total> pairs to the master process.

● The master then finds pi as pi = 4*(hits)/(hits+flops)
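A single-machine sketch of this scheme in C (no real MapReduce library is used; the names and the WORKERS/DARTS values are illustrative, and the map tasks run serially here rather than across machines):

    #include <stdio.h>
    #include <stdlib.h>

    #define WORKERS 4
    #define DARTS   250000

    typedef struct { int key; long count; } pair;   /* the <boolean,count> pair */

    /* map: throw DARTS darts, emit <1,hits> and <0,flops> */
    static void map_task(pair out[2])
    {
        long i, hits = 0;
        for (i = 0; i < DARTS; i++) {
            double x = 2.0 * rand() / RAND_MAX - 1.0;
            double y = 2.0 * rand() / RAND_MAX - 1.0;
            if (x * x + y * y <= 1.0)
                hits++;
        }
        out[0].key = 1; out[0].count = hits;          /* <1, no_of_hits>  */
        out[1].key = 0; out[1].count = DARTS - hits;  /* <0, no_of_flops> */
    }

    /* reduce: sum all counts that share a key, producing the grand total */
    static long reduce_sum(const pair *all, int n, int key)
    {
        int i;
        long total = 0;
        for (i = 0; i < n; i++)
            if (all[i].key == key)
                total += all[i].count;
        return total;
    }

    int main(void)
    {
        pair all[2 * WORKERS];
        long hits, flops;
        int w;
        for (w = 0; w < WORKERS; w++)
            map_task(&all[2 * w]);                   /* each "worker" emits two pairs */
        hits  = reduce_sum(all, 2 * WORKERS, 1);
        flops = reduce_sum(all, 2 * WORKERS, 0);
        printf("pi ~= %f\n", 4.0 * hits / (hits + flops));
        return 0;
    }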

Page 26

Other Solutions

● PVM
● OpenMP
● LINDA
● Occam
● Parallel/Scientific Python

Page 27

Problems/Limitations

● Parallel Slowdown: parallelising a program beyond a certain point causes it to run slower

● The Amdahl Principle: parallel speedup is limited by the sequential fraction of the program
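As a quick worked example (the numbers are mine, not from the deck): if a fraction f of the program can be parallelised, Amdahl's bound on the speedup with p processors is

S(p) = 1 / ((1 - f) + f/p)

With f = 0.95 and p = 64, S = 1 / (0.05 + 0.95/64) ~= 15.4; even with infinitely many processors, S can never exceed 1/0.05 = 20.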

Page 28

Applications

● Compute clusters are used whenever we have:
– lots and lots of data to process
– too little time to work sequentially

Page 29

Finance

● Risk Assessment
– India's NSE uses a Linux cluster to monitor the risk of its members
– Broker crosses the VaR limit => account disabled
– VaR (Value at Risk) is calculated in real time using PRISM (Parallel Risk Management System), which uses MPI
– NSE's PRISM handles 500 trades/sec and can scale to 1000 trades/sec

Page 30

Molecular Dynamics

● Given a collection of atoms, we'd like to calculate how they interact and move under realistic laboratory conditions

● The expensive part is determining the force on each atom, since it depends on the positions of all the other atoms in the system
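That cost is easy to see in code: with all-pairs interactions the force step is O(N^2). A toy 1D sketch (the inverse-square pair force and all names are made up for illustration; real MD codes work in 3D and use cutoffs and neighbour lists):

    #define N 1000

    void compute_forces(const double pos[N], double force[N])
    {
        int i, j;
        for (i = 0; i < N; i++)
            force[i] = 0.0;
        for (i = 0; i < N; i++)
            for (j = i + 1; j < N; j++) {        /* every pair: ~N*N/2 iterations */
                double r = pos[j] - pos[i];
                double f = 1.0 / (r * r);        /* toy pair-force magnitude */
                force[i] += (r > 0 ? f : -f);    /* i pulled towards j */
                force[j] -= (r > 0 ? f : -f);    /* equal and opposite on j */
            }
    }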

Page 31

● Software:
– GROMACS: helps scientists simulate the behaviour of large molecules (like proteins, lipids, and even polymers)
– PyMOL: a molecular graphics and modelling package which can also be used to generate animated sequences

[Image: raytraced lysozyme structure created with PyMOL]

Page 32

Other Distributed Problems

● Rendering multiple frames of high-quality animation (e.g. Shrek)

● Indexing the web (e.g. Google)

● Data Mining

Page 33

Questions?