7/29/2019 Basic of Parallel Algorithm Design
This lecture uses some of the slides for Chapters 3 and 5 accompanying Introduction to Parallel Computing, Addison Wesley, 2003.
http://www-users.cs.umn.edu/~karypis/parbook
Preliminaries: Decomposition, Tasks, and Dependency Graphs

A parallel algorithm should decompose the problem into tasks that can be executed concurrently.
A decomposition can be illustrated as a directed graph (task dependency graph): nodes = tasks, edges = dependencies, i.e., the result of one task is required for processing the next.
The degree of concurrency determines the number of tasks that can actually run in parallel.
The critical path is the longest path in the dependency graph; its length is a lower bound on the program's runtime.
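The definitions above can be made concrete with a small sketch (not from the slides; the graph and task costs are illustrative): computing the longest path over a topological order of a task dependency graph.

```python
# Sketch: length of the critical path in a small task dependency graph.
# Edge (u, v) means task v needs the result of task u; every task is
# assumed to cost 1 unit, so path length is counted in tasks.
from collections import defaultdict

def critical_path_length(edges, num_tasks):
    """Longest path in the DAG -- a lower bound on parallel runtime."""
    succ = defaultdict(list)
    indeg = [0] * num_tasks
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm: a task becomes ready once all dependencies finish.
    ready = [t for t in range(num_tasks) if indeg[t] == 0]
    longest = [1] * num_tasks          # longest path ending at each task
    while ready:
        u = ready.pop()
        for v in succ[u]:
            longest[v] = max(longest[v], longest[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return max(longest)

# Four independent tasks (0..3) feeding a pairwise reduction (4, 5, 6):
edges = [(0, 4), (1, 4), (2, 5), (3, 5), (4, 6), (5, 6)]
print(critical_path_length(edges, 7))  # -> 3
```

In this example the degree of concurrency is 4 (tasks 0 to 3 can run at once), but the critical path of length 3 means the graph can never finish in fewer than 3 steps, no matter how many processors are available.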
Example: Multiplying a Dense Matrix with a Vector
The computation of each element of the output vector y is independent of the other elements, so a dense matrix-vector product can be decomposed into n independent tasks.
Observations: while the tasks share data (namely, the vector b), they have no control dependencies, i.e., no task needs to wait for the (partial) completion of any other. All tasks are of the same size in terms of number of operations. Is this the maximum number of tasks we could decompose this problem into?
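A minimal sketch of this row-wise decomposition (serial code, but structured so each task is an independent unit; the matrix and vector below are illustrative):

```python
# Sketch: y = A*b decomposed into n independent row tasks.
# Each task computes one output element; all tasks share b (data
# dependency) but none waits for another (no control dependencies).

def row_task(A, b, i):
    """Task i: compute output element y[i] = dot(A[i], b)."""
    return sum(A[i][j] * b[j] for j in range(len(b)))

def matvec(A, b):
    n = len(A)
    # The n calls below are mutually independent, so they could be
    # dispatched to n processors concurrently.
    return [row_task(A, b, i) for i in range(n)]

A = [[1, 2], [3, 4]]
b = [10, 1]
print(matvec(A, b))  # -> [12, 34]
```

This is not the finest possible decomposition: each individual multiplication A[i][j] * b[j] could itself be a task, at the price of extra tasks to sum the partial products.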
Multiplying a dense matrix with a vector (2n CPUs available)

(Figure: Task 1, Task 2, ... mapped onto 2n CPUs.)
On what kind of platform will we have 2n processors?
Multiplying a dense matrix with a vector (2n CPUs available)

(Figure: Tasks 1 through 2n and Tasks 2n+1 through 3n mapped onto 2n CPUs.)
Granularity of Task Decompositions

The size of the tasks into which a problem is decomposed is called its granularity.
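To make the notion concrete, here is a sketch (my own illustration, not from the slides) of a coarser-grained version of the matrix-vector product: instead of one task per output element, one task handles a whole block of rows, so there are only p tasks, each doing more work.

```python
# Sketch: coarse-grained matrix-vector product with p block tasks
# instead of n element tasks (larger granularity, fewer tasks).

def block_task(A, b, rows):
    """One coarse task: compute y[i] for every row index i in `rows`."""
    return [sum(A[i][j] * b[j] for j in range(len(b))) for i in rows]

def matvec_blocked(A, b, p):
    n = len(A)
    blocks = [range(k, n, p) for k in range(p)]  # p round-robin row blocks
    y = [0] * n
    # The p block tasks are independent and could run on p processors.
    for rows in blocks:
        for i, val in zip(rows, block_task(A, b, rows)):
            y[i] = val
    return y

A = [[1, 0], [0, 2]]
print(matvec_blocked(A, [3, 4], p=2))  # -> [3, 8]
```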
Parallelization efficiency
Guidelines for good parallel algorithm design

Maximize concurrency.
Spread tasks evenly between processors to avoid idling and achieve load balancing.
Execute tasks on the critical path as soon as their dependencies are satisfied.
Minimize (or overlap) communication between processors.
Evaluating a parallel algorithm

Speedup: S = Tserial / Tparallel
Efficiency: E = S / p, where p is the number of processors
Cost: C = p * Tparallel
Scalability: speedup (efficiency) as a function of the number of processors, with fixed task size.
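These metrics are direct one-line formulas; a small sketch with hypothetical timing numbers (chosen only for illustration):

```python
# Sketch: the evaluation metrics above as helper functions.
# The runtimes below are made-up example values, not measurements.

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    return speedup(t_serial, t_parallel) / p

def cost(t_parallel, p):
    return p * t_parallel

t_serial, t_parallel, p = 100.0, 25.0, 8
print(speedup(t_serial, t_parallel))        # -> 4.0
print(efficiency(t_serial, t_parallel, p))  # -> 0.5
print(cost(t_parallel, p))                  # -> 200.0
```

Note that here the cost (200.0) exceeds the serial time (100.0): the 8 processors are only 50% efficient, so half the processor-time is wasted.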
Simple example: summing n numbers on p CPUs
Summing n numbers on p CPUs

Parallel Time = Θ((n/p) log p)
Serial Time = Θ(n)
Speedup = Θ(p / log p)
Efficiency = Θ(1 / log p)
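A serial simulation of the algorithm behind these bounds (my own sketch): each of the p CPUs sums its n/p slice, then the p partial sums are combined pairwise in log2(p) steps, as in a tree reduction.

```python
# Sketch: summing n numbers on p "CPUs", simulated serially.
# Phase 1: each CPU sums its n/p slice.
# Phase 2: partial sums are combined pairwise in log2(p) steps.
import math

def parallel_sum(xs, p):
    n = len(xs)
    chunk = math.ceil(n / p)
    partials = [sum(xs[k * chunk:(k + 1) * chunk]) for k in range(p)]
    steps = 0
    while len(partials) > 1:  # pairwise combining: log2(p) steps
        partials = [partials[i] + (partials[i + 1] if i + 1 < len(partials) else 0)
                    for i in range(0, len(partials), 2)]
        steps += 1
    return partials[0], steps

total, steps = parallel_sum(list(range(16)), p=4)
print(total, steps)  # -> 120 2
```

With p = 4 the combining phase takes log2(4) = 2 steps; the local sums dominate when n is much larger than p.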
Parallel multiplication of 3 matrices

A, B, C: square matrices of size N x N.
We want to compute A x B x C in parallel.
How do we achieve the maximum performance?
(Figure: TEMP = A x B, then Result = TEMP x C.)
Try 12-CPU parallelization:
(Figure: C' = A x B, then Result = C' x C.)
Try 23-CPU parallelization:
Is this a good way?
(Figure: task dependency graph for Result(1,1) with 2 x 2 blocks:
T(1,1) = A(1,1)*B(1,1) + A(1,2)*B(2,1)
T(1,2) = A(1,1)*B(1,2) + A(1,2)*B(2,2)
Result(1,1) = T(1,1)*C(1,1) + T(1,2)*C(2,1).)
Dynamic task creation - Simple Master-Worker

Identify independent computations.
Create a queue of tasks.
Each process picks tasks from the queue and may insert new tasks into the queue.
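The scheme above can be sketched with Python threads and a shared `queue.Queue` (a minimal illustration; the task kinds and values are invented for the example, not taken from the slides):

```python
# Sketch: master-worker with dynamic task creation. Workers pull tasks
# from a shared queue; a task may push follow-up tasks back onto it.
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        item = tasks.get()
        if item is None:          # sentinel: shut this worker down
            tasks.task_done()
            break
        kind, value = item
        if kind == "square":
            with lock:            # results list is shared, so guard it
                results.append(value * value)
        elif kind == "split":     # dynamic creation: enqueue new tasks
            for v in range(value):
                tasks.put(("square", v))
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
tasks.put(("split", 4))           # expands into squares of 0..3
tasks.join()                      # wait until the queue drains
for _ in workers:
    tasks.put(None)               # one sentinel per worker
for w in workers:
    w.join()
print(sorted(results))  # -> [0, 1, 4, 9]
```

`Queue.join()` works here because the "split" task enqueues its children before calling `task_done()`, so the unfinished-task count never drops to zero while work remains.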
Hypothetical implementation by multiple threads

Let's assign a separate thread to each multiply/sum operation.
How do we coordinate the threads?
Initialize threads to know their dependencies.
Build producer-consumer-like logic for every thread.
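One way to sketch this coordination (an illustration under my own assumptions; the operand values and node names m1, m2, s are invented) is to give each operation thread an event it sets when done, and have consumer threads wait on their producers' events:

```python
# Sketch: one thread per multiply/sum operation, coordinated with
# threading.Event. Each consumer knows its dependencies and blocks
# until the corresponding producers have published their results.
import threading

values = {}
done = {name: threading.Event() for name in ("m1", "m2", "s")}

def multiply(name, a, b):
    values[name] = a * b
    done[name].set()              # announce: consumers may proceed

def add(name, dep1, dep2):
    done[dep1].wait()             # block until both producers finish
    done[dep2].wait()
    values[name] = values[dep1] + values[dep2]
    done[name].set()

threads = [
    threading.Thread(target=add, args=("s", "m1", "m2")),  # consumer first
    threading.Thread(target=multiply, args=("m1", 2, 3)),
    threading.Thread(target=multiply, args=("m2", 4, 5)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(values["s"])  # -> 26
```

Starting the consumer before its producers is deliberate: the events make the schedule correct regardless of thread start order, which is exactly the coordination problem the slide raises.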