Parallel algorithms Parallel and Distributed Computing Wrocław, 07.05.2010 Paweł Duda

Page 1: Parallel algorithms

Parallel algorithms

Parallel and Distributed Computing, Wrocław, 07.05.2010

Paweł Duda

Page 2: Parallel algorithms

Parallel algorithm – definition

A parallel algorithm is an algorithm that has been specifically written for execution on a computer with two or more processors.

Page 3: Parallel algorithms

Parallel algorithms

can also be run on a computer with a single processor (one with multiple functional units, pipelined functional units, or pipelined memory systems)

Page 4: Parallel algorithms

Modelling algorithms 1

when designing an algorithm, take into account the cost of communication and the number of processors (efficiency)

the designer usually uses an abstract model of computation called the parallel random-access machine (PRAM)

each CPU operation = one step

model’s advantages

Page 5: Parallel algorithms

Modelling algorithms 2 - PRAM

neglects such issues as synchronisation and communication

no limit on the number of processors in the machine

any memory location is uniformly accessible from any processor

no limit on the amount of shared memory in the system

Page 6: Parallel algorithms

Modelling algorithms 3 - PRAM

no conflict in accessing resources

programs written for these machines are generally MIMD

Page 7: Parallel algorithms

Multiprocessor model

Page 8: Parallel algorithms

Multiprocessor model

Page 9: Parallel algorithms

Work-depth model

How can the cost of an algorithm be calculated?
Work: W
Depth: D
P = W/D: the parallelism of the algorithm

Picture: Summing 16 numbers on a tree. The total depth (longest chain of dependencies) is 4 and the total work (number of operations) is 15.
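
The tree summation in the picture can be reproduced with a short Python sketch. The function name `tree_sum` and its counters are illustrative assumptions, not part of the slides, and the input is assumed to be non-empty. It adds neighbouring values level by level and counts the additions (work) and the levels (depth).

```python
def tree_sum(values):
    """Sum a non-empty sequence pairwise, level by level (illustrative sketch)."""
    level = list(values)
    work = 0   # total number of additions performed
    depth = 0  # number of levels = longest chain of dependent additions
    while len(level) > 1:
        next_level = []
        # Additions within one level do not depend on each other,
        # so a parallel machine could perform them in a single step.
        for i in range(0, len(level) - 1, 2):
            next_level.append(level[i] + level[i + 1])
            work += 1
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element is carried up unchanged
        level = next_level
        depth += 1
    return level[0], work, depth


total, work, depth = tree_sum(range(1, 17))  # 16 numbers, as in the picture
parallelism = work / depth                   # P = W / D
```

For 16 inputs the levels perform 8, 4, 2 and 1 additions, which gives W = 15 and D = 4, matching the picture.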

Page 10: Parallel algorithms

Mergesort

Conceptually, a merge sort works as follows:
- input: sequence of n keys
- output: sorted sequence of n keys

If the list is of length 1, then it is already sorted.

Otherwise:

• Divide the unsorted list into two sublists of about half the size.
• Sort each sublist recursively by re-applying merge sort.
• Merge the two sublists back into one sorted list.
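
The three steps above can be sketched in Python as follows. This is a sequential sketch; the names `merge` and `merge_sort` are illustrative. The comment marks where a parallel version could split the work, since the two recursive calls do not depend on each other.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(keys):
    """Return a sorted copy of keys."""
    if len(keys) <= 1:
        return list(keys)     # a list of length 0 or 1 is already sorted
    mid = len(keys) // 2
    # The two recursive calls are independent of each other;
    # a parallel version could run them as separate tasks.
    left = merge_sort(keys[:mid])
    right = merge_sort(keys[mid:])
    return merge(left, right)
```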

Page 11: Parallel algorithms

Mergesort

Page 12: Parallel algorithms

General-purpose computing on graphics processing units (GPGPU)

• a recent trend
• GPUs as co-processors
• linear algebra matrix operations

Picture: Nvidia's Tesla GPGPU card

Page 13: Parallel algorithms

Matrix multiplication

Algorithm: MATRIX_MULTIPLY(A, B)
  (l, m) := dimensions(A)
  (m, n) := dimensions(B)
  in parallel for i ∊ [0..l) do
    in parallel for j ∊ [0..n) do
      Rij := sum( { Aik * Bkj : k ∊ [0..m) } )
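
A minimal Python sketch of MATRIX_MULTIPLY, assuming matrices are stored as non-empty lists of rows. Each row of R is computed as a separate task by a standard-library thread pool to mirror the "in parallel for" over i. The helper name `compute_row` is an assumption for illustration. In CPython, threads are unlikely to speed up pure-Python arithmetic, so this shows the structure of the algorithm rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def matrix_multiply(A, B):
    """Return R = A * B for matrices given as lists of rows."""
    l, m = len(A), len(A[0])
    m_b, n = len(B), len(B[0])
    if m != m_b:
        raise ValueError("inner dimensions of A and B do not match")

    def compute_row(i):
        # Every R[i][j] depends only on row i of A and column j of B,
        # so all entries can be computed independently of one another.
        return [sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]

    # One task per row i; pool.map returns the rows in order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compute_row, range(l)))
```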

Page 14: Parallel algorithms

Matrix multiplication

We need log n matrix multiplications, each taking time O(n^3).

The serial complexity of this procedure is O(n^3 log n).

This algorithm is not optimal, since the best known algorithms have complexity O(n^3).

Page 15: Parallel algorithms

Search

Dynamic creation of tasks and channels during program execution

Looking for nodes corresponding to ‘solutions’

Initially a task created for the root of the tree

procedure search(A)
begin
  if (solution(A)) then
    score = eval(A);
    report solution and score
  else
    foreach child A(i) of A
      search(A(i))
    endfor
  endif
end
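
The search procedure can be written as a sequential Python sketch. The `Node` class and the `is_solution`, `evaluate` and `found` parameters are assumptions for illustration. In the parallel version described on this slide, each recursive call would become a dynamically created task.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; the slides do not define a node structure."""
    value: int
    children: list = field(default_factory=list)


def search(node, is_solution, evaluate, found):
    """Collect (value, score) pairs for every solution node reached."""
    if is_solution(node):
        found.append((node.value, evaluate(node)))  # report solution and score
    else:
        for child in node.children:
            # In the parallel version a new task would be created here.
            search(child, is_solution, evaluate, found)
    return found
```

As in the pseudocode, the search does not descend below a node that is itself a solution.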

Page 16: Parallel algorithms

Shortest-Path Algorithms

Picture: A simple directed graph, G, and its adjacency matrix, A. 

The all-pairs shortest-path problem involves finding the shortest path between all pairs of vertices in a graph.

A graph G = (V, E) comprises a set V of N vertices {vi} and a set E ⊆ V × V of edges.

In a directed graph, the edges (vi, vj) and (vj, vi), i ≠ j, are distinct.

Page 17: Parallel algorithms

Floyd’s algorithm

Floyd’s algorithm is a graph analysis algorithm for finding shortest paths in a weighted graph.

A single execution of the algorithm will find the shortest paths between all pairs of vertices.
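
A minimal sequential sketch of Floyd's algorithm in Python, assuming the graph is given as an adjacency matrix A in which A[i][j] is the edge weight, `INF` marks a missing edge, A[i][i] is 0, and there are no negative cycles. The names `floyd` and `INF` are illustrative.

```python
INF = float("inf")  # marks "no edge" in the adjacency matrix


def floyd(A):
    """Return the matrix S of shortest-path lengths between all pairs."""
    n = len(A)
    S = [row[:] for row in A]  # work on a copy, leave the input intact
    for k in range(n):
        # After step k, S[i][j] is the shortest path from i to j that
        # uses only vertices 0..k as intermediate vertices.
        for i in range(n):
            for j in range(n):
                if S[i][k] + S[k][j] < S[i][j]:
                    S[i][j] = S[i][k] + S[k][j]
    return S
```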

Page 18: Parallel algorithms

Parallel Floyd’s algorithm 1

The first parallel Floyd algorithm is based on a one-dimensional, rowwise domain decomposition of the intermediate matrix I and the output matrix S.

The algorithm can use at most N processors.

Each task has one or more adjacent rows of I and is responsible for performing computation on those rows.

Page 19: Parallel algorithms

Parallel Floyd’s algorithm 1

Parallel version of Floyd's algorithm based on a one-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous block of rows. In (b), the data required by this task in the k-th step of the algorithm are shaded: its own block and the k-th row.
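
The row-wise decomposition can be simulated in Python as a sketch. The tasks are simulated by a plain loop rather than real processes, and the names `floyd_rowwise`, `num_tasks` and `bounds` are assumptions for illustration. Each simulated task owns a contiguous block of rows and, in step k, reads only its own rows plus a copy of row k, which stands in for the broadcast of that row.

```python
INF = float("inf")  # marks "no edge" in the adjacency matrix


def floyd_rowwise(A, num_tasks):
    """Floyd's algorithm with rows split into contiguous blocks per task."""
    n = len(A)
    S = [row[:] for row in A]
    num_tasks = max(1, min(num_tasks, n))  # more than n tasks cannot be used
    bounds = [(t * n // num_tasks, (t + 1) * n // num_tasks)
              for t in range(num_tasks)]
    for k in range(n):
        row_k = S[k][:]  # stands in for broadcasting row k to all tasks
        for lo, hi in bounds:
            # Work of one task: it touches only its own rows and row_k,
            # so the tasks could run concurrently within step k.
            for i in range(lo, hi):
                s_ik = S[i][k]
                for j in range(n):
                    if s_ik + row_k[j] < S[i][j]:
                        S[i][j] = s_ik + row_k[j]
    return S
```

Row k does not change during step k when the diagonal is 0 and there are no negative cycles, which is why a copy taken at the start of the step is sufficient.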

Page 20: Parallel algorithms

Parallel Floyd’s algorithm 2

An alternative parallel version of Floyd's algorithm uses a two-dimensional decomposition of the various matrices.

This version allows the use of up to N^2 processors.

Page 21: Parallel algorithms

Parallel Floyd’s algorithm 2

Parallel version of Floyd's algorithm based on a two-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous submatrix. In (b), the data required by this task in the k-th step of the algorithm are shaded: its own block, and part of the k-th row and column.
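
The two-dimensional decomposition can be simulated in the same way. The sketch below assumes a q × q grid of tasks; the names `floyd_2d`, `q` and `cuts` are illustrative, and the tasks are simulated by loops rather than real processes. Each task owns a contiguous submatrix and, in step k, needs only the parts of row k and column k that line up with its block.

```python
INF = float("inf")  # marks "no edge" in the adjacency matrix


def floyd_2d(A, q):
    """Floyd's algorithm with the matrix split into a q x q grid of blocks."""
    n = len(A)
    S = [row[:] for row in A]
    q = max(1, min(q, n))                       # at most n blocks per dimension
    cuts = [t * n // q for t in range(q + 1)]   # block boundaries
    for k in range(n):
        row_k = S[k][:]                         # k-th row at the start of step k
        col_k = [S[i][k] for i in range(n)]     # k-th column at the start of step k
        for bi in range(q):
            for bj in range(q):
                # Work of task (bi, bj): it uses its own block plus the
                # matching segments of col_k (its rows) and row_k (its columns).
                for i in range(cuts[bi], cuts[bi + 1]):
                    for j in range(cuts[bj], cuts[bj + 1]):
                        via_k = col_k[i] + row_k[j]
                        if via_k < S[i][j]:
                            S[i][j] = via_k
    return S
```

With q = N every task owns a single matrix element, which corresponds to the N^2 processor limit of this decomposition.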

Page 22: Parallel algorithms

Thank you for your attention