
Page 1: Parallel algorithms


Parallel Algorithms
DESIGN AND ANALYSIS OF ALGORITHMS

Page 2: Parallel algorithms


Group Members Arsalan Ali Daim (BSCS14068)

Danish Javed (BSCS14028)

Muhammad Hamza (BSCS14062)

Page 3: Parallel algorithms


Outline
1. What is a Parallel Algorithm?

2. Its Abilities (throughput and latency).

3. Why Parallel Computing?

4. Parallel Algorithms.

5. Limitations for Parallel Algorithms.

Page 4: Parallel algorithms


What is a Parallel Algorithm?
A parallel algorithm is an algorithm that has been specifically written for execution on a computer with two or more processors. It can, however, also be run on a computer with a single processor

(one with multiple functional units, pipelined functional units, or pipelined memory systems).

Page 5: Parallel algorithms


What makes Parallel Algorithms better?
• Throughput: the number of operations completed per unit of time.

• Latency: the time needed to complete one operation.

Parallel execution mainly improves throughput; the latency of an individual operation may stay the same.

Page 6: Parallel algorithms


Why Parallel Computing?
The Real World is Massively Parallel:
• In the natural world, many complex, interrelated events are happening at the same time, yet within a temporal sequence.
• Compared to serial computing, parallel computing is much better suited for modeling, simulating, and understanding complex, real-world phenomena.

Page 7: Parallel algorithms


Why Parallel Computing?
SOLVE LARGER / MORE COMPLEX PROBLEMS:
• Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory.
• Example: web search engines/databases processing millions of transactions per second.

TAKE ADVANTAGE OF NON-LOCAL RESOURCES:
• Use computer resources on a wide area network, or even the Internet, when local computer resources are scarce or insufficient.

Page 8: Parallel algorithms


Hardware implementation for Parallel Algorithms (the PRAM model)

In the PRAM (Parallel Random Access Machine) model, processors communicate by reading from and writing to shared memory locations.
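A minimal C++ sketch of this idea (an illustrative emulation, not taken from the slides): the processors are emulated by threads, the shared memory by ordinary arrays, and joining the threads stands in for the PRAM's synchronous step boundary.

#include <cstdio>
#include <thread>
#include <vector>

// One synchronous PRAM step, emulated: each "processor" i reads the
// shared cell in[i] and writes the shared cell out[i] (exclusive
// read/write, since every processor touches only its own cells).
int main() {
    std::vector<int> in = {1, 2, 3, 4};
    std::vector<int> out(4);
    std::vector<std::thread> procs;
    for (int i = 0; i < 4; ++i)
        procs.emplace_back([&in, &out, i] { out[i] = 2 * in[i]; });
    for (auto& t : procs) t.join();           // the step's barrier
    for (int x : out) std::printf("%d ", x);  // prints: 2 4 6 8
    std::printf("\n");
}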

Page 9: Parallel algorithms


Classification of the PRAM MODEL
PRAM is classified into two basic types:

1. Concurrent PRAM: several processors may access the same memory cell in one step (the standard CRCW variant)

2. Exclusive PRAM: at most one processor may access a given cell in a step (the standard EREW variant)

Combinations of the two also exist, such as CREW (concurrent read, exclusive write).

Page 10: Parallel algorithms


Parallel Algorithms
1. Odd-Even Transposition Sort

2. Parallel Merge Sort

3. Computing the Sum of a Sequence in parallel

There are many more…

Page 11: Parallel algorithms


Odd-Even Transposition Sort
 A variation of bubble sort.
 Operates in two alternating phases, an even phase and an odd phase.
 Even phase: even-numbered processes exchange numbers with their right neighbors.
 Odd phase: odd-numbered processes exchange numbers with their right neighbors.

Page 12: Parallel algorithms


Odd-Even Transposition Sort – Example

Parallel time complexity: T_par = O(n) (for P = n)

Page 13: Parallel algorithms


Odd-Even Transposition Sort – Algorithm
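A minimal C++ sketch of the algorithm (a sequential emulation; in the parallel setting, each comparison-exchange within a phase runs on its own processor):

#include <algorithm>
#include <cstdio>
#include <vector>

// Odd-even transposition sort: n alternating even/odd phases. In each
// phase, disjoint neighbor pairs are compare-exchanged; the inner loop
// emulates the one-processor-per-pair parallel step.
void oddEvenTranspositionSort(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    for (int phase = 0; phase < n; ++phase) {
        // Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        for (int i = phase % 2; i + 1 < n; i += 2)
            if (a[i] > a[i + 1])
                std::swap(a[i], a[i + 1]);   // exchange with right neighbor
    }
}

int main() {
    std::vector<int> a = {5, 2, 8, 1, 9, 3};
    oddEvenTranspositionSort(a);
    for (int x : a) std::printf("%d ", x);   // prints: 1 2 3 5 8 9
    std::printf("\n");
}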

Page 14: Parallel algorithms


Odd-Even Transposition Sort
 Assuming our array of n elements to sort is very large, we will be working with many virtual processors emulated on the same physical processor, so that there is one process per element.

Page 15: Parallel algorithms


Merge Sort
 An example of a divide-and-conquer algorithm.
 To sort a vector, first divide it into two parts.
 Apply the same method again to each part.
 When both parts are sorted, with m and n elements respectively, merge them to produce a single sorted vector.
 The average complexity is T(n) = O(n log n).
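A minimal sequential C++ sketch of this method, using std::inplace_merge for the merge step (function names are illustrative):

#include <algorithm>
#include <vector>

// Sequential merge sort on the half-open range [lo, hi): divide the
// vector in two, sort each part recursively, then merge the parts.
void mergeSort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo <= 1) return;               // a single element is sorted
    std::size_t mid = lo + (hi - lo) / 2;   // divide into two parts
    mergeSort(a, lo, mid);
    mergeSort(a, mid, hi);
    // Merge the sorted halves [lo, mid) and [mid, hi) in place.
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}

Calling mergeSort(v, 0, v.size()) sorts the whole vector v.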

Page 16: Parallel algorithms


Parallel Merge Sort
Divided into two tasks:
1. Divide the list
2. Conquer the list

Page 17: Parallel algorithms


Parallel Merge Sort
 Divide the list onto different processors,
 following a simple binary tree structure.

Page 18: Parallel algorithms


Parallel Merge Sort
 Merge the elements as they come back together,
 following the same tree structure in reverse.

Page 19: Parallel algorithms


Parallel Merge Sort – Algorithm

ALGORITHM: mergesort(A)
    if |A| = 1 then return A
    else
        in parallel do
            L := mergesort(A[0 .. |A|/2))
            R := mergesort(A[|A|/2 .. |A|))
        return merge(L, R)
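A minimal runnable C++ rendering of this pseudocode, with std::async standing in for "in parallel do" (a real implementation would fall back to sequential sorting below some size threshold rather than spawn a task per node):

#include <algorithm>
#include <future>
#include <vector>

// Parallel merge sort on [lo, hi): the two recursive calls run in
// parallel (the left half on a new task, the right half on the current
// thread), then the sorted halves are merged, as in merge(L, R).
void parallelMergeSort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo <= 1) return;              // |A| = 1: already sorted
    std::size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async,
                           [&a, lo, mid] { parallelMergeSort(a, lo, mid); });
    parallelMergeSort(a, mid, hi);
    left.wait();                           // join the left half
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}

int main() {
    std::vector<int> a = {9, 4, 7, 1, 3, 8, 2, 6};
    parallelMergeSort(a, 0, a.size());     // a becomes {1,2,3,4,6,7,8,9}
}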

Page 20: Parallel algorithms


Parallel Merge Sort – Complexity

Sequential merge sort = O(n log n)

In parallel, we have n processors:

log n time is required to divide the sequence

log n time is required to merge it back (treating each level of the tree as one parallel step)

log n + log n = 2 log n

So, T(n) = O(log n)

Page 21: Parallel algorithms


Computing the Sum of a Sequence

 Consider a sequence A of n elements.

 Devise an algorithm that performs many operations in parallel.

 In parallel, each element of A with an even index is paired and summed with the next element of A: A[0] is paired with A[1], A[2] with A[3], and so on.

 The result is a new sequence of ⌈n/2⌉ numbers. This pairing-and-summing step can be repeated until, after ⌈log2 n⌉ steps, a sequence consisting of a single value is produced, and this value is equal to the final sum.

 Sequentially, the time complexity is O(n), but using this parallel technique it is reduced to O(log2 n).
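A minimal C++ sketch of this pairwise summing, with a loop emulating the parallel processors (each round performs all its additions "at once" and halves the sequence):

#include <cstdio>
#include <vector>

// Pairwise parallel sum: each round sums element 2i with element 2i+1
// (one addition per processor), producing ceil(n/2) values; after
// ceil(log2 n) rounds a single value, the total, remains.
int pairwiseSum(std::vector<int> a) {
    std::size_t n = a.size();
    while (n > 1) {
        for (std::size_t i = 0; i < n / 2; ++i)
            a[i] = a[2 * i] + a[2 * i + 1];   // A[2i] paired with A[2i+1]
        if (n % 2 == 1) a[n / 2] = a[n - 1];  // odd leftover carries over
        n = (n + 1) / 2;                      // ceil(n/2) values remain
    }
    return a[0];
}

int main() {
    std::printf("%d\n", pairwiseSum({1, 2, 3, 4, 5}));   // prints: 15
}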

Page 22: Parallel algorithms


The Limitations and Problems
• Data Dependency

• Race Condition

• Resource Requirements

• Scalability

• Parallel Slowdown

Page 23: Parallel algorithms


Data Dependency
Results from multiple uses of the same location(s) in storage by different tasks.

e.g.

for (int i = 1; i < 100; i++)
    array[i] = array[i-1] * 20;   // iteration i reads what iteration i-1 wrote

Because each iteration depends on the previous one, these iterations cannot safely run in parallel. Shared-memory architectures must synchronize read/write operations between tasks.

Page 24: Parallel algorithms


Race Condition
Consider two threads, A and B, that each increment a shared variable in three instructions: read the variable (1), add one to it (2), and write it back (3). If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data. This is known as a race condition.
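A minimal C++ demonstration of such a race (the shared counter is deliberately left unsynchronized; protecting it with std::mutex, or declaring it std::atomic<int>, removes the race):

#include <cstdio>
#include <thread>

int v = 0;   // shared variable, intentionally unprotected

// Each thread runs the three-instruction sequence described above:
// (1) read V, (2) add 1 to it, (3) write it back. If one thread's
// read lands between another's read and write, an increment is lost.
void increment() {
    for (int i = 0; i < 100000; ++i) {
        int tmp = v;     // instruction 1: read V
        tmp = tmp + 1;   // instruction 2: add 1
        v = tmp;         // instruction 3: write V back
    }
}

int main() {
    std::thread a(increment), b(increment);
    a.join();
    b.join();
    // Expected 200000; with the race, the printed value is typically less.
    std::printf("v = %d\n", v);
}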

Page 25: Parallel algorithms


Resource Requirements
• The primary intent of parallel programming is to decrease execution wall-clock time; however, to accomplish this, more CPU time is required. For example, a parallel code that runs in 1 hour on 8 processors actually uses 8 hours of CPU time.

• The amount of memory required can be greater for parallel codes than for serial codes, due to the need to replicate data and the overheads associated with parallel support libraries and subsystems.

Page 26: Parallel algorithms


Scalability
Two types of scaling, based on time to solution:

◦ Strong scaling: the total problem size stays fixed as more processors are added.
◦ Weak scaling: the problem size per processor stays fixed as more processors are added.

Hardware factors play a significant role in scalability. Examples:
◦ Memory-CPU bus bandwidth
◦ Amount of memory available on any given machine or set of machines
◦ Processor clock speed

Page 27: Parallel algorithms


Parallel Slowdown
• Not all parallelization results in speed-up.

• Once a task is split into multiple threads, those threads may spend a large amount of time communicating with each other, degrading overall performance.

• This is known as parallel slowdown.
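A rough illustration of the effect (a sketch, not a benchmark; exact timings vary by machine): summing a small array with one thread per chunk can easily be slower than the plain sequential loop, because thread startup and result combination outweigh the tiny per-element work.

#include <chrono>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Sum 1000 elements sequentially, then with four threads. The work per
// element is tiny, so thread creation/joining dominates: a typical case
// of parallel slowdown.
int main() {
    std::vector<long> data(1000, 1);

    auto t0 = std::chrono::steady_clock::now();
    long seq = std::accumulate(data.begin(), data.end(), 0L);
    auto t1 = std::chrono::steady_clock::now();

    long partial[4] = {0, 0, 0, 0};
    std::vector<std::thread> pool;
    for (int p = 0; p < 4; ++p)
        pool.emplace_back([&data, &partial, p] {
            for (std::size_t i = 250u * p; i < 250u * (p + 1); ++i)
                partial[p] += data[i];   // each thread sums its chunk
        });
    for (auto& t : pool) t.join();
    long par = partial[0] + partial[1] + partial[2] + partial[3];
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::printf("sequential: sum=%ld, %lld us\n", seq,
                (long long)std::chrono::duration_cast<us>(t1 - t0).count());
    std::printf("threaded:   sum=%ld, %lld us\n", par,
                (long long)std::chrono::duration_cast<us>(t2 - t1).count());
}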

Page 28: Parallel algorithms


Parallel Slowdown – Example
I have observed a few such attempts where the parallel code used the Threading Building Blocks library (TBB). Much to the experimenters' astonishment, not only did their simple parallel programs sometimes show no reasonable speedup, they could even be slower than their sequential counterparts!

Conclusion: when developing programs with TBB, take into account that using TBB classes and functions may impact compiler optimizations, which hits simple algorithms with a small amount of work per iteration especially hard. Proper use of local variables helps optimization and improves parallel speedup.

For further info: https://software.intel.com/en-us/blogs/2008/03/04/why-a-simple-test-can-get-parallel-slowdown

Page 29: Parallel algorithms


The End.

Thank you (شکریہ)