CSCI-455/552 Introduction to High Performance Computing Lecture 11

Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, © 2004 Pearson Education Inc. All rights reserved.

Bucket Sort

The interval is divided into m equal regions. One “bucket” is assigned to hold the numbers that fall within each region. The numbers in each bucket are then sorted using a sequential sorting algorithm.

Sequential sorting time complexity: O(n log(n/m)) for n numbers and m buckets.

Works well if the original numbers are uniformly distributed across a known interval, say 0 to a - 1.
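The sequential algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bucket_sort` and its parameters are hypothetical, the inputs are assumed to lie in the interval 0 to a - 1, and Python's built-in `sorted` stands in for the sequential sorting algorithm used inside each bucket.

```python
def bucket_sort(numbers, m, a):
    """Sort numbers known to lie in [0, a) using m equal-width buckets."""
    buckets = [[] for _ in range(m)]
    for x in numbers:
        # Index of the region that x falls into; min() guards against
        # floating-point rounding at the top edge of the interval.
        i = min(int(x * m / a), m - 1)
        buckets[i].append(x)
    result = []
    for bucket in buckets:
        # Each bucket holds about n/m numbers when the input is uniform.
        result.extend(sorted(bucket))
    return result


example = bucket_sort([7, 3, 9, 0, 5, 2, 8, 1], m=4, a=10)
```

Because the buckets cover increasing regions, concatenating the sorted buckets yields the fully sorted sequence.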

Parallel Version of Bucket Sort

Simple approach: assign one processor to each bucket.

Further Parallelization

Partition the sequence into m regions, one region for each processor (so m = p, the number of processors).

Each processor maintains p “small” buckets and separates the numbers in its region into its own small buckets.

The small buckets are then emptied into p final buckets for sorting, which requires each processor to send one small bucket to each of the other processors (bucket i to processor i).
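The three phases above can be simulated sequentially to see how the data moves. The sketch below is not a message-passing program: the p "processors" are simply list indices, the name `parallel_bucket_sort` is hypothetical, and the inputs are assumed to lie in the interval 0 to a - 1.

```python
def parallel_bucket_sort(numbers, p, a):
    """Simulate the small-bucket scheme for p processors on values in [0, a)."""
    n = len(numbers)
    size = -(-n // p)  # ceiling of n / p: length of each processor's region

    # Phase 1: partition the sequence into p contiguous regions.
    regions = [numbers[r * size:(r + 1) * size] for r in range(p)]

    # Phase 2: processor r separates its region into p small buckets.
    small = [[[] for _ in range(p)] for _ in range(p)]
    for r, region in enumerate(regions):
        for x in region:
            small[r][min(int(x * p / a), p - 1)].append(x)

    # Phase 3: small bucket i of every processor is sent to processor i.
    final = [[x for r in range(p) for x in small[r][i]] for i in range(p)]

    # Phase 4: each processor sorts its final bucket; concatenate the results.
    return [x for bucket in final for x in sorted(bucket)]


data = [13, 2, 7, 19, 0, 11, 5, 16, 8, 3, 14, 1]
result = parallel_bucket_sort(data, p=4, a=20)
```

Phase 3 is the only step that would need communication between processors in a real implementation.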

Another Parallel Version of Bucket Sort

Introduces a new message-passing operation: all-to-all broadcast.

“all-to-all” Broadcast Routine

Sends data from each process to every other process.

The “all-to-all” routine actually transfers rows of an array to columns: it transposes a matrix.
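The transpose view can be made concrete with a small sketch. Here `send[i][j]` is assumed to be the item that process i sends to process j, and the hypothetical helper `all_to_all` returns what each process ends up holding. It models only the data movement, not an actual message-passing call.

```python
def all_to_all(send):
    """Return recv where recv[j][i] == send[i][j] (row i becomes column i)."""
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]


# Row i lists the small buckets held by process i before the exchange.
send = [["a0", "a1", "a2"],
        ["b0", "b1", "b2"],
        ["c0", "c1", "c2"]]
recv = all_to_all(send)
```

After the exchange, process j holds item j from every process, which is exactly what the bucket sort needs: all the small buckets numbered j.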

Parallel Bucket and Sample Sort

• The critical aspect of the above algorithm is one of assigning ranges to processors. This is done by suitable splitter selection.

• The splitter selection method divides the n elements into p blocks of size n/p each, and sorts each block by using quicksort.

• From each sorted block it chooses p – 1 evenly spaced elements.

• The p(p – 1) elements selected from all the blocks represent the sample used to determine the buckets.

• This scheme guarantees that the number of elements ending up in each bucket is roughly uniform (less than 2n/p).

Parallel Bucket and Sample Sort

An example of the execution of sample sort on an array with 24 elements on three processes.

Parallel Bucket and Sample Sort

• The splitter selection scheme can itself be parallelized.

• Each processor generates the p – 1 local splitters in parallel.

• All processors share their splitters using a single all-to-all broadcast operation.

• Each processor sorts the p(p – 1) elements it receives and selects p – 1 uniformly spaced splitters from them.
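A sequential sketch of the splitter selection follows. Several details are assumptions, not taken from the slides: the names `select_splitters` and `bucket_sizes` are hypothetical, p is assumed to divide n, `sorted` stands in for the per-block quicksort, and the exact indices used for "evenly spaced" elements are one reasonable choice among several.

```python
import bisect


def select_splitters(data, p):
    """Return p - 1 global splitters chosen from a sample of p(p - 1) elements."""
    block = len(data) // p  # assumes p divides len(data)
    sample = []
    for r in range(p):
        # Each processor sorts its own block of n/p elements.
        local = sorted(data[r * block:(r + 1) * block])
        # ...and contributes p - 1 evenly spaced local splitters.
        sample.extend(local[k * block // p] for k in range(1, p))
    # Every processor sorts the combined sample and picks the same splitters.
    sample.sort()
    return [sample[k * (p - 1)] for k in range(1, p)]


def bucket_sizes(data, splitters):
    """Count how many elements fall into each of the len(splitters) + 1 buckets."""
    sizes = [0] * (len(splitters) + 1)
    for x in data:
        sizes[bisect.bisect_left(splitters, x)] += 1
    return sizes


data = [(7 * i) % 24 for i in range(24)]  # a permutation of 0..23
splitters = select_splitters(data, p=3)
sizes = bucket_sizes(data, splitters)
```

For this input n = 24 and p = 3, so the bucket sizes in `sizes` can be compared against the bound 2n/p = 16.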

Parallel Complexity Analysis

Numerical Integration Using Rectangles

Each region is calculated using an approximation given by rectangles.

(Figure: aligning the rectangles.)
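A minimal sketch of the rectangle approximation is given below. The alignment is an assumption: each rectangle takes its height from the function value at the midpoint of its strip. The name `integrate_rectangles` is hypothetical.

```python
def integrate_rectangles(f, a, b, n):
    """Approximate the integral of f over [a, b] with n equal-width rectangles."""
    delta = (b - a) / n
    # Height of rectangle i is f evaluated at the midpoint of strip i.
    return delta * sum(f(a + (i + 0.5) * delta) for i in range(n))


area = integrate_rectangles(lambda x: x * x, 0.0, 3.0, 1000)  # exact value is 9
```

The strips are independent of one another, so in a parallel version each processor could sum its own group of rectangles before the partial sums are combined.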

Numerical Integration Using Trapezoidal Method

May not be better!
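One way to see why the trapezoidal method may not be better is to compare it with midpoint-aligned rectangles on a simple curve. The sketch below is an illustration only: the function names are hypothetical and the comparison is made for f(x) = x² on [0, 3], whose exact integral is 9.

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal-width strips."""
    delta = (b - a) / n
    inner = sum(f(a + i * delta) for i in range(1, n))
    return delta * (0.5 * (f(a) + f(b)) + inner)


def midpoint_rectangles(f, a, b, n):
    """Rectangles whose heights are taken at the midpoint of each strip."""
    delta = (b - a) / n
    return delta * sum(f(a + (i + 0.5) * delta) for i in range(n))


def square(x):
    return x * x


err_trapezoid = abs(trapezoid(square, 0.0, 3.0, 10) - 9.0)
err_rectangles = abs(midpoint_rectangles(square, 0.0, 3.0, 10) - 9.0)
```

For this curve and the same number of strips, the trapezoidal error comes out larger than the midpoint-rectangle error.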

Adaptive Quadrature

Solution adapts to the shape of the curve. Use three areas, A, B, and C. Computation is terminated when the larger of A and B is sufficiently close to the sum of the remaining two areas.
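The slide's figure defining A, B, and C is not reproduced here, so the sketch below uses a common substitute test and not the exact A/B/C comparison: an interval is accepted when the single-trapezoid estimate is close to the sum of the two half-interval trapezoids, and is otherwise split in two. The name `adaptive_quad`, the tolerance handling, and the depth limit are all assumptions.

```python
import math


def adaptive_quad(f, a, b, tol=1e-6, depth=50):
    """Adaptive trapezoidal quadrature: refine only where the curve demands it."""
    mid = 0.5 * (a + b)
    whole = 0.5 * (b - a) * (f(a) + f(b))      # one trapezoid over [a, b]
    left = 0.5 * (mid - a) * (f(a) + f(mid))   # trapezoid over the left half
    right = 0.5 * (b - mid) * (f(mid) + f(b))  # trapezoid over the right half
    # Terminate when splitting no longer changes the estimate appreciably.
    if depth == 0 or abs(left + right - whole) < tol:
        return left + right
    # Otherwise recurse on each half with half the tolerance.
    return (adaptive_quad(f, a, mid, tol / 2, depth - 1)
            + adaptive_quad(f, mid, b, tol / 2, depth - 1))


area = adaptive_quad(math.sin, 0.0, math.pi)  # exact value is 2
```

The two recursive calls are independent, which is what makes the method a candidate for parallel execution, although the amount of work per subinterval is not known in advance.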

Adaptive Quadrature with False Termination

Some care might be needed in choosing when to terminate.

The test might cause us to terminate early when the two large regions are the same (i.e., C = 0).
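An analogous failure can be shown with a trapezoid-based termination test. This is a stand-in for the A/B/C test, not the same test, and the helper name `estimates` is hypothetical. For f(x) = sin²(x) on [0, 2π], the function is (numerically) zero at both endpoints and at the midpoint, so the coarse and refined estimates agree even though both are far from the true area, which is π.

```python
import math


def estimates(f, a, b):
    """Return (one-trapezoid estimate, sum of the two half-interval trapezoids)."""
    mid = 0.5 * (a + b)
    whole = 0.5 * (b - a) * (f(a) + f(b))
    halves = (0.5 * (mid - a) * (f(a) + f(mid))
              + 0.5 * (b - mid) * (f(mid) + f(b)))
    return whole, halves


def f(x):
    return math.sin(x) ** 2


whole, halves = estimates(f, 0.0, 2.0 * math.pi)
terminates_immediately = abs(halves - whole) < 1e-6
error = abs(halves - math.pi)  # distance from the true integral
```

One possible safeguard, offered here only as a suggestion, is to force a minimum number of subdivisions before the termination test is allowed to succeed.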