49
CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

  • View
    232

  • Download
    1

Embed Size (px)

Citation preview

Page 1: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

CSCI 4440 / 8446

Parallel ComputingThree Sorting Algorithms

Page 2: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Outline

Sorting problemSequential quicksortParallel quicksortHyperquicksortParallel sorting by regular sampling

Page 3: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Sorting Problem

Permute: unordered sequence ordered sequenceTypically key (value being sorted) is part of record with additional values (satellite data)Most parallel sorts designed for theoretical parallel models: not practicalOur focus: internal sorts based on comparison of keys

Page 4: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Sequential Quicksort

17 14 65 4 22 63 11

Unordered list of values

Page 5: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Sequential Quicksort

17 14 65 4 22 63 11

Choose pivot value

Page 6: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Sequential Quicksort

1714 654 22 6311

Low list( 17)

High list(> 17)

Page 7: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Sequential Quicksort

174 6511 22 6314

Recursivelyapply quicksortto low list

Page 8: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Sequential Quicksort

174 2211 63 6514

Recursivelyapply quicksortto high list

Page 9: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Sequential Quicksort

174 2211 63 6514

Sorted list of values

Page 10: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Attributes of Sequential Quicksort

Average-case time complexity: (n log n)Worst-case time complexity: (n2)

Occurs when low, high lists maximally unbalanced at every partitioning step

Can make worst-case less probable by using sampling to choose pivot value

Example: “Median of 3” technique

Page 11: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Quicksort Good Starting Point for Parallel Algorithm

SpeedGenerally recognized as fastest sort in average casePreferable to base parallel algorithm on fastest sequential algorithm

Natural concurrencyRecursive sorts of low, high lists can be done in parallel

Page 12: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Definitions of “Sorted”

Definition 1: Sorted list held in memory of a single processorDefinition 2:

Portion of list in every processor’s memory is sortedValue of last element on Pi’s list is less than or equal to value of first element on Pi+1’s list

We adopt Definition 2: Allows problem size to scale with number of processors

Page 13: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

75, 91, 15, 64, 21, 8, 88, 54

50, 12, 47, 72, 65, 54, 66, 22

83, 66, 67, 0, 70, 98, 99, 82

20, 40, 89, 47, 19, 61, 86, 85

P0

P1

P2

P3

Page 14: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

75, 91, 15, 64, 21, 8, 88, 54

50, 12, 47, 72, 65, 54, 66, 22

83, 66, 67, 0, 70, 98, 99, 82

20, 40, 89, 47, 19, 61, 86, 85

P0

P1

P2

P3

Process P0 chooses and broadcastsrandomly chosen pivot value

Page 15: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

75, 91, 15, 64, 21, 8, 88, 54

50, 12, 47, 72, 65, 54, 66, 22

83, 66, 67, 0, 70, 98, 99, 82

20, 40, 89, 47, 19, 61, 86, 85

P0

P1

P2

P3

Exchange “lower half” and “upper half” values”

Page 16: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

75, 15, 64, 21, 8, 54, 66, 67, 0, 70

50, 12, 47, 72, 65, 54, 66,22, 20, 40, 47, 19, 61

83, 98, 99, 82, 91, 88

89, 86, 85

P0

P1

P2

P3

After exchange step

Lower“half”

Upper“half”

Page 17: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

75, 15, 64, 21, 8, 54, 66, 67, 0, 70

50, 12, 47, 72, 65, 54, 66,22, 20, 40, 47, 19, 61

83, 98, 99, 82, 91, 88

89, 86, 85

P0

P1

P2

P3

Processes P0 and P2 choose andbroadcast randomly chosen pivots

Lower“half”

Upper“half”

Page 18: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

75, 15, 64, 21, 8, 54, 66, 67, 0, 70

50, 12, 47, 72, 65, 54, 66,22, 20, 40, 47, 19, 61

83, 98, 99, 82, 91, 88

89, 86, 85

P0

P1

P2

P3

Exchange values

Lower“half”

Upper“half”

Page 19: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

15, 21, 8, 0, 12, 20, 19

50, 47, 72, 65, 54, 66, 22, 40,47, 61, 75, 64, 54, 66, 67, 70

83, 82, 91, 88, 89, 86, 85

98, 99

P0

P1

P2

P3

Exchange values

Lower “half”of lower “half”

Lower “half”of upper “half”

Upper “half”of lower “half”

Upper “half”of upper “half”

Page 20: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Quicksort

0, 8, 12, 15, 19, 20, 21

22, 40, 47, 47, 50, 54, 54, 61,64, 65, 66, 66, 67, 70, 72, 75

82, 83, 85, 86, 88, 89, 91

98, 99

P0

P1

P2

P3

Each processor sorts values it controls

Lower “half”of lower “half”

Lower “half”of upper “half”

Upper “half”of lower “half”

Upper “half”of upper “half”

Page 21: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Analysis of Parallel Quicksort

Execution time dictated by when last process completesAlgorithm likely to do a poor job balancing number of elements sorted by each processCannot expect pivot value to be true medianCan choose a better pivot value

Page 22: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

Start where parallel quicksort ends: each process sorts its sublistFirst “sortedness” condition is metTo meet second, processes must still exchange valuesProcess can use median of its sorted list as the pivot valueThis is much more likely to be close to the true median

Page 23: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

75, 91, 15, 64, 21, 8, 88, 54

50, 12, 47, 72, 65, 54, 66, 22

83, 66, 67, 0, 70, 98, 99, 82

20, 40, 89, 47, 19, 61, 86, 85

P0

P1

P2

P3

Number of processors is a power of 2

Page 24: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

19, 20, 40, 47, 61, 85, 86, 89

P0

P1

P2

P3

Each process sorts values it controls

Page 25: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

8, 15, 21, 54, 64, 75, 91, 88

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

19, 20, 40, 47, 61, 85, 86, 89

P0

P1

P2

P3

Process P0 broadcasts its median value

Page 26: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

8, 15, 21, 54, 64, 75, 91, 88

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

19, 20, 40, 47, 61, 85, 86, 89

P0

P1

P2

P3

Processes will exchange “low”, “high” lists

Page 27: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

0, 8, 15, 21, 54

12, 19, 20, 22, 40, 47, 47, 50, 54

64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99

61, 65, 66, 72, 85, 86, 89

P0

P1

P2

P3

Processes merge kept and received values.

Page 28: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

0, 8, 15, 21, 54

12, 19, 20, 22, 40, 47, 47, 50, 54

64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99

61, 65, 66, 72, 85, 86, 89

P0

P1

P2

P3

Processes P0 and P2 broadcast median values.

Page 29: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

0, 8, 15, 21, 54

12, 19, 20, 22, 40, 47, 47, 50, 54

64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99

61, 65, 66, 72, 85, 86, 89

P0

P1

P2

P3

Communication pattern for second exchange

Page 30: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Hyperquicksort

0, 8, 12, 15

19, 20, 21, 22, 40, 47, 47, 50, 54, 54

61, 64, 65, 66, 66, 67, 70, 72, 75, 82

83, 85, 86, 88, 89, 91, 98, 99

P0

P1

P2

P3

After exchange-and-merge step

Page 31: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Complexity Analysis Assumptions

Average-case analysisLists stay reasonably balancedCommunication time dominated by message transmission time, rather than message latency

Page 32: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Complexity Analysis

Initial quicksort step has time complexity ((n/p) log (n/p))Total comparisons needed for log p merge steps: ((n/p) log p)Total communication time for log p exchange steps: ((n/p) log p)

Page 33: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Isoefficiency Analysis

Sequential time complexity: (n log n)Parallel overhead: (n log p)Isoefficiency relation:n log n C n log p log n C log p n pC

The value of C determines the scalability. Scalability depends on ratio of communication speed to computation speed.

1//)( CCC pppppM

Page 34: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Another Scalability Concern

Our analysis assumes lists remain balancedAs p increases, each processor’s share of list decreasesHence as p increases, likelihood of lists becoming unbalanced increasesUnbalanced lists lower efficiencyWould be better to get sample values from all processes before choosing median

Page 35: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Parallel Sorting by Regular Sampling (PSRS Algorithm)

Each process sorts its share of elementsEach process selects regular sample of sorted listOne process gathers and sorts samples, chooses pivot values from sorted sample list, and broadcasts these pivot valuesEach process partitions its list into p pieces, using pivot valuesEach process sends partitions to other processesEach process merges its partitions

Page 36: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

75, 91, 15, 64, 21, 8, 88, 54

50, 12, 47, 72, 65, 54, 66, 22

83, 66, 67, 0, 70, 98, 99, 82

P0

P1

P2

Number of processors does nothave to be a power of 2.

Page 37: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

Each process sorts its list using quicksort.

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 38: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

Each process chooses p regular samples.

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 39: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

One process collects p2 regular samples.

15, 54, 75, 22, 50, 65, 66, 70, 83

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 40: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

One process sorts p2 regular samples.

15, 22, 50, 54, 65, 66, 70, 75, 83

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 41: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

One process chooses p-1 pivot values.

15, 22, 50, 54, 65, 66, 70, 75, 83

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 42: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

One process broadcasts p-1 pivot values.

15, 22, 50, 54, 65, 66, 70, 75, 83

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 43: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

Each process divides list, based on pivotvalues.

8, 15, 21, 54, 64, 75, 88, 91

12, 22, 47, 50, 54, 65, 66, 72

0, 66, 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 44: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

Each process sends partitions tocorrect destination process.

8, 15, 21 12, 22, 47, 50 0

54, 64 54, 65, 66 66

75, 88, 91 72 67, 70, 82, 83, 98, 99

P0

P1

P2

Page 45: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

PSRS Algorithm

Each process merges p partitions.

0, 8, 12, 15, 21, 22, 47, 50

54, 54, 64, 65, 66, 66

67, 70, 72, 75, 82, 83, 88, 91, 98, 99

P0

P1

P2

Page 46: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Assumptions

Each process ends up merging close to n/p elementsExperimental results show this is a valid assumptionProcessor interconnection network supports p simultaneous message transmissions at full speed4-ary hypertree is an example of such a network

Page 47: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Time Complexity Analysis

ComputationsInitial quicksort: ((n/p)log(n/p))Sorting regular samples: (p2 log p)Merging sorted sublists: ((n/p)log pOverall: ((n/p)(log n + log p) + p2log p)

CommunicationsGather samples, broadcast pivots: (log p)All-to-all exchange: (n/p)Overall: (n/p)

Page 48: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Isoefficiency Analysis

Sequential time complexity: (n log n)Parallel overhead: (n log p)Isoefficiency relation:n log n Cn log p log n C log pScalability function same as for hyperquicksortScalability depends on ratio of communication to computation speeds

Page 49: CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms

Summary

Three parallel algorithms based on quicksortKeeping list sizes balanced

Parallel quicksort: poorHyperquicksort: betterPSRS algorithm: excellent

Average number of times each key moved:Parallel quicksort and hyperquicksort: log p / 2PSRS algorithm: (p-1)/p