
Page 1: CS 361 – Chapters 8-9 Sorting algorithms –Selection, insertion, bubble, “swap” –Merge, quick, stooge –Counting, bucket, radix How to select the n-th largest/smallest

CS 361 – Chapters 8-9

• Sorting algorithms
  – Selection, insertion, bubble, “swap”
  – Merge, quick, stooge
  – Counting, bucket, radix

• How to select the n-th largest/smallest element without sorting.

Page 2

The problem

• Arrange items in a sequence so that their keys are in ascending or descending order.

• Long history of research. Many new algorithms are tweaks of famous sorting algorithms.

• Some methods do better depending on how the values are distributed, the size of the data, the nature of the underlying hardware (parallel processing, memory hierarchy), etc.

• Some implementations given on the class Web site and also:

http://cg.scs.carleton.ca/~morin/misc/sortalg

Page 3

Some methods

• Selection sort: Find the largest value and swap it into first position, find 2nd largest value and put it 2nd, etc.

• Bubble sort: Scan the list and see which consecutive values are out of order, and swap them. Multiple passes are required.

• Insertion sort: Place the next element in the correct place by shifting other ones over to make room. We maintain a boundary between the sorted and unsorted parts of the list.

• Merge sort: Split list in half until just 1-2 elements. Merge adjacent lists by collating them.
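As an illustration of the split-and-collate idea, here is a minimal Python sketch (the function name `merge_sort` and the list-returning style are choices for this example, not from the slides):

```python
def merge_sort(a):
    """Recursively split the list in half, then merge the sorted halves."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Merge (collate) the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one of these two is already empty
    merged.extend(right[j:])
    return merged
```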

Page 4

Analysis

• What is the best possible (Ω) running time for any sorting algorithm?

• Selection, bubble, and insertion sort all run in O(n²) time.
  – Why?
  – Among the three: which is the best, and which is the worst?

• Merge sort runs in O(n log2 n) time.
  – If we imagine the tree of recursive calls, the nested calls go about log2 n deep. At each level, we must do O(1) work at each of the n values.
  – Later, we will use a more systematic approach to compute the complexity of recursive algorithms.

Page 5

Quick sort

• Like merge sort, it’s a recursive algorithm based on divide and conquer.

• Call quickSort initially with parameters (array, 0, n – 1).

quickSort(a, p, r):
    if p < r:
        q = partition(a, p, r)
        quickSort(a, p, q)
        quickSort(a, q+1, r)

• What makes quick sort distinctive is its partitioning. See handout.

Page 6

QS partitioning

• Given a sub-array spanning indices p..r
• Let x = value of the first element here, i.e. a[p]
• We want to put smaller values on the left side and larger values on the right side of this array slice.
• We return the location of the boundary between the low and high regions.

• Practice with handout!
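For concreteness, here is one Python rendering of the quickSort pseudocode together with a Hoare-style partition (pivot x = a[p], returning the low/high boundary q). This matches the (a, p, q) / (q+1, r) recursion on the previous slide, though the handout's exact partition code may differ:

```python
def partition(a, p, r):
    """Hoare partition: pivot x = a[p]; returns boundary q such that
    every element of a[p..q] is <= every element of a[q+1..r]."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:     # scan right side inward past large values
            j -= 1
        i += 1
        while a[i] < x:     # scan left side inward past small values
            i += 1
        if i >= j:
            return j        # boundary between low and high regions
        a[i], a[j] = a[j], a[i]

def quick_sort(a, p, r):
    if p < r:
        q = partition(a, p, r)
        quick_sort(a, p, q)
        quick_sort(a, q + 1, r)
```

Note that with a Hoare partition the first recursive call is on (p, q), not (p, q-1): the pivot is not guaranteed to be at index q.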

Page 7

Counting sort

• Designed to run in linear time
• Works well when the range of values is not large
• Find how many values are less than x = a[i]. That will tell you where x belongs in the sorted output array.

for i = 1 to k:
    C[i] = 0
for i = 1 to n:          // Let C[x] = # elements == x
    ++ C[A[i]]
for i = 2 to k:          // Let C[x] = # elements <= x
    C[i] += C[i – 1]
for i = n downto 1:      // Put sorted values into B.
    B[C[A[i]]] = A[i]
    -- C[A[i]]           // try: 3,6,4,1,3,4,1,4
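A Python version of the same idea, shifted to 0-based indexing (the slide's pseudocode counts from 1); `counting_sort` and its value-range parameter k are illustrative names:

```python
def counting_sort(a, k):
    """Sort a list of integers in the range 0..k in O(n + k) time."""
    count = [0] * (k + 1)
    for v in a:                  # count[v] = # elements == v
        count[v] += 1
    for i in range(1, k + 1):    # count[v] = # elements <= v
        count[i] += count[i - 1]
    out = [0] * len(a)
    for v in reversed(a):        # right-to-left keeps the sort stable
        count[v] -= 1
        out[count[v]] = v
    return out
```

Running it on the slide's example 3, 6, 4, 1, 3, 4, 1, 4 (with k = 6) produces 1, 1, 3, 3, 4, 4, 4, 6.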

Page 8

Bucket sort

• Assume the array’s values are (more or less) evenly distributed over some range.
• Create n buckets, each covering 1/n of the range.
• Insert each a[i] into the appropriate bucket.
• If a bucket winds up with 2+ values, use any method to sort them.
• Ex. { 63, 42, 87, 37, 60, 58, 95, 75, 97, 3 }
  – We can define buckets by tens
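A sketch in Python, assuming the caller supplies the value range [lo, hi) (the slide's example would use lo = 0, hi = 100, i.e. buckets by tens):

```python
def bucket_sort(a, lo, hi):
    """Sort values assumed roughly uniform over [lo, hi)."""
    n = len(a)
    buckets = [[] for _ in range(n)]   # n buckets, each 1/n of the range
    width = (hi - lo) / n
    for v in a:
        idx = min(int((v - lo) / width), n - 1)  # clamp hi endpoint
        buckets[idx].append(v)
    out = []
    for b in buckets:
        out.extend(sorted(b))          # any method for small buckets
    return out
```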

Page 9

Radix sort

• Old and simple method.

• Sort values based on their ones’ digit.
  – In other words, write down all numbers ending with 0, followed by all numbers ending with 1, etc.
• Continue: sort by the tens’ digit, then by the hundreds’ digit, etc.
• Can easily be modified to alphabetize words.
• The technique is also useful for sorting records by several fields.
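A least-significant-digit radix sort for non-negative integers can be sketched as follows; each pass distributes values into ten digit buckets in order, which keeps the pass stable:

```python
def radix_sort(a):
    """LSD radix sort, base 10, for non-negative integers."""
    if not a:
        return a
    exp = 1                      # 1 = ones, 10 = tens, 100 = hundreds, ...
    while max(a) // exp > 0:
        buckets = [[] for _ in range(10)]
        for v in a:
            buckets[(v // exp) % 10].append(v)   # current digit of v
        a = [v for b in buckets for v in b]      # concatenate buckets
        exp *= 10
    return a
```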

Page 10

Stooge sort

• Designed to show that divide & conquer does not automatically mean a faster algorithm.

• Soon we will learn how to mathematically determine the exact O(g(n)) runtime.

stoogeSort(A, i, j):
    if A[i] > A[j]:
        swap A[i] and A[j]
    if i+1 >= j:
        return
    k = (j – i + 1) / 3
    stoogeSort(A, i, j – k)    // how much of A?
    stoogeSort(A, i + k, j)
    stoogeSort(A, i, j – k)
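The pseudocode translates almost line for line into Python (using floor division for k):

```python
def stooge_sort(a, i, j):
    """Sort a[i..j] by recursively sorting overlapping two-thirds slices."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]
    if i + 1 >= j:
        return
    k = (j - i + 1) // 3
    stooge_sort(a, i, j - k)    # first two-thirds
    stooge_sort(a, i + k, j)    # last two-thirds
    stooge_sort(a, i, j - k)    # first two-thirds again
```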

Page 11

Page 12

Selection

• The “selection problem” is: given a sequence of values, return the k-th smallest value, for some k.
  – If k = 1 or n, the problem is simple.
  – It would be easy to write an O(n log n) algorithm by sorting all the values first. But this does unnecessary work.
• A randomized method has an expected runtime of O(n).
  – Based on randomized quick sort: choose any value to be the pivot
  – So it’s called “randomized quick select”
  – The algorithm takes as input S and k, where 1 ≤ k ≤ n. (Indices count from 1)

Page 13

Pseudocode

quickSelect(S, k):
    if |S| = 1:
        return S[1]
    x = random element from S
    L = [ all elements < x ]
    E = [ all elements == x ]
    G = [ all elements > x ]
    if k <= |L|:
        return quickSelect(L, k)
    else if k <= |L| + |E|:
        return x
    else:
        return quickSelect(G, k – |L| – |E|)   // e.g. 12th out of 20 = 2nd out of 10
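In Python, the same algorithm might look like this (randomized, so different runs pick different pivots, but the returned value is always the k-th smallest):

```python
import random

def quick_select(s, k):
    """Return the k-th smallest element of s (1-indexed, 1 <= k <= len(s))."""
    if len(s) == 1:
        return s[0]
    x = random.choice(s)                    # random pivot
    lower   = [v for v in s if v < x]       # L
    equal   = [v for v in s if v == x]      # E
    greater = [v for v in s if v > x]       # G
    if k <= len(lower):
        return quick_select(lower, k)
    elif k <= len(lower) + len(equal):
        return x
    else:
        return quick_select(greater, k - len(lower) - len(equal))
```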

Page 14

Analysis

• To find O(g(n)), we are going to find an upper bound on the “expected” execution time.

• Expected, as in expected value – the long term average if you repeated the random experiment many times.

• The book has some preliminary notes…
  – You can add expected values, but not probabilities.
  – Consider rolling 1 vs. 2 dice.
    P1(rolling 4) = 1/6, but P2(rolling 8) = 5/36.
    So the probabilities don’t add!
    You can only add probabilities for 2 alternatives of the same experiment, e.g. rolling a 4 or a 5 on one die: 1/6 + 1/6.
  – Exp(1 die) = 3.5, and Exp(2 dice) = 7.
    These values can add.
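These dice facts are easy to verify exactly with Python's fractions module; this is just a check of the numbers quoted above:

```python
from fractions import Fraction
from itertools import product

# One die: outcomes 1..6, each with probability 1/6.
p1_roll4 = Fraction(1, 6)
exp_one_die = sum(Fraction(face, 6) for face in range(1, 7))     # 3.5

# Two dice: 36 equally likely ordered pairs.
pairs = list(product(range(1, 7), repeat=2))
p2_roll8 = Fraction(sum(1 for a, b in pairs if a + b == 8), 36)  # 5/36
exp_two_dice = sum(Fraction(a + b, 36) for a, b in pairs)        # 7

# Probabilities don't add: 1/6 + 1/6 = 1/3, not 5/36.
# Expected values do: 3.5 + 3.5 = 7.
```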

Page 15

Selection proof

• We want to show that the selection algorithm is O(n).
• The algorithm is based on partitioning S.
  – Define a “good” partition = one where x is in the middle half of the distribution of values (not in the middle half of locations).
  – Probability = ½, and the L and G used for the next recursive call have sizes ≤ .75n.
• How many recursive calls until we have a “good” partition? Same as asking how many times we flip a coin until we get heads: we would expect 2.
• Overhead in doing one function invocation:
  – We need a loop, so this is O(n). Say, bn for some constant b.
• T(n) = expected time of the algorithm
• T(n) ≤ T(.75n) + 2bn

Page 16

Work out recurrence

T(n) ≤ T(.75n) + 2bn

Let’s expand: T(.75n) ≤ T(.75^2 n) + 2b(.75n)

Substitute: T(n) ≤ T(.75^2 n) + 2b(.75n) + 2bn

Expand: T(.75^2 n) ≤ T(.75^3 n) + 2b(.75^2 n)

Substitute: T(n) ≤ T(.75^3 n) + 2b(.75^2 n) + 2b(.75n) + 2bn

We can keep going, and eventually the argument of T on the right side becomes at most 1. (The O(1) base case.) When does that occur?

Solve for k: .75^k n ≤ 1

(3/4)^k ≤ 1/n → (4/3)^k ≥ n → k ≥ log_{4/3} n → k = ⌈log_{4/3} n⌉

So, T(n) ≤ T(1) + the sum of terms 2bn(.75^i) for i = 0 to k.

(The geometric sum of the .75^i factors is at most 4.)

T(n) ≤ O(1) + 2bn(4) = O(n)

Page 17

Merge sort

• We can use the same technique to analyze merge sort. (p. 248). Let’s look at cost of recursive case:

T(n) = 2 T(n/2) + cn

Expand the recursive case: T(n/2) = 2 T(n/4) + c(n/2)

Substitute: T(n) = 2 T(n/2) + cn = 2 [ 2 T(n/4) + c(n/2) ] + cn
= 4 T(n/4) + 2cn

Expand: T(n/4) = 2 T(n/8) + c(n/4)

Substitute: T(n) = 4 T(n/4) + 2cn = 4 [ 2 T(n/8) + c(n/4) ] + 2cn
= 8 T(n/8) + 3cn. See a pattern?

T(n) = 2^k T(n/2^k) + kcn

At some point, n/2^k = 1 → n = 2^k → k = log2 n.

T(n) = 2^(log2 n) T(1) + (log2 n) cn = n T(1) + cn log2 n = O(n log2 n).

Page 18

Stooge sort

• Yes, even Stooge sort can be analyzed in a similar way!

T(n) = 3 T((2/3)n) + cn

Expand: T((2/3)n) = 3 T((4/9)n) + c((2/3)n)

Substitute: T(n) = 3 T((2/3)n) + cn
= 3 [ 3 T((4/9)n) + c((2/3)n) ] + cn
= 9 T((4/9)n) + 3cn

Expand: T((4/9)n) = 3 T((8/27)n) + c((4/9)n)

Substitute: T(n) = 9 T((4/9)n) + 3cn
= 9 [ 3 T((8/27)n) + c((4/9)n) ] + 3cn
= 27 T((8/27)n) + 7cn

Continuing, we observe:

T(n) = 3^k T((2/3)^k n) + (2^k – 1)cn

Page 19

Stooge sort (2)

At some point, the recursive argument reaches (or goes below) 1.

(2/3)^k n = 1 → (2/3)^k = 1/n → (3/2)^k = n → k = log_{3/2} n

So T(n) = 3^(log_{3/2} n) T(1) + (2^(log_{3/2} n) – 1) cn

= O(3^(log_{3/2} n)) + O(n 2^(log_{3/2} n))

Is this exponential complexity? No – let’s simplify:

3^(log_{3/2} n) = ((3/2)^(log_{3/2} 3))^(log_{3/2} n)

= ((3/2)^(log_{3/2} n))^(log_{3/2} 3)

= n^(log_{3/2} 3)

The other term can be simplified similarly, giving n^(1 + log_{3/2} 2), which turns out to be the same order.

T(n) = O(n^(log_{3/2} 3)), which is about O(n^2.71).