Computer Algorithms, Lecture 10
Quicksort (Ch. 7)
Some of these slides are courtesy of D. Plaisted et al., UNC, and M. Nicolescu, UNR
Quicksort
• A triumph of analysis by C.A.R. Hoare
  – The quicksort algorithm was developed in 1960 by C. A. R. Hoare while in the Soviet Union, as a visiting student at Moscow State University. At that time, Hoare worked on a machine-translation project for the National Physical Laboratory. He developed the algorithm in order to sort the words to be translated, to make them more easily matched against an already-sorted Russian-to-English dictionary stored on magnetic tape.
• Worst-case execution time – Θ(n²).
• Average-case execution time – Θ(n lg n).
  – How do these compare with the complexities of other sorting algorithms?
• Empirical and analytical studies show that quicksort can be expected to be roughly twice as fast as its competitors.
Design
• Follows the divide-and-conquer paradigm.
• Divide: Partition (separate) the array A[p..r] into two (possibly empty) subarrays A[p..q–1] and A[q+1..r].
  – Each element in A[p..q–1] ≤ A[q].
  – A[q] ≤ each element in A[q+1..r].
  – Index q is computed as part of the partitioning procedure.
• Conquer: Sort the two subarrays by recursive calls to quicksort.
• Combine: The subarrays are sorted in place – no work is needed to combine them.
• How do the divide and combine steps of quicksort compare with those of merge sort?
Pseudocode

Quicksort(A, p, r)
  if p < r then
    q := Partition(A, p, r);
    Quicksort(A, p, q – 1);
    Quicksort(A, q + 1, r)

Partition(A, p, r)
  x, i := A[r], p – 1;
  for j := p to r – 1 do
    if A[j] ≤ x then
      i := i + 1;
      A[i] ↔ A[j]
  A[i + 1] ↔ A[r];
  return i + 1
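The pseudocode above can be sketched as a runnable Python translation (Lomuto partition with the last element as pivot; the in-place, index-based style follows the slides):

```python
# Direct Python translation of the Quicksort/Partition pseudocode.
def partition(A, p, r):
    x = A[r]          # pivot: last element of A[p..r]
    i = p - 1         # boundary of the "<= pivot" region
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]          # A[i] <-> A[j]
    A[i + 1], A[r] = A[r], A[i + 1]          # place pivot between the regions
    return i + 1

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

data = [2, 5, 8, 3, 9, 4, 1, 7, 10, 6]
quicksort(data, 0, len(data) - 1)
print(data)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```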
[Figure: Divide step with pivot 5 – A[p..r] is split into A[p..q–1] (elements ≤ 5), A[q] = 5, and A[q+1..r] (elements ≥ 5).]
Example (pivot x = 6; the indices i and j advance as in the code)
initially:      2 5 8 3 9 4 1 7 10 6
next iteration: 2 5 8 3 9 4 1 7 10 6
next iteration: 2 5 8 3 9 4 1 7 10 6
next iteration: 2 5 8 3 9 4 1 7 10 6
next iteration: 2 5 3 8 9 4 1 7 10 6
Example (continued, pivot x = 6)
next iteration:   2 5 3 8 9 4 1 7 10 6
next iteration:   2 5 3 8 9 4 1 7 10 6
next iteration:   2 5 3 4 9 8 1 7 10 6
next iteration:   2 5 3 4 1 8 9 7 10 6
next iteration:   2 5 3 4 1 8 9 7 10 6
next iteration:   2 5 3 4 1 8 9 7 10 6
after final swap: 2 5 3 4 1 6 9 7 10 8
Partitioning
• Select the last element A[r] in the subarray A[p..r] as the pivot – the element around which to partition.
• As the procedure executes, the array is partitioned into four (possibly empty) regions.
  1. A[p..i] — All entries in this region are ≤ pivot.
  2. A[i+1..j – 1] — All entries in this region are > pivot.
  3. A[r] = pivot.
  4. A[j..r – 1] — Not known how they compare to pivot.
• The first three conditions hold before each iteration of the for loop and constitute a loop invariant. (Condition 4 is not part of the loop invariant.)
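One way to see the invariant concretely is to instrument the partition loop with assertions that check conditions 1–3 before each iteration. This is a sketch of mine, not from the slides; the function name `partition_checked` is illustrative:

```python
# Lomuto partition instrumented with the loop invariant from the slides.
# Before every iteration of the for loop we assert:
#   1. every element of A[p..i]     is <= pivot
#   2. every element of A[i+1..j-1] is  > pivot
#   3. A[r] == pivot
def partition_checked(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        assert all(A[k] <= x for k in range(p, i + 1))    # condition 1
        assert all(A[k] > x for k in range(i + 1, j))     # condition 2
        assert A[r] == x                                  # condition 3
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

A = [2, 5, 8, 3, 9, 4, 1, 7, 10, 6]
q = partition_checked(A, 0, len(A) - 1)
print(q, A)   # 5 [2, 5, 3, 4, 1, 6, 9, 7, 10, 8]
```

Running it on the example array reproduces the trace from the slides: the pivot 6 ends at index 5, with smaller elements to its left and larger to its right.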
Correctness of Partition
• Use the loop invariant.
• Initialization:
  – Before the first iteration:
    • A[p..i] and A[i+1..j – 1] are empty – Conditions 1 and 2 are satisfied (trivially).
    • r is the index of the pivot – Condition 3 is satisfied.
• Maintenance:
Correctness of Partition
• Case 1: A[j] > x
  – Increment j only.
  – A[p..i] (≤ x) is unchanged; A[i+1..j – 1] (> x) absorbs the element A[j] > x – Conditions 1 and 2 are maintained.
  – A[r] is unaltered – Condition 3 is maintained.
Correctness of Partition
• Case 2: A[j] ≤ x
  – Increment i
  – Swap A[i] and A[j]
    • Condition 1 is maintained.
  – Increment j
    • Condition 2 is maintained.
  – A[r] is unaltered – Condition 3 is maintained.
Correctness of Partition
• Termination:
  – When the loop terminates, j = r, so every element of A[p..r] falls into one of the three regions:
    • A[p..i] ≤ pivot
    • A[i+1..j – 1] > pivot
    • A[r] = pivot
• The last two lines swap A[i+1] and A[r].
  – The pivot moves from the end of the array to between the two subarrays.
  – Thus, procedure Partition correctly performs the divide step.
Complexity of Partition
• PartitionTime(n) is given by the number of iterations of the for loop.
• PartitionTime(n) = Θ(n), where n = r – p + 1.
Analysis of Quicksort: Worst case
• In the worst case, partitioning always divides the size-n array into these three parts:
  – A length-one part, containing the pivot itself
  – A length-zero part, and
  – A length-(n – 1) part, containing everything else
• We don’t recurse on the zero-length part
• Recursing on the length-(n – 1) part requires (in the worst case) recursing to depth n – 1
Performance of Quicksort
• Worst-case partitioning
  – One region has 1 element and one has n – 1 elements
  – Maximally unbalanced
• Recurrence, with worst-case partitions at each recursive level:
  T(n) = T(n – 1) + T(0) + PartitionTime(n)
       = T(n – 1) + Θ(n)
       = Σ_{k=1..n} Θ(k)
       = Θ(Σ_{k=1..n} k)
       = Θ(n²)
[Figure: worst-case recursion tree – subproblem sizes n, n – 1, n – 2, …, 2, 1 down the right spine, size 1 at each left child; per-level costs n, n – 1, …, 2 sum to Θ(n²).]
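The quadratic worst case is easy to observe empirically. The following small experiment (my own sketch, not from the slides) counts the comparisons the partition loop makes when this quicksort runs on an already-sorted array; the counts are exactly n(n – 1)/2:

```python
# Count comparisons made by last-element-pivot quicksort on sorted input.
# On sorted input every partition is maximally unbalanced, so the total
# number of comparisons is (n-1) + (n-2) + ... + 1 = n(n-1)/2 = Theta(n^2).
import sys
sys.setrecursionlimit(10000)   # recursion depth reaches n on sorted input

def quicksort_count(A, p, r):
    """Quicksort with last-element pivot; returns number of comparisons."""
    if p >= r:
        return 0
    x, i, comps = A[r], p - 1, 0
    for j in range(p, r):
        comps += 1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    q = i + 1
    return comps + quicksort_count(A, p, q - 1) + quicksort_count(A, q + 1, r)

for n in (100, 200, 400):
    print(n, quicksort_count(list(range(n)), 0, n - 1))
# prints n(n-1)/2 for each n: 4950, 19900, 79800
```

Doubling n quadruples the comparison count, the signature of Θ(n²) growth.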
Analysis of Quicksort: Best case
• Best-case partitioning
  – Partitioning produces two regions of size n/2
• Recurrence
  T(n) = 2T(n/2) + Θ(n)
  T(n) = Θ(n lg n) (Master theorem)
[Figure: partitioning at various levels]
Analysis of Quicksort
• Balanced partitioning
  – The average case is closer to the best case than to the worst case
  – (if partitioning always produces a constant split)
• E.g., a 9-to-1 proportional split:
  T(n) = T(9n/10) + T(n/10) + n, which is still Θ(n lg n)
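To see why a constant split suffices, the sketch below (mine, not from the slides) computes the depth of the longer branch of the recursion under a 9-to-1 split: the larger subproblem shrinks by a factor of 10/9 at each level, so the depth is about log base 10/9 of n, i.e. O(lg n) with a larger constant, and each level still does Θ(n) total work:

```python
# Depth of the larger (9/10) branch of the recursion under a 9-to-1 split.
import math

def split_depth(n, frac=9 / 10):
    """Levels until the larger subproblem shrinks from n to 1."""
    depth = 0
    while n > 1:
        n = math.floor(n * frac)
        depth += 1
    return depth

for n in (1000, 10**6):
    # depth stays close to log_{10/9} n, i.e. O(lg n)
    print(n, split_depth(n), round(math.log(n, 10 / 9)))
```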
Typical case for quicksort
• If the array is sorted to begin with, this version of quicksort is terrible: O(n²)
• It is possible to construct other bad cases
• However, quicksort is usually O(n log n)
• The constants are so good that quicksort is generally the fastest sorting algorithm known in practice
• Most real-world sorting is done by quicksort
• Is the average case closer to the best case or the worst case?
Performance of Quicksort
• Average case
  – All permutations of the input numbers are equally likely
  – On a random input array, we will have a mix of well-balanced and unbalanced splits
  – Good and bad splits are randomly distributed throughout the recursion tree
[Figure: left – alternation of a bad split (n into 1 and n – 1) followed by a good split (n – 1 into (n – 1)/2 and (n – 1)/2), combined cost 2n – 1 = Θ(n); right – a single nearly well-balanced split (n into (n – 1)/2 and (n – 1)/2 + 1), cost n = Θ(n).]
• The running time of quicksort when levels alternate between good and bad splits is still O(n lg n)
Picking a better pivot
• Before, we picked the last element of the subarray to use as a pivot
  – If the array is already sorted, this results in O(n²) behavior
  – It’s no better if we pick the first element
  – Note that an array of identical elements is already sorted!
• We could do an optimal quicksort (guaranteed O(n log n)) if we always picked a pivot value that exactly cuts the array in half
  – Such a value is called a median: half of the values in the array are larger, half are smaller
  – The easiest way to find the median is to sort the array and pick the value in the middle (!)
Median of three
• Obviously, it doesn’t make sense to sort the array in order to find the median to use as a pivot
• Instead, compare just three elements of our (sub)array – the first, the last, and the middle
  – Take the median (middle value) of these three as the pivot
  – It’s possible (but not easy) to construct cases which will make this technique O(n²)
• Suppose we rearrange (sort) these three numbers so that the smallest is in the first position, the largest in the last position, and the other in the middle
  – This lets us simplify and speed up the partition loop
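The rearrangement described above can be sketched as follows (the function name `median_of_three` is mine, not from the slides): sort A[first], A[mid], A[last] in place with three compare-and-swaps, then use the middle value as the pivot. Afterwards A[p] ≤ A[mid] ≤ A[r], which is what lets the partition loop skip some bounds checks:

```python
# Median-of-three pivot selection: sort the first, middle, and last
# elements of A[p..r] in place, then return the middle one as the pivot.
def median_of_three(A, p, r):
    mid = (p + r) // 2
    if A[mid] < A[p]:
        A[p], A[mid] = A[mid], A[p]
    if A[r] < A[p]:
        A[p], A[r] = A[r], A[p]
    if A[r] < A[mid]:
        A[mid], A[r] = A[r], A[mid]
    return A[mid]          # the median of the three sampled values

A = [9, 4, 1, 7, 10, 2]
print(median_of_three(A, 0, len(A) - 1))   # median of 9, 1, 2 -> 2
```

On an already-sorted array the sampled median is near the true median, so this heuristic avoids the O(n²) behavior of the fixed last-element pivot on that input.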
Randomized Algorithms
• The behavior is determined in part by values produced by a random-number generator
  – RANDOM(a, b) returns an integer r, where a ≤ r ≤ b, and each of the b – a + 1 possible values of r is equally likely
• The algorithm imposes its own randomness on the input
• No input can consistently elicit worst-case behavior
  – The worst case occurs only if we get “unlucky” numbers from the random-number generator
• Randomized PARTITION

Alg.: RANDOMIZED-PARTITION(A, p, r)
  i ← RANDOM(p, r)
  exchange A[r] ↔ A[i]
  return PARTITION(A, p, r)

Alg.: RANDOMIZED-QUICKSORT(A, p, r)
  if p < r
    then q ← RANDOMIZED-PARTITION(A, p, r)
         RANDOMIZED-QUICKSORT(A, p, q – 1)
         RANDOMIZED-QUICKSORT(A, q + 1, r)
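A runnable sketch of the randomized version in Python (using `random.randint`, which, like RANDOM(a, b), is inclusive on both ends):

```python
# Randomized quicksort: swap a uniformly random element of A[p..r] into
# the pivot position before partitioning, so no fixed input is always bad.
import random

def partition(A, p, r):
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    i = random.randint(p, r)       # RANDOM(p, r): inclusive on both ends
    A[r], A[i] = A[i], A[r]        # exchange A[r] <-> A[i]
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)

A = list(range(1000))              # sorted input: worst case without randomization
randomized_quicksort(A, 0, len(A) - 1)
print(A == sorted(A))              # True
```

Note that sorted input, which drove the deterministic version to quadratic time, is now no worse than any other permutation: the expected running time is O(n lg n) for every input.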
Final comments
• Until 2002, quicksort was the fastest known general sorting algorithm, on average.
• Still the most common sorting algorithm in standard libraries.
• For optimum speed, the pivot must be chosen carefully.– Median of three– Randomization
• There will be some cases where Quicksort runs in O(n2) time.