Sorting
15-211 Fundamental Data Structures and Algorithms
Peter Lee
February 20, 2003
Announcements
Homework #4 is out
Get started today!
Reading:
Chapter 8
Quiz #2 available on Tuesday
Objects in calendar are closer than they appear.
Introduction to Sorting
Comparison-based sorting
We assume
• Items are stored in an array.
• Can be moved around in the array.
• Can compare any two array elements.
Comparison has 3 possible outcomes:
< = >
Flips and inversions
An unsorted array:
24 47 13 99 105 222
An inversion is any pair of elements that is out of order, such as (24, 13); a flip is an inversion of adjacent elements, such as (47, 13).
Naïve sorting algorithms
Bubble sort.
24 47 13 99 105 222
24 13 47 99 105 222
13 24 47 99 105 222
Keep scanning for flips, until all are fixed
What is the running time?
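The scan-and-swap loop can be sketched in Python (illustrative; the name `bubble_sort` is ours, not from the slides):

```python
def bubble_sort(a):
    """Keep scanning for flips (adjacent inversions) and swapping them
    until a full pass finds none. Worst case: O(n^2) comparisons."""
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:                  # a flip
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
    return a
```

Each pass moves an element at most one position toward the front, so a reverse-sorted array forces about n passes.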
Insertion sort
105 47 13 99 30 222
47 105 13 99 30 222
13 47 105 99 30 222
13 47 99 105 30 222
13 30 47 99 105 222
105 47 13 99 30 222
Sorted sublist
Insertion sort
for i = 2 to n do
insert a[i] in the proper place
in a[1:i-1]
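A runnable version of this pseudocode (0-based indices, so the loop starts at i = 1):

```python
def insertion_sort(a):
    # Insert a[i] into the already-sorted prefix a[0:i].
    for i in range(1, len(a)):
        x = a[i]
        j = i
        while j > 0 and a[j - 1] > x:   # each shift fixes one inversion
            a[j] = a[j - 1]
            j -= 1
        a[j] = x
    return a
```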
How fast is insertion sort?
Takes O(#inversions) steps, which is very fast if array is nearly sorted to begin with.
3 2 1 6 5 4 9 8 7 …
How many inversions?
Consider the worst case:
n n-1 … 3 2 1 0
In this case, there are
n + (n-1) + (n-2) + … + 1
or n(n+1)/2, which is O(n²).
How many inversions?
What about the average case? Consider:
p = x_1 x_2 x_3 … x_(n-1) x_n
For any such p, let rev(p) be its reversal.
Then each pair (x_i, x_j) is an inversion in exactly one of p and rev(p).
There are n(n-1)/2 pairs (x_i, x_j), hence the average number of inversions in a permutation is n(n-1)/4, which is O(n²).
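For small n, the n(n-1)/4 average can be verified by brute force over all permutations; a quick check (illustrative, not from the slides):

```python
from itertools import permutations

def inversions(p):
    # Count pairs (i, j) with i < j but p[i] > p[j].
    return sum(1 for i in range(len(p))
                 for j in range(i + 1, len(p))
                 if p[i] > p[j])

n = 4
perms = list(permutations(range(n)))
avg = sum(inversions(p) for p in perms) / len(perms)
assert avg == n * (n - 1) / 4   # 3.0 for n = 4
```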
How long does it take to sort?
Can we do better than O(n2)?
In the worst case?
In the average case
Heapsort
Remember heaps:
buildHeap has O(n) worst-case running time.
deleteMin has O(log n) worst-case running time.
Heapsort:
Build heap. O(n)
DeleteMin until empty. O(n log n)
Total worst case: O(n log n)
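The two phases map directly onto Python's heapq module, which implements a binary min-heap (a sketch of the deleteMin formulation above):

```python
import heapq

def heapsort(items):
    heap = list(items)
    heapq.heapify(heap)       # buildHeap: O(n) worst case
    # deleteMin until empty: n pops at O(log n) each
    return [heapq.heappop(heap) for _ in range(len(items))]
```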
N² vs N log N
[Plot comparing N² and N log N growth.]
Sorting in O(n log n)
Heapsort establishes the fact that sorting can be accomplished in O(n log n) worst-case running time.
Heapsort in practice
The average-case analysis for heapsort is somewhat complex.
In practice, heapsort consistently tends to use nearly n log n comparisons.
So, while the worst case is better than n², other algorithms sometimes work better.
Shellsort
Shellsort, like insertion sort, is based on swapping inverted pairs.
It achieves O(n^(4/3)) running time with a suitable gap sequence.
[See your book for details.]
Shellsort
Example with sequence 3, 1.
105 47 13 99 30 222
99 47 13 105 30 222
99 30 13 105 47 222
99 30 13 105 47 222
30 99 13 105 47 222
30 13 99 105 47 222
...
Several inverted pairs fixed in one exchange.
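A sketch of Shellsort with the slide's gap sequence 3, 1 (any sequence ending in 1 yields a sorted array; the function name is ours):

```python
def shellsort(a, gaps=(3, 1)):
    # Gap-insertion sort: for each gap, insertion-sort the interleaved
    # subsequences a[k], a[k+gap], a[k+2*gap], ...
    for gap in gaps:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:   # one exchange fixes several inversions
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
    return a
```

On the slide's array, the gap-3 pass performs the swaps shown above before the final gap-1 (plain insertion sort) pass finishes the job.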
Recursive Sorting
Recursive sorting
Intuitively, divide the problem into pieces and then recombine the results.
If array is length 1, then done.
If array is length N>1, then split in half and sort each half.
Then combine the results.
An example of a divide-and-conquer algorithm.
Divide-and-conquer
Why divide-and-conquer works
Suppose the amount of work required to divide and recombine is linear, that is, O(n).
Suppose also that the amount of work to solve the problem directly is greater than O(n).
Then each dividing step reduces the remaining work by more than a linear amount, while requiring only linear work to do so.
Divide-and-conquer is big
We will see several examples of divide-and-conquer in this course.
Recursive sorting
If array is length 1, then done.
If array is length N>1, then split in half and sort each half.
Then combine the results.
Analysis of recursive sorting
Suppose it takes time T(N) to sort N elements.
Suppose also it takes time N to combine the two sorted arrays.
Then:
T(1) = 1
T(N) = 2T(N/2) + N, for N>1
Solving for T gives the running time for the recursive sorting algorithm.
Remember recurrence relations?
Systems of equations such as
T(1) = 1
T(N) = 2T(N/2) + N, for N>1
are called recurrence relations (or sometimes recurrence equations).
A solution
A solution for
T(1) = 1
T(N) = 2T(N/2) + N
is given by
T(N) = N log N + N
which is O(N log N).
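This closed form can be checked numerically against the recurrence for powers of 2 (a quick sketch):

```python
import math

def T(n):
    # T(1) = 1; T(n) = 2T(n/2) + n, for n > 1 (n a power of 2)
    return 1 if n == 1 else 2 * T(n // 2) + n

# N log N + N matches the recurrence exactly at every power of 2 tested.
for k in range(11):
    n = 2 ** k
    assert T(n) == n * math.log2(n) + n
```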
How to solve such equations?
Recurrence relations
There are several methods for solving recurrence relations.
It is also useful sometimes to check that a solution is valid. This is done by induction.
Checking a solution
Base case:
T(1) = 1, and 1·log 1 + 1 = 0 + 1 = 1.
Inductive case:
Assume T(M) = M log M + M for all M < N.
T(N) = 2T(N/2) + N
     = 2((N/2) log(N/2) + N/2) + N
     = N(log N - log 2) + 2N
     = N log N - N + 2N
     = N log N + N
Logarithms
Some useful equalities.
x^A = B iff log_x B = A
log 1 = 0
log_2 2 = 1
log(AB) = log A + log B, if A, B > 0
log(A/B) = log A - log B, if A, B > 0
log(A^B) = B log A
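These identities (logs base 2 throughout) can be spot-checked numerically:

```python
import math

log = math.log2          # logs in this lecture are base 2
A, B = 8.0, 4.0

assert 2 ** 5 == 32 and log(32) == 5                 # x^A = B iff log_x B = A
assert log(1) == 0 and log(2) == 1
assert math.isclose(log(A * B), log(A) + log(B))
assert math.isclose(log(A / B), log(A) - log(B))
assert math.isclose(log(A ** B), B * log(A))
```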
Upper bounds for rec. relations
Divide-and-conquer algorithms are very useful in practice.
Furthermore, they all tend to generate similar recurrence relations.
As a result, approximate upper-bound solutions are well-known for recurrence relations derived from divide-and-conquer algorithms.
Divide-and-Conquer Theorem
Theorem: Let a, b, c > 0.
The recurrence relation
T(1) = b
T(N) = aT(N/c) + bN, for any N which is a power of c
has upper-bound solutions
T(N) = O(N) if a < c
T(N) = O(N log N) if a = c
T(N) = O(N^(log_c a)) if a > c
a = 2, b = 1, c = 2 for recursive sorting
Upper-bounds
Corollary:
Dividing a problem into p pieces, each of size N/p, using only a linear amount of work, results in an O(Nlog N) algorithm.
Upper-bounds
Proof of this theorem later in the semester.
Exact solutions
Recall from earlier in the semester that it is sometimes possible to derive closed-form solutions to recurrence relations.
Several methods exist for doing this.
As an example, consider again our current equations:T(1) = 1
T(N) = 2T(N/2) + N, for N>1
Repeated substitution method
One technique is to use repeated substitution.
T(N) = 2T(N/2) + N
2T(N/2) = 2(2T(N/4) + N/2) = 4T(N/4) + N
so T(N) = 4T(N/4) + 2N
4T(N/4) = 4(2T(N/8) + N/4) = 8T(N/8) + N
so T(N) = 8T(N/8) + 3N
…
T(N) = 2^k T(N/2^k) + kN
Repeated substitution, cont’d
We end up with
T(N) = 2^k T(N/2^k) + kN, for all k ≥ 1
Let’s use k=log N.
Note that 2^(log N) = N.
So:
T(N) = N·T(1) + N log N = N log N + N
Other methods
There are also other methods for solving recurrence relations.
For example, the “telescoping method”…
Telescoping method
We start with
T(N) = 2T(N/2) + N
Divide both sides by N to get
T(N)/N = (2T(N/2) + N)/N
= T(N/2)/(N/2) + 1
This is valid for any N that is a power of 2, so we can write the following:
Telescoping method, cont’d
Additional equations:
T(N)/N = T(N/2)/(N/2) + 1
T(N/2)/(N/2) = T(N/4)/(N/4) + 1
T(N/4)/(N/4) = T(N/8)/(N/8) + 1
…
T(2)/2 = T(1)/1 + 1
What happens when we sum all the left-hand and right-hand sides?
Telescoping method, cont’d
Summing all the left-hand and right-hand sides, the intermediate terms cancel, and we are left with:
T(N)/N = T(1)/1 + log N
Telescoping method, cont’d
We are left with
T(N)/N = T(1)/1 + log(N)
Multiplying both sides by N gives
T(N) = N log(N) + N
Mergesort
Mergesort
Mergesort is the most basic recursive sorting algorithm.
Divide array in halves A and B.
Recursively mergesort each half.
Combine A and B by successively looking at the first elements of A and B and moving the smaller one to the result array.
Note: one should be careful to avoid creating lots of intermediate result arrays.
Mergesort
Mergesort
But we don’t actually want to create all of these arrays!
Mergesort
Use simple indexes to perform the split.
Use a single extra array to hold each intermediate result.
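A sketch of mergesort using index pairs and one shared scratch array, as the slide suggests (helper names are ours):

```python
def mergesort(a):
    tmp = [None] * len(a)              # single extra array for merging

    def sort(lo, hi):                  # sorts the slice a[lo:hi]
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)                  # recursively sort each half
        sort(mid, hi)
        i, j = lo, mid                 # merge a[lo:mid] and a[mid:hi]
        for k in range(lo, hi):
            if j >= hi or (i < mid and a[i] <= a[j]):
                tmp[k] = a[i]; i += 1  # smaller front element moves to tmp
            else:
                tmp[k] = a[j]; j += 1
        a[lo:hi] = tmp[lo:hi]          # copy the merged run back

    sort(0, len(a))
    return a
```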
Analysis of mergesort
Mergesort generates almost exactly the same recurrence relations shown before.
T(1) = 1
T(N) = 2T(N/2) + N - 1, for N>1
Thus, mergesort is O(N log N).
Quicksort
Quicksort
Quicksort was invented in 1960 by Tony Hoare.
Although it has O(N²) worst-case performance, on average it is O(N log N).
More importantly, it is the fastest known comparison-based sorting algorithm in practice.
Quicksort idea
Choose a pivot.
Quicksort idea
Choose a pivot.
Rearrange so that pivot is in the “right” spot.
Quicksort idea
Choose a pivot.
Rearrange so that pivot is in the “right” spot.
Recurse on each half and conquer!
Quicksort algorithm
If array A has 1 (or 0) elements, then done.
Choose a pivot element x from A.
Divide A-{x} into two arrays:
B = {y ∈ A | y ≤ x}
C = {y ∈ A | y ≥ x}
Quicksort arrays B and C.
Result is B+{x}+C.
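A direct transcription of these steps (since the slide leaves ties unspecified, this sketch sends elements equal to the pivot to B):

```python
def quicksort(a):
    if len(a) <= 1:                  # length 0 or 1: already sorted
        return a
    x = a[0]                         # pivot choice is unspecified; use the first element
    b = [y for y in a[1:] if y <= x]
    c = [y for y in a[1:] if y > x]
    return quicksort(b) + [x] + quicksort(c)
```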
Quicksort algorithm
105 47 13 17 30 222 5 19            choose pivot 19
[5 17 13]  19  [47 30 222 105]      recurse with pivots 13 and 47
[5] 13 [17]    [30] 47 [222 105]    recurse again (pivot 222)
               [105] 222
Quicksort algorithm
105 47 13 17 30 222 5 19            choose pivot 19
[5 17 13]  19  [47 30 222 105]
[5] 13 [17]    [30] 47 [222 105]
               [105] 222
In practice, insertion sort is used once the arrays get “small enough”.
Doing quicksort in place
85 24 63 50 17 31 96 45
85 24 63 45 17 31 96 50
L R
85 24 63 45 17 31 96 50
L R
31 24 63 45 17 85 96 50
L R
Doing quicksort in place
31 24 63 45 17 85 96 50             L stops at 63, R stops at 17: swap them
31 24 17 45 63 85 96 50             L (at 63) and R (at 45) have crossed
31 24 17 45 50 85 96 63             swap the pivot 50 with a[L]; partition done
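The L/R partition traced above can be sketched as follows; for brevity this version simply takes the last element as the pivot rather than swapping a chosen pivot to the right end first:

```python
def quicksort_inplace(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                        # 0 or 1 elements: done
        return a
    pivot = a[hi]                       # pivot parked at the right end
    l, r = lo, hi - 1
    while True:
        while l <= r and a[l] < pivot:  # L scans right for an element >= pivot
            l += 1
        while l <= r and a[r] > pivot:  # R scans left for an element <= pivot
            r -= 1
        if l >= r:                      # pointers met or crossed
            break
        a[l], a[r] = a[r], a[l]         # fix the inverted pair
        l += 1
        r -= 1
    a[l], a[hi] = a[hi], a[l]           # swap the pivot into its final spot
    quicksort_inplace(a, lo, l - 1)     # recurse on each side of the pivot
    quicksort_inplace(a, l + 1, hi)
    return a
```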
Quicksort is fast but hard to do
In the early 1960’s, quicksort was famous for being implemented incorrectly many times.
More about invariants next time.
Quicksort is very fast in practice.
Faster than mergesort because Quicksort can be done “in place”.
Informal analysis
If there are duplicate elements, then the algorithm does not specify which subarray, B or C, should get them. Ideally, the duplicates split down the middle.
Also, it is not specified how to choose the pivot. Ideally it would be the median value of the array, but this would be expensive to compute.
As a result, it is possible that quicksort will show O(N²) behavior.
Worst-case behavior
105 47 13 17 30 222 5 19            pivot 5 (the smallest element)
5  [47 13 17 30 222 19 105]         pivot 13
   13  [47 17 30 222 19 105]        pivot 17
       17  [47 30 222 19 105]       pivot 19
           19  [47 30 222 105]      …
Each partition strips off only the pivot, so there are N levels.
Analysis of quicksort
Assume random pivot.
T(0) = 1
T(1) = 1
T(N) = T(i) + T(N-i-1) + cN, for N>1
where i is the size of the left subarray.
Worst-case analysis
If the pivot is always the smallest element, then:
T(0) = 1
T(1) = 1
T(N) = T(0) + T(N-1) + cN, for N>1
     ≈ T(N-1) + cN
which solves to O(N²).
See the book for details on this solution.
Best-case analysis
In the best case, the pivot is always the median element.
In that case, the splits are always “down the middle”.
Hence, same behavior as mergesort.
That is, O(N log N).
Average-case analysis
Consider the quicksort tree:
105 47 13 17 30 222 5 19            choose pivot 19
[5 17 13]  19  [47 30 222 105]
[5] 13 [17]    [30] 47 [222 105]
               [105] 222
Average-case analysis
The time spent at each level of the tree is O(N).
So, on average, how many levels? That is, what is the expected height of the tree?
If on average there are O(log N) levels, then quicksort is O(N log N) on average.
Average-case analysis
We’ll answer this question next time…
Summary of quicksort
A fast sorting algorithm in practice.
Can be implemented in-place.
But is O(N²) in the worst case.
Average-case performance?