Sorting 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 17, 2004


Page 1: Sorting

Sorting

15-211 Fundamental Data Structures and Algorithms

Klaus Sutner

February 17, 2004

Page 2: Sorting

Announcements

Homework 5 is out

Reading:

Chapter 8 in MAW

Quiz 1 available on Thursday

Page 3: Sorting

Introduction to Sorting

Page 4: Sorting

Boring …

Sorting is admittedly not very sexy; everybody knows some algorithms already, …

But: Good sorting algorithms are needed absolutely everywhere.

Sorting is fairly well understood theoretically.

Provides a good way to introduce some important ideas.

Page 5: Sorting

The Problem

We are given a sequence of items

a_1 a_2 a_3 … a_{n-1} a_n

We want to rearrange them so that they are in non-decreasing order.

More precisely, we need a permutation f such that

a_{f(1)} ≤ a_{f(2)} ≤ a_{f(3)} ≤ … ≤ a_{f(n-1)} ≤ a_{f(n)}.

Page 6: Sorting

A Constraint

Comparison Based Sorting

While we are rearranging the items, we will only use queries of the form

a_i ≤ a_j

Or variants thereof (<,> and so forth).

Page 7: Sorting

Say What?

The important point here is that the algorithm can only make comparisons such as

if( a[i] < a[j] ) …

We are not allowed to look at pieces of the elements a[i] and a[j]. For example, if these elements are numbers, we are not allowed to compare the most significant digits.
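To make the constraint concrete, here is a small C illustration (mine, not from the slides): the standard library's qsort treats the elements as opaque blobs and learns about their order only through a comparison callback, which is exactly the query model described above.

#include <stdio.h>
#include <stdlib.h>

/* The only operation a comparison-based sort may apply to the data:
   a black-box comparison. qsort() never inspects the bytes of an
   element on its own; it only calls this function. */
static int cmp_int( const void *p, const void *q )
{
    int a = *(const int *)p;
    int b = *(const int *)q;
    return (a > b) - (a < b);    /* negative, zero, or positive */
}

int main( void )
{
    int a[] = { 105, 47, 13, 99, 30, 222 };
    size_t n = sizeof a / sizeof a[0];

    qsort( a, n, sizeof a[0], cmp_int );

    for( size_t i = 0; i < n; i++ )
        printf( "%d ", a[i] );
    printf( "\n" );
    return 0;
}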

Page 8: Sorting

An Easy Upper Bound

Here is a simple idea to sort an array: a flip is a position in the array where two adjacent elements are out of order.

a[i] > a[i+1]

Let’s look for a flip and correct it by swapping the two elements.

Page 9: Sorting

A Prototype Algorithm

// FlipSort
while( there is a flip )
    pick one, fix it

Is this algorithm guaranteed to terminate?

If so, what can we say about its running time?

Is it correct, i.e., is the array sorted?
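As a sanity check, here is one way the prototype might look in C (a sketch, with the scanning strategy left deliberately naive): repeatedly sweep the array, fixing every flip encountered, until a sweep finds none.

/* FlipSort, naively: keep scanning for flips and fixing them.
   Terminates because each swap removes exactly one inversion. */
void flip_sort( int a[], int n )
{
    int done = 0;
    while( !done ) {
        done = 1;                       /* assume no flip remains */
        for( int i = 0; i + 1 < n; i++ ) {
            if( a[i] > a[i+1] ) {       /* found a flip */
                int t = a[i]; a[i] = a[i+1]; a[i+1] = t;
                done = 0;               /* array changed; scan again */
            }
        }
    }
}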

Page 10: Sorting

Termination

while( there is a flip )
    pick one, fix it

It’s tempting to do induction on the number of flips, but beware:

10 15 5 10  →  10 5 15 10

Fixing the single flip (15, 5) creates two new flips, (10, 5) and (15, 10): the number of flips can go up.

We need to talk about inversions instead.

Page 11: Sorting

Flips and Inversions

24 47 13 99 105 222

Here (47, 13) is a flip: an inversion between adjacent elements. A pair like (24, 13) is an inversion, but not a flip.

Page 12: Sorting

Running Time

The total number of inversions is at most n(n-1)/2, hence quadratic.

So we can sort in quadratic time if we can manage to find and fix a flip in constant time.

We need to organize the search somehow.

Probably should try to avoid recomputation.

Page 13: Sorting

Naïve sorting algorithms

Bubble Sort

Selection Sort

Insertion Sort (this one is actually important)

All of these are quadratic in the worst case and on average.

Page 14: Sorting

Bubble Sort

Scan through the array, fix flips as you go along. Repeat until array is sorted.

for( i = 2; i <= n; i++ )
    for( j = n; j >= i; j-- )
        if( A[j-1] > A[j] ) {
            int t = A[j-1];         // swap A[j-1] and A[j]
            A[j-1] = A[j];
            A[j] = t;
        }

Page 15: Sorting

Selection Sort

For k = n, n-1, … find the smallest element in the last k elements of the array and swap it to the front.

for( i = 1; i <= n-1; i++ ) {
    find A[j] minimal in A[i..n];
    swap A[i] and A[j];
}
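A runnable rendering of this pseudocode (my sketch; note it uses 0-based C indexing where the slide uses A[1..n]):

/* Selection sort: repeatedly select the minimum of the unsorted
   suffix a[i..n-1] and swap it to the front of that suffix. */
void selection_sort( int a[], int n )
{
    for( int i = 0; i < n - 1; i++ ) {
        int min = i;                   /* index of smallest so far */
        for( int j = i + 1; j < n; j++ )
            if( a[j] < a[min] )
                min = j;
        int t = a[i]; a[i] = a[min]; a[min] = t;
    }
}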

Page 16: Sorting

Insertion Sort

Place the i-th element into its proper place in the already sorted list of the first i-1 elements.

for i = 2 to n do
    order-insert a[i] in a[1:i-1]

Can be implemented nicely.

Page 17: Sorting

Insertion Sort

Using a sentinel.

for( i = 2; i <= n; i++ ) {
    x = A[i];
    A[0] = x;                       // sentinel: the inner loop is guaranteed to stop
    for( j = i; x < A[j-1]; j-- )
        A[j] = A[j-1];
    A[j] = x;
}

Page 18: Sorting

Insertion sort

105 47 13 99 30 222

47 105 13 99 30 222

13 47 105 99 30 222

13 47 99 105 30 222

13 30 47 99 105 222

The sorted sublist grows from the left by one element per step.

Page 19: Sorting

How fast is insertion sort?

Takes O(n + #inversions) steps, which is very fast if the array is nearly sorted to begin with.

3 2 1 6 5 4 9 8 7 …

Page 20: Sorting

How long does it take to sort?

Can we do better than O(n^2)?

In the worst case?

In the average case?

Page 21: Sorting

Sorting in O(n log n)

O(n log n) turns out to be a Magic Wall: it is hard to reach, and exceedingly hard to break through.

In fact, in a sense it is impossible for a comparison-based sort to do better than O(n log n).

We already know that Heapsort will give us this bound:

- build the heap in linear time,
- destroy it in O(n log n).
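For reference, a compact heapsort sketch showing these two phases (my 0-indexed illustration, not the course's reference code):

/* Phase 1 builds a max-heap bottom-up in O(n); phase 2 repeatedly
   swaps the maximum to the end and restores the heap, O(n log n). */
static void sift_down( int a[], int n, int i )
{
    for( ;; ) {
        int largest = i, l = 2*i + 1, r = 2*i + 2;
        if( l < n && a[l] > a[largest] ) largest = l;
        if( r < n && a[r] > a[largest] ) largest = r;
        if( largest == i ) return;
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        i = largest;
    }
}

void heap_sort( int a[], int n )
{
    for( int i = n/2 - 1; i >= 0; i-- )     /* build: O(n) */
        sift_down( a, n, i );
    for( int i = n - 1; i > 0; i-- ) {      /* destroy: O(n log n) */
        int t = a[0]; a[0] = a[i]; a[i] = t;
        sift_down( a, i, 0 );
    }
}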

Page 22: Sorting

Heapsort in practice

The average-case analysis for heapsort is somewhat complex.

In practice, heapsort consistently tends to use nearly n log n comparisons.

So, while the worst case is better than n^2, other algorithms sometimes work better.

Page 23: Sorting

Shellsort

Shellsort, like insertion sort, is based on swapping inverted pairs.

It achieves O(n^{4/3}) running time (with a suitable choice of gap sequence).

[See your book for details.]

Page 24: Sorting

Shellsort

Example with gap sequence 3, 1.

105 47 13 99 30 222

99 47 13 105 30 222

99 30 13 105 47 222

99 30 13 105 47 222

30 99 13 105 47 222

30 13 99 105 47 222

...

Several inverted pairs fixed in one exchange.
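A sketch of Shellsort with the slide's toy gap sequence 3, 1 (my code; real implementations use longer, carefully chosen sequences, and the O(n^{4/3}) bound depends on that choice):

/* Each pass is an insertion sort over elements gap apart.
   The final gap of 1 is plain insertion sort, by then running
   on an almost-sorted array. */
void shell_sort( int a[], int n )
{
    static const int gaps[] = { 3, 1 };
    for( int g = 0; g < 2; g++ ) {
        int gap = gaps[g];
        for( int i = gap; i < n; i++ ) {
            int x = a[i], j = i;
            while( j >= gap && a[j-gap] > x ) {
                a[j] = a[j-gap];        /* shift gap-apart elements */
                j -= gap;
            }
            a[j] = x;
        }
    }
}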

Page 25: Sorting

Recursive Sorting

Page 26: Sorting

Recursive sorting

Intuitively, divide the problem into pieces and then recombine the results.

If array is length 1, then done.

If array is length N>1, then split in half and sort each half.

Then combine the results.

An example of a divide-and-conquer algorithm.

Page 27: Sorting

Divide-and-conquer

[diagram omitted]

Page 28: Sorting

Divide-and-conquer

[diagram omitted]

Why divide-and-conquer works

Suppose the amount of work required to divide and recombine is linear, that is, O(n).

Suppose also that the work needed to solve an instance directly grows faster than linearly.

Then splitting an instance in half reduces the total work by more than a linear amount, while requiring only linear work to do so.

Page 30: Sorting

Divide-and-conquer is big

We will see several examples of divide-and-conquer in this course.

Page 31: Sorting

Recursive Sorting

If array is length 1, then done.

Otherwise, split into two smaller pieces.

Sort each piece.

Combine the sorted pieces.

Page 32: Sorting

Two Major Approaches

1. Make the split trivial, but perform some work when the pieces are combined: Merge Sort.

2. Work during the split, but then do nothing in the combination step: Quick Sort.

In either case, the overhead should be linear, with small constants.

Page 33: Sorting

Analysis

The analysis is relatively easy if the two pieceshave (approximately) the same size.

This is the case for Merge Sort, but not forQuick Sort.

Let’s ignore the second case for the time being.

Page 34: Sorting

Recurrence Equations

We need to deal with equations of the form

T(1) = 1
T(n) = 2 T(n/2) + f(n)

Here f(n) is the non-recursive overhead.

There are two recursive calls, each to a sub-instance of the same size n/2.

Of course, there are other cases to consider.

Page 35: Sorting

Recurrence Equations

A slight generalization is

T(1) = 1
T(n) = a T(n/b) + f(n)

Here f(n) is again the non-recursive overhead.

There are a recursive calls, each on a sub-instance of size n/b.

Page 36: Sorting

Recurrence Equations

Of course, we’re cheating:

T(1) = 1
T(n) = a T(n/b) + f(n)

Makes no sense unless b divides n.

Let’s just ignore this. In reality there are ceilings and floors and continuity arguments everywhere.

Page 37: Sorting

Mergesort

Page 38: Sorting

The Algorithm

Merging the two sorted parts here is responsible for the overhead.

merge( nil, B ) = B;
merge( A, nil ) = A;
merge( a A, b B ) =
    if( a <= b )
        prepend( merge( A, b B ), a )
    else
        prepend( merge( a A, B ), b )

Page 39: Sorting

The Algorithm

The main function.

List MergeSort( List L )
{
    if( length(L) <= 1 ) return L;
    A = first half of L;
    B = second half of L;
    return merge( MergeSort(A), MergeSort(B) );
}

Page 40: Sorting

Harsh Reality

In reality, the items are always given in an array.

The first and second half can be found by index arithmetic.

[diagram: the array L split by index arithmetic into a left half and a right half]
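Here is what the array version might look like in C (a sketch; the half-open ranges [lo, hi) and the caller-provided scratch array tmp are my choices, not the slides'):

#include <string.h>

/* Sort a[lo..hi-1]. The halves are found by index arithmetic;
   the merge goes through the scratch array tmp, since it cannot
   be done in place (see the next slide). */
static void merge_sort_rec( int a[], int tmp[], int lo, int hi )
{
    if( hi - lo <= 1 ) return;            /* length 0 or 1: done */
    int mid = lo + (hi - lo) / 2;
    merge_sort_rec( a, tmp, lo, mid );
    merge_sort_rec( a, tmp, mid, hi );

    int i = lo, j = mid, k = lo;
    while( i < mid && j < hi )            /* merge the two sorted runs */
        tmp[k++] = ( a[i] <= a[j] ) ? a[i++] : a[j++];
    while( i < mid ) tmp[k++] = a[i++];
    while( j < hi )  tmp[k++] = a[j++];
    memcpy( a + lo, tmp + lo, (size_t)(hi - lo) * sizeof a[0] );
}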

Page 41: Sorting

But Note …

We cannot perform the merge operation in place.

Rather, we need to have another array as scratch space.

The total space requirement for Merge Sort is

2n + O(log n)

Assuming the recursive implementation.

Page 42: Sorting

Running Time

Solving the recurrence equation for Merge Sort, one can see that the running time is

O(n log n)
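To see where this comes from, unfold the recurrence T(n) = 2 T(n/2) + cn (assuming, for simplicity, that n is a power of two):

T(n) = 2 T(n/2) + cn
     = 4 T(n/4) + 2cn
     = 8 T(n/8) + 3cn
     = …
     = 2^k T(n/2^k) + k cn
     = n T(1) + cn log n          (at k = log n)
     = O(n log n)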

Since Merge Sort reads the data strictly sequentially, it is sometimes useful when the data reside on slow external media.

But overall it is no match for Quick Sort.

Page 43: Sorting

Quicksort

Page 44: Sorting

Quicksort

Quicksort was invented in 1960 by Tony Hoare.

Although it has O(N^2) worst-case performance, on average it is O(N log N).

More importantly, it is the fastest known comparison-based sorting algorithm in practice.

Page 45: Sorting

Quicksort idea

Choose a pivot.

Page 46: Sorting

Quicksort idea

Choose a pivot.

Rearrange so that pivot is in the “right” spot.

Page 47: Sorting

Quicksort idea

Choose a pivot.

Rearrange so that pivot is in the “right” spot.

Recurse on each half and conquer!

Page 48: Sorting

Quicksort algorithm

If array A has 1 (or 0) elements, then done.

Choose a pivot element x from A.

Divide A-{x} into two arrays:

B = { y ∈ A | y ≤ x }

C = { y ∈ A | y ≥ x }

Quicksort arrays B and C.

The result is B + {x} + C.

Page 49: Sorting

Quicksort algorithm

105 47 13 17 30 222 5 19               (pivot 19)

5 17 13    [19]    47 30 222 105       (pivots 13 and 47)

5 [13] 17          30 [47] 222 105     (pivot 105)

                           105 222

Reading leaves and pivots left to right: 5 13 17 19 30 47 105 222.

Page 50: Sorting

Quicksort algorithm

[The same quicksort tree as on the previous slide.]

In practice, insertion sort is used once the arrays get “small enough”.

Page 51: Sorting

Doing quicksort in place

85 24 63 50 17 31 96 45        (choose pivot 50 and swap it to the last position)

85 24 63 45 17 31 96 50
L                 R            (L scans right for an element ≥ 50, R scans left for one ≤ 50)

85 24 63 45 17 31 96 50
L              R               (L stops at 85, R stops at 31)

31 24 63 45 17 85 96 50        (swap a[L] and a[R])

Page 52: Sorting

Doing quicksort in place

31 24 63 45 17 85 96 50
      L        R               (L stops at 63, R stops at 17)

31 24 17 45 63 85 96 50        (swap a[L] and a[R])

31 24 17 45 63 85 96 50
         R  L                  (the scans have crossed: R at 45, L at 63)

31 24 17 45 50 85 96 63        (swap the pivot into its final place)
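A C sketch of this in-place partition (mine, not the lecture's reference code; the pivot is parked at the right end, as in the trace above):

/* Partition a[lo..hi] around the pivot a[hi].
   L scans right for an element >= pivot, R scans left for one
   <= pivot; out-of-place pairs are swapped. When the scans
   cross, the pivot is swapped into its final position L. */
static int partition( int a[], int lo, int hi )
{
    int pivot = a[hi];
    int L = lo, R = hi - 1;
    while( L <= R ) {
        while( L <= R && a[L] < pivot ) L++;
        while( L <= R && a[R] > pivot ) R--;
        if( L < R ) {
            int t = a[L]; a[L] = a[R]; a[R] = t;
            L++; R--;
        }
        else if( L == R )       /* a[L] equals the pivot */
            L++;
    }
    int t = a[L]; a[L] = a[hi]; a[hi] = t;
    return L;                   /* the pivot's final index */
}

void quick_sort( int a[], int lo, int hi )   /* sorts a[lo..hi] */
{
    if( lo >= hi ) return;
    int p = partition( a, lo, hi );
    quick_sort( a, lo, p - 1 );
    quick_sort( a, p + 1, hi );
}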

Page 53: Sorting

Quicksort is fast but hard to do

Quicksort, in the early 1960’s, was famous for being incorrectly implemented many times.

More about invariants next time.

Quicksort is very fast in practice.

Faster than mergesort because Quicksort can be done “in place”.

Page 54: Sorting

Informal analysis

If there are duplicate elements, the algorithm does not specify which subarray, B or C, should get them. Ideally, they are split down the middle.

Also, it is not specified how to choose the pivot. Ideally it would be the median value of the array, but the median is expensive to compute.

As a result, it is possible for Quicksort to show O(N^2) behavior.

Page 55: Sorting

Worst-case behavior

105 47 13 17 30 222 5 19       (pivot 5)

47 13 17 30 222 19 105         (pivot 13)

47 105 17 30 222 19            (pivot 17)

47 105 19 30 222               (pivot 19)

…

The pivot is always the smallest remaining element, so each partition strips off a single element: N levels of linear work.

Page 56: Sorting

Analysis of quicksort

Assume a random pivot.

T(0) = 1
T(1) = 1
T(N) = T(i) + T(N-i-1) + cN,   for N > 1

where i is the size of the left subarray.

Page 57: Sorting

Worst-case analysis

If the pivot is always the smallest element, then:

T(0) = 1
T(1) = 1
T(N) = T(0) + T(N-1) + cN  ≈  T(N-1) + cN,   for N > 1

Unfolding gives T(N) ≈ c(N + (N-1) + … + 2) = O(N^2).

See the book for details on this solution.

Page 58: Sorting

Best-case analysis

In the best case, the pivot is always the median element.

In that case, the splits are always “down the middle”.

Hence, same behavior as mergesort.

That is, O(N log N).

Page 59: Sorting

Average-case analysis

Consider the quicksort tree:

[The quicksort tree from the earlier example: pivot 19 at the root, pivots 13 and 47 below, and so on.]

Page 60: Sorting

Average-case analysis

The time spent at each level of the tree is O(N).

So, on average, how many levels are there? That is, what is the expected height of the tree?

If on average there are O(log N) levels, then quicksort is O(N log N) on average.

Page 61: Sorting

Average-case analysis

We’ll answer this question next time…

Page 62: Sorting

Summary of quicksort

A fast sorting algorithm in practice.

Can be implemented in-place.

But it is O(N^2) in the worst case.

Average-case performance?