COP 3540 Data Structures with OOP

Chapter 7 - Part 1: Advanced Sorting

Advanced Sorting

Two topics we will cover first:

• Shell Sort: roughly an O(n (log2 n)^2) sort in general, which in favorable cases ‘can approach’ O(n) performance!

• Partitioning: an O(n) operation that is the key step in QuickSort.

Then we’ll cover the QuickSort, which is O(n log2 n) on average.

Recall how the Insertion Sort worked.

We took an element out of the array and assumed all elements to its left were already sorted.

We marked this spot and extracted that element.

We then compared the extracted element with the elements to its left, shifting each larger element one place to the right to make room, and inserted the extracted element into its proper place in the vacated spot.
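The recap above can be sketched in Java as follows. This is a minimal illustration, not the course's own code: the class name InsertionSortSketch and the method name insertionSort are assumptions made for the example.

```java
// Illustrative sketch of the insertion sort recalled above.
// Class and method names are hypothetical, not from the course code.
class InsertionSortSketch {
    static void insertionSort(long[] a) {
        // Everything to the left of 'outer' is already sorted.
        for (int outer = 1; outer < a.length; outer++) {
            long temp = a[outer];      // extract the marked element
            int inner = outer;
            // Shift larger elements one place to the right.
            while (inner > 0 && a[inner - 1] > temp) {
                a[inner] = a[inner - 1];
                inner--;
            }
            a[inner] = temp;           // insert into the vacated spot
        }
    }
}
```

Calling InsertionSortSketch.insertionSort on an array such as {77, 99, 44, 55, 22} sorts it in place in ascending order.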

Approach that helped us:

Constraints we used to help ourselves:

• Starting with a single element to the left, so we knew ‘that’ element was sorted (a single element is certainly sorted unto itself).

Then we proceeded:

• Slowly the sorted group to the ‘left’ of the marked element grew, as new numbers found their proper place in the subarray to the left, while the unsorted elements to the right diminished in number.

Potential Problems with the Insertion Sort

What happens if the next number to be inserted is very small and our sort is ascending (or very large and our sort is descending)?

• This may require a large number of ‘copies’ to the right to make room for the new element: close to ‘n’ copies, in fact.
• In the worst case the element at position i is copied past all i elements to its left, so the total is 1 + 2 + … + (n-1) = n(n-1)/2, or about n^2/2 copies (about n^2/4 on average).
• That may result in a very inefficient sort. This is why the insertion sort is an O(n^2) sort.

It is this amount of comparing and shifting that degrades its performance.
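To make the copy count concrete, here is a small sketch that counts the shifts an insertion sort performs. The class InsertionShiftCount and its method names are hypothetical, introduced only for this illustration.

```java
// Illustrative sketch: count the right-shifts made by an insertion sort.
// All names here are hypothetical, chosen for the example.
class InsertionShiftCount {
    // Sorts 'a' ascending and returns the number of element shifts.
    static long sortAndCountShifts(long[] a) {
        long shifts = 0;
        for (int outer = 1; outer < a.length; outer++) {
            long temp = a[outer];
            int inner = outer;
            while (inner > 0 && a[inner - 1] > temp) {
                a[inner] = a[inner - 1];   // one copy to the right
                inner--;
                shifts++;
            }
            a[inner] = temp;
        }
        return shifts;
    }

    // Builds the worst-case input for an ascending sort: n, n-1, ..., 1.
    static long[] descending(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = n - i;
        }
        return a;
    }
}
```

For n = 10 the descending input needs 1 + 2 + … + 9 = 45 shifts, which is n(n-1)/2, while an already sorted input needs none.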

Shell Sort Approach

We want to reduce the number of these large shifts.

Shell sort does this by first sorting very small subsets of the numbers, like three or four at a time:
• The numbers in a subset might be large distances apart (as in a large array).
• It sorts them with respect to each other.

By sorting a small number of widely spaced numbers, very small (or very large) numbers can be put much more nearly ‘in place’ much more quickly than by shifting one position at a time.

How is this done?

Shell Sort uses the notion of a ‘computed Gap’

The Shell Sort uses a computed ‘gap’, represented by ‘h’, as the distance between the numbers in each subset to be sorted.

1. It sorts all numbers in the array that are the same gap ‘h’ apart.
• Like numbers eight apart, or four apart…
• It sorts these numbers with respect to each other.

2. After doing this, the algorithm reduces the gap (or distance) to a smaller number, like maybe 4 apart.

3. Ultimately the gap has size 1. Then the algorithm ‘1-sorts’ the array, which is an ordinary insertion sort.

Example

Consider: sort about three elements at a time with respect to each other, where the numbers are some distance ‘h’ apart.

For array size n = 10 and gap size h = 4, we have four sub-arrays (we call this a 4-sort):
Indices: (0,4,8), (1,5,9), (2,6) and (3,7).

The elements in each set are sorted with respect to each other.

(Note: all ten elements take part in the sort!)
The sub-arrays are interleaved but, again, each is sorted only with respect to its own members.
(Note: the integers are not yet in their final spots.)
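The index groups in the example can be generated mechanically. The sketch below is only an illustration of how an h-sort splits an array into interleaved sub-arrays; the class GapGroups and its groups method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: list the index groups that are h-sorted together.
// The class and method names are hypothetical.
class GapGroups {
    // For an array of n elements and gap h, returns one list of indices
    // per starting offset 0..h-1; each list is one interleaved sub-array.
    static List<List<Integer>> groups(int n, int h) {
        List<List<Integer>> result = new ArrayList<>();
        for (int start = 0; start < h && start < n; start++) {
            List<Integer> group = new ArrayList<>();
            for (int i = start; i < n; i += h) {
                group.add(i);
            }
            result.add(group);
        }
        return result;
    }
}
```

With n = 10 and h = 4, groups(10, 4) yields the four index sets listed above: (0,4,8), (1,5,9), (2,6) and (3,7).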

Consider Improved Performance!

Recall again the Insertion Sort:
• It is very efficient for arrays that are nearly sorted (fewer comparisons and less movement),
• and yet it can be very inefficient (due to shifts and copies) if the data are very unsorted.
• This is particularly true when very large or very small numbers are far from where they belong.

Shell sort does ‘h-sorting’:
• It capitalizes on the initial position of elements, especially if they are far from where they might ultimately end up.
• It brings numbers more quickly to their final position (or nearer to it).

The algorithm moves elements that are very far from their final position much closer to it in a few steps, thus reducing copying and shifting!

Shell Sort can approach O(n) performance in favorable cases: much better than O(n^2)!

What about Larger Arrays? Gap Size?

The gap sizes come from a carefully researched formula. Don Knuth proposed a ‘recursive’ relationship (a recurrence):

h = 3*h + 1 to build the gaps up from 1, and then h = (h-1)/3 to step back down through the subsequent gaps.

(Note the ‘recursion’ in the formula itself: it uses the current value of h to compute the new value of h.)

These h-values are referred to as the interval sequence or gap sequence.

In more detail:

With Don Knuth’s gap sequence, the first pass sorts groups of at most three numbers that are some distance apart.

As it turns out (the algorithm is a few slides ahead), for an array of 363 to 1091 elements (nElems/3 uses integer division) the first pass uses a gap size of 364.

After that pass, use a gap size of 121; then gap size = 40; steadily decreasing…

Develop the initial gap size by starting at h = 1 and repeatedly computing h = h*3 + 1 until h <= nElems/3 is false.

Gap sizes:

h       3*h+1
1       4
4       13
13      40
40      121
121     364
364     1093
1093    3280

So, computing h, we see that h increases from 1 to 4 to 13 to 40 to 121 to 364 to …

Once the initial gap is determined, the sort proceeds and the algorithm steadily reduces the gap h from 364 to 121 … until h = 1.

So for an array of 363 to 1091 elements, the initial gap = 364, etc.
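The gap computation described here can be tried on its own. The following is a minimal sketch under the same rule (grow h = 3h + 1 while h <= nElems/3, then shrink with h = (h-1)/3); the class KnuthGaps and its method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the gap sequence; names are hypothetical.
class KnuthGaps {
    // Initial gap: start at 1 and apply h = 3h + 1 while h <= nElems/3.
    static int initialGap(int nElems) {
        int h = 1;
        while (h <= nElems / 3) {
            h = h * 3 + 1;
        }
        return h;
    }

    // All gaps the sort will use, from the initial gap down to 1.
    static List<Integer> gapSequence(int nElems) {
        List<Integer> gaps = new ArrayList<>();
        for (int h = initialGap(nElems); h > 0; h = (h - 1) / 3) {
            gaps.add(h);
        }
        return gaps;
    }
}
```

For a 1000-element array, gapSequence(1000) gives 364, 121, 40, 13, 4, 1, and for 100 elements the sequence starts at 40.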

Algorithm (covered in previous slide)

The algorithm first uses a short loop to generate the first (initial) value of h. This value depends on the size of the array to be sorted.

Then, once we have an initial value of h, the additional values of h are computed from it, one after another.

The gap then starts with the largest h-value.

For a 1000-element array, our initial gap size is 364.

After each pass, we successively decrease the gap using the formula h = (h-1)/3, as shown.

Note:

As it turns out, for a given gap the algorithm first sorts the first two elements of each group; then it goes back and sorts the three-element groups. This results in better performance time.

You will see this if you look carefully at the outer loop of the algorithm.

public void shellSort() {
    int inner, outer;
    long temp;

    // Compute the initial gap h from the sequence 1, 4, 13, 40, 121, 364, ...
    // Its value depends on the original size of the array, nElems:
    // the loop stops at the first h that is greater than nElems/3.
    int h = 1;
    while (h <= nElems / 3)
        h = h * 3 + 1;

    while (h > 0) {                                  // for a 1000-element array, h starts at 364
        // h-sort the array
        for (outer = h; outer < nElems; outer++) {   // outer steps by one from h to nElems-1
            temp = theArray[outer];
            inner = outer;
            while (inner > h - 1 && theArray[inner - h] >= temp) {
                theArray[inner] = theArray[inner - h];
                inner -= h;
            } // end while
            theArray[inner] = temp;
        } // end for
        h = (h - 1) / 3;                             // compute the next, smaller gap
    } // end while (h > 0)
} // end shellSort()
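To see the reduction in copying, the method above can be instrumented. The sketch below is a standalone variant that works on a plain long[] instead of the class fields theArray and nElems, and counts the shifts in each pass. The class ShellSortCopies and all of its method names are hypothetical; it assumes that an insertion sort is simply an h-sort with h fixed at 1.

```java
// Illustrative sketch: Shell sort versus insertion sort, counting shifts.
// All names are hypothetical; this is not the course's class.
class ShellSortCopies {
    // One h-sort pass over 'a'; returns the number of shifts it made.
    static long hSort(long[] a, int h) {
        long copies = 0;
        for (int outer = h; outer < a.length; outer++) {
            long temp = a[outer];
            int inner = outer;
            while (inner > h - 1 && a[inner - h] >= temp) {
                a[inner] = a[inner - h];
                inner -= h;
                copies++;
            }
            a[inner] = temp;
        }
        return copies;
    }

    // Shell sort with the gaps ..., 40, 13, 4, 1; returns total shifts.
    static long shellSort(long[] a) {
        int h = 1;
        while (h <= a.length / 3) {
            h = h * 3 + 1;
        }
        long copies = 0;
        for (; h > 0; h = (h - 1) / 3) {
            copies += hSort(a, h);
        }
        return copies;
    }

    // Insertion sort is the final 1-sort on its own; returns total shifts.
    static long insertionSort(long[] a) {
        return hSort(a, 1);
    }

    // Worst-case input for an ascending sort: n, n-1, ..., 1.
    static long[] descending(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = n - i;
        }
        return a;
    }
}
```

On a descending 1000-element array, insertionSort performs 499500 shifts (n(n-1)/2), while shellSort sorts the same data with far fewer, because the early large-gap passes move elements most of the way in a single copy.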

Google: Shell Sort Applet

Google: applet Lafore

You will get a number of applet choices. Select and enjoy.

Demo of Shell Sort

Do n = 12 and notice how the gap varies across the bars.
• You can see when h goes from 4 to 1.
• You can see when it compares two in the interval, then three; then it 1-sorts.

Do a 100-bar sort.
• It starts with h = 40.
• See it compare two of the three in each interval until there are only intervals of two left.
• There is a larger number of intervals when it goes to h = 13.
• Go to h = 4 and see more intervals yet.
• Finally, h = 1.

Do this.

Shell Sort - Evaluation

Good for medium-sized arrays, up to a few thousand items.

Shell Sort, at roughly O(n (log2 n)^2), is not as fast as the Quick Sort at O(n log2 n) (coming soon).

Not so good for large files, but:
• Easy to implement
• Requires very little extra space.

All sorts have a ‘worst case’ performance. For Shell Sort, the worst case is not much worse than average performance, so this is good!

(The worst case is very different from the average case in a Quick Sort.)

Final Remarks on Shell Sort

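Other gap sequences are available.
• Many alternatives are available. Can experiment…
• Ultimately, the sequence needs to end with a gap of 1.
• This forces the last pass to be an ordinary insertion sort.

Guideline: gaps should be relatively prime (no common divisor other than 1).
• Note that the gaps presented are not all prime numbers themselves (4, 40…); being relatively prime is a weaker requirement.
• Sequences that ignore this guideline (such as Shell’s original n/2, n/4, …) led to some earlier inefficiencies.

Experiments on Shell Sort yield performance mostly between O(n^(3/2)) and O(n^(7/6)), that is, from well below O(n^2) down to not far above O(n)!

Quite a difference, and the difference becomes more noticeable as n increases, which makes sense.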

Partitioning

Partitioning is key to QuickSort thinking.

Partitioning divides data into two groups depending on the value of a key.
• E.g. divide students into two groups: gpa < 3.0 and gpa >= 3.0.
• (Incidentally, why is a gpa of 3.0 important??)

We select a Pivot Value: the value used to separate the data items into two groups. We end up with data < pivot value and data >= pivot value.

Pivot Values

Note: the pivot can be any key value. It need not be a midpoint or the value ‘half-way.’

It would be nice if the pivot were the half-way point, but we have no way of knowing that value in advance…

Later we will see how the choice of the pivot impacts performance!

The pivot value is used to separate the array into a left side and a right side.

Ideally, we’d ‘like’ the sub-arrays to be roughly the same size, and we will work toward that reality.

Run Partition Algorithm to build Sub-Arrays

Once the pivot value is selected, we run the partition algorithm.

Once it has run, data less than the pivot value ‘belong’ to the left side of the array (whatever number of elements that may be), and

data greater than or equal to (>=) the pivot value belong to the right side, however many elements that may be.

Note: once partitioning is run, the data is NOT sorted.
• But the items are a lot ‘closer’ to their final positions…
• And the array is partitioned based on the pivot value.

The Partitioning Algorithm

Pick a pivot value… (more later)

Left scan: start with an index at the left end of the range to be partitioned.
• Move toward the right, comparing each element to the pivot value.
• If an element is less than the pivot value, leave it alone and move to the right.
• Advance to the right until an element is >= pivot value, and then stop.

Right scan: start with an index at the right-most position of the range.
• Move toward the left, comparing each element to the pivot value.
• If an element is >= pivot value, leave it alone and move to the left.
• Advance to the left until an element is < pivot value, and then stop.

Swap the two values.

Iterate (back on the left; then right) until the left and right scans meet or cross.
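The scan-and-swap steps can be sketched in Java as below. This is one possible arrangement of the two scans, written for illustration and not necessarily the code used in the course or the applet; the class PartitionSketch and the parameter names are assumptions. Both inner loops check the index bounds before reading an element, so a pivot smaller or larger than every element cannot run a scan off the end of the array.

```java
// Illustrative sketch of partitioning around a pivot value.
// Names are hypothetical; the details may differ from the course code.
class PartitionSketch {
    // Rearranges a[left..right] so that elements < pivot come first and
    // elements >= pivot come last. Returns the index of the first element
    // of the right-hand group (right + 1 if that group is empty).
    static int partition(long[] a, int left, int right, long pivot) {
        int lo = left;    // left scan index
        int hi = right;   // right scan index
        while (true) {
            // Left scan: skip elements that already belong on the left.
            while (lo <= hi && a[lo] < pivot) {
                lo++;
            }
            // Right scan: skip elements that already belong on the right.
            while (hi >= lo && a[hi] >= pivot) {
                hi--;
            }
            if (lo >= hi) {
                break;    // scans have met or crossed: partition done
            }
            // a[lo] >= pivot and a[hi] < pivot: swap the two values.
            long tmp = a[lo];
            a[lo] = a[hi];
            a[hi] = tmp;
            lo++;
            hi--;
        }
        return lo;
    }
}
```

For example, partitioning {42, 89, 63, 12, 94, 27, 78, 3, 50, 36} around a pivot of 45 returns 5: the five values below 45 end up in positions 0 to 4, in no particular order, and the rest follow.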

….

Let’s look at the applet

Partition.html

Google: applet Lafore

Run with n=12 with various orderings…

Run with n=40. Notice the partition first and the final ordering…

Note: in running the partitioning algorithm the data are not totally sorted – but they are a good bit closer.

Partitioning and the Pivot Value

Note that partitioning is not stable.
• As elements on one side are moved to the other side of the pivot value, they are NOT necessarily in the same relative positions in this ‘new’ partition!
• In fact, they tend to be in reverse order.

Further, the number of elements on each side need not be the same either; it depends on the pivot value.

Very likely, there is NOT the same number of elements on each side of the pivot.

One (of several) Problems with Partitioning

1. What if a poor pivot value were chosen such that all elements were < pivot value?

Without a bounds check, the left-scan index keeps advancing, and we end up with an array index out of bounds exception.

Ditto the other way for the right scan. In the code below, the test leftPtr < right stops the left scan at the end of the array:

while (leftPtr < right && theArray[++leftPtr] < pivot)
    ; // nop

Clearly, as in any program that is to be robust, there must be checks that guard against such pivot values.

Efficiency of the Partition Algorithm

The partition algorithm is pretty efficient too: it runs in O(n) time.

The pointers move from opposite ends, comparing and swapping at a roughly constant rate.

If the number of items were doubled to 2n, the algorithm would take roughly twice as long.

Thus the algorithm operates in O(n) time, which means the time is proportional to the number of items being partitioned.

Efficiency of the Partitioning Algorithm

Non-random data can yield the most swaps! If the data is inversely ordered and the pivot splits it in half, then every pair will be swapped, so n/2 swaps. That is the maximum for one partitioning pass, and it is still proportional to n. Only if a sort had to repeat a pass of this kind about n times would the total grow to the order of n^2/2. Poor!

Random data yields fewer than n/2 swaps. Some items will already be on the correct side. On average for random data, about half of the maximum number of swaps will take place.

Regardless of random or non-random data, both situations result in an efficiency proportional to n.
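The swap counts quoted above can be checked with a small experiment. The sketch below partitions a whole array around a pivot and returns the number of swaps; it is an illustration under the same scan rules as the algorithm slide, and the class PartitionSwapCount and its method names are hypothetical.

```java
// Illustrative sketch: count the swaps made by one partitioning pass.
// Names are hypothetical, chosen for the example.
class PartitionSwapCount {
    // Partitions the whole array around 'pivot' (< pivot on the left,
    // >= pivot on the right) and returns how many swaps were needed.
    static int countSwaps(long[] a, long pivot) {
        int lo = 0;
        int hi = a.length - 1;
        int swaps = 0;
        while (true) {
            while (lo <= hi && a[lo] < pivot) {
                lo++;
            }
            while (hi >= lo && a[hi] >= pivot) {
                hi--;
            }
            if (lo >= hi) {
                break;
            }
            long tmp = a[lo];
            a[lo] = a[hi];
            a[hi] = tmp;
            lo++;
            hi--;
            swaps++;
        }
        return swaps;
    }

    // Inversely ordered input: n, n-1, ..., 1.
    static long[] descending(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = n - i;
        }
        return a;
    }
}
```

With the inversely ordered values 10 down to 1 and a pivot of 6, every one of the five pairs is swapped, giving n/2 = 5 swaps; the same values already in ascending order need no swaps at all.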