Sorting – Part II CS 367 – Introduction to Data Structures


Page 1:

Sorting – Part II

CS 367 – Introduction to Data Structures

Page 2:

Better Sorting

• The problem with all of the previous examples is their O(n²) performance
  – this may be acceptable for small data sets, but not large ones

• Theoretically, O(n log n) is possible
  – see the proof in Section 9.2 of the book

Page 3:

Heap Sort

• The major problem with selection sort
  – it has to search the entire back end of the array on every search for the next smallest item
  – what if we could make this search faster?

• A heap always keeps the largest element at the top
  – it only takes O(log n) to remove the top
  – O(log n) is much better than the O(n) search time of selection sort
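This speed difference is easy to see with Java's library heap; a minimal sketch (java.util.PriorityQueue is a min-heap, so Comparator.reverseOrder() makes it behave like the max-heap described here; the class and method names are just for this example):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapDemo {
    // drain a max-heap: each poll() returns the current largest in O(log n)
    static String drainLargestFirst(char[] items) {
        PriorityQueue<Character> heap =
            new PriorityQueue<>(Comparator.reverseOrder());
        for (char c : items) heap.add(c);        // O(log n) per insertion
        StringBuilder out = new StringBuilder();
        while (!heap.isEmpty()) out.append(heap.poll());
        return out.toString();
    }

    public static void main(String[] args) {
        // the letters used in the heap sort examples that follow
        System.out.println(drainLargestFirst(
            new char[]{'M', 'J', 'Z', 'T', 'X', 'N', 'L'}));  // ZXTNMLJ
    }
}
```

Each add and poll costs O(log n), so draining the whole heap is O(n log n) — exactly the bound heap sort exploits.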

Page 4:

Heap Sort

• Basic procedure
  – build a heap
  – swap the root with the last element
  – rebuild the heap, excluding the last element
    • the last element is now where it is supposed to be
  – repeat until only one item is left in the heap

Page 5:

Heap Sort - Conceptually

[Diagram: the heap Z X M T N J L drawn as a tree, with Z at the root, X and M as its children, and T, N, J, L below. The top is repeatedly removed into an output queue and the heap is rebuilt: remove Z (queue: Z) and X becomes the new root; remove X (queue: X Z) and T becomes the new root; remove T (queue: T X Z); and so on until the queue holds every element in sorted order.]

Page 6:

Heap Sort - Implementation

Z X M T N J L
0 1 2 3 4 5 6

swap 0 and 6, rebuild

X T M L N J Z

swap 0 and 5, rebuild

T N M L J X Z

swap 0 and 4, rebuild

N L M J T X Z

swap 0 and 3, rebuild

M L J N T X Z

swap 0 and 2, rebuild

L J M N T X Z

swap 0 and 1, done

J L M N T X Z

Page 7:

Building the Heap

• The heap will be built within the array
  – no extra data structures are needed

• Basic idea
  – start at the last non-terminal node
  – restore the heap for the tree rooted at this node
    • simply swap this node with its largest child if the child is larger
  – repeat this process for all non-terminal nodes

Page 8:

Building the Heap

M J Z T X N L
0 1 2 3 4 5 6

compare Z with its children (no move made)

M J Z T X N L

compare J with its children (swap it with X)

M X Z T J N L

compare M with its children (swap it with Z and then N)

Z X N T J M L

Valid Heap

Page 9:

Building the Heap

• Code to rebuild the heap (Comparable replaces Object so the elements can actually be compared)

void moveDown(Comparable[ ] data, int first, int last) {
  int child = 2 * first + 1;
  while(child <= last) {
    // move to the larger of the two children
    if((child < last) && (data[child].compareTo(data[child + 1]) < 0)) {
      child++;
    }
    if(data[first].compareTo(data[child]) < 0) {
      swap(first, child);
      first = child;
      child = 2 * child + 1;
    }
    else { break; }
  }
}

Page 10:

Heap Sort

• Code to build the heap and sort it

void heapSort(Comparable[ ] data) {
  // build the heap out of the data
  for(int i = data.length / 2; i >= 0; i--)
    moveDown(data, i, data.length - 1);
  // now sort it
  for(int i = data.length - 1; i > 0; i--) {
    swap(0, i);
    moveDown(data, 0, i - 1);
  }
}

Page 11:

Heap Sort

• Time to build the heap in the worst case
  – O(n)
  – the proof can be found in Section 6.9.2 of the book

• Number of swaps to perform
  – always (n – 1)

• Performance to rebuild the heap across all swaps
  – O(n log n)

• Overall performance
  – O(n) + (n – 1) + O(n log n) = O(n log n)
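Putting the two routines together, here is a complete runnable sketch of the algorithm above (Comparable replaces Object so elements can actually be compared, and this swap helper takes the array as an extra argument, unlike the slides' version):

```java
import java.util.Arrays;

public class HeapSort {
    static void swap(Comparable[] data, int i, int j) {
        Comparable tmp = data[i]; data[i] = data[j]; data[j] = tmp;
    }

    static void moveDown(Comparable[] data, int first, int last) {
        int child = 2 * first + 1;
        while (child <= last) {
            if (child < last && data[child].compareTo(data[child + 1]) < 0) {
                child++;                    // move to the larger child
            }
            if (data[first].compareTo(data[child]) < 0) {
                swap(data, first, child);   // parent smaller: move it down
                first = child;
                child = 2 * child + 1;
            } else {
                break;                      // heap property restored
            }
        }
    }

    static void heapSort(Comparable[] data) {
        for (int i = data.length / 2; i >= 0; i--)   // build the heap: O(n)
            moveDown(data, i, data.length - 1);
        for (int i = data.length - 1; i > 0; i--) {  // n - 1 swaps
            swap(data, 0, i);                        // largest moves to the back
            moveDown(data, 0, i - 1);                // rebuild: O(log n)
        }
    }

    public static void main(String[] args) {
        Character[] data = {'M', 'J', 'Z', 'T', 'X', 'N', 'L'};
        heapSort(data);
        System.out.println(Arrays.toString(data));   // [J, L, M, N, T, X, Z]
    }
}
```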

Page 12:

Quicksort

• Basic procedure
  – divide the initial array into two parts
    • all of the elements in the left side must be smaller than all of the elements in the right side
  – sort the two arrays separately and put them back together
    • we now have a completely sorted array
  – however, before sorting the two arrays, divide them each into two more arrays
    • we now have a total of 4 arrays
    • the smallest elements are in the far left array and the largest in the far right
  – repeat this process until only 1-element arrays remain
    • put them all together and the overall array is sorted

Page 13:

Quicksort

Quicksort

[Diagram: an array containing Z, T, M, X, N, J, L is broken into two parts, with J, L, M, N on the left and T, X, Z on the right; then into four parts (J L | M N | X T | Z); then into 7 single-element arrays. Put back together, the result is the sorted array J L M N T X Z.]

Page 14:

Quicksort - Implementing

• Steps
  1. move the largest value to the highest spot
     – this prevents some array overflow problems
  2. pick an upper bound for the left sub-array
     – pick the value in the center of the array
     – move it to the first element so it doesn't get moved
  3. move all elements less than the bound to the left side
  4. move all elements greater than the bound to the right side
  5. the bound will now be in its final position
  6. repeat with the two new arrays
     – from 0 to index(bound) – 1
     – from index(bound) + 1 to array.length – 1

Page 15:

Quicksort - Implementing

void quickSort(Comparable[ ] data) {
  if(data.length < 2) { return; }
  int max = 0;
  // find the highest value and put it in the top spot
  for(int i = 1; i < data.length; i++)
    if(data[i].compareTo(data[max]) > 0) { max = i; }
  swap(max, data.length - 1);
  // start the real algorithm
  quickSort(data, 0, data.length - 2);
}

Page 16:

Quicksort - Implementing

void quickSort(Comparable[ ] data, int first, int last) {
  int lower = first + 1, upper = last;
  swap(first, (first + last) / 2); // pick the bound
  Comparable bound = data[first];
  while(lower <= upper) { // divides the array in two
    while(data[lower].compareTo(bound) < 0) { lower++; } // lowers that are right
    while(data[upper].compareTo(bound) > 0) { upper--; } // uppers that are right
    if(lower < upper) { swap(lower++, upper--); }
    else { lower++; } // the arrays are already split
  }
  swap(upper, first); // puts the bound in its final location
  if(first < upper - 1) { quickSort(data, first, upper - 1); }
  if(upper + 1 < last) { quickSort(data, upper + 1, last); }
}

Page 17:

Quicksort Performance

• Worst case
  – consider selecting the smallest (or largest) number as the bound
  – then all of the numbers end up on one “side”
  – consider sorting the following array
    • [5 3 2 1 4 6 8]
    • 1 will be the first bound and will end up in its proper location
    • however, there will still be n – 1 elements left to sort
    • this will happen on each iteration
  – the result is an O(n²) algorithm
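To see the quadratic blow-up concretely, a small sketch (not the slides' code: it deliberately uses the first element as the bound so the worst case is easy to trigger with an already-sorted array) counts the comparisons made:

```java
public class WorstCase {
    static long comparisons = 0;

    // quicksort that always picks the first element as the bound --
    // on sorted input every partition puts all elements on one side
    static void sort(int[] data, int first, int last) {
        if (first >= last) return;
        int bound = data[first], store = first;
        for (int i = first + 1; i <= last; i++) {
            comparisons++;
            if (data[i] < bound) {
                int t = data[++store]; data[store] = data[i]; data[i] = t;
            }
        }
        int t = data[first]; data[first] = data[store]; data[store] = t;
        sort(data, first, store - 1);
        sort(data, store + 1, last);
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i;   // already sorted
        sort(data, 0, n - 1);
        // (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons: quadratic behavior
        System.out.println(comparisons);           // 499500
    }
}
```

The slides' version picks the center element instead, which avoids this particular trap, but an unlucky bound choice on every level still yields O(n²).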

Page 18:

Quicksort Performance

• So what’s the average case?
  – the answer is O(n log n)

• In practice, quicksort is usually the best sorting algorithm
  – the closer the bound is to the median, the better it performs
  – beware: for arrays under 30 elements, insertion sort is more efficient
    • can you think of how quicksort and insertion sort could be combined?
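One common answer to that question: recurse with quicksort only while the sub-array is large, and finish small pieces with insertion sort. A sketch of the idea (this uses a simple Lomuto-style partition rather than the slides' version; the 30-element cutoff comes from the bullet above, and all names here are just for illustration):

```java
import java.util.Arrays;

public class HybridSort {
    static final int CUTOFF = 30;   // below this size, insertion sort wins

    static void sort(int[] data, int first, int last) {
        if (last - first + 1 <= CUTOFF) {
            insertionSort(data, first, last);   // small range: no recursion
            return;
        }
        int bound = partition(data, first, last);
        sort(data, first, bound - 1);
        sort(data, bound + 1, last);
    }

    // Lomuto partition around the middle element
    static int partition(int[] data, int first, int last) {
        swap(data, (first + last) / 2, last);   // move the bound out of the way
        int bound = data[last], store = first;
        for (int i = first; i < last; i++)
            if (data[i] < bound) swap(data, i, store++);
        swap(data, store, last);                // bound into its final position
        return store;
    }

    static void insertionSort(int[] data, int first, int last) {
        for (int i = first + 1; i <= last; i++) {
            int key = data[i], j = i - 1;
            while (j >= first && data[j] > key) data[j + 1] = data[j--];
            data[j + 1] = key;
        }
    }

    static void swap(int[] d, int i, int j) { int t = d[i]; d[i] = d[j]; d[j] = t; }

    public static void main(String[] args) {
        int[] data = new java.util.Random(42).ints(100, 0, 1000).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);
        sort(data, 0, data.length - 1);
        System.out.println(Arrays.equals(data, expected)); // true
    }
}
```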

Page 19:

Mergesort

• One of the first ever sorting algorithms used on a computer

• It works on a principle similar to quicksort– each array is broken into two parts and then

sorted separately– this partition and sort method continues until

only single element arrays exist– then all of the arrays are put back together to

form a sorted array

Page 20:

Mergesort

• The big difference from quicksort is that the arrays are always broken into equal partitions
  – or, in the case of an odd-sized array, as close to even as possible

• There is no bound selected

• To put the arrays back together, simply select the smallest remaining element from either array and make it the next element

Page 21:

Mergesort

Z M J R X T V
0 1 2 3 4 5 6

break into halves until only 1-element arrays remain

Z M J R | X T V
Z M | J R | X T | V
Z | M | J | R | X | T | V

merge the pieces back together

M Z | J R | T X | V
J M R Z | T V X
J M R T V X Z
0 1 2 3 4 5 6

Page 22:

Merging

• The most sophisticated part of mergesort is recombining (or merging) two separate arrays

• Just go through both arrays, selecting the smaller of the two remaining front elements
  – add it to the new array

Page 23:

Merging

• Pseudo-code

merge(array, first, last) {
  mid = (first + last) / 2;
  i1 = 0;        // index into tmp
  i2 = first;    // index into the left sub-array
  i3 = mid + 1;  // index into the right sub-array
  while( // both left and right sub-arrays contain elements ) {
    if(array[i2] < array[i3]) { tmp[i1++] = array[i2++]; }
    else { tmp[i1++] = array[i3++]; }
  }
  // load into tmp the remaining elements of array
  // copy the elements in tmp back into array
}

Page 24:

Mergesort

• Once the merge code is done, the code for mergesort is easy

• Pseudo-code

mergeSort(data, first, last) {

if(first < last) {

mid = (first + last) / 2;

mergeSort(data, first, mid);

mergeSort(data, mid + 1, last);

merge(data, first, last);

}

}
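Filling in the pseudo-code above, one complete runnable version might look like this (the temporary array tmp is allocated inside merge here for simplicity, though reusing one shared buffer would avoid repeated allocation):

```java
import java.util.Arrays;

public class MergeSort {
    static void mergeSort(int[] data, int first, int last) {
        if (first < last) {
            int mid = (first + last) / 2;
            mergeSort(data, first, mid);      // sort the left half
            mergeSort(data, mid + 1, last);   // sort the right half
            merge(data, first, last);         // combine the two halves
        }
    }

    static void merge(int[] data, int first, int last) {
        int mid = (first + last) / 2;
        int[] tmp = new int[last - first + 1];
        int i1 = 0, i2 = first, i3 = mid + 1;
        // while both halves contain elements, take the smaller front item
        while (i2 <= mid && i3 <= last)
            tmp[i1++] = (data[i2] <= data[i3]) ? data[i2++] : data[i3++];
        while (i2 <= mid)  tmp[i1++] = data[i2++];   // drain the left half
        while (i3 <= last) tmp[i1++] = data[i3++];   // drain the right half
        System.arraycopy(tmp, 0, data, first, tmp.length);
    }

    public static void main(String[] args) {
        int[] data = {'Z', 'M', 'J', 'R', 'X', 'T', 'V'};
        mergeSort(data, 0, data.length - 1);
        // prints the letters from the diagram in sorted order
        for (int c : data) System.out.print((char) c);   // JMRTVXZ
        System.out.println();
    }
}
```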

Page 25:

Mergesort Performance

• Mergesort produces a lot of copying in memory

• It also requires extra storage space for the temporary array– this can be prohibitive for very large data sets