Sorting
Joe Meehean




CS242 Data Structures II


Sorting
- Problem: arrange the comparable items in a list into sorted order
- Most sorting algorithms involve comparing item values
- We assume items define the < operator, the > operator, and the == operator

Sorting in STL
- void sort(Iterator begin, Iterator end)
  - data items must overload operator<
- void sort(Iterator begin, Iterator end, Comparator cmp)
  - Comparator is a comparison functor
  - cmp(a, b) returns true if a should go before b in the sorted list
- Often implemented using quick sort

Sorting
- Aspects we care about
  - run time
  - memory cost
- Types of algorithms
  - comparison vs. non-comparison
- Examples of important sorting algorithms
- Google Search: real world example
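A minimal sketch of both std::sort overloads. The Item type, its fields, and the ByPriorityDescending functor are hypothetical, made up for illustration; std::sort and std::is_sorted are the standard <algorithm> functions. The first function relies on an overloaded operator<, the second passes a comparator functor whose cmp(a, b) is true when a should come first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical item type used only for illustration.
struct Item {
    std::string name;
    int priority;
};

// Overload operator< so the two-argument std::sort can order Items (by name).
bool operator<(const Item& a, const Item& b) {
    return a.name < b.name;
}

// Comparison functor: returns true when a should go before b.
// Here a higher priority goes first.
struct ByPriorityDescending {
    bool operator()(const Item& a, const Item& b) const {
        return a.priority > b.priority;
    }
};

// Sorts a copy with std::sort(begin, end), which uses operator<.
std::vector<Item> sortByName(std::vector<Item> items) {
    std::sort(items.begin(), items.end());
    assert(std::is_sorted(items.begin(), items.end()));  // debug-only sanity check
    return items;
}

// Sorts a copy with std::sort(begin, end, cmp), which uses the functor.
std::vector<Item> sortByPriority(std::vector<Item> items) {
    std::sort(items.begin(), items.end(), ByPriorityDescending{});
    return items;
}
```

The comparator must be a strict ordering (use > or <, never >= or <=), otherwise std::sort's behavior is undefined.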

Run-time complexity
- Obvious algorithms are O(N^2)
- Clever ones are O(N log N)
- Special-purpose sorts can go even faster
- Additional considerations
  - Does the algorithm always take worst-case time? What is the average case?
  - What happens when the list is already sorted?

Memory Cost
- In-place
  - sorts the list with constant extra memory
  - e.g., temporary variables
- Not in-place
  - requires additional memory in relation to the input size
  - e.g., another parallel list

2 Kinds of Sorting Algorithms
- Comparison sort
  - compare items in the list
  - place smaller items near the front
  - fastest worst case: O(N log N)
- Non-comparison sort
  - sort using special properties of the items
  - use/extrapolate additional information
  - e.g., some non-comparison sorts run in O(Range + N)

Heap Sort
- Sorting algorithm based on heaps
- Idea
  - insert items from the unsorted list into a heap
  - use heap::removeMin to get items out of the heap in sorted order
  - put items back into the list in sorted order

Heap Sort
- Problems with this approach
- Complexity not ideal
  - inserting N items into the heap is O(N log N)
  - removing N items from the heap is O(N log N)
  - the total is O(N log N), but it would be better if we could build the heap faster
- Memory cost
  - not in-place
  - need the original list + a heap

Heap Sort: Improved Complexity
- Can heapify a vector/array in O(N)
  - convert an unsorted vector into a max heap
- For each parent node (positions N/2 - 1 down to 0 in a 0-based vector)
  - make sure it's larger than its children
  - if it's not, swap the parent with its largest child
  - shiftDown(int pos, K val)
- Minor complication
  - the vector starts at 0, not 1 like a normal heap
  - so the children of position i are at 2i + 1 and 2i + 2
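A minimal sketch of heapify for a 0-based std::vector, assuming items are compared only with operator<. The slides' shiftDown(int pos, K val) is adapted here into a hypothetical shiftDown(h, pos, n) that takes the vector and the heap size explicitly. It carries the shifted value in a temporary and moves the larger child up, which has the same effect as swapping the parent with its largest child at each level.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Moves h[pos] down a 0-based max heap of size n until it is not smaller
// than either child. In a 0-based vector the children of i are 2i+1 and 2i+2.
template <typename K>
void shiftDown(std::vector<K>& h, std::size_t pos, std::size_t n) {
    K val = h[pos];                        // value being shifted down
    while (2 * pos + 1 < n) {
        std::size_t child = 2 * pos + 1;   // left child
        // pick the larger of the two children
        if (child + 1 < n && h[child] < h[child + 1]) {
            ++child;
        }
        if (!(val < h[child])) {
            break;                         // val >= both children: stop here
        }
        h[pos] = h[child];                 // move the larger child up
        pos = child;
    }
    h[pos] = val;
}

// Converts an unsorted vector into a max heap in O(N) by shifting down
// every parent node, from the last parent (n/2 - 1) back to the root (0).
template <typename K>
void heapify(std::vector<K>& h) {
    const std::size_t n = h.size();
    for (std::size_t i = n / 2; i-- > 0;) {
        shiftDown(h, i, n);
    }
    assert(std::is_heap(h.begin(), h.end()));  // debug-only check of the result
}
```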

[Figure: step-by-step heapify of a 10-item example vector, drawn as a tree with the current parent (P), left child (L), and right child (R) marked; each parent is swapped with its largest child until the vector is a max heap]

Heapify Complexity
- O(N)
  - the proof is somewhat complex; see Weiss 6.4.3 if interested
- Intuitively it is faster because
  - we only need to shift down the parent nodes (about half of the nodes)
  - plus, starting at the bottom reduces the number of shift downs
  - building the heap by inserting items one at a time does a shift for every insert (all the nodes)

Heap Sort: In-place
- Removing an item from the heap creates a space at the end
- This space is where the largest item should go in the finished array
- Why don't we just put it there?
  - recall that in heap::removeMax we return h[first] and replace h[first] with h[last]
  - instead, let's swap h[first] with h[last]

[Figure: in-place heap sort of the heapified vector, with the heap and sorted regions marked; each step swaps the root with the last heap item, shrinks the heap by one so the swapped item joins the sorted region at the end, and shifts the new root down, until the whole vector is sorted]
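A sketch of the in-place version, assuming a 0-based std::vector and operator<. It repeats a hypothetical shiftDown(h, pos, n) helper so the block compiles on its own. After heapifying, each step swaps h[0] (the current maximum) with the last heap item and shifts the new root down in the shrunken heap, so the sorted region grows from the end of the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Shifts h[pos] down within h[0..n), which is a 0-based max heap.
template <typename K>
void shiftDown(std::vector<K>& h, std::size_t pos, std::size_t n) {
    K val = h[pos];
    while (2 * pos + 1 < n) {
        std::size_t child = 2 * pos + 1;
        if (child + 1 < n && h[child] < h[child + 1]) ++child;  // larger child
        if (!(val < h[child])) break;
        h[pos] = h[child];
        pos = child;
    }
    h[pos] = val;
}

// In-place heap sort: heapify into a max heap, then repeatedly swap the root
// (largest remaining item) into the space at the end of the heap.
template <typename K>
void heapSort(std::vector<K>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0;) {  // heapify: O(N)
        shiftDown(a, i, n);
    }
    assert(std::is_heap(a.begin(), a.end()));
    for (std::size_t last = n; last > 1; --last) {
        // a[0..last) is the heap; a[last..n) is already sorted
        std::swap(a[0], a[last - 1]);  // largest item goes to the end
        shiftDown(a, 0, last - 1);     // restore the heap on the smaller prefix
    }
}
```

No second container is needed: the heap and the sorted region share the one vector, which is what makes this version in-place.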

[Figure: Heap Sort, item value plotted against position in the array]

Heap Sort Complexity
- Heapify: O(N)
- In-place conversion of the heap into a sorted array: O(N log N)
- O(N) + O(N log N) = O(N log N)
- Costs the same if the array was sorted to begin with

Questions?

Quick Sort
- Fundamental idea
  - if all values in sorted array A are less than all values in sorted array B, we can easily combine them
  - an array of size 1 is sorted

[Figure: sorted arrays A and B, where every value in A is less than every value in B, combined into one sorted array]

Quick Sort Algorithm
- If the number of items in A is one or zero, return
- Choose a value from A to be the pivot
- Partition A into sub-lists
  - all values ≤ pivot into the left part
  - all values ≥ pivot into the right part
- Return quicksort(L-part) + pivot + quicksort(R-part)
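A direct, not-in-place transcription of the algorithm above, written as a sketch. Choosing the middle item as the pivot is an assumption made here for simplicity. Values equal to the pivot go to the right part, and the pivot itself is kept out of both parts, so every recursive call gets a strictly smaller list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// quicksort(L-part) + pivot + quicksort(R-part), building new vectors each call.
template <typename T>
std::vector<T> quicksort(const std::vector<T>& a) {
    if (a.size() <= 1) {
        return a;  // zero or one item: already sorted
    }
    const std::size_t mid = a.size() / 2;  // assumed pivot choice: middle item
    const T pivot = a[mid];
    std::vector<T> left, right;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (i == mid) continue;        // the pivot goes between the two parts
        if (a[i] < pivot) {
            left.push_back(a[i]);      // values < pivot
        } else {
            right.push_back(a[i]);     // values >= pivot
        }
    }
    std::vector<T> result = quicksort(left);
    result.push_back(pivot);
    const std::vector<T> sortedRight = quicksort(right);
    result.insert(result.end(), sortedRight.begin(), sortedRight.end());
    assert(result.size() == a.size());
    return result;
}
```

This version is easy to read but allocates new vectors at every level; the in-place version avoids that by partitioning within the original array.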

[Figure: quick sort of 5 6 4 3 2 0 1 7 drawn as a recursion tree: the list is partitioned around a pivot on the way down, and the sorted parts are combined on the way up]

- In practice, items are already in the correct place when we get to the bottom: all of the work is done on the way down.

How to choose a good pivot?
- Goal: choose the median value
  - so that the left and right arrays are the same size
- If we choose the smallest value
  - each partition only reduces the problem by one
  - the sorting tree height will be N instead of log N
- Same if we choose the largest

How to choose a good pivot?
- Actually finding the median is O(N)
- Choosing the 1st item is very bad if A is already sorted or reverse sorted
- Choose a random item (index)
  - OK if you have a fast, accurate random number generator
  - we don't

Median of 3
- Reduces comparisons by 14%
- Compare the center, left, and right items
  - choose the median as the pivot
  - only works if there are >= 3 items to be sorted
- Partitioning optimization
  - place the smallest of the 3 in left (≤ pivot)
  - place the largest of the 3 in right (≥ pivot)
  - place the pivot in the center

Quick Sort: In-place
- Choose the median of 3 and place the three items in the right positions (left, center, right)
- Swap the pivot with right - 1
- Use indices lo and hi to partition the remainder of the array
  - increment lo until it finds a value ≥ pivot
  - decrement hi until it finds a value ≤ pivot
- When lo and hi stop, swap A[lo] and A[hi], then increment lo and decrement hi
- Repeat until lo > hi
- Restore the pivot: swap it with A[lo]
- Done

[Figure: step-by-step in-place partition of an 8-item example array, marking the left, center, right, lo, and hi positions]
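A sketch of the in-place quick sort described above. The names quickSort, medianOf3, insertionSort and the constant kCutoff are hypothetical. kCutoff = 20 follows the "insertion sort is faster for N < 20" rule of thumb from the small-lists slide, and it also guarantees median of 3 always sees at least 3 items. Because lo and hi both stop on values equal to the pivot, a list of identical values still splits near the middle.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::ptrdiff_t kCutoff = 20;  // assumed small-list threshold

// Insertion sort on a[left..right] (inclusive); used for small sub-lists.
template <typename T>
void insertionSort(std::vector<T>& a, std::ptrdiff_t left, std::ptrdiff_t right) {
    for (std::ptrdiff_t i = left + 1; i <= right; ++i) {
        T tmp = a[i];
        std::ptrdiff_t j = i;
        for (; j > left && tmp < a[j - 1]; --j) {
            a[j] = a[j - 1];
        }
        a[j] = tmp;
    }
}

// Orders a[left] <= a[center] <= a[right], then swaps the pivot (median) to right - 1.
template <typename T>
const T& medianOf3(std::vector<T>& a, std::ptrdiff_t left, std::ptrdiff_t right) {
    assert(right - left + 1 >= 3);  // median of 3 needs at least 3 items
    const std::ptrdiff_t center = left + (right - left) / 2;
    if (a[center] < a[left]) std::swap(a[left], a[center]);
    if (a[right] < a[left]) std::swap(a[left], a[right]);
    if (a[right] < a[center]) std::swap(a[center], a[right]);
    std::swap(a[center], a[right - 1]);
    return a[right - 1];
}

template <typename T>
void quickSort(std::vector<T>& a, std::ptrdiff_t left, std::ptrdiff_t right) {
    if (right - left + 1 < kCutoff) {
        insertionSort(a, left, right);
        return;
    }
    const T pivot = medianOf3(a, left, right);
    // a[left] <= pivot and a[right - 1] == pivot keep lo and hi in bounds.
    std::ptrdiff_t lo = left + 1;
    std::ptrdiff_t hi = right - 2;
    while (true) {
        while (a[lo] < pivot) ++lo;  // stop at a value >= pivot
        while (pivot < a[hi]) --hi;  // stop at a value <= pivot
        if (lo >= hi) break;
        std::swap(a[lo], a[hi]);     // both stopped: swap, then move on
        ++lo;
        --hi;
    }
    std::swap(a[lo], a[right - 1]);  // restore the pivot
    quickSort(a, left, lo - 1);
    quickSort(a, lo + 1, right);
}

template <typename T>
void quickSort(std::vector<T>& a) {
    if (!a.empty()) quickSort(a, 0, static_cast<std::ptrdiff_t>(a.size()) - 1);
}
```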

[Figure: Quick Sort, item value plotted against position in the array]

Quick Sort
- Values equal to the pivot
  - e.g., should lo stop when A[lo] ≥ pivot OR only when A[lo] > pivot?
  - worst case: the entire list is the same value
  - if we didn't stop and swap on duplicates, lo would run all the way to the right
  - uneven partition
  - best to swap values that are equal to the pivot

Quick Sort
- Small lists
  - insertion sort is faster for N < 20
- Quick sort is recursive
  - it will always, eventually, sort lists < 20
  - large lists are broken down into small ones
- Commonly, quick sort is used until each sub-list is …

…
- count[A[i]]++
- scan count, printing the ints we've seen
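A sketch of the two bucket sort steps listed above, assuming non-negative ints no larger than a known maxValue. Instead of printing, it writes the values back into the vector, as the walkthrough's j index does. The function name and parameters are hypothetical. (This counting-based version is also widely known as counting sort.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bucket sort for ints in 0..maxValue.
// Not in-place: the count array has M = maxValue + 1 entries.
void bucketSort(std::vector<int>& a, int maxValue) {
    assert(maxValue >= 0);
    std::vector<std::size_t> count(static_cast<std::size_t>(maxValue) + 1);  // all 0
    for (int x : a) {
        assert(x >= 0 && x <= maxValue);        // precondition: value fits the range
        ++count[static_cast<std::size_t>(x)];   // count[A[i]]++
    }
    // Scan count (index i), writing each value back (index j) once per occurrence.
    std::size_t j = 0;
    for (std::size_t i = 0; i < count.size(); ++i) {
        for (std::size_t c = 0; c < count[i]; ++c) {
            a[j++] = static_cast<int>(i);
        }
    }
}
```

The first loop touches each of the N items once and the second touches each of the M counters once (plus N writes), which is where O(M + N) comes from.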

Bucket Sort
[Figure: bucket sort of an example array A with max = 9; index i walks A, incrementing count[A[i]] in a count array indexed 0 to 9, then the count array is scanned and index j writes the values back into A in sorted order]

Bucket Sort Analysis
- Not in-place
  - requires an extra M memory (one counter per possible value)
- Complexity
  - scan the original list: O(N)
  - scan the count list: O(M)
  - O(M + N)

Radix Sort
- Requires that items are sequences of comparables
  - numbers (sequences of digits)
  - strings (sequences of characters)
- Useful for short sequences of comparables
- Idea
  - sort each position in the sequence separately

Radix Sort Approach
- Use an auxiliary array of queues
  - the array must be large enough to store a queue for the full range of digits
  - 0-9 for numbers, a-z for words
- Process sequences from right to left
  - least significant digit first
- Each pass evaluates the next digit
  - store each item in the queue in the auxiliary array that matches the value of its current digit
  - dequeue the items back into the original array

Radix Sort Example
Input: [132, 355, 104, 327, 111, 285, 391, 543, 123, 535]

Pass 1 (ones digit)
  queues: 1: [111, 391]  2: [132]  3: [543, 123]  4: [104]  5: [355, 285, 535]  7: [327]  (0, 6, 8, 9 empty)
  result: [111, 391, 132, 543, 123, 104, 355, 285, 535, 327]

Pass 2 (tens digit)
  queues: 0: [104]  1: [111]  2: [123, 327]  3: [132, 535]  4: [543]  5: [355]  8: [285]  9: [391]  (6, 7 empty)
  result: [104, 111, 123, 327, 132, 535, 543, 355, 285, 391]

Pass 3 (hundreds digit)
  queues: 1: [104, 111, 123, 132]  2: [285]  3: [327, 355, 391]  5: [535, 543]  (0, 4, 6, 7, 8, 9 empty)
  result: [104, 111, 123, 132, 285, 327, 355, 391, 535, 543]

Time for Radix Sort
- Each pass
  - puts items in the correct queue: O(N)
  - moves items from the queues back to the array: O(N)
  - pass total: O(N)
- The number of passes depends on the number of digits
- O(N * #digits)
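A minimal LSD radix sort sketch, assuming non-negative ints and decimal digits, so the auxiliary array holds 10 std::queue objects. The name radixSort is hypothetical. On the example list above it performs the same three passes shown, one per digit of the largest value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// LSD radix sort for non-negative ints with one queue per decimal digit.
void radixSort(std::vector<int>& a) {
    int maxValue = 0;
    for (int x : a) {
        assert(x >= 0);  // assumption: only non-negative ints
        if (x > maxValue) maxValue = x;
    }
    std::array<std::queue<int>, 10> queues;
    // One pass per digit, least significant first (divisor = 1, 10, 100, ...).
    for (long long divisor = 1; maxValue / divisor > 0; divisor *= 10) {
        for (int x : a) {
            const int digit = static_cast<int>((x / divisor) % 10);
            queues[digit].push(x);  // store by the current digit
        }
        std::size_t j = 0;
        for (auto& q : queues) {    // dequeue in digit order 0..9
            while (!q.empty()) {
                a[j++] = q.front();
                q.pop();
            }
        }
    }
}
```

Because each queue is first-in, first-out, items with the same current digit keep the order produced by the previous pass; that stability is what makes processing the least significant digit first work.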

Compare Merge and Radix Sort
- Sort every number from 0 to 4,294,967,295 (about 4 billion, i.e., 2^32 numbers)
- Merge sort: N log2 N
  - N = 4 billion
  - log2(4 billion) = 32
  - 4 billion * 32 = 128 billion operations

Compare Merge and Radix Sort
- Radix sort: O(N * #digits)
  - N = 4 billion
  - # of digits = 10
  - 4 billion * 10 = 40 billion operations
- Radix sort is about 3 times faster
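The same comparison written as a ratio, treating N log2 N and N * #digits as literal operation counts the way the slides do. Here d = 10 is the number of decimal digits in the largest 32-bit value, 4,294,967,295, and N cancels out:

```latex
\frac{N \log_2 N}{N \cdot d} = \frac{\log_2 N}{d} = \frac{\log_2 2^{32}}{10} = \frac{32}{10} = 3.2
```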

Sorting Summary

Sort       Best            Worst           Avg             In-Place
Selection  O(N^2)          O(N^2)          O(N^2)          Yes
Insertion  O(N)            O(N^2)          O(N^2)          Yes
Heap       O(N log N)      O(N log N)      O(N log N)      Yes
Merge      O(N log N)      O(N log N)      O(N log N)      No
Quick      O(N log N)      O(N^2)          O(N log N)      Yes
Bucket     O(N + M)        O(N + M)        O(N + M)        No
Radix      O(N * #digits)  O(N * #digits)  O(N * #digits)  No

Questions?