Lower bounds for sorting, Counting Sort

  • 8/2/2019 Lower bounds for sorting, Counting Sort

    Lower Bounds for Sorting, Counting Sort

    Term Paper: Algorithm Analysis & Design

    Sadashiv Srivastava

    10810742

    Acknowledgement

    First and foremost, I would like to thank my teacher, Ms Shivani Malhotra, who assigned me this topic to bring out my capabilities.

    I express my gratitude to my parents for being a continuous source of encouragement and for all the financial support they have given me.

    I would like to acknowledge the assistance provided to me by the library staff of LPU.

    My heartfelt gratitude goes to my friends for helping me complete my work in time.

    Counting Sort

    Counting sort is a linear-time sorting algorithm for items drawn from a fixed, finite set. Integers that lie in a fixed interval, say k1 to k2, are an example of such items.

    The algorithm first fixes an ordering relation on the set from which the items to be sorted are drawn (for integers, this is the usual numeric order). Let the input to be sorted be called A. An auxiliary array B is then defined, with size equal to the number of items in that underlying set.

    For each element e in A, the algorithm stores in B[e] the number of items in A that are smaller than or equal to e. If the sorted output is to be stored in an array C, then for each e in A, taken in reverse order, C[B[e]] = e. After each such step, the value of B[e] is decremented. (The pseudocode below swaps these names: there C holds the counts and B the output.)

    The algorithm makes two passes over A and one pass over B. If n is the size of the input and k is the size of the range, the running time is O(n + k), which is O(n) whenever k = O(n). It is also a stable algorithm, meaning that elements with equal keys appear in the output in the same order as in the input.

    Algorithm for Counting Sort

    COUNTING-SORT (A, B, k)

    1. for i = 1 to k

    2.     do C[i] = 0

    3. for j = 1 to length[A]

    4.     do C[A[j]] = C[A[j]] + 1

    5. > C[i] now contains the number of elements equal to i

    6. for i = 2 to k

    7.     do C[i] = C[i] + C[i-1]

    8. > C[i] now contains the number of elements less than or equal to i

    9. for j = length[A] down to 1

    10.     do B[C[A[j]]] = A[j]

    11.        C[A[j]] = C[A[j]] - 1
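The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `counting_sort` is an assumption, keys are assumed to be integers in the range 1..k, and because Python lists are 0-based the count is decremented before the write rather than after it.

```python
def counting_sort(a, k):
    """Stable counting sort of integers in the range 1..k (inclusive)."""
    n = len(a)
    count = [0] * (k + 1)          # index 0 is unused so keys map directly to indices

    for key in a:                  # pseudocode lines 3-4: count each key
        count[key] += 1

    for i in range(2, k + 1):      # pseudocode lines 6-7: prefix sums
        count[i] += count[i - 1]

    out = [None] * n
    for key in reversed(a):        # pseudocode lines 9-11: scan the input backwards
        count[key] -= 1            # decrement first because the output is 0-based
        out[count[key]] = key
    return out


print(counting_sort([3, 6, 4, 1, 3, 4, 1, 4], 6))  # [1, 1, 3, 3, 4, 4, 4, 6]
```

The three loops correspond to the counting, prefix-sum and placement phases, so the running time is O(n + k), matching the analysis in this paper.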

    Analysis

    Because the algorithm uses only simple for loops, without recursion or subroutine calls, it

    is straightforward to analyze.

    The initialization of the Count array, and the second for loop which performs a prefix sum on the count array, each iterate at most k + 1 times and therefore take O(k) time.

    The other two for loops, and the initialization of the output array, each take O(n) time.

    Therefore the time for the whole algorithm is the sum of the times for these steps, O(n + k).

    Because it uses arrays of length k + 1 and n, the total space usage of the algorithm is also O(n + k).

    For problem instances in which the maximum key value is significantly smaller than the

    number of items, counting sort can be highly space-efficient, as the only storage it uses

    other than its input and output arrays is the Count array which uses space O(k).

    Example

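    Each line below shows one stage of counting sort on an 8-element input with k = 6. A dash marks an output position that has not been filled yet.

    Input:                      A = 3 6 4 1 3 4 1 4

    After lines 3-4 (counts):   C = 2 0 2 3 0 1

    After lines 6-7 (sums):     C = 2 2 4 7 7 8

    After j = 8 (A[8] = 4):     B = - - - - - - 4 -     C = 2 2 4 6 7 8

    After j = 7 (A[7] = 1):     B = - 1 - - - - 4 -     C = 1 2 4 6 7 8

    After j = 6 (A[6] = 4):     B = - 1 - - - 4 4 -     C = 1 2 4 5 7 8

    Final output:               B = 1 1 3 3 4 4 4 6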

    Analysis

    1. The loop of lines 1-2 takes O(k) time

    2. The loop of lines 3-4 takes O(n) time

    3. The loop of lines 6-7 takes O(k) time

    4. The loop of lines 9-11 takes O(n) time

    Therefore, the overall time of counting sort is O(k) + O(n) + O(k) + O(n) = O(k + n).

    In practice, we usually use counting sort when k = O(n), in which case the running time is O(n).

    Counting sort is a stable sort, i.e., multiple keys with the same value are placed in the sorted array in the same order in which they appear in the input array.

    Suppose that the for-loop in line 9 of counting sort is rewritten as:

    9. for j = 1 to n

    Then stability no longer holds. Notice that the correctness argument in CLR does not depend on the order in which the array A[1 . . n] is processed; the algorithm sorts correctly no matter what order is used. In particular, the modified algorithm still places the elements with value i in positions C[i - 1] + 1 through C[i] (where C holds the sums computed in lines 6-7), but in reverse order of their appearance in A[1 . . n].
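The effect of the scan direction can be seen on records that carry a label next to the key. The sketch below is illustrative only: the function name `counting_sort_by_key`, the `reverse_scan` flag and the sample records are assumptions introduced here, and keys are assumed to lie in 1..k.

```python
def counting_sort_by_key(records, k, reverse_scan=True):
    """Sort (key, label) records by key in 1..k; the scan direction is configurable."""
    count = [0] * (k + 1)
    for key, _ in records:             # count each key
        count[key] += 1
    for i in range(2, k + 1):          # prefix sums
        count[i] += count[i - 1]

    out = [None] * len(records)
    order = reversed(records) if reverse_scan else records
    for rec in order:                  # placement phase (line 9 of the pseudocode)
        key = rec[0]
        count[key] -= 1                # 0-based output, so decrement before writing
        out[count[key]] = rec
    return out


records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(counting_sort_by_key(records, 3, reverse_scan=True))
# [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]   equal keys keep their input order
print(counting_sort_by_key(records, 3, reverse_scan=False))
# [(1, 'd'), (1, 'b'), (3, 'c'), (3, 'a')]   still sorted by key, ties reversed
```

Both variants return the keys in sorted order; only the backward scan keeps records with equal keys in their original relative order.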

    Note that counting sort beats the lower bound of Ω(n lg n) because it is not a comparison sort. It never compares two elements with each other; instead, it uses the actual values of the elements to index into an array.

    Lower Bounds for Sorting

    1. Overview

    Here we will discuss the notion of lower bounds, in particular for the problem of sorting. We show that any deterministic comparison-based sorting algorithm must take Ω(n log n) time to sort an array of n elements in the worst case.

    We then extend this result to average case performance, and to randomized algorithms.

    In the process, we introduce the 2-player game view of algorithm design and analysis.

    2. Lower Bound on Complexity for Sorting Methods

    Result 1

    The worst case complexity of any sorting algorithm that only uses key comparisons is Ω(n log n).

    Result 2

    The average case complexity of any sorting algorithm that only uses key comparisons is Ω(n log n).

    The above results are proved using a Decision Tree which is a binary tree in which the

    nodes represent the status of the algorithm after making some comparisons.

    Consider a node x in a decision tree and let y be its left child and z its right child. See Figure 1.

    Figure 1: A decision tree scenario

    Basically, y represents a state consisting of the information known at x plus the fact that the key k1 is less than key k2. For a decision tree for insertion sort on 3 elements, see Figure 2.

    Figure 2: Decision tree for a 3-element insertion sort

    3. Result 1: Lower Bound on Worst Case Complexity

    Given a list of n distinct elements, there are n! possible outcomes that represent correct sorted orders.

    o Any decision tree describing a correct sorting algorithm on a list of n elements will have at least n! leaves.

    o In fact, if we delete nodes corresponding to unnecessary comparisons and if we delete leaves that correspond to an inconsistent sequence of comparison results, there will be exactly n! leaves.

    The length of a path from the root to a leaf gives the number of comparisons made when the ordering represented by that leaf is the sorted order for a given input list L.

    The worst case complexity of an algorithm is given by the length of the longest path in the associated decision tree.

    To obtain a lower bound on the worst case complexity of sorting algorithms, we have to consider all possible decision trees having n! leaves and take the minimum longest path.
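The leaves and paths of a decision tree can be made concrete for small inputs. The sketch below, offered under stated assumptions rather than as part of the original paper, runs insertion sort on every permutation of n distinct keys and records the sequence of comparison outcomes, which is exactly the root-to-leaf path taken. The function names `insertion_sort_path` and `decision_tree_stats` are hypothetical.

```python
from itertools import permutations


def insertion_sort_path(items):
    """Run insertion sort and return the tuple of comparison outcomes along the way."""
    a = list(items)
    path = []
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            greater = a[j] > key      # one key comparison = one internal node
            path.append(greater)
            if not greater:
                break
            a[j + 1] = a[j]           # shift the larger element to the right
            j -= 1
        a[j + 1] = key
    return tuple(path)


def decision_tree_stats(n):
    """Return (number of distinct leaves, longest root-to-leaf path) for n distinct keys."""
    paths = {insertion_sort_path(p) for p in permutations(range(n))}
    return len(paths), max(len(p) for p in paths)


print(decision_tree_stats(3))  # (6, 3)
print(decision_tree_stats(4))  # (24, 6)
```

For n = 3 the tree has 3! = 6 leaves and its longest path has 3 comparisons, which is the worst case of insertion sort on three elements.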

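    In any decision tree, it is clear that the longest path will have a length of at least log n!, because a binary tree whose longest path has length h has at most 2^h leaves.

    Since

    $n! \approx (n/e)^n$,

    $\log n! \approx n \log n$.

    More precisely,

    $n! \ge (n/2)^{n/2}$

    or

    $\log(n!) \ge \frac{n}{2} \log \frac{n}{2} = \frac{n}{2} \log n - \frac{n}{2}$

    (taking logarithms to base 2).

    Thus any sorting algorithm that only uses comparisons has a worst case complexity Ω(n log n).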
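The size of the bound can be checked numerically. The following sketch assumes base-2 logarithms and uses the hypothetical helper name `min_worst_case_comparisons`; it prints, for a few values of n, the quantity (n/2) log(n/2), the smallest whole number of comparisons that is at least log(n!), and n log n, so that the first and last columns bracket the middle one.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest integer that is at least log2(n!), the decision-tree lower bound."""
    return math.ceil(math.log2(math.factorial(n)))


for n in (3, 4, 8, 16, 64):
    lower = (n / 2) * math.log2(n / 2)     # (n/2) log(n/2)
    upper = n * math.log2(n)               # n log n
    print(n, round(lower, 1), min_worst_case_comparisons(n), round(upper, 1))
```

For example, sorting 3 elements needs at least 3 comparisons in the worst case and sorting 4 elements needs at least 5, since log2(6) and log2(24) lie strictly between consecutive integers.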

    4. Result 2: Lower bound on Average Case Complexity

    We shall show that in any decision tree with K leaves, the average depth of a leaf is at least log K.

    We shall show the result for any binary tree with K leaves.

    Suppose the result is not true. Suppose T is the counterexample with the fewest nodes.

    T cannot be a single node because log 1 = 0. Let T have k leaves. T can only be of the following two forms. Now see Figure 3.

    Figure 3: Two possibilities for a counterexample with fewest nodes

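    Suppose T is of the form of Tree 1. The tree rooted at n1 has fewer nodes than T but the same number of leaves, and its leaves are one level shallower, so it is an even smaller counterexample than T. Thus T cannot be of the Tree 1 form.

    Suppose T is of the form of Tree 2. The trees T1 and T2 rooted at n1 and n2, with k1 and k2 leaves respectively (k1 + k2 = k), are smaller than T and therefore

    Average depth of T1 ≥ log k1

    Average depth of T2 ≥ log k2

    Every leaf of T1 or T2 lies one level deeper in T, so the average depth of T is at least

    $\frac{k_1}{k} \log k_1 + \frac{k_2}{k} \log k_2 + 1 \ge \log \frac{k}{2} + 1 = \log k$,

    where the inequality holds because, for k1 + k2 = k, the sum k1 log k1 + k2 log k2 is smallest when k1 = k2 = k/2. This contradicts the premise that the average depth of T is < log k.

    Thus T cannot be of the form of Tree 2.

    Thus in any decision tree with n! leaves, the average path length to a leaf is at least

    log(n!) = Ω(n log n)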
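The claim that the average leaf depth is at least log K can also be checked by brute force for small K. The sketch below is an illustration under assumptions, not part of the original proof: the helper name `min_total_leaf_depth` is hypothetical, logarithms are base 2, and the recurrence relies on the observation that a root with a single child only makes every leaf deeper, so the best tree always splits its leaves between two subtrees.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def min_total_leaf_depth(k):
    """Smallest possible sum of leaf depths over binary trees with k leaves."""
    if k == 1:
        return 0                      # a single node is a leaf at depth 0
    # Split the k leaves between two subtrees; hanging them under a new root
    # pushes each of the k leaves one level deeper, which adds k in total.
    return k + min(
        min_total_leaf_depth(k1) + min_total_leaf_depth(k - k1)
        for k1 in range(1, k // 2 + 1)
    )


for k in (2, 3, 6, 24, 120):
    average = min_total_leaf_depth(k) / k
    print(k, round(average, 3), round(math.log2(k), 3))
```

In every row the smallest achievable average depth (second column) is at least log K (third column), with equality when K is a power of two.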

    Bibliography

    Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), "8.2 Counting Sort", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 168-170, ISBN 0-262-03293-7. See also the historical notes on page 181.

    Edmonds, Jeff (2008), "5.2 Counting Sort (a Stable Sort)", How to Think about Algorithms, Cambridge University Press, pp. 72-75, ISBN 978-0-521-84931-9.

    Sedgewick, Robert (2003), "6.10 Key-Indexed Counting", Algorithms in Java, Parts 1-4: Fundamentals, Data Structures, Sorting, and Searching (3rd ed.), Addison-Wesley, pp. 312-314.

    Knuth, D. E. (1998), The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.), Addison-Wesley, ISBN 0-201-89685-0. Section 5.2, Sorting by counting, pp. 75-80, and historical notes, p. 170.

    Burris, David S.; Schember, Kurt (1980), "Sorting sequential files with limited auxiliary storage", Proceedings of the 18th annual Southeast Regional Conference, New York, NY, USA: ACM, pp. 23-31, doi:10.1145/503838.503855.

    Zagha, Marco; Blelloch, Guy E. (1991), "Radix sort for vector multiprocessors", Proceedings of Supercomputing '91, November 18-22, 1991, Albuquerque, NM, USA, IEEE Computer Society / ACM, pp. 712-721, doi:10.1145/125826.126164.
