CSC301 GROUP.doc

Quick Sort

 

The basic version of the quick sort algorithm was invented by C. A. R. Hoare in 1960 and formally introduced in 1962. It is based on the principle of divide-and-conquer. Quick sort is the algorithm of choice in many situations because it is not difficult to implement, it is a good "general-purpose" sort, and it consumes relatively few resources during execution.

Good points

- It is in-place, since it uses only a small auxiliary stack.
- It requires only n log(n) time to sort n items.
- It has an extremely short inner loop.
- The algorithm has been subjected to a thorough mathematical analysis, so very precise statements can be made about its performance.

Bad points

- It is recursive. If recursion is not available, the implementation is extremely complicated.
- It requires quadratic (i.e., n²) time in the worst case.
- It is fragile, i.e., a simple mistake in the implementation can go unnoticed and cause it to perform badly.

 

Quick sort works by partitioning a given array A[p . . r] into two non-empty subarrays A[p . . q] and A[q+1 . . r] such that every key in A[p . . q] is less than or equal to every key in A[q+1 . . r]. The two subarrays are then sorted by recursive calls to Quick sort. The exact position of the partition depends on the given array, and the index q is computed as part of the partitioning procedure.

 

QUICKSORT (A, p, r)

1. if p < r then

2.     q ← PARTITION (A, p, r)

3.     QUICKSORT (A, p, q)

4.     QUICKSORT (A, q + 1, r)

 

Note that to sort the entire array, the initial call is QUICKSORT (A, 1, length[A]).

As a first step, Quick sort chooses one of the items in the array to be sorted as the pivot. The array is then partitioned on either side of the pivot: elements less than or equal to the pivot move toward the left, and elements greater than or equal to the pivot move toward the right.

 

Partitioning the Array

Partitioning procedure rearranges the subarrays in-place.

 

PARTITION (A, p, r)

1. x ← A[p]

2. i ← p - 1

3. j ← r + 1

4. while TRUE do

5.     repeat j ← j - 1

6.     until A[j] ≤ x

7.     repeat i ← i + 1

8.     until A[i] ≥ x

9.     if i < j

10.        then exchange A[i] ↔ A[j]

11.        else return j

 

PARTITION selects the first key, A[p], as the pivot key about which the array will be partitioned: keys ≤ A[p] are moved toward the left, and keys ≥ A[p] are moved toward the right.

 

The running time of the PARTITION procedure is Θ(n), where n = r - p + 1 is the number of keys in the array.

Another argument that the running time of PARTITION on a subarray of size n is Θ(n) is as follows: pointer i and pointer j start at opposite ends and move toward each other, converging somewhere in the middle. The total number of times that i can be incremented and j can be decremented is therefore O(n). Associated with each increment or decrement there are O(1) comparisons and swaps. Hence, the total time is O(n).

 

Array of Same Elements

Since all the elements are equal, the comparisons in lines 6 and 8 of PARTITION (A, p, r) will always succeed immediately, which means each repeat loop stops after a single step. Intuitively, the first repeat loop moves j one position to the left and the second repeat loop moves i one position to the right. So when all elements are equal, each iteration moves i and j toward the middle by one position; they meet in the middle, and q = ⌊(p + r)/2⌋. Therefore, when all elements in the array A[p . . r] have the same value, PARTITION returns q = ⌊(p + r)/2⌋.

 

Performance of Quick Sort

The running time of quick sort depends on whether the partitioning is balanced or unbalanced, which in turn depends on which elements of the array to be sorted are used for partitioning.

A very good partition splits an array into two equal-sized arrays. A bad partition, on the other hand, splits an array into two arrays of very different sizes. The worst partition puts only one element in one array and all the other elements in the other array. If the partitioning is balanced, Quick sort runs asymptotically as fast as merge sort. On the other hand, if the partitioning is unbalanced, Quick sort runs asymptotically as slowly as insertion sort.

 

Best Case

The best thing that could happen in Quick sort would be for each partitioning stage to divide the array exactly in half. In other words, the best case occurs when the pivot is the median of the keys in A[p . . r] every time the procedure PARTITION is called, so that PARTITION always splits the array to be sorted into two equal-sized pieces.

If the procedure PARTITION produces two regions of size n/2, the recurrence relation is then

T(n) = T(n/2) + T(n/2) + Θ(n)
     = 2T(n/2) + Θ(n)

and from case 2 of the Master theorem, T(n) = Θ(n lg n).

 

Worst case Partitioning

The worst case occurs if the given array A[1 . . n] is already sorted. The call PARTITION (A, p, r) then always returns p, so successive calls to PARTITION split arrays of length n, n-1, n-2, . . . , 2, giving a running time proportional to n + (n-1) + (n-2) + . . . + 2 = [(n+2)(n-1)]/2 = Θ(n²). The worst case also occurs if A[1 . . n] starts out in reverse order.

 

 

 

 

Randomized Quick Sort

In the randomized version of Quick sort we impose a distribution on the input. This does not improve the worst-case running time, but it makes the expected running time independent of the input ordering.

In this version we choose a random key as the pivot. Assume that the procedure RANDOM (a, b) returns a random integer in the range [a, b]; there are b - a + 1 integers in the range, and the procedure is equally likely to return any one of them. The new partition procedure simply performs the swap before actually partitioning.

 

RANDOMIZED_PARTITION (A, p, r)

i ← RANDOM (p, r)
exchange A[p] ↔ A[i]
return PARTITION (A, p, r)

Now randomized quick sort calls the above procedure in place of PARTITION:

 

RANDOMIZED_QUICKSORT (A, p, r)

if p < r then
    q ← RANDOMIZED_PARTITION (A, p, r)
    RANDOMIZED_QUICKSORT (A, p, q)
    RANDOMIZED_QUICKSORT (A, q + 1, r)

 

 

Like other randomized algorithms, RANDOMIZED_QUICKSORT has the property that no particular input elicits its worst-case behavior; the behavior of the algorithm depends only on the random-number generator. Even intentionally, we cannot produce a bad input for RANDOMIZED_QUICKSORT unless we can predict what the generator will produce next.

 

Analysis of Quick sort

Worst-case

Let T(n) be the worst-case time for QUICKSORT on an input of size n. We have the recurrence

T(n) = max_{1≤q≤n-1} (T(q) + T(n-q)) + Θ(n)             --------- 1

where q runs from 1 to n-1, since the partition produces two regions, each having size at least 1.

Now we guess that T(n) ≤ cn² for some constant c.

Substituting our guess into equation 1, we get

    T(n) = max_{1≤q≤n-1} (cq² + c(n-q)²) + Θ(n)
         = c max_{1≤q≤n-1} (q² + (n-q)²) + Θ(n)

Since the second derivative of the expression q² + (n-q)² with respect to q is positive, the expression achieves its maximum over the range 1 ≤ q ≤ n-1 at one of the endpoints. This gives the bound max_{1≤q≤n-1} (q² + (n-q)²) ≤ 1 + (n-1)² = n² - 2(n-1).

Continuing with our bounding of T(n) we get

    T(n) ≤ c[n² - 2(n-1)] + Θ(n)
         = cn² - 2c(n-1) + Θ(n)

Since we can pick the constant c so that the 2c(n-1) term dominates the Θ(n) term, we have

     T(n) ≤ cn²

Thus the worst-case running time of quick sort is Θ(n²).

 

Average-case Analysis

If the split induced by RANDOMIZED_PARTITION puts a constant fraction of the elements on each side of the partition, then the recursion tree has depth Θ(lg n) and Θ(n) work is performed at each of these Θ(lg n) levels. This is an intuitive argument for why the average-case running time of RANDOMIZED_QUICKSORT is Θ(n lg n).

Let T(n) denote the average time required to sort an array of n elements. A call to RANDOMIZED_QUICKSORT on a one-element array takes constant time, so we have T(1) = Θ(1).

After the split, RANDOMIZED_QUICKSORT calls itself to sort the two subarrays. The average time to sort an array A[1 . . q] is T(q), and the average time to sort an array A[q+1 . . n] is T(n-q). We have

T(n) = 1/n (T(1) + T(n-1) + ∑_{q=1}^{n-1} (T(q) + T(n-q))) + Θ(n)    --- 1

We know from the worst-case analysis that T(1) = Θ(1) and T(n-1) = O(n²), so

T(n) = 1/n (Θ(1) + O(n²)) + 1/n ∑_{q=1}^{n-1} (T(q) + T(n-q)) + Θ(n)
     = 1/n ∑_{q=1}^{n-1} (T(q) + T(n-q)) + Θ(n)                        ------- 2
     = 1/n [2 ∑_{k=1}^{n-1} T(k)] + Θ(n)
     = 2/n ∑_{k=1}^{n-1} T(k) + Θ(n)                                   --------- 3

Solve the above recurrence using the substitution method. Assume inductively that T(n) ≤ an lg n + b for some constants a > 0 and b > 0.

If we pick a and b large enough that an lg n + b ≥ T(1), then for n > 1 we have

T(n) ≤ 2/n ∑_{k=1}^{n-1} (ak lg k + b) + Θ(n)
     = 2a/n ∑_{k=1}^{n-1} k lg k + 2b/n (n-1) + Θ(n)     ------- 4

At this point we are claiming that

∑_{k=1}^{n-1} k lg k ≤ 1/2 n² lg n - 1/8 n²

Substituting this claim into equation 4 above, we get

T(n) ≤ 2a/n [1/2 n² lg n - 1/8 n²] + 2b/n (n-1) + Θ(n)
     ≤ an lg n - an/4 + 2b + Θ(n)                        ---------- 5

In the above equation, Θ(n) + 2b is linear in n, and we can certainly choose a large enough so that the an/4 term dominates Θ(n) + 2b, giving T(n) ≤ an lg n + b.

We conclude that QUICKSORT's average running time is Θ(n lg n).

 

 

Conclusion

Quick sort is an in-place sorting algorithm whose worst-case running time is Θ(n²) and whose expected running time is Θ(n lg n), where the constants hidden in Θ(n lg n) are small.

Implementation

void quickSort(int numbers[], int array_size)
{
    q_sort(numbers, 0, array_size - 1);
}

void q_sort(int numbers[], int left, int right)
{
    int pivot, l_hold, r_hold;

    l_hold = left;
    r_hold = right;
    pivot = numbers[left];
    while (left < right)
    {
        while ((numbers[right] >= pivot) && (left < right))
            right--;
        if (left != right)
        {
            numbers[left] = numbers[right];
            left++;
        }
        while ((numbers[left] <= pivot) && (left < right))
            left++;
        if (left != right)
        {
            numbers[right] = numbers[left];
            right--;
        }
    }
    numbers[left] = pivot;
    pivot = left;
    left = l_hold;
    right = r_hold;
    if (left < pivot)
        q_sort(numbers, left, pivot - 1);
    if (right > pivot)
        q_sort(numbers, pivot + 1, right);
}

Shell Sort

 This algorithm is a simple extension of Insertion sort. Its speed comes from the fact that it exchanges elements that are far apart (the insertion sort exchanges only adjacent elements).

The idea of the Shell sort is to rearrange the file to give it the property that taking every hth element (starting anywhere) yields a sorted file. Such a file is said to be h-sorted.

 

SHELL_SORT (A)

for (h = 1; h ≤ N/9; h = 3h + 1) do nothing
for (; h > 0; h = h/3) do
    for (i = h + 1; i ≤ N; i = i + 1) do
        v = A[i]
        j = i
        while (j > h AND A[j - h] > v) do
            A[j] = A[j - h]
            j = j - h
        A[j] = v

 

The functional form of the running time of Shell sort depends on the increment sequence and is unknown. For the above increment sequence, two conjectures are n(log n)² and n^1.25. Furthermore, unlike Insertion sort, the running time is not sensitive to the initial ordering of the given sequence.

 

 

 

Shell sort is the method of choice for many sorting applications because it has acceptable running time even for moderately large files and requires only a small amount of code that is easy to get working. Having said that, it may be worthwhile to replace Shell sort with a more sophisticated sort for a given sorting problem.

 

 

IMPLEMENTATION

void shellSort(int numbers[], int array_size)
{
    int i, j, increment, temp;

    increment = 3;
    while (increment > 0)
    {
        for (i = 0; i < array_size; i++)
        {
            j = i;
            temp = numbers[i];
            while ((j >= increment) && (numbers[j - increment] > temp))
            {
                numbers[j] = numbers[j - increment];
                j = j - increment;
            }
            numbers[j] = temp;
        }
        if (increment / 2 != 0)
            increment = increment / 2;
        else if (increment == 1)
            increment = 0;
        else
            increment = 1;
    }
}

Bucket Sort

 

Bucket sort runs in linear time on the average. It assumes that the input is generated by a random process that distributes elements uniformly over the interval [0, 1).

The idea of Bucket sort is to divide the interval [0, 1) into n equal-sized subintervals, or buckets, and then distribute the n input numbers into the buckets. Since the inputs are uniformly distributed over [0, 1), we don't expect many numbers to fall into each bucket. To produce the output, simply sort the numbers in each bucket and then go through the buckets in order, listing the elements in each.

The code assumes that the input is an n-element array A and that each element in A satisfies 0 ≤ A[i] < 1. We also need an auxiliary array B[0 . . n-1] of linked lists (buckets).

 

BUCKET_SORT (A)

1. n ← length [A]

2. For i = 1 to n do

3. Insert A[i] into list B[⌊n·A[i]⌋]

4. For i = 0 to n-1 do

5. Sort list B[i] with Insertion sort

6. Concatenate the lists B[0], B[1], . . B[n-1] together in order.

 

Example

Given an input array A[1 . . 10], the array B[0 . . 9] of sorted lists (buckets) appears after line 5. Bucket i holds values in the interval [i/10, (i+1)/10). The sorted output consists of a concatenation, in order, of the lists: first B[0], then B[1], then B[2], . . . , and finally B[9].

 

Analysis

All lines except line 5 take O(n) time in the worst case. We can see by inspection that the total time to examine all the buckets in line 5 is O(n).

The only interesting part of the analysis is the time taken by Insertion sort in line 5. Let ni be the random variable denoting the number of elements in bucket B[i]. Since the expected time to sort ni elements with INSERTION_SORT is O(ni²), the expected time to sort the elements in bucket B[i] is

 

E[O(ni²)] = O(E[ni²])

 

Therefore, the total expected time to sort all elements in all buckets is

 

∑_{i=0}^{n-1} O(E[ni²]) = O(∑_{i=0}^{n-1} E[ni²])         ------------ A

 

 

In order to evaluate this summation, we must determine the distribution of each random variable ni.

We have n elements and n buckets. The probability that a given element falls in bucket B[i] is 1/n, i.e., p = 1/n. (Note that this is the same as the "balls and bins" problem.)

Therefore, ni follows the binomial distribution, which has

    mean:     E[ni] = np = 1
    variance: Var[ni] = np(1 - p) = 1 - 1/n

For any random variable, we have

E[ni²] = Var[ni] + E²[ni]
       = 1 - 1/n + 1²
       = 2 - 1/n
       = Θ(1)

Putting this value into equation A above (and doing some tweaking), we get an expected total time for all the INSERTION_SORT calls of O(n).

 

Now back to our original problem

In the above Bucket sort algorithm, we observe

T(n) = [time to insert n elements into array A] + [time to go through the auxiliary array B[0 . . n-1] and sort each bucket with INSERTION_SORT]
     = O(n) + n · Θ(1)
     = O(n)

 Therefore, the entire Bucket sort algorithm runs in linear expected time.