
Mark Dunlop, Computer and Information Sciences, Strathclyde University
http://www.cis.strath.ac.uk/~mdd/

Algorithms and Complexity
Notes part 2: Searching & Sorting

Mark D Dunlop, [email protected]


Algorithms and Complexity
Searching


Split the room

• Sort yourself by birthday (e.g. 1/1 .... 31/12)

• Sort yourself by first name (e.g. Aaron ... Zakia)


Searching - examples of accesses

• How many people in your half are called David?
• How many people in your half have the same birthday?
• Does anyone share my birthday?
• Whose birthday is next in your half?
• How many people have a unique first name in your half?


SimpleMap.java

package uk.ac.strath.cis.mdd.ac.simplemap;
import java.util.Iterator;

public interface SimpleMap {
    public Iterator find(Comparable key);
    public void insert(SimpleMapMember member)
        throws SimpleMapException;
    public void delete(Comparable key)
        throws SimpleMapException;
}


DirectoryMember.java

package uk.ac.strath.cis.mdd.ac.simplemap;

public interface SimpleMapMember {
    public Comparable getKey();
}


SimplePhoneDirectory implementation

• Also on the web are:
  – SimplePhoneDirectory.java
  – SimplePhoneKey.java
  – SimplePhoneMember.java
  – Tester.java

• You are encouraged to examine them ....

Linear Search

• Work through the items one at a time, comparing each
• Clearly worst case O(n)
• On average O(½n) ≡ O(n)

int LinearSearch(S, k, low, high)   // simplified
    i = low; foundAt = -1
    while i <= high && foundAt == -1
        if S.keyAtPos(i) == k then foundAt = i
        i++
    if foundAt == -1 throw DirectoryException("not found")
    return foundAt
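The simplified pseudo-code above translates directly into Java. A minimal sketch over a plain int array (the class name, method name and sample data are illustrative, not from the course code; it returns -1 rather than throwing, to stay self-contained):

```java
public class LinearSearchDemo {
    // Scan a[low..high] left to right; return the index of k, or -1 if
    // absent (the slides throw DirectoryException instead). Worst case O(n).
    public static int linearSearch(int[] a, int k, int low, int high) {
        for (int i = low; i <= high; i++) {
            if (a[i] == k) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {17, 3, 95, 28, 41};
        System.out.println(linearSearch(a, 95, 0, a.length - 1)); // 2
        System.out.println(linearSearch(a, 99, 0, a.length - 1)); // -1
    }
}
```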


Background: Divide and Conquer

• An algorithmic design pattern
  – Make the problem smaller by working on a smaller amount of data
  – Then have a quick way of combining results, if need be
• If halving each time then Divide and Conquer often ends up with a complexity of O(log n)


Background: Divide and Conquer

• How many halvings before you have an individual case?

splits   number covered
  1        2 = 2^1
  2        4 = 2^2
  3        8 = 2^3
  4       16 = 2^4
  5       32 = 2^5
  i        n = 2^i


Search Complexity

• Linear search: proportional to n
• Binary search: proportional to log2(n)
  – How many times you need to halve it
    • log2 2 = 1
    • log2 4 = 2
    • log2 8 = 3
    • log2 1,024 = 10
    • log2 1,048,576 = 20
  – BUT you need sorted data to start


Binary Search

• Narrow down the search range in stages using a divide and conquer approach

• Pick the middle point and decide whether the key lies to the left or right, then search that half recursively

• Must be very careful with boundaries


Binary Search

Algorithm BinarySearch(S, k, low, high)
    if low > high then
        throw DirectoryException("Not Found")
    else
        mid = (low + high) / 2
        if k == key(mid) then
            return key(mid)
        else if k < key(mid) then
            return BinarySearch(S, k, low, mid-1)
        else
            return BinarySearch(S, k, mid+1, high)
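The same algorithm as runnable Java, over a sorted int array rather than the directory ADT (a sketch: names are illustrative, and it returns -1 instead of throwing so the example stands alone):

```java
public class BinarySearchDemo {
    // Recursive binary search of sorted a[low..high] for k.
    // Each call halves the range, so at most about log2(n) calls: O(log n).
    public static int binarySearch(int[] a, int k, int low, int high) {
        if (low > high) {
            return -1;                    // the slide throws "Not Found" here
        }
        int mid = (low + high) / 2;
        if (k == a[mid]) {
            return mid;
        } else if (k < a[mid]) {
            return binarySearch(a, k, low, mid - 1);   // left half
        } else {
            return binarySearch(a, k, mid + 1, high);  // right half
        }
    }

    public static void main(String[] args) {
        int[] sorted = {11, 12, 15, 17, 28, 35, 41};
        System.out.println(binarySearch(sorted, 28, 0, sorted.length - 1)); // 4
    }
}
```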


logs - a reminder

• Standard properties
  – log_x(y) is the power z such that x^z = y
  – change of base: log_a(n) = log_b(n) / log_b(a), i.e. bases differ only by a constant factor

• So log2(n) ≡ log4(n) ≡ logx(n) within O() ---> write log(n)


Exponents - a reminder

• Common properties
  – x^a · x^b = x^(a+b)
  – (x^a)^b = x^(a·b)
  – x^a / x^b = x^(a-b)


Directory ADT complexity

(using an array for storage, and binary search if sorted)

public Iterator find(DirectoryKey key);
    - unsorted:        - sorted:

public void insert(DirectoryMember member) throws ....
    - unsorted:        - sorted:

public void delete(DirectoryKey key) throws ....
    - unsorted:        - sorted:


Summary

• Divide and conquer
• Linear search (O(n)) v Binary search (O(log n))
• Directory ADT
• Careful O() analysis of Linear Search

• Next time:
  – Sorting introduction


Algorithms and Complexity
Sorting


Quick Aside

• What is the total of all the numbers from 1 to 100?

• 5,050 - an O(1) calculation that showed just how clever 10 year old Carl Gauss was at school in 1787

  Σ_{i=1..n} i = n(n+1)/2


Summing up to n

• 1 + 2 + ... + (n-2) + (n-1) + n
• Gauss's trick: pair the first and last terms - n/2 pairs, each summing to n+1

  Σ_{i=1..n} i = n(n+1)/2


Link to complexity

int sum = 0;
for (int i=1; i<=N; i++)
    for (int j=1; j<=i; j++)
        sum++;

• Complexity O(n²), since the inner statement runs (n² + n)/2 times
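The nested loop can be run directly to check the count against Gauss's formula (a small sketch; the counter simply tallies how often the inner statement executes):

```java
public class SumComplexityDemo {
    // Count how many times the inner statement of the nested loop runs.
    public static int countOps(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= i; j++) {
                sum++;                    // runs 1 + 2 + ... + n times in total
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println(countOps(n));      // 5050
        System.out.println(n * (n + 1) / 2);  // 5050 - Gauss's n(n+1)/2
    }
}
```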


Related...

• Geometric sequence

  Σ_{i=0..N} a^i = 1 + a + a² + ... + a^N = (a^(N+1) - 1) / (a - 1)

• e.g.

  1+2+4+8+...+2^(n-1) = 2^n - 1


Motivation for looking at sorting

• Many common operations are much faster if data is kept sorted
  – find(key)
  – find min, find max, find median

• Most data is accessed much more often than it is created

• Good algorithms for complexity analysis


Sorting Algorithms

• Bubble sort
• Selection sort
• Insertion sort
• Shell sort
• Merge sort
• Quick sort
• Bucket sort and Radix sort


Bubble Sort

• Generally considered silly, but simple

for i = 0 to A.length-1
    for j = 1 to A.length-1-i
        if A[j-1] > A[j] swap(A[j-1], A[j])
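In Java the same two loops look like this (a runnable sketch; the class name and sample array are just for illustration):

```java
import java.util.Arrays;

public class BubbleSortDemo {
    // Pass i bubbles the largest remaining value to the end of the
    // unsorted region, so each pass scans one fewer element: O(n^2).
    public static void bubbleSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = 1; j < a.length - i; j++) {
                if (a[j - 1] > a[j]) {            // out of order: swap
                    int t = a[j - 1];
                    a[j - 1] = a[j];
                    a[j] = t;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {81, 94, 11, 96, 12, 35};
        bubbleSort(a);
        System.out.println(Arrays.toString(a)); // [11, 12, 35, 81, 94, 96]
    }
}
```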


Selection Sort

for i = 0 to A.length-1
    minpos = i
    for j = i+1 to A.length-1
        if A[j] < A[minpos] minpos = j
    swap(A[i], A[minpos])
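As runnable Java (a sketch; the final self-swap when minpos equals i is harmless but could be guarded):

```java
import java.util.Arrays;

public class SelectionSortDemo {
    // Pass i finds the smallest element of a[i..] and swaps it into slot i.
    // Always about n^2/2 comparisons, but at most n-1 swaps: O(n^2).
    public static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int minpos = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[minpos]) {
                    minpos = j;
                }
            }
            int t = a[i];                  // swap(A[i], A[minpos])
            a[i] = a[minpos];
            a[minpos] = t;
        }
    }

    public static void main(String[] args) {
        int[] a = {81, 94, 11, 96, 12, 35};
        selectionSort(a);
        System.out.println(Arrays.toString(a)); // [11, 12, 35, 81, 94, 96]
    }
}
```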


Insertion Sort

for i = 1 to A.length-1
    basevalue = A[i]
    j = i
    while j > 0 && basevalue < A[j-1]
        A[j] = A[j-1]
        j = j-1
    A[j] = basevalue

• Involves lots of data shuffling
• Works nicely with a user inputting the data on screen
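The insertion sort pseudo-code runs almost unchanged as Java (a sketch; note it shifts elements rather than swapping, which is what shell sort later generalises):

```java
import java.util.Arrays;

public class InsertionSortDemo {
    // Take each element in turn and shift larger elements right until its
    // slot is found. O(n^2) worst case, but O(n) on already-sorted data.
    public static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int basevalue = a[i];
            int j = i;
            while (j > 0 && basevalue < a[j - 1]) {
                a[j] = a[j - 1];          // shuffle the larger value right
                j--;
            }
            a[j] = basevalue;
        }
    }

    public static void main(String[] args) {
        int[] a = {81, 94, 11, 96, 12, 35};
        insertionSort(a);
        System.out.println(Arrays.toString(a)); // [11, 12, 35, 81, 94, 96]
    }
}
```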

Shell Sort

• Named after Donald Shell (invented 1959)
• Big idea is to avoid large amounts of data movement by first comparing elements far apart, then slowly reducing to an insertion sort
• Uses an increment sequence h1, h2, h3, ..., ht with h1 = 1 (any sequence will do but some are faster)


Shell Sort example 1

• Use increment sequence 1, 3, 5
• Shell proposed an increment sequence of 1, ..., N/16, N/8, N/4, N/2

unsorted        81 94 11 96 12 35 17 95 28 58 41 75 15
after 5-sort    35 17 11 28 12 41 75 15 96 58 81 94 95
before 3-sort   35 17 11 28 12 41 75 15 96 58 81 94 95
after 3-sort    28 12 11 35 15 41 58 17 94 75 81 96 95
after 1-sort    11 12 15 17 28 35 41 58 75 81 94 95 96


ShellSort: Pseudo Code

// shellsort using Shell's increment sequence
for (int gap = A.length/2; gap > 0; gap = gap/2)
    for (int i = gap; i < A.length; i++)
        basevalue = A[i]
        j = i
        while j >= gap && basevalue < A[j-gap]
            A[j] = A[j-gap]
            j = j - gap
        A[j] = basevalue
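This pseudo-code maps directly onto Java; the sample data below is the 13-element array from the earlier example slide:

```java
import java.util.Arrays;

public class ShellSortDemo {
    // Shell sort with Shell's gap sequence length/2, length/4, ..., 1.
    // Each gap pass is an insertion sort over elements gap apart.
    public static void shellSort(int[] a) {
        for (int gap = a.length / 2; gap > 0; gap = gap / 2) {
            for (int i = gap; i < a.length; i++) {
                int basevalue = a[i];
                int j = i;
                while (j >= gap && basevalue < a[j - gap]) {
                    a[j] = a[j - gap];    // shift by gap, not by 1
                    j = j - gap;
                }
                a[j] = basevalue;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {81, 94, 11, 96, 12, 35, 17, 95, 28, 58, 41, 75, 15};
        shellSort(a);
        System.out.println(Arrays.toString(a));
    }
}
```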


ShellSort: Complexity

• Shell proposed an increment sequence of 1, ..., N/16, N/8, N/4, N/2
• This still gives an O(n²) algorithm, but a much faster one - a better "normal case"
• If, whenever dividing by 2 gives an even number, we add 1 to make it odd, the complexity improves to O(n^(3/2))
• Dividing by 2.2 appears to give O(n^(5/4)) but no proof exists


Performance of Shellsort

[Two graphs of running time against N (up to 80,000) comparing Insertion Sort, Shell's Shellsort, Odd Gaps Shellsort and div 2.2 Shellsort; the second zooms in on the three Shellsort curves]

     N    Insertion Sort   Shell's Shellsort   Odd Gaps Shellsort   div 2.2 Shellsort
 1,000             122                  11                   11                    9
 2,000             483                  26                   21                   23
 4,000           1,936                  61                   59                   54
 8,000           7,950                 153                  141                  114
16,000          32,560                 358                  322                  269
32,000         131,911                 869                  752                  575
64,000         520,000               2,091                1,705                1,249


Ave Performance of Shellsort

• Insertion sort: very approx 0.000122n² + 0.2n + ...
• Shell's shellsort: very approx 0.0000002n² + 0.02n + ...
• Both still have a worst case of O(n²), but the revised increment sequences (odd gaps, div 2.2) remove this worst case for shell sort


Merge Sort

• Shell sort is the best we can do with a swap-'em-around strategy
• Merge sort is a divide and conquer strategy (c.f. binary search)

if number of items to sort <= 1 return
else
    sort left half; sort right half
    merge halves


Sorting is the easy bit

void sort (int[] A, left, right)
    if (left != right)
        int centre = (left + right) / 2
        sort (A, left, centre)
        sort (A, centre+1, right)
        merge (A, left, centre, right)


void merge (int[] A, left1, centre, right2)
    int[] B = new int[right2 - left1 + 1]
    int p1 = left1; int p2 = centre+1; int pB = 0
    while p1 <= centre && p2 <= right2
        if (A[p1] <= A[p2])
            B[pB] = A[p1]; pB++; p1++
        else
            B[pB] = A[p2]; pB++; p2++

    // one list is empty now - drain the other
    while p1 <= centre
        B[pB] = A[p1]; pB++; p1++
    while p2 <= right2
        B[pB] = A[p2]; pB++; p2++

    // copy back from list B
    for (int i = 0; i <= right2-left1; i++)
        A[i+left1] = B[i]
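Putting sort and merge together as runnable Java (a sketch; for simplicity the temporary array is allocated per merge rather than reused):

```java
import java.util.Arrays;

public class MergeSortDemo {
    // Sort a[left..right]: recursively sort each half, then merge them.
    public static void sort(int[] a, int left, int right) {
        if (left >= right) {
            return;                       // 0 or 1 elements: already sorted
        }
        int centre = (left + right) / 2;
        sort(a, left, centre);
        sort(a, centre + 1, right);
        merge(a, left, centre, right);
    }

    // Merge sorted runs a[left..centre] and a[centre+1..right]: O(n).
    static void merge(int[] a, int left, int centre, int right) {
        int[] b = new int[right - left + 1];
        int p1 = left, p2 = centre + 1, pB = 0;
        while (p1 <= centre && p2 <= right) {
            b[pB++] = (a[p1] <= a[p2]) ? a[p1++] : a[p2++];
        }
        while (p1 <= centre) b[pB++] = a[p1++];   // drain whichever run
        while (p2 <= right)  b[pB++] = a[p2++];   // is not yet empty
        for (int i = 0; i < b.length; i++) {
            a[left + i] = b[i];                   // copy back
        }
    }

    public static void main(String[] args) {
        int[] a = {81, 94, 11, 96, 12, 35, 17, 95, 28, 58, 41, 75, 15};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```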


Complexity Analysis

• Merge is clearly O(n)

• But how many times is it called?


The call tree

1 merge of size n      (level 0)
2 merges of size n/2
4 merges of size n/4
8 merges of size n/8   (level 3)

• at level 3: 8 merges of size n/8
• at level i: 2^i merges of size n/2^i
• each level is O(n)


So how many levels?

• log2(n) levels, each of O(n)

• so merge sort is O(n log n)

splits   number covered
  1        2 = 2^1
  2        4 = 2^2
  3        8 = 2^3
  4       16 = 2^4
  5       32 = 2^5
  i        n = 2^i


Common classes of Big-Oh

[Graph of time against input size n (0..120) for the classes log n, n, n log n, n*n, 2*n*n and n!]


Quick Sort

• Can we do merge sort in place?
• Yes, if we can get rid of the merges
• Yes, if the data is split so that all data on the left of the split is below all data on the right

• So, quicksort shuffles data before recursing


Overview of Quicksort

• Worst case is O(n²) but this can be avoided

• Very tightly written innermost loops make it very fast

• Very tricky to implement correctly - slightly off and it's much slower


Quick Sort

if number of elements <= 1 return
else
    pick a pivot value v in A
    partition A into L and R such that
        ∀i∈L: i ≤ v  and  ∀j∈R: j ≥ v
    sort(L)
    sort(R)


Picking the pivot

• A[left]
  – wrong - if the data is already sorted this leads to O(n²)

• Ideal is to split the data in the middle ( |L| = |R| ) to give O(n log n), equalling merge sort

• Picking the median is perfect, but doing this involves sorting!


Picking the pivot

• A safe choice is the value at (low+high)/2
  – will do very well if already sorted
  – but not guaranteed to split the data equally
• Could pick one randomly - best on average
• Pragmatic choice: pick the median of
  – A[low]
  – A[(low+high)/2]
  – A[high]


Very pseudo code

// assume all elements are unique for now
Pick the pivot
Get the pivot out of the way by putting it at the end
repeat until the two scans cross:
    search from left to right looking for a large element,
    and from right to left looking for a small element
    swap the large and the small
Swap the pivot with the last large item
Sort the left and right parts (the pivot is now in the right place)


void sort(a, low, high)

    // Pick the pivot
    int mid = (low + high) / 2;
    if a[mid] < a[low] swap(a, low, mid)
    if a[high] < a[low] swap(a, low, high)
    if a[high] < a[mid] swap(a, mid, high)
    // a[low], a[mid], a[high] are now sorted - use mid as pivot
    // note a[low] is small and a[high] is large so leave them in place

    // Get the pivot out of the way by putting it at the end
    swap(a, mid, high-1)

    // begin partitioning...


// begin partitioning
pivot = a[high-1]
int i = low+1; int j = high-2;   // start left of the pivot slot
while i < j   // i.e. while the scans have not crossed
    // search from left to right looking for a large element and
    // from right to left looking for a small element
    while (a[i] < pivot) i++;
    while (a[j] > pivot) j--;
    if (i < j) swap(a, i, j)   // swap the large and the small

// swap pivot with the last large item we found above
swap(a, i, high-1)

// sort left and right parts - NB a[i] is in the right place
sort(a, low, i-1)
sort(a, i+1, high)


Average case complexity

• On average we have O(n log n) plus a very fast n part

• Insertion sort is faster for small lists - so good quicksort implementations actually switch to insertion sort when the list size falls below a small cutoff (around 5..20)


Bucket Sort

• Now for something completely different
• How about sorting based on the contents of the key rather than just the order of the keys?

• Similar to how we sort forms by name
  – put the As in an a* pile, the Bs into b* ...


Pseudo Code

void bucketSort (queue q)
    // N = size of alphabet
    queue[N] bucket

    while q.isnotempty()
        x = q.removefromfront()
        bucket[x.key].add(x)

    for i = 0 to N-1
        while bucket[i].isnotempty()
            q.insertatend(bucket[i].removefromfront())
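A runnable Java sketch of the same idea, using small integer keys directly (one FIFO queue per possible key value, standing in for the alphabet; names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class BucketSortDemo {
    // Dequeue every item into the bucket for its key, then read the
    // buckets back in key order. O(n + N) for n items and N possible keys.
    public static int[] bucketSort(int[] keys, int alphabetSize) {
        List<Queue<Integer>> bucket = new ArrayList<>();
        for (int i = 0; i < alphabetSize; i++) {
            bucket.add(new ArrayDeque<>());
        }
        for (int k : keys) {
            bucket.get(k).add(k);         // add() preserves arrival order
        }
        int[] out = new int[keys.length];
        int p = 0;
        for (Queue<Integer> b : bucket) {
            while (!b.isEmpty()) {
                out[p++] = b.remove();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = bucketSort(new int[]{3, 1, 4, 1, 5, 9, 2, 6}, 10);
        System.out.println(Arrays.toString(sorted)); // [1, 1, 2, 3, 4, 5, 6, 9]
    }
}
```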


Radix Sort

• Not the full story - how do we sort ...
  – put the As in an a* pile, the Bs into b* ...
  – split a* into aa*, ab*, ac* ...
  – split the aa* pile into aaa*, aab* ...

• That's radix sorting

• But first a new concept...


Stable Sort

• A stable sort is a sort that preserves the order of elements that have equal keys

• Most sorts don't honour this, but it's a nice feature that radix sort needs to be most efficient


Working backwards

• If we can do a stable sort on key_i, then we can sort on key_(i+1) first and then stably sort on key_i
• e.g. GH KI FE BH IU KS DE HJ KT HU
• sorting on the second letter gives
  – FE DE GH BH KI HJ KS KT IU HU
• sorting on the first letter stably then gives
  – BH DE FE GH HJ HU IU KI KS KT


Radix Sort Complexity

• Given an alphabet of A letters, a key length of L letters and N items
• Radix sort works in O(L(N+A))
• Given that L and A are fixed for a given dataset, this gives an O(N) sorter!

• Pseudo code left as exercise 8-)
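One possible solution to the exercise, as a Java sketch for fixed-length uppercase keys: a least-significant-letter radix sort where each per-letter pass is a stable bucket sort, working from the last letter backwards (the sample two-letter codes are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RadixSortDemo {
    // LSD radix sort of fixed-length uppercase words: one stable bucket
    // pass per letter position, last position first. O(L(N+A)) overall.
    public static String[] radixSort(String[] words, int length) {
        String[] a = words.clone();
        for (int pos = length - 1; pos >= 0; pos--) {
            List<List<String>> bucket = new ArrayList<>();
            for (int i = 0; i < 26; i++) {
                bucket.add(new ArrayList<>());
            }
            for (String w : a) {
                bucket.get(w.charAt(pos) - 'A').add(w);  // stable: keeps order
            }
            int p = 0;
            for (List<String> b : bucket) {
                for (String w : b) {
                    a[p++] = w;
                }
            }
        }
        return a;
    }

    public static void main(String[] args) {
        String[] in = {"GH", "KI", "FE", "BH", "IU", "KS", "DE", "HJ", "KT", "HU"};
        System.out.println(Arrays.toString(radixSort(in, 2)));
    }
}
```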


Sorting Algorithms

• Selection sort, Bubble sort, Insertion sort
• Shell sort
• Merge sort, Quick sort
• Bucket sort, Radix sort

• Next: sorting-like things
  – Hash tables, priority queues & heaps