Priority Queue Sorting - cse.iitrpr.ac.incse.iitrpr.ac.in/ckn/courses/f2015/csl201/w6.pdf · to sort a list of comparable elements 1. Insert the elements one by one with a series

Priority Queue Sorting q  We can use a priority queue

to sort a list of comparable elements 1.  Insert the elements one by

one with a series of insert operations

2.  Remove the elements in sorted order with a series of removeMin operations

q  The running time of this sorting method depends on the priority queue implementation

Algorithm PQ-Sort(S, C) n  Input list S, comparator C

for the elements of S n  Output list S sorted in

increasing order according to C

P ← priority queue with comparator C

while ¬S.isEmpty () e ← S.remove(S.first ())

P.insert (e, ∅) while ¬P.isEmpty()

e ← P.removeMin().getKey() S.addLast(e)

© 2014 Goodrich, Tamassia, Goldwasser 1 Heaps

Selection-Sort q  Selection-sort is the variation of PQ-sort where the

priority queue is implemented with an unsorted sequence

q  Running time of Selection-sort: 1.  Inserting the elements into the priority queue with n insert

operations takes O(n) time 2.  Removing the elements in sorted order from the priority

queue with n removeMin operations takes time proportional to n + n-1 + … +2 +1

q  Selection-sort runs in O(n2) time


Selection-Sort Example

© 2014 Goodrich, Tamassia, Goldwasser Heaps 3

Sequence S Priority Queue P Input: (7,4,8,2,5,3,9) () Phase 1

(a) (4,8,2,5,3,9) (7) (b) (8,2,5,3,9) (7,4) .. .. .. (g) () (7,4,8,2,5,3,9)

Phase 2

(a) (2) (7,4,8,5,3,9) (b) (2,3) (7,4,8,5,9) (c) (2,3,4) (7,8,5,9) (d) (2,3,4,5) (7,8,9) (e) (2,3,4,5,7) (8,9) (f) (2,3,4,5,7,8) (9) (g) (2,3,4,5,7,8,9) ()

Insertion-Sort q  Insertion-sort is the variation of PQ-sort

where the priority queue is implemented with a sorted sequence

q  Running time of Insertion-sort: n  Inserting the elements into the priority queue

with n insert operations takes time proportional to 1 + 2 + …+ n

n  Removing the elements in sorted order from the priority queue with a series of n removeMin operations takes O(n) time

q  Insertion-sort runs in O(n2) time


Insertion-Sort Example Sequence S Priority queue P

Input: (7,4,8,2,5,3,9) () Phase 1 (a) (4,8,2,5,3,9) (7)

(b) (8,2,5,3,9) (4,7) (c) (2,5,3,9) (4,7,8) (d) (5,3,9) (2,4,7,8) (e) (3,9) (2,4,5,7,8) (f) (9) (2,3,4,5,7,8) (g) () (2,3,4,5,7,8,9)

Phase 2

(a) (2) (3,4,5,7,8,9) (b) (2,3) (4,5,7,8,9) .. .. .. (g) (2,3,4,5,7,8,9) ()


In-place Insertion-Sort q  Instead of using an

external data structure, we can implement selection-sort and insertion-sort in-place

q  A portion of the input sequence itself serves as the priority queue

q  For in-place insertion-sort n  We keep sorted the initial

portion of the sequence n  We can use swaps instead

of modifying the sequence


5 4 2 3 1

5 4 2 3 1

4 5 2 3 1

2 4 5 3 1

2 3 4 5 1

1 2 3 4 5

1 2 3 4 5

Heap Sort q  Consider a priority queue

with n items implemented by means of a heap n  the space used is O(n) n  methods insert and

removeMin take O(log n) time

n  methods size, isEmpty, and min take time O(1) time

q  Using a heap-based priority queue, we can sort a sequence of n elements in O(n log n) time

q  The resulting algorithm is called heap-sort

q  Heap-sort is much faster than quadratic sorting algorithms, such as insertion-sort and selection-sort


Heap Sort q  Create a heap q  Do removeMin repeatedly till head

becomes empty q  To do an in place sort, we move deleted

element to end of heap

8 Heaps

Heap Sort

Heaps 9

12

20 23

25

31 26

11

17

33

29

13

8

8 11 13 12 25 17 29 20 23 31 26 33

Heap Sort

Heaps 10

12

20 23

25

31 26

11

17

8

29

13

33

33 11 13 12 25 17 29 20 23 31 26 8

Heap Sort

Heaps 11

12

20 23

25

31 26

33

17

8

29

13

11

11 33 13 12 25 17 29 20 23 31 26 8

Heap Sort

Heaps 12

33

20 23

25

31 26

12

17

8

29

13

11

11 12 13 33 25 17 29 20 23 31 26 8

Heap Sort

Heaps 13

20

33 23

25

31 26

12

17

8

29

13

11

11 12 13 20 25 17 29 33 23 31 26 8

Heap Sort

Heaps 14

20

33 23

25

31 11

12

17

8

29

13

26

26 12 13 20 25 17 29 33 23 31 11 8

Heap Sort

Heaps 15

20

33 23

25

31 11

26

17

8

29

13

12

12 26 13 20 25 17 29 33 23 31 11 8

Heap Sort

Heaps 16

26

33 23

25

31 11

20

17

8

29

13

12

12 20 13 26 25 17 29 33 23 31 11 8

Heap Sort

Heaps 17

23

33 26

25

31 11

20

17

8

29

13

12

12 20 13 23 25 17 29 33 26 31 11 8

Heap Sort

Heaps 18

23

33 26

25

12 11

20

17

8

29

13

31

31 20 13 23 25 17 29 33 26 12 11 8

Heap Sort

Heaps 19

26

17 13

25

12 11

31

23

8

20

29

33

33 31 29 26 25 23 20 17 13 12 11 8

Heapsort - Analysis q  Create a heap - O(n) q  Do removeMin repeatedly till head

becomes empty n  O(log n) + O(log n-1) + O(log n-2) + …

O(1) = O(n log n)

q  To do an in place sort, we move deleted element to end of heap

20 Heaps

Maps

Maps q  A map models a searchable collection of

key-value entries q  The main operations of a map are for

searching, inserting, and deleting items q  Multiple entries with the same key are

not allowed q  Applications:

n  address book n  student-record database

© 2014 Goodrich, Tamassia, Goldwasser 22 Maps

The Map ADT q  get(k): if the map M has an entry with key k, return its

associated value; else, return null q  put(k, v): insert entry (k, v) into the map M; if key k is not

already in M, then return null; else, return old value associated with k

q  remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null

q  size(), isEmpty() q  entrySet(): return an iterable collection of the entries in M q  keySet(): return an iterable collection of the keys in M q  values(): return an iterator of the values in M


Example Operation Output Map isEmpty() true Ø put(5,A) null (5,A) put(7,B) null (5,A),(7,B) put(2,C) null (5,A),(7,B),(2,C) put(8,D) null (5,A),(7,B),(2,C),(8,D) put(2,E) C (5,A),(7,B),(2,E),(8,D) get(7) B (5,A),(7,B),(2,E),(8,D) get(4) null (5,A),(7,B),(2,E),(8,D) get(2) E (5,A),(7,B),(2,E),(8,D) size() 4 (5,A),(7,B),(2,E),(8,D) remove(5) A (7,B),(2,E),(8,D) remove(2) E (7,B),(8,D) get(2) null (7,B),(8,D) isEmpty() false (7,B),(8,D)


A Simple List-Based Map q  We can implement a map using an

unsorted list n  We store the items of the map in a list S

(based on a doubly linked list), in arbitrary order


trailer header nodes/positions

entries

9 c 6 c 5 c 8 c

The get(k) Algorithm Algorithm get(k):

B = S.positions() {B is an iterator of the positions in S} while B.hasNext() do p = B.next() { the next position in B } if p.element().getKey() = k then return p.element().getValue() return null {there is no entry with key equal to k}


The put(k,v) Algorithm Algorithm put(k,v): B = S.positions() while B.hasNext() do

p = B.next() if p.element().getKey() = k then t = p.element().getValue() S.set(p,(k,v)) return t {return the old value}

S.addLast((k,v)) n = n + 1 {increment variable storing number of

entries} return null { there was no entry with key equal to k }


The remove(k) Algorithm Algorithm remove(k): B =S.positions() while B.hasNext() do

p = B.next() if p.element().getKey() = k then t = p.element().getValue() S.remove(p) n = n – 1 {decrement number of entries} return t {return the removed value}

return null {there is no entry with key equal to k}


Performance of a List-Based Map q  Performance:

n  put takes O(1) time since we can insert the new item at the beginning or at the end of the sequence

n  get and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key

n  Average case time?

q  The unsorted list implementation is effective only for maps of small size or for maps in which puts are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)

29 Maps

Different Data Structures to Implement Map ADT q  arrays, linked lists (inefficient) q  Binary Trees q  Hash Tables q  Red/Black Trees q  AVL Trees q  B-Trees q  Java

n  java.util.Dictionary - abstract class n  java.util.Map - interface

30 Maps

Binary Search


Direct Addressing q  an array that is indexed by the key

n  O(1) - time n  O(r) space - where r is the range of

numbers n  wastage of space

32 Maps

A B C D

XXXX-XXXX XXXX-XXXX 1234-5678 XXXX-XXXX

Hash Tables

∅

∅

0 1 2 3 4 451-229-0004

981-101-0002 025-612-0001

Intuitive Notion of a Map q  Intuitively, a map M supports the abstraction

of using keys as indices with a syntax such as M[k].

q  As a mental warm-up, consider a restricted setting in which a map with n items uses keys that are known to be integers in a range from 0 to N − 1, for some N ≥ n.

© 2014 Goodrich, Tamassia, Goldwasser 34 Hash Tables

Hash Table Solution q  O(1) - expected time q  O(n+m) - space, where m is size of the table q  instead of a one-to-one map between the key values

and array locations, find a function to map the large range into one which we can manage n  e.g., key value modulo size of array, and use that as an index n  Insert (12345678, C) into a hashed array of size 5, -

12345678 mod 5 = 3

35 Hash Tables

C

XXXX-XXXX XXXX-XXXX XXXX-XXXX 1234-5678 XXXX-XXXX

More General Kinds of Keys q  But what should we do if our keys are not

integers in the range from 0 to N – 1? n  Use a hash function to map general keys to

corresponding indices in a table. n  For instance, the last four digits of a Identification

number.


∅

∅

0 1 2 3 4 451-229-0004

981-101-0002 025-612-0001

…

Hash Functions and Hash Tables q  A hash function h maps keys of a given type to

integers in a fixed interval [0, N - 1] q  Example:

h(x) = x mod N is a hash function for integer keys

q  The integer h(x) is called the hash value of key x


q  A hash table for a given key type consists of n  Hash function h n  Array (called table) of size N

q  When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k)

Example

0 1 2 3 4 5 6 7 8 9 10

(1,D) (25,C)

(3,F)

(14,Z)

(39,C)

(6,A) (7,Q)

38 Hash Tables

Collision – Two different keys resulting in the same hash value

Hash Functions q  Need to choose a good hash function

n  quick to compute n  uniform distribution of keys throughout the

table n  good hash functions are very rare.

q  How to deal with hashing non-integer keys n  find some way to turn keys into integers n  use standard hash functions on these

integers

39 Hash Tables

Hash Functions (2) q  A hash function is

usually specified as the composition of two functions: Hash code: h1: keys → integers Compression function: h2: integers → [0, N - 1]

q  The hash code is applied first, and the compression function is applied next on the result, i.e.,

h(x) = h2(h1(x)) q  The goal of the hash

function is to “disperse” the keys in an apparently random way


Hash Codes q  Integer cast:

n  We reinterpret the bits of the key as an integer

n  Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)

n  For keys of length greater than the number of bits of the integer type, ignore the exceeding bits

q  Component sum: n  We partition the bits of the

key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)

n  Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)

n  Not a good choice for strings

41 Hash Tables

Hash Codes (2) q  Polynomial accumulation:

n  We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits) a0 a1 … an-1

n  We evaluate the polynomial p(z) = a0 + a1 z + a2 z2 + …

… + an-1zn-1

at a fixed value z, ignoring overflows

n  Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)

q  Polynomial p(z) can be evaluated in O(n) time using Horner’s rule: n  The following polynomials

are successively computed, each from the previous one in O(1) time p0(z) = an-1 pi (z) = an-i-1 + zpi-1(z) (i = 1, 2, …, n -1)

q  We have p(z) = pn-1(z)


Compression Functions q  Division:

n  Use the remainder n  h2 (y) = y mod N

w y is the key w N is size of the table

n  how to choose N?

43 Hash Tables

Consider (200, 205,210,215,220,…600) N = 100 N = 101

Compression Functions (2) q  Division:

n  Use the remainder n  h2 (y) = y mod N

w  y is the key w N is size of the table

n  how to choose N? n  N = bx (bad)

w N is a power of 2, h2(y) gives the x least significant bits of y.

w all keys with the same ending go to the same place n  N is prime (good)

w helps ensure uniform distribution

44 Hash Tables

Consider (200,205,210,215,220,…600) N = 100 N = 101

Compression Functions (3) q  Multiply, Add and Divide (MAD):

n  h2 (y) = [(ay + b) mod p ]mod N n  a and b are nonnegative integers such that

a mod N ≠ 0 n  Otherwise, every integer would map to the

same value b n  p is a prime number larger than N n  a and b are chosen at random from the

interval [0, p-1], with a > 0

45 Hash Tables

Compression Functions (4) q  For any choice of hash function, there

always exists a bad set of keys q  You could choose only these keys,

resulting in all of them getting mapped to the same slot. n  reduction in performance.

q  Solution n  collection of hash functions n  a random hash function n  choose a hash function that is independent of

the keys

46 Hash Tables

Collision Handling q  Collisions occur when

different elements are mapped to the same cell

q  Separate Chaining: let each cell in the table point to a linked list of entries that map there n  complexity depends

on load factor

q  Separate chaining is simple, but requires additional memory outside the table


∅

∅ ∅

0 1 2 3 4 451-229-0004 981-101-0004

025-612-0001

Map with Separate Chaining Delegate operations to a list-based map at each cell:

Algorithm get(k): return A[h(k)].get(k) Algorithm put(k,v): t = A[h(k)].put(k,v) if t = null then {k is a new key}

n = n + 1 return t Algorithm remove(k): t = A[h(k)].remove(k) if t ≠ null then {k was found}

n = n - 1 return t


Linear Probing q  Open addressing: the colliding item is placed

in a different cell of the table n  load factor is at most 1

q  Linear probing: handles collisions by placing the colliding item in the next (circularly) available table cell

q  Each table cell inspected is referred to as a “probe”

q  Colliding items lump together, causing future collisions to cause a longer sequence of probes

49 Hash Tables

Linear Probing (2) q  Example:

n  h(x) = x mod 13 n  Insert keys 18, 41, 22, 44, 57, 32, 31, 73,

in this order


0 1 2 3 4 5 6 7 8 9 10 11 12



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

18



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

18



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 22



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 22



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 22

44



in this order w h(x) = (x+1) mod 13


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 22

44





0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 22



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 22



in this order


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 22

57





0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 22

57



in this order w h(x) = (x+1) mod 13 w h(x) = (x+2) mod 13


0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 22

57





0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 57 22





0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 57 32 22





0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 57 32 22 31





0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 57 32 22 31 73

Search with Linear Probing q  Consider a hash table A

that uses linear probing q  get(k)

n  We start at cell h(k) n  We probe consecutive

locations until one of the following occurs w  An item with key k is

found, or w  An empty cell is found,

or w  N cells have been

unsuccessfully probed


Algorithm get(k) i ← h(k) p ← 0 repeat c ← A[i] if c = ∅

return null else if c.getKey () = k return c.getValue() else i ← (i + 1) mod N

p ← p + 1 until p = N return null

Search with Linear Probing (2) q  Example:


in this order n  Search for 57

69 Hash Tables

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 57 32 22 31 73

Updates with Linear Probing q  To handle insertions and deletions, we introduce a

special object, called DEFUNCT, which replaces deleted elements

q  remove(k) n  We search for an entry with key k

n  If such an entry (k, o) is found, we replace it with the

special item DEFUNCT and we return element o n  Else, we return null

70 Hash Tables

Update with Linear Probing (2) q  Example:


in this order n  Remove 57 n  Search for 32

71 Hash Tables

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 32 22 31 73



in this order n  Remove 57 n  Search for 32

72 Hash Tables

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 X 32 22 31 73

Updates with Linear Probing (4) q  To handle insertions and deletions, we introduce a special

object, called DEFUNCT, which replaces deleted elements

q  put(k, o) n  We throw an exception if the table is full n  We start at cell h(k) n  We probe consecutive cells until one of the

following occurs w A cell i is found that is either empty or stores

DEFUNCT, or w N cells have been unsuccessfully probed

n  We store (k, o) in cell i




in this order n  Remove 57 n  Search for 32 n  Insert 58

74 Hash Tables

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 X 32 22 31 73



in this order n  Remove 57 n  Search for 32 n  Insert 58

75 Hash Tables

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 58 32 22 31 73

Double Hashing q  Double hashing uses a secondary hash

function d(k) and handles collisions by placing an item in the first available cell of the series

(i + jd(k)) mod N for j = 0, 1, … , N - 1

q  The secondary hash function d(k) cannot have zero values

q  The table size N must be a prime to allow probing of all the cells


Example of Double Hashing q  Consider a hash table

storing integer keys that handles collision with double hashing n  N = 13 n  h(k) = k mod 13 n  d(k) = 7 - k mod 7

q  Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order


k h (k ) d (k ) Probes18 5 3 541 2 1 222 9 6 944 5 5 5 1059 7 4 732 6 3 631 5 4 5 9 073 8 4 8

0 1 2 3 4 5 6 7 8 9 10 11 12

41 18 44 59 32 22 31 73

31 41 18 32 59 73 22 44

Performance of Hashing q  In the worst case, searches,

insertions and removals on a hash table take O(n) time

q  The worst case occurs when all the keys inserted into the map collide

q  The load factor α = n/N affects the performance of a hash table

q  Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is

1 / (1 - α)

q  The expected running time of all the dictionary ADT operations in a hash table is O(1)

q  In practice, hashing is very fast provided the load factor is not close to 100%

q  Applications of hash tables: n  small databases n  compilers n  browser caches


Documents

Priority Queue Sorting - cse.iitrpr.ac.incse.iitrpr.ac.in/ckn/courses/f2015/csl201/w6.pdf · to sort a list of comparable elements 1. Insert the elements one by one with a series