CS 221 Analysis of Algorithms Ordered Dictionaries and Search Trees


Portions of these slides come from Algorithm Design: Foundations, Analysis and Internet Examples, 2002, John Wiley and Sons, its authors Michael Goodrich and Roberto Tamassia, the book's publisher John Wiley & Sons, and www.wikipedia.org.

Reading material

Goodrich and Tamassia, 2002: Chapter 2, section 2.5, pages 114-137 (see also section 2.6)

Chapter 3, section 3.1, pages 141-151

Wikipedia: http://en.wikipedia.org/wiki/AVL_trees

In the previous episode… we defined a data structure which we called a dictionary. It was a container to hold multiple objects, or in Goodrich and Tamassia's terminology, "items":

each item = a (key, element) pair

element = a "piece" of data (think: name, address, phone number)

key = a value we associate with the element to help us find, retrieve, delete, etc. that element (think: an RDBMS auto-increment key, a student ID#)

Dictionaries

Up till now we have looked at unordered dictionaries:

a container for (k,e) pairs, but… in no particular order

examples: logfiles, hash tables

Dictionaries

A terminology note for purposes of our discussion:

a linear unordered dictionary = logfile

a linear ordered dictionary = lookup table
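A minimal Python sketch of a logfile, assuming integer keys and Python's built-in list as the underlying sequence. The class and method names are our own (snake_case versions of the slide operations, e.g. find_element for findElement), and None stands in for NO_SUCH_KEY. It shows why insertion is cheap and lookup is a linear scan.

```python
from typing import Any, List, Optional, Tuple


class LogFile:
    """Hypothetical sketch of a logfile: an unordered dictionary that keeps
    (key, element) items in a plain Python list, in insertion order."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, Any]] = []

    def size(self) -> int:
        return len(self._items)

    def is_empty(self) -> bool:
        return not self._items

    def insert_item(self, k: int, e: Any) -> None:
        # O(1) amortized: new items simply go at the end
        self._items.append((k, e))

    def find_element(self, k: int) -> Optional[Any]:
        # O(n): there is no ordering to exploit, so scan every item
        for key, elem in self._items:
            if key == k:
                return elem
        return None  # stands in for NO_SUCH_KEY

    def remove_element(self, k: int) -> Optional[Any]:
        # O(n) to find the item; the removal itself is O(1) because order
        # does not matter: move the last item into the hole, then pop
        for i, (key, elem) in enumerate(self._items):
            if key == k:
                self._items[i] = self._items[-1]
                self._items.pop()
                return elem
        return None


log = LogFile()
log.insert_item(42, "Ada")
log.insert_item(7, "Alan")
print(log.find_element(7))   # Alan
```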

Game Time

Twenty Questions: one person thinks of an object that can be any person, place or thing… and does not disclose the selected object until it is specifically identified by the other players…

All other players take turns asking Yes/No questions in an attempt to identify the mystery object

Game Time

Twenty Questions: an efficient problem-solving strategy is to ask questions whose answers optimally narrow the size of the problem space (the set of possible solutions)

For example, Q: Is it a person? A: Yes. …We just eliminated all places and non-human objects from the solution set

Game Time

Twenty Questions: size of the problem?

N = ??? large, ~∞

The yes/no attack makes this a binary search problem…

So, what size of problem space can we effectively search with 20 questions? 2^20
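The arithmetic behind the 2^20 figure, as a small Python sketch: each yes/no answer can at best halve the candidate set, so k questions can separate at most 2^k = 1,048,576 objects when k = 20, and n objects need ceil(log2 n) questions in the worst case. The function name questions_needed is our own.

```python
def questions_needed(n: int) -> int:
    """Worst-case number of yes/no questions needed to single out one of n
    candidates when every answer halves the remaining set: ceil(log2 n)."""
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 1, using exact integers
    return (n - 1).bit_length()


print(2 ** 20)                        # 1048576
print(questions_needed(2 ** 20))      # 20
print(questions_needed(2 ** 20 + 1))  # 21: one more object costs one more question
```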

Game Time

Twenty Questions: something to think about…

N is conceivably much larger than 2^20

So, how is it that we can usually solve this problem in 20 steps or fewer, i.e. correctly identify the mystery object?

Dictionaries: Ordered Dictionaries

Suppose the items in a dictionary are ordered (sorted), e.g. from low to high key

Would that make a difference in terms of size(), isEmpty(), findElement(), insertItem(), removeElement()?

Dictionaries: Ordered Dictionaries

Suppose we implement an ordered dictionary as a linear data structure, or more specifically a vector

The items are stored in the vector in key order; we gain considerable efficiency because we can visit D[x], where x is a rank, in O(1) time

Could we achieve the same findElement() time if the ordered dictionary were implemented as a linked list?

Binary Search

Binary search performs operation findElement(k) on a dictionary implemented by means of an array-based sequence, sorted by key

similar to the high-low game

at each step, the number of candidate items is halved

terminates after O(log n) steps

Example: findElement(7)

(Figure: four snapshots of the sorted array 1 3 4 5 7 8 9 11 14 16 18 19, indexed from 0, in which the markers l (low), m (mid) and h (high) close in on key 7.)
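A Python sketch of the binary search described on this slide, assuming the sequence is a 0-indexed Python list sorted in increasing order and that the midpoint is rounded down (the slide's figure may round differently). The function name binary_search is ours; it also records each (l, m, h) probe so the narrowing can be followed step by step.

```python
from typing import List, Optional, Tuple


def binary_search(a: List[int], k: int) -> Tuple[Optional[int], List[Tuple[int, int, int]]]:
    """Return (index of k in sorted list a, or None; list of (l, m, h) probes)."""
    probes: List[Tuple[int, int, int]] = []
    l, h = 0, len(a) - 1
    while l <= h:                # the candidates are a[l..h]
        m = (l + h) // 2         # midpoint, rounded down (our assumption)
        probes.append((l, m, h))
        if k < a[m]:
            h = m - 1            # discard a[m..h]
        elif k == a[m]:
            return m, probes
        else:
            l = m + 1            # discard a[l..m]
    return None, probes          # NO_SUCH_KEY


A = [1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]
index, steps = binary_search(A, 7)
print(index)   # 4
print(steps)   # [(0, 5, 11), (0, 2, 4), (3, 3, 4), (4, 4, 4)]
```

With this rounding, findElement(7) also finishes in four probes, matching the four snapshots on the slide.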

Binary Search

Lookup tables are not very efficient for dynamic data (lots of insertItem and removeElement calls)

Lookup tables are efficient for dictionaries where the predominant access is findElement and inserts or removals are relatively rare, e.g. credit card authorizations, code translation tables,…

Method             Logfile   Lookup Table
findElement        O(n)      O(log n)
insertItem         O(1)      O(n)
removeElement      O(n)      O(n)
closestKeyBefore   O(n)      O(log n)
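A minimal sketch of a lookup table in Python, built on the standard bisect module. It assumes integer keys kept in a sorted list with a parallel list of elements; the class and method names are ours. The comments tie each operation to its entry in the table: binary search makes lookups O(log n), but list insertion and deletion still shift O(n) items. closest_key_before assumes the convention "largest key less than or equal to k".

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Optional


class LookupTable:
    """Hypothetical sketch of a lookup table: an ordered dictionary stored as
    two parallel Python lists kept sorted by key."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._elems: List[Any] = []

    def find_element(self, k: int) -> Optional[Any]:
        # O(log n): binary search for the leftmost position of k
        i = bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            return self._elems[i]
        return None  # NO_SUCH_KEY

    def insert_item(self, k: int, e: Any) -> None:
        # O(log n) to locate the slot, but O(n) to shift later items right
        i = bisect_right(self._keys, k)
        self._keys.insert(i, k)
        self._elems.insert(i, e)

    def remove_element(self, k: int) -> Optional[Any]:
        # O(log n) to locate the item, O(n) to shift later items left
        i = bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            del self._keys[i]
            return self._elems.pop(i)
        return None

    def closest_key_before(self, k: int) -> Optional[int]:
        # O(log n): largest key <= k (assumed convention), or None if none exists
        i = bisect_right(self._keys, k)
        return self._keys[i - 1] if i > 0 else None


table = LookupTable()
for key in [14, 3, 9, 1, 7]:
    table.insert_item(key, f"item{key}")
print(table.find_element(7))          # item7
print(table.closest_key_before(8))    # 7
print(table.closest_key_before(0))    # None
```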

Binary Search Tree

A binary tree for holding (k,e) items, such that…

each internal node v stores an element e with key k

keys of elements in the left subtree of v are <= the key of v

keys of elements in the right subtree of v are >= the key of v

external nodes store no elements… they are only placeholders (NULL_NODE)

Binary Search Tree

Every key in a node's left subtree is less than the node's key

Every key in a node's right subtree is greater than the node's key

All leaf (external) nodes hold no items

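(Figure: example binary search tree with root 58; 58's children are 31 and 90; 31's children are 25 and 42; 25 has left child 12; 42 has left child 36; 90 has left child 62; 62 has right child 75.)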

Search

Algorithm findElement(k, v)

if T.isExternal(v) return NO_SUCH_KEY

if k < key(v) return findElement(k, T.leftChild(v))

else if k = key(v) return element(v)

else { k > key(v) } return findElement(k, T.rightChild(v))

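(Figure: example search tree with root 6; 6's children are 2 and 9; 2's children are 1 and 4; 9 has left child 8.)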
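The findElement pseudocode translated into a runnable Python sketch. Assumptions (ours, not the slides'): None plays the role of an external placeholder node, and the helper insert_item exists only to build the example tree from this slide (root 6), sending equal keys to the right subtree.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """Internal node holding one (key, element) item; a child of None plays
    the role of an external placeholder node (NULL_NODE on the slides)."""
    key: int
    elem: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert_item(v: Optional[Node], k: int, e: Any) -> Node:
    # Hypothetical helper used only to build the example tree:
    # walk down to an external position and put a new internal node there.
    if v is None:
        return Node(k, e)
    if k < v.key:
        v.left = insert_item(v.left, k, e)
    else:  # equal keys go right, matching "keys in right subtree >= key(v)"
        v.right = insert_item(v.right, k, e)
    return v


def find_element(k: int, v: Optional[Node]) -> Optional[Any]:
    if v is None:                      # T.isExternal(v)
        return None                    # NO_SUCH_KEY
    if k < v.key:
        return find_element(k, v.left)
    if k == v.key:
        return v.elem
    return find_element(k, v.right)    # k > key(v)


root = None
for key in [6, 2, 9, 1, 4, 8]:         # the example tree: root 6
    root = insert_item(root, key, f"item{key}")
print(find_element(4, root))           # item4 (path 6 -> 2 -> 4)
print(find_element(5, root))           # None (ends at 4's external right child)
```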

removeElement(k) – simple case

To perform operation removeElement(k), we search for key k

Assume key k is in the tree, and let v be the node storing k

If node v has a leaf (external) child w, we remove v and w from the tree with operation removeAboveExternal(w), which puts v's other child in v's place

Example: remove 4

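(Figure, before: root 6 with children 2 and 9; 2 has children 1 and 4 (node v); 4 has an external left child w and right child 5; 9 has left child 8. After: 5 has taken 4's place as the right child of 2.)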

removeElement(k) – more complicated case

We consider the case where the key k to be removed is stored at a node v whose children are both internal:

we find the internal node w that follows v in an inorder traversal (the leftmost internal node of v's right subtree)

we copy key(w) into node v

we remove node w and its left child z (which must be a leaf, since w is leftmost) by means of operation removeAboveExternal(z)

Example: remove 3

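(Figure, before: root 3 (node v) with children 1 and 8; 1 has right child 2; 8 has children 6 and 9; 6 has left child 5 (node w), whose left child is the leaf z. After: key 5 has been copied into the root v, and w and z have been removed.)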
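A self-contained Python sketch of removeElement covering both cases on these slides. Assumptions (ours): None stands for an external node, the tree is built with a plain BST insert, and "removeAboveExternal" is carried out by splicing pointers directly rather than through a separate tree ADT. The function returns the possibly new root along with the removed element.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    key: int
    elem: Any
    left: Optional["Node"] = None    # None = external placeholder node
    right: Optional["Node"] = None


def insert_item(v: Optional[Node], k: int, e: Any) -> Node:
    if v is None:
        return Node(k, e)
    if k < v.key:
        v.left = insert_item(v.left, k, e)
    else:
        v.right = insert_item(v.right, k, e)
    return v


def inorder(v: Optional[Node]) -> List[int]:
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)


def remove_element(root: Optional[Node], k: int) -> Tuple[Optional[Node], Optional[Any]]:
    """Remove key k; return (possibly new root, removed element or None)."""
    parent, v = None, root
    while v is not None and v.key != k:          # ordinary search for k
        parent, v = v, (v.left if k < v.key else v.right)
    if v is None:
        return root, None                        # NO_SUCH_KEY
    removed = v.elem
    if v.left is not None and v.right is not None:
        # Harder case: both children internal. w = inorder successor of v,
        # i.e. the leftmost internal node in v's right subtree.
        w_parent, w = v, v.right
        while w.left is not None:
            w_parent, w = w, w.left
        v.key, v.elem = w.key, w.elem            # copy w's item into v
        # w's left child z is external, so removeAboveExternal(z) splices
        # w out and lifts w's right subtree into w's place.
        if w_parent is v:
            w_parent.right = w.right
        else:
            w_parent.left = w.right
        return root, removed
    # Simple case: v has an external child w; removeAboveExternal(w) replaces
    # v with v's other child (which may itself be external).
    child = v.left if v.left is not None else v.right
    if parent is None:
        return child, removed
    if parent.left is v:
        parent.left = child
    else:
        parent.right = child
    return root, removed


# "remove 3": the root has two internal children, so successor 5 moves up
root = None
for key in [3, 1, 8, 2, 6, 9, 5]:
    root = insert_item(root, key, f"item{key}")
root, gone = remove_element(root, 3)
print(gone, root.key, inorder(root))    # item3 5 [1, 2, 5, 6, 8, 9]

# "remove 4": node 4 has an external left child, so its right child 5 moves up
root2 = None
for key in [6, 2, 9, 1, 4, 8, 5]:
    root2 = insert_item(root2, key, f"item{key}")
root2, gone2 = remove_element(root2, 4)
print(gone2, root2.left.right.key, inorder(root2))  # item4 5 [1, 2, 5, 6, 8, 9]
```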

Binary Search Tree Performance

Consider a dictionary with n items implemented by means of a binary search tree of height h:

the space used is O(n)

methods findElement, insertItem and removeElement take O(h) time

The height h is O(n) in the worst case and O(log n) in the best case

Balanced Trees

When a path in a tree gets very long relative to the other paths in the tree… the tree is unbalanced

In fact, in its extreme form an unbalanced tree is a linear list

So, to achieve optimal performance… you need to keep the tree balanced
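A small Python experiment illustrating the O(n) worst case and O(log n) best case: the same 255 keys produce a linear list when inserted in sorted order, and a tree of height 8 when each subrange's middle key is inserted first. The helpers (insert, height, build, middle_first) are our own; height follows the slides' definition, with external nodes (None) at height 0.

```python
from typing import Iterable, Iterator, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], k: int) -> Node:
    # plain (unbalanced) BST insertion, written iteratively so that a very
    # long path does not run into Python's recursion limit
    new = Node(k)
    if root is None:
        return new
    v = root
    while True:
        if k < v.key:
            if v.left is None:
                v.left = new
                return root
            v = v.left
        else:
            if v.right is None:
                v.right = new
                return root
            v = v.right


def height(v: Optional[Node]) -> int:
    # longest path from v to an external node; external (None) has height 0
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def build(keys: Iterable[int]) -> Optional[Node]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root


def middle_first(lo: int, hi: int) -> Iterator[int]:
    # hypothetical helper: yields lo..hi so that each subrange's middle key
    # comes first, which keeps a plain BST as balanced as possible
    if lo > hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from middle_first(lo, mid - 1)
    yield from middle_first(mid + 1, hi)


n = 255
print(height(build(range(1, n + 1))))     # 255: sorted input degrades to a linear list
print(height(build(middle_first(1, n))))  # 8: log2(n + 1), the best possible
```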

AVL Trees

We want to maintain a balanced tree. Recall:

height of a node v = length of the longest path from v to an external node

Height-Balance Property: for every internal node v, the heights of the children of v can differ by at most 1

AVL Trees

Balance Factor: BF(v) = h(right_subtree) - h(left_subtree)

Balanced: |BF(v)| ∈ {0, 1}, i.e. BF(v) ∈ {-1, 0, 1}

A tree with any node whose Balance Factor ∉ {-1, 0, 1} is unbalanced and must be rebalanced

The Balance Factor exists for every node v except (trivially) external nodes
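A Python sketch that computes BF(v) = h(right_subtree) - h(left_subtree) exactly as defined above and lists the nodes that violate the Height-Balance Property. The example tree is our reading of the earlier binary search tree figure (root 58), so treat its exact shape as an assumption; under that reading, the tree is a valid BST but not an AVL tree.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key = key
        self.left = left
        self.right = right


def height(v: Optional[Node]) -> int:
    # longest path from v down to an external node; external (None) -> 0
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def balance_factor(v: Node) -> int:
    # BF(v) = h(right subtree) - h(left subtree), defined for internal nodes only
    return height(v.right) - height(v.left)


def unbalanced_nodes(v: Optional[Node]) -> List[Tuple[int, int]]:
    """(key, BF) for every internal node whose BF is not in {-1, 0, 1}."""
    if v is None:
        return []
    bf = balance_factor(v)
    here = [(v.key, bf)] if abs(bf) > 1 else []
    return here + unbalanced_nodes(v.left) + unbalanced_nodes(v.right)


# Our reading of the earlier BST figure: 90's only child is 62,
# and 62's only child is 75.
tree = Node(58,
            Node(31, Node(25, Node(12)), Node(42, Node(36))),
            Node(90, Node(62, None, Node(75))))
print(balance_factor(tree))      # 0
print(unbalanced_nodes(tree))    # [(90, -2)]
```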

AVL Trees

If every Balance Factor is -1, 0 or 1, the tree is balanced and does not need to be restructured

If a Balance Factor is -2 or 2, the tree is unbalanced and needs to be restructured

Restructuring is done by a process called rotation

AVL Trees

Rotation: four types, which form two symmetric pairs

Left Single Rotation, Right Single Rotation, Left Double Rotation, Right Double Rotation

Since each left rotation mirrors the corresponding right rotation, we only consider single and double rotation
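A minimal Python sketch of single and double rotation on bare nodes. The function names rotate_left, rotate_right and double_rotate_right_left are ours, and texts differ on whether a given rotation is called "left" or "right"; here rotate_left lifts a node's right child. Both examples start with BF = 2 at key 1: a straight right-right chain needs one single rotation, while a right-left zigzag needs a double rotation. Both end with 2 at the root.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def rotate_left(x: Node) -> Node:
    # x's right child y moves up; y's left subtree becomes x's right subtree
    y = x.right
    x.right = y.left
    y.left = x
    return y            # y is the new root of this subtree


def rotate_right(x: Node) -> Node:
    # mirror image of rotate_left
    y = x.left
    x.left = y.right
    y.right = x
    return y


def double_rotate_right_left(x: Node) -> Node:
    # right rotation at x's right child, then left rotation at x
    x.right = rotate_right(x.right)
    return rotate_left(x)


def preorder(v: Optional[Node]) -> List[int]:
    return [] if v is None else [v.key] + preorder(v.left) + preorder(v.right)


# 1 -> right 2 -> right 3: BF(1) = 2, fixed by one single rotation
single = rotate_left(Node(1, None, Node(2, None, Node(3))))
# 1 -> right 3 -> left 2: BF(1) = 2 and BF(3) = -1, so a double rotation is needed
double = double_rotate_right_left(Node(1, None, Node(3, Node(2))))
print(preorder(single), preorder(double))   # [2, 1, 3] [2, 1, 3]
```

Each rotation only relinks a constant number of pointers, which is why a rotation costs O(1).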

AVL Trees

Rotation if BF = 2

AVL Trees

Binary search trees that maintain the Height-Balance Property are called AVL trees

The name comes from the inventors, G.M. Adelson-Velsky and E.M. Landis, who described them in the paper "An Algorithm for the Organization of Information"

AVL Trees

(Figure: an unbalanced tree next to a balanced tree. From: http://en.wikipedia.org/wiki/AVL_trees)

AVL Trees

Balance Factor (BF) = h(right_subtree) - h(left_subtree)

If BF ∈ {-1, 0, 1}, the tree is balanced (do nothing)

If BF ∉ {-1, 0, 1}, the tree is unbalanced (it must be restructured)

Restructuring is done by rotation

From: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees

Rotation: four cases, but they form symmetric pairs

left single rotation, right single rotation, left double rotation, right double rotation

Since the pairs are symmetric, we only examine single and double rotation

From: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion Rotation

If BF > 2, the imbalance occurred further down in the right subtree: recursively walk down that subtree until |BF| = 2

If BF < -2, the imbalance occurred further down in the left subtree: recursively walk down that subtree until |BF| = 2

From: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion Rotation

If BF = 2, the imbalance occurred in the right subtree: recursively walk down that subtree until |BF| = 2

If BF = -2, the imbalance occurred in the left subtree: recursively walk down that subtree until |BF| = 2

From: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion Rotation

If BF = 2, the imbalance occurred in the right subtree: step down into that subtree to find where the insertion occurred

If BF = -2, the imbalance occurred in the left subtree: step down into that subtree to find where the insertion occurred

From: http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion

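Rotation

For BF = 2 at the unbalanced node (the BF = -2 case is the mirror image):

if BF at the root of its right subtree = 1, the insertion occurred on that subtree's right side, and a single rotation is required

if BF at the root of its right subtree = -1, the insertion occurred on that subtree's left side, and a double rotation is required

From: http://en.wikipedia.org/wiki/AVL_trees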
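A compact Python sketch of AVL insertion that applies the rule above. Assumptions (ours): each node caches its height, BF uses the slides' sign convention h(right) - h(left), and rebalancing happens on the way back up a recursive BST insert. This follows the common Wikipedia-style formulation and is not necessarily the exact procedure in the textbook; names such as rebalance are hypothetical.

```python
from typing import Optional


class Node:
    __slots__ = ("key", "left", "right", "height")

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1  # a new internal node has two external children


def h(v: Optional[Node]) -> int:
    return v.height if v is not None else 0  # external nodes have height 0


def update(v: Node) -> None:
    v.height = 1 + max(h(v.left), h(v.right))


def bf(v: Node) -> int:
    return h(v.right) - h(v.left)  # same sign convention as the slides


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right = y.left
    y.left = x
    update(x)  # x is now below y, so recompute x first
    update(y)
    return y


def rotate_right(x: Node) -> Node:
    y = x.left
    x.left = y.right
    y.right = x
    update(x)
    update(y)
    return y


def rebalance(v: Node) -> Node:
    update(v)
    if bf(v) == 2:                     # right subtree too tall
        if bf(v.right) == -1:          # insertion went into right child's left side
            v.right = rotate_right(v.right)  # first half of a double rotation
        return rotate_left(v)          # single rotation (or second half of double)
    if bf(v) == -2:                    # left subtree too tall: mirror image
        if bf(v.left) == 1:
            v.left = rotate_left(v.left)
        return rotate_right(v)
    return v                           # BF in {-1, 0, 1}: nothing to do


def insert_item(v: Optional[Node], k: int) -> Node:
    # ordinary BST insertion, then rebalance each node on the way back up
    if v is None:
        return Node(k)
    if k < v.key:
        v.left = insert_item(v.left, k)
    else:
        v.right = insert_item(v.right, k)
    return rebalance(v)


def is_avl(v: Optional[Node]) -> bool:
    # every internal node satisfies the Height-Balance Property
    return v is None or (abs(bf(v)) <= 1 and is_avl(v.left) and is_avl(v.right))


root = None
for key in range(1, 1024):     # sorted input: the worst case for a plain BST
    root = insert_item(root, key)
print(is_avl(root), h(root))   # balanced, and the height stays logarithmic
```

A plain BST built from the same sorted keys would have height 1023; here the height stays within the AVL bound of roughly 1.44 log2 n.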

AVL Trees - Insertion

Rotation: see http://en.wikipedia.org/wiki/AVL_trees

AVL Trees - Insertion

Performance

rotations: O(1)

recall that h(T) is maintained at O(log n)

insertItem: O(log n)

balanced tree: priceless

From: http://en.wikipedia.org/wiki/AVL_trees

Bounded-Depth Search Trees

Search efficiency in a tree is related to the depth of the tree

We can use depth-bounded trees to create ordered dictionaries whose search and update operations run in O(log n) time

Multi-way Search Trees

Remember binary search trees: any node v can have at most 2 children. What if we get rid of that rule?

Suppose a node could have multiple children (more than 2)

Terminology: if v has d children, v is a d-node

Multi-way Search Trees

Multi-way search tree T:

each internal node must have at least two children, i.e. an internal node is a d-node with d ≥ 2

internal nodes store collections of items (k,e)

each d-node stores d-1 items

special keys k0 = -∞ and kd = +∞ bound the keys of the first and last child subtrees

external nodes are only placeholders
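A Python sketch of searching a multi-way search tree. Assumptions (ours): a d-node is a hypothetical DNode holding d-1 sorted keys, their elements and d children, with None as an external placeholder. At each node a binary search over its keys (via the standard bisect module) either finds k or picks the one child whose key range can contain k. The sentinels k0 = -∞ and kd = +∞ are implicit: they correspond to falling off either end of the key list.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DNode:
    """Hypothetical internal d-node: d-1 sorted keys with their elements and d
    children. A child of None stands for an external placeholder node."""
    keys: List[int]
    elems: List[Any]
    children: List[Optional["DNode"]]

    def __post_init__(self) -> None:
        # a d-node stores exactly d-1 items
        assert len(self.children) == len(self.keys) + 1


def find_element(k: int, v: Optional[DNode]) -> Optional[Any]:
    while v is not None:
        i = bisect_left(v.keys, k)   # number of keys in v smaller than k
        if i < len(v.keys) and v.keys[i] == k:
            return v.elems[i]
        # every key in v.keys[:i] is < k and every key in v.keys[i:] is > k,
        # so k can only be in the subtree between them: children[i]
        # (i == 0 and i == len(keys) play the roles of k0 = -inf and kd = +inf)
        v = v.children[i]
    return None  # reached an external node: NO_SUCH_KEY


def leaf(*keys: int) -> DNode:
    # hypothetical helper: a d-node whose children are all external
    return DNode(list(keys), [f"item{k}" for k in keys], [None] * (len(keys) + 1))


# a 3-node root (keys 10, 20) over a 3-node, a 2-node and a 3-node
root = DNode([10, 20], ["item10", "item20"], [leaf(3, 7), leaf(15), leaf(25, 30)])
print(find_element(15, root), find_element(20, root), find_element(12, root))
# item15 item20 None
```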