
Page 1: Tonga Institute of Higher Education Design and Analysis of Algorithms

Tonga Institute of Higher Education

Design and Analysis of Algorithms

IT 254

Lecture 4:

Data Structures

Page 2: Tonga Institute of Higher Education Design and Analysis of Algorithms

Data Structures

• Data structures are important for many reasons in computer science. The way data is stored can determine how fast you find it and what you can do with it

• The data structures we will focus on are hash tables, binary search trees and Red-Black trees

• These data structures have a special name:
  – Dynamic Sets

• They are all able to do the following operations:
  – Search(S, k), FindMin(S), FindMax(S), Insert(S, x), Remove(S, x)
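
In code, a dynamic set is just an interface exposing those five operations. A rough C++ sketch (hypothetical names, not from the lecture):

  struct Node;                            // node type, sketched on the next slide

  struct DynamicSet {
      virtual Node* search(int k)   = 0;  // Search(S, k)
      virtual Node* findMin()       = 0;  // FindMin(S)
      virtual Node* findMax()       = 0;  // FindMax(S)
      virtual void  insert(Node* x) = 0;  // Insert(S, x)
      virtual void  remove(Node* x) = 0;  // Remove(S, x)
      virtual ~DynamicSet() {}
  };

BSTs, red-black trees and hash tables are then three different ways of implementing this same interface.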

Page 3: Tonga Institute of Higher Education Design and Analysis of Algorithms

Binary Search Trees

• Binary Search Trees (BSTs) are an important data structure in dynamic sets

• Each piece of data in the tree will contain:
  – key: an identifying field that allows ordering
  – left: pointer to a left child (may be NULL)
  – right: pointer to a right child (may be NULL)
  – p: pointer to a parent node (NULL for root)
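
As a concrete illustration, one possible C++ declaration of such a node (a sketch, not code from the lecture):

  struct Node {
      int   key;      // identifying field that allows ordering
      Node* left;     // left child, may be nullptr
      Node* right;    // right child, may be nullptr
      Node* parent;   // parent node, nullptr for the root
  };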

Page 4: Tonga Institute of Higher Education Design and Analysis of Algorithms

Binary Search Trees

• BST property: key[left(x)] ≤ key[x] ≤ key[right(x)]

• Example:

        15
       /  \
      11    16
     /  \     \
    9    13    17

Page 5: Tonga Institute of Higher Education Design and Analysis of Algorithms

Inorder TreeWalk

• What does the following code do?

  TreeWalk(x)
    if (x != NULL)
      TreeWalk(left[x])
      print(x)
      TreeWalk(right[x])

• Answer: prints elements in sorted (increasing) order

• This is called an inorder tree walk
  – Preorder tree walk: print root, then left, then right
  – Postorder tree walk: print left, then right, then root
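
In C++ the three walks are almost identical. A minimal sketch (hypothetical functions, with the NULL check written out):

  #include <cstdio>

  struct Node { int key; Node *left, *right; };

  void inorderWalk(Node* x)   { if (x) { inorderWalk(x->left);   std::printf("%d ", x->key); inorderWalk(x->right); } }
  void preorderWalk(Node* x)  { if (x) { std::printf("%d ", x->key); preorderWalk(x->left);  preorderWalk(x->right); } }
  void postorderWalk(Node* x) { if (x) { postorderWalk(x->left); postorderWalk(x->right); std::printf("%d ", x->key); } }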

Page 6: Tonga Institute of Higher Education Design and Analysis of Algorithms

Searching a BST

• We can search for an item in a tree by using a key and a pointer to a node

• The search will return an element with that key or NULL if not found:

  TreeSearch(x, k)
    if (x == NULL or k == key[x])
      return x
    if (k < key[x])
      return TreeSearch(left[x], k)
    else
      return TreeSearch(right[x], k)

Page 7: Tonga Institute of Higher Education Design and Analysis of Algorithms

More BST Searching

• Here’s another function that does the same:

  TreeSearch(x, k)
    while (x != NULL and k != key[x])
      if (k < key[x])
        x = left[x]
      else
        x = right[x]
    return x

• Which of these two functions is more efficient?
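
Both versions make the same comparisons, so the iterative one is usually a little more efficient in practice because it avoids the overhead (and stack growth) of the recursive calls. A C++ translation of the iterative version, written against the hypothetical node struct sketched earlier:

  struct Node { int key; Node *left, *right, *parent; };

  Node* treeSearch(Node* x, int k) {
      while (x != nullptr && k != x->key)
          x = (k < x->key) ? x->left : x->right;
      return x;                      // the matching node, or nullptr if absent
  }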

Page 8: Tonga Institute of Higher Education Design and Analysis of Algorithms

BST Insert()

• To insert an element into the tree, we just need to make sure that after inserting the tree still holds the BST property

• How can we insert and maintain the property?
  – Use the search algorithm from the Searching slides
  – Because we will not find the item, we will reach a NULL value. Insert the item where the NULL is
  – Use a “trailing pointer” to keep track of where you came from

Page 9: Tonga Institute of Higher Education Design and Analysis of Algorithms

BST Insert

• Like searching, TreeInsert begins at the root and goes downward.

• The pointer x traces the path and pointer y is the parent of x

• Lines 3-9 move x and y down the tree using the BST property, until x is NULL

• The NULL is the place we want to put the item "z"

• Lines 10-16 set the pointer for item "z"

  0: TreeInsert(Tree, z)
  1:   y = NULL
  2:   x = root[Tree]
  3:   while (x != NULL) {
  4:     y = x
  5:     if z->key < x->key
  6:       x = x->left
  7:     else
  8:       x = x->right
  9:   }
  10:  z->parent = y
  11:  if y == NULL
  12:    root[Tree] = z
  13:  else if z->key < y->key
  14:    y->left = z
  15:  else
  16:    y->right = z

Page 10: Tonga Institute of Higher Education Design and Analysis of Algorithms

Running Time of Search/Insert

• What is the running time of TreeSearch() or TreeInsert()?
  – Answer: O(h), where h = height of tree

• What is the height of a binary search tree?
  – Answer: worst case: h = O(n), when the tree is just a linear string of left or right children
  – Most of the time, 2^h ≈ number of items, so if there are 64 items, 2^h = 64 and h = 6

• For now, we'll just state running times in terms of "h"
• Later we’ll see how to make sure h is always O(lg n)

Page 11: Tonga Institute of Higher Education Design and Analysis of Algorithms

Sorting with BSTs

• An algorithm for sorting an array A of length n:

  BSTSort(A)
    for i = 1 to n
      TreeInsert(A[i])
    InorderTreeWalk(root)

• What is the running time of this?
  – Worst case?
  – Average case?
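
To make the algorithm concrete, here is a small self-contained C++ program (a sketch with hypothetical names, not the lecture's code) that sorts the array used on the next slide:

  #include <cstdio>

  struct Node {
      int key;
      Node* left;
      Node* right;
      Node(int k) : key(k), left(nullptr), right(nullptr) {}
  };

  Node* insert(Node* root, int k) {            // ordinary BST insert, O(h) per call
      if (root == nullptr) return new Node(k);
      if (k < root->key) root->left  = insert(root->left, k);
      else               root->right = insert(root->right, k);
      return root;
  }

  void inorder(Node* x) {                      // prints keys in sorted order
      if (x == nullptr) return;
      inorder(x->left);
      std::printf("%d ", x->key);
      inorder(x->right);
  }

  int main() {
      int a[] = {3, 1, 8, 2, 6, 7, 5};
      Node* root = nullptr;
      for (int k : a) root = insert(root, k);
      inorder(root);                           // prints: 1 2 3 5 6 7 8
  }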

Page 12: Tonga Institute of Higher Education Design and Analysis of Algorithms

Sorting with BSTs

• Average case analysis
  – It’s a form of quicksort!

[Diagram: the array 3 1 8 2 6 7 5 is inserted one key at a time. The first key, 3, splits the remaining keys into {1, 2} and {8, 6, 7, 5}, exactly the partition a quicksort pivot would make; 1 and 8 then split their own groups the same way, producing the finished BST.]

Page 13: Tonga Institute of Higher Education Design and Analysis of Algorithms

Sorting BSTs

• The same partitions are done as with quicksort, but in a different order
  – In the previous example:
    • Everything was compared to 3 once
    • Then those items < 3 were compared to 1 once
    • Etc.
  – These are basically the same comparisons that quicksort makes, just in a different order
  – Since it uses the same idea as quicksort, BST sort is also O(n lg n) on average

Page 14: Tonga Institute of Higher Education Design and Analysis of Algorithms

Deletion in BST

• Deletion is a bit tricky
• 3 cases:
  – x has no children: remove x
  – x has one child: take out x, move the child into x’s spot
  – x has two children: swap x with its successor, then perform case 1 or 2 to delete it

        F
       / \
      B   H
     / \    \
    A   D    K
       /
      C

Example: delete K, or H, or B
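
The three cases might look like this as a recursive C++ sketch (hypothetical code; parent pointers are omitted for brevity, and this is not the lecture's own implementation):

  struct Node { int key; Node *left, *right; };

  Node* minNode(Node* x) {              // leftmost node of a subtree
      while (x->left) x = x->left;
      return x;
  }

  Node* erase(Node* x, int k) {         // returns the new root of this subtree
      if (x == nullptr) return nullptr;
      if (k < x->key)      x->left  = erase(x->left, k);
      else if (k > x->key) x->right = erase(x->right, k);
      else {
          if (!x->left && !x->right) { delete x; return nullptr; }     // case 1: no children
          if (!x->left)  { Node* r = x->right; delete x; return r; }   // case 2: one child
          if (!x->right) { Node* l = x->left;  delete x; return l; }
          Node* s = minNode(x->right);  // case 3: two children, swap key with successor...
          x->key = s->key;
          x->right = erase(x->right, s->key);  // ...then delete the successor (case 1 or 2)
      }
      return x;
  }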

Page 15: Tonga Institute of Higher Education Design and Analysis of Algorithms

Red-Black Trees

• Red-Black trees
  – A special form of the binary tree
  – The difference is that the RB tree has a property that says the tree will always be "balanced"
  – Balanced means that all operations will happen in O(lg n) time in the worst case
  – Nodes in the tree get different colors
  – Using the colors to help organize the tree, we can make sure the height of the tree is always O(lg n)

Page 16: Tonga Institute of Higher Education Design and Analysis of Algorithms

Red-Black Trees

• Our goals for Red-Black Trees
  – First: describe the properties of red-black trees
  – Then: prove that the properties guarantee that h = O(lg n)
  – Finally: describe operations that are done with red-black trees

Page 17: Tonga Institute of Higher Education Design and Analysis of Algorithms

Red Black Properties

1. Every node is either red or black
2. Every leaf (NULL pointer) is black
   • Note: this means every "real" node has 2 children
3. If a node is red, both children are black
   • Note: can't have 2 consecutive red nodes on a path
4. Every path from a node to a descendant leaf contains the same number of black nodes
5. The root is always black

The black-height bh(x): the number of black nodes on a path from x down to a leaf.
Now every node has 5 fields: (color, key, left, right and parent)
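
One possible C++ declaration of such a node (a sketch, not the lecture's code):

  enum Color { RED, BLACK };

  struct RBNode {
      Color   color;
      int     key;
      RBNode* left;
      RBNode* right;
      RBNode* parent;
  };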

Page 18: Tonga Institute of Higher Education Design and Analysis of Algorithms

Height of Red-Black trees

• Theorem: A red-black tree with "n" nodes has height that is at most 2 lg(n + 1)

• This is a good thing to know. It says that our operations will never have to go farther down a tree than 2*lg n, which is O(lg n).

• We will try to prove this is true by using induction.

Page 19: Tonga Institute of Higher Education Design and Analysis of Algorithms

Proving Red-Black Height

• First we will show that the subtree rooted at any node "x" contains at least 2^bh(x) − 1 internal nodes.

• To prove this we will use induction:
  – Base case: if the height of x is 0, then x is a leaf, and the number of internal nodes below it is 2^0 − 1 = 0, which is true (there are no nodes below x if its height is 0)

Page 20: Tonga Institute of Higher Education Design and Analysis of Algorithms

Proving Red-Black Height

• Inductive case:
  – Suppose there is a node x that has a positive height and 2 children
  – Each child has a black-height of either bh(x) or bh(x) − 1, depending on whether it is red or black
  – The height of a child of x must be less than the height of x itself, so we can use induction to say:
    • The number of internal nodes in each child's subtree is at least 2^(bh(x)−1) − 1
    • So both children plus "x" give at least (2^(bh(x)−1) − 1) + (2^(bh(x)−1) − 1) + 1 = 2^bh(x) − 1
  – This proves the claim about how many nodes there are

Page 21: Tonga Institute of Higher Education Design and Analysis of Algorithms

Proving Red-Black Height

• Now we are ready to prove that the height is at most 2 lg(n+1)
  – Property 3 says at least half the nodes on any path must be black. So the black-height of any path must be at least h/2
  – So we can now say: n ≥ 2^(h/2) − 1 (because we plug in h/2 for bh(x) from the last slide)
  – This gives:
    • n + 1 ≥ 2^(h/2)
    • lg(n+1) ≥ h/2
    • 2 lg(n+1) ≥ h
  – This is what we wanted to prove: the height is always O(lg n)
  – All our operations (like search, insert, delete...) will now be guaranteed to be O(lg n) as well
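
As a worked number (not from the slides): with n = 1,000,000 keys the theorem gives

  h \le 2\lg(n+1) = 2\lg(1{,}000{,}001) \approx 39.9

so a search in a red-black tree holding a million keys never follows more than about 40 pointers.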

Page 22: Tonga Institute of Higher Education Design and Analysis of Algorithms

Red-Black Trees Example

• Example Red-Black tree
• Is it colored correctly?

[Diagram: two colorings of the same small tree containing the keys 5, 7, 9 and 12.]

Red-black properties:
1. Every node is either red or black
2. Every leaf (NULL pointer) is black
3. If a node is red, both children are black
4. Every path from a node to a descendant leaf contains the same number of black nodes
5. The root is always black

Page 23: Tonga Institute of Higher Education Design and Analysis of Algorithms

Inserting in Red-Black Trees

• Insert 8
  – Where does it go?
  – What color is it?

[Diagram: the red-black tree before the insert, containing the keys 5, 7, 9 and 12.]


Page 24: Tonga Institute of Higher Education Design and Analysis of Algorithms

Inserting in Red Black

• Insert 8

[Diagram: the tree after inserting 8, now containing the keys 5, 7, 8, 9 and 12.]

Page 25: Tonga Institute of Higher Education Design and Analysis of Algorithms

Insert in Red Black

• How about inserting 11?
• What color?
  – It can’t be black!
  – It can’t be red!


[Diagram: the tree after inserting 11, containing the keys 5, 7, 8, 9, 11 and 12.]

Page 26: Tonga Institute of Higher Education Design and Analysis of Algorithms

Insert in Red-Black

• Insert 11
  – Where does it go?
  – What color?
  – Solution: change the colors in the tree


[Diagram: the tree after recoloring, containing the keys 5, 7, 8, 9, 11 and 12.]

Page 27: Tonga Institute of Higher Education Design and Analysis of Algorithms

Inserting Problems

• Insert 10
  – Where does it go?
  – What color?


[Diagram: the tree after inserting 10, containing the keys 5, 7, 8, 9, 10, 11 and 12.]

Page 28: Tonga Institute of Higher Education Design and Analysis of Algorithms

Insertion Problems

• Insert 10
  – Where does it go?
  – What color?
• Answer: no color!
• The tree is too imbalanced
• We must change the tree structure to allow re-coloring
  – Goal: change the tree in O(lg n) time so that all properties hold

[Diagram: the same imbalanced tree, containing the keys 5, 7, 8, 9, 10, 11 and 12.]

Page 29: Tonga Institute of Higher Education Design and Analysis of Algorithms

Red Black Rotation

• Our basic operation for changing the tree is called rotation
• Trees are rotated by changing the pointers in the tree

        y                                  x
       / \        rightRotate(y)         / \
      x   C      --------------->       A   y
     / \         <---------------           / \
    A   B          leftRotate(x)           B   C

Page 30: Tonga Institute of Higher Education Design and Analysis of Algorithms

Rotation

• What does rightRotate() do?
• Answer: it changes a lot of pointers
  – x keeps its left child
  – y keeps its right child
  – x’s right child becomes y’s left child
  – x’s and y’s parents change
• What is the running time? O(1)


Page 31: Tonga Institute of Higher Education Design and Analysis of Algorithms

Rotation Example

• Goal: To rotate left about 9:

[Diagram: the tree before the rotation, containing the keys 5, 7, 8, 9, 11 and 12, with 9 the node to rotate about.]

Page 32: Tonga Institute of Higher Education Design and Analysis of Algorithms

Rotation Example

• Rotate left around 9

[Diagram: the tree after rotating left around 9; the same keys 5, 7, 8, 9, 11 and 12, now arranged in a more balanced shape.]

Page 33: Tonga Institute of Higher Education Design and Analysis of Algorithms

Rotation Code: Left Rotate Example

  LeftRotate(Tree, x)
    y = x->right                  // we only know x at the start, so get y
    x->right = y->left            // start the rotating
    if y->left != NULL            // if B is not null ...
      y->left->parent = x         // ... switch the parent of B to be x
    y->parent = x->parent         // change parents
    if x->parent == NULL          // if x was the root node ...
      root[Tree] = y              // ... change the root node to be y
    else if x == x->parent->left  // if x was on the left side ...
      x->parent->left = y         // ... make y the left child of the parent
    else
      x->parent->right = y        // else make y the right child of the parent
    y->left = x                   // now x is the child of y
    x->parent = y                 // now x's parent is y

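RightRotate is the mirror image. A C++ sketch (hypothetical code, written against a node struct with key/left/right/parent and passing the root by reference instead of using root[Tree]):

  struct Node { int key; Node *left, *right, *parent; };

  void rightRotate(Node*& root, Node* y) {
      Node* x = y->left;                   // x will move up, y will move down
      y->left = x->right;                  // subtree B becomes y's left child
      if (x->right != nullptr) x->right->parent = y;
      x->parent = y->parent;               // x takes over y's position
      if (y->parent == nullptr)            root = x;
      else if (y == y->parent->left)       y->parent->left = x;
      else                                 y->parent->right = x;
      x->right = y;                        // finally, y becomes x's right child
      y->parent = x;
  }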

Page 34: Tonga Institute of Higher Education Design and Analysis of Algorithms

Red Black Insertion

• The idea of insertion is to:
  – Insert x into the tree, color x red
  – The only red-black property that might be violated is the one that says a red node's children must be black: if x's parent is also red, that property now fails
  – If so, move the violation up the tree until a place is found where it can be fixed
  – Total time will be O(lg n)
  – Actually writing the code for this is complicated because there are many cases we must be careful of (for example, the fix depends on the color of x's "uncle" and on whether x is a left or right child)
  – Deletion is also O(lg n), and is also quite complicated

Page 35: Tonga Institute of Higher Education Design and Analysis of Algorithms

Hashing and Hash Tables

• Sometimes, people just need a few operations: Insert, Search, Delete. A hash table is a good solution, because searching is usually O(1)
• At its most basic level, a hash table is a data structure like an array. Data is stored in the array at specific indexes. The indexes are chosen by a hash function
• A hash function is a mapping between the input set of data and a set of integers
• This allows for getting the data in O(1) time
• You just take the thing you are looking for, apply the hash function, and see if it is at the right place in the array

Page 36: Tonga Institute of Higher Education Design and Analysis of Algorithms

Hashing and Hash Tables

• Definition: a hash function will take a value and convert it to an integer, so that it can be put in an array
• With hash tables, there is always the chance that two elements will hash to the same integer value
• This is called a collision, and it needs special handling to make sure that neither element gets lost

Page 37: Tonga Institute of Higher Education Design and Analysis of Algorithms

What can we do with hash tables?

• One idea: a dictionary
  – Goal: we want to be able to look up any word in a dictionary very fast
  – Problem: if the words are in an array, we must go through each word before we find the correct one. We cannot use trees because words cannot be sorted very well (most of the time)
  – Solution: we can use a hash table to map words to integers and store them at a place in an array
  – We choose a hash function that changes letters to numbers
  – Then when we need to look something up, we use the hash function, check that place in the array and see if it exists

Page 38: Tonga Institute of Higher Education Design and Analysis of Algorithms

Hash Tables

• To use mathematical language:
  – Given a table T and a record x, with a key and any other data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
  – We want these to be fast, but don’t care about sorting the records
• The structure we can use is a hash table
  – It supports all the above in O(1) expected time!

Page 39: Tonga Institute of Higher Education Design and Analysis of Algorithms

A Simple Hash: Direct Addressing

• Suppose:
  – The range of keys is 0..m
  – And all keys are different
• The idea:
  – Set up an array T[0..m] in which:
    • T[i] = x if x ∈ T and hash-key[x] = i
    • T[i] = NULL otherwise
  – This is called a direct-address table
    • Operations take O(1) time!
    • So what’s the problem?
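
A direct-address table is almost no code at all; a C++ sketch (hypothetical names):

  #include <vector>

  struct Item { int key; /* other data */ };

  struct DirectAddressTable {
      std::vector<Item*> slot;                              // one slot per possible key
      explicit DirectAddressTable(int m) : slot(m + 1, nullptr) {}
      Item* search(int k)   { return slot[k]; }             // O(1)
      void  insert(Item* x) { slot[x->key] = x; }           // O(1)
      void  remove(Item* x) { slot[x->key] = nullptr; }     // O(1)
  };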

Page 40: Tonga Institute of Higher Education Design and Analysis of Algorithms

Direct Addressing Problems

• Direct addressing works well when the range m of keys is small
• But what if the keys are 32-bit integers?
  – Problem 1: the direct-address table will have 2^32 entries, more than 4 billion
  – Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be really long
• Solution: map keys to a smaller range 0..n where n < m
• This mapping is called a hash function

Page 41: Tonga Institute of Higher Education Design and Analysis of Algorithms

Hash tables

[Diagram: a universe U of possible keys, with the actual keys K = {k1..k5} mapped by the hash function h into slots 0..m−1 of a table T; here h(k2) = h(k5), a collision.]

Now the only problem is: what happens if two keys go to the same place in the array (called a collision)? We can fix collisions with policies. Examples:
  • Chaining
  • Open Addressing

Page 42: Tonga Institute of Higher Education Design and Analysis of Algorithms

Open Addressing

• Basic idea of open addressing to resolve collisions:
  – On an insert: if the index is full, try another index, ..., until an open index is found (called probing)
  – To search, follow the same sequence of probing as would be used when inserting the element
    • If you reach the element with the correct key, return it
    • If you reach a NULL pointer, the element is not in the table
• Good for fixed sets
  – Good for sets where there is no deleting, only adding
  – When you delete, you will create NULL holes in indexes and searching might not return the correct element
• Tables don’t need to be much bigger than n
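
A sketch of open addressing with linear probing in C++ (hypothetical code; keys are assumed non-negative, and remove is deliberately left out for the reason given above):

  #include <vector>

  struct ProbingTable {
      std::vector<long> slot;                            // -1 marks an empty index
      explicit ProbingTable(int m) : slot(m, -1) {}

      bool insert(long k) {
          for (int i = 0; i < (int)slot.size(); ++i) {
              int j = (int)((k + i) % slot.size());      // probe sequence: h(k), h(k)+1, ...
              if (slot[j] == -1 || slot[j] == k) { slot[j] = k; return true; }
          }
          return false;                                  // table is full
      }
      bool search(long k) const {
          for (int i = 0; i < (int)slot.size(); ++i) {
              int j = (int)((k + i) % slot.size());
              if (slot[j] == k)  return true;
              if (slot[j] == -1) return false;           // hit an empty slot: not present
          }
          return false;
      }
  };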

Page 43: Tonga Institute of Higher Education Design and Analysis of Algorithms

Chaining

• Chaining will put elements in a list if there is a collision

[Diagram: a chained hash table; the keys k1..k8 hash into slots of T, and keys that share a slot (such as k1 and k4, k5 and k2, or k8 and k6) are kept together in a linked list at that slot.]
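
A chained table might look like this in C++ (a sketch with hypothetical names; std::list stands in for the linked lists in the diagram, and keys are assumed non-negative):

  #include <list>
  #include <vector>

  struct ChainedTable {
      std::vector<std::list<long>> slot;
      explicit ChainedTable(int m) : slot(m) {}

      int h(long k) const { return (int)(k % slot.size()); }

      void insert(long k) { slot[h(k)].push_front(k); }      // O(1)
      void remove(long k) { slot[h(k)].remove(k); }          // O(1 + n/m) on average
      bool search(long k) const {
          for (long x : slot[h(k)]) if (x == k) return true; // walk the chain
          return false;                                      // O(1 + n/m) on average
      }
  };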

Page 44: Tonga Institute of Higher Education Design and Analysis of Algorithms

Analysis of Chaining

• Insert, delete and search are simple: apply the hash function to the data, then walk down the list at that slot
• To analyze:
  – Assume that each key in the table is equally likely to be hashed to any slot
• Given n keys and m slots in the table:
  – Then n/m = average number of keys per slot
• How long does a search take when it does not find something?
  – You need to apply the hash function and search the whole list: O(1 + n/m)
• How long for something that is found (on average)?
  – O(1 + (n/m)·(1/2)) = O(1 + n/m)

Page 45: Tonga Institute of Higher Education Design and Analysis of Algorithms

Analysis of Chaining

• So the cost of searching = O(1 + n/m)

• If the number of keys is about the same as the number of indexes in the table, what is n/m?

• Answer: n/m = O(1)
  – In other words, we can make the average cost of searching constant if we make n/m constant

Page 46: Tonga Institute of Higher Education Design and Analysis of Algorithms

Making a hash function

• Choosing a good hash function is important, because it determines how many collisions you have
• One idea: the Division Method
  – Hash(k) = k mod m
  – In words: hash integer k into a table with m slots using the slot given by the remainder of k divided by m
• This is a decent function, but if m is a power of 2 or a power of 10, there might be a lot of collisions
• Solution: pick the table size m to be a prime number not too close to a power of 2 (or 10)
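
In code the division method is a single line; for example (a hypothetical helper, with 701 as one prime that is not close to a power of 2):

  unsigned hashDivision(unsigned k, unsigned m) {
      return k % m;            // slot index = remainder of k divided by m
  }
  // e.g. with m = 701:  hashDivision(123456, 701) == 80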

Page 47: Tonga Institute of Higher Education Design and Analysis of Algorithms

Hash Functions: Polynomial Hash Codes

• A modulus hash function might be good for whole numbers, but what if you have strings?
• A polynomial hash code can take into account the characters of a string and the position of each character within the string to produce a relatively unique hash code
• Choose a constant a > 1, take each character, convert it to its ASCII code, and use those ASCII codes as the coefficients x_i:
  – F(x) = x_0·a^(k−1) + x_1·a^(k−2) + ... + x_(k−2)·a + x_(k−1)
• This produces a number that will have few collisions for the most part, but if the string is really long (longer than 31 characters), the number that is produced might overflow and not be useful for a hash function without extra care
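
A C++ sketch of such a polynomial hash code (hypothetical code; a = 33 is just one common choice, and unsigned arithmetic is used so that long strings simply wrap around instead of overflowing):

  #include <cstddef>

  unsigned long polyHash(const char* s) {
      const unsigned long a = 33;
      unsigned long h = 0;
      for (std::size_t i = 0; s[i] != '\0'; ++i)
          h = h * a + (unsigned char)s[i];   // Horner's rule for x_0*a^(k-1) + ... + x_(k-1)
      return h;
  }
  // To use it as a table index: polyHash(word) % m, for a table with m slots.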

Page 48: Tonga Institute of Higher Education Design and Analysis of Algorithms

Using Hash Functions

• Very rarely will you need to make a hash table or hash function yourself
• Instead, most languages (C++, Java and Visual Basic, for example) have built-in hash tables that can be used easily
• Remember, hash tables are useful because they can quickly insert, remove and find elements
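
For example, C++'s standard library hash table, std::unordered_map, already gives average O(1) insert, search and remove:

  #include <iostream>
  #include <string>
  #include <unordered_map>

  int main() {
      std::unordered_map<std::string, std::string> dict;
      dict["algorithm"] = "a step-by-step procedure";   // insert
      auto it = dict.find("algorithm");                  // search, average O(1)
      if (it != dict.end()) std::cout << it->second << "\n";
      dict.erase("algorithm");                           // remove
  }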

Page 49: Tonga Institute of Higher Education Design and Analysis of Algorithms

Summary

• Data structures allow a programmer to control and manipulate their data in different ways that can improve performance and solve problems.

• We looked at dynamic sets: binary search trees, red-black trees and hash tables. You should be familiar with how all three of them work

• You should also know the running times of these data structures and be aware of when you should use them.