15-211 Fundamental Structures of Computer Science Jan. 23, 2003 Ananda Guna Binary Search Trees Based on lectures given by Peter Lee, Avrim Blum, Danny Sleator, William Scherlis, Ananda Guna & Klaus Sutner

15-211 Fundamental Structures of Computer Science

Jan. 23, 2003, Ananda Guna

Binary Search Trees

Based on lectures given by Peter Lee, Avrim Blum, Danny Sleator, William Scherlis, Ananda Guna & Klaus Sutner

First a Review of Stacks and Queues

A Stack interface

    public interface Stack {
        public void push(Object x);
        public void pop();
        public Object top();
        public boolean isEmpty();
        public void clear();
    }

Stacks are LIFO

Push operations:

[Figure: a stack holding the elements a, b, c, d, e]

Stacks are LIFO

Pop operation:

[Figure: the same stack of elements a, b, c, d, e]

Last element that was pushed is the first to be popped.

A Queue interface

    public interface Queue {
        public void enqueue(Object x);
        public Object dequeue();
        public boolean isEmpty();
        public void clear();
    }

Queues are FIFO

[Figure: a queue holding k, r, q, c, m, with k at the front and m at the back]

Queues are FIFO

Enqueue operation:

[Figure: y joins the back of the queue, giving k, r, q, c, m, y from front to back]

Queues are FIFO

Dequeue operation:

[Figure: the queue k, r, q, c, m, y; the element at the front, k, is the one removed]

Implementing stacks, 1

[Figure: a linked list of nodes a, b, c]

Linked representation. All operations take constant time, O(1).
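As a rough illustration of the linked representation, here is a minimal sketch. The Stack interface is the one from the earlier slide (written without the public modifier so the block stands alone); the names LinkedStack and Node, and the choice to throw on an empty stack, are assumptions made for this example.

```java
// Sketch only: LinkedStack and Node are hypothetical names, not from the lecture.
interface Stack {
    void push(Object x);
    void pop();
    Object top();
    boolean isEmpty();
    void clear();
}

class LinkedStack implements Stack {
    // One list cell: an element plus a link to the cell beneath it.
    private static final class Node {
        final Object item;
        final Node below;
        Node(Object item, Node below) { this.item = item; this.below = below; }
    }

    private Node head; // top of the stack, or null when the stack is empty

    // Each operation touches only the head cell, hence O(1).
    public void push(Object x) { head = new Node(x, head); }

    public void pop() {
        if (head == null) throw new IllegalStateException("pop on empty stack");
        head = head.below;
    }

    public Object top() {
        if (head == null) throw new IllegalStateException("top on empty stack");
        return head.item;
    }

    public boolean isEmpty() { return head == null; }

    public void clear() { head = null; }
}
```

No operation walks the list, which is why each one runs in constant time.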

Implementing stacks, 2

An alternative is to use an array-based representation.

What are some advantages and disadvantages of an array-based representation?

[Figure: an array holding a, b, c, with a marker labeled top]

A queue from two stacks

[Figure: two stacks side by side. The one labeled Enqueue holds f, g, h, i, j; the one labeled Dequeue holds e, d, c, b, a.]

What happens when the stack on the right becomes empty?
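One common answer to that question, sketched below: when the dequeue stack runs empty, pour the whole enqueue stack into it, which reverses the order so the oldest element ends up on top. The class name TwoStackQueue and the use of java.util.ArrayDeque as the stack are assumptions for this example (ArrayDeque does not accept null elements).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Sketch only: TwoStackQueue is a hypothetical name.
class TwoStackQueue {
    private final Deque<Object> in = new ArrayDeque<>();  // receives enqueued items
    private final Deque<Object> out = new ArrayDeque<>(); // serves dequeues

    public void enqueue(Object x) { in.push(x); }

    public Object dequeue() {
        if (out.isEmpty()) {
            // Pour the enqueue stack into the dequeue stack; this reverses the order.
            while (!in.isEmpty()) out.push(in.pop());
        }
        if (out.isEmpty()) throw new NoSuchElementException("dequeue on empty queue");
        return out.pop();
    }

    public boolean isEmpty() { return in.isEmpty() && out.isEmpty(); }

    public void clear() { in.clear(); out.clear(); }
}
```

Each element is pushed at most twice and popped at most twice, so a sequence of n operations costs O(n) in total, even though a single dequeue can take time proportional to the queue length.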

Now to Trees

CS is upside down

root

leaves

Trees are everywhere

Trees are everywhere in life.

As a result, in computer programs, trees turn out to be one of the most commonly used data structures.

Arithmetic Expressions

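[Figure: expression tree with + at the root; its children are * and 5, and the children of * are 2 and 7, representing (2 * 7) + 5]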

Game trees

Directory structure

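[Figure: directory tree rooted at /afs; its children are cs and andrew; deeper levels show acs, course, and usr, then 15 and 18, then 127 and 211, plus another usr]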

Tree Definitions

A tree is a set of nodes and a set of directed edges that connect pairs of nodes.

A tree is a directed acyclic graph (DAG) with the following properties:

- one vertex is distinguished as the root; no edges enter this vertex

- every other vertex has exactly one entering edge

Trees, more abstractly

A tree is a directed graph with the following characteristics:

There is a distinguished node called the root node.

Every non-root node has exactly one parent node (the root has none).

A closer look at Trees

[Figure: root R with subtrees T1, T2, and T3; the roots of these subtrees are siblings]

Unique parents

[Figure: a tree with root a; below it are b, c, d, and below those are e and f]

Implementation of Trees

How do we implement a general tree? E.g., a file system.

Each node will have two links:

- one to its leftmost child

- one to its right sibling
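The two-link scheme can be sketched as follows. All names here (TreeNode, firstChild, nextSibling, addChild, size) are hypothetical and chosen for illustration only.

```java
// Sketch of the first-child / next-sibling representation; all names are hypothetical.
class TreeNode {
    final String name;
    TreeNode firstChild;   // link to the leftmost child, or null
    TreeNode nextSibling;  // link to the sibling immediately to the right, or null

    TreeNode(String name) { this.name = name; }

    // Appends child as the rightmost child of this node and returns it.
    TreeNode addChild(TreeNode child) {
        if (firstChild == null) {
            firstChild = child;
        } else {
            TreeNode last = firstChild;
            while (last.nextSibling != null) last = last.nextSibling;
            last.nextSibling = child;
        }
        return child;
    }

    // Counts the nodes in the subtree rooted at this node.
    int size() {
        int count = 1;
        for (TreeNode c = firstChild; c != null; c = c.nextSibling) count += c.size();
        return count;
    }
}
```

With only two links per node, a node can have any number of children, which is what a directory with arbitrarily many entries needs.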

Implementation of a binary tree with an array

Assume that the left child of node i (i = 1, 2, …) is stored at index 2i and the right child of node i is stored at index 2i+1.

Draw the tree represented by the following array (assume indices start from 1):

12 10 15 8 11 14 18

Question: What is the minimum height of a binary tree with n nodes? What is the maximum height?
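A small sketch of the index arithmetic may help when drawing the tree. The class name ArrayTree and its helper methods are assumptions; index 0 of the array is left unused so that indices start from 1, and height is measured in edges.

```java
// Sketch of the array layout: node i has its children at 2i and 2i+1 (index 0 unused).
class ArrayTree {
    static int left(int i)   { return 2 * i; }
    static int right(int i)  { return 2 * i + 1; }
    static int parent(int i) { return i / 2; }

    // Height (in edges) of the subtree rooted at index i, where n is the last used index.
    // An empty subtree has height -1.
    static int height(int i, int n) {
        if (i > n) return -1;
        return 1 + Math.max(height(left(i), n), height(right(i), n));
    }

    public static void main(String[] args) {
        int[] a = {0, 12, 10, 15, 8, 11, 14, 18}; // a[0] is a placeholder
        int n = a.length - 1;
        for (int i = 1; i <= n; i++) {
            String l = left(i) <= n ? String.valueOf(a[left(i)]) : "-";
            String r = right(i) <= n ? String.valueOf(a[right(i)]) : "-";
            System.out.println(a[i] + ": left=" + l + ", right=" + r);
        }
        System.out.println("height = " + height(1, n));
    }
}
```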

Binary Tree Traversals

Inorder: Left-Root-Right. Use a stack or recursion.

Preorder: Root-Left-Right. Use a stack or recursion.

Postorder: Left-Right-Root. Use a stack or recursion.

Level-order traversal: use a queue.

What is the output of each traversal? (See the next slide for BFS in a tree.)
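The three depth-first orders can be sketched recursively as below. The class Traversals, its Node type, and the sample tree (built from the keys 12, 10, 15, 8, 11, 14, 18, on the assumption that the question refers to that array) are all choices made for this example.

```java
import java.util.List;

// Sketch only: class, method, and field names are hypothetical.
class Traversals {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    // Left subtree, then root, then right subtree.
    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }

    // Root, then left subtree, then right subtree.
    static void preorder(Node t, List<Integer> out) {
        if (t == null) return;
        out.add(t.key);
        preorder(t.left, out);
        preorder(t.right, out);
    }

    // Left subtree, then right subtree, then root.
    static void postorder(Node t, List<Integer> out) {
        if (t == null) return;
        postorder(t.left, out);
        postorder(t.right, out);
        out.add(t.key);
    }

    // Sample tree: 12 at the root, 10 and 15 below it, then 8, 11, 14, 18.
    static Node sample() {
        return new Node(12,
                new Node(10, new Node(8, null, null), new Node(11, null, null)),
                new Node(15, new Node(14, null, null), new Node(18, null, null)));
    }
}
```

On the sample tree, inorder yields 8, 10, 11, 12, 14, 15, 18; preorder yields 12, 10, 8, 11, 15, 14, 18; postorder yields 8, 11, 10, 14, 18, 15, 12.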

Algorithm for Breadth-first traversal (of a tree using a queue)

    enqueue the root
    while (the queue is not empty) {
        dequeue the front element
        print it
        enqueue its left child (if present)
        enqueue its right child (if present)
    }
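The pseudocode translates almost line for line into Java. In this sketch the class LevelOrder, its Node type, and the sample tree are assumptions; instead of printing, the visited keys are collected in a list so the order can be checked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch only: class, method, and field names are hypothetical.
class LevelOrder {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static List<Integer> levelOrder(Node root) {
        List<Integer> visited = new ArrayList<>();
        if (root == null) return visited;
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);                         // enqueue the root
        while (!queue.isEmpty()) {
            Node front = queue.remove();         // dequeue the front element
            visited.add(front.key);              // "print" it
            if (front.left != null) queue.add(front.left);
            if (front.right != null) queue.add(front.right);
        }
        return visited;
    }

    // Sample tree: 12 at the root, 10 and 15 below it, then 8, 11, 14, 18.
    static Node sample() {
        return new Node(12,
                new Node(10, new Node(8, null, null), new Node(11, null, null)),
                new Node(15, new Node(14, null, null), new Node(18, null, null)));
    }
}
```

On the sample tree the visit order is 12, 10, 15, 8, 11, 14, 18, that is, level by level from left to right.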

Facts and Questions About Trees

A path from node $n_1$ to $n_k$ is a sequence $n_1, n_2, \ldots, n_k$ such that $n_i$ is the parent of $n_{i+1}$.

The depth of a node is the length of the path from the root to the node. What is the depth of the root? What is the maximum depth of a tree with N nodes?

What is the number of edges in a tree with N nodes?

The height of a node is the length of a path from the node to its deepest leaf. The height of the tree is the ________________?

Let T(n) be the number of null pointers in a binary tree of n nodes. Show that T(n) = n + 1.

Time to think about complexity of Algorithms

Considering algorithms:

- Is the approach correct?

- How fast does it run?

- How much memory does it use?

- Can I finish writing the code in the next 8 hours?

What is most important?

Consider fib(n) = fib(n-1) + fib(n-2) for n >= 2, with fib(0) = fib(1) = 1.

Let's look at a simple algorithm.

Fibonacci Tree

    public static long fib(int n) {
        if (n <= 1) return 1;
        return fib(n-1) + fib(n-2);
    }

It turns out the number of function calls is proportional to fib(n) itself! In fact, it's exactly 2*fib(n) - 1.

fib(90) takes about 7000 years on a 1 GHz machine.
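The call-count claim is easy to check empirically. This sketch adds a counter to the same recursive function; the class name FibCalls and the static field calls are assumptions made for the example.

```java
// Sketch only: FibCalls and its counter are hypothetical additions to the slide's fib.
class FibCalls {
    static long calls = 0; // number of times fib has been entered

    static long fib(int n) {
        calls++;
        if (n <= 1) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 20; n += 5) {
            calls = 0;
            long value = fib(n);
            System.out.println("n=" + n + " fib=" + value + " calls=" + calls
                    + " 2*fib-1=" + (2 * value - 1));
        }
    }
}
```

For example, fib(10) returns 89 and makes 177 calls, which is 2 * 89 - 1.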

[Figure: recursion tree for F(5). F(5) calls F(4) and F(3); each call F(k) with k >= 2 calls F(k-1) and F(k-2), down to the leaves F(1) and F(0).]

Closed form

Making Fibonacci more efficient

Can we write a better algorithm? Can we reuse some of the parts of the recursion?

    // call initially as fastfib(0,1,n)
    public static long fastfib(long prev, long current, int togo) {
        if (togo <= 0) return current;
        return fastfib(current, current+prev, togo-1);
    }

What is the complexity of this algorithm?
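One way to see the answer is to rewrite the recursion as a loop: each step decrements togo by one and does a single addition, so the work is O(n). In this sketch the class name FastFib and the method fibLoop are assumptions; fastfib is the function from the slide.

```java
// Sketch only: FastFib and fibLoop are hypothetical names; fastfib is the slide's function.
class FastFib {
    // Accumulator version: one recursive call per decrement of togo.
    static long fastfib(long prev, long current, int togo) {
        if (togo <= 0) return current;
        return fastfib(current, current + prev, togo - 1);
    }

    // The same computation as a loop: n iterations, one addition each.
    static long fibLoop(int n) {
        long prev = 0, current = 1;
        for (int togo = n; togo > 0; togo--) {
            long next = current + prev;
            prev = current;
            current = next;
        }
        return current;
    }
}
```

Both versions agree with the convention fib(0) = fib(1) = 1, for example fastfib(0, 1, 10) and fibLoop(10) both give 89.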

A question about height and number of nodes in a binary tree

Suppose we have n nodes in a complete binary tree of height h. What is the relation between n and h?

The number of nodes in level i is $2^i$ (i = 0, 1, …, h). Therefore the total number of nodes in all levels is

$$n = \sum_{i=0}^{h} 2^i = 2^{h+1} - 1.$$

So what is the relation between n and h? A binary tree is completely full if it is of height h and has $2^{h+1} - 1$ nodes.

Bit about asymptotic analysis

O notation: T(n) is O(f(n)) if there exist two positive constants c and n0 such that T(n) <= c*f(n) for all n > n0.

Omega notation: T(n) is Omega(f(n)) if there exist two positive constants c and n0 such that T(n) >= c*f(n) for all n > n0.

Theta notation: T(n) is Theta(f(n)) if it is both O(f(n)) and Omega(f(n)).

“Big-Oh” notation

[Figure: running time plotted against N; beyond n0 the curve cf(N) stays above T(N)]

T(N) = O(f(N)): “T(N) is order f(N)”

Some examples

If $f(n) = 10n + 5$ and $g(n) = n$, show f(n) is O(g(n)).

$f(n) = 3n^2 + 4n + 1$. Show f(n) is $O(n^2)$.

Show that $5\log(n)$ is O(n).

$f(n) = 3n^2 + 4n + 1$. Show f(n) is $\Omega(n^2)$. Therefore $f(n) = \Theta(n^2)$.
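One possible set of witnesses for these exercises is worked out below. The constants are just one valid choice among many, picked here for convenience rather than taken from the lecture.

```latex
\begin{align*}
10n + 5 &\le 10n + 5n = 15n && (n \ge 1), \text{ so } c = 15,\ n_0 = 1 \\
3n^2 + 4n + 1 &\le 3n^2 + 4n^2 + n^2 = 8n^2 && (n \ge 1), \text{ so } c = 8,\ n_0 = 1 \\
5\log n &\le 5n && (n \ge 1), \text{ so } c = 5,\ n_0 = 1 \\
3n^2 + 4n + 1 &\ge 3n^2 && (n \ge 1), \text{ so } c = 3,\ n_0 = 1
\end{align*}
```

The second and fourth lines together sandwich $3n^2 + 4n + 1$ between $3n^2$ and $8n^2$, which is exactly what the Theta bound asks for.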

Logarithms and exponents

Logarithms and exponents are everywhere in algorithm analysis

$\log_b a = c$ if $a = b^c$

Logarithms and exponents

We usually leave off the base b when b = 2, so for example

log 1024 = 10

Some useful equalities

$\log_b (ac) = \log_b a + \log_b c$

$\log_b (a/c) = \log_b a - \log_b c$

$\log_b (a^c) = c \log_b a$

$\log_b a = (\log_c a) / (\log_c b)$

$(b^a)^c = b^{ac}$

$b^a b^c = b^{a+c}$

$b^a / b^c = b^{a-c}$

Big-Oh again

When T(N) = O(f(N)), we are saying that T(N) grows no faster than f(N). I.e., f(N) describes an upper bound on T(N).

Put another way: for “large enough” inputs, cf(N) always dominates T(N). This is called the asymptotic behavior.

Big-O characteristic

If T(N) = cf(N) then T(N) = O(f(N)). Constant factors “don’t matter”.

Because of this, when T(N) = O(cg(N)), we usually drop the constant and just say O(g(N))

Big-O characteristic

Suppose T(N) = k, for some constant k. Then T(N) = O(1).

Big-O characteristic

More interesting: Suppose $T(N) = 20N^3 + 10N\log N + 5$. Then $T(N) = O(N^3)$. Lower-order terms “don’t matter”.

Question: What constants c and n0 can be used to show that the above is true?

Answer: c = 35, n0 = 1.
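To see why c = 35 works, bound each term by a multiple of the cubic term, writing N for the input size. The only facts used are that $N \log N \le N^3$ and $1 \le N^3$ once $N \ge 1$.

```latex
\begin{align*}
T(N) &= 20N^3 + 10N\log N + 5 \\
     &\le 20N^3 + 10N^3 + 5N^3 && \text{for } N \ge 1 \\
     &= 35N^3 .
\end{align*}
```

Any larger c, or any larger n0, would serve equally well; the definition only asks that some pair of constants exists.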

Big-O characteristic

If $T_1(N) = O(f(N))$ and $T_2(N) = O(g(N))$ then $T_1(N) + T_2(N) = \max(O(f(N)), O(g(N)))$, that is, $O(\max(f(N), g(N)))$. The bigger task always dominates eventually.

Also: $T_1(N) \cdot T_2(N) = O(f(N) \cdot g(N))$.

Some common functions

[Figure: plot of 10N, 100 log N, 5N^2, N^3, and 2^N for N = 1 to 10; the vertical axis runs from 0 to 1200]

BST-An Inductive Perspective

Let's focus on binary trees (left/right child only).

A binary tree is either

• empty (we'll write nil for clarity), or

• looks like (x,L,R) where

x is the element stored at the root, and

L, R are the left and right subtrees of the root.

In Pictures

[Figure: a root x with left subtree L and right subtree R, shown next to the empty tree]

Flattening a BT

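[Figure: tree T with root a; the children of a are b and d; the children of b are e and f; d has the right child g]

flat(T) = e, b, f, a, d, g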

Def: Binary Search Tree

A binary tree T is a binary search tree (BST) iff

flat(T) is an ordered sequence.

Equivalently, in (x,L,R) all the nodes in L are less than x, and all the nodes in R are larger than x.
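The definition can be turned directly into a check: flatten the tree in inorder and test whether the result is strictly increasing. In this sketch the class BstCheck, its Node type, and the method names are assumptions; keys are ints for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: class, method, and field names are hypothetical.
class BstCheck {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    // flat(T): the keys of T in left-root-right order.
    static void flat(Node t, List<Integer> out) {
        if (t == null) return;
        flat(t.left, out);
        out.add(t.key);
        flat(t.right, out);
    }

    // T is a BST iff flat(T) is an ordered (strictly increasing) sequence.
    static boolean isBst(Node t) {
        List<Integer> seq = new ArrayList<>();
        flat(t, seq);
        for (int i = 1; i < seq.size(); i++) {
            if (seq.get(i - 1) >= seq.get(i)) return false;
        }
        return true;
    }
}
```

Checking only each node against its two children is not enough: a key can be fine locally yet sit on the wrong side of an ancestor, and the flattened sequence catches that.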

Example

[Figure: a BST on the keys 2, 3, 4, 5, 6, 7, 9, drawn with 5 at the top]

flat(T) = 2, 3, 4, 5, 6, 7, 9

Why do we care?

versus

Binary Search

How does one search in a BST?

    search(x, nil)     = false
    search(x, (x,L,R)) = true
    search(x, (a,L,R)) = search(x,L)    if x < a
    search(x, (a,L,R)) = search(x,R)    if x > a

Note: search should return the value.
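The four search equations map onto a short recursive method. This is a sketch with int keys and a boolean result; the class BstSearch and its Node type are assumptions for the example.

```java
// Sketch only: class, method, and field names are hypothetical.
class BstSearch {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static boolean search(int x, Node t) {
        if (t == null) return false;      // search(x, nil) = false
        if (x == t.key) return true;      // search(x, (x,L,R)) = true
        return x < t.key
                ? search(x, t.left)       // x < a: continue in L
                : search(x, t.right);     // x > a: continue in R
    }
}
```

Only one of the two subtrees is ever visited, so the number of comparisons is bounded by the length of a single root-to-leaf path.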

Correctness

Clearly, search() can never return a false positive answer.

But search() only walks down one branch, so how do we know we don't get false negative answers?

Suppose T is a BST that contains x.

Claim: search(x,T) properly returns "true".

Proof

T cannot be nil, so suppose T = (a,L,R).

Case 1: x = a: done.

Case 2: x < a: Since T is a BST, x must be in L.

But by induction (on trees), search(x,L) returns true. Done.

Case 3: x > a: same as case 2.

Insertions

Insertions in a BST are very similar to searching: find the right spot, and then put down the new element as a new leaf.

We will not allow multiple insertions of the same element, so there is always exactly one place for the new guy.
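Insertion as described can be sketched like this: walk down as in search and attach the new key as a leaf where the walk falls off the tree. The class BstInsert and its Node type are assumptions; a duplicate key is silently ignored, in line with the rule above.

```java
// Sketch only: class, method, and field names are hypothetical.
class BstInsert {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Returns the root of the tree after inserting x as a new leaf.
    static Node insert(Node t, int x) {
        if (t == null) return new Node(x);           // found the spot: new leaf
        if (x < t.key) t.left = insert(t.left, x);
        else if (x > t.key) t.right = insert(t.right, x);
        // x == t.key: already present, nothing to do
        return t;
    }
}
```

A caller would typically write root = insert(root, x), starting from root = null for the empty tree.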

How Many?

How many decisions do we have to make before we have either found the element, or know it's not in the tree?

We walk down a branch in the tree, so the worst-case running time for search is

O( depth of T ), which can be as large as O( # nodes )

Good Tree

But in a "good" BST we have

depth of T = O( log # nodes )

Theorem: If the tree is constructed from n inputs given in random order, then the expected depth of the tree is O(log n).

But if the input is already (nearly, reverse,…) sorted we are in trouble.
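The trouble with sorted input shows up even on tiny examples. This sketch inserts the same seven keys in two different orders and compares the resulting heights (measured in edges); the class BstDepthDemo and its helpers are assumptions for the example.

```java
// Sketch only: class, method, and field names are hypothetical.
class BstDepthDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Standard leaf insertion; duplicates are ignored.
    static Node insert(Node t, int x) {
        if (t == null) return new Node(x);
        if (x < t.key) t.left = insert(t.left, x);
        else if (x > t.key) t.right = insert(t.right, x);
        return t;
    }

    // Height in edges; the empty tree has height -1.
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Builds a BST by inserting the keys in the given order.
    static Node build(int[] keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};
        int[] mixed  = {4, 2, 6, 1, 3, 5, 7};
        System.out.println("sorted input: height " + height(build(sorted))); // prints 6
        System.out.println("mixed input:  height " + height(build(mixed)));  // prints 2
    }
}
```

Sorted input produces a chain where every node has only a right child, so the tree degenerates into a linked list and search costs O(n).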

Forcing good behavior

It is clear (?) that for any n inputs, there always is a BST containing these elements of logarithmic depth.

But if we just insert the standard way, we may build a very unbalanced, deep tree.

Can we somehow force the tree to remain shallow?

At low cost?

AVL-Trees

G. M. Adelson-Velskii and E. M. Landis, 1962

[Figure: AVL balance condition; the heights of the two subtrees of every node differ by 1 or less]

Next Week

More about AVL trees on Tuesday.

Homework 1 is due Monday 27th.

This is a good time to catch up with Java deficiencies, if any.