Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua...

Preview:

Citation preview

Trees (Ch. 9.2)

Longin Jan LateckiTemple University

based on slides by

Simon Langley and Shang-Hua Teng

Basic Data Structures - Trees

Informal: a tree is a structure that looks like a real tree (up-side-down)

Formal: a tree is a connected graph with no cycles.

Trees - Terminology

x

b e m

c d a

root

leaf

height=2

size=7

Every node must have its value(s)Non-leaf node has subtree(s)Non-root node has a single parent nodeA parent may have 1 or more children

value

subtree

nodes

Types of Tree

Binary Tree

m-ary Trees

Each node has at most 2 sub-trees

Each node has at most m sub-trees

Binary Search Trees

A binary search tree: … is a binary tree. if a node has value N, all values in its

left sub-tree are less than N, and all values in its right sub-tree are greater than N.

This is a binary search tree

This is NOT a binary search tree

5

4 7

3 2 8 9

Searching a binary search tree

search(t, s) {

If(s == label(t))

return t;

If(t is leaf) return null

If(s < label(t))

search(t’s left tree, s)

else

search(t’s right tree, s)}

h

Time per level

O(1)

O(1)

Total O(h)

Searching a binary search tree

search( t, s )

{ while(t != null)

{ if(s == label(t)) return t;

if(s < label(t)

t = leftSubTree(t);

else

t = rightSubTree(t);

}

return null;

h

Time per level

O(1)

O(1)

Total O(h)

Here’s another function that does the same (we search for label s):

TreeSearch(t, s)

while (t != NULL and s != label[t])

if (s < label[t])

t = left[t];

else

t = right[t];

return t;

Insertion in a binary search tree:we need to search before we insert

5

3 8

2 4 7 9

Time complexity ?

Insert 6 6

6

6

6

Insert 1111

11

11

O(height_of_tree)O(log n) if it is balanced n = size of the tree

always insert to a leaf

Insertion

insertInOrder(t, s)

{ if(t is an empty tree) // insert here

return a new tree node with value s

else if( s < label(t))

t.left = insertInOrder(t.left, s )

else

t.right = insertInOrder(t.right, s)

return t }

Try it!!

Build binary search trees for the following input sequences• 7, 4, 2, 6, 1, 3, 5, 7

• 7, 1, 2, 3, 4, 5, 6, 7

• 7, 4, 2, 1, 7, 3, 6, 5

• 1, 2, 3, 4, 5, 6, 7, 8

• 8, 7, 6, 5, 4, 3, 2, 1

Comparison –Insertion in an ordered list

Insert 6

Time complexity?

2 3 4 5 7 98

6 6 6 6 6

O(n) n = size of the list

insertInOrder(list, s) { loop1: search from beginning of list, look for an item >= s loop2: shift remaining list to its right, start from the end of list insert s}

6 7 8 9

Suppose we have 3GB character data file that we wish to include in an email.

Suppose file only contains 26 letters {a,…,z}. Suppose each letter in {a,…,z} occurs with frequency

f. Suppose we encode each letter by a binary code If we use a fixed length code, we need 5 bits for each

character The resulting message length is

Can we do better?

Data Compression

zba fff 5

Data Compression: A Smaller Example Suppose the file only has 6 letters {a,b,c,d,e,f}

with frequencies

Fixed length 3G=3000000000 bits Variable length

110011011111001010

101100011010001000

05.09.16.12.13.45.

fedcba

Fixed length

Variable length

G24.2405.409.316.312.313.145.

How to decode?

At first it is not obvious how decoding will happen, but this is possible if we use prefix codes

Prefix Codes No encoding of a

character can be the prefix of the longer encoding of another character:

We could not encode t as 01 and x as 01101 since 01 is a prefix of 01101

By using a binary tree representation we generate prefix codes with letters as leaves

e

a

t

n s

0 1

1

1

1

0

0

0

Prefix codes allow easy decoding

e

a

t

n s

0 1

1

1

1

0

0

0

Decode:

11111011100

s 1011100

sa 11100

san 0

sane

Prefix codes

A message can be decoded uniquely.

Following the tree until it reaches to a leaf, and then repeat!

Draw a few more trees and produce the codes!!!

Some Properties

Prefix codes allow easy decoding An optimal code must be a full binary tree (a

tree where every internal node has two children)

For C leaves there are C-1 internal nodes The number of bits to encode a file is

ccfT TCc

length )()B(

where f(c) is the freq of c, lengthT(c) is the tree depth of c, which corresponds to the code length of c

Optimal Prefix Coding Problem

Input: Given a set of n letters (c1,…, cn) with frequencies (f1,…, fn).

Construct a full binary tree T to define a prefix code that minimizes the average code length

iT

n

i i cfT length )Average(1

Greedy Algorithms

Many optimization problems can be solved using a greedy approach• The basic principle is that local optimal decisions may be used to

build an optimal solution

• But the greedy approach may not always lead to an optimal solution overall for all problems

• The key is knowing which problems will work with this approach and which will not

We study• The problem of generating Huffman codes

Greedy algorithms A greedy algorithm always makes the choice that looks

best at the moment• My everyday examples:

• Driving in Los Angeles, NY, or Boston for that matter

• Playing cards

• Invest on stocks

• Choose a university

• The hope: a locally optimal choice will lead to a globally optimal solution

• For some problems, it works

Greedy algorithms tend to be easier to code

David Huffman’s idea

A Term paper at MIT

Build the tree (code) bottom-up in a greedy fashion

Each tree has a weight in its root and symbols as its leaves.

We start with a forest of one vertex trees representing the input symbols.

We recursively merge two trees whose sum of weights is minimal until we have only one tree.

Building the Encoding Tree

Building the Encoding Tree

Building the Encoding TreeBuilding the Encoding Tree

Building the Encoding TreeBuilding the Encoding Tree

Building the Encoding TreeBuilding the Encoding Tree

Recommended