
Optimal Binary Search Tree

• We now focus on constructing binary search trees for a static set of identifiers, on which only searches are performed.

• To find an optimal binary search tree for a given static file, a cost measure must be determined for search trees.

• It’s reasonable to use the level number of a node as the cost.

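Binary Search Tree Example

[Figure: two binary search trees for the identifiers do, for, if, return, while.
(a) Root for, with children do and while; return is the left child of while, and if is the left child of return. 4 comparisons in worst case.
(b) Root for, with children do and return; return has children if and while. 3 comparisons in worst case.]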

Extended Binary Tree Example

[Figure: the two binary search trees of the previous example, labeled (a) and (b), each extended with an external node in place of every empty subtree.]

External Path Length and Internal Path Length

• External path length of a binary tree is the sum over all external nodes of the lengths of the paths from the root to those nodes.

• Internal path length is the sum over all internal nodes of the lengths of the paths from the root to those nodes.

• Let the internal path length be I and external path length E, then the binary tree of (a) has I = 0+1+1+2+3 = 7, E = 2+2+4+4+3+2 = 17.

External Path Length and Internal Path Length (Cont.)

• It can be shown that E = I + 2n.

• Binary trees with maximum E also have maximum I.

• For all binary trees with n internal nodes,
– maximum I = $\sum_{i=0}^{n-1} i = n(n-1)/2$ (skewed tree)
– minimum I = $\sum_{1 \le i \le n} \lfloor \log_2 i \rfloor$ (complete binary tree)
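The two path lengths can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` type in which a null child pointer stands for an external node; neither the type nor the function names come from the slides.

```cpp
#include <cassert>

// Hypothetical minimal node type for illustration (not from the slides).
// A null child pointer plays the role of an external node.
struct Node {
    Node* left;
    Node* right;
};

// Sum of the depths of all internal nodes (the root has depth 0).
int internalPathLength(const Node* t, int depth = 0) {
    if (!t) return 0;
    return depth + internalPathLength(t->left, depth + 1)
                 + internalPathLength(t->right, depth + 1);
}

// Sum of the depths of all external nodes (null links).
int externalPathLength(const Node* t, int depth = 0) {
    if (!t) return depth;
    return externalPathLength(t->left, depth + 1)
         + externalPathLength(t->right, depth + 1);
}

// Number of internal nodes n.
int countNodes(const Node* t) {
    return t ? 1 + countNodes(t->left) + countNodes(t->right) : 0;
}
```

For a tree with root for, children do and while, return below while, and if below return, these functions give I = 7 and E = 17, which agrees with E = I + 2n for n = 5.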

Binary Search Tree Containing A Symbol Table

• Let’s look at the problem of representing a symbol table as a binary search tree. If a binary search tree contains the identifiers a1, a2, …, an with a1 < a2 < … < an, and the probability of searching for each ai is pi, then the total cost of any binary search tree is

$\sum_{1 \le i \le n} p_i \cdot \mathrm{level}(a_i)$

when only successful searches are made.

Binary Search Tree Containing A Symbol Table (Cont.)

• For unsuccessful searches, let’s partition the identifiers not in the binary search tree into n+1 classes Ei, 0 ≤ i ≤ n. If qi is the probability that the identifier being sought is in Ei, then the cost of failure node i is

$q_i \cdot (\mathrm{level}(\text{failure node } i) - 1)$

• Therefore, the total cost of a binary search tree is

$\sum_{1 \le i \le n} p_i \cdot \mathrm{level}(a_i) + \sum_{0 \le i \le n} q_i \cdot (\mathrm{level}(\text{failure node } i) - 1)$

• An optimal binary search tree for the identifier set a1, …, an is one that minimizes the above expression over all possible binary search trees for this identifier set. Since all searches must terminate either successfully or unsuccessfully, we have

$\sum_{1 \le i \le n} p_i + \sum_{0 \le i \le n} q_i = 1$

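Binary Search Tree With Three Identifiers Example

[Figure: the five possible binary search trees for the identifiers do, if, while.
(a) Root while; if is its left child; do is the left child of if.
(b) Root if, with children do and while.
(c) Root do; if is its right child; while is the right child of if.
(d) Root while; do is its left child; if is the right child of do.
(e) Root do; while is its right child; if is the left child of while.]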

Cost of Binary Search Tree In The Example

• With equal probabilities, pi = qj = 1/7 for all i and j, we have
cost(tree a) = 15/7; cost(tree b) = 13/7
cost(tree c) = 15/7; cost(tree d) = 15/7
cost(tree e) = 15/7
Tree b is optimal.

• With p1=0.5, p2=0.1, p3=0.05, q0=0.15, q1=0.1, q2=0.05, and q3=0.05 we have
cost(tree a) = 2.65; cost(tree b) = 1.9
cost(tree c) = 1.5; cost(tree d) = 2.15
cost(tree e) = 1.6
Tree c is optimal.
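The cost formula can be evaluated directly for each of the five tree shapes. This is a minimal sketch, assuming the identifiers are numbered 1 = do, 2 = if, 3 = while and that each shape is described by the order in which the keys are inserted into an unbalanced binary search tree; the names `Tree`, `insert`, `cost`, and `treeCost` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper type: a binary search tree over key indices 1..n.
struct Tree {
    int key;
    Tree* left;
    Tree* right;
};

// Plain (unbalanced) binary search tree insertion of key index k.
Tree* insert(Tree* t, int k) {
    if (!t) return new Tree{k, nullptr, nullptr};
    if (k < t->key) t->left = insert(t->left, k);
    else t->right = insert(t->right, k);
    return t;
}

// cost = sum p_i * level(a_i) + sum q_i * (level(failure node i) - 1).
// An in-order traversal meets the failure nodes E_0, E_1, ..., E_n from
// left to right; `next` is the index of the next failure node.
double cost(const Tree* t, int level, const std::vector<double>& p,
            const std::vector<double>& q, int& next) {
    if (!t) return q[next++] * (level - 1);
    double c = cost(t->left, level + 1, p, q, next);
    c += p[t->key] * level;
    c += cost(t->right, level + 1, p, q, next);
    return c;
}

// Builds the tree from an insertion order and returns its total cost.
// p[0] is unused; nodes are never freed, which is acceptable for a sketch.
double treeCost(const std::vector<int>& order, const std::vector<double>& p,
                const std::vector<double>& q) {
    Tree* root = nullptr;
    for (int k : order) root = insert(root, k);
    int next = 0;
    return cost(root, 1, p, q, next);
}
```

With the unequal probabilities above, the insertion orders {3,2,1}, {2,1,3}, {1,2,3}, {3,1,2}, and {1,3,2} correspond to trees (a) through (e). Under these assumptions the shape of tree (d), while over do over if, evaluates to 2.15.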

Determine Optimal Binary Search Tree

• To determine which binary search tree is optimal, it is not practical to follow the above brute-force approach, since its complexity is $O(n \, 4^n / n^{3/2})$.

• Now let’s take another approach. Let Tij denote an optimal binary search tree for ai+1, …, aj, i < j. Let cij be the cost of the search tree Tij. Let rij be the root of Tij and let wij be the weight of Tij, where

$w_{ij} = q_i + \sum_{k=i+1}^{j} (q_k + p_k)$

• By definition, rii = 0, wii = qi, and cii = 0 for 0 ≤ i ≤ n. T0n is an optimal binary search tree for a1, …, an. Its cost is c0n, its weight is w0n, and its root is r0n.

Determine Optimal Binary Search Tree (Cont.)

• If Tij is an optimal binary search tree for ai+1, …, aj, and rij = k, then i < k ≤ j. Tij has two subtrees L and R. L contains ai+1, …, ak-1, and R contains ak+1, …, aj. So the cost cij of Tij is

cij = pk + cost(L) + cost(R) + weight(L) + weight(R)
    = pk + ci,k-1 + ckj + wi,k-1 + wkj
    = wij + ci,k-1 + ckj

• Since Tij is optimal, the root k must minimize this cost over all candidate roots:

$w_{ij} + c_{i,k-1} + c_{kj} = \min_{i < l \le j} \{ w_{ij} + c_{i,l-1} + c_{lj} \}$

or, equivalently,

$c_{i,k-1} + c_{kj} = \min_{i < l \le j} \{ c_{i,l-1} + c_{lj} \}$

Example 10.2

• Let n = 4, (a1, a2, a3, a4) = (do, if, return, while). Let (p1, p2, p3, p4) = (3, 3, 1, 1) and (q0, q1, q2, q3, q4) = (2, 3, 1, 1, 1). Initially wii = qi, cii = 0, and rii = 0, 0 ≤ i ≤ 4.

w01 = p1 + w00 + w11 = p1 + q1 + w00 = 8
c01 = w01 + min{c00 + c11} = 8
r01 = 1
w12 = p2 + w11 + w22 = p2 + q2 + w11 = 7
c12 = w12 + min{c11 + c22} = 7
r12 = 2
w23 = p3 + w22 + w33 = p3 + q3 + w22 = 3
c23 = w23 + min{c22 + c33} = 3
r23 = 3
w34 = p4 + w33 + w44 = p4 + q4 + w33 = 3
c34 = w34 + min{c33 + c44} = 3
r34 = 4

Example 10.2 Computation

Each entry lists wij, cij, rij. Row m holds the entries with j − i = m, and the column gives i.

m    i=0          i=1          i=2         i=3         i=4
0    2, 0, 0      3, 0, 0      1, 0, 0     1, 0, 0     1, 0, 0
1    8, 8, 1      7, 7, 2      3, 3, 3     3, 3, 4
2    12, 19, 1    9, 12, 2     5, 8, 3
3    14, 25, 2    11, 19, 2
4    16, 32, 2
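The table for Example 10.2 can be reproduced with a short dynamic program over increasing j − i. This is a minimal sketch, assuming integer weights and a hypothetical `ObstTables` struct and `optimalBst` function; ties between candidate roots are broken in favor of the smallest l, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the three tables w, c, r (indices 0..n).
struct ObstTables {
    std::vector<std::vector<int>> w, c, r;
};

// p[1..n] and q[0..n] as in the text; p[0] is unused.
ObstTables optimalBst(const std::vector<int>& p, const std::vector<int>& q) {
    int n = static_cast<int>(q.size()) - 1;
    ObstTables t;
    t.w.assign(n + 1, std::vector<int>(n + 1, 0));
    t.c = t.w;
    t.r = t.w;
    for (int i = 0; i <= n; ++i) t.w[i][i] = q[i];   // c_ii = 0 and r_ii = 0
    for (int m = 1; m <= n; ++m) {                   // m = j - i
        for (int i = 0; i + m <= n; ++i) {
            int j = i + m;
            t.w[i][j] = t.w[i][j - 1] + p[j] + q[j];
            int best = -1, bestCost = 0;
            for (int l = i + 1; l <= j; ++l) {       // try every root a_l
                int cand = t.c[i][l - 1] + t.c[l][j];
                if (best == -1 || cand < bestCost) {
                    best = l;
                    bestCost = cand;
                }
            }
            t.c[i][j] = t.w[i][j] + bestCost;
            t.r[i][j] = best;
        }
    }
    return t;
}
```

Calling `optimalBst` with p = (3, 3, 1, 1) and q = (2, 3, 1, 1, 1) gives w04 = 16, c04 = 32, and r04 = 2. The innermost loop scans every l in (i, j], so this sketch does not include the narrower root range attributed to Knuth.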

Computation Complexity of Optimal Binary Search Tree

• To evaluate the optimal binary search tree we need to compute cij for (j − i) = 1, 2, …, n in that order. When j − i = m, there are n − m + 1 cij’s to compute.

• Each of these cij’s can be computed in time O(m).

• The total time for all cij’s with j − i = m is therefore $O(nm - m^2)$. The total time to evaluate all the cij’s and rij’s is

$\sum_{1 \le m \le n} (nm - m^2) = O(n^3)$

• The computing complexity can be reduced to $O(n^2)$ by limiting the search for the optimal l to the range ri,j-1 ≤ l ≤ ri+1,j, according to D. E. Knuth.

AVL Trees

• Dynamic tables may also be maintained as binary search trees.

• Depending on the order in which the symbols are put into the table, the resulting binary search trees differ. Thus the average number of comparisons for accessing a symbol also differs.

Binary Search Tree for The Months of The Year

[Figure: binary search tree with root JAN obtained from the input sequence below.]

Input Sequence: JAN, FEB, MAR, APR, MAY, JUNE, JULY, AUG, SEPT, OCT, NOV, DEC

Max comparisons: 6
Average comparisons: 3.5

A Balanced Binary Search Tree For The Months of The Year

[Figure: binary search tree with root JULY obtained from the input sequence below.]

Input Sequence: JULY, FEB, MAY, AUG, DEC, MAR, OCT, APR, JAN, JUNE, SEPT, NOV

Max comparisons: 4
Average comparisons: 3.1

Degenerate Binary Search Tree

[Figure: binary search tree obtained from the sorted input sequence below; it degenerates into a chain from APR down to SEPT.]

Input Sequence: APR, AUG, DEC, FEB, JAN, JULY, JUNE, MAR, MAY, NOV, OCT, SEPT

Max comparisons: 12
Average comparisons: 6.5
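The comparison counts for the three input sequences can be recomputed by inserting the keys into an unbalanced binary search tree and recording the level at which each key lands. This is a minimal sketch, assuming keys are compared as uppercase strings; `BstNode`, `Stats`, and `insertAll` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type for an unbalanced binary search tree.
struct BstNode {
    std::string key;
    BstNode* left;
    BstNode* right;
};

// Worst-case and total number of comparisons over all keys in the tree.
struct Stats {
    int maxComparisons;
    int totalComparisons;
};

// Inserts the keys in the given order. A key stored at level d (root = 1)
// needs d comparisons to be found, so the level is what gets recorded.
Stats insertAll(const std::vector<std::string>& keys) {
    BstNode* root = nullptr;            // nodes are never freed in this sketch
    Stats s{0, 0};
    for (const std::string& k : keys) {
        int level = 1;
        BstNode** link = &root;
        while (*link) {
            link = (k < (*link)->key) ? &(*link)->left : &(*link)->right;
            ++level;
        }
        *link = new BstNode{k, nullptr, nullptr};
        if (level > s.maxComparisons) s.maxComparisons = level;
        s.totalComparisons += level;
    }
    return s;
}
```

Dividing `totalComparisons` by the number of keys gives the average. For the sequence starting with JAN the total is 42 over 12 keys, that is 3.5; for the sequence starting with JULY it is 37, about 3.1; for the sorted sequence it is 78, that is 6.5.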

Minimize The Search Time of Binary Search Tree In Dynamic Situation

• From the above three examples, we know that the average and maximum search time will be minimized if the binary search tree is maintained as a complete binary search tree at all times.

• However, to achieve this in a dynamic situation, we have to pay a high price to restructure the tree to be a complete binary tree all the time.

• In 1962, Adelson-Velskii and Landis introduced a binary tree structure that is balanced with respect to the heights of subtrees, called an AVL tree. Because the tree is balanced, dynamic retrievals can be performed in O(log n) time if the tree has n nodes, and the resulting tree remains height-balanced.

AVL Tree

• Definition: An empty tree is height-balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees respectively, then T is height-balanced iff
(1) TL and TR are height-balanced, and
(2) |hL – hR| ≤ 1, where hL and hR are the heights of TL and TR, respectively.

• Definition: The balance factor, BF(T), of a node T in a binary tree is defined to be hL – hR, where hL and hR, respectively, are the heights of the left and right subtrees of T. For any node T in an AVL tree, BF(T) = -1, 0, or 1.

Balanced Trees Obtained for The Months of The Year

[Figure sequence: the AVL tree after each insertion, with the balance factor shown beside every node and, where needed, the tree after the rebalancing rotation.]

(a) Insert MARCH
(b) Insert MAY
(c) Insert NOVEMBER, followed by an RR rotation
(d) Insert AUGUST
(e) Insert APRIL, followed by an LL rotation
(f) Insert JANUARY, followed by an LR rotation
(g) Insert DECEMBER
(h) Insert JULY
(i) Insert FEBRUARY, followed by an RL rotation
(j) Insert JUNE, followed by an LR rotation
(k) Insert OCTOBER, followed by an RR rotation
(l) Insert SEPTEMBER

Rebalancing Rotation of Binary Search Tree

• LL: new node Y is inserted in the left subtree of the left subtree of A

• LR: Y is inserted in the right subtree of the left subtree of A

• RR: Y is inserted in the right subtree of the right subtree of A

• RL: Y is inserted in the left subtree of the right subtree of A.

• If a height-balanced binary tree becomes unbalanced as a result of an insertion, then these are the only four cases possible for rebalancing. Here A is the nearest ancestor of the new node Y whose balance factor becomes ±2.

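Rebalancing Rotation LL

[Figure: before the insertion, A has balance factor +1 and its left child B has balance factor 0; the subtrees BL, BR, and AR each have height h, and the subtree rooted at A has height h+2. The insertion raises the height of BL to h+1, giving BF(A) = +2 and BF(B) = +1. After the LL rotation, B is the root with left subtree BL and right child A; A has subtrees BR and AR; both balance factors are 0 and the height is again h+2.]

Rebalancing Rotation RR

[Figure: mirror image of LL. Before the insertion, A has balance factor -1 and its right child B has balance factor 0. The insertion raises the height of BR to h+1, giving BF(A) = -2 and BF(B) = -1. After the RR rotation, B is the root with left child A and right subtree BR; A has subtrees AL and BL; both balance factors are 0 and the height is again h+2.]

Rebalancing Rotation LR(a)

[Figure: A (balance factor +1) has left child B (balance factor 0). The new node C becomes the right child of B, giving BF(A) = +2 and BF(B) = -1. After the LR(a) rotation, C is the root with left child B and right child A, all with balance factor 0.]

Rebalancing Rotation LR(b)

[Figure: before the insertion, A has balance factor +1, its left child B has balance factor 0, and C, the right child of B, has balance factor 0; the subtree rooted at A has height h+2. An insertion into CL gives BF(A) = +2, BF(B) = -1, and BF(C) = +1. After the LR(b) rotation, C is the root with balance factor 0; its left child B (balance factor 0) has subtrees BL and CL, and its right child A (balance factor -1) has subtrees CR and AR; the height is again h+2.]

Rebalancing Rotation LR(c)

[Figure: as in LR(b), but the insertion is into CR, giving BF(A) = +2, BF(B) = -1, and BF(C) = -1. After the LR(c) rotation, C is the root with balance factor 0; its left child B (balance factor +1) has subtrees BL and CL, and its right child A (balance factor 0) has subtrees CR and AR; the height is again h+2.]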

AVL Trees (Cont.)

• Once rebalancing has been carried out on the subtree in question, examining the remaining tree is unnecessary.

• Insertion into an ordinary binary search tree with n nodes can take O(n) time in the worst case, but for an AVL tree the insertion time is O(log n).
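The four rebalancing cases can be put together into a recursive insertion routine. This is a minimal sketch, not the slides' own algorithm: it assumes a hypothetical `AvlNode` that stores subtree heights rather than balance factors, and it re-examines every ancestor on the way back up instead of stopping after the first rotation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical node type; stores the height of its subtree (leaf = 1).
struct AvlNode {
    std::string key;
    int height;
    AvlNode* left;
    AvlNode* right;
};

int height(const AvlNode* t) { return t ? t->height : 0; }

// Balance factor hL - hR as defined in the text.
int balanceFactor(const AvlNode* t) { return height(t->left) - height(t->right); }

void update(AvlNode* t) { t->height = 1 + std::max(height(t->left), height(t->right)); }

// Single right rotation at a: repairs the LL case.
AvlNode* rotateRight(AvlNode* a) {
    AvlNode* b = a->left;
    a->left = b->right;
    b->right = a;
    update(a);
    update(b);
    return b;
}

// Single left rotation at a: repairs the RR case.
AvlNode* rotateLeft(AvlNode* a) {
    AvlNode* b = a->right;
    a->right = b->left;
    b->left = a;
    update(a);
    update(b);
    return b;
}

// Inserts k and returns the (possibly new) root of the subtree.
AvlNode* insert(AvlNode* t, const std::string& k) {
    if (!t) return new AvlNode{k, 1, nullptr, nullptr};
    if (k < t->key) t->left = insert(t->left, k);
    else if (k > t->key) t->right = insert(t->right, k);
    else return t;                                   // duplicate key: unchanged
    update(t);
    int bf = balanceFactor(t);
    if (bf > 1) {                                    // left subtree too tall
        if (balanceFactor(t->left) < 0)              // LR: rotate the child first
            t->left = rotateLeft(t->left);
        return rotateRight(t);                       // LL, or second half of LR
    }
    if (bf < -1) {                                   // right subtree too tall
        if (balanceFactor(t->right) > 0)             // RL: rotate the child first
            t->right = rotateRight(t->right);
        return rotateLeft(t);                        // RR, or second half of RL
    }
    return t;
}
```

Inserting MAR, MAY, NOV, AUG, APR, JAN, DEC, JULY, FEB, JUNE, OCT, SEPT in that order with this sketch yields a tree of height 5 whose root is JAN with children DEC and MAR. The double rotations LR and RL are expressed here as two single rotations, which is a common formulation but a design choice of the sketch.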

AVL Insertion Complexity

• Let Nh be the minimum number of nodes in a height-balanced tree of height h. In the worst case, the height of one of the subtrees will be h-1 and that of the other h-2. Both subtrees must also be height-balanced. Hence Nh = Nh-1 + Nh-2 + 1, with N0 = 0, N1 = 1, and N2 = 2.

• The recursive definition for Nh is similar to that for the Fibonacci numbers: Fn = Fn-1 + Fn-2, F0 = 0, F1 = 1.

• It can be shown that Nh = Fh+2 – 1. Therefore we can derive that $N_h \approx \phi^{h+2}/\sqrt{5} - 1$, where $\phi = (1+\sqrt{5})/2$. So the worst-case insertion time for a height-balanced tree with n nodes is O(log n).
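The identity between Nh and the Fibonacci numbers is easy to check numerically for small h. This is a minimal sketch with hypothetical function names `minNodes` and `fib`.

```cpp
#include <cassert>

// Minimum number of nodes in a height-balanced tree of height h,
// straight from the recurrence N_h = N_{h-1} + N_{h-2} + 1.
long minNodes(int h) {
    if (h == 0) return 0;
    if (h == 1) return 1;
    return minNodes(h - 1) + minNodes(h - 2) + 1;
}

// Fibonacci number F_n with F_0 = 0 and F_1 = 1.
long fib(int n) {
    long a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        long next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

For example, `minNodes(5)` is 12 and `fib(7) - 1` is also 12, so a height-balanced tree of height 5 needs at least 12 nodes.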

Probability of Each Type of Rebalancing Rotation

• Research has shown that a random insertion requires no rebalancing, a rebalancing rotation of type LL or RR, or a rebalancing rotation of type LR or RL, with probabilities 0.5349, 0.2327, and 0.2324, respectively.

Comparison of Various Structures

Operation             Sequential List   Linked List   AVL Tree
Search for x          O(log n)          O(n)          O(log n)
Search for kth item   O(1)              O(k)          O(log n)
Delete x              O(n)              O(1) [1]      O(log n)
Delete kth item       O(n - k)          O(k)          O(log n)
Insert x              O(n)              O(1) [2]      O(log n)
Output in order       O(n)              O(n)          O(n)

[1] Doubly linked list and position of x known.
[2] Position for insertion known.

2-3 Trees

• If search trees of degree greater than 2 are used, we’ll have simpler insertion and deletion algorithms than those of AVL trees. The algorithms’ complexity is still O(log n).

• Definition: A 2-3 tree is a search tree that either is empty or satisfies the following properties:

(1) Each internal node is a 2-node or a 3-node. A 2-node has one element; a 3-node has two elements.

(2) Let LeftChild and MiddleChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3 subtree with root LeftChild have key less than dataL.key, whereas all elements in the 2-3 subtree with root MiddleChild have key greater than dataL.key.

(3) Let LeftChild, MiddleChild, and RightChild denote the children of a 3-node. Let dataL and dataR be the two elements in this node. Then, dataL.key < dataR.key; all keys in the 2-3 subtree with root LeftChild are less than dataL.key; all keys in the 2-3 subtree with root MiddleChild are less than dataR.key and greater than dataL.key; and all keys in the 2-3 subtree with root RightChild are greater than dataR.key.

(4) All external nodes are at the same level.

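2-3 Tree Example

[Figure: a 2-3 tree with root A containing 40; its left child B contains 10 and 20, and its middle child C contains 80.]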

The Height of A 2-3 Tree

• As with leftist trees, external nodes are introduced only to make it easier to define and talk about 2-3 trees. External nodes are not physically represented inside a computer.

• The number of elements in a 2-3 tree with height h is between $2^h - 1$ and $3^h - 1$. Hence, the height of a 2-3 tree with n elements is between $\lceil \log_3 (n+1) \rceil$ and $\lceil \log_2 (n+1) \rceil$.

2-3 Tree Data Structure

template<class KeyType> class Two3; // forward declaration

template<class KeyType>
class Two3Node {
friend class Two3<KeyType>;
private:
    Element<KeyType> dataL, dataR;
    Two3Node *LeftChild, *MiddleChild, *RightChild;
};

template<class KeyType>
class Two3 {
public:
    Two3(KeyType max, Two3Node<KeyType>* init = 0)
        : MAXKEY(max), root(init) {} // constructor
    Boolean Insert(const Element<KeyType>&);
    Boolean Delete(const Element<KeyType>&);
    Two3Node<KeyType>* Search(const Element<KeyType>&);
private:
    Two3Node<KeyType>* root;
    KeyType MAXKEY;
};

Searching A 2-3 Tree

• The search algorithm for binary search trees can be easily extended to obtain the search function of a 2-3 tree (Two3::Search()).

• The search function calls a function compare that compares a key x with the keys in a given node p. It returns the value 1, 2, 3, or 4, depending on whether x is less than the first key, between the first key and the second key, greater than the second key, or equal to one of the keys in node p.

Searching Function of a 2-3 Tree

template <class KeyType>
Two3Node<KeyType>* Two3<KeyType>::Search(const Element<KeyType>& x)
// Search the 2-3 tree for an element x. If the element is not in the tree, then return 0.
// Otherwise, return a pointer to the node that contains this element.
{
    for (Two3Node<KeyType>* p = root; p;)
        switch (p->compare(x)) {
            case 1: p = p->LeftChild; break;
            case 2: p = p->MiddleChild; break;
            case 3: p = p->RightChild; break;
            case 4: return p; // x is one of the keys in p
        }
    return 0; // x is not in the tree
}
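To make the role of compare concrete, here is a self-contained sketch of the same search loop. It assumes a simplified hypothetical node type `Node23` with int keys and a `twoNode` flag in place of the slides' `Element<KeyType>` and MAXKEY convention, so it illustrates the idea rather than the class above.

```cpp
#include <cassert>

// Hypothetical simplified node: int keys, and a flag instead of MAXKEY.
struct Node23 {
    int keyL, keyR;          // keyR is ignored when twoNode is true
    bool twoNode;
    Node23* leftChild;
    Node23* middleChild;
    Node23* rightChild;

    // 1: x < keyL; 2: x lies after keyL (and before keyR in a 3-node);
    // 3: x > keyR; 4: x equals one of the keys in this node.
    int compare(int x) const {
        if (x == keyL || (!twoNode && x == keyR)) return 4;
        if (x < keyL) return 1;
        if (twoNode || x < keyR) return 2;
        return 3;
    }
};

// Returns the node containing x, or nullptr if x is not in the tree.
const Node23* search(const Node23* root, int x) {
    for (const Node23* p = root; p;) {
        switch (p->compare(x)) {
            case 1: p = p->leftChild; break;
            case 2: p = p->middleChild; break;
            case 3: p = p->rightChild; break;
            case 4: return p;
        }
    }
    return nullptr;
}
```

On a tree whose root holds 40, with a left child holding 10 and 20 and a middle child holding 80, searching for 20 returns the left child and searching for 30 falls off the tree and returns `nullptr`.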

Insertion Into A 2-3 Tree

• First we use the search function to search the 2-3 tree for the key that is to be inserted.

• If the key being searched for is already in the tree, then the insertion fails, as all keys in a 2-3 tree are distinct. Otherwise, we will encounter a unique leaf node U. The node U may be in one of two states:
– The node U has only one element: the key can be inserted in this node.
– The node U already contains two elements: a new node is created. The newly created node will contain the element with the largest key from among the two elements initially in U and the element x. The element with the smallest key will be in the original node, and the element with the median key, together with a pointer to the newly created node, will be inserted into the parent of U.

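Insertion to A 2-3 Tree Example

[Figure:
(a) 70 inserted: root A contains 40; B contains 10 and 20; C contains 70 and 80.
(b) 30 inserted: root A contains 20 and 40; B contains 10; D contains 30; C contains 70 and 80.]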

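Insertion of 60 Into Figure 10.15(b)

[Figure: the new root G contains 40; its children are A, containing 20, and F, containing 70. The children of A are B, containing 10, and D, containing 30. The children of F are C, containing 60, and E, containing 80.]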

Node Split

• From the above examples, we find that each time an attempt is made to add an element into a 3-node p, a new node q is created. This is referred to as a node split.

2-3 Tree Insertion Function

template <class KeyType>
Boolean Two3<KeyType>::Insert(const Element<KeyType>& y)
{
    Two3Node<KeyType>* p;
    Element<KeyType> x = y;
    if (x.key >= MAXKEY) return FALSE; // invalid key
    if (!root) { NewRoot(x, 0); return TRUE; }
    if (!(p = FindNode(x))) { InsertionError(); return FALSE; }
    for (Two3Node<KeyType>* a = 0;;)
        if (p->dataR.key == MAXKEY) { // p is a 2-node
            p->PutIn(x, a);
            return TRUE;
        }
        else { // p is a 3-node
            Two3Node<KeyType>* olda = a;
            a = new(Two3Node<KeyType>);
            x = Split(p, x, olda, a);
            if (root == p) { // root has been split
                NewRoot(x, a);
                return TRUE;
            }
            else p = p->parent();
        }
}

Deletion From a 2-3 Tree

• If the element to be deleted is not in a leaf node, the deletion can be transformed into a deletion from a leaf node. The deleted element is replaced by either the element with the largest key in its left subtree or the element with the smallest key in its right subtree, and that replacement element is then deleted from its leaf.

• Now we can focus on the deletion on a leaf node.

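Deletion From A 2-3 Tree Example

[Figure:
(a) Initial 2-3 tree: root A contains 50 and 80; B contains 10 and 20; C contains 60 and 70; D contains 90 and 95.
(b) 70 deleted: C contains 60.
(c) 90 deleted: D contains 95.
(d) 60 deleted: root A contains 20 and 80; B contains 10; C contains 50; D contains 95.
(e) 95 deleted: root A contains 20; B contains 10; C contains 50 and 80.
(f) 50 deleted: root A contains 20; B contains 10; C contains 80.
(g) 10 deleted: the tree is the single node B containing 20 and 80.]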

Rotation and Combine

• As shown in the example, deletion may invoke a rotation or a combine operation.

• For a rotation, there are three cases:
– the leaf node p is the left child of its parent r.
– the leaf node p is the middle child of its parent r.
– the leaf node p is the right child of its parent r.

Three Rotation Cases

[Figure: a rotation moves one element from the parent r into the empty node p and one element from the sibling q up into r.
(a) p is the left child of r
(b) p is the middle child of r
(c) p is the right child of r]

Steps in Deletion From a Leaf Of a 2-3 Tree

• Step 1: Modify node p as necessary to reflect its status after the desired element has been deleted.

• Step 2:
for (; p has zero elements && p != root; p = r) {
    let r be the parent of p, and let q be the left or right sibling of p;
    if (q is a 3-node) perform a rotation
    else perform a combine;
}

• Step 3: If p has zero elements, then p must be the root. The left child of p becomes the new root, and node p is deleted.

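Combine When p is the Left Child of r

[Figure: two cases, (a) and (b), of the combine operation when p is the left child of r. An element of r and the element of the sibling q are merged into p, and q is removed.]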

2-3-4 Trees

Definition: A 2-3-4 tree is a search tree that either is empty or satisfies the following properties:

(1) Each internal node is a 2-, 3-, or 4-node. A 2-node has one element, a 3-node has two elements, and a 4-node has three elements.

(2) Let LeftChild and LeftMidChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3-4 subtree with root LeftChild have key less than dataL.key, whereas all elements in the 2-3-4 subtree with root LeftMidChild have key greater than dataL.key.

(3) Let LeftChild, LeftMidChild, and RightMidChild denote the children of a 3-node. Let dataL and dataM be the two elements in this node. Then, dataL.key < dataM.key; all keys in the 2-3-4 subtree with root LeftChild are less than dataL.key; all keys in the 2-3-4 subtree with root LeftMidChild are less than dataM.key and greater than dataL.key; and all keys in the 2-3-4 subtree with root RightMidChild are greater than dataM.key.

2-3-4 Trees (Cont.)

(4) Let LeftChild, LeftMidChild, RightMidChild, and RightChild denote the children of a 4-node. Let dataL, dataM, and dataR be the three elements in this node. Then, dataL.key < dataM.key < dataR.key; all keys in the 2-3-4 subtree with root LeftChild are less than dataL.key; all keys in the 2-3-4 subtree with root LeftMidChild are less than dataM.key and greater than dataL.key; all keys in the 2-3-4 subtree with root RightMidChild are greater than dataM.key but less than dataR.key; and all keys in the 2-3-4 subtree with root RightChild are greater than dataR.key.

(5) All external nodes are at the same level.

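2-3-4 Tree Example

[Figure: a 2-3-4 tree with root 50. The root's children are the 2-node 10 and the 3-node 70 80. The children of 10 are the leaves (5 7 8) and (30 40); the children of (70 80) are the leaves (60), (75), and (85 90 92).]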

2-3-4 Trees (Cont.)

• Similar to the 2-3 tree, the height h of a 2-3-4 tree with n elements is bounded between $\lceil \log_4 (n+1) \rceil$ and $\lceil \log_2 (n+1) \rceil$.

• 2-3-4 trees have an advantage over 2-3 trees in that insertion and deletion can be performed by a single root-to-leaf pass rather than by a root-to-leaf pass followed by a leaf-to-root pass.

• So the corresponding algorithms for 2-3-4 trees are simpler than those for 2-3 trees.

• In addition, 2-3-4 trees can be represented efficiently as binary trees (called red-black trees). This results in a more efficient utilization of space.

Top-Down Insertion

• If the leaf node into which the element is to be inserted is a 2-node or 3-node, then it’s easy. Simply insert the element into the leaf node.

• If the leaf node into which the element is to be inserted is a 4-node, then this node splits and a backward (leaf-to-root) pass is initiated. This backward pass terminates when either a 2-node or 3-node is encountered, or when the root is split.

• To avoid the backward pass, we split 4-nodes on the way down the tree. As a result, the leaf node into which the insertion is to be made is guaranteed to be a 2- or 3-node.

• There are three different cases to consider for a 4-node:
(1) It is the root of the 2-3-4 tree.
(2) Its parent is a 2-node.
(3) Its parent is a 3-node.

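Transformation When the 4-Node Is The Root

[Figure: the root 4-node (x y z) with subtrees a, b, c, d becomes a root y with children x (subtrees a, b) and z (subtrees c, d). The height increases by one.]

Transformation When the 4-Node is the Child of a 2-Node

[Figure: (a) the 4-node (w x y) is the left child of the 2-node z; after the split the parent is (x z) with children w, y, and the old right subtree e. (b) the 4-node (x y z) is the right child of the 2-node w; after the split the parent is (w y) with children the old left subtree a, x, and z.]

Transformation When the 4-Node is the Left Child of a 3-Node

[Figure: the 4-node (v w x) is the left child of the 3-node (y z); after the split the parent is (w y z) with children v, x, and the old subtrees e and f.]

Transformation When the 4-Node is the Left Middle Child of a 3-Node

[Figure: the 4-node (w x y) is the left middle child of the 3-node (v z); after the split the parent is (v x z) with children the old subtree a, w, y, and the old subtree f.]

Transformation When the 4-Node is the Right Middle Child of a 3-Node

[Figure: the 4-node (x y z) is the right middle child of the 3-node (v w); after the split the parent is (v w y) with children the old subtrees a and b, x, and z.]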

Top-Down Deletion

• The deletion of an arbitrary element from a 2-3-4 tree may be reduced to the deletion of an element that is in a leaf node. If the element to be deleted is in a leaf that is a 3-node or 4-node, then its deletion leaves behind a 2-node or a 3-node. No restructuring is required.

• To avoid a backward restructuring pass, it is necessary to ensure that at the time of deletion, the element to be deleted is in a 3-node or a 4-node. This is accomplished by restructuring the 2-3-4 tree during the downward pass.

Top-Down Deletion (Cont.)

• Suppose the search is presently at node p and will move next to node q. The following cases need to be considered:

(1) p is a leaf: The element to be deleted is either in p or not in the tree.

(2) q is not a 2-node. In this case, the search moves to q, and no restructuring is needed.

(3) q is a 2-node, and its nearest sibling, r, is also a 2-node.
– If p is a 2-node, p must be the root, and p, q, r are combined by performing, in reverse, the splitting transformation for a root 4-node.
– If p is a 3-node or a 4-node, perform, in reverse, the 4-node splitting transformation.

(4) q is a 2-node, and its nearest sibling, r, is a 3-node.

(5) q is a 2-node, and its nearest sibling, r, is a 4-node.

Deletion Transformation When the Nearest Sibling is a 3-Node

[Figure:
(a) q is the left child of a 3-node: p = (w z), q = (v), r = (x y) becomes p = (x z), q = (v w), r = (y).
(b) q is the left child of a 4-node: p = (v y z), q = (u), r = (w x) becomes p = (w y z), q = (u v), r = (x).]

Red-Black Trees

• A red-black tree is a binary tree representation of a 2-3-4 tree.

• The child pointers of a node in a red-black tree are of two types: red and black.
– If the child pointer was present in the original 2-3-4 tree, it is a black pointer.
– Otherwise, it is a red pointer.

• A node in a 2-3-4 tree is transformed into its red-black representation as follows:
(1) A 2-node p is represented by the RedBlackNode q with both its color data members black, and data = dataL; q->LeftChild = p->LeftChild, and q->RightChild = p->LeftMidChild.
(2) A 3-node p is represented by two RedBlackNodes connected by a red pointer. There are two ways in which this may be done.
(3) A 4-node is represented by three RedBlackNodes, one of which is connected to the remaining two by red pointers.

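Transforming a 3-Node into Two RedBlackNodes

[Figure: the 3-node (x y) with subtrees a, b, c becomes either y with a red left pointer to x (x has subtrees a and b, y has right subtree c), or x with a red right pointer to y (x has left subtree a, y has subtrees b and c).]

Transforming a 4-Node into Three RedBlackNodes

[Figure: the 4-node (x y z) with subtrees a, b, c, d becomes y with red pointers to x and z; x has subtrees a and b, and z has subtrees c and d.]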

Red-Black Trees (Cont.)

• One may verify that a red-black tree satisfies the following properties:
(P1) It is a binary search tree.
(P2) Every root-to-external-node path has the same number of black links.
(P3) No root-to-external-node path has two or more consecutive red pointers.

• An alternate definition of a red-black tree is given in the following:
(Q1) It is a binary search tree.
(Q2) The rank of each external node is 0.
(Q3) Every internal node that is the parent of an external node has rank 1.
(Q4) For every node x that has a parent p(x), rank(x) ≤ rank(p(x)) ≤ rank(x) + 1.
(Q5) For every node x that has a grandparent gp(x), rank(x) < rank(gp(x)).

Red-Black Trees (Cont.)

• Each node x of a 2-3-4 tree T is represented by a collection of nodes in its corresponding red-black tree. All nodes in this collection have a rank equal to height(T) – level(x) + 1.

• Each time there is a rank change in a path from the root of the red-black tree, there is a level change in the corresponding 2-3-4 tree.

• Black pointers go from a node of a certain rank to one whose rank is one less.

• Red pointers connect two nodes of the same rank.
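Properties P2 and P3 can be checked with one recursive pass over the tree. This is a minimal sketch, assuming a hypothetical `RbNode` layout in which each node stores the color of its two child pointers; it does not check P1 (the search-tree ordering), and it additionally rejects a red pointer to an external node, since red pointers only join nodes that came from the same 2-3-4 node.

```cpp
#include <cassert>

// Hypothetical node layout: each child pointer carries its own color.
struct RbNode {
    int key;
    RbNode* left;
    RbNode* right;
    bool leftRed;    // true if the left child pointer is red
    bool rightRed;   // true if the right child pointer is red
};

// Returns the number of black pointers on every path from t down to an
// external node, or -1 if P2 or P3 is violated below t.
// `viaRed` says whether the pointer leading to t was red.
int blackLength(const RbNode* t, bool viaRed) {
    if (!t) return 0;
    // A red pointer must lead to a real node, never to an external node.
    if ((t->leftRed && !t->left) || (t->rightRed && !t->right)) return -1;
    // P3: no two consecutive red pointers on a path.
    if (viaRed && (t->leftRed || t->rightRed)) return -1;
    int l = blackLength(t->left, t->leftRed);
    int r = blackLength(t->right, t->rightRed);
    if (l < 0 || r < 0) return -1;
    l += t->leftRed ? 0 : 1;
    r += t->rightRed ? 0 : 1;
    return l == r ? l : -1;   // P2: same black count on both sides
}
```

A 4-node (x y z) represented as y with red pointers to x and z passes the check with one black pointer per path, which also equals its rank under the alternate definition. Hanging a further red pointer off x makes the check fail because of P3.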

Lemma 10.1: Every red-black tree RB with n (internal) nodes satisfies the following:

(1) $\mathrm{height}(RB) \le 2 \lceil \log_2 (n+1) \rceil$
(2) $\mathrm{height}(RB) \le 2 \, \mathrm{rank}(RB)$
(3) $\mathrm{rank}(RB) \le \lceil \log_2 (n+1) \rceil$

Searching a Red-Black Tree

• Since a red-black tree is a binary search tree, the search operation can be done by following the same search algorithm used in a binary search tree.

Red-Black Tree Insertion

• An insertion to a red-black tree can be done in two ways: top-down or bottom-up.

• In a top-down insertion, a single root-to-leaf pass is made over the red-black tree.

• A bottom-up insertion makes both a root-to-leaf and a leaf-to-root pass.

Top-Down Insertion

• We can detect a 4-node simply by looking for nodes q for which both color data members are red. Such nodes, together with their two children, form a 4-node.

• When such a 4-node q is detected, the following transformations are needed:
(1) Change both the colors of q to black.
(2) If q is the left (right) child of its parent, then change the left (right) color of its parent to red.
(3) If we now have two consecutive red pointers, then one is from the grandparent, gp, of q to the parent, p, of q and the other from p to q. Let the direction of the first of these be X and that of the second be Y. Depending on whether XY = LL, LR, RL, or RR, transformations are performed to remove violations.

Transformation for a Root 4-Node

[Figure: the root y with red pointers to x and z, before and after the color change; the subtrees a, b, c, d are unchanged.]

Transformation for a 4-Node That is the Child of a 2-Node

[Figure: two cases, with the 4-node as the left child and as the right child of the 2-node, each shown before and after the color change.]

Transformation for a 4-Node That is the Left Child of a 3-Node

[Figure: (a) LL rotation; (b) color change.]

Transformation for a 4-Node That is the Left Middle Child of a 3-Node

[Figure: (a) LR rotation; (b) RL rotation.]

Bottom-Up Insertion

• In bottom-up insertion, the element to be inserted is added as the appropriate child of the node last encountered. A red pointer is used to join the new node to its parent.

• However, this might violate the red-black tree definition, since there might now be two consecutive red pointers on a path.

• To resolve this problem, we need to perform a color change or a rotation.

• Let the two consecutive red pointers be <p, q> and <q, r>, and let s be the sibling (if any) of node q. The violation is classified as an XYZ violation, where X=L if <p, q> is a left pointer, and X=R otherwise; Y=L if <q, r> is a left pointer, and Y=R otherwise; and Z=r if s≠0 and <p, s> is a red pointer, and Z=b otherwise.

Bottom-Up Insertion (Cont.)

• The color changes potentially propagate the violation up the tree and may need to be reapplied several times. Note that the color change would not affect the number of black pointers on a root-to-external path.

LLr and LRr Color Changes for Bottom-Up Insertion

[Figure: (a) LLr color change; (b) LRr color change. Each panel shows the subtree before and after the change.]

LLb and LRb Rotations for Bottom-Up Insertion

[Figure: (a) LLb rotation; (b) LRb rotation. Each panel shows the subtree before and after the rotation.]

Comparison of Top-Down and Bottom-Up

• Comparing the top-down and the bottom-up insertion methods, the top-down method may perform O(log n) rotations, whereas the bottom-up method performs at most one rotation.

• Both methods may perform O(log n) color changes. However, the top-down method can be used in pipeline mode to perform several insertions in sequence. The bottom-up cannot be used in this way.

Deletion from a Red-Black Tree

• If the leaf node to be deleted is the root, then the result is an empty red-black tree.

• If the leaf node to be deleted has a red pointer to its parent, then it can be deleted directly because it is part of a 3-node or 4-node.

• If the leaf node to be deleted has a black pointer, then the leaf is a 2-node. Deletion from a 2-node requires a backward restructuring pass. This is not desirable.

• To avoid deleting a 2-node, insertion transformation is used in the reverse direction to ensure that the search for the element to be deleted moves down a red pointer.

• Since most of the insertion and deletion transformations can be accomplished by color changes and require no pointer changes or data shifts, these operations take less time using red-black trees than when a 2-3-4 tree is represented using nodes of type Two34Node.

Joining and Splitting Red-Black Trees

• In binary search tree we have the following operations defined: ThreeWayJoin, TwoWayJoin, and Split. These operations can be performed on red-black trees in logarithmic time.

Large Search Tree That Does Not Fit in Memory

• The aforementioned balanced search trees (AVL trees, 2-3 trees, 2-3-4 trees) only work fine when the table can fit in the internal memory.

• If the table is larger than the internal memory, then a search may require O(h) disk accesses, where h is the height of the tree.

• Since disk accesses take a significant amount of time compared to internal memory accesses, it is desirable to develop a structure that minimizes the number of disk accesses.

M-Way Search Tree

Definition: An m-way search tree either is empty or satisfies the following properties:

(1) The root has at most m subtrees and has the following structure:
n, A0, (K1, A1), (K2, A2), …, (Kn, An)
where the Ai, 0 ≤ i ≤ n < m, are pointers to subtrees, and the Ki, 1 ≤ i ≤ n < m, are key values.
(2) Ki < Ki+1, 1 ≤ i < n.
(3) All key values in the subtree Ai are less than Ki+1 and greater than Ki, 0 < i < n.
(4) All key values in the subtree An are greater than Kn, and those in A0 are less than K1.
(5) The subtrees Ai, 0 ≤ i ≤ n, are also m-way search trees.

Searching an m-Way Search Tree

• Suppose we want to search an m-way search tree T for the key value x. Assume T resides on a disk. By searching the keys of the root, we determine i such that Ki ≤ x < Ki+1 (for convenience, take K0 = −∞ and Kn+1 = +∞).
– If x = Ki, the search is complete.
– If x ≠ Ki, x must be in the subtree Ai if x is in T.
– We then proceed to retrieve the root of the subtree Ai and continue the search until we find x or determine that x is not in T.

Searching an m-Way Search Tree (Cont.)

• The maximum number of nodes in a tree of degree m and height h is

$\sum_{0 \le i \le h-1} m^i = (m^h - 1)/(m - 1)$

• Each node holds at most m − 1 keys. Therefore, the maximum number of keys in an m-way search tree of height h is $m^h - 1$.

• To achieve a performance close to that of the best m-way search trees for a given number of keys n, the search tree must be balanced.

B-Tree

Definition: A B-tree of order m is an m-way search tree that either is empty or satisfies the following properties:

(1) The root node has at least two children.
(2) All nodes other than the root node and failure nodes have at least $\lceil m/2 \rceil$ children.
(3) All failure nodes are at the same level.

B-Tree (Cont.)

• Note that a 2-3 tree is a B-tree of order 3 and a 2-3-4 tree is a B-tree of order 4.

• Also, all B-trees of order 2 are full binary trees.

• A B-tree of order m and height l has at most $m^l - 1$ keys.

• For a B-tree of order m and height l, the minimum number of keys N in such a tree satisfies $N \ge 2\lceil m/2 \rceil^{l-1} - 1$, $l \ge 1$.

• If there are N key values in a B-tree of order m, then all nonfailure nodes are at levels less than or equal to l, where $l \le \log_{\lceil m/2 \rceil}\{(N+1)/2\} + 1$. The maximum number of accesses that have to be made for a search is l.

• For example, for a B-tree of order m = 200, an index with $N \le 2 \times 10^6 - 2$ will have l ≤ 3.
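The height bound can be checked numerically. The sketch below uses integer arithmetic instead of a floating-point logarithm; `min_keys` and `max_height` are illustrative names, and the minimum key count is taken to be 2⌈m/2⌉^(l−1) − 1 for a B-tree of order m and height l.

```python
def min_keys(m, l):
    """Fewest keys a B-tree of order m and height l >= 1 can hold."""
    half = (m + 1) // 2                  # ceil(m / 2)
    return 2 * half ** (l - 1) - 1


def max_height(m, n_keys):
    """Largest height l whose minimum key count does not exceed n_keys (n_keys >= 1)."""
    l = 1
    while min_keys(m, l + 1) <= n_keys:
        l += 1
    return l


# Order 200: up to 2*10^6 - 2 keys still fit in three levels.
print(max_height(200, 2 * 10**6 - 2))   # 3
print(max_height(200, 2 * 10**6 - 1))   # 4
```

One more key than 2×10^6 − 2 is exactly the minimum for a four-level tree of order 200, which is where the bound in the example comes from.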

The Choice of m

• B-trees of high order are desirable since they result in a reduction in the number of disk accesses.

• If the index has N entries, then a B-tree of order m = N+1 has only one level. But this is not reasonable, since all N entries cannot fit in the internal memory.

• In selecting a reasonable choice for m, we need to keep in mind that we are really interested in minimizing the total amount of time needed to search the B-tree for a value x. This time has two components:

(1) the time for reading in the node from the disk

(2) the time needed to search this node for x.

The Choice of m (Cont.)

• Assume a node of a B-tree of order m is of a fixed size and is large enough to accommodate n, A0, and m−1 triples (Ki, Ai, Bi), 1 ≤ i < m.

• If the Ki are at most α characters long and the Ai and Bi are each β characters long, then the size of a node is about m(α+2β) characters. Then the time to access a node is

ts + tl + m(α+2β) tc = a + bm

where a = ts + tl = seek time + latency time, b = (α+2β) tc, and tc = transmission time per character.

• If binary search is used to search each node of the B-tree, then the internal processing time per node is c log2 m + d for some constants c and d.

• The total processing time per node is τ = a + bm + c log2 m + d.

• The maximum search time is

$f \cdot \log_2\{(N+1)/2\} \cdot \left\{ \frac{a+d}{\log_2 m} + \frac{bm}{\log_2 m} + c \right\}$

where f is some constant.

Figure 10.36: Values of (35+0.06m)/log2 m

m       Search time (sec)
2       35.12
4       17.62
8       11.83
16      8.99
32      7.38
64      6.47
128     6.10
256     6.30
512     7.30
1024    9.64
2048    14.35
4096    23.40
8192    40.50
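The tabulated values can be reproduced with a few lines of Python. The constants 35 and 0.06 are taken from the figure title; the function and parameter names (`search_time`, `a_plus_d`, `b`) are illustrative, and only powers of two up to 8192 are examined, as in the table.

```python
import math


def search_time(m, a_plus_d=35.0, b=0.06):
    """Value of (a + d + b*m) / log2(m), the m-dependent part of the search time."""
    return (a_plus_d + b * m) / math.log2(m)


orders = [2 ** k for k in range(1, 14)]            # 2, 4, ..., 8192
for m in orders:
    print(f"{m:5d}  {search_time(m):6.2f}")

best = min(orders, key=search_time)                # order with the smallest value
print("best power of two:", best)
```

Among the orders listed, the value first falls as m grows (fewer levels) and then rises again (longer node transfers), so a moderate order minimizes the total.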

Figure 10.37: Plot of (35+0.06m)/log2 m

[Plot: x-axis m, with marks at 50, 125, and 400; y-axis total maximum search time, with marks at 5.7 and 6.8]

Insertion into a B-Tree

• Instead of using the 2-3-4 tree's top-down insertion, we generalize the two-pass insertion algorithm for 2-3 trees. Top-down insertion splits many nodes, and each time we change a node, it has to be written back to disk, which increases the number of disk accesses.

• The insertion algorithm for B-trees of order m first performs a search to determine the leaf node p into which the new key is to be inserted.

– If the insertion of the new key into p results in p having m keys, the node p is split.

– Otherwise, the new p is written to the disk, and the insertion is complete.

• Assume that the h nodes read in during the top-down pass can be saved in memory so that they need not be retrieved from disk during the bottom-up pass. Then the number of disk accesses for an insertion is at most h (downward pass) + 2(h−1) (nonroot splits) + 3 (root split) = 3h+1.

• The average number of disk accesses is approximately h+1 for large m.
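The split step and the worst-case access count can be illustrated as follows. This sketch assumes the usual split rule, in which key ⌈m/2⌉ of the overfull node moves up to the parent and the remaining keys form two nodes; only key lists are modeled (no child pointers, no disk I/O), and both function names are hypothetical.

```python
def split_keys(keys, m):
    """Split an overfull node holding m sorted keys.

    Returns (left, up, right): `up` is key number ceil(m/2), which moves to
    the parent; `left` and `right` become two separate nodes."""
    assert len(keys) == m
    mid = (m + 1) // 2                   # ceil(m / 2), a 1-based key index
    return keys[:mid - 1], keys[mid - 1], keys[mid:]


def max_insert_accesses(h):
    """Worst case: h reads going down, 2 writes per nonroot split,
    and 3 writes when the root itself splits."""
    return h + 2 * (h - 1) + 3


print(split_keys([25, 28, 30], 3))       # ([25], 28, [30])
print(max_insert_accesses(3))            # 10, i.e. 3h + 1
```

Both halves keep at least ⌈m/2⌉ − 1 keys, so the resulting nodes still satisfy the B-tree definition.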

Figure 10.38: B-Trees of Order 3

[Three example trees: (a) p = 1, s = 0; (b) p = 3, s = 1; (c) p = 4, s = 2]

p is the number of nonfailure nodes in the final B-tree with N entries. s is the number of splits.

Deletion from a B-Tree

• The deletion algorithm for B-trees is also a generalization of the deletion algorithm for 2-3 trees.

• First, we search for the key x to be deleted.

– If x is found in a node z that is not a leaf, then the position occupied by x in z is filled by a key from a leaf node of the B-tree.

– Suppose that x is the ith key in z (x = Ki). Then x may be replaced by either the smallest key in the subtree Ai or the largest key in the subtree Ai-1. Since both of these keys are in leaf nodes, the deletion of x from a nonleaf node is transformed into a deletion from a leaf.

Deletion from a B-Tree (Cont.)

• There are four possible cases when deleting from a leaf node p.

– In the first case, p is also the root. If the root is left with at least one key, the changed root is written back to disk. Otherwise, the B-tree is empty following the deletion.

– In the second case, following the deletion, p has at least $\lceil m/2 \rceil - 1$ keys. The modified leaf is written back to disk.

– In the third case, p has $\lceil m/2 \rceil - 2$ keys, and its nearest sibling, q, has at least $\lceil m/2 \rceil$ keys. Only one of p's nearest siblings is checked. p is deficient, as it has one less than the minimum number of keys required. q has more keys than the minimum required. As in the case of a 2-3 tree, a rotation is performed. In this rotation, the number of keys in q decreases by one, and the number in p increases by one.

– In the fourth case, p has $\lceil m/2 \rceil - 2$ keys, and q has $\lceil m/2 \rceil - 1$ keys. p is deficient and q has the minimum number of keys permissible for a nonroot node. Nodes p and q and the key Ki that lies between them in their parent are combined to form a single node.
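The four cases can be summarized as a small decision function. This is only a sketch of the case analysis, assuming the minimum key count for a nonroot node is ⌈m/2⌉ − 1; the function name, its parameters, and the returned labels are hypothetical, and no keys are actually moved.

```python
def deletion_case(m, p_keys_after, q_keys, p_is_root):
    """Classify a deletion from leaf p in a B-tree of order m.

    p_keys_after: keys left in p after the deletion
    q_keys:       keys in the one nearest sibling q that is checked
    """
    min_keys = (m + 1) // 2 - 1          # ceil(m/2) - 1
    if p_is_root:
        return "case 1: write root back, or tree becomes empty"
    if p_keys_after >= min_keys:
        return "case 2: write leaf back"
    if q_keys > min_keys:                # q has at least ceil(m/2) keys
        return "case 3: rotation"
    return "case 4: combine p, q, and the in-between parent key"


# Order 5 (minimum 2 keys per nonroot node):
print(deletion_case(5, 1, 3, False))     # case 3: rotation
print(deletion_case(5, 1, 2, False))     # case 4: combine ...
```

A combine removes a key from the parent, so the same analysis may have to be repeated one level up.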

Figure 10.39: B-Tree of Order 5

[Each node is shown as its key count n followed by its keys. Root: 2 | 20 35. Leaves: 2 | 10 15; 2 | 25 30; 3 | 40 45 50]

Splay Trees

• If we are interested only in amortized complexity rather than worst-case complexity, simpler structures can be used for search trees.

• By using splay trees, we can achieve O(log n) amortized time per operation.

• A splay tree is a binary search tree in which each search, insert, delete, and join operation is performed in the same way as in an ordinary binary search tree, except that each of these operations is followed by a splay.

• Before a split, a splay is performed. This makes the split very easy to perform.

• A splay consists of a sequence of rotations.

Starting Node of Splay Operation

• The start node for a splay is obtained as follows:

(1) search: The splay starts at the node containing the element being sought.

(2) insert: The start node for the splay is the newly inserted node.

(3) delete: The parent of the physically deleted node is used as the start node for the splay. If this node is the root, then no splay is done.

(4) ThreeWayJoin: No splay is done.

(5) split: Suppose that we are splitting with respect to the key i and that key i is actually present in the tree. We first perform a splay at the node that contains i and then split the tree.

Splay Operation

• Splay rotations are performed along the path from the start node to the root of the binary search tree.

• Splay rotations are similar to those performed for AVL trees and red-black trees.

• Let q be the node at which the splay is being performed. The following steps define a splay:

(1) If q is either null (0) or the root, then the splay terminates.

(2) If q has a parent p, but no grandparent, then the rotation of Figure 10.42 is performed, and the splay terminates.

(3) If q has a parent, p, and a grandparent, gp, then the rotation is classified as LL (p is the left child of gp, and q is the left child of p), LR (p is the left child of gp, and q is the right child of p), RR, or RL. The splay is repeated at the new location of q.
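The three steps can be sketched as a bottom-up splay over nodes with parent pointers. This is an illustrative implementation under common conventions, not the slides' own code: LL and RR are realized by rotating at the grandparent first and then at the parent, LR and RL by two rotations at q, and the helper names (`Node`, `rotate_up`, `bst_insert`, `inorder`) are assumptions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate_up(q):
    """Single rotation that moves q above its parent p."""
    p, gp = q.parent, q.parent.parent
    if p.left is q:                      # q is a left child: rotate right
        p.left, q.right = q.right, p
        if p.left:
            p.left.parent = p
    else:                                # q is a right child: rotate left
        p.right, q.left = q.left, p
        if p.right:
            p.right.parent = p
    p.parent, q.parent = q, gp
    if gp is not None:                   # reattach q where p used to hang
        if gp.left is p:
            gp.left = q
        else:
            gp.right = q


def splay(q):
    """Move q to the root; returns the new root (or None if q is None)."""
    if q is None:
        return None                      # step (1)
    while q.parent is not None:
        p, gp = q.parent, q.parent.parent
        if gp is None:
            rotate_up(q)                 # step (2): parent but no grandparent
        elif (gp.left is p) == (p.left is q):
            rotate_up(p)                 # step (3), LL or RR
            rotate_up(q)
        else:
            rotate_up(q)                 # step (3), LR or RL
            rotate_up(q)
    return q


def bst_insert(root, key):
    """Ordinary BST insert; returns (root, new_node)."""
    node = Node(key)
    if root is None:
        return node, node
    cur = root
    while True:
        side = "left" if key < cur.key else "right"
        nxt = getattr(cur, side)
        if nxt is None:
            setattr(cur, side, node)
            node.parent = cur
            return root, node
        cur = nxt


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

After `splay(q)` the node q is the root and the inorder key sequence is unchanged, which is the property that makes a subsequent split trivial.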

Splay Amortized Cost

• Note that all rotations move q up the tree and that, following a splay, q becomes the new root of the search tree.

• As a result, splitting the tree with respect to a key, i, is done simply by performing a splay at i and then splitting at the root.

• The analysis for splay trees uses a potential technique. Let P0 be the initial potential of the search tree, and let Pi be its potential following the ith operation in a sequence of m operations. The amortized time for the ith operation is defined as

(actual time for the ith operation) + Pi − Pi-1

So the actual time for the ith operation is

(amortized time for the ith operation) + Pi-1 − Pi

Hence, the actual time needed to perform the m operations in the sequence is

$\sum_{i=1}^{m} (\text{amortized time for the } i\text{th operation}) + P_0 - P_m$

Figure 10.42: Rotation when q is Right Child and Has no Grandparent

[Diagram: nodes p and q before and after the rotation; a, b, and c are subtrees]

Figure 10.43: RR and RL Rotations

[Diagrams: nodes gp, p, and q with subtrees a, b, c, and d, before and after each rotation. (a) Type RR; (b) Type RL]

Figure 10.44: Rotations in a Splay Beginning at Shaded Node

[Diagrams of a search tree with nodes 1 through 9 and subtrees a through j: (a) Initial search tree; (b) After RR rotation; (c) After LL rotation; (d) After LR rotation; (e) After RL rotation]

Upper Bound of Splay’s Amortized Cost

• Let the size, s(i), of the subtree with root i be the total number of nodes in it.

• The rank, r(i), of node i is equal to $\lfloor \log_2 s(i) \rfloor$.

• The potential of the tree is $\sum_i r(i)$.

• Lemma 10.2: Consider a binary search tree that has n elements/nodes. The amortized cost of a splay operation that begins at node q is at most $3(\lfloor \log_2 n \rfloor - r(q)) + 1$.
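To make the definitions concrete, the following sketch computes sizes, ranks, and the potential of a small tree. It assumes the rank is the floor of log2 of the subtree size and that the lemma's bound is 3(⌊log2 n⌋ − r(q)) + 1; trees are represented as nested tuples `(left, key, right)`, and all names are illustrative.

```python
def floor_log2(s):
    """floor(log2 s) for an integer s >= 1."""
    return s.bit_length() - 1


def size_and_potential(t):
    """Return (s, P) for tree t = (left, key, right) or None:
    s is the number of nodes, P the sum of the ranks of all nodes."""
    if t is None:
        return 0, 0
    s_left, p_left = size_and_potential(t[0])
    s_right, p_right = size_and_potential(t[2])
    s = s_left + s_right + 1
    return s, p_left + p_right + floor_log2(s)   # add rank of this root


def splay_bound(n, rank_q):
    """Upper bound on the amortized cost of a splay that starts at q."""
    return 3 * (floor_log2(n) - rank_q) + 1


chain = (None, 1, (None, 2, (None, 3, None)))        # right-skewed, 3 nodes
balanced = ((None, 1, None), 2, (None, 3, None))     # same keys, balanced
print(size_and_potential(chain))      # (3, 2)
print(size_and_potential(balanced))   # (3, 1)
```

The skewed tree has the higher potential, which is what lets an expensive splay on a deep node be paid for by the drop in potential it causes.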

Splay Tree Complexity

Theorem 10.1: The total time needed for a sequence of m search, insert, delete, join, and split operations performed on a collection of initially empty splay trees is O(m log n), where n, n > 0, is the number of inserts in the sequence.

Digital Search Trees

• A digital search tree is a binary tree in which each node contains one element. The element-to-node assignment is determined by the binary representation of the element keys.

• Suppose we number the bits in the binary representation of a key from left to right beginning at one. Then bit one of 1000 is 1. All keys in the left subtree of a node at level i have bit i equal to 0, whereas those in the right subtree of nodes at this level have bit i equal to 1.

Figure 10.45: Digital Search Trees

[Diagrams: (a) Initial tree with nodes a through f holding the keys 1000, 0010, 0001, 0000, 1001, 1100; (b) After 0011 inserted, as a new node g]

Digital Search Trees (Cont.)

• The digital search tree functions to search and insert are quite similar to the corresponding functions for binary search trees.

• During insert or search, the subtree to move to is determined by a bit in the search key rather than by the result of the comparison of the search key and the key in the current node.

• Deleting an item in a leaf node is easy: the node is simply removed.

• To delete the key in a non-leaf node, the deleted item is replaced by a value from any leaf in its subtree, and that leaf is removed.

• Each of these operations can be performed in O(h) time, where h is the height of the digital search tree.

• If each key in a digital search tree has KeySize bits, then the height of the digital search tree is at most KeySize +1.
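Search and insertion in a digital search tree can be sketched as follows. Keys are modeled as Python integers of `key_size` bits, bits are numbered from the left starting at one as described earlier, and the names `Node`, `bit`, `insert`, and `search` are illustrative. The insertion order used in the usage lines is an assumption chosen to be consistent with the keys of Figure 10.45.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bit(key, i, key_size):
    """Bit i of key, counting from the left and starting at 1."""
    return (key >> (key_size - i)) & 1


def insert(root, key, key_size=4):
    """Insert key (0 <= key < 2**key_size); returns the root."""
    if root is None:
        return Node(key)
    node, i = root, 1
    while node.key != key:               # equal key: already present
        side = "right" if bit(key, i, key_size) else "left"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            break
        node, i = child, i + 1           # a node at level i branches on bit i
    return root


def search(root, key, key_size=4):
    """Return the node holding key, or None."""
    node, i = root, 1
    while node is not None and node.key != key:
        node = node.right if bit(key, i, key_size) else node.left
        i += 1
    return node


root = None
for k in [0b1000, 0b0010, 0b1001, 0b0001, 0b1100, 0b0000, 0b0011]:
    root = insert(root, k)
```

Only one key comparison (for equality) is made per node; the direction to move is read directly from a bit of the search key.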

Binary Tries

• When we are dealing with very long keys, the cost of a key comparison is high.

• The number of key comparisons per search can be reduced to one by using a related structure called Patricia (Practical Algorithm To Retrieve Information Coded In Alphanumeric).

• Three steps to develop the structure:

– First, introduce a structure called a binary trie.

– Then, transform binary tries into compressed binary tries.

– Finally, from compressed binary tries we obtain Patricia.

• A binary trie is a binary tree that has two kinds of nodes: branch nodes and element nodes.

– A branch node has two data members, LeftChild and RightChild, but no data member named data.

– An element node has the single data member data.

• Branch nodes are used to build a binary tree search structure similar to that of a digital search tree.

Figure 10.46: Example of A Binary Trie

[Diagram: binary trie whose element nodes hold the keys 0000, 0001, 0010, 1000, 1001, 1100]

Compressed Binary Trie

• Observe that a successful search in a binary trie always ends at an element node.

• Once this element node is reached, key comparison is performed.

• Observe from Figure 10.46 that there are some degree-one branch nodes in the trie. We can use another data member, BitNumber, to eliminate all degree-one branch nodes from the trie.

• The BitNumber gives the bit number of the key that is to be used at this node.

• A binary trie that has been modified to contain no branch nodes of degree one is called a compressed binary trie.

Figure 10.47: Binary Trie of Figure 10.46 With Degree-One Nodes Eliminated

[Diagram: element nodes hold the keys 0000, 0001, 0010, 1000, 1001, 1100; the branch nodes carry the BitNumber values 1, 2, 3, 4, 4]

Patricia

• Compressed binary tries may be represented using nodes of a single type. The new nodes, called augmented branch nodes, are the original branch nodes augmented by the data member data. The resulting structure is called Patricia and is obtained from a compressed binary trie in the following way:

(1) Replace each branch node by an augmented branch node.

(2) Eliminate the element nodes.

(3) Store the data previously in the element nodes in the data data members of the augmented branch nodes. Since every nonempty compressed binary trie has one less branch node than it has element nodes, it is necessary to add one augmented branch node. This node is called the head node. The remaining structure is the left subtree of the head node. The head node has BitNumber equal to 0. The assignment of data to the augmented branch nodes is done in such a way that the BitNumber in the augmented branch node is less than or equal to that in the parent of the element node that contained this data.

(4) Replace the original pointers to element nodes by pointers to the respective augmented branch nodes.

Figure 10.48: An Example of Patricia

[Diagram: augmented branch nodes holding the keys 1100, 0000, 0010, 0001, 1001, 1000, with BitNumber values 0, 1, 3, 4, 2, 4]

Figure 10.49: Insertion Into Patricia

[Diagrams of the structure after each insertion: (a) 1000 inserted; (b) 0010 inserted; (c) 1001 inserted; (d) 1100 inserted; (e) 0000 inserted; (f) 0001 inserted]

Analysis of Patricia Insertion

• The complexity of Patricia insertion is O(h) where h is the height of the Patricia.

• h can be as large as min{KeySize+1, n}, where KeySize is the number of bits in a key and n is the number of elements.

• When the keys are uniformly distributed, the height is O(log n).

Tries

• A trie is an index structure that is particularly useful when key values are of varying size.

• It is the generalization of the binary trie.

• A trie is a tree of degree m ≥ 2 in which the branching at any level is determined not by the entire key value, but by a portion of it.

• A trie contains two types of nodes: element nodes and branch nodes.

– An element node has only a data data member.

– A branch node contains pointers to subtrees.

• Assume each character is one of the 26 letters of the alphabet. Then a branch node has 27 pointer data members. The extra data member is used for the blank character (denoted as b), which is used to terminate all keys.
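A trie of this kind can be sketched compactly in Python. In this illustration a branch node is a `dict` from character to child (instead of a fixed array of 27 pointers), an element node is simply the key string, and a key is pushed down only as far as needed to distinguish it from the others. The names `BLANK`, `insert`, and `search` are assumptions, and keys are assumed not to contain blanks themselves.

```python
BLANK = " "   # terminating character appended to every key


def insert(node, key, depth=0):
    """Insert key into the branch node `node` (a dict), sampling character
    number depth+1 of key + BLANK at this level."""
    ch = (key + BLANK)[depth]
    child = node.get(ch)
    if child is None:
        node[ch] = key                   # new element node
    elif isinstance(child, dict):
        insert(child, key, depth + 1)    # descend into existing branch node
    elif child != key:
        # Collision with an element node: replace it by a branch node and
        # push both keys one level down.
        branch = {}
        node[ch] = branch
        insert(branch, child, depth + 1)
        insert(branch, key, depth + 1)


def search(root, key):
    """Return True if key is stored in the trie."""
    node = root
    for ch in key + BLANK:
        child = node.get(ch)
        if not isinstance(child, dict):
            return child == key          # element node (or missing pointer)
        node = child
    return False


root = {}
for word in ["to", "together"]:
    insert(root, word)
print(root)   # {'t': {'o': {' ': 'to', 'g': 'together'}}}
```

The blank is what separates "to" from "together": without a terminating character, a key that is a prefix of another key would have no branch of its own.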

Figure 10.50: Trie Created Using Characters Of Key Value From Left To Right, One At A Time

[Diagram: trie holding the keys bluebird, bunting, cardinal, chickadee, godwit, goshawk, gull, oriole, thrasher, thrush, wren; the branches are labeled with the sampled characters]

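Figure 10.51: Trie Showing Need For A Terminal Character

[Diagram: trie holding the keys to and together; after branching on t and o, the blank character b leads to "to" and g leads to "together"]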

Sampling Strategies

• Given a set of key values to be represented in an index, the number of levels in the trie will depend on the strategy or key sampling technique used to determine the branching at each level.

• The trie structure we just discussed had sample(x,i) = ith character of x.

• We could choose different sample functions, which would result in different trie structures.

• Ideally, with a fixed set of keys, we should be able to find a best trie structure that has the fewest number of levels. However, in reality, it is very difficult to do so. If we consider a dynamic situation with keys being added and deleted, it is even more difficult. In such a case, we wish to optimize average performance.

• Without knowledge of future key values, the best sampling function is probably a randomized sampling function.

Sampling Strategies (Cont.)

• The sampling strategy is not limited to one character at a time. In fact, multiple characters can be used in one sampling.

• In some cases, we want to limit the number of levels. We can achieve this by allowing nodes to hold more than one key value.

• If the maximum number of levels allowed is l, then all key values that are synonyms up to level l-1 are entered into the same element node.

Figure 10.52: Trie Constructed For Data Of Figure 10.50 Sampling One Character At A Time, From Right To Left

[Diagram: trie holding the keys bluebird, bunting, cardinal, chickadee, godwit, goshawk, thrasher, thrush, gull, oriole, wren; the root branches on the blank b and the letters a through z]

Figure 10.53: An Optimal Trie For The Data Of Figure 10.50 Sampling On The First Level Done By The Fourth Character Of The Key Values

[Diagram: single-level trie holding the keys bluebird, bunting, cardinal, chickadee, godwit, goshawk, thrasher, thrush, gull, oriole, wren]

Figure 10.54: Trie Obtained For Data Of Figure 10.50 When Number Of Levels Is Limited To 3; Keys Have Been Sampled From Left To Right, One Character At A Time

[Diagram: three-level trie in which keys that are synonyms share an element node, for example godwit with goshawk and thrasher with thrush]

Figure 10.55: Section of Trie of Figure 10.50 Showing Changes Resulting from Inserting Bobwhite and Bluejay

[Diagram: the subtrie under b, now holding bluebird, bluejay, bunting, and bobwhite; nodes are labeled δ1, δ2, δ3, σ, and ρ]
