Data Structures and Algorithms Course’s slides: Hierarchical data structures www.mif.vu.lt/~algis


Data Structures and Algorithms

Course’s slides: Hierarchical data structures

www.mif.vu.lt/~algis

Trees

• Linear access time of linked lists is prohibitive

• Does there exist any simple data structure for which the running time of most operations (search, insert, delete) is O(log N)?

• A tree is a collection of nodes. The collection can be empty; otherwise (recursive definition) a tree consists of a distinguished node r (the root) and zero or more nonempty subtrees T1, T2, ..., Tk, each of whose roots is connected by a directed edge from r

Some Terminologies

• Child and parent
  • Every node except the root has one parent
  • A node can have an arbitrary number of children

• Leaves
  • Nodes with no children

• Sibling
  • Nodes with the same parent

Some Terminologies

• Path

• Length
  • number of edges on the path

• Depth of a node
  • length of the unique path from the root to that node
  • The depth of a tree is equal to the depth of the deepest leaf

• Height of a node
  • length of the longest path from that node to a leaf
  • all leaves are at height 0
  • The height of a tree is equal to the height of the root

• Ancestor and descendant
  • Proper ancestor and proper descendant
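The depth and height definitions above can be sketched in a few lines of Python. This is only an illustration: the `TreeNode` class, the function names and the sample tree are assumptions, not part of the course material.

```python
class TreeNode:
    """General tree node: a label plus any number of children (hypothetical class)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def height(node):
    """Length of the longest path from node down to a leaf; a leaf has height 0."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)

def depths(root):
    """Map each label to its depth: the number of edges on the path from the root."""
    result = {}
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        result[node.label] = d
        stack.extend((child, d + 1) for child in node.children)
    return result

# Sample tree: r has children a and b; a has children c and d; d has child e.
tree = TreeNode("r", [
    TreeNode("a", [TreeNode("c"), TreeNode("d", [TreeNode("e")])]),
    TreeNode("b"),
])
tree_height = height(tree)
node_depths = depths(tree)
```

In this sample the height of the root equals the depth of the deepest leaf, as the definitions require.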


Example: UNIX Directory

Example: Expression Trees

• Leaves are operands (constants or variables)
• The other nodes (internal nodes) contain operators
• Will not be a binary tree if some operators are not binary

Traversal: preorder, postorder and inorder

• Used to name or print out the data in a tree in a certain order, according to hierarchical structure (predecessors or successors)

• Preorder traversal
  • node, left, right
  • prefix expression
    • ++a*bc*+*defg

Preorder, Postorder and Inorder

• Postorder traversal
  • left, right, node
  • postfix expression
    • abc*+de*f+g*+

• Inorder traversal
  • left, node, right
  • infix expression
    • a+b*c+d*e+f*g
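The three traversal orders can be checked against the expressions above with a short Python sketch. The `Node` class and the tree layout are assumptions: the tree is built here as (a + b*c) + ((d*e + f) * g), which is the shape the prefix and postfix strings imply.

```python
class Node:
    """Binary expression-tree node (hypothetical helper class)."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(n):
    # node, left, right
    return "" if n is None else n.value + preorder(n.left) + preorder(n.right)

def inorder(n):
    # left, node, right (no parentheses are emitted)
    return "" if n is None else inorder(n.left) + n.value + inorder(n.right)

def postorder(n):
    # left, right, node
    return "" if n is None else postorder(n.left) + postorder(n.right) + n.value

expr = Node("+",
            Node("+", Node("a"), Node("*", Node("b"), Node("c"))),
            Node("*",
                 Node("+", Node("*", Node("d"), Node("e")), Node("f")),
                 Node("g")))

prefix, infix, postfix = preorder(expr), inorder(expr), postorder(expr)
```

Note that the plain inorder string drops the grouping of (d*e + f) * g, so only the prefix and postfix forms determine the tree unambiguously.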


Preorder, Postorder and Inorder


Implementation of a general tree

Binary Trees

Possible operations on the Binary Tree ADT:

• parent
• left_child, right_child
• sibling
• root, etc.

Implementation

Because a node in a binary tree has at most two children, we can keep direct pointers to them

Binary Trees

• A tree in which no node can have more than two children

• The depth of an “average” binary tree is considerably smaller than N, even though in the worst case the depth can be as large as N – 1


Binary Search Trees

A binary search tree Not a binary search tree

Binary Search Trees

Stores keys in the nodes in a way so that searching, insertion and deletion can be done efficiently.

Binary search tree property:

For every node X, all the keys in its left subtree are smaller than the key value in X, and all the keys in its right subtree are larger than the key value in X


Binary search trees

• Average depth of a node is O(log N); maximum depth of a node is O(N)

Two binary search trees representing the same set:

Searching BST

• If we are searching for 15, then we are done.
• If we are searching for a key < 15, then we should search in the left subtree.
• If we are searching for a key > 15, then we should search in the right subtree.
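The three rules above, applied at every node on the way down, give the usual iterative search. The following is a minimal Python sketch; the `Node` class and the sample tree rooted at 15 are assumptions made for illustration.

```python
class Node:
    """BST node (hypothetical helper class)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find(node, key):
    """Return the node holding key, or None if the key is absent."""
    while node is not None and key != node.key:
        # Smaller keys live in the left subtree, larger keys in the right one.
        node = node.left if key < node.key else node.right
    return node

# Assumed sample tree with 15 at the root.
root = Node(15,
            Node(6, Node(3, Node(2), Node(4)), Node(7, None, Node(13, Node(9)))),
            Node(18, Node(17), Node(20)))

hit = find(root, 13)
miss = find(root, 8)
```

The loop visits one node per level, so the cost is proportional to the depth of the node found (or of the place where the search falls off the tree).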

Inorder traversal of BST

• Print out all the keys in sorted order

Inorder: 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20

findMin/findMax

Return the node containing the smallest element in the tree

Start at the root and go left as long as there is a left child. The stopping point is the smallest element

Similarly for findMax

Time complexity = O (height of the tree)
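A short Python sketch of findMin and findMax as described above. The names `find_min` and `find_max`, the `Node` class and the sample tree are assumptions, not the course's own code.

```python
class Node:
    """BST node (hypothetical helper class)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find_min(node):
    """Go left as long as there is a left child; assumes a non-empty tree."""
    while node.left is not None:
        node = node.left
    return node

def find_max(node):
    """Mirror image of find_min: go right as long as there is a right child."""
    while node.right is not None:
        node = node.right
    return node

# Assumed sample tree.
root = Node(15,
            Node(6, Node(3, Node(2), Node(4)), Node(7)),
            Node(18, Node(17), Node(20)))

smallest = find_min(root).key
largest = find_max(root).key
```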

Insert (x)

Proceed down the tree as you would with a find

If X is found, do nothing (or update something)

Otherwise, insert X at the last spot on the path traversed

Time complexity = O (height of the tree)
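The insertion rule can be sketched recursively in Python. This is one possible formulation under assumed names (`Node`, `insert`, `inorder`); a duplicate key is simply ignored, which corresponds to the "do nothing" option.

```python
class Node:
    """BST node (hypothetical helper class)."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    """Insert key into the subtree rooted at node and return the subtree root."""
    if node is None:
        return Node(key)            # last spot on the path traversed
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    # key == node.key: already present, do nothing
    return node

def inorder(node):
    """Keys in sorted order."""
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = None
for key in [15, 6, 18, 3, 7, 17, 20, 6]:   # the second 6 is a duplicate
    root = insert(root, key)
keys = inorder(root)
```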

Delete (x)

When we delete a node, we need to consider how we take care of the children of the deleted node. This has to be done such that the property of the search tree is maintained.

Three cases:

(1) the node is a leaf - delete it immediately

(2) the node has one child - adjust a pointer from the parent to bypass that node

(3) the node has 2 children - replace the key of that node with the minimum element of the right subtree, then delete that minimum element

• The minimum element has either no child or only a right child, because if it had a left child, that left child would be smaller and would have been chosen. So invoke case 1 or 2

Time complexity = O (height of the tree)
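The three deletion cases can be sketched as follows. This is a minimal Python illustration with assumed names (`Node`, `find_min`, `delete`, `inorder`), not the course's reference implementation.

```python
class Node:
    """BST node (hypothetical helper class)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find_min(node):
    while node.left is not None:
        node = node.left
    return node

def delete(node, key):
    """Delete key from the subtree rooted at node and return the new subtree root."""
    if node is None:
        return None                                  # key not found
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is not None and node.right is not None:
        # Case 3: copy the minimum of the right subtree, then delete that minimum.
        node.key = find_min(node.right).key
        node.right = delete(node.right, node.key)
    else:
        # Cases 1 and 2: bypass the node with its only child (or None for a leaf).
        node = node.left if node.left is not None else node.right
    return node

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = Node(15, Node(6, Node(3), Node(7)), Node(18, Node(17), Node(20)))
root = delete(root, 15)        # two children: replaced by 17
after_case3 = inorder(root)
root = delete(root, 3)         # leaf
root = delete(root, 6)         # one child (7)
after_all = inorder(root)
```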

Binary search tree – best time

• All BST operations are O (d), where d is tree depth

• minimum d is $d = \lfloor \log_2 N \rfloor$ for a binary tree with N nodes

• What is the best case tree?

• What is the worst case tree?

• So, best case running time of BST operations is O (log N)

Binary Search Tree - Worst Time

• Worst case running time is O (N)
  • What happens when you insert elements in ascending order?
    • Insert: 2, 4, 6, 8, 10, 12 into an empty BST
  • Problem: Lack of “balance”:
    • compare depths of left and right subtree
  • Unbalanced degenerate tree
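The effect of insertion order can be seen by measuring tree height directly. In this Python sketch the helper names and the second insertion order (6, 2, 10, 4, 8, 12) are assumptions chosen for illustration; the empty tree is given height -1.

```python
class Node:
    """BST node (hypothetical helper class)."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def height(node):
    """Height in edges; an empty tree has height -1."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

ascending = build([2, 4, 6, 8, 10, 12])   # every node has only a right child
mixed = build([6, 2, 10, 4, 8, 12])       # same keys, different order

ascending_height = height(ascending)
mixed_height = height(mixed)
```

With N = 6 keys the ascending order produces height N - 1 = 5, while the mixed order reaches the minimum possible height of 2.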

Balanced and unbalanced BST

[Figure: three binary search trees. The first has root 4 with children 2 and 5, and 1 and 3 below 2. The second contains the keys 1 to 7 in a deeper, unbalanced arrangement. The third has root 4, children 2 and 6, and leaves 1, 3, 5 and 7.]

Is this “balanced”?

Approaches to balancing trees

• Don't balance
  • May end up with some nodes very deep
• Strict balance
  • The tree must always be balanced perfectly
• Pretty good balance
  • Only allow a little out of balance
• Adjust on access
  • Self-adjusting


Balancing binary search trees

• Many algorithms exist for keeping binary search trees balanced

• AVL trees (Adelson-Velskii, Landis, height-balanced trees)

• Splay trees and other self-adjusting trees

• 2-3 trees and 2-3-4 trees

• B-trees and other multiway search trees

Perfect balance

• Want a complete tree after every operation
  • tree is full except possibly in the lower right
• This is expensive
  • For example, insert 2 in the tree on the left and then rebuild as a complete tree

[Figure: left, a tree with root 6, children 4 and 9, 1 and 5 below 4, and 8 below 9. Right, after "Insert 2 & complete tree", a tree with root 5, children 2 and 8, 1 and 4 below 2, and 6 and 9 below 8.]


AVL - good but not perfect balance

• AVL trees are height-balanced binary search trees

• Balance factor of a node

• height(left subtree) - height(right subtree)

• An AVL tree has balance factor calculated at every node

• For every node, heights of left and right subtree can differ by no more than 1

• Store current heights in each node

Height of an AVL tree

• N (h) = minimum number of nodes in AVL tree of height h.
• Basis
  • N (0) = 1, N (1) = 2
• Induction
  • N (h) = N (h-1) + N (h-2) + 1
• Solution (recall Fibonacci analysis)
  • N (h) > φ^h (φ ≈ 1.62)

[Figure: a minimal AVL tree of height h, whose root has subtrees of heights h-1 and h-2.]

Height of an AVL Tree

• N (h) > φ^h (φ ≈ 1.62)

• Suppose we have n nodes in an AVL tree of height h.

• n ≥ N (h) (because N (h) was the minimum)

• n > φ^h hence log_φ n > h (relatively well balanced tree!!)

• h < 1.44 log2 n (i.e., Find takes O (log n))
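The recurrence and the resulting bound can be checked numerically. The sketch below is an illustration with assumed names (`min_nodes`, `PHI`); it evaluates N(h) from the basis and induction step and compares it with φ^h. The strict inequality holds from h = 1 onward, since at h = 0 both sides equal 1.

```python
import math

PHI = (1 + math.sqrt(5)) / 2      # golden ratio, about 1.62

def min_nodes(h):
    """N(h): minimum number of nodes in an AVL tree of height h."""
    a, b = 1, 2                   # N(0), N(1)
    if h == 0:
        return a
    for _ in range(h - 1):
        a, b = b, a + b + 1       # N(h) = N(h-1) + N(h-2) + 1
    return b

table = [min_nodes(h) for h in range(6)]

# For n = N(h), the sparsest AVL tree of height h, check both claims.
bound_holds = all(min_nodes(h) > PHI ** h for h in range(1, 40))
height_bound_holds = all(h < 1.44 * math.log2(min_nodes(h)) for h in range(1, 40))
```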

Node Heights

height of node = h
balance factor = hleft - hright
empty height = -1

[Figure: two trees with the height written next to each node. Tree A (AVL) has root 6, children 4 and 9, and 1 and 5 below 4; its root has height 2 and balance factor 1 – 0 = 1. Tree B (AVL) has root 6, children 4 and 9, 1 and 5 below 4, and 8 below 9.]

Node heights after insert 7

height of node = h
balance factor = hleft - hright
empty height = -1

[Figure: the two trees after inserting 7. Tree A (AVL) has root 6, children 4 and 9, 1 and 5 below 4, and 7 below 9. Tree B (not AVL) has root 6, children 4 and 9, 1 and 5 below 4, 8 below 9, and 7 below 8; node 9 now has balance factor 1 - (-1) = 2.]

Insert and rotation in AVL trees

• Insert operation may cause balance factor to become 2 or –2 for some node
  • only nodes on the path from insertion point to root node have possibly changed in height
  • So after the Insert, go back up to the root node by node, updating heights
  • If a new balance factor (the difference hleft - hright) is 2 or –2, adjust tree by rotation around the node

Single Rotation in an AVL Tree

[Figure: before the rotation, root 6 with children 4 and 9, 1 and 5 below 4, 8 below 9, and 7 below 8. After a single rotation at node 9, root 6 with children 4 and 8, 1 and 5 below 4, and 7 and 9 below 8.]

Insertions in AVL trees

Let the node that needs rebalancing be a. There are 4 cases:

Outside Cases (require single rotation):
1. Insertion into left subtree of left child of a.
2. Insertion into right subtree of right child of a.

Inside Cases (require double rotation):
3. Insertion into right subtree of left child of a.
4. Insertion into left subtree of right child of a.

The rebalancing is performed through four separate rotation algorithms.

AVL Insertion: Outside Case

[Figure: a valid AVL subtree. Node j has left child k and right subtree Z of height h; k has subtrees X and Y, each of height h.]

Consider a valid AVL subtree

Inserting into X destroys the AVL property at node j: X now has height h+1, while Y and Z have height h

Do a “right rotation”

Single right rotation

“Right rotation” done! (“Left rotation” is mirror symmetric.) Node k is now the root of the subtree, with left subtree X (height h+1) and right child j, whose subtrees are Y and Z (each of height h)

Outside Case Completed

AVL property has been restored!
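The single rotation can be sketched in Python as a pair of pointer updates. The names `rotate_right` and `rotate_left` and the `Node` class are assumptions; j and k play the same roles as in the outside case above. Height bookkeeping is left out to keep the pointer changes visible.

```python
class Node:
    """BST node (hypothetical helper class)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(j):
    """Lift j's left child k above j and return k, the new subtree root."""
    k = j.left
    j.left = k.right      # subtree Y moves from k to j
    k.right = j
    return k

def rotate_left(j):
    """Mirror image: lift j's right child above j."""
    k = j.right
    j.right = k.left
    k.left = j
    return k

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

# Node 9 with left child 8, which has left child 7 (a left-left chain).
j = Node(9, Node(8, Node(7)))
k = rotate_right(j)
keys_after = inorder(k)
```

The caller is responsible for storing the returned node in the parent's child pointer (or as the tree root).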

AVL Insertion: Inside Case

[Figure: a valid AVL subtree. Node j has left child k and right subtree Z of height h; k has subtrees X and Y, each of height h.]

Consider a valid AVL subtree

Inserting into Y destroys the AVL property at node j: Y now has height h+1, while X and Z have height h

Does “right rotation” restore balance?

“Right rotation” does not restore balance… now k is out of balance: k is the root with left subtree X (height h) and right child j, whose subtrees are Y (height h+1) and Z (height h)

Consider the structure of subtree Y…

Y = node i and subtrees V and W, each of height h or h-1

We will do a left-right “double rotation” . . .

Double rotation: first rotation

Left rotation complete: i is now the left child of j, with left child k (subtrees X and V) and right subtree W

Double rotation: second rotation

Now do a right rotation

Right rotation complete: i is the root of the subtree, with left child k (subtrees X and V) and right child j (subtrees W and Z)

Balance has been restored

[Subtree heights after the double rotation: X and Z have height h; V and W have height h or h-1.]
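Putting the four cases together, AVL insertion can be sketched as an ordinary BST insert followed by a rebalancing check on the way back up. This Python version is an illustration under assumptions: it stores a height in every node (with empty height -1), and all names are hypothetical.

```python
class Node:
    """AVL node storing its own height (hypothetical helper class)."""
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 0

def height(n):
    return n.height if n is not None else -1      # empty height = -1

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def balance(n):
    return height(n.left) - height(n.right)       # hleft - hright

def rotate_right(j):
    k = j.left
    j.left, k.right = k.right, j
    update(j)
    update(k)
    return k

def rotate_left(j):
    k = j.right
    j.right, k.left = k.left, j
    update(j)
    update(k)
    return k

def insert(n, key):
    """Insert key below n, rebalance, and return the new subtree root."""
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    elif key > n.key:
        n.right = insert(n.right, key)
    else:
        return n                                   # duplicate: nothing to do
    update(n)
    if balance(n) == 2:
        if balance(n.left) < 0:                    # inside case 3: left-right
            n.left = rotate_left(n.left)
        return rotate_right(n)                     # outside case 1
    if balance(n) == -2:
        if balance(n.right) > 0:                   # inside case 4: right-left
            n.right = rotate_right(n.right)
        return rotate_left(n)                      # outside case 2
    return n

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

ascending = build([1, 2, 3, 4, 5, 6, 7])
example = build([6, 4, 9, 1, 5, 8, 7])
```

Inserting 1 to 7 in ascending order, which degenerates a plain BST, here yields a tree of height 2 with root 4.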

Implementation

[Figure: node layout with the fields key, balance (1, 0, -1), left and right.]

No need to keep the height; just the difference in height, i.e. the balance factor; this has to be modified on the path of insertion even if you don't perform rotations

Once you have performed a rotation (single or double) you won't need to go back up the tree

Pros and Cons of AVL Trees

Arguments for AVL trees:

1. Search is O (log N) since AVL trees are always balanced.

2. Insertion and deletions are also O (log n)

3. The height balancing adds no more than a constant factor to the cost of insertion.

Arguments against using AVL trees:

1. Difficult to program & debug; more space for balance factor.

2. Asymptotically faster but rebalancing costs time.

3. Most large searches are done in database systems on disk and use other structures (e.g. B-trees).

4. May be OK to have O (N) for a single operation if total run time for many consecutive operations is fast (e.g. Splay trees).

Splay trees - motto

Balanced Binary Search Trees

• Balancing by rotations
• Rotations preserve the BST property

[Figure: an unbalanced binary search tree and a balanced binary search tree; a "Zig" rotation between nodes x and y with subtrees A, B and C.]

Motivation for Splay trees

Because of problems with AVL trees (ugly delete code, extra complexity for height), the solution is splay trees:

• not aiming at balanced trees always

• Splay trees are self-adjusting BSTs that have the additional helpful property that more commonly accessed nodes are more quickly retrieved.

• Amortized time (average over a sequence of inputs) for all operations is O (log n)

• Worst case time is O (n)

Splay tree key idea (Lithuanian: nuolaidžio medis)

[Figure: a deep tree containing the keys 17, 10, 9, 2, 5 and 3.]

You’re forced to make a really deep access:

Since you’re down there anyway, fix up a lot of deep nodes!

Why splay? - This brings the most recently accessed nodes up towards the root.

Splaying

• Bring the node being accessed to the root of the tree, when accessing it, through one or more splay steps.

• A splay step can be:
  • Single rotation: Zig, Zag
  • Double rotations: Zig-zig, Zag-zag, Zig-zag, Zag-zig

Splaying cases

Node being accessed (n) is:

• the root

• a child of the root
  • Do single rotation: Zig or Zag pattern

• has both a parent (p) and a grandparent (g)

Double rotations:

(i) Zig-zig or Zag-zag pattern:

g → p → n is left-left or right-right

(ii) Zig-zag pattern:

g → p → n is left-right or right-left
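The case analysis above maps onto a short bottom-up splay routine. The following Python sketch assumes nodes with parent pointers; `rotate_up` and `splay` are hypothetical names, and the sample tree (a right-leaning chain of the keys 1 to 6) is chosen only for illustration.

```python
class Node:
    """BST node with a parent pointer (hypothetical helper class)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(n):
    """Single rotation that lifts n above its parent p."""
    p, g = n.parent, n.parent.parent
    if p.left is n:                       # right rotation at p
        p.left, n.right = n.right, p
        if p.left is not None:
            p.left.parent = p
    else:                                 # left rotation at p
        p.right, n.left = n.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, n.parent = n, g
    if g is not None:                     # reattach n where p used to hang
        if g.left is p:
            g.left = n
        else:
            g.right = n

def splay(n):
    """Move n to the root with zig, zig-zig and zig-zag steps; returns n."""
    while n.parent is not None:
        p, g = n.parent, n.parent.parent
        if g is None:
            rotate_up(n)                  # n is a child of the root: single rotation
        elif (g.left is p) == (p.left is n):
            rotate_up(p)                  # left-left or right-right:
            rotate_up(n)                  # rotate the parent first, then n
        else:
            rotate_up(n)                  # left-right or right-left:
            rotate_up(n)                  # rotate n twice
    return n

# Right-leaning chain 1 -> 2 -> 3 -> 4 -> 5 -> 6; splay the deepest node.
nodes = [Node(k) for k in range(1, 7)]
for parent, child in zip(nodes, nodes[1:]):
    parent.right, child.parent = child, parent
root = splay(nodes[-1])
```

In the same-side case the parent is rotated before the accessed node; rotating the accessed node twice instead would still bring it to the root but would not shorten the rest of the access path in the same way.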

Case 0: Access root - do nothing

[Figure: n is the root, with subtrees X and Y; the tree is unchanged.]

Case 1: Access child of root – Zig/Zag

[Figure: before, root p has left child n (subtrees X and Y) and right subtree Z. After, n is the root with left subtree X and right child p (subtrees Y and Z).]

Zig – right rotation

Zag – left rotation

Case 2: Access (LR, RL) Zig-Zag

[Figure: before, g has left subtree X and right child p; p has left child n (subtrees Y and Z) and right subtree W. After the Zig, g has left subtree X and right child n; n has left subtree Y and right child p (subtrees Z and W). After the Zag, n is the root with left child g (subtrees X and Y) and right child p (subtrees Z and W).]

Case 3: Access (LL, RR) Zag-Zag

[Figure: before, g has left subtree W and right child p; p has left subtree X and right child n (subtrees Y and Z). After two rotations, numbered 1 and 2 in the figure, n is the root with right subtree Z and left child p; p has right subtree Y and left child g (subtrees W and X).]

No more cookies! We are done showing animations.

Complete example

[Figure sequence: Splay(78) on a binary search tree containing the keys 17, 28, 29, 32, 44, 50, 54, 65, 76, 78, 80, 82, 88 and 97. Node 78 is moved up through steps labelled zig-zag, zig-zag, zig-zag and finally zig.]

Result of splaying

• The result is a binary tree, with the left subtree having all keys less than the root, and the right subtree having keys greater than the root.

• Also, the final tree is “more balanced” than the original.

• However, if an operation near the root is done, the tree can become less balanced.

Why splaying helps

• If a node n on the access path to a target node x is at depth d before splaying x, then it’s at depth <= 3+d/2 after the splay

• Overall, nodes which are below nodes on the access path tend to move closer to the root

• Splaying gets amortized to give O (log n) performance (maybe not now, but soon, and for the rest of the operations.)

Splay operations: find

Find the node in normal BST manner

Splay the node to the root

Note that we will always splay the last node on the access path even if we don’t find the node for the key we are looking for.

Splaying example: using find operation

[Figure sequence: Find (6) on a tree with the keys 1 to 6 in which 6 is the deepest node. A zag-zag step lifts 6 above 5 and 4. A second zag-zag step ("… still splaying …") lifts 6 above 3 and 2, leaving root 1 with child 6. A final zag step makes 6 the root ("… 6 splayed out!"): 6 has child 1, 1 has child 3, 3 has children 2 and 5, and 5 has child 4.]


Splay operations: insert

• Can we just do BST insert?

• Yes. But we also splay the newly inserted node up to the root.

• Alternatively, we can do a Split (T, x)

Digression: splitting

• Split (T, x) creates two BSTs L and R:
  • all elements of T are in either L or R (T = L ∪ R)
  • all elements in L are ≤ x
  • all elements in R are ≥ x
  • L and R share no elements (L ∩ R = ∅)

Splitting in splay trees

How can we split?

• We can do Find (x), which will splay x to the root.

• Now, what’s true about the left subtree L and right subtree R of the root?

• So, we simply cut the tree at x and attach x to either L or R
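Splitting, and insertion built on top of it, can be sketched as follows. This Python version is an illustration under assumptions: nodes carry parent pointers, the root found by the splaying search is kept in L (so L holds keys ≤ x and R holds keys > x), and all names (`access`, `split`, `insert`, `build`) are hypothetical.

```python
class Node:
    """BST node with a parent pointer (hypothetical helper class)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

def rotate_up(n):
    """Single rotation that lifts n above its parent."""
    p, g = n.parent, n.parent.parent
    if p.left is n:
        p.left, n.right = n.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, n.left = n.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, n.parent = n, g
    if g is not None:
        if g.left is p:
            g.left = n
        else:
            g.right = n

def splay(n):
    while n.parent is not None:
        p, g = n.parent, n.parent.parent
        if g is None:
            rotate_up(n)
        elif (g.left is p) == (p.left is n):
            rotate_up(p)
            rotate_up(n)
        else:
            rotate_up(n)
            rotate_up(n)
    return n

def access(root, key):
    """BST search that splays the last node on the access path to the root."""
    node = root
    while True:
        if key < node.key and node.left is not None:
            node = node.left
        elif key > node.key and node.right is not None:
            node = node.right
        else:
            return splay(node)

def split(root, x):
    """Return (L, R): every key in L is <= x and every key in R is > x."""
    if root is None:
        return None, None
    root = access(root, x)
    if root.key <= x:                       # root stays with L; cut off the right subtree
        right, root.right = root.right, None
        if right is not None:
            right.parent = None
        return root, right
    left, root.left = root.left, None       # root goes with R; cut off the left subtree
    if left is not None:
        left.parent = None
    return left, root

def insert(root, x):
    """Insert x by splitting at x and making x the new root."""
    left, right = split(root, x)
    if left is not None and left.key == x:  # x already present: undo the cut
        left.right = right
        if right is not None:
            right.parent = left
        return left
    return Node(x, left, right)

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

def build():
    """Assumed sample tree with the keys 1, 2, 4, 6, 7 and 9."""
    return Node(1, None, Node(9, Node(6, Node(4, Node(2)), Node(7))))

left, right = split(build(), 5)
tree = insert(build(), 5)
```

When x is absent, the node splayed to the root is either the largest key below x or the smallest key above x, which is why a single cut next to the root is enough.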

Splitting

[Figure: Split (x) splays T so that the root has left subtree L and right subtree R. The result is either L with keys ≤ x and R with keys > x, or L with keys < x and R with keys ≥ x.]

Back to insert

[Figure: Split (x) produces L with keys < x and R with keys > x; the new node x becomes the root, with L as its left subtree and R as its right subtree.]

Insert example

[Figure sequence: Insert (5) into a tree containing the keys 1, 2, 4, 6, 7 and 9. Split (5) produces one tree with the keys 1, 2 and 4 and another with the keys 6, 7 and 9; the new node 5 then becomes the root with these two trees as its subtrees.]

Splay operations: delete

Do a BST style delete and splay the parent of the deleted node. Alternatively,

[Figure: find(x) splays x to the root, with left subtree L (keys < x) and right subtree R (keys > x); delete (x) removes the root, leaving L and R.]

Joining

Join (L, R): given two trees such that L < R, merge them

Splay on the maximum element in L, then attach R

[Figure: the maximum of L is splayed to the root of L, which leaves it without a right child; R is attached as its right subtree.]

Delete completed

[Figure: find(x) splays x to the root of T, with left subtree L (keys < x) and right subtree R (keys > x); deleting x leaves L and R; Join(L,R) produces the tree T - x.]

Delete example

[Figure: Delete(4) from a splay tree holding 1, 2, 4, 6, 7, 9. find(4) splays 4 to the root; removing it leaves L = {1, 2} and R = {6, 7, 9}; Find max splays 2 to the root of L, and R is attached as its right subtree]

Outline

Balanced Search Trees

• 2-3 Trees
• 2-3-4 Trees
• Red-Black Trees
• B-Trees

Why care about advanced implementations?

Same entries, different insertion sequence:

→ Not good! We would like to keep the tree balanced.

2-3 Trees

Features

• each internal node has either 2 or 3 children
• all leaves are at the same level

2-3 Trees with Ordered Nodes

[Figure: a 2-node and a 3-node]

• a leaf node can be either a 2-node or a 3-node

Example of 2-3 Tree

Traversing a 2-3 Tree

inorder(in ttTree: TwoThreeTree)

    if (ttTree’s root node r is a leaf)
        visit the data item(s)
    else if (r has two data items) {
        inorder(left subtree of ttTree’s root)
        visit the first data item
        inorder(middle subtree of ttTree’s root)
        visit the second data item
        inorder(right subtree of ttTree’s root)
    }
    else {
        inorder(left subtree of ttTree’s root)
        visit the data item
        inorder(right subtree of ttTree’s root)
    }
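
The traversal pseudocode above can be written as a short runnable sketch. The `Node23` class (a sorted `keys` list plus a `children` list that is empty for leaves) and the `visit` callback are assumptions made for this illustration; the slides do not prescribe a node layout.

```python
from dataclasses import dataclass, field

@dataclass
class Node23:
    keys: list                                      # one or two items, sorted
    children: list = field(default_factory=list)    # empty for a leaf

def inorder(node, visit):
    """Visit every item of a 2-3 tree in ascending order."""
    if node is None:
        return
    if not node.children:              # leaf: visit its data item(s)
        for key in node.keys:
            visit(key)
        return
    # internal node: child, item, child, (item, child)
    for i, key in enumerate(node.keys):
        inorder(node.children[i], visit)
        visit(key)
    inorder(node.children[-1], visit)
```

Interleaving children and items in a loop covers both the 2-node and the 3-node branch of the pseudocode with the same code.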

Searching a 2-3 tree

retrieveItem(in ttTree: TwoThreeTree,
             in searchKey: KeyType,
             out treeItem: TreeItemType): boolean

    if (searchKey is in ttTree’s root node r) {
        treeItem = the data portion of r
        return true
    }
    else if (r is a leaf)
        return false
    else {
        return retrieveItem(appropriate subtree, searchKey, treeItem)
    }
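
A runnable sketch of the search follows. As a simplification it returns only a boolean instead of filling an output parameter, and the `Node23` layout and the name `retrieve` are assumptions made for this illustration. The "appropriate subtree" is chosen by counting how many of the node's keys are smaller than the search key.

```python
from dataclasses import dataclass, field

@dataclass
class Node23:
    keys: list                                      # one or two items, sorted
    children: list = field(default_factory=list)    # empty for a leaf

def retrieve(node, search_key):
    """Return True if search_key is stored in the 2-3 tree rooted at node."""
    while node is not None:
        if search_key in node.keys:
            return True
        if not node.children:          # reached a leaf without finding the key
            return False
        # index of the subtree whose key range contains search_key
        index = sum(key < search_key for key in node.keys)
        node = node.children[index]
    return False
```

Each iteration descends one level, so the number of nodes inspected is at most the height of the tree plus one.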

What did we gain?

What is the time efficiency of searching for an item?

Gain: Ease of keeping the Tree balanced

[Figure: a binary search tree and a 2-3 tree, both after inserting items 39, 38, ... 32]

Inserting Items

Insert 39

Inserting Items

Insert 38

[Figure, three steps: insert in leaf; divide leaf and move middle value up to parent; result]

Inserting Items

Insert 37

Inserting Items

Insert 36

[Figure, three steps: insert in leaf; divide leaf and move middle value up to parent; overcrowded node]

Inserting Items

... still inserting 36

[Figure, two steps: divide overcrowded node, move middle value up to parent, attach children to smallest and largest; result]

Inserting Items

After Insertion of 35, 34, 33

Inserting so far

Inserting so far

Inserting Items

How do we insert 32?

Inserting Items

→ creating a new root if necessary

→ tree grows at the root

Inserting Items

Final Result

Deleting Items

Delete 70

Deleting Items

Deleting 70: swap 70 with inorder successor (80)

Deleting Items

Deleting 70: ... get rid of 70

Deleting Items

Result

Deleting Items

Delete 100

Deleting Items

Deleting 100

Deleting Items

Result

Deleting Items

Delete 80

Deleting Items

Deleting 80 ...

Deleting Items

Deleting 80 ...

Deleting Items

Deleting 80 ...

Deleting Items

Final Result

[Figure: comparison with binary search tree]

Deletion Algorithm I

Deleting item I:

1. Locate node n, which contains item I

2. If node n is not a leaf → swap I with inorder successor

→ deletion always begins at a leaf

3. If leaf node n contains another item, just delete item I
   else
      try to redistribute nodes from siblings (see next slide)
      if not possible, merge node (see next slide)

Deletion Algorithm II

Redistribution

A sibling has 2 items:
→ redistribute item between siblings and parent

Merging

No sibling has 2 items:
→ merge node
→ move item from parent to sibling

Deletion Algorithm III

Redistribution

Internal node n has no item left
→ redistribute

Merging

Redistribution not possible:
→ merge node
→ move item from parent to sibling
→ adopt child of n

If n's parent ends up without item, apply process recursively

Deletion Algorithm IV

If merging process reaches the root and root is without item
→ delete root

All operations have time complexity of O(log n)

2-3-4 Trees

• similar to 2-3 trees

• 4-nodes can have 3 items and 4 children

[Figure: 4-node]

2-3-4 Tree example

2-3-4 Tree: Insertion

Insertion procedure:

• similar to insertion in 2-3 trees

• items are inserted at the leaves

• since a 4-node cannot take another item, 4-nodes are split up during the insertion process

Strategy

• on the way from the root down to the leaf: split up all 4-nodes "on the way"

→ insertion can be done in one pass (remember: in 2-3 trees, a reverse pass might be necessary)

2-3-4 Tree: Insertion

Inserting 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100

Inserting 60, 30, 10, 20 ...

... 50, 40 ...

2-3-4 Tree: Insertion

Inserting 70 ...

... 80, 15 ...

Inserting 50, 40 ...

2-3-4 Tree: Insertion

Inserting 80, 15 ...

... 90 ...

2-3-4 Tree: Insertion

Inserting 90 ...

... 100 ...

2-3-4 Tree: Insertion

Inserting 100 ...

2-3-4 Tree: Insertion Procedure

Splitting 4-nodes during Insertion

2-3-4 Tree: Insertion Procedure

Splitting a 4-node whose parent is a 2-node during insertion

2-3-4 Tree: Insertion Procedure

Splitting a 4-node whose parent is a 3-node during insertion

2-3-4 Tree: Deletion

Deletion procedure:

• similar to deletion in 2-3 trees

• items are deleted at the leaves
→ swap item of internal node with inorder successor

• note: a 2-node leaf creates a problem

Strategy (different strategies possible)

• on the way from the root down to the leaf: turn 2-nodes (except root) into 3-nodes

→ deletion can be done in one pass (remember: in 2-3 trees, a reverse pass might be necessary)

2-3-4 Tree: Deletion

Turning a 2-node into a 3-node ...

Case 1: an adjacent sibling has 2 or 3 items
→ "steal" item from sibling by rotating items and moving subtree

[Figure, "rotation": parent (30 50) with children (10 20) and (40) and subtree 25 between them becomes parent (20 50) with children (10) and (30 40); subtree 25 moves across with the rotation]

2-3-4 Tree: Deletion

Turning a 2-node into a 3-node ...

Case 2: each adjacent sibling has only one item
→ "steal" item from parent and merge node with sibling

(note: parent has at least two items, unless it is the root)

[Figure, "merging": parent (30 50) with children (10) and (40) and subtrees 25 and 35 becomes parent (50) with child (10 30 40), which keeps subtrees 25 and 35]

2-3-4 Tree: Deletion Practice

Delete 32, 35, 40, 38, 39, 37, 60

Red-Black Tree

• binary-search-tree representation of 2-3-4 tree

• 3- and 4-nodes are represented by equivalent binary trees

• red and black child pointers are used to distinguish between original 2-nodes and 2-nodes that represent 3- and 4-nodes

Red-Black Representation of 4-node

Red-Black Representation of 3-node

Red-Black Tree Example

Red-Black Tree Example

Red-Black Tree Operations

Traversals

→ same as in binary search trees

Insertion and Deletion

→ analogous to 2-3-4 tree

→ need to split 4-nodes

→ need to merge 2-nodes

Splitting a 4-node that is a root

Splitting a 4-node whose parent is a 2-node

Splitting a 4-node whose parent is a 3-node

Splitting a 4-node whose parent is a 3-node

Splitting a 4-node whose parent is a 3-node

Motivation for B-Trees

• So far we have assumed that we can store an entire data structure in main memory

• What if we have so much data that it won’t fit?

• We will have to use disk storage, but when this happens our time complexity analysis breaks down

• The problem is that Big-Oh analysis assumes that all operations take roughly equal time

• This is not the case when disk access is involved

Motivation (cont.)

• Assume that a disk spins at 3600 RPM

• In 1 minute it makes 3600 revolutions, hence one revolution occurs in 1/60 of a second, or 16.7 ms

• On average the data we want is halfway round the disk, so reaching it takes about 8 ms

• This sounds good until you realize that we get only 120 disk accesses a second, in the same time as 25 million instructions

• In other words, one disk access takes about the same time as 200,000 instructions

• It is worth executing lots of instructions to avoid a disk access
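
The arithmetic behind these figures can be checked with a few lines. The variable names are chosen for this sketch, and the rate of 25 million instructions per second is the figure assumed on the slide.

```python
RPM = 3600
INSTRUCTIONS_PER_SECOND = 25_000_000         # rate assumed on the slide

revolution_ms = 60_000 / RPM                 # one full revolution, in ms
average_wait_ms = revolution_ms / 2          # on average, half a revolution
accesses_per_second = 1000 / average_wait_ms
instructions_per_access = INSTRUCTIONS_PER_SECOND / accesses_per_second

print(f"one revolution:      {revolution_ms:.1f} ms")
print(f"average wait:        {average_wait_ms:.1f} ms")
print(f"disk accesses per s: {accesses_per_second:.0f}")
print(f"instructions/access: {instructions_per_access:,.0f}")
```

The last value comes out slightly above 200,000, which is the rounded figure quoted above.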

Motivation (cont.)

• Assume that we use a binary tree to store all the details of people in Canada (about 32 million records)

• We still end up with a very deep tree with lots of different disk accesses; log2 32,000,000 is about 25, so this takes about 0.21 seconds (if there is only one user of the program)

• We know we can’t improve on the log n for a binary tree

• But the solution is to use more branches and thus less height!

• As branching increases, depth decreases
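
To see how quickly depth falls as branching grows, the following sketch counts the levels needed to reach 32 million records for a few branching factors. The helper name `levels_needed` and the branching factors other than 2 are assumptions chosen for illustration; the rate of 120 disk accesses per second is the one derived earlier.

```python
def levels_needed(records, branching):
    """Smallest number of levels h such that branching**h >= records."""
    levels, reachable = 0, 1
    while reachable < records:
        reachable *= branching
        levels += 1
    return levels

RECORDS = 32_000_000
ACCESSES_PER_SECOND = 120        # from the disk-speed estimate

for branching in (2, 10, 100):
    levels = levels_needed(RECORDS, branching)
    seconds = levels / ACCESSES_PER_SECOND
    print(f"branching {branching:>3}: {levels:>2} levels, about {seconds:.2f} s per lookup")
```

With one disk access per level, a binary tree needs 25 accesses while a branching factor of 100 needs only 4.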

Definition of a B-tree

• A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:

1. the number of keys in each non-leaf node is one less than the number of its children and these keys partition the keys in the children in the fashion of a search tree

2. all leaves are on the same level

3. all non-leaf nodes except the root have at least ⌈m / 2⌉ children

4. the root is either a leaf node, or it has from two to m children

5. a leaf node contains no more than m – 1 keys

• The number m should always be odd

An example B-Tree

[Figure, reconstructed from the key labels:]

root:      26
level 1:   6 12  ||  42 51 62
leaves:    1 2 4 | 7 8 | 13 15 18 25  ||  27 29 | 45 46 48 | 53 55 60 | 64 70 90

A B-tree of order 5 containing 26 items

Note that all the leaves are at the same level
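
The five conditions of the definition can be checked mechanically. The sketch below is an illustration under assumptions: the `BNode` class and the function names are invented here, and the example tree is the order-5 tree from the slide as far as its figure can be reconstructed from the key labels.

```python
import math

class BNode:
    def __init__(self, keys, children=()):
        self.keys, self.children = list(keys), list(children)

def leaf_depth(node, m, lo=float("-inf"), hi=float("inf"), is_root=True):
    """Return the depth of the leaves below node; assert the B-tree conditions."""
    keys, kids = node.keys, node.children
    assert keys == sorted(keys) and all(lo < k < hi for k in keys)
    assert len(keys) <= m - 1                        # condition 5 / node capacity
    if not kids:
        return 0
    assert len(kids) == len(keys) + 1                # condition 1
    min_kids = 2 if is_root else math.ceil(m / 2)    # conditions 4 and 3
    assert min_kids <= len(kids) <= m
    bounds = [lo, *keys, hi]                         # key range of each child
    depths = {leaf_depth(child, m, bounds[i], bounds[i + 1], False)
              for i, child in enumerate(kids)}
    assert len(depths) == 1                          # condition 2
    return depths.pop() + 1

def count_keys(node):
    return len(node.keys) + sum(count_keys(child) for child in node.children)

example = BNode([26], [
    BNode([6, 12], [BNode([1, 2, 4]), BNode([7, 8]), BNode([13, 15, 18, 25])]),
    BNode([42, 51, 62], [BNode([27, 29]), BNode([45, 46, 48]),
                         BNode([53, 55, 60]), BNode([64, 70, 90])]),
])
```

Calling `leaf_depth(example, 5)` passes every assertion, and `count_keys(example)` gives the 26 items stated on the slide.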

Constructing a B-tree

• Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45

• We want to construct a B-tree of order 5

• The first four items go into the root:

1 2 8 12

• To put the fifth item in the root would violate condition 5

• Therefore, when 25 arrives, pick the middle key to make a new root

Constructing a B-tree

Add 25 to the tree

1 2 8 12 25

Exceeds Order. Promote middle and split.

Constructing a B-tree (contd.)

root:    8
leaves:  1 2 | 12 25

6, 14, 28 get added to the leaf nodes:

root:    8
leaves:  1 2 6 | 12 14 25 28

Constructing a B-tree (contd.)

Adding 17 to the right leaf node would over-fill it, so we take the middle key, promote it (to the root) and split the leaf

root:    8 17
leaves:  1 2 6 | 12 14 | 25 28

Constructing a B-tree (contd.)

7, 52, 16, 48 get added to the leaf nodes

root:    8 17
leaves:  1 2 6 7 | 12 14 16 | 25 28 48 52

Constructing a B-tree (contd.)

Adding 68 causes us to split the rightmost leaf, promoting 48 to the root

root:    8 17
leaves:  1 2 6 7 | 12 14 16 | 25 28 48 52 68   (rightmost leaf over-full, before the split)

Constructing a B-tree (contd.)

Adding 3 causes us to split the leftmost leaf

root:    8 17 48
leaves:  1 2 3 6 7 | 12 14 16 | 25 28 | 52 68   (leftmost leaf over-full, before the split)

Constructing a B-tree (contd.)

26, 29, 53, 55 then go into the leaves

root:    3 8 17 48
leaves:  1 2 | 6 7 | 12 14 16 | 25 26 28 29 | 52 53 55 68

Constructing a B-tree (contd.)

Adding 45 increases the tree's height by one level

root:    3 8 17 48
leaves:  1 2 | 6 7 | 12 14 16 | 25 26 28 29 45 | 52 53 55 68

The leaf exceeds order. Promote middle (28) and split.

The root then exceeds order. Promote middle (17) and split.

Inserting into a B-Tree

• Attempt to insert the new key into a leaf

• If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent

• If this would result in the parent becoming too big, split the parent into two, promoting the middle key

• This strategy might have to be repeated all the way to the top

• If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher
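
The insertion steps listed above can be sketched as a short recursive routine. This is a minimal illustration, not a production B-tree: the `BNode` class and the function names are assumptions made for this sketch, keys are assumed distinct, and nodes are kept in memory rather than on disk.

```python
import bisect

class BNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf

def _insert(node, key, m):
    """Insert key below node; return (middle_key, new_right_node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if node.children:
        promoted = _insert(node.children[i], key, m)
        if promoted is None:
            return None
        middle, right = promoted           # child split: absorb promoted key
        node.keys.insert(i, middle)
        node.children.insert(i + 1, right)
    else:
        node.keys.insert(i, key)           # new keys always enter at a leaf
    if len(node.keys) <= m - 1:
        return None
    h = len(node.keys) // 2                # node too big: promote the middle key
    middle = node.keys[h]
    right = BNode(node.keys[h + 1:], node.children[h + 1:])
    node.keys, node.children = node.keys[:h], node.children[:h + 1]
    return middle, right

def insert(root, key, m):
    """Insert key into the B-tree of order m and return the (possibly new) root."""
    promoted = _insert(root, key, m)
    if promoted is None:
        return root
    middle, right = promoted               # root split: tree grows by one level
    return BNode([middle], [root, right])

def build(keys, m):
    root = BNode()
    for key in keys:
        root = insert(root, key, m)
    return root
```

Running `build` on the arrival sequence from the construction example with m = 5 ends with 17 alone in the root, above the two nodes 3 8 and 28 48.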

Removal from a B-tree

• During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this:

• 1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted.

• 2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf; in this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key’s position.

Removal from a B-tree (2)

• If (1) or (2) leads to a leaf node containing fewer than the minimum number of keys then we have to look at the siblings immediately adjacent to the leaf in question:

• 3: if one of them has more than the minimum number of keys then we can promote one of its keys to the parent and take the parent key into our lacking leaf

• 4: if neither of them has more than the minimum number of keys then the lacking leaf and one of its neighbours can be combined with their shared parent key (the opposite of promoting a key) and the new leaf will have the correct number of keys; if this step leaves the parent with too few keys then we repeat the process up to the root itself, if required

Type #1: Simple leaf deletion

Assuming a 5-way B-Tree, as before...

root:    12 29 52
leaves:  2 7 9 | 15 22 | 31 43 | 56 69 72

Delete 2: Since there are enough keys in the node, just delete it

Type #2: Simple non-leaf deletion

root:    12 29 52
leaves:  7 9 | 15 22 | 31 43 | 56 69 72

Delete 52

Borrow the predecessor or (in this case) successor: 56

Type #4: Too few keys in node and its siblings

root:    12 29 56
leaves:  7 9 | 15 22 | 31 43 | 69 72

Delete 72

Too few keys!

Join back together

Type #4: Too few keys in node and its siblings

root:    12 29
leaves:  7 9 | 15 22 | 31 43 56 69

Type #3: Enough siblings

root:    12 29
leaves:  7 9 | 15 22 | 31 43 56 69

Delete 22

Demote root key and promote leaf key

Type #3: Enough siblings

root:    12 31
leaves:  7 9 | 15 29 | 43 56 69

Analysis of B-Trees

• maximum number of items in a B-tree of order m and height h:

root       $m - 1$
level 1    $m (m - 1)$
level 2    $m^2 (m - 1)$
. . .
level h    $m^h (m - 1)$

• So, the total number of items is $(1 + m + m^2 + m^3 + \dots + m^h)(m - 1) = [(m^{h+1} - 1)/(m - 1)](m - 1) = m^{h+1} - 1$

• When m = 5 and h = 2 this gives $5^3 - 1 = 124$
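
The closed form can be checked against the level-by-level sum with a small sketch; the function name `max_items` is chosen for this illustration.

```python
def max_items(m, h):
    """Maximum number of keys in a B-tree of order m and height h."""
    # level i holds at most m**i nodes, each with at most m - 1 keys
    total = sum(m**level * (m - 1) for level in range(h + 1))
    assert total == m**(h + 1) - 1     # closed form of the geometric sum
    return total

print(max_items(5, 2))      # order 5, height 2
print(max_items(101, 3))    # order 101, height 3
```

For order 5 and height 2 this reproduces the value 124 computed above.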

Reasons for using B-Trees

• When searching tables held on disc, the cost of each disc transfer is high but doesn't depend much on the amount of data transferred, especially if consecutive items are transferred

• If we use a B-tree of order 101, say, we can transfer each node in one disc read operation

• A B-tree of order 101 and height 3 can hold $101^4 - 1$ items (approximately 100 million) and any item can be accessed with 3 disc reads (assuming we hold the root in memory)

• If we take m = 3, we get a 2-3 tree, in which non-leaf nodes have two or three children (i.e., one or two keys)

• B-Trees are always balanced (since the leaves are all at the same level), so 2-3 trees make a good type of balanced tree