milo-richardson
2-3 Trees
• Extended tree. Tree in which all empty subtrees are replaced
by new nodes that are called external nodes. Original nodes are called internal nodes.
2-3 Tree Definition
• Every internal node is either a 2-node or a 3-node.
• A 2-node has one key and 2 children/subtrees.
All keys in left subtree are smaller than this key. All keys in right subtree are bigger than this key.
[Figure: 2-node with key 8, left subtree L, and right subtree R.]
• A 3-node has 2 keys and 3 children/subtrees; first key is smaller than second key.
All keys in left subtree are smaller than first key. All keys in middle subtree are bigger than first key and smaller than second key. All keys in right subtree are bigger than second key.
[Figure: 3-node with keys 1 and 3 and subtrees L, M, and R.]
• All external nodes are on the same level.
Minimum # Of Pairs/Elements
• Number of nodes = 2^h – 1, where h is tree height (excluding external nodes).
• Each node has 1 (key, value) pair.
• So, minimum # of pairs = 2^h – 1.
Maximum # Of Pairs/Elements
• Happens when all internal nodes are 3-nodes.
• Full degree 3 tree.
• # of nodes = 1 + 3 + 3^2 + 3^3 + … + 3^(h-1) = (3^h – 1)/2.
• Each node has 2 pairs.
• So, # of pairs = 3^h – 1.
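The two counts above can be checked numerically. The following is a minimal sketch; the function name `two_three_pair_bounds` is hypothetical and not part of the slides.

```python
def two_three_pair_bounds(h):
    """Return (min_pairs, max_pairs) for a 2-3 tree of height h."""
    min_pairs = 2**h - 1          # all internal nodes are 2-nodes, 1 pair each
    max_nodes = (3**h - 1) // 2   # 1 + 3 + 3^2 + ... + 3^(h-1)
    max_pairs = 2 * max_nodes     # all internal nodes are 3-nodes, 2 pairs each
    return min_pairs, max_pairs

for h in range(1, 5):
    print(h, two_three_pair_bounds(h))
```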
Node Structure
• 2-node uses LC, P1, and MC.
• 3-node uses all fields.
• May have optional parent field.
• Only internal nodes are represented!
LC P1 MC P2 RC
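One possible rendering of this node layout, as a sketch only: the class name `Node23` and the helper `add_pair_to_leaf_2node` are assumptions for illustration, and external nodes are modeled as `None` since only internal nodes are represented.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

Pair = Tuple[Any, Any]  # (key, value)

@dataclass
class Node23:
    LC: Optional["Node23"] = None   # left child
    P1: Optional[Pair] = None       # first pair
    MC: Optional["Node23"] = None   # middle child
    P2: Optional[Pair] = None       # second pair; None means this is a 2-node
    RC: Optional["Node23"] = None   # right child (used by 3-nodes only)

    def is_2node(self):
        return self.P2 is None

    def add_pair_to_leaf_2node(self, pair):
        """Turn a leaf 2-node into a 3-node, keeping P1's key < P2's key."""
        assert self.is_2node()
        if pair[0] < self.P1[0]:
            self.P2 = self.P1       # move P1 to P2
            self.P1 = pair          # insert as P1
        else:
            self.P2 = pair

node = Node23(P1=(17, "q"))
node.add_pair_to_leaf_2node((16, "p"))
```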
Insert
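[Figure: 2-3 tree with root 8; left child 4 with leaves (1 3) and (5 6); right child (15 20) with leaves (9), (16 17), and (30 40).]
• Move P1 to P2.
• Insert as P1.
• Now insert a pair with key = 2.
• New pair goes into a 3-node.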
Insert Into A Leaf 3-node
• Insert new pair so that the 3 keys are in ascending order.
• Move third key into a new 2-node.
• Insert second key and pointer to new 2-node into parent.
[Figure: leaf 1 2 3 splits into 2-nodes 1 and 3; key 2 moves up to the parent.]
Insert Into A Leaf 3-node
• Insert new pair so that the 3 keys are in ascending order.
• Move third key into a new 2-node.
• Insert second key and pointer to new 2-node into parent.
[Figure: leaf 16 17 18 splits into 2-nodes 16 and 18; key 17 moves up to the parent.]
Insert Into A Nonleaf 3-node
• Insert new pair and pointer so that the 3 keys are in ascending order.
• Move third key and 3rd and 4th pointers into a new 2-node.
• Insert second key and pointer to new 2-node into parent.
[Figure: node 15 17 20 splits into 2-nodes 15 and 20; key 17 moves up to the parent.]
Insert Into A Nonleaf 3-node
• Insert new pair and pointer so that the 3 keys are in ascending order.
• Move third key and 3rd and 4th pointers into a new 2-node.
• Insert second key and pointer to new 2-node into parent.
[Figure: node 5 6 7 splits into 2-nodes 5 and 7; key 6 moves up to the parent.]
Insert
• Insert a pair with key = 6 plus a pointer into parent.
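[Figure: root (8 17) with children (2 4), 15, and 20; leaves 1, 3, 5, 7, 9, 16, 18, and (30 40); key 6 is about to be inserted into (2 4).]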
Insert Into A Nonleaf 3-node
• Insert new pair and pointer so that the 3 keys are in ascending order.
• Move third key and 3rd and 4th pointers into a new 2-node.
• Insert second key and pointer to new 2-node into parent.
[Figure: node 2 4 6 splits into 2-nodes 2 and 6; key 4 moves up to the parent.]
Insert
• Insert a pair with key = 4 plus a pointer into parent.
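[Figure: root (8 17) with children 2, 6, 15, and 20; leaves 1, 3, 5, 7, 9, 16, 18, and (30 40); key 4 is about to be inserted into the root.]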
Insert
• Insert a pair with key = 8 plus a pointer into parent.
• There is no parent. So, create a new root.
[Figure: new root 8 with children 4 and 17; below them 2, 6, 15, and 20; leaves 1, 3, 5, 7, 9, 16, 18, and (30 40).]
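The insert walk-through above can be sketched in code. This is a minimal sketch, not the slides' implementation: it stores keys only (no values), keeps each node as a list of keys plus a list of children instead of the LC/P1/MC/P2/RC fields, ignores duplicate keys, and all names (`Node`, `insert`, `_insert`, `leaf`) are hypothetical.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # 1 or 2 keys (3 only while overfull)
        self.children = children or []    # empty list for a leaf

def _insert(node, key):
    """Insert key below node. Return (middle_key, new_node) if node split, else None."""
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                       # assumption: duplicates are ignored
    if not node.children:                 # leaf: keep keys in ascending order
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key)
        if split is None:
            return None
        mid, right = split                # child split: insert its key and pointer here
        node.keys.insert(i, mid)
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2:
        return None
    # Overfull: third key and 3rd/4th pointers go to a new 2-node,
    # second key is handed to the parent.
    right = Node([node.keys[2]], node.children[2:])
    mid = node.keys[1]
    node.keys = [node.keys[0]]
    node.children = node.children[:2]
    return mid, right

def insert(root, key):
    if root is None:
        return Node([key])
    split = _insert(root, key)
    if split is None:
        return root
    mid, right = split
    return Node([mid], [root, right])     # no parent: create a new root

# The example tree from the slides, just before key 2 is inserted.
leaf = lambda *ks: Node(list(ks))
root = Node([8], [
    Node([4], [leaf(1, 3), leaf(5, 6)]),
    Node([15, 20], [leaf(9), leaf(16, 17), leaf(30, 40)]),
])
for k in (2, 18, 7):
    root = insert(root, k)
```

Inserting 2, 18, and 7 in that order follows the same sequence of splits as the slides, ending with the creation of a new root.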
Delete
• Delete the pair with key = 8.
• Transform deletion from interior into deletion from a leaf.
• Replace by largest in left subtree.
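[Figure: 2-3 tree with root 8; left child (2 4) with leaves 1, 3, and (5 6); right child (15 20) with leaves 9, (16 17), and (30 40).]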
Delete From A Leaf
• Delete the pair with key = 16.
• 3-node becomes 2-node.
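[Figure: root 8; left child (2 4) with leaves 1, 3, and (5 6); right child (15 20) with leaves 9, (16 17), and (30 40). Leaf (16 17) becomes the 2-node 17.]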
Delete From A Leaf
• Delete the pair with key = 17.
• Deletion from a 2-node.
• Check one sibling and determine if it is a 3-node.
• If so borrow a pair and a subtree via parent node.
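[Figure: root 8; left child (2 4) with leaves 1, 3, and (5 6); right child (15 20) with leaves 9, 17, and (30 40). Sibling (30 40) is a 3-node.]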
Delete From A Leaf
• Delete the pair with key = 20.
• Deletion from a 2-node.
• Check one sibling and determine if it is a 3-node.
• If not, combine with sibling and parent pair.
[Figure: root 8; left child (2 4) with leaves 1, 3, and (5 6); right child (15 30) with leaves 9, 20, and 40.]
Delete From A Leaf
• Delete the pair with key = 30.
• Deletion from a 3-node.
• 3-node becomes 2-node.
[Figure: root 8; left child (2 4) with leaves 1, 3, and (5 6); right child 15 with leaves 9 and (30 40).]
Delete From A Leaf
• Delete the pair with key = 3.
• Deletion from a 2-node.
• Check one sibling and determine if it is a 3-node.
• If so borrow a pair and a subtree via parent node.
[Figure: root 8; left child (2 4) with leaves 1, 3, and (5 6); right child 15 with leaves 9 and 40.]
Delete From A Leaf
• Delete the pair with key = 6.
• Deletion from a 2-node.
• Check one sibling and determine if it is a 3-node.
• If not, combine with sibling and parent pair.
[Figure: root 8; left child (2 5) with leaves 1, 4, and 6; right child 15 with leaves 9 and 40.]
Delete From A Leaf
• Delete the pair with key = 40.
• Deletion from a 2-node.
• Check one sibling and determine if it is a 3-node.
• If not, combine with sibling and parent pair.
[Figure: root 8; left child 2 with leaves 1 and (4 5); right child 15 with leaves 9 and 40.]
Delete From A Leaf
• Parent pair was from a 2-node.
• Check one sibling and determine if it is a 3-node.
• If not, combine with sibling and parent pair.
[Figure: root 8; left child 2 with leaves 1 and (4 5); right child is now empty and has the single leaf (9 15).]
Delete From A Leaf
• Parent pair was from a 2-node.
• Check one sibling and determine if it is a 3-node.
• No sibling, so must be the root.
• Discard root. Left child becomes new root.
[Figure: node (2 8) with leaves 1, (4 5), and (9 15) becomes the new root.]
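The borrow and combine steps used in the deletions above can be sketched as a single repair routine. This is a sketch under assumptions: nodes are lists of keys and children (keys only, no values), the "one sibling" checked is the right sibling when it exists and otherwise the left one, and the names `Node` and `fix_underflow` are hypothetical.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def fix_underflow(parent, i):
    """Repair parent.children[i], which has just lost its only key.

    Returns True if parent is left with no keys, so the caller must repeat the
    repair one level up (or discard parent if it is the root).
    """
    child = parent.children[i]
    j = i + 1 if i + 1 < len(parent.children) else i - 1   # the one sibling checked
    sib = parent.children[j]
    sep = min(i, j)                 # index of the parent key between child and sibling
    if len(sib.keys) == 2:          # sibling is a 3-node: borrow via parent
        if j > i:
            child.keys.append(parent.keys[sep])
            parent.keys[sep] = sib.keys.pop(0)
            if sib.children:        # nonleaf: a subtree moves over as well
                child.children.append(sib.children.pop(0))
        else:
            child.keys.insert(0, parent.keys[sep])
            parent.keys[sep] = sib.keys.pop()
            if sib.children:
                child.children.insert(0, sib.children.pop())
        return False
    # Sibling is a 2-node: combine child, sibling, and the in-between parent pair.
    left, right = (child, sib) if j > i else (sib, child)
    left.keys = left.keys + [parent.keys.pop(sep)] + right.keys
    left.children = left.children + right.children
    parent.children.pop(sep + 1)
    return not parent.keys

# Delete 17, then 20, from the right subtree of the example tree.
leaf = lambda *ks: Node(list(ks))
parent = Node([15, 20], [leaf(9), leaf(17), leaf(30, 40)])
parent.children[1].keys.remove(17)
fix_underflow(parent, 1)            # sibling (30 40) is a 3-node: borrow
after_borrow = (list(parent.keys), [list(c.keys) for c in parent.children])
parent.children[1].keys.remove(20)
needs_more = fix_underflow(parent, 1)   # sibling 40 is a 2-node: combine
```

When `fix_underflow` returns True at the root, the root is discarded and its single remaining child becomes the new root, as in the last slide.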
2-3-4 Trees
• Problems with 2-3 trees.
2-3 node structure: LC P1 MC P2 RC
• 2-nodes waste space.
• Overhead of moving a pair and pointers when changing between 2-node and 3-node use.
• Extend to 2-3-4 tree, which may be represented as a binary tree.
2-3-4 Tree Definition
• Every internal node is either a 2-, 3-, or 4-node.
• 2- and 3-nodes have same properties as in a 2-3 tree.
• A 4-node has 3 keys and 4 children; 1st key is smaller than 2nd key which is smaller than 3rd key.
All keys in left subtree are smaller than 1st key.
All keys in 2nd subtree are bigger than 1st key and smaller than 2nd key.
All keys in 3rd subtree are bigger than 2nd key and smaller than 3rd key.
All keys in right subtree are bigger than 3rd key.
• All external nodes are on the same level.
Minimum # Of Pairs
• Number of nodes = 2^h – 1, where h is tree height (excluding external nodes).
• Each node has 1 (key, value) pair.
• So, minimum # of pairs = 2^h – 1.
Maximum # Of Pairs
• Happens when all internal nodes are 4-nodes.
• Full degree 4 tree.
• # of nodes = 1 + 4 + 4^2 + 4^3 + … + 4^(h-1) = (4^h – 1)/3.
• Each node has 3 pairs.
• So, # of pairs = 4^h – 1.
Node Structure
• 2-node uses LC, P1, and LMC.
• 3-node uses LC, P1, LMC, P2, and RMC.
• 4-node uses all fields.
• Optional parent field.
• Only internal nodes are represented!
LC P1 LMC P2 RMC P3 RC
Two-Pass Insert
• Move down from root to a leaf.
• Insert in leaf.
• If leaf now has 4 pairs, split as below.
[Figure: overfull leaf 10 20 30 40 with subtrees A, B, C, D, E splits into 2-node 10 (subtrees A, B) and new 3-node 30 40 (subtrees C, D, E).]
• Insert 20 and pointer to new 3-node into parent, as was done for 2-3 trees.
Two-Pass Delete
• Transform interior delete to leaf delete.
• Delete from a 3-node or 4-node leaf reduces leaf degree.
• Delete from a 2-node leaf.
Check one sibling and determine if it is a 3- or 4-node.
If so, borrow a pair and a subtree via parent node.
If not, combine with sibling 2-node and in-between pair in parent.
Continue up the tree if parent was a 2-node.
One-Pass Operations
• No bottom-to-top pass.
• Can pipeline inserts.
• Can pipeline deletes from leaf nodes.
Top-Down Insert
• Bottom-up pass is triggered when new pair is inserted into a 4-node leaf.
• Split 4-nodes on the way down so you never insert into a 4-node leaf!
• Look before you leap! If the node you are about to move to is a 4-node,
split it into two 2-nodes. Then move to a 2-node.
Cases For 4-node Move
• The 4-node we attempt to move to may be: The root. Child of a 2-node. Child of a 3-node.
• It cannot be the child of a 4-node, because we will never be at a 4-node.
Root Is A 4-node
• Height of tree increases by 1.
• Compare with y and then move to x or z.
[Figure: root 4-node x y z with subtrees a, b, c, d becomes root y with children x (subtrees a, b) and z (subtrees c, d).]
4-node Left Child Of 2-node
• No change in height of subtree.
• Compare with x and then move to w or y.
• 4-node right child of 2-node is similar.
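[Figure: 2-node z with children w x y (subtrees a, b, c, d) and e becomes 3-node x z with children w (subtrees a, b), y (subtrees c, d), and e.]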
4-node Left Child Of 3-node
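• No change in height of subtree.
• Compare with w and then move to v or x.
• 4-node middle or right child of 3-node is similar.
[Figure: 3-node y z with children v w x (subtrees a, b, c, d), e, and f becomes 4-node w y z with children v (subtrees a, b), x (subtrees c, d), e, and f.]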
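The three split cases above reduce to one operation once the root is handled separately. The following is a sketch of one-pass (top-down) insertion under assumptions: keys only, distinct keys, list-based nodes rather than the LC/P1/LMC/P2/RMC/P3/RC fields, and hypothetical names throughout.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                 # 1 to 3 keys
        self.children = children or []   # empty list for a leaf

def split_child(parent, i):
    """Split 4-node parent.children[i] into two 2-nodes; its middle key moves into parent."""
    node = parent.children[i]
    left = Node(node.keys[:1], node.children[:2])
    right = Node(node.keys[2:], node.children[2:])
    parent.keys.insert(i, node.keys[1])
    parent.children[i:i + 1] = [left, right]

def insert(root, key):
    if root is None:
        return Node([key])
    if len(root.keys) == 3:              # root is a 4-node: height increases by 1
        root = Node([], [root])
        split_child(root, 0)
    node = root
    while node.children:
        i = bisect.bisect_left(node.keys, key)
        if len(node.children[i].keys) == 3:   # look before you leap
            split_child(node, i)
            if key > node.keys[i]:       # compare with the key that moved up
                i += 1
        node = node.children[i]
    bisect.insort(node.keys, key)        # leaf is a 2- or 3-node, so it cannot overflow
    return root

def inorder(node):
    """Keys in ascending order, used here only to check the result."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

def leaf_depths(node, depth=1):
    """Set of depths at which leaves occur; a valid tree yields a single value."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))

keys = [50, 10, 40, 20, 30, 90, 70, 60, 80, 15, 25, 35, 45, 55, 65, 75, 85, 95, 5, 1]
root = None
for k in keys:
    root = insert(root, k)
```

Because every 4-node is split before the search moves into it, the parent always has room for the key that moves up, so no second pass is needed.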
Top-Down Delete
• Bottom-up pass is triggered when deletion is from a 2-node leaf.
• Look before you leap!
May start at a 2-node root but may not be at any other 2-node.
If the node you are about to move to is a 2-node, make it a 3-node or 4-node.
Then move to the 3-node or 4-node.
Cases For 2-node Move
• Moving to a 2-node root is permitted.
• No other move to a 2-node is permitted.
• Other attempts to move to a 2-node may be classified as below.
The 2-node’s nearest sibling is also a 2-node.
The 2-node’s nearest sibling is a 3-node.
The 2-node’s nearest sibling is a 4-node.
In each of the preceding cases, the 2-node’s parent may be a 2-node root, a 3-node, or a 4-node.
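Moving To 2-node Whose Nearest Sibling Is 2-node
• Current node is 2-node => at root.
• Height decreases by 1.
• Reapply moving rules before you move down.
[Figure: root y with 2-node children x (subtrees a, b) and z (subtrees c, d) becomes the single 4-node x y z with subtrees a, b, c, d.]
Moving To 2-node Whose Nearest Sibling Is 2-node
• Current node is 3-node.
• Moving to w.
• No change in height of subtree.
• Moving to middle or right child is similar.
• Current node is 4-node is also similar.
[Figure: 3-node x z with children w (subtrees a, b), y (subtrees c, d), and e becomes 2-node z with children w x y (subtrees a, b, c, d) and e.]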
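Moving To 2-node Whose Nearest Sibling Is 3-node
• Current node is 2-node => at root.
• Moving to w.
• No change in height of tree.
• Moving to right child 2-node is similar.
[Figure: root x with children w (subtrees a, b) and y z (subtrees c, d, e) becomes root y with children w x (subtrees a, b, c) and z (subtrees d, e).]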
Moving To 2-node Whose Nearest Sibling Is 3-node
• Current node is 3-node.
• Moving to v.
• No change in height of subtree.
• Moving to middle or right child 2-node is similar.
• Current node is 4-node is also similar.
[Figure: 3-node w z with children v (subtrees a, b), x y (subtrees c, d, e), and f becomes 3-node x z with children v w (subtrees a, b, c), y (subtrees d, e), and f.]
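Moving To 2-node Whose Nearest Sibling Is 4-node
• Current node is 2-node => at root.
• Moving to u.
• No change in height of tree.
• Moving to right child 2-node is similar.
[Figure: root v with children u (subtrees a, b) and w x y (subtrees c, d, e, f) becomes root w with children u v (subtrees a, b, c) and x y (subtrees d, e, f).]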
Moving To 2-node Whose Nearest Sibling Is 4-node
• Current node is 3-node.
• Moving to u.
• No change in height of subtree.
• Moving to middle or right child 2-node is similar.
• Current node is 4-node is also similar.
[Figure: 3-node v z with children u (subtrees a, b), w x y (subtrees c, d, e, f), and g becomes 3-node w z with children u v (subtrees a, b, c), x y (subtrees d, e, f), and g.]
Binary Tree Representation Of 2-3-4 Trees
• Problems with 2-3-4 trees.
• 2- and 3-nodes waste space.
• Overhead of moving pairs and pointers when changing among 2-, 3-, and 4-node use.
• Represented as a binary tree for improved space and time performance.
2-3-4 node structure: LC P1 LMC P2 RMC P3 RC
Properties Of Binary Tree Representation
• Nodes and edges are colored.
The root is black.
Nonroot black node has a black edge from its parent.
Red node has a red edge from its parent.
• Can deduce edge color from node color and vice versa.
• Need to keep either edge or node colors, not both.
Red Black Trees
Colored Nodes Definition
• Binary search tree.
• Each node is colored red or black.
• Root and all external nodes are black.
• No root-to-external-node path has two consecutive red nodes.
• All root-to-external-node paths have the same number of black nodes.
Red Black Trees
Colored Edges Definition
• Binary search tree.
• Child pointers are colored red or black.
• Pointer to an external node is black.
• No root to external node path has two consecutive red pointers.
• Every root to external node path has the same number of black pointers.
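The colored-nodes definition translates directly into a checker. This is a minimal sketch: the class `RBNode`, the string colors, and the function names are assumptions, and external nodes are modeled as `None`.

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def black_height(node, lo=None, hi=None, parent_red=False):
    """Number of black nodes on each path from node to an external node
    (the external node included). Raises ValueError if a rule is violated."""
    if node is None:                       # external nodes are black
        return 1
    if (lo is not None and node.key <= lo) or (hi is not None and node.key >= hi):
        raise ValueError("not a binary search tree")
    red = node.color == "red"
    if red and parent_red:
        raise ValueError("two consecutive red nodes")
    left = black_height(node.left, lo, node.key, red)
    right = black_height(node.right, node.key, hi, red)
    if left != right:
        raise ValueError("paths have different numbers of black nodes")
    return left + (0 if red else 1)

def is_red_black(root):
    if root is not None and root.color != "black":
        return False                       # root must be black
    try:
        black_height(root)
        return True
    except ValueError:
        return False

# A 4-node 1 2 3 represented as a black node with two red children.
four_node = RBNode(2, "black", RBNode(1, "red"), RBNode(3, "red"))
```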
Red Black Tree
• The height of a red black tree that has n (internal) nodes is between log_2(n+1) and 2 log_2(n+1).
• C++ STL implementation
• java.util.TreeMap => red black tree
4-node Left Child Of 3-node
[Figures: 3-node y z with 4-node child v w x (subtrees a, b, c, d) and other subtrees e and f, before and after the split into 4-node w y z with children v and x, shown together with the corresponding red-black trees.]
4-node Middle Child Of 3-node
[Figures: 3-node v z with subtree a, 4-node middle child w x y (subtrees b, c, d, e), and subtree f, before and after the split into 4-node v x z with children w and y, shown together with the corresponding red-black trees.]
4-node Right Child Of 3-node
• One orientation of 3-node requires color flip.
• Other orientation requires RR rotation.
Red-Black Analysis
• Less memory required than by 2-3-4 representation.
• Less time required by 4-node splits when red-black representation is used.
• O(log n) rotations per insert/delete.
B-Trees
• Extension of 2-3 and 2-3-4 trees to higher degree trees.
• Used to represent very large dictionaries that reside on disk.
AVL Trees
• n = 2^30 = 10^9 (approx).
• 30 <= height <= 43.
• When the AVL tree resides on a disk, up to 43 disk accesses are made for a search.
• This takes up to (approx) 4 seconds.
• Not acceptable.
Red-Black Trees
• n = 2^30 = 10^9 (approx).
• 30 <= height <= 60.
• When the red-black tree resides on a disk, up to 60 disk accesses are made for a search.
• This takes up to (approx) 6 seconds.
• Not acceptable.
Maximum # Of Pairs
• Happens when all internal nodes are m-nodes.
• Full degree m tree.
• # of nodes = 1 + m + m^2 + m^3 + … + m^(h-1) = (m^h – 1)/(m – 1).
• Each node has m – 1 pairs.
• So, # of pairs = m^h – 1.
Capacity Of m-Way Search Tree
         m = 2    m = 200
h = 3    7        8 * 10^6 - 1
h = 5    31       3.2 * 10^11 - 1
h = 7    127      1.28 * 10^16 - 1
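The table entries follow from the maximum count of m^h – 1 pairs. A short sketch that recomputes them (the function name `max_pairs` is hypothetical):

```python
def max_pairs(m, h):
    """Maximum number of pairs in an m-way search tree of height h."""
    nodes = (m**h - 1) // (m - 1)   # 1 + m + m^2 + ... + m^(h-1)
    return nodes * (m - 1)          # each node holds m - 1 pairs

for h in (3, 5, 7):
    print(h, max_pairs(2, h), max_pairs(200, h))
```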
Definition Of B-Tree
• Definition assumes external nodes (extended m-way search tree).
• B-tree of order m.
m-way search tree.
Not empty => root has at least 2 children.
Remaining internal nodes (if any) have at least ceil(m/2) children.
External (or failure) nodes on same level.
2-3 And 2-3-4 Trees
• 2-3 tree is B-tree of order 3.
• 2-3-4 tree is B-tree of order 4.
B-Trees Of Order 5 And 2
• B-tree of order 5 is 3-4-5 tree (root may be 2-node though).
• B-tree of order 2 is full binary tree.
Minimum # Of Pairs
• n = # of pairs.
• # of external nodes = n + 1.
• Height = h => external nodes on level h + 1.
level    # of nodes
1        1
2        >= 2
3        >= 2*ceil(m/2)
h + 1    >= 2*ceil(m/2)^(h-1)
n + 1 >= 2*ceil(m/2)^(h-1), h >= 1
Minimum # Of Pairs
• m = 200.
n + 1 >= 2*ceil(m/2)^(h-1), h >= 1
height   # of pairs
2        >= 199
3        >= 19,999
4        >= 2 * 10^6 – 1
5        >= 2 * 10^8 – 1
h <= log_ceil(m/2) [(n+1)/2] + 1
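The bound n + 1 >= 2*ceil(m/2)^(h-1) can be evaluated directly. The sketch below recomputes the table for m = 200 and finds the largest height a B-tree with n pairs can have; the function names are hypothetical and n >= 1 is assumed.

```python
def min_pairs(m, h):
    """Minimum number of pairs in a B-tree of order m and height h >= 1."""
    d = (m + 1) // 2                 # ceil(m/2) in integer arithmetic
    return 2 * d ** (h - 1) - 1

def max_height(m, n):
    """Largest h such that a B-tree of order m and height h can hold only n pairs."""
    h = 1
    while min_pairs(m, h + 1) <= n:
        h += 1
    return h

for h in range(2, 6):
    print(h, min_pairs(200, h))
print(max_height(200, 10**9))
```

With m = 200, a dictionary of about 10^9 pairs has height at most 5, compared with up to 43 or 60 levels for the binary trees discussed earlier.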
Choice Of m
• Worst-case search time.
(time to fetch a node + time to search node) * height
(a + b*m + c * log_2 m) * h
where a, b and c are constants.
[Figure: plot of search time versus m, with 50 and 400 marked on the m axis.]
Bottom-Up Insert
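[Figure: 2-3 tree with root 8; left child 4 with leaves (1 3) and (5 6); right child (15 20) with leaves (9), (16 17), and (30 40).]
Insertion into a full leaf triggers bottom-up node splitting pass.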
Split An Overfull Node
• ai is a pointer to a subtree.
• pi is a dictionary pair.
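Overfull node (the leading number is the count of pairs in the node):
m a_0 p_1 a_1 p_2 a_2 … p_m a_m
Left node after the split:
ceil(m/2)-1 a_0 p_1 a_1 p_2 a_2 … p_(ceil(m/2)-1) a_(ceil(m/2)-1)
New node:
m-ceil(m/2) a_ceil(m/2) p_(ceil(m/2)+1) a_(ceil(m/2)+1) … p_m a_m
• p_ceil(m/2) plus pointer to new node is inserted in parent.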
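The split rule can be written as list slicing. This is a sketch assuming the overfull node is given as a list of its m pairs and a list of its m + 1 subtree pointers; the name `split_overfull` is hypothetical.

```python
def split_overfull(pairs, pointers, m):
    """Split an overfull node of a B-tree of order m.

    pairs    = [p_1, ..., p_m]
    pointers = [a_0, ..., a_m]
    Returns (left, up, right): left and right are (pairs, pointers) tuples and
    up is the pair p_ceil(m/2) that goes to the parent.
    """
    assert len(pairs) == m and len(pointers) == m + 1
    d = (m + 1) // 2                          # ceil(m/2)
    left = (pairs[:d - 1], pointers[:d])      # p_1 .. p_(d-1), a_0 .. a_(d-1)
    up = pairs[d - 1]                         # p_d
    right = (pairs[d:], pointers[d:])         # p_(d+1) .. p_m, a_d .. a_m
    return left, up, right

# Order 4 (a 2-3-4 tree): the overfull leaf from the two-pass insert slide.
left, up, right = split_overfull([10, 20, 30, 40], ["A", "B", "C", "D", "E"], 4)
```

For m = 3 the same function gives the 2-3 tree rule: the first key stays, the second moves up, and the third goes into a new 2-node.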
Worst-Case Disk Accesses
• Assume enough memory to hold all h nodes accessed on way down.
• h read accesses on way down.
• 2s+1 write accesses on way up, s = number of nodes that split.
• Total h+2s+1 disk accesses.
• Max is 3h+1.
Average Disk Accesses
• Start with empty B-tree.
• Insert n pairs.
• Resulting B-tree has p nodes.
• # splits <= p – 2, p > 2.
• # pairs >= 1 + (ceil(m/2) – 1)(p – 1).
• s_avg <= (p – 2)/(1 + (ceil(m/2) – 1)(p – 1)).
• So, s_avg < 1/(ceil(m/2) – 1).
• m = 200 => s_avg < 1/99.
• Average disk accesses < h + 2/99 + 1 ~ h + 1.
• Nearly minimum.
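A quick numeric check of the access counts from the last two slides, as a sketch with hypothetical function names:

```python
def disk_accesses(h, s):
    """h reads on the way down plus 2s + 1 writes on the way up (s = nodes split)."""
    return h + 2 * s + 1

def s_avg_bound(m, p):
    """Upper bound on the average number of splits per insert for a B-tree with p > 2 nodes."""
    d = (m + 1) // 2                 # ceil(m/2)
    return (p - 2) / (1 + (d - 1) * (p - 1))

h = 3
print(disk_accesses(h, h))           # worst case: every node on the path splits
print(s_avg_bound(200, 10**6))       # stays below 1/99
```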