griselda-townsend
B-Trees (Arboles B)
7.1 External Search
• The algorithms we have seen so far work well when all data are stored in primary storage (RAM), where access is fast.
• Large data sets are frequently stored on secondary storage devices (hard disk), where access is slower (about 100-1000 times slower).
Access is always to a complete block (page) of data (4096 bytes), which is loaded into RAM.
For efficiency: keep the number of page accesses low!
For external search we use a variant of search trees in which 1 node = 1 page:
multiway search trees!
Definition (multiway search trees)
An empty tree is a multiway search tree with the empty set of keys {}.
Let T_0, ..., T_n be multiway search trees with keys taken from a common key set S, and let k_1, ..., k_n be a sequence of keys with k_1 < ... < k_n. Then the sequence
T_0 k_1 T_1 k_2 T_2 k_3 ... k_n T_n
is a multiway search tree if and only if:
• for all keys x in T_0: x < k_1
• for i = 1, ..., n-1 and all keys x in T_i: k_i < x < k_{i+1}
• for all keys x in T_n: k_n < x
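The three ordering conditions can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class with a `keys` list and a `children` list (these names are not from the slides) and representing the empty tree as `None`:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node layout: keys k1 < ... < kn and subtrees T0..Tn.
    keys: List[int] = field(default_factory=list)
    children: List[Optional["Node"]] = field(default_factory=list)  # [] for a leaf


def is_search_tree(t: Optional[Node], lo=None, hi=None) -> bool:
    """Check that all keys of t lie strictly between lo and hi
    (None means unbounded) and that the conditions hold recursively."""
    if t is None:                      # the empty tree is a valid search tree
        return True
    bounds = [lo] + t.keys + [hi]
    # lo < k1 < ... < kn < hi
    for a, b in zip(bounds, bounds[1:]):
        if a is not None and b is not None and not a < b:
            return False
    if not t.children:
        return True
    if len(t.children) != len(t.keys) + 1:
        return False
    # subtree Ti must lie between ki and k(i+1); T0 and Tn inherit lo and hi
    return all(is_search_tree(c, bounds[i], bounds[i + 1])
               for i, c in enumerate(t.children))


good = Node([10, 20], [Node([5]), Node([15]), Node([25])])
bad = Node([10, 20], [Node([5]), Node([25]), Node([30])])  # 25 is not < 20
print(is_search_tree(good), is_search_tree(bad))
```

Passing the bounds down the recursion is one possible design: it avoids collecting all keys of a subtree before comparing them with k_i and k_{i+1}.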
B-Tree
Definition
A B-tree of order m is a multiway search tree with the following characteristics:
• $1 \le \#(\text{keys in the root}) \le 2m$, and $m \le \#(\text{keys in the node}) \le 2m$ for all other nodes.
• All paths from the root to a leaf are equally long.
• Each internal node (not a leaf) which has s keys has exactly s+1 children.
• 2-3 trees are the special case m = 1.
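The structural conditions of the definition (key counts, equal path lengths, s+1 children) can be verified with a short recursive check. This is a sketch under assumptions: the `Node` class and the function names are hypothetical, and the key ordering itself is not checked here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node layout; a leaf has an empty children list.
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def leaf_depth(node: Node, m: int, is_root: bool = True) -> Optional[int]:
    """Return the common depth of all leaves below node,
    or None if one of the B-tree conditions is violated."""
    lowest = 1 if is_root else m       # the root may hold as few as one key
    if not lowest <= len(node.keys) <= 2 * m:
        return None
    if not node.children:              # leaf
        return 0
    if len(node.children) != len(node.keys) + 1:
        return None                    # s keys require exactly s+1 children
    depths = {leaf_depth(c, m, False) for c in node.children}
    if len(depths) != 1 or None in depths:
        return None                    # paths of different length, or a bad subtree
    return depths.pop() + 1


def is_btree(root: Node, m: int) -> bool:
    return leaf_depth(root, m) is not None


ok = Node([10], [Node([3, 5]), Node([12, 15, 20])])
too_few = Node([10], [Node([3]), Node([12, 15])])          # a leaf with 1 < m keys
uneven = Node([10], [Node([3, 5]),
                     Node([15, 20], [Node([11, 12]), Node([16, 17]), Node([21, 22])])])
print(is_btree(ok, 2), is_btree(too_few, 2), is_btree(uneven, 2))
```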
Example: a B-tree of order 2 (figure)
Assessment of B-trees
The minimal possible number of nodes in a B-tree of order m and height h:
• Number of nodes in each of the two subtrees of the root: $1 + (m+1) + (m+1)^2 + \dots + (m+1)^{h-1} = ((m+1)^h - 1)/m$.
The root of the minimal tree has only one key and two children; all other nodes have m keys.
Altogether, the number of keys n in a B-tree of height h satisfies $n \ge 2(m+1)^h - 1$.
Thus the following holds for every B-tree of height h with n keys: $h \le \log_{m+1}((n+1)/2)$.
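Spelled out step by step, the bound follows from the geometric sum and the key count of the minimal tree (one key in the root, two subtrees, m keys in every other node):

```latex
\begin{align*}
\text{nodes per subtree} &= \sum_{i=0}^{h-1} (m+1)^i = \frac{(m+1)^h - 1}{m}, \\
n &\ge 1 + 2 \cdot m \cdot \frac{(m+1)^h - 1}{m} = 2(m+1)^h - 1, \\
\frac{n+1}{2} &\ge (m+1)^h
\quad\Longrightarrow\quad
h \le \log_{m+1}\frac{n+1}{2}.
\end{align*}
```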
Example
The following holds for every B-tree of height h with n keys:
$h \le \log_{m+1}((n+1)/2)$.
For
• page size: 1 KByte and
• each entry plus pointer: 8 bytes,
if we choose m = 63, then for a data set of n = 1 000 000 keys we have
$h \le \log_{64} 500\,000.5 < 4$, and therefore $h_{max} = 3$.
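The arithmetic of this example can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `max_height` is an assumption, and the height is found with integer arithmetic from the inequality n ≥ 2(m+1)^h - 1 rather than by rounding a floating-point logarithm.

```python
import math


def max_height(n: int, m: int) -> int:
    """Largest h with 2*(m+1)**h - 1 <= n, using exact integer arithmetic."""
    h = 0
    while 2 * (m + 1) ** (h + 1) - 1 <= n:
        h += 1
    return h


page_size, entry_size = 1024, 8        # bytes, as in the example
m = 63                                 # 2m = 126 entries of 8 bytes fit in one page
assert 2 * m * entry_size <= page_size

n = 1_000_000
bound = math.log((n + 1) / 2, m + 1)   # log_64(500000.5)
print(f"h <= {bound:.3f}, so h_max = {max_height(n, m)}")
```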
Algorithms for searching keys in a B-tree
Algorithm search(r, x)
// search for key x in the tree with root node r
// global variable p = pointer to the last node visited
in r, search for the first key y >= x, or until there are no more keys
if y == x {stop search, p = r, found}
else if r is a leaf {stop search, p = r, not found}
else if not past the last key
    search(pointer to the node before y, x)
else
    search(last pointer, x)
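The search procedure can be sketched in Python as follows. The `Node` class is a hypothetical in-memory stand-in for a page, and instead of the global variable p the function returns the last node visited together with a found flag. The search inside a node uses `bisect_left`, which gives the index of the first key y >= x.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Hypothetical node layout; a leaf has an empty children list.
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def search(r: Node, x: int) -> Tuple[Node, bool]:
    """Return (p, found), where p is the last node visited."""
    while True:
        i = bisect_left(r.keys, x)      # first key y >= x, or len(keys) if none
        if i < len(r.keys) and r.keys[i] == x:
            return r, True              # y == x: found
        if not r.children:
            return r, False             # r is a leaf: not found
        # i is the pointer before y, or the last pointer if we are past the last key
        r = r.children[i]


root = Node([10, 20], [Node([3, 5]), Node([12, 15]), Node([25, 30])])
p, found = search(root, 15)
print(found, p.keys)
```

The loop replaces the tail recursion of the pseudocode; in an external setting each step of the loop would correspond to one page access.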
Algorithms for inserting and deleting keys in a B-tree
Algorithm insert(r, x)
// insert key x in the tree with root r
search for x in the tree with root r;
if x was not found {
    let p be the leaf where the search stopped;
    insert x in the right position;
    if p now has 2m+1 keys {overflow(p)}
}
Algorithm overflow(p) = split(p)
Algorithm split(p), first case: p has a parent q.
Divide the overflowed node. The middle key goes to the parent.
Remark: the splitting may propagate up to the root, in which case the height of the tree increases by one.
Algorithm split(p), second case: p is the root.
Divide the overflowed node. Open a new level above it containing a new root with the middle key (the new root has one key).
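Insertion with both split cases can be sketched as below. This is one possible realization, not necessarily the one intended by the slides: the `Node` class and all function names are hypothetical, and instead of parent pointers the recursive helper returns the middle key and the new right sibling to its caller, which plays the role of the parent q.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical node layout; a leaf has an empty children list.
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def _insert(p: Node, x: int, m: int) -> Optional[Tuple[int, Node]]:
    """Insert x below p. If p overflows, split it and return
    (middle key, new right sibling); otherwise return None."""
    i = bisect_left(p.keys, x)
    if i < len(p.keys) and p.keys[i] == x:
        return None                          # x is already in the tree
    if not p.children:
        p.keys.insert(i, x)                  # leaf: insert at the right position
    else:
        up = _insert(p.children[i], x, m)
        if up is None:
            return None                      # child did not split
        mid, right = up                      # first case: middle key goes to the parent
        p.keys.insert(i, mid)
        p.children.insert(i + 1, right)
    if len(p.keys) <= 2 * m:
        return None
    # overflow: p has 2m+1 keys, divide it around the middle key
    mid = p.keys[m]
    right = Node(p.keys[m + 1:], p.children[m + 1:])
    p.keys, p.children = p.keys[:m], p.children[:m + 1]
    return mid, right


def insert(root: Node, x: int, m: int) -> Node:
    """Insert x and return the (possibly new) root."""
    up = _insert(root, x, m)
    if up is None:
        return root
    mid, right = up                          # second case: the root was split
    return Node([mid], [root, right])        # new root with one key


def keys_in_order(p: Node) -> List[int]:
    if not p.children:
        return list(p.keys)
    out: List[int] = []
    for i, c in enumerate(p.children):
        out += keys_in_order(c)
        if i < len(p.keys):
            out.append(p.keys[i])
    return out


root = Node()
for x in range(1, 8):
    root = insert(root, x, m=1)              # order 1, i.e. a 2-3 tree
print(root.keys, [c.keys for c in root.children], keys_in_order(root))
```

In this run the insertion of 7 splits a leaf, which overflows its parent, which in turn is the root, so the height grows by one.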
Algorithm delete(r, x)
// delete key x from the tree with root r
search for x in the tree with root r;
if x found {
    if x is in an internal node {
        exchange x with the next bigger key x' in the tree
        // if x is in an internal node, there must be
        // at least one bigger key in the tree;
        // this key is in a leaf!
    }
    let p be the leaf containing x;
    erase x from p;
    if p is not the root r {
        if p has m-1 keys {underflow(p)}
    }
}
Algorithm underflow(p)
if p has a neighboring node p' with s > m keys {balance(p, p')}
else
// because p cannot be the root, p must have a neighbor with m keys
{let p' be the neighbor with m keys; merge(p, p')}
Algorithm balance(p, p')
// balance node p with its neighbor p'
(s > m, r = (m+s)/2 - m)
Algorithm merge(p, p')
// merge node p with its neighbor p'
perform the following operation (figure): p, the separating key from the parent q, and p' are combined into one node.
afterwards:
if (q <> root) and (q has m-1 keys) underflow(q)
else if (q = root) and (q is empty) {free q; let root point to the merged node p}
Recursion
If performing underflow requires a merge, we might have to perform underflow again one level up.
This process might be repeated up to the root.
Example: B-tree of order 2 (m = 2)
Cost
Let m be the order of the B-tree and n the number of keys.
Cost of search, insert and delete: $O(h) = O(\log_{m+1}((n+1)/2)) = O(\log_{m+1} n)$.
Remark:
B-trees can also be used as an internal storage structure,
especially B-trees of order 1 (then there are only one or two keys in each node, so no elaborate search inside the nodes is needed).
Cost of search, insert, delete: O(log n).
Remark: storage utilization
Over 50%. Reason: the condition
$\frac{1}{2} k \le \#(\text{keys in the node}) \le k$ for all nodes other than the root
(k = 2m).