Upload
kenneth-mason
View
220
Download
3
Tags:
Embed Size (px)
Citation preview
COSC 2007Data Structures II
Chapter 15External Methods
2
Topics
Indexing B tree
Insertion deletion
B+ tree
3
External Data Structure Data structures are not always stored in
the computer memory Volatile Has a limited capacity Fast, which makes it relatively expensive
Sometimes, we need to store, maintain and perform operations on our data structures entirely on disk Called external data structures
4
External Data Structure Problems: disks are much slower than
memory Disk access time usually measured in milliseconds Memory access time measured in nanoseconds
So the same data structures that work well in memory may be really awful on disk
5
External Data Structure Two types of files
Sequential files Access to records done in a strictly sequential
manner Searching a file using sequential access takes
O(n) where n is the number of records in file to be read
Random files Access to records done strictly by a key look up
mechanism
A Look At External Storage
7
Indexing In order to gain fast random access to
records in block, maintain index structure Index on largest/smallest key value
8
Indexing
An index is much like an index in a book In a book, an index provides a way to quickly look up info
on a particular topic by giving you a page number which you then use to go directly to the info you need
In an Indexed file, the index accepts a key value and gives you back the disk address of a block of data containing the data record with that key
Thus, an indexed file consists of two parts The index The actual file data
Indexing An External File An index (or index file)
Used to locate items in an external data file Contains an index record for each record in the data file
Indexing An External File An index record has two parts
A key contains the same value as the search key of its corresponding record in the data file
A pointer shows the number of the block in the data file that contains the data record
Advantages of an index file An index file can often be manipulated with fewer block
accesses than would be needed to manipulate the data file Data records do not need to be shifted during insertions and
deletions Allows multiple indexing
Indexing An External File
A simple scheme for organizing the index file Store index records sequentially
12
B-Trees Almost all file systems on almost all computers use
B-Trees to keep track of which portions of which files are in which disk sectors.
B-Trees are an example of multiway trees. In multiway trees, nodes can have multiple data
elements (in contrast to one for a binary tree node). Each node in a B-Tree can represent possibly many
subtrees.
13
2-3 Trees A 2-node, which has two children
Must contain a single data item whose search key si greater than the left child’s and less than the right child’s
A 3-node, which has three children Must contain two data items whose search keys
satisfy certain condition A leaf node contain either one of two data items
s
<S >s
S L
<S >S, <L >L
Nodes
Figure 15-11Figure 15-11a) A node with two children; b) a node with three children; c) a node with m
children
15
m-Way Trees An m-way tree is a search tree in which each node can
have from zero to m subtrees. m is defined as the order of the tree. In a non-empty m-way tree:
Each node has 0 to m subtrees. Given a node with k<m subtrees, the node contains k subtrees (some of
which may be null) and k-1 data entries. The keys are ordered, key1<=key2<=key3<=….<=keyk-1.
The key values in the first subtree are less than the key values in the first entry.
A binary search tree is an m-way tree of order ?. A 2-3 tree is an m-way tree of order ?
16
An m-way tree
A 4-way Tree
Keys
Subtrees
K1 K2 K3
Keys < K1 K1 <=Keys < K2 K2 <=Keys < K3 Keys >= K3
17
B-Trees
A B-Tree is an m-way tree with the following additional properties: The root is either a leaf or it has 2….m subtrees. All internal nodes have at least m/2 non-null subtrees and at most m nonnull
subtrees. All leaf nodes are at the same level; that is, the tree is perfectly balanced. A leaf node has at least m/2 -1 and at the most m-1 entries.
There are four basic operations for B-Trees: insert (add) delete (remove) traverse search
18
A B-tree of Order 5* (m=5)
*Min # of subtrees is 3 and max is 5;*Min # of entries is 2 and max is 4
42
11 14 17 19 20 21 22 23 24 45 52 63 65 74 78 79 85 87 94 97
16 21 58 76 81 93
Root
Node with minimum entries (2)
Node with maximumentries (4)
Four keys, five subtrees
19
B-Tree Search
Search in a B-tree is a generalization of search in a 2-3 tree.
Perform a binary search on the keys in the current node. If the search key is found, then return the record. If the current node is a leaf node and the key is not found, then report an unsuccessful search.
Otherwise, follow the proper branch and repeat the process.
20
Insertion B-tree insertion takes place at a leaf node. Steps: locate the leaf node for the data being inserted.
if node is not full (max no. of entries) then insert data in sequence in the node.
When leaf node is full, we have an overflow condition. Insert the element anyway (temporary violate tree conditions) Split node into two nodes Each new node contains half the data middle entry is promoted to the parent (which may in turn become
full!) B-trees grow in a balanced fashion from the bottom up!
21
Follow Through An Example Given a B-Tree structure of order m=5. Insert 11, 21, 14, 78, and 97. Suppose I now add the following keys to the tree: 85, 74, 63,
42, 45, 57.
22
B-tree Deletion Deletion is done similarly
If the number of items in a leaf falls below the minimum, adopt an item from a neighboring leaf
If the number of items in the neighboring leaves are also minimum, combine two leaves. Their parent will then lose a child and it may need to be combined with its neighbor
This combination process should be recursively executed up the tree until:
Getting to the root A parent has more than the minimum number of children
B-Trees
The steps for deleting 73
B-Trees
The steps for deleting 73
B-Trees
The steps for deleting 73
B-Trees
The steps for deleting 73
27
B+ Trees
B-tree only (effectively) gives you random access to data
B+ tree gives you the ability to access data sequentially as well Internal nodes do not store records, only key values to guide the
search. Leaf nodes store records or pointers to the records. A leaf node has a pointer to the next sibling node. This allows
for sequential processing. An internal node with 3 keys has 4 pointers. The 3 keys are the
smallest values in the last 3 nodes pointed to by the 4 pointers. The first pointer points to nodes with values less than the first key.
28
Sample B+-Tree
Los Angeles
Detroit
Baltimore Chicago Detroit
Redwood City
Los Angeles
Redwood City SF B+-tree with n=3 interior nodes: no more than 3 pointers,
but at least 2