COSC 2007 Data Structures II Chapter 14 External Methods


Topics
  Indexing
  B-tree
    Insertion, deletion
  B+ tree


External Data Structure
Data structures are not always stored in the computer's main memory, because memory is:
  Volatile
  Limited in capacity
  Fast, which makes it relatively expensive
Sometimes we need to store, maintain, and perform operations on our data structures entirely on disk.
  These are called external data structures.


External Data Structure
Problem: disks are much slower than memory.
  Disk access time is usually measured in milliseconds.
  Memory access time is measured in nanoseconds.
So the same data structures that work well in memory may perform very poorly on disk.
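To make the gap concrete, the rough arithmetic below counts how many nodes a search visits when each node has a given number of branches. The record count, the fan-out of 100, and the 10 ms per disk access are illustrative assumptions, not figures from these slides.

```python
def levels(n, fanout):
    """Smallest h with fanout**h >= n: a rough count of nodes visited per search."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        h += 1
    return h

N = 1_000_000   # assumed number of records
DISK_MS = 10    # assumed cost of one disk access, in milliseconds

binary_levels = levels(N, 2)    # two branches per node, as in a binary tree
wide_levels = levels(N, 100)    # hypothetical node with 100 branches

print(binary_levels, "accesses, about", binary_levels * DISK_MS, "ms")
print(wide_levels, "accesses, about", wide_levels * DISK_MS, "ms")
```

Under these assumptions a search drops from 20 disk accesses to 3 when nodes are wide, which is why disk-based structures favor nodes with many branches.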


External Data Structure
Two types of files:
  Sequential files
    Access to records is done in a strictly sequential manner.
    Searching a file using sequential access takes O(n) time, where n is the number of records in the file.
  Random files
    Access to records is done strictly by a key lookup mechanism.


Indexing
Given the physical characteristics of secondary memory, we need to optimize disk I/O.
  A block (or page) is the smallest unit of disk space that can be input or output.
  Each block holds many records, sorted by key value.
To gain fast random access to the records in a block, maintain an index structure.
  The index is built on the largest (or smallest) key value in each block.


Indexing

An index is much like the index in a book.
  In a book, the index lets you quickly look up information on a particular topic by giving you a page number, which you then use to go directly to the information you need.
  In an indexed file, the index accepts a key value and gives you back the disk address of the block containing the data record with that key.
Thus, an indexed file consists of two parts:
  The index
  The actual file data
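A minimal sketch of the idea, assuming the index holds the largest key of each block and that a list position stands in for a disk address. The names `blocks`, `index`, and `lookup` and the sample records are hypothetical.

```python
from bisect import bisect_left

# Hypothetical file data: each block holds records sorted by key.
blocks = [
    [(11, "a"), (14, "b"), (17, "c")],
    [(21, "d"), (42, "e"), (45, "f")],
    [(63, "g"), (74, "h"), (78, "i")],
]

# The index: the largest key in each block. Position i plays the role
# of the disk address of block i.
index = [block[-1][0] for block in blocks]

def lookup(key):
    """Return the value stored under key, or None if the key is absent."""
    i = bisect_left(index, key)   # first block whose largest key is >= key
    if i == len(index):
        return None               # key is larger than every key in the file
    for k, value in blocks[i]:    # one block read, then a scan in memory
        if k == key:
            return value
    return None

print(lookup(42))
```

Only the small index is searched in memory; a lookup then touches a single block of the file data.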


B-Trees
Many file systems and database systems use B-trees, or variants of them, to keep track of which portions of which files are in which disk sectors.
B-trees are an example of multiway trees.
  In a multiway tree, a node can hold multiple data elements (in contrast to one for a binary tree node).
  Each node in a B-tree can therefore have many subtrees.


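2-3 Trees
A 2-node, which has two children
  Must contain a single data item whose search key is greater than the left child's and less than the right child's
A 3-node, which has three children
  Must contain two data items, S and L, with S < L: keys in the left subtree are less than S, keys in the middle subtree are greater than S and less than L, and keys in the right subtree are greater than L
A leaf node contains either one or two data items
[Figure: a 2-node holding S, with subtrees for keys < S and keys > S; a 3-node holding S and L, with subtrees for keys < S, keys > S and < L, and keys > L]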


m-Way Trees
An m-way tree is a search tree in which each node can have from zero to m subtrees.
  m is defined as the order of the tree.
In a nonempty m-way tree:
  Each node has 0 to m subtrees.
  A node with k <= m subtrees contains k subtree pointers (some of which may be null) and k-1 data entries.
  The keys are ordered: key_1 <= key_2 <= ... <= key_(k-1).
  The key values in the first subtree are less than the key value in the first entry.
A binary search tree is an m-way tree of order 2.
A 2-3 tree is an m-way tree of order 3.
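One way to picture these rules is a node that keeps k-1 sorted keys next to k subtree slots. The sketch below uses a hypothetical `Node` class and made-up keys; visiting subtree i before key i, and the last subtree at the end, yields the keys in sorted order.

```python
class Node:
    """An m-way tree node: k-1 sorted keys and k subtrees (None marks a null subtree)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or [None] * (len(keys) + 1)

def inorder(node):
    """Yield every key in ascending order."""
    if node is None:
        return
    for i, key in enumerate(node.keys):
        yield from inorder(node.children[i])   # keys below keys[i]
        yield key
    yield from inorder(node.children[-1])      # keys at or above the last key

# A 4-way node with three keys and four subtrees, one of them null.
root = Node([20, 40, 60], [Node([5, 10]), None, Node([45]), Node([70, 80, 90])])
print(list(inorder(root)))
```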


An m-way tree

[Figure: a 4-way tree node. Keys: K1, K2, K3. Subtrees, left to right: keys < K1; K1 <= keys < K2; K2 <= keys < K3; keys >= K3]

A binary search tree is an m-way tree of order 2.


B-Trees

A B-tree is an m-way tree with the following additional properties:
  The root is either a leaf or has 2 to m subtrees.
  All internal nodes have at least ceil(m/2) non-null subtrees and at most m non-null subtrees.
  All leaf nodes are at the same level; that is, the tree is perfectly balanced.
  A leaf node has at least ceil(m/2) - 1 and at most m - 1 entries.
There are four basic operations for B-trees:
  insert (add)
  delete (remove)
  traverse
  search
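The limits for a given order can be worked out mechanically. This small sketch assumes m/2 is rounded up, which agrees with the order-5 figures quoted in these slides (3 to 5 subtrees, 2 to 4 entries); the function name is hypothetical.

```python
def btree_bounds(m):
    """Subtree and entry limits for a non-root node of a B-tree of order m."""
    min_subtrees = (m + 1) // 2        # m/2 rounded up, in integer arithmetic
    return {
        "min_subtrees": min_subtrees,
        "max_subtrees": m,
        "min_entries": min_subtrees - 1,
        "max_entries": m - 1,
    }

print(btree_bounds(5))
```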


A B-tree of Order 5 (m = 5)
  Min # of subtrees is 3 and max is 5
  Min # of entries is 2 and max is 4
[Figure: B-tree of order 5. Root: 42. Internal nodes: 16 21 and 58 76 81 93. Leaf keys, left to right: 11 14 17 19 20 21 22 23 24 45 52 63 65 74 78 79 85 87 94 97. Callouts: node with minimum entries (2); node with maximum entries (4); four keys, five subtrees]


B-Tree Search

Search in a B-tree is a generalization of search in a 2-3 tree.
  Perform a binary search on the keys in the current node.
  If the search key is found, return the record.
  If the current node is a leaf node and the key is not found, report an unsuccessful search.
  Otherwise, follow the proper branch and repeat the process.
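The steps above can be sketched as follows. The `Node` class and `search` function are illustrative names, nodes hold bare keys rather than full records, and the sample keys are borrowed from the insertion example in these slides.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys
        self.children = children or []    # an empty list marks a leaf

def search(node, key):
    """Return (node, position) where key is stored, or None if it is absent."""
    while True:
        i = bisect_left(node.keys, key)           # binary search within the node
        if i < len(node.keys) and node.keys[i] == key:
            return node, i                        # found
        if not node.children:
            return None                           # leaf reached: unsuccessful search
        node = node.children[i]                   # follow the proper branch

root = Node([21, 57, 78],
            [Node([11, 14]), Node([42, 45]), Node([63, 74]), Node([85, 97])])

hit = search(root, 63)
print(hit[0].keys, hit[1])
print(search(root, 50))
```

Each loop iteration corresponds to reading one node, so the number of disk accesses is bounded by the height of the tree.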


Insertion
B-tree insertion takes place at a leaf node.
  Step 1: locate the leaf node for the data being inserted.
  If the node is not full (it holds fewer than the maximum number of entries), insert the data in sequence in the node.
When the leaf node is full, we have an overflow condition.
  Insert the element anyway (temporarily violating the tree conditions).
  Split the node into two nodes.
  Each new node contains half the data.
  The middle entry is promoted to the parent (which may in turn become full!).
B-trees grow in a balanced fashion from the bottom up!
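A minimal sketch of this procedure for order 5, assuming distinct keys and nodes that hold bare keys. `Node`, `_insert`, and `insert` are illustrative names, not part of the course material. The key sequence is the one used in the worked example in these slides.

```python
from bisect import bisect_left, insort

ORDER = 5   # m: a node may hold at most m - 1 entries

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # an empty list marks a leaf

def _insert(node, key):
    """Insert key below node. Return (median, new_right_node) if node split, else None."""
    if not node.children:
        insort(node.keys, key)            # insert anyway, even if the leaf overflows
    else:
        i = bisect_left(node.keys, key)   # choose the branch to follow
        split = _insert(node.children[i], key)
        if split:
            median, right = split         # the child split: adopt its median
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= ORDER - 1:
        return None                       # no overflow
    mid = len(node.keys) // 2             # overflow: split around the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split:
        median, right = split
        return Node([median], [root, right])   # the tree grows at the top
    return root

root = Node()
for k in [11, 21, 14, 78, 97, 85, 74, 63, 42, 45, 57]:
    root = insert(root, k)

print(root.keys)
print([child.keys for child in root.children])
```

The only place the tree gains a level is in `insert`, when the root itself splits, which is why all leaves stay at the same depth.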


Follow Through An Example
Given a B-tree structure of order m = 5, insert 11, 21, 14, 78, and 97.
Because the order is 5, a single node can contain a maximum of 4 (m - 1) entries.
Step 1.
  11 causes the creation of a new node that becomes the root of the tree.
  As 21, 14, and 78 are inserted, they are just added (in order) to the root node (which is the only node in the tree at this point).
  Inserting 97 causes a problem, because the node where it should go (the root) is full.
[Figure: root node holding 11; then root node holding 11 14 21 78]


Inserting 97
When the node where the current value should go (here, the root) is full:
  CHEAT! Insert 97 in the node anyway.
  Now, because the node is larger than allowed, split it into two nodes.
  Propagate the median value (21) upward and insert it there (in this case, that creates a new root node).
[Figure: root node 11 14 21 78 with 97 added, marked "Violation!"; the overfull node 11 14 21 78 97]


Creation of a new Root Node

Tree grows ‘from bottom up’. Tree is always balanced.

[Figure: new root 21 with two leaves, 11 14 and 78 97]


Continuing the Example
Suppose I now add the following keys to the tree: 85, 74, 63, 42, 45, 57.
Inserting 85, then 74:
[Figure: root 21; left leaf 11 14; right leaf 74 78 85 97, where 85 (1) and 74 (2) are the new entries]
Now insert 63… what happens?


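Example, cont’d.
63 causes the node to overflow - but add it anyway!
[Figure: root 21; left leaf 11 14; right leaf 63 74 78 85 97, where 63 (3) is the new entry]
This node violates the B-tree conditions, so it must be split.
[Figure: the overfull node 63 74 78 85 97, labeled "split it up"]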


Example: Splitting a node

[Figure: node 63 74 78 85 97 split into 63 74 and 85 97, with 78 moved up; callouts numbered 1 to 4]
1. The median value is sent to the parent node (78 here).
2, 3. Create a temporary root node with one entry (78) and attach links to the right and left subtrees.
4. Insert this node into the node list of the parent.


Example: Tree after inserting 63

[Figure: root 21 78; leaves 11 14, 63 74, and 85 97]
Now insert 45 and 42.
Then insert 57.


Example: adding 42, 45, and 57

[Figure: root 21 57 78; leaves 11 14, 42 45, 63 74, and 85 97]


B-tree Deletion
Deletion is done similarly.
  If the number of items in a leaf falls below the minimum, adopt an item from a neighboring leaf.
  If the neighboring leaves also hold only the minimum number of items, combine two leaves.
    Their parent then loses a child, and it may need to be combined with its neighbor.
  This combination process is repeated recursively up the tree until:
    The root is reached, or
    A parent has more than the minimum number of children.
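The leaf-level part of this can be sketched for order 5 as below. Two assumptions go beyond the slide text: an adopted item travels through the parent (the separating key moves down and the neighbor's key moves up), and a merge pulls the separating key down into the combined leaf. The recursive repair of the parent is not shown, and all names and sample keys are illustrative.

```python
MIN_KEYS = 2   # order 5: ceil(5/2) - 1 entries per node

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def delete_from_leaf(parent, i, key):
    """Remove key from the leaf parent.children[i], then repair any underflow."""
    leaf = parent.children[i]
    leaf.keys.remove(key)
    if len(leaf.keys) >= MIN_KEYS:
        return
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > MIN_KEYS:
        # Adopt from the left neighbor via the parent (assumed rotation).
        leaf.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
    elif right is not None and len(right.keys) > MIN_KEYS:
        # Adopt from the right neighbor via the parent.
        leaf.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
    elif left is not None:
        # Combine with the left neighbor; the parent loses a key and a child.
        left.keys += [parent.keys.pop(i - 1)] + leaf.keys
        parent.children.pop(i)
    else:
        # Combine with the right neighbor.
        leaf.keys += [parent.keys.pop(i)] + right.keys
        parent.children.pop(i + 1)

parent = Node([21, 57, 78],
              [Node([11, 14]), Node([42, 45, 50]), Node([63, 74]), Node([85, 97])])

delete_from_leaf(parent, 2, 63)    # the left neighbor can spare an item
after_adopt = (list(parent.keys), [list(c.keys) for c in parent.children])

delete_from_leaf(parent, 3, 85)    # no neighbor can spare one: combine
after_merge = (list(parent.keys), [list(c.keys) for c in parent.children])

print(after_adopt)
print(after_merge)
```

After the merge the parent holds one key fewer; in a full implementation the same check would then be applied to the parent, as the slide describes.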


B+ Trees

B-tree only (effectively) gives you random access to data.
B+ tree gives you the ability to access data sequentially as well.
  Internal nodes do not store records, only key values to guide the search.
  Leaf nodes store records or pointers to the records.
  A leaf node has a pointer to the next sibling node. This allows for sequential processing.
  An internal node with 3 keys has 4 pointers. The 3 keys are the smallest values in the last 3 nodes pointed to by the 4 pointers.
  The first pointer points to nodes with values less than the first key.
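A small sketch of how the leaf links support sequential access. The `Leaf` and `Internal` classes, the function names, and the integer keys are all illustrative; leaves hold bare keys in place of records.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys (standing in for records)
        self.next = None      # pointer to the next sibling leaf

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # keys[i] is the smallest value under children[i + 1]
        self.children = children

def find_leaf(node, key):
    """Descend to the leaf that would contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Yield all keys k with lo <= k <= hi by walking the leaf chain."""
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return
            if k >= lo:
                yield k
        leaf = leaf.next      # sequential step: no return to the internal nodes

leaves = [Leaf([11, 14]), Leaf([21, 42]), Leaf([57, 63]), Leaf([78, 85])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([21, 57, 78], leaves)    # 3 keys, 4 pointers

print(list(range_scan(root, 14, 63)))
```

The tree is descended once to find the starting leaf; the rest of the range is read by following `next` pointers.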


Sample B+-Tree

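[Figure: sample B+-tree over city names. Keys shown in interior nodes: Los Angeles, Detroit, Redwood City. Leaf entries, left to right: Baltimore, Chicago, Detroit, Los Angeles, Redwood City, SF]
B+-tree with n = 3: interior nodes have no more than 3 pointers, but at least 2.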