Indexing and B+-Trees By Kenneth Cheung CS 157B TR 07:30-08:45 Professor Lee



Introduction to Indexing

Goal: make it easier to look up data.

Done by keeping a sorted, compact structure (the index) that points into the data.

Searching becomes faster, and an insertion can quickly locate where the new record belongs.

Factors of Indices

1. Access type
2. Access time
3. Insertion time
4. Deletion time
5. Space overhead

Clustering Index

An index whose search key also defines the sequential order of the file.

Index-Sequential Files

Files ordered sequentially on a search key, with a clustering index on that search key.

Index Record

(aka index entry) Holds a search-key value and pointers to the records with that value.

Pointer

Identifies a disk block and an offset within that block, which together locate a record.

Dense Index

An index record appears for every search-key value in the file. In a dense clustering index, records with the same search-key value are stored together, so the index record only needs to point to the first of them.

Faster access time, but higher space overhead.

Sparse Index

An index record appears for only some of the search-key values. To find a record, the system finds the index entry with the largest search-key value that is less than or equal to the given search-key value, starts at the record that entry points to, and follows the file in order until it finds the desired record.

Lower space overhead, but higher access time.
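
The dense and sparse lookups described above can be sketched as follows. This is a minimal in-memory illustration, not a disk-based implementation: the `blocks` layout, the payload strings, and the function names are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (search_key, payload) records, sorted by key.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: one entry per block, holding the first key in that block.
sparse_keys = [block[0][0] for block in blocks]

# Dense index: one entry per search-key value, pointing at (block, offset).
dense_index = {
    key: (b, i)
    for b, block in enumerate(blocks)
    for i, (key, _) in enumerate(block)
}


def sparse_lookup(key):
    """Return the payload for key, or None if no such record exists."""
    # Largest index entry whose key is <= the requested key.
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    # Follow the file in order, starting at the block that entry points to.
    for block in blocks[pos:]:
        for k, payload in block:
            if k == key:
                return payload
            if k > key:
                return None  # passed the position where key would be
    return None


def dense_lookup(key):
    """Return the payload for key using the dense index, or None."""
    location = dense_index.get(key)
    if location is None:
        return None
    b, i = location
    return blocks[b][i][1]
```

In this sketch the sparse index holds 3 entries against 9 for the dense index, but `sparse_lookup` has to scan inside a block after the index probe, which is the space versus access-time trade-off stated above.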

Larger Databases

When the index itself is too large to search efficiently, build a sparse index on top of the clustering index, giving 2 levels of indices.

A multilevel index lookup needs fewer block reads than a binary search over a single large index.
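
A rough way to see the difference is to count block reads. The sketch below assumes one block read per probe of a binary search and one block read per index level. The sizes (10,000 index entries, 100 entries per block) are made-up numbers for illustration, and both counts ignore the final read of the data block.

```python
import math


def binary_search_reads(index_blocks):
    """Block reads for a binary search over an index spanning index_blocks blocks."""
    return max(1, math.ceil(math.log2(index_blocks)))


def multilevel_reads(index_entries, entries_per_block):
    """Block reads for a multilevel index: one read per level.

    Levels are added until the outermost level fits in a single block.
    """
    blocks = math.ceil(index_entries / entries_per_block)
    levels = 1
    while blocks > 1:
        blocks = math.ceil(blocks / entries_per_block)
        levels += 1
    return levels


# Hypothetical sizes: 10,000 index entries, 100 entries per block.
inner_blocks = math.ceil(10_000 / 100)      # the inner index spans 100 blocks
print(binary_search_reads(inner_blocks))    # 7
print(multilevel_reads(10_000, 100))        # 2
```

With these numbers the 2-level index needs one read of the outer index block and one read of an inner index block, against about 7 reads for the binary search.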

Index Update (Insertion)

A. Look up the search-key value of the record being inserted. If a dense index has no index record for that value, insert a new index record at the appropriate position.

B. If the index record stores pointers to all records with the same search-key value, add a pointer to the new record to the index record.

C. Otherwise, the index record stores a pointer only to the first record with that search-key value, and the new record is placed after the other records with the same value.

Index Update (Insertion into Sparse Indices)

For sparse indices with one entry per block, if the system creates a new block, it inserts the first search-key value appearing in the new block into the index.

If the new record has the least search-key value in its block, the index record pointing to that block is updated; otherwise the index is unchanged.
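
The insertion rules above can be sketched as follows, assuming a dense index that stores all pointers per search-key value (step B) and a sparse index with one entry per block. The data layout and function names are hypothetical, and the update to the data file itself is left out.

```python
from bisect import insort


def dense_insert(index, key, pointer):
    """Dense index as a dict: search-key value -> list of record pointers.

    A missing key gets a new index record; an existing key gets one
    more pointer appended to its index record.
    """
    index.setdefault(key, []).append(pointer)


def sparse_insert(sparse, key, block_id, new_block):
    """Sparse index as a sorted list of (first_key, block_id), one per block.

    key is the search key of the record just inserted into block_id;
    new_block is True when the insertion created that block.
    """
    if new_block:
        # The new block's first search-key value goes into the index.
        insort(sparse, (key, block_id))
        return
    for i, (first_key, b) in enumerate(sparse):
        if b == block_id and key < first_key:
            # The new record is now the least key in its block.
            sparse[i] = (key, block_id)
            return
    # Otherwise the sparse index is unchanged.


dense = {}
dense_insert(dense, 40, "rec-1")
dense_insert(dense, 40, "rec-2")

sparse = [(10, 0), (40, 1)]
sparse_insert(sparse, 25, 0, new_block=False)  # not the least key: no change
sparse_insert(sparse, 35, 1, new_block=False)  # new least key of block 1
sparse_insert(sparse, 70, 2, new_block=True)   # block 2 was just created
```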

Deletion

A. Look up the record to be deleted.

B. If the index is dense and the deleted record was the only one with its search-key value, delete that search-key value from the index.

C. If the index record stores pointers to all records with the same search-key value, remove the pointer to the deleted record from the index record.

Deletion (cont’d)

D. If the index record stores a pointer only to the first record with the search-key value, and that first record is the one deleted, the pointer is updated to point to the following record.

E. If the index is sparse and does not contain an index record with the deleted record's search-key value, the index remains the same.

Deletion (cont’d)

F. If the index is sparse and the deleted record was the only record with its search-key value, the system replaces the corresponding index record with an index record for the next search-key value. If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

Deletion (cont’d)

G. Otherwise, if the index record for the search-key value points to the record being deleted, the pointer is updated to point to the next record with the same search-key value.
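
As a sketch of the deletion steps above, the code below handles a dense index that stores all pointers (steps B and C) and a sparse index (steps E, F and G). To keep it short, the sparse index is modelled as a sorted list of search-key values without the pointers. That simplification and the function names are assumptions made for illustration.

```python
from bisect import bisect_right


def dense_delete(index, key, pointer):
    """Dense index as a dict: search-key value -> list of record pointers."""
    pointers = index[key]
    pointers.remove(pointer)      # step C: drop the pointer to the deleted record
    if not pointers:
        del index[key]            # step B: that was the only record with this key


def sparse_delete(sparse_keys, file_keys, key):
    """Update a sparse index after one record with `key` has been deleted.

    sparse_keys: sorted list of the search-key values that have an index entry.
    file_keys:   sorted keys of the records remaining in the file.
    """
    if key not in sparse_keys:
        return                    # step E: the index is unaffected
    if key in file_keys:
        return                    # step G: entry moves to the next record with the same key
    # Step F: no record with this key remains.
    entry = sparse_keys.index(key)
    nxt = bisect_right(file_keys, key)
    if nxt == len(file_keys) or file_keys[nxt] in sparse_keys:
        del sparse_keys[entry]    # next key is already indexed (or there is none)
    else:
        sparse_keys[entry] = file_keys[nxt]


sparse = [10, 40, 70]
sparse_delete(sparse, [10, 20, 30, 50, 60, 70, 80], 40)
print(sparse)  # [10, 50, 70]
```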

Secondary Indices

A. Secondary indices are dense and point to all records.

B. The records a secondary index points to are not stored sequentially in search-key order, and the search key need not be a candidate key.

C. If a database with multiple indices is updated, then every index must be updated as well.

B+-Trees

An alternative to binary search trees.

Conditions of a B+-Tree

A. Search-key values in a node are K1, K2, ..., Kn-1

B. Pointers in a node are P1, P2, ..., Pn

C. Search-key values within a node are kept in sorted order

Conditions (cont’d)

D. In a leaf node, pointer Pi points to a file record with search-key value Ki, or to a bucket of pointers to such records.

E. Each node may have more than 2 pointers (a binary tree node has at most 2).

F. Search-key values are stored redundantly: a value in a leaf may also appear in non-leaf nodes.

Buckets

Buckets are used only if the search key does not form a candidate key and the file is not stored in search-key order.

Leaves

A. Each leaf holds up to n-1 values.

B. The last pointer, Pn, chains the leaf nodes together in search-key order.

C. Non-leaf nodes form a sparse multilevel index on the leaf nodes.

Leaves (cont’d)

D. Non-leaf nodes hold between ceil(n/2) and n pointers.

E. The number of pointers in a node is the fanout of the node.

F. The root is exempt from the ceil(n/2) minimum, but it must hold at least 2 pointers unless it is the only node in the tree.
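
The node layout and the occupancy limits above can be written down as a small check. This is a sketch with a hypothetical `Node` class. The leaf minimum of ceil((n-1)/2) values used below is the usual textbook rule and is an assumption here, since the slides only state the upper bound for leaves.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    # Non-leaf: child nodes. Leaf: record (or bucket) pointers.
    children: list = field(default_factory=list)
    is_leaf: bool = True
    next_leaf: Optional["Node"] = None  # plays the role of Pn in a leaf


def fanout(node):
    """Number of pointers in a non-leaf node."""
    return len(node.children)


def occupancy_ok(node, n, is_root=False):
    """Check the sorted-order and occupancy conditions for one node."""
    if node.keys != sorted(node.keys):
        return False
    if node.is_leaf:
        # Assumed minimum for leaves: ceil((n-1)/2) values, waived for the root.
        low = 0 if is_root else math.ceil((n - 1) / 2)
        return low <= len(node.keys) <= n - 1
    low = 2 if is_root else math.ceil(n / 2)
    has_matching_keys = len(node.keys) == fanout(node) - 1
    return has_matching_keys and low <= fanout(node) <= n


left = Node(keys=[1, 2])
right = Node(keys=[10, 11])
left.next_leaf = right
parent = Node(keys=[10], children=[left, right], is_leaf=False)
```

For n = 6, `parent` passes as a root with 2 pointers but fails as an ordinary non-leaf node, which needs at least ceil(6/2) = 3 pointers.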

Queries for finding V

A. To find search-key value V, start at the root.

B. In the current node, look for the smallest search-key value Ki greater than V.

C. If such a Ki exists, follow pointer Pi to the next node; otherwise follow the last pointer in the node.

Queries (cont’d)

D. The process repeats down the tree until it reaches a leaf, where it looks for a search-key value K that equals V.

E. If there is no K that equals V at the leaf, then no such record exists.
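
The query steps above can be sketched as follows. The `Node` class, the sample tree, and the record names such as "r30" are hypothetical. `bisect_right` is used to find the position of the smallest key greater than V.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # sorted search-key values
        self.pointers = pointers  # child nodes, or record pointers in a leaf
        self.is_leaf = is_leaf


def find(root, v):
    """Return the record pointer for search-key value v, or None."""
    node = root
    while not node.is_leaf:
        # Index of the smallest key greater than v; if there is none,
        # this is the index of the last pointer in the node.
        i = bisect_right(node.keys, v)
        node = node.pointers[i]
    # At the leaf: look for a key equal to v.
    for key, pointer in zip(node.keys, node.pointers):
        if key == v:
            return pointer
    return None


# A small hand-built tree with three leaves under one root.
leaf1 = Node([5, 10], ["r5", "r10"], is_leaf=True)
leaf2 = Node([20, 30], ["r20", "r30"], is_leaf=True)
leaf3 = Node([40, 50], ["r40", "r50"], is_leaf=True)
root = Node([20, 40], [leaf1, leaf2, leaf3], is_leaf=False)

print(find(root, 30))  # r30
print(find(root, 25))  # None
```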

B+-Tree Insertion

A. First, look up the leaf node where the search-key value belongs.

B. If the search-key value already exists in the leaf node, add the new record to the file and, if necessary, a pointer to it in the bucket.

C. If the search-key value does not exist, insert the new record into the file, add the search-key value to the leaf, and make a new bucket and pointer if necessary.

Insertion (cont’d)

D. If the search-key value does not exist and there is no room in the leaf node, split the node.

E. Divide the search-key values between the two leaves, so that each leaf has its own least and greatest search-key values.

F. After a split, insert the new node's least search-key value and a pointer to the new node into the parent, and repeat the splitting process whenever a parent gets too full.

B+-Tree Deletion

A. Look up the record and remove it from the file.

B. If no bucket was associated with its search-key value, remove the search-key value from the leaf.

C. If a bucket was associated with it and the bucket is now empty, remove the search-key value from the leaf.

Deletion (cont’d)

D. If a node is left with too few pointers, transfer its pointers to a sibling node, then delete the emptied node.

E. If transferring the pointers would give the sibling too many pointers, redistribute the pointers between the two nodes instead. In either case, the parent of the two nodes needs its search-key values and pointers updated.
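
Steps D and E can be sketched at the leaf level as follows. To stay short, the sketch models one parent node as a list of separator keys plus a list of leaves (each leaf a sorted list of keys), and it does not propagate underflow further up the tree. The representation, `N = 4`, and the minimum of ceil((N-1)/2) keys per leaf are assumptions for illustration.

```python
import math

N = 4                               # hypothetical maximum pointers per node
MIN_LEAF = math.ceil((N - 1) / 2)   # assumed minimum number of keys in a leaf


def fix_underfull_leaf(parent_keys, leaves, i):
    """Repair leaf i after a deletion left it with too few keys.

    parent_keys[j] separates leaves[j] from leaves[j + 1].
    """
    if len(leaves[i]) >= MIN_LEAF:
        return
    j = i - 1 if i > 0 else i + 1   # pick the left sibling when there is one
    left, right = min(i, j), max(i, j)
    combined = leaves[left] + leaves[right]
    if len(combined) <= N - 1:
        # Step D: everything fits in one node, so merge and delete the other.
        leaves[left] = combined
        del leaves[right]
        del parent_keys[left]       # the parent loses one key and one pointer
    else:
        # Step E: merging would overfill the node, so redistribute instead.
        mid = (len(combined) + 1) // 2
        leaves[left] = combined[:mid]
        leaves[right] = combined[mid:]
        parent_keys[left] = leaves[right][0]  # the parent's separator changes


# Merge case: the middle leaf dropped to one key.
keys, leaves = [30, 50], [[10, 20], [30], [50, 60, 70]]
fix_underfull_leaf(keys, leaves, 1)
print(keys, leaves)  # [50] [[10, 20, 30], [50, 60, 70]]

# Redistribute case: the sibling is full, so keys are shared out instead.
keys2, leaves2 = [40], [[10, 20, 30], [40]]
fix_underfull_leaf(keys2, leaves2, 1)
print(keys2, leaves2)  # [30] [[10, 20], [30, 40]]
```

In a full implementation the merge case can leave the parent itself with too few pointers, in which case the same repair is applied one level up.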

B+-Tree File Organization

A. Leaf nodes store records instead of pointers to records.

B. Insertion and deletion happen the same way as in a B+-tree index.

C. When inserting, the system adds the record to the block if there is enough space; otherwise it splits the block.

D. Any split will propagate upward if necessary.

Bibliography

Silberschatz, Abraham, Henry F. Korth, and S. Sudarshan. Database System Concepts. 5th Ed. Boston: McGraw Hill, 2002. Ch 12.
