Ch.12 Indexing and Hashing

Common DB operations we want to support: random lookup + sequential scan

READ p.482 → Five factors for evaluating indexing/hashing algorithms

Insertion

Deletion

Concepts:

Classifications:

Clustered (a.k.a. primary) vs. non-clustered (a.k.a. secondary)

Dense vs. sparse
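A minimal sketch of the dense vs. sparse distinction, assuming a file sorted on the search key and split into small blocks. The branch names and balances are hypothetical sample data, and `dense_lookup` / `sparse_lookup` are illustrative helper names, not textbook routines. The dense index has one entry per search-key value; the sparse index has one entry per block, so a lookup finds the last index entry ≤ the key and then scans that block.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: records sorted on the search key, packed into blocks.
blocks = [
    [("Brighton", 750), ("Downtown", 500)],
    [("Mianus", 700), ("Perryridge", 400)],
    [("Redwood", 700), ("Round Hill", 350)],
]

# Dense index: one (key -> block, offset) entry for every search-key value.
dense_keys = [key for blk in blocks for key, _ in blk]
dense_ptrs = [(b, i) for b, blk in enumerate(blocks) for i in range(len(blk))]

# Sparse index: one entry per block, holding the first key stored in that block.
sparse_keys = [blk[0][0] for blk in blocks]

def dense_lookup(key):
    pos = bisect_left(dense_keys, key)
    if pos == len(dense_keys) or dense_keys[pos] != key:
        return None
    b, i = dense_ptrs[pos]
    return blocks[b][i][1]

def sparse_lookup(key):
    # Largest index entry <= key tells us which block to scan.
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None

print(dense_lookup("Perryridge"), sparse_lookup("Perryridge"))
```

The sparse index is smaller (3 entries instead of 6 here) but only works because the file itself is sorted on the search key.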

Examples:

Dense:

Sparse:

Clustered or non-clustered?

Other minor practical issues: Overflow blocks

Long records that extend over multiple blocks

Duplicates that extend over multiple blocks

Major practical issue: For a large table, the index itself will be large!

Solutions: Store index in RAM

Store index on disk → how many blocks?

o Since index is sorted → logarithmic (binary) search → about log2(b) disk accesses for b index blocks

o Logarithmic search vs. linear search, worst-case
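To put numbers on the comparison, here is a small sketch that counts worst-case block reads for an index occupying b blocks. The function names and the value b = 10,000 are illustrative assumptions. The exact worst case for binary search is floor(log2(b)) + 1 probes, which the notes round to log2(b).

```python
def linear_search_reads(b):
    # Worst case: the entry is in the last block (or absent), so read them all.
    return b

def binary_search_reads(b):
    # Worst case for binary search over b sorted blocks:
    # floor(log2(b)) + 1, which equals the bit length of b for b >= 1.
    return b.bit_length()

for b in (100, 10_000, 1_000_000):
    print(b, linear_search_reads(b), binary_search_reads(b))
```

Even the logarithmic figure can be too slow when every probe is a disk access, which is the motivation for a multi-level index.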

Multi-level index → example on next page

Index updates:

Single-level

o Insertion

sparse

o Deletion

Sparse

Multi-level ……..

READ and take notes: Section 12.2.3 → Detailed algorithms for the above

What if the file is not ordered on the desired search key?

Secondary index

All secondary indices must be dense!

Problem with all index-sequential files:

Both random lookups and sequential scans get slower after many

insertions and deletions, due to overflow blocks

o Solution: reorganize file periodically → O(K) linear time

o Solution: leave room to grow → wasted space

o Use a different type of index!

TREES!

Introduction to Section 12.3 – Trees

Fundamental benefit of trees: LOGARITHMIC HEIGHT

N = 15 = 2^4 - 1

H = 4 = log2(N + 1)

Fundamental problem of trees: BALANCING

---------------------------------------------------------------------------------------------------------

1] List the 3 classification criteria we covered for indices.

2] A further classification criterion for indices is whether their search key (SK) is

a candidate key (CK) of the table or not.

If SK ≠ CK, then we have to solve this problem: how does a unique index entry

point to multiple tuples?

With clustered indices, we can simply point to the tuple containing the first

occurrence of SK:

Explain why this works!

Does this solution work for unclustered (secondary) indices? Explain why or

why not.

12.3 B+ Trees for index files

B is for balanced … but there are many definitions of balanced!

Properties:

Each key stored in the node is the minimal key in the right sub-tree

Example:

The non-leaf levels form a hierarchy of sparse indices!

Logarithmic height property:

If there are K search-key values in the file, H ≤ ⌈log_⌈n/2⌉(K)⌉. Explain this for a BT

Why is it important?

Random searches can be performed in logarithmic time b/c the

height of the tree needs only be traversed once!

(algorithm below)
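A minimal sketch of a B+ tree lookup, assuming the convention stated under Properties (each key in a non-leaf node is the minimal key of the sub-tree to its right). The `Node` class and the three-leaf example tree are illustrative assumptions, not the textbook's figure. One node is visited per level, which is why the cost is proportional to the height.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for leaf nodes

    @property
    def is_leaf(self):
        return not self.children

def search(root, key):
    """Return (found, nodes_visited) for a single root-to-leaf descent."""
    node, visited = root, 1
    while not node.is_leaf:
        # Number of keys <= key gives the child to follow:
        # key < keys[0] -> child 0, keys[0] <= key < keys[1] -> child 1, ...
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return key in node.keys, visited

# Hypothetical two-level tree over six branch names.
leaves = [
    Node(["Brighton", "Downtown"]),
    Node(["Mianus", "Perryridge"]),
    Node(["Redwood", "Round Hill"]),
]
root = Node(["Mianus", "Redwood"], leaves)

print(search(root, "Perryridge"))
print(search(root, "Clearview"))
```

In a real index each visited node costs one disk access, and the leaf entry then points to the record in the main file.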

“Back-of-the-envelope” estimate:

------------------------------------------------------------------------

Week 14, Lect 3/3

Quiz: A DB file has a B+ tree index.

The node size in the B+ tree is 4 KB, the search keys are 24-byte strings, and each

pointer takes 8 bytes. What is the maximum # of pointers n that can be

stored in a node?

What is the minimum?

If the file has 5 million search keys, what is the number of disk accesses when we

search for a random key?

What is the number of disk accesses when we access all keys sequentially?
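One way to check the quiz arithmetic is sketched below, under stated assumptions: 4 KB means 4096 bytes, a node holds n pointers and n − 1 keys, a non-leaf node holds at least ⌈n/2⌉ pointers, and a leaf holds at least ⌈(n − 1)/2⌉ keys. The helper names are hypothetical. Integer loops are used instead of floating-point logarithms to avoid rounding surprises.

```python
def max_pointers(node_bytes, key_bytes, ptr_bytes):
    # n pointers and (n - 1) keys must fit:
    # n * ptr_bytes + (n - 1) * key_bytes <= node_bytes
    return (node_bytes + key_bytes) // (key_bytes + ptr_bytes)

def ceil_log(base, x):
    # Smallest h such that base**h >= x.
    h, reach = 0, 1
    while reach < x:
        reach *= base
        h += 1
    return h

K = 5_000_000
n = max_pointers(4096, 24, 8)
n_min = -(-n // 2)                      # ceil(n / 2)

worst_height = ceil_log(n_min, K)       # every node only half full
best_height = ceil_log(n, K)            # every node completely full

# Sequential scan: descend once, then follow the leaf chain.
min_leaves = -(-K // (n - 1))           # leaves completely full
max_leaves = K // (-(-(n - 1) // 2))    # leaves only half full

print(n, n_min, worst_height, best_height, min_leaves, max_leaves)
```

Under these assumptions a random lookup touches about 4 index nodes (plus one more access to fetch the record itself), while a sequential scan is dominated by the number of leaf nodes rather than by the height.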

Insertions and deletions to the main file can be handled efficiently,

as the index can be reorganized in logarithmic time.

Important exceptions:

o When inserting, a node becomes too big → split nodes

o When deleting, a node becomes too small → merge nodes

Insert “Clearview”

Delete “Downtown”

It’s not always possible to merge nodes

Delete “Perryridge” → Node a is left with too few pointers (remember the minimum, ⌈n/2⌉)

Solution: merge it w/its sibling node → root now has too few pointers → simply

eliminate root and merged node becomes new root!

It’s not always possible to merge nodes!

What if the sibling is (almost) full?

Solution: redistribute the pointers between siblings.

Delete “Perryridge” → As before, the node has too few pointers, but its sibling now has too many pointers for a merge

→ the underfull node borrows the rightmost pointer of its sibling.

The rightmost key of the sibling then replaces the separating key in their parent (the root here)!

12.3.4 – B+ Tree File Organization

12.3.5 – Indexing strings

12.4 – B-Trees

12.5 – Multiple-Key Access

12.6 Static Hashing

Hash = implicit index

Notation: set of all search keys K

set of all “bucket” addresses B (buckets are disk blocks)

hash function h is a function from K to B → h(Ki)

A bucket may contain tuples with different search keys → after being read from the disk,

the entire bucket must be searched.

Worst hash function ever: all search keys are mapped into the same bucket!

Properties of a good hash function:

Uniform distribution

Random distribution (Why?)

o Typically, h() operates on the low-level binary representation of the search key

READ example p.508 (31 is prime!)

-----------------------------------------------------------------------------------------------

Week 15, Lect.1/3 (last!)

Practice exercise 12.3 (a): Construct a B+ tree from empty, by inserting the

following values in order:

(2, 3, 5, 7, 11, 17, 19, 23, 29, 31)

The max. # of pointers is n = 4.

Practice exercise 12.4 (d): From the previous tree, delete 23.

Back to hashing …

p.508 “The function can be implemented efficiently …” → Horner’s algorithm!
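A sketch of Horner's algorithm for string hashing, assuming the p.508 example is the usual base-31 polynomial s[0]·31^(n−1) + s[1]·31^(n−2) + … + s[n−1], taken modulo the number of buckets. The function names and the bucket count of 10 are illustrative. Horner's rule evaluates the polynomial with one multiplication and one addition per character instead of computing each power of 31 separately.

```python
def string_hash(key, num_buckets):
    # Horner's rule: ((s0 * 31 + s1) * 31 + s2) * 31 + ...
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h % num_buckets

def string_hash_naive(key, num_buckets):
    # Direct evaluation of the polynomial, for comparison only.
    n = len(key)
    total = sum(ord(ch) * 31 ** (n - 1 - i) for i, ch in enumerate(key))
    return total % num_buckets

for name in ("Brighton", "Downtown", "Mianus", "Perryridge"):
    print(name, string_hash(name, 10), string_hash_naive(name, 10))
```

Python integers never overflow; an implementation using fixed-width integers would let the running value wrap around, which changes the numeric results but not the idea.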

12.6.2 Bucket overflow

Even if the hash function is perfect (i.e. uniform/random), overflow can still occur

due to:

the growth of the DB!

multiple records w/same search key K

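Delay overflow by using a fudge factor d → n_B = (n_r / f_r) × (1 + d), where n_r = number of records, f_r = records per bucket, and d is typically around 0.2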
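A quick numeric check of the bucket-count formula n_B = (n_r / f_r)(1 + d). The figures (10,000 records, 20 records per bucket, d = 0.2) are made-up sample values, and `num_buckets` is a hypothetical helper. Exact fractions are used so that rounding up does not depend on floating-point noise.

```python
import math
from fractions import Fraction

def num_buckets(n_r, f_r, d):
    # n_r: number of records, f_r: records that fit in one bucket,
    # d: fudge factor (fraction of extra space left free).
    fudge = 1 + Fraction(d).limit_denominator(1000)
    return math.ceil(Fraction(n_r, f_r) * fudge)

print(num_buckets(10_000, 20, 0))     # buckets needed with no slack
print(num_buckets(10_000, 20, 0.2))   # buckets with 20% slack
```

With d = 0.2 roughly one sixth of the allocated space starts out empty, which is the price paid for delaying overflow.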

When overflow happens, use overflow buckets.

Hash index w/overflow buckets

Do you see why overflow buckets lead to degraded performance?

Solution …

12.7 Dynamic Hashing

Extendable hashing idea: The hashing function generates a “large” number of bits b (e.g. 32), but not all of them are being used as bucket addresses. Only i (i < b) are.

Nice example in text pp.515-517

We have the following branch names and the associated hash values (handout):

Buckets can hold only 2 records.

We start w/empty hash table, i = 0 bits → 2^0 = 1 bucket

Insert Brighton and two Downtown:

Insert Mianus:

Insert three Perryridge:
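The insert sequence above can be traced with a small sketch of extendable hashing. Several things here are assumptions: the hash values in `TOY_HASH` are invented for illustration and are NOT the handout's values, the i high-order bits of a 32-bit hash address the directory, and a bucket whose records all share one hash value simply grows past its capacity as a stand-in for an overflow bucket.

```python
from dataclasses import dataclass, field

HASH_BITS = 32        # b: bits produced by the hash function
BUCKET_CAPACITY = 2   # records per bucket, as in the notes

# Illustrative hash values only; NOT the values from the handout.
TOY_HASH = {
    "Brighton":   0x20000000,  # 0010...
    "Downtown":   0xA0000000,  # 1010...
    "Mianus":     0xC0000000,  # 1100...
    "Perryridge": 0xF0000000,  # 1111...
}

@dataclass
class Bucket:
    local_depth: int
    records: list = field(default_factory=list)  # (hash, key) pairs

class ExtendableHash:
    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.i = 0                    # number of hash bits currently in use
        self.directory = [Bucket(0)]  # always 2**i entries

    def _slot(self, h):
        # Directory index = the i high-order bits of the hash value.
        return h >> (HASH_BITS - self.i)

    def insert(self, key):
        h = self.hash_fn(key)
        while True:
            bucket = self.directory[self._slot(h)]
            if len(bucket.records) < BUCKET_CAPACITY:
                bucket.records.append((h, key))
                return
            if all(rh == h for rh, _ in bucket.records):
                # Splitting cannot separate identical hash values:
                # let the bucket overflow (stand-in for an overflow bucket).
                bucket.records.append((h, key))
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.i:
            # Double the directory: old entry j becomes entries 2j and 2j + 1.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.i += 1
        bucket.local_depth += 1
        depth = bucket.local_depth
        sibling = Bucket(depth)
        old, bucket.records = bucket.records, []
        for h, key in old:
            # The newly used bit decides which of the two buckets gets the record.
            target = sibling if (h >> (HASH_BITS - depth)) & 1 else bucket
            target.records.append((h, key))
        for idx, b in enumerate(self.directory):
            if b is bucket and (idx >> (self.i - depth)) & 1:
                self.directory[idx] = sibling

    def lookup(self, key):
        h = self.hash_fn(key)
        return any(k == key for _, k in self.directory[self._slot(h)].records)

table = ExtendableHash(TOY_HASH.__getitem__)
for name in ["Brighton", "Downtown", "Downtown", "Mianus",
             "Perryridge", "Perryridge", "Perryridge"]:
    table.insert(name)
print(table.i, len(table.directory))
```

With these toy hash values the second Downtown, Mianus, and the second Perryridge each force a directory doubling, so the table ends with i = 3 and 8 directory entries; the third Perryridge cannot be helped by splitting and lands in the overflow.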

12.8 Comparison

Ordered indexing (sequential or B+ tree) vs. hashing:

Performance depends on what type of queries we perform most often:

Lookup of individual values vs. range queries
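A small illustration of the trade-off, using Python's `dict` as a stand-in for a hash index and a sorted list with `bisect` as a stand-in for an ordered index. The branch data and function names are hypothetical. The hash structure answers a single-value lookup directly, but a range query forces it to examine every key; the ordered structure finds the start of the range and reads consecutive entries.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data.
balances = {
    "Brighton": 750, "Downtown": 500, "Mianus": 700,
    "Perryridge": 400, "Redwood": 700, "Round Hill": 350,
}

hash_index = dict(balances)      # stand-in for a hash index
ordered_keys = sorted(balances)  # stand-in for an ordered (B+ tree) index

def point_lookup(key):
    # Hash index: one probe, independent of the number of keys.
    return hash_index.get(key)

def range_query_ordered(lo, hi):
    # Ordered index: locate both ends, then read the entries in between.
    return ordered_keys[bisect_left(ordered_keys, lo):bisect_right(ordered_keys, hi)]

def range_query_hashed(lo, hi):
    # Hash index: no ordering to exploit, so every key must be examined.
    return sorted(k for k in hash_index if lo <= k <= hi)

print(point_lookup("Mianus"))
print(range_query_ordered("D", "N"))
print(range_query_hashed("D", "N"))
```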

End of material required for final.

Ch.12 Indexing and Hashing - Tarleton State University
