Indexing

Indexing

2421: Database Systems - Index Structures

Cost Model for Data Access

Data should be stored such that it can be accessed fast

Evaluation of Access Methods based on measuring the number of page I/O’s disk access in general more costly than CPU costs CPU costs considered to be negligible in comparison with I/O

Analysis ignores gains of pre-fetching blocks of pages; thus, even I/O cost is only approximated.

Average-case analysis; based on several simplistic assumptions. Good enough to show the overall trends!


Typical Operations Scan over all records

SELECT * FROM Employee Equality Search

SELECT * FROM Employee WHERE eid = 100 Range Search

SELECT * FROM Employee WHERE age > 30 and age <= 50 Insert

INSERT INTO Employee VALUES (23, ‘lilly’, 37) Delete

DELETE FROM Employee WHERE eid = 100 DELETE FROM Employee WHERE age >30 AND age <= 50

Update Delete+insert


Alternative File Organizations

Various alternative file organizations exist; each is ideal for some operations, not so good in others:

Heap files: Linked, unordered list of all pages of the file (e.g., per relation)

Suitable when typical access is a file scan retrieving all records.

Costs for equality search (read on avg. half the pages) and range search (read all pages) is high.

Cost for insert low (insert anywhere) Cost for delete/update is cost of executing WHERE clause

Sorted Files: Records are ordered according to one or more attributes of the relation

Outperforms heap files for equality and range queries on the ordering attribute (find first qualifying page with binary search in log2(number-of-pages)

Also good for ordered output High insert and delete/update costs


Indexes Even sorted file only support queries on sorted attributes.

In order to speed up selections on any collection of attributes, we can build an index for a relation over this collection. Additional information that helps finding specific tuples faster

We call the collection of attributes over which the index is built the search key attributes for the index.

Any subset of the attributes of a relation can be the search key for an index on the relation.

Search key is not the same as primary key / key candidate (minimal set of attributes that uniquely identify a record in a relation).


B+ Tree: The Most Widely Used Index

Each node/leaf represents one page Since the page is the transfer unit to disk

Leafs contain data entries (denoted as k*) For now, assume each data entry represents one tuple. The data entry consists of two parts

Value of the search key Record identifier (rid = (page-id, slot))

Root and inner nodes have auxiliary index entries

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

Index for SailorsOn attribute sid


B+ Tree (contd.) keep tree height-balanced.

Each path from root to tree has the same height F = fanout = number of children for each node (~ number of

index entries stored in node) N = # leaf pages Insert/delete at log F N cost; Minimum 50% occupancy (except for root).

Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.

Supports equality and range-searches efficiently.

Index Entries

Data Entries


Example B+ Tree Example tree has height 2 Assume “Select * from Emp where eid = 5” Search begins at root, and key comparisons direct

it to a leaf Search for 5*, 15*, all data entries >= 24* ... Good for equality search AND range queries

Root

17 24 30

2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13


Inserting a Data Entry Find correct leaf L. Put data entry onto L.

If L has enough space, done! Else, must split L (into L and a new node L2)

Redistribute entries evenly, copy up middle key. Insert index entry pointing to L2 into parent of L.

This can happen recursively To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)

Splits “grow” tree; root split increases height. Tree growth: gets wider or one level taller at top.


Inserting 8* into Example B+ Tree

2* 3* 5* 7*

Assume that inner pagesCan only contain 4 index entries

2* 3* 5* 7*

Insert into Leaf with leaf split

Insert into internal node with node split

8*

17 24 301317 24 30135

17 24 3013 5 24 3013

17


Example: After Inserting 8*

Notice that root was split, leading to increase in height.

In this example, we can avoid split by redistributing entries; however, this is usually not done in practice.

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*


Data Entry k* in Index An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.

Three alternatives: (1) Data tuple with key value k (direct indexing) (2) <k, rid of data record with search key value k> (indirect indexing)

(3) <k, list of rids of data records with search key k> (indirect indexing)

Choice of alternative for data entries is orthogonal to the indexing technique (B-tree, hashing etc.)


Alternatives for Data Entries

Alternative 1: (direct indexing) If this is used, index structure is a file organization for data records (like sorted files).

At most one direct index on a given collection of data records. (Otherwise, data records duplicated, leading to redundant storage and potential inconsistency.)

If data records very large, # of pages containing data entries is high. Implies size of auxiliary information in the index is also large, typically.

NOTE: FOR THE REST OF THIS COURSE WE WILL NOT CONSIDER DIRECT INDEXING ANYMORE

Index entries

Data entries = data records


Alternatives for Data Entries (Contd.)

Alternatives 2 and 3: (indirect indexing) Data entries typically much smaller than data records. So, better than direct indexing with large data records, especially if search keys are small.

If more than one index is required on a given file, at most one direct index; rest must use indirect indexing

Alternative 3 more compact than Alternative 2, but leads to variable sized data entries even if search keys are of fixed length.Index entries

Data entries

Data records


Index Classification

Primary vs. secondary: If search key contains primary key, then called primary index. Unique index: Search key contains a candidate key.

Clustered vs. unclustered: If order of data records is the same as, or `close to’, order of data entries, then called clustered index. Clustered index can be of alternatives 1, 2 and 3 ! A file can be clustered on at most one search key. Cost of retrieving data records through index varies greatly based on whether index is clustered or not!


Clustered vs. Unclustered Index

Assume data itself (the real tuples) is stored in a Heap file. To build clustered index, first sort the Heap file (with some free space on each page for future inserts).

Overflow pages may be needed for inserts. (Thus, order of data records is `close to’, but not identical to, the sort order.)

Index entries

Data entries

(Data file)

Data Records Data Records

CLUSTERED UNCLUSTERED


B+-tree cost example

Relation R(A,B,C,D,E,F) A and B are int (each 6 Bytes), C-F is char[40] (160 Bytes)

Size of tuple: 172 Bytes 200,000 tuples Each data page has 4 K and is around 80% full

200,000*172/(0.8*4000) = 10750 pages Values of B are within [0;19999] uniform distribution Non-clustered B-tree for attribute B, alternative (2) An index page has 4K and intermediate pages are filled between 50% - 100%

The size of an rid = 10 Bytes The size of a pointer in intermediate pages: 8 Bytes Index entry in root and intermediate pages: size(key)+size(pointer) = 6 Bytes + 8 Bytes = 14 Bytes


Size of B+tree

The average number of rids per data entry Number of tuples / different values (if uniform) (Example 200,000/20,000 = 10)

The average length per data entry: Key value + #rids * size of rid (Example: 6 + 10*10 = 106)

The average number of data entries per leaf page: Fill-rate * page-size / length of data entry Example: 0.75*4000 / 106 = 28 entries per page

The estimated number of leaf pages: Number of entries = number of different values / #entries per page

Example 20000 / 28 = 715 Number of entries intermediate page:

Fill-rate * page-size /length of index entry Min fill-rate: 0.5, max fill rate: 1 Example: 0.5 * 4000 / 14 = 143 entries ; 1* 4000/14 = 285 entries

Height is 3: the root has between three and four children Three children: each child has around 715/3 = 238 entries Four children: each child has around 715/4 = 179 entries


B+ Trees in Practice Typical order d of inner nodes: 100 (I.e., an inner

node has between 100 and 200 index entries) Typical fill-factor: 67%. average fanout = 133

Leaf nodes have often less entries since data entries larger (rids)

Typical capacities (order of inner nodes 100, leaf with 100 rids): Height 4: 1334 = 312,900,721 records Height 3: 1333 = 2,352,637 records Height 2: 1332 = 17,680 records

Can often hold top levels in buffer pool: Level 1 (root) = 1 page = 4 Kbytes Level 2 = 133 pages = 0.5 Mbyte Level 3 = 17,689 pages = 70 MBytes


Index in DB2 Simple

Create index ind1 on Sailors(sid); drop index ind1;

Index also good for referential integrity (uniqueness) Create unique index ind1 on Sailors(name)

Additional attributes Create unique index ind1 on Sailors(sid) include (name)

Index only on sid Data entry contains key value (sid) + name + rid SELECT name FROM Sailors WHERE sid = 100

Can be answered without accessing Sailors relation! Clustered index

Create index ind1 on Sailors(sid) cluster


Index in DB2

Index on multiple attributes: Create index ind1 on Sailors(Age,Rating); Order is important:

Here data entries are first ordered by age Sailors with the same age are then ordered by rating

Supports: SELECT * FROM Sailors WHERE age = 20; SELECT * FROM Sailors WHERE age = 20 AND rating < 5;

Does not support SELECT * FROM Sailors WHERE rating < 5;



Summary for B+-trees Tree-structured indexes are ideal for range-searches, also good for equality searches. High fanout (F) means depth rarely more than 3 or 4.

Almost always better than maintaining a sorted file.

Can have several indices on same tables (over different attributes)

Most widely used index in database management systems because. One of the most optimized components of a DBMS.


File Organizations Hashed Files:.

File is a collection of buckets. Bucket = primary page plus zero or more overflow pages.

Hashing function h: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.

Best for equality search (only one page access and maybe access to overflow page)

No advantage for range queries Fast insert Cost on delete depends on cost for WHERE clause


More comments on B+-trees Corresponding delete operations exist that might merge subtrees

B+-trees for predicate locking Locking B+-trees

To allow concurrent access to the B-tree, internal locking protocol used (non 2PL -- in case of abort: logical undo!!!)

Special Index Locks used to implement “predicate-locking”

Documents

Indexing