
Files & Indexing

Files of Records

Page or block is OK when doing I/O, but higher levels of DBMS operate on records, and files of records.

FILE: A collection of pages, each containing a collection of records. Must support:
• insert/delete/modify records
• read a particular record (specified using record id)
• scan records (possibly with some conditions on the records to be retrieved)

Alternative File Organizations

Many alternatives exist, with tradeoffs for each:

Heap files:
• Suitable when typical access is file scan of all records.

Sorted Files:
• Best for retrieval in search key order
• Also good for search based on search key

Indexes: Organize records via trees or hashing.
• Like sorted files, speed up searches for search key fields
• Updates are much faster than in sorted files.

Unordered (Heap) Files

Simplest file structure; contains records in no particular order.

As file grows and shrinks, disk pages are allocated and de-allocated.

To support record level operations, we must:
• keep track of the pages in a file
• keep track of free space on pages
• keep track of the records on a page

There are many alternatives for keeping track of this.

Heap File Implemented as a List

The header page id and heap file name must be stored someplace.

Problem: Most pages might be on the free space list (holes).

[Figure: a header page points to two linked lists of data pages: full pages, and pages with free space.]

Heap File Using a Page Directory

The entry for a page can include the number of free bytes on the page.

The directory is a collection of pages; linked list implementation is just one alternative.
• Much smaller than linked list of all HF pages!

[Figure: a directory, starting at the header page, whose entries point to Data Page 1, Data Page 2, ..., Data Page N.]
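The file operations and the page directory can be sketched together in a few lines. This is a minimal in-memory illustration, not how any particular DBMS lays out pages: the names `HeapFile` and `PAGE_SIZE`, the 4096-byte page size, and the (page, slot) record id are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class HeapFile:
    """Toy heap file whose directory stores the free bytes of every page."""

    def __init__(self):
        self.pages = []  # each page is a list of records (bytes) or None
        self.free = []   # directory: free bytes per page, indexed by page number

    def insert(self, record: bytes):
        """Place the record on the first page with enough room; return its rid."""
        need = len(record)
        if need > PAGE_SIZE:
            raise ValueError("record larger than a page")
        for page_no, free_bytes in enumerate(self.free):
            if free_bytes >= need:  # only the directory is consulted here
                break
        else:
            self.pages.append([])   # no page fits: allocate a new one
            self.free.append(PAGE_SIZE)
            page_no = len(self.pages) - 1
        slot = len(self.pages[page_no])
        self.pages[page_no].append(record)
        self.free[page_no] -= need
        return (page_no, slot)      # record id = (page number, slot number)

    def read(self, rid):
        page_no, slot = rid
        return self.pages[page_no][slot]

    def delete(self, rid):
        page_no, slot = rid
        record = self.pages[page_no][slot]
        if record is not None:
            self.pages[page_no][slot] = None  # keep the slot so other rids stay valid
            self.free[page_no] += len(record)

    def scan(self, predicate=lambda r: True):
        """Yield every live record that satisfies the predicate."""
        for page in self.pages:
            for record in page:
                if record is not None and predicate(record):
                    yield record
```

Because `insert` looks only at the directory, finding a page with room does not require reading the data pages themselves, which is the point of keeping free-byte counts in the directory entries.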

Indexes

Sometimes need to retrieve records by the values in one or more fields, e.g.,
• Find all students in the “CS” department
• Find all students with a gpa > 3

An index on a file is a:
• Disk-based data structure
• Speeds up selections on the search key fields for the index.
  – Any subset of the fields of a relation can be the index search key
  – Search key is not the same as (candidate) key (e.g. doesn’t have to be unique).

An index
• Contains a collection of index and data entries
• Supports efficient retrieval of all records with a given search key value k.

Goal of Indexing

Given condition(s) on attribute(s), find qualified records.
• Attr = value
• Condition may also be Attr > value, Attr >= value

[Figure: a lookup of a value returning the qualified records.]

First Question About Indexes

What kinds of selections do they support?
• Selections of form field <op> constant
• Equality selections (op is =)
• Range selections (op is one of <, >, <=, >=, BETWEEN)
• More exotic selections:
  – 2-dimensional ranges (“east of Troy and west of Schenectady and North of Albany and South of Watervliet”), or n-dimensional
  – 2-dimensional distances (“within 2 miles of Sage Hall”), or n-dimensional
  – Ranking queries (“10 Italian restaurants closest to Troy”)
  – Regular expression matches, genome string matches, etc.

Alternatives for Data Entry k* in Index

Three alternatives:
• Actual data record (with key value k)
• <k, rid of matching data record>
• <k, list of rids of matching data records>

Choice is orthogonal to the indexing technique.
• Examples of indexing techniques: B+ trees, hash tables, R trees, …
• Typically, index contains auxiliary information that directs searches to the desired data entries

Can have multiple (different) indexes per file.
• E.g. file sorted by age, with a hash index on salary and a B+ tree index on name.

Basic Indexing Methods
• Indexed Sequential File
• B-Tree
• Hash Index

Indexed Sequential File

• Search key (≠ primary key)
• Primary index (on sequencing field): the index on the attribute (a.k.a. search key) that determines the sequencing of the table
• Secondary index: index on any other attribute
• Dense index (all search key values appear in the index)
• Sparse index
• Multi-level index

Sequential File

Tuples are sorted by their primary key.

[Figure: a sequential file stored as blocks of two records each: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

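Dense Index

[Figure: a dense index with one entry per search key value (10, 20, 30, ..., 120), each entry pointing to its record in the sequential file of blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

• The index file needs far fewer blocks than the data file, hence it is easier to fit in memory.
• For a given key K, only about log2(n) of the n index blocks need to be accessed (binary search over the sorted index).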
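The logarithmic bound comes from binary search over the sorted index entries. The sketch below illustrates this on the blocks from the figure; it searches individual entries in a Python list rather than disk blocks, and the names `data_blocks`, `dense_index`, and `dense_lookup` are made up for the example.

```python
from bisect import bisect_left

# Data file from the figure: two records per block, sorted by key.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number) entry for every record in the file.
dense_index = [(key, b) for b, block in enumerate(data_blocks) for key in block]
index_keys = [key for key, _ in dense_index]


def dense_lookup(key):
    """Return the number of the block holding `key`, or None if it is absent."""
    i = bisect_left(index_keys, key)  # binary search over the sorted index
    if i < len(index_keys) and index_keys[i] == key:
        return dense_index[i][1]
    return None  # a dense index can answer "not present" without reading data blocks
```

For example, `dense_lookup(60)` returns block 2, while `dense_lookup(65)` returns `None` without touching the data file.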

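Sparse Index

[Figure: a sparse index with entries 10, 30, 50, 70, 90, 110, ..., 230, each pointing to one block of the sequential file (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

• Typically, only one key per data block.
• To search, find the index record with the largest value that is less than or equal to the value we are looking for.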
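The "largest index value less than or equal to the target" rule maps directly onto a right-biased binary search. A minimal sketch, assuming the five data blocks from the figure and using hypothetical names (`sparse_keys`, `sparse_lookup`):

```python
from bisect import bisect_right

data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: only the first key of each data block is indexed.
sparse_keys = [block[0] for block in data_blocks]  # [10, 30, 50, 70, 90]


def sparse_lookup(key):
    """Return (block number, found) for `key`."""
    i = bisect_right(sparse_keys, key) - 1  # largest index key <= key
    if i < 0:
        return None, False                  # key is smaller than every indexed key
    return i, key in data_blocks[i]         # the data block must be read to decide
```

Unlike the dense case, a miss such as `sparse_lookup(45)` can only be confirmed after reading the candidate data block.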

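Multi-level Index

[Figure: a sparse second-level index (10, 90, 170, 250, 330, 410, 490, 570) points to the blocks of the first-level sparse index (10, 30, 50, 70, 90, 110, ..., 230), which in turn points to the blocks of the sequential file (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

Treat the index as a file and build an index on it.

• Two levels are usually sufficient
• More than three levels are rare

{FILE, INDEX} may be contiguous or not.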
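Treating the index as a file can be illustrated by grouping the first-level keys into index blocks and indexing the first key of each block. The sketch below reproduces the key values visible in the figure (first level 10, 30, ..., 230; second level 10, 90, 170); the block size of four entries and the function names are assumptions for the example.

```python
from bisect import bisect_right


def build_level(keys, fanout):
    """Group sorted keys into blocks; return the blocks and each block's first key."""
    blocks = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
    return blocks, [block[0] for block in blocks]


level1_keys = list(range(10, 250, 20))  # 10, 30, ..., 230
level1_blocks, level2_keys = build_level(level1_keys, fanout=4)


def two_level_lookup(key):
    """Return the first-level index key whose data block may contain `key`."""
    b = bisect_right(level2_keys, key) - 1  # pick the first-level index block
    if b < 0:
        return None
    block = level1_blocks[b]
    i = bisect_right(block, key) - 1        # then search inside that block
    return block[i]
```

Here `level2_keys` comes out as `[10, 90, 170]`, and a search for 100 reads one second-level block and one first-level block before going to the data file.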

Deletion from sparse index

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70) and (90, 110, 130, 150).]

– delete record 40

If the deleted entry does not appear in the index, do nothing.

– delete record 30

If the deleted entry appears in the index, replace it with the next search-key value.

[Figure: record 40 moves to the front of its block and index entry 30 becomes 40.]

– delete records 30 & 40

If the next search-key value has its own index entry, then delete the entry.

[Figure: the block that held 30 and 40 is empty; index entry 30 is removed and entries 50 and 70 move up.]
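The three deletion cases can be written as one small routine. This is a sketch under simplifying assumptions: blocks are Python lists held in a dict, the index is a sorted list of (first key, block number) pairs, and the name `sparse_delete` is made up for the example.

```python
from bisect import bisect_right


def sparse_delete(blocks, index, key):
    """Delete `key` from its data block and keep the sparse index consistent.

    blocks: dict mapping block number -> sorted list of keys
    index:  sorted list of (first key in block, block number)
    Returns True if the key was found and deleted.
    """
    pos = bisect_right([k for k, _ in index], key) - 1
    if pos < 0:
        return False
    first_key, block_no = index[pos]
    block = blocks[block_no]
    if key not in block:
        return False
    block.remove(key)
    if key != first_key:
        return True                         # case 1: key not in the index, do nothing
    if block:
        index[pos] = (block[0], block_no)   # case 2: replace with next search-key value
    else:
        del index[pos]                      # case 3: next key has its own entry already
    return True
```

Starting from blocks (10, 20), (30, 40), (50, 60), (70, 80), deleting 40 leaves the index untouched, and then deleting 30 empties the block and removes its index entry, leaving index keys 10, 50, 70.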

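Deletion from dense index

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40) and (50, 60, 70, 80).]

– delete record 30

[Figure: record 30 is removed from the data block, and entry 30 is removed from the index, so 40 moves up in both.]

Deletion from a dense primary index file is handled in the same way as deletion from a sequential file.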

Insertion, sparse index case

[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free); sparse index (10, 30, 40, 60).]

– insert record 34

• Our lucky day! We have free space where we need it!

[Figure: inserting a record (15 in the figure) into the full first block moves 20 into the next block, and index entry 30 becomes 20.]

• Illustrated: Immediate reorganization
• Variation:
  – insert new block (chained file)
  – update index

[Figure: the same file with overflow blocks attached (reorganize later...).]

• How often do we reorganize and how expensive is it?
• B-Trees offer convincing answers

Insertion Example

[Figure: a sequential index (10, 20, 30 | 40, 50, 60 | 70, 80, 90) stored contiguously with free space, and a non-sequential overflow area holding the later insertions 39, 31, 35, 36, 32, 38, 34, 33.]

Conventional Indexes

Advantage:
• Simple algorithms
• Index is a sequential file (good for scans)

Disadvantage:
• Inserts expensive, and/or
• Eventually sequentiality is lost because of overflows (reorganizations are needed)

B+-Tree Index

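B+ Tree Indexes

• Leaf pages contain data entries, and are chained (prev & next)
• Non-leaf pages contain index entries and direct searches

Non-leaf page layout: P0 | K1 P1 | K2 P2 | ... | Km Pm, where each (Ki, Pi) pair is an index entry.

[Figure: non-leaf pages at the top, leaf pages at the bottom.]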

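Example B+ Tree

• Find 28*? 29*? All > 15* and < 30*
• Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes.
  – And change sometimes bubbles up the tree

[Figure: root with key 17 (entries < 17 to the left, entries >= 17 to the right); left index node with keys 5, 13; right index node with keys 27, 30; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 24*), (27*, 29*), (33*, 34*, 38*, 39*).]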

B+ Tree: Most Widely Used Index

• Insert/delete at log_F(N) cost; keep tree height-balanced. (F = fanout, N = # leaf pages)
• Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.
• Supports equality and range-searches efficiently.

[Figure: index entries (direct search) on top of the data entries ("sequence set").]
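To get a feel for the log_F(N) cost, the number of index levels above the leaves can be computed by repeatedly dividing the page count by the fanout. The fanout of 133 used in the example call is an illustrative assumption, not a figure from these slides, and `btree_height` is a hypothetical helper name.

```python
def btree_height(num_leaf_pages, fanout):
    """Number of index levels needed above the leaves, i.e. ceil(log_F(N))."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    height = 0
    nodes = num_leaf_pages
    while nodes > 1:
        nodes = -(-nodes // fanout)  # ceiling division: parents needed for this level
        height += 1
    return height


# With an assumed fanout of 133, three index levels cover 133**3 leaf pages.
print(btree_height(133 ** 3, 133))  # prints 3
```

Integer ceiling division is used instead of `math.log` to avoid floating-point rounding at exact powers of the fanout.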

Example B+ Tree

• Search begins at root, and key comparisons direct it to a leaf.
• Search for 5*, 15*, all data entries >= 24* ...
• Based on the search for 15*, we know it is not in the tree!

[Figure: root with keys 13, 17, 24, 30; leaves (2*, 3*, 5*, 7*), (14*, 16*), (19*, 20*, 22*), (24*, 27*, 29*), (33*, 34*, 38*, 39*).]
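The search procedure on this example tree can be sketched as follows. The `Node` class and function names are hypothetical, leaves hold bare keys instead of full data entries, and only the forward (next) leaf pointer is modelled.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted keys
        self.children = children  # list of child nodes, or None for a leaf
        self.next = None          # next leaf in the chain (leaves only)


def find_leaf(node, key):
    """Follow key comparisons from the root down to the leaf that may hold `key`."""
    while node.children is not None:
        # Child i holds keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    return key in find_leaf(root, key).keys


def range_from(root, low):
    """Return all keys >= low by scanning the leaf chain."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        result.extend(k for k in leaf.keys if k >= low)
        leaf = leaf.next
    return result


# The example tree from the slide: one root over five leaves.
leaves = [Node(ks) for ks in
          ([2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([13, 17, 24, 30], children=leaves)
```

Searching for 15 ends at the leaf (14*, 16*), which is enough to conclude that 15* is not in the tree; the range query for entries >= 24* starts at the leaf found for 24 and follows the chain to the right.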

Inserting into a B+ Tree

• Find correct leaf L.
• Put data entry onto L.
  – If L has enough space, done!
  – Else, must split L (into L and a new node L2)
    • Redistribute entries evenly, copy up middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
  – To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
  – Tree growth: gets wider or one level taller at top.

Inserting 8* into Example B+ Tree

• Observe how minimum occupancy is guaranteed in both leaf and index page splits.
• Note difference between copy-up and push-up; be sure you understand the reasons for this.

[Figure: leaf split. The leaf (2*, 3*, 5*, 7*, 8*) splits into (2*, 3*) and (5*, 7*, 8*). Entry to be inserted in parent node: 5. (Note that 5 is copied up and continues to appear in the leaf.)]

[Figure: index split. The index node (5, 13, 17, 24, 30) splits into (5, 13) and (24, 30). Entry to be inserted in parent node: 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)]
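The difference between copy-up and push-up is easiest to see when the two splits are written side by side. This sketch works on plain Python lists and uses hypothetical function names; it reproduces the two splits from the 8* example.

```python
def split_leaf(entries):
    """Split an overfull leaf (2d+1 sorted entries).

    Returns (left, right, separator). The separator is COPIED up:
    it is the first key of the right leaf and also stays in that leaf.
    """
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]


def split_index(keys, children):
    """Split an overfull index node (2d+1 keys, 2d+2 child pointers).

    Returns (left, right, separator). The separator is PUSHED up:
    it is removed from both halves and appears only in the parent.
    """
    mid = len(keys) // 2
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, keys[mid]


# Leaf split from the example: 5 is copied up and remains in the right leaf.
print(split_leaf([2, 3, 5, 7, 8]))  # ([2, 3], [5, 7, 8], 5)

# Index split from the example: 17 is pushed up and leaves both halves.
left, right, up = split_index([5, 13, 17, 24, 30], ["A", "B", "C", "D", "E", "F"])
print(left[0], up, right[0])        # [5, 13] 17 [24, 30]
```

A leaf must keep the separator because every data entry has to remain reachable in the sequence set, whereas an index key only directs the search and is not needed twice.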

Example B+ Tree After Inserting 8*

• Notice that root was split, leading to increase in height.
• In this example, we can avoid split by re-distributing entries; however, this is usually not done in practice.

[Figure: root with key 17; left index node with keys 5, 13; right index node with keys 24, 30; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 22*), (24*, 27*, 29*), (33*, 34*, 38*, 39*).]

Deleting from a B+ Tree

• Start at root, find leaf L where entry belongs.
• Remove the entry.
  – If L is at least half-full, done!
  – If L has only d-1 entries,
    • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    • If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing height.

Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...

• Deleting 19* is easy.
• Deleting 20* is done with re-distribution. Notice how middle key is copied up.

[Figure: root with key 17; left index node with keys 5, 13; right index node with keys 27, 30; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 24*), (27*, 29*), (33*, 34*, 38*, 39*).]

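... And Then Deleting 24*

• Must merge.
• Observe `toss’ of index entry (on right), and `pull down’ of index entry (below).

[Figure, right: index node with key 30 over leaves (22*, 27*, 29*) and (33*, 34*, 38*, 39*).]

[Figure, below: root with keys 5, 13, 17, 30 over leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 27*, 29*), (33*, 34*, 38*, 39*).]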
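The leaf-level choice between re-distribution and merge in the 20* and 24* deletions can be sketched as below. It handles only a leaf and its right sibling as Python lists, leaves the parent update to the caller, and uses a hypothetical function name; the order d = 2 matches the example tree.

```python
def fix_leaf_underflow(leaf, right_sibling, d):
    """Repair a leaf that dropped to d-1 entries, using its right sibling.

    Returns ("redistribute", new_separator) when an entry can be borrowed;
    the caller copies new_separator up into the parent.
    Returns ("merge", None) when the sibling has no entry to spare;
    the caller must then delete the sibling's index entry from the parent.
    """
    if len(right_sibling) > d:
        leaf.append(right_sibling.pop(0))  # borrow the sibling's smallest entry
        return "redistribute", right_sibling[0]
    leaf.extend(right_sibling)             # sibling is at minimum: merge into leaf
    right_sibling.clear()
    return "merge", None


# Deleting 20* leaves (22*) next to (24*, 27*, 29*): borrow 24*, copy up 27.
leaf, sibling = [22], [24, 27, 29]
print(fix_leaf_underflow(leaf, sibling, d=2), leaf, sibling)
# ('redistribute', 27) [22, 24] [27, 29]

# Deleting 24* leaves (22*) next to (27*, 29*): the sibling cannot spare an entry.
leaf.remove(24)
print(fix_leaf_underflow(leaf, sibling, d=2), leaf)
# ('merge', None) [22, 27, 29]
```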

Non-leaf Re-distribution

• Tree is shown below during deletion of 24*. (What could be a possible initial tree?)
• In contrast to previous example, can re-distribute entry from left child of root to right child.

[Figure: root with key 22; left index node with keys 5, 13, 17, 20; right index node with key 30; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (17*, 18*), (20*, 21*), (22*, 27*, 29*), (33*, 34*, 38*, 39*).]

After Re-distribution

• Intuitively, entries are re-distributed by `pushing through’ the splitting entry in the parent node.
• It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.

[Figure: root with key 17; left index node with keys 5, 13; right index node with keys 20, 22, 30; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (17*, 18*), (20*, 21*), (22*, 27*, 29*), (33*, 34*, 38*, 39*).]

Bulk Loading of a B+ Tree

• If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow.
• Bulk Loading can be done much more efficiently.
• Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page.

[Figure: an empty root pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree: 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*.]

Bulk Loading (Contd.)

• Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.)
• Much faster than repeated inserts, especially when one considers locking!

[Figure: root with keys 10, 20; index pages below it with keys 6, 12, and 23, 35; leaf pages (3*, 4*), (6*, 9*), (10*, 11*), (12*, 13*), (20*, 22*), (23*, 31*), (35*, 36*), (38*, 41*), (44*); the last data entry pages are not yet in the B+ tree.]

[Figure: root with key 20; second level with keys 10 and 35; third level with keys 6, 12, 23, 38; same leaf pages; the last data entry pages are not yet in the B+ tree.]
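The idea of building the tree from sorted data entries can be sketched with a simplified level-by-level construction. Note that this is a different, simpler variant than the right-most-path insertion described on the slide: it packs every node full, does not enforce minimum occupancy on the last node of a level, and represents nodes as plain tuples. The function name, leaf capacity, and fanout are assumptions for the example.

```python
def bulk_load(sorted_keys, leaf_capacity, fanout):
    """Build a tree bottom-up from already sorted keys.

    Leaves are ("leaf", keys); index nodes are ("index", keys, children).
    Simplification: the last node on a level may be underfull.
    """
    if not sorted_keys:
        raise ValueError("nothing to load")
    # Step 1: pack the sorted entries into leaf pages.
    level = []
    for i in range(0, len(sorted_keys), leaf_capacity):
        page = sorted_keys[i:i + leaf_capacity]
        level.append((page[0], ("leaf", page)))  # (lowest key in subtree, node)
    # Step 2: build index levels until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            keys = [low for low, _ in group[1:]]  # one separator per extra child
            children = [node for _, node in group]
            parents.append((group[0][0], ("index", keys, children)))
        level = parents
    return level[0][1]


entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(entries, leaf_capacity=2, fanout=3)
print(root[1])  # separator keys in the root: [12, 35]
```

Each entry is written once and each index level is built in a single pass, which is why bulk loading avoids the repeated root-to-leaf traversals of one-at-a-time insertion. The resulting tree shape differs from the one in the slide's figures because the packing policy is different.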
