Lecture 9: File Organization and Indexing · Lecture 9: File Organization and Indexing . Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management

Database Systems

Mohamed Zahran (aka Z)

[email protected]

http://www.mzahran.com

CSCI-GA.2433-001

Lecture 9: File Organization and Indexing

Query Optimization

and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

DBMS Layers

Organizes data carefully to support fast access to desired subsets of records.

File organization: is a method of arranging the records in a file when the file is stored on disk. A relation is typically stored as a file of records.

Tuples/Records

Relations

Sectors

Files

Blocks

Pages

Query Optimization

and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB

• Stores records in a file in a collection of disk pages • Keeps track of pages allocated to each file. • Tracks available space within pages allocated to the file.

Do you think it is a good idea to sort the records stored in a file? Why?

Example: Suppose we have a relation with fields: name, age, and salary. How will do we sort it?

Records Index

What is an index?

Index

• It is: – a data structure – a pointer (called data entry in the textbook) to a

data record – organized based on search key

• Three alternatives to indices and data record interaction – Put the data record with the index – Store a record ID in the index to point to the data

record – Store a list of record IDs of data record with the

same search key value

Special Case: Clustered Indexes

Definition: The ordering of data records is the same as, or close to, the ordering of some index.

Why is it important? Reduces the cost of using an index to answer a range of search queries.

But: Too expensive to maintain when the data is updated.

If data records cannot be kept sorted, how can we speed-up the search?

Very Un-special case: Heap File!

• The simplest file structure

• It is an unordered file

• Records are stored in a random order across the pages of the file

Hash-Based Indexing

• Use a hash function h(r) where r is a field value

• The output of h(r) points to a bucket. • Bucket = primary page plus zero or more

overflow pages • The buckets contain <key, rid> or <key, rid-

list> pairs. • Hash-based indexes are best for equality

selections cannot support range searches

Hash-Based Indexing

Hash- based indexes Index points to data records

Tree-Based Indexing

Example: Find all data entries with 24<age<50

Number of disk I/Os = the length of a path from the root to a leaf + the number of leaf pages with qualifying data entries

Example to Show Cost!

• Assume our search key is <age,sal> • Assume 5 configurations:

– heap file – File sorted on <age,sal> – clustered B+ tree file sorted on <age,sal> – heap file with unclustered B+ tree index on <age,sal> – heap file with an unclustered hash index on <age,sal>

• The operations we consider are: – Scan (fetch all records in a file) – Search with equality selection – Search with range selection – Insert a record – Delete a record


• Cost model: – B: number of data pages – R: number of records per page – D: average time to read or write a disk

page (typical 15 msec) – C: average time to process a record (typical

100 nanosec) – H: time to apply the hash function (if we

use hashing) (typical 100 nanonsec) – F: fanout of a tree (if we use a tree)


(a) Scan (b) Equality (c ) Range (d) Insert (e) Delete

(1) Heap BD 0.5BD BD 2D Search +D

(2) Sorted BD Dlog 2B Dlog 2 B + # matches

Search + BD

Search +BD

(3) Clustered 1.5BD Dlog F 1.5B Dlog F 1.5B + # matches

Search + D

Search +D

(4) Unclustered Tree index

BD(R+0.15) D(1 + log F 0.15B)

Dlog F 0.15B + # matches

D(3 + log F 0.15B)

Search + 2D

(5) Unclustered Hash index

BD(R+0.125)

2D BD 4D Search + 2D

The Table above shows and average of the I/O cost only.

Before Choosing Your Index: Know Your Workload

• For each query in the workload: – Which relations does it access? – Which attributes are retrieved? – Which attributes are involved in selection/join

conditions? How selective are these conditions likely to be?

• For each update in the workload: – Which attributes are involved in selection/join

conditions? How selective are these conditions likely to be?

– The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.

Index Choice

• What indexes should we create? – Which relations should have indexes? What

field(s) should be the search key? Should we build several indexes?

• For each index, what kind of an index should it be? – Clustered? Hash/tree? – Hash-based are optimized for equality – Tree-based supports equality and range – Sorted file is pretty expensive to maintain

Index Choice

• One approach: – Consider the most important queries in turn. – Consider the best plan using the current indexes – see if a better plan is possible with an additional

index. – If so, create it.

• Before creating an index, must also consider the impact on updates in the workload! – Trade-off: Indexes can make queries go faster,

updates slower. Require disk space, too.

Guidelines

• Attributes in WHERE clause are candidates for index keys. – Exact match condition suggests hash index. – Range query suggests tree index.

• Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates.

• Multi-attribute search keys should be considered when a WHERE clause contains several conditions. – Order of attributes is important for range queries.

• Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering.

Examples

• Clustered

• B+ tree index on E.age can be used to get qualifying tuples.

SELECT E.dno

FROM Emp E

WHERE E.age>40

Examples

• If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly.

• Clustered E.dno index may be better!

SELECT E.dno, COUNT (*)

FROM Emp E

WHERE E.age>10

GROUP BY E.dno

Examples

• To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal> would be better than an index on age or an index on sal.

• If condition is: 20<age<30 AND 3000<sal<5000: – Clustered tree index on <age,sal>

or <sal,age> is best. • If condition is: age=30 AND

3000<sal<5000: – Clustered <age,sal> index much

better than <sal,age> index! • Composite indexes are larger,

updated more often.

sue 13 75

bob

cal

joe 12

10

20

80 11

12

name age sal

<sal, age>

<age, sal> <age>

<sal>

12,20

12,10

11,80

13,75

20,12

10,12

75,13

80,11

11

12

12

13

10

20

75

80

Data records sorted by name

Data entries in index sorted by <sal,age>

Data entries sorted by <sal>

Examples of composite key indexes

A Closer look Into Pages

• How pages store records?

• How pages are organized into

file?

• A file can span several pages.

Tuples/Records

Relations

Sectors

Files

Blocks

Pages

Pages and Heap Files

• Every record has a unique rid • Every page in the file has the same size. • Supported operations include:

– create and destroy files – insert/delete a record with given rid – get a record with given rid – scan all records in the file

• Given the id or the record, we must be able to find the id of the page containing the record

Pages and Heap Files

• We must keep track of the pages in each heap file to support scans.

• We must keep track of pages with empty spaces to support efficient insertions.

How to maintain this info?

Linked List of Pages

• Doubly linked list of pages • DBMS remembers where is the first page

(called header page) located • Each pointer is really a page id • If new page is required, a request is made to

the disk space manager.

Header Page

Data Page

Data Page

Data Page

Data Page

Data Page

Data Page

Pages with Free Space

Full Pages

Linked List of Pages

• Disadvantage: If records are of variable length: – virtually all pages will be on the free list!

– Inserting a record may require retrieving and examining several pages

Header Page

Data Page

Data Page

Data Page

Data Page

Data Page

Data Page

Pages with Free Space

Full Pages

Directory of Pages

• DBMS must remember where the first directory page of each heap file is located.

• The Directory itself is a collection of pages.

• Each directory entry identify a page or sequence of pages.

Data Page 1

Data Page 2

Data Page N

Header Page

DIRECTORY

Directory of Pages

• Size of the directory is likely to be very small in comparison to the size of the heap file.

• Each directory entry can indicate whether the corresponding page has free entry or the size of free space.

Data Page 1

Data Page 2

Data Page N

Header Page

DIRECTORY

Page Format

• Page abstraction is useful for I/O issues.

• Higher levels of DBMS see data as collection of records.

• How can a collection of records be arranged on a page? – Think of it as: group of slots

– Each slot contains a record

– Record identified by: < page ID, slot #> rid

– Other approaches are also possible

Page Format: Fixed Length Records

• Record slots are uniform and can be arranged consecutively within a page.

• At any instant: some slots are occupied by records and some are not.

• How do we keep track of empty slots and how do we locate all records on a page?


• Alternative 1: – Store records in the first N slots

– Whenever a record is deleted move the last record on the page into the vacated slot.

– Advantage: Can locate the ith record on the page with simple offset calculation.

– Advantage: All empty slots appear at the end of the page.

– Disadvantage: Does not work if there are external references to the record that moved.


• Alternative 2: – Handle deletion using array of bit

– One bit per slot to keep track of free slots

– When a page is deleted, its bit is turned off (i.e. 0).

– Locating records on the page requires scanning the bit array to locate slots whose bit is on.


Slot 1 Slot 2

Slot N

. . . . . .

N M 1 0 . . .

M ... 3 2 1

PACKED UNPACKED, BITMAP

Slot 1 Slot 2

Slot N

Free Space

Slot M

1 1

number of records

number of slots

Page Format: Variable Length Records

• Cannot divide the page into fixed-length slots

• Challenge: When a new record is to be inserted, we have to find an empty slot of just the right length.

• Challenge: We must ensure that the free space on the page is contiguous.

• So The ability to move records on a page becomes very important


• Directory of slots

• <record offset, record length> per slot

• record offset: offset in bytes from the start of the data area to the start of the record

• Deletion: setting record offset to -1

• rid <page id, slot id> does not change when a record moves.


• Maintain a pointer to the start of the free space area

• When a new record does not fit into the remaining free space move records in the page to reclaim space deleted earlier.

• Cannot always remove a slot of a deleted record (or the rid of the other slots will change).

• When a new record is inserted, the directory is scanned for an element not pointing to a record.


Page i Rid = (i,N)

Rid = (i,2)

Rid = (i,1)

Pointer to start of free space

SLOT DIRECTORY

N . . . 2 1

20 16 24 N

# slots

Page Format

• Beside slots information, a page usually contains file-level information (e.g. id of the next page, etc).

• The slotted page organization used for variable length records can also be used for fixed-length records

What about Records?

• How to organize fields within a record?

• Issues we have to take into account: – Fields of the record are of fixed or

variable length

– cost of various operations on the records

• Information common to all records (e.g. number of fields, types, …) are stored in the system catalog

Fixed-Length Records

• Each field has a fixed length

• Number of fields is also fixed

• Fields can be stored consecutively

Base address (B)

L1 L2 L3 L4

F1 F2 F3 F4

Address = B+L1+L2

Variable-Length Records

• Alternative 1: – Store fields consecutively – Separate fields by delimiters – Disadvantage: requires the scan of the whole

record to reach a field

• Alternative 2: – Reserve space at the beginning of the record

for use as an array of integer offsets – The ith entry is the starting address of the ith

field relative to the start of the record.

Variable-Length Records

4 $ $ $ $

Field Count

Fields Delimited by Special Symbols

F1 F2 F3 F4

Array of Field Offsets

Issues in Variable Length Records

• Modifying a field may cause it to grow need to shift subsequent fields to make room.

• A modified record may no longer fit in the available space on its page – May need to move to another page – rid will change causing problems – We must leave a forwarding address in the old

place • A record may grow so large that it cannot fit on

any page – Break a record into smaller records – Chain them together

OS does disk space & buffer mgmt: why not let OS manage these tasks?

• Differences in OS support: portability issues

• Some limitations, e.g., files can’t span disks.

• Buffer management in DBMS requires ability to: – pin a page in buffer pool, force a page to disk

(important for implementing CC & recovery), • adjust replacement policy, and pre-fetch

pages based on access patterns in typical DB operations

Conclusions

• Many alternative file organizations exist, each appropriate in some situation.

• If selection queries are frequent, sorting the file or building an index is important.

• Indexes support efficient retrieval of records based on the values in some fields.

• Understanding the nature of the workload for the application, and the performance goals, is essential to developing a good design.

Documents

Lecture 9: File Organization and Indexing · Lecture 9: File Organization and Indexing . Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management