33
QUIZ: Source: http://courses.cs.washington.edu/courses/cse444/06wi/lectures/lecture09.pdf Is either set of attributes a superkey? A candidate key?

Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.1Database System Concepts - 6th Edition

QUIZ:

Source: http://courses.cs.washington.edu/courses/cse444/06wi/lectures/lecture09.pdf

Is either set of attributes a superkey? A candidate key?

Page 2: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.2Database System Concepts - 6th Edition

QUIZ: MVD

What MVDs can you

spot in this table?

Source: https://en.wikipedia.org/wiki/Multivalued_dependency

Page 3: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

Database System Concepts, 6th Ed.

©Silberschatz, Korth and Sudarshan

See www.db-book.com for conditions on re-use

Chapter 10: Storage and File Structure

Page 4: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.4Database System Concepts - 6th Edition

Chapter 10: Storage and File Structure

1. Overview of Physical Storage Media

2. Magnetic Disks

3. RAID

4. Tertiary Storage

5. File Organization

6. Organization of Records in Files

7. Data Dictionary/Catalog Storage

8. DB Buffer

Page 5: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.5Database System Concepts - 6th Edition

10.5 File Organization

A file is partitioned into fixed-length storage units called blocks,

which are the units of both storage allocation and data transfer

from/to the secondary storage (HDD).

Most DBMSs use block sizes of 4 to 8 kilobytes by default

many DBMSs allow the

block size to be specified

when a DB instance is

created.

Page 6: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.6Database System Concepts - 6th Edition

10.5 File Organization

The database is stored as a collection of files.

Each file is a sequence of records.

A record is a sequence of fields.

Two possible approaches:

• Record size is fixed

• Record size is variable

Page 7: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.7Database System Concepts - 6th Edition

10.5.1 Fixed-Length Records

Even if the records are not really of fixed length (e.g.

varchar below), we assume that each field has max. size:

Record access is simple: Record i starts at byte n (i – 1),

where n is the size of each record (e.g. 53 bytes).

Page 8: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.8Database System Concepts - 6th Edition

10.5.1 Fixed-Length Records

Problem 1: Unless the block size is a multiple of n, the last

record in a block crosses the block boundary

Requires two block accesses!

Modification: leave the fractional record at the end of the

block unused.

Page 9: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.9Database System Concepts - 6th Edition

10.5.1 Fixed-Length Records

Problem 2: What to do when a record ( i ) is deleted?

Possible solutions:

shift records i + 1, . . ., n to i, . . . , n – 1

move record n to i

do not move records, but

link all free records on a

free list

Page 10: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.10Database System Concepts - 6th Edition

Deleting record 3 and shifting

Page 11: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.11Database System Concepts - 6th Edition

Deleting record 3 and moving last record

Page 12: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.12Database System Concepts - 6th Edition

Free Lists (Linked)

Store the address of the first deleted record in the file header.

Can think of these stored addresses as pointers since they

“point” to the location of a record.

For efficiency, reuse the space for normal attributes in the free

records to store pointers. (No pointers stored in in-use records!)

Page 13: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.13Database System Concepts - 6th Edition

QUIZ

Page 14: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.14Database System Concepts - 6th Edition

10.5.2 Variable-Length Records

Variable-length records arise in several ways:

Storage of multiple record types in a file.

E.g. the records represent tuples from different tables

Record types that allow variable lengths for one or more

fields such as strings (varchar)

Record types that allow repeating fields (used in some

older data models).

Page 15: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.15Database System Concepts - 6th Edition

Internal representation of variable-length

records

Attributes are stored in order, but

Variable length attributes represented by a fixed size pair

(offset, length), with actual data stored after all fixed length

attributes

Null values represented by null-value bitmap

Page 16: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.16Database System Concepts - 6th Edition

A slotted page has a header which contains:

The nr. of record entries

The end of free space in the block

The location and size of each record

Representation of variable-length records

inside a block: Slotted page structure

Page 17: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.17Database System Concepts - 6th Edition

Records are allocated contiguously in the page/block, starting

from the end.

Records can be moved around within the page/block to keep

them contiguous (no empty space between them)

header entry is updated on every move

b/c of this, outside pointers should not point directly to

record but to the header entry.

Page 18: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.18Database System Concepts - 6th Edition

What if the data is larger than the block size?

Remember the data types BLOB and CLOB

Large objects are often stored separately from the other

(short) attributes, in special file(s). In this case, the record

containing the large object has only a pointer to the object.

File 1

BLOB 1

File 2

BLOB 2

Page 19: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.19Database System Concepts - 6th Edition

10.6 Organization of Records in Files

Heap – a record can be placed anywhere in the file (in any

block) where there is space

No ordering whatsoever

Sequential – store records in sequential order, based on the

value of the search key of each record

Search key need not be PK, or even superkey!

See next slides

Hashing – a hash function computed on some attribute of

each record; the result specifies in which block of the file the

record should be placed

See Ch.11

Page 20: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.20Database System Concepts - 6th Edition

10.6.1 Sequential File Organization

The records in the file are ordered by a search-key

Suitable for applications (e.g. queries) that require sequential

processing of the entire file

Page 21: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.21Database System Concepts - 6th Edition

Sequential File Organization (Cont.)

Deletion – use pointer chains

Insertion –locate the position where the record is to be inserted

if there is free space insert there

if no free space, insert the record in an overflow block

In either case, pointer chain must be updated

Need to reorganize the file

from time to time to restore

sequential order

Page 22: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.22Database System Concepts - 6th Edition

10.6.2 Multitable Clustering File Organization

Many large-scale DBMSs do not rely directly on the underlying

OS for file management.

Instead, the OS allocates one large file to the DBMS, and the

DBMS stores all relations in this one file, and manages the file

itself.

Even if multiple relations are stored in this a single large file,

the default is to store records of only one relation in a given

block.

• This simplifies data management.

Page 23: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.23Database System Concepts - 6th Edition

However, in some cases it can be useful to store records of

more than one relation in a single block. This is called

multitable clustering. Example:

department

instructor

multitable clustering

of department and

instructor

Page 24: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.24Database System Concepts - 6th Edition

Multitable Clustering File Organization (cont.)

good for queries involving department instructor, and for

queries involving one single department and its instructors

bad for queries involving only department

results in variable size records

Can add pointer chains to link records of a particular relation

Page 25: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.25Database System Concepts - 6th Edition

10.7 Data Dictionary Storage

Information about relations

names of relations

names, types and lengths of attributes of each relation

names and definitions of views

integrity constraints

User and accounting information, including passwords

Statistical and descriptive data

number of tuples in each relation

Physical file organization information

How relation is stored (sequential/hash/…)

Physical location of relation

Information about indices (Chapter 11)

The Data dictionary (also called system catalog) stores

metadata; that is, data about data, such as:

Page 26: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.26Database System Concepts - 6th Edition

Relational Representation of Dictionary/Catalog

Relational

representation on

disk

Specialized data

structures

designed for

efficient access, in

memory

If multiple indices,

this can be multi-

valued, so the

dictionary DB is

not even in 1NF!

Page 27: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.27Database System Concepts - 6th Edition

PostgreSQL example

Page 28: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.28Database System Concepts - 6th Edition

10.8 DB Buffer

A DB file is partitioned into fixed-length storage units called

blocks.

The DBMS seeks to minimize the number of block transfers

between disk and main memory (MM). We can reduce the

number of disk accesses by keeping as many blocks as

possible in MM.

Buffer = portion of main memory available to store copies of

disk blocks.

Buffer manager = subsystem responsible for allocating

buffer space in MM.

Page 29: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.29Database System Concepts - 6th Edition

Buffer Manager (BM)

A program places call to BM when it needs a block from disk.

1. If the block is already in the buffer, BM returns to the program the

MM address of the block.

2. If the block is not in the buffer, the BM does the following:

1. Allocates space in buffer for block

1. Replaces (throws out) some other block, if required, to make

space for the new block.

2. Replaced block written back to disk only if it was modified

since the most recent time that it was written to/fetched from

the disk.

2. Reads the block from the disk to the buffer, and returns the

address of the block in MM to program.

Page 30: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.30Database System Concepts - 6th Edition

Buffer Replacement Policies

Least recently used (LRU) strategy is popular in OSs

Idea behind LRU: use past pattern of block references as a

predictor of future references

However, queries have well-defined access patterns (such as

sequential scans), so a DBMS can use the information in a

user’s query to better predict future references

Page 31: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.31Database System Concepts - 6th Edition

Buffer-Replacement Policies (Cont.)

Pinned block – memory block that is not allowed to be written

back to disk.

Most recently used (MRU) strategy – system must pin the

block currently being processed. After the final tuple of that

block has been processed, the block is unpinned, and it

becomes the most recently used block.

A.k.a. toss-immediate

Buffer manager can use statistical information regarding the

probability that a request will reference a particular relation

E.g., the data dictionary is frequently accessed →

Heuristic: keep data-dictionary blocks in MM buffer

Page 32: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.32Database System Concepts - 6th Edition

LRU can be a bad strategy for certain access

patterns involving repeated scans of data

Example: Compute join of 2 relations r and s by nested loop:

for each tuple tr of r do

for each tuple ts of s do

if the tuples tr and ts match …

A mixed strategy is preferable:

Toss immediate for r

LRU for s

Page 33: Chapter 10: Storage and File Structures - Faculty Websites · 2019. 12. 16. · Database System Concepts - 6th Edition 10.14 ©Silberschatz, Korth and Sudarshan 10.5.2 Variable-Length

©Silberschatz, Korth and Sudarshan10.33Database System Concepts - 6th Edition

Homework for Ch.10

End-of-chapter exercises 4, 6, 7, 17, 18

EOL 1