©Silberschatz, Korth and Sudarshan, Database System Concepts - 6th Edition
QUIZ:
Source: http://courses.cs.washington.edu/courses/cse444/06wi/lectures/lecture09.pdf
Is either set of attributes a superkey? A candidate key?
QUIZ: MVD
What MVDs can you
spot in this table?
Source: https://en.wikipedia.org/wiki/Multivalued_dependency
Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Chapter 10: Storage and File Structure
Chapter 10: Storage and File Structure
1. Overview of Physical Storage Media
2. Magnetic Disks
3. RAID
4. Tertiary Storage
5. File Organization
6. Organization of Records in Files
7. Data Dictionary/Catalog Storage
8. DB Buffer
10.5 File Organization
A file is partitioned into fixed-length storage units called blocks,
which are the units of both storage allocation and data transfer
from/to the secondary storage (HDD).
Most DBMSs use block sizes of 4 to 8 kilobytes by default;
many DBMSs allow the block size to be specified when a DB
instance is created.
10.5 File Organization
The database is stored as a collection of files.
Each file is a sequence of records.
A record is a sequence of fields.
Two possible approaches:
• Record size is fixed
• Record size is variable
10.5.1 Fixed-Length Records
Even if the records are not really of fixed length (e.g. they
contain a varchar field), we assume that each field occupies
its maximum size.
Record access is simple: record i starts at byte n × (i – 1),
where n is the size of each record (e.g. 53 bytes).
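The offset formula can be sketched in a few lines. This is only an illustration: the function names are made up for this example, records are numbered from 1 as on the slide, and the file is modeled as an in-memory byte string.

```python
RECORD_SIZE = 53  # n: bytes per record (the example size used on the slide)


def record_offset(i: int, n: int = RECORD_SIZE) -> int:
    """Byte offset of record i (1-based) in a file of fixed-length records."""
    if i < 1:
        raise ValueError("record numbers start at 1")
    return n * (i - 1)


def read_record(data: bytes, i: int, n: int = RECORD_SIZE) -> bytes:
    """Slice record i out of the raw file contents."""
    start = record_offset(i, n)
    return data[start:start + n]
```

For example, with n = 53 the third record starts at byte 106.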
10.5.1 Fixed-Length Records
Problem 1: Unless the block size is a multiple of n, the last
record in a block crosses the block boundary
Requires two block accesses!
Modification: leave the fractional record at the end of the
block unused.
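A small sketch of this modification, assuming a 4096-byte block and the 53-byte record from the earlier example (both values are only illustrative): each block holds floor(block size / n) whole records, and the leftover bytes at the end of the block stay unused.

```python
def records_per_block(block_size: int, n: int) -> int:
    """Number of whole records of size n that fit in one block."""
    return block_size // n


def locate(i: int, block_size: int, n: int) -> tuple[int, int]:
    """Return (block number, byte offset inside that block) of record i.

    Records are numbered from 1, blocks from 0. No record crosses a
    block boundary, so every record needs exactly one block access.
    """
    per_block = records_per_block(block_size, n)
    if per_block == 0:
        raise ValueError("record is larger than a block")
    block, slot = divmod(i - 1, per_block)
    return block, slot * n


def wasted_bytes(block_size: int, n: int) -> int:
    """Bytes left unused at the end of every block."""
    return block_size % n
```

With these numbers, 77 records fit in a block and 15 bytes per block are left unused; record 78 is the first record of block 1.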
10.5.1 Fixed-Length Records
Problem 2: What to do when a record (i) is deleted?
Possible solutions:
• shift records i + 1, ..., n to i, ..., n – 1
• move record n to i
• do not move records, but link all free records on a free list
Deleting record 3 and shifting
Deleting record 3 and moving last record
Free Lists (Linked)
Store the address of the first deleted record in the file header.
Can think of these stored addresses as pointers since they
“point” to the location of a record.
For efficiency, reuse the space for normal attributes in the free
records to store pointers. (No pointers stored in in-use records!)
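The free-list idea can be sketched with an in-memory model. The class and attribute names below are hypothetical, and slot indexes stand in for record addresses; a real file would store the pointer in the bytes of the freed record itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeSlot:
    """Contents of a deleted record: only a pointer to the next free slot."""
    next_free: Optional[int]


class FixedLengthFile:
    """In-memory model of a file of fixed-length records with a free list."""

    def __init__(self) -> None:
        self.free_head: Optional[int] = None  # kept in the file header
        self.slots: list = []                 # a record or a FreeSlot per slot

    def insert(self, record) -> int:
        """Reuse the first free slot if there is one, else grow the file."""
        if self.free_head is None:
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot].next_free  # unlink from free list
        self.slots[slot] = record
        return slot

    def delete(self, slot: int) -> None:
        """Push the slot onto the front of the free list; nothing moves."""
        if isinstance(self.slots[slot], FreeSlot):
            raise ValueError("slot is already free")
        self.slots[slot] = FreeSlot(self.free_head)
        self.free_head = slot
```

In-use slots hold only the record; the pointer exists only in free slots, which matches the remark that no pointers are stored in in-use records.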
QUIZ
10.5.2 Variable-Length Records
Variable-length records arise in several ways:
• Storage of multiple record types in a file,
e.g. the records represent tuples from different tables
• Record types that allow variable lengths for one or more
fields such as strings (varchar)
• Record types that allow repeating fields (used in some
older data models)
Internal representation of variable-length
records
Attributes are stored in order, but variable-length attributes
are represented by a fixed-size pair (offset, length), with the
actual data stored after all fixed-length attributes.
Null values are represented by a null-value bitmap.
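One possible byte layout following this description is sketched below. The concrete choices are assumptions for illustration only: 2-byte offsets and lengths, a single 8-byte integer as the only fixed-length attribute (called salary here), and a 1-byte null bitmap, which limits the sketch to at most 7 variable-length attributes.

```python
import struct


def encode(var_fields, salary):
    """Encode a record with len(var_fields) strings and one integer.

    Layout: (offset, length) pairs | 8-byte salary | 1-byte null bitmap | data
    A None value sets the corresponding bit in the null bitmap.
    """
    n = len(var_fields)
    header_size = 4 * n + 8 + 1  # pairs + fixed attribute + bitmap
    bitmap = 0
    pairs = b""
    data = b""
    for k, value in enumerate(var_fields):
        if value is None:
            bitmap |= 1 << k
            pairs += struct.pack("<HH", 0, 0)
        else:
            raw = value.encode("utf-8")
            # The offset is measured from the start of the record.
            pairs += struct.pack("<HH", header_size + len(data), len(raw))
            data += raw
    if salary is None:
        bitmap |= 1 << n
        salary = 0
    return pairs + struct.pack("<q", salary) + bytes([bitmap]) + data


def decode(record, n):
    """Inverse of encode for a record with n variable-length attributes."""
    bitmap = record[4 * n + 8]
    fields = []
    for k in range(n):
        offset, length = struct.unpack_from("<HH", record, 4 * k)
        is_null = (bitmap >> k) & 1
        fields.append(None if is_null else
                      record[offset:offset + length].decode("utf-8"))
    (salary,) = struct.unpack_from("<q", record, 4 * n)
    return fields, (None if (bitmap >> n) & 1 else salary)
```

Because every (offset, length) pair has a fixed size, attribute k can be located without scanning the attributes before it.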
Representation of variable-length records
inside a block: Slotted page structure
A slotted page has a header which contains:
• The number of record entries
• The end of free space in the block
• The location and size of each record
Records are allocated contiguously in the page/block, starting
from the end.
Records can be moved around within the page/block to keep
them contiguous (no empty space between them).
• The header entry is updated on every move.
• Because of this, outside pointers should not point directly to
the record but to its header entry.
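The slotted-page behavior described above can be sketched as follows. This is a simplified model under stated assumptions: the header is kept in a Python list instead of inside the block's bytes (so its size is not charged against the block), and all names are invented for the example.

```python
class SlottedPage:
    """Simplified slotted page: records grow from the end of the block."""

    def __init__(self, size: int = 4096) -> None:
        self.data = bytearray(size)
        self.slots = []       # header entries: [offset, length] or None
        self.free_end = size  # end of free space; records are data[free_end:]

    def insert(self, record: bytes) -> int:
        """Place the record just below the existing ones; return its slot."""
        if len(record) > self.free_end:
            raise ValueError("not enough free space in the page")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append([self.free_end, len(record)])
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        """Outside pointers use the slot number, never the raw offset."""
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        """Remove a record and close the hole by moving lower records up."""
        offset, length = self.slots[slot]
        # Records stored at lower addresses than the hole move up by `length`.
        self.data[self.free_end + length:offset + length] = \
            self.data[self.free_end:offset]
        for entry in self.slots:
            if entry is not None and entry[0] < offset:
                entry[0] += length  # header entry updated on every move
        self.slots[slot] = None
        self.free_end += length
```

After a delete, slot numbers held by outside pointers are still valid even though the bytes of other records have moved, which is the reason for the indirection through the header.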
What if the data is larger than the block size?
Remember the data types BLOB and CLOB
Large objects are often stored separately from the other
(short) attributes, in special file(s). In this case, the record
containing the large object has only a pointer to the object.
[Figure labels: File 1, BLOB 1, File 2, BLOB 2]
10.6 Organization of Records in Files
Heap – a record can be placed anywhere in the file (in any
block) where there is space
No ordering whatsoever
Sequential – store records in sequential order, based on the
value of the search key of each record
Search key need not be PK, or even superkey!
See next slides
Hashing – a hash function is computed on some attribute of
each record; the result specifies in which block of the file the
record should be placed
See Ch.11
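As a rough illustration of the hashing organization, the sketch below maps a record to a block number. The choice of CRC-32 as the hash function, the dictionary-shaped records, and the function names are all assumptions made for this example; a real DBMS would also handle block overflow.

```python
import zlib


def block_for(key: str, num_blocks: int) -> int:
    """Block number chosen by hashing the value of the chosen attribute."""
    return zlib.crc32(key.encode("utf-8")) % num_blocks


def build_hash_file(records, attribute: str, num_blocks: int):
    """Distribute records (dicts) over blocks according to block_for."""
    blocks = [[] for _ in range(num_blocks)]
    for record in records:
        blocks[block_for(record[attribute], num_blocks)].append(record)
    return blocks


def lookup(blocks, attribute: str, value: str):
    """Only the one block selected by the hash has to be searched."""
    block = blocks[block_for(value, len(blocks))]
    return [r for r in block if r[attribute] == value]
```

A lookup on the hashed attribute touches a single block, whereas a lookup on any other attribute would still have to scan the whole file.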
10.6.1 Sequential File Organization
The records in the file are ordered by a search-key
Suitable for applications (e.g. queries) that require sequential
processing of the entire file
Sequential File Organization (Cont.)
Deletion – use pointer chains
Insertion – locate the position where the record is to be inserted
• if there is free space, insert there
• if there is no free space, insert the record in an overflow block
• in either case, the pointer chain must be updated
The file needs to be reorganized from time to time to restore
sequential order
10.6.2 Multitable Clustering File Organization
Many large-scale DBMSs do not rely directly on the underlying
OS for file management.
Instead, the OS allocates one large file to the DBMS, and the
DBMS stores all relations in this one file, and manages the file
itself.
Even if multiple relations are stored in this single large file,
the default is to store records of only one relation in a given
block.
• This simplifies data management.
However, in some cases it can be useful to store records of
more than one relation in a single block. This is called
multitable clustering. Example:
[Figure: the department and instructor relations, and the multitable clustering of department and instructor]
Multitable Clustering File Organization (cont.)
• good for queries involving department ⋈ instructor, and for
queries involving one single department and its instructors
• bad for queries involving only department
• results in variable-size records
• can add pointer chains to link records of a particular relation
10.7 Data Dictionary Storage
The data dictionary (also called system catalog) stores
metadata; that is, data about data, such as:
Information about relations
  names of relations
  names, types and lengths of attributes of each relation
  names and definitions of views
  integrity constraints
User and accounting information, including passwords
Statistical and descriptive data
  number of tuples in each relation
Physical file organization information
  how the relation is stored (sequential/hash/…)
  physical location of the relation
Information about indices (Chapter 11)
Relational Representation of Dictionary/Catalog
Relational representation on disk
Specialized data structures designed for efficient access, in memory
If multiple indices, this can be multi-valued, so the dictionary DB is not even in 1NF!
PostgreSQL example
10.8 DB Buffer
A DB file is partitioned into fixed-length storage units called
blocks.
The DBMS seeks to minimize the number of block transfers
between disk and main memory (MM). We can reduce the
number of disk accesses by keeping as many blocks as
possible in MM.
Buffer = portion of main memory available to store copies of
disk blocks.
Buffer manager = subsystem responsible for allocating
buffer space in MM.
Buffer Manager (BM)
A program places a call to the BM when it needs a block from disk.
1. If the block is already in the buffer, the BM returns to the program
   the MM address of the block.
2. If the block is not in the buffer, the BM does the following:
   a. Allocates space in the buffer for the block:
      • Replaces (throws out) some other block, if required, to make
        space for the new block.
      • The replaced block is written back to disk only if it was
        modified since the most recent time that it was written
        to/fetched from the disk.
   b. Reads the block from the disk into the buffer, and returns the
      address of the block in MM to the program.
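The steps above can be sketched as a small class. Several things here are assumptions for the sake of a runnable example: the disk is modeled as a dictionary from block id to bytes, the victim is chosen by LRU (the steps themselves do not fix a replacement policy), and the names are invented. Pinning is not modeled.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer manager; `disk` is a dict standing in for the real disk."""

    def __init__(self, disk: dict, capacity: int) -> None:
        self.disk = disk
        self.capacity = capacity      # number of blocks the buffer can hold
        self.frames = OrderedDict()   # block id -> bytearray, oldest first
        self.dirty = set()            # blocks modified since last read/write
        self.reads = 0                # disk reads performed
        self.writes = 0               # disk writes performed

    def get(self, block_id):
        """Return the in-memory copy of a block, fetching it if necessary."""
        if block_id in self.frames:               # step 1: already buffered
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:     # step 2a: make space
            victim, contents = self.frames.popitem(last=False)
            if victim in self.dirty:              # write back only if modified
                self.disk[victim] = bytes(contents)
                self.dirty.discard(victim)
                self.writes += 1
        self.frames[block_id] = bytearray(self.disk[block_id])  # step 2b
        self.reads += 1
        return self.frames[block_id]

    def mark_dirty(self, block_id) -> None:
        """Caller reports that it modified the buffered copy of the block."""
        self.dirty.add(block_id)
```

The `reads` and `writes` counters make it easy to see that a clean victim costs one disk access (the read of the new block) while a modified victim costs two.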
Buffer Replacement Policies
Least recently used (LRU) strategy is popular in OSs
Idea behind LRU: use past pattern of block references as a
predictor of future references
However, queries have well-defined access patterns (such as
sequential scans), so a DBMS can use the information in a
user’s query to better predict future references
Buffer-Replacement Policies (Cont.)
Pinned block – memory block that is not allowed to be written
back to disk.
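Most recently used (MRU) strategy – system must pin the
block currently being processed. After the final tuple of that
block has been processed, the block is unpinned, and it
becomes the most recently used block, i.e. the first candidate
for replacement.
Related: toss-immediate strategy – frees the space occupied by
a block as soon as its final tuple has been processed.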
Buffer manager can use statistical information regarding the
probability that a request will reference a particular relation
E.g., the data dictionary is frequently accessed →
Heuristic: keep data-dictionary blocks in MM buffer
LRU can be a bad strategy for certain access
patterns involving repeated scans of data
Example: Compute join of 2 relations r and s by nested loop:
for each tuple tr of r do
    for each tuple ts of s do
        if the tuples tr and ts match …
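A mixed strategy is preferable:
Toss-immediate for r (a block of r is not needed again once all
its tuples have been processed)
MRU for s (s is scanned repeatedly, so the block used most
recently is the one that will be needed again last)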
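The effect of repeated scans can be checked with a small simulation. The sketch below counts buffer misses for a reference string under LRU and MRU replacement; the function name and the sizes (4 blocks of the inner relation s, 3 buffer frames, 3 scans) are made up for illustration, and pinning is ignored.

```python
def count_misses(references, capacity, policy):
    """Count block fetches for a reference string; policy is "LRU" or "MRU"."""
    buffer = []  # block ids ordered from least to most recently used
    misses = 0
    for block in references:
        if block in buffer:
            buffer.remove(block)          # hit: will be re-appended below
        else:
            misses += 1
            if len(buffer) >= capacity:
                # LRU evicts the oldest entry, MRU the newest one.
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)              # block is now most recently used
    return misses


# The inner relation s has 4 blocks and is scanned once per outer tuple.
scan = list(range(4)) * 3

lru_misses = count_misses(scan, capacity=3, policy="LRU")
mru_misses = count_misses(scan, capacity=3, policy="MRU")
```

In this run LRU misses on all 12 references, because each block is evicted just before it is needed again, while MRU misses only 6 times.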
Homework for Ch.10
End-of-chapter exercises 4, 6, 7, 17, 18