Upload
others
View
62
Download
2
Embed Size (px)
Citation preview
Database Systems
Mohamed Zahran (aka Z)
http://www.mzahran.com
CSCI-GA.2433-001
Lecture 8: Physical Schema: Storage
Physical Schema
Conceptual Schema
View 1 View 2 View 3
Disk What happens under the hood?
1. Create a model of the enterprise (e.g., using ER)
2. Create a logical “implementation” (using a relational model and normalization)
Requirements Analysis
Logical Design
Conceptual Design
Schema Refinement
Physical Design
Application & Security Design
• Uses a file system to store the relations • Requires knowledge of hardware and
operating systems characteristics
First, Let’s Look at a Typical Hierarchy
Level Access time Typical size
Registers "instantaneous" under 1KB
Level 1 Cache 1-3 ns 64KB per core
Level 2 Cache 3-10 ns 256KB per core
Level 3 Cache 10-20 ns 2-20 MB per chip
Main Memory 30-60 ns 4-32 GB per system
Hard Disk 3,000,000-10,000,000 ns over 1TB
In DB, we care about those two levels
Physical Design
Criteria Storage media
Indexes File Structures
Performance • Hard-disk drives • Solid-state Disks
Disk Drives
• To access data: — seek time: position head over the proper track — rotational latency: wait for desired sector (RPM) — transfer time: grab the data (one or more sectors)
Platter
Track
Platters
Sectors
Tracks
Hard Disks
• spinning platter of special material • mechanical arm with read/write head
must be close to the platter to read/write data
• data is stored magnetically • disks are random access meaning data
can be read/written anywhere on the disk
A Conventional Hard Disk Structure
Hard Disk Architecture
• Surface = group of tracks
• Track = group of sectors
• Sector = group of bytes
• Cylinder: several tracks on corresponding surfaces
Disk Sectors and Access • Each sector records
– Sector ID – Data (512 bytes, 4096 bytes proposed) – Error correcting code (ECC)
• Used to hide defects and recording errors
– Synchronization fields and gaps
• Access to a sector involves – Queuing delay if other accesses are pending – Seek: move the heads – Rotational latency – Data transfer – Controller overhead
Example of a Real Disk: MKxx59GSM from Toshiba
• Size: 1TB • 5,400 RPM • 2.5” • Number of platters: 3 • Number of data heads: 6 • Interface: SATA • Transfer rate to host: 3GB/sec • Average seek time: 12ms • Track-to-Track seek: 2ms
Disks: Other Issues
• Average seek and rotation times are helped by locality.
• Disk performance improves about 10%/year
• Capacity increases about 60%/year
• Common disk interfaces/controllers: • SCSI, IDE, SATA
The Disk: A View from the Top
• A sequence of blocks
– A physical unit of access is always a block
• All blocks are of the same size
• Actually, a block consists of 1 or more sectors
• A file:
– Logically: a series of records, of similar or different sizes
– Physically: a series of, no necessarily contiguous blocks
• The file system helps us to:
– Find the first block
– Find the last block
– Find the next block
– Find the previous block
Relation
File
Blocks
Records
Sectors
Each tuple is a record in the file.
Logical view
Physical view
Assumptions: •There can be several records in a block •No record spans more than one block
There is one extra layer that we will discuss shortly!
1 1200
3 2100
4 1800
2 1200
6 2300
9 1400
8 1900
E# Salary
1 1200
3 2100
4 1800
2 1200
6 2300
9 1400
8 1900
RecordsRelation Blocks
1 1200
3 2100
4 1800
2 1200
6 2300
9 1400
8 1900
Records
6 23009 1400
1 1200 3 2100 8 1900
4 18002 1200
Left-over
SpaceFirst block
of the file
Example
Processing a Query: What Happens Under The Hood?
SELECT E# FROM R WHERE SALARY > 1500;
Read from disk into RAM all
relevant blocks
Get the relevant information from
blocks
Additional processing then
produce the answer What is the cost of this?
A Simple Cost Model
• Assumptions – Reading or Writing a block costs one time unit – Processing is free
• Justification – Accessing the disk is much more expensive
than any reasonable CPU processing of queries • Implication
– Goal: minimize number of block accesses – Good Heuristic: Organize the physical
database so that you make as much use as possible from any block you read/write
Why Not Store Everything in Main Memory?
Example
Blocks on disk
1 1200
3 2100
4 1800
2 1200
6 2300
9 1400
8 1900
Array in RAM
6 23009 1400
1 1200 3 2100 8 1900
4 18002 1200
What is the cost of accessing: E# 2 and E# 9?
What is the cost of accessing: E# 2 and E# 4?
What Is The Best Place for the Next Block?
RAID
• Disk Array: Arrangement of several disks that gives abstraction of a single, large disk.
• Goals: Increase performance and reliability. • Two main techniques:
– Data striping: Data is partitioned; size of a partition is called the striping unit. Partitions are distributed over several disks.
– Redundancy: More disks => more failures. Redundant information allows reconstruction of data if a disk fails.
RAID Levels
• Level 0: No redundancy
• Level 1: Mirrored (two identical copies)
– Each disk has a mirror image – Parallel reads, a write involves two disks. – Maximum transfer rate = transfer rate of one disk – Better reliability but no protection against data corruption of
viruses
RAID Levels
• Level 0+1: Striping and Mirroring – Parallel reads, a write involves two disks.
– Maximum transfer rate = aggregate bandwidth
Level 1+0 is the same as
0+1 with reverse in mirroring and stripping.
RAID Levels • Level 2: Bit-Interleaved
– Uses hamming code for error correction – Can recover data from single-bit corruption – Out of fashion!
For Error correction (Hamming code)
RAID Levels • Level 3: Byte-Interleaved with Parity
– Striping Unit: One byte. – One check disk. – Each read and write request involves all
disks; disk array can process one request at a time.
RAID Levels • Level 4: Block-Interleaved Parity
– Striping Unit: One disk block. One check disk.
– Parallel reads possible for small requests, large requests can utilize full bandwidth
– Writes involve modified block and check disk
RAID Levels • Level 5: Block-Interleaved Distributed
Parity – Similar to RAID Level 4, but parity blocks
are distributed over all disks
Level 6 is similar to 5 but with extra parity block
What Is This Story of Solid State Disks (SSD)?
• No moving parts hence called “solid” state
• reads and writes to a medium called NAND flash memory
• Faster startup: no spinning
• Extremely low read latency
• Deterministic: performance does not depend on the location of the data
BUT …
• Much more expensive than hard disks (~3$/GB vs. 0.15$/GB)
• Limited write erase time
• Slower write speeds
• High capacity SSDs may have significant higher power requirements
• SSD can get slower as it ages
SSD Organization
• SSD has its own page
• SSD pages are getting larger: 8KB, 16KB, 32KB
• Sector addressable
• Most SSD uses 4KB sectors
Comparison of typical hard-drives and SSD
More up-to-date- SSD: Samsung SSD 840 Pro Series: 256 GB • Read XFR Rate: ~510 MB/s •Write XFR Rate: ~500 MB/s
DBMS
Application
OS
DISK Controller
Data must be in RAM for DBMS to operate on it!
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Important: Tow flavors of DBMS •Relying on OS file systems •Uses its own or extend OS
Disk Space Manager
• Abstractly: deals with pages as a unit of data (read, write, allocate, deallocate)
• Page size usually chosen as disk block • Keeps track of:
– Which blocks are in use • List of bitmap
– Which pages are on which page blocks
• Hides the details of the hardware and OS and makes higher level layers think in term of pages
Buffer Manager
• Memory may not be able to hold all needed pages
• the software layer responsible for bringing pages from disk to main memory as needed
• Implements replacement policy • Higher levels of the DBMS code can be
written without worrying about whether data pages are in memory or not.
Buffer Manager
• It does its job as follows: – Partitions the memory into frames (1 frame holds 1 page)
– The collection of all frames/pages is called buffer pool
DB
MAIN MEMORY
DISK
disk page
free frame
Page Requests from Higher Levels
BUFFER POOL
choice of frame dictated by replacement policy
Hierarchy of Data
Tuples/Records
Relations
Sectors
Files
Blocks
Pages
We can try to minimize the number of block
accesses for “frequent” queries.
Indexing File Organization
Oversimplification: Tries to provide When you read a block you get “many” useful records
Oversimplification: Tries to provide where blocks containing useful records are
Conclusions
• From the application to the disk, each layer “sees” the data differently
• Disks provide cheap, non-volatile storage.
• Performance and cost are the main issues