Chapter 16
Disk Storage, Basic File Structures, Hashing, and Modern Storage
- Databases are stored as files of records stored on disks
- Physical database file structures
- Physical level of the three-schema architecture
- The collection of data in a DB must be stored on some storage
medium. The DBMS software can retrieve, update, and process
this data as needed
- Storage media form a hierarchy
- primary, secondary, tertiary, etc.
- offline storage for archiving databases (larger capacity, lower cost,
slower access, not directly accessible by the CPU)
Memory Hierarchies and Storage Devices
- Cache memory: static RAM, used by the CPU with techniques such as prefetching and pipelining
- Dynamic RAM (main memory)
Secondary and Tertiary Storage
- mass storage (magnetic disks, CDs, DVDs), with capacity measured in KB, MB, GB, TB, PB
- programs are in main memory (DRAM)
- permanent databases reside in secondary storage
- main memory buffers are used to read and write to secondary
storage
- Flash memory: nonvolatile, NAND- and NOR-based
- Optical disks: CDs (700 MB), DVDs (4.5 – 15 GB), Blu-ray (54 GB)
- Magnetic tapes and jukeboxes
Depending upon the intended use and application requirements,
data is kept in one or more levels of the hierarchy
Storage Organization of Database
- Large amounts of data must persist for a long period of time
(called persistent data)
- Parts of this data are accessed and processed repeatedly during the
storage period
- Transient data exists only during the period of program execution
- Most DBs are stored on secondary storage (magnetic disks) because:
- the DB is too large to fit in main memory
- permanent loss of data is less likely on disk than in volatile main memory
- storage costs less per byte on disk than in primary storage
- A range of cylinders have the same number of sectors per arc.
- A common sector size is 512 bytes
- A division of a track into equal sized disk blocks (or pages) is set by
OS during formatting
- Fixed block size can’t be changed dynamically
- Block sizes range from 512 to 8192 bytes
- Blocks are separated by fixed-size interblock gaps
- Storage capacity and transfer rates keep improving while
cost keeps falling ($100/TB)
Disk
- Random access addressable device
- Transfer from disk to main memory is in units of blocks
- Hardware address of block consists of (cylinder#, track#, block#)
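- Modern disks use a single number called the LBA (logical block
address)
- An LBA between 0 and n-1 is mapped to the right block by the disk controller
- A read or write command also supplies the address of a buffer, a contiguous area of main memory that holds one block
- One block, or a cluster of several contiguous blocks, is transferred at a time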
- Disk controller controls the disk drive
- A standard interface from a computer to a disk is SCSI (Small
Computer System Interface)
- HDDs, CDs and DVDs are commonly connected to a computer through
SATA (Serial AT Attachment; the name goes back to the 16-bit IBM AT bus), at 1.5 Gbps – 6 Gbps
- A more recent option is NL-SAS (nearline SAS): a SATA-class drive with a SAS interface
- The controller accepts high level I/O commands and takes
appropriate action to position the arm and cause read/write
- Seek time 5-10msec
- Rotational latency 4msec
- Block transfer time
- Transferring several consecutive blocks on the same track or cylinder
is more efficient: seek time and rotational latency are paid only for the
first block (9-60 msec in total), and each subsequent block takes 0.4
to 2 msec
- Locating data on a disk is a major bottleneck – need efficient
techniques to do this…
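The gap between consecutive and scattered transfers can be made concrete with a small estimate. The sketch below is only illustrative: the function name and the 8 ms / 4 ms / 1 ms figures are assumptions chosen to fall inside the ranges quoted above, not measurements.

```python
def transfer_time_ms(n_blocks, seek_ms, latency_ms, block_ms, consecutive):
    """Estimated time to read n_blocks blocks.

    consecutive=True: blocks are adjacent on one track or cylinder, so seek
    time and rotational latency are paid once. consecutive=False: every
    block pays them again.
    """
    positioning = seek_ms + latency_ms
    if consecutive:
        return positioning + n_blocks * block_ms
    return n_blocks * (positioning + block_ms)


# Assumed figures: 8 ms seek, 4 ms rotational latency, 1 ms per block.
sequential = transfer_time_ms(20, 8, 4, 1, consecutive=True)    # 32 ms
scattered = transfer_time_ms(20, 8, 4, 1, consecutive=False)    # 260 ms
print(sequential, scattered)
```

Under these assumed figures, twenty scattered blocks cost roughly eight times as much as twenty consecutive ones, which is why block placement matters.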
Making Data Access More Efficient on Disk
(1) Buffering
a. Mismatch of speeds of CPU and disks
b. Application using current data and I/O fetching new data to
the buffer
(2) Organization
a. Use contiguous cylinders and tracks
b. Avoid movement of arm and seek time
(3) Prefetch
a. Read data ahead of request
b. Read consecutive blocks on tracks or cylinders though not
needed
c. May not be efficient for random data
(4) Scheduling
a. Proper scheduling of I/O requests
b. Efficient scheduling algorithms (e.g., the elevator algorithm)
(5) Use Log Disks
a. A log disk holds written blocks temporarily
b. A single disk is dedicated to logging writes
c. All blocks go to that disk sequentially, avoiding seek time
d. Data and log files may share one disk (cheaper, but slower)
e. Not feasible for most real-life applications
(6) Use Flash Memory
a. Use SSDs or Flash memory instead of hard disks
b. Send writes and updates to battery-backed DRAM first
c. Save them to hard disk later
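As an illustration of item (4), the following sketch orders pending cylinder requests the way an elevator-style sweep would and compares arm travel with first-come-first-served order. The function names and the request queue are hypothetical, and the sweep reverses at the last pending request rather than at the edge of the disk, which is one common variant.

```python
def elevator_order(requests, head, direction=1):
    """Order cylinder requests as one sweep in `direction`, then the reverse sweep.

    direction=1 serves higher-numbered cylinders first, -1 lower-numbered first.
    """
    upper = sorted(c for c in requests if c >= head)
    lower = sorted((c for c in requests if c < head), reverse=True)
    return upper + lower if direction == 1 else lower + upper


def arm_travel(order, head):
    """Total number of cylinders the arm crosses when serving `order`."""
    total = 0
    for cylinder in order:
        total += abs(cylinder - head)
        head = cylinder
    return total


pending = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical request queue
start = 53                                      # assumed current head position
sweep = elevator_order(pending, start)
print(arm_travel(pending, start), arm_travel(sweep, start))
```

For this queue the arrival order moves the arm across 640 cylinders, while the sweep needs 299.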
Solid State Device Storage (SSD)
Flash memory is used as an intermediate storage layer; enterprise-grade devices are called enterprise flash
drives (EFDs)
Magnetic Tape Storage Devices
- Sequential access devices: to access the nth block on a tape, the preceding n-1 blocks must be scanned first
- Read/write head is used to access tapes
- Used for backup and recovery
Buffering of Blocks
When several blocks need to be transferred to memory and all the block
addresses are known, several buffers can be reserved in memory to
speed up the transfer.
While one buffer is being read or written by I/O, the CPU can process the data in the other
buffer.
- Processes A and B run concurrently in an interleaved fashion; processes C
and D run in parallel.
- Fig. 16.4 shows the use of two buffers, A and B: while one buffer is being
filled by I/O, the CPU processes the other (double buffering)
- Double buffering permits continuous reading or writing of data on consecutive
blocks, which avoids seek time and rotational delay for all but the first block.
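The benefit of double buffering can be estimated with a simple timeline model. This is a sketch under stated assumptions: an independent I/O processor, a fixed I/O time and CPU time per block, and a buffer that can be refilled only after the CPU has finished with it. The function names are illustrative.

```python
def single_buffer_time(n_blocks, io_ms, cpu_ms):
    """One buffer: each block is read, then processed, strictly in turn."""
    return n_blocks * (io_ms + cpu_ms)


def double_buffer_time(n_blocks, io_ms, cpu_ms):
    """Two buffers: reading block k+1 overlaps with processing block k."""
    read_done = []   # read_done[k]: time at which block k is in memory
    cpu_done = []    # cpu_done[k]: time at which block k has been processed
    for k in range(n_blocks):
        prev_read = read_done[k - 1] if k >= 1 else 0
        # Block k reuses the buffer of block k-2, so that one must be processed.
        buffer_free = cpu_done[k - 2] if k >= 2 else 0
        read_done.append(max(prev_read, buffer_free) + io_ms)
        prev_cpu = cpu_done[k - 1] if k >= 1 else 0
        cpu_done.append(max(read_done[k], prev_cpu) + cpu_ms)
    return cpu_done[-1] if cpu_done else 0


print(single_buffer_time(10, 4, 3), double_buffer_time(10, 4, 3))
```

With 4 ms of I/O and 3 ms of CPU work per block, ten blocks take 70 ms with one buffer and 43 ms with two, because only the last block's processing is not hidden behind I/O.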
Buffer Management
- It is impossible to bring all data into memory at the same time
- Buffer is a part of main memory that is available to receive blocks
or pages of data from disk
- The buffer manager is a software component of a DBMS that
manages buffers. It decides which pages to bring in and which buffer
to use
- The size of the shared buffer pool is a parameter for the DBMS
controlled by DBAs
Two kinds of buffer management:
1. The buffer manager controls main memory directly (most RDBMSs)
2. The buffer manager allocates buffers in virtual memory and leaves control to the OS (e.g., OODBMSs)
Goals:
1. Maximize probability that a requested page is found in main
memory
2. Efficient page replacement algorithm
Keeps Information:
1. A pin-count (number of requests or number of current users). If
the count is 0, the block is unpinned; a pinned block should not be
written to disk
2. A dirty bit, set when a page is updated by any application
program
Handling a page request:
a. make sure the number of buffers fits in main memory
b. if the requested amount exceeds the buffer pool, use page
replacement
c. if the buffer space is in virtual memory, OS thrashing may happen
d. if the requested page is already in the buffer pool,
increment its pin count
e. if the page is not in the buffer pool:
i. choose a page for replacement
ii. if the dirty bit of the replaced page is on, write that page to
disk, overwriting its old copy; then read the requested page into
the freed slot and pass its address to the application.
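The request-handling steps above can be sketched as a toy buffer pool. Everything here is an assumption made for illustration: the class and method names, the use of a dict as the "disk", and the choice of LRU as the replacement policy.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager with pin counts, dirty bits and LRU replacement."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk               # dict: page id -> page contents
        self.frames = OrderedDict()    # page id -> [data, pin_count, dirty]

    def request(self, page_id):
        """Pin a page and return its frame, loading it from disk if needed."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)       # now most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
        frame = self.frames[page_id]
        frame[1] += 1                              # increment pin count
        return frame

    def release(self, page_id, modified=False):
        """Unpin a page; set the dirty bit if the caller changed it."""
        frame = self.frames[page_id]
        frame[1] -= 1
        frame[2] = frame[2] or modified

    def _evict(self):
        # The least recently used unpinned page is the victim.
        victim = next((p for p, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all buffers are pinned")
        data, _, dirty = self.frames.pop(victim)
        if dirty:
            self.disk[victim] = data               # write back before reuse


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(2, disk)
frame = pool.request(1)
frame[0] = "A"                     # update page 1 in the buffer
pool.release(1, modified=True)
pool.request(2)
pool.release(2)
pool.request(3)                    # evicts page 1 and writes "A" to disk
```

A pinned page is never chosen as a victim, and a clean victim is simply dropped because its copy on disk is still current.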
Buffer Replacement Strategies
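1. LRU (least recently used): maintain a time stamp of last use for each page; the page
that has gone unused for the longest time is replaced
2. Clock policy: a round-robin variant of LRU. Each buffer slot has a flag of 0 or 1; if the
flag is 0, the slot is replaced (if its dirty bit is set, the page is first written to disk);
if the flag is 1, it is reset to 0 and the next slot is examined
3. FIFO
a. Notes the time each page was loaded into memory; the oldest page is replaced
b. Simple approach
c. It may replace a block that is in constant use, which then has to be brought back immediately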
LRU and Clock are widely used policies for DB applications, though neither is ideal for sequential scans of files larger than the buffer pool
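The clock policy described in item 2 can be sketched in a few lines. The class name and methods are illustrative, and the sketch deliberately ignores pin counts and dirty bits so that only the flag logic is visible.

```python
class ClockReplacer:
    """Clock (second-chance) victim selection over a fixed set of buffer slots."""

    def __init__(self, n_slots):
        self.flags = [0] * n_slots   # 1 = used since the hand last passed
        self.hand = 0

    def reference(self, slot):
        """Record that the page in `slot` was just used."""
        self.flags[slot] = 1

    def pick_victim(self):
        """Advance the hand until a slot with flag 0 is found and return it."""
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.flags)
            if self.flags[slot] == 0:
                return slot
            self.flags[slot] = 0     # second chance: clear the flag, move on


clock = ClockReplacer(4)
for used in (0, 1, 3):
    clock.reference(used)
first = clock.pick_victim()    # slots 0 and 1 get a second chance, slot 2 is chosen
second = clock.pick_victim()   # slot 3 gets a second chance, slot 0 is chosen
```

The loop always terminates, because after one full rotation every flag has been cleared.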
Placing File Records on Disk
Records are grouped into files.
Records and Record Types:
- Data is stored in the form of records
- Each record consists of a collection of related data values or items;
each value corresponds to a field
- A record usually describes an entity
- A record type is a collection of field names and their data types
- A data type is associated with each field
Standard data types: integer, long, float, char, …
Other data types: date, time, …
struct employee {
    char name[30];
    char ssn[9];
    int salary;
    int job_code;          /* '-' is not allowed in a C identifier */
    char department[20];
};
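To show how such a record type translates into a fixed-length record on disk, here is a sketch using Python's standard `struct` module. The layout string is an assumption: it packs the fields without padding and uses 4-byte integers, which a C compiler would not necessarily do for the struct above.

```python
import struct

# Assumed layout: no padding ("<"), 30-byte name, 9-byte ssn,
# two 4-byte integers, 20-byte department.
EMPLOYEE = struct.Struct("<30s9sii20s")


def pack_employee(name, ssn, salary, job_code, department):
    """Encode one employee as a fixed-length record (short strings are NUL-padded)."""
    return EMPLOYEE.pack(name.encode(), ssn.encode(), salary, job_code,
                         department.encode())


def unpack_employee(record):
    """Decode a fixed-length record back into Python values."""
    name, ssn, salary, job_code, department = EMPLOYEE.unpack(record)

    def text(raw):
        return raw.rstrip(b"\x00").decode()

    return text(name), text(ssn), salary, job_code, text(department)


record = pack_employee("Smith, John", "123456789", 30000, 7, "Research")
print(EMPLOYEE.size, unpack_employee(record))
```

Every record produced this way is exactly 67 bytes long, so the position of any field can be computed from the record's starting offset.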
Databases also have to store unstructured data such as digital images and videos (binary large objects,
BLOBs). A BLOB is stored separately, and a pointer to it is included in the
record.
Files, Fixed and Variable Lengths
- A file usually holds records of the same record type
- If every record has the same size, the file has fixed-length records
- If different records have different sizes, the file has variable-length
records. Causes:
o Variable-length fields (name)
o Repeating fields, or repeating groups of fields
o Different types of records
- Separator characters are used to end variable-length fields
- If a record type has many fields but each record has values for only a few of them,
the <field name, field value> format is used
(or <field type, field value>)
- Repeating fields: one character separates the repeated values, one character
separates fields, and one character terminates the record (= , ||, #)
- Some such characters are part of the file system but hidden from
the programmer (e.g., carriage return 0x0d and line feed 0x0a)
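A minimal sketch of the <field name, field value> format with separator characters follows. The three separator characters are assumptions chosen for readability, and the code assumes that field values never contain them.

```python
FIELD_SEP = "|"     # assumed separator between fields
PAIR_SEP = "="      # assumed separator between field name and field value
RECORD_END = "#"    # assumed record terminator


def encode_record(fields):
    """Serialize only the fields that are present, as name=value pairs."""
    body = FIELD_SEP.join(f"{name}{PAIR_SEP}{value}" for name, value in fields.items())
    return body + RECORD_END


def decode_record(text):
    """Parse a record produced by encode_record back into a dict of strings."""
    body = text[:-len(RECORD_END)]
    pairs = (item.split(PAIR_SEP, 1) for item in body.split(FIELD_SEP) if item)
    return {name: value for name, value in pairs}


encoded = encode_record({"Name": "Smith", "Ssn": "123456789"})
print(encoded, decode_record(encoded))
```

Records with few populated fields stay short, at the price of storing the field names in every record.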
Record Blocking
Records are stored in blocks (sectors)
Block size B
Record size R
Unit of transfer from disk to memory is a block
If B ≥ R, bfr (blocking factor) = ⌊B/R⌋ records per block (integer
division)
If R does not divide B evenly, the unused space in each block is:
B – (bfr * R) bytes
To use this space, a record may be spanned across two blocks.
If R > B, records must be spanned.
For unspanned records, the number of blocks needed for a file of r records is:
b = ⌈r/bfr⌉ blocks (rounded up to the next integer)
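The blocking formulas can be checked with a short calculation. The helper names and the sample figures (512-byte blocks, 67-byte records, 30,000 records) are assumptions for illustration.

```python
import math


def blocking_factor(block_size, record_size):
    """Records per block for fixed-length, unspanned records (requires B >= R)."""
    return block_size // record_size


def unused_bytes(block_size, record_size):
    """Bytes left over in each block when records do not span blocks."""
    return block_size - blocking_factor(block_size, record_size) * record_size


def blocks_needed(n_records, block_size, record_size):
    """Number of blocks for an unspanned file of n_records records."""
    return math.ceil(n_records / blocking_factor(block_size, record_size))


print(blocking_factor(512, 67), unused_bytes(512, 67), blocks_needed(30000, 512, 67))
```

With these figures each block holds 7 records, wastes 43 bytes, and the file needs 4286 blocks.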
Allocation of Files on Disk
- Contiguous
- Linked
- Indexed
- (clusters and extents)
File Headers
- Contains information about the file (disk addresses of its blocks, record format
descriptions)
- To search for a record, blocks are copied into memory and searched one block at a
time
Operations on Files
- Retrieval
- Updates
Simple or compound selection conditions are used, for example:
Ssn = ‘12345678’
Department = ‘Research’
Salary > 30000
Complex conditions must be decomposed into simple conditions to
locate records on the disk.
High-level programs such as DBMS software use file operations such as:
- Open
- Reset
- Find (or Locate)
- Read (or Get)
- Find Next
- Delete
- Modify
- Insert
- Close
- Scan (returns first or next record)
- FindAll
- FindChar
(Record at a time operations except reset and close)
Files of Unordered Records (Heap)
- Records are placed in the file in the order in which they are inserted;
new records are placed at the end. This arrangement is called a
heap.
- To insert, the last disk block is copied into a buffer, the record is
added, and the buffer is written back to disk; the
address of the last file block is kept in the file header
- Inserting a record is very efficient (new records go at the end)
- Searching requires a linear search (on average b/2 blocks for a file of b blocks)
- To delete a record, find the block that holds the record, copy it to a
buffer, delete the record, and write the buffer back to disk
- This leaves unused (wasted) space in the disk block
- Another method is to keep a deletion marker (a bit or byte) in each record;
searches consider only valid records
- Both methods require periodic reorganization of the file to reclaim the space of deleted records
- The space of deleted records can be reused for new records, but this needs more
bookkeeping
- Sorting can also be used for deleted records, but it is expensive
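The heap operations above can be sketched with an in-memory model in which each block is a list of records. The class, its method names and the deletion-marker representation are all assumptions; real blocks would be read from and written to disk.

```python
class HeapFile:
    """Unordered file: each block is a list of [key, deleted] entries."""

    def __init__(self, bfr):
        self.bfr = bfr            # records per block
        self.blocks = [[]]

    def insert(self, key):
        """Append to the last block, starting a new block when it is full."""
        if len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append([key, False])

    def find(self, key):
        """Linear search; returns (block number or None, number of blocks read)."""
        for number, block in enumerate(self.blocks):
            for entry in block:
                if entry[0] == key and not entry[1]:
                    return number, number + 1
        return None, len(self.blocks)

    def delete(self, key):
        """Set the deletion marker instead of physically removing the record."""
        for block in self.blocks:
            for entry in block:
                if entry[0] == key and not entry[1]:
                    entry[1] = True
                    return True
        return False


heap = HeapFile(bfr=3)
for key in range(10, 17):
    heap.insert(key)
print(heap.find(14))    # found in block 1 after reading 2 blocks
heap.delete(14)
print(heap.find(14))    # not found; all 3 blocks were read
```

An unsuccessful search always reads every block, which is the worst case of the linear search.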
Direct or Relative File
- fixed-length records
- unspanned blocks
- contiguous allocation
File records: 0, 1, 2, …, r-1
Records in each block: 0, 1, 2, …, bfr – 1
The ith record of the file is located in block ⌊i/bfr⌋
and is record (i mod bfr) within that block
Example:
number of records = 550
bfr = 20
i = 221
record i is in block ⌊221/20⌋ = 11
and is record (221 mod 20) = 1 within that block (both counted from 0)
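The address calculation for a relative file is a single integer division. In the sketch below, the function names are illustrative, and the 512-byte block and 25-byte record used for the byte offset are assumed values that give bfr = 20.

```python
def locate(i, bfr):
    """Return (block number, slot within the block) of record i, both from 0."""
    return divmod(i, bfr)


def byte_offset(i, block_size, record_size):
    """Byte position of record i in a contiguous file of unspanned blocks."""
    block, slot = locate(i, block_size // record_size)
    return block * block_size + slot * record_size


print(locate(221, 20))             # (11, 1)
print(byte_offset(221, 512, 25))   # 11 * 512 + 1 * 25 = 5657
```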
Files of Ordered Records
- Order the records on the value of one of their fields (the ordering field)
- This leads to an ordered or sequential file
- If the ordering field is also a key field of the file (a unique value in each
record), it is called the ordering key (e.g., name of the employee as a key)
Advantages of ordered files:
1. No sorting is required to read the records in order of the ordering key
2. The search condition can be a key value or a range of key values
3. Finding the next record from the current record is easy
4. Binary search can be used, needing about log2(b) block accesses for a file of b blocks
Disadvantages and remedies:
5. Inserting and deleting are expensive
6. Unused space can be kept in each block for insertions, but the same problem returns
once it is used up
7. Another approach is to use a master file and an overflow file; the overflow
file can be sorted and merged with the master file during
file reorganization
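Binary search on an ordered file works on whole blocks, not individual records. The following sketch models the file as a list of non-empty sorted blocks and counts block accesses; the function name and the sample file are assumptions.

```python
def find_block(blocks, key):
    """Binary search over sorted blocks.

    Returns (block index or None, number of blocks read).
    """
    low, high, reads = 0, len(blocks) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]          # one block access
        reads += 1
        if key < block[0]:
            high = mid - 1
        elif key > block[-1]:
            low = mid + 1
        else:
            return (mid if key in block else None), reads
    return None, reads


# Hypothetical ordered file: 16 blocks of 4 keys each (0..63).
ordered = [list(range(start, start + 4)) for start in range(0, 64, 4)]
print(find_block(ordered, 42))    # (10, 4): found after 4 block reads
```

For the 16-block sample file the search for key 42 reads 4 blocks, matching log2(16), whereas a linear search would read 11.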
Hashing Techniques
The search condition must be an equality condition on a single field, called the hash field. In most cases, the
hash field is also a key field.
Internal Hashing:
Hashing used as an internal data structure within a program
Hash table with slots 0 to m-1
We have m slots, whose addresses correspond to the array indexes
A hash function maps the hash field value to an address between 0 and m-1
One common function: h(k) = k mod m
Other functions, such as “folding”, apply an arithmetic function such as addition or a
logical function such as xor to parts of the key
Collisions occur and have to be resolved:
- Open addressing (checks available subsequent addresses)
- Chaining
- Multiple hashing
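As a sketch of internal hashing, the table below uses h(k) = k mod m and resolves collisions by open addressing with linear probing. The class is illustrative only; deletion is left out because it needs extra care under open addressing.

```python
class HashTable:
    """Internal hash table with h(k) = k mod m and linear probing."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m      # each slot holds (key, value) or None

    def insert(self, key, value):
        """Store the pair in the first free slot at or after h(key)."""
        for step in range(self.m):
            pos = (key % self.m + step) % self.m
            if self.slots[pos] is None or self.slots[pos][0] == key:
                self.slots[pos] = (key, value)
                return pos
        raise RuntimeError("hash table is full")

    def lookup(self, key):
        """Probe from h(key) until the key or an empty slot is found."""
        for step in range(self.m):
            pos = (key % self.m + step) % self.m
            if self.slots[pos] is None:
                return None
            if self.slots[pos][0] == key:
                return self.slots[pos][1]
        return None


table = HashTable(7)
print(table.insert(10, "a"), table.insert(17, "b"), table.insert(24, "c"))  # 3 4 5
print(table.lookup(17), table.lookup(31))                                   # b None
```

Keys 10, 17 and 24 all hash to address 3, so the second and third land in the next free slots, 4 and 5.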
External hashing for disk files
Hashing for disk files is called external hashing; it is adapted to the characteristics
of disk storage.
- Address space is made of buckets
- Each bucket holds multiple records
- A bucket can be one or more disk blocks (cluster)
- Hashing function maps a key into a relative bucket number rather
than assigning an absolute block address to the bucket
- A table maintained in the file header converts the bucket number
to the corresponding disk block address (Fig. 16.9)
- Hashing with a fixed number of buckets is called static hashing
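The bucket scheme can be sketched as follows. The class, the block addresses starting at 100, and the overflow lists used when a bucket fills up are assumptions made for illustration; the notes do not say how full buckets are handled.

```python
class StaticHashFile:
    """Static external hashing: a fixed number of one-block buckets."""

    def __init__(self, n_buckets, capacity, first_block=100):
        self.n_buckets = n_buckets
        self.capacity = capacity     # records per bucket block
        # Header table: relative bucket number -> disk block address.
        # The addresses are made-up values.
        self.directory = {b: first_block + b for b in range(n_buckets)}
        self.blocks = {addr: [] for addr in self.directory.values()}
        self.overflow = {b: [] for b in range(n_buckets)}

    def bucket_of(self, key):
        """The hash function yields a relative bucket number, not a disk address."""
        return key % self.n_buckets

    def insert(self, key):
        bucket = self.bucket_of(key)
        block = self.blocks[self.directory[bucket]]
        if len(block) < self.capacity:
            block.append(key)
        else:
            self.overflow[bucket].append(key)    # bucket full (assumed handling)

    def contains(self, key):
        bucket = self.bucket_of(key)
        return key in self.blocks[self.directory[bucket]] or key in self.overflow[bucket]


hashed = StaticHashFile(n_buckets=4, capacity=2)
for key in (8, 12, 16, 5):
    hashed.insert(key)
print(hashed.blocks[100], hashed.overflow[0], hashed.directory[1])
```

Keys 8 and 12 fill bucket 0 (block 100 in this made-up layout), so key 16 goes to that bucket's overflow list.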
SKIP 16.8.3
Files of Mixed Records
So far we have assumed that all records of a particular file are of the same record
type. In most databases, numerous types of records are
related, and there is a need for mixed records in files.
- numerous types of entities are related in various ways
- related records need to be clustered in the same block or blocks
- OODBs also need clustering of objects
- there are other types of data structures, such as B-trees, to store
DB data for efficient access
Parallelizing Disk Access Using RAID Technology
- Redundant array of independent disks
- Provides reliability and high performance
- A large array of independent disks acts as a single logical disk
Data Striping: distribute data across many disks so that they appear as a single
logical disk
Bit-level Striping: individual bits are split among disks
Block-level Striping: blocks are spread across disks
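Block-level striping amounts to a simple address mapping. The sketch below assumes round-robin placement of logical blocks over the disks; the function names are illustrative.

```python
def stripe_location(block_no, n_disks):
    """Map a logical block to (disk number, block number on that disk)."""
    return block_no % n_disks, block_no // n_disks


def layout(n_blocks, n_disks):
    """List the logical blocks stored on each disk."""
    disks = [[] for _ in range(n_disks)]
    for block_no in range(n_blocks):
        disk, _ = stripe_location(block_no, n_disks)
        disks[disk].append(block_no)
    return disks


print(stripe_location(10, 4))   # (2, 2)
print(layout(8, 4))             # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Consecutive logical blocks fall on different disks, so a large sequential read can be served by all disks in parallel.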
Reliability with RAID
- For n disks, the likelihood of a failure is n times that of a single disk
- With an MTBF of 200,000 hours per disk, the MTBF of a 100-disk array is 2,000 hours, about 83 days
- To improve reliability, mirroring (shadowing) and other RAID
techniques are used
- With mirroring and a mean time to repair (MTTR) of 24 hours,
the mean time to data loss becomes (200,000)^2 / (2 * 24) ≈ 8.33 × 10^8 hours, about 95,000 years
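The reliability arithmetic is easy to check by evaluating the two formulas. The function names are illustrative, and a year is taken as 365 days.

```python
HOURS_PER_YEAR = 24 * 365


def array_mtbf_hours(disk_mtbf_hours, n_disks):
    """MTBF of an array of n independent disks without redundancy."""
    return disk_mtbf_hours / n_disks


def mirrored_mttdl_hours(disk_mtbf_hours, repair_hours):
    """Mean time to data loss of a mirrored pair: MTBF^2 / (2 * repair time)."""
    return disk_mtbf_hours ** 2 / (2 * repair_hours)


days = array_mtbf_hours(200_000, 100) / 24
years = mirrored_mttdl_hours(200_000, 24) / HOURS_PER_YEAR
print(round(days, 1), round(years))
```

This gives about 83.3 days for the 100-disk array and roughly 95,000 years for the mirrored pair.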
Modern Storage Architecture
(1) Storage Area Networks (SAN)
(2) Network Attached Storage (NAS)
(3) iSCSI and Other Network-based Protocols (SCSI
commands are encapsulated and put into IP
packets using this protocol)
(4) Automated Storage Tiering (moves data between
different storage types such as SATA, SAS, ...)
(5) Object-based storage (objects are used instead of
files; Facebook and other big data applications use object
storage). An object command set has also been standardized
as part of SCSI; examples of object storage: Microsoft Azure,
OpenStack Swift