Chapter 16 Disk Storage, Basic File Structures, Hashing ...orion.towson.edu/~karne/teaching/c657sl/Ch16Notes.pdf · Disk Storage, Basic File Structures, Hashing, ... - Databases are


Chapter 16

Disk Storage, Basic File Structures, Hashing, and Modern Storage

- Databases are stored as files of records stored on disks

- Physical database file structures

- Physical level of the three-schema architecture


- The collection of data in a DB must be stored on some storage medium. The DBMS software can retrieve, update, and process this data as needed

- Storage media form a hierarchy

- primary, secondary, tertiary, etc.

- offline storage for archiving databases (larger capacity, lower cost, slower access, not directly accessible by the CPU)

Memory Hierarchies and Storage Devices

- Cache memory: static RAM (used by the CPU for prefetching and pipelining)

- Dynamic RAM (main memory)

Secondary and Tertiary Storage

- mass storage (magnetic disks, CDs, DVDs), with capacity measured in KB, MB, GB, TB, PB

- programs are in main memory (DRAM)

- permanent databases reside in secondary storage

- main memory buffers are used to read from and write to secondary storage

- Flash memory: nonvolatile, NAND- and NOR-based

- Optical disks: CDs (700 MB), DVDs (4.5 – 15 GB), and Blu-ray (54 GB)

- Magnetic tapes and jukeboxes

Depending upon the intended use and application requirements, data is kept in one or more levels of the hierarchy


Storage Organization of Database

- Large amounts of data must persist for a long period of time (called persistent data)

- parts of this data are accessed and processed repeatedly during the storage period

- transient data, by contrast, exists only during program execution

- most DBs are stored on secondary storage (magnetic disks)

- the DB is too large to fit in main memory

- permanent loss of data is less likely on disk than in main memory

- storage costs less on disk than in primary storage


- A range of cylinders have the same number of sectors per arc.

- A common sector size is 512 bytes

- The division of a track into equal-sized disk blocks (or pages) is set by the OS during formatting

- The block size is fixed at formatting time and can’t be changed dynamically

- Block sizes range from 512 to 8192 bytes

- Blocks are separated by fixed-size interblock gaps

- Storage capacity and transfer rates are improving all the time, while cost keeps going down (about $100/TB)

Disk

- Random access addressable device

- Transfer between disk and main memory is in units of blocks

- The hardware address of a block consists of (cylinder#, track#, block#)

- Modern disks use a single number called the LBA (logical block address)

- The LBAs 0 to n-1 are mapped by the disk controller to the right blocks on the disk

- A block read from disk is copied into a buffer, which is a contiguous reserved area in main memory

- One block, or a cluster of several contiguous blocks, is transferred at a time

- The disk controller controls the disk drive

- A standard interface from a computer to a disk is SCSI (small computer system interface)

- HDDs, CDs, and DVDs are commonly connected to a computer through SATA (Serial AT Attachment, named after the 16-bit IBM AT bus), at 1.5 to 6 Gbps

- NL-SAS (nearline SAS) drives are SATA-class disks with a SAS (Serial Attached SCSI) interface

- The controller accepts high-level I/O commands and takes appropriate action to position the arm and cause the read/write to take place

- Seek time: 5-10 msec

- Rotational latency: about 4 msec

- Block transfer time


- Transferring several consecutive blocks on the same track or cylinder is more efficient: it avoids seek time and rotational latency for all blocks except the first one (locating and transferring the first block takes 9-60 msec; each subsequent block takes 0.4 to 2 msec)

- Locating data on a disk is a major bottleneck, so efficient techniques are needed to do this
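
The timing figures above can be turned into a rough estimate. The following is a minimal sketch, not a model of any real drive: the function names are made up for illustration, the average rotational latency is assumed to be half a revolution, and the per-block costs are parameters taken from the ranges quoted in these notes.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency, assumed to be half of one revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def read_time_ms(n_blocks, first_block_ms, next_block_ms, contiguous=True):
    """Estimated time to read n_blocks blocks.

    first_block_ms: cost to locate and transfer one block (seek + latency + transfer).
    next_block_ms:  cost of each following block when the blocks are contiguous.
    """
    if n_blocks <= 0:
        return 0.0
    if contiguous:
        # Only the first block pays for seek time and rotational latency.
        return first_block_ms + (n_blocks - 1) * next_block_ms
    # Scattered blocks: every block pays the full locate-and-transfer cost.
    return n_blocks * first_block_ms
```

With the low-end figures (9 msec for the first block, 0.4 msec for each following one), 100 contiguous blocks take about 48.6 msec, while 100 scattered blocks take about 900 msec. At 7200 rpm the assumed average rotational latency is a little over 4 msec.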

Making Data Access More Efficient on Disk

(1) Buffering

a. Compensates for the mismatch between CPU and disk speeds

b. The application processes current data while I/O fetches new data into the buffer

(2) Organization

a. Use contiguous cylinders and tracks

b. Avoid arm movement and seek time

(3) Prefetch

a. Read data ahead of the request

b. Read consecutive blocks on tracks or cylinders even though they are not yet needed

c. May not be efficient for randomly accessed data

(4) Scheduling

a. Proper scheduling of I/O requests

b. Efficient scheduling algorithms (e.g., the elevator algorithm)

(5) Use Log Disks

a. Log disks hold data temporarily

b. A single disk is dedicated to logging writes

c. All blocks go to that disk sequentially, avoiding seek time

d. Placing the data and log files on the same disk is cheaper but compromises performance

e. Not feasible for most applications

(6) Use Flash Memory

a. Use SSDs or flash memory instead of hard disks

b. Do writes and updates to battery-backed DRAM

c. Save them to hard disk later
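
The elevator idea mentioned under scheduling can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, requests are plain cylinder numbers, and the arm is assumed to reverse at the last pending request rather than at the edge of the disk (the variant often called LOOK).

```python
def elevator_order(requests, head, direction="up"):
    """Order in which pending cylinder requests are served by one sweep and return."""
    lower = sorted(c for c in requests if c < head)
    upper = sorted(c for c in requests if c >= head)
    if direction == "up":
        # Serve everything above the head in increasing order, then sweep back down.
        return upper + lower[::-1]
    # Serve everything below the head in decreasing order, then sweep back up.
    return lower[::-1] + upper


def total_arm_movement(order, head):
    """Total number of cylinders the arm crosses when serving requests in this order."""
    moved = 0
    for cylinder in order:
        moved += abs(cylinder - head)
        head = cylinder
    return moved
```

For the request queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at cylinder 53, serving requests in arrival order moves the arm across 640 cylinders, while the sweep order produced by `elevator_order` moves it across 299.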


Solid State Device Storage (SSD)

Uses flash memory as intermediate storage between main memory and magnetic disk; enterprise-grade devices are called enterprise flash drives (EFDs)

Magnetic Tape Storage Devices

- Sequential access devices: to access the nth block on tape, the preceding n-1 blocks must be scanned first

- Read/write head is used to access tapes

- Used for backup and recovery

Buffering on Blocks

When several blocks need to be transferred to memory and all the block addresses are known, several buffers can be reserved in memory to speed up the transfer.

While one buffer is being read or written by I/O, the CPU can process data in another buffer.


- Processes A and B run concurrently in an interleaved fashion; processes C and D run in parallel.

- Fig. 16.4 shows the use of two buffers, A and B: while one buffer is being filled by I/O, the data in the other is processed (double buffering)

- Double buffering permits continuous reading or writing of data on consecutive disk blocks, which avoids the seek time and rotational latency for all but the first block.
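
A back-of-the-envelope timing model shows why double buffering helps. This sketch assumes a separate I/O processor so that I/O and CPU work can fully overlap, and constant per-block times; both function names are invented for illustration.

```python
def single_buffer_time(n_blocks, io_ms, cpu_ms):
    """One buffer: each block is read, then processed, strictly in turn."""
    return n_blocks * (io_ms + cpu_ms)


def double_buffer_time(n_blocks, io_ms, cpu_ms):
    """Two buffers: the CPU processes block k while block k+1 is being read."""
    if n_blocks <= 0:
        return 0
    # Read the first block, overlap the middle steps, then process the last block.
    return io_ms + (n_blocks - 1) * max(io_ms, cpu_ms) + cpu_ms
```

With 10 blocks, 4 msec of I/O and 3 msec of processing per block, the single-buffer estimate is 70 msec and the double-buffer estimate is 43 msec.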


Buffer Management

- It is impossible to bring all data into memory at the same time

- A buffer is a part of main memory that is available to receive blocks or pages of data from disk

- The buffer manager is a software component of the DBMS that manages buffers. It decides which pages to bring in and which buffer to use


- The size of the shared buffer pool is a DBMS parameter controlled by DBAs

Two kinds of buffer management:

1. The buffer manager controls main memory directly (as in most RDBMSs)

2. The buffer manager allocates buffers in virtual memory, leaving control to the OS (as in most OODBMSs)

Goals:

1. Maximize the probability that a requested page is found in main memory

2. Use an efficient page replacement algorithm

Information kept for each buffer:

1. A pin-count (the number of requests for the page, or the number of its current users); if the count is 0, the page is unpinned; a pinned page must not be written to disk

2. A dirty bit, which is set when a page is updated by any application program

The buffer manager also does the following:

a. makes sure the number of buffers fits in main memory

b. if the requested amount exceeds the buffer pool, uses page replacement

c. if the buffer space is in virtual memory, OS thrashing may happen

d. if the requested page is already in the buffer pool, increments its pin count

e. if the page is not in the buffer pool:

i. chooses a page for replacement

ii. if the dirty bit is on for the replacement page, writes that page to disk first; if it is off, the copy on disk is current and the page can simply be overwritten

iii. reads the requested page into the freed slot and passes the buffer address to the application
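
The bookkeeping with a pin count and a dirty bit can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular DBMS implements it: the class and method names are hypothetical, a Python dict stands in for the disk, and the victim is chosen as the least recently used unpinned page.

```python
from collections import OrderedDict


class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk             # dict: page id -> page contents (stand-in for the disk)
        self.frames = OrderedDict()  # page id -> frame, least recently used first

    def fetch(self, page_id):
        """Return the frame holding page_id, pinning it for the caller."""
        if page_id in self.frames:
            frame = self.frames[page_id]
            self.frames.move_to_end(page_id)  # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            frame = {"data": self.disk[page_id], "pin": 0, "dirty": False}
            self.frames[page_id] = frame
        frame["pin"] += 1
        return frame

    def release(self, page_id, modified=False):
        """Unpin the page; set the dirty bit if the caller changed it."""
        frame = self.frames[page_id]
        frame["pin"] -= 1
        frame["dirty"] = frame["dirty"] or modified

    def _evict(self):
        """Remove the least recently used unpinned page, writing it back if dirty."""
        victim = next((pid for pid, f in self.frames.items() if f["pin"] == 0), None)
        if victim is None:
            raise RuntimeError("all buffers are pinned")
        frame = self.frames.pop(victim)
        if frame["dirty"]:
            self.disk[victim] = frame["data"]
```

A caller would `fetch` a page, read or change `frame["data"]`, and then `release` it with `modified=True` if it was updated; the write to disk happens only when the page is later chosen for replacement.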

Buffer Replacement Strategies

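1. LRU (least recently used): maintain a time stamp of the last use of each page; the page that has gone unused for the longest time is replaced

2. Clock policy: a round-robin variant of LRU; each buffer slot has a flag of 0 or 1; if the flag is 0, the slot is used for replacement; if it is 1, the flag is reset to 0 and the next slot is examined; if the dirty bit of the chosen page is set, the page is written to disk first

3. FIFO

a. Notes the time each page was loaded into memory; the oldest page is replaced

b. Simple approach

c. It may evict a heavily used block and then have to bring the same block back

LRU and Clock are the policies commonly used for DB applications, although neither handles sequential scans of files larger than the buffer pool well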
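
The clock policy can be sketched with a list of flags and a moving hand. This is a simplified illustration: the class name and methods are hypothetical, and pinned pages and dirty-page write-back are left out to keep the flag logic visible.

```python
class ClockReplacer:
    def __init__(self, n_slots):
        self.flags = [0] * n_slots  # one flag (0 or 1) per buffer slot
        self.hand = 0               # next slot the clock hand will examine

    def touch(self, slot):
        """Record that the page in this slot was just used."""
        self.flags[slot] = 1

    def pick_victim(self):
        """Advance the hand until a slot with flag 0 is found and return it."""
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.flags)
            if self.flags[slot] == 0:
                return slot
            # Recently used: give it a second chance by resetting the flag.
            self.flags[slot] = 0
```

The loop always terminates, because after one full sweep every flag has been reset to 0.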


Placing File Records on Disk

Sets of records are organized into files.

Records and Record Types:

- Data is stored in the form of records

- Each record consists of a collection of related data values or items, where each value corresponds to a field

A record type (record format) is a collection of field names and their data types

A record usually describes an entity

A data type is associated with each field

Standard data types: integer, long, float, char, ...

Other data types: date, time, ...

struct employee {
    char name[30];
    char ssn[9];
    int salary;
    int job_code;
    char department[20];
};

Databases also have to store unstructured data such as digital images and videos as binary large objects (BLOBs); a pointer to the BLOB is included in the record.
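
To see how such a record type maps to a fixed number of bytes, here is a small sketch using Python's standard `struct` module. The format string mirrors the employee fields above; the little-endian, unpadded layout (`<`) and the helper names are choices made for this illustration, not something the notes prescribe.

```python
import struct

# name[30], ssn[9], salary (int), job_code (int), department[20]; "<" means no padding.
EMPLOYEE_FORMAT = "<30s9sii20s"
RECORD_SIZE = struct.calcsize(EMPLOYEE_FORMAT)  # 30 + 9 + 4 + 4 + 20 bytes


def pack_employee(name, ssn, salary, job_code, department):
    """Encode one employee as a fixed-length record (short strings are zero-padded)."""
    return struct.pack(
        EMPLOYEE_FORMAT,
        name.encode("ascii"),
        ssn.encode("ascii"),
        salary,
        job_code,
        department.encode("ascii"),
    )


def unpack_employee(raw):
    """Decode a fixed-length record back into Python values."""
    name, ssn, salary, job_code, department = struct.unpack(EMPLOYEE_FORMAT, raw)
    return (
        name.rstrip(b"\x00").decode("ascii"),
        ssn.rstrip(b"\x00").decode("ascii"),
        salary,
        job_code,
        department.rstrip(b"\x00").decode("ascii"),
    )
```

Every packed record has the same length, which is what makes it possible to compute a record's position in a file directly from its number.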


Files, Fixed and Variable Lengths

- A file usually holds records of the same record type

- If every record has the same size, the file has fixed-length records

- If different records have different lengths, the file has variable-length records

o Variable-length fields (e.g., name)

o Repeating fields, or repeating groups of fields

o Different types of records in the same file

o Separator characters are used to terminate variable-length fields

o If a record type has many fields but each record has values for only a few of them, a <field name, field value> or <field type, field value> format is used

- Repeating fields need one character to separate the repeated values, one character to separate fields, and one character to terminate the record (= , ||, #)

- These characters are part of the file system but hidden from the programmer (e.g., 0x0d and 0x0a)
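
A variable-length record built from <field name, field value> pairs and separator characters might look like the sketch below. The three separator characters are arbitrary choices for this example, the function names are hypothetical, and field values are assumed not to contain any of the separators.

```python
NAME_SEP = "="    # separates a field name from its value (illustrative choice)
FIELD_SEP = "|"   # separates one field from the next (illustrative choice)
RECORD_END = "#"  # terminates a record (illustrative choice)


def encode_record(fields):
    """Encode a dict of field name -> value, storing only the fields that are present."""
    pairs = [f"{name}{NAME_SEP}{value}" for name, value in fields.items()]
    return FIELD_SEP.join(pairs) + RECORD_END


def decode_records(data):
    """Split a string of concatenated records back into dicts."""
    records = []
    for chunk in data.split(RECORD_END):
        if not chunk:
            continue  # skip the empty piece after the last terminator
        pairs = (pair.split(NAME_SEP, 1) for pair in chunk.split(FIELD_SEP))
        records.append({name: value for name, value in pairs})
    return records
```

Records with different sets of fields can then sit next to each other in the same file, at the cost of scanning for separators instead of jumping to a fixed offset.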

Record Blocking

Records are stored in blocks (sectors)

Block size B

Record size R

Unit of transfer from disk to memory is a block

If B ≥ R, bfr (blocking factor) = ⌊B/R⌋ records per block (integer division)

If R does not divide B evenly, the unused space in each block is:

B – (bfr * R) bytes

To use this space, a record may be split across two blocks (a spanned record); spanning is required when R > B. With unspanned records, the number of blocks needed for a file of r records is:

b = ⌈r/bfr⌉ blocks (rounded up to the next integer value)
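
The record-blocking formulas are easy to check numerically. The sketch below assumes fixed-length, unspanned records; the function names are made up for illustration, while B, R, r, and bfr follow the notation of these notes.

```python
def blocking_factor(B, R):
    """bfr = floor(B / R): whole records that fit in one block."""
    if R > B:
        raise ValueError("record larger than block: a spanned organization is needed")
    return B // R


def unused_bytes_per_block(B, R):
    """Space left over in each block: B - bfr * R."""
    return B - blocking_factor(B, R) * R


def blocks_needed(r, B, R):
    """b = ceil(r / bfr), computed with integer arithmetic."""
    bfr = blocking_factor(B, R)
    return -(-r // bfr)
```

For example, with B = 512 bytes and R = 100 bytes, bfr is 5, each block wastes 12 bytes, and a file of 1000 records needs 200 blocks.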

Allocation of Files on Disk

- Contiguous

- Linked

- Indexed

- (clusters and extents)

File Headers

- Contains information about the file (disk addresses of the file blocks, record format descriptions)

- To search for a record, blocks are copied into memory buffers and searched one block at a time


Contiguous Allocation


Linked Allocation


Indexed Allocation


Operations on Files

- Retrieval

- Updates

Simple or compound selection conditions are used:

Ssn = ‘12345678’

Department = ‘Research’

Salary > 30000

Complex conditions must be decomposed into simple conditions to locate records on the disk.

High-level programs such as DBMS software use file operations such as:

- Open

- Reset

- Find (or Locate)

- Read (or Get)

- Find Next

- Delete

- Modify

- Insert

- Close

- Scan (returns first or next record)

- FindAll

- FindChar

(Record at a time operations except reset and close)


Files of Unordered Records (Heap)

- Records are placed in the file in the order in which they are inserted; new records are placed at the end. This arrangement is called a HEAP.

- To insert, the last disk block is copied into a buffer, the record is added to the buffer, and the buffer is written back to disk; the address of the last file block is kept in the file header

- Inserting a record is very efficient (new records go at the end)

- Searching requires a linear search (b/2 blocks on average for a file of b blocks when the record is present, all b blocks when it is not)

- To delete a record, find the block that holds the record, copy it into a buffer, delete the record, and write the buffer back to disk

- This leaves unused (wasted) space in the disk block

- Another method keeps a deletion marker (a bit or byte) in each record; searches consider only valid records

- Both methods require periodic reorganization of the file to reclaim the space of deleted records

- The space of deleted records can be reused for new records, but this needs more bookkeeping

- Sorting can also be used for deleted records, but it is expensive


Direct or Relative File

- fixed-length records

- unspanned blocks

- contiguous allocation

File records: 0, 1, 2, ..., r-1

Records in each block: 0, 1, 2, ..., bfr – 1

The ith record of the file is located in block ⌊i/bfr⌋ and is record (i mod bfr) within that block

Example: number of records = 550, bfr = 20, i = 221

Record i is located in block ⌊i/bfr⌋ = ⌊221/20⌋ = 11 (the 12th block, since numbering starts at 0)

and is record (i mod bfr) = (221 mod 20) = 1 within that block (the 2nd record, since numbering starts at 0)
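
The same address calculation in code, as a small sketch. Block and record numbers are counted from 0, as in the notes; the function names and the byte-offset helper (which assumes block size B and record size R in bytes) are additions for illustration.

```python
def locate_record(i, bfr):
    """Return (block number, record number within the block) for the ith record."""
    return divmod(i, bfr)


def byte_offset(i, bfr, B, R):
    """Byte position of record i in a contiguous file of unspanned, fixed-length records."""
    block, slot = locate_record(i, bfr)
    return block * B + slot * R
```

`locate_record(221, 20)` gives block 11 and record 1, matching the worked example; with B = 512 and R = 25 (so that bfr = 20), that record starts at byte 11 * 512 + 1 * 25 = 5657.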

Files of Ordered Records

- Order the records based on the value of one of their fields (the ordering field)

- This leads to an ordered or sequential file

- If the ordering field is also a key field of the file (a unique value in each record), it is called the ordering key (e.g., name of the employee as a key)


Advantages of ordered files:

1. No sorting is required to read the records in order of the ordering key

2. The search condition can be a key value or a range of key values

3. Finding the next record from the current record is easy

4. Binary search can be used to speed up the search (about log2 b block accesses for a file of b blocks)

Disadvantages and workarounds:

5. Inserting and deleting are expensive because the records must stay in order

6. Keeping some unused space in each block (for inserting) helps, but the same problem returns once that space is used up

7. Another approach is to use a master file and an overflow file; the overflow file can be sorted and merged with the master file during file reorganization
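
Binary search over the blocks of an ordered file can be sketched as follows. Here the file is modeled as a Python list of blocks, each block a sorted list of keys; the function name and the access counter are illustrative additions, and every block is assumed to be non-empty.

```python
def find_block(blocks, key):
    """Binary search on block boundaries.

    Returns (block number or None, number of blocks read).
    """
    low, high = 0, len(blocks) - 1
    accesses = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]  # stands in for reading block mid from disk
        accesses += 1
        if key < block[0]:
            high = mid - 1
        elif key > block[-1]:
            low = mid + 1
        else:
            # The key can only be in this block; check whether it is really there.
            return (mid if key in block else None), accesses
    return None, accesses
```

On a file of 16 blocks, a lookup reads at most 5 blocks, compared with 8 on average for a linear search of the same file.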


Hashing Techniques

The search condition must be an equality condition on a single field, called the hash field. In most cases, the hash field is also a key field.

Internal Hashing:

Hashing is used as an internal data structure within a program

Hash table with slots 0 to (m-1)

We have m slots, whose addresses correspond to the array indexes

A hash function transforms the hash field value into an integer between 0 and m-1

One common function: h(k) = k mod m

Other functions, called “folding”, apply an arithmetic function such as addition or a logical function such as xor to portions of the hash field value

A collision occurs when a new record hashes to an address that is already occupied; it has to be resolved:

- Open addressing (checks subsequent addresses until an unused one is found)

- Chaining

- Multiple hashing
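
As an illustration of internal hashing with h(k) = k mod m and open addressing, here is a minimal sketch. The class name is hypothetical, keys are assumed to be non-negative integers, collisions are resolved by checking the following slots one by one, and deletion is not handled.

```python
class OpenAddressingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # each slot holds a (key, value) pair or None

    def _h(self, key):
        return key % self.m      # h(k) = k mod m

    def insert(self, key, value):
        """Store the pair in the first free slot at or after h(key); return that slot."""
        for step in range(self.m):
            idx = (self._h(key) + step) % self.m
            if self.slots[idx] is None or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return idx
        raise RuntimeError("hash table is full")

    def lookup(self, key):
        """Follow the same probe sequence until the key or an empty slot is found."""
        for step in range(self.m):
            idx = (self._h(key) + step) % self.m
            if self.slots[idx] is None:
                return None
            if self.slots[idx][0] == key:
                return self.slots[idx][1]
        return None
```

With m = 7, keys 10 and 17 both hash to slot 3; the second one inserted ends up in slot 4, and a lookup for it probes slot 3 first and then finds it in slot 4.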

External hashing for disk files

Hashing for disk files is called external hashing; it is adapted to suit the characteristics of disk storage.


- The address space is made of buckets

- Each bucket holds multiple records

- A bucket can be one disk block or a cluster of contiguous disk blocks

- The hashing function maps a key into a relative bucket number rather than assigning an absolute block address to the bucket

- A table maintained in the file header converts the bucket number to the corresponding disk block address (Fig. 16.9)

- A scheme with a fixed number of buckets is called static hashing
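
The two-step lookup (key to relative bucket number, bucket number to disk block address) can be sketched as below. Everything here is a toy stand-in: the class name, the made-up block addresses starting at 100, and the dict that plays the role of the disk are assumptions for illustration, and bucket overflow is not handled.

```python
class StaticHashFile:
    def __init__(self, n_buckets, first_block=100):
        self.n_buckets = n_buckets  # fixed number of buckets: static hashing
        # Header table: relative bucket number -> disk block address (made-up addresses).
        self.header = [first_block + i for i in range(n_buckets)]
        # Simulated disk: block address -> list of (key, record) pairs in that bucket.
        self.disk = {addr: [] for addr in self.header}

    def bucket_number(self, key):
        return key % self.n_buckets

    def insert(self, key, record):
        """Place the record in its bucket and return the block address used."""
        addr = self.header[self.bucket_number(key)]
        self.disk[addr].append((key, record))
        return addr

    def search(self, key):
        """Read one bucket and scan it for the key."""
        addr = self.header[self.bucket_number(key)]
        for k, record in self.disk[addr]:
            if k == key:
                return record
        return None
```

Because the hash function only produces relative bucket numbers, the buckets can be moved on disk by updating the header table, without changing the hash function.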

SKIP 16.8.3


Files of Mixed Records

So far we have assumed that all records of a particular file are of the same record type. In most databases, numerous types of records are related to each other, so there is a need for files of mixed records.

- numerous types of entities are related in various ways

- related records need to be clustered in the same block or blocks

- OODBs also need clustering of objects

- there are other types of data structures, such as B-trees, to store DB data for efficient access

Parallelizing Disk Access Using RAID Technology

- Redundant array of independent disks

- Provides reliability and high performance

- A large array of independent disks acts as a single logical disk

Data Striping: distribute data across many disks so that they appear as a single logical disk

Bit-level Striping: the bits of each byte are split among the disks

Block-level Striping: the blocks of a file are distributed across the disks
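
Block-level striping is commonly done round-robin, which the following sketch assumes: logical block i goes to disk (i mod n) at position ⌊i/n⌋ on that disk. The function names are illustrative.

```python
def locate_block(i, n_disks):
    """Return (disk number, block position on that disk) for logical block i."""
    return i % n_disks, i // n_disks


def stripe_layout(n_blocks, n_disks):
    """List, for each disk, the logical blocks it stores, in order."""
    layout = [[] for _ in range(n_disks)]
    for i in range(n_blocks):
        disk, _ = locate_block(i, n_disks)
        layout[disk].append(i)
    return layout
```

With 4 disks, logical blocks 0 to 3 land on four different disks and can be read in parallel; block 10 is the third block (position 2) on disk 2.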


Reliability with RAID

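- For n disks, the likelihood of a failure is n times that of a single disk

- With an MTBF of 200,000 hours per disk, an array of 100 disks has an MTBF of 200,000/100 = 2,000 hours, or about 83 days

- To improve reliability, mirroring (shadowing) and other RAID techniques are used

- When redundancy is used with an MTTR (mean time to repair) of 24 hours, the mean time to data loss becomes (200,000)^2 / (2 * 24) = 8.33 × 10^8 hours, roughly 95,000 years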
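
The reliability arithmetic can be reproduced directly. The sketch below uses the two formulas from these notes (MTBF divided by the number of disks, and MTBF squared over twice the repair time for a mirrored configuration); the function names and the 365-day year are choices made for this illustration.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def array_mtbf_hours(disk_mtbf_hours, n_disks):
    """MTBF of an array of n independent disks without redundancy."""
    return disk_mtbf_hours / n_disks


def mirrored_time_to_data_loss_hours(disk_mtbf_hours, repair_hours):
    """Mean time to data loss with mirroring: MTBF^2 / (2 * repair time)."""
    return disk_mtbf_hours ** 2 / (2 * repair_hours)
```

For 100 disks with an MTBF of 200,000 hours each, the array MTBF is 2,000 hours (about 83 days). With mirroring and a 24-hour repair time, the formula evaluates to about 8.33 × 10^8 hours, which is on the order of 95,000 years.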

Modern Storage Architecture

(1) Storage Area Networks (SAN)

(2) Network Attached Storage (NAS)

(3) iSCSI and other network-based protocols (with iSCSI, SCSI commands are encapsulated in IP packets)

(4) Automated Storage Tiering (moves data between different storage types such as SATA, SAS, ...)

(5) Object-based storage (objects are used instead of files; Facebook and other big data applications use object storage). It also uses SCSI commands to transmit objects on the Internet (Microsoft Azure, OpenStack Swift protocol)


Storage Area Networks


Network Attached Storage
