Database Management Systems, Shri Prasad Sawant

Storing Data: Disks and Files

Unit 1
Mr. Prasad Sawant


Introduction

A DBMS has to store data somewhere. Choices:

Main memory
• Expensive: compared to secondary and tertiary storage
• Fast: in-memory operations are fast
• Volatile: not possible to save data from one run to the next
• Used for storing current data

Secondary storage (hard disk)
• Less expensive: compared to main memory
• Slower: compared to main memory, but faster than tapes
• Persistent: data from one run can be saved to the disk to be used in the next run
• Used for storing the database

Tertiary storage (tapes)
• Cheapest
• Slowest: sequential data access
• Used for data archives


Why Not Store Everything in Main Memory?

• Costs too much.
• Main memory is volatile. We want data to be saved between runs. (Obviously!)

Typical storage hierarchy:
• Main memory (RAM) for currently used data.
• Disk for the main database (secondary storage).
• Tapes for archiving older versions of the data (tertiary storage).


Disks

• Secondary storage device of choice.
• Main advantage over tapes: random access vs. sequential access.
• Data is stored and retrieved in units called disk blocks or pages.


Components of a Disk

[Figure: disk assembly, with labels: platters, spindle, disk head, arm assembly, arm movement, tracks, sector]

• The platters spin (say, 90 rps).
• The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
• Only one head reads/writes at any one time.
• Block size is a multiple of sector size (which is fixed).
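
The two numeric points on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the 512-byte sector size is an assumed value (the slides do not give one), `sectors_per_block` is a hypothetical helper name, and taking the average rotational delay as half a rotation is the usual textbook approximation.

```python
# Rotation speed quoted on the slide.
ROTATIONS_PER_SECOND = 90

# Time for one full rotation, in milliseconds.
rotation_ms = 1000 / ROTATIONS_PER_SECOND

# On average the desired block is half a rotation away from the head.
avg_rotational_delay_ms = rotation_ms / 2

# Assumed sector size; not stated in the slides.
SECTOR_BYTES = 512


def sectors_per_block(block_bytes, sector_bytes=SECTOR_BYTES):
    """Return how many sectors make up one block.

    Raises ValueError if the block size is not a multiple of the sector size.
    """
    if block_bytes % sector_bytes != 0:
        raise ValueError("block size must be a multiple of sector size")
    return block_bytes // sector_bytes


print(f"one rotation: {rotation_ms:.1f} ms")
print(f"sectors in a 4 KB block: {sectors_per_block(4096)}")
```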




Accessing a Disk Page

Time to access (read/write) a disk block:
• seek time (moving arms to position disk head on track)
• rotational delay (waiting for block to rotate under head)
• transfer time (actually moving data to/from disk surface)

Seek time and rotational delay dominate:
• Seek time varies from about 1 to 20 msec
• Rotational delay varies from 0 to 10 msec
• Transfer time is about 1 msec per 4 KB page
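
To see why seek time and rotational delay dominate, the three components can be added up for random and sequential reads. This is a rough sketch, not a disk model: the function name `access_time_ms` is made up for illustration, and using the midpoints of the quoted ranges as "average" values is an assumption.

```python
def access_time_ms(seek_ms, rotational_delay_ms, pages, transfer_ms_per_page=1.0):
    """Time to read `pages` consecutive pages after one seek and one rotational wait."""
    return seek_ms + rotational_delay_ms + pages * transfer_ms_per_page


# Midpoints of the ranges quoted above (an assumption, not measured averages).
AVG_SEEK_MS = (1 + 20) / 2
AVG_ROTATION_MS = (0 + 10) / 2

# One random page: seek + rotational delay + one page transfer.
random_read = access_time_ms(AVG_SEEK_MS, AVG_ROTATION_MS, pages=1)

# 100 pages scattered over the disk pay the seek and rotation cost every time.
hundred_random = 100 * random_read

# 100 consecutive pages pay the seek and rotation cost only once.
hundred_sequential = access_time_ms(AVG_SEEK_MS, AVG_ROTATION_MS, pages=100)

print(random_read, hundred_random, hundred_sequential)
```

Under these assumed averages, only 1 msec of each random page read is actual data transfer, which is why laying related pages out consecutively matters so much.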


File Organization

Three types of file organization:
• Unordered or heap files
• Ordered or sequential files
• Hash files


Unordered Or Heap File

Records are stored in the same order in which they are created.

Insert operation
• Fast: the incoming record is written at the end of the last page of the file

Search (or update) operation
• Slow: a linear search is performed on pages

Delete operation
• Slow: the record to be deleted must first be searched for
• Deleting the record creates a hole in the page
• Periodic file compaction is required to reclaim the wasted space
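
The behaviour of the three operations can be sketched with a toy in-memory heap file. Everything here is an assumption made for illustration: the class name `HeapFile`, records as tuples whose first field is the search key, holes marked with `None`, and a tiny page capacity. A real DBMS works on disk pages, not Python lists.

```python
class HeapFile:
    """Toy in-memory heap file: a list of pages, each a list of record slots."""

    def __init__(self, page_capacity=4):
        self.page_capacity = page_capacity
        self.pages = [[]]

    def insert(self, record):
        # Fast: always append to the last page, starting a new page when it is full.
        if len(self.pages[-1]) >= self.page_capacity:
            self.pages.append([])
        self.pages[-1].append(record)

    def search(self, key):
        # Slow: linear scan over every page; returns (page_no, slot_no) or None.
        for page_no, page in enumerate(self.pages):
            for slot_no, record in enumerate(page):
                if record is not None and record[0] == key:
                    return page_no, slot_no
        return None

    def delete(self, key):
        # The record must be found first; the freed slot becomes a hole (None).
        position = self.search(key)
        if position is None:
            return False
        page_no, slot_no = position
        self.pages[page_no][slot_no] = None
        return True

    def compact(self):
        # Rewrite the file without holes to reclaim the wasted space.
        live = [r for page in self.pages for r in page if r is not None]
        self.pages = [[]]
        for record in live:
            self.insert(record)


heap = HeapFile(page_capacity=2)
for rec in [(1, "a"), (2, "b"), (3, "c")]:
    heap.insert(rec)
heap.delete(2)   # leaves a hole in page 0
heap.compact()   # the remaining records now fit in a single page
```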


Ordered or Sequential File

Records are sorted on the values of one or more fields.
• Ordering field: the field on which the records are sorted
• Ordering key: the key of the file when it is used for record sorting

Search (or update) operation
• Fast: binary search is performed on sorted records
• Update the ordering field?

Delete operation
• Fast: searching for the record is fast
• Periodic file compaction is, of course, required

Insert operation
• Poor: if we insert the new record in the correct position, we need to shift all the subsequent records in the file
• Alternatively, an ‘overflow file’ is created which contains all the new records as a heap
• Periodically, the overflow file is merged with the main file
• If an overflow file is created, search and delete operations for records in the overflow file have to be linear!
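
The overflow-file idea can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: the class name `OrderedFile` is hypothetical, records are tuples whose first field is a unique ordering key, and the main file is a sorted Python list searched with the standard `bisect` module.

```python
import bisect


class OrderedFile:
    """Toy ordered file: a sorted main file plus an unsorted overflow file."""

    def __init__(self, records=()):
        # Main file, kept sorted on the ordering key (first field of each record).
        self.main = sorted(records, key=lambda r: r[0])
        # Overflow file: new records are appended here as a heap.
        self.overflow = []

    def insert(self, record):
        # Cheap: no shifting of the main file; the record goes to the overflow file.
        self.overflow.append(record)

    def search(self, key):
        # Binary search on the sorted main file. The probe (key,) sorts just
        # before any record (key, ...), so bisect_left lands on the match.
        i = bisect.bisect_left(self.main, (key,))
        if i < len(self.main) and self.main[i][0] == key:
            return self.main[i]
        # Records in the overflow file can only be found by a linear scan.
        for record in self.overflow:
            if record[0] == key:
                return record
        return None

    def merge(self):
        # Periodic reorganisation: fold the overflow file into the main file.
        self.main = sorted(self.main + self.overflow, key=lambda r: r[0])
        self.overflow = []


f = OrderedFile([(30, "c"), (10, "a"), (20, "b")])
f.insert((15, "x"))    # lands in the overflow file
print(f.search(15))    # found by the linear scan of the overflow file
f.merge()              # after merging, the same search uses binary search
```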


Files of Records

Pages or blocks are fine when doing I/O, but higher levels of the DBMS operate on records, and files of records.

FILE: A collection of pages, each containing a collection of records. Must support:
• insert/delete/modify a record
• read a particular record (specified using a record id)
• scan all records (possibly with some conditions on the records to be retrieved)
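
The required operations can be written down as a small interface. The sketch below is one possible shape, not a prescribed design: the names `RID` and `RecordFile` are hypothetical, a record id is assumed to be a (page number, slot number) pair, and pages are plain Python lists.

```python
from typing import Any, Callable, Iterator, NamedTuple


class RID(NamedTuple):
    """Record id: assumed here to be a page number plus a slot within that page."""
    page_no: int
    slot_no: int


class RecordFile:
    """Toy file of records: a collection of pages, each a collection of records."""

    def __init__(self, page_capacity: int = 4):
        self.page_capacity = page_capacity
        self.pages: list = [[]]

    def insert(self, record: Any) -> RID:
        if len(self.pages[-1]) >= self.page_capacity:
            self.pages.append([])
        self.pages[-1].append(record)
        return RID(len(self.pages) - 1, len(self.pages[-1]) - 1)

    def read(self, rid: RID) -> Any:
        # Direct access: the record id says exactly which page and slot to fetch.
        return self.pages[rid.page_no][rid.slot_no]

    def modify(self, rid: RID, record: Any) -> None:
        self.pages[rid.page_no][rid.slot_no] = record

    def delete(self, rid: RID) -> None:
        # Mark the slot as empty so other record ids stay valid.
        self.pages[rid.page_no][rid.slot_no] = None

    def scan(self, predicate: Callable[[Any], bool] = lambda r: True) -> Iterator[Any]:
        # Visit every live record, optionally filtered by a condition.
        for page in self.pages:
            for record in page:
                if record is not None and predicate(record):
                    yield record


students = RecordFile(page_capacity=2)
rid = students.insert({"name": "Asha", "marks": 72})
students.insert({"name": "Ravi", "marks": 55})
students.modify(rid, {"name": "Asha", "marks": 80})
passed = list(students.scan(lambda r: r["marks"] >= 60))
```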


System Catalogs

For each index:
• structure (e.g., B+ tree) and search key fields

For each relation:
• name, file name, file structure (e.g., heap file)
• attribute name and type, for each attribute
• index name, for each index
• integrity constraints

For each view:
• view name and definition

Plus statistics, authorization, buffer pool size, etc.

Catalogs are themselves stored as relations!
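
SQLite is one system where the last point is easy to observe. It is used here only as a convenient illustration and is not mentioned in the slides. Its schema catalog is an ordinary table named `sqlite_master` that can be queried with SQL, and `PRAGMA table_info` lists attribute names and types. The table, index, and view names below are made up for the example.

```python
import sqlite3

# In-memory database with one relation, one index, and one view (example names).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL);
    CREATE INDEX emp_name_idx ON emp (name);
    CREATE VIEW well_paid AS SELECT name FROM emp WHERE salary > 1000;
    """
)

# The catalog is itself a relation: query it like any other table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Attribute name and type for each attribute of the relation.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
attributes = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(emp)")]

for entry in catalog:
    print(entry)
print(attributes)
```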