CPSC 231 D.H. 1

Learning Objectives

• Understanding of disk versus RAM performance gap.

• Understanding definition, design goals and design problems of file structure.

• Understanding of file structure research history.

• Understanding and naming key terms used in file structure.

CPSC 231 D.H. 2

Secondary Storage in Computer Systems

• Data can be stored on:

– hard disks

– floppy disks

– tapes

– CD-ROMs

– Zip and Jaz disks

– network servers

• Most data is stored on hard disks.

CPSC 231 D.H. 3

Disks

• Disks provide enormous capacity to store information.

• Disks are orders of magnitude slower than main memory (a single disk access can take a quarter of a million times longer than a single RAM access).

• DISK = LARGE and SLOW and CHEAP

• RAM = SMALL and FAST

CPSC 231 D.H. 4

RAM versus Disk Performance Gap

• Example:

– 120 nanoseconds to access RAM (Main Memory)

– 30 milliseconds to access disk

• Analogy:

– 20 seconds versus 58 days

• CONCLUSION:

– Application programs have to spend a lot of time waiting for data to be read from the disk or to be written to the disk.

CPSC 231 D.H. 5

Questions

• What is a millisecond, microsecond and nanosecond?

– Millisecond = 1/1,000 s

– Microsecond = 1/1,000,000 s

– Nanosecond = 1/1,000,000,000 s

• How many times is RAM access faster than disk access?

• Assume:

– 120 nanoseconds to access RAM (Main Memory)

– 30 milliseconds to access disk
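The question above can be worked out directly from the two assumed access times. The sketch below is only an illustration; the names kRamAccessNs, kDiskAccessNs, speedRatio, scaledDays and printGap are made up for this example.

```cpp
#include <iostream>

// Access times assumed on the slide, both expressed in nanoseconds.
constexpr double kRamAccessNs  = 120.0;        // 120 ns
constexpr double kDiskAccessNs = 30.0 * 1e6;   // 30 ms = 30,000,000 ns

// Number of RAM accesses that fit into the time of one disk access.
constexpr double speedRatio() {
    return kDiskAccessNs / kRamAccessNs;
}

// Stretch a task of the given length by the same ratio and report it in days.
constexpr double scaledDays(double seconds) {
    return seconds * speedRatio() / (60.0 * 60.0 * 24.0);
}

// Print both results (hypothetical helper, call it from main).
void printGap() {
    std::cout << "Disk access is " << speedRatio()
              << " times slower than RAM access\n";
    std::cout << "20 seconds scales to about " << scaledDays(20.0)
              << " days\n";
}
```

The ratio is 30,000,000 ns / 120 ns = 250,000, which is the "quarter of a million" quoted earlier. Stretching 20 seconds by the same factor gives 5,000,000 seconds, or roughly 58 days.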

CPSC 231 D.H. 6

File Structure

• Definition:

– A file structure is a combination of:

• a representation for data in files, and

• operations for accessing the data.

– A file structure allows applications to read, write and modify data.

– A good file structure design will give an application efficient (fast) access to the needed data.

CPSC 231 D.H. 7

File Structure Design Goals

• Minimize the total disk access time:

– by clustering related data together

– by keeping adjacent blocks close to each other on the disk

– ideally, get all the needed data in just ONE disk access

• Maximize the total disk space utilization:

– disk defragmentation procedures

– data compression

CPSC 231 D.H. 8

File structure design problems

• One of the most difficult problems in meeting the design goals of a file structure is the fact that files are quite dynamic, i.e. they:

• grow

• shrink

• change their data

• The design goals would be easier to meet if files were static. WHY?

CPSC 231 D.H. 9

Historical view of file structure design

• Early work:

– presumed that files were located on tapes

– access was sequential

• Recent work:

– most files are stored on direct access devices (such as hard disks, floppy disks, CD-ROMs, Zip disks, etc.)

– large files required indexing

– indexes and keys allowed for speedy searches of data on the disk

CPSC 231 D.H. 10

File structure history cont.

• Indexed files grew and became slow to access => tree structures emerged.

• Unfortunately some trees grew very unevenly resulting in slow (almost sequential) searches => AVL trees emerged (self-adjusting binary trees)

• AVL trees grew large and required multiple disk accesses => B-trees emerged.

[Figure: tree file]

CPSC 231 D.H. 11

[Figure: B-tree]

CPSC 231 D.H. 14

File structure history cont.

• B-trees provided excellent performance for non-sequential access, but sequential access was very slow => B+ trees emerged.

• B-trees and B+ trees became the basis for many commercial file systems, since they provide access times that grow in proportion to log_k(N), where N is the number of entries in the file and k is the number of entries indexed in a single block of the B-tree.
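To get a feel for the log_k(N) growth, the sketch below counts how many levels are needed before k^levels reaches N, using integer arithmetic only. It is a rough model under the assumption that each level costs one block access; the function name blockAccesses is made up for this example and does not come from any B-tree library.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of levels such that k^levels >= n, i.e. the ceiling of
// log_k(n). Each level is assumed to cost one block access.
int blockAccesses(std::uint64_t n, std::uint64_t k) {
    assert(k >= 2);                    // a block must index at least 2 entries
    int levels = 0;
    std::uint64_t reachable = 1;       // entries reachable with `levels` levels
    while (reachable < n) {
        ++levels;
        if (reachable > n / k) {       // the next multiplication would pass n
            break;                     // (also avoids integer overflow)
        }
        reachable *= k;
    }
    return levels;
}
```

With N = 1,000,000 entries and k = 100 entries per block, blockAccesses returns 3. A binary tree (k = 2) over the same file needs 20 levels, which is why a wide block pays off when each level is a disk access.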

[Figure: B+ tree]

CPSC 231 D.H. 16

Hashing

• Hashing is a data access mechanism that is based on converting the search key into a storage address.

• A good hashing algorithm can significantly reduce the number of disk accesses.

• Extendible hashing is a form of hashing that works well with files that undergo substantial changes in size over time.
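The idea of converting a search key into a storage address can be sketched as follows. This is a simple multiply-and-add hash chosen for illustration, not the algorithm of any particular file system; the name hashToAddress and the parameter slotCount are assumptions made for this example, and collision handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Map a key to a slot number in the range [0, slotCount).
// The slot number stands in for the storage address of the record.
std::size_t hashToAddress(const std::string& key, std::size_t slotCount) {
    assert(slotCount > 0);
    std::size_t h = 0;
    for (unsigned char c : key) {
        // Mix in each character and keep the value inside the address range.
        h = (h * 31 + c) % slotCount;
    }
    return h;
}
```

For example, hashToAddress("AB", 101) yields slot 61, so the record for key "AB" could be fetched with a single access to that slot instead of a search through the file.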

CPSC 231 D.H. 18

Key terms

• AVL tree - a self-adjusting binary tree that can guarantee good access times for data stored in memory (but not on the disk).

• B-tree - a tree structure that provides fast access to data stored in files. A B-tree does NOT have to be a binary tree.

• B+ tree - a variation of the B-tree structure that provides for fast sequential access to data as well as indexed access.

CPSC 231 D.H. 19

Key Terms Cont.

• File structure

– the organization of data on secondary storage devices such as disks, together with the operations defined for the data

• Sequential access

– access of data that takes records in serial order, looking at the first, second, and so on.

• Random access

– access of data that takes records in any order, not necessarily serial.
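The difference between sequential and random access can be illustrated with a file of fixed-length records. The sketch below assumes 8-byte, space-padded records; the record size and the names writeRecords, readAllSequentially and readRecordAt are made up for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

constexpr std::size_t kRecordSize = 8;   // assumed fixed record length in bytes

// Write each name as one space-padded, fixed-length record.
void writeRecords(const std::string& path,
                  const std::vector<std::string>& names) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    for (const std::string& name : names) {
        std::string record = name.substr(0, kRecordSize);
        record.resize(kRecordSize, ' ');
        out.write(record.data(), kRecordSize);
    }
}

// Sequential access: take the records in serial order, first to last.
std::vector<std::string> readAllSequentially(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::string> records;
    std::string buffer(kRecordSize, ' ');
    while (in.read(&buffer[0], kRecordSize)) {
        records.push_back(buffer);
    }
    return records;
}

// Random access: jump straight to record `index` without reading the others.
// Returns an empty string if the record does not exist.
std::string readRecordAt(const std::string& path, std::size_t index) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(index * kRecordSize));
    std::string buffer(kRecordSize, ' ');
    in.read(&buffer[0], kRecordSize);
    return in ? buffer : std::string();
}
```

Because every record has the same length, the byte offset of record i is simply i times the record size, which is what makes the direct jump with seekg possible.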

CPSC 231 D.H. 20

Physical files and logical files

• Files are collections of related information.

• Physical files exist on secondary storage devices. Operating systems are responsible for managing physical files.

• Logical files are visible to application programs. Application programs do not know about the physical locations of the files (often they do not even know whether the data is coming from a file or from a keyboard).

CPSC 231 D.H. 21

Association between physical and logical files

• Applications have to make an association between physical and logical file names. In C++ this can be done in the following way:

• ofstream outClientFile("clients.dat", ios::out);

• The application can write to outClientFile while the operating system sees clients.dat.
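A slightly fuller sketch of the same association is given below. The logical file is the stream object outClientFile; the physical file is whatever name is passed in. The helper name writeClient and the sample line are assumptions made for this example.

```cpp
#include <fstream>
#include <string>

// Associate the logical file outClientFile with the physical file named by
// physicalName, write one line to it, and report whether the write succeeded.
bool writeClient(const std::string& physicalName, const std::string& line) {
    std::ofstream outClientFile(physicalName, std::ios::out);
    if (!outClientFile) {
        return false;                  // the physical file could not be opened
    }
    outClientFile << line << '\n';     // the program only sees the logical file
    outClientFile.flush();
    return static_cast<bool>(outClientFile);
}
```

Calling writeClient("clients.dat", "100 Jones 24.98") creates or overwrites clients.dat in the current directory. Only the string "clients.dat" ties the program to the physical file; the rest of the code works with the logical name.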

CPSC 231 D.H. 22

Special Characters in Files

• All computer systems have reserved a number of characters for specific system functions.

• Examples:

– Control-Z often indicates end-of-file in MS-DOS programs

– Control-D often indicates end-of-file in Unix programs

– CR (Carriage Return) and LF (Line Feed) characters together indicate end-of-line in MS-DOS text files (Unix uses LF alone)

CPSC 231 D.H. 23

Directory Structures

• Files are stored in directories. Thus directories are collections of files.

• Most modern systems maintain a tree directory structure. (WHY?)

CPSC 231 D.H. 24

I/O Redirection

• I/O redirection allows the source of input to be changed so that it comes from a file instead of the keyboard:

– program < file /* program reads input from a file instead of the keyboard */

• I/O redirection also allows the output to be directed to a file instead of the screen:

– program > file /* program writes to a file instead of the screen */

• < and > are the redirection operators.

CPSC 231 D.H. 25

Pipes

• The output of one program can be used as the input to another program by using pipes:

• Example:

– program1 | program2

• | is the pipe operator.
