CPSC 231 D.H. 1 Learning Objectives • Understanding of disk versus RAM performance gap. • Understanding definition, design goals and design problems of file structure. • Understanding of file structure research history. • Understanding and naming key terms used in file structure.


CPSC 231 D.H. 1

Learning Objectives

• Understanding of disk versus RAM performance gap.

• Understanding definition, design goals and design problems of file structure.

• Understanding of file structure research history.

• Understanding and naming key terms used in file structure.


Secondary Storage in Computer Systems

• Data can be stored on:

– hard disks

– floppy disks

– tapes

– CD-ROMs

– Zip and Jaz disks

– network servers

• Most data is stored on hard disks.


Disks

• Disks provide enormous capacity to store information.

• Disks are orders of magnitude slower than main memory (a single disk access can take a quarter of a million times longer than a single RAM access).

• DISK = LARGE and SLOW and CHEAP

• RAM = SMALL and FAST


RAM versus Disk Performance Gap

• Example:

– 120 nanoseconds to access RAM (Main Memory)

– 30 milliseconds to access disk

• Analogy:

– 20 seconds versus 58 days

• CONCLUSION:

– Application programs have to spend a lot of time waiting for data to be read from the disk or to be written to the disk.


Questions

• What is a millisecond, microsecond and nanosecond?

• Millisecond = 1/1000 s

• Microsecond = 1/1000000 s

• Nanosecond = 1/1000000000 s

• How many times is RAM access faster than disk access?

• Assume:

• 120 nanoseconds to access RAM (Main Memory)

• 30 milliseconds to access disk
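The question above can be worked through with a few lines of arithmetic. The sketch below uses only the two access times given on the slide; the function and constant names are made up for illustration. It also checks the "20 seconds versus 58 days" analogy by scaling a 20-second task by the same ratio.

```cpp
#include <cstdint>

// Access times from the slide, both expressed in nanoseconds.
constexpr std::int64_t kRamAccessNs  = 120;       // 120 ns
constexpr std::int64_t kDiskAccessNs = 30000000;  // 30 ms = 30 * 10^6 ns

// How many RAM accesses fit into the time of one disk access.
constexpr std::int64_t speedRatio() {
    return kDiskAccessNs / kRamAccessNs;
}

// Analogy helper: if a RAM access took `ramSeconds`, a disk access
// slowed down by the same ratio would take this many days.
constexpr double scaledDiskDays(double ramSeconds) {
    return ramSeconds * static_cast<double>(speedRatio()) / 86400.0;
}
```

With these numbers `speedRatio()` is 250,000 (a quarter of a million), and `scaledDiskDays(20.0)` is just under 58 days.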


File Structure

• Definition:

– A file structure is a combination of:

• representations for data in files and

• operations for accessing the data.

– A file structure allows applications to read, write and modify data.

– A good file structure design will give an application an efficient (fast) access to the needed data.


File Structure Design Goals

• Minimize the total disk access time

– by clustering related data together

– by keeping adjacent blocks close to each other on the disk

– ideally, get all the needed data in just ONE disk access

• Maximize the total disk space utilization

– disk de-fragmentation procedures

– data compression


File structure design problems

• One of the most difficult problems in meeting the design goals of a file structure is the fact that files are quite dynamic, i.e. they:

• grow

• shrink

• change their data

• The design goals would be easier to meet if files were static. WHY?


Historical view of file structure design

• Early work

– presumed that files were located on tapes

– access was sequential

• Recent work

– most files are stored on direct access devices (such as hard disks, floppy disks, CD-ROMs, Zip disks, etc.)

– large files required indexing

– indexes and keys allowed for speedy searches of data on the disk


File structure history cont.

• Indexed files grew and became slow to access => tree structures emerged.

• Unfortunately, some trees grew very unevenly, resulting in slow (almost sequential) searches => AVL trees emerged (self-balancing binary trees).

• AVL trees grew large and required multiple disk accesses => B-trees emerged.
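To see why uneven growth matters, the following sketch builds a plain (unbalanced) binary search tree and measures its height. It is an illustration only, not an AVL implementation; all names are hypothetical. Inserting keys in sorted order gives a tree as tall as the number of keys, so a search degenerates into a sequential scan, while a luckier insertion order keeps the tree shallow.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain binary search tree insertion with no rebalancing.
void bstInsert(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* cur = &root;
    while (*cur) {
        cur = (key < (*cur)->key) ? &(*cur)->left : &(*cur)->right;
    }
    cur->reset(new Node(key));
}

// Number of nodes on the longest path from the root to a leaf.
int bstHeight(const std::unique_ptr<Node>& n) {
    return n ? 1 + std::max(bstHeight(n->left), bstHeight(n->right)) : 0;
}

// Builds a tree from `keys` in the given order and returns its height.
int heightAfterInserting(const std::vector<int>& keys) {
    std::unique_ptr<Node> root;
    for (int k : keys) bstInsert(root, k);
    return bstHeight(root);
}
```

Seven keys inserted as 1, 2, ..., 7 give height 7; the same keys inserted as 4, 2, 6, 1, 3, 5, 7 give height 3. An AVL tree rebalances after each insertion so that the height stays logarithmic whatever the insertion order.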


Tree File


B-Tree


File structure history cont.

• B-trees provided excellent performance for non-sequential files but sequential access was very slow => B+ trees emerged.

• B-trees and B+ trees became the basis for many commercial file systems, since they provide access times that grow in proportion to log_k N, where N is the number of entries in the file and k is the number of entries indexed in a single block of the B-tree.
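The effect of the block fan-out k on log_k N can be sketched with integer arithmetic. The function below is a hypothetical helper, not part of any file system API: it returns the smallest number of levels L with k^L >= N, which approximates how many blocks must be visited to reach one entry.

```cpp
#include <cstdint>

// Smallest L such that k^L >= n, i.e. ceil(log_k n), computed with
// integers to avoid floating-point rounding. Returns -1 if k < 2.
int levelsNeeded(std::uint64_t n, std::uint64_t k) {
    if (k < 2) return -1;
    int levels = 0;
    std::uint64_t reachable = 1;  // entries addressable with `levels` levels
    while (reachable < n) {
        ++levels;
        if (reachable > n / k) break;  // one more level already covers n
        reachable *= k;
    }
    return levels;
}
```

For an assumed file of 1,000,000 entries, a binary tree (k = 2) needs 20 levels while a tree with 100 entries per block needs only 3, which is why packing many entries into each disk block pays off.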


B+ Trees


Hashing

• Hashing is a data access mechanism that is based on converting the search key into a storage address.

• A good hashing algorithm can significantly reduce the number of disk accesses.

• Extendible hashing is a hashing technique that works well with files whose size changes substantially over time.
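A minimal sketch of the key-to-address idea, assuming a file made of fixed-size buckets. The hash function, the multiplier 31 and all names are illustrative choices, not something the slides prescribe, and the sketch does not handle collisions or extendible hashing.

```cpp
#include <cstddef>
#include <string>

// Folds the characters of the key into a number, then reduces it to a
// bucket address in [0, bucketCount). bucketCount must be greater than 0.
std::size_t bucketAddress(const std::string& key, std::size_t bucketCount) {
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned overflow wraps around, which is harmless here
    }
    return h % bucketCount;
}

// Byte offset of the key's bucket in a file of fixed-size buckets.
std::size_t bucketOffset(const std::string& key,
                         std::size_t bucketCount,
                         std::size_t bucketSize) {
    return bucketAddress(key, bucketCount) * bucketSize;
}
```

Because the offset is computed directly from the key, a lookup can seek straight to one bucket instead of walking an index, which is where the saving in disk accesses comes from.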


Key terms

• AVL tree - a self-balancing binary tree that can guarantee good access times for data stored in memory (but not on the disk).

• B-tree - a tree structure that provides fast access to data stored in files. A B-tree does NOT have to be a binary tree.

• B+ tree - a variation of the B-tree structure that provides fast sequential access to data as well as indexed access.


Key Terms Cont.

• File structure

– the organization of data on secondary storage devices such as disks, together with the operations defined for the data

• Sequential access

– access of data that takes records in serial order, looking at the first, second, and so on.

• Random access

– access of data that takes records in any order, not necessarily serial.


Physical files and logical files

• Files are collections of related information.

• Physical files exist on secondary storage devices. Operating systems are responsible for managing physical files.

• Logical files are visible to application programs. Application programs do not know the physical locations of the files (often they do not even know whether the data is coming from a file or from a keyboard).


Association between physical and logical files

• Applications have to make an association between physical and logical file names. In C++ this can be done in the following way:

• ofstream outClientFile("clients.dat", ios::out);

• The application can write to outClientFile while the operating system sees clients.dat
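A slightly fuller sketch of the same association. The helper names `writeClient` and `readFirstClient` are hypothetical; the stream variable and file name follow the slide. The program only ever touches the stream objects (logical files), while the operating system resolves the name to a physical file.

```cpp
#include <fstream>
#include <string>

// Writes one line through a logical file (an ofstream object) that is
// associated with the physical file named by `physicalName`.
bool writeClient(const std::string& physicalName, const std::string& line) {
    std::ofstream outClientFile(physicalName.c_str(), std::ios::out);
    if (!outClientFile) return false;  // the association could not be made
    outClientFile << line << '\n';
    return static_cast<bool>(outClientFile);
}  // the stream's destructor closes the physical file here

// Reads the first line back through a separate logical file (an ifstream).
std::string readFirstClient(const std::string& physicalName) {
    std::ifstream inClientFile(physicalName.c_str());
    std::string line;
    std::getline(inClientFile, line);
    return line;
}
```

For example, `writeClient("clients.dat", "100 Jones 24.98")` followed by `readFirstClient("clients.dat")` returns the same line; note that opening with `ios::out` truncates an existing file.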


Special Characters in Files

• All computer systems have reserved a number of characters for specific system functions.

• Examples:

– Control-Z often indicates end-of-file in MS-DOS programs

– Control-D often indicates end-of-file in Unix programs

– CR (Carriage Return) and LF (Line Feed) characters together indicate end-of-line in MS-DOS; Unix uses LF alone


Directory Structures

• Files are stored in directories. Thus directories are collections of files.

• Most modern systems maintain a tree directory structure. (WHY?)


I/O Redirection

• I/O redirection allows the source of input to be changed so that it comes from a file instead of the keyboard:

– program < file /* program reads input from a file instead of the keyboard */

• I/O redirection allows the output to be directed to a file instead of the screen:

– program > file /* program writes to a file instead of the screen */

• < and > are the redirection operators.


Pipes

• The output of one program can be used as the input to another program by using pipes:

• Example:

– program1 | program2

• | is the pipe operator.


Pipe Operator
