18
External Storage • Primary Storage : Main Memory (RAM). • Secondary Storage: Peripheral Devices – Disk Drives – Tape Drives • Secondary storage is CHEAP. • Secondary storage is SLOW, about 250,000 times slower. (i.e. can do 250,000 CPU instructions in one I/O time). • Primary storage is volatile. • Secondary storage is permanent.

External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Embed Size (px)

Citation preview

Page 1: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

External Storage• Primary Storage : Main Memory (RAM).

• Secondary Storage: Peripheral Devices– Disk Drives– Tape Drives

• Secondary storage is CHEAP.

• Secondary storage is SLOW, about 250,000 times slower. (i.e. can do 250,000 CPU instructions in one I/O time).

• Primary storage is volatile.

• Secondary storage is permanent.

Page 2: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Golden Rule of File ProcessingMinimize the number of disk accesses!!

• Arrange information so that you get what you want with few disk accesses.

• Arrange information so you minimize future disk accesses.

• An organization for data on disk is often called a file structure.

• Disk based space/time tradeoff: Compress information to save time by reducing disk accesses.

Page 3: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Disk Drives• Track - a circle around a disk that holds

information.• Sector - arc portion of a track.• Interleaving factor - Physical distance between

logically adjacent sectors on a track, to allow for processing of sector data.

• Locality of Reference - If a record is read from a disk, the next request is likely to come from near the same place in the file.

• Cluster - smallest unit of allocation; usually several sectors.

• Extent - group of physically contiguous clusters• Internal Fragmentation - wasted space within a

sector if the record size is not the sector size.

Page 4: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Access Time• Seek Time - time for I/O head to reach

desired track. Based on distance between I/O head and desired track.– F(n)=t*n+s where t is time to traverse one

track and s is the startup time for the I/O head.• Rotational delay (latency) - time for data to

rotate to I/O head position. Determined by disk RPM.

• Transfer time - time for data to move under the I/O head. Determined by RPM and size of information to transfer.

Page 5: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Disk Access time example• 675 Mbyte disk drive

– 15 platters --> 45 Mbyte/platter– 612 tracks/platter– 150 sectors/track --> 512 bytes/sector– 8 sectors/cluster -> 4K/cluster -> 18

clusters/track– 1 track seek --> .08ms– seek startup 3 ms– 1 revolution --> 16.7ms– Interleaving factor of 3 -> 3 revolutions to read

1 track (50.1 msec)• How long to read a file of 128K divided

into 256 records of 512 bytes?– Uses 2 tracks (150 sectors on one 106 on other)– The average distance for seek is disk size/3 (not

2!)

Page 6: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Access Time• total time = initial seek + second seek

+2*(rotational delay+transfer time)• total = (612/3*.08+3) + (.08+3) +

2*(.5+3)*16.7= 139.3• Assumes clusters are contiguous and tracks

are contiguous• If clusters are randomly spread across disk

– time = 32*(612/3*.08+3+16.7/2+24/150*16.7) = 969.6 msec

– 24/150 is the part of the disk to read (8 sectors with an interleaving factor of 3).

Page 7: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Buffers• Read time for one track

– 612/3*.08 + 3 + (3+.5)*16.7 = 77.8• Read time for one sector

– 612/3*.08 +3 + (.5+1/150)*16.7 = 27.8• Read time for one byte

– 612/3*.08 +3 + (.5)*16.7 = 27.7• Nearly all disk drives read/write one

sector every time there is an I/O access.

• The information in a sector is stored in a buffer in the operating system.

Page 8: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Buffer Pools• A series of buffers used by an

application to cache disk data is called a buffer pool.

• Double buffering - read data from disk while CPU is processing the previous buffer. Same with writing; store written data into buffer while previous buffer is being physically placed on disk.

• Many buffers - allows for large differences between I/O time and CPU time. Sometimes I/O may fill a buffer faster than CPU can empty it.

Page 9: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Programmer’s View of Files• Logical view of files:

– An array of bytes.– A file pointer marks the current position.

• Three fundamental operations:– Read bytes from current position (move

file pointer).– Write byes to current position (move file

pointer).– Set file pointer to specified position.

Page 10: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

External Sorting• Problem: sort data sets too large to fit

in main memory.• Assume the data is stored on a disk

drive.• To sort, portions of the data must be

brought into main memory, processed, and returned to disk.

• An external sort should minimize disk accesses.

Page 11: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

External Computation Model• Secondary storage is divided into equal

sized blocks.

• The basic I/O operation transfers one block of information.

• Under certain circumstances, reading blocks of a file in sequential order is more efficient. – Minimize seek time.– File physically sequential.– Head does not move between accesses - no

timesharing

Page 12: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

More Model• Typically, the time to perform a single

block I/O operation is enough to Quicksort the contents of the block.

• Most systems have single drive, so must sort on a single drive.

• Need to minimize the number of block I/O operations.

Page 13: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Key Sorting• Often records are large while keys are small

• Approach 1– read in records, sort them, write them out

• Approach 2 (resembles pointer sort)– Read in only the key values– Store with each key the location on disk of its

associated record.– Read in records in key order and write them out

in sorted order

Page 14: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

External sort: Simple Mergesort• Quicksort requires random access to the

entire set of records

• A better process for external data is a modified Mergesort algorithm

• This processes n elements in O(log n) passes.

• A group of sorted records is called a run.

Page 15: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

External Mergesort Algorithm• Split the file into two files.• Read a block from each file.• Take first record from each block, output

them in sorted order.• Take next record from each block, output

them to a second file in sorted order.• Repeat until finished, alternating between

output files. Read new input blocks as needed. Now have runs of size 2

• Repeat steps 2-5 except the input files have groups of 2 that need to be merged.

• Each pass provides runs of twice the size.

Page 16: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Problems• Is each pass through then input and output

files truly sequential?– If the files are all on the same disk, then the

heads will be moving from one input file to another input file to an output file to another output file.

– Very helpful if each file has its own disk head.• How do we reduce the number of passes.

– At the beginning, read in as much data as possible and sort it internally and just write it out.

• Can merge more than 2 runs at a time. - multiway merging.

Page 17: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

THE MERGESORT• The Mergesort process has 2 phases:

– Break the file into large initial runs.– Merge the runs together to make a single sorted

list

Page 18: External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary

Replacement Selection• This method tries to maximize the size of

initial runs• Break available memory into an array for a

heap, an input buffer and an output buffer.• Fill the array from disk.• Make a min-heap• Send the smallest value to the output buffer• Read next key• If new key is greater than last output value

– replace the root with this key and “heapify”• else

– replace the root with the last key and “heapify”– add next record to a new heap (end of the

array).