HW 6
• It was too easy (graded out of extra credit)
  – mean 75
  – median 80
Lecture Topics: 11/22
• HW 7
• File systems
  – block allocation
  – Unix and NT
  – disk scheduling
  – file caches
  – RAID
File System Overview
• A disk block (aka cluster) is a fixed-size contiguous group of disk sectors
• You can imagine a file is simply a sequence of disk blocks
  – may not be contiguous
• A directory is just a file that points to other files (or directories)
• File system issues
  – how many sectors per block?
  – how do you keep track of which blocks a file is using?
  – how do you keep track of which blocks are free?
  – most files are small, but most I/O is to big files; must optimize both
Sectors per Block
• What if there are many sectors per block?
  – a file might fit in a single block (faster access)
  – internal fragmentation (e.g. FAT16 can only address 64K blocks, so large partitions force large blocks and a small file wastes most of its block)
• What if there is only one sector per block?
  – increases access time because files are spread over multiple blocks
Contiguous Allocation
• Allocating a file in contiguous blocks (e.g. blocks on the same track) makes accessing it fast. Why? Random access is easy (see the sketch below)
• What does the directory need to store?
• Disadvantages
  – pain to make files bigger
  – often, must copy whole files
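A minimal sketch of contiguous allocation, assuming the directory stores just a start block and a length (the struct and field names are illustrative, not from the slides):

    /* Hypothetical directory entry for contiguous allocation: the directory
     * only needs the starting block and the length of the file.            */
    struct contig_entry {
        char     name[32];
        unsigned start_block;   /* first disk block of the file */
        unsigned num_blocks;    /* file length in blocks        */
    };

    /* Random access is easy: block i of the file is just start_block + i. */
    unsigned contig_block_for(const struct contig_entry *e, unsigned i)
    {
        return e->start_block + i;   /* assumes i < e->num_blocks */
    }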
Linked Allocation
• Each block contains a pointer to the next block in the file (the last block's pointer is NULL)
• What does the directory need to store?
• Advantages
  – easy to grow files
• Disadvantages
  – poor random access (sketch below)
  – pay seek penalty many times
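A rough sketch of why random access is poor under linked allocation: reaching block i means following i pointers, one disk read (and seek) per hop. read_block is a stand-in for a real disk read; the names and sizes are illustrative:

    #define BLOCK_SIZE 512
    #define NO_BLOCK   0          /* sentinel playing the role of NULL */

    /* Each on-disk block stores the number of the next block in the file. */
    struct linked_block {
        unsigned next;
        char     data[BLOCK_SIZE - sizeof(unsigned)];
    };

    extern struct linked_block *read_block(unsigned block_no);  /* stand-in */

    /* To reach block i we must walk every block before it. */
    unsigned linked_block_for(unsigned first_block, unsigned i)
    {
        unsigned b = first_block;
        while (i-- > 0 && b != NO_BLOCK)
            b = read_block(b)->next;          /* one seek per hop */
        return b;
    }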
Indexed Allocation
• An array lists where each block of the file is stored
• Try to allocate blocks contiguously
• But can allocate blocks anywhere
• Where is this array stored?
• Is the array fixed size?
Unix Inodes
• In Unix this list of blocks is stored in an inode
  – for each file, a directory stores the file name and an inode number
• Some entries point directly to a file block
  – these are sufficient for small files (up to 1KB)
• Some entries point to a list of block entries (singly indirect)
  – these are sufficient for medium-sized files (up to 256KB)
• Some entries point to lists of lists of block entries (doubly indirect)
  – these are sufficient for large files (up to 64MB)
• Some entries point to lists of lists of lists of block entries (triply indirect)
  – these are sufficient for humongous files (up to 16GB; the size arithmetic is sketched below)
• Advantages
  – number of pointers grows dynamically
  – can have sparse files
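A quick back-of-the-envelope check of the size limits above. It assumes 1KB blocks and 4-byte block pointers (so 256 pointers per indirect block); these parameters are inferred from the numbers on the slide, not stated there:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long block = 1024;        /* assumed 1 KB block   */
        const unsigned long long ptrs  = block / 4;   /* 256 pointers/block   */

        unsigned long long single = ptrs * block;                  /* 256 KB */
        unsigned long long dbl    = ptrs * ptrs * block;           /* 64 MB  */
        unsigned long long triple = ptrs * ptrs * ptrs * block;    /* 16 GB  */

        printf("singly indirect: %llu KB\n", single >> 10);
        printf("doubly indirect: %llu MB\n", dbl    >> 20);
        printf("triply indirect: %llu GB\n", triple >> 30);
        return 0;
    }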
Inode Example (figure): an inode holds other file information plus direct block pointers, a singly indirect pointer, a doubly indirect pointer, and a triply indirect pointer, each ultimately leading to data blocks.
NTFS
• The MFT (Master File Table) stores information about each file
• It is stored at a well-known location so you can always find it
• Each entry is 1KB and stores
  – name, attributes, security info, data
  – a small file's data fits in the MFT entry (don't even need to allocate another block)
  – or the data can be a list of block ranges (similar to inodes; see the sketch below)
• A directory in NT is like any other file
  – it stores the MFT numbers of the files (or subdirectories) in that directory
• The root directory of a volume is stored at a fixed location in the MFT so you always know where to start
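A hypothetical sketch of "data as a list of block ranges", similar in spirit to the extents an MFT entry can hold; the struct and function names are illustrative and this is not the real on-disk NTFS format:

    /* A block range (extent): count contiguous blocks starting at start. */
    struct extent {
        unsigned long long start;
        unsigned long long count;
    };

    /* Map a file-relative block number to a disk block by walking the runs. */
    unsigned long long extent_block_for(const struct extent *runs, int nruns,
                                        unsigned long long file_block)
    {
        for (int i = 0; i < nruns; i++) {
            if (file_block < runs[i].count)
                return runs[i].start + file_block;
            file_block -= runs[i].count;
        }
        return 0;   /* past end of file */
    }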
Free Space
• How do you find free disk blocks?
• Bitmap: one long string of bits represents the disk, one bit per block (sketch below)
  – used by Unix and NT
• Linked list: each free block points to the next one (slow!)
• Grouping: list free blocks in the first free block
• Counting: keep a list of runs of consecutive free blocks and their lengths
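A minimal sketch of bitmap free-space management, one bit per block (0 = free, 1 = allocated). The bitmap is just an in-memory array here for illustration; a real file system keeps it on disk:

    #include <limits.h>

    static unsigned char bitmap[1024];            /* covers 8192 blocks */

    static int find_and_allocate_block(void)
    {
        for (unsigned i = 0; i < sizeof(bitmap) * CHAR_BIT; i++) {
            if (!(bitmap[i / 8] & (1 << (i % 8)))) {   /* bit clear => free */
                bitmap[i / 8] |= (1 << (i % 8));       /* mark allocated    */
                return (int)i;                         /* block number      */
            }
        }
        return -1;                                     /* disk full         */
    }

    static void free_block(unsigned block)
    {
        bitmap[block / 8] &= ~(1 << (block % 8));
    }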
Making Disks Faster
• What happens if a program reads one byte from a file, does some processing, then reads the next byte and does some more processing?
• What if a program writes results to a file in the same way?
• Ways to make disks faster
  – minimize seeks
  – caching
Disk Layout
• Prevent fragmentation
  – allocate files to contiguous blocks
• Put directories and their files (and the files' inodes) near each other
  – improves locality; don't have to seek far
• Put commonly used directories on the center tracks
  – e.g. the Thanksgiving directory on the same track as the Thanksgiving files
Disk Scheduling
• The disk has requests to read tracks
  – 0, 10, 4, 7 (0 is on the outside)
• If the disk head is at track 1, how should we order these reads to minimize how far the disk head moves?
Disk Scheduling
• Evaluate these scheduling algorithms (compared on the example above in the sketch below)
• FIFO (First In, First Out)
  – what's the problem?
• SSTF (Shortest Seek Time First)
  – pick the request closest to the disk head
  – what's the problem?
• SCAN
  – also known as the elevator algorithm
  – take the closest request in the direction of travel
  – head moves back and forth from the center to the edge, to the center, to the edge…
  – does this solve SSTF's problem?
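A small sanity check of FIFO vs. SSTF on the example above (head at track 1, requests 0, 10, 4, 7). This just totals head movement for each order; it is not a real scheduler:

    #include <stdio.h>
    #include <stdlib.h>

    static int fifo_seek(int head, const int *req, int n)
    {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += abs(req[i] - head);       /* serve requests in order  */
            head = req[i];
        }
        return total;
    }

    static int sstf_seek(int head, int *req, int n)
    {
        int total = 0;
        for (int done = 0; done < n; done++) {
            int best = done;                   /* pick closest pending req */
            for (int i = done + 1; i < n; i++)
                if (abs(req[i] - head) < abs(req[best] - head))
                    best = i;
            int tmp = req[done]; req[done] = req[best]; req[best] = tmp;
            total += abs(req[done] - head);
            head = req[done];
        }
        return total;
    }

    int main(void)
    {
        int a[] = {0, 10, 4, 7}, b[] = {0, 10, 4, 7};
        printf("FIFO: %d tracks\n", fifo_seek(1, a, 4));   /* 1+10+6+3 = 20 */
        printf("SSTF: %d tracks\n", sstf_seek(1, b, 4));   /* 1+4+3+3  = 11 */
        return 0;
    }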
Disk Buffers
• Most files are read sequentially
• When one block is read, the disk also reads the blocks that follow it because they will likely be read too
• These blocks are stored in a memory buffer on the disk
• Reads to the next blocks don't have to pay seek and rotational delay
File Caches
• File accesses exhibit locality just like everything else
• Therefore cache frequently used file blocks in main memory
  – modern file systems wouldn't work without this
• Should the file cache be write-through rather than write-back? Why?
• It's a little ironic that we use memory to store frequently used disk blocks and disk to store infrequently used memory pages
File Cache
• A portion of memory is devoted to storing frequently used files (a toy version is sketched below)
• The amount of memory changes based on the workload
  – if more files are being accessed, use more memory
• Virtual pages that are evicted from physical memory often go to the file cache before the page file
  – gives a virtual page a second chance
  – doesn't require a copy because the file cache can be stored anywhere in memory
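A toy sketch of a file cache: a fixed set of slots keyed by block number with least-recently-used eviction. Real caches are far larger hash tables; disk_read and the sizes here are stand-ins:

    #define CACHE_SLOTS 64
    #define BLOCK_SIZE  4096

    struct cache_slot {
        int           valid;
        unsigned      block_no;
        unsigned long last_used;              /* for LRU eviction */
        char          data[BLOCK_SIZE];
    };

    static struct cache_slot cache[CACHE_SLOTS];
    static unsigned long     clock_ticks;

    extern void disk_read(unsigned block_no, char *buf);   /* stand-in */

    /* Return cached data for block_no, reading from disk on a miss. */
    char *cache_get(unsigned block_no)
    {
        int lru = 0;
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (cache[i].valid && cache[i].block_no == block_no) {
                cache[i].last_used = ++clock_ticks;         /* hit */
                return cache[i].data;
            }
            if (!cache[i].valid || cache[i].last_used < cache[lru].last_used)
                lru = i;                                    /* track victim */
        }
        disk_read(block_no, cache[lru].data);               /* miss: evict LRU */
        cache[lru].valid = 1;
        cache[lru].block_no = block_no;
        cache[lru].last_used = ++clock_ticks;
        return cache[lru].data;
    }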
(Figure: memory is split between virtual memory pages and the file cache, which holds frequently used file blocks.)
A Case for RAID
• If CPU and memory speeds keep increasing but IO does not improve at the same rate, applications will all become IO bound
• How do we fix this?
1984 Disk Comparison
• SLED (Single Large Expensive Magnetic Disk): IBM 3380
  – 14" diameter
  – 24 cubic feet
  – 7.5 GB
  – 200 IOs per second
  – 3 MB/sec transfer rate
• Conners CP3100
  – 3.5" diameter
  – 0.03 cubic feet
  – 100 MB
  – 0.2 IOs per second
  – 0.3 MB/sec transfer rate
2000 Disk Comparison
• Today there are no longer two sets of disks (expensive and inexpensive)
  – there are also no longer two sets of CPUs
• How do we make these small, inexpensive disks fast and reliable?
AID: Array of Inexpensive Disks
• Use a lot of small inexpensive disks instead of one big expensive disk
• Advantages
  – lower cost
  – improved data rate (pieces of a file on multiple disks)
  – improved IO rate (throughput scales with the number of disks you have)
• Disadvantages
  – reliability is terrible: the array fails when any one disk fails, so the mean time to failure divides by the number of disks
  – Mean Time To Failure changes from 4 years to 2 weeks (roughly what you get with about 100 disks)
RAID: Redundant Array of Inexpensive Disks
• Use extra disks to improve reliability
  – data disks
  – check disks (parity)
• Can provide any level of reliability
• But we still want
  – efficient disk usage
  – good performance
• RAID1 – RAID5
RAID4
• Use one disk to store the parity of the other disks
  – parity bit is 1 if the sum of the other bits is odd
  – parity bit is 0 if the sum of the other bits is even
  – allows you to determine if any single bit is wrong
  – can recreate a disk if one disk fails by using the parity disk (sketch below)
• Example: Disk3 crashes. What was stored there?

    Disk0  Disk1  Disk2  Disk3  Disk4  Parity Disk
      1      0      1      ?      0        1
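The parity rule above is just XOR across the data disks, and the same XOR rebuilds a lost disk. A minimal sketch over single bytes (real arrays do this block by block):

    #define NDATA 5

    /* Parity byte: XOR of the data bytes (bit is 1 when the data bits sum odd). */
    unsigned char parity_of(const unsigned char data[NDATA])
    {
        unsigned char p = 0;
        for (int i = 0; i < NDATA; i++)
            p ^= data[i];
        return p;
    }

    /* Recreate the byte that was on the failed disk. */
    unsigned char rebuild(const unsigned char data[NDATA], int failed_disk,
                          unsigned char parity)
    {
        unsigned char p = parity;
        for (int i = 0; i < NDATA; i++)
            if (i != failed_disk)
                p ^= data[i];
        return p;
    }

For the example above, the surviving bits 1, 0, 1, 0 XOR with the parity bit 1 to give 1, so Disk3 must have held a 1.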
RAID4 Continued
• Striping a file with 16 blocks across 5 data disks + 1 parity disk (block b goes to disk b mod 5; see the sketch below):

    Stripe   Disk 0  Disk 1  Disk 2  Disk 3  Disk 4  Parity Disk
      0         0       1       2       3       4        P
      1         5       6       7       8       9        P
      2        10      11      12      13      14        P
      3        15                                         P

• Can read the file in ¼ of the time (4 parallel stripe reads instead of 16 sequential block reads)
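Under this round-robin striping, file block b lives on data disk b mod 5 at stripe b / 5, which is all the mapping has to compute. A minimal sketch (NDISKS and the names are illustrative):

    #define NDISKS 5

    struct location { int disk; int stripe; };

    /* Block 0 -> Disk 0, block 1 -> Disk 1, ..., block 5 -> Disk 0 again. */
    struct location stripe_location(int file_block)
    {
        struct location loc = { file_block % NDISKS, file_block / NDISKS };
        return loc;
    }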