CSCI 3431: OPERATING SYSTEMS
Chapter 11 – File-System Implementation (Pgs 461-499)
File System Structure
Files are predominantly stored on disks
1. Can be rewritten in place
2. All blocks directly accessible (cf. CD)
But really ...
A. Persistence
B. Accessibility
C. Writeability
D. Access time
Layered Systems
Application(s)
Files, Directories: File System
OS – File Manager
Device Driver, Interrupt Handlers
Device + Hardware
File Representation
FCB: File Control Block – the OS representation of a file
Analogous to the PCB (Process Control Block) representation of a process
Inode – FCB in Unix
Disk Organisation
Boot control block: typically block 1, sector 1, track 1, platter 1 – boot information
Volume control block: superblock – partition details (block size, number of blocks, blocks free, location of free block list)
Directory structure: Root directory "\" in a known location
FCBs: Inodes/Data for each actual file
Data blocks: Contents of the files
OS Data
Mount table – what partitions are mounted?
Directory cache
Open file table (system wide)
Per-Process open file table
IO Buffers
Aside: Many OS/FS treat a directory as another kind of file
Disks
Are divided into sections called partitions or volumes
Partitions may contain a file system ("cooked") or store "raw" data directly
E.g., page swap partition has no file system
Boot sector typically stores the boot loader
The boot loader accesses the root partition (of OS selected) which contains the OS and its root (always mounted) file system
Other partitions are mounted as required
Logical File System
Model of File System managed by OS and visible to programs/programmers
Example: Linux Components
inode: an individual file
FILE: an open file
superblock: a file system
dentry: a directory entry
Directories may be implemented as:
Lists: Sorted, Unsorted, B-Tree
Hash Tables
Contiguous Allocation
Disk blocks are linearly ordered
Files occupy contiguous sets of blocks
Problem occurs when files are deleted, shortened, or moved, creating spaces on the disk
Exactly the same issue as fitting a process into memory (best fit, first fit, etc.)
Compaction removes spaces, but creates extra work
Generally a bad idea for general purpose file systems, but may be useful for specialised OSs
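A minimal sketch of first-fit placement for contiguous allocation, to make the fragmentation problem concrete. The hole list layout (pairs of start block and length) and the function name `first_fit` are assumptions made for illustration, not part of any particular file system.

```python
def first_fit(free_holes, size):
    """Return the start block of the first hole that fits, or None.

    free_holes is a list of (start, length) pairs sorted by start block;
    it is updated in place to reflect the allocation.
    """
    for i, (start, length) in enumerate(free_holes):
        if length >= size:
            if length == size:
                free_holes.pop(i)  # hole is used up exactly
            else:
                # shrink the hole: the file takes its leading blocks
                free_holes[i] = (start + size, length - size)
            return start
    return None  # no single hole is large enough


holes = [(0, 3), (10, 5), (20, 8)]
print(first_fit(holes, 4))  # 10: the hole at block 0 is too small
print(first_fit(holes, 9))  # None: 12 blocks are free, but not contiguously
```

The second call shows external fragmentation: enough free blocks exist in total, but no single run is long enough without compaction.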
Linked Allocation
Files are assigned a (potentially scattered) set of available disk blocks
A tiny portion of each block is used to store a pointer (address) of the next block
Need a second pointer to support "rewind" in a file
Slow file access because a block must be read, and then the pointer used to schedule the next read
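The access cost of linked allocation can be sketched as follows. The disk is modelled as a dictionary from block number to a (data, next block) pair; this model and the name `read_linked_file` are assumptions for illustration only.

```python
def read_linked_file(disk, first_block):
    """Collect a file's data by following next-block pointers.

    disk maps a block number to (data, next_block); next_block is None
    at the end of the file. Each loop iteration stands for one disk read.
    """
    chunks = []
    reads = 0
    block = first_block
    while block is not None:
        data, block = disk[block]  # must read this block to learn the next one
        chunks.append(data)
        reads += 1
    return "".join(chunks), reads


# A three-block file scattered over blocks 9, 16 and 1.
disk = {9: ("he", 16), 16: ("ll", 1), 1: ("o", None)}
print(read_linked_file(disk, 9))  # ('hello', 3)
```

Reaching the last block always costs one read per block before it, which is why random access is poor under this scheme.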
Indexed Allocation
Use the first block to store a list (an "index") of all the blocks used
Index may waste space, but data blocks do not need pointers
Multi-level or linked approaches can be used for large files that need more than one index block
Access is faster than linked allocation, but still requires reads from many different disk locations
Indices can be cached in memory to improve performance
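By contrast, a single-level index lets a byte offset be translated to a disk block with one lookup. The sketch below assumes 4096-byte blocks and represents the index block as a plain list; `BLOCK_SIZE` and `block_for_offset` are hypothetical names.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def block_for_offset(index_block, offset):
    """Return (disk_block, offset_within_block) for a byte offset in a file.

    index_block[i] holds the disk block number of the file's i-th block.
    """
    logical = offset // BLOCK_SIZE
    if logical >= len(index_block):
        raise IndexError("offset is past the end of the file")
    return index_block[logical], offset % BLOCK_SIZE


index_block = [9, 16, 1, 10, 25]
print(block_for_offset(index_block, 5000))  # (16, 904)
```

No data block has to be read to find another, so once the index is cached in memory only the target block itself needs to be fetched.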
Free-Space Management
Generally need to know if a disk block is being used or is available
Could use a bitmap stored on the disk, with one bit per block
A 1 TB disk (with 4 KB blocks) requires a 32 MB bitmap (2^40 / 2^12 = 2^28 blocks, one bit each = 2^25 bytes)
Relatively fast and simple
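The bitmap size can be checked with a few lines of arithmetic, and the bitmap itself is simple to sketch. The example assumes binary units (1 TB = 2^40 bytes, 4K = 2^12 bytes) and the convention that a 1 bit means "used"; the names `bitmap_bytes` and `FreeBitmap` are made up for illustration.

```python
def bitmap_bytes(disk_bytes, block_bytes):
    """Bytes needed for a bitmap with one bit per disk block."""
    blocks = disk_bytes // block_bytes
    return (blocks + 7) // 8  # round up to whole bytes


class FreeBitmap:
    """One bit per block: 0 = free, 1 = used (assumed convention)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the lowest-numbered free block as used and return it."""
        for block in range(self.n_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        return None  # disk full

    def free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


print(bitmap_bytes(2**40, 2**12) // 2**20)  # 32 (megabytes)
```

A real implementation would scan a word at a time rather than bit by bit, but the structure is the same.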
Other Approaches
Linked list of free blocks
Can use the empty blocks to store the pointers to the next empty block
Very space efficient: only need to store one pointer to the first empty block
Very simple, but time consuming to allocate large numbers of blocks
Can "group" the pointers into a single block for efficiency, and have the last pointer on the block point to the next group of empty blocks
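A rough sketch of the basic (ungrouped) free list. The dictionary `next_ptr` stands in for the pointer that would be stored inside each free block on disk; the class and method names are assumptions for illustration.

```python
class FreeList:
    """Free blocks chained together; only the head pointer is kept separately."""

    def __init__(self, free_blocks):
        self.next_ptr = {}  # stands in for the pointer stored in each free block
        self.head = None
        for block in reversed(free_blocks):
            self.release(block)

    def release(self, block):
        """Return a block to the front of the free list."""
        self.next_ptr[block] = self.head
        self.head = block

    def allocate(self):
        """Take the first free block, or None if the disk is full."""
        if self.head is None:
            return None
        block = self.head
        # On a real disk this step is a read of the block being allocated.
        self.head = self.next_ptr.pop(block)
        return block


fl = FreeList([2, 3, 4, 5, 8])
print(fl.allocate(), fl.allocate())  # 2 3
```

Allocating n blocks costs n pointer reads here, which is the cost that grouping many pointers into one block is meant to reduce.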
Compression
In run-length compression, we store a value, followed by the number of occurrences of that value – saves lots of space if long "runs" exist
We can compress the free space map by storing pairs of values: A free block, and the number of consecutive free blocks that follow it
This compressed version has as many entries as there are holes (runs of free blocks) on the disk
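A small sketch of compressing a free-space map into runs. It assumes the map is a sequence in which a true value means "free", and it stores the full run length including the first block (a slightly different convention from counting only the blocks that follow). The name `free_runs` is hypothetical.

```python
def free_runs(free_map):
    """Compress a free-space map into (first_free_block, run_length) pairs."""
    runs = []
    start = None  # first block of the run currently being scanned
    for block, is_free in enumerate(free_map):
        if is_free and start is None:
            start = block
        elif not is_free and start is not None:
            runs.append((start, block - start))
            start = None
    if start is not None:  # map ended in the middle of a free run
        runs.append((start, len(free_map) - start))
    return runs


print(free_runs([0, 1, 1, 1, 0, 0, 1, 1]))  # [(1, 3), (6, 2)]
```

The output has one pair per hole, so a lightly fragmented disk compresses well while a badly fragmented one does not.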
Efficiency and Performance
We generally desire a file system to be as small and fast as possible
However, what works best often depends on how it will be used and on factors such as:
Disk size
Other physical properties (heads, platters, etc.)
Average file size
Read:Write ratio
Number of IO buffers
Amount of RAM available for caching tables & indices, use of cache for disk blocks as well as pages (Unified Virtual Memory)
Synchronous vs. Asynchronous access requirements
Viability of "Read-Ahead"
Redundancy requirements
Recovery
Lost data in RAM (except newly generated data not saved on disk) is usually recoverable in the event of errors, bugs, power-failures, etc. – reload it from disk
Disk data must be better protected so that errors and failures can be recovered from
Causes
Memory contents lost (power failure, crash) before disk can be updated ... particularly with cached index or free space tables
Disk block failure (hardware fault)
Write failure (power loss, system crash)
Bugs in the OS, corruption of FS by applications
Consistency Checking
fsck (Unix) and chkdsk (Windows) check all the tables and structures on a disk for consistency.
E.g., does the free space + used space indicated by directories = all the available space?
Can be run at mount, at boot, via cron, etc.
Can be supplemented with change flags stored on disk, access/update timestamps, etc.
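The "free + used = all" idea can be sketched as a per-block check. This is only an illustration of the principle, not how fsck or chkdsk are actually implemented; the data layout (a free-block list plus a mapping from file name to block list) and the name `check_blocks` are assumptions.

```python
from collections import Counter


def check_blocks(total_blocks, free_blocks, files):
    """Report blocks whose free/used bookkeeping is inconsistent.

    files maps a file name to the list of blocks it claims to occupy.
    """
    counts = Counter(b for blocks in files.values() for b in blocks)
    free = set(free_blocks)
    problems = {"multiply_used": set(), "free_and_used": set(), "missing": set()}
    for block in range(total_blocks):
        uses = counts[block]  # Counter returns 0 for blocks no file claims
        if uses > 1:
            problems["multiply_used"].add(block)  # claimed by two files
        if uses >= 1 and block in free:
            problems["free_and_used"].add(block)  # on the free list but in use
        if uses == 0 and block not in free:
            problems["missing"].add(block)        # neither free nor used
    return problems


print(check_blocks(6, [4, 5], {"a": [0, 1], "b": [1, 2]}))
```

In this example block 1 is claimed by both files and block 3 is accounted for nowhere, so both are reported.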
Journalled (logged) FS
All disk transactions are written first to a log
Log may be stored on a different disk for redundancy
Log tends to store a considerable amount of data for a non-trivial time period
If inconsistency is found, each log entry is checked to see if it was performed
Of course, if the log is corrupted, then we are still in trouble
Uses database transaction techniques
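A minimal sketch of log replay after a crash. It uses a common redo-style variation: rather than checking whether each entry was performed, every write belonging to a committed transaction is simply applied again, and writes from uncommitted transactions are ignored. The record format and the name `replay` are assumptions for illustration.

```python
def replay(journal, disk):
    """Redo all writes from committed transactions; skip uncommitted ones.

    journal is a list of records, either ("write", txid, block, data)
    or ("commit", txid). disk maps block number to contents.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data  # redo is idempotent: repeating it is harmless
    return disk


journal = [
    ("write", 1, 5, "A"),
    ("commit", 1),
    ("write", 2, 6, "B"),  # crash happened before transaction 2 committed
]
print(replay(journal, {5: "old", 6: "old"}))  # {5: 'A', 6: 'old'}
```

Because each redo just overwrites a block with the logged contents, running the replay twice gives the same result, which is what makes recovery safe even if the system crashes again mid-recovery.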
Duplication Techniques
Modems split their EPROM in half and duplicate things so there are two copies
If one is corrupted, the other is used
Can use similar approaches with disks, but is very wasteful of space
Can also do limited duplication and avoid overwriting data until the disk is full
Complete duplication to another disk is the only possible backup in the event of a hardware failure that renders the disk inoperable
NFS
The location of a file system shouldn't really matter to the user (except that non-local data may take longer to access)
Various different protocols are available
File storage in the "Cloud" is really just a trendy term for a networked file system on a WAN (usually the Internet)
Networked File Systems
Require:
Mount Protocol
Access Protocol (for specific FS items)
Naming Protocol – to allow local vs. non-local paths to be mapped
Possible format changes to facilitate local hardware and OS needs – but this is often seen as an application-level concern
To Do:
Finish Assignment 2 (Due next week)
Complete Lab 6 (last required lab)
Read Chapter 11 (pgs 461-499; this lecture)