CSCI 3431: OPERATING SYSTEMS

Chapter 11 – File-System Implementation (Pgs 461-499)

File System Structure

Files are predominantly stored on disks

1. Can be rewritten in place

2. All blocks directly accessible (cf. CD)

But really ...

A. Persistence

B. Accessibility

C. Writeability

D. Access time

Layered Systems

Application(s)

Files, Directories: File System

OS – File Manager

Device Driver, Interrupt Handlers

Device + Hardware

File Representation

FCB: File Control Block – the OS representation of a file

Plays the same role for a file that the PCB plays for a process

Inode – FCB in Unix

Disk Organisation

Boot control block: typically block 1, sector 1, track 1, platter 1 – boot information

Volume control block: superblock – partition details (block size, number of blocks, blocks free, location of free block list)

Directory structure: Root directory "\" in a known location

FCBs: Inodes/Data for each actual file

Data blocks: Contents of the files

OS Data

Mount table – what partitions are mounted?

Directory cache

Open file table (system wide)

Per-process open file table

IO buffers

Aside: Many OS/FS treat a directory as another kind of file

Disks

Are divided into sections called partitions or volumes

Partitions may contain a file system ("cooked") or store "raw" data directly

E.g., page swap partition has no file system

Boot sector typically stores the boot loader

The boot loader accesses the root partition (of OS selected) which contains the OS and its root (always mounted) file system

Other partitions are mounted as required

Logical File System

Model of File System managed by OS and visible to programs/programmers

Example: Linux Components

inode: an individual file

FILE: an open file

superblock: a file system

dentry: a directory entry

Directories: May be implemented as:

Lists: Sorted, Unsorted, B-Tree

Hash Tables

Contiguous Allocation

Disk blocks are linearly ordered

Files occupy contiguous sets of blocks

Problem occurs when files are deleted, shortened, or moved, creating spaces on the disk

Exactly the same issue as fitting a process into memory (best fit, first fit, etc.)

Compaction removes spaces, but creates extra work

Generally a bad idea for general purpose file systems, but may be useful for specialised OSs
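
The fragmentation problem above can be sketched with a first-fit search over a free map. This is only an illustration under stated assumptions: the list-of-booleans free map and the function names `first_fit` and `allocate` are made up for the example, not taken from any real file system.

```python
def first_fit(free_map, n_blocks):
    """Return the start of the first run of n_blocks free blocks, or None.

    free_map is a list of booleans: True means the block is free.
    """
    run_start, run_len = None, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n_blocks:
                return run_start
        else:
            run_len = 0
    return None


def allocate(free_map, n_blocks):
    """Mark a contiguous run as used and return its start, or None on failure."""
    start = first_fit(free_map, n_blocks)
    if start is not None:
        for i in range(start, start + n_blocks):
            free_map[i] = False
    return start


# Ten blocks; blocks 0-1 and 5 are in use, leaving holes of 3 and 4 blocks.
disk = [False, False, True, True, True, False, True, True, True, True]
print(allocate(disk, 4))  # the 3-block hole is too small, so the 4-block hole is used
print(allocate(disk, 4))  # 3 blocks remain free, but no hole is big enough
```

The second request fails even though free blocks remain, which is the situation compaction is meant to fix.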

Linked Allocation

Files are assigned a (potentially scattered) set of available disk blocks

A tiny portion of each block is used to store a pointer (address) of the next block

Need a second pointer to support "rewind" in a file

Slow file access because a block must be read, and then the pointer used to schedule the next read
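
To see why access is slow, here is a small sketch of following the chain of pointers. The dictionary standing in for the disk and the name `read_block` are assumptions made for the example.

```python
# Hypothetical in-memory "disk": block number -> (data, next block number or None).
disk = {
    7: (b"AAAA", 2),
    2: (b"BBBB", 9),
    9: (b"CCCC", None),
}


def read_block(disk, first_block, index):
    """Return (data, reads) for the index-th block of a file starting at first_block."""
    block, reads = first_block, 0
    for _ in range(index):
        _, block = disk[block]  # must read a block just to learn where the next one is
        reads += 1
        if block is None:
            raise IndexError("file has fewer blocks than requested")
    data, _ = disk[block]
    reads += 1
    return data, reads


print(read_block(disk, 7, 2))  # reaching the third block costs three reads
```

Reaching block n of a file always costs n + 1 reads here, so random access degrades as files grow.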

Indexed Allocation

Use the first block to store a list (an "index") of all the blocks used

Index may waste space, but data blocks do not need pointers

Multi-level or linked approaches can be used for large files that need more than one index block

Access is faster than linked allocation, but still requires reads from many different disk locations

Indices can be cached in memory to improve performance
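
A minimal sketch of a single-level index with an optional in-memory cache, assuming a dictionary as the disk. The names `read_block` and `index_cache` are hypothetical, and multi-level indices are not shown.

```python
# Hypothetical in-memory "disk": block number -> contents.
# Block 4 is the index block: the file's data block numbers, in order.
disk = {
    4: [12, 3, 8],
    12: b"AAAA",
    3: b"BBBB",
    8: b"CCCC",
}


def read_block(disk, index_block, i, index_cache=None):
    """Return the i-th data block of the file whose index lives in index_block."""
    if index_cache is not None and index_block in index_cache:
        index = index_cache[index_block]  # no disk read needed for the index
    else:
        index = disk[index_block]         # one read for the index block
        if index_cache is not None:
            index_cache[index_block] = index
    return disk[index[i]]                 # one read for the data block


cache = {}
print(read_block(disk, 4, 2, cache))  # index read from disk, then cached
print(read_block(disk, 4, 0, cache))  # index served from the cache
```

Any block is reachable in at most two reads, regardless of its position in the file, and in one read once the index is cached.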

Free-Space Management

Generally need to know if a disk block is being used or is available

Could use a bitmap stored on the disk, with one bit per block

A 1 TB disk (with 4 KB blocks) requires a 32 MB bitmap

Relatively fast and simple
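
The bitmap size follows from one bit per block: 2^40 bytes / 2^12 bytes per block = 2^28 blocks = 2^28 bits = 2^25 bytes = 32 MB. The sketch below checks that arithmetic and shows one possible bitmap. The class and method names are assumptions, and "set bit means in use" is a convention chosen for the example.

```python
def bitmap_bytes(disk_bytes, block_bytes):
    """Size in bytes of a free-space bitmap with one bit per block."""
    n_blocks = disk_bytes // block_bytes
    return (n_blocks + 7) // 8  # round up to whole bytes


class Bitmap:
    """One bit per block; a set bit means the block is in use."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def set_used(self, block, used=True):
        if used:
            self.bits[block // 8] |= 1 << (block % 8)
        else:
            self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate(self):
        """Mark the lowest-numbered free block as used and return it, or None if full."""
        for block in range(self.n_blocks):
            if not self.is_used(block):
                self.set_used(block)
                return block
        return None


# Bitmap size in megabytes for a 1 TB disk with 4 KB blocks (binary units).
print(bitmap_bytes(2**40, 4 * 2**10) // 2**20)
```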

Other Approaches

Linked list of free blocks

Can use the empty blocks to store the pointers to the next empty block

Very space efficient: only need to store one pointer to the first empty block

Very simple, but time consuming to allocate large numbers of blocks

Can "group" the pointers into a single block for efficiency, and have the last pointer on the block point to the next group of empty blocks
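
A rough sketch of the plain (ungrouped) linked free list, to show where the time goes. The dictionary `next_free` stands in for the pointers stored inside the free blocks, and all names are invented for the example.

```python
# Hypothetical disk: each free block stores the number of the next free block.
next_free = {5: 9, 9: 2, 2: 14, 14: None}
head = 5  # the only pointer kept outside the free blocks themselves


def allocate(head, next_free, n):
    """Take n blocks off the front of the free list.

    Returns (allocated blocks, new head, number of block reads).
    """
    allocated, reads = [], 0
    while len(allocated) < n:
        if head is None:
            raise RuntimeError("not enough free blocks")
        allocated.append(head)
        head = next_free[head]  # reading the block is the only way to find the next one
        reads += 1
    return allocated, head, reads


print(allocate(head, next_free, 3))
```

Allocating n blocks costs n reads, which is what grouping many pointers into one block is meant to reduce.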

Compression

In run-length compression, we store a value, followed by the number of occurrences of that value – saves lots of space if long "runs" exist

We can compress the free space map by storing pairs of values: A free block, and the number of consecutive free blocks that follow it

This compressed version has as many entries as there are holes (runs of free blocks) on the disk
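
A short sketch of compressing a free-space map into such pairs. It assumes a list of booleans as the uncompressed map, and `free_runs` is a name chosen for the example.

```python
def free_runs(free_map):
    """Compress a free-space map into (first free block, free blocks that follow it) pairs.

    free_map is a list of booleans: True means the block is free.
    """
    runs = []
    start = None
    for i, is_free in enumerate(free_map):
        if is_free and start is None:
            start = i                            # a new run of free blocks begins
        elif not is_free and start is not None:
            runs.append((start, i - start - 1))  # run ended at block i - 1
            start = None
    if start is not None:                        # map ends inside a free run
        runs.append((start, len(free_map) - start - 1))
    return runs


# Holes at blocks 1-3, block 6, and blocks 8-9: three holes, so three entries.
free_map = [False, True, True, True, False, False, True, False, True, True]
print(free_runs(free_map))
```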

Efficiency and Performance

We generally desire a file system to be as small and fast as possible

However, what works best often depends on how it will be used and on factors such as:

Disk size

Other physical properties (heads, platters, etc.)

Average file size

Read:Write ratio

Number of IO buffers

Amount of RAM available for caching tables & indices, use of cache for disk blocks as well as pages (Unified Virtual Memory)

Synchronous vs. asynchronous access requirements

Viability of "Read-Ahead"

Redundancy requirements

Recovery

Lost data in RAM (except newly generated data not saved on disk) is usually recoverable in the event of errors, bugs, power-failures, etc. – reload it from disk

Disk data must be better protected so that errors and failures can be recovered from

Causes

Memory contents lost (power failure, crash) before disk can be updated ... particularly with cached index or free space tables

Disk block failure (hardware fault)

Write failure (power loss, system crash)

Bugs in the OS, corruption of FS by applications

Consistency Checking

fsck (Unix) and chkdsk (Windows) check all the tables and structures on a disk for consistency.

I.e., does the free space + used space indicated by directories = all the available space?

Can be run at mount, at boot, via cron, etc.

Can be supplemented with change flags stored on disk, access/update timestamps, etc.
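
The "free + used = all" check can be sketched as follows. This is a toy model, not how fsck or chkdsk are implemented: the set of free blocks, the per-file block lists, and the name `check_consistency` are all assumptions for the example.

```python
def check_consistency(n_blocks, free_blocks, files):
    """Return a list of problems found in a toy file system.

    free_blocks is a set of block numbers marked free.
    files maps a file name to the list of blocks it claims.
    """
    problems = []
    owner = {}
    for name, blocks in files.items():
        for b in blocks:
            if b in owner:
                problems.append(f"block {b} claimed by both {owner[b]} and {name}")
            else:
                owner[b] = name
            if b in free_blocks:
                problems.append(f"block {b} is used by {name} but marked free")
    for b in range(n_blocks):
        if b not in owner and b not in free_blocks:
            problems.append(f"block {b} is neither used nor free")
    return problems


for problem in check_consistency(
    n_blocks=6,
    free_blocks={4, 5},
    files={"a": [0, 1], "b": [1, 2, 4]},
):
    print(problem)
```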

Journalled (logged) FS

All disk transactions are written first to a log

Log may be stored on a different disk for redundancy

Log tends to store a considerable amount of data for a non-trivial time period

If inconsistency is found, each log entry is checked to see if it was performed

Of course, if the log is corrupted, then we are still in trouble

Uses database transaction techniques
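
A minimal sketch of the log-then-write idea, assuming a simple redo log held in a Python list. Real journalled file systems add commit records, checksums, and log truncation, none of which are shown. The function names and the `crash_after` parameter are invented for the example.

```python
def apply_with_journal(disk, journal, txn_id, writes, crash_after=None):
    """Log the intended writes, perform them, then mark the transaction done.

    crash_after simulates a failure after that many in-place writes.
    """
    journal.append({"id": txn_id, "writes": dict(writes), "done": False})
    for n, (block, data) in enumerate(writes.items()):
        if crash_after is not None and n >= crash_after:
            return False  # simulated crash: the log entry stays not done
        disk[block] = data
    journal[-1]["done"] = True
    return True


def recover(disk, journal):
    """Redo every logged transaction that was not marked done."""
    redone = []
    for entry in journal:
        if not entry["done"]:
            disk.update(entry["writes"])  # redo is idempotent: same data, same blocks
            entry["done"] = True
            redone.append(entry["id"])
    return redone


disk, journal = {}, []
apply_with_journal(disk, journal, 1, {10: b"inode", 11: b"data"}, crash_after=1)
print(disk)                    # only the first write reached the disk
print(recover(disk, journal))  # the incomplete transaction is redone from the log
print(disk)
```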

Duplication Techniques

Modems split their EPROM in half and duplicate things so there are two copies

If one is corrupted, the other is used

Can use similar approaches with disks, but this is very wasteful of space

Can also do limited duplication and avoid overwriting data until the disk is full

Complete duplication to another disk is the only possible backup in the event of a hardware failure that renders the disk inoperable

NFS

The location of a file system shouldn't really matter to the user (except that non-local data may take longer to access)

Various different protocols are available

File storage in the "Cloud" is really just a trendy term for a networked file system on a WAN (usually the Internet)

Networked File Systems

Require:

Mount Protocol

Access Protocol (for specific FS items)

Naming Protocol – to allow local vs. non-local paths to be mapped

Possible format changes to facilitate local hardware and OS needs – but this is often seen as an application-level concern

To Do:

Finish Assignment 2 (Due next week)

Complete Lab 6 (last required lab)

Read Chapter 11 (pgs 461-499; this lecture)