A Fast File System for UNIX By Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler, Robert S. Fabry Presented by Agnimitra Roy


Page 1: A Fast File System for UNIX

A Fast File System for UNIX

By Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler, Robert S. Fabry

Presented by Agnimitra Roy

Page 2: A Fast File System for UNIX

Agenda

- Introduction
- Old file system overview
- New file system organization
- Performance
- File system functional enhancements
- Conclusion

Page 3: A Fast File System for UNIX

Introduction

- Original 512 byte UNIX file system
  - Runs on PDP-11
  - Has simple and elegant file system facilities
  - No alignment constraints on data transfers
  - All operations made to appear synchronous
  - Cannot provide the required data throughput rate when used on VAX-11
- New UNIX file system introduced with 4.2 BSD

Page 4: A Fast File System for UNIX

Old File System

- Developed at Bell Labs
- Each disk drive is divided into one or more partitions
- Each disk partition may contain one file system
- A file system never spans multiple partitions

Page 5: A Fast File System for UNIX

Old File System – Description

- A file system is described by its super-block
- The super-block has information about
  - Number of data blocks in the file system
  - Count of the maximum number of files
  - Pointer to the free list, a linked list of all free blocks in the file system
- File systems contain files

Page 6: A Fast File System for UNIX

File System Layout on Disk

Ref: Modern Operating Systems by A. S. Tanenbaum

Page 7: A Fast File System for UNIX

Files in Old File System

- Certain files are distinguished as directories
  - They contain pointers to files that may themselves be directories
- Every file has an associated descriptor called an i-node
- The i-node has information about
  - Ownership of the file
  - Timestamps marking the last modification and access times of the file
  - An array of indices pointing to the data blocks for the file

Page 8: A Fast File System for UNIX

I-node Data Structure

- Given the i-node, it is possible to find all blocks of a file
- Advantage of the i-node scheme over linked files using an in-memory table
  - The i-node needs to be in memory only when the corresponding file is open

Page 9: A Fast File System for UNIX

I-node (contd.)

- An i-node may also contain
  - References to indirect blocks containing further data block indices
- In a file system with a 512 byte block size
  1. A singly indirect block contains 128 further block addresses
  2. A doubly indirect block contains 128 addresses of further singly indirect blocks
  3. A triply indirect block contains 128 addresses of further doubly indirect blocks
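The reach of each indirection level follows directly from the figures on this slide. The sketch below is only an illustration: the block size and the 128 addresses per indirect block come from the slide, while the number of direct block pointers (`DIRECT_BLOCKS`) is an assumed value, since the slides do not state it.

```python
BLOCK_SIZE = 512        # bytes per block, from the slide
ADDRS_PER_BLOCK = 128   # block addresses held by one indirect block, from the slide
DIRECT_BLOCKS = 10      # ASSUMPTION for illustration; the slides give no count


def addressable_blocks(direct=DIRECT_BLOCKS, fanout=ADDRS_PER_BLOCK):
    """Return the number of data blocks reachable through each level."""
    return {
        "direct": direct,
        "single": fanout,        # one indirect block of data block addresses
        "double": fanout ** 2,   # 128 singly indirect blocks
        "triple": fanout ** 3,   # 128 doubly indirect blocks
    }


levels = addressable_blocks()
for name, count in levels.items():
    print(f"{name:>6}: {count:>8} blocks, {count * BLOCK_SIZE:>11} bytes")
print("total bytes:", sum(levels.values()) * BLOCK_SIZE)
```

Each extra level multiplies the reachable data by 128, at the cost of one more disk access to reach a block.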

Page 10: A Fast File System for UNIX

I-node Diagram

Page 11: A Fast File System for UNIX

Issues with Original UNIX File System

- Cannot provide the data throughput rate required by applications
  - When used on VAX-11
  - VLSI design and image processing, which do a small amount of processing on large quantities of data
  - Programs that map files from the file system into large virtual address spaces
- What is required?
  - A file system that provides higher bandwidth than the original 512 byte UNIX file system

Page 12: A Fast File System for UNIX

File System Performance

- Work done at Berkeley to improve reliability and throughput
- System performance increased by a factor of 2
  - Block size changed from 512 to 1024 bytes
- The increase was caused by 2 factors:
  - Each disk transfer accessed twice as much data
  - Most files could be described without needing to access indirect blocks, since the direct blocks contained twice as much data
- The file system with these changes is referred to as the old file system
- The performance improvement indicates that increasing the block size improves throughput

Page 13: A Fast File System for UNIX

Existing Problem

- Throughput had doubled
  - The old file system was still using only about 4% of the disk bandwidth
- The free list was initially ordered for optimal access
  - It quickly became scrambled as files were created and removed
  - With time, the free list became entirely random
  - This caused files to have their blocks allocated randomly over the disk
- A seek was required before every block access
- The transfer rate deteriorated due to randomization of data block placement

Page 14: A Fast File System for UNIX

New File System

- Each disk drive contains one or more file systems
- A file system is described by its super-block
  - Located at the beginning of the file system's disk partition
  - Contains critical data
- A disk partition is divided into one or more areas called cylinder groups
  - Each is composed of one or more consecutive cylinders on a disk
  - Bookkeeping information: a redundant copy of the super-block, space for i-nodes, a bit map, and summary information
  - Bookkeeping information starts at a varying offset from the beginning of the cylinder group

Page 15: A Fast File System for UNIX

NFS–Optimizing Storage Utilization

- Data is laid out so that larger blocks can be transferred in a single disk transaction
- Example:
  - File in the old file system (OFS): 1024 byte blocks
  - File in the new file system (NFS): 4096 byte blocks
- By increasing the block size, a disk access in NFS can transfer up to 4 times as much information per disk transaction as in OFS
- Problem with larger blocks:
  - Most UNIX file systems are composed of small files
  - A uniformly large block size wastes space

Page 16: A Fast File System for UNIX

Optimization of storage allocation

Note: As the block size on the disk increases, the amount of wasted space rises quickly
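The effect can be sketched with a small calculation. The file sizes below are hypothetical values chosen only for illustration, not measurements from the paper; the point is that when every file must occupy whole blocks, the unused tail of each last block grows with the block size.

```python
def wasted_bytes(file_sizes, block_size):
    """Bytes allocated but unused when every file occupies whole blocks."""
    allocated = sum(-(-size // block_size) * block_size for size in file_sizes)
    return allocated - sum(file_sizes)


# HYPOTHETICAL file sizes in bytes, mostly small, for illustration only.
sample_sizes = [120, 900, 1500, 3000, 5000, 20000]

for block_size in (512, 1024, 2048, 4096):
    waste = wasted_bytes(sample_sizes, block_size)
    allocated = sum(sample_sizes) + waste
    print(f"{block_size:>4} byte blocks: {waste:>5} bytes wasted "
          f"({100 * waste / allocated:.1f}% of allocated space)")
```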

Page 17: A Fast File System for UNIX

NFS - Optimizing Storage Allocation

Divide a single file system block into one or more fragments

Block map is associated with each cylinder group

- Records the space available in a cylinder group at the fragment level

Each bit in the map records the status of a fragment:
X -> fragment in use
0 -> fragment available

Bits in map        XXXX   XX00   00XX   0000
Fragment numbers   0-3    4-7    8-11   12-15
Block numbers      0      1      2      3

Example layout of blocks and fragments in a 4096/1024 file system
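A minimal sketch of how such a map might be queried, using the exact bit pattern from the example. The function names are hypothetical, and the rule that a run of fragments must lie within a single block is stated here as an assumption about the allocation policy rather than something the slide spells out.

```python
FRAGS_PER_BLOCK = 4  # 4096 byte blocks divided into 1024 byte fragments

# Map from the example: "X" = fragment in use, "0" = fragment available.
frag_map = "XXXX" "XX00" "00XX" "0000"


def free_blocks(bitmap, frags_per_block=FRAGS_PER_BLOCK):
    """Block numbers whose fragments are all available."""
    return [
        start // frags_per_block
        for start in range(0, len(bitmap), frags_per_block)
        if bitmap[start:start + frags_per_block] == "0" * frags_per_block
    ]


def find_free_run(bitmap, wanted, frags_per_block=FRAGS_PER_BLOCK):
    """First fragment number of `wanted` adjacent free fragments in one block.

    ASSUMPTION: a run may not cross a block boundary. Returns None if no
    block has a long enough run.
    """
    for block_start in range(0, len(bitmap), frags_per_block):
        run = 0
        for offset in range(frags_per_block):
            if bitmap[block_start + offset] == "0":
                run += 1
                if run == wanted:
                    return block_start + offset - wanted + 1
            else:
                run = 0
    return None


print("whole free blocks:", free_blocks(frag_map))
print("first run of 2 fragments starts at:", find_free_run(frag_map, 2))
print("first run of 3 fragments starts at:", find_free_run(frag_map, 3))
```

Under that assumption, fragments 6-9 are all free but cannot be handed out as one unit because they straddle blocks 1 and 2.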

Page 18: A Fast File System for UNIX

NFS - File System Parameterization

- Goal
  - To parameterize the processor capabilities and mass storage characteristics
  - Blocks can then be allocated in an optimum, configuration-dependent way
- Parameters used
  - Speed of the processor
  - Hardware support for mass storage transfers
  - Characteristics of the mass storage devices

Page 19: A Fast File System for UNIX

NFS - Global Layout Policies

- Use file system wide summary information
- Responsible for deciding the placement of new directories and files
- Calculate rotationally optimal block layouts
- Decide when to force a long seek to a new cylinder group if there are insufficient blocks left in the current cylinder group
- Try to improve performance by clustering related information

Page 20: A Fast File System for UNIX

NFS - Layout Policies (1/2)

- Tries to place all i-nodes of files in a directory in the same cylinder group
- Allocation of i-nodes within a cylinder group uses a next free strategy, which places them effectively at random within the group
  - Small and constant upper bound on the number of disk transfers needed to access the i-nodes for all files in a directory
- For data blocks, a file that spills over is handled by redirecting block allocation to a different cylinder group

Comparison of i-node access:
NFS: All i-nodes for a particular cylinder group can be read with 8-16 disk transfers
OFS: Requires one disk transfer to fetch the i-node for each file in a directory

Page 21: A Fast File System for UNIX

NFS - Layout Policies (2/2)

- Global policy routines call local allocation routines
  - These use a locally optimal scheme to lay out data blocks
- Methods to improve file system performance
  - Increase the locality of reference, to minimize seek latency
  - Improve the layout of data, to make larger transfers possible

Page 22: A Fast File System for UNIX

Local Allocator’s 4-Level Allocation Strategy

1. Use the next available block rotationally closest to the requested block on the same cylinder
2. If there are no blocks available on the same cylinder, use a block within the same cylinder group
3. If that cylinder group is entirely full, quadratically hash the cylinder group number to choose another cylinder group to look for a free block
4. If the hash fails, apply an exhaustive search to all cylinder groups

Note: A quadratic hash is used because it is fast at finding unused slots in nearly full hash tables
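The four levels can be sketched on a toy model of the disk. Everything about the model is an assumption made for illustration: the disk is a list of cylinder groups, each a list of cylinders, each a list of free rotational slots; `SLOTS_PER_CYL`, the function names, and the exact probe sequence (increments of 1, 2, 4, ...) are hypothetical, since the slides do not specify them.

```python
SLOTS_PER_CYL = 8  # HYPOTHETICAL number of rotational positions per cylinder


def alloc_in_group(free, cg, cyl, slot):
    """Levels 1 and 2: try the requested cylinder, then the rest of the group."""
    cylinders = free[cg]
    if cylinders[cyl]:
        # Level 1: free slot rotationally closest at or after the requested one.
        best = min(cylinders[cyl], key=lambda s: (s - slot) % SLOTS_PER_CYL)
        cylinders[cyl].remove(best)
        return cg, cyl, best
    for other, slots in enumerate(cylinders):
        if slots:
            # Level 2: any free block in the same cylinder group.
            return cg, other, slots.pop(0)
    return None


def allocate(free, cg, cyl, slot):
    """Return (group, cylinder, slot) for a newly allocated block, or None."""
    ncg = len(free)
    found = alloc_in_group(free, cg, cyl, slot)
    if found:
        return found
    # Level 3: rehash the group number with a doubling step (assumed sequence).
    step, probe = 1, cg
    while step < ncg:
        probe = (probe + step) % ncg
        found = alloc_in_group(free, probe, 0, 0)
        if found:
            return found
        step *= 2
    # Level 4: exhaustive search over every cylinder group.
    for probe in range(ncg):
        found = alloc_in_group(free, probe, 0, 0)
        if found:
            return found
    return None


# Three groups of two cylinders; only two free blocks exist on the whole disk.
disk = [[[], [3]], [[], []], [[5], []]]
first = allocate(disk, cg=0, cyl=0, slot=2)   # satisfied at level 2
second = allocate(disk, cg=0, cyl=0, slot=2)  # falls through to level 4
print(first, second)
```

In this toy run the rehash probes only groups 1 and 0, so the last free block in group 2 is found by the exhaustive search, which is why the final level is needed as a backstop.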

Page 23: A Fast File System for UNIX

Performance

- The i-node layout policy is effective
- Large directories containing many directories
  - Number of disk accesses for i-nodes cut by a factor of 2
- Large directories containing only files
  - Number of disk accesses for i-nodes cut by a factor of 8

Page 24: A Fast File System for UNIX

Throughput Analysis (1/2)

Page 25: A Fast File System for UNIX

Throughput Analysis (2/2)

- Details
  - In the 8192 byte block file system, the write rates are about the same as the read rates
  - In the 4096 byte block file system, the write rates are slower than the read rates
  - Reason: the slower write rates occur because the kernel has to do twice as many disk allocations per second, making the processor unable to keep up with the disk transfer rate
- Results
  - Percentage of bandwidth is a measure of the effective utilization of the disk by the file system
  - The OFS uses about 3-5% of the disk bandwidth, while the NFS uses up to 47% of the bandwidth
  - Both reads and writes are faster in the new system
  - The speedup is due to the larger block size used by the new file system

Page 26: A Fast File System for UNIX

File System Functional Enhancements (1/3)

- Long File Names
  - File names can be of nearly arbitrary length
- File Locking
  - Old File System
    - No provision for locking files
    - Drawbacks
      - Processes consumed CPU time by looping over attempts to create locks
      - Locks left lying around because of system crashes had to be removed manually
      - Processes running as system administrator are always permitted to create files, so lock files do not exclude them
  - New File System
    - Provision for file locking
    - Hard locks: enforced whenever a program tries to access a file
    - Advisory locks: applied only when requested by a program
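Advisory locking can be illustrated with the `flock` call, which Python exposes as `fcntl.flock` on UNIX-like systems. This is a minimal sketch, not the mechanism as implemented in the kernel; the helper name `try_lock` and the file name are hypothetical, and the example only runs where the `fcntl` module is available.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open `path` and request an exclusive advisory lock without blocking.

    Returns the open file on success (closing it releases the lock), or None
    if another open file already holds the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "spool.lock")  # hypothetical file name
    first = try_lock(path)
    second = try_lock(path)        # refused: the lock is already held
    first.close()                  # closing the file releases the lock
    third = try_lock(path)         # succeeds now
    lock_results = (first is not None, second is not None, third is not None)
    third.close()

print(lock_results)
```

Because the lock is advisory, a program that never asks for it can still read and write the file; the lock only coordinates programs that cooperate. It also disappears when the holder exits or the system crashes, so no stale lock is left to clean up.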

Page 27: A Fast File System for UNIX

File System Functional Enhancements (2/3)

- Symbolic Links
  - Implemented as a file that contains a pathname
  - When the system encounters a symbolic link while interpreting a component of a pathname, the contents of the symbolic link are prepended to the rest of the pathname, and this name is interpreted to yield the resulting pathname
  - Allows references across physical file systems and supports inter-machine linkage
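The prepending step can be sketched as a pure string operation. The function name is hypothetical, and two points are assumptions that go beyond the slide: a link whose contents start with "/" is taken relative to the root, while any other link is taken relative to the directory containing it. The use of `posixpath.normpath` is also a simplification, since it collapses ".." textually, whereas a real kernel follows ".." through the directory tree.

```python
import posixpath


def expand_symlink(link_dir, link_contents, rest):
    """Substitute a symbolic link's contents into the pathname being resolved.

    link_dir      -- directory in which the link was found
    link_contents -- pathname stored in the link
    rest          -- components of the original pathname after the link
    """
    combined = posixpath.join(link_contents, rest) if rest else link_contents
    # posixpath.join discards link_dir when `combined` is absolute.
    return posixpath.normpath(posixpath.join(link_dir, combined))


# Hypothetical paths: /usr/src is a link to /mnt/src, /usr/inc to ../opt/include.
print(expand_symlink("/usr", "/mnt/src", "sys/ufs"))
print(expand_symlink("/usr", "../opt/include", "stdio.h"))
```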

- Rename
  - Old File System
    - Renaming required three calls to the system
    - If a program was interrupted or the system crashed between these calls, the target file could be left with only its temporary name
  - New File System
    - To create a new version of an existing file: create the new version as a temporary file, then rename the temporary file to the target name
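The temporary-file-then-rename pattern is still the usual way to replace a file safely. The sketch below uses Python's `os.replace`, which renames over an existing target in one step; the helper name `replace_atomically` and the file name are hypothetical, and this is an illustration of the pattern rather than the code from the paper.

```python
import os
import tempfile


def replace_atomically(path, data):
    """Write `data` to a temporary file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename never
    # has to cross file systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # single rename step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "config.txt")  # hypothetical file name
    replace_atomically(target, "version 1\n")
    replace_atomically(target, "version 2\n")
    with open(target) as f:
        final_contents = f.read()
    leftover = os.listdir(workdir)

print(repr(final_contents), leftover)
```

A reader that opens the target at any moment sees either the complete old version or the complete new one, never a half-written file.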

Page 28: A Fast File System for UNIX

File System Functional Enhancements (3/3)

- Quotas
  - Old File System
    - Any single user can allocate all available space in the file system
  - New File System
    - A quota mechanism sets limits on both the number of i-nodes and the number of disk blocks that a user may allocate
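A quota of this kind amounts to two counters checked on every allocation. The class below is a hypothetical sketch of that bookkeeping under the simplest possible assumption, a single hard limit per resource; the names and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Per-user limits on i-nodes and disk blocks (hypothetical model)."""
    max_inodes: int
    max_blocks: int
    inodes_used: int = 0
    blocks_used: int = 0

    def charge(self, inodes=0, blocks=0):
        """Record an allocation; return False if it would exceed a limit."""
        if (self.inodes_used + inodes > self.max_inodes
                or self.blocks_used + blocks > self.max_blocks):
            return False
        self.inodes_used += inodes
        self.blocks_used += blocks
        return True


quota = Quota(max_inodes=2, max_blocks=10)
print(quota.charge(inodes=1, blocks=8))  # new file with 8 blocks
print(quota.charge(inodes=1, blocks=4))  # refused: would need 12 blocks
print(quota.charge(inodes=1, blocks=2))  # fits exactly
print(quota.charge(inodes=1))            # refused: no i-nodes left
```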

Page 29: A Fast File System for UNIX

Conclusion

- Transition from the Old File System to the Fast File System
- Systems that use the Fast File System include:
  - FreeBSD
  - NetBSD
  - OpenBSD
  - NeXTStep
  - Solaris