Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Systems Design and ImplementationI.5 – File Systems
h
Jan Stoess
Philipp Kupferschmied
University of Karlsruhe
System Architecture Group, SS 2009
University of Karlsruhe
May 26, 2009
Reminder
All SDI Groups: Please contact the tutor before your presentations take place!
2© 2009 University of Karlsruhe, System Architecture Group
Tutor Marcel Noe Consultation Time:
Monday, 16.00 - 18.00 R154, 50.34
Overview
Introduction Motivation File types, attributes, access, operations Directory types, operations
3© 2009 University of Karlsruhe, System Architecture Group
o y yp s, op a o s Implementing files and directories [Parts taken from A.Tanenbaums slides on modern
OSes] Case Studies:
FAT NFS
Tanenbaum’s motivation for files: Enable storing large amount of data Make data survive termination of processes
or the system
Why files?
4© 2009 University of Karlsruhe, System Architecture Group
or the system Let processes access persistent data
concurrently My 2cents
Structure your data
Source: Andy Tanenbaum: Modern Operating Systems, 2nd edition. Supplementary powerpoint slides, http://www.cs.vu.nl/~ast/books/book_software.html
File naming and structuring
5© 2009 University of Karlsruhe, System Architecture Group
Typical file extensions.
File Structure
6© 2009 University of Karlsruhe, System Architecture Group
Three kinds of files byte sequence record sequence tree
File Types
7© 2009 University of Karlsruhe, System Architecture Group
(a) An executable file (b) An archive
File Attributes
8© 2009 University of Karlsruhe, System Architecture Group
Possible file attributes
File Access
Sequential access read all bytes/records from the beginning cannot jump around, but rewind convenient when medium was mag tape
9© 2009 University of Karlsruhe, System Architecture Group
g p Random access
bytes/records read in any order essential for data base systems read can be …
move file marker (seek), then read or … read and then move file marker
File Operations
1. Create2. Delete3. Open
7. Append8. Seek9. Get
10© 2009 University of Karlsruhe, System Architecture Group
4. Close5. Read6. Write
attributes10. Set
Attributes11. Rename
Example Program Using File System Calls
11© 2009 University of Karlsruhe, System Architecture Group
Example Program Using File System Calls
12© 2009 University of Karlsruhe, System Architecture Group
Memory-Mapped Files
Buffer cacheBuffer cache
sys_read()
13© 2009 University of Karlsruhe, System Architecture Group
(a) Reading files using file system calls (b) Reading files using memory mappings
Buffer cacheBuffer cache
DirectoriesSingle-Level Directory Systems
14© 2009 University of Karlsruhe, System Architecture Group
A single level directory system contains 4 files owned by 3 different people, A, B, and C
(Letters indicate owners of the directories and files)
DirectoriesTwo level Directory Systems
15© 2009 University of Karlsruhe, System Architecture Group
DirectoriesHierarchical Directory Systems
16© 2009 University of Karlsruhe, System Architecture Group
A hierarchical directory system
Directory and Path Names
17© 2009 University of Karlsruhe, System Architecture Group
A UNIX directory tree
Directory Operations
1. Mkdir2. Rmdir3 Opendir
5. Readdir6. Rename7. Link
18© 2009 University of Karlsruhe, System Architecture Group
3. Opendir4. Closedir 8. Unlink
File System Implementation
19© 2009 University of Karlsruhe, System Architecture Group
A possible file system layout
Implementing Files
20© 2009 University of Karlsruhe, System Architecture Group
(a) Contiguous allocation of disk space for 7 files(b) State of the disk after files D and E have been removed
Implementing Files
21© 2009 University of Karlsruhe, System Architecture Group
Storing a file as a linked list of disk blocks
Implementing Files
22© 2009 University of Karlsruhe, System Architecture Group
Linked list allocation using a file allocation table in RAM
Implementing Files
23© 2009 University of Karlsruhe, System Architecture Group
An example i-node
Implementing Directories
24© 2009 University of Karlsruhe, System Architecture Group
A simple directory fixed size entries disk addresses and attributes in directory entry
Directory in which each entry just refers to an i-node
Implementing Directories
25© 2009 University of Karlsruhe, System Architecture Group
Two ways of handling long file names in directory In-line In a heap
Shared Files
26© 2009 University of Karlsruhe, System Architecture Group
File system containing a shared file
Shared Files
27© 2009 University of Karlsruhe, System Architecture Group
Situation prior to linking After the link is created After the original owner removes the file
The FAT file system
Origins in the late 1970s Simple file system
Floppy disks less than 500K size.
Enhanced to support larger data.
28© 2009 University of Karlsruhe, System Architecture Group
pp g FAT = file allocation table
Specifies used/free areas of disk 3 FAT file system types
FAT12 FAT16 FAT32 Specifies #bits/entry in FAT structure
Source: Microsoft Corporation. Microsoft Extensible Firmware Initiative FAT32 File System Specification. FAT: General Overview of On-Disk Format. Version 1.03, 2000
FAT structures
Boot record Cluster
Group of data sectors on disk Used to store file and directory data
29© 2009 University of Karlsruhe, System Architecture Group
Us d o s o a d d o y da a Number of sectors stored in boot record
File allocation table (FAT) Simple array of 12/16/32 bit entries Singly linked list of cluster chains (files) 2 synchronized copies / disk
Source: Microsoft Corporation. Microsoft Extensible Firmware Initiative FAT32 File System Specification. FAT: General Overview of On-Disk Format. Version 1.03, 2000
FAT structures
Root directory Normal directory without “..” Location hardcoded after FAT
Data area
30© 2009 University of Karlsruhe, System Architecture Group
Data area Arranged in clusters
Wasted sectors #sectors % sizeof(cluster)
Boot Sector Root Folder DataFAT2FAT1 W
Source: Microsoft Corporation. Microsoft Extensible Firmware Initiative FAT32 File System Specification. FAT: General Overview of On-Disk Format. Version 1.03, 2000
FAT entries
Simple bit field
0x00000000 Free
0x00000001 Reserved
31© 2009 University of Karlsruhe, System Architecture Group
0x00000001 Reserved
0x00000002-0xFFFFEFFF Used; value points to next cluster
0xFFFFFFF7 Bad
0xFFFFFFF8-0xFFFFFFFF Last cluster in file
Source: Microsoft Corporation. Microsoft Extensible Firmware Initiative FAT32 File System Specification. FAT: General Overview of On-Disk Format. Version 1.03, 2000
Directories
Directories are special files Table of file entries
Structure of file entries Name + Extension (fixed size) Attributes Create time
32© 2009 University of Karlsruhe, System Architecture Group
Last access date Last modified time Last modified date Starting cluster number File size
Long file names Phony entries (invalid volume attribute) Ignored by most old DOS programs New programs can retrieve LFN from entry
Source: Microsoft Corporation. Microsoft Extensible Firmware Initiative FAT32 File System Specification. FAT: General Overview of On-Disk Format. Version 1.03, 2000
Opening a file
Go to parent directory Search for file entry
Retrieve first cluster numberRetrieve data from cluster
33© 2009 University of Karlsruhe, System Architecture Group
Retrieve data from cluster For more clusters
Go to FAT Retrieve entry of first cluster Follow chain of clusters Retrieve data from clusters
Source: Microsoft Corporation. Microsoft Extensible Firmware Initiative FAT32 File System Specification. FAT: General Overview of On-Disk Format. Version 1.03, 2000
NFS – The Network File System
Invented by Sun Microsystems, mid 1980s Idea:
Transparent, remote access to filesystems Portability to different OSes and architectures
34© 2009 University of Karlsruhe, System Architecture Group
Approach: specified using external data representation (XDR)
describes protocols machine-independently
based on RPC package Simplify protocol definition, implementation, maintenance
Source: R.Sandberg et al. Design and Implementation of the Sun Network Filesystem. Proceeding of the USENIX 1985 Summer Conference
NFS – The Network File System
First implementation UNIX 4.2 kernel Completely new kernel interface
Separates generic from specific filesystem
35© 2009 University of Karlsruhe, System Architecture Group
Separates generic from specific filesystem implementations
Two basic parts VFS: operations on a filesystem VNode: operations on a file
Source: R.Sandberg et al. Design and Implementation of the Sun Network Filesystem. Proceeding of the USENIX 1985 Summer Conference
NFS Design considerations
Goals: Machine and OS independence Crash recovery
Transparent access
36© 2009 University of Karlsruhe, System Architecture Group
Transparent access Maintain UNIX semantics on client Reasonable performance
Source: R.Sandberg et al. Design and Implementation of the Sun Network Filesystem. Proceeding of the USENIX 1985 Summer Conference
NFS Design considerations
Basic design Uses RPC mechanism
Protocol defined as a set of procedures, arguments and results
Synchronous behavior
37© 2009 University of Karlsruhe, System Architecture Group
Synchronous behavior
Stateless protocol Each call contains all information to complete the call Stateful alternative discarded since
Client would need to detect server crashes Server would need to detect client crashes (why?)
No recovery needed after crash No difference between crashed and slow server
NFS Design considerations
Basic design RPC package is transport independent
First implementation uses UDP/IP
Most common parameter: file handle
38© 2009 University of Karlsruhe, System Architecture Group
Most common parameter: file handle Provided by server Used by client as reference Opaque for client
NFS protocol procedures
null() returns () lookup(dirfh, name) returns (fh, attr) create(dirfh, name, attr) returns (newfh, attr) remove(dirfh, name) returns (status)
39© 2009 University of Karlsruhe, System Architecture Group
getattr(fh) returns (attr) setattr(fh, attr) returns (attr)
read(fh, offset, count) returns (attr, data) write(fh , offset, count, data) returns (attr) rename(dirfh, name, tofh, toname) returns (status)
NFS protocol procedures
link(dirfh, name, tofh, toname) returns (status) symlink(dirfh, name, string) returns (status) readlink(fh) returns (string)
40© 2009 University of Karlsruhe, System Architecture Group
mkdir(dirfh, name, attr) returns (fh, newattr) rmdir(dirfh, name) returns (status) readdir(dirfh, cookie, count) returns(entries)
statfs(fh) returns (fsstats)next
NFS protocol procedures
Filesystem root obtained via external mount protocol Takes UNIX directory pathname Checks permissions Returns filehandle Idea:
41© 2009 University of Karlsruhe, System Architecture Group
Easy extension of filesystem access checks Only place where UNIX names are used
External data representation XDR Similar to IDL Specification of data types Specification of procedures Defines size, byte order, alignment of data types C-like definition
NFS Server Side
Stateless server No server-internal caching Server flushes modified data immediately
Filehandle generation
42© 2009 University of Karlsruhe, System Architecture Group
Filehandle = <filesystem id, inode number, inode generation number>
NFS introduces filesystem IDs NFS introduces inode generation number
(what for?)
NFS Client Side
Need transparent access to remote files Do not change path name structure Explicit <host:/path> not backwards compatible Approach:
43© 2009 University of Karlsruhe, System Architecture Group
Do hostname lookup and file address binding once Attach remote filesystem to local path Use mount protocol
Implementation: Add new filesystem interface to the kernel
VFS: operations on a remote file system VNode: operations on files within a file system
NFS Filesystem Interface
System Calls System Calls
Client Server
44© 2009 University of Karlsruhe, System Architecture Group
RPC/XDR
VFS/VNode
Sun 4.2 FS NFS Filesystem
RPC/XDR
Server routines
VFS/VNode
NFS Filesystem Interface
Filesystem operations per filesystem mount, mount_root
VFS operations per mounted filesystem
t t t tf
45© 2009 University of Karlsruhe, System Architecture Group
unmount, root, statfs, sync VNode operations
lookup, create, remove, rename open, close, rdwr, ioctl, select getattr, setattr, access mkdir, rmdir, readdir link, symlink, readlink …
NFS Filesystem Interface
VNode operations some operations map to NFS procedures, some
not Pathname lookup
Problem:
46© 2009 University of Karlsruhe, System Architecture Group
Pathname could contain mountpoint Mount information is contained in the client, above the
VNode layer Server cannot keep track of client mount points
Approach: Break path into components Do lookup per component Cache lookups in the client
NFS implementation
Completed around 1984 Implemented VNodes in the kernel RPC, XDR ported to kernel User-level mount service
47© 2009 University of Karlsruhe, System Architecture Group
User-level NFS server daemon allows for sleeping
NFS Problems
Root filesystems Sharing root file systems not possible
/tmp: names of temporary files are created with local names (process id)
/dev: no remote device access system Approach:
48© 2009 University of Karlsruhe, System Architecture Group
Approach: Share root FS partly, e.g., /usr only
Filesystem naming Client can mount a filesystem several times Different names for the same file system Increases confusion Approach:
Structure mountpoint names, e.g., /usr/server1
NFS Problems
Credentials and security Wanted UNIX style permissions Possible via RPC permission model
Pass authentication parameters with RPC UID, GID
49© 2009 University of Karlsruhe, System Architecture Group
UID, GID Problem: global UIDs, GIDs required
Administrative hassle Solution: Yellow pages (YP)
Database-like networked user/group administration Problem: remote root access
Remote root should not be equal to local root Solution: map root access to a special UID (nobody) Problem: root may have fewer rights to files than users!
root still can impersonate every local user
NFS Problems
Concurrent access: No agreed-on concurrency model for files Thus, NFS does not provide file locking
UNIX open file semantics: Problem:
50© 2009 University of Karlsruhe, System Architecture Group
can open a file and unlink afterwards strange but necessary semantics (tmp files)
Solution: Rename a file temporarily on server Client removes file after close
Similar problem: file access changes on open file Time skew:
E.g., making dependencies on remote files Solution: NTP (planned)
Initial NFS Performance
51© 2009 University of Karlsruhe, System Architecture Group
First version had pretty bad performance
NFS Performance Optimizations
Decrease number of read and write calls Add client cache Flush cache on close Helped a lot
Avoid extensive copying
52© 2009 University of Karlsruhe, System Architecture Group
py g Do XDR translation in place Saves 1 buffer copy
gettattr accounted for 90% of server calls stat on client produces 11 (!) getattr RPCs Add attribute cache Flushed periodically (every 3 seconds) Dropped to 10%
NFS Performance Optimizations
Make sequential reads faster Add read ahead in the server For on-demand executables:
Cluster on-demand loading requests
53© 2009 University of Karlsruhe, System Architecture Group
For small programs, load all pages at once
Increase lookup performance Add client name lookup cache Contains vnodes for remote directory names Flushed when retrieved attributes (modify time)
don’t match cached vnode attributes
NFS Performance Optimizations
Performance after optimizations
54© 2009 University of Karlsruhe, System Architecture Group
Problems remaining: Frequently executed stat calls are costly write is synchronous by design
NFS Future Work (anno 1984)
Future work Diskless mode for clients Remote file locking
Other filesystem types
55© 2009 University of Karlsruhe, System Architecture Group
Other filesystem types Performance Security improvements Automatic mounting
SDI File Service Design File names maintained by name server Names translate into a session handle as seen by the
client The session handle maps to disk blocks in the file
server
56© 2009 University of Karlsruhe, System Architecture Group
SDI File Service Design Fileserver design
Stateful Stateless
Fileserver interfaces File handle layout
57© 2009 University of Karlsruhe, System Architecture Group
File handle layout Operations on files Operations on directories File attributes (basic)
Fileserver implementation Fileserver / Nameserver relationship Data transfer: copying, mapping Stateful fileserver: which state to hold
SDI File Service Design Groups (2)
Groups SDI 3 SDI 6
Presentation Presentation June 04, 2009
Please don’t forget to discuss your slides with Marcel beforehand
58© 2009 University of Karlsruhe, System Architecture Group