Linux & The File System by Dale Brunner CS430 Spring 2006


Linux & The File System

by Dale Brunner

CS430 Spring 2006


File Systems Supported

• Linux supports many file systems, including ext, ext2, xia, minix, umsdos, msdos, vfat, proc, smb, ncp, iso9660, sysv, hpfs, affs and ufs.

• First we will examine the EXT2 file system as it is commonly used with Linux.

• Then we will compare this with the MS-DOS file system.


History

• Minix was the first file system used by Linux, since the first Linux code was written on a machine using that FS.

• Minix had some shortcomings, and as drive sizes grew, work began on a new FS for Linux.

• EXT was released in April 1992 and was the first to use the VFS API.

• As drive sizes increased further and users wanted more metadata, EXT2 was developed.


EXT2

• One popular file system for Linux is EXT2

• As we learned, data held in files is kept in data blocks all of which are the same length.

• Each file in the system is described with an inode data structure.

• The inode contains the specifics of the file – the permissions, last access time, the blocks where data resides, etc.


EXT2 - inodes

• Each inode has a unique ID number.

• The inodes are stored in inode tables.

• Directories are essentially inodes that point to other inodes.

• EXT2 divides any logical partition that it occupies into block groups. These groups not only hold file and directory information but also have a copy of data that can be used to recover the file system should corruption occur.


EXT2 – Disk Layout

Overall Disk:

Block Group:


EXT2 – inode diagram


EXT2 – inode Overview

• From the diagram we know that EXT2 uses a “combined scheme” of indexed allocation (direct block pointers plus multilevel index blocks) as described in our textbook.

• There are 12 direct pointers that point directly to blocks with data, followed by single, double, and triple indirect pointers, which go through one, two, and three levels of index blocks respectively.

• The kernel keeps a hashtable of inodes in the inode cache (which resides in memory) – hash value is based on superblock address and inode number.
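
The combined scheme above can be sketched as a small lookup that decides which kind of pointer covers a given logical block of a file. This is only an illustration, not kernel code: the names `classify_block` and `blk_level` are made up here, and the sketch assumes each index entry is a 4-byte block number.

```c
#include <stdint.h>

/* Which kind of inode pointer reaches a given logical block of a file. */
enum blk_level { BLK_DIRECT, BLK_SINGLE, BLK_DOUBLE, BLK_TRIPLE, BLK_TOO_BIG };

#define N_DIRECT 12u   /* direct pointers per inode, as stated on the slide */

/*
 * block_size is in bytes. Assumption for this sketch: an index block is
 * an array of 4-byte block numbers, so it holds block_size / 4 pointers.
 */
enum blk_level classify_block(uint64_t logical, uint32_t block_size)
{
    uint64_t per = block_size / 4;      /* pointers per index block */

    if (logical < N_DIRECT)
        return BLK_DIRECT;
    logical -= N_DIRECT;

    if (logical < per)                  /* one level of index blocks */
        return BLK_SINGLE;
    logical -= per;

    if (logical < per * per)            /* two levels */
        return BLK_DOUBLE;
    logical -= per * per;

    if (logical < per * per * per)      /* three levels */
        return BLK_TRIPLE;

    return BLK_TOO_BIG;                 /* beyond what the inode can address */
}
```

With 1024-byte blocks an index block holds 256 pointers, so logical blocks 0 to 11 are direct, 12 to 267 go through the single indirect block, and the double indirect range starts at block 268.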


EXT2 - Superblocks

• The superblocks contain information about the general size and “shape” of the file system – they can be used to recover the system should corruption occur.

• A superblock holds the following information: magic number, revision level, mount count, block group number, block size, blocks per group, free blocks, free inodes, first inode.


EXT2 - Superblock Components

• Magic number: a number that lets the OS check that the superblock really belongs to the FS it claims to belong to.

• Revision level: the version of the file system, used to determine compatibility.

• Mount count, maximum mount count: the mount count is incremented each time the FS is mounted; once it reaches the maximum, a file system check is recommended.

• Block group number: the number of the block group that this superblock is a part of.

• Block size: the size of a block in bytes.

• Blocks per group: the number of blocks per block group.

• Free blocks: number of free blocks in the entire FS.

• Free inodes: number of free inodes in the FS.

• First inode: inode number of the first inode in the FS.
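
As a rough illustration of how these fields get used, the sketch below checks the magic number, derives the number of block groups, and tests the mount count. The struct `sb_summary` and the function names are hypothetical and do not mirror the on-disk layout; the value 0xEF53 is the ext2 magic number. The group count ignores details such as the first data block offset.

```c
#include <stdint.h>

#define EXT2_MAGIC 0xEF53u   /* magic number used by ext2 */

/* Hypothetical in-memory summary of the fields listed above. */
struct sb_summary {
    uint16_t magic;
    uint32_t block_size;        /* bytes */
    uint32_t blocks_total;      /* blocks in the whole file system */
    uint32_t blocks_per_group;
    uint16_t mount_count;
    uint16_t max_mount_count;
};

/* Does the superblock carry the ext2 magic number? */
int sb_is_ext2(const struct sb_summary *sb)
{
    return sb->magic == EXT2_MAGIC;
}

/* Number of block groups: total blocks divided by blocks per group, rounded up. */
uint32_t sb_group_count(const struct sb_summary *sb)
{
    if (sb->blocks_per_group == 0)
        return 0;
    return (uint32_t)(((uint64_t)sb->blocks_total + sb->blocks_per_group - 1)
                      / sb->blocks_per_group);
}

/* Has the mount count reached the maximum, so that a check is due? */
int sb_check_due(const struct sb_summary *sb)
{
    return sb->mount_count >= sb->max_mount_count;
}
```

For example, a file system with 20000 blocks and 8192 blocks per group would be split into 3 block groups, the last one only partly filled.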


EXT2 – Group Descriptor

• Like superblocks, all group descriptors are duplicated in each block group in case of file system corruption.

• A group descriptor describes a block group and contains the following information: blocks bitmap, inode bitmap, inode table, free blocks count, free inodes count, used directory count.


EXT2 - Directories


EXT2 – Directory Overview

• The first field in the directory entry is the inode – an index into the inode table of the block group.

• The second field is the length in bytes of the directory entry.

• The third field is the name of the file (a single path component, not a full path).
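
A directory lookup over such entries might look like the sketch below, which steps from entry to entry using the entry length. The byte layout is an assumption for illustration: a 4-byte inode number, a 2-byte entry length, a 1-byte name length and a 1-byte file type, then the name, all little-endian. The name length and type bytes are not listed on the slide, and `dir_lookup` is a made-up helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers byte by byte so the host byte order does not matter. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Search one directory block for a name.
 * Returns the inode number, or 0 if the name is not present.
 */
uint32_t dir_lookup(const uint8_t *block, size_t block_len,
                    const char *name, size_t name_len)
{
    size_t off = 0;

    while (off + 8 <= block_len) {
        const uint8_t *e = block + off;
        uint32_t ino     = le32(e);       /* field 1: inode number */
        uint16_t rec_len = le16(e + 4);   /* field 2: length of this entry */
        uint8_t  nlen    = e[6];          /* assumed: length of the name */

        if (rec_len < 8 || off + rec_len > block_len)
            break;                        /* malformed entry, stop scanning */

        if (ino != 0 && nlen == name_len && (size_t)8 + nlen <= rec_len &&
            memcmp(e + 8, name, nlen) == 0)
            return ino;                   /* field 3 (the name) matched */

        off += rec_len;                   /* jump to the next entry */
    }
    return 0;
}
```

Because each entry stores its own length, the scan never needs to know how long the previous names were; it simply adds the entry length to the current offset.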


Another Widely Used FS: MS-DOS

This is a typical DOS volume. As you can see, the first block contains the bootstrap and some information about the volume. After the bootstrap comes the File Allocation Table (FAT). This table is used as both a free list and as a way to keep track of allocated blocks. After the FAT is the Root Directory and following that are the File Data Clusters which contain the files on the disk.


MS-DOS: Bootstrap Sector

Byte(s)    Contents
0-2        first instruction of bootstrap routine
3-10       OEM name
11-12      number of bytes per sector
13         number of sectors per cluster
14-15      number of reserved sectors
16         number of copies of the file allocation table
17-18      number of entries in root directory
19-20      total number of sectors
21         media descriptor byte
22-23      number of sectors in each copy of file allocation table
24-25      number of sectors per track
26-27      number of sides
28-29      number of hidden sectors
30-509     bootstrap routine and partition information
510-511    signature (for error checking)


MS-DOS: Bootstrap Sector (cont).

Here we can see the structure of the bootstrap sector. The first piece of information is a branch to the bootstrap code which reads the BIOS parameter information and the FDISK table and boots MS-DOS. The FDISK table contains partition information (this was used in cases of “large” drives that were bigger than 32MB) so the users could have multiple file systems on a single physical drive.
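
The byte offsets in the bootstrap sector table can be turned into a small parser. The following is a sketch under a few assumptions: multi-byte fields are little-endian, the signature bytes are 0x55 0xAA (the table only says "signature"), and the root directory starts right after the reserved sectors and all FAT copies. The struct and function names are invented for this example.

```c
#include <stdint.h>

/* A few of the fields from the bootstrap sector table. */
struct dos_bpb {
    uint16_t bytes_per_sector;     /* bytes 11-12 */
    uint8_t  sectors_per_cluster;  /* byte 13 */
    uint16_t reserved_sectors;     /* bytes 14-15 */
    uint8_t  fat_copies;           /* byte 16 */
    uint16_t root_entries;         /* bytes 17-18 */
    uint16_t total_sectors;        /* bytes 19-20 */
    uint8_t  media;                /* byte 21 */
    uint16_t sectors_per_fat;      /* bytes 22-23 */
};

/* Read a 16-bit little-endian value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* sec must point to a 512-byte sector. Returns 0 on success, -1 on a bad signature. */
int parse_boot_sector(const uint8_t *sec, struct dos_bpb *out)
{
    if (sec[510] != 0x55 || sec[511] != 0xAA)   /* assumed signature bytes */
        return -1;

    out->bytes_per_sector    = rd16(sec + 11);
    out->sectors_per_cluster = sec[13];
    out->reserved_sectors    = rd16(sec + 14);
    out->fat_copies          = sec[16];
    out->root_entries        = rd16(sec + 17);
    out->total_sectors       = rd16(sec + 19);
    out->media               = sec[21];
    out->sectors_per_fat     = rd16(sec + 22);
    return 0;
}

/* First sector of the root directory: reserved sectors, then every FAT copy. */
uint32_t root_dir_first_sector(const struct dos_bpb *b)
{
    return (uint32_t)b->reserved_sectors +
           (uint32_t)b->fat_copies * b->sectors_per_fat;
}
```

On a volume with 1 reserved sector and 2 FAT copies of 9 sectors each, the root directory would begin at sector 19.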


MS-DOS: FAT

While most file systems now have a bitmap to indicate free space, MS-DOS uses the FAT to keep track of allocation and free space simultaneously. If a block is free, the entry in the FAT will be all 0s. Otherwise the entry in the FAT will contain the number of the next logical cluster in the file. Note that the directory entry does not contain the number of the final block; instead, the end of a file is indicated by a -1 in the FAT.
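
The double role of the FAT can be shown with two short helpers: one follows a file's chain of clusters, the other scans for a free entry. This is a simplified sketch that assumes 16-bit entries held in an array, with 0 for free and 0xFFFF standing in for the -1 end marker; the function names are hypothetical. Entries 0 and 1 are reserved on real FAT volumes, so the free scan starts at 2.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_FREE 0x0000u   /* entry of all 0s: cluster is free */
#define FAT_EOF  0xFFFFu   /* "-1" in a 16-bit entry: last cluster of a file */

/*
 * Count the clusters in the chain that starts at `first`.
 * Returns 0 if the chain is broken (free entry, out of range, or a loop).
 */
size_t fat_chain_length(const uint16_t *fat, size_t n, uint16_t first)
{
    size_t len = 0;
    uint16_t c = first;

    while (len < n) {                 /* a valid chain has at most n links */
        if (c >= n || fat[c] == FAT_FREE)
            return 0;
        len++;
        if (fat[c] == FAT_EOF)
            return len;
        c = fat[c];                   /* entry holds the next cluster number */
    }
    return 0;
}

/* Index of the first free cluster, or -1 if the table is full. */
long fat_find_free(const uint16_t *fat, size_t n)
{
    for (size_t i = 2; i < n; i++)    /* entries 0 and 1 are reserved */
        if (fat[i] == FAT_FREE)
            return (long)i;
    return -1;
}
```

The same table answers both questions: walking entries gives the file's clusters in order, and any zero entry is available for allocation.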


MS-DOS, FAT16, FAT32

• The DOS file system is sometimes referred to as the FAT12 file system because it uses 12 bits for each entry in the FAT.

• The FAT16 file system was created to accommodate larger disks as it used 16 bits for each entry in the FAT

• The FAT32 file system was, obviously, designed to use 32 bits in each entry in the FAT in order to support even larger disks.
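
Since 12 bits is not a whole number of bytes, two FAT12 entries share three bytes. The sketch below shows one common way to unpack entry n from a packed table and how the entry width bounds the number of table entries; `fat12_get` and `fat_max_entries` are illustrative names, and the caller is assumed to pass a buffer that is large enough for the requested entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 12-bit entry number n from a packed FAT12 table. */
uint16_t fat12_get(const uint8_t *fat, size_t n)
{
    size_t off = n + n / 2;                        /* n * 1.5 bytes, rounded down */
    uint16_t pair = (uint16_t)(fat[off] | (fat[off + 1] << 8));

    /* Even entries use the low 12 bits, odd entries the high 12 bits. */
    return (n & 1) ? (uint16_t)(pair >> 4) : (uint16_t)(pair & 0x0FFF);
}

/* Upper bound on the number of entries for a given entry width in bits (below 64). */
uint64_t fat_max_entries(unsigned bits)
{
    return (uint64_t)1 << bits;
}
```

A 12-bit entry can name at most 4096 clusters and a 16-bit entry 65536, which is why wider entries were needed as disks grew.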


Playing with the Linux FS

• Now we will examine some of the structure of the Linux FS code.

• First, a look at the structure of both the inode and superblock and a description of some of their components.

• Then a look at the structure of the superblock operations (which is a C struct containing the operations that can be done to the superblock).


Playing with Linux FS: inode Structure

struct inode {
    struct hlist_node i_hash;          // links this inode into the inode hash table
    umode_t i_mode;                    // file type and access permissions
    unsigned int i_nlink;              // number of links to this inode
    uid_t i_uid;                       // id of owner
    loff_t i_size;                     // size of the file in bytes
    struct timespec i_atime;           // access time
    struct timespec i_mtime;           // modified time
    struct timespec i_ctime;           // inode change time (not creation time)
    unsigned int i_blkbits;            // block size in bits (log2 of the block size)
    unsigned long i_blksize;           // block size in bytes
    unsigned long i_blocks;            // number of blocks used
    unsigned short i_bytes;            // bytes used in the last block
    struct rw_semaphore i_alloc_sem;   // read/write semaphore guarding block allocation
    struct inode_operations *i_op;     // operations performed on an inode
    struct file_operations *i_fop;     // file operations
    struct super_block *i_sb;          // pointer to the superblock
    struct dquot *i_dquot[MAXQUOTAS];  // disk quota control
    unsigned long i_state;             // state of the inode, dirty/clean
    unsigned long dirtied_when;        /* jiffies of first dirtying */
    …..
};


Playing with Linux FS: Superblock Structure

• The following is the struct representing the superblock:

struct super_block {
    struct list_head s_list;
    unsigned long s_blocksize;
    unsigned char s_blocksize_bits;
    unsigned char s_lock;
    unsigned char s_dirt;
    struct file_system_type *s_type;
    struct super_operations *s_op;
    unsigned long s_magic;
    struct dentry *s_root;
    wait_queue_head_t s_wait;
    struct list_head s_dirty;              /* dirty inodes */
    struct list_head s_files;
    struct block_device *s_bdev;
    struct list_head s_mounts;             /* vfsmount(s) of this one */
    struct quota_mount_options s_dquot;    /* Diskquota specific options */
    …..
};


Superblock struct: Explained

s_list: a doubly-linked list of all active superblocks.
s_blocksize, s_blocksize_bits: block size in bytes and log2 of the block size.
s_lock: indicates whether superblock is currently locked.
s_dirt: indicates that the superblock has been changed and not yet written to disk.
s_type: pointer to struct file_system_type of the corresponding filesystem.
s_op: pointer to super_operations structure which contains methods to read/write inodes etc.
s_magic: filesystem's magic number. Used by some (e.g. minix) filesystems to determine their version.
s_root: dentry of the filesystem's root. (dentry is a way to convert names to inodes)
s_wait: waitqueue of processes waiting for superblock to be unlocked.
s_dirty: a list of all dirty inodes.
s_files: a list of all open files on this superblock.
s_bdev: for FS_REQUIRES_DEV, this points to the block_device structure describing the device the filesystem is mounted on.
s_mounts: a list of all vfsmount structures, one for each mounted instance of this superblock.


Superblock Operations:

• Now let’s look at some operations we can perform on the superblock:

struct super_operations {
    …
    void (*read_inode) (struct inode *);
    void (*dirty_inode) (struct inode *);
    int (*write_inode) (struct inode *, int);
    void (*drop_inode) (struct inode *);
    void (*delete_inode) (struct inode *);
    void (*put_super) (struct super_block *);
    void (*write_super) (struct super_block *);
    int (*statfs) (struct super_block *, struct kstatfs *);
    int (*remount_fs) (struct super_block *, int *, char *);
    void (*clear_inode) (struct inode *);
    void (*umount_begin) (struct super_block *);
    ……
};


Superblock Operations Explained

read_inode: reads the inode from the filesystem. The job of the filesystem's read_inode() method is to locate the disk block which contains the inode to be read, use the buffer cache bread() function to read it in, and initialize the various fields of the inode structure.
write_inode: writes the inode back to disk.
delete_inode: deletes the on-disk copy of the inode and calls clear_inode() on the VFS inode.
put_super: called at the last stages of the umount() system call to notify the filesystem that any private information held by the filesystem about this instance should be freed. Typically this would brelse() the block containing the superblock and kfree() any bitmaps allocated for free blocks, inodes, etc. NOTE: brelse() releases a buffer that was obtained with bread().
write_super: called when the superblock is written back to disk.
statfs: implements the fstatfs()/statfs() system calls, which report the status of the file system.
remount_fs: called whenever the filesystem is being remounted.
clear_inode: filesystems that attach private data to the inode structure are told to free it.
umount_begin: called during a forced umount to notify the filesystem beforehand, so that it can do its best to make sure that nothing keeps the filesystem busy.


The VFS

• What is it? Well, as we learned, it is a way to provide a level of abstraction when dealing with disk access.

• The VFS provides an API – whenever a process needs to read or write, rather than having to know about all the hardware specifics it just tells the OS, “Hey, write this thing to disk and I don’t care how”. The OS calls the VFS read/write function which in turn calls on the appropriate file system read/write function which uses device drivers to perform disk operations.
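
The call path described above (process, then VFS, then the specific file system) comes down to calling through a table of function pointers. The following user-space sketch imitates that idea only; `demo_fs`, `vfs_write` and the two write routines are made-up names, and the "writes" merely count allocation units instead of touching a disk.

```c
#include <stddef.h>

struct demo_fs;
typedef long (*demo_write_fn)(struct demo_fs *fs, const void *buf, size_t len);

/* One mounted file system as the generic layer sees it. */
struct demo_fs {
    const char   *name;
    size_t        units_used;   /* allocation units consumed so far */
    demo_write_fn write;        /* file-system-specific write routine */
};

/* Pretend ext2: accounts for space in 1024-byte blocks. */
static long demo_ext2_write(struct demo_fs *fs, const void *buf, size_t len)
{
    (void)buf;
    fs->units_used += (len + 1023) / 1024;
    return (long)len;
}

/* Pretend MS-DOS: accounts for space in 512-byte clusters. */
static long demo_msdos_write(struct demo_fs *fs, const void *buf, size_t len)
{
    (void)buf;
    fs->units_used += (len + 511) / 512;
    return (long)len;
}

/* The generic entry point: the caller never names a concrete file system. */
long vfs_write(struct demo_fs *fs, const void *buf, size_t len)
{
    if (fs == NULL || fs->write == NULL)
        return -1;
    return fs->write(fs, buf, len);
}
```

A caller only ever uses `vfs_write`; which routine actually runs depends on the function pointer stored in the `demo_fs` it was handed, which is the same design choice that lets one kernel support many file systems.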


VFS Diagram


Modifying the VFS

• A simple modification of the VFS allows us to see when the Linux kernel is using the sys_open system call.

• The code I modified is located in linux/fs/open.c

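asmlinkage long sys_open(const char __user *filename, int flags, int mode)
{
    // This line logs the name of each file opened through sys_open.
    // Note: filename is a user-space pointer; passing it straight to printk
    // is a shortcut for this demo, and copying it into kernel memory first
    // (e.g. with strncpy_from_user()) would be the safer approach.
    printk("The VFS is opening the file: %s\n", filename);

    if (force_o_largefile())
        flags |= O_LARGEFILE;

    return do_sys_open(filename, flags, mode);
}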


Final Notes: EXT3 vs EXT2

• Why did I choose to provide information about EXT2? Well, it’s essentially the same as EXT3 except that EXT3 has journaling whereas EXT2 does not.

• Journaling, as we have discussed, is a way of logging what is about to happen so that in the event of a system failure the log can be used for recovery.
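
The idea of logging first and recovering from the log can be sketched in a few lines. This toy model is not EXT3's journal format: the structures, the fixed sizes and the single commit flag are all assumptions chosen to keep the example short. Updates are recorded in the journal, marked committed, and only a committed journal is replayed after a crash.

```c
#include <stddef.h>

#define NBLOCKS 8   /* size of the toy "disk" */
#define JMAX    4   /* journal capacity */

struct jrec { int block; int value; };

struct journal {
    struct jrec rec[JMAX];
    size_t      count;
    int         committed;   /* set once the whole transaction is logged */
};

/* Step 1: log what is about to happen. Returns 0 on success, -1 on error. */
int journal_log(struct journal *j, int block, int value)
{
    if (j->count >= JMAX || block < 0 || block >= NBLOCKS)
        return -1;
    j->rec[j->count].block = block;
    j->rec[j->count].value = value;
    j->count++;
    return 0;
}

/* Step 2: mark the transaction complete. */
void journal_commit(struct journal *j)
{
    j->committed = 1;
}

/*
 * Recovery after a failure: replay a committed journal onto the disk,
 * discard an uncommitted one. Returns the number of records applied.
 */
size_t journal_recover(struct journal *j, int disk[NBLOCKS])
{
    size_t applied = 0;

    if (j->committed) {
        for (size_t i = 0; i < j->count; i++) {
            disk[j->rec[i].block] = j->rec[i].value;
            applied++;
        }
    }
    j->count = 0;
    j->committed = 0;
    return applied;
}
```

If the system fails before the commit, recovery throws the partial log away and the disk keeps its old, consistent contents; if it fails after the commit, recovery replays the log and the update is completed.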


Remember, if an inode is no longer useful then:

“We just terminate it with extreme prejudice.”

References

http://www.tldp.org/HOWTO/Filesystems-HOWTO-6.html
http://www.faqs.org/docs/kernel_2_4/lki-3.html
http://e2fsprogs.sourceforge.net/ext2intro.html
http://lxr.linux.no/source/
http://www.tldp.org/LDP/tlk/fs/filesystem.html
http://www.osweekly.com/index.php?option=com_content&Itemid=&task=view&id=2240
http://alumnus.caltech.edu/~pje/dosfiles.html
http://www.seas.ucla.edu/classes/mkampe/cs111.sq05/docs/dos.html
http://en.wikipedia.org/wiki/File_Allocation_Table#FAT12
http://kerneltrap.org/node/4462
http://www.atnf.csiro.au/people/rgooch/linux/docs/vfs.txt