29
© 2006 IBM Corporation IBM Linux Technology Center The Linux ext2/3/4 Filesystem: Past, Present, and Future Theodore Ts'o IBM Linux Technology Center September 11, 2006

The Linux ext2/3/4 Filesystem: Past, Present, and Future

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

© 2006 IBM Corporation

IBM Linux Technology Center

The Linux ext2/3/4 Filesystem: Past, Present, and Future

Theodore Ts'oIBM Linux Technology Center

September 11, 2006

IBM Linux Technology Center

© 2006 IBM Corporation

Agenda

A brief history of the ext2/3 filesystemThe ext3 filesystem formatFeatures added to ext3 in Linux 2.6New features planned for ext3/4Why ext4?Conclusion

IBM Linux Technology Center

© 2006 IBM Corporation

A brief history of Linux filesystems

The Minix filesystem (1991, used to bootstrap Linux)Max FS size: 64MB

Max file size: 64MB

Max filename: 14/30 bytes (fixed-length directory entries)

Only supported modification timestampFirst attempt to improve on Minixfs: the ext filesystem (1992)

Max FS size: 2GB

Max file size: 2GB

Max filename: 255 bytes

Still only one timestamp

Linked lists for free block/inodes caused performance problems

IBM Linux Technology Center

© 2006 IBM Corporation

The xiafs and ext2fs filesystems

Xiafs: minimal changes from minix (January 1993)Max FS size: 2GB

Max file size: 64MB (instead of 2GB)

Max filename: 248 bytes (fixed-length directory entries)

ctime/mtime/atime timestampsExt2fs – improvements to extfs (January 1993)

Max FS extended to 4TB

Variable block sizes

ctime/mtime/atime timestamps

Improved block/inode allocation using bitmaps and block groups

IBM Linux Technology Center

© 2006 IBM Corporation

Competition between xiafs and ext2fs

Since xiafs only made minor changes to minix, it was initially (appeared) more stable.Frank Xia tried to rename xiafs to Linuxfs – negative reaction to marketing-driven changesExt2 had a larger development community (so more features added) and had a more scalable design. In the end it became the dominant “default” filesystemFeatures added to ext2 over the years

sparse superblocks

Large file support (> 2GB)

Extended attributes

ACL's

IBM Linux Technology Center

© 2006 IBM Corporation

The ext3 filesystem

Journalling added to ext2 in 2000 (work started in 1998).Since it required many changes to the code base, a new version of the filesystem code was created in the kernel.

Hence, ext3

But really just ext2 with the COMPAT_HAS_JOURNAL feature (from the filesystem format point of view)

Other Journaling FilesystemsReiserfs, JFS, XFS

Advantages of ext3Backwards compatibility with ext2

Robustness against hardware errors highest priority

IBM Linux Technology Center

© 2006 IBM Corporation

The ext2/3 filesystem format

Verfy similar to the BSD FFSCylinder groups have become “block groups”Compatibility feature sets allow controlled addition of new features via three bitmasks:

R/W Compat – The kernel may mount the filesystem even if it does not understand a feature in this bitmask. (E2fsck however will refuse to touch a filesystem it doesn't understand)

R/O Compat – The kernel may mount the filesystem read/only if it does not understand a feature in this bitmask

Incompat – The kernel must not mount the filesystem if it does not understand a feature in this bitmask

IBM Linux Technology Center

© 2006 IBM Corporation

Ext2 Filesystem Layout

Boot BG #0 BG #1 ... BG #N

SuperBlock

BlockBitmap

InodeBitmap

Data blocksInodeTable

FS des-criptors

IBM Linux Technology Center

© 2006 IBM Corporation

Ext2 Inode structure

ModeOwners

SizeTimestamps

...

direct blocks

indirectd. indirectt. indirect

data

datadata

datadata

datadata

datadatadatadata

IBM Linux Technology Center

© 2006 IBM Corporation

Ext2 Directory Layout

name1name2name3name4

I1i2I3I3

Inode Table

Directory

IBM Linux Technology Center

© 2006 IBM Corporation

Features added to Linux 2.6

BKL removal and other scalability improvements (Andrew Morton, Alex Thomas)Directory Indexing (Daniel Phillips, Theodore Ts'o)Extended Attributes (Andreas Gruenbacher)Online resizing (Andreas Dilger, Stephen Tweedie)Reservation-based block preallocation (Mingming Cao, Andrew Morton, Stephen Tweedie, Badari Pulvarty)

IBM Linux Technology Center

© 2006 IBM Corporation

Reducing Lock Contention

Motivation: scaling issues for 2.4's ext3/jbd under workloads with concurrent I/OTo address this problem:

replaced the per-filesystem superblock lock in ext3 with finer-grained locks

Removed the big (global) kernel lock from the JBD layerResult: SDET benchmark throughput improved by a factor of 10

IBM Linux Technology Center

© 2006 IBM Corporation

Directory Indexing

Motivation: large directories took a long time to searchSolution: Add a search tree indexed by the hash of the filename to the directory

Variation of a B+tree– Directory entries stored in only leaf nodes– The use of fixed-length, 64-bit hashes as keys results in a high

fanout factorFully backwards compatible with older kernels

Interior nodes look like deleted directory entries

Older kernels will clear the directory indexed bit when they modify a directory, thus invalidating the interior nodes until they can be regenerated.

IBM Linux Technology Center

© 2006 IBM Corporation

Extended Attributes

Motivation: need to store small amounts of custom metadata which is associated with files or directories

Also needed to support Access Control Lists (ACL's)EAs are stored in a single EA block, which can be shared by inodes have same extended attributesIn Linux 2.6.11+, EA's can be stored in the expanded inode as well.

This EA-in-inode makes the ext3 top filesystem on Samba4 benchmarks

IBM Linux Technology Center

© 2006 IBM Corporation

Online Resizing

Motivation: Taking advantage of new disk space after a logical volume has been grown by the LVM subsystem without needing to unmount the filesystemSolution: Reserve space so that the number of blocks needed for the block group descriptors can be grown

An additional 4k block is required for every 32 block groups

Block group descriptors must be contiguously stored after the superblock

Integrated into the kernel as of 2.6.10 and e2fsprogs 1.39

IBM Linux Technology Center

© 2006 IBM Corporation

Reservation based block preallocation

Block preallocation helps reduce file fragmentation caused by concurrent allocationExt3 added block preallocation since 2.6.10 kernel.Ext3 uses in-memory block reservation to support a large preallocation

Ext3 (before)

Ext3 (After)

file 1

file 2

file 3

file 4

IBM Linux Technology Center

© 2006 IBM Corporation

Reservation Tree

(8, 31)

(0, 7) (32, 63)

(64, 71)

file 3file 2file 1 file 4

disk blocks

Files

IBM Linux Technology Center

© 2006 IBM Corporation

4 threads 16threads 64threads0

5

10

15

20

25

30

35

40

tiobench sequential write

ext3 2.4.29ext3 2.6.11JFSXFS

Thro

ughp

ut(M

B/se

c)

IBM Linux Technology Center

© 2006 IBM Corporation

Features planned for ext3/4

ExtentsSupport for large disks (48 and 64 bit block numbers)Fine-grained timestampsAsynchornous (background) unlink/truncateSupport > 32,000 subdirectoriesFiner grained locking to support parallel directory operations

IBM Linux Technology Center

© 2006 IBM Corporation

01......11121314

200201......211212123765530

213

...1236

1238......

1239......

65531......

65532......

65533......

0......200201......213............1239.........6553......

direct blockindirect blockdouble indirect blocktriple indirect block

i_data Ext2/3 Indirect Block Map

disk blocksWhy Extents?

IBM Linux Technology Center

© 2006 IBM Corporation

Extents

● Extents are an efficient way to represent large files● An extent is a single descriptor for a range of contiguous

blocks

logical length physical

0 1000 200

IBM Linux Technology Center

© 2006 IBM Corporation

header

01000200100120006000

...

...

200201......1199.........60006001......6199......

i_data

Extent Map

disk blocks

IBM Linux Technology Center

© 2006 IBM Corporation

i_data

index node...

0

...

...

header

0

0

...

...

Extent Treeleaf node disk blocks

extents

extents index

node header

root

IBM Linux Technology Center

© 2006 IBM Corporation

Extent Related Works

Multiple block allocationAn efficient way to allocating a chunk of contiguous blocks at a time

Delayed allocationEnable multiple block allocation by deferring and clustering single block allocation

IBM Linux Technology Center

© 2006 IBM Corporation

Evaluation of Extents Patches

Improvements for large file creation/removal/sequential read/sequential rewriteBenchmarks used: dbench, tiobench, FFSB filemark, sqlbench, iozone, etc.

IBM Linux Technology Center

© 2006 IBM Corporation

4 threads 16threads 64threads0

5

10

15

20

25

30

35

40

Tiobench Sequential Write Comparison With Extents

ext3 2.6.11ext3+extetnsJFSXFS

Thro

ughp

ut(M

B/se

c)

IBM Linux Technology Center

© 2006 IBM Corporation

Large File Sequential I/O Comparison Using FFSB

Sequential Read Sequential write Sequential re-write0

20

40

60

80

100

120

140

160

180

127

7175.7

153.7

91.9

102.7

156.3

89.394.8

166.3

104.3 100 ext3 ext3+extentsJFSXFS

Thro

ughp

ut(M

B/se

c)

IBM Linux Technology Center

© 2006 IBM Corporation

Ext4: The next-generation ext3

When initial versions of the extents patches were sent out for comment, some Linux kernel developers expressed concern:

Ext3 was too important to risk destabilizing code quality

Backwards incompatible extensions could cause user confusionAfter much discussion, a consensus on moving forward

Ext3 cleanup patches would be applied

The ext3 code base would be forked to fs/ext4, with the filesystem name ext4-dev

New work would happen in ext4-dev, and when the feature set for ext4 is stablized it would be renamed from ext4-dev to ext4.

IBM Linux Technology Center

© 2006 IBM Corporation

Conclusion

The ext2/3/4 filesystem is oldest filesystem which is still being actively developed in LinuxHas served the Linux community well for over 10 yearsWith new improvements being constantly being proposed, implemented, and placed into production, ext3/4 development continues to remain vital and exciting!