28
O The Btrfs Filesystem Chris Mason Oracle Sept 2007 Chris Mason The Btrfs Filesystem

The Btrfs Filesystem - Oracle · O Storage Drives are getting bigger, but not much faster File sizes are often small Fsck times can be very large Filesystem scaling issues I Performance

  • Upload
    voduong

  • View
    221

  • Download
    0

Embed Size (px)

Citation preview

O

The Btrfs Filesystem

Chris Mason

Oracle

Sept 2007

Chris Mason The Btrfs Filesystem

O

Topics

Chris Mason The Btrfs Filesystem

O

Storage

Drives are getting bigger, but not much faster

File sizes are often small

Fsck times can be very large

Filesystem scaling issuesI PerformanceI ManagementI Space utilization

Chris Mason The Btrfs Filesystem

O

A New Filesystem?

Existing filesystems not well suited

Disk format compatibility

Upstream integration

Chris Mason The Btrfs Filesystem

O

Btrfs Design

Extent based file storage

Space efficient packing of small files

Space efficient indexed directories

Integrity checksumming (data and metadata)

Writable snapshots

Efficient incremental backups

Object level striping and mirroring

Online filesystem check

Chris Mason The Btrfs Filesystem

O

Btrfs Btree Basics

Simple copy on write tree

Reference counting for filesystem trees

Everything stored as a key/value pair

Large file data extents stored outside the tree

Defragments before writeback and via an ioctl

Btree blocks allocated from metadata-only block groups

Chris Mason The Btrfs Filesystem

O

Btrfs Trees

Super

Root Tree

Extent RootPointer

Subvolume 'default'Pointer

Snapshot 'snap'Pointer

Directory items...

defaultsnap

Extent Tree

Block allocationInformation and

Reference Counts

Subvolume Tree

Files andDirectories

Chris Mason The Btrfs Filesystem

O

Btrfs Files

Small files stored inside the btree (inline)

Large files stored in large extents

Written via delayed allocation

Variable sized checksums stored in the btree

Chris Mason The Btrfs Filesystem

O

Btrfs Directories

Double indexed by name and inode number

Compact: space reclaimed as names are removed

Backpointers from the file to the directory

Used to implement xattrs

Chris Mason The Btrfs Filesystem

O

Directory Layout Benchmarks

Create one million files in a single directory (512 bytes or 16k)

Time find, backup and delete operations

Chris Mason The Btrfs Filesystem

0 68 137 206 275 344 412 481 550Time (seconds)

0

11

22

33

45

MB

/s

Throughput

Ext3XFSBtrfs

0

30

60

90

120

Seek

s / s

ec

Seek Count

Ext3XFSBtrfs

0

7001

14002

21004

28005D

isk

offs

et (

MB

)Disk IO Ext3 Read

Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write

Figure: Create one million 512 byte files

0 14 28 42 56 70 84 98 112Time (seconds)

0

12

25

37

50

MB

/s

Throughput

Ext3Btrfs

0

40

80

120

160

Seek

s / s

ec

Seek Count

Ext3Btrfs

0

6001

12002

18003

24004D

isk

offs

et (

MB

)Disk IO

Ext3 ReadExt3 WriteBtrfs ReadBtrfs Write

Figure: Create one million 512 byte files

0 6 12 19 25 32 38 44 51Time (seconds)

0

7

15

22

30

MB

/s

Throughput

Ext3XFSBtrfs

0

62

125

187

250

Seek

s / s

ec

Seek Count

Ext3XFSBtrfs

0

6832

13665

20497

27330D

isk

offs

et (

MB

)Disk IO

Ext3 ReadExt3 WriteXFS ReadXFS WriteBtrfs Read

Figure: Find one million 512 byte files

0 44 89 134 179 224 269 314 359Time (seconds)

0

5

10

15

20

MB

/s

Throughput

XFSBtrfs

0

87

175

262

350

Seek

s / s

ec

Seek Count

XFSBtrfs

0

6832

13665

20497

27330D

isk

offs

et (

MB

)Disk IO

XFS ReadXFS WriteBtrfs ReadBtrfs Write

Figure: Read one million 512 byte files

O

Btrfs Snapshots

Subvolume trees and file extents are reference counted

Snapshots are writable and can be snapshotted again

Root NodeRef: 1

Leaf 1

Ref: 1

Leaf 2

Ref: 1

Leaf 3

Ref: 1

Figure: Before COW

Root NodeRef: 1

Leaf 1

Ref: 2

Leaf 2

Ref: 2

Leaf 3

Ref: 2

Cow RootRef: 1

Figure: After COW

Chris Mason The Btrfs Filesystem

O

Btrfs Snapshots

Transactions are snapshots that are deleted after commit

Root NodeRef: 1

Leaf 1

Ref: 1

Leaf 2

Ref: 2

Leaf 3

Ref: 2

Cow RootRef: 1

Figure: Modification after COW

New RootRef: 1

Leaf 2

Ref: 1

Leaf 3

Ref: 1

Figure: After Deletion

Chris Mason The Btrfs Filesystem

O

Btrfs Allocation Trees

Will help model underlying storage (DM integration)

Record reference counts for all extents

Contain resizable block groups

Btrees can pull from one or more allocation trees

Chris Mason The Btrfs Filesystem

O

Btrfs Benchmarking

Single threaded, focus on layout

mkfs.xfs -d agcount=3 -l version=2,logsize=128m

mount -t xfs -o logbsize=256k

mount -t ext3 -o data=writeback

Single SATA drive, no barriers used

atimes left on

Graphs from http://oss.oracle.com/∼mason/seekwatcher

Chris Mason The Btrfs Filesystem

0 110 220 331 441 552 662 772 883Time (seconds)

0

12

25

37

50

MB

/s

Throughput

Ext3XFSBtrfs

0

30

60

90

120

Seek

s / s

ec

Seek Count

Ext3XFSBtrfs

0

7627

15255

22882

30510D

isk

offs

et (

MB

)Disk IO Ext3 Read

Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write

Figure: Create one million 16 KB files

0 54 108 162 216 270 324 378 432Time (seconds)

0

12

25

37

50

MB

/s

Throughput

Ext3Btrfs

0

11

22

33

45

Seek

s / s

ec

Seek Count

Ext3Btrfs

0

7627

15255

22882

30510D

isk

offs

et (

MB

)Disk IO

Ext3 ReadExt3 WriteBtrfs ReadBtrfs Write

Figure: Create one million 16 KB files

0 6 12 18 24 30 36 42 48Time (seconds)

0

5

10

15

20

MB

/s

Throughput

Ext3XFSBtrfs

0

62

125

187

250

Seek

s / s

ec

Seek Count

Ext3XFSBtrfs

0

6856

13712

20569

27425D

isk

offs

et (

MB

)Disk IO

Ext3 ReadExt3 WriteXFS ReadXFS WriteBtrfs Read

Figure: Find one million 16 KB files

0 114 228 342 456 570 684 799 913Time (seconds)

0

8

17

26

35

MB

/s

Throughput

XFSBtrfs

0

40

80

120

160

Seek

s / s

ec

Seek Count

XFSBtrfs

0

7439

14878

22317

29756D

isk

offs

et (

MB

)Disk IO

XFS ReadXFS WriteBtrfs ReadBtrfs Write

Figure: Read one million 16 KB files

O

Compilebench

Simulates operations on kernel trees

File sizes and names collected from real kernel trees

Randomly create, compile, patch, clean, delete

Intended to age and fragment the FS

Created 90 initial trees and ran 150 operations

http://oss.oracle.com/∼mason/compilebench

Chris Mason The Btrfs Filesystem

O

Compilebench

Btrfs Ext3 XFS

Initial Create 33.78 MB/s 10.31 MB/s 15.30 MB/s

Create 14.39 MB/s 12.47 MB/s 8.94 MB/s

Patch 3.77 MB/s 4.48 MB/s 3.19 MB/s

Compile 20.56 MB/s 15.28 MB/s 21.61 MB/s

Clean 69.96 MB/s 43.84 MB/s 134.15 MB/s

Read 5.37 MB/s 8.66 MB/s 7.05 MB/s

Read Compiled 11.31 MB/s 13.22 MB/s 14.40 MB/s

Delete 15.77s 18.34s 16.63s

Delete Compiled 22.96s 33.35s 21.38s

Stat 15.86s 11.35s 7.25s

Stat Compiled 21.82s 14.17s 8.99s

Chris Mason The Btrfs Filesystem

0 37 74 111 148 185 223 260 297Time (seconds)

0

12

25

37

50

MB

/s

Throughput

Ext3XFSBtrfs

0

75

150

225

300

Seek

s / s

ec

Seek Count

Ext3XFSBtrfs

0

8653

17307

25960

34614D

isk

offs

et (

MB

)Disk IO Ext3 Read

Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write

Figure: Create 20 kernel trees

0 96 192 288 384 480 576 672 769Time (seconds)

0

8

17

26

35

MB

/s

Throughput

Ext3XFSBtrfs

0

62

125

187

250

Seek

s / s

ec

Seek Count

Ext3XFSBtrfs

0

10233

20467

30700

40934D

isk

offs

et (

MB

)Disk IO Ext3 Read

Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write

Figure: Make -j simulated 20 kernel trees

0 61 122 184 245 307 368 429 491Time (seconds)

0

6

12

18

25

MB

/s

Throughput

Ext3XFSBtrfs

0

45

90

135

180

Seek

s / s

ec

Seek Count

Ext3XFSBtrfs

0

10233

20466

30700

40933D

isk

offs

et (

MB

)Disk IO Ext3 Read

Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write

Figure: Delete 20 kernel trees

O

TODO

Concurrency (tree locking)

Multiple extent allocation trees

OSYNC and fsync performance

Online fsck

Many many more

Chris Mason The Btrfs Filesystem