Upload
voduong
View
221
Download
0
Embed Size (px)
Citation preview
O
Storage
Drives are getting bigger, but not much faster
File sizes are often small
Fsck times can be very large
Filesystem scaling issuesI PerformanceI ManagementI Space utilization
Chris Mason The Btrfs Filesystem
O
A New Filesystem?
Existing filesystems not well suited
Disk format compatibility
Upstream integration
Chris Mason The Btrfs Filesystem
O
Btrfs Design
Extent based file storage
Space efficient packing of small files
Space efficient indexed directories
Integrity checksumming (data and metadata)
Writable snapshots
Efficient incremental backups
Object level striping and mirroring
Online filesystem check
Chris Mason The Btrfs Filesystem
O
Btrfs Btree Basics
Simple copy on write tree
Reference counting for filesystem trees
Everything stored as a key/value pair
Large file data extents stored outside the tree
Defragments before writeback and via an ioctl
Btree blocks allocated from metadata-only block groups
Chris Mason The Btrfs Filesystem
O
Btrfs Trees
Super
Root Tree
Extent RootPointer
Subvolume 'default'Pointer
Snapshot 'snap'Pointer
Directory items...
defaultsnap
Extent Tree
Block allocationInformation and
Reference Counts
Subvolume Tree
Files andDirectories
Chris Mason The Btrfs Filesystem
O
Btrfs Files
Small files stored inside the btree (inline)
Large files stored in large extents
Written via delayed allocation
Variable sized checksums stored in the btree
Chris Mason The Btrfs Filesystem
O
Btrfs Directories
Double indexed by name and inode number
Compact: space reclaimed as names are removed
Backpointers from the file to the directory
Used to implement xattrs
Chris Mason The Btrfs Filesystem
O
Directory Layout Benchmarks
Create one million files in a single directory (512 bytes or 16k)
Time find, backup and delete operations
Chris Mason The Btrfs Filesystem
0 68 137 206 275 344 412 481 550Time (seconds)
0
11
22
33
45
MB
/s
Throughput
Ext3XFSBtrfs
0
30
60
90
120
Seek
s / s
ec
Seek Count
Ext3XFSBtrfs
0
7001
14002
21004
28005D
isk
offs
et (
MB
)Disk IO Ext3 Read
Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write
Figure: Create one million 512 byte files
0 14 28 42 56 70 84 98 112Time (seconds)
0
12
25
37
50
MB
/s
Throughput
Ext3Btrfs
0
40
80
120
160
Seek
s / s
ec
Seek Count
Ext3Btrfs
0
6001
12002
18003
24004D
isk
offs
et (
MB
)Disk IO
Ext3 ReadExt3 WriteBtrfs ReadBtrfs Write
Figure: Create one million 512 byte files
0 6 12 19 25 32 38 44 51Time (seconds)
0
7
15
22
30
MB
/s
Throughput
Ext3XFSBtrfs
0
62
125
187
250
Seek
s / s
ec
Seek Count
Ext3XFSBtrfs
0
6832
13665
20497
27330D
isk
offs
et (
MB
)Disk IO
Ext3 ReadExt3 WriteXFS ReadXFS WriteBtrfs Read
Figure: Find one million 512 byte files
0 44 89 134 179 224 269 314 359Time (seconds)
0
5
10
15
20
MB
/s
Throughput
XFSBtrfs
0
87
175
262
350
Seek
s / s
ec
Seek Count
XFSBtrfs
0
6832
13665
20497
27330D
isk
offs
et (
MB
)Disk IO
XFS ReadXFS WriteBtrfs ReadBtrfs Write
Figure: Read one million 512 byte files
O
Btrfs Snapshots
Subvolume trees and file extents are reference counted
Snapshots are writable and can be snapshotted again
Root NodeRef: 1
Leaf 1
Ref: 1
Leaf 2
Ref: 1
Leaf 3
Ref: 1
Figure: Before COW
Root NodeRef: 1
Leaf 1
Ref: 2
Leaf 2
Ref: 2
Leaf 3
Ref: 2
Cow RootRef: 1
Figure: After COW
Chris Mason The Btrfs Filesystem
O
Btrfs Snapshots
Transactions are snapshots that are deleted after commit
Root NodeRef: 1
Leaf 1
Ref: 1
Leaf 2
Ref: 2
Leaf 3
Ref: 2
Cow RootRef: 1
Figure: Modification after COW
New RootRef: 1
Leaf 2
Ref: 1
Leaf 3
Ref: 1
Figure: After Deletion
Chris Mason The Btrfs Filesystem
O
Btrfs Allocation Trees
Will help model underlying storage (DM integration)
Record reference counts for all extents
Contain resizable block groups
Btrees can pull from one or more allocation trees
Chris Mason The Btrfs Filesystem
O
Btrfs Benchmarking
Single threaded, focus on layout
mkfs.xfs -d agcount=3 -l version=2,logsize=128m
mount -t xfs -o logbsize=256k
mount -t ext3 -o data=writeback
Single SATA drive, no barriers used
atimes left on
Graphs from http://oss.oracle.com/∼mason/seekwatcher
Chris Mason The Btrfs Filesystem
0 110 220 331 441 552 662 772 883Time (seconds)
0
12
25
37
50
MB
/s
Throughput
Ext3XFSBtrfs
0
30
60
90
120
Seek
s / s
ec
Seek Count
Ext3XFSBtrfs
0
7627
15255
22882
30510D
isk
offs
et (
MB
)Disk IO Ext3 Read
Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write
Figure: Create one million 16 KB files
0 54 108 162 216 270 324 378 432Time (seconds)
0
12
25
37
50
MB
/s
Throughput
Ext3Btrfs
0
11
22
33
45
Seek
s / s
ec
Seek Count
Ext3Btrfs
0
7627
15255
22882
30510D
isk
offs
et (
MB
)Disk IO
Ext3 ReadExt3 WriteBtrfs ReadBtrfs Write
Figure: Create one million 16 KB files
0 6 12 18 24 30 36 42 48Time (seconds)
0
5
10
15
20
MB
/s
Throughput
Ext3XFSBtrfs
0
62
125
187
250
Seek
s / s
ec
Seek Count
Ext3XFSBtrfs
0
6856
13712
20569
27425D
isk
offs
et (
MB
)Disk IO
Ext3 ReadExt3 WriteXFS ReadXFS WriteBtrfs Read
Figure: Find one million 16 KB files
0 114 228 342 456 570 684 799 913Time (seconds)
0
8
17
26
35
MB
/s
Throughput
XFSBtrfs
0
40
80
120
160
Seek
s / s
ec
Seek Count
XFSBtrfs
0
7439
14878
22317
29756D
isk
offs
et (
MB
)Disk IO
XFS ReadXFS WriteBtrfs ReadBtrfs Write
Figure: Read one million 16 KB files
O
Compilebench
Simulates operations on kernel trees
File sizes and names collected from real kernel trees
Randomly create, compile, patch, clean, delete
Intended to age and fragment the FS
Created 90 initial trees and ran 150 operations
http://oss.oracle.com/∼mason/compilebench
Chris Mason The Btrfs Filesystem
O
Compilebench
Btrfs Ext3 XFS
Initial Create 33.78 MB/s 10.31 MB/s 15.30 MB/s
Create 14.39 MB/s 12.47 MB/s 8.94 MB/s
Patch 3.77 MB/s 4.48 MB/s 3.19 MB/s
Compile 20.56 MB/s 15.28 MB/s 21.61 MB/s
Clean 69.96 MB/s 43.84 MB/s 134.15 MB/s
Read 5.37 MB/s 8.66 MB/s 7.05 MB/s
Read Compiled 11.31 MB/s 13.22 MB/s 14.40 MB/s
Delete 15.77s 18.34s 16.63s
Delete Compiled 22.96s 33.35s 21.38s
Stat 15.86s 11.35s 7.25s
Stat Compiled 21.82s 14.17s 8.99s
Chris Mason The Btrfs Filesystem
0 37 74 111 148 185 223 260 297Time (seconds)
0
12
25
37
50
MB
/s
Throughput
Ext3XFSBtrfs
0
75
150
225
300
Seek
s / s
ec
Seek Count
Ext3XFSBtrfs
0
8653
17307
25960
34614D
isk
offs
et (
MB
)Disk IO Ext3 Read
Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write
Figure: Create 20 kernel trees
0 96 192 288 384 480 576 672 769Time (seconds)
0
8
17
26
35
MB
/s
Throughput
Ext3XFSBtrfs
0
62
125
187
250
Seek
s / s
ec
Seek Count
Ext3XFSBtrfs
0
10233
20467
30700
40934D
isk
offs
et (
MB
)Disk IO Ext3 Read
Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write
Figure: Make -j simulated 20 kernel trees
0 61 122 184 245 307 368 429 491Time (seconds)
0
6
12
18
25
MB
/s
Throughput
Ext3XFSBtrfs
0
45
90
135
180
Seek
s / s
ec
Seek Count
Ext3XFSBtrfs
0
10233
20466
30700
40933D
isk
offs
et (
MB
)Disk IO Ext3 Read
Ext3 WriteXFS ReadXFS WriteBtrfs ReadBtrfs Write
Figure: Delete 20 kernel trees