Linux Symposium 2009 Slide Suzaki "Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)" http://www.linuxsymposium.org/2009/ Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf
1
Research Center for Information Security
Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)
@ Linux Symposium 2009, Montreal, Canada, 17/July
Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf
Kuniyasu Suzaki †, Toshiki Yagi †,
Kengo Iijima †, Nguyen Anh Quynh †,
Yoshihito Watanabe ††
2
Key words
• LBCAS: Loopback Content Addressable Storage
  – Virtual block device (network-transparent block device)
• readahead
  – Disk prefetch mechanism in the Linux kernel
    • The system call “readahead” is a different function.
• File system block reallocation
  – A kind of defrag tool
  – We developed “ext2/3optimizer”, which reallocates the data blocks referenced by i-nodes.
Today’s talk covers optimization methods that use them.
3
Today’s Contents
• Motivation
  – What is LBCAS used for?
  – Correlation among LBCAS, file system block reallocation (ext2/3optimizer), and disk prefetch (readahead)
• LBCAS: Loopback Content Addressable Storage
• Optimization: ext2/3optimizer and readahead
• Performance Results
• Conclusions
4
Motivation: What is “LBCAS” used for?
• LBCAS was developed for OS Circular.
• OS Circular is a project to distribute bootable disk images for virtual machines and real machines.
  – OS Circular project: http://openlab.jp/oscircular/
5
OS Circular (Big Picture)
[Figure: OS suppliers publish block files on an HTTP server and update them in a timely manner. Over the Internet, LBCAS (Loopback Content Addressable Storage) constructs a virtual disk from the block files for a virtual machine (KVM, QEMU) or a real machine, so users can try an OS without installation.]
6
Performance Issues (Today’s Main Topic)
• LBCAS is sensitive to access patterns.
  – Performance is affected by the number and size of disk prefetches (“readahead” in the Linux kernel).
• The number and size of readahead requests can be optimized by file system block reallocation.
  – Defrag tools are not enough, so we developed “ext2/3optimizer”.
ext2/3optimizer reallocates the blocks of ext2/3 based on an access profile:
• The number of readahead requests is reduced.
• The size of each readahead is extended.
• As a result, the performance of LBCAS is increased.
General technique
Presentation order: ③ ② ①
7
LBCAS: LoopBack Content Addressable Storage
• LBCAS = CAS + LoopBack
  – CAS
    • Indirect addressing by the SHA-1 digest of block contents
    • Benefit: identical blocks are expressed by the same SHA-1 digest, which reduces total storage.
    • Mainly used for archives. Example: Venti of Plan 9 [USENIX FAST’02]
  – LoopBack
    • Virtual block device: a file is used as a block device.
    • The file abstraction makes it easy to handle.
• LBCAS saves each block to a file, called a “block file”. The file is named by the SHA-1 digest of its contents.
• Block files are managed by a “mapping table” file, which is a table of physical addresses and SHA-1 file names.
8
Block files of LBCAS
[Figure: a block device holding ext2 (4KB pages) is split into 256KB blocks, each compressed by zlib and stored as a block file named by the SHA-1 digest of its contents. With LBCAS, the block files are reconstructed as a virtual disk.]
Mapping table (map01.idx) and block files:
Address              File Name
00000000-0003FFFF    4ad36ffe8…
00040000-0007FFFF    974daf34a…
00080000-000BFFFF    2d34ff3e1…
000C0000-000FFFFF    3310012a…
…                    …
9
LBCAS (1/2)
• An LBCAS image is made from an existing normal block device.
• The original block device is split into fixed-size blocks (64KB - 512KB), and each block is compressed by zlib.
• Block files are reconstructed into a loopback file by a FUSE wrapper.
  – FUSE is a user-land file system framework: http://fuse.sf.net
• Each block file is checked against its SHA-1 file name when it is mapped to the loopback file.
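The image-creation steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LBCAS tool itself: the function names `split_image` and `verify_and_load` are hypothetical, the mapping table is a plain list rather than the real `.idx` format, and the slides do not say whether the SHA-1 digest is taken before or after compression (the sketch hashes the stored, compressed file).

```python
import hashlib
import zlib

BLOCK_SIZE = 256 * 1024  # assumed default; the slides allow 64KB to 512KB


def split_image(image: bytes, block_size: int = BLOCK_SIZE):
    """Split a raw disk image into compressed, SHA-1-named block files.

    Returns (mapping, block_files). mapping is a list of
    (start_address, end_address, sha1_name) tuples; block_files maps
    sha1_name -> zlib-compressed bytes.
    """
    mapping = []
    block_files = {}
    for start in range(0, len(image), block_size):
        chunk = image[start:start + block_size]
        compressed = zlib.compress(chunk)
        # Assumption: the name is the digest of the stored (compressed) file.
        name = hashlib.sha1(compressed).hexdigest()
        block_files[name] = compressed  # identical blocks collapse to one file
        mapping.append((start, start + len(chunk) - 1, name))
    return mapping, block_files


def verify_and_load(name: str, compressed: bytes) -> bytes:
    """Check a block file against its SHA-1 name, then uncompress it."""
    if hashlib.sha1(compressed).hexdigest() != name:
        raise ValueError("block file does not match its SHA-1 name")
    return zlib.decompress(compressed)
```

Because the file name is derived from the contents, two identical blocks share one block file, and a corrupted or tampered file is rejected before it is mapped.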
10
Construct a virtual disk of LBCAS on a Client PC
[Figure: an OS running on the LBCAS virtual disk]
11
Structure of LBCAS
• Storage cache
  – Suppresses downloads
• Memory cache
  – Suppresses disk access and uncompression
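The two cache levels can be illustrated with a toy read path. This is a sketch under assumptions, not the LBCAS driver: the class name `BlockStore`, the `download` callable, and the single-entry memory cache are all hypothetical choices made to keep the example short.

```python
import zlib


class BlockStore:
    """Toy LBCAS read path: memory cache, then storage cache, then download."""

    def __init__(self, download):
        self.download = download   # callable: name -> compressed bytes
        self.storage_cache = {}    # name -> compressed bytes (stands in for local disk)
        self.memory_cache = {}     # name -> uncompressed bytes
        self.stats = {"download": 0, "storage": 0, "uncompress": 0, "memory": 0}

    def get(self, name: str) -> bytes:
        # Memory-cache hit: no disk access and no uncompression.
        if name in self.memory_cache:
            self.stats["memory"] += 1
            return self.memory_cache[name]
        # Storage-cache hit: no download, but the file must be uncompressed.
        if name in self.storage_cache:
            self.stats["storage"] += 1
            compressed = self.storage_cache[name]
        else:
            self.stats["download"] += 1
            compressed = self.download(name)
            self.storage_cache[name] = compressed
        self.stats["uncompress"] += 1
        data = zlib.decompress(compressed)
        # Assumption: the memory cache holds only the most recent block file.
        self.memory_cache = {name: data}
        return data
```

In this model every uncompression is preceded by either a download or a storage-cache hit, so the counters always satisfy download + storage = uncompress.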
12
LBCAS (2/2)
• When a file is updated or created on the original block device, the relevant block files are newly created with new SHA-1 file names. The mapping table file is also renewed.
  – Old block files are reusable.
• HTTP is used for file delivery.
  – It is the most popular protocol and is well designed for the Internet.
    • Inexpensive Web hosting services, proxies, and mirror servers can be used for worldwide deployment.
• Block files are network/storage transparent.
  – If the necessary block files are stored in local storage, a network connection is not necessary.
13
Partial Update of LBCAS
[Figure: after an update on the block device (apt-get install …), a new mapping table map02.idx is created. Compared with map01.idx (4ad36ffe8…, 974daf34a…, 2d34ff3e1…, 3310012a…, …), only the second entry changes, to dd4daf34a…; the other block files are the same files and are reusable by the FUSE driver.]
Create Once, Use Many
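The partial update on this slide amounts to comparing two mapping tables: only names that are absent from the old table have to be created and downloaded. A minimal sketch, assuming each mapping table is simply an ordered list of block-file names (the helper `new_block_files` and the short names in the usage note are hypothetical):

```python
def new_block_files(old_map, new_map):
    """Return the block-file names that appear only in the new mapping table."""
    old_names = set(old_map)
    seen = set()
    fresh = []
    for name in new_map:
        # Keep mapping-table order and list each new name only once.
        if name not in old_names and name not in seen:
            seen.add(name)
            fresh.append(name)
    return fresh
```

For example, with an old table `["h1", "h2", "h3", "h4"]` and a new table `["h1", "h5", "h3", "h4"]`, only `"h5"` has to be fetched; the other three block files are reused.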
14
Performance Issues
• LBCAS is sensitive to access patterns.
• There are 2 types of block size mismatch:
  (1) Between the file system and LBCAS (static mismatch)
    • ext2/3: 4KB block size
    • LBCAS: 64KB-512KB block size
    – Occupancy (the rate of necessary data in a block file) is low.
      » Kitagawa [LinuxKongress2006] reported that the occupancy was 30% for KNOPPIX 3.8.2 on 256KB LBCAS.
  (2) Between “readahead” (disk prefetch) and LBCAS (dynamic mismatch)
    • readahead: 4KB-128KB coverage size
    • LBCAS: 64KB-512KB block size
    – Many small accesses (worm-eaten access to a block file) cause redundant downloads and unnecessary uncompression in the LBCAS driver.
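To make the static mismatch concrete, the following sketch computes occupancy for a set of accessed 4KB file system blocks on 256KB block files. The function `occupancy` is a hypothetical helper; it ignores compression and assumes block files are aligned to the start of the device.

```python
FS_BLOCK = 4 * 1024        # ext2/3 block size from the slide
LBCAS_BLOCK = 256 * 1024   # one of the LBCAS block sizes from the slide


def occupancy(accessed_fs_blocks, fs_block=FS_BLOCK, lbcas_block=LBCAS_BLOCK):
    """Return (block_files_needed, occupancy) for accessed FS block numbers."""
    accessed = set(accessed_fs_blocks)
    if not accessed:
        return 0, 0.0
    per_file = lbcas_block // fs_block           # FS blocks held by one block file
    files = {b // per_file for b in accessed}    # block files that must be fetched
    needed_bytes = len(accessed) * fs_block
    fetched_bytes = len(files) * lbcas_block
    return len(files), needed_bytes / fetched_bytes
```

Four scattered blocks such as `[0, 64, 128, 192]` need four block files at an occupancy of 1/64, while the same four blocks packed as `[0, 1, 2, 3]` need one block file at an occupancy of 1/16.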
15
CAUTION for readahead
• Disk prefetch “readahead” and system call “readahead”
  – The system call “readahead” populates the page cache with data from a file. Thus, the whole data of a file is stored in the page cache. The coverage is the size of the file.
  – It is not directly related to the disk prefetch, but it achieves the same function from user space.
  – Some boot procedures use the system call “readahead”. The files that are loaded into the page cache in advance at boot time are listed in “/etc/readahead/boot,desktop”. We call this function “u-readahead” in this presentation.
16
Block size mismatch
• Solution (increasing locality of reference)
  1. (For static mismatch) Increase occupancy by reallocating the necessary data into the same block files.
  2. (For dynamic mismatch) Extend the coverage size of readahead through sequential access and a high page-cache hit rate.
• “ext2/3optimizer” repacks the data blocks of an ext2/3 file system so that they are laid out in line.
  – The repacking is based on the block access profile at boot time.
  – As a result, ext2/3optimizer reduces the number of block files that are needed.
17
Occupancy in a block file of LBCAS
• Occupancy (the rate of necessary data in a block file) depends on how the necessary data are laid out.
• “Worm-eaten” access (readahead) causes redundant downloads of block files.
[Figure: files on the ext2/3 file system (4KB blocks) are read in the order ①, ②, ③ via readahead (4KB~128KB); LBCAS (256KB) searches for the blocks and downloads the block files. Annotations: redundant block; cache missed and the coverage is shrunk; page-cache hit; occupancy is low.]
18
Readahead and LBCAS 1/2
• Readahead is a disk prefetch mechanism. The prefetched data are saved to the page cache.
• The coverage size is extended or shrunk according to the page-cache hit rate.
[Figure: during a sequential read from an application, I/O fills current_window (from start) and ahead_window (from ahead_start); as the sequential read continues, the windows are extended up to “max_readahead”.]
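The grow-and-shrink behaviour can be illustrated with a deliberately simplified model. This is not the kernel's readahead algorithm: the doubling rule, the reset to one page on a miss, and the function name `coverage_trace` are assumptions chosen only to show how hits widen the coverage toward the 128KB bound quoted on the slides.

```python
PAGE = 4 * 1024
MAX_READAHEAD = 128 * 1024   # upper bound quoted on the slides


def coverage_trace(hits, start=PAGE, maximum=MAX_READAHEAD):
    """Return the coverage size after each access.

    hits is an iterable of booleans: True for a sequential page-cache hit,
    False for a miss. Doubling and resetting are illustrative rules only.
    """
    size = start
    trace = []
    for hit in hits:
        size = min(size * 2, maximum) if hit else start
        trace.append(size)
    return trace
```

Under this model a run of sequential hits reaches the maximum coverage after five accesses, and a single miss drops the coverage back to one page, which is the "worm-eaten" pattern that hurts LBCAS.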
19
Readahead and LBCAS 2/2
• When a readahead is issued, a part of a block file is required and mapped to the virtual disk. The size depends on the coverage size of the readahead.
  – A wide readahead is effective for the LBCAS driver.
• When the same block file is required consecutively, the block file is kept in the memory cache of LBCAS and the uncompression is eliminated.
[Figure: the readahead windows (current_window, ahead_window, extended up to “max_readahead”) drive I/O to LBCAS, which downloads block files (D3E14…, 3B441…), maps them to the loopback device, and keeps the stored block file in its memory cache. Annotation: low occupancy caused by the size mismatch.]
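How many block files one readahead request touches follows from simple address arithmetic. The helper below is a hypothetical sketch that assumes block files are aligned to the start of the virtual disk.

```python
def block_files_for_request(offset, length, block_size):
    """Return the indices of the block files overlapped by one read request."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))
```

A 128KB request starting at offset 32KB overlaps three 64KB block files (indices 0, 1, 2) but only one 256KB block file, so smaller block files mean more files per request while larger ones mean more unused data per file.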
20
Readahead and Block Reallocation
• Readahead can be improved by block reallocation in the file system if the reallocation increases the page-cache hit rate.
• Defrag tools look as if they should work well …
  – Unfortunately, current defrag tools are not suitable, because they are designed to defragment individual files.
• We developed “ext2/3optimizer”, which reallocates the data blocks of ext2/3 based on an access profile.
  – It also increases occupancy in a block file.
21
Access profile and reallocation
[Figure: in both the profiling and the optimized setup, an application (user space) reads through VFS, the page cache (memory), the ext2/3 file system driver, and the loopback block driver to the device (kernel). A profiler exports the access profile via /proc/, and ext2/3optimizer uses it to reallocate blocks. Before reallocation the blocks are scattered and readahead is small and frequent (worm-eaten access); after reallocation the blocks are gathered and readahead is sequential access.]
22
Block Relocation: Ext2/3optimizer [LinuxKongress06]
• Data blocks are rearranged so that they are laid out in line. The structure of the metadata is not changed.
• The arrangement is based on the access profile.
• Features:
  – The normal file system driver is used.
  – Fragmentation occurs from the point of view of individual files.
  – The relocation increases page-cache hits, so readahead extends its coverage size.
[Figure: an i-node (mode, owner info, size, timestamps, direct blocks, indirect blocks, double indirect, triple indirect) before and after relocation. Annotations after relocation: high occupancy; readahead is widened.]
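The core idea of profile-based relocation can be sketched as building a mapping from old to new block numbers. This is an illustration only, not ext2/3optimizer's algorithm: the function `reallocation_plan` is hypothetical, it treats every block as a movable data block, and it leaves out the i-node pointer updates and reserved system blocks that a real tool has to handle.

```python
def reallocation_plan(access_profile, total_blocks):
    """Map old data-block numbers to new ones so profiled blocks come first.

    access_profile lists block numbers in the order they were read at boot
    (repeats allowed). Returns a dict old_block -> new_block covering every
    block in range(total_blocks).
    """
    order = []
    seen = set()
    for block in access_profile:
        if not 0 <= block < total_blocks:
            raise ValueError(f"block {block} is outside the device")
        if block not in seen:
            seen.add(block)
            order.append(block)
    # Blocks never read during boot keep their relative order after the hot set.
    order.extend(b for b in range(total_blocks) if b not in seen)
    return {old: new for new, old in enumerate(order)}
```

With the profile `[7, 2, 7, 5]` on an 8-block device, blocks 7, 2, and 5 move to positions 0, 1, and 2, so a sequential read from the top of the disk follows the boot-time access order.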
23
Performance Analysis
• Confirm the effect of ext2/3optimizer on LBCAS for booting.
  – Ubuntu 9.04 (kernel 2.6.28) installed on ext3 (8GB) with KVM-60.
    • The ext3 file system was optimized by ext2/3optimizer using the boot profile.
    • The disk image was converted to LBCAS (64KB - 512KB).
• Compared configurations:
  – normal
  – u-readahead: user-level readahead (system call) for booting
  – ext2/3optimizer
24
Static Analysis by DAVL (Disk Allocation Viewer for Linux)
[Figure: DAVL block maps for normal and ext2/3opt. Legend: non-contiguous block, system block, contiguous block.]
• normal: fragmentation 0.21%
• ext2/3opt: fragmentation 1.11%
25
Utilization of I/O
[Figure: BootChart I/O utilization for normal, u-readahead, and ext2/3opt. Annotations: I/O spike; reduced I/O.]
• BootChart showed the utilization of I/O.
  – u-readahead caused a spike of I/O.
26
Dynamic Analysis: Disk Access at Boot Time
• ext2/3optimizer relocates the data blocks that are required at boot time to the top of the virtual disk.
[Figure: disk accesses at boot time. X axis: Address (GB), 0 to 8.0; Y axis: Time (s). Red: normal; blue: ext2/3opt.]
27
Trace of readahead coverage size
[Figure: three panels (normal, u-readahead, ext2/3opt). X axis: Time (s), 0 to 60; Y axis: coverage size, 0KB to 128KB. Annotation: fewer small readaheads.]
28
Frequency for each readahead coverage
[Figure: histogram. X axis: request size (KB), 0 to 128; Y axis: frequency.]
• ext2/3optimizer reduced small “readahead” requests.
29
Volume Transition on Processing Level
• Volume of files (number, average): 203MB (2,248, Av: 92KB)
• Volume of required blocks: 127MB
• Volume of access, which includes the coverage of readahead (frequency, average size):

                        normal    u-readahead   ext2/3opt
Volume                  208MB     231MB         140MB
Frequency               6,379     5,827         2,129
Average size            33KB      41KB          67KB
Excess over 127MB       +81MB     +104MB        +13MB

• Volume of downloaded block files in MB, (uncompressed MB), occupancy % (127MB / uncompressed MB):

LBCAS size   normal              u-readahead         ext2/3opt
512KB        144 (474), 26.9%    153 (508), 25.1%    55.6 (176), 71.8%
256KB        114 (358), 35.5%    123 (386), 35.0%    55.6 (159), 80.0%
128KB        96.8 (290), 43.9%   104 (315), 40.3%    55.3 (149), 85.3%
64KB         86.1 (247), 51.5%   93.4 (272), 46.9%   55.3 (144), 88.7%

Other annotations on the slide: 1/2, 1/3, 2; 76MB (67%)
30
Consumed time in LBCAS
[Figure: bar charts of time (s) for normal, u-readahead, and ext2/3opt at each LBCAS block size. Data labels as extracted (series assignment not recoverable): 13 13 13 20 14 14 12 19 7 6 6 7; 5.0 6.5 9.0 14.0; 5.7 4.6 4.7 3.1 6.6 5.8 2.9 4.5 3.6 2.7 1.7 1.1; 5.2 6.7 7.3 11.4; 2.5 2.8 3.5 4.8; 43 43 42 37 43 43 45 38 45 45 46 44.]
• 512KB was not efficient for any of the optimizations.
31
Total download of LBCAS
[Figure: cumulative download (MB, 20 to 140) over time (s). Markers: + normal, □ u-readahead, × ext2/3opt.]
• ext2/3opt reduced the necessary block files (256KB).
• The system call “readahead” downloaded the required files in advance, which caused an I/O spike. It also included redundant data.
32
Frequency of function in LBCAS
Relations: R = ①+②+③; U+M = ①+②*2+③*3; D+S = U (①, ②, ③: requests that touch 1, 2, or 3 block files)

normal (Av: 33KB)
LBCAS size   Requests (R)   Download (D)   Uncompress (U)   Storage Cache (S)   Memory Cache (M)   Files per request
512KB        6,395          848            2,717            1,769               4,019              ① 6,054 ② 341
256KB        6,379          1,435          3,183            1,748               3,908              ① 5,667 ② 717
128KB        6,381          2,321          4,050            1,729               3,793              ① 4,919 ② 1,462
64KB         6,338          3,958          5,621            1,663               3,647              ① 4,148 ② 1,450 ③ 740

u-readahead (Av: 41KB)
LBCAS size   Requests (R)   Download (D)   Uncompress (U)   Storage Cache (S)   Memory Cache (M)   Files per request
512KB        5,822          1,015          2,187            1,172               4,023              ① 5,434 ② 388
256KB        5,827          1,544          2,723            1,179               3,908              ① 5,023 ② 804
128KB        5,834          2,526          3,726            1,200               3,761              ① 4,181 ② 1,653
64KB         5,825          4,344          5,516            1,172               3,626              ① 3,537 ② 1,259 ③ 1,029

ext2/3opt (Av: 67KB)
LBCAS size   Requests (R)   Download (D)   Uncompress (U)   Storage Cache (S)   Memory Cache (M)   Files per request
512KB        2,132          353            870              517                 1,520              ① 1,874 ② 258
256KB        2,129          634            1,210            576                 1,409              ① 1,639 ② 490
128KB        2,148          1,189          1,782            593                 1,398              ① 1,116 ② 1,032
64KB         2,165          2,296          2,922            626                 1,311              ① 941 ② 380 ③ 844

Annotations: I/O requests are independent of the LBCAS block size; download is reduced; uncompress is reduced.
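The three relations printed on this slide (R = ①+②+③, U+M = ①+②*2+③*3, D+S = U) can be checked mechanically against a row of the table. The helper `check_row` below is a hypothetical sketch; it reads ①, ②, ③ as the numbers of requests that touch one, two, or three block files, which is an interpretation of the slide rather than something it states.

```python
def check_row(r, d, u, s, m, files_per_request):
    """Check the slide's three identities for one row of the table.

    files_per_request is (n1, n2, n3): requests touching 1, 2, or 3 block files.
    Returns a dict mapping each identity to True or False.
    """
    n1, n2, n3 = files_per_request
    return {
        "R = 1+2+3": r == n1 + n2 + n3,
        "U+M = 1+2*2+3*3": u + m == n1 + 2 * n2 + 3 * n3,
        "D+S = U": d + s == u,
    }
```

For the ext2/3opt row at 512KB (R = 2,132, D = 353, U = 870, S = 517, M = 1,520, ① 1,874, ② 258), all three identities hold.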
33
Discussions
• Weak points of ext2/3optimizer
  – The reallocation is customized for booting. Other applications may be adversely affected.
    • We expect that the boot procedure is special and has no strong relation to other applications.
  – The reallocation is customized for a certain version. When a part of the boot procedure is updated, the image has to be re-optimized.
34
Conclusions
• “ext2/3optimizer” is a strong tool for making use of “readahead”, because it reallocates the data blocks that are used by the boot procedure.
  – It increased the occupancy (the rate of necessary data in a block file) of LBCAS block files.
  – It doubled the coverage of readahead and halved the number of readahead requests.
• “ext2/3optimizer” is not specific to LBCAS. It can be used for normal Linux distributions.
35
Summary
Some services are available. Just try them: http://openlab.jp/oscircular/
• ext2/3optimizer developers: http://unit.aist.go.jp/itri/knoppix/ext2optimizer/index-en.htm
• DAVL developers: http://sourceforge.net/projects/davl/
• BootChart: http://www.bootchart.org/