
Linux Symposium 2009 Slide Suzaki "Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)"

Conference site: http://www.linuxsymposium.org/2009/

Page 1: Linux Symposium 2009 Slide Suzaki "Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)"


Research Center for Information Security

Effect of readahead and file system block reallocation for LBCAS (LoopBack Content Addressable Storage)

@ Linux Symposium 2009, Montreal, Canada, 17/July

Paper: http://www.kernel.org/doc/ols/2009/ols2009-pages-275-286.pdf

Kuniyasu Suzaki †, Toshiki Yagi †,

Kengo Iijima †, Nguyen Anh Quynh †,

Yoshihito Watanabe ††

††

Page 2

Key words

• LBCAS: Loopback Content Addressable Storage
  – Virtual block device (network-transparent block device)

• readahead
  – Disk prefetch mechanism in the Linux kernel
  – The system call “readahead” is a different function.

• File system block reallocation
  – A kind of defrag tool
  – We developed “ext2/3optimizer”, which reallocates the data blocks of i-nodes.

Today’s talk presents optimization methods that use them.

Page 3

Today’s Contents

• Motivation
  – What is LBCAS used for?
  – Correlation among LBCAS, file system block reallocation (ext2/3optimizer), and disk prefetch (readahead)

• LBCAS: Loopback Content Addressable Storage

• Optimization: ext2/3optimizer and readahead

• Performance Results

• Conclusions

Page 4

Motivation: What is “LBCAS” used for?

• LBCAS was developed for OS Circular.

• OS Circular is a project to distribute bootable disk images for virtual machines and real machines.
  – OS Circular project: http://openlab.jp/oscircular/

Page 5

OS Circular (Big Picture)

[Figure: OS suppliers update block files on an HTTP server in a timely manner. Over the Internet, LBCAS (Loopback Content Addressable Storage) constructs a virtual disk from the block files, so users can try an OS without installation on a virtual machine (KVM, QEMU) or on a real machine.]

Page 6

Performance Issues (Today’s Main Topic)

• LBCAS is sensitive to access patterns.
  – Performance is affected by the number and size of disk prefetches (“readahead” in the Linux kernel).
• The number and size of readaheads can be optimized by file system block reallocation.
  – Defrag tools are not enough, so we developed “ext2/3optimizer”.

ext2/3optimizer reallocates the blocks of ext2/3 based on an access profile.
• The number of readaheads is reduced.
• The size of each readahead is extended.
As a result, the performance of LBCAS is increased.

[Figure labels: “General Technique”; presentation order ③ ② ①.]

Page 7

LBCAS: LoopBack Content Addressable Storage

• LBCAS = CAS + LoopBack
  – CAS
    • Indirect addressing by the SHA-1 digest of block contents
    • Benefit: identical blocks are expressed by the same SHA-1 digest, so total storage is reduced.
    • Mainly used for archives. Example: Venti of Plan 9 [USENIX FAST’02]
  – LoopBack
    • Virtual block device: a file is used as a block device.
    • The file abstraction makes the device easy to handle.

• LBCAS saves each block to a file, called a “block file”. The file is named by the SHA-1 digest of its contents.

• Block files are managed by a “mapping table” file, which maps each physical address to a SHA-1 file name.
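The CAS idea above can be sketched in a few lines of Python. This is a minimal illustration, not the LBCAS implementation: the class and function names are hypothetical, the store is an in-memory dict rather than a directory of block files, and the mapping table is assumed to be a plain list indexed by block number.

```python
import hashlib


class BlockStore:
    """Toy content-addressable store: each block file is keyed by its SHA-1 hex digest."""

    def __init__(self):
        self.files = {}  # SHA-1 name -> block contents

    def put(self, data):
        name = hashlib.sha1(data).hexdigest()
        # Identical contents produce the same name, so a duplicate block is stored only once.
        self.files.setdefault(name, data)
        return name


def build_mapping_table(store, blocks):
    """Hypothetical mapping table: entry i is the block-file name for block i of the device."""
    return [store.put(block) for block in blocks]


store = BlockStore()
zero = bytes(4096)
table = build_mapping_table(store, [zero, b"boot" * 1024, zero, zero])
print(len(table), len(store.files))  # 4 2
```

Four table entries point at only two stored block files, because the three zero-filled blocks share one SHA-1 name.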

Page 8

Block files of LBCAS

[Figure: an ext2 file system (4KB pages) on a block device is split into 256KB blocks, each compressed by zlib and saved as a block file named by the SHA-1 digest of its contents. LBCAS re-constructs the block files as a virtual disk using the mapping table map01.idx.]

Mapping table (map01.idx):
Address           | File Name
00000000-0003FFFF | 4ad36ffe8…
00040000-0007FFFF | 974daf34a…
00080000-000BFFFF | 2d34ff3e1…
000C0000-000FFFFF | 3310012a…
…                 | …
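The mapping table is a lookup from address range to block-file name. A minimal sketch, assuming 256KB blocks as in the example table (the helper names are hypothetical), reproduces the address column and shows how a byte offset on the virtual disk selects a table row.

```python
BLOCK_SIZE = 256 * 1024  # 256KB LBCAS blocks, as in the example mapping table


def address_range(index, block_size=BLOCK_SIZE):
    """Address range (hex, inclusive) covered by mapping-table row `index`."""
    start = index * block_size
    return f"{start:08X}-{start + block_size - 1:08X}"


def locate(offset, block_size=BLOCK_SIZE):
    """Mapping-table row and offset inside that block file for a virtual-disk byte offset."""
    return divmod(offset, block_size)


for i in range(4):
    print(address_range(i))   # 00000000-0003FFFF, 00040000-0007FFFF, ...
print(locate(0x81000))        # (2, 4096)
```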

Page 9

LBCAS (1/2)

• An LBCAS image is made from an existing normal block device.

• The original block device is split into fixed-size blocks (64KB to 512KB), and each block is compressed by zlib.

• The block files are reconstructed into a loopback file by a FUSE wrapper.
  – FUSE is a user-land file system.
    • http://fuse.sf.net

• When a block file is mapped to the loopback file, it is measured: its SHA-1 digest is checked against its SHA-1 file name.
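The build and load steps above can be sketched as follows. This is a hypothetical illustration, not the LBCAS tool: block files live in a dict, and it assumes the SHA-1 digest is computed over the compressed bytes that are actually stored, so a client can check a block file against its name before decompressing it.

```python
import hashlib
import zlib


def make_block_files(image, block_size):
    """Split a disk image into fixed-size blocks, compress each with zlib, name it by SHA-1."""
    table, files = [], {}
    for start in range(0, len(image), block_size):
        compressed = zlib.compress(image[start:start + block_size])
        # ASSUMPTION: the digest is taken over the stored (compressed) bytes.
        name = hashlib.sha1(compressed).hexdigest()
        files[name] = compressed  # identical blocks collapse into one block file
        table.append(name)
    return table, files


def load_block(name, files):
    """Measure a block file against its SHA-1 name before mapping it, then decompress."""
    data = files[name]
    if hashlib.sha1(data).hexdigest() != name:
        raise ValueError(f"block file {name} does not match its SHA-1 name")
    return zlib.decompress(data)


# Hypothetical 256KB image: three zero-filled 64KB blocks and one patterned block.
image = bytes(3 * 64 * 1024) + bytes(range(256)) * 256
table, files = make_block_files(image, 64 * 1024)
print(len(table), len(files))  # 4 2
restored = b"".join(load_block(n, files) for n in table)
print(restored == image)       # True
```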

Page 10

Construct a virtual disk of LBCAS on a Client PC


Page 11

Structure of LBCAS

• Storage cache
  – Suppresses downloads

• Memory cache
  – Suppresses disk access and decompression
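The two caches can be modelled as a small read path. This is a hypothetical sketch, not the LBCAS driver: the memory cache is assumed to be an LRU of decompressed block files (one slot by default), the storage cache is a set of names, and the counters are assumed to mean memory-cache hits (M), storage-cache hits (S), decompressions (U) and downloads (D), following the labels used in the measurements later in the talk. Under this model every access is either M or U, and every U is either S or D.

```python
from collections import OrderedDict


class LbcasReader:
    """Hypothetical model of the LBCAS read path with two caches and per-step counters."""

    def __init__(self, memory_slots=1):
        self.storage_cache = set()          # compressed block files already on local disk
        self.memory_cache = OrderedDict()   # decompressed block files, least recently used first
        self.memory_slots = memory_slots
        self.counts = {"M": 0, "S": 0, "U": 0, "D": 0}

    def access(self, name):
        if name in self.memory_cache:       # memory cache hit: no disk access, no decompression
            self.counts["M"] += 1
            self.memory_cache.move_to_end(name)
            return
        if name in self.storage_cache:      # storage cache hit: no download
            self.counts["S"] += 1
        else:                               # miss: download the block file (over HTTP)
            self.counts["D"] += 1
            self.storage_cache.add(name)
        self.counts["U"] += 1               # every memory-cache miss needs a decompression
        self.memory_cache[name] = True
        if len(self.memory_cache) > self.memory_slots:
            self.memory_cache.popitem(last=False)  # evict the least recently used block


reader = LbcasReader(memory_slots=1)
for name in ["a", "a", "b", "a"]:
    reader.access(name)
print(reader.counts)  # {'M': 1, 'S': 1, 'U': 3, 'D': 2}
```

The second "a" is served from memory; the last "a" was evicted by "b", so it is decompressed again but not downloaded again.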

Page 12

LBCAS (2/2)

• When a file is updated or created on the original block device, the affected block files are newly created with new SHA-1 file names, and the mapping table file is renewed.
  – Old block files remain reusable.

• HTTP for file delivery
  – The most popular protocol, and well designed for the Internet.
    • Inexpensive web hosting services, proxies, and mirror servers can be used for worldwide deployment.

• Block files are network/storage transparent.
  – If the necessary block files are stored in local storage, no network connection is needed.
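Because block files are addressed by content, a client that already holds an older image only needs the block files that are new in the updated mapping table. A minimal sketch with hypothetical names (h1 to h5 stand in for SHA-1 file names):

```python
def files_to_fetch(new_table, local_files):
    """Block files named in the new mapping table that are not already stored locally."""
    missing = []
    for name in new_table:
        if name not in local_files and name not in missing:
            missing.append(name)
    return missing


# Hypothetical names: the update rewrote only the second block.
map01 = ["h1", "h2", "h3", "h4"]
map02 = ["h1", "h5", "h3", "h4"]
print(files_to_fetch(map02, set(map01)))  # ['h5']
```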

Page 13

Partial Update of LBCAS

[Figure: an update (apt-get install …) on the original ext2 block device (4KB pages, 256KB blocks) produces a new mapping table, map02.idx (4ad36ffe8…, dd4daf34a…, 2d34ff3e1…, 3310012a…, …). Only the changed block gets a new block file; the other block files are the same files listed in map01.idx (4ad36ffe8…, 974daf34a…, 2d34ff3e1…, 3310012a…, …) and are reusable by the FUSE driver.]

Create Once, Use Many

Page 14

Performance Issues

• LBCAS is sensitive to access patterns.
• There are two types of block size mismatch:

(1) Between the file system and LBCAS (static mismatch)
  • ext2/3: 4KB block size
  • LBCAS: 64KB to 512KB block size
  – Occupancy (the rate of necessary data in a block file) is low.
    » Kitagawa [LinuxKongress2006] reported that occupancy was 30% for KNOPPIX 3.8.2 on 256KB LBCAS.

(2) Between “readahead” (disk prefetch) and LBCAS (dynamic mismatch)
  • readahead: 4KB to 128KB coverage size
  • LBCAS: 64KB to 512KB block size
  – Many small accesses (worm-eaten access to a block file) cause redundant downloads and unnecessary decompression in the LBCAS driver.
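Occupancy can be computed directly from the set of 4KB file system blocks that are actually read. The sketch below is a simplification with a hypothetical helper: it assumes every touched block file is downloaded whole and counts uncompressed sizes, which shows why scattered data give low occupancy in large LBCAS blocks.

```python
def occupancy(accessed, fs_block=4 * 1024, lbcas_block=256 * 1024):
    """Fraction of downloaded (uncompressed) bytes that were actually needed.

    accessed: set of file system block numbers (4KB each) read during boot.
    """
    per_file = lbcas_block // fs_block
    block_files = {b // per_file for b in accessed}
    return len(accessed) * fs_block / (len(block_files) * lbcas_block)


scattered = {0, 64, 128, 192}   # one needed 4KB block in each of four 256KB block files
packed = {0, 1, 2, 3}           # the same amount of data gathered into one block file
print(occupancy(scattered), occupancy(packed))  # 0.015625 0.0625
```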

Page 15

CAUTION for readahead

• Disk prefetch “readahead” vs. system call “readahead”
  – The system call “readahead” populates the page cache with data from a file. When it is applied to a whole file, all of the file’s data are stored in the page cache, so the coverage is the size of the file.
  – It is not directly related to the kernel’s disk prefetch, but it achieves the same function from user space.
  – Some boot procedures use the system call “readahead”. The files whose data are loaded into the page cache in advance at boot time are listed in “/etc/readahead/boot” and “/etc/readahead/desktop”. We call this function “u-readahead” in this presentation.
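A u-readahead step can be sketched in Python. Python's os module has no wrapper for the readahead(2) system call, so this hypothetical sketch swaps in os.posix_fadvise with POSIX_FADV_WILLNEED, which asks the kernel to start reading the file into the page cache, and falls back to simply reading the file where fadvise is unavailable. The one-path-per-line list format is an assumption, not the documented format of the /etc/readahead files.

```python
import os


def preload(list_path):
    """Hypothetical u-readahead: pull every file named in list_path into the page cache.

    ASSUMPTION: the list has one absolute path per line.
    Returns the number of files that were preloaded.
    """
    count = 0
    with open(list_path) as listing:
        for line in listing:
            path = line.strip()
            if not path or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                if hasattr(os, "posix_fadvise"):
                    # offset 0, length 0 = whole file; WILLNEED asks the kernel to read it ahead.
                    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
                else:
                    # Fallback where posix_fadvise is unavailable: reading the file caches it too.
                    while os.read(fd, 1 << 20):
                        pass
            finally:
                os.close(fd)
            count += 1
    return count
```

For example, `preload("/etc/readahead/boot")` would preload every listed file, provided the list really is one path per line.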

Page 16

Block size mismatch

• Solution (increasing locality of reference)
  1. (For static mismatch) Increase occupancy by gathering the necessary data into the same block files.
  2. (For dynamic mismatch) Extend the coverage size of readahead through sequential access and a high page-cache hit rate.

• “ext2/3optimizer” repacks the data blocks of an ext2/3 file system so that they are in line.
  – The repacking is based on the block access profile at boot time.
  – As a result, ext2/3optimizer reduces the number of block files needed.

Page 17

Occupancy in a block file of LBCAS

• Occupancy (the rate of necessary data in a block file) depends on how the necessary data are placed.
• “Worm-eaten” access (small readaheads) causes redundant downloads of block files.

[Figure: files are read in order and accessed on disk via readahead (4K to 128K) over an ext2/3 file system (4K blocks) on 256KB LBCAS. Some reads hit the page cache; when the cache misses, the coverage is shrunk. Block search downloads whole block files that also contain redundant blocks, so occupancy is low.]

Page 18

Readahead and LBCAS 1/2

[Figure: a sequential read from the application starts at “start” inside current_window, and I/O for ahead_window is issued from “ahead_start”. As the sequential read continues, the windows advance and are extended up to “max_readahead”.]

• Readahead is a disk prefetch mechanism. The prefetched data are saved in the page cache.
• The coverage size is extended or shrunk according to the page-cache hit rate.
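The grow-and-shrink behaviour can be illustrated with a toy model. This is a deliberate simplification and not the kernel's readahead algorithm (which tracks current_window and ahead_window and issues the ahead I/O asynchronously): here a miss right after a sequential read doubles the window up to 128KB (32 pages of 4KB), and any other miss resets it to an assumed initial size of 4 pages.

```python
def readahead_sizes(pages, initial=4, max_pages=32):
    """Simplified readahead model (not the kernel's algorithm).

    pages: page numbers (4KB pages) read by an application, in order.
    Returns the size, in pages, of every readahead issued. max_pages=32 is 128KB.
    """
    cached = set()
    sizes = []
    size = initial
    prev = None
    for page in pages:
        if page not in cached:
            if prev is not None and page == prev + 1:
                size = min(size * 2, max_pages)  # sequential: widen the window
            else:
                size = initial                   # random access: start small again
            cached.update(range(page, page + size))
            sizes.append(size)
        prev = page
    return sizes


print(readahead_sizes(range(100)))       # [4, 8, 16, 32, 32, 32]
print(readahead_sizes([0, 50, 7, 90]))   # [4, 4, 4, 4]
```

Sequential reading reaches the 128KB maximum after a few steps, while scattered reads keep every readahead small, which is the worm-eaten pattern that hurts LBCAS.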

Page 19

Readahead and LBCAS 2/2

[Figure: the readahead windows (current_window, ahead_window, extended up to “max_readahead”) mapped onto LBCAS. Each readahead I/O makes LBCAS download block files (e.g. D3E14…, 3B441…) and map them to the loopback device; downloaded block files are kept in the storage cache and the memory cache.]

• When a readahead is issued, part of a block file is required and mapped to the virtual disk. The size depends on the coverage size of the readahead.
  – A wide readahead is effective for the LBCAS driver.
• When the same block file is required by consecutive requests, it is kept in the LBCAS memory cache and decompression is skipped.

Low occupancy is caused by the size mismatch.
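Which block files a readahead needs follows from simple integer division. The helper below is hypothetical; it shows that a request touches more than one block file when it crosses a block boundary, which happens more often as the LBCAS block size shrinks toward the readahead size. The measurements later in the talk count requests by how many block files (1, 2 or 3) they touch.

```python
def block_files_for_request(offset, length, block_size):
    """Indices of the LBCAS block files touched by one readahead request of `length` bytes."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


KB = 1024
print(block_files_for_request(0, 128 * KB, 256 * KB))         # [0]
print(block_files_for_request(192 * KB, 128 * KB, 256 * KB))  # [0, 1]
print(block_files_for_request(32 * KB, 128 * KB, 64 * KB))    # [0, 1, 2]
```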

Page 20

Readahead and Block Reallocation

• Readahead can be improved by reallocating file system blocks, if this increases the page-cache hit rate.

• Defrag tools look as if they would work well …
  – Unfortunately, current defrag tools are not suitable, because they are developed from the viewpoint of defragmenting individual files.

• We developed “ext2/3optimizer”, which reallocates the data blocks of ext2/3 based on an access profile.
  – It also increases occupancy in a block file.

Page 21

Access profile and reallocation

[Figure: left, profiling: App → VFS → page cache (memory) → file system driver (ext2/3) with a profiler → block driver (loopback) → device; the access profile is exported to user space via /proc/. Right, after the user-level ext2/3optimizer reallocates the blocks, the same kernel stack is used unchanged. Before reallocation the blocks are scattered and readahead is small and many (worm-eaten access); afterwards the blocks are gathered and readahead is sequential access.]

Page 22

Block Relocation: Ext2/3optimizer [LinuxKongress06]

[Figure: i-node structure (mode, owner info, size, timestamps, direct blocks, indirect blocks, double indirect, triple indirect) before and after relocation. After relocation the data blocks are in line: occupancy is high and readahead is widened.]

• Data blocks are rearranged to be in line. The structure of the metadata is not changed.
• The arrangement is based on the access profile.
• Features:
  – The normal file system driver is used.
  – Fragmentation occurs from the viewpoint of individual files.
  – The relocation increases page-cache hits, so readahead extends its coverage size.
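The core of profile-based reallocation is a remapping from old to new block positions. The sketch below is hypothetical and leaves out everything that makes the real tool work on an ext2/3 image (moving displaced blocks, rewriting i-node direct and indirect pointers, free-space checks); it only assigns boot-time blocks consecutive positions in first-access order, starting at an assumed free region, and counts how many 256KB block files they then occupy.

```python
def reallocate(profile, first_free=0):
    """Assign each profiled data block a new, contiguous position in first-access order.

    profile: old physical block numbers in the order they were read at boot.
    Returns {old block number: new block number}.
    """
    mapping = {}
    for old in profile:
        if old not in mapping:                      # later re-reads keep the first position
            mapping[old] = first_free + len(mapping)
    return mapping


def block_files_touched(blocks, per_file=64):
    """How many LBCAS block files hold these 4KB blocks (64 per file = 256KB)."""
    return len({b // per_file for b in blocks})


profile = [5000, 130, 9000, 130, 70000, 4100]
mapping = reallocate(profile)
print(mapping)  # {5000: 0, 130: 1, 9000: 2, 70000: 3, 4100: 4}
print(block_files_touched(profile), block_files_touched(mapping.values()))  # 5 1
```

Placing blocks in read order rather than per file is also why per-file fragmentation rises after reallocation.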

Page 23

Performance Analysis

• Confirm the effect of ext2/3optimizer on LBCAS for booting.
  – Ubuntu 9.04 (2.6.28) installed on ext3 (8GB) with KVM-60.
    • The ext3 file system was optimized by ext2/3optimizer for the boot profile.
    • The disk image was translated to LBCAS (64KB to 512KB).

• Compare:
  – normal
  – u-readahead: user-level readahead (system call) for booting
  – ext2/3optimizer (ext2/3opt)

Page 24

Static Analysis by DAVL (Disk Allocation Viewer for Linux)

[Figure: DAVL block maps of the normal and ext2/3opt images; legend: contiguous block, non-contiguous block, system block.]

Fragmentation: normal 0.21%, ext2/3opt 1.11%

Page 25

Utilization of I/O

[Figure: BootChart I/O utilization for normal, u-readahead and ext2/3opt, annotated “I/O Spike” and “Reduced I/O”.]

• BootChart showed the utilization of I/O.
  – u-readahead caused a spike of I/O.

Page 26

Dynamic Analysis: Disk Access at Boot Time

• ext2/3optimizer relocates the data blocks required at boot time to the top of the virtual disk.

[Figure: accessed disk address (0 to 8.0 GB) versus time (s) during boot. Red: normal; blue: ext2/3opt.]

Page 27

Trace of readahead coverage size

[Figure: readahead coverage size (0KB to 128KB) versus time (0 to 60 s) in three panels: normal, u-readahead and ext2/3opt. Annotation: fewer small readaheads.]

Page 28

Frequency for each readahead coverage

[Figure: frequency versus request size (0, 32, 64 and 128 KB).]

• ext2/3optimizer reduced the number of small readaheads.

Page 29

Volume transition at each processing level

Volume of access including readahead coverage (frequency, average size):
  normal: 208MB (freq: 6,379, size: 33KB), +81MB over the required blocks
  u-readahead: 231MB (freq: 5,827, size: 41KB), +104MB over the required blocks
  ext2/3opt: 140MB (freq: 2,129, size: 67KB), +13MB over the required blocks
Volume of required blocks: 127MB
Volume of files (number, average): 203MB (2,248 files, Av: 92KB)

• Volume of downloaded block files: MB (uncompressed MB), occupancy % (127MB / uncompressed MB)

LBCAS size | normal            | u-readahead       | ext2/3opt
512KB      | 144 (474), 26.9%  | 153 (508), 25.1%  | 55.6 (176), 71.8%
256KB      | 114 (358), 35.5%  | 123 (386), 35.0%  | 55.6 (159), 80.0%
128KB      | 96.8 (290), 43.9% | 104 (315), 40.3%  | 55.3 (149), 85.3%
64KB       | 86.1 (247), 51.5% | 93.4 (272), 46.9% | 55.3 (144), 88.7%

Other figure annotations: “1/2”, “1/3”, “2”, “76MB (67%)”.

Page 30

Consumed time in LBCAS

[Figure: time (s) consumed in LBCAS for normal, u-readahead and ext2/3opt at each LBCAS block size (64KB to 512KB).]

• 512KB was not efficient for any of the optimizations.

Page 31

Total download of LBCAS

[Figure: download (MB, 0 to 140) versus time (s); + normal, □ u-readahead, × ext2/3opt.]

• ext2/3opt reduced the block files needed (256KB LBCAS).
• The system call “readahead” downloaded the required files in advance. This caused an I/O spike and also fetched redundant data.

Page 32

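Frequency of functions in LBCAS

Files per request: ①, ②, ③ = requests that touch 1, 2, 3 block files
R = ① + ② + ③
U + M = ① + ②×2 + ③×3
D + S = U

normal (Av: 33KB)
LBCAS size | Requests (R) | Files per request     | Memory cache (M) | Storage cache (S) | Uncompress (U) | Download (D)
512KB      | 6,395        | ① 6,054 ② 341         | 4,019            | 1,769             | 2,717          | 848
256KB      | 6,379        | ① 5,667 ② 717         | 3,908            | 1,748             | 3,183          | 1,435
128KB      | 6,381        | ① 4,919 ② 1,462       | 3,793            | 1,729             | 4,050          | 2,321
64KB       | 6,338        | ① 4,148 ② 1,450 ③ 740 | 3,647            | 1,663             | 5,621          | 3,958

u-readahead (Av: 41KB)
LBCAS size | Requests (R) | Files per request       | Memory cache (M) | Storage cache (S) | Uncompress (U) | Download (D)
512KB      | 5,822        | ① 5,434 ② 388           | 4,023            | 1,172             | 2,187          | 1,015
256KB      | 5,827        | ① 5,023 ② 804           | 3,908            | 1,179             | 2,723          | 1,544
128KB      | 5,834        | ① 4,181 ② 1,653         | 3,761            | 1,200             | 3,726          | 2,526
64KB       | 5,825        | ① 3,537 ② 1,259 ③ 1,029 | 3,626            | 1,172             | 5,516          | 4,344

ext2/3opt (Av: 67KB)
LBCAS size | Requests (R) | Files per request   | Memory cache (M) | Storage cache (S) | Uncompress (U) | Download (D)
512KB      | 2,132        | ① 1,874 ② 258       | 1,520            | 517               | 870            | 353
256KB      | 2,129        | ① 1,639 ② 490       | 1,409            | 576               | 1,210          | 634
128KB      | 2,148        | ① 1,116 ② 1,032     | 1,398            | 593               | 1,782          | 1,189
64KB       | 2,165        | ① 941 ② 380 ③ 844   | 1,311            | 626               | 2,922          | 2,296

• I/O requests are independent of the LBCAS size.
• Download is reduced.
• Uncompress is reduced.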
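The slide's identities (R = ① + ② + ③, U + M = ① + 2×② + 3×③, D + S = U) can be checked against the reported rows. In this sketch the hypothetical parameters r1, r2 and r3 stand for ①, ② and ③; the numbers are copied from the ext2/3opt rows for 256KB and 64KB LBCAS.

```python
def check_row(r1, r2, r3, requests, memory, storage, uncompress, download):
    """Check the slide's identities for one row of the table.

    r1, r2, r3: requests that touched 1, 2, 3 block files.
    """
    block_file_accesses = r1 + 2 * r2 + 3 * r3
    return (requests == r1 + r2 + r3
            and block_file_accesses == uncompress + memory
            and uncompress == download + storage)


# ext2/3opt with 256KB LBCAS
print(check_row(1639, 490, 0, requests=2129, memory=1409, storage=576,
                uncompress=1210, download=634))   # True
# ext2/3opt with 64KB LBCAS (the only size where ③ appears)
print(check_row(941, 380, 844, requests=2165, memory=1311, storage=626,
                uncompress=2922, download=2296))  # True
```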

Page 33

Discussions

• Weak points of ext2/3optimizer
  – The reallocation is customized for booting. Other applications may be adversely affected.
    • We expect that the boot procedure is special and has no strong relation to other applications.
  – The reallocation is customized for a certain version. When part of the boot procedure is updated, the image has to be re-optimized.

Page 34

Conclusions

• “ext2/3optimizer” is a strong tool for making use of “readahead”, because it reallocates the data blocks used by the boot procedure.
  – It increased the occupancy (rate of necessary data in a block file) of LBCAS block files.
  – It doubled the coverage of readahead and reduced the number of readaheads (with 256KB LBCAS: average coverage 33KB → 67KB; 6,379 → 2,129 readaheads).

• “ext2/3optimizer” is not only for LBCAS. It can also be used for normal Linux distributions.

Page 35

Summary

Some services are available. Just try them! http://openlab.jp/oscircular/

ext2/3optimizer developers: http://unit.aist.go.jp/itri/knoppix/ext2optimizer/index-en.htm

DAVL developers: http://sourceforge.net/projects/davl/

BootChart: http://www.bootchart.org/