Liquid: A Scalable Deduplication File System for Virtual Machine Images

GUIDED BY: AP REMYA, DEPT OF COMPUTER SCIENCE & ENGINEERING

SUBMITTED BY: SANOJ A S, ROLL NO: R11U016, S7 CSE

CONTENTS

INTRODUCTION

VIRTUAL MACHINE

DEDUPLICATION

ISSUES IN VM STORAGE

LIQUID SYSTEM ARCHITECTURE

COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL

DEDUPLICATION IN LIQUID

OPTIMIZATIONS ON FINGERPRINT CALCULATION

STORAGE FOR DATA BLOCKS

ADVANTAGES OF LIQUID

CONCLUSION

INTRODUCTION

Cloud computing means storing and accessing data and programs over the Internet instead of on your computer's hard drive.

VIRTUAL MACHINE

Serves as a critical component in cloud computing.

Virtual machine (VM): a software-emulated computer.

Emulates the functions of a real-world computer.

Executes programs like a physical machine.

The initial state of a VM is stored in a file called a virtual machine image.

DEDUPLICATION

Data deduplication: a data compression technique.

Eliminates duplicate copies of repeating data.

A redundant data block is replaced with a reference to a single stored copy instead of being stored multiple times.

Improves storage utilization.
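To make the idea concrete, here is a minimal Python sketch of block-level deduplication, assuming fixed-size blocks and SHA-1 fingerprints (both choices are illustrative). Each distinct block is stored once, and an ordered list of fingerprints (called a "recipe" only in this sketch) records how to rebuild the original data.

```python
import hashlib


def deduplicate(data: bytes, block_size: int):
    """Split data into fixed-size blocks and keep one copy of each distinct block.

    Returns (store, recipe): store maps fingerprint -> block contents, and
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fp = hashlib.sha1(block).hexdigest()
        store.setdefault(fp, block)  # a duplicate block is not stored again
        recipe.append(fp)            # a reference stands in for every copy
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the stored blocks."""
    return b"".join(store[fp] for fp in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096  # four 4 KB blocks, three of them identical
store, recipe = deduplicate(data, 4096)
print(len(recipe), "blocks referenced,", len(store), "blocks stored")
# prints: 4 blocks referenced, 2 blocks stored
assert rebuild(store, recipe) == data
```

Storage drops from four blocks to two, while the data can still be rebuilt exactly from the references.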

ISSUES IN VM STORAGE

The high demand for VM storage remains a challenging problem.

Existing systems have made efforts to reduce storage consumption.

They use SAN clusters.

SAN clusters cannot satisfy the increasing demand because of their cost.

Hence we propose LIQUID.

LIQUID SYSTEM ARCHITECTURE

Three components: a single meta server with a hot backup, multiple data servers, and multiple clients.

Each component runs as a user-level service process.

VM images are split into fixed-size data blocks.

Meta server: maintains the file system namespace, the data block fingerprints, and their reference counts.

The meta server is mirrored to a hot-backup shadow meta server.
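The meta server's bookkeeping (namespace, fingerprints, reference counts) can be modeled in a few lines. This is a hypothetical in-memory Python sketch, not Liquid's actual data structures; `MetaServer`, `add_file` and `remove_file` are illustrative names. The reference count is what tells the meta server that a shared block is no longer used by any image and can be reclaimed.

```python
from collections import Counter


class MetaServer:
    """Hypothetical in-memory model of the meta server's bookkeeping."""

    def __init__(self):
        self.namespace = {}        # file path -> ordered list of block fingerprints
        self.refcount = Counter()  # fingerprint -> number of references to that block

    def add_file(self, path, fingerprints):
        """Register an image and count one reference per block it uses."""
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = list(fingerprints)
        self.refcount.update(self.namespace[path])

    def remove_file(self, path):
        """Drop an image and return fingerprints that no image references any more."""
        unreferenced = []
        for fp in self.namespace.pop(path):
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                del self.refcount[fp]
                unreferenced.append(fp)
        return unreferenced


meta = MetaServer()
meta.add_file("/vm/a.img", ["f1", "f2", "f2"])
meta.add_file("/vm/b.img", ["f1", "f3"])
print(meta.remove_file("/vm/a.img"))  # prints ['f2']: f1 is still used by b.img
```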

LIQUID SYSTEM ARCHITECTURE (CONT)

Data servers: in charge of managing the data blocks in VM images.

Organized as a distributed hash table (DHT).

A Liquid client provides a POSIX-compatible file system.

Client: the critical component, since it performs deduplication.

Fault tolerance: the meta server is mirrored.

Replicas of data blocks are stored.
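The slides do not say how the distributed hash table assigns blocks to data servers. As an assumption for illustration only, the Python sketch below uses consistent hashing: each block's fingerprint is hashed onto a ring, and its replicas go to the next distinct servers clockwise. `HashRing` and `replicas` are hypothetical names, and the default of 2 replicas is arbitrary.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a point on the ring (SHA-1 used only for illustration)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring that places blocks on data servers.

    Assumption: one common way to build a DHT; Liquid's exact scheme may differ.
    """

    def __init__(self, servers, vnodes=64):
        # Each server appears at several virtual points to spread load evenly.
        self._ring = sorted(
            (_position(f"{server}#{i}"), server) for server in servers for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._num_servers = len(set(servers))

    def replicas(self, fingerprint: str, n: int = 2):
        """Return up to n distinct servers clockwise from the block's fingerprint."""
        n = min(n, self._num_servers)
        if n <= 0:
            return []
        start = bisect.bisect(self._points, _position(fingerprint))
        chosen = []
        for step in range(len(self._ring)):
            server = self._ring[(start + step) % len(self._ring)][1]
            if server not in chosen:
                chosen.append(server)
                if len(chosen) == n:
                    break
        return chosen


ring = HashRing(["ds1", "ds2", "ds3"])
print(ring.replicas("9e107d9d372bb6826bd81d3542a419d6", n=2))  # two distinct servers
```

A property that makes consistent hashing a common DHT choice is that adding or removing one server only moves the blocks next to it on the ring, instead of reshuffling every block.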

LIQUID SYSTEM ARCHITECTURE (CONT)

Fig: Liquid architecture (a meta server with a hot-backup shadow meta server, data servers exchanging heartbeats with the meta server, and three client FS instances, each with its own cache).

COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL

The meta server manages all data servers.

It exchanges regular heartbeat messages with each data server in round-robin fashion.

With many data servers, round-robin heartbeats are slow to detect a failed data server.

To speed up failure detection, data servers send an error signal to the meta server.
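The effect of the error signal can be shown with a small simulation. This hypothetical Python sketch has no real networking: `HeartbeatMonitor`, `tick`, `report_error` and the `ping` callable are assumed names. One `tick` polls only the next server in round-robin order, so a failure is noticed late unless an error signal triggers an immediate check.

```python
import itertools


class HeartbeatMonitor:
    """Round-robin heartbeat sketch (hypothetical; no real networking).

    `ping(server)` is an assumed callable that returns True when the data
    server answers its heartbeat in time.
    """

    def __init__(self, servers, ping):
        self.ping = ping
        self.alive = {server: True for server in servers}
        self._turns = itertools.cycle(list(servers))

    def tick(self):
        """Send one heartbeat to the next data server in round-robin order."""
        server = next(self._turns)
        self.alive[server] = self.ping(server)
        return server

    def report_error(self, server):
        """Handle an error signal: check the suspect server now, out of turn."""
        self.alive[server] = self.ping(server)

    def failed(self):
        """Data servers currently believed to have failed."""
        return sorted(server for server, ok in self.alive.items() if not ok)


down = {"ds3"}  # simulated failure
monitor = HeartbeatMonitor(["ds1", "ds2", "ds3"], ping=lambda s: s not in down)
monitor.tick()               # polls ds1 only
print(monitor.failed())      # prints [] because ds3 has not had its turn yet
monitor.report_error("ds3")  # an error signal triggers an immediate check
print(monitor.failed())      # prints ['ds3']
```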

DEDUPLICATION IN LIQUID

Liquid uses fixed-size chunking instead of variable-size chunking. This works well because files stored in VM images are aligned on disk block boundaries.

Advantage: simplicity.

Block size choice:

Block size is a balancing factor that is hard to choose.

It has a great impact on both the deduplication ratio and IO performance.

DEDUPLICATION IN LIQUID (CONT)

A smaller block size causes more random seeks when accessing a VM image, which is not tolerable.

A large block size is not preferable either, since it reduces the deduplication ratio.

Liquid can use different block sizes in different situations. A multiple of 4 KB between 256 KB and 1 MB is advised to achieve a good balance between IO performance and deduplication ratio.
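The trade-off can be made concrete with some arithmetic. The Python sketch below checks the advised range (treating both ends as inclusive, which is an assumption) and counts how many blocks, and therefore roughly how many fingerprints, a 10 GiB image needs at a few block sizes. The image size is chosen only for illustration.

```python
KB = 1024
MB = 1024 * KB


def is_advised_block_size(size: int) -> bool:
    """True if size is a multiple of 4 KB between 256 KB and 1 MB (inclusive)."""
    return size % (4 * KB) == 0 and 256 * KB <= size <= MB


def block_count(image_bytes: int, block_size: int) -> int:
    """Fixed-size blocks needed for an image; the last block may be partial."""
    return -(-image_bytes // block_size)  # ceiling division


image = 10 * 1024 * MB  # a 10 GiB image
for size in (256 * KB, 512 * KB, MB):
    assert is_advised_block_size(size)
    print(size // KB, "KB blocks:", block_count(image, size))
# prints:
# 256 KB blocks: 40960
# 512 KB blocks: 20480
# 1024 KB blocks: 10240
```

Quadrupling the block size cuts the number of blocks to a quarter, but any single changed byte now forces a whole 1 MB block to differ, which is why larger blocks deduplicate less.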

OPTIMIZATIONS ON FINGERPRINT CALCULATION

Liquid relies on comparing data block fingerprints to detect redundancy.

Fingerprint: a collision-resistant hash value calculated from the contents of a data block.

MD5[26] and SHA-1[12] are frequently used for this purpose.

The probability of a fingerprint collision is very small, orders of magnitude smaller than hardware error rates.
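How small "very small" is can be estimated with the standard birthday (union) bound: for n distinct blocks and a b-bit hash that behaves like a uniform random function, the chance of any collision is at most n(n-1)/2^(b+1). The Python sketch below evaluates it for 2^30 blocks, an illustrative figure equal to 256 TiB stored as 256 KB blocks. It covers accidental collisions only; deliberate collisions have been demonstrated for both MD5 and SHA-1.

```python
def collision_bound(num_blocks: int, hash_bits: int) -> float:
    """Upper bound on P(any two distinct blocks share a fingerprint).

    Union (birthday) bound n(n-1)/2^(b+1), assuming the hash behaves like a
    uniformly random function; it says nothing about deliberate attacks.
    """
    return num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)


n = 2 ** 30  # about 1.07e9 blocks, e.g. 256 TiB stored as 256 KB blocks
for name, bits in (("MD5", 128), ("SHA-1", 160)):
    print(name, f"{collision_bound(n, bits):.2e}")
# prints:
# MD5 1.69e-21
# SHA-1 3.94e-31
```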

OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)

So two data blocks with the same fingerprint can safely be assumed to be identical.

Fingerprint calculation is expensive.

Liquid delays fingerprint calculation for recently modified data blocks.

It runs deduplication lazily, only when necessary.

The client side maintains a shared cache that contains recently accessed data blocks.

OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)

The Liquid client uses a portion of memory as a private cache.

The private cache holds modified data blocks and delays fingerprint calculation on them.

When a data block is modified, it is ejected from the shared cache and added to the private cache.

Modified data blocks are ejected from the private cache when it becomes full.

OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)

Ejection follows an LRU (least recently used) policy.

Only then is the fingerprint of the modified data block calculated.

Liquid uses multiple threads for fingerprint calculation.

Multiple threads process different data blocks concurrently.

This provides good IO performance.
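The caching scheme on these slides can be sketched as follows. This is a hypothetical Python model, not Liquid's code: `ClientCaches`, `evict_batch` and the batch-eviction rule are assumptions, and SHA-1 stands in for whichever fingerprint function is used. A write moves the block out of the shared cache into the private cache without hashing it; only when the private cache overflows are the least recently used blocks evicted and fingerprinted, several at a time on a thread pool.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def fingerprint(item):
    """Hash one (block_id, data) pair; SHA-1 is chosen only for illustration."""
    block_id, data = item
    return block_id, hashlib.sha1(data).hexdigest()


class ClientCaches:
    """Hypothetical model of the client's shared and private caches."""

    def __init__(self, private_capacity, evict_batch=4, workers=4):
        self.shared = {}              # recently read blocks (unbounded here for brevity)
        self.private = OrderedDict()  # modified blocks in LRU order, not yet fingerprinted
        self.capacity = private_capacity
        self.evict_batch = evict_batch
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def write(self, block_id, data):
        """Modify a block and delay hashing; return {block_id: fingerprint} of evictions."""
        self.shared.pop(block_id, None)     # ejected from the shared cache
        self.private[block_id] = data
        self.private.move_to_end(block_id)  # most recently used goes last
        if len(self.private) <= self.capacity:
            return {}
        overflow = len(self.private) - self.capacity
        count = min(len(self.private), max(self.evict_batch, overflow))
        victims = [self.private.popitem(last=False) for _ in range(count)]  # LRU first
        # Only now are fingerprints computed, several blocks at a time.
        return dict(self.pool.map(fingerprint, victims))

    def close(self):
        self.pool.shutdown()


caches = ClientCaches(private_capacity=3, evict_batch=2)
caches.shared[0] = b"old contents"
for i in range(4):
    evicted = caches.write(i, bytes([i]) * 4096)
print(sorted(evicted), list(caches.private))  # prints [0, 1] [2, 3]
caches.close()
```

Evicting in batches gives the thread pool more than one block to work on. In CPython this overlap is real because `hashlib` releases the GIL while hashing inputs larger than 2047 bytes.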

FILE SYSTEM LAYOUT

All file system metadata is stored on the meta server.

It is organized as a file system tree.

The client side can cache portions of the file system metadata for fast access.

When a VM is stopped, modified metadata and data blocks are pushed back to the meta server and data servers.

This ensures that modifications to the VM image are visible to other client nodes.
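A hypothetical Python sketch of the client-side metadata cache: entries are fetched from the meta server once, updates stay local and are marked dirty, and everything dirty is pushed back when the VM stops. `MetadataCache`, `on_vm_stop`, and modeling the meta server as a plain dict are assumptions for illustration; pushing the data blocks to the data servers is left out.

```python
class MetadataCache:
    """Hypothetical client-side cache of file system metadata."""

    def __init__(self, meta_server):
        self.meta = meta_server  # assumed: a dict-like view of the meta server
        self.cache = {}          # locally cached metadata entries
        self.dirty = set()       # paths modified since the VM started

    def get(self, path):
        """Fetch from the meta server on first access, then serve locally."""
        if path not in self.cache:
            self.cache[path] = self.meta[path]
        return self.cache[path]

    def update(self, path, entry):
        """Change metadata locally; other clients do not see it yet."""
        self.cache[path] = entry
        self.dirty.add(path)

    def on_vm_stop(self):
        """Push modified metadata back to the meta server; return how many entries."""
        for path in self.dirty:
            self.meta[path] = self.cache[path]
        pushed = len(self.dirty)
        self.dirty.clear()
        return pushed


server = {"/vm/a.img": ["f1", "f2"]}
client = MetadataCache(server)
client.update("/vm/a.img", ["f1", "f9"])
print(server["/vm/a.img"])  # prints ['f1', 'f2']: change not yet visible
client.on_vm_stop()
print(server["/vm/a.img"])  # prints ['f1', 'f9']
```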

FILE SYSTEM LAYOUT (CONT)

Fig. Process of look-up by fingerprint.

ADVANTAGES OF LIQUID

Fast virtual machine deployment with peer-to-peer data transfer.

Low storage consumption by means of deduplication.

Instant cloning for virtual machine images.

On-demand fetching through the network, and caching with local disks.

LIQUID files have no specific limit.
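The slides list instant cloning as an advantage without describing the mechanism. One plausible way to get it from the reference counts kept by the meta server, shown here purely as an assumption, is to clone an image by copying its list of block fingerprints and incrementing each count, then apply copy-on-write when the clone is modified. `clone_image` and `write_block` are hypothetical names.

```python
from collections import Counter


def clone_image(namespace, refcount, src, dst):
    """Clone by sharing blocks: copy the fingerprint list and bump reference counts.

    No block data is copied, which is what makes the clone instant in this
    hypothetical model (`namespace`: path -> fingerprints, `refcount`: Counter).
    """
    if dst in namespace:
        raise FileExistsError(dst)
    namespace[dst] = list(namespace[src])
    refcount.update(namespace[dst])


def write_block(namespace, refcount, path, index, new_fp):
    """Copy-on-write: point one block of `path` at new content."""
    old_fp = namespace[path][index]
    refcount[old_fp] -= 1
    if refcount[old_fp] == 0:
        del refcount[old_fp]
    namespace[path][index] = new_fp
    refcount[new_fp] += 1


ns = {"/base.img": ["f1", "f2", "f3"]}
rc = Counter(ns["/base.img"])
clone_image(ns, rc, "/base.img", "/vm1.img")
write_block(ns, rc, "/vm1.img", 1, "f9")
print(ns["/base.img"], ns["/vm1.img"])  # prints ['f1', 'f2', 'f3'] ['f1', 'f9', 'f3']
print(rc["f1"], rc["f2"], rc["f9"])     # prints 2 1 1
```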

CONCLUSION

Presented LIQUID, a deduplication file system with good IO performance.

Good IO performance is achieved by caching frequently accessed data blocks in an in-memory cache.

This avoids additional disk operations.

Deduplication of VM images proved to be effective.
