Liquid: A Scalable Deduplication File System for Virtual Machine Images
GUIDED BY: AP REMYA, DEPT OF COMPUTER SCIENCE & ENGINEERING
SUBMITTED BY: SANOJ A S, ROLL NO: R11U016, S7 CSE
CONTENTS
INTRODUCTION
VIRTUAL MACHINE
DEDUPLICATION
ISSUES IN VM STORAGE
LIQUID SYSTEM ARCHITECTURE
COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL
DEDUPLICATION IN LIQUID
OPTIMIZATIONS ON FINGERPRINT CALCULATION
STORAGE FOR DATA BLOCKS
ADVANTAGES OF LIQUID
CONCLUSION
INTRODUCTION
Cloud computing means storing and accessing data and programs over the Internet instead of on your computer's hard drive.
VIRTUAL MACHINE
Serves as a critical component in cloud computing.
Virtual machine: a hypothetical (software-emulated) computer.
Emulates the functions of a real-world computer.
Executes programs like a physical machine.
The initial state of a virtual machine is stored in a file called a virtual machine image.
DEDUPLICATION
Data deduplication: a data compression technique.
Eliminates duplicate copies of repeating data.
A redundant data block is replaced with a reference to a single stored copy instead of being stored multiple times.
Improves storage utilization.
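A minimal sketch of block-level deduplication, assuming an in-memory store, SHA-1 fingerprints, and a hypothetical `DedupStore` class (none of these names come from Liquid): each block is keyed by its fingerprint, so a repeated block is stored once and later copies only increase a reference count.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory block store keyed by content fingerprint."""

    def __init__(self):
        self.blocks = {}     # fingerprint -> block bytes (stored once)
        self.refcount = {}   # fingerprint -> number of references to that block

    def put(self, block: bytes) -> str:
        fp = hashlib.sha1(block).hexdigest()
        if fp not in self.blocks:
            self.blocks[fp] = block          # first copy: store the data itself
        self.refcount[fp] = self.refcount.get(fp, 0) + 1   # duplicates only bump the count
        return fp                            # the caller keeps the fingerprint as a reference

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]


store = DedupStore()
refs = [store.put(b) for b in [b"A" * 4096, b"B" * 4096, b"A" * 4096]]
print(len(refs), len(store.blocks))  # 3 2
```

Three blocks are written but only two are stored, because the first and third blocks share a fingerprint.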
ISSUES IN VM STORAGE
High demand on VM storage remains a challenging problem.
Existing systems have made efforts to reduce storage consumption.
They use SAN (storage area network) clusters.
These cannot satisfy the increasing demand due to cost limitations.
Hence we propose LIQUID.
LIQUID SYSTEM ARCHITECTURE
Three components: a single meta server with a hot backup, multiple data servers, and multiple clients.
Each component runs as a user-level service process.
VM images are split into fixed-size data blocks.
Meta server: maintains the namespace, fingerprints, and reference counts.
Meta server: mirrored to a hot-backup shadow meta server.
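The meta server's bookkeeping can be sketched as below. This is a simplification, not Liquid's code: the hypothetical `MetaServer` keeps the namespace as a map from file path to an ordered list of block fingerprints plus a reference count per fingerprint, and the "shadow" is just a second in-process copy that receives every update (a real hot backup would be mirrored over the network).

```python
from dataclasses import dataclass, field


@dataclass
class MetaState:
    namespace: dict = field(default_factory=dict)  # file path -> ordered block fingerprints
    refcount: dict = field(default_factory=dict)   # fingerprint -> number of references


class MetaServer:
    """Hypothetical meta server that applies every update to a shadow copy too."""

    def __init__(self):
        self.primary = MetaState()
        self.shadow = MetaState()   # stands in for the hot-backup shadow meta server

    @staticmethod
    def _apply(state, path, fingerprints):
        for fp in state.namespace.get(path, []):   # release the old version's blocks
            state.refcount[fp] -= 1
            if state.refcount[fp] == 0:
                del state.refcount[fp]
        state.namespace[path] = list(fingerprints)
        for fp in fingerprints:                    # reference the new version's blocks
            state.refcount[fp] = state.refcount.get(fp, 0) + 1

    def set_file(self, path, fingerprints):
        # Same update on both copies, so the shadow can take over on failure.
        self._apply(self.primary, path, fingerprints)
        self._apply(self.shadow, path, fingerprints)


meta = MetaServer()
meta.set_file("/vm/a.img", ["fp1", "fp2"])
meta.set_file("/vm/b.img", ["fp1", "fp3"])
print(meta.primary.refcount)  # {'fp1': 2, 'fp2': 1, 'fp3': 1}
```

A block whose reference count drops to zero is no longer used by any file and could be reclaimed on the data servers.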
LIQUID SYSTEM ARCHITECTURE (CONT)
Data servers: in charge of managing the data blocks in VM images.
Organized in a distributed hash table (DHT).
A Liquid client provides a POSIX-compatible file system.
Client: the critical component (it performs deduplication).
Fault tolerance: the meta server is mirrored.
Replicas of data blocks are stored on multiple data servers.
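The slides only say that data servers form a distributed hash table and that blocks are replicated. One common way to place replicas in a DHT is consistent hashing; the sketch below uses it as an assumed, swapped-in technique (the `Ring` class, virtual-node count, and replica count are all hypothetical) to map a block fingerprint to several distinct data servers.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Map a string to a 64-bit position on the hash ring.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical consistent-hashing ring from block fingerprints to data servers."""

    def __init__(self, servers, vnodes=64):
        # Each server owns several virtual points to spread load evenly.
        self.points = sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self.keys = [p for p, _ in self.points]

    def replicas(self, fingerprint: str, n: int = 2):
        """Return up to n distinct servers, walking clockwise from the block's position."""
        n = min(n, len({s for _, s in self.points}))
        i = bisect.bisect(self.keys, _h(fingerprint))
        chosen = []
        while len(chosen) < n:
            server = self.points[i % len(self.points)][1]   # wrap around the ring
            if server not in chosen:
                chosen.append(server)
            i += 1
        return chosen


ring = Ring(["ds1", "ds2", "ds3"])
print(ring.replicas("9f86d081", n=2))   # two distinct data servers for this block
```

Any client can compute the placement locally from the fingerprint, and adding a server moves only the blocks adjacent to its ring points.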
LIQUID SYSTEM ARCHITECTURE (CONT)
Fig: Liquid architecture. A meta server with a shadow meta server (hot backup), data servers (linked to the meta server by heartbeat), and several client file systems (Client FS), each with its own cache.
COMMUNICATION AMONG COMPONENTS: HEARTBEAT PROTOCOL
The meta server manages all data servers.
It exchanges regular heartbeat messages with each data server in a round-robin fashion.
With many data servers, round-robin checking alone is slow to detect failed data servers.
To speed up failure detection, data servers send an error signal to the meta server.
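A minimal simulation of the two detection paths, assuming a hypothetical `HeartbeatMonitor` on the meta server and an `is_alive` callback that stands in for sending a heartbeat and waiting for the reply (the slides do not specify message formats or timeouts).

```python
from collections import deque


class HeartbeatMonitor:
    """Hypothetical meta-server view of the heartbeat protocol (simulated, no networking)."""

    def __init__(self, servers):
        self.queue = deque(servers)   # round-robin order in which data servers are probed
        self.failed = set()

    def tick(self, is_alive):
        """Send one heartbeat to the next data server; is_alive(server) -> bool."""
        if not self.queue:
            return None
        server = self.queue.popleft()
        if is_alive(server):
            self.queue.append(server)   # replied: back of the rotation
        else:
            self.failed.add(server)     # no reply: mark as failed
        return server

    def report_error(self, server):
        """Error signal from a data server: mark failed without waiting for its turn."""
        if server in self.queue:
            self.queue.remove(server)
        self.failed.add(server)


mon = HeartbeatMonitor([f"ds{i}" for i in range(100)])
down = {"ds99"}
mon.tick(lambda s: s not in down)   # probes ds0; ds99 would only be probed 99 ticks later
mon.report_error("ds99")            # a peer reports the failure immediately
print(sorted(mon.failed))           # ['ds99']
```

With 100 servers, pure round-robin would need up to 100 heartbeat rounds to notice a failure; the error signal shortcuts that wait.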
DEDUPLICATION IN LIQUID
Liquid uses fixed-size chunking instead of variable-size chunking. This works well because files stored in VM images are aligned on disk block boundaries.
Advantage: simplicity.
Block size choice:
Block size is a balancing factor that is hard to choose.
It has a great impact on both deduplication ratio and IO performance.
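The sketch below illustrates why block alignment makes fixed-size chunking sufficient. It assumes 4 KiB blocks, SHA-1 fingerprints, and hypothetical helper names: the same file placed at a block-aligned offset in two images produces identical blocks, while an unaligned offset shifts every block boundary, which is the case variable-size chunking is designed to handle.

```python
import hashlib


def fixed_size_chunks(data: bytes, block_size: int) -> list:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def fingerprints(data: bytes, block_size: int) -> set:
    return {hashlib.sha1(c).hexdigest() for c in fixed_size_chunks(data, block_size)}


BS = 4096
file_a = bytes(range(256)) * 16 + bytes(reversed(range(256))) * 16   # two distinct 4 KiB blocks
image1 = file_a + b"\x00" * BS            # file stored at offset 0
image2 = b"\x11" * BS + file_a            # same file one block later: still aligned
image3 = b"\x11" * 100 + file_a           # same file at an unaligned offset

aligned = fingerprints(image1, BS) & fingerprints(image2, BS)
unaligned = fingerprints(image1, BS) & fingerprints(image3, BS)
print(len(aligned), len(unaligned))  # 2 0
```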
DEDUPLICATION IN LIQUID(CONT)
A smaller block size causes more random seeks when accessing a VM image, which is not tolerable. A larger block size is also not preferable, since it reduces the deduplication ratio.
Liquid can use different block sizes in different situations. A multiple of 4 KB between 256 KB and 1 MB is advised to achieve a good balance between IO performance and deduplication ratio.
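The advice above can be written as a simple check, together with a count of how many blocks (and therefore fingerprints and metadata entries) an image needs at each size. This is a sketch assuming KB and MB mean 1024 and 1024 × 1024 bytes and a hypothetical 10 GiB image; the function names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def is_advised_block_size(size: int) -> bool:
    """True for a multiple of 4 KB between 256 KB and 1 MB, inclusive."""
    return size % (4 * KIB) == 0 and 256 * KIB <= size <= MIB


def blocks_per_image(image_bytes: int, block_size: int) -> int:
    # Ceiling division: a partial last block still needs its own entry.
    return -(-image_bytes // block_size)


image = 10 * 1024 * MIB   # hypothetical 10 GiB VM image
for bs in (4 * KIB, 256 * KIB, 1 * MIB):
    tag = " (advised)" if is_advised_block_size(bs) else ""
    print(f"{bs // KIB} KiB: {blocks_per_image(image, bs)} blocks{tag}")
```

At 4 KiB the image needs 2,621,440 blocks, each a potential random seek and a metadata entry; at 1 MiB it needs 10,240, but a single changed byte then makes a whole 1 MiB block unique.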
OPTIMIZATIONS ON FINGERPRINT CALCULATION
Liquid relies on comparing data block fingerprints to detect redundancy.
Fingerprint: a collision-resistant hash value calculated from the contents of a data block.
MD5[26] and SHA-1[12] are frequently used for this purpose.
The probability of a fingerprint collision is very small, orders of magnitude smaller than hardware error rates.
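The "very small" collision probability can be estimated with the birthday (union) bound: for n unique blocks and a b-bit hash, P(any collision) ≤ n(n−1)/2 / 2^b. The sketch below assumes an illustrative 2^32 unique 4 KiB blocks (16 TiB of unique data); the bound assumes accidental collisions between uniformly distributed hash values, not deliberately crafted ones.

```python
from fractions import Fraction


def collision_probability_bound(n_blocks: int, hash_bits: int) -> float:
    """Birthday (union) bound: P(any two of n blocks collide) <= C(n, 2) / 2**hash_bits."""
    pairs = n_blocks * (n_blocks - 1) // 2        # number of distinct block pairs
    return float(Fraction(pairs, 2 ** hash_bits))


# 2**32 unique 4 KiB blocks is 16 TiB of unique data (an illustrative size).
for name, bits in (("MD5", 128), ("SHA-1", 160)):
    print(name, collision_probability_bound(2 ** 32, bits))
```

This gives roughly 2.7e-20 for MD5 and 6.3e-30 for SHA-1.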
OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)
So two data blocks with the same fingerprint can safely be assumed to be identical.
Fingerprint calculation is expensive.
Liquid delays fingerprint calculation for recently modified data blocks.
Deduplication runs lazily, only when it is necessary.
The client side maintains a shared cache that contains recently accessed data blocks.
OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)
The Liquid client also uses a portion of memory as a private cache.
The private cache holds modified data blocks and delays fingerprint calculation on them.
When a data block is modified, it is ejected from the shared cache and added to the private cache.
Modified data blocks are ejected from the private cache only when it becomes full.
OPTIMIZATIONS ON FINGERPRINT CALCULATION (CONT)
Blocks are ejected based on an LRU (least recently used) policy.
Only then is the modified data block’s fingerprint calculated.
Liquid uses multiple threads for fingerprint calculation.
Multiple threads process different data blocks concurrently.
This provides good IO performance.
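The shared cache, private cache, and deferred fingerprinting can be sketched together. This is a hypothetical `PrivateCache`, not Liquid's implementation: a write removes the block from the shared cache and stores it in an LRU-ordered private cache; a block is fingerprinted only when it is evicted, and the hashing runs on a thread pool. In CPython, `hashlib` releases the GIL for buffers larger than 2047 bytes, so threads can hash real-sized blocks in parallel. The `flush` method (evict everything and collect results) is an added assumption.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class PrivateCache:
    """Hypothetical client-side private cache that defers fingerprinting until eviction."""

    def __init__(self, capacity: int, shared: dict, workers: int = 4):
        self.capacity = capacity
        self.shared = shared                  # shared cache: block id -> unmodified data
        self.blocks = OrderedDict()           # modified blocks, least recently used first
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = []                     # futures of fingerprint jobs

    def write(self, block_id: int, data: bytes) -> None:
        self.shared.pop(block_id, None)       # a modified block leaves the shared cache
        self.blocks[block_id] = data          # rewrites of a hot block cost no hashing
        self.blocks.move_to_end(block_id)     # mark as most recently used
        while len(self.blocks) > self.capacity:
            old_id, old_data = self.blocks.popitem(last=False)  # evict the LRU block
            self._submit(old_id, old_data)    # only now compute its fingerprint

    def _submit(self, block_id: int, data: bytes) -> None:
        # Worker threads hash different blocks concurrently.
        self.pending.append(self.pool.submit(self._fingerprint, block_id, data))

    @staticmethod
    def _fingerprint(block_id: int, data: bytes):
        return block_id, hashlib.sha1(data).hexdigest()

    def flush(self):
        """Evict all remaining blocks and wait for every fingerprint job."""
        for block_id, data in self.blocks.items():
            self._submit(block_id, data)
        self.blocks.clear()
        results = [f.result() for f in self.pending]
        self.pending.clear()
        return results

    def close(self) -> None:
        self.pool.shutdown(wait=True)


shared = {1: b"original"}
cache = PrivateCache(capacity=2, shared=shared)
for i in range(100):
    cache.write(1, b"x" * i)      # hot block rewritten 100 times, never hashed meanwhile
cache.write(2, b"b")
cache.write(3, b"c")              # over capacity: block 1 is evicted and fingerprinted
results = cache.flush()
cache.close()
print([block_id for block_id, _ in results])  # [1, 2, 3]
```

102 writes result in only 3 fingerprint calculations, one per block, each on that block's final contents.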
FILE SYSTEM LAYOUT
All file system metadata is stored on the meta server.
It is organized in a file system tree.
The client side can cache portions of the file system metadata for fast access.
When a VM is stopped, modified metadata and data blocks are pushed back to the meta server and data servers.
This ensures that modifications to the VM image are visible to other client nodes.
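Client-side metadata caching with write-back on VM stop can be sketched as follows. The sketch assumes a plain dict standing in for the meta server's namespace and a hypothetical `MetaClientCache` class. Only metadata is shown; pushing modified data blocks to the data servers would follow the same pattern.

```python
class MetaClientCache:
    """Hypothetical client-side cache of file metadata (path -> block fingerprints)."""

    def __init__(self, meta_server: dict):
        self.meta_server = meta_server   # stands in for the meta server's namespace
        self.cache = {}
        self.dirty = set()               # paths modified locally but not yet pushed back

    def lookup(self, path):
        if path not in self.cache:       # miss: fetch once from the meta server
            self.cache[path] = list(self.meta_server[path])
        return self.cache[path]          # hit: served locally, no round trip

    def update_block(self, path, index, fingerprint):
        self.lookup(path)[index] = fingerprint   # modify the local copy only
        self.dirty.add(path)

    def on_vm_stop(self):
        # Push modified metadata back so other client nodes see the change.
        for path in self.dirty:
            self.meta_server[path] = list(self.cache[path])
        self.dirty.clear()


server = {"/vm/a.img": ["fp1", "fp2"]}
client = MetaClientCache(server)
client.update_block("/vm/a.img", 1, "fp9")
print(server["/vm/a.img"])   # ['fp1', 'fp2']  (not yet visible to others)
client.on_vm_stop()
print(server["/vm/a.img"])   # ['fp1', 'fp9']
```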
FILE SYSTEM LAYOUT (CONT)
Fig. Process of lookup by fingerprint.
ADVANTAGES OF LIQUID
Fast virtual machine deployment with peer-to-peer data transfer.
Low storage consumption by means of deduplication.
Instant cloning of virtual machine images.
On-demand fetching through network caching with local disks.
Liquid files have no specific limit.
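In a deduplicating file system, a clone can share every block with its source, so cloning only has to copy metadata. The sketch below shows this idea under stated assumptions: a hypothetical `Namespace` holds each image as a list of fingerprints. It is not presented as Liquid's actual cloning mechanism.

```python
class Namespace:
    """Hypothetical metadata view: each image is a list of block fingerprints."""

    def __init__(self):
        self.files = {}      # path -> list of fingerprints
        self.refcount = {}   # fingerprint -> number of references

    def create(self, path, fingerprints):
        self.files[path] = list(fingerprints)   # own copy of the list, shared blocks
        for fp in fingerprints:
            self.refcount[fp] = self.refcount.get(fp, 0) + 1

    def clone(self, src, dst):
        # Metadata-only: no block data is read or copied, just references added.
        self.create(dst, self.files[src])


ns = Namespace()
ns.create("/base.img", ["a", "b", "c"])
ns.clone("/base.img", "/vm1.img")
print(ns.refcount)   # {'a': 2, 'b': 2, 'c': 2}
```

The clone gets its own fingerprint list, so later writes to it change only that list and leave the base image untouched.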
CONCLUSION
Presented LIQUID, a deduplication file system with good IO performance.
This is achieved by caching frequently accessed data blocks in an in-memory cache,
which avoids additional disk operations.
Deduplication of VM images proved to be effective.