DEFER Cache – an Implementation
Sudhindra Rao and Shalaka Prabhu
Thesis Defense, Master of Science
Department of ECECS
OSCAR Lab
DEFER Cache
Overview
Related Work
DEFER Cache Architecture
Implementation – Motivation and Challenges
Results and Analysis
Conclusion
Future Work
Overview
Accessing remote memory is faster than accessing local disks; hence co-operative caching
Current schemes: fast reads, slow writes
Goal: combine replication with logging for fast, reliable write-back to improve write performance
DCD and RAPID already do something similar
New architecture: Distributed, Efficient and Reliable (DEFER) Co-operative Cache
Duplication and logging
Co-operative caching for both read and write requests
Vast performance gains (up to 11.5x speedup) due to write-back [9]
Co-operative Caching
High-performance, scalable LAN vs. the slow speed of the file server disks
Increasing RAM in the server: a file server with 1GB RAM vs. 64 clients with 64MB RAM each = 4GB
Cost-effective solution: using remote memory for caching – 6-12 ms for accessing 8KB of data from disk vs. 1.05 ms from a remote client
Highly scalable – but all related work focuses on read performance
Other Related Work
N-Chance Forwarding: forwards singlets to a remote host on a capacity miss; re-circulates N times and then writes to the server disk; uses a write-through cache
Co-operative caching using hints
Global Memory System
Remote memory servers
Log-structured storage systems: LFS, Disk Caching Disk (DCD), RAPID
NVRAM – not cost-effective with current technology
What is DEFER? Improving the write performance of DCD using distributed systems
Log based write mechanism
DCD [7] and RAPID [8] implement log-based writes
Improvement in small writes using a log
Reliability and data availability from the log partition
[Figure: DCD-like structure of DEFER – memory holds the local cache, remote cache and segment buffer; the local disk holds the log partition and the data partition]
Logging algorithm – writing a segment
[Figure: blocks from the RAM cache are gathered into the segment buffer and flushed to a free log segment on the cache disk; the mapping table tracks each block's location]
Write 128KB of LRU data to a cache-disk segment, in one large write
Pick LRU data to capture temporal locality, improve reliability and reduce disk traffic
Most data will be overwritten repeatedly
[Figure: state after the segment write completes – the flushed log segment is marked used, the freed RAM-cache slots become available, and the mapping table is updated]
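The segment write above can be sketched in user-space Python (hypothetical structures and names – the actual implementation is a kernel driver): LRU blocks are batched into one 128 KB segment buffer, flushed to a free log segment in a single large write, and the mapping table is updated to point at the new copies.

```python
# Sketch of the DCD-style segment write (hypothetical names, not the thesis
# code): dirty LRU blocks are gathered into a segment buffer and flushed to
# a free log segment in one large write; the mapping table records where
# each block now lives on the cache disk.
from collections import OrderedDict

BLOCK_SIZE = 8 * 1024          # 8 KB blocks, as in the slides
SEGMENT_SIZE = 128 * 1024      # one 128 KB log segment

class LogDisk:
    def __init__(self, num_segments=16):
        self.segments = {}                 # seg_id -> list of (block_no, data)
        self.free = list(range(num_segments))
        self.mapping = {}                  # block_no -> (seg_id, slot)

    def write_segment(self, dirty_lru):
        """Flush up to one segment's worth of LRU blocks in one large write."""
        seg_id = self.free.pop(0)
        batch = dirty_lru[: SEGMENT_SIZE // BLOCK_SIZE]
        self.segments[seg_id] = batch
        for slot, (block_no, _) in enumerate(batch):
            self.mapping[block_no] = (seg_id, slot)   # latest copy wins
        return seg_id

cache = OrderedDict()          # block_no -> data, oldest (LRU) first
for n in range(20):
    cache[n] = b"x" * BLOCK_SIZE

disk = LogDisk()
lru_blocks = list(cache.items())[:16]     # 16 blocks x 8 KB = 128 KB
seg = disk.write_segment(lru_blocks)
```

One large sequential write replaces many small random ones, which is where the small-write gain comes from.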
Garbage Collection
Data is written into the cache-disk continuously
The cache-disk will eventually fill – log writes
Most of the data in the cache-disk is “garbage” caused by data overwriting
Need to clean the garbage to free up log-disk space
[Figure: log disk on a client before and after garbage collection – overwritten block copies are removed, compacting the log]
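A minimal sketch of the cleaning pass, under the same hypothetical structures as the segment-write sketch: a block copy in a segment is garbage when the mapping table no longer points at that copy, i.e. the block has since been overwritten in a newer segment.

```python
# Sketch of log-disk garbage collection (hypothetical structures).
def collect_garbage(segments, mapping, free):
    """Drop dead copies; segments left empty return to the free list."""
    for seg_id in list(segments):
        live = [(blk, data) for slot, (blk, data) in enumerate(segments[seg_id])
                if mapping.get(blk) == (seg_id, slot)]
        if live:
            segments[seg_id] = live
            # re-point survivors at their compacted slots
            for slot, (blk, _) in enumerate(live):
                mapping[blk] = (seg_id, slot)
        else:
            del segments[seg_id]
            free.append(seg_id)

# Two segments; block 4 was overwritten, so its copy in segment 0 is garbage.
segments = {0: [(4, b"old"), (8, b"a")], 1: [(4, b"new"), (9, b"b")]}
mapping = {4: (1, 0), 8: (0, 1), 9: (1, 1)}
free = []
collect_garbage(segments, mapping, free)
```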
DEFER Cache Architecture
Typical distributed system (client-server)
Applications run on workstations (clients) and access files from the server
Local disks on clients are used only for booting, swapping and logging
Local RAM is divided into an I/O cache and a segment buffer
The local disk has a corresponding log partition
DEFER Cache Algorithms
DEFER is DCD distributed over the network – the best of co-operative caching and logging
Reads are handled exactly as in N-Chance Forwarding
Writes are immediately duplicated and eventually logged after a pre-determined time interval M
Dirty singlets are forwarded as in N-Chance
Three logging strategies are used: server logging, client logging and peer logging
Server Logging
[Figure: server logging – 1. lock ownership request from Client 1; 2. server invalidates the other clients; 3. lock ownership granted; 4. client sends a copy to the server, which caches it; the server updates the server table and frees the lock, later logging its segment buffer to the log disk]
The client copies the block to the server cache on a write
The server table maintains consistency
The server invalidates clients and logs the contents of the segment buffer
Increased load on the server due to logging
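The server-logging write path can be sketched as follows (hypothetical classes, following the numbered steps in the diagram): the server table tracks which client owns each block, stale copies are invalidated, and the server keeps the duplicate that its segment buffer will later log.

```python
# Sketch of the server-logging write path (hypothetical names).
class Server:
    def __init__(self):
        self.table = {}        # block -> owning client id
        self.cache = {}        # block -> data (duplicates awaiting logging)
        self.clients = {}

    def write(self, client_id, block, data):
        # steps 1-2: lock ownership request; invalidate every other copy
        for cid, c in self.clients.items():
            if cid != client_id:
                c.cache.pop(block, None)
        # steps 3-4: grant ownership; client sends a copy for the server
        self.cache[block] = data
        self.clients[client_id].cache[block] = data
        # update the server table, free the lock
        self.table[block] = client_id

class Client:
    def __init__(self):
        self.cache = {}

srv = Server()
srv.clients = {1: Client(), 2: Client()}
srv.clients[2].cache["b7"] = b"stale"
srv.write(1, "b7", b"fresh")
```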
Client Logging
Advantage: server load is decreased
Disadvantage: availability of the block is affected
[Figure: client logging – 1. lock ownership request from Client 1; 2. server invalidates the other clients; 3. lock ownership granted; 4. client copies the data to the server cache; 5. after ‘M’ seconds the client logs its segment buffer to its local log disk; 6. logging complete – the server removes the dirty blocks sent by Client 1 from its cache, updates the server table and frees the lock]
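The difference from server logging can be sketched directly (hypothetical names): the write path is the same, but after M seconds the writing client flushes its own dirty blocks to its local log disk and notifies the server, which then drops the now-safe duplicates (step 6 in the diagram).

```python
# Sketch of client logging (hypothetical names).
class LoggingClient:
    def __init__(self):
        self.cache = {}          # in-memory copies
        self.dirty = {}          # blocks awaiting the M-second flush
        self.log_disk = {}

    def write(self, block, data):
        self.cache[block] = data
        self.dirty[block] = data

    def flush(self, server):
        """Runs every M seconds: log locally, then let the server clean up."""
        self.log_disk.update(self.dirty)
        server.logging_complete(self.dirty.keys())
        self.dirty.clear()

class DeferServer:
    def __init__(self):
        self.cache = {}          # duplicates held until logging completes

    def logging_complete(self, blocks):
        for b in list(blocks):
            self.cache.pop(b, None)

srv = DeferServer()
cli = LoggingClient()
cli.write("b1", b"v1")
srv.cache["b1"] = b"v1"          # step 4: duplicate copied to the server
cli.flush(srv)                   # steps 5-6: after M seconds
```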
Peer Logging
Each workstation is assigned a peer – the peer performs the logging
Advantage: reduces server load without compromising availability
[Figure: peer logging – 1. lock ownership request from Client 1; 2. server invalidates the other clients; 3. lock ownership granted; 4. client sends a copy of the block to its peer and the server table is updated; 5. after ‘M’ seconds the peer logs the block from its segment buffer to its log disk; a static peer-mapping table (1→2, 2→1, …) assigns each workstation its logging peer]
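A sketch of the peer-logging flow (hypothetical names; the slides only show a fixed peer table, so a simple ring mapping is assumed here): each workstation's duplicates go to a statically assigned peer, which logs them after M seconds, keeping the server out of the logging path.

```python
# Sketch of peer logging with an assumed ring peer assignment.
def peer_of(client_id, n):
    """Next workstation in the ring acts as this client's logging peer."""
    return client_id % n + 1          # clients numbered 1..n

class Peer:
    def __init__(self):
        self.cache = {}               # duplicates from the partner
        self.log_disk = {}

    def flush(self):                  # runs every M seconds (step 5)
        self.log_disk.update(self.cache)
        self.cache.clear()

n = 4
peers = {i: Peer() for i in range(1, n + 1)}
# Client 1 writes block "b3": the copy goes to its peer, not the server.
writer, block, data = 1, "b3", b"v1"
peers[peer_of(writer, n)].cache[block] = data     # step 4
peers[peer_of(writer, n)].flush()                 # step 5
```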
Reliability
Every M seconds, blocks that were modified within the last M to 2M seconds are logged
Thus, for M = 15, we guarantee that data modified within the last 30 seconds is written to disk
Most UNIX systems use a delayed write-back of 30 seconds
M can be reduced, to increase the frequency of logging, without introducing high overhead
With DEFER, blocks are both logged and duplicated
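The 2M-second bound can be checked with a small simulation: a flush that fires every M seconds and logs blocks whose age falls in [M, 2M) leaves no modified block unlogged for longer than 2M seconds.

```python
# Small sketch of the reliability window: flush every M seconds, logging
# blocks modified between M and 2M seconds ago.
M = 15

def flush(now, mtimes, logged):
    """Log every block whose age is in [M, 2M) at this tick."""
    for block, t in mtimes.items():
        if M <= now - t < 2 * M:
            logged[block] = now

mtimes = {"a": 0, "b": 12, "c": 20}   # last-modified times (seconds)
logged = {}
for tick in (15, 30, 45):             # the flush fires every M seconds
    flush(tick, mtimes, logged)

# Every block is logged within 2M = 30 s of its last modification.
ages = {b: logged[b] - mtimes[b] for b in mtimes}
```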
Crash Recovery
Peer logging: the recovery algorithm works on the on-log-disk version of the data
The in-memory and on-log-disk copies are on different hosts
Find the blocks that were updated by the crashed client, and the peer information
The server initiates recovery of the blocks from the peer
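The recovery step above can be sketched as follows (hypothetical structures): the server directory table records the last writer of each block, so after a crash the server can ask the crashed client's peer for the on-log-disk copies.

```python
# Sketch of peer-logging crash recovery (hypothetical structures).
def recover(server_table, peer_logs, peer_of, crashed):
    """Return the logged copies of every block last written by `crashed`."""
    lost = [b for b, writer in server_table.items() if writer == crashed]
    peer = peer_of[crashed]
    return {b: peer_logs[peer][b] for b in lost if b in peer_logs[peer]}

server_table = {"b1": 1, "b2": 2, "b3": 1}     # block -> last writer
peer_of = {1: 2, 2: 1}                          # static peer assignment
peer_logs = {2: {"b1": b"v1", "b3": b"v3"}, 1: {"b2": b"v2"}}

recovered = recover(server_table, peer_logs, peer_of, crashed=1)
```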
Real Workloads
Snake – peer logging 4.9x; server and client logging 4.4x
Cello – peer logging 8.5x; server and client logging 7.3x
DEFER Cache Architecture
[Figure: implementation architecture – each Defer_client holds a local cache, a remote cache and a two-segment segment buffer; clients log to the segment buffer and send/receive blocks to/from the peer/server (writes go to the remote cache); the Defer Server maintains a server queue and its own local and remote caches]
DEFER Cache design
Follow the design principles:
Use only commodity hardware that is available in typical systems
Avoid storage-media dependencies such as use of only SCSI or only IDE disks
Keep the data structures and mechanisms simple
Support reliable persistence semantics
Separate mechanisms and policies
Implementation
Implementation with Linux – open source
Implemented client logging as a device driver or a library – no change to the system kernel or application code
Uses Linux device drivers to create a custom block device attached to the network device – provides system-call overloading via a loadable kernel module
The network device uses Reliable UDP to ensure fast and reliable data transfer
Also provides a library for testing and implementing on non-Linux systems – provides system-call overloading
An alternative approach using NVRAM is under test
DEFER as a module
[Figure: DEFER as a module – Defer_module calls register_capability()/unregister_capability() against the kernel proper from init_module() and cleanup_module(); it overrides read, write, open and close, runs the M-sec algorithm and garbage collection, and calls nbd on the network device; kernel functions used include printk, add_to_request_queue, ioctl, generic_make_request, send/recv and ll_rw_blk]
Data Management
Plugs into the OS as a custom block device that contains memory and a disk
The disk is managed independently of the OS
The request_queue is intercepted to redirect requests to the Defer module
Read/write are overridden with Defer read/write
Interfaces with the network device to transfer data
Interfaces with the kernel by registering special capabilities – logging, de-staging, garbage collection, and data recovery on crash
DEFER Cache - Implementation
Simulation results present an 11.5x speedup
DEFER Cache was implemented in a real system to support the simulation results
A multi-hierarchy cache structure can be implemented at the application level, file-system level, layered device driver, or controller level
A kernel device driver was selected as it achieves both efficiency and flexibility
Implementation Design
The implementation is derived from the DCD implementation
DEFER Cache can be considered as a DCD over a distributed system
The implementation design consists of three modules:
Data management – implements the caching activities on the local machine
Network interface – implements the network transfer of blocks to/from server/client
Coordinating daemons – coordinate the activities of the above two modules
Data Management
Custom block device driver developed and plugged into the kernel during execution.
Driver modified according to DEFER Cache design.
Request function of the device driver modified.
Read/Write for RAM replaced by DEFER Cache read/write call.
Network Interface
Implemented as a network block device (NBD) driver
NBD simulates a block device on the local client but connects to a remote machine which actually hosts the data
A local-disk representation of a remote client
Can be mounted and accessed as a normal block device
All read/write requests are transferred over the network to the remote machine
Consists of three parts: the NBD client, the NBD driver and the NBD server
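The NBD idea can be illustrated with a toy request protocol (this is a sketch of the concept, not the real NBD wire protocol): block requests made against a local device are serialized and serviced by a remote server that owns the actual storage.

```python
# Minimal sketch of the NBD idea: requests are packed as they would go over
# the network, and the server applies them to its backing store.
import struct

HDR = struct.Struct("!cQI")     # op ('R'/'W'), offset, length

def nbd_server(storage, packet):
    """Remote side: apply one request to the backing store."""
    op, off, length = HDR.unpack(packet[:HDR.size])
    if op == b"W":
        storage[off:off + length] = packet[HDR.size:HDR.size + length]
        return b""
    return bytes(storage[off:off + length])   # op == b"R"

def nbd_request(storage, op, offset, data=b"", length=0):
    """Local side: pack the request as it would cross the network."""
    if op == b"W":
        packet = HDR.pack(op, offset, len(data)) + data
    else:
        packet = HDR.pack(op, offset, length)
    return nbd_server(storage, packet)        # network transfer elided

disk = bytearray(4096)                        # storage hosted on the server
nbd_request(disk, b"W", 512, data=b"hello")
out = nbd_request(disk, b"R", 512, length=5)
```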
NBD – Design
[Figure: the user-space NBD client calls init_module() and ioctl() into the kernel-space NBD driver, which registers itself with register_blkdev() and blk_init_queue(); requests on the default queue are picked up by request() and sent by transmit() to the user-space NBD server]
(The same diagram is repeated on the following slides with the NBD client and NBD driver highlighted in turn.)
Linux Device Driver Issues
Linux device drivers for the data management and network interface modules were successfully implemented
They could not be thoroughly tested and validated
Kernel development poses the following problems:
Clustering of I/O requests by the kernel
Kernel memory corruption
Synchronization problems
No specific debugging tool
User-mode Implementation
The implementation of DEFER Cache was switched to user mode
Advantages:
High flexibility – all data can be manipulated by the user according to requirements
Easier to design and debug
A good design can improve the performance
Disadvantages:
Response time is slower – worse if data is swapped
User-Mode Design
Simulates the drivers in user mode
All data structures used by the device drivers are duplicated in user space
Uses a raw disk
32MB of buffer space allocated for DEFER Cache in RAM emulates the I/O buffer cache
DEFER Server - Implementation
Governs the entire cluster of workstations
Maintains its own I/O cache and a directory table
The server directory table maintains consistency in the system
A server-client handshake is performed on every write update
A server directory table entry reflects the last writer
Used for garbage collection and data recovery
Initial Testing
Basic idea: accessing remote data is faster than accessing data on the local disk
Is the LAN speed faster than the disk access speed?
As UDP is used as the network protocol, the UDP transfer delay was measured
Underlying network: 100Mbps Ethernet
A UDP monitor program was used
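The kind of measurement described can be sketched with a UDP echo over localhost (a simple stand-in for the UDP monitor program, which is not described in detail): time the round trip of one 8 KB datagram.

```python
# Sketch of a UDP delay measurement: echo server plus a timed round trip
# of an 8 KB payload on the loopback interface.
import socket, threading, time

def echo_server(sock):
    data, addr = sock.recvfrom(65536)
    sock.sendto(data, addr)

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
cli.settimeout(5)
payload = b"\0" * 8192                      # one 8 KB block
start = time.perf_counter()
cli.sendto(payload, srv.getsockname())
reply, _ = cli.recvfrom(65536)
rtt_ms = (time.perf_counter() - start) * 1000.0
```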
Benchmark Program
Developed an in-house benchmark program
Generates requests using a history table, producing temporal and spatial locality
Runs on each workstation
The following parameters can be modified at runtime:
Working set size, client cache size, server cache size, block size, correlation factor (c)
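The history-table generator can be sketched as follows (the exact scheme is not given in the slides, so this is an assumed interpretation): with probability c the next request re-uses a recent block from the history table (temporal locality); otherwise a fresh block is drawn at random from the working set.

```python
# Sketch of a history-table request generator with correlation factor c.
import random

def generate_requests(n, working_set, c, history_len=32, seed=42):
    rng = random.Random(seed)
    history, requests = [], []
    for _ in range(n):
        if history and rng.random() < c:
            block = rng.choice(history)          # re-reference a recent block
        else:
            block = rng.randrange(working_set)   # new random block
        history.append(block)
        history[:] = history[-history_len:]      # bounded history table
        requests.append(block)
    return requests

reqs = generate_requests(1000, working_set=4096, c=0.75)
distinct = len(set(reqs))
# With c = 0.75 most requests hit the history table, so the trace touches
# far fewer than 1000 distinct blocks.
```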
Results (working set size)
[Figure: effect of varying working-set size on bandwidth (c=1) – file size 2-64 MB vs. bandwidth (KB/sec), comparing the baseline system and DEFER Cache]
Results (small writes)
[Figure: effect of small writes – file size 4-64 KB vs. bandwidth (KB/sec), comparing the baseline system and DEFER Cache]
Results (Response time for small writes)
[Figure: effect of varying file size on response time (c=1) – file size 4-64 KB vs. response time (ms), comparing the baseline system and DEFER Cache]
Results (Response time for sharing data)
[Figure: effect of varying file size on response time when sharing data (c=0.75) – file size 4-32 KB vs. response time (ms), comparing the baseline system and DEFER Cache]
Results (Varying client cache size)
[Figure: effect of varying client cache size on bandwidth – client cache 16-64 MB vs. bandwidth (KB/sec), comparing the baseline system and DEFER Cache]
Results (varying server cache size)
[Figure: effect of varying server cache size on bandwidth – server cache 128-512 MB vs. bandwidth (KB/sec), comparing the baseline system and DEFER Cache]
Results (latency)
[Figure: latency comparison of DEFER Cache and the baseline system – file size 2-16 MB vs. latency (microseconds)]
Results (Delay measurements)
[Figure: delay comparison of DEFER Cache and the baseline system – file size 2-16 MB vs. delay (microseconds)]
Results (Execution time)
[Figure: execution time for DEFER Cache and the baseline system – working set 4-64 MB vs. time (secs)]
Conclusions
Improves write performance for co-operative caching
Reduces the small-write penalty
Ensures reliability and data availability
Improves overall file-system performance
Future Work
Improve the user-level implementation: extend kernel-level functionality to user level; intercept system-level calls and modify them to implement DEFER read/write calls
Kernel-level implementation: successfully implement DEFER Cache at the kernel level and plug it into the kernel