Caching in the Sprite Network File System
Scale and Performance in a Distributed File System
COMP 520
September 21, 2004
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
The Sprite file system is functionally similar to UNIX
Read, write, open, and close calls provide access to files
Sprite communicates kernel-to-kernel
Remote procedure calls (RPC) allow kernels to talk to each other
(Diagram: several client kernels communicate with the server kernel via RPC.)
Sprite uses caching on the client and server side
Two different caching mechanisms
– Server workstations use caching to reduce delays caused by disk accesses
– Client workstations use caching to minimize the number of calls made to non-local disks
(Diagram: file traffic from applications hits the client cache; misses become server traffic over the network to the server cache, whose misses become disk traffic.)
Three main issues are addressed by Sprite's caching system
1. Should client caches be kept in main memory or on local disk?
2. What structure and addressing scheme should be used for caching?
3. What should happen when a block is written back to disk?
Sprite caches client data in main memory, not on local disk
Allows clients to be diskless
– Cheaper
– Quieter
Data access is faster
Physical memory is large enough
– Provides a high hit ratio
– Memory size will continue to grow
A single caching mechanism can be used for both client and server
A virtual addressing structure is used for caching
Data organized into blocks
– 4 Kbytes
– Virtually addressed
– Unique file identifier and block index
– Both client and server cache data blocks
Server also caches naming info.
– Addressed using physical addresses
– All naming operations (open, close, etc.) passed to the server
– Cached file info. lost if the server crashes
(Diagram: client and server caches hold virtually addressed data blocks; the server cache also holds physically addressed naming/management info. Open, close, read, and write calls and data blocks flow between client and server.)
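The virtually addressed design above can be sketched as a cache keyed by (file identifier, block index), so a hit never needs a disk address or a server round trip. This is a hypothetical illustration in Python, not Sprite's kernel code; the names `BlockCache` and `fetch_from_server` are invented:

```python
# Sketch of a virtually addressed block cache: blocks are looked up by
# (file_id, block_index) rather than by physical disk address, so a client
# can satisfy repeated reads without contacting the server.
BLOCK_SIZE = 4096  # Sprite uses 4-Kbyte blocks

class BlockCache:
    def __init__(self):
        self.blocks = {}  # (file_id, block_index) -> bytes

    def read(self, file_id, offset, fetch_from_server):
        """Return the block covering `offset`, filling the cache on a miss."""
        index = offset // BLOCK_SIZE
        key = (file_id, index)
        if key not in self.blocks:                 # miss: go to the server
            self.blocks[key] = fetch_from_server(file_id, index)
        return self.blocks[key]

cache = BlockCache()
fetched = []
def fetch(file_id, index):
    fetched.append((file_id, index))
    return b"x" * BLOCK_SIZE

cache.read(7, 0, fetch)            # first block: server fetch
cache.read(7, 100, fetch)          # same block: served from the cache
cache.read(7, BLOCK_SIZE, fetch)   # next block: second server fetch
```

Note how two reads within the same 4-Kbyte block cost only one fetch.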
Sprite uses a delayed-write policy to write dirty blocks to disk
Every 30 seconds, dirty blocks that have not been changed in the last 30 seconds are written to disk
Blocks written by a client reach the server's cache in 30-60 seconds and the server's disk in 30-60 more seconds
Limits server traffic
Minimizes the damage in a crash
(Diagram: a dirty block untouched for 30 seconds moves from the client cache to the server cache, and after another 30 seconds to the server's disk.)
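The delayed-write sweep can be sketched as follows, using logical timestamps instead of a real 30-second timer. This is a simplified illustration; the class and function names are invented:

```python
# Minimal sketch of Sprite's delayed-write policy: a sweep (run every
# 30 seconds in Sprite) writes back only blocks that have stayed dirty
# and untouched for at least DELAY seconds.
DELAY = 30

class DelayedWriteCache:
    def __init__(self):
        self.dirty = {}  # block_id -> time of last modification

    def write(self, block_id, now):
        self.dirty[block_id] = now   # (re)touching a block resets its clock

    def sweep(self, now, write_back):
        """Flush blocks untouched for DELAY seconds."""
        for block_id, touched in list(self.dirty.items()):
            if now - touched >= DELAY:
                write_back(block_id)
                del self.dirty[block_id]

cache = DelayedWriteCache()
flushed = []
cache.write("a", now=0)
cache.write("b", now=20)
cache.sweep(now=30, write_back=flushed.append)  # "a" is old enough, "b" is not
```

Short-lived files that are deleted within the delay never generate server traffic at all, which is the point of the policy.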
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
Two unusual design optimizations differentiate the system and solve problems
Consistency guaranteed
– All clients see the most recent version of a file
– Provides transparency to the user
– Concurrent and sequential write-sharing permitted
Cache size changes
– Virtual memory system and file system negotiate over physical memory
– Cache space reallocated dynamically
Concurrent write-sharing makes the file system more user friendly
A file is opened by multiple clients
At least one client has the file open for writing
Concurrent write-sharing occurs
(Diagram: two clients have file F1 open for reading while a third has it open for writing; concurrent write-sharing occurs.)
Concurrent write-sharing can jeopardize file consistency
Server detects concurrent write-sharing
Server instructs the writing client to flush all dirty blocks
Server notifies all clients that the file is no longer cacheable
Clients remove all cached blocks
All future access requests are sent to the server
Server serializes requests
File becomes cacheable again once it is no longer open and undergoing write-sharing
(Diagram: the server detects write-sharing on F1, notifies the writer to flush its dirty blocks, and notifies every client that F1 is uncacheable; subsequent read and write requests go straight to the server.)
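The steps above can be sketched as a server-side protocol. This is a simplified, hypothetical rendering (the real Sprite server tracks open modes inside the kernel; the class name and log format are invented):

```python
# Sketch of the server's reaction to concurrent write-sharing: when a file
# already open for writing is opened by another client (or a second writer
# appears), the writer is told to flush and all clients lose caching rights.
class Server:
    def __init__(self):
        self.opens = {}        # file_id -> list of (client, mode)
        self.uncacheable = set()
        self.log = []          # notifications sent to clients

    def open(self, client, file_id, mode):
        holders = self.opens.setdefault(file_id, [])
        writers = [c for c, m in holders if m == "w"]
        if holders and (mode == "w" or writers):
            # concurrent write-sharing detected
            for w in writers:
                self.log.append(("flush", w, file_id))
            for c, _ in holders:
                self.log.append(("disable-cache", c, file_id))
            self.log.append(("disable-cache", client, file_id))
            self.uncacheable.add(file_id)
        holders.append((client, mode))

srv = Server()
srv.open("A", "F1", "w")   # first open: no sharing yet
srv.open("B", "F1", "r")   # second open while A writes: protocol fires
```

Once F1 is uncacheable, the server sees every read and write and can serialize them.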
Sequential write-sharing provides transparency, but not without risks
Sequential write-sharing
– Occurs when a file is modified by a client, closed, then opened by a second client
– Clients are always guaranteed to see the most recent version of the file
(Diagram: client A writes version 2 of File 1 and closes it; clients B and C still hold version 1 in their caches.)
Sequential write-sharing: Problem 1
Problem:
– Client A modifies a file
– Client A closes the file
– Client B opens the file using out-of-date cached blocks
– Client B has an out-of-date version of the file
Solution: version numbers
(Diagram: client A closes F1 at version 2; client B opens F1 while its cache still holds version 1, so the version mismatch forces a refetch.)
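The version-number fix can be sketched like this. A hypothetical illustration: the server bumps a file's version on each writing close, and at open time a client compares its cached version against the server's (all names here are invented):

```python
# Sketch of version-number validation on open: cached blocks are usable
# only if the client's cached version matches the server's current version.
server_versions = {"F1": 2}   # server's authoritative version numbers
client_cache = {"F1": {"version": 1, "blocks": {0: b"old"}}}

def open_file(file_id):
    entry = client_cache.get(file_id)
    current = server_versions[file_id]
    if entry is None or entry["version"] != current:
        # stale cache: discard blocks and start refetching from the server
        client_cache[file_id] = {"version": current, "blocks": {}}
        return "cache invalidated"
    return "cache valid"

result = open_file("F1")   # cached v1 vs. server v2: invalidated
```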
Sequential write-sharing: Problem 2
Problem: the last client to write to a file did not flush the dirty blocks
Solution:
– Server keeps track of the last writer
– Only the last writer is allowed to have dirty blocks
– When the server receives an open request, it notifies the last writer
– Writer writes any dirty blocks to the server
Ensures the reader will receive up-to-date info.
(Diagram: when client B opens F1, the server notifies last writer A, which writes its dirty blocks to the server before the open completes.)
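The last-writer bookkeeping can be sketched as follows (a hypothetical illustration; the class name and method names are invented):

```python
# Sketch of the last-writer fix: the server remembers which client last
# wrote each file; on the next open it asks that client to flush any
# dirty blocks before the opener proceeds.
class LastWriterServer:
    def __init__(self):
        self.last_writer = {}   # file_id -> client that last wrote it
        self.notified = []      # "flush your dirty blocks" messages sent

    def close_after_write(self, client, file_id):
        self.last_writer[file_id] = client

    def open(self, client, file_id):
        writer = self.last_writer.get(file_id)
        if writer is not None and writer != client:
            self.notified.append((writer, file_id))

srv = LastWriterServer()
srv.close_after_write("A", "F1")   # A is now the last writer of F1
srv.open("B", "F1")                # server tells A to flush before B reads
```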
Cache consistency does increase server traffic
Server traffic was reduced by over 70% due to client caching
25% of all traffic is a result of cache consistency
Table 2
– Gives an upper bound on what cache consistency algorithms could achieve
– Unrealistic, since ignoring consistency produced incorrect results
(Chart: normalized server traffic vs. client cache size, 0 to 8 Mbytes, with cache consistency guaranteed and with consistency ignored.)
Dynamic cache allocation also sets Sprite apart
Virtual memory and the file system battle over main memory
Both modules keep a time of last access
Each compares its oldest page with the oldest page in the other module
The oldest page overall is recycled
Virtual memory: keeps pages in approximate LRU order using a clock algorithm
File system: keeps blocks in perfect LRU order by tracking read and write calls
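The negotiation can be sketched as a comparison of the two modules' oldest entries (a simplified illustration; the function name and list representation are invented, and real Sprite compares clock-approximated ages, not exact lists):

```python
# Sketch of the memory negotiation: when one module needs a page, it
# compares its own oldest last-access time against the other module's
# oldest, and whichever entry is older overall is recycled.
def take_oldest(vm_ages, fs_ages):
    """vm_ages/fs_ages: last-access times; returns ('vm'|'fs', age) of victim."""
    oldest_vm = min(vm_ages) if vm_ages else None
    oldest_fs = min(fs_ages) if fs_ages else None
    if oldest_fs is None or (oldest_vm is not None and oldest_vm <= oldest_fs):
        vm_ages.remove(oldest_vm)
        return ("vm", oldest_vm)
    fs_ages.remove(oldest_fs)
    return ("fs", oldest_fs)

vm, fs = [100, 300], [50, 400]
victim = take_oldest(vm, fs)   # the file cache's entry from time 50 is oldest
```

This is how the file cache grows when virtual memory is idle and shrinks when it is not.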
Negotiations could cause double-caching
Problem:
– Pages being read from backing files could wind up in both the file cache and the virtual memory cache
– Could force a page eliminated from the virtual memory pool to be moved to the file cache
– The page would then have to wait another 30 seconds to be sent to the server
Solution:
– When writing and reading backing files, virtual memory skips the local file cache
(Diagram: without the fix, the same page appears in both the virtual memory pool and the file cache.)
Multi-block pages create problems in shared caching
Problem:
– Virtual memory pages are big enough to hold multiple file blocks
– Which block's age should be used to represent the LRU time of the page?
– What should be done with the other blocks once one is relinquished?
Solution:
– The age of the page is the age of the youngest block
– All blocks in a page are removed together
(Diagram: a page holding blocks last touched at 2:15 and 2:16 is aged by the younger block; all blocks in a page are evicted together.)
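The multi-block rule can be sketched in two small helpers (hypothetical names; a minimal illustration of the policy, not Sprite's implementation):

```python
# Sketch of the multi-block-page rule: a virtual memory page that holds
# several file blocks is aged by its *youngest* (most recently used) block,
# and eviction removes all of the page's blocks from the cache at once.
def page_age(block_ages):
    """LRU age of a page is the most recent access among its blocks."""
    return max(block_ages)

def evict_page(cache, page_blocks):
    """All blocks in a page leave the cache together."""
    for block in page_blocks:
        cache.pop(block, None)

cache = {("f", 0): b"a", ("f", 1): b"b", ("g", 0): b"c"}
age = page_age([215, 216])               # page takes the younger block's age
evict_page(cache, [("f", 0), ("f", 1)])  # both of the page's blocks go
```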
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
Micro-benchmarks show reading from a server cache is fast
Give an upper limit on remote file access costs
Two important results:
– A client can access its own cache 6-8 times faster than it can access the server's cache
– A client can write and read the server's cache about as quickly as it can a local disk
(Chart: read and write throughput, in Kbytes/second, for the local cache, server cache, local disk, and server disk.)
Macro-benchmarks indicate disks and caching together run fastest
(Chart: normalized execution time for local disk with cache and for diskless configurations with server cache only or client and server caches, each with cold and warm starts.)
With a warm start and client caching, diskless machines were only up to 12% worse than machines with disks
Without caching, machines were 10-50% slower
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
Andrew's caching is notably different from Sprite's
Vice: a group of trusted servers
– Stores data and status information in separate files
– Has a directory hierarchy
Venus: a user-level process on each client workstation
– Status cache: stored in virtual memory for quick status checks
– Data cache: stored on local disk
(Diagram: in Sprite, client and server caches hold data blocks plus naming info in main memory, and open, close, read, and write calls go to the server; in Andrew, Venus caches whole data files on the local disk and status info in memory, and only open and close calls reach Vice.)
…the pathname conventions are also very different
Two-level naming
– Each Vice file or directory is identified by a unique fid
– Venus maps Vice pathnames to fids
– Servers see only fids
Each fid has 3 parts and is 96 bits long
– 32-bit Volume number
– 32-bit Vnode number (index into the Volume)
– 32-bit Uniquifier (guarantees no fid is ever used twice)
– Contains no location information
Volume locations are maintained in a Volume Location Database found on each server
(Diagram: a fid is three 32-bit fields: Volume number, Vnode number, Uniquifier.)
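The 96-bit fid layout can be sketched with simple bit packing (an illustration of the three-field layout; the function names are invented and AFS does not necessarily store fids as one integer):

```python
# Sketch of Andrew's 96-bit fid: volume number, vnode number, and
# uniquifier, each 32 bits, with no location information embedded.
def pack_fid(volume, vnode, uniquifier):
    for part in (volume, vnode, uniquifier):
        assert 0 <= part < 2**32          # each field is exactly 32 bits
    return (volume << 64) | (vnode << 32) | uniquifier

def unpack_fid(fid):
    return ((fid >> 64) & 0xFFFFFFFF,
            (fid >> 32) & 0xFFFFFFFF,
            fid & 0xFFFFFFFF)

fid = pack_fid(volume=5, vnode=42, uniquifier=7)
```

Because the fid carries no location, moving a volume between servers only updates the Volume Location Database, not any fids.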
Andrew uses a write-on-close convention
Sprite: delayed-write policy
– Changes written back every 30 seconds
– Prevents writing changes that are quickly erased
– Decreases damage in the event of a crash
– Rations network traffic
Andrew: write-on-close policy
– Written changes are visible to the network only after the file is closed
– Little information lost in a crash: caching is on local disk, not main memory
– The network will not see a file in the event of a client crash
– 75% of files are open less than 0.5 seconds
– 90% are open less than 10 seconds
– Could result in higher server traffic
– Delays the closing process
Sequential consistency is guaranteed in Andrew and Sprite
Clients are guaranteed to see the latest version of a file
– Venus assumes that cached entries are valid
– Server maintains Callbacks to cached entries
– Server notifies callback holders before allowing a file to be modified
– Server has the ability to break callbacks to reclaim storage
– Reduces server utilization, since communication occurs only when a file is changed
(Diagram: client A stores version 2 of File 1; the server breaks the callbacks held by clients B and C, which then discard their cached version 1.)
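The callback mechanism can be sketched as server-side bookkeeping (a simplified, hypothetical rendering; the class name, `fetch`, and `store` are invented labels for the Vice operations):

```python
# Sketch of Andrew-style callbacks: the server records which clients cache
# a file and "breaks" their callbacks (one notification each) when the file
# is about to change; otherwise clients assume their cached copies are valid.
class CallbackServer:
    def __init__(self):
        self.callbacks = {}   # file_id -> set of clients holding a callback
        self.broken = []      # (client, file_id) invalidations sent

    def fetch(self, client, file_id):
        """Client caches the file; server promises to notify on change."""
        self.callbacks.setdefault(file_id, set()).add(client)

    def store(self, writer, file_id):
        """A new version arrives: break every other client's callback."""
        for client in self.callbacks.get(file_id, set()) - {writer}:
            self.broken.append((client, file_id))
        self.callbacks[file_id] = {writer}

srv = CallbackServer()
srv.fetch("B", "File1")
srv.fetch("C", "File1")
srv.store("A", "File1")   # B's and C's callbacks are broken
```

Unlike Sprite's validate-on-open, no traffic occurs for files that never change.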
Concurrent write-sharing consistency is not guaranteed
Different workstations can perform the same operation on a file at the same time
No implicit file locking
Applications must coordinate if synchronization is an issue
(Diagram: two clients write File 1 while another reads it; Andrew does not serialize the accesses.)
Comparison to other systems
With one client, Sprite is around 30% faster than NFS and 35% faster than Andrew
Andrew has the greatest scalability
– Each client in Andrew utilized about 2.4% of the server CPU
– 5.4% in Sprite
– 20% in NFS
(Charts: elapsed time and server utilization vs. number of clients, 0 to 8, for Sprite, Andrew, and NFS.)
Summary: Sprite vs. Andrew
Cache location: Sprite uses memory; Andrew uses disk
Client caching: Sprite clients cache data blocks; Andrew clients cache data files and status info.
Cache size: variable in Sprite; fixed in Andrew
File path lookups: performed by the server in Sprite; performed by clients in Andrew
Concurrent write-sharing: Sprite uses a 30-second write delay with consistency guaranteed; Andrew writes on close with consistency not guaranteed
Sequential write-sharing: consistency guaranteed by both; Sprite servers know which workstations have a file cached, while Andrew servers maintain callbacks to identify which workstations cache files
Cache validation: Sprite validates on open; the Andrew server notifies clients when a file is modified
Kernel vs. user-level: Sprite uses kernel-to-kernel communication; in Andrew the OS intercepts file system calls and forwards them to the user-level Venus
Conclusion: Both file systems have benefits and drawbacks
Sprite
Benefits:
– Guarantees sequential and concurrent consistency
– Faster runtime with a single client due to memory caching and kernel-to-kernel communication
– Files can be cached in blocks
Drawbacks:
– Lacks the scalability of Andrew
– Writing every 30 seconds could result in lost data
– Fewer files can be cached in main memory than on disk
Andrew
Benefits:
– Better scalability, due in part to shifting path lookup to the client
– Transferring entire files reduces communication with the server; no read and write calls
– Tracking entire files is easier than tracking individual pages
Drawbacks:
– Lacks concurrent write-sharing consistency guarantees
– Caching to the disk slows runtime
– Files larger than the disk cannot be cached
Backup
Cache consistency does increase server traffic
Server traffic was reduced by over 70% due to client caching
25% of all traffic is a result of cache consistency
Table 2
– Gives an upper bound on what cache consistency algorithms could achieve
– Unrealistic, since ignoring consistency produced incorrect results
Server Traffic with Cache Consistency
Client Cache Size   Blocks Read   Blocks Written   Total Traffic   Ratio
0 Mbyte             445815        172546           618361          100%
0.5 Mbyte           102469        96866            199335          32%
1 Mbyte             84017         96796            180813          29%
2 Mbytes            77445         96796            174241          28%
4 Mbytes            75322         96796            172118          28%
8 Mbytes            75088         96796            171884          28%

Server Traffic, Ignoring Cache Consistency
Client Cache Size   Blocks Read   Blocks Written   Total Traffic   Ratio
0 Mbyte             445815        172546           618361          100%
0.5 Mbyte           80754         93663            174417          28%
1 Mbyte             52377         93258            145635          24%
2 Mbytes            41767         93258            135025          22%
4 Mbytes            38165         93258            131423          21%
8 Mbytes            37007         93258            130265          21%
Micro-benchmarks show reading from a server cache is fast
Give an upper limit on remote file access costs
Two important results:
– A client can access its own cache 6-8 times faster than it can access the server's cache
– A client can write and read the server's cache about as quickly as it can a local disk

Read and Write Throughput, Kbytes/second
        Local Cache   Server Cache   Local Disk   Server Disk
Read    3269          475            224          212
Write   2893          380            197          176
Maximum read and write rates in various places
Macro-benchmarks indicate disks and caching together run fastest
With a warm start and client caching, diskless machines were only up to 12% worse than machines with disks
Without caching, machines were 10-50% slower

Benchmark    Local Disk w/ Cache       Diskless, Server Cache Only   Diskless, Client and Server Caches
             Cold        Warm          Cold         Warm             Cold         Warm
Andrew       261 (105%)  249 (100%)    373 (150%)   363 (146%)       291 (117%)   280 (112%)
Fs-make      660 (102%)  649 (100%)    855 (132%)   843 (130%)       698 (108%)   685 (106%)
Simulator    161 (109%)  147 (100%)    168 (114%)   153 (104%)       167 (114%)   147 (100%)
Sort         65 (107%)   61 (100%)     74 (121%)    72 (118%)        66 (108%)    61 (100%)
Diff         22 (165%)   8 (100%)      27 (225%)    12 (147%)        27 (223%)    8 (100%)
Nroff        53 (103%)   51 (100%)     57 (112%)    56 (109%)        53 (105%)    52 (102%)
Times in seconds; normalized times in parentheses
Status info. on Andrew and Sprite
Sprite mgmt cache contains:
– File maps
– Disk management info.
Volumes in Andrew
Volume
– A collection of files
– Forms a partial subtree in the Vice name space
Volumes joined at Mount Points
Resides in a single disk partition
Can be moved from server to server easily for load balancing
Enables quotas and backup
Sprite caching improves speed and reduces overhead
Client-side caching enables diskless workstations
– Caching on diskless workstations improves runtime by 10-40%
– Diskless workstations with caching are only 0-12% slower than workstations with disks
Caching on the server and client side results in overall system improvement
– Server utilization is reduced from 5-18% to 1-9% per active client
– File-intensive benchmarking completed 30-35% faster on Sprite than on other systems