Caching in the Sprite Network File System
Scale and Performance in a Distributed File System
COMP 520
September 21, 2004
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
The Sprite file system is functionally similar to UNIX
Read, write, open, and close calls provide access to files
Sprite communicates kernel-to-kernel
Remote procedure calls (RPC) allow kernels to talk to each other
(Diagram: several client kernels communicate with the server kernel via RPC.)
Sprite uses caching on the client and server side
Two different caching mechanisms
– Server workstations use caching to reduce delays caused by disk accesses
– Client workstations use caching to minimize the number of calls made to non-local disks
(Diagram: file traffic from applications hits the client cache; misses become server traffic over the network to the server cache, whose misses become disk traffic.)
Three main issues are addressed by Sprite's caching system
1. Should client caches be kept in main memory or on local disk?
2. What structure and addressing scheme should be used for caching?
3. What should happen when a block is written back to disk?
Sprite caches client data in main memory, not on local disk
Allows clients to be diskless
– Cheaper
– Quieter
Data access is faster
Physical memory is large enough
– Provides a high hit ratio
– Memory size will continue to grow
A single caching mechanism can be used for both client and server
A virtual addressing structure is used for caching
Data organized into blocks
– 4 Kbytes
– Virtually addressed
– Unique file identifier and block index
– Both client and server cache data blocks
Server also caches naming info.
– Addressed using physical addresses
– All naming operations (open, close, etc.) passed to the server
– Cached file info. lost if the server crashes
(Diagram: client and server caches hold virtually addressed data blocks; the server cache also holds physically addressed naming/management info. Open, close, read, and write calls and data blocks flow between client and server.)
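The virtually addressed design above can be sketched as a cache keyed by (file identifier, block index), so a hit never needs a disk address or a server round trip. This is a hypothetical illustration in Python, not Sprite's kernel code; the names `BlockCache` and `fetch_from_server` are invented:

```python
# Sketch of a virtually addressed block cache: blocks are looked up by
# (file_id, block_index) rather than by physical disk address, so a client
# can satisfy repeated reads without contacting the server.
BLOCK_SIZE = 4096  # Sprite uses 4-Kbyte blocks

class BlockCache:
    def __init__(self):
        self.blocks = {}  # (file_id, block_index) -> bytes

    def read(self, file_id, offset, fetch_from_server):
        """Return the block covering `offset`, filling the cache on a miss."""
        index = offset // BLOCK_SIZE
        key = (file_id, index)
        if key not in self.blocks:                 # miss: go to the server
            self.blocks[key] = fetch_from_server(file_id, index)
        return self.blocks[key]

cache = BlockCache()
fetched = []
def fetch(file_id, index):
    fetched.append((file_id, index))
    return b"x" * BLOCK_SIZE

cache.read(7, 0, fetch)            # first block: server fetch
cache.read(7, 100, fetch)          # same block: served from the cache
cache.read(7, BLOCK_SIZE, fetch)   # next block: second server fetch
```

Note how two reads within the same 4-Kbyte block cost only one fetch.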
Sprite uses a delayed-write policy to write dirty blocks to disk
Every 30 seconds, dirty blocks that have not been changed in the last 30 seconds are written to disk
Blocks written by a client reach the server's cache in 30-60 seconds and the server's disk in 30-60 more seconds
Limits server traffic
Minimizes the damage in a crash
(Diagram: a dirty block untouched for 30 seconds moves from the client cache to the server cache, and after another 30 seconds to the server's disk.)
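The delayed-write sweep can be sketched as follows, using logical timestamps instead of a real 30-second timer. This is a simplified illustration; the class and function names are invented:

```python
# Minimal sketch of Sprite's delayed-write policy: a sweep (run every
# 30 seconds in Sprite) writes back only blocks that have stayed dirty
# and untouched for at least DELAY seconds.
DELAY = 30

class DelayedWriteCache:
    def __init__(self):
        self.dirty = {}  # block_id -> time of last modification

    def write(self, block_id, now):
        self.dirty[block_id] = now   # (re)touching a block resets its clock

    def sweep(self, now, write_back):
        """Flush blocks untouched for DELAY seconds."""
        for block_id, touched in list(self.dirty.items()):
            if now - touched >= DELAY:
                write_back(block_id)
                del self.dirty[block_id]

cache = DelayedWriteCache()
flushed = []
cache.write("a", now=0)
cache.write("b", now=20)
cache.sweep(now=30, write_back=flushed.append)  # "a" is old enough, "b" is not
```

Short-lived files that are deleted within the delay never generate server traffic at all, which is the point of the policy.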
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
Two unusual design optimizations differentiate the system and solve problems
Consistency guaranteed
– All clients see the most recent version of a file
– Provides transparency to the user
– Concurrent and sequential write-sharing permitted
Cache size changes
– Virtual memory system and file system negotiate over physical memory
– Cache space reallocated dynamically
Concurrent write-sharing makes the file system more user friendly
A file is opened by multiple clients
At least one client has the file open for writing
Concurrent write-sharing occurs
(Diagram: two clients have file F1 open for reading while a third has it open for writing; concurrent write-sharing occurs.)
Concurrent write-sharing can jeopardize file consistency
Server detects concurrent write-sharing
Server instructs the writing client to flush all dirty blocks
Server notifies all clients that the file is no longer cacheable
Clients remove all cached blocks
All future access requests are sent to the server
Server serializes requests
File becomes cacheable again once it is no longer open and undergoing write-sharing
(Diagram: the server detects write-sharing on F1, notifies the writer to flush its dirty blocks, and notifies every client that F1 is uncacheable; subsequent read and write requests go straight to the server.)
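The steps above can be sketched as a server-side protocol. This is a simplified, hypothetical rendering (the real Sprite server tracks open modes inside the kernel; the class name and log format are invented):

```python
# Sketch of the server's reaction to concurrent write-sharing: when a file
# already open for writing is opened by another client (or a second writer
# appears), the writer is told to flush and all clients lose caching rights.
class Server:
    def __init__(self):
        self.opens = {}        # file_id -> list of (client, mode)
        self.uncacheable = set()
        self.log = []          # notifications sent to clients

    def open(self, client, file_id, mode):
        holders = self.opens.setdefault(file_id, [])
        writers = [c for c, m in holders if m == "w"]
        if holders and (mode == "w" or writers):
            # concurrent write-sharing detected
            for w in writers:
                self.log.append(("flush", w, file_id))
            for c, _ in holders:
                self.log.append(("disable-cache", c, file_id))
            self.log.append(("disable-cache", client, file_id))
            self.uncacheable.add(file_id)
        holders.append((client, mode))

srv = Server()
srv.open("A", "F1", "w")   # first open: no sharing yet
srv.open("B", "F1", "r")   # second open while A writes: protocol fires
```

Once F1 is uncacheable, the server sees every read and write and can serialize them.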
Sequential write-sharing provides transparency, but not without risks
Sequential write-sharing
– Occurs when a file is modified by a client, closed, then opened by a second client
– Clients are always guaranteed to see the most recent version of the file
(Diagram: client A writes version 2 of File 1 and closes it; clients B and C still hold version 1 in their caches.)
Sequential write-sharing: Problem 1
Problem:
– Client A modifies a file
– Client A closes the file
– Client B opens the file using out-of-date cached blocks
– Client B has an out-of-date version of the file
Solution: version numbers
(Diagram: client A closes F1 at version 2; client B opens F1 while its cache still holds version 1, so the version mismatch forces a refetch.)
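The version-number fix can be sketched like this. A hypothetical illustration: the server bumps a file's version on each writing close, and at open time a client compares its cached version against the server's (all names here are invented):

```python
# Sketch of version-number validation on open: cached blocks are usable
# only if the client's cached version matches the server's current version.
server_versions = {"F1": 2}   # server's authoritative version numbers
client_cache = {"F1": {"version": 1, "blocks": {0: b"old"}}}

def open_file(file_id):
    entry = client_cache.get(file_id)
    current = server_versions[file_id]
    if entry is None or entry["version"] != current:
        # stale cache: discard blocks and start refetching from the server
        client_cache[file_id] = {"version": current, "blocks": {}}
        return "cache invalidated"
    return "cache valid"

result = open_file("F1")   # cached v1 vs. server v2: invalidated
```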
Sequential write-sharing: Problem 2
Problem: the last client to write to a file did not flush the dirty blocks
Solution:
– Server keeps track of the last writer
– Only the last writer is allowed to have dirty blocks
– When the server receives an open request, it notifies the last writer
– Writer writes any dirty blocks to the server
Ensures the reader will receive up-to-date info.
(Diagram: when client B opens F1, the server notifies last writer A, which writes its dirty blocks to the server before the open completes.)
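The last-writer bookkeeping can be sketched as follows (a hypothetical illustration; the class name and method names are invented):

```python
# Sketch of the last-writer fix: the server remembers which client last
# wrote each file; on the next open it asks that client to flush any
# dirty blocks before the opener proceeds.
class LastWriterServer:
    def __init__(self):
        self.last_writer = {}   # file_id -> client that last wrote it
        self.notified = []      # "flush your dirty blocks" messages sent

    def close_after_write(self, client, file_id):
        self.last_writer[file_id] = client

    def open(self, client, file_id):
        writer = self.last_writer.get(file_id)
        if writer is not None and writer != client:
            self.notified.append((writer, file_id))

srv = LastWriterServer()
srv.close_after_write("A", "F1")   # A is now the last writer of F1
srv.open("B", "F1")                # server tells A to flush before B reads
```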
Cache consistency does increase server traffic
Server traffic was reduced by over 70% due to client caching
25% of all traffic is a result of cache consistency
Table 2
– Gives an upper bound on what cache consistency algorithms could achieve
– Unrealistic, since ignoring consistency produced incorrect results
(Chart: normalized server traffic vs. client cache size, 0 to 8 Mbytes, with cache consistency guaranteed and with consistency ignored.)
Dynamic cache allocation also sets Sprite apart
Virtual memory and the file system battle over main memory
Both modules keep a time of last access
Each compares its oldest page with the oldest page in the other module
The oldest page overall is recycled
Virtual memory: keeps pages in approximate LRU order using a clock algorithm
File system: keeps blocks in perfect LRU order by tracking read and write calls
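The negotiation can be sketched as a comparison of the two modules' oldest entries (a simplified illustration; the function name and list representation are invented, and real Sprite compares clock-approximated ages, not exact lists):

```python
# Sketch of the memory negotiation: when one module needs a page, it
# compares its own oldest last-access time against the other module's
# oldest, and whichever entry is older overall is recycled.
def take_oldest(vm_ages, fs_ages):
    """vm_ages/fs_ages: last-access times; returns ('vm'|'fs', age) of victim."""
    oldest_vm = min(vm_ages) if vm_ages else None
    oldest_fs = min(fs_ages) if fs_ages else None
    if oldest_fs is None or (oldest_vm is not None and oldest_vm <= oldest_fs):
        vm_ages.remove(oldest_vm)
        return ("vm", oldest_vm)
    fs_ages.remove(oldest_fs)
    return ("fs", oldest_fs)

vm, fs = [100, 300], [50, 400]
victim = take_oldest(vm, fs)   # the file cache's entry from time 50 is oldest
```

This is how the file cache grows when virtual memory is idle and shrinks when it is not.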
Negotiations could cause double-caching
Problem:
– Pages being read from backing files could wind up in both the file cache and the virtual memory cache
– Could force a page eliminated from the virtual memory pool to be moved to the file cache
– The page would then have to wait another 30 seconds to be sent to the server
Solution:
– When writing and reading backing files, virtual memory skips the local file cache
(Diagram: without the fix, the same page appears in both the virtual memory pool and the file cache.)
Multi-block pages create problems in shared caching
Problem:
– Virtual memory pages are big enough to hold multiple file blocks
– Which block's age should be used to represent the LRU time of the page?
– What should be done with the other blocks once one is relinquished?
Solution:
– The age of the page is the age of the youngest block
– All blocks in a page are removed together
(Diagram: a page holding blocks last touched at 2:15 and 2:16 is aged by the younger block; all blocks in a page are evicted together.)
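The multi-block rule can be sketched in two small helpers (hypothetical names; a minimal illustration of the policy, not Sprite's implementation):

```python
# Sketch of the multi-block-page rule: a virtual memory page that holds
# several file blocks is aged by its *youngest* (most recently used) block,
# and eviction removes all of the page's blocks from the cache at once.
def page_age(block_ages):
    """LRU age of a page is the most recent access among its blocks."""
    return max(block_ages)

def evict_page(cache, page_blocks):
    """All blocks in a page leave the cache together."""
    for block in page_blocks:
        cache.pop(block, None)

cache = {("f", 0): b"a", ("f", 1): b"b", ("g", 0): b"c"}
age = page_age([215, 216])               # page takes the younger block's age
evict_page(cache, [("f", 0), ("f", 1)])  # both of the page's blocks go
```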
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
Micro-benchmarks show reading from a server cache is fast
Give an upper limit on remote file access costs
Two important results:
– A client can access its own cache 6-8 times faster than it can access the server's cache
– A client can write and read the server's cache about as quickly as it can a local disk
(Chart: read and write throughput, in Kbytes/second, for the local cache, server cache, local disk, and server disk.)
Macro-benchmarks indicate disks and caching together run fastest
(Chart: normalized execution time for local disk with cache and for diskless configurations with server cache only or client and server caches, each with cold and warm starts.)
With a warm start and client caching, diskless machines were only up to 12% worse than machines with disks
Without caching, machines were 10-50% slower
Agenda
The Sprite file system
Basic cache design
Concurrency issues
Benchmarking Andrew
Andrew's caching is notably different from Sprite's
Vice: a group of trusted servers
– Stores data and status information in separate files
– Has a directory hierarchy
Venus: a user-level process on each client workstation
– Status cache: stored in virtual memory for quick status checks
– Data cache: stored on local disk
(Diagram: in Sprite, client and server caches hold data blocks plus naming info in main memory, and open, close, read, and write calls go to the server; in Andrew, Venus caches whole data files on the local disk and status info in memory, and only open and close calls reach Vice.)
…the pathname conventions are also very different
Two-level naming
– Each Vice file or directory is identified by a unique fid
– Venus maps Vice pathnames to fids
– Servers see only fids
Each fid has 3 parts and is 96 bits long
– 32-bit Volume number
– 32-bit Vnode number (index into the Volume)
– 32-bit Uniquifier (guarantees no fid is ever used twice)
– Contains no location information
Volume locations are maintained in a Volume Location Database found on each server
(Diagram: a fid is three 32-bit fields: Volume number, Vnode number, Uniquifier.)
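The 96-bit fid layout can be sketched with simple bit packing (an illustration of the three-field layout; the function names are invented and AFS does not necessarily store fids as one integer):

```python
# Sketch of Andrew's 96-bit fid: volume number, vnode number, and
# uniquifier, each 32 bits, with no location information embedded.
def pack_fid(volume, vnode, uniquifier):
    for part in (volume, vnode, uniquifier):
        assert 0 <= part < 2**32          # each field is exactly 32 bits
    return (volume << 64) | (vnode << 32) | uniquifier

def unpack_fid(fid):
    return ((fid >> 64) & 0xFFFFFFFF,
            (fid >> 32) & 0xFFFFFFFF,
            fid & 0xFFFFFFFF)

fid = pack_fid(volume=5, vnode=42, uniquifier=7)
```

Because the fid carries no location, moving a volume between servers only updates the Volume Location Database, not any fids.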
Andrew uses a write-on-close convention
Sprite: delayed-write policy
– Changes written back every 30 seconds
– Prevents writing changes that are quickly erased
– Decreases damage in the event of a crash
– Rations network traffic
Andrew: write-on-close policy
– Written changes are visible to the network only after the file is closed
– Little information lost in a crash: caching is on local disk, not main memory
– The network will not see a file in the event of a client crash
– 75% of files are open less than 0.5 seconds
– 90% are open less than 10 seconds
– Could result in higher server traffic
– Delays the closing process
Sequential consistency is guaranteed in Andrew and Sprite
Clients are guaranteed to see the latest version of a file
– Venus assumes that cached entries are valid
– Server maintains Callbacks to cached entries
– Server notifies callback holders before allowing a file to be modified
– Server has the ability to break callbacks to reclaim storage
– Reduces server utilization, since communication occurs only when a file is changed
(Diagram: client A stores version 2 of File 1; the server breaks the callbacks held by clients B and C, which then discard their cached version 1.)
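The callback mechanism can be sketched as server-side bookkeeping (a simplified, hypothetical rendering; the class name, `fetch`, and `store` are invented labels for the Vice operations):

```python
# Sketch of Andrew-style callbacks: the server records which clients cache
# a file and "breaks" their callbacks (one notification each) when the file
# is about to change; otherwise clients assume their cached copies are valid.
class CallbackServer:
    def __init__(self):
        self.callbacks = {}   # file_id -> set of clients holding a callback
        self.broken = []      # (client, file_id) invalidations sent

    def fetch(self, client, file_id):
        """Client caches the file; server promises to notify on change."""
        self.callbacks.setdefault(file_id, set()).add(client)

    def store(self, writer, file_id):
        """A new version arrives: break every other client's callback."""
        for client in self.callbacks.get(file_id, set()) - {writer}:
            self.broken.append((client, file_id))
        self.callbacks[file_id] = {writer}

srv = CallbackServer()
srv.fetch("B", "File1")
srv.fetch("C", "File1")
srv.store("A", "File1")   # B's and C's callbacks are broken
```

Unlike Sprite's validate-on-open, no traffic occurs for files that never change.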
Concurrent write-sharing consistency is not guaranteed
Different workstations can perform the same operation on a file at the same time
No implicit file locking
Applications must coordinate if synchronization is an issue
(Diagram: two clients write File 1 while another reads it; Andrew does not serialize the accesses.)
Comparison to other systems
With one client, Sprite is around 30% faster than NFS and 35% faster than Andrew
Andrew has the greatest scalability
– Each client in Andrew utilized about 2.4% of the server CPU
– 5.4% in Sprite
– 20% in NFS
(Charts: elapsed time and server utilization vs. number of clients, 0 to 8, for Sprite, Andrew, and NFS.)
Summary: Sprite vs. Andrew
Cache location: Sprite uses memory; Andrew uses disk
Client caching: Sprite clients cache data blocks; Andrew clients cache data files and status info.
Cache size: variable in Sprite; fixed in Andrew
File path lookups: performed by the server in Sprite; performed by clients in Andrew
Concurrent write-sharing: Sprite uses a 30-second write delay with consistency guaranteed; Andrew writes on close with consistency not guaranteed
Sequential write-sharing: consistency guaranteed by both; Sprite servers know which workstations have a file cached, while Andrew servers maintain callbacks to identify which workstations cache files
Cache validation: Sprite validates on open; the Andrew server notifies clients when a file is modified
Kernel vs. user-level: Sprite uses kernel-to-kernel communication; in Andrew the OS intercepts file system calls and forwards them to the user-level Venus
Conclusion: Both file systems have benefits and drawbacks
Sprite
Benefits:
– Guarantees sequential and concurrent consistency
– Faster runtime with a single client due to memory caching and kernel-to-kernel communication
– Files can be cached in blocks
Drawbacks:
– Lacks the scalability of Andrew
– Writing every 30 seconds could result in lost data
– Fewer files can be cached in main memory than on disk
Andrew
Benefits:
– Better scalability, due in part to shifting path lookup to the client
– Transferring entire files reduces communication with the server; no read and write calls
– Tracking entire files is easier than tracking individual pages
Drawbacks:
– Lacks concurrent write-sharing consistency guarantees
– Caching to the disk slows runtime
– Files larger than the disk cannot be cached
Backup
Cache consistency does increase server traffic
Server traffic was reduced by over 70% due to client caching
25% of all traffic is a result of cache consistency
Table 2
– Gives an upper bound on what cache consistency algorithms could achieve
– Unrealistic, since ignoring consistency produced incorrect results
Server Traffic with Cache Consistency
Client Cache Size   Blocks Read   Blocks Written   Total Traffic   Ratio
0 Mbyte             445815        172546           618361          100%
0.5 Mbyte           102469        96866            199335          32%
1 Mbyte             84017         96796            180813          29%
2 Mbytes            77445         96796            174241          28%
4 Mbytes            75322         96796            172118          28%
8 Mbytes            75088         96796            171884          28%

Server Traffic, Ignoring Cache Consistency
Client Cache Size   Blocks Read   Blocks Written   Total Traffic   Ratio
0 Mbyte             445815        172546           618361          100%
0.5 Mbyte           80754         93663            174417          28%
1 Mbyte             52377         93258            145635          24%
2 Mbytes            41767         93258            135025          22%
4 Mbytes            38165         93258            131423          21%
8 Mbytes            37007         93258            130265          21%
Micro-benchmarks show reading from a server cache is fast
Give an upper limit on remote file access costs
Two important results:
– A client can access its own cache 6-8 times faster than it can access the server's cache
– A client can write and read the server's cache about as quickly as it can a local disk

Read and Write Throughput, Kbytes/second
        Local Cache   Server Cache   Local Disk   Server Disk
Read    3269          475            224          212
Write   2893          380            197          176
Maximum read and write rates in various places
Macro-benchmarks indicate disks and caching together run fastest
With a warm start and client caching, diskless machines were only up to 12% worse than machines with disks
Without caching, machines were 10-50% slower

Benchmark    Local Disk w/ Cache       Diskless, Server Cache Only   Diskless, Client and Server Caches
             Cold        Warm          Cold         Warm             Cold         Warm
Andrew       261 (105%)  249 (100%)    373 (150%)   363 (146%)       291 (117%)   280 (112%)
Fs-make      660 (102%)  649 (100%)    855 (132%)   843 (130%)       698 (108%)   685 (106%)
Simulator    161 (109%)  147 (100%)    168 (114%)   153 (104%)       167 (114%)   147 (100%)
Sort         65 (107%)   61 (100%)     74 (121%)    72 (118%)        66 (108%)    61 (100%)
Diff         22 (165%)   8 (100%)      27 (225%)    12 (147%)        27 (223%)    8 (100%)
Nroff        53 (103%)   51 (100%)     57 (112%)    56 (109%)        53 (105%)    52 (102%)
Times in seconds; normalized times in parentheses
Status info. on Andrew and Sprite
Sprite mgmt cache contains:
– File maps
– Disk management info.
Volumes in Andrew
Volume
– A collection of files
– Forms a partial subtree in the Vice name space
Volumes joined at Mount Points
Resides in a single disk partition
Can be moved from server to server easily for load balancing
Enables quotas and backup
Sprite caching improves speed and reduces overhead
Client-side caching enables diskless workstations
– Caching on diskless workstations improves runtime by 10-40%
– Diskless workstations with caching are only 0-12% slower than workstations with disks
Caching on the server and client side results in overall system improvement
– Server utilization is reduced from 5-18% to 1-9% per active client
– File-intensive benchmarking completed 30-35% faster on Sprite than on other systems