The Zebra Striped Network Filesystem

Approach

• Increase throughput, reliability by striping file data across multiple servers

• Data from each client is formed into a single stream
– Data is striped in an approach similar to a log-structured file system
– Parity for each stripe is written in the style of RAID disk arrays
– Parity allows the system to continue to function despite server failures
– Layered on top of Sprite
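
The RAID-style parity mentioned above can be illustrated with a minimal sketch. It assumes byte-wise XOR parity over equal-sized fragments (the usual RAID scheme; the slides do not spell out the parity function). The names `xor_parity` and `reconstruct` are hypothetical.

```python
from functools import reduce

def xor_parity(fragments):
    """Byte-wise XOR of equal-length fragments (assumed parity function)."""
    assert len({len(f) for f in fragments}) == 1, "fragments must be equal length"
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))

def reconstruct(surviving, parity):
    """Rebuild the single missing data fragment from the survivors plus parity."""
    return xor_parity(list(surviving) + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]   # one fragment per storage server
parity = xor_parity(data)            # stored on one more server
# Pretend the server holding data[1] failed:
recovered = reconstruct([data[0], data[2]], parity)
```

Because XOR is its own inverse, any one lost fragment of a stripe can be rebuilt from the remaining fragments and the parity fragment.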

Per-File Striping

• A collection of file data that spans servers is called a stripe; the portion of a stripe stored on a single server is a stripe fragment

• Per-file striping
– Each file is stored in its own set of stripes
– Parity is computed on a per-file basis
– Bad for small files
    • If a small file is striped, each disk has to access little pieces of the file
    • Small writes require 4 accesses, as in RAIDs
    • See Figure 3
– Consistency management must be dealt with
    • We are dealing with separate file servers, so some stripe fragments may get written successfully while others might not
    • In that case parity will be inconsistent with data
    • Appropriate protocols to protect against partial writes and incorrect parity would be needed (of course, analogous protocols are needed anyway in the log-structured file approach)
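
The "4 accesses" for a small write can be counted in a short sketch. It assumes XOR parity and models each disk as an entry in a Python list; `small_write` is a hypothetical helper, not part of any real system.

```python
def small_write(disks, data_idx, parity_idx, new_block):
    """Overwrite one block in place and return the number of disk accesses."""
    accesses = 0
    old_data = disks[data_idx]        # access 1: read old data
    accesses += 1
    old_parity = disks[parity_idx]    # access 2: read old parity
    accesses += 1
    # new parity = old parity XOR old data XOR new data
    new_parity = bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_block))
    disks[data_idx] = new_block       # access 3: write new data
    accesses += 1
    disks[parity_idx] = new_parity    # access 4: write new parity
    accesses += 1
    return accesses

disks = [b"\x01\x02", b"\x10\x20", b"\x11\x22"]   # two data blocks + their parity
count = small_write(disks, data_idx=0, parity_idx=2, new_block=b"\xff\x00")
```

If the two writes go to different servers and only one succeeds, the parity no longer matches the data, which is the consistency problem noted above.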

Log Structured Network Filesystem

• LFS uses logging approach at the interface between file server and disks

• Zebra uses logging approach at the interface between client and servers

• Each Zebra client organizes new file data into append-only log, which it stripes across servers (Figure 4)

• Client computes parity for log, not for individual files

• Each client creates its own log, so a single stripe contains data written by a single client

• Issues:
– How to share files between client workstations?
– How is free space reclaimed?
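
A minimal sketch of a client log that batches writes from any file into stripes and computes parity over the log rather than per file. It assumes XOR parity; the tiny `FRAGMENT_SIZE`, the `DATA_SERVERS` count, and the `ClientLog` class are hypothetical and chosen only for illustration.

```python
FRAGMENT_SIZE = 8    # illustrative only; far smaller than a real fragment
DATA_SERVERS = 3     # hypothetical: 3 data fragments + 1 parity fragment per stripe

class ClientLog:
    def __init__(self):
        self.pending = bytearray()   # log data not yet formed into a full stripe
        self.stripes = []            # each stripe: data fragments, parity last

    def append(self, block):
        """Append file data (from any file) to this client's log."""
        self.pending += block
        stripe_bytes = FRAGMENT_SIZE * DATA_SERVERS
        while len(self.pending) >= stripe_bytes:
            chunk = bytes(self.pending[:stripe_bytes])
            del self.pending[:stripe_bytes]
            self.stripes.append(self._make_stripe(chunk))

    @staticmethod
    def _make_stripe(chunk):
        frags = [chunk[i:i + FRAGMENT_SIZE] for i in range(0, len(chunk), FRAGMENT_SIZE)]
        parity = bytes(FRAGMENT_SIZE)            # start from all zeros
        for frag in frags:
            parity = bytes(x ^ y for x, y in zip(parity, frag))
        return frags + [parity]

log = ClientLog()
log.append(b"small file 1")   # 12 bytes
log.append(b"small file 2")   # 12 more bytes: together they fill one stripe
```

Two small files end up in the same stripe, so neither pays the per-file small-write penalty.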

Architecture

• Clients (many)
– Machines that run application programs
– Each client produces a separate log with parity

• Storage servers (many)
– Store file data

• File manager (one)
– Manages metadata: the file and directory structure of the file system
– Metadata can also be stored in a logged manner to improve performance and eliminate a major potential point of failure

• Stripe cleaner (one)
– Reclaims unused space on storage servers

• See Figure 5

Storage Server

• Stores stripe fragments (512KB)
– Log structured: fragments must not already exist, except for parity fragments, in which case the new copy replaces the old

• Appends to existing fragment

• Retrieves all or part of fragment

• Deletes fragment (invoked by stripe cleaner)

• Identifies fragments
– Returns the most recent fragment written by a client (used for recovery)
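
The five operations above can be sketched as an in-memory class. This is an illustration only: the method names, the fragment-id and client-id arguments, and the dictionary storage are assumptions, not the actual Zebra interface.

```python
class StorageServer:
    def __init__(self):
        self.fragments = {}   # fragment id -> bytearray
        self.latest = {}      # client id -> id of the most recent fragment stored

    def store(self, frag_id, client_id, data, is_parity=False):
        # Data fragments are write-once; only parity fragments may be replaced.
        if frag_id in self.fragments and not is_parity:
            raise ValueError("data fragment already exists")
        self.fragments[frag_id] = bytearray(data)
        self.latest[client_id] = frag_id

    def append(self, frag_id, data):
        self.fragments[frag_id] += data

    def retrieve(self, frag_id, offset=0, length=None):
        frag = self.fragments[frag_id]
        end = len(frag) if length is None else offset + length
        return bytes(frag[offset:end])

    def delete(self, frag_id):
        # In the slides this is invoked by the stripe cleaner.
        del self.fragments[frag_id]

    def identify(self, client_id):
        # Most recent fragment written by the client (used for recovery).
        return self.latest.get(client_id)

server = StorageServer()
server.store("f1", "client-a", b"abc")
server.append("f1", b"def")
```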

File Manager

• Stores all information in the file system except for file data (i.e., the metadata)
– Protection information, block pointers that say where data is stored, directories, symbolic links
– Carries out name lookup and maintains consistency of client file caches
– Client requests block pointers from file manager, then reads data from storage servers

• File manager implemented using Sprite file server with log-structured file system

• Zebra file: one file in the file manager’s file system; the data in that file is an array of block pointers that say where the actual data are stored

• Sprite network file protocols used with little modification: clients open, read, and cache Zebra metadata in the same way as they cache regular Sprite files

Stripe Cleaner

• New stripes are initially full of data

• Over time, blocks in stripe become free – overwritten or deleted

• Zebra doesn’t modify a stripe in place; instead it writes a new copy of the block to a new stripe

• Zebra stripe cleaner runs as a user-level process and identifies stripes with large amounts of free space
– Reads remaining live blocks
– Writes live blocks to a new stripe by appending to its client’s log

System Operation

• Contents of log
– Disk blocks: raw data from file
– Deltas
    • Changes to blocks in a file
    • Used to communicate changes between clients, file manager, and stripe cleaner
    • E.g. client puts a delta into its log when it writes a file block; file manager reads the delta to update metadata for that block
    • Deltas are stored in client logs
– Deltas are created
    • When blocks are added to a file, deleted from a file, or overwritten
    • By the stripe cleaner when it copies live blocks out of stripes (cleaner deltas)
    • By the file manager to resolve races between stripe cleaning and file updates (reject deltas)

Contents of Delta

• File identifier – unique identification for file

• File version – increments whenever a block in the file is written or deleted

• Block number – identifies a particular block by position in file

• Old block pointer – fragment identifier and offset of block’s old storage location. If delta is for a new block, old block pointer has special null value

• New block pointer – fragment identifier and offset for block’s new storage location. If delta is for a block deletion, new block pointer has special null value
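
The five fields above map naturally onto a small record type. In this sketch the field names, the `(fragment identifier, offset)` tuple layout, and the use of `None` for the special null value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BlockPointer = Tuple[int, int]   # (fragment identifier, offset); layout is assumed

@dataclass(frozen=True)
class Delta:
    file_id: int
    version: int
    block_number: int
    old_pointer: Optional[BlockPointer]   # None models the special null value
    new_pointer: Optional[BlockPointer]   # None models the special null value

    def kind(self):
        if self.old_pointer is None:
            return "new block"
        if self.new_pointer is None:
            return "deletion"
        return "overwrite or move"

created = Delta(file_id=7, version=1, block_number=0, old_pointer=None, new_pointer=(12, 0))
deleted = Delta(file_id=7, version=2, block_number=0, old_pointer=(12, 0), new_pointer=None)
```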

Writing and Reading Files

• New data is placed in the client’s file cache

• Data is written to the server when
– It reaches a threshold age
– The cache fills with dirty data
– An application issues the fsync system call to force data to disk
– The file manager requests that data be written in order to maintain consistency between file caches

• When writing to disk
– Data is put into the log, formed into stripe fragments, and written to the storage servers
– For each file block written, the client puts a delta into its log
– File manager harvests deltas

• Reading files: almost the same as in a non-striped filesystem
– Open and close via RPC to file manager
– Reading: obtain block pointers from file manager, then obtain file data from storage servers

Stripe Cleaning

• Compute how much live data is in each stripe
– Deltas are used for this: the stripe cleaner processes deltas from client logs and keeps a running count of utilization in each stripe
– Stripe cleaner appends all deltas pertaining to a stripe to a stripe status file
– Stripe to be cleaned is chosen using cost-benefit analysis (Rosenblum91)
– To clean a stripe
    • Identify live blocks using the stripe status file
    • Copy them to a new stripe

• Stripe cleaner copies live blocks to a new stripe using a kernel call
– Reads one or more blocks from storage server, appends them to its client log, and writes the new log contents to storage server
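
A rough sketch of the utilization bookkeeping and stripe selection described above. The `(stripe id, offset)` pointer layout and all function names are hypothetical. The slides only cite Rosenblum91 for the policy; the ratio used here, (1 - u) * age / (1 + u), is the cost-benefit ratio from that LFS work, and whether Zebra applies it unchanged is an assumption.

```python
from collections import defaultdict

def apply_delta(live_bytes, old_ptr, new_ptr, block_size):
    """Update per-stripe live-data counts from one delta.

    Pointers are (stripe_id, offset) tuples or None (assumed layout).
    """
    if old_ptr is not None:
        live_bytes[old_ptr[0]] -= block_size   # old copy is now dead
    if new_ptr is not None:
        live_bytes[new_ptr[0]] += block_size   # new copy is live

def cost_benefit(utilization, age):
    # Benefit: free space gained (1 - u), weighted by age.
    # Cost: read the stripe (1) plus rewrite its live data (u).
    return (1.0 - utilization) * age / (1.0 + utilization)

def pick_stripe(live_bytes, stripe_size, ages):
    """Return the stripe id with the highest benefit-to-cost ratio."""
    return max(ages, key=lambda s: cost_benefit(live_bytes[s] / stripe_size, ages[s]))

live = defaultdict(int)
apply_delta(live, None, (0, 0), 4096)        # new block in stripe 0
apply_delta(live, None, (0, 4096), 4096)     # another new block in stripe 0
apply_delta(live, None, (1, 0), 4096)        # new block in stripe 1
apply_delta(live, (0, 0), (1, 4096), 4096)   # overwrite: block moves from stripe 0 to 1
victim = pick_stripe(live, stripe_size=8192, ages={0: 10, 1: 10})
```

With equal ages, the emptier stripe (stripe 0, half full) is chosen over the full one.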

Conflicts between cleaning and file access

• Application can modify or delete a file while the stripe cleaner is cleaning it
– Client could modify a block after the cleaner reads the old copy but before the cleaner rewrites the block
– New data would be lost in favor of the rewritten copy of the old data
– In LFS, the cleaner locked files, but this produced “lock convoys” which adversely impacted performance

• Optimistic approach
– No locking; stripe cleaner copies block and issues a cleaner delta
– If the block was updated during cleaning, an update delta will also be issued by the client that made the change
– File manager makes sure that the final pointer for the block reflects the update delta, not the cleaner delta
– File manager detects conflicts by comparing the old block pointer in each incoming delta with the block pointer stored in the file manager’s metadata; if they differ, the block was simultaneously cleaned and updated
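
The conflict check can be sketched as follows. This is a simplified illustration, not the actual file manager: the class, the method signature, and the `rejected` list (a stand-in for reject deltas) are hypothetical, and pointers are arbitrary hashable values.

```python
class FileManager:
    def __init__(self):
        self.pointers = {}   # (file_id, block_number) -> current block pointer
        self.rejected = []   # hypothetical stand-in for reject deltas

    def process(self, file_id, block_number, old_ptr, new_ptr, from_cleaner=False):
        key = (file_id, block_number)
        current = self.pointers.get(key)
        if old_ptr != current and from_cleaner:
            # Mismatch on a cleaner delta: the block was updated while being
            # cleaned, so the cleaner's copy is stale and must not win.
            self.rejected.append((key, new_ptr))
            return
        # Update deltas always win, even if a cleaner delta arrived first.
        if new_ptr is None:
            self.pointers.pop(key, None)   # block deleted
        else:
            self.pointers[key] = new_ptr

# Order 1: the update delta arrives before the cleaner delta.
fm = FileManager()
fm.process(1, 0, None, "A")                      # block created at A
fm.process(1, 0, "A", "B")                       # client overwrites: A -> B
fm.process(1, 0, "A", "C", from_cleaner=True)    # cleaner copied old A to C

# Order 2: the cleaner delta arrives first.
fm2 = FileManager()
fm2.process(1, 0, None, "A")
fm2.process(1, 0, "A", "C", from_cleaner=True)   # accepted for now
fm2.process(1, 0, "A", "B")                      # mismatch, but the update wins
```

In both orders the final pointer is `B`, the location written by the update.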
