Serverless Network File Systems
Overview by Joseph Thompson
Problem
• Centralized file systems fundamentally limit performance and availability
– All reads and writes go through the centralized server
– Increasing server performance is expensive
Purpose
• Better performance and scalability
• High availability via redundant data storage
Assumption
• SNFS is only appropriate among machines that communicate over a fast network and that trust each other to enforce security
– SNFS generates a significant amount of network traffic
– Security will be covered later
Components of SNFS
• Software RAID
• Log File System (LFS)
• Zebra
– Merges RAID and LFS in a distributed network
– Don’t miss my next presentation on Zebra!
• Multiprocessor Cache Consistency
– In this model, each processor is one client
Three Problems to Be Solved
• Distributed metadata that provides both cache consistency management and the flexibility to dynamically reconfigure client responsibilities
• Scalable way to subset storage servers for efficiency
• Scalable log cleaning
Metadata
• Manager Map
• IMap
• File Directories
• Stripe Group Map
Managers
• The manager of a file controls two sets of information about it
– Cache consistency state
– Disk location metadata
Manager Map
• Table that indicates which physical machines manage which groups of index numbers at any given time
• This table is globally replicated to all managers in the system
– The table is relatively small (tens of kilobytes per hundreds of clients)
– The table rarely changes
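A minimal sketch of the manager map described above: a small, globally replicated table assigning groups of index numbers to manager machines. The grouping scheme and all names here are illustrative assumptions, not details from the paper.

```python
# Hypothetical manager map: index numbers are partitioned into groups,
# and the (small, rarely changing) table maps each group to a manager.

NUM_GROUPS = 16  # illustrative number of index-number groups

# manager_map[group] -> manager machine id; replicated to every manager
manager_map = {g: f"manager-{g % 4}" for g in range(NUM_GROUPS)}

def manager_for(index_number: int) -> str:
    """Find the manager responsible for a file's index number."""
    group = index_number % NUM_GROUPS
    return manager_map[group]

print(manager_for(42))  # index 42 -> group 10 -> manager-2
```

Because the table is tiny, replicating it to every manager (and client) is cheap, and a lookup is a single local table read.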
IMap
• A file’s imap entry contains the log addresses of the file’s inode
– For scalability, imap entries are only distributed to the managers assigned to the file
File Directories
• Contains mappings from file names to index numbers
– Stored in the file itself
– Files created by a client are assigned to the manager on that machine (if there is one)
• Index Numbers
– Used to find the manager responsible for the file
Stripe Group Map Justification
• In a large RAID, even large log segments still create small-write inefficiencies
• While one client writes at its full network bandwidth to one stripe group, another client can do the same with another group
• A smaller segment size makes cleaning more efficient
• Stripe groups greatly improve availability
– Each group stores its own parity, which helps if there are multiple server failures in different groups
Stripe Group Implementation
• Group ID
• Group Members
• Current or Obsolete flag
– The Current/Obsolete field increases efficiency: the cleaner eventually moves all data to a current group, and the obsolete group is then removed
• Also globally replicated to each client
– Small and rarely changes
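The stripe group map fields listed above can be sketched as a simple record; the field names and group contents below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class StripeGroup:
    group_id: int
    members: list   # storage server ids in this group
    current: bool   # False once the group is obsolete

# Globally replicated map; new writes go only to current groups,
# while the cleaner migrates live data out of obsolete ones.
stripe_group_map = [
    StripeGroup(0, ["ss0", "ss1", "ss2", "ss3"], current=True),
    StripeGroup(1, ["ss4", "ss5", "ss6"], current=False),
]

def writable_groups(groups):
    """Groups that may receive new segment writes."""
    return [g.group_id for g in groups if g.current]

print(writable_groups(stripe_group_map))  # [0]
```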
Cleaning
• Three main tasks
– Maintain utilization status
– Use that status to decide which segments to clean
– Write live blocks from the old segment to a new segment
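The three cleaning tasks can be sketched as follows; the segment table, block counts, and victim-selection policy (lowest utilization first) are illustrative assumptions.

```python
# Hypothetical cleaner state: segment id -> (live blocks, total blocks).
segments = {
    "seg-a": (2, 8),
    "seg-b": (7, 8),
    "seg-c": (4, 8),
}

def pick_victim(segs):
    """Task 2: choose the segment with the lowest utilization."""
    return min(segs, key=lambda s: segs[s][0] / segs[s][1])

def clean(segs, victim, new_segment):
    """Task 3: copy live blocks into a new segment, then free the victim."""
    live, _ = segs.pop(victim)
    new_live, total = segs.get(new_segment, (0, 8))
    segs[new_segment] = (new_live + live, total)

victim = pick_victim(segments)   # "seg-a": only 2/8 blocks live
clean(segments, victim, "seg-d")
```

Cleaning the least-utilized segment first frees the most space per block copied, which is the standard LFS cost argument.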
Distributed Utilization
• Assign the burden of maintaining each segment’s utilization status to the client that wrote the segment
• Each client stores utilization information in an s-file per stripe group it writes to; s-files are written like normal files and can be found by the stripe group’s leader
Distributed Cleaning
• A stripe group leader (dynamically appointed) initiates cleaning when the number of free segments drops below a threshold value or when the group is idle
• The leader accumulates the s-files for the group and can dynamically assign cleaners from different machines to clean subsections of the stripe group in an efficient manner
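The leader's trigger-and-assign behavior above can be sketched like this; the threshold value and the round-robin assignment of segments to cleaner machines are illustrative assumptions.

```python
# Hypothetical leader policy: start cleaning when free segments drop
# below a threshold, and spread dirty segments across cleaner machines.

FREE_THRESHOLD = 4  # illustrative threshold

def assign_cleaners(free_segments, dirty_segments, cleaners):
    """Return {cleaner: [segments to clean]}, or {} if no cleaning is needed."""
    if free_segments >= FREE_THRESHOLD:
        return {}  # enough free space; group stays idle
    work = {c: [] for c in cleaners}
    for i, seg in enumerate(dirty_segments):
        work[cleaners[i % len(cleaners)]].append(seg)
    return work

plan = assign_cleaners(2, ["s1", "s2", "s3", "s4", "s5"], ["m1", "m2"])
print(plan)  # {'m1': ['s1', 's3', 's5'], 'm2': ['s2', 's4']}
```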
Procedure to Read a Block
• Diagram Demystified!
Writing and Cache Consistency
• To write, a client must request a lock from the owning manager, which the manager can revoke at any time
• The manager invalidates its cache and updates its cache consistency information
• One implementation uses per-client caching lists to invalidate stale client caches and to forward read requests to clients with valid cached copies
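The lock-grant-and-invalidate flow above can be sketched as follows; the class shape and all names are illustrative assumptions, not the paper's interfaces.

```python
# Hypothetical manager state for one file's cache consistency:
# who holds the write lock, and which clients hold cached copies.

class Manager:
    def __init__(self):
        self.write_owner = {}  # file -> client holding the write lock
        self.cached_by = {}    # file -> set of clients with a cached copy

    def request_write(self, file, client):
        """Grant the write lock, revoking the old owner and
        invalidating every other client's cached copy."""
        old_owner = self.write_owner.get(file)
        invalidated = self.cached_by.get(file, set()) - {client}
        self.cached_by[file] = {client}
        self.write_owner[file] = client
        return old_owner, invalidated

m = Manager()
m.cached_by["f"] = {"c1", "c2"}
m.write_owner["f"] = "c1"
old, invalidated = m.request_write("f", "c2")
print(old, invalidated)  # c1 {'c1'}
```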
Recovery and Reconfiguration
• General Recovery Strategy
• Data Structure Recovery
• Storage Server Recovery
• Manager Recovery
• Cleaner Recovery
• Scalability of Recovery
General Recovery Strategy
• LFS keeps an append-only record, called the delta, of every file modification between log segment writes
• Uses checkpoint recovery and roll-forward
• Unless additional parity servers per stripe group are used, if multiple storage servers from a single stripe group are unreachable there can be no full recovery
Data Structure Recovery
• Layered dependence requires the recovery to start with the storage servers, then managers, then cleaner
Storage Server Recovery
• As we have seen with RAID architectures, recovering a single storage server is easy
• Once the initial recovery is done, we can use LFS’ delta feature to poll clients for their unwritten changes while rolling forward
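The single-server reconstruction mentioned above can be sketched with XOR parity, as in RAID: one lost block per stripe is rebuilt from the survivors, while a second loss in the same stripe cannot be (matching the single-parity limit noted earlier). The block values are illustrative.

```python
# XOR parity over a stripe: the parity block is the XOR of all data
# blocks, so any single missing block equals the XOR of the rest.

def parity(blocks):
    p = 0
    for b in blocks:
        p ^= b
    return p

data = [0b1010, 0b0110, 0b1100]  # blocks on three storage servers
p = parity(data)                 # block stored on the parity server

# Server holding data[1] fails; rebuild its block from survivors + parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])  # True
```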
Manager Recovery
• Retrieves last known imaps from its last checkpoint written to a storage server
• During roll-forward, the manager gathers a consensus of manager map tables from clients to apply the appropriate changes to data block locations
Cleaner Recovery
• Since s-files are stored like normal files, they will be recovered from the respective storage server
• The cleaner must then go through a roll-forward phase, checking clients for summaries of their more recent modifications to those segments
• To avoid clients having to search their logs multiple times, this utilization information can be gathered during the manager recovery process
Scalability of Recovery
• The roll-forward step can generate O(N^2) messages per object, where N is the number of clients, managers, or storage servers
• As an optimization, each object need only contact the N lower-layer objects, and if randomization is used to reduce concurrent access to a single storage server, each manager can roll forward in parallel
Other Information Not Covered Here
• Details of xFS prototype and performance testing
• Follow-up research on the state of xFS since 1995, when this paper was written
Conclusion
• Paper is valuable
– Provides a creative use of new and old ideas to pioneer a new file system
• Problems
– Restrictions on the usability of this system in a non-secure environment
• Solutions
– P2P security solutions we discussed in class