Distributed File Systems

Group A5: Amit Sharma, Dhaval Sanghvi, Ali Abbas

Outline

- What is a DFS
- Requirements of a DFS
- Sun Network File System
  - History
  - Architecture
  - Protocols
  - Implementation

Basics

- File: a named collection of logically related data
- Unix file: an uninterpreted sequence of bytes
- File system:
  - Provides a logical view of data and storage functions
  - Offers a user-friendly interface
  - Provides facilities to create, modify, organize, and delete files
  - Provides sharing among users in a controlled manner
  - Provides protection

What is a DFS

A distributed implementation of the time-sharing model of a file system, in which multiple users share files and storage resources.

The overall storage space managed by a DFS consists of different, remotely located, smaller storage spaces.

Requirements

Transparency:

- Access transparency
- Location transparency
- Mobility transparency
- Failure transparency
- Performance transparency

Other Requirements

- Scaling
- Security
- Hardware and operating system heterogeneity

Sun’s Network File System

Introduced by Sun Microsystems in 1985

Sun published the protocol and licensed a reference implementation

Since then, NFS has been supported by virtually every Unix variant

NFS design objectives

- Machine and OS independence, with no recompilation of applications
- Crash recovery
- Transparent access
- Reasonable performance (comparable to a local FS)

NFS - The basic idea

- Allow an arbitrary collection of clients and servers to share a common file system
- In most cases, all clients and servers are on the same LAN
- Each machine can be both a client and a server
- Each NFS server exports one or more of its directories for access by remote clients

NFS - The basic idea (cont.)

- When a directory is made available, so are all of its subdirectories: NFS exports whole directory trees as a unit
- The list of directories a server exports is maintained in the /etc/exports file
- Uses RPC / XDR
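The "whole trees are exported as a unit" rule can be illustrated with a minimal Python sketch. The export list and the `is_exported` helper are hypothetical and exist only for illustration; a real server reads its exports from /etc/exports and applies further per-client options.

```python
from pathlib import PurePosixPath

# Hypothetical export list; a real server would read it from /etc/exports.
EXPORTED_DIRS = [PurePosixPath("/home"), PurePosixPath("/usr/share/doc")]


def is_exported(path: str) -> bool:
    """Return True if path is an exported directory or lies beneath one."""
    target = PurePosixPath(path)
    # Exporting a directory implicitly exports its entire subtree.
    return any(target == root or root in target.parents
               for root in EXPORTED_DIRS)
```

Comparing path components rather than string prefixes keeps a sibling such as /homework from being mistaken for part of the /home export.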

NFS - How do we get the files

- Mount protocol: clients access shared file systems by mounting them from an NFS server machine
- Where? At a mount point
- Mount point? An empty directory or subdirectory, created as a place to attach a remote file system
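How a client might map a local path under a mount point to a remote directory can be sketched as follows. This is a simplified model, not the kernel's mount table; the `MountTable` class and the server name used in the usage note are assumptions made for illustration.

```python
from pathlib import PurePosixPath


class MountTable:
    """Toy client-side table: local mount point -> (server, remote directory)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, mount_point, server, remote_dir):
        self._mounts[PurePosixPath(mount_point)] = (server, PurePosixPath(remote_dir))

    def resolve(self, path):
        """Return (server, remote path) for a local path, or None if it is local."""
        local = PurePosixPath(path)
        # Try the path itself, then each ancestor, so the deepest mount wins.
        for candidate in [local, *local.parents]:
            if candidate in self._mounts:
                server, remote_dir = self._mounts[candidate]
                return server, str(remote_dir / local.relative_to(candidate))
        return None
```

For example, after `mount("/mnt/proj", "fileserver", "/export/proj")`, the local path /mnt/proj/src/main.c resolves to /export/proj/src/main.c on the (hypothetical) host fileserver.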

How do we get the files (cont.)

- The server returns a file handle to the client
- The file handle contains fields uniquely identifying:
  - the file system type (ext2, vfat, Novell, BSD, NeXTSTEP..)
  - the disk
  - the i-node number of the directory
  - security information
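A file handle with the four fields listed above could be modelled as in the sketch below. The field widths and byte layout are assumptions chosen for illustration, not the real NFS encoding; to the client, an actual handle is simply an opaque byte string.

```python
import struct
from dataclasses import dataclass

# Assumed layout (network byte order): 2 + 2 + 4 + 8 = 16 bytes.
_LAYOUT = struct.Struct("!HHIQ")


@dataclass(frozen=True)
class FileHandle:
    fs_type: int   # code for the file system type
    disk: int      # identifies the disk
    inode: int     # i-node number of the directory
    security: int  # opaque security information

    def pack(self) -> bytes:
        """Encode the handle as the opaque bytes handed to the client."""
        return _LAYOUT.pack(self.fs_type, self.disk, self.inode, self.security)

    @classmethod
    def unpack(cls, raw: bytes) -> "FileHandle":
        """Decode bytes the client sent back with a later request."""
        return cls(*_LAYOUT.unpack(raw))
```

The client never interprets these bytes; it stores them and returns them unchanged, so only the server needs to know the layout.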

How do we get the files (cont.)

The server daemons:

- nfsd: the NFS daemon, which services requests from NFS clients
- mountd: the NFS mount daemon, which handles clients' mount requests: it checks each request against the exported directories and returns the file handle of the mounted directory
- portmap: the portmapper daemon, which lets NFS clients find out which port the NFS server is using
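The portmapper's role can be pictured as a small lookup table from RPC program to port. The sketch below is a toy in-memory model, not the real RPC service: the program numbers are the well-known ONC RPC assignments, the `Portmapper` class is hypothetical, and the real service also keys its table on the transport protocol.

```python
# Well-known ONC RPC program numbers.
PORTMAP_PROG = 100000
NFS_PROG = 100003
MOUNT_PROG = 100005


class Portmapper:
    """Toy registry mapping (program, version) to a port number."""

    def __init__(self):
        self._table = {}

    def register(self, program, version, port):
        """Called by a daemon at start-up to announce where it listens."""
        self._table[(program, version)] = port

    def getport(self, program, version):
        """Return the registered port, or 0 if the program is not registered."""
        return self._table.get((program, version), 0)
```

A client would first ask the portmapper for the port of the program it wants, then send its requests to that port directly; NFS itself conventionally listens on port 2049.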

NFS software architecture

VFS

- VFS allows diverse specific file systems to coexist in a single file tree, isolating all FS dependencies in pluggable file system modules
- VFS was an internal kernel restructuring with no effect on the syscall interface
- The VFS layer maintains a table with one entry for each open file

VFS 2

- Each entry in the VFS layer's table is called a v-node (virtual i-node)
- For every open file, the v-node tells whether the file is local or remote
- A v-node points either to an i-node, when the file is on the local disk, or to an r-node in the NFS client code, when the reference is to data on a remote disk
- All state information on open files is stored on the client's side

Vnode use

- To mount a remote file system, the system admin (or /etc/rc) calls the mount program
- The kernel constructs a vnode for the remote directory and asks the NFS client code to create an r-node in its internal tables
- A vnode in the client VFS points either to a local i-node or to an r-node
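The v-node indirection can be sketched in a few lines of Python. All class and function names here are hypothetical, and `fetch_remote` stands in for the NFS client code that would issue the actual RPC.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class INode:
    """Local file: data lives on the local disk."""
    number: int
    data: bytes


@dataclass
class RNode:
    """Remote file: identified by a server and a file handle."""
    server: str
    file_handle: bytes


@dataclass
class VNode:
    """Virtual i-node: points to either an i-node or an r-node."""
    target: Union[INode, RNode]

    def is_remote(self) -> bool:
        return isinstance(self.target, RNode)


def read(vnode: VNode, fetch_remote: Callable[[str, bytes], bytes]) -> bytes:
    """Dispatch a read to the local file system or to the NFS client."""
    if vnode.is_remote():
        return fetch_remote(vnode.target.server, vnode.target.file_handle)
    return vnode.target.data
```

Code above the VFS layer calls `read` the same way in both cases, which is what keeps the syscall interface unchanged.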

NFS implementation

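- Servers are stateless: each request carries complete information and does not rely on previous state
- Most requests are idempotent, so a client can safely repeat a request whose reply was lost
- The user's identity must be verified for each request
- Most UNIX system calls are supported, except for open and close, which would require the server to remember which files each client has open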
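Why statelessness and idempotence make retransmission safe can be shown with a small simulation. This is a sketch under stated assumptions: `StatelessServer` and `call_with_retry` are hypothetical names, and a lost reply is modelled by discarding the result of the first few executions.

```python
class StatelessServer:
    """Each request names the file handle, offset, and length explicitly."""

    def __init__(self, files):
        self._files = files  # file handle -> bytearray

    def read(self, handle, offset, count):
        return bytes(self._files[handle][offset:offset + count])

    def write(self, handle, offset, data):
        buf = self._files[handle]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad a sparse gap
        buf[offset:offset + len(data)] = data
        return len(data)


def call_with_retry(operation, lost_replies=0, max_attempts=5):
    """Repeat operation until a reply arrives; early replies are 'lost'."""
    for attempt in range(max_attempts):
        reply = operation()          # the server executes the request each time
        if attempt >= lost_replies:  # this reply reached the client
            return reply
    raise TimeoutError("no reply from server")
```

Because the write names an absolute offset, executing it three times leaves the file in the same state as executing it once. An "append" request would not have this property, which is why the protocol avoids operations of that kind.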

Idempotent

idem·po·tent (Date: 1870): relating to or being a mathematical quantity which, when applied to itself under a given binary operation (such as multiplication), equals itself; also: relating to or being an operation under which a mathematical quantity is idempotent

Semantics of file sharing

- On a single processor, when a read follows a write, the value returned by the read is the value just written
- In a distributed system with caching, obsolete values may be returned

Semantics of file sharing

Method              Comment
UNIX semantics      Every operation on a file is instantly visible to all processes
Session semantics   No changes are visible to other processes until the file is closed
Immutable files     No updates are possible; simplifies sharing and replication
Transaction         All changes occur atomically

NFS implements session semantics

Caching

- The cache consistency problem: cached data may become stale if the same data is updated elsewhere in the network
- NFS solution: timestamp invalidation. Timestamp each cache entry and periodically query the server: “has this file changed since time t?”; invalidate the cache entry if it is stale

NFS Client Caching

- Where? In the main memory of clients
- What? File blocks, translations of file names to vnodes, and attributes of files and directories

(1) File blocks
- Each cached block carries the time stamp of the file (when it was last modified on the server)
- After a certain age, blocks have to be validated with the server
- Delayed-write policy: modified blocks are flushed to the server after a certain delay

NFS Client Caching

- Clients do not free delayed-write blocks until the server confirms that the data have been written to disk

(2) Caching of file names to vnodes for remote directory access
- Speeds up the lookup procedure

(3) Caching of file and directory attributes
- Updated when new attributes are received from the server, discarded after a certain time

NFS Client Caching

Writes:
- A written block is marked dirty and scheduled for flushing
- Flushing happens when the file is closed or a sync occurs at the client
- What if multiple clients write to the same file at the same time? A reader can get either version (or parts of both). Completely arbitrary, just like normal Unix
- Problem: writes from clients are delayed. If a write happens at time t and the close happens at t’, other clients might not see the new data until t’
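The visibility gap between a write at time t and the close at t’ can be reproduced with a minimal write-back model. The `Server` and `Client` classes are hypothetical, and the model flushes only on close; a real client also flushes on sync or after a delay.

```python
class Server:
    """Holds the authoritative copy of each block."""

    def __init__(self):
        self.blocks = {}


class Client:
    """Buffers writes as dirty blocks and flushes them on close."""

    def __init__(self, server):
        self.server = server
        self.dirty = {}  # block number -> data not yet sent to the server

    def write(self, block_no, data):
        self.dirty[block_no] = data  # marked dirty, scheduled for flushing

    def read(self, block_no):
        if block_no in self.dirty:   # a client always sees its own writes
            return self.dirty[block_no]
        return self.server.blocks.get(block_no)

    def close(self):
        self.server.blocks.update(self.dirty)  # flush dirty blocks
        self.dirty.clear()
```

With two clients on one server, a block written by the first stays invisible to the second until the first calls `close`, which is the behavior the session-semantics row of the sharing table describes.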

Cache validation

A validation check is performed:
- at file open
- whenever the server is contacted to get a new block
- after a timeout (3 s for file blocks, 30 s for directories)

This is done for all files, even those not being shared, which is expensive: potentially, the client fetches file attributes every 3 seconds and, if needed, invalidates all blocks.
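The timeout-based validation check can be sketched as below. The 3 s and 30 s timeouts come from the slide; the `CacheEntry` structure and the `get_server_mtime` and `fetch` callbacks are assumptions standing in for the attribute and read requests a real client would send.

```python
from dataclasses import dataclass

FILE_TIMEOUT = 3.0   # seconds, for file blocks
DIR_TIMEOUT = 30.0   # seconds, for directories


@dataclass
class CacheEntry:
    data: bytes
    mtime: float         # server modification time when data was fetched
    validated_at: float  # client clock at the last validation
    is_dir: bool = False


def lookup(entry, now, get_server_mtime, fetch):
    """Return cached data, revalidating with the server after the timeout."""
    timeout = DIR_TIMEOUT if entry.is_dir else FILE_TIMEOUT
    if now - entry.validated_at >= timeout:
        server_mtime = get_server_mtime()  # "has this file changed?"
        if server_mtime != entry.mtime:    # stale: invalidate and refetch
            entry.data = fetch()
            entry.mtime = server_mtime
        entry.validated_at = now
    return entry.data
```

Within the timeout window the client answers from its cache without contacting the server, so a reader may see stale data for up to 3 seconds after another client's update reaches the server.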

Locking in NFS

Operation   Description
Lock        Creates a lock for a range of bytes (non-blocking)
Lockt       Tests whether a conflicting lock has been granted
Locku       Removes a lock from a range of bytes
Renew       Renews the lease on a specified lock

- NFS supports file locking
- Applications can use locks to ensure consistency
- Through version 3, locking was not part of the NFS protocol itself; a separate lock manager protocol handled it
- NFS v4 supports locking as part of the protocol (see the table above)
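The four operations in the table can be mimicked with a toy in-memory lock table. This is a sketch, not the NFS v4 wire protocol: the `LockTable` class is hypothetical, all locks are treated as exclusive, the lease length is an assumed value, and the caller passes in the current time explicitly.

```python
class LockTable:
    """Toy byte-range lock manager with leases (exclusive locks only)."""

    def __init__(self, lease_seconds=30.0):  # assumed lease length
        self.lease_seconds = lease_seconds
        self._locks = []  # each lock: dict with owner, start, end, expires

    def _expire(self, now):
        self._locks = [l for l in self._locks if l["expires"] > now]

    def lockt(self, owner, start, length, now):
        """Test: True if another owner holds a lock overlapping the range."""
        self._expire(now)
        end = start + length
        return any(l["owner"] != owner and l["start"] < end and start < l["end"]
                   for l in self._locks)

    def lock(self, owner, start, length, now):
        """Non-blocking: return False at once if the range conflicts."""
        if self.lockt(owner, start, length, now):
            return False
        self._locks.append({"owner": owner, "start": start,
                            "end": start + length,
                            "expires": now + self.lease_seconds})
        return True

    def locku(self, owner, start, length, now):
        """Remove the owner's lock on exactly this byte range."""
        self._expire(now)
        end = start + length
        self._locks = [l for l in self._locks
                       if not (l["owner"] == owner and l["start"] == start
                               and l["end"] == end)]

    def renew(self, owner, start, length, now):
        """Extend the lease on the specified lock, if it is still held."""
        self._expire(now)
        for l in self._locks:
            if (l["owner"] == owner and l["start"] == start
                    and l["end"] == start + length):
                l["expires"] = now + self.lease_seconds
```

The lease is what lets the server reclaim locks from a crashed client: a lock that is not renewed simply expires, so no other client waits forever.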

NFS score card

Pros:
- Simple
- Highly portable

Cons:
- Not secure
- Locking is not good
- Sometimes inconsistent
- Clients maintain 2 caches, one for file attributes (i-nodes) and one for file data; caching can be nasty

Summary

- How do we make it fast? Answer: caching, read-ahead
- How do we make it reliable? What if a message is dropped? What if the server crashes? Answer: the client retransmits the request until it receives a response
- How do we preserve file system semantics in the presence of failures and/or sharing by multiple clients? Answer: well, we don’t, at least not completely

Alternatives to NFS

- Andrew File System (CMU, now IBM)
- Sprite
- Coda Distributed File System
- Remote File System
- Netware (Novell-based file system)