Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. [email protected] Mr. Coling Zhang [email protected] With Thanks to Prof. G. Coulouris, Prof. A.S. Tanenbaum and Prof. S.C Joo


Page 1: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Distributed System and Middleware

Distributed Systems

Distributed File System

Dr. Sunny Jeong. [email protected]

Mr. Coling Zhang [email protected]

With Thanks to Prof. G. Coulouris, Prof. A.S. Tanenbaum and Prof. S.C Joo


Overview

• Requirements for distributed file systems
  - transparency, performance, fault tolerance, consistency, ...
• Design issues
  - possible options, architectures
  - file sharing, concurrent updates
  - caching
• Examples
  - Sun Network File System
  - Andrew File System


Distributed Services


DISTRIBUTED FILE SYSTEMS

• A Distributed File System (DFS) is a classical file system model (as discussed before) distributed across multiple machines. Its purpose is to promote sharing of dispersed files.

• This is an area of active research interest today.

• The resources on a particular machine are local to itself. Resources on other machines are remote.

• A file system provides a service for clients. The server interface is the normal set of file operations: create, read, etc. on files.

Definitions


DISTRIBUTED FILE SYSTEMS

Clients, servers, and storage are dispersed across machines. Configuration and implementation may vary:

a) Servers may run on dedicated machines, OR
b) Servers and clients can be on the same machines.
c) The OS itself can be distributed (with the file system a part of that distribution).
d) A distribution layer can be interposed between a conventional OS and the file system.

Clients should view a DFS the same way they would a centralized FS; the distribution is hidden at a lower level.

Performance is concerned with throughput and response time.

Definitions


Distributed file service

Basic services
• persistent file storage of data and programs
• operations on files (create, open, read, write, ...)
• multiple remote clients within an intranet
• file sharing
• typically one-copy update semantics over RPC

Many new developments
• persistent object stores (storage of objects): Persistent Java, CORBA, ...
• replication, whole-file caching
• distributed multimedia (Tiger video file server)


Storage systems and their properties

Table columns: Sharing, Persistence, Distributed cache/replicas, Consistency maintenance, Example

Storage system: Example
• Main memory: RAM
• File system: UNIX file system
• Distributed file system: Sun NFS
• Web: Web server
• Distributed shared memory: Ivy (Chap. 16)
• Remote objects (RMI/ORB): CORBA
• Persistent object store: CORBA Persistent Object Service (consistency maintenance: 1)
• Persistent distributed object store: PerDiS, Khazana

* "1" in the consistency maintenance column stands for one-copy consistency


Characteristics of file systems

Operations on files (= data + attributes)
• create/delete
• query/modify attributes
• open/close
• read/write
• access control

Storage organization
• directory structure (hierarchical, pathnames)
• metadata (= file management information, data about data)
  - file attributes
  - directory structure information, etc.


Characteristics of file systems

• Persistently stored data sets (files = data + attributes)
• Hierarchic name space visible to all processes
• API with the following characteristics:
  - access and update operations on persistently stored data sets
  - sequential access model (with additional random access facilities)
• Sharing of data between users, with access control
• Concurrent access:
  - certainly for read-only access
  - what about updates?


File attribute record structure

Updated by system:
• File length
• Creation timestamp
• Read timestamp
• Write timestamp
• Attribute timestamp
• Reference count

Updated by owner (user controlled):
• Owner
• File type
• Access control list (ACL), e.g. for UNIX: rw-rw-r--


File system Modules

Concentrate on higher levels.

Directory module: relates file names to file IDs

File module: relates file IDs to particular files

Access control module: checks permission for operation requested

File access module: reads or writes file data or attributes

Block module: accesses and allocates disk blocks

Device module: disk I/O and buffering


Distributed file system requirements [1/4]

Facilities
• support the sharing of persistent storage and information
• enable user programs to access files without copying them to a local disk

Transparency (clients unaware of the distributed nature)
• access transparency - client unaware of distribution of files, same interface for local/remote files
• location transparency - uniform file name space from any client workstation
• mobility transparency - files can be moved from one server to another without affecting the client
• performance transparency - client performance not affected by load on service
• scaling transparency - expansion possible if numbers of clients increase


Distributed file system requirements - ctd [2/4]

Concurrent file updates
• isolation: changes by one client do not affect another
• file-level or record-level locking
• other forms of concurrency control to minimise contention (minimum competition)

File replication
• file service maintains multiple identical copies of files
  - load-sharing between servers makes service more scalable
  - local access has better response (lower latency)
  - fault tolerance
• full replication is difficult to implement
• caching (of all or part of a file) gives most of the benefits (except fault tolerance)


DISTRIBUTED FILE SYSTEMS

Naming is the mapping between logical and physical objects. 

Example: A user filename maps to <cylinder, sector>.

In a conventional file system, it's understood where the file actually resides; the system and disk are known.

In a transparent DFS, the location of a file, somewhere in the network, is hidden.

File replication means multiple copies of a file; mapping returns a SET of locations for the replicas.

Location transparency -

a) The name of a file does not reveal any hint of the file's physical storage location.
b) File name still denotes a specific, although hidden, set of physical disk blocks.
c) This is a convenient way to share data.
d) Can expose correspondence between component units and machines.

Naming and Transparency


DISTRIBUTED FILE SYSTEMS

Location independence - 

• The name of a file doesn't need to be changed when the file's physical storage location changes. Dynamic, one-to-many mapping.
• Better file abstraction.
• Promotes sharing the storage space itself.
• Separates the naming hierarchy from the storage devices hierarchy.

Most DFSs today:

• Support location transparent systems.
• Do NOT support migration (automatic movement of a file from machine to machine).
• Files are permanently associated with specific disk blocks.

Naming and Transparency


DISTRIBUTED FILE SYSTEMS

The ANDREW DFS AS AN EXAMPLE: 

• Is location independent.
• Supports file mobility.
• Separation of FS and OS allows for disk-less systems. These have lower cost and convenient system upgrades. The performance is not as good.

NAMING SCHEMES:

There are three main approaches to naming files:

1. Files are named with a combination of host and local name.

• This guarantees a unique name. It is neither location transparent nor location independent.
• Same naming works on local and remote files. The DFS is a loose collection of independent file systems.

Naming and Transparency


DISTRIBUTED FILE SYSTEMS

NAMING SCHEMES: 

2. Remote directories are mounted to local directories.

• So a local system seems to have a coherent directory structure.
• The remote directories must be explicitly mounted. The files are location independent.
• SUN NFS is a good example of this technique.

3. A single global name structure spans all the files in the system.

• The DFS is built the same way as a local filesystem. Location independent.

Naming and Transparency


DISTRIBUTED FILE SYSTEMS

IMPLEMENTATION TECHNIQUES: 

Can map directories or larger aggregates rather than individual files.

A non-transparent mapping technique:

name ----> < system, disk, cylinder, sector >

A transparent mapping technique:

name ----> file_identifier ----> < system, disk, cylinder, sector >

So when changing the physical location of a file, only the mapping from the file identifier to its location need be modified; the name stays the same. This identifier must be "unique" in the universe.

 

Naming and Transparency
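
The transparent mapping technique can be sketched in a few lines of Python. This is only an illustration under assumptions: the path, the identifier `fid-0001`, the server names and the helper functions `resolve` and `migrate` are all made up, and a real DFS would keep these tables on servers rather than in dictionaries.

```python
# Two-level (transparent) mapping: name -> file_identifier -> location.
# All names, identifiers and locations below are hypothetical.
from typing import NamedTuple


class Location(NamedTuple):
    system: str
    disk: int
    cylinder: int
    sector: int


# Level 1: user-visible names map to location-independent identifiers.
name_to_id = {"/home/alice/report.txt": "fid-0001"}

# Level 2: identifiers map to the current physical location.
id_to_location = {"fid-0001": Location("serverA", 0, 12, 7)}


def resolve(name: str) -> Location:
    """Follow both mapping levels to find where a file currently lives."""
    return id_to_location[name_to_id[name]]


def migrate(file_id: str, new_location: Location) -> None:
    """Move a file: only the second-level entry changes, names are untouched."""
    id_to_location[file_id] = new_location
```

After `migrate("fid-0001", Location("serverB", 1, 3, 9))`, the same name resolves to the new location without any change to `name_to_id`.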


File Service Design Options

Stateful
• server holds information on open files, current position, file locks
• open before access, close after access
• better performance
  - shorter messages, read-ahead possible
• server failure
  - state is lost
• client failure
  - tables fill up
• can provide file locks


File Service Design Options -ctd

Stateless
• no state information held by server
• file operations (idempotent) must contain all information needed (longer messages)
• simpler file server design
• can recover easily from client or server crash
• locking requires extra lock server to hold state


File Service Architecture

Client Side

File server side


File Service Architecture

[Figure: file service architecture. Client computer: application programs and client module. Server computer: directory service (Lookup, AddName, UnName, GetNames) and flat file service (Read, Write, Create, Delete, GetAttributes, SetAttributes).]


File Server Architecture -ctd

Components (for openness):

Flat file service
• operations on file contents
• files have unique file identifiers (UFIDs)
• translates UFIDs to file locations

Flat file service operations:

Read(FileId, i, n) -> Data (throws BadPosition)
  If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.

Write(FileId, i, Data) (throws BadPosition)
  If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.

Create() -> FileId
  Creates a new file of length 0 and delivers a UFID for it.

Delete(FileId)
  Removes the file from the file store.

GetAttributes(FileId) -> Attr
  Returns the file attributes for the file.

SetAttributes(FileId, Attr)
  Sets the file attributes (only those attributes that are not shaded in the file attribute record).
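
The Read/Write/Create/Delete operations above could be modelled, as a rough sketch, by an in-memory Python class. The class name `FlatFileService`, the use of integers as UFIDs and of bytes as "items" are assumptions made for illustration; attributes and persistence are left out.

```python
import itertools


class BadPosition(Exception):
    """Raised when position i is outside the valid range for the file."""


class FlatFileService:
    """In-memory sketch; files are byte sequences addressed by UFID."""

    def __init__(self):
        self._files = {}                      # UFID -> bytearray
        self._next_id = itertools.count(1)    # assumption: UFIDs are integers

    def create(self) -> int:
        ufid = next(self._next_id)
        self._files[ufid] = bytearray()       # new file of length 0
        return ufid

    def delete(self, ufid: int) -> None:
        del self._files[ufid]

    def read(self, ufid: int, i: int, n: int) -> bytes:
        data = self._files[ufid]
        if not 1 <= i <= len(data):
            raise BadPosition(i)
        return bytes(data[i - 1:i - 1 + n])   # items are numbered from 1

    def write(self, ufid: int, i: int, new: bytes) -> None:
        data = self._files[ufid]
        if not 1 <= i <= len(data) + 1:
            raise BadPosition(i)
        data[i - 1:i - 1 + len(new)] = new    # overwrites, extends if needed
```

Note that every call names the file and the position explicitly; there is no open or close and no remembered cursor.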


File Server Architecture -ctd

Directory service
• mapping from text (file) names to UFIDs

Client module
• API for file access, one per client computer
• holds state: open files, positions
• knows network location of flat file & directory server

Flat file service

Read(FileId, i, n) -> Data

Write(FileId, i, Data)

Create() -> FileId

Delete(FileId)

GetAttributes(FileId) -> Attr

SetAttributes(FileId, Attr)

Directory service

Lookup(Dir, Name) -> FileId

AddName(Dir, Name, FileId)

UnName(Dir, Name)

GetNames(Dir, Pattern) -> NameSeq


Flat file service RPC interface

• Used by client modules, not user programs
• FileId (UFID) uniquely identifies file
  - invalid if file not present or inappropriate access
• Read/Write; Create/Delete; Get/SetAttributes

No open/close! (unlike UNIX)
• access immediate with FileId
• Read/Write identify starting point

Improved fault-tolerance
• operations idempotent except Create, can be repeated (at-least-once RPC semantics)
• stateless service
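
Why explicit starting points make repetition safe can be shown with a small Python sketch. Everything here is hypothetical: `PositionalFile` stands in for a stateless-style interface, `CursorFile` for a UNIX-like interface with a server-side position, and `send_twice` models an at-least-once RPC whose request is executed twice after a retransmission. Offsets are 0-based in this sketch.

```python
class PositionalFile:
    """Stateless style: every write names its own starting offset."""

    def __init__(self):
        self.data = bytearray()

    def write(self, offset: int, chunk: bytes) -> None:
        self.data[offset:offset + len(chunk)] = chunk


class CursorFile:
    """Stateful style: the server remembers a current position."""

    def __init__(self):
        self.data = bytearray()
        self.pos = 0

    def write(self, chunk: bytes) -> None:
        self.data[self.pos:self.pos + len(chunk)] = chunk
        self.pos += len(chunk)


def send_twice(operation) -> None:
    """Model an at-least-once RPC where a retransmission re-executes the call."""
    operation()
    operation()


positional = PositionalFile()
send_twice(lambda: positional.write(0, b"abc"))   # repeat has no extra effect

cursor = CursorFile()
send_twice(lambda: cursor.write(b"abc"))          # repeat duplicates the data
```

The positional file ends up holding `abc`, while the cursor-based file holds `abcabc`, which is why the stateless interface tolerates duplicated requests.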


Access control

In UNIX file system
• access rights are checked against the access mode (read, write, execute) in open
• user identity checked at login time, cannot be tampered with (= changed) in non-distributed implementations

In distributed (file) systems
• access rights must be checked at server
  - RPC unprotected
  - forging identity possible, a security risk
• user id typically passed with every request (e.g. Sun NFS)
  - stateless


Directory structure

Hierarchical
• tree-like, pathnames from root
• (in UNIX) several names per file (link operation)

Naming system
• implemented by client module, using directory service
• root has well-known UFID
• locate file following path from root

[Figure: example directory tree: (root) / export / people / {big, bob, jon}, ...]


File names

Text name = directory pathname + file name

hostname:local name
• not mobility transparent

uniform name structure (the same name space for all clients)

remote mount (e.g. Sun NFS)
• remote directory inserted into local directory
• relies on clients maintaining consistent naming conventions across all clients
  - all clients must implement same local tree
  - must mount remote directory into the same local directory


File names

Mount operation:

mount(remotehost, remotedirectory, localdirectory)

A server maintains a table of clients who have mounted file systems at that server.

Each client maintains a table of mounted file systems holding: < IP address, port number, file handle>

Hard versus soft mounts


Remote mount

[Figure: remote mounting. Server 1: (root) / export / people / {big, jon, bob}. Client: (root) / {vmunix, usr}, with usr / {students, x, staff}; students and staff are remote mount points. Server 2: (root) / nfs / users / {jim, jane, joe, ann}.]

Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.

server-side: /export/people, /nfs/users

client-side: mount -t nfs server1:/export/people /usr/students   /* client: /usr/students (= people)/jon, ... */
client-side: mount -t nfs server2:/nfs/users /usr/staff   /* client: /usr/staff (= users)/jane, ... */
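
How a client might translate a local pathname into a server and remote pathname can be sketched as below. The `MOUNTS` table and the `resolve` function are hypothetical; the table entries simply mirror the two mount commands in the note, and a real client would hold addresses and file handles rather than strings.

```python
# Hypothetical client-side mount table mirroring the two mounts in the note.
MOUNTS = {
    "/usr/students": ("server1", "/export/people"),
    "/usr/staff": ("server2", "/nfs/users"),
}


def resolve(path: str):
    """Return (server, remote_path) for a mounted path, or (None, path) if local."""
    best = None
    for mount_point in MOUNTS:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point        # prefer the longest matching mount point
    if best is None:
        return None, path                 # not under any mount: a local file
    server, remote_root = MOUNTS[best]
    return server, remote_root + path[len(best):]
```

For example, `resolve("/usr/students/jon")` yields `("server1", "/export/people/jon")`, while `resolve("/vmunix")` stays local.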


Directory service

Directory
• conventional file (client of the flat file service)
• mapping from text names to UFIDs

Operations
• require FileId, machine readable UFID as parameter
• locate file (Lookup)
• add/delete file (AddName/UnName)
• match file names to regular expression (GetNames)


Directory service operations

Lookup(Dir, Name) -> FileId (throws NotFound)
  Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.

AddName(Dir, Name, File) (throws NameDuplicate)
  If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record. If Name is already in the directory: throws an exception.

UnName(Dir, Name) (throws NotFound)
  If Name is in the directory: the entry containing Name is removed from the directory. If Name is not in the directory: throws an exception.

GetNames(Dir, Pattern) -> NameSeq
  Returns all the text names in the directory that match the regular expression Pattern.
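
The four directory operations could be sketched in Python as follows. This is an in-memory illustration only: the class name, the `make_dir` helper and the use of plain integers as UFIDs are assumptions, the update of the file's attribute record in AddName is omitted, and `GetNames` is assumed to match the whole name against the pattern.

```python
import re


class NotFound(Exception):
    pass


class NameDuplicate(Exception):
    pass


class DirectoryService:
    """In-memory sketch; each directory is a dict of text name -> UFID."""

    def __init__(self):
        self._dirs = {}                   # directory UFID -> {name: file UFID}

    def make_dir(self, dir_id):           # helper, not part of the interface above
        self._dirs[dir_id] = {}

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, dir_id, name, file_id):
        if name in self._dirs[dir_id]:
            raise NameDuplicate(name)
        self._dirs[dir_id][name] = file_id

    def un_name(self, dir_id, name):
        if name not in self._dirs[dir_id]:
            raise NotFound(name)
        del self._dirs[dir_id][name]

    def get_names(self, dir_id, pattern):
        regex = re.compile(pattern)
        return sorted(n for n in self._dirs[dir_id] if regex.fullmatch(n))
```

With names `jon`, `bob` and `big` added to one directory, `get_names(dir_id, r"b.*")` returns the two names starting with `b`.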


File sharing

Multiple clients share the same file for read/write access.

One-copy update semantics
• every read sees the effect of all previous writes
• a write is immediately visible to clients who have the file open for reading

Problems!
• caching: maintaining consistency between several copies
• difficult to achieve
• serialize access by using file locks (affects performance)
• trade-off between consistency and performance


DISTRIBUTED FILE SYSTEMS

CACHING

Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.

If required data is not already cached, a copy of data is brought from the server to the user.

Perform accesses on the cached copy.

Files are identified with one master copy residing at the server machine; copies of (parts of) the file are scattered in different caches.

Cache Consistency Problem -- Keeping the cached copies consistent with the master file.

Remote File Access


DISTRIBUTED FILE SYSTEMS

CACHING

A remote service (RPC) has these characteristic steps:

a) The client makes a request for file access.
b) The request is passed to the server in message format.
c) The server makes the file access.
d) Return messages bring the result back to the client.

This is equivalent to performing a disk access for each request.

Remote File Access


DISTRIBUTED FILE SYSTEMS

CACHE LOCATION: 

Caching is a mechanism for maintaining disk data on the local machine. This data can be kept in the local memory or in the local disk. Caching can be advantageous both for read ahead and read again.

The cost of getting data from a cache is a few HUNDRED instructions; disk accesses cost THOUSANDS of instructions.

The master copy of a file doesn't move, but caches contain replicas of portions of the file.

Caching behaves just like "networked virtual memory".

Remote File Access


DISTRIBUTED FILE SYSTEMS

CACHE LOCATION: 

What should be cached? << blocks <---> files >>. Bigger sizes give a better hit rate; Smaller give better transfer times.

Caching on disk gives:
- Better reliability.

Caching in memory gives:
- The possibility of diskless workstations,
- Greater speed.

Since the server cache is in memory, caching in client memory as well allows a single caching mechanism to be used on both sides.

Remote File Access


DISTRIBUTED FILE SYSTEMS

CACHE UPDATE POLICY: 

A write through cache has good reliability. But the user must wait for writes to get to the server. Used by NFS.

Delayed write - write requests complete more rapidly. Data may be written over the previous cache write, saving a remote write. Poor reliability on a crash.

Flush sometime later tries to regulate the frequency of writes.

Write on close delays the write even longer.

Which would you use for a database file? For file editing?
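
The trade-off between write-through and delayed write can be made concrete with a toy Python model. All class names here are hypothetical, the "server" is just a dictionary with a counter, and a crash is modelled simply by never calling `flush`.

```python
class Server:
    def __init__(self):
        self.blocks = {}
        self.remote_writes = 0

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.remote_writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.server.write(block_no, data)    # user waits for the server


class DelayedWriteCache:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, block_no, data):
        self.cache[block_no] = data          # completes locally
        self.dirty.add(block_no)

    def flush(self):                         # e.g. periodically, or on close
        for block_no in sorted(self.dirty):
            self.server.write(block_no, self.cache[block_no])
        self.dirty.clear()
```

Writing the same block three times costs three remote writes with write-through but only one with delayed write after a flush; until that flush, the server holds nothing, which is the reliability risk noted above.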

Remote File Access


DISTRIBUTED FILE SYSTEMS

Example: NFS with Cachefs


DISTRIBUTED FILE SYSTEMS

CACHE CONSISTENCY: 

The basic issue is, how to determine that the client-cached data is consistent with what's on the server.

 Client - initiated approach - 

The client asks the server if the cached data is OK. What should be the frequency of "asking"? On file open, at fixed time interval, ...?

Server - initiated approach -

Possibilities:
• A and B both have the same file open. When A closes the file, B "discards" its copy. Then B must start over.
• The server is notified on every open. If a file is opened for writing, then disable caching by other clients for that file.
• Get read/write permission for each block; then disable caching only for particular blocks.

Remote File Access


DISTRIBUTED FILE SYSTEMS

COMPARISON OF CACHING AND REMOTE SERVICE:

Many remote accesses can be handled by a local cache. There's a great deal of locality of reference in file accesses. Servers can be accessed only occasionally rather than for each access.

Caching causes data to be moved in a few big chunks rather than in many smaller pieces; this leads to considerable efficiency for the network.

Cache consistency is the major problem with caching. When there are infrequent writes, caching is a win. In environments with many writes, the work required to maintain consistency overwhelms caching advantages.

Caching requires a whole separate mechanism to support acquiring and storage of large amounts of data. Remote service merely does what's required for each call. As such, caching introduces an extra layer and mechanism and is more complicated than remote service.

Remote File Access


DISTRIBUTED FILE SYSTEMS

STATEFUL VS. STATELESS SERVICE: 

Stateful: A server keeps track of information about client requests.

It maintains what files are opened by a client; connection identifiers; server caches. Memory must be reclaimed when client closes file or when client dies.

Stateless: Each client request provides complete information needed by the server (i.e., filename, file offset ).

The server can maintain information on behalf of the client, but it's not required.

Useful things to keep include file info for the last N files touched.

Remote File Access


DISTRIBUTED FILE SYSTEMS

STATEFUL VS. STATELESS SERVICE: 

Performance is better for stateful.

• Don't need to parse the filename each time, or "open/close" file on every request.
• Stateful can have a read-ahead cache.

Fault Tolerance: A stateful server loses everything when it crashes.

• Server must poll clients in order to renew its state.
• Client crashes force the server to clean up its cached information.
• Stateless remembers nothing so it can start easily after a crash.
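
The difference in crash behaviour can be sketched with two toy servers in Python. The file name, the `FILES` dictionary and both classes are made up for illustration; a crash is modelled by clearing the stateful server's open-file table.

```python
FILES = {"notes.txt": b"distributed file systems"}   # hypothetical file store


class StatelessServer:
    def read(self, filename, offset, count):
        # The request carries everything needed; nothing is remembered.
        return FILES[filename][offset:offset + count]


class StatefulServer:
    def __init__(self):
        self.open_files = {}       # handle -> [filename, current position]
        self.next_handle = 1

    def open(self, filename):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [filename, 0]
        return handle

    def read(self, handle, count):
        filename, pos = self.open_files[handle]
        data = FILES[filename][pos:pos + count]
        self.open_files[handle][1] = pos + len(data)   # advance the cursor
        return data

    def crash_and_restart(self):
        self.open_files.clear()    # volatile state is gone
```

After `crash_and_restart()`, a client holding an old handle gets a `KeyError` from the stateful server, whereas the same stateless request keeps working because it names the file and offset itself.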

Remote File Access


DISTRIBUTED FILE SYSTEMS

FILE REPLICATION: 

Duplicating files on multiple machines improves availability and performance.

Placed on failure-independent machines (they won't fail together).

Replication management should be "location-opaque".

The main problem is consistency - when one copy changes, how do other copies reflect that change? Often there is a tradeoff: consistency versus availability and performance.

Example:

"Demand replication" is like whole-file caching; reading a file causes it to be cached locally. Updates are done only on the primary file at which time all other copies are invalidated.

Atomic and serialized invalidation isn't guaranteed (message could get lost / machine could crash).

Remote File Access


Example: Sun NFS (1985)

• An industry standard for file sharing on local networks since the 1980s
• An open standard with clear and simple interfaces
• Closely follows the abstract file service model defined above
• Supports many of the design requirements already mentioned:
  - transparency
  - heterogeneity
  - efficiency
  - fault tolerance
• Limited achievement of:
  - concurrency
  - replication
  - consistency
  - security


Example: Sun NFS (1985)

• Structure of flat file, client & directory service
• NFS protocol
  - RPC based, OS independent (originally UNIX)
• NFS server
  - stateless (no open/close)
  - no locks or concurrency control
  - no replication with updates
• Virtual file system, remote mount
• Access control (user id with each request)
  - security loophole: modify RPC to impersonate users
• Client and server caching


Sun NFS architecture

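[Figure: Sun NFS architecture. Client computer: application programs make system calls into the UNIX kernel, which contains the virtual file system, the UNIX file system, other file systems and the NFS client (local vs. remote operations). Server computer: UNIX kernel containing the NFS server, the virtual file system and the UNIX file system. Operations on remote files travel between NFS client and NFS server via the NFS protocol.]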


File identifier (FileId)

Simple solution
• i-node (number identifying file within file system)
• file migration requires finding and changing all FileIds
• UNIX reuses i-node numbers after file is deleted (hence the i-node generation number)

NFS file handle
• Virtual file system uses i-node if local, file handle (fh) if remote.

Simple FileId: | Server address (IP address.socket) | Index (i-node number) |

File handle (fh): | File system identifier | i-node number | i-node generation number |
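
The role of the i-node generation number can be sketched as follows. This is a toy Python model under assumptions: `InodeTable`, `StaleFileHandle` and the fixed file system identifier 1 are invented for illustration and do not reflect the actual layout of an NFS file handle.

```python
from typing import NamedTuple


class FileHandle(NamedTuple):
    fs_id: int
    inode_no: int
    generation: int


class StaleFileHandle(Exception):
    pass


class InodeTable:
    """Toy i-node table for one file system (fs_id is fixed at 1 here)."""

    def __init__(self):
        self.generation = {}     # i-node number -> current generation number
        self.in_use = set()

    def allocate(self, inode_no: int) -> FileHandle:
        # Each (re)use of an i-node number bumps its generation number.
        self.generation[inode_no] = self.generation.get(inode_no, 0) + 1
        self.in_use.add(inode_no)
        return FileHandle(1, inode_no, self.generation[inode_no])

    def free(self, inode_no: int) -> None:
        self.in_use.discard(inode_no)

    def check(self, fh: FileHandle) -> None:
        """Reject handles that refer to a deleted file or a reused i-node."""
        if (fh.inode_no not in self.in_use
                or self.generation[fh.inode_no] != fh.generation):
            raise StaleFileHandle(fh)
```

If i-node 42 is freed and then allocated again, a handle issued for the old file fails `check` even though the i-node number is the same, because the generation numbers differ.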


NFS Server Operations (simplified)

• read(fh, offset, count) -> attr, data
• write(fh, offset, count, data) -> attr
• create(dirfh, name, attr) -> newfh, attr
• remove(dirfh, name) -> status
• getattr(fh) -> attr
• setattr(fh, attr) -> attr
• lookup(dirfh, name) -> fh, attr
• rename(dirfh, name, todirfh, toname)
• link(newdirfh, newname, dirfh, name)
• readdir(dirfh, cookie, count) -> entries
• symlink(newdirfh, newname, string) -> status
• readlink(fh) -> string
• mkdir(dirfh, name, attr) -> newfh, attr
• rmdir(dirfh, name) -> status
• statfs(fh) -> fsstats

fh = file handle: file system identifier, i-node number, i-node generation number

• The i-node holds the information (metadata) about a file.

49
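A few of these operations can be sketched as an in-memory toy to show that every call carries a file handle and an offset, so the server needs no per-client open-file state. This is an assumption-laden illustration, not the NFS implementation: `ToyNfsServer` is a hypothetical name, handles are plain integers, `attr` is reduced to a size, and `count` on write is taken from `len(data)`.

```python
class ToyNfsServer:
    """In-memory stand-in for a few NFS server operations (illustrative only)."""

    def __init__(self):
        self.root = 1                   # hypothetical handle of the root directory
        self.next_fh = 2
        self.dirs = {self.root: {}}     # dirfh -> {name: fh}
        self.files = {}                 # fh -> bytearray holding file contents

    def getattr(self, fh):
        return {"size": len(self.files[fh])}

    def create(self, dirfh, name):
        fh = self.next_fh
        self.next_fh += 1
        self.files[fh] = bytearray()
        self.dirs[dirfh][name] = fh
        return fh, self.getattr(fh)

    def lookup(self, dirfh, name):
        fh = self.dirs[dirfh][name]
        return fh, self.getattr(fh)

    def read(self, fh, offset, count):
        return self.getattr(fh), bytes(self.files[fh][offset:offset + count])

    def write(self, fh, offset, data):
        buf = self.files[fh]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))   # pad a hole with zeros
        buf[offset:offset + len(data)] = data
        return self.getattr(fh)


server = ToyNfsServer()
fh, _ = server.create(server.root, "notes.txt")
server.write(fh, 0, b"hello world")
fh2, attr = server.lookup(server.root, "notes.txt")
_, data = server.read(fh2, 6, 5)
print(attr["size"], data)   # 11 b'world'
```

Because each request names the file and position explicitly, a repeated read or write after a lost reply has the same effect as the first one.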

Page 50: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Caching in NFS

• Caching is indispensable for performance
• Caching
  • retains recently used data (file pages, directories, file attributes) in a cache
  • updates data in the cache for speed
  • block size is typically 8 Kbytes
• Server caching: cache in server memory (UNIX kernel)
• Client caching: cache in client memory or on local disk

Page 51: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Server caching

• Store data in server memory
• Read-ahead: anticipate which pages will be read next
• Delayed write
  • update in the cache; write to disk periodically (UNIX sync synchronizes the cache) or when space is needed
  • which contents users see depends on timing
• Write-through
  • update the cache and write to disk whenever data is updated (reliable, poor performance)
• Write on close
  • write to disk only when a commit is received (fast, but problematic for files that stay open for a long time)
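The three write policies differ only in when a cached block reaches the disk. The sketch below makes that visible under simplifying assumptions: `CachedDisk` and its method names are hypothetical, the "disk" is a dictionary, and `close` stands in for the commit point.

```python
class CachedDisk:
    """Toy server block cache with a selectable write policy (illustrative only)."""

    def __init__(self, policy):
        if policy not in ("delayed", "write-through", "write-on-close"):
            raise ValueError(policy)
        self.policy = policy
        self.cache = {}      # (file, block_no) -> data
        self.disk = {}       # (file, block_no) -> data
        self.dirty = set()   # cached blocks not yet written to disk

    def write(self, file, block_no, data):
        key = (file, block_no)
        self.cache[key] = data
        if self.policy == "write-through":
            self.disk[key] = data        # disk is updated on every write
        else:
            self.dirty.add(key)          # disk is updated later

    def sync(self):
        """Periodic flush, used by the delayed-write policy."""
        if self.policy == "delayed":
            self._flush(set(self.dirty))

    def close(self, file):
        """Commit point, used by the write-on-close policy."""
        if self.policy == "write-on-close":
            self._flush({k for k in self.dirty if k[0] == file})

    def _flush(self, keys):
        for key in keys:
            self.disk[key] = self.cache[key]
            self.dirty.discard(key)


results = {}
for policy in ("delayed", "write-through", "write-on-close"):
    d = CachedDisk(policy)
    d.write("f", 0, b"x")
    after_write = ("f", 0) in d.disk
    d.close("f")
    after_close = ("f", 0) in d.disk
    d.sync()
    after_sync = ("f", 0) in d.disk
    results[policy] = (after_write, after_close, after_sync)
print(results)
```

Each tuple records whether the block is on disk after the write, after the close, and after the sync; only write-through has it on disk immediately.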

Page 52: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Client caching

• Potential consistency problems
  • clients may hold different versions or portions of files, since writes are delayed
  • clients poll the server to check whether a cached copy is still valid
• Timestamp method
  • tag each cached copy with the time of its latest validity check and its modification time
  • a copy is valid if the time since the last check is less than the freshness interval, or if the modification time on the server is the same
  • choose the freshness interval adaptively: 3 to 30 s for files, 30 to 60 s for directories
  • a small freshness interval can put a heavy load on the network
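The timestamp check can be written as a short function. This is a minimal sketch: `CacheEntry`, `is_valid` and the `get_server_mtime` callback (standing in for a getattr request to the server) are hypothetical names, and times are plain numbers in seconds.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    t_checked: float    # time of the last validity check
    t_modified: float   # server modification time recorded for this copy


def is_valid(entry, now, freshness, get_server_mtime):
    """Return True if the cached copy may be used at time `now`."""
    if now - entry.t_checked < freshness:
        return True                      # checked recently: no server traffic
    server_mtime = get_server_mtime()    # one round trip to the server
    entry.t_checked = now
    return server_mtime == entry.t_modified


calls = []

def fake_getattr():
    # Hypothetical stand-in for the server; the file was last modified at t=100.
    calls.append(1)
    return 100.0

entry = CacheEntry(b"abc", t_checked=0.0, t_modified=100.0)
r1 = is_valid(entry, now=2.0, freshness=3.0, get_server_mtime=fake_getattr)
r2 = is_valid(entry, now=5.0, freshness=3.0, get_server_mtime=fake_getattr)
entry.t_modified = 90.0   # pretend the cached copy predates the server's version
r3 = is_valid(entry, now=20.0, freshness=3.0, get_server_mtime=fake_getattr)
print(r1, r2, r3, len(calls))   # True True False 2
```

The first call is answered from the freshness interval alone, which is why a longer interval lowers network load at the cost of staler data.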

Page 53: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Client caching (continued)

• Reads
  • perform a validity check whenever a cache entry is used
  • if the entry is not valid, request the data from the server
  • several optimizations reduce traffic
  • recent updates are not always visible (timing!)
• Writes
  • when a page is modified, it is marked as dirty
  • dirty pages are flushed asynchronously: periodically (client's sync) and on close
• Not truly one-copy update semantics

Page 54: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

NFS summary

• Transparency
  • Access transparency: provides an application programming interface identical to the local system interface
  • Location transparency: supports a single network file name space
  • Mobility transparency (migration transparency)
• Scalability: handles very large real-world loads efficiently
• File replication: NFS supports read-only replicas; it does not support file replication with updates
• Hardware and operating system heterogeneity
• Fault tolerance

Page 55: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Example: Andrew File System (AFS)

Overview

• A distributed computing environment (Andrew) under development since 1983 at Carnegie Mellon University, purchased by IBM and released as Transarc DFS, now open sourced as OpenAFS.
• Information sharing on a large scale via transparency
• NFS compatible (called NSF-2)
• File reference by NFS-style file handle

Page 56: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

• AFS tries to solve complex issues such as a uniform name space, location-independent file sharing, client-side caching (with cache consistency) and secure authentication (via Kerberos)
• Also includes server-side caching (via replicas) and high availability
• Can span 5,000 workstations
• Scalable
  • Whole-file serving (> 64 Kbytes)
  • Whole-file caching (on local client disk, hundreds of recently used files)
• Characteristics of AFS
  • locally cached copies
  • sufficient cache storage
  • based on UNIX file sizes and locality of reference

Page 57: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

AFS Software architecture

[Figure: AFS software architecture. Each workstation (client) runs user programs and a Venus process on top of the UNIX kernel; each server runs a Vice process on top of the UNIX kernel; workstations and servers are connected by the network.]

Two software components

• Vice: user-level UNIX process running on each server (server module)
• Venus: user-level process running on each client (client module)

Page 58: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

SHARED NAME SPACE:

• The server file space is divided into volumes. Volumes contain files of only one user. These volumes are the level of granularity attached to a client.
• A Vice file can be accessed using a fid = <volume number, vnode>. The fid doesn't depend on machine location. A client queries a volume-location database to find the server that holds the volume.
• Volumes can migrate between servers to balance space and utilization. The old server has "forwarding" instructions and handles client updates during migration.
• Read-only volumes (system files, etc.) can be replicated. The volume database knows how to find these.
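The point that a fid does not depend on machine location can be shown with a small sketch. All names here (`Fid`, `VolumeLocationDB`, `place`, `migrate`, `server_for`, the server names) are hypothetical, and forwarding during migration is not modelled.

```python
from typing import NamedTuple


class Fid(NamedTuple):
    volume: int   # volume number
    vnode: int    # index of the file within the volume


class VolumeLocationDB:
    """Toy volume-location database: volume number -> server name."""

    def __init__(self):
        self.location = {}

    def place(self, volume, server):
        self.location[volume] = server

    def migrate(self, volume, new_server):
        # Only the database entry changes; fids held by clients stay the same.
        self.location[volume] = new_server

    def server_for(self, fid):
        return self.location[fid.volume]


vldb = VolumeLocationDB()
vldb.place(42, "server-a")
fid = Fid(volume=42, vnode=7)
before = vldb.server_for(fid)
vldb.migrate(42, "server-b")
after = vldb.server_for(fid)
print(fid, before, after)   # Fid(volume=42, vnode=7) server-a server-b
```

Compare this with the simple FileId scheme, where the server address is embedded in the identifier and migration would invalidate every stored identifier.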

Page 59: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

FILE OPERATIONS AND CONSISTENCY SEMANTICS:

• Andrew caches entire files from servers
• A client workstation interacts with Vice servers only during opening and closing of files
• Venus caches files from Vice when they are opened, and stores modified copies of files back when they are closed
• Reading and writing bytes of a file are done by the kernel on the cached copy, without Venus intervention
• Venus caches contents of directories and symbolic links, for path-name translation
• Exceptions to the caching policy are modifications to directories, which are made directly on the server responsible for that directory
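The open/close behaviour described above can be sketched as follows. This is a toy under stated assumptions: the `Vice` and `Venus` classes and their methods are hypothetical simplifications, a dictionary stands in for the client's local disk, writes go through a `Venus.write` helper for brevity (in AFS the kernel operates on the cached copy), and cache validation is not modelled.

```python
class Vice:
    """Toy server: stores whole files keyed by path name."""

    def __init__(self):
        self.files = {}
        self.fetches = 0     # counts whole-file transfers to clients

    def fetch(self, name):
        self.fetches += 1
        return self.files[name]

    def store(self, name, data):
        self.files[name] = data


class Venus:
    """Toy client cache manager: fetch whole file on open, store back on close."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}        # name -> bytearray, stands in for the local disk
        self.modified = set()  # names of cached files changed since open

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = bytearray(self.vice.fetch(name))
        return self.cache[name]

    def write(self, name, offset, data):
        # Operates on the local copy only; the server is not contacted.
        self.cache[name][offset:offset + len(data)] = data
        self.modified.add(name)

    def close(self, name):
        if name in self.modified:
            self.vice.store(name, bytes(self.cache[name]))
            self.modified.discard(name)


vice = Vice()
vice.files["/shared/report"] = b"draft"
venus = Venus(vice)
venus.open("/shared/report")
venus.write("/shared/report", 0, b"final")
server_before_close = vice.files["/shared/report"]
venus.close("/shared/report")
venus.open("/shared/report")     # served from the local cache
print(server_before_close, vice.files["/shared/report"], vice.fetches)
# b'draft' b'final' 1
```

The server sees the update only at close, and the second open causes no transfer, which is where whole-file caching saves server load.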

Page 60: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Andrew File System

• Clients have a partitioned space of file names: a local name space and a shared name space
• Dedicated servers, called Vice, present the shared name space to the clients as a homogeneous, identical, and location-transparent file hierarchy
• Workstations run the Virtue protocol to communicate with Vice, and are required to have local disks where they store their local name space
• Servers collectively are responsible for the storage and management of the shared name space

Page 61: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Andrew File System

• Clients and servers are structured in clusters interconnected by a backbone LAN
• A cluster consists of a collection of workstations and a cluster server and is connected to the backbone by a router
• A key mechanism selected for remote file operations is whole-file caching
• Opening a file causes it to be cached, in its entirety, on the local disk

Page 62: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

IMPLEMENTATION – Flow of a request:

• Deflection of open/close: the client kernel is modified to detect references to Vice files.
• The request is forwarded to Venus, which takes these steps:
  • does pathname translation
  • asks Vice for the file
  • moves the file to local disk
  • passes the inode of the file back to the client kernel
• Venus maintains caches for status (in memory) and data (on local disk).
• A server user-level process handles client requests.
  • A lightweight process handles concurrent RPC requests from clients.
  • State information is cached in this process.
  • Susceptible to reliability problems.

Page 63: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

New developments (continued)

• AFS enhancements: DCE/DFS standards
• Spritely NFS and NQNFS: adopt a mechanism similar to callbacks
• Improvements in storage organization
  • Redundant array of inexpensive disks (RAID)
  • Log-structured file storage (LFS)
• New design approaches
  • xFS (UC Berkeley): serverless network architecture, file serving responsibility distributed across the LAN
  • Frangipani: highly scalable distributed file system (Digital Systems Research Center, 1997)

Page 64: Distributed System and Middleware Distributed Systems Distributed File System Dr. Sunny Jeong. spjeong@uic.edu.hk Mr. Coling Zhang colinzhang@uic.edu.hk

Summary

• File service is crucial to the running of a distributed system
  • performance, consistency and easy recovery are essential
• Design issues
  • separate flat file service from directory service and client module
  • stateless for performance and fault tolerance
  • caching for performance
  • concurrent updates are difficult with caching
  • approximation of one-copy update semantics
• Case studies
  • Sun NFS
  • AFS
  • Recent advances