
UNIX Internals – the New Frontiers

Distributed File Systems


Difference between DOS and DFS

A distributed OS (DOS) looks like a centralized OS but runs simultaneously on multiple machines. It may provide a file system shared by all its host machines.

A distributed FS (DFS) is a software layer that manages communication between conventional operating systems and file systems.


General Characteristics of DFS

- Network transparency
- Location transparency and location independence
- User mobility
- Fault tolerance
- Scalability
- File mobility


Design Considerations

- Name space
- Stateful or stateless
- Semantics of sharing
  - UNIX semantics
  - Session semantics
- Remote access method


Network File System (NFS)

- Based on the client-server model
- Client and server communicate via remote procedure calls


User Perspective

- An NFS server exports one or more file systems
- Hard mount: the client must get a reply, so it keeps retrying until the server responds
- Soft mount: the client returns an error if the server does not respond
- Spongy mount: hard for mount, soft for I/O
- Commands:
    mount -t nfs nfssrv:/usr /usr
    mount -t nfs nfssrv:/usr/u1 /u1
    mount -t nfs nfssrv:/usr /users
    mount -t nfs nfssrv:/usr/local /usr/local
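The difference between hard and soft mounts comes down to the client's retry policy. The following is a minimal sketch, not actual NFS client code: `call_with_retries`, `send`, and `max_retries` are hypothetical names, and a reply of `None` stands in for a request that timed out.

```python
def call_with_retries(send, soft, max_retries=3):
    """Issue a request until a reply arrives.

    send        -- hypothetical callable that returns a reply, or None on timeout
    soft        -- True for soft-mount behavior, False for hard-mount behavior
    max_retries -- assumed limit for soft mounts; not a value from the slides
    """
    attempts = 0
    while True:
        attempts += 1
        reply = send()
        if reply is not None:
            return reply
        # A hard mount never gives up; a soft mount reports an error.
        if soft and attempts >= max_retries:
            raise TimeoutError("server not responding")
```

With `soft=False` the loop only ends when the server answers, which is why a process using a hard mount can block for as long as the server is down.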


Design goals

- Not restricted to UNIX
- Not dependent on any hardware
- Simple recovery mechanisms
- Transparent access to remote files
- UNIX semantics
- NFS performance must be comparable to that of a local disk
- Transport-independent


NFS components

- NFS protocol
- RPC protocol
- XDR (External Data Representation)
- NFS server code
- NFS client code
- Mount protocol
- Daemon processes (nfsd, mountd, biod)
- NLM (Network Lock Manager) and NSM (Network Status Monitor)


Statelessness

- Each request is independent
- This makes crash recovery simple
  - Client crash
  - Server crash
- Problem: the server must commit all modifications to stable storage before replying to a request.


10.4 The protocol suite

Why XDR?

- Differences among internal representations of data elements: byte order, sizes of types
- Opaque (byte stream) or typed
- Little-endian or big-endian


XDR

- Integers: 32 bits, with byte 0 leftmost and most significant; signed integers use two's complement
- Variable-length opaque data: length (4 bytes), then the data, NULL-padded to a multiple of 4 bytes
- Strings: length (4 bytes), then the ASCII string, NULL-padded
- Arrays: size (4 bytes), then elements of the same type
- Structures: components in natural order
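The encoding rules above can be sketched with Python's standard `struct` module. This is an illustration only, not a complete XDR library; the function names are hypothetical.

```python
import struct

def xdr_int(value: int) -> bytes:
    """Signed 32-bit integer: big-endian, two's complement."""
    return struct.pack(">i", value)

def xdr_opaque(data: bytes) -> bytes:
    """Variable-length opaque data: 4-byte length, data, zero padding to 4 bytes."""
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def xdr_string(text: str) -> bytes:
    """ASCII string, laid out the same way as opaque data."""
    return xdr_opaque(text.encode("ascii"))

def xdr_int_array(values: list[int]) -> bytes:
    """Variable-length array: 4-byte element count, then each element in order."""
    return struct.pack(">I", len(values)) + b"".join(xdr_int(v) for v in values)
```

For example, `xdr_string("nfs")` yields the length `00 00 00 03`, the three ASCII bytes, and one padding byte, for 8 bytes in total.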


RPC

- Specifies the format of communications between the client and the server
- Sun RPC: synchronous requests only
- Implemented on UDP/IP
- Authentication to identify callers: AUTH_NULL, AUTH_UNIX, AUTH_SHORT, AUTH_DES, and AUTH_KERB
- RPC language compiler: rpcgen


10.5 NFS Implementation

- Control flow
- Vnode
- Rnode


File Handle

- The server assigns a file handle on lookup, create, or mkdir
- Subsequent I/O operations use it
- A file handle = opaque 32-byte object = <file system ID, inode number, generation number>
- The generation number is used to check whether the handle is obsolete (its inode has been reallocated to another file)
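A short sketch of how the generation number detects a stale handle. The field widths and the layout inside the 32 bytes are assumptions made for illustration; clients treat the handle as opaque, and `inode_table` is a hypothetical stand-in for the server's view of its inodes.

```python
import struct

HANDLE_SIZE = 32  # the handle is an opaque 32-byte object

def make_handle(fsid: int, inode: int, generation: int) -> bytes:
    """Pack the three fields (assumed 32 bits each) and zero-fill to 32 bytes."""
    return struct.pack(">III", fsid, inode, generation).ljust(HANDLE_SIZE, b"\x00")

def is_stale(handle: bytes, inode_table: dict) -> bool:
    """inode_table maps (fsid, inode) -> current generation number."""
    fsid, inode, generation = struct.unpack(">III", handle[:12])
    current = inode_table.get((fsid, inode))
    # Stale if the inode is gone or has been reused with a new generation.
    return current is None or current != generation
```

When an inode is freed and later reused, the server bumps its generation number, so an old handle that still names the same inode number no longer matches.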


The mount operation

- nfs_mount(): sends an RPC request with the pathname as its argument
- The mountd daemon translates the pathname
- It checks the request
- It replies with success and a file handle
- The client initializes the vfs and records the server name and address
- The client allocates an rnode and vnode
- The server must check access rights on each request


Pathname Lookup

Client:

- Initiates a lookup during open, create, and stat
- Starting from the current or root directory, proceeds one component at a time
- Sends a request to the server if the directory is an NFS directory

Server:

- From the file handle -> FS ID -> vfs -> VGET -> vnode
- -> VOP_LOOKUP -> vnode & pointer
- VOP_GETATTR -> VOP_FID -> file handle
- Reply message = status + file handle + file attributes

Client:

- Gets the reply, allocates an rnode and vnode, copies the information, and proceeds to search for the next component
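The component-at-a-time lookup can be sketched as a loop that issues one request per pathname component. `FakeServer` and `resolve` are hypothetical names; handles are plain integers here, and file attributes are left out of the reply for brevity.

```python
class FakeServer:
    """Stand-in for an NFS server, used only for this illustration."""

    def __init__(self, tree):
        # tree maps (directory handle, component name) -> child handle
        self.tree = tree
        self.calls = 0  # number of lookup requests received

    def lookup(self, dir_handle, name):
        self.calls += 1
        child = self.tree.get((dir_handle, name))
        if child is None:
            return ("ENOENT", None)
        return ("OK", child)

def resolve(server, root_handle, path):
    """Resolve a path by looking up one component per request."""
    handle = root_handle
    for component in path.strip("/").split("/"):
        status, handle = server.lookup(handle, component)
        if status != "OK":
            raise FileNotFoundError(component)
    return handle
```

Resolving `usr/local/bin` therefore costs three lookup requests, one for each component.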


10.6 UNIX Semantics

NFS leads to a few incompatibilities with UNIX because it is stateless.

Open file permission

- UNIX checks permissions at open
- NFS checks permissions on each read and write
- In NFS, the server always allows the owner of the file to read or write the file
- What about a write to a file that is write-protected?
  - The client saves the attributes containing the file permissions at open


Deletion of open files

- The server has no idea which files are open
- The client renames the file to be deleted, then deletes it when the file is closed
- What if the file is deleted from a different machine?


Reads and Writes

- UNIX locks the vnode at the start of I/O
- NFS clients can lock the vnode only on the same machine
- NFS offers no protection against overlapping I/O requests
- Locking through the NLM (Network Lock Manager) protocol is only advisory


10.7 NFS Performance

Bottlenecks

- Writes must be committed to stable storage
- Fetching file attributes requires one RPC call per file
- Processing retransmitted requests adds to the load on the server


Client-side caching

- Caches both file blocks and file attributes
- To avoid using invalid data:
  - The kernel keeps an expiry time for cached data
  - The modification time is rechecked after 60 seconds
- This reduces but does not eliminate the problem
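A minimal sketch of an expiring attribute cache, assuming a 60-second timeout as on the slide. `AttrCache` and `fetch_attrs` are hypothetical names; `fetch_attrs` stands for the RPC that retrieves attributes from the server, and the clock is injectable so the behavior can be checked without waiting.

```python
import time

ATTR_TIMEOUT = 60.0  # seconds before cached attributes must be rechecked

class AttrCache:
    def __init__(self, fetch_attrs, clock=time.monotonic):
        self.fetch_attrs = fetch_attrs  # hypothetical: file handle -> attributes
        self.clock = clock
        self.entries = {}  # file handle -> (attributes, expiry time)

    def get(self, handle):
        now = self.clock()
        entry = self.entries.get(handle)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: no RPC needed
        attrs = self.fetch_attrs(handle)  # expired or missing: ask the server
        self.entries[handle] = (attrs, now + ATTR_TIMEOUT)
        return attrs
```

Within the timeout the client may act on attributes that another client has already changed, which is why the scheme reduces stale data but cannot eliminate it.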


Deferral of writes

- Asynchronous writes for full blocks
- Delayed writes for partial blocks
- Delayed writes are flushed when the file is closed, or every 30 seconds by the biod daemon
- The server uses an NVRAM buffer and flushes the buffer to disk
- Write-gathering:
  - The server waits, processes more than one write to the same file, and replies to each
  - The server processes the gathered write requests together


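The retransmissions cache

- Requests are either idempotent or nonidempotent
- Problem:
  - Retransmissions (xid) cache on the server:
    - Checks the xid, procedure number, and client ID
    - Checks the cache only on failure

Example with a remove request:

- Client sends a remove request
- Server removes the file and sends a success reply, but the reply is lost
- Client retransmits the remove
- Server processes the remove request again
- The remove fails, and the server sends a remove failure
- Client receives the error message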


New implementation

- Caches all requests
- Checks the xid, procedure number, client ID, state field, and timestamp
- If the request is in progress, discards the retransmission; if it is done, discards the retransmission if the timestamp shows the request is within the throwaway window (3-6 s)
- Otherwise, processes the request if it is idempotent
- For a nonidempotent request, checks whether the file has been modified; if not, sends success; otherwise, retries it
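The decision logic above can be sketched as follows. This is a single-threaded illustration under stated assumptions, not server code: `DupCache`, `execute`, and `modified_since` are hypothetical names, discarding a request is modeled by returning `None`, and the window is fixed at 3 seconds although the slide gives a range.

```python
THROWAWAY_WINDOW = 3.0  # seconds; assumed value from the 3-6 s range

class DupCache:
    def __init__(self):
        # (xid, procedure, client ID) -> (state, timestamp)
        self.entries = {}

    def handle(self, key, now, execute, idempotent, modified_since=None):
        """Return a reply string, or None when the request is discarded.

        execute        -- hypothetical callable that performs the operation
        modified_since -- hypothetical callable: timestamp -> bool, True if the
                          file changed after that time (nonidempotent requests)
        """
        entry = self.entries.get(key)
        if entry is not None:
            state, timestamp = entry
            if state == "in_progress":
                return None  # original is still being processed
            if now - timestamp < THROWAWAY_WINDOW:
                return None  # reply was sent very recently
            if not idempotent and not modified_since(timestamp):
                return "OK"  # the earlier execution stands; do not repeat it
        self.entries[key] = ("in_progress", now)
        reply = execute()
        self.entries[key] = ("done", now)
        return reply
```

In the remove example, the late retransmission finds a completed entry and an unmodified file, so the client receives success instead of a spurious failure.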


10.9 NFS Security

NFS Access Control

- On mount and on each request
- By an exports list
- Mount: the server checks the list and denies ineligible clients
- Request: authentication information in AUTH_UNIX form (UID, GID)
- Loophole: an impostor can use <UID, GID> to access the files of others


UID Remapping

- A translation map for each client
- The same UID may map to a different UID on the server
- Maps to nobody if the UID does not match an entry in the map
- Implemented at the RPC level
- Implemented at the NFS level
  - Merging the map and the /etc/exports file
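A per-client translation map can be sketched as a nested dictionary. The function name, the host names in the usage note, and the numeric value chosen for nobody are assumptions; the actual ID for nobody varies by system.

```python
NOBODY = 65534  # assumed numeric ID for "nobody"; varies by system

def remap_uid(client: str, uid: int, maps: dict) -> int:
    """Translate a client's UID to a server UID using that client's map.

    maps -- hypothetical structure: client name -> {client UID: server UID}
    Any UID (or client) without an entry is mapped to nobody.
    """
    client_map = maps.get(client, {})
    return client_map.get(uid, NOBODY)
```

With `maps = {"hostA": {1001: 2001}, "hostB": {1001: 3001}}`, UID 1001 becomes 2001 for hostA and 3001 for hostB, while UID 0 from either host falls through to nobody because it has no entry.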


Root Remapping

- Maps the superuser to nobody
- Limits the client's superuser in accessing files on the server
- The UNIX framework is designed for an isolated, multi-user environment in which the users trust each other


10.10 NFS Version 3

Commit request

- The client writes, and the kernel sends an asynchronous write
- The server saves the data to its local cache and replies immediately
- The client holds its copy of the data until the process closes the file and sends a commit request
- The server flushes the data to disk

File length: from 32 bits (4 GB) to 64 bits (2^34 GB)

READDIRPLUS = LOOKUP + GETATTR

- Returns names, file handles, and file attributes
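The commit sequence can be sketched with two small classes. All names here are hypothetical, and the `resend` step after a server crash is an assumption added to show why the client keeps its copy; the slides do not describe the recovery path.

```python
class V3Server:
    def __init__(self):
        self.cache = {}  # offset -> data, lost on a crash
        self.disk = {}   # offset -> data, stable storage

    def write_async(self, offset, data):
        self.cache[offset] = data  # reply immediately; nothing is on disk yet

    def commit(self):
        self.disk.update(self.cache)  # flush cached writes to disk
        self.cache.clear()

    def crash(self):
        self.cache.clear()  # uncommitted data is gone

class V3Client:
    def __init__(self, server):
        self.server = server
        self.uncommitted = {}  # copies held until the commit succeeds

    def write(self, offset, data):
        self.uncommitted[offset] = data
        self.server.write_async(offset, data)

    def resend(self):
        # Assumed recovery step: replay held data after a server crash.
        for offset, data in self.uncommitted.items():
            self.server.write_async(offset, data)

    def close(self):
        self.server.commit()
        self.uncommitted.clear()  # safe to drop copies only after the commit
```

Because the server acknowledges writes before they reach the disk, the client's held copy is the only surviving version of the data if the server crashes before the commit.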


Other DFS

The Andrew File System (10.15 – 10.17)

The DCE Distributed File System (10.18 – 10.18.5)