
Distributed File Systems



Page 1: Distributed File Systems

03-10-09. Some slides are taken from Professor Grimshaw, Ranveer Chandra, Krasimira Kapitnova, and others

Page 2: Distributed File Systems

3rd year graduate student working with Professor Grimshaw

Interests lie in Operating Systems, Distributed Systems, and more recently Cloud Computing

Also: trumpet, sporty things, hardware junkie. I like tacos … a lot


Page 3: Distributed File Systems

File system refresher

Basic issues
  Naming / transparency
  Caching
  Coherence
  Security
  Performance

Case studies
  NFS v3 and v4
  Lustre
  AFS 2.0

Page 4: Distributed File Systems

What is a file system?

Why have a file system?


Mmmm, refreshing File Systems

Page 5: Distributed File Systems

Must have
  Name, e.g. “/home/sosa/DFSSlides.ppt”
  Data – some structured sequence of bytes

Tend to also have
  Size
  Protection information
  Non-symbolic identifier
  Location
  Times, etc.
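As a rough illustration, a minimal sketch of such a per-file record; the field names are made up for this example and not taken from any particular file system:

    from dataclasses import dataclass

    @dataclass
    class FileRecord:
        name: str        # symbolic name, e.g. "/home/sosa/DFSSlides.ppt"
        data: bytes      # the structured sequence of bytes
        size: int        # length in bytes
        mode: int        # protection information (e.g. POSIX permission bits)
        file_id: int     # non-symbolic identifier (like an i-node number)
        location: str    # which disk or server holds the data
        mtime: float     # last-modified time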

Page 6: Distributed File Systems

A container abstraction to help organize files
Generally hierarchical (tree) structure
Often a special type of file
Directories have
  A name
  Files and directories (if hierarchical) within them

A large container for tourists

Page 7: Distributed File Systems

Two approaches to sharing files

Copy-based
  Application explicitly copies files between machines
  Examples: UUCP, FTP, GridFTP, {.*}FTP, rcp, scp, etc.

Access transparency – i.e. Distributed File Systems

Sharing is caring

Page 8: Distributed File Systems

Basic idea
  Find a copy
    Naming is based on machine name of source (viper.cs.virginia.edu), user id, and path
  Transfer the file to the local file system
    scp user@viper.cs.virginia.edu:fred.txt .
  Read/write
  Copy back if modified

Pros and cons?

Page 9: Distributed File Systems

Pros
  Semantics are clear
  No OS/library modification

Cons?
  Have to deal with the copy model explicitly
  Have to copy the whole file
  Inconsistent copies all over the place
  Others?

Page 10: Distributed File Systems

Mechanism to access remote files the same as local ones (i.e. through the file system hierarchy)

Why is this better?

… enter Distributed File Systems


Page 11: Distributed File Systems

A Distributed File System is a file system that may have files on more than one machine

Distributed File Systems take many forms
  Network file systems
  Parallel file systems
  Access-transparent distributed file systems

Why distribute?

Page 12: Distributed File Systems

Sharing files with other users
  Others can access your files
  You can have access to files you wouldn’t regularly have access to
Keeping files available for yourself on more than one computer
  Small amount of local resources
  High failure rate of local resources
Can eliminate version problems (same file copied around with local edits)

Page 13: Distributed File Systems

Naming
Performance
Caching
Consistency semantics
Fault tolerance
Scalability

Page 14: Distributed File Systems

What does a DFS look like to the user?
  Mount-like protocol, e.g. /../mntPointToBobsSharedFolder/file.txt
  Unified namespace: everything looks like it’s on the same namespace

Pros and cons?

Page 15: Distributed File Systems

Location transparency
  Name does not hint at physical location
  Mount points are not transparent

Location independence
  File name does not need to be changed when the file’s physical storage location changes

Independence without transparency?

Page 16: Distributed File Systems

Generally we trade off the benefits of DFS’s against some performance hit
  How much depends on workload
  Always look at the workload to figure out what mechanisms to use

What are some ways to improve performance?

Page 17: Distributed File Systems

The single architectural feature that contributes most to performance in a DFS!!!

Also the single greatest cause of heartache for programmers of DFS’s
  Maintaining consistency semantics is more difficult
  Has a large potential impact on scalability

Page 18: Distributed File Systems

Size of the cached units of data
  Larger sizes make more efficient use of the network – spatial locality, latency
  Whole files simplify semantics, but very large files can’t be stored locally
  Small files?

Who does what: push vs. pull
  Important for maintaining consistency

Page 19: Distributed File Systems

Different DFS’s have different consistency semantics
  UNIX semantics
  On-close semantics
  Timeout semantics (at most x seconds out of date)

Pros / cons?
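A minimal sketch of timeout semantics, assuming the client can ask the server for a change indicator via a hypothetical getattr() call; all names here are illustrative:

    import time

    ATTR_TIMEOUT = 3.0   # seconds a cached entry may be used without revalidation

    class CacheEntry:
        def __init__(self, data, change_id):
            self.data = data
            self.change_id = change_id      # server-side change indicator
            self.validated_at = time.time()

    def read_cached(cache, path, server):
        entry = cache.get(path)
        if entry and time.time() - entry.validated_at < ATTR_TIMEOUT:
            return entry.data               # inside the window: at most ATTR_TIMEOUT stale
        attrs = server.getattr(path)        # window expired: revalidate with the server
        if entry and attrs.change_id == entry.change_id:
            entry.validated_at = time.time()
            return entry.data               # unchanged on the server: keep the copy
        data = server.read_all(path)        # changed or missing: refetch
        cache[path] = CacheEntry(data, attrs.change_id)
        return data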

Page 20: Distributed File Systems

Can replicate for
  Fault tolerance
  Performance

Replication is inherently location-opaque, i.e. we need location independence in naming

Different replication mechanisms give different consistency semantics
  Tradeoffs, tradeoffs, tradeoffs

Page 21: Distributed File Systems

Mount-based DFS
  NFS version 3
  Others include SMB, CIFS, NFS version 4

Parallel DFS
  Lustre
  Others include HDFS, Google File System, etc.

Non-parallel unified-namespace DFS’s
  Sprite
  AFS version 2.0 (basis for many other DFS’s)
  Coda
  AFS 3.0

Page 22: Distributed File Systems


Page 23: Distributed File Systems

Most commonly used DFS ever!

Goals
  Machine & OS independent
  Crash recovery
  Transparent access
  “Reasonable” performance

Design
  All machines can be both clients and servers
  RPC (on top of UDP in v.1; v.2+ also on TCP)
  Open Network Computing Remote Procedure Call (ONC RPC)
  External Data Representation (XDR)
  Stateless protocol
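To make XDR concrete, here is a hand-rolled sketch of its two most common encodings (4-byte big-endian integers and padded opaque data); a real implementation would use an XDR library, and the READ-style payload at the end is only illustrative:

    import struct

    def xdr_uint(n):
        # XDR encodes unsigned integers as 4 big-endian bytes
        return struct.pack(">I", n)

    def xdr_opaque(data):
        # variable-length opaque data: length word, bytes, zero-padded to a 4-byte boundary
        pad = (4 - len(data) % 4) % 4
        return xdr_uint(len(data)) + data + b"\x00" * pad

    # e.g. encoding (file handle, offset, count) arguments of a READ-like call
    payload = xdr_opaque(b"filehandle") + xdr_uint(0) + xdr_uint(4096)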

Page 24: Distributed File Systems


Page 25: Distributed File Systems


Client sends a path name to the server with a request to mount

If the path is legal and exported, the server returns a file handle
  Contains FS type, disk, i-node number of the directory, security info
  Subsequent accesses use the file handle

Mount can happen either at boot or via automount
  Automount: directories are mounted on use. Why is this helpful?

Mount only affects the client’s view
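A toy sketch of that handshake; the message shapes and names here are hypothetical, not the real NFS MOUNT protocol:

    class MountServer:
        def __init__(self, exports):
            self.exports = exports          # exported path -> (fs_type, disk, inode, security)

        def mount(self, path):
            if path not in self.exports:
                raise PermissionError("path not legal or not exported")
            fs_type, disk, inode, security = self.exports[path]
            # an opaque handle the client presents on every subsequent access
            return ("fh", fs_type, disk, inode, security)

    server = MountServer({"/export/home": ("ufs", 0, 2, "sys")})
    file_handle = server.mount("/export/home")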

Page 26: Distributed File Systems

Mounting (part of) a remote file system in NFS.


Page 27: Distributed File Systems

Mounting nested directories from multiple servers in NFS.


Page 28: Distributed File Systems


Supports directory and file access via remote procedure calls (RPCs)

All UNIX system calls are supported other than open & close
  Open and close are intentionally not supported

For a read, the client sends a lookup message to the server
  Lookup returns a file handle but does not copy info into internal system tables
  Subsequently, each read contains the file handle, offset, and number of bytes

Each message is self-contained – flexible, but?
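A sketch of why statelessness works: every READ carries the file handle, offset, and count, so the server needs no per-client open-file table (names are illustrative):

    class StatelessServer:
        def __init__(self):
            self.files = {}                  # file handle -> file contents

        def lookup(self, path):
            # hands back a handle; records nothing about the caller
            return ("fh", path)

        def read(self, handle, offset, count):
            # self-contained request: everything needed is in the arguments
            return self.files[handle][offset:offset + count]

    srv = StatelessServer()
    srv.files[("fh", "/export/fred.txt")] = b"hello, nfs"
    fh = srv.lookup("/export/fred.txt")
    print(srv.read(fh, 0, 5))                # b'hello'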

Page 29: Distributed File Systems

a) Reading data from a file in NFS version 3. b) Reading data using a compound procedure in version 4.


Page 30: Distributed File Systems

Some general mandatory file attributes in NFS.

Attribute   Description
TYPE        The type of the file (regular, directory, symbolic link)
SIZE        The length of the file in bytes
CHANGE      Indicator for a client to see if and/or when the file has changed
FSID        Server-unique identifier of the file's file system

Page 31: Distributed File Systems

Some general recommended file attributes.

Attribute      Description
ACL            An access control list associated with the file
FILEHANDLE     The server-provided file handle of this file
FILEID         A file-system unique identifier for this file
FS_LOCATIONS   Locations in the network where this file system may be found
OWNER          The character-string name of the file's owner
TIME_ACCESS    Time when the file data were last accessed
TIME_MODIFY    Time when the file data were last modified
TIME_CREATE    Time when the file was created

Page 32: Distributed File Systems

All communication is done in the clear
  Client sends the userid and group id of the request to the NFS server

Discuss

Page 33: Distributed File Systems

Consistency semantics are dirty
  Non-dirty items are checked every 5 seconds
  Things marked dirty are flushed within 30 seconds
Performance under load is horrible. Why?
Cross-mount hell – paths to files differ on different machines
ID mismatch between domains

Page 34: Distributed File Systems

Goals
  Improved access and good performance on the Internet
  Better scalability
  Strong security
  Cross-platform interoperability and ease of extension

Page 35: Distributed File Systems

Stateful protocol (open + close)
Compound operations (fully utilize bandwidth)
Lease-based locks (locking built in)
“Delegation” to clients (less work for the server)
Close-open cache consistency (timeouts still used for attributes and directories)
Better security

Page 36: Distributed File Systems

Borrowed model from CIFS (Common Internet File System); see MS

Open/close
  Opens do lookup, create, and lock all in one (what a deal)!
  Locks / delegations (explained later) are released on file close
  There is always a notion of a “current file handle”, cf. pwd

Page 37: Distributed File Systems

Problem: normal file-system semantics require too many RPCs (boo)

Solution: group many calls into one call (yay)

Semantics
  Operations run sequentially
  Fails on the first failure
  Returns the status of each individual RPC in the compound response (up to the failure, or all on success)

Compound Kitty
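To make the semantics concrete, a small sketch of how a server might evaluate a compound: operations run in order, evaluation stops at the first failure, and a status is returned for each operation attempted (illustrative, not wire-accurate):

    def run_compound(ops):
        # ops: list of zero-argument callables standing in for NFS operations
        results = []
        for op in ops:
            try:
                results.append(("OK", op()))
            except Exception as err:
                results.append(("ERR", err))
                break                        # fail-fast: later ops never run
        return results

    # e.g. a LOOKUP + OPEN + READ issued as a single round trip
    print(run_compound([
        lambda: "lookup fred.txt",
        lambda: "open fred.txt",
        lambda: b"file contents"[:5],
    ]))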

Page 38: Distributed File Systems

Both byte-range and whole-file locks
Heartbeats keep locks alive (renew the lease)
If the server fails, it waits at least the agreed-upon lease time (a constant) before accepting any other lock requests
If a client fails, its locks are released by the server at the end of the lease period
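A minimal sketch of lease-based locking on the server side, assuming the client heartbeats by calling renew(); names are illustrative:

    import time

    LEASE = 30.0                             # agreed-upon lease time in seconds

    class LockTable:
        def __init__(self):
            self.locks = {}                  # resource -> (owner, expiry)

        def acquire(self, resource, owner):
            holder = self.locks.get(resource)
            if holder and holder[1] > time.time():
                return False                 # still held by a live lease
            self.locks[resource] = (owner, time.time() + LEASE)
            return True

        def renew(self, resource, owner):
            # the heartbeat: push the expiry forward
            if self.locks.get(resource, (None, 0))[0] == owner:
                self.locks[resource] = (owner, time.time() + LEASE)

    # a failed client simply stops renewing; its lock lapses after LEASE seconds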

Page 39: Distributed File Systems

Server tells the client that no one else has the file
Client exposes callbacks so the server can recall the delegation

Page 40: Distributed File Systems

Any open that happens after a close finishes is consistent with the data as of that last close
The last close wins the competition

Page 41: Distributed File Systems

Uses the GSS-API framework

All IDs are of the form user@domain or group@domain

Every implementation must support Kerberos v5

Every implementation must support LIPKEY

Meow

Page 42: Distributed File Systems

Replication / migration mechanism added
  Special error messages indicate migration
  A special attribute, for both replication and migration, gives the other / new location
May have read-only replicas

Page 43: Distributed File Systems


Page 44: Distributed File Systems

People don’t like to move
Requires Kerberos (the death of many good distributed file systems)
Looks just like v3 to the end user, and v3 is good enough

Page 45: Distributed File Systems


Page 46: Distributed File Systems

Need a file system for large clusters with the following attributes
  Highly scalable: > 10,000 nodes
  Provides petabytes of storage
  High throughput (100 GB/sec)

Datacenters have different needs, so we need a general-purpose back-end file system

Page 47: Distributed File Systems

Open-source object-based cluster file system

Fully POSIX compliant

Features (i.e. what I will discuss)
  Object protocols
  Intent-based locking
  Adaptive locking policies
  Aggressive caching

Page 48: Distributed File Systems


Page 49: Distributed File Systems


Page 50: Distributed File Systems


Page 51: Distributed File Systems


Page 52: Distributed File Systems


Page 53: Distributed File Systems

Policy depends on context

Mode 1: performing operations on something mostly used by a single client (e.g. /home/username)

Mode 2: performing operations on a highly contended resource (e.g. /tmp)

The DLM is capable of granting locks on an entire subtree as well as on whole files
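A sketch of the policy idea: grant a coarse subtree lock on a path one client mostly has to itself, and fall back to per-file locks on contended resources (purely illustrative; not Lustre's actual DLM interface):

    CONTENDED = ("/tmp",)                    # resources known to be fought over

    def choose_lock(path):
        if any(path.startswith(p) for p in CONTENDED):
            return ("file-lock", path)       # Mode 2: lock just this one file
        parts = path.lstrip("/").split("/")
        subtree = "/" + "/".join(parts[:2])  # e.g. /home/username
        return ("subtree-lock", subtree)     # Mode 1: one lock covers the subtree

    print(choose_lock("/home/sosa/notes.txt"))   # ('subtree-lock', '/home/sosa')
    print(choose_lock("/tmp/scratch.dat"))       # ('file-lock', '/tmp/scratch.dat')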

Page 54: Distributed File Systems

POSIX compliant
Keeps a local journal of updates for locked files
  One entry per file operation
  Hard-linked files get special treatment with subtree locks
Lock revoked -> updates are flushed and replayed
Uses subtree change times to validate cache entries
Additionally features collaborative caching -> referrals to other dedicated cache services

Page 55: Distributed File Systems

Security
  Supports GSS-API
  Supports (but does not require) Kerberos
  Supports PKI mechanisms
  Did not want to be tied down to one mechanism

Page 56: Distributed File Systems


Page 57: Distributed File Systems


Named after Andrew Carnegie and Andrew Mellon
Transarc Corp. and then IBM took over development of AFS
In 2000 IBM made OpenAFS available as open source

Goals
  Large scale (thousands of servers and clients)
  User mobility
  Scalability
  Heterogeneity
  Security
  Location transparency
  Availability

Page 58: Distributed File Systems

Features:
  Uniform name space
  Location-independent file sharing
  Client-side caching with cache consistency
  Secure authentication via Kerberos
  High availability through automatic switchover of replicas
  Scalability to span 5000 workstations

Page 59: Distributed File Systems


Based on the upload/download model
  Clients download and cache files
  Server keeps track of clients that cache the file
  Clients upload files at the end of a session

Whole-file caching is key
  Later amended to block operations (v3)
  Simple and effective

Kerberos for security

AFS servers are stateful
  Keep track of clients that have cached files
  Recall files that have been modified

Page 60: Distributed File Systems


Clients have a partitioned name space: a local name space and a shared name space
A cluster of dedicated servers (Vice) presents the shared name space
Clients run the Virtue protocol to communicate with Vice

Page 61: Distributed File Systems


Page 62: Distributed File Systems


AFS’s storage is arranged in volumes
  Usually associated with the files of a particular client

An AFS directory entry maps Vice files/dirs to a 96-bit fid
  Volume number
  Vnode number: index into the i-node array of a volume
  Uniquifier: allows reuse of vnode numbers

Fids are location transparent
  File movements do not invalidate fids

Location information is kept in a volume-location database
  Volumes are migrated to balance available disk space and utilization
  Volume movement is atomic; the operation is aborted on server crash
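A sketch of the 96-bit fid as three 32-bit fields, which is how the three components add up to 96 bits (field layout per the slide; the packing itself is illustrative):

    import struct

    def make_fid(volume, vnode, uniquifier):
        # three 32-bit big-endian fields -> 12 bytes = 96 bits
        return struct.pack(">III", volume, vnode, uniquifier)

    def parse_fid(fid):
        return struct.unpack(">III", fid)

    fid = make_fid(volume=7, vnode=42, uniquifier=3)
    assert len(fid) * 8 == 96
    # if vnode 42 is freed and reused, bumping the uniquifier (3 -> 4)
    # keeps stale fids from matching the new file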

Page 63: Distributed File Systems

User process –> open file F

The kernel resolves that it’s a Vice file –> passes it to Venus

For each directory D on the path:
  D is in the cache & has a callback –> use it without any network communication
  D is in the cache but has no callback –> contact the appropriate server for a new copy; establish a callback
  D is not in the cache –> fetch it from the server; establish a callback

File F is identified –> create a current cache copy

Venus returns to the kernel, which opens F and returns its handle to the process
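The same flow as a compact sketch (illustrative names; the real Venus is a userspace cache manager):

    def venus_open(path, cache, server):
        entry = cache.get(path)
        if entry is None or not entry["has_callback"]:
            data = server.fetch(path)             # fetch (or refresh) from Vice
            server.establish_callback(path)       # server will notify us of changes
            entry = {"data": data, "has_callback": True}
            cache[path] = entry
        # a cached copy with a valid callback needs no network traffic at all
        return entry["data"]                      # kernel opens this local copy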


Page 64: Distributed File Systems


AFS caches entire files from servers
  Client interacts with servers only during open and close

OS on the client intercepts calls and passes them to Venus
  Venus is a client process that caches files from servers
  Venus contacts Vice only on open and close
  Reads and writes bypass Venus

Works due to callbacks:
  Server updates state to record caching
  Server notifies the client before allowing another client to modify
  Clients lose their callback when someone writes the file

Venus caches dirs and symbolic links for path translation

Page 65: Distributed File Systems

The use of local copies when opening a session in Coda.


Page 66: Distributed File Systems

A descendant of AFS v2 (AFS v3 went another way, with large-chunk caching)

Goals
  More resilient to server and network failures
  Constant data availability
  Portable computing

Page 67: Distributed File Systems

Keeps whole-file caching, callbacks, and end-to-end encryption

Adds full server replication
  General update protocol, known as the Coda Optimistic Protocol
  COP1 (first phase) performs the actual semantic operation at the servers (using multicast if available)
  COP2 sends a data structure called an update set, which summarizes the client’s knowledge; these messages are piggybacked on later COP1’s

Page 68: Distributed File Systems

Disconnected operation (KEY)
  Hoarding
    Periodically reevaluates which objects merit retention in the cache (hoard walking)
    Relies on both implicit and a lot of explicit info (profiles, etc.)
  Emulating, i.e. maintaining a replay log
  Reintegration – replaying the replay log

Conflict resolution
  Provides a repair tool
  Gives the user a log to manually fix the issue

Page 69: Distributed File Systems

The state-transition diagram of a Coda client with respect to a volume.


Page 70: Distributed File Systems

AFS deployments in academia and government (100’s)

The security model required Kerberos
  Many organizations were not willing to make the costly switch

AFS (but not Coda) was not integrated into the Unix FS: separate “ls”, different – though similar – API

Session semantics are not appropriate for many applications

Page 71: Distributed File Systems

Goals
  Efficient use of large main memories
  Support for multiprocessor workstations
  Efficient network communication
  Diskless operation
  Exact emulation of UNIX FS semantics
  Location-transparent UNIX FS

Page 72: Distributed File Systems

Naming
  Local prefix table maps path-name prefixes to servers (see the sketch below)
  Caches locations
  Otherwise, location is embedded in remote stubs in the tree hierarchy

Caching
  Needs sequential consistency
  If one client wants to write, caching is disabled for all open clients. Assumes this isn’t very bad since it doesn’t happen often

No security between kernels; everything runs over a trusted network
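A small sketch of the prefix-table lookup, choosing the server whose prefix matches the longest part of the path (the table contents are made up):

    PREFIX_TABLE = {
        "/":           "root-server",
        "/users":      "server-a",
        "/users/sosa": "server-b",
    }

    def resolve(path):
        # longest matching prefix wins
        best = max((p for p in PREFIX_TABLE
                    if path == p or path.startswith(p.rstrip("/") + "/")),
                   key=len)
        return PREFIX_TABLE[best], path[len(best):].lstrip("/")

    print(resolve("/users/sosa/notes.txt"))   # ('server-b', 'notes.txt')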

Page 73: Distributed File Systems

The best way to implement something depends very heavily on the goals you want to achieve

Always start with goals before deciding on consistency semantics

Page 74: Distributed File Systems
