
Distributed File System: NFS, AFS, GPFS, and GFS

Presenters: Tiantong Yu and Phyo Thiha

Instructor: Sandhya Dwarkadas

Tuesday, November 8, 2011


A Simple Model of a Distributed File System

[Figure: a client (with a client cache) and a server (with a server cache) connected by a network. The client sends Read File (RPC) and receives Data; it sends Write File (RPC) and receives Ack.]


Issues To Consider For A Distributed File System

• Name Resolution

• Security

• Consistency

• Are client copy and server copy of data consistent?

• Are metadata and actual file data consistent?

• Synchronization

• Simultaneous reads and writes

• Reliability

• What happens when crashes or disk failures occur

• Scalability (Related to performance)

• throughput, network load, delay


Network File System (NFS)

• Three layers (see picture)

• RPC for file operations on server

• read, write

• search/maintain directories

• access metadata

• Write-through caching - safer but slower

• Stateless - Quick recovery from server failure

• Polling for consistency, no synchronization (better in newer versions)

• Poor scalability

• But simple and portable


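NFS

[Figure: NFS layering. On the client, the system calls interface sits above the VFS interface, which dispatches to other file systems, the local main file system (with its disk), or the NFS client. The NFS client reaches the NFS server over the network via RPC/XDR. On the server, the NFS server goes through the VFS interface to the local main file system and its disk.]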


Andrew File System (AFS)

• Linux-like interface

• Unified namespace (fid)

• Centrally manages clients’ states

• Use callbacks to maintain consistency

• Good Scalability

• Enhanced Security (access control list, encrypted password)


Special Case: GPFS

• Designed for large computing clusters (supports up to 4096 1 TB disks), truly parallel

• Imitates the behavior of a general POSIX file system

• Integrates data striping into the file system

• Divides data into large blocks (256 KB) and distributes them across disks round-robin

• Uses logging to recover from faults

• Uses distributed locking to ensure synchronization (vs. central management)

• A token manager is still needed (possible bottleneck)
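The round-robin striping mentioned above can be sketched roughly as follows. This is an illustration only, not GPFS code: the function name `locate_block`, the disk count, and the choice to start every file at disk 0 are assumptions; the 256 KB block size is taken from the slide.

```python
BLOCK_SIZE = 256 * 1024  # 256 KB blocks, as on the slide


def locate_block(offset, num_disks, block_size=BLOCK_SIZE):
    """Map a byte offset in a file to (disk, stripe, offset within block).

    Assumes plain round-robin placement starting at disk 0 (hypothetical).
    """
    block_no = offset // block_size   # logical block number within the file
    disk = block_no % num_disks       # round-robin across the disks
    stripe = block_no // num_disks    # full rounds completed before this block
    return disk, stripe, offset % block_size
```

Because consecutive blocks land on different disks, a large sequential read can be served by many disks at once, which is where the parallelism comes from.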


GPFS Structure

[Figure: file system nodes connected through a switching fabric to the disks.]


Name Resolution

• Hostname:local-path-name

• Mounting

• Globally unique file name

• Need some name translation service

• AFS fids

• Common in peer-to-peer systems


Consistency

• Goal: every client sees the same data

• Sources of inconsistency

• client crashes before a write completes

• server crashes

• loses data in memory (metadata and actual file inconsistency)

• loses client states (what might happen?)

• caching

• disk failures


Consistency – Cont.

• Solutions

• write-through caching (e.g. NFS)

• modified data must be committed to server first

• pros and cons?

• stateless protocols (e.g. NFS)

• readAt(inumber, position) instead of read(filename)

• polling (e.g. NFS), callbacks (e.g. AFS)
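A minimal sketch of the stateless idea above: every request carries the inumber and position, so the server needs no open-file table or per-client cursor. The class `StatelessServer`, the `count` parameter, and the in-memory dictionary standing in for the disk are assumptions for illustration, not the NFS protocol itself.

```python
class StatelessServer:
    """Keeps file contents only: no open-file table, no per-client cursor."""

    def __init__(self, files):
        self.files = files  # inumber -> bytes (stands in for the disk)

    def read_at(self, inumber, position, count):
        # Everything needed to serve the request is in the request itself.
        return self.files[inumber][position:position + count]


disk = {7: b"hello distributed world"}
server = StatelessServer(disk)
first = server.read_at(7, 0, 5)

server = StatelessServer(disk)     # simulate a server crash and restart
second = server.read_at(7, 6, 11)  # the client simply continues where it was
```

Since the restarted server holds no client state that could have been lost, the client only has to wait and retry, which is why recovery is quick.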


Synchronization

• Not much different from general synchronization model

• There is a tradeoff between synchronization and performance

• NFS does nothing about synchronization

• race conditions possible

• AFS uses write-on-close

• no partial writes

• clients do not see newer versions until they reopen the file

• GPFS uses distributed locking protocol
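The write-on-close behaviour listed above for AFS can be illustrated with a small sketch. The classes `Server` and `ClientFile` are hypothetical and leave out caching on disk, callbacks, and failures; they only show when a write becomes visible.

```python
class Server:
    def __init__(self):
        self.files = {}  # file name -> bytes


class ClientFile:
    """Whole-file copy fetched on open; changes reach the server only on close."""

    def __init__(self, server, name):
        self.server, self.name = server, name
        self.data = bytearray(server.files.get(name, b""))  # fetch on open

    def write(self, payload):
        self.data.extend(payload)  # local copy only; server is unchanged

    def close(self):
        self.server.files[self.name] = bytes(self.data)  # write-on-close


server = Server()
writer = ClientFile(server, "notes")
writer.write(b"draft")
reader = ClientFile(server, "notes")       # opened before the writer closes
writer.close()
late_reader = ClientFile(server, "notes")  # opened after the close
```

Here `reader` still holds the old (empty) contents, while `late_reader` sees the complete write: no partial writes are ever visible, and newer versions appear only on reopen.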


Reliability

• Server data integrity (e.g. replicas, RAID system)

• Recover from client crash

• not a big deal, just need to ensure data consistency (see Consistency)

• Recover from server crash

• wait-and-continue (NFS) (pros and cons?)

• recover from information kept by clients (AFS)

• recover from a log of operations (GPFS)


Security

• Standard file system security (e.g. access control, identity authentication)

• Network considerations (encrypted messages)


Scalability

• Depends on the implementation decisions made for the other issues

• At the same time, an important factor that affects the implementation decisions on the other issues

• Special efforts to improve scalability

• GPFS’s striping (e.g. enables control over load balancing)

• NFS does not scale well (why?)

• AFS scales better (why?)

• GFS and GPFS have excellent scalability (teaser)


Big Picture of GFS: Workload

• More than one billion searches a day [1]

• Gmail, Google Maps, YouTube, …

• Network transactions

Goals

• Scalable, reliable, available

• Cheap?

[1] Retrieved on November 06, 2011 from: http://www.forbes.com/sites/quentinhardy/2011/06/11/google-scale-changes-everything/


Big Picture of GFS: Assumptions

• Failures are the norm

• Files are huge (multi-GB)

• Reads are sequential

• Writes are appends

• Hundreds of producers write concurrently

• Bandwidth is more important than latency


Big Picture of GFS: Architecture

[Figure: a chunk server is a commodity Linux machine that stores chunks of 64 MB each.]


Big Picture of GFS: Master Server

• Commodity Linux machine

• Holds the metadata:

• namespace

• access control

• mappings of files to chunks

• locations of chunks

• chunk version numbers


Big Picture of GFS: Master – Chunk Servers Relationship

[Figure: the master server and a chunk server, with the following exchanged between them.]

• Chunk location info

• Chunk (re)placement

• Dead/alive?

• Disk failure/corruption


Big Picture of GFS: Master – Chunk Servers Relationship

[Figure: Chunk Server 1, Chunk Server 2, ... each exchange heartbeat messages with the master, which holds the metadata.]
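One way the master could use heartbeats to answer the "dead/alive?" question and keep chunk location info is sketched below. The slides do not give the mechanism in this detail, so the class name `Master`, the timeout, and the explicit `now` timestamps are all assumptions for illustration.

```python
class Master:
    def __init__(self, timeout):
        self.timeout = timeout  # hypothetical: seconds of silence tolerated
        self.last_seen = {}     # chunk server id -> time of last heartbeat
        self.chunks = {}        # chunk server id -> chunk handles it reported

    def heartbeat(self, server_id, now, chunk_handles):
        # A heartbeat proves liveness and refreshes chunk location info.
        self.last_seen[server_id] = now
        self.chunks[server_id] = set(chunk_handles)

    def alive(self, now):
        # Servers that have been silent for longer than the timeout count as dead.
        return sorted(s for s, t in self.last_seen.items()
                      if now - t <= self.timeout)
```

A server that drops out of `alive()` would be a candidate for having its chunks re-replicated elsewhere.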


Big Picture of GFS: Clients

[Figure: GFS clients, e.g. a Google programmer and Gmail.]


Big Picture of GFS

[Figure, built up over four slides: clients, the master, and a chunk server, with the following messages.]

• Request for metadata (client to master)

• Chunk server location and file-to-chunk mappings (master to client)

• Read/write request (client to chunk server)


Big Picture of GFS: Scalability - Very High

• Commodity hardware

• Minimal master-client communication

• Simple

• ~One million servers [2]

[2] Retrieved on November 06, 2011 from: http://www.datacenterknowledge.com/archives/2011/08/01/report-google-uses-about-900000-servers/


Big Picture of GFS: Availability & Reliability - Very High

• Replicate data (default 3 copies)

• Data on different racks

• Shadow masters

• Fast recovery


Details of GFS: Read

[Figure, repeated for each step: the application, the GFS client, the master, and the chunk servers holding Replica 1, Replica 2, and Replica 3.]

• Step 1, Issues Read Request: the application sends (file name, byte range) to the GFS client

• Step 2, Translate and Forward: the GFS client sends (file name, chunk index) to the master

• Step 3, Master Responds: the master returns (chunk handle, replica locations) to the GFS client

• Step 4, Picks Replica, Sends Request: the GFS client sends (chunk handle, byte range) to one replica

• Step 5, Replica Sends Data Back: the chosen replica returns the data to the GFS client

• Step 6, Client Forwards Data: the GFS client forwards the data to the application
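The read steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Google's implementation: the classes `Master` and `ChunkServer`, the function `client_read`, the lookup table layout, and the "pick the first replica" policy are hypothetical, and the read is assumed to fall within a single chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as on the architecture slide


class Master:
    def __init__(self):
        # (file name, chunk index) -> (chunk handle, replica locations)
        self.table = {}

    def lookup(self, file_name, chunk_index):
        return self.table[(file_name, chunk_index)]


class ChunkServer:
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


def client_read(master, servers, file_name, offset, length,
                chunk_size=CHUNK_SIZE):
    """Read a byte range that lies within one chunk."""
    chunk_index = offset // chunk_size                         # step 2
    handle, locations = master.lookup(file_name, chunk_index)  # step 3
    replica = servers[locations[0]]                            # step 4
    return replica.read(handle, offset % chunk_size, length)   # steps 5 and 6
```

Note that the master only answers the metadata lookup; the file data itself flows directly between the client and a chunk server.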


Details of GFS: Write

[Figure, repeated for each step: the application, the GFS client, the master, and the primary, secondary, and tertiary replicas, each with a buffer.]

• Step 1, Application Sends Data: the application sends (file name, data) to the GFS client

• Step 2, Translate and Forward: the GFS client sends (file name, chunk index) to the master

• Step 3, Master Responds: the master returns (chunk handle, primary + secondary replica locations) to the GFS client

• Step 4, Client Pushes Data: the data is pushed to the primary, secondary, and tertiary replicas, each of which holds it in a buffer

• Step 5, Client Tells Primary to Write: the GFS client sends Write( ) to the primary replica

• Step 6, Primary Tells Others to Write: the primary replica sends Write( ) to the secondary and tertiary replicas

• Step 7, Primary Notifies Client: the primary replica returns FailOrSuccess( ) to the GFS client
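Steps 4 to 7 of the write path separate pushing data into buffers from ordering the write. A rough sketch, assuming hypothetical classes `Replica` and `Primary`, a caller-chosen `data_id` to name buffered data, and no leases, serial numbers, or retries:

```python
class Replica:
    def __init__(self):
        self.buffer = {}          # data id -> bytes pushed but not yet applied
        self.chunk = bytearray()  # contents of the chunk on this replica

    def push(self, data_id, data):  # step 4: data lands in the buffer only
        self.buffer[data_id] = data

    def apply(self, data_id):
        data = self.buffer.pop(data_id, None)
        if data is None:
            return False            # nothing was pushed here: report failure
        self.chunk.extend(data)
        return True


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries

    def write(self, data_id):       # step 5: the client asks the primary to write
        ok = self.apply(data_id)
        for replica in self.secondaries:         # step 6: primary tells the others
            ok = replica.apply(data_id) and ok
        return ok                   # step 7: fail or success back to the client


def client_write(primary, data_id, data):
    for replica in [primary, *primary.secondaries]:
        replica.push(data_id, data)
    return primary.write(data_id)
```

If any replica fails to apply the write, the client is told the write failed even though some replicas may already have applied it, which is one reason the consistency model is relaxed.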


Details of GFS: Append

• Append is more common than write

• Difference between append and write, at Step 6:

• Primary checks the space remaining in the chunk

• Enough space: same as a regular write

• Not enough space: pad the rest of the chunk and tell the client to retry on the next chunk

• Atomic, GFS chooses the offset
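The space check that distinguishes append from write can be sketched as below. The function name `record_append`, the use of zero bytes as padding, and returning `None` to signal "retry on the next chunk" are assumptions for illustration.

```python
def record_append(chunk, record, chunk_size):
    """Append record to chunk (a bytearray); return the chosen offset,
    or None if the chunk was padded and the client must retry elsewhere."""
    if len(chunk) + len(record) <= chunk_size:
        offset = len(chunk)   # the file system, not the client, picks the offset
        chunk.extend(record)
        return offset
    chunk.extend(b"\0" * (chunk_size - len(chunk)))  # pad the rest of the chunk
    return None               # client retries on the next chunk
```

Because a record is never split across two chunks, each append either lands whole at one offset or not at all in that chunk.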


Details of GFS: Atomic mutations (Write or Append)

• Master’s operation log keeps a global total order

• Concurrency: locking

• Namespace = mapping of full pathnames to metadata

• E.g., while snapshotting ‘/home/user’ to ‘/save/user’

• Avoid creating ‘/home/user/foo’
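The snapshot example above follows the locking scheme described in the GFS paper: an operation takes read locks on every ancestor directory name and a write lock on the full pathname it mutates. The sketch below only computes lock sets and checks for conflicts; the helper names `ancestors`, `locks_for`, and `conflicts` are hypothetical.

```python
def ancestors(path):
    """'/home/user/foo' -> ['/home', '/home/user']"""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def locks_for(write_paths):
    """Return (read locks, write locks) for an operation mutating write_paths."""
    writes = set(write_paths)
    reads = {a for p in write_paths for a in ancestors(p)} - writes
    return reads, writes


def conflicts(op_a, op_b):
    """Two operations conflict if one write-locks a name the other locks."""
    reads_a, writes_a = op_a
    reads_b, writes_b = op_b
    return bool(writes_a & (reads_b | writes_b) or writes_b & reads_a)


snapshot = locks_for(["/home/user", "/save/user"])
create = locks_for(["/home/user/foo"])
```

The snapshot write-locks ‘/home/user’ while the file creation needs a read lock on it, so the two are serialized. Two creations in the same directory only share read locks on the directory name and can proceed concurrently.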


Take-away Points of GFS

• Cheap, scalable, highly reliable and available

• Single master bottleneck

• Google has developed a distributed master system [3]

• Scales to hundreds of masters

• Used by Google!

• No full disclosure of the technology

[3] Retrieved on November 06, 2011 from: http://storagemojo.com/2009/08/17/google-


NFS vs. AFS vs. GPFS vs. GFS

        State         Name Resolution   Caching   Consistency
NFS     UDP, TCP/IP   Local directory   Yes       Write-through caching
AFS     UDP, TCP      Global name       Yes       Strong consistency
GPFS    SAN           Local directory   Yes       Strong consistency
GFS     TCP/IP        Full path name    None      Relaxed consistency model


NFS vs. AFS vs. GPFS vs. GFS – Cont.

        Concurrency                                     Reliability   Scalability
NFS     None?                                           Normal        Weak
AFS     Write-on-close                                  Normal        Normal
GPFS    Distributed lock                                Logging       Very high
GFS     Single master, lease mechanism, serialization   Replication   Very high


THANK YOU and

QUESTIONS?
