February 2012 Filesystems, RPC and HDFS Alexander Lorenz

Filesystems, RPC and HDFS


DESCRIPTION

Comparison between traditional filesystems and HDFS writes


Page 1: Filesystems, RPC and HDFS

February 2012

Filesystems, RPC and HDFS

Alexander Lorenz

Page 2: Filesystems, RPC and HDFS

Agenda


1 Linux Kernel I/O Scheduler

2 I/O Stack in Linux

3 VFS Implementation

4 NFS RFC Model

5 RPC

6 HDFS

7 Limitations / Problems (Discussion)

©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.

Page 3: Filesystems, RPC and HDFS

Linux Kernel I/O Scheduler

• Disk seeks are among the slowest operations in a computer

• The I/O scheduler orders requests so that the disk head moves in a single direction, minimizing seeks

• Prevents starvation of individual requests

• Improves overall disk throughput by
  • reordering requests to reduce disk seek time
  • merging requests to reduce the number of requests
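The single-direction sweep described on this slide can be illustrated with a small sketch. It is a toy model, not the kernel's implementation: the function names `elevator_order` and `total_seek_distance`, the sector numbers, and the C-LOOK style wrap-around are all assumptions chosen for illustration, and starvation prevention is left out.

```python
from typing import List


def elevator_order(pending: List[int], head: int) -> List[int]:
    """Order pending sector numbers for a single upward sweep (C-LOOK style).

    Requests at or beyond the head position are served first in ascending
    order; the remaining ones follow in ascending order after the head
    wraps around to the lowest pending sector.
    """
    ahead = sorted(s for s in pending if s >= head)
    behind = sorted(s for s in pending if s < head)
    return ahead + behind


def total_seek_distance(order: List[int], head: int) -> int:
    """Sum of absolute head movements when serving requests in this order."""
    distance = 0
    for sector in order:
        distance += abs(sector - head)
        head = sector
    return distance


if __name__ == "__main__":
    pending = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical request queue
    head = 53                                      # hypothetical head position
    # Compare serving in arrival order with serving in one sweep.
    print("arrival order:", total_seek_distance(pending, head))
    print("one sweep:    ", total_seek_distance(elevator_order(pending, head), head))
```

Comparing the two totals shows why reordering helps: the sweep never reverses direction until it has served everything ahead of the head.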


Page 4: Filesystems, RPC and HDFS

Kernel I/O Scheduler Framework

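[Figure: the block layer enqueues requests into the I/O scheduler's internal queues, where they are merged, sorted and prioritized; requests are then dequeued to an external queue that feeds the device driver.]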

• The Linux elevator is an abstraction layer to which different I/O schedulers can attach

• Merging mechanisms are provided by request queues
  • Front or back merge of a request and a bio
  • Merge of two requests

• Sorting policy and merge decisions are made in elevators
  • Pick a request to be merged with a bio
  • Add a new request to the request queue
  • Select the next request to be processed by block drivers
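A front or back merge of a bio into a queued request can be sketched as follows. This is a simplified model under stated assumptions: `Request` and `try_merge` are hypothetical names, a bio is reduced to a start sector and a length, and merging two existing requests with each other is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    start: int   # first sector covered by the request
    length: int  # number of sectors

    @property
    def end(self) -> int:
        """First sector after the request."""
        return self.start + self.length


def try_merge(queue: List[Request], bio_start: int, bio_len: int) -> Optional[str]:
    """Merge a bio into an adjacent queued request, or enqueue a new request.

    Returns "back" if the bio was appended to a request, "front" if it was
    prepended, and None if no adjacent request existed.
    """
    bio_end = bio_start + bio_len
    for req in queue:
        if req.end == bio_start:    # bio starts where the request ends
            req.length += bio_len
            return "back"
        if bio_end == req.start:    # bio ends where the request starts
            req.start = bio_start
            req.length += bio_len
            return "front"
    queue.append(Request(bio_start, bio_len))
    return None


if __name__ == "__main__":
    queue = [Request(100, 8)]
    print(try_merge(queue, 108, 8), queue)  # adjacent at the back
    print(try_merge(queue, 92, 8), queue)   # adjacent at the front
    print(try_merge(queue, 500, 8), queue)  # not adjacent, new request
```

Each successful merge leaves the queue length unchanged, which is how merging reduces the number of requests the driver has to process.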


Page 5: Filesystems, RPC and HDFS

I/O Stack in Linux

[Figure: layered I/O stack. Userland: Application (bulk writes). Kernelspace: Sys Calls; Filesystem (Access, Locking, Prefetch, Flush, Disk Layout, MetaData); Cache; HDD Driver.]

Page 6: Filesystems, RPC and HDFS

VFS Implementation

[Figure: Userland: Application. Kernelspace: Sys Calls, then VFS, then the concrete filesystems ext3, ext2, NFS and CIFS.]
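The role of the VFS in this picture, one call interface in front of several filesystem implementations, can be sketched with a small dispatcher. All names here (`FileSystem`, `InMemoryFs`, `Vfs`, the mount points and paths) are hypothetical, and the in-memory backends only stand in for real filesystems such as ext3 or NFS.

```python
from abc import ABC, abstractmethod
from typing import Dict


class FileSystem(ABC):
    """Interface every concrete filesystem has to provide to the VFS."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class InMemoryFs(FileSystem):
    """Stand-in backend; a real one would talk to a disk or a server."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def read(self, path: str) -> bytes:
        return self._files[path]


class Vfs:
    """Routes a path to the filesystem mounted at the longest matching prefix."""

    def __init__(self) -> None:
        self._mounts: Dict[str, FileSystem] = {}

    def mount(self, prefix: str, fs: FileSystem) -> None:
        # The root mount "/" is stored as the empty prefix.
        self._mounts[prefix.rstrip("/")] = fs

    def read(self, path: str) -> bytes:
        matches = [p for p in self._mounts if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        # The backend sees the path relative to its own mount point.
        return self._mounts[prefix].read(path[len(prefix):])


if __name__ == "__main__":
    vfs = Vfs()
    vfs.mount("/", InMemoryFs({"/etc/hosts": b"local disk"}))
    vfs.mount("/mnt/nfs", InMemoryFs({"/a.txt": b"remote share"}))
    print(vfs.read("/etc/hosts"))
    print(vfs.read("/mnt/nfs/a.txt"))
```

The application issues the same `read` call in both cases; only the mount table decides which implementation serves it.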

Page 7: Filesystems, RPC and HDFS

NFS RFC Model

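[Figure: NFS client and server, each with a local HDD. Client side: Application with NFS access, Filesystem, NFS Client, RPC, TCP/IP or UDP. Server side: TCP/IP or UDP, RPC, NFS Server, Filesystem. Both stacks run in kernelspace. Additional label: File Handler.]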

Page 8: Filesystems, RPC and HDFS

NFS - OSI Model


Page 9: Filesystems, RPC and HDFS

RPC

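[Figure: RPC timeline between client and server. Client: process starts, sends the RPC message, client waits, receives the RPC return, process continues. Server: server waits, server starts when the message arrives, terminates with the RPC return, server waits again. Other labels on the slide: PCPE, PR.]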
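The blocking behaviour on this slide (the client waits while the server executes, the server waits between calls) can be sketched in-process with a thread and two queues. This is only a model of the control flow: `RpcServer` and `rpc_call` are hypothetical names, and real RPC additionally involves marshalling and a network transport, both omitted here.

```python
import queue
import threading
from typing import Any, Callable, Dict


class RpcServer(threading.Thread):
    """Serves one call at a time from an inbox queue."""

    def __init__(self, procedures: Dict[str, Callable[..., Any]]) -> None:
        super().__init__(daemon=True)
        self._procedures = procedures
        self.inbox: queue.Queue = queue.Queue()

    def run(self) -> None:
        while True:
            message = self.inbox.get()      # server waits for an RPC message
            if message is None:             # shutdown marker
                break
            name, args, reply = message
            result = self._procedures[name](*args)  # server executes the procedure
            reply.put(result)                       # RPC return


def rpc_call(server: RpcServer, name: str, *args: Any, timeout: float = 5.0) -> Any:
    """Send an RPC message and block until the return value arrives."""
    reply: queue.Queue = queue.Queue(maxsize=1)
    server.inbox.put((name, args, reply))   # RPC message
    return reply.get(timeout=timeout)       # client waits


if __name__ == "__main__":
    server = RpcServer({"add": lambda a, b: a + b})
    server.start()
    print(rpc_call(server, "add", 2, 3))    # process continues after the return
    server.inbox.put(None)
    server.join(timeout=1.0)
```

The timeout is a design choice of the sketch: without it, a failing procedure on the server side would leave the client waiting forever.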

Page 10: Filesystems, RPC and HDFS

HDFS Layer

[Figure: a local client goes through VFS and the POSIX API, while an HDFS application uses the HDFS API. Lower-layer labels: Network, HDFS, NFS, Driver.]

Page 11: Filesystems, RPC and HDFS

HDFS Model

[Figure: HDFS cluster with a Namenode and three DataNodes (DN). The client calls add Block (src) on the Namenode, then writes the block through a pipeline of DataNodes; each DataNode reports "Block received".]
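The write flow on this slide (ask the Namenode for a block, push the data through a DataNode pipeline, collect "Block received" reports) can be sketched as a toy model. The classes and methods below are hypothetical and are not the Hadoop API; the replication factor of 3 mirrors the three DataNodes in the figure, and having each DataNode report after forwarding is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


class DataNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}

    def write(self, namenode: "NameNode", block_id: int, data: bytes,
              downstream: List["DataNode"]) -> None:
        """Store the block, forward it down the pipeline, then report it."""
        self.blocks[block_id] = data
        if downstream:
            downstream[0].write(namenode, block_id, data, downstream[1:])
        namenode.block_received(block_id, self.name)


class NameNode:
    def __init__(self, datanodes: List[DataNode], replication: int = 3) -> None:
        self._datanodes = datanodes
        self._replication = replication
        self._next_block_id = 0
        self.files: Dict[str, List[int]] = {}           # src path -> block ids
        self.block_locations: Dict[int, List[str]] = {}  # block id -> DN names

    def add_block(self, src: str) -> Tuple[int, List[DataNode]]:
        """Allocate a block for src and choose the DataNodes for its pipeline."""
        block_id = self._next_block_id
        self._next_block_id += 1
        self.files.setdefault(src, []).append(block_id)
        self.block_locations[block_id] = []
        return block_id, self._datanodes[: self._replication]

    def block_received(self, block_id: int, datanode_name: str) -> None:
        self.block_locations[block_id].append(datanode_name)


def write_block(namenode: NameNode, src: str, data: bytes) -> int:
    """Client side: request a block, then write to the first DataNode only."""
    block_id, pipeline = namenode.add_block(src)
    pipeline[0].write(namenode, block_id, data, pipeline[1:])
    return block_id


if __name__ == "__main__":
    dns = [DataNode(f"dn{i}") for i in range(1, 5)]
    nn = NameNode(dns)
    block = write_block(nn, "/user/demo/file", b"hello")
    print(block, sorted(nn.block_locations[block]))
```

Note that the client sends the data once; replication happens DataNode to DataNode along the pipeline rather than through the Namenode.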

Page 12: Filesystems, RPC and HDFS

HDFS Write Model

[Figure: HDFS write path between Client, NN and two DNs. Labeled links: RPC (ClientProtocol); RPC (DFSClient.DFSInputStream); RPC (DataNodeProtocol), shown twice; RPC rcv only; FSData stream (socket); RPC proxy, shown twice. DN-internal labels: DFS, RPCProxy IPC, xceiver, VFS, HDD.]

Page 13: Filesystems, RPC and HDFS

Links / Resources


The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, Yahoo!

http://dtrace.org/blogs/brendan/2011/05/11/

NFS and RPC Chavalit Srisathapornphat, CISC856

Linux I/O Schedulers Hao-Ran Liu