Hota hadoop

File Systems for Cloud Computing

Chittaranjan Hota, PhD
Faculty Incharge, Information Processing Division

Birla Institute of Technology & Science-Pilani, Hyderabad Campus
Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India

hota@hyderabad.bits-pilani.ac.in

16th March 2013
Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar

Growth of the Internet

Source: Cisco VNI Global Forecast, 2011-2016

Source: Internet world stats

Golden era in Computing

Cloud Futures 2011, Redmond

Cloud computing: Is it a hype?

• from $41 billion in 2011 to $241 billion in 2020

Scaling up…

SETI

What is Cloud Computing?

Files

• Permanent storage
• Information sharing
• Files have data and attributes

What a Distributed File System Provides

• Provides access to data stored at servers using file system interfaces
• What are the file system interfaces?
  o Open a file, check status on a file, close a file
  o Read data from a file
  o Write data to a file
  o Lock a file or part of a file
  o List files in a directory, delete a directory
  o Delete a file, rename a file, add a symbolic link to a file, etc.
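The interface operations listed above map closely onto ordinary local file system calls, and a DFS aims to offer the same calls over data held on remote servers. A minimal sketch, assuming only Python's standard library and a temporary scratch directory; the function name and file names are made up for illustration. Locking and symbolic links are left out because their availability differs across platforms.

```python
import os
import tempfile

def demo_file_interface():
    """Exercise the basic file system interface on a scratch directory."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "notes.txt")   # hypothetical file name

        # Open a file, write data to it, close it
        with open(path, "w") as f:
            f.write("hello dfs")

        # Check status on a file
        results["size"] = os.stat(path).st_size

        # Read data from a file
        with open(path) as f:
            results["data"] = f.read()

        # Rename a file
        renamed = os.path.join(root, "renamed.txt")
        os.rename(path, renamed)

        # List files in a directory
        results["listing"] = sorted(os.listdir(root))

        # Delete a file
        os.remove(renamed)
        results["after_delete"] = os.listdir(root)
    return results
```

With NFS the same calls work unchanged once the remote directory is mounted, which is the point of exposing a file system interface rather than a separate API.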

DFS Design Issues

• Mounting
• Caching
• Hints
• Bulk data transfer
• Replica management
• Writing policies

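NFS architecture

[Figure: NFS architecture. Client computer: application programs make system calls into the UNIX kernel, where the virtual file system sends operations on local files to the UNIX file system (or another local file system, labelled PC DOS) and operations on remote files to the NFS client. Server computer: the NFS server, inside the UNIX kernel, passes requests through the virtual file system to the UNIX file system. The NFS client and NFS server communicate over the network using RPC for remote operations.]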

Google File System

Metadata: namespace, access control, mapping of files to chunks, and current location of chunks

[Figure: Google File System architecture, with interactions numbered 1 to 4.]

HDFS Design

• Files stored as blocks
  o Default 64MB
• Reliability through replication
  o Replicated across 3+ DataNodes
• Single NameNode coordinates access, metadata
  o Centralized management
• No data caching
  o Little benefit due to large data sets, streaming reads
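To make the block and replication figures above concrete, here is a small arithmetic sketch. It assumes the 64MB block size and a replication factor of 3 from the slide; the function names are hypothetical and are not part of any Hadoop API.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # default block size from the slide (64MB)
REPLICATION = 3                 # replication factor from the slide

def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold file_size bytes (integer ceiling)."""
    return -(-file_size // block_size)

def replica_count(file_size, replication=REPLICATION):
    """Total block replicas the cluster has to store for one file."""
    return block_count(file_size) * replication

def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed across the cluster, assuming a partial last block
    only occupies the bytes it actually holds."""
    return file_size * replication
```

For example, a 1 GB file (1024**3 bytes) splits into 16 blocks, which become 48 block replicas and about 3 GB of raw storage.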

Commodity Hardware

HDFS Architecture

[Figure: HDFS architecture. An HDFS-aware application uses both the POSIX API and the HDFS API. The POSIX API reaches the regular VFS, with local and NFS-supported files, through specific drivers. The HDFS API gives a separate HDFS view that goes through the network stack to the HDFS NameNode and the HDFS DataNodes.]

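HDFS Architecture

[Figure: HDFS architecture. Clients send metadata ops to the Namenode, which holds the metadata (name, replicas, …), and read and write blocks directly on Datanodes spread across Rack1 and Rack2. The Namenode issues block ops to the Datanodes, and blocks are replicated between Datanodes.]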

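HDFS File Read

[Figure: HDFS file read. An HDFS client on the client node uses DistributedFileSystem and FSDataInputStream to reach the NameNode and three DataNodes.]

1: open (HDFS client to DistributedFileSystem)
2: get block location (DistributedFileSystem to NameNode)
3: read (HDFS client to FSDataInputStream)
4: read (FSDataInputStream to the first DataNode)
5: read (FSDataInputStream to the next DataNode)
6: close (HDFS client to FSDataInputStream)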
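The read steps above can be sketched with a toy in-memory model. This is not the Hadoop API: ToyNameNode, ToyDataNode and read_file are hypothetical names, and the model only illustrates that the NameNode serves metadata while block data comes straight from DataNodes.

```python
class ToyNameNode:
    """Holds metadata only: file name -> ordered list of (block_id, [datanode names])."""
    def __init__(self):
        self.files = {}

    def get_block_locations(self, name):
        return self.files[name]


class ToyDataNode:
    """Holds block contents keyed by block id."""
    def __init__(self):
        self.blocks = {}


def read_file(name, namenode, datanodes):
    """Open the file, fetch block locations, then read each block in order."""
    locations = namenode.get_block_locations(name)   # steps 1-2
    data = b""
    for block_id, holders in locations:              # steps 3-5
        for dn in holders:                           # try replicas in listed order
            node = datanodes.get(dn)
            if node is not None and block_id in node.blocks:
                data += node.blocks[block_id]
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return data                                      # step 6: close
```

If the first DataNode listed for a block is unavailable, the sketch falls through to the next replica, which mirrors why each block is stored on several DataNodes.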

Hadoop Clusters

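Rack Awareness

[Figure: network distance d between nodes (n1, n2) on racks (r1, r2) in data centers (d1, d2). d=0 for the same node, d=2 for different nodes on the same rack, d=4 for nodes on different racks in the same data center, d=6 for nodes in different data centers.]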
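The distances in the figure follow from treating the cluster as a tree (data center, rack, node) and counting the hops from each node up to their closest common ancestor. A minimal sketch, assuming locations are written as paths of the form /datacenter/rack/node; the path notation and the function name are illustrative assumptions.

```python
def distance(a, b):
    """Sum of hops from each location up to their closest common ancestor.

    Locations are paths such as "/d1/r1/n1" (data center / rack / node).
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)
```

For example, distance("/d1/r1/n1", "/d1/r1/n2") is 2 and distance("/d1/r1/n1", "/d2/r1/n1") is 6, matching the values in the figure.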

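HDFS Write

[Figure: HDFS file write. An HDFS client on the client node uses DistributedFileSystem and FSDataOutputStream to reach the NameNode and a pipeline of three DataNodes.]

1: create (HDFS client to DistributedFileSystem)
2: create (DistributedFileSystem to NameNode)
3: write (HDFS client to FSDataOutputStream)
4: write packet (FSDataOutputStream to the first DataNode, then forwarded along the pipeline)
5: ack packet (returned back along the pipeline)
6: close (HDFS client to FSDataOutputStream)
7: complete (DistributedFileSystem to NameNode)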
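The packet and ack steps above can be illustrated with a toy pipeline. This is a sketch under simplifying assumptions, not Hadoop code: each DataNode is modelled as a plain list, the packet size is tiny, and write_file is a hypothetical name.

```python
def write_file(data, pipeline, packet_size=4):
    """Split data into packets, push each through the DataNode pipeline,
    and return the number of acknowledged packets."""
    acked = 0
    for start in range(0, len(data), packet_size):
        packet = data[start:start + packet_size]
        for store in pipeline:   # step 4: each DataNode stores the packet and forwards it
            store.append(packet)
        acked += 1               # step 5: ack counted once every DataNode holds the packet
    return acked
```

Writing b"abcdefghij" through a three-node pipeline yields three acknowledged packets, and every node ends up holding the full data.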

Replica Placement

[Figure: replica placement across nodes and racks within a data center.]

Computational Grids

[Source: IBM TJ Watson Research Center]

Load Distribution

Map/Reduce

SLURM

Crowd Sourcing

Foxtrot: Associating audio with locations

Allen Telescope Array

Search for Extra Terrestrial Intelligence

Thank You!
