
HDFS Architecture / HDFS and CAP

Applied Big Data and Visualization

P. Healy

CS1-08, Computer Science Bldg.

tel: 202727, [email protected]

Spring 2019–2020

P. Healy (University of Limerick) CS6502 Spring 2019–2020 1 / 13


Outline

1. HDFS Architecture
   Replica Placement / Selection
   HDFS Block-Writing

2. HDFS and CAP
   Implications for CAP




Typical Hadoop Cluster

Large HDFS instances run on clusters of computers that commonly spread across many racks.

Communication between two nodes in different racks has to go through switches.


Usually, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.


Placement of Replica Blocks

Placement of replicas is critical to HDFS reliability and performance.

HDFS places emphasis on optimizing replica placement.


A rack-aware replica placement policy can improve data reliability, availability, and network bandwidth utilization.

It needs lots of tuning and experience to get right, however.


A simple but non-optimal policy is to place replicas on unique racks:


✓ prevents losing data when an entire rack fails, and allows use of bandwidth from multiple racks when reading data

✓ evenly distributes replicas in the cluster, which makes it easy to balance load on component failure

✗ increases the cost of writes, because a write needs to transfer blocks to multiple racks


Usually (when RF = 3), HDFS's placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack.
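As a toy illustration of the RF = 3 policy above (not Hadoop's actual source; the rack/node data structure and function name are invented here):

```python
def place_replicas_rf3(writer_rack, racks):
    """Return three (rack, node) targets: two distinct nodes on the
    writer's local rack, plus one node on a different rack.

    racks: dict mapping rack id -> list of node ids on that rack.
    """
    local_nodes = racks[writer_rack]
    remote_racks = [r for r in racks if r != writer_rack]
    if len(local_nodes) < 2 or not remote_racks:
        raise ValueError("need >= 2 local nodes and at least one other rack")
    remote = remote_racks[0]
    return [(writer_rack, local_nodes[0]),   # first replica, local rack
            (writer_rack, local_nodes[1]),   # second replica, same rack
            (remote, racks[remote][0])]      # third replica, remote rack
```

Note how only two unique racks are used, which is what the pros and cons below hinge on.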


✓ cuts the inter-rack write traffic, which generally improves write performance

✓ chance of rack failure is far less than that of node failure

✓ does not impact data reliability and availability guarantees

✓ aggregate network bandwidth used when reading data is reduced, since a block is placed in only two unique racks rather than three


✗ with this policy, the replicas of a file do not distribute evenly across the racks

✗ one third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks

Overall, this improves write performance without compromising data reliability or read performance.


If RF > 3, placement of the 4th and subsequent replicas is determined randomly while keeping the number of replicas per rack below the upper limit, which is (#replicas - 1) / #racks + 2.
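The upper-limit formula uses integer (floor) division; a quick sketch (the function name is mine) to sanity-check a few values:

```python
def max_replicas_per_rack(num_replicas, num_racks):
    """Per-rack upper limit from the slide: (#replicas - 1) / #racks + 2,
    with floor (integer) division."""
    return (num_replicas - 1) // num_racks + 2

# RF = 3 across 2 racks allows at most (3-1)//2 + 2 = 3 per rack;
# RF = 10 across 3 racks allows at most (10-1)//3 + 2 = 5 per rack.
```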


Replica Selection for Reading

Goal: minimize global bandwidth consumption (packet contention) and read latency.

HDFS tries to satisfy a read request from a replica that is closest to the reader:

If there exists a replica on the same rack as the reader node, then that replica is preferred to satisfy the read request.

If the HDFS cluster spans multiple data centers, then a replica that is resident in the local data center is preferred over any remote replica.

To ensure data integrity, each block's checksum is verified on retrieval.
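The preference order above (same node, then same rack, then same data center, then anywhere) can be modelled as a distance tier. This is an illustrative sketch, not HDFS's actual network-topology code:

```python
def pick_replica(reader, replicas):
    """reader and each replica are (datacenter, rack, node) tuples;
    return the replica in the lowest distance tier from the reader."""
    def tier(loc):
        if loc == reader:
            return 0              # same node
        if loc[:2] == reader[:2]:
            return 1              # same data center and rack
        if loc[0] == reader[0]:
            return 2              # same data center, different rack
        return 3                  # remote data center
    return min(replicas, key=tier)
```

With a reader on ("dc1", "r1", "n1"), a replica on the reader's own rack wins over one in another rack, which in turn wins over one in a remote data center.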



Write Architecture

To write a file of two blocks, A and B, into HDFS:

The HDFS client connects to the NameNode with a Write Request against the two blocks, A, B.

The NameNode grants the client write permission and provides IP addresses of destination DataNodes; destinations are based on availability, replication factor and rack awareness.

Suppose the NameNode provides the following lists of IP addresses to the client (RF = 3):

Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
Block B, list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9}

Each block is copied in pipeline fashion.
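A toy model of that exchange (the names are invented, and it ignores the rack awareness and availability checks the real NameNode performs):

```python
def write_request(blocks, datanodes, rf=3):
    """Assign rf distinct DataNode addresses to each block, round-robin.
    A stand-in for the NameNode's reply; not the real HDFS RPC."""
    assert len(datanodes) >= rf
    plan, i = {}, 0
    for b in blocks:
        plan[b] = [datanodes[(i + k) % len(datanodes)] for k in range(rf)]
        i += rf
    return plan

pipelines = write_request(["A", "B"], [f"dn{n}" for n in range(1, 10)])
# Each block ends up with its own list of three DataNodes, which the
# client will use to build a write pipeline.
```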


Replication Pipelining

Set-up of pipeline:

Before writing blocks, the client confirms whether the DataNodes present in each list of IPs are ready to receive the data or not.

The client then creates a pipeline for each of the blocks by connecting the individual DataNodes in the respective list for that block.

Considering Block A: list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}


Data streaming and replication:

"Pass it on": Block A is passed to the first nominated DataNode, 1, and is stored there.

DataNode 1 then passes the block along to DataNode 4, the second nominated DataNode.

DataNode 4, lastly, passes it along to DataNode 6.


Shutdown of pipeline (acknowledgement stage):

The reverse of the previous operation.

DataNode 6 acknowledges to DataNode 4 that it wrote the block safely; in turn, DataNode 4 acks its predecessor, DataNode 1.

The operation is complete when the client receives the final ack.
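The streaming and acknowledgement stages above can be simulated in a few lines; this is a mock of the sequence of events, not the DataNode transfer protocol:

```python
def pipeline_write(block, pipeline, stores):
    """Forward `block` node-to-node down `pipeline`, recording a copy in
    `stores` (a dict of node -> block), then ack back in reverse order.
    Returns the ordered event log."""
    events = []
    for dn in pipeline:               # data streaming: 1 -> 4 -> 6
        stores[dn] = block
        events.append(("store", dn))
    for dn in reversed(pipeline):     # acknowledgements: 6 -> 4 -> 1
        events.append(("ack", dn))
    return events
```

Running it on Block A's pipeline [1, 4, 6] stores the block on all three nodes, with acks emitted in exactly the reverse of the storage order.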


What is CAP?

A distributed file system requires a network, which may fail, resulting in a partition of the network.

The CAP theorem states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

1. Consistency: equivalent to having a single up-to-date copy of the data
2. high Availability of that data (for updates), though it may not be the most current
3. tolerance to network Partitions

When a network partition failure happens, should we:

Cancel the operation, thus ensuring consistency but decreasing availability?

Proceed with the operation, thus risking inconsistency but providing availability?

Some people claim this concern is overblown.


Hadoop and CAP

HDFS has a unique central decision point, the NameNode.

Thus it can only fall on the CP side, since taking down the NameNode takes down the entire HDFS system (no Availability).

Hadoop does not try to hide this (from its home page):

"The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy."


Hadoop and CAP (contd.)

Since the decision of where to place data, and where it can be read from, is always handled by the NameNode, which maintains a consistent view in memory, HDFS is always consistent (C).

It is also partition-tolerant in that it can handle losing DataNodes, subject to the replication factor and data topology strategies.
