HDFS Architecture; HDFS and CAP
Applied Big Data and Visualization
P. Healy
CS1-08, Computer Science Bldg.
[email protected]
Spring 2019–2020
P. Healy (University of Limerick) CS6502 Spring 2019–2020 1 / 13
Outline
1 HDFS Architecture
  Replica Placement / Selection
  HDFS Block-Writing
2 HDFS and CAP
  Implications for CAP
Typical Hadoop Cluster
Large HDFS instances run on a cluster of computers that is commonly spread across many racks. Communication between two nodes in different racks has to go through switches.
Usually, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.
Placement of Replica Blocks
Placement of replicas is critical to HDFS reliability and performance. HDFS places emphasis on optimizing replica placement.
A rack-aware replica placement policy can improve data reliability, availability, and network bandwidth utilization; it needs a lot of tuning and experience to get right, however.
A simple but non-optimal policy is to place replicas on unique racks:
✓ prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data
✓ evenly distributes replicas in the cluster, which makes it easy to balance load on component failure
✗ increases the cost of writes because a write needs to transfer blocks to multiple racks
Usually (when RF = 3), HDFS's placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack.
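The RF = 3 policy above can be sketched in Python. This is a toy model, not HDFS's actual implementation: the node names, rack names, and the nodes_by_rack mapping are all hypothetical.

```python
import random

def place_replicas(writer_node, nodes_by_rack):
    """Toy sketch of the default RF = 3 placement described above:
    first replica on the writer's node, second on a different node in
    the same rack, third on a node in a different rack.
    nodes_by_rack maps rack id -> list of node ids (hypothetical shape)."""
    local_rack = next(r for r, ns in nodes_by_rack.items() if writer_node in ns)
    # Second replica: another node in the local rack.
    local_peers = [n for n in nodes_by_rack[local_rack] if n != writer_node]
    second = random.choice(local_peers)
    # Third replica: any node in a randomly chosen remote rack.
    remote_racks = [r for r in nodes_by_rack if r != local_rack]
    third = random.choice(nodes_by_rack[random.choice(remote_racks)])
    return [writer_node, second, third]

cluster = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5"]}
print(place_replicas("dn1", cluster))
```

A real NameNode also weighs node load and free space when choosing among candidates; the random choices here stand in for that.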
✓ cuts the inter-rack write traffic, which generally improves write performance
✓ the chance of rack failure is far less than that of node failure
✓ does not impact data reliability and availability guarantees
✓ aggregate network bandwidth used when reading data is reduced, since a block is placed in only two unique racks rather than three
✗ With this policy, the replicas of a file are not evenly distributed across the racks
✗ One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks
Overall, this policy improves write performance without compromising data reliability or read performance.
If RF > 3, the placement of the 4th and subsequent replicas is determined randomly while keeping the number of replicas per rack below the upper limit, which is (#replicas − 1) / #racks + 2.
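The per-rack upper limit quoted above is a one-liner; a small sketch makes the arithmetic concrete. Assuming the division is integer division (my reading of the formula, not stated on the slide):

```python
def max_replicas_per_rack(replicas, racks):
    # Upper limit quoted above: (#replicas - 1) / #racks + 2,
    # taken here as integer (floor) division.
    return (replicas - 1) // racks + 2

# With RF = 5 spread over 3 racks, no rack should hold more than 3 replicas.
print(max_replicas_per_rack(5, 3))  # → 3
```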
Replica Selection for Reading
Goal: minimize global bandwidth consumption (packet contention) and read latency
HDFS tries to satisfy a read request from the replica that is closest to the reader
If there exists a replica on the same rack as the reader node, then that replica is preferred to satisfy the read request
If the HDFS cluster spans multiple data centres, then a replica that is resident in the local data centre is preferred over any remote replica
To ensure data integrity, each block's checksum is verified on retrieval
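The "closest replica" preference above can be sketched with a toy distance function. The (data centre, rack, host) tuples and the 0–3 distance scale are illustrative assumptions for this sketch, not HDFS's internal representation:

```python
def read_distance(reader, replica):
    """Hypothetical network distance: 0 = same node, 1 = same rack,
    2 = same data centre, 3 = remote data centre.
    Nodes are (datacentre, rack, host) tuples in this sketch."""
    if replica == reader:
        return 0
    if replica[:2] == reader[:2]:   # same data centre and rack
        return 1
    if replica[0] == reader[0]:     # same data centre, different rack
        return 2
    return 3

def pick_replica(reader, replicas):
    # Prefer the replica closest to the reader, as described above.
    return min(replicas, key=lambda r: read_distance(reader, r))

reader = ("dc1", "rack1", "dn1")
replicas = [("dc1", "rack1", "dn2"), ("dc1", "rack2", "dn5"), ("dc2", "rack9", "dn7")]
print(pick_replica(reader, replicas))  # → ('dc1', 'rack1', 'dn2')
```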
Write Architecture
To write a file consisting of two blocks, A and B, into HDFS:
The HDFS client connects to the NameNode with a write request for the two blocks, A and B
The NameNode grants the client write permission and provides the IP addresses of the destination DataNodes; destinations are based on availability, replication factor and rack awareness
Suppose the NameNode provides the following lists of IP addresses to the client (RF = 3):
Block A: list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
Block B: list B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9}
Each block is copied in pipeline fashion
Replication Pipelining
Set-up of pipeline
Before writing blocks, the client confirms whether the DataNodes in each list of IPs are ready to receive the data or not
So, the client creates a pipeline for each of the blocks by connecting the individual DataNodes in the respective list for that block
Considering Block A: list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
Data streaming and replication
"Pass it on": Block A is passed to the first nominated DataNode, 1, and is stored there
DataNode 1 then passes the block along to DataNode 4, the second nominated DataNode
DataNode 4, lastly, passes it along to DataNode 6
Shutdown of pipeline (acknowledgement stage)
The reverse of the previous operation
DataNode 6 acknowledges to DataNode 4 that it wrote the block safely
In turn, DataNode 4 acks its predecessor
The operation is complete when the NameNode receives the final ack
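The forward "pass it on" stage and the reverse acknowledgement stage can be simulated in a few lines. The event-log strings and list structure below are illustrative, not HDFS's wire protocol:

```python
def pipeline_write(block, datanodes):
    """Toy simulation of the replication pipeline described above:
    the block is passed node-to-node down the list, then acks flow
    back in reverse. Returns a log of events (hypothetical format)."""
    events, stored = [], {}
    # Forward pass: each DataNode stores the block and forwards it.
    for i, dn in enumerate(datanodes):
        stored[dn] = block
        events.append(f"{dn} stored block")
        if i + 1 < len(datanodes):
            events.append(f"{dn} -> {datanodes[i + 1]}")
    # Reverse pass: acks travel back up the pipeline, last node first.
    for dn, pred in zip(reversed(datanodes), reversed(datanodes[:-1])):
        events.append(f"{dn} acks {pred}")
    events.append("NameNode receives final ack")
    return events

for event in pipeline_write("Block A", ["dn1", "dn4", "dn6"]):
    print(event)
```

Note how the client only talks to the first DataNode; the pipeline itself fans the block out, which is what keeps inter-rack write traffic low.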
Implications for CAP
What is CAP?
A distributed file system requires a network, which may fail, resulting in a partition of the network
The CAP theorem states that it is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:
1 Consistency: equivalent to having a single up-to-date copy of the data
2 high Availability of that data (for updates), though reads may not return the most current copy
3 tolerance to network Partitions
When a network partition failure happens, should we:
Cancel the operation, thus ensuring consistency but decreasing availability
Proceed with the operation, thus risking inconsistency but providing availability
Some people claim this concern is overblown
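The cancel-versus-proceed choice above can be made concrete with a toy write handler. Everything here (the function, the flags, the return values) is hypothetical, purely to illustrate the CP/AP trade-off:

```python
class PartitionError(Exception):
    pass

def handle_write(value, partitioned, prefer_consistency=True):
    """Toy illustration of the choice posed above: on a network
    partition, a CP system cancels the write, while an AP system
    accepts it and risks divergent replicas."""
    if not partitioned:
        return ("committed", value)
    if prefer_consistency:
        # CP: refuse the write rather than risk inconsistency.
        raise PartitionError("write cancelled: cannot reach all replicas")
    # AP: accept locally; replicas may diverge until the partition heals.
    return ("accepted locally", value)
```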
Hadoop and CAP
HDFS has a unique central decision point, the NameNode
Thus it can only fall on the CP side, since taking down the NameNode takes down the entire HDFS system (no Availability)
Hadoop does not try to hide this (from its home page):
"The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy."
Hadoop and CAP (contd.)
Since the decision of where to place data, and where it can be read from, is always handled by the NameNode, which maintains a consistent view in memory, HDFS is always consistent (C)
It is also partition-tolerant in that it can handle losing DataNodes, subject to the replication factor and data topology strategies