www.edureka.co/hadoop-admin Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Objectives
At the end of this module, you will be able to
Understand Hadoop fully distributed cluster setup with two nodes
Rack awareness
Understand Active NameNode Failure and how passive takes over
Hadoop 2.x Cluster Architecture – Federation
Hadoop Admin Responsibilities
Hadoop Cluster: A Typical Use Case
NameNode
RAM: 64 GB, Hard disk: 1 TB, Processor: Xeon with 8 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS, Power: Redundant Power Supply
Secondary NameNode
RAM: 32 GB, Hard disk: 1 TB, Processor: Xeon with 4 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS, Power: Redundant Power Supply
DataNode
RAM: 16 GB, Hard disk: 6 x 2 TB, Processor: Xeon with 2 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS
DataNode
RAM: 16 GB, Hard disk: 6 x 2 TB, Processor: Xeon with 2 cores, Ethernet: 3 x 10 GB/s, OS: 64-bit CentOS
Cluster Growth Based On Storage Capacity
Planning cluster growth around storage capacity is often a good method to use!
Data grows by approximately 5 TB per week
HDFS is set up to replicate each block three times
Thus, 15 TB of extra storage space is required per week
Assuming machines with 5 x 3 TB hard drives, this equates to roughly one new machine required each week
Assume overheads to be 30%
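The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the slide does not say how the 30% overhead is applied, so the sketch assumes it is extra space on top of the replicated data, and the function names are hypothetical.

```python
def weekly_raw_storage_tb(data_growth_tb, replication, overhead=0.0):
    """Raw disk needed per week: replicated data plus a fractional overhead.

    Assumption: overhead is added on top of the replicated data
    (the slide does not specify how the 30% is applied).
    """
    return data_growth_tb * replication * (1 + overhead)


def machines_per_week(storage_tb, drives_per_machine, drive_size_tb):
    """Number of new machines needed to hold storage_tb of raw data."""
    return storage_tb / (drives_per_machine * drive_size_tb)


# Figures from the slide: 5 TB/week growth, replication factor 3,
# machines with 5 x 3 TB drives, 30% overhead.
replicated_tb = weekly_raw_storage_tb(5, 3)
with_overhead_tb = weekly_raw_storage_tb(5, 3, overhead=0.30)

print("Replicated data per week (TB):", replicated_tb)
print("With overhead (TB):", round(with_overhead_tb, 2))
print("Machines per week:", round(machines_per_week(with_overhead_tb, 5, 3), 2))
```

Under this reading of the overhead, the weekly requirement is somewhat more than one 15 TB machine, so "one machine per week" is best treated as a lower bound.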
Slave Nodes: Recommended Configuration
Higher-performance vs lower-performance components
Save the money, buy more nodes!
General 'base' configuration for a slave node (depends on requirements):
» 4 x 1 TB or 2 TB hard drives, in a JBOD (Just a Bunch Of Disks) configuration
» Do not use RAID!
» 2 x Quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet
General Configuration
Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications
Special Configuration
Slave Nodes
“A cluster with more nodes performs better than one with fewer, slightly faster nodes”
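The "multiples of (1 hard drive + 2 cores + 6-8 GB RAM)" rule can be expressed as a small helper. This is a minimal sketch; the function name and the returned field names are made up for illustration.

```python
def slave_node_spec(units):
    """Scale the building block (1 hard drive + 2 cores + 6-8 GB RAM).

    `units` is the number of building blocks in one slave node.
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    return {
        "hard_drives": units,
        "cores": 2 * units,
        "ram_gb_min": 6 * units,
        "ram_gb_max": 8 * units,
    }


# Four building blocks reproduce the 'base' configuration on this slide:
# 4 drives, 8 cores (2 x quad-core), 24-32 GB RAM.
spec = slave_node_spec(4)
print(spec)
```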
Slave Nodes: More Details (RAM)
Slave Nodes (RAM)
Generally each Map or Reduce task will take 1 GB to 2 GB of RAM
Slave nodes should not be using virtual memory
RULE OF THUMB! Total number of tasks = 1.5 x number of processor cores
Ensure enough RAM is present to run all tasks, plus the DataNode and TaskTracker daemons, plus the operating system
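The rule of thumb and the RAM check can be combined into a rough estimate. This is a sketch under stated assumptions: the slide gives 1-2 GB per task and the 1.5 x cores rule, but the RAM reserved for the daemons and the operating system (`daemons_gb`, `os_gb`) is a placeholder value, not a figure from the slide.

```python
def max_tasks(cores):
    """Rule of thumb from the slide: total tasks = 1.5 x processor cores."""
    return int(1.5 * cores)


def required_ram_gb(cores, ram_per_task_gb=2.0, daemons_gb=2.0, os_gb=2.0):
    """Estimate the RAM a slave node needs to avoid using virtual memory.

    ram_per_task_gb: 1-2 GB per Map or Reduce task (from the slide).
    daemons_gb, os_gb: assumed placeholders for the DataNode/TaskTracker
    daemons and the operating system; adjust for your environment.
    """
    return max_tasks(cores) * ram_per_task_gb + daemons_gb + os_gb


cores = 8
print("Tasks:", max_tasks(cores))
print("RAM needed at 1 GB/task:", required_ram_gb(cores, ram_per_task_gb=1.0))
print("RAM needed at 2 GB/task:", required_ram_gb(cores, ram_per_task_gb=2.0))
```

With these placeholder reservations, an 8-core node lands within the 24-32 GB range at the upper end of the per-task estimate.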
Master Node Hardware Recommendations
A master node requires:
» Carrier-class hardware (not commodity hardware)
» Dual power supplies
» Dual Ethernet cards (bonded to provide failover)
» RAIDed hard drives
» At least 32 GB of RAM
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Standalone (or Local) Mode
» No daemons, everything runs in a single JVM
» Suitable for running MapReduce programs during development
» Has no DFS
Pseudo-Distributed Mode
» Hadoop daemons run on the local machine
Fully-Distributed Mode
» Hadoop daemons run on a cluster of machines
Configuration Files
Configuration Filenames: Description
hadoop-env.sh, yarn-env.sh: Settings for the Hadoop daemons' process environment.
core-site.xml: Configuration settings for Hadoop Core, such as I/O settings that are common to both HDFS and YARN.
hdfs-site.xml: Configuration settings for the HDFS daemons, the NameNode and the DataNodes.
yarn-site.xml: Configuration settings for the ResourceManager and NodeManager.
mapred-site.xml: Configuration settings for MapReduce applications.
slaves: A list of machines (one per line) that each run a DataNode and a NodeManager.
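The `*-site.xml` files share one layout: a `<configuration>` root holding `<property>` elements, each with a `<name>` and a `<value>`. The sketch below reads such a file into a dictionary using Python's standard library. The sample content, including the host name `namenode.example.com` and port, is a made-up illustration, and `parse_site_xml` is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content; host name and port are placeholders.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def parse_site_xml(xml_text):
    """Return the name/value pairs of a Hadoop *-site.xml document as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


core_site = parse_site_xml(SAMPLE_CORE_SITE)
print(core_site)
```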
Configuration Files (Contd.)
Deprecated Property Name New Property Name
dfs.data.dir dfs.datanode.data.dir
dfs.http.address dfs.namenode.http-address
fs.default.name fs.defaultFS
The core functionality and usage of these configuration files are the same in Hadoop 2.0 and 1.0, but many new properties have been added and many have been deprecated.
For example:
» 'fs.default.name' has been deprecated and replaced with 'fs.defaultFS' for YARN in core-site.xml
» 'dfs.nameservices' has been added to enable NameNode High Availability in hdfs-site.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
In the Hadoop 2.2.0 release, you can use either the old or the new property names.
The old property names are now deprecated, but still work!
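Since the old names still work but are deprecated, an administrator may want to rewrite existing settings to the new names. The sketch below does this for the three properties listed on this slide only. `modernize` is a hypothetical helper, and the rule that an already-present new name takes precedence is an assumption of this sketch, not documented Hadoop behavior.

```python
# Mapping taken from the table on this slide (not the full Hadoop list).
DEPRECATED_TO_NEW = {
    "dfs.data.dir": "dfs.datanode.data.dir",
    "dfs.http.address": "dfs.namenode.http-address",
    "fs.default.name": "fs.defaultFS",
}


def modernize(props):
    """Return (updated_props, renamed) with deprecated names replaced.

    If both the old and the new name are present, the value under the
    new name is kept (an assumption made for this sketch).
    """
    updated = {k: v for k, v in props.items() if k not in DEPRECATED_TO_NEW}
    renamed = []
    for old, new in DEPRECATED_TO_NEW.items():
        if old in props:
            updated.setdefault(new, props[old])
            renamed.append((old, new))
    return updated, renamed


# Hypothetical settings; the URI and path are placeholders.
old_props = {
    "fs.default.name": "hdfs://namenode.example.com:8020",
    "dfs.data.dir": "/data/1/dfs/dn",
    "dfs.replication": "3",
}
new_props, renamed = modernize(old_props)
print(new_props)
print(renamed)
```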
Hadoop 2.x Cluster Architecture - Federation
[Figure: HDFS Federation in Hadoop 2.0]
Namespace: NameNodes NN-1 … NN-k … NN-n, serving namespaces NS1 … NSk … NSn
Block Storage: Block Pools (Pool 1 … Pool k … Pool n)
Common Storage: Datanode 1, Datanode 2, … Datanode m
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
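The relationship in the federation diagram can be modelled with a toy data structure: each NameNode owns one namespace and one block pool, while every DataNode is common storage that can hold blocks from all pools. This is a conceptual sketch only. The class names, the round-robin placement, and the omission of replication are simplifications for illustration, not how HDFS places blocks.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    # block pool id -> set of block ids stored on this node
    pools: dict = field(default_factory=dict)

    def store(self, pool_id, block_id):
        self.pools.setdefault(pool_id, set()).add(block_id)


@dataclass
class NameNode:
    name: str
    namespace: str
    block_pool: str
    files: dict = field(default_factory=dict)  # path -> list of block ids

    def add_file(self, path, block_ids, datanodes):
        """Record the file in this namespace and spread its blocks
        over the shared DataNodes (round-robin, no replication)."""
        self.files[path] = list(block_ids)
        for i, block_id in enumerate(block_ids):
            datanodes[i % len(datanodes)].store(self.block_pool, block_id)


datanodes = [DataNode("Datanode 1"), DataNode("Datanode 2")]
nn1 = NameNode("NN-1", namespace="NS1", block_pool="Pool 1")
nn2 = NameNode("NN-2", namespace="NS2", block_pool="Pool 2")

nn1.add_file("/logs/a.log", ["b1", "b2"], datanodes)
nn2.add_file("/sales/q1.csv", ["b3"], datanodes)

# Each DataNode can hold blocks from several pools; namespaces stay separate.
for dn in datanodes:
    print(dn.name, dn.pools)
```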
Hadoop Admin Responsibilities
Responsible for implementation and administration of Hadoop infrastructure.
Testing HDFS, Hive, Pig and MapReduce access for Applications.
Cluster maintenance tasks like Backup, Recovery, Upgrade, Patching.
Performance tuning and Capacity planning for Clusters.
Monitor Hadoop cluster and deploy security.
How it Works?
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
Questions
Course Topics
Module 1 » Hadoop Cluster Administration
Module 2 » Hadoop Architecture and Cluster setup
Module 3 » Hadoop Cluster: Planning and Managing
Module 4 » Backup, Recovery and Maintenance
Module 5 » Hadoop 2.0 and High Availability
Module 6 » Advanced Topics: QJM, HDFS Federation and Security
Module 7 » Oozie, HCatalog/Hive and HBase Administration
Module 8 » Project: Hadoop Implementation