
Administer Hadoop Cluster

View Hadoop Administration Course at www.edureka.co/hadoop-admin

Slide 2 | www.edureka.co/hadoop-admin | Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions

Objectives

At the end of this module, you will be able to:

» Understand a fully distributed Hadoop cluster setup with two nodes
» Understand rack awareness
» Understand Active NameNode failure and how the passive NameNode takes over
» Describe the Hadoop 2.x cluster architecture (Federation)
» List the Hadoop admin responsibilities


Hadoop Cluster: A Typical Use Case

NameNode
» RAM: 64 GB; Hard disk: 1 TB
» Processor: Xeon with 8 cores
» Ethernet: 3 x 10 GB/s
» OS: 64-bit CentOS
» Power: redundant power supply

Secondary NameNode
» RAM: 32 GB; Hard disk: 1 TB
» Processor: Xeon with 4 cores
» Ethernet: 3 x 10 GB/s
» OS: 64-bit CentOS
» Power: redundant power supply

DataNode (each of the two)
» RAM: 16 GB; Hard disk: 6 x 2 TB
» Processor: Xeon with 2 cores
» Ethernet: 3 x 10 GB/s
» OS: 64-bit CentOS


Planning cluster growth around storage capacity is often a good approach!

Cluster Growth Based On Storage Capacity

» Data grows by approximately 5 TB per week
» HDFS is set up to replicate each block three times
» Thus, 15 TB of extra storage space is required per week
» Assuming machines with 5 x 3 TB hard drives, this equates to a new machine required each week
» Assume overheads to be 30%
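The arithmetic above can be checked with a short script. This is a sketch using only the figures from this slide (5 TB/week, 3x replication, 5 x 3 TB drives, 30% overhead); with the overhead included, the "one machine per week" estimate is slightly optimistic.

```python
# Weekly storage growth planning, using the figures from the slide above.
raw_growth_tb = 5          # new data per week
replication = 3            # HDFS replication factor
overhead = 0.30            # reserve for temporary/intermediate data, etc.

needed_tb = raw_growth_tb * replication            # 15 TB of block storage per week
needed_with_overhead = needed_tb * (1 + overhead)  # 19.5 TB actually provisioned

machine_capacity_tb = 5 * 3                        # 5 x 3 TB drives per slave node
machines_per_week = needed_with_overhead / machine_capacity_tb

print(needed_tb, needed_with_overhead, round(machines_per_week, 2))
# 15 19.5 1.3  -> roughly one machine per week before overhead, ~1.3 with it
```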


Slave Nodes: Recommended Configuration

Higher-performance vs. lower-performance components: save the money, buy more nodes!

General Configuration
A general "base" configuration for a slave node (depends on requirements):
» 4 x 1 TB or 2 TB hard drives, in a JBOD (Just a Bunch Of Disks) configuration
» Do not use RAID!
» 2 x quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet

Special Configuration
Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications.

"A cluster with more nodes performs better than one with fewer, slightly faster nodes"


Slave Nodes: More Details (RAM)

» Generally, each Map or Reduce task will take 1 GB to 2 GB of RAM
» Slave nodes should not be using virtual memory
» RULE OF THUMB! Total number of tasks = 1.5 x number of processor cores
» Ensure enough RAM is present to run all tasks, plus the DataNode and TaskTracker daemons, plus the operating system
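Applying the rule of thumb to the "base" slave configuration from the previous slide gives a quick sanity check. A sketch: the 4 GB allowance for the daemons and the OS is an assumed figure, not from the slides.

```python
# Rule-of-thumb RAM sizing for a slave node with 2 x quad-core CPUs.
cores = 2 * 4                  # 2 quad-core CPUs
max_tasks = int(1.5 * cores)   # rule of thumb: 1.5 tasks per core -> 12 tasks
ram_per_task_gb = 2            # plan for the worst case of the 1-2 GB range
daemons_and_os_gb = 4          # DataNode + TaskTracker + OS (assumed figure)

min_ram_gb = max_tasks * ram_per_task_gb + daemons_and_os_gb
print(max_tasks, min_ram_gb)
# 12 28  -> consistent with the 24-32 GB RAM recommendation above
```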


Master Node Hardware Recommendations

The Master Node requires carrier-class hardware (not commodity hardware):

» Dual power supplies
» Dual Ethernet cards (bonded to provide failover)
» RAID hard drives
» At least 32 GB of RAM


Hadoop Cluster Modes

Hadoop can run in any of the following three modes:

Standalone (or Local) Mode
» No daemons; everything runs in a single JVM
» Suitable for running MapReduce programs during development
» Has no DFS

Pseudo-Distributed Mode
» Hadoop daemons run on the local machine

Fully-Distributed Mode
» Hadoop daemons run on a cluster of machines


Configuration Files

Configuration Filename | Description
hadoop-env.sh, yarn-env.sh | Settings for the Hadoop daemons' process environment
core-site.xml | Configuration settings for Hadoop Core, such as I/O settings that are common to both HDFS and YARN
hdfs-site.xml | Configuration settings for the HDFS daemons: the NameNode and the DataNodes
yarn-site.xml | Configuration settings for the ResourceManager and the NodeManager
mapred-site.xml | Configuration settings for MapReduce applications
slaves | A list of machines (one per line) that each run a DataNode and a NodeManager
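As an illustration, a minimal single-node setup touches two of the files above. This is a sketch: the localhost address, port and replication value are placeholder choices, not taken from the slides.

```xml
<!-- core-site.xml: tells clients and daemons where the default filesystem lives -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: HDFS daemon settings; replication of 1 suits a single node -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```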


Configuration Files (Contd.)

Deprecated Property Name | New Property Name
dfs.data.dir | dfs.datanode.data.dir
dfs.http.address | dfs.namenode.http-address
fs.default.name | fs.defaultFS

The core functionality and usage of these configuration files are the same in Hadoop 2.0 and 1.0, but many new properties have been added and many have been deprecated.

For example, 'fs.default.name' has been deprecated and replaced with 'fs.defaultFS' for YARN in core-site.xml, and 'dfs.nameservices' has been added to enable NameNode High Availability in hdfs-site.xml.

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

In the Hadoop 2.2.0 release, you can use either the old or the new properties; the old property names are now deprecated, but still work!


Replication and Rack Awareness


NameNode HA


Hadoop 2.x Cluster Architecture - Federation

[Diagram: HDFS Federation in Hadoop 2.0 — multiple independent NameNodes (NN-1 … NN-k … NN-n), each managing its own namespace (NS1 … NSk … NSn) and its own block pool (Pool 1 … Pool k … Pool n); all block pools share a common storage layer of DataNodes (Datanode 1, Datanode 2 … Datanode m).]

http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
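A federated deployment is wired up in hdfs-site.xml along these lines. This is a sketch: the nameservice IDs (ns1, ns2) and hostnames are placeholders, not from the slides.

```xml
<configuration>
  <!-- one logical nameservice ID per independent NameNode/namespace -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- each nameservice gets its own NameNode RPC address -->
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2.example.com:8020</value>
  </property>
</configuration>
```

DataNodes given this configuration register with every listed NameNode and serve a separate block pool for each namespace, which is the "Common Storage" layer in the diagram.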


Hadoop Admin Responsibilities

» Responsible for implementation and administration of the Hadoop infrastructure
» Testing HDFS, Hive, Pig and MapReduce access for applications
» Cluster maintenance tasks such as backup, recovery, upgrades and patching
» Performance tuning and capacity planning for clusters
» Monitoring the Hadoop cluster and deploying security


DEMO

How it Works?

» LIVE Online Class
» Class Recording in LMS
» 24/7 Post Class Support
» Module Wise Quiz
» Project Work
» Verifiable Certificate

Questions



Course Topics

Module 1 » Hadoop Cluster Administration
Module 2 » Hadoop Architecture and Cluster Setup
Module 3 » Hadoop Cluster: Planning and Managing
Module 4 » Backup, Recovery and Maintenance
Module 5 » Hadoop 2.0 and High Availability
Module 6 » Advanced Topics: QJM, HDFS Federation and Security
Module 7 » Oozie, Hcatalog/Hive and HBase Administration
Module 8 » Project: Hadoop Implementation