
Starting small on Hadoop..

Cluster Installation

This has 4 parts:

1. Cluster Planning.
2. OS Installation.
3. Cluster Software Installation.
4. Cluster Configuration.

1. Cluster Planning

We plan to have the configuration as below:

NameNode: Dedicated server. The OS is installed on a RAID1 device. The dfs.name.dir will reside on the same RAID1 device. One additional copy is configured on NFS.

Secondary NameNode: Dedicated server. The OS is installed on a RAID1 device.

Job Tracker: Dedicated server. The OS is installed on a JBOD configuration.

DataNodes/TaskTrackers: Individual servers. The OS is installed on a JBOD configuration.

For the Apache Hadoop distribution installation we will use aphdp:aphdp as the user and group.

2. OS Installation and Configuration

We will be installing Hadoop on RHEL 64-bit servers. Currently we are using RHEL 5.5.

For Hadoop purposes we may not require a full-blown Red Hat installation. We will be using Red Hat kickstart to deploy the servers in the cluster.

Here is the kickstart file used to install the hadoop servers:

# Kickstart file automatically generated by anaconda.
install
nfs --server=10.1.33.188 --dir=/vol/unixdb/RedHat/RHEL5.5_64
key --skip
lang en_US.UTF-8


keyboard us
xconfig --startxonboot
#network --device eth0 --bootproto static --ip 10.1.xx.xxx --netmask 255.255.255.0 --gateway 10.1.xx.1 --nameserver 10.1.xx.xx,10.1.xx.xx --hostname psrha4nn1.domain.com
rootpw --iscrypted encryptedpasswd
firewall --disabled
authconfig --enableshadow --enablemd5
selinux --disabled
timezone --utc America/Los_Angeles
bootloader --location=mbr --driveorder=sda,sdb --append="rhgb quiet"
# The following is the partition information you requested
# Note that any partitions you deleted are not expressed
# here so unless you clear all partitions first, this is
# not guaranteed to work
clearpart --all --initlabel
#part /boot --size=1024 --fstype ext3 --ondisk sda
#part pv.3 --size=0 --grow --ondisk sda
#volgroup VolGroup00 --pesize=32768 pv.3
#logvol swap --fstype=swap --name=swapvol --vgname=VolGroup00 --size=65536
#logvol / --fstype ext3 --name=LogVol00 --vgname=VolGroup00 --size=20480
#logvol /var --fstype ext3 --name=LogVol02 --vgname=VolGroup00 --size=10240
#logvol /tmp --fstype ext3 --name=LogVol01 --vgname=VolGroup00 --size=10240
part /boot --size=1024 --fstype ext3 --ondisk cciss/c0d0
part pv.3 --size=0 --grow --ondisk cciss/c0d0
volgroup VolGroup00 --pesize=32768 pv.3
logvol swap --fstype=swap --name=swapvol --vgname=VolGroup00 --size=65536
logvol / --fstype ext3 --name=LogVol00 --vgname=VolGroup00 --size=20480
logvol /var --fstype ext3 --name=LogVol02 --vgname=VolGroup00 --size=10240
logvol /tmp --fstype ext3 --name=LogVol01 --vgname=VolGroup00 --size=10240
%packages --ignoremissing --resolvedeps
@admin-tools
@base
@core
@development-libs
@development-tools
@editors
@ftp-server
@java
@java-development


@legacy-software-development
@legacy-software-support
@network-server
@printing
@ruby
@server-cfg
@system-tools
@text-internet
@web-server
@base-x
kexec-tools
fipscheck
device-mapper-multipath
sgpio
perl-Convert-ASN1
python-dmidecode
imake
emacs
vnc-server
dnsmasq
audit
xorg-x11-utils
xorg-x11-server-Xnest
xorg-x11-server-Xvfb
%post --nochroot
perl -i -pe 's/id\:5\:initdefault\:/id\:3\:initdefault\:/g' /etc/inittab

Note: The network portion of the kickstart file is commented out here. This is intentional, as we will be using the same kickstart file for all the nodes.

The servers will be placed in init level 3. The GUI is not started, to leave more resources for Hadoop processing (/etc/inittab -> id:3:initdefault:).

NTP: NTP needs to be configured on all the Hadoop cluster servers.
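A minimal sketch of that configuration on each node, assuming an internal NTP server (the hostname ntp1.domain.com below is a placeholder):

# Point ntpd at an internal time source and enable it at boot.
echo "server ntp1.domain.com iburst" >> /etc/ntp.conf
service ntpd restart
chkconfig ntpd on
# Quick sanity check of the peers/offset:
ntpq -p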

SSH: For all the cluster nodes, SSH trust for the root account needs to be established. The easy way is to create the RSA and DSA keys on the first node, copy the .pub file contents to authorized_keys on the same server, and rsync the .ssh folder to all other nodes in the cluster.
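A sketch of that procedure, run as root on the first node (node2, node3, node4 are placeholder hostnames; the rsync step will prompt for each node's root password once):

# Generate the keys without passphrases and authorize them locally.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-keygen -t dsa -N "" -f ~/.ssh/id_dsa
cat ~/.ssh/id_rsa.pub ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Push the same .ssh directory to every other node in the cluster.
for host in node2 node3 node4; do
  rsync -a ~/.ssh/ root@${host}:/root/.ssh/
done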

vm.swappiness: We need to set this as low as possible. We have set it to 5 (the default is 30).
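A sketch of how this is applied persistently on each node:

# Apply immediately and persist across reboots.
sysctl -w vm.swappiness=5
echo "vm.swappiness = 5" >> /etc/sysctl.conf
# Verify:
cat /proc/sys/vm/swappiness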


3. Cluster Software Installation

The required cluster software has been downloaded and staged for this purpose: /hadoop/

Apache installation:

Java: Hadoop needs Java to run. Install the latest Java on all the cluster nodes. The latest Java for Hadoop has been downloaded and kept in /hadoop/jdk-6u26-linux-x86_64-rpm.bin.

The username used to install and administer Apache Hadoop is aphdp:aphdp, with the home directory under /export/home/aphdp (uid 601, gid 601). The same username with the same uid and gid needs to be created across the cluster.
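A sketch of those two steps, run as root on every node (paths, uid and gid taken from above; the JDK installer is the self-extracting RPM bundle staged earlier):

# Install the staged JDK.
sh /hadoop/jdk-6u26-linux-x86_64-rpm.bin
# Create the Hadoop admin group and user with the same uid/gid on every node.
groupadd -g 601 aphdp
useradd -u 601 -g 601 -d /export/home/aphdp -m aphdp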

SSH Trust: SSH trust is created for the aphdp user across the cluster nodes. Create the SSH keys on one server, copy the .pub file contents to authorized_keys on the same server, and rsync the contents of aphdp's ~/.ssh dir across the cluster (the same procedure used for root above).

We have decided to use hadoop-0.20.2. The software will be installed under /hadoop/apache on each server. Create a volume to install Hadoop on every node of the cluster by executing the following snippet on every node; the Hadoop install volume is created on the OS volume group:

lvcreate -L 10G -n hadoopvol VolGroup00
mkfs.ext3 /dev/VolGroup00/hadoopvol
mkdir /hadoop
echo "/dev/VolGroup00/hadoopvol /hadoop ext3 defaults 0 2" >> /etc/fstab
mount /hadoop
mkdir /hadoop/apache
chown aphdp:aphdp /hadoop/apache

Copy /vol/unixdb/hadoop/hadoop-0.20.2.tar.gz to the /hadoop/apache directory on each node of the cluster and untar the package. The package will be extracted under the directory /hadoop/apache/hadoop-0.20.2. This is our HADOOP_HOME; the same env variable is set and used in the Hadoop configuration. This completes the Apache Hadoop software installation. Note: We have plans to use the same servers for the Cloudera and EMC distributions. Those will be installed under /hadoop/cloudera/ and /hadoop/emc.
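A sketch of that step, run as aphdp on every node (assuming the staging path above is reachable from the node, e.g. over NFS):

# Stage and unpack the distribution; the extracted directory becomes HADOOP_HOME.
cp /vol/unixdb/hadoop/hadoop-0.20.2.tar.gz /hadoop/apache/
cd /hadoop/apache
tar -xzf hadoop-0.20.2.tar.gz
ls -d /hadoop/apache/hadoop-0.20.2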


4. Cluster Configuration

We have decided to have the Hadoop cluster with the NameNode, Secondary NameNode, Job Tracker, and DataNodes/TaskTrackers on separate servers.

Name Node: Stores the HDFS metadata. Needs at least 32 GB of RAM.

Secondary Name Node: Used as a checkpoint node for the Name Node.

Job Tracker: Tracks the MapReduce jobs.

Data Nodes and Task Trackers: The actual data storage nodes and the task execution nodes.

Note: All the HDFS configuration is done as the aphdp user. All the daemon start-up and shutdown is done as the aphdp user. All the logfile dirs and data storage dirs are owned by the aphdp user.

Logs: All the nodes in the Hadoop cluster produce a large amount of logging data.

All the logs on all the cluster servers will be stored in the /hadooplogs dir. The same path is configured in the cluster configuration.

Create the Hadoop logs volume on all the nodes in the cluster:

lvcreate -L 100G -n hadooplogsvol VolGroup00
mkfs.ext3 /dev/VolGroup00/hadooplogsvol
mkdir -p /hadooplogs
echo "/dev/VolGroup00/hadooplogsvol /hadooplogs ext3 defaults 0 2" >> /etc/fstab
mount /hadooplogs
chown aphdp:aphdp /hadooplogs

Hard Drives and HDFS storage: HDFS is a user-level FS implemented in Java; it is not an in-kernel FS. HDFS stores its data on underlying native FSs, in our case ext3. All the HDDs except the first one will be used for HDFS data storage. The first one is used to store the OS, the Hadoop install and the Hadoop logs. For HDFS we need the FSs mounted with the noatime option (inode access-time updates are not needed and not recommended for HDFS configurations). All the HDFS paths are mounted as /dfs/<hdnumber>

Eg: /dfs/1, /dfs/2, … We will be using the path /dfs/<hdnumber>/apache for the Apache HDFS configuration. This was done based on the requirement that R&D is going to use the same servers and the same HDDs for other distributions of Hadoop.

Eg: /dfs/1/apache, /dfs/2/apache, …


Here is a small piece of script to create the volumes for HDFS use. Run this on all the nodes of the cluster:

i=1
for disk in `fdisk -l | grep Disk | grep -v c0d0 | awk '{ print $2 }' | cut -d":" -f1`
do
  mkfs.ext3 $disk
  mkdir -p /dfs/${i}
  echo "$disk /dfs/${i} ext3 rw,noatime 0 2" >> /etc/fstab
  mount /dfs/${i}
  mkdir /dfs/${i}/apache
  chown aphdp:aphdp /dfs/${i}/apache
  i=`expr $i + 1`
done

Name Node and Secondary Name Node: The NameNode is the heart and brain of HDFS, and it is a single point of failure for HDFS. Make sure the NameNode is fully redundant by all means (RAID10, dual power supplies). The NameNode stores the metadata in a set of defined directories called Namedirs; these should also reside on a RAID10 device. The Namedirs should be sized at least as large as the RAM on this server. We use at least 3 dirs to store the metadata; all of these dirs hold redundant copies of the same data, and all of them are needed. The Secondary NameNode is a checkpoint node: it is used to checkpoint the NameNode metadata. The SNN should have the same amount of RAM as the NN. The SNN uses a configured checkpoint dir to do the checkpointing operation; the checkpoint dir should have the same amount of space as we configured on the NN. Here is a piece of script to create the NameNode Namedirs; unlike the other scripts, this should only be run on the NameNode:

lvcreate -L 64G -n apnamedirvol VolGroup00
mkfs.ext3 /dev/VolGroup00/apnamedirvol
mkdir -p /dfs/apachenamedir
mkdir -p /dfs/1/apachenamedir
mkdir -p /dfs/2/apachenamedir
echo "/dev/VolGroup00/apnamedirvol /dfs/apachenamedir ext3 defaults 0 2" >> /etc/fstab
mount /dfs/apachenamedir
chown aphdp:aphdp /dfs/apachenamedir /dfs/1/apachenamedir /dfs/2/apachenamedir

Here is a piece of script to create the Secondary NameNode checkpoint dir; unlike the other scripts, this should only be run on the Secondary NameNode:

lvcreate -L 64G -n apchkptdirvol VolGroup00


mkfs.ext3 /dev/VolGroup00/apchkptdirvol
mkdir -p /dfs/apachechechkptdir
echo "/dev/VolGroup00/apchkptdirvol /dfs/apachechechkptdir ext3 defaults 0 2" >> /etc/fstab
mount /dfs/apachechechkptdir
chown aphdp:aphdp /dfs/apachechechkptdir

So far we have prepared the OS and HDDs to configure Hadoop. The following steps actually configure the Hadoop daemons.

Hadoop Environment for Daemons: Set the HADOOP_HOME and PATH environment variables for the aphdp user in ~aphdp/.bash_profile:

HADOOP_HOME=/hadoop/apache/hadoop-0.20.2
export HADOOP_HOME
PATH=/usr/java/jdk1.6.0_26/bin:$HADOOP_HOME/bin:$HOME/bin:$PATH
export PATH

Set the Java, Hadoop logs, and Hadoop PID variables in the hadoop-env.sh file in the Hadoop conf directory: $HADOOP_HOME/conf/hadoop-env.sh (absolute path: /hadoop/apache/hadoop-0.20.2/conf/hadoop-env.sh):

export JAVA_HOME=/usr/java/jdk1.6.0_26

export HADOOP_LOG_DIR=/hadooplogs

export HADOOP_PID_DIR=/hadooplogs/pids

HDFS Daemons Configuration: (The HDFS high-level architecture diagram is not reproduced here.)


NameNode: has all the metadata stored in the Namedirs. Secondary NN: used to do a checkpoint operation on the stored NameNode metadata. DataNodes: store the actual data and exchange block metadata and heartbeats with the NameNode.

$HADOOP_HOME/conf/core-site.xml and $HADOOP_HOME/conf/hdfs-site.xml contain the parameters for the HDFS daemons.

$HADOOP_HOME/conf/core-site.xml (/hadoop/apache/hadoop-0.20.2/conf/core-site.xml) params: This file needs to be configured and pushed to all the nodes in the cluster.

fs.default.name: hdfs://namenode:8020. This is the parameter the DataNode daemons look at to find the NameNode and send their heartbeats to.

fs.checkpoint.dir: /dfs/chechkptdir. This is the dir on the SNN where the checkpoint operation will happen.

topology.script.file.name: /hadoop/apache/hadoop-0.20.2/rackaware/rascript.sh. This is the absolute path of the shell script that actually decides the rack awareness in the cluster:

#!/bin/sh
/bin/basename `/bin/grep -w $1 /hadoop/apache/hadoop-0.20.2/rackaware/rack* | /bin/cut -d":" -f1`

# Note: For this script the rack<n> files are pre-populated manually to create the logical racking
# of the nodes. This script accepts the hostname as the input param and returns the rack number.
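Putting the parameters above together, a minimal core-site.xml for this cluster might look like the following sketch (namenode is the placeholder hostname used above; adjust to the real host):

<?xml version="1.0"?>
<configuration>
  <!-- NameNode address that DataNodes and clients connect to. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:8020</value>
  </property>
  <!-- Checkpoint directory used by the Secondary NameNode. -->
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/dfs/chechkptdir</value>
  </property>
  <!-- Rack-awareness script. -->
  <property>
    <name>topology.script.file.name</name>
    <value>/hadoop/apache/hadoop-0.20.2/rackaware/rascript.sh</value>
  </property>
</configuration>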

$HADOOP_HOME/conf/hdfs-site.xml (/hadoop/apache/hadoop-0.20.2/conf/hdfs-site.xml) params: This file needs to be configured and pushed to all the nodes in the cluster.

dfs.block.size: 134217728. The default HDFS block size, set here to 128 MB.

dfs.data.dir: /dfs/1/apache,/dfs/2/apache, …, /dfs/<n>/apache. The list of dirs on a DataNode where the actual HDFS data is stored. This param needs to be configured per machine, depending upon the local disks and the filesystems we configured earlier.

dfs.name.dir: /dfs/apachenamedir,/dfs/1/apachenamedir,/dfs/2/apachenamedir. This is the list of dirs on the NameNode (only) where it stores the metadata information about HDFS. All the dirs hold the same data; they are all backups of the same copy. The best practice is to have one NFS mount in this list.

dfs.backup.address: secondarynamenode. The host name of the Secondary NN. This is how the NN knows which node is the Secondary NN.

dfs.secondary.http.address: secondarynamenode:50090. This param is needed; after the checkpoint operation the NN fetches the merged metadata image from the SNN at this address:port (HTTP).

dfs.http.address: namenode:50070. This param is needed; the SNN fetches the metadata image and edit log from the NN at this address:port (HTTP) to perform the checkpoint operation.
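A minimal hdfs-site.xml sketch reflecting the parameters above (the hostnames are the placeholders used in this section, and the dfs.data.dir list must match each node's actual /dfs/<n>/apache mounts):

<?xml version="1.0"?>
<configuration>
  <!-- 128 MB HDFS block size. -->
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
  <!-- Extend this list per node to cover all local /dfs/<n>/apache dirs. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/dfs/1/apache,/dfs/2/apache</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/dfs/apachenamedir,/dfs/1/apachenamedir,/dfs/2/apachenamedir</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>secondarynamenode:50090</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>namenode:50070</value>
  </property>
</configuration>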

MapReduce Daemons Configuration: (The high-level MapReduce architecture diagram is not reproduced here.)

MapReduce is the processing part of Hadoop. Job Tracker: accepts the jobs, divides the jobs into tasks (maps and reduces), and dispatches them to the TaskTrackers. Task Trackers: accept the tasks from the Job Tracker and execute them on the individual machines, usually on the data stored locally on them.

$HADOOP_HOME/conf/mapred-site.xml (/hadoop/apache/hadoop-0.20.2/conf/mapred-site.xml) params: This file contains the parameters for the MapReduce daemons and needs to be configured and pushed to all the nodes in the cluster.

mapred.child.java.opts: -Xmx2048M. This is the Java heap size for the Java programs (task JVMs) to be executed on the nodes.

mapred.job.tracker: psrhaqajt1:8021. This is the Job Tracker address, which needs to be known to the Task Trackers. The TTs heartbeat to the JT, and the JT dispatches the tasks to the TTs.

mapred.jobtracker.taskScheduler: org.apache.hadoop.mapred.FairScheduler. This is the scheduler used for the jobs to be executed.

mapred.fairscheduler.allocation.file: /hadoop/apache/hadoop-0.20.2/conf/allocations.xml. The allocations file for the scheduler specified above; it is required for the Fair Scheduler. By default this file does not exist, so we need to create one.

mapred.tasktracker.map.tasks.maximum: number of cores / 1.5. This needs to be set based on the number of cores on the server; it is a per-node/server config variable.

mapred.tasktracker.reduce.tasks.maximum: number of cores / 1.5. This also needs to be set based on the number of cores on the server; it is a per-node/server config variable.
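As an illustration, a minimal mapred-site.xml sketch with the parameters above; the task maximums shown assume an 8-core server (8 / 1.5, rounded down to 5) and are per-node values, not cluster-wide settings:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>psrhaqajt1:8021</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048M</value>
  </property>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/hadoop/apache/hadoop-0.20.2/conf/allocations.xml</value>
  </property>
  <!-- Per-node values; this example assumes an 8-core server. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>5</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>5</value>
  </property>
</configuration>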

The default scheduler for Hadoop is the FIFO scheduler. When we select the FairScheduler as the task scheduler we need to copy the scheduler jar file from the $HADOOP_HOME/contrib/fairscheduler directory to the $HADOOP_HOME/lib directory:

# cp -p $HADOOP_HOME/contrib/fairscheduler/* $HADOOP_HOME/lib/

For the fair scheduler we need at least an empty allocations file at the path specified in mapred.fairscheduler.allocation.file (as mentioned in the table above). Here is the empty allocations file:

<?xml version="1.0"?>
<allocations>
</allocations>
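The empty file is enough to start the daemons. If pools are needed later, the allocations file can define per-pool minimum slots; the pool name and numbers below are purely illustrative assumptions:

<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical pool reserving a few slots for an "etl" workload. -->
  <pool name="etl">
    <minMaps>4</minMaps>
    <minReduces>2</minReduces>
  </pool>
  <!-- Illustrative default cap on concurrent jobs per user. -->
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>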

Masters and Slaves: There is NO concept of masters and slaves in the Hadoop architecture, but there are a couple of files, masters and slaves, in the $HADOOP_HOME/conf directory. These are used by the start and stop scripts. HDFS and MapReduce have different sets of masters and slaves.

For HDFS:
masters: Namenode, SecondaryNameNode
slaves: Datanode1, DN2, DN3, ..., DNn


The HDFS set of masters and slaves files makes sense only on the NameNode. When we run the $HADOOP_HOME/bin/start-dfs.sh script, the respective daemons will be started on the defined nodes, since the SSH trust has been established.

For MapReduce:
masters: JobTracker
slaves: TaskTracker1, TT2, TT3, ..., TTn
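For illustration only, with placeholder hostnames (nn1, snn1, dn1..dn4 are assumptions, not the real cluster names), the files look roughly like this; lines starting with # are stripped by the start scripts:

# $HADOOP_HOME/conf/masters on the NameNode, per the HDFS table above
nn1
snn1

# $HADOOP_HOME/conf/slaves (on the NameNode and, for MapReduce, on the JobTracker), one DN/TT host per line
dn1
dn2
dn3
dn4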

The MapReduce set of masters and slaves files makes sense only on the JobTracker node. When we run the $HADOOP_HOME/bin/start-mapred.sh script, the respective daemons will be started on the defined nodes, since the SSH trust has been established.

Start/Stop Hadoop Daemons: The HDFS and MapReduce daemons are independent and can run independently. HDFS is the data storage; MapReduce is the processing on that stored data.

HDFS daemons start-up/shutdown:

$HADOOP_HOME/bin/start-dfs.sh (/hadoop/apache/hadoop-0.20.2/bin/start-dfs.sh): This script needs to be run on the NN as the aphdp user. It starts the NN and SNN daemons on the hosts defined in the masters file, as per the config defined in core-site.xml and hdfs-site.xml, and starts the DN daemons on the hosts defined in the slaves file. This script uses the SSH trust between the hosts.

$HADOOP_HOME/bin/stop-dfs.sh (/hadoop/apache/hadoop-0.20.2/bin/stop-dfs.sh): This script needs to be run on the NN as the aphdp user. It stops the NN, SNN and DN daemons on all the nodes defined in the masters and slaves files.

MapReduce daemons start-up/shutdown:

$HADOOP_HOME/bin/start-mapred.sh (/hadoop/apache/hadoop-0.20.2/bin/start-mapred.sh): This script needs to be run on the JT node as the aphdp user. It starts the JT daemon on the host defined in the masters file, as per the config defined in mapred-site.xml, and starts the TT daemons on the hosts defined in the slaves file. This script uses the SSH trust between the hosts.

$HADOOP_HOME/bin/stop-mapred.sh (/hadoop/apache/hadoop-0.20.2/bin/stop-mapred.sh): This script needs to be run on the JT node as the aphdp user. It stops the JT and TT daemons on all the nodes defined in the masters and slaves files.
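A sketch of the day-to-day start-up sequence and a quick sanity check, run as the aphdp user (the jps and hadoop dfsadmin commands come with the JDK and this Hadoop release respectively):

# On the NameNode: bring up HDFS (NN, SNN and all DNs).
$HADOOP_HOME/bin/start-dfs.sh

# On the JobTracker node: bring up MapReduce (JT and all TTs).
$HADOOP_HOME/bin/start-mapred.sh

# Sanity checks:
jps                                        # lists the Hadoop daemons running on the local node
$HADOOP_HOME/bin/hadoop dfsadmin -report   # shows live DataNodes and HDFS capacity

# Shutdown is the reverse: stop-mapred.sh on the JT node, then stop-dfs.sh on the NN.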


(A high-level daemons/servers architecture diagram, showing both the HDFS and MapReduce daemons running, is not reproduced here.)

Note: The plan is to have the NN, SNN and JT run on dedicated nodes. As we have limited hardware for Hadoop, we configured the NN/SNN/JT nodes to run the DN/TT daemons as well. Please note that in this document the NN/SNN/DN/JT/TT terms are used interchangeably to refer to the nodes as well as to the daemons running on the respective nodes; the meaning needs to be understood from the context.


4-node cluster set-up: This has been set up with 4 nodes. We have used the rack-awareness script to divide the cluster into logical racks. All the nodes in the cluster run the DN/TT daemons; the NN, SNN and JT run on individual servers. The number of tasks executed by the TTs on the NN/SNN/JT nodes was reduced manually, as those nodes are used for their other main purposes.


8-node cluster set-up: This has been set up with 8 nodes. We have used the rack-awareness script to divide the cluster into logical racks. All the nodes in the cluster run the DN/TT daemons; the NN, SNN and JT run on individual servers. The number of tasks executed by the TTs on the NN/SNN/JT nodes was reduced manually, as those nodes are used for their other main purposes.
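As an illustration of that manual reduction (the values are assumptions, not the cluster's actual settings), the mapred-site.xml on the NN/SNN/JT hosts simply carries lower per-node task maximums than the regular DataNodes:

<!-- In mapred-site.xml on the NN/SNN/JT hosts only; illustrative values. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>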