DESCRIPTION
I attended an open source seminar ("open source framework for practical hadoop"). This is my presentation file.
Hadoop meets (R)?ex - How to use Rexify to construct a Hadoop cluster
Original Rex base image http://rexify.org
2013-08-26
Original Hadoop image http://hadoop.apache.org
Background
Mission
• I'm not a S/W developer any more
• I'm not a system engineer
• But I had to construct a Hadoop cluster
– Moreover, in various types...
http://www.gkworld.com/product/GKW49102/Simpsons-Cruel-Fate-Why-Mock-Me-Homer-Magnet-SM130.html
Hadoop is
• A Hadoop cluster consists of many Linux boxes
• Hadoop has many configuration files and parameters
• Besides Hadoop, a variety of Hadoop ecosystem S/W should be installed
• Beyond Hadoop & the Hadoop eco, many other kinds of S/W should be installed & configured
– Tomcat, Apache, DBMS, other development tools, other utils/libs…
• And so on …
At first,
• I did it manually
– Install & Configure..
– Install & Configure
– Install & Configure
– Install & Configure
– ….
Img http://www.construire-en-vendee.fr/la-construction-dune-maison-de-a-a-z-les-fondations.html
Tiresome!!
• It is a really tedious & horrible job!!
Img http://cuteoverload.com/2009/08/17/your-story-has-become-tiresome/
Finding another way
• I decided to find another way!!
• I started to survey other solutions
Img http://www.101-charger.com/wallpapers/21526,jeux,gratuit,pathfinder,7.html
Survey
A variety of solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://www.cbsnews.com/8301-505125_162-31042083/duke-research-monkeys-like-humans-want-variety/
Hadoop Managers
Hortonworks Management Center™
Cloudera's CDH™
* Apache Ambari
Provisioning Tools
Fabric (Python)
Parallel SSH Tools
http://dev.naver.com/projects/dist/
https://code.google.com/p/parallel-ssh/
http://sourceforge.net/projects/clusterssh/
Examination(1/3)
• Hadoop Managers
↑ Specialized in Hadoop
↑ Already proven
↑ Comfortable
↓ Commercial or restrictive license
↓ No support for other apps/libs beyond Java/Hadoop/Hadoop eco
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://www.bizbuilder.com/how-much-does-an-inexpensive-franchise-cost/
I have no money; I want to use my extra resources elsewhere
※ Recently, there have been many changes in license policy. Please check it!!
Examination(2/3)
• Other provisioning tools
↑ Powerful
↑ Many features
↑ Detailed control
↓ Complexity
↓ Requires a lot of study
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
source: www.mbc.co.kr
I don’t like to study
Examination(3/3)
• Other parallel SSH tools
↑ Simple
↑ Useful
↑ No need to install an extra agent
↓ Some features are insufficient
↓ All exceptional cases must be handled by you
Other solutions
• Hadoop Managers
• Provisioning Tools
• Parallel SSH Tools
http://bluebuddies.com/Smurfs_Panini_Smurf_Stickers-7.htm
Yes, I'm greedy
● Simple &
● Powerful &
● No cost &
● Expandable &
● Smart way???
http://plug.hani.co.kr/heihei9999/459415
So, what is it?
I have found a solution
http://rexify.org/
It is Rex!!
● uses just ssh
● no agent required
● seamless integration
● no conflicts
● easy to use
● easy to extend
● easy to learn
● can use Perl's advanced power
http://swapiinthehouse.blogspot.kr/2012/02/final-term-was-over-and-let-holiday.html
Rex is
Rex options
[onycom@onydev: ~]$ rex -h
(R)?ex - (Remote)? Execution
  -b    Run batch
  -e    Run the given code fragment
  -E    Execute task on the given environment
  -H    Execute task on these hosts
  -G    Execute task on these group
  -u    Username for the ssh connection
  -p    Password for the ssh connection
  -P    Private Keyfile for the ssh connection
  -K    Public Keyfile for the ssh connection
  -T    List all known tasks.
  -Tv   List all known tasks with all information.
  -f    Use this file instead of Rexfile
  -h    Display this help
  -M    Load Module instead of Rexfile
  -v    Display (R)?ex Version
  -F    Force. Don't regard lock file
  -s    Use sudo for every command
  -S    Password for sudo
  -d    Debug
  -dd   More Debug (includes Profiling Output)
  -o    Output Format
  -c    Turn cache ON
  -C    Turn cache OFF
  -q    Quiet mode. No Logging output
  -Q    Really quiet. Output nothing.
  -t    Number of threads to use
Basic Grammar - Authentication
From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
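(The original slide content here was an image; below is a minimal reconstruction of Rexfile authentication, with placeholder values, not the slide verbatim.)

user "root";                                   # ssh user for every task
password "blabla";                             # password-based auth
# or key-based auth instead:
# key_auth;
# private_key "/home/onycom/.ssh/id_rsa";
# public_key  "/home/onycom/.ssh/id_rsa.pub";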
Basic Grammar - Server Group
From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
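(This slide was also an image; a minimal sketch matching the groups used later in this deck:)

group "hadoop_node" => "vmaster", "vnode[0..2]";               # NN + DNs
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";   # every VM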
Basic Grammar - Task
From>> http://www.slideshare.net/jfried/rex-25172864?from_search=3
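(And a minimal task sketch; the 'uptime' task is my illustration, not from the slide:)

desc "Show uptime of all hadoop nodes";
task "uptime", group => "hadoop_node", sub {
    say run "uptime";   # run the command on each host and print its output
};
# execute with: rex uptime   (rex looks for ./Rexfile by default)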
Let's get down to the main subject!
Construct hadoop with (R)?ex
This presentation is
● How to easily install & configure Hadoop
– Not "How to optimize & tune performance"
● For easy understanding,
– exceptional cases are excluded
● No explanation of OS installation
– no discussion of "PXE/kickstart"
● Reduced environment conditions
– ex) security, network, other servers/Apps, …
● I'll avoid the Perl language as much as possible
– It is not needed
● TMTOWTDI
– Even if it's not refined, I'll show as many different ways as possible
Network
vmaster(Name node/Job Tracker)
L2 switch
Onydev(Provision Server)
vnode0(Data node)
vnode1(Data node)
vnode2(Data node)
vmonitor(Monitoring Server)
Topology [spec]
Machine : 6 ea (hadoop uses just 4 ea)
OS : CentOS 6.4 64bit
Memory : 32GB (NN), 16GB (DN)
CPU : 4 core (i7, 3.5GHz)
Interface : 1G Ethernet
Disk : 250G SSD, 1T HDD
※ I’ve configured NN and JT on the same machine
Our hadoop env. is
● There is one control account
– 'hadoop-user'
● Hadoop & the Hadoop eco are installed under the 'hadoop-user' account
Prepare – All machines
● On each machine,
– the same OS version should be installed
(at least within the hadoop cluster)
– it has its own fixed IP address
– it can be reached via SSH
– it has a normal user account & its sudoers entry edited
(optional)
Prepare – Provision Server(1/2)
● Development tools & environment
– ex: gcc, glib, make/cmake, perl, etc...
● Install Perl modules
– yum install perl-ExtUtil*
– yum install perl-CPAN*
– execute the 'cpan' command
Prepare – Provision Server(2/2)
● After executing the 'cpan' command
– cpan 3> install Rex
– You may get a failure!!
– This whole story is based on CentOS 6.XX
● So, I recommend 'perlbrew'
– If you want to use more of Perl's power
※ My guess is that Red Hat may dislike the Perl language
To Install Rex (1/3)
adduser brew-user
passwd brew-user
curl -L http://install.perlbrew.pl | bash
cd /home
chmod 755 brew-user
cd ~brew-user
chmod -R 755 ./perl5
echo "export PERLBREW_ROOT=\"/home/brew-user/perl5/perlbrew\"" >> /home/brew-user/.bashrc
##Append "$PERLBREW_ROOT/bin" to PATH on the .bashrc
source ~brew-user/.bashrc
To Install Rex (2/3)
## In the brew-user account,
perlbrew init
perlbrew available
### Choose the recommended stable perl 5.18.0 (at this time: 2013/07/11)
perlbrew install perl-5.18.0
perlbrew switch perl-5.18.0
[brew-user@onydev: ~]$perlbrew switch perl-5.18.0
Use of uninitialized value in split at /loader/0x1f2f458/App/perlbrew.pm line 34.
.........
A sub-shell is launched with perl-5.18.0 as the activated perl. Run 'exit' to finish it.
● cpanm Rex
● cpan
● http://rexify.org/get/
To Install Rex (3/3)
Test for Rex
[onycom@onydev: ~]$ which rex
/home/brew-user/perl5/perlbrew/perls/perl-5.18.0/bin/rex
[onycom@onydev: ~]$ rex -H localhost -u onycom -p blabla -e "say run 'hostname'"
[2013-10-08 15:36:06] INFO - Running task eval-line on localhost
[2013-10-08 15:36:06] INFO - Connecting to localhost:22 (onycom)
[2013-10-08 15:36:07] INFO - Connected to localhost, trying to authenticate.
[2013-10-08 15:36:07] INFO - Successfully authenticated on localhost.
onydev
[onycom@onydev: ~]$
● A Rexfile is just a plain text file
/etc/hosts - Provision Server
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
... skip .................
192.168.2.100 onydev
... skip .................
192.168.2.51 vmaster
192.168.2.52 vnode0
192.168.2.53 vnode1
192.168.2.54 vnode2
192.168.2.59 vmonitor
SSH connection
● Between
– the Provision server and the other target servers
– the Hadoop master node and the data nodes
[onycom@onydev: ~]$ ssh-keygen -t rsa
Enter file in which to save the key (/home/onycom/.ssh/id_rsa):
Created directory '/home/onycom/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/onycom/.ssh/id_rsa.
Your public key has been saved in /home/onycom/.ssh/id_rsa.pub.
Prepare SSH public key
Create User
use Rex::Commands::User;

group "hadoop_node" => "vmaster", "vnode[0..2]";
group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor";

my $USER = "hadoop-user";

desc "Create user";
task "new_user", group => "all_vm_node", sub {
    create_user "$USER",
        home     => "/home/$USER",
        comment  => "Account for _hadoop",
        password => "blabla";
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> new_user
Setup SSH for user
desc "setup ssh for user"; task "setup_ssh_user", group => “all_vm_node”, sub { run "mkdir /home/$USER/.ssh"; file "/home/$USER/.ssh/authorized_keys", source => "/home/onycom/.ssh/id_rsa.pub", owner => "$USER", group => "$USER", mode => 644; run "chmod 700 /home/$USER/.ssh"; };
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u hadoop-user -p <pass> setup_ssh_user
※ OK!! Done. Now you can log in to each server without a password. Then do the same thing between the hadoop NN and the DN nodes (a sketch follows).
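(The slides don't show that NN-to-DN step; a hedged sketch of one way to do it: append vmaster's public key to each data node's authorized_keys so start-all.sh can reach them without a password. The key string below is a placeholder.)

task "auth_master_key", "vnode[0..2]", sub {
    # placeholder: paste the real content of vmaster's ~/.ssh/id_rsa.pub here
    my $key = "ssh-rsa AAAA... hadoop-user\@vmaster";
    append_if_no_such_line "/home/hadoop-user/.ssh/authorized_keys", $key;
};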
Install packages
parallelism 4;

desc "Install packages for java";
task "install_java", group => "all_vm_node", sub {
    install package => "java-1.6.*";
};
[onycom@onydev: Prov]$ rex -f ./hd-su.Rexfile -u root -p <pass> install_java
• Some packages should be installed globally (ex: java, wget, etc.)
• For hadoop 1.1.x, java 1.6 is recommended.
• Use the parallelism keyword (if a long run time is expected)
Install hadoop(1/3)
user "hadoop-user"; private_key "/home/onycom/.ssh/id_rsa"; public_key "/home/onycom/.ssh/id_rsa.pub"; group "hadoop_node" => "vmaster", "vnode[0..2]" ; group "all_vm_node" => "vmaster", "vnode[0..2]", "vmonitor"; desc "prepare_dir"; task "prepare_dir", group=>"hadoop_node", sub { run "mkdir Work"; run "mkdir Download"; run "mkdir src“; run “mkdir tmp”; };
hd1.Rexfile
[onycom@onydev: Prov]$ rex -f ./hd1.Rexfile prepare_dir
Install hadoop(2/3)
desc "hadoop 1.1.2 download with wget"; task "get_hadoop", group=>"hadoop_node", sub { my $f = run "wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz", cwd=>"/home/hadoop-user/src"; say $f; }; ...skip.... desc "pig 0.11.1 download with wget"; task "get_pig", group=>"hadoop_node", sub { my $f = run "wget http://apache.tt.co.kr/pig/pig-0.11.1/pig-0.11.1.tar.gz", cwd=>"/home/hadoop-user/src"; say $f; };
! The Hadoop version & the hadoop eco S/W versions should be matched. This topic is outside the scope of this presentation.
Install hadoop(3/3)
my $HADOOP_SRC_DIR = "/home/hadoop-user/src";

desc "unzip hadoop source files";
task "unzip_src", group => "hadoop_node", sub {
    run "tar xvfz hadoop-1.1.2.tar.gz", cwd => "$HADOOP_SRC_DIR";
    run "tar xvfz hive-0.11.0.tar.gz", cwd => "$HADOOP_SRC_DIR";
    run "tar xvfz pig-0.11.1.tar.gz", cwd => "$HADOOP_SRC_DIR";
};

desc "make link for hadoop source files";
task "link_src", group => "hadoop_node", sub {
    run "ln -s ./hadoop-1.1.2 ./hadoop", cwd => $HADOOP_SRC_DIR;
    run "ln -s ./hive-0.11.0 ./hive", cwd => $HADOOP_SRC_DIR;
    run "ln -s ./pig-0.11.1 ./pig", cwd => $HADOOP_SRC_DIR;
};
Configuration files(1/3)
● System
– /etc/hosts
● Hadoop(../hadoop/conf)
– masters & slaves
– hadoop-env.sh
– hdfs-site.xml
– core-site.xml
– mapred-site.xml
Configuration files(2/3)
● Hadoop ecosystem & other tools
– ex) Ganglia
– ex) Flume – agent/collector/master
– ex) Oozie or flamingo
– These are skipped in this PPT.
● User rc file
These are just defaults & optimization is not considered
Configuration files(3/3)
[Diagram] The Provision Server holds the Hadoop configuration files (../hadoop_conf_repo) and distributes them to the Hadoop NN and Hadoop DN 1..n with (R)?ex over SSH/SCP.
※ Of course, this is just my policy
Edit hosts file
my $target_file = "/etc/hosts";
my $host_list = <<'END';
192.168.2.51 vmaster
192.168.2.52 vnode0
192.168.2.53 vnode1
192.168.2.54 vnode2
192.168.2.59 vmonitor
END

desc "Add hosts";
task "add_host", group => "all_vm_node", sub {
    my $exist_cnt = cat $target_file;
    my $fh = file_write $target_file;
    $fh->write($exist_cnt);
    $fh->write($host_list);
    $fh->close;
};
※ You can consider the 'Augeas' tool for handling system files. Please refer to 'Rex::Augeas' or http://augeas.net
Setup .bashrc for user(1/2)
... skip .....
my $hadoop_rc = <<'END';
#Hadoop Configuration
export JAVA_HOME="/usr/lib/jvm/jre-1.6.0-openjdk.x86_64"
export CLASSPATH="$JAVA_HOME/lib:$JAVA_HOME/lib/ext"
export HADOOP_USER="/home/hadoop-user"
export HADOOP_SRC="$HADOOP_USER/src"
export HADOOP_HOME="$HADOOP_USER/hadoop"
export PIG_HOME="$HADOOP_SRC/pig"
export HIVE_HOME="$HADOOP_SRC/hive"
END
... skip .....
Setup .bashrc for user(2/2)
desc "setup hadoop-user's .rc file"; task "setup_rc_def", group=>"hadoop_node", sub { my $fh = file_append ".bashrc"; $fh->write($base_rc); $fh->write($hadoop_rc); $fh->close(); }; desc "setup hadoop master node .rc file"; task "setup_rc_master", "vmaster", sub { my $fh = file_append ".bashrc"; $fh->write($master_rc); $fh->close(); }; .......... skip ............
Configure Hadoop(1/6)
● ‘masters’
[hadoop-user@vmaster: ~]$cd hadoop/conf
[hadoop-user@vmaster: conf]$cat masters
vmaster
● ‘slaves’
[hadoop-user@vmaster: conf]$cat slaves
vnode0
vnode1
vnode2
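(The 'masters' and 'slaves' files can be pushed from the provision server the same way the *-site.xml files are uploaded in (6/6) below; a hedged sketch, assuming the same $HD_CNF, $CNF_REPO and $HADOOP_USER variables defined there:)

my $MASTERS = "masters";
my $SLAVES  = "slaves";
task "upload_topology", group => "hadoop_node", sub {
    for my $f ($MASTERS, $SLAVES) {
        file "$HD_CNF/$f",
            owner  => $HADOOP_USER,
            group  => $HADOOP_USER,
            source => "$CNF_REPO/$f";
    }
};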
Configure Hadoop(2/6)
• hadoop-env.sh
... skip ...
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64   #hadoop-user

# Remove warning message: "HADOOP_HOME is deprecated"
export HADOOP_HOME_WARN_SUPPRESS=TRUE
Configure Hadoop(3/6)
• hdfs-site.xml
... skip ...
<configuration>
  <!-- modified by hadoop-user -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop-user/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop-user/hdfs/data</value>
  </property>
</configuration>
※ This 'replication' value depends on our env.
Configure Hadoop(4/6)
• core-site.xml
... skip ...
<configuration>
  <!-- modified by hadoop-user -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://vmaster:9000</value>
  </property>
</configuration>
Configure Hadoop(5/6)
• mapred-site.xml
.. skip ..
<property>
  <name>mapred.job.tracker</name>
  <value>vmaster:9001</value>
</property>
<!-- 2013.9.11. Increase the timeout setting for the "fail to report status" error -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
  <description>The number of milliseconds before a task will be terminated
  if it neither reads an input, writes an output, nor updates its status string.
  </description>
</property>
※ This 'timeout' value just depends on our env.
Configure Hadoop(6/6)
my $CNF_REPO = "hadoop_conf_repo";
... skip ...
my $MAPRED = "mapred-site.xml";
task "upload_mapred", group => "hadoop_node", sub {
    file "$HD_CNF/$MAPRED",
        owner  => $HADOOP_USER,
        group  => $HADOOP_USER,
        source => "$CNF_REPO/$MAPRED";
};

my $CORE_SITE = "core-site.xml";
task "upload_core", group => "hadoop_node", sub {
    file "$HD_CNF/$CORE_SITE",
        owner  => $HADOOP_USER,
        group  => $HADOOP_USER,
        source => "$CNF_REPO/$CORE_SITE";
};
... skip ....
Before going any further
● Stop selinux
– if it is enforcing
● Modify the iptables policy
– I recommend stopping it while the configuration work is going on (a sketch follows)
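(The slides don't show a task for this; a hedged sketch of how it could be done with Rex on CentOS 6, run as root, e.g. with -u root -p <pass>:)

desc "relax selinux & stop iptables while configuring";
task "relax_security", group => "all_vm_node", sub {
    run "setenforce 0";             # selinux: enforcing -> permissive (until reboot)
    run "service iptables stop";    # CentOS 6 service style
};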
Let's start hadoop
● log in to the master node as hadoop-user
– ssh -X hadoop-user@vmaster
● format the hadoop namenode
– hadoop namenode -format
● execute the start script
– ex) start-all.sh
(a remote-execution sketch follows)
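(These steps can also be driven from the provision server; a hedged sketch assuming hadoop is reachable at ~/hadoop as in the .bashrc above. Note that 'hadoop namenode -format' may ask for confirmation if the name dir already exists.)

desc "format namenode & start the cluster";
task "start_hadoop", "vmaster", sub {
    run "./bin/hadoop namenode -format", cwd => "/home/hadoop-user/hadoop";
    run "./bin/start-all.sh", cwd => "/home/hadoop-user/hadoop";
};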
Check hadoop status
[hadoop-user@vmaster: ~]$ jps -l
22161 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
22260 org.apache.hadoop.mapred.JobTracker
21968 org.apache.hadoop.hdfs.server.namenode.NameNode
27896 sun.tools.jps.Jps
[hadoop-user@vmaster: ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - hadoop-user supergroup          0 2013-10-07 20:33 /tmp
※ It seems to be OK. Really?
But, life is not easy
http://www.trulygraphics.com/tg/weekend/
Check status for all DNs
task "show_jps", "vnode[0..2]", sub {
    say run "hostname";
    my $r = run "jps";
    say $r;
};
[onycom@onydev: Prov]$ rex -f ./hd2.Rexfile show_jps
vnode0
12682 Jps
12042 TaskTracker
11934 DataNode
vnode1
11669 DataNode
11778 TaskTracker
12438 Jps
vnode2
11128 DataNode
11237 TaskTracker
11895 Jps
If there is some problem,
http://blog.lib.umn.edu/isss/undergraduate/2011/11/you-do-have-any-tech-problem.html
● Check again
– /etc/hosts
– selinux & iptables
– name & data dirs/permissions in hdfs
– and so on...
(on each node; a quick-check sketch follows)
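(A hedged sketch of a quick-check task for those items, in the same style as show_jps; run as root so the iptables status works:)

task "check_basics", group => "all_vm_node", sub {
    say run "hostname";
    say run "getenforce";                          # selinux mode
    say run "service iptables status | head -2";   # firewall state
    say run "ls -ld /home/hadoop-user/hdfs";       # hdfs dirs & permissions
};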
If you did not meet any problems, or fixed them,
Now you have hadoop
https://hadoopworld2011.eventbrite.com/
Automatic Mgmt./Prov. solution
&
Advanced Challenge
source: yonhap
What more can we do?(1/2)
● add/remove a data node (see the sketch after this list)
● add/remove storage
● Integrate with monitoring
– ex: Ganglia/Nagios
● Integrate with other hadoop eco
– Flume, flamingo, Oozie
● Integrate other devices or servers
– ex: Switch, DB server
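(For the first item, a hedged sketch: 'vnode3' is a hypothetical new machine already prepared like the others. Register it in 'slaves' on the master, then start its daemons on the new host.)

task "register_dn", "vmaster", sub {
    append_if_no_such_line "/home/hadoop-user/hadoop/conf/slaves", "vnode3";
};
task "start_new_dn", "vnode3", sub {
    run "./hadoop/bin/hadoop-daemon.sh start datanode", cwd => "/home/hadoop-user";
    run "./hadoop/bin/hadoop-daemon.sh start tasktracker", cwd => "/home/hadoop-user";
};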
What more can we do?(2/2)
● sophisticated hadoop parameter control
– ex: use XML parsing (see the sketch after this list)
● workflow control & batch
● backup
● periodic file system management
– ex: log files
● web GUI
● make a framework for your purpose
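(For the XML-parsing item, a hedged sketch using XML::LibXML; the module choice, file path and new value are my illustration. It changes a parameter in the repo copy before uploading it:)

use XML::LibXML;

my $doc = XML::LibXML->load_xml(location => "hadoop_conf_repo/hdfs-site.xml");
for my $prop ($doc->findnodes('/configuration/property')) {
    next unless $prop->findvalue('name') eq 'dfs.replication';
    my ($val) = $prop->findnodes('value');
    $val->removeChildNodes();
    $val->appendText('3');          # e.g. raise replication from 2 to 3
}
$doc->toFile("hadoop_conf_repo/hdfs-site.xml", 1);   # 1 = pretty-print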
Ref.
• http://hadoop.apache.org/
• http://pig.apache.org/
• http://hive.apache.org/
• http://confluence.openflamingo.org
• http://www.openankus.org
• http://www.rexify.org
• https://groups.google.com/forum/#!forum/rex-users
• http://modules.rexify.org/search?q=hadoop
http://www.projects2crowdfund.com/what-can-i-do-with-crowdfunding/
Recommended