An example Apache Hadoop Yarn upgrade

Apache Yarn Upgrade

Example upgrade

From V1 -> Yarn

Environment

Approach

Install steps

Install check

Yarn Upgrade Environment

Java OpenJDK 1.6.0_27

Ubuntu 12.04

Maven 3.0.4

Hadoop 1.2.0

Mahout 0.9

Hadoop to install

2.0.6-alpha

Full details are available from our web site, under the guides folder

Yarn Upgrade Approach

Install alongside the existing Hadoop on all nodes

Use the existing HDFS

Change cfg files on all nodes

Set up as single nodes and test via mapreduce

Create cluster and test via mapreduce

Check web GUI access

Full details are available from our web site, under the guides folder

Yarn Upgrade Install

Build with Maven into a distribution directory

mvn clean package -Pdist -Dtar -DskipTests -Pnative

The release is created under ./hadoop-dist/target/hadoop-2.0.6-alpha

Skip the tests only after the first build, to speed things up

Configure $HOME/.bashrc ( see the sketch after this list )

HADOOP_COMMON_HOME

HADOOP_HDFS_HOME

HADOOP_MAPRED_HOME

HADOOP_YARN_HOME

HADOOP_CONF_DIR

YARN_CONF_DIR

MAPRED_CONF_DIR

HADOOP_PREFIX

PATH

YARN_CLASSPATH
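A minimal sketch of the $HOME/.bashrc additions, assuming the build was unpacked to /home/hadoop/hadoop-2.0.6-alpha ( the install path and the YARN_CLASSPATH value are assumptions, adjust to suit ):

export HADOOP_PREFIX=/home/hadoop/hadoop-2.0.6-alpha   # assumed install path
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export MAPRED_CONF_DIR=$HADOOP_CONF_DIR
export YARN_CLASSPATH=$HADOOP_YARN_HOME/share/hadoop/yarn/*   # assumed value
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

Source the file ( . $HOME/.bashrc ) or log in again for the settings to take effect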

Yarn Upgrade Install

Set up core-site.xml

cd $HADOOP_COMMON_HOME/etc/hadoop

Alter values for

fs.default.name

hadoop.tmp.dir

fs.checkpoint.dir
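For example, the core-site.xml entries might look like this; the hadoop-master hostname and the /app/hadoop paths are assumptions, so use your own NameNode host and directories:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-master:9000</value>   <!-- hostname is an assumption -->
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>             <!-- example path -->
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/app/hadoop/checkpoint</value>      <!-- example path -->
</property>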

Yarn Upgrade Install

Set up hdfs-site.xml

cd $HADOOP_HDFS_HOME/etc/hadoop

Alter values for

dfs.name.dir

dfs.data.dir

dfs.http.address

dfs.secondary.http.address

dfs.https.address
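A possible set of hdfs-site.xml entries; again the hostname and paths are assumptions, while the ports shown are the Hadoop defaults:

<property>
  <name>dfs.name.dir</name>
  <value>/app/hadoop/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/app/hadoop/data</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>hadoop-master:50070</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>hadoop-master:50090</value>
</property>
<property>
  <name>dfs.https.address</name>
  <value>hadoop-master:50470</value>
</property>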

Yarn Upgrade Install

Set up yarn-site.xml

cd $YARN_CONF_DIR

Alter values for

yarn.resourcemanager.resource-tracker.address

yarn.resourcemanager.scheduler.address

yarn.resourcemanager.scheduler.class

yarn.resourcemanager.address

yarn.nodemanager.local-dirs

yarn.nodemanager.address

yarn.nodemanager.resource.memory-mb

yarn.nodemanager.remote-app-log-dir

yarn.nodemanager.log-dirs

yarn.nodemanager.aux-services

yarn.web-proxy.address
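A sketch covering a subset of the yarn-site.xml properties above; the hadoop-master hostname, the /app/hadoop path, the 8 GB memory figure and the proxy port are assumptions, the 8030/8031/8032 ports are the YARN defaults, and mapreduce.shuffle is the aux-service name used by the 2.0.x releases:

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>hadoop-master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hadoop-master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop-master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/app/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>   <!-- example figure, size to your nodes -->
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.web-proxy.address</name>
  <value>hadoop-master:8100</value>   <!-- any free port -->
</property>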

Yarn Upgrade Install

Set up mapred-site.xml

cd $MAPRED_CONF_DIR

Alter values for

mapreduce.cluster.temp.dir

mapreduce.cluster.local.dir

mapreduce.jobhistory.address

mapreduce.jobhistory.webapp.address
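A possible set of mapred-site.xml entries; the hostname and paths are assumptions, while 10020 and 19888 are the default job history ports:

<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value>/app/hadoop/mapred/temp</value>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/app/hadoop/mapred/local</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop-master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop-master:19888</value>
</property>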

Yarn Upgrade Install

Set up capacity-scheduler.xml

cd $HADOOP_YARN_HOME/etc/hadoop

Alter values for

yarn.scheduler.capacity.maximum-applications

yarn.scheduler.capacity.maximum-am-resource-percent

yarn.scheduler.capacity.resource-calculator

yarn.scheduler.capacity.root.queues

yarn.scheduler.capacity.child.queues

yarn.scheduler.capacity.child.unfunded.capacity

yarn.scheduler.capacity.child.default.capacity

yarn.scheduler.capacity.root.capacity

yarn.scheduler.capacity.root.unfunded.capacity

yarn.scheduler.capacity.root.default.capacity

yarn.scheduler.capacity.root.default.user-limit-factor

yarn.scheduler.capacity.root.default.maximum-capacity

yarn.scheduler.capacity.root.default.state

yarn.scheduler.capacity.root.default.acl_submit_applications

yarn.scheduler.capacity.root.default.acl_administer_queue

yarn.scheduler.capacity.node-locality-delay
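A sketch of capacity-scheduler.xml for a single default queue, using the values shipped with the release; the child queue entries listed above are site specific, so only the standard subset is shown:

<property>
  <name>yarn.scheduler.capacity.maximum-applications</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.state</name>
  <value>RUNNING</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
  <value>*</value>
</property>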

Yarn Upgrade Install

Start Resource Manager

cd $HADOOP_YARN_HOME

sbin/yarn-daemon.sh start resourcemanager

Start Node Manager

cd $HADOOP_YARN_HOME

sbin/yarn-daemon.sh start nodemanager

Test via a MapReduce job

cd $HADOOP_MAPRED_HOME/share/hadoop/mapreduce

$HADOOP_COMMON_HOME/bin/hadoop jar \

hadoop-mapreduce-examples-2.0.6-alpha.jar randomwriter out

Yarn Upgrade Install

The MapReduce job should end with

BYTES_WRITTEN=1073750341

RECORDS_WRITTEN=102099

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=1085699265

Job ended: Sun Aug 25 12:45:35 NZST 2013

The job took 89 seconds.

Run this test on each node being upgraded

Yarn Upgrade Install

Stop the servers

cd $HADOOP_YARN_HOME

sbin/yarn-daemon.sh stop resourcemanager

stopping resourcemanager

sbin/yarn-daemon.sh stop nodemanager

stopping nodemanager

Alter Hadoop env

cd $HADOOP_CONF_DIR

vi hadoop-env.sh

Add a JAVA_HOME definition at the end, i.e.

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386

Yarn Upgrade Install

Alter $HADOOP_CONF_DIR/slaves file

Add the slave node hostnames ( one per line ), as in the sketch below
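For example, with hypothetical hostnames:

# $HADOOP_CONF_DIR/slaves -- one slave hostname per line
hadoop-slave1
hadoop-slave2
hadoop-slave3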

Format the cluster

DON'T have the cluster running, or you will lose data

hdfs namenode -format

Now proceed to start the cluster

Yarn Upgrade Install

cd $HADOOP_COMMON_HOME

sbin/hadoop-daemon.sh --config $HADOOP_COMMON_HOME/etc/hadoop \
  --script hdfs start namenode

cd $HADOOP_COMMON_HOME

sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

cd $HADOOP_YARN_HOME

sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

cd $HADOOP_YARN_HOME

sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

cd $HADOOP_YARN_HOME

sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver

cd $HADOOP_MAPRED_HOME

sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver

Yarn Upgrade Install

Use jps to check that the servers are running

jps

5856 DataNode

6434 Jps

5776 NameNode

6181 NodeManager

6255 WebAppProxyServer

5927 ResourceManager

6352 JobHistoryServer

Then run the same mapreduce job on the cluster

Web Access

( web GUI screenshots )

Contact Us

Feel free to contact us at

www.semtech-solutions.co.nz

[email protected]

We offer IT project consultancy

We are happy to hear about your problems

You can just pay for the hours that you need to solve your problems