The Enterprise Open Source Billing System
Hadoop/HBase Installation

Note: This installation is ONLY for developer machines. It assumes that Hadoop/HBase is installed on the same machine where JB will be running. This guide will not work for a multi-node cluster. (These instructions are for the Ubuntu Linux OS.)
INSTALL JAVA
This step can be skipped if this is done on a reference machine and a JB installation already exists.
You should already have the java 1.6 jdk file in the /root folder of your VM. Follow these installation instructions:
chmod 755 jdk-6u25-linux-i586.bin
./jdk-6u25-linux-i586.bin

sudo mv jdk1.6.0_25/ /opt
cd /opt
sudo ln -s jdk1.6.0_25 jdk1.6
Create a new profile script in /etc/profile.d/ to set JAVA_HOME to the location of the unpacked JDK, then change the symlink in /etc/alternatives to point at the new java binary so that it’s available to all applications.
sudo vim /etc/profile.d/java.sh
java.sh:
#!/bin/bash
JAVA_HOME=/opt/jdk1.6
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME PATH
Run the profile script and update the software alternatives symlinks.
sudo chmod 755 /etc/profile.d/java.sh
source /etc/profile.d/java.sh

sudo update-alternatives --install "/usr/bin/java" "java" "/opt/jdk1.6/bin/java" 1
sudo update-alternatives --set java /opt/jdk1.6/bin/java
INSTALL HADOOP AND HBASE
CREATE HADOOP GROUP/USER
sudo groupadd hadoop
sudo useradd hadoop -m -s /bin/bash -g hadoop
sudo passwd hadoop

When prompted, set the password to 'hadoop'.
INSTALL HADOOP/HBASE BINARIES
Be very careful about the Hadoop and HBase versions: specific versions of Hadoop work only with specific versions of HBase.
Download the binaries for Hadoop and HBase into the /root folder.
as root user:
cd ~
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2-bin.tar.gz
tar zxvf hadoop-1.1.2-bin.tar.gz
wget http://archive.apache.org/dist/hbase/hbase-0.94.8/hbase-0.94.8.tar.gz
tar zxvf hbase-0.94.8.tar.gz
Move the Hadoop and HBase binaries into the /opt folder and create symbolic links for them:

mv hbase-0.94.8/ /opt
mv hadoop-1.1.2/ /opt
cd /opt
ln -s hbase-0.94.8 hbase
ln -s hadoop-1.1.2 hadoop
Make the hadoop user and group the owner of the newly created folders in the /opt directory:

chown hadoop:hadoop -R hbase-0.94.8/
chown hadoop:hadoop -R hadoop-1.1.2/
HADOOP PROFILE SCRIPT
as root user:
vim /etc/profile.d/hadoop.sh
hadoop.sh:
#!/bin/bash
HADOOP_HOME=/opt/hadoop
PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME PATH
Make the Hadoop profile script readable and executable for all users:
chmod 755 /etc/profile.d/hadoop.sh
HBASE PROFILE SCRIPT
vim /etc/profile.d/hbase.sh
hbase.sh:
#!/bin/bash
HBASE_HOME=/opt/hbase
PATH=$PATH:$HBASE_HOME/bin
export HBASE_HOME PATH
Make the HBase profile script readable and executable for all users:
chmod 755 /etc/profile.d/hbase.sh
HADOOP USER ENV CONFIG
as hadoop user:
cd ~
vim .bashrc
At the bottom of the file add the following lines:
source /etc/profile.d/java.sh
source /etc/profile.d/hadoop.sh
source /etc/profile.d/hbase.sh
SSH KEYS (HADOOP USER)
Generate SSH keys for the hadoop user to be able to SSH into the machine without a password.
as hadoop user:
cd ~
ssh-keygen -t rsa
cd .ssh
cat id_rsa.pub > authorized_keys
chmod 600 authorized_keys

Use an empty passphrase.
The local machine also requires a running sshd server; if one is not already installed, install the openssh-server package.
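A quick way to verify the key setup is the sketch below; BatchMode makes ssh fail instead of prompting, so a broken key setup is reported immediately rather than hanging on a password prompt.

```shell
# Verify password-less ssh for the hadoop user; BatchMode=yes makes ssh
# fail instead of prompting, so a broken key setup is detected immediately.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost 'echo ssh OK' 2>/dev/null; then
  echo "key-based ssh to localhost works"
else
  echo "key-based ssh to localhost is NOT working"
fi
```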
HADOOP CONFIGURATION
Create a folder for the Hadoop data. In this example we place it under the /opt folder, but this is not mandatory: the Hadoop data folder can be any place in the system with sufficient space. Avoid the /tmp folder, since Linux distributions clean it automatically.
as root user:
mkdir /opt/hadoop-data
chown hadoop:hadoop /opt/hadoop-data
chmod 755 /opt/hadoop-data

Note that Hadoop will refuse to start if its data directory is not owned by the hadoop user or does not have exactly 755 permissions.
As 'hadoop' user, in $HADOOP_HOME/conf/hadoop-env.sh, uncomment and define the JAVA_HOME variable:
..
export JAVA_HOME=/opt/jdk1.6
..
As 'hadoop' user, in $HADOOP_HOME/conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
As 'hadoop' user, in $HADOOP_HOME/conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <!--
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-data</value>
    <final>true</final>
  </property>
  -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-data</value>
  </property>
</configuration>
As 'hadoop' user, in $HADOOP_HOME/conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
In this file you should also configure the maximum number of mappers and reducers this node will process. The default is only two, which is almost single-threaded. If this node will be used for any real processing, raise the '2' to about '200'.
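The slot maximums above are set in mapred-site.xml; a sketch is below (property names follow the standard Hadoop 1.x configuration, and the values are examples only):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>200</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>200</value>
</property>
```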
When you decide on the real number of mappers/reducers that will run on this node, adjust the maximum connections for Postgres accordingly: allow 2 connections per reducer.
To modify the maximum number of connections, edit postgresql.conf and change the 'max_connections' property, then restart Postgres.
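As a worked example of the sizing rule above (the numbers are illustrative, not a recommendation):

```shell
# 200 reducer slots * 2 postgres connections per reducer = 400 connections
REDUCERS=200
CONN_PER_REDUCER=2
MAX_CONNECTIONS=$((REDUCERS * CONN_PER_REDUCER))
echo "set max_connections to at least $MAX_CONNECTIONS"   # prints 400
```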
As 'hadoop' user, in $HADOOP_HOME/conf/log4j.properties, change the root logging level threshold to DEBUG:

..
hadoop.root.logger=DEBUG,console
..
HBASE CONFIGURATION
As 'hadoop' user, in $HBASE_HOME/conf/hbase-env.sh, uncomment and define the JAVA_HOME variable and uncomment the HBASE_MANAGES_ZK variable definition:
..
export JAVA_HOME=/opt/jdk1.6
..
export HBASE_MANAGES_ZK=true
..
As 'hadoop' user, in $HBASE_HOME/conf/hbase-site.xml:
<configuration>
  <!-- ZooKeeper is also needed for this -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>localhost:60000</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!--
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  -->
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
    <description>Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper files that are configured with a relative path will go under this node. By default, all of HBase's ZooKeeper file paths are configured with a relative path, so they will all go under this directory unless changed.</description>
  </property>
  <property>
    <name>zookeeper.znode.rootserver</name>
    <value>root-region-server</value>
    <description>Path to ZNode holding root region location. This is written by the master and read by clients and region servers. If a relative path is given, the parent folder will be ${zookeeper.znode.parent}. By default, this means the root location is stored at /hbase/root-region-server.</description>
  </property>
  <!-- ZooKeeper config -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/hadoop-data/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>2000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.initLimit</name>
    <value>10</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.syncLimit</name>
    <value>5</value>
  </property>
</configuration>
INITIALIZE HDFS
Initialize HDFS by running the command:
$HADOOP_HOME/bin/hadoop namenode -format
The output should look similar to this:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = HP610/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.1.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1411108; compiled by 'hortonfo' on Mon Nov 19 10:48:11 UTC 2012
************************************************************/
13/01/02 19:07:47 INFO util.GSet: VM type = 64-bit
13/01/02 19:07:47 INFO util.GSet: 2% max memory = 17.77875 MB
13/01/02 19:07:47 INFO util.GSet: capacity = 2^21 = 2097152 entries
13/01/02 19:07:47 INFO util.GSet: recommended=2097152, actual=2097152
13/01/02 19:07:48 INFO namenode.FSNamesystem: fsOwner=hadoop
13/01/02 19:07:48 INFO namenode.FSNamesystem: supergroup=supergroup
13/01/02 19:07:48 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/01/02 19:07:48 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/01/02 19:07:48 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/01/02 19:07:48 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/01/02 19:07:48 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/01/02 19:07:49 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
13/01/02 19:07:49 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-hadoop/dfs/name/current/edits
13/01/02 19:07:49 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
13/01/02 19:07:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at HP610/127.0.1.1
************************************************************/
STARTING/STOPPING HADOOP and HBASE
Always start Hadoop before HBase. HBase is configured to work on top of HDFS, which starts and runs along with Hadoop. Both Hadoop and HBase come with scripts that start/stop them.
STARTING HADOOP
$HADOOP_HOME/bin/start-all.sh
or alternatively start each process manually:

hadoop-daemon.sh start jobtracker
hadoop-daemon.sh start tasktracker
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
To check whether all the processes started, execute the 'jps' command:

hadoop@debian:~$ jps
3318 SecondaryNameNode
3506 TaskTracker
3193 DataNode
3090 NameNode
3397 JobTracker
3619 Jps
In the output, the Hadoop processes are JobTracker, NameNode, DataNode, TaskTracker and SecondaryNameNode.
Two web interfaces will be started that give monitoring options for Hadoop and HDFS.
http://X.Y.Z.Q:50030/jobtracker.jsp - cluster status and job monitoring
http://X.Y.Z.Q:50070/dfshealth.jsp - hdfs monitoring
STOPPING HADOOP
$HADOOP_HOME/bin/stop-all.sh
or alternatively stop each process manually:

hadoop-daemon.sh stop jobtracker
hadoop-daemon.sh stop tasktracker
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode
STARTING HBASE
$HBASE_HOME/bin/start-hbase.sh
Check whether all the HBase processes started by using the 'jps' command:

hadoop@U10:~$ jps
17890 HMaster
17112 JobTracker
17811 HQuorumPeer
16811 DataNode
17312 TaskTracker
16608 NameNode
17018 SecondaryNameNode
18139 HRegionServer
18256 Jps
In the output, HBase processes are HMaster, HQuorumPeer and HRegionServer.
STOPPING HBASE
$HBASE_HOME/bin/stop-hbase.sh
After starting both Hadoop and HBase make sure they are working.
LOGS
Check the logs of both HBase and Hadoop and make sure there are no critical exceptions.
HBase logs path: /opt/hbase/logs
Hadoop logs path: /opt/hadoop/logs
WARNING: A known problem with some Linux distributions is a predefined /etc/hosts entry that starts with 127.0.1.1. When HBase starts, the first thing it does is insert some nodes into ZooKeeper (ZK) that contain the locations of the region servers. When clients want to talk to HBase, they ask ZK for the locations of the region servers. The problem is that at startup HBase resolves the region server location via DNS against /etc/hosts and updates the ZK node with that information instead of the configured data. If the 127.0.1.1 entry is left in /etc/hosts, HBase may use it and include it in ZK; clients querying ZK will then receive this address and will not be able to connect to the region server. The problem is much more visible when accessing Hadoop/HBase from a different machine than the one with the Hadoop/HBase installation. The usual solution is to remove this line from /etc/hosts and restart everything. For more details read "Why does HBase care about /etc/hosts?".
/etc/hosts
The first entry in /etc/hosts should be similar to below:
127.0.0.1 localhost domain_name_or_machine_name
HDFS Sanity Test:
hadoop dfs -ls /
It should list the root folder content of the HDFS.
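A slightly fuller sanity check is to round-trip a file through HDFS; the sketch below assumes the cluster from this guide is up (the /sanity.txt path is just an example, and the block skips itself when the hadoop binary is not on the PATH):

```shell
# Round-trip a small file through HDFS: put, cat, then remove it.
# Skips gracefully when the hadoop binary is not available.
if command -v hadoop >/dev/null 2>&1; then
  echo "hello hdfs" > /tmp/sanity.txt
  hadoop dfs -put /tmp/sanity.txt /sanity.txt
  hadoop dfs -cat /sanity.txt        # should print: hello hdfs
  hadoop dfs -rm /sanity.txt
else
  echo "hadoop not on PATH; make sure the profile scripts are sourced"
fi
```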
HBASE Sanity Test:
Start HBase shell script, with:
hbase shell
Try to list all the tables, with:
list
Note: if the process hangs and does not list the tables (or say that there are no tables), then most likely HDFS is not available and something is wrong.
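A slightly stronger check inside the HBase shell is to create, read from, and drop a throwaway table (a sketch; the table and column family names are just examples):

```
create 'sanity_test', 'cf'
put 'sanity_test', 'row1', 'cf:msg', 'hello'
get 'sanity_test', 'row1'
disable 'sanity_test'
drop 'sanity_test'
```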
As part of sanity testing, it is good practice to check the logs for startup errors. The logs can be found at:
$HADOOP_HOME/logs
TROUBLESHOOTING
PID FILES
Files containing process PIDs are stored in /tmp. If you kill an HBase process, you will have to delete these files in order for it to restart from the command line.
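A sketch of the cleanup is below; the file name patterns assume the default <daemon>-<user>-<command>.pid naming used by the start scripts, so adjust them if your installation names the files differently:

```shell
# Remove stale pid files left behind by killed daemons so the start
# scripts will launch them again; -f keeps this safe when none exist.
rm -f /tmp/hadoop-hadoop-*.pid /tmp/hbase-hadoop-*.pid
```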