

Alex Chengelis

2632220

CIS-612

LAB 4_1

1. Create a new virtual machine with Ubuntu. I am using VMware Player to do this. Fill in the information it asks for and let the installer run.


2. Download the appropriate Java and Hadoop files.

I am using Hadoop 2.7.3 since it is the latest stable release. You can either use the website to download it or use curl. (Note: closer.cgi is the Apache mirror-selection page; if curl fetches an HTML page instead of the archive, substitute a direct mirror URL.)

curl -O http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

For Java, go to this page: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

and download the Linux tar.gz.


Place both the Hadoop and Java archives in the Downloads folder.

3. Configure the SSH server.

sudo apt-get update

sudo apt-get install openssh-server


4. Configure the password-less ssh login.

cd

ssh-keygen -t rsa -P ""

cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys

chmod 600 ~/.ssh/authorized_keys

## then restart the SSH service

sudo service ssh restart


5. Standalone Mode setup (you start with this and add more and more functionality). Start by extracting the downloaded files.

cd Downloads

tar xzvf hadoop-2.7.3.tar.gz

After running the tar command the terminal will quickly fill up with the names of the extracted files.

Verify that Hadoop has been extracted.


6. Create soft links (lowercase hadoop, so the link matches the ~/hadoop paths used in later steps)

cd

ln -s ./Downloads/hadoop-2.7.3/ ./hadoop

Create a similar link for the extracted JDK so that the JAVA_HOME set in step 8 resolves, e.g. ln -s ./Downloads/jdk1.8.0_<version>/ ./jdk (use your extracted JDK folder name).

7. Configure .bashrc

cd

vi ./.bashrc

export HADOOP_HOME=/home/alex/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

8. Configure Hadoop’s hadoop-env.sh file

cd

vi ./hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/home/alex/jdk

9. Run a Hadoop job on a standalone cluster. First exit and restart the terminal. Then type the hadoop command.


If the hadoop command prints its usage text, that is a sign that our installation is good so far.

Run a Hadoop job

Create a testhadoop directory

Create input directory inside testhadoop

Create some input files (the .xml files)

Run MapReduce example job

View the output directory using cat command

cd

mkdir testhadoop

cd testhadoop

mkdir input

cp ~/hadoop/etc/hadoop/*.xml input

hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

cat output/*

You’ll see some output in the terminal
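What the example grep job does can be sketched in plain Python. This is a simplified, single-process analogue for illustration, not Hadoop's actual implementation: the map phase emits every substring matching dfs[a-z.]+, and the reduce phase sums the counts per match.

```python
import re
from collections import Counter

# Simplified stand-in for the "grep" job in hadoop-mapreduce-examples.
PATTERN = re.compile(r"dfs[a-z.]+")

def grep_job(documents):
    counts = Counter()
    for text in documents:                   # one "document" per input file
        for match in PATTERN.findall(text):  # map: emit each matching string
            counts[match] += 1               # reduce: sum occurrences per key
    # the example job reports matches ordered by frequency, highest first
    return sorted(counts.items(), key=lambda kv: -kv[1])

sample = ["<name>dfs.replication</name>", "dfs.replication dfs.permissions"]
print(grep_job(sample))  # [('dfs.replication', 2), ('dfs.permissions', 1)]
```

Running it over the copied .xml files would produce the same kind of (match, count) pairs you see in output/.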


Finally check the output

This is working.

10. Now to transform this into Pseudo-Distributed Mode, without a YARN setup to start.

a. Configure core-site.xml and hdfs-site.xml

cd

vi ./hadoop/etc/hadoop/core-site.xml

## adding these lines to the file ##

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://10.1.37.12:9000</value>


</property>

</configuration>

vi ./hadoop/etc/hadoop/hdfs-site.xml

## adding these lines to the file ##

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

Replace the IP in fs.defaultFS with your own machine's address (the ifconfig command will show it).

11. Format the namenode

hdfs namenode -format


12. Start/Stop Hadoop cluster

$ start-dfs.sh

13. Create a user on the HDFS system

$ hdfs dfs -mkdir /user

$ hdfs dfs -mkdir /user/alex

Put some input files into HDFS:

$ hdfs dfs -put ~/hadoop/etc/hadoop input


14. Run a Hadoop job now

$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

Check the output

$ hdfs dfs -cat output/*


15. Since everything is working so far, we are going to extend our Pseudo-Distributed Mode with a YARN setup.

a. Configure mapred-site.xml and yarn-site.xml

$ cd

$ nano ./hadoop/etc/hadoop/mapred-site.xml

Add the following lines

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

$ nano ./hadoop/etc/hadoop/yarn-site.xml

Add the following lines

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>
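The mapreduce_shuffle auxiliary service is what moves map outputs to the reducers, grouping all values for a key together. A rough single-process Python analogue of that shuffle step (illustrative only; the names here are mine, not Hadoop's):

```python
from collections import defaultdict

def shuffle(map_outputs):
    """Group (key, value) pairs emitted by mappers so each reducer
    sees one key together with all of its values."""
    grouped = defaultdict(list)
    for key, value in map_outputs:
        grouped[key].append(value)
    # Hadoop delivers keys to each reducer in sorted order
    return sorted(grouped.items())

pairs = [("huck", 1), ("finn", 1), ("huck", 1)]
print(shuffle(pairs))   # [('finn', [1]), ('huck', [1, 1])]
```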


16. Start YARN cluster

$ start-yarn.sh

Go to http://localhost:8088 to make sure it is working

17. Let’s test.

$ cd

$ cd testhadoop

$ rm -rf output/

$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'


$ hdfs dfs -cat output/*

The output will look the same as the previous run.

18. Time to run the word count.

a. Let’s get a file from the Gutenberg project: http://www.gutenberg.org/files/76/76-0.txt (it’s a copy of Huckleberry Finn).

b. Use wget to get it.

$ wget http://www.gutenberg.org/files/76/76-0.txt

c. Create a directory for our wordcount, and the input directory

$mkdir wordcount && cd wordcount

$mkdir input

d. Move our test file into the input directory.

e. Navigate back to the wordcount directory

$ cd wordcount


f. Remove the output directory currently in the system (-rmr is deprecated; use -rm -r)

$ hdfs dfs -rm -r /user/alex/output

g. Now remove and copy over our current input directory.

$ hdfs dfs -rm -r /user/alex/input

$ hdfs dfs -put input /user/alex/input

$ hdfs dfs -ls /user/alex/input (just to check to make sure it is there)

h. Finally it is time to run the wordcount program.

$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output

i. Check the output

$ hdfs dfs -cat output/*


j. Copy over the output to the “local” machine.

$ hdfs dfs -get /user/alex/output/ .

$ ls (to verify)

$ ls output (to verify)

k. Open it up in your favorite editor. Have fun looking through the results.
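The logic of the wordcount example job can be sketched in a few lines of Python (a single-process simplification: map = split each line into words, reduce = sum the counts per word):

```python
from collections import Counter

def wordcount(lines):
    """Single-process sketch of the wordcount example job."""
    counts = Counter()
    for line in lines:               # map: one record per line of input
        counts.update(line.split())  # split on whitespace, count each word
    return dict(counts)              # reduce: total per word

print(wordcount(["the river the raft"]))  # {'the': 2, 'river': 1, 'raft': 1}
```

The real job does the same thing, only with the map and reduce phases distributed across the cluster and the shuffle in between.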


This guide was adapted from:

https://medium.com/@luck/installing-hadoop-2-7-2-on-ubuntu-16-04-3a34837ad2db