Hadoop Install Configure

8/11/2019 Hadoop Install Configure

    Hadoop Installation and Configuration

1. Set up passphraseless SSH

Check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
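The two key-setup commands above can be rehearsed against a scratch directory first (a sketch: the paths are temporary stand-ins for ~/.ssh, and `-t rsa` is substituted here because recent OpenSSH releases have dropped DSA key generation):

```shell
# Generate a passphraseless key into a scratch .ssh directory and
# authorize it, mirroring the steps above without touching ~/.ssh.
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"
ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa" -q
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"   # sshd refuses group/world-writable key files
```

If `ssh localhost` still prompts for a password after the steps above, check the permissions: sshd typically requires ~/.ssh to be 700 and ~/.ssh/authorized_keys to be 600.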

2. Download Hadoop from Apache. (Let's use the stable version that will be compatible with Mahout in later lectures.)

http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/

3. Unpack the archive:

$ tar xzvf hadoop-1.0.3.tar.gz

    4.

    Go the hadoop-1.0.3 directory; you will find the following files and

    subdirectories.1.-rwxrw-r--@ 1 kzhang6 staff 446615 May 8 2012 CHANGES.txt

    2.-rwxrw-r--@ 1 kzhang6 staff 13366 May 8 2012 LICENSE.txt

    3.-rwxrw-r--@ 1 kzhang6 staff 101 May 8 2012 NOTICE.txt

    4.-rwxrw-r--@ 1 kzhang6 staff 1366 May 8 2012 README.txt

    5.drwxrw-r--@ 19 kzhang6 staff 646 May 8 2012 bin

    6.-rwxrw-r--@ 1 kzhang6 staff 119875 May 8 2012 build.xml

    7.drwxrw-r--@ 4 kzhang6 staff 136 May 8 2012 c++

    8.drwxrw-r--@ 18 kzhang6 staff 612 Nov 4 14:29 conf

    9.

    drwxrw-r--@ 10 kzhang6 staff 340 May 8 2012 contrib10. drwxrw-r--@ 69 kzhang6 staff 2346 May 8 2012 docs

    11. -rwxrw-r--@ 1 kzhang6 staff 6840 May 8 2012 hadoop-ant-1.0.3.jar

    12. -rwxrw-r--@ 1 kzhang6 staff 410 May 8 2012 hadoop-client-1.0.3.jar

    13. -rwxrw-r--@ 1 kzhang6 staff 3928345 May 8 2012 hadoop-core-1.0.3.jar

    14. -rwxrw-r--@ 1 kzhang6 staff 142452 May 8 2012 hadoop-examples-1.0.3.jar

    15. -rwxrw-r--@ 1 kzhang6 staff 413 May 8 2012 hadoop-minicluster-1.0.3.jar

    16. -rwxrw-r--@ 1 kzhang6 staff 2656632 May 8 2012 hadoop-test-1.0.3.jar

    17. -rwxrw-r--@ 1 kzhang6 staff 287807 May 8 2012 hadoop-tools-1.0.3.jar

    18. drwxrw-r--@ 13 kzhang6 staff 442 May 8 2012 ivy

    19. -rwxrw-r--@ 1 kzhang6 staff 10525 May 8 2012 ivy.xml

    20. drwxrw-r--@ 52 kzhang6 staff 1768 May 8 2012 lib

    21. drwxrw-r--@ 4 kzhang6 staff 136 May 8 2012 libexec

    22. drwxrw-r-- 84 kzhang6 staff 2856 Oct 21 18:27 logs

    23.

    drwxrw-r--@ 9 kzhang6 staff 306 May 8 2012 sbin24. drwxrw-r--@ 3 kzhang6 staff 102 May 8 2012 share

    25. drwxrw-r--@ 18 kzhang6 staff 612 May 8 2012 src

    26. drwxrw-r--@ 9 kzhang6 staff 306 May 8 2012 webapps

    5. In the conf directory, you need to edit the following 4 files: hadoop-env.sh,

    hdfs-site.xml, core-site.xml, and mapred-site.xml.

hadoop-env.sh:

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/Library/Java/Home

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=8000

# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10

Note: the JAVA_HOME line must point at your Java home directory, and the krb5 settings in HADOOP_OPTS remove some warnings on Mac systems.
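The fixed path /Library/Java/Home dates from older Mac OS X releases; on newer macOS systems the bundled /usr/libexec/java_home tool resolves the active JDK. A hedged alternative for the JAVA_HOME line above (verify the paths on your own machine):

```shell
# hadoop-env.sh fragment: prefer java_home when it is available,
# otherwise keep the tutorial's fixed path.
if [ -x /usr/libexec/java_home ]; then
  export JAVA_HOME="$(/usr/libexec/java_home)"
else
  export JAVA_HOME=/Library/Java/Home
fi
```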


hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/kzhang6/Hadoop-data</value>
    <description>A base for temporary directories</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
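All three files share the same <property>/<name>/<value> skeleton, so they can be generated from the shell. A sketch (the `hprop` helper is hypothetical, not part of Hadoop), shown writing a core-site.xml into a scratch directory:

```shell
# Hypothetical helper that emits one Hadoop <property> element.
hprop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

CONF_DIR="$(mktemp -d)"
{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  hprop hadoop.tmp.dir "$HOME/Hadoop-data"       # tutorial uses /Users/kzhang6/Hadoop-data
  hprop fs.default.name hdfs://localhost:54310
  echo '</configuration>'
} > "$CONF_DIR/core-site.xml"
```

Generating the files this way keeps the port numbers (54310 for HDFS, 54311 for the JobTracker) in one place if you script the whole setup.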

6. Format your namenode

10-1-210-140:hadoop-1.0.3 kzhang6$ ./bin/hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.

14/01/27 11:38:47 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = 10-1-208-217.cba.uic.edu/10.1.208.217
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
Re-format filesystem in /Users/kzhang6/Hadoop-data/dfs/name ? (Y or N) Y
14/01/27 11:38:55 INFO util.GSet: VM type       = 64-bit
14/01/27 11:38:55 INFO util.GSet: 2% max memory = 159.6675 MB
14/01/27 11:38:55 INFO util.GSet: capacity      = 2^24 = 16777216 entries
14/01/27 11:38:55 INFO util.GSet: recommended=16777216, actual=16777216
14/01/27 11:38:55 INFO namenode.FSNamesystem: fsOwner=kzhang6
14/01/27 11:38:55 INFO namenode.FSNamesystem: supergroup=supergroup
14/01/27 11:38:55 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/01/27 11:38:55 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/01/27 11:38:55 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/01/27 11:38:55 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/01/27 11:38:56 INFO common.Storage: Image file of size 113 saved in 0 seconds.
14/01/27 11:38:56 INFO common.Storage: Storage directory /Users/kzhang6/Hadoop-data/dfs/name has been successfully formatted.
14/01/27 11:38:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 10-1-208-217.cba.uic.edu/10.1.208.217
************************************************************/

7. Start/stop Hadoop

10-1-210-140:hadoop-1.0.3 kzhang6$ ./bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /Users/kzhang6/Software/hadoop-1.0.3/libexec/../logs/hadoop-kzhang6-namenode-10-1-208-217.cba.uic.edu.out
localhost: starting datanode, logging to /Users/kzhang6/Software/hadoop-1.0.3/libexec/../logs/hadoop-kzhang6-datanode-10-1-208-217.cba.uic.edu.out
localhost: starting secondarynamenode, logging to /Users/kzhang6/Software/hadoop-1.0.3/libexec/../logs/hadoop-kzhang6-secondarynamenode-10-1-208-217.cba.uic.edu.out
starting jobtracker, logging to /Users/kzhang6/Software/hadoop-1.0.3/libexec/../logs/hadoop-kzhang6-jobtracker-10-1-208-217.cba.uic.edu.out
localhost: starting tasktracker, logging to /Users/kzhang6/Software/hadoop-1.0.3/libexec/../logs/hadoop-kzhang6-tasktracker-10-1-208-217.cba.uic.edu.out

To stop all five daemons, run ./bin/stop-all.sh.

    8. Check the status

    http://localhost:50070/dfshealth.jsp

    http://localhost:50030/jobtracker.jsp
