Playing with Hadoop 2013-10-31

2. Needed to run Hadoop
   You need the following to run Hadoop:
   - Java JDK
   - Linux server
   - Hadoop tarball
   I'm using the following:
   - JDK 1.6.24 64 bit
   - Ubuntu 12.04 LTS 64 bit
   - Hadoop 1.0.4
   Could not get JDK7 + Hadoop 2.2 to work
3. Installing Hadoop
4. Install Java
5. Setup Java home and path
6. Add hadoop user
7. Install Hadoop and add to path
8. Create SSH key for hadoop user
9. Accept SSH key
10. Disable IPv6
11. Reboot and check installation
12. Running an example job
13. Calculate Pi
14. Estimated value of Pi
15. Three modes of operation
   Pi was calculated in local standalone mode:
   - it is the default mode (i.e. no configuration needed)
   - all components of Hadoop run in a single JVM
   Pseudo-distributed mode:
   - components communicate using sockets
   - a separate JVM is spawned for each component
   - it is a minicluster on a single host
   Fully distributed mode:
   - components are spread across multiple machines
16. Configuring for pseudo-distributed mode
17. Create base directory for HDFS
18. Set JAVA_HOME
19. Edit core-site.xml
20. Edit hdfs-site.xml
21. Edit mapred-site.xml
22. Log out and log on as hadoop
23. Format HDFS
24. Start HDFS
25. Start MapReduce
26. Create home directory & test data
27. Running Word Count
28. First let's try the example jar
29. Inspect the result
30. Compile and run our own jar
   https://gist.github.com/soren/7213273
31. Inspect result
32. Run improved version
   https://gist.github.com/soren/7213453
33. Inspect (improved) result
34. The Web User Interface
   - HDFS: http://localhost:8070/
   - MapReduce: http://localhost:8030/
   - File Browser: http://localhost:8075/browseDirectory.jsp?namenodeInfoPort
   Note: this is with port forwarding in VirtualBox: 50030 -> 8030, 50070 -> 8070, 50075 -> 8075
35. Now you can go play with Hadoop...
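
As a recap of the Word Count job from slides 27 to 33, the following is a minimal sketch of the map, shuffle and reduce phases in plain Python. It is not the code from the linked gists and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and splitting on whitespace is an assumption about how words are tokenized.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token in a line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, roughly what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum all the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in-process over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical test data, standing in for the files placed in HDFS.
    sample = ["hello hadoop", "hello world"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

In a real job the map and reduce steps run in separate tasks and the shuffle is handled by the framework; here everything runs in a single process, which is closest in spirit to local standalone mode.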