ACADGILD :: HADOOP LESSON - Running MapReduce in Local Mode


ACADGILD
https://acadgild.com/blog/running-mapreduce-in-local-mode-2/

INTRODUCTION

In this blog we explain in detail how to run your MapReduce code locally in Eclipse on any Linux machine. After reading it, you can easily run your MapReduce code in Eclipse without starting any of your Hadoop daemons.

Before getting started, let us learn a little about local mode and cluster mode.

Local Mode

Local mode means your job is not connected to any other system or network: the whole MapReduce job runs as a single Java process on your own machine. In local mode you do not need to start your Hadoop daemons, and you do not need to store your files in HDFS; you can simply specify local file paths.

Cluster Mode

A cluster is a collection of systems connected in a network. Cluster mode means running your program on such a distributed collection of systems. Here you need to make sure that all your Hadoop daemons are started, and then you run your MapReduce application by building a jar file.

Running in cluster mode every time is not recommended, because it consumes HDFS space and reduces your cluster's performance. Every time you deploy your application in cluster mode, its files are stored in HDFS in blocks of up to 128 MB (the default block size in Hadoop 2.x) and replicated across the cluster. For testing your MapReduce program, you can deploy it in local mode rather than cluster mode.

Follow the procedure below to execute your MapReduce programs locally in Eclipse. This saves HDFS space and the time spent checking your program.

1. Open Eclipse.
2. Create a Java project.
3. Create a new package (optional).
4. Create a new class.
5. Copy your program into that class.
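To make the idea of local mode more concrete, here is a small conceptual sketch in plain Java. It does not use the Hadoop API at all, and the class and method names are hypothetical. It walks a word count through the map, shuffle/sort and reduce phases inside one JVM, which is roughly what happens when Hadoop runs a job in local mode; your real program would use Hadoop's Mapper and Reducer classes instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: plain Java, NOT the Hadoop API.
// It mimics the phases a MapReduce job goes through when everything runs
// in a single JVM, as in local mode: map, shuffle/sort, reduce.
class LocalWordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort phase: group values by key. TreeMap keeps keys sorted,
    // just as reducer input is sorted by key.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts for each word.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    // Runs all three phases over the given input lines.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        // prints {hadoop=1, local=2, mode=2, test=1}
        System.out.println(run(List.of("hadoop local mode", "local mode test")));
    }
}
```

In a real local-mode run, Hadoop performs these same phases for you, reading from and writing to the local paths you pass as arguments.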

To run the program in Eclipse you need to add its dependencies, which means a few more jars must be configured in your project's libraries:

1. All the jars present in the lib folder of Hadoop's common directory
2. The hadoop-core-1.2.1 jar (needs to be imported externally)

To add the jar files: Right-click on the project --> Build Path --> Configure Build Path --> Libraries --> Add External JARs --> open your Hadoop folder --> share --> hadoop --> common --> lib --> add all the jars in the lib folder.

Then you need to add another external jar for dependencies, i.e., the hadoop-core-1.2.1 jar. Download that jar file from the link below:

https://drive.google.com/file/d/0ByJLBTmJojjzM2IwU1FPdmExLUE/view?usp=sharing

After downloading it, add this jar to your libraries as well.

Now you are ready to run your program in Eclipse. To run it: Right-click on the project --> Run As --> Run Configurations --> Main
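Before adding the jars, you may want to see exactly which ones the lib folder contains. The minimal plain-Java sketch below lists every .jar directly inside a folder and joins them into a classpath string, which is essentially what adding them all to the build path amounts to. The class name is hypothetical, and the HADOOP_HOME environment variable with its /usr/local/hadoop fallback is an assumption; point it at your own Hadoop folder.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: list the .jar files directly inside a lib folder
// (e.g. <hadoop folder>/share/hadoop/common/lib) and build a classpath string.
class HadoopLibClasspath {

    // Returns the jar files in libDir, sorted by path; other files are ignored.
    static List<Path> jarsIn(Path libDir) throws IOException {
        try (Stream<Path> entries = Files.list(libDir)) {
            return entries
                    .filter(p -> Files.isRegularFile(p)
                            && p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Joins the jar paths with the platform separator (':' on Linux).
    static String classpath(Path libDir) throws IOException {
        return jarsIn(libDir).stream()
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: HADOOP_HOME points at your Hadoop folder.
        String home = System.getenv().getOrDefault("HADOOP_HOME", "/usr/local/hadoop");
        Path lib = Paths.get(home, "share", "hadoop", "common", "lib");
        if (!Files.isDirectory(lib)) {
            System.err.println("No lib folder at " + lib);
            return;
        }
        for (Path jar : jarsIn(lib)) {
            System.out.println(jar.getFileName());
        }
    }
}
```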
In the Main tab, select your project and your main class correctly.

Then move to the Arguments tab. Here, under Program arguments, enter your input file path and your output file path separated by whitespace (a space or a tab).
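Because the arguments are plain local paths, a quick check at the start of your program's main method can catch mistakes before the job starts. The helper below is a hypothetical sketch, not part of Hadoop: it expects exactly two arguments (input path, then output path), checks that the input exists, and rejects an output folder that already exists, since Hadoop's FileOutputFormat refuses to write into an existing output directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check for the program arguments:
// args[0] = input path, args[1] = output path (both local paths).
class LocalArgsCheck {

    // Returns null when the arguments look usable, otherwise an error message.
    static String validate(String[] args) {
        if (args.length != 2) {
            return "Usage: <input path> <output path>";
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        if (!Files.exists(input)) {
            return "Input path does not exist: " + input;
        }
        // Hadoop will not overwrite an existing output folder,
        // so report it early with a clearer message.
        if (Files.exists(output)) {
            return "Output path already exists, delete it or choose a new one: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        System.out.println(error == null ? "Arguments OK" : error);
    }
}
```

If you rerun the same configuration, delete the previous output folder first or change the output path in the Arguments tab.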

Now click Run. Your program will start running, and you can track its status in the Console. When the whole process finishes, you will see that an output folder has been created at the path you specified. Inside that folder there is a part file (for example part-r-00000) and a _SUCCESS file, which indicates that you have executed your program successfully in Eclipse locally.

With this approach you can test your MapReduce code and make changes to it easily before deploying it to a cluster, and you save your HDFS space.
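To confirm the result from code rather than by opening the folder, a small plain-Java helper can look for the _SUCCESS marker and print the contents of the part files. This is a hypothetical sketch; it assumes Hadoop's default output names (_SUCCESS and part-*) and a local output path passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: inspect a local output folder after the job finishes.
// Hadoop writes a _SUCCESS marker when the job succeeds and one part-*
// file per reducer (for example part-r-00000).
class LocalOutputCheck {

    static boolean succeeded(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve("_SUCCESS"));
    }

    // Reads every line of every part-* file, in file-name order.
    // Hidden checksum files such as .part-r-00000.crc start with a dot,
    // so the "part-" prefix filter skips them.
    static List<String> readParts(Path outputDir) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(outputDir)) {
            parts = entries
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "output");
        System.out.println("Succeeded: " + succeeded(out));
        if (Files.isDirectory(out)) {
            readParts(out).forEach(System.out::println);
        }
    }
}
```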
