Topics
1. Running program code on a Hadoop single/multi-node cluster on Linux:
o Classification: NB, KNN
o Clustering: K-Means
o WordCount: load + save the results in an HDFS folder (Sessions 12-13)
o WordCount: load + save the results in NoSQL (MongoDB)
o Movie Ratings: load from an HDFS folder + save the results in NoSQL (MongoDB)
2. Case Studies
3. Assignment
Prepare all the *.java files (from the Naive_Bayes_Classifier_MapReduce algorithm, code link: https://goo.gl/CEdr2R ) to be compiled into a *.jar.
Note: check whether each of these *.java files declares a package or not.
Case Study
Classification: NB
If there is no package declaration, copy the *.java files straight into the folder /usr/local/hadoop.
If there is a package, e.g. "package comratingbymovies.nidos;", then "comratingbymovies.nidos" maps to the nested folders "comratingbymovies" and then "nidos".
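The package-to-folder mapping is simply the dots replaced with path separators; a minimal plain-Java sketch (class and method names are illustrative, not part of the course code):

```java
public class PackagePath {
    // Convert a Java package name into the directory path that javac expects.
    static String toPath(String pkg) {
        return pkg.replace('.', '/');
    }

    public static void main(String[] args) {
        // "package comratingbymovies.nidos;" -> folder comratingbymovies/nidos
        System.out.println(toPath("comratingbymovies.nidos"));
        System.out.println(toPath("com.clustering.mapreduce"));
    }
}
```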
NBCDriver.java NBCMap.java NBCReduce.java
In the folder "/usr/local/hadoop" there will then be a folder "comratingbymovies/nidos" containing the file
package comratingbymovies.nidos;
import org.apache.hadoop.conf.Configuration;
.....
...
public class RatingCounterByMovieMongo extends Configured implements Tool{
public int run(String[] args) throws Exception {
....
..
System.exit(res);
}
}
"RatingCounterByMovieMongo.java", together with any other *.java files that declare the same package.
If the *.java files have no package declaration, the folder "/usr/local/hadoop" is simply filled with all of those *.java files directly (copy-paste them there).
Run the command below (this creates the *.class files):
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main *.java
Note: NBCDriver.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Then create the *.class files and the *.jar:
Or, with -Xlint:deprecation to see exactly where the deprecated code is:
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main -Xlint:deprecation *.java
nidos@master:/usr/local/hadoop$ jar cf NB.jar *.class
NBCDriver.java:16: warning: [deprecation] Job(Configuration) in Job has been deprecated
Job job = new Job(conf);
^
1 warning
Create the HDFS folders "nb" and "nb/input":
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/nb
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/nb/input
Upload the training data with -copyFromLocal; for example the file "1.txt" (the training set) sits in the local folder "/home/nidos/Desktop/kode/Naive_Bayes_Classifier_MapReduce/input":
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/kode/Naive_Bayes_Classifier_MapReduce/input/1.txt /user/nidos/nb/input
Then run the job:
nidos@master:/usr/local/hadoop$ bin/hadoop jar NB.jar NBCDriver Urgent,Yes,Yes /user/nidos/nb/input/1.txt /user/nidos/nb/output
(In the run command above, NBCDriver is the main class.)
Case Study
Classification: KNN & Clustering: K-Means
Prepare all the *.java files (from the KNN and K-Means code, link: https://goo.gl/N0QjY8 ) to be compiled into a *.jar.
Note: in the "mapreduce" folder every Java file declares the package "com.clustering.mapreduce", and in the "model" folder every Java file declares the package "com.clustering.model", so copy them into "/usr/local/hadoop" starting from the "com" folder.
Because these *.java files declare packages as explained on the previous slide, the folder "/usr/local/hadoop" will then simply contain that "com" folder.
Copy-Paste
Run the command below (this creates the *.class files):
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main com/clustering/mapreduce/*.java com/clustering/model/*.java
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Then package the *.class files into the *.jar and run the job:
nidos@master:/usr/local/hadoop$ jar cf KMeans.jar com/clustering/mapreduce/*.class com/clustering/model/*.class
nidos@master:/usr/local/hadoop$ bin/hadoop jar KMeans.jar com/clustering/mapreduce/KMeansClusteringJob
(KMeansClusteringJob is the main class.)
NoSQL and MongoDB
NoSQL (Not Only SQL, i.e. roughly "not just SQL") is a broad class of database management systems identified by not following the rules of the widely used relational database model. NoSQL databases are generally not built around tables and usually do not use SQL to manipulate data.
The CAP theorem states that a distributed computer system cannot simultaneously provide all three of the following guarantees:
o Consistency: every node sees the same data at the same time.
o Availability: every request receives a response indicating whether it succeeded or failed.
o Partition tolerance: the system continues to operate even when parts of it fail.
MongoDB is an open-source document database from the NoSQL family that provides high performance, high availability, and automatic scaling.
Data in MongoDB is stored as documents. MongoDB uses BSON (Binary JSON) serialization as the storage format for documents, similar to JSON (JavaScript Object Notation). Example document:
{
"_id": ObjectId('58520a516a2ea8e208cd6c26'),
"name": "Employee One",
"salary": 20000
}
SQL:
SELECT *
FROM employeedetails
WHERE name = "Employee One"
OR salary = 20000
MongoDB:
db.employeedetails.find(
{
$or: [
{ name: "Employee One" },
{ salary: 20000 }
]
})
SQL vs MongoDB terminology:

SQL          MongoDB
database     database
table        collection
row          document
column       field
index        index
table joins  embedded documents and linking
primary key  primary key
Case study: process data and store the results in MongoDB:
o WordCount
o Rating By Movie
Note: by default, MapReduce stores its results in plain text files on HDFS. That is inconvenient when the results are meant to go into NoSQL so they are easy to read.
MongoDB and Big Data
Running Hadoop (saving the results in MongoDB)
Run the commands below (install MongoDB to store the output of the programs run on Hadoop):
nidos@master:~$ sudo apt-get update
nidos@master:~$ sudo apt-get install mongodb
nidos@master:~$ sudo service mongodb start
start: Job is already running: mongodb
nidos@master:~$ mongo
MongoDB shell version: 2.4.9
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
> use mydb;
switched to db mydb
> db;
mydb
> db.employeedetails.insert({name:"Employee One",salary:20000});
> db.employeedetails.find();
{ "_id" : ObjectId("58520a516a2ea8e208cd6c26"), "name" : "Employee One", "salary" : 20000 }
> exit
bye
Running Hadoop (installing a MongoDB GUI)
Run the commands below:
nidos@master:~$ sudo apt-get install curl
nidos@master:~$ curl http://packages.litixsoft.de/installer.sh | sh
Enter ´sudo ./bbs-installer.sh´ to start installation...
nidos@master:~$ sudo ./bbs-installer.sh
nidos@master:~$ sudo tar xvzf /home/nidos/Desktop/GUI\ Mongo/mms-v1.9.4-community-linux.tar.gz
nidos@master:~$ sudo tar xvzf mms-v1.9.4-community-linux-x86_64.tar.gz
nidos@master:~$ cd mms-v1.9.4-community-linux-x86_64/
nidos@master:~$ sudo ./bbs-installer.sh
BaboonStack Installer for Linux/MacOS
Local Packet found...
Install baboonstack-v1.5.1-linux-x64.tar.gz...
Create /opt/litixsoft/baboonstack...
Extract Files...
Execute Installscript bbs/lxscript.sh...
SUCCESS: Enter ´bbs´ for package updates.
Install MongoDB v2.6.7?: [Y/n] n    (answer n if MongoDB is already installed)
Install RedisIO v2.8.4?: [Y/n] Y
Install Node.js v0.12.0?: [Y/n] Y
Activate Io.js support?: [Y/n] Y
lxManager for BaboonStack - Litixsoft GmbH 2015
Upgrade successfull...
Run the commands below (edit the config, then start the GUI):
nidos@master:~/mms-v1.9.4-community-linux-x86_64$ cd lx-mms
nidos@master:~/mms-v1.9.4-community-linux-x86_64/lx-mms$ sudo nano config.js
nidos@master:~/mms-v1.9.4-community-linux-x86_64/lx-mms$ sudo ./start
Then open http://127.0.0.1:3333/ in a browser and create a connection:
Create/Edit connection
Name: ..........., e.g. "test"
Server connection*: [username:password@]host1[:port1][,host2[:port2],....,[,hostN[:portN]]], e.g. "localhost"
Connection configuration (Hadoop + MongoDB)
Run the commands below:
nidos@master:/usr/local/hadoop$ sudo apt-get install git
nidos@master:/usr/local/hadoop$ git clone https://github.com/mongodb/mongo-hadoop
nidos@master:/usr/local/hadoop$ cd mongo-hadoop
nidos@master:/usr/local/hadoop/mongo-hadoop$ ./gradlew jar
The ./gradlew jar build above produces the file "mongo-hadoop-core-2.0.1.jar" in /usr/local/hadoop/mongo-hadoop/core/build/libs; copy it from there.
Also download the "mongo-java-driver" jar from http://central.maven.org/maven2/org/mongodb/mongo-java-driver/ and pick the latest version, e.g. "mongo-java-driver-3.4.0.jar".
After the build finishes, copy both jars ("mongo-hadoop-core-2.0.1.jar" and "mongo-java-driver-3.4.0.jar") into the lib directories on every node (master, node1, node2, node3) of the Hadoop cluster. The exact locations depend on the Hadoop version:
$HADOOP_PREFIX/lib/
$HADOOP_PREFIX/share/hadoop/mapreduce/
$HADOOP_PREFIX/share/hadoop/lib/
Suppose the two jars are already prepared in "/home/nidos/Desktop/kode/WordCountMongo/":
nidos@master:~/Desktop/kode/WordCountMongo$ ls
mongo-hadoop-core-2.0.1.jar  mongo-java-driver-3.4.0.jar  WordCountMongo.java
On the master PC:
nidos@master:~$ cp /home/nidos/Desktop/kode/WordCountMongo/mongo* /usr/local/hadoop/lib
nidos@master:~$ cp /home/nidos/Desktop/kode/WordCountMongo/mongo* /usr/local/hadoop/share/hadoop/mapreduce
nidos@master:~$ mkdir -p /usr/local/hadoop/share/hadoop/lib
nidos@master:~$ cp /home/nidos/Desktop/kode/WordCountMongo/mongo* /usr/local/hadoop/share/hadoop/lib
On node1 (repeat the same on node2 and node3):
nidos@node1:~$ scp nidos@master:/home/nidos/Desktop/kode/WordCountMongo/mongo* /usr/local/hadoop/lib
nidos@node1:~$ scp nidos@master:/home/nidos/Desktop/kode/WordCountMongo/mongo* /usr/local/hadoop/share/hadoop/mapreduce
nidos@node1:~$ mkdir -p /usr/local/hadoop/share/hadoop/lib
nidos@node1:~$ scp nidos@master:/home/nidos/Desktop/kode/WordCountMongo/mongo* /usr/local/hadoop/share/hadoop/lib
Running Hadoop WordCount (saving the results in MongoDB)
Create the collection "in" in the DB "testmr":
Right-click the DB "testmr", choose "Add collection..", enter a name, e.g. "in", then click Save.
Import a "*.json" file into the collection "in" of the DB "testmr" as the input to be processed by WordCountMongo.java.
For the file "in.json", which is not in standard Mongo format:
Right-click the DB "testmr", choose "Import documents...", fill in "Name of destination collection*", e.g. "in", choose "Import documents from textbox", paste the following into "Text Input", then click Import:
{"x": "hello world"},
{"x": "nice to meet you"},
{"x": "good to see you"},
{"x": "world war 2"},
{"x": "see you again"},
{"x": "bye bye"}
in.json (not standard Mongo format):
[{"x": "hello world"},
{"x": "nice to meet you"},
{"x": "good to see you"},
{"x": "world war 2"},
{"x": "see you again"},
{"x": "bye bye"}]
in_standard.json (standard Mongo format):
{ "x" : "hello world", "_id" : { "$oid" : "5856c34689b0ce4a6e000001" } }
{ "x" : "nice to meet you", "_id" : { "$oid" : "5856c34689b0ce4a6e000002" } }
{ "x" : "good to see you", "_id" : { "$oid" : "5856c34689b0ce4a6e000003" } }
{ "x" : "world war 2", "_id" : { "$oid" : "5856c34689b0ce4a6e000004" } }
{ "x" : "see you again", "_id" : { "$oid" : "5856c34689b0ce4a6e000005" } }
{ "x" : "bye bye", "_id" : { "$oid" : "5856c34689b0ce4a6e000006" } }
For the standard Mongo file "in_standard.json":
Right-click the DB "testmr", choose "Import documents...", fill in "Name of destination collection*", e.g. "in", click Browse.., select the file, click Open, then click Import.
Prepare the *.java file (WordCountMongo.java, part 1 of 2) to be compiled into a *.jar:
import java.util.*;
import java.io.*;
import org.bson.*;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
/**
* test.in
db.insert( { x: "eliot was here" } )
db.insert( { x: "eliot is here" } )
db.insert( { x: "who is here" } )
*
*/
public class WordCountMongo {
public static class TokenizerMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, BSONObject value, Context context )
throws IOException, InterruptedException {
System.out.println( "key: " + key );
System.out.println( "value: " + value );
StringTokenizer itr = new StringTokenizer(value.get( "x" ).toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
WordCountMongo.java, part 2 of 2:
public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context )
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
conf.set( "mongo.input.uri" , "mongodb://localhost/testmr.in" );
conf.set( "mongo.output.uri" , "mongodb://localhost/testmr.out" );
@SuppressWarnings("deprecation")
Job job = new Job(conf, "word count");
job.setJarByClass(WordCountMongo.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass( MongoInputFormat.class );
job.setOutputFormatClass( MongoOutputFormat.class );
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Counting word occurrences in the documents with "WordCountMongo.java":
Compile WordCountMongo.java into a *.jar:
nidos@master:~$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ cp /home/nidos/Desktop/kode/WordCountMongo/WordCountMongo.java /usr/local/hadoop
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main WordCountMongo.java
If this error appears:
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main WordCountMongo.Java
error: Class names, 'WordCountMongo.Java', are only accepted if annotation processing is explicitly requested
1 error
Solution: change "WordCountMongo.Java" to "WordCountMongo.java" (the file extension must be lowercase).
If instead the compile fails with missing-package errors:
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main WordCountMongo.java
WordCountMongo.java:4: error: package org.bson does not exist
import org.bson.*;
^
WordCountMongo.java:6: error: package com.mongodb.hadoop does not exist
import com.mongodb.hadoop.MongoInputFormat;
^
WordCountMongo.java:7: error: package com.mongodb.hadoop does not exist
import com.mongodb.hadoop.MongoOutputFormat;
^
WordCountMongo.java:22: error: cannot find symbol
public static class TokenizerMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
^
symbol: class BSONObject
location: class WordCountMongo
WordCountMongo.java:25: error: cannot find symbol
public void map(Object key, BSONObject value, Context context )
^
symbol: class BSONObject
location: class TokenizerMapper
WordCountMongo.java:60: error: cannot find symbol
job.setInputFormatClass( MongoInputFormat.class );
^
symbol: class MongoInputFormat
location: class WordCountMongo
WordCountMongo.java:61: error: cannot find symbol
job.setOutputFormatClass( MongoOutputFormat.class );
^
symbol: class MongoOutputFormat
location: class WordCountMongo
7 errors
Solution: go back to the "Connection configuration (Hadoop + MongoDB)" step and copy (cp/scp) the MongoDB jars into the Hadoop lib directories.
After that the compile succeeds; then package the classes:
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main WordCountMongo.java
nidos@master:/usr/local/hadoop$ jar cf wcmongo.jar WordCountMongo*.class
Run the word-count job over the documents stored in MongoDB:
nidos@master:/usr/local/hadoop$ bin/hadoop jar wcmongo.jar WordCountMongo
(Reminder: with bin/hdfs use the "dfs" subcommand; with bin/hadoop use "fs".)
The results appear in the collection "testmr.out":
{"_id":"2","value":1}
{"_id":"again","value":1}
{"_id":"bye","value":2}
{"_id":"good","value":1}
{"_id":"hello","value":1}
{"_id":"meet","value":1}
{"_id":"nice","value":1}
{"_id":"see","value":2}
{"_id":"to","value":2}
{"_id":"war","value":1}
{"_id":"world","value":2}
{"_id":"you","value":3}
Case Study 2: Rating By Movie
o Dataset list (from MovieLens), link: https://goo.gl/Jd8GOI
Running Hadoop Rating By Movie (saving the results in MongoDB)
Create the HDFS directories "ratemovie/dataset" one level at a time:
nidos@master:~$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/ratemovie
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/ratemovie/dataset
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos/ratemovie
Copy all the dataset files from the local folder (/home/nidos/Desktop/data/ratemovie/dataset) to the HDFS folder (/user/nidos/ratemovie/dataset):
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/data/ratemovie/dataset/* /user/nidos/ratemovie/dataset
Prepare the *.java files to be compiled into a *.jar; they all begin with "package comratingbymovies.nidos;".
MovieMapper.java (part 1 of 1):
package comratingbymovies.nidos;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MovieMapper extends Mapper<LongWritable,Text,Text,Text>{
Text keyEmit = new Text();
Text valEmit = new Text();
public void map(LongWritable k, Text value, Context context) throws IOException, InterruptedException{
String line=value.toString();
String[] words=line.split("::");
keyEmit.set(words[0]);
valEmit.set(words[1]);
context.write(keyEmit, valEmit);
}
}
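MovieLens *.dat lines are "::"-delimited, so the split in MovieMapper yields the movie id at index 0 and the title at index 1. A quick plain-Java check (the sample line is hypothetical but follows the movies.dat format MovieID::Title::Genres):

```java
public class SplitDemo {
    public static void main(String[] args) {
        String line = "1::Toy Story (1995)::Animation|Children's|Comedy";
        // Same split as MovieMapper: words[0] = movie id (key), words[1] = title (value).
        String[] words = line.split("::");
        System.out.println(words[0]); // the movie id used as the join key
        System.out.println(words[1]); // the title emitted as the value
    }
}
```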
MovieReducer.java (part 1 of 2):
package comratingbymovies.nidos;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Reducer;
public class MovieReducer extends Reducer<Text,Text,Text,IntWritable>
{
Text valTitle = new Text();
Text valEmit = new Text();
String merge;
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException ,
InterruptedException{
int counter = 0;
merge = "";
for(Text value:values){
if (value.toString().startsWith("#")){ //from rating
counter++;
}
MovieReducer.java (part 2 of 2):
else if ("".equalsIgnoreCase(merge)){// from movies get the title
merge = value.toString();
}
}
valTitle.set(merge);
context.write(valTitle, new IntWritable(counter));
}
}
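The reducer's merge logic (count the "#" markers coming from RatingMapper, keep the title coming from MovieMapper) can be sketched in plain Java; the values below are hypothetical:

```java
import java.util.List;

public class ReduceDemo {
    // Mirrors MovieReducer over one key's values, which mix "#" rating
    // markers with a single movie title.
    static String reduce(List<String> values) {
        int counter = 0;
        String merge = "";
        for (String value : values) {
            if (value.startsWith("#")) {             // from the ratings file
                counter++;
            } else if ("".equalsIgnoreCase(merge)) { // from the movies file: the title
                merge = value;
            }
        }
        return merge + "\t" + counter;
    }

    public static void main(String[] args) {
        // Two rating markers plus the title record for one movie id.
        System.out.println(reduce(List.of("#", "Toy Story (1995)", "#")));
    }
}
```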
RatingCounterByMovieMongo.java (part 1 of 2):
package comratingbymovies.nidos;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.mapred.MongoOutputFormat;
public class RatingCounterByMovieMongo extends Configured implements Tool{
public int run(String[] args) throws Exception {
final Configuration conf = getConf();
conf.set("mongo.output.uri", args[2]);
Path p1=new Path(args[0]);
Path p2=new Path(args[1]);
Job job = new Job(conf,"Multiple Job");
job.setJarByClass(RatingCounterByMovieMongo.class);
MultipleInputs.addInputPath(job, p1, TextInputFormat.class, RatingMapper.class);
MultipleInputs.addInputPath(job, p2, TextInputFormat.class, MovieMapper.class);
RatingCounterByMovieMongo.java (part 2 of 2):
job.setReducerClass(MovieReducer.class);
job.setOutputFormatClass(com.mongodb.hadoop.MongoOutputFormat.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class); // both mappers emit Text values
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class); // the reducer emits IntWritable counts
boolean success = job.waitForCompletion(true);
return success?0:1;
}
public static void main(String[] args) throws Exception {
if (args.length != 3){
System.err.println("Usage : <inputlocation1> <inputlocation2> <outputlocation>");
System.exit(0);
}
int res = ToolRunner.run(new Configuration(), new RatingCounterByMovieMongo(), args);
System.exit(res);
}
}
RatingMapper.java (part 1 of 1):
package comratingbymovies.nidos;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Mapper;
public class RatingMapper extends Mapper<LongWritable,Text,Text,Text>{
Text keyEmit = new Text();
Text valEmit = new Text();
public void map(LongWritable k, Text v, Context context) throws IOException, InterruptedException{
String line=v.toString();
String[] words=line.split("::");
keyEmit.set(words[1]);
valEmit.set("#");
context.write(keyEmit, valEmit);
}
}
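Putting the pieces together: RatingMapper keys each rating line by its movie id (field 1) with the marker "#", MovieMapper keys the title by the same id (field 0), and the shuffle groups both under one key before the reducer counts markers. A minimal in-memory simulation of this MultipleInputs join (the sample lines are hypothetical but follow the MovieLens formats):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinDemo {
    // Simulate map -> shuffle -> reduce for the rating-by-movie job.
    static Map<String, Long> ratingCountByTitle(String[] ratings, String[] movies) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String r : ratings) // UserID::MovieID::Rating::Timestamp -> key = field 1
            grouped.computeIfAbsent(r.split("::")[1], k -> new ArrayList<>()).add("#");
        for (String m : movies)  // MovieID::Title::Genres -> key = field 0, value = title
            grouped.computeIfAbsent(m.split("::")[0], k -> new ArrayList<>()).add(m.split("::")[1]);
        // Reduce phase: count "#" markers, keep the title.
        Map<String, Long> result = new LinkedHashMap<>();
        for (List<String> values : grouped.values()) {
            long count = values.stream().filter(v -> v.startsWith("#")).count();
            String title = values.stream().filter(v -> !v.startsWith("#")).findFirst().orElse("");
            result.put(title, count);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] ratings = { "1::1::5::978300760", "2::1::4::978301968", "3::2::3::978302268" };
        String[] movies  = { "1::Toy Story (1995)::Animation", "2::Jumanji (1995)::Adventure" };
        System.out.println(ratingCountByTitle(ratings, movies));
    }
}
```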
Compile all the *.java files into a *.jar:
nidos@master:~$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ cp -r /home/nidos/Desktop/kode/RatingByMovies/comratingbymovies /usr/local/hadoop
nidos@master:/usr/local/hadoop$ bin/hdfs com.sun.tools.javac.Main comratingbymovies/nidos/*.java
Note: comratingbymovies/nidos/RatingCounterByMovieMongo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
nidos@master:/usr/local/hadoop$ jar cf ratemovie.jar comratingbymovies/nidos/*.class
Run the movie-rating count job:
nidos@master:/usr/local/hadoop$ bin/hadoop jar ratemovie.jar comratingbymovies/nidos/RatingCounterByMovieMongo /user/nidos/ratemovie/dataset/ratings.dat /user/nidos/ratemovie/dataset/movies.dat mongodb://localhost/ratemovie.out
Running it again, this time saving to a different collection ("out2"):
nidos@master:/usr/local/hadoop$ bin/hadoop jar ratemovie.jar comratingbymovies/nidos/RatingCounterByMovieMongo /user/nidos/ratemovie/dataset/ratings.dat /user/nidos/ratemovie/dataset/movies.dat mongodb://localhost/ratemovie.out2
Group Assignment
1. Repeat the WordCount case study with different documents (files > 0.5 MB, > 1 MB, > 10 MB, > 100 MB, > 1000 MB) on a Hadoop single- or multi-node cluster, store the results in MongoDB, and explain every step with screenshots! (Optional)
2. Repeat the RateMovie case study with the same documents as in the example on a Hadoop single- or multi-node cluster, store the results in MongoDB, and explain every step with screenshots! (Optional)