Big Data Lambda Architecture - Batch Layer Hands-On


Big Data Pipeline

Lambda Architecture - Batch Layer

with AngularJS, Java RESTful Web Services, Apache Hadoop, Apache Spark and Apache Cassandra on the Amazon Web Services Cloud Platform

BIG Data Pipeline: Ingest, Store, Process, Visualize

Figure: BIG Data Batch Layer Pipeline (Ingest, Store, Process, Visualize). Apache web logs and log/data files from the AngularJS web app and REST web services are stored in S3 and HDFS, processed on a Spark cluster with the Spark engine and Spark SQL, written to Apache Cassandra and S3, and visualized through interactive queries in an AngularJS web app.
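In a Lambda architecture the batch layer rebuilds its view from the complete, append-only master dataset on every run instead of patching the previous result. The sketch below illustrates that idea with a hypothetical record shape of (epoch_seconds, request_path) and a hits-per-path view; it is not the deck's Spark code, just the recompute-from-scratch principle in plain Python.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (epoch_seconds, request_path) parsed from a web log line.
Record = Tuple[int, str]


def batch_view(master_dataset: Iterable[Record], cutoff: int) -> Dict[str, int]:
    """Recompute hits per path from scratch over every record up to `cutoff`.

    The master dataset is append-only; each batch run rebuilds the whole view
    rather than updating the previous one in place.
    """
    return dict(Counter(path for ts, path in master_dataset if ts <= cutoff))


if __name__ == "__main__":
    logs = [(100, "/index.html"), (160, "/rest/users"), (220, "/index.html")]
    # The record at t=220 is newer than the cutoff, so this run ignores it.
    print(batch_view(logs, cutoff=200))  # {'/index.html': 1, '/rest/users': 1}
```

Because the view is a pure function of the stored data, a buggy run can be fixed by correcting the code and simply recomputing.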

Figure: BIG Data Real-Time Layer Pipeline (Ingest, Stream, Process, Store, Visualize). Clickstream data, Apache web logs and log/data files arrive through Apache Kafka and TCP sockets, are processed on a Spark cluster with Spark Streaming and Spark SQL, are stored in S3, HDFS and Apache Cassandra, and are visualized through interactive queries in an AngularJS web app.
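The real-time (speed) layer complements the batch layer: it updates a small view incrementally per event, and a query merges that view with the last batch view. A minimal sketch, assuming a hits-per-path view; the class and function names are hypothetical and stand in for what Spark Streaming and Cassandra would do in the pipeline above.

```python
from collections import Counter
from typing import Dict


class RealtimeView:
    """Speed-layer view: updated incrementally as each event arrives."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, path: str) -> None:
        self.counts[path] += 1

    def reset(self) -> None:
        # Simplification: drop everything once a batch run has absorbed these events.
        self.counts.clear()


def query(batch: Dict[str, int], realtime: RealtimeView, path: str) -> int:
    """Answer a query by merging the precomputed batch view with recent counts."""
    # Counter returns 0 for missing keys without inserting them.
    return batch.get(path, 0) + realtime.counts[path]


if __name__ == "__main__":
    rt = RealtimeView()
    rt.on_event("/index.html")
    print(query({"/index.html": 41}, rt, "/index.html"))  # 42
```

A production speed layer would discard only events older than the batch cutoff rather than clearing everything, so no events fall between the two views.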

Install Web Server

EC2 instance for Web Server

cat /etc/*-release

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java8-installer

java -version

mkdir webserver

cd webserver

wget http://www-eu.apache.org/dist/tomcat/tomcat-8/v8.0.36/bin/apache-tomcat-8.0.36.tar.gz

tar xvzf apache-tomcat-8.0.36.tar.gz

cd apache-tomcat-8.0.36/bin

./startup.sh

Commands to set up Apache Tomcat 8.0
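Both Tomcat and Cassandra in this walkthrough rely on Java 8, and the `java -version` step is there to confirm it. If you script the setup, a small helper can check the reported version. This is a sketch under the assumption that the banner contains `version "..."`, which is how Oracle and OpenJDK builds print it; note that `java -version` writes to stderr.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major version from `java -version` output.

    Java 8 and earlier report "1.<major>..." (e.g. "1.8.0_91");
    Java 9 and later report "<major>..." (e.g. "11.0.11").
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in: " + version_output)
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0].split("-")[0])  # strip suffixes such as "9-ea"


def installed_java_major() -> int:
    # `java -version` prints its banner on stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

For example, `assert installed_java_major() == 8` could guard the Tomcat and Cassandra start scripts.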

Apache Tomcat 8.0 running on EC2 Instance
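Tomcat's default HTTP connector listens on port 8080, so a quick way to confirm the instance is serving is an HTTP request to that port (the EC2 security group must allow inbound traffic on it). A minimal health-check sketch using only the standard library; the host name in the usage line is a placeholder.

```python
import urllib.error
import urllib.request


def is_http_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server at `url` answers with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx reply still proves the server is up; 5xx means it is unhealthy.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: nothing usable is listening.
        return False


if __name__ == "__main__":
    # Placeholder host: substitute the EC2 instance's public DNS name or IP.
    print(is_http_up("http://EC2_PUBLIC_DNS:8080/"))
```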

Install Apache Cassandra - 3 Node Cluster on AWS

3 EC2 instances for the Cassandra cluster

cat /etc/*-release

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java8-installer

java -version

mkdir db

cd db

wget http://www-eu.apache.org/dist/cassandra/3.0.7/apache-cassandra-3.0.7-bin.tar.gz

tar xvzf apache-cassandra-3.0.7-bin.tar.gz

cd apache-cassandra-3.0.7/

bin/cassandra -f

bin/cqlsh (run from a second terminal, because -f keeps Cassandra in the foreground)

cassandra1 -> 52.87.183.121
cassandra2 -> 52.207.239.229
cassandra3 -> 54.174.185.29

Commands to set up Apache Cassandra 3.0.7. Repeat for all 3 EC2 instances.

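Change the following in conf/cassandra.yaml (values shown are for cassandra3; on each node, set broadcast_address to that node's own public IP):

cluster_name: 'Test Cluster'
listen_address:
broadcast_address: 54.174.185.29
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "52.87.183.121,52.207.239.229"
rpc_address:

Leaving listen_address and rpc_address blank makes Cassandra use the address the node's hostname resolves to.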
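Because the same edit has to be repeated on all three instances with only broadcast_address changing, the fragment can be rendered per node. This sketch only prints the lines for each node; it does not modify cassandra.yaml. It uses the node addresses from the slides and, as on the slides, cassandra1 and cassandra2 as seeds; the function name is hypothetical.

```python
from typing import Dict, Sequence

# Node addresses from the slides; cassandra1 and cassandra2 act as seeds.
NODES: Dict[str, str] = {
    "cassandra1": "52.87.183.121",
    "cassandra2": "52.207.239.229",
    "cassandra3": "54.174.185.29",
}
SEEDS = ["52.87.183.121", "52.207.239.229"]


def cassandra_yaml_fragment(broadcast_ip: str, seeds: Sequence[str],
                            cluster_name: str = "Test Cluster") -> str:
    """Render the cassandra.yaml keys edited on one node (seeds stay nested)."""
    seed_list = ",".join(seeds)
    return "\n".join([
        f"cluster_name: '{cluster_name}'",
        "listen_address:",
        f"broadcast_address: {broadcast_ip}",
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{seed_list}"',
        "rpc_address:",
    ])


if __name__ == "__main__":
    for name, ip in NODES.items():
        print(f"# {name}")
        print(cassandra_yaml_fragment(ip, SEEDS))
```

Every node gets the same cluster_name and seed list; only broadcast_address differs.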


3 Node Cassandra Server running on AWS EC2 Instances


CREATE KEYSPACE users WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

USE users;

CREATE TABLE user (id int PRIMARY KEY, name text);

SELECT * FROM user;
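With SimpleStrategy and a replication factor of 3 on a 3-node cluster, every node holds a replica of every row. How many node failures that survives depends on the consistency level; for QUORUM, Cassandra needs floor(RF / 2) + 1 replicas to respond. A small sketch of that arithmetic:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # replication_factor of the users keyspace
    print(quorum(rf), tolerated_failures(rf))  # 2 1
```

So the users keyspace keeps serving QUORUM reads and writes with any one of the three EC2 instances down.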

AngularJS - Java RESTful Web Services Deployed on AWS Cloud


The Tomcat web server log we will be processing with Apache Hadoop/Spark
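The slides do not show the log format. Assuming Tomcat's default AccessLogValve pattern from conf/server.xml, `%h %l %u %t "%r" %s %b` (Common Log Format), each line can be parsed with a regular expression. This is a sketch of a per-line parser, not the contents of BatchLogAnalyzer.py; the `AccessLogEntry` type and `parse_line` name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumes Tomcat's default AccessLogValve pattern: %h %l %u %t "%r" %s %b
ACCESS_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


class AccessLogEntry(NamedTuple):
    host: str
    time: str
    method: str
    path: str
    status: int
    size: int  # response body bytes; Tomcat logs "-" when none were sent


def parse_line(line: str) -> Optional[AccessLogEntry]:
    """Parse one access-log line, returning None for lines that do not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    size = match.group("size")
    return AccessLogEntry(
        host=match.group("host"),
        time=match.group("time"),
        method=match.group("method"),
        path=match.group("path"),
        status=int(match.group("status")),
        size=0 if size == "-" else int(size),
    )
```

In PySpark the same function can be applied per line, for example `sc.textFile(log_path).map(parse_line).filter(lambda e: e is not None)`, where `log_path` points at the uploaded log.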

Web log and Python application uploaded to an AWS S3 bucket

Spark job executed on AWS EMR - Spark Cluster
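The slide shows the job running on EMR but not how it was submitted. One common route is an EMR step that runs `spark-submit` through `command-runner.jar`. The sketch below only builds the step definition; the S3 URIs are placeholders, and the helper name is hypothetical.

```python
from typing import Dict, List


def spark_step(script_s3_uri: str, script_args: List[str],
               name: str = "BatchLogAnalyzer") -> Dict:
    """Build an EMR step definition that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster alive if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_uri, *script_args],
        },
    }


if __name__ == "__main__":
    # Placeholder bucket and prefixes.
    print(spark_step("s3://example-bucket/BatchLogAnalyzer.py",
                     ["s3://example-bucket/logs/"]))
```

The resulting dictionary can be passed to boto3 as `boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`, where `cluster_id` is the running cluster's ID.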

Results stored in Cassandra Database

Results stored in AWS S3 Bucket
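Results written to S3 are typically plain text. Spark's `saveAsTextFile` writes part files under an S3 prefix; for a small result set, the rows can also be serialized to CSV and uploaded as one object. A sketch, assuming (path, hits) result rows; the column names are hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple


def to_csv(rows: Iterable[Tuple[str, int]]) -> str:
    """Serialize (path, hits) result rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["path", "hits"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([("/index.html", 3), ("/rest/users", 1)]), end="")
```

The text could then be uploaded with boto3, for example `boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=to_csv(rows))`, with `bucket` and `key` chosen for your account.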

Python Application BatchLogAnalyzer.py executed on AWS Spark Cluster
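The deck does not list what BatchLogAnalyzer.py computes. Typical web-log batch metrics are hits per path, counts per status code and the error share; the sketch below computes those in plain Python from hypothetical (path, status) pairs, as an illustration rather than the author's script.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical input: (request_path, http_status) pairs parsed from the web log.
Request = Tuple[str, int]


def summarize(requests: Iterable[Request], top_n: int = 10) -> Dict[str, object]:
    """Count hits per path and per status code, plus the share of 4xx/5xx responses."""
    paths: Counter = Counter()
    statuses: Counter = Counter()
    for path, status in requests:
        paths[path] += 1
        statuses[status] += 1
    total = sum(statuses.values())
    errors = sum(count for status, count in statuses.items() if status >= 400)
    return {
        "top_paths": paths.most_common(top_n),
        "status_counts": dict(statuses),
        "error_rate": errors / total if total else 0.0,
    }
```

On a Spark cluster the per-path count maps to `rdd.map(lambda r: (r[0], 1)).reduceByKey(operator.add)`, which spreads the counting across executors instead of one process.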

Results compared between the console output and the Cassandra database
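Comparing the console output with Cassandra by eye works for a few rows; for more, a key-by-key diff is less error-prone. A sketch, assuming both sides are loaded as {key: count} dictionaries. With the DataStax Python driver, whose default row factory returns named tuples, the Cassandra side could be built as `{row.path: row.hits for row in session.execute(...)}`, where the table and column names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[int], Optional[int]]


def diff_results(console: Dict[str, int], cassandra: Dict[str, int]) -> List[Mismatch]:
    """List keys whose counts differ or exist on only one side, sorted by key."""
    mismatches = []
    for key in sorted(set(console) | set(cassandra)):
        left, right = console.get(key), cassandra.get(key)
        if left != right:
            mismatches.append((key, left, right))
    return mismatches
```

An empty list means the two result sets agree.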

Thank You

hkbhadraa@gmail.com
