Download pdf - Spark is going to replace Apache Hadoop! Know Why?

Spark is going to replace Hadoop! Know Why?www.edureka.co/apache-spark-scala-training

http://www.edureka.co/apache-spark-scala-training

Agenda

At the end of the session, you will be able to:

Understand Why Learn Spark?

Know Advantages of Spark & its Survey for 2015

Discover Spark Career Path

Understand how Companies are using Spark?

Slide 2 www.edureka.co/apache-spark-scala-training


Why Spark?



Rise of Big Data

Unstructured Data

7000

6000

5000

4000

3000

2000

1000

0

2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015

Structured Data Un-structured Data

By 2020, IDC (International Data Corporation) predicts the number will have reached 40,000 EB, or 40 Zettabytes(ZB)

The world’s information is doubling every two years. By 2020, there will be 5,200 GB of data for every person onEarth.



Application of Big Data

Source: Twitter



Application of Big Data



Hadoop is not Enough!

Limitations:

Hadoop MapReduce is Limited to Batch Processing.Real-time processing was a big “No” in Hadoop

Real-time Processing

Hadoop MapReduce is fast but not fast enoughNot Fast Enough

Conclusion:

It is essential and can be achieved using Spark!



Spark Survey and its Advantages



Spark Survey 2015!

Slide 9 Source: Typesafe 2015 www.edureka.co/apache-spark-scala-training


Advantages of Spark

Slide 10Runs Everywhere

Generality

Ease of Use

100x faster than MR

www.edureka.co/apache-spark-scala-training


Feature Comparision

Slide 11 Source: Databrix

Hadoop MapReduce HADOOP Spark

Fast 100x faster than MapReduce

Batch Processing Batch and Real-time Processing

Stores Data on Disk Stores Data in Memory

OpenSource OpenSource

Written in Java Written in Scala



Spark Features/Modules in Demand



New Features in 2015

Data Frames

• Similar API to data frames in R and Pandas• Automatically optimised via Spark SQL• Released in Spark 1.3

SparkR

• Released in Spark 1.4• Exposes DataFrames, RDD’s & ML library in R

Machine Learning Pipelines

• High Level API• Featurization• Evaluation• Model Tuning

External Data Sources

• Platform API to plug Data-Sources into Spark• Pushes logic into sources

Slide 13 Source: Databrix www.edureka.co/apache-spark-scala-training


Spark Career Path



Job Roles & Industry Focus



Job Trends



Major Companies Using Hadoop



Industry Adoption



How Companies are using Spark?



General Business Goals



Demo



The Big Question!

Is Spark going to replace Hadoop?



The Big Question!

Is Spark going to replace Hadoop?

Answer – Yes, Spark will be used on top of Hadoop and replace MapReduce

Reasons:

1.2.3.

Hadoop MapReduce cannot handle real-time processingHadoop MapReduce is slower than Hadoop SparkWith rise of IOT, Spark is a must



Questions



Survey

Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make

the course better!

Please spare few minutes to take the survey after the webinar.