Apache Spark and Its Role in the Enterprise Data Hub
Mike Olson, Chief Strategy Officer, [email protected], @mikeolson
©2014 Cloudera, Inc. All rights reserved.
Spark Unifies and Simplifies Hadoop
Batch Processing
Stream Processing
Machine Learning
Developing and supporting Spark together to ensure customer success
Spark at Cloudera
• October 2013: Databricks and Cloudera partner
• February 2014: Spark support added to CDH
• July 2014: Continuing support & innovation
Spark is a Core Component of Hadoop
Commit Activity Past 12 Months
• Hadoop Core: 2589
• Spark: 4149
• All Other Ecosystem Projects Shipped by Cloudera: 12438
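The commit counts on this slide can be turned into shares of total activity with a few lines of arithmetic. The sketch below uses only the three figures from the chart; the function name `commit_shares` is made up for illustration and is not part of any Cloudera or Spark tooling.

```python
# Commit counts as shown on the slide (past 12 months).
commits = {
    "Hadoop Core": 2589,
    "Spark": 4149,
    "All Other Ecosystem Projects Shipped by Cloudera": 12438,
}


def commit_shares(counts):
    """Return each project's share of total commits as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = commit_shares(commits)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

On these figures Spark accounts for roughly a fifth of all commits across the projects shown, and about 1.6 times the commits of Hadoop Core.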
Fully Integrated into CDH
• Integrated and supported part of our platform
• Diverse use cases in production
• Well-trained support staff and external training
[Diagram: CDH platform stack. Labels: 3rd party apps; batch processing; interactive SQL; search engine; machine learning; stream processing; workload management; storage (filesystem, online NoSQL).]
Customer Adoption
• Search personalization through machine learning investigations
• Fast processing of millions of stock positions and future scenarios
• Genomics research using Spark pipelines
• Predictive modeling of disease conditions
What’s Next?
Cloudera Developer Training for Apache Spark
The only hands-on deep dive into building unified applications with Spark
Public GA: Aug 5, Redwood City
Dell In-Memory Appliances for Cloudera Enterprise
• Simplifies and speeds up complex cluster deployments
• Includes Cloudera Enterprise and ScaleMP's Versatile SMP (vSMP) architecture
• Built on the Intel(R) Xeon(R) processor-based Dell R920 hardware
• Optimized for Spark
Spark as the Standard Processing Engine
Bringing the Communities Together
The Hive and Spark communities are coming together to drive consolidation in the Hadoop ecosystem
Hive on Spark
Architecture
[Diagram labels: Spark (batch processing, stream processing); Hive (parser, metastore, semantic analyser, logical plan, optimizer, task execution layer); MR; Tez; HDFS.]
Our SQL on Hadoop Vision
SQL
BI and SQL Analytics
Batch Processing
Mixed Spark and SQL Applications