Spark Summit June 2014

Announcing Databricks Cloud (Spark Summit 2014)

Page 1: Announcing Databricks Cloud (Spark Summit 2014)

Spark Summit June 2014

Page 2: Announcing Databricks Cloud (Spark Summit 2014)

Apache Spark and Databricks

Page 3: Announcing Databricks Cloud (Spark Summit 2014)

Adoption

All major Hadoop distributions include Spark

Beyond Hadoop

Page 4: Announcing Databricks Cloud (Spark Summit 2014)

Partnerships

Partner with Spark distributors to provide a great experience to every Spark user

Partners

Page 5: Announcing Databricks Cloud (Spark Summit 2014)

Certification

Build a strong application ecosystem

Spark API

Spark Distros …

Distros Cert

Spark Apps

… App Cert

Page 6: Announcing Databricks Cloud (Spark Summit 2014)

Certification

Free certification process

Scripts for certifying Spark distributions
•  Developed by community
•  Open-source

Anyone will be able to certify any Spark distribution

Page 7: Announcing Databricks Cloud (Spark Summit 2014)

Training

We’ve been teaching Spark since 2012
•  400+ people this year through Databricks

Just launched a new training program
•  Already hold workshops in 5 cities

300+ people signed up for training on Wednesday

Page 8: Announcing Databricks Cloud (Spark Summit 2014)

Solve Big Data Challenges

Page 9: Announcing Databricks Cloud (Spark Summit 2014)

Big Promise

Great successes using Big Data

Page 10: Announcing Databricks Cloud (Spark Summit 2014)

Big Promise

Your company here! Every organization collects data

Great successes using Big Data

Page 11: Announcing Databricks Cloud (Spark Summit 2014)

Big Challenge

Great successes using Big Data

Your company here!

Google and Facebook spend billions of dollars to develop, implement, and run data analysis tools and products

Every organization collects data

Page 12: Announcing Databricks Cloud (Spark Summit 2014)

Typical Story

Your company starts a Big Data initiative

You are tasked to…

1) Build a Hadoop cluster (IT)
   Clusters hard to set up and manage

2) Build a data pipeline (engineers, data scientists)
   Need to integrate a zoo of tools

3) Get insights & build data products (engineers, data scientists, analysts)
   Tools are hard to use

Page 13: Announcing Databricks Cloud (Spark Summit 2014)

Typical Data Pipeline

Data

ETL

Exploration

Advanced Analytics

Dashboards & Reports

Data Products

Integrate disparate, clunky tools

Hard to navigate data, develop and deploy apps

Page 14: Announcing Databricks Cloud (Spark Summit 2014)

Vision

Make big data easy

Page 15: Announcing Databricks Cloud (Spark Summit 2014)

From Challenges to Solutions

Challenges -> Solutions

Clusters hard to set up and manage -> Hosted platform

Need to integrate a zoo of tools -> Apache Spark

Tools are hard to use -> Interactive Workspace

Page 16: Announcing Databricks Cloud (Spark Summit 2014)

Databricks Cloud

Databricks Cloud

Databricks Workspace

Databricks Platform

Page 17: Announcing Databricks Cloud (Spark Summit 2014)

Databricks Platform

… …

Databricks Workspace

Databricks Platform

Page 18: Announcing Databricks Cloud (Spark Summit 2014)

Databricks Platform

Start clusters in seconds

Zero-cost management

Dynamically scale up & down

Page 19: Announcing Databricks Cloud (Spark Summit 2014)

Apache Spark

Unifies
•  Streaming
•  SQL
•  Machine learning
•  Graphs

Single system, single API

Databricks Platform

Databricks Workspace

Page 20: Announcing Databricks Cloud (Spark Summit 2014)

Databricks Workspace

Dashboards Notebooks Jobs Apps

Databricks Platform

Databricks Workspace

Page 21: Announcing Databricks Cloud (Spark Summit 2014)

Notebooks

Support Python, SQL, Scala

Interactive commands & plots

On-line collaboration

Page 22: Announcing Databricks Cloud (Spark Summit 2014)

Dashboards

WYSIWYG builder

Interactive plots

One-click publishing

Page 23: Announcing Databricks Cloud (Spark Summit 2014)

Job Launcher

Run arbitrary Spark jobs, programmatically

Page 24: Announcing Databricks Cloud (Spark Summit 2014)

Dramatically Simplify Data Pipeline

Data

ETL Exploration Advanced Analytics Dashboards & Reports Data Products

Cloud

Page 25: Announcing Databricks Cloud (Spark Summit 2014)

Dramatically Simplify Data Pipeline

Data

Free users to focus on finding answers & building products

ETL Exploration Advanced Analytics Dashboards & Reports Data Products

Cloud
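To make the pipeline stages named on this slide concrete, here is a toy sketch in plain Python. It is not Spark code and does not use any Databricks Cloud API; the sample records, the function names, and the choice of a per-country mean as the "analytics" step are all hypothetical, chosen only to show how ETL, exploration, analytics, and reporting chain together.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw input: "user,country,amount" records, one of them malformed.
RAW = [
    "alice,US,30.0",
    "bob,DE,12.5",
    "carol,US,50.0",
    "broken-line",
    "dave,DE,7.5",
]


def etl(lines):
    """ETL: parse raw lines into dicts, dropping malformed records."""
    rows = []
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3:
            continue  # skip records that do not have exactly three fields
        user, country, amount = parts
        rows.append({"user": user, "country": country, "amount": float(amount)})
    return rows


def explore(rows):
    """Exploration: group the amounts by country."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row["amount"])
    return dict(by_country)


def analyze(by_country):
    """Stand-in for advanced analytics: mean amount per country."""
    return {country: mean(amounts) for country, amounts in by_country.items()}


def report(stats):
    """Stand-in for dashboards & reports: one text line per country, sorted by name."""
    return [f"{country}: {avg:.2f}" for country, avg in sorted(stats.items())]


# Chain the stages end to end, as in the pipeline diagram.
summary = report(analyze(explore(etl(RAW))))
```

In a real deployment each stage would typically be a separate tool or job; the point of the sketch is only that the stages compose into a single flow when they share one environment.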

Page 26: Announcing Databricks Cloud (Spark Summit 2014)

Demo

Page 27: Announcing Databricks Cloud (Spark Summit 2014)

Availability

Started closed beta program earlier this year

Limited availability soon
•  Gradually ramping up
•  Sign up on databricks.com!

Page 28: Announcing Databricks Cloud (Spark Summit 2014)

3rd Party Apps

Databricks Platform

Databricks Workspace

Page 29: Announcing Databricks Cloud (Spark Summit 2014)

3rd Party Apps

Databricks Platform

… Databricks Workspace Apps

Page 30: Announcing Databricks Cloud (Spark Summit 2014)

Databricks Cloud and Spark

Databricks Cloud runs 100% Apache Spark
•  No lock-in: any Databricks Cloud app runs on any certified Spark distribution

Databricks Cloud accelerates Spark adoption
•  Provides the easiest way to learn and use Apache Spark

Page 31: Announcing Databricks Cloud (Spark Summit 2014)

Databricks Cloud

Databricks Platform

Databricks Workspace

Make big data easy

Dramatically simplify
•  analyzing big data
•  building data products

Fuel growth of Spark ecosystem

Page 32: Announcing Databricks Cloud (Spark Summit 2014)

Thank You!