Virtualizing Analytics with Apache Spark, Arsalan Tavakoli-Shiraji, Spark Summit East 2017

Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli

Page 1

Virtualizing Analytics with Apache Spark

Arsalan Tavakoli-Shiraji
Spark Summit East 2017

Page 2

Enterprise aspirations: More data, more intelligence

Page 3

So what’s the formula for success?

Page 4

ANALYTICS

PEOPLE

DATA

3 pillars of any data-driven use case

Page 5

Data: Bigger, messier, more spread out

DATA
• Spread out into silos
• Varying types and structure
• Faster velocity

Page 6

Analytics: More variety and complexity

ANALYTICS
• Multiple approaches
• Iterative discovery
• Difficult to productionize

Page 7

People: Collaboration from start to finish

PEOPLE
• Many roles involved
• Diverse skillsets and goals
• Inefficient hand-offs

Page 8

Can we reuse existing technologies?

Page 9

First Generation: The Data Warehouse
Reporting on small data

DATA
• Only structured data; costly to scale

ANALYTICS
• SQL only

PEOPLE
• Targeted at BI

Page 10

Second Generation: Hadoop + Data Lake
Capture data first, ETL later

DATA
• Hard to centralize the data; limited value without ETL

ANALYTICS
• Disparate and complex tools

PEOPLE
• Limited to developers with big data expertise

Page 11

The New Paradigm

VIRTUAL ANALYTICS
• Decoupled compute and storage
• Uniform data management and security model
• Unified analytics engine
• Enterprise-wide collaboration

DATA
• Data Warehouses
• Cloud Storage
• Hadoop Storage
• And many others…

PEOPLE
• Data Science
• Data Engineering
• BI Analysts
• And many others…

Page 12

Is Spark the Answer?

DATA
• Data Warehouses
• Cloud Storage
• Hadoop Storage
• And many others…

PEOPLE
• Data Science
• Data Engineering
• BI Analysts
• And many others…

Page 13

Databricks + Apache Spark

• Managed Cloud Platform
• Integrated Workspace
• Production Workflow Automation
• Optimized Data Access Layer
• Databricks Enterprise Security

DATA
• Data Warehouses
• Cloud Storage
• Hadoop Storage
• And many others…

PEOPLE
• Data Science
• Data Engineering
• BI Analysts
• And many others…

Page 14

Case Study |

Video quality
Real-time anomaly detection

Viewer loyalty
Grow the Viacom audience

Page 15

The Road Ahead