Democratizing AI with Apache Spark
Ali Ghodsi, Co-Founder and CEO

Page 1: Democratizing AI with Apache Spark

Democratizing AI with Apache Spark

Ali Ghodsi, Co-Founder and CEO

Page 2: Democratizing AI with Apache Spark

AI is changing the world

• AlphaGo
• SIRI/assistants
• Self-driving cars

Why now?

Page 3: Democratizing AI with Apache Spark

Data is the catalyst

AI hasn’t been democratized

Better training, tuning, validation

More data:

• Clickstreams
• Sensor data (IoT)
• Video
• Speech
• Handwriting

Page 4: Democratizing AI with Apache Spark

The hardest part of AI isn’t AI

“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015

How do we democratize AI?

Page 5: Democratizing AI with Apache Spark

“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015

+ AI

FLEXIBLE | FAST | BIG DATA

Page 6: Democratizing AI with Apache Spark

Some gaps remain

1. Manage Data Infrastructure

• Create, configure, monitor resilient big data clusters.
• Securely access silos of disparate data sources.
• Enforce proper data governance.

2. Empower Teams to Be Productive

• Interactively explore data and prototype ideas.
• Securely share big data clusters among analysts.
• Debug, troubleshoot, version-control big data applications.

3. Establish Production-Ready Applications

• Set up robust ML data pipelines for ETL/ELT.
• Productionize real-time applications with HA, FT.
• Build, serve, maintain advanced machine learning models.

Page 7: Democratizing AI with Apache Spark

Databricks: Closing the gap

1. Just-in-Time Data Platform

• Separate compute & storage
• Integrate existing data stores
• Efficient cache on first access

Agile + Low TCO

2. Integrated Workspace

• Interactive notebooks, dashboards, reports
• Real-time exploration, machine learning, graph use cases

Accelerate Time to Value

3. Automated Spark Management

• Workflow scheduler for ML, streaming, SQL, ETL
• Performance-optimized, high availability, fault-tolerant

Performance

Page 8: Democratizing AI with Apache Spark

Enterprise AI use-cases

Predict credit score, credit limit, anomalies

Predict energy demand based on massive weather data

Natural language processing to extract author graph

Predict player churn, predict network outages

Predict machine equipment failure

Page 9: Democratizing AI with Apache Spark

New Frontier of AI: Deep Learning

• Detect cancer
• Understand speech
• Infer location
• Identify landmarks in photos
• Recognize Mandarin and English
• Improve cancer detection

Page 10: Democratizing AI with Apache Spark

Faster and easier deep learning with Databricks

GPUs

• TensorFlow: The most popular deep learning framework.

• TensorFrames: Makes TensorFlow computations faster and easier to program on Spark.

TensorFlow on

TensorFrames and GPU support out of the box

Massive parallelism

Page 11: Democratizing AI with Apache Spark

Deep Learning on Databricks

Data Ingest

Feature extraction

Model Training

Productionize

Clusters

Jobs & Workflows

TensorFrames + GPUs

Interactive exploration

Just-in-time data platform

Automated management

Page 12: Democratizing AI with Apache Spark

Thank you.

Page 13: Democratizing AI with Apache Spark

Deep Learning references

• Image recognition (Geo ID):
  – https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/

• Cancer screening:
  – http://www.popsci.com/how-deep-learning-technology-could-be-next-step-in-cancer-detection
  – https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/

• Speech translation:
  – https://www.technologyreview.com/s/544651/baidus-deep-learning-system-rivals-people-at-speech-recognition/

Page 14: Democratizing AI with Apache Spark

Analytics Transforming Industries

• PHARMA: Predicting Diabetes in Rural Counties (Next-Gen Product R&D)
• MEDIA: Generating programs based on Nielsen ratings (Predictive Analytics)
• INDUSTRIAL: Real-time detection of failing wind turbines (Anomaly Detection)

Real-time Data-Driven Analytics Applications

Page 15: Democratizing AI with Apache Spark

Your Storage

DATA WAREHOUSES | HADOOP / DATA LAKES | CLOUD STORAGE

Orchestrated Spark In The Cloud

Databricks Just-in-Time Data Platform

Integrated Workspace

• DASHBOARDS: Reports
• NOTEBOOKS: github, viz, collaboration

Enterprise Security: Access Control, Auditing, Encryption

BI Tools

Open Source

Your Custom Spark Apps

• Clusters: Auto-scaled, resilient, multi-tenant
• Data Integration: Universal secure and fast
• Interfaces: BI tools & REST API

Databricks Managed Services

PRODUCTION JOBS

+

Powered by Apache Spark

Page 16: Democratizing AI with Apache Spark

YOUR STORAGE

Data Warehouses | Hadoop / Data Lakes | Cloud Storage

Orchestrated Spark In The Cloud

Databricks Just-in-Time Data Platform

Integrated Workspace

• Dashboards: Reports
• Notebooks: github, viz, collaboration

BI Tools

Open Source

Your Custom Spark Apps

• Clusters: Auto-scaled, resilient, multi-tenant
• Data Integration: Universal secure and fast
• Interfaces: BI tools & REST API

Production Jobs

Powered by Apache Spark

+ Managed Services

Your Storage

Enterprise Security: Access Control, Auditing, Encryption

Page 17: Democratizing AI with Apache Spark

Today’s Data Reality

Siloed, Unstructured, Fast-Growing Data

Data Warehouses | Hadoop / Data Lakes | Cloud Storage

Page 18: Democratizing AI with Apache Spark

Databricks Just-in-Time Data Platform, Powered by Apache® Spark™

Integrated Enterprise Security Framework

Integrated Workspace

Notebooks Dashboards

Your Custom Spark Apps

Production Jobs

BI Tools

Orchestrated Apache® Spark™ in the Cloud

Open Source + Managed Services

Your Storage

Cloud Storage | Data Warehouses | Data Lakes

OPTIONAL