spark-summit
Democratizing AI with Apache Spark
Ali Ghodsi, Co-Founder and CEO
AI is changing the world
Why now?
AlphaGo, Siri/assistants, self-driving cars
Data is the catalyst
AI hasn’t been democratized
Better training, tuning, validation
More data
Clickstreams
Sensor data (IoT)
Video
Speech
Handwriting
…
The hardest part of AI isn’t AI
“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
How do we democratize AI?
“Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
+ AI
FLEXIBLE FAST BIG DATA
Some gaps remain
1. Manage Data Infrastructure
• Create, configure, monitor resilient big data clusters.
• Securely access silos of disparate data sources.
• Enforce proper data governance.
2. Empower Teams to Be Productive
• Interactively explore data and prototype ideas.
• Securely share big data clusters among analysts.
• Debug, troubleshoot, version-control big data applications.
3. Establish Production-Ready Applications
• Set up robust ML data pipelines for ETL/ELT.
• Productionize real-time applications with high availability (HA) and fault tolerance (FT).
• Build, serve, maintain advanced machine learning models.
Databricks: Closing the gap
1. Just-in-Time Data Platform
Agile + Low TCO
• Separate compute & storage
• Integrate existing data stores
• Efficient cache on first access
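The "efficient cache on first access" idea above can be illustrated with a read-through cache: the first read of an object goes to remote storage, and later reads are served from a local copy. This is a minimal sketch of the general pattern, not Databricks' implementation; `ReadThroughCache`, `fake_remote_fetch`, and the `s3://bucket/events.parquet` path are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Caches objects from a slow remote store the first time they are read."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch              # hypothetical loader for the remote store
        self._local: Dict[str, bytes] = {}
        self.remote_reads = 0            # counts trips to the remote store

    def read(self, path: str) -> bytes:
        if path not in self._local:      # first access: go to remote storage
            self._local[path] = self._fetch(path)
            self.remote_reads += 1
        return self._local[path]         # later accesses: served from the local copy


def fake_remote_fetch(path: str) -> bytes:
    # Stand-in for object storage; a real system would issue a network read here.
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_remote_fetch)
first = cache.read("s3://bucket/events.parquet")   # hypothetical path, fetched remotely
second = cache.read("s3://bucket/events.parquet")  # same path, served from the cache
```

Because compute and storage are separate in this model, the cache lives with the compute side and can be discarded with the cluster without affecting the stored data.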
2. Integrated Workspace
Accelerate Time to Value
• Interactive notebooks, dashboards, reports
• Real-time exploration, machine learning, graph use cases
3. Automated Spark Management
Performance
• Workflow scheduler for ML, streaming, SQL, ETL
• Performance-optimized, high availability, fault-tolerant
Enterprise AI use-cases
Predict credit score, credit limit, anomalies
Predict energy demand based on massive weather data
Natural language processing to extract author graph
Predict player churn and network outages
Predict machine equipment failure
New Frontier of AI: Deep Learning
Detect cancer
Understand speech
Infer location
Identify landmarks in photos
Recognize Mandarin and English
Improve cancer detection
Faster and easier deep learning with Databricks
GPUs
• TensorFlow: The most popular deep learning framework.
• TensorFrames: Makes TensorFlow computations faster and easier to program on Spark.
TensorFlow on
TensorFrames and GPU support out of the box
Massive parallelism
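The "massive parallelism" point rests on a simple pattern: split the data into blocks and apply the same numeric computation to every block at once. The sketch below shows that pattern in plain Python. It does not use the TensorFrames or TensorFlow APIs; the function names and the toy computation `y = 2x + 1` are assumptions for illustration, and threads stand in for cluster workers or GPUs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_blocks(rows: List[float], n_blocks: int) -> List[List[float]]:
    """Split rows into roughly equal contiguous blocks, like dataset partitions."""
    size = max(1, -(-len(rows) // n_blocks))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def score_block(block: List[float]) -> List[float]:
    """Per-block computation; a toy stand-in for a numeric graph (y = 2x + 1)."""
    return [2.0 * x + 1.0 for x in block]


def parallel_map_blocks(rows: List[float], n_blocks: int = 4) -> List[float]:
    """Apply score_block to every block concurrently and stitch results together."""
    blocks = split_into_blocks(rows, n_blocks)
    # Threads stand in for cluster workers here; they illustrate the structure,
    # not the speedup a real distributed or GPU backend would give.
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        results = list(pool.map(score_block, blocks))  # map preserves block order
    return [y for block in results for y in block]


scores = parallel_map_blocks([float(i) for i in range(10)], n_blocks=4)
```

Each block is processed independently, so adding workers increases throughput without changing the per-block code.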
Deep Learning on Databricks
Data Ingest
Feature extraction
Model Training
Productionize
Clusters
Jobs & Workflows
TensorFrames + GPUs
Interactive exploration
Just-in-time data platform
Automated management
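The four stages named on this slide (data ingest, feature extraction, model training, productionize) can be sketched as a chain of small functions. This is a toy illustration in plain Python under stated assumptions: the records are made up, the "model" is a least-squares line rather than a deep network, and none of the function names come from the talk or from any Databricks API.

```python
from statistics import mean
from typing import Callable, List, Tuple


def ingest() -> List[Tuple[float, float]]:
    # Hypothetical raw records: (sensor reading, observed value).
    return [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def extract_features(
    records: List[Tuple[float, float]]
) -> Tuple[List[float], List[float]]:
    # Separate inputs from targets; real pipelines would clean and transform here.
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    return xs, ys


def train(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    # Ordinary least-squares fit of y = slope * x + intercept.
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def productionize(model: Tuple[float, float]) -> Callable[[float], float]:
    # Wrap the fitted parameters in a callable that a job or service could invoke.
    slope, intercept = model
    return lambda x: slope * x + intercept


predict = productionize(train(*extract_features(ingest())))
```

Keeping each stage behind its own function mirrors the slide's layout: stages can be developed interactively and then scheduled as jobs without rewriting them.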
Thank you.
Deep Learning references
• Image recognition (Geo ID):
– https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/
• Cancer screening:
– http://www.popsci.com/how-deep-learning-technology-could-be-next-step-in-cancer-detection
– https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/
• Speech translation:
– https://www.technologyreview.com/s/544651/baidus-deep-learning-system-rivals-people-at-speech-recognition/
Analytics Transforming Industries
PHARMA: Predicting diabetes in rural counties (Next-Gen Product R&D)
MEDIA: Generating programs based on Nielsen ratings (Predictive Analytics)
INDUSTRIAL: Real-time detection of failing wind turbines (Anomaly Detection)
Real-time Data-Driven Analytics Applications
Orchestrated Spark in the Cloud
Databricks Just-in-Time Data Platform, powered by Apache Spark
• Integrated Workspace: notebooks (GitHub, viz, collaboration), dashboards, reports
• Production Jobs
• BI Tools
• Your Custom Spark Apps
• Enterprise Security: access control, auditing, encryption
• Open Source + Databricks Managed Services
– Clusters: auto-scaled, resilient, multi-tenant
– Data Integration: universal, secure, and fast
– Interfaces: BI tools & REST API
• Your Storage: data warehouses, Hadoop / data lakes, cloud storage
Today’s Data Reality
Siloed, Unstructured, Fast-Growing Data
Data Warehouses | Hadoop / Data Lakes | Cloud Storage
Databricks Just-in-Time Data Platform, Powered by Apache® Spark™
Integrated Enterprise Security Framework
Integrated Workspace
Notebooks Dashboards
Your Custom Spark Apps
Production Jobs
BI Tools
Orchestrated Apache® Spark™ in the Cloud
Open Source + Managed Services
Your Storage
Cloud Storage | Data Warehouses | Data Lakes
OPTIONAL