Apache Spark: The Analytics Operating System, Anjul Bhambhri, Vice President, IBM Big Data Engineering

Spark Summit Presentation by Anjul Bhambhri

Page 1: Spark Summit Presentation by Anjul Bhambhri

Apache Spark: The Analytics Operating System

Anjul Bhambhri, Vice President, IBM Big Data Engineering

Page 2: Spark Summit Presentation by Anjul Bhambhri

Deep Blue SQL RISC

DNA Transistor Magnetic Tape Linux PC

Fortran DRAM Mainframe Watson

Floppy Disk UPC

Punch Card

IBM: 100 years of (supporting) innovation

Page 3: Spark Summit Presentation by Anjul Bhambhri

The Analytics Operating System

Apache Spark

Page 4: Spark Summit Presentation by Anjul Bhambhri

Enhance it! Offer it!

Leverage it!

Spark Technology Center @ SF

On-prem and on the cloud

Inside our products

At IBM, We Love Spark!

IBM Cloud Data Services, now featuring Spark, is open for data

Page 5: Spark Summit Presentation by Anjul Bhambhri

IBM is Building on Apache Spark

• IBM Analytics
• IBM Commerce
• IBM Watson
• IBM Research
• IBM Cloud

Quarks from IBM (announced Feb 2016)

• Open-source platform for building IoT applications
• Lightweight & embeddable
• Integrates with Spark

Page 6: Spark Summit Presentation by Anjul Bhambhri

• Lambda Architecture and Spark enable efficient batch and streaming analytics
• Visualization at every step of data discovery enables better self-service
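The Lambda Architecture mentioned here is usually described as three layers: a batch layer that recomputes views from the full history, a speed layer that folds in recent events, and a serving layer that merges the two at query time. A minimal sketch of that idea in plain Python (not Spark code) follows. The function names, the `station` field, and the sample events are hypothetical illustrations, not anything taken from the slides.

```python
from collections import Counter


def build_batch_view(historical_events):
    """Batch layer: recompute per-station counts from the full event log."""
    return Counter(event["station"] for event in historical_events)


def update_speed_view(speed_view, event):
    """Speed layer: fold one newly arrived event into the incremental view."""
    speed_view[event["station"]] += 1
    return speed_view


def query(batch_view, speed_view, station):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view.get(station, 0) + speed_view.get(station, 0)


# Hypothetical sample data: events already stored vs. events still streaming in.
historical = [{"station": "SFO"}, {"station": "SFO"}, {"station": "JFK"}]
recent = [{"station": "SFO"}, {"station": "ORD"}]

batch_view = build_batch_view(historical)
speed_view = Counter()
for event in recent:
    update_speed_view(speed_view, event)

sfo_total = query(batch_view, speed_view, "SFO")
ord_total = query(batch_view, speed_view, "ORD")
```

In a Spark deployment the batch view would typically come from a scheduled batch job and the speed view from a streaming job; the merge at query time is the part this sketch is meant to show.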

The Weather Company clusters running hot:
• ~30 billion API requests per day
• ~120 million active mobile users
• #3 most active mobile user base
• Billions of events per day (1.3M/sec)
• ~360 PB of traffic daily
• Need to keep data forever

The use case:
• Efficient batch + streaming analysis
• Self-serve data science
• BI / visualization tool support

An IBM Business

Spark for daily weather

Page 7: Spark Summit Presentation by Anjul Bhambhri

Spark in Health Care

Health Care Data Lakes
• Improve how healthcare is delivered
• Collect and combine data from dozens of sources
• Clinical, Operational, Financial
• Inside and outside your enterprise

Benefits
• Better medical outcomes for patients
• Control cost and improve quality

SystemML on Spark
• Predictive Risk Modeling
• Right patient intervention relating to adverse health events

Page 8: Spark Summit Presentation by Anjul Bhambhri

Spark in Telecom

The challenge:
• Improve customer satisfaction rates
• Multiple channels for customer interactions
• Very large data volumes

The need:
• Create a 360-degree view of a customer
• Stitch all interactions across channels: the “Customer Experience Journey”
• Classify interaction sentiment and take necessary actions

• Spark Streaming brings all the data together
• Spark Core is used to process and transform text and voice data
• Spark MLlib algorithms stitch interactions on a journey and score “sentiment”
• Spark SQL drives interactive queries via visual dashboards
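The "stitch interactions on a journey and score sentiment" step can be pictured as grouping interactions by customer, ordering them in time, and scoring each one. The sketch below does this in plain Python rather than with Spark MLlib. The word lists, field names, and sample interactions are all hypothetical, and the keyword count is only a stand-in for whatever trained model a real pipeline would use.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a trained sentiment model.
POSITIVE = {"thanks", "great", "resolved"}
NEGATIVE = {"angry", "cancel", "broken"}


def score_sentiment(text):
    """Return (# positive words) - (# negative words) for one interaction."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def stitch_journeys(interactions):
    """Group interactions by customer and order each group by timestamp."""
    journeys = defaultdict(list)
    for item in interactions:
        journeys[item["customer_id"]].append(item)
    for steps in journeys.values():
        steps.sort(key=lambda step: step["timestamp"])
    return dict(journeys)


def journey_sentiment(steps):
    """Sum the per-interaction scores across a whole journey."""
    return sum(score_sentiment(step["text"]) for step in steps)


# Hypothetical interactions arriving from different channels.
interactions = [
    {"customer_id": "c1", "channel": "chat", "timestamp": 2,
     "text": "issue resolved thanks"},
    {"customer_id": "c1", "channel": "voice", "timestamp": 1,
     "text": "my router is broken"},
    {"customer_id": "c2", "channel": "email", "timestamp": 1,
     "text": "I want to cancel"},
]

journeys = stitch_journeys(interactions)
scores = {cid: journey_sentiment(steps) for cid, steps in journeys.items()}
```

On a cluster, the grouping and sorting would map naturally onto distributed group-by and sort operations, with the scores exposed to dashboards through SQL queries.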

[Architecture diagram labels: PUB / SUB (MQTT / WebSockets / Flume / Kafka); Voice & Text Data; Interaction & Journey Data; Journey Dashboards]

Page 9: Spark Summit Presentation by Anjul Bhambhri

Apache Spark: The Analytics Operating System

THANK YOU!