Demonstrating the Benefits of Hyper-Acceleration


Roop Ganguly, Solution Architect

The End of Moore’s Law

[Figure: CPU clock speed (GHz, 1.0 to 3.0) by year (1970 to 2000) across process nodes 350 nm, 180 nm, 130 nm, 90 nm, and 65 nm, with the "Power Wall" marked; photo of Gordon Moore]

Implications for Big Data

Security Analytics

Risk Management

Behavioral Analytics

Natural Language Processing

AI/Deep Learning

Machine Learning

CPU-Bound Applications – A New Bottleneck

40Gb-100Gb Network

With faster networking and disk technologies now available, CPUs have become the bottleneck: they act like “stop signs” for computation.

[Diagram: Node 1, Node 2, and Node 3 connected by the 40Gb-100Gb network]
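To make the "stop sign" point concrete, here is a minimal sketch, assuming a node's work can be modeled as streaming stages (network read, disk read, CPU processing) whose end-to-end throughput is capped by the slowest stage. The stage names and the disk and CPU rates are hypothetical; only the 100 Gb/s link speed comes from the slide, converted to 12.5 GB/s.

```python
def gbit_to_gbyte(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


def bottleneck(stage_rates: dict) -> tuple:
    """Return (slowest stage, its rate); a streaming pipeline cannot
    sustain more throughput than its slowest stage."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical per-node rates in GB/s (illustrative only, not measurements).
rates = {
    "network": gbit_to_gbyte(100.0),  # 100 Gb/s link -> 12.5 GB/s
    "disk": 4.0,                      # assumed fast NVMe array
    "cpu": 1.5,                       # assumed parse/filter/aggregate rate
}

stage, rate = bottleneck(rates)
print(f"bottleneck: {stage} at {rate} GB/s")
```

In this model, making the network or disk faster leaves the result unchanged; only raising the CPU stage's rate, for example by offloading it to an accelerator, raises end-to-end throughput.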

Accelerators: Microprocessor and Cloud Vendors Respond

ASIC

GPU

FPGA

Data Scientists & Developers

Performance Team

Inhibitor: Programming Model Gap for Hardware Accelerators

Two wildly different skill sets

[Diagram: the Data Science Programming Model (big data platforms) and the Acceleration Programming Model (CPU, GPU, FPGA), separated by the Programming Model Gap]

Cross Platform

Cross Hardware

Intelligent, automatic computation routing

Zero code change

Introducing the Bigstream Hyper-acceleration Layer

[Diagram: big data platforms run on top of the Hyper-acceleration Layer (Dataflow Adaptation Layer, Bigstream Dataflow, Bigstream Hypervisor), which targets CPU, GPU, and FPGA hardware]

3X to 30X acceleration
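The slides do not say how Bigstream decides where each computation runs. As a purely hypothetical illustration of "intelligent, automatic computation routing", the sketch below uses a simple cost-based rule: each dataflow operator carries estimated runtimes per device (invented numbers), and the router sends it to the cheapest device present on the cluster, falling back to the CPU. `Operator` and `route` are illustrative names, not Bigstream APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """A dataflow operator with hypothetical estimated runtimes (seconds) per device."""
    name: str
    est_cost: dict = field(default_factory=dict)


def route(op: Operator, available: set) -> str:
    """Pick the available device with the lowest estimated cost.

    Falls back to "cpu" (assumed always present) when no estimate
    matches an available device, so every operator can still run.
    """
    candidates = {dev: c for dev, c in op.est_cost.items() if dev in available}
    if not candidates:
        return "cpu"
    return min(candidates, key=candidates.get)


# Hypothetical plan: operator names and costs are invented for illustration.
plan = [
    Operator("parse_avro", {"cpu": 4.0, "fpga": 0.5}),
    Operator("hash_join", {"cpu": 3.0, "gpu": 1.0, "fpga": 1.2}),
    Operator("udf_call", {"cpu": 1.0}),
]

devices = {"cpu", "gpu"}  # e.g. a cluster with GPUs but no FPGAs
for op in plan:
    print(op.name, "->", route(op, devices))
```

Because the decision is made per operator at run time, the same plan can run on a CPU-only, GPU, or FPGA cluster without edits, which is one way a "zero code change" property could be achieved.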

Accelerated Spark Architecture with Bigstream


Business Intelligence Use Case

Business Intelligence Query

•Based on Transaction Processing Performance Council – Decision Support (TPC-DS) Benchmark

•Spark SQL query: SELECT i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3 FROM store_sales, customer_demographics, date_dim, item, promotion WHERE ss_sold_date_sk = d_date_sk AND ss_item_sk = i_item_sk AND …….

•Input: approximately 2 GB of Avro table data

•Run the query simultaneously with and without software acceleration on identical Amazon EMR clusters
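The visible part of the query joins store_sales to date_dim and item and averages three sales measures per item. For readers unfamiliar with TPC-DS, here is a pure-Python sketch of that join-and-aggregate pattern on a few hypothetical rows. It covers only the two join predicates shown; the elided predicates and the customer_demographics and promotion joins are omitted, and grouping by i_item_id is assumed from the SELECT list.

```python
from collections import defaultdict

# Hypothetical toy rows; column names follow the TPC-DS tables used in the query.
store_sales = [
    {"ss_sold_date_sk": 1, "ss_item_sk": 10, "ss_quantity": 2, "ss_list_price": 5.0, "ss_coupon_amt": 0.0},
    {"ss_sold_date_sk": 1, "ss_item_sk": 10, "ss_quantity": 4, "ss_list_price": 7.0, "ss_coupon_amt": 1.0},
    {"ss_sold_date_sk": 2, "ss_item_sk": 20, "ss_quantity": 1, "ss_list_price": 3.0, "ss_coupon_amt": 0.5},
    {"ss_sold_date_sk": 9, "ss_item_sk": 20, "ss_quantity": 8, "ss_list_price": 9.0, "ss_coupon_amt": 0.0},
]
date_dim = [{"d_date_sk": 1}, {"d_date_sk": 2}]
item = [{"i_item_sk": 10, "i_item_id": "AAA"}, {"i_item_sk": 20, "i_item_id": "BBB"}]


def avg_by_item(sales, dates, items):
    """Join sales to date_dim and item on their surrogate keys (assumed unique,
    as in TPC-DS), then average three measures per i_item_id."""
    valid_dates = {d["d_date_sk"] for d in dates}
    item_id = {i["i_item_sk"]: i["i_item_id"] for i in items}
    groups = defaultdict(list)
    for s in sales:
        # ss_sold_date_sk = d_date_sk AND ss_item_sk = i_item_sk
        if s["ss_sold_date_sk"] in valid_dates and s["ss_item_sk"] in item_id:
            groups[item_id[s["ss_item_sk"]]].append(s)
    result = {}
    for iid, rows in sorted(groups.items()):
        n = len(rows)
        result[iid] = (
            sum(r["ss_quantity"] for r in rows) / n,    # agg1
            sum(r["ss_list_price"] for r in rows) / n,  # agg2
            sum(r["ss_coupon_amt"] for r in rows) / n,  # agg3
        )
    return result


print(avg_by_item(store_sales, date_dim, item))
```

The last sales row is dropped because its date key has no match in date_dim, just as an inner join would drop it.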

Business Intelligence Use Case Demo
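The side-by-side demo boils down to one number: speedup, the unaccelerated run time divided by the accelerated run time. A minimal sketch of that comparison follows; `wall_time` and `speedup` are illustrative helper names, and the 90 s and 30 s timings are hypothetical values chosen for the arithmetic, not demo results.

```python
import time
from typing import Callable


def wall_time(job: Callable[[], object]) -> float:
    """Run a job once and return its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    job()
    return time.perf_counter() - start


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Speedup = unaccelerated time / accelerated time (3.0 means "3X")."""
    if baseline_s <= 0 or accelerated_s <= 0:
        raise ValueError("timings must be positive")
    return baseline_s / accelerated_s


def percent_time_saved(s: float) -> float:
    """Percentage of the original runtime eliminated by a speedup of s."""
    return (1.0 - 1.0 / s) * 100.0


# Hypothetical timings in seconds, for illustration only (not demo results).
s = speedup(baseline_s=90.0, accelerated_s=30.0)
print(f"{s:.1f}X speedup, {percent_time_saved(s):.1f}% less time")
for factor in (3.0, 30.0):
    print(f"{factor:.0f}X -> {percent_time_saved(factor):.1f}% less time")
```

By this arithmetic, the 3X to 30X range cuts runtime to between one third and one thirtieth of the baseline, saving roughly 67% to 97% of the wall-clock time.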


ETL Adtech Use Case

Adtech ETL/ML Data Pipeline

[Diagram: millions of users generate clicks, likes, and impressions through application/web servers; events flow into Kafka, a distributed messaging system (tens of servers), then into Spark Streaming jobs on a distributed computation system (hundreds of servers), which feed Spark ML and RTB systems]
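The ETL step in a pipeline like this can be sketched in miniature: parse raw event records, drop malformed ones, and count clicks, likes, and impressions per ad in fixed time windows. These are the kind of per-ad counts a downstream ML model or RTB system might consume. The record format (`timestamp,ad_id,event`), the 60-second tumbling window, and the function names are all hypothetical; a real deployment would read from Kafka and run the logic in Spark Streaming.

```python
from collections import Counter

VALID_EVENTS = {"click", "like", "impression"}


def parse(line: str):
    """Parse a hypothetical 'timestamp,ad_id,event' record; return None if malformed."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    ts, ad_id, event = parts
    if not ts.isdigit() or event not in VALID_EVENTS:
        return None
    return int(ts), ad_id, event


def windowed_counts(lines, window_s: int = 60) -> Counter:
    """Count events per (window_start, ad_id, event) using tumbling windows."""
    counts = Counter()
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # drop bad records instead of failing the whole batch
        ts, ad_id, event = rec
        window_start = ts - ts % window_s
        counts[(window_start, ad_id, event)] += 1
    return counts


raw = [
    "0,ad1,impression",
    "10,ad1,click",
    "30,ad1,impression",
    "75,ad2,like",
    "garbage",
    "80,ad1,purchase",  # unknown event type, dropped
]
for key, n in sorted(windowed_counts(raw).items()):
    print(key, n)
```

Parsing and filtering every record is CPU-heavy work at the scale of millions of users, which is the kind of stage the deck targets for acceleration.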

ETL Use Case Demo

Announcement: Bigstream on AWS EMR

Setting the bootstrap script

Bigstream on EMR: add the Bigstream bootstrap URL and your cluster has hyper-acceleration
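On EMR, a bootstrap action is a script that runs on each cluster node during launch, which is how extra software gets installed cluster-wide. Below is a minimal sketch, assuming the boto3 SDK, of the `BootstrapActions` entry you would pass to the EMR client's `run_job_flow()`. The S3 path is a hypothetical placeholder, since the slides do not give the actual Bigstream bootstrap URL, and `bootstrap_action` is an illustrative helper name.

```python
from typing import List, Optional

# HYPOTHETICAL placeholder: substitute the bootstrap script location Bigstream provides.
BIGSTREAM_BOOTSTRAP_PATH = "s3://example-bucket/bigstream/bootstrap.sh"


def bootstrap_action(name: str, path: str, args: Optional[List[str]] = None) -> dict:
    """Build one BootstrapActions entry in the shape boto3's EMR
    run_job_flow() expects: a name plus a ScriptBootstrapAction."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": path, "Args": list(args or [])},
    }


actions = [bootstrap_action("Bigstream hyper-acceleration", BIGSTREAM_BOOTSTRAP_PATH)]
print(actions)

# With AWS credentials configured, the list would be passed at cluster creation, e.g.:
#   import boto3
#   emr = boto3.client("emr")
#   emr.run_job_flow(Name="my-cluster", ReleaseLabel=..., Instances=...,
#                    BootstrapActions=actions, ...)
```

From the command line, the equivalent is the `--bootstrap-actions Path=...,Name=...` option of `aws emr create-cluster`.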

Thank You
