Demonstrating the Benefits of Hyper-Acceleration
Roop Ganguly, Solution Architect
The End of Moore’s Law
[Figure: CPU clock frequency (GHz) by year, 1970 to 2000, across process nodes 350 nm, 180 nm, 130 nm, 90 nm and 65 nm, marked with the Power Wall; Gordon Moore]
Implications for Big Data
Security Analytics
Risk Management
Behavioral Analytics
Natural Language Processing
AI/Deep Learning
Machine Learning
CPU-Bound Applications – A New Bottleneck
[Figure: 40Gb-100Gb network linking Node 1, Node 2 and Node 3]
Now that faster networking and disk technologies have emerged, CPUs act like “stop signs” for computation.
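The “stop sign” effect can be made concrete with a simple serial-pipeline model, in which a node processes data only as fast as its slowest stage. A minimal sketch, assuming made-up disk and CPU rates (only the 100 Gb/s to 12.5 GB/s conversion is fixed arithmetic):

```python
# Minimal sketch: in a serial pipeline, throughput is capped by the slowest stage.
# Only the Gb/s -> GB/s conversion is fixed; the disk and CPU rates are hypothetical.

def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate from gigabits per second to gigabytes per second."""
    return gbit_per_s / 8.0

def bottleneck(stage_rates_gb_s: dict) -> tuple:
    """Return (stage name, rate in GB/s) of the slowest stage."""
    name = min(stage_rates_gb_s, key=stage_rates_gb_s.get)
    return name, stage_rates_gb_s[name]

node_stages = {
    "network": gbit_to_gbyte(100),  # 100 Gb/s link = 12.5 GB/s
    "disk": 6.0,  # hypothetical local SSD read rate
    "cpu": 1.5,   # hypothetical per-node parse + aggregate rate
}

stage, rate = bottleneck(node_stages)
print(f"bottleneck: {stage} at {rate} GB/s")  # bottleneck: cpu at 1.5 GB/s
```

With these assumed numbers the CPU, not the network, caps the node at 1.5 GB/s; a faster network changes nothing until the compute stage speeds up.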
Microprocessor and Cloud Vendors Respond
Accelerators
ASIC
GPU
FPGA
Inhibitor: Programming Model Gap for Hardware Accelerators
Two wildly different skill sets
[Diagram: Data Scientists & Developers use the Data Science Programming Model on BIG DATA PLATFORMS; the Performance Team uses the Acceleration Programming Model on CPU, GPU and FPGA; a Programming Model Gap separates the two]
Introducing Bigstream Hyper-acceleration Layer
Cross Platform
Cross Hardware
Intelligent, automatic computation routing
Zero code change
Dataflow Adaptation Layer
Bigstream Dataflow
Bigstream Hypervisor
HYPER-ACCELERATION LAYER
BIG DATA PLATFORMS
CPU GPU FPGA
3X to 30X acceleration
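The slides do not describe how Bigstream decides where each computation runs. As an illustration only, here is a hypothetical cost-based router: each operator carries estimated runtimes per device, and the router picks the cheapest available device, falling back to the CPU. The operator names, devices and costs are invented and are not Bigstream's model.

```python
# Hypothetical sketch of cost-based computation routing. Operator names and
# per-device cost estimates are illustrative assumptions only.

from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    # device name -> estimated runtime in ms (hypothetical numbers)
    est_cost_ms: dict = field(default_factory=dict)

def route(plan, available):
    """Map each operator name to the cheapest device among those available.

    Operators with no estimate for any available device fall back to "CPU",
    so work without an accelerated implementation still runs.
    """
    placement = {}
    for op in plan:
        candidates = {d: c for d, c in op.est_cost_ms.items() if d in available}
        placement[op.name] = min(candidates, key=candidates.get) if candidates else "CPU"
    return placement

plan = [
    Operator("scan_avro", {"CPU": 120.0, "FPGA": 30.0}),
    Operator("filter", {"CPU": 40.0, "GPU": 15.0, "FPGA": 10.0}),
    Operator("hash_aggregate", {"CPU": 90.0, "GPU": 20.0}),
    Operator("custom_udf"),  # no accelerated implementation known
]
```

For example, `route(plan, {"CPU", "GPU"})` places `scan_avro` on the CPU, `filter` and `hash_aggregate` on the GPU, and `custom_udf` on the CPU; adding `"FPGA"` to the available set moves `scan_avro` and `filter` to the FPGA.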
Accelerated Spark Architecture with Bigstream
Business Intelligence Use Case
Business Intelligence Query
•Based on Transaction Processing Performance Council – Decision Support (TPC-DS) Benchmark
•Spark/SQL query: SELECT i_item_id, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3 FROM store_sales, customer_demographics, date_dim, item, promotion WHERE ss_sold_date_sk = d_date_sk AND ss_item_sk = i_item_sk AND …
•Input: approximately 2 GB of Avro table data
•The query runs simultaneously, with and without software acceleration, on identical Amazon EMR clusters
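To make the visible part of the query concrete, the sketch below reproduces its three averages in plain Python over a few made-up rows. It assumes the elided remainder of the query groups by i_item_id (SQL requires this when i_item_id is selected next to aggregates), and it ignores the customer_demographics and promotion predicates, which are not shown.

```python
from collections import defaultdict

# Tiny hypothetical rows; real TPC-DS tables have many more columns.
# (ss_sold_date_sk, ss_item_sk, ss_quantity, ss_list_price, ss_coupon_amt)
store_sales = [
    (1, 10, 4, 20.0, 1.0),
    (1, 11, 2, 50.0, 0.0),
    (2, 10, 6, 30.0, 3.0),
    (9, 10, 5, 99.0, 9.0),  # no matching date_dim row: dropped by the join
]
date_dim = {1: "2000-01-01", 2: "2000-01-02"}  # d_date_sk -> d_date
item = {10: "ITEM-A", 11: "ITEM-B"}            # i_item_sk -> i_item_id

def avg_by_item(sales, dates, items):
    """Inner-join sales to dates and items, then average three measures per i_item_id."""
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0])  # quantity, list price, coupon, row count
    for date_sk, item_sk, qty, price, coupon in sales:
        if date_sk not in dates or item_sk not in items:
            continue  # inner-join semantics: unmatched rows are excluded
        acc = sums[items[item_sk]]
        acc[0] += qty
        acc[1] += price
        acc[2] += coupon
        acc[3] += 1
    # (agg1, agg2, agg3) per item, mirroring avg(ss_quantity), avg(ss_list_price), avg(ss_coupon_amt)
    return {item_id: (q / n, p / n, c / n) for item_id, (q, p, c, n) in sums.items()}
```

`avg_by_item(store_sales, date_dim, item)` returns `{"ITEM-A": (5.0, 25.0, 2.0), "ITEM-B": (2.0, 50.0, 0.0)}`; the sale with date key 9 is dropped because it has no date_dim match.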
Business Intelligence Use Case Demo
ETL Adtech Use Case
Adtech ETL/ML Data Pipeline
[Diagram: millions of USERS generate clicks, likes and impressions on APPLICATION/WEB SERVERS; events flow into KAFKA, a distributed messaging system (tens of servers), then through Spark Streaming and Spark ML stages on a distributed computation system (hundreds of servers), which feed RTB Systems]
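In the pipeline above, the Spark Streaming stage performs the ETL between Kafka and Spark ML. As a hypothetical sketch of one micro-batch of that work, written in plain Python rather than Spark, the function below parses JSON click, like and impression events (the event schema is an assumption), skips malformed records, and emits per-ad counts plus a click-through rate that a model or RTB system could consume.

```python
import json
from collections import Counter

VALID_EVENTS = ("impression", "click", "like")

def etl_batch(raw_events):
    """Parse one micro-batch of JSON events and return per-ad features.

    Hypothetical schema: {"user_id": str, "ad_id": str, "event": str},
    where event is "impression", "click" or "like".
    Malformed records are skipped rather than failing the whole batch.
    """
    counts = Counter()
    for raw in raw_events:
        try:
            rec = json.loads(raw)
            ad, kind = rec["ad_id"], rec["event"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # not JSON, missing fields, or not a JSON object
        if kind in VALID_EVENTS:
            counts[(ad, kind)] += 1

    features = {}
    for ad in sorted({ad for ad, _ in counts}):
        imps = counts[(ad, "impression")]
        clicks = counts[(ad, "click")]
        features[ad] = {
            "impressions": imps,
            "clicks": clicks,
            "likes": counts[(ad, "like")],
            # click-through rate; None when the ad had no impressions in this batch
            "ctr": clicks / imps if imps else None,
        }
    return features
```

For a batch with two impressions and one click on `ad-1`, one like on `ad-2`, and one non-JSON line, the result is `{"ad-1": {"impressions": 2, "clicks": 1, "likes": 0, "ctr": 0.5}, "ad-2": {"impressions": 0, "clicks": 0, "likes": 1, "ctr": None}}`.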
ETL Use Case Demo
Announcement: Bigstream on AWS EMR
Setting the bootstrap script
Bigstream on EMR
Add the Bigstream bootstrap URL and your cluster has hyper-acceleration
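On EMR, a bootstrap action is a script that runs on every cluster node at launch, before EMR installs applications such as Spark. A minimal sketch of a cluster request with one such action, shaped like the keyword arguments of boto3's EMR `run_job_flow` call; the S3 path, cluster name, release label and instance settings are placeholders, since the slides do not give the actual Bigstream bootstrap URL.

```python
# Hypothetical sketch: build keyword arguments for boto3's EMR run_job_flow call
# with one bootstrap action. The path and release label are placeholders,
# not Bigstream's actual bootstrap location or a tested release.

BIGSTREAM_BOOTSTRAP_PATH = "s3://example-bucket/bigstream/bootstrap.sh"  # placeholder
RELEASE_LABEL = "emr-5.20.0"  # placeholder; use a release the accelerator supports

def emr_cluster_request(name, bootstrap_path, release_label,
                        instance_type="m5.xlarge", instance_count=3):
    """Return run_job_flow kwargs for a Spark cluster with one bootstrap action."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Bootstrap actions run on every node before EMR installs applications.
        "BootstrapActions": [
            {
                "Name": "Bigstream hyper-acceleration",
                "ScriptBootstrapAction": {"Path": bootstrap_path, "Args": []},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = emr_cluster_request("accelerated-spark", BIGSTREAM_BOOTSTRAP_PATH, RELEASE_LABEL)
```

With AWS credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`; the same bootstrap action can also be added in the EMR console when creating a cluster.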
Thank You