Upload
spark-summit
View
185
Download
7
Embed Size (px)
Citation preview
SPARK + FLASHBLADE DELIVERING INSIGHTS FROM 5PB OF PRODUCT LOGS AT PURE STORAGE
Brian Gold Pure Storage
© 2017 PURE STORAGE INC. 2
ALL-FLASH STORAGE FOR DATA-INTENSIVE COMPUTING
© 2017 PURE STORAGE INC. 3
DATA POTENTIAL IS GROWING EXPONENTIALLY
BIG DATA
FAST DATA X
© 2017 PURE STORAGE INC. 4
DATA POTENTIAL IS GROWING EXPONENTIALLY
BIG DATA
FAST DATA
AGILE DATA X X = DATA
ADVANTAGE
© 2017 PURE STORAGE INC. 5
AGILE DATA MUST BE EFFORTLESS TOP 1%: PURE STORAGE vs ALL OTHER BUSINESSES IN CUSTOMER SATISFACTION
The Pure Storage team brings issues to our attention before we are even aware of them.”
– Matt Harris, Head of IT MERCEDES AMG PETRONAS Formula One™ Team
> 7700 incidents automatically ID’d & resolved > 170 sev-1 issues avoided to date
(As measured by Satmetrix 2016 NPS Global B2B Benchmarks.)
© 2017 PURE STORAGE INC. 6
PURE’S ANALYTICS PIPELINE
CURRENT CHALLENGES
① >10 ANALYSIS TOOLS, PLUS AD-HOC SCRIPTS
② >5PB ACROSS 100s OF DATA SILOS
Engineering tickets
Internal QA tests Product logs
CRM/Sales feed
Manufacturing
© 2017 PURE STORAGE INC. 7
LOG ANALYSIS TODAY
Raw logs (grep, awk, etc.)
AD-HOC OR ENSEMBLE ANALYSIS IS TOO SLOW (OR IMPOSSIBLE)
Extract & Aggregate
© 2017 PURE STORAGE INC. 8
IMPROVEMENTS WITH SPARK
100X+ PERFORMANCE IMPROVEMENTS WITH LESS BESPOKE CODE
Raw logs (archive only)
Extract Transform Aggregate Load
© 2017 PURE STORAGE INC. 9
TOOL CONSOLIDATION
� SPARK ENABLES WORKFLOW CONSOLIDATION • Fewer compute resources • More code reuse
� CHALLENGES FOR SPARK & ECOSYSTEM • Increase performance visibility for tuning partitioning and ingest flows • Improve indexing options integrated in Spark for real-time queries • Eliminating OOMs and spurious exceptions
© 2017 PURE STORAGE INC. 10
PURE’S ANALYTICS PIPELINE
CURRENT CHALLENGES
① >10 ANALYSIS TOOLS, PLUS AD-HOC SCRIPTS
② >5PB ACROSS 100s OF DATA SILOS
Engineering tickets
Internal QA tests Product logs
CRM/Sales feed
Manufacturing
© 2017 PURE STORAGE INC. 11
INFRASTRUCTURE SILOS
HDFS
POSTGRES
ELASTIC
KAFKA
© 2017 PURE STORAGE INC. 12
INFRASTRUCTURE SILOS SPREAD OVER 100+ DATABASES
HDFS
POSTGRES
ELASTIC
KAFKA
HDFS
POSTGRES
MYSQL HDFS
ELASTIC POSTGRES
© 2017 PURE STORAGE INC. 13
INFRASTRUCTURE SILOS SPREAD OVER 100+ DATABASES
HDFS
POSTGRES
ELASTIC
KAFKA
HDFS
POSTGRES
MYSQL HDFS
ELASTIC POSTGRES
RACKS OF EQUIPMENT, OFTEN AT LOW AVERAGE UTILIZATION
© 2017 PURE STORAGE INC. 14
DECOUPLING COMPUTE & STORAGE
?
© 2017 PURE STORAGE INC. 15
INTRODUCING FLASHBLADE ALL-FLASH FILE AND OBJECT STORAGE
BIG, FAST, SIMPLE All-flash HW & SW Up to 1.6PBs, 1M NFS OPS, and 16 GB/s in each 4U chassis Designed to scale to 100s of blades
FILE + OBJECT Access via multiple protocols Ideal for consolidation of applications and access patterns BLADE ELASTICITY ELASTIC FABRIC
© 2017 PURE STORAGE INC. 16
INFRASTRUCTURE CONSOLIDATION CLOUD-SCALE EFFICIENCY AND AGILITY
rsyslog
1000-CORE SPARK CLUSTER ON 250TB FLASHBLADE
© 2017 PURE STORAGE INC. 17
INFRASTRUCTURE CONSOLIDATION CLOUD-SCALE EFFICIENCY AND AGILITY
rsyslog
1000-CORE SPARK CLUSTER ON 250TB FLASHBLADE
DENSITY: SHRINK 5 RACKS TO 1 (HDFS VS FLASHBLADE) FLEXIBILITY: SCALE COMPUTE AND STORAGE INDEPENDENTLY
EFFICIENCY: CONSOLIDATE APPLICATIONS & CLUSTERS, SHARE DATA
© 2017 PURE STORAGE INC. 18
THIS IS JUST THE BEGINNING…
� THE FUTURE IS BRIGHT FOR SPARK + FLASHBLADE • Spark is a fast, flexible, swiss-army knife for consolidating analytics workflows • Running Spark on FlashBlade enables infrastructure consolidation and drives
flexibility and efficiency for data science and supporting teams
� COME VISIT US AT BOOTH 208 IN THE EXPO HALL!
� BIG
� FAST
� SIMPLE