Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads

Ahsan Javed Awan, EMJD-DC (KTH-UPC) (https://www.kth.se/profile/ajawan/)

Mats Brorsson (KTH), Eduard Ayguade (UPC and BSC), Vladimir Vlassov (KTH)


Page 2

Motivation

Why should we care about architecture support?

*Taken from Babak's slides

Data Growing Faster Than Technology

Page 3

Motivation (cont.)

Our Goal

Improve node-level performance through architecture support

*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/

Phoenix++, Metis, Ostrich, etc.

Hadoop, Spark, Flink, etc.

Page 4

Our Approach

● Performance characterization of in-memory data analytics on a modern cloud server, in 5th International IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).

● How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB 2015, Hawaii, USA.

– Limited to batch processing workloads only

– Does not consider the velocity aspect of big data

– Experiments are based on an older version of Spark

What are the major performance bottlenecks?

Page 5

Our Approach

● Does micro-architectural performance remain consistent across batch and stream processing workloads?

● How do DataFrames compare to RDDs at the micro-architectural level?

● How does data velocity affect micro-architectural performance?

What are the remaining questions?

Page 6

Which Scale-out Framework?

[Picture Courtesy: Amir H. Payberah]

● Tuning of Spark internal parameters

● Tuning of JVM parameters (heap size, etc.)

● Micro-architecture level analysis using hardware performance counters

Page 7

Which benchmarks?

Page 8

Our Hardware Configuration

Which Machine ?

Hyper-Threading and Turbo Boost are disabled

Intel's Ivy Bridge Server

Page 9

Does micro-architectural performance remain consistent?

Stream processing is micro-architecturally similar to batch processing in Spark
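A minimal sketch of why this can hold, assuming the streaming job differs from the batch job only by micro-batching: the same word-count logic is applied once to the whole input and once per micro-batch, and the merged results agree. This is plain Python rather than Spark code; `word_count`, `micro_batches`, the sample lines, and the batch size of 2 are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import islice


def word_count(lines):
    """Batch-style word count: split each line into words, then sum per word."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batches(stream, batch_size):
    """Cut an iterable into consecutive micro-batches of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


lines = ["a b a", "b c", "a", "c c b"]

# Batch processing: one pass over the full input.
batch_result = word_count(lines)

# Stream processing (illustrative): same function per micro-batch, results merged.
stream_result = Counter()
for batch in micro_batches(lines, batch_size=2):
    stream_result += word_count(batch)

print(batch_result == stream_result)
```

The per-record work is identical in both paths; only the grouping of records into batches changes, which is the situation the slide describes.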

Page 10

Cont..

Stream processing is micro-architecturally similar to batch processing in Spark

Page 11

Cont..

Streaming workloads with similar Spark transformations have different micro-architectural behavior

Page 12

Cont..

Streaming workloads with similar Spark transformations have different micro-architectural behavior

Page 13

Cont..

Streaming workloads with similar Spark transformations have different micro-architectural behavior

Page 14

Cont..

Workload  Spark transformations                        Input data rate  Window size (s)  Working set with 2 s sampling interval
WWc       FlatMap, Map, ReduceByKeyAndWindow           10^4             30               15 x 10^4
CSpc      FlatMap, Map, CountByValueAndWindow          10^4             10               5 x 10^4
CErpz     FlatMap, Map, Window, GroupByKey             10^4             30               15 x 10^4
CAuC      FlatMap, Map, Window, GroupByKey, Count      10^4             10               5 x 10^4
Tpt       FlatMap, ReduceByKeyAndWindow, Transform     10^1             60               30 x 10^1

Micro-batch size determines the micro-architectural behavior of stream processing workloads with similar Spark transformations
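The working-set column in the table appears to follow from the other two: the number of 2 s samples that fit in a window, multiplied by the input data rate. The deck does not state this formula, so the sketch below is an assumption checked only against the table's own numbers; `working_set` is a hypothetical helper.

```python
def working_set(rate, window_s, sampling_s=2):
    """Assumed relation: samples per window multiplied by the input data rate."""
    return (window_s // sampling_s) * rate


# (input data rate, window size in seconds) as listed in the table
workloads = {
    "WWc": (10**4, 30),
    "CSpc": (10**4, 10),
    "CErpz": (10**4, 30),
    "CAuC": (10**4, 10),
    "Tpt": (10**1, 60),
}

for name, (rate, window_s) in workloads.items():
    print(name, working_set(rate, window_s))
```

Under this reading, WWc and CErpz share one working-set size and CSpc and CAuC another, even though the transformations within each pair differ.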

Page 15

Do DataFrames perform better than RDDs at the micro-architectural level?

DataFrames exhibit:

● 25% fewer back-end bound stalls

● 64% fewer DRAM-bound stalled cycles

● 25% less bandwidth consumption

● 10% less starvation of execution resources

DataFrames have better micro-architectural performance than RDDs
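Terms such as back-end bound and DRAM bound match the vocabulary of Intel's top-down analysis method. The deck does not say how its metrics were computed, so the following is only a sketch of the commonly published level-1 top-down breakdown for a 4-wide core such as Ivy Bridge. The counter values are made up for illustration, and `top_down_level1` is a hypothetical helper.

```python
PIPELINE_WIDTH = 4  # issue slots per cycle on Ivy Bridge


def top_down_level1(c):
    """Split pipeline slots into the four level-1 top-down categories."""
    slots = PIPELINE_WIDTH * c["CPU_CLK_UNHALTED.THREAD"]
    frontend = c["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    bad_spec = (
        c["UOPS_ISSUED.ANY"]
        - c["UOPS_RETIRED.RETIRE_SLOTS"]
        + PIPELINE_WIDTH * c["INT_MISC.RECOVERY_CYCLES"]
    ) / slots
    retiring = c["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    # Whatever is not accounted for above is attributed to the back end.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "front_end_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "back_end_bound": backend,
    }


# Illustrative counter values only; not measurements from the deck.
counters = {
    "CPU_CLK_UNHALTED.THREAD": 1000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 800,
    "UOPS_ISSUED.ANY": 1700,
    "UOPS_RETIRED.RETIRE_SLOTS": 1600,
    "INT_MISC.RECOVERY_CYCLES": 25,
}

for category, fraction in top_down_level1(counters).items():
    print(f"{category}: {fraction:.2f}")
```

Comparing two runs (for example an RDD version and a DataFrame version of the same workload) then amounts to comparing these fractions side by side.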

Page 16

How does data velocity affect micro-architectural performance?

Better CPU utilization at higher data velocity

Page 17

Cont..

Higher instruction retirement at higher data velocity

Higher L1-bound stalls at higher data velocity

Less starvation at higher data velocity

Higher bandwidth consumption at higher data velocity

Page 18

Conclusion

● Batch processing and stream processing have the same micro-architectural behavior in Spark if the only difference between the two implementations is micro-batching.

● Spark workloads using DataFrames have improved instruction retirement over workloads using RDDs.

● If the input data rates are small, stream processing workloads are front-end bound. However, front-end bound stalls are reduced at larger input data rates, and instruction retirement improves.

Page 19

THANK YOU

Page 20

List of Papers

● Performance characterization of in-memory data analytics on a modern cloud server, in 5th International IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).

● How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB 2015, Hawaii, USA.

● Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads. (accepted to BDCloud 2016)

● Node Architecture Implications for In-Memory Data Analytics in Scale-in Clusters (accepted to IEEE BDCAT 2016)

● Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures (under submission).