Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads

Ahsan Javed Awan, EMJD-DC (KTH-UPC) (https://www.kth.se/profile/ajawan/)

Mats Brorsson (KTH), Eduard Ayguade (UPC and BSC), Vladimir Vlassov (KTH)

Motivation: Why should we care about architecture support?

*Taken from Babak's slides

Data Growing Faster Than Technology

Motivation (cont.)

Our Goal

Improve node-level performance through architecture support

*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/

Phoenix++, Metis, Ostrich, etc.

Hadoop, Spark, Flink, etc.

Our Approach

● Performance characterization of in-memory data analytics on a modern cloud server, in 5th International IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).

● How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with VLDB 2015, Hawaii, USA.

– Limited to batch processing workloads only

– Does not consider the velocity aspect of big data

– Experiments are based on an older version of Spark.

What are the major performance bottlenecks?

Our Approach

● Does micro-architectural performance remain consistent across batch and stream processing workloads?

● How do DataFrames compare to RDDs micro-architecturally?

● How does data velocity affect micro-architectural performance?

What are the remaining questions?

Progress Meeting 12-12-14

Which Scale-out Framework?

[Picture Courtesy: Amir H. Payberah]

● Tuning of Spark internal parameters

● Tuning of JVM parameters (heap size, etc.)

● Micro-architecture-level analysis using hardware performance counters
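Here is a minimal sketch of one way such counters might be collected on Linux with `perf stat`. It assumes the Ivy Bridge Top-Down level-1 events listed below and attaches to a single executor JVM by PID. The slides do not describe the authors' collection tooling, so the event list, the `perf_stat_cmd` helper and the PID are hypothetical. The lowercase event names also require a perf build that ships Intel's event tables.

```python
import shlex
from typing import List, Sequence

# Assumed event set (hypothetical): the raw inputs of the Top-Down
# level-1 breakdown on Sandy Bridge / Ivy Bridge cores.
EVENTS = [
    "cpu_clk_unhalted.thread",
    "idq_uops_not_delivered.core",
    "uops_issued.any",
    "uops_retired.retire_slots",
    "int_misc.recovery_cycles",
]

def perf_stat_cmd(pid: int, seconds: int, events: Sequence[str] = EVENTS) -> List[str]:
    """Build (but do not run) a `perf stat` command that counts `events`
    for the already-running process `pid` while `sleep seconds` runs,
    printing CSV output (-x,) that is easy to post-process."""
    return ["perf", "stat", "-x,", "-e", ",".join(events),
            "-p", str(pid), "--", "sleep", str(seconds)]

# 12345 is a placeholder PID for a Spark executor JVM.
print(shlex.join(perf_stat_cmd(pid=12345, seconds=30)))
```

Attaching with `-p` limits the counts to one process rather than the whole machine. Passing `-a` instead would count system-wide.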

Our Approach: Which benchmarks?

Our Hardware Configuration

Which Machine?

Hyper-Threading and Turbo Boost are disabled

Intel's Ivy Bridge Server
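The slides do not say how Hyper-Threading and Turbo Boost were disabled; a BIOS setting is the usual route. As a hedged sketch, on recent Linux kernels the resulting state can be checked from sysfs. `smt/active` reports whether SMT is in use, and `intel_pstate/no_turbo` exists only when the `intel_pstate` driver is loaded. `read_flag` and `hw_settings` are hypothetical helpers, and a `None` entry means the file was not found.

```python
from pathlib import Path
from typing import Dict, Optional

def read_flag(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:  # missing file, permission denied, non-Linux host, ...
        return None

def hw_settings(sysfs_cpu: Path = Path("/sys/devices/system/cpu")) -> Dict[str, Optional[bool]]:
    """Report whether SMT (Hyper-Threading) and turbo are disabled.
    Values are None when the corresponding sysfs file is absent."""
    smt_active = read_flag(sysfs_cpu / "smt" / "active")           # "0" means SMT is off
    no_turbo = read_flag(sysfs_cpu / "intel_pstate" / "no_turbo")  # "1" means turbo is off
    return {
        "hyper_threading_disabled": None if smt_active is None else smt_active == "0",
        "turbo_disabled": None if no_turbo is None else no_turbo == "1",
    }

print(hw_settings())
```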

Does micro-architectural performance remain consistent?

Stream processing is micro-architecturally similar to batch processing in Spark
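Spark Streaming's DStream model cuts an input stream into small batches at a fixed batch interval and runs an ordinary batch job on each one. This is one plausible reason why the two modes look alike at the micro-architectural level. The pure-Python sketch below illustrates only that idea. `word_count`, `micro_batches`, the 2 s interval and the sample events are illustrative assumptions, not Spark APIs or data from the study.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

def word_count(lines: Iterable[str]) -> Counter:
    """Batch job: split lines into words (FlatMap), pair each with 1 (Map),
    and sum per word (reduce by key)."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.split():
            counts[word] += 1
    return counts

def micro_batches(events: Iterable[Tuple[float, str]], interval_s: float) -> Iterator[List[str]]:
    """Group (timestamp, line) events into consecutive batches of interval_s
    seconds. Assumes timestamp order; empty intervals yield empty batches."""
    batch: List[str] = []
    batch_end = interval_s
    for ts, line in events:
        while ts >= batch_end:
            yield batch
            batch = []
            batch_end += interval_s
        batch.append(line)
    yield batch

events = [(0.5, "a b"), (1.2, "b c"), (2.1, "a a"), (3.9, "c"), (4.4, "b")]
# Streaming: the same batch job runs once per micro-batch.
per_batch = [word_count(b) for b in micro_batches(events, interval_s=2.0)]
# Batch: the job runs once over all the data.
whole = word_count(line for _, line in events)
assert sum(per_batch, Counter()) == whole
```

Summing the per-batch counts reproduces the single batch result. In Spark, each micro-batch is likewise executed with the same RDD operations a batch job would use.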

Cont.

Stream processing is micro-architecturally similar to batch processing in Spark

Cont.

Streaming workloads with similar Spark transformations have different micro-architectural behavior

Cont.

Streaming workloads with similar Spark transformations have different micro-architectural behavior

Cont.

Streaming workloads with similar Spark transformations have different micro-architectural behavior

Cont.

Workload | Spark transformations | Input data rate | Window size (s) | Working set (2 s sampling interval)
WWc | FlatMap, Map, ReduceByKeyAndWindow | 10^4 | 30 | 15 x 10^4
CSpc | FlatMap, Map, CountByValueAndWindow | 10^4 | 10 | 5 x 10^4
CErpz | FlatMap, Map, Window, GroupByKey | 10^4 | 30 | 15 x 10^4
CAuC | FlatMap, Map, Window, GroupByKey, Count | 10^4 | 10 | 5 x 10^4
Tpt | FlatMap, ReduceByKeyAndWindow, Transform | 10^1 | 60 | 30 x 10^1

Micro-batch size determines the micro-architectural behavior of stream processing workloads with similar Spark transformations
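For every row of the table above, the Working Set column is numerically consistent with input data rate x window size / 2 s sampling interval. That relation is our reading of the numbers, not a formula stated on the slides. The sketch below simply re-derives the column under that assumption.

```python
# Rows from the table: (workload, input data rate, window size in s, reported working set)
ROWS = [
    ("WWc", 10**4, 30, 15 * 10**4),
    ("CSpc", 10**4, 10, 5 * 10**4),
    ("CErpz", 10**4, 30, 15 * 10**4),
    ("CAuC", 10**4, 10, 5 * 10**4),
    ("Tpt", 10**1, 60, 30 * 10**1),
]
SAMPLING_INTERVAL_S = 2  # the "2s sampling interval" from the column header

def working_set(rate: int, window_s: int, interval_s: int = SAMPLING_INTERVAL_S) -> int:
    """Assumed relation, inferred from the table rather than stated by the authors."""
    return rate * window_s // interval_s

for name, rate, window_s, reported in ROWS:
    print(f"{name}: computed {working_set(rate, window_s)}, reported {reported}")
```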

Do DataFrames perform better than RDDs at the micro-architectural level?

DataFrames exhibit 25% fewer back-end-bound stalls

64% fewer DRAM-bound stalled cycles

25% less bandwidth consumption

10% less starvation of execution resources

DataFrames have better micro-architectural performance than RDDs
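Back-end bound, front-end bound and retiring are level-1 categories of Intel's Top-Down Microarchitecture Analysis Method. The slides name that method only through its categories. Assuming it was used, the level-1 breakdown for a 4-wide core such as Ivy Bridge can be computed from raw counter values as sketched below. The counter values are invented for illustration and are not measurements from this study.

```python
from typing import Dict

PIPELINE_WIDTH = 4  # issue slots per cycle on Sandy Bridge / Ivy Bridge cores

def top_down_level1(c: Dict[str, int]) -> Dict[str, float]:
    """Level-1 Top-Down fractions of pipeline slots; they sum to 1.0."""
    slots = PIPELINE_WIDTH * c["CPU_CLK_UNHALTED.THREAD"]
    frontend = c["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    bad_spec = (c["UOPS_ISSUED.ANY"] - c["UOPS_RETIRED.RETIRE_SLOTS"]
                + PIPELINE_WIDTH * c["INT_MISC.RECOVERY_CYCLES"]) / slots
    retiring = c["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    backend = 1.0 - (frontend + bad_spec + retiring)  # remainder of the slots
    return {"frontend_bound": frontend, "bad_speculation": bad_spec,
            "retiring": retiring, "backend_bound": backend}

def dominant(breakdown: Dict[str, float]) -> str:
    """Name of the category that accounts for the most slots."""
    return max(breakdown, key=breakdown.get)

# Illustrative counter values only (not measured data)
sample = {
    "CPU_CLK_UNHALTED.THREAD": 1_000_000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 800_000,
    "UOPS_ISSUED.ANY": 1_300_000,
    "UOPS_RETIRED.RETIRE_SLOTS": 1_200_000,
    "INT_MISC.RECOVERY_CYCLES": 25_000,
}
breakdown = top_down_level1(sample)
print({k: round(v, 3) for k, v in breakdown.items()}, dominant(breakdown))
```

Computing these fractions for an RDD run and a DataFrame run of the same workload, or for two input data rates, gives the kind of comparison reported on these slides.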

How does data velocity affect micro-architectural performance?

Better CPU utilization at higher data velocity

Cont.

Higher instruction retirement at higher data velocity

Higher L1-bound stalls at higher data velocity

Less starvation at higher data velocity

Higher bandwidth consumption at higher data velocity

Conclusion

● Batch processing and stream processing have the same micro-architectural behavior in Spark if the only difference between the two implementations is micro-batching.

● Spark workloads using DataFrames have improved instruction retirement over workloads using RDDs.

● If the input data rates are small, stream processing workloads are front-end bound. However, front-end-bound stalls are reduced at larger input data rates, and instruction retirement improves.

THANK YOU

List of Papers

● Performance characterization of in-memory data analytics on a modern cloud server, in 5th International IEEE Conference on Big Data and Cloud Computing, 2015 (Best Paper Award).

● How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, in 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with VLDB 2015, Hawaii, USA.

● Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads (accepted to BDCloud 2016)

● Node Architecture Implications for In-Memory Data Analytics in Scale-in Clusters (accepted to IEEE BDCAT 2016)

● Implications of In-Memory Data Analytics with Apache Spark on Near Data Computing Architectures (under submission).
