Architecture Support for Big Data Analytics/A.Awan.pdf · Architecture Support for Big Data...

Preview:

Citation preview

1

Architecture Support for Big Data Analytics

Ahsan Javed Awan EMJD-DC (KTH-UPC)

(http://uk.linkedin.com/in/ahsanjavedawan/)Supervisors: Mats Brorsson(KTH), Eduard Ayguade(UPC), Vladimir

Vlassov(KTH)

2

MotivationWhy should we care?

*Source: Babak Falsafi slides

3

MotivationCont...

● A mismatch between the characteristics of emerging workloads and the underlying hardware.

– M. Ferdman et-al, “Clearing the clouds: A study of emerging scale-out workloads on modern hardware,” in ASPLOS 2012.

– Z. Jia, et-al “Characterizing data analysis workloads in data centers,” in IISWC 2013.

– C. Zheng et-al, “Characterizing os behavior of scale-out data center workloads.” in WIVOSCA 2013.

– M. Dimitrov et al, “Memory system characterization of big data workloads,” in BigData Conference, 2013.

– Z. Jia et-al, “Characterizing and subsetting big data workloads,” in IISWC 2014

– A. Yasin et-al, “Deep-dive analysis of the data analytics workload in cloudsuite,” in IISWC 2014.

– L. Wang et-al, “Bigdatabench: A big data benchmark suite from internet services,” in HPCA, 2014.

– T. Jiang, et-al, “Understanding the behavior of in-memory computing workloads,” in IISWC 2014

4

MotivationPerformance Characterization of In-Memory Data Analytics

on a Modern Cloud Server

*Source: SGI

5

MotivationCont...

Our Focus Our Focus

Improve the single node performancein scale-out configuration

*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/

Phoenix ++,Metis, Ostrich,

etc..

Hadoop, Spark,Flink, etc..

6

Progress Meeting 12-12-14Which Scale-out Framework ?

[Picture Courtesy: Amir H. Payberah]

7

Our Approach

● A three fold analysis method at Application, Thread and Micro-architectural level

– Tuning of Spark internal Parameters

– Tuning of JVM Parameters (Heap size etc..)

– Concurrency Analysis

– General Architectural Exploration

Methodology

8

Our ApproachBenchmarks

3GB of Wikipedia raw datasets, Amazon Movies Reviews and numerical records have been used

9

Our Hardware Configuration

Machine Details

Hyper Threading and Turbo Boost is disabled

Hyper Threading and Turbo-boost are disabled

10

Our ApproachSystem Configuration

11

Multicore Scalability of SparkApplication Level Performance

Spark scales poorly in Scale-up configuration

12

Multicore Scalability of SparkStage Level Performance

● Shuffle Map Stages don't scale beyond 12 threads across different workloads

● No of concurrent files open in Map-side shuffling is C*R where C is no of threads in executor pool and R is no of reduce tasks

13

Multicore Scalability of SparkTask Level Performance

Percentage increase in Area Under the Curve compared to 1-thread

14

Is there thread level load imbalance ??

15

CPU Utilization is not scaling with performance

16

Is there any Work Time Inflation ??

17

How does Micro-architecture contribute to Work time inflation ??

18

Cont...

19

Cont...

20

Is Memory Bandwidth a bottleneck??

21

Key Findings

● More than 12 threads in an executor pool does not yield significant performance

● Spark runtime system need to be improved to provide better load balancing and avoid work-time inflation.

● Work time inflation and load imbalance on the threads are the scalability bottlenecks.

● Removing the bottlenecks in the front-end of the processor would not remove more than 20% of stalls.

● Effort should be focused on removing the memory bound stalls since they account for up to 72% of stalls in the pipeline slots.

● Memory bandwidth of current processors is sufficient for in-memory data analytics

22

MotivationHow Data Volume Affects Spark Based Data Analytics on a

Scale-up Server

23

MotivationDo Spark based data analytics benefit from using larger

scale-up servers

24

MotivationIs GC detrimental to scalability of Spark applications?

25

MotivationHow does performance scale with data volume ?

26

MotivationDoes GC time scale linearly with Data Volume ??

27

MotivationHow does CPU utilization scale with data volume ?

28

MotivationIs File I/O detrimental to performance ?

29

MotivationHow does data size affects micro-architectural

performance ?

30

MotivationHow Data Volume Affects Spark Based Data Analytics on a

Scale-up Server

31

MotivationCont..

32

MotivationCont..

33

Key Findings

● Spark workloads do not benefit significantly from executors with more than 12 cores.

● The performance of Spark workloads degrades with large volumes of data due to substantial increase in garbage collection and file I/O time.

● With out any tuning, Parallel Scavenge garbage collection scheme outperforms Concurrent Mark Sweep and G1 garbage collectors for Spark workloads.

● Spark workloads exhibit improved instruction retirement due to lower L1 cache misses and better utilization of functional units inside cores at large volumes of data.

● Memory bandwidth utilization of Spark benchmarks decreases with large volumes of data and is 3x lower than the available off-chip bandwidth on our test machine

34

Our Approach

● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “Performance charaterization of in-memory data analytics on a mordern cloud server,” in 5th International IEEE Conference on Big Data and Cloud Computing, 2015.

● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “How Data Volume Affects Spark Based Data Analytics on a Scale-up Server”, in 6th International Workshop on Big Data Benchmarking, Performance Optimization and Emerging Hardware (BpoE) held in conjunction with VLDB, 2015.

What are the major bottlenecks??

35

MotivationFuture Directions

NUMA Aware Task Scheduling

Cache Aware Transformations

Exploiting Processing In Memory Architectures

HW/SW Data Prefectching

Rethinking Memory Architectures

Recommended