
Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015


www.bsc.es

Automating Big Data Benchmarking and Performance Analysis with ALOJA

October 2015

Nicolas Poggi, Senior Researcher

Barcelona Supercomputing Center (BSC)

Spanish national supercomputing center
– 22-year history in computer architecture, networking and distributed systems research
– Based at the Technical University of Catalonia (UPC)

Led by Mateo Valero
– ACM Fellow, Eckert-Mauchly Award 2007, Goode Award 2009
– Active research staff with 1000+ publications

Large ongoing life science computational projects
– Computational genomics, molecular modeling & bioinformatics, protein interactions & docking

In-place computational capabilities
– MareNostrum supercomputer

Prominent body of research activity around Hadoop since 2008
– Prior to ALOJA:
  • SLA-driven scheduling (Adaptive Scheduler), in-memory caching, etc.

BSC-MSRS Centre
– Long-term relationship between BSC and Microsoft Research and product teams
– ALOJA is the latest phase of the engagement, exploring cost-efficient upcoming Big Data architectures
– Open model:
  • No patents, public IP, publications and open source as the main focus

The MareNostrum 3 Supercomputer

Over 10^15 floating point operations per second
Nearly 50,000 cores
100.8 TB of main memory
2 PB of disk storage

70% distributed through PRACE
24% distributed through RES
6% for BSC-CNS use


Agenda

1. Intro on Hadoop performance
   1. Current scenario and problems
2. ALOJA project
   1. Background
   2. Open source tools
3. Benchmarking
   1. Benchmarking workflow
   2. DEMO
4. Results
   1. HW and SW speedups
   2. Cost/performance
   3. Online results DEMO
5. Predictive analytics and learning
6. Future lines and conclusions

Intro: Hadoop performance and ecosystem

Hadoop design

Hadoop was designed to process complex data
– Structured and unstructured
– With [close to] linear scalability
– And application reliability

Simplifying the programming model
– From MPI, OpenMP, CUDA, …

Operating as a black box for data analysts, but…
– Complex runtime for admins
– YARN abstracts even more

Image source: Hadoop: The Definitive Guide

Hadoop: highly scalable, but…

Not a high-performance solution

Requires:
– Design
  • Clusters, topology
– Setup
  • OS, Hadoop config
– Fine tuning
  • Iterative approach
  • Time-consuming, extensive benchmarking

Setting up your Big Data system

Hadoop
– > 100 tunable parameters
– Obscure and interrelated
  • mapred.map/reduce.tasks.speculative.execution
  • io.sort.mb: 100 (300)
  • io.sort.record.percent: 5% (15%)
  • io.sort.spill.percent: 80% (95–100%)
– Similar for Hive, Spark, HBase
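The io.sort.* parameters interact with each other. As we understand the Hadoop 1.x map-side sort buffer, io.sort.record.percent of io.sort.mb is reserved for per-record accounting (16 bytes per record), the rest holds serialized map output, and a background spill starts once either part fills to io.sort.spill.percent. The sketch below computes those thresholds; it approximates the Hadoop 1.x logic (rounding ignored) and takes the percentages as fractions, as the configuration does. The first value on each line above is the Hadoop default; reading the parenthesized values as tuned alternatives is an assumption, and sort_buffer_layout is a hypothetical helper.

```bash
# Approximate Hadoop 1.x map-side sort buffer layout (rounding ignored).
# Args: io.sort.mb, io.sort.record.percent, io.sort.spill.percent (fractions).
sort_buffer_layout() {
  local sort_mb=$1 rec_pct=$2 spill_pct=$3
  awk -v mb="$sort_mb" -v rec="$rec_pct" -v spill="$spill_pct" 'BEGIN {
    data_mb = mb * (1 - rec)                 # space for serialized map output
    slots = int(mb * rec * 1048576 / 16)     # 16 accounting bytes per record
    printf "data_buffer_mb=%.2f data_spill_mb=%.2f record_slots=%d record_spill=%d\n", data_mb, data_mb * spill, slots, int(slots * spill)
  }'
}
```

With the defaults, `sort_buffer_layout 100 0.05 0.80` prints `data_buffer_mb=95.00 data_spill_mb=76.00 record_slots=327680 record_spill=262144`. Jobs emitting many small records can exhaust the record slots long before the data buffer fills, which is one reason io.sort.record.percent gets raised.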

Dominated by rules of thumb
– Number of containers in parallel:
  • 0.5–2 per CPU core
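The 0.5–2 containers per core rule of thumb translates into a range per node. A minimal sketch; rounding 0.5 × cores down with a floor of one container is our own choice, and container_range is a hypothetical helper:

```bash
# Rule of thumb: 0.5 to 2 parallel containers per CPU core.
# Prints "<min> <max>" containers for a node with the given core count.
container_range() {
  local cores=$1
  local min=$(( cores / 2 ))   # 0.5 per core, rounded down
  (( min < 1 )) && min=1       # always allow at least one container
  echo "$min $(( cores * 2 ))"
}
```

For example, `container_range "$(nproc)"` on the local node. The range is only a starting point: the later context-switch comparison shows 3 containers on a 2-core VM (1.5 per core, inside the range) still behaving poorly.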

Large stack for tuning

Image source: Intel® Distribution for Apache Hadoop

Product claims on performance and TCO
– Ecosystem is not transparent
– Needs auditing

How do I set up my system? Too many options

Default values in Apache source are not ideal

Large and spread-out ecosystem
– Different distributions
– Product claims

Each job is different
– No one-size-fits-all solution

Cloud vs. on-premise
– IaaS
  • Tens of different VMs to choose from
– PaaS
  • HDInsight, CloudBigData, EMR

New economical HW
– SSDs, InfiniBand networking

The ALOJA project: research lines and challenges

BSC's project ALOJA: towards cost-effective Big Data

Open research project for improving the cost-effectiveness of Big Data deployments

Benchmarking and analysis tools

Online repository: largest Big Data benchmarking repository
– 50,000+ runs of HiBench, TPC-H and [some] BigBench
– Over 100 HW configurations tested
  • Of different nodes/VMs, disks and networks
  • Cloud: multi-cloud provider, including both IaaS and PaaS
  • On-premise: high-end, HPC, commodity, low-power

Community
– Collaborations with industry and academia
– Presented in different conferences and workshops
– Visibility: 47 different countries

http://aloja.bsc.es

Big Data Benchmarking
Online Repository
Web Analytics

ALOJA research lines: broad coverage

Techniques for obtaining cost/performance insights

Profiling
• HPC, low-level
• High accuracy
• Manual analysis

Benchmarking
• Iterate configs
• HW and SW
• Real executions
• Log parsing and data sanitization

Analysis tools
• Summarize large number of results
• By criteria
• Filter noise
• Fast processing

Predictive Analytics
• Automated modeling
• Estimations
• Virtual executions
• Automated KD (knowledge discovery)

Evaluation of:
• Big Data apps
• Frameworks
• Systems/clusters
• Cloud providers/datacenters

Challenges (circa end 2013)

Test different clusters and architectures
– On-premise
  • Commodity, high-end, appliance, low-power
– Cloud IaaS
  • 32 different VMs in Azure, similar in other providers
– Cloud PaaS
  • HDInsight, EMR, CloudBigData

Different access levels
– Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)

Different versions
– Hadoop, JVM, Spark, Hive, etc.
– Other benchmarks

Problems
– All systems designed for PROD
  • Not for comparison
– No Azure support
– Many different packages
– No one-size-fits-all solution

Dev environments and testing
– Big Data usually requires a cluster to develop and test

Solution
– Custom implementation
– Based on simple components
– Wrapping commands

Benchmarking with ALOJA's open source tools

ALOJA Platform main components

1. Big Data Benchmarking
• Deploy & provision
• Conf management
• Parameter selection & queuing
• Perf counters
• Low-level instrumentation
• App logs

2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing

3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted metrics
• Job characterization
• Machine learning
• Predictions and clustering

NGINX, PHP, MySQL
Bash, Unix tools, CLIs, R, SQL, JS

Extending and collaborating in ALOJA

Setting up a DEV environment
1. Install prerequisites
   – git, Vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up

Installs a web server with sample data
Sets up a local cluster to test benchmarking
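The DEV-environment steps above can be scripted with a prerequisite check first. A minimal sketch, assuming VirtualBox's command-line tool is installed as VBoxManage on the PATH; check_prereqs and setup_aloja_dev are hypothetical helpers, not part of the ALOJA repository:

```bash
# Return non-zero (and report on stderr) if any listed command is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Steps 1-5 from the list: verify tools, clone, enter the repo, boot the VM.
setup_aloja_dev() {
  check_prereqs git vagrant VBoxManage || return 1
  git clone https://github.com/Aloja/aloja.git &&
    cd aloja &&
    vagrant up &&
    echo "Web UI should be reachable at http://localhost:8080"
}
```

Calling `setup_aloja_dev` leaves the shell inside the cloned aloja directory, ready for the optional step 6.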

Workflow in ALOJA

Cluster(s) definition
• VM sizes
• Number of nodes
• OS, disks
• Capabilities

Execution plan
• Start cluster
• Setup
• Exec benchmarks
• Cleanup

Import data
• Convert perf metrics
• Parse logs
• Import into DB

Evaluate data
• Data views in Vagrant VM
• Or http://aloja.bsc.es

PA and KD
• Predictive Analytics
• Knowledge Discovery

Historic repo

Commands and providers

Provisioning commands
– Connect
  • Node and cluster
  • Builds SSH cmd line
  • SSH proxies
– Deploy
  • Creates a cluster
  • Sets SSH credentials
  • If created, updates config as needed
  • If stopped, starts nodes
– Start, Stop
– Delete
– Queue jobs to clusters

Providers
– On-premise
  • Custom settings for clusters
  • Multiple disk types
  • Different architectures
– Cloud IaaS
  • Azure, OpenStack, Rackspace, AWS (testing)
– Cloud PaaS
  • HDInsight, CloudBigData, EMR soon

Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy

Cluster and nodes definitions: multi-provider abstraction

Steps to define a cluster
– Import defaults (if any)
  • Sets OS version
– Select provider
  • Azure, Rackspace, AWS, on-premise, Vagrant…
– Name the cluster and size
– Optional
  • Select VM type
  • Attached disks
  • Define metadata
  • And costs

Nodes can also be defined
– For web, shared folders, etc.

You can logically split clusters

Azure 8-datanode sample:

#load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"

clusterName="azure-large-8"
numberOfNodes="8"
vmSize="Large"
attachedVolumes="3"
diskSize="1024" #in GB

#details
vmCores="4"
vmRAM="7" #in GB

#costs
clusterCostHour="1.584" #in USD
clusterType="IaaS"

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
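Since each cluster config records an hourly cost, the cost of a single execution is a simple proportion. A minimal sketch, assuming clusterCostHour is the price of the whole cluster per hour and ignoring provider billing granularity (per-minute or per-hour rounding); exec_cost_usd is a hypothetical helper:

```bash
# cost = hourly cluster price * execution seconds / 3600, printed in USD.
exec_cost_usd() {
  local cost_per_hour=$1 exec_seconds=$2
  awk -v h="$cost_per_hour" -v s="$exec_seconds" 'BEGIN { printf "%.4f\n", h * s / 3600 }'
}
```

After loading a cluster config, `exec_cost_usd "$clusterCostHour" 400` gives the cost of a 400-second run on that cluster.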

Running benchmarks in ALOJA

Benchmarking with defaults:
repo_location/aloja-bench/run_benchs.sh

To queue jobs:
repo_location/shell/exeq.sh

Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh

Testing different configurations

Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters:
      run_benchs.sh -r 2 -m 10
   2. Via shell globals:
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
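These override approaches imply an order of precedence, which the slide does not state. The sketch assumes a common bash pattern: a value set on the command line or in the environment beats the cluster config, which in turn beats benchmark_defaults.conf. This is an illustration, not ALOJA's code, and resolve_setting is a hypothetical helper:

```bash
# Assumed precedence: environment/command line > cluster config > benchmark default.
# An empty argument means "not set at that level".
resolve_setting() {
  local env_value=$1 cluster_value=$2 default_value=$3
  echo "${env_value:-${cluster_value:-$default_value}}"
}
```

With the shell-global approach, the variables are presumably prefixed to the runner for a single run, e.g. `HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB repo_location/aloja-bench/run_benchs.sh`.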

Things to look for
– HW, OS
  • Versions
  • Disk config and mounts
– SW
  • Replication
  • Block sizes
  • Compression
  • IO buffers

Build your exec plan in a script and queue it
– Or follow ML recommendations
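A minimal sketch of an execution-plan script, assuming the shell-global override style shown above. It only prints one command per configuration so the plan can be reviewed before queuing; how a plan is handed to exeq.sh is not covered on the slide and is left out. build_exec_plan is a hypothetical name and the version list is a placeholder to extend:

```bash
# Print one benchmark command per (Hadoop version, data size) combination.
# Arg 1: path to the runner script; remaining args: data sizes to sweep.
build_exec_plan() {
  local runner=$1; shift
  local versions=(hadoop-2.7.1)   # placeholder: add more versions to sweep
  local sizes=("$@")
  local v s
  for v in "${versions[@]}"; do
    for s in "${sizes[@]}"; do
      printf 'HADOOP_VERSION=%s BENCH_DATA_SIZE=%s %s\n' "$v" "$s" "$runner"
    done
  done
}
```

For example, `build_exec_plan repo_location/aloja-bench/run_benchs.sh 1TB > plan.txt` writes a one-line plan; adding sizes or versions multiplies the lines.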

ALOJA-WEB

Entry point for exploring the results collected from the executions
– Provides insights on the obtained results through continuously evolving data views

Online DEMO at http://aloja.bsc.es

Online benchmarking results

2) ALOJA-WEB: Online Repository

Entry point for exploring the results collected from the executions
– Index of executions
  • Quick glance of executions
  • Searchable, sortable
– Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details

Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup

Cluster definitions
– Cluster capabilities (resources)
– Cluster costs

Sharing results
– Download executions
– Add external executions

Documentation and references
– Papers, links and feature documentation

Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster, different configs

Mappers and reducers, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

400 s: 2 containers, local disk
800 s: 3 containers, local disk
600 s: 2 containers, remote disk
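Putting the three runs on a common scale makes the comparison explicit. A minimal sketch that divides each execution time by the first (baseline) run, so values above 1 mean slower; labels must be single words, and relative_to_baseline is a hypothetical helper:

```bash
# Input lines: "<label> <seconds>"; the first line is the baseline run.
relative_to_baseline() {
  awk 'NR == 1 { base = $2 } { printf "%s %.2fx\n", $1, $2 / base }'
}
```

Feeding it the three runs above (`printf 'local-2c 400\nlocal-3c 800\nremote-2c 600\n' | relative_to_baseline`) shows 3 containers on local disk taking 2.00x the baseline time and 2 containers on remote disk 1.50x.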

Comparing 3 runs on the same cluster, different configs

CPU utilization, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

Moderate iowait
Higher iowait
Very high iowait

Comparing 3 runs on the same cluster, different configs

CPU queues, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

1 blocked process
4 blocked processes
4 blocked processes (map phase)

Comparing 3 runs on the same cluster, different configs

CPU context switches, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

High context switches with 3 containers on a 2-core VM
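The comparison URL used on these slides follows a simple pattern: one execs%5B%5D=<id> parameter (the URL-encoded form of execs[]) per execution. A minimal sketch that builds such a URL for any list of execution IDs; the base path is reconstructed from the extracted URL and may differ on the live site, and perfcharts_url is a hypothetical helper:

```bash
# Build a perfcharts comparison URL: one execs%5B%5D=<id> per execution ID.
perfcharts_url() {
  local base="http://aloja.bsc.es/perfcharts?" sep="" id query=""
  for id in "$@"; do
    query+="${sep}execs%5B%5D=${id}"
    sep="&"
  done
  echo "${base}${query}"
}
```

`perfcharts_url 90086 90088 90104` reproduces the three-run comparison; any other execution IDs from the repository index can be passed instead.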

Impact of SW configurations in speedup (4-node clusters)

Chart panels: number of mappers (4m, 6m, 8m, 10m); compression algorithm (no compression, ZLIB, BZIP2, Snappy). Speedup (higher is better).

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

Impact of HW configurations in speedup

Chart panels: disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB); cloud remote volumes (local only; 1, 2 or 3 remotes; 1, 2 or 3 remotes with /tmp local). Speedup (higher is better).

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

VM size comparison (Azure)
(lower is better)

Clusters by cost-effectiveness (1/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

Cluster ID reference:
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs

Clusters by cost-effectiveness (2/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

Chart panels: fastest exec, cheapest exec

Cluster ID reference: same as in part 1/2 (RL-06, RL-16, RL-19, RL-33, RL-30)
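The fastest and the cheapest cluster need not be the same one, because the cost of an execution is its time multiplied by the cluster's hourly price. A minimal sketch, assuming per-cluster input lines of ID, execution seconds and hourly USD cost, and ignoring billing granularity; fastest_and_cheapest is a hypothetical helper and does not reproduce ALOJA-WEB's own cost model:

```bash
# Input lines: "<cluster_id> <exec_seconds> <cluster_cost_per_hour_usd>".
# Reports the cluster with the lowest time and the one with the lowest cost.
fastest_and_cheapest() {
  awk '{
    cost = $2 / 3600 * $3
    if (NR == 1 || $2 < best_t) { best_t = $2; fast = $1 }
    if (NR == 1 || cost < best_c) { best_c = cost; cheap = $1 }
  }
  END { printf "fastest=%s cheapest=%s (%.4f USD)\n", fast, cheap, best_c }'
}
```

A cluster that is twice as slow but five times cheaper per hour still wins on cost, which is the trade-off the two chart panels expose.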

Cost/performance scalability of cluster size

This shows a sample of a new screen (with sample data) for finding the most cost-effective cluster size
– X axis: number of datanodes (cluster size)
– Left Y axis: execution time (lower is better)
– Right Y axis: execution cost

Chart series: execution time, execution cost; recommended size marked

Predictive Analytics and automated learning

Modeling Hadoop: methodology

Methodology
– 3-step learning process
– Different split sizes tested (10% ≤ training ≤ 50%)
– Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results
– Mean absolute errors ~250 s (ranging in [100 s, 6000 s])
– Relative absolute errors between [0.10, 0.25]
  • Depend on the benchmark and the number of examples per benchmark
  • Some executions are, or may be, anomalies
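The two error measures quoted above can be computed directly from predicted and actual execution times. A minimal sketch using the standard definitions, MAE = mean |predicted − actual| and RAE = Σ|predicted − actual| / Σ|actual − mean(actual)|; the slide does not spell out ALOJA's exact formulas, so treat these as assumptions. error_metrics is a hypothetical helper and expects at least one input line:

```bash
# Input lines: "<actual_seconds> <predicted_seconds>".
error_metrics() {
  awk '{ a[NR] = $1; p[NR] = $2; sum += $1 }
  END {
    mean = sum / NR
    for (i = 1; i <= NR; i++) {
      d = p[i] - a[i]; abs_err += (d < 0 ? -d : d)   # prediction error
      m = a[i] - mean; abs_dev += (m < 0 ? -m : m)   # deviation of naive mean predictor
    }
    if (abs_dev == 0) { printf "MAE=%.2f RAE=undefined\n", abs_err / NR; exit }
    printf "MAE=%.2f RAE=%.4f\n", abs_err / NR, abs_err / abs_dev
  }'
}
```

An RAE of 0.25 means the model's total absolute error is a quarter of what always predicting the mean execution time would give, which is why RAE is comparable across benchmarks with very different run times while MAE is not.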

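Figure: learning workflow (the ALOJA data set is split into training, validation and testing sets; a model is trained and tested on the validation set, and the algorithm is tuned and re-trained until the model is selected; the final model is then tested on the testing set)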
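The training/validation/testing split in the workflow can be sketched as follows. Assumptions: one execution per line, the training share given as a percentage (the slide tests 10% to 50%), the remainder divided evenly between validation and testing, and fixed output file names; split_dataset is hypothetical. The split is random but reproducible for a given seed with the same awk implementation:

```bash
# Args: input file, training percentage, optional seed (default 42).
# Writes train.txt, validation.txt and test.txt in the current directory.
split_dataset() {
  local input=$1 train_pct=$2 seed=${3:-42}
  awk -v tp="$train_pct" -v seed="$seed" '
    BEGIN { srand(seed); vt = tp + (100 - tp) / 2 }   # validation upper bound
    {
      r = rand() * 100
      if (r < tp) print > "train.txt"
      else if (r < vt) print > "validation.txt"
      else print > "test.txt"
    }' "$input"
}
```

Re-running with several training percentages (for example 10, 30 and 50) reproduces the kind of split-size sweep the methodology describes.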

Use case 1: Anomaly detection

Anomaly detection
– Model-based detection procedure
– Pass executions through the model
– Executions not fitting the model are considered "out of the system"

Anomaly detection procedure: sample view from site
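The model-based procedure above can be sketched as a filter over model predictions. The fitting criterion here (relative error above a threshold, 50% by default) is an assumption for illustration, not ALOJA's setting; flag_anomalies is a hypothetical helper and predicted times must be positive:

```bash
# Input lines: "<exec_id> <actual_seconds> <predicted_seconds>".
# Arg 1: relative-error threshold (default 0.5).
flag_anomalies() {
  local threshold=${1:-0.5}
  awk -v t="$threshold" '{
    rel = ($2 - $3) / $3           # relative error against the prediction
    if (rel < 0) rel = -rel
    if (rel > t) printf "%s anomaly (rel_err=%.2f)\n", $1, rel
  }'
}
```

Flagged executions can then be reviewed or excluded before retraining, which relates to the earlier note that some executions may be anomalies.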

Use case 2: Guided benchmarking (method)

Guided benchmarking
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the "representative execution" for each similar subset of executions

Figure: guided-benchmarking loop (cluster the ALOJA data set into centers, build a model from the centers and test it against the reference data; if the error is not OK, increase the number of centers; otherwise the centers are the configs to execute)

Use case 3: Knowledge discovery

Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools

Figure: predicted execution time (pred_time) by disk type (HDD, SSD)

Tree descriptor:
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935 s
    IOFBuf=64KB ⇒ 2942 s
  Net=IB
    IOFBuf=128KB ⇒ 3118 s
    IOFBuf=64KB ⇒ 3125 s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248 s
    IOFBuf=64KB ⇒ 1256 s
  Net=IB
    IOFBuf=128KB ⇒ 1233 s
    IOFBuf=64KB ⇒ 1241 s
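A tree descriptor like the one above can be read as a lookup from configuration to predicted execution time. A minimal sketch that flattens the tree into a single case statement; leaf values are transcribed from the slide, and the last leaf (SSD, IB, 64KB) is read as 1241 s, an assumption since the extracted slide text is garbled at that point. pred_time is a hypothetical helper:

```bash
# Args: disk (HDD|SSD), network (ETH|IB), io.file.buffer size (128KB|64KB).
# Prints the predicted execution time in seconds from the tree leaves.
pred_time() {
  local disk=$1 net=$2 iofbuf=$3
  case "$disk/$net/$iofbuf" in
    HDD/ETH/128KB) echo 2935 ;;
    HDD/ETH/64KB)  echo 2942 ;;
    HDD/IB/128KB)  echo 3118 ;;
    HDD/IB/64KB)   echo 3125 ;;
    SSD/ETH/128KB) echo 1248 ;;
    SSD/ETH/64KB)  echo 1256 ;;
    SSD/IB/128KB)  echo 1233 ;;
    SSD/IB/64KB)   echo 1241 ;;  # assumed reading of a garbled leaf
    *) echo "unknown configuration: $disk/$net/$iofbuf" >&2; return 1 ;;
  esac
}
```

`pred_time SSD IB 128KB` prints 1233. Reading the leaves this way also shows the attribute ranking the tree encodes: the disk split dominates (roughly 2900 to 3100 s versus 1200 s), while the buffer size changes the prediction by only 7 to 8 s.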

Concluding remarks and references

Concluding remarks

Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale

But it is also tough
– The industry needs more transparency
– We still have a lot to do…

In ALOJA we provide the benchmarking scripts
– And also the results, which should be your first entry point

We are constantly adding new features
– Benchmarks, systems, providers

It is an open initiative; you are invited to participate
– Beta testers
– Contributors

With predictive analytics we can automate and find tendencies faster
– Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools…

Fork our repo at https://github.com/Aloja/aloja

More info

ALOJA benchmarking platform and online repository
– http://aloja.bsc.es
– http://aloja.bsc.es/publications

Benchmarking Big Data
– http://www.slideshare.net/ni_po/benchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list
– ~200 members from ~80 organizations
– http://clds.sdsc.edu/bdbc/community

Workshop on Big Data Benchmarking (WBDB)
– Next: http://clds.sdsc.edu/wbdb2015.ca

SPEC Research Big Data working group
– http://research.spec.org/working-groups/big-data-working-group.html

Slides and video
– Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430/
– Tilmann Rabl: Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl


Q&A

Thanks

Contact: nicolas.poggi@bsc.es

Page 2: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Barcelona Supercomputing Center (BSC)

Spanish national supercomputing center ndash 22 year history in Computer Architecture networking and distributed

systems research ndash Based at the Technical University of Catalonia (UPC)

Led by Mateo Valero ndash ACM fellow Eckert-Mauchly award 2007 Goode award 2009 ndash Active research staff with 1000+ publications

Large ongoing life science computational projects ndash Computational Genomics Molecular modeling amp Bioinformatics Protein

Interactions amp Docking

In place computational capabilities ndash Mare Nostrum Super Computer

Prominent body of research activity around Hadoop since 2008 ndash Previous to ALOJA

bull SLA-driven scheduling (Adaptive Scheduler) in memory caching etc

BSC-MSRS Centre ndash Long-term relationship between BSC Microsoft Research product teams

ndash ALOJA is the latest phase of the engagement to explore cost-efficient upcoming Big Data architectures

ndash Open model bull No patents public IP publications and open source main focus

The MareNostrum 3 Supercomputer

Over 1015 Floating Point Operations per

second

Nearly 50000 cores

1008 TB of main memory 2 PB of disk storage

70 distributed through PRACE

24 distributed through RES

6 for BSC-CNS use

Over 1015 Floating Point Operations per second

Nearly 50000 cores

1008 TB of main memory

2 PB of disk storage

Agenda

1 Intro on Hadoop

performance

1 Current scenario and

problematic

2 ALOJA project

1 Background

2 Open source tools

3 Benchmarking

1 Benchmarking workflow

2 DEMO

4 Results

1 HW and SW speedups

2 CostPerformance

3 Online results DEMO

5 Predictive Analytics and

learning

6 Future lines and conclusions

Intro Hadoop performance and ecosystem

Hadoop design

Hadoop was designed to solve complex data ndash Structured and non structured

ndash with [close to] linear scalability

ndash and application reliability

Simplifying the programming model ndash From MPI OpenMP CUDA hellip

Operating as a blackbox for data analysts buthellip ndash Complex runtime for admins

ndash YARN abstracts even more

Image source Hadoop the definitive guide

Hadoop highly-scalable buthellip

Not a high-performance solution

Requires

ndash Design

bull Clusters topology clusters

ndash Setup

bull OS Hadoop config

ndash Fine tuning required

bull Iterative approach

bull Time consuming

and extensive benchmarking

Setting up your Big Data system

Hadoop

ndash gt 100+ tunable parameters

ndash obscure and interrelated

bull mapredmapreducetasksspeculativeexecution

bull iosortmb 100 (300)

bull iosortrecordpercent 5 (15)

bull iosortspillpercent 80 (95 ndash 100)

ndash Similar for Hive Spark HBase

Dominated by rules-of-thumb

ndash Number of containers in parallel

bull 05 - 2 per CPU core

Large stack for tuning

Image source Intelreg Distribution for Apache Hadoop

Product claims on performance and TCO

Eco-system is not transparent

ndash Needs auditing

How do I set my system too many options

Default values in Apache source not ideal

Large and spread eco system

ndash Different distributions

ndash Product claims

Each job is different

ndash No one-fits-all solution

Cloud vs On-premise

ndash IaaS

bull Tens of different VMs to choose

ndash PaaS

bull HDInsight CloudBigData EMR

New economic HW

ndash SSDs InfiniBand Networking

The ALOJA project research lines and challenges

BSCrsquos project ALOJA towards cost-effective Big Data

Open research project for improving the cost-effectiveness

of Big Data deployments

Benchmarking and Analysis tools

Online repository and largest Big Data repo

ndash 50000+ runs of HiBench TPC-H and [some] BigBench

ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks

bull Cloud Multi-cloud provider including both IaaS and PaaS

bull On-premise High-end HPC commodity low-power

Community ndash Collaborations with industry and Academia

ndash Presented in different conferences and workshops

ndash Visibility 47 different countries

httpalojabsces

Big Data Benchmarking

Online Repository

Web

Analytics

ALOJA research lines broad coverage

Techniques for obtaining CostPerformance Insights

Profiling

bull HPC Low-level

bull High Accuracy

bull Manual Analysis

Benchmarking

bull Iterate configs

bull HW and SW

bull Real executions

bull Log parsing and data sanitization

Analysis tools

bull Summarize large number of results

bull By criteria

bull Filter noise

bull Fast processing

Predictive Analytics

bull Automated modeling

bull Estimations

bull Virtual executions

bull Automated KD

Big Data Apps

Frameworks

Systems Clusters

Cloud ProvidersDatacenters

Evaluation of

Test different clusters and architectures ndash On-premise

bull Commodity high-end appliance low-power

ndash Cloud IaaS bull 32 different VMs in Azure

similar in other providers

ndash Cloud PaaS bull HDInsight EMR CloudBigData

Different access level ndash Full admin user-only request-

to-install everything ready queuing systems (SGE)

Different versions ndash Hadoop JVM Spark Hive

etchellip

ndash Other benchmarks

Problems ndash All systems though for PROD

bull Not for comparison

ndash No Azure support

ndash Many different packages

ndash No one-fits-all solution

Dev environments and testing

ndash Big Data usually requires a cluster to develop and test

Solution ndash Custom implementation

ndash Based in simple components

ndash Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJArsquos open source tools

ALOJA Platform main components

2 Online Repository

bullExplore results

bullExecution details

bullCluster details

bullCosts

bullData sharing

3 Web Analytics

bullData views and evaluations

bullAggregates

bullAbstracted Metrics

bullJob characterization

bullMachine Learning

bullPredictions and clustering

1 Big Data Benchmarking

bullDeploy amp Provision

bullConf Management

bullParameter selection amp Queuing

bullPerf counters

bullLow-level instrumentation

bullApp logs

19

NGINX PHP MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample load AZURE defaults

source $CONF_DIRcluster_defaultsconf

clusterName=azure-large-8

numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs

clusterCostHour=1584 in USD

clusterType=IaaS

Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Page 5: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Intro Hadoop performance and ecosystem

Hadoop design

Hadoop was designed to solve complex data ndash Structured and non structured

ndash with [close to] linear scalability

ndash and application reliability

Simplifying the programming model ndash From MPI OpenMP CUDA hellip

Operating as a blackbox for data analysts buthellip ndash Complex runtime for admins

ndash YARN abstracts even more

Image source Hadoop the definitive guide

Hadoop highly-scalable buthellip

Not a high-performance solution

Requires

ndash Design

bull Clusters topology clusters

ndash Setup

bull OS Hadoop config

ndash Fine tuning required

bull Iterative approach

bull Time consuming

and extensive benchmarking

Setting up your Big Data system

Hadoop

ndash gt 100+ tunable parameters

ndash obscure and interrelated

bull mapredmapreducetasksspeculativeexecution

bull iosortmb 100 (300)

bull iosortrecordpercent 5 (15)

bull iosortspillpercent 80 (95 ndash 100)

ndash Similar for Hive Spark HBase

Dominated by rules-of-thumb

ndash Number of containers in parallel

bull 05 - 2 per CPU core

Large stack for tuning

Image source Intelreg Distribution for Apache Hadoop

Product claims on performance and TCO

Eco-system is not transparent

ndash Needs auditing

How do I set my system too many options

Default values in Apache source not ideal

Large and spread eco system

ndash Different distributions

ndash Product claims

Each job is different

ndash No one-fits-all solution

Cloud vs On-premise

ndash IaaS

bull Tens of different VMs to choose

ndash PaaS

bull HDInsight CloudBigData EMR

New economic HW

ndash SSDs InfiniBand Networking

The ALOJA project research lines and challenges

BSCrsquos project ALOJA towards cost-effective Big Data

Open research project for improving the cost-effectiveness

of Big Data deployments

Benchmarking and Analysis tools

Online repository and largest Big Data repo

ndash 50000+ runs of HiBench TPC-H and [some] BigBench

ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks

bull Cloud Multi-cloud provider including both IaaS and PaaS

bull On-premise High-end HPC commodity low-power

Community ndash Collaborations with industry and Academia

ndash Presented in different conferences and workshops

ndash Visibility 47 different countries

httpalojabsces

[Diagram: Big Data Benchmarking, Online Repository, Web Analytics]

ALOJA research lines: broad coverage

Techniques for obtaining cost/performance insights

Profiling

• HPC, low-level

• High accuracy

• Manual analysis

Benchmarking

• Iterate configs

• HW and SW

• Real executions

• Log parsing and data sanitization

Analysis tools

• Summarize a large number of results

• By criteria

• Filter noise

• Fast processing

Predictive analytics

• Automated modeling

• Estimations

• Virtual executions

• Automated KD (knowledge discovery)

[Diagram: evaluation of Big Data apps, frameworks, systems/clusters, and cloud providers/datacenters]

Test different clusters and architectures

– On-premise

• Commodity, high-end, appliance, low-power

– Cloud IaaS

• 32 different VMs in Azure, similar in other providers

– Cloud PaaS

• HDInsight, EMR, CloudBigData

Different access levels

– Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)

Different versions

– Hadoop, JVM, Spark, Hive, etc.

– Other benchmarks

Problems

– All systems designed for production

• Not for comparison

– No Azure support

– Many different packages

– No one-size-fits-all solution

Dev environments and testing

– Big Data usually requires a cluster to develop and test

Solution

– Custom implementation

– Based on simple components

– Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJA's open-source tools

ALOJA platform main components

1. Big Data Benchmarking

• Deploy & provision

• Conf management

• Parameter selection & queuing

• Perf counters

• Low-level instrumentation

• App logs

2. Online Repository

• Explore results

• Execution details

• Cluster details

• Costs

• Data sharing

3. Web Analytics

• Data views and evaluations

• Aggregates

• Abstracted metrics

• Job characterization

• Machine learning

• Predictions and clustering

Technologies: NGINX, PHP, MySQL; BASH, Unix tools, CLIs; R, SQL, JS

Extending and collaborating in ALOJA

1. Install prerequisites: git, Vagrant, VirtualBox

2. git clone https://github.com/Aloja/aloja.git

3. cd aloja

4. vagrant up

5. Open your browser at http://localhost:8080

6. Optional: start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a web server with sample data

Sets up a local cluster to test benchmarking
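Before running vagrant up, it can help to check that the prerequisites from step 1 are installed. The sketch below is a hypothetical helper, not part of the ALOJA repo; it assumes VirtualBox is detected through its VBoxManage command-line tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the ALOJA repo).

# Report which of the given commands are missing from PATH.
# Returns 0 only if all of them are available.
check_prereqs() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Prerequisites from step 1: git, Vagrant and VirtualBox (via VBoxManage).
aloja_dev_prereqs() {
  check_prereqs git vagrant VBoxManage
}
```

For example, aloja_dev_prereqs && vagrant up only starts the VMs when git, vagrant and VBoxManage are all on the PATH, and otherwise lists what is missing.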

Workflow in ALOJA

Cluster(s) definition

• VM sizes

• Number of nodes

• OS, disks

• Capabilities

Execution plan

• Start cluster

• Setup

• Exec benchmarks

• Cleanup

Import data

• Convert perf metrics

• Parse logs

• Import into DB

Evaluate data

• Data views in Vagrant VM

• Or http://aloja.bsc.es

PA and KD

• Predictive analytics

• Knowledge discovery

Historic repo
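The execution plan stages map naturally onto shell functions. In this sketch the stage bodies are hypothetical placeholders that only print their name; the point is the structure: stages run in order, the plan stops at the first failing stage, and cleanup always runs because it is registered as an EXIT trap inside the subshell that runs the plan.

```bash
#!/usr/bin/env bash
# Sketch of the execution-plan stages. The stage bodies are hypothetical
# placeholders that only print their name.
start_cluster()   { echo "start"; }
setup_cluster()   { echo "setup"; }
exec_benchmarks() { echo "exec $*"; }
cleanup_cluster() { echo "cleanup"; }

# Run the stages in order and stop at the first failing one. The body is a
# subshell, so the EXIT trap is local to it and cleanup_cluster always runs
# when the plan finishes, whether it succeeded or not.
run_exec_plan() (
  trap cleanup_cluster EXIT
  start_cluster && setup_cluster && exec_benchmarks "$@"
)
```

Running the plan in a subshell keeps the trap from affecting the calling shell, and the plan's exit status is still that of the last stage that ran.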

Commands and providers

Provisioning commands:

Connect

– Node and cluster

– Builds SSH cmd line

• SSH proxies

Deploy

– Creates a cluster

– Sets SSH credentials

– If created, updates config as needed

– If stopped, starts nodes

Start, Stop

Delete

Queue jobs to clusters

Providers:

On-premise

– Custom settings for clusters

• Multiple disk types

• Different architectures

Cloud IaaS

– Azure, OpenStack, Rackspace, AWS (testing)

Cloud PaaS

– HDInsight, CloudBigData, EMR (soon)

Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
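Two of the provisioning ideas above, an idempotent deploy and an SSH command builder with proxy support, can be sketched as follows. Everything here is hypothetical and only illustrates the logic described on the slide (the state names absent, running and stopped are assumptions); the aloja-deploy code linked above is the real implementation. The proxy variant uses OpenSSH's -J (ProxyJump) option, available since OpenSSH 7.3; older clients need a ProxyCommand instead.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of two provisioning ideas; not aloja-deploy code.

# Decide what "deploy" should do for a cluster in a given state, so that
# running it repeatedly is safe.
deploy_action() {
  case "$1" in
    absent)  echo "create cluster; set SSH credentials" ;;
    running) echo "update config as needed" ;;
    stopped) echo "start nodes; update config as needed" ;;
    *)       echo "unknown state: $1" >&2; return 1 ;;
  esac
}

# Build (print, not run) an SSH command line for a node, optionally through
# a proxy host using OpenSSH's -J (ProxyJump) option.
ssh_cmd() {
  local user=$1 host=$2 port=${3:-22} proxy=${4:-}
  local cmd="ssh -p $port"
  if [ -n "$proxy" ]; then
    cmd="$cmd -J $proxy"
  fi
  echo "$cmd $user@$host"
}
```

Printing the command instead of executing it makes it easy to log, inspect, or reuse in other scripts.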

Cluster and node definitions: multi-provider abstraction

Steps to define a cluster:

Import defaults (if any)

– Sets OS version

Select provider

– Azure, Rackspace, AWS, on-premise, Vagrant…

Name the cluster and set its size

Optional:

– Select VM type

– Attached disks

– Define metadata

– And costs

Nodes can also be defined

– For web, shared folders, etc.

You can logically split clusters

Azure 8-datanode sample:

# load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"

clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB

# details
vmCores=4
vmRAM=7 # in GB

# costs
clusterCostHour=1.584 # in USD
clusterType=IaaS

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
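Because a cluster definition is just a sourceable shell file, derived figures such as total cores, total RAM and the cost of a single run can be computed directly from it. The helper below is a hypothetical sketch: it assumes the file can be sourced on its own (a file that also sources $CONF_DIR/cluster_defaults.conf needs CONF_DIR set first) and that clusterCostHour is the hourly price of the whole cluster, pro-rated by the second.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ALOJA). Sources a cluster definition in a
# subshell, so its variables do not leak, and prints derived figures.
# Arguments: conf file, run time in seconds (default 3600).
# Assumes numberOfNodes, vmCores, vmRAM (GB) and clusterCostHour (USD/hour
# for the whole cluster) are set by the file.
cluster_summary() (
  source "$1" || exit 1
  LC_ALL=C awk -v n="$numberOfNodes" -v c="$vmCores" -v r="$vmRAM" \
    -v h="$clusterCostHour" -v s="${2:-3600}" \
    'BEGIN { printf "cores=%d ram_gb=%d cost_usd=%.3f\n", n * c, n * r, h * s / 3600 }'
)
```

For example, with numberOfNodes=8, vmCores=4, vmRAM=7 and clusterCostHour=1.584, cluster_summary on that file for a 400 s run reports cores=32 ram_gb=56 cost_usd=0.176.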

Running benchmarks in ALOJA

Benchmarking with defaults:

repo_location/aloja-bench/run_benchs.sh

To queue jobs:

repo_location/shell/exeq.sh

Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh

Testing different configurations

Approaches:

1. Config folders

2. Override variables

   2.1. In benchmark_defaults.conf

   2.2. In cluster config

3. Cmd line

   3.1. Via parameters: run_benchs.sh -r 2 -m 10

   3.2. Via shell globals: HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB
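One common way to let a cluster config or shell globals override benchmark defaults is bash's ${VAR:=default} expansion, which assigns the default only when the variable is unset or empty. This is a sketch of the pattern, not ALOJA's actual benchmark_defaults.conf; the function name and the default values are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of overridable defaults (not ALOJA's benchmark_defaults.conf).
# ":=" assigns only when the variable is unset or empty, so values already
# set by a cluster config or exported as shell globals take precedence.
load_bench_defaults() {
  : "${HADOOP_VERSION:=hadoop-1.2.1}"   # hypothetical default
  : "${BENCH_DATA_SIZE:=100GB}"         # hypothetical default
  export HADOOP_VERSION BENCH_DATA_SIZE
}
```

With this pattern, the order in which files are sourced and variables are set decides which value wins: anything set before load_bench_defaults runs is kept.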

Things to look for:

HW, OS

– Versions

– Disk config and mounts

SW

– Replication

– Block sizes

– Compression

– IO buffers

Build your execution plan in a script and queue it

Or follow ML recommendations
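A simple way to build an execution plan in a script is to expand every combination of the settings under test into one command line each, review the list, and only then queue it. The sketch below does this for the two shell globals from the approaches list; the function name, the value lists and the exact way run_benchs.sh is invoked are assumptions, and how the plan is handed to shell/exeq.sh is not shown (check that script for the input it expects).

```bash
#!/usr/bin/env bash
# Hypothetical exec-plan generator: prints (does not run) one command line
# per combination of Hadoop version and data size.
# Arguments: space-separated versions, space-separated data sizes.
build_exec_plan() {
  local versions=$1 sizes=$2 version size
  for version in $versions; do
    for size in $sizes; do
      printf 'HADOOP_VERSION=%s BENCH_DATA_SIZE=%s aloja-bench/run_benchs.sh\n' \
        "$version" "$size"
    done
  done
}
```

For example, build_exec_plan "hadoop-1.2.1 hadoop-2.7.1" "100GB 1TB" > plan.txt produces four lines, one per combination, that can be reviewed before queuing.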

ALOJA-WEB

Entry point to explore the results collected from the executions

– Provides insights into the obtained results through continuously evolving data views

Online DEMO at http://aloja.bsc.es

Online benchmarking results

2) ALOJA-WEB Online Repository

Entry point to explore the results collected from the executions

– Index of executions

• Quick glance at executions

• Searchable, sortable

– Execution details

• Performance charts and histograms

• Hadoop counters

• Jobs and task details

Data management of benchmark executions

– Data importing from different clusters

– Execution validation

– Data management and backup

Cluster definitions

– Cluster capabilities (resources)

– Cluster costs

Sharing results

– Download executions

– Add external executions

Documentation and references

– Papers, links and feature documentation

Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster, different configs

Mappers and reducers, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

400 s: 2 containers, local disk

800 s: 3 containers, local disk

600 s: 2 containers, remote disk

Comparing 3 runs on the same cluster, different configs

CPU utilization, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on the same cluster, different configs

CPU queues, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on the same cluster, different configs

CPU context switches, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

High context switches with 3 containers on a 2-core VM
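The iowait, blocked-process and context-switch patterns above can also be spotted on a single node with standard tools. As an illustration only (ALOJA's own collectors may differ), this sketch averages vmstat's b (processes in uninterruptible sleep, typically waiting on I/O), cs (context switches per second) and wa (% of CPU time waiting for I/O) columns. It locates the columns by header name and skips the first sample, which vmstat reports as averages since boot. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Summarize vmstat output read from stdin: average number of blocked
# processes (b), context switches per second (cs) and % CPU iowait (wa).
# Columns are located by their header names; the first sample is skipped
# because vmstat reports it as averages since boot.
vmstat_summary() {
  LC_ALL=C awk '
    $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
    ("b" in col) && $1 ~ /^[0-9]+$/ {
      if (!seen++) next
      n++; b += $(col["b"]); cs += $(col["cs"]); wa += $(col["wa"])
    }
    END {
      if (n == 0) exit 1
      printf "b=%.1f cs=%.1f wa=%.1f\n", b / n, cs / n, wa / n
    }'
}
```

For example, vmstat 5 12 | vmstat_summary samples every 5 seconds, 12 times, and prints one summary line; running it on each node during a benchmark gives a rough per-node comparison.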

Impact of SW configurations on speedup (4-node clusters)

[Charts: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, Snappy)]

Results using http://hadoop.bsc.es/configimprovement

Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
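Speedup charts like these compare each configuration against a baseline as baseline time divided by configuration time, so values above 1 are faster than the baseline. A hypothetical helper that computes this from "config seconds" lines might look like this; it is a sketch, not the code behind the configimprovement page, and the numbers in the usage example are made up for illustration, not ALOJA results.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "config seconds" lines on stdin and print each
# config's speedup over the named baseline (baseline_time / config_time),
# highest first. Values above 1.00 mean faster than the baseline.
speedup_table() (
  set -o pipefail
  LC_ALL=C awk -v base="$1" '
    NF >= 2 && $2 > 0 { name[++n] = $1; secs[n] = $2; if ($1 == base) ref = $2 }
    END {
      if (ref == "") { print "baseline not found: " base > "/dev/stderr"; exit 1 }
      for (i = 1; i <= n; i++) printf "%s %.2f\n", name[i], ref / secs[i]
    }' | LC_ALL=C sort -k2,2nr
)
```

For example, printf '%s\n' 'no_comp 1000' 'zlib 800' 'bzip2 1250' 'snappy 625' | speedup_table no_comp ranks snappy first (1.60) and bzip2 last (0.80) for these invented timings.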

Impact of HW configurations on speedup

[Charts: speedup (higher is better) for disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and for cloud remote volumes (local only; 1, 2 or 3 remotes; 1, 2 or 3 remotes with tmp local)]

Results using http://hadoop.bsc.es/configimprovement

Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

VM size comparison (Azure)

[Chart: lower is better]

Clusters by cost-effectiveness (1/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

Cluster ID reference:

• RL-06 = 8 performance1-8 VMs

• RL-16 = 8 general1-8 VMs

• RL-19 = 8 io1-15 VMs

• RL-33 = 8 performance2-30 VMs

• RL-30 = 8 io1-30 VMs

[Chart labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8]

Clusters by cost-effectiveness (2/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

[Chart: fastest execution vs. cheapest execution]

Cluster ID reference:

• RL-06 = 8 performance1-8 VMs

• RL-16 = 8 general1-8 VMs

• RL-19 = 8 io1-15 VMs

• RL-33 = 8 performance2-30 VMs

• RL-30 = 8 io1-30 VMs
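Cost-effectiveness combines two numbers per run: execution time and execution cost, where cost is the cluster's hourly price times the run time. The sketch below is hypothetical (its own function name and input format, not the clustercosteffectiveness page's logic) and reports the fastest and the cheapest run from "cluster usd_per_hour seconds" lines. It assumes cost is pro-rated by the second; real provider billing granularity may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: from "cluster usd_per_hour exec_seconds" lines on
# stdin, compute each run's cost (usd_per_hour * seconds / 3600) and report
# the fastest and the cheapest execution.
fastest_and_cheapest() {
  LC_ALL=C awk '
    NF >= 3 {
      n++
      t = $3 + 0
      cost = $2 * t / 3600
      if (n == 1 || t < best_t)    { best_t = t;    fast = $1 }
      if (n == 1 || cost < best_c) { best_c = cost; cheap = $1 }
    }
    END {
      if (n == 0) exit 1
      printf "fastest=%s (%ds) cheapest=%s ($%.2f)\n", fast, best_t, cheap, best_c
    }'
}
```

With invented inputs such as 'clusterA 2.00 3600', 'clusterB 8.00 1200' and 'clusterC 1.00 5400', the fastest run (clusterB) is not the cheapest (clusterC at $1.50), which is exactly the trade-off these charts expose.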

Cost/performance scalability of cluster size

This shows a sample of a new screen (with sample data) for finding the most cost-effective cluster size

– X axis: number of datanodes (cluster size)

– Left Y axis: execution time (lower is better)

– Right Y axis: execution cost

[Chart: execution time and execution cost by cluster size, with the recommended size marked]
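The cluster-size screen trades execution time against execution cost. One possible criterion, assumed here and not necessarily the one ALOJA uses, is to pick the cheapest cluster size that still finishes within a deadline. Cost is taken as datanodes × price per node-hour × hours; the function name and input format are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: from "datanodes exec_seconds" lines on stdin, print
# the cheapest cluster size whose run time is within the deadline.
# Arguments: price per node-hour (USD), deadline (seconds).
recommend_size() {
  LC_ALL=C awk -v p="$1" -v d="$2" '
    NF >= 2 && $2 + 0 <= d + 0 {
      cost = $1 * p * $2 / 3600
      if (!found || cost < best) { best = cost; size = $1; found = 1 }
    }
    END {
      if (!found) exit 1
      printf "%d nodes ($%.2f)\n", size, best
    }'
}
```

For example, with invented measurements of 4000 s on 4 nodes, 1800 s on 8, 1000 s on 16 and 700 s on 32, at $0.50 per node-hour and a one-hour deadline, the helper prints 8 nodes ($2.00); tightening the deadline to 900 s leaves only the 32-node cluster.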

Predictive analytics and automated learning

Modeling Hadoop: methodology

Methodology

– 3-step learning process

– Different split sizes tested (10% ≤ training ≤ 50%)

– Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results

– Mean absolute errors ~250 s (ranges in [100 s, 6000 s])

– Relative absolute errors between [0.10, 0.25]

• Depend on the benchmark and the number of examples per benchmark

• Some executions are (or may be) anomalies

[Flowchart: the ALOJA data set is split into training, validation and testing sets; a model is trained and tested on the validation set; if it is not good enough (NO), the algorithm is tuned and re-trained; if it is (YES), that model is selected as the final model and tested on the testing set]
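The two error metrics quoted above can be computed from pairs of actual and predicted execution times. This sketch uses the common textbook definitions, MAE = mean(|predicted − actual|) and RAE = Σ|predicted − actual| / Σ|actual − mean(actual)|; that ALOJA computes them exactly this way is an assumption, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "actual_seconds predicted_seconds" lines on stdin
# (e.g. a model's output on the test split) and print:
#   MAE = mean(|predicted - actual|)
#   RAE = sum(|predicted - actual|) / sum(|actual - mean(actual)|)
error_metrics() {
  LC_ALL=C awk '
    NF >= 2 { n++; a[n] = $1; p[n] = $2; sum_a += $1 }
    END {
      if (n == 0) exit 1
      mean = sum_a / n
      for (i = 1; i <= n; i++) {
        e = p[i] - a[i]; err += (e < 0 ? -e : e)
        d = a[i] - mean; dev += (d < 0 ? -d : d)
      }
      if (dev > 0) printf "MAE=%.2f RAE=%.3f\n", err / n, err / dev
      else         printf "MAE=%.2f RAE=undefined\n", err / n
    }'
}
```

An RAE of 0.25 means the model's total error is a quarter of what always predicting the mean execution time would give, which is how the [0.10, 0.25] range above can be read.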

Use case 1: anomaly detection

Anomaly detection

– Model-based detection procedure

– Pass executions through the model

– Executions that do not fit the model are considered “out of the system”

[Figures: anomaly detection procedure; sample view from the site]
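A minimal version of the model-based procedure passes each execution's actual time and the model's predicted time through a relative-error test and flags those that do not fit. The 0.5 default threshold, the input format and the function name are hypothetical; ALOJA's own criterion for "out of the system" executions may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read "exec_id actual_seconds predicted_seconds" lines
# on stdin and print executions whose relative error
# |actual - predicted| / predicted exceeds the threshold (default 0.5).
flag_anomalies() {
  local threshold=${1:-0.5}
  LC_ALL=C awk -v t="$threshold" '
    NF >= 3 && $3 > 0 {
      rel = ($2 - $3) / $3
      if (rel < 0) rel = -rel
      if (rel > t) printf "%s %.2f\n", $1, rel
    }'
}
```

Flagged executions are candidates for review rather than automatic deletion: a large error may come from a misbehaving run or from a model that has not seen similar configurations.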

Use case 2: guided benchmarking, method

Guided benchmarking

– Best subset of configurations for modeling a Hadoop deployment

– Clustering to get the “representative execution” for each similar subset of executions

[Flowchart: clustering the ALOJA data set yields a data set of centers; a model is built from it and tested against a reference model; if the error is not OK (NO), the number of centers is increased and the loop repeats; if it is OK (YES), the centers are the configs to execute]
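The loop in the flowchart, increasing the number of centers until the model built from them is accurate enough, can be sketched as a shell skeleton. model_error is a hypothetical hook: a real one would cluster the data set into k centers, build a model from them and print its error against the reference. Here it is a stub whose error simply shrinks as 1/k, purely for illustration.

```bash
#!/usr/bin/env bash
# Skeleton of the guided-benchmarking loop (hypothetical, not ALOJA code).

# Stub hook: pretends the model error shrinks as 1/k.
model_error() {
  LC_ALL=C awk -v k="$1" 'BEGIN { printf "%.3f\n", 1 / k }'
}

# Print the smallest k in 1..max_k whose model error is <= max_error;
# return 1 if no k qualifies.
guided_k() {
  local max_error=$1 max_k=${2:-50} k err
  for (( k = 1; k <= max_k; k++ )); do
    err=$(model_error "$k")
    if LC_ALL=C awk -v e="$err" -v m="$max_error" 'BEGIN { exit !(e <= m) }'; then
      echo "$k"
      return 0
    fi
  done
  return 1
}
```

With the stub, guided_k 0.25 stops at k=4; the k centers found that way would then be the configurations worth executing.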

Use case 3: knowledge discovery

Make analyzing results easier

– Multi-variable visualization

– Trees separating relevant attributes

– Other interesting tools

[Chart: pred_time, HDD vs. SSD]

Tree descriptor (pred_time):

Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935 s
    IOFBuf=64KB ⇒ 2942 s
  Net=IB
    IOFBuf=128KB ⇒ 3118 s
    IOFBuf=64KB ⇒ 3125 s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248 s
    IOFBuf=64KB ⇒ 1256 s
  Net=IB
    IOFBuf=128KB ⇒ 1233 s
    IOFBuf=64KB ⇒ 1241 s

Concluding remarks and references

Concluding remarks

Benchmarking: it's fun, or at least…

– It will save you €€€ and allow you to scale

But it is also tough

– The industry needs more transparency

– We still have a lot to do…

In ALOJA we provide the benchmarking scripts

– And also the results, which should be your first entry point

We are constantly adding new features

– Benchmarks, systems, providers

It is an open initiative; you are invited to participate

– Beta testers

– Contributors

With predictive analytics we can automate and find trends faster

– Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools…

Fork our repo at https://github.com/Aloja/aloja

More info:

ALOJA benchmarking platform and online repository

– http://aloja.bsc.es, http://aloja.bsc.es/publications

Benchmarking Big Data

– http://www.slideshare.net/ni_po/benchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list

– (~200 members from ~80 organizations)

– http://clds.sdsc.edu/bdbc/community

Workshop on Big Data Benchmarking (WBDB)

– Next: http://clds.sdsc.edu/wbdb2015.ca

SPEC Research Big Data working group

– http://research.spec.org/working-groups/big-data-working-group.html

Slides and video

– Michael Frank on Big Data benchmarking

• http://www.tele-task.de/archive/podcast/20430

– Tilmann Rabl: Big Data Benchmarking Tutorial

• http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl


Q&A

Thanks

Contact: nicolas.poggi@bsc.es

Page 6: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Hadoop design

Hadoop was designed to solve complex data ndash Structured and non structured

ndash with [close to] linear scalability

ndash and application reliability

Simplifying the programming model ndash From MPI OpenMP CUDA hellip

Operating as a blackbox for data analysts buthellip ndash Complex runtime for admins

ndash YARN abstracts even more

Image source Hadoop the definitive guide

Hadoop highly-scalable buthellip

Not a high-performance solution

Requires

ndash Design

bull Clusters topology clusters

ndash Setup

bull OS Hadoop config

ndash Fine tuning required

bull Iterative approach

bull Time consuming

and extensive benchmarking

Setting up your Big Data system

Hadoop

ndash gt 100+ tunable parameters

ndash obscure and interrelated

bull mapredmapreducetasksspeculativeexecution

bull iosortmb 100 (300)

bull iosortrecordpercent 5 (15)

bull iosortspillpercent 80 (95 ndash 100)

ndash Similar for Hive Spark HBase

Dominated by rules-of-thumb

ndash Number of containers in parallel

bull 05 - 2 per CPU core

Large stack for tuning

Image source Intelreg Distribution for Apache Hadoop

Product claims on performance and TCO

Eco-system is not transparent

ndash Needs auditing

How do I set my system too many options

Default values in Apache source not ideal

Large and spread eco system

ndash Different distributions

ndash Product claims

Each job is different

ndash No one-fits-all solution

Cloud vs On-premise

ndash IaaS

bull Tens of different VMs to choose

ndash PaaS

bull HDInsight CloudBigData EMR

New economic HW

ndash SSDs InfiniBand Networking

The ALOJA project research lines and challenges

BSCrsquos project ALOJA towards cost-effective Big Data

Open research project for improving the cost-effectiveness

of Big Data deployments

Benchmarking and Analysis tools

Online repository and largest Big Data repo

ndash 50000+ runs of HiBench TPC-H and [some] BigBench

ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks

bull Cloud Multi-cloud provider including both IaaS and PaaS

bull On-premise High-end HPC commodity low-power

Community ndash Collaborations with industry and Academia

ndash Presented in different conferences and workshops

ndash Visibility 47 different countries

httpalojabsces

Big Data Benchmarking

Online Repository

Web

Analytics

ALOJA research lines broad coverage

Techniques for obtaining CostPerformance Insights

Profiling

bull HPC Low-level

bull High Accuracy

bull Manual Analysis

Benchmarking

bull Iterate configs

bull HW and SW

bull Real executions

bull Log parsing and data sanitization

Analysis tools

bull Summarize large number of results

bull By criteria

bull Filter noise

bull Fast processing

Predictive Analytics

bull Automated modeling

bull Estimations

bull Virtual executions

bull Automated KD

Big Data Apps

Frameworks

Systems Clusters

Cloud ProvidersDatacenters

Evaluation of

Test different clusters and architectures ndash On-premise

bull Commodity high-end appliance low-power

ndash Cloud IaaS bull 32 different VMs in Azure

similar in other providers

ndash Cloud PaaS bull HDInsight EMR CloudBigData

Different access level ndash Full admin user-only request-

to-install everything ready queuing systems (SGE)

Different versions ndash Hadoop JVM Spark Hive

etchellip

ndash Other benchmarks

Problems ndash All systems though for PROD

bull Not for comparison

ndash No Azure support

ndash Many different packages

ndash No one-fits-all solution

Dev environments and testing

ndash Big Data usually requires a cluster to develop and test

Solution ndash Custom implementation

ndash Based in simple components

ndash Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJArsquos open source tools

ALOJA Platform main components

2 Online Repository

bullExplore results

bullExecution details

bullCluster details

bullCosts

bullData sharing

3 Web Analytics

bullData views and evaluations

bullAggregates

bullAbstracted Metrics

bullJob characterization

bullMachine Learning

bullPredictions and clustering

1 Big Data Benchmarking

bullDeploy amp Provision

bullConf Management

bullParameter selection amp Queuing

bullPerf counters

bullLow-level instrumentation

bullApp logs

19

NGINX PHP MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample load AZURE defaults

source $CONF_DIRcluster_defaultsconf

clusterName=azure-large-8

numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs

clusterCostHour=1584 in USD

clusterType=IaaS

Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 7: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Hadoop highly-scalable buthellip

Not a high-performance solution

Requires

ndash Design

bull Clusters topology clusters

ndash Setup

bull OS Hadoop config

ndash Fine tuning required

bull Iterative approach

bull Time consuming

and extensive benchmarking

Setting up your Big Data system

Hadoop

ndash gt 100+ tunable parameters

ndash obscure and interrelated

bull mapredmapreducetasksspeculativeexecution

bull iosortmb 100 (300)

bull iosortrecordpercent 5 (15)

bull iosortspillpercent 80 (95 ndash 100)

ndash Similar for Hive Spark HBase

Dominated by rules-of-thumb

ndash Number of containers in parallel

bull 05 - 2 per CPU core

Large stack for tuning

Image source Intelreg Distribution for Apache Hadoop

Product claims on performance and TCO

Eco-system is not transparent

ndash Needs auditing

How do I set my system too many options

Default values in Apache source not ideal

Large and spread eco system

ndash Different distributions

ndash Product claims

Each job is different

ndash No one-fits-all solution

Cloud vs On-premise

ndash IaaS

bull Tens of different VMs to choose

ndash PaaS

bull HDInsight CloudBigData EMR

New economic HW

ndash SSDs InfiniBand Networking

The ALOJA project research lines and challenges

BSCrsquos project ALOJA towards cost-effective Big Data

Open research project for improving the cost-effectiveness

of Big Data deployments

Benchmarking and Analysis tools

Online repository and largest Big Data repo

ndash 50000+ runs of HiBench TPC-H and [some] BigBench

ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks

bull Cloud Multi-cloud provider including both IaaS and PaaS

bull On-premise High-end HPC commodity low-power

Community ndash Collaborations with industry and Academia

ndash Presented in different conferences and workshops

ndash Visibility 47 different countries

httpalojabsces

Big Data Benchmarking

Online Repository

Web

Analytics

ALOJA research lines broad coverage

Techniques for obtaining CostPerformance Insights

Profiling

bull HPC Low-level

bull High Accuracy

bull Manual Analysis

Benchmarking

bull Iterate configs

bull HW and SW

bull Real executions

bull Log parsing and data sanitization

Analysis tools

bull Summarize large number of results

bull By criteria

bull Filter noise

bull Fast processing

Predictive Analytics

bull Automated modeling

bull Estimations

bull Virtual executions

bull Automated KD

Big Data Apps

Frameworks

Systems Clusters

Cloud ProvidersDatacenters

Evaluation of

Test different clusters and architectures ndash On-premise

bull Commodity high-end appliance low-power

ndash Cloud IaaS bull 32 different VMs in Azure

similar in other providers

ndash Cloud PaaS bull HDInsight EMR CloudBigData

Different access level ndash Full admin user-only request-

to-install everything ready queuing systems (SGE)

Different versions ndash Hadoop JVM Spark Hive

etchellip

ndash Other benchmarks

Problems ndash All systems though for PROD

bull Not for comparison

ndash No Azure support

ndash Many different packages

ndash No one-fits-all solution

Dev environments and testing

ndash Big Data usually requires a cluster to develop and test

Solution ndash Custom implementation

ndash Based in simple components

ndash Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJArsquos open source tools

ALOJA Platform main components

2 Online Repository

bullExplore results

bullExecution details

bullCluster details

bullCosts

bullData sharing

3 Web Analytics

bullData views and evaluations

bullAggregates

bullAbstracted Metrics

bullJob characterization

bullMachine Learning

bullPredictions and clustering

1 Big Data Benchmarking

bullDeploy amp Provision

bullConf Management

bullParameter selection amp Queuing

bullPerf counters

bullLow-level instrumentation

bullApp logs

19

NGINX PHP MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample:

#load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"

clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 #in GB

#details
vmCores=4
vmRAM=7 #in GB

#costs
clusterCostHour=1.584 #in USD
clusterType=IaaS

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
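Because the cluster definition records `clusterCostHour`, you can derive the cost of a run from its duration: cost = clusterCostHour × seconds / 3600. This is a minimal sketch built on two hypothetical helpers, `conf_value` and `exec_cost`. It assumes the conf holds plain `key=value` lines with optional `#` comments, like the sample above. It reads the value with `sed` rather than sourcing the file, because sourcing would also need `$CONF_DIR` to be set.

```bash
# conf_value FILE KEY   (hypothetical helper)
# Prints the value of the last KEY=... line, dropping any trailing "#comment".
conf_value() {
  sed -n "s/^[[:space:]]*$2=\([^#[:space:]]*\).*/\1/p" "$1" | tail -n 1
}

# exec_cost COST_PER_HOUR SECONDS   (hypothetical helper)
# Cost of one execution: hourly cluster price * runtime in hours, 4 decimals.
exec_cost() {
  awk -v c="$1" -v s="$2" 'BEGIN { printf "%.4f\n", c * s / 3600 }'
}
```

Usage: `exec_cost "$(conf_value cluster_al-08.conf clusterCostHour)" 1800` prints the cost of a 30-minute run on that cluster.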

Running benchmarks in ALOJA

Benchmarking with defaults:

repo_location/aloja-bench/run_benchs.sh

To queue jobs:

repo_location/shell/exeq.sh

Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh

Testing different configurations

Approaches:

1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In the cluster config
3. Cmd line
   1. Via parameters:
      run_benchs.sh -r 2 -m 10
   2. Via shell globals:
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
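A common Bash idiom gives this kind of layering, where a value set earlier or exported on the command line wins over a default. The idiom assigns a default only when the variable is unset or empty, using `${VAR:=value}`. This is a minimal sketch of that idiom only; ALOJA's own files may implement overrides differently. `load_bench_defaults` is a hypothetical name, and the default values are placeholders.

```bash
# load_bench_defaults   (hypothetical)
# ":=" assigns only if the variable is unset or empty, so values already set
# by a cluster config or passed as shell globals are kept.
load_bench_defaults() {
  : "${HADOOP_VERSION:=hadoop-1.2.1}"   # placeholder default
  : "${BENCH_DATA_SIZE:=1GB}"           # placeholder default
  echo "HADOOP_VERSION=$HADOOP_VERSION BENCH_DATA_SIZE=$BENCH_DATA_SIZE"
}
```

With this pattern, `BENCH_DATA_SIZE=1TB load_bench_defaults` keeps `1TB`, while an unset variable falls back to its placeholder default.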

Things to look for:

• HW, OS
  - Versions
  - Disk config and mounts
• SW
  - Replication
  - Block sizes
  - Compression
  - IO buffers

Build your exec plan in a script and queue it

Or follow ML recommendations
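An exec plan can be a short script that expands the combinations to test into one command line per run. It appends those lines to a queue file, and a worker then drains the file in order. This is a minimal sketch under stated assumptions. The queue-file format and the functions `plan_runs` and `drain_queue` are hypothetical, and this is not how `exeq.sh` works internally. The runs only vary the two shell globals shown earlier, and the listed values are placeholders.

```bash
# plan_runs QUEUE_FILE   (hypothetical)
# Appends one command line per (Hadoop version, data size) combination.
plan_runs() {
  local queue="$1" version size
  for version in hadoop-1.2.1 hadoop-2.7.1; do   # placeholder versions
    for size in 100GB 1TB; do                     # placeholder data sizes
      echo "HADOOP_VERSION=$version BENCH_DATA_SIZE=$size aloja-bench/run_benchs.sh" >> "$queue"
    done
  done
}

# drain_queue QUEUE_FILE RUNNER   (hypothetical)
# Pops jobs in FIFO order and passes each to RUNNER (e.g. "bash -c").
# Stops at the first failing job, leaving it and the rest queued.
# No locking: do not append to the queue while it is being drained.
drain_queue() {
  local queue="$1" runner="$2" job
  while [ -s "$queue" ]; do
    job="$(head -n 1 "$queue")"
    $runner "$job" || return 1   # $runner is unquoted on purpose: "bash -c" -> two words
    tail -n +2 "$queue" > "$queue.tmp" && mv "$queue.tmp" "$queue"
  done
}
```

Usage: `plan_runs plan.queue`, then `drain_queue plan.queue "bash -c"` from the repo root. Passing `echo` as the runner prints the plan without executing it.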

ALOJA-WEB

Entry point for exploring the results collected from the executions
  - Provides insights on the obtained results through continuously evolving data views

Online DEMO at http://aloja.bsc.es

Online benchmarking results


2) ALOJA-WEB Online Repository

Entry point for exploring the results collected from the executions
  - Index of executions
    • Quick glance of executions
    • Searchable, sortable
  - Execution details
    • Performance charts and histograms
    • Hadoop counters
    • Jobs and task details

Data management of benchmark executions
  - Data importing from different clusters
  - Execution validation
  - Data management and backup

Cluster definitions
  - Cluster capabilities (resources)
  - Cluster costs

Sharing results
  - Download executions
  - Add external executions

Documentation and References
  - Papers, links and feature documentation

Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster, different configs

Mappers and reducers, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

• 400s: 2 containers, local disk
• 800s: 3 containers, local disk
• 600s: 2 containers, remote disk
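The comparison URL above passes each execution ID as a repeated `execs[]` query parameter, URL-encoded as `execs%5B%5D`. The helper below builds such a link for any list of execution IDs. It is a sketch inferred only from that URL, and `perfcharts_url` is a hypothetical name.

```bash
# perfcharts_url ID...   (hypothetical helper, inferred from the URL above)
# Builds a perfcharts comparison link with one execs[] parameter per ID.
perfcharts_url() {
  local base="http://aloja.bsc.es/perfcharts" query="" id
  for id in "$@"; do
    query+="${query:+&}execs%5B%5D=$id"   # prefix "&" after the first parameter
  done
  printf '%s?%s\n' "$base" "$query"
}
```

`perfcharts_url 90086 90088 90104` reproduces the link used on these slides.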

Comparing 3 runs on the same cluster, different configs

CPU utilization, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

• Moderate iowait
• Higher iowait
• Very high iowait

Comparing 3 runs on the same cluster, different configs

CPU queues, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)

Comparing 3 runs on the same cluster, different configs

CPU context switches, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

High context switches with 3 containers on a 2-core VM

Impact of SW configurations in Speedup (4-node clusters)

Figure: speedup (higher is better) by number of mappers (4, 6, 8, 10) and by compression algorithm (no compression, ZLIB, BZIP2, Snappy)

Results using http://hadoop.bsc.es/configimprovement

Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

Impact of HW configurations in Speedup

Figure: speedup (higher is better) in two panels. Disks and Network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB. Cloud remote volumes: local only; 1, 2 or 3 remotes; 1, 2 or 3 remotes with tmp local.

Results using http://hadoop.bsc.es/configimprovement

Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
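Speedup here is a baseline execution time divided by the time of the configuration under evaluation, so values above 1 mean faster than the baseline. The sketch below shows that arithmetic. Its input is the three runs compared earlier (400s, 800s, 600s), with the first run as the baseline. `speedups` is a hypothetical helper, and the charts above may use a different baseline.

```bash
# speedups   (hypothetical helper)
# Reads "label seconds" lines; the first line is the baseline.
# Prints "label speedup" with speedup = baseline_seconds / seconds (2 decimals).
speedups() {
  awk 'NR == 1 { base = $2 }
       $2 > 0  { printf "%s %.2f\n", $1, base / $2 }'
}
```

Usage: `printf 'run1 400\nrun2 800\nrun3 600\n' | speedups` gives 1.00, 0.50 and 0.67 respectively.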

VM Size comparison (Azure)

Figure: VM size comparison (lower is better)

Clusters by cost-effectiveness (1/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

Cluster ID reference:
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs

Figure: clusters (Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8) plotted by cost-effectiveness

Clusters by cost-effectiveness (2/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

Figure: fastest execution vs. cheapest execution per cluster (same cluster IDs as in part 1/2)

Cost/Performance: Scalability of cluster size

This shows a sample of a new screen (with sample data) for finding the most cost-effective cluster size:
  - X axis: number of datanodes (cluster size)
  - Left Y: execution time (lower is better)
  - Right Y: execution cost

Figure legend: execution time, execution cost, recommended size
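A cost curve like this one multiplies execution time by the cluster's hourly price. The sketch below picks the cheapest size from measured or predicted times. It assumes a hypothetical input format of `datanodes seconds price_per_node_hour` and ignores master-node and storage costs. `cheapest_size` is a hypothetical helper, and the numbers in the test are made up.

```bash
# cheapest_size   (hypothetical helper)
# Reads "datanodes seconds price_per_node_hour" lines and prints
# "datanodes cost" for the size with the lowest execution cost.
cheapest_size() {
  awk '{
         cost = $1 * $3 * $2 / 3600          # nodes * price/hour * hours
         if (NR == 1 || cost < best) { best = cost; size = $1 }
       }
       END { if (NR > 0) printf "%d %.4f\n", size, best }'
}
```

A larger cluster is cheaper only if it cuts runtime by more than it adds in hourly cost. That trade-off is why the recommended size sits where the two curves balance rather than at the fastest point.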

Predictive Analytics and automated learning

Modeling Hadoop: Methodology

Methodology
  - 3-step learning process
  - Different split sizes tested (10% ≤ training ≤ 50%)
  - Different learning algorithms: regression trees, nearest-neighbors methods, linear/multinomial regressions, neural networks

Learning results
  - Mean Absolute Errors ~250s (ranging in [100s, 6000s])
  - Relative Absolute Errors between [0.10, 0.25]
    • Depend on the benchmark and the number of examples per benchmark
    • Some executions are (or may be) anomalies
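Both error measures can be computed from pairs of observed and predicted execution times. MAE is the mean of |predicted - observed|. RAE, in its usual definition, divides the total absolute error by the total absolute deviation of the observed values from their mean. An RAE of 0.10 therefore means the model's errors add up to 10% of those made by always predicting the mean. The sketch below is awk-based and assumes a hypothetical two-column input, `observed predicted`. `model_errors` is a hypothetical name, and ALOJA's own tooling may compute these differently.

```bash
# model_errors   (hypothetical helper)
# Reads "observed predicted" lines (seconds) and prints "MAE RAE", where
# RAE = sum|pred - obs| / sum|obs - mean(obs)|  (standard relative absolute error).
model_errors() {
  awk '
    function abs(x) { return x < 0 ? -x : x }
    { obs[NR] = $1; pred[NR] = $2; sum += $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      for (i = 1; i <= NR; i++) {
        err += abs(pred[i] - obs[i])
        dev += abs(obs[i] - mean)
      }
      if (dev > 0) printf "%.2f %.4f\n", err / NR, err / dev
      else         printf "%.2f NA\n", err / NR   # RAE undefined: all observations equal
    }'
}
```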


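Figure: learning workflow. The ALOJA data-set is split into training, validation and testing sets. A model is trained and tested on the validation set; if the result is not good enough, the algorithm is tuned and the model re-trained. Otherwise, that model is selected as the final model and tested on the testing set.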
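The workflow starts by splitting the data-set into training, validation and testing sets. The sketch below performs a deterministic split. It assumes one execution per line and rows that are already shuffled, and everything left after training and validation goes to testing. `split_dataset` and its percentage arguments are hypothetical.

```bash
# split_dataset FILE TRAIN_PCT VAL_PCT PREFIX   (hypothetical helper)
# Writes PREFIX.train, PREFIX.val and PREFIX.test. Rows are taken in file
# order, so shuffle FILE first (e.g. with `shuf`) if it is sorted.
split_dataset() {
  local file="$1" train="$2" val="$3" prefix="$4"
  awk -v t="$train" -v v="$val" -v p="$prefix" '
    NR == FNR { n++; next }                      # first pass: count rows
    FNR == 1  { nt = int(n * t / 100); nv = int(n * v / 100) }
    {
      if (FNR <= nt)           out = p ".train"
      else if (FNR <= nt + nv) out = p ".val"
      else                     out = p ".test"
      print > out
    }' "$file" "$file"
}
```

Usage: `split_dataset executions.txt 50 25 split` writes half the rows to `split.train`, a quarter (rounded down) to `split.val`, and the rest to `split.test`.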

Use case 1: Anomaly Detection

Anomaly Detection
  - Model-based detection procedure
  - Pass executions through the model
  - Executions not fitting the model are considered "out of the system"

Figure: anomaly detection procedure, sample view from the site
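One way to sketch this procedure is to pass each execution through the model and flag those whose observed time deviates from the prediction by more than a tolerance. The slides do not define "not fitting" numerically. The relative-error threshold used here (a hypothetical default of 0.5) is an assumption, as is the input format `exec_id observed predicted`. `flag_anomalies` is a hypothetical name.

```bash
# flag_anomalies [MAX_REL_ERR]   (hypothetical helper; default threshold 0.5)
# Reads "exec_id observed predicted" lines and prints the IDs whose
# |observed - predicted| / predicted exceeds the threshold.
flag_anomalies() {
  awk -v max="${1:-0.5}" '
    $3 > 0 {
      d = $2 - $3; if (d < 0) d = -d
      if (d / $3 > max) print $1
    }'
}
```

Flagged executions are candidates for review rather than automatic deletion. As the learning results note, some of them may be genuine anomalies, while others may reveal gaps in the model.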

Use case 2: Guided Benchmarking, Method

Guided Benchmarking
  - Best subset of configurations for modeling a Hadoop deployment
  - Clustering to get the "representative execution" for each similar subset of executions

Figure: guided benchmarking flow. The ALOJA data-set is clustered, and the resulting centers form a reduced data-set from which a model is built and tested against a reference. If the error is not OK, the number of centers is increased and the process repeats; if it is OK, the centers are the configs to execute.
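The loop in this flow starts with a few centers, then clusters, builds and tests a model. It increases the number of centers until the error is acceptable. In this sketch, the clustering-and-modeling step is delegated to a user-supplied command that receives K and prints the model error. That command, `guided_centers`, the starting K of 1 and the step of 1 are all hypothetical choices.

```bash
# guided_centers MAX_K MAX_ERROR CMD...   (hypothetical helper)
# Calls "CMD... K" for K = 1, 2, ..., MAX_K. CMD must cluster the data-set into
# K centers, build and test a model on them, and print the resulting error.
# Prints the first K whose error is <= MAX_ERROR (those K centers are the
# configs to execute); returns 1 if no K up to MAX_K meets the target.
guided_centers() {
  local max_k="$1" max_err="$2"; shift 2
  local k err
  for (( k = 1; k <= max_k; k++ )); do
    err="$("$@" "$k")" || return 1
    if awk -v e="$err" -v m="$max_err" 'BEGIN { exit !(e <= m) }'; then
      echo "$k"
      return 0
    fi
  done
  return 1
}
```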


Use case 3: Knowledge Discovery

Make analyzing results easier
  - Multi-variable visualization
  - Trees separating relevant attributes
  - Other interesting tools

Figure: predicted execution time (pred_time) for HDD vs. SSD

Tree descriptor:

Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s

Concluding remarks and references

Concluding remarks

Benchmarking: it's fun, or at least...
  - It will save you €€€ and allow you to scale

But it is also tough
  - The industry needs more transparency
  - We still have a lot to do...

In ALOJA we provide the benchmarking scripts
  - And also the results, which should be your first entry point

We are constantly adding new features
  - Benchmarks, systems, providers

It is an open initiative: you are invited to participate
  - Beta testers
  - Contributors

With predictive analytics we can automate and find tendencies faster
  - Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools...

Fork our repo at https://github.com/Aloja/aloja

More info

• ALOJA Benchmarking platform and online repository
  - http://aloja.bsc.es, http://aloja.bsc.es/publications
• Benchmarking Big Data
  - http://www.slideshare.net/ni_po/benchmarking-hadoop
• BDOOP meetup group in Barcelona
• Big Data Benchmarking Community (BDBC) mailing list
  - ~200 members from ~80 organizations
  - http://clds.sdsc.edu/bdbc/community
• Workshop Big Data Benchmarking (WBDB)
  - Next: http://clds.sdsc.edu/wbdb2015.ca
• SPEC Research Big Data working group
  - http://research.spec.org/working-groups/big-data-working-group.html
• Slides and video
  - Michael Frank on Big Data benchmarking: http://www.tele-task.de/archive/podcast/20430
  - Tilmann Rabl, Big Data Benchmarking Tutorial: http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl

www.bsc.es

Q&A

Thanks

Contact: nicolas.poggi@bsc.es


Setting up your Big Data system

Hadoop
  - 100+ tunable parameters
  - Obscure and interrelated
    • mapred.map/reduce.tasks.speculative.execution
    • io.sort.mb 100 (300)
    • io.sort.record.percent 5% (15%)
    • io.sort.spill.percent 80% (95% - 100%)
  - Similar for Hive, Spark, HBase
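These parameters are normally changed in the site configuration files. They can also be set per job through Hadoop's generic `-D property=value` option, which tools built on `ToolRunner`/`GenericOptionsParser` (such as the bundled TeraSort example) accept. The sketch below turns a list of overrides into those arguments. It reads the values in parentheses above as the tuned alternatives to the defaults, and writes the percentages as the fractions these properties expect. It uses the Hadoop 1.x names from the slide; Hadoop 2 renamed some of them, e.g. `io.sort.mb` became `mapreduce.task.io.sort.mb`. `hadoop_d_args` and `tuned` are hypothetical names.

```bash
# hadoop_d_args KEY=VALUE...   (hypothetical helper)
# Prints "-D KEY=VALUE" pairs on one line, for word-splitting into a
# `hadoop jar ...` command line (values must not contain spaces).
hadoop_d_args() {
  local out=() kv
  for kv in "$@"; do out+=(-D "$kv"); done
  printf '%s\n' "${out[*]}"
}

# Tuned values read from the parentheses on the slide (Hadoop 1.x names).
# The slide suggests 95-100% for the spill threshold; 0.95 is one choice.
tuned=(
  "io.sort.mb=300"
  "io.sort.record.percent=0.15"
  "io.sort.spill.percent=0.95"
)
```

Usage (jar path is a placeholder): `hadoop jar <examples jar> terasort $(hadoop_d_args "${tuned[@]}") /in /out`. The generic options must come before the job's own positional arguments.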

Dominated by rules-of-thumb

  - Number of containers in parallel
    • 0.5 - 2 per CPU core
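The rule of thumb translates directly into a range of concurrent containers per node. The sketch below, built on two hypothetical helpers (`containers_range` and `containers_per_core`), computes that range for a given core count and the ratio for an actual setting. An example setting is 3 containers on a 2-core VM, the configuration that showed high context switches in the run comparison.

```bash
# containers_range CORES   (hypothetical helper)
# Min and max concurrent containers for 0.5 - 2 per core
# (minimum rounded to the nearest integer).
containers_range() {
  awk -v c="$1" 'BEGIN { printf "%d %d\n", int(c * 0.5 + 0.5), c * 2 }'
}

# containers_per_core CONTAINERS CORES   (hypothetical helper)
# Ratio of containers to cores, 2 decimals.
containers_per_core() {
  awk -v n="$1" -v c="$2" 'BEGIN { printf "%.2f\n", n / c }'
}
```

`containers_per_core 3 2` gives 1.50, inside the 0.5 - 2 range, even though that run showed high context switches. A rule of thumb like this is therefore a starting point for benchmarking rather than a substitute for it.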

Large stack for tuning

Image source: Intel® Distribution for Apache Hadoop

Product claims on performance and TCO

Eco-system is not transparent
  - Needs auditing

How do I set up my system? Too many options

Default values in Apache source not ideal

Large and spread-out eco-system
  - Different distributions
  - Product claims

Each job is different
  - No one-size-fits-all solution

Cloud vs On-premise
  - IaaS
    • Tens of different VMs to choose from
  - PaaS
    • HDInsight, CloudBigData, EMR

New economical HW
  - SSDs, InfiniBand networking

The ALOJA project: research lines and challenges

BSC's project ALOJA: towards cost-effective Big Data

Open research project for improving the cost-effectiveness of Big Data deployments

Benchmarking and Analysis tools

Online repository and largest Big Data repo
  - 50,000+ runs of HiBench, TPC-H and [some] BigBench
  - Over 100 HW configurations tested
    • Different node/VM types, disks and networks
    • Cloud: multi-cloud provider, including both IaaS and PaaS
    • On-premise: high-end, HPC, commodity, low-power

Community
  - Collaborations with industry and academia
  - Presented in different conferences and workshops
  - Visibility: 47 different countries

http://aloja.bsc.es

Figure: Big Data Benchmarking, Online Repository, Web Analytics

ALOJA research lines: broad coverage

Techniques for obtaining Cost/Performance Insights

Profiling
  • HPC, low-level
  • High accuracy
  • Manual analysis

Benchmarking
  • Iterate configs
  • HW and SW
  • Real executions
  • Log parsing and data sanitization

Analysis tools
  • Summarize large numbers of results
  • By criteria
  • Filter noise
  • Fast processing

Predictive Analytics
  • Automated modeling
  • Estimations
  • Virtual executions
  • Automated KD

Evaluation of: Big Data apps, frameworks, systems/clusters, cloud providers/datacenters

Test different clusters and architectures
  - On-premise
    • Commodity, high-end, appliance, low-power
  - Cloud IaaS
    • 32 different VMs in Azure, similar in other providers
  - Cloud PaaS
    • HDInsight, EMR, CloudBigData

Different access levels
  - Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)

Different versions
  - Hadoop, JVM, Spark, Hive, etc...
  - Other benchmarks

Problems
  - All systems were designed for PROD
    • Not for comparison
  - No Azure support
  - Many different packages
  - No one-size-fits-all solution

Dev environments and testing
  - Big Data usually requires a cluster to develop and test

Solution
  - Custom implementation
  - Based on simple components
  - Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJA's open source tools

ALOJA Platform main components

1. Big Data Benchmarking
   • Deploy & Provision
   • Conf Management
   • Parameter selection & Queuing
   • Perf counters
   • Low-level instrumentation
   • App logs

2. Online Repository
   • Explore results
   • Execution details
   • Cluster details
   • Costs
   • Data sharing

3. Web Analytics
   • Data views and evaluations
   • Aggregates
   • Abstracted Metrics
   • Job characterization
   • Machine Learning
   • Predictions and clustering

NGINX, PHP, MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample load AZURE defaults

source $CONF_DIRcluster_defaultsconf

clusterName=azure-large-8

numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs

clusterCostHour=1584 in USD

clusterType=IaaS

Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 9: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Product claims on performance and TCO

Eco-system is not transparent

ndash Needs auditing

How do I set my system too many options

Default values in Apache source not ideal

Large and spread eco system

ndash Different distributions

ndash Product claims

Each job is different

ndash No one-fits-all solution

Cloud vs On-premise

ndash IaaS

bull Tens of different VMs to choose

ndash PaaS

bull HDInsight CloudBigData EMR

New economic HW

ndash SSDs InfiniBand Networking

The ALOJA project research lines and challenges

BSCrsquos project ALOJA towards cost-effective Big Data

Open research project for improving the cost-effectiveness

of Big Data deployments

Benchmarking and Analysis tools

Online repository and largest Big Data repo

ndash 50000+ runs of HiBench TPC-H and [some] BigBench

ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks

bull Cloud Multi-cloud provider including both IaaS and PaaS

bull On-premise High-end HPC commodity low-power

Community ndash Collaborations with industry and Academia

ndash Presented in different conferences and workshops

ndash Visibility 47 different countries

httpalojabsces

Big Data Benchmarking

Online Repository

Web

Analytics

ALOJA research lines broad coverage

Techniques for obtaining CostPerformance Insights

Profiling

bull HPC Low-level

bull High Accuracy

bull Manual Analysis

Benchmarking

bull Iterate configs

bull HW and SW

bull Real executions

bull Log parsing and data sanitization

Analysis tools

bull Summarize large number of results

bull By criteria

bull Filter noise

bull Fast processing

Predictive Analytics

bull Automated modeling

bull Estimations

bull Virtual executions

bull Automated KD

Big Data Apps

Frameworks

Systems Clusters

Cloud ProvidersDatacenters

Evaluation of

Test different clusters and architectures ndash On-premise

bull Commodity high-end appliance low-power

ndash Cloud IaaS bull 32 different VMs in Azure

similar in other providers

ndash Cloud PaaS bull HDInsight EMR CloudBigData

Different access level ndash Full admin user-only request-

to-install everything ready queuing systems (SGE)

Different versions ndash Hadoop JVM Spark Hive

etchellip

ndash Other benchmarks

Problems ndash All systems though for PROD

bull Not for comparison

ndash No Azure support

ndash Many different packages

ndash No one-fits-all solution

Dev environments and testing

ndash Big Data usually requires a cluster to develop and test

Solution ndash Custom implementation

ndash Based in simple components

ndash Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJArsquos open source tools

ALOJA Platform main components

2 Online Repository

bullExplore results

bullExecution details

bullCluster details

bullCosts

bullData sharing

3 Web Analytics

bullData views and evaluations

bullAggregates

bullAbstracted Metrics

bullJob characterization

bullMachine Learning

bullPredictions and clustering

1 Big Data Benchmarking

bullDeploy amp Provision

bullConf Management

bullParameter selection amp Queuing

bullPerf counters

bullLow-level instrumentation

bullApp logs

19

NGINX PHP MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample load AZURE defaults

source $CONF_DIRcluster_defaultsconf

clusterName=azure-large-8

numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs

clusterCostHour=1584 in USD

clusterType=IaaS

Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2: Guided benchmarking – method
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the "representative execution" for each similar subset of executions
[Figure: ALOJA data set → clustering → data set of centers → build model → test against the reference data set → is the error OK? NO: increase the number of centers and repeat; YES: configs to execute]
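The loop in the figure (cluster, build a model from the centers, test it, add centers until the error is acceptable) can be sketched in a few lines. For brevity this version swaps real clustering for k evenly spaced executions along a single numeric configuration feature, and its "model" predicts each execution's time from the nearest center. Both simplifications, the function name `guided_k` and the data are assumptions.

```bash
# guided_k MAX_MAE < lines of "<feature> <exec_time_seconds>"
# Prints the smallest number of centers whose nearest-center model keeps the
# mean absolute error over the whole data set <= MAX_MAE (nothing if none does).
guided_k() {
  LC_ALL=C sort -k1,1n | awk -v max_mae="$1" '
    { f[NR] = $1; t[NR] = $2 }
    END {
      n = NR
      for (k = 1; k <= n; k++) {
        # Centers: k evenly spaced order statistics of the feature
        # (a simple stand-in for the clustering step).
        for (j = 1; j <= k; j++) c[j] = int((j - 0.5) * n / k) + 1
        # Model: predict each execution with the time of its nearest center.
        err = 0
        for (i = 1; i <= n; i++) {
          best = -1
          for (j = 1; j <= k; j++) {
            d = f[i] - f[c[j]]
            if (d < 0) d = -d
            if (best < 0 || d < best) { best = d; pred = t[c[j]] }
          }
          e = t[i] - pred
          err += (e < 0 ? -e : e)
        }
        # Stop at the first k whose error is acceptable.
        if (err / n <= max_mae) { print k; exit }
      }
    }'
}

# Hypothetical data: two groups of similar configurations.
printf '%s\n' "1 1000" "2 1000" "10 600" "11 600" | guided_k 50
# prints: 2
```

The centers found at that k are the "configs to execute"; the remaining configurations are left to the model.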

Use case 3: Knowledge discovery
Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools

[Chart: predicted time (pred_time), HDD vs. SSD]

Tree descriptor (predicted execution time per leaf):
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
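The tree descriptor reads directly as a lookup from (disk, network, IO file buffer) to predicted time. This sketch transcribes the eight leaves above into a `case` statement; reusing the chart label `pred_time` as a function name is purely illustrative.

```bash
# pred_time DISK NET IOFBUF: predicted seconds, transcribed from the tree leaves.
pred_time() {
  case "$1/$2/$3" in
    HDD/ETH/128KB) echo 2935 ;;
    HDD/ETH/64KB)  echo 2942 ;;
    HDD/IB/128KB)  echo 3118 ;;
    HDD/IB/64KB)   echo 3125 ;;
    SSD/ETH/128KB) echo 1248 ;;
    SSD/ETH/64KB)  echo 1256 ;;
    SSD/IB/128KB)  echo 1233 ;;
    SSD/IB/64KB)   echo 1241 ;;
    *) echo "no leaf for $1/$2/$3" >&2; return 1 ;;
  esac
}

# Ratio of predicted HDD to SSD time with Ethernet and 128KB buffers.
awk -v hdd="$(pred_time HDD ETH 128KB)" -v ssd="$(pred_time SSD ETH 128KB)" \
  'BEGIN { printf "%.2f\n", hdd / ssd }'
# prints: 2.35
```

Read this way, the tree shows the disk type dominating the prediction, while the IO buffer size changes it by only a few seconds per leaf.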

Concluding remarks and references

Concluding remarks
Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale
But it is also tough
– The industry needs more transparency
– We still have a lot to do…
In ALOJA we provide the benchmarking scripts
– And also the results, which should be your first entry point
We are constantly adding new features
– Benchmarks, systems, providers
It is an open initiative, and you are invited to participate
– Beta testers
– Contributors
With predictive analytics we can automate and find trends faster
– Especially to save on benchmarking costs and time
Find us around the conference for more details on the tools…
Fork our repo at https://github.com/Aloja/aloja

More info
ALOJA Benchmarking platform and online repository
– http://aloja.bsc.es, http://aloja.bsc.es/publications
Benchmarking Big Data
– http://www.slideshare.net/ni_po/benchmarking-hadoop
BDOOP meetup group in Barcelona
Big Data Benchmarking Community (BDBC) mailing list
– ~200 members from ~80 organizations
– http://clds.sdsc.edu/bdbc/community
Workshop on Big Data Benchmarking (WBDB)
– Next: http://clds.sdsc.edu/wbdb2015.ca
SPEC Research Big Data working group
– http://research.spec.org/working-groups/big-data-working-group.html
Slides and video
– Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430/
– Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl

www.bsc.es
Q&A
Thanks
Contact: nicolas.poggi@bsc.es


How do I set up my system? Too many options
Default values in the Apache source are not ideal
Large and fragmented ecosystem
– Different distributions
– Product claims
Each job is different
– No one-size-fits-all solution
Cloud vs. on-premise
– IaaS
  • Tens of different VMs to choose from
– PaaS
  • HDInsight, CloudBigData, EMR
New affordable HW
– SSDs, InfiniBand networking

The ALOJA project: research lines and challenges

BSC's project ALOJA: towards cost-effective Big Data
Open research project for improving the cost-effectiveness of Big Data deployments
Benchmarking and analysis tools
Online repository: the largest Big Data repo
– 50,000+ runs of HiBench, TPC-H and [some] BigBench
– Over 100 HW configurations tested
  • Of different nodes/VMs, disks and networks
  • Cloud: multiple cloud providers, including both IaaS and PaaS
  • On-premise: high-end, HPC, commodity, low-power
Community
– Collaborations with industry and academia
– Presented at different conferences and workshops
– Visibility: 47 different countries
http://aloja.bsc.es
[Diagram: Big Data Benchmarking, Online Repository, Web Analytics]

ALOJA research lines: broad coverage
Techniques for obtaining cost/performance insights
Profiling
• HPC, low-level
• High accuracy
• Manual analysis
Benchmarking
• Iterate configs
• HW and SW
• Real executions
• Log parsing and data sanitization
Analysis tools
• Summarize large numbers of results
• By criteria
• Filter noise
• Fast processing
Predictive Analytics
• Automated modeling
• Estimations
• Virtual executions
• Automated KD (knowledge discovery)
Evaluation of: Big Data apps, frameworks, systems/clusters, cloud providers/datacenters

Challenges (circa end of 2013)
Test different clusters and architectures
– On-premise
  • Commodity, high-end, appliance, low-power
– Cloud IaaS
  • 32 different VMs in Azure, similar in other providers
– Cloud PaaS
  • HDInsight, EMR, CloudBigData
Different access levels
– Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)
Different versions
– Hadoop, JVM, Spark, Hive, etc.
– Other benchmarks
Problems
– All existing systems were designed for PROD (production)
  • Not for comparison
– No Azure support
– Many different packages
– No one-size-fits-all solution
Dev environments and testing
– Big Data usually requires a cluster to develop and test
Solution
– Custom implementation
– Based on simple components
– Wrapping commands

Benchmarking with ALOJA's open source tools

ALOJA platform main components
1. Big Data Benchmarking
• Deploy & provision
• Conf management
• Parameter selection & queuing
• Perf counters
• Low-level instrumentation
• App logs
2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing
3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted metrics
• Job characterization
• Machine learning
• Predictions and clustering
NGINX, PHP, MySQL
Bash, Unix tools, CLIs, R, SQL, JS

Extending and collaborating in ALOJA

Setting up a DEV environment
1. Install the prerequisites: git, Vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up
• Installs a web server with sample data
• Sets up a local cluster to test benchmarking

Workflow in ALOJA
Cluster(s) definition
• VM sizes
• # nodes
• OS, disks
• Capabilities
Execution plan
• Start cluster
• Setup
• Exec benchmarks
• Cleanup
Import data
• Convert perf metrics
• Parse logs
• Import into DB
Evaluate data
• Data views in the Vagrant VM
• Or http://aloja.bsc.es
PA and KD
• Predictive Analytics
• Knowledge Discovery
[Diagram element: historic repo]
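The execution-plan step (start cluster, setup, run benchmarks, cleanup) maps naturally onto a shell function whose cleanup is guaranteed by an `EXIT` trap, so a failed benchmark does not leave a cluster running. The step functions here are hypothetical stubs that only print; ALOJA's own scripts live in its repository and may be organized differently.

```bash
# Hypothetical step stubs; real steps would call the provider and benchmark scripts.
start_cluster() { echo "start $1"; }
setup_cluster() { echo "setup $1"; }
run_benchmark() { echo "bench $2 on $1"; [ "$2" != "fail" ]; }  # "fail" simulates an error
cleanup()       { echo "cleanup $1"; }

# exec_plan CLUSTER BENCH...: the body runs in a subshell "( ... )", so the
# EXIT trap fires when the plan finishes, whether benchmarks pass or fail.
exec_plan() (
  cluster="$1"; shift
  trap 'cleanup "$cluster"' EXIT
  { start_cluster "$cluster" && setup_cluster "$cluster"; } || exit 1
  status=0
  for bench in "$@"; do
    run_benchmark "$cluster" "$bench" || status=1   # keep going, remember the failure
  done
  exit "$status"
)

exec_plan demo-cluster wordcount terasort
# prints: start, setup, one "bench" line per benchmark, then "cleanup demo-cluster"
```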

Commands and providers

Provisioning commands
Connect
– Node and cluster
– Builds the SSH command line
  • SSH proxies
Deploy
– Creates a cluster
– Sets SSH credentials
– If created, updates config as needed
– If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters

Providers
On-premise
– Custom settings for clusters
  • Multiple disk types
  • Different architectures
Cloud IaaS
– Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
– HDInsight, CloudBigData, EMR soon

Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
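As described, Deploy is idempotent: it creates a missing cluster, starts a stopped one, and updates the config of one that already exists. A sketch of that dispatch, where `cluster_state` and the other provider hooks are hypothetical stubs (a real provider would query its cloud CLI) and `DEMO_STATE` exists only to simulate states.

```bash
# Hypothetical provider hooks: each only prints what a real provider would do.
cluster_state()       { echo "${DEMO_STATE:-absent}"; }   # absent | stopped | running
create_cluster()      { echo "create $1"; }
set_ssh_credentials() { echo "ssh-credentials $1"; }
start_nodes()         { echo "start $1"; }
update_config()       { echo "update-config $1"; }

# deploy CLUSTER: safe to run repeatedly; each call moves the cluster towards
# a running, configured state without recreating it.
deploy() {
  cluster="$1"
  case "$(cluster_state "$cluster")" in
    absent)  create_cluster "$cluster" && set_ssh_credentials "$cluster" ;;
    stopped) start_nodes "$cluster" ;;
    running) update_config "$cluster" ;;
    *)       echo "unknown state for $cluster" >&2; return 1 ;;
  esac
}

deploy demo-cluster   # prints "create demo-cluster", then "ssh-credentials demo-cluster"
```

Dispatching on the current state, rather than always creating, is what makes it safe to re-run Deploy after a partial failure.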

Cluster and node definitions: multi-provider abstraction

Steps to define a cluster
Import defaults (if any)
– Sets the OS version
Select the provider
– Azure, Rackspace, AWS, on-premise, Vagrant…
Name the cluster and set its size
Optional
– Select the VM type
– Attached disks
– Define metadata
– And costs
Nodes can also be defined
– For web, shared folders, etc.
You can logically split clusters

Azure 8-datanode sample:
#load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"

clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 #in GB

#details
vmCores=4
vmRAM=7 #in GB

#costs
clusterCostHour=1.584 #in USD
clusterType=IaaS

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf

Running benchmarks in ALOJA
Benchmarking with defaults:
repo_location/aloja-bench/run_benchs.sh
To queue jobs:
repo_location/shell/exeq.sh
Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh

Testing different configurations
Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In the cluster config
3. Cmd line
   1. Via parameters: run_benchs.sh -r 2 -m 10
   2. Via shell globals: HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB
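One common shell idiom for this kind of layering is `${VAR:=default}`, which assigns only when the variable is unset or empty, so a value passed as a shell global survives the config files. The sketch loads the most specific layer first. `HADOOP_VERSION` and `BENCH_DATA_SIZE` come from the slide; the functions and default values are hypothetical, and this is not necessarily how ALOJA's own files are written.

```bash
# Layer 1: defaults (stand-in for benchmark_defaults.conf).
load_defaults() {
  : "${HADOOP_VERSION:=hadoop-2.7.1}"   # slide value, used here as a default
  : "${BENCH_DATA_SIZE:=10GB}"          # hypothetical default
}

# Layer 2: cluster config (stand-in for a cluster_*.conf file).
load_cluster_conf() {
  : "${BENCH_DATA_SIZE:=100GB}"         # hypothetical per-cluster value
}

# With ":=", the first layer to set a variable wins, so loading the cluster
# layer before the defaults gives: shell globals > cluster config > defaults.
show_config() {
  load_cluster_conf
  load_defaults
  echo "HADOOP_VERSION=$HADOOP_VERSION BENCH_DATA_SIZE=$BENCH_DATA_SIZE"
}

(unset HADOOP_VERSION BENCH_DATA_SIZE; show_config)
# prints: HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=100GB
(unset HADOOP_VERSION; BENCH_DATA_SIZE=1TB; show_config)
# prints: HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB
```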

Things to look for
HW/OS
– Versions
– Disk config and mounts
SW
– Replication
– Block sizes
– Compression
– IO buffers
Build your exec plan in a script and queue it
Or follow the ML recommendations
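Building the plan in a script can be as simple as expanding every combination of the parameters to test into one command line each, reviewing the list, and then queuing it. In this sketch only the two variable names and the `aloja-bench/run_benchs.sh` path come from the slides; the function name `plan_runs` and the value lists are hypothetical.

```bash
# plan_runs "VERSIONS" "SIZES": print one benchmark command per combination.
# Both arguments are space-separated lists; the unquoted expansions below
# split them into words on purpose.
plan_runs() {
  for v in $1; do
    for s in $2; do
      printf 'HADOOP_VERSION=%s BENCH_DATA_SIZE=%s aloja-bench/run_benchs.sh\n' "$v" "$s"
    done
  done
}

# Hypothetical plan: 2 versions x 2 data sizes = 4 runs to review, then queue.
plan_runs "hadoop-1.2.1 hadoop-2.7.1" "100GB 1TB"
```

Printing the plan instead of executing it keeps the step reviewable before anything is queued on a paid cluster.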

ALOJA-WEB
Entry point to explore the results collected from the executions
– Provides insights on the obtained results through continuously evolving data views
Online DEMO at http://aloja.bsc.es
[Screenshot: online benchmarking results]

2) ALOJA-WEB: Online Repository
Entry point to explore the results collected from the executions
– Index of executions
  • Quick glance at executions
  • Searchable, sortable
– Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details
Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup
Cluster definitions
– Cluster capabilities (resources)
– Cluster costs
Sharing results
– Download executions
– Add external executions
Documentation and references
– Papers, links and feature documentation
Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster with different configs (48-node cluster)
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

Mappers and reducers
– 400 s: 2 containers, local disk
– 800 s: 3 containers, local disk
– 600 s: 2 containers, remote disk

CPU utilization
– Annotations: moderate iowait; higher iowait; very high iowait

CPU queues
– Annotations: 1 blocked process; 4 blocked processes; 4 blocked processes (map phase)

CPU context switches
– High context switches with 3 containers on a 2-core VM
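The three charts (iowait, blocked processes, context switches) correspond to standard Linux counters. Assuming they are sampled with `vmstat` (the slides do not name the tool), a short awk script can summarize one capture per run. It locates the `b`, `cs` and `wa` columns by header name rather than by position, since column sets differ between vmstat versions. The function name and the sample capture are invented.

```bash
# summarize_vmstat < vmstat output (e.g. a capture of "vmstat 5 12")
# Averages blocked processes (b), context switches/s (cs) and iowait % (wa).
# Note: vmstat's first data line reports averages since boot; drop it for per-run figures.
summarize_vmstat() {
  awk '
    $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header row
    ("b" in col) && $1 ~ /^[0-9]+$/ {
      n++; b += $(col["b"]); cs += $(col["cs"]); wa += $(col["wa"])
    }
    END { if (n) printf "samples=%d b=%.1f cs=%.0f wa=%.1f\n", n, b / n, cs / n, wa / n }'
}

summarize_vmstat <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1      0 100000   2000  50000    0    0   100   200  500 1000 20  5 65 10  0
 3  4      0 100000   2000  50000    0    0   300   400  700 3000 30 10 30 30  0
EOF
# prints: samples=2 b=2.5 cs=2000 wa=20.0
```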

Impact of SW configurations on speedup (4-node clusters)
[Chart: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and compression algorithm (no compression, ZLIB, BZIP2, snappy)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
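Speedup in these charts is "higher is better", that is, baseline time divided by the configuration's time. A sketch applying this to the three runs compared earlier (400 s, 800 s, 600 s), taking the 2-container local-disk run as the baseline. The choice of baseline is an assumption, since the slides do not say which configuration ALOJA's charts normalize against, and the run labels are hypothetical.

```bash
# speedup BASELINE_SECONDS < lines of "<config_label> <exec_time_seconds>"
# Prints "<config_label> <speedup>", where speedup = baseline / time.
speedup() {
  awk -v base="$1" '{ printf "%s %.2f\n", $1, base / $2 }'
}

printf '%s\n' "2-containers-local 400" "3-containers-local 800" "2-containers-remote 600" | speedup 400
# prints:
# 2-containers-local 1.00
# 3-containers-local 0.50
# 2-containers-remote 0.67
```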

Impact of HW configurations on speedup
[Chart: speedup (higher is better). Disks and network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB. Cloud remote volumes: local only; 1, 2 and 3 remotes; 1, 2 and 3 remotes with tmp on local disk]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf



Page 12: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

BSCrsquos project ALOJA towards cost-effective Big Data

Open research project for improving the cost-effectiveness

of Big Data deployments

Benchmarking and Analysis tools

Online repository and largest Big Data repo

ndash 50000+ runs of HiBench TPC-H and [some] BigBench

ndash Over 100 HW configurations tested bull Of dif ferent NodeVM disks and networks

bull Cloud Multi-cloud provider including both IaaS and PaaS

bull On-premise High-end HPC commodity low-power

Community ndash Collaborations with industry and Academia

ndash Presented in different conferences and workshops

ndash Visibility 47 different countries

httpalojabsces

Big Data Benchmarking

Online Repository

Web

Analytics

ALOJA research lines broad coverage

Techniques for obtaining CostPerformance Insights

Profiling

bull HPC Low-level

bull High Accuracy

bull Manual Analysis

Benchmarking

bull Iterate configs

bull HW and SW

bull Real executions

bull Log parsing and data sanitization

Analysis tools

bull Summarize large number of results

bull By criteria

bull Filter noise

bull Fast processing

Predictive Analytics

bull Automated modeling

bull Estimations

bull Virtual executions

bull Automated KD

Big Data Apps

Frameworks

Systems Clusters

Cloud ProvidersDatacenters

Evaluation of

Test different clusters and architectures ndash On-premise

bull Commodity high-end appliance low-power

ndash Cloud IaaS bull 32 different VMs in Azure

similar in other providers

ndash Cloud PaaS bull HDInsight EMR CloudBigData

Different access level ndash Full admin user-only request-

to-install everything ready queuing systems (SGE)

Different versions ndash Hadoop JVM Spark Hive

etchellip

ndash Other benchmarks

Problems ndash All systems though for PROD

bull Not for comparison

ndash No Azure support

ndash Many different packages

ndash No one-fits-all solution

Dev environments and testing

ndash Big Data usually requires a cluster to develop and test

Solution ndash Custom implementation

ndash Based in simple components

ndash Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJArsquos open source tools

ALOJA Platform main components

2 Online Repository

bullExplore results

bullExecution details

bullCluster details

bullCosts

bullData sharing

3 Web Analytics

bullData views and evaluations

bullAggregates

bullAbstracted Metrics

bullJob characterization

bullMachine Learning

bullPredictions and clustering

1 Big Data Benchmarking

bullDeploy amp Provision

bullConf Management

bullParameter selection amp Queuing

bullPerf counters

bullLow-level instrumentation

bullApp logs

19

NGINX PHP MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands:
Connect
– Node and cluster
– Builds SSH command line
• SSH proxies
Deploy – Creates a cluster
– Sets SSH credentials
– If created, updates config as needed
– If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters

Providers:
On-premise
– Custom settings for clusters
• Multiple disk types
• Different architectures
Cloud IaaS
– Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
– HDInsight, CloudBigData, EMR soon

Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
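The Deploy behaviour listed above (create if missing, otherwise update the config, and start stopped nodes) makes deploy safe to re-run. A minimal Bash sketch of that decision logic, assuming Bash 4+ for associative arrays; the provider_* and set_ssh_credentials functions are hypothetical stubs standing in for real provider CLIs, and node state is kept in memory rather than queried from a cloud API.

```shell
#!/usr/bin/env bash
# Sketch of idempotent deploy logic (requires Bash 4+ for associative arrays).
# provider_* and set_ssh_credentials are hypothetical stubs; real code would
# call a cloud CLI or API instead of echoing and updating an in-memory map.

declare -A NODE_STATE=()   # node name -> "running" or "stopped"

provider_create_node()   { NODE_STATE[$1]=running; echo "create $1"; }
provider_update_config() { echo "update-config $1"; }
provider_start_node()    { NODE_STATE[$1]=running; echo "start $1"; }
set_ssh_credentials()    { echo "ssh-credentials $1"; }

deploy_node() {
  local node="$1"
  if [ -z "${NODE_STATE[$node]+x}" ]; then
    # Node does not exist yet: create it and set up SSH access.
    provider_create_node "$node"
    set_ssh_credentials "$node"
  else
    # Node already exists: update its config and start it if stopped.
    provider_update_config "$node"
    if [ "${NODE_STATE[$node]}" = stopped ]; then
      provider_start_node "$node"
    fi
  fi
}

deploy_cluster() {
  local name="$1" count="$2" i
  for (( i = 0; i < count; i++ )); do
    deploy_node "$(printf '%s-%02d' "$name" "$i")"
  done
}
```

For example, after setting NODE_STATE[demo-01]=stopped, running deploy_cluster demo 2 creates demo-00 and updates and restarts demo-01; running it again in the same shell only updates the config of both nodes.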

Cluster and nodes definitions: multi-provider abstraction

Steps to define a cluster:
Import defaults (if any) – Sets OS version
Select provider – Azure, Rackspace, AWS, on-premise, Vagrant…
Name the cluster and size
Optional – Select VM type
– Attached disks
– Define metadata
– And costs

Nodes can also be defined – For web, shared folders, etc.

You can logically split clusters

Azure 8-datanode sample:

#load AZURE defaults
source $CONF_DIR/cluster_defaults.conf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 #in GB
#details
vmCores=4
vmRAM=7 #in GB
#costs
clusterCostHour=1584 #in USD
clusterType=IaaS

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
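To make the abstraction concrete, the sketch below writes a defaults file and the Azure sample definition to a temporary directory, sources them the same way (defaults first, cluster values second) and derives aggregate capacity. The contents of cluster_defaults.conf and the derived variable names (totalCores, totalRAM, totalDisk) are assumptions for illustration; clusterCostHour is left out to keep the example to capacity.

```shell
#!/usr/bin/env bash
# Sketch: load a cluster definition on top of provider defaults and derive
# aggregate capacity. Defaults-file contents and derived names are assumed.

conf_dir="$(mktemp -d)"

# Stand-in for the provider defaults file (contents are hypothetical).
cat > "$conf_dir/cluster_defaults.conf" <<'EOF'
clusterType=IaaS
attachedVolumes=1
EOF

# Cluster file: source the defaults first, then override per cluster.
cat > "$conf_dir/cluster_azure-large-8.conf" <<'EOF'
source "$CONF_DIR/cluster_defaults.conf"
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024   # in GB
vmCores=4
vmRAM=7         # in GB
EOF

CONF_DIR="$conf_dir"
source "$CONF_DIR/cluster_azure-large-8.conf"

totalCores=$(( numberOfNodes * vmCores ))
totalRAM=$(( numberOfNodes * vmRAM ))                         # GB
totalDisk=$(( numberOfNodes * attachedVolumes * diskSize ))   # GB

echo "$clusterName ($clusterType): $totalCores cores, ${totalRAM} GB RAM, ${totalDisk} GB attached disk"
rm -r "$conf_dir"
```

With the sample values this prints "azure-large-8 (IaaS): 32 cores, 56 GB RAM, 24576 GB attached disk"; the cluster file's attachedVolumes=3 overrides the assumed default of 1 because it is assigned after the defaults are sourced.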

Running benchmarks in ALOJA

Benchmarking with defaults:
repo_location/aloja-bench/run_benchs.sh

To queue jobs:
repo_location/shell/exeq.sh

Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh

Testing different configurations

Approaches:
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Command line
   1. Via parameters: run_benchs.sh -r 2 -m 10
   2. Via shell globals: HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB
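One way to get the precedence implied by these approaches (benchmark defaults, then cluster config, then the command line) is Bash's ${VAR:=value} expansion, which assigns only when a variable is unset or empty. The sketch below is an assumption about the mechanism, not ALOJA's implementation: the function names and the 10GB/100GB sizes are hypothetical, while HADOOP_VERSION and BENCH_DATA_SIZE come from the slide. Because every layer only fills gaps, the layers are applied from highest to lowest precedence.

```shell
#!/usr/bin/env bash
# Layered configuration sketch: command line > cluster config > defaults.
# Each layer uses ':=' so it only fills variables that are still unset or
# empty; function names and sizes are hypothetical.

load_cluster_config() {
  # Hypothetical per-cluster override of the data size.
  : "${BENCH_DATA_SIZE:=100GB}"
}

load_benchmark_defaults() {
  # Hypothetical global defaults.
  : "${HADOOP_VERSION:=hadoop-2.7.1}"
  : "${BENCH_DATA_SIZE:=10GB}"
}

resolve_config() {
  # Values already set on the command line are never overwritten.
  load_cluster_config
  load_benchmark_defaults
  echo "HADOOP_VERSION=$HADOOP_VERSION BENCH_DATA_SIZE=$BENCH_DATA_SIZE"
}
```

For instance, BENCH_DATA_SIZE=1TB resolve_config reports BENCH_DATA_SIZE=1TB, while without the prefix the cluster value 100GB wins over the 10GB default.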

Things to look for:
HW and OS – Versions
– Disk config and mounts
SW – Replication
– Block sizes
– Compression
– IO buffers

Build your exec plan in a script and queue it

Or follow ML recommendations
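Building the exec plan as a script can be as simple as looping over a configuration grid and emitting one command per line for the queue. A sketch under stated assumptions: the grid mirrors the mapper counts (4, 6, 8, 10) and the four compression settings of the SW speedup chart, MAX_MAPS and COMPRESSION are hypothetical variable names (only HADOOP_VERSION and BENCH_DATA_SIZE appear on the slide), and the commands are only printed so the plan can be reviewed before queuing.

```shell
#!/usr/bin/env bash
# Sketch: generate an execution plan over a configuration grid, one command
# per line. MAX_MAPS and COMPRESSION are hypothetical variable names.

build_exec_plan() {
  local maps comp
  for maps in 4 6 8 10; do
    for comp in none zlib bzip2 snappy; do
      printf 'HADOOP_VERSION=%s BENCH_DATA_SIZE=%s MAX_MAPS=%s COMPRESSION=%s aloja-bench/run_benchs.sh\n' \
        hadoop-2.7.1 1TB "$maps" "$comp"
    done
  done
}
```

Running build_exec_plan > exec_plan.txt writes the 16 combinations to a file that can then be handed to the queueing step.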

ALOJA-WEB

Entry point to explore the results collected from the executions – Provides insights on the obtained results through continuously evolving data views

Online DEMO at http://aloja.bsc.es

Online benchmarking results

2) ALOJA-WEB Online Repository

Entry point to explore the results collected from the executions
– Index of executions
• Quick glance of executions
• Searchable, sortable
– Execution details
• Performance charts and histograms
• Hadoop counters
• Jobs and task details

Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup

Cluster definitions
– Cluster capabilities (resources)
– Cluster costs

Sharing results
– Download executions
– Add external executions

Documentation and references
– Papers, links and feature documentation

Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster with different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
400s – 2 containers, local disk
800s – 3 containers, local disk
600s – 2 containers, remote disk

Comparing 3 runs on the same cluster with different configs
CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
Moderate iowait
Higher iowait
Very high iowait

Comparing 3 runs on the same cluster with different configs
CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
1 blocked process
4 blocked processes
4 blocked processes (map phase)

Comparing 3 runs on the same cluster with different configs
CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
High context switches with 3 containers on a 2-core VM

Impact of SW configurations on speedup (4-node clusters)

[Chart: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, snappy)]

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
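Speedup is conventionally the ratio baseline time / run time, so values above 1 are faster than the baseline. As a sketch, assuming the 400s, 800s and 600s labels of the three-run comparison above are total execution times, the ratio relative to the 2-container local-disk run can be computed with awk:

```shell
#!/usr/bin/env bash
# Sketch: speedup of each run relative to the first (baseline) run,
#   speedup = baseline_seconds / run_seconds   (> 1 means faster)
# Input lines: "label,seconds"; the first line is the baseline.

speedups() {
  awk -F, 'NR == 1 { base = $2 } { printf "%s %.2f\n", $1, base / $2 }'
}

printf '%s\n' \
  "2 containers local disk,400" \
  "3 containers local disk,800" \
  "2 containers remote disk,600" | speedups
```

This prints speedups of 1.00, 0.50 and 0.67: adding a third container halves performance on that cluster, while moving to remote disk costs about a third.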

Impact of HW configurations on speedup

[Charts: speedup (higher is better). Disks and network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB. Cloud remote volumes: local only; 1, 2 or 3 remotes; 1, 2 or 3 remotes with tmp on local disk]

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

VM size comparison (Azure)

[Chart: VM size comparison, lower is better]

Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness

Cluster ID reference:
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs

[Chart: cost-effectiveness of the Performance2-30, Io1-30, Io1-15, General1-8 and Performance1-8 clusters]

Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness

[Chart: fastest execution vs. cheapest execution]

Cluster ID reference:
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs
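Ranking by cost-effectiveness comes down to cost per execution = hourly cluster cost × execution time / 3600. A minimal sketch with awk and sort; the input format and the three clusters with their times and prices are hypothetical numbers, not ALOJA results. It also shows why the fastest and the cheapest execution can belong to different clusters.

```shell
#!/usr/bin/env bash
# Sketch: rank clusters by cost per execution,
#   cost = cost_per_hour * exec_seconds / 3600
# Input lines: "cluster,exec_seconds,cost_per_hour_usd".
# The clusters, times and prices below are hypothetical, not ALOJA results.

rank_by_cost() {
  awk -F, '{ printf "%s %d %.3f\n", $1, $2, $3 * $2 / 3600 }' | sort -k3,3n
}

printf '%s\n' \
  "cluster-A,3600,2.00" \
  "cluster-B,1800,5.00" \
  "cluster-C,7200,0.80" | rank_by_cost
```

Here cluster-B is the fastest (1800 s) but costs 2.50 USD per run, while cluster-C is the slowest yet the cheapest at 1.60 USD per run.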

Cost/Performance: scalability of cluster size

This sample of a new screen (with sample data) helps find the most cost-effective cluster size:
– X axis: number of datanodes (cluster size)
– Left Y axis: execution time (lower is better)
– Right Y axis: execution cost

[Chart: execution time and execution cost by cluster size, with the recommended size marked]

Predictive Analytics and automated learning

Modeling Hadoop – Methodology

Methodology – 3-step learning process
– Different split sizes tested (10% ≤ training ≤ 50%)
– Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results – Mean absolute errors ~250s (ranges in [100s, 6000s])
– Relative absolute errors between [0.10, 0.25]
• Depend on the benchmark and the number of examples per benchmark
• Some executions are, or may be, anomalies
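For reference, the two error measures reported above are usually defined as follows, for n executions with measured times y_i, predicted times ŷ_i and mean measured time ȳ; this assumes ALOJA uses the standard definitions. A relative absolute error of 0.2 means the model's total absolute error is 20% of that of a naive predictor that always answers the mean time.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|
\qquad
\mathrm{RAE} = \frac{\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|}{\sum_{i=1}^{n}\left|y_i - \bar{y}\right|}
```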

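[Diagram: the ALOJA data set is split into training, validation and testing sets; a model is trained and tested on the validation set, and the algorithm is tuned and re-trained until the result is acceptable (NO/YES); the selected final model is then tested on the testing set]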
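The training/validation/testing split in the diagram can be sketched with standard Unix tools. Assumptions: one execution per line with no header, GNU coreutils shuf (gshuf on macOS), and the remaining lines split evenly between validation and testing; the training percentage is a parameter, matching the 10% to 50% range reported above.

```shell
#!/usr/bin/env bash
# Sketch: randomly split a data set (one execution per line, no header)
# into training, validation and testing files in an output directory.
# Requires GNU coreutils 'shuf'. The even split of the remainder between
# validation and testing is an assumption.

split_dataset() {
  local input="$1" outdir="$2" train_pct="${3:-50}"
  local total train valid
  total=$(wc -l < "$input")
  train=$(( total * train_pct / 100 ))
  valid=$(( (total - train) / 2 ))

  shuf "$input" > "$outdir/shuffled"
  head -n "$train" "$outdir/shuffled" > "$outdir/training"
  tail -n +"$(( train + 1 ))" "$outdir/shuffled" | head -n "$valid" > "$outdir/validation"
  tail -n +"$(( train + valid + 1 ))" "$outdir/shuffled" > "$outdir/testing"
  rm "$outdir/shuffled"
}
```

For example, split_dataset executions.txt splits 30 (hypothetical file and directory names) puts 30% of the lines in splits/training and divides the rest between splits/validation and splits/testing.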

Use case 1: Anomaly detection

Anomaly detection
– Model-based detection procedure
– Pass executions through the model
– Executions not fitting the model are considered “out of the system”

[Figure: anomaly detection procedure, sample view from the site]
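Passing executions through the model can be sketched as comparing each measured time against the model's prediction and flagging large relative deviations. This is a simplified stand-in, not ALOJA's procedure: the input format (exec_id,measured_seconds,predicted_seconds) and the 0.5 default threshold are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of model-based anomaly detection: flag executions whose measured
# time deviates from the model's prediction by more than a relative
# threshold. Input lines: "exec_id,measured_seconds,predicted_seconds".
# The input format and the 0.5 default threshold are assumptions.

flag_anomalies() {
  local threshold="${1:-0.5}"
  awk -F, -v t="$threshold" '{
    err = ($2 - $3) / $3          # relative error vs. the prediction
    if (err < 0) err = -err
    status = (err > t) ? "ANOMALY" : "ok"
    printf "%s %s %.2f\n", $1, status, err
  }'
}
```

For example, printf '%s\n' e1,1000,1000 e2,2500,1000 e3,400,1000 | flag_anomalies 0.5 keeps e1 and marks e2 (error 1.50) and e3 (error 0.60) as anomalies.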

Use case 2: Guided benchmarking – Method

Guided benchmarking
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the “representative execution” for each subset of similar executions

[Diagram: the ALOJA data set is clustered into a data set of centers; a model is built from the centers and tested against a reference; if the error is not OK, the number of centers is increased; if it is OK, the centers are the configs to execute]
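The loop in the diagram (cluster, build a model from the centers, test it, add centers until the error is acceptable) can be sketched as follows. Only the control flow is illustrated: model_error_with_centers is a hypothetical stub that pretends the error falls as 0.8/k, standing in for real clustering and model training, and the targets in the usage note are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch of the guided-benchmarking loop: grow the number of centers k until
# a model built only from the k representative executions is accurate enough.
# Clustering and model training are replaced by a hypothetical stub.

# Hypothetical stub: pretend the model error shrinks as 0.8 / k.
model_error_with_centers() {
  awk -v k="$1" 'BEGIN { printf "%.3f\n", 0.8 / k }'
}

guided_centers() {
  local max_error="$1" k="${2:-1}" k_max="${3:-64}" err
  while (( k <= k_max )); do
    err=$(model_error_with_centers "$k")
    # Bash has no floating point, so awk does the comparison.
    if awk -v e="$err" -v m="$max_error" 'BEGIN { exit !((e + 0) <= (m + 0)) }'; then
      echo "$k"   # number of representative configs to execute
      return 0
    fi
    (( k++ ))
  done
  return 1        # no k up to k_max met the error target
}
```

With this stub, guided_centers 0.25 prints 4, the first k whose error (0.200) is within the target, while guided_centers 0.01 1 8 fails because no k up to 8 gets below 0.01.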

Use case 3: Knowledge discovery

Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools

[Chart: pred_time for HDD vs. SSD]

Tree descriptor:
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
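Once learned, a tree descriptor like this one is just a lookup from attribute values to a predicted time. A sketch that hard-codes the tree's leaves in a single case statement (predict_time is a hypothetical name); comparing leaves shows that the disk type dominates, while the IO buffer size changes the prediction by only a few seconds.

```shell
#!/usr/bin/env bash
# Sketch: the tree descriptor encoded as a lookup from
# disk/network/IO-file-buffer to predicted execution time in seconds.
# Leaf values are taken from the tree; the function name is hypothetical.

predict_time() {
  local disk="$1" net="$2" iofbuf="$3"
  case "$disk/$net/$iofbuf" in
    HDD/ETH/128KB) echo 2935 ;;
    HDD/ETH/64KB)  echo 2942 ;;
    HDD/IB/128KB)  echo 3118 ;;
    HDD/IB/64KB)   echo 3125 ;;
    SSD/ETH/128KB) echo 1248 ;;
    SSD/ETH/64KB)  echo 1256 ;;
    SSD/IB/128KB)  echo 1233 ;;
    SSD/IB/64KB)   echo 1241 ;;   # leaf value read as 1241s from the slide
    *) return 1 ;;                # combination not covered by the tree
  esac
}
```

For example, echo $(( $(predict_time HDD ETH 128KB) * 100 / $(predict_time SSD ETH 128KB) )) prints 235, i.e. HDD is predicted to be about 2.35 times slower than SSD on Ethernet.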

Concluding remarks and references

Concluding remarks

Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale

But it is also tough – The industry needs more transparency
– We still have a lot to do…

In ALOJA we provide the benchmarking scripts – And also the results, which should be your first entry point

We are constantly adding new features – Benchmarks, systems, providers

It is an open initiative; you are invited to participate – Beta testers – Contributors

With predictive analytics we can automate and find tendencies faster – Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools…

Fork our repo at https://github.com/Aloja/aloja

More info

ALOJA Benchmarking platform and online repository – http://aloja.bsc.es, http://aloja.bsc.es/publications

Benchmarking Big Data – http://www.slideshare.net/ni_po/benchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list (~200 members from ~80 organizations) – http://clds.sdsc.edu/bdbc/community

Workshop on Big Data Benchmarking (WBDB) – Next: http://clds.sdsc.edu/wbdb2015.ca

SPEC Research Big Data working group – http://research.spec.org/working-groups/big-data-working-group.html

Slides and video
– Michael Frank on Big Data benchmarking • http://www.tele-task.de/archive/podcast/20430
– Tilmann Rabl, Big Data Benchmarking Tutorial • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl


Q&A

Thanks!

Contact: nicolas.poggi@bsc.es


Page 14: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Test different clusters and architectures ndash On-premise

bull Commodity high-end appliance low-power

ndash Cloud IaaS bull 32 different VMs in Azure

similar in other providers

ndash Cloud PaaS bull HDInsight EMR CloudBigData

Different access level ndash Full admin user-only request-

to-install everything ready queuing systems (SGE)

Different versions ndash Hadoop JVM Spark Hive

etchellip

ndash Other benchmarks

Problems ndash All systems though for PROD

bull Not for comparison

ndash No Azure support

ndash Many different packages

ndash No one-fits-all solution

Dev environments and testing

ndash Big Data usually requires a cluster to develop and test

Solution ndash Custom implementation

ndash Based in simple components

ndash Wrapping commands

Challenges (circa end 2013)

Benchmarking with ALOJArsquos open source tools

ALOJA Platform main components

2 Online Repository

bullExplore results

bullExecution details

bullCluster details

bullCosts

bullData sharing

3 Web Analytics

bullData views and evaluations

bullAggregates

bullAbstracted Metrics

bullJob characterization

bullMachine Learning

bullPredictions and clustering

1 Big Data Benchmarking

bullDeploy amp Provision

bullConf Management

bullParameter selection amp Queuing

bullPerf counters

bullLow-level instrumentation

bullApp logs

19

NGINX PHP MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample load AZURE defaults

source $CONF_DIRcluster_defaultsconf

clusterName=azure-large-8

numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs

clusterCostHour=1584 in USD

clusterType=IaaS

Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 15: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Benchmarking with ALOJA's open source tools

ALOJA Platform main components

1. Big Data Benchmarking
• Deploy & Provision
• Conf Management
• Parameter selection & Queuing
• Perf counters
• Low-level instrumentation
• App logs

2. Online Repository
• Explore results
• Execution details
• Cluster details
• Costs
• Data sharing

3. Web Analytics
• Data views and evaluations
• Aggregates
• Abstracted Metrics
• Job characterization
• Machine Learning
• Predictions and clustering

Technologies: NGINX, PHP, MySQL; BASH, Unix tools, CLIs, R, SQL, JS

Extending and collaborating in ALOJA

Setting up a DEV environment
1. Install prerequisites – git, Vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up

Step 4 installs a Web server with sample data; step 6 sets up a local cluster to test benchmarking.
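Before running `vagrant up`, it can help to confirm that the step 1 prerequisites are installed. This POSIX sh sketch is only illustrative. `check_prereqs` is a hypothetical helper, not part of ALOJA. Probing for `VBoxManage` (VirtualBox's command-line tool) is an assumption about how to detect VirtualBox.

```shell
#!/bin/sh
# Hypothetical helper (not part of ALOJA): report every command in "$@"
# that is not on PATH; succeeds only if all of them are found.
check_prereqs() {
  _missing=0
  for _cmd in "$@"; do
    if ! command -v "$_cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $_cmd" >&2
      _missing=1
    fi
  done
  [ "$_missing" -eq 0 ]
}

# Step 1 prerequisites: git, Vagrant, and VirtualBox (via its VBoxManage CLI).
if check_prereqs git vagrant VBoxManage; then
  echo "ready: git clone https://github.com/Aloja/aloja.git && cd aloja && vagrant up"
fi
```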

Workflow in ALOJA

Cluster(s) definition
• VM sizes
• Number of nodes
• OS, disks
• Capabilities

Execution plan
• Start cluster
• Setup
• Exec benchmarks
• Cleanup

Import data
• Convert perf metrics
• Parse logs
• Import into DB

Evaluate data
• Data views in the Vagrant VM
• Or http://aloja.bsc.es

PA and KD
• Predictive Analytics
• Knowledge Discovery

Historic Repo
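The execution plan is a fixed sequence of steps per cluster. This minimal POSIX sh skeleton assumes hypothetical stub functions (`start_cluster`, `setup_cluster`, `run_benchmark`, `cleanup_cluster`) that only log. ALOJA's own scripts implement these steps differently, and `terasort`/`wordcount` are just example benchmark names. The design point it shows is that a failing benchmark is logged and the plan moves on, so cleanup always runs.

```shell
#!/bin/sh
# Hypothetical execution-plan skeleton: every step is a stub that only logs.
log() { echo "[exec-plan] $*"; }

start_cluster()   { log "start cluster $1"; }
setup_cluster()   { log "setup $1"; }
run_benchmark()   { log "exec benchmark $2 on $1"; }
cleanup_cluster() { log "cleanup $1"; }

# Usage: run_exec_plan <cluster> <benchmark>...
run_exec_plan() {
  _cluster="$1"; shift
  start_cluster "$_cluster"
  setup_cluster "$_cluster"
  for _bench in "$@"; do
    # Keep going if one benchmark fails so that cleanup still happens.
    run_benchmark "$_cluster" "$_bench" || log "benchmark $_bench failed"
  done
  cleanup_cluster "$_cluster"
}

run_exec_plan azure-large-8 terasort wordcount
```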

Commands and providers

Provisioning commands
• Connect
  – Node and Cluster
  – Builds SSH cmd line
    • SSH proxies
• Deploy
  – Creates a cluster
  – Sets SSH credentials
  – If created, updates config as needed
  – If stopped, starts nodes
• Start, Stop
• Delete
• Queue jobs to clusters

Providers
• On-premise
  – Custom settings for clusters
    • Multiple disk types
    • Different architectures
• Cloud IaaS
  – Azure, OpenStack, Rackspace, AWS (testing)
• Cloud PaaS
  – HDInsight, CloudBigData, EMR (soon)

Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
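"Connect" builds an SSH command line, optionally through a proxy. This POSIX sh sketch prints such a command. `build_ssh_cmd` is a hypothetical helper, not ALOJA's implementation. The user `aloja` and host `gateway.example.com` are placeholders. It assumes OpenSSH's `-J` (ProxyJump) option, available since OpenSSH 7.3; ALOJA may configure its SSH proxies another way.

```shell
#!/bin/sh
# Hypothetical helper (not ALOJA's implementation): print the ssh command
# line for a node, optionally hopping through a proxy/jump host.
# Arguments: <user> <host> <port> [jump_host]
build_ssh_cmd() {
  _ssh="ssh -p $3"
  if [ -n "${4:-}" ]; then
    # -J (ProxyJump) requires OpenSSH 7.3 or later.
    _ssh="$_ssh -J $4"
  fi
  echo "$_ssh $1@$2"
}

build_ssh_cmd aloja node-01 22
build_ssh_cmd aloja node-01 22 gateway.example.com
```

The two calls print `ssh -p 22 aloja@node-01` and `ssh -p 22 -J gateway.example.com aloja@node-01`.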

Cluster and nodes definitions: multi-provider abstraction

Steps to define a cluster
• Import defaults (if any)
  – Sets OS version
• Select provider
  – Azure, Rackspace, AWS, on-premise, Vagrant…
• Name the cluster and set its size
• Optional
  – Select VM type
  – Attached disks
  – Define metadata
  – And costs
Nodes can also be defined
  – For the Web server, shared folders, etc.
You can logically split clusters

Azure 8-datanode sample:

    # load AZURE defaults
    source "$CONF_DIR/cluster_defaults.conf"

    clusterName=azure-large-8
    numberOfNodes=8
    vmSize=Large
    attachedVolumes=3
    diskSize=1024 # in GB

    # details
    vmCores=4
    vmRAM=7 # in GB

    # costs
    clusterCostHour=1.584 # in USD
    clusterType=IaaS

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
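Because a cluster definition is just shell variables, the totals it implies can be derived by sourcing it. This POSIX sh sketch is hypothetical: `cluster_summary` is not an ALOJA helper. It uses `.` (the POSIX spelling of `source`). The sample file drops the `cluster_defaults.conf` include, which only exists inside an ALOJA checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of ALOJA): load a cluster definition in a
# subshell and print the totals it implies plus the declared hourly cost.
cluster_summary() (
  . "$1"  # POSIX spelling of "source"; the subshell keeps the variables local
  echo "cluster=$clusterName nodes=$numberOfNodes" \
    "cores=$((numberOfNodes * vmCores))" \
    "ram_gb=$((numberOfNodes * vmRAM))" \
    "disk_gb=$((numberOfNodes * attachedVolumes * diskSize))" \
    "cost_hour_usd=$clusterCostHour"
)

# Same values as the Azure sample, minus the defaults include.
conf=$(mktemp)
cat > "$conf" <<'EOF'
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB
vmCores=4
vmRAM=7 # in GB
clusterCostHour=1.584 # in USD
clusterType=IaaS
EOF
cluster_summary "$conf"
rm -f "$conf"
```

For the sample it prints `cluster=azure-large-8 nodes=8 cores=32 ram_gb=56 disk_gb=24576 cost_hour_usd=1.584`.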

Running benchmarks in ALOJA

Benchmarking with defaults:
    repo_location/aloja-bench/run_benchs.sh
To queue jobs:
    repo_location/shell/exeq.sh

Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
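How `shell/exeq.sh` stores its queue is not described here. The following is therefore only a hypothetical file-based FIFO in POSIX sh, illustrating the idea of queuing benchmark runs and taking them one at a time. `QUEUE_FILE`, `enqueue`, `dequeue`, and `drain_queue` are illustrative names. The sketch is not safe with concurrent writers; a shared queue would need locking (for example with `flock`).

```shell
#!/bin/sh
# Hypothetical file-based FIFO job queue; it only illustrates the idea and is
# not how shell/exeq.sh is implemented. Not safe for concurrent writers.
QUEUE_FILE="${QUEUE_FILE:-${TMPDIR:-/tmp}/aloja_jobs.queue}"

# Append one job (a command line) to the end of the queue.
enqueue() { printf '%s\n' "$*" >> "$QUEUE_FILE"; }

# Print and remove the oldest job; fail when the queue is empty or missing.
dequeue() {
  [ -s "$QUEUE_FILE" ] || return 1
  head -n 1 "$QUEUE_FILE"
  tail -n +2 "$QUEUE_FILE" > "$QUEUE_FILE.tmp" && mv "$QUEUE_FILE.tmp" "$QUEUE_FILE"
}

# Take jobs oldest first; a real runner would execute "$_job" here.
drain_queue() {
  while _job=$(dequeue); do
    echo "running: $_job"
  done
}

QUEUE_FILE=$(mktemp)
enqueue ./aloja-bench/run_benchs.sh -r 2 -m 10
enqueue ./aloja-bench/run_benchs.sh
drain_queue
rm -f "$QUEUE_FILE"
```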

Testing different configurations

Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In the cluster config
3. Command line
   1. Via parameters:
      run_benchs.sh -r 2 -m 10
   2. Via shell globals:
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
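Overriding via shell globals works when a script assigns its defaults only if a variable is not already set. This minimal POSIX sh sketch shows that idiom. Two things are assumptions: whether `benchmark_defaults.conf` uses exactly the `${VAR:=default}` form, and the placeholder default values.

```shell
#!/bin/sh
# Assign a default only when the variable is unset or empty, so values
# exported by the caller (shell globals) take precedence.
load_defaults() {
  : "${HADOOP_VERSION:=hadoop-2.7.1}"  # placeholder default
  : "${BENCH_DATA_SIZE:=10GB}"         # placeholder default
}

load_defaults
echo "HADOOP_VERSION=$HADOOP_VERSION BENCH_DATA_SIZE=$BENCH_DATA_SIZE"
```

If a script containing this function is launched as `BENCH_DATA_SIZE=1TB ./script.sh`, the caller's 1TB is kept and `HADOOP_VERSION` falls back to its default.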

Things to look for
HW/OS
– Versions
– Disk config and mounts
SW
– Replication
– Block sizes
– Compression
– IO buffers

Build your exec plan in a script and queue it
Or follow the ML recommendations
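An exec plan can be built in a script by enumerating every combination of the parameters under test and emitting one command per run. The codecs and mapper counts below come from the SW speedup chart in this deck. The variable names `COMPRESSION` and `MAX_MAPS` are hypothetical, not confirmed ALOJA settings.

```shell
#!/bin/sh
# Hypothetical exec-plan generator: one command line per combination of
# compression codec and number of mappers (4 x 4 = 16 runs).
# COMPRESSION and MAX_MAPS are illustrative variable names.
gen_exec_plan() {
  for _comp in none zlib bzip2 snappy; do
    for _maps in 4 6 8 10; do
      echo "COMPRESSION=$_comp MAX_MAPS=$_maps ./aloja-bench/run_benchs.sh"
    done
  done
}

gen_exec_plan
```

Each output line is one job, so the list can be fed into whatever queue is in use.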

ALOJA-WEB

Entry point to explore the results collected from the executions
– Provides insights on the obtained results through continuously evolving data views

Online demo at http://aloja.bsc.es

Online benchmarking results

2) ALOJA-WEB Online Repository

Entry point to explore the results collected from the executions
– Index of executions
  • Quick glance of executions
  • Searchable, sortable
– Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details
Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup
Cluster definitions
– Cluster capabilities (resources)
– Cluster costs
Sharing results
– Download executions
– Add external executions
Documentation and References
– Papers, links, and feature documentation

Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster with different configs

Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

• 400s: 2 containers, local disk
• 800s: 3 containers, local disk
• 600s: 2 containers, remote disk
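The comparison URL on this slide follows a simple pattern: one `execs%5B%5D=<id>` parameter (the URL-encoded form of `execs[]`) per execution. This POSIX sh sketch rebuilds such a link from execution IDs. `perfcharts_url` is a hypothetical helper, and any other parameters the site may accept are not covered.

```shell
#!/bin/sh
# Hypothetical helper: rebuild an ALOJA perfcharts comparison URL from
# execution IDs, one "execs%5B%5D=<id>" (URL-encoded "execs[]") per ID.
perfcharts_url() {
  _url="http://aloja.bsc.es/perfcharts?"
  _sep=""
  for _id in "$@"; do
    _url="${_url}${_sep}execs%5B%5D=${_id}"
    _sep="&"
  done
  echo "$_url"
}

perfcharts_url 90086 90088 90104
```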

Comparing 3 runs on the same cluster with different configs

CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

• Moderate iowait
• Higher iowait
• Very high iowait

Comparing 3 runs on the same cluster with different configs

CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)

Comparing 3 runs on the same cluster with different configs

CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

• High context switches with 3 containers on a 2-core VM
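Speedup in the following charts is higher-is-better. One common definition, assumed here rather than confirmed as ALOJA's exact formula, is baseline time divided by run time. The sketch applies it to the three runs above (800 s, 400 s, and 600 s), taking the slowest run as the baseline.

```shell
#!/bin/sh
# Speedup of each run against a baseline: baseline_seconds / run_seconds
# (higher is better). Input lines: "<label> <seconds>"; line 1 is the baseline.
speedup() {
  awk 'NR == 1 { base = $2 }
       $2 > 0  { printf "%s %.2f\n", $1, base / $2 }'
}

# The three runs compared above; the slowest (3 containers, local disk) is the baseline.
printf '%s\n' "3-containers-local 800" "2-containers-local 400" "2-containers-remote 600" | speedup
```

This prints speedups of 1.00, 2.00, and 1.33 for the three configurations.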

Impact of SW configurations on speedup (4-node clusters)

Chart: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, snappy)

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

Impact of HW configurations on speedup

Chart: speedup (higher is better) in two panels
– Disks and network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB (ETH = Ethernet, IB = InfiniBand)
– Cloud remote volumes: local only; 1, 2, or 3 remotes; 1, 2, or 3 remotes with tmp on local disk

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

VM size comparison (Azure)

Chart: lower is better

Clusters by cost-effectiveness (1/2)
URL: http://aloja.bsc.es/clustercosteffectiveness

Cluster ID reference:
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs

Clusters by cost-effectiveness (2/2)
URL: http://aloja.bsc.es/clustercosteffectiveness

Chart: fastest execution vs. cheapest execution
Cluster ID reference: same as in (1/2) above
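The cost of a single execution follows from its duration and the cluster's hourly price (for example, `clusterCostHour` in a cluster definition): cost = seconds / 3600 * USD per hour. This sketch picks the fastest and the cheapest execution from such data. The function name and input format are hypothetical, and the sample rows are made-up values, not ALOJA measurements.

```shell
#!/bin/sh
# Per-execution cost = seconds / 3600 * cluster cost per hour (USD).
# Input lines: "<cluster> <exec_seconds> <cost_per_hour_usd>".
# Prints the fastest execution and the cheapest one with its cost.
fastest_and_cheapest() {
  awk '{
    cost = $2 / 3600 * $3
    if (NR == 1 || $2 < best_t) { best_t = $2; fast = $1 }
    if (NR == 1 || cost < best_c) { best_c = cost; cheap = $1 }
  }
  END { if (NR > 0) printf "fastest=%s cheapest=%s (%.2f USD)\n", fast, cheap, best_c }'
}

# Made-up illustrative rows, not ALOJA measurements.
printf '%s\n' "cluster-A 1200 4.00" "cluster-B 2000 1.50" "cluster-C 1500 2.40" | fastest_and_cheapest
```

Here the fastest cluster (A) is not the cheapest (B, about 0.83 USD per run). That gap is what the fastest-vs-cheapest chart makes visible.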

Cost/Performance scalability of cluster size

A sample of a new screen (with sample data) for finding the most cost-effective cluster size:
– X axis: number of datanodes (cluster size)
– Left Y axis: execution time (lower is better)
– Right Y axis: execution cost

Chart: execution time and execution cost per cluster size, with the recommended size marked

Predictive Analytics and automated learning

Modeling Hadoop – Methodology

Methodology
– 3-step learning process
– Different split sizes tested (10% ≤ training ≤ 50%)
– Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results
– Mean Absolute Errors ~250 s (ranges in [100 s, 6000 s])
– Relative Absolute Errors between [0.10, 0.25]
  • Depend on the benchmark and the number of examples per benchmark
  • Some executions are, or may be, anomalies

Figure: 3-step learning process. The ALOJA data set is split into training, validation, and testing sets. A model is trained and tested on the validation set; if it is not good enough, the algorithm is tuned and re-trained; otherwise it is selected as the final model and tested on the testing set.
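The error figures above can be computed from (actual, predicted) pairs. This awk-based sketch assumes the standard definitions:
- MAE = mean of |predicted - actual|
- RAE = sum of |predicted - actual| divided by sum of |actual - mean(actual)|

Whether ALOJA's reported numbers use exactly this normalization is an assumption. An RAE of 0.25 means the total error is a quarter of what always predicting the mean execution time would give.

```shell
#!/bin/sh
# Mean Absolute Error and Relative Absolute Error of predictions.
# Input lines: "<actual_seconds> <predicted_seconds>".
# RAE compares the model against always predicting the mean actual value.
mae_rae() {
  awk '{ a[NR] = $1; p[NR] = $2; sum_a += $1 }
  END {
    if (NR == 0) exit 1
    mean_a = sum_a / NR
    for (i = 1; i <= NR; i++) {
      d = p[i] - a[i];   err  += (d < 0 ? -d : d)
      b = a[i] - mean_a; base += (b < 0 ? -b : b)
    }
    printf "MAE=%.1f ", err / NR
    if (base > 0) printf "RAE=%.3f\n", err / base
    else print "RAE=undefined"
  }'
}

printf '%s\n' "100 110" "200 190" "300 330" | mae_rae
```

For the three sample pairs it prints `MAE=16.7 RAE=0.250`.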

Use case 1: Anomaly Detection

Anomaly Detection
– Model-based detection procedure
– Pass executions through the model
– Executions not fitting the model are considered “out of the system”

Anomaly detection procedure (sample view from the site)
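A simple stand-in for the model-based check: flag an execution when its measured time differs from the model's prediction by more than a relative threshold. The threshold (0.30 by default) and the input format are hypothetical. ALOJA's actual criterion for "not fitting the model" may differ.

```shell
#!/bin/sh
# Flag executions whose measured time deviates from the model's prediction
# by more than a relative threshold (hypothetical default: 0.30).
# Input lines: "<exec_id> <actual_seconds> <predicted_seconds>".
flag_anomalies() {
  awk -v thr="${1:-0.30}" '$3 > 0 {
    rel = ($2 - $3) / $3
    if (rel < 0) rel = -rel
    if (rel > thr + 0) printf "%s anomalous (%.0f%% off the prediction)\n", $1, rel * 100
  }'
}

printf '%s\n' "exec-1 1000 950" "exec-2 1800 1000" "exec-3 400 410" | flag_anomalies
```

Only `exec-2` is reported, being 80% slower than predicted; passing a threshold argument such as `flag_anomalies 0.9` makes the check looser.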

Use case 2: Guided Benchmarking – Method

Guided Benchmarking
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the “representative execution” for each similar subset of executions

Figure: the ALOJA data set is clustered, and the cluster centers give the configs to execute. A model is built from those executions and tested against the reference data; if the error is not OK, the number of centers is increased and the loop repeats; if it is OK, the model is kept.
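The loop in the figure (cluster, build, test, and grow the number of centers until the error is acceptable) can be sketched as a control loop. `model_error_for_k` is a hypothetical stand-in for three steps: clustering the data set into k centers, building a model from those representative executions, and measuring its error. Here it only returns a toy curve, 0.6/k, so the control flow can run.

```shell
#!/bin/sh
# Hypothetical stand-in for "cluster into k centers, build model, test it";
# returns a toy error curve 0.6/k instead of a real model error.
model_error_for_k() {
  awk -v k="$1" 'BEGIN { printf "%.3f\n", 0.6 / k }'
}

# Smallest number of centers whose model error is <= max_err (up to max_k).
# Usage: guided_k <max_err> [max_k]
guided_k() {
  _max_err="$1"; _max_k="${2:-50}"; _k=1
  while [ "$_k" -le "$_max_k" ]; do
    _err=$(model_error_for_k "$_k")
    if awk -v e="$_err" -v m="$_max_err" 'BEGIN { exit !(e + 0 <= m + 0) }'; then
      echo "$_k"
      return 0
    fi
    _k=$((_k + 1))   # error not OK: increase the number of centers
  done
  return 1
}

guided_k 0.16
```

With the toy curve, `guided_k 0.16` stops at 4 centers (error 0.150).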

Use case 3: Knowledge Discovery

Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools

Chart: pred_time by disk type (HDD, SSD)

Tree descriptor:
Disk=HDD
  Net=ETH: IOFBuf=128KB ⇒ 2935s, IOFBuf=64KB ⇒ 2942s
  Net=IB: IOFBuf=128KB ⇒ 3118s, IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH: IOFBuf=128KB ⇒ 1248s, IOFBuf=64KB ⇒ 1256s
  Net=IB: IOFBuf=128KB ⇒ 1233s, IOFBuf=64KB ⇒ 1241s

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 16: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

ALOJA Platform main components

2 Online Repository

bullExplore results

bullExecution details

bullCluster details

bullCosts

bullData sharing

3 Web Analytics

bullData views and evaluations

bullAggregates

bullAbstracted Metrics

bullJob characterization

bullMachine Learning

bullPredictions and clustering

1 Big Data Benchmarking

bullDeploy amp Provision

bullConf Management

bullParameter selection amp Queuing

bullPerf counters

bullLow-level instrumentation

bullApp logs

19

NGINX PHP MySQL

BASH Unix tools CLIs R SQL JS

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample load AZURE defaults

source $CONF_DIRcluster_defaultsconf

clusterName=azure-large-8

numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs

clusterCostHour=1584 in USD

clusterType=IaaS

Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 17: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Extending and collaborating in ALOJA

1 Install prerequisites ndash git vagrant VirtualBox

2 git clone httpsgithubcomAlojaalojagit

3 cd aloja

4 vagrant up

5 Open your browser at httplocalhost8080

6 Optional start the benchmarking cluster

vagrant up

Setting up a DEV environment

Installs a Web Server with sample data

Sets a local cluster to test benchmarking

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions multi-provider abstraction

Steps to define a cluster Import defaults (if any) ndash Sets OS version

Select provider ndash Azure RackSpace AWS On-

premise vagranthellip

Name the cluster and size

Optional ndash Select VM type

ndash Attached disks

ndash Define metadata

ndash And costs

Nodes can also be defined ndash For Web share folders etc

You can logically split clusters

Azure 8-datanode sample load AZURE defaults

source $CONF_DIRcluster_defaultsconf

clusterName=azure-large-8

numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs

clusterCostHour=1584 in USD

clusterType=IaaS

Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 18: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Workflow in ALOJA

Cluster(s) definition

bull VM sizes

bull nodes

bull OS disks bull Capabilities

Execution plan

bull Start cluster

bull Setup

bull Exec Benchmarks

bull Cleanup

Import data

bull Convert perf metric

bull Parse logs

bull Import into DB

Evaluate data

bull Data views in Vagrant VM

bull Or httpalojabsces

PA and KD

bullPredictive Analytics

bullKnowledge Discovery

Historic

Repo

Commands and providers

Provisioning commands Providers

Connect

ndash Node and Cluster

ndash Builds SSH cmd line

bull SSH proxies

Deploy ndash Creates a cluster

ndash Sets SSH credentials

ndash If created updates config as needed

ndash If stopped starts nodes

Start Stop

Delete

Queue jobs to clusters

On-premise

ndash Custom settings for

clusters

bull Multiple disk types

bull Different architectures

Cloud IaaS

ndash Azure OpenStack

Rackspace AWS (testing)

Cloud PaaS

ndash HDInsight CloudBigData

EMR soon

Code at httpsgithubcomAlojaalojatreemasteraloja-deploy

Cluster and nodes definitions: multi-provider abstraction

Steps to define a cluster
• Import defaults (if any)
  – Sets OS version
• Select provider
  – Azure, Rackspace, AWS, on-premise, Vagrant…
• Name the cluster and set its size
• Optional
  – Select VM type
  – Attached disks
  – Define metadata
  – And costs

Nodes can also be defined
– For web, shared folders, etc.

You can logically split clusters

Azure 8-datanode sample:

# load AZURE defaults
source "$CONF_DIR/cluster_defaults.conf"

clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB

# details
vmCores=4
vmRAM=7 # in GB

# costs
clusterCostHour=1.584 # in USD
clusterType=IaaS

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
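The cluster definition is a shell fragment of key=value assignments. To reuse those values outside the shell, for example to price an execution, a minimal Python sketch could read the simple assignments and ignore everything else. It assumes one assignment per line, optional quotes and trailing # comments (so values must not contain "#"), and pro-rated hourly billing. The function names are hypothetical.

```python
import re
from typing import Dict

_ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=(.*)")


def parse_cluster_conf(text: str) -> Dict[str, str]:
    """Collect simple key=value lines; 'source' lines and other shell syntax are skipped."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = _ASSIGNMENT.fullmatch(line)
        if match:
            conf[match.group(1)] = match.group(2).strip().strip("'\"")
    return conf


def run_cost_usd(conf: Dict[str, str], exec_seconds: float) -> float:
    """Cost of one execution, assuming the whole cluster is billed pro rata per hour."""
    return float(conf["clusterCostHour"]) * exec_seconds / 3600.0
```

Real billing often rounds up to a minimum increment, so this cost is a lower bound under that assumption.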

Running benchmarks in ALOJA

Benchmarking with defaults
  repo_location/aloja-bench/run_benchs.sh

To queue jobs
  repo_location/shell/exeq.sh

Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh
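The slide does not show how exeq.sh stores its queue. Purely as an illustration of queuing jobs to clusters, the idea can be modeled as one FIFO queue of commands per cluster. The class and method names below are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class JobQueues:
    """One FIFO queue of benchmark commands per cluster (illustrative sketch)."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = defaultdict(deque)

    def enqueue(self, cluster: str, command: str) -> None:
        self._queues[cluster].append(command)

    def next_job(self, cluster: str) -> Optional[str]:
        """Pop the oldest pending job for a cluster, or None if nothing is queued."""
        queue = self._queues.get(cluster)  # .get() does not create empty queues
        return queue.popleft() if queue else None

    def pending(self, cluster: str) -> int:
        return len(self._queues.get(cluster, ()))
```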

Testing different configurations

Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1
      BENCH_DATA_SIZE=1TB
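The slide lists several places where a setting can be overridden but does not state their precedence. The sketch below assumes the usual "most specific wins" order (command-line parameters, then shell globals, then the cluster config, then benchmark_defaults.conf) and merges them with Python's collections.ChainMap. The function names and that ordering are assumptions, not ALOJA's documented behavior.

```python
import os
from collections import ChainMap
from typing import Dict, Iterable, Mapping


def env_overrides(keys: Iterable[str],
                  environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Pick the shell globals (e.g. HADOOP_VERSION, BENCH_DATA_SIZE) that are actually set."""
    return {k: environ[k] for k in keys if k in environ}


def resolve_config(benchmark_defaults: Mapping[str, str],
                   cluster_conf: Mapping[str, str],
                   shell_globals: Mapping[str, str],
                   cli_params: Mapping[str, str]) -> Dict[str, str]:
    """Merge settings; ChainMap returns the value from the first mapping holding the key.

    Assumed precedence, most specific first: command line, shell globals,
    cluster config, benchmark defaults.
    """
    return dict(ChainMap(cli_params, shell_globals, cluster_conf, benchmark_defaults))
```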

Things to look for
• HW, OS
  – Versions
  – Disk config and mounts
• SW
  – Replication
  – Block sizes
  – Compression
  – IO buffers

Build your exec plan in a script and queue it
Or follow ML recommendations
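Building an exec plan in a script often amounts to a full factorial over the settings listed above. The sketch enumerates every combination with itertools.product and renders each one as shell-global assignments, in the style of the HADOOP_VERSION=... override. The variable names in the usage (REPLICATION, BLOCK_SIZE, COMPRESSION) are hypothetical, since ALOJA's actual names for these settings are not shown on the slide.

```python
from itertools import product
from typing import Dict, List, Sequence


def build_exec_plan(space: Dict[str, Sequence[str]]) -> List[Dict[str, str]]:
    """Full factorial plan: one dict per combination; the first key varies slowest."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


def as_shell_prefix(cfg: Dict[str, str]) -> str:
    """Render a config as shell-global assignments, e.g. 'A=1 B=2'."""
    return " ".join(f"{k}={v}" for k, v in cfg.items())
```

For example, build_exec_plan({"REPLICATION": ["1", "3"], "BLOCK_SIZE": ["64M", "128M"], "COMPRESSION": ["none", "snappy"]}) yields 8 configurations. The count grows multiplicatively with each added setting, which is one motivation for the ML-guided alternative.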

ALOJA-WEB

Entry point to explore the results collected from the executions
– Provides insights on the obtained results through continuously evolving data views

Online DEMO at http://aloja.bsc.es

Online benchmarking results

2) ALOJA-WEB Online Repository

Entry point to explore the results collected from the executions
– Index of executions
  • Quick glance of executions
  • Searchable, sortable
– Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details

Data management of benchmark executions
– Data importing from different clusters
– Execution validation
– Data management and backup

Cluster definitions
– Cluster capabilities (resources)
– Cluster costs

Sharing results
– Download executions
– Add external executions

Documentation and references
– Papers, links and feature documentation

Available at http://hadoop.bsc.es
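A searchable, sortable index of executions can be sketched as a filter followed by a sort. This is an illustrative Python sketch only: the Execution fields and search_executions() are hypothetical and do not reflect ALOJA-WEB's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Execution:
    exec_id: int
    benchmark: str
    cluster: str
    exe_time_s: float


def search_executions(execs: Iterable[Execution],
                      benchmark: Optional[str] = None,
                      cluster: Optional[str] = None,
                      sort_by: str = "exe_time_s",
                      descending: bool = False) -> List[Execution]:
    """Keep executions matching every given filter, then sort by the chosen field."""
    hits = [e for e in execs
            if (benchmark is None or e.benchmark == benchmark)
            and (cluster is None or e.cluster == cluster)]
    return sorted(hits, key=lambda e: getattr(e, sort_by), reverse=descending)
```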

Comparing 3 runs on the same cluster with different configs

Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• 400 s: 2 containers, local disk
• 800 s: 3 containers, local disk
• 600 s: 2 containers, remote disk
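Taking the three approximate runtimes from the chart (400 s, 800 s and 600 s), each configuration's cost can be expressed as a slowdown relative to the fastest run. A minimal sketch, with a hypothetical function name:

```python
from typing import Dict


def relative_to_fastest(times_s: Dict[str, float]) -> Dict[str, float]:
    """Slowdown factor of each run versus the fastest one (1.0 = fastest)."""
    best = min(times_s.values())
    return {name: t / best for name, t in times_s.items()}


runs = {
    "2 containers, local disk": 400.0,
    "3 containers, local disk": 800.0,
    "2 containers, remote disk": 600.0,
}
for name, factor in relative_to_fastest(runs).items():
    print(f"{name}: {factor:.2f}x")
```

This prints 1.00x, 2.00x and 1.50x respectively. On this cluster, adding a third container doubles the runtime, while moving to remote disks adds 50%.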

Comparing 3 runs on the same cluster with different configs

CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• Moderate iowait
• Higher iowait
• Very high iowait

Comparing 3 runs on the same cluster with different configs

CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• 1 blocked process
• 4 blocked processes
• 4 blocked processes (map phase)

Comparing 3 runs on the same cluster with different configs

CPU context switches, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
• High context switches with 3 containers on a 2-core VM

Impact of SW configurations on speedup (4-node clusters)

[Chart: speedup (higher is better) by number of mappers (4m, 6m, 8m, 10m) and by compression algorithm (no compression, ZLIB, BZIP2, Snappy)]

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
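The slides do not spell out how the configimprovement view computes speedup. A common definition, assumed here, groups executions by the value of one setting (number of mappers, compression codec, ...) and divides the baseline group's mean runtime by each group's mean runtime, so values above 1.0 are faster than the baseline. A minimal sketch with a hypothetical function name:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def speedup_by_value(runs: Iterable[Tuple[str, float]], baseline: str) -> Dict[str, float]:
    """runs: (config value, exec time in s) pairs. Speedup = mean(baseline) / mean(group)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for value, seconds in runs:
        groups[value].append(seconds)
    if baseline not in groups:
        raise ValueError(f"no executions for baseline {baseline!r}")
    base = mean(groups[baseline])
    return {value: base / mean(times) for value, times in groups.items()}
```

Averaging per group before dividing keeps a single slow outlier execution from dominating, though a median would be even more robust.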

Impact of HW configurations on speedup

[Chart: speedup (higher is better) in two panels: disks and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and cloud remote volumes (local only; 1, 2 or 3 remotes; 1, 2 or 3 remotes with tmp on local disk)]

Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

VM size comparison (Azure)

[Chart: VM sizes compared; lower is better]

Clusters by cost-effectiveness (1/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

• Cluster ID reference
  – RL-06 = 8 performance1-8 VMs
  – RL-16 = 8 general1-8 VMs
  – RL-19 = 8 io1-15 VMs
  – RL-33 = 8 performance2-30 VMs
  – RL-30 = 8 io1-30 VMs

[Chart: clusters by cost-effectiveness; bars labeled Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8, Io1-30]

Clusters by cost-effectiveness (2/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

[Chart: fastest execution vs. cheapest execution per cluster; cluster IDs as in part 1/2]
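How the site ranks clusters is not described on the slides. One hedged way to relate the fastest and the cheapest executions is a Pareto front over (execution time, execution cost): the front's first entry is the fastest cluster, its last entry the cheapest, and every cluster off the front is beaten on both axes by some cluster on it. The sketch assumes cost is pro-rated from an hourly cluster price. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterRun:
    cluster_id: str
    exec_time_s: float
    cost_per_hour_usd: float

    @property
    def cost_usd(self) -> float:
        # Pro-rated cost of one execution (assumes per-second billing).
        return self.exec_time_s / 3600.0 * self.cost_per_hour_usd


def pareto_front(runs: List[ClusterRun]) -> List[ClusterRun]:
    """Runs not dominated in both time and cost, ordered from fastest to cheapest."""
    front: List[ClusterRun] = []
    for run in sorted(runs, key=lambda r: (r.exec_time_s, r.cost_usd)):
        # Sorted by time, so a run joins the front only if it is cheaper
        # than every faster run already kept (the last one is the cheapest so far).
        if not front or run.cost_usd < front[-1].cost_usd:
            front.append(run)
    return front
```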

Cost/Performance scalability of cluster size

This new screen (shown with sample data) helps find the most cost-effective cluster size:
– X axis: number of datanodes (cluster size)
– Left Y axis: execution time (lower is better)
– Right Y axis: execution cost

[Chart: execution time and execution cost vs. cluster size, with the recommended size marked]
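The criterion behind the recommended-size marker is not given. Assuming it is the size with the lowest execution cost, where cost is the number of datanodes times a per-node hourly price times the runtime, a minimal sketch is:

```python
from typing import Dict, Tuple


def recommend_size(exec_time_s: Dict[int, float],
                   node_cost_per_hour: float) -> Tuple[int, float]:
    """Return (datanodes, cost in USD) with the lowest execution cost.

    Assumes cost = nodes x hourly price x hours (master node ignored,
    pro-rated billing). Ties go to the smaller cluster.
    """
    costs = {n: n * node_cost_per_hour * t / 3600.0 for n, t in exec_time_s.items()}
    best = min(sorted(costs), key=lambda n: costs[n])  # min keeps the first (smallest) tie
    return best, costs[best]
```

Under this model, perfectly linear scaling keeps the cost flat, so the minimum tends to sit where adding nodes stops shortening runs proportionally. A real recommendation might also weigh execution time, which this sketch ignores.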

Predictive Analytics and automated learning

Modeling Hadoop – Methodology

Methodology
– 3-step learning process
– Different split sizes tested (10% ≤ training ≤ 50%)
– Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results
– Mean absolute errors ~250 s (ranges in [100 s, 6000 s])
– Relative absolute errors between [0.10, 0.25]
  • Depend on the benchmark and on the number of examples per benchmark
  • Some executions are/may be anomalies

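[Diagram: the ALOJA data set is split into training, validation and testing sets. A model is trained and tested on the validation set; if the result is not good enough (NO), the algorithm is tuned and re-trained; once it is (YES), the model is selected and the final model is tested on the testing set.]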
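The error metrics quoted above can be made concrete. The sketch computes mean absolute error (in seconds) and relative absolute error, that is, the model's total absolute error divided by that of always predicting the mean runtime, and performs a shuffled train/validation/test split. Splitting the non-training part evenly between validation and test is an assumption: the slides only state the range of training fractions tested. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error, in the units of y (seconds here)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Relative absolute error: model error over the error of always predicting mean(y)."""
    mu = sum(y) / len(y)
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / sum(abs(a - mu) for a in y)


def three_way_split(items: Sequence[T], train_frac: float,
                    seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, keep train_frac for training, split the rest evenly into validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    rest = shuffled[n_train:]
    half = len(rest) // 2
    return shuffled[:n_train], rest[:half], rest[half:]
```

An RAE of 0.10 to 0.25 therefore means the models remove 75% to 90% of the absolute error of a naive mean predictor.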

Use case 1: Anomaly Detection

Anomaly Detection
– Model-based detection procedure
– Pass executions through the model
– Executions not fitting the model are considered "out of the system"

[Figure: anomaly detection procedure, sample view from the site]
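A model-based detector of this kind can be sketched by comparing each execution's observed runtime with the model's prediction and flagging those whose relative deviation exceeds a threshold. The 50% default threshold and the function name are hypothetical; the slides do not state which test ALOJA applies.

```python
from typing import Dict, List


def flag_anomalies(observed_s: Dict[int, float], predicted_s: Dict[int, float],
                   max_rel_error: float = 0.5) -> List[int]:
    """IDs of executions whose observed time deviates from the prediction by more than max_rel_error."""
    return sorted(
        exec_id for exec_id, seconds in observed_s.items()
        if abs(seconds - predicted_s[exec_id]) / predicted_s[exec_id] > max_rel_error
    )
```

Lowering the threshold flags more executions, so it trades missed anomalies against false alarms.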

Use case 2: Guided Benchmarking – Method

Guided Benchmarking
– Best subset of configurations for modeling a Hadoop deployment
– Clustering to get the "representative execution" for each similar subset of executions

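[Diagram: the ALOJA data set is clustered; the cluster centers form a reference data set, from which a model is built and tested. If the error is not OK (NO), the number of centers is increased and the loop repeats; if it is OK (YES), the centers are the configs to execute.]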
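The loop in the diagram can be sketched under several assumptions: configurations are encoded as numeric feature vectors, the clustering is plain k-means (the slides only say "clustering"), each cluster's representative execution is the member closest to the cluster mean, and the model is a 1-nearest-neighbor predictor built from the representatives. The number of centers grows until the model's mean absolute error over all executions is within a tolerance. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point], idx: List[int]) -> List[float]:
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(len(points[0]))]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[List[int]]:
    """Lloyd's k-means with a deterministic init (first k points); returns index groups."""
    centers = [list(p) for p in points[:k]]
    groups: List[List[int]] = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for i, p in enumerate(points):
            nearest = min(range(len(centers)), key=lambda c: _dist2(p, centers[c]))
            groups[nearest].append(i)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[c] = _mean(points, members)
    return [g for g in groups if g]


def guided_subset(points: List[Point], times: List[float], tol_s: float,
                  k_max: Optional[int] = None) -> Tuple[List[int], float]:
    """Grow the number of centers until a 1-NN model on the representatives is within tol_s MAE."""
    k_max = k_max or len(points)
    reps: List[int] = []
    err = float("inf")
    for k in range(1, k_max + 1):
        reps = []
        for group in kmeans(points, k):
            center = _mean(points, group)
            # Representative execution: the member closest to its cluster's mean.
            reps.append(min(group, key=lambda i: _dist2(points[i], center)))
        # Model: predict every execution's time from its nearest representative.
        preds = [times[min(reps, key=lambda r: _dist2(p, points[r]))] for p in points]
        err = sum(abs(t - y) for t, y in zip(times, preds)) / len(times)
        if err <= tol_s:
            break
    return sorted(reps), err
```

The returned indices are the configurations worth executing; the rest can be predicted rather than benchmarked.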

Use case 3: Knowledge Discovery

Make analyzing results easier
– Multi-variable visualization
– Trees separating relevant attributes
– Other interesting tools

[Figure: pred_time by disk type (HDD, SSD)]

Tree descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935 s
    IOFBuf=64KB ⇒ 2942 s
  Net=IB
    IOFBuf=128KB ⇒ 3118 s
    IOFBuf=64KB ⇒ 3125 s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248 s
    IOFBuf=64KB ⇒ 1256 s
  Net=IB
    IOFBuf=128KB ⇒ 1233 s
    IOFBuf=64KB ⇒ 1241 s
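The tree descriptor maps directly onto a lookup on (disk, network, IO file buffer). The sketch transcribes the leaves and averages predictions over any subset of attributes, which makes the relative weight of each attribute visible. The last leaf is partly garbled in the extracted slide and is read here as 1241 s; the function names are hypothetical.

```python
from itertools import product
from statistics import mean
from typing import Optional

# Leaves of the tree descriptor: (disk, net) -> {IOFBuf in KB: predicted seconds}
LEAVES = {
    ("HDD", "ETH"): {128: 2935, 64: 2942},
    ("HDD", "IB"): {128: 3118, 64: 3125},
    ("SSD", "ETH"): {128: 1248, 64: 1256},
    ("SSD", "IB"): {128: 1233, 64: 1241},
}


def pred_time_s(disk: str, net: str, io_buf_kb: int) -> int:
    """Walk the tree: split on Disk, then Net, then IOFBuf."""
    return LEAVES[(disk, net)][io_buf_kb]


def mean_pred(disk: Optional[str] = None, net: Optional[str] = None,
              io_buf_kb: Optional[int] = None) -> float:
    """Average prediction over all leaves matching the given attribute values."""
    return mean(
        pred_time_s(d, n, b)
        for d, n, b in product(("HDD", "SSD"), ("ETH", "IB"), (128, 64))
        if disk in (None, d) and net in (None, n) and io_buf_kb in (None, b)
    )
```

Averaging shows the disk type dominating (3030 s for HDD versus 1244.5 s for SSD), while the IO file buffer changes each prediction by only 7 to 8 s.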

Concluding remarks and references

Concluding remarks

Benchmarking: it's fun, or at least…
– It will save you €€€ and allow you to scale

But it is also tough
– The industry needs more transparency
– We still have a lot to do…

In ALOJA we provide the benchmarking scripts
– And also the results, which should be your first entry point

We are constantly adding new features
– Benchmarks, systems, providers

It is an open initiative, and you are invited to participate
– Beta testers
– Contributors

With predictive analytics we can automate and find trends faster
– Especially to save benchmarking costs and time

Find us around the conference for more details on the tools…

Fork our repo at https://github.com/Aloja/aloja

More info

• ALOJA benchmarking platform and online repository
  – http://aloja.bsc.es, http://aloja.bsc.es/publications
• Benchmarking Big Data
  – http://www.slideshare.net/ni_po/benchmarking-hadoop
• BDOOP meetup group in Barcelona
• Big Data Benchmarking Community (BDBC) mailing list
  – ~200 members from ~80 organizations
  – http://clds.sdsc.edu/bdbc/community
• Workshop on Big Data Benchmarking (WBDB)
  – Next: http://clds.sdsc.edu/wbdb2015.ca
• SPEC Research Big Data working group
  – http://research.spec.org/working-groups/big-data-working-group.html
• Slides and video
  – Michael Frank on Big Data benchmarking
    • http://www.tele-task.de/archive/podcast/20430
  – Tilmann Rabl, Big Data Benchmarking Tutorial
    • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl

www.bsc.es

Q&A

Thanks

Contact: nicolas.poggi@bsc.es



Page 21: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Running benchmarks in ALOJA

Benchmarking with defaults

repo_locationaloja-benchrun_benchssh

To queue jobs

repo_locationshellexeqsh

Code at httpsgithubcomAlojaalojablobmasteraloja-benchrun_benchssh

Testing different configurations

Approaches

1 Config folders

2 Override variables

1 In benchmark_defaultsconf

2 In cluster config

3 Cmd line

1 Via parameters

run_benchssh -r 2 -m 10

1 Via shell globals HADOOP_VERSION=hadoop-271

BENCH_DATA_SIZE=1TB

Things to look for HW OS ndash Versions

ndash Disk config and mounts

SW ndash Replication

ndash Block sizes

ndash Compression

ndash IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and references

Concluding remarks

Benchmarking is fun, or at least…

– It will save you €€€ and allow you to scale

But it is also tough – The industry needs more transparency

– We still have a lot to do…

In ALOJA we provide the benchmarking scripts – And also the results, which should be your first entry point

We are constantly adding new features – Benchmarks, systems, providers

It is an open initiative and you are invited to participate – Beta testers – Contributors

With predictive analytics we can automate and find trends faster – Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools…

Fork our repo at https://github.com/Aloja/aloja

More info

ALOJA Benchmarking platform and online repository – http://aloja.bsc.es, http://aloja.bsc.es/publications

Benchmarking Big Data – http://www.slideshare.net/ni_po/benchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list – (~200 members from ~80 organizations)

– http://clds.sdsc.edu/bdbc/community

Workshop on Big Data Benchmarking (WBDB) – Next: http://clds.sdsc.edu/wbdb2015.ca

SPEC Research Big Data working group – http://research.spec.org/working-groups/big-data-working-group.html

Slides and video – Michael Frank on Big Data benchmarking

• http://www.tele-task.de/archive/podcast/20430

– Tilmann Rabl, Big Data Benchmarking Tutorial • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl

www.bsc.es

Q&A

Thanks

Contact: nicolas.poggi@bsc.es


Testing different configurations

Approaches:

1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Command line
   1. Via parameters: run_bench.sh -r 2 -m 10
   2. Via shell globals: HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB
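One common way to layer such overrides in shell is the `${VAR:=value}` idiom, which assigns a value only when no earlier layer has set the variable. The sketch below is hypothetical and does not reproduce ALOJA's scripts. Functions stand in for the config files, the default values are made up, and the meanings of `-r` (replication) and `-m` (max mappers) are assumptions. Config folders are not covered.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL sketch of layered configuration, not ALOJA's actual scripts.
# Priority, highest first: command-line flags > shell globals >
# cluster config > benchmark defaults. Each lower layer uses ${VAR:=value},
# which only assigns when the variable is still unset or empty.

cluster_config() {         # stands in for a per-cluster config file
  : "${BENCH_DATA_SIZE:=10GB}"
}

benchmark_defaults() {     # stands in for benchmark_defaults.conf
  : "${HADOOP_VERSION:=hadoop-1.2.1}"
  : "${BENCH_DATA_SIZE:=1GB}"
  : "${REPLICATION:=3}"
  : "${MAX_MAPS:=4}"
}

resolve_config() {
  local OPTIND opt
  while getopts "r:m:" opt; do
    case $opt in
      r) REPLICATION=$OPTARG ;;   # assumed meaning of -r: replication factor
      m) MAX_MAPS=$OPTARG ;;      # assumed meaning of -m: max mappers
      *) return 1 ;;
    esac
  done
  cluster_config      # applied before defaults so it takes precedence
  benchmark_defaults
  echo "HADOOP_VERSION=$HADOOP_VERSION BENCH_DATA_SIZE=$BENCH_DATA_SIZE" \
       "REPLICATION=$REPLICATION MAX_MAPS=$MAX_MAPS"
}
```

For example, `HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB resolve_config -r 2 -m 10` resolves to `HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB REPLICATION=2 MAX_MAPS=10`. With no globals or flags, the cluster value `10GB` wins over the default `1GB`.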

Things to look for:

HW, OS – Versions

– Disk config and mounts

SW – Replication

– Block sizes

– Compression

– I/O buffers

Build your exec plan in a script and queue it

Or follow the ML recommendations
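An exec plan can be as simple as a file with one command per line, generated from the values to sweep and then run one at a time. The minimal sketch below assumes the `run_bench.sh` flags shown earlier (`-r`, `-m`) and the `BENCH_DATA_SIZE` global. The swept values and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical exec-plan builder: one line per configuration to benchmark.
REPLICATIONS=(1 2 3)          # values to sweep for -r (assumed: replication)
MAX_MAPPERS=(4 6 8 10)        # values to sweep for -m (assumed: max mappers)

# Usage: build_exec_plan [DATA_SIZE] > plan.txt
build_exec_plan() {
  local data_size=${1:-1TB} r m
  for r in "${REPLICATIONS[@]}"; do
    for m in "${MAX_MAPPERS[@]}"; do
      echo "BENCH_DATA_SIZE=$data_size ./run_bench.sh -r $r -m $m"
    done
  done
}

# Run the queued plan one execution at a time; set DRY_RUN=1 to only print.
run_exec_plan() {
  local plan_file=$1 cmd
  while IFS= read -r cmd; do
    [ -z "$cmd" ] && continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "would run: $cmd"
    else
      # stdin from /dev/null so a benchmark cannot consume the rest of the plan
      bash -c "$cmd" < /dev/null || echo "FAILED: $cmd" >&2
    fi
  done < "$plan_file"
}
```

`build_exec_plan 1TB > plan.txt` writes 12 commands. `DRY_RUN=1 run_exec_plan plan.txt` lists them without executing anything. Keeping the plan in a plain file makes it easy to review, reorder or resume.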

ALOJA-WEB

Entry point to explore the results collected from the executions – Provides insights on the obtained results through continuously evolving data views

Online DEMO at http://aloja.bsc.es

Online benchmarking results

2) ALOJA-WEB Online Repository

Entry point to explore the results collected from the executions

– Index of executions • Quick glance of executions

• Searchable, sortable

– Execution details • Performance charts and histograms

• Hadoop counters

• Jobs and task details

Data management of benchmark executions – Data importing from different clusters – Execution validation – Data management and backup

Cluster definitions – Cluster capabilities (resources) – Cluster costs

Sharing results – Download executions – Add external executions

Documentation and references – Papers, links and feature documentation

Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster with different configs

Mappers and reducers, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

– 400 s: 2 containers, local disk
– 800 s: 3 containers, local disk
– 600 s: 2 containers, remote disk

Comparing 3 runs on the same cluster with different configs

CPU utilization, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

– Moderate iowait
– Higher iowait
– Very high iowait

Comparing 3 runs on the same cluster with different configs

CPU queues, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

– 1 blocked process
– 4 blocked processes
– 4 blocked processes (map phase)

Comparing 3 runs on the same cluster with different configs

CPU context switches, 48-node cluster

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

High context switches with 3 containers on a 2-core VM
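The last chart points at CPU oversubscription, where more containers run concurrently than there are cores. The small sketch below adds a pre-flight check for this condition. It also includes a Linux-only way to sample the system-wide context switch rate from the cumulative `ctxt` counter in `/proc/stat`. Both helper names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: detect CPU oversubscription (more concurrent containers than cores).
# Usage: check_containers CONTAINERS [CORES]   (CORES defaults to nproc)
check_containers() {
  local containers=$1 cores=${2:-$(nproc)}
  if (( containers > cores )); then
    echo "WARN: $containers containers on $cores cores (oversubscribed)"
    return 1
  fi
  echo "OK: $containers containers on $cores cores"
}

# Linux only: system-wide context switches per second, from the cumulative
# "ctxt" counter in /proc/stat sampled one second apart.
ctxt_per_sec() {
  local before after
  before=$(awk '$1 == "ctxt" { print $2 }' /proc/stat)
  sleep 1
  after=$(awk '$1 == "ctxt" { print $2 }' /proc/stat)
  echo $(( after - before ))
}
```

`check_containers 3 2` reports the setup from the chart as oversubscribed. Sampling `ctxt_per_sec` during a run is a quick way to confirm the effect on a node.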

Impact of SW configurations on speedup (4-node clusters)

[Figure: speedup (higher is better), two panels. Number of mappers: 4m, 6m, 8m, 10m. Compression algorithm: no compression, ZLIB, BZIP2, Snappy]

Results using http://hadoop.bsc.es/configimprovement

Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
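Speedup here is taken as the baseline time divided by the time of the configuration under test, so values above 1 are improvements. The minimal sketch below assumes a hypothetical `label,exec_seconds` CSV. As example input, it uses the three runs compared above (400 s, 800 s and 600 s) under illustrative labels.

```bash
#!/usr/bin/env bash
# Sketch: speedup of each configuration relative to a baseline run,
# speedup = baseline_seconds / config_seconds (higher is better).
# Input on stdin (hypothetical format): label,exec_seconds
# Usage: speedups BASELINE_LABEL
speedups() {
  awk -F, -v base="$1" '
    { label[NR] = $1; secs[NR] = $2; if ($1 == base) base_secs = $2 }
    END {
      if (base_secs == "") exit 1             # baseline label not found
      for (i = 1; i <= NR; i++)
        if (secs[i] > 0) printf "%s %.2f\n", label[i], base_secs / secs[i]
    }'
}
```

With `printf '2c-local,400\n3c-local,800\n2c-remote,600\n' | speedups 2c-local`, the three runs give speedups of 1.00, 0.50 and 0.67.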

Impact of HW configurations on speedup

[Figure: speedup (higher is better), two panels. Disks and network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB. Cloud remote volumes: local only; 1, 2 or 3 remotes; 1, 2 or 3 remotes with tmp local]

Results using http://hadoop.bsc.es/configimprovement

Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf

[Figure: VM size comparison (Azure), lower is better]

Clusters by cost-effectiveness (1/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

Cluster ID reference:

• RL-06 = 8 performance1-8 VMs

• RL-16 = 8 general1-8 VMs

• RL-19 = 8 io1-15 VMs

• RL-33 = 8 performance2-30 VMs

• RL-30 = 8 io1-30 VMs

Clusters by cost-effectiveness (2/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

[Figure: fastest exec vs. cheapest exec]

Cluster ID reference:

• RL-06 = 8 performance1-8 VMs

• RL-16 = 8 general1-8 VMs

• RL-19 = 8 io1-15 VMs

• RL-33 = 8 performance2-30 VMs

• RL-30 = 8 io1-30 VMs
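Cost-effectiveness combines execution time with price. The minimal sketch below assumes that the cost of one execution is nodes × hourly price per node × execution hours. The CSV layout is hypothetical, and any prices fed to it would have to come from the provider.

```bash
#!/usr/bin/env bash
# Sketch: rank clusters by cost per execution.
# cost = nodes * price_per_node_hour * exec_seconds / 3600
# Input on stdin (hypothetical format): cluster_id,nodes,usd_per_node_hour,exec_seconds
# Output: cluster_id,exec_seconds,cost_usd sorted cheapest first.
rank_by_cost() {
  awk -F, '{ printf "%s,%s,%.2f\n", $1, $4, $2 * $3 * $4 / 3600 }' |
    LC_ALL=C sort -t, -k3,3n
}
```

`rank_by_cost < clusters.csv | head -n 1` gives the cheapest execution. Re-sorting the same output with `sort -t, -k2,2n` gives the fastest execution, and the two need not come from the same cluster.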

Cost/Performance: scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size – X axis: number of datanodes (cluster size)

– Left Y: execution time (lower is better)

– Right Y: execution cost

[Figure legend: execution time, execution cost; recommended size marked]
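One way to pick a recommended size from such a curve is to take the cheapest cluster size that still meets a deadline. The sketch below assumes that criterion, a flat hypothetical price per node-hour, and a `nodes,exec_seconds` CSV. The screen itself may use a different rule.

```bash
#!/usr/bin/env bash
# Sketch: recommend the cheapest cluster size whose execution time meets
# a deadline. cost = nodes * price_per_node_hour * exec_seconds / 3600
# Input on stdin (hypothetical format): nodes,exec_seconds
# Usage: recommend_size USD_PER_NODE_HOUR MAX_SECONDS
recommend_size() {
  awk -F, -v price="$1" -v deadline="$2" '
    $2 <= deadline {
      cost = $1 * price * $2 / 3600
      if (best == "" || cost < best_cost) { best = $1; best_cost = cost }
    }
    END {
      if (best == "") exit 1                  # no size meets the deadline
      printf "nodes=%s cost=%.2f\n", best, best_cost
    }'
}
```

For input rows `4,7200`, `8,3000`, `16,1800` and `32,1200`, a price of 1.00 and a 3000 s deadline, it prints `nodes=8 cost=6.67`. Larger clusters finish sooner but cost more per execution.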

Predictive Analytics and automated learning

Modeling Hadoop – Methodology

Methodology – 3-step learning process

– Different split sizes tested (10 ≤ training ≤ 50)
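The split-size sweep can be made concrete by listing the partition sizes it produces. The sketch below assumes that the bounds are percentages of the data set and that the remaining examples are divided evenly between validation and testing. Both of these are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: partition sizes when sweeping the training share from 10% to 50%,
# ASSUMING the remainder is split evenly between validation and testing.
# Usage: split_sweep N_EXAMPLES
split_sweep() {
  local n=$1 pct n_train rest
  for pct in 10 20 30 40 50; do
    n_train=$(( n * pct / 100 ))
    rest=$(( n - n_train ))
    printf 'pct=%d train=%d validation=%d test=%d\n' \
      "$pct" "$n_train" "$(( rest / 2 ))" "$(( rest - rest / 2 ))"
  done
}
```

For 1000 examples, the sweep goes from 100 training examples (pct=10) to 500 (pct=50). This is why benchmarks with few executions are the hardest to model.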


Page 23: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

ALOJA-WEB

Entry point for explore the results collected from the executions ndash Provides insights on the obtained results through continuously evolving data views

Online DEMO at httpalojabsces

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 24: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Online benchmarking results

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 25: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

28

2) ALOJA-WEB Online Repository

Entry point for explore the results collected from the executions

ndash Index of executions bull Quick glance of executions

bull Searchable Sortable

ndash Execution details bull Performance charts and histograms

bull Hadoop counters

bull Jobs and task details

Data management of benchmark executions ndash Data importing from different clusters ndash Execution validation ndash Data management and backup

Cluster definitions ndash Cluster capabilities (resources) ndash Cluster costs

Sharing results ndash Download executions ndash Add external executions

Documentation and References ndash Papers links and feature documentation

Available at httphadoopbsces

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop ndash Methodology

Methodology ndash 3-step learning process

ndash Different split sizes tested (10 le training le 50)

ndash Different learning algorithms Regression trees Nearest-neighbors methods LinearMultinomial regressions Neural networks

Learning results ndash Mean Absolute Errors ~250s (ranges in [100s 6000s])

ndash Relative Absolute Errors between [010 025]

bull Depend on benchmark and of examples per benchmark

bull Some executions aremay be anomalies

40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model

Tune algorithm re-train

NO

YES

Use case 1 Anomaly Detection

Anomaly Detection

ndash Model-based detection procedure

ndash Pass executions through the model

ndash Executions not fitting the model are considered ldquoout of the systemrdquo

Anomaly detection procedure Sample view from site

Use case 2 Guided Benchmarking ndash Method

Guided Benchmarking

ndash Best subset of configurations for modeling a Hadoop deployment

ndash Clustering to get the ldquorepresentative executionrdquo for each similar subset

of executions

ALOJA

Data-Set

Increase number of centers

NO

YES Clustering

Data-set

(centers) Model Is error

OK

Configs to

execute

Model Build

Build

Test

Reference

42

Use case 3 Knowledge Discovery

Make analyzing results easier

ndash Multi-variable visualization

ndash Trees separating relevant attributes

ndash Other interesting tools

43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH

IOFBuf=128KB rArr 2935s IOFBuf=64KB rArr 2942s Net=IB

IOFBuf=128KB rArr 3118s IOFBuf=64KB rArr 3125s Disk=SSD

Net=ETH

IOFBuf=128KB rArr 1248s IOFBuf=64KB rArr 1256s Net=IB

IOFBuf=128KB rArr 1233s IOFBuf=64KB rArr 124s1

Concluding remarks and reference

Concluding remarks

Benchmarking its fun or at leasthellip

ndash It will save you euroeuroeuro and allow you to scale

But it is also tough ndash The industry needs more transparency

ndash We still have a lot to dohellip

In ALOJA we provide the benchmarking scripts ndash And also de results that should be your first entry point

We are adding constantly new features ndash Benchmarks systems providers

It is an open initiate your invited to participate ndash Beta testers ndash Contributors

With predictive analytics we can automate and find tendencies faster ndash Especially to save in benchmarking costs and time

Find us around the conference for more details on the toolshellip

Fork our repo at httpsgithubcomAlojaaloja

ne

More info

ALOJA Benchmarking platform and online repository ndash httpalojabsces httpalojabscespublications

Benchmarking Big Data ndash httpwwwslidesharenetni_pobenchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list ndash (~200 members from ~80organizations)

ndash httpcldssdscedubdbccommunity

Workshop Big Data Benchmarking (WBDB) ndash Next httpcldssdsceduwbdb2015ca

SPEC Research Big Data working group ndash httpresearchspecorgworking-groupsbig-data-working-

grouphtml

Slides and video ndash Michael Frank on Big Data benchmarking

bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 26: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

Comparing 3 runs on same cluster different configs

Mappers and reducers 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

400s 2 containers Local disk

800s 3 containers Local disk

600s 2 containers Remote disk

Comparing 3 runs on same cluster different configs

CPU utilization 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

Moderate iowait

Higher iowait

Very high iowait

Comparing 3 runs on same cluster different configs

CPU queues 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

1 blocked process

4 blocked processes

4 blocked processes (map phase)

Comparing 3 runs on same cluster different configs

CPU context switches 48-node cluster

URL httpalojabscesperfchartsexecs5B5D=90086ampexecs5B5D=90088ampexecs5B5D=90104

High context switches with 3

containers on a 2-core VM

Impact of SW configurations in Speedup (4 node clusters)

Number of mappers Compression algorithm

No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

Impact of HW configurations in Speedup

Disks and Network Cloud remote volumes

Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB

Speedup (higher is better)

Results using httphadoopbscesconfigimprovement

Details httpsrawgithubusercontentcomAlojaalojamasterpublicationsBSC-MSR_ALOJApdf

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Cost/Performance scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size:
- X axis: number of datanodes (cluster size)
- Left Y axis: execution time (lower is better)
- Right Y axis: execution cost

[Chart: execution time and execution cost per cluster size, with the recommended size marked]
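The recommended size on such a chart is where the cost curve bottoms out. A larger cluster is often faster but is billed for more nodes. A minimal sketch, assuming cost = execution hours × datanodes × price per node-hour, with an optional deadline. The function names and numbers are hypothetical, not taken from ALOJA.

```python
def run_cost(seconds, datanodes, price_per_node_hour):
    """Cost of one run: hours x number of datanodes x price per node-hour."""
    return seconds / 3600.0 * datanodes * price_per_node_hour


def recommend_size(times_by_size, price_per_node_hour, max_seconds=None):
    """Pick the cluster size with the lowest execution cost.

    times_by_size: {number_of_datanodes: execution_time_seconds}
    max_seconds: optional deadline; sizes slower than this are ignored.
    """
    candidates = {
        n: run_cost(t, n, price_per_node_hour)
        for n, t in times_by_size.items()
        if max_seconds is None or t <= max_seconds
    }
    if not candidates:
        raise ValueError("no cluster size meets the deadline")
    # min() over (cost, size) breaks cost ties in favour of the smaller cluster.
    cost, size = min((c, n) for n, c in candidates.items())
    return size, cost


if __name__ == "__main__":
    # Hypothetical execution times per cluster size, in seconds.
    times = {4: 3600.0, 8: 1500.0, 16: 1000.0, 32: 800.0}
    print(recommend_size(times, price_per_node_hour=0.5))
    print(recommend_size(times, price_per_node_hour=0.5, max_seconds=1200.0))
```

The deadline parameter reflects a typical trade-off: the cheapest size may be too slow, so the cheapest size that still meets the time budget is chosen instead.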

Predictive Analytics and automated learning

Modeling Hadoop: Methodology

Methodology: 3-step learning process
- Different split sizes tested (10% ≤ training ≤ 50%)
- Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results
- Mean Absolute Errors ~250 s (ranges in [100 s, 6000 s])
- Relative Absolute Errors between [0.10, 0.25]
  • Depend on the benchmark and the number of examples per benchmark
  • Some executions are (or may be) anomalies

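[Diagram: the ALOJA data set is split into training, validation and testing sets. A model is trained on the training set and tested on the validation set; if the result is not acceptable (NO), the algorithm is tuned and re-trained; if it is (YES), that model is selected as the final model and tested on the testing set.]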
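The 3-step process (train, validate and tune, then test once) can be sketched with one of the listed algorithm families, nearest neighbors. The code reports the two metrics quoted above. Relative Absolute Error is taken here as the total absolute error divided by that of always predicting the mean. This is a plain-Python illustration under those assumptions, not ALOJA's implementation. The data and the tuning grid `ks` are hypothetical, and `train_frac` is the split size you would vary.

```python
import random


def mae(y_true, y_pred):
    """Mean Absolute Error, in the same unit as the target (seconds)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rae(y_true, y_pred):
    """Relative Absolute Error: total error relative to always predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    return (sum(abs(t - p) for t, p in zip(y_true, y_pred))
            / sum(abs(t - mean_y) for t in y_true))


def knn_predict(train, x, k):
    """Predict exec time as the mean time of the k nearest training configs."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def three_step(data, train_frac=0.5, val_frac=0.25, ks=(1, 3, 5), seed=0):
    """Train/validate/test a k-NN model; return (chosen k, test MAE, test RAE).

    data: list of (feature_vector, exec_time_seconds) pairs.
    """
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]

    # Steps 1-2: train with each setting and keep the one with the lowest
    # validation error (the "tune algorithm, re-train" loop).
    def val_error(k):
        return mae([t for _, t in val], [knn_predict(train, x, k) for x, _ in val])

    best_k = min(ks, key=val_error)

    # Step 3: test the selected model once on data it has never seen.
    y_true = [t for _, t in test]
    y_pred = [knn_predict(train, x, best_k) for x, _ in test]
    return best_k, mae(y_true, y_pred), rae(y_true, y_pred)


if __name__ == "__main__":
    # Hypothetical, noise-free data: time grows with two numeric config knobs.
    data = [((a, b), 100.0 * a + 10.0 * b) for a in range(10) for b in range(10)]
    print(three_step(data))
```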

Use case 1: Anomaly Detection

Anomaly Detection
- Model-based detection procedure
- Pass executions through the model
- Executions not fitting the model are considered "out of the system"

Anomaly detection procedure: sample view from the site
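"Passing executions through the model" can be read as comparing each measured time with the model's prediction and flagging large deviations. The sketch below assumes a relative-error threshold. The slides do not say which criterion ALOJA uses, so both the rule and the `rel_tolerance` default are hypothetical, and `predict` stands for any trained model.

```python
def detect_anomalies(executions, predict, rel_tolerance=0.5):
    """Return executions whose measured time does not fit the model.

    executions: iterable of (exec_id, features, measured_seconds)
    predict: any trained model, as a callable features -> predicted seconds
    rel_tolerance: hypothetical threshold on |measured - predicted| / predicted
    """
    anomalies = []
    for exec_id, features, measured in executions:
        predicted = predict(features)
        # Guard against a zero prediction before computing the relative error.
        rel_error = abs(measured - predicted) / max(predicted, 1e-9)
        if rel_error > rel_tolerance:
            anomalies.append((exec_id, measured, predicted))
    return anomalies


if __name__ == "__main__":
    # Hypothetical model: time inversely proportional to the number of mappers.
    def model(n_mappers):
        return 1200.0 / n_mappers

    runs = [("exec-a", 4, 310.0), ("exec-b", 6, 650.0), ("exec-c", 8, 140.0)]
    print(detect_anomalies(runs, model))
```

Flagged executions can then be reviewed or excluded before re-training, which is one way to handle the "some executions may be anomalies" caveat from the learning results.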

Use case 2: Guided Benchmarking (method)

Guided Benchmarking
- Best subset of configurations for modeling a Hadoop deployment
- Clustering to get the "representative execution" for each subset of similar executions

[Diagram: the ALOJA data set is clustered and the cluster centers form a reduced data set, which is used to build a model; the model is tested against a reference model. If the error is not OK (NO), the number of centers is increased and the loop repeats; if it is OK (YES), the centers are the configs to execute.]
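The guided-benchmarking loop can be sketched as follows. Cluster the configuration vectors with k-means and take the real execution nearest each center as its "representative execution". If a model built from those representatives is not accurate enough, increase the number of centers. This uses plain Lloyd's k-means as an assumed stand-in for whatever clustering ALOJA applies. `error_of_subset` is a hypothetical callback standing in for the "build model, test against reference" step.

```python
import random


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on numeric config vectors; returns the centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: _dist2(p, centers[i]))].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def representative_executions(executions, k, seed=0):
    """For each cluster, pick the real execution closest to its center.

    executions: list of (exec_id, feature_vector) pairs.
    """
    centers = kmeans([tuple(v) for _, v in executions], k, seed=seed)
    return sorted({min(executions, key=lambda e: _dist2(e[1], c))[0] for c in centers})


def guided_subset(executions, error_of_subset, max_error, k_start=2):
    """Increase the number of centers until the model error is acceptable."""
    for k in range(k_start, len(executions) + 1):
        reps = representative_executions(executions, k)
        if error_of_subset(reps) <= max_error:
            return reps  # configs to execute
    return [exec_id for exec_id, _ in executions]
```

The returned IDs are the configurations worth running, which is where the savings in benchmarking time and cost would come from.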

Use case 3: Knowledge Discovery

Make analyzing results easier
- Multi-variable visualization
- Trees separating relevant attributes
- Other interesting tools

[Chart: pred_time, HDD vs. SSD]

Tree descriptor:
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 124s1
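In a tree like the one above, the attribute at the root (here `Disk`) is the one that best separates execution times. The sketch below shows the greedy step a CART-style regression tree repeats at each node: pick the categorical attribute whose split leaves the smallest sum of squared errors. It illustrates the general technique only. The slides do not state which tree learner ALOJA uses, and the sample rows are hypothetical.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, attributes, target="exec_time"):
    """Return (attribute, remaining_sse) for the split that best explains the target.

    rows: list of dicts, e.g. {"Disk": "SSD", "Net": "IB", "exec_time": 1250.0}
    """
    best = None
    for attr in attributes:
        groups = defaultdict(list)
        for row in rows:
            groups[row[attr]].append(row[target])
        remaining = sum(sse(g) for g in groups.values())
        if best is None or remaining < best[1]:
            best = (attr, remaining)
    return best


if __name__ == "__main__":
    # Hypothetical executions, NOT ALOJA results.
    rows = [
        {"Disk": "HDD", "Net": "ETH", "IOFBuf": "64KB", "exec_time": 2950.0},
        {"Disk": "HDD", "Net": "IB", "IOFBuf": "128KB", "exec_time": 3100.0},
        {"Disk": "SSD", "Net": "ETH", "IOFBuf": "64KB", "exec_time": 1260.0},
        {"Disk": "SSD", "Net": "IB", "IOFBuf": "128KB", "exec_time": 1230.0},
    ]
    print(best_split(rows, ["Disk", "Net", "IOFBuf"]))
```

Applying `best_split` recursively to each resulting group yields the nested descriptor shown on the slide.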

Concluding remarks and references

Concluding remarks

Benchmarking: it's fun, or at least…
- It will save you €€€ and allow you to scale

But it is also tough
- The industry needs more transparency
- We still have a lot to do…

In ALOJA we provide the benchmarking scripts
- And also the results, which should be your first entry point

We are constantly adding new features
- Benchmarks, systems, providers

It is an open initiative, and you are invited to participate
- Beta testers
- Contributors

With predictive analytics we can automate and find trends faster
- Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools…

Fork our repo at https://github.com/Aloja/aloja


More info

ALOJA benchmarking platform and online repository
- http://aloja.bsc.es, http://aloja.bsc.es/publications

Benchmarking Big Data
- http://www.slideshare.net/ni_po/benchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list (~200 members from ~80 organizations)
- http://clds.sdsc.edu/bdbc/community

Workshop on Big Data Benchmarking (WBDB)
- Next: http://clds.sdsc.edu/wbdb2015.ca

SPEC Research Big Data working group
- http://research.spec.org/working-groups/big-data-working-group.html

Slides and video
- Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430/
- Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl


Q&A

Thanks

Contact: nicolas.poggi@bsc.es

Page 32: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

VM Size comparison (Azure)

Lower is better

Clusters by cost-effectiveness 12

URL httpalojabscesclustercosteffectiveness

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30

Clusters by cost-effectiveness 22

URL httpalojabscesclustercosteffectiveness

Fastest Exec Cheapest exec

bull Cluster ID reference

bull RL-06 = 8 performance1-8 VMs

bull RL-16 = 8 general1-8 VMs

bull RL-19 = 8 io1-15 VMs

bull RL-33 = 8 performance2-30 VMs

bull RL-30 = 8 io1-30 VMs

CostPerformance Scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size ndash X axis number of datanodes (cluster size

ndash Left Y Execution time (lower is better)

ndash Right Y Execution cost

Execution time Execution cost

Recommended size

Predictive Analytics and automated learning

Modeling Hadoop – Methodology

Methodology – 3-step learning process

– Different split sizes tested (10% ≤ training ≤ 50%)

– Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results – Mean Absolute Errors ~250 s (range: [100 s, 6000 s])

– Relative Absolute Errors between [0.10, 0.25]

• Depend on the benchmark and the number of examples per benchmark

• Some executions are (or may be) anomalies
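The two error figures above can be computed as follows. MAE is in seconds; RAE is dimensionless. The RAE definition used here is the total absolute error divided by that of a predictor that always outputs the mean. That is the common definition, but assuming the slide uses it is an assumption. The data are hypothetical.

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average of |actual - predicted|, in the same unit as the data (seconds here)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Total absolute error divided by that of always predicting the mean of `actual`.

    Undefined (ZeroDivisionError) if all actual values are identical."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(actual) / len(actual)
    baseline = sum(abs(a - mean) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline


# Hypothetical execution times in seconds, for illustration only.
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]
print(mean_absolute_error(actual, predicted))      # 150.0
print(relative_absolute_error(actual, predicted))  # 0.15
```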

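[Figure: 3-step learning flow. The ALOJA data-set is split into training, validation and testing sets; a model is trained and tested; if the result is not acceptable (NO), the algorithm is tuned and re-trained; if it is (YES), this model is selected as the final model and tested on the testing set.]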
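Here is a minimal sketch of the 3-step process (train, validate, test) using k-nearest-neighbors regression, one of the algorithm families listed above. It rests on several assumptions:

- "Tune algorithm, re-train" is reduced to trying a few values of k and keeping the one with the lowest validation MAE.
- The 50/25/25 split is hypothetical (training sits within the 10% to 50% range on the slide).
- The feature encoding and the synthetic data are also hypothetical.

```python
import math
import random

Row = tuple[list[float], float]  # (configuration features, execution time in seconds)


def knn_predict(train: list[Row], x: list[float], k: int) -> float:
    """k-nearest-neighbors regression: average time of the k closest training rows."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def mae_of(train: list[Row], rows: list[Row], k: int) -> float:
    """Mean absolute error of a k-NN model (fitted on `train`) over `rows`."""
    return sum(abs(knn_predict(train, x, k) - t) for x, t in rows) / len(rows)


def three_step(data: list[Row], train_frac: float = 0.5, val_frac: float = 0.25,
               ks: tuple[int, ...] = (1, 3, 5), seed: int = 0) -> tuple[int, float]:
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    # Steps 1-2: train each candidate (here: each k) and compare them on the validation split.
    best_k = min(ks, key=lambda k: mae_of(train, val, k))
    # Step 3: report the selected model's error once, on the untouched testing split.
    return best_k, mae_of(train, test, best_k)


# Hypothetical synthetic data: features are [ssd?, infiniband?, log2(datanodes)].
data: list[Row] = []
for ssd in (0, 1):
    for ib in (0, 1):
        for nodes in (4, 8, 16, 32, 64):
            for rep in range(2):
                time_s = (3000 - 1700 * ssd + 150 * ib) * 8 / nodes + 10 * rep
                data.append(([float(ssd), float(ib), math.log2(nodes)], time_s))

best_k, test_mae = three_step(data)
print(f"selected k={best_k}, test MAE={test_mae:.1f} s")
```

The testing split is used only once, after selection. Reusing it to choose k would make the reported error optimistic.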

Use case 1: Anomaly Detection

Anomaly Detection

– Model-based detection procedure

– Pass executions through the model

– Executions not fitting the model are considered “out of the system”

[Figures: anomaly detection procedure; sample view from the site]
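Passing executions through the model and flagging those that do not fit can be sketched as a residual check. The relative tolerance (50% of the predicted time) and the sample executions are hypothetical. A real deployment would tune the threshold, for example against the model's own error.

```python
def find_anomalies(executions: list[tuple[str, float, float]],
                   rel_tolerance: float = 0.5) -> list[str]:
    """Return IDs of executions whose observed time is further than
    rel_tolerance * predicted from the model's prediction."""
    flagged = []
    for exec_id, observed_s, predicted_s in executions:
        if abs(observed_s - predicted_s) > rel_tolerance * predicted_s:
            flagged.append(exec_id)
    return flagged


# Hypothetical (id, observed seconds, predicted seconds) triples.
runs = [
    ("exec-1", 2950.0, 3000.0),
    ("exec-2", 9100.0, 3100.0),  # about 3x slower than predicted
    ("exec-3", 1240.0, 1250.0),
    ("exec-4", 300.0, 1230.0),   # suspiciously fast, e.g. a job that failed early
]
print(find_anomalies(runs))  # ['exec-2', 'exec-4']
```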

Use case 2: Guided Benchmarking – Method

Guided Benchmarking

– Best subset of configurations for modeling a Hadoop deployment

– Clustering to get the “representative execution” for each subset of similar executions

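[Figure: guided benchmarking flow. The ALOJA data-set is clustered; a model is built from the resulting centers and tested against a reference model; if the error is not OK (NO), the number of centers is increased and the data-set is re-clustered; if it is (YES), the centers are the configs to execute.]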
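The loop in the guided-benchmarking flow can be sketched with a plain k-means: cluster, build a model from the centers, compare it against the reference, and add centers if the error is too high. This sketch rests on several assumptions:

- Configurations are already encoded as numeric vectors.
- The "representative execution" is the real execution nearest each center.
- The model built from the representatives predicts with the nearest representative's time.
- The reference is the full data set.
- k-means is seeded deterministically with farthest-point initialization.
- All data are hypothetical.

```python
Point = list[float]
Execution = tuple[Point, float]  # (encoded configuration, execution time in seconds)


def dist2(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: list[Point], k: int) -> list[Point]:
    """Deterministic seeding: start from the first point, then add the farthest one."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[Point]:
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        groups: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def guided_selection(executions: list[Execution],
                     max_mae: float) -> tuple[list[Execution], float]:
    """Grow the number of centers until a model built only from the representative
    executions reproduces the full (reference) data set within max_mae seconds."""
    reps: list[Execution] = []
    err = float("inf")
    for k in range(1, len(executions) + 1):
        centers = kmeans([f for f, _ in executions], k)
        # Representative execution: the real execution closest to each center.
        reps = []
        for c in centers:
            rep = min(executions, key=lambda e: dist2(e[0], c))
            if rep not in reps:
                reps.append(rep)
        # Model from representatives only: predict with the nearest representative's time.
        err = sum(abs(min(reps, key=lambda r: dist2(r[0], f))[1] - t)
                  for f, t in executions) / len(executions)
        if err <= max_mae:
            break
    return reps, err


# Hypothetical executions: three well-separated groups of similar configurations.
executions: list[Execution] = [
    ([1.0, 1.0], 3000.0), ([1.2, 0.9], 3050.0), ([0.9, 1.1], 2980.0),
    ([5.0, 5.0], 1200.0), ([5.1, 4.8], 1250.0), ([4.9, 5.2], 1180.0),
    ([9.0, 1.0], 600.0), ([8.8, 1.2], 620.0),
]
reps, err = guided_selection(executions, max_mae=50.0)
print(f"{len(reps)} configs to execute, MAE {err:.1f} s")  # 3 configs to execute, MAE 20.0 s
```

Only the representatives (here 3 of 8) would need to be benchmarked. That is where the savings in benchmarking time and cost would come from.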

Use case 3: Knowledge Discovery

Make analyzing results easier

– Multi-variable visualization

– Trees separating relevant attributes

– Other interesting tools

[Figure: predicted execution time (pred_time), HDD vs. SSD]

Tree Descriptor

Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
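Which attribute a regression tree places at the root can be checked with the usual split criterion, variance reduction. This sketch scores each attribute on the eight leaf predictions of the tree descriptor. The last SSD/IB/64KB value is taken as 1241 s, which is an assumed reading. The code illustrates the criterion and is not ALOJA's implementation.

```python
from statistics import pvariance

# Leaves of the tree descriptor: (configuration, predicted time in seconds).
leaves: list[tuple[dict[str, str], float]] = [
    ({"Disk": "HDD", "Net": "ETH", "IOFBuf": "128KB"}, 2935.0),
    ({"Disk": "HDD", "Net": "ETH", "IOFBuf": "64KB"}, 2942.0),
    ({"Disk": "HDD", "Net": "IB", "IOFBuf": "128KB"}, 3118.0),
    ({"Disk": "HDD", "Net": "IB", "IOFBuf": "64KB"}, 3125.0),
    ({"Disk": "SSD", "Net": "ETH", "IOFBuf": "128KB"}, 1248.0),
    ({"Disk": "SSD", "Net": "ETH", "IOFBuf": "64KB"}, 1256.0),
    ({"Disk": "SSD", "Net": "IB", "IOFBuf": "128KB"}, 1233.0),
    ({"Disk": "SSD", "Net": "IB", "IOFBuf": "64KB"}, 1241.0),  # assumed reading
]


def variance_reduction(rows: list[tuple[dict[str, str], float]], attr: str) -> float:
    """Drop in total squared error when the rows are split by the values of `attr`."""
    times = [t for _, t in rows]
    before = pvariance(times) * len(times)
    after = 0.0
    for value in {cfg[attr] for cfg, _ in rows}:
        subset = [t for cfg, t in rows if cfg[attr] == value]
        after += pvariance(subset) * len(subset)
    return before - after


ranking = sorted(["Disk", "Net", "IOFBuf"],
                 key=lambda a: variance_reduction(leaves, a), reverse=True)
for attr in ranking:
    print(f"{attr:>6}: {variance_reduction(leaves, attr):,.1f}")
print(ranking)  # ['Disk', 'Net', 'IOFBuf']
```

Disk separates the times far more than Net or IOFBuf, which is consistent with it being the first split in the tree.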

Concluding remarks and reference

Concluding remarks

Benchmarking: it's fun, or at least…

– It will save you €€€ and allow you to scale

But it is also tough – The industry needs more transparency

– We still have a lot to do…

In ALOJA we provide the benchmarking scripts – And also the results, which should be your first entry point

We are constantly adding new features – Benchmarks, systems, providers

It is an open initiative, and you are invited to participate – Beta testers – Contributors

With predictive analytics we can automate and find trends faster – Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools…

Fork our repo at https://github.com/Aloja/aloja



bull httpwwwtele-taskdearchivepodcast20430

ndash Tilmann Rabl Big Data Benchmarking Tutorial bull httpwwwslidesharenettilmann_rablieee2014-tutorialbarurabl

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces

Page 44: Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

wwwbsces

QampA

Thanks

Contact nicolaspoggibsces