Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015

www.bsc.es

Automating Big Data Benchmarking and Performance Analysis with ALOJA

October 2015

Nicolas Poggi, Senior Researcher

Barcelona Supercomputing Center (BSC)

Spanish national supercomputing center
- 22-year history in computer architecture, networking, and distributed systems research
- Based at the Technical University of Catalonia (UPC)

Led by Mateo Valero
- ACM Fellow, Eckert-Mauchly Award 2007, Goode Award 2009
- Active research staff with 1000+ publications

Large ongoing life science computational projects
- Computational genomics, molecular modeling & bioinformatics, protein interactions & docking

In-place computational capabilities
- MareNostrum supercomputer

Prominent body of research activity around Hadoop since 2008
- Previous to ALOJA
  • SLA-driven scheduling (Adaptive Scheduler), in-memory caching, etc.

BSC-MSRS Centre
- Long-term relationship between BSC, Microsoft Research, and product teams
- ALOJA is the latest phase of the engagement, exploring cost-efficient upcoming Big Data architectures
- Open model
  • No patents, public IP, publications and open source as main focus

The MareNostrum 3 Supercomputer

- Over 10^15 floating point operations per second
- Nearly 50,000 cores
- 100.8 TB of main memory
- 2 PB of disk storage

- 70% distributed through PRACE
- 24% distributed through RES
- 6% for BSC-CNS use

Agenda

1. Intro on Hadoop performance
   1. Current scenario and problems
2. ALOJA project
   1. Background
   2. Open source tools
3. Benchmarking
   1. Benchmarking workflow
   2. DEMO
4. Results
   1. HW and SW speedups
   2. Cost/Performance
   3. Online results DEMO
5. Predictive Analytics and learning
6. Future lines and conclusions

Intro: Hadoop performance and ecosystem

Hadoop design

Hadoop was designed to handle complex data
- Structured and unstructured
- With [close to] linear scalability
- And application reliability

Simplifying the programming model
- From MPI, OpenMP, CUDA, ...

Operating as a black box for data analysts, but...
- Complex runtime for admins
- YARN abstracts even more

Image source: Hadoop: The Definitive Guide

Hadoop: highly scalable, but...

Not a high-performance solution

Requires
- Design
  • Clusters, topology
- Setup
  • OS, Hadoop config
- Fine tuning
  • Iterative approach
  • Time consuming
- And extensive benchmarking

Setting up your Big Data system

Hadoop
- More than 100 tunable parameters
- Obscure and interrelated
  • mapred.{map|reduce}.tasks.speculative.execution
  • io.sort.mb 100 (300)
  • io.sort.record.percent 5% (15%)
  • io.sort.spill.percent 80% (95-100%)
- Similar for Hive, Spark, HBase

Dominated by rules of thumb
- Number of containers in parallel
  • 0.5 - 2 per CPU core

Large stack for tuning

Image source: Intel® Distribution for Apache Hadoop
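As a minimal sketch of the containers rule of thumb above, the helper below turns a core count into the range of parallel containers worth benchmarking. The function name `container_range` is hypothetical (it is not part of Hadoop or ALOJA), and rounding the lower bound down to at least one container is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the "0.5 - 2 containers per CPU core" rule of thumb.
# container_range is a hypothetical helper, not part of Hadoop or ALOJA.
container_range() {
  local cores=$1
  local min=$(( cores / 2 ))   # 0.5 containers per core, rounded down
  local max=$(( cores * 2 ))   # 2 containers per core
  if (( min < 1 )); then
    min=1                      # assumption: always allow at least one container
  fi
  echo "$min $max"
}

# Example: candidate range to benchmark on a 4-core node
container_range 4
```

The rule only narrows the search space; the values inside the range still have to be benchmarked.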

Product claims on performance and TCO

Ecosystem is not transparent
- Needs auditing

How do I set up my system? Too many options

Default values in Apache source not ideal

Large and spread-out ecosystem
- Different distributions
- Product claims

Each job is different
- No one-size-fits-all solution

Cloud vs. on-premise
- IaaS
  • Tens of different VMs to choose from
- PaaS
  • HDInsight, CloudBigData, EMR

New economic HW
- SSDs, InfiniBand networking

The ALOJA project: research lines and challenges

BSC's project ALOJA: towards cost-effective Big Data

Open research project for improving the cost-effectiveness of Big Data deployments

Benchmarking and analysis tools

Online repository and largest Big Data repo
- 50,000+ runs of HiBench, TPC-H, and [some] BigBench
- Over 100 HW configurations tested
  • Of different node/VM, disks, and networks
  • Cloud: multi-cloud provider, including both IaaS and PaaS
  • On-premise: high-end, HPC, commodity, low-power

Community
- Collaborations with industry and academia
- Presented in different conferences and workshops
- Visibility: 47 different countries

http://aloja.bsc.es

Big Data Benchmarking, Online Repository, Web Analytics

ALOJA research lines: broad coverage

Techniques for obtaining Cost/Performance insights

Profiling
  • HPC, low-level
  • High accuracy
  • Manual analysis

Benchmarking
  • Iterate configs
  • HW and SW
  • Real executions
  • Log parsing and data sanitization

Analysis tools
  • Summarize large number of results
  • By criteria
  • Filter noise
  • Fast processing

Predictive Analytics
  • Automated modeling
  • Estimations
  • Virtual executions
  • Automated KD

Evaluation of: Big Data apps, frameworks, systems/clusters, cloud providers/datacenters

Challenges (circa end of 2013)

Test different clusters and architectures
- On-premise
  • Commodity, high-end, appliance, low-power
- Cloud IaaS
  • 32 different VMs in Azure, similar in other providers
- Cloud PaaS
  • HDInsight, EMR, CloudBigData

Different access levels
- Full admin, user-only, request-to-install, everything ready, queuing systems (SGE)

Different versions
- Hadoop, JVM, Spark, Hive, etc.
- Other benchmarks

Problems
- All systems designed for PROD
  • Not for comparison
- No Azure support
- Many different packages

Dev environments and testing
- Big Data usually requires a cluster to develop and test

Solution
- Custom implementation
- Based on simple components
- Wrapping commands

Benchmarking with ALOJA's open source tools

ALOJA Platform main components

1. Big Data Benchmarking
  • Deploy & provision
  • Conf management
  • Parameter selection & queuing
  • Perf counters
  • Low-level instrumentation
  • App logs

2. Online Repository
  • Explore results
  • Execution details
  • Cluster details
  • Costs
  • Data sharing

3. Web Analytics
  • Data views and evaluations
  • Aggregates
  • Abstracted metrics
  • Job characterization
  • Machine learning
  • Predictions and clustering

NGINX, PHP, MySQL

BASH, Unix tools, CLIs, R, SQL, JS

Extending and collaborating in ALOJA

Setting up a DEV environment

1. Install prerequisites: git, vagrant, VirtualBox
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080
6. Optional: start the benchmarking cluster
   vagrant up

Installs a web server with sample data

Sets up a local cluster to test benchmarking
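The setup steps above can be wrapped in a small script. This is only a sketch: `missing_tools` and `setup_dev_env` are hypothetical helper names, and checking for `VBoxManage` (the command-line tool shipped with VirtualBox) is an assumption about how to detect that prerequisite.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the DEV environment steps in one script.
# missing_tools and setup_dev_env are hypothetical helper names.

# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

setup_dev_env() {
  local missing
  # Assumption: VBoxManage on PATH means VirtualBox is installed.
  missing=$(missing_tools git vagrant VBoxManage)
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose to print the names on one line.
    echo "Install these prerequisites first:" $missing >&2
    return 1
  fi
  git clone https://github.com/Aloja/aloja.git &&
    cd aloja &&
    vagrant up &&
    echo "Open your browser at http://localhost:8080"
}

# Run it explicitly when ready:
# setup_dev_env
```

Keeping the prerequisite check separate from the clone step lets the script fail early with a readable message instead of halfway through provisioning.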

Workflow in ALOJA

Cluster(s) definition
  • VM sizes
  • Number of nodes
  • OS, disks
  • Capabilities

Execution plan
  • Start cluster
  • Setup
  • Exec benchmarks
  • Cleanup

Import data
  • Convert perf metrics
  • Parse logs
  • Import into DB

Evaluate data
  • Data views in Vagrant VM
  • Or http://aloja.bsc.es

PA and KD
  • Predictive Analytics
  • Knowledge Discovery

Historic repo

Commands and providers

Provisioning commands

Connect
- Node and cluster
- Builds SSH cmd line
  • SSH proxies

Deploy
- Creates a cluster
- Sets SSH credentials
- If created, updates config as needed
- If stopped, starts nodes

Start, Stop

Delete

Queue jobs to clusters

Providers

On-premise
- Custom settings for clusters
  • Multiple disk types
  • Different architectures

Cloud IaaS
- Azure, OpenStack, Rackspace, AWS (testing)

Cloud PaaS
- HDInsight, CloudBigData, EMR soon

Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy

Cluster and nodes definitions: multi-provider abstraction

Steps to define a cluster

Import defaults (if any)
- Sets OS, version

Select provider
- Azure, RackSpace, AWS, on-premise, vagrant, ...

Name the cluster and size

Optional
- Select VM type
- Attached disks
- Define metadata
- And costs

Nodes can also be defined
- For web, shared folders, etc.

You can logically split clusters

Azure 8-datanode sample

    # load AZURE defaults
    source "$CONF_DIR/cluster_defaults.conf"

    clusterName=azure-large-8
    numberOfNodes=8
    vmSize=Large
    attachedVolumes=3
    diskSize=1024          # in GB

    # details
    vmCores=4
    vmRAM=7                # in GB

    # costs
    clusterCostHour=1.584  # in USD
    clusterType=IaaS

Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf

Running benchmarks in ALOJA

Benchmarking with defaults

    repo_location/aloja-bench/run_benchs.sh

To queue jobs

    repo_location/shell/exeq.sh

Code at https://github.com/Aloja/aloja/blob/master/aloja-bench/run_benchs.sh

Testing different configurations

Approaches
1. Config folders
2. Override variables
   1. In benchmark_defaults.conf
   2. In cluster config
3. Cmd line
   1. Via parameters
      run_benchs.sh -r 2 -m 10
   2. Via shell globals
      HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB

Things to look for

HW, OS
- Versions
- Disk config and mounts

SW
- Replication
- Block sizes
- Compression
- IO buffers

Build your exec plan in a script and queue

Or follow ML recommendations
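A minimal sketch of building an exec plan in a script: it prints one command line per configuration so the plan can be reviewed before queuing. `build_plan` is a hypothetical helper; the `-r` and `-m` flags and the `HADOOP_VERSION` and `BENCH_DATA_SIZE` globals are copied from the slide without assuming their meaning, and the value ranges are examples only.

```shell
#!/usr/bin/env bash
# Sketch: print one run_benchs.sh command line per configuration to test.
# build_plan is hypothetical; flags and globals are copied from the slide,
# the value ranges are illustrative.
build_plan() {
  local r m
  for r in 1 2 3; do
    for m in 4 6 8 10; do
      echo "HADOOP_VERSION=hadoop-2.7.1 BENCH_DATA_SIZE=1TB run_benchs.sh -r $r -m $m"
    done
  done
}

# Dry run: review the generated command lines before queuing them.
build_plan
```

Printing the plan first keeps the expensive step (running on a real cluster) separate from the cheap step of enumerating combinations.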

ALOJA-WEB

Entry point to explore the results collected from the executions
- Provides insights on the obtained results through continuously evolving data views

Online DEMO at http://aloja.bsc.es

Online benchmarking results

2) ALOJA-WEB Online Repository

Entry point to explore the results collected from the executions
- Index of executions
  • Quick glance of executions
  • Searchable, sortable
- Execution details
  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details

Data management of benchmark executions
- Data importing from different clusters
- Execution validation
- Data management and backup

Cluster definitions
- Cluster capabilities (resources)
- Cluster costs

Sharing results
- Download executions
- Add external executions

Documentation and references
- Papers, links, and feature documentation

Available at http://hadoop.bsc.es

Comparing 3 runs on the same cluster with different configs

URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104

Mappers and reducers, 48-node cluster
- 400s, 2 containers, local disk
- 600s, 2 containers, remote disk

CPU utilization, 48-node cluster
- Moderate iowait
- Higher iowait
- Very high iowait

CPU queues, 48-node cluster
- 1 blocked process
- 4 blocked processes
- 4 blocked processes (map phase)

CPU context switches, 48-node cluster
- High context switches with 3 containers on a 2-core VM

Impact of SW configurations in speedup (4-node clusters)

[Figure: two charts of speedup (higher is better). Number of mappers: 4m, 6m, 8m, 10m. Compression algorithm: no compression, ZLIB, BZIP2, Snappy.]

Results using http://hadoop.bsc.es/configimprovement

Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
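For readers reproducing this kind of comparison, here is a minimal sketch of a speedup calculation. It assumes speedup is the baseline execution time divided by the execution time of each configuration; the slides do not define it, so treat that as an assumption. `speedups` is a hypothetical helper and the timings are made up.

```shell
#!/usr/bin/env bash
# Sketch: speedup of each configuration relative to a baseline run.
# Assumption: speedup = baseline_time / config_time (not defined in the slides).
# Input lines: "<config> <exec_time_seconds>"; the first line is the baseline.
speedups() {
  LC_ALL=C awk 'NR == 1 { base = $2 }
                { printf "%s %.2f\n", $1, base / $2 }'
}

# Made-up timings, for illustration only.
printf '%s\n' "4m 1000" "6m 800" "8m 500" | speedups
```

With this definition the baseline always scores 1.00 and higher is better, matching the orientation of the charts.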

Impact of HW configurations in speedup

[Figure: two charts of speedup. Disks and network: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB. Cloud remote volumes: local only; 1 remote; 2 remotes; 3 remotes; 3 remotes, tmp local; 2 remotes, tmp local; 1 remote, tmp local.]

VM size comparison (Azure)

[Figure: VM size comparison; lower is better.]

Clusters by cost-effectiveness (1/2)

URL: http://aloja.bsc.es/clustercosteffectiveness

Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs

[Figure: clusters by cost-effectiveness. Labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8, Io1-30. Markers: fastest exec, cheapest exec.]







Cost/Performance scalability of cluster size

This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
- X axis: number of datanodes (cluster size)
- Left Y axis: execution time (lower is better)
- Right Y axis: execution cost

[Figure: execution time and execution cost curves, with the recommended size marked.]
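A minimal sketch of how an execution cost could be derived from an execution time. It assumes cost is the execution time in hours multiplied by the cluster's hourly cost (the kind of value stored in `clusterCostHour` in the cluster definition sample); the slides do not state the formula. `exec_cost` is a hypothetical helper and the numbers are made up.

```shell
#!/usr/bin/env bash
# Sketch: cost of one execution = exec_time_hours * cluster cost per hour.
# Assumption: this formula is not stated in the slides.
# exec_cost is a hypothetical helper.
exec_cost() {
  local seconds=$1 cluster_cost_hour=$2
  LC_ALL=C awk -v s="$seconds" -v c="$cluster_cost_hour" \
    'BEGIN { printf "%.2f\n", s / 3600 * c }'
}

# Made-up example: a 1800 s run on a cluster billed at 2.00 USD/hour
exec_cost 1800 2.00
```

Under this assumption a larger cluster is only more cost-effective when the drop in execution time outweighs the higher hourly price.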

Predictive Analytics and automated learning

Modeling Hadoop: methodology

Methodology
- 3-step learning process
- Different split sizes tested (10% ≤ training ≤ 50%)
- Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks

Learning results
- Mean Absolute Errors ~250s (ranges in [100s, 6000s])
- Relative Absolute Errors between [0.10, 0.25]
  • Depend on benchmark and number of examples per benchmark
  • Some executions are/may be anomalies
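The two error measures in the learning results can be sketched as follows. The definitions used here are the common ones and are an assumption, since the slides do not spell them out: MAE is the mean of |predicted - observed|, and RAE is the sum of |predicted - observed| divided by the sum of |observed - mean(observed)|. `prediction_errors` is a hypothetical helper and the input values are made up.

```shell
#!/usr/bin/env bash
# Sketch: MAE and RAE for predicted vs. observed execution times.
# Input lines: "<observed_seconds> <predicted_seconds>".
# Assumed definitions (not spelled out in the slides):
#   MAE = mean(|pred - obs|)
#   RAE = sum(|pred - obs|) / sum(|obs - mean(obs)|)
prediction_errors() {
  LC_ALL=C awk '
    function abs(x) { return x < 0 ? -x : x }
    { obs[NR] = $1; err += abs($2 - $1); total += $1 }
    END {
      mean = total / NR
      for (i = 1; i <= NR; i++) dev += abs(obs[i] - mean)
      printf "MAE=%.2f RAE=%.2f\n", err / NR, err / dev
    }'
}

# Made-up numbers, for illustration only.
printf '%s\n' "100 110" "200 180" "300 330" | prediction_errors
```

An RAE below 1 means the model does better than always predicting the mean observed time. The sketch does not guard against inputs where every observed time is identical, which would make the RAE denominator zero.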

[Figure: learning workflow. ALOJA data-set split into training, validation, and testing; train the model; test the model; select this model? NO: tune algorithm and re-train; YES: final model; test the model.]

Use case 1: Anomaly detection

Anomaly detection
- Model-based detection procedure
- Pass executions through the model
- Executions not fitting the model are considered "out of the system"

[Figure: anomaly detection procedure; sample view from the site.]
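The model-based procedure above can be illustrated with a rough sketch. It approximates "not fitting the model" as a relative error above a threshold between the observed time and the model's prediction; the threshold value, the helper name `find_anomalies`, and the sample executions are all assumptions, and the criterion ALOJA actually uses may differ.

```shell
#!/usr/bin/env bash
# Sketch: flag executions whose observed time is far from the model's prediction.
# Input lines: "<exec_id> <observed_seconds> <predicted_seconds>".
# Assumption: "not fitting the model" is approximated by a relative error above
# a threshold (default 0.5); the actual ALOJA criterion may differ.
find_anomalies() {
  local threshold=${1:-0.5}
  LC_ALL=C awk -v t="$threshold" '
    {
      diff = $2 - $3
      if (diff < 0) diff = -diff
      if (diff / $3 > t) print $1
    }'
}

# Made-up executions: only run-b deviates by more than 50% from its prediction.
printf '%s\n' "run-a 400 420" "run-b 1500 600" "run-c 610 600" | find_anomalies
```

Flagged executions are candidates for manual review or re-execution rather than automatic deletion.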

Use case 2: Guided benchmarking method

Guided benchmarking
- Best subset of configurations for modeling a Hadoop deployment
- Clustering to get the "representative execution" for each similar subset of executions

[Figure: guided benchmarking workflow. ALOJA data-set; clustering; data-set (centers); build model; test against reference model; is error OK? NO: increase number of centers; YES: configs to execute.]

Use case 3: Knowledge discovery

Make analyzing results easier
- Multi-variable visualization
- Trees separating relevant attributes
- Other interesting tools

[Figure: pred_time for HDD vs. SSD.]

Tree descriptor

    Disk=HDD
      Net=ETH
        IOFBuf=128KB ⇒ 2935s
        IOFBuf=64KB ⇒ 2942s
      Net=IB
        IOFBuf=128KB ⇒ 3118s
        IOFBuf=64KB ⇒ 3125s
    Disk=SSD
      Net=ETH
        IOFBuf=128KB ⇒ 1233s
        IOFBuf=64KB ⇒ 1241s

Concluding remarks and reference

Concluding remarks

Benchmarking: it's fun, or at least...
- It will save you €€€ and allow you to scale

But it is also tough
- The industry needs more transparency
- We still have a lot to do...

In ALOJA we provide the benchmarking scripts
- And also the results, which should be your first entry point

We are constantly adding new features
- Benchmarks, systems, providers

It is an open initiative; you're invited to participate
- Beta testers
- Contributors

With predictive analytics we can automate and find tendencies faster
- Especially to save on benchmarking costs and time

Find us around the conference for more details on the tools...

Fork our repo at https://github.com/Aloja/aloja

More info

ALOJA benchmarking platform and online repository
- http://aloja.bsc.es
- http://aloja.bsc.es/publications

Benchmarking Big Data
- http://www.slideshare.net/ni_po/benchmarking-hadoop

BDOOP meetup group in Barcelona

Big Data Benchmarking Community (BDBC) mailing list
- (~200 members from ~80 organizations)
- http://clds.sdsc.edu/bdbc/community

Workshop on Big Data Benchmarking (WBDB)
- Next: http://clds.sdsc.edu/wbdb2015.ca

SPEC Research Big Data working group
- http://research.spec.org/working-groups/big-data-working-group.html

Slides and video
- Michael Frank on Big Data benchmarking
  • http://www.tele-task.de/archive/podcast/20430/
- Tilmann Rabl, Big Data Benchmarking Tutorial
  • http://www.slideshare.net/tilmann_rabl/ieee2014-tutorialbarurabl

Q&A

Thanks

Contact: nicolas.poggi@bsc.es

Barcelona Supercomputing Center (BSC)

Spanish national supercomputing center ndash 22 year history in Computer Architecture networking and distributed

systems research ndash Based at the Technical University of Catalonia (UPC)

Led by Mateo Valero ndash ACM fellow Eckert-Mauchly award 2007 Goode award 2009 ndash Active research staff with 1000+ publications

Large ongoing life science computational projects ndash Computational Genomics Molecular modeling amp Bioinformatics Protein

Interactions amp Docking

In place computational capabilities ndash Mare Nostrum Super Computer

Prominent body of research activity around Hadoop since 2008 ndash Previous to ALOJA

bull SLA-driven scheduling (Adaptive Scheduler) in memory caching etc

BSC-MSRS Centre ndash Long-term relationship between BSC Microsoft Research product teams

ndash ALOJA is the latest phase of the engagement to explore cost-efficient upcoming Big Data architectures

ndash Open model bull No patents public IP publications and open source main focus



second

Nearly 50000 cores




6 for BSC-CNS use


Nearly 50000 cores



Agenda

1 Intro on Hadoop

performance


problematic

2 ALOJA project

1 Background

2 Open source tools

3 Benchmarking


2 DEMO

4 Results


2 CostPerformance



learning



Hadoop design










Requires

ndash Design


ndash Setup




bull Time consuming



Hadoop























Cloud vs On-premise

ndash IaaS


ndash PaaS


New economic HW















httpalojabsces


Online Repository

Web

Analytics



Profiling

bull HPC Low-level

bull High Accuracy


Benchmarking


bull HW and SW



Analysis tools


bull By criteria

bull Filter noise




bull Estimations


bull Automated KD

Big Data Apps

Frameworks

Systems Clusters


Evaluation of









etchellip















2 Online Repository

bullExplore results


bullCluster details

bullCosts

bullData sharing

3 Web Analytics


bullAggregates







bullConf Management


bullPerf counters


bullApp logs

19

NGINX PHP MySQL





3 cd aloja

4 vagrant up



vagrant up




Workflow in ALOJA


bull VM sizes

bull nodes


Execution plan

bull Start cluster

bull Setup


bull Cleanup

Import data


bull Parse logs

bull Import into DB

Evaluate data



PA and KD



Historic

Repo



Connect



bull SSH proxies





Start Stop

Delete


On-premise


clusters



Cloud IaaS



Cloud PaaS


EMR soon










ndash And costs






numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs


clusterType=IaaS





To queue jobs




Approaches

1 Config folders



2 In cluster config

3 Cmd line

1 Via parameters



BENCH_DATA_SIZE=1TB




ndash Block sizes

ndash Compression

ndash IO buffers



ALOJA-WEB




28






















Moderate iowait

Higher iowait

Very high iowait




1 blocked process

4 blocked processes









No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m






Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB





Lower is better









Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30















Recommended size










40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model


NO

YES


Anomaly Detection






Guided Benchmarking



of executions

ALOJA

Data-Set


NO

YES Clustering

Data-set


OK

Configs to

execute

Model Build

Build

Test

Reference

42






43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH



Net=ETH




Concluding remarks











ne

More info








grouphtml




wwwbsces

QampA

Thanks




second

Nearly 50000 cores




6 for BSC-CNS use


Nearly 50000 cores



Agenda

1 Intro on Hadoop

performance


problematic

2 ALOJA project

1 Background

2 Open source tools

3 Benchmarking


2 DEMO

4 Results


2 CostPerformance



learning



Hadoop design










Requires

ndash Design


ndash Setup




bull Time consuming



Hadoop























Cloud vs On-premise

ndash IaaS


ndash PaaS


New economic HW















httpalojabsces


Online Repository

Web

Analytics



Profiling

bull HPC Low-level

bull High Accuracy


Benchmarking


bull HW and SW



Analysis tools


bull By criteria

bull Filter noise




bull Estimations


bull Automated KD

Big Data Apps

Frameworks

Systems Clusters


Evaluation of









etchellip















2 Online Repository

bullExplore results


bullCluster details

bullCosts

bullData sharing

3 Web Analytics


bullAggregates







bullConf Management


bullPerf counters


bullApp logs

19

NGINX PHP MySQL





3 cd aloja

4 vagrant up



vagrant up




Workflow in ALOJA


bull VM sizes

bull nodes


Execution plan

bull Start cluster

bull Setup


bull Cleanup

Import data


bull Parse logs

bull Import into DB

Evaluate data



PA and KD



Historic

Repo



Connect



bull SSH proxies





Start Stop

Delete


On-premise


clusters



Cloud IaaS



Cloud PaaS


EMR soon










ndash And costs






numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs


clusterType=IaaS





To queue jobs




Approaches

1 Config folders



2 In cluster config

3 Cmd line

1 Via parameters



BENCH_DATA_SIZE=1TB




ndash Block sizes

ndash Compression

ndash IO buffers



ALOJA-WEB




28






















Moderate iowait

Higher iowait

Very high iowait




1 blocked process

4 blocked processes









No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m






Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB





Lower is better









Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30















Recommended size










40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model


NO

YES


Anomaly Detection






Guided Benchmarking



of executions

ALOJA

Data-Set


NO

YES Clustering

Data-set


OK

Configs to

execute

Model Build

Build

Test

Reference

42






43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH



Net=ETH




Concluding remarks











ne

More info








grouphtml




wwwbsces

QampA

Thanks


Agenda

1 Intro on Hadoop

performance


problematic

2 ALOJA project

1 Background

2 Open source tools

3 Benchmarking


2 DEMO

4 Results


2 CostPerformance



learning



Hadoop design










Requires

ndash Design


ndash Setup




bull Time consuming



Hadoop























Cloud vs On-premise

ndash IaaS


ndash PaaS


New economic HW















httpalojabsces


Online Repository

Web

Analytics



Profiling

bull HPC Low-level

bull High Accuracy


Benchmarking


bull HW and SW



Analysis tools


bull By criteria

bull Filter noise




bull Estimations


bull Automated KD

Big Data Apps

Frameworks

Systems Clusters


Evaluation of









etchellip















2 Online Repository

bullExplore results


bullCluster details

bullCosts

bullData sharing

3 Web Analytics


bullAggregates







bullConf Management


bullPerf counters


bullApp logs

19

NGINX PHP MySQL





3 cd aloja

4 vagrant up



vagrant up




Workflow in ALOJA


bull VM sizes

bull nodes


Execution plan

bull Start cluster

bull Setup


bull Cleanup

Import data


bull Parse logs

bull Import into DB

Evaluate data



PA and KD



Historic

Repo



Connect



bull SSH proxies





Start Stop

Delete


On-premise


clusters



Cloud IaaS



Cloud PaaS


EMR soon










ndash And costs






numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs


clusterType=IaaS





To queue jobs




Approaches

1 Config folders



2 In cluster config

3 Cmd line

1 Via parameters



BENCH_DATA_SIZE=1TB




ndash Block sizes

ndash Compression

ndash IO buffers



ALOJA-WEB




28






















Moderate iowait

Higher iowait

Very high iowait




1 blocked process

4 blocked processes









No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m






Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB





Lower is better









Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30















Recommended size










40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model


NO

YES


Anomaly Detection






Guided Benchmarking



of executions

ALOJA

Data-Set


NO

YES Clustering

Data-set


OK

Configs to

execute

Model Build

Build

Test

Reference

42






43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH



Net=ETH




Concluding remarks











ne

More info








grouphtml




wwwbsces

QampA

Thanks



Hadoop design










Requires

ndash Design


ndash Setup




bull Time consuming



Hadoop























Cloud vs On-premise

ndash IaaS


ndash PaaS


New economic HW















httpalojabsces


Online Repository

Web

Analytics



Profiling

bull HPC Low-level

bull High Accuracy


Benchmarking


bull HW and SW



Analysis tools


bull By criteria

bull Filter noise




bull Estimations


bull Automated KD

Big Data Apps

Frameworks

Systems Clusters


Evaluation of









etchellip















2 Online Repository

bullExplore results


bullCluster details

bullCosts

bullData sharing

3 Web Analytics


bullAggregates







bullConf Management


bullPerf counters


bullApp logs

19

NGINX PHP MySQL





3 cd aloja

4 vagrant up



vagrant up




Workflow in ALOJA


bull VM sizes

bull nodes


Execution plan

bull Start cluster

bull Setup


bull Cleanup

Import data


bull Parse logs

bull Import into DB

Evaluate data



PA and KD



Historic

Repo



Connect



bull SSH proxies





Start Stop

Delete


On-premise


clusters



Cloud IaaS



Cloud PaaS


EMR soon










ndash And costs






numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs


clusterType=IaaS





To queue jobs




Approaches

1 Config folders



2 In cluster config

3 Cmd line

1 Via parameters



BENCH_DATA_SIZE=1TB




ndash Block sizes

ndash Compression

ndash IO buffers



ALOJA-WEB




28






















Moderate iowait

Higher iowait

Very high iowait




1 blocked process

4 blocked processes









No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m






Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB





Lower is better









Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30















Recommended size










40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model


NO

YES


Anomaly Detection






Guided Benchmarking



of executions

ALOJA

Data-Set


NO

YES Clustering

Data-set


OK

Configs to

execute

Model Build

Build

Test

Reference

42






43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH



Net=ETH




Concluding remarks











ne

More info








grouphtml




wwwbsces

QampA

Thanks


Hadoop design










Requires

ndash Design


ndash Setup




bull Time consuming



Hadoop























Cloud vs On-premise

ndash IaaS


ndash PaaS


New economic HW















httpalojabsces


Online Repository

Web

Analytics



Profiling

bull HPC Low-level

bull High Accuracy


Benchmarking


bull HW and SW



Analysis tools


bull By criteria

bull Filter noise




bull Estimations


bull Automated KD

Big Data Apps

Frameworks

Systems Clusters


Evaluation of









etchellip















2 Online Repository

bullExplore results


bullCluster details

bullCosts

bullData sharing

3 Web Analytics


bullAggregates







bullConf Management


bullPerf counters


bullApp logs

19

NGINX PHP MySQL





3 cd aloja

4 vagrant up



vagrant up




Workflow in ALOJA


bull VM sizes

bull nodes


Execution plan

bull Start cluster

bull Setup


bull Cleanup

Import data


bull Parse logs

bull Import into DB

Evaluate data



PA and KD



Historic

Repo



Connect



bull SSH proxies





Start Stop

Delete


On-premise


clusters



Cloud IaaS



Cloud PaaS


EMR soon










ndash And costs






numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs


clusterType=IaaS





To queue jobs




Approaches

1 Config folders



2 In cluster config

3 Cmd line

1 Via parameters



BENCH_DATA_SIZE=1TB




ndash Block sizes

ndash Compression

ndash IO buffers



ALOJA-WEB




28






















Moderate iowait

Higher iowait

Very high iowait




1 blocked process

4 blocked processes









No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m






Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB





Lower is better









Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30















Recommended size










40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model


NO

YES


Anomaly Detection






Guided Benchmarking



of executions

ALOJA

Data-Set


NO

YES Clustering

Data-set


OK

Configs to

execute

Model Build

Build

Test

Reference

42






43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH



Net=ETH




Concluding remarks











ne

More info








grouphtml




wwwbsces

QampA

Thanks




Requires

ndash Design


ndash Setup




bull Time consuming



Hadoop























Cloud vs On-premise

ndash IaaS


ndash PaaS


New economic HW















httpalojabsces


Online Repository

Web

Analytics



Profiling

bull HPC Low-level

bull High Accuracy


Benchmarking


bull HW and SW



Analysis tools


bull By criteria

bull Filter noise




bull Estimations


bull Automated KD

Big Data Apps

Frameworks

Systems Clusters


Evaluation of









etchellip















2 Online Repository

bullExplore results


bullCluster details

bullCosts

bullData sharing

3 Web Analytics


bullAggregates







bullConf Management


bullPerf counters


bullApp logs

19

NGINX PHP MySQL





3 cd aloja

4 vagrant up



vagrant up




Workflow in ALOJA


bull VM sizes

bull nodes


Execution plan

bull Start cluster

bull Setup


bull Cleanup

Import data


bull Parse logs

bull Import into DB

Evaluate data



PA and KD



Historic

Repo



Connect



bull SSH proxies





Start Stop

Delete


On-premise


clusters



Cloud IaaS



Cloud PaaS


EMR soon










ndash And costs






numberOfNodes=8

vmSize=Large

attachedVolumes=3

diskSize=1024 in GB

details

vmCores=4

vmRAM=7 in GB

costs


clusterType=IaaS





To queue jobs




Approaches

1 Config folders



2 In cluster config

3 Cmd line

1 Via parameters



BENCH_DATA_SIZE=1TB




ndash Block sizes

ndash Compression

ndash IO buffers



ALOJA-WEB




28






















Moderate iowait

Higher iowait

Very high iowait




1 blocked process

4 blocked processes









No comp

ZLIB

BZIP2

snappy

4m

6m

8m

10m






Local only

1 Remote

2 Remotes

3 Remotes

3 Remotes tmp local

2 Remotes tmp local

1 Remotes tmp local

HDD-ETH

HDD-IB

SSD-ETH

SDD-IB





Lower is better









Performance2-30

Io1-30

Io1-15

General1-8

Performance1-8

Io1-30















Recommended size










40

ALOJA

Data-Set

Training

Validation

Testing

Model Select this

model

Final

Model Train

Test the model

Test the model


NO

YES


Anomaly Detection






Guided Benchmarking



of executions

ALOJA

Data-Set


NO

YES Clustering

Data-set


OK

Configs to

execute

Model Build

Build

Test

Reference

42






43

pred_time

HDD SSD

Tree Descriptor

Disk=HDD

Net=ETH



Net=ETH




Concluding remarks











ne

More info








grouphtml




wwwbsces

QampA

Thanks



Hadoop























Cloud vs On-premise

ndash IaaS


ndash PaaS


New economic HW















httpalojabsces


Online Repository

Web

Analytics



Profiling

bull HPC Low-level

bull High Accuracy


Benchmarking


bull HW and SW



Analysis tools


bull By criteria

bull Filter noise




bull Estimations


bull Automated KD

Big Data Apps

Frameworks

Systems Clusters


Evaluation of









etchellip















2 Online Repository

bullExplore results


bullCluster details

bullCosts

bullData sharing

3 Web Analytics


bullAggregates







bullConf Management


bullPerf counters


bullApp logs

19

NGINX PHP MySQL





3 cd aloja

4 vagrant up



vagrant up




Workflow in ALOJA


• VM sizes

• nodes

Execution plan

• Start cluster

• Setup

• Cleanup

Import data

• Parse logs

• Import into DB

Evaluate data



PA and KD



Historic

Repo



Connect



• SSH proxies





Start Stop

Delete


On-premise


clusters



Cloud IaaS



Cloud PaaS


EMR soon










- And costs






Automating Big Data benchmarking and performance analysis with Aloja by Nicolas Poggi at Big Data Spain 2015